* use device agnostic APIs in test cases
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* add one more
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* xpu now supports integer device id, aligning to CUDA behaviors
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* update to use device_properties
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* update comment
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix comments
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
---------
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* fix: format codes
* chore: fix copy mismatch issue
* fix: format codes
* chore: fix copy mismatch issue
* chore: fix copy mismatch issue
* chore: fix copy mismatch issue
* chore: restore previous words
* chore: revert unexpected changes
* tmp commit
* move tests to the right class
* remove ALL all_generative_model_classes = ...
* skip tf roberta
* skip InstructBlipForConditionalGenerationDecoderOnlyTest
* videollava
* reduce diff
* reduce diff
* remove on vlms
* fix a few more
* manual rebase bits
* more manual rebase
* remove all manual generative model class test entries
* fix up to ernie
* a few more removals
* handle remaining cases
* recurrent gemma
* it's better here
* make fixup
* tf idefics is broken
* tf bert + generate is broken
* don't touch tf :()
* don't touch tf :(
* make fixup
* better comments for test skips
* revert tf changes
* remove empty line removal
* one more
* missing one
* add tests
* fix whisper
* update
* nit
* add qwen2-vl
* more updates!
* better this way
* fix this one
* fix more tests
* fix final tests, hope so
* fix led
* Update tests/generation/test_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* pr comments
* not pass pixels and extra for low-mem tests, very flaky because of visio tower
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* No more default chat templates
* Add the template to the GPT-SW3 tests since it's not available by default now
* Fix GPT2 test
* Fix Bloom test
* Fix Bloom test
* Remove default templates again
* Pass datasets trust_remote_code
* Pass trust_remote_code in more tests
* Add trust_remote_dataset_code arg to some tests
* Revert "Temporarily pin datasets upper version to fix CI"
This reverts commit b7672826ca.
* Pass trust_remote_code in librispeech_asr_dummy docstrings
* Revert "Pin datasets<2.20.0 for examples"
This reverts commit 833fc17a3e.
* Pass trust_remote_code to all examples
* Revert "Add trust_remote_dataset_code arg to some tests" to research_projects
* Pass trust_remote_code to tests
* Pass trust_remote_code to docstrings
* Fix flax examples tests requirements
* Pass trust_remote_dataset_code arg to tests
* Replace trust_remote_dataset_code with trust_remote_code in one example
* Fix duplicate trust_remote_code
* Replace args.trust_remote_dataset_code with args.trust_remote_code
* Replace trust_remote_dataset_code with trust_remote_code in parser
* Replace trust_remote_dataset_code with trust_remote_code in dataclasses
* Replace trust_remote_dataset_code with trust_remote_code arg
* try to stylify using ruff
* might need to remove these changes?
* use ruf format andruff check
* use isinstance instead of type comparision
* use # fmt: skip
* use # fmt: skip
* nits
* soem styling changes
* update ci job
* nits isinstance
* more files update
* nits
* more nits
* small nits
* check and format
* revert wrong changes
* actually use formatter instead of checker
* nits
* well docbuilder is overwriting this commit
* revert notebook changes
* try to nuke docbuilder
* style
* fix feature exrtaction test
* remve `indent-width = 4`
* fixup
* more nits
* update the ruff version that we use
* style
* nuke docbuilder styling
* leve the print for detected changes
* nits
* Remove file I/O
Co-authored-by: charliermarsh
<charlie.r.marsh@gmail.com>
* style
* nits
* revert notebook changes
* Add # fmt skip when possible
* Add # fmt skip when possible
* Fix
* More ` # fmt: skip` usage
* More ` # fmt: skip` usage
* More ` # fmt: skip` usage
* NIts
* more fixes
* fix tapas
* Another way to skip
* Recommended way
* Fix two more fiels
* Remove asynch
Remove asynch
---------
Co-authored-by: charliermarsh <charlie.r.marsh@gmail.com>
* adds agnostic decorators and availability fns
* renaming decorators and fixing imports
* updating some representative example tests
bloom, opt, and reformer for now
* wip device agnostic functions
* lru cache to device checking functions
* adds `TRANSFORMERS_TEST_DEVICE_SPEC`
if present, imports the target file and updates device to function
mappings
* comments `TRANSFORMERS_TEST_DEVICE_SPEC` code
* extra checks on device name
* `make style; make quality`
* updates default functions for agnostic calls
* applies suggestions from review
* adds `is_torch_available` guard
* Add spec file to docs, rename function dispatch names to backend_*
* add backend import to docs example for spec file
* change instances of to
* Move register backend to before device check as per @statelesshz changes
* make style
* make opt test require fp16 to run
---------
Co-authored-by: arsalanu <arsalanu@graphcore.ai>
Co-authored-by: arsalanu <hzji210@gmail.com>
* fix test for bart. Order is correct now let's skip BPEs
* ouf
* styling
* fix bert....
* slow refactoring
* current updates
* massive refactoring
* update
* NICE!
* update to see where I am at
* updates
* update
* update
* revert
* updates
* updates
* start supporting legacy_save
* styling
* big update
* revert some changes
* nits
* nniiiiiice
* small fixes
* kinda fix t5 with new behaviour
* major update
* fixup
* fix copies
* today's updates
* fix byt5
* upfate
* update
* update
* updates
* update vocab size test
* Barthez does not use not need the fairseq offset ids
* super calll must be after
* calll super
* move all super init
* move other super init
* fixup
* nits
* more fixes
* nits
* more fixes
* nits
* more fix
* remove useless files
* ouch all of them are affected
* and more!
* small imporvements
* no more sanitize token
* more changes around unique no split tokens
* partially fix more things
* keep legacy save but add warning
* so... more fixes
* updates
* guess deberta tokenizer could be nuked
* fixup
* fixup did some bad things
* nuke it if it breaks
* remove prints and pretrain fast from slow with new format.
* fixups
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fiou
* nit
* by default specials should not be normalized?
* update
* remove brakpoint
* updates
* a lot of updates
* fixup
* fixes revert some changes to match fast
* small nits
* that makes it cleaner
* fix camembert accordingly
* update
* some lest breaking changes
* update
* fixup
* fix byt5 and whisper mostly
* some more fixes, canine's byte vocab
* fix gpt2
* fix most of the perceiver tests (4 left)
* fix layout lmv3
* fixup
* fix copies for gpt2 style
* make sure to only warn once
* fix perciever and gpt2 tests
* some more backward compatibility: also read special tokens map because some ppl use it........////.....
* fixup
* add else when reading
* nits
* fresh updates
* fix copies
* will this make everything faster?
* fixes
* more fixes
* update
* more fixes
* fixup
* is the source of truth right?
* sorry camembert for the troubles
* current updates
* fixup
* update led
* update
* fix regression
* fix single word
* more model specific fixes
* fix t5 tests
* fixup
* more comments
* update
* fix nllb
* rstrip removed
* small fixes
* better handle additional_special_tokens and vocab sizes
* fixing
* styling
* fix 4 / 21
* fixup
* fix nlbb's tests
* some fixes
* fix t5
* fixes
* style
* fix canine tests
* damn this is nice
* nits
* m2m100 nit
* fixups
* fixes!
* fixup
* stash
* fix merge
* revert bad change
* fixup
* correct order for code Llama
* fix speecht5 post merge
* styling
* revert source of 11 fails
* small nits
* all changes in one go
* fnet hack
* fix 2 more tests
* update based on main branch of tokenizers
* fixup
* fix VITS issues
* more fixes
* fix mgp test
* fix camembert issues
* oups camembert still has 2 failing tests
* mluke fixes
* decode fixes
* small nits
* nits
* fix llama and vits
* fix camembert
* smal nits
* more fixes when initialising a fast from a slow and etc
* fix one of the last test
* fix CPM tokenizer test
* fixups
* fix pop2piano
* fixup
* ⚠️ Change tokenizers required version ⚠️
* ⚠️ Change tokenizers required version ⚠️
* "tokenizers>=0.14,<0.15", don't forget smaller than
* fix musicgen tests and pretraiendtokenizerfast
* fix owlvit and all
* update t5
* fix 800 red
* fix tests
* fix the fix of the fix of t5
* styling
* documentation nits
* cache _added_tokens_encoder
* fixups
* Nit
* fix red tests
* one last nit!
* make eveything a lot simpler
* Now it's over 😉
* few small nits
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* updates that work for now
* tests that should no be skipped / changed and fixed next
* fixup
* i am ashamed
* pushe the fix
* update
* fixups
* nits
* fix added_tokens_encoder
* fix canine test
* fix pegasus vocab
* fix transfoXL
* fixup
* whisper needs to be fixed for train new
* pegasus nits
* more pegasus fixes
* minor update
* better error message in failed test
* fix whisper failing test
* fix whisper failing test
* fix pegasus
* fixup
* fix **** pegasus
* reset things
* remove another file
* attempts to fix the strange custome encoder and offset
* nits here and there
* update
* fixup
* nit
* fix the whisper test
* nits nits
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* updates based on review
* some small update to potentially remove
* nits
* import rlu cache
* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
* move warning to `from_pretrained`
* update tests results now that the special tokens are always added
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
* First commit while I figure this out
* make fixup
* Remove unused method
* Store prompt attrib
* Fix prompt argument for tests
* Make same changes in fast tokenizer
* Remove global prompts from fast tokenizer too
* stash commit
* stash commit
* Migrate PromptConfig to its True Final Location
* Replace Conversation entirely with the new class
* Import/dependency fixes
* Import/dependency fixes
* Change format for lots of default prompts
* More default prompt fixups
* Revert llama old methods so we can compare
* Fix some default configs
* Fix some default configs
* Fix misspelled kwarg
* Fixes for Blenderbot
* make fixup
* little rebase cleanup
* Add basic documentation
* Quick doc fix
* Truncate docstring for now
* Add handling for the case when messages is a single string
* Quick llama merges
* Update conversational pipeline and tests
* Add a couple of legacy properties for backward compatibility
* More legacy handling
* Add docstring for build_conversation_input_ids
* Restructure PromptConfig
* Let's start T E M P L A T I N G
* Refactor all default configs to use templates instead
* Revert changes to the special token properties since we don't need them anymore
* More class templates
* Make the sandbox even sandier
* Everything replaced with pure templating
* Remove docs for PromptConfig
* Add testing and optional requirement boilerplate
* Fix imports and make fixup
* Fix LLaMA tests and add Conversation docstring
* Finally get LLaMA working with the template system
* Finally get LLaMA working with the template system
* make fixup
* make fixup
* fmt-off for the long lists of test tokens
* Rename method to apply_chat_template for now
* Start on documentation
* Make chat_template a property that reads through to the default if it's not set
* Expand docs
* Expand chat templating doc some more
* trim/lstrip blocks by default and update doc
* Few doc tweaks
* rebase cleanup
* Clarify docstring
* rebase cleanup
* rebase cleanup
* make fixup
* Quick doc edit
* Reformat the standard template to match ChatML
* Re-add PEFT check
* Update docs/source/en/chat_templating.md
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Add apply_chat_template to the tokenizer doc
* make fixup
* Add doc links
* Fix chat links
* Fix chat links
* Explain system messages in the doc
* Add chat template test
* Proper save-loading for chat template attribute
* Add test skips for layout models
* Remove _build_conversation_input_ids, add default_chat_template to code_llama
* Make sure all LLaMA models are using the latest template
* Remove default_system_prompt block in code_llama because it has no default prompt
* Update ConversationPipeline preprocess
* Add correct #Copied from links to the default_chat_templates
* Remove unneeded type checking line
* Add a dummy mark_processsed method
* Reorganize Conversation to have **deprecated_kwargs
* Update chat_templating.md
* Quick fix to LLAMA tests
* Small doc tweaks
* Add proper docstrings and "copied from" statements to all default chat templates
* Merge use_default_system_prompt support for code_llama too
* Improve clarity around self.chat_template
* Docstring fix
* Fix blenderbot default template
* More doctest fix
* Break out some tokenizer kwargs
* Update doc to explain default templates
* Quick tweaks to tokenizer args
* Cleanups for tokenizer args
* Add note about cacheing
* Quick tweak to the chat-templating doc
* Update the LLaMA template with error checking and correct system message embedding
* make fixup
* make fixup
* add requires_jinja
* Cleanup to expected output formatting
* Add cacheing
* Fix typo in llama default template
* Update LLaMA tests
* Update documentation
* Improved legacy handling in the Conversation class
* Update Jinja template with proper error handling
* Quick bugfix
* Proper exception raising
* Change cacheing behaviour so it doesn't try to pickle an entire Jinja env
* make fixup
* rebase cleanup
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* properly support Sequence of pretokenizers
* actual fix
* make sure the fix works. Tests are not working for sure!
* hacky way
* add TODO
* update
* add a todo
* nits
* rename test
* nits
* rename test
* Replaces calls to `.cuda` with `.to(torch_device)` in tests
`torch.Tensor.cuda()` is a pre-0.4 solution to changing a tensor's device. It is recommended to prefer `.to(...)` for greater flexibility and error handling. Furthermore, this makes it more consistent with other tests (that tend to use `.to(torch_device)`) and ensures the correct device backend is used (if `torch_device` is neither `cpu` or `cuda`).
* addressing review comments
* more formatting changes in Bloom test
* `make style`
* Update tests/models/bloom/test_modeling_bloom.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fixes style failures
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* time to say goodbye, torch 1.7 and 1.8
* clean up torch_int_div
* clean up is_torch_less_than_1_8-9
* update
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Result of black 23.1
* Update target to Python 3.7
* Switch flake8 to ruff
* Configure isort
* Configure isort
* Apply isort with line limit
* Put the right black version
* adapt black in check copies
* Fix copies
* torch.jit._state
* Fix past CI
* Fix for perceiver
* Fix REALM
* Fix for Bloom
* Fix for SwinMode
* Fix for TrajectoryTransformerModel
* Fix for test_wav2vec2_with_lm
* make style
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* move generation_*.py src files into generation/*.py
* populate generation.__init__ with lazy loading
* move imports and references from generation.xxx.object to generation.object
* add bloom for question answering
- attempt to add Bloom for question answering
- adapted from `GPTJForQuestionAnswering`
- Fixed `num_labels` to `2` for common tests
- Added a bit of docstring
- All common tests pass
* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* revert changes related to `num_labels`
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Bloom model can now be traced
* Bloom traced model can be torch scripted and serialized
* Bloom can be traced with variable keyword arguments
* Enable XLNet support
* Disable XLNet for now
* fix tolerance for a bloom slow test
* enhance alibi padding
- get rid of for loops
- deals better with padded batched input
- avoid useless cpu/gpu communication when creating alibi
Co-authored-by: justheuristic <justheuristic@gmail.com>
* optimize attention mask
* fix scaled softmax limit values
* optimize building alibi tensor
Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
* fix attention_mask shape when it's None
* minor fixes
- fix docstring + arg names
* remove colons in docstring
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* apply suggestion
* remove unsued arg
* refactor a bit
- use [:, None] for consistency
* refactor attention block
Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>
* quick fixes
* first attempt
* refactor attention block and fix all tests except "test_simple_generation"
- added comments to better explain attention block
* remove debug lines and add TODO comment
* change `torch.bmm` to `torch.baddbmm`
- fixes `test_simple_generation`but breaks `test_batch_generation_padd`
* styling
* all tests are passing now
- use `bmm`
- add explanation for `allow_fp16_reduced_precision_reduction`
Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
* styling
Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
* fix support for accelerate
Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* remove attn softmax in fp32
* refactor comments
* refactor a bit
- remove warning message
- remove print on test
* refer to pytorch t5
* change the slow tests
- do the tests in fp32
- remove some comments
- keep large comments
* update expected output for `test_simple_generation`
- we now test using fp32
* make style + change comments a bit
* fix dtype padd test
Co-authored-by: justheuristic <justheuristic@gmail.com>
Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>
Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* few fixes:
- hardcode tokenizer padding side
- remove unused args
* few fixes:
- added new attribute on TokenizerTesterMixin
- added new slow test
- remove unused arg on tokenizer class
* make style
* Update src/transformers/models/bloom/tokenization_bloom_fast.py
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
* make quality
* apply changes
- remove new attribute
- redefine test on the class
* add comments
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
* add new bloom classes
* (feat) add bloom classification tests; make style
* style: change import in test
* add some typehints to bloom classes
* merge main into branch
* fix: input checking in bloom seq classification
* fix tests
* change model class tests
* fix few tests
- more tests should pass
- one test left
* make token classifier return hidden states
* style: make BLOOM typehints consistent
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* adding template
* update model
* model update
* update conf for debug model
* update conversion
* update conversion script
* update conversion script
* fix missing keys check
* add tests to test the tokenizer in the local machine
* Change variable name
* add tests on xnli dataset
* add more description
* add descriptions + clearer code
* clearer code
* adding new tests + skipping few tests because of env problems
* change comment
* add dtype on the configuration
* add test embeddings
* add hardcoded test
* fix dtype issue
* adding torch.float16 to config
* adding more metrics (min, max, mean)
* add sum
* now the test passes with almost equal
* add files for conversion - test passes on cpu gpu
* add final changes
* cleaning code
* add new args in the docstring
* fix one liner function
* remove macros
* remove forward attention
* clean up init funtion
* add comments on the issue
* rm scale mask softmax
* do make style
* fix dtype in init
* fixing for loop on att probs
* fix style with black
* fix style + doc error
* fix and debug CI errors (docs + style)
* some updates
- change new operations
- finally add scaled softmax
- added new args in the config
* make use cache working
* add changes
- save sharded models
- final changes on the modeling script
* add changes
- comment on alibi
- add TODO on seq length
* test commit
- added a text to test the commit
Co-authored-by: thomasw21 <24695242+thomasw21@users.noreply.github.com>
* final changes
- attention mask change
- generation works on BS176b
Co-authored-by: thomasw21 <24695242+thomasw21@users.noreply.github.com>
* changes - model + conversion
* move to correct dir
* put ,
* fex fixes
* fix tokenizer autodoc
* fix minor CI issues
* fix minor CI issues
* fix minor CI issues
* fix style issue
* fix minor import issues
* fix few issues
* remove def main on the test
* add require torch
* replace decorator with 'with'
* fix style
* change to bloom
* add quick fix tokenizer
* fix tokenizer file
* fix tokenizer
- merge tests
- small fixes
* fix import issue
* add bloom to readme
* fix consistency
* Update docs/source/en/model_doc/bloom.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply suggestions from code review
fix comment issues on file headers
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix doc issue
* small fix - modeling test
* some changes
- refactor some code
- taking into account reviews
- more tests should pass
- removed pruning tests
* remove useless division
* more tests should pass
* more tests should pass
* more tests should pass
* let's try this one
-add alibi offset
- remove all permutes to make the grad operations work
- finger crossed
* refactor
- refactor code
- style changes
- add new threshold for test
* major changes
- change BLOOM to Bloom
- add quick doc on bloom.mdx
- move embeddings test on modeling test
* modify readme
* small fixes
* small fix
- better threshold for a test
* remove old test file from fetcher
* fix small typo
* major change
- change BloomLMHead to BloomForCausalLM
* remove onnx config
* major changes
- refactor the code
- remove asserts
- change tol for test
* make style
* small change
* adding a slow test + commenting old ones for now
* make style
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* make style
* fix duplicates
* cleaning comments on config
* clean a bit conversion file
* refacor a bit modeling file
* refactor tokenizer file
* fix tokenization test issue
* fix tokenization issue #2
* fix tokenization issue second try
* fix test issue
* make style + add suggestions
* change test fetcher
* try this one
- slow tests should pass
- finger crossed
* possible final changes
* make style
* try fix padding side issue
* fix side
* fix padding issue
* fix ko-readme
* fix config auto
* cleaning modeling file
* keep bloom in caps in ko
* update config docs
* remove pretraining_pp
* remove model parallel
* update config
- add correct config files
* fix duplicates
* fix fetcher
* fix refactor issue
- remove divide function
* try to remove alibi
* small fixes
- fix alibi
- remove seq length
- refactor a bit the code
* put correct values
- fix bos and eos token ids
* fix attention mask loop
Co-authored-by: thomasw21 <24695242+thomasw21@users.noreply.github.com>
* small fixes:
- remove skip bias add
* small fixes
- fix typo in readme
- fix typos in config
* small changes
- remove a test
- add reconstruction test
- change config
* small changes
- change Scaled Softmax to BloomScaledSoftmax
* small fixes
- fix alibi dtype
* major changes
- removing explicit dtype when loading modules
- fixing test args (torch_dtype=auto)
- add dosctring
* fix readmes
* major changes
- now bloom supports alibi shifting
- refactor a bit the code
- better test tolerance now
* refactor a bit
* refactor a bit
* put correct name on test
* change docstring
* small changes
- fix docstring modeling
- fix test tolerance
* fix small nit
- take dtype from tensors in the conversion script
* minor fix
- fix mdx issue
* minor fix
- change config docstring
* forward contrib credits from PR14084
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* apply modifications
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* resolve softmax upcast
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
* final changes modeling
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Merge commit 'd156898f3b9b2c990e5963f5030a7143d57921a2'
* merge commit
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* apply suggestions
Apply suggestions from Stas comments
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Fix gradient checkpointing
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* add slow but exact
* add accelerate compatibility
Co-authored-by: Nicolas Patry <Narsil@users.noreply.github.com>
* forward contrib credits
Co-authored-by: thomasw21 <thomasw21@users.noreply.github.com>
Co-authored-by: sgugger <sgugger@users.noreply.github.com>
Co-authored-by: patrickvonplaten <patrickvonplaten@users.noreply.github.com>
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
Co-authored-by: LysandreJik <LysandreJik@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix torch device on tests
* make style
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix nits
Co-authored-by: patrickvonplaten<patrickvonplaten@users.noreply.github.com>
* remove final nits
* fix doc
- add more details on the doc
- add links to checkpoints
* Update src/transformers/__init__.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* apply suggestions
Co-authored-by: sgugger <sgugger@users.noreply.github.com>
* put test torchscript to false
* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: justheuristic <justheuristic@gmail.com>
* fix alibi
- create alibi only once
* add small doc
* make quality
* replace torch.nn
* remove token type emb
* fix fused op + output bias
* add fused op
- now can control fused operation from config
* remove fused op
* make quality
* small changes
- remove unsed args on config
- removed bias gelu file
- make the model torchscriptable
- add torchscript slow tests
* Update src/transformers/models/bloom/modeling_bloom.py
* fix slow
* make style
* add accelerate support
* add bloom to deepspeed tests
* minor changes
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* minor change
* slow tests pass
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update docs/source/en/model_doc/bloom.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* minor changes:
- change docstring
- add link to paper
Co-authored-by: Thomwolf <thomwolf@gmail.com>
Co-authored-by: Thomas Wolf <thomas@huggingface.co>
Co-authored-by: thomasw21 <24695242+thomasw21@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: sIncerass <sheng.s@berkeley.edu>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
Co-authored-by: Nicolas Patry <Narsil@users.noreply.github.com>
Co-authored-by: thomasw21 <thomasw21@users.noreply.github.com>
Co-authored-by: sgugger <sgugger@users.noreply.github.com>
Co-authored-by: patrickvonplaten <patrickvonplaten@users.noreply.github.com>
Co-authored-by: LysandreJik <LysandreJik@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: justheuristic <justheuristic@gmail.com>
Co-authored-by: Stas Bekman <stas@stason.org>