* starting attn refactor for encoder decoder models via bart (eager + sdpa)
* flash attention works, remove unnecessary code
* flex attention support for bart!, gotta check if the renaming is not too aggressive
* some comments
* skip flex grad test for standalone as done with the other test
* revert flex attn rename (for now), sdpa simplify, and todos
* more todos
* refactor mask creation for reuse
* modular attempt at biogpt
* first batch of other models
* fix attn dropout
* fix autoformer copies
* hubert
* another batch of models
* copies/style + last round of bart models --> whisper next?
* remove unnecessary _reshape function and remove copy to whisper
* add skip for decoder-only models out of enc-dec (same as in bart)
* bring back licences
* remove comment, added to pr read instead
* mostly docs
* disable sew flex attn as it's unclear attn mask for now
* oops
* test fixes for enc-dec
* torch fx fixes + try at flex attn
* skip on mbart
* some more fixes
* musicgen skip / delete old attn class logic + sdpa compose compile skip
* disable flex attn for musicgen, not worth the effort
* more fixes and style
* flex attention test for dropout and encoder decoder that dont have main input names
* informer fixes
* the weirdest thing I've encountered yet...
* style
* remove empty tensor attempt, found core root in previous commits
* disable time series due to tests being very text centric on inputs
* add speech to text to be ignoring the other attns, also due to tests
* update docs
* remaining issues resolved ?
* update docs for current state --> nllb moe and pegasus x sdpa is questionable :D
* some models have not set the is_causal flag...
* change dtype in softmax tol old behaviour + some modular fixes
* I hate it but it is what it is
* fixes from main for bart
* forgot this one
* some model fixes
* style
* current status
* marian works now
* fixing some copies
* some copy fixes + time series x informer
* last models possibly and fixes on style/copies
* some post merge fixes
* more fixes
* make attention interface callable and move warnings there
* style lol
* add comment to "unsupported"
* remove callable interface and change interface warnings + some copies
* fix
* ternary is ugly af, make it simpler
* how did that happen
* fix flex attn test
* failing the test
* no more fallback! fixing copies next
* style + attn fixed
* fixing copies and mask creation
* wrong copy
* fixup tests and disable flex attn for now
* fixup last tests?
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* fix: format codes
* chore: fix copy mismatch issue
* fix: format codes
* chore: fix copy mismatch issue
* chore: fix copy mismatch issue
* chore: fix copy mismatch issue
* chore: restore previous words
* chore: revert unexpected changes
* tmp commit
* move tests to the right class
* remove ALL all_generative_model_classes = ...
* skip tf roberta
* skip InstructBlipForConditionalGenerationDecoderOnlyTest
* videollava
* reduce diff
* reduce diff
* remove on vlms
* fix a few more
* manual rebase bits
* more manual rebase
* remove all manual generative model class test entries
* fix up to ernie
* a few more removals
* handle remaining cases
* recurrent gemma
* it's better here
* make fixup
* tf idefics is broken
* tf bert + generate is broken
* don't touch tf :()
* don't touch tf :(
* make fixup
* better comments for test skips
* revert tf changes
* remove empty line removal
* one more
* missing one
* use torch.testing.assertclose instead to get more details about error in cis
* fix
* style
* test_all
* revert for I bert
* fixes and updates
* more image processing fixes
* more image processors
* fix mamba and co
* style
* less strick
* ok I won't be strict
* skip and be done
* up
* tmp commit
* tmp commit
* cull overwrites of deleted tests
* typo
* more specific docstring
* make fixup
* parameterize at the top?
* correction
* more deletions :D
* tmp commit
* for VLMs too
* fix _check_outputs
* test nit
* make fixup
* fix another flaky
* test_generate_from_inputs_embeds -- handle missing attention mask
* 1,100%!
* Clean
* Don't touch DS
* Experiment with dtype allocation
* skip test_load_save_without_tied_weights test
* A little faster
* Include proper upscaling?
* Fixup tests
* Potentially skip?
* Let's see if this fixes git history
* Maintain new dtype
* Fin
* Rm hook idea for now
* New approach, see what breaks
* stage
* Clean
* Stash
* Should be fin now, just need to mark failing models
* Clean up
* Simplify
* Deal with weird models
* Enc/Dec
* Skip w/ reason
* Adjust test
* Fix test
* one more test
* Keep experimenting
* Fix ref
* TO REMOVE: testing feedback CI
* Right push
* Update tests/utils/test_modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* disable
* Add new func
* Test nits from Amy
* Update src/transformers/modeling_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Adjust comment
* Adjust comment on skip
* make private
* Fin
* Should be a not flag
* Clarify and rename test
---------
Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Remove ConversationalPipeline and Conversation object, as they have been deprecated for some time and are due for removal
* Update not-doctested.txt
* Fix JA and ZH docs
* Fix JA and ZH docs some more
* Fix JA and ZH docs some more
* left-padding test revisited
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Adjust length limits and allow naked conversation list inputs
* Adjust length limits and allow naked conversation list inputs
* Maybe use a slightly more reasonable limit than 1024
* Skip tests for old models that never supported this anyway
* Cleanup input docstrings
* More docstring cleanup + skip failing TF test
* Make fixup
* fix test for bart. Order is correct now let's skip BPEs
* ouf
* styling
* fix bert....
* slow refactoring
* current updates
* massive refactoring
* update
* NICE!
* update to see where I am at
* updates
* update
* update
* revert
* updates
* updates
* start supporting legacy_save
* styling
* big update
* revert some changes
* nits
* nniiiiiice
* small fixes
* kinda fix t5 with new behaviour
* major update
* fixup
* fix copies
* today's updates
* fix byt5
* upfate
* update
* update
* updates
* update vocab size test
* Barthez does not use not need the fairseq offset ids
* super calll must be after
* calll super
* move all super init
* move other super init
* fixup
* nits
* more fixes
* nits
* more fixes
* nits
* more fix
* remove useless files
* ouch all of them are affected
* and more!
* small imporvements
* no more sanitize token
* more changes around unique no split tokens
* partially fix more things
* keep legacy save but add warning
* so... more fixes
* updates
* guess deberta tokenizer could be nuked
* fixup
* fixup did some bad things
* nuke it if it breaks
* remove prints and pretrain fast from slow with new format.
* fixups
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fiou
* nit
* by default specials should not be normalized?
* update
* remove brakpoint
* updates
* a lot of updates
* fixup
* fixes revert some changes to match fast
* small nits
* that makes it cleaner
* fix camembert accordingly
* update
* some lest breaking changes
* update
* fixup
* fix byt5 and whisper mostly
* some more fixes, canine's byte vocab
* fix gpt2
* fix most of the perceiver tests (4 left)
* fix layout lmv3
* fixup
* fix copies for gpt2 style
* make sure to only warn once
* fix perciever and gpt2 tests
* some more backward compatibility: also read special tokens map because some ppl use it........////.....
* fixup
* add else when reading
* nits
* fresh updates
* fix copies
* will this make everything faster?
* fixes
* more fixes
* update
* more fixes
* fixup
* is the source of truth right?
* sorry camembert for the troubles
* current updates
* fixup
* update led
* update
* fix regression
* fix single word
* more model specific fixes
* fix t5 tests
* fixup
* more comments
* update
* fix nllb
* rstrip removed
* small fixes
* better handle additional_special_tokens and vocab sizes
* fixing
* styling
* fix 4 / 21
* fixup
* fix nlbb's tests
* some fixes
* fix t5
* fixes
* style
* fix canine tests
* damn this is nice
* nits
* m2m100 nit
* fixups
* fixes!
* fixup
* stash
* fix merge
* revert bad change
* fixup
* correct order for code Llama
* fix speecht5 post merge
* styling
* revert source of 11 fails
* small nits
* all changes in one go
* fnet hack
* fix 2 more tests
* update based on main branch of tokenizers
* fixup
* fix VITS issues
* more fixes
* fix mgp test
* fix camembert issues
* oups camembert still has 2 failing tests
* mluke fixes
* decode fixes
* small nits
* nits
* fix llama and vits
* fix camembert
* smal nits
* more fixes when initialising a fast from a slow and etc
* fix one of the last test
* fix CPM tokenizer test
* fixups
* fix pop2piano
* fixup
* ⚠️ Change tokenizers required version ⚠️
* ⚠️ Change tokenizers required version ⚠️
* "tokenizers>=0.14,<0.15", don't forget smaller than
* fix musicgen tests and pretraiendtokenizerfast
* fix owlvit and all
* update t5
* fix 800 red
* fix tests
* fix the fix of the fix of t5
* styling
* documentation nits
* cache _added_tokens_encoder
* fixups
* Nit
* fix red tests
* one last nit!
* make eveything a lot simpler
* Now it's over 😉
* few small nits
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* updates that work for now
* tests that should no be skipped / changed and fixed next
* fixup
* i am ashamed
* pushe the fix
* update
* fixups
* nits
* fix added_tokens_encoder
* fix canine test
* fix pegasus vocab
* fix transfoXL
* fixup
* whisper needs to be fixed for train new
* pegasus nits
* more pegasus fixes
* minor update
* better error message in failed test
* fix whisper failing test
* fix whisper failing test
* fix pegasus
* fixup
* fix **** pegasus
* reset things
* remove another file
* attempts to fix the strange custome encoder and offset
* nits here and there
* update
* fixup
* nit
* fix the whisper test
* nits nits
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* updates based on review
* some small update to potentially remove
* nits
* import rlu cache
* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
* move warning to `from_pretrained`
* update tests results now that the special tokens are always added
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
* hidden layers, huh, what are they good for (absolutely nothing)
* Some tests break with 1 hidden layer, use 2
* Use 1 hidden layer in a few slow models
* Use num_hidden_layers=2 everywhere
* Slightly higher tol for groupvit
* Slightly higher tol for groupvit
* Fix one BLIP arg not being optional, remove misspelled arg
* Remove the lxmert test overrides and just use the base test_saved_model_creation
* saved_model_creation fixes and re-enabling tests across the board
* Remove unnecessary skip
* Stop caching sinusoidal embeddings in speech_to_text
* Fix transfo_xl compilation
* Fix transfo_xl compilation
* Fix the conditionals in xglm
* Set the save spec only when building
* Clarify comment
* Move comment correctly
* Correct embeddings generation for speech2text
* Mark RAG generation tests as @slow
* Remove redundant else:
* Add comment to clarify the save_spec line in build()
* Fix size tests for XGLM at last!
* make fixup
* Remove one band_part operation
* Mark test_keras_fit as @slow
* A fun new PR where I break the entire codebase again
* A fun new PR where I break the entire codebase again
* Handle cross-attention
* Move calls to model(model.dummy_inputs) to the new build() method
* Seeing what fails with the build context thing
* make fix-copies
* Let's see what fails with new build methods
* Fix the pytorch crossload build calls
* Fix the overridden build methods in vision_text_dual_encoder
* Make sure all our build methods set self.built or call super().build(), which also sets it
* make fix-copies
* Remove finished TODO
* Tentatively remove unneeded (?) line
* Transpose b in deberta correctly and remove unused threading local
* Get rid of build_with_dummies and all it stands for
* Rollback some changes to TF-PT crossloading
* Correctly call super().build()
* Rework TF type hints to use | None instead of Optional[] for tf.Tensor
* Rework TF type hints to use | None instead of Optional[] for tf.Tensor
* Don't forget the imports
* Add the imports to tests too
* make fixup
* Refactor tests that depended on get_type_hints
* Better test refactor
* Fix an old hidden bug in the test_keras_fit input creation code
* Fix for the Deit tests
* Enforce single model initialization
* Add OneFormer example for problem 3
* Do it the Stas way
* Actually rename the uses...
* Rewrite test
* Try to change the test this way
* Fix all init slow/fast tests
* Break connection
* Fix more tests
* Fix test for initialization
* Remove custom test
* Quality
* Fix last failing tests
* The end?
* Result of black 23.1
* Update target to Python 3.7
* Switch flake8 to ruff
* Configure isort
* Configure isort
* Apply isort with line limit
* Put the right black version
* adapt black in check copies
* Fix copies
* read to load
* base functionality
* revert init
* fix dummy data
* moving right along
* moving right along
* finally
* cleanup
* pull out comment
* add test
* update docstring for main class
* flake comments and rewriting copies from make repo-consistency`
* remove irrelevant differences/accidental spaces
* put copies back after space removals
* mid
* final test pass
* stray comment
* update test file
* update test file
* fixup
* black
* missed
* black missed one more
* sytle
* add doc update
* fix order of output class
* comment
* Revert "comment"
This reverts commit 03f86b6948.
* remove redundant function, and redundant reshape
* move change out of common
* style
* put common spaces back
* reorder kwargs in output
* doc style
* Try PT1.13 by removing torch scatter
* Skip failing tests
* Style
* Remvoe testing extras for repo utils
* Try with all decorators
* Try to wipe the cache
* Fix all tests?
* Try this way
* Fix comma
* Update to main
* Try with less deps
* Quality
* move generation_*.py src files into generation/*.py
* populate generation.__init__ with lazy loading
* move imports and references from generation.xxx.object to generation.object
* Attempting to test automatically the `_keys_to_ignore`.
* Style.
* First fix pass.
* Moving test on its own.
* Another batch.
* Second round removing BatchNorm
* Fixing layoutlmv{2,3} + support older Python.
* Disable miss missing warning.
* Removing dodgy additions.
* Big pass.
* mbart.
* More corrections.
* Fixup.
* Updating test_correct_missing_keys
* Add escape hatch for when the head has no extra params so doesn't need
the missing keys check.
* Fixing test.
* Greener.
* Green ! (except for weird splinter bug).
* Adding a test about `named_parameters` usage.
* Shorten message.
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* After rebase modifications.
* More explicit condition checking.
* Fixing slow tests issues.
* Remove extra pdb.
* Remove print.
* Attempt to make failure consistent + fixing roc_bert.
* Removing the seed (all tests passing with it).
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>