* wow I was scared!
* fix everything
* nits
* make it BC?
* add todo
* nits
* is_tracing should still be used to pass tracing tests
* nits
* some nits to make sure generation works with static cache uncompiled
* fix sdpa
* fix FA2 for both static and dynamic in a better way?
* style
* fix-copies
* fix fix copies
* fix sequential beam search
* style
* use `keys_to_ignore`
* nit
* correct dtype inference when init
* :( the fix for FA2 is still not optimal, to investigate!
* styling
* nits
* nit
* this might work better
* add comment
* Update src/transformers/models/llama/modeling_llama.py
* "position_ids" -> "cache_position"
* style
* nit
* Remove changes that should not be propagated just yet
* Apply suggestions from code review
* Styling
* make sure we raise an error for static cache with FA2 enabled
* move to the bottom of the signature
* style
* Update src/transformers/models/llama/modeling_llama.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/models/llama/modeling_llama.py
* nit in the name
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Add tie_weights() to LM heads and set bias in set_output_embeddings()
The biases were not tied correctly in some LM heads, and this change should fix that.
* Moving test_save_and_load_low_cpu_mem_usage to ModelTesterMixin
* Adding _tie_weights() to MPNet and Vilt
* Skip test for low cpu mem usage for Deta/DeformableDetr since they cannot init on meta device
* Rename the test to save_load to match the convention
* Update the processing so bbox coords are adjusted for padding
* Just pad masks
* Tidy up, add tests
* Better tests
* Fix yolos and mark as slow for pycocotools
* Fix yolos - return_tensors
* Clarify padding and normalization behaviour
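A minimal sketch of the bbox adjustment mentioned in the padding entries above. It assumes boxes are stored as normalized (center_x, center_y, width, height) values relative to the unpadded image and that padding is added only on the bottom and right; the function name `rescale_boxes_for_padding` is hypothetical and is not taken from the changed code.

```python
def rescale_boxes_for_padding(boxes, image_size, padded_size):
    """Rescale normalized (cx, cy, w, h) boxes after bottom/right padding.

    Assumption: coordinates are fractions of the original (height, width),
    so they only need to be multiplied by original / padded along each axis.
    """
    height, width = image_size
    padded_height, padded_width = padded_size
    scale_x = width / padded_width
    scale_y = height / padded_height
    return [
        (cx * scale_x, cy * scale_y, w * scale_x, h * scale_y)
        for cx, cy, w, h in boxes
    ]


# A box covering the whole 100x200 image covers one quarter of a 200x400 canvas.
padded_boxes = rescale_boxes_for_padding(
    [(0.5, 0.5, 1.0, 1.0)], image_size=(100, 200), padded_size=(200, 400)
)
```

Masks need no such rescaling because they are pixel grids and are simply padded, which matches the "Just pad masks" entry.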
* add sudachi_projection option
* Upgrade sudachipy>=0.6.8
* add a test case for sudachi_projection
* Compatible with older versions of SudachiPy
* make fixup
* make style
* error message for unidic download
* revert jumanpp test cases
* format options for sudachi_projection
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* format options for sudachi_split_mode and sudachi_dict_type
* comment
* add tests for full_tokenizer kwargs
* pass projection arg directly
* require_sudachi_projection
* make style
* revert upgrade sudachipy
* check is_sudachi_projection_available()
* revert dependency_version_table and bugfix
* style format
* simply raise ImportError
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* simply raise ImportError
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* refactor with addedtokens decoder
* style
* get rid of lang code to id
* style
* keep some things for BC
* update tests
* add the mask token at the end of the vocab
* nits
* nits
* fix final tests
* style
* nits
* Update src/transformers/models/nllb/tokenization_nllb_fast.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* nits
* style?
* Update src/transformers/convert_slow_tokenizer.py
* make it a tad bit more custom
* ruff please stop
Co-authored-by: avidale <dale.david@mail.ru>
* Update
Co-authored-by: avidale <dale.david@mail.ru>
* Update
Co-authored-by: avidale <dale.david@mail.ru>
* oupts
* ouft
* nits
* test
* fix the remaining failing tests
* style
* fix failing test
* fix other test
* temp dir + test the raw init
* update test
* style
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* This is a test commit
* testing commit
* final commit with some changes
* Removed copy statement
* Fixed formatting issues
* Fixed error added past_key_values in the forward method
* Fixed a trailing whitespace. Damn the formatting rules are strict
* Added the copy statement
* [WIP] Hard error when ignoring tensors.
* Better selection/error when saving a checkpoint.
- Find all names we should normally drop (those are in the transformers
  config).
- Find all disjoint tensors (for those we can safely trigger a copy to
  get rid of the sharing before saving).
- Clone those disjoint tensors, getting rid of the issue.
- Find all identical names (those should be declared in the config
  but we try to find them all anyway).
- For all identical names:
  - If they are in the config, just ignore them, everything is fine.
  - If they are not, warn about them.
- For all remaining tensors which are shared yet neither identical NOR
  disjoint, raise a hard error.
* Adding a failing test on `main` that passes here.
* We don't need to keep the subfolder logic in this test.
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
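The selection steps listed under "Better selection/error when saving a checkpoint" above can be sketched roughly as follows. This is an illustration only, not the code from the change: each tensor is modelled as a hypothetical `(storage_id, start, stop)` byte range, `tied_keys` stands in for the names declared in the config, and `plan_save` is an invented name.

```python
from collections import defaultdict
import warnings


def plan_save(tensors, tied_keys):
    """Decide what to clone or drop before saving.

    tensors: name -> (storage_id, start, stop), a toy stand-in for real tensors.
    tied_keys: names the config declares as shared (assumption).
    Returns (names_to_clone, names_to_drop); raises on partial overlap.
    """
    by_storage = defaultdict(list)
    for name, (storage_id, _, _) in tensors.items():
        by_storage[storage_id].append(name)

    to_clone, to_drop = [], []
    for names in by_storage.values():
        if len(names) < 2:
            continue  # not shared, nothing to do
        names = sorted(names)
        spans = sorted(tensors[n][1:] for n in names)
        if all(a[1] <= b[0] for a, b in zip(spans, spans[1:])):
            # Disjoint views of one storage: a copy removes the sharing.
            to_clone.extend(names)
        elif len(set(spans)) == 1:
            # Identical tensors: keep one name, drop the others.
            undeclared = [n for n in names if n not in tied_keys]
            keep = undeclared[0] if undeclared else names[0]
            for n in names:
                if n == keep:
                    continue
                if n not in tied_keys:
                    warnings.warn(f"{n} is shared but not declared in the config")
                to_drop.append(n)
        else:
            # Shared, yet neither identical nor disjoint: hard error.
            raise RuntimeError(f"Cannot safely save overlapping tensors: {names}")
    return to_clone, to_drop


example = {
    "embed.weight": (1, 0, 40),
    "lm_head.weight": (1, 0, 40),
    "q": (2, 0, 10),
    "k": (2, 10, 20),
    "solo": (3, 0, 5),
}
clone_names, drop_names = plan_save(example, tied_keys={"lm_head.weight"})
```

In this toy example `q` and `k` are disjoint slices of one buffer and get cloned, while `lm_head.weight` is an identical, declared alias of `embed.weight` and is dropped silently.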
* Fix typos and grammar mistakes in docs and examples
* Fix typos in docstrings and comments
* Fix spelling of `tokenizer` in model tests
* Remove erroneous spaces in decorators
* Remove extra spaces in Markdown link texts
* Adding [T5/MT5/UMT5]ForTokenClassification
* Add auto mappings for T5ForTokenClassification and variants
* Adding ForTokenClassification to the list of models
* Adding attention_mask param to the T5ForTokenClassification test
* Remove outdated comment in test
* Adding EncoderOnly and Token Classification tests for MT5 and UMT5
* Fix typo in umt5 string
* Add tests for all the existing MT5 models
* Fix wrong comment in dependency_versions_table
* Reverting change to common test for _keys_to_ignore_on_load_missing
The test is correctly picking up redundant keys in _keys_to_ignore_on_load_missing.
* Removing _keys_to_ignore_on_missing from MT5 since the key is not used in the model
* Add fix-copies to MT5ModelTest
* up
* Fix more
* Correct more
* Fix more tests
* fix fast tests
* Fix more
* fix more
* push all files
* finish all
* make style
* Fix timestamp wrap
* make style
* make style
* up
* up
* up
* Fix lang detection behavior
* Fix lang detection behavior
* Add lang detection test
* Fix lang detection behavior
* make style
* Update src/transformers/models/whisper/generation_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* better error message
* make style tests
* add warning
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* test that tied output embeddings aren't initialized on load
* don't initialize the output embeddings if we're going to tie them to the input embeddings
* Port core files + ESM (because ESM code is odd)
* Search-replace in modelling code
* Fix up transfo_xl as well
* Fix other core files + tests (still need to add correct import to tests)
* Fix cookiecutter
* make fixup, fix imports in some more core files
* Auto-add imports to tests
* Cleanup, add imports to sagemaker tests
* Use correct exception for importing tf_keras
* Fixes in modeling_tf_utils
* make fixup
* Correct version parsing code
* Ensure the pipeline tests correctly revert to float32 after each test
* Ensure the pipeline tests correctly revert to float32 after each test
* More tf.keras -> keras
* Add dtype cast
* Better imports of tf_keras
* Add a cast for tf.assign, just in case
* Fix callback imports
* Enable instantiating model with pretrained backbone weights
* Remove doc updates until changes made in modeling code
* Use load_backbone instead
* Add use_timm_backbone to the model configs
* Add missing imports and arguments
* Update docstrings
* Make sure test is properly configured
* Include recent DPT updates
* fix the function load_balancing_loss_func in Mixtral_Moe to include attention_mask
* format code using black and ruff
* skip computing mask if attention_mask=None
* add tests for load balancing loss Mixtral-Moe
* fix assert loss is different in mixtral_test
* fix pad_leng
* use assertNotAlmostEqual and print to debug
* remove print for debug
* minor updates
* reduce rtol and atol
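To illustrate why passing `attention_mask` changes the load balancing loss, here is a simplified pure-Python sketch. It assumes the usual auxiliary-loss form (number of experts times the sum over experts of the fraction of tokens routed to an expert multiplied by its mean router probability) and is not the `load_balancing_loss_func` from the Mixtral modeling code; the name `load_balancing_loss` and the list-based inputs are hypothetical.

```python
def load_balancing_loss(router_probs, top_k, attention_mask=None):
    """Toy auxiliary loss over a flat list of tokens.

    router_probs: one list of per-expert probabilities per token.
    attention_mask: optional list of 0/1 flags; padded tokens (0) are ignored.
    """
    num_tokens = len(router_probs)
    num_experts = len(router_probs[0])
    if attention_mask is None:
        # No mask given: every token counts equally.
        attention_mask = [1] * num_tokens
    total = sum(attention_mask)

    tokens_per_expert = [0.0] * num_experts
    prob_per_expert = [0.0] * num_experts
    for probs, keep in zip(router_probs, attention_mask):
        if not keep:
            continue  # padding must not influence the statistics
        chosen = sorted(range(num_experts), key=lambda e: probs[e], reverse=True)[:top_k]
        for e in chosen:
            tokens_per_expert[e] += 1.0 / total
        for e in range(num_experts):
            prob_per_expert[e] += probs[e] / total

    return num_experts * sum(f * p for f, p in zip(tokens_per_expert, prob_per_expert))


probs = [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1]]
unmasked_loss = load_balancing_loss(probs, top_k=1)
masked_loss = load_balancing_loss(probs, top_k=1, attention_mask=[1, 1, 0])
```

With the last token treated as padding the two experts are perfectly balanced, so the masked loss differs from the unmasked one, which is the kind of difference the tests in the entries above check for.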