* remove control flow
* update gptneox
* update ....
* nits
* Actually let's just break. Otherwise we are silently failing which imo is not optimal
* version BC
* fix tests
* fix eager causal
* nit
* add a test
* style
* nits
* nits
* more nits for the test
* update and fix
* make sure cuda graphs are not skipped
* read token is needed for meta llama
* update!
* fixup
* compile test should be slow
* fix the fix copies
* style 🫠
* stash commit
* stash commit
* It works!
* Remove unnecessary change
* We don't actually need the cache_dir!
* Update docstring
* Add test
* Add test with custom cache dir too
* Update model repo path
* Revert "Add tie_weights() to LM heads and set bias in set_output_embeddings() (#28948)"
This reverts commit 725f4ad1cc.
* Revert "Patch to skip failing `test_save_load_low_cpu_mem_usage` tests (#29043)"
This reverts commit 4156f517ce.
* add add_dummy_prefix_space option to slow
* checking kwargs might be better. Should be there for all spm tokenizer IMO
* nits
* fix copies
* more copied
* nits
* add prefix space
* nit
* nits
* Update src/transformers/convert_slow_tokenizer.py
* fix init
* revert wrong styling
* fix
* nits
* style
* updates
* make sure we use slow tokenizer for conversion instead of looking for the decoder
* support llama as well
* update llama tokenizer fast
* nits
* nits nits nits
* update the doc
* update
* update to fix tests
* skip unrelated failing test
* Update src/transformers/convert_slow_tokenizer.py
* add proper testing
* test decode as well
* more testing
* format
* fix llama test
* Apply suggestions from code review
* pass through trust_remote_code for dynamically loading unregistered tokenizers specified by config
add test
* change directories back to previous directory after test
* fix ruff check
* Add a note to that block for future in case we want to remove it later
---------
Co-authored-by: Matt <rocketknight1@gmail.com>
* Add tie_weights() to LM heads and set bias in set_output_embeddings()
The biases were not tied correctly in some LM heads, and this change should fix that.
* Moving test_save_and_load_low_cpu_mem_usage to ModelTesterMixin
* Adding _tie_weights() to MPNet and Vilt
* Skip test for low cpu mem usage for Deta/DeformableDetr since they cannot init on meta device
* Rename the test to save_load to match the convention
* Update the processing so bbox coords are adjusted for padding
* Just pad masks
* Tidy up, add tests
* Better tests
* Fix yolos and mark as slow for pycocotools
* Fix yolos - return_tensors
* Clarify padding and normalization behaviour
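The padding adjustment described in the commits above can be illustrated with a minimal sketch. It assumes boxes are stored as normalized `(center_x, center_y, width, height)` tuples and that padding is added only at the bottom and right of the image; the function name `adjust_boxes_for_padding` is hypothetical and is not taken from the library.

```python
def adjust_boxes_for_padding(boxes, image_size, padded_size):
    """Rescale normalized (cx, cy, w, h) boxes after bottom/right padding.

    Hypothetical helper for illustration only. Sizes are (height, width).
    """
    height, width = image_size
    padded_height, padded_width = padded_size
    if padded_height < height or padded_width < width:
        raise ValueError("padded_size must be at least as large as image_size")
    # Normalized coordinates are fractions of the image size, so they shrink
    # by the ratio between the original and the padded dimensions.
    scale_x = width / padded_width
    scale_y = height / padded_height
    return [
        (cx * scale_x, cy * scale_y, w * scale_x, h * scale_y)
        for cx, cy, w, h in boxes
    ]


# A box covering the whole 100x200 image, padded to 200x400.
boxes = [(0.5, 0.5, 1.0, 1.0)]
adjusted = adjust_boxes_for_padding(boxes, (100, 200), (200, 400))
print(adjusted)  # [(0.25, 0.25, 0.5, 0.5)]
```

Masks, by contrast, are pixel arrays and only need to be padded to the new size, which is presumably what "Just pad masks" refers to.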
* add sudachi_projection option
* Upgrade sudachipy>=0.6.8
* add a test case for sudachi_projection
* Compatible with older versions of SudachiPy
* make fixup
* make style
* error message for unidic download
* revert jumanpp test cases
* format options for sudachi_projection
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* format options for sudachi_split_mode and sudachi_dict_type
* comment
* add tests for full_tokenizer kwargs
* pass projection arg directly
* require_sudachi_projection
* make style
* revert upgrade sudachipy
* check is_sudachi_projection_available()
* revert dependency_version_table and bugfix
* style format
* simply raise ImportError
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* simply raise ImportError
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* refactor with addedtokens decoder
* style
* get rid of lang code to id
* style
* keep some things for BC
* update tests
* add the mask token at the end of the vocab
* nits
* nits
* fix final tests
* style
* nits
* Update src/transformers/models/nllb/tokenization_nllb_fast.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* nits
* style?
* Update src/transformers/convert_slow_tokenizer.py
* make it a tad bit more custom
* ruff please stop
Co-authored-by: avidale <dale.david@mail.ru>
* Update
Co-authored-by: avidale <dale.david@mail.ru>
* Update
Co-authored-by: avidale <dale.david@mail.ru>
* oupts
* ouft
* nits
* test
* fix the remaining failing tests
* style
* fix failing test
* fix other test
* temp dir + test the raw init
* update test
* style
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* This is a test commit
* testing commit
* final commit with some changes
* Removed copy statement
* Fixed formatting issues
* Fixed error: added past_key_values in the forward method
* Fixed a trailing whitespace. Damn the formatting rules are strict
* Added the copy statement
* Fix typos and grammar mistakes in docs and examples
* Fix typos in docstrings and comments
* Fix spelling of `tokenizer` in model tests
* Remove erroneous spaces in decorators
* Remove extra spaces in Markdown link texts
* Adding [T5/MT5/UMT5]ForTokenClassification
* Add auto mappings for T5ForTokenClassification and variants
* Adding ForTokenClassification to the list of models
* Adding attention_mask param to the T5ForTokenClassification test
* Remove outdated comment in test
* Adding EncoderOnly and Token Classification tests for MT5 and UMT5
* Fix typo in umt5 string
* Add tests for all the existing MT5 models
* Fix wrong comment in dependency_versions_table
* Reverting change to common test for _keys_to_ignore_on_load_missing
The test is correctly picking up redundant keys in _keys_to_ignore_on_load_missing.
* Removing _keys_to_ignore_on_missing from MT5 since the key is not used in the model
* Add fix-copies to MT5ModelTest
* up
* Fix more
* Correct more
* Fix more tests
* fix fast tests
* Fix more
* fix more
* push all files
* finish all
* make style
* Fix timestamp wrap
* make style
* make style
* up
* up
* up
* Fix lang detection behavior
* Fix lang detection behavior
* Add lang detection test
* Fix lang detection behavior
* make style
* Update src/transformers/models/whisper/generation_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* better error message
* make style tests
* add warning
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Port core files + ESM (because ESM code is odd)
* Search-replace in modelling code
* Fix up transfo_xl as well
* Fix other core files + tests (still need to add correct import to tests)
* Fix cookiecutter
* make fixup, fix imports in some more core files
* Auto-add imports to tests
* Cleanup, add imports to sagemaker tests
* Use correct exception for importing tf_keras
* Fixes in modeling_tf_utils
* make fixup
* Correct version parsing code
* Ensure the pipeline tests correctly revert to float32 after each test
* Ensure the pipeline tests correctly revert to float32 after each test
* More tf.keras -> keras
* Add dtype cast
* Better imports of tf_keras
* Add a cast for tf.assign, just in case
* Fix callback imports
* Enable instantiating model with pretrained backbone weights
* Remove doc updates until changes made in modeling code
* Use load_backbone instead
* Add use_timm_backbone to the model configs
* Add missing imports and arguments
* Update docstrings
* Make sure test is properly configured
* Include recent DPT updates
* fix the function load_balancing_loss_func in Mixtral_Moe to include attention_mask
* format code using black and ruff
* skip computing mask if attention_mask=None
* add tests for load balancing loss Mixtral-Moe
* fix assert loss is different in mixtral_test
* fix pad_leng
* use assertNotAlmostEqual and print to debug
* remove print for debug
* minor updates
* reduce rtol and atol
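To illustrate why the load balancing loss needs the attention mask, here is a simplified sketch. It is not the Mixtral implementation: it assumes top-1 routing, plain Python lists instead of tensors, and a hypothetical function name `load_balancing_loss`. It only shows that padding tokens should be excluded from both averages.

```python
def load_balancing_loss(router_probs, attention_mask=None):
    """Simplified top-1 auxiliary loss: num_experts * sum_e(f_e * p_e).

    router_probs: one list of expert probabilities per token.
    attention_mask: optional 0/1 flags, 1 for real tokens, 0 for padding.
    Hypothetical sketch for illustration only.
    """
    num_experts = len(router_probs[0])
    if attention_mask is None:
        # Without a mask every token counts, including padding.
        attention_mask = [1] * len(router_probs)
    kept = sum(attention_mask)
    if kept == 0:
        return 0.0
    tokens_per_expert = [0.0] * num_experts  # f_e: fraction of tokens routed to e
    prob_per_expert = [0.0] * num_experts    # p_e: mean router probability for e
    for probs, keep in zip(router_probs, attention_mask):
        if not keep:
            continue  # padding tokens contribute to neither average
        top_expert = probs.index(max(probs))
        tokens_per_expert[top_expert] += 1.0 / kept
        for expert in range(num_experts):
            prob_per_expert[expert] += probs[expert] / kept
    return num_experts * sum(
        f * p for f, p in zip(tokens_per_expert, prob_per_expert)
    )


router_probs = [[0.75, 0.25], [0.25, 0.75], [1.0, 0.0]]  # last token is padding
masked = load_balancing_loss(router_probs, attention_mask=[1, 1, 0])
unmasked = load_balancing_loss(router_probs)
print(masked)             # 1.0 (perfectly balanced over the two real tokens)
print(unmasked > masked)  # True (the padding token skews the loss)
```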
* Enable instantiating model with pretrained backbone weights
* Update tests so backbone checkpoint isn't passed in
* Remove doc updates until changes made in modeling code
* Clarify pretrained import
* Update configs - docs and validation check
* Update src/transformers/utils/backbone_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Clarify exception message
* Update config init in tests
* Add test for when use_timm_backbone=True
* Small test updates
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* [DETA] fix freeze/unfreeze function
* Update src/transformers/models/deta/modeling_deta.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/deta/modeling_deta.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add freeze/unfreeze test case in DETA
* fix typo
* fix typo 2
* fix : enable aux and enc loss in training pipeline
* Add unsynced variables from original DETA for training
* modification for passing CI test
* make style
* make fix
* manual make fix
* change the default of configuration 'two_stage' in deta_modeling_test to TRUE and make a minor change to dist checking
* remove print
* divide configuration in DetaModel and DetaForObjectDetection
* images smaller than 224 will give topk error
* pred_boxes and logits should be equivalent to two_stage_num_proposals
* add missing part in DetaConfig
* Update src/transformers/models/deta/modeling_deta.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add docstring in configure and prettify TO DO part
* change distribute related code to accelerate
* Update src/transformers/models/deta/configuration_deta.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/deta/test_modeling_deta.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* protect importing accelerate
* change variable name to specific value
* wrong import
* fix aux_loss in conditional_detr
* add test aux_loss
* add aux_loss test in deta and table_transformer
* fix yolos since it doesn't have auxiliary function
* fix maskformer auxiliary_loss related code
* make style
* change param 'auxiliary_loss' to 'use_auxiliary_loss'
* change param 'auxiliary_loss' to 'use_auxiliary_loss' in tests
* make style & fix-copies, also revert yolos related parameter
* revert variable name 'use_auxiliary_loss' to 'auxiliary_loss' due to DetrConfig
* revert variable name in yolos
* revert maskformer
* add aux_loss test in maskformer
* make style
* Update src/transformers/models/yolos/configuration_yolos.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Allow non-special tokens to be added
* Add test, fix token adding code
* Revert changes to id_to_token and token_to_id
* Update the ESM tokenizer to be a bit more standardized
* Update src/transformers/models/esm/tokenization_esm.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* finalize
* make fix copies whisper
* [Tests] Make sure that we don't run tests multiple times
* Update src/transformers/models/whisper/modeling_whisper.py
* [Tests] Make sure that we don't run tests multiple times
* fix more
* improve
* improve
* improve further
* improve more
* improve
* fix more
* git commit and git push
* fix more
* fix more
* fix more
* New try
* Fix more whisper stuff
* Improve
* correct more
* correct more
* correct more
* Fix some tests
* Add more tests
* correct more
* correct more
* correct more
* push
* correct more
* Fix more
* Better
* without dec mask
* correct more
* clean
* save intermediate
* Fix more
* Fix VAD for large-v2
* Save new
* Correct more
* make cleaner
* correct tests
* correct src
* Finish
* Fix more
* Fix more
* finish
* Fix edge cases
* fix return_dict_in_generate
* fix all tests
* make style
* add docstrings
* add docstrings
* Fix logit processor
* make style
* fix pipeline test
* fix more style
* Apply suggestions from code review
* apply feedback Sanchit
* correct more
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* correct more
* correct more
* correct more
* Fix staticmethod
* correct more
* fix
* fix slow tests
* make style
* fix tokenizer test
* fix tokenizer test
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* finish
* finish
* revert kwargs change
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* first commit
* correct default value non causal
* update config and modeling code
* update converting checkpoint
* clean modeling and fix tests
* make style
* add new config parameters to docstring
* fix copied from statements
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* make position_embeddings_type docstrings clearer
* clean converting script
* remove function not used
* clean modeling file
* apply suggestion for test file + add convert script to not_doctested
* modify tests according to review - cleaner logic and more tests
* Apply nit suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add checker of valid position embeddings type
* instantiate new layer norm layer with the right eps
* fix freeze_feature_encoder since it can be None in some cases
* add test same output in convert script
* restore wav2vec2conformer and add new model
* create processor and FE + clean
* add new model code
* fix convert script and set default config parameters
* correct model id paths
* make style
* make fix-copies and cleaning files
* fix copied from statements
* complete .md and fix copies
* clean convert script argument defaults
* fix config parameters docstrings
* fix config docstring
* add copied from and enrich FE tests
* fix copied from and repo-consistency
* add autotokenizer
* make test input length shorter and change docstring code
* fix docstrings and copied from
* add add_adapter to ASR training example
* make testing of adapters more robust
* adapt to multi adapter layers
* refactor input_values->input_features and remove w2v2-bert feature extractor
* remove pretraining model
* remove deprecated features and useless lines
* add copied from and ignore statements to modeling tests
* remove pretraining model #2
* change import in convert script
* change default in convert script
* update readme and remove useless line
* Update tests/models/wav2vec2_bert/test_processor_wav2vec2_bert.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* refactor BERT to Bert for consistency
* remove useless ignore copy statement
* add persistent to buffer in rotary
* add eps in LayerNorm init and remove copied from
* add adapter activation parameters and add copied from statements
* Fix copied statements and add unittest.skip reasons
* add copied statement in test_processor
* refactor processor
* make style
* replace numpy random by torch rand
* remove expected output CTC
* improve converting script with processor class
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* remove gumbel class
* remove tests related to previously deleted class
* Update src/transformers/models/wav2vec2_bert/configuration_wav2vec2_bert.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* correct typos
* remove unused parameters
* update processor to take both text and audio
* update checkpoints
* update expected output and add ctc expected output
* add label_attention_mask
* replace pt with np in processor tests
* fix typo
* revert to behaviour with labels_attention_mask
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>