* tmp commit
* move tests to the right class
* remove ALL all_generative_model_classes = ...
* skip tf roberta
* skip InstructBlipForConditionalGenerationDecoderOnlyTest
* videollava
* reduce diff
* reduce diff
* remove on vlms
* fix a few more
* manual rebase bits
* more manual rebase
* remove all manual generative model class test entries
* fix up to ernie
* a few more removals
* handle remaining cases
* recurrent gemma
* it's better here
* make fixup
* tf idefics is broken
* tf bert + generate is broken
* don't touch tf :()
* don't touch tf :(
* make fixup
* better comments for test skips
* revert tf changes
* remove empty line removal
* one more
* missing one
* use torch.testing.assertclose instead to get more details about error in cis
* fix
* style
* test_all
* revert for I bert
* fixes and updates
* more image processing fixes
* more image processors
* fix mamba and co
* style
* less strick
* ok I won't be strict
* skip and be done
* up
* kinda works
* update
* add tests
* update
* use special tokens in processors
* typo
* fix copies
* fix
* fix moshi after rebase
* update
* fix tests
* update
* Update docs/source/en/main_classes/tokenizer.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* update docs
* test for load time adding tokens
* fix some more tests which are now fetched better
* one more fix
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* tmp commit
* tmp commit
* cull overwrites of deleted tests
* typo
* more specific docstring
* make fixup
* parameterize at the top?
* correction
* more deletions :D
* tmp commit
* for VLMs too
* fix _check_outputs
* test nit
* make fixup
* fix another flaky
* test_generate_from_inputs_embeds -- handle missing attention mask
* more precise name
* better docstrings
* Update src/transformers/cache_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add YaRN and Dynamic-YaRN RoPE Scaling Methods
YaRN (Yet another RoPE extension method) combines the NTK-By-Parts
Interpolation and Attention Scaling methods, improving upon existing
RoPE interpolation methods for longer context window sizes.
Fine-tuned models maintain their original performance across benchmarks
while enabling efficient extrapolation and transfer learning for
quicker convergence, especially in compute-limited environments.
We implement YaRN and Dynamic-YaRN for the following list of models:
- LLaMA
- Falcon
- GPT-NeoX
- Olmo
- Persimmon
- Phi
- StableLM
- OpenLLaMA
New unit tests are added to assert YaRN's correct behavior on both
short and long sequence inputs.
For more details, please refer to https://arxiv.org/abs/2309.00071.
Co-authored-by: Miguel Almeida <miguel.pessanha.almeida@tecnico.ulisboa.pt>
* Refactor YaRN implementation for LLaMA
Iterate on YaRN implementation for LLaMA and remove diff from remaining
models for increased PR modularity.
This commit includes the following changes:
- Merge 'yarn_rope_scaling' and 'rope_scaling' dictionaries
- Remove unnecessary attributes ('extrapolation_factor' and 'finetuned')
from YaRN classes
- Inherit 'forward' method in YaRN classes from superclass
- Rename 'yarn' method to 'compute_yarn_scaling'
- Extend YaRN tests with further assertions
- Fix style inconsistencies
Co-authored-by: Miguel Monte e Freitas <miguelmontefreitas@tecnico.ulisboa.pt>
* Refactor Tensor Building Logic for YaRN
- Comply with the the tensor building logic introduced in #30743
- Add referencing to the optimized Attention Factor equation
- Remove Dynamic YaRN for a more agile deployment
Co-authored-by: mig-mfreitas <mig-mfreitas@users.noreply.github.com>
* remove unwanted file
---------
Co-authored-by: Miguel Almeida <miguel.pessanha.almeida@tecnico.ulisboa.pt>
Co-authored-by: mig-mfreitas <mig-mfreitas@users.noreply.github.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
* Pass datasets trust_remote_code
* Pass trust_remote_code in more tests
* Add trust_remote_dataset_code arg to some tests
* Revert "Temporarily pin datasets upper version to fix CI"
This reverts commit b7672826ca.
* Pass trust_remote_code in librispeech_asr_dummy docstrings
* Revert "Pin datasets<2.20.0 for examples"
This reverts commit 833fc17a3e.
* Pass trust_remote_code to all examples
* Revert "Add trust_remote_dataset_code arg to some tests" to research_projects
* Pass trust_remote_code to tests
* Pass trust_remote_code to docstrings
* Fix flax examples tests requirements
* Pass trust_remote_dataset_code arg to tests
* Replace trust_remote_dataset_code with trust_remote_code in one example
* Fix duplicate trust_remote_code
* Replace args.trust_remote_dataset_code with args.trust_remote_code
* Replace trust_remote_dataset_code with trust_remote_code in parser
* Replace trust_remote_dataset_code with trust_remote_code in dataclasses
* Replace trust_remote_dataset_code with trust_remote_code arg
* add prefix space ignored in llama #29625
* adding test with add_prefix_space=False
* ruff
---------
Co-authored-by: Ita Zaporozhets <itazaporozhets@Itas-MBP.localdomain>
* Add MistralForTokenClassification
* Add tests and docs
* Add token classification for Mixtral and Qwen2
* Save llma for token classification draft
* Add token classification support for Llama, Gemma, Persimmon, StableLm and StarCoder2
* Formatting
* Add token classification support for Qwen2Moe model
* Add dropout layer to each ForTokenClassification model
* Add copied from in tests
* Update src/transformers/models/llama/modeling_llama.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Propagate suggested changes
* Style
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* push legacy to fast as well
* super strange
* Update src/transformers/convert_slow_tokenizer.py
* make sure we are BC
* fix Llama test
* nit
* revert
* more test
* style
* update
* small update w.r.t tokenizers
* nit
* don't split
* lol
* add a test for `add_prefix_space=False`
* fix gemma tokenizer as well
* update
* fix gemma
* nicer failures
* fixup
* update
* fix the example for legacy = False
* use `huggyllama/llama-7b` for the PR doctest
* nit
* use from_slow
* fix llama
* attempt to fix
* the actual fix that works with compilation!
* this?
* temporary update
* nit?
* dispatcg to memory efficient?
* update both models that have static cache support
* fix copies fix compile
* make sure fix
* fix cohere and gemma
* fix beams?
* nit
* slipped through the cracks
* nit
* nits
* update
* fix-copies
* skip failing tests
* nits
* use user_defined_symbols
* fixup
* nit
* add a very robust test
* make sure all models are tested with the `pretrained_tokenizer_to_test`
* should we make sure we test all of them?
* merge
* remove the id
* fix test
* update
* ousies
* oups
* fixup
* fix copies check
* remove `pretrained_tokenizer_to_test`