* added sequences_scores to the output
* added beam_indices to output
* added test to check for beam_indices, sequences_scores and their shape
* removed redundant whitespaces
* make fixup
* idefics2 enable_input_require_grads not aligned with disable_input_require_grads
make peft+idefics2 checkpoints disable fail
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
* split test case
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
* fix ci failure
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
* refine test
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
---------
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
* refactor weight_norm + propose uniformed solution to reconcile meta load_state_dict with classic loading
* make style
* fix sew
* fix sew and sew_d tests
* Fix failing tensor placement in Whisper
* fix long form generation tests
* more return_timestamps=True
* make fixup
* [run_slow] whisper
* [run_slow] whisper
* Uniformize kwargs for LlaVa and update docs
* Change order of processor inputs in docstring
* Improve BC support for reversed images and text inputs
* cleanup llava processor call docstring
* Add encoded inputs as valid text inputs in reverse input check, add deprecation version in warning
* Put function check reversed images text outside base processor class
* Refactor _validate_images_text_input_order
* Add ProcessingUtilTester
* fix processing and test_processing
* initial commit
* gloups
* updates
* work
* weights match
* nits
* nits
* updates to support the tokenizer :)
* updates
* Pixtral processor (#33454)
* rough outline
* Add in image break and end tokens
* Fix
* Udo some formatting changes
* Set patch_size default
* Fix
* Fix token expansion
* nit in conversion script
* Fix image token list creation
* done
* add expected results
* Process list of list of images (#33465)
* updates
* working image and processor
* this is the expected format
* some fixes
* push current updated
* working mult images!
* add a small integration test
* Uodate configuration docstring
* Formatting
* Config docstring fix
* simplify model test
* fixup modeling and etests
* Return BatchMixFeature in image processor
* fix some copies
* update
* nits
* Update model docstring
* Apply suggestions from code review
* Fix up
* updates
* revert modeling changes
* update
* update
* fix load safe
* addd liscence
* update
* use pixel_values as required by the model
* skip some tests and refactor
* Add pixtral image processing tests (#33476)
* Image processing tests
* Add processing tests
* woops
* defaults reflect pixtral image processor
* fixup post merge
* images -> pixel values
* oups sorry Mr docbuilder
* isort
* fix
* fix processor tests
* small fixes
* nit
* update
* last nits
* oups this was really breaking!
* nits
* is composition needs to be true
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix long seq bug
* fixed format
* fixed fn copy inconsistency
* fix long seq bug
* fixed format
* fixed fn copy inconsistency
* Addressed comments
* added a unit test
* fixed cache position
* Added a warning msg to the forward fn
* fixed test case
* test(tokenizers): add a test showing conflict with sentencepiece
This is due to the fact that protobuf C implementation uses a global
pool for all added descriptors, so if two different files add
descriptors, they will end up conflicting.
* fix(tokenizers): mitigate sentencepiece/protobuf conflict
When sentencepiece is available, use that protobuf instead of the
internal one.
* chore(style): fix with ruff
* Update tokenization_whisper.py
Fix issue with flax whisper model
* Update tokenization_whisper_fast.py
Fix issue with flax whisper model
* Update tokenization_whisper.py
just check len of token_ids
* Update tokenization_whisper_fast.py
just use len of token_ids
* Update tokenization_whisper_fast.py and revert changes in _strip_prompt and add support to jax arrays in _convert_to_list
* Update tokenization_whisper.py and revert changes in _strip_prompt and add support to jax arrays in _convert_to_list
* Update test_tokenization_whisper.py to add test for _convert_to_list method
* Update test_tokenization_whisper.py to fix code style issues
* Fix code style
* Fix code check again
* Update test_tokenization)whisper.py to Improve code style
* Update test_tokenization_whisper.py to run each of jax, tf and flax modules if available
* Update tests/models/whisper/test_tokenization_whisper.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update test_tokenization_whisper.py and use require_xxx decorators instead of `is_xxx_available()` method
* Revert the changes automatically applied by formatter and was unrelated to PR
* Format for minimal changes
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add tests for linear shape behavior
* fix linear shape behavior
ended up adding the reshape at the end, after f8f8bf16_rowwise, because adding
it directly after quantize_fp8_per_row caused f8f8bf16_rowwise to drop the
seq_len dimension. (i.e., (17, 23, 1014) -> (17, 1024))
* save shape up front + comment
* Make StaticCache configurable at model construct time
* integrations import structure
* add new doc file to toc
---------
Co-authored-by: Guang Yang <guangyang@fb.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
* Update docs for GGUF supported models
* Add tensor mappings and define class GGUFPhi3Converter
* Fix tokenizer
* Working version
* Attempt to fix some CI failures
* Run ruff format
* Add vocab, merges, decoder methods like LlamaConverter
* Resolve conflicts since Qwen2Moe was added to gguf
- I missed one place when resolving conflict
- I also made a mistake with tests_ggml.py and now has been fixed to reflect
its master version.
* Import structure & first three model refactors
* Register -> Export. Export all in __all__. Sensible defaults according to filename.
* Apply most comments from Amy and some comments from Lucain
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Lucain Pouget <lucainp@gmail.com>
* Style
* Add comment
* Clearer .py management
* Raise if not in backend mapping
* More specific type
* More efficient listdir
* Misc fixes
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Lucain Pouget <lucainp@gmail.com>
* add self.head_dim for VisionAttention in Qwen2-VL
* add self.head_dim for VisionAttention in Qwen2-VL
* fix ci
* black the test_modeling_qwen2_vl.py
* use ruff to format test_modeling_qwen2_vl.py
* [run-slow] qwen2_vl
* use tying for python3.8
* fix the import format
* use ruff to fix the ci error I001
* [run-slow] qwen2_vl
* remove unused import
* commit for rebase
* use ruff fix ci
* [run-slow] qwen2_vl
---------
Co-authored-by: root <liji>
* Add validation for maximum sequence length in modeling_whisper.py
Added a validation check to ensure that the sequence length of labels does not exceed the maximum allowed length of 448 tokens. If the sequence length exceeds this limit, a ValueError is raised with a descriptive error message.
This change prevents the model from encountering errors or unexpected behavior due to excessively long sequences during training or fine-tuning, ensuring consistent input dimensions and improving overall robustness.
* Change exception message in src/transformers/models/whisper/modeling_whisper.py
The exception message is for whisper's label's sequence max length.
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* Change 448 to config.max_target_positions in src/transformers/models/whisper/modeling_whisper.py
It's for whisper's config.max_target_positions.
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* Change method's documentation in src/transformers/models/whisper/modeling_whisper.py
* Add test for maximum label's sequence length in test_modeling_whisper.py
* Add self to modeling_whisper.py
* Update test_modeling_whisper.py with respect to automatic validations
* Update modeling_whisper.py with respect to ci/circleci: check_code_quality
* Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality
* Update test_modeling_whisper.py with respect to ci/circleci: tests_generate
* Update test_modeling_whisper.py with respect to ci/circleci: tests_generate
* Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality
* Separate test_labels_sequence_max_length tests in test_modeling_whisper.py
* Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality
* Remove assert from test_modeling_whisper.py
* Add max_target_positions to WhisperModelTester in test_modeling_whisper.py
* Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality
* Update test_modeling_whisper.py with respect to ci/circleci: tests_generate
* Update test_modeling_whisper.py
* Change test_labels_sequence_max_length_error_after_changing_config in test_modeling_whisper.py
* Change self.config.max_target_positions to self.max_target_positions modeling_whisper.py
* Add new tests in test_modeling_whisper.py
* Update test_modeling_whisper.py
---------
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* Load remote code only once
* Use hash as load indicator
* Add a new option `force_reload` for old behavior (i.e. always reload)
* Add test for dynamic module is cached
* Add more type annotations to improve code readability
* Address comments from code review
* Add validate images and test processing utils
* Remove encoded text from possible inputs in tests
* Removed encoded inputs as valid in processing_utils
* change text input check to be recursive
* change text check to all element of lists and not just the first one in recursive checks
* [InstructBLIP] qformer_tokenizer is required input
* Bit safer
* Add to instructblipvideo processor
* Fix up
* Use video inputs
* Update tests/models/instructblipvideo/test_processor_instructblipvideo.py
* Fixing a bug in the way "attention_factor" is validated in ROPE utilities.
* Fixing a bug in the way "attention_factor" is validated in ROPE utilities.
* Fixing a bug in the way "attention_factor" is validated in ROPE utilities.
* use gguf internal dequantize
* add Q5_0 test
* add iq1 test
* add remained test
* remove duplicated test
* update docs
* add gguf version limit
* make style
* update gguf import catch
* revert vocab_size patch
* make style
* use GGUF_MIN_VERSION everywhere
* remove to restiction for 4-bit model
* Update src/transformers/modeling_utils.py
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
* bitsandbytes: prevent dtype casting while allowing device movement with .to or .cuda
* quality fix
* Improve warning message for .to() and .cuda() on bnb quantized models
---------
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
* don't run custom when not needed?
* update test fetcher filtering
* fixup and updates
* update
* update
* reduce burden
* nit
* nit
* mising comma
* this?
* this?
* more parallelism
* more
* nit for real parallelism on tf and torch examples
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update to make it more custom
* update to make it more custom
* update to make it more custom
* update to make it more custom
* update
* update
* update
* update
* update
* update
* use correct path
* fix path to test files and examples
* filter-tests
* filter?
* filter?
* filter?
* nits
* fix naming of the artifacts to be pushed
* list vs files
* list vs files
* fixup
* fix list of all tests
* fix the install steps
* fix the install steps
* fix the config
* fix the config
* only split if needed
* only split if needed
* extend should fix it
* extend should fix it
* arg
* arg
* update
* update
* run tests
* run tests
* run tests
* more nits
* update
* update
* update
* update
* update
* update
* update
* simpler way to show the test, reduces the complexity of the generated config
* simpler way to show the test, reduces the complexity of the generated config
* style
* oups
* oups
* fix import errors
* skip some tests for now
* update doctestjob
* more parallelism
* fixup
* test only the test in examples
* test only the test in examples
* nits
* from Arthur
* fix generated congi
* update
* update
* show tests
* oups
* oups
* fix torch job for now
* use single upload setp
* oups
* fu**k
* fix
* nit
* update
* nit
* fix
* fixes
* [test-all]
* add generate marker and generate job
* oups
* torch job runs not generate tests
* let repo utils test all utils
* UPdate
* styling
* fix repo utils test
* more parallel please
* don't test
* update
* bit more verbose sir
* more
* hub were skipped
* split by classname
* revert
* maybe?
* Amazing catch
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* fix
* update
* update
* maybe non capturing
* manual convert?
* pass artifacts as parameters as otherwise the config is too long
* artifact.json
* store output
* might not be safe?
* my token
* mmm?
* use CI job IS
* can't get a proper id?
* ups
* build num
* update
* echo url
* this?
* this!
* fix
* wget
* ish
* dang
* udpdate
* there we go
* update
* update
* pass all
* not .txt
* update
* fetcg
* fix naming
* fix
* up
* update
* update
* ??
* update
* more updates
* update
* more
* skip
* oups
* pr documentation tests are currently created differently
* update
* hmmmm
* oups
* curl -L
* update
* ????
* nit
* mmmm
* ish
* ouf
* update
* ish
* update
* update
* updatea
* nit
* nit
* up
* oups
* documentation_test fix
* test hub tests everything, just marker
* update
* fix
* test_hub is the only annoying one now
* tf threads?
* oups
* not sure what is happening?
* fix?
* just use folder for stating hub
* I am getting fucking annoyed
* fix the test?
* update
* uupdate
* ?
* fixes
* add comment!
* nit
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* Adding SDPA support for RoBERTa-based models
* add not is_cross_attention
* fix copies
* fix test
* add minimal test for camembert and xlm_roberta as their test class does not inherit from ModelTesterMixin
* address some review comments
* use copied from
* style
* consistency
* fix lists
---------
Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* init fix
* fix mask during cached forward, move mask related stuff to own function
* adjust tests as left padding does not change logits as much anymore + batch gen (with todo on logits comp)
* revert overwriting new integration tests
* move some comments to docstring
* add Blip2ForImageTextRetrieval
* use one line and remove unnecessary space in tests
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* use value from the config, rather than hardcoded
* change order of params in Blip2QFormerModel.forward
* update docstring
* fix style
* update test_inference_opt
* move embeddings out of Blip2QFormerModel
* remove from_vision_qformer_configs
* remove autocast float16 in Blip2QFormerModel
* rename fiels into vision_projection,text_projection,use_image_text_matching_head
* use CLIPOutput for Blip2ImageTextMatchingModelOutput
* remove past_key_values_length from Blip2TextEmbeddings
* fix small typo in the CLIPOutput docstring
* add Blip2ForImageTextRetrieval to Zero Shot Image Classification mapping
* update docstring and add require_torch_fp16
* rollback test_inference_opt
* use use_image_text_matching_head=True in convert
* skip test_model_get_set_embeddings
* fix create_rename_keys error on new itm fields
* revert to do scale after dot product between "query" and "key"
* fix ValueError on convert script for blip2-opt-2.7b
* update org of paths to Salesforce
* add is_pipeline_test_to_skip for VisualQuestionAnsweringPipelineTests
* [run_slow] blip_2
* removed Blip2ForImageTextRetrieval from IGNORE_NON_AUTO_CONFIGURED
* fix docstring of Blip2ImageTextMatchingModelOutput
* [run_slow] blip_2
* fix multi-gpu tests
* [run_slow] blip_2
* [run_slow] blip_2
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Add a fix for the case when tokenizers are passed as a string
* Support image processors and feature extractors as well
* Reverting load_feature_extractor and load_image_processor
* Add test
* Test is torch-only
* Add tests for preprocessors and feature extractors and move test
* Extremely experimental fix
* Revert that change, wrong branch!
* Typo!
* Split tests
* fix param not being passed in tested; add exceptions
* better source of model name
* Update utils/create_dummy_models.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix: multilingual midel convert to tflite get wrong token
* fix: modify test_force_tokens_logits_processor the checking value as scores.dtype.min
---------
Co-authored-by: kent.sc.hung <kent.sc.hung@benq.com>
Co-authored-by: Aya <[kent831217@gmail.com]>
* Add new Jinja features:
- Do extension
- Break/continue in loops
- Call strftime to get current datetime in any format
* Add new Jinja features:
- Do extension
- Break/continue in loops
- Call strftime to get current datetime in any format
* Fix strftime template
* Add template strip() just to be safe
* Remove the do extension to make porting easier, and also because it's the least useful
* Rename test
* strftime -> strftime_now
* Split test
* Update test to use strftime_now
* Refactor everything out into chat_template_utils
* Refactor everything out into chat_template_utils
* Refactor everything out into chat_template_utils
* Refactor everything out into chat_template_utils
* Refactor everything out into chat_template_utils
* Add .float() in all generation methods logit outputs
* Switch float-casting of logits to training only for main models
* Add `num_logits_to_keep` in Llama and add it by default in generate
* Apply style
* Add num_logits_to_keep as arg in prepare_input_for_generation
* Add support for Mistral
* Revert models except llama and mistral
* Fix default None value in _supports_num_logits_to_keep()
* Fix dimension of dummy input
* Add exception for prophetnet in _supports_num_logits_to_keep()
* Update _supports_num_logits_to_keep() to use inspect.signature()
* Add deprecation cycle + remove modification with pretraining_tp
* Apply style
* Add most used models
* Apply style
* Make `num_logits_to_keep` an int in all cases to remove if-else clause
* Add compile check for the warning
* Fix torch versions
* style
* Add gemma2
* Update warning version
* Add comment about .float operations in generation utils
* Add tests in GenerationTesterMixin and ModelTesterMixin
* Fix batch size for assisted decoding in tests
* fix small issues in test
* refacor test
* fix slicing removing dim issue
* Add nemotron support (should fix check-copy issue in CIs)
* Trigger new CIs
* Trigger new CIs
* Bump version
* Bump version in TODO
* Trigger CIs
* remove blank space
* Trigger CIs
* more precise name
* better docstrings
* Update src/transformers/cache_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add TorchAOHfQuantizer
Summary:
Enable loading torchao quantized model in huggingface.
Test Plan:
local test
Reviewers:
Subscribers:
Tasks:
Tags:
* Fix a few issues
* style
* Added tests and addressed some comments about dtype conversion
* fix torch_dtype warning message
* fix tests
* style
* TorchAOConfig -> TorchAoConfig
* enable offload + fix memory with multi-gpu
* update torchao version requirement to 0.4.0
* better comments
* add torch.compile to torchao README, add perf number link
---------
Co-authored-by: Marc Sun <marc@huggingface.co>
* Add padding="max_length" to tokenizer kwargs and change crop_size to size for image_processor kwargs
* remove crop_size argument in align processor tests to be coherent with base tests
* Add pad_token when loading tokenizer if needed, change test override tokenizer kwargs, remove unnecessary test overwrites in grounding dino
* fix typo
* uniform kwargs
* make style
* add comments
* remove return_tensors
* remove common_kwargs from processor since it propagates
* make style
* return_token_type_ids to True
* revert the default imagekwargs since does not accept any value in the image processro
* revert processing_utils.py
* make style
* add molbap's commit
* fix typo
* fix common processor
* remain
* Revert "add molbap's commit"
This reverts commit a476c6ee88.
* add unsync PR
* revert
* make CI happy
* nit
* import annotationformat
* add new model like
* draft cuda forward - mismatched keys (sharding on conv1)
* match keys successfully
* fix split
* get generation/forward running (wrong gens, norm?)
* :update
* some refactoring
* fixes
* works up until copy to cache
* fix
* update
* NON WORKING VERSION
* version that work?
* nit
* fix config
* fix conversion script
* working cuda forward
* nit
* update
* simplifcation
* make mamba slow simple work
* no einops
* todo
* fix style
* no einops
* update fix no einsum
* nit
* remove einops
* bug: scan_output differs strongly
* add rms norm option
* fix fast + slow generation with and w/o cache ✔️
* draft integration tests
* remove a big chunk of the einsum
* fix slow, fast generations, without any einsum
* fix copies
* fix structure
* fix up modeling and tests
* fix tests
* clamping is indeed worse
* recover mamba2 cache test
* fix copies
* no cache position (yet)
* fix tf tests
* fix matmul for generate
* fixup
* skip cache tests for now
* [run-slow]mamba2
* tune out hidden states for padding
* test batched generation
* propagate attention mask changes
* fix past length
* fix integration test
* style
* address comments
* update readme
* add mamba2 version check
* fix tests
* [run-slow]mamba2
* skip edge tests
* [run-slow]mamba2
* last fixup
* [run-slow]mamba2
* update README
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
* save total_vocab_size = vocab_size + user added tokens to speed up operation
* updating length when added_tokens_decoder is set
* add test len(tokenizer)
* Test this zach
* Test for improper init w/o zero3
* Move back
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Get rid of stars in warning
* Make private
* Make clear
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Initial implementation of OffloadedCache
* enable usage via cache_implementation
* Address feedback, add tests, remove legacy methods.
* Remove flash-attn, discover synchronization bugs, fix bugs
* Prevent usage in CPU only mode
* Add a section about offloaded KV cache to the docs
* Fix typos in docs
* Clarifications and better explanation of streams
* Fix conflicting key in init kwargs in PreTrainedTokenizerBase
* Update code to check for callable key in save_pretrained
* Apply PR suggestions
* Invoke CI
* Updates based on PR suggestion
* Remove user-defined tokens which can be obtained through merges
* Remove debug line
* formatting
* Refactor spm slow -> fast converter
* revert unnecessary refactor
* set comprehension
* remove test files
* Use `vocab_scores`
* Always replace spiece underline with space in decode
* we no longer need token filtering
* Add save fast load slow unit test
* Remove tokenizers version check
* Remove duplicate code
* Make `<start_of_turn>` and `<end_of_turn>` special tokens
* Bias merge priority with length if score is the same
* Add unit test for merge priority
* CI
* mvp
* added test (a few models need fixes)
* fix a few test cases
* test nits
* harder test 😈
* revert changes in stablelm
* test with improved condition
* add todo
* tmp commit
* merged with main
* nits
* add todo
* final corrections
* add docs for generation compilation
* docs nits
* add tip
* PR suggestions
* add more details to the compilation docs
* fix cache positions
* cache is now init in generate; update docs
* tag test as flaky
* docs
* post rebase make fixup and other nits
* remove unintended changes
* whisper (encoder-decoder) not supported
* move token default updates to ; add tests for token defaults
* push changes
* manual rebase
* chameleon doesn't support this
* fix test_static_cache_mha_mqa_gqa (broken in another PR)
* docs: dynamic is better with end-to-end compilation
* No more default chat templates
* Add the template to the GPT-SW3 tests since it's not available by default now
* Fix GPT2 test
* Fix Bloom test
* Fix Bloom test
* Remove default templates again
* Updated ruff version and fixed the required code accorindg to the latest version.
* Updated ruff version and fixed the required code accorindg to the latest version.
* Added noqa directive to ignore 1 error shown by ruff
* add DataCollatorBatchFlattening
* Update data_collator.py
* change name
* new FA2 flow if position_ids is provided
* add comments
* minor fix
* minor fix data collator
* add test cases for models
* add test case for data collator
* remove extra code
* formating for ruff check and check_repo.py
* ruff format
ruff format tests src utils
* custom_init_isort.py
* gguf conversion forces add_prefix_space=False for llama3, this is not required and forces from_slow, which fails. changing to None + test
* typo
* clean test
* Change resize_token_embeddings to make it return same Class that is passed to it
* Add explanatory comment as requested in review
* Add explanatory comments for add resizing function in lxmert
* Add comment for padding_idx and moving _resize_bias in lxmert to LxmertForPreTraining
---------
Co-authored-by: Prashanth Sateesh <prasatee@Prashanths-MBP.attlocal.net>
Co-authored-by: Prashanth Sateesh <prasatee@Prashanths-MacBook-Pro.local>
* Add YaRN and Dynamic-YaRN RoPE Scaling Methods
YaRN (Yet another RoPE extension method) combines the NTK-By-Parts
Interpolation and Attention Scaling methods, improving upon existing
RoPE interpolation methods for longer context window sizes.
Fine-tuned models maintain their original performance across benchmarks
while enabling efficient extrapolation and transfer learning for
quicker convergence, especially in compute-limited environments.
We implement YaRN and Dynamic-YaRN for the following list of models:
- LLaMA
- Falcon
- GPT-NeoX
- Olmo
- Persimmon
- Phi
- StableLM
- OpenLLaMA
New unit tests are added to assert YaRN's correct behavior on both
short and long sequence inputs.
For more details, please refer to https://arxiv.org/abs/2309.00071.
Co-authored-by: Miguel Almeida <miguel.pessanha.almeida@tecnico.ulisboa.pt>
* Refactor YaRN implementation for LLaMA
Iterate on YaRN implementation for LLaMA and remove diff from remaining
models for increased PR modularity.
This commit includes the following changes:
- Merge 'yarn_rope_scaling' and 'rope_scaling' dictionaries
- Remove unnecessary attributes ('extrapolation_factor' and 'finetuned')
from YaRN classes
- Inherit 'forward' method in YaRN classes from superclass
- Rename 'yarn' method to 'compute_yarn_scaling'
- Extend YaRN tests with further assertions
- Fix style inconsistencies
Co-authored-by: Miguel Monte e Freitas <miguelmontefreitas@tecnico.ulisboa.pt>
* Refactor Tensor Building Logic for YaRN
- Comply with the the tensor building logic introduced in #30743
- Add referencing to the optimized Attention Factor equation
- Remove Dynamic YaRN for a more agile deployment
Co-authored-by: mig-mfreitas <mig-mfreitas@users.noreply.github.com>
* remove unwanted file
---------
Co-authored-by: Miguel Almeida <miguel.pessanha.almeida@tecnico.ulisboa.pt>
Co-authored-by: mig-mfreitas <mig-mfreitas@users.noreply.github.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
* fix mask creation of gpt2 and gpt_neox caused by me
* forgot the reshape of masks when shape > 2
* add tests for gpt neox and gpt2
* nit on a comment
* add language to words
_collate_word_timestamps uses the return_language flag to determine whether the language of the chunk should be added to the word's information
* ran style checks
added missing comma
* add new language test
test that the pipeline can return both the language and timestamp
* remove model configuration in test
Removed model configurations that do not influence test results
* remove model configuration in test
Removed model configurations that do not influence test results
* 1,100%!
* Clean
* Don't touch DS
* Experiment with dtype allocation
* skip test_load_save_without_tied_weights test
* A little faster
* Include proper upscaling?
* Fixup tests
* Potentially skip?
* Let's see if this fixes git history
* Maintain new dtype
* Fin
* Rm hook idea for now
* New approach, see what breaks
* stage
* Clean
* Stash
* Should be fin now, just need to mark failing models
* Clean up
* Simplify
* Deal with weird models
* Enc/Dec
* Skip w/ reason
* Adjust test
* Fix test
* one more test
* Keep experimenting
* Fix ref
* TO REMOVE: testing feedback CI
* Right push
* Update tests/utils/test_modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* disable
* Add new func
* Test nits from Amy
* Update src/transformers/modeling_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Adjust comment
* Adjust comment on skip
* make private
* Fin
* Should be a not flag
* Clarify and rename test
---------
Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Fix the incorrect permutation of gguf
* rename num_kv_heads
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* add typing to num_kv_heads
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* rename variables
* refactor permute function name
* update the expected text of the llama3 q4 test
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* tmp commit
* shorter
* nit
* explicit kwargs
* propagate changes
* mass propagation with a few manual touches (let's see how CI behaves)
* fix cacheless case
* Update src/transformers/generation/utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* make fixup
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add warning message for and parameters
* Fix when the warning is raised
* Formatting changes
* Improve testing and remove duplicated warning from _fix_key
* fix galore lr display with lr schedulers
* style
* add some tests to check for displayed lrs
* copy-paste err for warmup steps
* standardize the default lr to be only in the optimizer
* trying out my luck with the reads
* cast image features to model.dtype where needed to support FP16 or other precision in pipelines
* Update src/transformers/pipelines/image_feature_extraction.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Use .to instead
* Add FP16 pipeline support for zeroshot audio classification
* Remove unused torch imports
* Add docs on FP16 pipeline
* Remove unused import
* Add FP16 tests to pipeline mixin
* Add fp16 placeholder for mask_generation pipeline test
* Add FP16 tests for all pipelines
* Fix formatting
* Remove torch_dtype arg from is_pipeline_test_to_skip*
* Fix format
* trigger ci
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Add siglip loss function
* Update docs
* Enable training tests
[experimental] enable GC training tests as it has worked for my own data
* Remove test_training* overrides to enable training tests
[run_slow] siglip
* Skip training tests for Siglip text model and ImageClassificationModel
[run_slow] siglip
* Skip GC training tests for SiglipForImageClassification
* Explicitly skip training tests for SiglipVisionModel
Add skip reason for training tests for SiglipTextModel
* Remove copied from to fix CI
* Fix init for rt-detr heads
* Fixup
* Add separate prior_prob value to config for initialization
* Add bbox init
* Change to 1 / num_labels init
* Adjust weights init test
* Fix style for test
* add gather_use_object arguments
* fix name and pass the CI test for Seq2SeqTrainer
* make style
* make it to functools
* fix typo
* add accelerate version:
* adding warning
* Update src/transformers/trainer.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* make style
* Update src/transformers/training_args.py
* check function move to initial part
* add test for eval_use_gather_object
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* squash into single commit
* run diff once more
* docstring
* tests
* minor chnages and ready to go
* Update src/transformers/models/llava_next_video/processing_llava_next_video.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/vipllava/test_modeling_vipllava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* [run-slow] llava-next-video
* [run-slow] llava-next-video
* [run-slow] llava_next_video
* fix two tests
* fix slow tests
* remove logit checks due to numeric errors
* run test once more
* [run-slow] llava_next_video
* final try to pass the test
* [run-slow] llava_next_video
* [run-slow] llava_next_video
* [run-slow] llava_next_video
* style
* fix
* style
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* fix llama fsdp
* fixup
* adding FSDP tests for CPU offloading
* fixes
* fix tests
* fix tests
* add it for mixtral
* propagate the changes on other models
* Update src/transformers/models/phi/modeling_phi.py
* Delete utils/testing_scripts/fsdp_cpu_offloading.py
Remove script - FSDP + CPU offloading it tested in the test suite
* Delete utils/testing_scripts/dummy_fsdp_config.yml
* Update + add cache_positions docstring
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* starting support for sdpa in `gptneox` models
* small comment on tests
* fix dropout
* documentation and style
* clarify concrete paths for reference
* generalise attn projections and rope application
added head mask check to sdpa mask creation
handle sdpa memory backend bug via own version flag
* update docs and style
* move dtype casting outside of general attn_projection_and_rope function
fix flash_attn_2 stuff
* more generic attn warning if output_attns or head_mask
* simplify head mask check by moving head mask creation to a later point
* remove copied llama artifact
* remove padding_mask from attention function signature
* removing unnecessary comments, only "save" attn implementation once
* [run_slow] gpt_neox
* Add initial implementation of `spectrogram_batch`
* Format the initial implementation
* Add test suite for the `spectrogram_batch`
* Update `spectrogram_batch` to ensure compatibility with test suite
* Update `spectrogram_batch` to include pre and post-processing
* Add `amplitude_to_db_batch` function and associated tests
* Add `power_to_db_batch` function and associated tests
* Reimplement the test suite for `spectrogram_batch`
* Fix errors in `spectrogram_batch`
* Add the function annotation for `spectrogram_batch`
* Address code quality
* Re-add `test_chroma_equivalence` function
* Update src/transformers/audio_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/audio_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* PR SPLIT: moving origina changes for adding user defined symbols
* adding gemma test and generalizing gemma converter
* ruff
* update common test
* update serialization test
* deberta v2 tests updates as rust version adds '.' as a user added token, so a space is not added
* removing commented lines
* applying feedback - user only added_tokens to add and check piece.type instead of trainer_spec for user_defined_symbols
* add comment referencing sentencepiece
* Fix single letter stop strings
* Change the 0 to a 1 to avoid potential empty vector headaches later
* Restructure for clarity
* Update tests/generation/test_stopping_criteria.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Add the unsqueeze
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Improve Python interpreter
* Add with and assert statements
* Prevent overwriting existing tools
* Check interpreter errors are well logged in code agent
* Add lazy evaluation for and and or
* Improve variable assignment
* Fix early return statements in functions
* Add small import fix on interpreter tool
* Pass datasets trust_remote_code
* Pass trust_remote_code in more tests
* Add trust_remote_dataset_code arg to some tests
* Revert "Temporarily pin datasets upper version to fix CI"
This reverts commit b7672826ca.
* Pass trust_remote_code in librispeech_asr_dummy docstrings
* Revert "Pin datasets<2.20.0 for examples"
This reverts commit 833fc17a3e.
* Pass trust_remote_code to all examples
* Revert "Add trust_remote_dataset_code arg to some tests" to research_projects
* Pass trust_remote_code to tests
* Pass trust_remote_code to docstrings
* Fix flax examples tests requirements
* Pass trust_remote_dataset_code arg to tests
* Replace trust_remote_dataset_code with trust_remote_code in one example
* Fix duplicate trust_remote_code
* Replace args.trust_remote_dataset_code with args.trust_remote_code
* Replace trust_remote_dataset_code with trust_remote_code in parser
* Replace trust_remote_dataset_code with trust_remote_code in dataclasses
* Replace trust_remote_dataset_code with trust_remote_code arg
* Draft fast image processors
* Draft working fast version
* py3.8 compatible cache
* Enable loading fast image processors through auto
* Tidy up; rescale behaviour based on input type
* Enable tests for fast image processors
* Smarter rescaling
* Don't default to Fast
* Safer imports
* Add necessary Pillow requirement
* Woops
* Add AutoImageProcessor test
* Fix up
* Fix test for imagegpt
* Fix test
* Review comments
* Add warning for TF and JAX input types
* Rearrange
* Return transforms
* NumpyToTensor transformation
* Rebase - include changes from upstream in ImageProcessingMixin
* Safe typing
* Fix up
* convert mean/std to tesnor to rescale
* Don't store transforms in state
* Fix up
* Update src/transformers/image_processing_utils_fast.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/auto/image_processing_auto.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/auto/image_processing_auto.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/auto/image_processing_auto.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Warn if fast image processor available
* Update src/transformers/models/vit/image_processing_vit_fast.py
* Transpose incoming numpy images to be in CHW format
* Update mapping names based on packages, auto set fast to None
* Fix up
* Fix
* Add AutoImageProcessor.from_pretrained(checkpoint, use_fast=True) test
* Update src/transformers/models/vit/image_processing_vit_fast.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Add equivalence and speed tests
* Fix up
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* First draft, still missing automatic function conversion
* First draft of the automatic schema generator
* Lots of small fixes
* the walrus has betrayed me
* please stop committing your debug breakpoints
* Lots of cleanup and edge cases, looking better now
* Comments and bugfixes for the type hint parser
* More cleanup
* Add tests, update schema generator
* Update tests, proper handling of return values
* Small docstring change
* More doc updates
* More doc updates
* Add json_schema decorator
* Clean up the TODOs and finish the docs
* self.maxDiff = None to see the whole diff for the nested list test
* add import for add_json_schema
* Quick test fix
* Fix something that was bugging me in the chat template docstring
* Less "anyOf" when unnecessary
* Support return types for the templates that need them
* Proper return type tests
* Switch to Google format docstrings
* Update chat templating docs to match new format
* Stop putting the return type in with the other parameters
* Add Tuple support
* No more decorator - we just do it implicitly!
* Add enum support to get_json_schema
* Update docstring
* Add copyright header
* Update src/transformers/tokenization_utils_base.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/chat_templating.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/utils/chat_template_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/utils/chat_template_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Add copyright header
* make fixup
* Fix indentation
* Reformat chat_template_utils
* Correct return value
* Make regexes module-level
* Support more complex, multi-line arg docstrings
* Update error message for ...
* Update ruff
* Add document type validation
* Refactor docs
* Refactor docs
* Refactor docs
* Clean up Tuple error
* Add an extra test for very complex defs and docstrings and clean everything up for it
* Document enum block
* Quick test fixes
* Stop supporting type hints in docstring to fix bugs and simplify the regex
* Update docs for the regex change
* Clean up enum regex
* Wrap functions in {"type": "function", "function": ...}
* Update src/transformers/utils/chat_template_utils.py
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
* Temporary tool calling commit
* Add type hints to chat template utils, partially update docs (incomplete!)
* Code cleanup based on @molbap's suggestion
* Add comments to explain regexes
* Fix up type parsing for unions and lists
* Add custom exception types and adjust tests to look for them
* Update docs with a demo!
* Docs cleanup
* Pass content as string
* Update tool call formatting
* Update docs with new function format
* Update docs
* Update docs with a second tool to show the model choosing correctly
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
* Rename to test_model_common_attributes
The method name is misleading - it is testing being able to get and set embeddings, not common attributes to all models
* Explicitly skip
* Update TVP model to interpolate pre-trained image pad prompter encodings
* feat: Add 2D positional embeddings interpolation in TvpVisualInputEmbedding
* added required comments
* Update TVP model to interpolate pre-trained image pad prompter encodings
* feat: Add 2D positional embeddings interpolation in TvpVisualInputEmbedding
* added required comments
* docstring and argument fix
* doc fixes and test case fix suggested in review.
* varibale typo fix
* styling and name fixes for padding interpolation flag.
* Remove ConversationalPipeline and Conversation object, as they have been deprecated for some time and are due for removal
* Update not-doctested.txt
* Fix JA and ZH docs
* Fix JA and ZH docs some more
* Fix JA and ZH docs some more
* Implement JSON dump conversion for torch_dtype in TrainingArguments
* Add unit test for converting torch_dtype in TrainingArguments to JSON
* move unit test for converting torch_dtype into TrainerIntegrationTest class
* reformating using ruff
* convert dict_torch_dtype_to_str to private method _dict_torch_dtype_to_str
---------
Co-authored-by: jun.4 <jun.4@kakaobrain.com>
* Add list check for image and question
* Handle passing two lists and update docstring
* Add tests
* Add support for dataset
* Add test for dataset as input
* fixup
* fix unprotected import
* fix unprotected import
* fix import again
* fix param type
* Initial attempt
* Updates: PR suggestions
* Interpolate the relative position bias when interpolate_pos_encoding is True
* Add slow tag for the added tests
* Add in DATA2VEC_VISION_INPUTS_DOCSTRING
* Added interpolate pos encoding feature and test to deit
* Added interpolate pos encoding feature and test for deit TF model
* readded accidentally delted test for multi_gpu
* storing only patch_size instead of entire config and removed commented code
* Update modeling_tf_deit.py to remove extra line
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix the get_size_with_aspect_ratio in max_size situation
* make fix-up
* add more general solution
* consider when max_size is not defined
* fix typo
* fix typo
* simple fix
* fix error
* fix if else error
* fix error of size overwrite
* fix yolos image processing
* fix detr image processing
* make
* add longest related test script
* Update src/transformers/models/yolos/image_processing_yolos.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add more test
* add test script about longest size
* remove deprecated
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* token healing impl + trie with extensions
* make fixup
* prefix-robust space tokenization
* examples readme and requirements
* make fixup
* allow input prompt and model
* redundant defaults
* Specialized Trie
* make fixup
* updated tests with new inherited Tree
* input ids to auto device_map
* rm unused import
* Update src/transformers/generation/utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* naming convention
* Revert "naming convention"
This reverts commit dd39d9c5b7a969e2d8a8d2a8e54f121b82dc44f0.
* naming convention
* last -hopefully- changes
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Fix has_file in offline mode
* harmonize env variable for offline mode
* Switch to HF_HUB_OFFLINE
* fix test
* revert test_offline to test TRANSFORMERS_OFFLINE
* Add new offline test
* merge conflicts
* docs
* seems like `split_special_tokens` is used here
* split special token
* add new line at end of file
* moving split special token test to common tests
* added assertions
* test
* fixup
* add co-author
* passing rest of args to gptsan_japanese, fixing tests
* removing direct comparison of fast and slow models
* adding test support for UDOP and LayoutXLM
* ruff fix
* readd check if slow tokenizer
* modify test to handle bos tokens
* removing commented function
* trigger build
* applying review feedback - updated docstrings, var names, and simplified tests
* ruff fixes
* Update tests/test_tokenization_common.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* applying feedback, comments
* shutil temp directory fix
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Ita Zaporozhets <itazaporozhets@Itas-MBP.localdomain>
Co-authored-by: itazap <itazap@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Ita Zaporozhets <itazaporozhets@Itas-MacBook-Pro.local>
* added interpolation for vitmae model in pytorch as well as tf.
* Update modeling_vit_mae.py
irreugalr import fixed
* small changes and proper formatting
* changes suggested in review.
* modified decoder interpolate_func
* arguments and docstring fix
* Apply suggestions from code review
doc fixes
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add test that currently fails
* test passed
* all perceiver passed
* fixup, style, quality, repo-consistency, all passed
* Apply suggestions from code review: default to False + compute sqrt once only
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix a minor bracket
* replace dim with self._num_channels
* add arguments to the rest preprocessors
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add prefix space ignored in llama #29625
* adding test with add_prefix_space=False
* ruff
---------
Co-authored-by: Ita Zaporozhets <itazaporozhets@Itas-MBP.localdomain>
* clean-up
* Update src/transformers/cache_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/cache_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/cache_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fixup
* Update tests/quantization/quanto_integration/test_quanto.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/generation/configuration_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* more suggestions
* mapping if torch available
* run tests & add 'support_quantized' flag
* fix jamba test
* revert, will be fixed by another PR
* codestyle
* HQQ and versatile cache classes
* final update
* typo
* make tests happy
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
If required padding for a crop larger than input image is odd-numbered,
the padding would be rounded down instead of rounded up, causing the
output dimension to be one smaller than it should be.
* Introduce configured_state
* Include note on tuning
* Allow for users to have defined a state already
* Include tests
* Add note on hpam tune
* Guard a bit better
* Update src/transformers/training_args.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/training_args.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Finish rebase
* Finish rebase
* Guard carefully
* Fixup test
* Refactor
* Fin refactor
* Comment
* Update wrt feedback
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix for custom pipeline configuration
* fix for custom pipelines
* remove extra exception
* added test for custom pipelines extra tag
* format with ruff
* limit extra tag for first time only
* format with ruff
* improve tests for custom pipelines
* Add MistralForTokenClassification
* Add tests and docs
* Add token classification for Mixtral and Qwen2
* Save llma for token classification draft
* Add token classification support for Llama, Gemma, Persimmon, StableLm and StarCoder2
* Formatting
* Add token classification support for Qwen2Moe model
* Add dropout layer to each ForTokenClassification model
* Add copied from in tests
* Update src/transformers/models/llama/modeling_llama.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Propagate suggested changes
* Style
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* add interpolation of positional encoding support to swin
* add style changes
* use default image processor and make size a dictionary
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* remove logits testing
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Refactor image size validation logic when interpolation is disabled
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* remove asserts in modeling
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add dynamic resolution input support to swinv2
* change size to ensure interpolation encoding path is triggered
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
* add dynamic resolution input to donut swin
* add dynamic resolution input to maskformer swin
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Support arbitrary processor
* fix
* nit
* update
* nit
* nit
* fix and revert
* add a small test
* better check
* fixup
* bug so let's just use class for now
* oups
* .
* Fix llama model forward function with attention=True, same-length encoded sequence.
* Fix style
* propagate fix to modeling_cohere, gemma, dbrx, and olmo (which copy the same sdpa masking logic from llama)
* Fix style
* ignore unnecessary sdpa mask converter when output_attentions=True
* add tests checking sdpa and eager outputs match when output_attentions=True
* Split if statements in two lines
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Fix formatting
* Add fix to new jetmoe model
* Add missing output_attentions argument to jetmoe mask creation
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add method
* change method name
* more comments
* Apply suggestions from code review
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fixup
* add docstrings and fix comment
* warn users on the de-quantized dtype
* Update src/transformers/quantizers/base.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/integrations/bitsandbytes.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* final suggestion - use private method
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Add support for mixing languages in a single batch
* Update docstring
* Enable different detected languages in batch
* Do not require input_features
* Test list of languages
* Fix comment
* Make init_tokens length-1 if possible, broadcast at the end
* Test for ValueError with language list of incorrect length
* Slow test for batched multilingual transcription
* fixup
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Address review, refactor
* Second attempt to move this line where it was originally
* Split test, fix a bug
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Initial commit
* Just a copy of modeling_idefics.py that will be ported to TF
* - Prepend TF to the name of all classes
- Convert pytorch ops to TF (not all operations are converted yet)
* Add TF imports
* Add autotranslated files
* Add TF classes to model_tf_auto.py
* Add the TF classes in model_doc
* include auto-translated code
* Adopted from auto-translated version
* Add a forgotten super().build
* Add test code for TF version.
* Fix indentation and load pytorch weights for now
* Some fixes. Many tests are still failing but some are passing now.
- I have added TODO's for some of the hacks I made to unblock me
and I will address them soon
- I have the processing_idefics.py hacked in my view to support TF temporarily
* Add ALL_LAYERNORM_LAYERS to match pytorch
* Revert "Add ALL_LAYERNORM_LAYERS to match pytorch"
This reverts commit 7e0a35119b4d7a6284d04d8c543fba1b29e573c9 as it
is not needed in the tf implementation.
* Fix freeze_relevant_params()
* Some more fixes
* Fix test_attention_outputs
* Add tf stuff to processing_idefics.py
processing_idefics.py supports both pytorch and tf now.
test_processor_idefics.py for pytorch is passing, so i didn't break anything
but still some issues with tf. I also need to add tf tests in
test_processor_idefics.py.
* Pass return_tensors to image processing code and fix test
* Pass return_tensors to the image processor __init__
* Fix several test cases
- Make input to some of the forward pass of type `TFModelInputType`
- Decorate main layer forward pass with `@unpack_inputs`
- Decorate main layer with `@keras_serializable`
- Pass `inputs` to TFIdeficsModel
* Some more fixes forgotten in last commit
* Fix processing code and vision_tf.py
* Fix perceiver bug
* Import from
* Auto-add build() methods + style pass
* Fix build() errors due to `None` being passed as shape to some layers
* Change name in TFIdeficsForVisionText2Text to attribute in IdeficsForVisionText2Text
* Fix pytorch weights load for tf2
There were a lot of `name=` missing in weight initialization code.
* Attempt to fix CI
* Add back accidently removed line
* Remove torch-specific stuff from the TF test file
* make fix-copies, make style, remove autotranslated files
* Fixes to imports/docstrings
* Let's try the from future import in desperation
* Fix the core random_attention_mask fn to match the torch/flax behaviour
* Clean random_attention_mask up correctly
* Remove torch-only test
* Fix loss shape, couple of nits
* make style
* Don't test for OOB embeddings because IDEFICS uses those deliberately
* Fix loss computation to handle masking
* Fix test failures when flattening
* Fix some test failures
- Add cross attention gate which was missing and wasn't being passed arround
- Fix overwriting of image_attention_mask due to hack I had for dummy inputs
* Add a proper stateless scaled_dot_product_attention
* make style
* Adding missing attribute from the PyTorch version
* Small cleanups to decoupledlinearlayer in case that helps
* Pass epsilon to LayerNormalization
* Attemp to fix pytorch weight cross-loading for TFIdeficsEmbedding
* Fix a bug in TFIdeficsGatedCrossAttentionLayer
* Patching up build() methods
* Constant self.inv_freq
* Constant self.inv_freq
* First working version
The TF implementation works now, there was a bug in the TFIdeficsDecoupledLinear
where the weights were mis-intialized (in_features,out_features)
when it should be: (out_features, in_features)
I have tested this so far with tiny-random and idefics-9b-instruct
and gives correct output.
I also dumped the final outputs for both pytorch and TF
and they are identical.
* Fix some test failures
* remove print statement
* Fix return_tensors
* Fix CI test failure check_code_quality
* Attempt to fix CI failures by running `make fixup`
The hardcoded IDs in test_modeling_tf_idefics.py are for the integration
test and makes that file unreadable and should probably be moved to a seperate file.
* Attempt to fix tests_pr_documentation_tests
* Fix a test failure in test_image_processing_idefics.py
* Fix test test_pt_tf_model_equivalence
* Fix a few failures
* Tiny fix
* Some minor fixes
* Remove a duplicate test
* Override a few test failures for IDEFICS
- `test_keras_save_load` is passing now
- `test_compile_tf_model` is still failing
* Fix processing_idefics.py after rebase
* Guard import keras with is_tf_available
* fix check code quality
* fix check code quality
* Minor fixes
* Skip test_save_load temporarily
This test passed on my local box but fails on the CI, skipping
for now to see if there are other remaining failures on the CI.
* Run `ruff format tests src utils`
* Fix last failing test, `test_compile_tf_model`
* Add fixes for vision_tf.py
I forgot to add this file in last commit.
* Minor fixes
* Replace "<<<" with "<<" for doc tests
IDEFICS-9B is too big for doctest runner, so don't run it there
* Make code more readable
* Fix bug after code review
I added a layer_norm_eps to IdeficsConfig but I don't even need it
since the vision config has a layer_norm_eps.
* Fix after code review
Use original code tokenizer.convert_tokens_to_ids
* Keep PyTorch as the default return_tensors
* Fixes to modeling_tf after code review
* Fixes from code review
- Remove all references of `TF_IDEFICS_PRETRAINED_MODEL_ARCHIVE_LIST`
- Pass 1e-5 to LayerNormalization in perceiver
* Run ruff
* Undo a change
* Refactor processing code after Matt's suggestion
* Remove TODO's that aren't needed anymore
* For pytorch, Use original pytorch processing code from main
Since this PR is a TF port it shouldn't make any modifications
to pytorch IDEFICS code. This changes undo's the pytorch processing
modifications I made and uses original code from main.
* Update tests/models/idefics/test_modeling_idefics.py
* Update tests/models/idefics/test_modeling_tf_idefics.py
* Add missing imports for is_pt_tf_cross_test
* [DO NOT MERGE]: This is a commit for debugging and will be reverted
The cross test `test_pt_tf_model_equivalence` passes locally but
fails when running on the CI. This commit is to help debug that
and will be reverted.
* Revert "[DO NOT MERGE]: This is a commit for debugging and will be reverted"
This reverts commit 8f0d709ec5bd46685fb0b4259d914ffee794875b.
* [DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted
* [DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted
* Revert "[DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted"
This reverts commit 998cc38b8c3d313bf5e5eb55a7f5b7b881897b89.
* Revert "[DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted"
This reverts commit 1c695ac4219c4ae4d39b330b01744dc27deb7dd4.
* Don't skip test_save_load
IIRC test_save_load was also failing on the CI but not on my local
box, it might be easier to debug that on the CI first than the cross tests
* Debugging commit, will be reverted
* Revert "Debugging commit, will be reverted"
This reverts commit 8eafc8e41e20c4e95a3a90834f06a6e9f445e2d5.
* Override `test_save_load` and push model to save
Maybe this will help me repro this weird bug
* pass my repo_id
* add endpoint
* Pass a temp (write) token just for this CI
* Undo last few commits, still pushing to hub for model debugging
The issue seems to be with save_pretrained(), when I looked at the model saved
from the CI test failure it is basically empty and has no weights.
`self.save_weights(..)` seems to be failing in save_pretrained but needs
more debugging
* Add logging to modeling tf utils, will be reverted just for debugging
* Debugging, will revert
* Revert "Debugging, will revert"
This reverts commit 9d0d3075fb7c82d8cde3a5c76bc8f3876c5c55d3.
* Revert "Add logging to modeling tf utils, will be reverted just for debugging"
This reverts commit 774b6b7b1c17b3ce5d7634ade768f2f686cee617.
* Remove `test_save_load`
The CI failures are gone after my latest rebase, no idea why
but I was still saving the model to my hub on HF and the tf_model.h5
file now has everything.
* Run make fix-copies
* Run ruff format tests src utils
* Debugging commit, will be reverted
* Run ruff, also trigger CI run
* Run ruff again
* Undo debugging commit
---------
Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* blip with interpolated pos encoding
* feat: Add interpolate_pos_encoding option to other models from `BLIP` family.
* include check for textual generated content in tests
* immutability fix for seq2seq as well as immutability tests for the collators
* ensure we don't act on none labels and formatting
* remove tf/pt in respective tests as they are not required
* more type error fixes tf/np
* remove todo
* apply suggestions from code review
* formatting / style
* Create CodeAgent and ReactAgent
* Fix formatting errors
* Update documentation for agents
* Add custom errors, improve logging
* Support variable usage in ReactAgent
* add messages
* Add message passing format
* Create React Code Agent
* Update
* Refactoring
* Fix errors
* Improve python interpreter
* Only non-tensor inputs should be sent to device
* Calculator tool slight refactor
* Improve docstrings
* Refactor
* Fix tests
* Fix more tests
* Fix even more tests
* Fix tests by replacing output and input types
* Fix operand type issue
* two small fixes
* EM TTS
* Fix agent running type errors
* Change text to speech tests to allow changed outputs
* Update doc with new agent types
* Improve code interpreter
* If max iterations reached, provide a real answer instead of an error
* Add edge case in interpreter
* Add safe imports to the interpreter
* Interpreter tweaks: tuples and listcomp
* Make style
* Make quality
* Add dictcomp to interpreter
* Rename ReactJSONAgent to ReactJsonAgent
* Misc changes
* ToolCollection
* Rename agent's logger to self.logger
* Add while loops to interpreter
* Update doc with new tools. still need to mention collections
* Add collections to the doc
* Small fixes on logs and interpretor
* Fix toolbox return type
* Docs + fixup
* Skip doctests
* Correct prompts with improved examples and formatting
* Update prompt
* Remove outdated docs
* Change agent to accept Toolbox object for tools
* Remove calculator tool
* Propagate removal of calculator in doc
* Fix 2 failing workflows
* Simplify additional argument passing
* AgentType audio
* Minor changes: function name, types
* Remove calculator tests
* Fix test
* Fix torch requirement
* Fix final answer tests
* Style fixes
* Fix tests
* Update docstrings with calculator removal
* Small type hint fixes
* Update tests/agents/test_translation.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update tests/agents/test_python_interpreter.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/agents/default_tools.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/agents/tools.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update tests/agents/test_agents.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/bert/configuration_bert.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/agents/tools.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/agents/speech_to_text.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update tests/agents/test_speech_to_text.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update tests/agents/test_tools_common.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* pygments
* Answer comments
* Cleaning up
* Simplifying init for all agents
* Improving prompts and making code nicer
* Style fixes
* Add multiple comparator test in interpreter
* Style fixes
* Improve BERT example in documentation
* Add examples to doc
* Fix python interpreter quality
* Logging improvements
* Change test flag to agents
* Quality fix
* Add example for HfEngine
* Improve conversation example for HfEngine
* typo fix
* Verify doc
* Update docs/source/en/agents.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/agents/agents.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/agents/prompts.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/agents/python_interpreter.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/agents.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Fix style issues
* local s2t tool
---------
Co-authored-by: Cyril Kondratenko <kkn1993@gmail.com>
Co-authored-by: Lysandre <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Adding _tie_weights() to prediction heads to support low_cpu_mem_usage=True
* Testing for the non-safe-tensors case, since the default is safe-tensors already
* Running fixup/fix-copies
* Adding accelerate annotations to tests
* Added cache clearing for GPU efficiency.
* Added cache clearing for GPU efficiency.
* Added batch_eval_metrics capability
* Ran make fixup
* Fixed bug
* Fixed whitespace issue
* Fixed outdated condition
* Updated docstrings with instructions for batch_eval_metrics. Updated end of dataloader logic
* Added first version of batch_eval_metrics Trainer test
* Fixed batch_eval_metrics Trainer tests for both eval and predict
* Fixed batch_eval_metrics behavior for new Trainer variables
* Fixed batch_eval_metrics Trainer tests
* Ran fixup
* Trainer: load checkpoint model with multiple adapters
* Trainer._load_from_checkpoint support multiple active adapters
* PeftModel.set_adapter does not support multiple adapters yet
* Trainer._load_from_checkpoint test multiple adapters
---------
Co-authored-by: Clara Luise Pohland <clara-luise.pohland@telekom.de>