* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
Enhanced installation section with troubleshooting, GPU setup, and OS-specific details.
* Update README.md
Enhanced installation section with troubleshooting, GPU setup, and OS-specific details.
* Update installation.md
Updated installation.md to include virtual environment and GPU setup instructions.
* Update installation.md
Updated installation.md to include virtual environment and GPU setup instructions.
* Update installation.md
Updated installation.md to include virtual environment, troubleshooting and GPU setup instructions.
* Update installation.md
* Update installation.md
* Update installation.md
* Update installation.md
Updated installation.md to include virtual environment, troubleshooting functions and GPU setup instructions.
* Update installation.md
Updated installation.md to include virtual environment, troubleshooting functions and GPU setup instructions.
* Update installation.md
Updated installation.md to include virtual environment, troubleshooting functions and GPU setup instructions.
* Update README.md
Removed numbering from README.md.
* Update README.md
Removed unnecessary "a)" formatting as per maintainer feedback.
* Update README.md
Added blank lines around code snippets for better readability.
* Update README.md
Removed the line "b) Install a backend framework:" from README.md as per feedback.
* Update README.md
Simplified "For Windows:" to "Windows" in README.md as per feedback as well as "For macOS/Linux:" to "macOS/Linux"
* Update README.md
Removed unnecessary heading and retained valid code snippet.
* Update README.md
Removed unnecessary heading "d) Optional: Install from source for the latest updates" as per feedback.
* Update README.md
Removed "GPU Setup (Optional)" section to align with minimal design feedback.
* Update installation.md
Removed "Create and Activate a Virtual Environment" section from installation.md as per feedback.
* Update installation.md
Adjusted "Troubleshooting" to a second-level heading and added an introductory line as per feedback.
* Update installation.md
Updated troubleshooting section with simplified headings and formatted code blocks as per feedback.
* Update installation.md
Integrated GPU setup instructions into the "Install with pip" section for better content flow.
* Update README.md
Removed Troubleshooting section from README.md for minimalism as per maintainer feedback.
* Update torchao.md: use auto-compilation
* Update torchao.md: indicate updating transformers to the latest
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Add the helium model.
* Add a missing helium.
* And add another missing helium.
* Use float for the rmsnorm mul.
* Add the Helium tokenizer converter.
* Add the pad token as suggested by Arthur.
* Update the RMSNorm + some other tweaks.
* Fix more rebase issues.
* fix copies and style
* fixes and add helium.md
* add missing tests
* udpate the backlink
* oups
* style
* update init, and expected results
* small fixes
* match test outputs
* style fixup, fix doc builder
* add dummies and we should be good to go!z
* update sdpa and fa2 documentation
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>
* Create token_classification.md
* Update token_classification.md
* Update docs/source/ar/tasks/token_classification.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/token_classification.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/token_classification.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/token_classification.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/token_classification.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/token_classification.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/token_classification.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/token_classification.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/token_classification.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/token_classification.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/token_classification.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/token_classification.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/token_classification.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/token_classification.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/token_classification.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/token_classification.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/token_classification.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/token_classification.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/token_classification.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/token_classification.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/token_classification.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/token_classification.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/token_classification.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/token_classification.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/token_classification.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/token_classification.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update _toctree.yml
---------
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* model can convert to HF and be loaded back
* nit
* works in single batch generation but hallucinates
* use the image tokens
* add image generation
* now it works
* add tests
* update
* add modulare but it doesn't work for porting docstring :(
* skip some tests
* add slow tests
* modular removed the import?
* guess this works
* update
* update
* fix copies
* fix test
* fix copies
* update
* docs
* fix tests
* last fix tests?
* pls
* repo consistency
* more style
* style
* remove file
* address comments
* tiny bits
* update after the new modular
* fix tests
* add one more cond in check attributes
* decompose down/up/mid blocks
* allow static cache generation in VLMs
* nit
* fix copies
* Update docs/source/en/model_doc/emu3.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/emu3.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/emu3.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/emu3.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/emu3.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/emu3.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/emu3.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/emu3.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* fix VAE upsampling
* Update src/transformers/models/emu3/modular_emu3.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* address comments
* state overwritten stuff explicitly
* fix copies
* add the flag for flex attn
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* bug fixes
* organize imports
* wrap cpu warning in reference_compile
* Avoid needing repad_logits_with_grad, always repad with grads when training
I'm not 100% that the conditional with "or labels is None" makes sense though - not sure what the intention is there. Perhaps we can remove that?
* Revert "Avoid needing repad_logits_with_grad, always repad with grads when training"
This reverts commit cedcb4e89b.
* Fix grammar: keep -> keeps
* Propagate grammar fix with modular_model_converter
---------
Co-authored-by: Tom Aarsen <Cubiegamedev@gmail.com>
Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com>
* universal checkpoint
* Update docs/source/en/deepspeed.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/deepspeed.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/deepspeed.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* إضافة الترجمة العربية: multiple_choice.md
* Update multiple_choice.md
* Update docs/source/ar/tasks/multiple_choice.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/multiple_choice.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/multiple_choice.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/multiple_choice.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/multiple_choice.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/multiple_choice.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/multiple_choice.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/multiple_choice.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/multiple_choice.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/multiple_choice.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/multiple_choice.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/multiple_choice.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/multiple_choice.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/multiple_choice.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/multiple_choice.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/multiple_choice.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/multiple_choice.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/multiple_choice.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/multiple_choice.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update _toctree.yml
* Add files via upload
* Update _toctree.yml
---------
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* add audio_token attribute to proc
* expand input_ids
* and legacy and expanded input_ids
* test update
* split lines
* add possibility not to provide eos and bos audio tokens
* raise errors
* test incorrect number of audio tokens
* add example
* fmt
* typo
* first adding diffllama
* add Diff Attention and other but still with errors
* complate make attention Diff-Attention
* fix some bugs which may be caused by transformer-cli while adding model
* fix a bug caused by forgetting KV cache...
* Update src/transformers/models/diffllama/modeling_diffllama.py
You don't need to divide by 2 if we use same number of attention heads as llama. instead you can just split in forward.
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* Update src/transformers/models/diffllama/modeling_diffllama.py
fit to changeing "num_heads // 2" place
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* Update src/transformers/models/diffllama/modeling_diffllama.py
new codes are more meaningful than before
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* Update src/transformers/models/diffllama/modeling_diffllama.py
new codes are more meaningful than before
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* Update src/transformers/models/diffllama/modeling_diffllama.py
fit to changeing "num_heads // 2" place
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* Update src/transformers/models/diffllama/modeling_diffllama.py
fix 2times divide by sqrt(self.head_dim)
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* Update src/transformers/models/diffllama/modeling_diffllama.py
fix 2times divide by sqrt(self.head_dim)
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* Update src/transformers/models/diffllama/modeling_diffllama.py
fit to changeing "num_heads // 2" place.
and more visible
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* I found Attention missed implemented from paper still on e072544a3b.
* re-implemented
* adding groupnorm
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* align with transformers code style
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* fix typo
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* adding groupnorm
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* change SdpaAttention to DiffSdpaAttention
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* fix bug
* Update src/transformers/models/diffllama/modeling_diffllama.py
resolve "not same outputs" problem
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* fix bugs of places of "GroupNorm with scale" and etc
* Revert "fix bugs of places of "GroupNorm with scale" and etc"
This reverts commit 26307d92f6.
* simplify multiple of attention (matmul) operations into one by repeating value_states
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* simplify multiple of attention (matmul) operations into one by repeating value_states
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* simplify multiple of attention (matmul) operations into one by repeating value_states
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* remove missed type
* add diffllama model_doc
* apply make style/quality
* apply review comment about model
* apply review comment about test
* place diffllama alphabetically on the src/transformers/__init__.py
* fix forgot code
* Supports parameters that are not initialized with standard deviation 0 in the conventional method
* add DiffLlamaConfig to CONFIG_CLASSES_TO_IGNORE_FOR_DOCSTRING_CHECKPOINT_CHECK on utils/check_config_docstrings.py
* remove unused property of config
* add to supported model list
* add to spda supported model list
* fix copyright, remove pretraining_tensor_parallel, and modify for initialization test
* remove unused import and etc.
* empty commit
* empty commit
* empty commit
* apply modular transformers but with bugs
* revert prev commit
* create src/transformers/model/diffllama/modular_diffllama.py
* run utils/modular_model_converter.py
* empty commit
* leaner modular diffllama
* remove more and more in modular_diffllama.pt
* remove more and more in modular_diffllama.pt
* resolve missing docstring entries
* force reset
* convert modular
---------
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* Add French translation of task_summary and tasks_explained
---------
Co-authored-by: Aymeric Roucher <69208727+aymeric-roucher@users.noreply.github.com>
* إضافة الترجمة العربية: summarization.md
* Update docs/source/ar/tasks/summarization.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/summarization.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/summarization.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/summarization.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/summarization.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/summarization.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/summarization.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/summarization.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/summarization.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/summarization.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/summarization.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/summarization.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/summarization.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/summarization.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/summarization.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update _toctree.yml
---------
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* إضافة الترجمة العربية: question_answering.md
* Update question_answering.md
* Update docs/source/ar/tasks/question_answering.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/question_answering.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/question_answering.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/question_answering.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/question_answering.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/question_answering.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/question_answering.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/question_answering.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/question_answering.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/question_answering.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/question_answering.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/question_answering.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/question_answering.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/question_answering.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/question_answering.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update _toctree.yml
---------
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Improve modular transformers documentation
- Adds hints to general contribution guides
- Lists which utils scripts are available to generate single-files from modular files and check their content
* Show commands in copyable code cells
---------
Co-authored-by: Joel Koch <joel@bitcrowd.net>
* initial cut of modernbert for transformers
* small bug fixes
* fixes
* Update import
* Use compiled mlp->mlp_norm to match research implementation
* Propagate changes in modular to modeling
* Replace duplicate attn_out_dropout in favor of attention_dropout
cc @warner-benjamin let me know if the two should remain separate!
* Update BOS to CLS and EOS to SEP
Please confirm @warner-benjamin
* Set default classifier bias to False, matching research repo
* Update tie_word_embeddings description
* Fix _init_weights for ForMaskedLM
* Match base_model_prefix
* Add compiled_head to match research repo outputs
* Fix imports for ModernBertForMaskedLM
* Just use "gelu" default outright for classifier
* Fix config name typo: initalizer -> initializer
* Remove some unused parameters in docstring. Still lots to edit there!
* Compile the embeddings forward
Not having this resulted in very slight differences - so small it wasn't even noticed for the base model, only for the large model.
But the tiny difference for large propagated at the embedding layer through the rest of the model, leading to notable differences of ~0.0084 average per value, up to 0.2343 for the worst case.
* Add drafts for ForSequenceClassification/ForTokenClassification
* Add initial SDPA support (not exactly equivalent to FA2 yet!)
During testing, FA2 and SDPA still differ by about 0.0098 per value in the token embeddings. It still predicts the correct mask fills, but I'd like to get it fully 1-1 if possible.
* Only use attention dropout if training
* Add initial eager attention support (also not equivalent to FA2 yet!)
Frustratingly, I also can't get eager to be equivalent to FA2 (or sdpa), but it does get really close, i.e. avg ~0.010 difference per value.
Especially if I use fp32 for both FA2&eager, avg ~0.0029 difference per value
The fill-mask results are good with eager.
* Add initial tests, output_attentions, output_hidden_states, prune_heads
Tests are based on BERT, not all tests pass yet: 23 failed, 79 passed, 100 skipped
* Remove kwargs from ModernBertForMaskedLM
Disable sparse_prediction by default to match the normal HF, can be enabled via config
* Remove/adjust/skip improper tests; warn if padding but no attn mask
* Run formatting etc.
* Run python utils/custom_init_isort.py
* FlexAttention with unpadded sequences(matches FA2 within bf16 numerics)
* Reformat init_weights based on review
* self -> module in attention forwards
* Remove if config.tie_word_embeddings
* Reformat output projection on a different line
* Remove pruning
* Remove assert
* Call contiguous() to simplify paths
* Remove prune_qkv_linear_layer
* Format code
* Keep as kwargs, only use if needed
* Remove unused codepaths & related config options
* Remove 3d attn_mask test; fix token classification tuple output
* Reorder: attention_mask above position_ids, fixes gradient checkpointing
* Fix usage if no FA2 or torch v2.5+
* Make torch.compile/triton optional
Should we rename 'compile'? It's a bit vague
* Separate pooling options into separate functions (cls, mean) - cls as default
* Simplify _pad_modernbert_output, remove unused labels path
* Update tied weights to remove decoder.weight, simplify decoder loading
* Adaptively set config.compile based on hf_device_map/device/resize, etc.
* Update ModernBertConfig docstring
* Satisfy some consistency checks, add unfinished docs
* Only set compile to False if there's more than 1 device
* Add docstrings for public ModernBert classes
* Dont replace docstring returns - ends up being duplicate
* Fix mistake in toctree
* Reformat toctree
* Patched FlexAttention, SDPA, Eager with Local Attention
* Implement FA2 -> SDPA -> Eager attn_impl defaulting, crucial
both to match the original performance, and to get the highest inference speed without requiring users to manually pick FA2
* Patch test edge case with Idefics3 not working with 'attn_implementation="sdpa"'
* Repad all_hidden_states as well
* rename config.compile to reference_compile
* disable flex_attention since it crashes
* Update modernbert.md
* Using dtype min to mask in eager
* Fully remove flex attention for now
It's only compatible with the nightly torch 2.6, so we'll leave it be for now. It's also slower than eager/sdpa.
Also, update compile -> reference_compile in one more case
* Call contiguous to allow for .view()
* Copyright 2020 -> 2024
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update/simplify __init__ structure
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Remove "... if dropout_prob > 0 else identity"
As dropout with 0.0 should be efficient like identity
* re-use existing pad/unpad functions instead of creating new ones
* remove flexattention method
* Compute attention_mask and local_attention_mask once in modeling
* Simplify sequence classification prediction heads, only CLS now
Users can make custom heads if they feel like it
Also removes the unnecessary pool parameter
* Simplify module.training in eager attn
* Also export ModernBertPreTrainedModel
* Update the documentation with links to finetuning scripts
* Explain local_attention_mask parameter in docstring
* Simplify _autoset_attn_implementation, rely on super()
* Keep "in" to initialize Prediction head
Doublechecked with Benjamin that it's correct/what we used for pretraining
* add back mean pooling
* Use the pooling head in TokenClassification
* update copyright
* Reset config._attn_implementation_internal on failure
* Allow optional attention_mask in ForMaskedLM head
* fix failing run_slow tests
* Add links to the paper
* Remove unpad_no_grad, always pad/unpad without gradients
* local_attention_mask -> sliding_window_mask
* Revert "Use the pooling head in TokenClassification"
This reverts commit 99c38badd1.
There was no real motivation, no info on whether having this bigger head does anything useful.
* Simplify pooling, 2 options via if-else
---------
Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com>
Co-authored-by: Tom Aarsen <Cubiegamedev@gmail.com>
Co-authored-by: Said Taghadouini <taghadouinisaid@gmail.com>
Co-authored-by: Benjamin Clavié <ben@clavie.eu>
Co-authored-by: Antoine Chaffin <ant54600@hotmail.fr>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* docs: fix typo quickstart snippet in ColPali's model card
* docs: clean the ColPali's model card
* docs: make the `ColPaliForRetrieval`'s docstring more concise
* docs: add missing bash command used to convert weights for `vidore/colpali-v1.3-hf`
* initial commit for PR
Co-authored-by: Gabe Goodhart <gabe.l.hart@gmail.com>
* rename dynamic cache
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* add more unit tests
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* add integration test
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* add integration test
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* Add modular bamba file
* Remove trainer changes from unrelated PR
* Modify modular and cofig to get model running
* Fix some CI errors and beam search
* Fix a plethora of bugs from CI/docs/etc
* Add bamba to models with special caches
* Updat to newer mamba PR for mamba sublayer
* fix test_left_padding_compatibility
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* fix style
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* fix remaining tests
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* missed this test
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* ran make style
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* move slow tag to integration obj
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* make style
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* address comments
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* fix modular
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* left out one part of modular
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* change model
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* Make Rotary modular as well
* Update bamba.md
Added overview, update Model inference card and added config
* Update bamba.md
* Update bamba.md
* Update bamba.md
Minor fixes
* Add docs for config and model back
Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>
* Add warning when using fast kernels
* replaced generate example
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* Address comments from PR
Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>
* Propagate attention fixes
Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>
* Fix attention interfaces to the new API
Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>
* Fix API for decoder layer
Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>
* Remove extra weights
Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>
---------
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>
Co-authored-by: Gabe Goodhart <gabe.l.hart@gmail.com>
Co-authored-by: Antoni Viros i Martin <aviros@ibm.com>
Co-authored-by: divya-kumari32 <72085811+divya-kumari32@users.noreply.github.com>
Co-authored-by: Antoni Viros <ani300@gmail.com>
* Add Cohere2 docs details
* Update docs/source/en/model_doc/cohere2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Add Falcon3 documentation
* Update Falcon3 documentation
* Change Falcon to Falcon3
* Update docs and run make fix-copies
* Add blog post and huggingface models links
* refactor image_processing_auto logic
* fix fast image processor tests
* Fix tests fast vit image processor
* Add safeguard when use_fast True and torchvision not available
* change default use_fast back to None, add warnings
* remove debugging print
* call get_image_processor_class_from_name once
* Add files
* Init
* Add TimmWrapperModel
* Fix up
* Some fixes
* Fix up
* Remove old file
* Sort out import orders
* Fix some model loading
* Compatible with pipeline and trainer
* Fix up
* Delete test_timm_model_1/config.json
* Remove accidentally commited files
* Delete src/transformers/models/modeling_timm_wrapper.py
* Remove empty imports; fix transformations applied
* Tidy up
* Add image classifcation model to special cases
* Create pretrained model; enable device_map='auto'
* Enable most tests; fix init order
* Sort imports
* [run-slow] timm_wrapper
* Pass num_classes into timm.create_model
* Remove train transforms from image processor
* Update timm creation with pretrained=False
* Fix gamma/beta issue for timm models
* Fixing gamma and beta renaming for timm models
* Simplify config and model creation
* Remove attn_implementation diff
* Fixup
* Docstrings
* Fix warning msg text according to test case
* Fix device_map auto
* Set dtype and device for pixel_values in forward
* Enable output hidden states
* Enable tests for hidden_states and model parallel
* Remove default scriptable arg
* Refactor inner model
* Update timm version
* Fix _find_mismatched_keys function
* Change inheritance for Classification model (fix weights loading with device_map)
* Minor bugfix
* Disable save pretrained for image processor
* Rename hook method for loaded keys correction
* Rename state dict keys on save, remove `timm_model` prefix, make checkpoint compatible with `timm`
* Managing num_labels <-> num_classes attributes
* Enable loading checkpoints in Trainer to resume training
* Update error message for output_hidden_states
* Add output hidden states test
* Decouple base and classification models
* Add more test cases
* Add save-load-to-timm test
* Fix test name
* Fixup
* Add do_pooling
* Add test for do_pooling
* Fix doc
* Add tests for TimmWrapperModel
* Add validation for `num_classes=0` in timm config + test for DINO checkpoint
* Adjust atol for test
* Fix docs
* dev-ci
* dev-ci
* Add tests for image processor
* Update docs
* Update init to new format
* Update docs in configuration
* Fix some docs in image processor
* Improve docs for modeling
* fix for is_timm_checkpoint
* Update code examples
* Fix header
* Fix typehint
* Increase tolerance a bit
* Fix Path
* Fixing model parallel tests
* Disable "parallel" tests
* Add comment for metadata
* Refactor AutoImageProcessor for timm wrapper loading
* Remove custom test_model_outputs_equivalence
* Add require_timm decorator
* Fix comment
* Make image processor work with older timm versions and tensor input
* Save config instead of whole model in image processor tests
* Add docstring for `image_processor_filename`
* Sanitize kwargs for timm image processor
* Fix doc style
* Update check for tensor input
* Update normalize
* Remove _load_timm_model function
---------
Co-authored-by: Amy Roberts <22614925+amyeroberts@users.noreply.github.com>
* add "Translating Benchmarks.md to Chinese "
* Removed all the English original text (which was previously kept as comments in the document) and refined some of the Chinese expressions.
* Add docs/source/ar/community.md to Add_docs_source_ar_community.md
* Update community.md
* Update community.md
* Update community.md
* Update _toctree.yml - add community.md
* Update docs/source/ar/community.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Create how_to_hack_models.md
* Create modular_transformers.md
* Create tiktoken.md
* Update _toctree.yml
* Update docs/source/ar/how_to_hack_models.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/how_to_hack_models.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/how_to_hack_models.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/how_to_hack_models.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/how_to_hack_models.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/how_to_hack_models.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/how_to_hack_models.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/how_to_hack_models.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/modular_transformers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/modular_transformers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/modular_transformers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/modular_transformers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/modular_transformers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/modular_transformers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/modular_transformers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/modular_transformers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/modular_transformers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tiktoken.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tiktoken.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
---------
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* add commen to offloading
* Update docs/source/en/kv_cache.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Fixed typo in multi gpu docs and OLMoE version
* Fixed typos in docs for agents, agents advanced, knowledge distillation, and image feature extraction
* Fixed incorrect usage of model.image_guided_detection in zero shot object detection docs
* explain release_memory
* Update docs/source/en/llm_tutorial_optimization.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Add docs/source/ar/benchmarks.md to Add_docs_source_ar_benchmarks.md
* Update docs/source/ar/benchmarks.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/benchmarks.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/benchmarks.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/benchmarks.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/benchmarks.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/benchmarks.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/benchmarks.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/benchmarks.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/benchmarks.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/benchmarks.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/benchmarks.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update _toctree.yml
* Update benchmarks.md
---------
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Updated documentation and added conversion utility
* Update docs/source/en/tiktoken.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tiktoken.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Moved util function to integration folder + allow for str
* Update formatting
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Updated formatting
* style changes
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add Nemotron GGUF Loading Support
* fix the Nemotron architecture assignation
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Added image-text-to-text pipeline to task guide
* Update docs/source/en/tasks/image_text_to_text.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tasks/image_text_to_text.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tasks/image_text_to_text.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tasks/image_text_to_text.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Merge codeblocks
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* add deformable detr image processor fast
* add fast processor to doc
* fix copies
* nit docstring
* Add tests gpu/cpu and fix docstrings
* fix docstring
* import changes from detr
* fix imports
* rebase and fix
* fix input data format change in detr and rtdetr fast
* Fix post process function called in the instance segmentation example of mask2former
* fix description and additional notes for post_process_instance_segmentation of maskformers
* remove white space in maskformers post_process_instance_segmentation doc
* change image.size[::-1] to height and width for clarity in segmentation examples
* Add model skeletion with transformers-cli add-new-model-like
* Convert config to modular, add rms_norm_eps, delete clip_qkv
* Convert model to modular, add RMSNorm
* Add flash attention with qk norm and no qkv clipping
* Add decoder layer with RMSNorm after attention/feedforward layers
* Add base and causal model
* Add converter improvements from OLMo repo
* Update weight loading in OLMo to HF converter
* Set correct default for rms_norm_eps
* Set correct pipeline_model_mapping in test
* Run make fixup
* Fix model type
* Re-run modular conversion
* Manually set config docs to fix build errors
* Convert olmo-1124 to olmo_1124 to fix flash attention docs errors
* Start updating tests
* Update tests
* Copy upstream test_eager_matches_sdpa_inference_1_bfloat16 changes to olmo_1124
* Rename input_layernorm and post_attention_layernorm to reflect their ops better
* Use correct tokenizer
* Remove test unsupported by GPT2 tokenizer
* Create GenerationConfig outside of from_pretrained call
* Use simpler init file structure
* Add explicit __all__ to support simplified init
* Make safetensor serialization the default
* Update OLMo November 2024 docs
* add XPU path
* use accelerate API
* Update docs/source/en/tasks/semantic_segmentation.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* update more places with accelerate API
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Add docs/source/ar/torchscript.md to Add_docs_source_ar_torchscript.md
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Merge troubleshooting.md with this Branch
* Update _toctree.yml
* Update torchscript.md
* Update troubleshooting.md
---------
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Add docs/source/ar/trainer.md to Add_docs_source_ar_trainer.md
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update trainer.md
* Update trainer.md
* Update trainer.md
* Create _toctree.yml
* Delete docs/source/ar/_toctree.yml
* Update _toctree.yml - add trainer
* Update _toctree.yml
* merge serialization.md into this branch
* merge sagemaker.md into this PR
* Update _toctree.yml
* Update docs/source/ar/trainer.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* kinda works
* update
* add tests
* update
* use special tokens in processors
* typo
* fix copies
* fix
* fix moshi after rebase
* update
* fix tests
* update
* Update docs/source/en/main_classes/tokenizer.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* update docs
* test for load time adding tokens
* fix some more tests which are now fetched better
* one more fix
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add docs/source/ar/multilingual.md to Add_docs_source_ar_multilingual.md
* Update docs/source/ar/multilingual.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/multilingual.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/multilingual.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/multilingual.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/multilingual.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/multilingual.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/multilingual.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/multilingual.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/multilingual.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/multilingual.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/multilingual.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/multilingual.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/multilingual.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/multilingual.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/multilingual.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/multilingual.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update _toctree.yml
* Update _toctree.yml
* Add Translated files to branch for merg
* Update _toctree.yml
* Update _toctree.yml
* Update custom_models.md
* Update chat_templating.md
* Update docs/source/ar/create_a_model.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update create_a_model.md
* Update gguf.md
* Update gguf.md
* Update gguf.md
* Update gguf.md
---------
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* update doc
* Update docs/source/en/perf_train_cpu.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* delete closing tip
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Standardize image-text-to-text-models-output
add post_process_image_text_to_text to chameleon and cleanup
Fix legacy kwarg behavior and deprecation warning
add post_process_image_text_to_text to qwen2_vl and llava_onevision
Add post_process_image_text_to_text to idefics3, mllama, pixtral processor
* nit var name post_process_image_text_to_text udop
* nit fix deprecation warnings
* Add image-text-to-text pipeline
* add support for image url in chat template for pipeline
* Reformat to be fully compatible with chat templates
* Add tests chat template
* Fix imports and tests
* Add pipeline tag
* change logic handling of single prompt ans multiple images
* add pipeline mapping to models
* fix batched inference
* fix tests
* Add manual batching for preprocessing
* Fix outputs with nested images
* Add support for all common processing kwargs
* Add default padding when multiple text inputs (batch size>1)
* nit change version deprecation warning
* Add support for text only inference
* add chat_template warnings
* Add pipeline tests and add copied from post process function
* Fix batched pipeline tests
* nit
* Fix pipeline tests blip2
* remove unnecessary max_new_tokens
* revert processing kosmos2 and remove unnecessary max_new_tokens
* fix pipeline tests idefics
* Force try loading processor if pipeline supports it
* revert load_processor change
* hardcode loading only processor
* remove unnecessary try except
* skip imagetexttotext tests for kosmos2 as tiny model causes problems
* Make code clearer
* Address review comments
* remove preprocessing logic from pipeline
* fix fuyu
* add BC resize fuyu
* Move post_process_image_text_to_text to ProcessorMixin
* add guard in post_process
* fix zero shot object detection pipeline
* add support for generator input in pipeline
* nit
* change default image-text-to-text model to llava onevision
* fix owlv2 size dict
* Change legacy deprecation warning to only show when True
* add fast image processor rtdetr
* add gpu/cpu test and fix docstring
* remove prints
* add to doc
* nit docstring
* avoid iterating over images/annotations several times
* change torch typing
* Add image processor fast documentation
* add mamba architecture for gguf
* add logic for weights conversion, some fixes and refactoring
* add lm_head layers, unit test refactoring
* more fixes for tests
* remove lm_head creation
* remove unused comments
* feat: Added int conversion and unwrapping
* test: added tests for post_process_keypoint_detection of SuperPointImageProcessor
* docs: changed docs to include post_process_keypoint_detection method and switched from opencv to matplotlib
* test: changed test to not depend on SuperPointModel forward
* test: added missing require_torch decorator
* docs: changed pyplot parameters for the keypoints to be more visible in the example
* tests: changed import torch location to make test_flax and test_tf
* Revert "tests: changed import torch location to make test_flax and test_tf"
This reverts commit 39b32a2f69.
* tests: fixed import
* chore: applied suggestions from code review
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* tests: fixed import
* tests: fixed import (bis)
* tests: fixed import (ter)
* feat: added choice of type for target_size and changed tests accordingly
* docs: updated code snippet to reflect the addition of target size type choice in post process method
* tests: fixed imports (...)
* tests: fixed imports (...)
* style: formatting file
* docs: fixed typo from image[0] to image.size[0]
* docs: added output image and fixed some tests
* Update docs/source/en/model_doc/superpoint.md
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* fix: included SuperPointKeypointDescriptionOutput in TYPE_CHECKING if statement and changed tests results to reflect changes to SuperPoint from absolute keypoints coordinates to relative
* docs: changed SuperPoint's docs to print output instead of just accessing
* style: applied make style
* docs: added missing output type and precision in docstring of post_process_keypoint_detection
* perf: deleted loop to perform keypoint conversion in one statement
* fix: moved keypoint conversion at the end of model forward
* docs: changed SuperPointInterestPointDecoder to SuperPointKeypointDecoder class name and added relative (x, y) coordinates information to its method
* fix: changed type hint
* refactor: removed unnecessary brackets
* revert: SuperPointKeypointDecoder to SuperPointInterestPointDecoder
* Update docs/source/en/model_doc/superpoint.md
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
---------
Co-authored-by: Steven Bucaille <steven.bucaille@buawei.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Add docs/source/ar/fast_tokenizers.md to Add_docs_source_ar_fast_tokenizers.md
* Update _toctree.yml
* Update _toctree.yml
* Update docs/source/ar/_toctree.yml
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/fast_tokenizers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/fast_tokenizers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/fast_tokenizers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/fast_tokenizers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/fast_tokenizers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/fast_tokenizers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/fast_tokenizers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/fast_tokenizers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/fast_tokenizers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/fast_tokenizers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
---------
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* translated gguf.md into chinese
* Apply suggestions from code review
I have updated the PR accordingly.Thank you very much for detailed guidance,and I 'll pay more attention to the details next time.
Co-authored-by: Isotr0py <2037008807@qq.com>
* Apply suggestions from code review
Co-authored-by: Isotr0py <2037008807@qq.com>
---------
Co-authored-by: Isotr0py <2037008807@qq.com>
* Add SynthIDTextWatermarkLogitsProcessor
* esolving comments.
* Resolving comments.
* esolving commits,
* Improving SynthIDWatermark tests.
* switch to PT version
* detector as pretrained model + style
* update training + style
* rebase
* Update logits_process.py
* Improving SynthIDWatermark tests.
* Shift detector training to wikitext negatives and stabilize with lower learning rate.
* Clean up.
* in for 7B
* cleanup
* upport python 3.8.
* README and final cleanup.
* HF Hub upload and initiaze.
* Update requirements for synthid_text.
* Adding SynthIDTextWatermarkDetector.
* Detector testing.
* Documentation changes.
* Copyrights fix.
* Fix detector api.
* ironing out errors
* ironing out errors
* training checks
* make fixup and make fix-copies
* docstrings and add to docs
* copyright
* BC
* test docstrings
* move import
* protect type hints
* top level imports
* watermarking example
* direct imports
* tpr fpr meaning
* process_kwargs
* SynthIDTextWatermarkingConfig docstring
* assert -> exception
* example updates
* no immutable dict (cant be serialized)
* pack fn
* einsum equivalent
* import order
* fix test on gpu
* add detector example
---------
Co-authored-by: Sumedh Ghaisas <sumedhg@google.com>
Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: sumedhghaisas2 <138781311+sumedhghaisas2@users.noreply.github.com>
Co-authored-by: raushan <raushan@huggingface.co>
* add colorize_depth and matplotlib availability check
* add post_process_depth_estimation for zoedepth + tests
* add post_process_depth_estimation for DPT + tests
* add post_process_depth_estimation in DepthEstimationPipeline & special case for zoedepth
* run `make fixup`
* fix import related error on tests
* fix more import related errors on test
* forgot some `torch` calls in declerations
* remove `torch` call in zoedepth tests that caused error
* updated docs for depth estimation
* small fix for `colorize` input/output types
* remove `colorize_depth`, fix various names, remove matplotlib dependency
* fix formatting
* run fixup
* different images for test
* update examples in `forward` functions
* fixed broken links
* fix output types for docs
* possible format fix inside `<Tip>`
* Readability related updates
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Readability related update
* cleanup after merge
* refactor `post_process_depth_estimation` to return dict; simplify ZoeDepth's `post_process_depth_estimation`
* rewrite dict merging to support python 3.8
---------
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* first try
* codestyle
* idefics2 is happy
* [run-slow] llava, llava_next, video_llava, vipllava, llava_next_video, idefics, idefics2, kosmos2, fuyu, blip, blip_2, instructblip, instructblipvideo, paligemma
* fix-copies
* [run-slow] llava, llava_next, video_llava, vipllava, llava_next_video, idefics, idefics2, kosmos2, fuyu, blip, blip_2, instructblip, instructblipvideo
* blip-2 needs to init vision from config
* when was this removed O_o
* minor fix
* tests
* this way?
* tests
* model-agnostic code
* codestyle
* add tests for idefics
* modify general test for VLMs
* no generation test for vlm yet!
* no generation test here also
* wanr in VIT-SDPA if output attn
* add more tests
* user can pass dict as attn impl
* repo consistency
* update
* muicgen
* no prints
* forgot speech enc-dec and clip
* how many composite models we have?
* musicgen meelody is same as mudicgen
* +siglip
* fix tests + add some more
* remove idefics custom overriden code
* make idefics2 automappable
* nits
* skip tests
* doctests
* Update src/transformers/models/idefics2/configuration_idefics2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/clip/test_modeling_clip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/idefics2/test_modeling_idefics2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/idefics2/test_modeling_idefics2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/configuration_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* major update, no need for automap
* clean up
* add FA2 test
* more tests
* style
* skip tests
* why did these started failing now?
* no attributes for FA2 needed
* one tiny test
* address comment about FA2 false warning
* style
* add new models and resolve conflicts
* fix copies
* let it be this way for now, come back tomorrow to review
* some more fixes
* update
* more updates
* update
* fix copies
* style and tests
* another big update
* fix tests
* fix tests
* update
* another update
* fix tests
* fix copies
* fix tests
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* mistral qna start
* mixtral qna
* oops
* qwen2 qna
* qwen2moe qna
* add missing input embed methods
* add copied to all methods, can't directly from llama due to the prefix
* make top level copied from
* add sdpa to OPT
* chore: remove redundant whitespace in OPTDecoder class
* fixup
* bug fix
* add sdpa and attention generate test
* fixup
* Refactor OPTAttention forward method for improved readability and maintainability
* undo refactor for _shape and key,val states
* add OPT to doc, fixup didn't find it for some reason
* change order
* change default attn_implemntation in testing to eager
* [run-slow] opt
* change test_eager_matches_sdpa_generate to the one llama
* Update default attention implementation in testing common
* [run-slow] opt
* remove uneeded print
* [run-slow] opt
* refactor model testers to have attn_implementation="eager"
* [run-slow] opt
* convert test_eager_matches_sdpa_generate to opt-350M
* bug fix when creating mask for opt
* [run-slow] opt
* if layer head mask default to eager
* if head mask is not none fall to eager
* [run-slow] opt
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Clean up Unpack imports (#33631)
clean up Unpack imports
* Fix DPT /Dinov2 sdpa regression on main (#33660)
* fallback to eager if output attentions.
* fix copies
* handle dependency errors in check_imports (#33622)
* handle dependency errors in check_imports
* change log level to warning
* add back self.max_position_embeddings = config.max_position_embeddings (#33550)
* add back self.max_position_embeddings = config.max_position_embeddings
* fix-copies
* Fix Llava conversion for LlavaQwen2ForCausalLM with Clip vision tower (#33613)
fix llavaqwen2 model conversion
* Uniformize kwargs for Udop processor and update docs (#33628)
* Add optional kwargs and uniformize udop
* cleanup Unpack
* nit Udop
* Generation: deprecate `PreTrainedModel` inheriting from `GenerationMixin` (#33203)
* Enable BNB multi-backend support (#31098)
* enable cpu bnb path
* fix style
* fix code style
* fix 4 bit path
* Update src/transformers/utils/import_utils.py
Co-authored-by: Aarni Koskela <akx@iki.fi>
* add multi backend refactor tests
* fix style
* tweak 4bit quantizer + fix corresponding tests
* tweak 8bit quantizer + *try* fixing corresponding tests
* fix dequant bnb 8bit
* account for Intel CPU in variability of expected outputs
* enable cpu and xpu device map
* further tweaks to account for Intel CPU
* fix autocast to work with both cpu + cuda
* fix comments
* fix comments
* switch to testing_utils.torch_device
* allow for xpu in multi-gpu tests
* fix tests 4bit for CPU NF4
* fix bug with is_torch_xpu_available needing to be called as func
* avoid issue where test reports attr err due to other failure
* fix formatting
* fix typo from resolving of merge conflict
* polish based on last PR review
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* fix CI
* Update src/transformers/integrations/integration_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/integrations/integration_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix error log
* fix error msg
* add \n in error log
* make quality
* rm bnb cuda restriction in doc
* cpu model don't need dispatch
* fix doc
* fix style
* check cuda avaliable in testing
* fix tests
* Update docs/source/en/model_doc/chameleon.md
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update docs/source/en/model_doc/llava_next.md
Co-authored-by: Aarni Koskela <akx@iki.fi>
* Update tests/quantization/bnb/test_4bit.py
Co-authored-by: Aarni Koskela <akx@iki.fi>
* Update tests/quantization/bnb/test_4bit.py
Co-authored-by: Aarni Koskela <akx@iki.fi>
* fix doc
* fix check multibackends
* fix import sort
* remove check torch in bnb
* docs: update bitsandbytes references with multi-backend info
* docs: fix small mistakes in bnb paragraph
* run formatting
* reveret bnb check
* move bnb multi-backend check to import_utils
* Update src/transformers/utils/import_utils.py
Co-authored-by: Aarni Koskela <akx@iki.fi>
* fix bnb check
* minor fix for bnb
* check lib first
* fix code style
* Revert "run formatting"
This reverts commit ac108c6d6b.
* fix format
* give warning when bnb version is low and no cuda found]
* fix device assignment check to be multi-device capable
* address akx feedback on get_avlbl_dev fn
* revert partially, as we don't want the function that public, as docs would be too much (enforced)
---------
Co-authored-by: Aarni Koskela <akx@iki.fi>
Co-authored-by: Titus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Fix error string after refactoring into get_chat_template (#33652)
* Fix error string after refactoring into get_chat_template
* Take suggestion from CR
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
---------
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* uniformize git processor (#33668)
* uniformize git processor
* update doctring
* Modular `transformers`: modularity and inheritance for new model additions (#33248)
* update exampel
* update
* push the converted diff files for testing and ci
* correct one example
* fix class attributes and docstring
* nits
* oups
* fixed config!
* update
* nitd
* class attributes are not matched against the other, this is missing
* fixed overwriting self.xxx now onto the attributes I think
* partial fix, now order with docstring
* fix docstring order?
* more fixes
* update
* fix missing docstrings!
* examples don't all work yet
* fixup
* nit
* updated
* hick
* update
* delete
* update
* update
* update
* fix
* all default
* no local import
* fix more diff
* some fix related to "safe imports"
* push fixed
* add helper!
* style
* add a check
* all by default
* add the
* update
* FINALLY!
* nit
* fix config dependencies
* man that is it
* fix fix
* update diffs
* fix the last issue
* re-default to all
* alll the fixes
* nice
* fix properties vs setter
* fixup
* updates
* update dependencies
* make sure to install what needs to be installed
* fixup
* quick fix for now
* fix!
* fixup
* update
* update
* updates
* whitespaces
* nit
* fix
* simplify everything, and make it file agnostic (should work for image processors)
* style
* finish fixing all import issues
* fixup
* empty modeling should not be written!
* Add logic to find who depends on what
* update
* cleanup
* update
* update gemma to support positions
* some small nits
* this is the correct docstring for gemma2
* fix merging of docstrings
* update
* fixup
* update
* take doc into account
* styling
* update
* fix hidden activation
* more fixes
* final fixes!
* fixup
* fixup instruct blip video
* update
* fix bugs
* align gemma2 with the rest as well
* updats
* revert
* update
* more reversiom
* grind
* more
* arf
* update
* order will matter
* finish del stuff
* update
* rename to modular
* fixup
* nits
* update makefile
* fixup
* update order of the checks!
* fix
* fix docstring that has a call inside
* fiix conversion check
* style
* add some initial documentation
* update
* update doc
* some fixup
* updates
* yups
* Mostly todo gimme a minut
* update
* fixup
* revert some stuff
* Review docs for the modular transformers (#33472)
Docs
* good update
* fixup
* mmm current updates lead to this code
* okay, this fixes it
* cool
* fixes
* update
* nit
* updates
* nits
* fix doc
* update
* revert bad changes
* update
* updates
* proper update
* update
* update?
* up
* update
* cool
* nits
* nits
* bon bon
* fix
* ?
* minimise changes
* update
* update
* update
* updates?
* fixed gemma2
* kind of a hack
* nits
* update
* remove `diffs` in favor of `modular`
* fix make fix copies
---------
Co-authored-by: Lysandre Debut <hi@lysand.re>
* Fix CIs post merging modular transformers (#33681)
update
* Fixed docstring for cohere model regarding unavailability of prune_he… (#33253)
* Fixed docstring for cohere model regarding unavailability of prune_head() methods
The docstring mentions that cohere model supports prune_heads() methods. I have fixed the docstring by explicitly mentioning that it doesn't support that functionality.
* Update src/transformers/models/cohere/modeling_cohere.py
---------
Co-authored-by: Lysandre Debut <hi@lysand.re>
* Generation tests: update imagegpt input name, remove unused functions (#33663)
* Improve Error Messaging for Flash Attention 2 on CPU (#33655)
Update flash-attn error message on CPU
Rebased to latest branch
* Gemma2: fix config initialization (`cache_implementation`) (#33684)
* Fix ByteLevel alphabet missing when Sequence pretokenizer is used (#33556)
* Fix ByteLevel alphabet missing when Sequence pretokenizer is used
* Fixed formatting with `ruff`.
* Uniformize kwargs for image-text-to-text processors (#32544)
* uniformize FUYU processor kwargs
* Uniformize instructblip processor kwargs
* Fix processor kwargs and tests Fuyu, InstructBlip, Kosmos2
* Uniformize llava_next processor
* Fix save_load test for processor with chat_template only as extra init args
* Fix import Unpack
* Fix Fuyu Processor import
* Fix FuyuProcessor import
* Fix FuyuProcessor
* Add defaults for specific kwargs kosmos2
* Fix Udop to return BatchFeature instead of BatchEncoding and uniformize kwargs
* Add tests processor Udop
* remove Copied from in processing Udop as change of input orders caused by BatchEncoding -> BatchFeature
* Fix overwrite tests kwargs processors
* Add warnings and BC for changes in processor inputs order, change docs, add BC for text_pair as arg for Udop
* Fix processing test fuyu
* remove unnecessary pad_token check in instructblip ProcessorTest
* Fix BC tests and cleanup
* FIx imports fuyu
* Uniformize Pix2Struct
* Fix wrong name for FuyuProcessorKwargs
* Fix slow tests reversed inputs align fuyu llava-next, change udop warning
* Fix wrong logging import udop
* Add check images text input order
* Fix copies
* change text pair handling when positional arg
* rebase on main, fix imports in test_processing_common
* remove optional args and udop uniformization from this PR
* fix failing tests
* remove unnecessary test, fix processing utils and test processing common
* cleanup Unpack
* cleanup
* fix conflict grounding dino
* 🚨🚨 Setting default behavior of assisted decoding (#33657)
* tests: fix pytorch tensor placement errors (#33485)
This commit fixes the following errors:
* Fix "expected all tensors to be on the same device" error
* Fix "can't convert device type tensor to numpy"
According to pytorch documentation torch.Tensor.numpy(force=False)
performs conversion only if tensor is on CPU (plus few other restrictions)
which is not the case. For our case we need force=True since we just
need a data and don't care about tensors coherency.
Fixes: #33517
See: https://pytorch.org/docs/2.4/generated/torch.Tensor.numpy.html
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
* bump tokenizers, fix added tokens fast (#32535)
* update based on tokenizers release
* update
* nits
* update
* revert re addition
* don't break that yet
* fmt
* revert unwanted
* update tokenizers version
* update dep table
* update
* update in conversion script as well
* some fix
* revert
* fully revert
* fix training
* remove set trace
* fixup
* update
* update
* [Pixtral] Improve docs, rename model (#33491)
* Improve docs, rename model
* Fix style
* Update repo id
* fix code quality after merge
* HFQuantizer implementation for compressed-tensors library (#31704)
* Add compressed-tensors HFQuantizer implementation
* flag serializable as False
* run
* revive lines deleted by ruff
* fixes to load+save from sparseml, edit config to quantization_config, and load back
* address satrat comment
* compressed_tensors to compressed-tensors and revert back is_serializable
* rename quant_method from sparseml to compressed-tensors
* tests
* edit tests
* clean up tests
* make style
* cleanup
* cleanup
* add test skip for when compressed tensors is not installed
* remove pydantic import + style
* delay torch import in test
* initial docs
* update main init for compressed tensors config
* make fix-copies
* docstring
* remove fill_docstring
* Apply suggestions from code review
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* review comments
* review comments
* comments - suppress warnings on state dict load, tests, fixes
* bug-fix - remove unnecessary call to apply quant lifecycle
* run_compressed compatability
* revert changes not needed for compression
* no longer need unexpected keys fn
* unexpected keys not needed either
* Apply suggestions from code review
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* add to_diff_dict
* update docs and expand testing
* Update _toctree.yml with compressed-tensors
* Update src/transformers/utils/quantization_config.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* update doc
* add note about saving a loaded model
---------
Co-authored-by: George Ohashi <george@neuralmagic.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Sara Adkins <sara@neuralmagic.com>
Co-authored-by: Sara Adkins <sara.adkins65@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Dipika Sikka <ds3822@columbia.edu>
Co-authored-by: Dipika <dipikasikka1@gmail.com>
* update model card for opt
* add batch size to inference table
* [slow-run] opt
* [run-slow] opt
---------
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Co-authored-by: Avishai Elmakies <avishai.elma@cs.huji.ac.il>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: chengchengpei <5881383+chengchengpei@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Aarni Koskela <akx@iki.fi>
Co-authored-by: Titus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Tibor Reiss <75096465+tibor-reiss@users.noreply.github.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
Co-authored-by: Muhammad Naufil <m.naufil1@gmail.com>
Co-authored-by: sizhky <yyeshr@gmail.com>
Co-authored-by: Umar Butler <umar@umar.au>
Co-authored-by: Jonathan Mamou <jonathan.mamou@intel.com>
Co-authored-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Benjamin Fineran <bfineran@users.noreply.github.com>
Co-authored-by: George Ohashi <george@neuralmagic.com>
Co-authored-by: Sara Adkins <sara@neuralmagic.com>
Co-authored-by: Sara Adkins <sara.adkins65@gmail.com>
Co-authored-by: Dipika Sikka <ds3822@columbia.edu>
Co-authored-by: Dipika <dipikasikka1@gmail.com>
Add Translate docs into Arabic - section files CONCEPTUAL GUIDES
---------------------------------------------------------------------------------------
Philosophy [i18n-ar] Translated file : docs/source/ar/philosophy.md into Arabic #33064
Glossary [i18n-ar] Translated file : docs/source/ar/glossary.md into Arabic #33038
What 🤗 Transformers can do [i18n-ar] Translated file : docs/source/ar/task_summary.md into Arabic #33073
How 🤗 Transformers solve tasks [i18n-ar] Translated file : docs/source/ar/tasks_explained.md into Arabic #33074
The Transformer model family [i18n-ar] Translated file : docs/source/ar/model_summary.md into Arabic #33047
Summary of the tokenizers [i18n-ar] Translated file : docs/source/ar/tokenizer_summary.md into Arabic #33078
Attention [i18n-ar] Translated file : docs/source/ar/attention.md into Arabic #33021
Padding and truncation [i18n-ar] Translated file : docs/source/ar/pad_truncation.md into Arabic #33050
BERTology [i18n-ar] Translated file : docs/source/ar/bertology.md into Arabic #33024
Perplexity of fixed-length models [i18n-ar] Translated file : docs/source/ar/perplexity.md into Arabic #33063
Pipelines for webserver inference [i18n-ar] Translated file : docs/source/ar/pipeline_webserver.md into Arabic #33066
Model training anatomy [i18n-ar] Translated file : docs/source/ar/model_memory_anatomy.md into Arabic #33045
Getting the most out of LLMs [i18n-ar] Translated file : docs/source/ar/llm_tutorial_optimization.md into Arabic #33043
* rebasing changes
* fixing style
* adding some doc to functions
* remove bitblas
* change dtype
* fixing check_code_quality
* fixing import order
* adding doc to tree
* Small update on BitLinear
* adding some tests
* sorting imports
* small update
* reformatting
* reformatting
* reformatting with ruff
* adding assert
* changes after review
* update disk offloading
* adapting after review
* Update after review
* add is_serializable back
* fixing style
* adding serialization test
* make style
* small updates after review
* improve modular
* style
* Update modular_model_converter.py
* pretty print warning
* style
* Support to remove unused classes as part of added dependencies as well
* nits
* correct bug
* add example
* style
* Add documentation
* Add Auto model for image-text-to-text
* Remove donut from processing auto, add chameleon ti image text to text models
* add qwen2_vl and llava_onevision
* add pixtral to auto model for image-text-to-text
* add mllama and idefics3
* remove models in IGNORE_NON_AUTO_CONFIGURED
* add AutoModelForImageTextToText to tests and doc
* docs: add example for separating q, k, v projections in SAM
* docs: How to Hack Any Transformers Model
* docs: remove changes from sam model docs
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Initial commit for MyT5 model
* custom implementation of MyT5 tokenizer, unused files deleted
* unittest for myt5 tokenizer
* upadate of import structure and style
* removed remmanents of MyT5Config
* fixed docstrings
* Updates after review: filled documentaion file, new docstrings and tests added
* Fixed code style issues
* fixed copied from to refer to function
* updated loading myt5 tokenizer in tests, added sample byte map file to fixtures
* changes after review
* removed redundant copied from
* removed redundant copied from
* optimalization and loading model from hf
* [run_slow] myt5
* [run-slow] myt5
* Updated en documentation for myt5
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* onboard phimoe model
* removed debug code
* added unit tests
* updated docs
* formatted
* fixed unit tests
* fixed test case
* fixed format
* refactored code
* fixed expected outputs in the integration tests
* Added a warning msg
* Addressed comments
* Addressed comments
* fixed test cases
* added paper link
* Addressed comments
* Refactored PhimoeForCausalLM forward fn
* Refactored PhimoeRotaryEmbedding class
* fixed test cases
* fixed testcase
* fixed test case
* Addressed comments
* fixed test cases
* fixed testcases
* Used cache position instead to get the seq len
* enable cpu awq ipex linear
* add doc for cpu awq with ipex kernel
* add tests for cpu awq
* fix code style
* fix doc and tests
* Update docs/source/en/quantization/awq.md
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update tests/quantization/autoawq/test_awq.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* fix comments
* fix log
* fix log
* fix style
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Trainer - deprecate tokenizer for processing_class
* Extend chage across Seq2Seq trainer and docs
* Add tests
* Update to FutureWarning and add deprecation version
* HQQ model serialization attempt
* fix hqq dispatch and unexpected keys
* style
* remove check_old_param
* revert to check HQQLinear in quantizer_hqq.py
* revert to check HQQLinear in quantizer_hqq.py
* update HqqConfig default params
* make ci happy
* make ci happy
* revert to HQQLinear check in quantizer_hqq.py
* check hqq_min version 0.2.0
* set axis=1 as default in quantization_config.py
* validate_env with hqq>=0.2.0 version message
* deprecated hqq kwargs message
* make ci happy
* remove run_expected_keys_check hack + bump to 0.2.1 min hqq version
* fix unexpected_keys hqq update
* add pre_quantized check
* add update_expected_keys to base quantizerr
* ci base.py fix?
* ci base.py fix?
* fix "quantization typo" src/transformers/utils/quantization_config.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix post merge
---------
Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Enable non-safetensor serialization and deserialization for TorchAoConfig quantized model
Summary:
After https://github.com/huggingface/huggingface_hub/pull/2440 we added non-safetensor serialization and deserialization
in huggingface, with this we can now add the support in transformers
Note that we don't plan to add safetensor serialization due to different goals of wrapper tensor subclass and safetensor
see README for more details
Test Plan:
tested locally
Reviewers:
Subscribers:
Tasks:
Tags:
* formatting
* formatting
* minor fix
* formatting
* address comments
* comments
* minor fix
* update doc
* refactor compressed tensor quantizer
* add bloom arch support for gguf
* apply format
* small refactoring, bug fix in GGUF_TENSOR_MAPPING naming
* optimize bloom GGUF_TENSOR_MAPPING
* implement reverse reshaping for bloom gguf
* add qkv weights test
* add q_8 test for bloom
Update siglip.md
This was already partially fixed relative to the deployed docs. But the partial fix made it inconsistent. Additionally, giving the full text ("This is a photo of...") is likely not the desired output.
* Add Idefics 3!
* fixes to make both pipelines identical
* fix for quantized models
* First pass at the review
* remove vocab size from the main config (it's still in the text_config)
* hot fix for merve
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* re-add model_type for text_config
* remove support for old_cache
* remove hidden_size from main config
* rename idefics3 HF repo
* few changes suggested in the PR
* fix to input_data_format computation
* remove overwrite of _autoset_attn_implementation following @zucchini-nlp suggestion
* improve example
* few improvements from amy's review
* big change to enable processing input images as numpy arrays
* Changes to the code to uniformize processor kwargs
* image processing tests
* image processing tests fixes and some bugs they discovered
* addressed review comments from Yoni
* fix modeling tests
* remove special tokens that are not special
* fixes tests
* skip failing tests - they also fail for idefics2
* added paper and readded the tests with multi gpu, who knows
* Update docs/source/en/model_doc/idefics3.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* review amy until image_processing_idefics3
* last comments from Amy
* review amy
* Update src/transformers/models/idefics3/image_processing_idefics3.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/idefics3/modeling_idefics3.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/idefics3.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* doc improvement - amy review
* fix runtime error during fine-tuning
* amy's review
* Update src/transformers/models/idefics3/image_processing_idefics3.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/idefics3/image_processing_idefics3.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/idefics3/modeling_idefics3.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* ruff
* amy's comment on the order
* ruff ruff
* fix copies
* square images when they are not splitted
* ruff :(
* Update src/transformers/models/idefics3/image_processing_idefics3.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/idefics3/test_processing_idefics3.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix small bug introduced in refactor
* amy's image processing changes
* fixes peft tests and ruff
* modify to_pil_image from transformers. and review from emanuele.
* add modified to_pil_image
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Add compressed-tensors HFQuantizer implementation
* flag serializable as False
* run
* revive lines deleted by ruff
* fixes to load+save from sparseml, edit config to quantization_config, and load back
* address satrat comment
* compressed_tensors to compressed-tensors and revert back is_serializable
* rename quant_method from sparseml to compressed-tensors
* tests
* edit tests
* clean up tests
* make style
* cleanup
* cleanup
* add test skip for when compressed tensors is not installed
* remove pydantic import + style
* delay torch import in test
* initial docs
* update main init for compressed tensors config
* make fix-copies
* docstring
* remove fill_docstring
* Apply suggestions from code review
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* review comments
* review comments
* comments - suppress warnings on state dict load, tests, fixes
* bug-fix - remove unnecessary call to apply quant lifecycle
* run_compressed compatability
* revert changes not needed for compression
* no longer need unexpected keys fn
* unexpected keys not needed either
* Apply suggestions from code review
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* add to_diff_dict
* update docs and expand testing
* Update _toctree.yml with compressed-tensors
* Update src/transformers/utils/quantization_config.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* update doc
* add note about saving a loaded model
---------
Co-authored-by: George Ohashi <george@neuralmagic.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Sara Adkins <sara@neuralmagic.com>
Co-authored-by: Sara Adkins <sara.adkins65@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Dipika Sikka <ds3822@columbia.edu>
Co-authored-by: Dipika <dipikasikka1@gmail.com>
* update exampel
* update
* push the converted diff files for testing and ci
* correct one example
* fix class attributes and docstring
* nits
* oups
* fixed config!
* update
* nitd
* class attributes are not matched against the other, this is missing
* fixed overwriting self.xxx now onto the attributes I think
* partial fix, now order with docstring
* fix docstring order?
* more fixes
* update
* fix missing docstrings!
* examples don't all work yet
* fixup
* nit
* updated
* hick
* update
* delete
* update
* update
* update
* fix
* all default
* no local import
* fix more diff
* some fix related to "safe imports"
* push fixed
* add helper!
* style
* add a check
* all by default
* add the
* update
* FINALLY!
* nit
* fix config dependencies
* man that is it
* fix fix
* update diffs
* fix the last issue
* re-default to all
* alll the fixes
* nice
* fix properties vs setter
* fixup
* updates
* update dependencies
* make sure to install what needs to be installed
* fixup
* quick fix for now
* fix!
* fixup
* update
* update
* updates
* whitespaces
* nit
* fix
* simplify everything, and make it file agnostic (should work for image processors)
* style
* finish fixing all import issues
* fixup
* empty modeling should not be written!
* Add logic to find who depends on what
* update
* cleanup
* update
* update gemma to support positions
* some small nits
* this is the correct docstring for gemma2
* fix merging of docstrings
* update
* fixup
* update
* take doc into account
* styling
* update
* fix hidden activation
* more fixes
* final fixes!
* fixup
* fixup instruct blip video
* update
* fix bugs
* align gemma2 with the rest as well
* updats
* revert
* update
* more reversiom
* grind
* more
* arf
* update
* order will matter
* finish del stuff
* update
* rename to modular
* fixup
* nits
* update makefile
* fixup
* update order of the checks!
* fix
* fix docstring that has a call inside
* fiix conversion check
* style
* add some initial documentation
* update
* update doc
* some fixup
* updates
* yups
* Mostly todo gimme a minut
* update
* fixup
* revert some stuff
* Review docs for the modular transformers (#33472)
Docs
* good update
* fixup
* mmm current updates lead to this code
* okay, this fixes it
* cool
* fixes
* update
* nit
* updates
* nits
* fix doc
* update
* revert bad changes
* update
* updates
* proper update
* update
* update?
* up
* update
* cool
* nits
* nits
* bon bon
* fix
* ?
* minimise changes
* update
* update
* update
* updates?
* fixed gemma2
* kind of a hack
* nits
* update
* remove `diffs` in favor of `modular`
* fix make fix copies
---------
Co-authored-by: Lysandre Debut <hi@lysand.re>
* enable cpu bnb path
* fix style
* fix code style
* fix 4 bit path
* Update src/transformers/utils/import_utils.py
Co-authored-by: Aarni Koskela <akx@iki.fi>
* add multi backend refactor tests
* fix style
* tweak 4bit quantizer + fix corresponding tests
* tweak 8bit quantizer + *try* fixing corresponding tests
* fix dequant bnb 8bit
* account for Intel CPU in variability of expected outputs
* enable cpu and xpu device map
* further tweaks to account for Intel CPU
* fix autocast to work with both cpu + cuda
* fix comments
* fix comments
* switch to testing_utils.torch_device
* allow for xpu in multi-gpu tests
* fix tests 4bit for CPU NF4
* fix bug with is_torch_xpu_available needing to be called as func
* avoid issue where test reports attr err due to other failure
* fix formatting
* fix typo from resolving of merge conflict
* polish based on last PR review
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* fix CI
* Update src/transformers/integrations/integration_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/integrations/integration_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix error log
* fix error msg
* add \n in error log
* make quality
* rm bnb cuda restriction in doc
* cpu model don't need dispatch
* fix doc
* fix style
* check cuda avaliable in testing
* fix tests
* Update docs/source/en/model_doc/chameleon.md
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update docs/source/en/model_doc/llava_next.md
Co-authored-by: Aarni Koskela <akx@iki.fi>
* Update tests/quantization/bnb/test_4bit.py
Co-authored-by: Aarni Koskela <akx@iki.fi>
* Update tests/quantization/bnb/test_4bit.py
Co-authored-by: Aarni Koskela <akx@iki.fi>
* fix doc
* fix check multibackends
* fix import sort
* remove check torch in bnb
* docs: update bitsandbytes references with multi-backend info
* docs: fix small mistakes in bnb paragraph
* run formatting
* reveret bnb check
* move bnb multi-backend check to import_utils
* Update src/transformers/utils/import_utils.py
Co-authored-by: Aarni Koskela <akx@iki.fi>
* fix bnb check
* minor fix for bnb
* check lib first
* fix code style
* Revert "run formatting"
This reverts commit ac108c6d6b.
* fix format
* give warning when bnb version is low and no cuda found]
* fix device assignment check to be multi-device capable
* address akx feedback on get_avlbl_dev fn
* revert partially, as we don't want the function that public, as docs would be too much (enforced)
---------
Co-authored-by: Aarni Koskela <akx@iki.fi>
Co-authored-by: Titus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add sdpa to dinov2
* fixup
* add dinov2 to sdpa doc
* update doc order
* [run-slow] dinov2
* common to eager
* [run-slow] dinov2
* update attn implementation in common
* update test_modeling_dinov2 to have mask_ration, num_masks and mask_length similar to vit
* [run-slow] dinov2
---------
Co-authored-by: Avishai Elmakies <avishai.elma@cs.huji.ac.il>
* clean mimi commit
* some nits suggestions from Arthur
* make fixup
* rename repo id + change readme
* Update docs/source/en/model_doc/mimi.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add flaky flag to batching equivalence due to audio_codes failing sometimes
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Add explicit example for RAG chat templating
* Add Tip box and reformulate
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
---------
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Update ar lang build_documentation.yml
* Update ar lang build_pr_documentation.yml
* Update docs/source/ar/pipeline_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/pipeline_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/pipeline_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/pipeline_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/pipeline_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/pipeline_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/pipeline_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/pipeline_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/pipeline_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/pipeline_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/pipeline_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/pipeline_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/pipeline_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/pipeline_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/pipeline_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/pipeline_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/pipeline_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/pipeline_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/pipeline_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/pipeline_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/pipeline_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/pipeline_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/pipeline_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/pipeline_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/pipeline_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/autoclass_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/autoclass_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/autoclass_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/autoclass_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/autoclass_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/autoclass_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/autoclass_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/autoclass_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/autoclass_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/autoclass_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/autoclass_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/autoclass_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/autoclass_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/autoclass_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/preprocessing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/preprocessing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/preprocessing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/preprocessing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/preprocessing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/preprocessing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/preprocessing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/preprocessing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/preprocessing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/preprocessing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/preprocessing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/preprocessing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/preprocessing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/preprocessing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/preprocessing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/preprocessing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/preprocessing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/preprocessing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/preprocessing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/preprocessing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/preprocessing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/preprocessing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/preprocessing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/preprocessing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/preprocessing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/preprocessing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/preprocessing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/preprocessing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/preprocessing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/preprocessing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/preprocessing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/preprocessing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/preprocessing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/preprocessing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/preprocessing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/preprocessing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/preprocessing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/preprocessing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/preprocessing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/preprocessing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/training.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/training.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/training.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/training.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/training.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/training.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/training.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/training.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/training.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/training.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/training.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/training.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/training.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/training.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/training.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/training.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/training.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/training.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/training.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/training.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/run_scripts.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/run_scripts.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/run_scripts.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/run_scripts.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/run_scripts.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/run_scripts.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/run_scripts.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/accelerate.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/accelerate.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/accelerate.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/accelerate.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/accelerate.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/accelerate.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Create _config.py
* Update _toctree.yml
* Update _toctree.yml
* Update docs/source/ar/peft.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/peft.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/peft.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/peft.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/peft.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/peft.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/peft.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/peft.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/peft.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update _toctree.yml
* Update docs/source/ar/model_sharing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/model_sharing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/model_sharing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/model_sharing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/model_sharing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/model_sharing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/model_sharing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/model_sharing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/model_sharing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/model_sharing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/model_sharing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/model_sharing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/model_sharing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/model_sharing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/model_sharing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/model_sharing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/model_sharing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/model_sharing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/model_sharing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/model_sharing.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/conversations.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/conversations.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/conversations.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/conversations.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/conversations.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/conversations.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/conversations.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/conversations.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/conversations.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/conversations.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/conversations.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/conversations.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/conversations.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/conversations.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/conversations.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/agents.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/agents.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/agents.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/agents.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/agents.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/agents.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/agents.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/agents.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/agents.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/agents.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/agents.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/agents.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/agents.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/agents.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/agents.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/agents.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/agents.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/agents.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/agents.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/agents.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/agents.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/agents.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/agents.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/agents.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/agents.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/agents.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/agents.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/agents.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/agents.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/agents.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/agents.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/agents.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/agents.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/agents.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/agents.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/agents.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/agents.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/agents.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/llm_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/llm_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/llm_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/llm_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/llm_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/llm_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/llm_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/llm_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/llm_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/llm_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/llm_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/llm_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/llm_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/llm_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/llm_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/llm_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/llm_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/llm_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/llm_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/llm_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/llm_tutorial.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update llm_tutorial.md
* Update _toctree.yml
* Update autoclass_tutorial.md
* Update autoclass_tutorial.md
* Update preprocessing.md
* Update glossary.md
* Update run_scripts.md
* Update run_scripts.md
* Update run_scripts.md
---------
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* initial commit
* gloups
* updates
* work
* weights match
* nits
* nits
* updates to support the tokenizer :)
* updates
* Pixtral processor (#33454)
* rough outline
* Add in image break and end tokens
* Fix
* Udo some formatting changes
* Set patch_size default
* Fix
* Fix token expansion
* nit in conversion script
* Fix image token list creation
* done
* add expected results
* Process list of list of images (#33465)
* updates
* working image and processor
* this is the expected format
* some fixes
* push current updated
* working mult images!
* add a small integration test
* Uodate configuration docstring
* Formatting
* Config docstring fix
* simplify model test
* fixup modeling and etests
* Return BatchMixFeature in image processor
* fix some copies
* update
* nits
* Update model docstring
* Apply suggestions from code review
* Fix up
* updates
* revert modeling changes
* update
* update
* fix load safe
* addd liscence
* update
* use pixel_values as required by the model
* skip some tests and refactor
* Add pixtral image processing tests (#33476)
* Image processing tests
* Add processing tests
* woops
* defaults reflect pixtral image processor
* fixup post merge
* images -> pixel values
* oups sorry Mr docbuilder
* isort
* fix
* fix processor tests
* small fixes
* nit
* update
* last nits
* oups this was really breaking!
* nits
* is composition needs to be true
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add xpu note
* add one more case
* add more
* Update docs/source/en/run_scripts.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Make StaticCache configurable at model construct time
* integrations import structure
* add new doc file to toc
---------
Co-authored-by: Guang Yang <guangyang@fb.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
* Update docs for GGUF supported models
* Add tensor mappings and define class GGUFPhi3Converter
* Fix tokenizer
* Working version
* Attempt to fix some CI failures
* Run ruff format
* Add vocab, merges, decoder methods like LlamaConverter
* Resolve conflicts since Qwen2Moe was added to gguf
- I missed one place when resolving conflict
- I also made a mistake with tests_ggml.py and now has been fixed to reflect
its master version.
* Fixed typo: insted to instead
* Fixed typo: relase to release
* Fixed typo: nighlty to nightly
* Fixed typos: versatible, benchamarks, becnhmark to versatile, benchmark, benchmarks
* Fixed typo in comment: quantizd to quantized
* Fixed typo: architecutre to architecture
* Fixed typo: contibution to contribution
* Fixed typo: Presequities to Prerequisites
* Fixed typo: faste to faster
* Fixed typo: extendeding to extending
* Fixed typo: segmetantion_maps to segmentation_maps
* Fixed typo: Alternativelly to Alternatively
* Fixed incorrectly defined variable: output to output_disabled
* Fixed typo in library name: tranformers.onnx to transformers.onnx
* Fixed missing import: import tensorflow as tf
* Fixed incorrectly defined variable: token_tensor to tokens_tensor
* Fixed missing import: import torch
* Fixed incorrectly defined variable and typo: uromaize to uromanize
* Fixed incorrectly defined variable and typo: uromaize to uromanize
* Fixed typo in function args: numpy.ndarry to numpy.ndarray
* Fixed Inconsistent Library Name: Torchscript to TorchScript
* Fixed Inconsistent Class Name: OneformerProcessor to OneFormerProcessor
* Fixed Inconsistent Class Named Typo: TFLNetForMultipleChoice to TFXLNetForMultipleChoice
* Fixed Inconsistent Library Name Typo: Pytorch to PyTorch
* Fixed Inconsistent Function Name Typo: captureWarning to captureWarnings
* Fixed Inconsistent Library Name Typo: Pytorch to PyTorch
* Fixed Inconsistent Class Name Typo: TrainingArgument to TrainingArguments
* Fixed Inconsistent Model Name Typo: Swin2R to Swin2SR
* Fixed Inconsistent Model Name Typo: EART to BERT
* Fixed Inconsistent Library Name Typo: TensorFLow to TensorFlow
* Fixed Broken Link for Speech Emotion Classification with Wav2Vec2
* Fixed minor missing word Typo
* Fixed minor missing word Typo
* Fixed minor missing word Typo
* Fixed minor missing word Typo
* Fixed minor missing word Typo
* Fixed minor missing word Typo
* Fixed minor missing word Typo
* Fixed minor missing word Typo
* Fixed Punctuation: Two commas
* Fixed Punctuation: No Space between XLM-R and is
* Fixed Punctuation: No Space between [~accelerate.Accelerator.backward] and method
* Added backticks to display model.fit() in codeblock
* Added backticks to display openai-community/gpt2 in codeblock
* Fixed Minor Typo: will to with
* Fixed Minor Typo: is to are
* Fixed Minor Typo: in to on
* Fixed Minor Typo: inhibits to exhibits
* Fixed Minor Typo: they need to it needs
* Fixed Minor Typo: cast the load the checkpoints To load the checkpoints
* Fixed Inconsistent Class Name Typo: TFCamembertForCasualLM to TFCamembertForCausalLM
* Fixed typo in attribute name: outputs.last_hidden_states to outputs.last_hidden_state
* Added missing verbosity level: fatal
* Fixed Minor Typo: take To takes
* Fixed Minor Typo: heuristic To heuristics
* Fixed Minor Typo: setting To settings
* Fixed Minor Typo: Content To Contents
* Fixed Minor Typo: millions To million
* Fixed Minor Typo: difference To differences
* Fixed Minor Typo: while extract To which extracts
* Fixed Minor Typo: Hereby To Here
* Fixed Minor Typo: addition To additional
* Fixed Minor Typo: supports To supported
* Fixed Minor Typo: so that benchmark results TO as a consequence, benchmark
* Fixed Minor Typo: a To an
* Fixed Minor Typo: a To an
* Fixed Minor Typo: Chain-of-though To Chain-of-thought
* use gguf internal dequantize
* add Q5_0 test
* add iq1 test
* add remained test
* remove duplicated test
* update docs
* add gguf version limit
* make style
* update gguf import catch
* revert vocab_size patch
* make style
* use GGUF_MIN_VERSION everywhere
* Adding SDPA support for RoBERTa-based models
* add not is_cross_attention
* fix copies
* fix test
* add minimal test for camembert and xlm_roberta as their test class does not inherit from ModelTesterMixin
* address some review comments
* use copied from
* style
* consistency
* fix lists
---------
Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add Blip2ForImageTextRetrieval
* use one line and remove unnecessary space in tests
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* use value from the config, rather than hardcoded
* change order of params in Blip2QFormerModel.forward
* update docstring
* fix style
* update test_inference_opt
* move embeddings out of Blip2QFormerModel
* remove from_vision_qformer_configs
* remove autocast float16 in Blip2QFormerModel
* rename fiels into vision_projection,text_projection,use_image_text_matching_head
* use CLIPOutput for Blip2ImageTextMatchingModelOutput
* remove past_key_values_length from Blip2TextEmbeddings
* fix small typo in the CLIPOutput docstring
* add Blip2ForImageTextRetrieval to Zero Shot Image Classification mapping
* update docstring and add require_torch_fp16
* rollback test_inference_opt
* use use_image_text_matching_head=True in convert
* skip test_model_get_set_embeddings
* fix create_rename_keys error on new itm fields
* revert to do scale after dot product between "query" and "key"
* fix ValueError on convert script for blip2-opt-2.7b
* update org of paths to Salesforce
* add is_pipeline_test_to_skip for VisualQuestionAnsweringPipelineTests
* [run_slow] blip_2
* removed Blip2ForImageTextRetrieval from IGNORE_NON_AUTO_CONFIGURED
* fix docstring of Blip2ImageTextMatchingModelOutput
* [run_slow] blip_2
* fix multi-gpu tests
* [run_slow] blip_2
* [run_slow] blip_2
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Very small change to one of the parameters
np.random.randint second parameter is not included in the possible options. Therefore, we want the upper range to be 2, so that we have some 1 labels in our classification as well.
* Add changes for uroman package to handle non-Roman characters
* Update docs for uroman changes
* Modifying error message to warning, for backward compatibility
* Update instruction for user to install uroman
* Update docs for uroman python version dependency and backward compatibility
* Update warning message for python version compatibility with uroman
* Refine docs
* Fix: fix all model_type of Llava-Next-Video to llava_next_video
* Fix doc for llava_next_video
* * Fix formatting issues
* Change llava-next-video.md file name into llava_next_video.md to make it compatible with implementation
* Fix docs TOC for llava-next-video
* more precise name
* better docstrings
* Update src/transformers/cache_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update the Kubernetes CPU training example
* Add namespace arg
Signed-off-by: Dina Suehiro Jones <dina.s.jones@intel.com>
---------
Signed-off-by: Dina Suehiro Jones <dina.s.jones@intel.com>
* Add TorchAOHfQuantizer
Summary:
Enable loading torchao quantized model in huggingface.
Test Plan:
local test
Reviewers:
Subscribers:
Tasks:
Tags:
* Fix a few issues
* style
* Added tests and addressed some comments about dtype conversion
* fix torch_dtype warning message
* fix tests
* style
* TorchAOConfig -> TorchAoConfig
* enable offload + fix memory with multi-gpu
* update torchao version requirement to 0.4.0
* better comments
* add torch.compile to torchao README, add perf number link
---------
Co-authored-by: Marc Sun <marc@huggingface.co>
* Rename "Templates for Chat Models" doc to "Chat Templates"
* Small formatting fix
* Small formatting fix
* Small formatting fix
* Cleanup tool calling docs as well
* Remove unneeded 'revision'
* Move tip to below main code example
* Little bonus section on template editing
* add new model like
* draft cuda forward - mismatched keys (sharding on conv1)
* match keys successfully
* fix split
* get generation/forward running (wrong gens, norm?)
* :update
* some refactoring
* fixes
* works up until copy to cache
* fix
* update
* NON WORKING VERSION
* version that work?
* nit
* fix config
* fix conversion script
* working cuda forward
* nit
* update
* simplifcation
* make mamba slow simple work
* no einops
* todo
* fix style
* no einops
* update fix no einsum
* nit
* remove einops
* bug: scan_output differs strongly
* add rms norm option
* fix fast + slow generation with and w/o cache ✔️
* draft integration tests
* remove a big chunk of the einsum
* fix slow, fast generations, without any einsum
* fix copies
* fix structure
* fix up modeling and tests
* fix tests
* clamping is indeed worse
* recover mamba2 cache test
* fix copies
* no cache position (yet)
* fix tf tests
* fix matmul for generate
* fixup
* skip cache tests for now
* [run-slow]mamba2
* tune out hidden states for padding
* test batched generation
* propagate attention mask changes
* fix past length
* fix integration test
* style
* address comments
* update readme
* add mamba2 version check
* fix tests
* [run-slow]mamba2
* skip edge tests
* [run-slow]mamba2
* last fixup
* [run-slow]mamba2
* update README
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
* Initial implementation of OffloadedCache
* enable usage via cache_implementation
* Address feedback, add tests, remove legacy methods.
* Remove flash-attn, discover synchronization bugs, fix bugs
* Prevent usage in CPU only mode
* Add a section about offloaded KV cache to the docs
* Fix typos in docs
* Clarifications and better explanation of streams
* mvp
* added test (a few models need fixes)
* fix a few test cases
* test nits
* harder test 😈
* revert changes in stablelm
* test with improved condition
* add todo
* tmp commit
* merged with main
* nits
* add todo
* final corrections
* add docs for generation compilation
* docs nits
* add tip
* PR suggestions
* add more details to the compilation docs
* fix cache positions
* cache is now init in generate; update docs
* tag test as flaky
* docs
* post rebase make fixup and other nits
* remove unintended changes
* whisper (encoder-decoder) not supported
* move token default updates to ; add tests for token defaults
* push changes
* manual rebase
* chameleon doesn't support this
* fix test_static_cache_mha_mqa_gqa (broken in another PR)
* docs: dynamic is better with end-to-end compilation
* No more default chat templates
* Add the template to the GPT-SW3 tests since it's not available by default now
* Fix GPT2 test
* Fix Bloom test
* Fix Bloom test
* Remove default templates again
* add DataCollatorBatchFlattening
* Update data_collator.py
* change name
* new FA2 flow if position_ids is provided
* add comments
* minor fix
* minor fix data collator
* add test cases for models
* add test case for data collator
* remove extra code
* formating for ruff check and check_repo.py
* ruff format
ruff format tests src utils
* custom_init_isort.py
* Add llama3-llava-next-8b to llava_next conversion script
Adds support for the lmms-lab/llama3-llava-next-8b model to the
convert_llava_next_weights_to_hf.py script, along with an example
prompt generated from the llava_llama_3 conv_template in the LLaVA-NeXT
repo.
* Exclude <|begin_of_text|> from prompt example
This token gets added automatically, so it should not be included in the
prompt example.
* Add llava-next-72b and llava-next-110b
Adds the Qwen-based LLaVA-Next models to the conversion script, along
with changes to load the models on multiple GPUs for inference.
* Add llama3 and qwen prompt formats to docs
* Chat prompt and padding side left for llama3 batched
* update
* Update src/transformers/models/llava_next/convert_llava_next_weights_to_hf.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/llava_next/convert_llava_next_weights_to_hf.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* remove code
* better naming
---------
Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* minor edits and clarifications
* address comment
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>