Commit Graph

17554 Commits

Author SHA1 Message Date
Cyril Vallez
d363e71d0e
🧹 Remove deprecated RotaryEmbedding parts in the Attention layers (#34858)
* update

* style

* fix missing args

* remove last trace of old rope classes

* remove deprecated copied from

* fix copies

* trigger CIs

* post rebase clean-up

* reverse mistral

* cleanup after dropping commits

* Add comment
2024-12-11 11:16:52 +01:00
Raushan Turganbay
9094b87dd4
BLIP: enable device map (#34850)
fix device map
2024-12-11 11:03:30 +01:00
HMJ0628
10feacd88a
[i18n-<languageCode>] Translating agents.md to Chinese (#35139)
* add "translate agents.md"

* add "agents.md"

* add "translate warnings"

* add "totree"

* add "remove transformer_agent"

* add "remove transformer _agent file"

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-12-10 15:16:37 -08:00
John Graham Reynolds
e8508924fd
Update data collator docstrings to accurately reference Nvidia tensor core compute capability version (#35188)
update data collator docs to reflect correct tensor core compute capability

Co-authored-by: John Graham Reynolds <john.graham.reynolds@vumc.org>
2024-12-10 15:16:01 -08:00
Steven Liu
5290f6a62d
[docs] Fix FlashAttention link (#35171)
fix link
2024-12-10 11:36:25 -08:00
French_Ball
91b8ab18b7
[i18n-<languageCode>] Translating Benchmarks.md to Chinese (#35137)
* add "Translating Benchmarks.md to Chinese "

* Removed all the English original text (which was previously kept as comments in the document) and refined some of the Chinese expressions.
2024-12-10 09:58:47 -08:00
Gaétan Lepage
217c47e31b
Only import torch.distributed if it is available (#35133) 2024-12-10 18:19:30 +01:00
Henry Hyeonmok Ko
52d135426f
Multiple typo fixes in NLP, Audio docs (#35181)
Fixed multiple typos in Tutorials, NLP, and Audio sections
2024-12-10 09:08:55 -08:00
Ahmed Almaghz
425af6cdc2
[i18n-ar] Translated file : docs/source/ar/community.md into Arabic (#33027)
* Add docs/source/ar/community.md to Add_docs_source_ar_community.md

* Update community.md

* Update community.md

* Update community.md

* Update _toctree.yml - add community.md

* Update docs/source/ar/community.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Create how_to_hack_models.md

* Create modular_transformers.md

* Create tiktoken.md

* Update _toctree.yml

* Update docs/source/ar/how_to_hack_models.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/how_to_hack_models.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/how_to_hack_models.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/how_to_hack_models.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/how_to_hack_models.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/how_to_hack_models.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/how_to_hack_models.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/how_to_hack_models.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/modular_transformers.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/modular_transformers.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/modular_transformers.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/modular_transformers.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/modular_transformers.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/modular_transformers.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/modular_transformers.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/modular_transformers.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/modular_transformers.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tiktoken.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tiktoken.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

---------

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
2024-12-10 09:08:27 -08:00
Mohamed Mekkouri
e5c45a6679
Fixing GGUF support for StableLm (#35060)
fix

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2024-12-10 16:30:09 +01:00
Huang, Guangtai
3e2769a3c9
Fix DBRX LayerNorm init method (#35177)
fix dbrx layernorm init
2024-12-10 14:31:22 +00:00
Xavier Dupré
5fba3f99c0
Remove unnecessary masked_fill in deberta models (#35182) 2024-12-10 13:52:20 +00:00
Gallil Maimon
6acb4e43a7
Support BatchNorm in Hubert pos_conv_emb as in fairseq (#34389)
* Support BatchNorm in Hubert pos_conv_emb as in fairseq

* Correct the new defaults (#34377)

* Correct the new defaults

* CIs

* add check

* Update utils.py

* Update utils.py

* Add the max_length in generate test checking shape without passing length

* style

* CIs

* fix fx CI issue

* [auto. ping] Avoid sending empty info + add more team members (#34383)

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix glm  (#34388)

* Fix duplicated

* fix import

* Use non nested images and batched text Idefics2/3  (#34222)

* add support for non nested images and add tests

* add tests error scenario

* fix style

* added single and no image to error tests

* Fix onnx non-expotable inplace aten op (#34376)

* fix onnx non-expotable inplace op

* mistral, qwen2, qwen2_vl, starcoder2

* fixup copies

* Fix right padding in LLaVA models (#34305)

* fix right pad llavas

* device mismatch

* no filter (#34391)

* no filter

* no filter

* no filter

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* SynthID: better example (#34372)

* better example

* Update src/transformers/generation/configuration_utils.py

* Update src/transformers/generation/logits_process.py

* nits

* Tests: upgrade `test_eager_matches_sdpa_generate` (#34386)

* Fix bnb training test failure (#34414)

* Fix bnb training test: compatibility with OPTSdpaAttention

* Avoid check expected exception when it is on CUDA (#34408)

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix typos in agents_advanced.md (#34405)

* [docs] Cache implementations (#34325)

cache

* [run-slow] hubert

* Support BatchNorm in Hubert pos_conv_emb as in fairseq
Add conversion integration test, and make batchnorm explicit variable

* Support BatchNorm in Hubert pos_conv_emb as in fairseq
fix make fixup styling changes

* [run-slow] hubert

* Support BatchNorm in Hubert pos_conv_emb as in fairseq

* [run-slow] hubert

* Support BatchNorm in Hubert pos_conv_emb as in fairseq
Add conversion integration test, and make batchnorm explicit variable

* Support BatchNorm in Hubert pos_conv_emb as in fairseq
fix make fixup styling changes

* [run-slow] hubert

* [run-slow] hubert

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
Co-authored-by: Rudy Delouya <rudy.delouya@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
2024-12-10 14:18:23 +01:00
Trevor Royer
80f2b1610f
Fix file path for shard_num 1 with mllama converter (#35053)
"#35049 fix path for num_shard 1"
2024-12-10 09:11:45 +00:00
Raushan Turganbay
0938b57770
Assisted decoding multi-gpu (#35116)
* fix

* move a few lines up
2024-12-10 09:59:17 +01:00
Spiros Dontas
dada0fd85f
Fix num_items_in_batch not being an integer (#35115)
In method `Trainer#get_batch_samples`, the return values should be a
list of batch samples and an integer indicating the number of items that
exist in the batch. However, this was not actually a case and what was
returned instead of an integer, was a tensor with one element. In the
multi-GPU setup, this tensor is placed in a different device than the
loss tensor, causing the loss function to raise a `RuntimeError`.

The problem arises from
5d7739f15a/src/transformers/trainer.py (L5139-L5144),
where the outer `sum` operates over a list of tensors which means that
the final result is also a tensor. To counter this issue, a new check
(after the accelerator gathering) has been added in order to convert a
potential tensor to an integer before returning the
`num_items_in_batch`.
2024-12-10 08:40:40 +01:00
Matthew Douglas
34f4080ff5
[CI] Fix bnb quantization tests with accelerate>=1.2.0 (#35172) 2024-12-09 13:55:16 -05:00
UV
fa8763ce17
Fixed typo of 'avilable' in prompts.py (#35145) 2024-12-09 16:40:32 +00:00
fzyzcjy
4bc39de5c3
Super tiny fix logging message (#35132)
Update integration_utils.py
2024-12-09 16:31:32 +00:00
Lysandre Debut
8e806a336f
Cleanup: continue the init refactor (#35167)
Round 2
2024-12-09 16:09:50 +01:00
Mohamed Mekkouri
7238387f67
Fix typo in EETQ Tests (#35160)
fix
2024-12-09 14:13:36 +01:00
Daniel Bogdoll
de8a0b7547
Option to set 'non_blocking' for to(device) in BatchEncoding and BatchFeature (#34883)
* Option to set 'non_blocking' for to(device) operation for performance improvements. Defaults to 'false', thus no behavioral changes.

* Enabling non_blocking in to() operation of BatchFeature.

* Improved docstring on utilization of non_blocking

* Force non_blocking as keyword argument

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

---------

Co-authored-by: Daniel Bogdoll <dbogdoll@umich.edu>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2024-12-09 11:29:04 +01:00
UV
1452dc2514
Corrected typo in agent system prompts (#35143) 2024-12-09 10:42:23 +01:00
NielsRogge
9e420e0269
[I-JEPA] Update docs (#35148)
Update docs
2024-12-09 10:01:31 +01:00
kang sheng
1ccca8f48c
Fix GA loss bugs and add unit test (#35121)
* fix GA bugs and add unit test

* narrow down model loss unit test diff gap

* format code to make ruff happy

* send num_items_in_batch argument to decoder

* fix GA loss bug in BertLMHeadModel

* use TinyStories-33M to narrow down diff gap

* fotmat code

* missing .config

* avoid add extra args

---------

Co-authored-by: kangsheng <kangsheng@meituan.com>
2024-12-09 09:57:41 +01:00
Pavel Iakubovskii
c8c8dffbe4
Update I-JEPA checkpoints path (#35120)
Update checkpoints path
2024-12-06 13:42:51 +00:00
Victor Agostinelli
7f95372c62
Add feature dim attributes to BitLinear for easier PEFT integration (#34946)
Update bitnet.py, extremely small change to allow for easier PEFT integration

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2024-12-06 13:39:45 +01:00
Aymeric Roucher
9ad4c93536
Add Aria (#34157)
* Add Aria
---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-12-06 12:17:34 +01:00
Yih-Dar
15ab310c3a
Fix private forked repo. CI (#35114)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-12-06 12:03:31 +01:00
Steven Liu
98e8062df3
[docs] top_p, top_k, temperature docstrings (#35065)
clarify
2024-12-05 11:24:51 -08:00
Jacky Lee
44f88d8ccb
[docs] Update Python version in translations (#35096)
update: doc version
2024-12-05 11:06:54 -08:00
Lysandre
66ab300aaf Dev version 2024-12-05 19:12:22 +01:00
Pablo Montalvo
a5bb528471
Fix signatures for processing kwargs (#35105)
* add conversion script

* remove pg2 refs

* fixup style

* small update

* get correct scaling

* add back missing bos

* fix missing config keys

* might revert this pos_embeddings

* fixup 9b config

* fix 9b

* fixup 9b conversion for good + add back num_hidden_layers

* add correct query scaling for 2b, 9b, 27b

* fixup 27b conversion

* Additional variant: 27b-896

* Use CPU for conversion to reduce GPU RAM requirements

* fix causal mask generation + formatting

* fix in-training causal mask generation edge case

* trigger CI

* update config

* update config

* update config

* update config

* update config

* update config

* update config

* update config

* update config

* move conversion file to main model dir

* handle multi-images + bos token

* address comments for input ids

* revert ci fixes

* [run-slow] paligemma

* fix

* [run-slow] paligemma

* skip end 2 end

* [run-slow] paligemma

---------

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-12-05 18:15:48 +01:00
Jonathan Mamou
e27465c801
Adaptive dynamic number of speculative tokens (#34156)
* initial commit

* update strategy

* add tradeoff FPR TPR with cost

* all probs

* fix

* fix

* fix style

* Update src/transformers/generation/configuration_utils.py

shorter docstring

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* import guard

* fix style

* add is_sklearn_available condition

* vectorizing to flatten the for-loop

* fix style

* disable adaptation for UAG

* update doc

* add TestAssistedCandidateGeneratorUpdateStrategy

* fix style

* protect import

* fix style

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2024-12-05 17:07:33 +01:00
Yih-Dar
b0a51e5cff
Fix flaky Hub CI (test_trainer.py) (#35062)
* fix

* Update src/transformers/testing_utils.py

Co-authored-by: Lucain <lucainp@gmail.com>

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* check

* check

* check

* check

* check

* check

* Update src/transformers/testing_utils.py

Co-authored-by: Lucain <lucainp@gmail.com>

* Update src/transformers/testing_utils.py

Co-authored-by: Lucain <lucainp@gmail.com>

* check

* check

* check

* Final space

* Final adjustment

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Lucain <lucainp@gmail.com>
2024-12-05 17:02:27 +01:00
Arthur
a928d9c128
[trainer] fix the GA model_accepts_loss_kwargs (#34915)
* fix

* style

* values

* fix
2024-12-05 16:37:46 +01:00
Raushan Turganbay
e682c17e4a
BLIP: this is correct now (#35081)
this is correct now
2024-12-05 16:30:09 +01:00
João Marcelo
50189e36a6
Add I-JEPA (#33125)
* first draft

* add IJepaEmbeddings class

* fix copy-from for IJepa model

* add weight conversion script

* update attention class names in IJepa model

* style changes

* Add push_to_hub option to convert_ijepa_checkpoint function

* add initial tests for I-JEPA

* minor style changes to conversion script

* make fixup related

* rename conversion script

* Add I-JEPA to sdpa docs

* minor fixes

* adjust conversion script

* update conversion script

* adjust sdpa docs

* [run_slow] ijepa

* [run-slow] ijepa

* [run-slow] ijepa

* [run-slow] ijepa

* [run-slow] ijepa

* [run-slow] ijepa

* formatting issues

* adjust modeling to modular code

* add IJepaModel to objects to ignore in docstring checks

* [run-slow] ijepa

* fix formatting issues

* add usage instruction snippet to docs

* change pos encoding, add checkpoint for doc

* add verify logits for all models

* [run-slow] ijepa

* update docs to include image feature extraction instructions

* remove pooling layer from IJepaModel in image classification class

* [run-slow] ijepa

* remove pooling layer from IJepaModel constructor

* update docs

* [run-slow] ijepa

* [run-slow] ijepa

* small changes

* [run-slow] ijepa

* style adjustments

* update copyright in init file

* adjust modular ijepa

* [run-slow] ijepa
2024-12-05 16:14:46 +01:00
Mohamed Mekkouri
95a855e212
Deprecate quanto and switch to optimum-quanto (#35001)
* deprecate quanto

* fix style
2024-12-05 16:11:09 +01:00
Isotr0py
482cb28a18
Fix tie_word_embeddings handling for GGUF models (#35085)
* fix tie_word_embeddings

Signed-off-by: Isotr0py <2037008807@qq.com>

* fix

Signed-off-by: Isotr0py <2037008807@qq.com>

---------

Signed-off-by: Isotr0py <2037008807@qq.com>
2024-12-05 16:00:41 +01:00
Cyril Vallez
35447054f5
Update Mistral conversion script (#34829)
* Update convert_mistral_weights_to_hf.py

* Update convert_mistral_weights_to_hf.py

* Update convert_mistral_weights_to_hf.py
2024-12-05 15:47:20 +01:00
Arthur
93f87d3cf5
[tokenizers] bump to 0.21 (#34972)
bump to 0.21
2024-12-05 15:46:02 +01:00
eustlb
54aae121eb
[Whisper] Fix whisper tokenizer (#34537)
* handle single timestamp ending

* include last timestamp token

* handle single timestamp ending

* avoid floating points arithm limitations

* ensure float64 operations

* new test

* make fixup

* make copies

* handle edge case double tokens ending with different tokens

* handle single timestamp ending

* make fixup

* handle conditioning on prev segments

* fix

* Update src/transformers/models/whisper/generation_whisper.py

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>

* [run-slow] whisper

* don't call item() to avoid unnecessary sync

* fix

---------

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
Co-authored-by: Eustache Le Bihan <eustlb@users.noreply.huggingface.co>
2024-12-05 13:46:29 +01:00
Yih-Dar
beb2c66ec3
Informative (#35059)
* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-12-05 09:50:27 +01:00
Steven Liu
1ed1de2fec
[docs] Increase visibility of torch_dtype="auto" (#35067)
* auto-dtype

* feedback
2024-12-04 09:18:44 -08:00
Fanli Lin
baa3b22137
[docs] add a comment that offloading requires CUDA GPU (#35055)
* add commen to offloading

* Update docs/source/en/kv_cache.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-12-04 07:48:34 -08:00
Cyril Vallez
1da1e0d7f2
Support for easier multimodal use of modular (#35056)
* update modular and add examples

* style

* improve example comments

* style

* fix small logic issue for imports

* fix relative order issue when files do not make sense

* Improve comments

* trigger CIs
2024-12-04 15:13:11 +01:00
Anton Vlasjuk
46df859975
[GPTNeoX] Flex Attention + Refactor (#34896)
* gpt neox flex attention + refactor

* some formatting

* small fix on dropout

* add assertion on flex attn test

* flaky ci :(

* add head mask support

* style

* handle dtype, replace torch where

* fixup flex with output attns

* code review and several other fixes

* Update src/transformers/modeling_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* style

* remove unnecessary comment

* remove incorrect comment

* make flex attn check more agnostic tor versions and centralized

* change peft input dtype check to value since q and k could be affected by other stuff like RoPE

* i forgor

* flaky

* code review and small fixes

* Update src/transformers/models/gpt_neox/modeling_gpt_neox.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-12-04 14:48:28 +01:00
Vladislav Bronzov
accb7204f9
Add Pytorch Tensor Parallel support for Qwen2, Qwen2Moe, Starcoder2 (#35007)
* add base tp plan for qwen2 and qwen2moe

* add parallel tp for starcoder2

* fix modular conversion

* add infer dim for qkv states

* Update src/transformers/models/qwen2_moe/configuration_qwen2_moe.py

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-12-04 14:43:36 +01:00
Tianshu Wang
c7a109ec81
Fix pad_token_tensor is None in warning (#34005)
Fix pad_token_tensor is None in warning
2024-12-04 11:15:25 +01:00