Commit Graph

18472 Commits

Author SHA1 Message Date
Yoni Gozlan
12f2ebef63
Support custom docstrings in modular (#36726)
* Override docstrings in modular if not none

* Update doc
2025-03-18 14:00:54 -04:00
Gar
00915d3041
Fix chameleon's TypeError because inputs_embeds may be None (#36673)
* fix chameleon TypeError when inputs_embeds is None

* reformat

* hotfix
2025-03-18 18:59:30 +01:00
Marc Sun
14b597f518
Fix casting dtype for quantization (#36799)
* fix

* remove print
2025-03-18 18:46:03 +01:00
Yoni Gozlan
30580f035b
Fix Mistral3 tests (#36797)
* fix processor tests

* fix modeling tests

* fix test processor chat template

* revert modeling test changes
2025-03-18 13:08:12 -04:00
Cyril Vallez
db1d4c5a0b
Loading optimizations (#36742)
* improvements

* Update modeling_utils.py

* add some doc about loading

* Update modeling_utils.py
2025-03-18 16:38:44 +01:00
Yih-Dar
7baf00089a
Update SHA for tj-actions/changed-files (#36795)
* trigger

* trigger

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-18 16:19:39 +01:00
Marc Sun
3017536ebf
fix hqq due to recent modeling changes (#36771)
* fix-hqq

* style

* test
2025-03-18 12:20:27 +01:00
Cyril Vallez
e959530b8f
Add Mistral3 (#36790)
* initial start

* style and dummies

* Create convert_mistral3_weights_to_hf.py

* update

* typo

* typo

* Update convert_mistral3_weights_to_hf.py

* Update convert_mistral3_weights_to_hf.py

* Update convert_mistral3_weights_to_hf.py

* Update convert_mistral3_weights_to_hf.py

* up

* Update convert_mistral3_weights_to_hf.py

* Update convert_mistral3_weights_to_hf.py

* update

* update

* Update image_processing_mistral3.py

* Update convert_mistral3_weights_to_hf.py

* fix patch merger

* Update convert_mistral3_weights_to_hf.py

* Update convert_mistral3_weights_to_hf.py

* up

* update modular to fit

* style

* Update convert_mistral3_weights_to_hf.py

* typo

* Update modular_mistral3.py

* simplify a lot all shape shenanigans

* simplify

* add working test processor

* Add partially working common modeling tests

* All tests working and remove mistral3 image processors

* add docs and fixup

* fix inference with image size >1540

* 🚨fix test image proc pixtral

* Remove vision_feature_select_strategy

* Update convert_mistral3_weights_to_hf.py

* Update convert_mistral3_weights_to_hf.py

* Update convert_mistral3_weights_to_hf.py

* Update convert_mistral3_weights_to_hf.py

* clean

* fix test checkpoints

* Update test_modeling_mistral3.py

* Update test_modeling_mistral3.py

* style

* Use Pixtral processor

* up

* finish cleaning processor to use pixtral directly

* Update __init__.py

* Update processing_pixtral.py

* doc

* Update __init__.py

* Update mistral3.md

* Update _toctree.yml

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: yonigozlan <yoni.gozlan10@gmail.com>
2025-03-18 12:04:42 +01:00
Lysandre Debut
bd92073692
Fix gemma3_text tokenizer in mapping (#36793)
2025-03-18 11:50:22 +01:00
Zebin
7426d02ea8
Fixing typo in gemma3 image_processor_fast and adding a small test (#36776)
Co-authored-by: zebz13 <zeb@fedora>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-03-18 11:35:06 +01:00
Afanti
19b9d8ae13
chore: fix typos in tests directory (#36785)
* chore: fix typos in tests directory

* chore: fix typos in tests directory

* chore: fix typos in tests directory

* chore: fix typos in tests directory

* chore: fix typos in tests directory

* chore: fix typos in tests directory

* chore: fix typos in tests directory
2025-03-18 10:31:13 +01:00
Afanti
7f5077e536
fix typos in the tests directory (#36717)
2025-03-17 17:45:57 +00:00
Daniel Kleine
cbfb8d7b27
doc: Clarify is_decoder usage in PretrainedConfig documentation (#36724)
* fix: clarify decoder usage in PretrainedConfig documentation

* Apply suggestions from code review

updated doc

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-03-17 09:40:25 -07:00
Steven Liu
ac1a1b66b9
[docs] Update README (#36265)
* update

* feedback

* feedback

* update versions
2025-03-17 09:37:19 -07:00
Joao Gante
cff4caa0c1
[CI] remove redundant checks in test_eager_matches_sdpa_inference (#36740)
2025-03-17 16:29:18 +00:00
Christopher Akiki
e3af4fec91
[MINOR:TYPO] Update hubert.md (#36733)
* [MINOR:TYPO] Update hubert.md

- typo fix (wave2vec instead of hubert)
- make code snippet copiable and runnable

* Run tests
2025-03-17 09:07:51 -07:00
Petr Kuderov
c8a2b25f91
Fix TrainingArguments.torch_empty_cache_steps post_init check (#36734)
Mistaken use of De Morgan's law. Fixed "not (X or Y)"
to correct "not (X and Y)" check to raise a ValueError.

Added corresponding test to check "positive int or None" condition.

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-03-17 16:09:46 +01:00
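The De Morgan fix described in #36734 above can be illustrated with a small sketch. This is not the code from the commit: the function names and the error message are hypothetical, and it only assumes what the message states, namely that the value must be a positive int or None and that the check changed from "not (X or Y)" to "not (X and Y)".

```python
from typing import Optional


def buggy_is_invalid(steps) -> bool:
    # Buggy form, "not (X or Y)": a value is rejected only when it is
    # neither an int nor positive, so -1 and 1.5 both slip through.
    return not (isinstance(steps, int) or steps > 0)


def fixed_is_invalid(steps) -> bool:
    # Fixed form, "not (X and Y)": a value is rejected unless it is
    # both an int and positive.
    return not (isinstance(steps, int) and steps > 0)


def validate_steps(steps: Optional[int]) -> Optional[int]:
    """Hypothetical validator: accept None or a positive int, else raise."""
    if steps is not None and fixed_is_invalid(steps):
        raise ValueError(f"expected a positive int or None, got {steps!r}")
    return steps


if __name__ == "__main__":
    for value in (4, -1, 1.5):
        print(value, buggy_is_invalid(value), fixed_is_invalid(value))
```

With the buggy predicate, a negative int passes because the isinstance test alone makes the "or" true; the fixed predicate requires both conditions to hold.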
Sambhav Dixit
8e67230860
Fix test isolation for clear_import_cache utility (#36345)
* test fixup

* test fixup

* fixing tests for unused imports

* style fixes

* fix

* style fixes

* style fix

* remove isolated module cache

* rm custom subprocess definition

* run using existing fn

* style fixup

* make fixup

* remove redundant comments

* rm redundant skipif + style changes
2025-03-17 16:09:09 +01:00
jiqing-feng
27361bd218
fix xpu tests (#36656)
* fix awq xpu tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix llava next video bnb tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-03-17 15:57:49 +01:00
Fredrik Norén
da7d64f4ff
Allow ray datasets to be used with trainer (#36699)
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-03-17 15:44:47 +01:00
jiqing-feng
2256875a77
fix can_generate (#36570)
* fix can_generate

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix can generate for speecht5 and blip

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix speecht5 tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
2025-03-17 14:56:18 +01:00
Marc Sun
9e94801146
enable/disable compile for quants methods (#36519)
* disable compile for most quants methods

* fix

* Update src/transformers/generation/configuration_utils.py

Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

* Update tests/quantization/bnb/test_mixed_int8.py

Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

* Update src/transformers/generation/configuration_utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* changes from joao suggestions

---------

Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-03-17 11:38:21 +01:00
Armaghan Shakir
c53d53da89
🚨🚨🚨 Fix sdpa in SAM and refactor relative position embeddings (#36422)
* fall back to eager if output_attentions

* improve relative position embeddings

* run modular on got_ocr2

* run-slow: sam

* fix run-length encoding

* fix tf processor errors

* update tf_sam

* fix compile error

* re-run tests
2025-03-17 09:39:52 +00:00
Joao Gante
fc8764c9a6
[Generation, Gemma 3] When passing a custom generation_config, overwrite default values with the model's base generation_config (#36684)
2025-03-15 12:40:09 +00:00
Guillaume LEGENDRE
f263e88dcf
Update self-push-caller.yml
2025-03-15 11:32:04 +01:00
Ilyas Moutawwakil
6f3e0b68e0
Fix grad accum arbitrary value (#36691)
2025-03-14 22:03:01 +01:00
Cyril Vallez
2c2495cc7b
Fix post_init() code duplication (#36727)
* Update modeling_utils.py

* CIs
2025-03-14 17:36:02 +01:00
MaCAT
25992b493c
🌐 [i18n-KO] Translated codegen.md to Korean (#36698)
* Initial translation

* Add _toctree.yml
2025-03-14 09:31:18 -07:00
Joao Gante
42ebb6c23e
[tests] Parameterized test_eager_matches_sdpa_inference (#36650)
2025-03-14 14:41:27 +00:00
Matt
9215cc62d4
Try working around the processor registration bugs (#36184)
* Try working around the processor registration bugs

* oops

* Update error message

* Clarify error

* Docstring docstring docstring

* The extra content is indexed by config class, so let's grab some values out of there

* Commit my confusion as a TODO

* Resolve my confusion

* Cleanup and mostly revert to the original

* Better autoclass fallback

* Don't nest f-strings you lunatic

* Clearer error message

* Less getattr()

* Revert a lot of changes to try a different approach!

* Try the global registry

* Check the dynamic list as well as the transformers root

* Move the dynamic list somewhere safer

* Move the dynamic list somewhere even safer

* More import cleanup

* Simplify all the register_for_auto_class methods

* Set _auto_class in the register() methods

* Stop setting the cls attribute in register()

* Restore specifying the model class for Model derivatives only

* Fix accidentally taking the .__class__ of a class

* Revert register_for_auto_class changes

* Fix get_possibly_dynamic_module

* No more ALL_CUSTOM_CLASSES

* Fix up get_possibly_dynamic_module as well

* Revert unnecessary formatting changes

* Trigger tests
2025-03-14 13:56:21 +00:00
Sean (Seok-Won) Yi
691d1b52c3
Fix/best model checkpoint fix (#35885)
* Set best_model_checkpoint only when ckpt exists.

Rather than set it explicitly without checking if the checkpoint directory even exists as before, now we moved the setting logic inside of _save_checkpoint and are only setting it if it exists.

* Added best_global_step to TrainerState.

* Added tests for best_model_checkpoint.

* Fixed hard-coded values in test to prevent fail.

* Added helper func and removed hard-coded best_step.

* Added side effect patch generator for _eval.

* Added evaluate side effect func.

* Removed erroneous patching.

* Fixed minor bug.

* Applied Ruff.

* Fixed Ruff problem in make style.

* Used Trainer.set_initial_training_values.
2025-03-14 14:24:53 +01:00
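The idea in #35885 above, recording a best checkpoint only when its directory actually exists, can be sketched as follows. This is an illustration under assumptions rather than the Trainer implementation: the helper name is hypothetical, and the "checkpoint-<step>" directory layout is assumed.

```python
import os
import tempfile
from typing import Optional


def resolve_best_checkpoint(output_dir: str, step: int) -> Optional[str]:
    """Return the checkpoint path for `step` only if that directory exists."""
    # Assumption: checkpoints are saved as "<output_dir>/checkpoint-<step>".
    candidate = os.path.join(output_dir, f"checkpoint-{step}")
    return candidate if os.path.isdir(candidate) else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out:
        os.makedirs(os.path.join(out, "checkpoint-100"))
        print(resolve_best_checkpoint(out, 100))  # path ending in checkpoint-100
        print(resolve_best_checkpoint(out, 200))  # None
```

Returning None for a missing directory keeps the caller from storing a path that was never written to disk.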
Joao Gante
3bd1a0ddf1
[model loading] don't gc.collect() if only 1 shard is used (#36721)
* don't gc collect if 1 shard is used

* delete state dict anyways
2025-03-14 12:56:56 +00:00
Matt
8cb522b419
Cleanup the regex used for doc preprocessing (#36648)
* Cleanup the regex used for doc preprocessing

* Run tests
2025-03-14 12:18:49 +00:00
Matt
72861e11eb
Make the flaky list a little more general (#36704)
* Make the flaky list a little more general

* Trigger tests

* Make the flaky list a little more general
2025-03-14 12:15:32 +00:00
Kingsley
53742b11f5
Gemma3 processor typo (#36710)
* fix typo when  is on

* tiny

* add test and remove 'text_crops'

* lint
2025-03-14 13:07:55 +01:00
Yoni Gozlan
69bc848480
Add support for fast image processors in add-new-model-like CLI (#36313)
* add support for fast image processors in add-new-model-like

* fix header not found add-fast-image-processor-cli

* Encourage adding fast image processor

* nit

* start improve doc

* update docs

* make requested modifs
2025-03-13 14:16:37 -04:00
Matt
48ef468c74
Final CI cleanup (#36703)
* make fixup

* make fixup

* Correct skip decorator

* Add TODOs

* add is_flaky() parentheses
2025-03-13 17:26:09 +00:00
Isotr0py
b070025aa6
Add GGUF support to T5-Encoder (#36700)
* add gguf support to t5encoder

Signed-off-by: Isotr0py <2037008807@qq.com>

* fix

Signed-off-by: Isotr0py <2037008807@qq.com>

* remove gguf from model_kwargs

Signed-off-by: Isotr0py <2037008807@qq.com>

---------

Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-13 17:57:33 +01:00
Mohamed Mekkouri
4a60bae8e2
Handling an exception related to HQQ quantization in modeling (#36702)
* adding exception

* style

* add types
2025-03-13 17:53:36 +01:00
Mehant Kammakomati
09a309d273
fix: fsdp sharded state dict won't work for save_only_model knob (#36627)
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-03-13 17:17:35 +01:00
Cyril Vallez
2a004f9ff1
Add loading speed test (#36671)
* Update test_modeling_utils.py

* Update test_modeling_utils.py

* Update test_modeling_utils.py

* Update test_modeling_utils.py

* Update test_modeling_utils.py

* Update test_modeling_utils.py

* trigger CIs

* Update test_modeling_utils.py

* Update test_modeling_utils.py

* Update test_modeling_utils.py

* better error messages

* Update test_modeling_utils.py

* Update test_modeling_utils.py
2025-03-13 17:07:30 +01:00
Joao Gante
a3201cea14
[CI] Automatic rerun of certain test failures (#36694)
2025-03-13 15:40:23 +00:00
Afanti
d84569387f
chore: fix typos in utils module (#36668)
* chore: fix typos in utils module

* chore: fix typos in utils module

* chore: fix typos in utils module

* chore: fix typos in utils module

* chore: fix typos in utils module

* chore: fix typos in utils module
2025-03-13 15:12:44 +00:00
Cyril Vallez
32c95bd847
Fix dtype for params without tp_plan (#36681)
* Update tensor_parallel.py

* CIs
2025-03-13 15:28:14 +01:00
wineandchord
bb965d8e87
fix type annotation for ALL_ATTENTION_FUNCTIONS (#36690)
Corrects the type annotation to match actual usage. The variable was typed as
Dict[str, Dict[str, Callable]] but is actually used as Dict[str, Callable]
where keys are attention mechanism names and values are the corresponding
attention functions directly. This change makes the type annotation consistent
with how the dictionary is used in the codebase.
2025-03-13 14:27:50 +00:00
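The annotation change in #36690 above can be shown with a toy registry. The two functions and the registry name below are hypothetical placeholders, not the library's attention implementations; the sketch only shows why Dict[str, Callable] matches a mapping from names directly to functions.

```python
from typing import Callable, Dict


def eager_attention(query: float, key: float) -> float:
    # Placeholder standing in for a real attention function.
    return query * key


def scaled_attention(query: float, key: float) -> float:
    # Second placeholder so the registry has more than one entry.
    return 0.5 * query * key


# Keys are mechanism names, values are the functions themselves, so the
# flat Dict[str, Callable] annotation describes the actual shape.
ATTENTION_FUNCTIONS: Dict[str, Callable] = {
    "eager": eager_attention,
    "scaled": scaled_attention,
}


def run_attention(name: str, query: float, key: float) -> float:
    # One lookup yields a callable; a nested Dict[str, Dict[str, Callable]]
    # annotation would instead imply a second lookup before calling.
    return ATTENTION_FUNCTIONS[name](query, key)


if __name__ == "__main__":
    print(run_attention("eager", 2.0, 3.0))  # 6.0
```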
Yoni Gozlan
1c287aecfc
Change Qwen2_VL image processors to have init and call accept the same kwargs (#36207)
Change qwen2VL image processors to have init and call accept the same kwargs
2025-03-13 10:15:17 -04:00
Mohamed Mekkouri
65b8e38aac
Upgrading torch version and cuda version in quantization docker (#36264)
* update

* small update

* no spqr quant

* testing

* testing

* test nightly

* gptqmodel

* flute

* fix hadamard

* running tests

* new docker

* fix docker

* run tests

* testing new docker

* new docker

* run tests

* new docker

* run tests

* final test

* update

* update

* run tests

* new docker

* launch tests

* test_docker

* running tests

* add comments

* fixing yml

* revert
2025-03-13 12:39:16 +01:00
bd793fcb
87b30c3589
fix wandb hp search unable to resume from sweep_id (#35883)
* fix wandb hp search unable to resume from sweep_id

* format styles

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-03-13 12:32:26 +01:00
Mohamed Mekkouri
47cc4da351
Changing the test model in Quanto kv cache (#36670)
changing model
2025-03-13 12:23:34 +01:00
Marc Sun
bc3d5781e7
Fix slicing for 0-dim param (#36580)
* fix

* switch to ellipsis instead

* Add co-author
Co-authored-by: fxmarty-amd <fxmarty-amd@users.noreply.github.com>

* Add co-author second try
Co-authored-by: fxmarty-amd <felmarty@amd.com>
2025-03-13 12:16:13 +01:00