Commit Graph

17264 Commits

Author SHA1 Message Date
Vijay
fc1ae7f30f
[docs] update input documentation for MAMBA2 and MISTRAL models to include cache_position and attention_mask details (#34322)
* [docs] update input documentation for MAMBA2 and MISTRAL models to include cache_position and attention_mask details

* [docs] correct input documentation for MISTRAL model to reference `input_ids` instead of `decoder_input_ids`

* [docs] clarify cache_position description in MISTRAL model documentation
2024-10-28 09:14:07 -07:00
Sean (Seok-Won) Yi
c1753436db
New option called "best" for args.save_strategy. (#31817)
* Add _determine_best_metric and new saving logic.

1. Logic to determine the best metric was separated out from
`_save_checkpoint`.
2. In `_maybe_log_save_evaluate`, whether or not a new best metric was
achieved is determined after each evaluation, and if the save strategy
is "best" then the TrainerControl is updated accordingly.

* Added SaveStrategy.

Same as IntervalStrategy, but with a new attribute called BEST.

* IntervalStrategy -> SaveStrategy

* IntervalStrategy -> SaveStrategy for save_strat.

* Interval -> Save in docstring.

* Updated docstring for save_strategy.

* Added SaveStrategy and made according changes.

`save_strategy` previously followed `IntervalStrategy` but now follows
`SaveStrategy`.

Changes were made accordingly to the code and the docstring.

* Changes from `make fixup`.

* Removed redundant metrics argument.

* Added new test_save_best_checkpoint test.

1. Checks for both cases where `metric_for_best_model` is explicitly
provided and when it's not provided.
2. The first case should have two checkpoints saved, whereas the second
should have three saved.

* Changed should_training_end saving logic.

The Trainer saves a checkpoint at the end of training by default as
long as `save_strategy != SaveStrategy.NO`. This condition was modified
to include `SaveStrategy.BEST` because it would be counterintuitive to
ask for only the best checkpoint to be saved and then have the last one
saved as well.

* `args.metric_for_best_model` default to loss.

* Undo metric_for_best_model update.

* Remove checking metric_for_best_model.

* Added test cases for loss and no metric.

* Added error for metric and changed default best_metric.

* Removed unused import.

* `new_best_metric` -> `is_new_best_metric`

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Applied `is_new_best_metric` to all.

Changes were made for consistency and also to fix a potential bug.

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
2024-10-28 16:02:22 +01:00
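The save logic described in the commit message above can be illustrated with a minimal, self-contained sketch. This is not the Trainer's code: `SaveStrategy` is re-created here as a stand-in enum, and `saves_final_checkpoint` and `evaluations_that_save` are hypothetical helpers invented for the illustration. Only the name `is_new_best_metric` and the `BEST` member come from the commit message; the comparison rule (a `greater_is_better` flag) is an assumption.

```python
import operator
from enum import Enum


class SaveStrategy(Enum):
    # Stand-in enum: the interval values plus the new BEST member.
    NO = "no"
    STEPS = "steps"
    EPOCH = "epoch"
    BEST = "best"


def is_new_best_metric(value, best_so_far, greater_is_better):
    """Return True if `value` improves on the best metric seen so far."""
    if best_so_far is None:
        return True  # the first evaluation is always the best so far
    better = operator.gt if greater_is_better else operator.lt
    return better(value, best_so_far)


def saves_final_checkpoint(save_strategy):
    """Assumed end-of-training rule: skip the final save for NO and BEST."""
    return save_strategy not in (SaveStrategy.NO, SaveStrategy.BEST)


def evaluations_that_save(metric_values, greater_is_better):
    """Indices of evaluations that would trigger a save under BEST."""
    best, saved = None, []
    for index, value in enumerate(metric_values):
        if is_new_best_metric(value, best, greater_is_better):
            best = value
            saved.append(index)
    return saved


# Evaluation losses (lower is better): the third evaluation is not a new best.
print(evaluations_that_save([0.9, 0.7, 0.8, 0.6], greater_is_better=False))
# prints [0, 1, 3]
```

Under these assumptions a checkpoint is written only when an evaluation improves on the running best, and no extra checkpoint is written when training ends.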
AbdelKarim ELJANDOUBI
8b3b9b48fc
exclude fsdp from delay_optimizer_creation (#34140)
* exclude fsdp from delay_optimizer_creation

* add test case for trainer: FSDP mode and fp8 as mixed precision

* rearrange imports

* ruff formatted

* adapt _init_fsdp to fp8

* use _init_fsdp only when resume_from_checkpoint

* In case of FSDP, self.layer will be CheckpointWrapper which has no len() method

* delete _init_fsdp

* solve conflict

* fix conflict

* make fixup
2024-10-28 13:50:16 +01:00
Nischay
92bcdff2ef
Fix batch size handling in prediction_loop for DataLoaderShard (#34343)
* Fix batch size handling in prediction_loop for DataLoaderShard

Updated the prediction_loop method in the Trainer class to correctly handle batch size when using DataLoaderShard. This ensures that the batch size is retrieved from total_batch_size for distributed training scenarios, preventing TypeError related to NoneType during evaluation.

* Update src/transformers/trainer.py

Co-authored-by: Zach Mueller <muellerzr@gmail.com>

* Applied the fix to remove unused imports

---------

Co-authored-by: Zach Mueller <muellerzr@gmail.com>
2024-10-28 13:23:52 +01:00
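The fallback described in the commit message above can be sketched as follows. This is an illustration only: the dataloaders are plain stand-in objects rather than real `DataLoaderShard` instances, and `resolve_batch_size` is a hypothetical helper, not a function from the Trainer. The only detail taken from the commit is that the batch size is read from `total_batch_size` when the usual `batch_size` is unavailable.

```python
from types import SimpleNamespace


def resolve_batch_size(dataloader):
    """Prefer `batch_size`; fall back to `total_batch_size` when it is None."""
    batch_size = getattr(dataloader, "batch_size", None)
    if batch_size is None:
        # Assumed behaviour of a sharded loader: the per-loader batch size
        # is unset, but the total across processes is still recorded.
        batch_size = getattr(dataloader, "total_batch_size", None)
    if batch_size is None:
        raise ValueError("dataloader exposes neither batch_size nor total_batch_size")
    return batch_size


# Stand-ins for a regular loader and a sharded loader.
plain_loader = SimpleNamespace(batch_size=8)
sharded_loader = SimpleNamespace(batch_size=None, total_batch_size=32)

print(resolve_batch_size(plain_loader))    # prints 8
print(resolve_batch_size(sharded_loader))  # prints 32
```

Resolving the value once, before any arithmetic on it, is what avoids the `NoneType` error mentioned in the commit.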
Yih-Dar
9360f1827d
Tiny update after #34383 (#34404)
* update

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-10-28 12:01:05 +01:00
Yih-Dar
fc465bb196
pin tensorflow_probability<0.22 in docker files (#34381)
0.21

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-10-28 11:59:46 +01:00
Ilyas Moutawwakil
fddbd3c13c
Fix pix2struct (#34374)
* fix

* fix and test use_cache test

* style

* remove atol
2024-10-28 11:24:56 +01:00
Steven Liu
1d06379331
[docs] Cache implementations (#34325)
cache
2024-10-25 08:52:45 -07:00
Rudy Delouya
6a62a6d1b5
Fix typos in agents_advanced.md (#34405)
2024-10-25 08:52:29 -07:00
Yih-Dar
f73f5e62e2
Avoid check expected exception when it is on CUDA (#34408)
* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-10-25 17:14:07 +02:00
Matthew Douglas
e447185b1f
Fix bnb training test failure (#34414)
* Fix bnb training test: compatibility with OPTSdpaAttention
2024-10-25 10:23:20 -04:00
Joao Gante
186b8dc190
Tests: upgrade test_eager_matches_sdpa_generate (#34386)
2024-10-25 11:55:07 +01:00
Joao Gante
8814043c8c
SynthID: better example (#34372)
* better example

* Update src/transformers/generation/configuration_utils.py

* Update src/transformers/generation/logits_process.py

* nits
2024-10-25 11:46:46 +01:00
Yih-Dar
223855314f
no filter (#34391)
* no filter

* no filter

* no filter

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-10-25 12:32:39 +02:00
Raushan Turganbay
9f365fe0ac
Fix right padding in LLaVA models (#34305)
* fix right pad llavas

* device mismatch
2024-10-25 11:02:07 +02:00
Ilyas Moutawwakil
5779bac4c4
Fix onnx non-exportable inplace aten op (#34376)
* fix onnx non-exportable inplace op

* mistral, qwen2, qwen2_vl, starcoder2

* fixup copies
2024-10-25 09:44:09 +02:00
Yoni Gozlan
940a6bd343
Use non nested images and batched text Idefics2/3 (#34222)
* add support for non nested images and add tests

* add tests error scenario

* fix style

* added single and no image to error tests
2024-10-24 20:00:13 -04:00
Cyril Vallez
3d99f1746e
Fix glm (#34388)
* Fix duplicated

* fix import
2024-10-24 19:17:52 +02:00
Yih-Dar
a308d28d39
[auto. ping] Avoid sending empty info + add more team members (#34383)
* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-10-24 19:07:23 +02:00
Cyril Vallez
4c6e0c9252
Correct the new defaults (#34377)
* Correct the new defaults

* CIs

* add check

* Update utils.py

* Update utils.py

* Add the max_length in generate test checking shape without passing length

* style

* CIs

* fix fx CI issue
2024-10-24 18:42:03 +02:00
Michael Benayoun
1c5918d910
Fix torch.fx issue related to the new loss_kwargs keyword argument (#34380)
* Fix FX

* Unskip tests
2024-10-24 18:34:28 +02:00
Benjamin Bossan
d9989e0b9a
[PEFT] Add warning for missing key in LoRA adapter (#34068)
When loading a LoRA adapter, so far, there was only a warning when there
were unexpected keys in the checkpoint. Now, there is also a warning
when there are missing keys.

This change is consistent with
https://github.com/huggingface/peft/pull/2118 in PEFT and the planned PR
https://github.com/huggingface/diffusers/pull/9622 in diffusers.

Apart from this change, the error message for unexpected keys was
slightly altered for consistency (it should be more readable now). Also,
besides adding a test for the missing keys warning, a test for
unexpected keys warning was also added, as it was missing so far.
2024-10-24 17:56:40 +02:00
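The behaviour described in the commit message above, warning on both unexpected and missing keys, can be sketched with plain sets. This is not the PEFT integration code: `check_adapter_keys`, the key names, and the warning wording are all hypothetical and chosen only for illustration.

```python
import warnings


def check_adapter_keys(expected_keys, loaded_keys):
    """Warn about keys that are unexpected or missing in a loaded checkpoint."""
    expected, loaded = set(expected_keys), set(loaded_keys)
    missing = sorted(expected - loaded)      # the model wants them, the file lacks them
    unexpected = sorted(loaded - expected)   # the file has them, the model does not
    if unexpected:
        warnings.warn(f"Found unexpected keys while loading adapter: {', '.join(unexpected)}")
    if missing:
        warnings.warn(f"Found missing keys while loading adapter: {', '.join(missing)}")
    return missing, unexpected


# Hypothetical key names for a single LoRA-wrapped layer.
expected = ["layer.0.lora_A.weight", "layer.0.lora_B.weight"]
loaded = ["layer.0.lora_A.weight", "layer.0.extra.weight"]

missing, unexpected = check_adapter_keys(expected, loaded)
print(missing)     # prints ['layer.0.lora_B.weight']
print(unexpected)  # prints ['layer.0.extra.weight']
```

Reporting both directions of the mismatch makes a partially loaded adapter visible instead of silently leaving some weights at their initial values.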
Yoni Gozlan
fe35073319
Ignore unsupported kwarg in ProcessorMixin call (#34285)
Fix accept any common kwargs
2024-10-24 11:46:39 -04:00
Winston H.
e288616606
refactor: remove redundant if-condition and improve type correctness for convert_tokens_to_ids (#34030)
* chore: remove redundant if-condition

* fix: import `Iterable`
2024-10-24 17:40:26 +02:00
Vijay
450b9cbfac
Add code sample docstrings and checkpoint reference for GLM models (#34360)
* Add code sample docstrings and checkpoint reference for GLM models

* Update modular_glm.py

* Update modeling_glm.py
2024-10-24 17:28:51 +02:00
Yoni Gozlan
6432ad8bb5
Fix pil_torch_interpolation_mapping import in image_processing_detr_fast (#34375)
fix pil_torch_interpolation_mapping import
2024-10-24 09:22:50 -04:00
김준재
dd267fca72
Add T5 GGUF loading support (#33389)
* add: GGUFT5Converter

* add: tensormapping for t5

* add: test code for t5

* fix: Remove whitespace from blank line

* add: t5 fp16 tests

* fix: whitespace formatting

* fix: minor formatting

* fix: testing every weights
2024-10-24 15:10:59 +02:00
Thomas Furtner
30c76d5b28
add code generation to natural language processing section (#34333)
2024-10-24 14:42:47 +02:00
Lysandre Debut
2112027d0c
Zamba is an LM (#34342)
* Zamba is an LM

* Addition
2024-10-24 14:29:33 +02:00
Raushan Turganbay
b29c24ff1e
CI: fix failures (#34371)
fix
2024-10-24 13:44:53 +02:00
blueingman
f0b3ef9e2e
translated gguf.md into chinese (#34163)
* translated gguf.md into chinese

* Apply suggestions from code review

I have updated the PR accordingly. Thank you very much for the detailed guidance, and I'll pay more attention to the details next time.

Co-authored-by: Isotr0py <2037008807@qq.com>

* Apply suggestions from code review

Co-authored-by: Isotr0py <2037008807@qq.com>

---------

Co-authored-by: Isotr0py <2037008807@qq.com>
2024-10-24 11:47:58 +02:00
Arthur Zucker
9643069465
v4.47.0.dev0
2024-10-24 11:23:29 +02:00
Yih-Dar
f0e640adfa
Drop support for Python 3.8 (#34314)
* drop python 3.8

* update docker files

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-10-24 11:16:55 +02:00
Arthur
05863817d6
Better defaults (#34026)
* be nice to our users

* nit

* fixup

* default to -1

* oups

* turbo nit

* auto infer framework
2024-10-24 11:11:55 +02:00
Abhishek Maurya
65753d6065
Remove graph breaks for torch.compile() in flash_attention_forward when Llama model is padding-free tuned (#33932)
* fix: fixes for graph breaks

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* fix: formatting

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* fix: import error

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* fix: Add Fa2Kwargs

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* fix: PR Changes

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* PR changes

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* PR changes

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* PR changes

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* PR changes

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* Revert "PR changes"

This reverts commit 39d2868e5c.

* PR changes

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* fix: FlashAttentionKwarg

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* fix: FlashAttentionKwarg

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* PR Changes

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* PR Changes

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* PR Changes

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* PR Changes

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* PR Changes

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* addition of documentation

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* change in _flash_attention_forward

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* make fix-copies

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* revert make fix-copies

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* fix copies

* style

* loss kwargs typing

* style and pull latest changes

---------

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
2024-10-24 11:02:54 +02:00
Joao Gante
b0f0c61899
Add SynthID (watermarking by Google DeepMind) (#34350)
* Add SynthIDTextWatermarkLogitsProcessor

* Resolving comments.

* Resolving comments.

* Resolving commits,

* Improving SynthIDWatermark tests.

* switch to PT version

* detector as pretrained model + style

* update training + style

* rebase

* Update logits_process.py

* Improving SynthIDWatermark tests.

* Shift detector training to wikitext negatives and stabilize with lower learning rate.

* Clean up.

* in for 7B

* cleanup

* Support python 3.8.

* README and final cleanup.

* HF Hub upload and initialize.

* Update requirements for synthid_text.

* Adding SynthIDTextWatermarkDetector.

* Detector testing.

* Documentation changes.

* Copyrights fix.

* Fix detector api.

* ironing out errors

* ironing out errors

* training checks

* make fixup and make fix-copies

* docstrings and add to docs

* copyright

* BC

* test docstrings

* move import

* protect type hints

* top level imports

* watermarking example

* direct imports

* tpr fpr meaning

* process_kwargs

* SynthIDTextWatermarkingConfig docstring

* assert -> exception

* example updates

* no immutable dict (cant be serialized)

* pack fn

* einsum equivalent

* import order

* fix test on gpu

* add detector example

---------

Co-authored-by: Sumedh Ghaisas <sumedhg@google.com>
Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: sumedhghaisas2 <138781311+sumedhghaisas2@users.noreply.github.com>
Co-authored-by: raushan <raushan@huggingface.co>
2024-10-23 21:18:52 +01:00
Arthur
e50bf61dec
Fix red CI: benchmark script (#34351)
* don't trigger always

* fux

* oups

* update

* ??

* ?

* aie
2024-10-23 18:33:52 +02:00
Yih-Dar
c42b3223db
skip test_pipeline_depth_estimation temporarily (#34316)
skip

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-10-23 17:27:51 +02:00
Zach Mueller
d9f733625c
Enable Gradient Accumulation fix across all models + trainer fully in forward() (#34283)
* Enable grad accum fix across all models + trainer fully in forward()

* handle peft case

* Account for DDP: need to run scale tests

* Use accelerator state

* Quality

* Guard

* Experiment w/ only fairseq fix

* Fairseq only

* Revert multiply_grads fix

* Mult by grad accum to fully bring back solution

* Style

* Good to go now

* Skip fx tests for now

* Bookmark

* Working now
2024-10-23 11:24:57 -04:00
Aymeric Roucher
1fb575fcf0
Support boolean tool args (#34208)
Support boolean tool arguments
2024-10-23 16:48:21 +02:00
Filippos Ventirozos
343c8cb86f
Added Deberta model type support (#34308)
* Added Deberta model type for 'add_prefix_space' functionality

* housekeeping

---------

Co-authored-by: Filippos Ventirozos <filippos.ventirozos@autotrader.co.uk>
2024-10-23 11:15:36 +02:00
Steven Liu
5ba85de7a4
[docs] Fix Korean toctree (#34324)
fix
2024-10-23 10:52:51 +02:00
Vijay
049682a5a6
Example doc for token classification of Llama and Dependent/Copied Models (#34139)
* Added Example Doc for token classification on all tokenClassificationModels copied from llama

* Refactor code to add code sample docstrings for Gemma and Gemma2 models (including modular Gemma)

* Refactor code to update model checkpoint names for Qwen2 models
2024-10-22 10:26:16 -07:00
wony617
644d5287b2
🌐 [i18n-KO] Translated model_doc/bartpho.md to Korean (#33981)
* docs: ko: model_doc/bartpho.md

* feat: nmt draft

* Update docs/source/ko/model_doc/bartpho.md

* Update docs/source/ko/_toctree.yml

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-10-22 09:46:52 -07:00
Ahnjj_DEV
b03dc0a87e
🌐 [i18n-KO] Translated bert-japanese.md to Korean (#33890)
* docs: ko: bert-japanese.md

* Update _toctree.yml

* fix: manual edits

* Update docs/source/ko/_toctree.yml

Co-authored-by: Sungmin Oh <fabxoe.kor@gmail.com>

* Update docs/source/ko/_toctree.yml

Co-authored-by: Sungmin Oh <fabxoe.kor@gmail.com>

---------

Co-authored-by: Sungmin Oh <fabxoe.kor@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-10-22 09:46:31 -07:00
Ahnjj_DEV
4b14aa1bcd
🌐 [i18n-KO] Translated executorch.md to Korean (#33888)
* docs: ko: executorch.md

* Update _toctree.yml

* fix: manual edits

* Update docs/source/ko/main_classes/executorch.md

Co-authored-by: HyeokJun SHIN <96534680+jun048098@users.noreply.github.com>

* Update docs/source/ko/_toctree.yml

Co-authored-by: Sungmin Oh <fabxoe.kor@gmail.com>

* Update docs/source/ko/_toctree.yml

* Update docs/source/ko/_toctree.yml

* Update docs/source/ko/_toctree.yml

---------

Co-authored-by: HyeokJun SHIN <96534680+jun048098@users.noreply.github.com>
Co-authored-by: Sungmin Oh <fabxoe.kor@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-10-22 09:46:20 -07:00
Fanli Lin
688eeac81e
[docs] fix typo (#34235)
fix typo
2024-10-22 09:46:07 -07:00
Mansu Kim
a65a6ce7fe
fix error in _get_eval_sampler when group_by_length enabled (#34237)
* remove self in _get_eval_sampler

* remove self in front of _get_eval_sampler
2024-10-22 18:02:42 +02:00
Yoni Gozlan
e7c3fa7f57
Fix continue_final_message for image-text-to-text chat templates (#34236)
* fix continue_final_message for vlms

* Add one test for vlms continue_final_message chat template
2024-10-22 11:57:44 -04:00
Chinedum Echeta
96f67c068b
Feature: Add MLFLOW_MAX_LOG_PARAMS to MLflowCallback (#34279)
2024-10-22 16:34:17 +02:00