Commit Graph

19383 Commits

Author SHA1 Message Date
Yoach Lacombe
569f6c7d43
Fix FA2 tests (#29909)
* fix FA2 tests

* refactor inference test name
2024-04-01 07:51:00 +00:00
Zach Mueller
3b8e2932ce
Rework tests to compare trainer checkpoint args (#29883)
* Start rework

* Fix failing test

* Include max

* Update src/transformers/trainer.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-03-30 22:19:17 -04:00
TechxGenus
6e584070d4
[BC] Fix BC for AWQ quant (#29965)
fix awq quant
2024-03-30 19:37:25 +01:00
Bo Zheng
46d636818b
Update model card and link of blog post. (#29928)
* Update qwen2_moe.md

* update link of blogpost.

* fixup

---------

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
2024-03-30 17:49:03 +01:00
Gary Wang
f6701bc664
Reset alarm signal when the function is ended (#29706)
Fixes #29690
2024-03-30 17:41:27 +01:00
Alexander Jipa
e644b60038
fix: get mlflow version from mlflow-skinny (#29918)
Co-authored-by: Alexander Jipa <azzhipa@amazon.com>
2024-03-30 17:38:29 +01:00
Jacky Lee
156d30da94
Add warning message for run_qa.py (#29867)
* improve: error message for best model metric

* update: raise warning instead of error
2024-03-30 17:02:31 +01:00
Jacky Lee
6fd93fe93a
Fix rope theta for OpenLlama (#29893)
fix: rope_theta for open llama
2024-03-30 16:30:52 +01:00
fzyzcjy
5ad7f17002
Super tiny fix 12 typos about "with with" (#29926)
* with with

* style
2024-03-29 14:31:31 +00:00
Yih-Dar
43d17c1836
Mark test_eager_matches_sdpa_generate flaky for some models (#29479)
* fix

* revert for qwen2

* revert for qwen2

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-03-29 11:51:20 +01:00
MariaHei
ba56ed0869
Update installs in image classification doc (#29947)
Trainer with PyTorch now requires accelerate to be installed.

Partly resolves huggingface/transformers#29174
2024-03-28 14:26:27 -07:00
Arthur
536ea2aca2
[LlamaSlowConverter] Slow to Fast better support (#29797)
* fix

* fix test

* style

* nit

* rather rely on concert token to id

* fix quality

* Update src/transformers/convert_slow_tokenizer.py
2024-03-28 16:19:32 +01:00
VINAYAKK GARG
e203646871
Fix doc issue #29758 in DebertaV2Config class (#29842)
Fix doc issue in DebertaV2Config class

Co-authored-by: Vinayakk Garg <vigar@akamai.com>
2024-03-28 14:49:57 +00:00
Arthur
2bbbf1be5b
[BC] Fix BC for other libraries (#29934)
* fi xbc?

* nit
2024-03-28 15:13:23 +01:00
Yu Chin Fabian Lim
4df5b9b4b2
Allow GradientAccumulationPlugin to be configured from AcceleratorConfig (#29589)
* add gradient_accumulation_kwargs to AcceleratorConfig

* add suggestions from @muellerzr to docstrings, new behavior and tests

* Documentation suggestions from @muellerz

Co-authored-by: Zach Mueller <muellerzr@gmail.com>

* addressed @muellerzr comments regarding tests and test utils

* moved accelerate version to top of file.

* @muellerzr's variable fix

Co-authored-by: Zach Mueller <muellerzr@gmail.com>

* address @amyeroberts. fix tests and docstrings

* address @amyeroberts additional suggestions

---------

Co-authored-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
2024-03-28 14:01:40 +00:00
Arthur
a2a7f71604
[ TokenizationLlama] fix the way we convert tokens to strings to keep leading spaces 🚨 breaking fix (#29453)
* nit

* update test and fix test

* fixup
2024-03-28 13:58:40 +01:00
Arthur
e677479c81
[Mamba] from pretrained issue with self.embeddings (#29851)
* nit

* update

* oups

* Update src/transformers/models/mamba/modeling_mamba.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

---------

Co-authored-by: Lysandre Debut <hi@lysand.re>
2024-03-28 13:54:51 +01:00
Joao Gante
441de62f49
RoPE models: add numerical sanity-check test for RoPE scaling (#29808)
* add hard rope scaling test

* make fixup

* quick rope scaling tests

* add copy statements
2024-03-28 11:25:50 +00:00
Christopher Keibel
aac7099c92
add functions to inspect model and optimizer status to trainer.py (#29838)
* add functions to get number of params which require grad, get optimizer group for parameters and get learning rates of param groups to trainer.py

* add tests and raise ValueError when optimizer is None

* add second layer to test and freeze its weigths

* check if torch is available before running tests

* use decorator to check if torch is available

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix test indentation

Co-authored-by: Zach Mueller <muellerzr@gmail.com>

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
2024-03-28 10:37:16 +00:00
amyeroberts
855b95ce34
Safe import of LRScheduler (#29919)
* Safe import of LRScheduler

* Update src/transformers/trainer_pt_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/trainer_pt_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Fix up

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-03-28 09:54:51 +00:00
Aymeric Roucher
c9d2e855ea
Add beam search visualizer to the doc (#29876)
2024-03-28 09:54:08 +00:00
Joao Gante
248d5d23a2
Tests: replace torch.testing.assert_allclose by torch.testing.assert_close (#29915)
* replace torch.testing.assert_allclose by torch.testing.assert_close

* missing atol rtol
2024-03-28 09:53:31 +00:00
Fanli Lin
7c19fafe44
[doc] fix some typos and add xpu to the testing documentation (#29894)
fix typo
2024-03-28 09:42:49 +00:00
Eduardo Pacheco
22d159ddf9
Adding Flash Attention 2 Support for GPT2 (#29226)
* First commit to add flash attention 2 for GPT-2

* more improvements

* Make GPT2 pass tests and fixed Decison Transformers copies

* Fixed missing arg

* fix copies

* Added expected speedup

* Update src/transformers/models/gpt2/modeling_gpt2.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/gpt2/modeling_gpt2.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/gpt2/modeling_gpt2.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Added test

* Fixed attn attribute

* Update docs/source/en/model_doc/gpt2.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt2.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update Decision transformer attentions

* More updates

* Passing tests

* Fix copies

* Fix copies part 2

* Decision transformer updates

* Update src/transformers/models/gpt2/modeling_gpt2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Fix copies

* Decision transformer not supporting flash attn

* Addressed comments

* Addressed comments

* Addressed comments

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-03-28 09:31:24 +00:00
Arthur
3a7e68362b
[pipeline]. Zero shot add doc warning (#29845)
* add doc warning

* fix build pr
2024-03-28 09:10:26 +01:00
Arthur
543889f3f6
[GptNeox] don't gather on pkv when using the trainer (#29892)
don't gather on pkv when using the trainer
2024-03-28 08:56:53 +01:00
Arthur
b256516a8c
[make fix-copies] update and help (#29924)
* add some help

* style
2024-03-28 08:56:14 +01:00
Minseo Kang
d9dc993fdd
Fix typo in T5Block error message (#29881)
2024-03-28 03:30:29 +01:00
Lorenzo Verardo
a25037beb9
MixtralSparseMoeBlock: add gate jitter (#29865)
This commit adds gate jitter to MixtralSparseMoeBlock's input data
before passing it through the MoE layer, if turned on.
2024-03-27 16:14:26 +01:00
huismiling
75769744e9
add Cambricon MLUs support (#29627)
* add Cambricon MLUs support

* fix mlu device rng state

* up for quality check

* up mlu to support fp16

* fix mlu device dependency error

* fix mlu device dependency error

* enable mlu device for bf16

* fix mlu device memory tracker
2024-03-27 15:54:28 +01:00
Raushan Turganbay
0efcf32351
Move eos_token_id to stopping criteria (#29459)
* add eos stopping criteria

* minor fix

* Update tests/generation/test_stopping_criteria.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* check eos is not None and fix tests

* make style and fixup

* Update src/transformers/generation/stopping_criteria.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update tests/generation/test_utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update tests/generation/test_utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/__init__.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/generation/stopping_criteria.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/generation/stopping_criteria.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/generation/stopping_criteria.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* camel case everywhere

* call stopping criteria list for candidate ids

* make style  and fixup

* Empty commit

* Empty commit to pass flaky test

* set max length in PromptLookupCandidateGenerator

* Update src/transformers/generation/utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* lets fix this typo in docs

* Update src/transformers/generation/utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/generation/utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* update PR

* empty commit

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-03-27 12:18:10 +00:00
Marc Sun
31c575bcf1
fix fuyu device_map compatibility (#29880)
fix foward
2024-03-27 10:18:48 +01:00
Lysandre Debut
4d8427f739
Reimplement "Automatic safetensors conversion when lacking these files" (#29846)
* Automatic safetensors conversion when lacking these files (#29390)

* Automatic safetensors conversion when lacking these files

* Remove debug

* Thread name

* Typo

* Ensure that raises do not affect the main thread

* Catch all errors
2024-03-27 08:58:08 +01:00
Hovnatan Karapetyan
a81cf9ee90
Fix 29807, sinusoidal positional encodings overwritten by post_init() (#29813)
* Check for requires_grad when initing weights

* Add unit test

* Move sinusoidal positional encoding generation after post_init()

* Add modules to skip init list

* Move create_sinusoidal_embeddings to _init_weights
2024-03-27 06:28:00 +01:00
Anton Vlasjuk
cefb819f7a
Mamba slow_forward gradient fix (#29563)
* FIX: Cached slow forward in mamba
- additionally added mamba cached test
- added unused test (mamba causal lm forward and backward)
- fixed typo: "causl" --> "causal"

* formatting

* fix: use real `slow_forward` call instead of torch module's

* add shape assertion for mixer block test

* adjust shape assertion
2024-03-27 04:52:12 +01:00
Bo Zheng
1c39974a4c
Add Qwen2MoE (#29377)
* add support for qwen2 MoE models

* update docs

* add support for qwen2 MoE models

* update docs

* update model name & test

* update readme

* update class names & readme & model_doc of Qwen2MoE.

* update architecture name

* fix qwen2_moe tests

* use Qwen2Tokenizer instead of Qwen2MoeTokenizer

* update modeling_qwen2_moe.py

* fix model architecture

* fix qwen2_moe tests

* use Qwen2Tokenizer instead of Qwen2MoeTokenizer

* update modeling_qwen2_moe.py

* fix model architecture

* fix style

* fix test when there are sparse and non sparse layers

* fixup

* Update README.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fixup

* fixup

* add archive back

* add support for qwen2 MoE models

* update docs

* update model name & test

* update readme

* update class names & readme & model_doc of Qwen2MoE.

* update architecture name

* fix qwen2_moe tests

* use Qwen2Tokenizer instead of Qwen2MoeTokenizer

* update modeling_qwen2_moe.py

* fix model architecture

* fixup

* fix qwen2_moe tests

* use Qwen2Tokenizer instead of Qwen2MoeTokenizer

* fix style

* fix test when there are sparse and non sparse layers

* fixup

* add archive back

* fix integration test

* fixup

---------

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-03-27 02:11:55 +01:00
Benjamin Minixhofer
8e08acad6b
Support num_attention_heads != num_key_value_heads in Flax Llama Implementation (#29557)
* fix tinyllama flax modelling

* rename vars to minimize changes

* move

* formatting

* remove unused var
2024-03-27 02:08:43 +01:00
Lucain
f01e1609bf
Set custom_container in build docs workflows (#29855)
2024-03-26 14:46:02 +01:00
Ilyas Moutawwakil
07d79520ef
Disable AMD memory benchmarks (#29871)
* remove py3nvml to skip amd memory benchmarks

* uninstall pynvml from docker images
2024-03-26 14:43:12 +01:00
Yanyi Liu
ef60995858
Add cosine_with_min_lr scheduler in Trainer (#29341)
* Add cosine_with_min_lr scheduler

* Update error message for missing min_lr or min_lr_rate
2024-03-26 13:57:07 +01:00
Zhihao Lin
998b5bb56f
Allow bos_token_id is None during the generation with inputs_embeds (#29772)
* update

* add ut

* update
2024-03-26 12:51:00 +00:00
Michael
b9ceb03df8
[docs] Indent ordered list in add_new_model.md (#29796)
2024-03-26 12:03:39 +00:00
Merve Noyan
de81a677c4
Fix header in IFE task guide (#29859)
Update image_feature_extraction.md
2024-03-26 12:32:37 +01:00
yunxiangtang
b32bf85b58
Replace 'decord' with 'av' in VideoClassificationPipeline (#29747)
* replace the 'decord' with 'av' in VideoClassificationPipeline

* fix the check of backend in VideoClassificationPipeline

* adjust the order of imports

* format 'video_classification.py'

* format 'video_classification.py' with ruff

---------

Co-authored-by: wanqiancheng <13541261013@163.com>
2024-03-26 10:12:24 +00:00
Jonathan Flynn
b5a6d6eeab
Add warnings if training args differ from checkpoint trainer state (#29255)
* add warnings if training args differ from checkpoint args stored in trainer_state.json

* run formatting and styling

* add a test

* format and styling

---------

Co-authored-by: Jonathan Flynn <jonl.flynn@guardian.co.uk>
2024-03-26 07:13:13 +01:00
Johannes Kolbe
7eb3ba8224
remove quotes in code example (#29812)
Co-authored-by: Johannes <johannes.kolbe@tech.better.team>
2024-03-25 13:26:54 +00:00
Arthur Zucker
e3e16ddc3c
[revert commit] revert 00a09ed448
2024-03-25 22:01:01 +09:00
Arthur Zucker
00a09ed448
fix 😭
2024-03-25 21:57:31 +09:00
Yuki Watanabe
8e9a2207b3
Populate torch_dtype from model to pipeline (#28940)
* Populate torch_dtype from model to pipeline

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>

* use property

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>

* lint

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>

* Remove default handling

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>

---------

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
2024-03-25 10:46:40 +01:00
yhuang
afe73aed54
Fix the behavior of collecting 'num_input_tokens_seen' (#29099)
fix the behavior of collecting 'num_input_tokens_seen'

See https://github.com/huggingface/transformers/issues/28791 for more details.
2024-03-25 10:43:46 +01:00