Yoach Lacombe
569f6c7d43
Fix FA2 tests ( #29909 )
* fix FA2 tests
* refactor inference test name
2024-04-01 07:51:00 +00:00
Zach Mueller
3b8e2932ce
Rework tests to compare trainer checkpoint args ( #29883 )
* Start rework
* Fix failing test
* Include max
* Update src/transformers/trainer.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-03-30 22:19:17 -04:00
TechxGenus
6e584070d4
[BC] Fix BC for AWQ quant ( #29965 )
fix awq quant
2024-03-30 19:37:25 +01:00
Bo Zheng
46d636818b
Update model card and link of blog post. ( #29928 )
* Update qwen2_moe.md
* update link of blogpost.
* fixup
---------
Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
2024-03-30 17:49:03 +01:00
Gary Wang
f6701bc664
Reset alarm signal when the function is ended ( #29706 )
Fixes #29690
2024-03-30 17:41:27 +01:00
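For context, the bug class fixed here is a SIGALRM timer left armed after the guarded call returns. A minimal sketch of the pattern, with hypothetical names rather than the actual transformers code:

```python
import signal

def _timeout_handler(signum, frame):
    raise TimeoutError("operation timed out")

def run_with_timeout(fn, seconds=5):
    # Install a handler and arm the alarm before the call.
    old_handler = signal.signal(signal.SIGALRM, _timeout_handler)
    signal.alarm(seconds)
    try:
        return fn()
    finally:
        signal.alarm(0)  # disarm: otherwise the alarm can fire much later
        signal.signal(signal.SIGALRM, old_handler)
```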
Alexander Jipa
e644b60038
fix: get mlflow version from mlflow-skinny ( #29918 )
Co-authored-by: Alexander Jipa <azzhipa@amazon.com>
2024-03-30 17:38:29 +01:00
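The underlying issue: mlflow-skinny registers a different distribution name, so querying the version of "mlflow" fails even though the module imports. A hedged sketch of the fallback, not the verbatim transformers code:

```python
import importlib.metadata

def get_mlflow_version() -> str:
    try:
        return importlib.metadata.version("mlflow")
    except importlib.metadata.PackageNotFoundError:
        # mlflow-skinny ships the same `mlflow` module under another
        # distribution name, so try it before giving up.
        return importlib.metadata.version("mlflow-skinny")
```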
Jacky Lee
156d30da94
Add warning message for run_qa.py ( #29867 )
* improve: error message for best model metric
* update: raise warning instead of error
2024-03-30 17:02:31 +01:00
Jacky Lee
6fd93fe93a
Fix rope theta for OpenLlama ( #29893 )
fix: rope_theta for open llama
2024-03-30 16:30:52 +01:00
fzyzcjy
5ad7f17002
Super tiny fix 12 typos about "with with" ( #29926 )
* with with
* style
2024-03-29 14:31:31 +00:00
Yih-Dar
43d17c1836
Mark test_eager_matches_sdpa_generate flaky for some models ( #29479 )
* fix
* revert for qwen2
* revert for qwen2
* update
* update
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-03-29 11:51:20 +01:00
MariaHei
ba56ed0869
Update installs in image classification doc ( #29947 )
Trainer with PyTorch now requires accelerate to be installed.
Partly resolves huggingface/transformers#29174
2024-03-28 14:26:27 -07:00
Arthur
536ea2aca2
[LlamaSlowConverter] Slow to Fast better support ( #29797 )
* fix
* fix test
* style
* nit
* rather rely on convert token to id
* fix quality
* Update src/transformers/convert_slow_tokenizer.py
2024-03-28 16:19:32 +01:00
VINAYAKK GARG
e203646871
Fix doc issue #29758 in DebertaV2Config class ( #29842 )
Fix doc issue in DebertaV2Config class
Co-authored-by: Vinayakk Garg <vigar@akamai.com>
2024-03-28 14:49:57 +00:00
Arthur
2bbbf1be5b
[BC] Fix BC for other libraries ( #29934 )
* fix bc?
* nit
2024-03-28 15:13:23 +01:00
Yu Chin Fabian Lim
4df5b9b4b2
Allow GradientAccumulationPlugin to be configured from AcceleratorConfig ( #29589 )
* add gradient_accumulation_kwargs to AcceleratorConfig
* add suggestions from @muellerzr to docstrings, new behavior and tests
* Documentation suggestions from @muellerz
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* addressed @muellerzr comments regarding tests and test utils
* moved accelerate version to top of file.
* @muellerzr's variable fix
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* address @amyeroberts. fix tests and docstrings
* address @amyeroberts additional suggestions
---------
Co-authored-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
2024-03-28 14:01:40 +00:00
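A minimal sketch of the new surface, assuming `accelerator_config` accepts a plain dict and that `sync_with_dataloader` is a valid GradientAccumulationPlugin key in your accelerate version (values illustrative):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    gradient_accumulation_steps=4,
    # gradient_accumulation_kwargs is forwarded to accelerate's
    # GradientAccumulationPlugin.
    accelerator_config={
        "gradient_accumulation_kwargs": {"sync_with_dataloader": False},
    },
)
```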
Arthur
a2a7f71604
[TokenizationLlama] fix the way we convert tokens to strings to keep leading spaces 🚨 breaking fix ( #29453 )
* nit
* update test and fix test
* fixup
2024-03-28 13:58:40 +01:00
Arthur
e677479c81
[Mamba] from_pretrained issue with self.embeddings ( #29851 )
* nit
* update
* oups
* Update src/transformers/models/mamba/modeling_mamba.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
---------
Co-authored-by: Lysandre Debut <hi@lysand.re>
2024-03-28 13:54:51 +01:00
Joao Gante
441de62f49
RoPE models: add numerical sanity-check test for RoPE scaling ( #29808 )
* add hard rope scaling test
* make fixup
* quick rope scaling tests
* add copy statements
2024-03-28 11:25:50 +00:00
Christopher Keibel
aac7099c92
add functions to inspect model and optimizer status to trainer.py ( #29838 )
* add functions to get number of params which require grad, get optimizer group for parameters and get learning rates of param groups to trainer.py
* add tests and raise ValueError when optimizer is None
* add second layer to test and freeze its weights
* check if torch is available before running tests
* use decorator to check if torch is available
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix test indentation
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
2024-03-28 10:37:16 +00:00
amyeroberts
855b95ce34
Safe import of LRScheduler ( #29919 )
* Safe import of LRScheduler
* Update src/transformers/trainer_pt_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/trainer_pt_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Fix up
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-03-28 09:54:51 +00:00
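The compatibility pattern in question, sketched (torch >= 2.0 exposes the public name; older releases only have the private base class):

```python
try:
    from torch.optim.lr_scheduler import LRScheduler  # torch >= 2.0
except ImportError:
    from torch.optim.lr_scheduler import _LRScheduler as LRScheduler
```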
Aymeric Roucher
c9d2e855ea
Add beam search visualizer to the doc ( #29876 )
2024-03-28 09:54:08 +00:00
Joao Gante
248d5d23a2
Tests: replace torch.testing.assert_allclose by torch.testing.assert_close ( #29915 )
* replace torch.testing.assert_allclose by torch.testing.assert_close
* missing atol rtol
2024-03-28 09:53:31 +00:00
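The replacement in miniature; `assert_allclose` is deprecated upstream, and `assert_close` takes explicit tolerances:

```python
import torch

expected = torch.tensor([1.0, 2.0, 3.0])
actual = expected + 1e-7

# Raises AssertionError if the tensors differ beyond the tolerances.
torch.testing.assert_close(actual, expected, rtol=1e-5, atol=1e-6)
```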
Fanli Lin
7c19fafe44
[doc] fix some typos and add xpu to the testing documentation ( #29894 )
fix typo
2024-03-28 09:42:49 +00:00
Eduardo Pacheco
22d159ddf9
Adding Flash Attention 2 Support for GPT2 ( #29226 )
* First commit to add flash attention 2 for GPT-2
* more improvements
* Make GPT2 pass tests and fixed Decision Transformers copies
* Fixed missing arg
* fix copies
* Added expected speedup
* Update src/transformers/models/gpt2/modeling_gpt2.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt2/modeling_gpt2.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt2/modeling_gpt2.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Added test
* Fixed attn attribute
* Update docs/source/en/model_doc/gpt2.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/model_doc/gpt2.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update Decision transformer attentions
* More updates
* Passing tests
* Fix copies
* Fix copies part 2
* Decision transformer updates
* Update src/transformers/models/gpt2/modeling_gpt2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Fix copies
* Decision transformer not supporting flash attn
* Addressed comments
* Addressed comments
* Addressed comments
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-03-28 09:31:24 +00:00
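Opting in follows the usual pattern; a sketch assuming flash-attn is installed and a supported GPU is available:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    torch_dtype=torch.float16,  # FA2 requires fp16 or bf16
    attn_implementation="flash_attention_2",
).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```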
Arthur
3a7e68362b
[pipeline] Zero shot add doc warning ( #29845 )
* add doc warning
* fix build pr
2024-03-28 09:10:26 +01:00
Arthur
543889f3f6
[GptNeox] don't gather on pkv when using the trainer ( #29892 )
don't gather on pkv when using the trainer
2024-03-28 08:56:53 +01:00
Arthur
b256516a8c
[make fix-copies] update and help ( #29924 )
* add some help
* style
2024-03-28 08:56:14 +01:00
Minseo Kang
d9dc993fdd
Fix typo in T5Block error message ( #29881 )
2024-03-28 03:30:29 +01:00
Lorenzo Verardo
a25037beb9
MixtralSparseMoeBlock: add gate jitter ( #29865 )
This commit adds gate jitter to MixtralSparseMoeBlock's input data
before passing it through the MoE layer, when enabled.
2024-03-27 16:14:26 +01:00
huismiling
75769744e9
add Cambricon MLUs support ( #29627 )
* add Cambricon MLUs support
* fix mlu device rng state
* up for quality check
* up mlu to support fp16
* fix mlu device dependency error
* fix mlu device dependency error
* enable mlu device for bf16
* fix mlu device memory tracker
2024-03-27 15:54:28 +01:00
Raushan Turganbay
0efcf32351
Move eos_token_id to stopping criteria ( #29459 )
* add eos stopping criteria
* minor fix
* Update tests/generation/test_stopping_criteria.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* check eos is not None and fix tests
* make style and fixup
* Update src/transformers/generation/stopping_criteria.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update tests/generation/test_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update tests/generation/test_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/generation/__init__.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/generation/stopping_criteria.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/generation/stopping_criteria.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/generation/stopping_criteria.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* camel case everywhere
* call stopping criteria list for candidate ids
* make style and fixup
* Empty commit
* Empty commit to pass flaky test
* set max length in PromptLookupCandidateGenerator
* Update src/transformers/generation/utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* lets fix this typo in docs
* Update src/transformers/generation/utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/generation/utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* update PR
* empty commit
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-03-27 12:18:10 +00:00
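generate() builds this criterion internally from `eos_token_id`, but it can also be passed explicitly; a sketch assuming the new class is exported as `EosTokenCriteria`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, StoppingCriteriaList
from transformers.generation.stopping_criteria import EosTokenCriteria

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("The capital of France is", return_tensors="pt")

# EOS handling now lives in a criterion rather than ad-hoc checks
# inside the generation loop.
criteria = StoppingCriteriaList([EosTokenCriteria(eos_token_id=tokenizer.eos_token_id)])
out = model.generate(**inputs, stopping_criteria=criteria, max_new_tokens=30)
```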
Marc Sun
31c575bcf1
fix fuyu device_map compatibility ( #29880 )
fix forward
2024-03-27 10:18:48 +01:00
Lysandre Debut
4d8427f739
Reimplement "Automatic safetensors conversion when lacking these files" ( #29846 )
* Automatic safetensors conversion when lacking these files (#29390 )
* Automatic safetensors conversion when lacking these files
* Remove debug
* Thread name
* Typo
* Ensure that raises do not affect the main thread
* Catch all errors
2024-03-27 08:58:08 +01:00
Hovnatan Karapetyan
a81cf9ee90
Fix 29807, sinusoidal positional encodings overwritten by post_init() ( #29813 )
* Check for requires_grad when initing weights
* Add unit test
* Move sinusoidal positional encoding generation after post_init()
* Add modules to skip init list
* Move create_sinusoidal_embeddings to _init_weights
2024-03-27 06:28:00 +01:00
Anton Vlasjuk
cefb819f7a
Mamba slow_forward gradient fix ( #29563 )
* FIX: Cached slow forward in mamba
- additionally added mamba cached test
- added unused test (mamba causal lm forward and backward)
- fixed typo: "causl" --> "causal"
* formatting
* fix: use real `slow_forward` call instead of torch module's
* add shape assertion for mixer block test
* adjust shape assertion
2024-03-27 04:52:12 +01:00
Bo Zheng
1c39974a4c
Add Qwen2MoE ( #29377 )
* add support for qwen2 MoE models
* update docs
* update model name & test
* update readme
* update class names & readme & model_doc of Qwen2MoE.
* update architecture name
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* update modeling_qwen2_moe.py
* fix model architecture
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* update modeling_qwen2_moe.py
* fix model architecture
* fix style
* fix test when there are sparse and non sparse layers
* fixup
* Update README.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fixup
* fixup
* add archive back
* fix integration test
* fixup
---------
Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-03-27 02:11:55 +01:00
Benjamin Minixhofer
8e08acad6b
Support num_attention_heads != num_key_value_heads in Flax Llama Implementation ( #29557 )
* fix tinyllama flax modelling
* rename vars to minimize changes
* move
* formatting
* remove unused var
2024-03-27 02:08:43 +01:00
Lucain
f01e1609bf
Set custom_container in build docs workflows ( #29855 )
2024-03-26 14:46:02 +01:00
Ilyas Moutawwakil
07d79520ef
Disable AMD memory benchmarks ( #29871 )
* remove py3nvml to skip amd memory benchmarks
* uninstall pynvml from docker images
2024-03-26 14:43:12 +01:00
Yanyi Liu
ef60995858
Add cosine_with_min_lr scheduler in Trainer ( #29341 )
* Add cosine_with_min_lr scheduler
* Update error message for missing min_lr or min_lr_rate
2024-03-26 13:57:07 +01:00
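Usage sketch; per the second bullet, either `min_lr` (absolute) or `min_lr_rate` (fraction of the peak LR) must be supplied (values illustrative):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    learning_rate=5e-5,
    lr_scheduler_type="cosine_with_min_lr",
    lr_scheduler_kwargs={"min_lr": 1e-6},  # decay from 5e-5 down to 1e-6
)
```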
Zhihao Lin
998b5bb56f
Allow bos_token_id to be None during generation with inputs_embeds ( #29772 )
* update
* add ut
* update
2024-03-26 12:51:00 +00:00
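The scenario in miniature: generating from embeddings directly, where no BOS id is ever prepended (a sketch; gpt2 used only as a stand-in):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("Hello world", return_tensors="pt").input_ids
inputs_embeds = model.get_input_embeddings()(input_ids)
# For decoder-only models, the output contains only the newly generated tokens.
out = model.generate(inputs_embeds=inputs_embeds, max_new_tokens=10)
```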
Michael
b9ceb03df8
[docs] Indent ordered list in add_new_model.md ( #29796 )
2024-03-26 12:03:39 +00:00
Merve Noyan
de81a677c4
Fix header in IFE task guide ( #29859 )
Update image_feature_extraction.md
2024-03-26 12:32:37 +01:00
yunxiangtang
b32bf85b58
Replace 'decord' with 'av' in VideoClassificationPipeline ( #29747 )
* replace the 'decord' with 'av' in VideoClassificationPipeline
* fix the check of backend in VideoClassificationPipeline
* adjust the order of imports
* format 'video_classification.py'
* format 'video_classification.py' with ruff
---------
Co-authored-by: wanqiancheng <13541261013@163.com>
2024-03-26 10:12:24 +00:00
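Decoding frames with PyAV looks like this (a sketch of the replacement backend; the path is hypothetical):

```python
import av
import numpy as np

container = av.open("video.mp4")
frames = [
    frame.to_ndarray(format="rgb24")  # decode each frame to an RGB array
    for frame in container.decode(video=0)
]
video = np.stack(frames)  # (num_frames, height, width, 3)
```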
Jonathan Flynn
b5a6d6eeab
Add warnings if training args differ from checkpoint trainer state ( #29255 )
* add warnings if training args differ from checkpoint args stored in trainer_state.json
* run formatting and styling
* add a test
* format and styling
---------
Co-authored-by: Jonathan Flynn <jonl.flynn@guardian.co.uk>
2024-03-26 07:13:13 +01:00
Johannes Kolbe
7eb3ba8224
remove quotes in code example ( #29812 )
Co-authored-by: Johannes <johannes.kolbe@tech.better.team>
2024-03-25 13:26:54 +00:00
Arthur Zucker
e3e16ddc3c
[revert commit] revert 00a09ed448
2024-03-25 22:01:01 +09:00
Arthur Zucker
00a09ed448
fix 😭
2024-03-25 21:57:31 +09:00
Yuki Watanabe
8e9a2207b3
Populate torch_dtype from model to pipeline ( #28940 )
* Populate torch_dtype from model to pipeline
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
* use property
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
* lint
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
* Remove default handling
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
---------
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
2024-03-25 10:46:40 +01:00
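The effect, sketched: the pipeline now reports the dtype of the wrapped model rather than only what was passed at construction:

```python
import torch
from transformers import AutoModelForCausalLM, pipeline

model = AutoModelForCausalLM.from_pretrained("gpt2", torch_dtype=torch.bfloat16)
pipe = pipeline("text-generation", model=model, tokenizer="gpt2")
print(pipe.torch_dtype)  # torch.bfloat16, populated from the model
```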
yhuang
afe73aed54
Fix the behavior of collecting 'num_input_tokens_seen' ( #29099 )
fix the behavior of collecting 'num_input_tokens_seen'
See https://github.com/huggingface/transformers/issues/28791 for more details.
2024-03-25 10:43:46 +01:00