Joao Gante
441de62f49
RoPE models: add numerical sanity-check test for RoPE scaling ( #29808 )
...
* add hard rope scaling test
* make fixup
* quick rope scaling tests
* add copy statements
2024-03-28 11:25:50 +00:00
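The property such a sanity check can pin down: with linear RoPE scaling, the rotary angles at position p with factor f equal the unscaled angles at position p / f. A minimal plain-Python sketch (hypothetical helper name; not the test added by the PR):

```python
import math

def rope_angles(position, dim, base=10000.0, scaling_factor=1.0):
    # rotary angle for each frequency pair; linear scaling divides the position index
    pos = position / scaling_factor
    return [pos * base ** (-2 * i / dim) for i in range(dim // 2)]

# sanity check: scaled angles at position p match unscaled angles at p / factor
assert rope_angles(8, 64, scaling_factor=4.0) == rope_angles(2, 64)
```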
Christopher Keibel
aac7099c92
add functions to inspect model and optimizer status to trainer.py ( #29838 )
...
* add functions to get number of params which require grad, get optimizer group for parameters and get learning rates of param groups to trainer.py
* add tests and raise ValueError when optimizer is None
* add second layer to test and freeze its weights

* check if torch is available before running tests
* use decorator to check if torch is available
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix test indentation
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
2024-03-28 10:37:16 +00:00
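A rough sketch of what such inspection helpers can look like, using plain data structures standing in for `model.parameters()` and the optimizer (names and shapes are hypothetical, not the trainer.py API):

```python
def num_trainable_parameters(param_info):
    # param_info: list of (numel, requires_grad) pairs standing in for model.parameters()
    return sum(numel for numel, requires_grad in param_info if requires_grad)

def get_learning_rates(optimizer):
    # optimizer is a dict sketch: {"param_groups": [{"lr": ...}, ...]}
    if optimizer is None:
        raise ValueError("optimizer is None: build or attach an optimizer first")
    return [group["lr"] for group in optimizer["param_groups"]]
```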
amyeroberts
855b95ce34
Safe import of LRScheduler ( #29919 )
...
* Safe import of LRScheduler
* Update src/transformers/trainer_pt_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/trainer_pt_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Fix up
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-03-28 09:54:51 +00:00
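The guarded-import pattern behind a change like this (the exact symbols are an assumption here: torch >= 2.0 made `LRScheduler` public, while older versions only expose the private `_LRScheduler`):

```python
try:
    # torch >= 2.0 exposes the public LRScheduler name
    from torch.optim.lr_scheduler import LRScheduler
except ImportError:
    try:
        # older torch only has the private _LRScheduler
        from torch.optim.lr_scheduler import _LRScheduler as LRScheduler
    except ImportError:
        LRScheduler = object  # torch unavailable in this sketch
```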
Aymeric Roucher
c9d2e855ea
Add beam search visualizer to the doc ( #29876 )
2024-03-28 09:54:08 +00:00
Joao Gante
248d5d23a2
Tests: replace torch.testing.assert_allclose by torch.testing.assert_close ( #29915 )
...
* replace torch.testing.assert_allclose by torch.testing.assert_close
* missing atol rtol
2024-03-28 09:53:31 +00:00
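`torch.testing.assert_allclose` is deprecated in favor of `torch.testing.assert_close`, and the two use different default tolerances, hence the "missing atol rtol" follow-up. The check both perform is of the form |actual - expected| <= atol + rtol * |expected|; a pure-Python sketch of that formula (tolerance values illustrative):

```python
def close(actual, expected, rtol=1.3e-6, atol=1e-5):
    # elementwise tolerance check: |actual - expected| <= atol + rtol * |expected|
    return all(abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected))

assert close([1.0, 2.0], [1.0, 2.000001])
assert not close([1.0], [1.1])
```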
Fanli Lin
7c19fafe44
[doc] fix some typos and add xpu to the testing documentation ( #29894 )
...
fix typo
2024-03-28 09:42:49 +00:00
Eduardo Pacheco
22d159ddf9
Adding Flash Attention 2 Support for GPT2 ( #29226 )
...
* First commit to add flash attention 2 for GPT-2
* more improvements
* Make GPT2 pass tests and fixed Decision Transformers copies
* Fixed missing arg
* fix copies
* Added expected speedup
* Update src/transformers/models/gpt2/modeling_gpt2.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt2/modeling_gpt2.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt2/modeling_gpt2.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Added test
* Fixed attn attribute
* Update docs/source/en/model_doc/gpt2.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/model_doc/gpt2.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update Decision transformer attentions
* More updates
* Passing tests
* Fix copies
* Fix copies part 2
* Decision transformer updates
* Update src/transformers/models/gpt2/modeling_gpt2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Fix copies
* Decision transformer not supporting flash attn
* Addressed comments
* Addressed comments
* Addressed comments
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-03-28 09:31:24 +00:00
Arthur
3a7e68362b
[pipeline] Zero shot add doc warning ( #29845 )
...
* add doc warning
* fix build pr
2024-03-28 09:10:26 +01:00
Arthur
543889f3f6
[GptNeox] don't gather on pkv when using the trainer ( #29892 )
...
don't gather on pkv when using the trainer
2024-03-28 08:56:53 +01:00
Arthur
b256516a8c
[make fix-copies] update and help ( #29924 )
...
* add some help
* style
2024-03-28 08:56:14 +01:00
Minseo Kang
d9dc993fdd
Fix typo in T5Block error message ( #29881 )
2024-03-28 03:30:29 +01:00
Lorenzo Verardo
a25037beb9
MixtralSparseMoeBlock: add gate jitter ( #29865 )
...
This commit adds gate jitter to MixtralSparseMoeBlock's input data
before passing it through the MoE layer, if turned on.
2024-03-27 16:14:26 +01:00
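Gate jitter here means multiplying the router's input by uniform noise in [1 - eps, 1 + eps] during training. A minimal sketch on plain floats (hypothetical helper; the real block operates on hidden-state tensors):

```python
import random

def jitter_inputs(hidden_states, jitter_noise, training=True):
    # scale inputs by uniform noise in [1 - eps, 1 + eps] before the router, if enabled
    if training and jitter_noise > 0:
        return [h * random.uniform(1.0 - jitter_noise, 1.0 + jitter_noise)
                for h in hidden_states]
    return hidden_states
```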
huismiling
75769744e9
add Cambricon MLUs support ( #29627 )
...
* add Cambricon MLUs support
* fix mlu device rng state
* up for quality check
* up mlu to support fp16
* fix mlu device dependency error
* fix mlu device dependency error
* enable mlu device for bf16
* fix mlu device memory tracker
2024-03-27 15:54:28 +01:00
Raushan Turganbay
0efcf32351
Move eos_token_id to stopping criteria ( #29459 )
...
* add eos stopping criteria
* minor fix
* Update tests/generation/test_stopping_criteria.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* check eos is not None and fix tests
* make style and fixup
* Update src/transformers/generation/stopping_criteria.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update tests/generation/test_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update tests/generation/test_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/generation/__init__.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/generation/stopping_criteria.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/generation/stopping_criteria.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/generation/stopping_criteria.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* camel case everywhere
* call stopping criteria list for candidate ids
* make style and fixup
* Empty commit
* Empty commit to pass flaky test
* set max length in PromptLookupCandidateGenerator
* Update src/transformers/generation/utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* lets fix this typo in docs
* Update src/transformers/generation/utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/generation/utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* update PR
* empty commit
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-03-27 12:18:10 +00:00
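Moving EOS handling into a stopping criterion means the check becomes an ordinary criteria object instead of ad-hoc logic in generate(). A toy sketch on token-id lists (hypothetical class, not the transformers signature):

```python
class EosTokenCriteria:
    """Minimal sketch: flag sequences whose last generated token is an EOS id."""

    def __init__(self, eos_token_id):
        ids = eos_token_id if isinstance(eos_token_id, (list, tuple, set)) else [eos_token_id]
        self.eos_token_ids = set(ids)

    def __call__(self, input_ids):
        # one done-flag per sequence in the batch
        return [seq[-1] in self.eos_token_ids for seq in input_ids]
```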
Marc Sun
31c575bcf1
fix fuyu device_map compatibility ( #29880 )
...
fix forward
2024-03-27 10:18:48 +01:00
Lysandre Debut
4d8427f739
Reimplement "Automatic safetensors conversion when lacking these files" ( #29846 )
...
* Automatic safetensors conversion when lacking these files (#29390 )
* Automatic safetensors conversion when lacking these files
* Remove debug
* Thread name
* Typo
* Ensure that raises do not affect the main thread
* Catch all errors
2024-03-27 08:58:08 +01:00
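The "ensure that raises do not affect the main thread" bullet is the interesting part: the conversion runs in a named background thread that swallows all exceptions. A generic sketch of that pattern (names hypothetical):

```python
import threading

def run_in_background(fn, thread_name="safetensors-conversion-sketch"):
    # swallow every exception inside the worker so a raise never reaches the main thread
    def target():
        try:
            fn()
        except Exception:
            pass
    thread = threading.Thread(target=target, name=thread_name, daemon=True)
    thread.start()
    return thread
```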
Hovnatan Karapetyan
a81cf9ee90
Fix 29807, sinusoidal positional encodings overwritten by post_init() ( #29813 )
...
* Check for requires_grad when initing weights
* Add unit test
* Move sinusoidal positional encoding generation after post_init()
* Add modules to skip init list
* Move create_sinusoidal_embeddings to _init_weights
2024-03-27 06:28:00 +01:00
Anton Vlasjuk
cefb819f7a
Mamba slow_forward gradient fix ( #29563 )
...
* FIX: Cached slow forward in mamba
- additionally added mamba cached test
- added unused test (mamba causal lm forward and backward)
- fixed typo: "causl" --> "causal"
* formatting
* fix: use real `slow_forward` call instead of torch module's
* add shape assertion for mixer block test
* adjust shape assertion
2024-03-27 04:52:12 +01:00
Bo Zheng
1c39974a4c
Add Qwen2MoE ( #29377 )
...
* add support for qwen2 MoE models
* update docs
* add support for qwen2 MoE models
* update docs
* update model name & test
* update readme
* update class names & readme & model_doc of Qwen2MoE.
* update architecture name
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* update modeling_qwen2_moe.py
* fix model architecture
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* update modeling_qwen2_moe.py
* fix model architecture
* fix style
* fix test when there are sparse and non sparse layers
* fixup
* Update README.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fixup
* fixup
* add archive back
* add support for qwen2 MoE models
* update docs
* update model name & test
* update readme
* update class names & readme & model_doc of Qwen2MoE.
* update architecture name
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* update modeling_qwen2_moe.py
* fix model architecture
* fixup
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* fix style
* fix test when there are sparse and non sparse layers
* fixup
* add archive back
* fix integration test
* fixup
---------
Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-03-27 02:11:55 +01:00
Benjamin Minixhofer
8e08acad6b
Support num_attention_heads != num_key_value_heads in Flax Llama Implementation ( #29557 )
...
* fix tinyllama flax modelling
* rename vars to minimize changes
* move
* formatting
* remove unused var
2024-03-27 02:08:43 +01:00
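When num_attention_heads != num_key_value_heads (grouped-query attention), each key/value head must be tiled so every query head in its group can attend to it. A list-based sketch of the idea (the Flax code does this on arrays, not lists):

```python
def repeat_kv_heads(kv_heads, num_attention_heads):
    # tile each key/value head so every query head in its group sees its kv head
    n_rep, rem = divmod(num_attention_heads, len(kv_heads))
    assert rem == 0, "num_attention_heads must be a multiple of num_key_value_heads"
    return [head for head in kv_heads for _ in range(n_rep)]
```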
Lucain
f01e1609bf
Set custom_container in build docs workflows ( #29855 )
2024-03-26 14:46:02 +01:00
Ilyas Moutawwakil
07d79520ef
Disable AMD memory benchmarks ( #29871 )
...
* remove py3nvml to skip amd memory benchmarks
* uninstall pynvml from docker images
2024-03-26 14:43:12 +01:00
Yanyi Liu
ef60995858
Add cosine_with_min_lr scheduler in Trainer ( #29341 )
...
* Add cosine_with_min_lr scheduler
* Update error message for missing min_lr or min_lr_rate
2024-03-26 13:57:07 +01:00
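The schedule decays the learning rate along a cosine curve but floors it at min_lr instead of zero. A sketch of the formula (hypothetical function; the Trainer wires this through an LR scheduler):

```python
import math

def cosine_with_min_lr(step, total_steps, base_lr, min_lr):
    # cosine decay from base_lr down to min_lr instead of zero
    progress = min(step / total_steps, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```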
Zhihao Lin
998b5bb56f
Allow bos_token_id to be None during generation with inputs_embeds ( #29772 )
...
* update
* add ut
* update
2024-03-26 12:51:00 +00:00
Michael
b9ceb03df8
[docs] Indent ordered list in add_new_model.md ( #29796 )
2024-03-26 12:03:39 +00:00
Merve Noyan
de81a677c4
Fix header in IFE task guide ( #29859 )
...
Update image_feature_extraction.md
2024-03-26 12:32:37 +01:00
yunxiangtang
b32bf85b58
Replace 'decord' with 'av' in VideoClassificationPipeline ( #29747 )
...
* replace the 'decord' with 'av' in VideoClassificationPipeline
* fix the check of backend in VideoClassificationPipeline
* adjust the order of imports
* format 'video_classification.py'
* format 'video_classification.py' with ruff
---------
Co-authored-by: wanqiancheng <13541261013@163.com>
2024-03-26 10:12:24 +00:00
Jonathan Flynn
b5a6d6eeab
Add warnings if training args differ from checkpoint trainer state ( #29255 )
...
* add warnings if training args differ from checkpoint args stored in trainer_state.json
* run formatting and styling
* add a test
* format and styling
---------
Co-authored-by: Jonathan Flynn <jonl.flynn@guardian.co.uk>
2024-03-26 07:13:13 +01:00
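The warning logic boils down to diffing the live training arguments against the values stored in trainer_state.json. A dict-based sketch (key names illustrative, not the exact set the PR checks):

```python
import warnings

def check_resume_args(current, checkpoint,
                      keys=("per_device_train_batch_size", "seed")):
    # warn for each tracked training arg that differs from the checkpointed value
    mismatched = [k for k in keys if current.get(k) != checkpoint.get(k)]
    for k in mismatched:
        warnings.warn(
            f"training arg {k!r} differs from checkpoint: "
            f"{current.get(k)} vs {checkpoint.get(k)}"
        )
    return mismatched
```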
Johannes Kolbe
7eb3ba8224
remove quotes in code example ( #29812 )
...
Co-authored-by: Johannes <johannes.kolbe@tech.better.team>
2024-03-25 13:26:54 +00:00
Arthur Zucker
e3e16ddc3c
[revert commit] revert 00a09ed448
2024-03-25 22:01:01 +09:00
Arthur Zucker
00a09ed448
fix 😭
2024-03-25 21:57:31 +09:00
Yuki Watanabe
8e9a2207b3
Populate torch_dtype from model to pipeline ( #28940 )
...
* Populate torch_dtype from model to pipeline
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
* use property
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
* lint
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
* Remove default handling
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
---------
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
2024-03-25 10:46:40 +01:00
yhuang
afe73aed54
Fix the behavior of collecting 'num_input_tokens_seen' ( #29099 )
...
fix the behavior of collecting 'num_input_tokens_seen'
See https://github.com/huggingface/transformers/issues/28791 for more details.
2024-03-25 10:43:46 +01:00
Lysandre Debut
39114c0383
Remove static pretrained maps from the library's internals ( #29112 )
...
* [test_all] Remove static pretrained maps from the library's internals
* Deprecate archive maps instead of removing them
* Revert init changes
* [test_all] Deprecate instead of removing
* [test_all] PVT v2 support
* [test_all] Tests should all pass
* [test_all] Style
* Address review comments
* Update src/transformers/models/deprecated/_archive_maps.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/deprecated/_archive_maps.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* [test_all] trigger tests
* [test_all] LLAVA
* [test_all] Bad rebase
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-03-25 10:33:38 +01:00
gamepad_coder
76a33a1092
model_summary.md - Restore link to Harvard's Annotated Transformer. ( #29702 )
...
* model_summary.md - Add link to Harvard's Annotated Transformer.
* model_summary.md - slight wording change + capitalize name of the paper
* model_summary.md - moves the Annotated Transformer link into parentheses next to the link to the original paper (great idea, stevhliu!)
* model_summary.md - moves the Annotated Transformer link into parentheses next to the link to the original paper (commit pt. 2, accidentally removed "has" in pt. 1)
2024-03-23 18:29:39 -07:00
Billy Cao
dafe370255
[DOCS] Fix typo for llava next docs ( #29829 )
...
Fix typo for llava next docs
2024-03-23 11:32:31 -07:00
amyeroberts
c5f0288bc7
[SuperPoint] Fix doc example ( #29816 )
...
[SuperPoint] Fix doc example
2024-03-22 16:04:30 +00:00
Lysandre Debut
7e1413d16a
Complete security policy with mentions of remote code ( #29707 )
...
* Security policy
* Apply suggestions from code review
Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>
Co-authored-by: Michelle Habonneau <83347449+Michellehbn@users.noreply.github.com>
* Update SECURITY.md
Co-authored-by: Diogo Teles Sant'Anna <diogoteles@google.com>
---------
Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>
Co-authored-by: Michelle Habonneau <83347449+Michellehbn@users.noreply.github.com>
Co-authored-by: Diogo Teles Sant'Anna <diogoteles@google.com>
2024-03-22 14:13:18 +01:00
Arthur
2e7cb46f85
[cleanup] vestiges of causal mask ( #29806 )
...
nit
2024-03-22 12:25:40 +00:00
igeni
884b2215c3
replaced concatenation with f-strings to improve readability and unify … ( #29785 )
...
replaced concatenation with f-strings to improve readability and unify with the rest of the code
2024-03-22 12:23:16 +00:00
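The change in a nutshell, plain concatenation versus an f-string (example strings invented here):

```python
name, count = "qwen2_moe", 3
# before: string concatenation
message = "model " + name + " has " + str(count) + " sparse layers"
# after: a single f-string, easier to read and consistent with the rest of the codebase
message_f = f"model {name} has {count} sparse layers"
assert message == message_f
```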
Joao Gante
34e07f4ba8
Generate: remove unused attributes in AssistedCandidateGenerator ( #29787 )
...
remove unused attrs
2024-03-22 12:20:32 +00:00
jiqing-feng
e85654f5ec
rm input dtype change in CPU ( #28631 )
...
* rm input dtype change in CPU
* add warning when use CPU low-precision
* rm useless logging
2024-03-22 12:02:43 +00:00
fxmarty
13b23704a8
Correct llava mask & fix missing setter for vocab_size ( #29389 )
...
* correct llava mask
* fix vipllava as well
* mask out embedding for padding tokens
* add test
* fix style
* add setter
* fix test on suggestion
2024-03-22 19:57:08 +08:00
Ilyas Moutawwakil
aa17cf986f
Enable AMD docker build CI ( #29803 )
...
* enable amd ci
* remove unnecessary clean up
2024-03-22 11:56:47 +01:00
Steven Madere
347916130c
Fix type hint for train_dataset param of Trainer.__init__() to allow IterableDataset. Issue 29678 ( #29738 )
...
* Fixed typehint for train_dataset param in Trainer.__init__(). Added IterableDataset option.
* make fixup
2024-03-22 10:46:14 +00:00
Arthur
e68ff30419
[quality] update quality check to make sure we check imports 😈 ( #29771 )
...
* update quality check
* make it nice
* update
* let's make sure it runs and we have the logs actually
* update workflow
* nits
2024-03-22 10:11:59 +01:00
Raushan Turganbay
fadb053379
Change in-place operations to out-of-place in LogitsProcessors ( #29680 )
...
* change in-place -> out-of-place
* add tests
* add more tests
* naming consistency
* fix doctest
* forgot min-length processors
* empty
* Revert "fix doctest"
This reverts commit 4772768457.
* revert change in docstring
* Update tests/generation/test_logits_process.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/generation/test_logits_process.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-03-21 16:37:33 +00:00
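Out-of-place means a processor returns fresh scores rather than mutating the tensor it was given, which keeps the caller's buffer intact. A list-based sketch of a min-length EOS suppressor (hypothetical name; real processors operate on score tensors):

```python
def suppress_eos_out_of_place(scores, eos_token_id, cur_len, min_length):
    # return a new score list instead of writing -inf into `scores` in place
    if cur_len >= min_length:
        return scores
    return [float("-inf") if i == eos_token_id else s for i, s in enumerate(scores)]
```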
Raushan Turganbay
b469ebc5cf
Prepend bos token to Blip generations ( #29642 )
...
* prepend "bos" to blip generation
* minor changes
* Update src/transformers/models/blip_2/modeling_blip_2.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/models/instructblip/modeling_instructblip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add generation tester mixin
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-03-21 16:33:18 +00:00
Joao Gante
ee38fc31fb
Llama: always convert the causal mask in the SDPA code path ( #29663 )
...
* always convert the mask
* rebase and fix copies
2024-03-21 16:30:18 +00:00
Joao Gante
5ffef2a978
Generate: remove legacy generation mixin imports ( #29782 )
2024-03-21 16:28:25 +00:00