Arthur
543889f3f6
[GptNeox] don't gather on pkv when using the trainer (#29892)
don't gather on pkv when using the trainer
2024-03-28 08:56:53 +01:00
Arthur
b256516a8c
[make fix-copies] update and help (#29924)
* add some help
* style
2024-03-28 08:56:14 +01:00
Minseo Kang
d9dc993fdd
Fix typo in T5Block error message (#29881)
2024-03-28 03:30:29 +01:00
Lorenzo Verardo
a25037beb9
MixtralSparseMoeBlock: add gate jitter (#29865)
When enabled, this commit adds gate jitter to MixtralSparseMoeBlock's input data before passing it through the MoE layer.
2024-03-27 16:14:26 +01:00
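A minimal sketch of what multiplicative gate jitter can look like, assuming uniform noise scaling of the MoE input during training (the function and argument names are illustrative, not the actual Mixtral code):

```python
import torch

def apply_gate_jitter(hidden_states: torch.Tensor, jitter_noise: float, training: bool) -> torch.Tensor:
    # Scale MoE inputs by uniform noise in [1 - jitter, 1 + jitter], training only.
    if training and jitter_noise > 0.0:
        noise = torch.empty_like(hidden_states).uniform_(1.0 - jitter_noise, 1.0 + jitter_noise)
        hidden_states = hidden_states * noise
    return hidden_states
```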
huismiling
75769744e9
add Cambricon MLUs support (#29627)
* add Cambricon MLUs support
* fix mlu device rng state
* up for quality check
* up mlu to support fp16
* fix mlu device dependency error
* fix mlu device dependency error
* enable mlu device for bf16
* fix mlu device memory tracker
2024-03-27 15:54:28 +01:00
Raushan Turganbay
0efcf32351
Move eos_token_id to stopping criteria (#29459)
* add eos stopping criteria
* minor fix
* Update tests/generation/test_stopping_criteria.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* check eos is not None and fix tests
* make style and fixup
* Update src/transformers/generation/stopping_criteria.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update tests/generation/test_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update tests/generation/test_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/generation/__init__.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/generation/stopping_criteria.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/generation/stopping_criteria.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/generation/stopping_criteria.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* camel case everywhere
* call stopping criteria list for candidate ids
* make style and fixup
* Empty commit
* Empty commit to pass flaky test
* set max length in PromptLookupCandidateGenerator
* Update src/transformers/generation/utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* lets fix this typo in docs
* Update src/transformers/generation/utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/generation/utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* update PR
* empty commit
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-03-27 12:18:10 +00:00
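A rough sketch of the idea behind an EOS-based stopping criterion, assuming the convention that criteria are callables over (input_ids, scores); the class name follows the PR title, but the details are illustrative:

```python
import torch

class EosTokenCriteriaSketch:
    # Stop generation once every sequence ends in one of the EOS token ids.
    def __init__(self, eos_token_id):
        if isinstance(eos_token_id, int):
            eos_token_id = [eos_token_id]
        self.eos_token_id = torch.tensor(eos_token_id)

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> bool:
        is_done = torch.isin(input_ids[:, -1], self.eos_token_id.to(input_ids.device))
        return bool(is_done.all())
```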
Marc Sun
31c575bcf1
fix fuyu device_map compatibility (#29880)
fix forward
2024-03-27 10:18:48 +01:00
Lysandre Debut
4d8427f739
Reimplement "Automatic safetensors conversion when lacking these files" (#29846)
* Automatic safetensors conversion when lacking these files (#29390)
* Automatic safetensors conversion when lacking these files
* Remove debug
* Thread name
* Typo
* Ensure that raises do not affect the main thread
* Catch all errors
2024-03-27 08:58:08 +01:00
Hovnatan Karapetyan
a81cf9ee90
Fix #29807: sinusoidal positional encodings overwritten by post_init() (#29813)
* Check for requires_grad when initing weights
* Add unit test
* Move sinusoidal positional encoding generation after post_init()
* Add modules to skip init list
* Move create_sinusoidal_embeddings to _init_weights
2024-03-27 06:28:00 +01:00
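For context, a minimal sketch of sinusoidal position encodings generated inside weight initialization so that a later post_init() pass cannot clobber them; the function name matches the bullet above, but the body and placement are assumptions:

```python
import math
import torch

def create_sinusoidal_embeddings(n_pos: int, dim: int, out: torch.Tensor) -> None:
    # Fill `out` with fixed sin/cos encodings (assumes even `dim`); not learned, so no grad.
    position = torch.arange(n_pos, dtype=torch.float).unsqueeze(1)
    div_term = torch.exp(torch.arange(0, dim, 2, dtype=torch.float) * (-math.log(10000.0) / dim))
    with torch.no_grad():
        out[:, 0::2] = torch.sin(position * div_term)
        out[:, 1::2] = torch.cos(position * div_term)
    out.requires_grad = False
```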
Anton Vlasjuk
cefb819f7a
Mamba slow_forward gradient fix (#29563)
* FIX: Cached slow forward in mamba
- additionally added mamba cached test
- added unused test (mamba causal lm forward and backward)
- fixed typo: "causl" --> "causal"
* formatting
* fix: use real `slow_forward` call instead of torch module's
* add shape assertion for mixer block test
* adjust shape assertion
2024-03-27 04:52:12 +01:00
Bo Zheng
1c39974a4c
Add Qwen2MoE (#29377)
* add support for qwen2 MoE models
* update docs
* add support for qwen2 MoE models
* update docs
* update model name & test
* update readme
* update class names & readme & model_doc of Qwen2MoE.
* update architecture name
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* update modeling_qwen2_moe.py
* fix model architecture
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* update modeling_qwen2_moe.py
* fix model architecture
* fix style
* fix test when there are sparse and non sparse layers
* fixup
* Update README.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fixup
* fixup
* add archive back
* add support for qwen2 MoE models
* update docs
* update model name & test
* update readme
* update class names & readme & model_doc of Qwen2MoE.
* update architecture name
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* update modeling_qwen2_moe.py
* fix model architecture
* fixup
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* fix style
* fix test when there are sparse and non sparse layers
* fixup
* add archive back
* fix integration test
* fixup
---------
Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-03-27 02:11:55 +01:00
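Once merged, loading a Qwen2MoE checkpoint should follow the usual auto-class pattern; the checkpoint id below is illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-MoE-A2.7B"  # illustrative checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)  # plain Qwen2Tokenizer, per the PR
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```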
Benjamin Minixhofer
8e08acad6b
Support num_attention_heads != num_key_value_heads in Flax Llama Implementation (#29557)
* fix tinyllama flax modelling
* rename vars to minimize changes
* move
* formatting
* remove unused var
2024-03-27 02:08:43 +01:00
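The core of supporting num_attention_heads != num_key_value_heads is tiling the key/value heads to match the query heads; a sketch in JAX (the helper name and shape layout are assumptions):

```python
import jax.numpy as jnp

def repeat_kv(hidden: jnp.ndarray, n_rep: int) -> jnp.ndarray:
    # Tile KV heads so num_key_value_heads * n_rep == num_attention_heads.
    batch, seq_len, num_kv_heads, head_dim = hidden.shape
    if n_rep == 1:
        return hidden
    hidden = jnp.broadcast_to(
        hidden[:, :, :, None, :], (batch, seq_len, num_kv_heads, n_rep, head_dim)
    )
    return hidden.reshape(batch, seq_len, num_kv_heads * n_rep, head_dim)
```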
Lucain
f01e1609bf
Set custom_container in build docs workflows (#29855)
2024-03-26 14:46:02 +01:00
Ilyas Moutawwakil
07d79520ef
Disable AMD memory benchmarks (#29871)
* remove py3nvml to skip amd memory benchmarks
* uninstall pynvml from docker images
2024-03-26 14:43:12 +01:00
Yanyi Liu
ef60995858
Add cosine_with_min_lr scheduler in Trainer (#29341)
* Add cosine_with_min_lr scheduler
* Update error message for missing min_lr or min_lr_rate
2024-03-26 13:57:07 +01:00
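The schedule shape is an ordinary warmup-plus-cosine decay whose floor is min_lr (or min_lr_rate as a fraction of the initial LR) rather than zero; a sketch of the multiplier, with names assumed from the error message mentioned above:

```python
import math

def cosine_with_min_lr_lambda(step: int, num_warmup_steps: int, num_training_steps: int, min_lr_rate: float) -> float:
    # Returns a factor the initial learning rate is multiplied by at `step`.
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))
    return min_lr_rate + (1.0 - min_lr_rate) * cosine  # decays to min_lr_rate, not 0
```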
Zhihao Lin
998b5bb56f
Allow bos_token_id to be None during generation with inputs_embeds (#29772)
* update
* add ut
* update
2024-03-26 12:51:00 +00:00
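The scenario this fixes, sketched with a generic causal LM (checkpoint id illustrative): when generation starts from inputs_embeds there is no prompt token to prepend, so a missing bos_token_id should not raise:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("Hello", return_tensors="pt").input_ids
inputs_embeds = model.get_input_embeddings()(input_ids)

# No input_ids passed: generation is seeded entirely by the embeddings.
out = model.generate(inputs_embeds=inputs_embeds, max_new_tokens=5)
```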
Michael
b9ceb03df8
[docs] Indent ordered list in add_new_model.md (#29796)
2024-03-26 12:03:39 +00:00
Merve Noyan
de81a677c4
Fix header in IFE task guide (#29859)
Update image_feature_extraction.md
2024-03-26 12:32:37 +01:00
yunxiangtang
b32bf85b58
Replace 'decord' with 'av' in VideoClassificationPipeline (#29747)
* replace the 'decord' with 'av' in VideoClassificationPipeline
* fix the check of backend in VideoClassificationPipeline
* adjust the order of imports
* format 'video_classification.py'
* format 'video_classification.py' with ruff
---------
Co-authored-by: wanqiancheng <13541261013@163.com>
2024-03-26 10:12:24 +00:00
Jonathan Flynn
b5a6d6eeab
Add warnings if training args differ from checkpoint trainer state (#29255)
* add warnings if training args differ from checkpoint args stored in trainer_state.json
* run formatting and styling
* add a test
* format and styling
---------
Co-authored-by: Jonathan Flynn <jonl.flynn@guardian.co.uk>
2024-03-26 07:13:13 +01:00
Johannes Kolbe
7eb3ba8224
remove quotes in code example (#29812)
Co-authored-by: Johannes <johannes.kolbe@tech.better.team>
2024-03-25 13:26:54 +00:00
Arthur Zucker
e3e16ddc3c
[revert commit] revert 00a09ed448
2024-03-25 22:01:01 +09:00
Arthur Zucker
00a09ed448
fix 😭
2024-03-25 21:57:31 +09:00
Yuki Watanabe
8e9a2207b3
Populate torch_dtype from model to pipeline (#28940)
* Populate torch_dtype from model to pipeline
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
* use property
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
* lint
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
* Remove default handling
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
---------
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
2024-03-25 10:46:40 +01:00
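A usage sketch of the behavior this enables, assuming the torch_dtype property lands as described in the bullets above:

```python
import torch
from transformers import AutoModelForCausalLM, pipeline

model = AutoModelForCausalLM.from_pretrained("gpt2", torch_dtype=torch.float16)
pipe = pipeline("text-generation", model=model, tokenizer="gpt2")

# The pipeline now reports the dtype of the model it wraps.
print(pipe.torch_dtype)  # torch.float16
```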
yhuang
afe73aed54
Fix the behavior of collecting 'num_input_tokens_seen' (#29099)
fix the behavior of collecting 'num_input_tokens_seen'
See https://github.com/huggingface/transformers/issues/28791 for more details.
2024-03-25 10:43:46 +01:00
Lysandre Debut
39114c0383
Remove static pretrained maps from the library's internals (#29112)
* [test_all] Remove static pretrained maps from the library's internals
* Deprecate archive maps instead of removing them
* Revert init changes
* [test_all] Deprecate instead of removing
* [test_all] PVT v2 support
* [test_all] Tests should all pass
* [test_all] Style
* Address review comments
* Update src/transformers/models/deprecated/_archive_maps.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/deprecated/_archive_maps.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* [test_all] trigger tests
* [test_all] LLAVA
* [test_all] Bad rebase
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-03-25 10:33:38 +01:00
gamepad_coder
76a33a1092
model_summary.md - Restore link to Harvard's Annotated Transformer. (#29702)
* model_summary.md - Add link to Harvard's Annotated Transformer.
* model_summary.md - slight wording change + capitalize name of the paper
* model_summary.md - moves the Annotated Transformer link into parentheses next to the link to the original paper (great idea, stevhliu!)
* model_summary.md - moves the Annotated Transformer link into parentheses next to the link to the original paper (commit pt. 2, accidentally removed "has" in pt. 1)
2024-03-23 18:29:39 -07:00
Billy Cao
dafe370255
[DOCS] Fix typo for llava next docs (#29829)
Fix typo for llava next docs
2024-03-23 11:32:31 -07:00
amyeroberts
c5f0288bc7
[SuperPoint] Fix doc example (#29816)
[SuperPoint] Fix doc example
2024-03-22 16:04:30 +00:00
Lysandre Debut
7e1413d16a
Complete security policy with mentions of remote code (#29707)
* Security policy
* Apply suggestions from code review
Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>
Co-authored-by: Michelle Habonneau <83347449+Michellehbn@users.noreply.github.com>
* Update SECURITY.md
Co-authored-by: Diogo Teles Sant'Anna <diogoteles@google.com>
---------
Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>
Co-authored-by: Michelle Habonneau <83347449+Michellehbn@users.noreply.github.com>
Co-authored-by: Diogo Teles Sant'Anna <diogoteles@google.com>
2024-03-22 14:13:18 +01:00
Arthur
2e7cb46f85
[cleanup] vestiges of causal mask (#29806)
nit
2024-03-22 12:25:40 +00:00
igeni
884b2215c3
replaced concatenation with f-strings to improve readability and unify … (#29785)
replaced concatenation with f-strings to improve readability and unify with the rest of the code
2024-03-22 12:23:16 +00:00
Joao Gante
34e07f4ba8
Generate: remove unused attributes in AssistedCandidateGenerator (#29787)
remove unused attrs
2024-03-22 12:20:32 +00:00
jiqing-feng
e85654f5ec
rm input dtype change in CPU (#28631)
* rm input dtype change in CPU
* add warning when using low precision on CPU
* rm useless logging
2024-03-22 12:02:43 +00:00
fxmarty
13b23704a8
Correct llava mask & fix missing setter for vocab_size (#29389)
* correct llava mask
* fix vipllava as well
* mask out embedding for padding tokens
* add test
* fix style
* add setter
* fix test on suggestion
2024-03-22 19:57:08 +08:00
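The missing setter is the standard property pattern; a minimal sketch (the class name is illustrative):

```python
class LlavaLikeConfig:
    # Back `vocab_size` with a private attribute and expose both getter and setter.
    def __init__(self, vocab_size: int = 32000):
        self._vocab_size = vocab_size

    @property
    def vocab_size(self) -> int:
        return self._vocab_size

    @vocab_size.setter
    def vocab_size(self, value: int) -> None:
        self._vocab_size = value
```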
Ilyas Moutawwakil
aa17cf986f
Enable AMD docker build CI (#29803)
* enable amd ci
* remove unnecessary clean up
2024-03-22 11:56:47 +01:00
Steven Madere
347916130c
Fix type hint for train_dataset param of Trainer.__init__() to allow IterableDataset (issue #29678) (#29738)
* Fixed typehint for train_dataset param in Trainer.__init__(). Added IterableDataset option.
* make fixup
2024-03-22 10:46:14 +00:00
Arthur
e68ff30419
[quality] update quality check to make sure we check imports 😈 (#29771)
* update quality check
* make it nice
* update
* let's make sure it runs and we have the logs actually
* update workflow
* nits
2024-03-22 10:11:59 +01:00
Raushan Turganbay
fadb053379
Change in-place operations to out-of-place in LogitsProcessors (#29680)
* change in-place -> out-of-place
* add tests
* add more tests
* naming consistency
* fix doctest
* forgot min-length processors
* empty
* Revert "fix doctest"
This reverts commit 4772768457.
* revert change in docstring
* Update tests/generation/test_logits_process.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/generation/test_logits_process.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-03-21 16:37:33 +00:00
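The pattern of the change, sketched on a min-length-style processor (illustrative, not the exact diff): clone the incoming scores and mutate the copy, so the caller's tensor is left intact:

```python
import torch

class MinLengthProcessorSketch:
    def __init__(self, min_length: int, eos_token_id: int):
        self.min_length = min_length
        self.eos_token_id = eos_token_id

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        scores_processed = scores.clone()  # out-of-place: never write into `scores`
        if input_ids.shape[-1] < self.min_length:
            scores_processed[:, self.eos_token_id] = -float("inf")
        return scores_processed
```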
Raushan Turganbay
b469ebc5cf
Prepend bos token to Blip generations (#29642)
* prepend "bos" to blip generation
* minor changes
* Update src/transformers/models/blip_2/modeling_blip_2.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/models/instructblip/modeling_instructblip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add generation tester mixin
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-03-21 16:33:18 +00:00
Joao Gante
ee38fc31fb
Llama: always convert the causal mask in the SDPA code path (#29663)
* always convert the mask
* rebase and fix copies
2024-03-21 16:30:18 +00:00
Joao Gante
5ffef2a978
Generate: remove legacy generation mixin imports (#29782)
2024-03-21 16:28:25 +00:00
Jacky Lee
ef6e371dba
Add support for torch_dtype in the run_mlm example (#29776)
feat: add support for torch_dtype
Co-authored-by: Jacky Lee <jackylee328@gmail.com>
2024-03-21 15:09:35 +00:00
Zach Mueller
10d232e88e
Add deterministic config to set_seed (#29778)
* Add deterministic config
* Add note on slowdown
* English fails me again
2024-03-21 11:07:39 -04:00
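Usage sketch, assuming the config is exposed as a keyword argument on set_seed:

```python
from transformers import set_seed

# Deterministic mode trades speed for reproducibility (hence the note on slowdown).
set_seed(42, deterministic=True)
```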
Zach Mueller
f0bfb150fe
Silence deprecations and use the DataLoaderConfig (#29779)
* Remove deprecations
* Clean
2024-03-21 10:26:51 -04:00
Matt
de627f5a14
Cast bfloat16 to float32 for Numpy conversions (#29755)
* Cast bfloat16 to float32 for Numpy conversions
* Add test
2024-03-21 14:04:11 +00:00
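The underlying issue: NumPy has no native bfloat16 dtype, so a direct .numpy() call fails, and upcasting first sidesteps it. A sketch:

```python
import torch

t = torch.ones(2, 2, dtype=torch.bfloat16)
# t.numpy() would raise a TypeError: NumPy does not support bfloat16.
arr = t.to(torch.float32).numpy()
print(arr.dtype)  # float32
```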
Arthur
73a73b415e
[LlavaNext] Fix llava next unsafe imports (#29773)
* patch llava-next
* styling
* styling
2024-03-21 13:47:58 +01:00
Yih-Dar
2ddceef9a2
Fix docker image build for Latest PyTorch + TensorFlow [dev] (#29764)
* update
* update
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-03-21 13:14:29 +01:00
théo gigant
fd734be1b6
fix issue with logit processor during beam search in Flax (#29636)
fix issue with logit processor in beam search in Flax
2024-03-21 11:27:03 +00:00
Matthias Dittrich
691c3d7325
Allow -OO mode for docstring_decorator (#29689)
Fixes
```
File "/nix/store/rv8xdwghdad9jv2w86b8g08kan9l6ksm-python3.11-transformers-4.38.2/lib/python3.11/site-packages/transformers/models/auto/configuration_auto.py", line 987, in <module>
class AutoConfig:
File "/nix/store/rv8xdwghdad9jv2w86b8g08kan9l6ksm-python3.11-transformers-4.38.2/lib/python3.11/site-packages/transformers/models/auto/configuration_auto.py", line 1011, in AutoConfig
@replace_list_option_in_docstrings()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nix/store/rv8xdwghdad9jv2w86b8g08kan9l6ksm-python3.11-transformers-4.38.2/lib/python3.11/site-packages/transformers/models/auto/configuration_auto.py", line 966, in docstring_decorator
lines = docstrings.split("\n")
^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'split'
```
2024-03-21 11:18:17 +00:00
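Under python -OO, docstrings are stripped and __doc__ is None, which is exactly what the traceback shows; the fix amounts to guarding the decorator. A hedged sketch (not the exact patch):

```python
def docstring_decorator_sketch(fn):
    # Under `python -OO`, __doc__ is None; skip the rewrite instead of crashing.
    docstrings = fn.__doc__
    if docstrings is None:
        return fn
    lines = docstrings.split("\n")
    # ... rewrite `lines` here, as replace_list_option_in_docstrings does ...
    fn.__doc__ = "\n".join(lines)
    return fn
```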