Commit Graph

16108 Commits

Author SHA1 Message Date
Yih-Dar
bb3bd44739
Fix the check of models supporting FA/SDPA not run (#28202)
* add check_support_list.py

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-12-22 12:56:11 +01:00
Michael Feil
e37ab52dff
Bug: training_args.py fix missing import with accelerate with version accelerate==0.20.1 (#28171)
* fix-accelerate-version

* updated with exported ACCELERATE_MIN_VERSION,

* update string in ACCELERATE_MIN_VERSION
2023-12-22 11:41:35 +00:00
NielsRogge
c9fb250a25
Add Swinv2 backbone (#27742)
* First draft

* More improvements

* More improvements

* Make all tests pass

* Remove script

* Update image processor

* Address comments

* Use new gradient checkpointing method

* Convert checkpoints, add integration test

* Do not keep aspect ratio for now

* Set keep_aspect_ratio=False for beit, add integration test

* Remove print statement
2023-12-22 11:12:56 +00:00
Nicholas Neo
1ef86c4f56
Fix: [SeamlessM4T - S2TT] Bug in batch loading of audio in torch.Tensor format in the SeamlessM4TFeatureExtractor class (#27914)
* fixes: code fixes on is_batched condition to also check for batched audio data in torch.Tensor format instead of only just checking for batched audio data in np.ndarray format

* Update src/transformers/models/seamless_m4t/feature_extraction_seamless_m4t.py

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>

* refactor: code refactoring to remove torch framework dependency

* docs: updated docstring to add torch tensor compatibility

* test: add test cases to incorporate torch tensor inputs

* test: ran make fix-copies for code conformity

* test: refactor test to separate the test_call into test_call_numpy and test_call_torch

---------

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
2023-12-22 10:47:30 +00:00
Dean Wyatte
548a8f6119
Fix ONNX export for causal LM sequence classifiers by removing reverse indexing (#28144)
* normalize reverse indexing for causal lm sequence classifiers

* normalize reverse indexing for causal lm sequence classifiers

* normalize reverse indexing for causal lm sequence classifiers

* use modulo instead

* unify modulo-based sequence lengths
2023-12-22 10:33:44 +00:00
Yih-Dar
71f460578d
Update docs/source/en/perf_infer_gpu_one.md (#28198)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-12-22 10:40:22 +01:00
Younes Belkada
3a8769f6a9
[Docs] Add 4-bit serialization docs (#28182)
* add 4-bit serialization docs

* up

* up
2023-12-22 10:18:32 +01:00
amyeroberts
3657748b4d
Update YOLOS slow test values (#28187)
Update test values
2023-12-21 18:17:07 +00:00
amyeroberts
cd1350ce9b
Fix slow backbone tests - out_indices must match stage name ordering (#28186)
Indices must match stage name ordering
2023-12-21 18:16:50 +00:00
Matt
260b9d2179
Even more TF test fixes (#28146)
* Fix vision text dual encoder

* Small cleanup for wav2vec2 (not fixed yet)

* Small fix for vision_encoder_decoder

* Fix SAM builds

* Update TFBertTokenizer test with modern exporting + tokenizer

* Fix DeBERTa

* Fix DeBERTav2

* Try RAG fix but it's impossible to test locally

* Actually fix RAG now that I got FAISS working somehow

* Fix Wav2Vec2, add sermon

* Fix Hubert
2023-12-21 15:14:46 +00:00
Arthur
f9a98c476c
[Mixtral & Mistral] Add support for sdpa (#28133)
* some nits

* update test

* add support for sdpa

* remove some dummy inputs

* all good

* style

* nits

* fixes

* fix more copies

* nits

* styling

* fix

* Update src/transformers/models/mistral/modeling_mistral.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* add a slow test just to be sure

* fixup

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-12-21 12:38:22 +01:00
Sanchit Gandhi
814619f54f
[Whisper] Use torch for stft if available (#26119)
* [Whisper] Use torch for stft if available

* update docstring

* mock patch decorator

* fit on one line
2023-12-21 11:04:05 +00:00
Joao Gante
7e93ce40c5
Fix input_embeds docstring in encoder-decoder architectures (#28168)
2023-12-21 11:01:54 +00:00
Poedator
4f7806ef7e
[bnb] Let's make serialization of 4bit models possible (#26037)
* updated bitsandbytes.py

* rm test_raise_* from test_4bit.py

* add test_4bit_serialization.py

* modeling_utils bulk edits

* bnb_ver 0.41.3 in integrations/bitsandbytes.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* @slow reinstated

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* bnb ver 0.41.3 in  src/transformers/modeling_utils.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* rm bnb version todo in  integrations/bitsandbytes.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* moved 4b serialization tests to test_4bit

* tests upd for opt

* to torch_device

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* ruff fixes to tests

* rm redundant bnb version check in mod_utils

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* restore _hf_peft_config_loaded  modeling_utils.py::2188

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* restore _hf_peft_config_loaded  test in modeling_utils.py::2199

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* fixed NOT getattr(self, "is_8bit_serializable")

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* setting model.is_4bit_serializable

* rm separate fp16_statistics arg from set_module...

* rm else branch in integrations::bnb::set_module

* bnb 4bit dtype check

* upd comment on 4bit weights

* upd tests for FP4 safe

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-12-21 11:54:44 +01:00
Dean Wyatte
e268d7e5dc
disable test_retain_grad_hidden_states_attentions on SeamlessM4TModelWithTextInputTest (#28169)
disable retain_grad_hidden_states_attentions on SeamlessM4TModelWithTextInputTest
2023-12-21 08:39:44 +01:00
amyeroberts
1d77735947
Fix yolos resizing (#27663)
* Fix yolos resizing

* Update tests

* Add a test
2023-12-20 20:55:51 +00:00
Joao Gante
45b70384a7
Generate: fix speculative decoding (#28166)
Co-authored-by: Merve Noyan <merveenoyan@gmail.com>
2023-12-20 18:55:35 +00:00
Steven Liu
01c081d138
[docs] Trainer docs (#28145)
* fsdp, debugging, gpu selection

* fix hfoption

* fix
2023-12-20 10:37:23 -08:00
amyeroberts
ee298a16a2
Align backbone stage selection with out_indices & out_features (#27606)
* Iterate over out_features instead of stage_names

* Update for all backbones

* Add tests

* Fix

* Align timm backbone behaviour with other backbones

* Fix tests

* Stricter checks on set out_features and out_indices

* Revert back stage selection logic

* Remove out-of-order logic

* Document restriction in docstrings
2023-12-20 18:33:17 +00:00
amyeroberts
224ab70969
Update FA2 exception msg to point to hub discussions (#28161)
* Update FA2 exception msg to point to hub discussions

* Use path for hub url
2023-12-20 16:52:16 +00:00
Yih-Dar
9924df9eb2
Avoid unnecessary warnings when loading CLIPConfig (#28108)
* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-12-20 17:24:53 +01:00
Yih-Dar
7938c8c836
Fix weights not properly initialized due to shape mismatch (#28122)
* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-12-20 14:20:02 +01:00
peter-sk
769a9542de
move code to Trainer.evaluate to enable use of that function with multiple datasets (#27844)
* move code to Trainer.evaluate to enable use of that function with multiple datasets

* test

* update doc string

* and a tip

* forgot the type

---------

Co-authored-by: Prof. Peter Schneider-Kamp <jps@ordbogen.com>
2023-12-20 10:55:56 +01:00
Jong-hun Shin
cd9f9d63f1
[gpt-neox] Add attention_bias config to support model trained without attention biases (#28126)
* add attention_bias hparam for a model trained without attention biases

* fix argument documentation error
2023-12-20 10:05:32 +01:00
Sourab Mangrulkar
def581ef51
Fix FA2 integration (#28142)
* fix fa2

* fix FA2 for popular models

* improve warning and add Younes as co-author

Co-Authored-By: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update src/transformers/modeling_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix the warning

* Add Tip

* typo fix

* nit

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-12-20 14:25:07 +05:30
Abolfazl Shahbazi
b134f6857e
Remove deprecated CPU dockerfiles (#28149)
Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>
2023-12-20 05:51:35 +01:00
Aaron Jimenez
38611086d2
[docs] Fix mistral link in mixtral.md (#28143)
Fix mistral link in mixtral.md
2023-12-19 10:34:14 -08:00
Mike Zellinger
23f8e4db77
Update modeling_utils.py (#28127)
In docstring for PreTrainedModel.resize_token_embeddings, correct definition of new_num_tokens parameter to read "the new number of tokens" (meaning the new size of the vocab) rather than "the number of new tokens" (number of newly added tokens only).
2023-12-19 09:07:57 -08:00
Arthur
4a04b4ccca
[Mixtral] Fix loss + nits (#28115)
* default config should not use sliding window

* update the doc

* nits

* add a proper test

* update

* update

* update expected value

* Update src/transformers/tokenization_utils_fast.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* convert to float

* average then N**2

* comment

* revert nit

* good to go

* fixup

* Update tests/models/mixtral/test_modeling_mixtral.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* revert unrelated change

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
2023-12-19 17:31:54 +01:00
Joao Gante
ac974199c8
Generate: speculative decoding (#27979)
* speculative decoding

* fix test

* space

* better comments

* remove redundant test

* test nit

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* PR comments

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-12-19 13:58:30 +00:00
amyeroberts
bd7a356135
Update split string in doctest to reflect #28087 (#28135)
2023-12-19 13:55:09 +00:00
qihqi
5aec50ecaf
When saving a model on TPU, make a copy to be moved to CPU (#27993)
* When saving a model, make a copy to be moved to CPU, don't move the original model

* make deepcopy inside of _save_tpu

* Move to tpu without copy
2023-12-19 10:08:51 +00:00
Aaron Jimenez
4edffda636
[Doc] Fix token link in What 🤗 Transformers can do (#28123)
Fix token link
2023-12-18 15:06:54 -08:00
Mike Salvatore
c52b515e94
Fix a typo in tokenizer documentation (#28118)
2023-12-18 19:44:35 +01:00
Steven Liu
a52e180a0f
[docs] General doc fixes (#28087)
* doc fix friday

* deprecated objects

* update not_doctested

* update toctree
2023-12-18 10:44:09 -08:00
Rockerz
08a6e7a702
Fix indentation error - semantic_segmentation.md (#28117)
Update semantic_segmentation.md
2023-12-18 12:47:54 -05:00
Matt
71d47f0ad4
More TF fixes (#28081)
* More build_in_name_scope()

* Make sure we set the save spec now we don't do it with dummies anymore

* make fixup
2023-12-18 15:26:03 +00:00
Lucain
0695b2421a
Remove warning if DISABLE_TELEMETRY is used (#28113)
remove warning if DISABLE_TELEMETRY is used
2023-12-18 16:18:01 +01:00
Daize Dong
7c5408dade
Disable jitter noise during evaluation in SwitchTransformers (#28077)
* Disable jitter noise during evaluation

* Update outdated configuration information

* Formatting

* Add new line
2023-12-18 15:08:55 +00:00
lain
a0522de497
fix ConversationalPipeline docstring (#28091)
2023-12-18 15:08:37 +00:00
Wang, Yi
e6cb8e052a
in peft finetune, only the trainable parameters need to be saved (#27825)
to reduce the storage size and also save the time of checkpoint saving while using deepspeed for training

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
2023-12-18 14:27:05 +00:00
Aeneas Stankowski
7f2a8f92e4
Spelling correction (#28110)
Update mixtral.md

correct minor typo in overview
2023-12-18 14:04:05 +00:00
Younes Belkada
b8378b658e
[Llava / Vip-Llava] Add SDPA into llava (#28107)
add SDPA into llava
2023-12-18 13:46:30 +01:00
cyyever
e6dcf8abd6
Fix the deprecation warning of _torch_pytree._register_pytree_node (#27803)
2023-12-17 11:13:42 +01:00
Poedator
f85a1e82c1
4D attention_mask support (#27539)
* edits to _prepare_4d_causal_attention_mask()

* initial tests for 4d mask

* attention_mask_for_sdpa support

* added test for inner model hidden

* added autotest decorators

* test mask dtype to torch.int64

* torch.testing.assert_close

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* torch_device and @torch_gpu in tests

* upd tests

* +torch decorators

* torch decorators fixed

* more decorators!

* even more decorators

* fewer decorators

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-12-17 11:08:04 +01:00
Sourab Mangrulkar
238d2e3c44
fix resuming from ckpt when using FSDP with FULL_STATE_DICT (#27891)
* fix resuming from ckpt when using FSDP with FULL_STATE_DICT

* update tests

* fix tests
2023-12-16 19:41:43 +05:30
Steven Liu
ebfdb9ca62
[docs] MPS (#28016)
* mps docs

* toctree
2023-12-15 13:17:29 -08:00
Steven Liu
0d63d17765
[docs] Trainer (#27986)
* first draft

* add to toctree

* edits

* feedback
2023-12-15 12:06:55 -08:00
Younes Belkada
1faeff85ce
Fix Vip-llava docs (#28085)
* Update vipllava.md

* Update modeling_vipllava.py
2023-12-15 20:16:47 +01:00
Ligeng Zhu
ffa04def0e
Fix wrong examples in llava usage. (#28020)
* Fix wrong examples in llava usage.

* Update modeling_llava.py
2023-12-15 17:09:50 +00:00