Yih-Dar
71f460578d
Update docs/source/en/perf_infer_gpu_one.md ( #28198 )
...
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-12-22 10:40:22 +01:00
Younes Belkada
3a8769f6a9
[Docs] Add 4-bit serialization docs ( #28182 )
...
* add 4-bit serialization docs
* up
* up
2023-12-22 10:18:32 +01:00
amyeroberts
3657748b4d
Update YOLOS slow test values ( #28187 )
...
Update test values
2023-12-21 18:17:07 +00:00
amyeroberts
cd1350ce9b
Fix slow backbone tests - out_indices must match stage name ordering ( #28186 )
...
Indices must match stage name ordering
2023-12-21 18:16:50 +00:00
Matt
260b9d2179
Even more TF test fixes ( #28146 )
...
* Fix vision text dual encoder
* Small cleanup for wav2vec2 (not fixed yet)
* Small fix for vision_encoder_decoder
* Fix SAM builds
* Update TFBertTokenizer test with modern exporting + tokenizer
* Fix DeBERTa
* Fix DeBERTav2
* Try RAG fix but it's impossible to test locally
* Actually fix RAG now that I got FAISS working somehow
* Fix Wav2Vec2, add sermon
* Fix Hubert
2023-12-21 15:14:46 +00:00
Arthur
f9a98c476c
[Mixtral & Mistral] Add support for sdpa ( #28133 )
...
* some nits
* update test
* add support for sdpa
* remove some dummy inputs
* all good
* style
* nits
* fixes
* fix more copies
* nits
* styling
* fix
* Update src/transformers/models/mistral/modeling_mistral.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* add a slow test just to be sure
* fixup
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-12-21 12:38:22 +01:00
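A minimal sketch of opting into the new SDPA attention path; the checkpoint and generation settings are illustrative, and torch >= 2.0 is assumed:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Request the PyTorch scaled-dot-product-attention (SDPA) implementation
# explicitly via attn_implementation (illustrative checkpoint).
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    torch_dtype=torch.float16,
    attn_implementation="sdpa",
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```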
Sanchit Gandhi
814619f54f
[Whisper] Use torch for stft if available ( #26119 )
...
* [Whisper] Use torch for stft if available
* update docstring
* mock patch decorator
* fit on one line
2023-12-21 11:04:05 +00:00
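A sketch of the torch-based STFT path this enables, assuming torch is installed; the framing values match Whisper's 25 ms window / 10 ms hop at 16 kHz:

```python
import torch

def magnitude_spectrogram(waveform: torch.Tensor, n_fft: int = 400, hop_length: int = 160) -> torch.Tensor:
    """Power spectrogram via torch.stft (Whisper's 25 ms / 10 ms framing at 16 kHz)."""
    window = torch.hann_window(n_fft)
    stft = torch.stft(waveform, n_fft, hop_length, window=window, return_complex=True)
    return stft.abs() ** 2

features = magnitude_spectrogram(torch.randn(16000))  # one second of dummy audio
```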
Joao Gante
7e93ce40c5
Fix input_embeds docstring in encoder-decoder architectures ( #28168 )
2023-12-21 11:01:54 +00:00
Poedator
4f7806ef7e
[bnb] Let's make serialization of 4bit models possible ( #26037 )
...
* updated bitsandbytes.py
* rm test_raise_* from test_4bit.py
* add test_4bit_serialization.py
* modeling_utils bulk edits
* bnb_ver 0.41.3 in integrations/bitsandbytes.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* @slow reinstated
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* bnb ver 0.41.3 in src/transformers/modeling_utils.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* rm bnb version todo in integrations/bitsandbytes.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* moved 4b serialization tests to test_4bit
* tests upd for opt
* to torch_device
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* ruff fixes to tests
* rm redundant bnb version check in mod_utils
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* restore _hf_peft_config_loaded modeling_utils.py::2188
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* restore _hf_peft_config_loaded test in modeling_utils.py::2199
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* fixed NOT getattr(self, "is_8bit_serializable")
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* setting model.is_4bit_serializable
* rm separate fp16_statistics arg from set_module...
* rm else branch in integrations::bnb::set_module
* bnb 4bit dtype check
* upd comment on 4bit weights
* upd tests for FP4 safe
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-12-21 11:54:44 +01:00
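A hedged sketch of the workflow this unlocks (requires a CUDA GPU and bitsandbytes >= 0.41.3; the OPT checkpoint mirrors the PR's tests):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m", quantization_config=quant_config)

# 4-bit weights can now be serialized like any other checkpoint...
model.save_pretrained("opt-350m-4bit")

# ...and reloaded directly, without re-quantizing from fp16/fp32 weights.
reloaded = AutoModelForCausalLM.from_pretrained("opt-350m-4bit")
```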
Dean Wyatte
e268d7e5dc
disable test_retain_grad_hidden_states_attentions on SeamlessM4TModelWithTextInputTest ( #28169 )
...
disable retain_grad_hidden_states_attentions on SeamlessM4TModelWithTextInputTest
2023-12-21 08:39:44 +01:00
amyeroberts
1d77735947
Fix yolos resizing ( #27663 )
...
* Fix yolos resizing
* Update tests
* Add a test
2023-12-20 20:55:51 +00:00
Joao Gante
45b70384a7
Generate: fix speculative decoding ( #28166 )
...
Co-authored-by: Merve Noyan <merveenoyan@gmail.com>
2023-12-20 18:55:35 +00:00
Steven Liu
01c081d138
[docs] Trainer docs ( #28145 )
...
* fsdp, debugging, gpu selection
* fix hfoption
* fix
2023-12-20 10:37:23 -08:00
amyeroberts
ee298a16a2
Align backbone stage selection with out_indices & out_features ( #27606 )
...
* Iterate over out_features instead of stage_names
* Update for all backbones
* Add tests
* Fix
* Align timm backbone behaviour with other backbones
* Fix tests
* Stricter checks on set out_features and out_indices
* Revert back stage selection logic
* Remove out-of-order logic
* Document restriction in docstrings
2023-12-20 18:33:17 +00:00
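A small example of the stricter contract, assuming an AutoBackbone checkpoint such as microsoft/resnet-50 (the stage names shown are ResNet's):

```python
from transformers import AutoBackbone

# out_features (stage names) and out_indices (positions in stage_names) must
# now describe the same stages, in stage_names order.
backbone = AutoBackbone.from_pretrained("microsoft/resnet-50", out_features=["stage2", "stage4"])
print(backbone.out_indices)  # kept consistent with out_features, e.g. [2, 4]
```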
amyeroberts
224ab70969
Update FA2 exception msg to point to hub discussions ( #28161 )
...
* Update FA2 exception msg to point to hub discussions
* Use path for hub url
2023-12-20 16:52:16 +00:00
Yih-Dar
9924df9eb2
Avoid unnecessary warnings when loading CLIPConfig ( #28108 )
...
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-12-20 17:24:53 +01:00
Yih-Dar
7938c8c836
Fix weights not properly initialized due to shape mismatch ( #28122 )
...
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-12-20 14:20:02 +01:00
peter-sk
769a9542de
move code to Trainer.evaluate to enable use of that function with multiple datasets ( #27844 )
...
* move code to Trainer.evaluate to enable use of that function with multiple datasets
* test
* update doc string
* and a tip
* forgot the type
---------
Co-authored-by: Prof. Peter Schneider-Kamp <jps@ordbogen.com>
2023-12-20 10:55:56 +01:00
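A sketch of the pattern this enables; `model`, `wiki_ds`, and `books_ds` are placeholders for a real model and two tokenized eval datasets:

```python
from transformers import Trainer, TrainingArguments

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", report_to="none"),
    # A dict of eval datasets reports metrics per dataset,
    # e.g. "eval_wiki_loss" and "eval_books_loss".
    eval_dataset={"wiki": wiki_ds, "books": books_ds},
)
metrics = trainer.evaluate()  # after #27844, evaluate() itself handles the dict
```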
Jong-hun Shin
cd9f9d63f1
[gpt-neox] Add attention_bias config to support models trained without attention biases ( #28126 )
...
* add attention_bias hparam for a model trained without attention biases
* fix argument documentation error
2023-12-20 10:05:32 +01:00
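A minimal sketch of the new hparam; the tiny config values are illustrative, and the attribute path assumes GPTNeoX's fused QKV projection:

```python
from transformers import GPTNeoXConfig, GPTNeoXForCausalLM

config = GPTNeoXConfig(
    hidden_size=64, num_hidden_layers=2, num_attention_heads=4,
    intermediate_size=128,
    attention_bias=False,  # new: no bias terms in the attention projections
)
model = GPTNeoXForCausalLM(config)
assert model.gpt_neox.layers[0].attention.query_key_value.bias is None
```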
Sourab Mangrulkar
def581ef51
Fix FA2 integration ( #28142 )
...
* fix fa2
* fix FA2 for popular models
* improve warning and add Younes as co-author
Co-Authored-By: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix the warning
* Add Tip
* typo fix
* nit
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-12-20 14:25:07 +05:30
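For reference, a hedged sketch of the FA2 setup the fixed warning points at (requires a supported GPU with flash-attn installed; the checkpoint is illustrative):

```python
import torch
from transformers import AutoModelForCausalLM

# FA2 only supports fp16/bf16, so pass torch_dtype up front rather than
# calling model.to(dtype) afterwards.
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
```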
Abolfazl Shahbazi
b134f6857e
Remove deprecated CPU dockerfiles ( #28149 )
...
Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>
2023-12-20 05:51:35 +01:00
Aaron Jimenez
38611086d2
[docs] Fix mistral link in mixtral.md ( #28143 )
...
Fix mistral link in mixtral.md
2023-12-19 10:34:14 -08:00
Mike Zellinger
23f8e4db77
Update modeling_utils.py ( #28127 )
...
In docstring for PreTrainedModel.resize_token_embeddings, correct definition of new_num_tokens parameter to read "the new number of tokens" (meaning the new size of the vocab) rather than "the number of new tokens" (number of newly added tokens only).
2023-12-19 09:07:57 -08:00
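A short example of the corrected semantics:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

tokenizer.add_tokens(["<custom>"])
# new_num_tokens is the *new vocabulary size*, not the count of newly added tokens:
model.resize_token_embeddings(new_num_tokens=len(tokenizer))
```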
Arthur
4a04b4ccca
[Mixtral] Fix loss + nits ( #28115 )
...
* default config should not use sliding window
* update the doc
* nits
* add a proper test
* update
* update
* update expected value
* Update src/transformers/tokenization_utils_fast.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* convert to float
* average then N**2
* comment
* revert nit
* good to go
* fixup
* Update tests/models/mixtral/test_modeling_mixtral.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
* revert unrelated change
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
2023-12-19 17:31:54 +01:00
Joao Gante
ac974199c8
Generate: speculative decoding ( #27979 )
...
* speculative decoding
* fix test
* space
* better comments
* remove redundant test
* test nit
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* PR comments
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-12-19 13:58:30 +00:00
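A minimal sketch of assisted generation, which #27979 extends to sampling (speculative decoding); the GPT-2 pair is illustrative, and any main/assistant pair sharing a tokenizer works:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2-large")  # verifier
assistant = AutoModelForCausalLM.from_pretrained("gpt2")    # drafter

inputs = tokenizer("The weather today is", return_tensors="pt")
outputs = model.generate(**inputs, assistant_model=assistant, do_sample=True, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```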
amyeroberts
bd7a356135
Update split string in doctest to reflect #28087 ( #28135 )
2023-12-19 13:55:09 +00:00
qihqi
5aec50ecaf
When saving a model on TPU, make a copy to be moved to CPU ( #27993 )
...
* When saving a model, make a copy to be moved to CPU; don't move the original model
* make deepcopy inside of _save_tpu
* Move to tpu without copy
2023-12-19 10:08:51 +00:00
Aaron Jimenez
4edffda636
[Doc] Fix token link in What 🤗 Transformers can do ( #28123 )
...
Fix token link
2023-12-18 15:06:54 -08:00
Mike Salvatore
c52b515e94
Fix a typo in tokenizer documentation ( #28118 )
2023-12-18 19:44:35 +01:00
Steven Liu
a52e180a0f
[docs] General doc fixes ( #28087 )
...
* doc fix friday
* deprecated objects
* update not_doctested
* update toctree
2023-12-18 10:44:09 -08:00
Rockerz
08a6e7a702
Fix indentation error - semantic_segmentation.md ( #28117 )
...
Update semantic_segmentation.md
2023-12-18 12:47:54 -05:00
Matt
71d47f0ad4
More TF fixes ( #28081 )
...
* More build_in_name_scope()
* Make sure we set the save spec now that we don't do it with dummies anymore
* make fixup
2023-12-18 15:26:03 +00:00
Lucain
0695b2421a
Remove warning if DISABLE_TELEMETRY is used ( #28113 )
...
remove warning if DISABLE_TELEMETRY is used
2023-12-18 16:18:01 +01:00
Daize Dong
7c5408dade
Disable jitter noise during evaluation in SwitchTransformers ( #28077 )
...
* Disable jitter noise during evaluation
* Update outdated configuration information
* Formatting
* Add new line
2023-12-18 15:08:55 +00:00
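A sketch of the fixed behaviour, assuming SwitchTransformers' multiplicative router jitter (class and attribute names are illustrative):

```python
import torch
import torch.nn as nn

class Router(nn.Module):
    def __init__(self, jitter_noise: float = 0.01):
        super().__init__()
        self.jitter_noise = jitter_noise

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Previously applied unconditionally; now skipped in eval mode.
        if self.training and self.jitter_noise > 0:
            hidden_states = hidden_states * torch.empty_like(hidden_states).uniform_(
                1.0 - self.jitter_noise, 1.0 + self.jitter_noise
            )
        return hidden_states
```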
lain
a0522de497
fix ConversationalPipeline docstring ( #28091 )
2023-12-18 15:08:37 +00:00
Wang, Yi
e6cb8e052a
in PEFT finetuning, only the trainable parameters need to be saved ( #27825 )
...
to reduce the storage size and save time on checkpoint saving when using DeepSpeed for training
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
2023-12-18 14:27:05 +00:00
Aeneas Stankowski
7f2a8f92e4
Spelling correction ( #28110 )
...
Update mixtral.md
correct minor typo in overview
2023-12-18 14:04:05 +00:00
Younes Belkada
b8378b658e
[Llava / Vip-Llava] Add SDPA into llava ( #28107 )
...
add SDPA into llava
2023-12-18 13:46:30 +01:00
cyyever
e6dcf8abd6
Fix the deprecation warning of _torch_pytree._register_pytree_node ( #27803 )
2023-12-17 11:13:42 +01:00
Poedator
f85a1e82c1
4D attention_mask support ( #27539 )
...
* edits to _prepare_4d_causal_attention_mask()
* initial tests for 4d mask
* attention_mask_for_sdpa support
* added test for inner model hidden
* added autotest decorators
* test mask dtype to torch.int64
* torch.testing.assert_close
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* torch_device and @torch_gpu in tests
* upd tests
* +torch decorators
* torch decorators fixed
* more decorators!
* even more decorators
* fewer decorators
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-12-17 11:08:04 +01:00
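A sketch of what the change permits: a caller-supplied 4D mask of shape (batch, heads-or-1, query_len, kv_len) now passes through _prepare_4d_causal_attention_mask untouched; the commented model call assumes a Llama-family checkpoint:

```python
import torch

batch, seq_len = 1, 8
causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.int64))
mask_4d = causal[None, None, :, :]  # (1, 1, seq_len, seq_len), ready-made 4D mask

# With a Llama-family model and matching input_ids:
# outputs = model(input_ids, attention_mask=mask_4d)
```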
Sourab Mangrulkar
238d2e3c44
fix resuming from ckpt when using FSDP with FULL_STATE_DICT ( #27891 )
...
* fix resuming from ckpt when using FSDP with FULL_STATE_DICT
* update tests
* fix tests
2023-12-16 19:41:43 +05:30
Steven Liu
ebfdb9ca62
[docs] MPS ( #28016 )
...
* mps docs
* toctree
2023-12-15 13:17:29 -08:00
Steven Liu
0d63d17765
[docs] Trainer ( #27986 )
...
* first draft
* add to toctree
* edits
* feedback
2023-12-15 12:06:55 -08:00
Younes Belkada
1faeff85ce
Fix Vip-llava docs ( #28085 )
...
* Update vipllava.md
* Update modeling_vipllava.py
2023-12-15 20:16:47 +01:00
Ligeng Zhu
ffa04def0e
Fix wrong examples in llava usage. ( #28020 )
...
* Fix wrong examples in llava usage.
* Update modeling_llava.py
2023-12-15 17:09:50 +00:00
Kotaro Tanahashi
29a1c1b472
Fix low_cpu_mem_usage Flag Conflict with DeepSpeed Zero 3 in from_pretrained for Models with keep_in_fp32_modules ( #27762 )
...
Fix `from_pretrained` Logic for `low_cpu_mem_usage` with DeepSpeed Zero3
2023-12-15 17:03:41 +00:00
Quentin Lhoest
26ea725bc0
Update fixtures-image-utils ( #28080 )
...
* fix hf-internal-testing/fixtures_image_utils
* fix test
* comments
2023-12-15 16:58:36 +00:00
dumpmemory
1c286be508
Fix bug for checkpoint saving on multi node training setting ( #28078 )
...
* add multi-node training setting
* fix style
2023-12-15 16:18:56 +00:00
Julien Chaumond
dec84b3211
make torch.load a bit safer ( #27282 )
...
* make torch.load a bit safer
* Fixes
---------
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2023-12-15 16:01:18 +01:00
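The underlying idea, sketched (the checkpoint path is illustrative):

```python
import torch

# weights_only=True restricts unpickling to tensors and primitive containers,
# mitigating arbitrary code execution from untrusted checkpoint files.
state_dict = torch.load("pytorch_model.bin", map_location="cpu", weights_only=True)
```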
Ke Wen
74cae670ce
Make GPT2 traceable in meta state ( #28054 )
...
* Put device in tensor constructor instead of to()
* Fix copy
2023-12-15 15:45:31 +01:00