Commit Graph

15053 Commits

Author SHA1 Message Date
Yih-Dar
71f460578d
Update docs/source/en/perf_infer_gpu_one.md (#28198)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-12-22 10:40:22 +01:00
Younes Belkada
3a8769f6a9
[Docs] Add 4-bit serialization docs (#28182)
* add 4-bit serialization docs

* up

* up
2023-12-22 10:18:32 +01:00
amyeroberts
3657748b4d
Update YOLOS slow test values (#28187)
Update test values
2023-12-21 18:17:07 +00:00
amyeroberts
cd1350ce9b
Fix slow backbone tests - out_indices must match stage name ordering (#28186)
Indices must match stage name ordering
2023-12-21 18:16:50 +00:00
Matt
260b9d2179
Even more TF test fixes (#28146)
* Fix vision text dual encoder

* Small cleanup for wav2vec2 (not fixed yet)

* Small fix for vision_encoder_decoder

* Fix SAM builds

* Update TFBertTokenizer test with modern exporting + tokenizer

* Fix DeBERTa

* Fix DeBERTav2

* Try RAG fix but it's impossible to test locally

* Actually fix RAG now that I got FAISS working somehow

* Fix Wav2Vec2, add sermon

* Fix Hubert
2023-12-21 15:14:46 +00:00
Arthur
f9a98c476c
[Mixtral & Mistral] Add support for sdpa (#28133)
* some nits

* update test

* add support for sdpa

* remove some dummy inputs

* all good

* style

* nits

* fixes

* fix more copies

* nits

* styling

* fix

* Update src/transformers/models/mistral/modeling_mistral.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* add a slow test just to be sure

* fixup

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-12-21 12:38:22 +01:00
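
A minimal sketch of what this change enables: opting into PyTorch's `scaled_dot_product_attention` backend at load time via `attn_implementation`. The checkpoint name is a placeholder.

```python
# Sketch: load Mistral with the SDPA attention backend (checkpoint is a placeholder).
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    torch_dtype=torch.float16,
    attn_implementation="sdpa",  # path added for Mistral & Mixtral by this PR
)
```
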
Sanchit Gandhi
814619f54f
[Whisper] Use torch for stft if available (#26119)
* [Whisper] Use torch for stft if available

* update docstring

* mock patch decorator

* fit on one line
2023-12-21 11:04:05 +00:00
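
A hedged sketch of the torch-first pattern this change applies inside the Whisper feature extractor; the helper below is illustrative, not the library's exact code.

```python
# Illustrative torch-first STFT with a numpy fallback (not the exact
# WhisperFeatureExtractor implementation).
import numpy as np

def fast_stft(waveform: np.ndarray, window: np.ndarray, n_fft: int = 400, hop: int = 160):
    try:
        import torch
        # window length must equal n_fft here, matching Whisper's framing
        spec = torch.stft(
            torch.from_numpy(waveform),
            n_fft,
            hop_length=hop,
            window=torch.from_numpy(window),
            return_complex=True,
        )
        return spec.numpy()
    except ImportError:
        # fall back to the slower pure-numpy framing + FFT path
        raise NotImplementedError("numpy fallback elided in this sketch")
```
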
Joao Gante
7e93ce40c5
Fix input_embeds docstring in encoder-decoder architectures (#28168) 2023-12-21 11:01:54 +00:00
Poedator
4f7806ef7e
[bnb] Let's make serialization of 4bit models possible (#26037)
* updated bitsandbytes.py

* rm test_raise_* from test_4bit.py

* add test_4bit_serialization.py

* modeling_utils bulk edits

* bnb_ver 0.41.3 in integrations/bitsandbytes.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* @slow reinstated

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* bnb ver 0.41.3 in src/transformers/modeling_utils.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* rm bnb version todo in integrations/bitsandbytes.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* moved 4b serialization tests to test_4bit

* tests upd for opt

* to torch_device

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* ruff fixes to tests

* rm redundant bnb version check in mod_utils

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* restore _hf_peft_config_loaded in modeling_utils.py::2188

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* restore _hf_peft_config_loaded test in modeling_utils.py::2199

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* fixed NOT getattr(self, "is_8bit_serializable")

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* setting model.is_4bit_serializable

* rm separate fp16_statistics arg from set_module...

* rm else branch in integrations::bnb::set_module

* bnb 4bit dtype check

* upd comment on 4bit weights

* upd tests for FP4 safe

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-12-21 11:54:44 +01:00
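
A short sketch of what this PR enables, assuming bitsandbytes >= 0.41.3 (the version pinned in the commits) and a CUDA device; checkpoint and paths are placeholders.

```python
# Sketch: save and reload a 4-bit quantized model (needs bitsandbytes >= 0.41.3).
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m", quantization_config=quant_config)

model.save_pretrained("opt-350m-4bit")  # weights are written in their 4-bit form
reloaded = AutoModelForCausalLM.from_pretrained("opt-350m-4bit")  # no re-quantization needed
```
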
Dean Wyatte
e268d7e5dc
disable test_retain_grad_hidden_states_attentions on SeamlessM4TModelWithTextInputTest (#28169)
disable retain_grad_hidden_states_attentions on SeamlessM4TModelWithTextInputTest
2023-12-21 08:39:44 +01:00
amyeroberts
1d77735947
Fix yolos resizing (#27663)
* Fix yolos resizing

* Update tests

* Add a test
2023-12-20 20:55:51 +00:00
Joao Gante
45b70384a7
Generate: fix speculative decoding (#28166)
Co-authored-by: Merve Noyan <merveenoyan@gmail.com>
2023-12-20 18:55:35 +00:00
Steven Liu
01c081d138
[docs] Trainer docs (#28145)
* fsdp, debugging, gpu selection

* fix hfoption

* fix
2023-12-20 10:37:23 -08:00
amyeroberts
ee298a16a2
Align backbone stage selection with out_indices & out_features (#27606)
* Iterate over out_features instead of stage_names

* Update for all backbones

* Add tests

* Fix

* Align timm backbone behaviour with other backbones

* Fix tests

* Stricter checks on set out_features and out_indices

* Revert back stage selection logic

* Remove out-of-order logic

* Document restriction in docstrings
2023-12-20 18:33:17 +00:00
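
A sketch of the stricter contract, using ResNet as a stand-in backbone: `out_indices` must follow the ordering of `config.stage_names` and stays in sync with `out_features`.

```python
# Sketch of aligned backbone stage selection (ResNet used as a stand-in).
from transformers import ResNetBackbone

# Indices must be in increasing stage order; out-of-order values now raise.
backbone = ResNetBackbone.from_pretrained("microsoft/resnet-50", out_indices=[2, 4])
print(backbone.out_features)  # mirrors out_indices, e.g. ['stage2', 'stage4']
```
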
amyeroberts
224ab70969
Update FA2 exception msg to point to hub discussions (#28161)
* Update FA2 exception msg to point to hub discussions

* Use path for hub url
2023-12-20 16:52:16 +00:00
Yih-Dar
9924df9eb2
Avoid unnecessary warnings when loading CLIPConfig (#28108)
* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-12-20 17:24:53 +01:00
Yih-Dar
7938c8c836
Fix weights not properly initialized due to shape mismatch (#28122)
* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-12-20 14:20:02 +01:00
peter-sk
769a9542de
move code to Trainer.evaluate to enable use of that function with multiple datasets (#27844)
* move code to Trainer.evaluate to enable use of that function with multiple datasets

* test

* update doc string

* and a tip

* forgot the type

---------

Co-authored-by: Prof. Peter Schneider-Kamp <jps@ordbogen.com>
2023-12-20 10:55:56 +01:00
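
A hedged sketch of the workflow this unlocks: pass a dict of evaluation datasets and get per-dataset metric prefixes. `model` and the datasets below are placeholders.

```python
# Sketch: evaluate on several datasets in one Trainer (objects are placeholders).
from transformers import Trainer, TrainingArguments

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out"),
    eval_dataset={"wiki": wiki_dataset, "books": books_dataset},
)
metrics = trainer.evaluate()  # metric keys get per-dataset prefixes, e.g. eval_wiki_loss
```
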
Jong-hun Shin
cd9f9d63f1
[gpt-neox] Add attention_bias config to support models trained without attention biases (#28126)
* add attention_bias hparam for a model trained without attention biases

* fix argument documentation error
2023-12-20 10:05:32 +01:00
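
A minimal sketch of the new config flag in use:

```python
# Sketch: instantiate GPT-NeoX without attention biases via the new config flag.
from transformers import GPTNeoXConfig, GPTNeoXForCausalLM

config = GPTNeoXConfig(attention_bias=False)  # matches checkpoints trained bias-free
model = GPTNeoXForCausalLM(config)
```
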
Sourab Mangrulkar
def581ef51
Fix FA2 integration (#28142)
* fix fa2

* fix FA2 for popular models

* improve warning and add Younes as co-author

Co-Authored-By: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update src/transformers/modeling_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix the warning

* Add Tip

* typo fix

* nit

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-12-20 14:25:07 +05:30
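
For reference, the loading path this fix touches, as a sketch: FlashAttention-2 requires the flash-attn package and a half-precision dtype; the checkpoint name is a placeholder.

```python
# Sketch: enable FlashAttention-2 at load time (flash-attn must be installed).
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)
```
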
Abolfazl Shahbazi
b134f6857e
Remove deprecated CPU dockerfiles (#28149)
Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>
2023-12-20 05:51:35 +01:00
Aaron Jimenez
38611086d2
[docs] Fix mistral link in mixtral.md (#28143)
Fix mistral link in mixtral.md
2023-12-19 10:34:14 -08:00
Mike Zellinger
23f8e4db77
Update modeling_utils.py (#28127)
In docstring for PreTrainedModel.resize_token_embeddings, correct definition of new_num_tokens parameter to read "the new number of tokens" (meaning the new size of the vocab) rather than "the number of new tokens" (number of newly added tokens only).
2023-12-19 09:07:57 -08:00
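
A short example of the corrected semantics:

```python
# new_num_tokens is the new *total* vocab size, not the count of newly added tokens.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

tokenizer.add_tokens(["<custom>"])
model.resize_token_embeddings(new_num_tokens=len(tokenizer))
```
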
Arthur
4a04b4ccca
[Mixtral] Fix loss + nits (#28115)
* default config should not use sliding window

* update the doc

* nits

* add a proper test

* update

* update

* update expected value

* Update src/transformers/tokenization_utils_fast.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* convert to float

* average then N**2

* comment

* revert nit

* good to go

* fixup

* Update tests/models/mixtral/test_modeling_mixtral.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* revert unrelated change

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
2023-12-19 17:31:54 +01:00
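
One user-visible effect, sketched under the assumption (from the first commit above) that the default config now disables sliding-window attention:

```python
# Sketch: the default Mixtral config no longer turns on sliding-window attention.
from transformers import MixtralConfig

config = MixtralConfig()
assert config.sliding_window is None  # assumption based on this fix
```
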
Joao Gante
ac974199c8
Generate: speculative decoding (#27979)
* speculative decoding

* fix test

* space

* better comments

* remove redundant test

* test nit

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* PR comments

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-12-19 13:58:30 +00:00
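
A hedged sketch of assisted generation, the API surface speculative decoding lives in; both checkpoints are placeholders (the assistant must share the main model's tokenizer).

```python
# Sketch: speculative decoding via a small assistant model drafting tokens.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")
assistant = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

inputs = tokenizer("The capital of France is", return_tensors="pt")
output = model.generate(**inputs, assistant_model=assistant, do_sample=True, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
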
amyeroberts
bd7a356135
Update split string in doctest to reflect #28087 (#28135) 2023-12-19 13:55:09 +00:00
qihqi
5aec50ecaf
When saving a model on TPU, make a copy to be moved to CPU (#27993)
* When saving a model, make a copy to be moved to CPU; don't move the original model

* make deepcopy inside of _save_tpu

* Move to tpu without copy
2023-12-19 10:08:51 +00:00
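
A simplified sketch of the idea (the later commits iterate on it, so treat this as the pattern, not the shipped code):

```python
# Idea sketch: save from a CPU copy so the live model never leaves the XLA device.
import copy

def save_from_cpu_copy(model, output_dir):
    cpu_model = copy.deepcopy(model).to("cpu")  # the copy moves; the original stays put
    cpu_model.save_pretrained(output_dir)
```
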
Aaron Jimenez
4edffda636
[Doc] Fix token link in What 🤗 Transformers can do (#28123)
Fix token link
2023-12-18 15:06:54 -08:00
Mike Salvatore
c52b515e94
Fix a typo in tokenizer documentation (#28118) 2023-12-18 19:44:35 +01:00
Steven Liu
a52e180a0f
[docs] General doc fixes (#28087)
* doc fix friday

* deprecated objects

* update not_doctested

* update toctree
2023-12-18 10:44:09 -08:00
Rockerz
08a6e7a702
Fix indentation error - semantic_segmentation.md (#28117)
Update semantic_segmentation.md
2023-12-18 12:47:54 -05:00
Matt
71d47f0ad4
More TF fixes (#28081)
* More build_in_name_scope()

* Make sure we set the save spec now we don't do it with dummies anymore

* make fixup
2023-12-18 15:26:03 +00:00
Lucain
0695b2421a
Remove warning if DISABLE_TELEMETRY is used (#28113)
remove warning if DISABLE_TELEMETRY is used
2023-12-18 16:18:01 +01:00
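
For context, a sketch of the opt-out this PR makes warning-free:

```python
# Sketch: disable telemetry; after this PR setting the env var no longer warns.
import os

os.environ["DISABLE_TELEMETRY"] = "1"  # set before importing transformers
```
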
Daize Dong
7c5408dade
Disable jitter noise during evaluation in SwitchTransformers (#28077)
* Disable jitter noise during evaluation

* Update outdated configuration information

* Formatting

* Add new line
2023-12-18 15:08:55 +00:00
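
An illustrative version of the router-jitter pattern the fix enforces (not the exact SwitchTransformers code): multiplicative noise is applied only in training mode.

```python
# Sketch: apply router jitter noise only when the module is in training mode.
import torch
from torch import nn

class JitteredRouter(nn.Module):
    def __init__(self, jitter_noise: float = 0.01):
        super().__init__()
        self.jitter_noise = jitter_noise

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        if self.training and self.jitter_noise > 0:
            # multiplicative noise drawn from U(1 - eps, 1 + eps), train-time only
            noise = torch.empty_like(hidden_states).uniform_(
                1.0 - self.jitter_noise, 1.0 + self.jitter_noise
            )
            hidden_states = hidden_states * noise
        return hidden_states  # eval mode (model.eval()) passes through unchanged
```
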
lain
a0522de497
fix ConversationalPipeline docstring (#28091) 2023-12-18 15:08:37 +00:00
Wang, Yi
e6cb8e052a
in peft finetune, only the trainable parameters need to be saved (#27825)
to reduce the storage size and also save checkpoint-saving time when using DeepSpeed for training

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
2023-12-18 14:27:05 +00:00
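
A hedged sketch of the space-saving idea: persist only the parameters with `requires_grad` set. `model` is a placeholder for a PEFT-wrapped model.

```python
# Sketch: save only trainable (adapter) parameters, not the frozen base weights.
import torch

trainable_state = {
    name: param.detach().cpu()
    for name, param in model.named_parameters()
    if param.requires_grad
}
torch.save(trainable_state, "trainable_only.bin")  # far smaller than the full state_dict
```
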
Aeneas Stankowski
7f2a8f92e4
Spelling correction (#28110)
Update mixtral.md

correct minor typo in overview
2023-12-18 14:04:05 +00:00
Younes Belkada
b8378b658e
[Llava / Vip-Llava] Add SDPA into llava (#28107)
add SDPA into llava
2023-12-18 13:46:30 +01:00
cyyever
e6dcf8abd6
Fix the deprecation warning of _torch_pytree._register_pytree_node (#27803) 2023-12-17 11:13:42 +01:00
Poedator
f85a1e82c1
4D attention_mask support (#27539)
* edits to _prepare_4d_causal_attention_mask()

* initial tests for 4d mask

* attention_mask_for_sdpa support

* added test for inner model hidden

* added autotest decorators

* test mask dtype to torch.int64

* torch.testing.assert_close

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* torch_device and @torch_gpu in tests

* upd tests

* +torch decorators

* torch decorators fixed

* more decorators!

* even more decorators

* fewer decorators

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-12-17 11:08:04 +01:00
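
A hedged sketch of the new input shape: a custom `[batch, heads, query_len, kv_len]` mask. Exact dtype/format expectations follow the PR's tests, so treat this as illustrative only.

```python
# Sketch: build a custom 4D causal attention mask.
import torch

batch, q_len, kv_len = 1, 4, 4
mask_4d = torch.ones(batch, 1, q_len, kv_len, dtype=torch.int64).tril()
# passing attention_mask=mask_4d bypasses the library's own 2D -> 4D causal expansion
```
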
Sourab Mangrulkar
238d2e3c44
fix resuming from ckpt when using FSDP with FULL_STATE_DICT (#27891)
* fix resuming from ckpt when using FSDP with FULL_STATE_DICT

* update tests

* fix tests
2023-12-16 19:41:43 +05:30
Steven Liu
ebfdb9ca62
[docs] MPS (#28016)
* mps docs

* toctree
2023-12-15 13:17:29 -08:00
Steven Liu
0d63d17765
[docs] Trainer (#27986)
* first draft

* add to toctree

* edits

* feedback
2023-12-15 12:06:55 -08:00
Younes Belkada
1faeff85ce
Fix Vip-llava docs (#28085)
* Update vipllava.md

* Update modeling_vipllava.py
2023-12-15 20:16:47 +01:00
Ligeng Zhu
ffa04def0e
Fix wrong examples in llava usage. (#28020)
* Fix wrong examples in llava usage.

* Update modeling_llava.py
2023-12-15 17:09:50 +00:00
Kotaro Tanahashi
29a1c1b472
Fix low_cpu_mem_usage Flag Conflict with DeepSpeed Zero 3 in from_pretrained for Models with keep_in_fp32_modules (#27762)
Fix `from_pretrained` logic for `low_cpu_mem_usage` with DeepSpeed Zero3
2023-12-15 17:03:41 +00:00
Quentin Lhoest
26ea725bc0
Update fixtures-image-utils (#28080)
* fix hf-internal-testing/fixtures_image_utils

* fix test

* comments
2023-12-15 16:58:36 +00:00
dumpmemory
1c286be508
Fix bug for checkpoint saving on multi node training setting (#28078)
* add multi-node training setting

* fix style
2023-12-15 16:18:56 +00:00
Julien Chaumond
dec84b3211
make torch.load a bit safer (#27282)
* make torch.load a bit safer

* Fixes

---------

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2023-12-15 16:01:18 +01:00
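
A sketch of the hardening pattern this PR applies: prefer `weights_only` loading for pickled checkpoints (available since torch 1.13), which refuses arbitrary unpickling.

```python
# Sketch: safer checkpoint loading with weights_only.
import torch

state_dict = torch.load("pytorch_model.bin", map_location="cpu", weights_only=True)
```
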
Ke Wen
74cae670ce
Make GPT2 traceable in meta state (#28054)
* Put device in tensor constructor instead of to()

* Fix copy
2023-12-15 15:45:31 +01:00
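
A minimal illustration of the change's pattern: construct tensors directly on the target device rather than creating them on CPU and calling `.to()`, which breaks meta-device tracing.

```python
# Sketch: device-aware tensor construction keeps GPT-2 traceable on the meta device.
import torch

device = torch.device("meta")
bias = torch.tril(torch.ones(4, 4, device=device))  # traceable under meta tensors
# old pattern: torch.tril(torch.ones(4, 4)).to(device)  # materializes on CPU first
```
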