transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-31 02:02:21 +06:00

Author	SHA1	Message	Date
Gallil Maimon	6acb4e43a7	Support BatchNorm in Hubert pos_conv_emb as in fairseq (#34389 ) * Support BatchNorm in Hubert pos_conv_emb as in fairseq * Correct the new defaults (#34377) * Correct the new defaults * CIs * add check * Update utils.py * Update utils.py * Add the max_length in generate test checking shape without passing length * style * CIs * fix fx CI issue * [auto. ping] Avoid sending empty info + add more team members (#34383) * update * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> * Fix glm (#34388) * Fix duplicated * fix import * Use non nested images and batched text Idefics2/3 (#34222) * add support for non nested images and add tests * add tests error scenario * fix style * added single and no image to error tests * Fix onnx non-expotable inplace aten op (#34376) * fix onnx non-expotable inplace op * mistral, qwen2, qwen2_vl, starcoder2 * fixup copies * Fix right padding in LLaVA models (#34305) * fix right pad llavas * device mismatch * no filter (#34391) * no filter * no filter * no filter --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> * SynthID: better example (#34372) * better example * Update src/transformers/generation/configuration_utils.py * Update src/transformers/generation/logits_process.py * nits * Tests: upgrade `test_eager_matches_sdpa_generate` (#34386) * Fix bnb training test failure (#34414) * Fix bnb training test: compatibility with OPTSdpaAttention * Avoid check expected exception when it is on CUDA (#34408) * update * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> * Fix typos in agents_advanced.md (#34405) * [docs] Cache implementations (#34325) cache * [run-slow] hubert * Support BatchNorm in Hubert pos_conv_emb as in fairseq Add conversion integration test, and make batchnorm explicit variable * Support BatchNorm in Hubert pos_conv_emb as in fairseq fix make fixup styling changes * [run-slow] hubert * Support BatchNorm in Hubert pos_conv_emb as in fairseq * [run-slow] hubert * Support BatchNorm in Hubert pos_conv_emb as in fairseq Add conversion integration test, and make batchnorm explicit variable * Support BatchNorm in Hubert pos_conv_emb as in fairseq fix make fixup styling changes * [run-slow] hubert * [run-slow] hubert --------- Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co> Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com> Co-authored-by: Raushan Turganbay <raushan@huggingface.co> Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com> Co-authored-by: Rudy Delouya <rudy.delouya@gmail.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>	2024-12-10 14:18:23 +01:00
Trevor Royer	80f2b1610f	Fix file path for shard_num 1 with mllama converter (#35053 ) "#35049 fix path for num_shard 1"	2024-12-10 09:11:45 +00:00
Raushan Turganbay	0938b57770	Assisted decoding multi-gpu (#35116 ) * fix * move a few lines up	2024-12-10 09:59:17 +01:00
Spiros Dontas	dada0fd85f	Fix `num_items_in_batch` not being an integer (#35115 ) In method `Trainer#get_batch_samples`, the return values should be a list of batch samples and an integer indicating the number of items that exist in the batch. However, this was not actually a case and what was returned instead of an integer, was a tensor with one element. In the multi-GPU setup, this tensor is placed in a different device than the loss tensor, causing the loss function to raise a `RuntimeError`. The problem arises from `5d7739f15a/src/transformers/trainer.py (L5139-L5144)`, where the outer `sum` operates over a list of tensors which means that the final result is also a tensor. To counter this issue, a new check (after the accelerator gathering) has been added in order to convert a potential tensor to an integer before returning the `num_items_in_batch`.	2024-12-10 08:40:40 +01:00
Matthew Douglas	34f4080ff5	[CI] Fix bnb quantization tests with accelerate>=1.2.0 (#35172 )	2024-12-09 13:55:16 -05:00
UV	fa8763ce17	Fixed typo of 'avilable' in prompts.py (#35145 )	2024-12-09 16:40:32 +00:00
fzyzcjy	4bc39de5c3	Super tiny fix logging message (#35132 ) Update integration_utils.py	2024-12-09 16:31:32 +00:00
Lysandre Debut	8e806a336f	Cleanup: continue the init refactor (#35167 ) Round 2	2024-12-09 16:09:50 +01:00
Mohamed Mekkouri	7238387f67	Fix typo in EETQ Tests (#35160 ) fix	2024-12-09 14:13:36 +01:00
Daniel Bogdoll	de8a0b7547	Option to set 'non_blocking' for to(device) in BatchEncoding and BatchFeature (#34883 ) * Option to set 'non_blocking' for to(device) operation for performance improvements. Defaults to 'false', thus no behavioral changes. * Enabling non_blocking in to() operation of BatchFeature. * Improved docstring on utilization of non_blocking * Force non_blocking as keyword argument Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> --------- Co-authored-by: Daniel Bogdoll <dbogdoll@umich.edu> Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>	2024-12-09 11:29:04 +01:00
UV	1452dc2514	Corrected typo in agent system prompts (#35143 )	2024-12-09 10:42:23 +01:00
NielsRogge	9e420e0269	[I-JEPA] Update docs (#35148 ) Update docs	2024-12-09 10:01:31 +01:00
kang sheng	1ccca8f48c	Fix GA loss bugs and add unit test (#35121 ) * fix GA bugs and add unit test * narrow down model loss unit test diff gap * format code to make ruff happy * send num_items_in_batch argument to decoder * fix GA loss bug in BertLMHeadModel * use TinyStories-33M to narrow down diff gap * fotmat code * missing .config * avoid add extra args --------- Co-authored-by: kangsheng <kangsheng@meituan.com>	2024-12-09 09:57:41 +01:00
Pavel Iakubovskii	c8c8dffbe4	Update I-JEPA checkpoints path (#35120 ) Update checkpoints path	2024-12-06 13:42:51 +00:00
Victor Agostinelli	7f95372c62	Add feature dim attributes to BitLinear for easier PEFT integration (#34946 ) Update bitnet.py, extremely small change to allow for easier PEFT integration Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>	2024-12-06 13:39:45 +01:00
Aymeric Roucher	9ad4c93536	Add Aria (#34157 ) * Add Aria --------- Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-12-06 12:17:34 +01:00
Yih-Dar	15ab310c3a	Fix private forked repo. CI (#35114 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-12-06 12:03:31 +01:00
Steven Liu	98e8062df3	[docs] top_p, top_k, temperature docstrings (#35065 ) clarify	2024-12-05 11:24:51 -08:00
Jacky Lee	44f88d8ccb	[docs] Update Python version in translations (#35096 ) update: doc version	2024-12-05 11:06:54 -08:00
Lysandre	66ab300aaf	Dev version	2024-12-05 19:12:22 +01:00
Pablo Montalvo	a5bb528471	Fix signatures for processing kwargs (#35105 ) * add conversion script * remove pg2 refs * fixup style * small update * get correct scaling * add back missing bos * fix missing config keys * might revert this pos_embeddings * fixup 9b config * fix 9b * fixup 9b conversion for good + add back num_hidden_layers * add correct query scaling for 2b, 9b, 27b * fixup 27b conversion * Additional variant: 27b-896 * Use CPU for conversion to reduce GPU RAM requirements * fix causal mask generation + formatting * fix in-training causal mask generation edge case * trigger CI * update config * update config * update config * update config * update config * update config * update config * update config * update config * move conversion file to main model dir * handle multi-images + bos token * address comments for input ids * revert ci fixes * [run-slow] paligemma * fix * [run-slow] paligemma * skip end 2 end * [run-slow] paligemma --------- Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-12-05 18:15:48 +01:00
Jonathan Mamou	e27465c801	Adaptive dynamic number of speculative tokens (#34156 ) * initial commit * update strategy * add tradeoff FPR TPR with cost * all probs * fix * fix * fix style * Update src/transformers/generation/configuration_utils.py shorter docstring Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * import guard * fix style * add is_sklearn_available condition * vectorizing to flatten the for-loop * fix style * disable adaptation for UAG * update doc * add TestAssistedCandidateGeneratorUpdateStrategy * fix style * protect import * fix style --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2024-12-05 17:07:33 +01:00
Yih-Dar	b0a51e5cff	Fix flaky Hub CI (`test_trainer.py`) (#35062 ) * fix * Update src/transformers/testing_utils.py Co-authored-by: Lucain <lucainp@gmail.com> * fix * fix * fix * fix * fix * fix * fix * fix * check * check * check * check * check * check * Update src/transformers/testing_utils.py Co-authored-by: Lucain <lucainp@gmail.com> * Update src/transformers/testing_utils.py Co-authored-by: Lucain <lucainp@gmail.com> * check * check * check * Final space * Final adjustment --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: Lucain <lucainp@gmail.com>	2024-12-05 17:02:27 +01:00
Arthur	a928d9c128	[`trainer`] fix the GA `model_accepts_loss_kwargs` (#34915 ) * fix * style * values * fix	2024-12-05 16:37:46 +01:00
Raushan Turganbay	e682c17e4a	BLIP: this is correct now (#35081 ) this is correct now	2024-12-05 16:30:09 +01:00
João Marcelo	50189e36a6	Add I-JEPA (#33125 ) * first draft * add IJepaEmbeddings class * fix copy-from for IJepa model * add weight conversion script * update attention class names in IJepa model * style changes * Add push_to_hub option to convert_ijepa_checkpoint function * add initial tests for I-JEPA * minor style changes to conversion script * make fixup related * rename conversion script * Add I-JEPA to sdpa docs * minor fixes * adjust conversion script * update conversion script * adjust sdpa docs * [run_slow] ijepa * [run-slow] ijepa * [run-slow] ijepa * [run-slow] ijepa * [run-slow] ijepa * [run-slow] ijepa * formatting issues * adjust modeling to modular code * add IJepaModel to objects to ignore in docstring checks * [run-slow] ijepa * fix formatting issues * add usage instruction snippet to docs * change pos encoding, add checkpoint for doc * add verify logits for all models * [run-slow] ijepa * update docs to include image feature extraction instructions * remove pooling layer from IJepaModel in image classification class * [run-slow] ijepa * remove pooling layer from IJepaModel constructor * update docs * [run-slow] ijepa * [run-slow] ijepa * small changes * [run-slow] ijepa * style adjustments * update copyright in init file * adjust modular ijepa * [run-slow] ijepa	2024-12-05 16:14:46 +01:00
Mohamed Mekkouri	95a855e212	Deprecate quanto and switch to optimum-quanto (#35001 ) * deprecate quanto * fix style	2024-12-05 16:11:09 +01:00
Isotr0py	482cb28a18	Fix `tie_word_embeddings` handling for GGUF models (#35085 ) * fix tie_word_embeddings Signed-off-by: Isotr0py <2037008807@qq.com> * fix Signed-off-by: Isotr0py <2037008807@qq.com> --------- Signed-off-by: Isotr0py <2037008807@qq.com>	2024-12-05 16:00:41 +01:00
Cyril Vallez	35447054f5	Update Mistral conversion script (#34829 ) * Update convert_mistral_weights_to_hf.py * Update convert_mistral_weights_to_hf.py * Update convert_mistral_weights_to_hf.py	2024-12-05 15:47:20 +01:00
Arthur	93f87d3cf5	[`tokenizers`] bump to 0.21 (#34972 ) bump to 0.21	2024-12-05 15:46:02 +01:00
eustlb	54aae121eb	[Whisper] Fix whisper tokenizer (#34537 ) * handle single timestamp ending * include last timestamp token * handle single timestamp ending * avoid floating points arithm limitations * ensure float64 operations * new test * make fixup * make copies * handle edge case double tokens ending with different tokens * handle single timestamp ending * make fixup * handle conditioning on prev segments * fix * Update src/transformers/models/whisper/generation_whisper.py Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * [run-slow] whisper * don't call item() to avoid unnecessary sync * fix --------- Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> Co-authored-by: Eustache Le Bihan <eustlb@users.noreply.huggingface.co>	2024-12-05 13:46:29 +01:00
Yih-Dar	beb2c66ec3	Informative (#35059 ) * fix * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-12-05 09:50:27 +01:00
Steven Liu	1ed1de2fec	[docs] Increase visibility of torch_dtype="auto" (#35067 ) * auto-dtype * feedback	2024-12-04 09:18:44 -08:00
Fanli Lin	baa3b22137	[docs] add a comment that offloading requires CUDA GPU (#35055 ) * add commen to offloading * Update docs/source/en/kv_cache.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2024-12-04 07:48:34 -08:00
Cyril Vallez	1da1e0d7f2	Support for easier multimodal use of modular (#35056 ) * update modular and add examples * style * improve example comments * style * fix small logic issue for imports * fix relative order issue when files do not make sense * Improve comments * trigger CIs	2024-12-04 15:13:11 +01:00
Anton Vlasjuk	46df859975	[`GPTNeoX`] Flex Attention + Refactor (#34896 ) * gpt neox flex attention + refactor * some formatting * small fix on dropout * add assertion on flex attn test * flaky ci :( * add head mask support * style * handle dtype, replace torch where * fixup flex with output attns * code review and several other fixes * Update src/transformers/modeling_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * style * remove unnecessary comment * remove incorrect comment * make flex attn check more agnostic tor versions and centralized * change peft input dtype check to value since q and k could be affected by other stuff like RoPE * i forgor * flaky * code review and small fixes * Update src/transformers/models/gpt_neox/modeling_gpt_neox.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-12-04 14:48:28 +01:00
Vladislav Bronzov	accb7204f9	Add Pytorch Tensor Parallel support for Qwen2, Qwen2Moe, Starcoder2 (#35007 ) * add base tp plan for qwen2 and qwen2moe * add parallel tp for starcoder2 * fix modular conversion * add infer dim for qkv states * Update src/transformers/models/qwen2_moe/configuration_qwen2_moe.py --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-12-04 14:43:36 +01:00
Tianshu Wang	c7a109ec81	Fix `pad_token_tensor` is None in warning (#34005 ) Fix pad_token_tensor is None in warning	2024-12-04 11:15:25 +01:00
Fanli Lin	329f5dbf97	[docs] use device-agnostic API instead of hard-coded cuda (#35048 ) replace cuda	2024-12-03 10:54:15 -08:00
Fanli Lin	b8cdc262d5	[docs] use device-agnostic instead of `cuda` (#35047 ) * fix on xpu * [run_all] * add the missing import for Image lib * add more devices in comment * bug fix * replace cuda	2024-12-03 10:53:45 -08:00
wwwbai	346597b644	Translate community.md into Chinese (#35013 ) * community translation * Update docs/source/zh/community.md Co-authored-by: Isotr0py <2037008807@qq.com> --------- Co-authored-by: Isotr0py <2037008807@qq.com>	2024-12-03 10:22:02 -08:00
Fanli Lin	3deaa8179d	[docs] fix example code bug (#35054 ) fix code bug	2024-12-03 09:18:39 -08:00
Wang, Yi	125de41643	fix speecht5 failure issue in test_peft_gradient_checkpointing_enable… (#34454 ) * fix speecht5 failure issue in test_peft_gradient_checkpointing_enable_disable Signed-off-by: Wang, Yi <yi.a.wang@intel.com> * [run-slow] speecht5 --------- Signed-off-by: Wang, Yi <yi.a.wang@intel.com> Co-authored-by: Matt <rocketknight1@gmail.com>	2024-12-03 13:58:54 +00:00
Yih-Dar	7a7f27697a	Fix `BertGeneration` (#35043 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-12-03 13:56:59 +01:00
Aymeric Roucher	901f504580	Add token cost + runtime monitoring to Agent and HfEngine children (#34548 ) * Add monitoring to Agent and HfEngine children	2024-12-03 13:14:52 +01:00
Cyril Vallez	ee37bf0d95	Automatic compilation in generate: do not rely on inner function (#34923 ) * compiled forward in PreTrainedModel * update * style * update name * trigger CIs * Add way to use custom compile args * style * switch parameterization to generation_config * Add to inits * Update configuration_utils.py * inits * style * docs * style * Update configuration_utils.py * back without dataclass for repo consistency * Update configuration_utils.py * style * style * style once again * add config serialization * update * true dataclass * trigger CIs * merge compile methods + remove serialization of compile config	2024-12-03 11:20:31 +01:00
wwwbai	f9c7e6021e	Translate bertlogy.md into Chinese (#34908 ) * bertology translation * Update docs/source/zh/_toctree.yml Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/zh/bertology.md Co-authored-by: blueingman <15329507600@163.com> * Update docs/source/zh/bertology.md Co-authored-by: blueingman <15329507600@163.com> * Update docs/source/zh/bertology.md Co-authored-by: Isotr0py <2037008807@qq.com> * Update docs/source/zh/bertology.md Co-authored-by: Isotr0py <2037008807@qq.com> --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: blueingman <15329507600@163.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2024-12-02 11:42:40 -08:00
Fanli Lin	527dc04e46	[docs] add the missing import for Image and bug fix (#34776 ) * add the missing import for Image lib * add more devices in comment * bug fix	2024-12-02 11:40:20 -08:00
Ahmed Almaghz	4955e4e638	[i18n-ar] Translated file : `docs/source/ar/notebooks.md` into Arabic (#33049 ) * Add docs/source/ar/notebooks.md to Add_docs_source_ar_notebooks.md * Update notebooks.md * Update _toctree.yml	2024-12-02 11:40:04 -08:00
secrettoad	f0dec874f0	add docstring example for compute_loss_func (#35020 )	2024-12-02 11:39:09 -08:00
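
Note on dada0fd85f (`num_items_in_batch` not being an integer): the commit message describes an outer `sum` over a list of tensors yielding a one-element tensor instead of an integer, and a check that converts it before returning. The following is a minimal sketch of that idea only, not the actual `Trainer` code. `OneElementTensor` and `count_items` are hypothetical names introduced for illustration; the stand-in class lets the sketch run without PyTorch, and it assumes only that the real object exposes an `.item()` method, as `torch.Tensor` does.

```python
class OneElementTensor:
    """Hypothetical stand-in for a one-element tensor (not a real library type)."""

    def __init__(self, value, device="cpu"):
        self.value = value
        self.device = device

    def __add__(self, other):
        # tensor + tensor or tensor + plain number
        other_value = other.value if isinstance(other, OneElementTensor) else other
        return OneElementTensor(self.value + other_value, self.device)

    def __radd__(self, other):
        # sum() starts from the integer 0, so `0 + tensor` is routed here
        return OneElementTensor(other + self.value, self.device)

    def item(self):
        # Return the wrapped value as a plain Python number
        return self.value


def count_items(per_batch_counts):
    """Sum per-batch counts and always return a plain int (hypothetical helper)."""
    total = sum(per_batch_counts)  # stays tensor-like if the inputs are tensor-like
    if hasattr(total, "item"):
        # Convert a potential one-element tensor to a Python number
        total = total.item()
    return int(total)


counts = [OneElementTensor(3, device="cuda:1"), OneElementTensor(4, device="cuda:1")]
print(type(sum(counts)).__name__)  # the un-converted sum is still tensor-like
print(count_items(counts))         # plain integer after conversion
```

Returning a plain integer means the count carries no device placement, which is the property the commit message says was missing in the multi-GPU setup.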


17542 Commits