Commit Graph

19383 Commits

Author · SHA1 · Message · Date
AbdelKarim ELJANDOUBI
7ecc5b88c0
Add image classifier donut & update loss calculation for all swins (#37224)
* add classifier head to donut

* add to transformers __init__

* add to auto model

* fix typo

* add loss for image classification

* add checkpoint

* remove unneeded import

* reorder imports

* format

* consistency

* add test of classifier

* add doc

* try ignore

* update loss for all swin models
2025-04-10 15:00:42 +02:00
Mohamed Mekkouri
5ae9b2cac0
Quark Quantization gated repo (#37412)
* fix

* empty commit

* empty

* nit

* fix maybe ?
2025-04-10 14:57:15 +02:00
Yih-Dar
d9e76656ae
Fix new failure reports not including anything other than tests/models/ (#37415)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-10 14:47:23 +02:00
Raushan Turganbay
1ae8d54b04
[chat-template] Unify tests and clean up 🧼 (#37275)
* fix tests and some clean up

* make one general test for each modality

* remove redundant merging of kwargs

* edge cases

* don't enforce slow when reloading

* fix gemma3 tests

* has to adapt llama 4 after rebase

* also remove from overridden tests

* should be green now
2025-04-10 14:42:32 +02:00
Arthur
10144ff116
use rms_norm_eps for the L2Norm for Llama4 (#37418)
use `rms_norm_eps`
2025-04-10 13:33:50 +02:00
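
A minimal sketch of what this change likely amounts to; the class name and shape conventions are assumptions, only the switch to `rms_norm_eps` comes from the commit:

```python
import torch
from torch import nn


class L2Norm(nn.Module):
    # Sketch: the epsilon now comes from config.rms_norm_eps instead of a
    # hardcoded constant (per "use `rms_norm_eps`" in #37418).
    def __init__(self, eps: float = 1e-6):
        super().__init__()
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
```
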
ivarflakstad
aa478567f8
Allow rocm systems to run these tests (#37278)
* Allow rocm systems to run these tests

* Fix skipTest logic

* Use get_device_properties to check system capabilities
2025-04-10 13:33:01 +02:00
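
The last bullet points at capability-based gating; a hedged sketch of that idea using the stock `torch.cuda.get_device_properties` API (the helper name and the `(8, 0)` threshold are illustrative, not from the PR):

```python
import torch


def device_supports_capability(min_major: int = 8, min_minor: int = 0) -> bool:
    # Gate tests on what the device reports rather than on the vendor,
    # so ROCm systems are not skipped blindly (torch.cuda covers ROCm too).
    if not torch.cuda.is_available():
        return False
    props = torch.cuda.get_device_properties(0)
    return (props.major, props.minor) >= (min_major, min_minor)
```
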
Wang, Yi
ae5ce22664
from_pretrained should handle xpu case (#37382)
* from_pretrained should handle xpu case

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* fmt

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

---------

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-04-10 13:23:17 +02:00
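
A hedged sketch of the kind of branch such a fix adds (the helper is hypothetical; only the XPU case itself comes from the commit):

```python
import torch


def pick_default_device() -> torch.device:
    # from_pretrained-style device selection that also considers Intel XPU.
    if torch.cuda.is_available():
        return torch.device("cuda")
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return torch.device("xpu")
    return torch.device("cpu")
```
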
Yih-Dar
4f139f5a50
Send trainer/fsdp/deepspeed CI job reports to a single channel (#37411)
* send trainer/fsdp/deepspeed channel

* update

* change name

* no .

* final

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-10 13:17:31 +02:00
Arthur
a2c2fb0108
update kernels to 0.4.3 (#37419)
* update `kernels`

* oups
2025-04-10 12:14:22 +02:00
Wing Lian
0ddad2d655
mark llama4 as not supported with fa2 (#37416) 2025-04-10 11:48:46 +02:00
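
In transformers, opting a model out of FlashAttention-2 is typically a class-level flag; a sketch of that convention (not the actual diff):

```python
from transformers import PretrainedConfig, PreTrainedModel


class ExamplePreTrainedModel(PreTrainedModel):
    config_class = PretrainedConfig
    _supports_flash_attn_2 = False  # fa2 marked unsupported, as in #37416
    _supports_sdpa = True           # SDPA remains available
```
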
Cyril Vallez
fbb2054ed5
Offloaded hybrid cache for Llama4 (#37401)
* first try (maybe race condition)

* Update cache_utils.py

* cannot avoid the race condition -> use 2 layers

* Update cache_utils.py

* Update cache_utils.py
2025-04-10 11:44:34 +02:00
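
The "use 2 layers" note suggests double buffering to dodge the copy/compute race; a speculative sketch of that pattern (all names assumed):

```python
import torch


class DoubleBufferedOffload:
    # Keep per-layer tensors on CPU and only two device slots: prefetching
    # layer i+1 into one slot never races compute reading the other slot.
    def __init__(self, num_layers: int, shape: tuple, device: str = "cuda"):
        self.cpu = [torch.zeros(shape, pin_memory=True) for _ in range(num_layers)]
        self.dev = [torch.zeros(shape, device=device) for _ in range(2)]

    def prefetch(self, layer_idx: int, stream: torch.cuda.Stream) -> torch.Tensor:
        slot = self.dev[layer_idx % 2]
        with torch.cuda.stream(stream):
            slot.copy_(self.cpu[layer_idx], non_blocking=True)
        return slot
```
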
Cyril Vallez
6d8b0b3378
Fix Llama4 offset (#37414)
* add +1

* Update modeling_llama4.py
2025-04-10 11:40:58 +02:00
Mohamed Mekkouri
f5865d32a2
Restrict & Explain tp_plan for FBgemm (#37404)
* explain tp_plan

* add llama4 check

* add clarification
2025-04-10 11:33:33 +02:00
Serge Panev
e39c732644
Handle torch ver in flexattn (#37400)
* Handle torch ver in flexattn

* update
2025-04-10 11:27:54 +02:00
Manuel de Prada Corral
bc0150bb04
Add warning when failed to acquire other user's lock at model download (#37395) 2025-04-10 11:18:27 +02:00
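
A hedged sketch of the described behavior using the `filelock` package (function name and timeout are illustrative, not the PR's actual code):

```python
import logging
from typing import Optional

from filelock import FileLock, Timeout

logger = logging.getLogger(__name__)


def acquire_download_lock(path: str, timeout: float = 10.0) -> Optional[FileLock]:
    # If another user's lock can't be acquired, warn loudly instead of
    # failing with an opaque error (the gist of #37395, names assumed).
    lock = FileLock(path + ".lock")
    try:
        lock.acquire(timeout=timeout)
        return lock
    except (Timeout, PermissionError) as err:
        logger.warning("Could not acquire lock %s.lock (%s); it may belong to another user.", path, err)
        return None
```
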
Wing Lian
9cda4265d6
handle torch version edge cases (#37399) 2025-04-09 21:49:57 +02:00
Arthur
e032d12e8a
the fix that did not get in (#37370)
* debugging improvements

* add debugging details

* add more debugging details

* debug more

* the fix that did not get in

* First fix flex

* fix query offset

* fix flex first

* fix device mask creation for speed

* small mask creation sdpa

* Update flex_attention.py

* remove chunked prefill from HybridChunkedCache

* never seen such a messed-up merge

* clean up layers + output

* add summary json file

* Efficient general cache

* Update cache_utils.py

* cleanup

* fix?

* fix!

* oups typo

* not everywhere

* more fixes

* revert unrelated changes

* Fix but ugly for now -> should use pad instead

* oups

* re-initialize the cache

* Use pad to simplify

* style

* correct slicing

---------

Co-authored-by: Pablo <pablo.montalvo.leroux@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-04-09 20:15:33 +02:00
Mohamed Mekkouri
f834ca2c19
Attention Quantization with FBGemm & TP (#37384)
* fix

* keep fused

* contiguous

* rm print

* update

* update

* rm print
2025-04-09 18:45:42 +02:00
DerekLiu35
c5c648dd74
Fix some failing AWQ tests (#37383)
* update AwqQuantizer

* fix style

* add an arg to get_modules_to_not_convert so it can also apply get_keys_to_not_convert(model)
2025-04-09 18:24:57 +02:00
Brayden Zhong
71b35387fd
Apply torchfix to replace deprecated functions: _pytree._register_pytree_node and torch.cpu.amp.autocast (#37372)
fix: apply torchfix
2025-04-09 16:11:18 +01:00
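
For reference, a sketch of the replacements torchfix performs here, deprecated spellings on the left, maintained equivalents on the right:

```python
import torch

# Deprecated                                  ->  replacement
#   torch.utils._pytree._register_pytree_node ->  torch.utils._pytree.register_pytree_node
#   torch.cpu.amp.autocast()                  ->  torch.amp.autocast("cpu")

with torch.amp.autocast("cpu"):
    y = torch.randn(4, 4) @ torch.randn(4, 4)
```
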
Sangyun_LEE (이상윤)
ad340908e4
Fix warning message for PEFT models in text-generation pipeline #36783 (#36887)
* add peft model in constant

* add test

* fix formatting

* make fixup execute

* change code

* check by self.task

* add test

* fixup test code

* fix minor typo

* fix pipeline test

* apply maintainers' requests
2025-04-09 15:36:52 +01:00
DerekLiu35
2527f71a47
Add "selecting a quantization method" doc (#37159)
* initial draft

* make documentation simpler

* Update docs/source/en/quantization/selecting.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/selecting.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/selecting.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/selecting.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/selecting.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/selecting.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/selecting.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/selecting.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/selecting.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/selecting.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* turn pros and cons into tables

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* add links to each quant method page

* separate calibration vs no calibration methods

* add calibration time estimates

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-09 15:51:37 +02:00
Marc Sun
7ae0be722e
update deepspeed docker (#37371)
* update

* create docker image

* 03

* uninstall pytest as it conflicts with transformers

* wrong one

* better

* see which package depends on pytest

* up

* reinstall

* fix

* deepspeedddddddd (same message repeated over eight iteration commits)

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-09 14:54:06 +02:00
Arthur
e3eda6d188
Add glm4 (#37388)
* add changed

* Revert "add changed"

This reverts commit 0a0166a1fe.

* update with NEW MODEL class called GLM4

* update

* Update glm4.md

* Name

* style

* fix copies

* fixup test

---------

Co-authored-by: Yuxuan Zhang <2448370773@qq.com>
2025-04-09 14:02:04 +02:00
Jonas M. Kübler
1e6ff5fd55
fix: llama4 conversion script no_rope_layers (#37359)
fix conversion script no_rope_layers

`no_rope_layers` should be either a list of NoPE layers or None; when None, it is built in the config from the `no_rope_layer_interval`

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2025-04-09 13:02:15 +02:00
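
A sketch of the derivation the message describes (the 1/0 encoding is an assumption; only the interval-based construction comes from the commit):

```python
from typing import List, Optional


def resolve_no_rope_layers(
    no_rope_layers: Optional[List[int]],
    num_hidden_layers: int,
    no_rope_layer_interval: int,
) -> List[int]:
    # Pass an explicit list through; otherwise build it from the interval,
    # marking every `no_rope_layer_interval`-th layer as NoPE.
    if no_rope_layers is not None:
        return no_rope_layers
    return [
        0 if (layer + 1) % no_rope_layer_interval == 0 else 1
        for layer in range(num_hidden_layers)
    ]
```
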
Raushan Turganbay
6f4058aee3
Update composition flag usage (#36263)
* update composition flag usage

* remove print

* fix tests

* actually fix

* oh c'mon

* now should be fixed right?

* fix copies
2025-04-09 11:48:49 +02:00
Jerry Zhang
08e3217baf
Preserve requires_grad in pre quantized model (#37354)
* Preserve requires_grad in pre quantized model

Summary:
discovered this when running lm-eval for some models; the current
code always sets requires_grad to True

Test Plan:
lm_eval --model hf --model_args pretrained=jerryzh168/phi4-torchao-gguf-q4_k --tasks hellaswag --device cuda:0 --batch_size 8

* ruff format

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-04-08 18:41:30 +02:00
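
A hedged sketch of the fix's idea — capture the flag before the layer swap instead of re-enabling gradients unconditionally (helper names assumed):

```python
from typing import Callable

from torch import nn


def replace_with_quantized(module: nn.Linear, quantize: Callable[[nn.Linear], nn.Module]) -> nn.Module:
    # Remember the original requires_grad instead of forcing it to True.
    requires_grad = module.weight.requires_grad
    quantized = quantize(module)
    for param in quantized.parameters():
        param.requires_grad_(requires_grad)
    return quantized
```
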
Matt
4d0de5f73a
🚨 🚨 setUp -> setUpClass conversion (#37282)
* More limited setup -> setupclass conversion

* make fixup

* Trigger tests

* Fixup UDOP

* Missed a spot

* tearDown -> tearDownClass where appropriate

* Couple more class fixes

* Fixups for UDOP and VisionTextDualEncoder

* Ignore errors when removing the tmpdir, in case it already got cleaned up somewhere

* CLIP fixes

* More correct classmethods

* Wav2Vec2Bert fixes

* More methods become static

* More class methods

* More class methods

* Revert changes for integration tests / modeling files

* Use a different tempdir for tests that actually write to it

* Remove addClassCleanup and just use tearDownClass

* Remove changes in modeling files

* Cleanup get_processor_dict() for got_ocr2

* Fix regression on Wav2Vec2BERT test that was masked by this before

* Rework tests that modify the tmpdir

* make fix-copies

* revert clvp modeling test changes

* Fix CLIP processor test

* make fix-copies
2025-04-08 17:15:37 +01:00
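
The shape of the conversion, in plain unittest terms (a sketch, not any specific test from the PR):

```python
import shutil
import tempfile
import unittest


class ExampleProcessorTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Expensive read-only fixtures are built once per class
        # instead of once per test method.
        cls.tmpdirname = tempfile.mkdtemp()

    @classmethod
    def tearDownClass(cls):
        # ignore_errors, in case a test already cleaned the dir up.
        shutil.rmtree(cls.tmpdirname, ignore_errors=True)

    def test_shared_fixture_exists(self):
        self.assertTrue(self.tmpdirname)
```

Per the PR, tests that actually write to the directory use a separate tempdir so the shared fixture stays read-only.
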
KimmiShi
c15a7adb28
fix(qwen): fix shape error when using tp (#36947)
* fix(qwen): fix shape error when using tp

* Update modeling_qwen2_vl.py

---------

Co-authored-by: shidongxing <shidongxing@pjlab.org.cn>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-04-08 17:47:30 +02:00
Jonathan Mamou
121f91d36c
prune LM Head for USD (#36695)
* initial commit

* fix

* fix style

* set default to prune

* add tests

* comment

* remove prune flag from generate

* address Joao's comments

* deprecate_kwarg

* add doc

* fix target_vocab_size

* Update src/transformers/generation/candidate_generator.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/candidate_generator.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/candidate_generator.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/candidate_generator.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* fix deprecated argument assistant_model_device

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-04-08 16:44:10 +01:00
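
A hedged sketch of the pruning idea behind #36695 (USD assistants only need logits for tokens shared with the target tokenizer; the helper is illustrative, not the PR's code):

```python
import torch
from torch import nn


def prune_lm_head(lm_head: nn.Linear, kept_token_ids: torch.Tensor) -> nn.Linear:
    # Keep only the LM-head rows for the kept vocabulary, shrinking the
    # projection from the full vocab down to the shared one.
    pruned = nn.Linear(lm_head.in_features, len(kept_token_ids), bias=lm_head.bias is not None)
    pruned.weight.data = lm_head.weight.data[kept_token_ids]
    if lm_head.bias is not None:
        pruned.bias.data = lm_head.bias.data[kept_token_ids]
    return pruned
```
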
Joao Gante
4321b0648c
[core] remove GenerationMixin inheritance by default in PreTrainedModel (#37173) 2025-04-08 16:42:05 +01:00
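
What this means for downstream code, sketched: a custom model that should expose `.generate()` now inherits `GenerationMixin` explicitly.

```python
from transformers import PreTrainedModel
from transformers.generation import GenerationMixin


class MyModelForCausalLM(PreTrainedModel, GenerationMixin):
    # Before #37173, PreTrainedModel carried GenerationMixin implicitly;
    # generation support is now opt-in through explicit inheritance.
    ...
```
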
Kerry
aab0878327
Skip non-selected experts for mixtral and qwen2_moe (#32429)
* Skip non-selected experts for mixtral and qwen2_moe

* Fix: tensor tolist()

* WIP: tokenization test

* fix modular source of truth

* nits

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-04-08 17:41:28 +02:00
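
A sketch of the optimization (shapes and names assumed; the real models use an expert mask, but the effect is the same — experts with no routed tokens are skipped):

```python
import torch
from torch import nn


def moe_forward(hidden: torch.Tensor, experts: nn.ModuleList,
                router_logits: torch.Tensor, top_k: int = 2) -> torch.Tensor:
    # Iterate only over experts the router selected for at least one token,
    # instead of looping over every expert unconditionally.
    top = torch.topk(router_logits, top_k, dim=-1)         # (num_tokens, top_k)
    weights = torch.softmax(top.values, dim=-1)
    out = torch.zeros_like(hidden)
    for expert_idx in torch.unique(top.indices).tolist():  # selected experts only
        token_idx, slot = torch.where(top.indices == expert_idx)
        out[token_idx] += weights[token_idx, slot, None] * experts[expert_idx](hidden[token_idx])
    return out
```
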
Joao Gante
35f0f5b5da
[llama 4] dynamic rope decorator (#37365)
l4 + dynamic rope decorator
2025-04-08 15:56:31 +01:00
Ryan Mullins
530322ccb6
Set vision config to None for Gemma 1B conversion (#37366)
* Set vision config to None for Gemma 1B conversion

* Trigger tests

---------

Co-authored-by: Matt <rocketknight1@gmail.com>
2025-04-08 14:22:32 +01:00
Yih-Dar
8064cd9b4f
fix deepspeed job (#37284)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-08 15:19:33 +02:00
Cyril Vallez
cdfb018d03
A bit of cleaning 🧹🧹 (#37215)
* cleaning

* CIs
2025-04-08 14:33:58 +02:00
cyyever
1e6b546ea6
Use Python 3.9 syntax in tests (#37343)
Signed-off-by: cyy <cyyever@outlook.com>
2025-04-08 14:12:08 +02:00
Minho Ryu
0fc683d1cd
convert float for yarn related arguments in rope_scaling (#37139)
* convert float for yarn related arguments in rope_scaling

* sort keys alphabetically

---------

Co-authored-by: ryan.agile <ryan.agile@kakaobrain.com>
2025-04-08 13:58:22 +02:00
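
A hedged sketch of both fixes in the PR — casting YaRN numbers to float and emitting keys in sorted order (the exact key list is an assumption):

```python
YARN_FLOAT_KEYS = ("factor", "attention_factor", "beta_fast", "beta_slow")


def normalize_rope_scaling(rope_scaling: dict) -> dict:
    # YaRN parameters loaded from JSON may arrive as ints (e.g. 32 instead
    # of 32.0); cast them, and sort keys alphabetically per the commit.
    return {
        key: float(value) if key in YARN_FLOAT_KEYS and value is not None else value
        for key, value in sorted(rope_scaling.items())
    }
```
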
Alex Brooks
2515a5a290
Expose blip2qformer (#37254)
* Expose blip2qformer

* Add missing args to blip2 config
2025-04-08 12:04:33 +02:00
Arthur
2da82e432d
Multiple llama4 fixes (#37353)
* update for fixes

* more fixes

* fix dynamic cache?

* style

* fix both training and generating. Eager seems alright

* dynamic does not work

* fix most cases, use_cache or not, eager or not, no default cache (ex: not training but you want to get cache states)

* should be final fixes

* fix more stuff no cat

* style

* fix

* style

* final style

* quality

* fix

* revert
2025-04-08 11:14:49 +02:00
salman
794fde7b1c
Fixing flex attention for torch=2.6.0 (#37285)
* adding compile kwarg for torch 2.6

* fixing dynamic

* addressing comment

* typo

* Update src/transformers/integrations/flex_attention.py

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-04-07 23:04:46 +02:00
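
A sketch of version-gated compilation in the spirit of "adding compile kwarg for torch 2.6" and "fixing dynamic"; the specific kwarg is an assumption:

```python
import torch
from packaging import version

_TORCH_GE_2_6 = version.parse(torch.__version__) >= version.parse("2.6.0")


def compile_flex_attention(flex_attention_fn):
    # torch 2.6 needs an extra compile kwarg that older versions do not
    # accept; `dynamic=False` stands in for it here (assumption).
    kwargs = {"dynamic": False} if _TORCH_GE_2_6 else {}
    return torch.compile(flex_attention_fn, **kwargs)
```
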
Wing Lian
b54c2f4689
more fixes for post-training llama4 (#37329)
* more fixes for post-training llama4

* use target_length instead of guarded past_key_values
2025-04-07 21:20:23 +02:00
Tugsbayasgalan Manlaibaatar
754a370bca
Remove unnecessary attr assignment (#36837)
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-04-07 20:19:54 +01:00
logesh R
31a62c2eb8
Updated Model-card for donut (#37290)
* Updated documentation for Donut model

* Update docs/source/en/model_doc/donut.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/donut.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/donut.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/donut.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Updated code suggestions

* Update docs/source/en/model_doc/donut.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Updated code suggestion to align with the AutoModel example

* Update docs/source/en/model_doc/donut.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Updated notes section and included code examples

* close hfoption block and indent

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-07 11:54:47 -07:00
Mohamed Mekkouri
f830105183
Add bnb to the list of supported quantization methods for LLama4 (#37348)
* add bnb

* style

* update

* add pre_quantized check
2025-04-07 20:34:06 +02:00
Parag Ekbote
e2b0224d94
Update Model Card for Jamba (#37152)
* Update model card for jamba

* Apply the suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Apply suggestions from code review-2

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update model page.

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update as per code review.

* Update docs/source/en/model_doc/jamba.md as per code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/jamba.md as per code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update as per code review.

* fixes

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-07 11:02:59 -07:00
Devesh Rahatekar
6cc109c354
Improvements in Gemma2 model card (#37076)
* Improved Model card for Gemma2

* Made changes in gemma2 as suggested

* Made more changes in the doc (adding image, notes, closing hfoptions)

* minor fixes

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-07 10:51:26 -07:00
Mohamed Mekkouri
8bbcdf5409
Clean up the compressed-tensors integration (#37349)
clean up
2025-04-07 19:26:45 +02:00
Ashvanth.S
3a826a45ca
Update Model card for GPT2 (#37101)
* Update Model card for gpt2

* Update link for gpt2 space

* fixes docs based on suggestions

* Add transformers-cli and quantization example for GPT-2

* Remove resources and flash attention docs and fix typos
2025-04-07 10:15:28 -07:00
Ricardo Alanis
5e855095a2
Update falcon mamba card (#37253)
* feat: edit falcon mamba card

* fix: edit statement on falconmamba arch

* Update docs/source/en/model_doc/falcon_mamba.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/falcon_mamba.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/falcon_mamba.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix: add right indent for tags

* fix: remove notes

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-07 10:12:44 -07:00