transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-31 02:02:21 +06:00

Author	SHA1	Message	Date
cyyever	1e6b546ea6	Use Python 3.9 syntax in tests (#37343 ) Signed-off-by: cyy <cyyever@outlook.com>	2025-04-08 14:12:08 +02:00
Minho Ryu	0fc683d1cd	convert float for yarn related arguments in rope_scaling (#37139 ) * convert float for yarn related arguments in rope_scaling * sort keys alphabetically --------- Co-authored-by: ryan.agile <ryan.agile@kakaobrain.com>	2025-04-08 13:58:22 +02:00
Alex Brooks	2515a5a290	Expose blip2qformer (#37254 ) * Expose blip2qformer * Add missing args to blip2 config	2025-04-08 12:04:33 +02:00
Arthur	2da82e432d	Multiple llama4 fixe (#37353 ) * update for fixes * more fixes * fuxix dynamic cache? * style * fix both traiining and generating. Eager seems alright * dynamic does not work * fix most cases, use_cache or not, eager or not, no default cache (ex: not training but you want to get cache states) * should be final fixes * fix more stuff no cat * style * fix * style * final sytle * qualityeioiwhjfaopsejdpofqsdjkfjha;wesdhgfkjlqsw.denghjkaswednkgs * fix * revert	2025-04-08 11:14:49 +02:00
salman	794fde7b1c	Fixing flex attention for torch=2.6.0 (#37285 ) * adding compile kwarg for torch 2.6 * fixing dynamic * addressing comment * typo * Update src/transformers/integrations/flex_attention.py --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2025-04-07 23:04:46 +02:00
Wing Lian	b54c2f4689	more fixes for post-training llama4 (#37329 ) * more fixes for post-training llama4 * use target_length instead of guearded past_key_values	2025-04-07 21:20:23 +02:00
Tugsbayasgalan Manlaibaatar	754a370bca	Remove unnecessary attr assignment (#36837 ) Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>	2025-04-07 20:19:54 +01:00
logesh R	31a62c2eb8	Updated Model-card for donut (#37290 ) * Updated documentation for Donut model * Update docs/source/en/model_doc/donut.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/donut.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/donut.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/donut.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Updated code suggestions * Update docs/source/en/model_doc/donut.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Updated code suggestion to Align with the AutoModel example * Update docs/source/en/model_doc/donut.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Updated notes section included code examples * close hfoption block and indent --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-04-07 11:54:47 -07:00
Mohamed Mekkouri	f830105183	Add bnb to the list of supported quantization methods for LLama4 (#37348 ) * add bnb * style * update * add pre_quantized check	2025-04-07 20:34:06 +02:00
Parag Ekbote	e2b0224d94	Update Model Card for Jamba (#37152 ) * Update model card for jamba * Apply the suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Apply suggestions from code review-2 Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * update model page. * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update as per code review. * Update docs/source/en/model_doc/jamba.md as per code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/jamba.md as per code review ` Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * update as per code review. * fixes --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-04-07 11:02:59 -07:00
Devesh Rahatekar	6cc109c354	Improvements in Gemma2 model card (#37076 ) * Improved Model card for Gemma2 * Made changes in gemma2 as suggested * Made more changes in the doc (adding image, notes, closing hfoptions) * minor fixes --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-04-07 10:51:26 -07:00
Mohamed Mekkouri	8bbcdf5409	Clean up the compressed-tensors integration (#37349 ) clean up	2025-04-07 19:26:45 +02:00
Ashvanth.S	3a826a45ca	Update Model card for GPT2 (#37101 ) * Update Model card for gpt2 * Update link for gpt2 space * fixes docs based on suggestions * Add transformers-cli and quantization example for GPT-2 * Remove resources and flash attention docs and fix typos	2025-04-07 10:15:28 -07:00
Ricardo Alanis	5e855095a2	Update falcon mamba card (#37253 ) * feat: edit falcon mamba card * fix: edit statement on falconmamba arch * Update docs/source/en/model_doc/falcon_mamba.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/falcon_mamba.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/falcon_mamba.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * fix: add right indent for tags * fix: remove notas --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-04-07 10:12:44 -07:00
Shubham Panchal	416b5a875d	Update model-card for DINOv2 (#37104 ) [docs] Update model-card for DINOv2	2025-04-07 10:11:08 -07:00
Nahieli	f8a16805c5	updated model card for Mistral (#37156 ) * model card for Mistral * Update docs/source/en/model_doc/mistral.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/mistral.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/mistral.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/mistral.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/mistral.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * apply suggestions * fix typo * updated with comments * updated with comments * updated with comments * remove hfoption block --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-04-07 10:05:36 -07:00
Cyril Vallez	48e179857c	Remove HQQ from caching allocator warmup (#37347 ) Update modeling_utils.py	2025-04-07 18:33:48 +02:00
Steven Liu	832cb684a0	Update translation template (#37294 )	2025-04-07 09:29:37 -07:00
Cyril Vallez	22065bd645	fix derived berts `_init_weights` (#37341 ) * fix derived berts * more * roformer	2025-04-07 18:25:07 +02:00
Matt	f789f960c8	Avoid build crashes when torch.version.xpu doesn't exist and fix Llama4 processor tests (#37346 ) * Avoid build crashes when torch.version.xpu doesn't exist * Trigger tests * Fix image token and skip inappropriate test * Remove ignore_errors=True * Add another skip	2025-04-07 17:05:54 +01:00
Yao Matrix	12bf24d6ae	enable 2 llama UT cases on xpu (#37126 ) * enable tests/models/llama/test_modeling_llama.py::LlamaIntegrationTest::test_model_7b_logits and tests/models/llama/test_modeling_llama.py::LlamaIntegrationTest::test_model_7b_logits_bf16 on xpu Signed-off-by: YAO Matrix <matrix.yao@intel.com> * switch to use Expectations Signed-off-by: YAO Matrix <matrix.yao@intel.com> * fix style Signed-off-by: YAO Matrix <matrix.yao@intel.com> * extract gen bits from architecture and use it Signed-off-by: YAO Matrix <matrix.yao@intel.com> * add cross refererence Signed-off-by: YAO Matrix <matrix.yao@intel.com> * fix style Signed-off-by: YAO Matrix <matrix.yao@intel.com> --------- Signed-off-by: YAO Matrix <matrix.yao@intel.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-04-07 16:02:14 +02:00
Yih-Dar	e7ad077012	byebye torch 2.0 (#37277 ) * bump Torch 2.1 with broken compatibility `torch.compile` * dep table * remove usage of is_torch_greater_or_equal_than_2_1 * remove usage of is_torch_greater_or_equal_than_2_1 * remove if is_torch_greater_or_equal("2.1.0") * remove torch >= "2.1.0" * deal with 2.0.0 * PyTorch 2.0+ --> PyTorch 2.1+ * ruff 1 * difficult ruff * address comment * address comment --------- Co-authored-by: Jirka B <j.borovec+github@gmail.com> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-04-07 15:19:47 +02:00
jiqing-feng	99f9f1042f	Fix torchao usage (#37034 ) * fix load path Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix path Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * Fix torchao usage Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * revert useless change Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * revert fp8 test Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix fp8 test Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix fp8 test Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix torch dtype Signed-off-by: jiqing-feng <jiqing.feng@intel.com> --------- Signed-off-by: jiqing-feng <jiqing.feng@intel.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-04-07 14:50:48 +02:00
cyyever	0fb8d49e88	Use Python 3.9 syntax in examples (#37279 ) Signed-off-by: cyy <cyyever@outlook.com>	2025-04-07 12:52:21 +01:00
Cyril Vallez	08f36771b3	Fix `init empty weights` without accelerate (#37337 ) * add the integration * Update accelerate.py * Update accelerate.py * add find_tied_params as well * Update accelerate.py * add where copied from * simplify * add error	2025-04-07 11:37:29 +02:00
Cyril Vallez	9db31ea585	Fix deepspeed with quantization (#37324 ) * Update modeling_utils.py * Update modeling_utils.py	2025-04-07 11:36:44 +02:00
hoshi-hiyouga	debfe904c9	fix llama4 training (#37319 )	2025-04-07 09:24:44 +02:00
Wing Lian	54538ebee3	fix flex attn when optional args aren't passed (#37327 )	2025-04-07 09:12:21 +02:00
Lysandre	d1b92369ca	v4.52.0.dev0	2025-04-05 22:04:21 +02:00
Arthur	25b7f27234	Add llama4 (#37307 ) * remove one of the last deps * update fast image processor after refactor * styling * more quality of life improvements * nit * update * cleanups * some cleanups * vllm updates * update fake image token * [convert] Fix typo * [convert] Strip extraneous bytes from shards * [convert] Minor fixes * [convert] Use num_experts * multi-image fixes in modeling + processor * fixup size * 128 experts * Use default rope * Unfuse mlp * simplify a lot inputs embeds merging * remove .item() 👀 * fix from review * Address feedback * Use None "default" for rope_scaling. Add eot. * set seed * return aspect ratios and bug fixes * Moe 128 rebased (#8) * 128 experts * Use default rope * Unfuse mlp * Address feedback * Use None "default" for rope_scaling. Add eot. * Meta/llama quant compat (#7) * add quant compatible model & conversion code for llama4 * fix a few issues * fix a few issues * minor type mapping fix --------- Co-authored-by: Lu Fang <fanglu@fb.com> * use a new config parameter to determine which model definition to use for MoE --------- Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: Lu Fang <fanglu@fb.com> * un-comment write_tokenizer from converting script * remove un-used imports * [llama4] Pop aspect_ratios from image processor output in Llama4Processor Signed-off-by: Jon Swenson <jmswen@gmail.com> * Fix parameter_count name * Update src/transformers/models/llama4/configuration_llama4.py * nit * Add changes for no_rope, moe_layers, chunked attention. 
Just need to test all * Update src/transformers/models/llama4/image_processing_llama4_fast.py * nit * fix post merge with main * support flex attention * fixes * fix * add layer * small updates * rebase and delete llm_compressor * nit * [llama4/mm] Add back <\|image\|> token that delimits global tile * [llama4/mm] Fix Llama 4 image processing unit tests * add explicit dtype Signed-off-by: Jon Swenson <jmswen@gmail.com> * sdpa works * comment todo small * fix model loading Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> * revert * nits * small fix for TP on 1 node * Read new params from config * Add <\|eom\|> * lol don't know how this got here * adding fp8 * Save processor, fix chat template * style * Add boi/eoi tokens We don't use them. * fixes for now flex seems to work :) * updates * nits * updates * missking keys * add context parallel * update * update * fix * nits * add worldsize and make eager attn work for vision * Ignore new key present in base models * add tp_plan * fix nope Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> * minor fix Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> * Clean up Llama4 vision model * current updates * add support for `attn_temperature_tuning` * add floor scale * add missing attn scales * push what works, dirty trick for the device synch * oups * Fix pad_token_id See https://huggingface.co/ll-re/Llama-4-Scout-17B-16E/discussions/2/files Confirmed in the original codebase. * fix causallml loading * rm * fix tied-weights * fix sdpa * push current version * should work with both short and long * add compressed_tensos & fix fbgemm tp * Fix flex impl * style * chunking * try to revert the potentially breaking change * fix auto factory * fix shapes in general * rm processing * commit cache utils cleanup * Fix context length * fix * allocate * update tp_plan * fix SDPA! 
* Add support for sparse `Llama4TextMoe` layer from the kernel hub * cleanup * better merge * update * still broken fixing now * nits * revert print * Write max_position_embeddings and max_model_length * Update modeling_llama4.py * Save attention_chunk_size * Sync eos terminators * Read initializer_range * style * remove `dict` * fix * eager should use `chunked_attention_mask` * revert * fixup * fix config * Revert "Merge pull request #36 from huggingface/sparse-llama4-moe" This reverts commit `ccda19f050`, reversing changes made to `a515579aed`. * Fix typo and remove warning with compiled flex and chunked prefill * Fix MoE vs FF (#41) * fix * Use correct no_rope_layers if provided one is empty list * update tests * fix * skipping some tests * fix fp8 loading Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> * fix text geneartion pipeline Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> * eager needs 4D mask * fix * Some cleanup * fix * update * fix * replace correctly module * patch * modulelist * update * update * clean up * Don't move to `cuda:0` in distributed mode * restrict to compressed tensors for now * rm print * Docs! 
* Fixes * Update docs/source/en/model_doc/llama4.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Fixes * cuda graph fix * revert some stuff * fixup * styling * Update src/transformers/models/llama4/modeling_llama4.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fixup * commit licence, cleanup here and there and style * more styling changes * fix dummies * fix and clean docstrings * remove comment * remove warning * Only fast image processor is supported * nit * trigger CI * fix issue with flex encoder * fix dynamic cache * Code quality * Code quality * fix more tests for now * Code quality * Code quality * Nuke bunch of failing stuff * Code quality * Code quality * cleanup removal of slow image processor * ruff fix fast image processor * fix * fix styling * Docs * Repo consistency * Repo consistency * fix sliding window issue * separate llama cache * styling * Repo consistency * Repo consistency * push waht works * L4 Repo consistency * Docs * fix last last alst alst alst alstsaltlsltlaslt --------- Signed-off-by: Jon Swenson <jmswen@gmail.com> Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> Co-authored-by: yonigozlan <yoni.gozlan10@gmail.com> Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: Pablo Montalvo <pablo.montalvo.leroux@gmail.com> Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> Co-authored-by: Keyun Tong <tongkeyun@gmail.com> Co-authored-by: Zijing Liu <liuzijing2014@users.noreply.github.com> Co-authored-by: Lu Fang <fanglu@fb.com> Co-authored-by: Zijing Liu <liuzijing2014@gmail.com> Co-authored-by: Jon Swenson <jmswen@gmail.com> Co-authored-by: jmswen <jmswen@users.noreply.github.com> Co-authored-by: MekkCyber <mekk.cyber@gmail.com> Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> Co-authored-by: Mohit Sharma <mohit21sharma.ms@gmail.com> Co-authored-by: Yong Hoon Shin <yhshin@meta.com> Co-authored-by: Marc Sun <marc@huggingface.co> 
Co-authored-by: drisspg <drisspguessous@gmail.com> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> Co-authored-by: Daniël de Kok <me@danieldk.eu> Co-authored-by: Lysandre <hi@lysand.re> Co-authored-by: Ye (Charlotte) Qi <ye.charlotte.qi@gmail.com> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-04-05 22:02:22 +02:00
Lysandre Debut	aa40fda346	Hf Xet extra (#37305 ) * Hf Xet extra * Hf Xet extra	2025-04-05 21:06:05 +02:00
Cyril Vallez	e94571580b	Fix deepspeed loading (part 2) (#37306 ) * fix * Update modeling_utils.py * Update modeling_utils.py * oups remove print	2025-04-05 20:41:42 +02:00
Cyril Vallez	84aa13dd85	Fix deepspeed loading (#37281 ) * Update modeling_utils.py * Update modeling_utils.py * fix and remove all imports * Update modeling_utils.py * Update modeling_utils.py * style * Update modeling_utils.py	2025-04-05 17:05:45 +02:00
Linnet Cosmos Tuscano	0ef339ff1b	Update OpenAI GPT model card (#37255 ) * Update OpenAI GPT model card * Update docs/source/en/model_doc/openai-gpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/openai-gpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/openai-gpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/openai-gpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update OpenAI GPT model card: add usage examples and notes section * Add API autodoc tags after Notes section for OpenAI GPT model * Update docs/source/en/model_doc/openai-gpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/openai-gpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/openai-gpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/openai-gpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/openai-gpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/openai-gpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/openai-gpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/openai-gpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/openai-gpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/openai-gpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/openai-gpt.md Co-authored-by: Steven Liu 
<59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/openai-gpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/openai-gpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Added missing badges --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-04-04 15:25:16 -07:00
Sharareh Younesian	46d73910d5	Updated T5 model card with standardized format (#37261 ) * Updated T5 model card with standardized format * Updated T5 model card with standardized format, fixed typo * Update docs/source/en/model_doc/t5.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/t5.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/t5.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/t5.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/t5.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/t5.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/t5.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/t5.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/t5.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/t5.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Apply reviewer suggestions * Update docs/source/en/model_doc/t5.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-04-04 15:23:09 -07:00
Chathumina Vimukthi	579135a2f6	Updated model card for distilbert (#37157 ) * Updated model card for distilbert * Updated the distilbert model card * Updated model card for distilbert * Updated the distilbert model card * Addressed code review comments * Addressed review comments * fix pipeline --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-04-04 15:22:46 -07:00
Reshan Gomis	8cd57eb731	mobilebert model card update (#37256 ) * mobilebert model card update * Updates to model card mobilebert --------- Co-authored-by: Reshan Gomis <reshang@verdentra.com>	2025-04-04 14:28:35 -07:00
Rahul Tuli	ebe47ce3e9	Fix: Unexpected Keys, Improve `run_compressed`, Rename Test Folder (#37077 )	2025-04-04 21:30:11 +02:00
Shubham Panchal	531e4fcf0e	Update model card for Depth Anything (#37065 ) [docs] Update model card for Depth Anything	2025-04-04 11:36:05 -07:00
byi8220	a4e55fcff8	Disable delay_optimizer_creation in `Trainer` to support fsdp2 (#37147 ) * github why you do this * fix * make fixup * disable cpu offload test * fixup * tmp reworks * git branch movement * make fixup * add require_fsdp_v2_version * dep issues * update ruff and fixup	2025-04-04 20:11:37 +02:00
Yao Matrix	878562b68d	fix test device spec relative path importing issue (#37190 ) Signed-off-by: YAO Matrix <matrix.yao@intel.com> Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>	2025-04-04 18:22:55 +02:00
Matt	8ebc435267	Fix llava_onevision tests (#37280 ) * Fix llava_onevision tests * Trigger tests	2025-04-04 15:03:38 +01:00
Joao Gante	ad3d157188	[RoPE] abstract dynamic RoPE update under a decorator ✨ (#37249 ) * dynamic rope decorator * longrope; shorter fwd pass * propper docstring * make fixup	2025-04-04 14:27:28 +01:00
Lysandre Debut	3d40bda30e	Hugging Face Hub pin to v0.30.0 for Xet (#37166 )	2025-04-04 14:58:22 +02:00
Joao Gante	acbcb5d07d	[Tests] flaky `test_constrained_beam_search_generate_dict_output` (#37276 )	2025-04-04 13:38:42 +01:00
Ryan McConville	4ba0989eab	Clarify error message to ensure min 28x28 image supplied for Qwen 2.5 VL (#37264 ) fix: clarify error message for min 28x28 images Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>	2025-04-04 12:53:38 +01:00
Yih-Dar	352ec8ef22	pin specific `natten` version in docker file (#37274 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-04-04 13:47:16 +02:00
cyyever	edd345b52e	Fix deprecated PT functions (#37237 ) * Fix deprecated PT functions Signed-off-by: cyy <cyyever@outlook.com> * Revert some changes Signed-off-by: cyy <cyyever@outlook.com> --------- Signed-off-by: cyy <cyyever@outlook.com>	2025-04-04 12:31:11 +01:00
Yih-Dar	b016de1ae4	Fix `utils/check_bad_commit.py` (#37272 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-04-04 12:18:20 +02:00
Nikos Antoniou	f74d7da836	Introduce modular files for speech models (#35902 ) * WAV_2_VEC_2 to WAV2VEC2 * added modular files for hubert, wavlm, wav2vec2_bert, data2vec_audio * remove unnessary definitions in modulars * added modular files for UniSpeech, UniSpeechSat, Wav2Vec2Conformer * docstring fix for UniSpeechForCTC * removed unneccessary re-definition of modular classes * reverted lazy imports change on modular_model_converter, type-alias for Wav2Vec2BaseModelOutput * top-level import of deepspeed in seamless_m4t, speecht5 * avoid tracking imports inside classes, relocate lazy deepspeed, peft imports in their original locations * convert modular * tiny modular typing fixes * some more modular fixes * make style --------- Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com> Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>	2025-04-04 11:46:27 +02:00

18547 Commits