transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-04 21:30:07 +06:00

Author	SHA1	Message	Date
Yao Matrix	3526e25d3d	enable misc test cases on XPU (#38852 ) Some checks are pending Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run Details Build documentation / build (push) Waiting to run Details Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run Details Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Check if setup was changed (push) Waiting to run Details Self-hosted runner (push-caller) / build-docker-containers (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Trigger Push CI (push) Blocked by required conditions Details Secret Leaks / trufflehog (push) Waiting to run Details Update Transformers metadata / build_and_package (push) Waiting to run Details * enable misc test cases on XPU Signed-off-by: YAO Matrix <matrix.yao@intel.com> * fix style Signed-off-by: YAO Matrix <matrix.yao@intel.com> * tweak bamba ground truth on XPU Signed-off-by: YAO Matrix <matrix.yao@intel.com> * remove print Signed-off-by: YAO Matrix <matrix.yao@intel.com> * one more Signed-off-by: YAO Matrix <matrix.yao@intel.com> * fix style Signed-off-by: YAO Matrix <matrix.yao@intel.com> --------- Signed-off-by: YAO Matrix <matrix.yao@intel.com>	2025-06-18 09:20:49 +02:00
Cyril Vallez	4b8ec667e9	Remove all traces of `low_cpu_mem_usage` (#38792 ) * remove it from all py files * remove it from the doc * remove it from examples * style * remove traces of _fast_init * Update test_peft_integration.py * CIs	2025-06-12 16:39:33 +02:00
Raushan Turganbay	dbfc79c17c	[generation] bring back tests on vision models (#38603 ) * bring back geenration tests on VLMs * remove head mask tests overwritten	2025-06-06 08:23:15 +00:00
Yao Matrix	89542fb81c	enable more test cases on xpu (#38572 ) * enable glm4 integration cases on XPU, set xpu expectation for blip2 Signed-off-by: Matrix YAO <matrix.yao@intel.com> * more Signed-off-by: YAO Matrix <matrix.yao@intel.com> * fix style Signed-off-by: YAO Matrix <matrix.yao@intel.com> * refine wording Signed-off-by: YAO Matrix <matrix.yao@intel.com> * refine test case names Signed-off-by: YAO Matrix <matrix.yao@intel.com> * run Signed-off-by: YAO Matrix <matrix.yao@intel.com> * add gemma2 and chameleon Signed-off-by: YAO Matrix <matrix.yao@intel.com> * fix review comments Signed-off-by: YAO Matrix <matrix.yao@intel.com> --------- Signed-off-by: Matrix YAO <matrix.yao@intel.com> Signed-off-by: YAO Matrix <matrix.yao@intel.com>	2025-06-06 09:29:51 +02:00
Arthur	f5d45d89c4	🚨Early-error🚨 config will error out if `output_attentions=True` and the attn implementation is wrong (#38288 ) * Protect ParallelInterface * early error out on output attention setting for no wraning in modeling * modular update * fixup * update model tests * update * oups * set model's config * more cases * ?? * properly fix * fixup * update * last onces * update * fix? * fix wrong merge commit * fix hub test * nits * wow I am tired * updates * fix pipeline! --------- Co-authored-by: Lysandre <hi@lysand.re>	2025-05-23 17:17:38 +02:00
Yuanyuan Chen	11b670a282	Fix run_slow (#38314 ) Signed-off-by: cyy <cyyever@outlook.com>	2025-05-23 10:18:30 +02:00
Arthur	211f2b0875	Add CB (#38085 ) * stash for now * initial commit * small updated * up * up * works! * nits and fixes * don't loop too much * finish working example * update * fix the small freeblocks issue * feat: stream inputs to continuous batch * fix: update attn from `eager` to `sdpa` * refactor: fmt * refactor: cleanup unnecessary code * feat: add `update` fn to `PagedAttentionCache` * feat: broken optimal block size computation * fix: debugging invalid cache logic * fix: attention mask * refactor: use custom prompts for example * feat: add streaming output * fix: prefill split refactor: add doc strings and unsound/redundant logic fix: compute optimal blocks logic * fix: send decoded tokens when `prefilling_split` -> `decoding` * refactor: move logic to appropriate parent class * fix: remove truncation as we split prefilling anyways refactor: early return when we have enough selected requests * feat: add paged attention forward * push Ggraoh> * add paged sdpa * update * btter mps defaults * feat: add progress bar for `generate_batch` * feat: add opentelemetry metrics (ttft + batch fill %age) * feat: add tracing * Add cuda graphs (#38059) * draft cudagraphs addition * nits * styling * update * fix * kinda draft of what it should look like * fixes * lol * not sure why inf everywhere * can generate but output is shit * some fixes * we should have a single device synch * broken outputs but it does run * refactor * updates * updates with some fixes * fix mask causality * another commit that casts after * add error * simplify example * update * updates * revert llama changes * fix merge conflicts * fix: tracing and metrics * my updates * update script default values * fix block allocation issue * fix prefill split attnetion mask * no bugs * add paged eager * fix * update * style * feat: add pytorch traces * fix * fix * refactor: remove pytorch profiler data * style * nits * cleanup * draft test file * fix * fix * fix paged and graphs * small renamings * cleanups and push * refactor: move tracing and metrics logic to utils * refactor: trace more blocks of code * nits * nits * update * to profile or not to profile * refactor: create new output object * causal by default * cleanup but generations are still off for IDK what reason * simplifications but not running still * this does work. * small quality of life updates * nits * updaet * fix the scheduler * fix warning * ol * fully fixed * nits * different generation parameters * nice * just style * feat: add cache memory usage * feat: add kv cache free memory * feat: add active/waiting count & req latency * do the sampling * fix: synchronize CUDA only if available and improve error handling in ContinuousBatchingManager * fix on mps * feat: add dashboard & histogram buckets * perf: improve waiting reqs data structures * attempt to compile, but we should only do it on mps AFAIK * feat: decouple scheduling logic * just a draft * c;eanup and fixup * optional * style * update * update * remove the draft documentation * fix import as well * update * fix the test * style doomed --------- Co-authored-by: Luc Georges <luc.sydney.georges@gmail.com>	2025-05-22 17:43:48 +02:00
Anton Vlasjuk	d95c864a25	🔴🔴🔴 [`Attention`] Refactor Attention Interface for Bart-based Models (#38108 ) * starting attn refactor for encoder decoder models via bart (eager + sdpa) * flash attention works, remove unnecessary code * flex attention support for bart!, gotta check if the renaming is not too aggressive * some comments * skip flex grad test for standalone as done with the other test * revert flex attn rename (for now), sdpa simplify, and todos * more todos * refactor mask creation for reuse * modular attempt at biogpt * first batch of other models * fix attn dropout * fix autoformer copies * hubert * another batch of models * copies/style + last round of bart models --> whisper next? * remove unnecessary _reshape function and remove copy to whisper * add skip for decoder-only models out of enc-dec (same as in bart) * bring back licences * remove comment, added to pr read instead * mostly docs * disable sew flex attn as it's unclear attn mask for now * oops * test fixes for enc-dec * torch fx fixes + try at flex attn * skip on mbart * some more fixes * musicgen skip / delete old attn class logic + sdpa compose compile skip * disable flex attn for musicgen, not worth the effort * more fixes and style * flex attention test for dropout and encoder decoder that dont have main input names * informer fixes * the weirdest thing I've encountered yet... * style * remove empty tensor attempt, found core root in previous commits * disable time series due to tests being very text centric on inputs * add speech to text to be ignoring the other attns, also due to tests * update docs * remaining issues resolved ? * update docs for current state --> nllb moe and pegasus x sdpa is questionable :D * some models have not set the is_causal flag... * change dtype in softmax tol old behaviour + some modular fixes * I hate it but it is what it is * fixes from main for bart * forgot this one * some model fixes * style * current status * marian works now * fixing some copies * some copy fixes + time series x informer * last models possibly and fixes on style/copies * some post merge fixes * more fixes * make attention interface callable and move warnings there * style lol * add comment to "unsupported" * remove callable interface and change interface warnings + some copies * fix * ternary is ugly af, make it simpler * how did that happen * fix flex attn test * failing the test * no more fallback! fixing copies next * style + attn fixed * fixing copies and mask creation * wrong copy * fixup tests and disable flex attn for now * fixup last tests?	2025-05-22 17:12:58 +02:00
Joao Gante	9c500015c5	🚨🚨🚨 [pipelines] update defaults in pipelines that can `generate` (#38129 ) * pipeline generation defaults * add max_new_tokens=20 in test pipelines * pop all kwargs that are used to parameterize generation config * add class attr that tell us whether a pipeline calls generate * tmp commit * pt text gen pipeline tests passing * remove failing tf tests * fix text gen pipeline mixin test corner case * update text_to_audio pipeline tests * trigger tests * a few more tests * skips * some more audio tests * not slow * broken * lower severity of generation mode errors * fix all asr pipeline tests * nit * skip * image to text pipeline tests * text2test pipeline * last pipelines * fix flaky * PR comments * handle generate attrs more carefully in models that cant generate * same as above	2025-05-19 18:02:06 +01:00
Joao Gante	dbb9813dff	[generation] Less verbose warnings by default (#38179 ) * tmp commit (imports broken) * working version; update tests * remove line break * shorter msg * dola checks need num_beams=1; other minor PR comments * update early trainer failing on bad gen config * make fixup * test msg	2025-05-19 10:03:37 +00:00
Raushan Turganbay	01ad9f4b49	Bart: new cache format (#35314 ) * bart compile * add mbart * some more models touched by fix-copies * more * more models * even more models * fix copies * fix tests * fix copies * fix * biogpt accepts position ids now (breaking?) * fix failing non-slow tests * fix some tests * should not be removed * small update * Update src/transformers/models/bart/modeling_bart.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * update for last `main` * fix copies * clone `update_causal_mask` from llama * tmp * fixup * why? how? * fix bart tests * dont skip test * address comments * fix tests * fix * fixup and delete the file --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2025-05-16 13:26:54 +02:00
Joao Gante	0e0e5c1044	[generate] Run custom generation code from the Hub (#36405 ) * mvp * remove trust_remote_code * generate_from_hub * handle requirements; docs * english * doc PR suggestions * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * changed remote code path to generate/generate.py * model repo has custom generate -> override base generate * check for proper inheritance * some doc updates (missing: tag-related docs) * update docs to model repo * nit * nit * nits * Update src/transformers/dynamic_module_utils.py * Apply suggestions from code review * Update docs/source/en/generation_strategies.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * trust remote code is required * use new import utils for requirements version parsing * use org examples * add tests * Apply suggestions from code review Co-authored-by: Manuel de Prada Corral <6536835+manueldeprada@users.noreply.github.com> * ascii file structure; tag instructions on readme.md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: Manuel de Prada Corral <6536835+manueldeprada@users.noreply.github.com>	2025-05-15 10:35:54 +01:00
Raushan Turganbay	955e61b0da	Remove head mask in generative models (#35786 ) * just squash into one commit * delete print	2025-05-15 10:44:19 +02:00
Yao Matrix	8f08318769	enable generation fsdp/utils cases on XPU (#38009 ) * enable generation fsdp/utils test cases on XPU Signed-off-by: Yao Matrix <matrix.yao@intel.com> * fix style Signed-off-by: Yao Matrix <matrix.yao@intel.com> * xx Signed-off-by: Yao Matrix <matrix.yao@intel.com> * use backend_xx APIs Signed-off-by: Yao Matrix <matrix.yao@intel.com> * fix style Signed-off-by: Yao Matrix <matrix.yao@intel.com> --------- Signed-off-by: Yao Matrix <matrix.yao@intel.com>	2025-05-09 20:52:41 +00:00
eustlb	798f948e88	Add CSM model (#36719 ) * draft structure * depth decoder with forward pre hook * full model forward draft * draft update * depth decoder update * ConversationalSpeechModelForCausalLM udpates * add generate * max length criteria small fix * udpate * updates * generation update * update in loss compute * conversion script * update for correct input embeddings * handle interleaved rope * update * update * update * support compile * update training * add doc * update doc * correct inits * ConversationalSpeechModel -> Csm * conf update * name update * tests CsmForCausalLMTest * convert use cached_file * conf + modeling updates * generate utils handle third dim shape * integration test * modeling + conf updates * common test handle more than 2 dims * add nested audio list utils * processing handle nested audio list * csm processing draft * mimi util * init updates * modular update * convert modular * processing update * csm tests update * generate tests handle third dim * generate utils handle third dim * propagate _get_initial_cache_position update * tied_weight_keys update + convert correctly * fix inputs_embeds * revert audio nested list * batch inference update + return audio * audio_utils update * processor update * some more integration tests * remove old test * porcessing output labels * improve * fix * update rope values with equivalent ones * conversion update * udpate tests * handle depth decoder generation config * remove default eos_token_id * make style * revert modeling_mimi * add default generation_config * remove sdpa since handled by default * make * fix conflict * fix conflicts * correct naming * correct imports * make * causal -> conditional naming * causal -> conditional naming * auto update * make * make * add doc * test update * fix weight init * audio tokens offsets as buffer * 4d mask in conditional class * make * doc update * fix causal mask * fix causal mask * doc update * doc update * add processor doc * update doc * fix 4d causal mask * update make_list_of_audio * do not default to mutable * remove duplicates * remove useless reset_parameters * use GradientCheckpointingLayer * use can_return_tuple * formatting * prepend placeholder in _sample * torch compile fix * some more fixies * convert modular * fix * default max_length in convert * handle depth decoder generation config correctly * clearer formulation * handle output_loading_info * handle softmax warning * add doc * propagate _get_initial_cache_position changes * generation in its own module * add processor tests * fix compile witu cuda graphs * fix compile with cuda graphs * add csm.md * include CSM loss * doc nit * doc nit * doc nit * Update docs/source/en/model_doc/csm.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * add save_audio to processor * Update src/transformers/models/csm/modular_csm.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * doc update * simplify audio_codes_mask computation * doc update * simplify loss computation * fix static cache test * fix * remove comment * simplify encoded length computation * use hf-internal-testing * doc update * cast to float before numpy * nit * mem efficient codebook head * nit * cat input values with cutoffs --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2025-05-07 10:20:13 -04:00
Sukriti Sharma	471958b620	Add GraniteMoeHybrid support for 4.0 (#37658 ) * initial config and MLA layer Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * first pass at decoder Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * completion of layers Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * modeling class Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * adding hybrid class to imports Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * fix imports granitemoehybrid Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * fix granitehybrid imports Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * fix granitehybrid import Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * fix generated modeling file Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * add some comments Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * minor fixes in layers Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * add sharedMLP layer Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * correct layer names Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * fixes in mamba config Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * fix mamba config Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * change name of MLP layer Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * fix seq mizer layers Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * correct mamba config Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * fixes in param names Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * enable hybrid model Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * update config Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * fix config granite hybrid Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * fix attention layer Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * cleanup to re-use mamba code Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * keep layer types Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * attention bias cleanup Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * update mamba layer name Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * first pass at tests Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * first pass at tests Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * use granite attention Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * fix: self attn weights Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * pass at making pos_emb optional Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * initialize self_attn only as needed Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * overwrite forward to create HybridMambaCache Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * Log invalid layer types * Add attention outputs test * Only emit attentions/logits if not None * Fix config test hidden size divisibility * mark granitmoehybrid as stateful * Initialize mamba convolutional layers * Formatting fixes * config docstring, removed some unused attrs * Fix missing arg in models test * Fix create and check decoder model test * support logits to keep in granitemoe * regen to pass logits_to_keep * Allow None or rope * Fix gradient checkpointing * Add granitemoehybrid as special cache for generate check * Remove unused MLA refs * Fix mamba layer mask * Remove logits to keep from config * Minor docstring nits * Update licenses * Enable cache by default * map layer types to layer block type * First pass at granite moe hybrid docs * Ignore granite moe hybrid in valid checkpoint check * Align attention interfaces * regenerate modular granitemoeshared attention interface * Align granite moe hybrid attn interface * run formatting * Handle mamba initialization * avoid conditional attr defs * Move hybrid layer validation to config * Add placeholder integration tests * Docs nits / Update model names * Clean up forward conditions * Use gradient checkpointing layer * Remove some copied bamba tests + inherit align test init delete more tests Use common layer init with bamba tests finish test consolidation * avoid redundant intermediate std var * use @can_return_tuple * Remove unused moe state * make skipped test names consistent * Fix docstring order * Add missing toc * Always create the shared mlp * Fix name in docstring * link preview model in docs --------- Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> Co-authored-by: Alex-Brooks <Alex.Brooks@ibm.com>	2025-05-06 06:47:43 +02:00
Joao Gante	8e8025b384	[tests] reset logs in `torch.compile` test (#37894 )	2025-04-30 16:04:28 +01:00
Joao Gante	3a1acc36ed	[tests] fix flaky pattern in `test_generate_continue_from_past_key_values` (#37724 )	2025-04-29 12:20:42 +01:00
Raushan Turganbay	79d4bc761d	[causal mask] fix preparation with multi-gpu (#37612 ) * fix multi-gpu * forgot non-copied models * fixup	2025-04-25 09:34:18 +02:00
Joao Gante	8bdd4f2acd	[generate] skip compilation on cpu offload (#37709 ) * skip compilation on cpu offload * add test * better logic * docstring * boolean logic * add disk offload check * warn users if compilation options are set but compilation doesn happen * fix test --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-04-24 14:08:17 +01:00
Joao Gante	80ea2c05c2	[tests, `qwen2_5_omni`] fix flaky tests (#37721 )	2025-04-23 17:54:12 +01:00
Joao Gante	85665a4263	[tests] Stricter generate + compilation test -- no recompilations allowed (#37629 ) * tmp commit * stricter compilation test * trigger tests * rm todo	2025-04-22 11:12:18 +01:00
Joao Gante	362fa37da2	[test] update `test_past_key_values_format` (#37614 ) allow custom shapes	2025-04-22 11:07:34 +01:00
Alex Brooks	a42ba80fa5	Allow Exclusion of Input IDs from RepetitionPenaltyLogitsProcessor (#37625 ) * Allow exclusion of input IDs for repetition penalty * Add logit proc tests for rep penalty exclusion * Expose rep pen flag through generate * Only slice if needed * keep current rep pen default behavior * Revert exposing reppen changes through generate * Fix test arg * Update src/transformers/generation/logits_process.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Rename to rep penalty kwarg * Add custom repetition penalty processor example * Validate prompt_ignore_length --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2025-04-21 15:46:05 +01:00
Yoni Gozlan	a245011252	Add InternVL (2.5 MPO) (#35968 ) * initial commit * add convert internvl * add first end-to-end working internvl * nit prompt and image proc * add working chat template * add conversion llama-based models * add tests * pass all tests * fix isort * fix modular after main merge * add video processing for internvl * add support for interlaced images and videos * Remove processing and config from modular, add more tests * add llama model tests * Modify processor for compatibility with refactored got ocr image processor * add comments in processor * Add docs and nits * change video processing to use custom sample_indices_fn * rebase and fix tests * add processor tests * Add changes Raushan review * Use the new attention interface for the vision model * nits * add support for custom video_load_backend * remove mention to InternVLTokenizer * refactor vision model to simplify logic * refactor processor for better readibility * fix copies * fix require av processor test * refactor internVL vision * Update processor and fix processing tests * fix docstring * update convert_weights for internvl3 * change image processor to fast by default * remove do_center_crop=True in convert_weights * force use_cache to True * push_to_hub before reloading * fix internVLVision for larger models * update convert weight for qk norm * fix convert_weights * fix eos_token_id in convert * update docs and integration tests * make modifs after review * fix wrong k_norm and reduce modular * change image_token_index to image_token_id * change checkpoint to OpenGVLab org * last nits * explicitely del self.num_key_value_groups * add extra special tokens	2025-04-18 18:57:33 +02:00
Raushan Turganbay	2ba6b92a6f	[VLMs] use only `xxx_token_id` for multimodal tokens (#37573 ) * use only `xxx_token_id` for multimodal tokens * update modeling files as well * fixup * why fixup doesn't fix modular docstring first? * janus, need to update configs in the hub still * last fixup	2025-04-18 17:03:39 +02:00
Joao Gante	e5ac23081e	[Gemma3] compile ✨ (#37447 )	2025-04-18 14:55:43 +01:00
Yaswanth Gali	a2ef3cf537	Add Janus model (#36053 ) * Iterative generation using input embeds * Add Janus model * discard changes * Janus imports * Refactor config and processor * Added Vision tower of Janus * Import Janus Image processor * Vision tower fixes * Refactor code * Added VQ Model * Complete model integration * temp conversion script * processor refactor * Adding files to facilitate pulling * Fixes after debugging * Skip test for these models * Add Janus Model * discard changes * Janus imports * Refactor config and processor * Added Vision tower of Janus * Import Janus Image processor * Vision tower fixes * Refactor code * Added VQ Model * Complete model integration * temp conversion script * processor refactor * Adding files to facilitate pulling * Fixes after debugging * Refactor to Text config * ✨ Added generate function * Saving intermediate convert file. Still need to read configs from the hub and convert them to our format. * Adding version that reads from the JSON files. Still have to tweak some parameters manually. * relative imports * Initial tests * Refactor image processor * Seemingly working version of the conversion script, will need to test further. * Adding command message * Fixing conflicting JanusTextConfig class * Incorporating some of the discussed changes. * Small fix to create dir. * Removing system from JINJA template * Adding draft processor tests * style fixes * Minor fixes and enhancement * added generation config * Initial tests * Small modifications, tests are now passing. * Small changes I noticed while reading code. * more fixes * Added JanusModel class * Small merge adaptations * Small merge adaptations * Image processing tests passing * More tests and fixes * Convert script updated and refactored * Tests and cleanup * make style * Postprocessing for image generation * generate refactor * fixes * - Passing tests that write a part of the model to cpu (e.g. test_cpu_offload) - Passing tests of dispatching SDPA - Only gradient checkpointing tests are left. * Removing temporary code * Changes * Writing change to modular * Added JanusVisionModel. SDPA dispatch tests pass more robustly. Gradient checkpoint tests are next * Gradient checkpoint tests passing * Removing debug code * Major generate refactor 😮‍💨 * Temp changes for testing * Green quality CI * 2 out of 4 integration tests passing * breadcrumbs * Usage Examples * Regenerate modeling after merge * dirty code * JanusIntegrationTest are passing * breadcrumbs * happy CI * fixes * Changing template * nits * Text generation logits matching original codebase at 100% precision * Remove ./tmp from git tracking * Remove ./tmp from git tracking * Checkpointing changes after reviewing * Fixing code in docstrings * CHanging comments and small bug in convert file * Fixing bug in image_token_id for 7B version * Removing line that was added by both of us * Pushing changes after discussion. Only one left is to change the key mapping for convert file. * Updating module file * New convert file using dict. Tested that it is equivalent to the old one by: - comparing keys in a script - comparing checksums of the output files between version generated with the current convert script and those generated with the old script. This is a more reliable test. * revert changes * mistake * consistency change for CI * make style * doc fixes * more fixes * experimenting with masking out pad token * checkpoint * Batched generation with multi-images working for 1B models. Will test 7B next. * Device fix. * Writing changes to modular, previous ones were written to modeling just for quick testing. * Using passed processor attention mask (only in modeling for now) * Matching performance done in the non-standard way * Working version of batched generation. Will change how some args are passed to make it more similar to language case * More compliant version of the code * Removed duplicated `_prepare_4d_causal_attention_mask_with_cache_position` * Updating modular file, making masked filling with paddings more efficient * Slightly more efficient version * Modifying JanusVisionModel to be a wrapper * Fixing test to comply with new names * Modular overhaul * More refactoring * - Changing JanusVisionModel back - Changing forward pass - Adding boi token to the comparison * - Removing whole context model_ids - Using inherited implementation of prepare_inputs_for_generation * Moving the way boi token is passed to the model * Fixing sdpa test * Minor changes * testing changes * Minor fix * - Adding postprocessing test - checking values of generated image on integration test * changes * Removing pooled attention vision module, fixing convert script as a consequence * More changes * Fixes * Draft after merge * Bug fixes * More bug fix * Fixing docs * Nits * Refactor return dict * Moving image post processing test to main processor post process * Passing guidance_scale as kwarg * make style * 🔥 refactor * make style * Update and green CI * Nits and tests update * up * Added MID block * fix * Dead code * update testcase * update * model_id change * init_weight changes --------- Co-authored-by: hsilva664 <metallic-silver@hotmail.com>	2025-04-17 09:18:51 +02:00
Yao Matrix	a335dc4d6d	enable `test_offloaded_cache_implementation` on XPU (#37514 ) Signed-off-by: YAO Matrix <matrix.yao@intel.com> Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>	2025-04-16 11:04:57 +02:00
Raushan Turganbay	cb39f7dd5b	[qwen-omni] fix processor (#37493 ) * fix * delete print * accept kwargs in overriden models as well * remove duplicate	2025-04-14 17:30:31 +02:00
BakerBunker	4b8c6d4cf8	Add Qwen2.5-Omni (#36752 ) * Add qwen2.5-omni * Remove einops dependency * Add torchdiffeq dependency * Sort init * Add torchdiffeq to extras['diffeq'] * Fix repo consistency * use cached_file * del odeint * renew pytest * format * Remove torchdiffeq * format * fixed batch infer bug * Change positional_embedding to parameter * Change default speaker * Config revision * Use modular & code clean * code clean * decouple padding with model & code cleaning * sort init * fix * fix * Second code review * fix * fix * rename vars to full name + some comments * update pytest * Code clean & fix * fix * style * more clean up * fixup * smaller vision model in tests * fix processor test * deflake a bit the tests (still flaky though) * de-flake tests finally + add generation mixin * final nits i hope * make sure processor tests are complete * replace with Qwen2_5OmniForConditionalGeneration * fix tests after updating ckpt * fix typos when cleaning, also we can't change ckpt * fixup * images and videos kwargs for processor * thinker and talker loadable from hub ckpt * address comments and update tests after rebase * fixup * skip for now * fixup * fixup * remove torch dependency in processors --------- Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.con> Co-authored-by: feizi.wx <feizi.wx@alibaba-inc.com> Co-authored-by: raushan <raushan@huggingface.co>	2025-04-14 12:36:41 +02:00
Yih-Dar	ac1df5fccd	Fix tests failed with gated repos. (#37484 ) * fix * slow --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-04-14 12:08:13 +02:00
Alex Brooks	623d395aff	Add Granite Speech Support (#36801 ) * First pass at speech granite Add encoder / projector, rename things * Combine into one model file with causal lm outputs for forward * Add loss calc * Fix config loading Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com> * Split new / old loading logic * Use transformers integration for loading peft adapters * Add generation wrapper for selective lora enablement * Add note for qformer encoder automodel * Guard torch/audio imports in feature extractor * Handle granite speech autoclasses * Handle optional deps in package structure for granite speech * Add granite pretrained model def for init * Add dummy objects for torch/torchaudio * Add tests for granite speech processor * Minor formatting fixes and refactoring * Add options for falling back to config in forward * Tentative model docstrings for granite speech * Fix config type * Remove legacy load * Allow non-lora variants for granite speech * Override weight tying for llm * Use text config instead of llm config * Add output embeddings getter to fix weight tying * Fix relative imports * computing the number of audio features, based on the raw audio sequence. * collating audio inputs, and keeping the original lengths. * asserted we have text. otherwise we can't specify the audio special token. * assering the number of audio-symbols/audios match correctly. running get validated_audios only when audio is present * indentation bugfix + supporting different feature lengths when expanding audio. * redundant, done in _get_validated_text * adapting the tests: - we must have text (not either audio or text) - _get_num_audio_features takes a list of raw lengths, provided it insetad. * Minor cleanup, remove unused import * Add more tests for batch feature processing * Allow setting offset in rel position embeddings * Add config option for warning if peft is not installed w/ lora * Port blip2 qformer code into granite speech * Add sad test for numpy arr processing * Allow numpy arrays / tuples in granite speech processor * Fix config type for projector * - pad instead of creating a zeros tensor, to keep the original dtype/device (support bfloat16) - cast input_features to the model dtype (support bfloat16) * merge Blip2QFormerConfig to GraniteSpeechProjectorConfig * prevent a crash when re-saving/loading the model (line 109) * consider additional edge cases during preprocessing. * consider additional edge cases during preprocessing. * add features mask for batched inference (bugfix) * Minor refactor, remove multiaudio processor tests * Add set input/output embeddings for granite speech * Fix feature dim check in processor test * Pop input features in embed test for granite speech * Small fixes for test edge cases Add granite speech to seq2seq causal lm mapping names * Add small tests for granite speech model * Fix data parallelism test * Standardize model class names * Fix check for copies * Fix misaligned init check * Skip granite speech in checkpoint check * Use default for tie_word_embeddings in granite speech * Fix non documentation granite speech repo issues * Fix comments and docstring checks * Add placeholder docs for granite speech * Fix test naming collision * Code formatting * Rerun torch dummy obj regen * Fix save pretrained for granite speech * Import sorting * Fix tests typo * Remove offset hack * Pass args through encoder config * Remove unused prune heads from blip2 * removing einsum. replaced with explicit multiplication (relative positional encodings) and sdpa attention. * remove Sequential from ConformerFeedForward and ConformerConvModule. + fix for sdpa attention * remove GraniteSpeechConformerScale * rename to hidden_states * rename conformer layers to self.layers, remove the first linear from the list to keep the list homogenous. * move pre-norm to the attention/feedforward blocks (avoid complex module wrapping) * adding pre_norm into forward * feature extractor refactoring to resemble how it's done in phi4multimodal. * rename feature_extractor to audio_processor * bugfix: input_feature_mask fix to get the exact number tokens. * Fix pytest decorator in processor test * Add (disabled) integration tests for granite speech * Fix handling of optional feature masking * Loosen validation in processing for vLLM compatability * Formatting fixes * Update init structure to mirror llama * Make granite speech projector generic * Update test config to reflect generic projector * Formatting fixes * Fix typos, add license * Fix undefined var in input processing * Cleanup and expose ctc encoder * Add missing config docstrings * Better var names, type hints, etc * Set attn context size in init * Add max pos emb to encoder config * Cleanup feature extractor * Add granite speech architecture details * Remove granite speech qformer ref * Add paper link, explicit calc for qkv * Calculate padding directly in depthwise conv1d init * Raise value error instead of asserting * Reorder class defs (classes used at top) * Precompute relpos distances * Run formatting * Pass attention distances through forward * Apply suggestions from code review Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com> * Add todo for using common batch feature extraction * Rename audios/features * Ensure chat template may be provided to processor * Move granite speech docs to audio models * Add todos for input proc refactoring * Fix import order * Guard torch import * Use relative imports * Require torch backend for processor in granite speech * Add backend guards in feature extractor --------- Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com> Co-authored-by: Avihu Dekel <avihu.dekel@ibm.com> Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>	2025-04-11 18:52:00 +02:00
Raushan Turganbay	6f4058aee3	Update composition flag usage (#36263 ) * update composition flag usage * remove print * fix tests * actually fix * oh c'mon * now should be fixed right? * fix copies	2025-04-09 11:48:49 +02:00
Jonathan Mamou	121f91d36c	prune LM Head for USD (#36695 ) * initial commit * fix * fix style * set default to prune * add tests * comment * remove prune flag from generate * address Joao's comments * deprecate_kwarg * add doc * fix target_vocab_size * Update src/transformers/generation/candidate_generator.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/generation/candidate_generator.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/generation/candidate_generator.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/generation/candidate_generator.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * fix deprecated argument assistant_model_device --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2025-04-08 16:44:10 +01:00
cyyever	1e6b546ea6	Use Python 3.9 syntax in tests (#37343 ) Signed-off-by: cyy <cyyever@outlook.com>	2025-04-08 14:12:08 +02:00
Joao Gante	acbcb5d07d	[Tests] flaky `test_constrained_beam_search_generate_dict_output` (#37276 )	2025-04-04 13:38:42 +01:00
cyyever	edd345b52e	Fix deprecated PT functions (#37237 ) * Fix deprecated PT functions Signed-off-by: cyy <cyyever@outlook.com> * Revert some changes Signed-off-by: cyy <cyyever@outlook.com> --------- Signed-off-by: cyy <cyyever@outlook.com>	2025-04-04 12:31:11 +01:00
Joao Gante	9a1c1fe7ed	[CI] green llama tests (#37244 ) * green llama tests * use cleanup instead * better test comment; cleanup upgrade * better test comment; cleanup upgrade	2025-04-03 14:15:53 +01:00
Joao Gante	e90d55ebcc	[Tests] add `min_new_tokens` to prevent flaky length checks (#37175 )	2025-04-02 15:24:00 +01:00
Xavier Dupré	6f5dc9c82e	Fixes DynamicCache export issues due to control flow and inplace modifications (#36652 ) * Remove unnecessary masked_fill in deberta models * Enable some code when exporting but not compiling * add missing import * style * replace if by torch.cond * style * use numel * style * add unit tests * style * change empty value for dynamic cache * replace != [] by numel() * fix import issue * style	2025-04-02 12:04:40 +01:00
Yao Matrix	24e311f42b	fix XPU UT error case brough by RNG difference btw XPU and CUDA (#37121 ) * fix XPU UT error case brough by RNG difference btw XPU and CUDA Signed-off-by: YAO Matrix <matrix.yao@intel.com> * enable tests/models/llama/test_modeling_llama.py::LlamaIntegrationTest::test_model_7b_logits and tests/models/llama/test_modeling_llama.py::LlamaIntegrationTest::test_model_7b_logits_bf16 on xpu Signed-off-by: YAO Matrix <matrix.yao@intel.com> * Revert "enable tests/models/llama/test_modeling_llama.py::LlamaIntegrationTest::test_model_7b_logits and tests/models/llama/test_modeling_llama.py::LlamaIntegrationTest::test_model_7b_logits_bf16 on xpu" This reverts commit `3ef83a4f02`. --------- Signed-off-by: YAO Matrix <matrix.yao@intel.com>	2025-04-01 13:52:55 +01:00
Yao Matrix	8f6b27eb5c	enable `test_assisted_decoding_in_different_gpu` test on XPU (#37120 ) Signed-off-by: YAO Matrix <matrix.yao@intel.com> Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>	2025-04-01 11:22:59 +02:00
cyyever	786d9c5ed9	Fix more inefficient PT operations (#37060 ) * Fix inefficient operations * Remove cpu() call * Reorder detach() * Reorder detach() * tolist without detach * item without detach * Update src/transformers/models/rag/modeling_rag.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update tests/models/encodec/test_modeling_encodec.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Use detach().cpu().numpy * Revert some numpy operations * More fixes --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2025-03-31 16:31:24 +01:00
huismiling	d0b65bb479	[MLU] Fix FA2 check error, remove deepspeed-mlu deps. (#36159 ) * add Cambricon MLUs support * fix mlu device rng state * up for quality check * up mlu to support fp16 * fix mlu device dependency error * fix mlu device dependency error * enable mlu device for bf16 * fix mlu device memory tracker * Cambricon support SDPA and flash_attn * MLU devices : Checks if `mlu` is available via an `cndev-based` check which won't trigger the drivers and leave mlu * Fix mlu FA2 check. Remove deepspeed-mlu check. add mlu tests support. * fix testing errors. * Merge branch 'hf/main' into main * fix get_device_count error. * fix mlu testing utils. * fix code quality and style. * switch to @require_torch_multi_accelerator	2025-03-31 11:02:49 +02:00
Joao Gante	29f322d04d	[generate, cache] handle more complex device maps (#37014 )	2025-03-27 14:33:20 +00:00
Raushan Turganbay	57f551c78d	[chameleon] fix num image token check (#36918 ) * [chameleon] fix num image token check * embed after merging image token * skip this also * mistral require_read_token	2025-03-24 12:36:08 +01:00
Joao Gante	179d02ffb8	[generate] ✨ vectorized beam search ✨ (#35802 )	2025-03-18 18:39:36 +00:00
Cyril Vallez	e959530b8f	Add Mistral3 (#36790 ) * initial start * style and dummies * Create convert_mistral3_weights_to_hf.py * update * typo * typo * Update convert_mistral3_weights_to_hf.py * Update convert_mistral3_weights_to_hf.py * Update convert_mistral3_weights_to_hf.py * Update convert_mistral3_weights_to_hf.py * up * Update convert_mistral3_weights_to_hf.py * Update convert_mistral3_weights_to_hf.py * update * update * Update image_processing_mistral3.py * Update convert_mistral3_weights_to_hf.py * fix patch merger * Update convert_mistral3_weights_to_hf.py * Update convert_mistral3_weights_to_hf.py * up * update modular to fit * style * Update convert_mistral3_weights_to_hf.py * typo * Update modular_mistral3.py * simplify a lot all shape shenanigans * simplify * add working test processor * Add partially working common modeling tests * All tests working and remove mistral3 image processors * add docs and fixup * fix inference with image size >1540 * 🚨fix test image proc pixtral * Remove vision_feature_select_strategy * Update convert_mistral3_weights_to_hf.py * Update convert_mistral3_weights_to_hf.py * Update convert_mistral3_weights_to_hf.py * Update convert_mistral3_weights_to_hf.py * clean * fix test checkpoints * Update test_modeling_mistral3.py * Update test_modeling_mistral3.py * style * Use Pixtral processor * up * finish cleaning processor to use pixtral directly * Update __init__.py * Update processing_pixtral.py * doc * Update __init__.py * Update mistral3.md * Update _toctree.yml --------- Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co> Co-authored-by: yonigozlan <yoni.gozlan10@gmail.com>	2025-03-18 12:04:42 +01:00
Afanti	19b9d8ae13	chore: fix typos in tests directory (#36785 ) * chore: fix typos in tests directory * chore: fix typos in tests directory * chore: fix typos in tests directory * chore: fix typos in tests directory * chore: fix typos in tests directory * chore: fix typos in tests directory * chore: fix typos in tests directory	2025-03-18 10:31:13 +01:00

1 2 3 4 5 ...

334 Commits