* use device-agnostic APIs in test cases
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* add one more
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* xpu now supports integer device IDs, aligning with CUDA behavior (see the sketch at the end of this entry)
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* update to use device_properties
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* update comment
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix comments
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
---------
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
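For context, a minimal sketch of the device-agnostic pattern these commits move the tests toward, assuming a PyTorch build with XPU support (`torch.xpu`); `pick_device` is a hypothetical helper for illustration only:

```python
import torch

def pick_device() -> torch.device:
    """Pick an available accelerator, falling back to CPU (illustrative helper)."""
    if torch.cuda.is_available():
        return torch.device("cuda", 0)   # integer index, CUDA-style
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return torch.device("xpu", 0)    # XPU accepts the same integer index
    return torch.device("cpu")

device = pick_device()
x = torch.ones(2, 2, device=device)

# device properties can be queried the same way on both backends
if device.type == "cuda":
    print(torch.cuda.get_device_properties(device.index))
elif device.type == "xpu":
    print(torch.xpu.get_device_properties(device.index))
```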
* Update roformer model card
* fix example purpose description
* fix model description according to the comments
* revert changes for autodoc
* remove unneeded tags
* fix review issues
* fix hfoption
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* docs(swinv2): Update SwinV2 model card to new standard format
* docs(swinv2): Apply review suggestions
Incorporates feedback from @stevhliu to:
- Enhance the introductory paragraph with more details about scaling and SimMIM.
- Generalize the tip from "image classification tasks" to "vision tasks".
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update BioGPT model card
* Update docs/source/en/model_doc/biogpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/biogpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/biogpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/biogpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/biogpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/biogpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/biogpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/biogpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/biogpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/biogpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/biogpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* correction for CPU fallback
* added quantization code and method
* fixed transformers-cli call
---------
Co-authored-by: Aguedo <aguedo@fakeemail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* stash commit
* Experiment 1: Try just Gemma
* Experiment 1: Just try Gemma
* make fixup
* Trigger tests
* stash commit
* Try adding Gemma3 as well
* make fixup
* Correct attrib names
* Correct pipeline model mapping
* Add in all_model_classes for Gemma1 again
* Move the pipeline model mapping around again
* make fixup
* Revert Gemma3 changes since it's a VLM
* Let's try Falcon
* Correct attributes
* Correct attributes
* Let's try just overriding get_config() for now
* Do Nemotron too
* And Llama!
* Do llama/persimmon
* Correctly skip tests
* Fix Persimmon
* Include Phimoe
* Fix Gemma2
* Set model_tester_class correctly
* Add GLM
* More models!
* models models models
* make fixup
* Add Qwen3 + Qwen3MoE
* Correct import
* make fixup
* Add the QuestionAnswering classes
* Add the QuestionAnswering classes
* Move pipeline mapping to the right place
* Jetmoe too
* Stop RoPE testing models with no RoPE
* Fix up JetMOE a bit
* Fix up JetMOE a bit
* Can we just force pad_token_id all the time?
* make fixup
* fix starcoder2
* Move pipeline mapping
* Fix RoPE skipping
* Fix RecurrentGemma tests
* Fix Falcon tests
* Add MoE attributes
* Fix values for RoPE testing
* Make sure we set bos_token_id and eos_token_id in an appropriate range (see the sketch at the end of this entry)
* make fixup
* Fix GLM4
* Add mamba attributes
* Revert bits of JetMOE
* Re-add the JetMOE skips
* Update tests/causal_lm_tester.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add licence
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
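As a side note on the bos/eos commit above, a tiny sketch of the id-range constraint; the numbers are made up, the real values live in tests/causal_lm_tester.py:

```python
# Special token ids used by the shared tester must index into the tiny test vocabulary.
vocab_size = 99

pad_token_id = 0
bos_token_id = 1
eos_token_id = 2

for name, token_id in [("pad", pad_token_id), ("bos", bos_token_id), ("eos", eos_token_id)]:
    assert 0 <= token_id < vocab_size, f"{name}_token_id={token_id} is outside [0, {vocab_size})"
```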
* Get parallel loader working. Include tests.
* Update the tests for parallel loading
* Rename env variables.
* Add docs for parallel model weight loading.
* Touch up parallel model loading docs.
* Touch up parallel model loading docs again.
* Edit comment in test_modeling_utils_parallel_loading.py
* Make sure HF_PARALLEL_LOADING_WORKERS is spelled correctly in modeling_utils.py
* Correct times for parallelized loading; the previous times were for a "hot" filesystem
* Update parallel model loading so the spawn method is encapsulated. DRY up the code by leveraging get_submodule.
* Update docs on model loading parallelism so that details on setting the multiprocessing start method are removed, now that the package handles this step internally.
* Fix style on model loading parallelism changes.
* Merge latest version of master's modeling_utils.
* Removed unused variable.
* Fix argument packing for the parallel loader.
* Fix state dict being undefined in the parallel model loader.
* Rename variables used in parallel model loading for clarity. Use get_module_from_name().
* Switch to the use of threads for parallel model loading (see the sketch at the end of this entry).
* Update docs for parallel loading.
* Remove the use of json.loads when evaluating HF_ENABLE_PARALLEL_LOADING. Prefer simple casting.
* Move parallelized shard loading into its own function.
* Remove use of is_true(). Favor checking env var true values for HF_ENABLE_PARALLEL_LOADING.
* Update copyright to 2025 in readme for parallel model loading.
* Remove garbage collection line in load_shard_file, implicit garbage collection already occurs.
* Run formatter on modeling_utils.py
* Apply style fixes
* Delete tests/utils/test_modeling_utils_parallel_loading.py
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
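A rough sketch of the scheme these commits converge on, assuming `HF_ENABLE_PARALLEL_LOADING` is a truthy-string switch and `HF_PARALLEL_LOADING_WORKERS` caps the thread count (the default of 8 below is an assumption); the real implementation in modeling_utils.py differs in detail:

```python
import os
from concurrent.futures import ThreadPoolExecutor

ENV_VARS_TRUE_VALUES = {"1", "ON", "YES", "TRUE"}

def load_shard_file(shard_path):
    # Placeholder for the per-shard work (open the safetensors file, copy tensors into the model).
    return shard_path

def load_all_shards(shard_paths):
    enabled = os.environ.get("HF_ENABLE_PARALLEL_LOADING", "").upper() in ENV_VARS_TRUE_VALUES
    if not enabled:
        return [load_shard_file(p) for p in shard_paths]

    # Threads rather than processes: shard loading is I/O bound, and threads avoid the
    # spawn/pickling issues the earlier multiprocessing version ran into.
    num_workers = int(os.environ.get("HF_PARALLEL_LOADING_WORKERS", "8"))
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(load_shard_file, shard_paths))
```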
* refactor to remove the property can_save_slow_tokenizer, since it can be done within the if of save_vocab
* move property to fast
* revert if
* check if vocab_file is an attribute (see the sketch after this entry)
* fix check for sp
* fix if condition
* fix if condition
* fix if condition
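A hedged sketch of the condition the last few commits iterate on; the names are illustrative and the real check lives inside the fast tokenizer's save path: the slow vocab file can only be copied if the attribute is set and the file still exists.

```python
import os

def can_save_slow_tokenizer(tokenizer) -> bool:
    # The sentencepiece/vocab file must both be set on the instance and exist on disk.
    vocab_file = getattr(tokenizer, "vocab_file", None)
    return bool(vocab_file) and os.path.isfile(vocab_file)
```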
* stash for now
* initial commit
* small updates
* up
* up
* works!
* nits and fixes
* don't loop too much
* finish working example
* update
* fix the small freeblocks issue
* feat: stream inputs to continuous batch
* fix: update attn from `eager` to `sdpa`
* refactor: fmt
* refactor: cleanup unnecessary code
* feat: add `update` fn to `PagedAttentionCache`
* feat: broken optimal block size computation
* fix: debugging invalid cache logic
* fix: attention mask
* refactor: use custom prompts for example
* feat: add streaming output
* fix: prefill split
refactor: add docstrings and remove unsound/redundant logic
fix: compute optimal blocks logic
* fix: send decoded tokens when `prefilling_split` -> `decoding`
* refactor: move logic to appropriate parent class
* fix: remove truncation as we split prefilling anyways
refactor: early return when we have enough selected requests
* feat: add paged attention forward
* push graphs
* add paged sdpa
* update
* better mps defaults
* feat: add progress bar for `generate_batch`
* feat: add opentelemetry metrics (TTFT + batch fill percentage)
* feat: add tracing
* Add cuda graphs (#38059)
* draft cudagraphs addition
* nits
* styling
* update
* fix
* kinda draft of what it should look like
* fixes
* lol
* not sure why inf everywhere
* can generate but output is shit
* some fixes
* we should have a single device synch
* broken outputs but it does run
* refactor
* updates
* updates with some fixes
* fix mask causality
* another commit that casts after
* add error
* simplify example
* update
* updates
* revert llama changes
* fix merge conflicts
* fix: tracing and metrics
* my updates
* update script default values
* fix block allocation issue
* fix prefill split attention mask
* no bugs
* add paged eager
* fix
* update
* style
* feat: add pytorch traces
* fix
* fix
* refactor: remove pytorch profiler data
* style
* nits
* cleanup
* draft test file
* fix
* fix
* fix paged and graphs
* small renamings
* cleanups and push
* refactor: move tracing and metrics logic to utils
* refactor: trace more blocks of code
* nits
* nits
* update
* to profile or not to profile
* refactor: create new output object
* causal by default
* cleanup but generations are still off for IDK what reason
* simplifications but not running still
* this does work.
* small quality of life updates
* nits
* update
* fix the scheduler
* fix warning
* lol
* fully fixed
* nits
* different generation parameters
* nice
* just style
* feat: add cache memory usage
* feat: add kv cache free memory
* feat: add active/waiting count & req latency
* do the sampling
* fix: synchronize CUDA only if available and improve error handling in ContinuousBatchingManager (see the sketch at the end of this entry)
* fix on mps
* feat: add dashboard & histogram buckets
* perf: improve waiting reqs data structures
* attempt to compile, but we should only do it on mps AFAIK
* feat: decouple scheduling logic
* just a draft
* cleanup and fixup
* optional
* style
* update
* update
* remove the draft documentation
* fix import as well
* update
* fix the test
* style doomed
---------
Co-authored-by: Luc Georges <luc.sydney.georges@gmail.com>
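A minimal sketch of the device guard from the "synchronize CUDA only if available" commit above; the surrounding ContinuousBatchingManager code is omitted and the error wrapping is illustrative:

```python
import torch

def synchronize_if_needed(device: torch.device) -> None:
    try:
        if device.type == "cuda" and torch.cuda.is_available():
            # Only wait on CUDA work when CUDA is actually present.
            torch.cuda.synchronize(device)
        elif device.type == "mps" and torch.backends.mps.is_available():
            torch.mps.synchronize()
        # CPU (and absent backends) need no synchronization.
    except RuntimeError as err:
        # Surface a clearer error instead of letting the manager thread die silently.
        raise RuntimeError(f"Device synchronization failed on {device}: {err}") from err
```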
* starting attn refactor for encoder-decoder models via bart (eager + sdpa); see the sketch at the end of this entry
* flash attention works, remove unnecessary code
* flex attention support for bart! gotta check whether the renaming is too aggressive
* some comments
* skip flex grad test for standalone as done with the other test
* revert flex attn rename (for now), sdpa simplify, and todos
* more todos
* refactor mask creation for reuse
* modular attempt at biogpt
* first batch of other models
* fix attn dropout
* fix autoformer copies
* hubert
* another batch of models
* copies/style + last round of bart models --> whisper next?
* remove unnecessary _reshape function and remove copy to whisper
* add skip for decoder-only models out of enc-dec (same as in bart)
* bring back licences
* remove comment, added to pr read instead
* mostly docs
* disable sew flex attn for now, as its attention mask handling is unclear
* oops
* test fixes for enc-dec
* torch fx fixes + try at flex attn
* skip on mbart
* some more fixes
* musicgen skip / delete old attn class logic + sdpa compose compile skip
* disable flex attn for musicgen, not worth the effort
* more fixes and style
* flex attention test for dropout and encoder-decoder models that don't have main input names
* informer fixes
* the weirdest thing I've encountered yet...
* style
* remove empty tensor attempt, found the root cause in previous commits
* disable time series models since their tests are very text-centric on inputs
* make speech-to-text ignore the other attention implementations as well, also due to tests
* update docs
* remaining issues resolved?
* update docs for current state --> nllb moe and pegasus x sdpa are questionable :D
* some models have not set the is_causal flag...
* change dtype in softmax to old behaviour + some modular fixes
* I hate it but it is what it is
* fixes from main for bart
* forgot this one
* some model fixes
* style
* current status
* marian works now
* fixing some copies
* some copy fixes + time series x informer
* last models possibly and fixes on style/copies
* some post merge fixes
* more fixes
* make attention interface callable and move warnings there
* style lol
* add comment to "unsupported"
* remove callable interface and change interface warnings + some copies
* fix
* ternary is ugly af, make it simpler
* how did that happen
* fix flex attn test
* failing the test
* no more fallback! fixing copies next
* style + attn fixed
* fixing copies and mask creation
* wrong copy
* fixup tests and disable flex attn for now
* fixup last tests?
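For context on what the refactor enables, a small usage sketch; the checkpoint and the set of supported backend strings are assumptions and vary by model and version:

```python
from transformers import AutoModelForSeq2SeqLM

# Once a BART-family model routes through the shared attention interface, the backend
# is selected via attn_implementation ("eager" and "sdpa" here; flash/flex attention
# depend on installed extras and per-model support).
model = AutoModelForSeq2SeqLM.from_pretrained(
    "facebook/bart-base",
    attn_implementation="sdpa",
)
```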
* docs(swin): Update Swin model card to standard format
* docs(swin): Refine link to Microsoft organization for Swin models
Apply suggestion from @stevhliu in PR #37628.
This change updates the link pointing to the official Microsoft Swin Transformer checkpoints on the Hugging Face Hub.
The link now directs users specifically to the Microsoft organization page, filtered for Swin models, providing a clearer and more canonical reference compared to the previous general search link.
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* docs(swin): Clarify padding description and link to backbone docs
Apply suggestion from @stevhliu in PR #37628.
This change introduces two improvements to the Swin model card:
1. Refines the wording describing how Swin handles input padding for better clarity.
2. Adds an internal documentation link to the general "backbones" page when discussing Swin's capability as a backbone model.
These updates enhance readability and improve navigation within the Transformers documentation.
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* docs(swin): Change Swin paper link to huggingface.co/papers as suggested
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* update model card.
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* update quantization example.
* update example.
* update
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* assign the correct data layout for xpu
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* check torch version before using torchao on xpu (see the sketch at the end of this entry)
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix the log
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix zero point type
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix check torch version
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
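A hedged sketch of the version guard described above; the minimum version and the fallback behavior are assumptions, and the real check sits in the torchao quantization integration:

```python
import torch
from packaging import version

MIN_TORCH_FOR_XPU_TORCHAO = version.parse("2.6.0")  # assumed threshold, adjust to the real requirement

def xpu_torchao_supported() -> bool:
    has_xpu = hasattr(torch, "xpu") and torch.xpu.is_available()
    new_enough = version.parse(torch.__version__.split("+")[0]) >= MIN_TORCH_FOR_XPU_TORCHAO
    return has_xpu and new_enough

if not xpu_torchao_supported():
    # Fall back to the default (CPU/CUDA) data layout instead of the XPU-specific one.
    print("torchao on XPU needs a newer torch; using the default data layout instead.")
```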