transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-04 13:20:12 +06:00

Author	SHA1	Message	Date
Joao Gante	beaed8ce01	[generate] move `SinkCache` to a `custom_generate` repo (#38399 ) remove sink cache	2025-06-02 12:13:30 +02:00
Fanli Lin	51d732709e	[docs] add xpu environment variable for gpu selection (#38194 ) Some checks failed Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Has been cancelled Details Build documentation / build (push) Has been cancelled Details Slow tests on important models (on Push - A10) / Get all modified files (push) Has been cancelled Details Self-hosted runner (push-caller) / Check if setup was changed (push) Has been cancelled Details Secret Leaks / trufflehog (push) Has been cancelled Details Update Transformers metadata / build_and_package (push) Has been cancelled Details Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Has been cancelled Details Self-hosted runner (push-caller) / build-docker-containers (push) Has been cancelled Details Self-hosted runner (push-caller) / Trigger Push CI (push) Has been cancelled Details * squash commits * rename gpu * rename accelerator * change _toctree.yml * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: sdp <sdp@a4bf01943ff7.jf.intel.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-05-30 16:05:07 +00:00
Yuanzhou Cai	942c60956f	Model card for mobilenet v1 and v2 (#37948 ) * doc: #36979 * doc: update hfoptions * add model checkpoints links * add model checkpoints links * update example output * update style #36979 * add pipeline tags * improve comments * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * apply suggested changes * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-05-28 09:20:19 -07:00
Jiwook Han	9a8510572b	Updated the model card for ViTMAE (#38302 ) * Update vit_mae.md * badge float:right * Update docs/source/en/model_doc/vit_mae.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/vit_mae.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/vit_mae.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/vit_mae.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/vit_mae.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/vit_mae.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/vit_mae.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/vit_mae.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/vit_mae.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update model_doc/vit_mae.md * fix --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-05-28 09:19:43 -07:00
Vanshu	c9fcbd5bf9	Updated the Model docs - for the ALIGN model (#38072 ) * Updated the Model docs - for the ALIGN model * Update docs/source/en/model_doc/align.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/align.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Updated align.md * Update docs/source/en/model_doc/align.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/align.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update align.md * fix --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-05-28 09:19:09 -07:00
Andy Vu	3b3ebcec40	Updated model card for OLMo2 (#38394 ) Some checks are pending Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run Details Build documentation / build (push) Waiting to run Details Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run Details Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions Details Secret Leaks / trufflehog (push) Waiting to run Details Update Transformers metadata / build_and_package (push) Waiting to run Details * Updated OLMo2 model card * added command line * Add suggestions Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Added suggestions Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Indented code block as per suggestions --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-05-27 16:24:36 -07:00
Tanuj Rai	a092f6babf	Update granite.md (#37791 ) * Update granite.md * Update docs/source/en/model_doc/granite.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/granite.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/granite.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * update granite.md * Update docs/source/en/model_doc/granite.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/granite.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/granite.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/granite.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/granite.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/granite.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * minor fixes --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-05-27 12:55:15 -07:00
RogerSinghChugh	be7aa3210b	New bart model card (#37858 ) * Modified BART documentation wrt to issue #36979. * Modified BART documentation wrt to issue #36979. * fixed a typo. * Update docs/source/en/model_doc/bart.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/bart.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/bart.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/bart.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/bart.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/bart.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * blank commit. --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-05-27 11:51:41 -07:00
RogerSinghChugh	587c1b0ed1	Updated BERTweet model card. (#37981 ) * Updated BERTweet model card. * Update docs/source/en/model_doc/bertweet.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/bertweet.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/bertweet.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/bertweet.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/bertweet.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/bertweet.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/bertweet.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * updated toctree (EN). * Updated BERTweet model card. * Update docs/source/en/model_doc/bertweet.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/bertweet.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/bertweet.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/bertweet.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/bertweet.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/bertweet.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/bertweet.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * updated toctree (EN). * Updated BERTweet model card. * Update docs/source/en/model_doc/bertweet.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/bertweet.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/bertweet.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/bertweet.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/bertweet.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/bertweet.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/bertweet.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * updated toctree (EN). --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-05-27 11:51:22 -07:00
RogerSinghChugh	b73faef52f	Updated BigBird Model card as per #36979 . (#37959 ) * Updated BigBird Model card as per #36979. * Update docs/source/en/model_doc/big_bird.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/big_bird.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/big_bird.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/big_bird.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-05-27 11:24:28 -07:00
Madhav Kumar	538e847c06	Updated Zoedepth model card (#37898 ) * Edited zoedepth model card according to specifications. * Edited Zoedepth model file * made suggested changes.	2025-05-27 10:06:53 -07:00
Parag Ekbote	4f7b0ff8d1	Update Model Card for Mamba-2 (#37951 ) * update model page. * update model page. * Update docs/source/en/model_doc/mamba2.md Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com> * update the model page. * update. * Apply suggestions from code review Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com> * Apply the suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * add an quantization example and update the toctree. * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * remove the additional comma --------- Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-05-27 10:06:39 -07:00
eustlb	b9f8f863d9	[CSM] update model id (#38211 ) * update model id * codec_model eval * add processor img * use ungated repo for processor tests	2025-05-27 17:03:55 +02:00
eustlb	3142bd8592	[CSM] infer codec model with no_grad + audio eos label (#38215 ) * infer codec model with no_grad * codec_model eval * training labels: add audio eos token	2025-05-27 14:10:17 +00:00
Ragnar	63964b7c67	fix typos (#38336 ) * Update video_processor.md * Update deepseek_v3.md	2025-05-26 14:42:37 +00:00
Manuel de Prada Corral	78079abeff	Improved cache docs (#38060 ) * improved cache docs Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-05-26 13:53:41 +00:00
Kseniya Parkhamchuk	31f8a0fe8a	[docs]: update roformer.md model card (#37946 ) Some checks are pending Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run Details Build documentation / build (push) Waiting to run Details Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run Details Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Check if setup was changed (push) Waiting to run Details Self-hosted runner (push-caller) / build-docker-containers (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Trigger Push CI (push) Blocked by required conditions Details Secret Leaks / trufflehog (push) Waiting to run Details Update Transformers metadata / build_and_package (push) Waiting to run Details * Update roformer model card * fix example purpose description * fix model description according to the comments * revert changes for autodoc * remove unneeded tags * fix review issues * fix hfoption --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-05-23 16:27:56 -07:00
Bryan C.	36f97ae15b	docs(swinv2): Update SwinV2 model card to new standard format (#37942 ) * docs(swinv2): Update SwinV2 model card to new standard format * docs(swinv2): Apply review suggestions Incorporates feedback from @stevhliu to: - Enhance the introductory paragraph with more details about scaling and SimMIM. - Generalize the tip from "image classification tasks" to "vision tasks". Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-05-23 13:04:13 -07:00
Aguedo	33d23c39ed	Update BioGPT model card (#38214 ) * Update BioGPT model card * Update docs/source/en/model_doc/biogpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/biogpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/biogpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/biogpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/biogpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/biogpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/biogpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/biogpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/biogpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/biogpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/biogpt.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * correction for CPU fallback * added quantization code and method * fixed transformers-cli call --------- Co-authored-by: Aguedo <aguedo@fakeemail.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-05-23 13:03:47 -07:00
Cyril Vallez	e0aad278fe	Never fallback to eager implicitly (#38327 ) Some checks failed Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run Details Build documentation / build (push) Waiting to run Details Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run Details Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Check if setup was changed (push) Waiting to run Details Self-hosted runner (push-caller) / build-docker-containers (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Trigger Push CI (push) Blocked by required conditions Details Secret Leaks / trufflehog (push) Waiting to run Details Update Transformers metadata / build_and_package (push) Waiting to run Details New model PR merged notification / Notify new model (push) Has been cancelled Details * remove arg everywhere * Update warnings * add more models * Update sdpa_attention.py * fix style * fix * readd warnings but not for flex * Update test_modeling_common.py * skip * fix --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2025-05-23 19:48:01 +02:00
Aaron V	d5f992f5e6	Enhance Model Loading By Providing Parallelism, Uses Optional Env Flag (#36835 ) * Get parallel loader working. Include tests. * Update the tests for parallel loading * Rename env variables. * Add docs for parallel model weight loading. * Touch up parallel model loading docs. * Touch up parallel model loading docs again. * Edit comment in test_modeling_utils_parallel_loading.py * Make sure HF_PARALLEL_LOADING_WORKERS is spelled correctly in modeling_utils.py * Correct times for parallelized loading, previous times were for a "hot" filesystem * Update parallel model loading so the spawn method is encapsulated. DRY up the code by leveraging get_submodule. * Update docs on model loading parallelism so that details on setting the multiprocessing start method are removed, now that the package handles this step internally. * Fix style on model loading parallelism changes. * Merge latest version of master's modeling_utils. * Removed unused variable. * Fix argument packing for the parallel loader. * Fix state dict being undefined in the parallel model loader. * Rename variables used in parallel model loading for clarity. Use get_module_from_name(). * Switch to the use of threads for parallel model loading. * Update docs for parallel loading. * Remove the use of json.loads when evaluating HF_ENABLE_PARALLEL_LOADING. Prefer simple casting. * Move parallelized shard loading into its own function. * Remove use of is_true(). Favor checking env var true values for HF_ENABLE_PARALLEL_LOADING. * Update copyright to 2025 in readme for paralell model loading. * Remove garbage collection line in load_shard_file, implicit garbage collection already occurs. * Run formatter on modeling_utils.py * Apply style fixes * Delete tests/utils/test_modeling_utils_parallel_loading.py --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>	2025-05-23 16:39:47 +00:00
Arthur	f5d45d89c4	🚨Early-error🚨 config will error out if `output_attentions=True` and the attn implementation is wrong (#38288 ) * Protect ParallelInterface * early error out on output attention setting for no wraning in modeling * modular update * fixup * update model tests * update * oups * set model's config * more cases * ?? * properly fix * fixup * update * last onces * update * fix? * fix wrong merge commit * fix hub test * nits * wow I am tired * updates * fix pipeline! --------- Co-authored-by: Lysandre <hi@lysand.re>	2025-05-23 17:17:38 +02:00
Joao Gante	54cd86708d	[custom_generate] don't forward `custom_generate` and `trust_remote_code` (#38304 ) * prevent infinite loops * docs * more links to custom generation methods	2025-05-23 14:49:39 +00:00
Jinan Zhou	135163e9c5	Expose AutoModelForTimeSeriesPrediction for import (#38307 ) * expose AutoModelForTimeSeriesPrediction for import * add in docs	2025-05-23 13:09:29 +00:00
Ryan Mullins	9eb0a37c9e	Adds use_repr to model_addition_debugger_context (#37984 ) * Adds use_repr to model_addition_debugger_context * Updating docs for use_repr option	2025-05-23 09:35:13 +00:00
Anton Vlasjuk	d95c864a25	🔴🔴🔴 [`Attention`] Refactor Attention Interface for Bart-based Models (#38108 ) * starting attn refactor for encoder decoder models via bart (eager + sdpa) * flash attention works, remove unnecessary code * flex attention support for bart!, gotta check if the renaming is not too aggressive * some comments * skip flex grad test for standalone as done with the other test * revert flex attn rename (for now), sdpa simplify, and todos * more todos * refactor mask creation for reuse * modular attempt at biogpt * first batch of other models * fix attn dropout * fix autoformer copies * hubert * another batch of models * copies/style + last round of bart models --> whisper next? * remove unnecessary _reshape function and remove copy to whisper * add skip for decoder-only models out of enc-dec (same as in bart) * bring back licences * remove comment, added to pr read instead * mostly docs * disable sew flex attn as it's unclear attn mask for now * oops * test fixes for enc-dec * torch fx fixes + try at flex attn * skip on mbart * some more fixes * musicgen skip / delete old attn class logic + sdpa compose compile skip * disable flex attn for musicgen, not worth the effort * more fixes and style * flex attention test for dropout and encoder decoder that dont have main input names * informer fixes * the weirdest thing I've encountered yet... * style * remove empty tensor attempt, found core root in previous commits * disable time series due to tests being very text centric on inputs * add speech to text to be ignoring the other attns, also due to tests * update docs * remaining issues resolved ? * update docs for current state --> nllb moe and pegasus x sdpa is questionable :D * some models have not set the is_causal flag... * change dtype in softmax tol old behaviour + some modular fixes * I hate it but it is what it is * fixes from main for bart * forgot this one * some model fixes * style * current status * marian works now * fixing some copies * some copy fixes + time series x informer * last models possibly and fixes on style/copies * some post merge fixes * more fixes * make attention interface callable and move warnings there * style lol * add comment to "unsupported" * remove callable interface and change interface warnings + some copies * fix * ternary is ugly af, make it simpler * how did that happen * fix flex attn test * failing the test * no more fallback! fixing copies next * style + attn fixed * fixing copies and mask creation * wrong copy * fixup tests and disable flex attn for now * fixup last tests?	2025-05-22 17:12:58 +02:00
Cyril Vallez	163138a911	🚨🚨[core] Completely rewrite the masking logic for all attentions (#37866 ) * start * start having a clean 4d mask primitive * Update mask_utils.py * Update mask_utils.py * switch name * Update masking_utils.py * add a new AttentionMask tensor class * fix import * nits * fixes * use full and quandrants * general sdpa mask for all caches * style * start some tests * tests with sliding, chunked * add styling * test hybrid * Update masking_utils.py * small temp fixes * Update modeling_gemma2.py * compile compatible * Update masking_utils.py * improve * start making it more general * Update masking_utils.py * generate * make it work with flex style primitives! * Update masking_utils.py * Update masking_utils.py * Update masking_utils.py * improve * Update cache_utils.py * Update masking_utils.py * simplify - starting to look good! * Update masking_utils.py * name * Update masking_utils.py * style * Update masking_utils.py * Update masking_utils.py * Update masking_utils.py * Update masking_utils.py * small fix for flex * flex compile * FA2 * Update masking_utils.py * Escape for TGI/vLLM! * Update masking_utils.py * Update masking_utils.py * Update masking_utils.py * General case without cache * rename * full test on llama4 * small fix for FA2 guard with chunk * Update modeling_gemma2.py * post rebase cleanup * FA2 supports static cache! * Update modeling_flash_attention_utils.py * Update flex_attention.py * Update masking_utils.py * Update masking_utils.py * Update utils.py * override for export * Update executorch.py * Update executorch.py * Update executorch.py * Update executorch.py * Update masking_utils.py * Update masking_utils.py * output attentions * style * Update masking_utils.py * Update executorch.py * Add doicstring * Add license and put mask visualizer at the end * Update test_modeling_common.py * fix broken test * Update test_modeling_gemma.py * Update test_modeling_gemma2.py * Use fullgraph=False with FA2 * Update utils.py * change name * Update masking_utils.py * improve doc * change name * Update modeling_attn_mask_utils.py * more explicit logic based on model's property * pattern in config * extend * fixes * make it better * generalize to other test models * fix * Update masking_utils.py * fix * do not check mask equivalence if layer types are different * executorch * Update modeling_gemma2.py * Update masking_utils.py * use layer_idx instead * adjust * Update masking_utils.py * test * fix imports * Update modeling_gemma2.py * other test models * Update modeling_llama4.py * Update masking_utils.py * improve * simplify * Update masking_utils.py * typos * typo * fix * Update masking_utils.py * default DynamicCache * remove default cache * simplify * Update masking_utils.py * Update masking_utils.py * Update masking_utils.py * Update masking_utils.py * simplify * Update masking_utils.py * Update masking_utils.py * Update masking_utils.py * export * Update executorch.py * Update executorch.py * Update flex_attention.py * Update executorch.py * upstream to modular gemma 1 & 2 * Update modular_mistral.py * switch names * use dict * put it in the Layer directly * update copy model source for mask functions * apply so many modular (hopefully 1 shot) * use explicite dicts for make style happy * protect import * check docstring * better default in hybrid caches * qwens * Update modular_qwen2.py * simplify core logic! * Update executorch.py * qwen3 moe * Update masking_utils.py * Update masking_utils.py * simplify a lot sdpa causal skip * Update masking_utils.py * post-rebase * gemma3 finally * style * check it before * gemma3 * More general with newer torch * align gemma3 * Update utils.py * Update utils.py * Update masking_utils.py * Update test_modeling_common.py * Update flex_attention.py * Update flex_attention.py * Update flex_attention.py * test * executorch * Update test_modeling_common.py * Update masking_utils.py * Update masking_utils.py * Update masking_utils.py * Update masking_utils.py * Update executorch.py * Update test_modeling_common.py * fix copies * device * sdpa can be used without mask -> pass the torchscript tests in this case * Use enum for check * revert enum and add check instead * remove broken test * cohere2 * some doc & reorganize the Interface * Update tensor_parallel.py * Update tensor_parallel.py * doc and dummy * Update test_modeling_paligemma2.py * Update modeling_falcon_h1.py * Update masking_utils.py * executorch patch * style * CIs * use register in executorch * final comments! --------- Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>	2025-05-22 11:38:26 +02:00
Joao Gante	f8630c778c	[Whisper] handle deprecation of `forced_decoder_ids` (#38232 ) * fix * working saved forced_decoder_ids * docstring * add deprecation message * exception message ordering * circular import comment	2025-05-22 09:16:38 +00:00
Bryan C.	b369a65480	docs(swin): Update Swin model card to standard format (#37628 ) Some checks are pending Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run Details Build documentation / build (push) Waiting to run Details Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run Details Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions Details Secret Leaks / trufflehog (push) Waiting to run Details Update Transformers metadata / build_and_package (push) Waiting to run Details * docs(swin): Update Swin model card to standard format * docs(swin): Refine link to Microsoft organization for Swin models Apply suggestion from @stevhliu in PR #37628. This change updates the link pointing to the official Microsoft Swin Transformer checkpoints on the Hugging Face Hub. The link now directs users specifically to the Microsoft organization page, filtered for Swin models, providing a clearer and more canonical reference compared to the previous general search link. Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * docs(swin): Clarify padding description and link to backbone docs Apply suggestion from @stevhliu in PR #37628. This change introduces two improvements to the Swin model card: 1. Refines the wording describing how Swin handles input padding for better clarity. 2. Adds an internal documentation link to the general "backbones" page when discussing Swin's capability as a backbone model. These updates enhance readability and improve navigation within the Transformers documentation. Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * docs(swin): Change Swin paper link to huggingface.co/papers as suggested Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-05-21 16:16:43 -07:00
Parag Ekbote	28d3148b07	Update Model Card for Mamba (#37863 ) Some checks are pending Self-hosted runner (benchmark) / Benchmark (aws-g5-4xlarge-cache) (push) Waiting to run Details Build documentation / build (push) Waiting to run Details New model PR merged notification / Notify new model (push) Waiting to run Details Slow tests on important models (on Push - A10) / Get all modified files (push) Waiting to run Details Slow tests on important models (on Push - A10) / Slow & FA2 tests (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Check if setup was changed (push) Waiting to run Details Self-hosted runner (push-caller) / build-docker-containers (push) Blocked by required conditions Details Self-hosted runner (push-caller) / Trigger Push CI (push) Blocked by required conditions Details Secret Leaks / trufflehog (push) Waiting to run Details Update Transformers metadata / build_and_package (push) Waiting to run Details * update model card. * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * update quantization example. * update example. * update --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-05-21 10:58:23 -07:00
ritsumei-aoi	5c13cc0f94	Remove Japanese sequence_classification doc and update references (#38246 )	2025-05-21 08:33:41 -07:00
youngrok cha	101b3fa4ea	fix multi-image case for llava-onevision (#38084 ) * _get_padding_size module * do not patchify images when processing multi image * modify llava onevision image processor fast * tensor to list of tensors * backward compat * reuse pad_to_square in llave & some clarification * add to doc * fix: consider no image cases (text only or video) * add integration test * style & repo_consistency	2025-05-21 11:50:46 +02:00
Younes Belkada	6829936ee0	[MODEL] Add Falcon H1 (#38249 ) * Create push-important-models.yml * feat: add falcon-h1 * fixup * address comment * fix * fix copies * fix copies * fix * fix * fix * fix * fix copies * fix * fix copies * fix test import to at least trigget the cis * yups * update * fix make fix copies * fix inits? * fix style * skip annoying test * add integration test for Falcon H1 * fix copies * fix --------- Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com> Co-authored-by: dhia.rhaiem <dhia.rhaiem@tii.ae>	2025-05-21 10:43:11 +02:00
Garrett Goon	390f153469	Add padding-free to bamba (#35861 ) * add seq_idx and fa kwargs * update tests * docs and grad ckpt support * fmt * better names * test_raise_missing_padding_free_kwarg_errs * + seq_idx in doc strings * padding free training docs * add link to pr plots * raise err on attn_mask with padding free * rm raising missing padding free err test * BambaFlashAttentionKwargs * run modular util for modular_granitemoehybrid.py	2025-05-20 17:13:59 +02:00
Matej Sirovatka	7a611f0afd	Fix: make docs work better with doc builder (#38213 )	2025-05-20 08:23:03 +00:00
Fanli Lin	9ecee14378	[doc] fix bugs in `how_to_hack_models.md` (#38198 ) fix several bugs	2025-05-19 10:37:54 -07:00
Nanji Huaji	f524439cc5	Translating model_doc/bert.md to Chinese (#37806 ) * Translated model_doc/bert.md * Revise grammatical errors * Changed _toctree.yml * Revise some errors	2025-05-19 10:14:57 -07:00
Matej Sirovatka	6e738411e1	Tensor parallel docs (#38178 ) * Feat: initial docs * Feat: update doc * Final typos/changes * Refactor: reorder top to bottom.	2025-05-19 17:05:01 +00:00
NielsRogge	7c9b0ca08c	[SAM-HQ] Update names in the docs (#38058 ) Update names	2025-05-19 09:21:14 -07:00
Fanli Lin	9644acb7cb	[docs] add Audio import (#38195 ) add Audio import	2025-05-19 13:16:35 +00:00
Fanli Lin	7d93f93f83	[docs] minor fixes in `models.md` (#38193 ) minor gix	2025-05-19 13:14:21 +00:00
Joao Gante	0e0e5c1044	[generate] Run custom generation code from the Hub (#36405 ) * mvp * remove trust_remote_code * generate_from_hub * handle requirements; docs * english * doc PR suggestions * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * changed remote code path to generate/generate.py * model repo has custom generate -> override base generate * check for proper inheritance * some doc updates (missing: tag-related docs) * update docs to model repo * nit * nit * nits * Update src/transformers/dynamic_module_utils.py * Apply suggestions from code review * Update docs/source/en/generation_strategies.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * trust remote code is required * use new import utils for requirements version parsing * use org examples * add tests * Apply suggestions from code review Co-authored-by: Manuel de Prada Corral <6536835+manueldeprada@users.noreply.github.com> * ascii file structure; tag instructions on readme.md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: Manuel de Prada Corral <6536835+manueldeprada@users.noreply.github.com>	2025-05-15 10:35:54 +01:00
Raushan Turganbay	955e61b0da	Remove head mask in generative models (#35786 ) * just squash into one commit * delete print	2025-05-15 10:44:19 +02:00
guspuffygit	4a2decd192	Update trainer.md (#38113 ) Fix typo in torch.compile method parameters	2025-05-14 12:40:00 +00:00
Jinyong Lee	342961f669	Add Fast Image Processor for vilt (#37304 ) * init vilt image processor fast * Refactor image processor tests to use loop for all processors * Add ViltImageProcessorFast with PyTorch-based optimized image processing * Change made automatically by make fixup command * Change made automatically by make fix-copies command * Fix type hints in ViltImageProcessorFast for Python compatibility * Define constants for image resizing based on COCO dataset aspect ratio * Add missing property initializations to ViltImageProcessorFast * Extract resize logic into dedicated method in ViltImageProcessorFast * Extract padding logic into dedicated method * Implement shape-based image grouping for optimized processing in Vilt * Update test suite to verify ViltImageProcessorFast attributes * Move variable declarations to _preprocess method parameters * Remove unused parameters * Rename _resize method to resize to override existing function * Remove whitespace * Remove unnecessary type check and conversion for stacked_images * Remove redundant loop and apply padding directly to stacked images * Refactor pad function to return images and mask as tuple instead of dict * Add tests comparing padding masks in slow and fast implementations * Update ViltImageProcessor tests to ensure compatibility between slow and fast implementations * Replace add_start_docstrings with auto_docstring in ViltImageProcessorFast * Move docstrings of custom args to ViltFastImageProcessorKwargs * Use reorder_images function for both masks and images --------- Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>	2025-05-13 15:40:53 +00:00
谭九鼎	5c85018072	docs: fix md style (#38057 )	2025-05-12 15:56:31 +01:00
Joao Gante	8efe3a9d77	[`chat`] generate parameterization powered by `GenerationConfig` and UX-related changes (#38047 ) * accept arbitrary kwargs * move user commands to a separate fn * work with generation config files * rm cmmt * docs * base generate flag doc section * nits * nits * nits * no <br> * better basic args description	2025-05-12 14:04:41 +01:00
Raushan Turganbay	a31fa218ad	🔴 Video processors as a separate class (#35206 ) * initial design * update all video processors * add tests * need to add qwen2-vl (not tested yet) * add qwen2-vl in auto map * fix copies * isort * resolve confilicts kinda * nit: * qwen2-vl is happy now * qwen2-5 happy * other models are happy * fix copies * fix tests * add docs * CI green now? * add more tests * even more changes + tests * doc builder fail * nit * Update src/transformers/models/auto/processing_auto.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * small update * imports correctly * dump, otherwise this is getting unmanagebale T-T * dump * update * another update * update * tests * move * modular * docs * test * another update * init * remove flakiness in tests * fixup * clean up and remove commented lines * docs * skip this one! * last fix after rebasing * run fixup * delete slow files * remove unnecessary tests + clean up a bit * small fixes * fix tests * more updates * docs * fix tests * update * style * fix qwen2-5-vl * fixup * fixup * unflatten batch when preparing * dump, come back soon * add docs and fix some tests * how to guard this with new dummies? * chat templates in qwen * address some comments * remove `Fast` suffix * fixup * oops should be imported from transforms * typo in requires dummies * new model added with video support * fixup once more * last fixup I hope * revert image processor name + comments * oh, this is why fetch test is failing * fix tests * fix more tests * fixup * add new models: internvl, smolvlm * update docs * imprt once * fix failing tests * do we need to guard it here again, why? * new model was added, update it * remove testcase from tester * fix tests * make style * not related CI fail, lets' just fix here * mark flaky for now, filas 15 out of 100 * style * maybe we can do this way? * don't download images in setup class --------- Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>	2025-05-12 11:55:51 +02:00
Mikhail Moskovchenko	7f1a97bae3	Fix reduce-labels in BEIT Fast Image Processor (#38042 ) * Fixed reduce-labels * Little doc fix * Change docstring	2025-05-09 11:51:46 -04:00
Lysandre Debut	23d79cea75	Support for version spec in requires & arbitrary mismatching depths across folders (#37854 ) * Support for version spec in requires & arbitrary mismatching depths * Quality * Testing	2025-05-09 15:26:27 +02:00

1 2 3 4 5 ...

3289 Commits