Anton Vlasjuk
d95c864a25
🔴 🔴 🔴 [Attention] Refactor Attention Interface for Bart-based Models ( #38108 )
* starting attn refactor for encoder decoder models via bart (eager + sdpa)
* flash attention works, remove unnecessary code
* flex attention support for bart! gotta check that the renaming is not too aggressive
* some comments
* skip flex grad test for standalone as done with the other test
* revert flex attn rename (for now), sdpa simplify, and todos
* more todos
* refactor mask creation for reuse
* modular attempt at biogpt
* first batch of other models
* fix attn dropout
* fix autoformer copies
* hubert
* another batch of models
* copies/style + last round of bart models --> whisper next?
* remove unnecessary _reshape function and remove copy to whisper
* add skip for decoder-only models out of enc-dec (same as in bart)
* bring back licences
* remove comment, added to pr read instead
* mostly docs
* disable sew flex attn as its attn mask handling is unclear for now
* oops
* test fixes for enc-dec
* torch fx fixes + try at flex attn
* skip on mbart
* some more fixes
* musicgen skip / delete old attn class logic + sdpa compose compile skip
* disable flex attn for musicgen, not worth the effort
* more fixes and style
* flex attention test for dropout and encoder-decoder models that don't have main input names
* informer fixes
* the weirdest thing I've encountered yet...
* style
* remove empty tensor attempt, found core root in previous commits
* disable time series due to tests being very text centric on inputs
* add speech to text to be ignoring the other attns, also due to tests
* update docs
* remaining issues resolved?
* update docs for current state --> nllb moe and pegasus x sdpa is questionable :D
* some models have not set the is_causal flag...
* change dtype in softmax to match old behaviour + some modular fixes
* I hate it but it is what it is
* fixes from main for bart
* forgot this one
* some model fixes
* style
* current status
* marian works now
* fixing some copies
* some copy fixes + time series x informer
* last models possibly and fixes on style/copies
* some post merge fixes
* more fixes
* make attention interface callable and move warnings there
* style lol
* add comment to "unsupported"
* remove callable interface and change interface warnings + some copies
* fix
* ternary is ugly af, make it simpler
* how did that happen
* fix flex attn test
* failing the test
* no more fallback! fixing copies next
* style + attn fixed
* fixing copies and mask creation
* wrong copy
* fixup tests and disable flex attn for now
* fixup last tests?
2025-05-22 17:12:58 +02:00
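For context, the refactor above moves Bart-family models onto the shared attention-interface pattern: one plain eager implementation plus a dict lookup keyed by the configured backend. A minimal self-contained sketch of that dispatch idea — the names here mirror the pattern, not the library's exact internals:

```python
import torch
import torch.nn.functional as F

def eager_attention_forward(query, key, value, attention_mask=None, scaling=None, dropout=0.0):
    # Plain softmax(QK^T * scaling)V attention with an optional additive mask.
    if scaling is None:
        scaling = query.size(-1) ** -0.5
    attn_weights = torch.matmul(query, key.transpose(-2, -1)) * scaling
    if attention_mask is not None:
        attn_weights = attn_weights + attention_mask
    attn_weights = F.softmax(attn_weights, dim=-1, dtype=torch.float32).to(query.dtype)
    attn_weights = F.dropout(attn_weights, p=dropout)
    return torch.matmul(attn_weights, value), attn_weights

def sdpa_attention_forward(query, key, value, attention_mask=None, scaling=None, dropout=0.0):
    # Fused kernel path; note it cannot return attention weights.
    out = F.scaled_dot_product_attention(
        query, key, value, attn_mask=attention_mask, dropout_p=dropout, scale=scaling
    )
    return out, None

ATTENTION_FUNCTIONS = {"eager": eager_attention_forward, "sdpa": sdpa_attention_forward}

q = k = v = torch.randn(1, 8, 16, 64)  # (batch, heads, seq_len, head_dim)
out, _ = ATTENTION_FUNCTIONS["sdpa"](q, k, v)
```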
Cyril Vallez
163138a911
🚨 🚨 [core] Completely rewrite the masking logic for all attentions ( #37866 )
* start
* start having a clean 4d mask primitive
* Update mask_utils.py
* Update mask_utils.py
* switch name
* Update masking_utils.py
* add a new AttentionMask tensor class
* fix import
* nits
* fixes
* use full and quadrants
* general sdpa mask for all caches
* style
* start some tests
* tests with sliding, chunked
* add styling
* test hybrid
* Update masking_utils.py
* small temp fixes
* Update modeling_gemma2.py
* compile compatible
* Update masking_utils.py
* improve
* start making it more general
* Update masking_utils.py
* generate
* make it work with flex style primitives!
* Update masking_utils.py
* Update masking_utils.py
* Update masking_utils.py
* improve
* Update cache_utils.py
* Update masking_utils.py
* simplify - starting to look good!
* Update masking_utils.py
* name
* Update masking_utils.py
* style
* Update masking_utils.py
* Update masking_utils.py
* Update masking_utils.py
* Update masking_utils.py
* small fix for flex
* flex compile
* FA2
* Update masking_utils.py
* Escape for TGI/vLLM!
* Update masking_utils.py
* Update masking_utils.py
* Update masking_utils.py
* General case without cache
* rename
* full test on llama4
* small fix for FA2 guard with chunk
* Update modeling_gemma2.py
* post rebase cleanup
* FA2 supports static cache!
* Update modeling_flash_attention_utils.py
* Update flex_attention.py
* Update masking_utils.py
* Update masking_utils.py
* Update utils.py
* override for export
* Update executorch.py
* Update executorch.py
* Update executorch.py
* Update executorch.py
* Update masking_utils.py
* Update masking_utils.py
* output attentions
* style
* Update masking_utils.py
* Update executorch.py
* Add docstring
* Add license and put mask visualizer at the end
* Update test_modeling_common.py
* fix broken test
* Update test_modeling_gemma.py
* Update test_modeling_gemma2.py
* Use fullgraph=False with FA2
* Update utils.py
* change name
* Update masking_utils.py
* improve doc
* change name
* Update modeling_attn_mask_utils.py
* more explicit logic based on model's property
* pattern in config
* extend
* fixes
* make it better
* generalize to other test models
* fix
* Update masking_utils.py
* fix
* do not check mask equivalence if layer types are different
* executorch
* Update modeling_gemma2.py
* Update masking_utils.py
* use layer_idx instead
* adjust
* Update masking_utils.py
* test
* fix imports
* Update modeling_gemma2.py
* other test models
* Update modeling_llama4.py
* Update masking_utils.py
* improve
* simplify
* Update masking_utils.py
* typos
* typo
* fix
* Update masking_utils.py
* default DynamicCache
* remove default cache
* simplify
* Update masking_utils.py
* Update masking_utils.py
* Update masking_utils.py
* Update masking_utils.py
* simplify
* Update masking_utils.py
* Update masking_utils.py
* Update masking_utils.py
* export
* Update executorch.py
* Update executorch.py
* Update flex_attention.py
* Update executorch.py
* upstream to modular gemma 1 & 2
* Update modular_mistral.py
* switch names
* use dict
* put it in the Layer directly
* update copy model source for mask functions
* apply so many modular (hopefully 1 shot)
* use explicit dicts to make style happy
* protect import
* check docstring
* better default in hybrid caches
* qwens
* Update modular_qwen2.py
* simplify core logic!
* Update executorch.py
* qwen3 moe
* Update masking_utils.py
* Update masking_utils.py
* simplify a lot sdpa causal skip
* Update masking_utils.py
* post-rebase
* gemma3 finally
* style
* check it before
* gemma3
* More general with newer torch
* align gemma3
* Update utils.py
* Update utils.py
* Update masking_utils.py
* Update test_modeling_common.py
* Update flex_attention.py
* Update flex_attention.py
* Update flex_attention.py
* test
* executorch
* Update test_modeling_common.py
* Update masking_utils.py
* Update masking_utils.py
* Update masking_utils.py
* Update masking_utils.py
* Update executorch.py
* Update test_modeling_common.py
* fix copies
* device
* sdpa can be used without mask -> pass the torchscript tests in this case
* Use enum for check
* revert enum and add check instead
* remove broken test
* cohere2
* some doc & reorganize the Interface
* Update tensor_parallel.py
* Update tensor_parallel.py
* doc and dummy
* Update test_modeling_paligemma2.py
* Update modeling_falcon_h1.py
* Update masking_utils.py
* executorch patch
* style
* CIs
* use register in executorch
* final comments!
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
2025-05-22 11:38:26 +02:00
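The rewrite above builds masks from small composable predicates over (q_idx, kv_idx), flex-attention style, and materializes a 4D additive mask only for backends that need one. A hedged sketch of that primitive — function names are illustrative, not the library's API:

```python
import torch

def causal(q_idx, kv_idx):
    return kv_idx <= q_idx

def sliding_window(window):
    def mask_fn(q_idx, kv_idx):
        return (kv_idx <= q_idx) & (q_idx - kv_idx < window)
    return mask_fn

def to_4d_additive(mask_fn, q_len, kv_len, dtype=torch.float32):
    # Materialize the boolean predicate as the (1, 1, q_len, kv_len) additive
    # mask that eager/SDPA attention consumes: 0 where allowed, -inf-ish elsewhere.
    q_idx = torch.arange(q_len).view(-1, 1)
    kv_idx = torch.arange(kv_len).view(1, -1)
    allowed = mask_fn(q_idx, kv_idx)
    mask = torch.zeros(q_len, kv_len, dtype=dtype)
    mask.masked_fill_(~allowed, torch.finfo(dtype).min)
    return mask[None, None]

print(to_4d_additive(sliding_window(3), q_len=5, kv_len=5))
```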
Raushan Turganbay
0a52bd2403
[fix] sliding window attention mask ( #38045 )
* fix sliding attn
* make style
* Update tests/test_modeling_common.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* on a second thought, should default to `True` for BC
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-05-20 09:32:19 +00:00
Yao Matrix
3bd1c20149
enable misc cases on XPU & use device agnostic APIs for cases in tests ( #38192 )
* use device agnostic APIs in tests
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* more
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* add reset_peak_memory_stats API
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* update
---------
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-20 10:09:01 +02:00
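The device-agnostic idea is to resolve the backend module from a device string instead of hardcoding torch.cuda. A sketch under that assumption — `backend_reset_peak_memory_stats` is a hypothetical stand-in for the testing helper this PR adds:

```python
import torch

def backend_reset_peak_memory_stats(device: str) -> None:
    # Resolve torch.cuda / torch.xpu / ... from the device string.
    backend = getattr(torch, device, None)
    if backend is not None and hasattr(backend, "reset_peak_memory_stats"):
        backend.reset_peak_memory_stats()

device = "xpu" if hasattr(torch, "xpu") and torch.xpu.is_available() else "cuda"
if getattr(torch, device).is_available():
    backend_reset_peak_memory_stats(device)
```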
Raushan Turganbay
01ad9f4b49
Bart: new cache format ( #35314 )
* bart compile
* add mbart
* some more models touched by fix-copies
* more
* more models
* even more models
* fix copies
* fix tests
* fix copies
* fix
* biogpt accepts position ids now (breaking?)
* fix failing non-slow tests
* fix some tests
* should not be removed
* small update
* Update src/transformers/models/bart/modeling_bart.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* update for last `main`
* fix copies
* clone `update_causal_mask` from llama
* tmp
* fixup
* why? how?
* fix bart tests
* dont skip test
* address comments
* fix tests
* fix
* fixup and delete the file
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-05-16 13:26:54 +02:00
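The "new cache format" replaces Bart's legacy past-key-value tuples with cache objects; for encoder-decoder models that means an EncoderDecoderCache pairing a self-attention cache with a cross-attention cache. A sketch, assuming the public cache classes:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from transformers.cache_utils import DynamicCache, EncoderDecoderCache

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")

# Decoder self-attention cache + cross-attention cache, paired.
past_key_values = EncoderDecoderCache(DynamicCache(), DynamicCache())
inputs = tokenizer("Hello world", return_tensors="pt")
out = model.generate(**inputs, past_key_values=past_key_values, max_new_tokens=8)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```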
Raushan Turganbay
955e61b0da
Remove head mask in generative models ( #35786 )
* just squash into one commit
* delete print
2025-05-15 10:44:19 +02:00
Raushan Turganbay
d23aae2b8c
[VLMs] support attention backends ( #37576 )
* update models
* why rename
* return attn weights when sdpa
* fixes
* fix attn implementation composite
* fix moshi
* add message
* add typings
* use explicitly all flags for each attn type
* fix some tests
* import what is needed
* kosmos on main has new attention already, yay
* new models in main, run fixup
* won't fix kosmos yet
* fix-copies
* clean up after rebasing
* fix tests
* style
* dont cast attns to fp32
* did we update ruff? oke, let's just do what it asks
* fix pixtral after rebase
2025-05-08 18:18:54 +02:00
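A hedged usage sketch of what this enables: composite VLMs take the standard backend switch, and, per this PR, per-submodule choices. The dict keys below are assumptions — they follow the model's sub-config names, which vary per model:

```python
from transformers import AutoModelForImageTextToText

# One backend for every submodule:
model = AutoModelForImageTextToText.from_pretrained(
    "llava-hf/llava-1.5-7b-hf", attn_implementation="sdpa"
)

# Or mix backends across submodules (keys assumed, check the model's config):
model = AutoModelForImageTextToText.from_pretrained(
    "llava-hf/llava-1.5-7b-hf",
    attn_implementation={"text_config": "sdpa", "vision_config": "eager"},
)
```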
Raushan Turganbay
17742bd9c8
🔴 [VLM] Add base model without head ( #37033 )
* i guess reverted all CdGen classes
* style
* llava onevision
* fix copies
* fix some tests
* some more tests
* dump
* skip these
* nevermind, i am dumb
* revert fix not needed
* fixup
* fixup
* another fixup
* more fixup to make ci finally happy
* fixup after rebasing
* fix qwen tests
* add internVL + typos here and there
* image token index -> id
* style
* fix init weights
* revert blip-2 not supported
* address comments
* fix copies
* revert blip2 test file as well
* as discussed internally, revert back CdGen models
* fix some tests
* fix more tests for compile
* CI red
* fix copies
* enumerate explicitly allowed models
* address comments
* fix tests
* fixup
* style again
* add tests for new model class
* another fixup ( x _ x )
* [fixup] unused attributes can be removed post-deprecation
2025-05-07 17:47:51 +02:00
eustlb
798f948e88
Add CSM model ( #36719 )
* draft structure
* depth decoder with forward pre hook
* full model forward draft
* draft update
* depth decoder update
* ConversationalSpeechModelForCausalLM updates
* add generate
* max length criteria small fix
* update
* updates
* generation update
* update in loss compute
* conversion script
* update for correct input embeddings
* handle interleaved rope
* update
* update
* update
* support compile
* update training
* add doc
* update doc
* correct inits
* ConversationalSpeechModel -> Csm
* conf update
* name update
* tests CsmForCausalLMTest
* convert use cached_file
* conf + modeling updates
* generate utils handle third dim shape
* integration test
* modeling + conf updates
* common test handle more than 2 dims
* add nested audio list utils
* processing handle nested audio list
* csm processing draft
* mimi util
* init updates
* modular update
* convert modular
* processing update
* csm tests update
* generate tests handle third dim
* generate utils handle third dim
* propagate _get_initial_cache_position update
* tied_weight_keys update + convert correctly
* fix inputs_embeds
* revert audio nested list
* batch inference update + return audio
* audio_utils update
* processor update
* some more integration tests
* remove old test
* processing output labels
* improve
* fix
* update rope values with equivalent ones
* conversion update
* update tests
* handle depth decoder generation config
* remove default eos_token_id
* make style
* revert modeling_mimi
* add default generation_config
* remove sdpa since handled by default
* make
* fix conflict
* fix conflicts
* correct naming
* correct imports
* make
* causal -> conditional naming
* causal -> conditional naming
* auto update
* make
* make
* add doc
* test update
* fix weight init
* audio tokens offsets as buffer
* 4d mask in conditional class
* make
* doc update
* fix causal mask
* fix causal mask
* doc update
* doc update
* add processor doc
* update doc
* fix 4d causal mask
* update make_list_of_audio
* do not default to mutable
* remove duplicates
* remove useless reset_parameters
* use GradientCheckpointingLayer
* use can_return_tuple
* formatting
* prepend placeholder in _sample
* torch compile fix
* some more fixes
* convert modular
* fix
* default max_length in convert
* handle depth decoder generation config correctly
* clearer formulation
* handle output_loading_info
* handle softmax warning
* add doc
* propagate _get_initial_cache_position changes
* generation in its own module
* add processor tests
* fix compile with cuda graphs
* add csm.md
* include CSM loss
* doc nit
* doc nit
* doc nit
* Update docs/source/en/model_doc/csm.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add save_audio to processor
* Update src/transformers/models/csm/modular_csm.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* doc update
* simplify audio_codes_mask computation
* doc update
* simplify loss computation
* fix static cache test
* fix
* remove comment
* simplify encoded length computation
* use hf-internal-testing
* doc update
* cast to float before numpy
* nit
* mem efficient codebook head
* nit
* cat input values with cutoffs
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-05-07 10:20:13 -04:00
Yao Matrix
34f26e2c3e
enable internvl UTs on XPU ( #37779 )
* enable internvl UTs on XPU
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* fix style
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* fix style per comments
Signed-off-by: Yao Matrix <matrix.yao@intel.com>
---------
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Signed-off-by: Yao Matrix <matrix.yao@intel.com>
2025-04-30 10:29:40 +02:00
co63oc
d5fa7d2d19
Fix typos in strings and comments ( #37799 )
2025-04-28 11:39:11 +01:00
Cyril Vallez
58e5e976e0
Small fix on context manager detection ( #37562 )
* small fixes
* Update modeling_utils.py
* test
* Update test_modeling_common.py
* Update test_modeling_timm_backbone.py
* more general
* simpler
2025-04-17 15:39:44 +02:00
Cyril Vallez
688f4707bf
All models can be initialized on meta device ( #37563 )
* Update test_modeling_common.py
* fix all
* more fixes
2025-04-16 23:26:44 +02:00
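What the test checks, roughly: every architecture can be instantiated under the meta device, so parameters get no real storage. A minimal sketch:

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("gpt2")
with torch.device("meta"):  # ambient device: no memory allocated for weights
    model = AutoModelForCausalLM.from_config(config)

assert next(model.parameters()).device.type == "meta"
```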
Yih-Dar
5a6de703a7
Run test_can_load_with_global_device_set using a subprocess ( #37553 )
* fix
* fix
* fix
* Update tests/test_modeling_common.py
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-04-16 19:48:30 +02:00
Garrett Goon
503541d7ef
add FlashAttentionKwargs and seq_idx to flat collator ( #36456 )
* add flash attn kwargs to flattening collator
* add return_seq_idx option
* doc string edits
* cleaner max len updates
* various fixes
* temp testing code
* return int32 seq_idx and FlashAttnKwargs
* DataCollatorIntegrationTest impl
* fix batch dims and dtypes
* fill out remaining collator tests
* test name change and fmt
* rm unused var
* fmt
* minor change
* fmt
* add missing pos_ids check
* consistent {np,pt,tf} tests
* split pt tests into 3, like np/tf tests
* mv comment, rename fa test
* remove batch dim comment
* simply wrapping
* compute cu_seq_len/max_length once
* fmt
* remove tf code
* rm warning
* move separator_id back to 2nd pos
* use cleaner lists in tests
* ret -> batch
* fmt
* attr ordering
* use py ints for max_length_{k,q}
2025-04-16 15:45:03 +02:00
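For a flattened (packed) batch, the FlashAttention kwargs boil down to cumulative sequence boundaries and a max length, both recoverable from position_ids where each packed sequence restarts at 0 — and, per the bullets above, now computed once in the collator. A sketch of that arithmetic:

```python
import torch

# Three packed sequences of lengths 3, 2 and 4, flattened into one row.
position_ids = torch.tensor([[0, 1, 2, 0, 1, 0, 1, 2, 3]])

starts = (position_ids[0] == 0).nonzero().flatten()
cu_seq_lens = torch.cat([starts, torch.tensor([position_ids.size(1)])]).to(torch.int32)
max_length = int((cu_seq_lens[1:] - cu_seq_lens[:-1]).max())
seq_idx = (torch.cumsum((position_ids[0] == 0).int(), dim=0) - 1).to(torch.int32)

print(cu_seq_lens)  # tensor([0, 3, 5, 9], dtype=torch.int32)
print(max_length)   # 4
print(seq_idx)      # tensor([0, 0, 0, 1, 1, 2, 2, 2, 2], dtype=torch.int32)
```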
Yao Matrix
33f6c5a5c8
enable several cases on XPU ( #37516 )
* enable several cases on XPU
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* Update tests/test_modeling_common.py
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* fix style
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
---------
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-04-16 11:01:04 +02:00
Cyril Vallez
c8e0e603de
Detect and use device context manager or global device in from_pretrained ( #37216 )
* Update modeling_utils.py
* improve
* Update modeling_utils.py
* Update test_modeling_common.py
* Update test_modeling_timm_backbone.py
* Update test_modeling_common.py
* Update test_modeling_common.py
* Update test_modeling_common.py
* Update test_modeling_common.py
* CIs
2025-04-15 09:59:20 +02:00
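A sketch of the behavior being added: from_pretrained now respects an ambient torch.device context (or a global default device) when placing the loaded weights:

```python
import torch
from transformers import AutoModelForCausalLM

target = "cuda:0" if torch.cuda.is_available() else "cpu"
with torch.device(target):
    model = AutoModelForCausalLM.from_pretrained("gpt2")

print(next(model.parameters()).device)  # the ambient device, no .to() needed
```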
Cyril Vallez
4e53840920
Detect and fix most _init_weights() issues - make it work for composite models ( #37070 )
* Update test_modeling_common.py
* Fix Llama and its modular children
* Update test_modeling_common.py
* qwen3
* first try at prioritizing models
* Update test_modeling_common.py
* Update test_modeling_common.py
* Update test_modeling_common.py
* test
* fix
* fix
* more models
* more
* more
* more
* smarter init for composite models!
* fix post rebase
* smol
* fix missing args
* more
* typo
* Super elegant and efficient init for submodels
* Update modeling_utils.py
* style
* last fixes
* cleanup
* finalize cleanup
* CIs
* improve docstring
* Update modeling_utils.py
* llama4
* style
* CIs
* style
* add dpt
* granite speech
* qwen 2.5 omni
* better fix
* Parse the config file instead
* CIs
2025-04-14 16:19:04 +02:00
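For context, the `_init_weights` convention being audited: each PreTrainedModel subclass owns a hook that initializes just the module types it defines, and composite models must make sure every submodel's hook actually runs. A stripped-down sketch of such a hook, not any particular model's:

```python
import torch.nn as nn

def _init_weights(module: nn.Module, std: float = 0.02) -> None:
    # Typical per-model init hook: normal init for projections/embeddings,
    # identity init for LayerNorm.
    if isinstance(module, nn.Linear):
        module.weight.data.normal_(mean=0.0, std=std)
        if module.bias is not None:
            module.bias.data.zero_()
    elif isinstance(module, nn.Embedding):
        module.weight.data.normal_(mean=0.0, std=std)
    elif isinstance(module, nn.LayerNorm):
        module.bias.data.zero_()
        module.weight.data.fill_(1.0)

nn.Linear(4, 4).apply(_init_weights)  # apply() walks all submodules
```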
Lysandre Debut
54a123f068
Simplify soft dependencies and update the dummy-creation process ( #36827 )
* Reverse dependency map shouldn't be created when test_all is set
* [test_all] Remove dummies
* Modular fixes
* Update utils/check_repo.py
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
* [test_all] Better docs
* [test_all] Update src/transformers/commands/chat.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* [test_all] Remove deprecated AdaptiveEmbeddings from the tests
* [test_all] Doc builder
* [test_all] is_dummy
* [test_all] Import utils
* [test_all] Doc building should not require all deps
---------
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-04-11 11:08:36 +02:00
cyyever
371c44d0ef
Remove old code for PyTorch, Accelerator and tokenizers ( #37234 )
* Remove unneeded library version checks
Signed-off-by: cyy <cyyever@outlook.com>
* Remove PyTorch condition
Signed-off-by: cyy <cyyever@outlook.com>
* Remove PyTorch condition
Signed-off-by: cyy <cyyever@outlook.com>
* Fix ROCm get_device_capability
Signed-off-by: cyy <cyyever@outlook.com>
* Revert "Fix ROCm get_device_capability"
This reverts commit 0e756434bd.
* Remove unnecessary check
Signed-off-by: cyy <cyyever@outlook.com>
* Revert changes
Signed-off-by: cyy <cyyever@outlook.com>
---------
Signed-off-by: cyy <cyyever@outlook.com>
2025-04-10 20:54:21 +02:00
ivarflakstad
aa478567f8
Allow rocm systems to run these tests ( #37278 )
* Allow rocm systems to run these tests
* Fix skipTest logic
* Use get_device_properties to check system capabilities
2025-04-10 13:33:01 +02:00
cyyever
1e6b546ea6
Use Python 3.9 syntax in tests ( #37343 )
Signed-off-by: cyy <cyyever@outlook.com>
2025-04-08 14:12:08 +02:00
Yao Matrix
f697b3f824
enable 2 types of case on XPU ( #37198 )
enable 2 types of cases on XPU:
1. test_resize_tokens_embeddings_with_deepspeed_multi_gpu
2. test_resize_embeddings_untied_with_deepspeed_multi_gpu
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-04-03 11:37:55 +02:00
Cyril Vallez
6ce238fe7a
Fix test ( #37213 )
* Update test_modeling_common.py
* style
2025-04-03 10:24:34 +02:00
Pavel Iakubovskii
3249c5dc15
Refactor attention for SigLIP based models ( #36981 )
* Update Siglip attention implementation
* Update tests for Siglip
* Remove one level of indentation
* Update test to be more specific
* Fixup
* Idefics2
* Idefics3
* Emu3
* SmolVLM
* Phi4 (just init small update)
* Idefics2 (test fix)
* Update siglip2 tests
* Update eager
* trigger
* Clean up
* Transfer inputs to device in test
* Fixing test
* Fixing test
* Revert contiguous
* Remove unused is_flash_attn_2_available
* Move flaky to specific models
2025-04-01 15:37:25 +02:00
cyyever
786d9c5ed9
Fix more inefficient PT operations ( #37060 )
* Fix inefficient operations
* Remove cpu() call
* Reorder detach()
* Reorder detach()
* tolist without detach
* item without detach
* Update src/transformers/models/rag/modeling_rag.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update tests/models/encodec/test_modeling_encodec.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Use detach().cpu().numpy
* Revert some numpy operations
* More fixes
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-03-31 16:31:24 +01:00
Cyril Vallez
f304318f5f
Remove low_cpu_mem_usage and _fast_init ( #36963 )
* Remove low_cpu_mem_usage and _fast_init
* Update deepspeed.py
* Update modeling_utils.py
* remove the first 2 tests everywhere
* Update test_modeling_common.py
* remove what was remaining about fast_init
* fix logic and simplify
* mismatched keys logic update
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* fix 2 models init_weights
* extend to others
* remove grad
* Update modeling_fsmt.py
* init weights in tests
* style
* Update test_modeling_fsmt.py
* more old models
* fix more init_weights
* copies
* fix
* style
* Update modeling_lxmert.py
* fix inits
* more and more
* more
* should finalize
* style
* Update modeling_dinov2_with_registers.py
* fix
* Update modeling_encoder_decoder.py
* fix
* style
* Update modeling_lxmert.py
* post rebase cleanup
* Update modeling_informer.py
* back to start for device
* fix
* add test to detect all failing cases correctly
* Update test_modeling_common.py
* fix
* fix
* sam
* style
* Update modeling_maskformer_swin.py
* CIs
* CIs
* remove test - will add it on separate PR
* fix
* fix
* Update modeling_sam.py
* CIs
* CIs
* CIs
* convnext
* suggestions
* CIs
* fix copies after merge
---------
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-03-31 17:18:43 +02:00
Abu Bakr Soliman
49b5ab6a27
Support QuestionAnswering Module for ModernBert based models. ( #35566 )
* push ModernBertForQuestionAnswering
* update ModernBertForQuestionAnswering
* update __init__ loading
* set imports for ModernBertForQuestionAnswering
* update ModernBertForQuestionAnswering
* remove debugging logs
* update init_weights method
* remove custom initialization for ModernBertForQuestionAnswering
* apply make fix-copies
* apply make style
* apply make fix-copies
* append ModernBertForQuestionAnswering to the pipeline supported models
* remove unused file
* remove invalid autoload value
* update en/model_doc/modernbert.md
* apply make fixup command
* make fixup
* Update dummies
* update usage tips for ModernBertForQuestionAnswering
* update usage tips for ModernBertForQuestionAnswering
* add init
* add lint
* add consistency
* update init test
* change text to trigger stuck text
* use self.loss_function instead of custom loss
By @Cyrilvallez
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* Update modeling_modernbert.py
make comparable commit to even it out
* Match whitespace
* whitespace
---------
Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Orion Weller <wellerorion@gmail.com>
Co-authored-by: Orion Weller <31665361+orionw@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-03-26 21:24:18 +01:00
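A hedged usage sketch for the new head. Note the base checkpoint carries no trained QA weights (the head is freshly initialized), so the extracted span is only illustrative of the API:

```python
from transformers import AutoTokenizer, ModernBertForQuestionAnswering

name = "answerdotai/ModernBERT-base"  # QA head will be randomly initialized here
tokenizer = AutoTokenizer.from_pretrained(name)
model = ModernBertForQuestionAnswering.from_pretrained(name)

inputs = tokenizer("Who wrote it?", "It was written by Ada.", return_tensors="pt")
outputs = model(**inputs)
start, end = outputs.start_logits.argmax(), outputs.end_logits.argmax()
print(tokenizer.decode(inputs.input_ids[0, start : end + 1]))
```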
Yih-Dar
c6814b4ee8
Update ruff to 0.11.2 ( #36962 )
* update
* update
* update
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-25 16:00:11 +01:00
Pavel Iakubovskii
66291778dd
Refactor Attention implementation for ViT-based models ( #36545 )
* Refactor vit attention
* Refactor ViT-based models
* 🚨 🚨 🚨 Fix prefix for DPT
* Update params order
* trigger tests
* Fix Dinov2 attention
* Fix DPT attention impl propagation for backbone config
* Common test fix: config is modif. inplace - avoid it
* view->reshape
* Fixup
* Fixup
* Enable IJepa FA2
* Add FA2 in corresponding model docs
2025-03-20 15:15:01 +00:00
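After this refactor the ViT family takes the same backend switch as the text models; a quick sketch:

```python
from transformers import AutoModel

# "eager", "sdpa", or "flash_attention_2" (FA2 needs flash-attn and fp16/bf16)
model = AutoModel.from_pretrained("google/vit-base-patch16-224", attn_implementation="sdpa")
```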
Joao Gante
cff4caa0c1
[CI] remove redundant checks in test_eager_matches_sdpa_inference ( #36740 )
2025-03-17 16:29:18 +00:00
Joao Gante
42ebb6c23e
[tests] Parameterized test_eager_matches_sdpa_inference ( #36650 )
2025-03-14 14:41:27 +00:00
Cyril Vallez
071a161d3e
[core] Large/full refactor of from_pretrained ( #36033 )
* squash everything together
start to simplify inner logic
Update modeling_utils.py
Update modeling_utils.py
Update modeling_utils.py
Update modeling_utils.py
continue refactor
fix
small fixes
add type hints/docstring
Update modeling_utils.py
remove _fast_init
keep improving
Update modeling_utils.py
Update modeling_utils.py
new first tp loading version
style
fix weird in-place op
trigger CIs
Update modeling_utils.py
much clearer renaming of keys
fix
update
Update test_modeling_common.py
trigger CIs
update
update
style
Update modeling_utils.py
Update modeling_utils.py
Update modeling_utils.py
fix
fast download first prototype
remove old function
remove old functions
Remove unused function and move back _get_tp_registry
fix tp plan registry
simplify
CIs
Update hub.py
Update modeling_utils.py
simplify
simplify renaming logic
remove unused check
add sanity check back (a test depends on it)
Update modeling_utils.py
finalize sound renaming logic
style
add forgotten check
Update modeling_utils.py
add key_mapping keyword
style
Update modeling_utils.py
add comment
minor updates
minor change for clarity
fix small prefix issue and simplify
style
trigger CIs
typo fix
Post rebase fix
post rebase cleanup
simplify tp
typo
oupsi
typo
correctly escape
improvements based on Marc's review
finalize Marc's review comments
squash everything
* improve
* Update modeling_utils.py
* Update modeling_utils.py
* fix
* Update modeling_utils.py
* Update modeling_utils.py
* style
* Update modeling_utils.py
* simplify
* style
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* fix dtype issue
* Update modeling_utils.py
* style
* remove test that does not make sense
* style
* small fixes
* style
* fix
* cleanup after rebase
* style
* typo
* escape
* tp for task specific top modules
* Update modeling_utils.py
* Update modeling_utils.py
* fix allocation
* CIs
* CIs
* CIs
* improve docstring
* CIs
* Update modeling_utils.py
* fix
2025-03-12 13:39:25 +01:00
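One user-visible piece of this refactor called out in the bullets above is the `key_mapping` keyword: regex pattern → replacement pairs applied to checkpoint keys while loading. A hedged sketch of its shape — the mapping here is an identity no-op, purely for illustration:

```python
from transformers import AutoModelForCausalLM

# Rename checkpoint keys on the fly: {regex pattern: replacement}.
# Real uses remap custom prefixes from third-party checkpoints.
model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    key_mapping={r"^transformer\.": "transformer."},
)
```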
Ilyas Moutawwakil
89f6956015
HPU support ( #36424 )
* test
* fix
* fix
* skip some and run some first
* test fsdp
* fix
* patches for generate
* test distributed
* copy
* don't test distributed loss for hpu
* require fp16 and run first
* changes from marc's PR fixing zero3
* better alternative
* return True when fp16 support on gaudi without creating bridge
* fix
* fix tested dtype in deepspeed inference test
* test
* fix
* test
* fix
* skip
* require fp16
* run first fsdp
* Apply suggestions from code review
* address comments
* address comments and refactor test
* reduce precision
* avoid doing gaudi1 specific stuff in the generation loop
* document test_gradient_accumulation_loss_alignment_with_model_loss test a bit more
2025-03-12 09:08:12 +01:00
co63oc
996f512d52
Fix typos in tests ( #36547 )
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
2025-03-05 15:04:06 -08:00
Yih-Dar
482d17be60
Fix hub_retry ( #36449 )
* cry
* trigger
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-27 14:38:25 +01:00
Zach Mueller
41925e4213
Add retry hf hub decorator ( #35213 )
* Add retry torch decorator
* New approach
* Empty commit
* Empty commit
* Style
* Use logger.error
* Add a test
* Update src/transformers/testing_utils.py
Co-authored-by: Lucain <lucainp@gmail.com>
* Fix err
* Update tests/utils/test_modeling_utils.py
---------
Co-authored-by: Lucain <lucainp@gmail.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-02-25 20:53:11 +01:00
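The shape of such a retry decorator, sketched standalone — the real helper is the `hub_retry` in transformers.testing_utils touched by the fix a couple of entries up:

```python
import functools
import time

def retry(max_attempts: int = 3, wait: float = 2.0, exceptions=(Exception,)):
    # Re-run a flaky (e.g. hub-dependent) test a few times before failing.
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except exceptions as err:
                    if attempt == max_attempts:
                        raise
                    print(f"Attempt {attempt} failed ({err}); retrying in {wait}s")
                    time.sleep(wait)
        return wrapper
    return decorator
```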
Joao Gante
678885bbbd
[CI] Check test if the GenerationTesterMixin inheritance is correct 🐛 🔫 ( #36180 )
2025-02-21 10:18:20 +00:00
Pavel Iakubovskii
a957b7911a
Add SigLIP 2 ( #36323 )
* Docs
* Inits
* Auto classes
* Add siglip base
* Add base tests
* Fix Siglip V1 for fix res version
* Add image processor
* Update conversion
* Experimenting with vectorized embeddings
* Fixup
* Add modular Siglip2Processor
* Add modular configuration
* Rename num patches
* Correct image and text features merging
* Working conversion script
* Refactoring conversion script
* Remove unused code in conversion script
* Shorten dict a bit
* Refactoring conversion
* Done conversion refactoring
* Fixup
* Modular siglip2
* Make model exportable and compilable without graph breaks
* Remove position_ids from image_processor
* REmove position ids from modeling file
* Update modular
* Type hint
* Fixup
* Set defaults to processor
* Add integration test
* Revert spatial shapes back to tensor
* Change order
* Fix most of the tests
* Fix docstring
* Remove interpolate_pos_encoding arg (not needed)
* Update docs
* Standardize processing
* Fix attention_mask in vision head
* Siglip v1: remove double transpose in FA2
* Update modular file
* Update FA2 test
* Update expected logits
* Fix interpolation for siglip2 image processor
* Skip init test
* Skip dispatch on flash test
* Fix modeling tests
* Fixup
* Add dummy objects
* Fix some docstrings
* Add siglip2 in index.md
* Fix consistency
* Add docs
* Remove size and data format
* Add image processor tests
* Fix
* Add fast image processor
* Fix style
* Fix
* Docs
* Set lowercase for tokenizer
* Adjust head size for Siglip v1
* Update siglip2 for consistency with siglip1
* Update siglip2 conversion
* Update pipeline
* Update checkpoints in tests
* Update checkpoint name
* Fix pooling for image classification model
* Fix FA2 test
* Update processor
* Fix check repo
* Update docs
* Fix typos
* Fix docstring for fast image processor
* Add siglip2 to FA2 docs
* Fix fast ip tests
* Fix consistency
* Fix tokenizer class for siglip v1
* Fix missing header
* Refactor scaling for clip, siglip, siglip2
* Remove unused imports
* Make fast IP default for siglip2
* Update docs
* Update checkpoints
* Update modular
* Update paper link
* Fixup
* Fix name in toctree
* Fix test
2025-02-21 09:04:19 +00:00
Orr Zohar
4397dfcb71
SmolVLM2 ( #36126 )
* smolvlm init
* updates
* fixing bugs
* minimal run, no checks
* minimal run, no checks
* passing first check + adding url support
* updating video dataloading logic
* fixing image logic
* trying modular, but fails
* modular is working, changing processor to match PR comments and general transformers logic
* fixing kwargs
* offloading video loading logic to image_util
* fixing circleci code formatting errors
* update
* add idefics3-based tests
* add keyword to all
* add PreTrainedModel
* updateing video loading logic
* working inference
* updates for PR comments
* updates for PR comments
* moving SmolVLMPretrainedModel higher to fix import error
* CI test pass
* CI test pass
* removing lambda
* CI test pass
* CI test pass
* CI test pass
* CI test pass
* CI test pass
* CI test pass
* processor tests
* add example in docs
* typo
* fix copies
* skip compile tests - sdpa for VisionTransformer
* fix init
* raise import error for num2words
* update doc for FA2
* more doc fix
* CI
* updates for PR comments
* Update docs/source/en/model_doc/smolvlm.md
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update docs/source/en/model_doc/smolvlm.md
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update docs/source/en/model_doc/smolvlm.md
Co-authored-by: Joshua Lochner <admin@xenova.com>
* Update docs/source/en/model_doc/smolvlm.md
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update docs/source/en/model_doc/smolvlm.md
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* fixing processor -- tokenizer was not defined properly (gpt2 tokenizer) and was missing attributes like the fake image token, etc.
* adding smolvlm to VQA models
* removing vqa auto class
* Update src/transformers/models/smolvlm/processing_smolvlm.py
Co-authored-by: Joshua Lochner <admin@xenova.com>
* removing smolvlmvisiontransformer from index.md
* my bad, video processing had typos
* fixing docs
* renaming params in SmolVLMModel.inputs_merger
* removing un-needed dtype/device in model forward
* ruff for CI
* update docs
* Update docs/source/en/model_doc/smolvlm.md
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* return cache position
* return cache position
* return cache also in modular
* needed to run modular again
* fix training tests
* push vectorized inputs merger
* format
* format
* reduce number of mappings
* addressing PR comments
* happy CI, happy me :)
* skip non-nested images
* adjust integration test for smaller GPUs
* format
* fix kwargs in chat template apply
* skip this for now
---------
Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Pablo <pablo.montalvo.leroux@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Joshua Lochner <admin@xenova.com>
2025-02-20 15:00:26 +01:00
Joao Gante
99adc74462
[tests] remove flax-pt equivalence and cross tests ( #36283 )
2025-02-19 15:13:27 +00:00
Joao Gante
0863eef248
[tests] remove pt_tf equivalence tests ( #36253 )
2025-02-19 11:55:11 +00:00
Raushan Turganbay
0c78ef6cd3
🔴 VLM: compile compatibility ( #35724 )
* llavas
* add mroe models
* fix `compile_forward` test for all models
* fix copies
* make style
* also doesn't support cache class
* fix some tests
* not copied from
* ci green?
* fix tests
* fix copies
* fix tests
* check with `numel` and remove `item`
* fix copies
* fix copies
* Update src/transformers/models/cohere2/modeling_cohere2.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* opt remove cross attn
* gemma2
* fixup
* fixup
* fix newly added test
* maybe fixed?
* green please?
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-02-14 15:23:49 +01:00
Joao Gante
62c7ea0201
CI: avoid human error, automatically infer generative models ( #33212 )
* tmp commit
* move tests to the right class
* remove ALL all_generative_model_classes = ...
* skip tf roberta
* skip InstructBlipForConditionalGenerationDecoderOnlyTest
* videollava
* reduce diff
* reduce diff
* remove on vlms
* fix a few more
* manual rebase bits
* more manual rebase
* remove all manual generative model class test entries
* fix up to ernie
* a few more removals
* handle remaining cases
* recurrent gemma
* it's better here
* make fixup
* tf idefics is broken
* tf bert + generate is broken
* don't touch tf :()
* don't touch tf :(
* make fixup
* better comments for test skips
* revert tf changes
* remove empty line removal
* one more
* missing one
2025-02-13 16:27:11 +01:00
Pavel Iakubovskii
f42d46ccb4
Add common test for torch.export and fix some vision models ( #35124 )
* Add is_torch_greater_or_equal test decorator
* Add common test for torch.export
* Fix bit
* Fix focalnet
* Fix imagegpt
* Fix seggpt
* Fix swin2sr
* Enable torch.export test for vision models
* Enable test for video models
* Remove json
* Enable for hiera
* Enable for ijepa
* Fix detr
* Fix conditional_detr
* Fix maskformer
* Enable test maskformer
* Fix test for deformable detr
* Fix custom kernels for export in rt-detr and deformable-detr
* Enable test for all DPT
* Remove custom test for deformable detr
* Simplify test to use only kwargs for export
* Add comment
* Move compile_compatible_method_lru_cache to utils
* Fix beit export
* Fix deformable detr
* Fix copies data2vec<->beit
* Fix typos, update test to work with dict
* Add seed to the test
* Enable test for vit_mae
* Fix beit tests
* [run-slow] beit, bit, conditional_detr, data2vec, deformable_detr, detr, focalnet, imagegpt, maskformer, rt_detr, seggpt, swin2sr
* Add vitpose test
* Add textnet test
* Add dinov2 with registers
* Update tests/test_modeling_common.py
* Switch to torch.testing.assert_close
* Fix maskformer
* Remove save-load from test
* Add dab_detr
* Add depth_pro
* Fix and test RT-DETRv2
* Fix dab_detr
2025-02-11 11:37:31 +00:00
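A sketch of what the common export test exercises — a kwargs-only torch.export of a vision model, then running the exported program (assuming export round-trips the model's output dataclass):

```python
import torch
from transformers import AutoModelForImageClassification

model = AutoModelForImageClassification.from_pretrained("google/vit-base-patch16-224").eval()
pixel_values = torch.randn(1, 3, 224, 224)

exported = torch.export.export(model, args=(), kwargs={"pixel_values": pixel_values})
out = exported.module()(pixel_values=pixel_values)
print(out.logits.shape)  # torch.Size([1, 1000])
```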
Zach Mueller
28f73bc307
Fix model kwargs ( #35875 )
* Save state
* Make a failing test
* Better test
* mpt -> done, many more to go
* Rm extraneous
* Bamba
* Bert
* big_bird
* biogpt
* bloom
* codegen
* ctrl
* data2vec
* dbrx
* Through up to Dbrx
* electra
* ernie
* falcon
* Fuyu/persimmon
* Include noop kwargs to base models
* Rebase
* Skip musicgen
* Refactor/skip mllama
* Revert makefile
* Rm file
* Fix PT failing, need to modify rest of loss funcs to not resize
* Propagate some
* Continue
* More
* More options
* Mostly fixed
* Proved that it's the same
* Bloom is good
* Make ability to override loss func possible
* Fixup
* Clean
* Fix xglm
* Quality tests
* Skip OCR2
* Make specific loss for xglm
* Make order the same/line up 1:1
* xglm
* Skip fx output loss bloom model
* Didn't pass in pad_token_id
* Fix quality
2025-02-06 11:35:25 -05:00
Yih-Dar
dce9970884
Update test_flash_attn_2_can_dispatch_composite_models ( #36050 )
* update
* update
* update
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-06 12:09:49 +01:00
Yih-Dar
fe52679e74
Update tests regarding attention types after #35235 ( #36024 )
* update
* update
* update
* dev-ci
* more changes
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-04 18:04:47 +01:00
Yih-Dar
5757681837
Less flaky for TimmBackboneModelTest::test_batching_equivalence ( #35971 )
* fix
* remove is_flaky
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-01-30 16:56:26 +01:00
Raushan Turganbay
9725e5be2f
Pixtral: vectorize patch embeddings and enable tests ( #35122 )
* initial POC
* - batch mix feature
* fix tests
* fix tests
* make style
* do not skip and instead fix tests
* update
* return back the test
* correct text with the correct ckpt
2025-01-30 12:40:18 +01:00