transformers/utils
Cyril Vallez 163138a911
🚨🚨[core] Completely rewrite the masking logic for all attentions (#37866)
* start

* start having a clean 4d mask primitive

* Update mask_utils.py

* Update mask_utils.py

* switch name

* Update masking_utils.py

* add a new AttentionMask tensor class

* fix import

* nits

* fixes

* use full and quandrants

* general sdpa mask for all caches

* style

* start some tests

* tests with sliding, chunked

* add styling

* test hybrid

* Update masking_utils.py

* small temp fixes

* Update modeling_gemma2.py

* compile compatible

* Update masking_utils.py

* improve

* start making it more general

* Update masking_utils.py

* generate

* make it work with flex style primitives!

* Update masking_utils.py

* Update masking_utils.py

* Update masking_utils.py

* improve

* Update cache_utils.py

* Update masking_utils.py

* simplify - starting to look good!

* Update masking_utils.py

* name

* Update masking_utils.py

* style

* Update masking_utils.py

* Update masking_utils.py

* Update masking_utils.py

* Update masking_utils.py

* small fix for flex

* flex compile

* FA2

* Update masking_utils.py

* Escape for TGI/vLLM!

* Update masking_utils.py

* Update masking_utils.py

* Update masking_utils.py

* General case without cache

* rename

* full test on llama4

* small fix for FA2 guard with chunk

* Update modeling_gemma2.py

* post rebase cleanup

* FA2 supports static cache!

* Update modeling_flash_attention_utils.py

* Update flex_attention.py

* Update masking_utils.py

* Update masking_utils.py

* Update utils.py

* override for export

* Update executorch.py

* Update executorch.py

* Update executorch.py

* Update executorch.py

* Update masking_utils.py

* Update masking_utils.py

* output attentions

* style

* Update masking_utils.py

* Update executorch.py

* Add doicstring

* Add license and put mask visualizer at the end

* Update test_modeling_common.py

* fix broken test

* Update test_modeling_gemma.py

* Update test_modeling_gemma2.py

* Use fullgraph=False with FA2

* Update utils.py

* change name

* Update masking_utils.py

* improve doc

* change name

* Update modeling_attn_mask_utils.py

* more explicit logic based on model's property

* pattern in config

* extend

* fixes

* make it better

* generalize to other test models

* fix

* Update masking_utils.py

* fix

* do not check mask equivalence if layer types are different

* executorch

* Update modeling_gemma2.py

* Update masking_utils.py

* use layer_idx instead

* adjust

* Update masking_utils.py

* test

* fix imports

* Update modeling_gemma2.py

* other test models

* Update modeling_llama4.py

* Update masking_utils.py

* improve

* simplify

* Update masking_utils.py

* typos

* typo

* fix

* Update masking_utils.py

* default DynamicCache

* remove default cache

* simplify

* Update masking_utils.py

* Update masking_utils.py

* Update masking_utils.py

* Update masking_utils.py

* simplify

* Update masking_utils.py

* Update masking_utils.py

* Update masking_utils.py

* export

* Update executorch.py

* Update executorch.py

* Update flex_attention.py

* Update executorch.py

* upstream to modular gemma 1 & 2

* Update modular_mistral.py

* switch names

* use dict

* put it in the Layer directly

* update copy model source for mask functions

* apply so many modular (hopefully 1 shot)

* use explicite dicts for make style happy

* protect import

* check docstring

* better default in hybrid caches

* qwens

* Update modular_qwen2.py

* simplify core logic!

* Update executorch.py

* qwen3 moe

* Update masking_utils.py

* Update masking_utils.py

* simplify a lot sdpa causal skip

* Update masking_utils.py

* post-rebase

* gemma3 finally

* style

* check it before

* gemma3

* More general with newer torch

* align gemma3

* Update utils.py

* Update utils.py

* Update masking_utils.py

* Update test_modeling_common.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* test

* executorch

* Update test_modeling_common.py

* Update masking_utils.py

* Update masking_utils.py

* Update masking_utils.py

* Update masking_utils.py

* Update executorch.py

* Update test_modeling_common.py

* fix copies

* device

* sdpa can be used without mask -> pass the torchscript tests in this case

* Use enum for check

* revert enum and add check instead

* remove broken test

* cohere2

* some doc & reorganize the Interface

* Update tensor_parallel.py

* Update tensor_parallel.py

* doc and dummy

* Update test_modeling_paligemma2.py

* Update modeling_falcon_h1.py

* Update masking_utils.py

* executorch patch

* style

* CIs

* use register in executorch

* final comments!

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
2025-05-22 11:38:26 +02:00
..
test_module 🔴 Video processors as a separate class (#35206) 2025-05-12 11:55:51 +02:00
tf_ops Check TF ops for ONNX compliance (#10025) 2021-02-15 07:55:10 -05:00
add_pipeline_model_mapping_to_test.py update ruff version (#30932) 2024-05-22 06:40:15 +02:00
check_bad_commit.py CI reporting improvements (#38230) 2025-05-20 19:34:58 +02:00
check_build.py Use deformable_detr kernel from the Hub (#36853) 2025-03-21 13:08:47 +01:00
check_config_attributes.py 🚨🚨[core] Completely rewrite the masking logic for all attentions (#37866) 2025-05-22 11:38:26 +02:00
check_config_docstrings.py Add GraniteMoeHybrid support for 4.0 (#37658) 2025-05-06 06:47:43 +02:00
check_copies.py Fix typos (#37978) 2025-05-06 14:45:20 +01:00
check_doc_toc.py update ruff version (#30932) 2024-05-22 06:40:15 +02:00
check_docstrings.py [AutoDocstring] Based on inspect parsing of the signature (#33771) 2025-05-08 17:46:07 -04:00
check_doctest_list.py update ruff version (#30932) 2024-05-22 06:40:15 +02:00
check_dummies.py Add llama4 (#37307) 2025-04-05 22:02:22 +02:00
check_inits.py Simplify soft dependencies and update the dummy-creation process (#36827) 2025-04-11 11:08:36 +02:00
check_model_tester.py Add a new script to check model testers' config (#22063) 2023-03-13 19:11:19 +01:00
check_modular_conversion.py Fix wrong argparse type in modular checker script (#37472) 2025-04-14 16:11:29 +01:00
check_repo.py 🔴 [VLM] Add base model without head (#37033) 2025-05-07 17:47:51 +02:00
check_self_hosted_runner.py Tiny fix for check_self_hosted_runner.py (#24052) 2023-06-06 18:17:41 +02:00
check_tf_ops.py Check TF ops for ONNX compliance (#10025) 2021-02-15 07:55:10 -05:00
create_dependency_mapping.py Modular Conversion --fix_and_overwrite on Windows (#36583) 2025-03-06 13:12:30 +00:00
create_dummy_models.py CI: fix efficientnet pipeline timeout and prevent future similar issues due to large image size (#33123) 2024-08-27 11:58:27 +01:00
custom_init_isort.py chore: fix typos in utils module (#36668) 2025-03-13 15:12:44 +00:00
deprecate_models.py chore: fix typos in utils module (#36668) 2025-03-13 15:12:44 +00:00
download_glue_data.py Update ruff to 0.11.2 (#36962) 2025-03-25 16:00:11 +01:00
extract_pr_number_from_circleci.py Trigger CircleCI via GitHub Actions when ready for review (#37885) 2025-05-09 11:45:03 +02:00
extract_warnings.py update github actions packages' version to suppress warnings (#30249) 2024-04-15 15:08:09 +02:00
fetch_hub_objects_for_ci.py Try to avoid/reduce some remaining CI job failures (#37202) 2025-04-02 14:39:57 +02:00
get_ci_error_statistics.py Add artifact name in job step to maintain job / artifact correspondence (#28682) 2024-01-31 15:58:17 +01:00
get_github_job_time.py Update ruff to 0.11.2 (#36962) 2025-03-25 16:00:11 +01:00
get_modified_files.py exclude deleted files in the fixup script (#21436) 2023-02-03 12:57:02 -05:00
get_previous_daily_ci.py CI reporting improvements (#38230) 2025-05-20 19:34:58 +02:00
get_test_info.py CI: fix efficientnet pipeline timeout and prevent future similar issues due to large image size (#33123) 2024-08-27 11:58:27 +01:00
important_models.txt ENH: [CI] Add new workflow to run slow tests of important models on push main if they are modified (#29235) 2024-04-12 10:01:28 +02:00
models_to_deprecate.py update ruff version (#30932) 2024-05-22 06:40:15 +02:00
modular_model_converter.py 🔴 Video processors as a separate class (#35206) 2025-05-12 11:55:51 +02:00
not_doctested.txt Samhq model addition (#35147) 2025-04-28 19:07:09 +02:00
notification_service_doc_tests.py Refactor doctest (#30210) 2024-04-15 13:20:36 +02:00
notification_service_quantization.py CI reporting improvements (#38230) 2025-05-20 19:34:58 +02:00
notification_service.py CI reporting improvements (#38230) 2025-05-20 19:34:58 +02:00
past_ci_versions.py Update ruff to 0.11.2 (#36962) 2025-03-25 16:00:11 +01:00
patch_helper.py [Patch helper] update to not have to checkout main (#34006) 2024-10-09 09:21:46 +02:00
pr_slow_ci_models.py notify new model merged to main (#36375) 2025-02-24 17:53:18 +01:00
print_env.py add XPU info print in print_env (#38282) 2025-05-22 11:03:56 +02:00
process_bad_commit_report.py CI reporting improvements (#38230) 2025-05-20 19:34:58 +02:00
process_circleci_workflow_test_reports.py Update ruff to 0.11.2 (#36962) 2025-03-25 16:00:11 +01:00
process_test_artifacts.py fix the parallel number of CI nodes when it is smaller than number of tests (#33276) 2024-09-03 16:53:21 +02:00
release.py Remove research projects (#36645) 2025-03-11 13:47:38 +00:00
set_cuda_devices_for_ci.py Fix Cohere CI (#31263) 2024-06-10 15:16:58 +02:00
slow_documentation_tests.txt Update CodeLlama references (#30218) 2024-05-09 22:57:52 +02:00
sort_auto_mappings.py update ruff version (#30932) 2024-05-22 06:40:15 +02:00
split_doctest_jobs.py chore: fix typos in utils module (#36668) 2025-03-13 15:12:44 +00:00
split_model_tests.py consistent job / pytest report / artifact name correspondence (#30392) 2024-04-24 22:32:42 +02:00
tests_fetcher.py Add Optional to remaining types (#37808) 2025-04-28 14:20:45 +01:00
update_metadata.py Update ruff to 0.11.2 (#36962) 2025-03-25 16:00:11 +01:00
update_tiny_models.py Mention model_info.id instead of model_info.modelId (#32106) 2024-07-22 14:14:47 +01:00