transformers/utils
Sukriti Sharma 471958b620
Add GraniteMoeHybrid support for 4.0 (#37658)
* initial config and MLA layer

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* first pass at decoder

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* completion of layers

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* modeling class

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* adding hybrid class to imports

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix imports granitemoehybrid

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix granitehybrid imports

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix granitehybrid import

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix generated modeling file

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* add some comments

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* minor fixes in layers

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* add sharedMLP layer

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* correct layer names

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fixes in mamba config

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix mamba config

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* change name of MLP layer

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix seq mixer layers

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* correct mamba config

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fixes in param names

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* enable hybrid model

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* update config

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix config granite hybrid

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix attention layer

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* cleanup to re-use mamba code

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* keep layer types

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* attention bias cleanup

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* update mamba layer name

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* first pass at tests

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* use granite attention

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix: self attn weights

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* pass at making pos_emb optional

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* initialize self_attn only as needed

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* overwrite forward to create HybridMambaCache

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* Log invalid layer types

* Add attention outputs test

* Only emit attentions/logits if not None

* Fix config test hidden size divisibility

* mark granitemoehybrid as stateful

* Initialize mamba convolutional layers

* Formatting fixes

* config docstring, removed some unused attrs

* Fix missing arg in models test

* Fix create and check decoder model test

* support logits to keep in granitemoe

* regen to pass logits_to_keep

* Allow None or rope

* Fix gradient checkpointing

* Add granitemoehybrid as special cache for generate check

* Remove unused MLA refs

* Fix mamba layer mask

* Remove logits to keep from config

* Minor docstring nits

* Update licenses

* Enable cache by default

* map layer types to layer block type

* First pass at granite moe hybrid docs

* Ignore granite moe hybrid in valid checkpoint check

* Align attention interfaces

* regenerate modular granitemoeshared attention interface

* Align granite moe hybrid attn interface

* run formatting

* Handle mamba initialization

* avoid conditional attr defs

* Move hybrid layer validation to config

* Add placeholder integration tests

* Docs nits / Update model names

* Clean up forward conditions

* Use gradient checkpointing layer

* Remove some copied bamba tests + inherit

align test init

delete more tests

Use common layer init with bamba tests

finish test consolidation

* avoid redundant intermediate std var

* use @can_return_tuple

* Remove unused moe state

* make skipped test names consistent

* Fix docstring order

* Add missing toc

* Always create the shared mlp

* Fix name in docstring

* link preview model in docs

---------

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
Co-authored-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-05-06 06:47:43 +02:00
test_module AutoImageProcessor (#20111) 2022-11-08 19:54:41 +00:00
tf_ops Check TF ops for ONNX compliance (#10025) 2021-02-15 07:55:10 -05:00
add_pipeline_model_mapping_to_test.py update ruff version (#30932) 2024-05-22 06:40:15 +02:00
check_bad_commit.py Fix utils/check_bad_commit.py (#37272) 2025-04-04 12:18:20 +02:00
check_build.py Use deformable_detr kernel from the Hub (#36853) 2025-03-21 13:08:47 +01:00
check_config_attributes.py Add D-FINE Model into Transformers (#36261) 2025-04-29 12:17:55 +01:00
check_config_docstrings.py Add GraniteMoeHybrid support for 4.0 (#37658) 2025-05-06 06:47:43 +02:00
check_copies.py Samhq model addition (#35147) 2025-04-28 19:07:09 +02:00
check_doc_toc.py update ruff version (#30932) 2024-05-22 06:40:15 +02:00
check_docstrings.py Samhq model addition (#35147) 2025-04-28 19:07:09 +02:00
check_doctest_list.py update ruff version (#30932) 2024-05-22 06:40:15 +02:00
check_dummies.py Add llama4 (#37307) 2025-04-05 22:02:22 +02:00
check_inits.py Simplify soft dependencies and update the dummy-creation process (#36827) 2025-04-11 11:08:36 +02:00
check_model_tester.py Add a new script to check model testers' config (#22063) 2023-03-13 19:11:19 +01:00
check_modular_conversion.py Fix wrong argparse type in modular checker script (#37472) 2025-04-14 16:11:29 +01:00
check_repo.py Samhq model addition (#35147) 2025-04-28 19:07:09 +02:00
check_self_hosted_runner.py Tiny fix for check_self_hosted_runner.py (#24052) 2023-06-06 18:17:41 +02:00
check_tf_ops.py Check TF ops for ONNX compliance (#10025) 2021-02-15 07:55:10 -05:00
create_dependency_mapping.py Modular Conversion --fix_and_overwrite on Windows (#36583) 2025-03-06 13:12:30 +00:00
create_dummy_models.py CI: fix efficientnet pipeline timeout and prevent future similar issues due to large image size (#33123) 2024-08-27 11:58:27 +01:00
custom_init_isort.py chore: fix typos in utils module (#36668) 2025-03-13 15:12:44 +00:00
deprecate_models.py chore: fix typos in utils module (#36668) 2025-03-13 15:12:44 +00:00
download_glue_data.py Update ruff to 0.11.2 (#36962) 2025-03-25 16:00:11 +01:00
extract_warnings.py update github actions packages' version to suppress warnings (#30249) 2024-04-15 15:08:09 +02:00
fetch_hub_objects_for_ci.py Try to avoid/reduce some remaining CI job failures (#37202) 2025-04-02 14:39:57 +02:00
get_ci_error_statistics.py Add artifact name in job step to maintain job / artifact correspondence (#28682) 2024-01-31 15:58:17 +01:00
get_github_job_time.py Update ruff to 0.11.2 (#36962) 2025-03-25 16:00:11 +01:00
get_modified_files.py exclude deleted files in the fixup script (#21436) 2023-02-03 12:57:02 -05:00
get_previous_daily_ci.py Ping team members for new failed tests in daily CI (#34171) 2024-10-17 16:11:52 +02:00
get_test_info.py CI: fix efficientnet pipeline timeout and prevent future similar issues due to large image size (#33123) 2024-08-27 11:58:27 +01:00
important_models.txt ENH: [CI] Add new workflow to run slow tests of important models on push main if they are modified (#29235) 2024-04-12 10:01:28 +02:00
models_to_deprecate.py update ruff version (#30932) 2024-05-22 06:40:15 +02:00
modular_model_converter.py [modular] Fix the prefix-based renaming if the old and new model share a common name suffix (#37829) 2025-04-29 10:43:23 +02:00
not_doctested.txt Samhq model addition (#35147) 2025-04-28 19:07:09 +02:00
notification_service_doc_tests.py Refactor doctest (#30210) 2024-04-15 13:20:36 +02:00
notification_service_quantization.py Update ruff to 0.11.2 (#36962) 2025-03-25 16:00:11 +01:00
notification_service.py More fault tolerant notification service (#37924) 2025-05-05 15:19:48 +02:00
past_ci_versions.py Update ruff to 0.11.2 (#36962) 2025-03-25 16:00:11 +01:00
patch_helper.py [Patch helper] update to not have to checkout main (#34006) 2024-10-09 09:21:46 +02:00
pr_slow_ci_models.py notify new model merged to main (#36375) 2025-02-24 17:53:18 +01:00
print_env.py Print more library versions in CI (#17384) 2022-06-02 10:24:16 +02:00
process_bad_commit_report.py Tiny update after #34383 (#34404) 2024-10-28 12:01:05 +01:00
process_circleci_workflow_test_reports.py Update ruff to 0.11.2 (#36962) 2025-03-25 16:00:11 +01:00
process_test_artifacts.py fix the parallel number of CI nodes when it is smaller than number of tests (#33276) 2024-09-03 16:53:21 +02:00
release.py Remove research projects (#36645) 2025-03-11 13:47:38 +00:00
set_cuda_devices_for_ci.py Fix Cohere CI (#31263) 2024-06-10 15:16:58 +02:00
slow_documentation_tests.txt Update CodeLlama references (#30218) 2024-05-09 22:57:52 +02:00
sort_auto_mappings.py update ruff version (#30932) 2024-05-22 06:40:15 +02:00
split_doctest_jobs.py chore: fix typos in utils module (#36668) 2025-03-13 15:12:44 +00:00
split_model_tests.py consistent job / pytest report / artifact name correspondence (#30392) 2024-04-24 22:32:42 +02:00
tests_fetcher.py Add Optional to remaining types (#37808) 2025-04-28 14:20:45 +01:00
update_metadata.py Update ruff to 0.11.2 (#36962) 2025-03-25 16:00:11 +01:00
update_tiny_models.py Mention model_info.id instead of model_info.modelId (#32106) 2024-07-22 14:14:47 +01:00