transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-03 12:50:06 +06:00

Sukriti Sharma 471958b620 Add GraniteMoeHybrid support for 4.0 (#37658)	2025-05-06 06:47:43 +02:00

* initial config and MLA layer
* first pass at decoder
* completion of layers
* modeling class
* adding hybrid class to imports
* fix imports granitemoehybrid
* fix granitehybrid imports
* fix granitehybrid import
* fix generated modeling file
* add some comments
* minor fixes in layers
* add sharedMLP layer
* correct layer names
* fixes in mamba config
* fix mamba config
* change name of MLP layer
* fix seq mizer layers
* correct mamba config
* fixes in param names
* enable hybrid model
* update config
* fix config granite hybrid
* fix attention layer
* cleanup to re-use mamba code
* keep layer types
* attention bias cleanup
* update mamba layer name
* first pass at tests
* first pass at tests
* use granite attention
* fix: self attn weights
* pass at making pos_emb optional
* initialize self_attn only as needed
* overwrite forward to create HybridMambaCache
* Log invalid layer types
* Add attention outputs test
* Only emit attentions/logits if not None
* Fix config test hidden size divisibility
* mark granitmoehybrid as stateful
* Initialize mamba convolutional layers
* Formatting fixes
* config docstring, removed some unused attrs
* Fix missing arg in models test
* Fix create and check decoder model test
* support logits to keep in granitemoe
* regen to pass logits_to_keep
* Allow None or rope
* Fix gradient checkpointing
* Add granitemoehybrid as special cache for generate check
* Remove unused MLA refs
* Fix mamba layer mask
* Remove logits to keep from config
* Minor docstring nits
* Update licenses
* Enable cache by default
* map layer types to layer block type
* First pass at granite moe hybrid docs
* Ignore granite moe hybrid in valid checkpoint check
* Align attention interfaces
* regenerate modular granitemoeshared attention interface
* Align granite moe hybrid attn interface
* run formatting
* Handle mamba initialization
* avoid conditional attr defs
* Move hybrid layer validation to config
* Add placeholder integration tests
* Docs nits / Update model names
* Clean up forward conditions
* Use gradient checkpointing layer
* Remove some copied bamba tests + inherit align test init delete more tests Use common layer init with bamba tests finish test consolidation
* avoid redundant intermediate std var
* use @can_return_tuple
* Remove unused moe state
* make skipped test names consistent
* Fix docstring order
* Add missing toc
* Always create the shared mlp
* Fix name in docstring
* link preview model in docs

---------

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
Co-authored-by: Alex-Brooks <Alex.Brooks@ibm.com>

test_module	AutoImageProcessor (#20111)	2022-11-08 19:54:41 +00:00
tf_ops	Check TF ops for ONNX compliance (#10025)	2021-02-15 07:55:10 -05:00
add_pipeline_model_mapping_to_test.py	update ruff version (#30932)	2024-05-22 06:40:15 +02:00
check_bad_commit.py	Fix `utils/check_bad_commit.py` (#37272)	2025-04-04 12:18:20 +02:00
check_build.py	Use `deformable_detr` kernel from the Hub (#36853)	2025-03-21 13:08:47 +01:00
check_config_attributes.py	Add D-FINE Model into Transformers (#36261)	2025-04-29 12:17:55 +01:00
check_config_docstrings.py	Add GraniteMoeHybrid support for 4.0 (#37658)	2025-05-06 06:47:43 +02:00
check_copies.py	Samhq model addition (#35147)	2025-04-28 19:07:09 +02:00
check_doc_toc.py	update ruff version (#30932)	2024-05-22 06:40:15 +02:00
check_docstrings.py	Samhq model addition (#35147)	2025-04-28 19:07:09 +02:00
check_doctest_list.py	update ruff version (#30932)	2024-05-22 06:40:15 +02:00
check_dummies.py	Add llama4 (#37307)	2025-04-05 22:02:22 +02:00
check_inits.py	Simplify soft dependencies and update the dummy-creation process (#36827)	2025-04-11 11:08:36 +02:00
check_model_tester.py	Add a new script to check model testers' config (#22063)	2023-03-13 19:11:19 +01:00
check_modular_conversion.py	Fix wrong argparse type in modular checker script (#37472)	2025-04-14 16:11:29 +01:00
check_repo.py	Samhq model addition (#35147)	2025-04-28 19:07:09 +02:00
check_self_hosted_runner.py	Tiny fix for `check_self_hosted_runner.py` (#24052)	2023-06-06 18:17:41 +02:00
check_tf_ops.py	Check TF ops for ONNX compliance (#10025)	2021-02-15 07:55:10 -05:00
create_dependency_mapping.py	Modular Conversion --fix_and_overwrite on Windows (#36583)	2025-03-06 13:12:30 +00:00
create_dummy_models.py	CI: fix `efficientnet` pipeline timeout and prevent future similar issues due to large image size (#33123)	2024-08-27 11:58:27 +01:00
custom_init_isort.py	chore: fix typos in utils module (#36668)	2025-03-13 15:12:44 +00:00
deprecate_models.py	chore: fix typos in utils module (#36668)	2025-03-13 15:12:44 +00:00
download_glue_data.py	Update ruff to `0.11.2` (#36962)	2025-03-25 16:00:11 +01:00
extract_warnings.py	update github actions packages' version to suppress warnings (#30249)	2024-04-15 15:08:09 +02:00
fetch_hub_objects_for_ci.py	Try to avoid/reduce some remaining CI job failures (#37202)	2025-04-02 14:39:57 +02:00
get_ci_error_statistics.py	Add artifact name in job step to maintain job / artifact correspondence (#28682)	2024-01-31 15:58:17 +01:00
get_github_job_time.py	Update ruff to `0.11.2` (#36962)	2025-03-25 16:00:11 +01:00
get_modified_files.py	exclude deleted files in the fixup script (#21436)	2023-02-03 12:57:02 -05:00
get_previous_daily_ci.py	Ping team members for new failed tests in daily CI (#34171)	2024-10-17 16:11:52 +02:00
get_test_info.py	CI: fix `efficientnet` pipeline timeout and prevent future similar issues due to large image size (#33123)	2024-08-27 11:58:27 +01:00
important_models.txt	ENH: [`CI`] Add new workflow to run slow tests of important models on push main if they are modified (#29235)	2024-04-12 10:01:28 +02:00
models_to_deprecate.py	update ruff version (#30932)	2024-05-22 06:40:15 +02:00
modular_model_converter.py	[modular] Fix the prefix-based renaming if the old and new model share a common name suffix (#37829)	2025-04-29 10:43:23 +02:00
not_doctested.txt	Samhq model addition (#35147)	2025-04-28 19:07:09 +02:00
notification_service_doc_tests.py	Refactor doctest (#30210)	2024-04-15 13:20:36 +02:00
notification_service_quantization.py	Update ruff to `0.11.2` (#36962)	2025-03-25 16:00:11 +01:00
notification_service.py	More fault tolerant notification service (#37924)	2025-05-05 15:19:48 +02:00
past_ci_versions.py	Update ruff to `0.11.2` (#36962)	2025-03-25 16:00:11 +01:00
patch_helper.py	[`Patch helper`] update to not have to checkout main (#34006)	2024-10-09 09:21:46 +02:00
pr_slow_ci_models.py	notify new model merged to `main` (#36375)	2025-02-24 17:53:18 +01:00
print_env.py	Print more library versions in CI (#17384)	2022-06-02 10:24:16 +02:00
process_bad_commit_report.py	Tiny update after #34383 (#34404)	2024-10-28 12:01:05 +01:00
process_circleci_workflow_test_reports.py	Update ruff to `0.11.2` (#36962)	2025-03-25 16:00:11 +01:00
process_test_artifacts.py	fix the parallel number of CI nodes when it is smaller than number of tests (#33276)	2024-09-03 16:53:21 +02:00
release.py	Remove research projects (#36645)	2025-03-11 13:47:38 +00:00
set_cuda_devices_for_ci.py	Fix Cohere CI (#31263)	2024-06-10 15:16:58 +02:00
slow_documentation_tests.txt	Update CodeLlama references (#30218)	2024-05-09 22:57:52 +02:00
sort_auto_mappings.py	update ruff version (#30932)	2024-05-22 06:40:15 +02:00
split_doctest_jobs.py	chore: fix typos in utils module (#36668)	2025-03-13 15:12:44 +00:00
split_model_tests.py	consistent job / pytest report / artifact name correspondence (#30392)	2024-04-24 22:32:42 +02:00
tests_fetcher.py	Add Optional to remaining types (#37808)	2025-04-28 14:20:45 +01:00
update_metadata.py	Update ruff to `0.11.2` (#36962)	2025-03-25 16:00:11 +01:00
update_tiny_models.py	Mention model_info.id instead of model_info.modelId (#32106)	2024-07-22 14:14:47 +01:00