transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-31 02:02:21 +06:00

Ryan Mullins 50d3530aa0 Gemma3 (#36658)	2025-03-12 09:06:17 +01:00
* Fix converter
* [Broken] Adds Gemma 3 to Hugging Face Transformers
* Consolidating Config and Processor params across impls
* Sorting out configuration parameters. Adds qk_norm before RoPE. Still not sure if RoPE is right.
* Additional plumbing for CausalLM and ConditionalGeneration variants
* incomplete draft of Orbax conversion script
* More complete checkpoint conversion
* Supporting Gemma 3 1B checkpoints
* Updating RoPE for multiple frequencies
* Adjustments to rotary embedder
* Proof of life for text-only operation
* Updating the conversion script to handle multimodal projection weights
* Fixing tet-only conversions
* Cleaner conversion script with multimodal support and a simpler processor
* Additional refatcors to the Gemma3Processor
* Simplified Processor to work over text representations
* Updated conversion script to join text and vision embeddings at converion time
* Logging for debugging
* Update src/transformers/models/gemma2/modeling_gemma2.py Co-authored-by: Joshua Lochner <admin@xenova.com>
* Removed extraneous Config params
* Switching to fast tokenizer for checkpoint conversions
* isolating siglip for performance tetsing
* Minor changes for debugging tests against baselines
* Adding average pooling for soft tokens
* Updating processor code to enable simpler embedding interleaving for arbitrary number of images in prompts
* Updating conversion script for ShieldGemma 2 conversion compatibility
* Allow disable_compile to be provided as a kwarg
* Refresh from modular
* Updated conversion script and corrected sliding window
* Fix type mismatch in cache_position (#4)
* Fix dtype (#5)
* Fix type mismatch in cache_position
* Actually fix in the modular file Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
---------
Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
* fixes for embedding table overflow and missing image_soft_token_mask from Gemma3Processor
* Adding 2D pooling for image embeddings
* Revert "Adding 2D pooling for image embeddings" This reverts commit `65350cf531`.
* Gemma3 average pooling changed from 1D to 2D
* Major refactor to Gemma3MultimodalInputProjection
* Updating Gemm 3 Auto* registrations
* Add option to save Gemma 3 chat template with tokenizer during weights conversion
* Removing unused imports
* Moving out-of-vocab handling from Gemma3Processor to Gemma3ForConditionalGeneration
* Removing duplicate config property
* Removing final logit softcapping and 1-indexing of position ids
* Fixing image processor config and none --> None typo
* Fixing sliding window size for 1B
* Updating image_mean and image_std in Image Processor
* Attention masking changed to lower triangular
* Moving image special tokens to conversion script
* Mirror image processor defaults from conversion script into Gemma3ProcessorKwargs
* Remove special token variables from symbol space
* Moving image soft token mask computation from Gemma3Processor to Gemma3ForConditionalGeneration
* tie lm_head and embedding weights Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
* Correct tied weights in Gemma3CausalLM
* iterative bidirectional attention
* resolving merge conflicts
* Reverting to Gemma 2 HybridCache with sldiing window support and a sliding_window_pattern of 6
* Correcting RoPE scaling
* clean up first pass, dummy model geenration works
* final clean up before fixing tests
* causal lm test works, so fine
* Fix conversion
* Update src/transformers/models/gemma3/processing_gemma3.py
* model tests are happy
* processor tests are happy
* image processing tests added
* fixup
* Fix pre-processing in conversion
* Inputs merging
* Do not normalize vision embeddings
* Apply Ryan's (and team) changes to attention
* token type ids + mask
* template
* move embed scale, add rope scale, fix tests
* Add chat template to tokenizer
* Use prefix for causal model loading
* use existing code for sliding mask from gemma2
* self.embed_tokens already normalizes
* Correcting Gemma3TextConfig parameters in conversion script
* typo, modular overwrites my fixes
* enable device map for text model
* Conversion updates
* ultra nit: no einsums
* update image token
* copy deepcopy config + some docs
* add some test, still WIP
* Refactoring --include_chat_tempalte logic in converter
* Update src/transformers/models/gemma3/modular_gemma3.py Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* Add eos tokens for instruct models
* dump so i can work on dgx
* Removing add_bos by default
* dump
* add fast im proc
* docs for PaS + fixup
* another fixup
* one more fixup
* fix tests
* Inverting prior BOS change
* ultra nit
* Reverting to Tokenizer saved with add_bos_token=True and chat template starting with BOS
* resize embeds, remove sqrt, add slow test outputs
* FA2 but quality is meh
* nit
* skip FA2, no idea what happened
* last bit for green CI
* please, green CI for docs
* T_T
* Fix for Gemma3 logits
* Support both options for system prompt
* Update src/transformers/models/gemma3/image_processing_gemma3_fast.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update docs/source/en/model_doc/gemma3.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update docs/source/en/model_doc/gemma3.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update docs/source/en/model_doc/gemma3.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update docs/source/en/model_doc/gemma3.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update docs/source/en/model_doc/gemma3.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Docs updates now that assets are live
* Style fixes
---------
Co-authored-by: Joshua Lochner <admin@xenova.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
Co-authored-by: Mayank Chaturvedi <imayank@google.com>
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Co-authored-by: Lysandre <hi@lysand.re>

test_module	AutoImageProcessor (#20111)	2022-11-08 19:54:41 +00:00
tf_ops	Check TF ops for ONNX compliance (#10025)	2021-02-15 07:55:10 -05:00
add_pipeline_model_mapping_to_test.py	update ruff version (#30932)	2024-05-22 06:40:15 +02:00
check_bad_commit.py	Fix `utils/check_bad_commit.py` (for auto ping in CI) (#34943)	2024-11-28 15:34:38 +01:00
check_build.py	Fix import of `FalconMambaForCausalLM` (#33381)	2024-09-10 09:14:54 +02:00
check_config_attributes.py	Gemma3 (#36658)	2025-03-12 09:06:17 +01:00
check_config_docstrings.py	Add TimmWrapper (#34564)	2024-12-11 12:40:30 +00:00
check_copies.py	[Modular] skip modular checks based on diff (#36130)	2025-02-13 12:53:21 +00:00
check_doc_toc.py	update ruff version (#30932)	2024-05-22 06:40:15 +02:00
check_docstrings.py	Refactoring of ImageProcessorFast (#35069)	2025-02-04 17:52:31 -05:00
check_doctest_list.py	update ruff version (#30932)	2024-05-22 06:40:15 +02:00
check_dummies.py	update ruff version (#30932)	2024-05-22 06:40:15 +02:00
check_inits.py	Fix import of `FalconMambaForCausalLM` (#33381)	2024-09-10 09:14:54 +02:00
check_model_tester.py	Add a new script to check model testers' config (#22063)	2023-03-13 19:11:19 +01:00
check_modular_conversion.py	Modular Conversion --fix_and_overwrite on Windows (#36583)	2025-03-06 13:12:30 +00:00
check_repo.py	Add SigLIP 2 (#36323)	2025-02-21 09:04:19 +00:00
check_self_hosted_runner.py	Tiny fix for `check_self_hosted_runner.py` (#24052)	2023-06-06 18:17:41 +02:00
check_tf_ops.py	Check TF ops for ONNX compliance (#10025)	2021-02-15 07:55:10 -05:00
create_dependency_mapping.py	Modular Conversion --fix_and_overwrite on Windows (#36583)	2025-03-06 13:12:30 +00:00
create_dummy_models.py	CI: fix `efficientnet` pipeline timeout and prevent future similar issues due to large image size (#33123)	2024-08-27 11:58:27 +01:00
custom_init_isort.py	Import structure & first three model refactors (#31329)	2024-09-10 11:10:53 +02:00
deprecate_models.py	Remove copied froms for deprecated models (#31153)	2024-06-03 09:42:53 +01:00
download_glue_data.py	update ruff version (#30932)	2024-05-22 06:40:15 +02:00
extract_warnings.py	update github actions packages' version to suppress warnings (#30249)	2024-04-15 15:08:09 +02:00
get_ci_error_statistics.py	Add artifact name in job step to maintain job / artifact correspondence (#28682)	2024-01-31 15:58:17 +01:00
get_github_job_time.py	Make Slack CI reporting stronger (#21823)	2023-02-28 17:12:44 +01:00
get_modified_files.py	exclude deleted files in the fixup script (#21436)	2023-02-03 12:57:02 -05:00
get_previous_daily_ci.py	Ping team members for new failed tests in daily CI (#34171)	2024-10-17 16:11:52 +02:00
get_test_info.py	CI: fix `efficientnet` pipeline timeout and prevent future similar issues due to large image size (#33123)	2024-08-27 11:58:27 +01:00
important_models.txt	ENH: [`CI`] Add new workflow to run slow tests of important models on push main if they are modified (#29235)	2024-04-12 10:01:28 +02:00
models_to_deprecate.py	update ruff version (#30932)	2024-05-22 06:40:15 +02:00
modular_model_converter.py	Fix doc formatting in forward passes & modular (#36243)	2025-02-25 11:09:01 +01:00
not_doctested.txt	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
notification_service_doc_tests.py	Refactor doctest (#30210)	2024-04-15 13:20:36 +02:00
notification_service_quantization.py	Revive Nightly/Past CI (#31159)	2024-06-20 18:57:24 +02:00
notification_service.py	Remove old `benchmark` code (#35730)	2025-01-21 17:56:43 +00:00
past_ci_versions.py	(Re-)Enable Nightly + Past CI (#22393)	2023-03-30 21:06:35 +02:00
patch_helper.py	[`Patch helper`] update to not have to checkout main (#34006)	2024-10-09 09:21:46 +02:00
pr_slow_ci_models.py	notify new model merged to `main` (#36375)	2025-02-24 17:53:18 +01:00
print_env.py	Print more library versions in CI (#17384)	2022-06-02 10:24:16 +02:00
process_bad_commit_report.py	Tiny update after #34383 (#34404)	2024-10-28 12:01:05 +01:00
process_circleci_workflow_test_reports.py	Aggeregate test summary files in CircleCI workflow runs (#34989)	2024-12-16 11:06:17 +01:00
process_test_artifacts.py	fix the parallel number of CI nodes when it is smaller than number of tests (#33276)	2024-09-03 16:53:21 +02:00
release.py	Remove research projects (#36645)	2025-03-11 13:47:38 +00:00
set_cuda_devices_for_ci.py	Fix Cohere CI (#31263)	2024-06-10 15:16:58 +02:00
slow_documentation_tests.txt	Update CodeLlama references (#30218)	2024-05-09 22:57:52 +02:00
sort_auto_mappings.py	update ruff version (#30932)	2024-05-22 06:40:15 +02:00
split_doctest_jobs.py	Refactor doctest (#30210)	2024-04-15 13:20:36 +02:00
split_model_tests.py	consistent job / pytest report / artifact name correspondence (#30392)	2024-04-24 22:32:42 +02:00
tests_fetcher.py	Ignore conversion files in test fetcher (#36251)	2025-02-20 13:32:02 +01:00
update_metadata.py	Add ColPali to 🤗 transformers (#33736)	2024-12-17 11:26:43 +01:00
update_tiny_models.py	Mention model_info.id instead of model_info.modelId (#32106)	2024-07-22 14:14:47 +01:00