transformers/tests
pglorio 33cb1f7b61
Add Zamba2 (#34517)
* First commit

* Finish model implementation

* First commit

* Finish model implementation

* Register zamba2

* generated modeling and configuration

* generated modeling and configuration

* added hybrid cache

* fix attention_mask in mamba

* dropped unused loras

* fix flash2

* config docstrings

* fix config and fwd pass

* make fixup fixes

* text_modeling_zamba2

* small fixes

* make fixup fixes

* Fix modular model converter

* added inheritances in modular, renamed zamba cache

* modular rebase

* new modular conversion

* fix generated modeling file

* fixed import for Zamba2RMSNormGated

* modular file cleanup

* make fixup and model tests

* dropped inheritance for Zamba2PreTrainedModel

* make fixup and unit tests

* Add inheritance of rope from GemmaRotaryEmbedding

* moved rope to model init

* drop del self.self_attn and del self.feed_forward

* fix tests

* renamed lora -> adapter

* rewrote adapter implementation

* fixed tests

* Fix torch_forward in mamba2 layer

* Fix torch_forward in mamba2 layer

* Fix torch_forward in mamba2 layer

* Dropped adapter in-place sum

* removed rope from attention init

* updated rope

* created get_layers method

* make fixup fix

* make fixup fixes

* make fixup fixes

* update to new attention standard

* update to new attention standard

* make fixup fixes

* minor fixes

* cache_position

* removed cache_position, position_ids, use_cache

* remove config from modular

* removed config from modular (2)

* import apply_rotary_pos_emb from llama

* fixed rope_kwargs

* Instantiate cache in Zamba2Model

* fix cache

* fix @slow decorator

* small fix in modular file

* Update docs/source/en/model_doc/zamba2.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* several minor fixes

* inherit mamba2decoder fwd and drop position_ids in mamba

* removed docstrings from modular

* reinstate zamba2 attention decoder fwd

* use regex for tied keys

* Revert "use regex for tied keys"

This reverts commit 9007a522b1.

* use regex for tied keys

* add cpu to slow forward tests

* dropped config.use_shared_mlp_adapter

* Update docs/source/en/model_doc/zamba2.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* re-convert from modular

---------

Co-authored-by: root <root@node-2.us-southcentral1-a.compute.internal>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-01-27 10:51:23 +01:00
agents use torch.testing.assertclose instead to get more details about error in cis (#35659) 2025-01-24 16:55:28 +01:00
bettertransformer use torch.testing.assertclose instead to get more details about error in cis (#35659) 2025-01-24 16:55:28 +01:00
deepspeed use torch.testing.assertclose instead to get more details about error in cis (#35659) 2025-01-24 16:55:28 +01:00
extended [tests] skip tests for xpu (#33553) 2024-09-19 19:28:04 +01:00
fixtures Implementation of SuperPoint and AutoModelForKeypointDetection (#28966) 2024-03-19 14:43:02 +00:00
fsdp [tests] make cuda-only tests device-agnostic (#35607) 2025-01-13 14:48:39 +01:00
generation Add Zamba2 (#34517) 2025-01-27 10:51:23 +01:00
models Add Zamba2 (#34517) 2025-01-27 10:51:23 +01:00
optimization fix: Fixed the 1st argument name in classmethods (#31907) 2024-07-11 12:11:50 +01:00
peft_integration use torch.testing.assertclose instead to get more details about error in cis (#35659) 2025-01-24 16:55:28 +01:00
pipelines Fix test_pipelines_video_classification that was always failing (#35842) 2025-01-23 19:22:32 +01:00
quantization use torch.testing.assertclose instead to get more details about error in cis (#35659) 2025-01-24 16:55:28 +01:00
repo_utils Fix modular edge case + modular sorting order (#35562) 2025-01-09 17:17:52 +01:00
sagemaker Trainer - deprecate tokenizer for processing_class (#32385) 2024-10-02 14:08:46 +01:00
tokenization tokenizer train from iterator without pre_tokenizers (#35396) 2025-01-09 15:34:43 +01:00
tp Simplify Tensor Parallel implementation with PyTorch TP (#34184) 2024-11-18 19:51:49 +01:00
trainer use torch.testing.assertclose instead to get more details about error in cis (#35659) 2025-01-24 16:55:28 +01:00
utils use torch.testing.assertclose instead to get more details about error in cis (#35659) 2025-01-24 16:55:28 +01:00
__init__.py GPU text generation: Moved the encoded_prompt to correct device 2020-01-06 15:11:12 +01:00
test_backbone_common.py Align backbone stage selection with out_indices & out_features (#27606) 2023-12-20 18:33:17 +00:00
test_configuration_common.py Load sub-configs from composite configs (#34410) 2024-11-05 11:34:01 +01:00
test_feature_extraction_common.py Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00
test_image_processing_common.py use torch.testing.assertclose instead to get more details about error in cis (#35659) 2025-01-24 16:55:28 +01:00
test_image_transforms.py fix: center_crop occasionally outputs off-by-one dimension matrix (#30934) 2024-05-21 13:56:52 +01:00
test_modeling_common.py use torch.testing.assertclose instead to get more details about error in cis (#35659) 2025-01-24 16:55:28 +01:00
test_modeling_flax_common.py 🚨All attention refactor🚨 (#35235) 2024-12-18 16:53:39 +01:00
test_modeling_tf_common.py 🚨All attention refactor🚨 (#35235) 2024-12-18 16:53:39 +01:00
test_pipeline_mixin.py Add image text to text pipeline (#34170) 2024-10-31 15:48:11 -04:00
test_processing_common.py VLMs: major clean up 🧼 (#34502) 2025-01-08 10:35:23 +01:00
test_sequence_feature_extraction_common.py Fix typo (#25966) 2023-09-05 10:12:25 +02:00
test_tokenization_common.py [tokenizers] Ensure that add_prefix_space is propagated to backend_tokenizer.pre_tokenizer (#35593) 2025-01-09 17:46:50 +01:00