transformers/tests
Armaghan Shakir 55736eea99
Add support for MiniMax's MiniMax-Text-01 (#35831)
* end-to-end architecture

* lightning-attn: refactor, clean, optimize

* put minimax_text_01 in other files

* use latest __init__ standards and auto-generate modular

* support attention_mask for lightning-attn

* Revert "use latest __init__ standards and auto-generate modular"

This reverts commit d8d3c409d8.

* fix modular conversion

* pass both attention masks instead of tuple

* formatting

* Updated Dynamic Cache

* created MiniMaxText01Cache (see the cache sketch after this log)

* fix hardcoded slope_rate

* update attn_type_list in config

* fix lightning when use_cache=False

* copy tests from mixtral

* (checkpoint) all tests pass for normal attention

* fix all unittests

* fix import sorting

* fix consistency and formatting tests

* fix config

* update tests following changes in main

* fix seq_len error

* create dummy docs

* fix checkpoint

* add checkpoint in config docstring

* run modular_conversion

* update docs

* fix checkpoint path and update tests

* fix ruff

* remove repeated expected_slice

* update docs

* rename "minimax-text-01" to "minimax"

* inherit config from mixtral

* remove from docs in other languages

* undo files that should be untouched

* move minimax to end in conversation docs

* use MiniMaxForCausalLM as it is

* ruff fixes

* run modular

* fix docstring example in causallm

* refactor attention loop and decay factors

* refactor config in modular

* run modular

* refactor cache

* rename static_cache to linear_cache

* make positional embeddings necessary

* remove unnecessary layernorms declarations

* fix import in tests

* refactor attention in next tokens

* remove outdated code

* formatting and modular

* update tests

* rename layernorm alpha/beta factors

* register decay factors as buffers (see the buffer sketch after this log)

* remove unused declarations of decay factors

* update config for alpha/beta factors

* run modular

* remove head_dim in tests

* remove minimax from fx.py

* remove stuff that is not really needed

* update __init__

* update qkv torch.split (see the split sketch after this log)

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* fix qkv torch.split

* quality fixes

* remove mistakenly added dummy

* purge unused ModelTester code

* fix-copies

* run fix-copies

* fix head_dim

* write cache formatting tests

* remove postnorm

* avoid contiguous in attention current states

* update expected_slice

* add generation test for integration

* fix dtype in generation test

* update authors

* update with changes in main

* update gradient checkpointing and minor fixes

* fix mutable attn_type_list (see the config sketch after this log)

* rename: attn_type -> layer_type

* update for layer_types

* update integration tests

* update checkpoint

* clean overview in docs

---------

Co-authored-by: Shakib-IO <shakib.khan17@northsouth.edu>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-06-04 09:38:40 +02:00
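
The "created MiniMaxText01Cache" and "rename static_cache to linear_cache" items suggest a hybrid cache that keeps ordinary key/value states for full-attention layers and a single running state for lightning-attention layers. A minimal sketch under that assumption — the class name, the `update_linear` helper, and the decay update rule are illustrative, not the shipped implementation:

```python
import torch

class HybridCacheSketch:
    """Hypothetical hybrid cache: key/value lists for full-attention layers,
    one running recurrent state per lightning-attention layer."""

    def __init__(self, num_hidden_layers: int):
        self.key_cache = [None] * num_hidden_layers
        self.value_cache = [None] * num_hidden_layers
        self.linear_cache = [None] * num_hidden_layers  # decayed KV state per layer

    def update_linear(self, layer_idx: int, kv_state: torch.Tensor, decay: torch.Tensor) -> torch.Tensor:
        # Decay the previous recurrent state and accumulate the new
        # key-value contribution for this step.
        prev = self.linear_cache[layer_idx]
        self.linear_cache[layer_idx] = kv_state if prev is None else decay * prev + kv_state
        return self.linear_cache[layer_idx]
```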
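The "register decay factors as buffers" item (and the removal of their unused declarations) is a standard PyTorch pattern: deterministic per-head constants live in non-persistent buffers so they track device and dtype without bloating checkpoints. A minimal sketch, assuming per-head slope rates like those used in lightning attention — the geometric slope formula is an assumption:

```python
import torch
from torch import nn

class DecaySlopesSketch(nn.Module):
    def __init__(self, num_attention_heads: int):
        super().__init__()
        # Assumed per-head decay slopes, e.g. a geometric sequence over heads.
        slope_rate = 1.0 / (2.0 ** torch.arange(1, num_attention_heads + 1))
        # A non-persistent buffer follows .to(device)/.to(dtype) with the module
        # but is not written to the state dict, since it is fully deterministic.
        self.register_buffer("slope_rate", slope_rate, persistent=False)
```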
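The "update qkv torch.split" / "fix qkv torch.split" items concern separating a fused query/key/value projection. A minimal sketch with hypothetical dimensions (the real values come from the model config):

```python
import torch
from torch import nn

hidden_size, num_heads, head_dim = 64, 4, 16
qkv_proj = nn.Linear(hidden_size, 3 * num_heads * head_dim, bias=False)

hidden_states = torch.randn(2, 10, hidden_size)  # (batch, seq_len, hidden)
qkv = qkv_proj(hidden_states)
# torch.split with an int size yields equal chunks along the given dim,
# so the fused 3x projection separates cleanly into query, key, value.
query, key, value = torch.split(qkv, num_heads * head_dim, dim=-1)
```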
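The "fix mutable attn_type_list" item (the list later renamed to `layer_types`) points at the classic mutable-default pitfall in config classes. A minimal sketch of the usual fix — the alternating layer pattern and all names here are assumptions:

```python
from typing import Optional

class LayerTypesConfigSketch:
    def __init__(self, num_hidden_layers: int = 4, layer_types: Optional[list] = None):
        # Defaulting to None avoids sharing one mutable list across every
        # instance constructed without an explicit argument.
        if layer_types is None:
            layer_types = [
                "linear_attention" if i % 2 == 0 else "full_attention"
                for i in range(num_hidden_layers)
            ]
        self.layer_types = list(layer_types)  # defensive copy
```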
bettertransformer Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
deepspeed 🚨 rm already deprecated pad_to_max_length arg (#37617) 2025-05-01 15:21:55 +02:00
extended Add Optional to remaining types (#37808) 2025-04-28 14:20:45 +01:00
fixtures Implementation of SuperPoint and AutoModelForKeypointDetection (#28966) 2024-03-19 14:43:02 +00:00
fsdp Fix the issue where the fsdp config cannot work (#37549) 2025-04-28 10:44:51 +02:00
generation 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
models Add support for MiniMax's MiniMax-Text-01 (#35831) 2025-06-04 09:38:40 +02:00
optimization Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
peft_integration FIX: Faulty PEFT tests (#37757) 2025-04-28 15:10:46 +02:00
pipelines 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288) 2025-05-23 17:17:38 +02:00
quantization Name change AOPermod -> ModuleFqn (#38456) 2025-06-03 15:43:31 +00:00
repo_utils Simplify soft dependencies and update the dummy-creation process (#36827) 2025-04-11 11:08:36 +02:00
sagemaker Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
tensor_parallel [TP] Change command in tests to python3 (#38555) 2025-06-03 11:03:33 +00:00
tokenization Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
trainer switch to device agnostic device calling for test cases (#38247) 2025-05-26 10:18:53 +02:00
utils [core] support tensor-valued _extra_state values in from_pretrained (#38155) 2025-05-28 15:38:42 +02:00
__init__.py GPU text generation: moved the encoded_prompt to the correct device 2020-01-06 15:11:12 +01:00
causal_lm_tester.py Remove redundant test_sdpa_equivalence test (#38436) 2025-05-28 17:22:25 +02:00
test_backbone_common.py Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
test_configuration_common.py Update composition flag usage (#36263) 2025-04-09 11:48:49 +02:00
test_feature_extraction_common.py Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
test_image_processing_common.py fix multi-image case for llava-onevision (#38084) 2025-05-21 11:50:46 +02:00
test_image_transforms.py Fix pad image transform for batched inputs (#37544) 2025-05-08 10:51:15 +01:00
test_modeling_common.py Add support for MiniMax's MiniMax-Text-01 (#35831) 2025-06-04 09:38:40 +02:00
test_modeling_flax_common.py Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
test_modeling_tf_common.py Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
test_pipeline_mixin.py Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
test_processing_common.py 🔴 Video processors as a separate class (#35206) 2025-05-12 11:55:51 +02:00
test_sequence_feature_extraction_common.py Use Python 3.9 syntax in tests (#37343) 2025-04-08 14:12:08 +02:00
test_tokenization_common.py 🚨 rm already deprecated pad_to_max_length arg (#37617) 2025-05-01 15:21:55 +02:00
test_training_args.py Fix TrainingArguments.torch_empty_cache_steps post_init check (#36734) 2025-03-17 16:09:46 +01:00
test_video_processing_common.py 🔴 Video processors as a separate class (#35206) 2025-05-12 11:55:51 +02:00