Armaghan Shakir
55736eea99
Add support for MiniMax's MiniMax-Text-01 (#35831)
...
* end-to-end architecture
* lightning-attn: refactor, clean, optimize
* put minimax_text_01 in other files
* use latest __init__ standards and auto-generate modular
* support attention_mask for lightning-attn
* Revert "use latest __init__ standards and auto-generate modular"
This reverts commit d8d3c409d8.
* fix modular conversion
* pass both attention masks instead of tuple
* formatting
* update DynamicCache
* create MiniMaxText01Cache
* fix hardcoded slope_rate
* update attn_type_list in config
* fix lightning when use_cache=False
* copy tests from mixtral
* (checkpoint) all tests pass for normal attention
* fix all unittests
* fix import sorting
* fix consistency and formatting tests
* fix config
* update tests, since changes in main
* fix seq_len error
* create dummy docs
* fix checkpoint
* add checkpoint in config docstring
* run modular_conversion
* update docs
* fix checkpoint path and update tests
* fix ruff
* remove repeated expected_slice
* update docs
* rename "minimax-text-01" to "minimax"
* inherit config from mixtral
* remove from docs in other languages
* undo files that should be untouched
* move minimax to end in conversation docs
* use MiniMaxForCausalLM as it is
* ruff fixes
* run modular
* fix docstring example in causallm
* refactor attention loop and decay factors
* refactor config in modular
* run modular
* refactor cache
* rename static_cache to linear_cache
* make positional embeddings necessary
* remove unnecessary layernorm declarations
* fix import in tests
* refactor attention in next tokens
* remove outdated code
* formatting and modular
* update tests
* rename layernorm alpha/beta factors
* register decay factors as buffers
* remove unused declarations of decay factors
* update config for alpha/beta factors
* run modular
* remove head_dim in tests
* remove minimax from fx.py
* remove stuff that is not really needed
* update __init__
* update qkv torch.split
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* fix qkv torch.split
* quality fixes
* remove mistakenly added dummy
* purge unused ModelTester code
* fix-copies
* run fix-copies
* fix head_dim
* write cache formatting tests
* remove postnorm
* avoid contiguous in attention current states
* update expected_slice
* add generation test for integration
* fix dtype in generation test
* update authors
* update with changes in main
* update gradient checkpointing and minor fixes
* fix mutable attn_type_list
* rename: attn_type -> layer_type
* update for layer_types
* update integration tests
* update checkpoint
* clean overview in docs
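
The "register decay factors as buffers" entries above refer to keeping the fixed per-head decay rates used by lightning attention on the module as non-trainable state. A minimal sketch of that pattern follows; the module name `LightningDecay` and the ALiBi-style slope formula are assumptions for illustration, not the actual MiniMax modeling code:

```python
import torch
from torch import nn


class LightningDecay(nn.Module):
    """Illustrative only: holds fixed per-head decay rates as a buffer."""

    def __init__(self, num_heads: int):
        super().__init__()
        # Assumed ALiBi-style geometric slopes, one per attention head.
        exponents = torch.arange(1, num_heads + 1, dtype=torch.float32)
        slope_rate = 1.0 / (2.0 ** (8.0 * exponents / num_heads))
        # register_buffer makes the tensor follow .to()/.cuda() device moves
        # without turning it into a trainable nn.Parameter.
        self.register_buffer("slope_rate", slope_rate, persistent=False)
```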
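
Likewise, the "update/fix qkv torch.split" entries refer to carving a single fused QKV projection into separate query, key, and value tensors. A hedged sketch of the general pattern, with assumed names (`FusedQKV`, `qkv_proj`, `hidden_size`) rather than the real MiniMax module:

```python
import torch
from torch import nn


class FusedQKV(nn.Module):
    """Illustrative only: one matmul produces q, k, and v together."""

    def __init__(self, hidden_size: int, num_heads: int, head_dim: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = head_dim
        self.qkv_proj = nn.Linear(hidden_size, 3 * num_heads * head_dim, bias=False)

    def forward(self, hidden_states: torch.Tensor):
        batch, seq_len, _ = hidden_states.shape
        qkv = self.qkv_proj(hidden_states)
        # torch.split with a chunk size of num_heads * head_dim yields
        # exactly three equal slices along the last dimension.
        q, k, v = torch.split(qkv, self.num_heads * self.head_dim, dim=-1)

        def to_heads(t: torch.Tensor) -> torch.Tensor:
            # (batch, seq, hidden) -> (batch, heads, seq, head_dim)
            return t.view(batch, seq_len, self.num_heads, self.head_dim).transpose(1, 2)

        return to_heads(q), to_heads(k), to_heads(v)
```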
---------
Co-authored-by: Shakib-IO <shakib.khan17@northsouth.edu>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-06-04 09:38:40 +02:00