transformers/docs/source
Armaghan Shakir 55736eea99
Add support for MiniMax's MiniMax-Text-01 (#35831)
* end-to-end architecture

* lightning-attn: refactor, clean, optimize

* put minimax_text_01 in other files

* use latest __init__ standards and auto-generate modular

* support attention_mask for lightning-attn (see the lightning-attn sketch after this list)

* Revert "use latest __init__ standards and auto-generate modular"

This reverts commit d8d3c409d8.

* fix modular conversion

* pass both attention masks instead of tuple

* formatting

* Updated Dynamic Cache

* created MiniMaxText01Cache

* fix hardcoded slope_rate

* update attn_type_list in config

* fix lightning when use_cache=False

* copy tests from mixtral

* (checkpoint) all tests pass for normal attention

* fix all unittests

* fix import sorting

* fix consistency and formatting tests

* fix config

* update tests to match changes in main

* fix seq_len error

* create dummy docs

* fix checkpoint

* add checkpoint in config docstring

* run modular_conversion

* update docs

* fix checkpoint path and update tests

* fix ruff

* remove repeated expected_slice

* update docs

* rename "minimax-text-01" to "minimax"

* inherit config from mixtral

* remove from docs in other languages

* revert changes to files that should be untouched

* move minimax to end in conversation docs

* use MiniMaxForCausalLM as it is (see the usage sketch after this list)

* ruff fixes

* run modular

* fix docstring example in causallm

* refactor attention loop and decay factors

* refactor config in modular

* run modular

* refactor cache

* rename static_cache to linear_cache (see the cache sketch after this list)

* make positional embeddings necessary

* remove unnecessary layernorms declarations

* fix import in tests

* refactor attention in next tokens

* remove outdated code

* formatting and modular

* update tests

* rename layernorm alpha/beta factors

* register decay factors as buffers (see the buffer sketch after this list)

* remove unused declarations of decay factors

* update config for alpha/beta factors

* run modular

* remove head_dim in tests

* remove minimax from fx.py

* remove code that is not needed

* update __init__

* update qkv torch.split (see the split sketch after this list)

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* fix qkv torch.split

* quality fixes

* remove mistakenly added dummy

* purge unused ModelTester code

* fix-copies

* run fix-copies

* fix head_dim

* write cache formatting tests

* remove postnorm

* avoid contiguous in attention current states

* update expected_slice

* add generation test for integration

* fix dtype in generation test

* update authors

* update with changes in main

* update gradient checkpointing and minor fixes

* fix mutable attn_type_list (see the config sketch after this list)

* rename: attn_type -> layer_type

* update for layer_types

* update integration tests

* update checkpoint

* clean overview in docs
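
Lightning-attn sketch (referenced above): a minimal per-token recurrence for decayed linear attention with a padding mask. The function name, the scalar `decay`, and the tensor layout are illustrative assumptions; the merged layer uses per-head slope rates and a chunked kernel rather than this Python loop.

```python
import torch

def lightning_attention_sketch(q, k, v, attention_mask=None, decay=0.9):
    # q, k, v: (batch, seq_len, num_heads, head_dim)
    # attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding
    batch, seq_len, num_heads, head_dim = q.shape
    if attention_mask is not None:
        # Zero out padded positions so they never enter the running KV state.
        v = v * attention_mask[:, :, None, None].to(v.dtype)
    kv_state = q.new_zeros(batch, num_heads, head_dim, head_dim)
    outputs = []
    for t in range(seq_len):
        # Decay the accumulated state, then add the current key/value outer product.
        kv_state = decay * kv_state + torch.einsum("bhd,bhe->bhde", k[:, t], v[:, t])
        outputs.append(torch.einsum("bhd,bhde->bhe", q[:, t], kv_state))
    return torch.stack(outputs, dim=1)  # (batch, seq_len, num_heads, head_dim)
```

Masking `v` up front is one simple way to honor `attention_mask` in a linear-attention layer: the running KV state accumulates no contribution from padding.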
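
Usage sketch (referenced above): loading the model through the standard causal-LM API. The checkpoint id below is a placeholder, not necessarily the one this PR registers.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "MiniMaxAI/MiniMax-Text-01"  # placeholder id; use the PR's checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer("Hello", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```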
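
Cache sketch (referenced above): why the rename fits. Full-attention layers concatenate growing key/value tensors, while lightning layers overwrite one fixed-size linear state per layer. This standalone sketch does not mirror the merged cache class's API.

```python
import torch

class HybridCacheSketch:
    """Per-layer cache: KV tensors for full-attention layers, a fixed-size
    linear state (formerly 'static_cache', renamed 'linear_cache') for
    lightning-attention layers."""

    def __init__(self, num_layers: int):
        self.key_cache = [None] * num_layers
        self.value_cache = [None] * num_layers
        self.linear_cache = [None] * num_layers  # (batch, heads, d, d) states

    def update_full(self, layer_idx, key, value):
        # Standard KV caching: concatenate along the sequence dimension.
        if self.key_cache[layer_idx] is None:
            self.key_cache[layer_idx], self.value_cache[layer_idx] = key, value
        else:
            self.key_cache[layer_idx] = torch.cat([self.key_cache[layer_idx], key], dim=-2)
            self.value_cache[layer_idx] = torch.cat([self.value_cache[layer_idx], value], dim=-2)
        return self.key_cache[layer_idx], self.value_cache[layer_idx]

    def update_linear(self, layer_idx, kv_state):
        # The linear state is overwritten, not concatenated: it stays
        # constant-size regardless of sequence length.
        self.linear_cache[layer_idx] = kv_state
        return kv_state
```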
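
Buffer sketch (referenced above): registering per-head decay factors as non-trainable buffers. The ALiBi-style slope formula here is an assumption, not the model's actual rates.

```python
import torch
from torch import nn

class LinearAttentionSketch(nn.Module):
    """Sketch: per-head decay factors held as a non-trainable buffer."""

    def __init__(self, num_heads: int):
        super().__init__()
        # Hypothetical ALiBi-style slopes; the model's actual rates may differ.
        slopes = torch.tensor([2.0 ** (-8.0 * (i + 1) / num_heads) for i in range(num_heads)])
        # register_buffer keeps the tensor out of the optimizer while still
        # following .to(device)/.to(dtype) moves; persistent=False also keeps
        # it out of the saved state dict.
        self.register_buffer("slope_rate", slopes, persistent=False)
```

Holding the factors as buffers, rather than plain attributes, means device and dtype handling comes for free with the module.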
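
Split sketch (referenced above): separating a fused QKV projection with `torch.split`. Dimension sizes are illustrative.

```python
import torch
from torch import nn

hidden_size, num_heads, head_dim = 512, 8, 64
qkv_proj = nn.Linear(hidden_size, 3 * num_heads * head_dim, bias=False)

hidden_states = torch.randn(2, 10, hidden_size)  # (batch, seq, hidden)
qkv = qkv_proj(hidden_states)
# Explicit split sizes (rather than a single chunk count) keep the code
# correct even if query/key/value widths ever differ.
query, key, value = torch.split(
    qkv,
    [num_heads * head_dim, num_heads * head_dim, num_heads * head_dim],
    dim=-1,
)
```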
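
Config sketch (referenced above): the usual fix for a mutable default, building a fresh `layer_types` list per instance. Class name and layer-type strings are illustrative.

```python
class MiniMaxConfigSketch:
    """Sketch: avoid sharing one mutable list default across config instances."""

    def __init__(self, num_hidden_layers: int = 4, layer_types=None):
        self.num_hidden_layers = num_hidden_layers
        if layer_types is None:
            # Build a fresh list per instance; a list used as a default
            # argument would be shared (and mutable) across every config.
            layer_types = [
                "linear_attention" if i % 2 else "full_attention"
                for i in range(num_hidden_layers)
            ]
        self.layer_types = layer_types
```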

---------

Co-authored-by: Shakib-IO <shakib.khan17@northsouth.edu>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-06-04 09:38:40 +02:00
ar Fixed broken links (#37466) 2025-04-14 14:16:07 +01:00
de Transformers cli clean command (#37657) 2025-04-30 12:15:43 +01:00
en Add support for MiniMax's MiniMax-Text-01 (#35831) 2025-06-04 09:38:40 +02:00
es Transformers cli clean command (#37657) 2025-04-30 12:15:43 +01:00
fr [agents] remove agents 🧹 (#37368) 2025-04-11 18:42:37 +01:00
hi [i18n-HI] Translated TFLite page to Hindi (#34572) 2024-11-04 09:40:30 -08:00
it Fix typos (#37978) 2025-05-06 14:45:20 +01:00
ja Expose AutoModelForTimeSeriesPrediction for import (#38307) 2025-05-23 13:09:29 +00:00
ko [generate] move SinkCache to a custom_generate repo (#38399) 2025-06-02 12:13:30 +02:00
ms [agents] remove agents 🧹 (#37368) 2025-04-11 18:42:37 +01:00
pt Transformers cli clean command (#37657) 2025-04-30 12:15:43 +01:00
te Fix typos in translated quicktour docs (#35302) 2024-12-17 09:32:00 -08:00
tr Translate index.md to Turkish (#27093) 2023-11-08 08:35:20 -05:00
zh Translating model_doc/bert.md to Chinese (#37806) 2025-05-19 10:14:57 -07:00
_config.py [#29174] ImportError Fix: Trainer with PyTorch requires accelerate>=0.20.1 Fix (#29888) 2024-04-08 14:21:16 +01:00