NielsRogge
dabe855668
[Mistral, Mixtral] Improve docs ( #29084 )
...
* Improve docs
* Improve chat template
2024-02-22 11:48:01 +01:00
Klaus Hipp
721ee783ca
[Docs] Fix spelling and grammar mistakes ( #28825 )
...
* Fix typos and grammar mistakes in docs and examples
* Fix typos in docstrings and comments
* Fix spelling of `tokenizer` in model tests
* Remove erroneous spaces in decorators
* Remove extra spaces in Markdown link texts
2024-02-02 08:45:00 +01:00
Francisco Kurucz
3724156b4d
Fix load correct tokenizer in Mixtral model documentation ( #28437 )
2024-01-10 18:09:06 +01:00
Aaron Jimenez
38611086d2
[docs] Fix mistral link in mixtral.md ( #28143 )
...
Fix mistral link in mixtral.md
2023-12-19 10:34:14 -08:00
Aeneas Stankowski
7f2a8f92e4
Spelling correction ( #28110 )
...
Update mixtral.md
correct minor typo in overview
2023-12-18 14:04:05 +00:00
Timon Käch
5cec306cdc
Fix parameter count in readme for mixtral 45b ( #27945 )
...
fix parameter count in readme
2023-12-11 14:58:48 +00:00
Arthur
accccdd008
[Add Mixtral
] Adds support for the Mixtral MoE ( #27942 )
...
* up
* up
* test
* logits ok
* up
* up
* few fixes
* conversion script
* up
* nits
* nits
* update
* nuke
* more updates
* nites
* fix many issues
* nit
* scatter
* nit
* nuke megablocks
* nits
* fix conversion script
* nit
* remove
* nits
* nit
* update
* oupsssss
* change
* nits device
* nits
* fixup
* update
* merge
* add copied from
* fix the copy mentions
* update tests
* more fixes
* nits
* conversion script
* add parts of the readme
* Update tests/models/mixtral/test_modeling_mixtral.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* new test + conversion script
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Apply suggestions from code review
* fix
* fix copies
* fix copies
* ooops
* fix config
* Apply suggestions from code review
* fix nits
* nit
* add copies
* add batched tests
* docs
* fix flash attention
* let's add more verbose
* add correct outputs
* support router ouptus
* ignore copies where needed
* fix
* cat list if list is given for now
* nits
* Update docs/source/en/model_doc/mixtral.md
* finish router refactoring
* fix forward
* fix expected values
* nits
* fixup
* fix
* fix bug
* fix
* fix dtype mismatch
* fix
* grrr grrr I support item assignment
* fix CI
* docs
* fixup
* remove some copied form
* fix weird diff
* skip doctest fast on the config and modeling
* mark that is supports flash attention in the doc
* update
* Update src/transformers/models/mixtral/modeling_mixtral.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
* Update docs/source/en/model_doc/mixtral.md
Co-authored-by: Lysandre Debut <hi@lysand.re>
* revert router logits config issue
* update doc accordingly
* Update src/transformers/models/mixtral/convert_mixtral_weights_to_hf.py
* nits
* use torch testing asssert close
* fixup
* doc nits
---------
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
2023-12-11 12:50:27 +01:00