Lorenzo Verardo
a25037beb9
MixtralSparseMoeBlock: add gate jitter (#29865)
This commit adds gate jitter to MixtralSparseMoeBlock's input before it is
passed through the MoE layer, when the option is enabled (sketched below).
2024-03-27 16:14:26 +01:00
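A minimal sketch of what that jitter does, assuming the noise amplitude comes from a config value (the helper name and signature below are illustrative, not the transformers API): during training, the block's input is scaled element-wise by uniform noise in [1 - jitter, 1 + jitter] before routing.

```python
import torch

def apply_gate_jitter(hidden_states: torch.Tensor, jitter_noise: float, training: bool) -> torch.Tensor:
    """Scale the MoE block input by uniform noise in [1 - jitter, 1 + jitter].

    Regularizes the router during training; a no-op at inference time or
    when jitter_noise == 0.
    """
    if training and jitter_noise > 0:
        # Element-wise multiplicative noise, drawn fresh for every forward pass.
        noise = torch.empty_like(hidden_states).uniform_(1.0 - jitter_noise, 1.0 + jitter_noise)
        hidden_states = hidden_states * noise
    return hidden_states
```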
Khai Mai
c5c69096b3
Exclude the load balancing loss of padding tokens in Mixtral-8x7B (#28517)
* fix the function load_balancing_loss_func in Mixtral_Moe to include attention_mask
* format code using black and ruff
* skip computing mask if attention_mask=None
* add tests for load balancing loss Mixtral-Moe
* fix assert loss is different in mixtral_test
* fix pad_len
* use assertNotAlmostEqual and print to debug
* remove print for debug
* minor updates
* reduce rtol and atol
2024-01-24 10:12:14 +01:00
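A simplified sketch of the masked load-balancing loss described above (shapes and the function signature are illustrative, not the exact transformers code): padding positions are dropped from both the dispatch fractions and the mean router probabilities, so padded batches no longer skew the auxiliary loss.

```python
import torch
import torch.nn.functional as F

def load_balancing_loss(router_logits, num_experts, top_k=2, attention_mask=None):
    """Switch-style auxiliary loss that skips padding tokens when a mask is given.

    router_logits: (num_tokens, num_experts)
    attention_mask: optional (num_tokens,) tensor, 1 for real tokens, 0 for padding.
    """
    routing_weights = F.softmax(router_logits, dim=-1)                # P(expert | token)
    _, selected_experts = torch.topk(routing_weights, top_k, dim=-1)  # (num_tokens, top_k)
    # (num_tokens, num_experts): 1 where the token is dispatched to that expert
    expert_mask = F.one_hot(selected_experts, num_experts).sum(dim=1).float()

    if attention_mask is None:
        tokens_per_expert = expert_mask.mean(dim=0)           # f_i over all tokens
        router_prob_per_expert = routing_weights.mean(dim=0)  # P_i over all tokens
    else:
        mask = attention_mask.float().unsqueeze(-1)           # (num_tokens, 1)
        denom = mask.sum()
        tokens_per_expert = (expert_mask * mask).sum(dim=0) / denom
        router_prob_per_expert = (routing_weights * mask).sum(dim=0) / denom

    return num_experts * (tokens_per_expert * router_prob_per_expert).sum()
```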
liangxuZhang
e768616afa
Fix load balancing loss func for mixtral (#28256)
* Correct the implementation of auxiliary loss of mixtral
* correct the implementation of auxiliary loss of mixtral
* Implement a simpler calculation method
---------
Co-authored-by: zhangliangxu3 <zhangliangxu3@jd.com>
2024-01-11 16:16:12 +01:00
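For context, the quantity being corrected here is the Switch Transformers load-balancing term: with N experts, f_i the fraction of tokens dispatched to expert i, and P_i the mean router probability assigned to expert i, the loss is N * sum_i(f_i * P_i). A toy check of its key property:

```python
import torch

# A perfectly balanced router over N experts attains the minimum value, 1.0.
N = 8
f = torch.full((N,), 1.0 / N)  # fraction of tokens dispatched to each expert
P = torch.full((N,), 1.0 / N)  # mean router probability per expert
print((N * torch.sum(f * P)).item())  # 1.0
```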
Arthur
f9a98c476c
[Mixtral & Mistral] Add support for sdpa (#28133)
* some nits
* update test
* add support sdpa
* remove some dummy inputs
* all good
* style
* nits
* fixes
* fix more copies
* nits
* styling
* fix
* Update src/transformers/models/mistral/modeling_mistral.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* add a slow test just to be sure
* fixup
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-12-21 12:38:22 +01:00
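Usage sketch: with this PR, Mistral and Mixtral can opt into PyTorch's scaled_dot_product_attention kernel at load time via the attn_implementation argument (requires a sufficiently recent PyTorch; the checkpoint name is just an example):

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1",
    torch_dtype=torch.float16,
    attn_implementation="sdpa",  # use torch.nn.functional.scaled_dot_product_attention
)
```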
Arthur
4a04b4ccca
[Mixtral] Fix loss + nits (#28115)
* default config should not use sliding window
* update the doc
* nits
* add a proper test
* update
* update
* update expected value
* Update src/transformers/tokenization_utils_fast.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* convert to float
* average then N**2
* comment
* revert nit
* good to go
* fixup
* Update tests/models/mixtral/test_modeling_mixtral.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
* revert unrelated change
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
2023-12-19 17:31:54 +01:00
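One of the fixes above is that the default config should not use sliding-window attention. A quick check, assuming the post-fix default of sliding_window=None:

```python
from transformers import MixtralConfig

config = MixtralConfig()
print(config.sliding_window)  # None: full attention by default after this fix
```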
Arthur
accccdd008
[Add Mixtral] Adds support for the Mixtral MoE (#27942)
* up
* up
* test
* logits ok
* up
* up
* few fixes
* conversion script
* up
* nits
* nits
* update
* nuke
* more updates
* nits
* fix many issues
* nit
* scatter
* nit
* nuke megablocks
* nits
* fix conversion script
* nit
* remove
* nits
* nit
* update
* oupsssss
* change
* nits device
* nits
* fixup
* update
* merge
* add copied from
* fix the copy mentions
* update tests
* more fixes
* nits
* conversion script
* add parts of the readme
* Update tests/models/mixtral/test_modeling_mixtral.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* new test + conversion script
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Apply suggestions from code review
* fix
* fix copies
* fix copies
* ooops
* fix config
* Apply suggestions from code review
* fix nits
* nit
* add copies
* add batched tests
* docs
* fix flash attention
* let's add more verbose output
* add correct outputs
* support router outputs
* ignore copies where needed
* fix
* cat list if list is given for now
* nits
* Update docs/source/en/model_doc/mixtral.md
* finish router refactoring
* fix forward
* fix expected values
* nits
* fixup
* fix
* fix bug
* fix
* fix dtype mismatch
* fix
* grrr grrr I support item assignment
* fix CI
* docs
* fixup
* remove some copied form
* fix weird diff
* skip doctest fast on the config and modeling
* mark that is supports flash attention in the doc
* update
* Update src/transformers/models/mixtral/modeling_mixtral.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
* Update docs/source/en/model_doc/mixtral.md
Co-authored-by: Lysandre Debut <hi@lysand.re>
* revert router logits config issue
* update doc accordingly
* Update src/transformers/models/mixtral/convert_mixtral_weights_to_hf.py
* nits
* use torch.testing.assert_close
* fixup
* doc nits
---------
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
2023-12-11 12:50:27 +01:00
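A simplified sketch of the top-2 routing this PR introduces (illustrative only: a plain SiLU MLP stands in for Mixtral's gated experts, and the loops are written for clarity rather than speed). Each token's router logits pick top_k experts, the chosen probabilities are renormalized, and the expert outputs are combined with those weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoeBlockSketch(nn.Module):
    def __init__(self, hidden_size=64, ffn_size=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)  # router
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(hidden_size, ffn_size), nn.SiLU(), nn.Linear(ffn_size, hidden_size))
            for _ in range(num_experts)
        ])

    def forward(self, hidden_states):  # (num_tokens, hidden_size)
        router_logits = self.gate(hidden_states)
        probs = F.softmax(router_logits, dim=-1)
        weights, selected = torch.topk(probs, self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over chosen experts
        out = torch.zeros_like(hidden_states)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                idx = (selected[:, k] == e).nonzero(as_tuple=True)[0]
                if idx.numel():
                    out[idx] += weights[idx, k, None] * expert(hidden_states[idx])
        # router_logits are returned as well so the auxiliary load-balancing
        # loss from the entries above can be computed on them.
        return out, router_logits

x = torch.randn(10, 64)
y, logits = SparseMoeBlockSketch()(x)  # y: (10, 64), logits: (10, 8)
```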