transformers/docs/source/en/tasks
Arthur accccdd008
[Add Mixtral] Adds support for the Mixtral MoE (#27942)
* up

* up

* test

* logits ok

* up

* up

* few fixes

* conversion script

* up

* nits

* nits

* update

* nuke

* more updates

* nites

* fix many issues

* nit

* scatter

* nit

* nuke megablocks

* nits

* fix conversion script

* nit

* remove

* nits

* nit

* update

* oupsssss

* change

* nits device

* nits

* fixup

* update

* merge

* add copied from

* fix the copy mentions

* update tests

* more fixes

* nits

* conversion script

* add parts of the readme

* Update tests/models/mixtral/test_modeling_mixtral.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* new test + conversion script

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Apply suggestions from code review

* fix

* fix copies

* fix copies

* ooops

* fix config

* Apply suggestions from code review

* fix nits

* nit

* add copies

* add batched tests

* docs

* fix flash attention

* let's add more verbose

* add correct outputs

* support router ouptus

* ignore copies where needed

* fix

* cat list if list is given for now

* nits

* Update docs/source/en/model_doc/mixtral.md

* finish router refactoring

* fix forward

* fix expected values

* nits

* fixup

* fix

* fix bug

* fix

* fix dtype mismatch

* fix

* grrr grrr I support item assignment

* fix CI

* docs

* fixup

* remove some copied form

* fix weird diff

* skip doctest fast on the config and modeling

* mark that is supports flash attention in the doc

* update

* Update src/transformers/models/mixtral/modeling_mixtral.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* Update docs/source/en/model_doc/mixtral.md

Co-authored-by: Lysandre Debut <hi@lysand.re>

* revert router logits config issue

* update doc accordingly

* Update src/transformers/models/mixtral/convert_mixtral_weights_to_hf.py

* nits

* use torch testing asssert close

* fixup

* doc nits

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
2023-12-11 12:50:27 +01:00
asr.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
audio_classification.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
document_question_answering.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
idefics.md Translate en/tasks folder docs to Japanese 🇯🇵 (#27098) 2023-12-04 14:10:54 -08:00
image_captioning.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
image_classification.md Pvt model (#24720) 2023-07-24 15:34:19 +01:00
image_to_image.md Image-to-Image Task Guide (#26595) 2023-10-16 15:12:03 +02:00
knowledge_distillation_for_image_classification.md Knowledge distillation for vision guide (#25619) 2023-10-18 04:42:32 -07:00
language_modeling.md [Add Mixtral] Adds support for the Mixtral MoE (#27942) 2023-12-11 12:50:27 +01:00
masked_language_modeling.md fixed broken link (#27560) 2023-11-17 08:20:42 -08:00
monocular_depth_estimation.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
multiple_choice.md Add Multi Resolution Analysis (MRA) (New PR) (#24513) 2023-07-10 10:50:43 +01:00
object_detection.md [doc] fixed indices in obj detection example (#26343) 2023-09-22 10:29:27 -04:00
prompting.md [docs] LLM prompting guide (#26274) 2023-10-12 08:48:01 -04:00
question_answering.md [MPT] Add MosaicML's MPT model to transformers (#24629) 2023-07-25 14:32:40 +02:00
semantic_segmentation.md [docs] Custom semantic segmentation dataset (#27859) 2023-12-07 10:47:35 -08:00
sequence_classification.md [Add Mixtral] Adds support for the Mixtral MoE (#27942) 2023-12-11 12:50:27 +01:00
summarization.md Translate en/tasks folder docs to Japanese 🇯🇵 (#27098) 2023-12-04 14:10:54 -08:00
text-to-speech.md Reflect RoCm support in the documentation (#27636) 2023-11-25 00:59:17 +09:00
token_classification.md Add Phi-1 and Phi-1_5 (#26170) 2023-11-10 15:28:30 +00:00
translation.md Translate en/tasks folder docs to Japanese 🇯🇵 (#27098) 2023-12-04 14:10:54 -08:00
video_classification.md Add ViViT (#22518) 2023-07-11 14:04:04 +01:00
visual_question_answering.md VQA task guide (#25244) 2023-08-09 08:29:06 -04:00
zero_shot_image_classification.md [docs] Fix model reference in zero shot image classification example (#26206) 2023-09-19 00:45:12 +02:00
zero_shot_object_detection.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00