mirror of https://github.com/huggingface/transformers.git
synced 2025-07-24 23:08:57 +06:00
* up * up * test * logits ok * up * up * few fixes * conversion script * up * nits * nits * update * nuke * more updates * nites * fix many issues * nit * scatter * nit * nuke megablocks * nits * fix conversion script * nit * remove * nits * nit * update * oupsssss * change * nits device * nits * fixup * update * merge * add copied from * fix the copy mentions * update tests * more fixes * nits * conversion script * add parts of the readme * Update tests/models/mixtral/test_modeling_mixtral.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * new test + conversion script * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Apply suggestions from code review * fix * fix copies * fix copies * ooops * fix config * Apply suggestions from code review * fix nits * nit * add copies * add batched tests * docs * fix flash attention * let's add more verbose * add correct outputs * support router ouptus * ignore copies where needed * fix * cat list if list is given for now * nits * Update docs/source/en/model_doc/mixtral.md * finish router refactoring * fix forward * fix expected values * nits * fixup * fix * fix bug * fix * fix dtype mismatch * fix * grrr grrr I support item assignment * fix CI * docs * fixup * remove some copied form * fix weird diff * skip doctest fast on the config and modeling * mark that is supports flash attention in the doc * update * Update src/transformers/models/mixtral/modeling_mixtral.py Co-authored-by: Lysandre Debut <hi@lysand.re> * Update docs/source/en/model_doc/mixtral.md Co-authored-by: Lysandre Debut <hi@lysand.re> * revert router logits config issue * update doc accordingly * Update src/transformers/models/mixtral/convert_mixtral_weights_to_hf.py * nits * use torch testing asssert close * fixup * doc nits --------- Co-authored-by: younesbelkada <younesbelkada@gmail.com> Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> 
Co-authored-by: Lysandre Debut <hi@lysand.re> |
||
---|---|---|
internal | ||
main_classes | ||
model_doc | ||
tasks | ||
_config.py | ||
_redirects.yml | ||
_toctree.yml | ||
accelerate.md | ||
add_new_model.md | ||
add_new_pipeline.md | ||
add_tensorflow_model.md | ||
attention.md | ||
autoclass_tutorial.md | ||
benchmarks.md | ||
bertology.md | ||
big_models.md | ||
chat_templating.md | ||
community.md | ||
contributing.md | ||
create_a_model.md | ||
custom_models.md | ||
custom_tools.md | ||
debugging.md | ||
fast_tokenizers.md | ||
generation_strategies.md | ||
glossary.md | ||
hpo_train.md | ||
index.md | ||
installation.md | ||
llm_tutorial_optimization.md | ||
llm_tutorial.md | ||
model_memory_anatomy.md | ||
model_sharing.md | ||
model_summary.md | ||
multilingual.md | ||
notebooks.md | ||
pad_truncation.md | ||
peft.md | ||
perf_hardware.md | ||
perf_infer_cpu.md | ||
perf_infer_gpu_one.md | ||
perf_torch_compile.md | ||
perf_train_cpu_many.md | ||
perf_train_cpu.md | ||
perf_train_gpu_many.md | ||
perf_train_gpu_one.md | ||
perf_train_special.md | ||
perf_train_tpu_tf.md | ||
perf_train_tpu.md | ||
performance.md | ||
perplexity.md | ||
philosophy.md | ||
pipeline_tutorial.md | ||
pipeline_webserver.md | ||
pr_checks.md | ||
preprocessing.md | ||
quantization.md | ||
quicktour.md | ||
run_scripts.md | ||
sagemaker.md | ||
serialization.md | ||
task_summary.md | ||
tasks_explained.md | ||
testing.md | ||
tf_xla.md | ||
tflite.md | ||
tokenizer_summary.md | ||
torchscript.md | ||
training.md | ||
transformers_agents.md | ||
troubleshooting.md |