Mirror of https://github.com/huggingface/transformers.git (synced 2025-07-19 12:38:23 +06:00)
* Initial commit
* update modeling code
* update doc
* add necessary functions
* fix imports
* revert changes
* fixup
* more styling to get going
* remove standalone encoder
* update code
* styling
* fix config and model
* update code and some refactoring
* make more tests pass
* Adding NLLB-200 - MoE - 54.5B for No Language Left Behind. Fixes #21300
* fix more common tests
* style
* update testing file
* update
* update
* Router2 doc
* update check config with sparse layer
* add dummy router
* update current conversion script
* create on-the-fly conversion script
* Fixup
* style
* style 2
* fix empty return
* fix return
* Update default config sparse layers
* easier to create sparse layers
* update
* update conversion script
* update modeling
* add to toctree
* styling
* make ruff happy
* update docstring
* update conversion script
* update, will break tests but implementing top2
* update
* ❗local groups are supported here
* ⚠️ Support for local groups is now removed ⚠️ This is because it has to work with model parallelism that we do not support
* finish simplification
* Fix forward
* style
* fixup
* Update modeling and test, refactoring
* update tests
* remove final layer norm as it is done in the FF
* routing works! Logits test added
* nit in test
* remove top1router
* style
* make sure sparse layers are tested. Had to change route_tokens a little bit
* add support for unslip models when converting
* fixup
* style
* update tests
* update test
* REFACTOR
* encoder outputs match!
* style
* update testing
* 🎉 encoder and decoder logits match 🎉
* styling
* update tests
* cleanup tests
* fix router test and CIs
* cleanup
* cleanup test styling
* fix tests
* Finally the generation tests match!
* cleanup
* update test
* style testing file
* remove script
* cleanup
* more cleanup
* nits
* update
* NLLB tokenizer is wrong and will be fixed soon
* use LongTensors
* update tests
* revert some small changes
* fix second expert sampling and batch prioritized routing
* update tests
* finish last tests
* make ruff happy
* update
* ruff again
* style
* Update docs/source/en/model_doc/nllb-moe.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Updates based on review
* style and fix import issue
* nit
* more nits
* cleanup
* styling
* update test_seconde_expert_policy
* fix name
* last nit on the markdown examples

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
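Several of the entries above concern the router ("implementing top2", "fix second expert sampling and batch prioritized routing"). The sketch below illustrates what top-2 routing with stochastic second-expert sampling can look like in plain PyTorch; it is a simplified illustration, not the actual NllbMoe router code, and the function name, tensor shapes, and the renormalized sampling rule are assumptions made for this example.

```python
# Illustrative sketch of top-2 expert routing with stochastic second-expert
# sampling (an assumption-based simplification, not the NllbMoe source).
import torch
import torch.nn.functional as F


def top2_route(hidden_states: torch.Tensor, router_weights: torch.Tensor):
    """hidden_states: (num_tokens, hidden_dim); router_weights: (hidden_dim, num_experts)."""
    # Router logits and a probability distribution over experts for each token.
    router_logits = hidden_states @ router_weights          # (num_tokens, num_experts)
    router_probs = F.softmax(router_logits, dim=-1)

    # First expert: the highest-probability expert for each token.
    top1_prob, top1_idx = router_probs.max(dim=-1)

    # Second expert: the runner-up, found by masking out the first choice.
    masked_probs = router_probs.scatter(-1, top1_idx.unsqueeze(-1), 0.0)
    top2_prob, top2_idx = masked_probs.max(dim=-1)

    # Stochastic second-expert sampling: keep the second expert with a
    # probability proportional to its share of the top-2 routing mass.
    keep_second = torch.rand_like(top2_prob) < top2_prob / (top1_prob + top2_prob)

    return top1_idx, top2_idx, keep_second


# Example usage with made-up shapes: 16 tokens, hidden size 512, 8 experts.
tokens = torch.randn(16, 512)
router = torch.randn(512, 8)
first_expert, second_expert, use_second = top2_route(tokens, router)
```

Batch prioritized routing, also mentioned in the commit history, additionally processes tokens in order of decreasing router confidence when expert capacity is limited; that part is omitted here to keep the sketch short.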
internal/
main_classes/
model_doc/
tasks/
_config.py
_toctree.yml
accelerate.mdx
add_new_model.mdx
add_new_pipeline.mdx
add_tensorflow_model.mdx
attention.mdx
autoclass_tutorial.mdx
benchmarks.mdx
bertology.mdx
big_models.mdx
community.mdx
contributing.md
converting_tensorflow_models.mdx
create_a_model.mdx
custom_models.mdx
debugging.mdx
fast_tokenizers.mdx
generation_strategies.mdx
glossary.mdx
hpo_train.mdx
index.mdx
installation.mdx
migration.mdx
model_sharing.mdx
model_summary.mdx
multilingual.mdx
notebooks.md
pad_truncation.mdx
perf_hardware.mdx
perf_infer_cpu.mdx
perf_infer_gpu_many.mdx
perf_infer_gpu_one.mdx
perf_infer_special.mdx
perf_train_cpu_many.mdx
perf_train_cpu.mdx
perf_train_gpu_many.mdx
perf_train_gpu_one.mdx
perf_train_special.mdx
perf_train_tpu_tf.mdx
perf_train_tpu.mdx
performance.mdx
perplexity.mdx
philosophy.mdx
pipeline_tutorial.mdx
pipeline_webserver.mdx
pr_checks.mdx
preprocessing.mdx
quicktour.mdx
run_scripts.mdx
sagemaker.mdx
serialization.mdx
task_summary.mdx
tasks_explained.mdx
testing.mdx
tf_xla.mdx
tokenizer_summary.mdx
torchscript.mdx
training.mdx
troubleshooting.mdx