transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-16 11:08:23 +06:00

Younes Belkada 163ac3d3ee Add Switch transformers (#19323)	2022-11-15 13:06:45 +01:00

* first commit
* add more comments
* add router v1
* clean up
  - remove `tf` modeling files
* clean up
  - remove `tf` modeling files
* clean up
* v0 routers
* added more router
  - Implemented `ExpertsChooseMaskedRouter`
  - added tests
  - 2 more routers to implement
* last router
* improved docstring
  - completed the docstring in `router.py`
  - added more args in the config
* v0 sparse mlp
* replace wrong naming
* forward pass run
* update MOE layer
* small router update
* fixup
* consistency
* remove scatter router
* remove abstract layer
* update test and model for integration testing
* v1 conversion
* update
* hardcode hack
* all keys match
* add gin conversion, without additional libraries
* update conversion sctipy
* delete router file
* update tests wrt router deletion
* fix router issues
* update expert code
* update, logits match, code needsREFACTORING
* Refactor code
  Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
* add generate tests
  Co-authored-by: younesbelkada <younesbelkada@gmail.com>
* add support for router loss
  Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
* fix forward error
* refactor a bit
* remove `FlaxSwitchTransformers` modules
* more tests pass
* Update code
  Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
* fixup
* fix tests
* fix doc
* fix doc + tokenization
* fix tokenizer test
* fix test
* fix loss output
* update code for backward pass
* add loss support
* update documentation
* fix documentation, clean tokenizer
* more doc fix, cleanup example_switch
* fix failing test
* fix test
* fix test
* fix loss issue
* move layer
* update doc and fix router capacity usage
* fixup
* add sparse mlp index for documentation on hub
* fixup
* test sparse mix architecture
* Apply suggestions from code review
* Update docs/source/en/model_doc/switch_transformers.mdx
* fixup on update
* fix tests
* fix another test
* attempt fix
* Update src/transformers/models/switch_transformers/configuration_switch_transformers.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/switch_transformers/convert_switch_transformers_original_flax_checkpoint_to_pytorch.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* try
* all tests pass
* fix jitter noise
* Apply suggestions from code review
* doc tests pass
* Update src/transformers/models/switch_transformers/modeling_switch_transformers.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/switch_transformers/modeling_switch_transformers.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* remove assert
* change config order
* fix readme japanese
* Apply suggestions from code review
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* remove parallelizable tests + add one liners
* remove ONNX config
* fix nits
  - add `T5Tokenizer` in auto mapping
  - remove `Switch Transformers` from ONNX supported models
* remove `_get_router`
* remove asserts
* add check in test for `router_dtype`
* add `SwitchTransformersConfig` in `run_pipeline_test`
* Update tests/pipelines/test_pipelines_summarization.py
* add huge model conversion script
* fix slow tests
  - add better casting for `Linear8bitLt`
  - remove `torchscript` tests
* add make dir
* style on new script
* fix nits
  - doctest
  - remove `_keys_to_ignore_on_load_unexpected`
* Update src/transformers/models/switch_transformers/configuration_switch_transformers.py
* add google as authors
* fix year
* remove last `assert` statements
* standardize vertical spaces
* fix failing import
* fix another failing test
* Remove strange àuthorized_keys`
* removing todo and padding that is never used

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: ybelkada <younes@huggingface.co>
Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Arthur Zucker <arthur@huggingface.co>

internal	Generate: move generation_.py src files into generation/.py (#20096)	2022-11-09 15:34:08 +00:00
main_classes	Generate: move generation_.py src files into generation/.py (#20096)	2022-11-09 15:34:08 +00:00
model_doc	Add Switch transformers (#19323)	2022-11-15 13:06:45 +01:00
tasks	docs: Resolve many typos in the English docs (#20088)	2022-11-07 09:19:04 -05:00
_config.py	Enable doc in Spanish (#16518)	2022-04-04 10:25:46 -04:00
_toctree.yml	Add Switch transformers (#19323)	2022-11-15 13:06:45 +01:00
accelerate.mdx	✨ update to use interlibrary links instead of Markdown (#18500)	2022-08-08 10:53:52 -05:00
add_new_model.mdx	add small updates only (#19847)	2022-10-24 10:18:20 -07:00
add_new_pipeline.mdx	Update add_new_pipeline.mdx (#18224)	2022-07-21 07:55:30 +02:00
add_tensorflow_model.mdx	docs: Resolve many typos in the English docs (#20088)	2022-11-07 09:19:04 -05:00
autoclass_tutorial.mdx	Mention TF and Flax checkpoints (#18894)	2022-09-05 11:09:39 +02:00
benchmarks.mdx	Enable doc in Spanish (#16518)	2022-04-04 10:25:46 -04:00
bertology.mdx	Enable doc in Spanish (#16518)	2022-04-04 10:25:46 -04:00
big_models.mdx	docs: Resolve many typos in the English docs (#20088)	2022-11-07 09:19:04 -05:00
community.mdx	Enable doc in Spanish (#16518)	2022-04-04 10:25:46 -04:00
contributing.md	Enable doc in Spanish (#16518)	2022-04-04 10:25:46 -04:00
converting_tensorflow_models.mdx	Docs - Guide to add a new TensorFlow model (#19256)	2022-09-30 20:30:38 +01:00
create_a_model.mdx	Enable doc in Spanish (#16518)	2022-04-04 10:25:46 -04:00
custom_models.mdx	Replace awkward timm link with the expected one (#20109)	2022-11-07 13:57:39 -05:00
debugging.mdx	[doc] debug: fix import (#19042)	2022-09-14 16:29:58 -07:00
fast_tokenizers.mdx	Enable doc in Spanish (#16518)	2022-04-04 10:25:46 -04:00
glossary.mdx	add cv + audio labels (#20114)	2022-11-09 07:40:15 -08:00
hpo_train.mdx	update doc for perf_train_cpu_many (#19506)	2022-10-11 22:54:19 -04:00
index.mdx	Add Switch transformers (#19323)	2022-11-15 13:06:45 +01:00
installation.mdx	Move cache folder to huggingface/hub for consistency with hf_hub (#18492)	2022-08-05 13:14:00 -04:00
migration.mdx	Enable doc in Spanish (#16518)	2022-04-04 10:25:46 -04:00
model_sharing.mdx	Just re-reading the whole doc every couple of months 😬 (#18489)	2022-08-06 09:38:55 +02:00
model_summary.mdx	Enable doc in Spanish (#16518)	2022-04-04 10:25:46 -04:00
multilingual.mdx	Enable doc in Spanish (#16518)	2022-04-04 10:25:46 -04:00
notebooks.md	Enable doc in Spanish (#16518)	2022-04-04 10:25:46 -04:00
pad_truncation.mdx	Enable doc in Spanish (#16518)	2022-04-04 10:25:46 -04:00
perf_hardware.mdx	[WIP] [doc] performance/scalability revamp (#15723)	2022-05-16 13:36:41 +02:00
perf_infer_cpu.mdx	fix jit trace error for model forward sequence is not aligned with jit.trace tuple input sequence, update related doc (#19891)	2022-11-03 10:50:03 -04:00
perf_infer_gpu_many.mdx	Update perf_infer_gpu_many.mdx (#18744)	2022-08-24 10:37:52 +02:00
perf_infer_gpu_one.mdx	[bnb] Move documentation (#18671)	2022-08-18 17:34:48 +02:00
perf_infer_special.mdx	Improve performance docs (#17750)	2022-06-23 14:51:54 +02:00
perf_train_cpu_many.mdx	update doc for perf_train_cpu_many (#19506)	2022-10-11 22:54:19 -04:00
perf_train_cpu.mdx	fix jit trace error for model forward sequence is not aligned with jit.trace tuple input sequence, update related doc (#19891)	2022-11-03 10:50:03 -04:00
perf_train_gpu_many.mdx	Improve performance docs (#17750)	2022-06-23 14:51:54 +02:00
perf_train_gpu_one.mdx	docs: Resolve many typos in the English docs (#20088)	2022-11-07 09:19:04 -05:00
perf_train_special.mdx	Improve performance docs (#17750)	2022-06-23 14:51:54 +02:00
perf_train_tpu.mdx	Improve performance docs (#17750)	2022-06-23 14:51:54 +02:00
performance.mdx	Improve performance docs (#17750)	2022-06-23 14:51:54 +02:00
perplexity.mdx	Fix incorrect size of input for 1st strided window length in `Perplexity of fixed-length models` (#18906)	2022-09-06 15:20:12 -04:00
philosophy.mdx	Update philosophy to include other preprocessing classes (#18550)	2022-08-10 13:20:39 -05:00
pipeline_tutorial.mdx	Generate: move generation_.py src files into generation/.py (#20096)	2022-11-09 15:34:08 +00:00
pr_checks.mdx	📝 update documentation build section (#18548)	2022-08-09 18:22:55 -05:00
preprocessing.mdx	AutoImageProcessor (#20111)	2022-11-08 19:54:41 +00:00
quicktour.mdx	Fix doctest (#20023)	2022-11-02 19:37:25 +01:00
run_scripts.mdx	Just re-reading the whole doc every couple of months 😬 (#18489)	2022-08-06 09:38:55 +02:00
sagemaker.mdx	Enable doc in Spanish (#16518)	2022-04-04 10:25:46 -04:00
serialization.mdx	add MobileNetV2 model (#17845)	2022-11-14 01:00:10 -05:00
task_summary.mdx	Generate: move generation_.py src files into generation/.py (#20096)	2022-11-09 15:34:08 +00:00
testing.mdx	docs: Resolve many typos in the English docs (#20088)	2022-11-07 09:19:04 -05:00
tokenizer_summary.mdx	Update tokenizer_summary.mdx (#20135)	2022-11-15 01:18:13 +01:00
torchscript.mdx	Breakup export guide (#19271)	2022-10-03 13:18:29 -07:00
training.mdx	Update training.mdx (#19791)	2022-10-21 09:46:44 -04:00
troubleshooting.mdx	Enable doc in Spanish (#16518)	2022-04-04 10:25:46 -04:00