Mirror of https://github.com/huggingface/transformers.git
Synced 2025-07-16 11:08:23 +06:00
* first commit
* add more comments
* add router v1
* clean up - remove `tf` modeling files
* clean up - remove `tf` modeling files
* clean up
* v0 routers
* added more routers
  - implemented `ExpertsChooseMaskedRouter`
  - added tests
  - 2 more routers to implement
* last router
* improved docstring
  - completed the docstring in `router.py`
  - added more args in the config
* v0 sparse mlp
* replace wrong naming
* forward pass runs
* update MoE layer
* small router update
* fixup
* consistency
* remove scatter router
* remove abstract layer
* update test and model for integration testing
* v1 conversion
* update
* hardcode hack
* all keys match
* add gin conversion, without additional libraries
* update conversion script
* delete router file
* update tests wrt router deletion
* fix router issues
* update expert code
* update: logits match, code needs refactoring
* Refactor code
  Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
* add generate tests
  Co-authored-by: younesbelkada <younesbelkada@gmail.com>
* add support for router loss
  Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
* fix forward error
* refactor a bit
* remove `FlaxSwitchTransformers` modules
* more tests pass
* Update code
  Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
* fixup
* fix tests
* fix doc
* fix doc + tokenization
* fix tokenizer test
* fix test
* fix loss output
* update code for backward pass
* add loss support
* update documentation
* fix documentation, clean tokenizer
* more doc fixes, clean up example_switch
* fix failing test
* fix test
* fix test
* fix loss issue
* move layer
* update doc and fix router capacity usage
* fixup
* add sparse mlp index for documentation on hub
* fixup
* test sparse mix architecture
* Apply suggestions from code review
* Update docs/source/en/model_doc/switch_transformers.mdx
* fixup on update
* fix tests
* fix another test
* attempt fix
* Update src/transformers/models/switch_transformers/configuration_switch_transformers.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/switch_transformers/convert_switch_transformers_original_flax_checkpoint_to_pytorch.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* try
* all tests pass
* fix jitter noise
* Apply suggestions from code review
* doc tests pass
* Update src/transformers/models/switch_transformers/modeling_switch_transformers.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/switch_transformers/modeling_switch_transformers.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* remove assert
* change config order
* fix readme japanese
* Apply suggestions from code review
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* remove parallelizable tests + add one-liners
* remove ONNX config
* fix nits
  - add `T5Tokenizer` in auto mapping
  - remove `Switch Transformers` from ONNX-supported models
* remove `_get_router`
* remove asserts
* add check in test for `router_dtype`
* add `SwitchTransformersConfig` in `run_pipeline_test`
* Update tests/pipelines/test_pipelines_summarization.py
* add huge-model conversion script
* fix slow tests
  - add better casting for `Linear8bitLt`
  - remove `torchscript` tests
* add make dir
* style on new script
* fix nits
  - doctest
  - remove `_keys_to_ignore_on_load_unexpected`
* Update src/transformers/models/switch_transformers/configuration_switch_transformers.py
* add Google as authors
* fix year
* remove last `assert` statements
* standardize vertical spaces
* fix failing import
* fix another failing test
* remove strange `authorized_keys`
* remove todo and padding that is never used

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: ybelkada <younes@huggingface.co>
Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Arthur Zucker <arthur@huggingface.co>
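The commit log above repeatedly touches the router: top-1 routing, jitter noise, and the auxiliary router (load-balancing) loss. As a rough orientation only, here is a minimal NumPy sketch of that routing scheme; the actual implementation lives in `modeling_switch_transformers.py` and uses PyTorch, and the function and parameter names below are illustrative, not the library's API.

```python
import numpy as np

def switch_route(hidden, router_w, jitter_eps=0.0, rng=None):
    """Top-1 ("switch") routing sketch: each token is sent to a single expert.

    hidden:   (num_tokens, d_model) token representations
    router_w: (d_model, num_experts) router weight matrix
    Returns (expert_index, router_probs, aux_loss).
    """
    if jitter_eps > 0.0 and rng is not None:
        # Multiplicative jitter noise on the router inputs (training-time only).
        hidden = hidden * rng.uniform(1.0 - jitter_eps, 1.0 + jitter_eps, hidden.shape)

    logits = hidden @ router_w                    # (num_tokens, num_experts)
    logits -= logits.max(axis=-1, keepdims=True)  # numerically stable softmax
    probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

    expert_index = probs.argmax(axis=-1)          # top-1 expert per token

    # Load-balancing auxiliary loss:
    # num_experts * sum_i (fraction of tokens dispatched to expert i)
    #                    * (mean router probability of expert i)
    num_experts = router_w.shape[1]
    dispatch_frac = np.bincount(expert_index, minlength=num_experts) / len(expert_index)
    mean_prob = probs.mean(axis=0)
    aux_loss = num_experts * float(dispatch_frac @ mean_prob)
    return expert_index, probs, aux_loss
```

The auxiliary loss is minimized when tokens are spread uniformly across experts, which is why several commits above ("add support for router loss", "fix loss issue") center on getting it into the backward pass.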
albert.mdx
auto.mdx
bart.mdx
barthez.mdx
bartpho.mdx
beit.mdx
bert-generation.mdx
bert-japanese.mdx
bert.mdx
bertweet.mdx
big_bird.mdx
bigbird_pegasus.mdx
blenderbot-small.mdx
blenderbot.mdx
bloom.mdx
bort.mdx
byt5.mdx
camembert.mdx
canine.mdx
clip.mdx
clipseg.mdx
codegen.mdx
conditional_detr.mdx
convbert.mdx
convnext.mdx
cpm.mdx
ctrl.mdx
cvt.mdx
data2vec.mdx
deberta-v2.mdx
deberta.mdx
decision_transformer.mdx
deformable_detr.mdx
deit.mdx
detr.mdx
dialogpt.mdx
distilbert.mdx
dit.mdx
donut.mdx
dpr.mdx
dpt.mdx
electra.mdx
encoder-decoder.mdx
ernie.mdx
esm.mdx
flan-t5.mdx
flaubert.mdx
flava.mdx
fnet.mdx
fsmt.mdx
funnel.mdx
glpn.mdx
gpt_neo.mdx
gpt_neox_japanese.mdx
gpt_neox.mdx
gpt2.mdx
gptj.mdx
groupvit.mdx
herbert.mdx
hubert.mdx
ibert.mdx
imagegpt.mdx
jukebox.mdx
layoutlm.mdx
layoutlmv2.mdx
layoutlmv3.mdx
layoutxlm.mdx
led.mdx
levit.mdx
lilt.mdx
longformer.mdx
longt5.mdx
luke.mdx
lxmert.mdx
m2m_100.mdx
marian.mdx
markuplm.mdx
maskformer.mdx
mbart.mdx
mctct.mdx
megatron_gpt2.mdx
megatron-bert.mdx
mluke.mdx
mobilebert.mdx
mobilenet_v2.mdx
mobilevit.mdx
mpnet.mdx
mt5.mdx
mvp.mdx
nezha.mdx
nllb.mdx
nystromformer.mdx
openai-gpt.mdx
opt.mdx
owlvit.mdx
pegasus_x.mdx
pegasus.mdx
perceiver.mdx
phobert.mdx
plbart.mdx
poolformer.mdx
prophetnet.mdx
qdqbert.mdx
rag.mdx
realm.mdx
reformer.mdx
regnet.mdx
rembert.mdx
resnet.mdx
retribert.mdx
roberta.mdx
roc_bert.mdx
roformer.mdx
segformer.mdx
sew-d.mdx
sew.mdx
speech_to_text_2.mdx
speech_to_text.mdx
speech-encoder-decoder.mdx
splinter.mdx
squeezebert.mdx
swin.mdx
swinv2.mdx
switch_transformers.mdx
t5.mdx
t5v1.1.mdx
table-transformer.mdx
tapas.mdx
tapex.mdx
time_series_transformer.mdx
trajectory_transformer.mdx
transfo-xl.mdx
trocr.mdx
ul2.mdx
unispeech-sat.mdx
unispeech.mdx
van.mdx
videomae.mdx
vilt.mdx
vision-encoder-decoder.mdx
vision-text-dual-encoder.mdx
visual_bert.mdx
vit_mae.mdx
vit_msn.mdx
vit.mdx
wav2vec2_phoneme.mdx
wav2vec2-conformer.mdx
wav2vec2.mdx
wavlm.mdx
whisper.mdx
xclip.mdx
xglm.mdx
xlm-prophetnet.mdx
xlm-roberta-xl.mdx
xlm-roberta.mdx
xlm.mdx
xlnet.mdx
xls_r.mdx
xlsr_wav2vec2.mdx
yolos.mdx
yoso.mdx