Mirror of https://github.com/huggingface/transformers.git
Synced 2025-07-16 11:08:23 +06:00
* first commit
* add more comments
* add router v1
* clean up - remove `tf` modeling files
* clean up - remove `tf` modeling files
* clean up
* v0 routers
* added more routers
  - implemented `ExpertsChooseMaskedRouter`
  - added tests
  - 2 more routers to implement
* last router
* improved docstring
  - completed the docstring in `router.py`
  - added more args in the config
* v0 sparse mlp
* replace wrong naming
* forward pass runs
* update MoE layer
* small router update
* fixup
* consistency
* remove scatter router
* remove abstract layer
* update test and model for integration testing
* v1 conversion
* update
* hardcode hack
* all keys match
* add gin conversion, without additional libraries
* update conversion script
* delete router file
* update tests wrt router deletion
* fix router issues
* update expert code
* update: logits match, code needs refactoring
* Refactor code
  Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
* add generate tests
  Co-authored-by: younesbelkada <younesbelkada@gmail.com>
* add support for router loss
  Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
* fix forward error
* refactor a bit
* remove `FlaxSwitchTransformers` modules
* more tests pass
* Update code
  Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
* fixup
* fix tests
* fix doc
* fix doc + tokenization
* fix tokenizer test
* fix test
* fix loss output
* update code for backward pass
* add loss support
* update documentation
* fix documentation, clean tokenizer
* more doc fixes, clean up example_switch
* fix failing test
* fix test
* fix test
* fix loss issue
* move layer
* update doc and fix router capacity usage
* fixup
* add sparse mlp index for documentation on hub
* fixup
* test sparse mix architecture
* Apply suggestions from code review
* Update docs/source/en/model_doc/switch_transformers.mdx
* fixup on update
* fix tests
* fix another test
* attempt fix
* Update src/transformers/models/switch_transformers/configuration_switch_transformers.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/switch_transformers/convert_switch_transformers_original_flax_checkpoint_to_pytorch.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* try
* all tests pass
* fix jitter noise
* Apply suggestions from code review
* doc tests pass
* Update src/transformers/models/switch_transformers/modeling_switch_transformers.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/switch_transformers/modeling_switch_transformers.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* remove assert
* change config order
* fix readme japanese
* Apply suggestions from code review
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* remove parallelizable tests + add one-liners
* remove ONNX config
* fix nits
  - add `T5Tokenizer` in auto mapping
  - remove `Switch Transformers` from ONNX-supported models
* remove `_get_router`
* remove asserts
* add check in test for `router_dtype`
* add `SwitchTransformersConfig` in `run_pipeline_test`
* Update tests/pipelines/test_pipelines_summarization.py
* add huge-model conversion script
* fix slow tests
  - add better casting for `Linear8bitLt`
  - remove `torchscript` tests
* add make dir
* style on new script
* fix nits
  - doctest
  - remove `_keys_to_ignore_on_load_unexpected`
* Update src/transformers/models/switch_transformers/configuration_switch_transformers.py
* add Google as authors
* fix year
* remove last `assert` statements
* standardize vertical spaces
* fix failing import
* fix another failing test
* remove strange `authorized_keys`
* remove todo and padding that is never used

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: ybelkada <younes@huggingface.co>
Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Arthur Zucker <arthur@huggingface.co>
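The commit log above repeatedly touches the router: top-1 routing, jitter noise, and the auxiliary router (load-balancing) loss. As a rough orientation only, here is a minimal NumPy sketch of that routing scheme; the actual implementation lives in `modeling_switch_transformers.py` and uses PyTorch, and the function and parameter names below are illustrative, not the library's API.

```python
import numpy as np

def switch_route(hidden, router_w, jitter_eps=0.0, rng=None):
    """Top-1 ("switch") routing sketch: each token is sent to a single expert.

    hidden:   (num_tokens, d_model) token representations
    router_w: (d_model, num_experts) router weight matrix
    Returns (expert_index, router_probs, aux_loss).
    """
    if jitter_eps > 0.0 and rng is not None:
        # Multiplicative jitter noise on the router inputs (training-time only).
        hidden = hidden * rng.uniform(1.0 - jitter_eps, 1.0 + jitter_eps, hidden.shape)

    logits = hidden @ router_w                    # (num_tokens, num_experts)
    logits -= logits.max(axis=-1, keepdims=True)  # numerically stable softmax
    probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

    expert_index = probs.argmax(axis=-1)          # top-1 expert per token

    # Load-balancing auxiliary loss:
    # num_experts * sum_i (fraction of tokens dispatched to expert i)
    #                    * (mean router probability of expert i)
    num_experts = router_w.shape[1]
    dispatch_frac = np.bincount(expert_index, minlength=num_experts) / len(expert_index)
    mean_prob = probs.mean(axis=0)
    aux_loss = num_experts * float(dispatch_frac @ mean_prob)
    return expert_index, probs, aux_loss
```

The auxiliary loss is minimized when tokens are spread uniformly across experts, which is why several commits above ("add support for router loss", "fix loss issue") center on getting it into the backward pass.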
albert.mdx
auto.mdx
bart.mdx
barthez.mdx
bartpho.mdx
beit.mdx
bert-generation.mdx
bert-japanese.mdx
bert.mdx
bertweet.mdx
big_bird.mdx
bigbird_pegasus.mdx
blenderbot-small.mdx
blenderbot.mdx
bloom.mdx
bort.mdx
byt5.mdx
camembert.mdx
canine.mdx
clip.mdx
clipseg.mdx
codegen.mdx
conditional_detr.mdx
convbert.mdx
convnext.mdx
cpm.mdx
ctrl.mdx
cvt.mdx
data2vec.mdx
deberta-v2.mdx
deberta.mdx
decision_transformer.mdx
deformable_detr.mdx
deit.mdx
detr.mdx
dialogpt.mdx
distilbert.mdx
dit.mdx
donut.mdx
dpr.mdx
dpt.mdx
electra.mdx
encoder-decoder.mdx
ernie.mdx
esm.mdx
flan-t5.mdx
flaubert.mdx
flava.mdx
fnet.mdx
fsmt.mdx
funnel.mdx
glpn.mdx
gpt_neo.mdx
gpt_neox_japanese.mdx
gpt_neox.mdx
gpt2.mdx
gptj.mdx
groupvit.mdx
herbert.mdx
hubert.mdx
ibert.mdx
imagegpt.mdx
jukebox.mdx
layoutlm.mdx
layoutlmv2.mdx
layoutlmv3.mdx
layoutxlm.mdx
led.mdx
levit.mdx
lilt.mdx
longformer.mdx
longt5.mdx
luke.mdx
lxmert.mdx
m2m_100.mdx
marian.mdx
markuplm.mdx
maskformer.mdx
mbart.mdx
mctct.mdx
megatron_gpt2.mdx
megatron-bert.mdx
mluke.mdx
mobilebert.mdx
mobilenet_v2.mdx
mobilevit.mdx
mpnet.mdx
mt5.mdx
mvp.mdx
nezha.mdx
nllb.mdx
nystromformer.mdx
openai-gpt.mdx
opt.mdx
owlvit.mdx
pegasus_x.mdx
pegasus.mdx
perceiver.mdx
phobert.mdx
plbart.mdx
poolformer.mdx
prophetnet.mdx
qdqbert.mdx
rag.mdx
realm.mdx
reformer.mdx
regnet.mdx
rembert.mdx
resnet.mdx
retribert.mdx
roberta.mdx
roc_bert.mdx
roformer.mdx
segformer.mdx
sew-d.mdx
sew.mdx
speech_to_text_2.mdx
speech_to_text.mdx
speech-encoder-decoder.mdx
splinter.mdx
squeezebert.mdx
swin.mdx
swinv2.mdx
switch_transformers.mdx
t5.mdx
t5v1.1.mdx
table-transformer.mdx
tapas.mdx
tapex.mdx
time_series_transformer.mdx
trajectory_transformer.mdx
transfo-xl.mdx
trocr.mdx
ul2.mdx
unispeech-sat.mdx
unispeech.mdx
van.mdx
videomae.mdx
vilt.mdx
vision-encoder-decoder.mdx
vision-text-dual-encoder.mdx
visual_bert.mdx
vit_mae.mdx
vit_msn.mdx
vit.mdx
wav2vec2_phoneme.mdx
wav2vec2-conformer.mdx
wav2vec2.mdx
wavlm.mdx
whisper.mdx
xclip.mdx
xglm.mdx
xlm-prophetnet.mdx
xlm-roberta-xl.mdx
xlm-roberta.mdx
xlm.mdx
xlnet.mdx
xls_r.mdx
xlsr_wav2vec2.mdx
yolos.mdx
yoso.mdx