transformers/docs/source/en/model_doc
Younes Belkada ca2a55e9df
BLOOM (#17474)
* adding template

* update model

* model update

* update conf for debug model

* update conversion

* update conversion script

* update conversion script

* fix missing keys check

* add tests to test the tokenizer in the local machine

* Change variable name

* add tests on xnli dataset

* add more description

* add descriptions + clearer code

* clearer code

* adding new tests + skipping few tests because of env problems

* change comment

* add dtype on the configuration

* add test embeddings

* add hardcoded test

* fix dtype issue

* adding torch.float16 to config

* adding more metrics (min, max, mean)

* add sum

* now the test passes with almost equal

* add files for conversion - test passes on cpu and gpu

* add final changes

* cleaning code

* add new args in the docstring

* fix one liner function

* remove macros

* remove forward attention

* clean up init function

* add comments on the issue

* rm scale mask softmax

* do make style

* fix dtype in init

* fixing for loop on att probs

* fix style with black

* fix style + doc error

* fix and debug CI errors (docs + style)

* some updates

- change new operations
- finally add scaled softmax
- added new args in the config

* make use cache working

* add changes

- save sharded models
- final changes on the modeling script

* add changes

- comment on alibi
- add TODO on seq length

* test commit

- added a text to test the commit

Co-authored-by: thomasw21 <24695242+thomasw21@users.noreply.github.com>

* final changes

- attention mask change
- generation works on BS176b

Co-authored-by: thomasw21 <24695242+thomasw21@users.noreply.github.com>

* changes - model + conversion

* move to correct dir

* put ,

* few fixes

* fix tokenizer autodoc

* fix minor CI issues

* fix minor CI issues

* fix minor CI issues

* fix style issue

* fix minor import issues

* fix few issues

* remove def main on the test

* add require torch

* replace decorator with 'with'

* fix style

* change to bloom

* add quick fix tokenizer

* fix tokenizer file

* fix tokenizer

- merge tests
- small fixes

* fix import issue

* add bloom to readme

* fix consistency

* Update docs/source/en/model_doc/bloom.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply suggestions from code review

fix comment issues on file headers

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix doc issue

* small fix - modeling test

* some changes

- refactor some code
- taking into account reviews
- more tests should pass
- removed pruning tests

* remove useless division

* more tests should pass

* more tests should pass

* more tests should pass

* let's try this one

- add alibi offset
- remove all permutes to make the grad operations work
- fingers crossed

* refactor

- refactor code
- style changes
- add new threshold for test

* major changes

- change BLOOM to Bloom
- add quick doc on bloom.mdx
- move embeddings test on modeling test

* modify readme

* small fixes

* small fix

- better threshold for a test

* remove old test file from fetcher

* fix small typo

* major change

- change BloomLMHead to BloomForCausalLM

* remove onnx config

* major changes

- refactor the code
- remove asserts
- change tol for test

* make style

* small change

* adding a slow test + commenting old ones for now

* make style

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* make style

* fix duplicates

* cleaning comments on config

* clean a bit conversion file

* refactor a bit modeling file

* refactor tokenizer file

* fix tokenization test issue

* fix tokenization issue #2

* fix tokenization issue second try

* fix test issue

* make style + add suggestions

* change test fetcher

* try this one

- slow tests should pass
- fingers crossed

* possible final changes

* make style

* try fix padding side issue

* fix side

* fix padding issue

* fix ko-readme

* fix config auto

* cleaning modeling file

* keep bloom in caps in ko

* update config docs

* remove pretraining_pp

* remove model parallel

* update config

- add correct config files

* fix duplicates

* fix fetcher

* fix refactor issue

- remove divide function

* try to remove alibi

* small fixes

- fix alibi
- remove seq length
- refactor a bit the code

* put correct values

- fix bos and eos token ids

* fix attention mask loop

Co-authored-by: thomasw21 <24695242+thomasw21@users.noreply.github.com>

* small fixes:

- remove skip bias add

* small fixes

- fix typo in readme
- fix typos in config

* small changes

- remove a test
- add reconstruction test
- change config

* small changes

- change Scaled Softmax to BloomScaledSoftmax

* small fixes

- fix alibi dtype

* major changes

- removing explicit dtype when loading modules
- fixing test args (torch_dtype=auto)
- add docstring

* fix readmes

* major changes

- now bloom supports alibi shifting
- refactor a bit the code
- better test tolerance now

* refactor a bit

* refactor a bit

* put correct name on test

* change docstring

* small changes

- fix docstring modeling
- fix test tolerance

* fix small nit

- take dtype from tensors in the conversion script

* minor fix

- fix mdx issue

* minor fix

- change config docstring

* forward contrib credits from PR14084

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* apply modifications

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* resolve softmax upcast

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update src/transformers/models/bloom/modeling_bloom.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* final changes modeling

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Merge commit 'd156898f3b9b2c990e5963f5030a7143d57921a2'

* merge commit

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* apply suggestions

Apply suggestions from Stas comments
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Fix gradient checkpointing

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* add slow but exact

* add accelerate compatibility

Co-authored-by: Nicolas Patry <Narsil@users.noreply.github.com>

* forward contrib credits

Co-authored-by: thomasw21 <thomasw21@users.noreply.github.com>
Co-authored-by: sgugger <sgugger@users.noreply.github.com>
Co-authored-by: patrickvonplaten <patrickvonplaten@users.noreply.github.com>
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
Co-authored-by: LysandreJik <LysandreJik@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fix torch device on tests

* make style

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fix nits

Co-authored-by: patrickvonplaten <patrickvonplaten@users.noreply.github.com>

* remove final nits

* fix doc

- add more details on the doc
- add links to checkpoints

* Update src/transformers/__init__.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/bloom/modeling_bloom.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* apply suggestions

Co-authored-by: sgugger <sgugger@users.noreply.github.com>

* put test torchscript to false

* Update src/transformers/models/bloom/modeling_bloom.py

Co-authored-by: justheuristic <justheuristic@gmail.com>

* fix alibi

- create alibi only once

* add small doc

* make quality

* replace torch.nn

* remove token type emb

* fix fused op + output bias

* add fused op

- now can control fused operation from config

* remove fused op

* make quality

* small changes

- remove unused args on config
- removed bias gelu file
- make the model torchscriptable
- add torchscript slow tests

* Update src/transformers/models/bloom/modeling_bloom.py

* fix slow

* make style

* add accelerate support

* add bloom to deepspeed tests

* minor changes

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* minor change

* slow tests pass

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/en/model_doc/bloom.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* minor changes:

- change docstring
- add link to paper

Co-authored-by: Thomwolf <thomwolf@gmail.com>
Co-authored-by: Thomas Wolf <thomas@huggingface.co>
Co-authored-by: thomasw21 <24695242+thomasw21@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: sIncerass <sheng.s@berkeley.edu>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
Co-authored-by: Nicolas Patry <Narsil@users.noreply.github.com>
Co-authored-by: thomasw21 <thomasw21@users.noreply.github.com>
Co-authored-by: sgugger <sgugger@users.noreply.github.com>
Co-authored-by: patrickvonplaten <patrickvonplaten@users.noreply.github.com>
Co-authored-by: LysandreJik <LysandreJik@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: justheuristic <justheuristic@gmail.com>
Co-authored-by: Stas Bekman <stas@stason.org>
2022-06-09 12:00:40 +02:00
albert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
auto.mdx add mobilebert onnx configs (#17029) 2022-05-09 10:36:53 -04:00
bart.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
barthez.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
bartpho.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
beit.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
bert-generation.mdx Result of new doc style with fixes (#17015) 2022-04-29 17:42:15 -04:00
bert-japanese.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
bert.mdx [FlaxBert] Add ForCausalLM (#16995) 2022-05-03 11:26:19 +02:00
bertweet.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
big_bird.mdx [FlaxBert] Add ForCausalLM (#16995) 2022-05-03 11:26:19 +02:00
bigbird_pegasus.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
blenderbot-small.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
blenderbot.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
bloom.mdx BLOOM (#17474) 2022-06-09 12:00:40 +02:00
bort.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
byt5.mdx [Doctests] Fix all T5 doc tests (#16646) 2022-04-13 11:36:54 +02:00
camembert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
canine.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
clip.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
convbert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
convnext.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
cpm.mdx Allow all imports from transformers (#17050) 2022-05-02 12:47:39 -04:00
ctrl.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
cvt.mdx Add CvT (#17299) 2022-05-18 17:47:18 +02:00
data2vec.mdx Add TFData2VecVision for semantic segmentation (#17271) 2022-06-08 14:03:18 +01:00
deberta-v2.mdx Add DebertaV2ForMultipleChoice (#17135) 2022-05-10 16:21:44 -04:00
deberta.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
decision_transformer.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
deit.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
detr.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
dialogpt.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
distilbert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
dit.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
dpr.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
dpt.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
electra.mdx [FlaxBert] Add ForCausalLM (#16995) 2022-05-03 11:26:19 +02:00
encoder-decoder.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
flaubert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
flava.mdx [feat] Add FLAVA model (#16654) 2022-05-11 14:56:48 -07:00
fnet.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
fsmt.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
funnel.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
glpn.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
gpt_neo.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
gpt_neox.mdx [WIP] Adding GPT-NeoX-20B (#16659) 2022-05-24 09:31:10 -04:00
gpt2.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
gptj.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
herbert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
hubert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
ibert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
imagegpt.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
layoutlm.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
layoutlmv2.mdx Correct & Improve Doctests for LayoutLMv2 (#17168) 2022-05-23 08:02:31 -04:00
layoutlmv3.mdx Add LayoutLMv3 (#17060) 2022-05-24 09:53:45 +02:00
layoutxlm.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
led.mdx [LED] fix global_attention_mask not being passed for generation and docs clarification about grad checkpointing (#17112) 2022-05-17 23:44:37 +02:00
levit.mdx Adding LeViT Model by Facebook (#17466) 2022-06-01 17:06:20 +02:00
longformer.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
luke.mdx Result of new doc style with fixes (#17015) 2022-04-29 17:42:15 -04:00
lxmert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
m2m_100.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
marian.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
maskformer.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
mbart.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
mctct.mdx M-CTC-T Model (#16402) 2022-06-08 00:33:07 +02:00
megatron_gpt2.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
megatron-bert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
mluke.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
mobilebert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
mpnet.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
mt5.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
nystromformer.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
openai-gpt.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
opt.mdx Opt in flax and tf (#17388) 2022-05-31 18:41:22 +02:00
pegasus.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
perceiver.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
phobert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
plbart.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
poolformer.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
prophetnet.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
qdqbert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
rag.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
realm.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
reformer.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
regnet.mdx RegNet (#16188) 2022-04-07 21:58:00 +02:00
rembert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
resnet.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
retribert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
roberta.mdx [FlaxBert] Add ForCausalLM (#16995) 2022-05-03 11:26:19 +02:00
roformer.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
segformer.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
sew-d.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
sew.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
speech_to_text_2.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
speech_to_text.mdx [Speech2Text Doc] Fix docs (#16611) 2022-04-06 14:19:00 +02:00
speech-encoder-decoder.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
splinter.mdx Add support for pretraining recurring span selection to Splinter (#17247) 2022-05-17 23:42:14 +02:00
squeezebert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
swin.mdx Add Tensorflow Swin model (#16988) 2022-05-16 22:19:53 +01:00
t5.mdx [DocTests] Fix some doc tests (#16889) 2022-04-23 08:40:14 +02:00
t5v1.1.mdx [Doctests] Fix all T5 doc tests (#16646) 2022-04-13 11:36:54 +02:00
tapas.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
tapex.mdx Add TAPEX (#16473) 2022-04-08 10:57:51 +02:00
trajectory_transformer.mdx Add trajectory transformer (#17141) 2022-05-17 19:07:43 -04:00
transfo-xl.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
trocr.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
unispeech-sat.mdx Add Wav2Vec2Conformer (#16812) 2022-05-17 00:43:16 +02:00
unispeech.mdx Add Wav2Vec2Conformer (#16812) 2022-05-17 00:43:16 +02:00
van.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
vilt.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
vision-encoder-decoder.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
vision-text-dual-encoder.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
visual_bert.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
vit_mae.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
vit.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
wav2vec2_phoneme.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
wav2vec2-conformer.mdx Add Wav2Vec2Conformer (#16812) 2022-05-17 00:43:16 +02:00
wav2vec2.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
wavlm.mdx Add Wav2Vec2Conformer (#16812) 2022-05-17 00:43:16 +02:00
xglm.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
xlm-prophetnet.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
xlm-roberta-xl.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
xlm-roberta.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
xlm.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
xlnet.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
xls_r.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
xlsr_wav2vec2.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
yolos.mdx Add YOLOS (#16848) 2022-05-02 18:30:55 +02:00
yoso.mdx Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00