transformers/tests/models
Younes Belkada ca2a55e9df
BLOOM (#17474)
* adding template

* update model

* model update

* update conf for debug model

* update conversion

* update conversion script

* update conversion script

* fix missing keys check

* add tests to test the tokenizer in the local machine

* Change variable name

* add tests on xnli dataset

* add more description

* add descriptions + clearer code

* clearer code

* adding new tests + skipping few tests because of env problems

* change comment

* add dtype on the configuration

* add test embeddings

* add hardcoded test

* fix dtype issue

* adding torch.float16 to config

* adding more metrics (min, max, mean)

* add sum

* now the test passes with almost equal
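
The "almost equal" checks here are tolerance-based tensor comparisons. A minimal sketch of the pattern (tensors and tolerances are illustrative, not the PR's actual values):

```python
import torch

expected = torch.tensor([0.1235, -0.4821, 0.0044])
observed = expected + 1e-5  # e.g. the same logits recomputed in another dtype

# passes when |observed - expected| <= atol + rtol * |expected| elementwise
assert torch.allclose(observed, expected, rtol=1e-4, atol=1e-4)
```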

* add files for conversion - test passes on cpu and gpu

* add final changes

* cleaning code

* add new args in the docstring

* fix one-liner function

* remove macros

* remove forward attention

* clean up init function

* add comments on the issue

* rm scale mask softmax

* do make style

* fix dtype in init

* fixing for loop on att probs

* fix style with black

* fix style + doc error

* fix and debug CI errors (docs + style)

* some updates

- change new operations
- finally add scaled softmax
- added new args in the config

* make use_cache work
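
What `use_cache` buys: past key/value states are returned and can be fed back so each step only processes the new token. A hedged sketch (the checkpoint name is illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

out = model(**tokenizer("Hello", return_tensors="pt"), use_cache=True)
past = out.past_key_values  # cached key/value states, one entry per layer

# next step: feed only the newly sampled token plus the cache
next_token = out.logits[:, -1].argmax(-1, keepdim=True)
out = model(input_ids=next_token, past_key_values=past, use_cache=True)
```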

* add changes

- save sharded models
- final changes on the modeling script
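
Sharded saving in one call, sketched with an illustrative shard size (`max_shard_size` is the real `save_pretrained` argument):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")
# writes pytorch_model-00001-of-0000N.bin shards plus an index file
model.save_pretrained("./bloom-sharded", max_shard_size="2GB")
```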

* add changes

- comment on alibi
- add TODO on seq length

* test commit

- added a text to test the commit

Co-authored-by: thomasw21 <24695242+thomasw21@users.noreply.github.com>

* final changes

- attention mask change
- generation works on BS176b

Co-authored-by: thomasw21 <24695242+thomasw21@users.noreply.github.com>

* changes - model + conversion

* move to correct dir

* put ,

* few fixes

* fix tokenizer autodoc

* fix minor CI issues

* fix minor CI issues

* fix minor CI issues

* fix style issue

* fix minor import issues

* fix few issues

* remove def main on the test

* add require torch
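
`require_torch` comes from `transformers.testing_utils` and skips the test when torch is not installed; the test class and body below are illustrative stand-ins, not one of the PR's tests:

```python
import unittest

from transformers import BloomConfig
from transformers.testing_utils import require_torch


@require_torch
class BloomSmokeTest(unittest.TestCase):
    def test_config_defaults(self):
        # runs only when torch is available
        self.assertEqual(BloomConfig().model_type, "bloom")
```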

* replace decorator with 'with'

* fix style

* change to bloom

* add quick fix tokenizer

* fix tokenizer file

* fix tokenizer

- merge tests
- small fixes

* fix import issue

* add bloom to readme

* fix consistency

* Update docs/source/en/model_doc/bloom.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply suggestions from code review

fix comment issues on file headers

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix doc issue

* small fix - modeling test

* some changes

- refactor some code
- taking into account reviews
- more tests should pass
- removed pruning tests

* remove useless division

* more tests should pass

* more tests should pass

* more tests should pass

* let's try this one

- add alibi offset
- remove all permutes to make the grad operations work
- fingers crossed

* refactor

- refactor code
- style changes
- add new threshold for test

* major changes

- change BLOOM to Bloom
- add quick doc on bloom.mdx
- move embeddings test on modeling test

* modify readme

* small fixes

* small fix

- better threshold for a test

* remove old test file from fetcher

* fix small typo

* major change

- change BloomLMHead to BloomForCausalLM
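
The renamed class in use, as a minimal generation sketch (checkpoint name illustrative; the PR predates the final checkpoint naming):

```python
from transformers import BloomForCausalLM, BloomTokenizerFast

tokenizer = BloomTokenizerFast.from_pretrained("bigscience/bloom-560m")
model = BloomForCausalLM.from_pretrained("bigscience/bloom-560m")

inputs = tokenizer("Hello, my name is", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```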

* remove onnx config

* major changes

- refactor the code
- remove asserts
- change tol for test

* make style

* small change

* adding a slow test + commenting old ones for now

* make style

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* make style

* fix duplicates

* cleaning comments on config

* clean up the conversion file a bit

* refactor the modeling file a bit

* refactor tokenizer file

* fix tokenization test issue

* fix tokenization issue #2

* fix tokenization issue second try

* fix test issue

* make style + add suggestions

* change test fetcher

* try this one

- slow tests should pass
- fingers crossed

* possible final changes

* make style

* try fix padding side issue

* fix side

* fix padding issue
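
Context for the padding fixes: a decoder-only model like Bloom should be padded on the left for batched generation, so every prompt ends at the last position. Sketch (checkpoint name illustrative):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
tokenizer.padding_side = "left"  # pad tokens go before the prompt, not after
batch = tokenizer(["Hi", "A much longer prompt"], return_tensors="pt", padding=True)
```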

* fix ko-readme

* fix config auto

* cleaning modeling file

* keep bloom in caps in ko

* update config docs

* remove pretraining_pp

* remove model parallel

* update config

- add correct config files

* fix duplicates

* fix fetcher

* fix refactor issue

- remove divide function

* try to remove alibi

* small fixes

- fix alibi
- remove seq length
- refactor a bit the code

* put correct values

- fix bos and eos token ids

* fix attention mask loop

Co-authored-by: thomasw21 <24695242+thomasw21@users.noreply.github.com>

* small fixes:

- remove skip bias add

* small fixes

- fix typo in readme
- fix typos in config

* small changes

- remove a test
- add reconstruction test
- change config

* small changes

- change Scaled Softmax to BloomScaledSoftmax

* small fixes

- fix alibi dtype

* major changes

- removing explicit dtype when loading modules
- fixing test args (torch_dtype=auto)
- add docstring
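
What `torch_dtype="auto"` does at load time, sketched (checkpoint name illustrative):

```python
import torch
from transformers import AutoModelForCausalLM

# "auto" picks up the dtype recorded in the checkpoint's config
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m", torch_dtype="auto")
# or pin the dtype explicitly
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m", torch_dtype=torch.float16)
```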

* fix readmes

* major changes

- now bloom supports alibi shifting
- refactor a bit the code
- better test tolerance now

* refactor a bit

* refactor a bit

* put correct name on test

* change docstring

* small changes

- fix docstring modeling
- fix test tolerance

* fix small nit

- take dtype from tensors in the conversion script

* minor fix

- fix mdx issue

* minor fix

- change config docstring

* forward contrib credits from PR14084

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* apply modifications

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* resolve softmax upcast
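
The upcast pattern in question, sketched (not the PR's exact code): compute softmax in float32 for numerical stability under fp16/bf16, then cast back:

```python
import torch


def softmax_upcast(scores: torch.Tensor) -> torch.Tensor:
    # softmax in float32 avoids overflow/underflow in half precision
    dtype = scores.dtype
    return torch.nn.functional.softmax(scores.float(), dim=-1).to(dtype)
```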

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update src/transformers/models/bloom/modeling_bloom.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* final changes modeling

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Merge commit 'd156898f3b9b2c990e5963f5030a7143d57921a2'

* merge commit

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* apply suggestions

Apply suggestions from Stas comments
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Fix gradient checkpointing

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
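
The standard toggle this fix targets, for reference (checkpoint name illustrative):

```python
from transformers import BloomForCausalLM

model = BloomForCausalLM.from_pretrained("bigscience/bloom-560m")
# recompute activations in the backward pass to trade compute for memory
model.gradient_checkpointing_enable()
```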

* add slow but exact

* add accelerate compatibility

Co-authored-by: Nicolas Patry <Narsil@users.noreply.github.com>

* forward contrib credits

Co-authored-by: thomasw21 <thomasw21@users.noreply.github.com>
Co-authored-by: sgugger <sgugger@users.noreply.github.com>
Co-authored-by: patrickvonplaten <patrickvonplaten@users.noreply.github.com>
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
Co-authored-by: LysandreJik <LysandreJik@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fix torch device on tests

* make style

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fix nits

Co-authored-by: patrickvonplaten <patrickvonplaten@users.noreply.github.com>

* remove final nits

* fix doc

- add more details on the doc
- add links to checkpoints

* Update src/transformers/__init__.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/bloom/modeling_bloom.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* apply suggestions

Co-authored-by: sgugger <sgugger@users.noreply.github.com>

* set test_torchscript to False

* Update src/transformers/models/bloom/modeling_bloom.py

Co-authored-by: justheuristic <justheuristic@gmail.com>

* fix alibi

- create alibi only once
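
For context, ALiBi (Press et al., 2021) replaces position embeddings with a fixed linear bias per head, which is why it can be built once and reused. A simplified sketch assuming a power-of-two head count; the actual `build_alibi_tensor` also handles the general case:

```python
import torch


def build_alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    # geometric per-head slopes: 2^(-8/n), 2^(-16/n), ..., 2^(-8)
    ratio = 2 ** (-8.0 / num_heads)
    slopes = torch.tensor([ratio ** (i + 1) for i in range(num_heads)])
    positions = torch.arange(seq_len, dtype=torch.float32).view(1, 1, seq_len)
    return slopes.view(num_heads, 1, 1) * positions  # (num_heads, 1, seq_len)
```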

* add small doc

* make quality

* replace torch.nn

* remove token type emb

* fix fused op + output bias

* add fused op

- now can control fused operation from config

* remove fused op

* make quality

* small changes

- remove unused args from config
- removed bias gelu file
- make the model torchscriptable
- add torchscript slow tests
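
Roughly what the torchscript slow tests exercise, sketched with an illustrative tiny config (parameter names as in the released `BloomConfig`):

```python
import torch
from transformers import BloomConfig, BloomModel

# torchscript=True makes the model return tuples, which tracing requires
config = BloomConfig(hidden_size=64, n_layer=2, n_head=4, torchscript=True)
model = BloomModel(config).eval()

input_ids = torch.randint(0, config.vocab_size, (1, 8))
traced = torch.jit.trace(model, (input_ids,))
torch.jit.save(traced, "bloom_traced.pt")
```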

* Update src/transformers/models/bloom/modeling_bloom.py

* fix slow

* make style

* add accelerate support
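
What accelerate support enables at load time, sketched (requires `accelerate` installed; checkpoint name illustrative):

```python
from transformers import AutoModelForCausalLM

# device_map="auto" dispatches weights across available GPUs and CPU RAM
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-560m", device_map="auto"
)
```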

* add bloom to deepspeed tests

* minor changes

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* minor change

* slow tests pass

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/en/model_doc/bloom.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* minor changes:

- change docstring
- add link to paper

Co-authored-by: Thomwolf <thomwolf@gmail.com>
Co-authored-by: Thomas Wolf <thomas@huggingface.co>
Co-authored-by: thomasw21 <24695242+thomasw21@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: sIncerass <sheng.s@berkeley.edu>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
Co-authored-by: Nicolas Patry <Narsil@users.noreply.github.com>
Co-authored-by: thomasw21 <thomasw21@users.noreply.github.com>
Co-authored-by: sgugger <sgugger@users.noreply.github.com>
Co-authored-by: patrickvonplaten <patrickvonplaten@users.noreply.github.com>
Co-authored-by: LysandreJik <LysandreJik@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: justheuristic <justheuristic@gmail.com>
Co-authored-by: Stas Bekman <stas@stason.org>
2022-06-09 12:00:40 +02:00
albert Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
auto Automatically sort auto mappings (#17250) 2022-05-16 13:24:20 -04:00
bart fix train_new_from_iterator in the case of byte-level tokenizers (#17549) 2022-06-08 15:30:41 +02:00
barthez Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
bartpho Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
beit Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
bert Black preview (#17217) 2022-05-12 16:25:55 -04:00
bert_generation Black preview (#17217) 2022-05-12 16:25:55 -04:00
bert_japanese Black preview (#17217) 2022-05-12 16:25:55 -04:00
bertweet Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
big_bird Black preview (#17217) 2022-05-12 16:25:55 -04:00
bigbird_pegasus Black preview (#17217) 2022-05-12 16:25:55 -04:00
blenderbot fix train_new_from_iterator in the case of byte-level tokenizers (#17549) 2022-06-08 15:30:41 +02:00
blenderbot_small Fx support for multiple model architectures (#17393) 2022-05-31 10:02:55 +02:00
bloom BLOOM (#17474) 2022-06-09 12:00:40 +02:00
bort Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
byt5 Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
camembert Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
canine Black preview (#17217) 2022-05-12 16:25:55 -04:00
clip Fx support for multiple model architectures (#17393) 2022-05-31 10:02:55 +02:00
convbert Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
convnext has_attentions - consistent test skipping logic and tf tests (#17495) 2022-06-09 09:50:03 +02:00
cpm Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
ctrl Fix CTRL tests (#17508) 2022-06-01 16:27:23 +02:00
cvt has_attentions - consistent test skipping logic and tf tests (#17495) 2022-06-09 09:50:03 +02:00
data2vec Add TFData2VecVision for semantic segmentation (#17271) 2022-06-08 14:03:18 +01:00
deberta fix train_new_from_iterator in the case of byte-level tokenizers (#17549) 2022-06-08 15:30:41 +02:00
deberta_v2 Fx support for Deberta-v[1-2], Hubert and LXMERT (#17539) 2022-06-07 18:05:20 +02:00
decision_transformer Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
deit Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
detr Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
distilbert Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
dit Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
dpr Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
dpt Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
electra Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
encoder_decoder Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
flaubert Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
flava has_attentions - consistent test skipping logic and tf tests (#17495) 2022-06-09 09:50:03 +02:00
fnet Black preview (#17217) 2022-05-12 16:25:55 -04:00
fsmt Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
funnel Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
glpn Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
gpt_neo fix train_new_from_iterator in the case of byte-level tokenizers (#17549) 2022-06-08 15:30:41 +02:00
gpt_neox [WIP] Adding GPT-NeoX-20B (#16659) 2022-05-24 09:31:10 -04:00
gpt2 fix train_new_from_iterator in the case of byte-level tokenizers (#17549) 2022-06-08 15:30:41 +02:00
gptj fix train_new_from_iterator in the case of byte-level tokenizers (#17549) 2022-06-08 15:30:41 +02:00
herbert Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
hubert Fx support for Deberta-v[1-2], Hubert and LXMERT (#17539) 2022-06-07 18:05:20 +02:00
ibert fix train_new_from_iterator in the case of byte-level tokenizers (#17549) 2022-06-08 15:30:41 +02:00
imagegpt Enabling imageGPT auto feature extractor. (#16871) 2022-05-24 12:30:46 +02:00
layoutlm Fx support for multiple model architectures (#17393) 2022-05-31 10:02:55 +02:00
layoutlmv2 Add LayoutLMv3 (#17060) 2022-05-24 09:53:45 +02:00
layoutlmv3 Add LayoutLMv3 (#17060) 2022-05-24 09:53:45 +02:00
layoutxlm Fix LayoutXLMProcessorTest (#17506) 2022-06-01 16:26:37 +02:00
led fix train_new_from_iterator in the case of byte-level tokenizers (#17549) 2022-06-08 15:30:41 +02:00
levit fix integration test levit (#17555) 2022-06-06 13:47:32 +02:00
longformer fix train_new_from_iterator in the case of byte-level tokenizers (#17549) 2022-06-08 15:30:41 +02:00
luke Debug LukeForMaskedLM (#17499) 2022-06-01 10:03:06 -04:00
lxmert Fx support for Deberta-v[1-2], Hubert and LXMERT (#17539) 2022-06-07 18:05:20 +02:00
m2m_100 Fx support for multiple model architectures (#17393) 2022-05-31 10:02:55 +02:00
marian Fx support for multiple model architectures (#17393) 2022-05-31 10:02:55 +02:00
maskformer Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
mbart Fx support for multiple model architectures (#17393) 2022-05-31 10:02:55 +02:00
mbart50 Black preview (#17217) 2022-05-12 16:25:55 -04:00
mctct M-CTC-T Model (#16402) 2022-06-08 00:33:07 +02:00
megatron_bert Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
megatron_gpt2 Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
mluke Black preview (#17217) 2022-05-12 16:25:55 -04:00
mobilebert Black preview (#17217) 2022-05-12 16:25:55 -04:00
mpnet Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
mt5 Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
nystromformer Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
openai Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
opt Fix all offload and MP tests (#17533) 2022-06-03 09:59:13 -04:00
pegasus Fx support for multiple model architectures (#17393) 2022-05-31 10:02:55 +02:00
perceiver Black preview (#17217) 2022-05-12 16:25:55 -04:00
phobert Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
plbart Fx support for multiple model architectures (#17393) 2022-05-31 10:02:55 +02:00
poolformer has_attentions - consistent test skipping logic and tf tests (#17495) 2022-06-09 09:50:03 +02:00
prophetnet Black preview (#17217) 2022-05-12 16:25:55 -04:00
qdqbert Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
rag Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
realm Black preview (#17217) 2022-05-12 16:25:55 -04:00
reformer Black preview (#17217) 2022-05-12 16:25:55 -04:00
regnet has_attentions - consistent test skipping logic and tf tests (#17495) 2022-06-09 09:50:03 +02:00
rembert Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
resnet has_attentions - consistent test skipping logic and tf tests (#17495) 2022-06-09 09:50:03 +02:00
retribert fix retribert's test_torch_encode_plus_sent_to_model (#17231) 2022-05-17 14:33:13 +02:00
roberta fix train_new_from_iterator in the case of byte-level tokenizers (#17549) 2022-06-08 15:30:41 +02:00
roformer Skip RoFormer ONNX test if rjieba not installed (#16981) 2022-05-04 10:04:10 +02:00
segformer Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
sew Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
sew_d Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
speech_encoder_decoder Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
speech_to_text Fx support for multiple model architectures (#17393) 2022-05-31 10:02:55 +02:00
speech_to_text_2 Fx support for multiple model architectures (#17393) 2022-05-31 10:02:55 +02:00
splinter Add support for pretraining recurring span selection to Splinter (#17247) 2022-05-17 23:42:14 +02:00
squeezebert Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
swin Fx support for multiple model architectures (#17393) 2022-05-31 10:02:55 +02:00
t5 Skip disk offload test for T5 2022-06-07 11:11:40 -04:00
tapas Add magic method to our TF models to convert datasets with column inference (#17160) 2022-06-06 15:53:49 +01:00
tapex Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
trajectory_transformer Add trajectory transformer (#17141) 2022-05-17 19:07:43 -04:00
transfo_xl Add magic method to our TF models to convert datasets with column inference (#17160) 2022-06-06 15:53:49 +01:00
trocr Fx support for multiple model architectures (#17393) 2022-05-31 10:02:55 +02:00
unispeech Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
unispeech_sat Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
van has_attentions - consistent test skipping logic and tf tests (#17495) 2022-06-09 09:50:03 +02:00
vilt Black preview (#17217) 2022-05-12 16:25:55 -04:00
vision_encoder_decoder Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
vision_text_dual_encoder Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
visual_bert Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
vit ViT and Swin symbolic tracing with torch.fx (#17182) 2022-05-12 10:42:27 +02:00
vit_mae Fix ViTMAEModelTester (#17470) 2022-05-31 15:01:54 +02:00
wav2vec2 Black preview (#17217) 2022-05-12 16:25:55 -04:00
wav2vec2_conformer [Test] Fix W2V-Conformer integration test (#17303) 2022-05-17 18:20:36 +02:00
wav2vec2_phoneme Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
wav2vec2_with_lm Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
wavlm Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
xglm Fx support for multiple model architectures (#17393) 2022-05-31 10:02:55 +02:00
xlm Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
xlm_prophetnet Black preview (#17217) 2022-05-12 16:25:55 -04:00
xlm_roberta Black preview (#17217) 2022-05-12 16:25:55 -04:00
xlm_roberta_xl Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
xlnet Fx support for multiple model architectures (#17393) 2022-05-31 10:02:55 +02:00
yolos Move test model folders (#17034) 2022-05-03 14:42:02 +02:00
yoso fix train_new_from_iterator in the case of byte-level tokenizers (#17549) 2022-06-08 15:30:41 +02:00
__init__.py Move test model folders (#17034) 2022-05-03 14:42:02 +02:00