
20 Commits

Author SHA1 Message Date
Sanchit Gandhi
e93103632b
Add bloom flax (#25094)
* First commit

* step 1 working

* add alibi

* placeholder for `scan`

* add matrix mult alibi

* beta scaling factor for bmm

* working v1 - simple forward pass

* move layer_number from attribute to arg in call

* partial functioning scan

* hacky working scan

* add more modifs

* add test

* update scan for new kwarg order

* fix position_ids problem

* fix bug in attention layer

* small fix

- do the alibi broadcasting only once

* prelim refactor

* finish refactor

* alibi shifting

* incorporate dropout_add to attention module

* make style

* make padding work again

* update

* remove bogus file

* up

* get generation to work

* clean code a bit

* added small tests

* adding alibi test

* make CI tests pass:

- change init weight
- add correct tuple for output attention
- add scan test
- make CI tests work

* fix few nits

* fix nit onnx

* fix onnx nit

* add missing dtype args to nn.Modules

* remove debugging statements

* fix scan generate

* Update modeling_flax_bloom.py

* Update test_modeling_flax_bloom.py

* Update test_modeling_flax_bloom.py

* Update test_modeling_flax_bloom.py

* fix small test issue + make style

* clean up

* Update tests/models/bloom/test_modeling_flax_bloom.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* fix function name

* small fix test

* forward contrib credits from PR17761

* Fix failing test

* fix small typo documentation

* fix non passing test

- remove device from build alibi

* refactor call

- refactor `FlaxBloomBlockCollection` module

* make style

* upcast to fp32

* cleaner way to upcast

* remove unused args

* remove layer number

* fix scan test

* make style

* fix i4 casting

* fix slow test

* Update src/transformers/models/bloom/modeling_flax_bloom.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* remove `layer_past`

* refactor a bit

* fix `scan` slow test

* remove useless import

* major changes

- remove unused code
- refactor a bit
- revert import `torch`

* major refactoring

- change build alibi

* remove scan

* fix tests

* make style

* clean-up alibi

* add integration tests

* up

* fix batch norm conversion

* style

* style

* update pt-fx cross tests

* update copyright

* Update src/transformers/modeling_flax_pytorch_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* per-weight check

* style

* line formats

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <haileyschoelkopf@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-07-27 18:24:56 +01:00
Yih-Dar
896a58de15
Byebye pytorch 1.9 (#24080)
byebye

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-06-16 16:38:23 +02:00
Joao Gante
918a06e25d
Generate: add test to check KV format (#23403)
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-05-16 19:28:19 +01:00
Yih-Dar
67c2dbdb54
Time to Say Goodbye, torch 1.7 and 1.8 (#22291)
* time to say goodbye, torch 1.7 and 1.8

* clean up torch_int_div

* clean up is_torch_less_than_1_8-9

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-21 19:22:01 +01:00
Yih-Dar
871c31a6f1
🔥Rework pipeline testing by removing PipelineTestCaseMeta 🚀 (#21516)
* Add PipelineTesterMixin

* remove class PipelineTestCaseMeta

* move validate_test_components

* Add for ViT

* Add to SPECIAL_MODULE_TO_TEST_MAP

* style and quality

* Add feature-extraction

* update

* raise instead of skip

* add tiny_model_summary.json

* more explicit

* skip tasks not in mapping

* add availability check

* Add Copyright

* A way to disable irrelevant tests

* update with main

* remove disable_irrelevant_tests

* skip tests

* better skip message

* better skip message

* Add all pipeline task tests

* revert

* Import PipelineTesterMixin

* subclass test classes with PipelineTesterMixin

* Add pipieline_model_mapping

* Fix import after adding pipieline_model_mapping

* Fix style and quality after adding pipieline_model_mapping

* Fix one more import after adding pipieline_model_mapping

* Fix style and quality after adding pipieline_model_mapping

* Fix test issues

* Fix import requirements

* Fix mapping for MobileViTModelTest

* Update

* Better skip message

* pipieline_model_mapping could not be None

* Remove some PipelineTesterMixin

* Fix typo

* revert tests_fetcher.py

* update

* rename

* revert

* Remove PipelineTestCaseMeta from ZeroShotAudioClassificationPipelineTests

* style and quality

* test fetcher for all pipeline/model tests

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-02-28 19:40:57 +01:00
Sylvain Gugger
6f79d26442
Update quality tooling for formatting (#21480)
* Result of black 23.1

* Update target to Python 3.7

* Switch flake8 to ruff

* Configure isort

* Configure isort

* Apply isort with line limit

* Put the right black version

* adapt black in check copies

* Fix copies
2023-02-06 18:10:56 -05:00
Yih-Dar
5fa0b17c3d
[Past CI] 🔥 Leave Past CI failures in the past 🔥 (#20861)
* torch.jit._state

* Fix past CI

* Fix for perceiver

* Fix REALM

* Fix for Bloom

* Fix for SwinMode

* Fix for TrajectoryTransformerModel

* Fix for test_wav2vec2_with_lm

* make style

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-12-27 18:37:25 +01:00
Joao Gante
f270b960d6
Generate: move generation_*.py src files into generation/*.py (#20096)
* move generation_*.py src files into generation/*.py

* populate generation.__init__ with lazy loading

* move imports and references from generation.xxx.object to generation.object
2022-11-09 15:34:08 +00:00
Yih-Dar
cbb8a37929
Skip BloomEmbeddingTest.test_embeddings for PyTorch < 1.10 (#19261)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-10-10 10:05:30 +02:00
Younes Belkada
587d84b178
Add BloomForQuestionAnswering (#19310)
* add bloom for question answering

- attempt to add Bloom for question answering
- adapted from `GPTJForQuestionAnswering`
- Fixed `num_labels` to `2` for common tests
- Added a bit of docstring
- All common tests pass

* Update src/transformers/models/bloom/modeling_bloom.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* revert changes related to `num_labels`

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-10-04 17:52:13 +02:00
Yih-Dar
ea75e9f10e
Use assertAlmostEqual in BloomEmbeddingTest.test_logits (#19200)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-09-26 14:56:41 +02:00
Shijie Wu
f3d3863255
fix arg name in BLOOM testing and remove unused arg document (#18843)
2022-09-15 20:25:32 +02:00
Niklas Muennighoff
56ef0ba447
Update BLOOM parameter counts (#18531)
* Update BLOOM parameter counts

* Update BLOOM parameter counts
2022-08-12 19:36:18 +02:00
Michael Benayoun
4e2f4a92dd
[FX] Symbolic trace for Bloom (#18356)
* Bloom model can now be traced

* Bloom traced model can be torch scripted and serialized

* Bloom can be traced with variable keyword arguments

* Enable XLNet support

* Disable XLNet for now
2022-07-29 16:12:27 +02:00
Younes Belkada
6a1b1bf7a6
BLOOM minor fixes small test (#18175)
* minor fixes

- add correct revision
- corrected docstring for test
- removed a test

* contrib credits

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>
2022-07-18 19:18:19 +02:00
Younes Belkada
a462fc9232
Bloom Optimize operations (#17866)
* fix tolerance for a bloom slow test

* enhance alibi padding

- get rid of for loops
- deals better with padded batched input
- avoid useless cpu/gpu communication when creating alibi

Co-authored-by: justheuristic <justheuristic@gmail.com>

* optimize attention mask

* fix scaled softmax limit values

* optimize building alibi tensor

Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>

* fix attention_mask shape when it's None

* minor fixes

- fix docstring + arg names

* remove colons in docstring

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* apply suggestion

* remove unused arg

* refactor a bit

- use [:, None] for consistency

* refactor attention block

Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>

* quick fixes

* first attempt

* refactor attention block and fix all tests except "test_simple_generation"

- added comments to better explain attention block

* remove debug lines and add TODO comment

* change `torch.bmm` to `torch.baddbmm`
- fixes `test_simple_generation` but breaks `test_batch_generation_padd`

* styling

* all tests are passing now
- use `bmm`
- add explanation for `allow_fp16_reduced_precision_reduction`

Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>

* styling

Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>

* fix support for accelerate

Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* remove attn softmax in fp32

* refactor comments

* refactor a bit

- remove warning message
- remove print on test

* refer to pytorch t5

* change the slow tests

- do the tests in fp32
- remove some comments
- keep large comments

* update expected output for `test_simple_generation`
- we now test using fp32

* make style + change comments a bit

* fix dtype padd test

Co-authored-by: justheuristic <justheuristic@gmail.com>
Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>
Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-07-11 13:16:13 -04:00
Younes Belkada
18c263c4b6
BLOOM minor changes on tokenizer (#17823)
* few fixes:

- hardcode tokenizer padding side
- remove unused args

* few fixes:

- added new attribute on TokenizerTesterMixin
- added new slow test
- remove unused arg on tokenizer class

* make style

* Update src/transformers/models/bloom/tokenization_bloom_fast.py

Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>

* make quality

* apply changes

- remove new attribute
- redefine test on the class

* add comments

Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
2022-06-23 15:57:12 +02:00
Younes Belkada
d453ea6120
fix tolerance for a bloom slow test (#17634)
2022-06-14 18:14:12 +02:00
Hailey Schoelkopf
edb672ac5e
Add BloomForSequenceClassification and BloomForTokenClassification classes (#17639)
* add new bloom classes

* (feat) add bloom classification tests; make style

* style: change import in test

* add some typehints to bloom classes

* merge main into branch

* fix: input checking in bloom seq classification

* fix tests

* change model class tests

* fix few tests

- more tests should pass
- one test left

* make token classifier return hidden states

* style: make BLOOM typehints consistent

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2022-06-14 17:10:12 +02:00
Younes Belkada
ca2a55e9df
BLOOM (#17474)
* adding template

* update model

* model update

* update conf for debug model

* update conversion

* update conversion script

* update conversion script

* fix missing keys check

* add tests to test the tokenizer in the local machine

* Change variable name

* add tests on xnli dataset

* add more description

* add descriptions + clearer code

* clearer code

* adding new tests + skipping few tests because of env problems

* change comment

* add dtype on the configuration

* add test embeddings

* add hardcoded test

* fix dtype issue

* adding torch.float16 to config

* adding more metrics (min, max, mean)

* add sum

* now the test passes with almost equal

* add files for conversion - test passes on cpu  gpu

* add final changes

* cleaning code

* add new args in the docstring

* fix one liner function

* remove macros

* remove forward attention

* clean up init function

* add comments on the issue

* rm scale mask softmax

* do make style

* fix dtype in init

* fixing for loop on att probs

* fix style with black

* fix style + doc error

* fix and debug CI errors (docs + style)

* some updates

- change new operations
- finally add scaled softmax
- added new args in the config

* make use cache working

* add changes

- save sharded models
- final changes on the modeling script

* add changes

- comment on alibi
- add TODO on seq length

* test commit

- added a text to test the commit

Co-authored-by: thomasw21 <24695242+thomasw21@users.noreply.github.com>

* final changes

- attention mask change
- generation works on BS176b

Co-authored-by: thomasw21 <24695242+thomasw21@users.noreply.github.com>

* changes - model + conversion

* move to correct dir

* put ,

* few fixes

* fix tokenizer autodoc

* fix minor CI issues

* fix minor CI issues

* fix minor CI issues

* fix style issue

* fix minor import issues

* fix few issues

* remove def main on the test

* add require torch

* replace decorator with 'with'

* fix style

* change to bloom

* add quick fix tokenizer

* fix tokenizer file

* fix tokenizer

- merge tests
- small fixes

* fix import issue

* add bloom to readme

* fix consistency

* Update docs/source/en/model_doc/bloom.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply suggestions from code review

fix comment issues on file headers

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix doc issue

* small fix - modeling test

* some changes

- refactor some code
- taking into account reviews
- more tests should pass
- removed pruning tests

* remove useless division

* more tests should pass

* more tests should pass

* more tests should pass

* let's try this one

- add alibi offset
- remove all permutes to make the grad operations work
- fingers crossed

* refactor

- refactor code
- style changes
- add new threshold for test

* major changes

- change BLOOM to Bloom
- add quick doc on bloom.mdx
- move embeddings test on modeling test

* modify readme

* small fixes

* small fix

- better threshold for a test

* remove old test file from fetcher

* fix small typo

* major change

- change BloomLMHead to BloomForCausalLM

* remove onnx config

* major changes

- refactor the code
- remove asserts
- change tol for test

* make style

* small change

* adding a slow test + commenting old ones for now

* make style

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* make style

* fix duplicates

* cleaning comments on config

* clean a bit conversion file

* refactor a bit modeling file

* refactor tokenizer file

* fix tokenization test issue

* fix tokenization issue #2

* fix tokenization issue second try

* fix test issue

* make style + add suggestions

* change test fetcher

* try this one

- slow tests should pass
- fingers crossed

* possible final changes

* make style

* try fix padding side issue

* fix side

* fix padding issue

* fix ko-readme

* fix config auto

* cleaning modeling file

* keep bloom in caps in ko

* update config docs

* remove pretraining_pp

* remove model parallel

* update config

- add correct config files

* fix duplicates

* fix fetcher

* fix refactor issue

- remove divide function

* try to remove alibi

* small fixes

- fix alibi
- remove seq length
- refactor a bit the code

* put correct values

- fix bos and eos token ids

* fix attention mask loop

Co-authored-by: thomasw21 <24695242+thomasw21@users.noreply.github.com>

* small fixes:

- remove skip bias add

* small fixes

- fix typo in readme
- fix typos in config

* small changes

- remove a test
- add reconstruction test
- change config

* small changes

- change Scaled Softmax to BloomScaledSoftmax

* small fixes

- fix alibi dtype

* major changes

- removing explicit dtype when loading modules
- fixing test args (torch_dtype=auto)
- add docstring

* fix readmes

* major changes

- now bloom supports alibi shifting
- refactor a bit the code
- better test tolerance now

* refactor a bit

* refactor a bit

* put correct name on test

* change docstring

* small changes

- fix docstring modeling
- fix test tolerance

* fix small nit

- take dtype from tensors in the conversion script

* minor fix

- fix mdx issue

* minor fix

- change config docstring

* forward contrib credits from PR14084

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* apply modifications

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* resolve softmax upcast

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update src/transformers/models/bloom/modeling_bloom.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* final changes modeling

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Merge commit 'd156898f3b9b2c990e5963f5030a7143d57921a2'

* merge commit

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* apply suggestions

Apply suggestions from Stas comments
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Fix gradient checkpointing

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* add slow but exact

* add accelerate compatibility

Co-authored-by: Nicolas Patry <Narsil@users.noreply.github.com>

* forward contrib credits

Co-authored-by: thomasw21 <thomasw21@users.noreply.github.com>
Co-authored-by: sgugger <sgugger@users.noreply.github.com>
Co-authored-by: patrickvonplaten <patrickvonplaten@users.noreply.github.com>
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
Co-authored-by: LysandreJik <LysandreJik@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fix torch device on tests

* make style

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fix nits

Co-authored-by: patrickvonplaten <patrickvonplaten@users.noreply.github.com>

* remove final nits

* fix doc

- add more details on the doc
- add links to checkpoints

* Update src/transformers/__init__.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/bloom/modeling_bloom.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* apply suggestions

Co-authored-by: sgugger <sgugger@users.noreply.github.com>

* put test torchscript to false

* Update src/transformers/models/bloom/modeling_bloom.py

Co-authored-by: justheuristic <justheuristic@gmail.com>

* fix alibi

- create alibi only once

* add small doc

* make quality

* replace torch.nn

* remove token type emb

* fix fused op + output bias

* add fused op

- now can control fused operation from config

* remove fused op

* make quality

* small changes

- remove unused args on config
- removed bias gelu file
- make the model torchscriptable
- add torchscript slow tests

* Update src/transformers/models/bloom/modeling_bloom.py

* fix slow

* make style

* add accelerate support

* add bloom to deepspeed tests

* minor changes

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* minor change

* slow tests pass

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/en/model_doc/bloom.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* minor changes:

- change docstring
- add link to paper

Co-authored-by: Thomwolf <thomwolf@gmail.com>
Co-authored-by: Thomas Wolf <thomas@huggingface.co>
Co-authored-by: thomasw21 <24695242+thomasw21@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: sIncerass <sheng.s@berkeley.edu>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
Co-authored-by: Nicolas Patry <Narsil@users.noreply.github.com>
Co-authored-by: thomasw21 <thomasw21@users.noreply.github.com>
Co-authored-by: sgugger <sgugger@users.noreply.github.com>
Co-authored-by: patrickvonplaten <patrickvonplaten@users.noreply.github.com>
Co-authored-by: LysandreJik <LysandreJik@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: justheuristic <justheuristic@gmail.com>
Co-authored-by: Stas Bekman <stas@stason.org>
2022-06-09 12:00:40 +02:00