Commit Graph

3 Commits

Author SHA1 Message Date
Sanchit Gandhi
e93103632b
Add bloom flax (#25094)
* First commit

* step 1 working

* add alibi

* placeholder for `scan`

* add matrix mult alibi

* beta scaling factor for bmm

* working v1 - simple forward pass

* move layer_number from attribute to arg in call

* partial functioning scan

* hacky working scan

* add more modifs

* add test

* update scan for new kwarg order

* fix position_ids problem

* fix bug in attention layer

* small fix

- do the alibi broadcasting only once

* prelim refactor

* finish refactor

* alibi shifting

* incorporate dropout_add to attention module

* make style

* make padding work again

* update

* remove bogus file

* up

* get generation to work

* clean code a bit

* added small tests

* adding albii test

* make CI tests pass:

- change init weight
- add correct tuple for output attention
- add scan test
- make CI tests work

* fix few nits

* fix nit onnx

* fix onnx nit

* add missing dtype args to nn.Modules

* remove debugging statements

* fix scan generate

* Update modeling_flax_bloom.py

* Update test_modeling_flax_bloom.py

* Update test_modeling_flax_bloom.py

* Update test_modeling_flax_bloom.py

* fix small test issue + make style

* clean up

* Update tests/models/bloom/test_modeling_flax_bloom.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* fix function name

* small fix test

* forward contrib credits from PR17761

* Fix failing test

* fix small typo documentation

* fix non passing test

- remove device from build alibi

* refactor call

- refactor `FlaxBloomBlockCollection` module

* make style

* upcast to fp32

* cleaner way to upcast

* remove unused args

* remove layer number

* fix scan test

* make style

* fix i4 casting

* fix slow test

* Update src/transformers/models/bloom/modeling_flax_bloom.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* remove `layer_past`

* refactor a bit

* fix `scan` slow test

* remove useless import

* major changes

- remove unused code
- refactor a bit
- revert import `torch`

* major refactoring

- change build alibi

* remove scan

* fix tests

* make style

* clean-up alibi

* add integration tests

* up

* fix batch norm conversion

* style

* style

* update pt-fx cross tests

* update copyright

* Update src/transformers/modeling_flax_pytorch_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* per-weight check

* style

* line formats

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <haileyschoelkopf@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-07-27 18:24:56 +01:00
Arthur
799df10aef
[Umt5] Add google's umt5 to transformers (#24477)
* add tokenization template

* update conversion script

* update modeling code

* update

* update convert checkpoint

* update modeling

* revert changes on convert script

* new conversion script for new format

* correct position bias

* cleaning a bit

* Credit co authors

Co-authored-by: agemagician
<ahmed.elnaggar@tum.de>

Co-authored-by: stefan-it
<>

* styling

* Add docq

* fix copies

* add co author

* Other Author

* Merge branch 'main' of https://github.com/huggingface/transformers into add-umt5

* add testing

* nit

* Update docs/source/en/model_doc/umt5.mdx

Co-authored-by: Stefan Schweter <stefan@schweter.it>

* fix t5

* actual fix?

* revert wrong changes

* remove

* update test

* more fixes

* revert some changes

* add SPIECE_UNDERLINE

* add a commone xample

* upfate

* fix copies

* revert changes on t5 conversion script

* revert bytefallback changes since there was no addition yet

* fixup

* fixup

* ingore umt5 cutom testing folder

* fix readmes

* revertT5 changes

* same outputs

* fixup

* update example

* Apply suggestions from code review

* style

* draft addition of all new files

* current update

* fix attention and stuff

* finish refactoring

* auto config

* fixup

* more nits

* add umt5 to init

* use md format

* Update README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* revert changes on mt5

* revert mt4 changes

* update test

* more fixes

* add to mapping

* fix-copies

* fix copies

* foix retain grad

* fix some tests

* nits

* done

* Update src/transformers/models/umt5/modeling_umt5.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/en/model_doc/umt5.md

* Update src/transformers/models/umt5/__init__.py

* Update docs/source/en/model_doc/umt5.md

Co-authored-by: Stefan Schweter <stefan@schweter.it>

* Update src/transformers/models/umt5/modeling_umt5.py

* update conversion script + use google checkpoints

* nits

* update test and modelling

* stash slow convert

* update fixupd

* don't change slow

---------

Co-authored-by: stefan-it <>
Co-authored-by: Stefan Schweter <stefan@schweter.it>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-07-03 07:38:21 +02:00
Sylvain Gugger
eb849f6604
Migrate doc files to Markdown. (#24376)
* Rename index.mdx to index.md

* With saved modifs

* Address review comment

* Treat all files

* .mdx -> .md

* Remove special char

* Update utils/tests_fetcher.py

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

---------

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2023-06-20 18:07:47 -04:00