Sanchit Gandhi
|
e93103632b
|
Add bloom flax (#25094)
* First commit
* step 1 working
* add alibi
* placeholder for `scan`
* add matrix mult alibi
* beta scaling factor for bmm
* working v1 - simple forward pass
* move layer_number from attribute to arg in call
* partial functioning scan
* hacky working scan
* add more modifs
* add test
* update scan for new kwarg order
* fix position_ids problem
* fix bug in attention layer
* small fix
- do the alibi broadcasting only once
* prelim refactor
* finish refactor
* alibi shifting
* incorporate dropout_add to attention module
* make style
* make padding work again
* update
* remove bogus file
* up
* get generation to work
* clean code a bit
* added small tests
* adding alibi test
* make CI tests pass:
- change init weight
- add correct tuple for output attention
- add scan test
- make CI tests work
* fix few nits
* fix nit onnx
* fix onnx nit
* add missing dtype args to nn.Modules
* remove debugging statements
* fix scan generate
* Update modeling_flax_bloom.py
* Update test_modeling_flax_bloom.py
* Update test_modeling_flax_bloom.py
* Update test_modeling_flax_bloom.py
* fix small test issue + make style
* clean up
* Update tests/models/bloom/test_modeling_flax_bloom.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* fix function name
* small fix test
* forward contrib credits from PR17761
* Fix failing test
* fix small typo documentation
* fix non passing test
- remove device from build alibi
* refactor call
- refactor `FlaxBloomBlockCollection` module
* make style
* upcast to fp32
* cleaner way to upcast
* remove unused args
* remove layer number
* fix scan test
* make style
* fix i4 casting
* fix slow test
* Update src/transformers/models/bloom/modeling_flax_bloom.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* remove `layer_past`
* refactor a bit
* fix `scan` slow test
* remove useless import
* major changes
- remove unused code
- refactor a bit
- revert import `torch`
* major refactoring
- change build alibi
* remove scan
* fix tests
* make style
* clean-up alibi
* add integration tests
* up
* fix batch norm conversion
* style
* style
* update pt-fx cross tests
* update copyright
* Update src/transformers/modeling_flax_pytorch_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* per-weight check
* style
* line formats
---------
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <haileyschoelkopf@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
|
2023-07-27 18:24:56 +01:00 |
|
Arthur
|
799df10aef
|
[Umt5] Add Google's umt5 to transformers (#24477)
* add tokenization template
* update conversion script
* update modeling code
* update
* update convert checkpoint
* update modeling
* revert changes on convert script
* new conversion script for new format
* correct position bias
* cleaning a bit
* Credit co authors
Co-authored-by: agemagician <ahmed.elnaggar@tum.de>
Co-authored-by: stefan-it <>
* styling
* Add doc
* fix copies
* add co author
* Other Author
* Merge branch 'main' of https://github.com/huggingface/transformers into add-umt5
* add testing
* nit
* Update docs/source/en/model_doc/umt5.mdx
Co-authored-by: Stefan Schweter <stefan@schweter.it>
* fix t5
* actual fix?
* revert wrong changes
* remove
* update test
* more fixes
* revert some changes
* add SPIECE_UNDERLINE
* add a common example
* update
* fix copies
* revert changes on t5 conversion script
* revert bytefallback changes since there was no addition yet
* fixup
* fixup
* ignore umt5 custom testing folder
* fix readmes
* revert T5 changes
* same outputs
* fixup
* update example
* Apply suggestions from code review
* style
* draft addition of all new files
* current update
* fix attention and stuff
* finish refactoring
* auto config
* fixup
* more nits
* add umt5 to init
* use md format
* Update README.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* revert changes on mt5
* revert mt5 changes
* update test
* more fixes
* add to mapping
* fix-copies
* fix copies
* fix retain grad
* fix some tests
* nits
* done
* Update src/transformers/models/umt5/modeling_umt5.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update docs/source/en/model_doc/umt5.md
* Update src/transformers/models/umt5/__init__.py
* Update docs/source/en/model_doc/umt5.md
Co-authored-by: Stefan Schweter <stefan@schweter.it>
* Update src/transformers/models/umt5/modeling_umt5.py
* update conversion script + use google checkpoints
* nits
* update test and modelling
* stash slow convert
* update fixupd
* don't change slow
---------
Co-authored-by: stefan-it <>
Co-authored-by: Stefan Schweter <stefan@schweter.it>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
|
2023-07-03 07:38:21 +02:00 |
|
Sylvain Gugger
|
eb849f6604
|
Migrate doc files to Markdown. (#24376)
* Rename index.mdx to index.md
* With saved modifs
* Address review comment
* Treat all files
* .mdx -> .md
* Remove special char
* Update utils/tests_fetcher.py
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
---------
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
|
2023-06-20 18:07:47 -04:00 |
|