Patrick von Platen
8c9b5fcbaf
[Flax] Big FlaxBert Refactor (#11364)
* improve flax
* refactor
* typos
* Update src/transformers/modeling_flax_utils.py
* Apply suggestions from code review
* Update src/transformers/modeling_flax_utils.py
* fix typo
* improve error tolerance
* typo
* correct nasty saving bug
* fix from pretrained
* correct tree map
* add note
* correct weight tying
2021-04-23 09:53:09 +02:00
Patrick von Platen
e87505f3a1
[Flax] Add other BERT classes (#10977)
* add first code structures
* add all bert models
* add to init and docs
* correct docs
* make style
2021-03-31 09:45:58 +03:00
Patrick von Platen
8780caa388
[WIP][Flax] Add general conversion script (#10809)
* save intermediate
* finish first version
* delete some more
* improve import
* fix roberta
* Update src/transformers/modeling_flax_pytorch_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/modeling_flax_pytorch_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* small corrections
* apply all comments
* fix deterministic
* make fix-copies
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-03-30 12:13:59 +03:00
Patrick von Platen
0b98ca368f
[Flax] Adapt Flax models to new structure (#9484)
* Create modeling_flax_eletra with code copied from modeling_flax_bert
* Add ElectraForMaskedLM and ElectraForPretraining
* Add modeling test for Flax electra and fix naming and arg in Flax Electra model
* Add documentation
* Fix code style
* Fix code quality
* Adjust tol in assert_almost_equal due to very small difference between model output, ranging 0.0010 - 0.0016
* Remove redundant ElectraPooler
* save intermediate
* adapt
* correct bert flax design
* adapt roberta as well
* finish roberta flax
* finish
* apply suggestions
* apply suggestions
Co-authored-by: Chris Nguyen <anhtu2687@gmail.com>
2021-03-18 09:44:17 +03:00
Patrick von Platen
9f8619c6aa
Flax testing should not run the full torch test suite (#10725)
* make flax tests pytorch independent
* fix typo
* finish
* improve circle ci
* fix return tensors
* correct flax test
* re-add sentencepiece
* last tokenizer fixes
* finish maybe now
2021-03-16 08:05:37 +03:00
Patrick von Platen
640e6fe190
[Flax] Align FlaxBertForMaskedLM with BertForMaskedLM, implement from_pretrained, init (#9054)
* save intermediate
* save intermediate
* save intermediate
* correct flax bert model file
* new module / model naming
* make style
* almost finish BERT
* finish roberta
* make fix-copies
* delete keys file
* last refactor
* fixes in run_mlm_flax.py
* remove pooled from run_mlm_flax.py
* fix gelu | gelu_new
* remove Module from inits
* splits
* dirty print
* preventing warmup_steps == 0
* smaller splits
* make fix-copies
* dirty print
* dirty print
* initial_evaluation argument
* declaration order fix
* proper model initialization/loading
* proper initialization
* run_mlm_flax improvements: improper model inputs bugfix + automatic dataset splitting + tokenizers parallelism warning + avoiding warmup_steps=0 bug
* removed tokenizers warning hack, fixed model re-initialization
* reverted training_args.py changes
* fix flax from pretrained
* improve test in flax
* apply sylvains tips
* update init
* make 0.3.0 compatible
* revert tevens changes
* revert tevens changes 2
* finalize revert
* fix bug
* add docs
* add pretrained to init
* Update src/transformers/modeling_flax_utils.py
* fix copies
* final improvements
Co-authored-by: TevenLeScao <teven.lescao@gmail.com>
2020-12-16 13:03:32 +01:00
Sylvain Gugger
8d4bb02056
Refactor FLAX tests (#9034)
2020-12-10 15:57:39 -05:00