* Remove "Model" suffix from Flax models to look more 🤗
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Initial working (forward + backward) for Flax MLM training example.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Simply code
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Addressing comments, using module and moving to LM task.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Restore parameter name "module" wrongly renamed model.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Restore correct output ordering...
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Actually commit the example 😅
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Add FlaxBertModelForMaskedLM after rebasing.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Make it possible to initialize the training from scratch
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Reuse flax linen example of cross entropy loss
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Added specific data collator for flax
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Remove todo for data collator
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Added evaluation step
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Added ability to provide dtype to support bfloat16 on TPU
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Enable flax tensorboard output
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Enable jax.pmap support.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Ensure batches are correctly sized to be dispatched with jax.pmap
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Enable bfloat16 with --fp16 cmdline args
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Correctly export metrics to tensorboard
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Added dropout and ability to use it.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Effectively enable & disable during training and evaluation steps.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Oops.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Enable specifying kernel initializer scale
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Style.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Added warmup step to the learning rate scheduler.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Fix typo.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Print training loss
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Make style
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* fix linter issue (flake8)
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Fix model matching
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Fix dummies
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Fix non default dtype on Flax models
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Use the same create_position_ids_from_input_ids for FlaxRoberta
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Make Roberta attention as Bert
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* fix copy
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Wording.
Co-authored-by: Marc van Zee <marcvanzee@gmail.com>
Co-authored-by: Marc van Zee <marcvanzee@gmail.com>
* diverse beam search
* bug fixes
* bug fixes
* bug fix
* separate out diverse_beam_search function
* separate out diverse_beam_search function
* bug fix
* improve code quality
* bug fix
* bug fix
* separate out diverse beam search scorer
* code format
* code format
* code format
* code format
* add test
* code format
* documentation changes
* code quality
* add slow integration tests
* more general name
* refactor into logits processor
* add test
* avoid too much copy paste
* refactor
* add to docs
* fix-copies
* bug fix
* Revert "bug fix"
This reverts commit c99eb5a8dc.
* improve comment
* implement sylvains feedback
Co-authored-by: Ayush Jain <a.jain@sprinklr.com>
Co-authored-by: ayushtiku5 <40797286+ayushtiku5@users.noreply.github.com>
* Add new SQUAD example
* Same with a task-specific Trainer
* Address review comment.
* Small fixes
* Initial work for XLNet
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Final clean up and working XLNet script
* Test and debug
* Final working version
* Add new SQUAD example
* Same with a task-specific Trainer
* Address review comment.
* Small fixes
* Initial work for XLNet
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Final clean up and working XLNet script
* Test and debug
* Final working version
* Add tick
* Update README
* Address review comments
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Removed unused `encoder_hidden_states` and `encoder_attention_mask` from MobileBert
* Removed decoder tests for MobileBert
* Removed now unnecessary import
* [training] SAVE_STATE_WARNING was removed in pytorch
FYI `SAVE_STATE_WARNING` has been removed 3 days ago: pytorch/pytorch#46813
Fixes: #8232
@sgugger
* style, but add () to prevent autoformatters from botching it
* switch to try/except
* cleanup
* initial commit
* [cli] lfs commands
* Fix FileSlice
* Tweak to FileSlice
* [hf_api] Backport filetype arg from `datasets`
cc @lhoestq
* Silm down the CI while i'm working
* Ok let's try this in CI
* Update config.yml
* Do not try this at home
* one more try
* Update lfs.py
* Revert "Tweak to FileSlice"
This reverts commit d7e32c4b35.
* Update test_hf_api.py
* Update test_hf_api.py
* Update test_hf_api.py
* CI still green?
* make CI green again?
* Update test_hf_api.py
* make CI red again?
* Update test_hf_api.py
* add CI style back
* Fix CI?
* oh my
* doc + switch back to real staging endpoint
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Pierric Cistac <Pierrci@users.noreply.github.com>
* Fix docblock + f-strings
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Pierric Cistac <Pierrci@users.noreply.github.com>
* Add TFGPT2ForSequenceClassification based on DialogRPT
* Add TFGPT2ForSequenceClassification based on DialogRPT
* TFGPT2ForSequenceClassification based on DialogRPT-refactored code, implemented review comments and added input processing
* Add TFGPT2ForSequenceClassification based on DialogRPT
* TFGPT2ForSequenceClassification based on DialogRPT-refactored code, implemented review comments and added input processing
* code refactor for latest other TF PR
* code refactor
* code refactor
* Update modeling_tf_gpt2.py
Without this fix, training a `BARTForSequenceClassification` model with `run_pl_glue.py` gives `TypeError: forward() got an unexpected keyword argument 'token_type_ids'`, because BART does not have token_type_ids. I've solved this issue in the same way as it's solved for the "distilbert" model, and I can train BART models on SNLI without errors now.
* Add badge w/ number of models on the hub
* try to apease @sgugger 😇
* not sure what this `c` was about [ci skip]
* Fix script and move stuff around
* Fix doc styling error
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
* [trainer] improve code
This PR:
- removes redundant code
```
self.model = model if model is not None else None
```
and
```
self.model = model
```
are the same.
* separate attribute assignment from code logic - which simplifies things further.
* whitespace
* Warning about too long input for fast tokenizers too
If truncation is not set in tokenizers, but the tokenization is too long
for the model (`model_max_length`), we used to trigger a warning that
The input would probably fail (which it most likely will).
This PR re-enables the warning for fast tokenizers too and uses common
code for the trigger to make sure it's consistent across.
* Checking for pair of inputs too.
* Making the function private and adding it's doc.
* Remove formatting ?? in odd place.
* Missed uppercase.