* Adding support for `pipeline("automatic-speech-recognition")`.
- Ugly `"config"` choice for AutoModel. It would be great to have
something like `AutoModelFor` that would implement the same logic
(load the config, check `architectures`, and load the first one).
* Remove `model_id`; it was not needed in the end.
* Rebased.
* Remove old code.
* Rename `nlp`.
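
The "load the config, check architectures, load the first one" logic mentioned above could look roughly like the sketch below. It is plain Python with no dependency on the library: the classes, the registry, and `load_first_architecture` are hypothetical names invented for illustration, and the config is assumed to be a dict with an `architectures` list.

```python
from typing import Any, Dict, Type


class HypotheticalCTCModel:
    """Stand-in for a real model class (hypothetical)."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self.config = config


class HypotheticalSeq2SeqModel:
    """Second stand-in model class (hypothetical)."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self.config = config


# Hypothetical registry mapping architecture names to classes.
ARCHITECTURE_REGISTRY: Dict[str, Type] = {
    "HypotheticalCTCModel": HypotheticalCTCModel,
    "HypotheticalSeq2SeqModel": HypotheticalSeq2SeqModel,
}


def load_first_architecture(config: Dict[str, Any], registry: Dict[str, Type]):
    """Instantiate the first architecture listed in the config."""
    architectures = config.get("architectures") or []
    if not architectures:
        raise ValueError("config does not list any architectures")
    name = architectures[0]
    if name not in registry:
        raise ValueError(f"unknown architecture: {name}")
    return registry[name](config)
```

Only the first entry is used; any further entries in `architectures` are ignored in this sketch.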
* Use the validation split percentage for custom data files as well
Same issue as https://github.com/huggingface/transformers/issues/12406, fixed for the PyTorch run_mlm.py
* Validation split added in the right place
* Update run_clm.py
* validation split added for custom files
* Validation split added for custom files
* Update run_plm.py
* Fixed validation split for custom input files in the PyTorch language-modeling examples
* Update run_clm_no_trainer.py
* args modified
* Copy BART to MBart and rename some stuff
* Add copy statements pointing to FlaxBart
* Update/add some common files
* Update shift_tokens_right + fix imports
* Fix shift_tokens_right method according to MBart implementation
* Update shift_tokens_right in tests accordingly
* Fix the import issue and update docs file
* make style quality
* Make some minor changes following patil-suraj's suggestions
* Change the order of normalization layer and attention
* Add some copy statements
* Update generate method and add integration test for mBart
* Make a few updates after a review
Also add `lang_code_to_id` to MBartTokenizerFast
* fix-copies; make style quality
* Apply suggestions from code review
* Apply suggestions from code review
* Apply suggestions from code review
* fix output type, style
* add copied from
* resolve conflicts
Co-authored-by: Suraj Patil <surajp815@gmail.com>
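
For reference, the MBart-style `shift_tokens_right` mentioned in the commits above can be sketched as follows. This is a pure-Python illustration on nested lists, not the Flax code from this change. It assumes right-padded inputs whose last non-pad token is the language code, and that `-100` marks ignored label positions.

```python
from typing import List


def shift_tokens_right(input_ids: List[List[int]], pad_token_id: int) -> List[List[int]]:
    """Shift each row one position right, wrapping the last non-pad token to the front."""
    shifted = []
    for row in input_ids:
        if not row:
            shifted.append([])
            continue
        # Assumption: -100 marks ignored label positions; treat them as padding.
        row = [pad_token_id if tok == -100 else tok for tok in row]
        # With right padding, the last non-pad token sits at index (count - 1).
        num_non_pad = sum(1 for tok in row if tok != pad_token_id)
        start_token = row[num_non_pad - 1]
        # The wrapped token becomes the decoder start token; the final token is dropped.
        shifted.append([start_token] + row[:-1])
    return shifted
```

Unlike the BART variant, which prepends a fixed decoder start token, the start token here is taken from each row, so it can differ per example.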
* fix_torch_device_generate_test
* remove @
* upload
* finish dataset streaming
* adapt readme
* finish
* up
* up
* up
* up
* Apply suggestions from code review
* finish
* make style
* make style2
* finish
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
* Validation split added: custom data files
A validation split is now created when custom data is loaded without a validation file.
* Updated documentation with custom file usage
* Update README.md
* Update README.md
* Update README.md
* Made some suggested stylistic changes
* Used logger instead of print.
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Made similar changes to add validation split
In case of a missing validation file, a validation split will be used now.
* Apply max_train_samples to training data only
max_train_samples was misplaced; it is now applied to the training data only, not to the whole dataset.
* styled
* changed ordering
* Improved language of documentation
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Improved language of documentation
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fixed styling issue
* Update run_mlm.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
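
The intended ordering for the validation split and `max_train_samples` can be sketched as below. This is plain Python on in-memory lists, not the code in the example scripts. `build_splits` is a hypothetical helper, and taking the first N percent of the training records as validation, with integer rounding, is an assumption of this sketch.

```python
from typing import Dict, List, Optional


def build_splits(
    splits: Dict[str, List],
    validation_split_percentage: int = 5,
    max_train_samples: Optional[int] = None,
) -> Dict[str, List]:
    """Carve out a validation split if none is given, then cap the training set."""
    train = list(splits["train"])
    if "validation" in splits:
        validation = list(splits["validation"])
    else:
        # No validation file: hold out the first N percent of the training records.
        cut = len(train) * validation_split_percentage // 100
        validation, train = train[:cut], train[cut:]
    # Cap only the training data, after the split, so validation is unaffected.
    if max_train_samples is not None:
        train = train[:max_train_samples]
    return {"train": train, "validation": validation}
```

Applying the cap after the split keeps the validation set size independent of `max_train_samples`.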