Shamane Siri
5257818e68
minor fixes in original RAG training ( #12395 )
2021-06-29 13:39:48 +01:00
Jabin Huang
e3f39a2952
fix ids_to_tokens naming error in tokenizer of deberta v2 ( #12412 )
...
Co-authored-by: Jipeng Huang <jihuan@microsoft.com>
2021-06-29 08:15:35 -04:00
Patrick von Platen
813328682e
[Flax] Example scripts - correct weight decay ( #12409 )
...
* fix_torch_device_generate_test
* remove @
* finish
* finish
* correct style
2021-06-29 12:01:08 +01:00
Suraj Patil
aecae53377
[example/flax] add summarization readme ( #12393 )
...
* add readme
* update readme and add requirements
* Update examples/flax/summarization/README.md
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-06-29 14:02:33 +05:30
Will Rice
3886104574
Fix TFWav2Vec2 SpecAugment ( #12289 )
...
* Fix TFWav2Vec2 SpecAugment
* Invert masks
* Feedback changes
2021-06-29 09:15:57 +01:00
Will Rice
bc084938f2
Add out of vocabulary error to ASR models ( #12288 )
...
* Add OOV error to ASR models
* Feedback changes
2021-06-29 08:57:46 +01:00
NielsRogge
1fc6817a30
Rename detr targets to labels ( #12280 )
...
* Rename target to labels in DetrFeatureExtractor
* Update DetrFeatureExtractor tests accordingly
* Improve docs of DetrFeatureExtractor
* Improve docs
* Make style
2021-06-29 03:07:46 -04:00
Stas Bekman
7682e97702
[models] respect dtype of the model when instantiating it ( #12316 )
...
* [models] respect dtype of the model when instantiating it
* cleanup
* cleanup
* rework to handle non-float dtype
* fix
* switch to fp32 tiny model
* improve
* use dtype.is_floating_point
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix the doc
* recode to use explicit torch_dtype_auto_detect, torch_dtype args
* docs and tweaks
* docs and tweaks
* docs and tweaks
* merge 2 args, add docs
* fix
* fix
* better doc
* better doc
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-28 20:11:21 -07:00
Patrick von Platen
31c3e7e75b
[Flax] Add T5 pretraining script ( #12355 )
...
* fix_torch_device_generate_test
* remove @
* add length computation
* finish masking
* finish
* upload
* fix some bugs
* finish
* fix dependency table
* correct tensorboard
* Apply suggestions from code review
* correct processing
* slight change init
* correct some more mistakes
* apply suggestions
* improve readme
* fix indent
* Apply suggestions from code review
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
* correct tokenizer
* finish
* finish
* finish
* finish
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
2021-06-28 20:11:29 +01:00
Stas Bekman
e277074889
pass the matching trainer log level to deepspeed ( #12401 )
2021-06-28 11:43:24 -07:00
Matt
7e22609e0f
Tensorflow LM examples ( #12358 )
...
* Tensorflow MLM example
* Add CLM example
* Style fixes, adding missing checkpoint code from the CLM example
* Fix TPU training, avoid massive dataset warnings
* Fix incorrect training length calculation for multi-GPU training
* Fix incorrect training length calculation for multi-GPU training
* Refactors and nitpicks from the review
* Style pass
* Adding README
2021-06-28 19:31:44 +01:00
Patrick von Platen
2d70c91206
[Flax] Adapt flax examples to include push_to_hub ( #12391 )
...
* fix_torch_device_generate_test
* remove @
* finish
* correct summary writer
* correct push to hub
* fix indent
* finish
* finish
* finish
* finish
* finish
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
2021-06-28 19:23:35 +01:00
Funtowicz Morgan
a7d0b288fa
Remove the need for einsum in Albert's attention computation ( #12394 )
...
* debug albert einsum
* Fix matmul computation
* Let's use torch linear layer.
* Style.
2021-06-28 18:30:05 +02:00
Sylvain Gugger
276bc149d2
Fix copies
2021-06-28 12:26:40 -04:00
Patrick von Platen
27b6ac4611
Update README.md
2021-06-28 17:22:10 +01:00
Patrick von Platen
89b57a6669
[Flax community event] Add more description to readme ( #12398 )
...
* fix_torch_device_generate_test
* remove @
* boom boom
* correct typos
* Apply suggestions from code review
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Apply suggestions from code review
Co-authored-by: Suzana Ilić <io.suzanai@gmail.com>
* Apply suggestions from code review
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Suzana Ilić <io.suzanai@gmail.com>
2021-06-28 17:18:42 +01:00
Bhadresh Savani
04dbea31a9
[Examples] Added context manager to datasets map ( #12367 )
...
* added context manager to datasets map
* fixed style and spaces
* fixed warning of deprecation
* changed desc
2021-06-28 09:14:00 -07:00
Stas Bekman
d25ad34c82
[CI] add dependency table sync verification ( #12364 )
...
* add dependency table sync verification
* improve the message
* improve the message
* revert
* ready to merge
2021-06-28 08:55:59 -07:00
Sylvain Gugger
57461ac0b4
Add possibility to maintain full copies of files ( #12312 )
2021-06-28 10:02:53 -04:00
Taha ValizadehAslani
9490d668d2
Update run_mlm.py ( #12344 )
...
Previously the script could not be used for validation only, because this line:
extension = data_args.train_file.split(".")[-1]
assumed that the extension must be extracted from the training file. The line ran regardless of whether the user had requested training or validation, which caused an error for evaluation-only runs where no training file exists. It now extracts the extension from the training file when the user wants to train and from the validation file when the user wants to evaluate, so the script can be used for training and validation separately.
2021-06-28 07:49:22 -04:00
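The run_mlm.py change described in the entry above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the helper name `dataset_extension` and its parameters are hypothetical, standing in for the script's `do_train` option and its train/validation file arguments.

```python
from typing import Optional


def dataset_extension(
    do_train: bool,
    train_file: Optional[str],
    validation_file: Optional[str],
) -> str:
    """Return the file extension used to pick a dataset loader.

    Hypothetical helper: the name and signature are assumptions made
    for illustration and do not appear in the original script.
    """
    # Read the extension from whichever file the requested mode needs,
    # instead of always reading it from the training file.
    source = train_file if do_train else validation_file
    if source is None:
        raise ValueError("no data file was given for the requested mode")
    return source.split(".")[-1]


# Evaluation-only run: no training file is supplied.
print(dataset_extension(False, None, "valid.json"))
# Training run: the training file decides the extension.
print(dataset_extension(True, "train.csv", "valid.csv"))
```

With the original single line, the first call would have failed because `train_file` is `None` in an evaluation-only run.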
Kilian Kluge
c7faf2ccc0
[Documentation] Warn that DataCollatorForWholeWordMask is limited to BertTokenizer-like tokenizers ( #12371 )
...
* Notify users that DataCollatorForWholeWordMask is limited to BertTokenizer-like tokenizers
* Fix code formatting
2021-06-28 07:39:56 -04:00
Bhadresh Savani
ff5cdc086b
replace print with logger ( #12368 )
2021-06-26 09:31:25 -07:00
Bhadresh Savani
9a7545943d
updated example template ( #12365 )
2021-06-25 20:50:30 -07:00
Bhadresh Savani
539ee456d4
[Examples] Replicates the new --log_level feature to all trainer-based pytorch ( #12359 )
...
* added log_level
* fix comment
* fixed log_level
* Trigger CI
* Unified logging
* simplified args for log_level
2021-06-25 14:58:42 -07:00
Stas Bekman
64e6098094
[trainer] add main_process_first context manager ( #12351 )
...
* main_process_first context manager
* handle multi-node, add context description
* sync desc
2021-06-25 14:58:03 -07:00
cronoik
f866425898
fixed multiplechoice tokenization ( #12362 )
...
* fixed multiplechoice tokenization
The model would have seen two sequences:
1. [CLS]prompt[SEP]prompt[SEP]
2. [CLS]choice0[SEP]choice1[SEP]
That is incorrect, because we want a contextualized embedding of the prompt together with each choice.
* removed outer brackets for proper sequence generation
2021-06-25 17:41:08 -04:00
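The pairing problem described in the multiple-choice entry above can be illustrated with a toy stand-in. This is only a sketch under stated assumptions: `toy_encode` is a hypothetical function that mimics how a batch of (first, second) text pairs gets joined, and it is not the real tokenizer or the code changed in the commit.

```python
from typing import Iterable, List, Sequence


def toy_encode(pairs: Iterable[Sequence[str]]) -> List[str]:
    """Toy stand-in for a pair tokenizer (hypothetical, for illustration).

    Each item in `pairs` is treated as one (first, second) text pair.
    """
    return [f"[CLS]{first}[SEP]{second}[SEP]" for first, second in pairs]


prompt = "prompt"
choices = ["choice0", "choice1"]
prompts = [prompt] * len(choices)

# With the extra outer brackets, each inner list is read as ONE pair,
# so the prompt is paired with itself and the choices with each other.
wrong = toy_encode([prompts, choices])
print(wrong)

# Without them, prompt i is paired with choice i, which is the input
# a multiple-choice model needs.
right = toy_encode(zip(prompts, choices))
print(right)
```

The first call reproduces the two sequences listed in the commit message; the second pairs the prompt with each choice in turn.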
Stas Bekman
4a872caef4
remove extra white space from log format ( #12360 )
2021-06-25 13:20:14 -07:00
Sylvain Gugger
a3daabfe14
Style
2021-06-25 15:54:31 -04:00
Kai Fricke
238521b0b6
Replace NotebookProgressReporter by ProgressReporter in Ray Tune run ( #12357 )
...
* Replace NotebookProgressReporter by ProgressReporter in Ray Tune run
* Move to local import
2021-06-25 14:12:03 -04:00
Vasudev Gupta
332a245861
Add FlaxBigBird QuestionAnswering script ( #12233 )
...
* port bigbird script
* adapt script a bit
* change location
* adapt more
* save progress
* init commit
* style
* dataset script tested
* readme add
2021-06-25 18:05:48 +01:00
jglaser
55bb4c06f7
Fix exception in prediction loop occurring for certain batch sizes ( #12350 )
...
* fix distributed_concat for scalar outputs
* Update README.md
* fixed typo (#12356 )
* simplify fix with terser syntax
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Trigger CI
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: michal pitr <21157924+MichalPitr@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-25 10:55:15 -04:00
michal pitr
d4ce31e839
fixed typo ( #12356 )
2021-06-25 07:49:29 -04:00
Patrick von Platen
aa550c4a11
Update README.md
2021-06-25 11:55:51 +01:00
Marc van Zee
f2c4ce7e33
Add flax/jax quickstart ( #12342 )
2021-06-24 17:04:18 +01:00
Sylvain Gugger
5b1b5635d3
Document patch release v4.8.1
2021-06-24 10:15:15 -04:00
Lysandre Debut
8ef62ec9e1
Fix torchscript tests ( #12336 )
...
* Fix torchscript tests
* Better test
* Remove bogus print
2021-06-24 09:52:28 -04:00
Suraj Patil
aef3823e1a
[examples/Flax] move the examples table up ( #12341 )
2021-06-24 16:03:37 +05:30
Richard Liaw
7875b638cd
try-this ( #12338 )
...
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2021-06-24 04:13:17 -04:00
Sylvain Gugger
cf3c9198aa
Fix default to logging_dir lost in merge conflict
2021-06-23 16:22:29 -04:00
Stas Bekman
07ae6103c3
[Deepspeed] new docs ( #12077 )
...
* document sub_group_size
* style
* install + issues reporting
* style
* style
* Update docs/source/main_classes/deepspeed.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* indent 4
* restore
* style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-23 11:07:37 -07:00
Sam Havens
3694484d0a
Update training_args.py ( #12328 )
...
mention in `save_strategy` param description that `load_best_model_at_end` can override
2021-06-23 13:39:43 -04:00
Sylvain Gugger
2150dfed31
v4.9.0.dev0
2021-06-23 13:31:19 -04:00
Sylvain Gugger
9252a5127f
Release: v4.8.0
2021-06-23 13:25:56 -04:00
Patrick von Platen
468cda20f2
[Flax T5] Fix weight initialization and fix docs ( #12327 )
...
* finish t5 flax fixes
* improve naming
2021-06-23 17:39:21 +01:00
Sylvain Gugger
12a4457c56
Pin good version of huggingface_hub
2021-06-23 12:30:15 -04:00
Michael Benayoun
986ac03e37
changed modeling_fx_utils.py to utils/fx.py for clarity ( #12326 )
...
Co-authored-by: Michael Benayoun <michael@huggingface.co>
2021-06-23 18:16:24 +02:00
Lysandre
941b4442ba
Temporarily revert the fill-mask improvements.
2021-06-23 17:46:24 +02:00
Lysandre Debut
4bdff2cdbe
Conda build ( #12323 )
2021-06-23 11:07:07 -04:00
Sylvain Gugger
9eda6b52e2
Add all XxxPreTrainedModel to the main init ( #12314 )
...
* Add all XxxPreTrainedModel to the main init
* Add to template
* Add to template bis
* Add FlaxT5
2021-06-23 10:40:54 -04:00
Sylvain Gugger
53c60babe4
Clean push to hub API ( #12187 )
...
* Clean push to hub API
* Create working dir if it does not exist
* Different tweak
* New API + all models + test Flax
* Adds the Trainer clean up
* Update src/transformers/file_utils.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Address review comments
* (nit) output types
* No need to set clone_from when folder exists
* Update src/transformers/trainer.py
Co-authored-by: Julien Chaumond <julien@huggingface.co>
* Add generated_from_trainer tag
* Update to new version
* Fixes
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-06-23 10:11:19 -04:00