Commit Graph

48 Commits

Author SHA1 Message Date
Stas Bekman
3b9a1dc132
[examples] improve block_size warning message (#21463) 2023-02-06 08:36:12 -08:00
Sylvain Gugger
7119bb052a
v4.27.0.dev0 2023-01-23 16:52:35 -05:00
Sylvain Gugger
05e72aa0c4
Adapt repository creation to latest hf_hub (#21158)
* Adapt repository creation to latest hf_hub

* Update all examples

* Fix other tests, add Flax examples

* Address review comments
2023-01-18 11:14:00 -05:00
Sylvain Gugger
60d1f31bb0
v4.26.0.dev0 2022-12-01 16:19:33 -05:00
Jiahao Li
9681f052a1
Fix result saving errors of pytorch examples (#20276) 2022-11-16 09:51:04 -05:00
Muhammad Sakib Khan Inan
777b1bfe62
New logging support to "Trainer" Class (ClearML Logger) (#20184)
* Init Update

* ClearML Callbacks integration

* update corrections

* args reporting updated

* {'tensorboard': False, 'pytorch': False}

* ClearML Tests added

* add clearml

* output_uri=True in Task.init

* reformatted integrations.py

* reformatted and fixed

* IF-ELSE statement issue on "has_clearml" resolved

* Add clearml in main callback docs

* Add additional clearml documentation

* Update src/transformers/integrations.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Accept suggestion

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Accept suggestion

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Small change in comments

* Make style clearml

* Accept suggestion

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Victor Sonck <victor.sonck@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-11-15 10:08:59 -05:00
Sylvain Gugger
06886d5a68
Only resize embeddings when necessary (#20043)
* Only resize embeddings when necessary

* Add comment
2022-11-03 12:05:04 -04:00
Sylvain Gugger
c3a93d8d82
v4.25.0.dev0 2022-10-31 21:48:40 -04:00
Lysandre
10100979ed Dev version 2022-10-10 17:25:40 -04:00
Lysandre
16913b3c92 Dev version 2022-09-14 14:58:20 -04:00
Rahul A R
00fc9217d1
Fixed bug which caused overwrite_cache to always be True (#19000)
* fixed bug which caused overwrite_cache to always be True (#18967).

* reformatting changes
2022-09-13 11:29:48 -04:00
Nicholas Broad
4f299b2446
Accelerator end training (#18910)
* add accelerator.end_training()

Some trackers need this to end their runs.

* fixup and quality

* add space

* add space again ?!?
2022-09-07 07:46:26 -04:00
Sylvain Gugger
c61f116b63
Tie weights after preparing the model in run_clm (#18855) 2022-09-01 12:06:56 -04:00
Rahul A R
e9442440fc
streamlining 'checkpointing_steps' parsing (#18755) 2022-08-25 11:00:38 -04:00
Atharva Ingle
d90a36d192
remove check for main process for trackers initialization (#18706) 2022-08-22 11:16:27 -04:00
zhoutang776
25e651a2de
Update run_translation_no_trainer.py (#18637)
* Update run_translation_no_trainer.py

found an error in selecting `no_decay` parameters and some small modifications when the user continues to train from a checkpoint

* fixs `no_decay` and `resume_step` issue

1. change `no_decay` list
2. if use continue to train their model from provided checkpoint, the `resume_step` will not be initialized properly if `args.gradient_accumulation_steps != 1`
2022-08-16 13:25:57 -04:00
Rasmus Arpe Fogh Jensen
a765b68aa6
Update no_trainer.py scripts to include accelerate gradient accumulation wrapper (#18473)
* Added accelerate gradient accumulation wrapper to run_image_classification_no_trainer.py example script

* make fixup changes

* PR comments

* changed input to Acceletor based on PR comment, ran make fixup

* Added comment explaining the sync_gradients statement

* Fixed lr scheduler max steps

* Changed run_clm_no_trainer.py script to use accelerate gradient accum wrapper

* Fixed all scripts except wav2vec2 pretraining to use accelerate gradient accum wrapper

* Added accelerate gradient accum wrapper for wav2vec2_pretraining_no_trainer.py script

* make fixup and lr_scheduler step inserted back into run_qa_beam_search_no_trainer.py

* removed changes to run_wav2vec2_pretraining_no_trainer.py script and fixed using wrong constant in qa_beam_search_no_trainer.py script
2022-08-08 15:52:47 -04:00
Ritik Nandwal
3db4378bd7
Update no trainer scripts for language modeling and image classification examples (#18443)
* Update no_trainer script for image-classification

* Update no_trainer scripts for language-modeling examples

* Remove unused variable

* Removing truncation from losses array for language modeling examples
2022-08-03 08:33:18 -04:00
Sylvain Gugger
941d233153
Fix ROUGE add example check and update README (#18398)
* Fix ROUGE add example check and update README

* Stay consistent in values
2022-08-01 11:14:49 -04:00
Zachary Mueller
75259b44bf
Properly calculate the total train iterations and recalculate num epochs in no_trainer scripts (#17856) 2022-06-23 15:46:01 -04:00
Sylvain Gugger
3cab90279f
Add examples telemetry (#17552)
* Add examples telemetry

* Alternative approach

* Add to all other examples

* Add to templates as well

* Put framework separately

* Same for TensorFlow
2022-06-07 11:57:52 -04:00
Sourab Mangrulkar
d156898f3b
Improve notrainer examples (#17449)
* improve no-trainer examples

* Trigger CI

* adding comment to clarify tracker init on main process

* Trigger CI

* Trigger CI

* Trigger CI
2022-05-28 00:06:31 +05:30
Sylvain Gugger
afe5d42d8d
Black preview (#17217)
* Black preview

* Fixup too!

* Fix check copies

* Use the same version as the CI

* Bump black
2022-05-12 16:25:55 -04:00
Zachary Mueller
f275e593bf
Fix no_trainer examples to properly calculate the number of samples (#17046)
* Update all examples to properly calculate progress bar
2022-05-02 11:56:25 -04:00
Zachary Mueller
35d48db881
Update no_trainer examples to use new logger (#17044)
* Propagate and fix imports
2022-05-02 11:56:15 -04:00
Zachary Mueller
60e1d883f1
Fixup no_trainer save logic (#16968)
* Fixup all examples
2022-04-27 14:46:49 -04:00
Sylvain Gugger
c79bbc3ba5
Fix multiple deletions of the same files in save_pretrained (#16947)
* Fix multiple deletions of the same files in save_pretrained

* Add is_main_process argument
2022-04-27 12:28:42 -04:00
Zachary Mueller
be752d12f8
Fixup no_trainer examples scripts and add more tests (#16765)
* Change tracking to store_true

* Remove step param and use it in the log dictionary directly

* use vars(args) when passing args to init_trackers

* Include tracking tests since tensorboard is already a dep
2022-04-13 14:40:48 -04:00
Zachary Mueller
d4b3e359aa
Don't push checkpoints to hub in no_trainer scripts (#16703)
Adds checkpoint prefixes to the gitignore if `push_to_hub` is used along with `checkpointint_steps`
2022-04-11 12:42:45 -04:00
Zachary Mueller
d57da99237
Add tests for no_trainer and fix existing examples (#16656)
* Fixed some bugs involving saving during epochs
* Added tests mimicking the existing examples tests
* Added in json exporting to all `no_trainer` examples for consistency
2022-04-08 10:03:56 -04:00
Zachary Mueller
febe42b5da
Update no_trainer scripts with new Accelerate functionalities (#16617)
Adds logging and save/loading to the Accelerate scripts

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-04-06 15:29:32 -04:00
Sylvain Gugger
4975002df5
Reorganize file utils (#16264)
* Split file_utils in several submodules

* Fixes

* Add back more objects

* More fixes

* Who exactly decided to import that from there?

* Second suggestion to code with code review

* Revert wront move

* Fix imports

* Adapt all imports

* Adapt all imports everywhere

* Revert this import, will fix in a separate commit
2022-03-23 10:26:33 -04:00
Julien Chaumond
6cdc3a7844
[urls to hub] Replace outdated model tags with their now-canonical pipeline types (#14617)
* Replace outdated model tags with their now-canonical pipeline types

* spam the CI till it's green
2021-12-06 04:35:01 -05:00
Nicholas Broad
69e16abf98
Switch from using sum for flattening lists of lists in group_texts (#14472)
* remove sum for list flattening

* change to chain(*)

* make chain object a list

* delete empty lines

per sgugger's suggestions

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Nicholas Broad <nicholas@nmbroad.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-11-22 16:17:26 -05:00
Sylvain Gugger
08a5f57567
Add new LFS prune API (#14294) 2021-11-05 18:58:51 -04:00
Patrick von Platen
44eb8bdeea
map only on one process (#13810) 2021-09-30 18:52:53 +02:00
Sylvain Gugger
b7d264be0d
Add push_to_hub to no_trainer examples (#13659)
* Add push_to_hub to no_trainer examples

* Quality

* Document integration

* Roll out to other examples
2021-09-21 13:13:30 -04:00
Stefan Schweter
4046e66e40
examples: only use keep_linebreaks when reading TXT files (#13320)
* examples: only use keep_linebreaks when reading TXT files for all CLM examples

* examples: only use keep_linebreaks when reading TXT files for all CLM examples

* examples: only use keep_linebreaks when reading TXT files for all CLM examples
2021-08-28 16:22:29 +02:00
Stefan Schweter
319d840b46
examples: add keep_linebreaks option to CLM examples (#13150)
* examples: add keep_linebreaks option to text dataset loader for all CLM examples

* examples: introduce new keep_linebreaks option as data argument in CLM examples
2021-08-27 11:35:45 +02:00
Allan Lin
91ff480e26
Update namespaces inside torch.utils.data to the latest. (#13167)
* Update torch.utils.data namespaces to the latest.

* Format

* Update Dataloader.

* Style
2021-08-19 14:29:51 +02:00
Sylvain Gugger
7fcee113c1
Tpu tie weights (#13030)
* Fix tied weights on TPU

* Manually tie weights in no trainer examples

* Fix for test

* One last missing

* Gettning owned by my scripts

* Address review comments

* Fix test

* Fix tests

* Fix reformer tests
2021-08-06 20:41:39 +02:00
Sylvain Gugger
6f1adc4334
Fix group_lengths for short datasets (#12558) 2021-07-08 07:23:41 -04:00
Souvic Chakraborty
1d6623c6a2
MLM training fails with no validation file(same as #12406 for pytorch now) (#12517)
* Validation split percentage to be used for custom data files also

Issue same as https://github.com/huggingface/transformers/issues/12406 fixed for pytorch branch run_mlm.py

* Validation split added in the right place

* Update run_clm.py

* validation split added for custom files

* Validation split added for custom files

* Update run_plm.py

* fixed validation split for custom files as input for pytorch examples in lm

* Update run_clm_no_trainer.py

* args modified
2021-07-07 09:05:44 -04:00
Stas Bekman
4a872caef4
remove extra white space from log format (#12360) 2021-06-25 13:20:14 -07:00
Bhavitvya Malik
e43e11260f
update desc for map in all examples (#12226)
* update desc for map in all examples

* added plm

* suggestions
2021-06-17 15:37:31 -04:00
Stas Bekman
6287c929c1
[lm examples] fix overflow in perplexity calc (#11855)
* fix overflow in perplexity calc

* use inf

* fix
2021-05-25 08:11:26 -07:00
Jonathan Chang
6f40e31766
Fix comment in run_clm_no_trainer.py (#11624) 2021-05-07 12:32:30 +05:30
Sylvain Gugger
dabeb15292
Examples reorg (#11350)
* Base move

* Examples reorganization

* Update references

* Put back test data

* Move conftest

* More fixes

* Move test data to test fixtures

* Update path

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comments and clean

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-04-21 11:11:20 -04:00