Patrick von Platen
1f9dcfc1ef
[Trainer] Add nan/inf logging filter ( #13619 )
...
* finish
* add test
* push
* remove unnecessary code
* up
* correct test
* Update src/transformers/training_args.py
2021-09-17 16:21:59 +02:00
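A minimal sketch of the filter this commit introduces, assuming the TrainingArguments flag is named logging_nan_inf_filter as in the PR title; when enabled, step losses that come out as nan or inf are filtered from the logged average.

```python
from transformers import TrainingArguments

# logging_nan_inf_filter is assumed from the PR title; it defaults to enabled.
args = TrainingArguments(
    output_dir="out",             # hypothetical output path
    logging_steps=10,
    logging_nan_inf_filter=True,  # skip nan/inf losses when averaging logs
)
```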
Sylvain Gugger
3081d3868e
Push to hub when saving checkpoints ( #13503 )
...
* Push to hub when saving checkpoints
* Add model card
* Revert partial model card
* Small fix for checkpoint
* Add tests
* Add documentation
* Fix tests
* Bump huggingface_hub
* Fix test
2021-09-14 08:02:15 -04:00
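A hedged sketch of the feature: with push_to_hub enabled, each checkpoint save also uploads to the Hub. The repo and directory names here are illustrative.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="my-finetuned-model",  # also used to derive the Hub repo name
    save_strategy="epoch",
    push_to_hub=True,  # after this PR, saves are pushed as they happen
)
# A final trainer.push_to_hub() at the end also uploads the generated model card.
```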
Sylvain Gugger
e59d4d0147
Refactor internals for Trainer push_to_hub ( #13486 )
2021-09-09 13:04:37 -04:00
Philip May
b7439675b8
Fix Trainer.train(resume_from_checkpoint=False) raising an exception ( #12981 )
...
* fix #12970
* Update tests/test_trainer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update tests/test_trainer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update tests/test_trainer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* remove unnecessary issue link
* fix test formatting
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-08-03 10:10:33 +02:00
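For context, the three accepted forms of the argument; this sketch assumes an already-constructed trainer. After the fix, passing False simply starts training from scratch instead of raising.

```python
# Resume from the most recent checkpoint in output_dir:
trainer.train(resume_from_checkpoint=True)

# Resume from an explicit checkpoint directory (path is illustrative):
trainer.train(resume_from_checkpoint="out/checkpoint-500")

# Train from scratch; no exception after this fix:
trainer.train(resume_from_checkpoint=False)
```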
Sylvain Gugger
0118ef89ee
Enforce eval and save strategies are compatible when --load_best_model_at_end ( #12786 )
...
* Enforce eval and save strategies are compatible when --load_best_model_at_end
* Update doc
* Fix typos
* Fix tests
2021-07-19 19:50:47 +02:00
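A sketch of a configuration that satisfies the new check: when load_best_model_at_end is set, the evaluation and save strategies (and step intervals) have to line up so a best checkpoint actually exists.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    evaluation_strategy="steps",
    eval_steps=500,
    save_strategy="steps",
    save_steps=500,  # must match the eval cadence
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
)
```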
Sylvain Gugger
53c60babe4
Clean push to hub API ( #12187 )
...
* Clean push to hub API
* Create working dir if it does not exist
* Different tweak
* New API + all models + test Flax
* Adds the Trainer clean up
* Update src/transformers/file_utils.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Address review comments
* (nit) output types
* No need to set clone_from when folder exists
* Update src/transformers/trainer.py
Co-authored-by: Julien Chaumond <julien@huggingface.co>
* Add generated_from_trainer tag
* Update to new version
* Fixes
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-06-23 10:11:19 -04:00
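The cleaned-up API attaches a push_to_hub mixin to models and tokenizers; a minimal usage sketch with an illustrative repo name:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# "my-finetuned-bert" is a hypothetical repo name on the Hub.
model.push_to_hub("my-finetuned-bert")
tokenizer.push_to_hub("my-finetuned-bert")
```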
Stas Bekman
ebe5413589
[trainer] 2 bug fixes and a rename ( #12309 )
...
* bug fixes and a rename
* add extended DDP test
2021-06-22 11:13:23 -07:00
Stas Bekman
0d97ba8a98
[tests] multiple improvements ( #12294 )
...
* [tests] multiple improvements
* cleanup
* style
* todo to investigate
* fix
2021-06-21 19:51:36 -07:00
Stas Bekman
dad414d5f9
[trainer + examples] set log level from CLI ( #12276 )
...
* set log level from CLI
* add log_level_replica + test + extended docs
* cleanup
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* rename datasets objects to allow datasets module
* improve the doc
* style
* doc improve
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-21 19:30:50 -07:00
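A sketch of the new knobs, using the argument names from the commit body (log_level, log_level_replica); the same values can be passed on the command line as --log_level and --log_level_replica.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    log_level="info",             # verbosity on the main process
    log_level_replica="warning",  # keep replicas quieter in distributed runs
)
```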
Stas Bekman
a4ed074d4b
reset report_to to none, avoid deprecation warning ( #12293 )
2021-06-21 16:50:12 -07:00
Amog Kamsetty
b9d66f4c4b
Ray Tune Integration Updates ( #12134 )
...
* fix
* fixes
* add back to scheduled tests
* formatting
* Update integrations.py
2021-06-15 14:11:29 -04:00
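For reference, a hedged sketch of the Ray Tune entry point this integration maintains; it assumes the Trainer was built with model_init so each trial gets a fresh model, and the search space is illustrative.

```python
from ray import tune

def hp_space(trial):
    # Illustrative search space; tune.loguniform/tune.choice are Ray Tune samplers.
    return {
        "learning_rate": tune.loguniform(1e-6, 1e-4),
        "num_train_epochs": tune.choice([2, 3, 4]),
    }

best_run = trainer.hyperparameter_search(backend="ray", hp_space=hp_space, n_trials=8)
```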
Stas Bekman
372ab9cd6d
[style] consistent nn. and nn.functional: part 3 tests ( #12155 )
...
* consistent nn. and nn.functional: p3 templates
* restore
2021-06-14 12:18:22 -07:00
Stas Bekman
ff7c81687a
[optim] implement AdafactorSchedule ( #12123 )
...
* implement AdafactorSchedule
* typo
* fix
* Update src/transformers/optimization.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-14 09:43:48 -07:00
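The scheduler exists because Adafactor computes its learning rate internally; AdafactorSchedule is a proxy that gives the Trainer something to log. A sketch close to the documented usage, assuming a model is already built:

```python
from transformers.optimization import Adafactor, AdafactorSchedule

optimizer = Adafactor(
    model.parameters(),
    scale_parameter=True,
    relative_step=True,
    warmup_init=True,
    lr=None,  # let Adafactor derive the lr itself
)
lr_scheduler = AdafactorSchedule(optimizer)
# pass both to Trainer via optimizers=(optimizer, lr_scheduler)
```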
Stas Bekman
b1a8aa94f0
[test] support more than 2 gpus ( #12074 )
...
* support more than 2 gpus
* style
2021-06-09 09:23:47 -07:00
Stas Bekman
4ba203d9d3
[Trainer] add train loss and flops metrics reports ( #11980 )
...
* add train loss and flops metrics reports
* consistency
* add train_loss to skip keys
* restore on_train_end call timing
2021-06-01 15:58:31 -07:00
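After this change the metrics returned by train() include the average training loss; a short sketch (key names as in the commit, worth verifying against your version):

```python
train_result = trainer.train()
metrics = train_result.metrics
print(metrics["train_loss"])      # average loss over the whole run
print(metrics.get("total_flos"))  # estimated floating-point operations
```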
Lysandre Debut
6da129cb31
Enable memory metrics in tests that need it ( #11859 )
2021-05-25 04:06:19 -04:00
Sylvain Gugger
afe479adb5
[Trainer] Report both steps and num samples per second ( #11818 )
...
* [Trainer] Report both steps and num samples per second
* Fix batch number
* Update src/transformers/trainer_utils.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Address review comments
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-05-24 19:51:42 -04:00
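Correspondingly on the eval side, both throughput figures are now reported; key names assumed from the PR title:

```python
metrics = trainer.evaluate()
print(metrics["eval_samples_per_second"])
print(metrics["eval_steps_per_second"])
```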
Sylvain Gugger
a515caa331
Fix checkpoint deletion ( #11748 )
2021-05-18 07:42:39 -04:00
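The deletion logic being fixed here is the one driven by save_total_limit; a minimal sketch of that setting:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    save_strategy="steps",
    save_steps=500,
    save_total_limit=3,  # checkpoints beyond the 3 newest are deleted
)
```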
Sylvain Gugger
a135f59536
Auto modelcard ( #11599 )
...
* Autogenerate model cards from the Trainer
* ModelCard deprecated
* Fix test
* Style
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Address review comments
* Quality
* With all metadata
* Metadata
* Post-merge conflict mess
* Data args and all examples
* Default license and languages when possible
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-05-11 11:30:34 -04:00
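A hedged sketch of the entry point this adds; per the commit, the Trainer can now generate a model card in its output directory:

```python
# After training finishes:
trainer.create_model_card()  # writes README.md to args.output_dir
```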
Sylvain Gugger
6b241e0e3b
Reproducible checkpoint ( #11582 )
...
* Set generator in dataloader
* Use generator in all random samplers
* Checkpoint all RNG states
* Final version
* Quality
* Test
* Address review comments
* Quality
* Remove debug util
* Add python and numpy RNGs
* Split states in different files in distributed
* Quality
* local_rank for TPUs
* Only use generator when accepted
* Add test
* Set seed to avoid flakiness
* Make test less flaky
* Quality
2021-05-04 16:20:56 -04:00
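A minimal sketch of the underlying technique (not the Trainer's actual code): capture every RNG state at save time and restore it on resume, so data order and dropout continue deterministically.

```python
import random
import numpy as np
import torch

# Save side: snapshot Python, NumPy, and torch RNG states.
rng_state = {
    "python": random.getstate(),
    "numpy": np.random.get_state(),
    "cpu": torch.random.get_rng_state(),
}
if torch.cuda.is_available():
    rng_state["cuda"] = torch.cuda.random.get_rng_state_all()
torch.save(rng_state, "rng_state.pth")

# Resume side: restore all of them before continuing training.
state = torch.load("rng_state.pth")
random.setstate(state["python"])
np.random.set_state(state["numpy"])
torch.random.set_rng_state(state["cpu"])
if "cuda" in state:
    torch.cuda.random.set_rng_state_all(state["cuda"])
```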
Stas Bekman
bc2571e61c
[Deepspeed] ZeRO-Infinity integration plus config revamp ( #11418 )
...
* adding Z-inf
* revamp config process
* up version requirement
* wip
* massive rewrite
* cleanup
* cleanup
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* consistent json commas
* act on suggestions
* leave this feature for 0.3.16
* style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-26 10:40:32 -07:00
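For orientation, a hedged sketch of a ZeRO stage-3 config with NVMe offload (the "ZeRO-Infinity" shape); keys follow the DeepSpeed config schema but may differ between versions, and "auto" values are filled in by the Trainer integration.

```python
import json

ds_config = {
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "nvme", "nvme_path": "/local_nvme"},
        "offload_param": {"device": "nvme", "nvme_path": "/local_nvme"},
    },
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}
with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
# then: TrainingArguments(..., deepspeed="ds_config.json")
```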
Sylvain Gugger
7959d83599
Give each test a different repo name ( #11453 )
2021-04-26 11:52:23 -04:00
Sylvain Gugger
bf2e0cf70b
Trainer push to hub ( #11328 )
...
* Initial support for upload to hub
* push -> upload
* Fixes + examples
* Fix torchhub test
* Torchhub test I hate you
* push_model_to_hub -> push_to_hub
* Apply mixin to other pretrained models
* Remove ABC inheritance
* Add tests
* Typo
* Run tests
* Install git-lfs
* Change approach
* Add push_to_hub to all
* Staging test suite
* Typo
* Maybe like this?
* More deps
* Cache
* Adapt name
* Quality
* MOAR tests
* Put it in testing_utils
* Docs + torchhub last hope
* Styling
* Wrong method
* Typos
* Update src/transformers/file_utils.py
Co-authored-by: Julien Chaumond <julien@huggingface.co>
* Address review comments
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-04-23 09:17:37 -04:00
Sylvain Gugger
c0328a6c26
Load checkpoint without re-creating the model ( #11318 )
2021-04-19 20:31:29 -04:00
Sylvain Gugger
d9c62047a8
Trainer support for IterableDataset for evaluation and predict ( #11286 )
...
* Bulk of the work
* Polish and tests
* Update QA Trainer
* Avoid breaking the predict method
* Deprecation warnings
* Store real eval dataloader
* Get eval dataset reference before wrap
2021-04-16 16:01:58 -04:00
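A sketch of what the change enables: evaluate/predict now accept datasets without __len__. The dataset class and variable names here are hypothetical, and an existing trainer is assumed.

```python
from torch.utils.data import IterableDataset

class StreamingEvalDataset(IterableDataset):
    """Hypothetical stream of pre-tokenized examples."""
    def __init__(self, examples):
        self.examples = examples

    def __iter__(self):
        yield from self.examples

metrics = trainer.evaluate(eval_dataset=StreamingEvalDataset(tokenized_examples))
```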
Sylvain Gugger
aaaed56ffc
Trainer iterable dataset ( #11254 )
...
* IterableDatasetShard
* Test and integration in Trainer
* Update src/transformers/trainer_pt_utils.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Style
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-04-14 17:02:26 -04:00
Stas Bekman
c6d664849b
[DeepSpeed] ZeRO Stage 3 ( #10753 )
...
* synced gpus
* fix
* fix
* need to use t5-small for quality tests
* notes
* complete merge
* fix a disappearing std stream problem
* start zero3 tests
* wip
* tune params
* sorting out the pre-trained model loading
* reworking generate loop wip
* wip
* style
* fix tests
* split the tests
* refactor tests
* wip
* parameterized
* fix
* work out the resume from non-ds checkpoint pass + test
* cleanup
* remove no longer needed code
* split getter/setter functions
* complete the docs
* suggestions
* gpus and their compute capabilities link
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* style
* remove invalid paramgd
* automatically configure zero3 params that rely on hidden size
* make _get_resized_embeddings zero3-aware
* add test exercising resize_token_embeddings()
* add docstring
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-04-08 09:53:01 -07:00
Stas Bekman
3318c246f3
make failure to find a resume checkpoint fatal + tests ( #10777 )
2021-03-17 11:16:37 -07:00
Stas Bekman
cd8c93f701
[DeepSpeed] improve checkpoint loading code plus tests ( #10760 )
...
* deepspeed checkpoint loading code plus tests
* style
* style
2021-03-17 10:22:58 -07:00
Sylvain Gugger
3ced9b3eb9
Check layer types for Optimizer construction ( #10598 )
...
* Check layer types for Optimizer construction
* Duplicate class
2021-03-08 16:40:11 -05:00
Sylvain Gugger
821d518e03
Revert "Tests"
...
This reverts commit b35e7b68ca.
2021-03-08 16:05:55 -05:00
Sylvain Gugger
4196bfeda0
Revert "Style"
...
This reverts commit a8ec52efc2.
2021-03-08 16:05:52 -05:00
Sylvain Gugger
a8ec52efc2
Style
2021-03-08 16:04:46 -05:00
Sylvain Gugger
b35e7b68ca
Tests
2021-03-08 16:04:30 -05:00
Stas Bekman
f882966004
fix double wrapping + test ( #10583 )
2021-03-08 10:15:55 -05:00
Sylvain Gugger
6290169eb3
Rework TPU checkpointing in Trainer ( #10504 )
...
* Rework TPU checkpointing in Trainer
* Wraps the barrier in a dist test
* Address review comments
* Remove line
2021-03-04 11:46:11 -05:00
Tanmay Garg
256482ac92
Introduce save_strategy training argument ( #10286 )
...
* Introduce save_strategy training argument
* deprecate EvaluationStrategy
* collapse EvaluationStrategy and LoggingStrategy into a single IntervalStrategy enum
* modify tests to use modified enum
2021-02-27 19:34:22 -05:00
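A sketch of the resulting API; strings and the IntervalStrategy enum are interchangeable after this change:

```python
from transformers import TrainingArguments
from transformers.trainer_utils import IntervalStrategy

args = TrainingArguments(
    output_dir="out",
    save_strategy="epoch",                       # new argument from this PR
    evaluation_strategy=IntervalStrategy.EPOCH,  # replaces EvaluationStrategy
)
```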
Kai Fricke
98569d4ba2
Add Ray Tune hyperparameter search integration test ( #10414 )
2021-02-26 10:18:33 -05:00
Stas Bekman
4eddc459a9
[trainer] implement support for full fp16 in evaluation/predict ( #10268 )
...
* implement --fp16_full_eval
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* style
* add test
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-02-18 17:02:35 -08:00
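A minimal sketch of the new flag; it casts the model to fp16 for evaluation/predict, roughly halving memory at some numerical risk:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    fp16_full_eval=True,  # run eval/predict entirely in fp16
)
```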
Stas Bekman
d9a81fc0c5
fix func signature ( #10271 )
2021-02-18 16:44:42 -08:00
Stas Bekman
97e688bc22
[Trainer] memory tracker metrics ( #10225 )
...
* memory tracker metrics
* go back to eval for some consistency
* handle no-gpu case
* deal with stackable eval calls
* restore callback order
* style
* simplify the API
* add test
* docs
* consistently use eval_ prefix
* improve docs
* Update src/transformers/trainer_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* rename method
* style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-02-18 09:27:32 -08:00
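A hedged way to inspect what the tracker reports: memory deltas show up in the returned metrics with stage prefixes (init_/train_/eval_). Exact key names vary across versions, and newer releases gate this behind skip_memory_metrics=False.

```python
metrics = trainer.train().metrics
for name, value in metrics.items():
    if "mem" in name:  # e.g. train_mem_gpu_alloc_delta (name assumed)
        print(name, value)
```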
Sylvain Gugger
7169d1ea7b
Store FLOS as floats to avoid overflow. ( #10213 )
2021-02-16 11:15:15 -05:00
Lysandre Debut
8cbd0bd137
Specify dataset dtype ( #10195 )
...
Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>
2021-02-15 12:57:17 -05:00
Sylvain Gugger
b4e559cfa1
Deprecate model_path in Trainer.train ( #9854 )
2021-01-28 08:32:46 -05:00
Sylvain Gugger
35d55b7b84
When resuming training from checkpoint, Trainer loads model ( #9818 )
...
* When resuming training from checkpoint, Trainer loads model
* Finish cleaning tests
* Address review comment
* Use global_step from state
2021-01-27 09:31:18 -05:00
Sylvain Gugger
5e1bea4f16
Fix Trainer with a parallel model ( #9578 )
...
* Fix Trainer with a parallel model
* More clean up
2021-01-14 03:23:41 -05:00
Sylvain Gugger
04dc65e5c6
Fix data parallelism in Trainer ( #9566 )
...
* Fix data parallelism in Trainer
* Update src/transformers/training_args.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-01-13 09:54:41 -05:00
Stas Bekman
9f675b05d4
[trainer] self.model_wrapped + _model_unwrap ( #9390 )
...
* model wrapped + model_unwrap
* cleanup
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* style
* deprecation warning
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-01-06 06:50:11 -05:00
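For orientation, the distinction this commit introduces, as a comment-level sketch:

```python
# trainer.model is always the bare model, as passed in by the user.
# trainer.model_wrapped is the outermost wrapper (DDP, DeepSpeed, ...) and
# falls back to the same object when no wrapper is active.
bare = trainer.model
wrapped = trainer.model_wrapped  # use this one for forward/backward in training
```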
Sylvain Gugger
1198ba8fba
Add timing inside Trainer ( #9196 )
...
* Add timing inside Trainer
* Fix tests
* Add n_objs for train
* Sort logs
2020-12-18 15:10:39 -05:00
Sylvain Gugger
ad895af98d
Add possibility to switch between APEX and AMP in Trainer ( #9137 )
...
* Add possibility to switch between APEX and AMP in Trainer
* Update src/transformers/training_args.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Address review comments
* Update src/transformers/training_args.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2020-12-15 16:38:10 -05:00
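A sketch of the resulting switch, using the fp16_backend argument from this PR; "amp" selects native torch.cuda.amp and "apex" selects NVIDIA Apex:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    fp16=True,
    fp16_backend="apex",  # or "amp"; default "auto" picks one for you
)
```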