Stas Bekman
cd8c93f701
[DeepSpeed] improve checkpoint loading code plus tests ( #10760 )
* deepspeed checkpoint loading code plus tests
* style
* style
2021-03-17 10:22:58 -07:00
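The checkpoint-loading work above relies on DeepSpeed's `global_stepN` tag directories inside a checkpoint folder. A minimal sketch of how the newest tag might be discovered before calling `load_checkpoint` (the helper name and directory layout assumption are mine, not from the commit):

```python
import os
import re

def get_last_deepspeed_tag(checkpoint_dir):
    """Return the newest DeepSpeed save tag (e.g. 'global_step500') found
    under checkpoint_dir, or None when no tag sub-directory exists."""
    tags = [
        d for d in os.listdir(checkpoint_dir)
        if re.fullmatch(r"global_step\d+", d)
        and os.path.isdir(os.path.join(checkpoint_dir, d))
    ]
    if not tags:
        return None
    # Compare numerically so 'global_step1000' beats 'global_step999'.
    return max(tags, key=lambda t: int(t[len("global_step"):]))
```

The returned tag would then be handed to the engine's `load_checkpoint(load_dir, tag)` call.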
Cheng Li
c83fbc5f2d
[Deepspeed] Allow HF optimizer and scheduler to be passed to deepspeed ( #10464 )
* pass hf optimizer and scheduler to deepspeed if not specified in ds config
* pass hf optimizer and scheduler to deepspeed if not specified in ds config
* update
* make init_deepspeed support config dict
* fix docstring formatting
* clean up trainer's comments
* add new tests
* fix type
* composite argparse doesn't work
* style
* add a new test, rename others
* document new functionality
* complete tests, add docs
* style
* correct level
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add new methods to the doc
* must tell DS we are using a non-native optimizer
* add protection against cpu_offload + HF optimizer combo
* fix the cli overrides
* sync docs + tests
* restore AdamW
* better docs
* need new version
* no longer needed
* remove outdated information
* refactor duplicated code
Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-03-16 15:51:09 -07:00
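The commit above makes the trainer pass its own (HF) optimizer and scheduler through to DeepSpeed only when the ds config does not define them, and guards against the `cpu_offload` + HF-optimizer combination. A small sketch of that decision, with a hypothetical helper name:

```python
def resolve_optimizer_source(ds_config, cpu_offload=False):
    """Decide who configures the optimizer: DeepSpeed ("ds") when the
    config dict defines one, otherwise the trainer's own optimizer
    ("hf") is passed through as a non-native client optimizer.
    Mirrors the cpu_offload protection mentioned in the commits."""
    if "optimizer" in ds_config:
        return "ds"
    if cpu_offload:
        raise ValueError(
            "cpu_offload requires a DeepSpeed-configured optimizer; "
            "a client (HF) optimizer cannot be combined with it."
        )
    # DeepSpeed must be told this optimizer is non-native.
    return "hf"
```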
Théo Matussière
6f840990a7
split seq2seq script into summarization & translation ( #10611 )
* split seq2seq script, update docs
* needless diff
* fix readme
* remove test diff
* s/summarization/translation
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* cr
* fix arguments & better mbart/t5 refs
* copyright
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* reword readme
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* s/summarization/translation
* short script names
* fix tests
* fix isort, include mbart doc
* delete old script, update tests
* automate source prefix
* automate source prefix for translation
* s/translation/trans
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* fix script name (short version)
* typos
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* exact parameter
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* remove superfluous source_prefix calls in docs
* rename scripts & warn for source prefix
* black
* flake8
Co-authored-by: theo <theo@matussie.re>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-03-15 09:11:42 -04:00
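The "automate source prefix" items above refer to T5-style models, which expect a task prefix such as `translate English to German: ` on each input, while models like mBART need none. A sketch of the automation, assuming a simple model-name check (the function and language table are illustrative, not the script's actual code):

```python
def build_source_prefix(model_name, source_lang, target_lang):
    """Return the task prefix T5-style models expect for translation,
    and an empty prefix for models (e.g. mBART) that need none."""
    lang_names = {"en": "English", "de": "German", "fr": "French", "ro": "Romanian"}
    if model_name.startswith("t5-"):
        src = lang_names.get(source_lang, source_lang)
        tgt = lang_names.get(target_lang, target_lang)
        return f"translate {src} to {tgt}: "
    return ""
```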
Stas Bekman
4c32f9f26e
AdamW is now supported by default ( #9624 )
2021-03-12 13:40:07 -08:00
Sylvain Gugger
0d909f6bd8
Fairscale FSDP fix model save ( #10596 )
* Hotfix fairscale FSDP
* Evaluation works
* Save on process zero
2021-03-09 14:42:07 -05:00
Stas Bekman
917f104502
[examples tests] various fixes ( #10584 )
* fix sharded ddp enum
* test fixes
* stronger validation + apex breaks other tests
2021-03-08 10:28:44 -08:00
Sylvain Gugger
9d14be5c20
Add support for ZeRO-2/3 and ZeRO-offload in fairscale ( #10354 )
* Add support for ZeRO-2/3 and ZeRO-offload in fairscale
* Quality
* Rework from review comments
* Add doc
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Address review comments
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-02-25 11:07:53 -05:00
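The fairscale ZeRO support above (and the "fix sharded ddp enum" item earlier) revolves around parsing a `--sharded_ddp` option string into a set of modes with valid combinations. A rough sketch of that parsing, using illustrative names rather than the library's exact code:

```python
from enum import Enum

class ShardedDDPOption(Enum):
    SIMPLE = "simple"
    ZERO_DP_2 = "zero_dp_2"
    ZERO_DP_3 = "zero_dp_3"
    OFFLOAD = "offload"

def parse_sharded_ddp(arg):
    """Turn a space-separated string like 'zero_dp_2 offload' into a set
    of options, rejecting invalid combinations."""
    opts = {ShardedDDPOption(tok) for tok in arg.split()}
    if ShardedDDPOption.SIMPLE in opts and len(opts) > 1:
        raise ValueError("'simple' cannot be combined with ZeRO options")
    if ShardedDDPOption.OFFLOAD in opts and not (
        ShardedDDPOption.ZERO_DP_2 in opts or ShardedDDPOption.ZERO_DP_3 in opts
    ):
        raise ValueError("'offload' requires zero_dp_2 or zero_dp_3")
    return opts
```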
Stas Bekman
3437d12134
[Trainer/Deepspeed] handle get_last_lr() before first step() ( #10362 )
* handle get_last_lr() before first step()
* abstract away the lr getting logic
* cleanup
* add test
* move to utils
2021-02-23 17:42:25 -08:00
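The "abstract away the lr getting logic" item above addresses the fact that `get_last_lr()` can fail before the scheduler's first `step()`. A minimal sketch of a guarded getter that falls back to the optimizer's param groups (helper name and the exact exceptions caught are assumptions):

```python
def get_current_lr(optimizer, lr_scheduler=None):
    """Fetch the learning rate robustly: before the first scheduler
    step, get_last_lr() may fail, so fall back to param_groups."""
    if lr_scheduler is not None:
        try:
            return lr_scheduler.get_last_lr()[0]
        except (AttributeError, IndexError):
            pass
    return optimizer.param_groups[0]["lr"]
```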
Stas Bekman
eab0afc19c
[Trainer] implement gradient_accumulation_steps support in DeepSpeed integration ( #10310 )
* implement gradient_accumulation_steps support in DeepSpeed integration
* typo
* cleanup
* cleanup
2021-02-22 11:15:59 -08:00
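With gradient accumulation, DeepSpeed steps on every micro-batch but only applies gradients every `gradient_accumulation_steps` calls, so the trainer's schedule must count optimizer updates, not micro-batches. A sketch of that arithmetic (the function is illustrative):

```python
import math

def num_update_steps(num_examples, per_device_batch_size, world_size,
                     gradient_accumulation_steps):
    """Optimizer updates per epoch: the effective train batch size is
    per_device_batch_size * world_size * gradient_accumulation_steps."""
    steps_per_epoch = math.ceil(num_examples / (per_device_batch_size * world_size))
    return math.ceil(steps_per_epoch / gradient_accumulation_steps)
```

For example, 1000 examples with batch size 8 on 2 devices gives 63 micro-steps per epoch, which accumulation of 4 reduces to 16 optimizer updates.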
Stas Bekman
97e688bc22
[Trainer] memory tracker metrics ( #10225 )
* memory tracker metrics
* go back to eval for some consistency
* handle no-gpu case
* deal with stackable eval calls
* restore callback order
* style
* simplify the API
* add test
* docs
* consistently use eval_ prefix
* improve docs
* Update src/transformers/trainer_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* rename method
* style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-02-18 09:27:32 -08:00
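The memory tracker above reports per-stage allocation deltas under stage-prefixed keys (the "consistently use eval_ prefix" item). A minimal CPU-only sketch using the stdlib `tracemalloc` module; the class name matches the real one in `trainer_utils.py` but the body is simplified and omits the GPU path:

```python
import tracemalloc

class TrainerMemoryTracker:
    """CPU-only sketch: report peak/allocated memory deltas for a stage
    under keys like eval_mem_cpu_alloc_delta."""

    def __init__(self):
        self._start = None

    def start(self):
        tracemalloc.start()
        self._start = tracemalloc.get_traced_memory()[0]

    def stop_and_update_metrics(self, metrics, stage="eval"):
        current, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        metrics[f"{stage}_mem_cpu_alloc_delta"] = current - self._start
        metrics[f"{stage}_mem_cpu_peaked_delta"] = peak - current
        return metrics
```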
Stas Bekman
d1eb88f42d
[CI] 2 fixes ( #10248 )
* fix invalid port
* missing requirements
2021-02-17 14:12:39 -08:00
Stas Bekman
0b1f552a24
fix run_seq2seq.py; porting trainer tests to it ( #10162 )
* fix run_seq2seq.py; porting DeepSpeed tests to it
* unrefactor
* defensive programming
* defensive programming 2
* port the rest of the trainer tests
* style
* a cleaner scripts dir finder
* cleanup
2021-02-15 09:12:17 -08:00
Stas Bekman
b54cb0bd82
[DeepSpeed in notebooks] Jupyter + Colab ( #10130 )
* init devices/setup explicitly
* docs + test
* simplify
* cleanup
* cleanup
* cleanup
* correct the required dist setup
* derive local_rank from env LOCAL_RANK
2021-02-11 14:02:05 -08:00
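Running DeepSpeed in a notebook means there is no launcher to populate the distributed environment, so the setup above initializes it by hand and derives `local_rank` from `LOCAL_RANK`. A sketch of that one-process setup (the helper name and the default port are illustrative):

```python
import os

def setup_notebook_dist_env(master_addr="localhost", master_port="9994"):
    """One-process distributed env for DeepSpeed inside Jupyter/Colab:
    set the variables a launcher would normally provide, and derive
    local_rank from LOCAL_RANK (defaulting to 0)."""
    os.environ.setdefault("MASTER_ADDR", master_addr)
    os.environ.setdefault("MASTER_PORT", master_port)
    os.environ.setdefault("RANK", "0")
    os.environ.setdefault("WORLD_SIZE", "1")
    return int(os.environ.get("LOCAL_RANK", "0"))
```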
Stas Bekman
77b862847b
[DeepSpeed] restore memory for evaluation ( #10114 )
* free up memory at the end of train
* rework tests
* consistent formatting
* correction
2021-02-10 09:09:48 -08:00
Stas Bekman
781220acab
transition to new tests dir ( #10080 )
2021-02-08 12:41:52 -08:00