transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-24 23:08:57 +06:00

Author	SHA1	Message	Date
kang sheng	2cbcc5877d	Fix condition when GA loss bug fix is not performed (#35651 ) * fix condition when GA loss bug fix is not performed * max loss diff is 2.29 * fix typo * add an extra validation that loss should not vary too much	2025-01-16 13:59:53 +01:00
Mahdi Baghbanzadeh	c61fcde910	Enhance DataCollatorForLanguageModeling with Configurable Token Replacement Probabilities (#35251 ) * DataCollatorForLanguageModeling class was updated with new parameters that provide more control over the token masking and replacing * Addressed review comments, modified the docstring and made a test for the DataCollatorForLanguageModeling	2025-01-14 17:01:10 +00:00
Fanli Lin	2fa876d2d8	[tests] make cuda-only tests device-agnostic (#35607 ) * initial commit * remove unrelated files * further remove * Update test_trainer.py * fix style	2025-01-13 14:48:39 +01:00
Zach Mueller	b02828e4af	Let `EarlyStoppingCallback` not require `load_best_model_at_end` (#35101 ) * Bookmark * Add warning	2025-01-10 10:25:32 -05:00
Zach Mueller	1211e616a4	Use inherit tempdir makers for tests + fix failing DS tests (#35600 ) * Use existing APIs to make tempdir folders * Fixup deepspeed too * output_dir -> tmp_dir	2025-01-10 10:01:58 -05:00
nhamanasu	b32938aeee	Fix all output_dir in test_trainer.py to use tmp_dir (#35266 ) * update codecarbon * replace directly-specified-test-dirs with tmp_dir * pass tmp_dir to all get_regression_trainer * test_trainer.py: Use tmp_dir consistently for all output_dir arguments * fix some with...as tmp_dir blocks * reflect the comments to improve test_trainer.py * refresh .gitignore	2025-01-08 19:44:39 +01:00
Sean (Seok-Won) Yi	88e18b3c63	Update doc for `metric_for_best_model` when `save_strategy="best"`. (#35389 ) * Updated docstring for _determine_best_metric. * Updated docstring for metric_for_best_model. * Added test case for save strategy. * Updated incorrect test case. * Changed eval_strategy to match save_strategy. * Separated test cases for metric. * Allow load_best_model when save_strategy == "best". * Updated docstring for metric_for_best_model.	2025-01-08 16:32:35 +01:00
kang sheng	1ccca8f48c	Fix GA loss bugs and add unit test (#35121 ) * fix GA bugs and add unit test * narrow down model loss unit test diff gap * format code to make ruff happy * send num_items_in_batch argument to decoder * fix GA loss bug in BertLMHeadModel * use TinyStories-33M to narrow down diff gap * format code * missing .config * avoid add extra args --------- Co-authored-by: kangsheng <kangsheng@meituan.com>	2024-12-09 09:57:41 +01:00
Yih-Dar	b0a51e5cff	Fix flaky Hub CI (`test_trainer.py`) (#35062 ) * fix * Update src/transformers/testing_utils.py Co-authored-by: Lucain <lucainp@gmail.com> * fix * fix * fix * fix * fix * fix * fix * fix * check * check * check * check * check * check * Update src/transformers/testing_utils.py Co-authored-by: Lucain <lucainp@gmail.com> * Update src/transformers/testing_utils.py Co-authored-by: Lucain <lucainp@gmail.com> * check * check * check * Final space * Final adjustment --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: Lucain <lucainp@gmail.com>	2024-12-05 17:02:27 +01:00
AbdelKarim ELJANDOUBI	8d50fda644	Remove FSDP wrapping from sub-models. (#34452 ) * Remove FSDP wrapping from sub-models. * solve conflict trainer.py * make fixup * add unit test for fsdp_auto_wrap_policy when using auto_find_batch_size * put back extract_model_from_parallel * use transformers unwrap_model	2024-11-15 23:00:03 +01:00
Raushan Turganbay	187439c3fa	VLM: special multimodal Tokenizer (#34461 ) * kinda works * update * add tests * update * use special tokens in processors * typo * fix copies * fix * fix moshi after rebase * update * fix tests * update * Update docs/source/en/main_classes/tokenizer.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * update docs * test for load time adding tokens * fix some more tests which are now fetched better * one more fix --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-11-04 16:37:51 +01:00
Zach Mueller	ef976a7e18	Update trainer for easier handling of accumulate, compile fixes, and proper reporting (#34511 ) * Update trainer for easier handling of accumulate + proper reporting * test * Fixup tests * Full fix * Fix style * rm comment * Fix tests * Minimize test + remove py 311 check * Unused import * Forward contrib credits from discussions * Fix reported metrics * Refactor, good as it's going to get * rm pad tok id check * object detection and audio are being annoying * Fin * Fin x2 --------- Co-authored-by: Gyanateet Dutta <Ryukijano@users.noreply.github.com>	2024-11-04 07:47:34 -05:00
Sean (Seok-Won) Yi	c1753436db	New option called `"best"` for `args.save_strategy`. (#31817 ) * Add _determine_best_metric and new saving logic. 1. Logic to determine the best metric was separated out from `_save_checkpoint`. 2. In `_maybe_log_save_evaluate`, whether or not a new best metric was achieved is determined after each evaluation, and if the save strategy is "best" then the TrainerControl is updated accordingly. * Added SaveStrategy. Same as IntervalStrategy, but with a new attribute called BEST. * IntervalStrategy -> SaveStrategy * IntervalStrategy -> SaveStrategy for save_strat. * Interval -> Save in docstring. * Updated docstring for save_strategy. * Added SaveStrategy and made according changes. `save_strategy` previously followed `IntervalStrategy` but now follows `SaveStrategy`. Changes were made accordingly to the code and the docstring. * Changes from `make fixup`. * Removed redundant metrics argument. * Added new test_save_best_checkpoint test. 1. Checks for both cases where `metric_for_best_model` is explicitly provided and when it's not provided. 2. The first case should have two checkpoints saved, whereas the second should have three saved. * Changed should_training_end saving logic. The Trainer saves a checkpoint at the end of training by default as long as `save_strategy != SaveStrategy.NO`. This condition was modified to include `SaveStrategy.BEST` because it would be counterintuitive that we'd only want the best checkpoint to be saved but the last one is as well. * `args.metric_for_best_model` default to loss. * Undo metric_for_best_model update. * Remove checking metric_for_best_model. * Added test cases for loss and no metric. * Added error for metric and changed default best_metric. * Removed unused import. * `new_best_metric` -> `is_new_best_metric` Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Applied `is_new_best_metric` to all. Changes were made for consistency and also to fix a potential bug. --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Zach Mueller <muellerzr@gmail.com>	2024-10-28 16:02:22 +01:00
AbdelKarim ELJANDOUBI	8b3b9b48fc	exclude fsdp from delay_optimizer_creation (#34140 ) * exclude fsdp from delay_optimizer_creation * add test case for trainer: FSDP mode and fp8 as mixed precision * rearrange imports * ruff formatted * adapt _init_fsdp to fp8 * use _init_fsdp only when resume_from_checkpoint * In case of FDP, self.layer will be CheckpointWrapper which has no len() method * delete _init_fsdp * solve conflict * fix conflict * make fixup	2024-10-28 13:50:16 +01:00
Zach Mueller	6ba31a8a94	Enable users to use their own loss functions + deal with prefetching for grad accum (#34198 ) * bookmark * Bookmark * Bookmark * Actually implement * Pass in kwarg explicitly * Adjust for if we do or don't have labels * Bookmark fix for od * bookmark * Fin * closer * Negate accelerate grad accum div * Fixup not training long enough * Add in compute_loss to take full model output * Document * compute_loss -> compute_loss_fn * Add a test * Refactor * Refactor * Uncomment tests * Update tests/trainer/test_trainer.py Co-authored-by: Daniel Han <danielhanchen@gmail.com> --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2024-10-17 17:01:56 -04:00
Marc Sun	3f06f95ebe	Revert "Fix FSDP resume Initialization issue" (#34193 ) Revert "Fix FSDP resume Initialization issue (#34032)" This reverts commit `4de1bdbf63`.	2024-10-16 15:25:18 -04:00
Shikhar Mishra	4de1bdbf63	Fix FSDP resume Initialization issue (#34032 ) * Fix FSDP Initialization for resume training * Added init_fsdp function to work with dummy values * Fix FSDP initialization for resuming training * Added CUDA decorator for tests * Added torch_gpu decorator to FSDP tests * Fixup for failing code quality tests	2024-10-15 13:48:10 +02:00
Matthew Hoffman	70b07d97cf	Default `synced_gpus` to `True` when using `FullyShardedDataParallel` (#33483 ) * Default synced_gpus to True when using FullyShardedDataParallel Fixes #30228 Related: * https://github.com/pytorch/pytorch/issues/100069 * https://github.com/pytorch/pytorch/issues/123962 Similar to DeepSpeed ZeRO Stage 3, when using FSDP with multiple GPUs and differently sized data per rank, the ranks reach different synchronization points at the same time, leading to deadlock To avoid this, we can automatically set synced_gpus to True if we detect that a PreTrainedModel is being managed by FSDP using _is_fsdp_managed_module, which was added in 2.0.0 for torch.compile: https://github.com/pytorch/pytorch/blob/v2.0.0/torch/distributed/fsdp/_dynamo_utils.py * Remove test file * ruff formatting * ruff format * Update copyright year Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Add test for FSDP-wrapped model generation Before #33483, these tests would have hung for 10 minutes before crashing due to a timeout error * Ruff format * Move argparse import * Remove barrier I think this might cause more problems if one of the workers was killed * Move import into function to decrease load time https://github.com/huggingface/transformers/pull/33483#discussion_r1787972735 * Add test for accelerate and Trainer https://github.com/huggingface/transformers/pull/33483#discussion_r1790309675 * Refactor imports * Ruff format * Use nullcontext --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-10-10 14:09:04 -04:00
Zach Mueller	4fb28703ad	Fix PIL dep for tests (#34028 ) Fix PIL dep for tests	2024-10-09 10:45:06 -04:00
amyeroberts	b7474f211d	Trainer - deprecate tokenizer for processing_class (#32385 ) * Trainer - deprecate tokenizer for processing_class * Extend change across Seq2Seq trainer and docs * Add tests * Update to FutureWarning and add deprecation version	2024-10-02 14:08:46 +01:00
Matthew Douglas	196d35ccfc	Add AdEMAMix optimizer (#33682 ) * Add AdEMAMix optimizer * Fix test * Update tests/trainer/test_trainer.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2024-09-25 18:07:21 +01:00
Fanli Lin	b87755aa6d	[tests] skip tests for xpu (#33553 ) * enable * fix * add xpu skip * add marker * skip for xpu * add more * add one more	2024-09-19 19:28:04 +01:00
teamclouday	6c051b4e1e	Add revision to trainer push_to_hub (#33482 ) * add revision to trainer push_to_hub * apply suggestions * add test for revision * apply ruff format * reorganize imports * change test trainer path	2024-09-17 23:11:32 +02:00
Steven Shimizu	ba1f1dc132	Updated Trainer's liger-kernel integration to call correct patching API (#33502 ) * Updated liger-kernel integration in Trainer to call correct patching API * Fixed styling	2024-09-17 02:40:24 +02:00
Wing Lian	1027a532c5	add a callback hook right before the optimizer step (#33444 )	2024-09-13 10:43:45 +02:00
Wing Lian	62aecd85ff	schedulefree optimizers (#30079 ) * schedulefree optimizers * fix train instead of eval for optimizer * fixes and update docs * chore: lint * add tests and drop overly-verbose _32bit suffix * chore: lint * fix for docs * fix code review issues * use duck-typing to avoid per-optimizer patches * fixup style * fixup style * warn if incorrect accelerate version with schedule free Co-authored-by: Aman Gupta Karmani <aman@tmm1.net> --------- Co-authored-by: Aman Karmani <aman@tmm1.net>	2024-09-09 09:51:39 +02:00
Zach Mueller	6b7d64ac1c	Only disallow DeepSpeed Zero-3 for auto bs finder (#31731 ) * Only disallow DeepSpeed * Clean * DeepSpeed! * Add a test for deepspeed	2024-09-03 09:16:28 -04:00
Jeongseok Kang	963ed98bed	docs: Replace package abbreviations with full name(`bitsandbytes`) in docstrings (#33230 ) * docs: Provide fullname for `bitsandbytes` package * docs: Provide fullname for `bitsandbytes` package (2)	2024-09-02 13:40:34 +02:00
Jason (Siyu) Zhu	adb91179b9	Integrate Liger (Linkedin GPU Efficient Runtime) Kernel to Trainer (#32860 ) * add liger integration * fix syntax * fix import issue * add trainer.md * Use _apply_liger_kernel() * Fixed log message * Update docs/source/en/trainer.md Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update docs/source/en/trainer.md Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/training_args.py Co-authored-by: Byron Hsu <byronhsu1230@gmail.com> * Update src/transformers/trainer.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/training_args.py Co-authored-by: Byron Hsu <byronhsu1230@gmail.com> * Update docs/source/en/trainer.md Co-authored-by: Byron Hsu <byronhsu1230@gmail.com> * Fixed checkstyle and updated readme * Added test * Fixed checkstyle * fix docstring * rename use_liger to use_liger_kernel * Trigger Build * Added test * add fix-copies * Fixed copy inconsistencies --------- Co-authored-by: shimizust <sshimizu@linkedin.com> Co-authored-by: Steven Shimizu <shimizust@gmail.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>	2024-08-23 13:20:49 +02:00
Marc Sun	c42d264549	FEAT / Trainer: Add adamw 4bit optimizer (#31865 ) * add 4bit optimizer * style * fix msg * style * add qgalore * Revert "add qgalore" This reverts commit `25278e805f`. * style * version check	2024-08-22 15:07:09 +02:00
Eric Hartford	481e15604a	Add support for GrokAdamW optimizer (#32521 ) * add grokadamw * reformat * code review feedback, unit test * reformat * reformat	2024-08-13 13:20:28 +01:00
RhuiDih	9cf4f2aa9a	Enhancing SFT Training Efficiency Using Packing and FlashAttention2 with Position IDs (#31629 ) * add DataCollatorBatchFlattening * Update data_collator.py * change name * new FA2 flow if position_ids is provided * add comments * minor fix * minor fix data collator * add test cases for models * add test case for data collator * remove extra code * formatting for ruff check and check_repo.py * ruff format ruff format tests src utils * custom_init_isort.py	2024-07-23 15:56:41 +02:00
Lucain	0fdea8607d	Fix tests after `huggingface_hub` 0.24 (#32054 ) * adapt tests * style * comment	2024-07-19 19:32:39 +01:00
Yih-Dar	080e14b24c	Modify `warnings` in a `with` block to avoid flaky tests (#31893 ) * fix * [test_all] check before merge --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-07-10 17:56:12 +02:00
Anton Vlasjuk	a01b033cb4	Fix galore lr display with schedulers (#31710 ) * fix galore lr display with lr schedulers * style * add some tests to check for displayed lrs * copy-paste err for warmup steps * standardize the default lr to be only in the optimizer * trying out my luck with the reads	2024-07-05 18:59:09 +01:00
Sangbum Daniel Choi	cb298978ad	add gather_use_object arguments (#31514 ) * add gather_use_object arguments * fix name and pass the CI test for Seq2SeqTrainer * make style * make it to functools * fix typo * add accelerate version: * adding warning * Update src/transformers/trainer.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * make style * Update src/transformers/training_args.py * check function move to initial part * add test for eval_use_gather_object --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2024-06-28 13:50:27 +01:00
amyeroberts	1de7dc7403	Skip tests properly (#31308 ) * Skip tests properly * [test_all] * Add 'reason' as kwarg for skipTest * [test_all] Fix up * [test_all]	2024-06-26 21:59:08 +01:00
Albert Villanova del Moral	a14b055b65	Pass datasets trust_remote_code (#31406 ) * Pass datasets trust_remote_code * Pass trust_remote_code in more tests * Add trust_remote_dataset_code arg to some tests * Revert "Temporarily pin datasets upper version to fix CI" This reverts commit `b7672826ca`. * Pass trust_remote_code in librispeech_asr_dummy docstrings * Revert "Pin datasets<2.20.0 for examples" This reverts commit `833fc17a3e`. * Pass trust_remote_code to all examples * Revert "Add trust_remote_dataset_code arg to some tests" to research_projects * Pass trust_remote_code to tests * Pass trust_remote_code to docstrings * Fix flax examples tests requirements * Pass trust_remote_dataset_code arg to tests * Replace trust_remote_dataset_code with trust_remote_code in one example * Fix duplicate trust_remote_code * Replace args.trust_remote_dataset_code with args.trust_remote_code * Replace trust_remote_dataset_code with trust_remote_code in parser * Replace trust_remote_dataset_code with trust_remote_code in dataclasses * Replace trust_remote_dataset_code with trust_remote_code arg	2024-06-17 17:29:13 +01:00
Bastien Le Chenadec	485fd81471	Support multiple validation datasets when `dataloader_persistent_workers=True` (#30627 ) * Support multiple validation datasets when dataloader_persistent_workers=True * Test support of multiple validation datasets	2024-06-17 16:58:39 +01:00
조준래	60861fe1fd	Implement JSON dump conversion for torch_dtype in TrainingArguments (#31224 ) * Implement JSON dump conversion for torch_dtype in TrainingArguments * Add unit test for converting torch_dtype in TrainingArguments to JSON * move unit test for converting torch_dtype into TrainerIntegrationTest class * reformatting using ruff * convert dict_torch_dtype_to_str to private method _dict_torch_dtype_to_str --------- Co-authored-by: jun.4 <jun.4@kakaobrain.com>	2024-06-07 15:43:34 +01:00
Dhruv Pai	5c88253556	Add on_optimizer_step to callback options (#31095 ) * Modified test * Added on_optimizer_step to callbacks * Move callback after step is called * Added on optimizer step callback	2024-05-29 16:20:59 +02:00
Zach Mueller	daf281f44f	Enforce saving at end of training if saving option chosen (#30160 ) * Enforce saving at end of training * Fix test * Rework test * Fixup tests' * Update comment based on sourab feedback * Clean	2024-05-21 07:50:11 -04:00
Mohit Sharma	7a4792e6b3	CI: AMD MI300 tests fix (#30797 ) * add fix * update import * updated dicts and comments * remove prints * Update testing_utils.py	2024-05-21 12:46:07 +01:00
Younes Belkada	8871b26150	FEAT / Trainer: LOMO optimizer support (#30178 ) * add V1 - adalomo not working yet * add todo docs + refactor from comments * adjust LR * add docs * add more elaborated test * Apply suggestions from code review Co-authored-by: Zach Mueller <muellerzr@gmail.com> * fix * push * add accelerate check * fix DDP case * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fix * init kwargs * safely add attribute * revert to enum logic * Update src/transformers/trainer.py --------- Co-authored-by: Zach Mueller <muellerzr@gmail.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-05-21 10:16:37 +02:00
Zach Mueller	92d1d97c05	Introduce configured_state arg for accelerator_config (#29781 ) * Introduce configured_state * Include note on tuning * Allow for users to have defined a state already * Include tests * Add note on hpam tune * Guard a bit better * Update src/transformers/training_args.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/training_args.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Finish rebase * Finish rebase * Guard carefully * Fixup test * Refactor * Fin refactor * Comment * Update wrt feedback --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-05-20 09:21:40 -04:00
fxmarty	37bba2a32d	CI: update to ROCm 6.0.2 and test MI300 (#30266 ) * update to ROCm 6.0.2 and test MI300 * add callers for mi300 * update dockerfile * fix trainer tests * remove apex * style * Update tests/trainer/test_trainer_seq2seq.py * Update tests/trainer/test_trainer_seq2seq.py * Update tests/trainer/test_trainer_seq2seq.py * Update tests/trainer/test_trainer_seq2seq.py * update to torch 2.3 * add workflow dispatch target * we may need branches: mi300-ci after all * nit * fix docker build * nit * add check runner * remove docker-gpu * fix issues * fix --------- Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-05-13 18:14:36 +02:00
Anton Vlasjuk	71c1985069	Immutability for data collators (#30603 ) * immutability fix for seq2seq as well as immutability tests for the collators * ensure we don't act on none labels and formatting * remove tf/pt in respective tests as they are not required * more type error fixes tf/np * remove todo * apply suggestions from code review * formatting / style	2024-05-08 17:54:49 +01:00
Nate Cibik	df475bf8e6	Trainer - add cache clearing and the option for batched eval metrics computation (#28769 ) * Added cache clearing for GPU efficiency. * Added cache clearing for GPU efficiency. * Added batch_eval_metrics capability * Ran make fixup * Fixed bug * Fixed whitespace issue * Fixed outdated condition * Updated docstrings with instructions for batch_eval_metrics. Updated end of dataloader logic * Added first version of batch_eval_metrics Trainer test * Fixed batch_eval_metrics Trainer tests for both eval and predict * Fixed batch_eval_metrics behavior for new Trainer variables * Fixed batch_eval_metrics Trainer tests * Ran fixup	2024-05-06 08:23:40 -04:00
Clara Pohland	e076953079	Trainer._load_from_checkpoint - support loading multiple Peft adapters (#30505 ) * Trainer: load checkpoint model with multiple adapters * Trainer._load_from_checkpoint support multiple active adapters * PeftModel.set_adapter does not support multiple adapters yet * Trainer._load_from_checkpoint test multiple adapters --------- Co-authored-by: Clara Luise Pohland <clara-luise.pohland@telekom.de>	2024-05-06 08:22:52 -04:00
Anton Vlasjuk	9112520b15	Fix seq2seq collator padding (#30556 ) * fix seq2seq data collator to respect the given padding strategy further added tests for the seq2seq data collator in the style of the `data_collator_for_token_classification` (pt, tf, np) * formatting and change bool equals "==" to "is" * add missed return types in tests * update numpy test as it can handle unequal shapes, not like pt or tf	2024-04-30 18:32:30 +01:00
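
Note on the gradient-accumulation (GA) loss fixes listed above (#35121, #35651): those commits mention sending a `num_items_in_batch` argument to the model. The following is a minimal, framework-free sketch of why such a count matters, not the Trainer's actual code. It assumes toy per-token loss values, and the helper names `mean_of_means` and `token_normalized` are hypothetical. When micro-batches hold different numbers of tokens, averaging each micro-batch's mean loss weights tokens unequally, while dividing the summed loss by the total token count matches the full-batch mean.

```python
from typing import List


def mean_of_means(micro_batches: List[List[float]]) -> float:
    """Average the per-micro-batch mean losses (tokens end up unevenly weighted)."""
    per_batch_means = [sum(mb) / len(mb) for mb in micro_batches]
    return sum(per_batch_means) / len(per_batch_means)


def token_normalized(micro_batches: List[List[float]]) -> float:
    """Sum every token loss, then divide once by the total token count."""
    # Hypothetical stand-in for the count the commits call num_items_in_batch.
    num_items_in_batch = sum(len(mb) for mb in micro_batches)
    total_loss = sum(sum(mb) for mb in micro_batches)
    return total_loss / num_items_in_batch


# Toy per-token losses (assumed values): one micro-batch of 4 tokens, one of 1 token.
micro_batches = [[1.0, 1.0, 1.0, 1.0], [3.0]]

# Reference: the mean loss if all 5 tokens were processed as a single batch.
all_tokens = [loss for mb in micro_batches for loss in mb]
full_batch_mean = sum(all_tokens) / len(all_tokens)

print("mean of means:   ", mean_of_means(micro_batches))
print("token normalized:", token_normalized(micro_batches))
print("full-batch mean: ", full_batch_mean)
```

In this toy case the single-token micro-batch gets half of the total weight under the mean-of-means scheme, so the two results differ; with equally sized micro-batches they would coincide.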


150 Commits