transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-20 21:18:21 +06:00

Author	SHA1	Message	Date
Zhaofeng Wu	1b74af76b7	Allow dataset to be an optional argument for (Distributed)LengthGroupedSampler (#13820 ) * Allow dataset to be an optional argument for (Distributed)LengthGroupedSampler * Fix	2021-10-05 09:04:39 -04:00
Michael Benayoun	d4e4efce68	Initial support for symbolic tracing with torch.fx allowing dynamic axes (#13579 ) * Symbolic trace dynamic axes support for BERT like models (albert, bert, distilbert, mobilebert, electra, megatron-bert) * Sanity checks before tracing that make sure the model to trace is supported * Adapted to PyTorch 1.9 Co-authored-by: Michael Benayoun <michael@huggingface.co>	2021-10-05 14:19:47 +02:00
Alex Hedges	46efc58024	Improve error message when loading models from Hub (#13836 ) * Improve error message when loading models from Hub * Adjust error message wording	2021-10-05 08:09:10 -04:00
Nicolas Patry	3a9c0f23b4	Fixing empty prompts for text-generation when BOS exists. (#13859 ) * Fixing empty prompts for text-generation when BOS exists. * Fixing odd case with Pegasus. * Fixing Bert is Assertion Error.	2021-10-05 13:46:10 +02:00
Yih-Dar	a6ea244f99	Fix: save checkpoint after each epoch and push checkpoint to the hub (#13872 ) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2021-10-05 16:30:13 +05:30
Nicolas Patry	7079a99e76	Fixing 1-length special tokens cut. (#13862 )	2021-10-05 12:26:54 +02:00
Sam Hardwick	7051b89267	Update Tatoeba conversion (#13757 ) * Update Tatoeba conversion	2021-10-05 14:45:18 +05:30
Bram Vanroy	12b4d66a80	Update no_* argument (HfArgumentParser) (#13865 ) * update no_* argument Changes the order so that the no_* argument is created after the original argument AND sets the default for this no_* argument to False * import copy * update test * make style * Use kwargs to set default=False * make style	2021-10-04 16:28:52 -04:00
Nathan Raw	cc0a415e2f	✨ update image classification example (#13824 ) * ✨ update image classification example * 📌 update reqs	2021-10-04 11:49:51 -07:00
Evgeniy Zheltonozhskiy	6c08840628	Fix broken link to distill models in docs (#13848 ) * Fix broken link to distill models * Missing symbol * Fix spaces	2021-10-04 11:57:54 -04:00
Sidd Karamcheti	3a8de58c51	Add Mistral GPT-2 Stability Tweaks (#13573 ) * Add layer-wise scaling * Add reorder & upcasting argument * Add OpenAI GPT-2 weight initialization scheme * start `layer_idx` count at zero for consistency * disentangle attn and reordered and upscaled attn function * rename `scale_attn_by_layer` to `scale_attn_by_layer_id` * make autocast from amp compatible with pytorch<1.6 * fix docstring * style fixes * Add fixes from PR feedback, style tweaks * Fix doc whitespace * Reformat * First pass scale_attn_by_layer_idx and reorder_and_upcast_attn tests * Rename scale_attn_by_layer_idx, add tip * Remove extra newline * add test for weight initialization * update code format * add assert check weights are fp32 * remove assert * Fix incorrect merge * Fix shape mismatch in baddbmm * Add generation test for Mistral flags Co-authored-by: leandro <leandro.vonwerra@spoud.io> Co-authored-by: Keshav Santhanam <keshav2@stanford.edu> Co-authored-by: J38 <jebolton@stanford.edu>	2021-10-04 07:37:09 -04:00
Yaser Abdelaziz	955fd4fea9	[docs/gpt-j] fix typo (#13851 )	2021-10-04 12:30:50 +02:00
Gunjan Chhablani	de948350c2	Delete convert_multiberts_checkpoint_to_pytorch.py (#13852 )	2021-10-04 12:30:21 +02:00
Stas Bekman	bcc3f7b656	include megatron_gpt2 in installed modules (#13834 )	2021-10-01 11:42:08 -07:00
Silviu Oprea	707f7eb181	Bart: check if decoder_inputs_embeds is set (#13800 ) In BartForConditionalGeneration.forward, if labels are provided, decoder_input_ids are set to the labels shifted to the right. This is problematic: if decoder_inputs_embeds is also set, the call to self.model, which eventually gets to BartDecoder.forward, will raise an error. The fix is quite simple, similar to what is there already in BartModel.forward. Mainly, we should not compute decoder_input_ids if decoder_inputs_embeds is provided. Co-authored-by: Silviu Vlad Oprea <silviuvo@amazon.co.uk>	2021-10-01 19:36:57 +02:00
Anton Lozhkov	4213728067	[Examples] Add an official audio classification example (#13722 ) * Restore broken merge * Additional args, DDP, remove CommonLanguage * Update examples for V100, add training results * Style * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Remove custom datasets for simplicity, apply suggestions from code review * Add the attention_mask flag, reorganize README Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2021-10-01 18:52:45 +02:00
Arfon Smith	c4113721f8	Update CITATION.cff (#13833 )	2021-10-01 10:41:27 -04:00
Yuta Hayashibe	90f980ed35	Fix warning situation: UserWarning: max_length is ignored when padding=True" (#13829 ) * Removed wrong warning * Raise a warning when `max_length` is given with wrong `truncation` * Update the error message * Update the warning message Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2021-10-01 09:29:08 -04:00
Suraj Patil	8bbb53e20b	skip gptj slow generate tests for now (#13809 )	2021-09-30 15:44:33 -04:00
Patrick von Platen	41436d3dfb	[DPR] Correct init (#13796 ) * update * add to docs and init * make fix-copies	2021-09-30 18:55:20 +02:00
Patrick von Platen	44eb8bdeea	map only on one process (#13810 )	2021-09-30 18:52:53 +02:00
Gunjan Chhablani	9a9805fccf	Add MultiBERTs conversion script (#13077 ) * Init multibert checkpoint conversion script * Rename conversion script * Fix MultiBerts Conversion Script * Apply suggestions from code review Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>	2021-09-30 18:48:56 +02:00
Stas Bekman	e1d1c7c087	[testing] auto-replay captured streams (#13803 )	2021-09-30 09:26:49 -07:00
Sylvain Gugger	5f25855b3e	Update doc for v4.11.2	2021-09-30 11:58:33 -04:00
Sylvain Gugger	269c3d1400	Fix gather for TPU (#13813 )	2021-09-30 11:32:40 -04:00
Suraj Patil	7db2a79b38	[examples/flax] use Repository API for push_to_hub (#13672 ) * use Repository for push_to_hub * update readme * update other flax scripts * update readme * update qa example * fix push_to_hub call * fix typo * fix more typos * update readme * use abosolute path to get repo name * fix glue script	2021-09-30 16:38:07 +05:30
Stas Bekman	b90096fe14	[examples `run_glue.py`] missing requirements `scipy`, `sklearn` (#13768 ) * missing requirement * list both	2021-09-29 13:45:19 -07:00
Suraj Patil	bf6118e70c	[docs/gpt-j] addd instructions for how minimize CPU RAM usage (#13795 ) * add a note about tokenizer * add tips to load model is less RAM * fix link * fix more links	2021-09-29 23:43:46 +05:30
Sylvain Gugger	55695df0f7	Merge remote-tracking branch 'origin/master'	2021-09-29 12:09:54 -04:00
Sylvain Gugger	cf4aa3597f	Update doc for v4.11.1	2021-09-29 12:09:40 -04:00
Matt	2a51b15518	Add TF notebooks (#13793 )	2021-09-29 17:07:10 +01:00
Sylvain Gugger	63cc5bda60	Fix length of IterableDatasetShard and add test (#13792 ) * Fix length of IterableDatasetShard and add test * Add comments	2021-09-29 11:48:48 -04:00
Li-Huai (Allan) Lin	7d84c3a488	Enable readme link synchronization (#13785 ) * Enable readme link synchronization * Style * Reuse regex pattern * Apply suggestions * Update	2021-09-29 11:18:59 -04:00
Nishant Prabhu	a1ea3adb28	Fix LayoutLM ONNX test error (#13710 ) Fix LayoutLM ONNX test error	2021-09-29 06:50:15 -07:00
Matt	3a8a8013ad	Keras callback to push to hub each epoch, or after N steps (#13773 ) * Keras callback to push to hub each epoch, or after N steps * Reworked the callback to use Repository * Use an Enum for save_strategy * Style pass * Correct type for tokenizer * Update src/transformers/keras_callbacks.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/keras_callbacks.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/keras_callbacks.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/keras_callbacks.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/keras_callbacks.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/keras_callbacks.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Adding print message to the final upload * Adding print message to the final upload * Change how we wait for the last process to finish * is_done is a property, not a method, derp * Docstrings and documentation * Style pass * Style edit * Docstring reformat * Docstring rewrite * Replacing print with internal logger Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2021-09-29 12:47:35 +01:00
Patrick von Platen	aa018a795d	up (#13777 )	2021-09-29 10:30:00 +02:00
Sylvain Gugger	a21ee1f990	Implement len in IterableDatasetShard (#13780 )	2021-09-28 18:22:37 -04:00
Sylvain Gugger	83d3dc0f6f	Fix warning for gradient_checkpointing (#13767 )	2021-09-28 14:21:17 -04:00
Sylvain Gugger	5e3b4a70d3	Fix filtering in test fetcher utils (#13766 )	2021-09-27 15:26:54 -04:00
Lysandre	11c69b8045	Docs for version v4.11.0	2021-09-27 14:19:38 -04:00
Lysandre	dc193c906d	Release: v4.11.0	2021-09-27 14:14:09 -04:00
Sylvain Gugger	1c96500088	Fix gather for SageMaker model parallel	2021-09-27 13:11:58 -04:00
Sylvain Gugger	4e0410e927	Fix in gather for SM distributed	2021-09-27 11:57:18 -04:00
Matt	367c2ef53b	Modified TF train_step (#13678 ) Allows models to be compiled without a loss, and to use the internal loss computations for training with fit()	2021-09-27 14:47:07 +01:00
Sylvain Gugger	e00bc7cd2f	Silence warning in gradient checkpointing when it's False (#13734 )	2021-09-27 07:43:38 -04:00
Sylvain Gugger	3ffd18a617	Fix loss computation in Trainer (#13760 ) Co-authored-by: quantitative-technologies <james.hirschorn@quantitative-technologies.com> Co-authored-by: quantitative-technologies <james.hirschorn@quantitative-technologies.com>	2021-09-27 07:33:08 -04:00
Xiaohan Zou	3ccc27019a	Fix type annotations for `distributed_concat()` (#13746 ) * Fix type annotations for `distributed_concat()` * Use Any	2021-09-27 06:29:12 -04:00
Anton Lozhkov	e0d31a8982	[Tests] Cast Hubert test models to fp16 (#13755 )	2021-09-26 22:58:23 +03:00
Stas Bekman	400c5a158b	[megatron gpt checkpoint conversion] causal mask requires pos_embed dimension (#13735 )	2021-09-26 09:51:40 -07:00
Patrick von Platen	91df45516c	[Trainer] Make sure shown loss in distributed training is correctly averaged over all workers (#13681 ) * push * improve tr loss gather	2021-09-26 09:03:45 +02:00

8821 Commits