transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-31 02:02:21 +06:00

Author	SHA1	Message	Date
Matt	96d833b211	Return scalar losses instead of per-sample means (#18013 ) * Return scalar losses instead of per-sample means * Make loss shape (1,) instead of scalar * Allow scalar losses in test_loss_computation * Allow scalar losses in test_loss_computation * Allow scalar losses in test_loss_computation * Remove XLA loss function for RAG	2022-07-04 17:26:19 +01:00
Matthijs Hollemans	6cb19540c9	sort list of models (#18011 )	2022-07-04 09:20:55 -04:00
regisss	7498db06a1	Replace BloomTokenizer by BloomTokenizerFast in doc (#18005 )	2022-07-04 08:40:13 -04:00
regisss	3cfdefaa4d	Fix typo in error message in generation_utils (#18000 )	2022-07-04 06:04:58 -04:00
amyeroberts	cf2578ae00	Refactor to inherit from nn.Module instead of nn.ModuleList (#17501 ) * Refactor to inherit from nn.Module instead of nn.ModuleList * Fix typo * Empty to trigger CI re-run Blender Bot tests failing (should be unrelated to this PR) and pass locally). I don't have sufficient permisisons to re-run the CI workflow (totally or from failed)	2022-07-04 06:03:42 -04:00
amyeroberts	77ea5130a1	Add TF ResNet model (#17427 ) * Rought TF conversion outline * Tidy up * Fix padding differences between layers * Add back embedder - whoops * Match test file to main * Match upstream test file * Correctly pass and assign image_size parameter Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * Add in MainLayer * Correctly name layer * Tidy up AdaptivePooler * Small tidy-up More accurate type hints and remove whitespaces * Change AdaptiveAvgPool Use the AdaptiveAvgPool implementation by @Rocketknight1, which correctly pools if the output shape does not evenly divide by input shape c.f. `9e26607e22 (r900109509)` Co-authored-by: From: matt <rocketknight1@gmail.com> Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> * Use updated AdaptiveAvgPool Co-authored-by: matt <rocketknight1@gmail.com> * Make AdaptiveAvgPool compatible with CPU * Remove image_size from configuration * Fixup * Tensorflow -> TensorFlow * Fix pt references in tests * Apply suggestions from code review - grammar and wording Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Add TFResNet to doc tests * PR comments - GlobalAveragePooling and clearer comments * Remove unused import * Add in keepdims argument * Add num_channels check * grammar fix: by -> of Co-authored-by: matt <rocketknight1@gmail.com> Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> * Remove transposes - keep NHWC throughout forward pass * Fixup look sharp * Add missing layer names * Final tidy up - remove from_pt now weights on hub Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> Co-authored-by: matt <rocketknight1@gmail.com> Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>	2022-07-04 10:59:15 +01:00
Lysandre Debut	7b18702ca7	Add link to existing documentation (#17931 )	2022-07-04 04:13:05 -04:00
Dobatymo	a045cbd6c9	only a stupid typo, but it can lead to confusion (#17930 )	2022-07-04 04:04:16 -04:00
David Heryanto	49c8c67fb8	Exclude Databricks from notebook env only if the runtime is below 11.0 (#17988 ) * Exclude Databricks from notebook env only if the runtime is below 11.0 * Dummy commit to trigger CI * Empty commit to trigger CI * Empty commit to trigger CI * Empty commit to trigger CI * Empty commit to trigger CI * Empty commit to trigger CI * Empty commit to trigger CI * Empty commit to trigger CI	2022-07-01 16:17:40 -04:00
seungeunrho	6890d1960f	Shifting labels for causal LM when using label smoother (#17987 ) * Shifting labels for causal LM when using label smoother When training CausalLM, loss is computed within model's foward() function and labels are shifted internally. However, if label smoothing is applied, loss is computed in trainer's compute_loss function and labels are not shifted. This causes unintended confusion during the alignment of labels and corresponding inputs. This commit is for resolving this confusion. Resolves #17960 On branch shift_labels_for_causalLM Changes to be committed: modified: src/transformers/trainer.py modified: src/transformers/trainer_pt_utils.py * Update trainer.py * Update src/transformers/trainer.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2022-07-01 14:55:35 -04:00
Yih-Dar	6f0723a9be	Restore original task in test_warning_logs (#17985 ) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-07-01 20:44:27 +02:00
amyeroberts	009171d1ba	Ensure PT model is in evaluation mode and lightweight forward pass done (#17970 )	2022-07-01 19:33:47 +01:00
Matt	d6cec45801	XLA train step fixes (#17973 ) * Copy inputs to train and test step before modifying them, as this breaks things * Add XLA tests, fix our loss functions to be XLA-compatible * make fixup * Update loss computation test to expect vector of per-sample losses * Patch loss for TFLED * Patch loss for TFAlbert * Add a tf_legacy_loss config flag that enables old loss functions * Stop using config.get() because it's not a dict * Skip loss computation test for RAG because its loss is very strange and I'm afraid to rewrite it * make fixup * Add XLA-compatible RAG loss * Fix dtype of loss mask for TFAlbert * Fix test for XLNet too because it overrides the default one * make fixup * Fix config test * No more depending on GPU NaN behaviour * Add test, avoid potential zero division * Fix test item assignment * Fix loss computation masking test * make fixup * Fix dtype bugs	2022-07-01 19:11:14 +01:00
Sanchit Gandhi	485bbe79d5	[Flax] Add remat (gradient checkpointing) (#17843 ) * [Flax] Add remat (gradient checkpointing) * fix variable naming in test * flip: checkpoint using a method * fix naming * fix class naming * apply PVP's suggestions from code review * make fix-copies * fix big-bird, electra, roberta * cookie-cutter * fix flax big-bird * move test to common	2022-07-01 18:33:54 +01:00
Yih-Dar	664688b94f	higher atol to avoid flaky trainer test failure (#17979 ) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-07-01 17:53:16 +02:00
Yih-Dar	8bb2c387f4	Fix FlaxBigBirdEmbeddings (#17842 ) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-07-01 16:46:01 +02:00
Nouamane Tazi	b68d408f1b	add ONNX support for BLOOM (#17961 ) * add onnx support for BLOOM * use TYPE_CHECKING for type annotations * fix past_shape for bloom (different from gpt2) * use logical_or instead of `+` for onnx support * bigger `atol_for_validation` for larger bloom models * copied -> taken because it's no longer an exact copy * remove "copied from" comment Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2022-07-01 10:44:42 -04:00
Sourab Mangrulkar	462b7f3a94	fixing fsdp autowrap functionality (#17922 ) * fixing fsdp autowrap functionality * update version and quality * update torch version to latest stable version	2022-07-01 19:40:55 +05:30
Wissam Antoun	3a064bd4dd	fix `bias` keyword argument in TFDebertaEmbeddings (#17940 )	2022-07-01 14:48:43 +01:00
Yih-Dar	569b679adb	Update expected values in CodeGen tests (#17888 ) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-07-01 15:33:36 +02:00
Billy Cao	cb42502410	Fix typo in perf_train_gpu_one.mdx (#17983 )	2022-07-01 09:19:13 -04:00
Yih-Dar	14fb8a63b9	skip some gpt_neox tests that require 80G RAM (#17923 ) * skip some gpt_neox tests that require 80G RAM * remove tests * fix quality Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-07-01 09:04:38 -04:00
Aaron Pham	49cd736a28	feat: add pipeline registry abstraction (#17905 ) * feat: add pipeline registry abstraction - added `PipelineRegistry` abstraction - updates `add_new_pipeline.mdx` (english docs) to reflect the api addition - migrate `check_task` and `get_supported_tasks` from transformers/pipelines/__init__.py to transformers/pipelines/base.py#PipelineRegistry.{check_task,get_supported_tasks} Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com> * fix: update with upstream/main chore: Apply suggestions from sgugger's code review Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * chore: PR updates - revert src/transformers/dependency_versions_table.py from upstream/main - updates pipeline registry to use global variables Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com> * tests: add tests for pipeline registry Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com> * tests: add test for output warning. Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com> * chore: fmt and cleanup unused imports Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com> * fix: change imports to top of the file and address comments Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2022-06-30 12:11:08 -04:00
regisss	9cb7cef285	Add ONNX support for LayoutLMv3 (#17953 ) * Add ONNX support for LayoutLMv3 * Update docstrings * Update empty description in docstring * Fix imports and type hints	2022-06-30 12:09:52 -04:00
Yih-Dar	fe14046421	skip some ipex tests until it works with torch 1.12 (#17964 ) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-06-30 18:05:29 +02:00
Joao Gante	91e1f24ef3	CLI: convert sharded PT models (#17959 ) * sharded conversion; add flag to control max hidden error * better hidden name matching * Add test: load TF from PT shards * fix test (PT data must be local)	2022-06-30 16:51:03 +01:00
Sylvain Gugger	f25457b273	Fix number of examples for iterable dataset in distributed training (#17951 )	2022-06-30 11:01:40 -04:00
Patrick von Platen	e4d2588573	[Pipelines] Add revision tag to all default pipelines (#17667 ) * trigger test failure * upload revision poc * Update src/transformers/pipelines/base.py Co-authored-by: Julien Chaumond <julien@huggingface.co> * up * add test * correct some stuff * Update src/transformers/pipelines/__init__.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * correct require flag Co-authored-by: Julien Chaumond <julien@huggingface.co> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2022-06-30 16:37:18 +02:00
Jannis Born	4f8361afe7	Unifying training argument type annotations (#17934 ) * doc: Unify training arg type annotations * wip: extracting enum type from Union * blackening	2022-06-30 08:53:32 -04:00
Jason Phang	205bc4152c	Fix GPT-NeoX-20B past handling, attention computation (#17811 ) * Fix GPT-NeoX-20B past handling, swap attention computation to hopefully avoid NaN, update docs * 20B tests	2022-06-30 08:47:40 -04:00
Crystina	692e61e91a	Flax t5 Encoder (#17784 ) * first draft adding Flax-t5-encoder and Flax-mt5-encoder * imports * after make fixup * flax t5 encoder test * black on test * make fix-copies * clean * all_model_classes -> tuple * clean test * is_encoder_decoder=False in t5-enc tester * remove file docstring before FlaxT5Encoder * black * isort * commit suggestions on src/transformers/models/t5/modeling_flax_t5.py Co-authored-by: Suraj Patil <surajp815@gmail.com> * commit suggestions on src/transformers/models/t5/modeling_flax_t5.py Co-authored-by: Suraj Patil <surajp815@gmail.com> * Apply suggestions from code review Co-authored-by: Suraj Patil <surajp815@gmail.com> * remove _get_encoder_module * self.decoder_seq_length -> self.encoder_seq_length as t5-enc does not have decoder * bugfix - self.module_class is class itself, not instance; * docs for mt5 and t5 * call -> __call__ in t5 doc * FlaxMT5EncoderModel to TYPE_HINT * run doc-builder to allow change the files Co-authored-by: Suraj Patil <surajp815@gmail.com>	2022-06-30 00:49:02 +02:00
Clémentine Fourrier	eb1493b15d	Fix #17893 , removed dead code (#17917 ) * Removed dead position_id code, fix #17893 * Removed unused var * Now ignores removed (dead) dict key for backward comp	2022-06-29 17:54:26 -04:00
Matthijs Hollemans	fbc7598bab	add MobileViT model (#17354 ) * add MobileViT * fixup * Update README.md Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * remove empty line Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * use clearer variable names * rename to MobileViTTransformerLayer * no longer inherit from nn.Sequential * fixup * fixup * not sure why this got added twice * rename organization for checkpoints * fix it up * Update src/transformers/models/mobilevit/__init__.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/mobilevit/configuration_mobilevit.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/mobilevit/configuration_mobilevit.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/mobilevit/configuration_mobilevit.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update tests/models/mobilevit/test_modeling_mobilevit.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/mobilevit/modeling_mobilevit.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/mobilevit/modeling_mobilevit.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/mobilevit/modeling_mobilevit.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/mobilevit/modeling_mobilevit.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * code style improvements * fixup * Update docs/source/en/model_doc/mobilevit.mdx Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update docs/source/en/model_doc/mobilevit.mdx Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/mobilevit/configuration_mobilevit.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/mobilevit/configuration_mobilevit.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * download labels from hub * rename layers * rename more layers * don't compute loss in separate function * remove some nn.Sequential * replace nn.Sequential with new MobileViTTransformer class * replace nn.Sequential with MobileViTMobileNetLayer * fix pruning since model structure changed * fixup * fix doc comment * remove custom resize from feature extractor * fix ONNX import * add to doc tests * use center_crop from image_utils * move RGB->BGR flipping into image_utils * fix broken tests * wrong type hint * small tweaks Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2022-06-29 16:07:51 -04:00
Matt	5feac3d080	Fix prepare_tf_dataset when drop_remainder is not supplied (#17950 )	2022-06-29 19:23:39 +01:00
Bram Vanroy	bc019b0e5f	ExplicitEnum subclass str (JSON dump compatible) (#17933 ) * ExplicitEnum subclass str (JSON dump compatible) * allow union if one of the types is str	2022-06-29 13:49:31 -04:00
Yih-Dar	b089cca347	PyTorch 1.12.0 for scheduled CI (#17949 ) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-06-29 19:32:19 +02:00
Younes Belkada	d444edb3f6	OPT - Fix Softmax NaN in half precision mode (#17437 )	2022-06-29 19:15:32 +02:00
Yih-Dar	9fe2403bc5	Use explicit torch version in deepspeed CI (#17942 ) * use explicit torch version Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-06-29 18:20:34 +02:00
Stas Bekman	4c722e9e22	fix regexes with escape sequence (#17943 )	2022-06-29 08:55:22 -07:00
Zachary Mueller	7c4c6f6084	Fix all is_torch_tpu_available issues (#17936 ) * Fix all is_torch_tpu_available	2022-06-29 11:03:33 -04:00
Mishig Davaadorj	77b76672e2	Fix img seg tests (load checkpoints from `hf-internal-testing`) (#17939 ) * Revert "Skip failing test until they are fixed." This reverts commit `8f400775fc`. * Use `tiny-detr` checkpts from `hf-internal-testing`	2022-06-29 10:19:37 -04:00
StevenTang1998	3cff4cc587	Add MVP model (#17787 ) * Add MVP model * Update README * Remove useless module * Update docs * Fix bugs in tokenizer * Remove useless test * Remove useless module * Update vocab * Remove specifying * Remove specifying * Add #Copied ... statement * Update paper link * Remove useless TFMvp * Add #Copied ... statement * Fix style in test mvp model * Fix some typos * Fix properties of unset special tokens in non verbose mode * Update paper link * Update MVP doc * Update MVP doc * Fix README * Fix typos in docs * Update docs	2022-06-29 09:30:55 -04:00
Sylvain Gugger	8f400775fc	Skip failing test until they are fixed.	2022-06-29 09:11:29 -04:00
Sylvain Gugger	47b9165109	Remove imports and use forward references in ONNX feature (#17926 )	2022-06-29 09:02:53 -04:00
Yih-Dar	5cdfff5df3	Fix job links in Slack report (#17892 ) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-06-29 14:53:13 +02:00
Aritra Roy Gosthipaty	a7eba83161	TF implementation of RegNets (#17554 ) * chore: initial commit Copied the torch implementation of regnets and porting the code to tf step by step. Also introduced an output layer which was needed for regnets. * chore: porting the rest of the modules to tensorflow did not change the documentation yet, yet to try the playground on the model * Fix initilizations (#1) * fix: code structure in few cases. * fix: code structure to align tf models. * fix: layer naming, bn layer still remains. * chore: change default epsilon and momentum in bn. * chore: styling nits. * fix: cross-loading bn params. * fix: regnet tf model, integration passing. * add: tests for TF regnet. * fix: code quality related issues. * chore: added rest of the files. * minor additions.. * fix: repo consistency. * fix: regnet tf tests. * chore: reorganize dummy_tf_objects for regnet. * chore: remove checkpoint var. * chore: remov unnecessary files. * chore: run make style. * Update docs/source/en/model_doc/regnet.mdx Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * chore: PR feedback I. * fix: pt test. thanks to @ydshieh. * New adaptive pooler (#3) * feat: new adaptive pooler Co-authored-by: @Rocketknight1 * chore: remove image_size argument. Co-authored-by: matt <rocketknight1@gmail.com> Co-authored-by: matt <rocketknight1@gmail.com> * Empty-Commit * chore: remove image_size comment. * chore: remove playground_tf.py * chore: minor changes related to spacing. * chore: make style. * Update src/transformers/models/regnet/modeling_tf_regnet.py Co-authored-by: amyeroberts <aeroberts4444@gmail.com> * Update src/transformers/models/regnet/modeling_tf_regnet.py Co-authored-by: amyeroberts <aeroberts4444@gmail.com> * chore: refactored __init__. * chore: copied from -> taken from./g * adaptive pool -> global avg pool, channel check. * chore: move channel check to stem. * pr comments - minor refactor and add regnets to doc tests. * Update src/transformers/models/regnet/modeling_tf_regnet.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * minor fix in the xlayer. * Empty-Commit * chore: removed from_pt=True. Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: matt <rocketknight1@gmail.com> Co-authored-by: amyeroberts <aeroberts4444@gmail.com> Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>	2022-06-29 13:45:14 +01:00
Joao Gante	e6d27ca5c8	TF: XLA beam search + most generation-compatible models are now also XLA-generate-compatible (#17857 ) * working beam search 🎉 * XLA generation compatible with ALL classes * add xla generation slow test	2022-06-29 12:41:01 +01:00
Leon Derczynski	b8142753f9	Add missing comment quotes (#17379 )	2022-06-29 06:16:36 -04:00
NielsRogge	e113c5cb64	Remove render tags (#17897 ) Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>	2022-06-29 06:06:42 -04:00
Santiago Castro	90415475bb	Fix the Conda package build (#16737 ) * Fix the Conda package build * Update build.sh * Update release-conda.yml	2022-06-29 06:03:16 -04:00


10164 Commits