Commit Graph

434 Commits

Author SHA1 Message Date
Sylvain Gugger
f295fc8a16
Fix last models for common tests that are too big. (#25058)
* Fix last models for common tests that are too big.

* Remove print statement
2023-07-25 07:56:04 -04:00
Sylvain Gugger
afe8bfc075
Comment out print statement again 2023-07-24 10:12:20 -04:00
Sylvain Gugger
42571f6eb8
Make more test models smaller (#25005)
* Make more test models tiny

* Make more test models tiny

* More models

* More models
2023-07-24 10:08:47 -04:00
Sylvain Gugger
1023705440
Check models used for common tests are small (#24824)
* First models

* Conditional DETR

* Treat DETR models, skip others

* Skip LayoutLMv2 as well

* Fix last tests
2023-07-14 14:43:19 -04:00
Yih-Dar
fd6735102a
Make PT/Flax tests runnable on GPU (#24557)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-06-28 20:11:01 +02:00
Sylvain Gugger
8e5d1619b3
Clean load keys (#24505)
* Preliminary work on some models

* Fix test load missing and make sure nonpersistent buffers are tested

* Always ignore nonpersistent buffers if in state_dict

* Treat models

* More models

* Treat remaining models

* Fix quality

* Fix tests

* Remove draft

* This test is not needed anymore

* Fix copies

* Fix last test

* Newly added models

* Fix last tests

* Address review comments
2023-06-27 14:45:40 -04:00
Younes Belkada
3ce3385c47
Revert "Fix gradient checkpointing + fp16 autocast for most models" (#24420)
Revert "Fix gradient checkpointing + fp16 autocast for most models (#24247)"

This reverts commit 285a48011d.
2023-06-22 16:11:27 +02:00
Younes Belkada
285a48011d
Fix gradient checkpointing + fp16 autocast for most models (#24247)
* fix gc bug

* continue PoC on OPT

* fixes

* 🤯

* fix tests

* remove pytest.mark

* fixup

* forward contrib credits from discussions

* forward contrib credits from discussions

* reverting changes on untouched files.

---------

Co-authored-by: zhaoqf123 <zhaoqf123@users.noreply.github.com>
Co-authored-by: 7eu7d7 <7eu7d7@users.noreply.github.com>
2023-06-21 17:04:59 +02:00
Sylvain Gugger
372f50030b
Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00
Sylvain Gugger
695928e1e5
Tied params cleanup (#24211)
* First test

* Add info for all models

* style

* Repo consistency

* Fix last model and cleanup prints

* Repo consistency

* Use consistent function for detecting tied weights
2023-06-13 11:38:39 -04:00
Stas Bekman
bbbc5c15d4
[AutoModel] fix torch_dtype=auto in from_pretrained (#23379)
* [automodel] fix torch_dtype=auto in from_pretrained

* add test

* fix logic

* Update src/transformers/models/auto/auto_factory.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-05-16 10:21:42 -07:00
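
A minimal sketch of the `torch_dtype="auto"` behavior these fixes target, assuming the public `from_pretrained` API (model name illustrative):

```python
import torch
from transformers import AutoModel

# "auto" asks from_pretrained to pick the dtype itself: it first consults
# config.torch_dtype saved with the checkpoint and falls back to the dtype
# of the first floating-point weight in the state dict.
model = AutoModel.from_pretrained("bert-base-uncased", torch_dtype="auto")
print(model.dtype)  # e.g. torch.float32 for this checkpoint

# An explicit dtype is also accepted and skips the auto-detection.
model_fp16 = AutoModel.from_pretrained("bert-base-uncased", torch_dtype=torch.float16)
```
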
Lucain
74c55ab9e5
Prepare tests for hfh 0.14 (#22958)
* Test hf_hub 0.14.0rc1

* fix mocked tests

* package version

---------

Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
Co-authored-by: testbot <lucainp@hf.co>
2023-04-24 09:31:50 -04:00
Matt
edb704b26e
Fix inverted conditional in TF common test! (#22540)
* Fix inverted conditional in TF common test!

* Make the same change in the PT tests file

* Make sure hidden states for GPT2 have the same output shape in PT/TF

* Minor fix to PT implementation of token classification loss

* Skip loss equivalence test for TFHubert because it keeps overflowing to inf

* Compute LM loss for TF the (weird) way it's computed in PT

* Skip loss equivalence test for Wav2Vec2 for the same reason as Hubert

* Fix - don't try to access the hidden states property when output is a tuple
2023-04-04 21:59:54 +01:00
Matt
5f3ea66bc0
Add TF port of BLIP (#22090)
* Initial commit

* more stash commit

* Yet another stash commit

* yet more stash commit

* Mostly working except for docs / repo consistency

* Stop importing model list from torch file

* Add TF BLIP models to docs

* Add auto classes

* Move get_text_features and get_image_features

* Update src/transformers/models/blip/modeling_tf_blip.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/blip/modeling_tf_blip.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/blip/modeling_tf_blip.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/blip/modeling_tf_blip_text.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/blip/modeling_tf_blip.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/blip/modeling_tf_blip.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/blip/modeling_tf_blip.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/blip/modeling_tf_blip.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/blip/modeling_tf_blip.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/blip/test_modeling_tf_blip.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/blip/test_modeling_tf_blip.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/blip/modeling_tf_blip.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/blip/modeling_tf_blip.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update tests/models/blip/test_modeling_tf_blip_text.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/blip/modeling_tf_blip_text.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/models/blip/modeling_tf_blip.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Use channels_last convolutions in TF (better performance + compatibility)

* Remove _shape function

* Move multi-line statement to one line in PT + TF

* Specify tf.keras.layers instead of importing from it

* Remove test_gradient_checkpointing and empty test_training methods

* move some multi-line statements to one line

* Update docstring for generate

* Remove pruned heads set

* Remove self.seq_len_dim

* Fixed issues with loss computation, should resolve some tests. Also ensured that the PT version follows the config for output_attentions and output_hidden_states

* ensure original model follows config in more cases

* Skip the same cross-attention tests in the PT tests - didn't realize we did it twice!

* Add training args throughout the models and layers

* make fixup

* Fix docstring for inputs_embeds

* Add docstring for is_decoder

* Add docstrings to text models

* Remove redundant computation

* Add unpack_inputs / keras_serializable

* Add modeling_tf_blip to doctests

* Add config classes for keras serialization

* Changes to allow model porting with pt-to-tf

* Quick fix to decoder head and test tweaks

* Revert an issue with masking the embeddings outputs

* Allow missing keys in some equivalence tests (for unused layers)

* Add tf-pt equivalence tests back in

* Update src/transformers/models/blip/modeling_tf_blip.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/blip/modeling_tf_blip_text.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/blip/modeling_tf_blip_text.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* make fixup

* Refactor invert_attention_mask out into tf_utils

* Re-enable cross-tests on the PT side too

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-04-04 16:05:22 +01:00
Nicolas Patry
d143087d18
Making sure we can use safetensors to serialize all the time. (#22437)
* Making sure we can use safetensors to serialize all the time.

* Expanding the tests for increased coverage.

* Update the test.

* Getting current state of affairs.

* Tentative fix.

* Fixing black version.

* Fixing the worst offenders.

* Try to modify less files.

* Fixing blip_2 (Weird solution right now).

* Fixing deta.

* Fix blip ?

* Missing extra newline.

* No deta modification.

* Adding some comments.

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Addressing comments.

* Addressing comments.

* creating warn_once.

* Warning_once !

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-03-31 16:07:35 +02:00
Patrick von Platen
f780557a34
[Safetensors] Add explicit flag to from_pretrained (#22083)
* [Safetensors] Add explicit flag to from_pretrained

* add test

* remove @

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-03-13 21:39:06 +01:00
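
A short sketch of the safetensors round trip covered by this entry and the one above, assuming the `safe_serialization` save argument and the `use_safetensors` load flag:

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")

# Serialize weights as model.safetensors instead of pickle-based
# pytorch_model.bin.
model.save_pretrained("./bert-safetensors", safe_serialization=True)

# Explicitly require safetensors weights on load; with use_safetensors=True
# the call errors out rather than silently falling back to a .bin checkpoint.
reloaded = AutoModel.from_pretrained("./bert-safetensors", use_safetensors=True)
```
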
Lucain
923110b74f
Remove set_access_token usage + fail tests if FutureWarning (#22051)
* Remove set_access_token usage + fail tests if FutureWarning

* do not fail on FutureWarning in CI

---------

Co-authored-by: testbot <lucainp@hf.co>
2023-03-09 09:23:48 -05:00
Yih-Dar
9474abdf47
Use larger atol in torch.allclose for some tests (#21966)
Use larger atol

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-06 17:41:00 +01:00
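
For context, the change amounts to widening the absolute-error bound in comparisons like this (values illustrative):

```python
import torch

a = torch.tensor([1.00000, 2.00000])
b = torch.tensor([1.00003, 2.00002])

# Fails under the default atol=1e-8 / rtol=1e-5, passes once atol is relaxed.
print(torch.allclose(a, b))             # False
print(torch.allclose(a, b, atol=1e-4))  # True
```
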
Younes Belkada
831f3144a6
[tests] add accelerate marker (#21743)
* add `accelerate` marker

* add to docs

* Update docs/source/en/testing.mdx
2023-02-27 12:33:34 +01:00
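
Assuming the marker registered here is the pytest marker named `accelerate` (per the commit message), tagged tests can be selected or excluded from the command line:

```python
import pytest

# Tests exercising big-model features (offload, device_map, ...) carry this
# marker, so a CI job can run only them with `pytest -m accelerate` or skip
# them with `pytest -m "not accelerate"`.
@pytest.mark.accelerate
def test_cpu_offload_smoke():
    assert True  # placeholder body; real tests exercise offloading paths
```
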
Arthur
c51dc4f927
[torch] remove deprecated uint8 in favor of bool (#21384)
* uint8 -> bool

* fix copies

* style

* update test modeling common when checking attention buffers

* style

* use logical not on random mask instead of subtraction with 1

* remove torch uint8

* quality

* remove modified modeling utils

* Update based on review

Co-authored-by: sgugger <sylvain.gugger@gmail.com>

---------

Co-authored-by: sgugger <sylvain.gugger@gmail.com>
2023-02-27 11:46:02 +01:00
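
A small illustration of the change the bullets describe: attention buffers move from the deprecated `torch.uint8` to `torch.bool`, and masks are inverted with logical not instead of subtraction from 1:

```python
import torch

seq_len = 4

# Old pattern: uint8 buffer, inverted arithmetically.
old_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.uint8))
old_inverted = 1 - old_mask

# New pattern: bool buffer, inverted with logical not.
new_mask = torch.tril(torch.ones(seq_len, seq_len)).bool()
new_inverted = ~new_mask

assert torch.equal(old_inverted.bool(), new_inverted)
```
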
Aaron Gokaslan
5e8c8eb5ba
Apply ruff flake8-comprehensions (#21694) 2023-02-22 09:14:54 +01:00
Sylvain Gugger
d4ba6e1a0e
Fix generation config for empty state dict (#21630) 2023-02-14 10:57:28 -05:00
Stas Bekman
2f5507580b
[from_pretrained] extend torch_dtype="auto" to look up config.torch_dtype first, expand docs (#21524)
* [from_pretrained] expand on torch_dtype entry

* fold 4 into 1

* style

* support torch_dtype='config' plus tests

* style

* oops

* fold config into auto, fix bug

* fix check

* better log

* better log

* clean up
2023-02-10 09:09:21 -08:00
Patrick von Platen
b20147a3c8
[Variant] Make sure variant files are not incorrectly deleted (#21562)
* [Variant] Make sure variant files are not incorrectly deleted

* Apply suggestions from code review

* fix
2023-02-10 15:44:51 +01:00
Sylvain Gugger
04b2f13c37
🚨🚨🚨 Enforce single model initialization (#21431)
* Enforce single model initialization

* Add OneFormer example for problem 3

* Do it the Stas way

* Actually rename the uses...

* Rewrite test

* Try to change the test this way

* Fix all init slow/fast tests

* Break connection

* Fix more tests

* Fix test for initialization

* Remove custom test

* Quality

* Fix last failing tests

* The end?
2023-02-09 15:46:26 -05:00
Sylvain Gugger
2020ac4bd6
Fix from_pretrained API with config and state_dict (#21542) 2023-02-09 15:44:02 -05:00
Joao Gante
0d33381fad
Tag tests as slow (#21537)
begone slow tests
2023-02-09 14:46:15 +00:00
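
The tagging itself is the `@slow` decorator from `transformers.testing_utils`; decorated tests are skipped unless `RUN_SLOW=1` is set in the environment:

```python
from transformers.testing_utils import slow

@slow
def test_full_checkpoint_roundtrip():
    # Skipped in the default CI run; executed only when RUN_SLOW=1.
    assert True  # placeholder body
```
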
Sylvain Gugger
6f79d26442
Update quality tooling for formatting (#21480)
* Result of black 23.1

* Update target to Python 3.7

* Switch flake8 to ruff

* Configure isort

* Configure isort

* Apply isort with line limit

* Put the right black version

* adapt black in check copies

* Fix copies
2023-02-06 18:10:56 -05:00
Patrick von Platen
90cddfa824
Add variant to transformers (#21332)
* Bump onnx in /examples/research_projects/decision_transformer

Bumps [onnx](https://github.com/onnx/onnx) from 1.11.0 to 1.13.0.
- [Release notes](https://github.com/onnx/onnx/releases)
- [Changelog](https://github.com/onnx/onnx/blob/main/docs/Changelog.md)
- [Commits](https://github.com/onnx/onnx/compare/v1.11.0...v1.13.0)

---
updated-dependencies:
- dependency-name: onnx
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* adapt

* finish

* Update examples/research_projects/decision_transformer/requirements.txt

* up

* add tests

* Apply suggestions from code review

Co-authored-by: Lucain <lucainp@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* fix test

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Lucain <lucainp@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2023-02-01 09:21:52 +01:00
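
A sketch of the variant feature added here: a `variant` string is woven into the weight filenames on save and looked up again on load (e.g. `pytorch_model.fp16.bin`):

```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased", torch_dtype=torch.float16)

# Writes variant-named weight files such as pytorch_model.fp16.bin alongside,
# not replacing, any non-variant checkpoint in the same folder.
model.save_pretrained("./bert-variants", variant="fp16")

# Loading with the same variant string picks those files back up.
reloaded = AutoModel.from_pretrained("./bert-variants", variant="fp16")
```
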
Yih-Dar
4e41b87e3d
Use model_class.__name__ and compare against XXX_MAPPING_NAMES (#21304)
* update

* update all

* clean up

* make quality

* clean up

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-01-26 11:31:31 +01:00
Joao Gante
1eda4a4102
Generate: save generation config with the models' .save_pretrained() (#21264) 2023-01-23 16:21:44 +00:00
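
In practice this means generation defaults now travel with the checkpoint as a `generation_config.json`, roughly:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")

# Tweak generation defaults on the model's GenerationConfig...
model.generation_config.max_new_tokens = 32
model.generation_config.do_sample = True

# ...and save_pretrained now writes them to generation_config.json so they
# are restored by the next from_pretrained.
model.save_pretrained("./gpt2-sampling")
```
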
Susnato Dhar
b5be744d3c
Fixed issue #21039 (#21062)
Fixed issue #21039 and added test for low_cpu_mem_usage
2023-01-12 10:03:13 +01:00
Yih-Dar
5fa0b17c3d
[Past CI] 🔥 Leave Past CI failures in the past 🔥 (#20861)
* torch.jit._state

* Fix past CI

* Fix for perceiver

* Fix REALM

* Fix for Bloom

* Fix for SwinModel

* Fix for TrajectoryTransformerModel

* Fix for test_wav2vec2_with_lm

* make style

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-12-27 18:37:25 +01:00
NielsRogge
11745b4e45
[Tests] Improve test_attention_outputs (#20701)
* Improve tests

* Improve TF tests

* Apply suggestion

* Fix test

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-12-14 14:41:40 +01:00
NielsRogge
0bae286de9
[AutoBackbone] Improve API (#20407)
* Add hidden states and attentions to backbone outputs

* Update ResNet

* Fix more tests

* Debug test

* Fix test_determinism

* Fix test_save_load

* Remove file

* Disable fx tests

* Test

* Add fx support for backbones

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-11-28 17:20:24 +01:00
NielsRogge
4973d2a04c
Add Audio Spectrogram Transformer (#19981)
* First draft

* Make conversion script work

* Add id2label mapping, run code quality

* Fix copies

* Add first draft of feature extractor

* Update conversion script to use feature extractor

* Make more tests pass

* Add docs

* update input_features to input_values + pad by default to max length

* Fix doc tests

* Add feature extractor tests

* Add proper padding/truncation to feature extractor

* Add support for conversion of all audioset checkpoints

* Improve docs and extend conversion script

* Fix README

* Rename spectogram to spectrogram

* Fix copies

* Add integration test

* Remove dummy conv

* Update to ast

* Update organization

* Fix init

* Rename model to AST

* Add require_torchaudio annotator

* Move import of ASTFeatureExtractor under a is_speech_available

* Fix rebase

* Add pipeline config

* Update name of classifier head

* Rename time_dimension and frequency_dimension for clarity

* Remove print statement

* Fix pipeline test

* Fix pipeline test

* Fix index table

* Fix init

* Fix conversion script

* Rename to ForAudioClassification

* Fix index table

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-11-21 18:58:54 +01:00
Yih-Dar
536e60d2c7
mark test_save_load_fast_init_from_base as is_flaky (#20200)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-14 18:51:33 +01:00
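
`is_flaky` is a retry decorator from `transformers.testing_utils`; a minimal sketch of what marking a test this way looks like:

```python
from transformers.testing_utils import is_flaky

# Reruns the test a few times before reporting failure, papering over the
# occasional nondeterministic mismatch in fast-init weight comparisons.
@is_flaky()
def test_save_load_fast_init_from_base_smoke():
    assert True  # placeholder body
```
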
Nicolas Patry
bac2d29a80
Attempting to test automatically the _keys_to_ignore. (#20042)
* Attempting to test automatically the `_keys_to_ignore`.

* Style.

* First fix pass.

* Moving test on its own.

* Another batch.

* Second round removing BatchNorm

* Fixing layoutlmv{2,3} + support older Python.

* Disable miss missing warning.

* Removing dodgy additions.

* Big pass.

* mbart.

* More corrections.

* Fixup.

* Updating test_correct_missing_keys

* Add escape hatch for when the head has no extra params so doesn't need

the missing keys check.

* Fixing test.

* Greener.

* Green ! (except for weird splinter bug).

* Adding a test about `named_parameters` usage.

* Shorten message.

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* After rebase modifications.

* More explicit condition checking.

* Fixing slow tests issues.

* Remove extra pdb.

* Remove print.

* Attempt to make failure consistent + fixing roc_bert.

* Removing the seed  (all tests passing with it).

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-11-09 16:03:36 +01:00
Michael Benayoun
9080607b2c
Fixed torch.finfo issue with torch.fx (#20040) 2022-11-03 16:14:44 +01:00
Sylvain Gugger
49b77b89ea
Quality (#20002) 2022-11-02 09:53:37 -04:00
Younes Belkada
7629656926
accelerate support for RoBERTa family (#19906) 2022-10-26 22:41:53 +02:00
Yih-Dar
688c3e8e40
Update max_diff in test_save_load_fast_init_to_base (#19849)
* Fix test_save_load_fast_init_to_base

* Fix test_save_load_fast_init_to_base

* update

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-10-26 17:09:47 +02:00
Yih-Dar
3a1aeea3c5
Fix CTRL test_torchscript_xxx CI by updating _create_and_check_torchscript (#19786)
* Run inputs before trace

* Run inputs before trace

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-10-21 16:23:13 +02:00
Sylvain Gugger
3e2dd7f92d
Poc to use safetensors (#19175)
* Poc to use safetensors

* Typo

* Final version

* Add tests

* Save with the right name!

* Update tests/test_modeling_common.py

Co-authored-by: Julien Chaumond <julien@huggingface.co>

* Support for sharded checkpoints

* Test from Hub part 1

* Test from hub part 2

* Fix regular checkpoint sharding

* Bump for fixes

Co-authored-by: Julien Chaumond <julien@huggingface.co>
2022-09-30 10:58:04 -04:00
Younes Belkada
4d0f8c05f5
Add accelerate support for ViLT (#18683) 2022-09-22 13:14:39 +02:00
Sylvain Gugger
ca485e562b
Add tests for legacy load by url and fix bugs (#19078) 2022-09-16 23:20:02 +02:00
Ankur Goyal
2ef7742117
Add DocumentQuestionAnswering pipeline (#18414)
* [WIP] Skeleton of VisualQuestionAnsweringPipeline extended to support LayoutLM-like models

* Fixup

* Use the full encoding

* Basic refactoring to DocumentQuestionAnsweringPipeline

* Cleanup

* Improve args, docs, and implement preprocessing

* Integrate OCR

* Refactor question_answering pipeline

* Use refactored QA code in the document qa pipeline

* Fix tests

* Some small cleanups

* Use a string type annotation for Image.Image

* Update encoding with image features

* Wire through the basic docs

* Handle invalid response

* Handle empty word_boxes properly

* Docstring fix

* Integrate Donut model

* Fixup

* Incorporate comments

* Address comments

* Initial incorporation of tests

* Address Comments

* Change assert to ValueError

* Comments

* Wrap `score` in float to make it JSON serializable

* Incorporate AutoModelForDocumentQuestionAnswering changes

* Fixup

* Rename postprocess function

* Fix auto import

* Applying comments

* Improve docs

* Remove extra assets and add copyright

* Address comments

Co-authored-by: Ankur Goyal <ankur@impira.com>
2022-09-07 13:38:49 -04:00
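
Usage of the new pipeline looks roughly like this (checkpoint and file path illustrative; plain images additionally need an OCR backend such as pytesseract):

```python
from transformers import pipeline

# Task string registered by this PR; the pipeline OCRs the page image (or
# takes precomputed word boxes) and extracts an answer span.
doc_qa = pipeline(
    "document-question-answering",
    model="impira/layoutlm-document-qa",
)

result = doc_qa(image="invoice.png", question="What is the invoice total?")
print(result)  # e.g. [{"score": ..., "answer": ..., "start": ..., "end": ...}]
```
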
Lucain
169b8cde47
Fix mock in test_cached_files_are_used_when_internet_is_down (#18804) 2022-08-29 15:56:08 +02:00
Yih-Dar
8b67f20935
Fix memory leak issue in torch_fx tests (#18547)
Co-authored-by: Lysandre Debut <hi@lysand.re>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-08-29 11:43:20 +02:00
Sylvain Gugger
5cd4032368
Use new huggingface_hub tools for download models (#18438)
* Draft new cached_file

* Initial draft for config and model

* Small fixes

* Fix first batch of tests

* Look in cache when internet is down

* Fix last tests

* Bad black, not fixing all quality errors

* Make diff less

* Implement change for TF and Flax models

* Add tokenizer and feature extractor

* For compatibility with main

* Add utils to move the cache and auto-do it at first use.

* Quality

* Deal with empty commit shas

* Deal with empty etag

* Address review comments
2022-08-05 10:12:40 -04:00
NielsRogge
f9a0008d2d
Add VideoMAE (#17821)
* First draft

* Add VideoMAEForVideoClassification

* Improve conversion script

* Add VideoMAEForPreTraining

* Add VideoMAEFeatureExtractor

* Improve VideoMAEFeatureExtractor

* Improve docs

* Add first draft of model tests

* Improve VideoMAEForPreTraining

* Fix base_model_prefix

* Make model take pixel_values of shape (B, T, C, H, W)

* Add loss computation of VideoMAEForPreTraining

* Improve tests

* Improve model tests

* Make all tests pass

* Add VideoMAE to main README

* Add tests for VideoMAEFeatureExtractor

* Add integration test

* Improve conversion script

* Rename patch embedding class

* Remove VideoMAELayer from init

* Update design of patch embeddings

* Improve comments

* Improve conversion script

* Improve conversion script

* Add conversion of pretrained model

* Add loss verification of pretrained model

* Add loss verification of unnormalized targets

* Add integration test for pretraining model

* Apply suggestions from code review

* Fix bug to make feature extractor resize only shorter edge

* Address more comments

* Improve normalization of videos

* Add doc examples

* Move constants to dedicated script

* Remove scripts

* Transfer checkpoints, fix docs

* Update script

* Update image mean and std

* Fix doc tests

* Set return_tensors to NumPy by default

* Revert the previous change

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-08-04 18:02:55 +02:00
Sylvain Gugger
01db72abd4
Rewrite push_to_hub to use upload_files (#18366)
* Rewrite push_to_hub to use upload_files

* Adapt the doc a bit

* Address review comments and clean doc
2022-08-01 12:07:30 -04:00
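
From the caller's side the rewritten API is unchanged: one call uploads the serialized files directly, with no local git clone (repo id illustrative):

```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# After this PR these calls push files straight to the Hub through
# huggingface_hub's upload utilities instead of cloning a git repo locally.
model.push_to_hub("my-username/my-bert")
tokenizer.push_to_hub("my-username/my-bert")
```
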
Mikkel Denker
70e7d1d656
Fixes torch jit tracing for LayoutLMv2 model (re-open) (#18313)
* Fixes torch jit tracing for LayoutLMv2 model.
Pytorch seems to reuse memory for input_shape which caused a mismatch in shapes later in the forward pass.

* Fixed code quality

* avoid unneeded allocation of vector for shape
2022-07-27 06:38:40 -04:00
Patrick von Platen
3bb6356d4d
[From pretrained] Allow download from subfolder inside model repo (#18184)
* add first generation tutorial

* [from_pretrained] Allow loading models from subfolders

* remove gen file

* add doc strings

* allow download from subfolder

* add tests

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* apply comments

* correct doc string

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-07-19 11:53:53 +02:00
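
The new argument in a nutshell (repo and folder names illustrative):

```python
from transformers import AutoModel

# Loads config and weights from a subdirectory of the repo rather than its
# root, i.e. <repo>/my-subfolder/config.json and the weights next to it.
model = AutoModel.from_pretrained(
    "my-username/multi-model-repo",
    subfolder="my-subfolder",
)
```
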
Yih-Dar
6561fbcc6e
Update TF(Vision)EncoderDecoderModel PT/TF equivalence tests (#18073)
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-07-18 15:29:14 +02:00
Sylvain Gugger
df8e6804c0
Offload fixes (#17810)
* Offload fixes

* Add a test
2022-06-22 12:23:07 -04:00
Yih-Dar
f47afefb21
Use 5e-5 For BigBird PT/Flax equivalence tests (#17780)
* rename to check_pt_flax_outputs

* update check_pt_flax_outputs

* use 5e-5 for BigBird PT/Flax test

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-06-21 17:55:26 +02:00
Lysandre Debut
6a5272b205
Prepare transformers for v0.8.0 huggingface-hub release (#17716)
* Prepare CI for v0.8.0

* pin hfh (revert before merge)

* Revert "pin hfh (revert before merge)"

This reverts commit a0103140e1.

* Test rc3

* Test latest rc

* Unpin to the RC

Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
2022-06-21 11:51:18 -04:00
Stas Bekman
75343de938
[modeling_utils] torch_dtype/auto floating dtype fixes (#17614)
* [modeling_utils] torch_dtype/auto fixes

* add test

* apply suggestions

* add missing fallback

* Renaming things

* Use for else

Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
2022-06-09 10:18:26 -07:00
amyeroberts
dfc76b2542
has_attentions - consistent test skipping logic and tf tests (#17495) 2022-06-09 09:50:03 +02:00
Michael Benayoun
5c8f601007
Fx support for Deberta-v[1-2], Hubert and LXMERT (#17539)
* Support for deberta and deberta-v2

* Support for LXMert

* Support for Hubert

* Fix for pt1.11

* Trigger CI
2022-06-07 18:05:20 +02:00
Sylvain Gugger
8343901263
Fix all offload and MP tests (#17533) 2022-06-03 09:59:13 -04:00
Sylvain Gugger
4390151ba2
Fix MP and CPU offload tests for Funnel and GPT-Neo (#17503) 2022-06-01 09:59:40 -04:00
Sylvain Gugger
567d9c061d
Disk offload fix (#17428)
* Fix offload to disk for big models

* Add test

* Fix test for other models
2022-05-31 09:16:18 -04:00
Michael Benayoun
28d0048218
Fx support for multiple model architectures (#17393)
* Support for Bart and LayoutLM, and partial support for XLNet

* Support for mbart

* A lot of new models supported

* Support for other models

* LayoutLM fix

* Use strings instead of classes
2022-05-31 10:02:55 +02:00
Sylvain Gugger
98f6e1ee87
Fix model parallelism test (#17439) 2022-05-26 09:57:12 -04:00
Sylvain Gugger
31484afbed
Add test for new model parallelism features (#17401) 2022-05-25 10:51:27 -04:00
Sylvain Gugger
56f50590d5
Use Accelerate in from_pretrained for big model inference (#17341)
* Initial work

* More or less finished with first draft

* Update src/transformers/modeling_utils.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update src/transformers/modeling_utils.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Fix randomly initialized weights

* Update src/transformers/modeling_utils.py

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

* Address review comments

* Rename DeepSpeed folder to temporarily fix the test issue?

* Revert to try if Accelerate fix works

* Use latest Accelerate release

* Quality and fixes

* Style

* Quality

* Add doc

* Test + fix

* More blocks

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2022-05-23 14:32:21 -04:00
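
The user-facing entry point added here is `device_map` (which implies the low-memory loading path), roughly:

```python
from transformers import AutoModelForCausalLM

# device_map="auto" lets Accelerate place weights across available GPUs,
# CPU RAM and, if necessary, disk; weights are loaded shard by shard rather
# than materializing a full state dict in memory first.
model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    device_map="auto",
    offload_folder="./offload",  # only used if weights spill to disk
)
```
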
Michael Benayoun
2e7e4280aa
Traced models serialization and torchscripting fix (#17206)
* Fix torch.jit.script and pickling issues

* Fix get_attr issues

* Fix import in function

* Fix GPT-J and T5 tracing for torch=1.11

* Gate graph surgery on torch version

* Modeling minor changes to enable TorchScripting

* Model serialization / deserialization test

* Remove _assert_is_none users
2022-05-23 17:50:40 +02:00
Kyungmin Lee
f0395cf58e
Fix test_model_parallelization (#17249)
* Fix test_model_parallelization

* Modify
2022-05-16 23:30:49 +02:00
Sylvain Gugger
afe5d42d8d
Black preview (#17217)
* Black preview

* Fixup too!

* Fix check copies

* Use the same version as the CI

* Bump black
2022-05-12 16:25:55 -04:00
Michael Benayoun
8c7481f35c
ViT and Swin symbolic tracing with torch.fx (#17182)
* Support tracing for ViT

* Swin support

* Fix copies

* Fix type annotation issue

* Removed unused import
2022-05-12 10:42:27 +02:00
Yih-Dar
e6d23a4b9b
Improve test_pt_tf_model_equivalence on PT side (#16731)
* Update test_pt_tf_model_equivalence on PT side

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-04-19 21:13:27 +02:00
Stas Bekman
5da33f8729
[modeling utils] revamp from_pretrained(..., low_cpu_mem_usage=True) + tests (#16657)
* add low_cpu_mem_usage tests

* wip: revamping

* wip

* install /usr/bin/time

* wip

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* fix assert

* put the wrapper back

* cleanup; switch to bert-base-cased

* Trigger CI

* Trigger CI
2022-04-14 18:10:05 -07:00
Yih-Dar
c04619ecf3
Enable more test_torchscript (#16679)
* update _create_and_check_torchscript

* Enable test_torchscript

* clear_class_registry

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-04-11 18:23:35 +02:00
Yih-Dar
3918d6a9d6
Reduce memory leak in _create_and_check_torchscript (#16691)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-04-11 18:22:28 +02:00
Yih-Dar
2109afae71
Rename the method test_torchscript (#16693)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-04-11 18:21:45 +02:00
NielsRogge
979b039c89
Add DPT (#15991)
* First draft

* More improvements

* Add fusion blocks

* Make conversion script work for dpt_large

* Make conversion script work

* Improve implementation

* Improve conversion script

* Add DPTForSemanticSegmentation

* Make conversion work for semantic segmentation

* Add tests

* Remove print statements

* First draft

* Redesign neck

* Improve tests

* Improve implementation some more

* Make neck output list of tensors

* Improve neck and feature extractor

* Fix integration tests

* Make more tests pass

* Make all tests pass

* Add missing config archive map

* Add in_index attribute to make heads accept list of tensors

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply some more suggestions

* Add copied from statements

* Remove assert

* Apply suggestions from code review

* Apply suggestions from code review

* Remove DPTInterpolate in favor of nn.Upsample

* Add comments

* Apply suggestions from code review

* Apply suggestions from code review

* Add proposed design

* Update design

* Add DPTReassembleLayer

* Add DPTFeatureFusionStage

* Apply more suggestions from code review

* Apply suggestions from code review

* Apply suggestions from code review

* Fix rebase

* Update in_index and out_indices

* Fix conversion script

* Fix code quality

* Add model to toctree and use DepthEstimatorOutput

* Fix rebase

* Fix code examples

* Improve code

* Fix copied from statements

* Apply suggestions from code review

* Remove compute_loss method

* Apply suggestions from code review

* Fix documentation tests file

* Remove test.py file

* Improve doc example

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Niels Rogge <nielsrogge@nielss-mbp.home>
2022-03-28 16:28:10 +02:00
Sylvain Gugger
b473617d63
Checkpoint sharding (#16343)
* Sharded checkpoint support

* Handle distant sharded checkpoints

* Add tests

* TODO is done

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Fix docstring

* Add example and format

* Address review comments

* More review comments

* End of merge

* Revert unintentional change

* VsCode what did you do?

* Style

* Changes

* Address final comments

* Quality

* Moar tests

* Move import beneath is_pt_available

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2022-03-25 11:59:25 -04:00
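
A sketch of the sharding behavior: above a size threshold, `save_pretrained` splits the state dict into numbered shards plus an index file:

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")

# With a small max_shard_size the checkpoint is written as several
# pytorch_model-0000x-of-0000y.bin files plus pytorch_model.bin.index.json;
# from_pretrained reassembles them transparently.
model.save_pretrained("./bert-sharded", max_shard_size="200MB")
reloaded = AutoModel.from_pretrained("./bert-sharded")
```
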
Yih-Dar
f571dc20ac
Update PT Flax equivalence tests in PT test file (#16280)
* update PT/Flax equivalence tests on PT side

* overwrite check_outputs in BigBirdModelTest

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-03-24 14:45:30 +01:00
Sylvain Gugger
c595b6e6a9
Make Transformers use cache files when hf.co is down (#16362)
* Make Transformers use cache files when hf.co is down

* Fix tests

* Was there a random circleCI failure?

* Isolate patches

* Style

* Comment out the failure since it doesn't fail anymore

* Better comment
2022-03-23 15:56:49 -04:00
Sylvain Gugger
4975002df5
Reorganize file utils (#16264)
* Split file_utils in several submodules

* Fixes

* Add back more objects

* More fixes

* Who exactly decided to import that from there?

* Second suggestion to code with code review

* Revert wrong move

* Fix imports

* Adapt all imports

* Adapt all imports everywhere

* Revert this import, will fix in a separate commit
2022-03-23 10:26:33 -04:00
Yih-Dar
75c666b4a8
Aggressive PT/TF equivalence test on PT side (#16250)
* Aggressive PT/TF equivalence test on PT side

* Ugly fix for `TFTapasForQuestionAnswering`

* apply review suggestions

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-03-18 18:51:24 +01:00
NielsRogge
8d83ebdf18
[Tests] Add attentions_option to ModelTesterMixin (#15909)
* Add attentions_option to common tester

* Fix tests, apply suggestion

* Apply suggestion from code review

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-03-10 12:00:30 +01:00
NielsRogge
286fdc6b3c
[vision] Add problem_type support (#15851)
* Add problem_type to missing models

* Fix deit test

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-03-01 18:09:52 +01:00
Eduardo Gonzalez Ponferrada
df5a4094a6
Add Data2Vec (#15507)
* Add data2vec model cloned from roberta

* Add checkpoint conversion script

* Fix copies

* Update docs

* Add checkpoint conversion script

* Remove fairseq data2vec_text script and fix format

* Add comment on where to get data2vec_text.py

* Remove mock implementation cheat.py and fix style

* Fix copies

* Remove TF and Flax classes from init

* Add back copy from fairseq data2vec_text.py and fix style

* Update model name in docs/source/index.mdx to be CamelCase

* Revert model name in table to lower-case to get check_table test to pass

* Update src/transformers/models/data2vec/__init__.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/convert_data2vec_original_pytorch_checkpoint_to_pytorch.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update docs/source/model_doc/data2vec.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/model_doc/data2vec.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/auto/configuration_auto.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/configuration_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update tests/test_modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/configuration_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update documentation

* Copy-paste Data2VecConfig from BertConfig

* Update config checkpoint to point to edugp/data2vec-nlp-base. Fix style and repo-consistency

* Update config special tokens to match RoBERTa

* Split multiple assertions and add individual error messages

* Rename Data2VecModel to Data2VecForTextModel

* Add Data2Vec to _toctree.yml

* Rename Data2VecEmbeddings to Data2VecForTextEmbeddings

* Add initial Data2VecForAudio model (unfinished). Only matching fairseq's implementation up to the feature encoder (before positional encoding).

* finish audio model

* finish audio file

* Update names and fix style, quality and repo consistency

* Remove Data2VecAudioForPretraining. Add tests for Data2VecAudio, mimicking the Wav2Vec2 test suite. Fix bias initialization in positional conv layers. Move back configurations for audio and text to separate files.

* add inputs to logits to data2vec

* correct audio models

* correct config auto

* correct tok auto

* Update utils/tests_fetcher.py

* delete unnecessary files

* delete unnecessary files

* further renaming

* make all tests pass

* finish

* remove useless test file

* Update tests/test_modeling_common.py

* Update utils/check_repo.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec_text.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Move data2vec tests to new structure

* Fix test imports for text tests

* Remove fairseq files

* Change paper link to arxiv

* Modify Data2Vec documentation to reflect that the encoder is not shared across the audio and text models in the current implementation.

* Update text model checkpoint to be facebook/data2vec-text-base

* Add 'Copy from' statements and update paper links and docs

* fix copy from statements

* improve copied from

* correct more copied from statements

* finish copied from stuff

* make style

* add model to README

* add to master

Co-authored-by: Eduardo Gonzalez Ponferrada <eduardo@ferrumhealth.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-03-01 11:09:20 +01:00
Patrick von Platen
ddbb485c41
[TF-PT-Tests] Fix PyTorch - TF tests for different GPU devices (#15846) 2022-02-28 15:46:46 -05:00
Sylvain Gugger
d1fcc90abf
Fix from_pretrained with default base_model_prefix (#15814) 2022-02-24 11:43:51 +01:00
NielsRogge
57882177be
Add SimMIM (#15586)
* Add first draft

* Make model importable

* Make SwinForMaskedImageModeling importable

* Fix imports

* Add missing inits

* Add support for Swin

* Fix bug

* Fix bug

* Fix another bug

* Fix Swin MIM implementation

* Fix default encoder stride

* Fix Swin

* Add print statements for debugging

* Add image_size data argument

* Fix Swin

* Fix image_size

* Add print statements for debugging

* Fix print statement

* Remove print statements

* Improve reshaping of bool_masked_pos

* Add support for DeiT, fix tests

* Improve docstrings

* Apply new black version

* Improve script

* Fix bug

* Improve README

* Apply suggestions from code review

* Remove DS_Store and add to gitignore

* Apply suggestions from code review + fix BEiT Flax

* Revert BEiT changes

* Improve README

* Fix code quality

* Improve README

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MBP.localdomain>
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-02-17 19:44:55 +01:00
Lysandre Debut
943e2aa036
Fix model equivalence tests (#15670)
* Fix model equivalence tests

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-02-15 18:55:22 -05:00
Sylvain Gugger
1f60bc46f3
Make sure custom configs work with Transformers (#15569)
* Make sure custom configs work with Transformers

* Apply code review suggestions
2022-02-09 10:04:44 -05:00
Joao Gante
8406fa6dd5
Add TFSpeech2Text (#15113)
* Add wrapper classes

* convert inner layers to tf

* Add TF Encoder and Decoder layers

* TFSpeech2Text models

* Loadable model

* TF model with same outputs as PT model

* test skeleton

* correct tests and run the fixup

* correct attention expansion

* TFSpeech2Text past_key_values with TF format
2022-02-08 16:27:23 +00:00
Michael Benayoun
0fe17f375a
FX tracing improvement (#14321)
* Change the way tracing happens, enabling dynamic axes out of the box

* Update the tests and modeling xlnet

* Add the non-recording of leaf modules to avoid recording more values than will be seen at tracing time (which would otherwise desynchronize the recorded values and the values that need to be given to the proxies during tracing, causing errors).

* Comments and making tracing work for gpt-j and xlnet

* Refactore things related to num_choices (and batch_size, sequence_length)

* Update fx to work on PyTorch 1.10

* Postpone autowrap_function feature usage for later

* Add copyrights

* Remove unnecessary file

* Fix issue with add_new_model_like

* Apply suggestions
2022-02-07 22:25:33 +01:00
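
The entry point for this machinery is `symbolic_trace` from `transformers.utils.fx`; a minimal sketch (input names assumed for a BERT-style model):

```python
from transformers import AutoModelForSequenceClassification
from transformers.utils.fx import symbolic_trace

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

# Produces a torch.fx GraphModule; after this PR the trace supports dynamic
# batch size and sequence length instead of baking fixed shapes into the graph.
traced = symbolic_trace(model, input_names=["input_ids", "attention_mask"])
print(type(traced))  # torch.fx.graph_module.GraphModule
```
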
Sylvain Gugger
44b21f117b
Save code of registered custom models (#15379)
* Allow dynamic modules to use relative imports

* Work for configs

* Fix last merge conflict

* Save code of registered custom objects

* Map strings to strings

* Fix test

* Add tokenizer

* Rework tests

* Tests

* Ignore fixtures py files for tests

* Tokenizer test + fix collection

* With full path

* Rework integration

* Fix typo

* Remove changes in conftest

* Test for tokenizers

* Add documentation

* Update docs/source/custom_models.mdx

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Add file structure and file content

* Add more doc

* Style

* Update docs/source/custom_models.mdx

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Address review comments

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
2022-02-02 10:44:37 -05:00
Sylvain Gugger
33f36c869f
Add a main_input_name attribute to all models (#14803)
* Add a main_input_name attribute to all models

* Fix tests

* Wtf Vs Code?

* Update src/transformers/models/imagegpt/modeling_imagegpt.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Style

* Fix copies

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-12-20 11:19:08 -05:00
NielsRogge
25156eb296
Rename ImageGPT (#14526)
* Rename

* Add MODEL_FOR_CAUSAL_IMAGE_MODELING_MAPPING
2021-11-29 10:19:11 +01:00
Sylvain Gugger
d83b0e0c07
Add a post init method to all models (#14431)
* Add a post init method to all models

* Fix tests

* Fix last tests

* Fix templates

* Add comment

* Forgot to save
2021-11-18 08:38:09 -05:00
Sylvain Gugger
040fd47162
Fix gradient_checkpointing backward compatibility (#14408)
* Fix gradient_checkpointing backward compatibility

* Remove needless line

* make sure mask prob is big enough and length small enough

* Fix tests

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
2021-11-16 08:58:42 -05:00
Yih-Dar
be4a6c64dc
Add TFViTModel (#13778)
* Start the work for TFViTModel

* Convert to TF code - need to check in the follow up commits

* Clean up model code

* Expose TFViTModel

* make style

* make quality

* Add test

* make style & quality

* Fix some imports

* fix wrong usage - *kwargs => **kwargs

* Fix Conv2D weight loading (PT->TF) issue

* Add tests for images with different sizes + fix model

* Fix some common tests for TFViTModel

* Use inputs instead of input_ids in test_compile_tf_model

* Add a comment about transpose and Conv2D in convert_tf_weight_name_to_pt_weight_name

* Avoid transpose in TFViT call

* Fix Conv2D issue in load_tf2_weights_in_pytorch_model

* Use tf.keras.layers.Conv2D instead of tf.nn.conv2d

* Using simpler heuristic to detect Conv2D layer

* Change convert_tf_weight_name_to_pt_weight_name to return TransposeType

* Check tf_weight_shape is not None before using it

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix missing comma

* fix input dtype

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-11-09 07:54:37 -05:00
Sylvain Gugger
dfb00bf644
Expand dynamic supported objects to configs and tokenizers (#14296)
* Dynamic configs

* Add config test

* Better tests

* Add tokenizer and test

* Add to from_config

* With save
2021-11-08 15:28:25 -05:00
Sylvain Gugger
558f8543ba
Update Transformers to huggingface_hub >= 0.1.0 (#14251)
* Update Transformers to huggingface_hub >= 0.1.0

* Forgot to save...

* Style

* Fix test
2021-11-02 18:58:42 -04:00
NielsRogge
e20faa6f03
Add BeitForSemanticSegmentation (#14096)
* Add first draft

* Make forward pass work

* Improve conversion script

* Add notebook that checks if it works

* Add BeitForSemanticSegmentation to the tests

* More improvements

* Make BeitForSemanticSegmentation consistent with Segformer

* Small bug fix

* Add BeitForSemanticSegmentation to docs

* Make sure model doesn't output hidden states when the user doesn't want to

* Make it possible to convert the large model

* Fix issue

* Fix conversion script for large model

* Add auxiliary_head option to semantic segmentation model

* Apply suggestions from @sgugger's review

* Apply suggestions from code review

* Fix failing test

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-11-01 19:55:45 +01:00
Sylvain Gugger
c28bc80bbb
Generalize problem_type to all sequence classification models (#14180)
* Generalize problem_type to all classification models

* Missing import

* Deberta BC and fix tests

* Fix template

* Missing imports

* Revert change to reformer test

* Fix style
2021-10-29 10:32:56 -04:00
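
`problem_type` selects the loss used by the sequence-classification head; a short sketch:

```python
import torch
from transformers import AutoModelForSequenceClassification

# problem_type picks the loss: "regression" -> MSELoss,
# "single_label_classification" -> CrossEntropyLoss,
# "multi_label_classification" -> BCEWithLogitsLoss.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=3,
    problem_type="multi_label_classification",
)

input_ids = torch.tensor([[101, 2023, 2003, 102]])  # toy tokenized input
labels = torch.tensor([[1.0, 0.0, 1.0]])            # multi-hot float labels
outputs = model(input_ids=input_ids, labels=labels)
print(outputs.loss)
```
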
Patrick von Platen
0c3174c758
Add TF<>PT and Flax<>PT everywhere (#14047)
* up

* up

* up

* up

* up

* up

* up

* add clip

* fix clip PyTorch

* fix clip PyTorch

* up

* up

* up

* up

* up

* up

* up
2021-10-25 23:55:08 +02:00
Li-Huai (Allan) Lin
234cfefbb0
Fix ignore_mismatched_sizes (#14085)
* Fix

* Style

* Name

* Fix tests

* Style

* Remove embed sizes checking

* Disable some tests

* Fix

* Apply suggestion
2021-10-21 12:31:29 -04:00
Patrick von Platen
dca6796876
[Gradient checkpoining] Correct disabling find_unused_parameters in Trainer when gradient checkpointing is enabled (#13961)
* up

* correct test
2021-10-11 15:34:01 +02:00
Michael Benayoun
d4e4efce68
Initial support for symbolic tracing with torch.fx allowing dynamic axes (#13579)
* Symbolic trace dynamic axes support for BERT-like models (albert, bert, distilbert, mobilebert, electra, megatron-bert)
* Sanity checks before tracing that make sure the model to trace is supported
* Adapted to PyTorch 1.9

Co-authored-by: Michael Benayoun <michael@huggingface.co>
2021-10-05 14:19:47 +02:00
Sylvain Gugger
27d4639779
Make gradient_checkpointing a training argument (#13657)
* Make gradient_checkpointing a training argument

* Update src/transformers/modeling_utils.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update src/transformers/configuration_utils.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Fix tests

* Style

* document Gradient Checkpointing as a performance feature

* Small rename

* PoC for not using the config

* Adapt BC to new PoC

* Forgot to save

* Rollout changes to all other models

* Fix typo

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>
2021-09-22 07:51:38 -04:00
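
After this change checkpointing is toggled at training time rather than baked into the config; roughly:

```python
from transformers import AutoModelForSequenceClassification, TrainingArguments

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

# Post-PR API: enable on the model directly...
model.gradient_checkpointing_enable()

# ...or let the Trainer do it through the new training argument, instead of
# setting gradient_checkpointing=True on the model config.
args = TrainingArguments(output_dir="./out", gradient_checkpointing=True)
```
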
Sylvain Gugger
002a078aff
Dynamically load model code from the Hub (#13467)
* Dynamic model

* Use defensive flag

* Style

* Doc and arg rename

* Arg rename

* Add tests

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comments

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-09-20 13:59:21 -04:00
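
Loading such a model requires an explicit opt-in flag, roughly (repo id illustrative):

```python
from transformers import AutoModel

# trust_remote_code is the defensive flag mentioned in the bullets: the
# modeling code is downloaded from the repo and executed locally, so the
# caller must opt in explicitly.
model = AutoModel.from_pretrained(
    "some-user/custom-model",
    trust_remote_code=True,
)
```
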
Patrick von Platen
95f933ea85
[Pretrained Model] Add resize_position_embeddings (#13559)
* finish

* delete bogus file

* correct some stuff

* finish

* finish
2021-09-15 19:03:56 +02:00
Sylvain Gugger
74b3344fbc Clean up test file 2021-08-31 07:06:49 -04:00
Sylvain Gugger
8b2de0e483
Tests fetcher tests (#13340)
* Incorporate tests dependencies in tests_fetcher

* Harder modif

* Debug

* Loop through all files

* Last modules

* Remove debug statement
2021-08-31 03:57:01 -04:00
Stas Bekman
5c6eca71a9
fix AutoModel.from_pretrained(..., torch_dtype=...) (#13209)
* fix AutoModel.from_pretrained(..., torch_dtype=...)

* fix to_diff_dict

* add better test

* torch is not always available when a model has self.torch_dtype
2021-08-24 11:43:41 +02:00
Lysandre Debut
3290315a2a
Fix AutoModel tests (#12733) 2021-07-15 09:06:12 -04:00
Sylvain Gugger
90178b0cef
Add option to load a pretrained model with mismatched shapes (#12664)
* Add option to load a pretrained model with mismatched shapes

* Fail at loading when mismatched shapes in Flax

* Fix tests

* Update src/transformers/modeling_flax_utils.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Address review comments

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-07-13 10:15:15 -04:00
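
The option in practice: reinitialize any head whose shape disagrees with the checkpoint instead of raising, e.g. when changing `num_labels` (checkpoint illustrative):

```python
from transformers import AutoModelForSequenceClassification

# The checkpoint's classifier was trained with a different number of labels;
# with ignore_mismatched_sizes=True the mismatched head weights are dropped
# and freshly initialized (with a warning) instead of raising an error.
model = AutoModelForSequenceClassification.from_pretrained(
    "textattack/bert-base-uncased-SST-2",
    num_labels=5,
    ignore_mismatched_sizes=True,
)
```
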
Stas Bekman
2d1d92181a
[roberta] fix lm_head.decoder.weight ignore_key handling (#12446)
* fix lm_head.decoder.weight ignore_key handling

* fix the mutable class variable

* Update src/transformers/models/roberta/modeling_roberta.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* replicate the comment

* make deterministic

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-07-01 10:31:19 -07:00
Stas Bekman
7682e97702
[models] respect dtype of the model when instantiating it (#12316)
* [models] respect dtype of the model when instantiating it

* cleanup

* cleanup

* rework to handle non-float dtype

* fix

* switch to fp32 tiny model

* improve

* use dtype.is_floating_point

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix the doc

* recode to use explicit torch_dtype_auto_detect, torch_dtype args

* docs and tweaks

* docs and tweaks

* docs and tweaks

* merge 2 args, add docs

* fix

* fix

* better doc

* better doc

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-28 20:11:21 -07:00
Lysandre Debut
8ef62ec9e1
Fix torchscript tests (#12336)
* Fix torchscript tests

* Better test

* Remove bogus print
2021-06-24 09:52:28 -04:00
Michael Benayoun
986ac03e37
changed modeling_fx_utils.py to utils/fx.py for clarity (#12326)
Co-authored-by: Michael Benayoun <michael@huggingface.co>
2021-06-23 18:16:24 +02:00
Sylvain Gugger
53c60babe4
Clean push to hub API (#12187)
* Clean push to hub API

* Create working dir if it does not exist

* Different tweak

* New API + all models + test Flax

* Adds the Trainer clean up

* Update src/transformers/file_utils.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comments

* (nit) output types

* No need to set clone_from when folder exists

* Update src/transformers/trainer.py

Co-authored-by: Julien Chaumond <julien@huggingface.co>

* Add generated_from_trainer tag

* Update to new version

* Fixes

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-06-23 10:11:19 -04:00
Stas Bekman
372ab9cd6d
[style] consistent nn. and nn.functional: part 3 tests (#12155)
* consistent nn. and nn.functional: p3 templates

* restore
2021-06-14 12:18:22 -07:00
NielsRogge
d3eacbb829
Add DETR (#11653)
* Squash all commits of modeling_detr_v7 branch into one

* Improve docs

* Fix tests

* Style

* Improve docs some more and fix most tests

* Fix slow tests of ViT, DeiT and DETR

* Improve replacement of batch norm

* Restructure timm backbone forward

* Make DetrForSegmentation support any timm backbone

* Fix name of output

* Address most comments by @LysandreJik

* Give better names for variables

* Conditional imports + timm in setup.py

* Address additional comments by @sgugger

* Make style, add require_timm and require_vision to tests

* Remove train_backbone attribute of DetrConfig, add methods to freeze/unfreeze backbone

* Add png files to fixtures

* Fix type hint

* Add timm to workflows

* Add `BatchNorm2d` to the weight initialization

* Fix retain_grad test

* Replace model checkpoints by Facebook namespace

* Fix name of checkpoint in test

* Add user-friendly message when scipy is not available

* Address most comments by @patrickvonplaten

* Remove return_intermediate_layers attribute of DetrConfig and simplify Joiner

* Better initialization

* Scipy is necessary to get sklearn metrics

* Rename TimmBackbone to DetrTimmConvEncoder and rename DetrJoiner to DetrConvModel

* Make style

* Improve docs and add 2 community notebooks

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-06-09 11:51:13 -04:00
Lysandre Debut
db0b2477cc
Add some tests to the slow suite #11860 2021-05-25 04:06:06 -04:00
Michael Benayoun
f4a0d6ff86
A cleaner and more scalable implementation of symbolic tracing (#11763)
Cleaner and more scalable implementation of symbolic tracing with torch.fx, and provides support for new architectures:
- ALBERT
- DistilBERT
- MobileBERT
- MegatronBERT
- GPT2
- GPT Neo

Co-authored-by: Michael Benayoun <michael@huggingface.co>
2021-05-20 18:02:29 +02:00
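A sketch of tracing one of the supported architectures with the `symbolic_trace` helper (renamed to `transformers.utils.fx` in the later commit above); model and input names are illustrative:

```python
from transformers import BertForSequenceClassification
from transformers.utils.fx import symbolic_trace

model = BertForSequenceClassification.from_pretrained("bert-base-uncased")

# Capture the forward pass as a torch.fx GraphModule for the given inputs.
traced = symbolic_trace(model, input_names=["input_ids", "attention_mask"])
print(traced.graph)
```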
Sylvain Gugger
469384a777
Fix regression in regression (#11785)
* Fix regression in regression

* Add test
2021-05-20 09:55:13 -04:00
Michael Benayoun
86d5fb0b36
Experimental symbolic tracing feature with torch.fx for BERT, ELECTRA and T5 (#11475)
Symbolic tracing feature for BERT, ELECTRA and T5

Co-authored-by: Michael Benayoun <michael@huggingface.co>
Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-05-14 20:57:30 +02:00
Volodymyr Byno
218d552f30
Fix loading the best model on the last stage of training (#11718) 2021-05-13 16:11:12 -04:00
Sylvain Gugger
f13f1f8fb8
Test checkpointing (#11682)
* Add test and see where CI is unhappy

* Load with strict=False
2021-05-11 12:02:48 -04:00
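The `strict=False` pattern mentioned above, in a short self-contained sketch (the checkpoint path is hypothetical):

```python
import torch
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")
state_dict = torch.load("checkpoint.bin", map_location="cpu")  # hypothetical path

# strict=False returns missing/unexpected keys instead of raising on them.
result = model.load_state_dict(state_dict, strict=False)
print(result.missing_keys, result.unexpected_keys)
```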
Vasudev Gupta
dc3f6758cf
Add BigBirdPegasus (#10991)
* init bigbird pegasus

* add debugging nb ; update config

* init conversion

* update conversion script

* complete conversion script

* init forward()

* complete forward()

* add tokenizer

* add some slow tests

* commit current

* fix copies

* add docs

* add conversion script for bigbird-roberta-summarization

* remove TODO

* small fixups

* correct tokenizer

* add bigbird core for now

* fix config

* fix more

* revert pegasus-tokenizer back

* make style

* everything working for pubmed; yay

* complete tests finally

* remove bigbird pegasus tok

* correct tokenizer

* correct tests

* add tokenizer files

* finish make style

* fix test

* update

* make style

* fix tok utils base file

* make fix-copies

* clean a bit

* small update

* fix some suggestions

* add to readme

* fix a bit, clean tests

* fix more tests

* Update src/transformers/__init__.py

* Update src/transformers/__init__.py

* make fix-copies

* complete attn switching, auto-padding left

* make style

* fix auto-padding test

* make style

* fix batched attention tests

* put tolerance at 1e-1 for stand-alone decoder test

* fix docs

* fix tests

* correct slow tokenizer conversion

* Apply suggestions from code review

Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* complete remaining suggestions

* fix test

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-05-07 09:27:43 +02:00
Patrick von Platen
3e3e41ae20
Pytorch - Lazy initialization of models (#11471)
* lazy_init_weights

* remove ipdb

* save int

* add necessary code

* remove unnecessary utils

* Update src/transformers/models/t5/modeling_t5.py

* clean

* add tests

* correct

* finish tests

* finish tests

* fix some more tests

* fix xlnet & transfo-xl

* fix more tests

* make sure tests are independent

* fix tests more

* finish tests

* final touches

* Update src/transformers/modeling_utils.py

* Apply suggestions from code review

* Update src/transformers/modeling_utils.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update src/transformers/modeling_utils.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* clean tests

* give arg positive name

* add more mock weights to xlnet

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-05-05 17:22:20 +02:00
abhishek thakur
c40c7e213b
Add multi-class, multi-label and regression to transformers (#11012)
* add to  bert

* review comments

* Update src/transformers/configuration_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/configuration_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* self.config.problem_type

* fix style

* fix

* fin

* fix

* update doc

* fix

* test

* Test more problem types

* Update src/transformers/configuration_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix

* remove

* fix

* quality

* make fix-copies

* remove test

Co-authored-by: abhishek thakur <abhishekkrthakur@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-05-04 02:23:40 -04:00
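A sketch of the `config.problem_type` switch this commit adds; the three accepted values select MSE, cross-entropy, or BCE-with-logits loss respectively:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=3,
    problem_type="multi_label_classification",
    # alternatives: "regression", "single_label_classification"
)

inputs = tokenizer("a multi-label example", return_tensors="pt")
labels = torch.tensor([[1.0, 0.0, 1.0]])  # float multi-hot labels for BCE loss
loss = model(**inputs, labels=labels).loss
```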
Patrick von Platen
f748bd4242
[Flax] Add docstrings & model outputs (#11498)
* add attentions & hidden states

* add model outputs + docs

* finish docs

* finish tests

* finish impl

* del @

* finish

* finish

* correct test

* apply sylvains suggestions

* Update src/transformers/models/bert/modeling_flax_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* simplify more

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-29 12:04:51 +02:00
Patrick von Platen
32dbb2d954
make style (#11442) 2021-04-26 13:50:34 +02:00
Daniel Stancl
e3ff165aa5
Fix cross-attention head mask for Torch encoder-decoder models (#10605)
* Fix cross-attention head mask for Torch BART models

* Fix head masking for cross-attention module for the following
models: BART, Blenderbot, Blenderbot_small, M2M_100, Marian, MBart,
Pegasus

* Enable test_headmasking for M2M_100 model

* Fix cross_head_mask for FSMT, LED and T5

* This commit fixes `head_mask` for cross-attention modules
in the following models: FSMT, LED, T5

* It also contains some smaller doc changes so that
it is perfectly clear that the shape of `cross_head_mask`
is the same as that of `decoder_head_mask`

* Update template

* Fix template for BartForCausalLM

* Fix cross_head_mask for Speech2Text models

* Fix cross_head_mask in templates

* Fix args order in BartForCausalLM template

* Fix doc in BART templates

* Make more explicit naming

* `cross_head_mask` -> `cross_attn_head_mask`

* `cross_layer_head_mask` -> `cross_attn_layer_head_mask`

* Fix doc

* make style quality

* Fix speech2text docstring
2021-04-23 18:58:06 +02:00
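A sketch of the three head-mask arguments after the renaming above, on an encoder-decoder model; each mask has shape `(num_layers, num_heads)`, with 1 keeping a head and 0 masking it:

```python
import torch
from transformers import BartModel, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartModel.from_pretrained("facebook/bart-base")
cfg = model.config

enc_mask = torch.ones(cfg.encoder_layers, cfg.encoder_attention_heads)
dec_mask = torch.ones(cfg.decoder_layers, cfg.decoder_attention_heads)
cross_mask = torch.ones(cfg.decoder_layers, cfg.decoder_attention_heads)
cross_mask[0, 0] = 0  # disable head 0 of the first cross-attention layer

inputs = tokenizer("masking attention heads", return_tensors="pt")
outputs = model(
    **inputs,
    head_mask=enc_mask,
    decoder_head_mask=dec_mask,
    cross_attn_head_mask=cross_mask,
)
```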
Sylvain Gugger
bf2e0cf70b
Trainer push to hub (#11328)
* Initial support for upload to hub

* push -> upload

* Fixes + examples

* Fix torchhub test

* Torchhub test I hate you

* push_model_to_hub -> push_to_hub

* Apply mixin to other pretrained models

* Remove ABC inheritance

* Add tests

* Typo

* Run tests

* Install git-lfs

* Change approach

* Add push_to_hub to all

* Staging test suite

* Typo

* Maybe like this?

* More deps

* Cache

* Adapt name

* Quality

* MOAR tests

* Put it in testing_utils

* Docs + torchhub last hope

* Styling

* Wrong method

* Typos

* Update src/transformers/file_utils.py

Co-authored-by: Julien Chaumond <julien@huggingface.co>

* Address review comments

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-04-23 09:17:37 -04:00
Sylvain Gugger
81009b7a5c
Replace error by warning when loading an architecture in another (#11207)
* Replace error by warning when loading an architecture in another

* Style

* Style again

* Add a test

* Adapt old test
2021-04-13 10:33:52 -04:00
Sylvain Gugger
ba8b1f4754
Add support for multiple models for one config in auto classes (#11150)
* Add support for multiple models for one config in auto classes

* Use get_values everywhere

* Prettier doc
2021-04-08 18:41:36 -04:00
NielsRogge
30677dc743
Add Vision Transformer and ViTFeatureExtractor (#10950)
* Squash all commits into one

* Update ViTFeatureExtractor to use image_utils instead of torchvision

* Remove torchvision and add Pillow

* Small docs improvement

* Address most comments by @sgugger

* Fix tests

* Clean up conversion script

* Pooler first draft

* Fix quality

* Improve conversion script

* Make style and quality

* Make fix-copies

* Minor docs improvements

* Should use fix-copies instead of manual handling

* Revert "Should use fix-copies instead of manual handling"

This reverts commit fd4e591bce.

* Place ViT in alphabetical order

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-01 11:16:05 -04:00
Sylvain Gugger
acc3bd9d2a
Enforce string-formatting with f-strings (#10980)
* First third

* Styling and fix mistake

* Quality

* All the rest

* Treat %s and %d

* typo

* Missing )

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-03-31 10:00:27 -04:00
Vimarsh Chaturvedi
094afa515d
from_pretrained: check that the pretrained model is for the right model architecture (#10586)
* Added check to ensure model name passed to from_pretrained and model are the same

* Added test to check from_pretrained throws assert error when passed an incompatible model name

* Modified assert in from_pretrained with f-strings. Modified test to ensure desired assert message is being generated

* Added check to ensure config and model has model_type

* Fix FlauBERT heads

Co-authored-by: vimarsh chaturvedi <vimarsh chaturvedi>
Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-03-18 12:51:42 -04:00
Patrick von Platen
0234de8418
Add Fine-Tuning for Wav2Vec2 (#10145)
* add encode labels function to tokenizer

* start adding finetuning

* init dropout

* upload

* correct convert script

* apply changes

* fix second typo

* make first dummy training run

* adapt convert script

* push confg for comparison

* remove conf

* finish training

* adapt data collator

* add research folder

* update according to fairseq feedback

* some minor corrections

* refactor masking indices a bit

* some minor changes

* clean tokenizer

* finish clean-up

* remove previous logic

* update run script

* correct training

* finish changes

* finish model

* correct bug

* fix training a bit more

* add some tests

* finish gradient checkpointing

* finish example

* correct gradient checkpointing

* improve tokenization method

* revert changes in tokenizer

* revert general change

* adapt fine-tuning

* update

* save intermediate test

* Update README.md

* finish finetuning

* delete conversion script

* Update src/transformers/models/wav2vec2/configuration_wav2vec2.py

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* finish wav2vec2 script

* finish wav2vec2 fine-tuning

* finalize test

* correct test

* adapt tests

* finish

* remove test file

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-03-01 12:13:17 +03:00
Daniel Stancl
71bdc076dd
Add head_mask and decoder_head_mask to PyTorch LED (#9856)
* Add {decoder_,}head_mask to LED

* Fix create_custom_forward signature in encoder

* Add head_mask to longformer

* Add head_mask to longformer to fix dependencies
of LED on Longformer.

* Not working yet

* Add missing input in longformer_modeling.py

* make fix-copies
2021-02-02 11:06:52 -08:00
Patrick von Platen
12c1b5b8f4
fix test (#9669) 2021-01-19 09:06:24 +01:00
Daniel Stancl
357fb1c5d8
Add head_mask/decoder_head_mask for BART (#9569)
* Add head_mask/decoder_head_mask for BART

This branch implement head_mask and decoder_head_mask
for BART-based models. Full list below:
- BART
- MBart
- Blenderbot
- BlenderbotSmall
- Marian
- Pegasus

Everything is accompanied with updated testing.

* Fix test_headmasking for BART models

* Fix test_headmasking for BART-like models
which have only 2 layers in each module.
The condition
```
self.assertNotEqual(attentions[1][..., 0, :, :].flatten().sum().item(), 0.0)
```
is, therefore, invalid for encoder-decoder models considering
the `head_mask`
```
head_mask = torch.ones(
    self.model_tester.num_hidden_layers,
    self.model_tester.num_attention_heads,
    device=torch_device,
)
head_mask[0, 0] = 0
head_mask[-1, :-1] = 0
```
specified in the `test_headmasking` test/function.

* Adjust test_modeling_common.py to reflect T5 input args

* Update tests/test_modeling_common.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* make style

* make fix-copies

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-01-18 13:35:22 +01:00
Stas Bekman
143289dcf7
[test_model_parallelization] multiple fixes (#9354) 2021-01-04 12:09:12 -08:00
Patrick von Platen
61443cd7d9
[GPT2] Correct gradient checkpointing (#9308)
* correct gpt2

* fix gpt2

* fix use_cache ordering

* correct past tolerance

* fix for all cases

* style
2020-12-25 23:28:12 +01:00
TobiasNorlund
08abdabda1
Fixed beam search generation for GPT2 and T5 (#9219) 2020-12-21 08:05:23 -05:00
Patrick von Platen
06971ac4f9
[Bart] Refactor - fix issues, consistency with the library, naming (#8900)
* remove make on the fly linear embedding

* start refactor

* big first refactor

* save intermediate

* save intermediate

* correct mask issue

* save tests

* refactor padding masks

* make all tests pass

* further refactor

* make pegasus test pass

* fix bool if

* fix leftover tests

* continue

* bart renaming

* delete torchscript test hack

* fix imports in tests

* correct shift

* fix docs and repo cons

* re-add fix for FSMT

* typo in test

* fix typo

* fix another typo

* continue

* hot fix 2 for tf

* small fixes

* refactor types linting

* continue

* finish refactor

* fix import in tests

* better bart names

* further refactor and add test

* delete hack

* apply sylvains and lysandres comments

* small perf improv

* further perf improv

* improv perf

* fix typo

* make style

* small perf improv
2020-12-09 20:55:24 +01:00
Lysandre Debut
aa60b230ec
Patch model parallel test (#8920)
* Patch model parallel test

* Remove line

* Remove `ci_*` from scheduled branches
2020-12-03 17:15:47 -05:00
Patrick von Platen
443f67e887
[PyTorch] Refactor Resize Token Embeddings (#8880)
* fix resize tokens

* correct mobile_bert

* move embedding fix into modeling_utils.py

* refactor

* fix lm head resize

* refactor

* break lines to make sylvain happy

* add news tests

* fix typo

* improve test

* skip bart-like for now

* check if base_model = get(...) is necessary

* clean files

* improve test

* fix tests

* revert style templates

* Update templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/modeling_{{cookiecutter.lowercase_modelname}}.py
2020-12-02 19:19:50 +01:00
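The resize path this refactor touches, sketched end to end (the added tokens are illustrative):

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

tokenizer.add_tokens(["<new_tok_1>", "<new_tok_2>"])  # hypothetical tokens
# Resizes the input embeddings and, for tied heads, the lm_head as well.
model.resize_token_embeddings(len(tokenizer))
assert model.get_input_embeddings().weight.shape[0] == len(tokenizer)
```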
Lysandre Debut
18c32eeb21
Model parallel tests should return, not pass in non model parallel settings. (#8825) 2020-11-27 16:41:29 -05:00
Max Del
0a921b6459
BART & FSMT: fix decoder not returning hidden states from the last layer (#8597)
* Fix decoder not returning hidden states from the last layer

* Resolve conflict

* Change the way to gather hidden states

* Add decoder hidden states test

* Make pytest and black happy

* Remove redundant line

* remove new line

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2020-11-27 18:35:34 +01:00
Joe Davison
369f1d77b4
Return correct Bart hidden state tensors (#8747)
* bart output hidden states upstream

* same w/ decoder

* add tests

* fix prophetnet

* fix gpt2 and ctrl

* fix fstm and skip test for reformer and longformer

* fix all models

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-11-25 22:06:04 +01:00
Stas Bekman
e84786aaa6
consistent ignore keys + make private (#8737)
* consistent ignore keys + make private

* style

* - authorized_missing_keys    => _keys_to_ignore_on_load_missing
  - authorized_unexpected_keys => _keys_to_ignore_on_load_unexpected

* move public doc of private attributes to private comment
2020-11-23 12:33:13 -08:00
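A quick way to see the renamed attributes in practice; the regex patterns they list are skipped when reporting missing/unexpected keys at load time:

```python
from transformers import RobertaForMaskedLM

# Formerly the public authorized_missing_keys / authorized_unexpected_keys.
print(RobertaForMaskedLM._keys_to_ignore_on_load_missing)
print(RobertaForMaskedLM._keys_to_ignore_on_load_unexpected)
```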
alexorona
1cd9be2aeb
gpt2 and t5 parallel modeling (#8696)
* gpt2 and t5 parallel modeling

* model_parallel utils update

* adding missing model_parallel_utils

Adds missing model_parallel_utils and reverses the changes to code in modeling_gpt2 and modeling_t5

* training_args reformat

Reformatted training_args

* style formatting

Style formatting doc string length on training_args and model_parallel_utils

* style changes

make style && make quality for training_args and model_parallel_utils.

* adding tests

* minor change in trainer

reverts loss calculation

* Update training_args.py

* Update training_args.py

added back docstring language for adam_beta1 and adam_beta2

* Update trainer.py

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix style & rebase

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
2020-11-23 14:41:23 -05:00
Sylvain Gugger
1073a2bde5
Switch return_dict to True by default. (#8530)
* Use the CI to identify failing tests

* Remove from all examples and tests

* More default switch

* Fixes

* More test fixes

* More fixes

* Last fixes hopefully

* Run on the real suite

* Fix slow tests
2020-11-16 11:43:00 -05:00
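What the new default means in practice, sketched: outputs become self-documenting `ModelOutput` objects instead of bare tuples.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")

inputs = tokenizer("return_dict makes outputs named", return_tensors="pt")
outputs = model(**inputs)          # a ModelOutput: fields accessible by name
print(outputs.logits.shape)

legacy = model(**inputs, return_dict=False)  # opt back into plain tuples
print(type(legacy))
```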
Patrick von Platen
42e2d02e44
[T5] Bug correction & Refactor (#8518)
* fix bug

* T5 refactor

* refactor tf

* apply sylvains suggestions
2020-11-13 16:57:31 +01:00
Stas Bekman
02bdfc0251
using multi_gpu consistently (#8446)
* s|multiple_gpu|multi_gpu|g; s|multigpu|multi_gpu|g'

* doc
2020-11-10 13:23:58 -05:00
Patrick von Platen
9c83b96e62
[Tests] Add Common Test for Training + Fix a couple of bugs (#8415)
* add training tests

* correct longformer

* fix docs

* fix some tests

* fix some more train tests

* remove ipdb

* fix multiple edge case model training

* fix funnel and prophetnet

* clean gpt models

* undo renaming of albert
2020-11-09 18:24:41 +01:00
Yossi Synett
bc0d26d1de
[All Seq2Seq model + CLM models that can be used with EncoderDecoder] Add cross-attention weights to outputs (#8071)
* Output cross-attention with decoder attention output

* Update src/transformers/modeling_bert.py

* add cross-attention for t5 and bart as well

* fix tests

* correct typo in docs

* add sylvains and sams comments

* correct typo

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-11-06 19:34:48 +01:00
Guillaume Filion
27b402cab0
Output global_attentions in Longformer models (#7562)
* Output global_attentions in Longformer models

* make style

* small refactoring

* fix tests

* make fix-copies

* add for tf as well

* remove comments in test

* make fix-copies

* make style

* add docs

* make docstring pretty

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
2020-11-05 21:10:43 +01:00
Patrick von Platen
a1bbcf3f6c
Refactoring the generate() function (#6949)
* first draft

* show design proposition for new generate method

* up

* make better readable

* make first version

* gpt2 tests pass

* make beam search for gpt2 work

* add first encoder-decoder code

* delete typo

* make t5 work

* save indermediate

* make bart work with beam search

* finish beam search bart / t5

* add default kwargs

* make more tests pass

* fix no bad words sampler

* some fixes and tests for all distribution processors

* fix test

* fix rag slow tests

* merge to master

* add nograd to generate

* make all slow tests pass

* speed up generate

* fix edge case bug

* small fix

* correct typo

* add type hints and docstrings

* fix typos in tests

* add beam search tests

* add tests for beam scorer

* fix test rag

* finish beam search tests

* move generation tests into a separate file

* fix generation tests

* more tests

* add aggressive generation tests

* fix tests

* add gpt2 sample test

* add more docstring

* add more docs

* finish doc strings

* apply some more of sylvains and sams comments

* fix some typos

* make fix copies

* apply lysandres and sylvains comments

* final corrections on examples

* small fix for reformer
2020-11-03 16:04:22 +01:00
Lysandre Debut
10f8c63620
Ci test tf super slow (#8007)
* Test TF GPU CI

* Change cache

* Fix missing torch requirement

* Fix some model tests


Style

* LXMERT

* MobileBERT

* Longformer skip test

* XLNet

* The rest of the tests

* RAG goes OOM in multi gpu setup

* YAML test files

* Last fixes

* Skip doctests

* Fill mask tests

* Yaml files

* Last test fix

* Style

* Update cache

* Change ONNX tests to slow + use tiny model
2020-10-30 10:25:48 -04:00
Santiago Castro
969859d5f6
Fix doc errors and typos across the board (#8139)
* Fix doc errors and typos across the board

* Fix a typo

* Fix the CI

* Fix more typos

* Fix CI

* More fixes

* Fix CI

* More fixes

* More fixes
2020-10-29 10:33:33 -04:00
Stas Bekman
57516c0cc8
[multiple models] skip saving/loading deterministic state_dict keys (#7878)
* make the save_load special key tests common

* handle mbart

* cleaner solution

* fix

* move test_save_load_missing_keys back into fsmt for now

* restore

* style

* add marian

* add pegasus

* blenderbot

* revert - no static embed
2020-10-21 08:06:07 -04:00
Stas Bekman
3e31e7f956
[testing] rename skip targets + docs (#7863)
* rename skip targets + docs

* fix quotes

* style

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* small improvements

* fix

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-20 04:39:13 -04:00
Weizhen
2422cda01b
ProphetNet (#7157)
* add new model prophetnet

prophetnet modified

modify codes as suggested v1

add prophetnet test files

* still bugs, because of changed output formats of encoder and decoder

* move prophetnet into the latest version

* clean integration tests

* clean tokenizers

* add xlm config to init

* correct typo in init

* further refactoring

* continue refactor

* save parallel

* add decoder_attention_mask

* fix use_cache vs. past_key_values

* fix common tests

* change decoder output logits

* fix xlm tests

* make common tests pass

* change model architecture

* add tokenizer tests

* finalize model structure

* no weight mapping

* correct n-gram stream attention mask as discussed with qweizhen

* remove unused import

* fix index.rst

* fix tests

* delete unnecessary code

* add fast integration test

* rename weights

* final weight remapping

* save intermediate

* Descriptions for Prophetnet Config File

* finish all models

* finish new model outputs

* delete unnecessary files

* refactor encoder layer

* add dummy docs

* code quality

* fix tests

* add model pages to doctree

* further refactor

* more refactor, more tests

* finish code refactor and tests

* remove unnecessary files

* further clean up

* add docstring template

* finish tokenizer doc

* finish prophetnet

* fix copies

* fix typos

* fix tf tests

* fix fp16

* fix tf test 2nd try

* fix code quality

* add test for each model

* merge new tests to branch

* Update model_cards/microsoft/prophetnet-large-uncased-cnndm/README.md

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Update model_cards/microsoft/prophetnet-large-uncased-cnndm/README.md

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Update src/transformers/modeling_prophetnet.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Update utils/check_repo.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* apply sams and sylvains comments

* make style

* remove unnecessary code

* Update README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/configuration_prophetnet.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* implement lysandres comments

* correct docs

* fix isort

* fix tokenizers

* fix copies

Co-authored-by: weizhen <weizhen@mail.ustc.edu.cn>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-10-19 17:36:09 +02:00
Sam Shleifer
960faaaf28
Blenderbot (#7418)
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-07 19:09:23 -04:00
Patrick von Platen
62f5ae68ec
[Seq2Seq] Fix a couple of bugs and clean examples (#7474)
* clean T5

* fix t5 tests

* fix index typo

* fix tf common test

* fix examples

* change positional ordering for Bart and FSMT

* add signature test

* clean docs and add tests

* add docs to encoder decoder

* clean docs

* correct two doc strings

* remove sig test for TF Electra & Funnel

* fix tf t5 slow tests

* fix input_ids to inputs in tf

* Update src/transformers/modeling_bart.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_bart.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* implement lysandre results

* make style

* fix encoder decoder typo

* fix tf slow tests

* fix slow tests

* renaming

* remove unused input

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-01 17:38:50 +02:00
Sylvain Gugger
d155b38d6e
Funnel transformer (#6908)
* Initial model

* Fix upsampling

* Add special cls token id and test

* Formatting

* Test and first FunnelTokenizerFast

* Common tests

* Fix the check_repo script and document Funnel

* Doc fixes

* Add all models

* Write doc

* Fix test

* Fix copyright

* Forgot some layers can be repeated

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/modeling_funnel.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comments

* Update src/transformers/modeling_funnel.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Address review comments

* Update src/transformers/modeling_funnel.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Slow integration test

* Make small integration test

* Formatting

* Add checkpoint and separate classification head

* Formatting

* Expand list, fix link and add in pretrained models

* Styling

* Add the model in all summaries

* Typo fixes

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-09-08 08:08:08 -04:00
Stas Bekman
e71f32c0ef
[testing] fix ambiguous test (#6898)
Since `generate()` does:
```
        num_beams = num_beams if num_beams is not None else self.config.num_beams
```
This test fails if `model.config.num_beams > 1` (which is the case in the model I'm porting).

This fix makes the test setup unambiguous by passing an explicit `num_beams=1` to `generate()`.

Thanks.
2020-09-02 16:18:17 +02:00
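The fix in a nutshell, as a self-contained sketch: pin `num_beams` at the call site so the test no longer depends on `config.num_beams`.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

input_ids = tokenizer("Hello", return_tensors="pt").input_ids
# Explicit num_beams=1 keeps greedy behavior even if config.num_beams > 1.
output = model.generate(input_ids, num_beams=1, max_length=20)
```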
Lysandre
a75c64d80c Black 20 release 2020-08-26 17:20:22 +02:00
Sylvain Gugger
a573777901
Update repo to isort v5 (#6686)
* Run new isort

* More changes

* Update CI, CONTRIBUTING and benchmarks
2020-08-24 11:03:01 -04:00
Patrick von Platen
505f2d749e
[Tests] fix attention masks in Tests (#6621)
* fix distilbert

* fix typo
2020-08-20 13:23:47 -04:00
Patrick von Platen
8bcceaceff
fix model outputs test (#6593) 2020-08-19 16:18:51 +02:00
Pradhy729
2a7402cbd3
Feed forward chunking others (#6365)
* Feed forward chunking for Distilbert & Albert

* Added ff chunking for many other models

* Change model signature

* Added chunking for XLM

* Cleaned up by removing some variables.

* remove test_chunking flag

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
2020-08-19 14:31:10 +02:00
Lysandre Debut
f7cbc13db7
Test model outputs equivalence (#6445)
* Test model outputs equivalence

* Fix failing tests

* From dict to kwargs

* DistilBERT

* Addressing @sgugger and @patrickvonplaten's comments
2020-08-13 11:59:35 -04:00
Pradhy729
b25cec13c5
Feed forward chunking (#6024)
* Chunked feed forward for Bert

This is an initial implementation to test applying feed forward chunking for BERT.
Will need additional modifications based on output and benchmark results.

* Black and cleanup

* Feed forward chunking in BertLayer class.

* Isort

* add chunking for all models

* fix docs

* Fix typo

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
2020-08-11 03:12:45 -04:00
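A sketch of the chunking utility these two PRs apply across models: run a feed-forward sub-module over slices of the sequence, trading speed for memory. (The import path varies by version; it is `transformers.pytorch_utils` in recent releases.)

```python
import torch
from transformers.pytorch_utils import apply_chunking_to_forward

ff = torch.nn.Linear(16, 16)

def forward_chunk(hidden_states):
    return ff(hidden_states)

hidden = torch.randn(2, 128, 16)  # (batch, seq_len, hidden)
# chunk_size=32 along dim 1: forward_chunk runs on four 32-token slices.
out = apply_chunking_to_forward(forward_chunk, 32, 1, hidden)
assert out.shape == hidden.shape
```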
Sylvain Gugger
d951c14ae4
Model output test (#6155)
* Use return_dict=True in all tests

* Formatting
2020-07-31 09:44:37 -04:00
Sylvain Gugger
91cb95461e
Switch from return_tuple to return_dict (#6138)
* Switch from return_tuple to return_dict

* Fix test

* [WIP] Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleC… (#5614)

* Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleChoice} models and tests

* AutoModels


Tiny tweaks

* Style

* Final changes before merge

* Re-order for simpler review

* Final fixes

* Addressing @sgugger's comments

* Test MultipleChoice

* Rework TF trainer (#6038)

* Fully rework training/prediction loops

* fix method name

* Fix variable name

* Fix property name

* Fix scope

* Fix method name

* Fix tuple index

* Fix tuple index

* Fix indentation

* Fix variable name

* fix eval before log

* Add drop remainder for test dataset

* Fix step number + fix logging datetime

* fix eval loss value

* use global step instead of step + fix logging at step 0

* Fix logging datetime

* Fix global_step usage

* Fix breaking loop + logging datetime

* Fix step in prediction loop

* Fix step breaking

* Fix train/test loops

* Force TF at least 2.2 for the trainer

* Use assert_cardinality to facilitate the dataset size computation

* Log steps per epoch

* Make tfds compliant with TPU

* Make tfds compliant with TPU

* Use TF dataset enumerate instead of the Python one

* revert previous commit

* Fix data_dir

* Apply style

* rebase on master

* Address Sylvain's comments

* Address Sylvain's and Lysandre comments

* Trigger CI

* Remove unused import

* Add recent model

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Julien Plu <plu.julien@gmail.com>
2020-07-30 09:17:00 -04:00
Lysandre Debut
3f94170a10
[WIP] Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleC… (#5614)
* Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleChoice} models and tests

* AutoModels


Tiny tweaks

* Style

* Final changes before merge

* Re-order for simpler review

* Final fixes

* Addressing @sgugger's comments

* Test MultipleChoice
2020-07-29 14:26:26 -04:00
Stas Bekman
35cb101eae
DataParallel fixes (#5733)
* DataParallel fixes:

1. switched to a more precise check
-        if self.args.n_gpu > 1:
+        if isinstance(model, nn.DataParallel):

2. fix tests - require the same fixup under DataParallel as the training module

* another fix
2020-07-20 09:29:12 -04:00
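The more precise check from this fix, sketched as a small helper: gate on the wrapper type rather than on `args.n_gpu`.

```python
import torch.nn as nn

def unwrap(model: nn.Module) -> nn.Module:
    # nn.DataParallel stores the real model under .module; plain models
    # (including manually placed ones) pass through unchanged.
    return model.module if isinstance(model, nn.DataParallel) else model
```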
Sylvain Gugger
edfd82f5ff
Change model outputs types to self-document outputs (#5438)
* [WIP] Proposal for model outputs

* All Bert models

* Make CI green maybe?

* Fix ONNX test

* Isolate ModelOutput from pt and tf

* Formatting

* Add Electra models

* Auto-generate docstrings from outputs

* Add TF outputs

* Add some BERT models

* Revert TF side

* Remove last traces of TF changes

* Fail with a clear error message

* Add Albert and work through Bart

* Add CTRL and DistilBert

* Formatting

* Progress on Bart

* Renames and finish Bart

* Formatting

* Fix last test

* Add DPR

* Finish Electra and add FlauBERT

* Add GPT2

* Add Longformer

* Add MMBT

* Add MobileBert

* Add GPT

* Formatting

* Add Reformer

* Add Roberta

* Add T5

* Add Transformer XL

* Fix test

* Add XLM + fix XLMForTokenClassification

* Style + XLMRoberta

* Add XLNet

* Formatting

* Add doc of return_tuple arg
2020-07-10 11:36:53 -04:00
Sam Shleifer
d4886173b2
[Bart] enable test_torchscript, update test_tie_weights (#5457)
* Passing all but one torchscript test

* Style

* move comment

* remove unneeded assert
2020-07-07 10:06:48 -04:00
Patrick von Platen
d697b6ca75
[Longformer] Major Refactor (#5219)
* refactor naming

* add small slow test

* refactor

* refactor naming

* rename selected to extra

* big global attention refactor

* make style

* refactor naming

* save intermed

* refactor functions

* finish function refactor

* fix tests

* fix longformer

* fix longformer

* fix longformer

* fix all tests but one

* finish longformer

* address sams and izs comments

* fix transpose
2020-07-01 17:43:32 +02:00
Sam Shleifer
13deb95a40
Move tests/utils.py -> transformers/testing_utils.py (#5350) 2020-07-01 10:31:17 -04:00
Thomas Wolf
27cf1d97f0
[Tokenization] Fix #5181 - make #5155 more explicit - move back the default logging level in tests to WARNING (#5252)
* fix-5181

Padding to max sequence length while truncation to another length was wrong on slow tokenizers

* clean up and fix #5155

* fix XLM test

* Fix tests for Transfo-XL

* logging only above WARNING in tests

* switch slow tokenizers tests in @slow

* fix Marian truncation tokenization test

* style and quality

* make the test a lot faster by limiting the sequence length used in tests
2020-06-25 17:24:28 +02:00
Joseph Liu
f4e1f02210
Output hidden states (#4978)
* Configure all models to use output_hidden_states as argument passed to forward()

* Pass all tests

* Remove cast_bool_to_primitive in TF Flaubert model

* correct tf xlnet

* add pytorch test

* add tf test

* Fix broken tests

* Refactor output_hidden_states for mobilebert

* Reset and remerge to master

Co-authored-by: Joseph Liu <joseph.liu@coinflex.com>
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
2020-06-22 10:10:45 -04:00
Patrick von Platen
ebba39e4e1
[Bart] Question Answering Model is added to tests (#5024)
* fix test

* Update tests/test_modeling_common.py

* Update tests/test_modeling_common.py
2020-06-15 22:50:09 +02:00
Sylvain Gugger
d541938c48
Make multiple choice models work with input_embeds (#4921) 2020-06-10 18:38:34 -04:00
Sylvain Gugger
ac99217e92
Fix the CI (#4903)
* Fix CI
2020-06-10 09:26:06 -04:00
Sylvain Gugger
0a375f5abd
Deal with multiple choice in common tests (#4886)
* Deal with multiple choice in common tests
2020-06-10 08:10:20 -04:00
Bharat Raghunathan
6e603cb789
[All models] Extend config.output_attentions with output_attentions function arguments (#4538)
* DOC: Replace instances of ``config.output_attentions`` with function argument ``output_attentions``

* DOC: Apply Black Formatting

* Fix errors where output_attentions was undefined

* Remove output_attentions in classes per review

* Fix regressions on tests having `output_attention`

* Fix further regressions in tests relating to `output_attentions`

Ensure proper propagation of `output_attentions` as a function parameter
to all model subclasses

* Fix more regressions in `test_output_attentions`

* Fix issues with BertEncoder

* Rename related variables to `output_attentions`

* fix pytorch tests

* fix bert and gpt2 tf

* Fix most TF tests for `test_output_attentions`

* Fix linter errors and more TF tests

* fix conflicts

* DOC: Apply Black Formatting

* Fix errors where output_attentions was undefined

* Remove output_attentions in classes per review

* Fix regressions on tests having `output_attention`

* fix conflicts

* fix conflicts

* fix conflicts

* fix conflicts

* fix pytorch tests

* fix conflicts

* fix conflicts

* Fix linter errors and more TF tests

* fix tf tests

* make style

* fix isort

* improve output_attentions

* improve tensorflow

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-06-09 23:39:06 +02:00
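A sketch of the per-call argument this PR threads through every model; attentions can now be requested without flipping `config.output_attentions`:

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("attention weights on demand", return_tensors="pt")
outputs = model(**inputs, output_attentions=True)
# One (batch, num_heads, seq_len, seq_len) tensor per layer:
print(len(outputs.attentions), outputs.attentions[0].shape)
```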
Julien Chaumond
d4c2cb402d
Kill model archive maps (#4636)
* Kill model archive maps

* Fixup

* Also kill model_archive_map for MaskedBertPreTrainedModel

* Unhook config_archive_map

* Tokenizers: align with model id changes

* make style && make quality

* Fix CI
2020-06-02 09:39:33 -04:00
Julien Chaumond
4c06893610
Fix nn.DataParallel compatibility in PyTorch 1.5 (#4300)
* Test case for #3936

* multigpu tests pass on pytorch 1.4.0

* Fixup

* multigpu tests pass on pytorch 1.5.0

* Update src/transformers/modeling_utils.py

* Update src/transformers/modeling_utils.py

* rename multigpu to require_multigpu

* mode doc
2020-05-18 20:34:50 -04:00
Patrick von Platen
dca34695d0
Reformer (#3351)
* first copy & past commit from Bert and morgans LSH code

* add easy way to compare to trax original code

* translate most of function

* make trax lsh self attention deterministic with numpy seed + copy paste code

* add same config

* add same config

* make layer init work

* implemented hash_vectors function for lsh attention

* continue reformer translation

* hf LSHSelfAttentionLayer gives same output as trax layer

* refactor code

* refactor code

* refactor code

* refactor

* refactor + add reformer config

* delete bogus file

* split reformer attention layer into two layers

* save intermediate step

* save intermediate step

* make test work

* add complete reformer block layer

* finish reformer layer

* implement causal and self mask

* clean reformer test and refactor code

* fix merge conflicts

* fix merge conflicts

* update init

* fix device for GPU

* fix chunk length init for tests

* include morgans optimization

* improve memory a bit

* improve comment

* factorize num_buckets

* better testing parameters

* make whole model work

* make lm model work

* add t5 copy paste tokenizer

* add chunking feed forward

* clean config

* add improved assert statements

* make tokenizer work

* improve test

* correct typo

* extend config

* add a more complex test

* add new axial position embeddings

* add local block attention layer

* clean tests

* refactor

* better testing

* save intermediate progress

* clean test file

* make shorter input length work for model

* allow variable input length

* refactor

* make forward pass for pretrained model work

* add generation possibility

* finish dropout and init

* make style

* refactor

* add first version of RevNet Layers

* make forward pass work and add convert file

* make uploaded model forward pass work

* make uploaded model forward pass work

* refactor code

* add namedtuples and cache buckets

* correct head masks

* refactor

* made reformer more flexible

* make style

* remove set max length

* add attention masks

* fix up tests

* fix lsh attention mask

* make random seed optional for the moment

* improve memory in reformer

* add tests

* make style

* make sure masks work correctly

* detach gradients

* save intermediate

* correct backprop through gather

* make style

* change back num hashes

* rename to labels

* fix rotation shape

* fix detach

* update

* fix trainer

* fix backward dropout

* make reformer more flexible

* fix conflict

* fix

* fix

* add tests for fixed seed in reformer layer

* fix trainer typo

* fix typo in activations

* add fp16 tests

* add fp16 training

* support fp16

* correct gradient bug in reformer

* add fast gelu

* re-add dropout for embedding dropout

* better naming

* better naming

* renaming

* finalize test branch

* finalize tests

* add more tests

* finish tests

* fix

* fix type trainer

* fix fp16 tests

* fix tests

* fix tests

* fix tests

* fix issue with dropout

* fix dropout seeds

* correct random seed on gpu

* finalize random seed for dropout

* finalize random seed for dropout

* remove duplicate line

* correct half precision bug

* make style

* refactor

* refactor

* docstring

* remove sinusoidal position encodings for reformer

* move chunking to modeling_utils

* make style

* clean config

* make style

* fix tests

* fix auto tests

* pretrained models

* fix docstring

* update conversion file

* Update pretrained_models.rst

* fix rst

* fix rst

* update copyright

* fix test path

* fix test path

* fix small issue in test

* include reformer in generation tests

* add docs for axial position encoding

* finish docs

* Update convert_reformer_trax_checkpoint_to_pytorch.py

* remove isort

* include sams comments

* remove wrong comment in utils

* correct typos

* fix typo

* Update reformer.rst

* applied morgans optimization

* make style

* make gpu compatible

* remove bogus file

* big test refactor

* add example for chunking

* fix typo

* add to README
2020-05-07 10:17:01 +02:00
Lysandre Debut
79b1c6966b
Pytorch 1.5.0 (#3973)
* Standard deviation can no longer be set to 0

* Remove torch pinned version

* 9th instead of 10th, silly me
2020-05-05 10:23:01 -04:00
Sam Shleifer
2c77842887
[Fix common tests on GPU] send model, ids to torch_device (#4014) 2020-04-29 09:47:20 -04:00
Patrick von Platen
01c37dcdb5
[Config, Caching] Remove output_past everywhere and replace by use_cache argument (#3734)
* remove output_past from pt

* make style

* add optional input length for gpt2

* add use cache to prepare input

* save memory in gpt2

* correct gpt2 test inputs

* make past input optional for gpt2

* finish use_cache for all models

* make style

* delete modeling_gpt2 change in test file

* correct docstring

* correct is true statements for gpt2
2020-04-14 14:40:28 -04:00
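A sketch of the `use_cache` argument that replaces `output_past`: cache the per-layer key/value states and feed them back so each step only encodes the new token.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

input_ids = tokenizer("The cache makes decoding", return_tensors="pt").input_ids
out = model(input_ids, use_cache=True)
past = out.past_key_values                 # cached keys/values per layer

next_token = out.logits[:, -1:].argmax(-1)
out = model(next_token, past_key_values=past, use_cache=True)  # one-token step
```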
Patrick von Platen
ce2298fb5f
[T5, generation] Add decoder caching for T5 (#3682)
* initial commit to add decoder caching for T5

* better naming for caching

* finish T5 decoder caching

* correct test

* added extensive past testing for T5

* clean files

* make tests cleaner

* improve docstring

* improve docstring

* better reorder cache

* make style

* Update src/transformers/modeling_t5.py

Co-Authored-By: Yacine Jernite <yjernite@users.noreply.github.com>

* make set output past work for all layers

* improve docstring

* improve docstring

Co-authored-by: Yacine Jernite <yjernite@users.noreply.github.com>
2020-04-10 01:02:50 +02:00
Patrick von Platen
2ee410560e
[Generate, Test] Split generate test function into beam search, no beam search (#3601)
* split beam search and no beam search test

* fix test

* clean generate tests
2020-04-06 10:37:05 +02:00
Patrick von Platen
b38d552a92
[Generate] Add bad words list argument to the generate function (#3367)
* add bad words list

* make style

* add bad_words_tokens

* make style

* better naming

* make style

* fix typo
2020-03-31 18:42:31 +02:00
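A sketch of the new `bad_words_ids` argument to `generate()`: lists of token-id sequences the model is forbidden to produce (the banned words below are illustrative).

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Encode banned words with a leading space so the ids match in-sentence use.
bad_words_ids = [
    tokenizer(" terrible", add_special_tokens=False).input_ids,
    tokenizer(" awful", add_special_tokens=False).input_ids,
]

input_ids = tokenizer("The weather is", return_tensors="pt").input_ids
output = model.generate(input_ids, bad_words_ids=bad_words_ids, max_length=20)
print(tokenizer.decode(output[0]))
```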
Sam Shleifer
39371ee454
[Bart/Memory] don't create lm_head (#3323)
* delete lm_head, skips weight tying
* Fixed s3
2020-03-26 18:40:39 -04:00
Patrick von Platen
bbf26c4e61
Support T5 Generation (#3228)
* fix conflicts

* update bart max length test

* correct spelling mistakes

* implemented model specific encode function

* fix merge conflicts

* better naming

* save intermediate state -> need to rethink structure a bit

* leave tf problem as it is for now

* current version

* add layers.pop

* remove ipdb

* make style

* clean return cut decoding

* remove ipdbs

* Fix restoring layers in the decoder that don't exist.

* push good intermediate solution for now

* fix conflicts

* always good to refuse to merge conflicts when rebasing

* fix small bug

* improve function calls

* remove unused file

* add correct scope behavior for t5_generate

Co-authored-by: Morgan Funtowicz <funtowiczmo@gmail.com>
2020-03-19 23:18:23 +01:00
Patrick von Platen
e8f44af5bf
[generate] do_sample default back to False (#3298)
* change do_samples back

* None better default as boolean

* adapt do_sample to True in test example

* make style
2020-03-17 10:52:37 -04:00
Patrick von Platen
aceb3fbaf4 only do output_past=True for language generation in bart 2020-03-11 11:06:56 +01:00
Patrick von Platen
ff648221bd fix conflicts 2020-03-11 11:06:56 +01:00
Patrick von Platen
c0d9dd3ba9 refactored code a bit and made more generic 2020-03-11 11:06:56 +01:00
Patrick von Platen
d8e2b3c547 fix conflicts 2020-03-11 11:06:56 +01:00
Lysandre Debut
0001d05686
Correct missing keys + test (#3143) 2020-03-05 17:01:54 -05:00
Patrick von Platen
4134100363
Add generate() functionality to TF 2.0 (#3063)
* add first copy past test to tf 2 generate

* add tf top_k_top_p_filter fn

* add generate function for TF

* add generate function for TF

* implemented generate for all models except transfoXL

* implemented generate for all models except transfoXL

* implemented generate for all models except transfoXL

* make style

* change permission of test file to correct ones

* delete ipdb

* delete ipdb

* fix bug and finish simple gpt2 integration test

* clean test file

* clean test file

* make style

* make style

* make style

* make style

* change import style

* change import style

* make style

* make style

* add decorators

* add decorators

* fix tf ctrl bug dim => axis in TF

* make style

* make style

* refactored test file

* refactored test file

* take out test_torch_tf_conversion if nothing is defined

* take out test_torch_tf_conversion if nothing is defined

* remove useless files

* remove useless files

* fix conflicts

* fix conflicts

* fix conflicts

* fix conflicts

* fix conflicts

* solve conflicts

* solve conflicts

* fix conflicts

* fix conflicts

* merge conflicts

* delete ipdb

* exposed top_k_top_p_filtering fns

* delete weirdly created w! file

* add comment to test tf common modeling

* fix conflicts

* fix conflicts

* make style

* merge conflicts

* make style

* change tf.tensor.shape to shape_list(tensor)
2020-03-03 09:42:15 -05:00
Patrick von Platen
2fdc7f6ce8
correct greedy generation when doing beam search (#3078)
* correct greedy generation when doing beam search

* improve comment
2020-03-02 12:00:09 -05:00
Julien Chaumond
9cda3620b6
Fix (non-slow) tests on GPU (torch) (#3024)
* Fix tests on GPU (torch)

* Fix bart slow tests

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-02-26 11:59:25 -05:00
Patrick von Platen
17c45c39ed
Add slow generate tests for pretrained lm models (#2909)
* add slow generate lm_model tests

* fix conflicts

* merge conflicts

* fix conflicts

* add slow generate lm_model tests

* make style

* delete unused variable

* fix conflicts

* fix conflicts

* fix conflicts

* delete unused variable

* fix conflicts

* finished hard coded tests
2020-02-24 11:51:57 -05:00
Patrick von Platen
fc38d4c86f
Improve special_token_id logic in run_generation.py and add tests (#2885)
* improving generation

* finalized special token behaviour for no_beam_search generation

* solved modeling_utils merge conflict

* solve merge conflicts in modeling_utils.py

* add run_generation improvements from PR #2749

* adapted language generation to not use hardcoded -1 if no padding token is available

* remove the -1 removal as hard-coded -1s are not necessary anymore

* add lightweight language generation testing for randomly initialized models - just checking whether no errors are thrown

* add slow language generation tests for pretrained models using hardcoded output with pytorch seed

* delete ipdb

* check that all generated tokens are valid

* renaming

* renaming Generation -> Generate

* make style

* updated so that generate_beam_search has the same token behavior as generate_no_beam_search

* consistent return format for run_generation.py

* deleted pretrain lm generate tests -> will be added in another PR

* cleaning of unused if statements and renaming

* run_generate will always return an iterable

* make style

* consistent renaming

* improve naming, make sure generate function always returns the same tensor, add docstring

* add slow tests for all lmhead models

* make style and improve example comments modeling_utils

* better naming and refactoring in modeling_utils

* changed fast random lm generation testing design to more general one

* delete in old testing design in gpt2

* correct old variable name

* temporary fix for encoder_decoder lm generation tests - has to be updated when t5 is fixed

* adapted all fast random generate tests to new design

* better warning description in modeling_utils

* better comment

* better comment and error message

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-02-21 12:09:59 -05:00
Sam Shleifer
53ce3854a1
New BartModel (#2745)
* Results same as fairseq
* Wrote a ton of tests
* Struggled with api signatures
* added some docs
2020-02-20 18:11:13 -05:00
sshleifer
9e5b549b4d fix default getattr 2020-02-04 16:38:52 -05:00
sshleifer
25848a6094 double quotes 2020-02-04 16:38:52 -05:00
sshleifer
cbcb83f21d minor cleanup of test_attention_outputs 2020-02-04 16:38:52 -05:00
Julien Chaumond
d9fa1bad72 Fix failing torchscript test for xlnet
model.parameters() order is apparently not stable (only for xlnet, for some reason)
2020-01-15 20:22:21 -05:00
Julien Chaumond
715fa638a7 Merge branch 'master' into from_scratch_training 2020-01-14 18:58:21 +00:00
Lysandre
100e3b6f21 Bias should be resized with the weights
Created a link between the linear layer bias and the model attribute bias. This does not change anything for the user nor for the conversion scripts, but allows the `resize_token_embeddings` method to resize the bias as well as the weights of the decoder.

Added a test.
2020-01-14 13:43:45 -05:00
Julien Chaumond
c6f682c1eb flake 2020-01-11 03:18:31 +00:00
Julien Chaumond
2f32dfd33b Convention: name mixins mixins 2020-01-11 01:24:29 +00:00
Julien Chaumond
055e80cfad rm old ConfigTester 2020-01-10 21:36:18 +00:00
alberduris
81d6841b4b GPU text generation: moved the encoded_prompt to the correct device 2020-01-06 15:11:12 +01:00
alberduris
dd4df80f0b Moved the encoded_prompts to the correct device 2020-01-06 15:11:12 +01:00
Aymeric Augustin
e6c0019c80 Remove unused variables in tests. 2019-12-23 22:38:18 +01:00
Aymeric Augustin
798b3b3899 Remove sys.version_info[0] == 2 or 3. 2019-12-22 18:38:42 +01:00
Aymeric Augustin
c824d15aa1 Remove __future__ imports. 2019-12-22 17:47:54 +01:00
Aymeric Augustin
daf8bebcdd Remove unused GPTModelTester.
It isn't imported anywhere.
2019-12-22 15:35:25 +01:00
Aymeric Augustin
345c23a60f Replace (TF)CommonTestCases for modeling with a mixin.
I suspect the wrapper classes were created in order to prevent the
abstract base class (TF)CommonModelTester from being included in test
discovery and running, because that would fail.

I solved this by replacing the abstract base class with a mixin.

Code changes are just de-indenting and automatic reformattings
performed by black to use the extra line space.
2019-12-22 15:35:18 +01:00
Aymeric Augustin
7e98e211f0 Remove unittest.main() in test modules.
This construct isn't used anymore these days.

Running python tests/test_foo.py puts the tests/ directory on
PYTHONPATH, which isn't representative of how we run tests.

Use python -m unittest tests/test_foo.py instead.
2019-12-22 14:42:03 +01:00
Aymeric Augustin
ced0a94204 Switch test files to the standard test_*.py scheme. 2019-12-22 14:15:13 +01:00