transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-04 21:30:07 +06:00

Author	SHA1	Message	Date
Lucain	169b8cde47	Fix mock in `test_cached_files_are_used_when_internet_is_down` (#18804 )	2022-08-29 15:56:08 +02:00
Yih-Dar	8b67f20935	Fix memory leak issue in `torch_fx` tests (#18547 ) Co-authored-by: Lysandre Debut <hi@lysand.re> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-08-29 11:43:20 +02:00
Sylvain Gugger	5cd4032368	Use new huggingface_hub tools for download models (#18438 ) * Draft new cached_file * Initial draft for config and model * Small fixes * Fix first batch of tests * Look in cache when internet is down * Fix last tests * Bad black, not fixing all quality errors * Make diff less * Implement change for TF and Flax models * Add tokenizer and feature extractor * For compatibility with main * Add utils to move the cache and auto-do it at first use. * Quality * Deal with empty commit shas * Deal with empty etag * Address review comments	2022-08-05 10:12:40 -04:00
NielsRogge	f9a0008d2d	Add VideoMAE (#17821 ) * First draft * Add VideoMAEForVideoClassification * Improve conversion script * Add VideoMAEForPreTraining * Add VideoMAEFeatureExtractor * Improve VideoMAEFeatureExtractor * Improve docs * Add first draft of model tests * Improve VideoMAEForPreTraining * Fix base_model_prefix * Make model take pixel_values of shape (B, T, C, H, W) * Add loss computation of VideoMAEForPreTraining * Improve tests * Improve model tests * Make all tests pass * Add VideoMAE to main README * Add tests for VideoMAEFeatureExtractor * Add integration test * Improve conversion script * Rename patch embedding class * Remove VideoMAELayer from init * Update design of patch embeddings * Improve comments * Improve conversion script * Improve conversion script * Add conversion of pretrained model * Add loss verification of pretrained model * Add loss verification of unnormalized targets * Add integration test for pretraining model * Apply suggestions from code review * Fix bug to make feature extractor resize only shorter edge * Address more comments * Improve normalization of videos * Add doc examples * Move constants to dedicated script * Remove scripts * Transfer checkpoints, fix docs * Update script * Update image mean and std * Fix doc tests * Set return_tensors to NumPy by default * Revert the previous change Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>	2022-08-04 18:02:55 +02:00
Sylvain Gugger	01db72abd4	Rewrite push_to_hub to use upload_files (#18366 ) * Rewrite push_to_hub to use upload_files * Adapt the doc a bit * Address review comments and clean doc	2022-08-01 12:07:30 -04:00
Mikkel Denker	70e7d1d656	Fixes torch jit tracing for LayoutLMv2 model (re-open) (#18313 ) * Fixes torch jit tracing for LayoutLMv2 model. Pytorch seems to reuse memory for input_shape which caused a mismatch in shapes later in the forward pass. * Fixed code quality * avoid unneeded allocation of vector for shape	2022-07-27 06:38:40 -04:00
Patrick von Platen	3bb6356d4d	[From pretrained] Allow download from subfolder inside model repo (#18184 ) * add first generation tutorial * [from_pretrained] Allow loading models from subfolders * remove gen file * add doc strings * allow download from subfolder * add tests * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * apply comments * correct doc string Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2022-07-19 11:53:53 +02:00
Yih-Dar	6561fbcc6e	Update TF(Vision)EncoderDecoderModel PT/TF equivalence tests (#18073 ) Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-07-18 15:29:14 +02:00
Sylvain Gugger	df8e6804c0	Offload fixes (#17810 ) * Offload fixes * Add a test	2022-06-22 12:23:07 -04:00
Yih-Dar	f47afefb21	Use 5e-5 For BigBird PT/Flax equivalence tests (#17780 ) * rename to check_pt_flax_outputs * update check_pt_flax_outputs * use 5e-5 for BigBird PT/Flax test Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-06-21 17:55:26 +02:00
Lysandre Debut	6a5272b205	Prepare transformers for v0.8.0 huggingface-hub release (#17716 ) * Prepare CI for v0.8.0 * pin hfh (revert before merge) * Revert "pin hfh (revert before merge)" This reverts commit `a0103140e1`. * Test rc3 * Test latest rc * Unpin to the RC Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>	2022-06-21 11:51:18 -04:00
Stas Bekman	75343de938	[modeling_utils] torch_dtype/auto floating dtype fixes (#17614 ) * [modeling_utils] torch_dtype/auto fixes * add test * apply suggestions * add missing fallback * Renaming things * Use for else Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>	2022-06-09 10:18:26 -07:00
amyeroberts	dfc76b2542	has_attentions - consistent test skipping logic and tf tests (#17495 )	2022-06-09 09:50:03 +02:00
Michael Benayoun	5c8f601007	Fx support for Deberta-v[1-2], Hubert and LXMERT (#17539 ) * Support for deberta and deberta-v2 * Support for LXMert * Support for Hubert * Fix for pt1.11 * Trigger CI	2022-06-07 18:05:20 +02:00
Sylvain Gugger	8343901263	Fix all offload and MP tests (#17533 )	2022-06-03 09:59:13 -04:00
Sylvain Gugger	4390151ba2	Fix MP and CPU offload tests for Funnel and GPT-Neo (#17503 )	2022-06-01 09:59:40 -04:00
Sylvain Gugger	567d9c061d	Disk offload fix (#17428 ) * Fix offload to disk for big models * Add test * Fix test for other models	2022-05-31 09:16:18 -04:00
Michael Benayoun	28d0048218	Fx support for multiple model architectures (#17393 ) * Support for Bart and LayoutLM, and partial support for XLNet * Support for mbart * A lot of new models supported * Support for other models * LayoutLM fix * Use strings instead of classes	2022-05-31 10:02:55 +02:00
Sylvain Gugger	98f6e1ee87	Fix model parallelism test (#17439 )	2022-05-26 09:57:12 -04:00
Sylvain Gugger	31484afbed	Add test for new model parallelism features (#17401 )	2022-05-25 10:51:27 -04:00
Sylvain Gugger	56f50590d5	Use Accelerate in `from_pretrained` for big model inference (#17341 ) * Initial work * More or less finished with first draft * Update src/transformers/modeling_utils.py Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> * Update src/transformers/modeling_utils.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Fix randomly initialized weights * Update src/transformers/modeling_utils.py Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr> * Address review comments * Rename DeepSpeed folder to temporarily fix the test issue? * Revert to try if Accelerate fix works * Use latest Accelerate release * Quality and fixes * Style * Quality * Add doc * Test + fix * More blocks Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>	2022-05-23 14:32:21 -04:00
Michael Benayoun	2e7e4280aa	Traced models serialization and torchscripting fix (#17206 ) * Fix torch.jit.script and pickling issues * Fix get_attr issues * Fix import in function * Fix GPT-J and T5 tracing for torch=1.11 * Gate graph surgery on torch version * Modeling minor changes to enable TorchScripting * Model serialization / deserialization test * Remove _assert_is_none users	2022-05-23 17:50:40 +02:00
Kyungmin Lee	f0395cf58e	Fix test_model_parallelization (#17249 ) * Fix test_model_parallelization * Modify	2022-05-16 23:30:49 +02:00
Sylvain Gugger	afe5d42d8d	Black preview (#17217 ) * Black preview * Fixup too! * Fix check copies * Use the same version as the CI * Bump black	2022-05-12 16:25:55 -04:00
Michael Benayoun	8c7481f35c	ViT and Swin symbolic tracing with torch.fx (#17182 ) * Support tracing for ViT * Swin support * Fix copies * Fix type annotation issue * Removed unused import	2022-05-12 10:42:27 +02:00
Yih-Dar	e6d23a4b9b	Improve test_pt_tf_model_equivalence on PT side (#16731 ) * Update test_pt_tf_model_equivalence on PT side Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-04-19 21:13:27 +02:00
Stas Bekman	5da33f8729	[modeling utils] revamp `from_pretrained(..., low_cpu_mem_usage=True)` + tests (#16657 ) * add low_cpu_mem_usage tests * wip: revamping * wip * install /usr/bin/time * wip * cleanup * cleanup * cleanup * cleanup * cleanup * fix assert * put the wrapper back * cleanup; switch to bert-base-cased * Trigger CI * Trigger CI	2022-04-14 18:10:05 -07:00
Yih-Dar	c04619ecf3	Enable more test_torchscript (#16679 ) * update _create_and_check_torchscript * Enable test_torchscript * clear_class_registry Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-04-11 18:23:35 +02:00
Yih-Dar	3918d6a9d6	Reduce memory leak in _create_and_check_torchscript (#16691 ) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-04-11 18:22:28 +02:00
Yih-Dar	2109afae71	Rename the method test_torchscript (#16693 ) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-04-11 18:21:45 +02:00
NielsRogge	979b039c89	Add DPT (#15991 ) * First draft * More improvements * Add fusion blocks * Make conversion script work for dpt_large * Make conversion script work * Improve implementation * Improve conversion script * Add DPTForSemanticSegmentation * Make conversion work for semantic segmentation * Add tests * Remove print statements * First draft * Redesign neck * Improve tests * Improve implementation some more * Make neck output list of tensors * Improve neck and feature extractor * Fix integration tests * Make more tests pass * Make all tests pass * Add missing config archive map * Add in_index attribute to make heads accept list of tensors * Apply suggestions from code review * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Apply some more suggestions * Add copied from statements * Remove assert * Apply suggestions from code review * Apply suggestions from code review * Remove DPTInterpolate in favor of nn.Upsample * Add comments * Apply suggestions from code review * Apply suggestions from code review * Add proposed design * Update design * Add DPTReassembleLayer * Add DPTFeatureFusionStage * Apply more suggestions from code review * Apply suggestions from code review * Apply suggestions from code review * Fix rebase * Update in_index and out_indices * Fix conversion script * Fix code quality * Add model to toctree and use DepthEstimatorOutput * Fix rebase * Fix code examples * Improve code * Fix copied from statements * Apply suggestions from code review * Remove compute_loss method * Apply suggestions from code review * Fix documentation tests file * Remove test.py file * Improve doc example Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Niels Rogge <nielsrogge@nielss-mbp.home>	2022-03-28 16:28:10 +02:00
Sylvain Gugger	b473617d63	Checkpoint sharding (#16343 ) * Sharded checkpoint support * Handle distant sharded checkpoints * Add tests * TODO is done * Apply suggestions from code review Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> * Fix docstring * Add example and format * Address review comments * More review comments * End of merge * Revert unintentional change * VsCode what did you do? * Style * Changes * Address final comments * Quality * Moar tests * Move import beneath is_pt_available Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>	2022-03-25 11:59:25 -04:00
Yih-Dar	f571dc20ac	Update PT Flax equivalence tests in PT test file (#16280 ) * update PT/Flax equivalence tests on PT side * overwrite check_outputs in BigBirdModelTest Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-03-24 14:45:30 +01:00
Sylvain Gugger	c595b6e6a9	Make Transformers use cache files when hf.co is down (#16362 ) * Make Transformers use cache files when hf.co is down * Fix tests * Was there a random circleCI failure? * Isolate patches * Style * Comment out the failure since it doesn't fail anymore * Better comment	2022-03-23 15:56:49 -04:00
Sylvain Gugger	4975002df5	Reorganize file utils (#16264 ) * Split file_utils in several submodules * Fixes * Add back more objects * More fixes * Who exactly decided to import that from there? * Second suggestion to code with code review * Revert wrong move * Fix imports * Adapt all imports * Adapt all imports everywhere * Revert this import, will fix in a separate commit	2022-03-23 10:26:33 -04:00
Yih-Dar	75c666b4a8	Aggressive PT/TF equivalence test on PT side (#16250 ) * Aggressive PT/TF equivalence test on PT side * Ugly fix for `TFTapasForQuestionAnswering` * apply review suggestions Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-03-18 18:51:24 +01:00
NielsRogge	8d83ebdf18	[Tests] Add attentions_option to ModelTesterMixin (#15909 ) * Add attentions_option to common tester * Fix tests, apply suggestion * Apply suggestion from code review Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>	2022-03-10 12:00:30 +01:00
NielsRogge	286fdc6b3c	[vision] Add problem_type support (#15851 ) * Add problem_type to missing models * Fix deit test Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>	2022-03-01 18:09:52 +01:00
Eduardo Gonzalez Ponferrada	df5a4094a6	Add Data2Vec (#15507 ) * Add data2vec model cloned from roberta * Add checkpoint conversion script * Fix copies * Update docs * Add checkpoint conversion script * Remove fairseq data2vec_text script and fix format * Add comment on where to get data2vec_text.py * Remove mock implementation cheat.py and fix style * Fix copies * Remove TF and Flax classes from init * Add back copy from fairseq data2vec_text.py and fix style * Update model name in docs/source/index.mdx to be CamelCase * Revert model name in table to lower-case to get check_table test to pass * Update src/transformers/models/data2vec/__init__.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/transformers/models/data2vec/convert_data2vec_original_pytorch_checkpoint_to_pytorch.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/transformers/models/data2vec/modeling_data2vec.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/transformers/models/data2vec/modeling_data2vec.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/transformers/models/data2vec/modeling_data2vec.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/transformers/models/data2vec/modeling_data2vec.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update docs/source/model_doc/data2vec.mdx Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update docs/source/model_doc/data2vec.mdx Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/auto/configuration_auto.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/data2vec/configuration_data2vec.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/data2vec/modeling_data2vec.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/data2vec/modeling_data2vec.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/data2vec/modeling_data2vec.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update tests/test_modeling_data2vec.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/data2vec/configuration_data2vec.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/data2vec/modeling_data2vec.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update documentation * Copy-paste Data2VecConfig from BertConfig * Update config checkpoint to point to edugp/data2vec-nlp-base. Fix style and repo-consistency * Update config special tokens to match RoBERTa * Split multiple assertions and add individual error messages * Rename Data2VecModel to Data2VecForTextModel * Add Data2Vec to _toctree.yml * Rename Data2VecEmbeddings to Data2VecForTextEmbeddings * Add initial Data2VecForAudio model (unfinished). Only matching fairseq's implementation up to the feature encoder (before positional encoding). * finish audio model * finish audio file * Update names and fix style, quality and repo consistency * Remove Data2VecAudioForPretraining. Add tests for Data2VecAudio, mimicking the Wav2Vec2 test suite. Fix bias initialization in positional conv layers. Move back configurations for audio and text to separate files. * add inputs to logits to data2vec' * correct audio models * correct config auto * correct tok auto * Update utils/tests_fetcher.py * delete unnecessary files * delete unnecessary files * further renaming * make all tests pass * finish * remove useless test file * Update tests/test_modeling_common.py * Update utils/check_repo.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/transformers/models/data2vec/modeling_data2vec_text.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Fix copies * Update docs * Remove fairseq data2vec_text script and fix format * Add comment on where to get data2vec_text.py * Remove mock implementation cheat.py and fix style * Fix copies * Remove TF and Flax classes from init * Add back copy from fairseq data2vec_text.py and fix style * Update model name in docs/source/index.mdx to be CamelCase * Revert model name in table to lower-case to get check_table test to pass * Update documentation * Update src/transformers/models/data2vec/__init__.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/transformers/models/data2vec/convert_data2vec_original_pytorch_checkpoint_to_pytorch.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/transformers/models/data2vec/modeling_data2vec.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/transformers/models/data2vec/modeling_data2vec.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/transformers/models/data2vec/modeling_data2vec.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/transformers/models/data2vec/modeling_data2vec.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/transformers/models/auto/configuration_auto.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/data2vec/configuration_data2vec.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/data2vec/modeling_data2vec.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/data2vec/modeling_data2vec.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/data2vec/modeling_data2vec.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update tests/test_modeling_data2vec.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/data2vec/configuration_data2vec.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/data2vec/modeling_data2vec.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Copy-paste Data2VecConfig from BertConfig * Update config checkpoint to point to edugp/data2vec-nlp-base. Fix style and repo-consistency * Update config special tokens to match RoBERTa * Split multiple assertions and add individual error messages * Rename Data2VecModel to Data2VecForTextModel * Add Data2Vec to _toctree.yml * Rename Data2VecEmbeddings to Data2VecForTextEmbeddings * Add initial Data2VecForAudio model (unfinished). Only matching fairseq's implementation up to the feature encoder (before positional encoding). * finish audio model * finish audio file * add inputs to logits to data2vec' * Update names and fix style, quality and repo consistency * Remove Data2VecAudioForPretraining. Add tests for Data2VecAudio, mimicking the Wav2Vec2 test suite. Fix bias initialization in positional conv layers. Move back configurations for audio and text to separate files. * correct audio models * correct config auto * correct tok auto * delete unnecessary files * delete unnecessary files * Update utils/tests_fetcher.py * further renaming * make all tests pass * finish * remove useless test file * Update tests/test_modeling_common.py * Update utils/check_repo.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/transformers/models/data2vec/modeling_data2vec_text.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Move data2vec tests to new structure * Fix test imports for text tests * Remove fairseq files * Change paper link to arxiv * Modify Data2Vec documentation to reflect that the encoder is not shared across the audio and text models in the current implementation. * Update text model checkpoint to be facebook/data2vec-text-base * Add 'Copy from' statements and update paper links and docs * fix copy from statements * improve copied from * correct more copied from statements * finish copied from stuff * make style * add model to README * add to master Co-authored-by: Eduardo Gonzalez Ponferrada <eduardo@ferrumhealth.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2022-03-01 11:09:20 +01:00
Patrick von Platen	ddbb485c41	[TF-PT-Tests] Fix PyTorch - TF tests for different GPU devices (#15846 )	2022-02-28 15:46:46 -05:00
Sylvain Gugger	d1fcc90abf	Fix from_pretrained with default base_model_prefix (#15814 )	2022-02-24 11:43:51 +01:00
NielsRogge	57882177be	Add SimMIM (#15586 ) * Add first draft * Make model importable * Make SwinForMaskedImageModeling importable * Fix imports * Add missing inits * Add support for Swin * Fix bug * Fix bug * Fix another bug * Fix Swin MIM implementation * Fix default encoder stride * Fix Swin * Add print statements for debugging * Add image_size data argument * Fix Swin * Fix image_size * Add print statements for debugging * Fix print statement * Remove print statements * Improve reshaping of bool_masked_pos * Add support for DeiT, fix tests * Improve docstrings * Apply new black version * Improve script * Fix bug * Improve README * Apply suggestions from code review * Remove DS_Store and add to gitignore * Apply suggestions from code review + fix BEiT Flax * Revert BEiT changes * Improve README * Fix code quality * Improve README Co-authored-by: Niels Rogge <nielsrogge@Nielss-MBP.localdomain> Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>	2022-02-17 19:44:55 +01:00
Lysandre Debut	943e2aa036	Fix model equivalence tests (#15670 ) * Fix model equivalence tests * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2022-02-15 18:55:22 -05:00
Sylvain Gugger	1f60bc46f3	Make sure custom configs work with Transformers (#15569 ) * Make sure custom configs work with Transformers * Apply code review suggestions	2022-02-09 10:04:44 -05:00
Joao Gante	8406fa6dd5	Add TFSpeech2Text (#15113 ) * Add wrapper classes * convert inner layers to tf * Add TF Encoder and Decoder layers * TFSpeech2Text models * Loadable model * TF model with same outputs as PT model * test skeleton * correct tests and run the fixup * correct attention expansion * TFSpeech2Text past_key_values with TF format	2022-02-08 16:27:23 +00:00
Michael Benayoun	0fe17f375a	FX tracing improvement (#14321 ) * Change the way tracing happens, enabling dynamic axes out of the box * Update the tests and modeling xlnet * Add the non recording of leaf modules to avoid recording more values for the methods to record than what will be seen at tracing time (which would otherwise desynchronize the recorded values and the values that need to be given to the proxies during tracing, causing errors). * Comments and making tracing work for gpt-j and xlnet * Refactor things related to num_choices (and batch_size, sequence_length) * Update fx to work on PyTorch 1.10 * Postpone autowrap_function feature usage for later * Add copyrights * Remove unnecessary file * Fix issue with add_new_model_like * Apply suggestions	2022-02-07 22:25:33 +01:00
Sylvain Gugger	44b21f117b	Save code of registered custom models (#15379 ) * Allow dynamic modules to use relative imports * Work for configs * Fix last merge conflict * Save code of registered custom objects * Map strings to strings * Fix test * Add tokenizer * Rework tests * Tests * Ignore fixtures py files for tests * Tokenizer test + fix collection * With full path * Rework integration * Fix typo * Remove changes in conftest * Test for tokenizers * Add documentation * Update docs/source/custom_models.mdx Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Add file structure and file content * Add more doc * Style * Update docs/source/custom_models.mdx Co-authored-by: Suraj Patil <surajp815@gmail.com> * Address review comments Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Suraj Patil <surajp815@gmail.com>	2022-02-02 10:44:37 -05:00
Sylvain Gugger	33f36c869f	Add a main_input_name attribute to all models (#14803 ) * Add a main_input_name attribute to all models * Fix tests * Wtf Vs Code? * Update src/transformers/models/imagegpt/modeling_imagegpt.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Style * Fix copies Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2021-12-20 11:19:08 -05:00
NielsRogge	25156eb296	Rename ImageGPT (#14526 ) * Rename * Add MODEL_FOR_CAUSAL_IMAGE_MODELING_MAPPING	2021-11-29 10:19:11 +01:00
Sylvain Gugger	d83b0e0c07	Add a post init method to all models (#14431 ) * Add a post init method to all models * Fix tests * Fix last tests * Fix templates * Add comment * Forgot to save	2021-11-18 08:38:09 -05:00
Sylvain Gugger	040fd47162	Fix gradient_checkpointing backward compatibility (#14408 ) * Fix gradient_checkpointing backward compatibility * Remove needless line * make sure mask prob is big enough and length small enough * Fix tests Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>	2021-11-16 08:58:42 -05:00
Yih-Dar	be4a6c64dc	Add TFViTModel (#13778 ) * Start the work for TFViTModel * Convert to TF code - need to check in the follow up commits * Clean up model code * Expose TFViTModel * make style * make quality * Add test * make style & quality * Fix some imports * fix wrong usage - kwargs => * kwargs * Fix Conv2D weight loading (PT->TF) issue * Add tests for images with different sizes + fix model * Fix some common tests for TFViTModel * Use inputs instead of input_ids in test_compile_tf_model * Add a comment about transpose and Conv2D in convert_tf_weight_name_to_pt_weight_name * Avoid transpose in TFViT call * Fix Conv2D issue in load_tf2_weights_in_pytorch_model * Use tf.keras.layers.Conv2D instead of tf.nn.conv2d * Using simpler heuristic to detect Conv2D layer * Change convert_tf_weight_name_to_pt_weight_name to return TransposeType * Check tf_weight_shape is not None before using it * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * fix missing comma * fix input dtype Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2021-11-09 07:54:37 -05:00
Sylvain Gugger	dfb00bf644	Expand dynamic supported objects to configs and tokenizers (#14296 ) * Dynamic configs * Add config test * Better tests * Add tokenizer and test * Add to from_config * With save	2021-11-08 15:28:25 -05:00
Sylvain Gugger	558f8543ba	Update Transformers to huggingface_hub >= 0.1.0 (#14251 ) * Update Transformers to huggingface_hub >= 0.1.0 * Forgot to save... * Style * Fix test	2021-11-02 18:58:42 -04:00
NielsRogge	e20faa6f03	Add BeitForSemanticSegmentation (#14096 ) * Add first draft * Make forward pass work * Improve conversion script * Add notebook that checks if it works * Add BeitForSemanticSegmentation to the tests * More improvements * Make BeitForSemanticSegmentation consistent with Segformer * Small bug fix * Add BeitForSemanticSegmentation to docs * Make sure model doesn't output hidden states when the user doesn't want to * Make it possible to convert the large model * Fix issue * Fix conversion script for large model * Add auxiliary_head option to semantic segmentation model * Apply suggestions from @sgugger's review * Apply suggestions from code review * Fix failing test Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>	2021-11-01 19:55:45 +01:00
Sylvain Gugger	c28bc80bbb	Generalize problem_type to all sequence classification models (#14180 ) * Generalize problem_type to all classification models * Missing import * Deberta BC and fix tests * Fix template * Missing imports * Revert change to reformer test * Fix style	2021-10-29 10:32:56 -04:00
Patrick von Platen	0c3174c758	Add TF<>PT and Flax<>PT everywhere (#14047 ) * up * up * up * up * up * up * up * add clip * fix clip PyTorch * fix clip PyTorch * up * up * up * up * up * up * up	2021-10-25 23:55:08 +02:00
Li-Huai (Allan) Lin	234cfefbb0	Fix ignore_mismatched_sizes (#14085 ) * Fix * Style * Name * Fix tests * Style * Remove embed sizes checking * Disable some tests * Fix * Apply suggestion	2021-10-21 12:31:29 -04:00
Patrick von Platen	dca6796876	[Gradient checkpointing] Correct disabling `find_unused_parameters` in Trainer when gradient checkpointing is enabled (#13961 ) * up * correct test	2021-10-11 15:34:01 +02:00
Michael Benayoun	d4e4efce68	Initial support for symbolic tracing with torch.fx allowing dynamic axes (#13579 ) * Symbolic trace dynamic axes support for BERT like models (albert, bert, distilbert, mobilebert, electra, megatron-bert) * Sanity checks before tracing that make sure the model to trace is supported * Adapted to PyTorch 1.9 Co-authored-by: Michael Benayoun <michael@huggingface.co>	2021-10-05 14:19:47 +02:00
Sylvain Gugger	27d4639779	Make gradient_checkpointing a training argument (#13657 ) * Make gradient_checkpointing a training argument * Update src/transformers/modeling_utils.py Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> * Update src/transformers/configuration_utils.py Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> * Fix tests * Style * document Gradient Checkpointing as a performance feature * Small rename * PoC for not using the config * Adapt BC to new PoC * Forgot to save * Rollout changes to all other models * Fix typo Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> Co-authored-by: Stas Bekman <stas@stason.org>	2021-09-22 07:51:38 -04:00
Sylvain Gugger	002a078aff	Dynamically load model code from the Hub (#13467 ) * Dynamic model * Use defensive flag * Style * Doc and arg rename * Arg rename * Add tests * Apply suggestions from code review Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Apply suggestions from code review Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Address review comments * Apply suggestions from code review Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Lysandre Debut <lysandre@huggingface.co>	2021-09-20 13:59:21 -04:00
Patrick von Platen	95f933ea85	[Pretrained Model] Add resize_position_embeddings (#13559 ) * finish * delete bogus file * correct some stuff * finish * finish	2021-09-15 19:03:56 +02:00
Sylvain Gugger	74b3344fbc	Clean up test file	2021-08-31 07:06:49 -04:00
Sylvain Gugger	8b2de0e483	Tests fetcher tests (#13340 ) * Incorporate tests dependencies in tests_fetcher * Harder modif * Debug * Loop through all files * Last modules * Remove debug statement	2021-08-31 03:57:01 -04:00
Stas Bekman	5c6eca71a9	fix `AutoModel.from_pretrained(..., torch_dtype=...)` (#13209 ) * fix AutoModel.from_pretrained(..., torch_dtype=...) * fix to_diff_dict * add better test * torch is not always available when a model has self.torch_dtype	2021-08-24 11:43:41 +02:00
Lysandre Debut	3290315a2a	Fix AutoModel tests (#12733 )	2021-07-15 09:06:12 -04:00
Sylvain Gugger	90178b0cef	Add option to load a pretrained model with mismatched shapes (#12664 ) * Add option to load a pretrained model with mismatched shapes * Fail at loading when mismatched shapes in Flax * Fix tests * Update src/transformers/modeling_flax_utils.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Address review comments Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2021-07-13 10:15:15 -04:00
Stas Bekman	2d1d92181a	[roberta] fix lm_head.decoder.weight ignore_key handling (#12446 ) * fix lm_head.decoder.weight ignore_key handling * fix the mutable class variable * Update src/transformers/models/roberta/modeling_roberta.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * replicate the comment * make deterministic Co-authored-by: Lysandre Debut <lysandre@huggingface.co>	2021-07-01 10:31:19 -07:00
Stas Bekman	7682e97702	[models] respect dtype of the model when instantiating it (#12316 ) * [models] respect dtype of the model when instantiating it * cleanup * cleanup * rework to handle non-float dtype * fix * switch to fp32 tiny model * improve * use dtype.is_floating_point * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * fix the doc * recode to use explicit torch_dtype_auto_detect, torch_dtype args * docs and tweaks * docs and tweaks * docs and tweaks * merge 2 args, add docs * fix * fix * better doc * better doc Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2021-06-28 20:11:21 -07:00
Lysandre Debut	8ef62ec9e1	Fix torchscript tests (#12336 ) * Fix torchscript tests * Better test * Remove bogus print	2021-06-24 09:52:28 -04:00
Michael Benayoun	986ac03e37	changed modeling_fx_utils.py to utils/fx.py for clarity (#12326 ) Co-authored-by: Michael Benayoun <michael@huggingface.co>	2021-06-23 18:16:24 +02:00
Sylvain Gugger	53c60babe4	Clean push to hub API (#12187 ) * Clean push to hub API * Create working dir if it does not exist * Different tweak * New API + all models + test Flax * Adds the Trainer clean up * Update src/transformers/file_utils.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Address review comments * (nit) output types * No need to set clone_from when folder exists * Update src/transformers/trainer.py Co-authored-by: Julien Chaumond <julien@huggingface.co> * Add generated_from_trainer tag * Update to new version * Fixes Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Julien Chaumond <julien@huggingface.co> Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>	2021-06-23 10:11:19 -04:00
Stas Bekman	372ab9cd6d	[style] consistent nn. and nn.functional: part 3 `tests` (#12155 ) * consistent nn. and nn.functional: p3 templates * restore	2021-06-14 12:18:22 -07:00
NielsRogge	d3eacbb829	Add DETR (#11653 ) * Squash all commits of modeling_detr_v7 branch into one * Improve docs * Fix tests * Style * Improve docs some more and fix most tests * Fix slow tests of ViT, DeiT and DETR * Improve replacement of batch norm * Restructure timm backbone forward * Make DetrForSegmentation support any timm backbone * Fix name of output * Address most comments by @LysandreJik * Give better names for variables * Conditional imports + timm in setup.py * Address additional comments by @sgugger * Make style, add require_timm and require_vision to testsé * Remove train_backbone attribute of DetrConfig, add methods to freeze/unfreeze backbone * Add png files to fixtures * Fix type hint * Add timm to workflows * Add `BatchNorm2d` to the weight initialization * Fix retain_grad test * Replace model checkpoints by Facebook namespace * Fix name of checkpoint in test * Add user-friendly message when scipy is not available * Address most comments by @patrickvonplaten * Remove return_intermediate_layers attribute of DetrConfig and simplify Joiner * Better initialization * Scipy is necessary to get sklearn metrics * Rename TimmBackbone to DetrTimmConvEncoder and rename DetrJoiner to DetrConvModel * Make style * Improve docs and add 2 community notebooks Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>	2021-06-09 11:51:13 -04:00
Lysandre Debut	db0b2477cc	Add some tests to the slow suite #11860	2021-05-25 04:06:06 -04:00
Michael Benayoun	f4a0d6ff86	A cleaner and more scalable implementation of symbolic tracing (#11763 ) Cleaner and more scalable implementation of symbolic tracing with torch.fx, and provides support for new architectures: - ALBERT - DistilBERT - MobileBERT - MegatronBERT - GPT2 - GPT Neo Co-authored-by: Michael Benayoun <michael@huggingface.co>	2021-05-20 18:02:29 +02:00
Sylvain Gugger	469384a777	Fix regression in regression (#11785 ) * Fix regression in regression * Add test	2021-05-20 09:55:13 -04:00
Michael Benayoun	86d5fb0b36	Experimental symbolic tracing feature with torch.fx for BERT, ELECTRA and T5 (#11475 ) Symbolic tracing feature for BERT, ELECTRA and T5 Co-authored-by: Michael Benayoun <michael@huggingface.co> Co-authored-by: Stas Bekman <stas@stason.org> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2021-05-14 20:57:30 +02:00
Volodymyr Byno	218d552f30	Fix loading the best model on the last stage of training (#11718 )	2021-05-13 16:11:12 -04:00
Sylvain Gugger	f13f1f8fb8	Test checkpointing (#11682 ) * Add test and see where CI is unhappy * Load with strict=False	2021-05-11 12:02:48 -04:00
Vasudev Gupta	dc3f6758cf	Add BigBirdPegasus (#10991 ) * init bigbird pegasus * add debugging nb ; update config * init conversion * update conversion script * complete conversion script * init forward() * complete forward() * add tokenizer * add some slow tests * commit current * fix copies * add docs * add conversion script for bigbird-roberta-summarization * remove TODO * small fixups * correct tokenizer * add bigbird core for now * fix config * fix more * revert pegasus-tokenizer back * make style * everything working for pubmed; yayygit status * complete tests finally * remove bigbird pegasus tok * correct tokenizer * correct tests * add tokenizer files * finish make style * fix test * update * make style * fix tok utils base file * make fix-copies * clean a bit * small update * fix some suggestions * add to readme * fix a bit, clean tests * fix more tests * Update src/transformers/__init__.py * Update src/transformers/__init__.py * make fix-copies * complete attn switching, auto-padding left * make style * fix auto-padding test * make style * fix batched attention tests * put tolerance at 1e-1 for stand-alone decoder test * fix docs * fix tests * correct slow tokenizer conversion * Apply suggestions from code review Co-authored-by: Suraj Patil <surajp815@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * complete remaining suggestions * fix test Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Suraj Patil <surajp815@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2021-05-07 09:27:43 +02:00
Patrick von Platen	3e3e41ae20	Pytorch - Lazy initialization of models (#11471 ) * lazy_init_weights * remove ipdb * save int * add necessary code * remove unnecessary utils * Update src/transformers/models/t5/modeling_t5.py * clean * add tests * correct * finish tests * finish tests * fix some more tests * fix xlnet & transfo-xl * fix more tests * make sure tests are independent * fix tests more * finist tests * final touches * Update src/transformers/modeling_utils.py * Apply suggestions from code review * Update src/transformers/modeling_utils.py Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> * Update src/transformers/modeling_utils.py Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> * clean tests * give arg positive name * add more mock weights to xlnet Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>	2021-05-05 17:22:20 +02:00
abhishek thakur	c40c7e213b	Add multi-class, multi-label and regression to transformers (#11012 ) * add to bert * review comments * Update src/transformers/configuration_utils.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/configuration_utils.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * self.config.problem_type * fix style * fix * fin * fix * update doc * fix * test * Test more problem types * Update src/transformers/configuration_utils.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * fix * remove * fix * quality * make fix-copies * remove test Co-authored-by: abhishek thakur <abhishekkrthakur@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>	2021-05-04 02:23:40 -04:00
Patrick von Platen	f748bd4242	[Flax] Add docstrings & model outputs (#11498 ) * add attentions & hidden states * add model outputs + docs * finish docs * finish tests * finish impl * del @ * finish * finish * correct test * apply sylvains suggestions * Update src/transformers/models/bert/modeling_flax_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * simplify more Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2021-04-29 12:04:51 +02:00
Patrick von Platen	32dbb2d954	make style (#11442 )	2021-04-26 13:50:34 +02:00
Daniel Stancl	e3ff165aa5	Fix cross-attention head mask for Torch encoder-decoder models (#10605 ) * Fix cross-attention head mask for Torch BART models * Fix head masking for cross-attention module for the following models: BART, Blenderbot, Blenderbot_small, M2M_100, Marian, MBart, Pegasus * Enable test_headmasking for M2M_100 model * Fix cross_head_mask for FSMT, LED and T5 * This commit fixes `head_mask` for cross-attention modules in the following models: FSMT, LED, T5 * It also contains some smaller changes in doc so that it is be perfectly clear the shape of `cross_head_mask` is the same as of `decoder_head_mask` * Update template * Fix template for BartForCausalLM * Fix cross_head_mask for Speech2Text models * Fix cross_head_mask in templates * Fix args order in BartForCausalLM template * Fix doc in BART templates * Make more explicit naming * `cross_head_mask` -> `cross_attn_head_mask` * `cross_layer_head_mask` -> `cross_attn_layer_head_mask` * Fix doc * make style quality * Fix speech2text docstring	2021-04-23 18:58:06 +02:00
Sylvain Gugger	bf2e0cf70b	Trainer push to hub (#11328 ) * Initial support for upload to hub * push -> upload * Fixes + examples * Fix torchhub test * Torchhub test I hate you * push_model_to_hub -> push_to_hub * Apply mixin to other pretrained models * Remove ABC inheritance * Add tests * Typo * Run tests * Install git-lfs * Change approach * Add push_to_hub to all * Staging test suite * Typo * Maybe like this? * More deps * Cache * Adapt name * Quality * MOAR tests * Put it in testing_utils * Docs + torchhub last hope * Styling * Wrong method * Typos * Update src/transformers/file_utils.py Co-authored-by: Julien Chaumond <julien@huggingface.co> * Address review comments * Apply suggestions from code review Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Julien Chaumond <julien@huggingface.co> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2021-04-23 09:17:37 -04:00
Sylvain Gugger	81009b7a5c	Replace error by warning when loading an architecture in another (#11207 ) * Replace error by warning when loading an architecture in another * Style * Style again * Add a test * Adapt old test	2021-04-13 10:33:52 -04:00
Sylvain Gugger	ba8b1f4754	Add support for multiple models for one config in auto classes (#11150 ) * Add support for multiple models for one config in auto classes * Use get_values everywhere * Prettier doc	2021-04-08 18:41:36 -04:00
NielsRogge	30677dc743	Add Vision Transformer and ViTFeatureExtractor (#10950 ) * Squash all commits into one * Update ViTFeatureExtractor to use image_utils instead of torchvision * Remove torchvision and add Pillow * Small docs improvement * Address most comments by @sgugger * Fix tests * Clean up conversion script * Pooler first draft * Fix quality * Improve conversion script * Make style and quality * Make fix-copies * Minor docs improvements * Should use fix-copies instead of manual handling * Revert "Should use fix-copies instead of manual handling" This reverts commit `fd4e591bce`. * Place ViT in alphabetical order Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2021-04-01 11:16:05 -04:00
Sylvain Gugger	acc3bd9d2a	Enforce string-formatting with f-strings (#10980 ) * First third * Styling and fix mistake * Quality * All the rest * Treat %s and %d * typo * Missing ) * Apply suggestions from code review Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Lysandre Debut <lysandre@huggingface.co>	2021-03-31 10:00:27 -04:00
Vimarsh Chaturvedi	094afa515d	from_pretrained: check that the pretrained model is for the right model architecture (#10586 ) * Added check to ensure model name passed to from_pretrained and model are the same * Added test to check from_pretrained throws assert error when passed an incompatiable model name * Modified assert in from_pretrained with f-strings. Modified test to ensure desired assert message is being generated * Added check to ensure config and model has model_type * Fix FlauBERT heads Co-authored-by: vimarsh chaturvedi <vimarsh chaturvedi> Co-authored-by: Stas Bekman <stas@stason.org> Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>	2021-03-18 12:51:42 -04:00
Patrick von Platen	0234de8418	Add Fine-Tuning for Wav2Vec2 (#10145 ) * add encode labels function to tokenizer * start adding finetuning * init dropout * upload * correct convert script * apply changes * fix second typo * make first dummy training run * adapt convert script * push confg for comparison * remove conf * finish training * adapt data collator * add research folder * update according to fairseq feedback * some minor corrections * refactor masking indices a bit * some minor changes * clean tokenizer * finish clean-up * remove previous logic * update run script * correct training * finish changes * finish model * correct bug * fix training a bit more * add some tests * finish gradient checkpointing * finish example * correct gradient checkpointing * improve tokenization method * revert changes in tokenizer * revert general change * adapt fine-tuning * update * save intermediate test * Update README.md * finish finetuning * delete conversion script * Update src/transformers/models/wav2vec2/configuration_wav2vec2.py * Update src/transformers/models/wav2vec2/processing_wav2vec2.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * finish wav2vec2 script * finish wav2vec2 fine-tuning * finalize test * correct test * adapt tests * finish * remove test file Co-authored-by: Lysandre Debut <lysandre@huggingface.co>	2021-03-01 12:13:17 +03:00
Daniel Stancl	71bdc076dd	Add head_mask and decoder_head_mask to PyTorch LED (#9856 ) * Add {decoder_,}head_mask to LED * Fix create_custom_forward signatue in encoder * Add head_mask to longformer * Add head_mask to longformer to fix dependencies of LED on Longformer. * Not working yet * Add mising one input in longofrmer_modeling.py * make fix-copies	2021-02-02 11:06:52 -08:00
Patrick von Platen	12c1b5b8f4	fix test (#9669 )	2021-01-19 09:06:24 +01:00
Daniel Stancl	357fb1c5d8	Add head_mask/decoder_head_mask for BART (#9569 ) * Add head_mask/decoder_head_mask for BART This branch implement head_mask and decoder_head_mask for BART-based models. Full list below: - BART - MBart - Blenderbot - BlenderbotSmall - Marian - Pegasus Everything is accompanied with updated testing. * Fix test_headmasking for BART models * Fix text_headmasking for BART-like models which has only 2 layers in each modules. The condition ``` self.assertNotEqual(attentions[1][..., 0, :, :].flatten().sum().item(), 0.0) ``` is, therefore, invalid for encoder-decoder models considering the `head_mask` ``` head_mask = torch.ones( self.model_tester.num_hidden_layers, self.model_tester.num_attention_heads, device=torch_device, ) head_mask[0, 0] = 0 head_mask[-1, :-1] = 0 ``` specified in the `test_headmasking` test/function. * Adjust test_modeling_common.py to reflect T5 input args * Update tests/test_modeling_common.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * make style * make fix-copies Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2021-01-18 13:35:22 +01:00
Stas Bekman	143289dcf7	[test_model_parallelization] multiple fixes (#9354 )	2021-01-04 12:09:12 -08:00
Patrick von Platen	61443cd7d9	[GPT2] Correct gradient checkpointing (#9308 ) * correct gpt2 * fix gpt2 * fix use_cache ordering * correct past tolerance * fix for all cases * style	2020-12-25 23:28:12 +01:00
TobiasNorlund	08abdabda1	Fixed beam search generation for GPT2 and T5 (#9219 )	2020-12-21 08:05:23 -05:00


237 Commits