Commit Graph

311 Commits

Author SHA1 Message Date
Patrick von Platen
b20147a3c8
[Variant] Make sure variant files are not incorrectly deleted (#21562)
* [Variant] Make sure variant files are not incorrectly deleted

* Apply suggestions from code review

* fix
2023-02-10 15:44:51 +01:00
Sylvain Gugger
04b2f13c37
🚨🚨🚨 Enforce single model initialization (#21431)
* Enforce single model initialization

* Add OneFormer example for problem 3

* Do it the Stas way

* Actually rename the uses...

* Rewrite test

* Try to change the test this way

* Fix all init slow/fast tests

* Break connection

* Fix more tests

* Fix test for initialization

* Remove custom test

* Quality

* Fix last failing tests

* The end?
2023-02-09 15:46:26 -05:00
Sylvain Gugger
2020ac4bd6
Fix from_pretrained API with config and state_dict (#21542) 2023-02-09 15:44:02 -05:00
Joao Gante
0d33381fad
Tag tests as slow (#21537)
begone slow tests
2023-02-09 14:46:15 +00:00
Sylvain Gugger
6f79d26442
Update quality tooling for formatting (#21480)
* Result of black 23.1

* Update target to Python 3.7

* Switch flake8 to ruff

* Configure isort

* Configure isort

* Apply isort with line limit

* Put the right black version

* adapt black in check copies

* Fix copies
2023-02-06 18:10:56 -05:00
Patrick von Platen
90cddfa824
Add variant to transformers (#21332)
* Bump onnx in /examples/research_projects/decision_transformer

Bumps [onnx](https://github.com/onnx/onnx) from 1.11.0 to 1.13.0.
- [Release notes](https://github.com/onnx/onnx/releases)
- [Changelog](https://github.com/onnx/onnx/blob/main/docs/Changelog.md)
- [Commits](https://github.com/onnx/onnx/compare/v1.11.0...v1.13.0)

---
updated-dependencies:
- dependency-name: onnx
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* adapt

* finish

* Update examples/research_projects/decision_transformer/requirements.txt

* up

* add tests

* Apply suggestions from code review

Co-authored-by: Lucain <lucainp@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* fix test

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Lucain <lucainp@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2023-02-01 09:21:52 +01:00
Yih-Dar
4e41b87e3d
Use model_class.__name__ and compare against XXX_MAPPING_NAMES (#21304)
* update

* update all

* clean up

* make quality

* clean up

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-01-26 11:31:31 +01:00
Joao Gante
1eda4a4102
Generate: save generation config with the models' .save_pretrained() (#21264) 2023-01-23 16:21:44 +00:00
Susnato Dhar
b5be744d3c
Fixed issue #21039 (#21062)
Fixed issue #21039 and added test for low_cpu_mem_usage
2023-01-12 10:03:13 +01:00
Yih-Dar
5fa0b17c3d
[Past CI] 🔥 Leave Past CI failures in the past 🔥 (#20861)
* torch.jit._state

* Fix past CI

* Fix for perceiver

* Fix REALM

* Fix for Bloom

* Fix for SwinMode

* Fix for TrajectoryTransformerModel

* Fix for test_wav2vec2_with_lm

* make style

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-12-27 18:37:25 +01:00
NielsRogge
11745b4e45
[Tests] Improve test_attention_outputs (#20701)
* Improve tests

* Improve TF tests

* Apply suggestion

* Fix test

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-12-14 14:41:40 +01:00
NielsRogge
0bae286de9
[AutoBackbone] Improve API (#20407)
* Add hidden states and attentions to backbone outputs

* Update ResNet

* Fix more tests

* Debug test

* Fix test_determinism

* Fix test_save_load

* Remove file

* Disable fx tests

* Test

* Add fx support for backbones

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-11-28 17:20:24 +01:00
NielsRogge
4973d2a04c
Add Audio Spectogram Transformer (#19981)
* First draft

* Make conversion script work

* Add id2label mapping, run code quality

* Fix copies

* Add first draft of feature extractor

* Update conversion script to use feature extractor

* Make more tests pass

* Add docs

* update input_features to input_values + pad by default to max length

* Fix doc tests

* Add feature extractor tests

* Add proper padding/truncation to feature extractor

* Add support for conversion of all audioset checkpoints

* Improve docs and extend conversion script

* Fix README

* Rename spectogram to spectrogram

* Fix copies

* Add integration test

* Remove dummy conv

* Update to ast

* Update organization

* Fix init

* Rename model to AST

* Add require_torchaudio annotator

* Move import of ASTFeatureExtractor under a is_speech_available

* Fix rebase

* Add pipeline config

* Update name of classifier head

* Rename time_dimension and frequency_dimension for clarity

* Remove print statement

* Fix pipeline test

* Fix pipeline test

* Fix index table

* Fix init

* Fix conversion script

* Rename to ForAudioClassification

* Fix index table

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-11-21 18:58:54 +01:00
Yih-Dar
536e60d2c7
mark test_save_load_fast_init_from_base as is_flaky (#20200)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-14 18:51:33 +01:00
Nicolas Patry
bac2d29a80
Attempting to test automatically the _keys_to_ignore. (#20042)
* Attempting to test automatically the `_keys_to_ignore`.

* Style.

* First fix pass.

* Moving test on its own.

* Another batch.

* Second round removing BatchNorm

* Fixing layoutlmv{2,3} + support older Python.

* Disable miss missing warning.

* Removing dodgy additions.

* Big pass.

* mbart.

* More corrections.

* Fixup.

* Updating test_correct_missing_keys

* Add escape hatch for when the head has no extra params so doesn't need

the missing keys check.

* Fixing test.

* Greener.

* Green ! (except for weird splinter bug).

* Adding a test about `named_parameters` usage.

* Shorten message.

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* After rebase modifications.

* More explicit condition checking.

* Fixing slow tests issues.

* Remove extra pdb.

* Remove print.

* Attempt to make failure consistent + fixing roc_bert.

* Removing the seed  (all tests passing with it).

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-11-09 16:03:36 +01:00
Michael Benayoun
9080607b2c
Fixed torch.finfo issue with torch.fx (#20040) 2022-11-03 16:14:44 +01:00
Sylvain Gugger
49b77b89ea
Quality (#20002) 2022-11-02 09:53:37 -04:00
Younes Belkada
7629656926
accelerate support for RoBERTa family (#19906) 2022-10-26 22:41:53 +02:00
Yih-Dar
688c3e8e40
Update max_diff in test_save_load_fast_init_to_base (#19849)
* Fix test_save_load_fast_init_to_base

* Fix test_save_load_fast_init_to_base

* update

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-10-26 17:09:47 +02:00
Yih-Dar
3a1aeea3c5
Fix CTRL test_torchscrip_xxx CI by updating _create_and_check_torchscript (#19786)
* Run inputs before trace

* Run inputs before trace

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-10-21 16:23:13 +02:00
Sylvain Gugger
3e2dd7f92d
Poc to use safetensors (#19175)
* Poc to use safetensors

* Typo

* Final version

* Add tests

* Save with the right name!

* Update tests/test_modeling_common.py

Co-authored-by: Julien Chaumond <julien@huggingface.co>

* Support for sharded checkpoints

* Test from Hub part 1

* Test from hub part 2

* Fix regular checkpoint sharding

* Bump for fixes

Co-authored-by: Julien Chaumond <julien@huggingface.co>
2022-09-30 10:58:04 -04:00
Younes Belkada
4d0f8c05f5
Add accelerate support for ViLT (#18683) 2022-09-22 13:14:39 +02:00
Sylvain Gugger
ca485e562b
Add tests for legacy load by url and fix bugs (#19078) 2022-09-16 23:20:02 +02:00
Ankur Goyal
2ef7742117
Add DocumentQuestionAnswering pipeline (#18414)
* [WIP] Skeleton of VisualQuestionAnweringPipeline extended to support LayoutLM-like models

* Fixup

* Use the full encoding

* Basic refactoring to DocumentQuestionAnsweringPipeline

* Cleanup

* Improve args, docs, and implement preprocessing

* Integrate OCR

* Refactor question_answering pipeline

* Use refactored QA code in the document qa pipeline

* Fix tests

* Some small cleanups

* Use a string type annotation for Image.Image

* Update encoding with image features

* Wire through the basic docs

* Handle invalid response

* Handle empty word_boxes properly

* Docstring fix

* Integrate Donut model

* Fixup

* Incorporate comments

* Address comments

* Initial incorporation of tests

* Address Comments

* Change assert to ValueError

* Comments

* Wrap `score` in float to make it JSON serializable

* Incorporate AutoModeLForDocumentQuestionAnswering changes

* Fixup

* Rename postprocess function

* Fix auto import

* Applying comments

* Improve docs

* Remove extra assets and add copyright

* Address comments

Co-authored-by: Ankur Goyal <ankur@impira.com>
2022-09-07 13:38:49 -04:00
Lucain
169b8cde47
Fix mock in test_cached_files_are_used_when_internet_is_down (#18804) 2022-08-29 15:56:08 +02:00
Yih-Dar
8b67f20935
Fix memory leak issue in torch_fx tests (#18547)
Co-authored-by: Lysandre Debut <hi@lysand.re>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-08-29 11:43:20 +02:00
Sylvain Gugger
5cd4032368
Use new huggingface_hub tools for download models (#18438)
* Draft new cached_file

* Initial draft for config and model

* Small fixes

* Fix first batch of tests

* Look in cache when internet is down

* Fix last tests

* Bad black, not fixing all quality errors

* Make diff less

* Implement change for TF and Flax models

* Add tokenizer and feature extractor

* For compatibility with main

* Add utils to move the cache and auto-do it at first use.

* Quality

* Deal with empty commit shas

* Deal with empty etag

* Address review comments
2022-08-05 10:12:40 -04:00
NielsRogge
f9a0008d2d
Add VideoMAE (#17821)
* First draft

* Add VideoMAEForVideoClassification

* Improve conversion script

* Add VideoMAEForPreTraining

* Add VideoMAEFeatureExtractor

* Improve VideoMAEFeatureExtractor

* Improve docs

* Add first draft of model tests

* Improve VideoMAEForPreTraining

* Fix base_model_prefix

* Make model take pixel_values of shape (B, T, C, H, W)

* Add loss computation of VideoMAEForPreTraining

* Improve tests

* Improve model testsé

* Make all tests pass

* Add VideoMAE to main README

* Add tests for VideoMAEFeatureExtractor

* Add integration test

* Improve conversion script

* Rename patch embedding class

* Remove VideoMAELayer from init

* Update design of patch embeddings

* Improve comments

* Improve conversion script

* Improve conversion script

* Add conversion of pretrained model

* Add loss verification of pretrained model

* Add loss verification of unnormalized targets

* Add integration test for pretraining model

* Apply suggestions from code review

* Fix bug to make feature extractor resize only shorter edge

* Address more comments

* Improve normalization of videos

* Add doc examples

* Move constants to dedicated script

* Remove scripts

* Transfer checkpoints, fix docs

* Update script

* Update image mean and std

* Fix doc tests

* Set return_tensors to NumPy by default

* Revert the previous change

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-08-04 18:02:55 +02:00
Sylvain Gugger
01db72abd4
Rewrite push_to_hub to use upload_files (#18366)
* Rewrite push_to_hub to use upload_files

* Adapt the doc a bit

* Address review comments and clean doc
2022-08-01 12:07:30 -04:00
Mikkel Denker
70e7d1d656
Fixes torch jit tracing for LayoutLMv2 model (re-open) (#18313)
* Fixes torch jit tracing for LayoutLMv2 model.
Pytorch seems to reuse memory for input_shape which caused a mismatch in shapes later in the forward pass.

* Fixed code quality

* avoid unneeded allocation of vector for shape
2022-07-27 06:38:40 -04:00
Patrick von Platen
3bb6356d4d
[From pretrained] Allow download from subfolder inside model repo (#18184)
* add first generation tutorial

* [from_pretrained] Allow loading models from subfolders

* remove gen file

* add doc strings

* allow download from subfolder

* add tests

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* apply comments

* correct doc string

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-07-19 11:53:53 +02:00
Yih-Dar
6561fbcc6e
Update TF(Vision)EncoderDecoderModel PT/TF equivalence tests (#18073)
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-07-18 15:29:14 +02:00
Sylvain Gugger
df8e6804c0
Offload fixes (#17810)
* Offload fixes

* Add a test
2022-06-22 12:23:07 -04:00
Yih-Dar
f47afefb21
Use 5e-5 For BigBird PT/Flax equivalence tests (#17780)
* rename to check_pt_flax_outputs

* update check_pt_flax_outputs

* use 5e-5 for BigBird PT/Flax test

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-06-21 17:55:26 +02:00
Lysandre Debut
6a5272b205
Prepare transformers for v0.8.0 huggingface-hub release (#17716)
* Prepare CI for v0.8.0

* pin hfh (revert before merge)

* Revert "pin hfh (revert before merge)"

This reverts commit a0103140e1.

* Test rc3

* Test latest rc

* Unpin to the RC

Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
2022-06-21 11:51:18 -04:00
Stas Bekman
75343de938
[modeling_utils] torch_dtype/auto floating dtype fixes (#17614)
* [modeling_utils] torch_dtype/auto fixes

* add test

* apply suggestions

* add missing fallback

* Renaming things

* Use for else

Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
2022-06-09 10:18:26 -07:00
amyeroberts
dfc76b2542
has_attentions - consistent test skipping logic and tf tests (#17495) 2022-06-09 09:50:03 +02:00
Michael Benayoun
5c8f601007
Fx support for Deberta-v[1-2], Hubert and LXMERT (#17539)
* Support for deberta and deberta-v2

* Support for LXMert

* Support for Hubert

* Fix for pt1.11

* Trigger CI
2022-06-07 18:05:20 +02:00
Sylvain Gugger
8343901263
Fix all offload and MP tests (#17533) 2022-06-03 09:59:13 -04:00
Sylvain Gugger
4390151ba2
Fix MP and CPU offload tests for Funnel and GPT-Neo (#17503) 2022-06-01 09:59:40 -04:00
Sylvain Gugger
567d9c061d
Disk offload fix (#17428)
* Fix offload to disk for big models

* Add test

* Fix test for other models
2022-05-31 09:16:18 -04:00
Michael Benayoun
28d0048218
Fx support for multiple model architectures (#17393)
* Support for Bart and LayoutLM, and partial support for XLNet

* Support for mbart

* A lot of new models supported

* Support for other models

* LayoutLM fix

* Use strings instead of classes
2022-05-31 10:02:55 +02:00
Sylvain Gugger
98f6e1ee87
Fix model parallelism test (#17439) 2022-05-26 09:57:12 -04:00
Sylvain Gugger
31484afbed
Add test for new model parallelism features (#17401) 2022-05-25 10:51:27 -04:00
Sylvain Gugger
56f50590d5
Use Accelerate in from_pretrained for big model inference (#17341)
* Initial work

* More or less finished with first draft

* Update src/transformers/modeling_utils.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update src/transformers/modeling_utils.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Fix randomly initialized weights

* Update src/transformers/modeling_utils.py

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

* Address review comments

* Rename DeepSpeed folder to temporarily fix the test issue?

* Revert to try if Accelerate fix works

* Use latest Accelerate release

* Quality and fixes

* Style

* Quality

* Add doc

* Test + fix

* More blocks

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2022-05-23 14:32:21 -04:00
Michael Benayoun
2e7e4280aa
Traced models serialization and torchscripting fix (#17206)
* Fix torch.jit.script and pickling issues

* Fix get_attr issues

* Fix import in function

* Fix GPT-J and T5 tracing for torch=1.11

* Gate graph surgery on torch version

* Modeling minor changes to enable TorchScripting

* Model serialization / deserialization test

* Remove _assert_is_none users
2022-05-23 17:50:40 +02:00
Kyungmin Lee
f0395cf58e
Fix test_model_parallelization (#17249)
* Fix test_model_parallelization

* Modify
2022-05-16 23:30:49 +02:00
Sylvain Gugger
afe5d42d8d
Black preview (#17217)
* Black preview

* Fixup too!

* Fix check copies

* Use the same version as the CI

* Bump black
2022-05-12 16:25:55 -04:00
Michael Benayoun
8c7481f35c
ViT and Swin symbolic tracing with torch.fx (#17182)
* Support tracing for ViT

* Swin support

* Fix copies

* Fix type annotation issue

* Removed unused import
2022-05-12 10:42:27 +02:00
Yih-Dar
e6d23a4b9b
Improve test_pt_tf_model_equivalence on PT side (#16731)
* Update test_pt_tf_model_equivalence on PT side

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-04-19 21:13:27 +02:00
Stas Bekman
5da33f8729
[modeling utils] revamp from_pretrained(..., low_cpu_mem_usage=True) + tests (#16657)
* add low_cpu_mem_usage tests

* wip: revamping

* wip

* install /usr/bin/time

* wip

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* fix assert

* put the wrapper back

* cleanup; switch to bert-base-cased

* Trigger CI

* Trigger CI
2022-04-14 18:10:05 -07:00
Yih-Dar
c04619ecf3
Enable more test_torchscript (#16679)
* update _create_and_check_torchscript

* Enable test_torchscript

* clear_class_registry

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-04-11 18:23:35 +02:00
Yih-Dar
3918d6a9d6
Reduce memory leak in _create_and_check_torchscript (#16691)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-04-11 18:22:28 +02:00
Yih-Dar
2109afae71
Rename the method test_torchscript (#16693)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-04-11 18:21:45 +02:00
NielsRogge
979b039c89
Add DPT (#15991)
* First draft

* More improvements

* Add fusion blocks

* Make conversion script work for dpt_large

* Make conversion script work

* Improve implementation

* Improve conversion script

* Add DPTForSemanticSegmentation

* Make conversion work for semantic segmentation

* Add tests

* Remove print statements

* First draft

* Redesign neck

* Improve tests

* Improve implementation some more

* Make neck output list of tensors

* Improve neck and feature extractor

* Fix integration tests

* Make more tests pass

* Make all tests pass

* Add missing config archive map

* Add in_index attribute to make heads accept list of tensors

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply some more suggestions

* Add copied from statements

* Remove assert

* Apply suggestions from code review

* Apply suggestions from code review

* Remove DPTInterpolate in favor of nn.Upsample

* Add comments

* Apply suggestions from code review

* Apply suggestions from code review

* Add proposed design

* Update design

* Add DPTReassembleLayer

* Add DPTFeatureFusionStage

* Apply more suggestions from code review

* Apply suggestions from code review

* Apply suggestions from code review

* Fix rebase

* Update in_index and out_indices

* Fix conversion script

* Fix code quality

* Add model to toctree and use DepthEstimatorOutput

* Fix rebase

* Fix code examples

* Improve code

* Fix copied from statements

* Apply suggestions from code review

* Remove compute_loss method

* Apply suggestions from code review

* Fix documentation tests file

* Remove test.py file

* Improve doc example

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Niels Rogge <nielsrogge@nielss-mbp.home>
2022-03-28 16:28:10 +02:00
Sylvain Gugger
b473617d63
Checkpoint sharding (#16343)
* Sharded checkpoint support

* Handle distant sharded checkpoints

* Add tests

* TODO is done

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Fix docstring

* Add example and format

* Address review comments

* More review comments

* End of merge

* Revert unintentional change

* VsCode what did you do?

* Style

* Changes

* Address final comments

* Quality

* Moar tests

* Move import beneath is_pt_available

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2022-03-25 11:59:25 -04:00
Yih-Dar
f571dc20ac
Update PT Flax equivalence tests in PT test file (#16280)
* update PT/Flax equivalence tests on PT side

* overwrite check_outputs in BigBirdModelTest

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-03-24 14:45:30 +01:00
Sylvain Gugger
c595b6e6a9
Make Transformers use cache files when hf.co is down (#16362)
* Make Transformers use cache files when hf.co is down

* Fix tests

* Was there a random circleCI failure?

* Isolate patches

* Style

* Comment out the failure since it doesn't fail anymore

* Better comment
2022-03-23 15:56:49 -04:00
Sylvain Gugger
4975002df5
Reorganize file utils (#16264)
* Split file_utils in several submodules

* Fixes

* Add back more objects

* More fixes

* Who exactly decided to import that from there?

* Second suggestion to code with code review

* Revert wront move

* Fix imports

* Adapt all imports

* Adapt all imports everywhere

* Revert this import, will fix in a separate commit
2022-03-23 10:26:33 -04:00
Yih-Dar
75c666b4a8
Aggressive PT/TF equivalence test on PT side (#16250)
* Aggressive PT/TF equivalence test on PT side

* Ugly fix for `TFTapasForQuestionAnswering`

* apply review suggestions

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-03-18 18:51:24 +01:00
NielsRogge
8d83ebdf18
[Tests] Add attentions_option to ModelTesterMixin (#15909)
* Add attentions_option to common tester

* Fix tests, apply suggestion

* Apply suggestion from code review

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-03-10 12:00:30 +01:00
NielsRogge
286fdc6b3c
[vision] Add problem_type support (#15851)
* Add problem_type to missing models

* Fix deit test

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-03-01 18:09:52 +01:00
Eduardo Gonzalez Ponferrada
df5a4094a6
Add Data2Vec (#15507)
* Add data2vec model cloned from roberta

* Add checkpoint conversion script

* Fix copies

* Update docs

* Add checkpoint conversion script

* Remove fairseq data2vec_text script and fix format

* Add comment on where to get data2vec_text.py

* Remove mock implementation cheat.py and fix style

* Fix copies

* Remove TF and Flax classes from init

* Add back copy from fairseq data2vec_text.py and fix style

* Update model name in docs/source/index.mdx to be CamelCase

* Revert model name in table to lower-case to get check_table test to pass

* Update src/transformers/models/data2vec/__init__.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/convert_data2vec_original_pytorch_checkpoint_to_pytorch.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update docs/source/model_doc/data2vec.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/model_doc/data2vec.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/auto/configuration_auto.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/configuration_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update tests/test_modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/configuration_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update documentation

* Copy-paste Data2VecConfig from BertConfig

* Update config checkpoint to point to edugp/data2vec-nlp-base. Fix style and repo-consistency

* Update config special tokens to match RoBERTa

* Split multiple assertions and add individual error messages

* Rename Data2VecModel to Data2VecForTextModel

* Add Data2Vec to _toctree.yml

* Rename Data2VecEmbeddings to Data2VecForTextEmbeddings

* Add initial Data2VecForAudio model (unfinished). Only matching fairseq's implementation up to the feature encoder (before positional encoding).

* finish audio model

* finish audio file

* Update names and fix style, quality and repo consistency

* Remove Data2VecAudioForPretraining. Add tests for Data2VecAudio, mimicking the Wav2Vec2 test suite. Fix bias initilization in positional conv layers. Move back configurations for audio and text to separate files.

* add inputs to logits to data2vec'

* correct autio models

* correct config auto

* correct tok auto

* Update utils/tests_fetcher.py

* delete unnecessary files

* delete unnecessary files

* further renaming

* make all tests pass

* finish

* remove useless test file

* Update tests/test_modeling_common.py

* Update utils/check_repo.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec_text.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Fix copies

* Update docs

* Remove fairseq data2vec_text script and fix format

* Add comment on where to get data2vec_text.py

* Remove mock implementation cheat.py and fix style

* Fix copies

* Remove TF and Flax classes from init

* Add back copy from fairseq data2vec_text.py and fix style

* Update model name in docs/source/index.mdx to be CamelCase

* Revert model name in table to lower-case to get check_table test to pass

* Update documentation

* Update src/transformers/models/data2vec/__init__.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/convert_data2vec_original_pytorch_checkpoint_to_pytorch.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/auto/configuration_auto.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/configuration_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update tests/test_modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/configuration_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Copy-paste Data2VecConfig from BertConfig

* Update config checkpoint to point to edugp/data2vec-nlp-base. Fix style and repo-consistency

* Update config special tokens to match RoBERTa

* Split multiple assertions and add individual error messages

* Rename Data2VecModel to Data2VecForTextModel

* Add Data2Vec to _toctree.yml

* Rename Data2VecEmbeddings to Data2VecForTextEmbeddings

* Add initial Data2VecForAudio model (unfinished). Only matching fairseq's implementation up to the feature encoder (before positional encoding).

* finish audio model

* finish audio file

* add inputs to logits to data2vec'

* Update names and fix style, quality and repo consistency

* Remove Data2VecAudioForPretraining. Add tests for Data2VecAudio, mimicking the Wav2Vec2 test suite. Fix bias initilization in positional conv layers. Move back configurations for audio and text to separate files.

* correct autio models

* correct config auto

* correct tok auto

* delete unnecessary files

* delete unnecessary files

* Update utils/tests_fetcher.py

* further renaming

* make all tests pass

* finish

* remove useless test file

* Update tests/test_modeling_common.py

* Update utils/check_repo.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec_text.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Move data2vec tests to new structure

* Fix test imports for text tests

* Remove fairseq files

* Change paper link to arxiv

* Modify Data2Vec documentation to reflect that the encoder is not shared across the audio and text models in the current implementation.

* Update text model checkpoint to be facebook/data2vec-text-base

* Add 'Copy from' statements and update paper links and docs

* fix copy from statements

* improve copied from

* correct more copied from statements

* finish copied from stuff

* make style

* add model to README

* add to master

Co-authored-by: Eduardo Gonzalez Ponferrada <eduardo@ferrumhealth.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-03-01 11:09:20 +01:00
Patrick von Platen
ddbb485c41
[TF-PT-Tests] Fix PyTorch - TF tests for different GPU devices (#15846) 2022-02-28 15:46:46 -05:00
Sylvain Gugger
d1fcc90abf
Fix from_pretrained with default base_model_prefix (#15814) 2022-02-24 11:43:51 +01:00
NielsRogge
57882177be
Add SimMIM (#15586)
* Add first draft

* Make model importable

* Make SwinForMaskedImageModeling importable

* Fix imports

* Add missing inits

* Add support for Swin

* Fix bug

* Fix bug

* Fix another bug

* Fix Swin MIM implementation

* Fix default encoder stride

* Fix Swin

* Add print statements for debugging

* Add image_size data argument

* Fix Swin

* Fix image_size

* Add print statements for debugging

* Fix print statement

* Remove print statements

* Improve reshaping of bool_masked_pos

* Add support for DeiT, fix tests

* Improve docstrings

* Apply new black version

* Improve script

* Fix bug

* Improve README

* Apply suggestions from code review

* Remove DS_Store and add to gitignore

* Apply suggestions from code review + fix BEiT Flax

* Revert BEiT changes

* Improve README

* Fix code quality

* Improve README

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MBP.localdomain>
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-02-17 19:44:55 +01:00
Lysandre Debut
943e2aa036
Fix model equivalence tests (#15670)
* Fix model equivalence tests

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-02-15 18:55:22 -05:00
Sylvain Gugger
1f60bc46f3
Make sure custom configs work with Transformers (#15569)
* Make sure custom configs work with Transformers

* Apply code review suggestions
2022-02-09 10:04:44 -05:00
Joao Gante
8406fa6dd5
Add TFSpeech2Text (#15113)
* Add wrapper classes

* convert inner layers to tf

* Add TF Encoder and Decoder layers

* TFSpeech2Text models

* Loadable model

* TF model with same outputs as PT model

* test skeleton

* correct tests and run the fixup

* correct attention expansion

* TFSpeech2Text pask_key_values with TF format
2022-02-08 16:27:23 +00:00
Michael Benayoun
0fe17f375a
FX tracing improvement (#14321)
* Change the way tracing happens, enabling dynamic axes out of the box

* Update the tests and modeling xlnet

* Add the non recoding of leaf modules to avoid recording more values for the methods to record than what will be seen at tracing time (which would otherwise desynchronize the recorded values and the values that need to be given to the proxies during tracing, causing errors).

* Comments and making tracing work for gpt-j and xlnet

* Refactore things related to num_choices (and batch_size, sequence_length)

* Update fx to work on PyTorch 1.10

* Postpone autowrap_function feature usage for later

* Add copyrights

* Remove unnecessary file

* Fix issue with add_new_model_like

* Apply suggestions
2022-02-07 22:25:33 +01:00
Sylvain Gugger
44b21f117b
Save code of registered custom models (#15379)
* Allow dynamic modules to use relative imports

* Work for configs

* Fix last merge conflict

* Save code of registered custom objects

* Map strings to strings

* Fix test

* Add tokenizer

* Rework tests

* Tests

* Ignore fixtures py files for tests

* Tokenizer test + fix collection

* With full path

* Rework integration

* Fix typo

* Remove changes in conftest

* Test for tokenizers

* Add documentation

* Update docs/source/custom_models.mdx

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Add file structure and file content

* Add more doc

* Style

* Update docs/source/custom_models.mdx

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Address review comments

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
2022-02-02 10:44:37 -05:00
Sylvain Gugger
33f36c869f
Add a main_input_name attribute to all models (#14803)
* Add a main_input_name attribute to all models

* Fix tests

* Wtf Vs Code?

* Update src/transformers/models/imagegpt/modeling_imagegpt.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Style

* Fix copies

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-12-20 11:19:08 -05:00
NielsRogge
25156eb296
Rename ImageGPT (#14526)
* Rename

* Add MODEL_FOR_CAUSAL_IMAGE_MODELING_MAPPING
2021-11-29 10:19:11 +01:00
Sylvain Gugger
d83b0e0c07
Add a post init method to all models (#14431)
* Add a post init method to all models

* Fix tests

* Fix last tests

* Fix templates

* Add comment

* Forgot to save
2021-11-18 08:38:09 -05:00
Sylvain Gugger
040fd47162
Fix gradient_checkpointing backward compatibility (#14408)
* Fix gradient_checkpointing backward compatibility

* Remove needless line

* make sure mask prob is big enough and length small enough

* Fix tests

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
2021-11-16 08:58:42 -05:00
Yih-Dar
be4a6c64dc
Add TFViTModel (#13778)
* Start the work for TFViTModel

* Convert to TF code - need to check in the follow up commits

* Clean up model code

* Expose TFViTModel

* make style

* make quality

* Add test

* make style & quality

* Fix some imports

* fix wrong usage - *kwargs => ** kwargs

* Fix Conv2D weight loading (PT->TF) issue

* Add tests for images with different sizes + fix model

* Fix some common tests for TFViTModel

* Use inputs instead of input_ids in test_compile_tf_model

* Add a comment about transpose and Conv2D in convert_tf_weight_name_to_pt_weight_name

* Avoid transpose in TFViT call

* Fix Conv2D issue in load_tf2_weights_in_pytorch_model

* Use tf.keras.layers.Conv2D instead of tf.nn.conv2d

* Using simpler heuristic to detect Conv2D layer

* Change convert_tf_weight_name_to_pt_weight_name to return TransposeType

* Check tf_weight_shape is not None before using it

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix missing comma

* fix input dtype

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-11-09 07:54:37 -05:00
Sylvain Gugger
dfb00bf644
Expand dynamic supported objects to configs and tokenizers (#14296)
* Dynamic configs

* Add config test

* Better tests

* Add tokenizer and test

* Add to from_config

* With save
2021-11-08 15:28:25 -05:00
Sylvain Gugger
558f8543ba
Update Transformers to huggingface_hub >= 0.1.0 (#14251)
* Update Transformers to huggingface_hub >= 0.1.0

* Forgot to save...

* Style

* Fix test
2021-11-02 18:58:42 -04:00
NielsRogge
e20faa6f03
Add BeitForSemanticSegmentation (#14096)
* Add first draft

* Make forward pass work

* Improve conversion script

* Add notebook that checks if it works

* Add BeitForSemanticSegmentation to the tests

* More improvements

* Make BeitForSemanticSegmentation consistent with Segformer

* Small bug fix

* Add BeitForSemanticSegmentation to docs

* Make sure model doesn't output hidden states when the user doesn't want to

* Make it possible to convert the large model

* Fix issue

* Fix conversion script for large model

* Add auxiliary_head option to semantic segmentation model

* Apply suggestions from @sgugger's review

* Apply suggestions from code review

* Fix failing test

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-11-01 19:55:45 +01:00
Sylvain Gugger
c28bc80bbb
Generalize problem_type to all sequence classification models (#14180)
* Generalize problem_type to all classification models

* Missing import

* Deberta BC and fix tests

* Fix template

* Missing imports

* Revert change to reformer test

* Fix style
2021-10-29 10:32:56 -04:00
Patrick von Platen
0c3174c758
Add TF<>PT and Flax<>PT everywhere (#14047)
* up

* up

* up

* up

* up

* up

* up

* add clip

* fix clip PyTorch

* fix clip PyTorch

* up

* up

* up

* up

* up

* up

* up
2021-10-25 23:55:08 +02:00
Li-Huai (Allan) Lin
234cfefbb0
Fix ignore_mismatched_sizes (#14085)
* Fix

* Style

* Name

* Fix tests

* Style

* Remove embed sizes checking

* Disable some tests

* Fix

* Apply suggestion
2021-10-21 12:31:29 -04:00
Patrick von Platen
dca6796876
[Gradient checkpoining] Correct disabling find_unused_parameters in Trainer when gradient checkpointing is enabled (#13961)
* up

* correct test
2021-10-11 15:34:01 +02:00
Michael Benayoun
d4e4efce68
Initial support for symbolic tracing with torch.fx allowing dynamic axes (#13579)
* Symbolic trace dynamic axes support for BERT like models (albert, bert, distilbert, mobilebert, electra, megatron-bert)
* Sanity checks before tracing that make sure the model to trace is supported
* Adapted to PyTorch 1.9

Co-authored-by: Michael Benayoun <michael@huggingface.co>
2021-10-05 14:19:47 +02:00
Sylvain Gugger
27d4639779
Make gradient_checkpointing a training argument (#13657)
* Make gradient_checkpointing a training argument

* Update src/transformers/modeling_utils.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update src/transformers/configuration_utils.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Fix tests

* Style

* document Gradient Checkpointing as a performance feature

* Small rename

* PoC for not using the config

* Adapt BC to new PoC

* Forgot to save

* Rollout changes to all other models

* Fix typo

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>
2021-09-22 07:51:38 -04:00
Sylvain Gugger
002a078aff
Dynamically load model code from the Hub (#13467)
* Dynamic model

* Use defensive flag

* Style

* Doc and arg rename

* Arg rename

* Add tests

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comments

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-09-20 13:59:21 -04:00
Patrick von Platen
95f933ea85
[Pretrained Model] Add resize_position_embeddings (#13559)
* finish

* delete bogus file

* correct some stuff

* finish

* finish
2021-09-15 19:03:56 +02:00
Sylvain Gugger
74b3344fbc Clean up test file 2021-08-31 07:06:49 -04:00
Sylvain Gugger
8b2de0e483
Tests fetcher tests (#13340)
* Incorporate tests dependencies in tests_fetcher

* Harder modif

* Debug

* Loop through all files

* Last modules

* Remove debug statement
2021-08-31 03:57:01 -04:00
Stas Bekman
5c6eca71a9
fix AutoModel.from_pretrained(..., torch_dtype=...) (#13209)
* fix AutoModel.from_pretrained(..., torch_dtype=...)

* fix to_diff_dict

* add better test

* torch is not always available when a model has self.torch_dtype
2021-08-24 11:43:41 +02:00
Lysandre Debut
3290315a2a
Fix AutoModel tests (#12733) 2021-07-15 09:06:12 -04:00
Sylvain Gugger
90178b0cef
Add option to load a pretrained model with mismatched shapes (#12664)
* Add option to load a pretrained model with mismatched shapes

* Fail at loading when mismatched shapes in Flax

* Fix tests

* Update src/transformers/modeling_flax_utils.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Address review comments

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-07-13 10:15:15 -04:00
Stas Bekman
2d1d92181a
[roberta] fix lm_head.decoder.weight ignore_key handling (#12446)
* fix lm_head.decoder.weight ignore_key handling

* fix the mutable class variable

* Update src/transformers/models/roberta/modeling_roberta.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* replicate the comment

* make deterministic

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-07-01 10:31:19 -07:00
Stas Bekman
7682e97702
[models] respect dtype of the model when instantiating it (#12316)
* [models] respect dtype of the model when instantiating it

* cleanup

* cleanup

* rework to handle non-float dtype

* fix

* switch to fp32 tiny model

* improve

* use dtype.is_floating_point

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix the doc

* recode to use explicit torch_dtype_auto_detect, torch_dtype args

* docs and tweaks

* docs and tweaks

* docs and tweaks

* merge 2 args, add docs

* fix

* fix

* better doc

* better doc

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-28 20:11:21 -07:00
Lysandre Debut
8ef62ec9e1
Fix torchscript tests (#12336)
* Fix torchscript tests

* Better test

* Remove bogus print
2021-06-24 09:52:28 -04:00
Michael Benayoun
986ac03e37
changed modeling_fx_utils.py to utils/fx.py for clarity (#12326)
Co-authored-by: Michael Benayoun <michael@huggingface.co>
2021-06-23 18:16:24 +02:00
Sylvain Gugger
53c60babe4
Clean push to hub API (#12187)
* Clean push to hub API

* Create working dir if it does not exist

* Different tweak

* New API + all models + test Flax

* Adds the Trainer clean up

* Update src/transformers/file_utils.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comments

* (nit) output types

* No need to set clone_from when folder exists

* Update src/transformers/trainer.py

Co-authored-by: Julien Chaumond <julien@huggingface.co>

* Add generated_from_trainer tag

* Update to new version

* Fixes

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-06-23 10:11:19 -04:00
Stas Bekman
372ab9cd6d
[style] consistent nn. and nn.functional: part 3 tests (#12155)
* consistent nn. and nn.functional: p3 templates

* restore
2021-06-14 12:18:22 -07:00
NielsRogge
d3eacbb829
Add DETR (#11653)
* Squash all commits of modeling_detr_v7 branch into one

* Improve docs

* Fix tests

* Style

* Improve docs some more and fix most tests

* Fix slow tests of ViT, DeiT and DETR

* Improve replacement of batch norm

* Restructure timm backbone forward

* Make DetrForSegmentation support any timm backbone

* Fix name of output

* Address most comments by @LysandreJik

* Give better names for variables

* Conditional imports + timm in setup.py

* Address additional comments by @sgugger

* Make style, add require_timm and require_vision to testsé

* Remove train_backbone attribute of DetrConfig, add methods to freeze/unfreeze backbone

* Add png files to fixtures

* Fix type hint

* Add timm to workflows

* Add `BatchNorm2d` to the weight initialization

* Fix retain_grad test

* Replace model checkpoints by Facebook namespace

* Fix name of checkpoint in test

* Add user-friendly message when scipy is not available

* Address most comments by @patrickvonplaten

* Remove return_intermediate_layers attribute of DetrConfig and simplify Joiner

* Better initialization

* Scipy is necessary to get sklearn metrics

* Rename TimmBackbone to DetrTimmConvEncoder and rename DetrJoiner to DetrConvModel

* Make style

* Improve docs and add 2 community notebooks

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-06-09 11:51:13 -04:00
Lysandre Debut
db0b2477cc
Add some tests to the slow suite #11860 2021-05-25 04:06:06 -04:00
Michael Benayoun
f4a0d6ff86
A cleaner and more scalable implementation of symbolic tracing (#11763)
Cleaner and more scalable implementation of symbolic tracing with torch.fx, and provides support for new architectures:
- ALBERT
- DistilBERT
- MobileBERT
- MegatronBERT
- GPT2
- GPT Neo

Co-authored-by: Michael Benayoun <michael@huggingface.co>
2021-05-20 18:02:29 +02:00
Sylvain Gugger
469384a777
Fix regression in regression (#11785)
* Fix regression in regression

* Add test
2021-05-20 09:55:13 -04:00
Michael Benayoun
86d5fb0b36
Experimental symbolic tracing feature with torch.fx for BERT, ELECTRA and T5 (#11475)
Symbolic tracing feature for BERT, ELECTRA and T5

Co-authored-by: Michael Benayoun <michael@huggingface.co>
Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-05-14 20:57:30 +02:00
Volodymyr Byno
218d552f30
Fix loading the best model on the last stage of training (#11718) 2021-05-13 16:11:12 -04:00
Sylvain Gugger
f13f1f8fb8
Test checkpointing (#11682)
* Add test and see where CI is unhappy

* Load with strict=False
2021-05-11 12:02:48 -04:00
Vasudev Gupta
dc3f6758cf
Add BigBirdPegasus (#10991)
* init bigbird pegasus

* add debugging nb ; update config

* init conversion

* update conversion script

* complete conversion script

* init forward()

* complete forward()

* add tokenizer

* add some slow tests

* commit current

* fix copies

* add docs

* add conversion script for bigbird-roberta-summarization

* remove TODO

* small fixups

* correct tokenizer

* add bigbird core for now

* fix config

* fix more

* revert pegasus-tokenizer back

* make style

* everything working for pubmed; yayygit status

* complete tests finally

* remove bigbird pegasus tok

* correct tokenizer

* correct tests

* add tokenizer files

* finish make style

* fix test

* update

* make style

* fix tok utils base file

* make fix-copies

* clean a bit

* small update

* fix some suggestions

* add to readme

* fix a bit, clean tests

* fix more tests

* Update src/transformers/__init__.py

* Update src/transformers/__init__.py

* make fix-copies

* complete attn switching, auto-padding left

* make style

* fix auto-padding test

* make style

* fix batched attention tests

* put tolerance at 1e-1 for stand-alone decoder test

* fix docs

* fix tests

* correct slow tokenizer conversion

* Apply suggestions from code review

Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* complete remaining suggestions

* fix test

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-05-07 09:27:43 +02:00
Patrick von Platen
3e3e41ae20
Pytorch - Lazy initialization of models (#11471)
* lazy_init_weights

* remove ipdb

* save int

* add necessary code

* remove unnecessary utils

* Update src/transformers/models/t5/modeling_t5.py

* clean

* add tests

* correct

* finish tests

* finish tests

* fix some more tests

* fix xlnet & transfo-xl

* fix more tests

* make sure tests are independent

* fix tests more

* finist tests

* final touches

* Update src/transformers/modeling_utils.py

* Apply suggestions from code review

* Update src/transformers/modeling_utils.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update src/transformers/modeling_utils.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* clean tests

* give arg positive name

* add more mock weights to xlnet

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-05-05 17:22:20 +02:00
abhishek thakur
c40c7e213b
Add multi-class, multi-label and regression to transformers (#11012)
* add to  bert

* review comments

* Update src/transformers/configuration_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/configuration_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* self.config.problem_type

* fix style

* fix

* fin

* fix

* update doc

* fix

* test

* Test more problem types

* Update src/transformers/configuration_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix

* remove

* fix

* quality

* make fix-copies

* remove test

Co-authored-by: abhishek thakur <abhishekkrthakur@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-05-04 02:23:40 -04:00
Patrick von Platen
f748bd4242
[Flax] Add docstrings & model outputs (#11498)
* add attentions & hidden states

* add model outputs + docs

* finish docs

* finish tests

* finish impl

* del @

* finish

* finish

* correct test

* apply sylvains suggestions

* Update src/transformers/models/bert/modeling_flax_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* simplify more

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-29 12:04:51 +02:00
Patrick von Platen
32dbb2d954
make style (#11442) 2021-04-26 13:50:34 +02:00
Daniel Stancl
e3ff165aa5
Fix cross-attention head mask for Torch encoder-decoder models (#10605)
* Fix cross-attention head mask for Torch BART models

* Fix head masking for cross-attention module for the following
models: BART, Blenderbot, Blenderbot_small, M2M_100, Marian, MBart,
Pegasus

* Enable test_headmasking for M2M_100 model

* Fix cross_head_mask for FSMT, LED and T5

* This commit fixes `head_mask` for cross-attention modules
in the following models: FSMT, LED, T5

* It also contains some smaller changes in doc so that
it is be perfectly clear the shape of `cross_head_mask`
is the same as of `decoder_head_mask`

* Update template

* Fix template for BartForCausalLM

* Fix cross_head_mask for Speech2Text models

* Fix cross_head_mask in templates

* Fix args order in BartForCausalLM template

* Fix doc in BART templates

* Make more explicit naming

* `cross_head_mask` -> `cross_attn_head_mask`

* `cross_layer_head_mask` -> `cross_attn_layer_head_mask`

* Fix doc

* make style quality

* Fix speech2text docstring
2021-04-23 18:58:06 +02:00
Sylvain Gugger
bf2e0cf70b
Trainer push to hub (#11328)
* Initial support for upload to hub

* push -> upload

* Fixes + examples

* Fix torchhub test

* Torchhub test I hate you

* push_model_to_hub -> push_to_hub

* Apply mixin to other pretrained models

* Remove ABC inheritance

* Add tests

* Typo

* Run tests

* Install git-lfs

* Change approach

* Add push_to_hub to all

* Staging test suite

* Typo

* Maybe like this?

* More deps

* Cache

* Adapt name

* Quality

* MOAR tests

* Put it in testing_utils

* Docs + torchhub last hope

* Styling

* Wrong method

* Typos

* Update src/transformers/file_utils.py

Co-authored-by: Julien Chaumond <julien@huggingface.co>

* Address review comments

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-04-23 09:17:37 -04:00
Sylvain Gugger
81009b7a5c
Replace error by warning when loading an architecture in another (#11207)
* Replace error by warning when loading an architecture in another

* Style

* Style again

* Add a test

* Adapt old test
2021-04-13 10:33:52 -04:00
Sylvain Gugger
ba8b1f4754
Add support for multiple models for one config in auto classes (#11150)
* Add support for multiple models for one config in auto classes

* Use get_values everywhere

* Prettier doc
2021-04-08 18:41:36 -04:00
NielsRogge
30677dc743
Add Vision Transformer and ViTFeatureExtractor (#10950)
* Squash all commits into one

* Update ViTFeatureExtractor to use image_utils instead of torchvision

* Remove torchvision and add Pillow

* Small docs improvement

* Address most comments by @sgugger

* Fix tests

* Clean up conversion script

* Pooler first draft

* Fix quality

* Improve conversion script

* Make style and quality

* Make fix-copies

* Minor docs improvements

* Should use fix-copies instead of manual handling

* Revert "Should use fix-copies instead of manual handling"

This reverts commit fd4e591bce.

* Place ViT in alphabetical order

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-01 11:16:05 -04:00
Sylvain Gugger
acc3bd9d2a
Enforce string-formatting with f-strings (#10980)
* First third

* Styling and fix mistake

* Quality

* All the rest

* Treat %s and %d

* typo

* Missing )

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-03-31 10:00:27 -04:00
Vimarsh Chaturvedi
094afa515d
from_pretrained: check that the pretrained model is for the right model architecture (#10586)
* Added check to ensure model name passed to from_pretrained and model are the same

* Added test to check from_pretrained throws assert error when passed an incompatiable model name

* Modified assert in from_pretrained with f-strings. Modified test to ensure desired assert message is being generated

* Added check to ensure config and model has model_type

* Fix FlauBERT heads

Co-authored-by: vimarsh chaturvedi <vimarsh chaturvedi>
Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-03-18 12:51:42 -04:00
Patrick von Platen
0234de8418
Add Fine-Tuning for Wav2Vec2 (#10145)
* add encode labels function to tokenizer

* start adding finetuning

* init dropout

* upload

* correct convert script

* apply changes

* fix second typo

* make first dummy training run

* adapt convert script

* push confg for comparison

* remove conf

* finish training

* adapt data collator

* add research folder

* update according to fairseq feedback

* some minor corrections

* refactor masking indices a bit

* some minor changes

* clean tokenizer

* finish clean-up

* remove previous logic

* update run script

* correct training

* finish changes

* finish model

* correct bug

* fix training a bit more

* add some tests

* finish gradient checkpointing

* finish example

* correct gradient checkpointing

* improve tokenization method

* revert changes in tokenizer

* revert general change

* adapt fine-tuning

* update

* save intermediate test

* Update README.md

* finish finetuning

* delete conversion script

* Update src/transformers/models/wav2vec2/configuration_wav2vec2.py

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* finish wav2vec2 script

* finish wav2vec2 fine-tuning

* finalize test

* correct test

* adapt tests

* finish

* remove test file

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-03-01 12:13:17 +03:00
Daniel Stancl
71bdc076dd
Add head_mask and decoder_head_mask to PyTorch LED (#9856)
* Add {decoder_,}head_mask to LED

* Fix create_custom_forward signatue in encoder

* Add head_mask to longformer

* Add head_mask to longformer to fix dependencies
of LED on Longformer.

* Not working yet

* Add mising one input in longofrmer_modeling.py

* make fix-copies
2021-02-02 11:06:52 -08:00
Patrick von Platen
12c1b5b8f4
fix test (#9669) 2021-01-19 09:06:24 +01:00
Daniel Stancl
357fb1c5d8
Add head_mask/decoder_head_mask for BART (#9569)
* Add head_mask/decoder_head_mask for BART

This branch implement head_mask and decoder_head_mask
for BART-based models. Full list below:
- BART
- MBart
- Blenderbot
- BlenderbotSmall
- Marian
- Pegasus

Everything is accompanied with updated testing.

* Fix test_headmasking for BART models

* Fix text_headmasking for BART-like models
which has only 2 layers in each modules.
The condition
```
self.assertNotEqual(attentions[1][..., 0, :, :].flatten().sum().item(), 0.0)
```
is, therefore, invalid for encoder-decoder models considering
the `head_mask`
```
head_mask = torch.ones(
    self.model_tester.num_hidden_layers,
    self.model_tester.num_attention_heads,
    device=torch_device,
)
head_mask[0, 0] = 0
head_mask[-1, :-1] = 0
```
specified in the `test_headmasking` test/function.

* Adjust test_modeling_common.py to reflect T5 input args

* Update tests/test_modeling_common.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* make style

* make fix-copies

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-01-18 13:35:22 +01:00
Stas Bekman
143289dcf7
[test_model_parallelization] multiple fixes (#9354) 2021-01-04 12:09:12 -08:00
Patrick von Platen
61443cd7d9
[GPT2] Correct gradient checkpointing (#9308)
* correct gpt2

* fix gpt2

* fix use_cache ordering

* correct past tolerance

* fix for all cases

* style
2020-12-25 23:28:12 +01:00
TobiasNorlund
08abdabda1
Fixed beam search generation for GPT2 and T5 (#9219) 2020-12-21 08:05:23 -05:00
Patrick von Platen
06971ac4f9
[Bart] Refactor - fix issues, consistency with the library, naming (#8900)
* remove make on the fly linear embedding

* start refactor

* big first refactor

* save intermediate

* save intermediat

* correct mask issue

* save tests

* refactor padding masks

* make all tests pass

* further refactor

* make pegasus test pass

* fix bool if

* fix leftover tests

* continue

* bart renaming

* delete torchscript test hack

* fix imports in tests

* correct shift

* fix docs and repo cons

* re-add fix for FSTM

* typo in test

* fix typo

* fix another typo

* continue

* hot fix 2 for tf

* small fixes

* refactor types linting

* continue

* finish refactor

* fix import in tests

* better bart names

* further refactor and add test

* delete hack

* apply sylvains and lysandres commens

* small perf improv

* further perf improv

* improv perf

* fix typo

* make style

* small perf improv
2020-12-09 20:55:24 +01:00
Lysandre Debut
aa60b230ec
Patch model parallel test (#8920)
* Patch model parallel test

* Remove line

* Remove `ci_*` from scheduled branches
2020-12-03 17:15:47 -05:00
Patrick von Platen
443f67e887
[PyTorch] Refactor Resize Token Embeddings (#8880)
* fix resize tokens

* correct mobile_bert

* move embedding fix into modeling_utils.py

* refactor

* fix lm head resize

* refactor

* break lines to make sylvain happy

* add news tests

* fix typo

* improve test

* skip bart-like for now

* check if base_model = get(...) is necessary

* clean files

* improve test

* fix tests

* revert style templates

* Update templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/modeling_{{cookiecutter.lowercase_modelname}}.py
2020-12-02 19:19:50 +01:00
Lysandre Debut
18c32eeb21
Model parallel tests should return, not pass in non model parallel settings. (#8825) 2020-11-27 16:41:29 -05:00
Max Del
0a921b6459
BART & FSMT: fix decoder not returning hidden states from the last layer (#8597)
* Fix decoder not returning hidden states from the last layer

* Resolve conflict

* Change the way to gather hidden states

* Add decoder hidden states test

* Make pytest and black happy

* Remove redundant line

* remove new line

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2020-11-27 18:35:34 +01:00
Joe Davison
369f1d77b4
Return correct Bart hidden state tensors (#8747)
* bart output hidden states upstream

* same w/ decoder

* add tests

* fix prophetnet

* fix gpt2 and ctrl

* fix fstm and skip test for reformer and longformer

* fix all models

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-11-25 22:06:04 +01:00
Stas Bekman
e84786aaa6
consistent ignore keys + make private (#8737)
* consistent ignore keys + make private

* style

* - authorized_missing_keys    => _keys_to_ignore_on_load_missing
  - authorized_unexpected_keys => _keys_to_ignore_on_load_unexpected

* move public doc of private attributes to private comment
2020-11-23 12:33:13 -08:00
alexorona
1cd9be2aeb
gpt2 and t5 parallel modeling (#8696)
* gpt2 and t5 parallel modeling

* model_parallel utils update

* adding missing model_parallel_utils

Adds missing model_parallel_utils and reverses the changes to code in modeling_gpt2 and modeling_t5

* training_args reformat

Reformatted training_args

* style formatting

Style formatting doc string length on training_args and model_parallel_utils

* style changes

make style && make quality for training_args and model_parallel_utils.

* adding tests

* minor change in trainer

reverts loss calculation

* Update training_args.py

* Update training_args.py

added back docstring language for adam_beta1 and adam_beta2

* Update trainer.py

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix style & rebase

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
2020-11-23 14:41:23 -05:00
Sylvain Gugger
1073a2bde5
Switch return_dict to True by default. (#8530)
* Use the CI to identify failing tests

* Remove from all examples and tests

* More default switch

* Fixes

* More test fixes

* More fixes

* Last fixes hopefully

* Use the CI to identify failing tests

* Remove from all examples and tests

* More default switch

* Fixes

* More test fixes

* More fixes

* Last fixes hopefully

* Run on the real suite

* Fix slow tests
2020-11-16 11:43:00 -05:00
Patrick von Platen
42e2d02e44
[T5] Bug correction & Refactor (#8518)
* fix bug

* T5 refactor

* refactor tf

* apply sylvains suggestions
2020-11-13 16:57:31 +01:00
Stas Bekman
02bdfc0251
using multi_gpu consistently (#8446)
* s|multiple_gpu|multi_gpu|g; s|multigpu|multi_gpu|g'

* doc
2020-11-10 13:23:58 -05:00
Patrick von Platen
9c83b96e62
[Tests] Add Common Test for Training + Fix a couple of bugs (#8415)
* add training tests

* correct longformer

* fix docs

* fix some tests

* fix some more train tests

* remove ipdb

* fix multiple edge case model training

* fix funnel and prophetnet

* clean gpt models

* undo renaming of albert
2020-11-09 18:24:41 +01:00
Yossi Synett
bc0d26d1de
[All Seq2Seq model + CLM models that can be used with EncoderDecoder] Add cross-attention weights to outputs (#8071)
* Output cross-attention with decoder attention output

* Update src/transformers/modeling_bert.py

* add cross-attention for t5 and bart as well

* fix tests

* correct typo in docs

* add sylvains and sams comments

* correct typo

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-11-06 19:34:48 +01:00
Guillaume Filion
27b402cab0
Output global_attentions in Longformer models (#7562)
* Output global_attentions in Longformer models

* make style

* small refactoring

* fix tests

* make fix-copies

* add for tf as well

* remove comments in test

* make fix-copies

* make style

* add docs

* make docstring pretty

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
2020-11-05 21:10:43 +01:00
Patrick von Platen
a1bbcf3f6c
Refactoring the generate() function (#6949)
* first draft

* show design proposition for new generate method

* up

* make better readable

* make first version

* gpt2 tests pass

* make beam search for gpt2 work

* add first encoder-decoder code

* delete typo

* make t5 work

* save indermediate

* make bart work with beam search

* finish beam search bart / t5

* add default kwargs

* make more tests pass

* fix no bad words sampler

* some fixes and tests for all distribution processors

* fix test

* fix rag slow tests

* merge to master

* add nograd to generate

* make all slow tests pass

* speed up generate

* fix edge case bug

* small fix

* correct typo

* add type hints and docstrings

* fix typos in tests

* add beam search tests

* add tests for beam scorer

* fix test rag

* finish beam search tests

* move generation tests in seperate file

* fix generation tests

* more tests

* add aggressive generation tests

* fix tests

* add gpt2 sample test

* add more docstring

* add more docs

* finish doc strings

* apply some more of sylvains and sams comments

* fix some typos

* make fix copies

* apply lysandres and sylvains comments

* final corrections on examples

* small fix for reformer
2020-11-03 16:04:22 +01:00
Lysandre Debut
10f8c63620
Ci test tf super slow (#8007)
* Test TF GPU CI

* Change cache

* Fix missing torch requirement

* Fix some model tests


Style

* LXMERT

* MobileBERT

* Longformer skip test

* XLNet

* The rest of the tests

* RAG goes OOM in multi gpu setup

* YAML test files

* Last fixes

* Skip doctests

* Fill mask tests

* Yaml files

* Last test fix

* Style

* Update cache

* Change ONNX tests to slow + use tiny model
2020-10-30 10:25:48 -04:00
Santiago Castro
969859d5f6
Fix doc errors and typos across the board (#8139)
* Fix doc errors and typos across the board

* Fix a typo

* Fix the CI

* Fix more typos

* Fix CI

* More fixes

* Fix CI

* More fixes

* More fixes
2020-10-29 10:33:33 -04:00
Stas Bekman
57516c0cc8
[multiple models] skip saving/loading deterministic state_dict keys (#7878)
* make the save_load special key tests common

* handle mbart

* cleaner solution

* fix

* move test_save_load_missing_keys back into fstm for now

* restore

* style

* add marian

* add pegasus

* blenderbot

* revert - no static embed
2020-10-21 08:06:07 -04:00
Stas Bekman
3e31e7f956
[testing] rename skip targets + docs (#7863)
* rename skip targets + docs

* fix quotes

* style

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* small improvements

* fix

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-20 04:39:13 -04:00
Weizhen
2422cda01b
ProphetNet (#7157)
* add new model prophetnet

prophetnet modified

modify codes as suggested v1

add prophetnet test files

* still bugs, because of changed output formats of encoder and decoder

* move prophetnet into the latest version

* clean integration tests

* clean tokenizers

* add xlm config to init

* correct typo in init

* further refactoring

* continue refactor

* save parallel

* add decoder_attention_mask

* fix use_cache vs. past_key_values

* fix common tests

* change decoder output logits

* fix xlm tests

* make common tests pass

* change model architecture

* add tokenizer tests

* finalize model structure

* no weight mapping

* correct n-gram stream attention mask as discussed with qweizhen

* remove unused import

* fix index.rst

* fix tests

* delete unnecessary code

* add fast integration test

* rename weights

* final weight remapping

* save intermediate

* Descriptions for Prophetnet Config File

* finish all models

* finish new model outputs

* delete unnecessary files

* refactor encoder layer

* add dummy docs

* code quality

* fix tests

* add model pages to doctree

* further refactor

* more refactor, more tests

* finish code refactor and tests

* remove unnecessary files

* further clean up

* add docstring template

* finish tokenizer doc

* finish prophetnet

* fix copies

* fix typos

* fix tf tests

* fix fp16

* fix tf test 2nd try

* fix code quality

* add test for each model

* merge new tests to branch

* Update model_cards/microsoft/prophetnet-large-uncased-cnndm/README.md

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Update model_cards/microsoft/prophetnet-large-uncased-cnndm/README.md

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Update src/transformers/modeling_prophetnet.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Update utils/check_repo.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* apply sams and sylvains comments

* make style

* remove unnecessary code

* Update README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/configuration_prophetnet.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* implement lysandres comments

* correct docs

* fix isort

* fix tokenizers

* fix copies

Co-authored-by: weizhen <weizhen@mail.ustc.edu.cn>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-10-19 17:36:09 +02:00
Sam Shleifer
960faaaf28
Blenderbot (#7418)
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-07 19:09:23 -04:00
Patrick von Platen
62f5ae68ec
[Seq2Seq] Fix a couple of bugs and clean examples (#7474)
* clean T5

* fix t5 tests

* fix index typo

* fix tf common test

* fix examples

* change positional ordering for Bart and FSTM

* add signature test

* clean docs and add tests

* add docs to encoder decoder

* clean docs

* correct two doc strings

* remove sig test for TF Elektra & Funnel

* fix tf t5 slow tests

* fix input_ids to inputs in tf

* Update src/transformers/modeling_bart.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_bart.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* implement lysandre results

* make style

* fix encoder decoder typo

* fix tf slow tests

* fix slow tests

* renaming

* remove unused input

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-01 17:38:50 +02:00
Sylvain Gugger
d155b38d6e
Funnel transformer (#6908)
* Initial model

* Fix upsampling

* Add special cls token id and test

* Formatting

* Test and fist FunnelTokenizerFast

* Common tests

* Fix the check_repo script and document Funnel

* Doc fixes

* Add all models

* Write doc

* Fix test

* Initial model

* Fix upsampling

* Add special cls token id and test

* Formatting

* Test and fist FunnelTokenizerFast

* Common tests

* Fix the check_repo script and document Funnel

* Doc fixes

* Add all models

* Write doc

* Fix test

* Fix copyright

* Forgot some layers can be repeated

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/modeling_funnel.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comments

* Update src/transformers/modeling_funnel.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Address review comments

* Update src/transformers/modeling_funnel.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Slow integration test

* Make small integration test

* Formatting

* Add checkpoint and separate classification head

* Formatting

* Expand list, fix link and add in pretrained models

* Styling

* Add the model in all summaries

* Typo fixes

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-09-08 08:08:08 -04:00
Stas Bekman
e71f32c0ef
[testing] fix ambiguous test (#6898)
Since `generate()` does:
```
        num_beams = num_beams if num_beams is not None else self.config.num_beams
```
This test fails if `model.config.num_beams > 1` (which is the case in the model I'm porting).

This fix makes the test setup unambiguous by passing an explicit `num_beams=1` to `generate()`.

Thanks.
2020-09-02 16:18:17 +02:00
Lysandre
a75c64d80c Black 20 release 2020-08-26 17:20:22 +02:00
Sylvain Gugger
a573777901
Update repo to isort v5 (#6686)
* Run new isort

* More changes

* Update CI, CONTRIBUTING and benchmarks
2020-08-24 11:03:01 -04:00