1513 Commits

Author SHA1 Message Date
Matt
854260ca44
TF/Numpy variants for all DataCollator classes (#13105)
* Adding a TF variant of the DataCollatorForTokenClassification to get feedback

* Added a Numpy variant and a post_init check to fail early if a missing import is found

* Fixed call to Numpy variant

* Added a couple more of the collators

* Update src/transformers/data/data_collator.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fixes, style pass, finished DataCollatorForSeqToSeq

* Added all the LanguageModeling DataCollators, except SOP and PermutationLanguageModeling

* Adding DataCollatorForPermutationLanguageModeling

* Style pass

* Add missing `__call__` for PLM

* Remove `post_init` checks for frameworks because the imports inside them were making us fail code quality checks

* Remove unused imports

* First attempt at some TF tests

* A second attempt to make any of those tests actually work

* TF tests, round three

* TF tests, round four

* TF tests, round five

* TF tests, all enabled!

* Style pass

* Merging tests into `test_data_collator.py`

* Merging tests into `test_data_collator.py`

* Fixing up test imports

* Fixing up test imports

* Trying shuffling the conditionals around

* Commenting out non-functional old tests

* Completed all tests for all three frameworks

* Style pass

* Fixed test typo

* Style pass

* Move standard `__call__` method to mixin

* Rearranged imports for `test_data_collator`

* Fix data collator typo "torch" -> "pt"

* Fixed the most embarrassingly obvious bug

* Update src/transformers/data/data_collator.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Renaming mixin

* Updating docs

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Dalton Walker <dalton_walker@icloud.com>
Co-authored-by: Andrew Romans <andrew.romans@hotmail.com>
2021-08-31 13:06:48 +01:00
Sylvain Gugger
74b3344fbc Clean up test file 2021-08-31 07:06:49 -04:00
Kamal Raj
3efcfeab67
Deberta_v2 tf (#13120)
* Deberta_v2 tf

* added new line at the end of file, make style

* +V2, typo

* remove never executed branch of code

* rm cmnt and fixed typo in url filter

* cleanup according to review comments

* added #Copied from
2021-08-31 06:32:47 -04:00
tucan9389
41c559415a
Add GPT2ForTokenClassification (#13290)
* Add GPT2ForTokenClassification

* Fix dropout exception for GPT2 NER

* Remove sequence label in test

* Change TokenClassifierOutput to TokenClassifierOutputWithPast

* Fix for black formatter

* Remove dummy

* Update docs for GPT2ForTokenClassification

* Fix check_inits ci fail

* Update dummy_pt_objects after make fix-copies

* Remove TokenClassifierOutputWithPast

* Fix tuple input issue

Co-authored-by: danielsejong55@gmail.com <danielsejong55@gmail.com>
2021-08-31 12:19:04 +02:00
Sylvain Gugger
8b2de0e483
Tests fetcher tests (#13340)
* Incorporate tests dependencies in tests_fetcher

* Harder modif

* Debug

* Loop through all files

* Last modules

* Remove debug statement
2021-08-31 03:57:01 -04:00
Olatunji Ruwase
42f359d015
Use DS callable API to allow hf_scheduler + ds_optimizer (#13216)
* Use DS callable API to allow hf_scheduler + ds_optimizer

* Preserve backward-compatibility

* Restore backward compatibility

* Tweak arg positioning

* Tweak arg positioning

* bump the required version

* Undo indent

* Update src/transformers/trainer.py

* style

Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-08-30 10:01:06 -07:00
Laura Hanu
35236b870e
Add missing module __spec__ (#13321)
* added missing __spec__ to _LazyModule

* test __spec__ is not None after module import

* changed module_spec arg to be optional in _LazyModule

* fix style issue

* added module spec test to test_file_utils
2021-08-30 12:39:05 -04:00
Sylvain Gugger
c4ecd234f2
Fix AutoTokenizer when no fast tokenizer is available (#13336)
* Fix AutoTokenizer when a tokenizer has no fast version

* Add test
2021-08-30 11:55:18 -04:00
Kamal Raj
98e409abb3
albert flax (#13294)
* albert flax

* year -> 2021

* docstring updated for flax

* removed head_mask

* removed from_pt

* removed passing attention_mask to embedding layer
2021-08-30 17:29:27 +02:00
Kamal Raj
774760e6f3
distilbert-flax (#13324)
* distilbert-flax

* added missing self

* docs fix

* removed tied kernel extra init

* updated docs

* x -> hidden states

* removed head_mask

* removed from_pt, +FLAX

* updated year
2021-08-30 14:16:18 +02:00
NielsRogge
b6ddb08a66
Add LayoutLMv2 + LayoutXLM (#12604)
* First commit

* Make style

* Fix dummy objects

* Add Detectron2 config

* Add LayoutLMv2 pooler

* More improvements, add documentation

* More improvements

* Add model tests

* Add clarification regarding image input

* Improve integration test

* Fix bug

* Fix another bug

* Fix another bug

* Fix another bug

* More improvements

* Make more tests pass

* Make more tests pass

* Improve integration test

* Remove gradient checkpointing and add head masking

* Add integration test

* Add LayoutLMv2ForSequenceClassification to the tests

* Add LayoutLMv2ForQuestionAnswering

* More improvements

* More improvements

* Small improvements

* Fix _LazyModule

* Fix fast tokenizer

* Move sync_batch_norm to a separate method

* Replace dummies by requires_backends

* Move calculation of visual bounding boxes to separate method + update README

* Add models to main init

* First draft

* More improvements

* More improvements

* More improvements

* More improvements

* More improvements

* Remove is_split_into_words

* More improvements

* Simplify tesseract - no use of pandas anymore

* Add LayoutLMv2Processor

* Update is_pytesseract_available

* Fix bugs

* Improve feature extractor

* Fix bug

* Add print statement

* Add truncation of bounding boxes

* Add tests for LayoutLMv2FeatureExtractor and LayoutLMv2Tokenizer

* Improve tokenizer tests

* Make more tokenizer tests pass

* Make more tests pass, add integration tests

* Finish integration tests

* More improvements

* More improvements - update API of the tokenizer

* More improvements

* Remove support for VQA training

* Remove some files

* Improve feature extractor

* Improve documentation and one more tokenizer test

* Make quality and small docs improvements

* Add batched tests for LayoutLMv2Processor, remove fast tokenizer

* Add truncation of labels

* Apply suggestions from code review

* Improve processor tests

* Fix failing tests and add suggestion from code review

* Fix tokenizer test

* Add detectron2 CI job

* Simplify CI job

* Comment out non-detectron2 jobs and specify number of processes

* Add pip install torchvision

* Add durations to see which tests are slow

* Fix tokenizer test and make model tests smaller

* First draft

* Use setattr

* Possible fix

* Proposal with configuration

* First draft of fast tokenizer

* More improvements

* Enable fast tokenizer tests

* Make more tests pass

* Make more tests pass

* More improvements

* Add padding to fast tokenizer

* Make more tests pass

* Make more tests pass

* Make all tests pass for fast tokenizer

* Make fast tokenizer support overflowing boxes and labels

* Add support for overflowing_labels to slow tokenizer

* Add support for fast tokenizer to the processor

* Update processor tests for both slow and fast tokenizers

* Add head models to model mappings

* Make style & quality

* Remove Detectron2 config file

* Add configurable option to label all subwords

* Fix test

* Skip visual segment embeddings in test

* Use ResNet-18 backbone in tests instead of ResNet-101

* Proposal

* Re-enable all jobs on CI

* Fix installation of tesseract

* Fix failing test

* Fix index table

* Add LayoutXLM doc page, first draft of code examples

* Improve documentation a lot

* Update expected boxes for Tesseract 4.0.0 beta

* Use offsets to create labels instead of checking if they start with ##

* Update expected boxes for Tesseract 4.1.1

* Fix conflict

* Make variable names cleaner, add docstring, add link to notebooks

* Revert "Fix conflict"

This reverts commit a9b46ce9afe47ebfcfe7b45e6a121d49e74ef2c5.

* Revert to make integration test pass

* Apply suggestions from @LysandreJik's review

* Address @patrickvonplaten's comments

* Remove fixtures DocVQA in favor of dataset on the hub

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-08-30 12:35:42 +02:00
Patrick von Platen
a75db353c4
[Slow tests] Disable Wav2Vec2 pretraining test for now (#13303)
* fix_torch_device_generate_test

* remove @

* wav2vec2 pretraining

Co-authored-by: Patrick von Platen <patrick@huggingface.co>
2021-08-30 06:03:02 -04:00
Patrick von Platen
4362ee298a
correct (#13304) 2021-08-30 06:02:08 -04:00
Anton Lozhkov
b6f332ecaf
Add Wav2Vec2 & Hubert ForSequenceClassification (#13153)
* Add hubert classifier + tests

* Add hubert classifier + tests

* Dummies for all classification tests

* Wav2Vec2 classifier + ER test

* Fix hubert integration tests

* Add hubert IC

* Pass tests for all classification tasks on Hubert

* Pass all tests + copies

* Move models to the SUPERB org
2021-08-27 20:52:51 +03:00
Patrick von Platen
2bef3433e5
[Flax] Correct all return tensors to numpy (#13307)
* fix_torch_device_generate_test

* remove @

* finish find and replace
2021-08-27 17:38:34 +02:00
Nicolas Patry
8aa67fc192
Fixing mbart50 with return_tensors argument too. (#13301)
* Fixing mbart50 with `return_tensors` argument too.

* Adding mbart50 tokenization tests.
2021-08-27 17:22:06 +02:00
Nicolas Patry
b89a964d3f
Moving zero-shot-classification pipeline to new testing. (#13299)
* Moving `zero-shot-classification` pipeline to new testing.

* Cleaning up old mixins.

* Fixing tests
`sshleifer/tiny-distilbert-base-uncased-finetuned-sst-2-english` is
corrupted in PT.

* Adding warning.
2021-08-27 15:46:11 +02:00
NielsRogge
cc27ac1a87
Fix BeitForMaskedImageModeling (#13275)
* First pass

* Fix docs of bool_masked_pos

* Add integration script

* Fix docstring

* Add integration test for BeitForMaskedImageModeling

* Remove file

* Fix docs
2021-08-27 09:09:57 -04:00
Nicolas Patry
a3f96f366a
Moving translation pipeline to new testing scheme. (#13297)
* Moving `translation` pipeline to new testing scheme.

* Update tokenization mbart tests.
2021-08-27 12:26:17 +02:00
Nicolas Patry
45a8eb66bb
Moving token-classification pipeline to new testing. (#13286)
* Moving `token-classification` pipeline to new testing.

* Fix tests.
2021-08-27 11:24:56 +02:00
Nicolas Patry
a6e36558ef
Moving text-generation pipeline to new testing framework. (#13285)
* Moving `text-generation` pipeline to new testing framework.

* Keep check_model_type but log instead of raise Exception.

* warning -> error.
2021-08-26 17:30:03 +02:00
Nicolas Patry
662b143b71
Hotfixing master tests. (#13282) 2021-08-26 10:09:53 -04:00
Nicolas Patry
59c378d069
Moving text2text-generation to new pipeline testing mechanism. (#13281) 2021-08-26 16:09:48 +02:00
Nicolas Patry
0ebda5382b
Moving table-question-answering pipeline to new testing. (#13280) 2021-08-26 09:09:57 -04:00
Nicolas Patry
879fe8fa75
Moving summarization pipeline to new testing format. (#13279)
* Moving `summarization` pipeline to new testing format.

* Remove generate_kwargs from __init__ args.
2021-08-26 14:47:11 +02:00
Nicolas Patry
55fb88d369
Moving question_answering tests to the new testing scheme. Had to tweak a little some ModelTesterConfig for pipelines. (#13277)
* Moving question_answering tests to the new testing scheme. Had to tweak
a little some ModelTesterConfig for pipelines.

* Removing commented code.
2021-08-26 12:37:55 +02:00
Nicolas Patry
6b586ed18c
Move image-classification pipeline to new testing (#13272)
- Enforce `test_small_models_{tf,pt}` methods to exist (enforce checking
actual values in small tests)
- Add support for non RGB image for the pipeline.
2021-08-26 05:52:49 -04:00
Stas Bekman
40d60e1536
fix tokenizer_class_from_name for models with - in the name (#13251)
* fix tokenizer_class_from_name

* Update src/transformers/models/auto/tokenization_auto.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* add test

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-08-26 04:29:14 -04:00
Nicolas Patry
83bfdbdd75
Migrating conversational pipeline tests to new testing format (#13114)
* New test format for conversational.

* Putting back old mixin.

* Re-enabling auto tests with LazyLoading.

* Feature extraction tests.

* Remove feature-extraction.

* Feature extraction with feature_extractor (No pun intended).

* Update check_model_type for fill-mask.
2021-08-26 03:50:43 -04:00
Lysandre Debut
72eefb34a9
Add require flax to test (#13260) 2021-08-25 12:56:25 -04:00
Lysandre Debut
3bbe68f837
Hubert test fix (#13261) 2021-08-25 18:41:26 +02:00
Stas Bekman
5c6eca71a9
fix AutoModel.from_pretrained(..., torch_dtype=...) (#13209)
* fix AutoModel.from_pretrained(..., torch_dtype=...)

* fix to_diff_dict

* add better test

* torch is not always available when a model has self.torch_dtype
2021-08-24 11:43:41 +02:00
Yih-Dar
2e20c0f34a
Make Flax GPT2 working with cross attention (#13008)
* make flax gpt2 working with cross attention

* Remove encoder->decoder projection layer

* A draft (incomplete) for FlaxEncoderDecoderModel

* Add the method from_encoder_decoder_pretrained + the docstrings

* Fix the mistakes of using EncoderDecoderModel

* Fix style

* Add FlaxEncoderDecoderModel to the library

* Fix cyclic imports

* Add FlaxEncoderDecoderModel to modeling_flax_auto.py

* Remove question comments

* add tests for FlaxEncoderDecoderModel

* add flax_encoder_decoder to the lists of ignored entries in check_repo.py

* fix missing required positional arguments

* Remove **kwargs when creating FlaxEncoderDecoderModel in from_encoder_decoder_pretrained()

Also fix generation eos/pad tokens issue

* Fix: Use sequences from the generated_output

* Change a check from assert to raise ValueError

* Fix examples and token ids issues

* Fix missing all_cross_attentions when outputting tuple in modeling_gpt2

* Remove the changes in configuration docstrings.

* allow for bert 2 gpt2

* make fix-copies

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Change remaining examples to bert2gpt2

* Change the test to Bert2GPT2

* Fix examples

* Fix import

* Fix unpack bug

* Rename to FlaxEncoderDecoderModelTest and change the test to bert2gpt2

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Fix: NotImplentedError -> NotImplementedError

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* up

* finalize

Co-authored-by: ydshieh <ydshieh@user.noreply>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-08-23 17:57:29 +02:00
SaulLu
7223844df9
Change how "additional_special_tokens" argument in the ".from_pretrained" method of the tokenizer is taken into account (#13056)
* add test

* add change in PretrainedTokenizerBase

* change Luke

* deactivate

* add the possibility to add additional special tokens for M2M100

* format

* add special test for canine

* proposed changes for mbart

* proposed changes for mbart50

* proposed changes for byt5

* proposed changes for canine

* proposed changes for t5

* test fast and slow

* remove comment

* remove comment

* add fast version for all tests

* replace break by continue

* add more comments

* add check to avoid duplicates

* remove comment

* format

* proposed change for wav2vec2

* reverse changes mbart

* uncomment

* format
2021-08-23 14:35:18 +02:00
Philipp Schmid
f689743e74
SageMaker: Fix sagemaker DDP & metric logs (#13181)
* Barrier -> barrier

* added logger for metrics

* removed stream handler in trainer

* moved handler

* removed streamhandler from trainer

* updated test image and instance type added datasets version to test

* Update tests/sagemaker/scripts/pytorch/requirements.txt

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-08-23 10:18:07 +02:00
NielsRogge
8679bd7144
Add min and max question length options to TapasTokenizer (#12803)
* Add min and max question length option to the tokenizer

* Add corresponding test
2021-08-23 03:44:42 -04:00
NielsRogge
588e6caa15
Overwrite get_clean_sequence as this was causing a bottleneck (#13183) 2021-08-23 03:41:35 -04:00
Allan Lin
91ff480e26
Update namespaces inside torch.utils.data to the latest. (#13167)
* Update torch.utils.data namespaces to the latest.

* Format

* Update Dataloader.

* Style
2021-08-19 14:29:51 +02:00
Patrick von Platen
ecfa7eb260
[AutoFeatureExtractor] Fix loading of local folders if config.json exists (#13166)
* up

* up
2021-08-18 16:18:13 +02:00
Ori Ram
439a43b6b4
Add splinter (#12955)
* splinter template

* initialize splinter classes

* Splinter Tokenizer

* splinter.rst

* tokenization fixes

* Documentation & some minor variable name changes

* bug fix (added back question_token_id to config) + variable names

* Minor bug fixes + variable name changes

* Fix Splinter references after merge with new transformers

* changes after running make style & quality

* Fix documentation unindent

* Fix doc indentation in tokenization_splinter

* Fix also SplinterTokenizerFast

* Add Splinter to index.rst and README

* Fix double whitespace from index.rst

* Fixed index.rst with 'make fix-copies'

* Update docs/source/model_doc/splinter.rst

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Update docs/source/model_doc/splinter.rst

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Update docs/source/model_doc/splinter.rst

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Update docs/source/model_doc/splinter.rst

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Update src/transformers/models/splinter/__init__.py

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Added "copied from BERT" comments

* Removing unnecessary code from modeling_splinter

* Update README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/splinter/configuration_splinter.py

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Remove references to TF modeling from splinter

* Update src/transformers/models/splinter/modeling_splinter.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Remove unnecessary check

* Update src/transformers/models/splinter/modeling_splinter.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Add differences between Splinter and Bert tokenizers

* Update src/transformers/models/splinter/modeling_splinter.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/splinter/tokenization_splinter_fast.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Remove unnecessary check

* Doc formatting

* Update src/transformers/models/splinter/tokenization_splinter.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/splinter/tokenization_splinter.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* bug fix: remove load_tf_weights attribute

* Some minor quality changes

* Update docs/source/model_doc/splinter.rst

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/splinter/configuration_splinter.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Change FullyConnectedLayer to SplinterFullyConnectedLayer

* Variable naming

* Remove gather_positions function

* Remove ClassificationHead as it's outdated

* Update src/transformers/models/splinter/modeling_splinter.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Remove hardcoded 102 token id

* Minor style change

* Added "tau" organization to all model identifiers & URLS

* Added tau to the tests as well

* Copy-from comments

* Removed all unnecessary classes (e.g. SplinterForMaskedLM)

* Running make fix-copies

* Bug fix: Further removed unnecessary classes

* Add Splinter to AutoTokenization

* Add an integration test for Splinter

* Removed initialize_new_qass from config - It will be done through different checkpoints

* Removed `initialize_new_qass` from documentation as well

* Added new checkpoint names (`tau/splinter-base-qass` and same for large) in the code

* Minor change to test

* SplinterTokenizer now doesn't abstract from BertTokenizer

* SplinterTokenizerFast also doesn't abstract from Bert

* style and quality

* bug fix: importing torch in tests only if it's available

* Auto mappings

* Changed copyrights in Splinter's files

* Update src/transformers/models/splinter/configuration_splinter.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: yuvalkirstain <kirstain.yuval@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-08-17 08:29:01 -04:00
Nicolas Patry
d58926ab1d
Moving fill-mask pipeline to new testing scheme (#12943)
* Fill mask pipelines test updates.

* Model eval !!

* Adding slow test with actual values.

* Making all tests pass (skipping quite a bit.)

* Doc styling.

* Better doc cleanup.

* Making an explicit test with no pad token tokenizer.

* Typo.
2021-08-13 12:04:18 +02:00
Sylvain Gugger
9a498c37a2
Rely on huggingface_hub for common tools (#13100)
* Remove hf_api module and use huggingface_hub

* Style

* Fix to test_fetcher

* Quality
2021-08-12 14:59:02 +02:00
Patrick von Platen
6900dded49
[Flax/JAX] Run jitted tests at every commit (#13090)
* up

* up

* up
2021-08-12 14:49:46 +02:00
Sylvain Gugger
ea8ffe36d3
Proper import for unittest.mock.patch (#13085) 2021-08-12 11:23:00 +02:00
Kamal Raj
d329b63369
Deberta tf (#12972)
* TFDeberta

moved weights to build and fixed name scope

added missing ,

bug fixes to enable graph mode execution

updated setup.py

fixing typo

fix imports

embedding mask fix

added layer names to avoid automatic incremental names

+XSoftmax

cleanup

added names to layer

disable keras_serializable
Disentangled attention output shape hidden_size==None
using symbolic inputs

test for Deberta tf

make style

Update src/transformers/models/deberta/modeling_tf_deberta.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Update src/transformers/models/deberta/modeling_tf_deberta.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Update src/transformers/models/deberta/modeling_tf_deberta.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Update src/transformers/models/deberta/modeling_tf_deberta.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Update src/transformers/models/deberta/modeling_tf_deberta.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Update src/transformers/models/deberta/modeling_tf_deberta.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Update src/transformers/models/deberta/modeling_tf_deberta.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

removed tensorflow-probability

removed blank line

* removed tf experimental api
+torch_gather tf implementation from @Rocketknight1

* layername DeBERTa --> deberta

* copyright fix

* added docs for TFDeberta & make style

* layer_name change to fix load from pt model

* layer_name change as pt model

* SequenceClassification layername change,
to same as pt model

* switched to keras built-in LayerNormalization

* added `TFDeberta` prefix most layer classes

* updated to tf.Tensor in the docstring
2021-08-12 05:01:26 -04:00
Sylvain Gugger
0454e4bd8b
Fix ModelOutput instantiation from dictionaries (#13067)
* Fix ModelOutput instantiation from dictionaries

* Style
2021-08-10 12:20:04 +02:00
Lysandre Debut
6f5ab9daf1
Add MBART to models exportable with ONNX (#13049)
* Add MBART to models exportable with ONNX

* unittest mock

* Add tests

* Misc fixes
2021-08-09 08:56:04 -04:00
Lysandre Debut
1bf38611a4
Put smaller ALBERT model (#13028) 2021-08-06 12:41:33 -04:00
Michael Benayoun
dc420b0eb1
T5 with past ONNX export (#13014)
T5 with past ONNX export, and more explicit past_key_values inputs and outputs names for ONNX model

Authored-by: Michael Benayoun <michael@huggingface.co>
2021-08-06 15:46:26 +02:00
Sylvain Gugger
9870093f7b
[WIP] Disentangle auto modules from other modeling files (#13023)
* Initial work

* All auto models

* All tf auto models

* All flax auto models

* Tokenizers

* Add feature extractors

* Fix typos

* Fix other typo

* Use the right config

* Remove old mapping names and update logic in AutoTokenizer

* Update check_table

* Fix copies and check_repo script

* Fix last test

* Add back name

* clean up

* Update template

* Update template

* Forgot a )

* Use alternative to fixup

* Fix TF model template

* Address review comments

* Address review comments

* Style
2021-08-06 13:12:30 +02:00
Patrick von Platen
60e448c87e
[Flax] Correct pt to flax conversion if from base to head (#13006)
* finish PR

* add tests

* correct tests

* finish

* correct other flax tests

* better naming

* correct naming

* finish

* apply sylvains suggestions
2021-08-05 18:38:50 +02:00
Michael Benayoun
a6d62aaba0
GPT-Neo ONNX export (#12911)
GPT-Neo ONNX export and task / feature refactoring

Authored-by: Michael Benayoun <michael@huggingface.co>
2021-08-05 10:12:13 +02:00
NielsRogge
83e5a10603
Add BEiT (#12994)
* First pass

* Make conversion script work

* Improve conversion script

* Fix bug, conversion script working

* Improve conversion script, implement BEiTFeatureExtractor

* Make conversion script work based on URL

* Improve conversion script

* Add tests, add documentation

* Fix bug in conversion script

* Fix another bug

* Add support for converting masked image modeling model

* Add support for converting masked image modeling

* Fix bug

* Add print statement for debugging

* Fix another bug

* Make conversion script finally work for masked image modeling models

* Move id2label for datasets to JSON files on the hub

* Make sure id's are read in as integers

* Add integration tests

* Make style & quality

* Fix test, add BEiT to README

* Apply suggestions from @sgugger's review

* Apply suggestions from code review

* Make quality

* Replace nielsr by microsoft in tests, add docs

* Rename BEiT to Beit

* Minor fix

* Fix docs of BeitForMaskedImageModeling

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-08-04 18:29:23 +02:00
Lysandre Debut
0dd1152c18
Skip ProphetNet test (#12462) 2021-08-04 18:24:54 +02:00
Patrick von Platen
a317e6c3be
[Flax] Correctly Add MT5 (#12988)
* finish PR

* finish mt5

* push

* up

* Update tests/test_modeling_flax_mt5.py

Co-authored-by: Suraj Patil <surajp815@gmail.com>

Co-authored-by: Suraj Patil <surajp815@gmail.com>
2021-08-04 16:03:13 +02:00
Patrick von Platen
da9754a3a0
[Flax] Align jax flax device name (#12987)
* [Flax] Align device name in docs

* make style

* fix import error
2021-08-04 16:00:09 +02:00
Sylvain Gugger
d4c834d2e0
Fix from_pretrained with corrupted state_dict (#12939)
* Fix from_pretrained with corrupted state_dict

* Adapt test

* Use better checkpoint

* Style

* Clean up
2021-08-04 11:48:39 +02:00
NielsRogge
a28da4c490
Replace nielsr by google namespace in tests (#12453) 2021-08-04 03:29:34 -04:00
Philip May
b7439675b8
fix Trainer.train(resume_from_checkpoint=False) is causing an exception (#12981)
* fix #12970

* Update tests/test_trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update tests/test_trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update tests/test_trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* remove unnecessary issue link

* fix test formatting

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-08-03 10:10:33 +02:00
Nicolas Patry
e2d22eef14
Moving feature-extraction pipeline to new testing scheme (#12843)
* Update feature extraction pipeline.

* Leaving 1 small model for actual values check.

* Fixes tests

- Better support for tokenizer with no pad token
- Increasing PegasusModelTesterConfig for pipelines
- Tests of feature extraction are more permissive + don't test multimodal
models + encoder-decoder.

* Fixing model loading with incorrect shape (+ model with HEAD).

* Update tests/test_pipelines_common.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Revert modeling_utils modification.

* Some corrections.

* Update tests/test_pipelines_common.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update tests/test_pipelines_feature_extraction.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Syntax.

* Fixing text-classification tests.

* Don't modify this file.

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-07-29 19:35:55 +02:00
Funtowicz Morgan
640421c0ec
ONNX v2 raises an Exception when using PyTorch < 1.8.0 (#12933)
* Raise an issue if the pytorch version is < 1.8.0

* Attempt to add a test to ensure it correctly raises.

* Missing docstring.

* Second attempt, patch with string absolute import.

* Let's do the call before checking it was called ...

* use the correct function ... 🤦

* Raise ImportError and AssertionError respectively when unable to find torch and torch version is not sufficient.

* Correct path mock patching

* relax constraint for torch_onnx_dict_inputs to ge instead of eq.

* Style.

* Split each version requirements for torch.

* Let's compare version directly.

* Import torch_version after checking pytorch is installed.

* @require_torch
2021-07-29 18:02:29 +02:00
Nicolas Patry
a3bd763732
Better heuristic for token-classification pipeline. (#12611)
* Better heuristic for token-classification pipeline.

Looking at the problem again makes things much simpler:
when we look at ids from a tokenizer, we have no way in **general**
to recover whether some substring is part of a word or not.

However, within the pipeline, with offsets we still have access to the
original string, so we can simply check whether the previous character (if it
exists) of a token is actually a space. This will obviously be wrong
for tokenizers that contain spaces within tokens and for tokenizers where
offsets include spaces too (I don't think there are many).

This heuristic is hopefully fully backward compatible and can still handle
non-word-based tokenizers.

* Updating test with real values.

* We still need the older "correct" heuristic to prevent fusing
punctuation.

* Adding a real warning when important.
2021-07-26 16:21:26 +02:00
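The heuristic described in the token-classification commit above (a3bd763732) can be sketched roughly as follows. This is only an illustration, not the pipeline's actual code: the function name `flag_subwords` and the hand-written offsets are hypothetical, and the punctuation handling that the commit says is still needed is left out.

```python
from typing import List, Tuple


def flag_subwords(sentence: str, offsets: List[Tuple[int, int]]) -> List[bool]:
    """Guess, for each token, whether it continues the previous word.

    `offsets` holds (start, end) character positions of each token in
    `sentence`, as a tokenizer with offset mapping would report them.
    """
    flags = []
    for start, _end in offsets:
        if start == 0:
            # No previous character exists, so the token starts a word.
            flags.append(False)
        else:
            # A token continues a word when the character right before
            # it in the original string is not a space.
            flags.append(sentence[start - 1] != " ")
    return flags


# Hypothetical offsets: "Hello" split into "Hel" + "lo", then "world".
print(flag_subwords("Hello world", [(0, 3), (3, 5), (6, 11)]))
```

As the commit message notes, a check like this would misfire for tokenizers whose tokens or offsets include spaces.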
Thibault FEVRY
434022adac
Add RemBERT model code to huggingface (#10692)
* Faster list concat for trainer_pt_utils.get_length_grouped_indices() (#11825)

get_length_grouped_indices() in LengthGroupedSampler and DistributedLengthGroupedSampler
is prohibitively slow for a large number of megabatches (a test case takes hours for ~270k
megabatches with 100 items each) due to slow list concatenation with sum(megabatches, []).

Resolves: #11795

Co-authored-by: ctheodoris <cvtheodo@ds.dfci.harvard.edu>

* Replace double occurrences as the last step (#11367)

* [Flax] Fix PyTorch import error (#11839)

* fix_torch_device_generate_test

* remove @

* change pytorch import to flax import

* Fix reference to XLNet (#11846)

* Switch mem metrics flag (#11851)

* Switch mem metrics flag

* Update src/transformers/training_args.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Fix flos single node (#11844)

* fixing flos bug/typo in non-distributed setting

* storing flos every logging_interval

* Fix two typos in docs (#11852)

* typo2

* fix typo

* [Trainer] Report both steps and num samples per second (#11818)

* [Trainer] Report both steps and num samples per second

* Fix batch number

* Update src/transformers/trainer_utils.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Address review comments

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Add some tests to the slow suite #11860

* Enable memory metrics in tests that need it (#11859)

* fixed a small typo in the doc (#11856)

* typo (#11858)

* Add option to log only once in multinode training (#11819)

* Add option to log only once in multinode training

* Use an alternate property

* [Wav2Vec2] SpecAugment Fast (#11764)

* first try

* finish

* [lm examples] fix overflow in perplexity calc (#11855)

* fix overflow in perplexity calc

* use inf

* fix
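
A minimal sketch of guarding a perplexity computation against overflow, assuming perplexity is computed as `exp(loss)`. The helper name `safe_perplexity` is hypothetical.

```python
import math


def safe_perplexity(eval_loss: float) -> float:
    """Return exp(loss), falling back to infinity when the float range is exceeded."""
    try:
        return math.exp(eval_loss)
    except OverflowError:
        # math.exp raises instead of returning inf for very large inputs
        return float("inf")


small = safe_perplexity(0.0)
huge = safe_perplexity(1000.0)
```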

* [Examples] create model with custom config on the fly (#11798)

* create custom model on the fly

* better wording

* add update_from_string

* cleanup

* cleanup

* Update src/transformers/configuration_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* more bool options

* style

* fix logger

* add test

* add the doc

* assert on conflict of options

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* [Wav2Vec2ForCTC] example typo fixed (#11878)

* Ensure input tensor are on device. (#11874)

The feature extractor does not create tensors on the appropriate device,
so we call `ensure_tensor_on_device` before feeding the processed inputs
to the model.

* Fix usage of head masks by TF encoder-decoder models' `generate()` function (#11775)

* Fix Bart

* Fix Blenderbot{,_small}

* Fix LED

* Fix Marian

* Fix MBart

* Fix Pegasus

* Fix T5

* Add test for generation with head_mask

* Add a common TF test

* Override a test for the LED model as head masking is not yet properly implemented

* Remove all head_masks from input preparation for LED

* Drop masking for T5 as it needs a bit of refactor

* Correcting comments in T5Stack to reflect correct tuple order  (#11330)

* Correcting comments to reflect correct tuple order

In order to match the actual order (line 513 and 516, and as accessed in 968), I've changed the order mentioned in comments L962 and L966-967.

* Update modeling_t5.py

Updating another comment as well

* Removing extra space

* Fixing style and quality

* style & quality

* Update src/transformers/models/t5/modeling_t5.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* [Flax] Allow dataclasses to be jitted (#11886)

* fix_torch_device_generate_test

* remove @

* change dataclasses to flax ones

* fix typo

* fix jitted tests

* fix bert & electra

* changing find_batch_size to work with tokenizer outputs (#11890)

* changing find_batch_size to work with tokenizer outputs

trainer_pt_utils.find_batch_size does not recognize the batch size of BatchEncoding objects. This can cause an error when a trainer relies on find_batch_size to report the number of observed examples in the evaluation loop.

* Trigger CI

Co-authored-by: jrenner <joseph.renner@inria.fr>
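
The problem described above can be sketched without any framework. Everything here is a stand-in: `FakeTensor` and `FakeEncoding` are hypothetical classes, and the assumption is that a tokenizer output behaves like a mapping without being a plain `dict`, so an `isinstance(x, dict)` check misses it.

```python
from collections import UserDict
from collections.abc import Mapping


class FakeTensor:
    """Hypothetical stand-in for a framework tensor; only `.shape` matters here."""

    def __init__(self, *shape):
        self.shape = shape


class FakeEncoding(UserDict):
    """Hypothetical stand-in for a dict-like tokenizer output that is not a `dict`."""


def find_batch_size(data):
    """Return the first dimension of the first tensor-like object found, else None."""
    if isinstance(data, (list, tuple)):
        for item in data:
            size = find_batch_size(item)
            if size is not None:
                return size
    elif isinstance(data, Mapping):
        # Checking for Mapping (not just dict) also covers dict-like containers.
        for value in data.values():
            size = find_batch_size(value)
            if size is not None:
                return size
    elif hasattr(data, "shape") and len(data.shape) >= 1:
        return data.shape[0]
    return None


batch = FakeEncoding({"input_ids": FakeTensor(8, 128)})
batch_size = find_batch_size(batch)
```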

* Link official Cloud TPU JAX docs (#11892)

* Flax Generate (#11777)

* fix_torch_device_generate_test

* remove @

* add

* indexing

* correct a couple of tests

* fix tests

* add logits processor

* finish top_k, top_p, temp

* add docs

* correct flax prng key default

* improve generate

* add generation docs

* add docs

* make style

* revert model outputs change

* make style

* correct typo

* fix tests

* fix slow test

* add raise

* finish generation

Co-authored-by: Patrick von Platen <patrick@huggingface.co>

* Add Emotion Speech Notebook (#11900)

* Update deepspeed config to reflect hyperparameter search parameters (#11896)

* rebuild deepspeed config for hyperparameter search

* reformat code to fix style issues

* Adding new argument `max_new_tokens` for generate. (#11476)

* Adding new argument `max_new_tokens` for generate.

This is a proposal to add a new argument `max_new_tokens` to `generate`.
This includes a `MaxNewTokensCriteria` that enables callers that don't
know the token length ahead of time (like pipeline callers) to manage
the length of their generated output more easily.

* Adding a test for the user warning when both `max_length` and
`max_new_tokens` are used together.

* Removed redundant `no_grad`.
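
The idea behind `max_new_tokens` can be sketched as a stopping rule that counts from the prompt length rather than from zero. The class below is a hypothetical illustration and does not reproduce the signature of the library's `MaxNewTokensCriteria`; the token ids are placeholders.

```python
class MaxNewTokensSketch:
    """Hypothetical stopping rule: stop once `max_new_tokens` tokens were added."""

    def __init__(self, start_length: int, max_new_tokens: int):
        # The caller only states how many tokens to add; the absolute limit is derived.
        self.max_length = start_length + max_new_tokens

    def __call__(self, current_length: int) -> bool:
        return current_length >= self.max_length


prompt = [101, 7592, 2088]  # placeholder prompt token ids
should_stop = MaxNewTokensSketch(start_length=len(prompt), max_new_tokens=2)

generated = list(prompt)
while not should_stop(len(generated)):
    generated.append(0)  # placeholder for the next sampled token id
```

The caller never has to compute the prompt length up front, which is the convenience the proposal describes for pipeline callers.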

* Added Sequence Classification class in GPTNeo (#11906)

* seq classification changes

* fix tests

* [Flax] Return Attention from BERT, ELECTRA, RoBERTa and GPT2 (#11918)

* Added logic to return attention from flax-bert model and added test cases to check that

* Added new line at the end of file to test_modeling_flax_common.py

* fixing code style

* Fixing Roberta and Electra models too, as they are copied from BERT

* Added temporary hack to not run test_attention_outputs for FlaxGPT2

* Returning attention weights from GPT2 and changed the tests accordingly.

* last fixes

* bump flax dependency

Co-authored-by: jayendra <jayendra@infocusp.in>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Test optuna and ray (#11924)

* Remove `datasets` submodule

* fix assert (#11935)

* Remove redundant `nn.log_softmax` in `run_flax_glue.py` (#11920)

* Remove redundant `nn.log_softmax` in `run_flax_glue.py`

`optax.softmax_cross_entropy` expects unnormalized logits, and so it already calls `nn.log_softmax`, so I believe it is not needed here. `nn.log_softmax` is idempotent so mathematically it shouldn't have made a difference.

* Remove unused 'flax.linen' import
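
The idempotence claim above can be checked numerically with a small pure-Python sketch. The `log_softmax` helper here is written for illustration and is not the `nn.log_softmax` implementation.

```python
import math


def log_softmax(xs):
    """Numerically stable log-softmax over a list of floats."""
    m = max(xs)
    log_z = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - log_z for x in xs]


logits = [2.0, -1.0, 0.5]
once = log_softmax(logits)
# Applying it again changes nothing: exp(once) already sums to 1, so log_z is ~0.
twice = log_softmax(once)
```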

* Add MT5ForConditionalGeneration as supported arch. to summarization README (#11961)

* Add MT5ForConditionalGeneration as supported arch.

* Update README.md

* Add FlaxCLIP (#11883)

* add flax CLIP

* default input_shape

* add tests

* fix test

* fix name

* fix docs

* fix shapes

* attend at least 1 token

* flax conv to torch conv

* return floats

* fix equivalence tests

* fix import

* return attention_weights and update tests

* fix docstrings

* address patricks comments

* input_shape arg

* add tests for get_image_features and get_text_features methods

* fix tests

* RAG-2nd2end-revamp (#11893)

* initial

* code quality test

* code quality

* added test functions in test_modeling_rag.py and test_retrieval_rag.py to test end2end retriever

* minor change in test_modeling_rag

* fixed tests

* Update examples/research_projects/rag-end2end-retriever/README.md

typo corrected as suggested by lhoestq

Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>

* Update examples/research_projects/rag-end2end-retriever/finetune_rag.py

type change suggested by lhoestq

Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>

* Update src/transformers/models/rag/retrieval_rag.py

Adding this change as mentioned by lhoestq.

Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>

* completed the minor changes suggested by the reviewers

Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>

* modify qa-trainer (#11872)

* modify qa-trainer

* fix flax model

* bugfixes training_args.py (#11922)

modified according to:
https://pytorch.org/xla/release/1.8.1/_modules/torch_xla/core/xla_model.html

* reinitialize wandb config for each hyperparameter search run (#11945)

* Add regression tests for slow sentencepiece tokenizers.  (#11737)

* add test_vocab_size for sentencepiece tok.

* add test_get_vocab for sentencepiece tok.

* add test_convert_token_and_id for sentencepiece tok.

* add test_tokenize_and_convert_tokens_to_string for all tok.

* improve test_tokenize_and_convert_tokens_to_string for sp. tok.

* add common tokenizer integration tests
- for albert
- for barthez

* add tokenizer integration tests to bert gen.

* add most tokenizer integration tests

* fix camembert tokenizer integration test

* add tokenizer integration test to marian

* add tokenizer integration test to reformer

* add typing and doc to tokenizer_integration_test_util

* fix tokenizer integration test of reformer

* improve test_sentencepiece_tokenize_and_convert_tokens_to_string

* empty commit to trigger CI

* fix tokenizer integration test of reformer

* remove code not needed anymore

* empty commit to trigger CI

* empty commit to trigger CI

* Authorize args when instantiating an AutoModel (#11956)

* Neptune.ai integration (#11937)

An option that turns on neptune.ai logging
--report_to 'neptune'

Additional ENV variables:
	NEPTUNE_PROJECT
	NEPTUNE_API_TOKEN
	NEPTUNE_RUN_NAME (optional)
	NEPTUNE_STOP_TIMEOUT (optional)

* Run the integration tests on schedule tests instead of master tests

* [deepspeed] docs (#11940)

* deepspeed docs

* cleanup

* cleanup

* typo correction (#11973)

* typo correction

* type corrections

* ByT5 model (#11971)

* allow tf to use uneven num of layers

* add tokenizer

* finish docs

* finish docs

* Apply suggestions from code review

* include in index

* finish

* Update docs/source/model_doc/byt5.rst

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* apply Sylvain's suggestions

* make style

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Typo in usage example, changed to device instead of torch_device (#11979)

* [DeepSpeed] decouple `DeepSpeedConfigHF` from `Trainer` (#11966)

* decouple DeepSpeedConfigHF from Trainer

* add LoggingLevel ctx manager; add new test

* cleanup

* add docs

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* implemented suggested renames

* formatter workaround

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* [Trainer] add train loss and flops metrics reports (#11980)

* add train loss and flops metrics reports

* consistency

* add train_loss to skip keys

* restore on_train_end call timing

* Bump urllib3 from 1.25.8 to 1.26.5 in /examples/research_projects/lxmert (#11983)

Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.25.8 to 1.26.5.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/1.25.8...1.26.5)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [RAG] Fix rag from pretrained question encoder generator behavior (#11962)

* fix_torch_device_generate_test

* remove @

* fix rag from pretrained loading

* add test

* upload

* finish

* VisualBERT (#10534)

* Init VisualBERT

* Add cookie-cutter, Config, and Embeddings

* Add preliminary Model

* Add Bert analogous classes

* Add basic code for NLVR, VQA, Flickr

* Update Init

* Fix VisualBert Downstream Models

* Rename classifier to cls

* Comment position_ids buffer

* Remove sentence image predictor output

* Update output dicts

* Remove unnecessary files

* Fix Auto Modeling

* Fix transformers init

* Add conversion script

* Add conversion script

* Fix docs

* Update visualbert modelling

* Update configuration

* Style fixes

* Add model and integration tests

* Add all tests

* Update model mapping

* Add simple detector from original repository

* Update docs and configs

* Fix style

* Fix style

* Update docs

* Fix style

* Fix import issues in style

* Fix style

* Add changes from review

* Fix style

* Fix style

* Update docs

* Fix style

* Fix style

* Update docs/source/model_doc/visual_bert.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/visual_bert/modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update tests/test_modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/visual_bert/modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/visual_bert/modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/visual_bert/modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Add changes from review

* Remove convert run script

* Add changes from review

* Update src/transformers/models/visual_bert/modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/visual_bert/modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/visual_bert/modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/visual_bert/modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/visual_bert/modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Add changes from review

* Add changes from review

* Add visual embedding example in docs

* Fix "copied from" comments

* Add changes from review

* Fix error, style, checkpoints

* Update docs

* Fix integration tests

* Fix style

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix examples (#11990)

* [docs] fix xref to `PreTrainedModel.generate` (#11049)

* fix xref to generate

* do the same for search methods

* style

* style

* Update return introduction (#11976)

Make it clear that the `forward` method now returns a dict instead of tuple.

Fix style

* [deepspeed] Move code and doc into standalone files (#11984)

* move code and docs

* style

* moved

* restore

* [deepspeed] add nvme test skip rule (#11997)

* add nvme skip rule

* fix

* Fix weight decay masking in `run_flax_glue.py` (#11964)

* Fix weight decay masking in `run_flax_glue.py`

Issues with the previous implementation:
- The `dict` from `traverse_util.flatten_dict` has keys which are tuples of strings, not one long string with the path separated by periods.
- `optax.masked` applies the transformation wherever the mask is True, so the masks are flipped.
- Flax's LayerNorm calls the scale parameter `scale` not `weight`
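
The three points above can be sketched in plain Python. The `flatten` helper only mimics the tuple-keyed output that the message attributes to `traverse_util.flatten_dict`, and the parameter tree is hypothetical. No Flax or Optax call is made here.

```python
def flatten(tree, prefix=()):
    """Flatten a nested dict into {tuple_of_keys: leaf}."""
    flat = {}
    for key, value in tree.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# Hypothetical parameter tree; leaf values are placeholders.
params = {
    "encoder": {
        "dense": {"kernel": 0, "bias": 0},
        "LayerNorm": {"scale": 0, "bias": 0},
    }
}

# True means "apply weight decay here", so biases and LayerNorm scales get False.
# Keys are tuples, so the test compares path components, not a dotted string.
decay_mask = {
    path: path[-1] != "bias" and path[-2:] != ("LayerNorm", "scale")
    for path in flatten(params)
}
```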

* Fix formatting with black

* adapt results

Co-authored-by: Patrick von Platen <patrick@huggingface.co>

* [Flax] Refactor MLM  (#12013)

* fix_torch_device_generate_test

* remove @

* finish refactor

Co-authored-by: Patrick von Platen <patrick@huggingface.co>

* [Deepspeed] Assert on mismatches between ds and hf args (#12021)

* wip

* add mismatch validation + test

* renames

* Update docs/source/main_classes/deepspeed.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* renames

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* [TrainerArguments] format and sort __repr__, add __str__ (#12018)

* format and sort __repr__, add __str__

* typo

* use __str__ directly

* alias __repr__ = __str__

* Fixed Typo in modeling_bart.py (#12035)

* Fixed Typo in modeling_bart.py - Issue #11895

* Fixed Typo in modeling_bart.py

* fix deberta 2 tokenizer integration test (#12017)

* fix docs of past_key_values (#12049)

* [JAX] Bump jax lib (#12053)

* fix_torch_device_generate_test

* remove @

* bump up jax lib

* Fixes bug that appears when using QA bert and distillation. (#12026)

* Fixing bug that appears when using distillation (and potentially other uses).
During the backward pass PyTorch complains with:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
This happens because the QA model code modifies the start_positions and end_positions input tensors using the clamp_ function. As a consequence, the teacher and the student both modify the inputs, and the backward pass fails.

* Fixing all models QA clamp_ bug.

* Extend pipelines for automodel tuples (#12025)

* fix_torch_device_generate_test

* remove @

* finish

* refactor

* add test

* fix test

* Attempt at simplification.

* Small fix.

* Fixing non existing AutoModel for TF.

* Naming.

* Remove extra condition.

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>

* Add optional grouped parsers description to HfArgumentParser (#12042)

* Adding optional argument group to HfArgumentParser

* Minor

* remove whitespace

* Minor styling

* adds metric prefix. (#12057)

* adds metric prefix.

* update tests to include prefix

* skip failing test (#12059)

* Fix integration tests (#12066)

* Fix tapas issue (#12063)

* Fix scatter function to be compatible with torch-scatter 2.7.0

* Allow test again

* updated the original RAG implementation to be compatible with latest Pytorch-Lightning (#11806)

* updated the original RAG implementation to be compatible with the latest PL version

* updated the requirements.txt file

* execute make style

* code quality test

* code quality

* conflict resolved in requirements.txt

* code quality

* changed the MyDDP class name to CustomDDP

* Replace legacy torch.Tensor with torch.tensor/torch.empty (#12027)

* Replace legacy torch.Tensor constructor with torch.{tensor, empty}

* Remove torch.Tensor in examples

* Add torch to requirements.txt in language-modeling (#12040)

* Add torch to requirements.txt in language-modeling

* Update examples/pytorch/language-modeling/requirements.txt

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Properly indent block_size (#12070)

* [Deepspeed] various fixes (#12058)

* replace deprecated config

* sub_group_size was too big

* complete deprecation removal

* [Deepspeed Wav2vec2] integration (#11638)

* wip

* wip - but working with https://github.com/microsoft/DeepSpeed/pull/1044

* cleanup

* workaround

* working 5/8 modes

* solve fp32 distributed zero3

* style

* sync

* sync

* rework

* deprecation

* cleanup

* https://github.com/microsoft/DeepSpeed/pull/1044 pr was merged

* clean up

* add a guide

* more prose

* more prose

* fix

* more prose

* sub_group_size was too big

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* refactor

* bug fix

* make the true check explicit

* new deepspeed release

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* typo

* Update run_ner.py with id2label config (#12001)

* sync LayerDrop for Wav2Vec2Encoder + tests (#12076)

* Add DETR (#11653)

* Squash all commits of modeling_detr_v7 branch into one

* Improve docs

* Fix tests

* Style

* Improve docs some more and fix most tests

* Fix slow tests of ViT, DeiT and DETR

* Improve replacement of batch norm

* Restructure timm backbone forward

* Make DetrForSegmentation support any timm backbone

* Fix name of output

* Address most comments by @LysandreJik

* Give better names for variables

* Conditional imports + timm in setup.py

* Address additional comments by @sgugger

* Make style, add require_timm and require_vision to tests

* Remove train_backbone attribute of DetrConfig, add methods to freeze/unfreeze backbone

* Add png files to fixtures

* Fix type hint

* Add timm to workflows

* Add `BatchNorm2d` to the weight initialization

* Fix retain_grad test

* Replace model checkpoints by Facebook namespace

* Fix name of checkpoint in test

* Add user-friendly message when scipy is not available

* Address most comments by @patrickvonplaten

* Remove return_intermediate_layers attribute of DetrConfig and simplify Joiner

* Better initialization

* Scipy is necessary to get sklearn metrics

* Rename TimmBackbone to DetrTimmConvEncoder and rename DetrJoiner to DetrConvModel

* Make style

* Improve docs and add 2 community notebooks

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>

* [test] support more than 2 gpus (#12074)

* support more than 2 gpus

* style

* Wav2Vec2 Pretraining (#11306)

* Working quantizer forward

* Working quantizer forward

* Clean up unused model parts, test reproducibility

* Working quantizer forward

* Clean up unused model parts, test reproducibility

* Remove custom outputs from the shared ones

* correct conversion

* correct bug

* add first pretrain script

* save intermediate

* static shapes

* save intermediate

* finish first pretrain script version

* more refactor

* remove wandb

* refactor more

* improve test

* correct perplexity compute bug

* finish model implementation

* add to docs

* finish docs

* finish pretraining script

* finish pretraining script

* remove wandb

* finish PR for merge

* finish config

* finish

* make deepspeed work

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* apply suggestions

* fix flaky test

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* pass decay_mask fn to optimizer (#12087)

* rm require_version_examples (#12088)

* [Wav2Vec2ForPretraining] Correct checkpoints wav2vec2 & fix tests (#12089)

* fix_torch_device_generate_test

* remove @

* fix tests

* Add text_column_name and label_column_name to run_ner and run_ner_no_trainer args (#12083)

* Add text_column_name and label_column_name to run_ner args

* Minor fix: grouping for text and label column name

* CLIPFeatureExtractor should resize images with kept aspect ratio (#11994)

* Resize with kept aspect ratio

* Fixed failed test

* Overload center_crop and resize methods instead

* resize should handle non-PIL images

* update slow test

* Tensor => tensor

Co-authored-by: patil-suraj <surajp815@gmail.com>

* New TF GLUE example (#12028)

* Pushing partially-complete new GLUE example

* First draft of the new TF GLUE example! Needs a little more testing to be sure but it's almost ready.

* Fix to the fit() call

* Bugfixes, making sure TPU and multi-GPU support is ready

* Remove logger line that depends on Pytorch

* Style pass

* Deleting old TF GLUE example

* Include label2id and id2label in the saved model config

* Don't clobber the existing model.config.label2id

* Style fixes

* Update examples/tensorflow/text-classification/run_glue.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix quality

* Update README.md to cover the TF GLUE example.

* Minor style edits

* Appending label2id and id2label to models to ensure inference works properly (#12102)

* Fix a condition in test_generate_with_head_masking (#11911)

* Fix a condition in test_generate_with_head_masking

* Fix usage of head_mask in bigbird_pegasus

* Fix head masking for speech2text

* Resolve copy mismatch + drop unwanted print statement

* Fix the condition

* Flax VisionTransformer (#11951)

* adding vit for flax

* added test for Flax-vit and some bug-fixes

* overrode methods where variable changes were necessary for flax_vit test

* added FlaxViTForImageClassification for test

* Update src/transformers/models/vit/modeling_flax_vit.py

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* made changes suggested in PR

* Adding jax-vit models for autoimport

* swapping num_channels and height,width dimension

* fixing the docstring for torch-like inputs for VIT

* add model to main init

* add docs

* doc, fix-copies

* docstrings

* small test fixes

* fix docs

* fix docstr

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* style

Co-authored-by: jayendra <jayendra@infocusp.in>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* add relevant description to tqdm in examples (#11927)

* add relevant `desc` in examples

* require_version datasets>=1.8.0

* Fix head masking generate tests (#12110)

* fix_torch_device_generate_test

* remove @

* fix tests

* Flax CLM script (#12023)

* first draft

* max_seq_length => block_size

* fix arg names

* fix typos

* fix loss calculation

* add max examples, fix  train eval steps, metrics

* optimizer mask

* fix perplexity, metric logging

* fix logging

* data_collator = > data_loader

* refactor loss_fn

* support single GPU

* pass distributed to write_metric

* fix jitting

* fix single device training

* fix single device metrics

* close inner progress bars once finished

* add overwrite_cache arg

* fix dataset caching issue

* add more logs

* few small fixes,

* address nicholas suggestions

* fix docstr

* address patricks suggestions

* make flake happy

* pass new new_dropout_rng to apply_gradients

* reset train metrics after every epoch

* remove distributed logis, small fixes

* Add from_pretrained to dummy timm objects (#12097)

* Add from_pretrained to dummy timm

* Fix at the source

* Update utils/check_dummies.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Missing pretrained dummies

* Style

Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix t5 error message (#12136)

* Fix t5 error message

* Fix again

* Fix megatron_gpt2 attention block's causal mask (#12007)

* Fix megatron_gpt2 attention block's causal mask.

* compatibility with checkpoints created with recent versions of Megatron-LM

* added integration test for the released Megatron-GPT2 model

* code style changes

* added option to megatron conversion script to read from config file

Co-authored-by: Guido Novati <gnovati@nvidia.com>

* Add mlm pretraining xla torch readme (#12011)

* fix_torch_device_generate_test

* remove @

* upload

* Apply suggestions from code review

* Apply suggestions from code review

* Apply suggestions from code review

* Update examples/flax/language-modeling/README.md

* add more info

* finish

* fix

Co-authored-by: Patrick von Platen <patrick@huggingface.co>

* add readme for flax clm (#12111)

* add readme for flax clm

* use section link for tokenizer

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* update metrics

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* FlaxBart (#11537)

* Start working on FlaxBart

* Create modeling_flax_bart.py

* Write FlaxBartAttention

* Add FlaxBartEncoderLayer

* Add FlaxBartDecoderLayer and some typing

* Add helper function for FlaxBart

* shift_tokens_right

* _make_causal_mask

* _expand_mask

* Add PositionalEmbedding and fix init_std naming

* Add FlaxBartPretrainedModel

* Add FlaxBartEncoder

* Add FlaxBartEncoder

* Add FlaxBartEncoder among modules to be imported

* YET WE CANNOT INITIALIZE THAT!! :(

* Make BartEncoder working

Change BartEncoder to instance of nn.Module so far

* Add FlaxBartDecoder

* Add FlaxBartModel

* TODO to make model run -> Prepare model inputs

* Resolve padding

* Add FlaxBartModel

* Add FlaxBartModel into importable modules

* Remove FlaxBartEncoder and FlaxBartDecoder from importable modules

* make style; not properly working

* make style; make quality not pass due to some import I left

* Remove TODO for padding_idx in nn.Embed so far

* Add FlaxBartForConditionalGeneration

* Incorporate Flax model output classes, i.e. return_dict

* Add another models and incorporate use_cache arg

* Add FlaxBartForSequenceClassification and FlaxBartForQuestionAnswering

* Incorporate use_cache arg from PyTorch implementation

* Add all necessary Flax output utils

* Add FlaxBartForCausalLM; not working yet

* Add minor improvements; still lacks some functionality

* Update docs, src and tests

* Add support of FlaxBart to docs/source

* Fix some bugs in FlaxBart source code

* Add some necessary tests for FlaxBart models - jit_compilation not passing

* Fix tests and add test_head_masking

* Fix tests for @jax.jit computation

* Add test_head_masking

* Migrate FlaxBart tests from jax.numpy to numpy

* Remove FlaxBartForCausalLM

* Clean repo

* fix bart model weight structure

* Fix FlaxBartForSequenceClassification

Slicing cannot be used under jit, so the way the sentence
representation is selected from hidden_states must be changed.

* Allow FlaxBartForSequenceClassification for testing pt_flax equivalence

* Allow testing for FlaxBartForQA for pt_flax equivalence

* Add a comment to FlaxBartForSequenceClassification + change noise from 1e-3 to 1e-6

* remove past_key_values

* remove inputs_embeds and make input_ids required

* add position ids

* re-write attention layer

* fix dataclass

* fix pos embeds and attention output

* fix pos embeds

* expose encode method

* expose decode method

* move docstring to top

* add cache for causal attn layer

* remove head masking for now

* s2s greedy search first pass

* boom boom

* fix typos

* fix greedy generate for bart

* use encoder, decoder layers instead of num_hidden_layers

* handle encoder_outputs

* cleanup

* simplify decoding

* more clean-up

* typos

* Change header + add {decoder_,}position_ids into 2 models

* add BartConfig

* fix existing tests

* add encode, decode methods

* Fix shift_tokens_right for JIT compilation + clarify one condition

* fix decode

* encoder => encode

* simplify generate

* add tests for encode and decode

* style

* add tests for cache

* fix equivalence tests

* sample generate now works with seq2seq

* generation tests

* initialize dense layers

* docstring and cleanup

* quality

* remove get/set input_embeddings

* address Patricks suggestions

* decode for every model, remove encoder_outputs from call

* update tests accordingly

* decode returns only decoder outputs and logits

* fix arguments

* doc encode, decode methods

* correct base_model_prefix

* fix test for seq classif model

* fix docs

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Feature to use the PreTrainedTokenizerFast class as a stand-alone tokenizer (#11810)

* feature for tokenizer without slow/legacy version

* format

* modify common test

* add tests

* add PreTrainedTokenizerFast to AutoTokenizer

* format

* change tokenizer common test in order to be able to run test without a slow version

* update tokenizer fast test in order to use `rust_tokenizer_class` attribute instead of `tokenizer_class`

* add AutoTokenizer test

* replace  `if self.tokenizer_class is not None` with ` if self.tokenizer_class is None`

* remove obsolete change in comment

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/tokenization_utils_fast.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* change `get_main_tokenizer` into `get_tokenizers`

* clarify `get_tokenizers` method

* homogenize with `test_slow_tokenizer` and `test_rust_tokenizer`

* add `test_rust_tokenizer = False` to tokenizers which don't define a fast version

* `test_rust_tokenizer = False` for BertJapaneseTokenizer

* `test_rust_tokenizer = False` for BertJapaneseCharacterTokenizationTest

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* [Flax] Add links to google colabs (#12146)

* fix_torch_device_generate_test

* remove @

* add colab links

* Don't log anything before logging is setup in examples (#12121)

* Don't log anything before logging is setup in examples

* Last example

* Use text_column_name variable instead of "text" (#12132)

* Use text_column_name variable instead of "text"

`text_column_name` was already defined above where I made the changes and it was also used below where I made changes.

This is a very minor change. If a dataset does not use "text" as the column name, then the `tokenize_function` will now use whatever column is assigned to `text_column_name`. `text_column_name` is just the first column name if "text" is not a column name. It makes the function a little more robust, though I would assume that 90% + of datasets use "text" anyway.

* black formatting

* make style

Co-authored-by: Nicholas Broad <nicholas@nmbroad.com>
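
The column fallback described for #12132 can be sketched as follows. This is a minimal illustration rather than the example script itself: `pick_text_column` and the whitespace-splitting `toy_tokenizer` are hypothetical stand-ins for the script's logic and for a real tokenizer.

```python
def pick_text_column(column_names):
    """Return "text" if the dataset has it, else the first column name."""
    return "text" if "text" in column_names else column_names[0]


def toy_tokenizer(texts):
    # Hypothetical stand-in for a real tokenizer: split on whitespace.
    return {"tokens": [t.split() for t in texts]}


def tokenize_function(examples, text_column_name):
    # Use the resolved column instead of a hard-coded "text" key.
    return toy_tokenizer(examples[text_column_name])


# A batch whose text lives in a column that is not named "text".
batch = {"sentence": ["hello world", "foo"], "label": [0, 1]}
column = pick_text_column(list(batch.keys()))
tokenized = tokenize_function(batch, column)
```

With a dataset that does have a "text" column, `pick_text_column` returns "text" and the behavior is unchanged.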

* [lm examples] Replicate --config_overrides addition to other LM examples (#12135)

* [lm examples] Replicate --config_overrides addition to other LM examples

* Removing no trainer files changes

* Update README

Co-authored-by: Kumar Abhishek <kabhishek@expedia.com>

* fix error message (#12148)

* [optim] implement AdafactorSchedule (#12123)

* implement AdafactorSchedule

* typo

* fix

* Update src/transformers/optimization.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* [style] consistent nn. and nn.functional (#12124)

* consistent nn. and nn.functional

* fix glitch

* fix glitch #2

* Adding TFWav2Vec2Model (#11617)

* [WIP] Add TFWav2Vec2Model

Work in progress for adding a tensorflow version of Wav2Vec2

* feedback changes

* small fix

* Test Feedback Round 1

* Add SpecAugment and CTC Loss

* correct spec augment mask creation

* docstring and correct copyright

* correct bugs

* remove bogus file

* finish tests correction

* del unnecessary layers

* Update src/transformers/models/wav2vec2/modeling_tf_wav2vec2.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* make style

* correct final bug

* Feedback Changes

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* [Flax] Fix flax pt equivalence tests (#12154)

* fix_torch_device_generate_test

* remove @

* upload

* consistent nn. and nn.functional: p2 templates (#12153)

* Flax Big Bird (#11967)

* add flax bert

* bert -> bigbird

* original_full ported

* add debugger

* init block sparse

* fix copies ; gelu_fast -> gelu_new

* block sparse port

* fix block sparse

* block sparse working

* all ckpts working

* fix-copies

* make quality

* init tests

* temporary fix for FlaxBigBirdForMultipleChoice

* skip test_attention_outputs

* fix

* gelu_fast -> gelu_new ; fix multiple choice model

* remove nsp

* fix sequence classifier

* fix

* make quality

* make fix-copies

* finish

* Delete debugger.ipynb

* Update src/transformers/models/big_bird/modeling_flax_big_bird.py

* make style

* finish

* bye bye jit flax tests

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* [style] consistent nn. and nn.functional: part 3 `tests` (#12155)

* consistent nn. and nn.functional: p3 templates

* restore

* [style] consistent nn. and nn.functional: part 4 `examples` (#12156)

* consistent nn. and nn.functional: p4 examples

* restore

* consistent nn. and nn.functional: part 5 docs (#12161)

* Add video links to the documentation (#12162)

* [Flax generate] Add params to generate (#12171)

* fix_torch_device_generate_test

* remove @

* add params as input

* finish

* Use a released version of optax rather than installing from Git. (#12173)

Use a released version of optax rather than installing from Git

* Have dummy processors have a `from_pretrained` method (#12145)

* Add course banner (#12157)

* Add course banner

* Update course banner

* Adjust banner width

* Enable add_prefix_space if model_type is roberta or gpt2 (#12116)

* Update AutoModel classes in summarization example (#12178)

- Convert use of deprecated AutoModelWithLMHead to AutoModelForSeq2SeqLM
- Add newly required `truncation=True` to `tokenizer.encode` with `max_length`

This silences all warnings.

* Ray Tune Integration Updates (#12134)

* fix

* fixes

* add back to scheduled tests

* formatting

* Update integrations.py

* [testing] ensure concurrent pytest workers use a unique port for torch.dist (#12166)

* ensure concurrent pytest workers use a unique port for torch.distributed.launch

* reword

* Model card defaults (#12122)

* [WIP] Model card defaults

* finetuned_from default value

* Add all mappings to the mapping file

* Be more defensive on finetuned_from arg

* Add default task tag

* Separate tags from tasks

* Edge case for dataset

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Temporarily deactivate torch-scatter while we wait for new release (#12181)

* Temporarily deactivate torch-scatter while we wait for new release

* torch-1.8.1 binary for scatter

* Revert to 1.8.0

* Pin torch dependency

* torchaudio and torchvision

* Temporarily deactivate torchhub test (#12184)

* [Flax] Add Beam Search (#12131)

* fix_torch_device_generate_test

* remove @

* push new logit processors

* add processors

* save first working version

* save intermediate

* finish

* make style

* make fix-copies

* finish

* Update tests/test_modeling_flax_bart.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Suraj Patil <surajp815@gmail.com>

Co-authored-by: Patrick von Platen <patrick@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Hubert (#11889)

* fix_torch_device_generate_test

* remove @

* add hubert

* add first test file

* more docs

* fix bugs

* fix bug

* finish

* finish

* finish docstring

* fix

* fix

* finalize

* add to ignored

* finish

* Apply suggestions from code review

* correct naming

* finish

* fix auto config

* finish

* correct convert script

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Suraj Patil <surajp815@gmail.com>

* apply suggestions lysandre & suraj

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Suraj Patil <surajp815@gmail.com>

* updated DLC images and sample notebooks (#12191)

* Enabling AutoTokenizer for HubertConfig. (#12198)

* Use yaml to create metadata (#12185)

* Use yaml to create metadata

* Fix typo

* Remove pin

* [Docs] fixed broken link (#12205)

* fixed broken link

* Update docs/source/benchmarks.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/benchmarks.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Pipeline update & tests (#12207)

* Improve detr (#12147)

* Remove unused variables

* Improve docs

* Fix docs of segmentation masks

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Add link to the course (#12229)

* Support for torch 1.9.0 (#12224)

* Support for torch 1.9.0

* Torch scatter for 1.9.0

* Github Actions run on 1.9.0

* fix pt-1.9.0 `add_` deprecation (#12217)

* fix pt-1.9.0 add_ deprecation

* add () for clarity

* Trigger CI

* require_version(torch

* Release: v4.7.0

* Docs for v4.8.0

* AutoTokenizer: infer the class from the tokenizer config if possible (#12208)

* AutoTokenizer: infer the class from the tokenizer config if possible

* Add tests

* Update src/transformers/models/auto/tokenization_auto.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* update desc for map in all examples (#12226)

* update desc for map in all examples

* added plm

* suggestions

* [Flax] FlaxAutoModelForSeq2SeqLM (#12228)

* add FlaxAutoModelForSeq2SeqLM

* [FlaxBart] few small fixes (#12247)

* boom boom

* remove flax clip example

* few small fixes

* Deprecate pythonic Mish and support PyTorch 1.9 version of Mish (#12240)

* Moved Mish to Torch 1.9 version

* Run black formatting

* [t5 doc] make the example work out of the box (#12239)

* [run_clm.py] restore caching

* style

* [t5 doc] make the example work out of the box

This PR expands the training example to include the correct model type for the example to work, e.g. with `T5Model` this example will break.

* Update docs/source/model_doc/t5.rst

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* expand the other example

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Fix the scheduled CI

* Better CI feedback (#12279)

* Better run ID

* Only part of CI

* Revert "Only part of CI"

This reverts commit 29f7f248d2.

* Fix for making student ProphetNet for Seq2Seq Distillation (#12130)

* make_student.py: fix to make student ProphetNet

* reformat

* [FlaxClip] fix test from/save pretrained test (#12284)

* boom boom

* remove flax clip example

* fix from_save_pretrained

* [Flax] [WIP] allow loading head model with base model weights (#12255)

* boom boom

* remove flax clip example

* allow loading head model with base model weights

* add test

* fix imports

* disable save, load test for clip

* add test_save_load_to_base

* [DeepSpeed] don't ignore --adafactor (#12257)

* [Flax] Fix flax test save pretrained (#12256)

* fix_torch_device_generate_test

* remove @

* fix flax save pretrained test

* Tensorflow QA example (#12252)

* New Tensorflow QA example!

* Style pass

* Updating README.md for the new example

* flake8 fixes

* Update examples/tensorflow/question-answering/README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* [Flax] Add jax flax to env command (#12251)

* fix_torch_device_generate_test

* remove @

* add commands for flax/jax

* reset report_to to none, avoid deprecation warning (#12293)

* [trainer + examples] set log level from CLI (#12276)

* set log level from CLI

* add log_level_replica + test + extended docs

* cleanup

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* rename datasets objects to allow datasets module

* improve the doc

* style

* doc improve

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* [tests] multiple improvements (#12294)

* [tests] multiple improvements

* cleanup

* style

* todo to investigate

* fix

* Fix for the issue of device-id getting hardcoded for token_type_ids during Tracing [WIP] (#11252)

* registering a buffer for token_type_ids, to pass the error of device-id getting hardcoded when tracing

* style format

* adding persistent flag to the registered buffers, which prevents them from being added to the state_dict and addresses the backward compatibility issue

* adding the try catch to the fix as persistent flag is only available from PT >1.6

* adding version check

* added the condition to only use the token_type_ids buffer when it's autogenerated, not passed by the user

* adding comments and making the condition where token_type_ids are None use the registered buffer

* taking out position-embedding from the if block

* adding comments

* handling the case if buffer for position_ids was not registered

* reverted the changes on position_ids, fix the issue with size of token_type_ids buffer, moved the modification for generated token_type_ids to Bertmodel, instead of Embeddings

* reverting the token_type_ids in case of None to the previous version

* reverting changes on position_ids adding back the if block

* changes added by running make fix-copies

* changes added by running make fix-copies and added the import version as it was getting used

* changes added by running make fix-copies

* changes added by running make fix-copies

* fixing the import format

* fixing the import format

* modified to use temp tensor for trimmed and expanded token_type_ids buffer

* changes made by fix-copies after temp tensor modifications

* changes made by fix-copies after temp tensor modifications

* changes made by fix-copies after temp tensor modifications

* clean up

* clean up

* clean up

* clean up

* Nit

* Nit

* Nit

* modified according to support device conversion on traced models

* modified according to support device conversion on traced models

* modified according to support device conversion on traced models

* modified according to support device conversion on traced models

* changes based on latest in master

* Adapt templates

* Add version import

Co-authored-by: Ubuntu <ubuntu@ip-172-31-32-81.us-west-2.compute.internal>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>

* trainer_tf: adjust wandb installation command (#12291)

* add FlaxAutoModelForImageClassification in main init (#12298)

* Fix and improve documentation for LEDForConditionalGeneration (#12303)

* Replace conditional generation example (fixes #12268)

* Replace model in summarization example with finetuned checkpoint, adapt example text

* Fix typo in new summarization example

* Fix docstring formatting, add missing import statement to example

* [Flax] Main doc for event orga (#12305)

* fix_torch_device_generate_test

* remove @

* push

* finish

* some typos

* add more info on communication

* add suggestions

* [trainer] 2 bug fixes and a rename (#12309)

* bug fixes and a rename

* add extended DDP test

* FlaxBartPretrainedModel -> FlaxBartPreTrainedModel (#12313)

* [docs]  performance  (#12258)

* initial performance document

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* rewrites based on suggestions

* 8x multiple is for AMP only

* add contribute section

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Add CodeCarbon Integration (#12304)

* Add optional dependency

* Add CodeCarbon integration

* Add CodeCarbon integration

* Add CodeCarbon integration

* typo

* Optimizing away the `fill-mask` pipeline. (#12113)

* Optimizing away the `fill-mask` pipeline.

- Don't send anything to the tokenizer unless needed. Vocab check is
much faster
- Keep BC by sending data to the tokenizer when needed. User handling warning messages will see performance benefits again
- Make `targets` and `top_k` work together better: `top_k` cannot be
higher than `len(targets)` but can still be smaller.
- Actually simplify the `target_ids` in case of duplicate (it can happen
because we're parsing raw strings)
- Removed useless code to fail on empty strings. It works only if empty
string is in first position, moved to ignoring them instead.
- Changed the related tests as only the tests would fail correctly
(having incorrect value in first position)

* Make tests compatible for 2 different vocabs... (at the price of a
warning).

Co-authored-by: @EtaoinWu

* ValueError working globally

* Update src/transformers/pipelines/fill_mask.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* `tokenizer.vocab` -> `tokenizer.get_vocab()` for more compatibility +
fallback.

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
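
The target handling listed for #12113 (vocab lookup first, duplicates collapsed, empty strings ignored, `top_k` capped at the number of targets) can be sketched as below. This is an assumption-laden illustration, not the pipeline's code: `resolve_targets` and the plain `vocab` dict are hypothetical, and unknown targets are simply skipped here, whereas the message says the pipeline falls back to the tokenizer for them.

```python
def resolve_targets(targets, vocab, top_k):
    """Map raw target strings to unique vocab ids and cap top_k."""
    target_ids = []
    for target in targets:
        if not target:
            continue  # ignore empty strings instead of failing on them
        token_id = vocab.get(target)  # cheap vocab check, no tokenizer call
        if token_id is None:
            continue  # simplification: the real code would tokenize here
        if token_id not in target_ids:
            target_ids.append(token_id)  # drop duplicate ids, keep order
    if not target_ids:
        raise ValueError("At least one usable target must be provided")
    # top_k may be smaller than the number of targets, but never larger.
    return target_ids, min(top_k, len(target_ids))


vocab = {"cat": 1, "dog": 2}  # hypothetical toy vocabulary
ids, k = resolve_targets(["cat", "", "cat", "dog"], vocab, top_k=5)
```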

* Add output in a dictionary for TF `generate` method (#12139)

* Add output args to greedy search

* Fix critical typo + make style quality

* Handle generate_beam_search

* Add dict_specific tests and fix the placement of encoder outputs

* Add  specific outputs

* Update doc

* Fix typo

* Adjust handling encoder_outputs + Fix generating for T5

* Fix generate for RAG

* Fix handling output_attentions when target_mapping is not None

Take care of situations when target_mapping is provided,
as there are 2-tuples of attentions

Change from:
if inputs["output_attentions"]:
    attentions = tuple(tf.transpose(t, perm=(2, 3, 0, 1)) for t in attentions)

to:
if inputs["output_attentions"]:
    if inputs["target_mapping"] is not None:
        # when target_mapping is provided, there are 2-tuples of attentions
        attentions = tuple(
            tuple(tf.transpose(attn_stream, perm=(2, 3, 0, 1)) for attn_stream in t) for t in attentions
        )
    else:
        attentions = tuple(tf.transpose(t, perm=(2, 3, 0, 1)) for t in attentions)

* Rename kwargs to model_kwargs

* make style quality

* Move imports in test_modeling_tf_common.py

Move ModelOutput-related imports in test_modeling_tf_common.py
into the `is_tf_available():` statement.

* Rewrite nested if-statements

* Fix added tests

* Flax summarization script  (#12230)

* add summarization script

* fix arguments, preprocessing, metrics

* add generation and metrics

* auto model, prediction loop

* prettify

* label smoothing

* address Sylvain's and Patrick's suggestions

* dynamically import shift_tokens_right

* fix shift_tokens_right_fn call

* Rewrite ProphetNet to adapt converting ONNX friendly (#11981)

* Rewrite

* [ONNX] rewrite

* Flax T5 (#12150)

* copy pytorch-t5

* init

* boom boom

* forward pass same

* make generation work

* add more tests

* make test work

* finish normal tests

* make fix-copies

* finish quality

* correct slow example

* correct slow test

* version table

* upload models

* Update tests/test_modeling_flax_t5.py

* correct incorrectly deleted line

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick@huggingface.co>

* Add mention of the huggingface_hub methods for offline mode (#12320)

* [Flax/JAX] Add how to propose projects markdown (#12311)

* fix_torch_device_generate_test

* remove @

* finish

* make style

* [TFWav2Vec2] Fix docs (#12283)

* fix error

* make style check happy

Co-authored-by: chenhaitao <chenhaitao@qiyi.com>

* Clean push to hub API (#12187)

* Clean push to hub API

* Create working dir if it does not exist

* Different tweak

* New API + all models + test Flax

* Adds the Trainer clean up

* Update src/transformers/file_utils.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comments

* (nit) output types

* No need to set clone_from when folder exists

* Update src/transformers/trainer.py

Co-authored-by: Julien Chaumond <julien@huggingface.co>

* Add generated_from_trainer tag

* Update to new version

* Fixes

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>

* Add all XxxPreTrainedModel to the main init (#12314)

* Add all XxxPreTrainedModel to the main init

* Add to template

* Add to template bis

* Add FlaxT5

* Conda build (#12323)

* Temporarily revert the `fill-mask` improvements.

* changed modeling_fx_utils.py to utils/fx.py for clarity (#12326)

Co-authored-by: Michael Benayoun <michael@huggingface.co>

* Pin good version of huggingface_hub

* [Flax T5] Fix weight initialization and fix docs (#12327)

* finish t5 flax fixes

* improve naming

* Release: v4.8.0

* v4.9.0.dev0

* Update training_args.py (#12328)

mention in `save_strategy` param description that `load_best_model_at_end` can override

* [Deepspeed] new docs (#12077)

* document sub_group_size

* style

* install + issues reporting

* style

* style

* Update docs/source/main_classes/deepspeed.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* indent 4

* restore

* style

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix default to logging_dir lost in merge conflict

* try-this (#12338)

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>

* [examples/Flax] move the examples table up (#12341)

* Fix torchscript tests (#12336)

* Fix torchscript tests

* Better test

* Remove bogus print

* Document patch release v4.8.1

* Add flax/jax quickstart (#12342)

* Update README.md

* fixed typo (#12356)

* Fix exception in prediction loop occurring for certain batch sizes (#12350)

* fix distributed_concat for scalar outputs

* Update README.md

* fixed typo (#12356)

* simplify fix with terser syntax

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Trigger CI

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: michal pitr <21157924+MichalPitr@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Add FlaxBigBird QuestionAnswering script (#12233)

* port bigbird script

* adapt script a bit

* change location

* adapt more

* save progress

* init commit

* style

* dataset script tested

* readme add

* Replace NotebookProgressReporter by ProgressReporter in Ray Tune run (#12357)

* Replace NotebookProgressReporter by ProgressReporter in Ray Tune run

* Move to local import

* Style

* remove extra white space from log format (#12360)

* fixed multiplechoice tokenization (#12362)

* fixed multiplechoice tokenization

The model would have seen two sequences:
1. [CLS]prompt[SEP]prompt[SEP]
2. [CLS]choice0[SEP]choice1[SEP]
that is not correct as we want a contextualized embedding of prompt and choice

* removed outer brackets for proper sequence generation
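
The pairing that #12362 aims for can be sketched as follows, assuming each choice should be encoded together with the prompt. `build_choice_pairs` and `render` are hypothetical helpers; `render` only mimics the `[CLS]`/`[SEP]` layout shown above and does not call a real tokenizer.

```python
def build_choice_pairs(prompt, choices):
    """Pair the prompt with every choice: one (prompt, choice) per sequence."""
    first_sentences = [prompt] * len(choices)
    return list(zip(first_sentences, choices))


def render(pair):
    # Mimic how a BERT-style tokenizer lays out a (text, text_pair) input.
    first, second = pair
    return "[CLS]" + first + "[SEP]" + second + "[SEP]"


pairs = build_choice_pairs("prompt", ["choice0", "choice1"])
sequences = [render(p) for p in pairs]
```

Each resulting sequence holds the prompt and exactly one choice, which is what gives the model a contextualized embedding of both.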

* [trainer] add main_process_first context manager (#12351)

* main_process_first context manager

* handle multi-node, add context description

* sync desc

* [Examples] Replicates the new --log_level feature to all trainer-based pytorch (#12359)

* added log_level

* fix comment

* fixed log_level

* Trigger CI

* Unified logging

* simplified args for log_level

* updated example template (#12365)

* replace print with logger (#12368)

* [Documentation] Warn that DataCollatorForWholeWordMask is limited to BertTokenizer-like tokenizers (#12371)

* Notify users that DataCollatorForWholeWordMask is limited to BertTokenizer-like tokenizers

* Fix code formatting

* Update run_mlm.py (#12344)

Previously the code could not be used for validation only, because this line:
extension = data_args.train_file.split(".")[-1]
assumed that the extension must be extracted from the training dataset. It ran regardless of whether the user chose training or validation, which led to an error when the user only wanted to run evaluation (because the training file does not exist). The extension is now extracted from the training file when the user wants to train and from the validation file when the user wants to evaluate, so the code can be used for training and validation separately.
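
A minimal sketch of that selection logic, under the assumption that a boolean training flag decides which file is consulted. `infer_extension` and its parameters are hypothetical names for illustration, not the script's actual code.

```python
def infer_extension(do_train, train_file=None, validation_file=None):
    """Pick the file extension from whichever data file is actually in use."""
    source = train_file if do_train else validation_file
    if source is None:
        raise ValueError("No data file available for the requested mode")
    return source.split(".")[-1]


# Evaluation only: no training file is needed.
eval_ext = infer_extension(do_train=False, validation_file="dev.json")
# Training: the extension comes from the training file.
train_ext = infer_extension(do_train=True, train_file="train.csv", validation_file="dev.json")
```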

* Add possibility to maintain full copies of files (#12312)

* [CI] add dependency table sync verification (#12364)

* add dependency table sync verification

* improve the message

* improve the message

* revert

* ready to merge

* [Examples] Added context manager to datasets map (#12367)

* added context manager to datasets map

* fixed style and spaces

* fixed warning of deprecation

* changed desc

* [Flax community event] Add more description to readme (#12398)

* fix_torch_device_generate_test

* remove @

* boom boom

* correct typos

* Apply suggestions from code review

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Apply suggestions from code review

Co-authored-by: Suzana Ilić <io.suzanai@gmail.com>

* Apply suggestions from code review

Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Suzana Ilić <io.suzanai@gmail.com>

* Update README.md

* Fix copies

* Remove the need for `einsum` in Albert's attention computation (#12394)

* debug albert einsum

* Fix matmul computation

* Let's use torch linear layer.

* Style.

* [Flax] Adapt flax examples to include `push_to_hub` (#12391)

* fix_torch_device_generate_test

* remove @

* finish

* correct summary writer

* correct push to hub

* fix indent

* finish

* finish

* finish

* finish

* finish

Co-authored-by: Patrick von Platen <patrick@huggingface.co>

* Tensorflow LM examples (#12358)

* Tensorflow MLM example

* Add CLM example

* Style fixes, adding missing checkpoint code from the CLM example

* Fix TPU training, avoid massive dataset warnings

* Fix incorrect training length calculation for multi-GPU training

* Fix incorrect training length calculation for multi-GPU training

* Refactors and nitpicks from the review

* Style pass

* Adding README

* pass the matching trainer log level to deepspeed (#12401)

* [Flax] Add T5 pretraining script (#12355)

* fix_torch_device_generate_test

* remove @

* add length computation

* finish masking

* finish

* upload

* fix some bugs

* finish

* fix dependency table

* correct tensorboard

* Apply suggestions from code review

* correct processing

* slight change init

* correct some more mistakes

* apply suggestions

* improve readme

* fix indent

* Apply suggestions from code review

Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>

* correct tokenizer

* finish

* finish

* finish

* finish

Co-authored-by: Patrick von Platen <patrick@huggingface.co>
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>

* [models] respect dtype of the model when instantiating it (#12316)

* [models] respect dtype of the model when instantiating it

* cleanup

* cleanup

* rework to handle non-float dtype

* fix

* switch to fp32 tiny model

* improve

* use dtype.is_floating_point

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix the doc

* recode to use explicit torch_dtype_auto_detect, torch_dtype args

* docs and tweaks

* docs and tweaks

* docs and tweaks

* merge 2 args, add docs

* fix

* fix

* better doc

* better doc

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Rename detr targets to labels (#12280)

* Rename target to labels in DetrFeatureExtractor

* Update DetrFeatureExtractor tests accordingly

* Improve docs of DetrFeatureExtractor

* Improve docs

* Make style

* Add out of vocabulary error to ASR models (#12288)

* Add OOV error to ASR models

* Feedback changes

* Fix TFWav2Vec2 SpecAugment (#12289)

* Fix TFWav2Vec2 SpecAugment

* Invert masks

* Feedback changes

* [example/flax] add summarization readme (#12393)

* add readme

* update readme and add requirements

* Update examples/flax/summarization/README.md

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* [Flax] Example scripts - correct weight decay  (#12409)

* fix_torch_device_generate_test

* remove @

* finish

* finish

* correct style

* fix ids_to_tokens naming error in tokenizer of deberta v2 (#12412)

Co-authored-by: Jipeng Huang <jihuan@microsoft.com>

* minor fixes in original RAG training (#12395)

* Added talks (#12415)

* Easily train a new fast tokenizer from a given one (#12361)

* [WIP] Easily train a new fast tokenizer from a given one

* Fix test

* Roll out to other tokenizers and add tests

* Fix bug with unk id and add emoji to test

* Really use something different in test

* Implement special tokens map

* Map special tokens in the Transformers tokenizers

* Fix test

* Make test more robust

* Fix test for BPE

* More robust map and test

Co-authored-by SaulLu

* Test file

* Stronger tests

Co-authored-by: SaulLu <lucilesaul.com@gmail.com>

* Map unk token for Wordpiece and address review comment

* Fix lowercase test and address review comment

* Fix all tests

* Simplify test

* Fix tests for realsies

* Easily train a new fast tokenizer from a given one - tackle the special tokens format (str or AddedToken) (#12420)

* Propose change in tests regarding lower case

* add new test for special tokens types

* put back the test part about decoding

* add feature: the AddedToken is re-build with the different mapped content

* Address review comment: simplify AddedToken building

Co-authored-by: sgugger <sylvain.gugger@gmail.com>

* Update src/transformers/tokenization_utils_fast.py

Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: SaulLu <lucilesaul.com@gmail.com>
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>

* [modelcard] fix (#12422)

this PR is fixing an incorrect attribute - probably some tests are needed?

* Add option to save on each training node (#12421)

* Add option to save on each training node

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Address review comments

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Added to talks section (#12433)

Added one more confirmed speaker, zoom links and gcal event links

* Fix default bool in argparser (#12424)

* Fix default bool in argparser

* Add more to test

* Add default bos_token and eos_token for tokenizer of deberta_v2 (#12429)

* fix ids_to_tokens naming error in tokenizer of deberta v2

* Update tokenization_deberta_v2.py

Add bos_token and eos_token.

* format code

Co-authored-by: Jipeng Huang <jihuan@microsoft.com>

* Add CANINE (#12024)

* First pass

* More progress

* Add support for local attention

* More improvements

* More improvements

* Conversion script working

* Add CanineTokenizer

* Make style & quality

* First draft of integration test

* Remove decoder test

* Improve tests

* Add documentation

* Mostly docs improvements

* Add CanineTokenizer tests

* Fix most tests on GPU, improve upsampling projection

* Address most comments by @dhgarrette

* Remove decoder logic

* Improve Canine tests, improve docs of CanineConfig

* All tokenizer tests passing

* Make fix-copies and fix tokenizer tests

* Fix test_model_outputs_equivalence test

* Apply suggestions from @sgugger's review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Address some more comments

* Add support for hidden_states and attentions of shallow encoders

* Define custom CanineModelOutputWithPooling, tests pass

* Make conversion script work for Canine-c too

* Fix tokenizer tests

* Remove file

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Document patch release v4.8.2

* fix typo in mt5 configuration docstring (#12432)

* Add to talks section (#12442)

* [JAX/Flax readme] add philosophy doc (#12419)

* add philosophy doc

* fix typos

* update doc

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* address Patrick's suggestions

* add a training example and fix typos

* jit the training step

* jit train step

* fix example code

* typo

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* [Flax] Add wav2vec2 (#12271)

* fix_torch_device_generate_test

* remove @

* start flax wav2vec2

* save intermediate

* forward pass has correct shape

* add weight norm

* add files

* finish ctc

* make style

* finish gumbel quantizer

* correct docstrings

* correct some more files

* fix vit

* finish quality

* correct tests

* correct docstring

* correct tests

* start wav2vec2 pretraining script

* save intermediate

* start pretraining script

* finalize pretraining script

* finish

* finish

* small typo

* finish

* correct

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>

* make style

* push

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Add missing Copied from statements

* Reference model uploaded under Google org

* Fix various duplicates from merging

* Rembert-large -> rembert, fix overeager Copied from, return type

* Incorporate PR comments from Patrick and Sylvain

Co-authored-by: ctheodoris <seanymphoceana@yahoo.com>
Co-authored-by: ctheodoris <cvtheodo@ds.dfci.harvard.edu>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Teven <teven.lescao@gmail.com>
Co-authored-by: Nick Lane-Smith <nlanesmith@gmail.com>
Co-authored-by: Shiro T <stsuchi@users.noreply.github.com>
Co-authored-by: Wang Ran (汪然) <wrran@outlook.com>
Co-authored-by: Ahmet Akkoç <themadprogramer@gmail.com>
Co-authored-by: francescorubbo <francescorubbo@users.noreply.github.com>
Co-authored-by: Daniel Stancl <46073029+stancld@users.noreply.github.com>
Co-authored-by: talkhaldi <tareq.alkhaldi@gmail.com>
Co-authored-by: joerenner <joepeterrenner@gmail.com>
Co-authored-by: jrenner <joseph.renner@inria.fr>
Co-authored-by: Avital Oliver <avitalo@google.com>
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
Co-authored-by: Josh Tanner <mindful.jt@gmail.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Bhadresh Savani <bhadreshpsavani@gmail.com>
Co-authored-by: Jayendra <jayendra0parmar@gmail.com>
Co-authored-by: jayendra <jayendra@infocusp.in>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Philip May <philip@may.la>
Co-authored-by: Nicholas Vadivelu <nicholas.vadivelu@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Shamane Siri <shamane@ahlab.org>
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
Co-authored-by: Fan Zhang <zhangfan.tju@gmail.com>
Co-authored-by: Riccardo Bassani <48254418+BassaniRiccardo@users.noreply.github.com>
Co-authored-by: Volodymyr Byno <volodymyr.byno@gmail.com>
Co-authored-by: Jeoung-Minju <51041861+JminJ@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Alberto Villa <a.villa.diez@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Gunjan Chhablani <chhablani.gunjan@gmail.com>
Co-authored-by: Kou Yong Kang <kou.yongkang@dhs.sg>
Co-authored-by: Shiva Pundir <36535845+ceevaaa@users.noreply.github.com>
Co-authored-by: François Lagunas <francois.lagunas@gmail.com>
Co-authored-by: Peter Izsak <232524+peteriz@users.noreply.github.com>
Co-authored-by: Russell Klopfer <russell@klopfer.us>
Co-authored-by: Mario Šaško <mariosasko777@gmail.com>
Co-authored-by: cdleong <4109253+cdleong@users.noreply.github.com>
Co-authored-by: Koichi Yasuoka <yasuoka@kanji.zinbun.kyoto-u.ac.jp>
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
Co-authored-by: kumapo <kumapo@users.noreply.github.com>
Co-authored-by: Tobias Norlund <tobias@norlund.se>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
Co-authored-by: Bhavitvya Malik <bhavitvya.malik@gmail.com>
Co-authored-by: Jonathan Chang <31893406+cccntu@users.noreply.github.com>
Co-authored-by: Guido Novati <16716298+novatig@users.noreply.github.com>
Co-authored-by: Guido Novati <gnovati@nvidia.com>
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
Co-authored-by: Nicholas Broad <nbroad94@gmail.com>
Co-authored-by: Nicholas Broad <nicholas@nmbroad.com>
Co-authored-by: Kumar Abhishek <kr.abhish@gmail.com>
Co-authored-by: Kumar Abhishek <kabhishek@expedia.com>
Co-authored-by: Will Rice <will@spokestack.io>
Co-authored-by: Vasudev Gupta <7vasudevgupta@gmail.com>
Co-authored-by: Kilian Kluge <32523967+ionicsolutions@users.noreply.github.com>
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com>
Co-authored-by: Xa9aX ツ <mishradiganta91@gmail.com>
Co-authored-by: Vishal Burman <vishal.a.burman23@gmail.com>
Co-authored-by: Hamid Shojanazeri <hamid.nazeri2010@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-32-81.us-west-2.compute.internal>
Co-authored-by: Stefan Schweter <stefan@schweter.it>
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
Co-authored-by: David Fan <30608893+jiafatom@users.noreply.github.com>
Co-authored-by: chenht2010 <chenht2010@yahoo.com>
Co-authored-by: chenhaitao <chenhaitao@qiyi.com>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
Co-authored-by: Michael Benayoun <michael@huggingface.co>
Co-authored-by: Sam Havens <47401552+sam-qordoba@users.noreply.github.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Marc van Zee <marcvanzee@gmail.com>
Co-authored-by: michal pitr <21157924+MichalPitr@users.noreply.github.com>
Co-authored-by: jglaser <glaserj@ornl.gov>
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
Co-authored-by: cronoik <johannes.schaffrath@mail.de>
Co-authored-by: Taha ValizadehAslani <47432410+TahaAslani@users.noreply.github.com>
Co-authored-by: Suzana Ilić <io.suzanai@gmail.com>
Co-authored-by: Funtowicz Morgan <mfuntowicz@users.noreply.github.com>
Co-authored-by: Will Rice <wrice20@gmail.com>
Co-authored-by: Jabin Huang <huangjipengnju@gmail.com>
Co-authored-by: Jipeng Huang <jihuan@microsoft.com>
Co-authored-by: SaulLu <lucilesaul.com@gmail.com>
Co-authored-by: fcakyon <34196005+fcakyon@users.noreply.github.com>
2021-07-24 11:31:42 -04:00
Patrick von Platen
f6e254474c
[Sequence Feature Extraction] Add truncation (#12804)
* fix_torch_device_generate_test

* remove @

* add truncate

* finish

* correct test

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* clean tests

* correct normalization for truncation

* remove casting

* up

* save intermed

* finish

* finish

* correct

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-07-23 17:53:30 +02:00
Stas Bekman
98364ea74f
[tests] fix logging_steps requirements (#12860) 2021-07-23 08:05:48 -07:00
Nicolas Patry
795c1444e9
Improving pipeline tests (#12784)
* Proposal

* Testing pipelines slightly better.

- Overall same design
- Metaclass to get proper different tests instead of subTest (not well
supported by Pytest)
- Added ANY meta object to make output checking more readable.
- Skipping architectures either without tiny_config or without
architecture.

* Small fix.

* Fixing the tests in case of None value.

* Oups.

* Rebased with more architectures.

* Fixing reformer tests (no override anymore).

* Adding more options for model tester config.

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-07-22 15:19:35 +02:00
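The "ANY meta object" mentioned in #12784 above can be illustrated with a small matcher whose equality check only looks at the type of the other value. This is a minimal sketch of the idea, not the code from the PR; the class body and the sample pipeline output are assumptions made for illustration.

```python
class ANY:
    """Matcher that compares equal to any instance of the given type(s)."""

    def __init__(self, *types):
        self._types = types

    def __eq__(self, other):
        # Equality only checks the type, so exact values do not matter.
        return isinstance(other, self._types)

    def __repr__(self):
        return "ANY(" + ", ".join(t.__name__ for t in self._types) + ")"


# Hypothetical pipeline output: scores vary between models, but the
# structure and the types should not.
output = [{"label": "POSITIVE", "score": 0.9987}]
expected = [{"label": ANY(str), "score": ANY(float)}]
matches = output == expected
```

Because `str` and `float` do not know how to compare themselves to `ANY`, Python falls back to the reflected `ANY.__eq__`, so the matcher works on either side of `==` and inside nested lists and dicts.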
Sylvain Gugger
786ced3639
Add versioning system to fast tokenizer files (#12713)
* Add versioning system to fast tokenizer files

* Deal with offline mode

* Use staging env in tests

* Style

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Style

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-07-21 08:24:36 -04:00
Lysandre Debut
c3d9ac7607
Expose get_config() on ModelTesters (#12812)
* Expose get_config() on ModelTesters

* Typo
2021-07-21 04:13:11 -04:00
Stas Bekman
cabcc75171
[trainer] sanity checks for save_steps=0|None and logging_steps=0 (#12796)
* [trainer] fix % 0

* sanity checks

* fix logging_strategy

* correction

* Update src/transformers/training_args.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-07-20 09:05:26 -07:00
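The `% 0` problem addressed in #12796 above can be sketched as follows. The function names and the error message are hypothetical and only illustrate the two kinds of guard the commit titles suggest: failing early on an inconsistent configuration, and never taking a modulo by a zero or missing step count.

```python
from typing import Optional


def validate_logging_steps(logging_strategy: str, logging_steps: int) -> None:
    """Fail early instead of hitting `step % 0` in the middle of training."""
    if logging_strategy == "steps" and logging_steps == 0:
        # Hypothetical message, not the wording used by the library.
        raise ValueError("logging strategy 'steps' requires non-zero logging_steps")


def should_save(global_step: int, save_steps: Optional[int]) -> bool:
    """Treat save_steps of 0 or None as 'never save on a step boundary'."""
    if not save_steps:
        return False
    return global_step % save_steps == 0


for steps in (0, None, 5):
    print(steps, should_save(10, steps))
```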
Sylvain Gugger
0118ef89ee
Enforce eval and save strategies are compatible when --load_best_model_at_end (#12786)
* Enforce eval and save strategies are compatible when --load_best_model_at_end

* Update doc

* Fix typos

* Fix tests
2021-07-19 19:50:47 +02:00
Tomohiro Endo
08d609bfb8
Add tokenizers class mismatch detection between cls and checkpoint (#12619)
* Detect mismatch by analyzing config

* Fix comment

* Fix import

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>

* Revise based on reviews

* remove kwargs

* Fix exception

* Fix handling exception again

* Disable mismatch test in PreTrainedTokenizerFast

Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
2021-07-17 15:52:21 +02:00
Patrick von Platen
b4b562d834
[Wav2Vec2] Padded vectors should not be allowed to be sampled (#12764)
* fix_torch_device_generate_test

* remove @

* finish

* correct script

* correct script
2021-07-16 19:07:08 +02:00
SaulLu
6e87010060
Preserve list type of additional_special_tokens in special_token_map (#12759)
* preserve type of `additional_special_tokens` in `special_token_map`

* format

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-07-16 18:26:54 +02:00
Funtowicz Morgan
fbf1397bf8
Turn on eval mode when exporting to ONNX (#12758)
* Set model in eval mode when exporting to ONNX.

* Disable t5 for now.

* Disable T5 with past too.

* Style.
2021-07-16 15:09:15 +02:00
Patrick von Platen
2e9fb13fb1
[Wav2Vec2] Correctly pad mask indices for PreTraining (#12748)
* fix_torch_device_generate_test

* remove @

* start adding tests

* correct wav2vec2 pretraining

* up

* up

Co-authored-by: Patrick von Platen <patrick@huggingface.co>
2021-07-15 21:40:25 +01:00
Lysandre Debut
959d448b3f
Fix led torchscript (#12735)
* Don't test LED on torchscript

* Typo
2021-07-15 11:48:50 -04:00
Lysandre Debut
f03580fb02
Fix DETR integration test (#12734) 2021-07-15 11:48:37 -04:00
Lysandre Debut
f42d9dcc0e
Patch T5 device test (#12742) 2021-07-15 16:40:17 +01:00
Lysandre Debut
370be9cc38
Fix MBart failing test (#12737) 2021-07-15 16:39:35 +01:00
Lysandre Debut
eb2e006b35
Skip test while the model is not available (#12740) 2021-07-15 09:14:12 -04:00
Lysandre Debut
8c7bd1b97b
Skip test while the model is not available (#12739) 2021-07-15 09:06:47 -04:00
Lysandre Debut
3290315a2a
Fix AutoModel tests (#12733) 2021-07-15 09:06:12 -04:00
Lysandre Debut
01cb2f25e3
LXMERT integration test typo (#12736) 2021-07-15 08:29:49 -04:00
Stas Bekman
a18a17d2b6
[test] split test into 4 sub-tests to avoid timeout (#12710)
* split the test into 4 sub-tests to avoid timeout

* fix decorator order
2021-07-14 13:04:58 -07:00
Sylvain Gugger
084873b025
Only test the files impacted by changes in the diff (#12644)
* Base test

* More test

* Fix mistake

* Add a docstring change

* Add doc ignore

* Add changes

* Add recursive dep search

* Add recursive dep search

* save

* Finalize test mapping

* Fix bug

* Print prettier

* Ignore comments and empty lines

* Make script runnable from anywhere

* Need dev install

* Like that

* Adapt

* Add as artifact

* Try on torch tests

* Fix yaml error

* Install GitPython

* Apply everywhere

* Be more defensive

* Revert to all tests if something is wrong

* Install GitPython

* Test if there are tests before launching.

* Fixes

* Fixes

* Fixes

* Fixes

* Bash syntax is horrible

* Be less stupid

* Try differently

* Typo

* Typo

* Typo

* Style

* Better name

* Escape quotes

* Ignore black unhelpful re-formatting

* Not a docstring

* Deal with inits in dependency map

* Run all tests once PR is merged.

* Add last job

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Stronger dependencies gather

* Ignore empty lines too!

* Clean up

* Fix quality

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-07-14 10:56:55 -04:00
Stas Bekman
5dd0c956a8
non-native optimizers are mostly ok with zero-offload (#12690) 2021-07-13 20:18:51 -07:00
Stas Bekman
78f5fe1416
[Deepspeed] adapt multiple models, add zero_to_fp32 tests (#12477)
* zero_to_fp32 tests

* args change

* remove unnecessary work

* use transformers.trainer_utils.get_last_checkpoint

* document the new features

* cleanup

* wip

* fix fsmt

* add bert

* cleanup

* add xlm-roberta

* electra works

* cleanup

* sync

* split off the model zoo tests

* cleanup

* cleanup

* cleanup

* cleanup

* reformat

* cleanup

* casing

* deepspeed>=0.4.3

* adjust distilbert

* Update docs/source/main_classes/deepspeed.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* style

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-07-13 12:07:32 -07:00
Patrick von Platen
cee2d2135f
[Flax Generation] Correct inconsistencies PyTorch/Flax (#12662)
* fix_torch_device_generate_test

* remove @

* correct greedy search

* save intermediate

* add final logits bias

* correct

* up

* add more tests

* fix another bug

* finish tests

* finish marian tests

* up

Co-authored-by: Patrick von Platen <patrick@huggingface.co>
2021-07-13 18:53:30 +01:00
Sylvain Gugger
90178b0cef
Add option to load a pretrained model with mismatched shapes (#12664)
* Add option to load a pretrained model with mismatched shapes

* Fail at loading when mismatched shapes in Flax

* Fix tests

* Update src/transformers/modeling_flax_utils.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Address review comments

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-07-13 10:15:15 -04:00
Lysandre Debut
9da1acaea2
**encode_plus() shouldn't run for W2V2CTC (#12655)
* **encode_plus() shouldn't run for W2V2CTC

* Typo
2021-07-13 06:31:56 -04:00
Lysandre Debut
a6938c4721
Patch BigBird tokenization test (#12653) 2021-07-13 02:53:06 -04:00
Lysandre Debut
b189226e8c
Fix transfo xl integration test (#12652)
* Cleanup test

* Skip TF TransfoXL test
2021-07-12 11:51:35 -04:00
Lysandre Debut
fd41e2daf4
Pipeline should be agnostic (#12656) 2021-07-12 11:42:59 -04:00
Lysandre Debut
fb5665b5ad
The extended trainer tests should require torch (#12650) 2021-07-12 09:47:05 -04:00
Lysandre Debut
0af8579bbe
Skip TestMarian_MT_EN (#12649)
* Skip TestMarian_MT_EN

* Skip EN_ZH and EN_ROMANCE

* Skip EN_ROMANCE pipeline
2021-07-12 09:11:32 -04:00
Will Rice
fb65f65ea6
Add TFHubertModel (#12206)
* TFHubert

* Update with TFWav2Vec Bug Fixes

* Add OOV Error

* Feedback changes

* Fix kwargs call
2021-07-09 18:55:25 +01:00
Alex Hedges
e7f33e8cb3
Pass model_kwargs when loading a model in pipeline() (#12449)
* Pass model_kwargs when loading a model in pipeline

* Add test for model_kwargs parameter of pipeline()

* Rewrite test to not download model

* Fix failing style checks
2021-07-09 09:24:55 -04:00
Patrick von Platen
65e27215ba
[Flax] Add flax marian (#12595)
* fix_torch_device_generate_test

* remove @

* add marian

* finish make style

* add model

* add docs

* add test

* add integration tests

* up

* solve bug

* correct tests

* correct some tests

* Apply suggestions from code review

Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* correct adapt marian

* finish

Co-authored-by: Patrick von Platen <patrick@huggingface.co>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-07-09 11:42:13 +01:00
Nicolas Patry
cc12e1dbf6
This will reduce "Already borrowed error": (#12550)
* This will reduce "Already borrowed error":

Original issue https://github.com/huggingface/tokenizers/issues/537

The original issue is caused by transformers repeatedly calling
mutable functions on the rust tokenizers.
Rust needs to guarantee that only 1 agent has a mutable reference
to memory at a given time (for many reasons which don't need explaining
here). Usually, the rust compiler can guarantee that this property is
true at compile time.

Unfortunately, Python cannot provide that guarantee, so PyO3, the
bridge between rust and python used by `tokenizers`, replaces the
compile-time guarantee with a runtime check: if multiple agents try
to hold mutable borrows at the same time, the runtime will
fail with "Already borrowed".

The proposed fix here in transformers is simply to reduce the
number of calls that really need mutable borrows. By reducing them,
we reduce the risk of running into the "Already borrowed" error.
The caveat is that we now add a call to read the current configuration of the
`_tokenizer`, so in the worst case we have 2 calls instead of 1, and in the
best case we simply have 1 call plus a Python comparison of a dict (should be negligible).

* Adding a test.

* trivial error :(.

* Update tests/test_tokenization_fast.py

Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>

* Adding reference to original issues in the tests.

* Update the tests with fast tokenizer.

Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
2021-07-09 09:36:05 +02:00
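The mitigation described in #12550 above (read the current configuration first, and only issue the mutating call when something actually changes) can be sketched as follows. `FakeBackendTokenizer` is a hypothetical stand-in for the Rust-backed `_tokenizer`, and `set_truncation` is an illustrative helper, not the function changed in the PR.

```python
class FakeBackendTokenizer:
    """Hypothetical stand-in for a Rust-backed tokenizer, for illustration only."""

    def __init__(self):
        self._truncation = None
        self.mutable_calls = 0

    @property
    def truncation(self):
        # Read-only access: models a call that needs no mutable borrow.
        return self._truncation

    def enable_truncation(self, max_length, stride=0):
        # Mutating call: models the kind of call that needs a mutable borrow.
        self.mutable_calls += 1
        self._truncation = {"max_length": max_length, "stride": stride}


def set_truncation(backend, max_length, stride=0):
    """Mutate the backend only when the requested settings differ."""
    target = {"max_length": max_length, "stride": stride}
    if backend.truncation != target:  # cheap read plus a dict comparison
        backend.enable_truncation(**target)


backend = FakeBackendTokenizer()
for _ in range(1000):
    set_truncation(backend, max_length=128)
print(backend.mutable_calls)
```

With unchanged settings, only the first call mutates the backend; the remaining calls cost one read and one dict comparison each, which matches the trade-off the commit message describes.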
Nicolas Patry
4da568c152
Fixing the pipeline optimization by reindexing targets (V2) (#12330)
* Fixing the pipeline optimization by rescaling the logits first.

* Add test for target equivalence

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-07-08 16:58:15 +02:00
Funtowicz Morgan
2aa3cd935d
[RFC] Laying down building stone for more flexible ONNX export capabilities (#11786)
* Laying down building stone for more flexible ONNX export capabilities

* Ability to provide a map of config key to override before exporting.

* Makes it possible to export BART with/without past keys.

* Supports simple mathematical syntax for OnnxVariable.repeated

* Effectively apply value override from onnx config for model

* Supports export with additional features such as with-past for seq2seq

* Store the output path directly in the args for uniform usage across.

* Make BART_ONNX_CONFIG_* constants and fix imports.

* Support BERT model.

* Use tokenizer for more flexibility in defining the inputs of a model.

* Add TODO as reminder to provide the batch/sequence_length as CLI args

* Enable optimizations to be done on the model.

* Enable GPT2 + past

* Improve model validation with outputs containing nested structures

* Enable Roberta

* Enable Albert

* Albert requires opset >= 12

* BERT-like models require opset >= 12

* Remove double printing.

* Enable XLM-Roberta

* Enable DistilBERT

* Disable optimization by default

* Fix missing setattr when applying optimizer_features

* Add value field to OnnxVariable to define constant input (not from tokenizers)

* Add T5 support.

* Simplify model type retrieval

* Example exporting token_classification pipeline for DistilBERT.

* Refactoring to package `transformers.onnx`

* Solve circular dependency & __main__

* Remove unnecessary imports in `__init__`

* Licences

* Use @Narsil's suggestion to forward the model's configuration to the ONNXConfig to avoid interpolation.

* Onnx export v2 fixes (#12388)

* Tiny fixes
Remove `convert_pytorch` from onnxruntime-less runtimes
Correct reference to model

* Style

* Fix Copied from

* LongFormer ONNX config.

* Removed optimizations

* Remove bad merge relics.

* Remove unused constants.

* Remove some deleted constants from imports.

* Fix unittest to remove usage of PyTorch model for onnx.utils.

* Fix distilbert export

* Enable ONNX export test for supported model.

* Style.

* Fix lint.

* Enable all supported default models.

* GPT2 only has one output

* Fix bad property name when overriding config.

* Added unittests and docstrings.

* Disable with_past tests for now.

* Enable outputs validation for default export.

* Remove graph opt lvls.

* Last commit with on-going past commented.

* Style.

* Disabled `with_past` for now

* Remove unused imports.

* Remove framework argument

* Remove TFPreTrainedModel reference

* Add documentation

* Add onnxruntime tests to CircleCI

* Add test

* Rename `convert_pytorch` to `export`

* Use OrderedDict for dummy inputs

* WIP Wav2Vec2

* Revert "WIP Wav2Vec2"

This reverts commit f665efb04c92525c3530e589029f0ae7afdf603e.

* Style

* Use OrderedDict for I/O

* Style.

* Specify OrderedDict documentation.

* Style :)

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-07-08 10:54:42 -04:00
Nicolas Patry
ebc69afc30
Adding support for pipeline("automatic-speech-recognition"). (#11525)
* Adding support for `pipeline("automatic-speech-recognition")`.

- Ugly `"config"` choice for AutoModel. It would be great to have the
possibility to have something like `AutoModelFor` that would implement
the same logic (Load the config, check Architectures and load the first
one)

* Remove `model_id`, which was not needed in the end.

* Rebased !

* Remove old code.

* Rename `nlp`.
2021-07-07 16:06:48 +02:00
Daniel Stancl
61400e1ec7
[Flax] Add FlaxMBart (#12236)
* Copy BART to MBart and rename some stuff

* Add copy statements pointing to FlaxBart

* Update/add some common files

* Update shift_tokens_right + fix imports

* Fix shift_tokens_right method according to MBart implementation

* Update shift_tokens_right in tests accordingly

* Fix the import issue and update docs file

* make style quality

* Do some minor changes according to patil-suraj suggestions

* Change the order of normalization layer and attention

* Add some copy statements

* Update generate method and add integration test for mBart

* Make a few updates after a review

Besides, add `lang_code_to_id` to MBartTokenizerFast

* fix-copies; make style quality

* Apply suggestions from code review

* Apply suggestions from code review

* Apply suggestions from code review

* fix output type, style

* add copied from

* resolve conflicts

Co-authored-by: Suraj Patil <surajp815@gmail.com>
2021-07-07 12:20:38 +05:30
sadakmed
3fd85777ea
implementing tflxmertmodel integration test (#12497)
* implementing tflxmertmodel integration test

* move import

* revert and fix
2021-07-06 11:44:47 -04:00
Suraj Patil
7a259c190c
FlaxGPTNeo (#12493)
* flax gpt neo

* fix query scaling

* update generation test

* use flax model for test
2021-07-06 18:55:18 +05:30
yujun
626a0a0147
[RoFormer] Fix some issues (#12397)
* add RoFormerTokenizerFast into AutoTokenizer

* fix typo in roformer docs

* make onnx export happy

* update RoFormerConfig embedding_size

* use jieba not rjieba

* fix 12244 and make test_alignement passed

* update ARCHIVE_MAP

* make style & quality & fixup

* update

* make style & quality & fixup

* make style quality fixup

* update

* suggestion from LysandreJik

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* make style

* use rjieba

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-07-06 03:31:57 -04:00
sadakmed
0e1718afb6
create LxmertModelIntegrationTest Pytorch (#9989)
* create LxmertModelIntegrationTest

* implementation using numpy seeding to fix inputs params.

* fix code quality

* isort check
2021-07-05 05:21:25 -04:00
Lysandre Debut
b889d3f6c4
Fix TAPAS test uncovered by #12446 (#12480) 2021-07-02 04:35:10 -04:00
Stas Bekman
2d1d92181a
[roberta] fix lm_head.decoder.weight ignore_key handling (#12446)
* fix lm_head.decoder.weight ignore_key handling

* fix the mutable class variable

* Update src/transformers/models/roberta/modeling_roberta.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* replicate the comment

* make deterministic

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-07-01 10:31:19 -07:00
Patrick von Platen
27d348f2fe
[Wav2Vec2, Hubert] Fix ctc loss test (#12458)
* fix_torch_device_generate_test

* remove @

* fix test
2021-07-01 08:59:32 -04:00
SaulLu
3aa37b945e
Add test for a WordLevel tokenizer model (#12437)
* add a test for a WordLevel tokenizer

* adapt common test to new tokenizer
2021-07-01 12:37:07 +02:00
Patrick von Platen
0d1f67e651
[Flax] Add wav2vec2 (#12271)
* fix_torch_device_generate_test

* remove @

* start flax wav2vec2

* save intermediate

* forward pass has correct shape

* add weight norm

* add files

* finish ctc

* make style

* finish gumbel quantizer

* correct docstrings

* correct some more files

* fix vit

* finish quality

* correct tests

* correct docstring

* correct tests

* start wav2vec2 pretraining script

* save intermediate

* start pretraining script

* finalize pretraining script

* finish

* finish

* small typo

* finish

* correct

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>

* make style

* push

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
2021-06-30 18:44:23 +01:00
NielsRogge
6e68597877
Add CANINE (#12024)
* First pass

* More progress

* Add support for local attention

* More improvements

* More improvements

* Conversion script working

* Add CanineTokenizer

* Make style & quality

* First draft of integration test

* Remove decoder test

* Improve tests

* Add documentation

* Mostly docs improvements

* Add CanineTokenizer tests

* Fix most tests on GPU, improve upsampling projection

* Address most comments by @dhgarrette

* Remove decoder logic

* Improve Canine tests, improve docs of CanineConfig

* All tokenizer tests passing

* Make fix-copies and fix tokenizer tests

* Fix test_model_outputs_equivalence test

* Apply suggestions from @sgugger's review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Address some more comments

* Add support for hidden_states and attentions of shallow encoders

* Define custom CanineModelOutputWithPooling, tests pass

* Make conversion script work for Canine-c too

* Fix tokenizer tests

* Remove file

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-30 08:05:44 -04:00
Sylvain Gugger
c9486fd0f5
Fix default bool in argparser (#12424)
* Fix default bool in argparser

* Add more to test
2021-06-30 07:57:05 -04:00
Sylvain Gugger
dc42e770b8
Easily train a new fast tokenizer from a given one (#12361)
* [WIP] Easily train a new fast tokenizer from a given one

* Fix test

* Roll out to other tokenizers and add tests

* Fix bug with unk id and add emoji to test

* Really use something different in test

* Implement special tokens map

* Map special tokens in the Transformers tokenizers

* Fix test

* Make test more robust

* Fix test for BPE

* More robust map and test

Co-authored-by SaulLu

* Test file

* Stronger tests

Co-authored-by: SaulLu <lucilesaul.com@gmail.com>

* Map unk token for Wordpiece and address review comment

* Fix lowercase test and address review comment

* Fix all tests

* Simplify test

* Fix tests for realsies

* Easily train a new fast tokenizer from a given one - tackle the special tokens format (str or AddedToken) (#12420)

* Propose change in tests regarding lower case

* add new test for special tokens types

* put back the test part about decoding

* add feature: the AddedToken is re-build with the different mapped content

* Address review comment: simplify AddedToken building

Co-authored-by: sgugger <sylvain.gugger@gmail.com>

* Update src/transformers/tokenization_utils_fast.py

Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: SaulLu <lucilesaul.com@gmail.com>
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
2021-06-29 15:00:08 -04:00
Will Rice
bc084938f2
Add out of vocabulary error to ASR models (#12288)
* Add OOV error to ASR models

* Feedback changes
2021-06-29 08:57:46 +01:00
NielsRogge
1fc6817a30
Rename detr targets to labels (#12280)
* Rename target to labels in DetrFeatureExtractor

* Update DetrFeatureExtractor tests accordingly

* Improve docs of DetrFeatureExtractor

* Improve docs

* Make style
2021-06-29 03:07:46 -04:00
Stas Bekman
7682e97702
[models] respect dtype of the model when instantiating it (#12316)
* [models] respect dtype of the model when instantiating it

* cleanup

* cleanup

* rework to handle non-float dtype

* fix

* switch to fp32 tiny model

* improve

* use dtype.is_floating_point

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix the doc

* recode to use explicit torch_dtype_auto_detect, torch_dtype args

* docs and tweaks

* docs and tweaks

* docs and tweaks

* merge 2 args, add docs

* fix

* fix

* better doc

* better doc

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-28 20:11:21 -07:00
Bhadresh Savani
04dbea31a9
[Examples] Added context manager to datasets map (#12367)
* added context manager to datasets map

* fixed style and spaces

* fixed warning of deprecation

* changed desc
2021-06-28 09:14:00 -07:00
Stas Bekman
4a872caef4
remove extra white space from log format (#12360) 2021-06-25 13:20:14 -07:00
Lysandre Debut
8ef62ec9e1
Fix torchscript tests (#12336)
* Fix torchscript tests

* Better test

* Remove bogus print
2021-06-24 09:52:28 -04:00
Michael Benayoun
986ac03e37
changed modeling_fx_utils.py to utils/fx.py for clarity (#12326)
Co-authored-by: Michael Benayoun <michael@huggingface.co>
2021-06-23 18:16:24 +02:00
Lysandre
941b4442ba Temporarily revert the fill-mask improvements. 2021-06-23 17:46:24 +02:00
Sylvain Gugger
53c60babe4
Clean push to hub API (#12187)
* Clean push to hub API

* Create working dir if it does not exist

* Different tweak

* New API + all models + test Flax

* Adds the Trainer clean up

* Update src/transformers/file_utils.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comments

* (nit) output types

* No need to set clone_from when folder exists

* Update src/transformers/trainer.py

Co-authored-by: Julien Chaumond <julien@huggingface.co>

* Add generated_from_trainer tag

* Update to new version

* Fixes

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-06-23 10:11:19 -04:00
Vasudev Gupta
e98233dde1
Flax T5 (#12150)
* copy pytorch-t5

* init

* boom boom

* forward pass same

* make generation work

* add more tests

* make test work

* finish normal tests

* make fix-copies

* finish quality

* correct slow example

* correct slow test

* version table

* upload models

* Update tests/test_modeling_flax_t5.py

* correct incorrectly deleted line

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
2021-06-23 13:13:32 +01:00
Daniel Stancl
26a2e36595
Add output in a dictionary for TF generate method (#12139)
* Add output args to greedy search

* Fix critical typo + make style quality

* Handle generate_beam_search

* Add dict_specific tests and fix the placement of encoder outputs

* Add  specific outputs

* Update doc

* Fix typo

* Adjust handling encoder_outputs + Fix generating for T5

* Fix generate for RAG

* Fix handling output_attentions when target_mapping is not None

Take care of situations when target_mapping is provided,
as the attentions are then 2-tuples

Change from:
if inputs["output_attentions"]:
    attentions = tuple(tf.transpose(t, perm=(2, 3, 0, 1)) for t in attentions)

to:
if inputs["output_attentions"]:
    if inputs["target_mapping"] is not None:
        # when target_mapping is provided, each entry is a 2-tuple of attentions
        attentions = tuple(
            tuple(tf.transpose(attn_stream, perm=(2, 3, 0, 1)) for attn_stream in t) for t in attentions
        )
    else:
        attentions = tuple(tf.transpose(t, perm=(2, 3, 0, 1)) for t in attentions)

* Rename kwargs to model_kwargs

* make style quality

* Move imports in test_modeling_tf_common.py

Move ModelOutput-related imports in test_modeling_tf_common.py
into the `is_tf_available():` statement.

* Rewrite nested if-statements

* Fix added tests
2021-06-23 10:52:11 +01:00
Nicolas Patry
d4be498441
Optimizing away the fill-mask pipeline. (#12113)
* Optimizing away the `fill-mask` pipeline.

- Don't send anything to the tokenizer unless needed. Vocab check is
much faster
- Keep BC by sending data to the tokenizer when needed. Users who handle the warning messages will see the performance benefits again
- Make `targets` and `top_k` work together better: `top_k` cannot be
higher than `len(targets)` but can still be smaller.
- Actually simplify the `target_ids` in case of duplicates (it can happen
because we're parsing raw strings)
- Removed useless code to fail on empty strings. It worked only if the empty
string was in first position; moved to ignoring them instead.
- Changed the related tests as only the tests would fail correctly
(having incorrect value in first position)

* Make tests compatible for 2 different vocabs... (at the price of a
warning).

Co-authored-by: @EtaoinWu

* ValueError working globally

* Update src/transformers/pipelines/fill_mask.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* `tokenizer.vocab` -> `tokenizer.get_vocab()` for more compatibility +
fallback.

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-06-23 10:38:04 +02:00
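The target handling described in the fill-mask commit above (vocab lookup first, duplicates dropped, empty strings ignored, `top_k` capped at the number of targets) can be sketched as follows. This is a minimal illustration and not the pipeline's actual code: `resolve_targets` and its plain-dict `vocab` argument are hypothetical, and the tokenizer fallback mentioned in the commit is replaced by simply skipping unknown targets.

```python
from typing import Dict, List, Tuple


def resolve_targets(targets: List[str], vocab: Dict[str, int], top_k: int) -> Tuple[List[int], int]:
    """Map raw target strings to unique vocab ids and cap top_k (hypothetical helper)."""
    target_ids: List[int] = []
    for target in targets:
        if not target:
            # Empty strings are ignored instead of raising an error.
            continue
        token_id = vocab.get(target)
        if token_id is None:
            # The commit describes a tokenizer fallback here; this sketch just skips.
            continue
        if token_id not in target_ids:
            # Drop duplicates while keeping first-seen order.
            target_ids.append(token_id)
    if not target_ids:
        raise ValueError("At least one target must be present in the vocabulary.")
    # top_k may be smaller than the number of targets, but never larger.
    return target_ids, min(top_k, len(target_ids))
```

Checking the vocabulary with a dictionary lookup avoids a tokenizer call for every target, which is the speed-up the commit message points to.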
Stas Bekman
ebe5413589
[trainer] 2 bug fixes and a rename (#12309)
* bug fixes and a rename

* add extended DDP test
2021-06-22 11:13:23 -07:00
Stas Bekman
0d97ba8a98
[tests] multiple improvements (#12294)
* [tests] multiple improvements

* cleanup

* style

* todo to investigate

* fix
2021-06-21 19:51:36 -07:00
Stas Bekman
dad414d5f9
[trainer + examples] set log level from CLI (#12276)
* set log level from CLI

* add log_level_replica + test + extended docs

* cleanup

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* rename datasets objects to allow datasets module

* improve the doc

* style

* doc improve

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-21 19:30:50 -07:00
Stas Bekman
a4ed074d4b
reset report_to to none, avoid deprecation warning (#12293) 2021-06-21 16:50:12 -07:00
Patrick von Platen
4e9a6796c7
[Flax] Fix flax test save pretrained (#12256)
* fix_torch_device_generate_test

* remove @

* fix flax save pretrained test
2021-06-21 16:37:13 +01:00
Suraj Patil
eb881674f2
[Flax] [WIP] allow loading head model with base model weights (#12255)
* boom boom

* remove flax clip example

* allow loading head model with base model weights

* add test

* fix imports

* disable save, load test for clip

* add test_save_load_to_base
2021-06-21 15:56:42 +01:00
Suraj Patil
8d5b7f36e5
[FlaxClip] fix test from/save pretrained test (#12284)
* boom boom

* remove flax clip example

* fix from_save_pretrained
2021-06-21 15:54:34 +01:00
Sylvain Gugger
adb70eda4d
AutoTokenizer: infer the class from the tokenizer config if possible (#12208)
* AutoTokenizer: infer the class from the tokenizer config if possible

* Add tests

* Update src/transformers/models/auto/tokenization_auto.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-06-17 12:39:22 -04:00
Lysandre Debut
b56848c8c8
Pipeline update & tests (#12207) 2021-06-17 09:41:16 +02:00
Patrick von Platen
ccca510276
Hubert (#11889)
* fix_torch_device_generate_test

* remove @

* add hubert

* add first test file

* more docs

* fix bugs

* fix bug

* finish

* finish

* finish docstring

* fix

* fix

* finalize

* add to ignored

* finish

* Apply suggestions from code review

* correct naming

* finish

* fix auto config

* finish

* correct convert script

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Suraj Patil <surajp815@gmail.com>

* apply suggestions lysandre & suraj

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
2021-06-16 12:14:12 +01:00
Patrick von Platen
c3c39f7e84
[Flax] Add Beam Search (#12131)
* fix_torch_device_generate_test

* remove @

* push new logit processors

* add processors

* save first working version

* save intermediate

* finish

* make style

* make fix-copies

* finish

* Update tests/test_modeling_flax_bart.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Suraj Patil <surajp815@gmail.com>

Co-authored-by: Patrick von Platen <patrick@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
2021-06-16 09:43:54 +01:00
Stas Bekman
6e7cc5cc51
[testing] ensure concurrent pytest workers use a unique port for torch.dist (#12166)
* ensure concurrent pytest workers use a unique port for torch.distributed.launch

* reword
2021-06-15 11:12:59 -07:00
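One way to give each concurrent pytest worker its own port, as the commit above describes, is to offset a base port by the pytest-xdist worker index. The sketch below is an assumption about the approach, not the repository's helper: `unique_dist_port` is a hypothetical name, and it relies on pytest-xdist exporting `PYTEST_XDIST_WORKER` as `gw0`, `gw1`, and so on.

```python
import os

# Assumed base port; 29500 is the usual torch.distributed default.
DEFAULT_PORT = 29500


def unique_dist_port(base_port: int = DEFAULT_PORT) -> int:
    """Return a port that differs per pytest-xdist worker (hypothetical helper)."""
    worker = os.environ.get("PYTEST_XDIST_WORKER", "gw0")
    suffix = worker[2:]
    # Fall back to index 0 when not running under pytest-xdist.
    worker_index = int(suffix) if worker.startswith("gw") and suffix.isdigit() else 0
    return base_port + worker_index
```

A test would then pass the returned value as the launcher's master port so that parallel workers do not collide.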
Amog Kamsetty
b9d66f4c4b
Ray Tune Integration Updates (#12134)
* fix

* fixes

* add back to scheduled tests

* formatting

* Update integrations.py
2021-06-15 14:11:29 -04:00
Stas Bekman
372ab9cd6d
[style] consistent nn. and nn.functional: part 3 tests (#12155)
* consistent nn. and nn.functional: p3 templates

* restore
2021-06-14 12:18:22 -07:00
Vasudev Gupta
d9c0d08f9a
Flax Big Bird (#11967)
* add flax bert

* bert -> bigbird

* original_full ported

* add debugger

* init block sparse

* fix copies ; gelu_fast -> gelu_new

* block sparse port

* fix block sparse

* block sparse working

* all ckpts working

* fix-copies

* make quality

* init tests

* temporary fix for FlaxBigBirdForMultipleChoice

* skip test_attention_outputs

* fix

* gelu_fast -> gelu_new ; fix multiple choice model

* remove nsp

* fix sequence classifier

* fix

* make quality

* make fix-copies

* finish

* Delete debugger.ipynb

* Update src/transformers/models/big_bird/modeling_flax_big_bird.py

* make style

* finish

* bye bye jit flax tests

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-06-14 20:01:03 +01:00
Patrick von Platen
007be9e402
[Flax] Fix flax pt equivalence tests (#12154)
* fix_torch_device_generate_test

* remove @

* upload
2021-06-14 19:19:10 +01:00
Will Rice
d438eee030
Adding TFWav2Vec2Model (#11617)
* [WIP] Add TFWav2Vec2Model

Work in progress for adding a tensorflow version of Wav2Vec2

* feedback changes

* small fix

* Test Feedback Round 1

* Add SpecAugment and CTC Loss

* correct spec augment mask creation

* docstring and correct copyright

* correct bugs

* remove bogus file

* finish tests correction

* del unnecessary layers

* Update src/transformers/models/wav2vec2/modeling_tf_wav2vec2.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* make style

* correct final bug

* Feedback Changes

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-06-14 18:58:54 +01:00
Stas Bekman
ff7c81687a
[optim] implement AdafactorSchedule (#12123)
* implement AdafactorSchedule

* typo

* fix

* Update src/transformers/optimization.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-14 09:43:48 -07:00
SaulLu
476ba679dd
Feature to use the PreTrainedTokenizerFast class as a stand-alone tokenizer (#11810)
* feature for tokenizer without slow/legacy version

* format

* modify common test

* add tests

* add PreTrainedTokenizerFast to AutoTokenizer

* format

* change tokenizer common test in order to be able to run test without a slow version

* update tokenizer fast test in order to use `rust_tokenizer_class` attribute instead of `tokenizer_class`

* add AutoTokenizer test

* replace  `if self.tokenizer_class is not None` with ` if self.tokenizer_class is None`

* remove obsolete change in comment

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/tokenization_utils_fast.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* change `get_main_tokenizer` into `get_tokenizers`

* clarify `get_tokenizers` method

* homogenize with `test_slow_tokenizer` and `test_rust_tokenizer`

* add `test_rust_tokenizer = False` to tokenizer which don't define a fast version

* `test_rust_tokenizer = False` for BertJapaneseTokenizer

* `test_rust_tokenizer = False` for BertJapaneseCharacterTokenizationTest

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-14 11:58:44 +02:00
Daniel Stancl
4a51b1dd9b
FlaxBart (#11537)
* Start working on FlaxBart

* Create modeling_flax_bart.py

* Write FlaxBartAttention

* Add FlaxBartEncoderLayer

* Add FlaxBartDecoderLayer and some typing

* Add helper function for FlaxBart

* shift_tokens_right

* _make_causal_mask

* _expand_mask

* Add PositionalEmbedding and fix init_std naming

* Add FlaxBartPretrainedModel

* Add FlaxBartEncoder

* Add FlaxBartEncoder

* Add FlaxBartEncoder among modules to be imported

* YET WE CANNOT INITIALIZE THAT!! :(

* Make BartEncoder working

Change BartEncoder to instance of nn.Module so far

* Add FlaxBartDecoder

* Add FlaxBartModel

* TODO to make model run -> Prepare model inputs

* Resolve padding

* Add FlaxBartModel

* Add FlaxBartModel into importable modules

* Remove FlaxBartEncoder and FlaxBartDecoder from importable modules

* make style; not properly working

* make style; make quality not pass due to some import I left

* Remove TODO for padding_idx in nn.Embed so far

* Add FlaxBartForConditionalGeneration

* Incorporate Flax model output classes, i.e. return_dict

* Add other models and incorporate use_cache arg

* Add FlaxBartForSequenceClassification and FlaxBartForQuestionAnswering

* Incorporate use_cache arg from PyTorch implementation

* Add all necessary Flax output utils

* Add FlaxBartForCausalLM; not working yet

* Add minor improvements; still lacks some functionality

* Update docs, src and tests

* Add support of FlaxBart to docs/source

* Fix some bugs in FlaxBart source code

* Add some necessary tests for FlaxBart models - jit_compilation not passing

* Fix tests and add test_head_masking

* Fix tests for @jax.jit computation

* Add test_head_masking

* Migrate FlaxBart tests from jax.numpy to numpy

* Remove FlaxBartForCausalLM

* Clean repo

* fix bart model weight structure

* Fix FlaxBartForSequenceClassification

Slicing cannot be used under jit, therefore selecting the sentence
representation from hidden_states must be changed.

* Allow FlaxBartForSequenceClassification for testing pt_flax equivalence

* Allow testing for FlaxBartForQA for pt_flax equivalence

* Add a comment to FlaxBartForSequenceClassification + change noise from 1e-3 to 1e-6

* remove past_key_values

* remove inputs_embeds and make input_ids required

* add position ids

* re-write attention layer

* fix dataclass

* fix pos embeds and attention output

* fix pos embeds

* expose encode method

* expose decode method

* move docstring to top

* add cache for causal attn layer

* remove head masking for now

* s2s greedy search first pass

* boom boom

* fix typos

* fix greedy generate for bart

* use encoder, decoder layers instead of num_hidden_layers

* handle encoder_outputs

* cleanup

* simplify decoding

* more clean-up

* typos

* Change header + add {decoder_,}position_ids into 2 models

* add BartConfig

* fix existing tests

* add encode, decode methods

* Fix shift_tokens_right for JIT compilation + clarify one condition

* fix decode

* encoder => encode

* simplify generate

* add tests for encode and decode

* style

* add tests for cache

* fix equivalence tests

* sample generate now works with seq2seq

* generation tests

* initialize dense layers

* docstring and cleanup

* quality

* remove get/set input_embeddings

* address Patrick's suggestions

* decode for every model, remove encoder_outputs from call

* update tests accordingly

* decode returns only decoder outputs and logits

* fix arguments

* doc encode, decode methods

* correct base_model_prefix

* fix test for seq classif model

* fix docs

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
2021-06-14 15:16:08 +05:30
Guido Novati
ecd6efe7cb
Fix megatron_gpt2 attention block's causal mask (#12007)
* Fix megatron_gpt2 attention block's causal mask.

* compatibility with checkpoints created with recent versions of Megatron-LM

* added integration test for the released Megatron-GPT2 model

* code style changes

* added option to megatron conversion script to read from config file

Co-authored-by: Guido Novati <gnovati@nvidia.com>
2021-06-14 04:57:55 -04:00
Patrick von Platen
e47765d884
Fix head masking generate tests (#12110)
* fix_torch_device_generate_test

* remove @

* fix tests
2021-06-11 04:04:07 -04:00
Jayendra
9a9314f6d9
Flax VisionTransformer (#11951)
* adding vit for flax

* added test for Flax-vit and some bug-fixes

* overrode methods where variable changes were necessary for flax_vit test

* added FlaxViTForImageClassification for test

* Update src/transformers/models/vit/modeling_flax_vit.py

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* made changes suggested in PR

* Adding jax-vit models for autoimport

* swapping num_channels and height,width dimension

* fixing the docstring for torch-like inputs for VIT

* add model to main init

* add docs

* doc, fix-copies

* docstrings

* small test fixes

* fix docs

* fix docstr

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* style

Co-authored-by: jayendra <jayendra@infocusp.in>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-06-10 21:17:13 +05:30
Daniel Stancl
0eaeae2e36
Fix a condition in test_generate_with_head_masking (#11911)
* Fix a condition in test_generate_with_head_masking

* Fix usage of head_mask in bigbird_pegasus

* Fix head masking for speech2text

* Resolve copy mismatch + drop unwanted print statement

* Fix the condition
2021-06-10 15:28:07 +01:00
Tobias Norlund
9d2cee8b48
CLIPFeatureExtractor should resize images with kept aspect ratio (#11994)
* Resize with kept aspect ratio

* Fixed failed test

* Overload center_crop and resize methods instead

* resize should handle non-PIL images

* update slow test

* Tensor => tensor

Co-authored-by: patil-suraj <surajp815@gmail.com>
2021-06-10 18:40:41 +05:30
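Resizing with a kept aspect ratio followed by a center crop, as named in the CLIPFeatureExtractor commit above, comes down to simple size arithmetic. The sketch below assumes the shorter side is scaled to the target size, which the commit message does not state explicitly; both function names are hypothetical and no image library is involved.

```python
from typing import Tuple


def resize_keep_aspect(width: int, height: int, size: int) -> Tuple[int, int]:
    """Return (new_width, new_height) with the shorter side equal to `size`."""
    if width <= height:
        return size, int(size * height / width)
    return int(size * width / height), size


def center_crop_box(width: int, height: int, crop: int) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of a centered crop x crop square."""
    left = (width - crop) // 2
    top = (height - crop) // 2
    return left, top, left + crop, top + crop
```

Resizing to a fixed square directly would distort non-square inputs; scaling first and cropping afterwards keeps the proportions of the content.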
Patrick von Platen
bc6f51e539
[Wav2Vec2ForPretraining] Correct checkpoints wav2vec2 & fix tests (#12089)
* fix_torch_device_generate_test

* remove @

* fix tests
2021-06-09 20:41:59 +01:00
Stas Bekman
61e191987d
rm require_version_examples (#12088) 2021-06-09 11:02:52 -07:00
Anton Lozhkov
d472bd7b18
Wav2Vec2 Pretraining (#11306)
* Working quantizer forward

* Working quantizer forward

* Clean up unused model parts, test reproducibility

* Working quantizer forward

* Clean up unused model parts, test reproducibility

* Remove custom outputs from the shared ones

* correct conversion

* correct bug

* add first pretrain script

* save intermediate

* static shapes

* save intermediate

* finish first pretrain script version

* more refactor

* remove wandb

* refactor more

* improve test

* correct perplexity compute bug

* finish model implementation

* add to docs

* finish docs

* finish pretraining script

* finish pretraining script

* remove wandb

* finish PR for merge

* finish config

* finish

* make deepspeed work

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* apply suggestions

* fix flaky test

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-09 18:40:56 +01:00
Stas Bekman
b1a8aa94f0
[test] support more than 2 gpus (#12074)
* support more than 2 gpus

* style
2021-06-09 09:23:47 -07:00
NielsRogge
d3eacbb829
Add DETR (#11653)
* Squash all commits of modeling_detr_v7 branch into one

* Improve docs

* Fix tests

* Style

* Improve docs some more and fix most tests

* Fix slow tests of ViT, DeiT and DETR

* Improve replacement of batch norm

* Restructure timm backbone forward

* Make DetrForSegmentation support any timm backbone

* Fix name of output

* Address most comments by @LysandreJik

* Give better names for variables

* Conditional imports + timm in setup.py

* Address additional comments by @sgugger

* Make style, add require_timm and require_vision to tests

* Remove train_backbone attribute of DetrConfig, add methods to freeze/unfreeze backbone

* Add png files to fixtures

* Fix type hint

* Add timm to workflows

* Add `BatchNorm2d` to the weight initialization

* Fix retain_grad test

* Replace model checkpoints by Facebook namespace

* Fix name of checkpoint in test

* Add user-friendly message when scipy is not available

* Address most comments by @patrickvonplaten

* Remove return_intermediate_layers attribute of DetrConfig and simplify Joiner

* Better initialization

* Scipy is necessary to get sklearn metrics

* Rename TimmBackbone to DetrTimmConvEncoder and rename DetrJoiner to DetrConvModel

* Make style

* Improve docs and add 2 community notebooks

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-06-09 11:51:13 -04:00
Stas Bekman
11d86d3de4
[Deepspeed Wav2vec2] integration (#11638)
* wip

* wip - but working with https://github.com/microsoft/DeepSpeed/pull/1044

* cleanup

* workaround

* working 5/8 modes

* solve fp32 distributed zero3

* style

* sync

* sync

* rework

* deprecation

* cleanup

* https://github.com/microsoft/DeepSpeed/pull/1044 pr was merged

* clean up

* add a guide

* more prose

* more prose

* fix

* more prose

* sub_group_size was too big

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* refactor

* bug fix

* make the true check explicit

* new deepspeed release

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-08 12:32:03 -07:00
Stas Bekman
32290d87f6
[Deepspeed] various fixes (#12058)
* replace deprecated config

* sub_group_size was too big

* complete deprecation removal
2021-06-08 08:36:15 -07:00
Mario Šaško
f5eec0d8e9
Replace legacy torch.Tensor with torch.tensor/torch.empty (#12027)
* Replace legacy torch.Tensor constructor with torch.{tensor, empty}

* Remove torch.Tensor in examples
2021-06-08 13:58:38 +01:00
NielsRogge
70f88eeccc
Fix tapas issue (#12063)
* Fix scatter function to be compatible with torch-scatter 2.7.0

* Allow test again
2021-06-08 05:22:31 -04:00
NielsRogge
e56e3140dd
Fix integration tests (#12066) 2021-06-08 05:21:38 -04:00
Stas Bekman
4abc6dd690
skip failing test (#12059) 2021-06-07 20:48:41 -07:00
Nicolas Patry
2056f26e85
Extend pipelines for automodel tuples (#12025)
* fix_torch_device_generate_test

* remove @

* finish

* refactor

* add test

* fix test

* Attempt at simplification.

* Small fix.

* Fixing non existing AutoModel for TF.

* Naming.

* Remove extra condition.

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
2021-06-07 17:41:27 +02:00
Philip May
3857f2b4e3
fix deberta 2 tokenizer integration test (#12017) 2021-06-07 04:55:55 -04:00
Stas Bekman
2c73b93099
[Deepspeed] Assert on mismatches between ds and hf args (#12021)
* wip

* add mismatch validation + test

* renames

* Update docs/source/main_classes/deepspeed.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* renames

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-04 08:58:23 -07:00
Stas Bekman
61c5063491
[deepspeed] add nvme test skip rule (#11997)
* add nvme skip rule

* fix
2021-06-02 12:06:37 -07:00
Stas Bekman
640318befa
[deepspeed] Move code and doc into standalone files (#11984)
* move code and docs

* style

* moved

* restore
2021-06-02 09:56:00 -07:00
Gunjan Chhablani
88ca6a231d
VisualBERT (#10534)
* Init VisualBERT

* Add cookie-cutter, Config, and Embeddings

* Add preliminary Model

* Add Bert analogous classes

* Add basic code for NLVR, VQA, Flickr

* Update Init

* Fix VisualBert Downstream Models

* Rename classifier to cls

* Comment position_ids buffer

* Remove sentence image predictor output

* Update output dicts

* Remove unnecessary files

* Fix Auto Modeling

* Fix transformers init

* Add conversion script

* Add conversion script

* Fix docs

* Update visualbert modelling

* Update configuration

* Style fixes

* Add model and integration tests

* Add all tests

* Update model mapping

* Add simple detector from original repository

* Update docs and configs

* Fix style

* Fix style

* Update docs

* Fix style

* Fix import issues in style

* Fix style

* Add changes from review

* Fix style

* Fix style

* Update docs

* Fix style

* Fix style

* Update docs/source/model_doc/visual_bert.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/visual_bert/modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update tests/test_modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/visual_bert/modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/visual_bert/modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/visual_bert/modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Add changes from review

* Remove convert run script

* Add changes from review

* Update src/transformers/models/visual_bert/modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/visual_bert/modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/visual_bert/modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/visual_bert/modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/visual_bert/modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Add changes from review

* Add changes from review

* Add visual embedding example in docs

* Fix "copied from" comments

* Add changes from review

* Fix error, style, checkpoints

* Update docs

* Fix integration tests

* Fix style

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-02 18:13:08 +05:30
Patrick von Platen
43f46aa7fd
[RAG] Fix rag from pretrained question encoder generator behavior (#11962)
* fix_torch_device_generate_test

* remove @

* fix rag from pretrained loading

* add test

* upload

* finish
2021-06-02 09:17:14 +01:00
Stas Bekman
4ba203d9d3
[Trainer] add train loss and flops metrics reports (#11980)
* add train loss and flops metrics reports

* consistency

* add train_loss to skip keys

* restore on_train_end call timing
2021-06-01 15:58:31 -07:00
Stas Bekman
7ec596ecda
[DeepSpeed] decouple DeepSpeedConfigHF from Trainer (#11966)
* decouple DeepSpeedConfigHF from Trainer

* add LoggingLevel ctx manager; add new test

* cleanup

* add docs

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* implemented suggested renames

* formatter workaround

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-01 13:24:52 -07:00
Patrick von Platen
47a98fc4cb
ByT5 model (#11971)
* allow tf to use uneven num of layers

* add tokenizer

* finish docs

* finish docs

* Apply suggestions from code review

* include in index

* finish

* Update docs/source/model_doc/byt5.rst

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* apply Sylvain's suggestions

* make style

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2021-06-01 19:07:37 +01:00
Philip May
fcad801825
Add regression tests for slow sentencepiece tokenizers. (#11737)
* add test_vocab_size for sentencepiece tok.

* add test_get_vocab for sentencepiece tok.

* add test_convert_token_and_id for sentencepiece tok.

* add test_tokenize_and_convert_tokens_to_string for all tok.

* improve test_tokenize_and_convert_tokens_to_string for sp. tok.

* add common tokenizer integration tests
- for albert
- for barthez

* add tokenizer integration tests to bert gen.

* add most tokenizer integration tests

* fix camembert tokenizer integration test

* add tokenizer integration test to marian

* add tokenizer integration test to reformer

* add typing and doc to tokenizer_integration_test_util

* fix tokenizer integration test of reformer

* improve test_sentencepiece_tokenize_and_convert_tokens_to_string

* empty commit to trigger CI

* fix tokenizer integration test of reformer

* remove code not needed anymore

* empty commit to trigger CI

* empty commit to trigger CI
2021-06-01 09:24:39 -04:00
Shamane Siri
9ec0f01b6c
RAG-2nd2end-revamp (#11893)
* initial

* code quality test

* code quality

* added test functions in test_modeling_rag.py and test_retrieval_rag.py to test end2end retriever

* minor change in test_modeling_rag

* fixed tests

* Update examples/research_projects/rag-end2end-retriever/README.md

typo corrected as suggested by lhoestq

Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>

* Update examples/research_projects/rag-end2end-retriever/finetune_rag.py

type change suggested by lhoestq

Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>

* Update src/transformers/models/rag/retrieval_rag.py

Adding this change as mentioned by lhoestq.

Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>

* completed the minor changes suggested by the reviewers

Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
2021-06-01 07:32:26 +01:00
Suraj Patil
ad25fd62bd
Add FlaxCLIP (#11883)
* add flax CLIP

* default input_shape

* add tests

* fix test

* fix name

* fix docs

* fix shapes

* attend at least 1 token

* flax conv to torch conv

* return floats

* fix equivalence tests

* fix import

* return attention_weights and update tests

* fix docstrings

* address Patrick's comments

* input_shape arg

* add tests for get_image_features and get_text_features methods

* fix tests
2021-06-01 09:44:31 +05:30
Philip May
fb60c309c6
fix assert (#11935) 2021-05-31 04:02:10 -04:00
Jayendra
af1a10bff4
[Flax] Return Attention from BERT, ELECTRA, RoBERTa and GPT2 (#11918)
* Added logic to return attention from flax-bert model and added test cases to check that

* Added new line at the end of file to test_modeling_flax_common.py

* fixing code style

* Fixing RoBERTa and ELECTRA models too from copying bert

* Added temporary hack to not run test_attention_outputs for FlaxGPT2

* Returning attention weights from GPT2 and changed the tests accordingly.

* last fixes

* bump flax dependency

Co-authored-by: jayendra <jayendra@infocusp.in>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-05-28 16:16:56 +05:30
Bhadresh Savani
e1205e478a
Added Sequence Classification class in GPTNeo (#11906)
* seq classification changes

* fix tests
2021-05-28 06:27:02 -04:00
Nicolas Patry
80d712fac6
Adding new argument max_new_tokens for generate. (#11476)
* Adding new argument `max_new_tokens` for generate.

This is a proposal to add a new argument `max_new_tokens` to `generate`.
This includes a `MaxNewTokensCriteria` that enables callers that don't
know the token length ahead of time (like pipeline callers) to manage
the length of their generated output more easily.

* Adding a test for the user warning when both `max_length` and
`max_new_tokens` are used together.

* Removed redundant `no_grad`.
2021-05-27 14:22:58 +02:00
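The idea behind `max_new_tokens` in the commit above is that the stopping point is counted from the prompt length rather than given as an absolute `max_length`. A minimal sketch of such a criterion follows; `MaxNewTokensSketch` and `generate_sketch` are hypothetical stand-ins, not the library's `MaxNewTokensCriteria` or `generate`, and the "model" just emits a placeholder token.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MaxNewTokensSketch:
    """Stop once `max_new_tokens` tokens were added after the prompt (hypothetical)."""

    start_length: int
    max_new_tokens: int

    def __call__(self, input_ids: List[int]) -> bool:
        # The limit is relative to the prompt, so callers need not know its length.
        return len(input_ids) >= self.start_length + self.max_new_tokens


def generate_sketch(prompt: List[int], max_new_tokens: int) -> List[int]:
    criterion = MaxNewTokensSketch(start_length=len(prompt), max_new_tokens=max_new_tokens)
    ids = list(prompt)
    while not criterion(ids):
        ids.append(0)  # placeholder for the model's next token
    return ids
```

With this shape, a pipeline can request "at most N more tokens" for any input without first computing the tokenized prompt length.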
Patrick von Platen
996a315e76
Flax Generate (#11777)
* fix_torch_device_generate_test

* remove @

* add

* indexing

* correct a couple of tests

* fix tests

* add logits processor

* finish top_k, top_p, temp

* add docs

* correct flax prng key default

* improve generate

* add generation docs

* add docs

* make style

* revert model outputs change

* make style

* correct typo

* fix tests

* fix slow test

* add raise

* finish generation

Co-authored-by: Patrick von Platen <patrick@huggingface.co>
2021-05-27 00:18:17 +01:00
Patrick von Platen
d5a72b6e19
[Flax] Allow dataclasses to be jitted (#11886)
* fix_torch_device_generate_test

* remove @

* change dataclasses to flax ones

* fix typo

* fix jitted tests

* fix bert & electra
2021-05-26 15:01:13 +01:00
Daniel Stancl
0b93358447
Fix usage of head masks by TF encoder-decoder models' generate() function (#11775)
* Fix Bart

* Fix Blenderbot{,_small}

* Fix LED

* Fix Marian

* Fix MBart

* Fix Pegasus

* Fix T5

* Add test for generation with head_mask

* Add a common TF test

* Override a test for the LED model as head masking is not yet properly implemented

* Remove all head_masks from input preparation for LED

* Drop masking for T5 as it needs a bit of refactor
2021-05-26 14:02:44 +01:00
Stas Bekman
1b6530104d
[Examples] create model with custom config on the fly (#11798)
* create custom model on the fly

* better wording

* add update_from_string

* cleanup

* cleanup

* Update src/transformers/configuration_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* more bool options

* style

* fix logger

* add test

* add the doc

* assert on conflict of options

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-05-25 10:40:49 -07:00
Patrick von Platen
7630c11f32
[Wav2Vec2] SpecAugment Fast (#11764)
* first try

* finish
2021-05-25 13:59:52 +01:00
Sylvain Gugger
f086652b16
Add option to log only once in multinode training (#11819)
* Add option to log only once in multinode training

* Use an alternate property
2021-05-25 08:03:43 -04:00
Lysandre Debut
6da129cb31
Enable memory metrics in tests that need it (#11859) 2021-05-25 04:06:19 -04:00
Lysandre Debut
db0b2477cc
Add some tests to the slow suite #11860 2021-05-25 04:06:06 -04:00
Sylvain Gugger
afe479adb5
[Trainer] Report both steps and num samples per second (#11818)
* [Trainer] Report both steps and num samples per second

* Fix batch number

* Update src/transformers/trainer_utils.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Address review comments

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-05-24 19:51:42 -04:00
Stas Bekman
a26f4d6208
[Deepspeed] support zero.Init in from_config (#11805)
* support zero.Init in from_config

* no need for eval test
2021-05-21 09:07:46 -07:00
Lysandre Debut
1b652295c5
Patch recursive import (#11812) 2021-05-21 06:50:01 -04:00
Keren Fuentes
223943872e
Fix failing test on Windows Platform (#11589)
* add separator for windows

* fixes test_is_copy_consistent on Windows

* fixing writing encoding issue on extended test (for Windows)

* resolving comments
2021-05-20 19:54:23 -04:00
Michael Benayoun
f4a0d6ff86
A cleaner and more scalable implementation of symbolic tracing (#11763)
Cleaner and more scalable implementation of symbolic tracing with torch.fx, and provides support for new architectures:
- ALBERT
- DistilBERT
- MobileBERT
- MegatronBERT
- GPT2
- GPT Neo

Co-authored-by: Michael Benayoun <michael@huggingface.co>
2021-05-20 18:02:29 +02:00
Sylvain Gugger
469384a777
Fix regression in regression (#11785)
* Fix regression in regression

* Add test
2021-05-20 09:55:13 -04:00
yujun
206f06f2dd
Add new model RoFormer (use rotary position embedding) (#11684)
* add roformer

* Update docs/source/model_doc/roformer.rst

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Update docs/source/model_doc/roformer.rst

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* update

* add TFRoFormerSinusoidalPositionalEmbedding and fix TFMarianSinusoidalPositionalEmbedding

* update docs

* make style and make quality

* roback

* unchanged

* rm copies from , this is a error in TFMarianSinusoidalPositionalEmbedding

* update Copyright year

* move # Add modeling imports here to the correct position

* max_position_embeddings can be set to 1536

* # Copied from transformers.models.bert.modeling_bert.BertOutput with Bert->RoFormer

* # Copied from transformers.models.bert.modeling_bert.BertLayer.__init__ with Bert->RoFormer

* update tokenization_roformer

* make style

* add staticmethod apply_rotary_position_embeddings

* add TF staticmethod apply_rotary_position_embeddings

* update torch apply_rotary_position_embeddings

* fix tf apply_rotary_position_embeddings error

* make style

* add pytorch RoFormerSelfAttentionRotaryPositionEmbeddingTest

* add TF rotary_position_embeddings test

* update test_modeling_rofomer

* Update docs/source/model_doc/roformer.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/__init__.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/__init__.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/__init__.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/__init__.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/roformer/convert_roformer_original_tf_checkpoint_to_pytorch.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/roformer/modeling_roformer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/roformer/modeling_roformer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/roformer/modeling_tf_roformer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* refact roformer tokenizer

* add RoFormerTokenizerFast

* add RoFormerTokenizationTest

* add require_jieba

* update Copyright

* update tokenizer & add copy from

* add option rotary_value

* use rust jieba

* use rjieba

* use rust jieba

* fix test_alignement_methods

* slice normalized_string is too slow

* add config.embedding_size when embedding_size!=hidden_size

* fix pickle tokenizer

* Update docs/source/model_doc/roformer.rst

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* make style and make quality

Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-05-20 08:00:34 -04:00
Patrick von Platen
43891be19b
[T5 failing CI] Fix generate test (#11770)
* fix_torch_device_generate_test

* remove @
2021-05-19 05:31:17 -04:00
Daniel Stancl
680d181ce8
Fix usage of head masks by PT encoder-decoder models' generate() function (#11621)
* Add missing head masking for generate() function

* Add head_mask, decoder_head_mask and cross_attn_head_mask
into prepare_inputs_for_generation for generate() function
for multiple encoder-decoder models.

* Add test_genereate_with_head_masking

* [WIP] Update the new test and handle special cases

* make style

* Omit ProphetNet test so far

* make fix-copies
2021-05-19 00:44:53 +01:00
Suraj Patil
ca33278fdb
FlaxGPT2 (#11556)
* flax gpt2

* combine masks

* handle shared embeds

* add causal LM sample

* style

* add tests

* style

* fix imports, docs, quality

* don't use cache

* add cache

* add cache 1st version

* make use cache work

* start adding test for generation

* finish generation loop compilation

* rewrite test

* finish

* update

* update

* apply sylvains suggestions

* update

* refactor

* fix typo

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-05-18 22:50:51 +01:00
Vyom Pathak
fd3b12e8c3
Fixed: Better names for nlp variables in pipelines' tests and docs. (#11752)
* Fixed: Better names for nlp variables in pipelines' tests and docs.

* Fixed: Better variable names
2021-05-18 09:47:28 -04:00
Sylvain Gugger
a515caa331
Fix checkpoint deletion (#11748) 2021-05-18 07:42:39 -04:00
Nicolas Patry
b88e0e016d
[TokenClassification] Label realignment for subword aggregation (#11680)
* [TokenClassification] Label realignment for subword aggregation

Attempt to replace https://github.com/huggingface/transformers/pull/11622/files

- Added `AggregationStrategy`
- `ignore_subwords` and `grouped_entities` arguments are now fused
  into `aggregation_strategy`. It makes more sense anyway because
  `ignore_subwords=True` with `grouped_entities=False` did not have a
  meaning anyway.
- Added 2 new ways to aggregate which are MAX, and AVERAGE
- AVERAGE requires a bit more information than the others, for now this
case is slightly specific, we should keep that in mind for future
changes.
- Testing has been modified to reflect new argument, and to check the
correct deprecation and the new aggregation_strategy.
- Put the testing argument and testing results for aggregation_strategy,
close together, so that readers can understand what is supposed to
happen.
- `aggregate` is now only tested on a small model as it does not mean
anything to test it globally for all models.
- Previous tests are unchanged in desired output.
- Added a new test case that showcases better the difference between the
  FIRST, MAX and AVERAGE strategies.

* Wrong framework.

* Addressing three issues.

1- Tags might not follow B-, I- convention, so any tag should work now
(assumed as B-TAG)
2- Fixed an issue with average that leads to a substantial code change.
3- The testing suite was not checking for the "index" key for "none"
strategy. This is now fixed.

The issue is that "O" could not be chosen by AVERAGE strategy because
those tokens were filtered out beforehand, so their relative scores were
not counted in the average. Now filtering on
ignore_labels will happen at the very end of the pipeline fixing
that issue.
It's a bit hard to make sure this stays like that because we do
not have an end-to-end test for that behavior.

* Formatting.

* Adding formatting to code + cleaner handling of B-, I- tags.

Co-authored-by: Francesco Rubbo <rubbo.francesco@gmail.com>
Co-authored-by: elk-cloner <rezakakhki.rk@gmail.com>

* Typo.

Co-authored-by: Francesco Rubbo <rubbo.francesco@gmail.com>
Co-authored-by: elk-cloner <rezakakhki.rk@gmail.com>
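
The difference between the FIRST, MAX and AVERAGE strategies described above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the pipeline's code: `Strategy`, `aggregate_word`, and `filter_ignored` are hypothetical names, and each subword is assumed to carry one score per label.

```python
from enum import Enum


class Strategy(Enum):  # hypothetical names mirroring FIRST / MAX / AVERAGE
    FIRST = "first"
    MAX = "max"
    AVERAGE = "average"


def aggregate_word(subword_scores, labels, strategy):
    """Pick one (label, score) for a word from its subwords' score vectors."""
    if strategy is Strategy.FIRST:
        # Use the first subword's scores only.
        scores = subword_scores[0]
    elif strategy is Strategy.MAX:
        # Use the subword whose best score is the highest.
        scores = max(subword_scores, key=max)
    else:
        # Average each label's score over all subwords of the word.
        n = len(subword_scores)
        scores = [sum(col) / n for col in zip(*subword_scores)]
    best = max(range(len(labels)), key=scores.__getitem__)
    return labels[best], scores[best]


def filter_ignored(entities, ignore_labels=("O",)):
    # Filtering happens last, so "O" still competes during aggregation.
    return [e for e in entities if e[0] not in ignore_labels]


labels = ["O", "B-PER", "B-LOC"]
word = [[0.4, 0.5, 0.1], [0.05, 0.05, 0.9], [0.8, 0.1, 0.1]]  # three subwords
results = {s.name: aggregate_word(word, labels, s)[0] for s in Strategy}
print(results)  # {'FIRST': 'B-PER', 'MAX': 'B-LOC', 'AVERAGE': 'O'}
```

In this example AVERAGE selects "O", which is only possible because the ignored labels are filtered after aggregation rather than before it.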
2021-05-18 09:53:20 +02:00
Patrick von Platen
73893fc771
[BigBird Pegasus] Make tests faster (#11744)
* improve tests

* remove bogus file

* make style

Co-authored-by: Patrick von Platen <patrick@huggingface.co>
2021-05-17 06:30:53 -04:00
Michael Benayoun
86d5fb0b36
Experimental symbolic tracing feature with torch.fx for BERT, ELECTRA and T5 (#11475)
Symbolic tracing feature for BERT, ELECTRA and T5

Co-authored-by: Michael Benayoun <michael@huggingface.co>
Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-05-14 20:57:30 +02:00
Volodymyr Byno
218d552f30
Fix loading the best model on the last stage of training (#11718) 2021-05-13 16:11:12 -04:00
lexhuismans
91cf29153b
[T5] Add 3D attention mask to T5 model (2) (#9643) (#11197)
* Add 3D attention mask to T5 model (#9643)

Added code for 3D attention mask in T5 model. Similar to BERT model.

* Add test for 3D attention mask

Added test for 3D attention mask: test_decoder_model_past_with_3d_attn_mask()
3D attention mask of the shape [Batch_size, Seq_length, Seq_length] both for
attention mask and decoder attention mask. Test is passing.
2021-05-13 12:02:27 +01:00
Philip May
37ed3ab719
Enable option for subword regularization in more tokenizers. (#11417)
* improve slow class tok usage at xlm rob

* add subword regularization for barthez

* improve barthez tok. test

* fix tokenizer tests

* add subword regularization for camembert

* add subword regularization for deberta v2 tokenizer

* add more doc to deberta v2 tokenizer

* add subword regularization for speech to text tok.

* fix sp_model_kwargs type in speech 2 text tok.

* add subword regularization for M2M100 tok.

* add more concrete type hints

* fix tests for m2m100 and s2t tok.

* add missing Any import

* fix syntax error in m2m100 tok.

* fix unpickle of m2m100 and s2t tok.

* fix test of m2m100 and s2t tok.

* improve unpickle of deberta v2 tok.

* add test for pickle of barthez & camembert

* fix pickle of barthez & camembert

* add test for deberta v2 tok. pickle

* fix m2m100 tok. pickle

* fix s2t tok. pickle

* add subword regularization to albert tok.

* refactor subword reg. test into TokenizerTesterMixin

improve albert tok. test

remove sample argument from albert tok.

check subword reg. using TokenizerTesterMixin

improve tok. tests

improve xlm roberta tok. tests

improve xlm roberta tok. tests

* add subword regularization for big bird t.

* improve xlm roberta tok. test

* add subword regularization for mbart50 tok.

* add subword regularization for pegasus tok.

* add subword regularization for reformer tok.

* add subword regularization for T5 tok.

* fix t5 tok. test formatting

* add subword regularization for xlm_proph. tok.

* add subword regularization for xlnet tok.

* add subword regularization for bert_gen tok.

* add typing to tokenizers

* add typing to xlm rob. tok

* add subword regularization for marian tok.

* add reverse tok. test

* fix marian tok test

* fix marian tok test

* fix casing in tok. tests

* fix style of tok. common test

* fix deberta v2 tok test

* add type annotations to tok. tests

* add type annotations to tok. __init__

* add typing to tokenizer

* add type annotations to tok. __init__

* don't specify the default when it's None

* fix barthez tok. doc

* move sentencepiece tok. tests to TokenizerTesterMixin

* fix unused imports

* fix albert tok. test

* add comment to sentencepiece test options

* fix Any import at big bird tok.

* fix Any import at xlm prophetnet tok.

* empty commit to trigger CI
2021-05-13 02:44:55 -04:00
NielsRogge
fa84540e98
Vit deit fixes (#11309)
* Improve docs of DeiT and ViT, add community notebook

* Add gitignore for test_samples

* Add notebook with Trainer

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-05-12 11:46:02 -04:00
Suraj Patil
8719afa1ad
CLIP (#11445)
* begin second draft

* fix import, style

* add loss

* fix embeds, logits_scale, and projection

* fix imports

* add conversion script

* add feature_extractor and processor

* style

* add tests for tokenizer, extractor and processor

* add vision model tests

* add weight init

* add more tests

* fix save_load  test

* model output, docstrings, causal mask

* config doc

* add clip model tests

* return dict

* begin integration test

* add integration tests

* fix-copies

* fix init

* Clip => CLIP

* fix module name

* docs

* fix doc

* output_dim => projection_dim

* fix checkpoint names

* remove fast tokenizer file

* fix conversion script

* fix tests, quality

* put causal mask on device

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix attribute test

* style

* address sylvains comments

* style

* fix docstrings

* add quick_gelu in activations, docstrings

* clean-up attention test

* fix act fun

* fix config

* fix torchscript tests

* even batch_size

* remove comment

* fix output to_tuple

* fix save load tests

* fix add tokens test

* add fast tokenizer

* update copyright

* new processor API

* fix docs

* docstrings

* docs

* fix doc

* fix doc

* fix tokenizer

* fix import in doc example

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* check types of config

* valhalla => openai

* load image using url

* fix test

* typo

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-05-12 13:48:15 +05:30
Sylvain Gugger
f13f1f8fb8
Test checkpointing (#11682)
* Add test and see where CI is unhappy

* Load with strict=False
2021-05-11 12:02:48 -04:00
Sylvain Gugger
a135f59536
Auto modelcard (#11599)
* Autogenerate model cards from the Trainer

* ModelCard deprecated

* Fix test

* Style

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Address review comments

* Quality

* With all metadata

* Metadata

* Post-merge conflict mess

* Data args and all examples

* Default license and languages when possible

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-05-11 11:30:34 -04:00
Pavel Soriano
9120ae7d66
Fixes NoneType exception when topk is larger than one coupled with a small context in the Question-Answering pipeline (#11628)
* added fix to decode function. added test to qa pipeline tests

* completed topk docstring

* fixed formatting with black

* applied style_doc to fix line length
2021-05-10 13:28:10 -04:00
Tanmay Laud
f7f872955d
Big Bird Fast Tokenizer implementation (#11075)
* Added Big Bird Fast Tokenizer initial file

* style fixes

* flake fixes

* Added big bird fast tokenizer to init files

* Added big bird fast to Auto tokenization

* fix styles

* minor quality fixes

* Added initial test code

* Fix SpmConverter when precompiled_charsmap doesn't exist

* fixed post processor

* minor style fix

* minor fix input names

* Actually fix identity normalization

* style

* Added token type ids to fast tokenizer

* style

* flake fix

* fix copies

Co-authored-by: Anthony MOI <m.anthony.moi@gmail.com>
2021-05-10 03:01:23 -04:00
Lysandre Debut
39084ca663
Add the ImageClassificationPipeline (#11598)
* Add the ImageClassificationPipeline

* Code review

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>

* Have `load_image` at the module level

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
2021-05-07 08:08:40 -04:00
Vasudev Gupta
dc3f6758cf
Add BigBirdPegasus (#10991)
* init bigbird pegasus

* add debugging nb ; update config

* init conversion

* update conversion script

* complete conversion script

* init forward()

* complete forward()

* add tokenizer

* add some slow tests

* commit current

* fix copies

* add docs

* add conversion script for bigbird-roberta-summarization

* remove TODO

* small fixups

* correct tokenizer

* add bigbird core for now

* fix config

* fix more

* revert pegasus-tokenizer back

* make style

* everything working for pubmed; yayy

* complete tests finally

* remove bigbird pegasus tok

* correct tokenizer

* correct tests

* add tokenizer files

* finish make style

* fix test

* update

* make style

* fix tok utils base file

* make fix-copies

* clean a bit

* small update

* fix some suggestions

* add to readme

* fix a bit, clean tests

* fix more tests

* Update src/transformers/__init__.py

* Update src/transformers/__init__.py

* make fix-copies

* complete attn switching, auto-padding left

* make style

* fix auto-padding test

* make style

* fix batched attention tests

* put tolerance at 1e-1 for stand-alone decoder test

* fix docs

* fix tests

* correct slow tokenizer conversion

* Apply suggestions from code review

Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* complete remaining suggestions

* fix test

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-05-07 09:27:43 +02:00
Stas Bekman
619200cc42
[cuda ext tests] fixing tests (#11619)
* fixing tests

* cleanup
2021-05-06 13:35:28 -07:00
Patrick von Platen
3e3e41ae20
Pytorch - Lazy initialization of models (#11471)
* lazy_init_weights

* remove ipdb

* save int

* add necessary code

* remove unnecessary utils

* Update src/transformers/models/t5/modeling_t5.py

* clean

* add tests

* correct

* finish tests

* finish tests

* fix some more tests

* fix xlnet & transfo-xl

* fix more tests

* make sure tests are independent

* fix tests more

* finish tests

* final touches

* Update src/transformers/modeling_utils.py

* Apply suggestions from code review

* Update src/transformers/modeling_utils.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update src/transformers/modeling_utils.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* clean tests

* give arg positive name

* add more mock weights to xlnet

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-05-05 17:22:20 +02:00
Lysandre
8fa8e19429 Skip Funnel test 2021-05-05 12:38:01 +02:00
Sylvain Gugger
6b241e0e3b
Reproducible checkpoint (#11582)
* Set generator in dataloader

* Use generator in all random samplers

* Checkpoint all RNG states

* Final version

* Quality

* Test

* Address review comments

* Quality

* Remove debug util

* Add python and numpy RNGs

* Split states in different files in distributed

* Quality

* local_rank for TPUs

* Only use generator when accepted

* Add test

* Set seed to avoid flakiness

* Make test less flaky

* Quality
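
Why checkpointing RNG states makes resumed training reproducible can be sketched with the standard library alone. This is a minimal sketch, assuming only the Python RNG; `save_rng_state` and `load_rng_state` are hypothetical helpers, and a real checkpoint would also need the NumPy and framework RNG states mentioned above.

```python
import pickle
import random


def save_rng_state():
    # Serialize the Python RNG state (hypothetical helper; Python RNG only).
    return pickle.dumps({"python": random.getstate()})


def load_rng_state(blob):
    # Restore the RNG to exactly where it was when the state was saved.
    random.setstate(pickle.loads(blob)["python"])


random.seed(42)
_ = [random.random() for _ in range(3)]          # "training" before the checkpoint
blob = save_rng_state()
expected = [random.random() for _ in range(5)]   # what an uninterrupted run draws

load_rng_state(blob)                             # resume from the checkpoint
resumed = [random.random() for _ in range(5)]
print(resumed == expected)  # True
```

Without restoring the state, a resumed run would draw a different random sequence (for shuffling or dropout, say) than the uninterrupted run would have.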
2021-05-04 16:20:56 -04:00
Patrick Fernandes
0afe4a90f9
[Flax] Add Electra models (#11426)
* add electra model to flax

* Remove Electra Next Sentence Prediction model added by mistake

* fix parameter sharing and loosen equality threshold

* fix styling issues

* add mistakenly removed imports

* fix electra table

* Add FlaxElectra to automodels and fix docs

* fix issues pointed out in the PR

* fix flax electra to comply with latest changes

* remove stale class

* add copied from

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-05-04 20:56:09 +02:00
Patrick von Platen
084a187da3
[FlaxRoberta] Add FlaxRobertaModels & adapt run_mlm_flax.py (#11470)
* add flax roberta

* make style

* correct initialization

* modify model to save weights

* fix copied from

* fix copied from

* correct some more code

* add more roberta models

* Apply suggestions from code review

* merge from master

* finish

* finish docs

Co-authored-by: Patrick von Platen <patrick@huggingface.co>
2021-05-04 19:57:59 +02:00
Lysandre Debut
09b0bcfea9
Enable added tokens (#11325)
* Fix tests

* Reorganize

* Update tests/test_modeling_mobilebert.py

* Remove unnecessary addition
2021-05-04 08:13:57 -04:00
abhishek thakur
c40c7e213b
Add multi-class, multi-label and regression to transformers (#11012)
* add to  bert

* review comments

* Update src/transformers/configuration_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/configuration_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* self.config.problem_type

* fix style

* fix

* fin

* fix

* update doc

* fix

* test

* Test more problem types

* Update src/transformers/configuration_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix

* remove

* fix

* quality

* make fix-copies

* remove test

Co-authored-by: abhishek thakur <abhishekkrthakur@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-05-04 02:23:40 -04:00
Muktan
a721a5eefd
[Wav2vec2] Fixed tokenization mistakes while adding single-char tokens to tokenizer (#11538)
* Fixed tokenization mistakes while adding single-char tokens to tokenizer

* Added tests and Removed unnecessary comments.

* finalize wav2vec2 tok

* add more aggressive tests

* Apply suggestions from code review

* fix useless import

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-05-03 17:19:12 +02:00
NielsRogge
f3cf8ae7b3
Add LUKE (#11223)
* Rebase with master

* Minor bug fix in docs

* Copy files from adding_luke_v2 and improve docs

* change the default value of use_entity_aware_attention to True

* remove word_hidden_states

* fix head models

* fix tests

* fix the conversion script

* add integration tests for the pretrained large model

* improve docstring

* Improve docs, make style

* fix _init_weights for pytorch 1.8

* improve docs

* fix tokenizer to construct entity sequence with [MASK] entity when entities=None

* Make fix-copies

* Make style & quality

* Bug fixes

* Add LukeTokenizer to init

* Address most comments by @patil-suraj and @LysandreJik

* rename _compute_extended_attention_mask to get_extended_attention_mask

* add comments to LukeSelfAttention

* fix the documentation of the tokenizer

* address comments by @patil-suraj, @LysandreJik, and @sgugger

* improve docs

* Make style, quality and fix-copies

* Improve docs

* fix docs

* add "entity_span_classification" task

* update example code for LukeForEntitySpanClassification

* improve docs

* improve docs

* improve the code example in luke.rst

* rename the classification layer in LukeForEntityClassification from typing to classifier

* add bias to the classifier in LukeForEntitySpanClassification

* update docs to use fine-tuned hub models in code examples of the head models

* update the example sentences

* Make style & quality

* Add require_torch to tokenizer tests

* Add require_torch to tokenizer tests

* Address comments by @sgugger and add community notebooks

* Make fix-copies

Co-authored-by: Ikuya Yamada <ikuya@ikuya.net>
2021-05-03 09:07:29 -04:00
Stas Bekman
4e7bf94e72
[DeepSpeed] fp32 support (#11499)
* prep for deepspeed==0.3.16

* new version

* too soon

* support and test fp32 mode

* troubleshooting doc start

* workaround no longer needed

* add fp32 doc

* style

* cleanup, add tf32 note

* clarify

* release was made
2021-04-30 12:51:48 -07:00
Takuya Makino
c2cd02ac62
Accepts BatchEncoding in LengthSampler (#11431) 2021-04-30 08:27:46 -04:00
Shubham Sanghavi
30ede8994e
Implement Fast Tokenization for Deberta (#11387) 2021-04-30 08:08:15 -04:00
Nicolas Patry
db9dd09cf9
Adding AutomaticSpeechRecognitionPipeline. (#11337)
* Adding `AutomaticSpeechRecognitionPipeline`.

- Because we added everything to enable this pipeline, we probably
should add it to `transformers`.
- This PR tries to limit the scope and focuses only on the pipeline part
(what should go in, and out).
- The tests are very specific for S2T and Wav2vec2 to make sure both
architectures are supported by the pipeline. We don't use the mixin for
tests right now, because that requires more work in the `pipeline`
function (will be done in a follow up PR).
- Unsure about the "helper" function `ffmpeg_read`. It makes a lot of
  sense from a user perspective, it does not add any additional
dependencies (as in hard dependency, because users can always use their
own load mechanism). Meanwhile, it feels slightly clunky to have so much
optional preprocessing.
- The pipeline is not done to support streaming audio right now.

Future work:

- Add `automatic-speech-recognition` as a `task`. And add the
FeatureExtractor.from_pretrained within `pipeline` function.
- Add small models within tests
- Add the Mixin to tests.
- Make the logic between ForCTC vs ForConditionalGeneration better.

* Update tests/test_pipelines_automatic_speech_recognition.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Adding docs + main import + type checking + LICENSE.

* Doc style !.

* Fixing TYPE_HINT.

* Specifying waveform shape in the docs.

* Adding asserts + specify in the documentation the shape of the input
np.ndarray.

* Update src/transformers/pipelines/automatic_speech_recognition.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Adding require to tests + move the `feature_extractor` doc.

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-04-30 11:54:08 +02:00
Philip May
e0db8276a6
add sp_model_kwargs to unpickle of xlm roberta tok (#11430)
add test for pickle

simplify test

fix test code style

add missing pickle import

fix test

fix test

fix test
2021-04-30 03:44:58 -04:00
Patrick von Platen
f748bd4242
[Flax] Add docstrings & model outputs (#11498)
* add attentions & hidden states

* add model outputs + docs

* finish docs

* finish tests

* finish impl

* del @

* finish

* finish

* correct test

* apply sylvains suggestions

* Update src/transformers/models/bert/modeling_flax_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* simplify more

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-29 12:04:51 +02:00
Ashwin Geet D'Sa
741d48f5c7
Remove max length beam scorer (#11378)
* removed max_len

* removed max_length from BeamSearchScorer

* correct max length

* finish

* del vim

* finish & add test

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-04-27 00:28:40 +02:00
Stas Bekman
bc2571e61c
[Deepspeed] ZeRO-Infinity integration plus config revamp (#11418)
* adding Z-inf

* revamp config process

* up version requirement

* wip

* massive rewrite

* cleanup

* cleanup

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* consistent json commas

* act on suggestions

* leave this feature for 0.3.16

* style

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-26 10:40:32 -07:00
Bhadresh Savani
1d30ec95c7
[Examples] Fixes inconsistency around eval vs val and predict vs test (#11380)
* added changes for uniformity

* modified files

* corrected typo

* fixed qa scripts

* fix typos

* fixed predict typo in qa no trainer

* fixed test file

* reverted trainer changes

* reverted trainer changes in custom examples

* updated readme

* added changes in deepspeed test

* added changes for predict and eval
2021-04-26 09:24:31 -07:00
Sylvain Gugger
7959d83599
Give each test a different repo name (#11453) 2021-04-26 11:52:23 -04:00
Daniel Stancl
38a716cd41
TF BART models - Add cross_attentions to model output and fix cross-attention head masking (#10699)
* Add cross_attn_head_mask to BART

* Fix cross_attentions in TFBart-like models

* This commit enables returning of `cross_attentions`
for TFBart-like models

* It also fixes attention head masking in cross-attention module

* Update TF model templates

* Fix missing , in TF model templates

* Fix typo: congig -> config
2021-04-26 14:16:21 +02:00
Patrick von Platen
32dbb2d954
make style (#11442) 2021-04-26 13:50:34 +02:00
cronoik
35cd8eed88
EncoderDecoderConfigs should not create new objects (#11300)
* removes the creation of separate config objects and uses the existing ones instead+overwrite resize_token_embeddings from parent class because it is not working for the EncoderDecoderModel

* rollback to current version of the huggingface master branch

* reworked version that ties the encoder and decoder config of the parent encoderdecoder instance

* overwrite of resize_token_embeddings throws an error now

* review comment suggestion

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* implemented warning in case encoderdecoder is created with differing configs of encoderdecoderconfig and decoderconfig or encoderconfig

* added test to avoid diverging configs of wrapper class and wrapped classes

* Update src/transformers/models/encoder_decoder/modeling_encoder_decoder.py

* make style

Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-04-25 11:45:46 +02:00
Daniel Stancl
f45cb66bf6
Add head_mask, decoder_head_mask, cross_head_mask to ProphetNet (#9964)
* Add head_mask & decoder_head_mask + some corrections

* Fix head masking for N-grams

* Enable test_headmasking for encoder and decoder

* Fix one typo regarding in modeling_propgetnet.py

* Enable test_headmasking for ProphetNetStandaloneDecoderModelTest
and ProphetNetStandaloneEncoderModelTest in test_modeling_prophetnet.py

* make style

* Fix cross_head_mask

* Fix attention head mask naming

* `cross_head_mask` -> `cross_attn_head_mask`

* `cross_layer_head_mask` -> `cross_attn_layer_head_mask`

* Still need to merge #10605 to master to pass the tests
2021-04-25 11:06:16 +02:00
Philip May
195bfd118a
Enable option for subword regularization in XLMRobertaTokenizer (#11149)
* enable subword regularization.

* fix tokenizer storage

* fix docstring formatting

* Update src/transformers/models/xlm_roberta/tokenization_xlm_roberta.py

Co-authored-by: Stefan Schweter <stefan@schweter.it>

* fix docstring formatting

* add test for subword regularization tokenizer

* improve comments of test

* add sp_model_kwargs

* reformat docstring to match the style

* add some more documentation

* Update src/transformers/models/xlm_roberta/tokenization_xlm_roberta.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* improve docstring

* empty commit to trigger CI

* Update src/transformers/models/xlm_roberta/tokenization_xlm_roberta.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix docstring formatting for sphinx

Co-authored-by: Stefan Schweter <stefan@schweter.it>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-23 17:52:31 -04:00
Daniel Stancl
e3ff165aa5
Fix cross-attention head mask for Torch encoder-decoder models (#10605)
* Fix cross-attention head mask for Torch BART models

* Fix head masking for cross-attention module for the following
models: BART, Blenderbot, Blenderbot_small, M2M_100, Marian, MBart,
Pegasus

* Enable test_headmasking for M2M_100 model

* Fix cross_head_mask for FSMT, LED and T5

* This commit fixes `head_mask` for cross-attention modules
in the following models: FSMT, LED, T5

* It also contains some smaller changes in doc so that
it is perfectly clear that the shape of `cross_head_mask`
is the same as that of `decoder_head_mask`

* Update template

* Fix template for BartForCausalLM

* Fix cross_head_mask for Speech2Text models

* Fix cross_head_mask in templates

* Fix args order in BartForCausalLM template

* Fix doc in BART templates

* Make more explicit naming

* `cross_head_mask` -> `cross_attn_head_mask`

* `cross_layer_head_mask` -> `cross_attn_layer_head_mask`

* Fix doc

* make style quality

* Fix speech2text docstring
2021-04-23 18:58:06 +02:00
Sylvain Gugger
bf2e0cf70b
Trainer push to hub (#11328)
* Initial support for upload to hub

* push -> upload

* Fixes + examples

* Fix torchhub test

* Torchhub test I hate you

* push_model_to_hub -> push_to_hub

* Apply mixin to other pretrained models

* Remove ABC inheritance

* Add tests

* Typo

* Run tests

* Install git-lfs

* Change approach

* Add push_to_hub to all

* Staging test suite

* Typo

* Maybe like this?

* More deps

* Cache

* Adapt name

* Quality

* MOAR tests

* Put it in testing_utils

* Docs + torchhub last hope

* Styling

* Wrong method

* Typos

* Update src/transformers/file_utils.py

Co-authored-by: Julien Chaumond <julien@huggingface.co>

* Address review comments

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-04-23 09:17:37 -04:00
Patrick von Platen
74e84f1fa6
make blenderbot test slow (#11395) 2021-04-23 07:49:09 -04:00
Patrick von Platen
8c9b5fcbaf
[Flax] Big FlaxBert Refactor (#11364)
* improve flax

* refactor

* typos

* Update src/transformers/modeling_flax_utils.py

* Apply suggestions from code review

* Update src/transformers/modeling_flax_utils.py

* fix typo

* improve error tolerance

* typo

* correct nasty saving bug

* fix from pretrained

* correct tree map

* add note

* correct weight tying
2021-04-23 09:53:09 +02:00
Patrick von Platen
880154d2e1
[Wav2Vec2] Fix special tokens for Wav2Vec2 tokenizer (#11349)
* fix wav2vec2 tok

* up
2021-04-22 12:23:08 +02:00
Sylvain Gugger
dabeb15292
Examples reorg (#11350)
* Base move

* Examples reorganization

* Update references

* Put back test data

* Move conftest

* More fixes

* Move test data to test fixtures

* Update path

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comments and clean

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-04-21 11:11:20 -04:00
Nicolas Patry
aad95c7cde
Removed max_length from being mandatory within generate. (#11314)
* Removed `max_length` from being mandatory within `generate`.

- Moving on to fully using `StoppingCriteria` for `greedy` and `sample`
modes.
- `max_length` still used for `beam_search` and `group_beam_search`
(Follow up PR)
- Fixes a bug with MaxLengthStoppingCriteria (we should stop as soon as
we hit the max_length, so the comparison needs to be "greater than or equal"; this affects
the tests).
- Added options to use `logits_processor` and `stopping_criteria`
directly within `generate` function (so some users can define their own
`logits_processor` and `stopping_criteria`).
- Modified the backward compat tests to make sure we issue a warning.

* Fix `max_length` argument in `generate`.

* Moving validate to being functional.

- Renamed `smax_length` to `stoppping_max_length`.

* Removing `logits_processor` and `stopping_criteria` from `generate`
arguments.

* Deepcopy.

* Fix global variable name.
2021-04-21 11:56:45 +02:00
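Note on the `MaxLengthStoppingCriteria` fix described in the commit above: the following is a minimal sketch, not the library's actual code, of why the comparison has to be "greater than or equal". The function names `should_stop_strict`, `should_stop`, and `generate_length` are hypothetical and exist only for this illustration.

```python
def should_stop_strict(cur_len: int, max_length: int) -> bool:
    # Off-by-one variant: stops only once the limit has been exceeded.
    return cur_len > max_length


def should_stop(cur_len: int, max_length: int) -> bool:
    # Variant matching the commit message: stop as soon as max_length is hit.
    return cur_len >= max_length


def generate_length(stop, prompt_len: int, max_length: int) -> int:
    # Toy generation loop: append one "token" per step until `stop` fires.
    cur_len = prompt_len
    while not stop(cur_len, max_length):
        cur_len += 1
    return cur_len
```

With a prompt of 3 tokens and `max_length=10`, the strict comparison lets the loop run one step too long, which is presumably why the commit notes that the tests were affected.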
Suraj Patil
cfd2eaa8cf
[GPTNeo] create local attention mask ones (#11335)
* create local attention mask ones

* remove old method, address Patrick's comment
2021-04-20 18:37:44 +05:30
Sylvain Gugger
c0328a6c26
Load checkpoint without re-creating the model (#11318) 2021-04-19 20:31:29 -04:00
Sylvain Gugger
d9c62047a8
Trainer support for IterableDataset for evaluation and predict (#11286)
* Bulk of the work

* Polish and tests

* Update QA Trainer

* Avoid breaking the predict method

* Deprecation warnings

* Store real eval dataloader

* Get eval dataset reference before wrap
2021-04-16 16:01:58 -04:00
Nicolas Patry
92970c0cb9
Enabling multilingual models for translation pipelines. (#10536)
* [WIP] Enabling multilingual models for translation pipelines.

* decoder_input_ids -> forced_bos_token_id

* Improve docstring.

* Rebase

* Fixing 2 bugs

- Type token_ids coming from `_parse_and_tokenize`
- Wrong index from tgt_lang.

* Fixing black version.

* Adding tests for _build_translation_inputs and add them for all
tokenizers.

* Mbart actually puts the lang code at the end.

* Fixing m2m100.

* Adding TF support to `deep_round`.

* Update src/transformers/pipelines/text2text_generation.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Adding one line comment.

* Fixing M2M100 `_build_translation_input_ids`, and fix the call site.

* Fixing tests + deep_round -> nested_simplify

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-16 11:31:35 +02:00
Sylvain Gugger
2550b41aa2
Tokenizer fast save (#11234)
* Save fast tokenizers in both formats

* Fix for HerBERT

* Proper fix

* Properly test new behavior
2021-04-15 09:32:32 -04:00
Nicolas Patry
c3fcba3219
Adding pipeline task aliases. (#11247)
* Adding task aliases and adding `token-classification` and
`text-classification` tasks.

* Cleaning docstring.
2021-04-15 09:51:24 +02:00
Sylvain Gugger
aaaed56ffc
Trainer iterable dataset (#11254)
* IterableDatasetShard

* Test and integration in Trainer

* Update src/transformers/trainer_pt_utils.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Style

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-04-14 17:02:26 -04:00
Stas Bekman
83206ca6a8
[deepspeed] test on one node 2 gpus max (#11237)
* test on one node 2 gpus max

* fix the other place

* refactor

* fix

* cleanup

* more exact version
2021-04-14 11:06:59 -07:00
Stas Bekman
3d339ee659
[Deepspeed] zero3 tests band aid (#11235)
* temp band-aid

* style
2021-04-13 17:58:09 -04:00
Sylvain Gugger
81009b7a5c
Replace error by warning when loading an architecture in another (#11207)
* Replace error by warning when loading an architecture in another

* Style

* Style again

* Add a test

* Adapt old test
2021-04-13 10:33:52 -04:00
Philipp Schmid
f243a5ec0d
Sagemaker test docs update for framework upgrade (#11206)
* increased train_runtime for model parallelism

* added documentation for framework upgrade
2021-04-12 19:08:33 -04:00
NielsRogge
9f1260971f
Add DeiT (PyTorch) (#11056)
* First draft of deit

* More improvements

* Remove DeiTTokenizerFast from init

* Conversion script works

* Add DeiT to ViT conversion script

* Add tests, add head model, add support for deit in vit conversion script

* Update model checkpoint names

* Update image_mean and image_std, set resample to bicubic

* Improve docs

* Docs improvements

* Add DeiTForImageClassificationWithTeacher to init

* Address comments by @sgugger

* Improve feature extractors

* Make fix-copies

* Minor fixes

* Address comments by @patil-suraj

* All models uploaded

* Fix tests

* Remove labels argument from DeiTForImageClassificationWithTeacher

* Fix-copies, style and quality

* Fix tests

* Fix typo

* Multiple docs improvements

* More docs fixes
2021-04-12 18:07:10 -04:00
Sylvain Gugger
26212c14e5 Reactivate Megatron tests and use fewer workers 2021-04-09 18:09:53 -04:00
Philipp Schmid
6f90c29eaa
added json dump and extraction of train run time (#11167)
* added json dump and extraction of train run time

* make style happy
2021-04-09 15:18:00 -04:00
Kevin Canwen Xu
fb41f9f50c
Add a special tokenizer for CPM model (#11068)
* Add a special tokenizer for CPM model

* make style

* fix

* Add docs

* styles

* cpm doc

* fix ci

* fix the overview

* add test

* make style

* typo

* Custom tokenizer flag

* Add README.md

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-04-10 02:07:47 +08:00
Sylvain Gugger
269c9638df Merge branch 'master' of github.com:huggingface/transformers 2021-04-08 21:14:56 -04:00
Sylvain Gugger
d31c7b104e Skip Megatron tests for now 2021-04-08 21:14:43 -04:00
Sylvain Gugger
ba8b1f4754
Add support for multiple models for one config in auto classes (#11150)
* Add support for multiple models for one config in auto classes

* Use get_values everywhere

* Prettier doc
2021-04-08 18:41:36 -04:00
Stas Bekman
66446909b2
[tests] relocate core integration tests (#11146)
* relocate core integration tests

* add sys.path context manager

* cleanup

* try

* try2

* fix path

* doc

* style

* add dep

* add 2 more deps
2021-04-08 13:13:17 -07:00
Andrea Cappelli
6c40e49712
Run mlm pad to multiple for fp16 (#11128)
* Add mlm collator pad to multiple option (#10627)

* Use padding to 8x in run mlm (#10627)
2021-04-08 16:12:49 -04:00
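Note on the "pad to multiple" option in the commit above: a minimal sketch of the idea, assuming the goal is to round each sequence length up to the next multiple of 8 (the figure named in the commit message), since fp16 kernels on recent NVIDIA GPUs tend to be most efficient with dimensions that are multiples of 8. The helper `pad_to_multiple` and the default `pad_id=0` are hypothetical and are not the collator's real implementation.

```python
def pad_to_multiple(token_ids, multiple=8, pad_id=0):
    # Round the sequence length up to the next multiple of `multiple`
    # by appending `pad_id`; already-aligned sequences are left unchanged.
    remainder = len(token_ids) % multiple
    if remainder == 0:
        return list(token_ids)
    return list(token_ids) + [pad_id] * (multiple - remainder)
```

For example, a 13-token sequence is padded to 16 tokens, while a 16-token sequence keeps its length.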
Philipp Schmid
9c9b8e707b
Updates SageMaker docs for updating DLCs (#11140) 2021-04-08 16:05:53 -04:00
Julien Demouth
02ec02d6d3
Add nvidia megatron models (#10911)
* Add support for NVIDIA Megatron models

* Add support for NVIDIA Megatron GPT2 and BERT

Add the megatron_gpt2 model. That model reuses the existing GPT2 model. This
commit includes a script to convert a Megatron-GPT2 checkpoint downloaded
from NVIDIA GPU Cloud. See examples/megatron-models/README.md for details.

Add the megatron_bert model. That model is implemented as a modification of
the existing BERT model in Transformers. This commit includes a script to
convert a Megatron-BERT checkpoint downloaded from NVIDIA GPU Cloud. See
examples/megatron-models/README.md for details.

* Update src/transformers/models/megatron_bert/configuration_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/configuration_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/configuration_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Remove model.half in tests + add "# Copied ..."

Remove the model.half() instruction which makes tests fail on the CPU.

Add a comment "# Copied ..." before many classes in the model to enable automatic
tracking in CI between the new Megatron classes and the original Bert ones.

* Fix issues

* Fix Flax/TF tests

* Fix copyright

* Update src/transformers/models/megatron_bert/configuration_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/configuration_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update docs/source/model_doc/megatron_bert.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/model_doc/megatron_gpt2.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/__init__.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Resolve most of 'sgugger' comments

* Fix conversion issue + Run make fix-copies/quality/docs

* Apply suggestions from code review

* Causal LM & merge

* Fix init

* Add CausalLM to last auto class

Co-authored-by: Julien Demouth <jdemouth@nvidia.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-04-08 14:09:11 -04:00
Stas Bekman
c6d664849b
[DeepSpeed] ZeRO Stage 3 (#10753)
* synced gpus

* fix

* fix

* need to use t5-small for quality tests

* notes

* complete merge

* fix a disappearing std stream problem

* start zero3 tests

* wip

* tune params

* sorting out the pre-trained model loading

* reworking generate loop wip

* wip

* style

* fix tests

* split the tests

* refactor tests

* wip

* parameterized

* fix

* work out the resume from non-ds checkpoint pass + test

* cleanup

* remove no longer needed code

* split getter/setter functions

* complete the docs

* suggestions

* gpus and their compute capabilities link

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* style

* remove invalid paramgd

* automatically configure zero3 params that rely on hidden size

* make _get_resized_embeddings zero3-aware

* add test exercising resize_token_embeddings()

* add docstring

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-04-08 09:53:01 -07:00
Stas Bekman
1c15128312
[versions] handle version requirement ranges (#11110)
* handle version requirement ranges

* add mixed requirement test

* cleanup
2021-04-07 09:09:38 -07:00
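Note on "version requirement ranges" in the commit above: a rough sketch of what checking a mixed requirement such as `>=1.17,<2.0` can look like. This is an illustration under simplifying assumptions (purely numeric dotted versions, no pre-release tags), not the code from the commit; `parse_version` and `satisfies` are hypothetical names.

```python
import operator
import re

# Map each comparison operator string to the matching function.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    ">": operator.gt,
}


def parse_version(text):
    # Simplification: "1.17.2" -> (1, 17, 2). Note that "1.0" and "1.0.0"
    # compare as different tuples under this scheme.
    return tuple(int(part) for part in text.split("."))


def satisfies(installed, requirement):
    # A range is a comma-separated list of clauses that must all hold.
    for clause in requirement.split(","):
        match = re.fullmatch(r"\s*(<=|>=|==|!=|<|>)\s*([\d.]+)\s*", clause)
        if match is None:
            raise ValueError(f"cannot parse requirement clause: {clause!r}")
        op, wanted = match.groups()
        if not OPS[op](parse_version(installed), parse_version(wanted)):
            return False
    return True
```

A production check would more likely rely on a dedicated version-parsing library, which also handles pre-releases and trailing-zero equivalence.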
Vasudev Gupta
7442801df5
fix tests (#11109) 2021-04-07 10:07:26 -04:00
Stas Bekman
c9035e4537
fix: The 'warn' method is deprecated (#11105)
* The 'warn' method is deprecated

* fix test
2021-04-07 09:20:06 -04:00
Sylvain Gugger
403d530eec
Auto feature extractor (#11097)
* AutoFeatureExtractor

* Init and first tests

* Tests

* Damn you gitignore

* Quality

* Defensive test for when not all backends are here

* Use pattern for Speech2Text models
2021-04-06 19:20:08 -04:00
Suraj Patil
2a8115f083
[WIP] GPT Neo cleanup (#10985)
* better names

* add attention mixin

* all slow tests in one class

* make helper methods static so we can test

* add local attention tests

* better names

* doc

* apply review suggestions
2021-04-06 12:24:15 -04:00
Philipp Schmid
76800fb8e6
added new merged Trainer test (#11090) 2021-04-06 15:12:21 +02:00
Sylvain Gugger
04ceee7d24
Fix distributed gather for tuples of tensors of varying sizes (#11071) 2021-04-05 16:21:49 -04:00
Sylvain Gugger
090e3e6896
Add center_crop to ImageFeatureExtractionMixin (#11066) 2021-04-05 15:28:51 -04:00
konstin
abb7430003
Replace pkg_resources with importlib_metadata (#11061)
* Replace pkg_resources with importlib_metadata

Fixes #10964. The other reason for this change is that pkg_resources has been [deprecated](8fe85c22ce) in favor of importlib_metadata.

* Reduce to a single importlib_metadata import switch

* Trigger CI

Co-authored-by: Stas Bekman <stas@stason.org>
2021-04-05 12:12:19 -07:00
Lysandre Debut
9f4e0c23d6
Documentation about loading a fast tokenizer within Transformers (#11029)
* Documentation about loading a fast tokenizer within Transformers

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* style

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-05 10:51:16 -04:00
Lysandre Debut
eb3479e7cf
Some models have no tokenizers (#11064) 2021-04-05 09:37:49 -04:00
cronoik
57c1749efa
DebertaTokenizer Rework closes #10258 (#10703)
* closes #10258

* typo

* reworked deberta test

* implemented the comments from BigBird01 regarding sequence pair encoding of deberta

* Update style

* VOCAB_FILES_NAMES is now a oneliner as suggested by @sgugger

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* added #fmt: on as requested by @sgugger

* Style

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-04-01 13:53:53 -04:00
NielsRogge
30677dc743
Add Vision Transformer and ViTFeatureExtractor (#10950)
* Squash all commits into one

* Update ViTFeatureExtractor to use image_utils instead of torchvision

* Remove torchvision and add Pillow

* Small docs improvement

* Address most comments by @sgugger

* Fix tests

* Clean up conversion script

* Pooler first draft

* Fix quality

* Improve conversion script

* Make style and quality

* Make fix-copies

* Minor docs improvements

* Should use fix-copies instead of manual handling

* Revert "Should use fix-copies instead of manual handling"

This reverts commit fd4e591bce.

* Place ViT in alphabetical order

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-01 11:16:05 -04:00
Sylvain Gugger
cd56f3fe7e
Merge trainers (#10975)
* Replace is_sagemaker_distributed_available

* Merge SageMakerTrainer into Trainer

* Test with shorter condition

* Put back deleted line

* Deprecate SageMakerTrainer and SageMakerTrainingArguments

* Apply suggestions from code review

Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com>

Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com>
2021-03-31 10:01:30 -04:00
Sylvain Gugger
acc3bd9d2a
Enforce string-formatting with f-strings (#10980)
* First third

* Styling and fix mistake

* Quality

* All the rest

* Treat %s and %d

* typo

* Missing )

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-03-31 10:00:27 -04:00
Philipp Schmid
ced7284a60
Sagemaker test fix (#10987)
* wrong makefile command

* ddp test fix
2021-03-31 07:44:22 -04:00
Patrick von Platen
e87505f3a1
[Flax] Add other BERT classes (#10977)
* add first code structures

* add all bert models

* add to init and docs

* correct docs

* make style
2021-03-31 09:45:58 +03:00
Suraj Patil
83d38c9ff3
GPT Neo few fixes (#10968)
* fix checkpoint names

* auto model

* fix doc
2021-03-30 11:15:55 -04:00
Patrick von Platen
7772ddb473
fix big bird gpu test (#10967) 2021-03-30 17:03:48 +03:00
Suraj Patil
860264379f
GPT Neo (#10848)
* lets begin

* boom boom

* fix out proj in attn

* fix attention

* fix local attention

* add tokenizer

* fix imports

* autotokenizer

* fix checkpoint name

* cleanup

* more clean-up

* more cleanup

* output attentions

* fix attn mask creation

* fix imports

* config doc

* add tests

* add slow tests

* quality

* add conversion script

* copyright

* typo

* another bites the dust

* fix attention tests

* doc

* add embed init in convert function

* fix copies

* remove tokenizer

* enable caching

* address review comments

* improve config and create attn layer list internally

* more consistent naming

* init hf config from mesh-tf config json file

* remove neo tokenizer from doc

* handle attention_mask in local attn layer

* attn_layers => attention_layers

* add tokenizer_class in config

* fix docstring

* raise if len of attention_layers is not same as num_layers

* remove tokenizer_class from config

* more consistent naming

* fix doc

* fix checkpoint names

* fp16 compat

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-03-30 09:42:30 -04:00
Patrick von Platen
8780caa388
[WIP][Flax] Add general conversion script (#10809)
* save intermediate

* finish first version

* delete some more

* improve import

* fix roberta

* Update src/transformers/modeling_flax_pytorch_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_flax_pytorch_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* small corrections

* apply all comments

* fix deterministic

* make fix-copies

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-03-30 12:13:59 +03:00
Philipp Schmid
604c085087
Sagemaker test (#10925)
* init

* first working test

* added todo for setup.py

* working test for single node multi node ddp and smd

* added tensorflow single node test

* added directory for pytorch and tensorflow due to different requirements.txt

* added directory for pytorch and tensorflow

* added comment for run_glue until it is available

* added output_dir to it

* smaller dataset to make test running faster

* adjust HP and script

* adjusted parameter for tensorflow

* refactored test scripts

* adjusted make file

* init

* first working test

* added todo for setup.py

* working test for single node multi node ddp and smd

* added tensorflow single node test

* added directory for pytorch and tensorflow due to different requirements.txt

* added directory for pytorch and tensorflow

* added comment for run_glue until it is available

* added output_dir to it

* smaller dataset to make test running faster

* adjust HP and script

* adjusted parameter for tensorflow

* refactored test scripts

* adjusted make file

* updated dlc container

* commented in all tests

* added both ecr images

* added new master branches

* debug

* added new datasets version

* init

* strange rebase bug

* removed changes

* changed min version for tests to work

* updated DLC

* added model parallel test

* removed test files

* removed test files

* tested with new dlc

* added correct sagemaker sdk version

* adjust DLCs for official one

* reworked tests

* quality

* removed default profile added documentation to it

* added step in release for sagemaker tests

* reverted version for example script removed duplicated script and added install from master to requirements.txt

* removed mistaken .DS_Stores from mac

* fixed tests

* added Sylvain's feedback

* make style

* added lysandre's feedback
2021-03-30 08:28:02 +02:00
Vasudev Gupta
6dfd027279
BigBird (#10183)
* init bigbird

* model.__init__ working, conversion script ready, config updated

* add conversion script

* BigBirdEmbeddings working :)

* slightly update conversion script

* BigBirdAttention working :) ; some bug in layer.output.dense

* add debugger-notebook

* forward() working for BigBirdModel :) ; replaced gelu with gelu_fast

* tf code adapted to torch till rand_attn in bigbird_block_sparse_attention ; till now everything working :)

* BigBirdModel working in block-sparse attention mode :)

* add BigBirdForPreTraining

* small fix

* add tokenizer for BigBirdModel

* fix config & hence modeling

* fix base prefix

* init testing

* init tokenizer test

* pos_embed must be absolute, attn_type=original_full when add_cross_attn=True , nsp loss is optional in BigBirdForPreTraining, add assert statements

* remove position_embedding_type arg

* complete normal tests

* add comments to block sparse attention

* add attn_probs for sliding & global tokens

* create fn for block sparse attn mask creation

* add special tests

* restore pos embed arg

* minor fix

* attn probs update

* make big bird fully gpu friendly

* fix tests

* remove pruning

* correct tokenizer & minor fixes

* update conversion script , remove norm_type

* tokenizer-inference test add

* remove extra comments

* add docs

* save intermediate

* finish trivia_qa conversion

* small update to forward

* correct qa and layer

* better error message

* BigBird QA ready

* fix rebased

* add trivia-qa debugger notebook

* qa setup

* fixed till embeddings

* some issue in q/k/v_layer

* fix bug in conversion-script

* fixed till self-attn

* qa fixed except layer norm

* add qa end2end test

* fix gradient ckpting ; other qa test

* speed-up big bird a bit

* hub_id=google

* clean up

* make quality

* speed up einsum with bmm

* finish perf improvements for big bird

* remove wav2vec2 tok

* fix tokenizer

* include docs

* correct docs

* add helper to auto pad block size

* make style

* remove fast tokenizer for now

* fix some

* add pad test

* finish

* fix some bugs

* fix another bug

* fix buffer tokens

* fix comment and merge from master

* add comments

* make style

* commit some suggestions

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix typos

* fix some more suggestions

* add another patch

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix copies

* another path

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* update

* update nit suggestions

* make style

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-03-30 08:51:34 +03:00
Sylvain Gugger
b0595d33c1
Add ImageFeatureExtractionMixin (#10905)
* Add ImageFeatureExtractionMixin

* Add dummy vision objects

* Add require_vision

* Add tests

* Fix test
2021-03-26 11:23:56 -04:00
Amir Tahmasbi
4684bfc757
Layout lm tf 2 (#10636)
* Added embeddings layer

* Added layoutlm layers, main model, maskedlm and token classification classes

* Added model classes to tf auto models

* Added model to PT to TF conversion script

* Added model to doc README

* Added tests

* Removed unused imports

* Added layoutlm model, test, and doc for sequence classification, and fix imports in __init__.py

* Made tests pass!

* Fixed typos in imports and docs

* Fixed a typo in embeddings layer

* Removed imports

* Fixed formatting issues, imports, tests

* Added layoutlm layers, main model, maskedlm and token classification classes

* Added model classes to tf auto models

* Added model to PT to TF conversion script

* Removed unused imports

* Added layoutlm model, test, and doc for sequence classification, and fix imports in __init__.py

* Made tests pass!

* Fixed typos in imports and docs

* Removed imports

* Fixed small formatting issues

* Removed duplicates import from main __init__.py

* Changed default arg to true for adding pooling layer to tf layoutlm

* Fixed formatting issues

* Style

* Added copied from to classes copied from bert

* Fixed doc strings examples to work with layoutlm inputs

* Removed PyTorch reference in doc strings example

* Added integration tests

* Cleaned up initialization file

* Updated model checkpoint identifiers

* Fixed imports

Co-authored-by: Amir Tahmasbi <amir@ehsai.ca>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-03-25 12:32:38 -04:00
Sylvain Gugger
a735f727cc
Fix test_trainer_distributed (#10875) 2021-03-23 19:03:06 -04:00
Patrick von Platen
77bf3fe787
[Generate] Add save mode logits processor to remove nans and infs if necessary (#10769)
* push

* finish

* finish

* make fix copies

* change name
2021-03-23 01:00:05 +03:00
Théo Matussière
117dba9948
fix backend tokenizer args override: key mismatch (#10686)
* fix backend tokenizer args override: key mismatch

* no touching the docs

* fix mpnet

* add mpnet to test

* fix test

Co-authored-by: theo <theo@matussie.re>
2021-03-18 22:13:45 -04:00
Sylvain Gugger
008672e6e5
Fix distributed evaluation (#10795)
* Fix distributed evaluation

* Use logger
2021-03-18 13:12:04 -04:00
Vimarsh Chaturvedi
094afa515d
from_pretrained: check that the pretrained model is for the right model architecture (#10586)
* Added check to ensure model name passed to from_pretrained and model are the same

* Added test to check from_pretrained throws assert error when passed an incompatible model name

* Modified assert in from_pretrained with f-strings. Modified test to ensure desired assert message is being generated

* Added check to ensure config and model has model_type

* Fix FlauBERT heads

Co-authored-by: vimarsh chaturvedi <vimarsh chaturvedi>
Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-03-18 12:51:42 -04:00
Patrick von Platen
0b98ca368f
[Flax] Adapt Flax models to new structure (#9484)
* Create modeling_flax_electra with code copied from modeling_flax_bert

* Add ElectraForMaskedLM and ElectraForPretraining

* Add modeling test for Flax electra and fix naming and arg in Flax Electra model

* Add documentation

* Fix code style

* Create modeling_flax_electra with code copied from modeling_flax_bert

* Add ElectraForMaskedLM and ElectraForPretraining

* Add modeling test for Flax electra and fix naming and arg in Flax Electra model

* Add documentation

* Fix code style

* Fix code quality

* Adjust tol in assert_almost_equal due to very small difference between model output, ranging 0.0010 - 0.0016

* Remove redundant ElectraPooler

* save intermediate

* adapt

* correct bert flax design

* adapt roberta as well

* finish roberta flax

* finish

* apply suggestions

* apply suggestions

Co-authored-by: Chris Nguyen <anhtu2687@gmail.com>
2021-03-18 09:44:17 +03:00
Mansi Mane
0282e24eef
Smmp batch not divisible by microbatches fix (#10778)
* Added debug prints

* Added config

* Added prints

* Added prints

* Added extra samples to SequentialDistributedSampler

* Added extra samples to SequentialDistributedSampler

Updated SequentialDistributedSampler call

* Added debug prints

* Removed extra prints

* Making predictions and labels multiple of batch size

* updated number of microbatches

* Removed extra prints

* Made start_remainder similar to DistributedSamplerWithLoop

* Minor spacing update

* Added debug prints

Added config

Added prints

Added prints

* Added extra samples to SequentialDistributedSampler

Updated SequentialDistributedSampler call

Added extra samples to SequentialDistributedSampler

Added debug prints

Removed extra prints

Making predictions and labels multiple of batch size

updated number of microbatches

Removed extra prints

Squashing redundant commits

* Made start_remainder similar to DistributedSamplerWithLoop

Minor spacing update

Made start_remainder similar to DistributedSamplerWithLoop

* Test and styling

* Rename test

Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
2021-03-17 19:18:11 -04:00
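The entry above (#10778) talks about adding extra samples so that predictions and labels come out as a multiple of the batch size. One way such padding could work is sketched below; `pad_indices_to_multiple` and `truncate_to_dataset` are hypothetical helpers written for this illustration and are not the samplers named in the commit.

```python
def pad_indices_to_multiple(indices, batch_size):
    """Extend `indices` by looping back to its start until its length
    is a multiple of `batch_size` (hypothetical helper)."""
    padded = list(indices)
    if not padded:
        return padded
    remainder = len(padded) % batch_size
    if remainder == 0:
        return padded
    missing = batch_size - remainder
    # Wrap around repeatedly in case more samples are missing than exist.
    while missing > 0:
        extra = list(indices[:missing])
        padded.extend(extra)
        missing -= len(extra)
    return padded


def truncate_to_dataset(values, num_samples):
    """Drop the values that were produced for the padding samples."""
    return values[:num_samples]


indices = pad_indices_to_multiple([0, 1, 2, 3, 4], batch_size=4)
print(indices)  # [0, 1, 2, 3, 4, 0, 1, 2]
```

After evaluation, the values computed for the looped-around samples would be discarded again, which is what `truncate_to_dataset` stands in for here.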
Sylvain Gugger
40b049c701
Check copies blackify (#10775)
* Apply black before checking copies

* Fix for class methods

* Deal with lonely brackets

* Remove debug and add forward changes

* Separate copies and fix test

* Add black as a test dependency
2021-03-17 18:11:20 -04:00
Stas Bekman
3318c246f3
make failure to find a resume checkpoint fatal + tests (#10777) 2021-03-17 11:16:37 -07:00
Stas Bekman
cd8c93f701
[DeepSpeed] improve checkpoint loading code plus tests (#10760)
* deepspeed checkpoint loading code plus tests

* style

* style
2021-03-17 10:22:58 -07:00
Patrick von Platen
0486ccdd3d
small improvements (#10773) 2021-03-17 18:10:17 +03:00
Patrick von Platen
f20d75a13f
up (#10771) 2021-03-17 16:15:14 +03:00
Lysandre Debut
2097aa1826
Patches the full import failure and adds a test (#10750)
* Patches the full import failure and adds a test

* Add comment
2021-03-16 15:37:52 -04:00
Sylvain Gugger
a0a027c2ed
Add DistributedSamplerWithLoop (#10746)
* Add DistributedSamplerWithLoop

* Fix typo

* Test and small fix
2021-03-16 11:22:39 -04:00
Patrick von Platen
9f8619c6aa
Flax testing should not run the full torch test suite (#10725)
* make flax tests pytorch independent

* fix typo

* finish

* improve circle ci

* fix return tensors

* correct flax test

* re-add sentencepiece

* last tokenizer fixes

* finish maybe now
2021-03-16 08:05:37 +03:00
Joe Davison
966ba081c9
zero-shot pipeline multi_class -> multi_label (#10727) 2021-03-15 16:02:46 -06:00
Lysandre Debut
58f672e65c
Tests run on Docker (#10681)
* Tests run on Docker

Co-authored-by: Morgan <funtowiczmo@gmail.com>

* Comments from code review

* Reply to itself

* Dependencies

Co-authored-by: Morgan <funtowiczmo@gmail.com>
2021-03-15 17:28:01 -04:00
Patrick von Platen
d9e693e1d0
make wav2vec2 test deterministic (#10714) 2021-03-15 09:50:05 -04:00
Adam Pocock
3f1714f8a7
Adding required flags to non-default arguments in hf_argparser (#10688)
* Adding required flags to non-default arguments.

Signed-off-by: Adam Pocock <adam.pocock@oracle.com>

* make style fix.

* Update src/transformers/hf_argparser.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-03-15 09:27:55 -04:00
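The idea in the entry above (#10688), marking arguments without a default as required, can be sketched with the standard library alone. `ExampleArguments` and `build_parser` are hypothetical names for this illustration; the actual `hf_argparser.py` handles many more field types and is not reproduced here.

```python
import argparse
import dataclasses


@dataclasses.dataclass
class ExampleArguments:
    model_name: str      # no default, so the flag becomes required
    batch_size: int = 8  # has a default, so the flag stays optional


def build_parser(dataclass_type):
    """Build an argparse parser from a dataclass (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    for field in dataclasses.fields(dataclass_type):
        kwargs = {"type": field.type}
        if field.default is not dataclasses.MISSING:
            kwargs["default"] = field.default
        elif field.default_factory is not dataclasses.MISSING:
            kwargs["default"] = field.default_factory()
        else:
            # Non-default fields must be supplied on the command line.
            kwargs["required"] = True
        parser.add_argument(f"--{field.name}", **kwargs)
    return parser


args = build_parser(ExampleArguments).parse_args(["--model_name", "bert"])
print(args.model_name, args.batch_size)  # bert 8
```

With `required=True`, argparse itself reports the missing flag and exits, rather than the dataclass constructor failing later with a less helpful error.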
Igor Shalyminov
505494a86f
GPT2DoubleHeadsModel made parallelizable (#10658)
* GPT2DoubleHeadsModel made parallelizable

* GPT2DoubleHeadsModel added as parallelizable onto the GPT2 test suite
2021-03-15 09:10:44 -04:00
Patrick von Platen
bd8f6cafd4
make rag tests smaller (#10679) 2021-03-15 10:07:12 +03:00
Lysandre Debut
184ef8ecd0
TensorFlow tests: having from_pt set to True requires torch to be installed. (#10664)
* TF model exists for Blenderbot 400M

* Marian

* RAG
2021-03-12 14:16:40 +03:00
Nicolas Patry
543d0549f8
Adding new parameter to generate: max_time. (#9846)
* [WIP] Adding new parameter to `generate`:  `max_time`.

Generating a fixed number of tokens is sometimes clunky because we don't
know how many tokens are good enough, or even how many tokens are in
the payload (for pipeline users, for instance). This leads to
hard-to-understand behavior.

This PR proposes a new argument `max_time`, a float giving the number of seconds
that `generate` is allowed to run.
Ideally, combinations such as `max_tokens=None`, `max_time=2` could be used to
generate as many tokens as possible within the time budget.

NB: Another possible approach consists of passing a callback to `generate`
  putting the caller in charge of the actual decision of when to stop
  generating tokens. It opens the door to 'which args should we pass'
  to this callback. It's hard to imagine other use-cases for this
  early stopping behavior than time (that are not already covered by
  parameters of generate)

* Revamp with StoppingCriteria

* Removing deprecated mentions.

* Forgot arguments to stopping criteria.

* Re-adding max_length; it is not used only as a stopping criterion.

* Default value for `stopping_criteria`.

* Address @patrickvonplaten comments.

- More docstrings
- Actual doc
- Include in global namespace
- Remove TF work.

* Put back `max_length` (deprecation different PR).

* Doc quality.

* Fixing old behavior without `stopping_criteria` but with `max_length`.

Making sure we don't break that in the future.

* Adding more tests for possible inconsistencies between

`max_length` and `stopping_criteria`.

* Fixing the torch imports.
2021-03-12 10:11:50 +01:00
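The time-budget idea from the entry above (#9846) can be illustrated with a toy generation loop that stops when any of several criteria fires. Everything in this sketch is an assumption made for illustration: `MaxTimeCriterion`, `MaxLengthCriterion`, and this `generate` function are hypothetical stand-ins, not the `StoppingCriteria` classes or the `generate` method the commit adds.

```python
import time


class MaxTimeCriterion:
    """Hypothetical rule: stop once more than `max_time` seconds have elapsed."""

    def __init__(self, max_time, clock=time.monotonic):
        self.max_time = max_time
        self.clock = clock
        self.start = clock()  # the budget starts when the criterion is created

    def __call__(self, tokens):
        return self.clock() - self.start > self.max_time


class MaxLengthCriterion:
    """Hypothetical rule: stop once `max_length` tokens have been produced."""

    def __init__(self, max_length):
        self.max_length = max_length

    def __call__(self, tokens):
        return len(tokens) >= self.max_length


def generate(next_token, criteria):
    """Toy loop: keep appending tokens until any criterion asks to stop."""
    tokens = []
    while not any(criterion(tokens) for criterion in criteria):
        tokens.append(next_token(tokens))
    return tokens


# Stop after 5 tokens or 2 seconds, whichever comes first.
out = generate(lambda toks: len(toks), [MaxLengthCriterion(5), MaxTimeCriterion(2.0)])
print(out)  # [0, 1, 2, 3, 4]
```

Treating the length limit and the time limit as interchangeable criteria is what lets a caller drop one of them and rely on the other, which is the combination the commit message describes.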
Lysandre Debut
ea46e3fa9c
Adjust loss difference (#10669) 2021-03-12 09:09:46 +03:00
Sylvain Gugger
fda703a553
Fix integration slow tests (#10670)
* PoC

* Fix slow tests for the PT1.8 Embedding problem
2021-03-11 13:43:53 -05:00
Funtowicz Morgan
3ab6820370
Onnx fix test (#10663)
* Allow to pass kwargs to model's from_pretrained when using pipeline.

* Disable the use of past_keys_values for GPT2 when exporting to ONNX.

* style

* Remove comment.

* Appease the documentation gods

* Fix style

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-03-11 13:38:29 -05:00
Lysandre Debut
a637ae00c4
Fixes Pegasus tokenization tests (#10671) 2021-03-11 13:35:50 -05:00
Lysandre Debut
7e4428749c
Conversion to tensors requires padding (#10661) 2021-03-11 12:58:15 -05:00
Lysandre Debut
2adc8c926a
W2v2 test require torch (#10665)
* Adds a @require_torch to a test that requires it

* Tokenizer too

* Style
2021-03-11 12:56:12 -05:00
Sylvain Gugger
2295d783d5
Copy tokenizer files in each of their repo (#10624)
* Move tokenizer files in each repo

* Fix mBART50 tests

* Fix mBART tests

* Fix Marian tests

* Update templates
2021-03-10 11:26:23 -05:00
Suraj Patil
d26b37e744
Speech2TextTransformer (#10175)
* s2t

* fix config

* conversion script

* fix import

* add tokenizer

* fix tok init

* fix tokenizer

* first version working

* fix embeds

* fix lm head

* remove extra heads

* fix convert script

* handle encoder attn mask

* style

* better enc attn mask

* override _prepare_attention_mask_for_generation

* handle attn_mask in encoder and decoder

* input_ids => input_features

* enable use_cache

* remove old code

* expand embeddings if needed

* remove logits bias

* masked_lm_loss => loss

* hack tokenizer to support feature processing

* fix model_input_names

* style

* fix error message

* doc

* remove inputs_embeds

* remove input_embeds

* remove unnecessary docstring

* quality

* SpeechToText => Speech2Text

* style

* remove shared_embeds

* subsample => conv

* remove Speech2TextTransformerDecoderWrapper

* update output_lengths formula

* fix table

* remove max_position_embeddings

* update conversion scripts

* add possibility to do upper case for now

* add FeatureExtractor and Processor

* add tests for extractor

* require_torch_audio => require_torchaudio

* add processor test

* update import

* remove classification head

* attention mask is now 1D

* update docstrings

* attention mask should be of type long

* handle attention mask from generate

* always return attention_mask

* fix test

* style

* doc

* Speech2TextTransformer => Speech2Text

* Speech2TextTransformerConfig => Speech2TextConfig

* remove dummy_inputs

* nit

* style

* multilingual tok

* fix tokenizer

* add tgt_lang setter

* save lang_codes

* fix tokenizer

* add forced_bos_token_id to tokenizer

* apply review suggestions

* add torchaudio to extra deps

* add speech deps to CI

* fix dep

* add libsndfile to ci

* libsndfile1

* add speech to extras all

* libsndfile1 -> libsndfile1

* libsndfile

* libsndfile1-dev

* apt update

* add sudo to install

* update deps table

* install libsndfile1-dev on CI

* tuple to list

* init conv layer

* add model tests

* quality

* add integration tests

* skip_special_tokens

* add speech_to_text_transformer in toctree

* fix tokenizer

* fix fp16 tests

* add tokenizer tests

* fix copyright

* input_values => input_features

* doc

* add model in readme

* doc

* change checkpoint names

* fix copyright

* fix code example

* add max_model_input_sizes in tokenizer

* fix integration tests

* add do_lower_case to tokenizer

* remove clamp trick

* fix "Add modeling imports here"

* fix copyrights

* fix tests

* SpeechToTextTransformer => SpeechToText

* fix naming

* fix table formatting

* fix typo

* style

* fix typos

* remove speech dep from extras[testing]

* fix copies

* rename doc file

* put imports under is_torch_available

* run feat extract tests when torch is available

* dummy objects for processor and extractor

* fix imports in tests

* fix import in modeling test

* fix imports

* fix torch import

* fix imports again

* fix positional embeddings

* fix typo in import

* adapt new extractor refactor

* style

* fix torchscript test

* doc

* doc

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fix docs, copied from, style

* fix docstring

* handle imports

* remove speech from all extra deps

* remove s2t from seq2seq lm mapping

* better names

* skip training tests

* add install instructions

* List => Tuple

* doc

* fix conversion script

* fix urls

* add instruction for libsndfile

* fix fp16 test

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-03-10 21:42:04 +05:30
Sylvain Gugger
72d9e039f9
Fix tests of TrainerCallback (#10615)
* Fix tests of TrainerCallback

* Update tests/test_trainer_callback.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-03-09 16:25:32 -05:00
Suraj Patil
20c10258a4
layerdrop 0 (#10604) 2021-03-09 17:35:07 +03:00
Patrick von Platen
9a06b6b11b
[FeatureExtractorSavingUtils] Refactor PretrainedFeatureExtractor (#10594)
* save first version

* finish refactor

* finish refactor

* correct naming

* correct naming

* shorter names

* Update src/transformers/feature_extraction_common_utils.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* change name

* finish

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-03-09 12:16:59 +03:00
Lysandre Debut
546cbe7e9e
Speedup tf tests (#10601)
* Pipeline tests should be slow

* Temporarily mark some tests as slow

* Temporarily mark Barthez tests as slow
2021-03-08 21:44:07 -05:00
Ratthachat (Jung)
696e8a4365
Add TFRag (#9002)
* Create modeling_tf_dpr.py

* Add TFDPR

* Add back TFPegasus, TFMarian, TFMBart, TFBlenderBot

The last commit accidentally deleted these 4 lines, so this restores them

* Add TFDPR

* Add TFDPR

* clean up some comments, add TF input-style doc string

* Add TFDPR

* Make return_dict=False as default

* Fix return_dict bug (in .from_pretrained)

* Add get_input_embeddings()

* Create test_modeling_tf_dpr.py

The current version already passes all 27 tests!
Please see the test run at:
https://colab.research.google.com/drive/1czS_m9zy5k-iSJbzA_DP1k1xAAC_sdkf?usp=sharing

* fix quality

* delete init weights

* run fix copies

* fix repo consis

* del config_class, load_tf_weights

They should be 'pytorch only'

* add config_class back

After removing it, the tests failed, so only "use_tf_weights = None" is removed, on Lysandre's suggestion

* newline after .. note::

* import tf, np (Necessary for ModelIntegrationTest)

* slow_test from_pretrained with from_pt=True

At the moment we don't have TF weights (since we don't have an official TF model)
Previously, I did not run the slow tests, so I missed this bug

* Add simple TFDPRModelIntegrationTest

Note that this is just a test that TF and PyTorch give approximately the same output.
However, I could not test against the official DPR repo's output yet

* upload correct tf model

* remove position_ids as missing keys

* create modeling_tf_rag

* add tests for tf

* add tf tests

* revert wrong pt commit

* further refactor

* further refactor

* refactor

* Update modeling_tf_rag.py

- input_processing
- fix prepare_input_for_generation (mostly fix generate bug)
- bring back from_pretrained hack in order to test generate

* delete colab pieces of code

* Showcase greedy "generate"

Temporarily change from beam_search test to greedy_search test to show that TF and PT produce equivalent output.

* cosmetic update

* correct typos

* update

* push some progress

* make easy check

* fix rag save from pretrained

* Update src/transformers/modeling_tf_utils.py

* remove commented out lines

* delete unnecessary lines

* add simple test case for nq_checkpoint

Add nq_checkpoint test to show that current version without hack still fails

* temporarily put ugly hack back again

* Add TFRagSequenceForGeneration!!

* __init__.py , import TFRagSequenceForGeneration

* Add TFRagSequence tests!

* rag init.py - add TFRagSequenceForGeneration

* fix from_pretrained

* fix prepare_inputs_for_generation

* Beam search for RagToken!

* minor clean up

* add tf.cast in TFRagModel

* More tf.cast

* Add all remaining tests (still have issues)

* delete all T5 related

* make style

* fix load weight prefix

* fix bart

* fix return_dict for tf_rag

make all tests pass .. Hooray

* fix some tests

* fix code quality

* fix quality check

* finish tests tf rag

* add tf rag to docs

* remove TFT5 from docstring

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* remove TFT5 from docstring

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Delete outdated comments

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* improve doc strings

* add generative model classes

* fix adjust token logic

* refactor generate for TFRag

* using shape_list, not _get_shape

Co-authored-by: Julien Plu <plu.julien@gmail.com>

* axis=[1]->axis=1

* delete NEED_HELP comment

* improve readability

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* improve readability

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* improve readability

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Indicating model is in a developing state in docstrings

As suggested by Julien

* small last changes

* apply sylvains suggestions

* finish tf rag

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: patrickvonplaten <patrick@huggingface.co>
Co-authored-by: Julien Plu <plu.julien@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-03-09 00:49:51 +03:00
Sylvain Gugger
3ced9b3eb9
Check layer types for Optimizer construction (#10598)
* Check layer types for Optimizer construction

* Duplicate class
2021-03-08 16:40:11 -05:00
Sylvain Gugger
821d518e03 Revert "Tests"
This reverts commit b35e7b68ca.
2021-03-08 16:05:55 -05:00
Sylvain Gugger
4196bfeda0 Revert "Style"
This reverts commit a8ec52efc2.
2021-03-08 16:05:52 -05:00
Sylvain Gugger
a8ec52efc2 Style 2021-03-08 16:04:46 -05:00
Sylvain Gugger
b35e7b68ca Tests 2021-03-08 16:04:30 -05:00
Stas Bekman
6f84531e61
offline mode for firewalled envs (part 2) (#10569)
* more readable test

* add all the missing places

* one more nltk

* better exception check

* revert
2021-03-08 08:52:20 -08:00
Stas Bekman
f882966004
fix double wrapping + test (#10583) 2021-03-08 10:15:55 -05:00
Suraj Patil
2a737bffef
[M2M100] fix positional embeddings (#10590)
* fix tests

* emb should be a parameter

* fix positional embeddings

* fix make_weights

* don't save pos embeds

* add comment to describe the clamping
2021-03-08 16:06:19 +05:30
Suraj Patil
f6e74a63ca
Add m2m100 (#10236)
* m2m_100

* no layernorm_embedding

* sinusoidal positional embeddings

* update pos embeddings

* add default config values

* tokenizer

* add conversion script

* fix config

* fix pos embed

* remove _float_tensor

* update tokenizer

* update lang codes

* handle lang codes

* fix pos embeds

* fix spm key

* put embedding weights on device

* remove qa and seq classification heads

* fix convert script

* lang codes on one line

* fix embeds

* fix tokenizer

* fix tokenizer

* add fast tokenizer

* style

* M2M100MT => M2M100

* fix copyright, style

* tokenizer converter

* vocab file

* remove fast tokenizer

* fix embeds

* fix tokenizer

* fix tests

* add tokenizer tests

* add integration test

* quality

* fix model name

* fix test

* doc

* doc

* fix doc

* add copied from statements

* fix tokenizer tests

* apply review suggestions

* fix urls

* fix shift_tokens_right

* apply review suggestions

* fix

* fix doc

* add lang code to id

* remove unused function

* update checkpoint names

* fix copy

* fix tokenizer

* fix checkpoint names

* fix merge issue

* style
2021-03-06 22:14:16 +05:30
Stas Bekman
88a951e3cc
offline mode for firewalled envs (#10407)
* offline mode start

* add specific values

* fix fallback

* add test

* better values check and range

* test that actually works

* document the offline mode

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* more strict check

* cleaner test

* pt-only test

* style

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-03-05 17:27:48 -08:00
Lysandre Debut
6b58e15507
Fix torch 1.8.0 segmentation fault (#10546)
* Only run one test

* Patch segfault

* Fix summarization pipeline

* Ready for merge
2021-03-05 12:10:19 -05:00
Nicolas Patry
54e55b52d4
Fixing conversation test for torch 1.8 (#10545) 2021-03-05 09:24:14 -05:00
Patrick von Platen
c503a1c15e
[ProphetNet] Bart-like Refactor (#10501)
* first step to refactor

* make all fast tests pass

* make all slow tests pass

* save intermediate

* correct cache

* finish PR

* make fp16 work
2021-03-04 23:27:12 +03:00
Sylvain Gugger
6290169eb3
Rework TPU checkpointing in Trainer (#10504)
* Rework TPU checkpointing in Trainer

* Wraps the barrier in a dist test

* Address review comments

* Remove line
2021-03-04 11:46:11 -05:00
Mehrad Moradshahi
1750e62900
Generate can return cross-attention weights too (#10493) 2021-03-03 13:57:02 +05:30
Patrick von Platen
0234de8418
Add Fine-Tuning for Wav2Vec2 (#10145)
* add encode labels function to tokenizer

* start adding finetuning

* init dropout

* upload

* correct convert script

* apply changes

* fix second typo

* make first dummy training run

* adapt convert script

* push config for comparison

* remove conf

* finish training

* adapt data collator

* add research folder

* update according to fairseq feedback

* some minor corrections

* refactor masking indices a bit

* some minor changes

* clean tokenizer

* finish clean-up

* remove previous logic

* update run script

* correct training

* finish changes

* finish model

* correct bug

* fix training a bit more

* add some tests

* finish gradient checkpointing

* finish example

* correct gradient checkpointing

* improve tokenization method

* revert changes in tokenizer

* revert general change

* adapt fine-tuning

* update

* save intermediate test

* Update README.md

* finish finetuning

* delete conversion script

* Update src/transformers/models/wav2vec2/configuration_wav2vec2.py

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* finish wav2vec2 script

* finish wav2vec2 fine-tuning

* finalize test

* correct test

* adapt tests

* finish

* remove test file

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-03-01 12:13:17 +03:00
Tanmay Garg
256482ac92
Introduce save_strategy training argument (#10286)
* Introduce save_strategy training argument

* deprecate EvaluationStrategy

* collapse EvaluationStrategy and LoggingStrategy into a single
  IntervalStrategy enum

* modify tests to use modified enum
2021-02-27 19:34:22 -05:00
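The entry above (#10286) collapses the separate strategy enums into a single `IntervalStrategy` enum that the new `save_strategy` argument can share. A rough sketch of that idea follows; the member names and string values are assumptions made for illustration, and `should_save` is a hypothetical helper rather than anything the commit defines.

```python
from enum import Enum


class IntervalStrategy(Enum):
    """One enum shared by evaluation, logging and saving (values assumed)."""

    NO = "no"
    STEPS = "steps"
    EPOCH = "epoch"


def should_save(strategy, step, save_steps, end_of_epoch):
    """Decide whether to write a checkpoint now (hypothetical helper)."""
    strategy = IntervalStrategy(strategy)  # accepts the enum or its string value
    if strategy is IntervalStrategy.STEPS:
        return save_steps > 0 and step % save_steps == 0
    if strategy is IntervalStrategy.EPOCH:
        return end_of_epoch
    return False


print(should_save("steps", step=500, save_steps=500, end_of_epoch=False))  # True
```

Sharing one enum means a single set of accepted values can serve all three arguments, instead of keeping parallel enums in sync.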
Kai Fricke
98569d4ba2
Add Ray Tune hyperparameter search integration test (#10414) 2021-02-26 10:18:33 -05:00
Julien Chaumond
83d2d55c94
[ci, flax] non-existing models are unlikely to pass tests (#10409)
😂
2021-02-26 12:35:36 +03:00
Sylvain Gugger
26f8b2cb10
Make Barthez tokenizer tests a bit faster (#10399)
* Make Barthez tokenizer tests a bit faster

* Quality
2021-02-25 11:42:25 -05:00