Commit Graph

798 Commits

Author SHA1 Message Date
sandip
f6b44e6190
TransfoXL seq classification (#8868)
* TransfoXL sequence classification

* TransfoXL sequence classification
2020-12-02 10:08:32 -05:00
Sylvain Gugger
7c10dd22ae
Better support for resuming training (#8878) 2020-12-01 13:45:21 -05:00
elk-cloner
4a9e502a36
Ctrl for sequence classification (#8812)
* add CTRLForSequenceClassification

* pass local test

* merge with master

* fix modeling test for sequence classification

* fix deco

* fix assert
2020-12-01 09:49:27 +01:00
Nicolas Patry
d8fc26e919
NerPipeline (TokenClassification) now outputs offsets of words (#8781)
* NerPipeline (TokenClassification) now outputs offsets of words

- It happens that the offsets are missing, forcing the user to pattern-match
the "word" in his input, which is not always feasible.
For instance, if a sentence contains the same word twice, there
is no way to know which occurrence is which.
- This PR proposes to fix that by outputting 2 new keys in this
pipeline's outputs, "start" and "end", which correspond to the string
offsets of the word. That means that we should always have the
invariant:

```python
input[entity["start"]: entity["end"]] == entity["word"]
# holds for both grouped and ungrouped outputs
```

* Fixing doc style
2020-11-30 14:05:08 -05:00
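A minimal usage sketch for the offsets added in the entry above (the model default and the sentence are illustrative assumptions, not from the PR):

```python
from transformers import pipeline

ner = pipeline("ner", grouped_entities=True)
text = "Berlin is big, but Berlin is also nice."
for entity in ner(text):
    # "start"/"end" disambiguate repeated words such as the two "Berlin"s.
    print(entity["entity_group"], text[entity["start"]: entity["end"]])
```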
Funtowicz Morgan
51b071313b
Attempt to fix Flax CI error(s) (#8829)
* Slightly increase tolerance between pytorch and flax output

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* test_multiple_sentences doesn't require torch

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Simplify parameterization on "jit" to use boolean rather than str

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Use `require_torch` on `test_multiple_sentences` because we pull the weight from the hub.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Rename "jit" parameter to "use_jit" for (hopefully) making it self-documenting.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Remove pytest.mark.parametrize which seems to fail in some circumstances

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Fix unused imports.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Fix style.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Give default parameter values for the traced model.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Review comment: Change sentences to sequences

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-11-30 13:43:17 -05:00
Ahmed Elnaggar
40ecaf0c2b
Add T5 Encoder for Feature Extraction (#8717)
* Add T5 Encoder class for feature extraction

* fix T5 encoder add_start_docstrings indent

* update init with T5 encoder

* update init with TFT5ModelEncoder

* remove TFT5ModelEncoder

* change T5ModelEncoder order in init

* add T5ModelEncoder to transformers init

* clean T5ModelEncoder

* update init with TFT5ModelEncoder

* add TFModelEncoder for Tensorflow

* update init with TFT5ModelEncoder

* Update src/transformers/models/t5/modeling_t5.py

change output from Seq2SeqModelOutput to BaseModelOutput

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* remove encoder_outputs

1. remove encoder_outputs from the function call.
2. remove the encoder_outputs if statement.
3. remove isinstance from return_dict.

* Authorize missing decoder keys

* remove unnecessary input parameters

remove past_key_values and use_cache

* remove use_cache

remove use_cache from the forward method

* add docstring for T5 encoder

add docstring for T5 encoder with T5_ENCODER_INPUTS_DOCSTRING

* change return_dict to dot access

* add T5_ENCODER_INPUTS_DOCSTRING for TF T5

* change TFT5Encoder output type to BaseModelOutput

* remove unnecessary parameters for TFT5Encoder

* remove unnecessary if statement

* add import BaseModelOutput

* fix BaseModelOutput typo to TFBaseModelOutput

* update T5 doc with T5ModelEncoder

* add T5ModelEncoder to tests

* finish pytorch

* finish docs and mt5

* add mt5 to init

* fix init

* remove n_positions

* finish PR

* Update src/transformers/models/mt5/modeling_mt5.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/t5/modeling_t5.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/t5/modeling_tf_t5.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/mt5/modeling_tf_mt5.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* make style

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-11-30 08:34:40 +01:00
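As a rough sketch of the feature-extraction use case the PR above enables (the encoder-only class is exposed as `T5EncoderModel` in the final API; the checkpoint choice is illustrative):

```python
import torch
from transformers import T5Tokenizer, T5EncoderModel

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5EncoderModel.from_pretrained("t5-small")

inputs = tokenizer("Studies have shown that owning a dog is good for you.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# The encoder-only forward returns a BaseModelOutput, not a Seq2SeqModelOutput.
features = outputs.last_hidden_state  # shape: (batch, seq_len, d_model)
```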
Patrick von Platen
5ced23dc84
[Pegasus] Refactor Tokenizer (#8731)
* refactor

* further refactor

* fix the rest tomorrow

* save intermediate

* finish slow tokenizer

* make more tests pass

* finish refactor

* fix comment

* clean further

* fix name

* fix naming

* Update src/transformers/models/reformer/tokenization_reformer.py

* Apply suggestions from code review

* Apply suggestions from code review

* refactor

* fix init tokenizers

* refactor

* improve convert

* refactor

* correct convert slow tokenizer

* final fix for Pegasus Tok

* remove ipdb

* improve links
2020-11-29 16:57:43 +01:00
Lysandre Debut
18c32eeb21
Model parallel tests should return, not pass in non model parallel settings. (#8825) 2020-11-27 16:41:29 -05:00
Max Del
0a921b6459
BART & FSMT: fix decoder not returning hidden states from the last layer (#8597)
* Fix decoder not returning hidden states from the last layer

* Resolve conflict

* Change the way to gather hidden states

* Add decoder hidden states test

* Make pytest and black happy

* Remove redundant line

* remove new line

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2020-11-27 18:35:34 +01:00
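A small check of the invariant the fix above restores (model name is an example): with `output_hidden_states=True`, the last entry of `decoder_hidden_states` should now be the final layer's output.

```python
import torch
from transformers import BartModel, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartModel.from_pretrained("facebook/bart-base")

inputs = tokenizer("Hello world", return_tensors="pt")
outputs = model(**inputs, output_hidden_states=True, return_dict=True)

# Before the fix, the last decoder layer was missing from this tuple.
assert torch.equal(outputs.decoder_hidden_states[-1], outputs.last_hidden_state)
```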
Moussa Kamal Eddine
81fe0bf085
Add barthez model (#8393)
* Add init barthez

* Add barthez model, tokenizer and docs

BARThez is a pretrained French seq2seq model that uses the BART objective.

* Apply suggestions from code review docs typos

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Add license

* Change URLs scheme

* Remove barthez model keep tokenizer

* Fix style

* Fix quality

* Update tokenizer

* Add fast tokenizer

* Add fast tokenizer test

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-11-27 12:31:42 -05:00
Patrick von Platen
a7d46a0609
Fix dpr<>bart config for RAG (#8808)
* correct dpr test and bert pos fault

* fix dpr bert config problem

* fix layoutlm

* add config to dpr as well
2020-11-27 16:26:45 +01:00
Patrick von Platen
a2cf37595e
[Flax test] Add require pytorch to fix flax test (#8816)
* try flax fix

* same for roberta
2020-11-27 14:40:42 +01:00
Kristian Holsheimer
f8eda599bd
[FlaxBert] Fix non-broadcastable attention mask for batched forward-passes (#8791)
* [FlaxBert] Fix non-broadcastable attention mask for batched forward-passes

* [FlaxRoberta] Fix non-broadcastable attention mask

* Use jax.numpy instead of ordinary numpy (otherwise not jit-able)

* Partially revert "Use jax.numpy ..."

* Add tests for batched forward passes

* Avoid unnecessary OOMs due to preallocation of GPU memory by XLA

* Auto-fix style

* Re-enable GPU memory preallocation but with mem fraction < 1/parallelism
2020-11-27 13:21:19 +01:00
Patrick von Platen
2a6fbe6a40
[XLNet] Fix mems behavior (#8567)
* fix mems in xlnet

* fix use_mems

* fix use_mem_len

* fix use mems

* clean docs

* fix tf typo

* make xlnet tf for generation work

* fix tf test

* refactor use cache

* add use cache for missing models

* correct use_cache in generate

* correct use cache in tf generate

* fix tf

* correct getattr typo

* make sylvain happy

* change in docs as well

* do not apply to cookie cutter statements

* fix tf test

* make pytorch model fully backward compatible
2020-11-25 16:54:59 -05:00
Joe Davison
369f1d77b4
Return correct Bart hidden state tensors (#8747)
* bart output hidden states upstream

* same w/ decoder

* add tests

* fix prophetnet

* fix gpt2 and ctrl

* fix fstm and skip test for reformer and longformer

* fix all models

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-11-25 22:06:04 +01:00
Lysandre Debut
138f45c184
Fix QA argument handler (#8765)
* Fix QA argument handler

* Attempt to get a better fix for QA (#8768)

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2020-11-25 14:02:15 -05:00
Julien Plu
29d4992453
New TF model inputs (#8602)
* Apply on BERT and ALBERT

* Update TF Bart

* Add input processing to TF BART

* Add input processing for TF CTRL

* Add input processing to TF Distilbert

* Add input processing to TF DPR

* Add input processing to TF Electra

* Add input processing for TF Flaubert

* Add deprecated arguments

* Add input processing to TF XLM

* remove unused imports

* Add input processing to TF Funnel

* Add input processing to TF GPT2

* Add input processing to TF Longformer

* Add input processing to TF Lxmert

* Apply style

* Add input processing to TF Mobilebert

* Add input processing to TF GPT

* Add input processing to TF Roberta

* Add input processing to TF T5

* Add input processing to TF TransfoXL

* Apply style

* Rebase on master

* Bug fix

* Retry to bugfix

* Retry bug fix

* Fix wrong model name

* Try another fix

* Fix BART

* Fix input processing

* Apply style

* Put the deprecated warnings in the input processing function

* Remove the unused imports

* Raise an error when len(kwargs)>0

* test ModelOutput instead of TFBaseModelOutput

* Bug fix

* Address Patrick's comments

* Address Patrick's comments

* Address Sylvain's comments

* Add the new inputs in new Longformer models

* Update the template with the new input processing

* Remove useless assert

* Apply style

* Trigger CI
2020-11-24 13:55:00 -05:00
Stas Bekman
82d443a7fd
[core] implement support for run-time dependency version checking (#8645)
* implement support for run-time dependency version checking

* try not escaping !

* use findall that works on py36

* small tweaks

* autoformatter worship

* simplify

* shorter names

* add support for non-versioned checks

* add deps

* revert

* tokenizers not required, check version only if installed

* make a proper distutils cmd and add make target

* tqdm must be checked before tokenizers

* workaround the DistributionNotFound peculiar setup

* handle the rest of packages in setup.py

* fully sync setup.py's install_requires - to check them all

* nit

* make install_requires more readable

* typo

* Update setup.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* restyle

* add types

* simplify

* simplify2

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-11-24 13:22:25 -05:00
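A hedged sketch of what run-time dependency checking amounts to; the actual helper shipped in `transformers` may differ in name and details:

```python
import pkg_resources

def require_version(requirement: str) -> None:
    """Raise if `requirement` (e.g. "tokenizers>=0.9.4") is not satisfied."""
    try:
        pkg_resources.require(requirement)
    except pkg_resources.DistributionNotFound:
        # Optional dependency not installed at all: version check is skipped,
        # matching the "check version only if installed" note above.
        pass
    except pkg_resources.VersionConflict as exc:
        raise ImportError(f"Unmet dependency: {requirement} (found {exc.dist})")

require_version("tokenizers>=0.9.4")
```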
Lysandre Debut
e09e54fd9d
MT5 should have an autotokenizer (#8743)
* MT5 should have an autotokenizer

* Different configurations should be able to point to same tokenizers
2020-11-24 09:50:25 -05:00
Lysandre Debut
6fdd0bb231
Fix slow tests v2 (#8746)
* Fix BART test

* Fix MBART tests

* Remove erroneous line from yaml

* Update tests/test_modeling_bart.py

* Quality
2020-11-24 09:35:12 -05:00
zhiheng-huang
2c83b3c38d
Support various BERT relative position embeddings (2nd) (#8276)
* Support BERT relative position embeddings

* Fix typo in README.md

* Address review comment

* Fix failing tests

* [tiny] Fix style_doc.py check by adding an empty line to configuration_bert.py

* make fix copies

* fix configs of electra and albert and fix longformer

* remove copy statement from longformer

* fix albert

* fix electra

* Add bert variants forward tests for various position embeddings

* [tiny] Fix style for test_modeling_bert.py

* improve docstring

* [tiny] improve docstring and remove unnecessary dependency

* [tiny] Remove unused import

* re-add to ALBERT

* make embeddings work for ALBERT

* add test for albert

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-11-24 14:40:53 +01:00
LysandreJik
7f2c00913a TF BERT test update 2020-11-23 18:20:19 -05:00
LysandreJik
e1b7e10d5f Update TF BERT test 2020-11-23 18:19:12 -05:00
Colin Brochtrup
8ffc01a76a
Add early stopping callback to pytorch trainer (#8581)
* Add early stopping patience, and a minimum threshold the metric must improve by to prevent early stopping, to the pytorch trainer

* Add early stopping test

* Set patience counter to 0 if best metric not defined yet

* Make early stopping a callback. Add callback event for updating the best metric for early stopping callback to trigger on.

* Run make style

* make function name sensible

* Improve new argument docstring wording and hope that the flaky CI test passes.

* Use on_evaluation callback instead of custom. Remove some debug printing

* Move early stopping arguments and state into early stopping callback

* Run make style

* Remove old code

* Fix docs formatting. make style went rogue on me.

* Remove copied attributes and fix variable

* Add assertions on training arguments instead of mutating them. Move comment out of public docs.

* Make separate test for early stopping callback. Add test of invalid arguments.

* Run make style... I remembered before CI this time!

* appease flake8

* Add EarlyStoppingCallback to callback docs

* Make EarlyStoppingCallback docstring match other callbacks.

* Fix typo in docs
2020-11-23 17:25:35 -05:00
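Usage of the callback above ends up looking roughly like this; `model`, `train_dataset` and `eval_dataset` are assumed placeholders, and the argument values are illustrative:

```python
from transformers import EarlyStoppingCallback, Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="out",
    evaluation_strategy="epoch",      # early stopping needs periodic evaluation
    load_best_model_at_end=True,      # required so the best metric is tracked
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
trainer = Trainer(
    model=model,                      # placeholder: any fine-tunable model
    args=args,
    train_dataset=train_dataset,      # placeholder datasets
    eval_dataset=eval_dataset,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```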
Stas Bekman
e84786aaa6
consistent ignore keys + make private (#8737)
* consistent ignore keys + make private

* style

* - authorized_missing_keys    => _keys_to_ignore_on_load_missing
  - authorized_unexpected_keys => _keys_to_ignore_on_load_unexpected

* move public doc of private attributes to private comment
2020-11-23 12:33:13 -08:00
alexorona
1cd9be2aeb
gpt2 and t5 parallel modeling (#8696)
* gpt2 and t5 parallel modeling

* model_parallel utils update

* adding missing model_parallel_utils

Adds missing model_parallel_utils and reverts the changes to code in modeling_gpt2 and modeling_t5

* training_args reformat

Reformatted training_args

* style formatting

Style formatting doc string length on training_args and model_parallel_utils

* style changes

make style && make quality for training_args and model_parallel_utils.

* adding tests

* minor change in trainer

reverts loss calculation

* Update training_args.py

* Update training_args.py

added back docstring language for adam_beta1 and adam_beta2

* Update trainer.py

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix style & rebase

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
2020-11-23 14:41:23 -05:00
Julien Chaumond
0cc5ab1333
Improve bert-japanese tokenizer handling (#8659)
* Make ci fail

* Try to make tests actually run?

* CI finally failing?

* Fix CI

* Revert "Fix CI"

This reverts commit ca7923be73.

* Ooops wrong one

* one more try

* Ok ok let's move this elsewhere

* Alternative to globals() (#8667)

* Alternative to globals()

* Error is raised later so return None

* Sentencepiece not installed makes some tokenizers None

* Apply Lysandre wisdom

* Slightly clearer comment?

cc @sgugger

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-11-23 11:15:02 -05:00
Yossi Synett
18c8cf000b
Fix bug in x-attentions output for roberta and harden test to catch it (#8660) 2020-11-23 13:28:29 +01:00
Patrick von Platen
9c0afdaf7b
fix flaky ci (#8694) 2020-11-20 22:07:21 +01:00
Sylvain Gugger
6494910f27
Add sentencepiece to the CI and fix tests (#8672)
* Fix the CI and tests

* Fix quality

* Remove that m from nowhere
2020-11-19 16:44:20 -05:00
Stas Bekman
42111f1d56
[tokenizers] convert_to_tensors: don't reconvert when the type is already right (#8283)
* don't reconvert when the type is already right

* better name

* adjust logic as suggested

* merge
2020-11-19 12:06:01 -08:00
Zhylko Dima
ca0109bd68
disable_ngram_loss fix for prophetnet (#8554)
* `disable_ngram_loss` fix for prophetnet

* add changes documentation

* fix _compute_loss to use mean reduction and -100 for masked tokens & remove unnecessary arguments

* mean label smoothing loss

* small refactor

* fix test

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
2020-11-19 19:18:07 +01:00
Sylvain Gugger
4208f496ee
Better filtering of the model outputs in Trainer (#8633)
* Better filtering of the model outputs in Trainer

* Fix examples tests

* Add test for Lysandre
2020-11-19 10:43:15 -05:00
Lysandre Debut
f2e07e7272
Fix a bunch of slow tests (#8634)
* CI should install `sentencepiece`

* Requiring TF

* Fixing some TFDPR bugs

* remove return_dict=False/True hack

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
2020-11-19 10:41:41 -05:00
elk-cloner
5362bb8a6b
Tf longformer for sequence classification (#8231)
* working on LongformerForSequenceClassification

* add TFLongformerForMultipleChoice

* add TFLongformerForTokenClassification

* use add_start_docstrings_to_model_forward

* test TFLongformerForSequenceClassification

* test TFLongformerForMultipleChoice

* test TFLongformerForTokenClassification

* remove test from repo

* add test and doc for TFLongformerForSequenceClassification, TFLongformerForTokenClassification, TFLongformerForMultipleChoice

* add requested classes to modeling_tf_auto.py
update dummy_tf_objects
fix tests
fix bugs in requested classes

* pass all tests except test_inputs_embeds

* sync with master

* pass all tests except test_inputs_embeds

* pass all tests

* pass all tests

* work on test_inputs_embeds

* fix style and quality

* make multi choice work

* fix TFLongformerForTokenClassification signature

* fix TFLongformerForMultipleChoice, TFLongformerForSequenceClassification signature

* fix mult choice

* fix mc hint

* fix input embeds

* fix input embeds

* refactor input embeds

* fix copy issue

* apply sylvains changes and clean more

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-11-19 10:37:27 -05:00
Sylvain Gugger
1e62e999e8
Fixes the training resuming with gradient accumulation (#8624) 2020-11-18 12:00:11 -05:00
Nicola De Cao
2f9d49b389
Adding PrefixConstrainedLogitsProcessor (#8529)
* Adding PrefixConstrainedLogitsProcessor

* fixing RAG and style_doc

* fixing black (v20 instead of v19)

* Improving doc in generation_logits_process.py

* Improving docs and typing in generation_utils.py

* docs improvement

* adding test and fixing doc typo

* fixing doc_len

* isort on test

* fixed test

* improve docstring a bit

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-11-18 17:06:25 +01:00
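The user-facing hook the processor above backs is the `prefix_allowed_tokens_fn` argument of `generate()`; a toy sketch (the constraint and checkpoint are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

allowed = tokenizer(" world", add_special_tokens=False).input_ids

def prefix_allowed_tokens_fn(batch_id, input_ids):
    # Called at every generation step; returns the token ids allowed next.
    return allowed

inputs = tokenizer("Hello", return_tensors="pt")
out = model.generate(**inputs, prefix_allowed_tokens_fn=prefix_allowed_tokens_fn, max_length=6)
print(tokenizer.decode(out[0]))
```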
Sylvain Gugger
dd52804f5f
Remove deprecated (#8604)
* Remove old deprecated arguments

Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>

* Remove needless imports

* Fix tests

Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
2020-11-17 15:11:29 -05:00
Lysandre Debut
3095ee9dab
Tokenizers should be framework agnostic (#8599)
* Tokenizers should be framework agnostic

* Run the slow tests

* Not testing

* Fix documentation

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-11-17 14:03:03 -05:00
Patrick von Platen
86822a358b
T5 & mT5 (#8552)
* add mt5 and t5v1_1 model

* fix tests

* correct some imports

* add tf model

* finish tf t5

* improve examples

* fix copies

* clean doc
2020-11-17 12:23:09 +01:00
Sylvain Gugger
c89bdfbe72
Reorganize repo (#8580)
* Put models in subfolders

* Styling

* Fix imports in tests

* More fixes in test imports

* Sneaky hidden imports

* Fix imports in doc files

* More sneaky imports

* Finish fixing tests

* Fix examples

* Fix path for copies

* More fixes for examples

* Fix dummy files

* More fixes for example

* More model import fixes

* Is this why you're unhappy GitHub?

* Fix imports in convert command
2020-11-16 21:43:42 -05:00
Sylvain Gugger
1073a2bde5
Switch return_dict to True by default. (#8530)
* Use the CI to identify failing tests

* Remove from all examples and tests

* More default switch

* Fixes

* More test fixes

* More fixes

* Last fixes hopefully

* Use the CI to identify failing tests

* Remove from all examples and tests

* More default switch

* Fixes

* More test fixes

* More fixes

* Last fixes hopefully

* Run on the real suite

* Fix slow tests
2020-11-16 11:43:00 -05:00
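For reference, the behavior change above in a nutshell (model choice is illustrative):

```python
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
inputs = tokenizer("Hello", return_tensors="pt")

# return_dict=True is now the default: outputs are ModelOutput objects
# with named attributes ...
outputs = model(**inputs)
hidden = outputs.last_hidden_state

# ... while the old tuple behavior remains available on request.
hidden_tuple = model(**inputs, return_dict=False)[0]
```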
LSinev
afb50c663a
Fix GPT2DoubleHeadsModel to work with model.generate() (#6601)
* Fix passing token_type_ids during GPT2DoubleHeadsModel.generate() if used

and for GPT2LMHeadModel too

* Update tests to check token_type_ids usage in GPT2 models
2020-11-16 14:35:44 +01:00
Yusuke Mori
04d8136bde
Adding the prepare_seq2seq_batch function to ProphetNet (#8515)
* Simply insert T5Tokenizer's prepare_seq2seq_batch

* Update/Add some 'import'

* fix RuntimeError caused by '.view'

* Moves .view related error avoidance from seq2seq_trainer to inside prophetnet

* Update test_tokenization_prophetnet.py

* Format the test code with black

* Re-format the test code

* Update test_tokenization_prophetnet.py

* Add importing require_torch in the test code

* Add importing BatchEncoding in the test code

* Re-format the test code on Colab
2020-11-16 14:18:25 +01:00
Thomas Wolf
f4e04cd2c6
[breaking|pipelines|tokenizers] Adding slow-fast tokenizers equivalence tests pipelines - Removing sentencepiece as a required dependency (#8073)
* Fixing roberta for slow-fast tests

* WIP getting equivalence on pipelines

* slow-to-fast equivalence - working on question-answering pipeline

* optional FAISS tests

* Pipeline Q&A

* Move pipeline tests to their own test job again

* update tokenizer to add sequence id methods

* update to tokenizers 0.9.4

* set sentencepiece as optional

* clean up squad

* clean up pipelines to use sequence_ids

* style/quality

* wording

* Switch to use_fast = True by default

* update tests for use_fast at True by default

* fix rag tokenizer test

* removing protobuf from required dependencies

* fix NER test for use_fast = True by default

* fixing example tests (Q&A examples use slow tokenizers for now)

* protobuf in main deps extras["sentencepiece"] and example deps

* fix protobuf install test

* try to fix seq2seq by switching to slow tokenizers for now

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-11-15 22:50:59 +01:00
Julien Plu
24184e73c4
Rework some TF tests (#8492)
* Update some tests

* Small update

* Apply style

* Use max_position_embeddings

* Create a fake attribute

* Create a fake attribute

* Update wrong name

* Wrong TransfoXL model file

* Keep the common tests agnostic
2020-11-13 17:07:17 -05:00
Patrick von Platen
42e2d02e44
[T5] Bug correction & Refactor (#8518)
* fix bug

* T5 refactor

* refactor tf

* apply sylvains suggestions
2020-11-13 16:57:31 +01:00
Julien Plu
5d80539488
Add pretraining loss computation for TF Bert pretraining (#8470)
* Add pretraining loss computation for TF Bert pretraining

* Fix labels creation

* Fix T5 model

* restore T5 kwargs

* try a generic fix for pretraining models

* Apply style

* Override the prepare method for the BERT tests
2020-11-12 14:08:26 -05:00
Lysandre
c7b6bbec5c Skip test until investigation 2020-11-11 12:59:40 -05:00
Ratthachat (Jung)
026a2ff225
Add TFDPR (#8203)
* Create modeling_tf_dpr.py

* Add TFDPR

* Add back TFPegasus, TFMarian, TFMBart, TFBlenderBot

the last commit accidentally deleted these 4 lines, so I restored them

* Add TFDPR

* Add TFDPR

* clean up some comments, add TF input-style doc string

* Add TFDPR

* Make return_dict=False the default

* Fix return_dict bug (in .from_pretrained)

* Add get_input_embeddings()

* Create test_modeling_tf_dpr.py

The current version already passes all 27 tests!
Please see the test run at:
https://colab.research.google.com/drive/1czS_m9zy5k-iSJbzA_DP1k1xAAC_sdkf?usp=sharing

* fix quality

* delete init weights

* run fix copies

* fix repo consis

* del config_class, load_tf_weights

They should be 'pytorch only'

* add config_class back

after removing it, a test failed ... so in the end only removing "use_tf_weights = None", on Lysandre's suggestion

* newline after .. note::

* import tf, np (Necessary for ModelIntegrationTest)

* slow_test from_pretrained with from_pt=True

At the moment we don't have TF weights (since we don't have an official TF model).
Previously, I did not run the slow tests, so I missed this bug

* Add simple TFDPRModelIntegrationTest

Note that this is just a test that TF and Pytorch give approx. the same output.
However, I could not test with the official DPR repo's output yet

* upload correct tf model

* remove position_ids as missing keys

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: patrickvonplaten <patrick@huggingface.co>
2020-11-11 12:28:09 -05:00
Julien Plu
da842e4e72
Add next sentence prediction loss computation (#8462)
* Add next sentence prediction loss computation

* Apply style

* Fix tests

* Add forgotten import

* Add forgotten import

* Use a new parameter

* Remove kwargs and use positional arguments
2020-11-11 15:02:06 +01:00
Patrick von Platen
70708cca1a
fix t5 token type ids (#8437) 2020-11-10 14:21:54 -05:00
Lysandre Debut
9fd1f56236
[No merge] TF integration testing (#7621)
* stash

* TF Integration testing for ELECTRA, BERT, Longformer

* Trigger slow tests

* Apply suggestions from code review
2020-11-10 14:02:33 -05:00
Stas Bekman
02bdfc0251
using multi_gpu consistently (#8446)
* s|multiple_gpu|multi_gpu|g; s|multigpu|multi_gpu|g'

* doc
2020-11-10 13:23:58 -05:00
Patrick von Platen
b93569457f
fix t5 special tokens (#8435) 2020-11-10 18:54:17 +01:00
Sam Shleifer
c314b1fd3b
[docs] improve bart/marian/mBART/pegasus docs (#8421) 2020-11-10 10:18:34 -05:00
Lysandre Debut
850afb422d
Patch token classification pipeline (#8364)
* Patch token classification pipeline

* Some added tests for TokenClassificationArgumentHandler (#8366)

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2020-11-10 07:29:34 -05:00
Julien Chaumond
70f622fab4
Model versioning (#8324)
* fix typo

* rm use_cdn & references, and implement new hf_bucket_url

* I'm pretty sure we don't need to `read` this file

* same here

* [BIG] file_utils.networking: do not gobble up errors anymore

* Fix CI 😇

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Tiny doc tweak

* Add doc + pass kwarg everywhere

* Add more tests and explain

cc @sshleifer let me know if better

Co-Authored-By: Sam Shleifer <sshleifer@gmail.com>

* Also implement revision in pipelines

In the case where we're passing a task name or a string model identifier

* Fix CI 😇

* Fix CI

* [hf_api] new methods + command line implem

* make style

* Final endpoints post-migration

* Fix post-migration

* Py3.6 compat

cc @stefan-it

Thank you @stas00

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-11-10 07:11:02 -05:00
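The headline user-facing change above is the new `revision` kwarg; a short sketch (the revision values here are illustrative):

```python
from transformers import AutoModel, pipeline

# Pin a checkpoint to a branch, tag, or commit sha on the model hub.
model = AutoModel.from_pretrained("bert-base-uncased", revision="main")

# As noted above, `revision` is also plumbed through pipelines.
fill = pipeline("fill-mask", model="bert-base-uncased", revision="main")
```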
Patrick von Platen
9c83b96e62
[Tests] Add Common Test for Training + Fix a couple of bugs (#8415)
* add training tests

* correct longformer

* fix docs

* fix some tests

* fix some more train tests

* remove ipdb

* fix multiple edge case model training

* fix funnel and prophetnet

* clean gpt models

* undo renaming of albert
2020-11-09 18:24:41 +01:00
Sylvain Gugger
908a28894c
Add new token classification example (#8340)
* Add new token classification example

* Remove txt file

* Add test

* With actual testing done

* Less warmup is better

* Update examples/token-classification/run_ner_new.py

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Address review comments

* Fix test

* Make Lysandre happy

* Last touches and rename

* Rename in tests

* Address review comments

* More run_ner -> run_ner_old

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-11-09 11:39:55 -05:00
Stas Bekman
78d706f3ae
[fsmt tokenizer] support lowercase tokenizer (#8389)
* support lowercase tokenizer

* fix arg pos
2020-11-09 10:41:39 -05:00
Yossi Synett
bc0d26d1de
[All Seq2Seq model + CLM models that can be used with EncoderDecoder] Add cross-attention weights to outputs (#8071)
* Output cross-attention with decoder attention output

* Update src/transformers/modeling_bert.py

* add cross-attention for t5 and bart as well

* fix tests

* correct typo in docs

* add sylvains and sams comments

* correct typo

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-11-06 19:34:48 +01:00
Sylvain Gugger
04e442d575
Make Trainer evaluation handle dynamic seq_length (#8336)
* Make Trainer evaluation handle dynamic seq_length

* Document behavior.

* Fix test

* Better fix

* Fixes for realsies this time

* Address review comments

* Without forgetting to save...
2020-11-05 15:13:51 -05:00
Guillaume Filion
27b402cab0
Output global_attentions in Longformer models (#7562)
* Output global_attentions in Longformer models

* make style

* small refactoring

* fix tests

* make fix-copies

* add for tf as well

* remove comments in test

* make fix-copies

* make style

* add docs

* make docstring pretty

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
2020-11-05 21:10:43 +01:00
Sylvain Gugger
9c4aa4ac1a
Clean up data collators and datasets (#8308)
* Clean up data collators and datasets

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Remove needless clone

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-11-04 17:24:49 -05:00
Nicolas Patry
7342d9a583
Improve QA pipeline error handling (#8286)
- The issue is that with previous code we would have the following:

```python
qa_pipeline = (...)
qa_pipeline(question="Where was he born ?", context="")
-> IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
```

The goal here is to improve this to actually return a ValueError
wherever possible.

While at it, I tried to simplify QuestionArgumentHandler's code to
make it smaller and more compact while keeping backward compat.
2020-11-04 11:30:42 -05:00
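After this change, the failure mode sketched above becomes catchable; a minimal sketch assuming the default QA checkpoint:

```python
from transformers import pipeline

qa_pipeline = pipeline("question-answering")
try:
    qa_pipeline(question="Where was he born ?", context="")
except ValueError as err:
    # Empty inputs now fail fast with a readable error instead of a deep
    # IndexError from the model internals.
    print(err)
```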
Patrick von Platen
cb966e640b
[Generate Test] fix greedy generate test (#8293)
* fix greedy generate test

* delete ipdb
2020-11-04 15:44:36 +01:00
Ceyda Cinarel
29b536a73a
[WIP] Ner pipeline grouped_entities fixes (#5970)
* Bug fix: NER pipeline shouldn't group separate entities of same type

* style fix

* [Bug Fix] Shouldn't group entities that are both 'B' even if they are the same type
	(B-type1 B-type1) != (B-type1 I-type1)
[Bug Fix] Add an option `ignore_subwords` to ignore subsequent ##wordpieces in predictions, because some models train on only the first token of a word and not on the subsequent wordpieces (the BERT NER default), so it makes sense to do the same thing at inference time.
	The simplest fix is to just group the subwords with the first wordpiece.
	[TODO] How to handle ignored scores? Just set them to 0 and calculate a zero-invariant mean?
	[TODO] Handle a different wordpiece_prefix ## ? Possible approaches:
		get it from the tokenizer? but currently most tokenizers don't have a wordpiece_prefix property
		have an _is_subword(token)
[Feature add] Added an option to `skip_special_tokens`, because it was harder to remove them after grouping.
[Additional changes] Remove the B/I prefix on returned grouped_entities
[Feature request/TODO] Return indexes?
[Bug TODO] Can't use a fast tokenizer with grouped_entities ('BertTokenizerFast' object has no attribute 'convert_tokens_to_string')

* use offset_mapping to fix [UNK] token problem

* ignore score for subwords

* modify ner_pipeline test

* modify ner_pipeline test

* modify ner_pipeline test

* ner_pipeline change ignore_subwords default to true

* add ner_pipeline ignore_subword=False test case

* fix offset_mapping index

* fix style again duh

* change is_subword and convert_tokens_to_string logic

* merge tests with new test structure

* change test names

* remove old tests

* ner tests for fast tokenizer

* fast tokenizers have convert_tokens_to_string

* Fix the incorrect merge

Co-authored-by: Ceyda Cinarel <snu-ceyda@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-11-03 17:21:04 -05:00
Stas Bekman
1bb4bba53c
[CIs] Better reports everywhere (#8275)
* make it possible to invoke conftest.py in both test suites without crashing on having the same option added

* perl -pi -e 's|--make_reports|--make-reports|' to be consistent with other opts

* add `pytest --make-reports` to all CIs (and artifacts)

* fix
2020-11-03 16:57:12 -05:00
Sylvain Gugger
7f556d2e39
Data collator for token classification (#8274)
* Add DataCollatorForTokenClassification and clean tests

* Make quality
2020-11-03 16:33:27 -05:00
Sylvain Gugger
4c19f3baab
Clean Trainer tests and datasets dep (#8268) 2020-11-03 15:50:55 -05:00
guillaume-be
74f6f91a9d
Updated ConversationalPipeline to work with encoder-decoder models (#8207)
* Updated ConversationalPipeline to work with encoder-decoder models (e.g. BlenderBot)

* Addition of integration test for EncoderDecoder conversation model

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-11-03 10:33:01 -05:00
Nicolas Patry
c66ffa3a17
[FIX] TextGenerationPipeline is currently broken. (#8256)
* [FIX] TextGenerationPipeline is currently broken.

It's most likely due to #8180.
What was missing is a multi- vs. single-string handler at the beginning of
the pipeline.
There was also no testing of this pipeline.

* Fixing Conversational tests too.
2020-11-03 10:10:22 -05:00
Patrick von Platen
a1bbcf3f6c
Refactoring the generate() function (#6949)
* first draft

* show design proposition for new generate method

* up

* make more readable

* make first version

* gpt2 tests pass

* make beam search for gpt2 work

* add first encoder-decoder code

* delete typo

* make t5 work

* save intermediate

* make bart work with beam search

* finish beam search bart / t5

* add default kwargs

* make more tests pass

* fix no bad words sampler

* some fixes and tests for all distribution processors

* fix test

* fix rag slow tests

* merge to master

* add nograd to generate

* make all slow tests pass

* speed up generate

* fix edge case bug

* small fix

* correct typo

* add type hints and docstrings

* fix typos in tests

* add beam search tests

* add tests for beam scorer

* fix test rag

* finish beam search tests

* move generation tests into a separate file

* fix generation tests

* more tests

* add aggressive generation tests

* fix tests

* add gpt2 sample test

* add more docstring

* add more docs

* finish doc strings

* apply some more of sylvains and sams comments

* fix some typos

* make fix copies

* apply lysandres and sylvains comments

* final corrections on examples

* small fix for reformer
2020-11-03 16:04:22 +01:00
Stas Bekman
504ff7bb12
2 SinusoidalPositionalEmbedding fixes (#8226) 2020-11-02 18:50:26 -05:00
Santiago Castro
0c92e7d9fa
Fix ignore list behavior in doctests (#8213) 2020-11-02 08:47:37 -05:00
Nicolas Patry
84caa23301
Fix the behaviour of DefaultArgumentHandler (removing it). (#8180)
* Some work to fix the behaviour of DefaultArgumentHandler by removing it.

* Fixing specific pipelines argument checking.
2020-11-02 12:33:50 +01:00
TFUsers
00112c3539
Replace swish with silu (#8166)
* Replace swish with silu

* revert nn.silu to nn.swish due to older version

* simplify optimized silu conditional and fix format

* Update activations.py

* Update activations_tf.py

* Update modeling_flax_utils.py

* Update modeling_openai.py

* add swish testcase

* add pytorch swish testcase

* Add more robust python version check

* more formatting fixes

Co-authored-by: TFUsers <TFUsers@gmail.com>
2020-10-30 15:09:10 -04:00
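For reference, silu and swish (with beta=1) are the same function; a minimal sketch of the equivalence and of the version-conditional fallback mentioned above:

```python
import torch

def silu(x):
    # SiLU, a.k.a. swish: x * sigmoid(x)
    return x * torch.sigmoid(x)

x = torch.randn(4)
# torch.nn.functional.silu only exists in newer PyTorch releases, hence the
# conditional use of the optimized native implementation when available.
if hasattr(torch.nn.functional, "silu"):
    assert torch.allclose(silu(x), torch.nn.functional.silu(x))
```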
Sam Shleifer
566b083eb1
TFMarian, TFMbart, TFPegasus, TFBlenderbot (#7987)
* Start plumbing

* Marian close

* Small stubs for all children

* Fixed bart

* marian working

* pegasus test is good, but failing

* Checkin tests

* More model files

* Subtle marian, pegasus integration test failures

* Works well

* rm print

* boom boom

* Still failing model2doc

* merge master

* Equivalence test failing, all others fixed

* cleanup

* Fix embed_scale

* Cleanup marian pipeline test

* Undo extra changes

* Smaller delta

* Cleanup model testers

* undo delta

* fix tests import structure

* cross test decorator

* Cleaner set_weights

* Respect authorized_unexpected_keys

* No warnings

* No warnings

* style

* Nest tf import

* black

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* functional dropout

* fixup

* Fixup

* style_doc

* embs

* shape list

* delete slow force_token_id_to_be_generated func

* fixup

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-10-30 11:23:16 -04:00
Lysandre Debut
10f8c63620
Ci test tf super slow (#8007)
* Test TF GPU CI

* Change cache

* Fix missing torch requirement

* Fix some model tests


Style

* LXMERT

* MobileBERT

* Longformer skip test

* XLNet

* The rest of the tests

* RAG goes OOM in multi gpu setup

* YAML test files

* Last fixes

* Skip doctests

* Fill mask tests

* Yaml files

* Last test fix

* Style

* Update cache

* Change ONNX tests to slow + use tiny model
2020-10-30 10:25:48 -04:00
Sylvain Gugger
acf56408d8
Smarter prediction loop and no- -> no_ in console args (#8151)
* Smarter prediction loop and no- -> no_ in console args

* Fix test
2020-10-29 10:56:25 -04:00
Santiago Castro
969859d5f6
Fix doc errors and typos across the board (#8139)
* Fix doc errors and typos across the board

* Fix a typo

* Fix the CI

* Fix more typos

* Fix CI

* More fixes

* Fix CI

* More fixes

* More fixes
2020-10-29 10:33:33 -04:00
Stas Bekman
5423f2a9d4
[testing] port test_trainer_distributed to distributed pytest + TestCasePlus enhancements (#8107)
* move the helper code into testing_utils

* port test_trainer_distributed to work with pytest

* improve docs

* simplify notes

* doc

* doc

* style

* doc

* further improvements

* torch might not be available

* real fix

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-28 11:51:32 -04:00
Joe Davison
3e58b6b7b8
infer entailment label id on zero shot pipeline (#8059)
* add entailment dim argument

* rename dim -> id

* fix last name change, style

* rm arg, auto-infer only

* typo

* rm superfluous import
2020-10-27 14:09:55 -04:00
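With the label id inferred from the model config as described above, usage stays as simple as this (the checkpoint is illustrative):

```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
result = classifier(
    "one day I will see the world",
    candidate_labels=["travel", "cooking", "dancing"],
)
# The entailment dimension is read from the model's id2label config;
# no manual entailment id argument is needed.
print(result["labels"][0], result["scores"][0])
```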
Harutaka Kawamura
7bff0af0a4
Fix a bug for CallbackHandler.callback_list (#8052)
* Fix callback_list

* Add test

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* Fix test

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
2020-10-27 10:37:04 -04:00
Stas Bekman
bfd5e370a7
[CI] generate separate report files as artifacts (#7995)
* better reports

* a whole bunch of reports in their own files

* clean up

* improvements

* github artifacts experiment

* style

* complete the report generator with multiple improvements/fixes

* fix

* save all reports under one dir to easy upload

* can remove temp failing tests

* doc fix

* some cleanup
2020-10-27 09:25:07 -04:00
Lysandre Debut
cbad90d86d
Fix + Test (#8049) 2020-10-26 12:32:27 -04:00
Sylvain Gugger
077478637d
Fix label name in DataCollatorForNextSentencePrediction test (#8048) 2020-10-26 09:23:12 -04:00
Sam Shleifer
8bbe8247f1
Cleanup pytorch tests (#8033) 2020-10-26 08:59:06 -04:00
Sam Shleifer
f20aec1de5
fsmt slow test uses lists (#8031) 2020-10-26 08:32:36 -04:00
Thomas Wolf
79eb391586
[tokenizers] Fixing #8001 - Adding tests on tokenizers serialization (#8006)
* fixing #8001

* make T5 tokenizer serialization more robust - style
2020-10-26 10:27:48 +01:00
Anthony MOI
5e323017a4
Fix BatchEncoding.word_to_tokens for removed tokens (#7939) 2020-10-23 10:29:37 -04:00
Patrick von Platen
4acfd1a8dc
[Reformer] remove reformer pad_token_id (#7991)
* remove reformer pad_token_id

* fix pegasus
2020-10-23 10:29:15 -04:00
Thomas Wolf
3a40cdf58d
[tests|tokenizers] Refactoring pipelines test backbone - Small tokenizers improvements - General tests speedups (#7970)
* WIP refactoring pipeline tests - switching to fast tokenizers

* fix dialog pipeline and fill-mask

* refactoring pipeline tests backbone

* make large tests slow

* fix tests (tf Bart inactive for now)

* fix doc...

* clean up for merge

* fixing tests - remove bart from summarization until there is TF

* fix quality and RAG

* Add new translation pipeline tests - fix JAX tests

* only slow for dialog

* Fixing the missing TF-BART imports in modeling_tf_auto

* spin out pipeline tests in separate CI job

* adding pipeline test to CI YAML

* add slow pipeline tests

* speed up tf and pt joint test to avoid redoing all the standalone pt and tf tests

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Update src/transformers/pipelines.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/pipelines.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/testing_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* add require_torch and require_tf in is_pt_tf_cross_test

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-10-23 15:58:19 +02:00
Sylvain Gugger
06fc3954a1
Only log total_flos at the end of training (#7981)
* Only log total_flos at the end of training

* Fix test
2020-10-22 14:26:55 -04:00
Julien Chaumond
ff65beafa3
FillMaskPipeline: support passing top_k on __call__ (#7971)
* FillMaskPipeline: support passing top_k on __call__

Also move from topk to top_k

* migrate to new param name in tests

* Review from @sgugger
2020-10-22 12:54:25 -04:00
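A quick sketch of the new per-call argument described above (the checkpoint is illustrative):

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
# top_k can now be passed per call instead of only at construction time.
for prediction in fill_mask("Paris is the [MASK] of France.", top_k=3):
    print(prediction["token_str"], prediction["score"])
```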
Sylvain Gugger
2e5052d4f1
New run glue script (#7917)
* Start simplification

* More progress

* Finished script

* Address comments and update tests instructions

* Wrong test

* Accept files as inputs and fix test

* Update src/transformers/trainer_utils.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Fix labels and add combined score

* Add special labels

* Update TPU command

* Revert to old label strategy

* Use model labels

* Fix for STS-B

* Styling

* Apply suggestions from code review

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Code styling

* Fix review comments

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-10-22 11:42:22 -04:00
Nicolas Patry
18ce6b8ff3
Fixing the "translation", "translation_XX_to_YY" pipelines. (#7975)
* Actually make the "translation", "translation_XX_to_YY" task behave correctly.

Background:
- Currently "translation_cn_to_ar" does not work. (only 3 pairs are
supported)
- Some models contain in their config the correct values for the (src,
tgt) pair they can translate. It's usually just one pair, and we can
infer it automatically from the `model.config.task_specific_params`. If
it's not defined we can still probably load the TranslationPipeline
nevertheless.

Proposed fix:
- A simplified version of what could become a more general notion of
a `parametrized` task. "translation" + (src, tgt) in this instance
is what we need in the general case. The way we go about it for now
is simply parsing "translation_XX_to_YY". If more cases of parametrized
tasks arise, we should preferably move toward something closer to what
`datasets` proposes, which is having a secondary argument `task_options`
that captures what that task requires.
- Should be backward compatible in all cases; for instance
`pipeline(task="translation_en_to_de")` should work out of the box.
- Should provide a warning when a specific translation pair has been
selected on behalf of the user using
`model.config.task_specific_params`.

* Update src/transformers/pipelines.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-10-22 17:16:21 +02:00
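Concretely, the supported form after the fix above, where the (src, tgt) pair is parsed from the task name (or inferred from `model.config.task_specific_params` when only a model is given):

```python
from transformers import pipeline

translator = pipeline("translation_en_to_de")
print(translator("Hugging Face is based in New York.")[0]["translation_text"])
```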
Patrick von Platen
f34372a9ff
[PretrainedConfig] Fix save pretrained config for edge case (#7943)
* fix config save

* add test

* add config class variable and another test

* line break

* fix fsmt and typo

* god am I making many errors today :-/

* Update src/transformers/configuration_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-22 15:39:01 +02:00
Stas Bekman
64b4d25cf3
[fsmt test] basic config test with online model + super tiny model (#7860)
* basic config test with online model

* typo

* style

* better test
2020-10-22 09:14:54 -04:00
Stas Bekman
8348105692
[testing] slow tests should be marked as slow (#7895)
* slow tests should be slow

* exception note

* style

* integrate LysandreJik's notes with some expansions

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* another slow test

* fix link, and prose

* clarify.

* note from Sam

* typo

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-22 06:34:05 -04:00
Patrick von Platen
52decab371
fix test (#7947) 2020-10-21 19:06:23 +02:00
François Lagunas
e174bfeb34
TensorBoard/Wandb/optuna/raytune integration improvements. (#7935)
Improved TensorBoard and Wandb integration, as well as optuna and ray/tune support, with minor modifications to trainer core code.
2020-10-21 17:18:52 +02:00
Stas Bekman
57516c0cc8
[multiple models] skip saving/loading deterministic state_dict keys (#7878)
* make the save_load special key tests common

* handle mbart

* cleaner solution

* fix

* move test_save_load_missing_keys back into fsmt for now

* restore

* style

* add marian

* add pegasus

* blenderbot

* revert - no static embed
2020-10-21 08:06:07 -04:00
Sam Shleifer
829842159e
Add TFBartForConditionalGeneration (#5411)
* half done

* doc improvement

* Cp test file

* broken

* broken test

* undo some mess

* ckpt

* borked

* Halfway

* 6 passing

* boom boom

* Much progress but still 6

* boom boom

* merged master

* 10 passing

* boom boom

* Style

* no t5 changes

* 13 passing

* Integration test failing, but not gibberish

* Frustrated

* Merged master

* 4 fail

* 4 fail

* fix return_dict

* boom boom

* Still only 4

* prepare method

* prepare method

* before delete classif

* Skip tests to avoid adding boilerplate

* boom boom

* fast tests passing

* style

* boom boom

* Switch to supporting many input types

* remove FIXMENORM

* working

* Fixed past_key_values/decoder_cached_states confusion

* new broken test

* Fix attention mask kwarg name

* undo accidental

* Style and reviewers

* style

* Docs and common tests

* Cleaner assert messages

* copy docs

* style issues

* Sphinx fix

* Simplify caching logic

* test does not require torch

* copy _NoLayerEmbedTokens

* Update src/transformers/modeling_tf_bart.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update tests/test_modeling_tf_bart.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/modeling_tf_bart.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/modeling_tf_bart.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/modeling_tf_bart.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Line length and don't document None

* Add pipeline test coverage

* assert msg

* At parity

* Assert messages

* mark slow

* Update compile test

* back in init

* Merge master

* Fix tests

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-10-21 13:10:16 +02:00
Patrick von Platen
29792864cb
[ProphetNet] Add Question Generation Model + Test (#7942)
* new prophetnet model

* correct name

* make style
2020-10-21 11:49:58 +02:00
Stas Bekman
3e31e7f956
[testing] rename skip targets + docs (#7863)
* rename skip targets + docs

* fix quotes

* style

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* small improvements

* fix

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-20 04:39:13 -04:00
Quentin Lhoest
033f29c625
Allow Custom Dataset in RAG Retriever (#7763)
* add CustomHFIndex

* typo in config

* update tests

* add custom dataset example

* clean script

* update test data

* minor in test

* docs

* docs

* style

* fix imports

* allow to pass the indexed dataset directly

* update tests

* use multiset DPR

* address thom and patrick's comments

* style

* update dpr tokenizer

* add output_dir flag in use_own_knowledge_dataset.py

* allow custom datasets in examples/rag/finetune.py

* add test for custom dataset in distributed rag retriever
2020-10-19 19:42:45 +02:00
Julien Rossi
a09fe140c1
Trainer with Iterable Dataset (#7858)
* fix 5990

* accommodate iterable dataset without predefined length
* set it as 1 use case: provide max_steps, and NO num_epochs
* Is a merge of master and PR 5995

* fix trainer test under TF

* fix only for torch
* TF trainer untouched
* trainer tests are skipped when no torch

* address comments

* fix quality checks

* remove torch.dataset from test_trainer

* unnecessary inheritance
* RegressionDataset implements all needed methods __len__ and __getitem__

* fix quality checks

* restore RegressionDataset

* was wrongly under is_torch_available()
2020-10-19 11:57:39 -04:00
Weizhen
2422cda01b
ProphetNet (#7157)
* add new model prophetnet

prophetnet modified

modify codes as suggested v1

add prophetnet test files

* still bugs, because of changed output formats of encoder and decoder

* move prophetnet into the latest version

* clean integration tests

* clean tokenizers

* add xlm config to init

* correct typo in init

* further refactoring

* continue refactor

* save parallel

* add decoder_attention_mask

* fix use_cache vs. past_key_values

* fix common tests

* change decoder output logits

* fix xlm tests

* make common tests pass

* change model architecture

* add tokenizer tests

* finalize model structure

* no weight mapping

* correct n-gram stream attention mask as discussed with qweizhen

* remove unused import

* fix index.rst

* fix tests

* delete unnecessary code

* add fast integration test

* rename weights

* final weight remapping

* save intermediate

* Descriptions for Prophetnet Config File

* finish all models

* finish new model outputs

* delete unnecessary files

* refactor encoder layer

* add dummy docs

* code quality

* fix tests

* add model pages to doctree

* further refactor

* more refactor, more tests

* finish code refactor and tests

* remove unnecessary files

* further clean up

* add docstring template

* finish tokenizer doc

* finish prophetnet

* fix copies

* fix typos

* fix tf tests

* fix fp16

* fix tf test 2nd try

* fix code quality

* add test for each model

* merge new tests to branch

* Update model_cards/microsoft/prophetnet-large-uncased-cnndm/README.md

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Update model_cards/microsoft/prophetnet-large-uncased-cnndm/README.md

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Update src/transformers/modeling_prophetnet.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Update utils/check_repo.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* apply sams and sylvains comments

* make style

* remove unnecessary code

* Update README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/configuration_prophetnet.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* implement lysandres comments

* correct docs

* fix isort

* fix tokenizers

* fix copies

Co-authored-by: weizhen <weizhen@mail.ustc.edu.cn>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-10-19 17:36:09 +02:00
Funtowicz Morgan
8f8f8d99fc
Integrate Bert-like model on Flax runtime. (#3722)
* WIP flax bert

* Initial commit Bert Jax/Flax implementation.

* Embeddings working and equivalent to PyTorch.

* Move embeddings in its own module BertEmbeddings

* Added jax.jit annotation on forward call

* BertEncoder on par with PyTorch ! :D

* Add BertPooler on par with PyTorch !!

* Working Jax+Flax implementation of BertModel with < 1e-5 differences on the last layer.

* Fix pooled output to take only the first token of the sequence.

* Refactoring to use BertConfig from transformers.

* Renamed FXBertModel to FlaxBertModel

* Model is now initialized in FlaxBertModel constructor and reused.

* WIP JaxPreTrainedModel

* Cleaning up the code of FlaxBertModel

* Added ability to load Flax model saved through save_pretrained()

* Added ability to convert Pytorch Bert model to FlaxBert

* FlaxBert can now load every Pytorch Bert model with on-the-fly conversion

* Fix hardcoded shape values in conversion scripts.

* Improve the way we handle LayerNorm conversion from PyTorch to Flax.

* Added positional embeddings as parameter of BertModel with default to np.arange.

* Let's roll FlaxRoberta !

* Fix missing position_ids parameters on predict for Bert

* Flax backend now supports batched inputs

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Make it possible to load a msgpacked model when converting from pytorch, as a last resort.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Moved save_pretrained to Jax base class along with more constructor parameters.

* Use a specialized, model-dependent conversion function.

* Expose `is_flax_available` in file_utils.

* Added unittest for Flax models.

* Added run_tests_flax to the CI.

* Introduce FlaxAutoModel

* Added more unittests

* Flax model reference the _MODEL_ARCHIVE_MAP from PyTorch model.

* Addressing review comments.

* Expose seed in both Bert and Roberta

* Fix typo suggested by @stefan-it

Co-Authored-By: Stefan Schweter <stefan@schweter.it>

* Attempt to make style

* Attempt to make style in tests too

* Added jax & jaxlib to the flax optional dependencies.

* Attempt to fix flake8 warnings ...

* Redo black again and again

* When black and flake8 fight each other for a space ... 💥 💥 💥

* Try removing trailing comma to make both black and flake happy!

* Fix invalid is_<framework>_available call, thanks @LysandreJik 🎉

* Fix another invalid import in flax_roberta test

* Bump and pin flax release to 0.1.0.

* Make flake8 happy, remove unused jax import

* Change the type of the catch for msgpack.

* Remove unused import.

* Put seed as optional constructor parameter.

* trigger ci again

* Fix too many parameters in BertAttention.

* Formatting.

* Simplify Flax unittests to avoid machine crashes.

* Fix invalid number of arguments when raising issue for an unknown model.

* Address @bastings comment in PR, moving jax.jit decorated outside of __call__

* Fix incorrect path to require_flax/require_pytorch functions.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Attempt to make style.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Correct rebasing of circle-ci dependencies

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Fix import sorting.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Fix unused imports.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Again import sorting...

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Installing missing nlp dependency for flax unittests.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Fix loading of model for Flax implementations.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* jit the inner function call to make JAX-compatible

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Format !

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Flake one more time 🎶

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Rewrites BERT in Flax to the new Linen API (#7211)

* Rewrite Flax HuggingFace PR to Linen

* Some fixes

* Fix tests

* Fix CI with change of name of nlp (#7054)

* nlp -> datasets

* More nlp -> datasets

* Woopsie

* More nlp -> datasets

* One last

* Expose `is_flax_available` in file_utils.

* Added run_tests_flax to the CI.

* Attempt to make style

* trigger ci again

* Fix import sorting.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Revert "Rewrites BERT in Flax to the new Linen API (#7211)"

This reverts commit 23703a5eb3364e26a1cbc3ee34b4710d86a674b0.

* Remove jnp.lax references

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make style.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Reintroduce Linen changes ...

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make style.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Use jax native's gelu function.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Renaming BertModel to BertModule to highlight the fact that this is the Flax Module object.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Rewrite FlaxAutoModel test to not rely on pretrained_model_archive_map

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Remove unused variable in BertModule.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Remove unused variable in BertModule again

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Attempt to have is_flax_available working again.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Introduce JAX TensorType

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
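
For reference, a small sketch of the new tensor type in use, assuming the released tokenizer API:

```python
from transformers import BertTokenizerFast, TensorType

tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")

# Both spellings should return jax.numpy arrays instead of Python lists.
enc = tokenizer("A single sequence", return_tensors=TensorType.JAX)
enc = tokenizer("A single sequence", return_tensors="jax")
```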

* Improve ImportError message when trying to convert to the various TensorType formats.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Makes Flax model jittable.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
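
A minimal sketch of what "jittable" buys here, assuming the released Flax API; wrapping the forward call in jax.jit compiles it once and reuses the compiled version for same-shaped inputs:

```python
import jax
from transformers import BertTokenizerFast, FlaxBertModel

tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")
model = FlaxBertModel.from_pretrained("bert-base-cased")

@jax.jit
def encode(input_ids, attention_mask):
    # The model call is pure enough to trace; parameters are captured.
    return model(input_ids, attention_mask=attention_mask).last_hidden_state

inputs = tokenizer("Compile once, reuse for equal shapes", return_tensors="jax")
hidden = encode(inputs["input_ids"], inputs["attention_mask"])
```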

* Ensure flax models are jittable in unittests.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Remove unused imports.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Ensure jax imports are guarded behind is_flax_available.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make style.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make style again

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make style again again

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make style again again again

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Update src/transformers/file_utils.py

Co-authored-by: Marc van Zee <marcvanzee@gmail.com>

* Bump flax to its latest version

Co-authored-by: Marc van Zee <marcvanzee@gmail.com>

* Bump jax version to at least 0.2.0

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Style.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Update the unittest to use TensorType.JAX

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* isort import in tests.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Match new flax parameters name "params"

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Remove unused imports.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Add flax models to transformers __init__

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Attempt to address all CI related comments.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Correct circle.yml indent.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Correct circle.yml indent (2)

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Remove coverage from flax tests

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Addressing many naming suggestions from comments

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Simplify for-loop logic to iterate over layers in FlaxBertLayerCollection

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* use f-string syntax for formatting logs.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Use config property from FlaxPreTrainedModel.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* use "cls_token" instead of "first_token" variable name.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* use "hidden_state" instead of "h" variable name.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Correct class reference in docstring to link to Flax related modules.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Added HF + Google Flax team copyright.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make Roberta independent from Bert

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Move activation functions to flax_utils.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Move activation functions to flax_utils for bert.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Added docstring for BERT

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Update import for Bert and Roberta tokenizers

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make style.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* fix-copies

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Correct FlaxRobertaLayer to match PyTorch.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Use the same store_artifact for flax unittest

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Style.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make sure gradients are disabled only locally for the Flax unittests that check torch equivalence.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Use relative imports

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

Co-authored-by: Stefan Schweter <stefan@schweter.it>
Co-authored-by: Marc van Zee <marcvanzee@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-19 09:55:41 -04:00
Lalit Pagaria
0193c8290d
[RAG] Propagating of n_docs as parameter to all RagModel's related functions (#7891)
* Propagate n_docs as a parameter to all RagModel-related functions, defaulting to self.config.n_docs
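
A hedged sketch of the resulting behaviour (checkpoint and helper kwargs follow the public RAG examples; the exact call sites touched by the PR may differ):

```python
from transformers import RagRetriever, RagSequenceForGeneration, RagTokenizer

tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
retriever = RagRetriever.from_pretrained(
    "facebook/rag-sequence-nq", index_name="exact", use_dummy_dataset=True
)
model = RagSequenceForGeneration.from_pretrained(
    "facebook/rag-sequence-nq", retriever=retriever
)

inputs = tokenizer("who holds the record in 100m freestyle", return_tensors="pt")
# n_docs can now be overridden per call; omitting it falls back to config.n_docs.
generated = model.generate(input_ids=inputs["input_ids"], n_docs=3)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```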

* Making n_docs parameter's default value to None in marginalize function

* Fixing code quality issues

* Handle the special case when the generator is an instance of T5PreTrainedModel, which does not have n_docs as a parameter

* T5PreTrainedModel does not have n_docs as a parameter

* Addressing review comment

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Correcting comment by addressing review comment

* Adding assert statement verifying that n_docs is correctly set. n_docs should be the same for both retriever and generator.

* Fixing flake8 reported issue

* Correcting test datasets for rag

* Use doc_scores instead of context_input_ids in the assert, since context_input_ids can be null in RagSequenceForGeneration

* doc_scores' second dimension holds the number of retrieved docs

* Changing assert comment

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-10-19 15:15:52 +02:00
Stas Bekman
4eb61f8e88
remove USE_CUDA (#7861) 2020-10-19 07:08:34 -04:00
Sam Shleifer
b86a71ea38
[tests] fix slow bart cnn test, faster marian tests (#7888) 2020-10-18 20:18:08 -04:00
Thomas Wolf
ba8c4d0ac0
[Dependencies|tokenizers] Make both SentencePiece and Tokenizers optional dependencies (#7659)
* splitting fast and slow tokenizers [WIP]

* [WIP] splitting sentencepiece and tokenizers dependencies

* update dummy objects

* add name_or_path to models and tokenizers

* prefix added to file names

* prefix

* styling + quality

* splitting all the tokenizer files - sorting sentencepiece-based ones

* update tokenizer version up to 0.9.0

* remove hard dependency on sentencepiece 🎉

* and removed hard dependency on tokenizers 🎉
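
The pattern this converges on, sketched under the assumption that the availability helpers live in file_utils as their names suggest:

```python
from transformers.file_utils import is_sentencepiece_available, is_tokenizers_available

# Neither backend is a hard dependency anymore; import guarded features only
# when the corresponding package is actually installed.
if is_sentencepiece_available():
    from transformers import T5Tokenizer        # slow, sentencepiece-backed
if is_tokenizers_available():
    from transformers import BertTokenizerFast  # fast, tokenizers-backed
```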

* update conversion script

* update missing models

* fixing tests

* move test_tokenization_fast to main tokenization tests - fix bugs

* bump up tokenizers

* fix bert_generation

* update and fix several tokenizers

* keep sentencepiece in deps for now

* fix funnel and deberta tests

* fix fsmt

* fix marian tests

* fix layoutlm

* fix squeezebert and gpt2

* fix T5 tokenization

* fix xlnet tests

* style

* fix mbart

* bump up tokenizers to 0.9.2

* fix model tests

* fix tf models

* fix seq2seq examples

* fix tests without sentencepiece

* fix slow => fast conversion without sentencepiece

* update auto and bert generation tests

* fix mbart tests

* fix auto and common test without tokenizers

* fix tests without tokenizers

* clean up and lighten tests when tokenizers + sentencepiece are both off

* style quality and tests fixing

* add sentencepiece to doc/examples reqs

* leave sentencepiece on for now

* style/quality; split HerBERT and fix Pegasus

* WIP Herbert fast

* add sample_text_no_unicode and fix HerBERT tokenization

* skip FSMT example test for now

* fix style

* fix fsmt in example tests

* update following Lysandre and Sylvain's comments

* Update src/transformers/testing_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/testing_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-18 20:51:24 +02:00
Stas Bekman
d8ca57d2ce
fix/hide warnings (#7837)
2020-10-16 03:19:51 -04:00
Sam Shleifer
96e47d9229
[cleanup] assign todos, faster bart-cnn test (#7835)
* 2 beam output

* unassign/remove TODOs

* remove one more
2020-10-16 03:11:18 -04:00
rmroczkowski
7b13bd01df
Herbert polish model (#7798)
* HerBERT transformer model for Polish language understanding.

* HerbertTokenizerFast generated with HerbertConverter

* Herbert base and large model cards

* Herbert model cards with tags

* Herbert tensorflow models

* Herbert model tests based on Bert test suit

* src/transformers/tokenization_herbert.py edited online with Bitbucket

* src/transformers/tokenization_herbert.py edited online with Bitbucket

* docs/source/model_doc/herbert.rst edited online with Bitbucket

* Herbert tokenizer tests and bug fixes

* src/transformers/configuration_herbert.py edited online with Bitbucket

* Copyrights and tests for TFHerbertModel

* model_cards/allegro/herbert-base-cased/README.md edited online with Bitbucket

* model_cards/allegro/herbert-large-cased/README.md edited online with Bitbucket

* Bug fixes after testing

* Reformat modified_only_fixup

* Proper order of configuration

* Herbert proper documentation formatting

* Formatting with make modified_only_fixup

* Dummies fixed

* Adding missing models to documentation

* Removing HerBERT model as it is a simple extension of BERT

* Update model_cards/allegro/herbert-base-cased/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Update model_cards/allegro/herbert-large-cased/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* HerbertTokenizer deprecated configuration removed

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-10-16 03:06:51 -04:00
Lysandre Debut
52c9e84285
Fix DeBERTa integration tests (#7729) 2020-10-16 02:49:13 -04:00
Nicolas Patry
0911b6bd86
Improving Pipelines by defaulting to framework='tf' when pytorch seems unavailable. (#7728)
* Improving Pipelines by defaulting to framework='tf' when PyTorch seems unavailable.

* Actually changing the default resolution order to account for model defaults.

Adding a new test for each pipeline to check that pipeline(task) also works without manually specifying the framework.
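
In practice (a sketch; the task name is one of the standard pipeline tasks):

```python
from transformers import pipeline

# No framework argument: the factory now resolves it from what is installed
# and from the default model's available weights.
nlp = pipeline("sentiment-analysis")
print(nlp("Pipelines can pick the framework themselves."))
```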
2020-10-15 09:42:07 +02:00
Sylvain Gugger
a1d1b332d0
Add predict step accumulation (#7767)
* Add eval_accumulation_steps and clean distributed eval
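
A short sketch of the new argument (values illustrative):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    per_device_eval_batch_size=8,
    # Move accumulated predictions to the CPU every 20 eval steps instead of
    # keeping everything on device, bounding GPU/TPU memory during predict().
    eval_accumulation_steps=20,
)
```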

* Add TPU test

* Add TPU stuff

* Fix arg name

* Fix Seq2SeqTrainer

* Fix total_size

* Update src/transformers/trainer_pt_utils.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Doc and add test to TPU

* Add unit test

* Adapt name

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-10-14 11:41:45 -04:00
Jonathan Chang
121dd4332b
Add batch inferencing support for GPT2LMHeadModel (#7552)
* Add support for gpt2 batch inferencing
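
A minimal sketch of batched GPT-2 generation as enabled here; the key details are left padding and an explicit attention mask (checkpoint illustrative):

```python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.padding_side = "left"            # generation must continue the real text
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token

model = GPT2LMHeadModel.from_pretrained("gpt2", pad_token_id=tokenizer.eos_token_id)

batch = tokenizer(["Hello, my dog", "Short"], return_tensors="pt", padding=True)
out = model.generate(
    batch["input_ids"], attention_mask=batch["attention_mask"], max_length=20
)
print(tokenizer.batch_decode(out, skip_special_tokens=True))
```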

* add test

* remove typo

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
2020-10-14 13:40:24 +02:00
Sylvain Gugger
7968051aba
Fix typo 2020-10-13 17:30:46 -04:00
Sam Shleifer
2977bd528f
Faster pegasus tokenization test with reduced data size (#7762) 2020-10-13 16:22:29 -04:00
Patrick von Platen
82b09a8481
[Rag] Fix loading of pretrained Rag Tokenizer (#7756)
* fix rag

* Update tokenizer save_pretrained

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-10-13 14:34:22 +02:00
Felipe Curti
dcba9ee03b
Gpt1 for sequence classification (#7683)
* Add Documentation for GPT-1 Classification

* Add GPT-1 with Classification head

* Add tests for GPT-1 Classification

* Add GPT-1 For Classification to auto models

* Remove authorized missing keys, change checkpoint to openai-gpt
2020-10-13 05:06:15 -04:00
Sylvain Gugger
c6e18de9f8
Fix flaky test in test_trainer (#7689) 2020-10-09 20:01:15 -04:00
Stas Bekman
b0f05e0c4c
[pegasus] Faster tokenizer tests (#7672) 2020-10-09 11:10:32 -04:00
Funtowicz Morgan
21ed3a6b99
Reintroduce clean_text on BertTokenizer call which was removed by mistake in #4723 (#5749)
* Reintroduce clean_text call which was removed by mistake in #4723

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
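
A hedged sketch of the parameter in question; the `clean_text` kwarg here follows the fast Bert tokenizer signature of that era:

```python
from transformers import BertTokenizerFast

# clean_text=True (the default) strips control characters and normalizes
# whitespace before tokenization.
tok = BertTokenizerFast.from_pretrained("bert-base-uncased", clean_text=True)
print(tok.tokenize("hello\x00world"))  # the NUL byte is removed during cleaning
```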

* Added unittest for clean_text parameter on Bert tokenizer.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Better unittest name.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Adapt unittest to use untrained tokenizer.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Code quality + update test

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-10-09 08:07:28 -04:00
Thomas Wolf
9aeacb58ba
Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141)
* [WIP] SP tokenizers

* fixing tests for T5

* WIP tokenizers

* serialization

* update T5

* WIP T5 tokenization

* slow to fast conversion script

* Refactoring to move tokenizer implementations inside transformers

* Adding gpt - refactoring - quality

* WIP adding several tokenizers to the fast world

* WIP Roberta - moving implementations

* update to dev4; switch file loading to in-memory loading

* Updating and fixing

* advancing on the tokenizers - updating do_lower_case

* style and quality

* moving forward with tokenizers conversion and tests

* MBart, T5

* dumping the fast version of transformer XL

* Adding to autotokenizers + style/quality

* update init and space_between_special_tokens

* style and quality

* bump up tokenizers version

* add protobuf

* fix pickle Bert JP with Mecab

* fix newly added tokenizers

* style and quality

* fix bert japanese

* fix funnel

* limit tokenizer warning to one occurrence

* clean up file

* fix new tokenizers

* fast tokenizers deep tests

* WIP adding all the special fast tests on the new fast tokenizers

* quick fix

* adding more fast tokenizers in the fast tests

* all tokenizers in fast version tested

* Adding BertGenerationFast

* bump up setup.py for CI

* remove BertGenerationFast (too early)

* bump up tokenizers version

* Clean old docstrings

* Typo

* Update following Lysandre comments

Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
2020-10-08 11:32:16 +02:00
Sam Shleifer
e3e6517355
Fix 3 failing slow bart/blender tests (#7652) 2020-10-07 22:05:03 -04:00
Sam Shleifer
960faaaf28
Blenderbot (#7418)
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-07 19:09:23 -04:00
Sylvain Gugger
08ba4b4902
Trainer callbacks (#7596)
* Initial callback proposal

* Finish various callbacks

* Post-rebase conflicts

* Fix tests

* Don't use something that's not set

* Documentation

* Remove unwanted print.

* Document all models can work

* Add tests + small fixes

* Update docs/source/internal/trainer_utils.rst

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comments

* Fix TF tests

* Real fix this time

* This one should work

* Fix typo

* Really fix typo

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-10-07 10:50:21 -04:00
Lysandre Debut
5982431814
Add GPT2ForSequenceClassification based on DialogRPT (#7501)
* Add GPT2ForSequenceClassification based on DialogRPT
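
A minimal usage sketch (generic gpt2 checkpoint shown; DialogRPT checkpoints are the motivating use case):

```python
import torch
from transformers import GPT2ForSequenceClassification, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# GPT-2 classifies from the last non-padding token, so it needs pad_token_id.
model = GPT2ForSequenceClassification.from_pretrained(
    "gpt2", num_labels=2, pad_token_id=tokenizer.eos_token_id
)

inputs = tokenizer("How likely is a human to upvote this reply?", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
pred = logits.argmax(dim=-1)
```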

* Better documentation

* Code quality
2020-10-06 17:31:21 -04:00
Julien Plu
9cf7b23b9b
Custom TF weights loading (#7422)
* First try

* Fix TF utils

* Handle authorized unexpected keys when loading weights

* Add several more authorized unexpected keys

* Apply style

* Fix test

* Address Patrick's comments.

* Update src/transformers/modeling_tf_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply style

* Make return_dict the default behavior and display a warning message

* Revert

* Replace wrong keyword

* Revert code

* Add forgot key

* Fix bug in loading PT models from a TF one.

* Fix sort

* Add a test for custom load weights in BERT

* Apply style

* Remove unused import

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-05 09:58:45 -04:00
Sylvain Gugger
d3adb985d1
Expand test to locate flakiness (#7580) 2020-10-05 09:45:47 -04:00
Forrest Iandola
02ef825be2
SqueezeBERT architecture (#7083)
* configuration_squeezebert.py

thin wrapper around bert tokenizer

fix typos

wip sb model code

wip modeling_squeezebert.py. Next step is to get the multi-layer-output interface working

set up squeezebert to use BertModelOutput when returning results.

squeezebert documentation

formatting

allow head mask that is an array of [None, ..., None]

docs

docs cont'd

path to vocab

docs and pointers to cloud files (WIP)

line length and indentation

squeezebert model cards

formatting of model cards

untrack modeling_squeezebert_scratchpad.py

update aws paths to vocab and config files

get rid of stub of NSP code, and advise users to pretrain with mlm only

fix rebase issues

redo rebase of modeling_auto.py

fix issues with code formatting

more code format auto-fixes

move squeezebert before bert in tokenization_auto.py and modeling_auto.py because squeezebert inherits from bert

tests for squeezebert modeling and tokenization

fix typo

move squeezebert before bert in modeling_auto.py to fix inheritance problem

disable test_head_masking, since squeezebert doesn't yet implement head masking

fix issues exposed by the test_modeling_squeezebert.py

fix an issue exposed by test_tokenization_squeezebert.py

fix issue exposed by test_modeling_squeezebert.py

auto generated code style improvement

issue that we inherited from modeling_xxx.py: SqueezeBertForMaskedLM.forward() calls self.cls(), but there is no self.cls, and I think the goal was actually to call self.lm_head()

update copyright

resolve failing 'test_hidden_states_output' and remove unused encoder_hidden_states and encoder_attention_mask

docs

add integration test. rename squeezebert-mnli --> squeezebert/squeezebert-mnli

autogenerated formatting tweaks

integrate feedback from patrickvonplaten and sgugger to programming style and documentation strings

* tiny change to order of imports
2020-10-05 04:25:43 -04:00
Sylvain Gugger
29baa8fabe
Clean the Trainer state (#7490)
* Trainer should not modify its TrainingArguments

* Trainer should not modify its TrainingArguments

* Trainer should not modify its TrainingArguments

* Add test of resumed training

* Fixes

* Non multiGPU test

* Clean Trainer state

* Add more to the state

* Documentation

* One last test

* Make resume training test more complete

* Unwanted changes
2020-10-01 13:07:04 -04:00
Patrick von Platen
62f5ae68ec
[Seq2Seq] Fix a couple of bugs and clean examples (#7474)
* clean T5

* fix t5 tests

* fix index typo

* fix tf common test

* fix examples

* change positional ordering for Bart and FSMT

* add signature test

* clean docs and add tests

* add docs to encoder decoder

* clean docs

* correct two doc strings

* remove sig test for TF Electra & Funnel

* fix tf t5 slow tests

* fix input_ids to inputs in tf

* Update src/transformers/modeling_bart.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_bart.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* implement lysandre results

* make style

* fix encoder decoder typo

* fix tf slow tests

* fix slow tests

* renaming

* remove unused input

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-01 17:38:50 +02:00
Sam Shleifer
9e80f972fb
Enable pegasus fp16 by clamping large activations (#7243)
* Clean clamp
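
Roughly what the clamp looks like in the Bart/Pegasus layers after this PR (a sketch; the 1000 margin mirrors the value used in the repo):

```python
import torch

def clamp_fp16(hidden_states: torch.Tensor) -> torch.Tensor:
    # In fp16, large intermediate activations overflow to inf; clamping them
    # just inside the representable range keeps generation numerically sane.
    if hidden_states.dtype == torch.float16:
        clamp_value = torch.finfo(hidden_states.dtype).max - 1000
        hidden_states = torch.clamp(hidden_states, min=-clamp_value, max=clamp_value)
    return hidden_states
```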

* boom boom

* Take some other changes

* boom boom

* boom boom

* boom boom

* one chg

* fix test

* Use finfo

* style
2020-10-01 04:48:37 -04:00
Pengcheng He
7a0cf0ec93
Add DeBERTa model (#5929)
* Add DeBERTa model

* Remove dependency of deberta

* Address comments

* Patch DeBERTa
Documentation
Style

* Add final tests

* Style

* Enable tests + nitpicks

* position IDs

* BERT -> DeBERTa

* Quality

* Style

* Tokenization

* Last updates.

* @patrickvonplaten's comments

* Not everything can be a copy

* Apply most of @sgugger's review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Last reviews

* DeBERTa -> Deberta

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-30 07:07:30 -04:00
Sylvain Gugger
4ba248748f
Get a better error when check_copies fails (#7457)
* Get a better error when check_copies fails

* Fix tests
2020-09-30 10:05:14 +02:00
Sylvain Gugger
8546dc55c2
Fix Trainer tests in a multiGPU env (#7458) 2020-09-29 14:06:41 -04:00
Teven
9e9a1fb8c7
Adding gradient checkpointing to GPT2 (#7446)
* GPT2 gradient checkpointing
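
A sketch of the config flag this adds (later releases expose the same switch via a method on the model):

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Recompute activations in the backward pass to trade compute for memory.
# use_cache is disabled because cached key/values conflict with checkpointing.
config = GPT2Config.from_pretrained("gpt2", gradient_checkpointing=True, use_cache=False)
model = GPT2LMHeadModel.from_pretrained("gpt2", config=config)
```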

* find_unused_parameters removed if checkpointing

* find_unused_parameters removed if checkpointing

* Update src/transformers/configuration_gpt2.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Added a test for generation with checkpointing

* Update src/transformers/configuration_gpt2.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-29 12:26:26 -04:00
Sylvain Gugger
52e8392b7e
Add automatic best model loading to Trainer (#7431)
* Add automatic best model loading to Trainer
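
A sketch of the new arguments (metric name illustrative; evaluation must run during training for "best" to be defined):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    evaluation_strategy="steps",
    eval_steps=500,
    load_best_model_at_end=True,        # reload the best checkpoint when training ends
    metric_for_best_model="eval_loss",  # defines "best"; lower is better for a loss
)
```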

* Some small fixes

* Formatting
2020-09-29 10:41:18 -04:00
Marcin Zabłocki
4083a55ab0
Flos fix (#7384) 2020-09-28 04:09:26 -04:00
Sam Shleifer
748425d47d
[T5] allow config.decoder_layers to control decoder size (#7409)
* Working asymmetrical T5

* rename decoder_layers -> num_decoder_layers
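
A sketch of the asymmetric student this enables (layer counts illustrative):

```python
from transformers import T5Config, T5ForConditionalGeneration

# Full-size encoder, shallow decoder: num_decoder_layers now decouples the two.
config = T5Config.from_pretrained("t5-small", num_decoder_layers=3)
student = T5ForConditionalGeneration(config)  # freshly initialized student
```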

* Fix docstring

* Allow creation of asymmetric t5 students
2020-09-28 03:08:04 -04:00
Patrick von Platen
e50a931c11
[Longformer, Bert, Roberta, ...] Fix multi gpu training (#7272)
* fix multi-gpu

* fix longformer

* force to delete unnecessary layers

* fix notifications

* fix warning

* fix roberta

* fix tests

* remove hasattr

* fix tests

* fix roberta

* merge and clean authorized keys
2020-09-25 20:33:21 +02:00
Patrick von Platen
2c8ecdf8a8
fix rag retriever save pretrained (#7399) 2020-09-25 19:47:12 +02:00
Sylvain Gugger
ad39271ae8
Fix FP16 and attention masks in FunnelTransformer (#7374)
* Fix #7371

* Fix training

* Fix test values

* Apply the fix to TF as well
2020-09-25 12:20:39 -04:00