Commit Graph

19383 Commits

Author SHA1 Message Date
Julien Chaumond
56d4ba8ddb [run_lm_finetuning] Train from scratch 2020-01-21 16:57:38 -05:00
Lysandre
c7f79815e7 Cleanup unused variables 2020-01-21 11:40:24 -05:00
Lysandre
15579e2d55 [SQuAD v2] Code quality 2020-01-21 11:36:46 -05:00
Lysandre
088fa7b759 Correct segment ID for XLNet single sequence 2020-01-21 11:33:45 -05:00
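Unlike BERT, XLNet places the CLS token at the end of the sequence and gives it its own segment id; a sketch of the single-sequence layout this fix targets (tokens illustrative, segment value 2 per XLNet's conventions):

```python
# Hypothetical single-sequence layout for XLNet: CLS comes last and gets
# its own segment id, while the sequence tokens and SEP share segment 0.
tokens      = ["▁hello", "▁world", "<sep>", "<cls>"]
segment_ids = [0,        0,        0,       2]
```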
Lysandre
073219b43f Manage impossible examples in SQuAD v2 2020-01-21 11:24:43 -05:00
Branden Chan
983c484fa2 add __getstate__ and __setstate__ to XLMRobertaTokenizer 2020-01-21 10:18:24 -05:00
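`XLMRobertaTokenizer` wraps a SentencePiece model, which cannot be pickled directly; the usual pattern (a minimal sketch, not the exact diff) drops the loaded model in `__getstate__` and reloads it from the vocab file in `__setstate__`:

```python
import sentencepiece as spm

class XLMRobertaTokenizer:
    def __init__(self, vocab_file):
        self.vocab_file = vocab_file
        self.sp_model = spm.SentencePieceProcessor()
        self.sp_model.Load(vocab_file)

    def __getstate__(self):
        # Drop the unpicklable SentencePiece object before pickling.
        state = self.__dict__.copy()
        state["sp_model"] = None
        return state

    def __setstate__(self, d):
        # Reload the SentencePiece model from the stored vocab file.
        self.__dict__ = d
        self.sp_model = spm.SentencePieceProcessor()
        self.sp_model.Load(self.vocab_file)
```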
James Betker
cefd51c50c Fix glue processor failing on tf datasets 2020-01-20 11:46:43 -05:00
Lysandre
ca6ce3040d Fix style 2020-01-20 10:56:23 -05:00
Morgan Funtowicz
908cd5ea27 Make forward asynchronous to avoid long computations timing out.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-01-20 10:56:23 -05:00
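This concerns the serving endpoint; the general idea (a sketch, assuming an asyncio-based server, with `model` and `inputs` standing in for the real objects) is to keep the event loop responsive by offloading the blocking model call:

```python
import asyncio

async def forward(inputs):
    # Sketch: run the blocking model call in a worker thread so the event
    # loop keeps serving requests and the connection does not time out.
    loop = asyncio.get_event_loop()
    return await loop.run_in_executor(None, model, inputs)  # `model` assumed defined elsewhere
```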
Morgan Funtowicz
6e6c8c52ed Fix bad handling of the USE_TF / USE_TORCH environment variables leading to an invalid framework being used.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-01-20 10:56:23 -05:00
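A sketch of the kind of guard involved (variable names follow the commit message; the accepted values are assumptions, not the exact patch): read both flags up front and only mark a framework as available when its flag allows it and the package is importable:

```python
import importlib.util
import os

# Accept common truthy spellings; default to AUTO when the variable is unset.
USE_TF = os.environ.get("USE_TF", "AUTO").upper()
USE_TORCH = os.environ.get("USE_TORCH", "AUTO").upper()
TRUTHY = ("1", "ON", "YES", "AUTO")

_torch_available = USE_TORCH in TRUTHY and importlib.util.find_spec("torch") is not None
_tf_available = USE_TF in TRUTHY and importlib.util.find_spec("tensorflow") is not None
```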
Brendan Roof
23c6998bf4 Add lower bound to tqdm for tqdm.auto
- It appears that `tqdm` only introduced `tqdm.auto` in 4.27.
- See https://github.com/tqdm/tqdm/releases/tag/v4.27.0.
- Without the lower bound, I received the following stack trace in an environment where I already had tqdm installed:
```
  File "/home/brendanr/anaconda3/envs/allennlp/lib/python3.6/site-packages/transformers/__init__.py", line 20, in <module>
    from .file_utils import (TRANSFORMERS_CACHE, PYTORCH_TRANSFORMERS_CACHE, PYTORCH_PRETRAINED_BERT_CACHE,
  File "/home/brendanr/anaconda3/envs/allennlp/lib/python3.6/site-packages/transformers/file_utils.py", line 24, in <module>
    from tqdm.auto import tqdm
ModuleNotFoundError: No module named 'tqdm.auto'
```
2020-01-17 18:29:11 -05:00
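The fix itself amounts to a dependency bound; in `setup.py` terms (a sketch of the change, not the verbatim diff):

```python
# setup.py (excerpt): require a tqdm recent enough to ship tqdm.auto
install_requires = [
    "tqdm >= 4.27",
]
```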
Mark Neumann
65a89a8976 Fix BasicTokenizer to respect never_split parameters (#2557)
* add failing test

* fix call to _run_split_on_punc

* format with black
2020-01-17 14:57:56 -05:00
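For context, `never_split` tells `BasicTokenizer` to keep the listed tokens intact; the bug was that the punctuation-splitting pass ignored the list. A minimal usage sketch (the bracketed token is illustrative):

```python
from transformers import BasicTokenizer

tok = BasicTokenizer(never_split=["[unused0]"])
# With the fix, the protected token survives punctuation splitting:
print(tok.tokenize("hello [unused0] world"))  # ['hello', '[unused0]', 'world']
```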
jiyeon_baek
6d5049a24d Fix typo in examples/run_squad.py
Rul -> Run
2020-01-17 11:22:51 -05:00
Julien Chaumond
23a2cea8cb Tokenizer.from_pretrained: fetch all possible files remotely 2020-01-16 16:47:19 -05:00
Julien Chaumond
99f9243de5 same here, try not to serialize too much if unneeded 2020-01-16 16:47:19 -05:00
Julien Chaumond
9d8fd2d40e tokenizer.save_pretrained: only save file if non-empty 2020-01-16 16:47:19 -05:00
Lysandre
6e2c28a14a Warn in run_squad when the doc stride may be too high 2020-01-16 13:59:26 -05:00
Thomas Wolf
b8f43cb273
Merge pull request #2239 from ns-moosavi/HANS-evaluation-example
HANS evaluation
2020-01-16 13:28:25 +01:00
thomwolf
258ed2eaa8 adding details in readme 2020-01-16 13:21:30 +01:00
thomwolf
50ee59578d update formatting - make flake8 happy 2020-01-16 13:21:30 +01:00
thomwolf
1c9333584a formatting 2020-01-16 13:21:30 +01:00
thomwolf
e25b6fe354 updating readme 2020-01-16 13:21:30 +01:00
thomwolf
27c7b99015 adding details in readme - moving file 2020-01-16 13:21:30 +01:00
Nafise Sadat Moosavi
99d4515572 HANS evaluation 2020-01-16 13:21:30 +01:00
Thomas Wolf
dc17f2a111
Merge pull request #2538 from huggingface/py3_super
💄 super
2020-01-16 13:17:15 +01:00
Thomas Wolf
880854846b
Merge pull request #2540 from huggingface/torch14_fix
[PyTorch 1.4] Fix failing torchscript test for xlnet
2020-01-16 13:16:59 +01:00
Julien Chaumond
d9fa1bad72 Fix failing torchscript test for xlnet
model.parameters() order is apparently not stable (only for xlnet, for some reason)
2020-01-15 20:22:21 -05:00
Julien Chaumond
a98b2ca8c0 Style + fixup BertJapaneseTokenizer 2020-01-15 19:05:51 -05:00
Julien Chaumond
83a41d39b3 💄 super 2020-01-15 18:33:50 -05:00
Julien Chaumond
cd51893d37 Merge branch 'Rexhaif-patch-1' 2020-01-15 18:25:15 -05:00
Julien Chaumond
248aeaa842 Merge branch 'patch-1' of https://github.com/Rexhaif/transformers into Rexhaif-patch-1 2020-01-15 18:22:01 -05:00
Aditya Bhargava
c76c3cebed Add check for token_type_ids before tensorizing
Fix an issue where `prepare_for_model()` raises a `KeyError` when
`return_token_type_ids` is set to `False` and `return_tensors` is
enabled.
2020-01-15 12:31:43 -05:00
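The shape of the fix (a sketch, not the exact diff): tensorize only the keys actually present in the encoding, so a disabled `token_type_ids` no longer triggers a lookup:

```python
import torch

def to_tensors(encoded_inputs):
    # token_type_ids may be absent when return_token_type_ids=False,
    # so guard each key before converting it.
    for key in ("input_ids", "token_type_ids", "attention_mask"):
        if key in encoded_inputs:
            encoded_inputs[key] = torch.tensor([encoded_inputs[key]])
    return encoded_inputs
```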
Julien Chaumond
eb59e9f705 Graduate sst-2 to a canonical one 2020-01-15 16:28:50 +00:00
Julien Chaumond
e184ad13cf Close #2392 2020-01-15 15:43:44 +00:00
Lysandre
dfe012ad9d Fix misleading RoBERTa token type ids 2020-01-14 17:47:28 -05:00
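RoBERTa has no segment embeddings, so its tokenizer should report all-zero `token_type_ids` rather than BERT-style 0/1 segments; a quick check (model name for illustration):

```python
from transformers import RobertaTokenizer

tok = RobertaTokenizer.from_pretrained("roberta-base")
enc = tok.encode_plus("first sentence", "second sentence")
# All zeros: RoBERTa does not distinguish the two segments.
print(enc["token_type_ids"])
```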
Lysandre
c024ab98df Improve padding side documentation 2020-01-14 17:44:23 -05:00
Lysandre
9aeb0b9b8a Improve padding side documentation 2020-01-14 17:43:00 -05:00
Julien Chaumond
715fa638a7 Merge branch 'master' into from_scratch_training 2020-01-14 18:58:21 +00:00
Lysandre
100e3b6f21 Bias should be resized with the weights
Created a link between the linear layer bias and the model attribute bias. This does not change anything for the user or for the conversion scripts, but allows the `resize_token_embeddings` method to resize the bias as well as the weights of the decoder.

Added a test.
2020-01-14 13:43:45 -05:00
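The wiring behind this (a sketch of the LM-head pattern, with hypothetical names): keep the bias as a module-level parameter and point the linear layer's bias at it, so resizing the output embedding resizes the bias too:

```python
import torch
import torch.nn as nn

class LMHead(nn.Module):
    def __init__(self, hidden_size, vocab_size):
        super().__init__()
        self.decoder = nn.Linear(hidden_size, vocab_size, bias=False)
        self.bias = nn.Parameter(torch.zeros(vocab_size))
        # Link the two: the decoder reuses the module-level bias parameter,
        # so resizing the output embedding also resizes the bias.
        self.decoder.bias = self.bias

    def forward(self, hidden_states):
        return self.decoder(hidden_states)
```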
Lysandre
6c32d8bb95 Size > Dimensionality + Remove final TODOs 2020-01-14 14:09:09 +01:00
Lysandre
760164d63b RoBERTa example 2020-01-14 14:09:09 +01:00
Lysandre
387217bd3e Added example usage 2020-01-14 14:09:09 +01:00
Lysandre
7d1bb7f256 Add missing XLNet and XLM models 2020-01-14 14:09:09 +01:00
Lysandre
a1cb100460 Wrap up configurations 2020-01-14 14:09:09 +01:00
Lysandre
c11b6fd393 Update links in all configurations 2020-01-14 14:09:09 +01:00
Lysandre Debut
632682726f Updated Configurations 2020-01-14 14:09:09 +01:00
Thomas Wolf
2b566c182e
Merge pull request #2384 from dimagalat/master
Releasing file lock
2020-01-14 13:19:01 +01:00
Julien Chaumond
764f836d52
Update test_tokenization_auto.py 2020-01-13 22:50:34 -05:00
Julien Chaumond
d5831acb07
Update test_tokenization_auto.py 2020-01-13 22:47:33 -05:00
Julien Chaumond
ed6cd597cc
Update test_tokenization_auto.py 2020-01-13 22:46:35 -05:00