Thomas Wolf
7e17f09fb5
Merge pull request #1803 from importpandas/fix-xlnet-squad2.0
fix run_squad.py during fine-tuning xlnet on squad2.0
2019-12-21 13:38:48 +01:00
thomwolf
8a2be93b4e
fix merge
2019-12-21 13:31:28 +01:00
Thomas Wolf
562f864038
Merge branch 'master' into fix-xlnet-squad2.0
2019-12-21 12:48:10 +01:00
Thomas Wolf
8618bf15d6
Merge pull request #1736 from huggingface/fix-tf-xlnet
Fix TFXLNet
2019-12-21 12:42:05 +01:00
Thomas Wolf
2fa8737c44
Merge pull request #1586 from enzoampil/include_special_tokens_in_bert_examples
Add special tokens to documentation for bert examples to resolve issue: #1561
2019-12-21 12:36:11 +01:00
Thomas Wolf
f15f087143
Merge pull request #1764 from DomHudson/bug-fix-1761
Bug-fix: Roberta Embeddings Not Masked
2019-12-21 12:13:27 +01:00
Thomas Wolf
fae4d1c266
Merge pull request #2217 from aaugustin/test-parallelization
Support running tests in parallel
2019-12-21 11:54:23 +01:00
Aymeric Augustin
b8e924e10d
Restore test.
This looks like debug code accidentally committed in b18509c2.
Refs #2250.
2019-12-21 08:50:15 +01:00
Aymeric Augustin
767bc3ca68
Fix typo in model name.
This looks like a copy/paste mistake. Probably this test was never run.
Refs #2250.
2019-12-21 08:46:26 +01:00
Aymeric Augustin
343c094f21
Run examples separately from tests.
This optimizes the total run time of the Circle CI test suite.
2019-12-21 08:43:19 +01:00
Aymeric Augustin
80caf79d07
Prevent excessive parallelism in PyTorch.
We're already using as many processes in parallel as we have CPU cores.
Furthermore, the number of cores may be incorrectly calculated as 36
(we've seen this in pytest-xdist), which compounds the problem.
PyTorch performance craters without this.
2019-12-21 08:43:19 +01:00
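Note on 80caf79d07: a minimal sketch of capping PyTorch's thread count when every core already runs its own test process. `torch.set_num_threads` is an existing PyTorch call; the helper name `limit_torch_threads` and the `OMP_NUM_THREADS` hint are assumptions for illustration and may differ from what the commit actually changes.

```python
import os


def limit_torch_threads(num_threads: int = 1) -> bool:
    """Cap per-process threads. Returns True if PyTorch was configured."""
    # Hint for OpenMP-backed libraries; it only takes effect if it is set
    # before those libraries initialise their thread pools.
    os.environ["OMP_NUM_THREADS"] = str(num_threads)
    try:
        import torch  # optional: the sketch still runs without PyTorch
    except ImportError:
        return False
    # Limit the threads PyTorch uses for intra-op parallelism on CPU.
    torch.set_num_threads(num_threads)
    return True
```

Calling `limit_torch_threads(1)` once at the start of each worker process would keep the total thread count close to the number of workers.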
Aymeric Augustin
bb3bfa2d29
Distribute tests from the same file to the same worker.
This should prevent two issues:
- hitting API rate limits for tests that hit the HF API
- multiplying the cost of expensive test setups
2019-12-21 08:43:19 +01:00
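Note on bb3bfa2d29: pytest-xdist offers a `--dist=loadfile` mode that groups tests by file. The sketch below only illustrates the idea with a simplified static round-robin; it is not pytest-xdist's scheduler, and `assign_by_file` is a hypothetical helper.

```python
from collections import defaultdict
from itertools import cycle


def assign_by_file(test_ids, num_workers):
    """Assign pytest-style node IDs ("path::test") to workers, whole files at a time."""
    by_file = defaultdict(list)
    for test_id in test_ids:
        # Everything before the first "::" is the file path.
        by_file[test_id.split("::", 1)[0]].append(test_id)

    workers = [[] for _ in range(num_workers)]
    # Hand out files round-robin, so tests from one file never split across workers.
    for worker, tests in zip(cycle(workers), by_file.values()):
        worker.extend(tests)
    return workers
```

Because every test from a file lands on one worker, module-level setup runs once per file rather than once per worker.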
Aymeric Augustin
29cbab98f0
Parallelize tests on Circle CI.
Set the number of CPUs manually based on the Circle CI resource class,
or else we're getting 36 CPUs, which is far too many (perhaps that's
the underlying hardware and not what Circle CI allocates to us).
Don't parallelize the custom tokenizers tests because they take less
than one second to run and parallelization actually makes them slower.
2019-12-21 08:43:19 +01:00
Aymeric Augustin
a4c9338b83
Prevent parallel downloads of the same file with a lock.
Since the file is written to the filesystem, a filesystem lock is the
way to go here. Add a dependency on the third-party filelock library to
get cross-platform functionality.
2019-12-21 08:43:19 +01:00
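Note on a4c9338b83: the commit relies on the third-party filelock library. The sketch below is a stdlib-only stand-in that uses an exclusively created lock file instead, just to show the pattern of locking around a download. `exclusive_lock`, `download_once`, and the `fetch` callable are hypothetical names.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path, poll_interval=0.05, timeout=10.0):
    """Hold a lock file created with O_EXCL (stand-in for filelock, not its API)."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so only one
            # process can create it at a time.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError("could not acquire " + lock_path)
            time.sleep(poll_interval)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


def download_once(cache_path, fetch):
    """Write fetch() to cache_path unless another caller already did."""
    with exclusive_lock(cache_path + ".lock"):
        if not os.path.exists(cache_path):
            with open(cache_path, "wb") as f:
                f.write(fetch())
    return cache_path
```

Unlike this stand-in, a lock file left behind by a crashed process is not cleaned up here, which is one reason to prefer a dedicated locking library.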
Aymeric Augustin
b670c26684
Take advantage of the cache when running tests.
Caching models across test cases and across runs of the test suite makes
slow tests somewhat more bearable.
Use gettempdir() instead of /tmp in tests. This makes it easier to
change the location of the cache with semi-standard TMPDIR/TEMP/TMP
environment variables.
Fix #2222.
2019-12-21 08:43:19 +01:00
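Note on b670c26684: a small sketch of building a test cache path from `tempfile.gettempdir()` rather than a hard-coded /tmp. The helper `cache_dir_for_tests` and the subdirectory name are hypothetical and not taken from the commit.

```python
import os
import tempfile


def cache_dir_for_tests(subdir="transformers_test"):
    """Return a cache directory under the platform temp dir (subdir is hypothetical)."""
    # gettempdir() consults TMPDIR, TEMP and TMP before falling back to
    # platform defaults such as /tmp.
    path = os.path.join(tempfile.gettempdir(), subdir)
    os.makedirs(path, exist_ok=True)
    return path
```

`gettempdir()` caches its result in `tempfile.tempdir`, so the environment variable has to be set before the first call in a process.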
Aymeric Augustin
b67fa1a8d2
Download models directly to cache_dir.
This allows moving the file instead of copying it, which is more
reliable. Also it avoids writing large amounts of data to /tmp,
which may not be large enough to accommodate it.
Refs #2222.
2019-12-21 08:43:19 +01:00
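Note on b67fa1a8d2: a minimal sketch of why downloading into cache_dir allows a move instead of a copy. The temporary file is created inside the destination directory, so the final rename stays on one filesystem. `write_to_cache` and its parameters are hypothetical; the library's actual download code may differ.

```python
import os
import tempfile


def write_to_cache(cache_dir, filename, chunks):
    """Stream chunks into a temp file inside cache_dir, then move it into place."""
    os.makedirs(cache_dir, exist_ok=True)
    final_path = os.path.join(cache_dir, filename)
    # delete=False: the file must survive the context manager so it can be renamed.
    with tempfile.NamedTemporaryFile(dir=cache_dir, delete=False) as tmp:
        for chunk in chunks:
            tmp.write(chunk)
        tmp_path = tmp.name
    # Same directory, hence same filesystem: a rename, with no data copied.
    os.replace(tmp_path, final_path)
    return final_path
```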
Aymeric Augustin
286d5bb6b7
Use a random temp dir for writing pruned models in tests.
2019-12-21 08:43:19 +01:00
Aymeric Augustin
478e456e83
Use a random temp dir for writing files in tests.
2019-12-21 08:43:19 +01:00
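Note on 286d5bb6b7 and 478e456e83: one common way to get a random, self-cleaning directory in a test is `tempfile.TemporaryDirectory`. The sketch below assumes that approach; the commits may use a different mechanism. `roundtrip_in_temp_dir` and the file name are hypothetical.

```python
import os
import tempfile


def roundtrip_in_temp_dir(payload: bytes) -> bytes:
    """Write payload to a file in a fresh random directory and read it back."""
    # The directory gets a random name and is deleted when the block exits,
    # so parallel test workers cannot collide on a shared path.
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "artifact.bin")
        with open(path, "wb") as f:
            f.write(payload)
        with open(path, "rb") as f:
            return f.read()
```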
Aymeric Augustin
12726f8556
Remove redundant torch.jit.trace in tests.
This looks like it could be expensive, so don't run it twice.
2019-12-21 08:43:19 +01:00
Julien Chaumond
ac1b449cc9
[doc] move distilroberta to more appropriate place
cc @lysandrejik
2019-12-21 00:09:01 -05:00
Julien Chaumond
3e52915fa7
[RoBERTa] Embeddings: fix dimensionality bug
2019-12-20 19:01:27 -05:00
Dom Hudson
228f52867c
Bug fix: 1764
2019-12-20 18:27:35 -05:00
Francesco
a80778f40e
small refactoring (aesthetic only, not functional)
2019-12-20 17:21:24 -05:00
Francesco
3df1d2d144
- Create the output directory (named by the user-supplied "save_directory" parameter) in which the encoder and decoder will be saved, if it does not exist.
- Empty the output directory if it contains any files or subdirectories.
- Create the "encoder" directory inside "save_directory" if it does not exist.
- Create the "decoder" directory inside "save_directory" if it does not exist.
- Save the encoder and the decoder in those two directories, respectively.
2019-12-20 17:21:24 -05:00
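Note on 3df1d2d144: the directory handling listed in that commit message could look roughly like the sketch below. It covers only the filesystem steps; saving the models themselves is left to the caller. `prepare_save_dirs` is a hypothetical helper, not the function the commit modifies.

```python
import os
import shutil


def prepare_save_dirs(save_directory):
    """Create and empty save_directory, then create encoder/ and decoder/ inside it."""
    os.makedirs(save_directory, exist_ok=True)
    # Remove anything left over from a previous save.
    for name in os.listdir(save_directory):
        path = os.path.join(save_directory, name)
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)
        else:
            os.remove(path)
    encoder_dir = os.path.join(save_directory, "encoder")
    decoder_dir = os.path.join(save_directory, "decoder")
    os.makedirs(encoder_dir, exist_ok=True)
    os.makedirs(decoder_dir, exist_ok=True)
    return encoder_dir, decoder_dir
```

The caller would then save the encoder and decoder into the two returned paths. Emptying the directory is destructive, so `save_directory` should not point at a directory holding unrelated files.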
Lysandre
a436574bfd
Release: v2.3.0
2019-12-20 16:22:20 -05:00
Thomas Wolf
d0f8b9a978
Merge pull request #2244 from huggingface/fix-tok-pipe
Fix Camembert and XLM-R `decode` method - Fix NER pipeline alignment
2019-12-20 22:10:39 +01:00
Thomas Wolf
a557836a70
Merge pull request #2191 from huggingface/fix_sp_np
Numpy compatibility for sentence piece
2019-12-20 22:08:08 +01:00
thomwolf
655fd06853
clean up
2019-12-20 21:57:49 +01:00
thomwolf
e5812462fc
clean up debug and less verbose tqdm
2019-12-20 21:51:48 +01:00
thomwolf
4775ec354b
add overwrite - fix ner decoding
2019-12-20 21:47:15 +01:00
Lysandre
cb6d54bfda
Numpy compatibility for sentence piece
convert to int earlier
2019-12-20 15:06:28 -05:00
thomwolf
f79a7dc661
fix NER pipeline
2019-12-20 20:57:45 +01:00
thomwolf
a241011057
fix pipeline NER
2019-12-20 20:43:48 +01:00
thomwolf
e37ca8e11a
fix camembert and XLM-R tokenizer
2019-12-20 20:43:42 +01:00
thomwolf
ceae85ad60
fix mc loading
2019-12-20 19:52:24 +01:00
thomwolf
71883b6ddc
update link in readme
2019-12-20 19:40:23 +01:00
Thomas Wolf
8d5a47c79b
Merge pull request #2243 from huggingface/fix-xlm-roberta
fixing xlm-roberta tokenizer max_length and automodels
2019-12-20 19:34:08 +01:00
thomwolf
79e4a6a25c
update serving API
2019-12-20 19:33:12 +01:00
thomwolf
bbaaec046c
fixing CLI pipeline
2019-12-20 19:19:20 +01:00
thomwolf
1c12ee0e55
fixing xlm-roberta tokenizer max_length and automodels
2019-12-20 18:28:27 +01:00
Lysandre
65c75fc587
Clean special tokens test
2019-12-20 11:34:16 -05:00
Lysandre
fb393ad994
Added test for all special tokens
2019-12-20 11:29:58 -05:00
Dirk Groeneveld
90debb9ff2
Keep even the first of the special tokens intact while lowercasing.
2019-12-20 11:29:43 -05:00
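Note on 90debb9ff2: a rough sketch of lowercasing text while leaving special tokens untouched. It illustrates the general technique with a regex split and is not the tokenizer's actual code. `lowercase_except` and the example tokens are assumptions.

```python
import re


def lowercase_except(text, special_tokens):
    """Lowercase text, but leave every occurrence of a special token as-is."""
    if not special_tokens:
        return text.lower()
    # Longest tokens first, so a token that is a prefix of another cannot shadow it.
    ordered = sorted(special_tokens, key=len, reverse=True)
    pattern = "(" + "|".join(re.escape(tok) for tok in ordered) + ")"
    # With a capturing group, re.split keeps the matched tokens in the result.
    pieces = re.split(pattern, text)
    keep = set(special_tokens)
    return "".join(p if p in keep else p.lower() for p in pieces)
```

For instance, `lowercase_except("[CLS] Hello World [SEP]", ["[CLS]", "[SEP]"])` leaves both bracketed tokens in upper case, including the one at the very start of the string.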
Morgan Funtowicz
b98ff88544
Added pipelines quick tour in README
2019-12-20 15:52:50 +01:00
Thomas Wolf
3a2c4e6f63
Merge pull request #1548 from huggingface/cli
[2.2] - Command-line interface - Pipeline class
2019-12-20 15:28:29 +01:00
Rémi Louf
4e3f745ba4
add example for Model2Model in quickstart
2019-12-20 09:12:31 -05:00
thomwolf
db0795b5d0
default models for tf and pt - update tests
2019-12-20 15:07:00 +01:00
Morgan Funtowicz
7f74084528
Fix leading axis added when saving through the command run
2019-12-20 14:47:04 +01:00
thomwolf
c37815f130
clean up PT <=> TF 2.0 conversion and config loading
2019-12-20 14:35:40 +01:00
thomwolf
73fcebf7ec
update serving command
2019-12-20 13:47:35 +01:00