Commit Graph

19383 Commits

Author SHA1 Message Date
Julien Chaumond
7d22fefd37 [pipeline] Alias NerPipeline as TokenClassificationPipeline 2020-02-14 09:18:10 -05:00
Manuel Romero
61a2b7dc9d Fix typo 2020-02-14 09:13:07 -05:00
Ilias Chalkidis
6e261d3a22 Fix typos 2020-02-14 09:11:07 -05:00
Manuel Romero
4e597c8e4d Fix typo 2020-02-14 09:07:42 -05:00
Julien Chaumond
925a13ced1 [model_cards] mv README.md 2020-02-13 23:07:29 -05:00
Manuel Romero
575a3b7aa1 Create distill-bert-base-spanish-wwm-cased-finetuned-spa-squad2-es.md 2020-02-13 23:04:52 -05:00
Julien Chaumond
4d36472b96 [run_ner] Don't crash if fine-tuning local model that doesn't end with digit 2020-02-14 03:25:29 +00:00
Ilias Chalkidis
8514018300 Update with additional information
Added a "Pre-training details" section
2020-02-13 21:54:42 -05:00
Ilias Chalkidis
1eec69a900 Create README.md 2020-02-13 19:27:22 -05:00
Felix MIKAELIAN
8744402f1e
add model_card flaubert-base-uncased-squad (#2833)
* add model_card

* Add tag

cc @fmikaelian

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-02-13 17:19:13 -05:00
Severin Simmler
7f98edd7e3
Model card: Literary German BERT (#2843)
* feat: create model card

* chore: add description

* feat: stats plot

* Delete prosa-jahre.svg

* feat: years plot (again)

* chore: add more details

* fix: typos

* feat: kfold plot

* feat: kfold plot

* Rename model_cards/severinsimmler/literary-german-bert.md to model_cards/severinsimmler/literary-german-bert/README.md

* Support for linked images + add tags

cc @severinsimmler

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-02-13 15:43:44 -05:00
Joe Davison
f1e8a51f08
Preserve spaces in GPT-2 tokenizers (#2778)
* Preserve spaces in GPT-2 tokenizers

Preserves spaces after special tokens in GPT-2 and inherited (RoBERTa)
tokenizers, enabling correct BPE encoding. Automatically inserts a space
in front of first token in encode function when adding special tokens.

* Add tokenization preprocessing method

* Add framework argument to pipeline factory

Also fixes pipeline test issue. Each test input now treated as a
distinct sequence.
2020-02-13 13:29:43 -05:00
Sam Shleifer
0ed630f139
Attempt to increase timeout for circleci slow tests (#2844) 2020-02-13 09:11:03 -05:00
Sam Shleifer
ef74b0f07a
get_activation('relu') provides a simple mapping from strings i… (#2807)
* activations.py contains a mapping from string to activation function
* resolves some `gelu` vs `gelu_new` ambiguity
2020-02-13 08:28:33 -05:00
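
The string-to-activation mapping described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the contents of `activations.py`: only the name `get_activation` and the `relu`, `gelu`, and `gelu_new` keys come from the commit message. The `ACTIVATIONS` table, the scalar pure-Python functions (standing in for tensor operations), and the choice of which formula each `gelu` name denotes are assumptions made for the example.

```python
import math
from typing import Callable, Dict


def relu(x: float) -> float:
    """Rectified linear unit on a single scalar."""
    return max(0.0, x)


def gelu(x: float) -> float:
    """GELU computed with the Gaussian error function (assumed meaning of 'gelu')."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_new(x: float) -> float:
    """Tanh approximation of GELU (assumed meaning of 'gelu_new')."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


# Hypothetical lookup table: one place where each string name is bound
# to exactly one function, so the two GELU variants cannot be confused.
ACTIVATIONS: Dict[str, Callable[[float], float]] = {
    "relu": relu,
    "gelu": gelu,
    "gelu_new": gelu_new,
    "tanh": math.tanh,
}


def get_activation(name: str) -> Callable[[float], float]:
    """Return the activation registered under `name`, or raise KeyError."""
    try:
        return ACTIVATIONS[name]
    except KeyError:
        raise KeyError(
            f"unknown activation {name!r}; expected one of {sorted(ACTIVATIONS)}"
        ) from None
```

Keeping the names in a single table means a config string such as `"gelu_new"` resolves to one function everywhere it is used, and an unrecognized name fails immediately with a message listing the valid choices.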
Lysandre
f54a5bd37f Raise error when using an mlm flag for a clm model + correct TextDataset 2020-02-12 13:23:14 -05:00
Lysandre
569897ce2c Fix a few issues regarding the language modeling script 2020-02-12 13:23:14 -05:00
Julien Chaumond
21da895013 [model_cards] Better image for social sharing 2020-02-11 20:30:08 -05:00
Julien Chaumond
9a70910d47 [model_cards] Tweak @mrm8488's model card 2020-02-11 20:20:39 -05:00
Julien Chaumond
9274734a0d [model_cards] mv to correct location + tweak tag 2020-02-11 20:13:57 -05:00
Manuel Romero
69f948461f Create bert-base-spanish-wwm-cased-finetuned-spa-squad2-es.md 2020-02-11 20:07:15 -05:00
Julien Chaumond
e0b6247cf7 [model_cards] Change formatting slightly as we updated our markdown engine
cc @tholor @loretoparisi @simonefrancia
2020-02-11 18:25:21 -05:00
sshleifer
5f2dd71d1b Smaller diff 2020-02-11 17:20:09 -05:00
sshleifer
31158af57c formatting 2020-02-11 17:20:09 -05:00
sshleifer
5dd61fb9a9 Add more specific testing advice to Contributing.md 2020-02-11 17:20:09 -05:00
Oleksiy Syvokon
ee5de0ba44 BERT decoder: Fix causal mask dtype.
PyTorch < 1.3 requires multiplication operands to be of the same type.
This was violated when using default attention mask (i.e.,
attention_mask=None in arguments) given BERT in the decoder mode.

In particular, this was breaking Model2Model and made the tutorial
from the quickstart fail.
2020-02-11 15:19:22 -05:00
jiyeon
bed38d3afe Fix typo in src/transformers/data/processors/squad.py 2020-02-11 11:22:24 -05:00
Stefan Schweter
498d06e914
[model_cards] Add new German Europeana BERT models (#2805)
* [model_cards] New German Europeana BERT models from dbmdz

* [model_cards] Update German Europeana BERT models from dbmdz
2020-02-11 10:49:39 -05:00
Funtowicz Morgan
3e3a9e2c01
Merge pull request #2793 from huggingface/tensorflow-210-circleci-fix
Fix circleci cuInit error on Tensorflow >= 2.1.0.
2020-02-11 10:48:42 +00:00
Julien Chaumond
1f5db9a13c [model_cards] Rm extraneous tag 2020-02-10 17:45:13 -05:00
Julien Chaumond
95bac8dabb [model_cards] Add language metadata to existing model cards
This will enable filtering on language (amongst other tags) on the website

cc @loretoparisi, @stefan-it, @HenrykBorzymowski, @marma
2020-02-10 17:42:42 -05:00
ahotrod
ba498eac38
Create README.md (#2785)
* Create README.md

* Update README.md

* Update README.md

* Update README.md

* [model_cards] Use code fences for consistency

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-02-10 17:27:59 -05:00
Malte Pietsch
68ccc04ee6
Add model readme for deepset/roberta-base-squad2 (#2797)
* Add readme for deepset/roberta-base-squad2

* update model readme
2020-02-10 15:21:48 -05:00
Lysandre
539f601be7 intermediate_size > hidden_dim in distilbert config docstrings 2020-02-10 13:45:57 -05:00
Lysandre
cfb7d108bd FlauBERT lang embeddings only when n_langs > 1 2020-02-10 13:24:04 -05:00
Julien Chaumond
b4691a438d [model_cards] BERT-of-Theseus: use the visual as thumbnail
cc @jetrunner

Co-Authored-By: Kevin Canwen Xu <canwenxu@outlook.com>
2020-02-10 11:27:08 -05:00
Julien Chaumond
fc325e97cd [model_cards] Showcase model tag syntax 2020-02-10 11:27:08 -05:00
Lysandre
fd639e5be3 Correct quickstart example when using the past 2020-02-10 11:25:56 -05:00
Julien Chaumond
63a5399bc4 [model_cards] Specify language meta + thumbnail
cc @tholor

see #2799
2020-02-10 11:20:05 -05:00
Lysandre
125a75a121 Correctly compute tokens when padding on the left 2020-02-10 10:47:42 -05:00
Malte Pietsch
9c64d1da35
Add model readme for bert-base-german-cased (#2799)
* add readme for bert-base-german-cased

* update readme
2020-02-10 10:27:29 -05:00
Kevin Canwen Xu
bf99014c46 Create BERT-of-Theseus model card 2020-02-10 09:58:40 -05:00
Thomas Wolf
92e974196f
Merge pull request #2765 from huggingface/extract-cached-archives
Add option to `cached_path` to automatically extract archives
2020-02-10 14:05:16 +01:00
Morgan Funtowicz
6aa7973aec Fix circleci cuInit error on Tensorflow >= 2.1.0.
Tensorflow 2.1.0 introduces a new dependency model where pip install tensorflow installs tf with GPU support.
Previously it installed with CPU support only, so CircleCI now looks for the NVidia driver version at initialization of the
tensorflow related tests but fails as there is no NVidia driver running.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-02-10 13:24:37 +01:00
Lysandre
520e7f2119 Correct docstring for xlnet 2020-02-07 16:42:35 -05:00
Lysandre
dd28830327 Update RoBERTa tips 2020-02-07 16:42:35 -05:00
Lysandre
db97930122 Update XLM-R tips 2020-02-07 16:42:35 -05:00
Lysandre
7046de2991 E231 2020-02-07 15:28:13 -05:00
VictorSanh
0d3aa3c04c styling 2020-02-07 15:28:13 -05:00
VictorSanh
d8b43600fd omission 2020-02-07 15:28:13 -05:00
VictorSanh
ee5a6856ca distilbert-base-cased weights + Readmes + omissions 2020-02-07 15:28:13 -05:00