Morgan Funtowicz
28e64ad5a4
Raise an exception if the pipeline allocator can't determine the tokenizer from the model.
2019-12-13 14:12:54 +01:00
Morgan Funtowicz
be5bf7b81b
Added NER pipeline.
2019-12-13 14:12:17 +01:00
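A minimal usage sketch for the new pipeline; the `"ner"` task name is the one exposed by the `pipeline()` factory, and the default model choice is left to the library:

```python
from transformers import pipeline

# Instantiate the token-classification (NER) pipeline with its defaults.
ner = pipeline("ner")

# Each result is a dict with the token, predicted entity, and score.
for entity in ner("Hugging Face is based in New York City."):
    print(entity)
```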
Morgan Funtowicz
80eacb8f16
Adding label mappings for classification models to their respective configs.
2019-12-13 14:10:22 +01:00
Morgan Funtowicz
f69dbecc38
Expose the classification label mapping (and its reverse) in the model config.
2019-12-12 10:25:36 +01:00
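A short sketch of the mapping exposed by these two commits, assuming the config attributes are named `id2label` and `label2id`:

```python
from transformers import AutoConfig

# Build a classification config with three labels; the library fills in
# default label names when none are provided.
config = AutoConfig.from_pretrained("bert-base-uncased", num_labels=3)

print(config.id2label)  # {0: "LABEL_0", 1: "LABEL_1", 2: "LABEL_2"}
print(config.label2id)  # the reverse mapping, label name -> id
```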
thomwolf
6709739a05
allowing from_pretrained to load from a URL directly
2019-12-11 18:15:45 +01:00
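A hedged sketch of loading weights from a URL; the URL below is a placeholder, and passing an explicit config is assumed to be needed when the URL points at a bare weights file:

```python
from transformers import AutoConfig, AutoModel

config = AutoConfig.from_pretrained("bert-base-uncased")

# Hypothetical URL to a serialized PyTorch checkpoint.
model = AutoModel.from_pretrained(
    "https://example.com/models/pytorch_model.bin",
    config=config,
)
```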
Morgan Funtowicz
c28273793e
Add missing DistilBert and Roberta to AutoModelForTokenClassification
2019-12-11 15:31:45 +01:00
Morgan Funtowicz
b040bff6df
Added supported models to AutoModelForTokenClassification
2019-12-11 14:13:58 +01:00
Morgan Funtowicz
9a24e0cf76
Refactored QA pipeline argument handling + unit tests
2019-12-11 00:33:25 +01:00
Morgan Funtowicz
63e36007ee
Make sure padding, CLS, and other non-context tokens cannot appear in the answer.
2019-12-10 16:47:35 +01:00
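The idea behind this fix, as an illustrative sketch (names and shapes are made up, not the pipeline's internals): logits of non-context tokens are pushed to negative infinity so the softmax can never select them.

```python
import numpy as np

start_logits = np.array([3.0, 1.0, 5.0, 2.0, 4.0])
# 1 where the token belongs to the context, 0 for padding/CLS/question.
context_mask = np.array([0, 0, 1, 1, 0])

# Non-context positions get -inf, so they carry zero probability mass.
masked = np.where(context_mask == 1, start_logits, -np.inf)
probs = np.exp(masked - masked.max())
probs /= probs.sum()
print(probs.argmax())  # best start index, guaranteed to be a context token
```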
Morgan Funtowicz
40a39ab650
Reuse the recently refactored SQuAD data structures inside QA pipelines.
2019-12-10 15:59:38 +01:00
Morgan Funtowicz
aae74065df
Added QuestionAnsweringPipeline unit tests.
2019-12-10 13:37:20 +01:00
Morgan Funtowicz
a7d3794a29
Remove token_type_ids for compatibility with DistilBert
2019-12-10 13:37:20 +01:00
Morgan Funtowicz
fe0f552e00
Use attention_mask everywhere.
2019-12-10 13:37:20 +01:00
Morgan Funtowicz
348e19aa21
Expose attention_masks and input_lengths arguments to batch_encode_plus
2019-12-10 13:37:18 +01:00
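A sketch of the newly exposed arguments; the keyword spellings `return_attention_masks` and `return_input_lengths` are inferred from the commit subject and may not match the final API exactly:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

batch = tokenizer.batch_encode_plus(
    ["a short sentence", "a somewhat longer sentence that forces padding"],
    return_attention_masks=True,   # assumed keyword from the subject line
    return_input_lengths=True,     # assumed keyword from the subject line
)
print(batch.keys())
```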
Morgan Funtowicz
c2407fdd88
Enable the TensorFlow backend.
2019-12-10 13:37:14 +01:00
Morgan Funtowicz
f116cf599c
Allow hiding frameworks through environment variables (NO_TF, NO_TORCH).
2019-12-10 13:37:07 +01:00
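A sketch of hiding one framework; the variable must be set before `transformers` is imported, and `NO_TF`/`NO_TORCH` are the names given in the commit:

```python
import os

# Hide TensorFlow even if it is installed; must happen before the import.
os.environ["NO_TF"] = "1"

import transformers  # noqa: E402  (deliberately imported after the env var)

print(transformers.is_tf_available())  # expected to be False
```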
Morgan Funtowicz
6e61e06051
batch_encode_plus generates the encoder_attention_mask to avoid attending over padded values.
2019-12-10 13:37:07 +01:00
Morgan Funtowicz
02110485b0
Added batching, topk, character indices, and scores.
2019-12-10 13:36:55 +01:00
Morgan Funtowicz
e1d89cb24d
Added QuestionAnsweringPipeline with batch support.
2019-12-10 13:36:55 +01:00
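Usage sketch combining the two commits above (batching, `topk`, character indices, and scores); passing parallel lists for batch support is assumed:

```python
from transformers import pipeline

qa = pipeline("question-answering")

answers = qa(
    question=["Where is Hugging Face based?", "What does Hugging Face build?"],
    context=[
        "Hugging Face is based in New York City.",
        "Hugging Face builds open-source NLP libraries.",
    ],
    topk=2,  # keep the two best spans per question
)
# Each answer carries a score, start/end character indices, and the text.
print(answers)
```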
Morgan Funtowicz
81babb227e
Added a download command to the CLI.
...
It allows pre-downloading models and tokenizers.
2019-12-10 12:18:59 +01:00
thomwolf
31a3a73ee3
updating CLI
2019-12-10 12:18:59 +01:00
thomwolf
7c1697562a
compatibility with sklearn and keras
2019-12-10 12:12:22 +01:00
thomwolf
b81ab431f2
updating AutoModels and AutoConfiguration - adding pipelines
2019-12-10 12:11:33 +01:00
thomwolf
2d8559731a
add pipeline - train
2019-12-10 11:34:16 +01:00
thomwolf
72c36b9ea2
[WIP] - CLI
2019-12-10 11:33:14 +01:00
Thomas Wolf
e57d00ee10
Merge pull request #1984 from huggingface/squad-refactor
...
[WIP] Squad refactor
2019-12-10 11:07:26 +01:00
Thomas Wolf
ecabbf6d28
Merge pull request #2107 from huggingface/encoder-mask-shape
...
create encoder attention mask from shape of hidden states
2019-12-10 10:07:56 +01:00
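The gist of the merged fix, as a small sketch: when no encoder mask is provided, build an all-ones mask from the batch and sequence dimensions of the encoder hidden states (variable names are illustrative).

```python
import torch

# (batch, seq_len, hidden) activations coming out of the encoder.
encoder_hidden_states = torch.randn(2, 7, 768)

# Attend everywhere by default: one mask entry per (batch, seq_len) position.
encoder_attention_mask = torch.ones(encoder_hidden_states.shape[:2])
print(encoder_attention_mask.shape)  # torch.Size([2, 7])
```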
Julien Chaumond
1d18930462
Harmonize no_cuda flag with other scripts
2019-12-09 20:37:55 -05:00
Rémi Louf
f7eba09007
clean for release
2019-12-09 20:37:55 -05:00
Rémi Louf
2a64107e44
improve device usage
2019-12-09 20:37:55 -05:00
Rémi Louf
c0707a85d2
add README
2019-12-09 20:37:55 -05:00
Rémi Louf
ade3cdf5ad
integrate ROUGE
2019-12-09 20:37:55 -05:00
Rémi Louf
076602bdc4
prevent BERT weights from being downloaded twice
2019-12-09 20:37:55 -05:00
Rémi Louf
5909f71028
add py-rouge dependency
2019-12-09 20:37:55 -05:00
Rémi Louf
a1994a71ee
simplified model and configuration
2019-12-09 20:37:55 -05:00
Rémi Louf
3a9a9f7861
default output dir to documents dir
2019-12-09 20:37:55 -05:00
Rémi Louf
693606a75c
update the docs
2019-12-09 20:37:55 -05:00
Rémi Louf
c0443df593
remove beam search
2019-12-09 20:37:55 -05:00
Rémi Louf
2403a66598
give transformers API to BertAbs
2019-12-09 20:37:55 -05:00
Rémi Louf
4d18199902
cast bool tensor to long for pytorch < 1.3
2019-12-09 20:37:55 -05:00
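The compatibility cast in miniature; several mask-consuming ops in PyTorch < 1.3 do not accept bool tensors, so the mask is converted to long:

```python
import torch

mask = torch.tensor([True, False, True])

# bool -> long keeps the same 0/1 semantics but works on older PyTorch.
mask = mask.long()
print(mask)  # tensor([1, 0, 1])
```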
Rémi Louf
9f75565ea8
setup training
2019-12-09 20:37:55 -05:00
Rémi Louf
4735c2af07
tweaks to the BeamSearch API
2019-12-09 20:37:55 -05:00
Rémi Louf
ba089c780b
share pretrained embeddings
2019-12-09 20:37:55 -05:00
Rémi Louf
9660ba1cbd
Add beam search
2019-12-09 20:37:55 -05:00
Rémi Louf
1c71ecc880
load the pretrained weights for encoder-decoder
...
We currently save the pretrained_weights of the encoder and decoder in
two separate directories `encoder` and `decoder`. However, for the
`from_pretrained` function to operate with automodels we need to
specify the type of model in the path to the weights.
The path to the encoder/decoder weights is handled by the
`PreTrainedEncoderDecoder` class in the `save_pretrained` function. Since
there is no easy way to infer the type of model that was initialized for
the encoder and decoder, we add a parameter `model_type` to the function.
This is not an ideal solution as it is error prone, and the model type
should be carried by the Model classes somehow.
This is a temporary fix that should be changed before merging.
2019-12-09 20:37:55 -05:00
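A heavily hedged sketch of the layout described in the commit body; the `save_both` helper and the exact directory naming are assumptions, since the real logic lives in `PreTrainedEncoderDecoder.save_pretrained`:

```python
import os

def save_both(model, output_dir, model_type="bert"):
    """Save encoder and decoder weights in subdirectories whose names
    embed the model type, so automodels can resolve them later."""
    for part in ("encoder", "decoder"):
        # e.g. output_dir/bert_encoder and output_dir/bert_decoder
        path = os.path.join(output_dir, f"{model_type}_{part}")
        os.makedirs(path, exist_ok=True)
        getattr(model, part).save_pretrained(path)
```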
Rémi Louf
07f4cd73f6
update function to add special tokens
...
Since I started my PR, the `add_special_token_single_sequence` function
has been deprecated in favor of another; I replaced it with the new one.
2019-12-09 20:37:55 -05:00
Pierric Cistac
5c877fe94a
fix albert links
2019-12-09 18:53:00 -05:00
Bilal Khan
79526f82f5
Remove unnecessary epoch variable
2019-12-09 16:24:35 -05:00
Bilal Khan
9626e0458c
Add functionality to continue training from the last saved global_step
2019-12-09 16:24:35 -05:00
Bilal Khan
2d73591a18
Stop saving current epoch
2019-12-09 16:24:35 -05:00