Commit Graph

324 Commits

Author SHA1 Message Date
Funtowicz Morgan
7231f7b503
Enable ONNX/ONNXRuntime optimizations through converter script (#6131)
* Add onnxruntime transformers optimization support

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added Optimization section in ONNX/ONNXRuntime documentation.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Improve note reference

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fixing imports order.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Add warning about different level of optimization between torch and tf export.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Address @LysandreJik wording suggestion

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address @LysandreJik wording suggestion

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Always optimize model before quantization for maximum performances.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Address comments on the documentation.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Improve TensorFlow optimization message as suggested by @yufenglee

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Removed --optimize parameter

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Warn the user about current quantization limitation when model is larger than 2GB.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Trigger CI for last check

* Small change in print for the optimization section.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-07-31 09:45:13 +02:00
Sylvain Gugger
f3065abdb8
Doc tokenizer (#6110)
* Start doc tokenizers

* Tokenizer documentation

* Start doc tokenizers

* Tokenizer documentation

* Formatting after rebase

* Formatting after merge

* Update docs/source/main_classes/tokenizer.rst

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address comment

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Address Thom's comments

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-07-30 14:51:19 -04:00
guillaume-be
e642c78908
Addition of a DialoguePipeline (#5516)
* initial commit for pipeline implementation

Addition of input processing and history concatenation

* Conversation pipeline tested and working for single & multiple conversation inputs

* Added docstrings for dialogue pipeline

* Addition of dialogue pipeline integration tests

* Delete test_t5.py

* Fixed max code length

* Updated styling

* Fixed test broken by formatting tools

* Removed unused import

* Added unit test for DialoguePipeline

* Fixed Tensorflow compatibility

* Fixed multi-framework support using framework flag

* - Fixed docstring
- Added `min_length_for_response` as an initialization parameter
- Renamed `*args` to `conversations`, `conversations` being a `Conversation` or a `List[Conversation]`
- Updated truncation to truncate entire segments of conversations, instead of cutting in the middle of a user/bot input

* - renamed pipeline name from dialogue to conversational
- removed hardcoded default value of 1000 and use config.max_length instead
- added `append_response` and `set_history` method to the Conversation class to avoid direct fields mutation
- fixed bug in history truncation method

* - Updated ConversationalPipeline to accept only active conversations (otherwise a ValueError is raised)

* - Simplified input tensor conversion

* - Updated attention_mask value for Tensorflow compatibility

* - Updated last dialogue reference to conversational & fixed integration tests

* Fixed conflict with master

* Updates following review comments

* Updated formatting

* Added Conversation and ConversationalPipeline to the library __init__, addition of docstrings for Conversation, added both to the docs

* Update src/transformers/pipelines.py

Updated docsting following review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-07-30 14:11:39 -04:00
Sylvain Gugger
91cb95461e
Switch from return_tuple to return_dict (#6138)
* Switch from return_tuple to return_dict

* Fix test

* [WIP] Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleC… (#5614)

* Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleChoice} models and tests

* AutoModels


Tiny tweaks

* Style

* Final changes before merge

* Re-order for simpler review

* Final fixes

* Addressing @sgugger's comments

* Test MultipleChoice

* Rework TF trainer (#6038)

* Fully rework training/prediction loops

* fix method name

* Fix variable name

* Fix property name

* Fix scope

* Fix method name

* Fix tuple index

* Fix tuple index

* Fix indentation

* Fix variable name

* fix eval before log

* Add drop remainder for test dataset

* Fix step number + fix logging datetime

* fix eval loss value

* use global step instead of step + fix logging at step 0

* Fix logging datetime

* Fix global_step usage

* Fix breaking loop + logging datetime

* Fix step in prediction loop

* Fix step breaking

* Fix train/test loops

* Force TF at least 2.2 for the trainer

* Use assert_cardinality to facilitate the dataset size computation

* Log steps per epoch

* Make tfds compliant with TPU

* Make tfds compliant with TPU

* Use TF dataset enumerate instead of the Python one

* revert previous commit

* Fix data_dir

* Apply style

* rebase on master

* Address Sylvain's comments

* Address Sylvain's and Lysandre comments

* Trigger CI

* Remove unused import

* Switch from return_tuple to return_dict

* Fix test

* Add recent model

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Julien Plu <plu.julien@gmail.com>
2020-07-30 09:17:00 -04:00
Oren Amsalem
d24ea708d7
Actually the extra_id are from 0-99 and not from 1-100 (#5967)
a = tokenizer.encode("we got a <extra_id_99>", return_tensors='pt',add_special_tokens=True)
print(a)
>tensor([[   62,   530,     3,     9, 32000]])
a = tokenizer.encode("we got a <extra_id_100>", return_tensors='pt',add_special_tokens=True)
print(a)
>tensor([[   62,   530,     3,     9,     3,     2, 25666,   834,    23,    26,
           834,  2915,  3155]])
2020-07-30 06:13:29 -04:00
Funtowicz Morgan
6c002853a6
Added capability to quantize a model while exporting through ONNX. (#6089)
* Added capability to quantize a model while exporting through ONNX.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

We do not support multiple extensions

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Reformat files

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* More quality

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Ensure test_generate_identified_name compares the same object types

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added documentation everywhere on ONNX exporter

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Use pathlib.Path instead of plain-old string

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Use f-string everywhere

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Use the correct parameters for black formatting

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Use Python 3 super() style.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Use packaging.version to ensure installed onnxruntime version match requirements

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fixing imports sorting order.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Missing raise(s)

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added quantization documentation

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix some spelling.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix bad list header format

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-07-29 13:21:29 +02:00
Funtowicz Morgan
640550fc7a
ONNX documentation (#5992)
* Move torchscript and add ONNX documentation under modle_export

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Let's follow guidelines by the gurus: Renamed torchscript.rst to serialization.rst

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Remove previously introduced tree element

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* WIP doc

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* ONNX documentation

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix invalid link

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Improve spelling

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Final wording pass

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-07-29 11:02:35 +02:00
Xin Wen
b9b11795cf
Update model_summary.rst (#5737)
Add '-' to make the reference of Transformer-XL more accurate and formal.
2020-07-27 05:34:02 -04:00
Sylvain Gugger
3b44aa935a
Model utils doc (#6005)
* Document TF modeling utils

* Document all model utils
2020-07-24 09:16:28 -04:00
Sylvain Gugger
33d7506ea1
Update doc of the model page (#5985) 2020-07-22 18:14:57 -04:00
Sylvain Gugger
e714412fe6
Update doc to new model outputs (#5946)
* Update doc to new model outputs

* Fix outputs in quicktour
2020-07-21 18:13:55 -04:00
Sylvain Gugger
a20969170b
Add AlbertForPretraining to doc (#5914) 2020-07-20 17:53:21 -04:00
Joe Davison
5d178954c9
tiny ppl doc typo fix (#5751) 2020-07-14 10:39:44 -06:00
Stas Bekman
45addfe96d
FlaubertForTokenClassification (#5644)
* implement FlaubertForTokenClassification as a subclass of XLMForTokenClassification

* fix mapping order

* add the doc

* add common tests
2020-07-13 14:59:53 -04:00
Stas Bekman
0a19a49dfe
doc improvements (#5688) 2020-07-13 18:10:17 +08:00
Sylvain Gugger
7fad617dc1
Document model outputs (#5673)
* Document model outputs

* Update docs/source/main_classes/output.rst

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-07-10 17:31:02 -04:00
Sylvain Gugger
b2747af543
Improvements to PretrainedConfig documentation (#5642)
* Update PretrainedConfig doc

* Formatting

* Small fixes

* Forgotten args and more cleanup
2020-07-10 10:31:47 -04:00
Sylvain Gugger
760f726e51
Add forum link in the docs (#5637) 2020-07-09 15:13:22 -04:00
Lysandre Debut
1158e56551
Correct extension (#5631) 2020-07-09 11:03:07 -04:00
Stas Bekman
fa5423b169
doc fixes (#5613) 2020-07-08 19:52:44 -04:00
Joe Davison
b4b33fdf25
Guide to fixed-length model perplexity evaluation (#5449)
* add first draft ppl guide

* upload imgs

* expand on strides

* ref typo

* rm superfluous past var

* add tokenization disclaimer
2020-07-07 16:04:15 -06:00
Sam Shleifer
353b8f1e7a
Add mbart-large-cc25, support translation finetuning (#5129)
improve unittests for finetuning, especially w.r.t testing frozen parameters
fix freeze_embeds for T5
add streamlit setup.cfg
2020-07-07 13:23:01 -04:00
Suraj Patil
33e43edddc
[docs] fix model_doc links in model summary (#5566)
* fix model_doc links

* update model links
2020-07-07 11:06:12 -04:00
Quentin Lhoest
fbd8792195
Add DPR model (#5279)
* beginning of dpr modeling

* wip

* implement forward

* remove biencoder + better init weights

* export dpr model to embed model for nlp lib

* add new api

* remove old code

* make style

* fix dumb typo

* don't load bert weights

* docs

* docs

* style

* move the `k` parameter

* fix init_weights

* add pretrained configs

* minor

* update config names

* style

* better config

* style

* clean code based on PR comments

* change Dpr to DPR

* fix config

* switch encoder config to a dict

* style

* inheritance -> composition

* add messages in assert startements

* add dpr reader tokenizer

* one tokenizer per model

* fix base_model_prefix

* fix imports

* typo

* add convert script

* docs

* change tokenizers conf names

* style

* change tokenizers conf names

* minor

* minor

* fix wrong names

* minor

* remove unused convert functions

* rename convert script

* use return_tensors in tokenizers

* remove n_questions dim

* move generate logic to tokenizer

* style

* add docs

* docs

* quality

* docs

* add tests

* style

* add tokenization tests

* DPR full tests

* Stay true to the attention mask building

* update docs

* missing param in bert input docs

* docs

* style

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-07-07 08:56:12 -04:00
Lysandre
1d2332861f Post v3.0.2 release commit 2020-07-06 18:56:47 -04:00
Lysandre
b0892fa0e8 Release: v3.0.2 2020-07-06 18:49:44 -04:00
Arnav Sharma
b2309cc6bf
Typo fix in training doc (#5495) 2020-07-06 09:15:22 -04:00
ELanning
7ecff0ccbb
Fix typo in training (#5510) 2020-07-06 09:14:57 -04:00
Sylvain Gugger
6b735a7253
Tokenizer summary (#5467)
* Work on tokenizer summary

* Finish tutorial

* Link to it

* Apply suggestions from code review

Co-authored-by: Anthony MOI <xn1t0x@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Add vocab definition

Co-authored-by: Anthony MOI <xn1t0x@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-07-02 17:07:42 -04:00
George Ho
84e56669af
Fix typo in glossary (#5466) 2020-07-02 09:19:33 -04:00
Patrick von Platen
d16e36c7e5
[Reformer] Add Masked LM Reformer (#5426)
* fix conflicts

* fix

* happy rebasing
2020-07-01 22:43:18 +02:00
Patrick von Platen
fe81f7d12c
finish reformer qa head (#5433) 2020-07-01 12:27:14 -04:00
Sylvain Gugger
6c55e9fc32
Fix dropdown bug in searches (#5440)
* Trigger CI

* Fix dropdown bug in searches
2020-07-01 11:02:59 -04:00
Sylvain Gugger
4ade7491f4
Fix examples titles and optimization doc page (#5408) 2020-07-01 08:11:25 -04:00
Sylvain Gugger
87716a6d07
Documentation for the Trainer API (#5383)
* Documentation for the Trainer API

* Address review comments

* Address comments
2020-06-30 11:43:43 -04:00
Sylvain Gugger
0607b88945
How to share model cards with the CLI (#5374)
* How to share model cards

* Switch the two options

* Fix bad copy/cut

* Julien's suggestion
2020-06-30 08:59:32 -04:00
Lysandre Debut
b9ee87f5c7
Doc for v3.0.0 (#5366)
* Doc for v3.0.0

* Update docs/source/_static/js/custom.js

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/_static/js/custom.js

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-06-29 11:08:54 -04:00
Lysandre
b62ca59527 Release: v3.0.0 2020-06-29 10:40:13 -04:00
Patrick von Platen
4bcc35cd69
[Docs] Benchmark docs (#5360)
* first doc version

* add benchmark docs

* fix typos

* improve README

* Update docs/source/benchmarks.rst

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* fix naming and docs

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-06-29 16:08:57 +02:00
Julien Chaumond
c950fef545 [docs] Small tweaks to #5323 2020-06-29 14:24:33 +02:00
Sylvain Gugger
1af58c0706
New model sharing tutorial (#5323) 2020-06-27 11:10:02 -04:00
Thomas Wolf
601d4d699c
[tokenizers] Updates data processors, docstring, examples and model cards to the new API (#5308)
* remove references to old API in docstring - update data processors

* style

* fix tests - better type checking error messages

* better type checking

* include awesome fix by @LysandreJik for #5310

* updated doc and examples
2020-06-26 19:48:14 +02:00
Joe Davison
2ffef0d0c7
Training & fine-tuning quickstart (#5034)
* add initial fine-tuning guide

* split code blocks to smaller segments

* fix up trianer section of fine-tune doc

* a few last typos

* Update usage -> task summary link

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-06-25 15:11:11 -06:00
Lysandre Debut
364a5ae1f0
Refactor Code samples; Test code samples (#5036)
* Refactor code samples

* Test docstrings

* Style

* Tokenization examples

* Run rust of tests

* First step to testing source docs

* Style and BART comment

* Test the remainder of the code samples

* Style

* let to const

* Formatting fixes

* Ready for merge

* Fix fixture + Style

* Fix last tests

* Update docs/source/quicktour.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Addressing @sgugger's comments + Fix MobileBERT in TF

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-06-25 16:46:00 -04:00
Sylvain Gugger
d12ceb48ba
Tokenization tutorial (#5257)
* All done

* Link to the tutorial

* Typo fixes

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Add metnion of the return_xxx args

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-06-24 18:43:20 -04:00
Sylvain Gugger
6894b486d0
Fix version controller links (for realsies) (#5251) 2020-06-24 12:13:43 -04:00
Sylvain Gugger
609e0c583f
Fix links (#5248) 2020-06-24 11:35:55 -04:00
Sylvain Gugger
7c41057d50
Add hugs (#5225) 2020-06-24 07:56:14 -04:00
Sylvain Gugger
173528e368
Add version control menu (#5222)
* Add version control menu

* Constify things

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-23 17:05:12 -04:00
Sylvain Gugger
417e492f1e
Quick tour (#5145)
* Quicktour part 1

* Update

* All done

* Typos

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Address comments in quick tour

* Update docs/source/quicktour.rst

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update from feedback

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-06-22 16:08:09 -04:00