Commit Graph

435 Commits

Author SHA1 Message Date
Sylvain Gugger
820cb97a3f
Organize test jobs (#19058)
* Tests conditional run

* Syntax

* Deps

* Try early exit

* Another way

* Test with no tests to run

* Test all

* Typo

* Try this way

* With tests to run

* Mostly finished

* Typo

* With a modification in one file only

* No change, no tests

* Final cleanup

* Address review comments
2022-09-16 09:19:51 -04:00
Sylvain Gugger
f7ce4f1ff7
Fix custom tokenizers test (#19052)
* Fix CI for custom tokenizers

* Add nightly tests

* Run CI, run!

* Fix paths

* Typos

* Fix test
2022-09-15 11:31:09 -04:00
Sylvain Gugger
3774010161
Automate check for new pipelines and metadata update (#19029)
* Automate check for new pipelines and metadata update

* Add Datasets to quality extra
2022-09-14 14:06:49 -04:00
Craig Chan
fbf382c84d
Determine framework automatically before ONNX export (#18615)
* Automatic detection for framework to use when exporting to ONNX

* Log message change

* Incorporating PR comments, adding unit test

* Adding tf for pip install for run_tests_onnxruntime CI

* Restoring past changes to circleci yaml and test_onnx_v2.py, tests moved to tests/onnx/test_features.py

* Fixup

* Adding test to fetcher

* Updating circleci config to log more

* Changing test class name

* Comment typo fix in tests/onnx/test_features.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Moving torch_str/tf_str to self.framework_pt/tf

* Remove -rA flag in circleci config

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2022-08-25 16:31:34 +02:00
Yih-Dar
84beb8a49b
Unpin detectron2 (#18727)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-08-23 11:10:07 +02:00
Yih-Dar
2c947d2939
Ping detectron2 for CircleCI tests (#18680)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-08-18 12:57:18 +02:00
Matt
6eb51450fa
TF Examples Rewrite (#18451)
* Finished QA example

* Dodge a merge conflict

* Update text classification and LM examples

* Update NER example

* New Keras metrics WIP, fix NER example

* Update NER example

* Update MC, summarization and translation examples

* Add XLA warnings when shapes are variable

* Make sure batch_size is consistently scaled by num_replicas

* Add PushToHubCallback to all models

* Add docs links for KerasMetricCallback

* Add docs links for prepare_tf_dataset and jit_compile

* Correct inferred model names

* Don't assume the dataset has 'lang'

* Don't assume the dataset has 'lang'

* Write metrics in text classification

* Add 'framework' to TrainingArguments and TFTrainingArguments

* Export metrics in all examples and add tests

* Fix training args for Flax

* Update command line args for translation test

* make fixup

* Fix accidentally running other tests in fp16

* Remove do_train/do_eval from run_clm.py

* Remove do_train/do_eval from run_mlm.py

* Add tensorflow tests to circleci

* Fix circleci

* Update examples/tensorflow/language-modeling/run_mlm.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update examples/tensorflow/test_tensorflow_examples.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update examples/tensorflow/translation/run_translation.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update examples/tensorflow/token-classification/run_ner.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Fix save path for tests

* Fix some model card kwargs

* Explain the magical -1000

* Actually enable tests this time

* Skip text classification PR until we fix shape inference

* make fixup

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2022-08-10 16:49:51 +01:00
Sylvain Gugger
70b0d4e193
Fix compatibility with 1.12 (#17925)
* Fix compatibility with 1.12

* Remove pin from examples requirements

* Update torch scatter version

* Fix compatibility with 1.12

* Remove pin from examples requirements

* Update torch scatter version

* fix torch.onnx.symbolic_opset12 import

* Reject bad version

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-08-08 09:53:08 -04:00
Yih-Dar
6649133124
Add PYTEST_TIMEOUT for CircleCI test jobs (#18251)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-07-26 17:57:59 +02:00
Yih-Dar
4b1ed7979f
update cache to v0.5 (#18203)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-07-20 08:14:10 +02:00
Yih-Dar
05ed569c79
Use next-gen CircleCI convenience images (#18197)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-07-19 15:43:05 +02:00
Sylvain Gugger
1b749a7f8d
Sort doc toc (#18034)
* Add script to sort doc ToC

* Style and fixes

* Add check to quality job
2022-07-07 08:17:58 -04:00
Yih-Dar
216499bfcc
Fix CI tests hang forever (#17471)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-06-02 10:30:54 +02:00
Joao Gante
ca1f1c8685
CLI: tool to convert PT into TF weights and open hub PR (#17497) 2022-06-01 18:52:07 +01:00
NielsRogge
31ee80d556
Add LayoutLMv3 (#17060)
* Make forward pass work

* More improvements

* Remove unused imports

* Remove timm dependency

* Improve loss calculation of token classifier

* Fix most tests

* Add docs

* Add model integration test

* Make all tests pass

* Add LayoutLMv3FeatureExtractor

* Improve integration test + make fixup

* Add example script

* Fix style

* Add LayoutLMv3Processor

* Fix style

* Add option to add visual labels

* Make more tokenizer tests pass

* Fix more tests

* Make more tests pass

* Fix bug and improve docs

* Fix import of processors

* Improve docstrings

* Fix toctree and improve docs

* Fix auto tokenizer

* Move tests to model folder

* Move tests to model folder

* change default behavior add_prefix_space

* add prefix space for fast

* add_prefix_spcae set to True for Fast

* no space before `unique_no_split` token

* add test to hightligh special treatment of added tokens

* fix `test_batch_encode_dynamic_overflowing` by building a long enough example

* fix `test_full_tokenizer` with add_prefix_token

* Fix tokenizer integration test

* Make the code more readable

* Add tests for LayoutLMv3Processor

* Fix style

* Add model to README and update init

* Apply suggestions from code review

* Replace asserts by value errors

* Add suggestion by @ducviet00

* Add model to doc tests

* Simplify script

* Improve README

* a step ahead to fix

* Update pair_input_test

* Make all tokenizer tests pass - phew

* Make style

* Add LayoutLMv3 to CI job

* Fix auto mapping

* Fix CI job name

* Make all processor tests pass

* Make tests of LayoutLMv2 and LayoutXLM consistent

* Add copied from statements to fast tokenizer

* Add copied from statements to slow tokenizer

* Remove add_visual_labels attribute

* Fix tests

* Add link to notebooks

* Improve docs of LayoutLMv3Processor

* Fix reference to section

Co-authored-by: SaulLu <lucilesaul.com@gmail.com>
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-05-24 09:53:45 +02:00
Sylvain Gugger
ddb1a47ec8
Automatically sort auto mappings (#17250)
* Automatically sort auto mappings

* Better class extraction

* Some auto class magic

* Adapt test and underlying behavior

* Remove re-used config

* Quality
2022-05-16 13:24:20 -04:00
Sylvain Gugger
afe5d42d8d
Black preview (#17217)
* Black preview

* Fixup too!

* Fix check copies

* Use the same version as the CI

* Bump black
2022-05-12 16:25:55 -04:00
Zachary Mueller
2fbb237967
Add the auto_find_batch_size capability from Accelerate into Trainer (#17068)
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

- Adds auto_batch_size finder 
- Moves training loop to an inner training loop
2022-05-09 12:29:18 -04:00
Zachary Mueller
ef20390291
Update to build via git for accelerate (#17084) 2022-05-04 09:42:36 -04:00
lewtun
4bb1d0ec84
Skip RoFormer ONNX test if rjieba not installed (#16981)
* Skip RoFormer ONNX test if rjieba not installed

* Update deps table

* Skip RoFormer serialization test

* Fix RoFormer vocab

* Add rjieba to CircleCI
2022-05-04 10:04:10 +02:00
Yih-Dar
19420fd99e
Move test model folders (#17034)
* move test model folders (TODO: fix imports and others)

* fix (potentially partially) imports (in model test modules)

* fix (potentially partially) imports (in tokenization test modules)

* fix (potentially partially) imports (in feature extraction test modules)

* fix import utils.test_modeling_tf_core

* fix path ../fixtures/

* fix imports about generation.test_generation_flax_utils

* fix more imports

* fix fixture path

* fix get_test_dir

* update module_to_test_file

* fix get_tests_dir from wrong transformers.utils

* update config.yml (CircleCI)

* fix style

* remove missing imports

* update new model script

* update check_repo

* update SPECIAL_MODULE_TO_TEST_MAP

* fix style

* add __init__

* update self-scheduled

* fix add_new_model scripts

* check one way to get location back

* python setup.py build install

* fix import in test auto

* update self-scheduled.yml

* update slack notification script

* Add comments about artifact names

* fix for yolos

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-05-03 14:42:02 +02:00
Sylvain Gugger
1073f00d4e
Clean up setup.py (#17045)
* Clean up setup.py

* Trigger CI

* Upgrade Python used
2022-05-02 12:58:17 -04:00
Yih-Dar
ede5e04191
Add a check on config classes docstring checkpoints (#17012)
* Add the check

* add missing ckpts

* add a list to ignore

* call the added check script

* better regex pattern

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-04-30 10:40:46 +02:00
Stas Bekman
5da33f8729
[modeling utils] revamp from_pretrained(..., low_cpu_mem_usage=True) + tests (#16657)
* add low_cpu_mem_usage tests

* wip: revamping

* wip

* install /usr/bin/time

* wip

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* fix assert

* put the wrapper back

* cleanup; switch to bert-base-cased

* Trigger CI

* Trigger CI
2022-04-14 18:10:05 -07:00
Zachary Mueller
89293a0f6b
Make nightly install dev accelerate (#16783) 2022-04-14 09:41:02 -04:00
Zachary Mueller
d57da99237
Add tests for no_trainer and fix existing examples (#16656)
* Fixed some bugs involving saving during epochs
* Added tests mimicking the existing examples tests
* Added in json exporting to all `no_trainer` examples for consistency
2022-04-08 10:03:56 -04:00
Sylvain Gugger
473709fc76
Use doc builder styler (#16412)
* Config update

* Use doc-builder styler

* Cleanup

* Adapt import

* We need it there too!
2022-03-28 07:45:18 -04:00
Lysandre Debut
eca77f4719
Updates the default branch from master to main (#16326)
* Updates the default branch from master to main

* Links from `master` to `main`

* Typo

* Update examples/flax/README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-03-23 03:46:59 -04:00
Lysandre Debut
0868fdef85
Fix torch-scatter version (#16072) 2022-03-11 09:03:27 -05:00
lewtun
50dd314d93
Add ONNX export for ViT (#15658)
* Add ONNX support for ViT

* Refactor to use generic preprocessor

* Add vision dep to tests

* Extend ONNX slow tests to ViT

* Add dummy image generator

* Use model_type to determine modality

* Add deprecation warnings for tokenizer argument

* Add warning when overwriting the preprocessor

* Add optional args to docstrings

* Add minimum PyTorch version to OnnxConfig

* Refactor OnnxConfig class variables from CONSTANT_NAME to snake_case

* Add reasonable value for default atol

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-03-09 17:36:59 +01:00
SaulLu
e93763d420
fix CLIP fast tokenizer and change some properties of the slow version (#15067)
Very big changes concerning the tokenizer fast of CLIP which did not correspond to the tokenizer slow of CLIP

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-02-18 10:21:30 +01:00
cody-moveworks
a54961c5f7
Make OpenAIGPTTokenizer work with SpaCy 2.x and 3.x (#15019)
* Make OpenAIGPTTokenizer work with SpaCy 3.x

SpaCy 3.x introduced an API change to creating the tokenizer that
breaks OpenAIGPTTokenizer. The old API for creating the tokenizer in
SpaCy 2.x no longer works under SpaCy 3.x, but the new API for creating
the tokenizer in SpaCy 3.x DOES work under SpaCy 2.x. Switching to the
new API should allow OpenAIGPTTokenizer to work under both SpaCy 2.x and
SpaCy 3.x versions.

* Add is_spacy_available and is_ftfy_available methods to file utils

* Add spacy and ftfy unittest decorator to testing utils

* Add tests for OpenAIGPTTokenizer that require spacy and ftfy

* Modify CircleCI config to run tests that require spacy and ftfy

* Remove unneeded unittest decorators are reuse test code

* Run make fixup
2022-01-10 07:53:20 -05:00
Sylvain Gugger
87e6e4fe5c
Doc styler v2 (#14950)
* New doc styler

* Fix issue with args at the start

* Code sample fixes

* Style code examples in MDX

* Fix more patterns

* Typo

* Typo

* More patterns

* Do without black for now

* Get more info in error

* Docstring style

* Re-enable check

* Quality

* Fix add_end_docstring decorator

* Fix docstring
2021-12-27 16:31:21 -05:00
Sylvain Gugger
7af80f6618
Convert docstrings of modeling files (#14850)
* Convert file_utils docstrings to Markdown

* Test on BERT

* Return block indent

* Temporarily disable doc styler

* Remove from quality checks as well

* Remove doc styler mess

* Remove check from circleCI

* Fix typo

* Convert file_utils docstrings to Markdown

* Test on BERT

* Return block indent

* Temporarily disable doc styler

* Remove from quality checks as well

* Remove doc styler mess

* Remove check from circleCI

* Fix typo

* Let's go on all other model files

* Add templates too

* Styling and quality
2021-12-21 05:37:32 -05:00
Patrick von Platen
c4a96cecbc
Wav2Vec2 meets phonemes (#14353)
* up

* add tokenizer

* improve more

* finish tokenizer

* finish

* adapt speech recognition script

* adapt convert

* more fixes

* more fixes

* update phonemizer wav2vec2

* better naming

* fix more tests

* more fixes swedish

* correct tests

* finish

* improve script

* remove file

* up

* lets get those 100 model architectures until the end of the month

* make fix-copies

* correct more

* correct script

* more fixes

* more fixes

* add to docs

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* replace assert

* fix copies

* fix docs

* new try docs

* boom boom

* update

* add phonemizer to audio tests

* make fix-copies

* up

* upload models

* some changes

* Update tests/test_tokenization_wav2vec2_phoneme.py

Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>

* more fixes

* remove @

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
2021-12-17 19:56:44 +01:00
Sylvain Gugger
e9800122a6 Add kenlm dep to missing tests 2021-12-08 19:59:44 -05:00
Patrick von Platen
961732c276
[Wav2Vec2] PyCTCDecode Integration to support language model boosted decoding (#14339)
* up

* up

* up

* make it cleaner

* correct

* make styhahalal

* add more tests

* finish

* small fix

* make style

* up

* tryout to solve cicrle ci

* up

* fix more tests

* fix more tests

* apply sylvains suggestions

* fix import

* correct docs

* add pyctcdecode only to speech tests

* fix more tests

* add tf, flax and pt tests

* add pt

* fix last tests

* fix more tests

* Apply suggestions from code review

* change lines

* Apply suggestions from code review

Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>

* correct tests

* correct tests

* add doc string

Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
2021-12-08 12:07:54 +01:00
Suraj Patil
c824d7ed48
add flax example tests in CI workflow (#14637) 2021-12-06 14:50:43 +05:30
Suraj Patil
c5bd732ac6
Add Flax example tests (#14599)
* add test for glue

* add tests for clm

* fix clm test

* add summrization tests

* more tests

* fix few tests

* add test for t5 mlm

* fix t5 mlm test

* fix tests for multi device

* cleanup

* ci job

* fix metric file name

* make t5 more robust
2021-12-06 10:48:58 +05:30
Lysandre Debut
e4c67d60ec
Python 3.6 -> Python 3.7 for TF runs (#14598) 2021-12-02 04:09:17 -05:00
Sylvain Gugger
4df7d05a87
Doc new front (#14590)
* Convert PretrainedConfig doc to Markdown

* Use syntax

* Add necessary doc files (#14496)

* Doc fixes (#14499)

* Fixes for the new front

* Convert DETR file for table

* Title is needed

* Simplify a bit

* Even simpler

* Remove imports

* Fix typo in toctree (#14516)

* Fix checkpoints badge

* Update versions.yml format (#14517)

* Doc new front github actions (#14512)

* Doc new front github actions

* Fix docstring

* Fix feature extraction utils import (#14515)

* Address Julien's comments

* Push to doc-builder

* Ready for merge

* Remove old build and deploy

* Doc misc fixes (#14583)

* Rm versions.yml from doc

* Fix converting.rst

* Rm pretrained_models from toctree

* Fix index links (#14567)

* Fix links in README

* Localized READMEs

* Fix copy script

* Fix find doc script

* Update README_ko.md

Co-authored-by: Julien Chaumond <julien@huggingface.co>

Co-authored-by: Julien Chaumond <julien@huggingface.co>

* Adapt build command to new CLI tools (#14578)

* Fix typo

* Fix doc interlinks (#14589)

* Convert PretrainedConfig doc to Markdown

* Use syntax

* Rm pattern <[a-z]+(.html).*>

* Rm huggingface.co/transformers/master

* Rm .html

* Rm .html from index.mdx

* Rm .html from model_summary.rst

* Update index.mdx rm html

* Update remove .html

* Fix inner doc links

* Fix interlink in preprocssing.rst

* Update pr_checks

Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Convert PretrainedConfig doc to Markdown

* Use syntax

* Add necessary doc files (#14496)

* Doc fixes (#14499)

* Fixes for the new front

* Convert DETR file for table

* Title is needed

* Simplify a bit

* Even simpler

* Remove imports

* Fix checkpoints badge

* Fix typo in toctree (#14516)

* Update versions.yml format (#14517)

* Doc new front github actions (#14512)

* Doc new front github actions

* Fix docstring

* Fix feature extraction utils import (#14515)

* Address Julien's comments

* Push to doc-builder

* Ready for merge

* Remove old build and deploy

* Doc misc fixes (#14583)

* Rm versions.yml from doc

* Fix converting.rst

* Rm pretrained_models from toctree

* Fix index links (#14567)

* Fix links in README

* Localized READMEs

* Fix copy script

* Fix find doc script

* Update README_ko.md

Co-authored-by: Julien Chaumond <julien@huggingface.co>

Co-authored-by: Julien Chaumond <julien@huggingface.co>

* Adapt build command to new CLI tools (#14578)

* Fix typo

* Fix doc interlinks (#14589)

* Convert PretrainedConfig doc to Markdown

* Use syntax

* Rm pattern <[a-z]+(.html).*>

* Rm huggingface.co/transformers/master

* Rm .html

* Rm .html from index.mdx

* Rm .html from model_summary.rst

* Update index.mdx rm html

* Update remove .html

* Fix inner doc links

* Fix interlink in preprocssing.rst

* Update pr_checks

Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Styling

Co-authored-by: Mishig Davaadorj <mishig.davaadorj@coloradocollege.edu>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
2021-12-01 14:13:02 -05:00
Kamal Raj
c468a87a69
Tapas tf (#13393)
* TF Tapas first commit

* updated docs

* updated logger message

* updated pytorch weight conversion
script to support scalar array

* added use_cache to tapas model config to
work properly with tf input_processing

* 1. rm embeddings_sum
2. added # Copied
3. + TFTapasMLMHead
4. and lot other small fixes

* updated docs

* + test for tapas

* updated testing_utils to check
is_tensorflow_probability_available

* converted model logits post processing using
numpy to work with both PT and TF models

* + TFAutoModelForTableQuestionAnswering

* added TF support

* added test for
TFAutoModelForTableQuestionAnswering

* added test for
TFAutoModelForTableQuestionAnswering pipeline

* updated auto model docs

* fixed typo in import

* added tensorflow_probability to run tests

* updated MLM head

* updated tapas.rst with TF  model docs

* fixed optimizer import in docs

* updated convert to np
data from pt model is not
`transformers.tokenization_utils_base.BatchEncoding`
after pipeline upgrade

* updated pipeline:
1. with torch.no_gard removed, pipeline forward handles
2. token_type_ids converted to numpy

* updated docs.

* removed `use_cache` from config

* removed floats_tensor

* updated code comment

* updated Copyright Year and
logits_aggregation Optional

* updated docs and comments

* updated docstring

* fixed model weight loading

* make fixup

* fix indentation

* added tf slow pipeline test

* pip upgrade

* upgrade python to 3.7

* removed from_pt from tests

* revert commit f18cfa9
2021-11-30 11:07:55 +01:00
NielsRogge
3772af49ce
[Tests] Improve vision tests (#14458)
* Improve tests

* Install vision for tf tests
2021-11-24 15:22:20 +01:00
Shang Zhang
a59e7c1ed4
Add QDQBert model and quantization examples of SQUAD task (#14066)
* clean up branch for add-qdqbert-model

* README update for QAT example; update docstrings in modeling_qdqbert.py

* Update qdqbert.rst

* Update README.md

* Update README.md

* calibration data using traning set; QAT example runs in fp32

* re-use BERTtokenizer for qdqbert

* Update docs/source/model_doc/qdqbert.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/model_doc/qdqbert.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/model_doc/qdqbert.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* remove qdqbert tokenizer

* Update qdqbert.rst

* update evaluate-hf-trt-qa.py

* update configuration_qdqbert.py

* update modeling_qdqbert.py: add copied statement; replace assert with ValueError

* update copied from statement

* add is_quantization_available; run make fix-copies

* unittest add require_quantization

* add backend dependency to qdqbert model

* update README; update evaluate script; make style

* lint

* docs qdqbert update

* circleci build_doc add pytorch-quantization for qdqbert

* update README

* update example readme with instructions to upgrade TensorRT to 8.2

* Update src/transformers/models/qdqbert/configuration_qdqbert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/qdqbert/configuration_qdqbert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/qdqbert/configuration_qdqbert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/qdqbert/configuration_qdqbert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* change quantization to pytorch_quantization for backend requirement

* feed_forward_chunking not supported in QDQBert

* make style

* update model docstrings and comments in testing scripts

* rename example to quantization-qdqbert; rename example scripts from qat to quant

* Update src/transformers/models/qdqbert/modeling_qdqbert.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* rm experimental functions in quant_trainer

* qa cleanup

* make fix-copies for docs index.rst

* fix doctree; use post_init() for qdqbert

* fix early device assignment for qdqbert

* fix CI:Model templates runner

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-11-19 13:33:39 -05:00
Lysandre Debut
331c3d2aa0
Add GitPython to quality tools (#14459)
* Update setup.py

* Update setup.py

* Update setup.py

* Remove GitPython install
2021-11-19 08:43:48 -05:00
Lysandre
c6c075544d Docs for version v4.12.5 2021-11-17 11:39:12 -05:00
Lysandre
888fb21159 Docs for v4.12.4 2021-11-16 17:40:58 -05:00
Sylvain Gugger
f0d6e952c0
Quality explain (#14264)
* Start PR doc

* Cleanup the quality checks and document them

* Add reference in the contributing guide

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Rename file as per review suggestion

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-11-03 17:43:19 -04:00
Lysandre
9fc1951711 Docs for v4.12.2 2021-10-29 14:51:05 -04:00
Lysandre
513fa30a63 Docs for v4.12.1 2021-10-29 13:49:50 -04:00
Lysandre Debut
63d91f449c
Torch 1.10 (#14169)
* Torch 1.10

* torch scatter for 1.10

* style

* Skip tests
ok
2021-10-29 13:43:43 -04:00
Sylvain Gugger
4ab6a4a086
Fix pipeline tests env and fetch (#14209)
* Fix pipeline tests env and fetch

* Fix quality
2021-10-29 09:35:05 -04:00
Lysandre
b8fad022a0 v4.13.0.dev0 2021-10-28 12:56:46 -04:00
Lysandre Debut
5b317f7ea4
Scatter dummies + skip pipeline tests (#13996)
* Scatter dummies + skip pipeline tests

* Add torch scatter to build docs
2021-10-14 15:30:27 -04:00
Lysandre
5be59a3649 Deploy docs for v4.11.3 2021-10-06 12:58:47 -04:00
Sylvain Gugger
5f25855b3e Update doc for v4.11.2 2021-09-30 11:58:33 -04:00
Sylvain Gugger
cf4aa3597f Update doc for v4.11.1 2021-09-29 12:09:40 -04:00
Lysandre
11c69b8045 Docs for version v4.11.0 2021-09-27 14:19:38 -04:00
Sylvain Gugger
a8ec002926
Update test dependence for torch examples (#13738) 2021-09-25 18:47:39 +02:00
Sylvain Gugger
af5c6ae5ed
Properly use test_fetcher for examples (#13604)
* Properly use test_fetcher for examples

* Fake example modification

* Fake modeling file modification

* Clean fake modifications

* Run example tests for any modification.
2021-09-16 15:13:00 -04:00
patrickvonplaten
72ec2f3eb5 Docs for v4.10.1 2021-09-10 16:45:19 +02:00
Sylvain Gugger
c1b20e42f5 Redeploy stable documentation 2021-09-01 09:21:50 -04:00
Li-Huai (Allan) Lin
85cb447766 Revert "Correct wrong function signatures on the docs website (#13198)"
This reverts commit ffecfea949.
2021-09-01 09:17:08 -04:00
Lysandre
e53af030c0 Re-deploy documentation 2021-08-31 16:18:14 +02:00
Lysandre
5ee67a4412 Docs for v4.10.0 2021-08-31 16:02:31 +02:00
Patrick von Platen
062300ba7f
[Testing] Add Flax Tests on GPU, Add Speech and Vision to Flax & TF tests (#13313)
* up

* finish

* Apply suggestions from code review

* apply Lysandres suggestions

* adapt circle ci as well

* finish

* Update setup.py
2021-08-31 11:08:22 +02:00
Li-Huai (Allan) Lin
ffecfea949
Correct wrong function signatures on the docs website (#13198)
* Correct outdated function signatures on website.

* Upgrade sphinx to 3.5.4 (latest 3.x)

* Test

* Test

* Test

* Test

* Test

* Test

* Revert unnecessary changes.

* Change sphinx version to 3.5.4"

* Test python 3.7.11
2021-08-30 11:40:25 -04:00
NielsRogge
b6ddb08a66
Add LayoutLMv2 + LayoutXLM (#12604)
* First commit

* Make style

* Fix dummy objects

* Add Detectron2 config

* Add LayoutLMv2 pooler

* More improvements, add documentation

* More improvements

* Add model tests

* Add clarification regarding image input

* Improve integration test

* Fix bug

* Fix another bug

* Fix another bug

* Fix another bug

* More improvements

* Make more tests pass

* Make more tests pass

* Improve integration test

* Remove gradient checkpointing and add head masking

* Add integration test

* Add LayoutLMv2ForSequenceClassification to the tests

* Add LayoutLMv2ForQuestionAnswering

* More improvements

* More improvements

* Small improvements

* Fix _LazyModule

* Fix fast tokenizer

* Move sync_batch_norm to a separate method

* Replace dummies by requires_backends

* Move calculation of visual bounding boxes to separate method + update README

* Add models to main init

* First draft

* More improvements

* More improvements

* More improvements

* More improvements

* More improvements

* Remove is_split_into_words

* More improvements

* Simply tesseract - no use of pandas anymore

* Add LayoutLMv2Processor

* Update is_pytesseract_available

* Fix bugs

* Improve feature extractor

* Fix bug

* Add print statement

* Add truncation of bounding boxes

* Add tests for LayoutLMv2FeatureExtractor and LayoutLMv2Tokenizer

* Improve tokenizer tests

* Make more tokenizer tests pass

* Make more tests pass, add integration tests

* Finish integration tests

* More improvements

* More improvements - update API of the tokenizer

* More improvements

* Remove support for VQA training

* Remove some files

* Improve feature extractor

* Improve documentation and one more tokenizer test

* Make quality and small docs improvements

* Add batched tests for LayoutLMv2Processor, remove fast tokenizer

* Add truncation of labels

* Apply suggestions from code review

* Improve processor tests

* Fix failing tests and add suggestion from code review

* Fix tokenizer test

* Add detectron2 CI job

* Simplify CI job

* Comment out non-detectron2 jobs and specify number of processes

* Add pip install torchvision

* Add durations to see which tests are slow

* Fix tokenizer test and make model tests smaller

* Frist draft

* Use setattr

* Possible fix

* Proposal with configuration

* First draft of fast tokenizer

* More improvements

* Enable fast tokenizer tests

* Make more tests pass

* Make more tests pass

* More improvements

* Addd padding to fast tokenizer

* Mkae more tests pass

* Make more tests pass

* Make all tests pass for fast tokenizer

* Make fast tokenizer support overflowing boxes and labels

* Add support for overflowing_labels to slow tokenizer

* Add support for fast tokenizer to the processor

* Update processor tests for both slow and fast tokenizers

* Add head models to model mappings

* Make style & quality

* Remove Detectron2 config file

* Add configurable option to label all subwords

* Fix test

* Skip visual segment embeddings in test

* Use ResNet-18 backbone in tests instead of ResNet-101

* Proposal

* Re-enable all jobs on CI

* Fix installation of tesseract

* Fix failing test

* Fix index table

* Add LayoutXLM doc page, first draft of code examples

* Improve documentation a lot

* Update expected boxes for Tesseract 4.0.0 beta

* Use offsets to create labels instead of checking if they start with ##

* Update expected boxes for Tesseract 4.1.1

* Fix conflict

* Make variable names cleaner, add docstring, add link to notebooks

* Revert "Fix conflict"

This reverts commit a9b46ce9afe47ebfcfe7b45e6a121d49e74ef2c5.

* Revert to make integration test pass

* Apply suggestions from @LysandreJik's review

* Address @patrickvonplaten's comments

* Remove fixtures DocVQA in favor of dataset on the hub

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-08-30 12:35:42 +02:00
Sylvain Gugger
b0a917c48a
Fix CircleCI nightly tests (#13113) 2021-08-13 08:57:30 +02:00
Sylvain Gugger
9e9b8f1d99
Roll out the test fetcher on push tests (#13055)
* Use test fetcher for push tests as well

* Force diff with last commit for circleCI on master

* Fix syntax error

* Style

* Schedule nightly tests
2021-08-10 14:54:52 +02:00
Lysandre
a8bf2fa76e Documentation for patch v4.9.2 2021-08-09 16:14:17 +02:00
Sylvain Gugger
a492aec82d Update doc 2021-07-26 10:27:14 -04:00
Lysandre
40de2d5a4f Docs for v4.10.0dev0 2021-07-22 12:52:25 +02:00
Stas Bekman
7fae535052
add troubleshooting docs (#12791) 2021-07-20 03:32:02 -04:00
Sylvain Gugger
084873b025
Only test the files impacted by changes in the diff (#12644)
* Base test

* More test

* Fix mistake

* Add a docstring change

* Add doc ignore

* Add changes

* Add recursive dep search

* Add recursive dep search

* save

* Finalize test mapping

* Fix bug

* Print prettier

* Ignore comments and empty lines

* Make script runnable from anywhere

* Need dev install

* Like that

* Adapt

* Add as artifact

* Try on torch tests

* Fix yaml error

* Install GitPython

* Apply everywhere

* Be more defensive

* Revert to all tests if something is wrong

* Install GitPython

* Test if there are tests before launching.

* Fixes

* Fixes

* Fixes

* Fixes

* Bash syntax is horrible

* Be less stupid

* Try differently

* Typo

* Typo

* Typo

* Style

* Better name

* Escape quotes

* Ignore black unhelpful re-formatting

* Not a docstring

* Deal with inits in dependency map

* Run all tests once PR is merged.

* Add last job

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Stronger dependencies gather

* Ignore empty lines too!

* Clean up

* Fix quality

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-07-14 10:56:55 -04:00
Funtowicz Morgan
2aa3cd935d
[RFC] Laying down building stone for more flexible ONNX export capabilities (#11786)
* Laying down building stone for more flexible ONNX export capabilities

* Ability to provide a map of config key to override before exporting.

* Makes it possible to export BART with/without past keys.

* Supports simple mathematical syntax for OnnxVariable.repeated

* Effectively apply value override from onnx config for model

* Supports export with additional features such as with-past for seq2seq

* Store the output path directly in the args for uniform usage across.

* Make BART_ONNX_CONFIG_* constants and fix imports.

* Support BERT model.

* Use tokenizer for more flexibility in defining the inputs of a model.

* Add TODO as remainder to provide the batch/sequence_length as CLI args

* Enable optimizations to be done on the model.

* Enable GPT2 + past

* Improve model validation with outputs containing nested structures

* Enable Roberta

* Enable Albert

* Albert requires opset >= 12

* BERT-like models requires opset >= 12

* Remove double printing.

* Enable XLM-Roberta

* Enable DistilBERT

* Disable optimization by default

* Fix missing setattr when applying optimizer_features

* Add value field to OnnxVariable to define constant input (not from tokenizers)

* Add T5 support.

* Simplify model type retrieval

* Example exporting token_classification pipeline for DistilBERT.

* Refactoring to package `transformers.onnx`

* Solve circular dependency & __main__

* Remove unnecessary imports in `__init__`

* Licences

* Use @Narsil's suggestion to forward the model's configuration to the ONNXConfig to avoid interpolation.

* Onnx export v2 fixes (#12388)

* Tiny fixes
Remove `convert_pytorch` from onnxruntime-less runtimes
Correct reference to model

* Style

* Fix Copied from

* LongFormer ONNX config.

* Removed optimizations

* Remvoe bad merge relicas.

* Remove unused constants.

* Remove some deleted constants from imports.

* Fix unittest to remove usage of PyTorch model for onnx.utils.

* Fix distilbert export

* Enable ONNX export test for supported model.

* Style.

* Fix lint.

* Enable all supported default models.

* GPT2 only has one output

* Fix bad property name when overriding config.

* Added unittests and docstrings.

* Disable with_past tests for now.

* Enable outputs validation for default export.

* Remove graph opt lvls.

* Last commit with on-going past commented.

* Style.

* Disabled `with_past` for now

* Remove unused imports.

* Remove framework argument

* Remove TFPreTrainedModel reference

* Add documentation

* Add onnxruntime tests to CircleCI

* Add test

* Rename `convert_pytorch` to `export`

* Use OrderedDict for dummy inputs

* WIP Wav2Vec2

* Revert "WIP Wav2Vec2"

This reverts commit f665efb04c92525c3530e589029f0ae7afdf603e.

* Style

* Use OrderedDict for I/O

* Style.

* Specify OrderedDict documentation.

* Style :)

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-07-08 10:54:42 -04:00
Lysandre
2870fd198f Bump CircleCI machine sizes 2021-07-06 17:46:39 +02:00
Lysandre
89073a95ba Document patch release v4.8.2 2021-06-30 14:39:52 +02:00
Stas Bekman
d25ad34c82
[CI] add dependency table sync verification (#12364)
* add dependency table sync verification

* improve the message

* improve the message

* revert

* ready to merge
2021-06-28 08:55:59 -07:00
Sylvain Gugger
5b1b5635d3 Document patch release v4.8.1 2021-06-24 10:15:15 -04:00
Sylvain Gugger
2150dfed31 v4.9.0.dev0 2021-06-23 13:31:19 -04:00
Lysandre
0daadc1919 Docs for v4.8.0 2021-06-17 18:17:42 +02:00
Lysandre Debut
3a960c4857
Support for torch 1.9.0 (#12224)
* Support for torch 1.9.0

* Torch scatter for 1.9.0

* Github Actions run on 1.9.0
2021-06-17 11:29:01 -04:00
Lysandre Debut
52c7ca0488
Temporarily deactivate torch-scatter while we wait for new release (#12181)
* Temporarily deactivate torch-scatter while we wait for new release

* torch-1.8.1 binary for scatter

* Revert to 1.8.0

* Pin torch dependency

* torchaudio and torchvision
2021-06-15 16:03:58 -04:00
NielsRogge
d3eacbb829
Add DETR (#11653)
* Squash all commits of modeling_detr_v7 branch into one

* Improve docs

* Fix tests

* Style

* Improve docs some more and fix most tests

* Fix slow tests of ViT, DeiT and DETR

* Improve replacement of batch norm

* Restructure timm backbone forward

* Make DetrForSegmentation support any timm backbone

* Fix name of output

* Address most comments by @LysandreJik

* Give better names for variables

* Conditional imports + timm in setup.py

* Address additional comments by @sgugger

* Make style, add require_timm and require_vision to testsé

* Remove train_backbone attribute of DetrConfig, add methods to freeze/unfreeze backbone

* Add png files to fixtures

* Fix type hint

* Add timm to workflows

* Add `BatchNorm2d` to the weight initialization

* Fix retain_grad test

* Replace model checkpoints by Facebook namespace

* Fix name of checkpoint in test

* Add user-friendly message when scipy is not available

* Address most comments by @patrickvonplaten

* Remove return_intermediate_layers attribute of DetrConfig and simplify Joiner

* Better initialization

* Scipy is necessary to get sklearn metrics

* Rename TimmBackbone to DetrTimmConvEncoder and rename DetrJoiner to DetrConvModel

* Make style

* Improve docs and add 2 community notebooks

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-06-09 11:51:13 -04:00
Sylvain Gugger
252082001d Fix v4.6.0 doc 2021-05-13 10:45:28 -04:00
Sylvain Gugger
cbbf49f644 Fix doc deployment 2021-05-13 10:34:14 -04:00
Lysandre
d77eb0cf92 Docs for v4.7.0.dev0 2021-05-12 17:08:35 +02:00
Sylvain Gugger
2ce0fb84cc
Make quality scripts work when one backend is missing. (#11573)
* Make quality scripts work when one backend is missing.

* Check env variable is properly set

* Add default

* With print statements

* Fix typo

* Set env variable

* Remove debug code
2021-05-04 09:53:44 -04:00
Sylvain Gugger
81a6c7cd39 Use 3 workers for torch tests 2021-04-23 18:47:46 -04:00
Sylvain Gugger
ca6b80cadb Wrong branch Sylvain... 2021-04-23 12:46:54 -04:00
Sylvain Gugger
3951fc55ee Try to trigger failure more 2021-04-23 12:44:54 -04:00
Sylvain Gugger
bf2e0cf70b
Trainer push to hub (#11328)
* Initial support for upload to hub

* push -> upload

* Fixes + examples

* Fix torchhub test

* Torchhub test I hate you

* push_model_to_hub -> push_to_hub

* Apply mixin to other pretrained models

* Remove ABC inheritance

* Add tests

* Typo

* Run tests

* Install git-lfs

* Change approach

* Add push_to_hub to all

* Staging test suite

* Typo

* Maybe like this?

* More deps

* Cache

* Adapt name

* Quality

* MOAR tests

* Put it in testing_utils

* Docs + torchhub last hope

* Styling

* Wrong method

* Typos

* Update src/transformers/file_utils.py

Co-authored-by: Julien Chaumond <julien@huggingface.co>

* Address review comments

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-04-23 09:17:37 -04:00
Sylvain Gugger
dabeb15292
Examples reorg (#11350)
* Base move

* Examples reorganization

* Update references

* Put back test data

* Move conftest

* More fixes

* Move test data to test fixtures

* Update path

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comments and clean

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-04-21 11:11:20 -04:00
Sylvain Gugger
893e51a53f Document v4.5.1 2021-04-13 11:28:17 -04:00
Sylvain Gugger
26212c14e5 Reactivate Megatron tests an use less workers 2021-04-09 18:09:53 -04:00
Kevin Canwen Xu
fb41f9f50c
Add a special tokenizer for CPM model (#11068)
* Add a special tokenizer for CPM model

* make style

* fix

* Add docs

* styles

* cpm doc

* fix ci

* fix the overview

* add test

* make style

* typo

* Custom tokenizer flag

* Add REAMDE.md

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-04-10 02:07:47 +08:00
Stas Bekman
97ccf67bb3
[setup] extras[docs] must include 'all' (#11148)
* extras[doc] must include 'all'

* fix

* better

* regroup
2021-04-08 18:10:44 -04:00
Lysandre
9853c5dd58 Development on v4.6.0dev0 2021-04-06 12:53:25 -04:00
Sylvain Gugger
b0d49fd536
Add a script to check inits are consistent (#11024) 2021-04-04 20:41:34 -04:00
NielsRogge
30677dc743
Add Vision Transformer and ViTFeatureExtractor (#10950)
* Squash all commits into one

* Update ViTFeatureExtractor to use image_utils instead of torchvision

* Remove torchvision and add Pillow

* Small docs improvement

* Address most comments by @sgugger

* Fix tests

* Clean up conversion script

* Pooler first draft

* Fix quality

* Improve conversion script

* Make style and quality

* Make fix-copies

* Minor docs improvements

* Should use fix-copies instead of manual handling

* Revert "Should use fix-copies instead of manual handling"

This reverts commit fd4e591bce.

* Place ViT in alphabetical order

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-01 11:16:05 -04:00
Sylvain Gugger
d0b3797a3b
Add more metadata to the user agent (#10972)
* Add more metadata to the user agent

* Fix typo

* Use DISABLE_TELEMETRY

* Address review comments

* Use global env

* Add clean envs on circle CI
2021-03-31 09:36:07 -04:00
Lysandre
3f48b2bc3e Update stable docs 2021-03-23 11:01:16 -04:00
Sylvain Gugger
21e86f99e6
Sort init import (#10801)
* Initial script

* Add script to properly sort imports in init.

* Add to the CI

* Update utils/custom_init_isort.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Separate scripts that change content from quality

* Move class_mapping_update to style_checks

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-03-19 16:17:13 -04:00
Sylvain Gugger
dcebe254fa Document v4.4.2 2021-03-18 15:19:25 -04:00
Lysandre
73fe40898d Docs for v4.4.1 2021-03-16 15:41:49 -04:00
Lysandre
1b5ce1e63b Development on v4.5.0dev0 2021-03-16 11:41:15 -04:00
Patrick von Platen
9f8619c6aa
Flax testing should not run the full torch test suite (#10725)
* make flax tests pytorch independent

* fix typo

* finish

* improve circle ci

* fix return tensors

* correct flax test

* re-add sentencepiece

* last tokenizer fixes

* finish maybe now
2021-03-16 08:05:37 +03:00
Suraj Patil
d26b37e744
Speech2TextTransformer (#10175)
* s2t

* fix config

* conversion script

* fix import

* add tokenizer

* fix tok init

* fix tokenizer

* first version working

* fix embeds

* fix lm head

* remove extra heads

* fix convert script

* handle encoder attn mask

* style

* better enc attn mask

* override _prepare_attention_mask_for_generation

* handle attn_maks in encoder and decoder

* input_ids => input_features

* enable use_cache

* remove old code

* expand embeddings if needed

* remove logits bias

* masked_lm_loss => loss

* hack tokenizer to support feature processing

* fix model_input_names

* style

* fix error message

* doc

* remove inputs_embeds

* remove input_embeds

* remove unnecessary docstring

* quality

* SpeechToText => Speech2Text

* style

* remove shared_embeds

* subsample => conv

* remove Speech2TextTransformerDecoderWrapper

* update output_lengths formula

* fix table

* remove max_position_embeddings

* update conversion scripts

* add possibility to do upper case for now

* add FeatureExtractor and Processor

* add tests for extractor

* require_torch_audio => require_torchaudio

* add processor test

* update import

* remove classification head

* attention mask is now 1D

* update docstrings

* attention mask should be of type long

* handle attention mask from generate

* alwyas return attention_mask

* fix test

* style

* doc

* Speech2TextTransformer => Speech2Text

* Speech2TextTransformerConfig => Speech2TextConfig

* remove dummy_inputs

* nit

* style

* multilinguial tok

* fix tokenizer

* add tgt_lang setter

* save lang_codes

* fix tokenizer

* add forced_bos_token_id to tokenizer

* apply review suggestions

* add torchaudio to extra deps

* add speech deps to CI

* fix dep

* add libsndfile to ci

* libsndfile1

* add speech to extras all

* libsndfile1 -> libsndfile1

* libsndfile

* libsndfile1-dev

* apt update

* add sudo to install

* update deps table

* install libsndfile1-dev on CI

* tuple to list

* init conv layer

* add model tests

* quality

* add integration tests

* skip_special_tokens

* add speech_to_text_transformer in toctree

* fix tokenizer

* fix fp16 tests

* add tokenizer tests

* fix copyright

* input_values => input_features

* doc

* add model in readme

* doc

* change checkpoint names

* fix copyright

* fix code example

* add max_model_input_sizes in tokenizer

* fix integration tests

* add do_lower_case to tokenizer

* remove clamp trick

* fix "Add modeling imports here"

* fix copyrights

* fix tests

* SpeechToTextTransformer => SpeechToText

* fix naming

* fix table formatting

* fix typo

* style

* fix typos

* remove speech dep from extras[testing]

* fix copies

* rename doc file,

* put imports under is_torch_available

* run feat extract tests when torch is available

* dummy objects for processor and extractor

* fix imports in tests

* fix import in modeling test

* fxi imports

* fix torch import

* fix imports again

* fix positional embeddings

* fix typo in import

* adapt new extractor refactor

* style

* fix torchscript test

* doc

* doc

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fix docs, copied from, style

* fix docstring

* handle imports

* remove speech from all extra deps

* remove s2t from seq2seq lm mapping

* better names

* skip training tests

* add install instructions

* List => Tuple

* doc

* fix conversion script

* fix urls

* add instruction for libsndfile

* fix fp16 test

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-03-10 21:42:04 +05:30
Sylvain Gugger
7da995c00c
Fix embeddings for PyTorch 1.8 (#10549)
* Fix embeddings for PyTorch 1.8

* Try with PyTorch 1.8.0

* Fix embeddings init

* Fix copies

* Typo

* More typos
2021-03-05 16:18:48 -05:00
Lysandre
dc9aaa3848 Pin torch to 1.7.1 in tests while we resolve issues 2021-03-05 07:57:35 -05:00
Lysandre
093b88f4e9 Update scatter to use torch 1.8.0 2021-03-05 07:31:51 -05:00
Lysandre
3591844306 v4.3.3 docs 2021-02-24 15:19:01 -05:00
Stas Bekman
d478257d9b
[CI] build docs faster (#10115)
I assume the CI machine should have at least 4 cores, so let's build docs faster
2021-02-10 03:02:39 -05:00
Sylvain Gugger
0c3d23dff7 Add patch releases to the doc 2021-02-09 14:17:09 -05:00
Lysandre
bf1a06a437 Docs for v4.3.1 release 2021-02-09 10:02:50 +01:00
Lysandre
ba542ffb49 Fix deployment script 2021-02-09 08:43:00 +01:00
Lysandre
0dd579c9cf Docs for v4.3.0 2021-02-08 18:53:24 +01:00
Sylvain Gugger
45aaf5f7ab
A few fixes in the documentation (#10033) 2021-02-08 05:02:01 -05:00
Lysandre
ad2c431097 Update doc deployment script path 2021-02-05 11:18:59 +01:00
Lysandre
95a5f271e5 Update doc deployment script 2021-02-05 11:10:29 +01:00
Sylvain Gugger
3be965c5db
Update doc for pre-release (#10014)
* Update doc for pre-release

* Use stable as default

* Use the right commit :facepalms:
2021-02-04 16:52:27 -05:00
Lysandre Debut
910aa89671
Temporarily deactivate TPU tests while we work on fixing them (#9720) 2021-01-21 04:17:39 -05:00
Lysandre
33a8497db8 v4.2.0 documentation 2021-01-13 16:15:40 +01:00
Lysandre
bd40345d3e v4.1.1 docs 2020-12-17 11:28:38 -05:00
Lysandre
f83d9c8da7 v4.1.0 docs 2020-12-17 10:16:07 -05:00
NielsRogge
1551e2dc6d
[WIP] Tapas v4 (tres) (#9117)
* First commit: adding all files from tapas_v3

* Fix multiple bugs including soft dependency and new structure of the library

* Improve testing by adding torch_device to inputs and adding dependency on scatter

* Use Python 3 inheritance rather than Python 2

* First draft model cards of base sized models

* Remove model cards as they are already on the hub

* Fix multiple bugs with integration tests

* All model integration tests pass

* Remove print statement

* Add test for convert_logits_to_predictions method of TapasTokenizer

* Incorporate suggestions by Google authors

* Fix remaining tests

* Change position embeddings sizes to 512 instead of 1024

* Comment out positional embedding sizes

* Update PRETRAINED_VOCAB_FILES_MAP and PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES

* Added more model names

* Fix truncation when no max length is specified

* Disable torchscript test

* Make style & make quality

* Quality

* Address CI needs

* Test the Masked LM model

* Fix the masked LM model

* Truncate when overflowing

* More much needed docs improvements

* Fix some URLs

* Some more docs improvements

* Test PyTorch scatter

* Set to slow + minify

* Calm flake8 down

* First commit: adding all files from tapas_v3

* Fix multiple bugs including soft dependency and new structure of the library

* Improve testing by adding torch_device to inputs and adding dependency on scatter

* Use Python 3 inheritance rather than Python 2

* First draft model cards of base sized models

* Remove model cards as they are already on the hub

* Fix multiple bugs with integration tests

* All model integration tests pass

* Remove print statement

* Add test for convert_logits_to_predictions method of TapasTokenizer

* Incorporate suggestions by Google authors

* Fix remaining tests

* Change position embeddings sizes to 512 instead of 1024

* Comment out positional embedding sizes

* Update PRETRAINED_VOCAB_FILES_MAP and PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES

* Added more model names

* Fix truncation when no max length is specified

* Disable torchscript test

* Make style & make quality

* Quality

* Address CI needs

* Test the Masked LM model

* Fix the masked LM model

* Truncate when overflowing

* More much needed docs improvements

* Fix some URLs

* Some more docs improvements

* Add add_pooling_layer argument to TapasModel

Fix comments by @sgugger and @patrickvonplaten

* Fix issue in docs + fix style and quality

* Clean up conversion script and add task parameter to TapasConfig

* Revert the task parameter of TapasConfig

Some minor fixes

* Improve conversion script and add test for absolute position embeddings

* Improve conversion script and add test for absolute position embeddings

* Fix bug with reset_position_index_per_cell arg of the conversion cli

* Add notebooks to the examples directory and fix style and quality

* Apply suggestions from code review

* Move from `nielsr/` to `google/` namespace

* Apply Sylvain's comments

Co-authored-by: sgugger <sylvain.gugger@gmail.com>

Co-authored-by: Rogge Niels <niels.rogge@howest.be>
Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
2020-12-15 17:08:49 -05:00
Lysandre Debut
91fa707217
Remove docs only check (#9065) 2020-12-11 10:27:31 -05:00
Sylvain Gugger
783d7d2629
Reorganize examples (#9010)
* Reorganize example folder

* Continue reorganization

* Change requirements for tests

* Final cleanup

* Finish regroup with tests all passing

* Copyright

* Requirements and readme

* Make a full link for the documentation

* Address review comments

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Add symlink

* Reorg again

* Apply suggestions from code review

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Adapt title

* Update to new strucutre

* Remove test

* Update READMEs

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-12-11 10:07:02 -05:00
Stas Bekman
5e637e6c69
[wip] [ci] doc-job-skip take #4 dry-run (#8980)
* ci-doc-job-skip-take-4

* wip

* wip

* wip

* wip

* skip yaml

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* ready to test

* yet another way

* trying with HEAD

* trying with head.sha

* trying with head.sha fix

* trying with head.sha fix wip

* undo

* try to switch to sha

* current branch

* current branch

* PR number check

* joy ride

* joy ride

* joy ride

* joy ride

* joy ride

* joy ride

* joy ride

* joy ride

* joy ride

* joy ride

* joy ride

* joy ride
2020-12-09 15:36:36 -05:00
Lysandre Debut
2ae7388eee
Check table as independent script (#8976) 2020-12-07 19:55:12 -05:00
Julien Chaumond
28fa014a1f
transformers-cli: LFS multipart uploads (> 5GB) (#8663)
* initial commit

* [cli] lfs commands

* Fix FileSlice

* Tweak to FileSlice

* [hf_api] Backport filetype arg from `datasets`

cc @lhoestq

* Silm down the CI while i'm working

* Ok let's try this in CI

* Update config.yml

* Do not try this at home

* one more try

* Update lfs.py

* Revert "Tweak to FileSlice"

This reverts commit d7e32c4b35.

* Update test_hf_api.py

* Update test_hf_api.py

* Update test_hf_api.py

* CI still green?

* make CI green again?

* Update test_hf_api.py

* make CI red again?

* Update test_hf_api.py

* add CI style back

* Fix CI?

* oh my

* doc + switch back to real staging endpoint

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Pierric Cistac <Pierrci@users.noreply.github.com>

* Fix docblock + f-strings

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Pierric Cistac <Pierrci@users.noreply.github.com>
2020-12-07 16:38:39 -05:00
Stas Bekman
37f4c24f10
> 30 files leads to hanging on --More--
cancel debug printing for now. As it can be seen lead to a failing test here:
https://app.circleci.com/pipelines/github/huggingface/transformers/16894/workflows/cc86f7a9-4020-45af-8ab3-c22f79b427cf/jobs/131924
2020-12-07 12:18:05 -08:00
Stas Bekman
73c51f7fcd
[ci] skip doc jobs - circleCI is not reliable - disable skip for now (#8926)
* disable skipping, but leave logging for the future
2020-12-04 10:13:42 -08:00
Stas Bekman
24f0c2fe33
[ci] skip doc jobs take #3 (#8885)
* check that we get any match first

* docs only

* 2 docs only

* add code

* restore
2020-12-02 10:06:45 -05:00
Stas Bekman
693ac3594b
disable job skip - need more work
reference: https://github.com/huggingface/transformers/pull/8853#issuecomment-736779863
2020-12-01 12:03:29 -08:00
Stas Bekman
21db560df3
[CI] skip docs-only jobs take #2 (#8853)
* restore skip

* Revert "Remove deprecated `evalutate_during_training` (#8852)"

This reverts commit 5530299096.

* check that pipeline.git.base_revision is defined before proceeding

* Revert "Revert "Remove deprecated `evalutate_during_training` (#8852)""

This reverts commit dfec84db3f.

* check that pipeline.git.base_revision is defined before proceeding

* doc only

* doc + code

* restore

* restore

* typo
2020-12-01 13:15:25 -05:00
LysandreJik
9995a341c9 Update docs 2020-11-30 12:07:52 -05:00
Sylvain Gugger
08e707633c Comment the skip job on doc line 2020-11-30 10:51:25 -05:00
Stas Bekman
c239dcda83
[CI] implement job skipping for doc-only PRs (#8826)
* implement job skipping for doc-only PRs

* silent grep is crucial

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* let's add doc

* let's add code

* revert test commits

* restore

* Better name

* Better name

* Better name

* some more testing

* some more testing

* some more testing

* finish testing
2020-11-29 11:31:30 -05:00
Julien Chaumond
0cc5ab1333
Improve bert-japanese tokenizer handling (#8659)
* Make ci fail

* Try to make tests actually run?

* CI finally failing?

* Fix CI

* Revert "Fix CI"

This reverts commit ca7923be73.

* Ooops wrong one

* one more try

* Ok ok let's move this elsewhere

* Alternative to globals() (#8667)

* Alternative to globals()

* Error is raised later so return None

* Sentencepiece not installed make some tokenizers None

* Apply Lysandre wisdom

* Slightly clearer comment?

cc @sgugger

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-11-23 11:15:02 -05:00
Sylvain Gugger
6494910f27
Add sentencepiece to the CI and fix tests (#8672)
* Fix the CI and tests

* Fix quality

* Remove that m form nowhere
2020-11-19 16:44:20 -05:00
Sylvain Gugger
bb03a14edd Update doc for v3.5.1 2020-11-13 10:29:58 -05:00
Funtowicz Morgan
121c24efa4
Update deploy-docs dependencies on CI to enable Flax (#8475)
* Update deploy-docs dependencies on CI to enable Flax

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added pair of ""

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-11-11 18:31:41 -05:00
Funtowicz Morgan
a5b682329c
Flax/Jax documentation (#8331)
* First addition of Flax/Jax documentation

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* make style

* Ensure input order match between Bert & Roberta

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Install dependencies "all" when building doc

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* wraps build_doc deps with ""

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Addressing @sgugger comments.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Use list to highlight JAX features.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Make style.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Let's not look to much into the future for now.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Style

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-11-11 14:53:36 -05:00
Lysandre
aec51e5696 v3.5.0 documentation 2020-11-10 08:58:47 -05:00
Sylvain Gugger
854b44aa38 Revert size change as it doesn't change anything 2020-11-04 11:13:24 -05:00
Sylvain Gugger
414985c427 Upgrade resource for doc building 2020-11-04 10:44:19 -05:00
Stas Bekman
1bb4bba53c
[CIs] Better reports everywhere (#8275)
* make it possible to invoke testconf.py in both test suites without crashing on having the same option added

* perl -pi -e 's|--make_reports|--make-reports|' to be consistent with other opts

* add `pytest --make-reports` to all CIs (and artifacts)

* fix
2020-11-03 16:57:12 -05:00
Sylvain Gugger
4c19f3baab
Clean Trainer tests and datasets dep (#8268) 2020-11-03 15:50:55 -05:00
Sylvain Gugger
691176283d
Add a template for examples and apply it for mlm and plm examples (#8153)
* Add a template for example scripts and apply it to mlm

* Formatting

* Fix test

* Add plm script

* Add a template for example scripts and apply it to mlm

* Formatting

* Fix test

* Add plm script

* Add a template for example scripts and apply it to mlm

* Formatting

* Fix test

* Add plm script

* Styling
2020-10-29 13:38:11 -04:00
Lysandre Debut
1b6c8d4811
Update CI cache (#8126) 2020-10-28 13:59:43 -04:00
Lysandre Debut
a0906068cf
Fully remove codecov (#8093) 2020-10-27 14:14:13 -04:00
Stas Bekman
bfd5e370a7
[CI] generate separate report files as artifacts (#7995)
* better reports

* a whole bunch of reports in their own files

* clean up

* improvements

* github artifacts experiment

* style

* complete the report generator with multiple improvements/fixes

* fix

* save all reports under one dir to easy upload

* can remove temp failing tests

* doc fix

* some cleanup
2020-10-27 09:25:07 -04:00
Sylvain Gugger
08f534d2da
Doc styling (#8067)
* Important files

* Styling them all

* Revert "Styling them all"

This reverts commit 7d029395fd.

* Syling them for realsies

* Fix syntax error

* Fix benchmark_utils

* More fixes

* Fix modeling auto and script

* Remove new line

* Fixes

* More fixes

* Fix more files

* Style

* Add FSMT

* More fixes

* More fixes

* More fixes

* More fixes

* Fixes

* More fixes

* More fixes

* Last fixes

* Make sphinx happy
2020-10-26 18:26:02 -04:00
Thomas Wolf
3a40cdf58d
[tests|tokenizers] Refactoring pipelines test backbone - Small tokenizers improvements - General tests speedups (#7970)
* WIP refactoring pipeline tests - switching to fast tokenizers

* fix dialog pipeline and fill-mask

* refactoring pipeline tests backbone

* make large tests slow

* fix tests (tf Bart inactive for now)

* fix doc...

* clean up for merge

* fixing tests - remove bart from summarization until there is TF

* fix quality and RAG

* Add new translation pipeline tests - fix JAX tests

* only slow for dialog

* Fixing the missing TF-BART imports in modeling_tf_auto

* spin out pipeline tests in separate CI job

* adding pipeline test to CI YAML

* add slow pipeline tests

* speed up tf and pt join test to avoid redoing all the standalone pt and tf tests

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Update src/transformers/pipelines.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/pipelines.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/testing_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* add require_torch and require_tf in is_pt_tf_cross_test

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-10-23 15:58:19 +02:00
Lysandre
467573ddde Fix documentation redirect 2020-10-22 15:37:51 -04:00
Lysandre
ef0ac063c9 Docs for v3.4.0 2020-10-20 16:29:00 +02:00
Stas Bekman
ca37db0559
[flax] fix repo_check (#7914)
* [flax] fix repo_check

Unless, this is actually a problem, this adds `modeling_flax_utils` to ignore list. otherwise currently it expects to have a 'tests/test_modeling_flax_utils.py' for it.
for context please see: https://github.com/huggingface/transformers/pull/3722#issuecomment-712360415

* fix 2 more issues

* merge https://github.com/huggingface/transformers/pull/7919/
2020-10-20 07:55:40 -04:00
Funtowicz Morgan
8f8f8d99fc
Integrate Bert-like model on Flax runtime. (#3722)
* WIP flax bert

* Initial commit Bert Jax/Flax implementation.

* Embeddings working and equivalent to PyTorch.

* Move embeddings in its own module BertEmbeddings

* Added jax.jit annotation on forward call

* BertEncoder on par with PyTorch ! :D

* Add BertPooler on par with PyTorch !!

* Working Jax+Flax implementation of BertModel with < 1e-5 differences on the last layer.

* Fix pooled output to take only the first token of the sequence.

* Refactoring to use BertConfig from transformers.

* Renamed FXBertModel to FlaxBertModel

* Model is now initialized in FlaxBertModel constructor and reused.

* WIP JaxPreTrainedModel

* Cleaning up the code of FlaxBertModel

* Added ability to load Flax model saved through save_pretrained()

* Added ability to convert Pytorch Bert model to FlaxBert

* FlaxBert can now load every Pytorch Bert model with on-the-fly conversion

* Fix hardcoded shape values in conversion scripts.

* Improve the way we handle LayerNorm conversion from PyTorch to Flax.

* Added positional embeddings as parameter of BertModel with default to np.arange.

* Let's roll FlaxRoberta !

* Fix missing position_ids parameters on predict for Bert

* Flax backend now supports batched inputs

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Make it possible to load msgpacked model on convert from pytorch in last resort.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Moved save_pretrained to Jax base class along with more constructor parameters.

* Use specialized, model dependent conversion functio.

* Expose `is_flax_available` in file_utils.

* Added unittest for Flax models.

* Added run_tests_flax to the CI.

* Introduce FlaxAutoModel

* Added more unittests

* Flax model reference the _MODEL_ARCHIVE_MAP from PyTorch model.

* Addressing review comments.

* Expose seed in both Bert and Roberta

* Fix typo suggested by @stefan-it

Co-Authored-By: Stefan Schweter <stefan@schweter.it>

* Attempt to make style

* Attempt to make style in tests too

* Added jax & jaxlib to the flax optional dependencies.

* Attempt to fix flake8 warnings ...

* Redo black again and again

* When black and flake8 fight each other for a space ... 💥 💥 💥

* Try removing trailing comma to make both black and flake happy!

* Fix invalid is_<framework>_available call, thanks @LysandreJik 🎉

* Fix another invalid import in flax_roberta test

* Bump and pin flax release to 0.1.0.

* Make flake8 happy, remove unused jax import

* Change the type of the catch for msgpack.

* Remove unused import.

* Put seed as optional constructor parameter.

* trigger ci again

* Fix too much parameters in BertAttention.

* Formatting.

* Simplify Flax unittests to avoid machine crashes.

* Fix invalid number of arguments when raising issue for an unknown model.

* Address @bastings comment in PR, moving jax.jit decorated outside of __call__

* Fix incorrect path to require_flax/require_pytorch functions.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Attempt to make style.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Correct rebasing of circle-ci dependencies

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Fix import sorting.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Fix unused imports.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Again import sorting...

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Installing missing nlp dependency for flax unittests.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Fix laoding of model for Flax implementations.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* jit the inner function call to make JAX-compatible

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Format !

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Flake one more time 🎶

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Rewrites BERT in Flax to the new Linen API (#7211)

* Rewrite Flax HuggingFace PR to Linen

* Some fixes

* Fix tests

* Fix CI with change of name of nlp (#7054)

* nlp -> datasets

* More nlp -> datasets

* Woopsie

* More nlp -> datasets

* One last

* Expose `is_flax_available` in file_utils.

* Added run_tests_flax to the CI.

* Attempt to make style

* trigger ci again

* Fix import sorting.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Revert "Rewrites BERT in Flax to the new Linen API (#7211)"

This reverts commit 23703a5eb3364e26a1cbc3ee34b4710d86a674b0.

* Remove jnp.lax references

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make style.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Reintroduce Linen changes ...

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make style.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Use jax native's gelu function.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Renaming BertModel to BertModule to highlight the fact this is the Flax Module object.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Rewrite FlaxAutoModel test to not rely on pretrained_model_archive_map

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Remove unused variable in BertModule.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Remove unused variable in BertModule again

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Attempt to have is_flax_available working again.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Introduce JAX TensorType

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Improve ImportError message when trying to convert to various TensorType format.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Makes Flax model jittable.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Ensure flax models are jittable in unittests.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Remove unused imports.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Ensure jax imports are guarded behind is_flax_available.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make style.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make style again

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make style again again

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make style again again again

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Update src/transformers/file_utils.py

Co-authored-by: Marc van Zee <marcvanzee@gmail.com>

* Bump flax to it's latest version

Co-authored-by: Marc van Zee <marcvanzee@gmail.com>

* Bump jax version to at least 0.2.0

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Style.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Update the unittest to use TensorType.JAX

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* isort import in tests.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Match new flax parameters name "params"

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Remove unused imports.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Add flax models to transformers __init__

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Attempt to address all CI related comments.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Correct circle.yml indent.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Correct circle.yml indent (2)

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Remove coverage from flax tests

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Addressing many naming suggestions from comments

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Simplify for loop logic to interate over layers in FlaxBertLayerCollection

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* use f-string syntax for formatting logs.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Use config property from FlaxPreTrainedModel.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* use "cls_token" instead of "first_token" variable name.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* use "hidden_state" instead of "h" variable name.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Correct class reference in docstring to link to Flax related modules.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Added HF + Google Flax team copyright.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make Roberta independent from Bert

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Move activation functions to flax_utils.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Move activation functions to flax_utils for bert.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Added docstring for BERT

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Update import for Bert and Roberta tokenizers

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make style.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* fix-copies

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Correct FlaxRobertaLayer to match PyTorch.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Use the same store_artifact for flax unittest

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Style.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make sure gradient are disabled only locally for flax unittest using torch equivalence.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Use relative imports

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

Co-authored-by: Stefan Schweter <stefan@schweter.it>
Co-authored-by: Marc van Zee <marcvanzee@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-19 09:55:41 -04:00
Stas Bekman
805a202e1a
[CIs] report slow tests add --durations=0 to some pytest jobs (#7884)
* add --durations=50 to some pytest runs

* report all tests
2020-10-19 08:23:14 -04:00
Thomas Wolf
ba8c4d0ac0
[Dependencies|tokenizers] Make both SentencePiece and Tokenizers optional dependencies (#7659)
* splitting fast and slow tokenizers [WIP]

* [WIP] splitting sentencepiece and tokenizers dependencies

* update dummy objects

* add name_or_path to models and tokenizers

* prefix added to file names

* prefix

* styling + quality

* spliting all the tokenizer files - sorting sentencepiece based ones

* update tokenizer version up to 0.9.0

* remove hard dependency on sentencepiece 🎉

* and removed hard dependency on tokenizers 🎉

* update conversion script

* update missing models

* fixing tests

* move test_tokenization_fast to main tokenization tests - fix bugs

* bump up tokenizers

* fix bert_generation

* update ad fix several tokenizers

* keep sentencepiece in deps for now

* fix funnel and deberta tests

* fix fsmt

* fix marian tests

* fix layoutlm

* fix squeezebert and gpt2

* fix T5 tokenization

* fix xlnet tests

* style

* fix mbart

* bump up tokenizers to 0.9.2

* fix model tests

* fix tf models

* fix seq2seq examples

* fix tests without sentencepiece

* fix slow => fast  conversion without sentencepiece

* update auto and bert generation tests

* fix mbart tests

* fix auto and common test without tokenizers

* fix tests without tokenizers

* clean up tests lighten up when tokenizers + sentencepiece are both off

* style quality and tests fixing

* add sentencepiece to doc/examples reqs

* leave sentencepiece on for now

* style quality split hebert and fix pegasus

* WIP Herbert fast

* add sample_text_no_unicode and fix hebert tokenization

* skip FSMT example test for now

* fix style

* fix fsmt in example tests

* update following Lysandre and Sylvain's comments

* Update src/transformers/testing_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/testing_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-18 20:51:24 +02:00
Sylvain Gugger
28d183c90c
Allow soft dependencies in the namespace with ImportErrors at use (#7537)
* PoC on RAG

* Format class name/obj name

* Better name in message

* PoC on one TF model

* Add PyTorch and TF dummy objects + script

* Treat scikit-learn

* Bad copy pastes

* Typo
2020-10-05 09:12:04 -04:00
Lysandre
16c213820e Update docs to version v3.3.0 2020-09-28 16:32:00 +02:00
Stas Bekman
df53643807
[code quality] fix confused flake8 (#7309)
* fix confused flake

We run `black  --target-version py35 ...` but flake8 doesn't know that, so currently with py38 flake8 fails suggesting that black should have reformatted 63 files. Indeed if I run:

```
black --line-length 119 --target-version py38 examples templates tests src utils
```
it indeed reformats 63 files.

The only solution I found is to create a black config file as explained at https://github.com/psf/black#configuration-format, which is what this PR adds.

Now flake8 knows that py35 is the standard and no longer gets confused regardless of the user's python version.

* adjust the other files that will now rely on black's config file
2020-09-22 22:12:36 -04:00
Lysandre
6e21f24220 Documentation version 2020-09-22 18:04:39 +02:00
Sylvain Gugger
e4b94d8e58
Copy code from Bert to Roberta and add safeguard script (#7219)
* Copy code from Bert to Roberta and add safeguard script

* Fix docstring

* Comment code

* Formatting

* Update src/transformers/modeling_roberta.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Add test and fix bugs

* Fix style and make new comand

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-09-22 05:02:27 -04:00
Stas Bekman
79111b77d2
remove deprecated flag (#7171)
```
/home/circleci/.local/lib/python3.6/site-packages/isort/main.py:915: UserWarning: W0501: The following deprecated CLI flags were used and ignored: --recursive!
  "W0501: The following deprecated CLI flags were used and ignored: "
```
2020-09-17 05:52:12 -04:00
Sylvain Gugger
514486739c
Fix CI with change of name of nlp (#7054)
* nlp -> datasets

* More nlp -> datasets

* Woopsie

* More nlp -> datasets

* One last
2020-09-10 14:51:08 -04:00
Lysandre
3726754a6c v3.1.0 documentation 2020-09-01 14:39:07 +02:00
Stas Bekman
59a6a32a61
add a final report to all pytest jobs (#6861)
we had it added for one job, please add it to all pytest jobs - we need the output of what tests were run to debug the codecov issue. thank you!
2020-08-31 22:47:23 -04:00
Sylvain Gugger
abc0202194
More tests to Trainer (#6699)
* More tests to Trainer

* Add warning in the doc
2020-08-25 07:07:36 -04:00
Sylvain Gugger
a573777901
Update repo to isort v5 (#6686)
* Run new isort

* More changes

* Update CI, CONTRIBUTING and benchmarks
2020-08-24 11:03:01 -04:00
Masatoshi Suzuki
48c6c6139f
Support additional dictionaries for BERT Japanese tokenizers (#6515)
* Update BERT Japanese tokenizers

* Update CircleCI config to download unidic

* Specify to use the latest dictionary packages
2020-08-17 12:00:23 +08:00
zcain117
fd3de2000f
Get GKE logs via kubectl logs instead of gcloud logging read. (#6446) 2020-08-12 11:46:24 -04:00
Sylvain Gugger
a8db954cda
Activate check on the CI (#6427)
* Activate check on the CI

* Fix repo inconsistencies

* Don't document too much
2020-08-12 08:42:14 -04:00
Lysandre
8a3db6b303 Add TPU testing once again 2020-08-11 08:49:37 +02:00
zcain117
f65ac1faf2
Add missing docker arg for TPU CI. (#6393) 2020-08-11 02:48:49 -04:00
Lysandre
1bbc54a87c Temporarily de-activate TPU CI 2020-08-10 08:11:40 +02:00
zcain117
1b8a7ffcfd
Add setup for TPU CI to run every hour. (#6219)
* Add setup for TPU CI to run every hour.

* Re-organize config.yml

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-08-07 11:17:07 -04:00
Lysandre Debut
80a0676a51
CI dependency wheel caching (#6287)
* Single workflow cache test




Remove cache dir, re-trigger cache


Only pip archives


Not sudo when pip

* All workflow cache

Remove no-cache-dir instruction


Remove last sudo occurrences


v0.3
2020-08-07 02:48:59 -04:00
Lysandre Debut
1d5c3a3d96
Test with --no-cache-dir (#6235) 2020-08-04 03:20:19 -04:00
Lysandre Debut
d740351f7d
Upgrade pip when doing CI (#6234)
* Upgrade pip when doing CI

* Don't forget Github CI
2020-08-04 02:37:12 -04:00
Paul O'Leary McCann
cf3cf304ca
Replace mecab-python3 with fugashi for Japanese tokenization (#6086)
* Replace mecab-python3 with fugashi

This replaces mecab-python3 with fugashi for Japanese tokenization. I am
the maintainer of both projects.

Both projects are MeCab wrappers, so the underlying C++ code is the
same. fugashi is the newer wrapper and doesn't use SWIG, so for basic
use of the MeCab API it's easier to use.

This code insures the use of a version of ipadic installed via pip,
which should make versioning and tracking down issues easier.

fugashi has wheels for Windows, OSX, and Linux, which will help with
issues with installing old versions of mecab-python3 on Windows.
Compared to mecab-python3, because fugashi doesn't use SWIG, it doesn't
require a C++ runtime to be installed on Windows.

In adding this change I removed some code dealing with `cursor`,
`token_start`, and `token_end` variables. These variables didn't seem to
be used for anything, it is unclear to me why they were there.

I ran the tests and they passed, though I couldn't figure out how to run
the slow tests (`--runslow` gave an error) and didn't try testing with
Tensorflow.

* Style fix

* Remove unused variable

Forgot to delete this...

* Adapt doc with install instructions

* Fix typo

Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-07-31 04:41:14 -04:00
Stas Bekman
daa5dd1202
add a summary report flag for run_examples on CI (#6035)
Currently, it's hard to derive which example tests were run on CI, and which weren't. Adding `-rA` flag to `pytest`, will now include a summary like:

```
==================================================================== short test summary info =====================================================================
PASSED examples/test_examples.py::ExamplesTests::test_generation
PASSED examples/test_examples.py::ExamplesTests::test_run_glue
PASSED examples/test_examples.py::ExamplesTests::test_run_language_modeling
PASSED examples/test_examples.py::ExamplesTests::test_run_squad
FAILED examples/test_examples.py::ExamplesTests::test_run_pl_glue - AttributeError: 'Namespace' object has no attribute 'gpus'
============================================================ 1 failed, 4 passed, 8 warnings in 42.96s ============================================================
```
which makes it easier to validate whether some example is being covered by CI or not.
2020-07-26 14:09:14 -04:00
sgugger
a540405213 Fix commit hash for stable doc 2020-07-24 09:07:40 -04:00
Lysandre
b9d8af07e6 Update stable doc 2020-07-09 11:06:23 -04:00
Lysandre
5c82bf6831 Update stable doc 2020-07-09 10:16:13 -04:00
Lysandre
1d2332861f Post v3.0.2 release commit 2020-07-06 18:56:47 -04:00
Sam Shleifer
ac61114592
[CI] gh runner doesn't use -v, cats new result (#5409) 2020-06-30 16:12:14 -04:00
Lysandre Debut
b9ee87f5c7
Doc for v3.0.0 (#5366)
* Doc for v3.0.0

* Update docs/source/_static/js/custom.js

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/_static/js/custom.js

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-06-29 11:08:54 -04:00
Sam Shleifer
bf0d12c220
CircleCI stores cleaner output at test_outputs.txt (#5291) 2020-06-26 13:59:31 -04:00
Sylvain Gugger
0148c262e7
Fix first test (#5255) 2020-06-24 15:16:04 -04:00
Sylvain Gugger
70c1e1d2d5
Use master _static (#5253)
* Use _static from master everywhere

* Copy to existing too
2020-06-24 15:06:14 -04:00
Sylvain Gugger
f216b60671
Fix deploy doc (#5246)
* Try with the same command

* Try like this
2020-06-24 10:59:06 -04:00
Sylvain Gugger
49f6e7a3c6
Add some prints to debug (#5244) 2020-06-24 10:37:01 -04:00
Sylvain Gugger
64c393ee74
Don't recreate old docs (#5243) 2020-06-24 09:59:07 -04:00
Sylvain Gugger
c439752482
Switch master/stable doc and add older releases (#5193) 2020-06-22 16:38:53 -04:00
Lysandre Debut
4e741efa92
Have documentation fail on warning (#5189)
* Have documentation fail on warning

* Force ci failure

* Revert "Force ci failure"

This reverts commit f0a4666ec2.
2020-06-22 15:49:50 -04:00
Harutaka Kawamura
b0c9fbb293
Add workflow to build docs (#3763) 2020-04-17 11:23:18 -04:00