Commit Graph

191 Commits

Author SHA1 Message Date
Sylvain Gugger
893e51a53f Document v4.5.1 2021-04-13 11:28:17 -04:00
Sylvain Gugger
26212c14e5 Reactivate Megatron tests an use less workers 2021-04-09 18:09:53 -04:00
Kevin Canwen Xu
fb41f9f50c
Add a special tokenizer for CPM model (#11068)
* Add a special tokenizer for CPM model

* make style

* fix

* Add docs

* styles

* cpm doc

* fix ci

* fix the overview

* add test

* make style

* typo

* Custom tokenizer flag

* Add REAMDE.md

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-04-10 02:07:47 +08:00
Stas Bekman
97ccf67bb3
[setup] extras[docs] must include 'all' (#11148)
* extras[doc] must include 'all'

* fix

* better

* regroup
2021-04-08 18:10:44 -04:00
Lysandre
9853c5dd58 Development on v4.6.0dev0 2021-04-06 12:53:25 -04:00
Sylvain Gugger
b0d49fd536
Add a script to check inits are consistent (#11024) 2021-04-04 20:41:34 -04:00
NielsRogge
30677dc743
Add Vision Transformer and ViTFeatureExtractor (#10950)
* Squash all commits into one

* Update ViTFeatureExtractor to use image_utils instead of torchvision

* Remove torchvision and add Pillow

* Small docs improvement

* Address most comments by @sgugger

* Fix tests

* Clean up conversion script

* Pooler first draft

* Fix quality

* Improve conversion script

* Make style and quality

* Make fix-copies

* Minor docs improvements

* Should use fix-copies instead of manual handling

* Revert "Should use fix-copies instead of manual handling"

This reverts commit fd4e591bce.

* Place ViT in alphabetical order

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-01 11:16:05 -04:00
Sylvain Gugger
d0b3797a3b
Add more metadata to the user agent (#10972)
* Add more metadata to the user agent

* Fix typo

* Use DISABLE_TELEMETRY

* Address review comments

* Use global env

* Add clean envs on circle CI
2021-03-31 09:36:07 -04:00
Lysandre
3f48b2bc3e Update stable docs 2021-03-23 11:01:16 -04:00
Sylvain Gugger
21e86f99e6
Sort init import (#10801)
* Initial script

* Add script to properly sort imports in init.

* Add to the CI

* Update utils/custom_init_isort.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Separate scripts that change content from quality

* Move class_mapping_update to style_checks

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-03-19 16:17:13 -04:00
Sylvain Gugger
dcebe254fa Document v4.4.2 2021-03-18 15:19:25 -04:00
Lysandre
73fe40898d Docs for v4.4.1 2021-03-16 15:41:49 -04:00
Lysandre
1b5ce1e63b Development on v4.5.0dev0 2021-03-16 11:41:15 -04:00
Patrick von Platen
9f8619c6aa
Flax testing should not run the full torch test suite (#10725)
* make flax tests pytorch independent

* fix typo

* finish

* improve circle ci

* fix return tensors

* correct flax test

* re-add sentencepiece

* last tokenizer fixes

* finish maybe now
2021-03-16 08:05:37 +03:00
Suraj Patil
d26b37e744
Speech2TextTransformer (#10175)
* s2t

* fix config

* conversion script

* fix import

* add tokenizer

* fix tok init

* fix tokenizer

* first version working

* fix embeds

* fix lm head

* remove extra heads

* fix convert script

* handle encoder attn mask

* style

* better enc attn mask

* override _prepare_attention_mask_for_generation

* handle attn_maks in encoder and decoder

* input_ids => input_features

* enable use_cache

* remove old code

* expand embeddings if needed

* remove logits bias

* masked_lm_loss => loss

* hack tokenizer to support feature processing

* fix model_input_names

* style

* fix error message

* doc

* remove inputs_embeds

* remove input_embeds

* remove unnecessary docstring

* quality

* SpeechToText => Speech2Text

* style

* remove shared_embeds

* subsample => conv

* remove Speech2TextTransformerDecoderWrapper

* update output_lengths formula

* fix table

* remove max_position_embeddings

* update conversion scripts

* add possibility to do upper case for now

* add FeatureExtractor and Processor

* add tests for extractor

* require_torch_audio => require_torchaudio

* add processor test

* update import

* remove classification head

* attention mask is now 1D

* update docstrings

* attention mask should be of type long

* handle attention mask from generate

* alwyas return attention_mask

* fix test

* style

* doc

* Speech2TextTransformer => Speech2Text

* Speech2TextTransformerConfig => Speech2TextConfig

* remove dummy_inputs

* nit

* style

* multilinguial tok

* fix tokenizer

* add tgt_lang setter

* save lang_codes

* fix tokenizer

* add forced_bos_token_id to tokenizer

* apply review suggestions

* add torchaudio to extra deps

* add speech deps to CI

* fix dep

* add libsndfile to ci

* libsndfile1

* add speech to extras all

* libsndfile1 -> libsndfile1

* libsndfile

* libsndfile1-dev

* apt update

* add sudo to install

* update deps table

* install libsndfile1-dev on CI

* tuple to list

* init conv layer

* add model tests

* quality

* add integration tests

* skip_special_tokens

* add speech_to_text_transformer in toctree

* fix tokenizer

* fix fp16 tests

* add tokenizer tests

* fix copyright

* input_values => input_features

* doc

* add model in readme

* doc

* change checkpoint names

* fix copyright

* fix code example

* add max_model_input_sizes in tokenizer

* fix integration tests

* add do_lower_case to tokenizer

* remove clamp trick

* fix "Add modeling imports here"

* fix copyrights

* fix tests

* SpeechToTextTransformer => SpeechToText

* fix naming

* fix table formatting

* fix typo

* style

* fix typos

* remove speech dep from extras[testing]

* fix copies

* rename doc file,

* put imports under is_torch_available

* run feat extract tests when torch is available

* dummy objects for processor and extractor

* fix imports in tests

* fix import in modeling test

* fxi imports

* fix torch import

* fix imports again

* fix positional embeddings

* fix typo in import

* adapt new extractor refactor

* style

* fix torchscript test

* doc

* doc

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fix docs, copied from, style

* fix docstring

* handle imports

* remove speech from all extra deps

* remove s2t from seq2seq lm mapping

* better names

* skip training tests

* add install instructions

* List => Tuple

* doc

* fix conversion script

* fix urls

* add instruction for libsndfile

* fix fp16 test

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-03-10 21:42:04 +05:30
Sylvain Gugger
7da995c00c
Fix embeddings for PyTorch 1.8 (#10549)
* Fix embeddings for PyTorch 1.8

* Try with PyTorch 1.8.0

* Fix embeddings init

* Fix copies

* Typo

* More typos
2021-03-05 16:18:48 -05:00
Lysandre
dc9aaa3848 Pin torch to 1.7.1 in tests while we resolve issues 2021-03-05 07:57:35 -05:00
Lysandre
093b88f4e9 Update scatter to use torch 1.8.0 2021-03-05 07:31:51 -05:00
Lysandre
3591844306 v4.3.3 docs 2021-02-24 15:19:01 -05:00
Stas Bekman
d478257d9b
[CI] build docs faster (#10115)
I assume the CI machine should have at least 4 cores, so let's build docs faster
2021-02-10 03:02:39 -05:00
Sylvain Gugger
0c3d23dff7 Add patch releases to the doc 2021-02-09 14:17:09 -05:00
Lysandre
bf1a06a437 Docs for v4.3.1 release 2021-02-09 10:02:50 +01:00
Lysandre
ba542ffb49 Fix deployment script 2021-02-09 08:43:00 +01:00
Lysandre
0dd579c9cf Docs for v4.3.0 2021-02-08 18:53:24 +01:00
Sylvain Gugger
45aaf5f7ab
A few fixes in the documentation (#10033) 2021-02-08 05:02:01 -05:00
Lysandre
ad2c431097 Update doc deployment script path 2021-02-05 11:18:59 +01:00
Lysandre
95a5f271e5 Update doc deployment script 2021-02-05 11:10:29 +01:00
Sylvain Gugger
3be965c5db
Update doc for pre-release (#10014)
* Update doc for pre-release

* Use stable as default

* Use the right commit :facepalms:
2021-02-04 16:52:27 -05:00
Lysandre Debut
910aa89671
Temporarily deactivate TPU tests while we work on fixing them (#9720) 2021-01-21 04:17:39 -05:00
Lysandre
33a8497db8 v4.2.0 documentation 2021-01-13 16:15:40 +01:00
Lysandre
bd40345d3e v4.1.1 docs 2020-12-17 11:28:38 -05:00
Lysandre
f83d9c8da7 v4.1.0 docs 2020-12-17 10:16:07 -05:00
NielsRogge
1551e2dc6d
[WIP] Tapas v4 (tres) (#9117)
* First commit: adding all files from tapas_v3

* Fix multiple bugs including soft dependency and new structure of the library

* Improve testing by adding torch_device to inputs and adding dependency on scatter

* Use Python 3 inheritance rather than Python 2

* First draft model cards of base sized models

* Remove model cards as they are already on the hub

* Fix multiple bugs with integration tests

* All model integration tests pass

* Remove print statement

* Add test for convert_logits_to_predictions method of TapasTokenizer

* Incorporate suggestions by Google authors

* Fix remaining tests

* Change position embeddings sizes to 512 instead of 1024

* Comment out positional embedding sizes

* Update PRETRAINED_VOCAB_FILES_MAP and PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES

* Added more model names

* Fix truncation when no max length is specified

* Disable torchscript test

* Make style & make quality

* Quality

* Address CI needs

* Test the Masked LM model

* Fix the masked LM model

* Truncate when overflowing

* More much needed docs improvements

* Fix some URLs

* Some more docs improvements

* Test PyTorch scatter

* Set to slow + minify

* Calm flake8 down

* First commit: adding all files from tapas_v3

* Fix multiple bugs including soft dependency and new structure of the library

* Improve testing by adding torch_device to inputs and adding dependency on scatter

* Use Python 3 inheritance rather than Python 2

* First draft model cards of base sized models

* Remove model cards as they are already on the hub

* Fix multiple bugs with integration tests

* All model integration tests pass

* Remove print statement

* Add test for convert_logits_to_predictions method of TapasTokenizer

* Incorporate suggestions by Google authors

* Fix remaining tests

* Change position embeddings sizes to 512 instead of 1024

* Comment out positional embedding sizes

* Update PRETRAINED_VOCAB_FILES_MAP and PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES

* Added more model names

* Fix truncation when no max length is specified

* Disable torchscript test

* Make style & make quality

* Quality

* Address CI needs

* Test the Masked LM model

* Fix the masked LM model

* Truncate when overflowing

* More much needed docs improvements

* Fix some URLs

* Some more docs improvements

* Add add_pooling_layer argument to TapasModel

Fix comments by @sgugger and @patrickvonplaten

* Fix issue in docs + fix style and quality

* Clean up conversion script and add task parameter to TapasConfig

* Revert the task parameter of TapasConfig

Some minor fixes

* Improve conversion script and add test for absolute position embeddings

* Improve conversion script and add test for absolute position embeddings

* Fix bug with reset_position_index_per_cell arg of the conversion cli

* Add notebooks to the examples directory and fix style and quality

* Apply suggestions from code review

* Move from `nielsr/` to `google/` namespace

* Apply Sylvain's comments

Co-authored-by: sgugger <sylvain.gugger@gmail.com>

Co-authored-by: Rogge Niels <niels.rogge@howest.be>
Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
2020-12-15 17:08:49 -05:00
Lysandre Debut
91fa707217
Remove docs only check (#9065) 2020-12-11 10:27:31 -05:00
Sylvain Gugger
783d7d2629
Reorganize examples (#9010)
* Reorganize example folder

* Continue reorganization

* Change requirements for tests

* Final cleanup

* Finish regroup with tests all passing

* Copyright

* Requirements and readme

* Make a full link for the documentation

* Address review comments

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Add symlink

* Reorg again

* Apply suggestions from code review

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Adapt title

* Update to new strucutre

* Remove test

* Update READMEs

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-12-11 10:07:02 -05:00
Stas Bekman
5e637e6c69
[wip] [ci] doc-job-skip take #4 dry-run (#8980)
* ci-doc-job-skip-take-4

* wip

* wip

* wip

* wip

* skip yaml

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* ready to test

* yet another way

* trying with HEAD

* trying with head.sha

* trying with head.sha fix

* trying with head.sha fix wip

* undo

* try to switch to sha

* current branch

* current branch

* PR number check

* joy ride

* joy ride

* joy ride

* joy ride

* joy ride

* joy ride

* joy ride

* joy ride

* joy ride

* joy ride

* joy ride

* joy ride
2020-12-09 15:36:36 -05:00
Lysandre Debut
2ae7388eee
Check table as independent script (#8976) 2020-12-07 19:55:12 -05:00
Julien Chaumond
28fa014a1f
transformers-cli: LFS multipart uploads (> 5GB) (#8663)
* initial commit

* [cli] lfs commands

* Fix FileSlice

* Tweak to FileSlice

* [hf_api] Backport filetype arg from `datasets`

cc @lhoestq

* Silm down the CI while i'm working

* Ok let's try this in CI

* Update config.yml

* Do not try this at home

* one more try

* Update lfs.py

* Revert "Tweak to FileSlice"

This reverts commit d7e32c4b35.

* Update test_hf_api.py

* Update test_hf_api.py

* Update test_hf_api.py

* CI still green?

* make CI green again?

* Update test_hf_api.py

* make CI red again?

* Update test_hf_api.py

* add CI style back

* Fix CI?

* oh my

* doc + switch back to real staging endpoint

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Pierric Cistac <Pierrci@users.noreply.github.com>

* Fix docblock + f-strings

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Pierric Cistac <Pierrci@users.noreply.github.com>
2020-12-07 16:38:39 -05:00
Stas Bekman
37f4c24f10
> 30 files leads to hanging on --More--
cancel debug printing for now. As it can be seen lead to a failing test here:
https://app.circleci.com/pipelines/github/huggingface/transformers/16894/workflows/cc86f7a9-4020-45af-8ab3-c22f79b427cf/jobs/131924
2020-12-07 12:18:05 -08:00
Stas Bekman
73c51f7fcd
[ci] skip doc jobs - circleCI is not reliable - disable skip for now (#8926)
* disable skipping, but leave logging for the future
2020-12-04 10:13:42 -08:00
Stas Bekman
24f0c2fe33
[ci] skip doc jobs take #3 (#8885)
* check that we get any match first

* docs only

* 2 docs only

* add code

* restore
2020-12-02 10:06:45 -05:00
Stas Bekman
693ac3594b
disable job skip - need more work
reference: https://github.com/huggingface/transformers/pull/8853#issuecomment-736779863
2020-12-01 12:03:29 -08:00
Stas Bekman
21db560df3
[CI] skip docs-only jobs take #2 (#8853)
* restore skip

* Revert "Remove deprecated `evalutate_during_training` (#8852)"

This reverts commit 5530299096.

* check that pipeline.git.base_revision is defined before proceeding

* Revert "Revert "Remove deprecated `evalutate_during_training` (#8852)""

This reverts commit dfec84db3f.

* check that pipeline.git.base_revision is defined before proceeding

* doc only

* doc + code

* restore

* restore

* typo
2020-12-01 13:15:25 -05:00
LysandreJik
9995a341c9 Update docs 2020-11-30 12:07:52 -05:00
Sylvain Gugger
08e707633c Comment the skip job on doc line 2020-11-30 10:51:25 -05:00
Stas Bekman
c239dcda83
[CI] implement job skipping for doc-only PRs (#8826)
* implement job skipping for doc-only PRs

* silent grep is crucial

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* let's add doc

* let's add code

* revert test commits

* restore

* Better name

* Better name

* Better name

* some more testing

* some more testing

* some more testing

* finish testing
2020-11-29 11:31:30 -05:00
Julien Chaumond
0cc5ab1333
Improve bert-japanese tokenizer handling (#8659)
* Make ci fail

* Try to make tests actually run?

* CI finally failing?

* Fix CI

* Revert "Fix CI"

This reverts commit ca7923be73.

* Ooops wrong one

* one more try

* Ok ok let's move this elsewhere

* Alternative to globals() (#8667)

* Alternative to globals()

* Error is raised later so return None

* Sentencepiece not installed make some tokenizers None

* Apply Lysandre wisdom

* Slightly clearer comment?

cc @sgugger

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-11-23 11:15:02 -05:00
Sylvain Gugger
6494910f27
Add sentencepiece to the CI and fix tests (#8672)
* Fix the CI and tests

* Fix quality

* Remove that m form nowhere
2020-11-19 16:44:20 -05:00
Sylvain Gugger
bb03a14edd Update doc for v3.5.1 2020-11-13 10:29:58 -05:00
Funtowicz Morgan
121c24efa4
Update deploy-docs dependencies on CI to enable Flax (#8475)
* Update deploy-docs dependencies on CI to enable Flax

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added pair of ""

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-11-11 18:31:41 -05:00