Commit Graph

11251 Commits

Author SHA1 Message Date
Mishig
e2a23b6ce9
Update github pr docs actions (#20125) 2022-11-08 10:37:24 -05:00
Sylvain Gugger
2d6a92f22a
Fix repo consistency 2022-11-08 10:04:30 -05:00
Weiwe Shi
efa889d2e4
Add RocBert (#20013)
* add roc_bert

* update roc_bert readme

* code style

* change name and delete unuse file

* udpate model file

* delete unuse log file

* delete tokenizer fast

* reformat code and change model file path

* add RocBertForPreTraining

* update docs

* delete wrong notes

* fix copies

* fix make repo-consistency error

* fix files are not present in the table of contents error

* change RocBert -> RoCBert

* add doc, add detail test

Co-authored-by: weiweishi <weiweishi@tencent.com>
2022-11-08 10:03:43 -05:00
NielsRogge
258963062b
Add CLIPSeg (#20066)
* Add first draft

* Update conversion script

* Improve conversion script

* Improve conversion script some more

* Add conditional embeddings

* Add initial decoder

* Fix activation function of decoder

* Make decoder outputs match original implementation

* Make decoder outputs match original implementation

* Add more copied from statements

* Improve model outputs

* Fix auto tokenizer file

* Fix more tests

* Add test

* Improve README and docs, improve conditional embeddings

* Fix more tests

* Remove print statements

* Remove initial embeddings

* Improve conversion script

* Add interpolation of position embeddings

* Finish addition of interpolation of position embeddings

* Add support for refined checkpoint

* Fix refined checkpoint

* Remove unused parameter

* Improve conversion script

* Add support for training

* Fix conversion script

* Add CLIPSegFeatureExtractor

* Fix processor

* Fix CLIPSegProcessor

* Fix conversion script

* Fix most tests

* Fix equivalence test

* Fix README

* Add model to doc tests

* Use better variable name

* Convert other checkpoint as well

* Update config, add link to paper

* Add docs

* Update organization

* Replace base_model_prefix with clip

* Fix base_model_prefix

* Fix checkpoint of config

* Fix config checkpoint

* Remove file

* Use logits for output

* Fix tests

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-11-08 10:55:47 +01:00
Sanchit Gandhi
3e39fd09a9
[Audio Processor] Only pass sr to feat extractor (#20022)
* [Audio Processor] Only pass sr to feat extractor

* move out of if/else

* copy to other processors
2022-11-08 08:59:03 +00:00
Sylvain Gugger
fb1c8db78a
Fix AutoTokenizer with subfolder passed (#20110) 2022-11-07 17:59:46 -05:00
Tom Aarsen
6156bffa2b
Replace awkward timm link with the expected one (#20109) 2022-11-07 13:57:39 -05:00
Steven Liu
71f772ebd0
Add new terms to the glossary (#20051)
* add new terms

* apply review
2022-11-07 10:45:27 -08:00
Tom Aarsen
d44ac47bac
docs: Fixed variables in f-strings (#20087)
* docs: Fixed variables in f-strings

* Replace unknown `block` with known `block_type` in ValueError

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Add missing torch import in docs code block

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-11-07 13:18:09 -05:00
Yih-Dar
2bdd9fa284
Fix generate_dummy_inputs for ImageGPTOnnxConfig (#20103)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-07 16:31:26 +01:00
TAGAMI Yukihiro
cfaeb1539e
use huggingface_hub.model_inifo() to get pipline_tag (#20077) 2022-11-07 10:07:59 -05:00
Tom Aarsen
3222fc645b
docs: Resolve many typos in the English docs (#20088)
* docs: Fix typo in ONNX parser help: 'tolerence' => 'tolerance'

* docs: Resolve many typos in the English docs

Typos found via 'codespell ./docs/source/en'
2022-11-07 09:19:04 -05:00
Tom Aarsen
b8112eddec
Replace unsupported facebookresearch/bitsandbytes (#20093)
With https://github.com/TimDettmers/bitsandbytes, which is by the same author and is still being updated
2022-11-07 08:52:03 -05:00
Yih-Dar
4ab6e9e2f8
Skip 2 tests in VisionTextDualEncoderProcessorTest (#20098)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-07 14:51:05 +01:00
Saad Mahmud
b77406bcb2
Removing RobertaConfig inheritance from CamembertConfig (#20059)
* swap RobertaConfig with PretrainedConfig

* Add camembert specific attributes

* Add PretrainedConfig docstring

* Add arguments docstring

* Change CamembertConfig docstring definition

* Fix typo CamembertConfig -> CamembertModel

* Fix typo BertModel -> CamembertModel

* Fix style of CamembertConfig
2022-11-07 08:50:10 -05:00
Saad Mahmud
9617b1304e
[Doctest] Add configuration_dpr.py (#20080)
* Add example docstring for DPRConfig

* Add DPRConfig to documentation_tests
2022-11-07 14:49:59 +01:00
Joao Gante
a0f8674303
Generate: TF contrastive search with XLA support (#20050)
* Add contrastive search
2022-11-07 10:54:29 +00:00
Christopher Akiki
504db92e7d
Update hub.py (#20075) 2022-11-04 22:25:02 +01:00
Christopher Akiki
4b86e44693
Update modeling_tf_utils.py (#20076) 2022-11-04 22:24:37 +01:00
amyeroberts
d68c46026b
Update defaults and logic to match old FE (#20065)
* Update defaults and logic to match old FE

* Use docker run rest values
2022-11-04 19:14:56 +00:00
Yih-Dar
c06d555647
Show installed libraries and their versions in GA jobs (#20069)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-04 18:03:18 +01:00
Yih-Dar
2d02178e5c
Allow passing arguments to model testers for CLIP-like models (#20044)
* POC

* For more CLIP-like models

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-04 18:01:41 +01:00
Jordan Clive
3bd0007e87
Update documentation on seq2seq models with absolute positional embeddings, to be in line with Tips section for BERT and GPT2 (#20068)
Co-authored-by: jordiclive <jordiclive19@imperial.ac.uk>
2022-11-04 11:32:44 -04:00
Matt
6e1c5786dc
Update READMEs for ESMFold and add notebooks (#20067)
* Update READMEs for ESMFold and add notebooks

* Fix PyCharm formatting

* make fix-copies
2022-11-04 15:10:13 +00:00
H. Jhoo
707b12a353
change constant torch.tensor to torch.full (#20061) 2022-11-04 10:41:56 -04:00
NielsRogge
787620e2a2
[Swin] Add Swin SimMIM checkpoints (#20034)
* Fix Swin

* Remove file

* Update code snippet

* Add copied from to maskformer

* Fix docstring

* Add whole name to replace

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-11-04 15:32:44 +01:00
amyeroberts
3936411b9d
PoolformerImageProcessor defaults to match previous FE (#20048)
* Poolformer image processor defaults to previous FE

* Remove unnecessary math.floor
2022-11-04 13:52:58 +00:00
Sanchit Gandhi
94e17c456c
[Trainer] Fix model name in push_to_hub (#20064) 2022-11-04 13:40:21 +00:00
Sourab Mangrulkar
19067711e7
fix tokenizer_type to avoid error when loading checkpoint back (#20062) 2022-11-04 19:04:01 +05:30
bhuang
3502c202f9
Update README.md (#20063) 2022-11-04 08:56:54 -04:00
Matt
1076d587b5
Fix ESM LM head test (#20045)
* Fix esm lm head test

* make fixup
2022-11-04 12:45:34 +00:00
Patrick Deutschmann
d447c460b1
Speed up TF token classification postprocessing by converting complete tensors to numpy (#19976)
* Speed up TF postprocessing by converting to numpy before

* Fix bug that was triggered when offset_mapping was None

Co-authored-by: Patrick Deutschmann <patrick.deutschmann@dedalus.com>
2022-11-03 16:56:22 +00:00
Sylvain Gugger
06886d5a68
Only resize embeddings when necessary (#20043)
* Only resize embeddings when necessary

* Add comment
2022-11-03 12:05:04 -04:00
Michael Benayoun
9080607b2c
Fixed torch.finfo issue with torch.fx (#20040) 2022-11-03 16:14:44 +01:00
Matt
6f257bb3c2
Update esmfold conversion script (#20028)
* Update ESM conversion script for ESMfold

* Fix bug in ESMFold example

* make fixup and move restypes to one line
2022-11-03 14:58:06 +00:00
Wang, Yi
2564f0c21d fix jit trace error for model forward sequence is not aligned with jit.trace tuple input sequence, update related doc (#19891)
* fix jit trace error for classification usecase, update related doc

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* add implementation in torch 1.14.0

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* update_doc

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* update_doc

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2022-11-03 10:50:03 -04:00
Arthur
737bff6a36
[FuturWarning] Add futur warning for LEDForSequenceClassification (#19066)
* fix led eos_mask

* add Futur Warning

* revert uselesss cahnges

* Update src/transformers/models/led/modeling_led.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-11-03 15:26:09 +01:00
Sanchit Gandhi
06d488061f
[Whisper Tokenizer] Make more user-friendly (#19921)
* [Whisper Tokenizer] Make more user-friendly

* use property

* make indexing rigorous

* small clean-up

* tests

* skip seq2seq tests

* remove multilingual arg

* reorder args

* collapse to one function

Co-authored-by: ArthurZucker <arthur@huggingface.co>

* option to override attributes

Co-authored-by: ArthurZucker <arthur@huggingface.co>

* add to docs

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* make comment more clear

Co-authored-by: sgugger <sylvain@huggingface.co>

* don't add special tokens in get_decoder_prompt_ids

* add test for set_prefix_tokens

Co-authored-by: ArthurZucker <arthur@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: sgugger <sylvain@huggingface.co>
2022-11-03 14:22:40 +00:00
Saad Mahmud
790ff2544a
[Doctest] Add configuration_camembert.py (#20039)
* Add example docstring for CamembertConfig

* Add configuration_camembert to documentation_tests
2022-11-03 14:50:42 +01:00
Yih-Dar
9ccea7acb1
Fix some doctests after PR 15775 (#20036)
* Add skip_special_tokens=True in some doctest

* For T5

* Fix for speech_to_text.mdx

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-03 14:18:45 +01:00
amyeroberts
a639ea9e8a
Add **kwargs (#20037) 2022-11-03 12:51:49 +00:00
Nicolas Patry
ec6878f6ca
Now supporting pathlike in pipelines too. (#20030) 2022-11-03 09:14:45 +01:00
Steven Liu
aa39967b28
reorganize glossary (#20010) 2022-11-02 16:58:17 -07:00
Yih-Dar
305e8718b4
Show installed libraries and their versions in CI jobs (#20026)
* Show versions

* check

* store outputs

* revert

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-02 20:52:39 +01:00
Ben Eyal
9f9ddcc2de
🚨 🚨 🚨 Fix Issue 15003: SentencePiece Tokenizers Not Adding Special Tokens in convert_tokens_to_string (#15775)
* Add test for SentencePiece not adding special tokens to strings

* Add SentencePieceStringConversionMixin to fix issue 15003

* Fix conversion from tokens to string for most SentencePiece tokenizers

Tokenizers fixed:
- AlbertTokenizer
- BarthezTokenizer
- CamembertTokenizer
- FNetTokenizer
- M2M100Tokenizer
- MBart50Tokenizer
- PegasusTokenizer
- Speech2TextTokenizer

* Fix MarianTokenizer, adjust SentencePiece test to accomodate vocab

* Fix DebertaV2Tokenizer

* Ignore LayoutXLMTokenizer in SentencePiece string conversion test

* Run 'make style' and 'make quality'

* Clean convert_tokens_to_string test

Instead of explicitly ignoring LayoutXLMTokenizer in the test,
override the test in LayoutLMTokenizationTest and do nothing in it.

* Remove commented out code

* Improve robustness of convert_tokens_to_string test

Instead of comparing lengths of re-tokenized text and input_ids,
check that converting all special tokens to string yields a string
with all special tokens.

* Inline and remove SentencePieceStringConversionMixin

The convert_tokens_to_string method is now implemented
in each relevant SentencePiece tokenizer.

* Run 'make style' and 'make quality'

* Revert removal of space in convert_tokens_to_string

* Remove redundant import

* Revert test text to original

* Uncomment the lowercasing of the reverse_text variable

* Mimic Rust tokenizer behavior for tokenizers

- Albert
- Barthez
- Camembert
- MBart50
- T5

* Fix accidentally skipping test in wrong tokenizer

* Add test for equivalent Rust and slow tokenizer behavior

* Override _decode in BigBirdTokenizer to mimic Rust behavior

* Override _decode in FNetTokenizer to mimic Rust behavior

* Override _decode in XLNetTokenizer to mimic Rust behavior

* Remove unused 're' import

* Update DebertaV2Tokenizer to mimic Rust tokenizer

* Deberta tokenizer now behaves like Albert and its `convert_tokens_to_string` is not tested.

* Ignore problematic tests in Deberta V2

* Add comment on why the Deberta V2 tests are skipped
2022-11-02 15:45:38 -04:00
Yih-Dar
fb7cbe236b
Fix doctest (#20023)
* Fix doctest

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-02 19:37:25 +01:00
Yih-Dar
f69eb24b5a
Improve model tester (#19984)
* part 1

* part 2

* part 3

* fix

* For CANINE

* For ESMFold

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-02 17:38:44 +01:00
Saad Mahmud
7487743793
[Doctest] Add configuration_deberta_v2.py (#19995)
* Add example docstring for DebertaV2Config

* Add DebertaV2Config to documentation_tests

* Fix mistake with directory name
2022-11-02 16:22:11 +01:00
amyeroberts
9aedce99b0
Update auto processor to check image processor created (#20021) 2022-11-02 15:19:33 +00:00
Sylvain Gugger
49b77b89ea
Quality (#20002) 2022-11-02 09:53:37 -04:00