Sylvain Gugger
fb1c8db78a
Fix AutoTokenizer with subfolder passed ( #20110 )
2022-11-07 17:59:46 -05:00
Tom Aarsen
6156bffa2b
Replace awkward timm link with the expected one ( #20109 )
2022-11-07 13:57:39 -05:00
Steven Liu
71f772ebd0
Add new terms to the glossary ( #20051 )
...
* add new terms
* apply review
2022-11-07 10:45:27 -08:00
Tom Aarsen
d44ac47bac
docs: Fixed variables in f-strings ( #20087 )
...
* docs: Fixed variables in f-strings
* Replace unknown `block` with known `block_type` in ValueError
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add missing torch import in docs code block
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-11-07 13:18:09 -05:00
Yih-Dar
2bdd9fa284
Fix generate_dummy_inputs for ImageGPTOnnxConfig ( #20103 )
...
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-07 16:31:26 +01:00
TAGAMI Yukihiro
cfaeb1539e
use huggingface_hub.model_info() to get pipeline_tag ( #20077 )
2022-11-07 10:07:59 -05:00
Tom Aarsen
3222fc645b
docs: Resolve many typos in the English docs ( #20088 )
...
* docs: Fix typo in ONNX parser help: 'tolerence' => 'tolerance'
* docs: Resolve many typos in the English docs
Typos found via 'codespell ./docs/source/en'
2022-11-07 09:19:04 -05:00
Tom Aarsen
b8112eddec
Replace unsupported facebookresearch/bitsandbytes ( #20093 )
...
With https://github.com/TimDettmers/bitsandbytes , which is by the same author and is still being updated
2022-11-07 08:52:03 -05:00
Yih-Dar
4ab6e9e2f8
Skip 2 tests in VisionTextDualEncoderProcessorTest ( #20098 )
...
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-07 14:51:05 +01:00
Saad Mahmud
b77406bcb2
Removing RobertaConfig inheritance from CamembertConfig ( #20059 )
...
* swap RobertaConfig with PretrainedConfig
* Add camembert specific attributes
* Add PretrainedConfig docstring
* Add arguments docstring
* Change CamembertConfig docstring definition
* Fix typo CamembertConfig -> CamembertModel
* Fix typo BertModel -> CamembertModel
* Fix style of CamembertConfig
2022-11-07 08:50:10 -05:00
Saad Mahmud
9617b1304e
[Doctest] Add configuration_dpr.py ( #20080 )
...
* Add example docstring for DPRConfig
* Add DPRConfig to documentation_tests
2022-11-07 14:49:59 +01:00
Joao Gante
a0f8674303
Generate: TF contrastive search with XLA support ( #20050 )
...
* Add contrastive search
2022-11-07 10:54:29 +00:00
Christopher Akiki
504db92e7d
Update hub.py ( #20075 )
2022-11-04 22:25:02 +01:00
Christopher Akiki
4b86e44693
Update modeling_tf_utils.py ( #20076 )
2022-11-04 22:24:37 +01:00
amyeroberts
d68c46026b
Update defaults and logic to match old FE ( #20065 )
...
* Update defaults and logic to match old FE
* Use docker run rest values
2022-11-04 19:14:56 +00:00
Yih-Dar
c06d555647
Show installed libraries and their versions in GA jobs ( #20069 )
...
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-04 18:03:18 +01:00
Yih-Dar
2d02178e5c
Allow passing arguments to model testers for CLIP-like models ( #20044 )
...
* POC
* For more CLIP-like models
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-04 18:01:41 +01:00
Jordan Clive
3bd0007e87
Update documentation on seq2seq models with absolute positional embeddings, to be in line with Tips section for BERT and GPT2 ( #20068 )
...
Co-authored-by: jordiclive <jordiclive19@imperial.ac.uk>
2022-11-04 11:32:44 -04:00
Matt
6e1c5786dc
Update READMEs for ESMFold and add notebooks ( #20067 )
...
* Update READMEs for ESMFold and add notebooks
* Fix PyCharm formatting
* make fix-copies
2022-11-04 15:10:13 +00:00
H. Jhoo
707b12a353
change constant torch.tensor to torch.full ( #20061 )
2022-11-04 10:41:56 -04:00
NielsRogge
787620e2a2
[Swin] Add Swin SimMIM checkpoints ( #20034 )
...
* Fix Swin
* Remove file
* Update code snippet
* Add copied from to maskformer
* Fix docstring
* Add whole name to replace
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-11-04 15:32:44 +01:00
amyeroberts
3936411b9d
PoolformerImageProcessor defaults to match previous FE ( #20048 )
...
* Poolformer image processor defaults to previous FE
* Remove unnecessary math.floor
2022-11-04 13:52:58 +00:00
Sanchit Gandhi
94e17c456c
[Trainer] Fix model name in push_to_hub ( #20064 )
2022-11-04 13:40:21 +00:00
Sourab Mangrulkar
19067711e7
fix tokenizer_type to avoid error when loading checkpoint back ( #20062 )
2022-11-04 19:04:01 +05:30
bhuang
3502c202f9
Update README.md ( #20063 )
2022-11-04 08:56:54 -04:00
Matt
1076d587b5
Fix ESM LM head test ( #20045 )
...
* Fix esm lm head test
* make fixup
2022-11-04 12:45:34 +00:00
Patrick Deutschmann
d447c460b1
Speed up TF token classification postprocessing by converting complete tensors to numpy ( #19976 )
...
* Speed up TF postprocessing by converting to numpy before
* Fix bug that was triggered when offset_mapping was None
Co-authored-by: Patrick Deutschmann <patrick.deutschmann@dedalus.com>
2022-11-03 16:56:22 +00:00
Sylvain Gugger
06886d5a68
Only resize embeddings when necessary ( #20043 )
...
* Only resize embeddings when necessary
* Add comment
2022-11-03 12:05:04 -04:00
Michael Benayoun
9080607b2c
Fixed torch.finfo issue with torch.fx ( #20040 )
2022-11-03 16:14:44 +01:00
Matt
6f257bb3c2
Update esmfold conversion script ( #20028 )
...
* Update ESM conversion script for ESMfold
* Fix bug in ESMFold example
* make fixup and move restypes to one line
2022-11-03 14:58:06 +00:00
Wang, Yi
2564f0c21d
fix jit trace error when the model forward argument order is not aligned with the jit.trace tuple input order, update related doc ( #19891 )
...
* fix jit trace error for classification usecase, update related doc
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* add implementation in torch 1.14.0
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* update_doc
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* update_doc
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2022-11-03 10:50:03 -04:00
Arthur
737bff6a36
[FutureWarning] Add future warning for LEDForSequenceClassification ( #19066 )
...
* fix led eos_mask
* add FutureWarning
* revert useless changes
* Update src/transformers/models/led/modeling_led.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-11-03 15:26:09 +01:00
Sanchit Gandhi
06d488061f
[Whisper Tokenizer] Make more user-friendly ( #19921 )
...
* [Whisper Tokenizer] Make more user-friendly
* use property
* make indexing rigorous
* small clean-up
* tests
* skip seq2seq tests
* remove multilingual arg
* reorder args
* collapse to one function
Co-authored-by: ArthurZucker <arthur@huggingface.co>
* option to override attributes
Co-authored-by: ArthurZucker <arthur@huggingface.co>
* add to docs
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* make comment more clear
Co-authored-by: sgugger <sylvain@huggingface.co>
* don't add special tokens in get_decoder_prompt_ids
* add test for set_prefix_tokens
Co-authored-by: ArthurZucker <arthur@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: sgugger <sylvain@huggingface.co>
2022-11-03 14:22:40 +00:00
Saad Mahmud
790ff2544a
[Doctest] Add configuration_camembert.py ( #20039 )
...
* Add example docstring for CamembertConfig
* Add configuration_camembert to documentation_tests
2022-11-03 14:50:42 +01:00
Yih-Dar
9ccea7acb1
Fix some doctests after PR 15775 ( #20036 )
...
* Add skip_special_tokens=True in some doctest
* For T5
* Fix for speech_to_text.mdx
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-03 14:18:45 +01:00
amyeroberts
a639ea9e8a
Add **kwargs ( #20037 )
2022-11-03 12:51:49 +00:00
Nicolas Patry
ec6878f6ca
Now supporting pathlike in pipelines too. ( #20030 )
2022-11-03 09:14:45 +01:00
Steven Liu
aa39967b28
reorganize glossary ( #20010 )
2022-11-02 16:58:17 -07:00
Yih-Dar
305e8718b4
Show installed libraries and their versions in CI jobs ( #20026 )
...
* Show versions
* check
* store outputs
* revert
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-02 20:52:39 +01:00
Ben Eyal
9f9ddcc2de
🚨 🚨 🚨 Fix Issue 15003: SentencePiece Tokenizers Not Adding Special Tokens in convert_tokens_to_string ( #15775 )
...
* Add test for SentencePiece not adding special tokens to strings
* Add SentencePieceStringConversionMixin to fix issue 15003
* Fix conversion from tokens to string for most SentencePiece tokenizers
Tokenizers fixed:
- AlbertTokenizer
- BarthezTokenizer
- CamembertTokenizer
- FNetTokenizer
- M2M100Tokenizer
- MBart50Tokenizer
- PegasusTokenizer
- Speech2TextTokenizer
* Fix MarianTokenizer, adjust SentencePiece test to accommodate vocab
* Fix DebertaV2Tokenizer
* Ignore LayoutXLMTokenizer in SentencePiece string conversion test
* Run 'make style' and 'make quality'
* Clean convert_tokens_to_string test
Instead of explicitly ignoring LayoutXLMTokenizer in the test,
override the test in LayoutLMTokenizationTest and do nothing in it.
* Remove commented out code
* Improve robustness of convert_tokens_to_string test
Instead of comparing lengths of re-tokenized text and input_ids,
check that converting all special tokens to string yields a string
with all special tokens.
* Inline and remove SentencePieceStringConversionMixin
The convert_tokens_to_string method is now implemented
in each relevant SentencePiece tokenizer.
* Run 'make style' and 'make quality'
* Revert removal of space in convert_tokens_to_string
* Remove redundant import
* Revert test text to original
* Uncomment the lowercasing of the reverse_text variable
* Mimic Rust tokenizer behavior for tokenizers
- Albert
- Barthez
- Camembert
- MBart50
- T5
* Fix accidentally skipping test in wrong tokenizer
* Add test for equivalent Rust and slow tokenizer behavior
* Override _decode in BigBirdTokenizer to mimic Rust behavior
* Override _decode in FNetTokenizer to mimic Rust behavior
* Override _decode in XLNetTokenizer to mimic Rust behavior
* Remove unused 're' import
* Update DebertaV2Tokenizer to mimic Rust tokenizer
* Deberta tokenizer now behaves like Albert and its `convert_tokens_to_string` is not tested.
* Ignore problematic tests in Deberta V2
* Add comment on why the Deberta V2 tests are skipped
2022-11-02 15:45:38 -04:00
Yih-Dar
fb7cbe236b
Fix doctest ( #20023 )
...
* Fix doctest
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-02 19:37:25 +01:00
Yih-Dar
f69eb24b5a
Improve model tester ( #19984 )
...
* part 1
* part 2
* part 3
* fix
* For CANINE
* For ESMFold
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-02 17:38:44 +01:00
Saad Mahmud
7487743793
[Doctest] Add configuration_deberta_v2.py ( #19995 )
...
* Add example docstring for DebertaV2Config
* Add DebertaV2Config to documentation_tests
* Fix mistake with directory name
2022-11-02 16:22:11 +01:00
amyeroberts
9aedce99b0
Update auto processor to check image processor created ( #20021 )
2022-11-02 15:19:33 +00:00
Sylvain Gugger
49b77b89ea
Quality ( #20002 )
2022-11-02 09:53:37 -04:00
Yih-Dar
c6c9db3d0c
Fix gradient checkpoint test in encoder-decoder ( #20017 )
...
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-02 14:15:09 +01:00
amyeroberts
a6b7759880
Add Image Processors ( #19796 )
...
* Add CLIP image processor
* Crop size as dict too
* Update warning
* Actually use logger this time
* Normalize doesn't change dtype of input
* Add perceiver image processor
* Tidy up
* Add DPT image processor
* Add Vilt image processor
* Tidy up
* Add poolformer image processor
* Tidy up
* Add LayoutLM v2 and v3 image processors
* Tidy up
* Add Flava image processor
* Tidy up
* Add deit image processor
* Tidy up
* Add ConvNext image processor
* Tidy up
* Add levit image processor
* Add segformer image processor
* Add in post processing
* Fix up
* Add ImageGPT image processor
* Fixup
* Add mobilevit image processor
* Tidy up
* Add postprocessing
* Fixup
* Add VideoMAE image processor
* Tidy up
* Add ImageGPT image processor
* Fixup
* Add ViT image processor
* Tidy up
* Add beit image processor
* Add mobilevit image processor
* Tidy up
* Add postprocessing
* Fixup
* Fix up
* Fix flava and remove tree module
* Fix image classification pipeline failing tests
* Update feature extractor in trainer scripts
* Update pad_if_smaller to accept tuple and int size
* Update for image segmentation pipeline
* Update src/transformers/models/perceiver/image_processing_perceiver.py
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
* Update src/transformers/image_processing_utils.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/beit/image_processing_beit.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* PR comments - docstrings; remove accidentally added resize; var names
* Update docstrings
* Add exception if size is not in the right format
* Fix exception check
* Fix up
* Use shortest_edge in tuple in script
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2022-11-02 11:57:36 +00:00
Ripose
2e3452af0f
make sentencepiece import conditional in bertjapanesetokenizer ( #20012 )
2022-11-02 07:44:37 -04:00
Yih-Dar
8827e1b217
clean up vision/text config dict arguments ( #19954 )
...
* clean up
* For backward compatibility
* clean up
* Same changes for more models
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-02 12:03:43 +01:00
Alara Dirik
cb630ffab8
Update object detection pipeline to use post_process_object_detection methods ( #20004 )
2022-11-02 10:26:36 +03:00