Commit Graph

2240 Commits

Author SHA1 Message Date
Joao Gante
938cb04789
Generate: add Bloom fixes for contrastive search (#20213) 2022-11-14 18:34:11 +00:00
Yih-Dar
536e60d2c7
mark test_save_load_fast_init_from_base as is_flaky (#20200)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-14 18:51:33 +01:00
Younes Belkada
8dcf494ef1
[ROC_BERT] Make CI happy (#20175)
* fix slow test

* Update tests/models/roc_bert/test_modeling_roc_bert.py
2022-11-14 18:04:25 +01:00
Bartosz Szmelczynski
78a471ff71
Fix tapas scatter (#20149)
* First draft

* Remove scatter dependency

* Add require_torch

* update vectorized sum test, add clone call

* remove artifacts

* fix style

* fix style v2

* remove "scatter" mentions from the code base

* fix isort error

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-14 01:04:26 -05:00
Matthijs Hollemans
f711d683b5
add MobileNetV2 model (#17845)
* add model files etc for MobileNetV2

* rename files for MobileNetV1

* initial implementation of MobileNetV1

* fix conversion script

* cleanup

* write docs

* tweaks

* fix conversion script

* extract hidden states

* fix test cases

* make fixup

* fixup it all

* rename V1 to V2

* fix checkpoints

* fixup

* implement first block + weight conversion

* add remaining layers

* add output stride and dilation

* fixup

* add tests

* add deeplabv3+ head

* a bit of fixup

* finish deeplab conversion

* add link to doc

* fix issue with JIT trace

in_height and in_width would be Tensor objects during JIT trace, which caused Core ML conversion to fail on the remainder op. By making them ints, the result of the padding calculation becomes a constant value.

* cleanup

* fix order of models

* fix rebase error

* remove main from doc link

* add image processor

* remove old feature extractor

* fix converter + other issues

* fixup

* fix unit test

* add to onnx tests (but these appear broken now)

* add post_process_semantic_segmentation

* use google org

* remove unused imports

* move args

* replace weird assert
2022-11-14 01:00:10 -05:00
amyeroberts
6cc06d1739
Fix type - update any PIL.Image.Resampling (#20172) 2022-11-11 16:55:59 +00:00
NielsRogge
cbbeca3d17
[OWL-ViT] Make model consistent with CLIP (#20144)
* Apply fix

* Fix test

* Remove another argument which is not used

* Fix pipeline test

* Add argument back, add deprecation warning

* Add warning add other location

* Use warnings instead

* Add num_channels to config

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MBP.localdomain>
2022-11-11 11:36:17 +01:00
Arthur
61a51f5f23
Add Jukebox model (replaces #16875) (#17826) 2022-11-10 21:05:27 +01:00
Sylvain Gugger
9740a03f61
Skip broken test 2022-11-10 14:59:32 -05:00
Sanchit Gandhi
905e5773a3
[processor] Add 'model input names' property (#20117)
* [processor] Add 'model input names' property

* add test

* no f string

* add generic property method to mixin

* copy to multimodal

* copy to vision

* tests for all audio

* remove ad-hoc tests

* style

* fix flava test

* fix test

* fix processor code
2022-11-10 19:29:20 +00:00
Nicolas Patry
d066c3731b
Adding support for LayoutLMvX variants for object-detection. (#20143)
* Adding support for LayoutLMvX variants for `object-detection`.

* Revert bogs `layoutlm` feature extractor which does not exist (it was a
V2 model) .

* Updated condition.

* Handling the comments.
2022-11-10 11:33:38 +01:00
amyeroberts
f3d99e49d4
Update VisionEncoderDecoder to use an image processor (#20137)
* TrOCR processor uses an image processor

* Update VisionEncoderDecoder

* Add feature_extractor_class property
2022-11-09 16:31:05 +00:00
Joao Gante
f270b960d6
Generate: move generation_*.py src files into generation/*.py (#20096)
* move generation_*.py src files into generation/*.py

* populate generation.__init__ with lazy loading

* move imports and references from generation.xxx.object to generation.object
2022-11-09 15:34:08 +00:00
Nicolas Patry
bac2d29a80
Attempting to test automatically the _keys_to_ignore. (#20042)
* Attempting to test automatically the `_keys_to_ignore`.

* Style.

* First fix pass.

* Moving test on its own.

* Another batch.

* Second round removing BatchNorm

* Fixing layoutlmv{2,3} + support older Python.

* Disable miss missing warning.

* Removing dodgy additions.

* Big pass.

* mbart.

* More corrections.

* Fixup.

* Updating test_correct_missing_keys

* Add escape hatch for when the head has no extra params so doesn't need

the missing keys check.

* Fixing test.

* Greener.

* Green ! (except for weird splinter bug).

* Adding a test about `named_parameters` usage.

* Shorten message.

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* After rebase modifications.

* More explicit condition checking.

* Fixing slow tests issues.

* Remove extra pdb.

* Remove print.

* Attempt to make failure consistent + fixing roc_bert.

* Removing the seed  (all tests passing with it).

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-11-09 16:03:36 +01:00
Yih-Dar
c4cad8e301
Update CLIPSegModelTester (#20134)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-09 15:21:52 +01:00
amyeroberts
4eb918e656
AutoImageProcessor (#20111)
* AutoImageProcessor skeleton

* Update references

* Add mapping in init

* Add model image processors to __init__ for importing

* Add AutoImageProcessor tests

* Fix up

* Image Processor documentation

* Remove pdb

* Update docs/source/en/model_doc/mobilevit.mdx

* Update docs

* Don't add whitespace on json files

* Remove fixtures

* Move checking model config down

* Fix up

* Add check for image processor

* Remove FeatureExtractorMixin in docstrings

* Rename model_tmpfile to config_tmpfile

* Don't make None if not in image processor map
2022-11-08 19:54:41 +00:00
Weiwe Shi
efa889d2e4
Add RocBert (#20013)
* add roc_bert

* update roc_bert readme

* code style

* change name and delete unuse file

* udpate model file

* delete unuse log file

* delete tokenizer fast

* reformat code and change model file path

* add RocBertForPreTraining

* update docs

* delete wrong notes

* fix copies

* fix make repo-consistency error

* fix files are not present in the table of contents error

* change RocBert -> RoCBert

* add doc, add detail test

Co-authored-by: weiweishi <weiweishi@tencent.com>
2022-11-08 10:03:43 -05:00
NielsRogge
258963062b
Add CLIPSeg (#20066)
* Add first draft

* Update conversion script

* Improve conversion script

* Improve conversion script some more

* Add conditional embeddings

* Add initial decoder

* Fix activation function of decoder

* Make decoder outputs match original implementation

* Make decoder outputs match original implementation

* Add more copied from statements

* Improve model outputs

* Fix auto tokenizer file

* Fix more tests

* Add test

* Improve README and docs, improve conditional embeddings

* Fix more tests

* Remove print statements

* Remove initial embeddings

* Improve conversion script

* Add interpolation of position embeddings

* Finish addition of interpolation of position embeddings

* Add support for refined checkpoint

* Fix refined checkpoint

* Remove unused parameter

* Improve conversion script

* Add support for training

* Fix conversion script

* Add CLIPSegFeatureExtractor

* Fix processor

* Fix CLIPSegProcessor

* Fix conversion script

* Fix most tests

* Fix equivalence test

* Fix README

* Add model to doc tests

* Use better variable name

* Convert other checkpoint as well

* Update config, add link to paper

* Add docs

* Update organization

* Replace base_model_prefix with clip

* Fix base_model_prefix

* Fix checkpoint of config

* Fix config checkpoint

* Remove file

* Use logits for output

* Fix tests

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-11-08 10:55:47 +01:00
Yih-Dar
4ab6e9e2f8
Skip 2 tests in VisionTextDualEncoderProcessorTest (#20098)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-07 14:51:05 +01:00
Joao Gante
a0f8674303
Generate: TF contrastive search with XLA support (#20050)
* Add contrastive search
2022-11-07 10:54:29 +00:00
amyeroberts
d68c46026b
Update defaults and logic to match old FE (#20065)
* Update defaults and logic to match old FE

* Use docker run rest values
2022-11-04 19:14:56 +00:00
Yih-Dar
2d02178e5c
Allow passing arguments to model testers for CLIP-like models (#20044)
* POC

* For more CLIP-like models

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-04 18:01:41 +01:00
Matt
1076d587b5
Fix ESM LM head test (#20045)
* Fix esm lm head test

* make fixup
2022-11-04 12:45:34 +00:00
Michael Benayoun
9080607b2c
Fixed torch.finfo issue with torch.fx (#20040) 2022-11-03 16:14:44 +01:00
Sanchit Gandhi
06d488061f
[Whisper Tokenizer] Make more user-friendly (#19921)
* [Whisper Tokenizer] Make more user-friendly

* use property

* make indexing rigorous

* small clean-up

* tests

* skip seq2seq tests

* remove multilingual arg

* reorder args

* collapse to one function

Co-authored-by: ArthurZucker <arthur@huggingface.co>

* option to override attributes

Co-authored-by: ArthurZucker <arthur@huggingface.co>

* add to docs

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* make comment more clear

Co-authored-by: sgugger <sylvain@huggingface.co>

* don't add special tokens in get_decoder_prompt_ids

* add test for set_prefix_tokens

Co-authored-by: ArthurZucker <arthur@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: sgugger <sylvain@huggingface.co>
2022-11-03 14:22:40 +00:00
Nicolas Patry
ec6878f6ca
Now supporting pathlike in pipelines too. (#20030) 2022-11-03 09:14:45 +01:00
Ben Eyal
9f9ddcc2de
🚨 🚨 🚨 Fix Issue 15003: SentencePiece Tokenizers Not Adding Special Tokens in convert_tokens_to_string (#15775)
* Add test for SentencePiece not adding special tokens to strings

* Add SentencePieceStringConversionMixin to fix issue 15003

* Fix conversion from tokens to string for most SentencePiece tokenizers

Tokenizers fixed:
- AlbertTokenizer
- BarthezTokenizer
- CamembertTokenizer
- FNetTokenizer
- M2M100Tokenizer
- MBart50Tokenizer
- PegasusTokenizer
- Speech2TextTokenizer

* Fix MarianTokenizer, adjust SentencePiece test to accomodate vocab

* Fix DebertaV2Tokenizer

* Ignore LayoutXLMTokenizer in SentencePiece string conversion test

* Run 'make style' and 'make quality'

* Clean convert_tokens_to_string test

Instead of explicitly ignoring LayoutXLMTokenizer in the test,
override the test in LayoutLMTokenizationTest and do nothing in it.

* Remove commented out code

* Improve robustness of convert_tokens_to_string test

Instead of comparing lengths of re-tokenized text and input_ids,
check that converting all special tokens to string yields a string
with all special tokens.

* Inline and remove SentencePieceStringConversionMixin

The convert_tokens_to_string method is now implemented
in each relevant SentencePiece tokenizer.

* Run 'make style' and 'make quality'

* Revert removal of space in convert_tokens_to_string

* Remove redundant import

* Revert test text to original

* Uncomment the lowercasing of the reverse_text variable

* Mimic Rust tokenizer behavior for tokenizers

- Albert
- Barthez
- Camembert
- MBart50
- T5

* Fix accidentally skipping test in wrong tokenizer

* Add test for equivalent Rust and slow tokenizer behavior

* Override _decode in BigBirdTokenizer to mimic Rust behavior

* Override _decode in FNetTokenizer to mimic Rust behavior

* Override _decode in XLNetTokenizer to mimic Rust behavior

* Remove unused 're' import

* Update DebertaV2Tokenizer to mimic Rust tokenizer

* Deberta tokenizer now behaves like Albert and its `convert_tokens_to_string` is not tested.

* Ignore problematic tests in Deberta V2

* Add comment on why the Deberta V2 tests are skipped
2022-11-02 15:45:38 -04:00
Yih-Dar
f69eb24b5a
Improve model tester (#19984)
* part 1

* part 2

* part 3

* fix

* For CANINE

* For ESMFold

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-02 17:38:44 +01:00
amyeroberts
9aedce99b0
Update auto processor to check image processor created (#20021) 2022-11-02 15:19:33 +00:00
Sylvain Gugger
49b77b89ea
Quality (#20002) 2022-11-02 09:53:37 -04:00
Yih-Dar
c6c9db3d0c
Fix gradient checkpoint test in encoder-decoder (#20017)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-02 14:15:09 +01:00
amyeroberts
a6b7759880
Add Image Processors (#19796)
* Add CLIP image processor

* Crop size as dict too

* Update warning

* Actually use logger this time

* Normalize doesn't change dtype of input

* Add perceiver image processor

* Tidy up

* Add DPT image processor

* Add Vilt image processor

* Tidy up

* Add poolformer image processor

* Tidy up

* Add LayoutLM v2 and v3 imsge processors

* Tidy up

* Add Flava image processor

* Tidy up

* Add deit image processor

* Tidy up

* Add ConvNext image processor

* Tidy up

* Add levit image processor

* Add segformer image processor

* Add in post processing

* Fix up

* Add ImageGPT image processor

* Fixup

* Add mobilevit image processor

* Tidy up

* Add postprocessing

* Fixup

* Add VideoMAE image processor

* Tidy up

* Add ImageGPT image processor

* Fixup

* Add ViT image processor

* Tidy up

* Add beit image processor

* Add mobilevit image processor

* Tidy up

* Add postprocessing

* Fixup

* Fix up

* Fix flava and remove tree module

* Fix image classification pipeline failing tests

* Update feature extractor in trainer scripts

* Update pad_if_smaller to accept tuple and int size

* Update for image segmentation pipeline

* Update src/transformers/models/perceiver/image_processing_perceiver.py

Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>

* Update src/transformers/image_processing_utils.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/beit/image_processing_beit.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* PR comments - docstrings; remove accidentally added resize; var names

* Update docstrings

* Add exception if size is not in the right format

* Fix exception check

* Fix up

* Use shortest_edge in tuple in script

Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2022-11-02 11:57:36 +00:00
Joao Gante
831590f6a9
Generate: contrastive search with full optional outputs (#19963)
* Use beam search functionality; Add extra outputs and test

* Add full tests for contrastive search

* Add error message on unconventional cache format
2022-11-01 18:15:36 +00:00
Mohit Sharma
c796b6dea6
Added onnx config whisper (#19525)
* Added onnx config whisper

* added whisper support onnx

* add audio input data

* added whisper support onnx

* fixed the seqlength value

* Updated the whisper onnx ocnfig

* restore files to old version

* removed attention mask from inputs

* Updated get_dummy_input_onnxruntime docstring

* Updated relative imports and token generation

* update docstring
2022-11-01 07:50:42 -04:00
Matt
7f9b7b3f0e
Add ESMFold (#19977)
* initial commit

* First draft that gets outputs without crashing!

* Add all the ported openfold dependencies

* testing

* Restructure config files for ESMFold

* Debugging to find output discrepancies

* Mainly style

* Make model runnable without extra deps

* Remove utils and merge them to the modeling file

* Use correct gelu and remove some debug prints

* More cleanup

* Update esm docs

* Update conversion script to support ESMFold properly

* Port some top-level changes from ESMFold repo

* Expand EsmFold docstrings

* Make attention_mask optional (default to all 1s)

* Add inference test for ESMFold

* Use config and not n kwargs

* Add modeling output class

* Remove einops

* Remove chunking in ESM FFN

* Update tests for ESMFold

* Quality

* REpo consistency

* Remove tree dependency from ESMFold

* make fixup

* Add an error in case my structure map function breaks later

* Remove needless code

* Stop auto-casting the LM to float16 so CPU tests pass

* Stop auto-casting the LM to float16 so CPU tests pass

* Final test updates

* Split test file

* Copyright and quality

* Unpin PyTorch to see built doc

* Fix config file to_dict() method

* Add some docstrings to the output

* Skip TF checkpoint tests for ESM until we reupload those

* make fixup

* More docstrings

* Unpin to get even with main

* Flag example to write

Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
2022-10-31 21:32:58 -04:00
NielsRogge
4c9e0f029e
Add support for gradient checkpointing (#19990)
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-10-31 18:37:17 +01:00
NielsRogge
0b294c2334
[Conditional, Deformable DETR] Add postprocessing methods (#19709)
* Add postprocessing methods

* Update docs

* Add fix

* Add test

* Add test for deformable detr postprocessing

* Add post processing methods for segmentation

* Update code examples

* Add post_process to make the pipeline work

* Apply updates

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-10-31 08:28:44 +01:00
Raghav Prabhakar
0d4c45c585
Add Onnx Config for ImageGPT (#19868)
* add Onnx Config for ImageGPT

* add generate_dummy_inputs for onnx config

* add TYPE_CHECKING clause

* Update doc for generate_dummy_inputs

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-10-28 09:39:53 -04:00
donguk.lim
347ba38cb4
Support segformer fx (#19924)
* Support segformer fx

* Add fx_compatible attribute to test_modeling_segformer.py

* Update glpn model (fx support)

glpn model was copied from segformer.

* Update utils/fx.py | add semantic-segmentation

for SegformerForSemanticSegmentation model

* Fix minor import order(isort)

* Add random input generation for segformer fx

Co-authored-by: noelbird <lduldu00228@gmail.com>
2022-10-28 08:44:38 -04:00
Sylvain Gugger
6c24443ff5
Safetensors tf (#19900)
* Wip

* Add safetensors support for TensorFlow

* First tests

* Add final test for now

* Retrigger CI like this

* Update src/transformers/modeling_tf_utils.py

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2022-10-27 15:56:29 -04:00
Antonio Carlos Falcão Petri
ea118ae2e1
Fix bug in Wav2Vec2's GPU tests (#19803)
* Fix tests when running on GPU

* Fix tests that require mp.set_start_method
2022-10-27 09:00:03 -04:00
Yih-Dar
f1e42bc50e
Some fixes regarding auto mappings and test class names (#19923)
* Add pegasus_x

* ViTMSN

* ESM

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-10-27 14:38:59 +02:00
Younes Belkada
7629656926
accelerate support for RoBERTa family (#19906) 2022-10-26 22:41:53 +02:00
Patrick von Platen
6d023270f6
Allow flax subfolder (#19902)
* add first generation tutorial

* [Flax] Add subfolder functionality

* [Flax] Add subfolder functionality

* up

* finish

* delete file and re-add test
2022-10-26 18:33:23 +02:00
Yih-Dar
688c3e8e40
Update max_diff in test_save_load_fast_init_to_base (#19849)
* Fix test_save_load_fast_init_to_base

* Fix test_save_load_fast_init_to_base

* update

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-10-26 17:09:47 +02:00
Nicolas Patry
5fd5990dce
Factored out some code in the image-segmentation pipeline. (#19727)
* Factored out some code in the image-segmentation pipeline

Re-enable `small_model_pt`.

Re-enable `small_model_pt`.

Enabling the current test with the current values.

Debugging the values on the CI.

More logs ? Printing doesn't work ?

Using the CI values instead. Seems to be a Pillow sensitivity.

Added a test showcasing that models not supporting some tasks get a
clear error.

Factored out code.

Further factor out.

Fixup.

Bad rebase.

Put `panoptic` before `instance` as it should be a superset.

* Fixing tests.

* Adding subtasks tests

+ Fixes `instance` segmentation which was broken due to default and
non kwargs arguments.

* Fix bad replace.
2022-10-26 10:44:36 +02:00
Yih-Dar
f9257843b5
Fix incorrect model<->tokenizer mapping in tokenization testing (#19872)
* Fix model-tokenizer mapping

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-10-25 16:02:13 +02:00
Lysandre Debut
eedaba682f
[Past CI] Vilt only supports PT >= v1.10 (#19851)
* Support for Vilt in v1.9

* Skip if not higher or equal than 1.10

* Move test :)

* I am bad at python
2022-10-25 15:59:35 +02:00
Guillaume Klein
ab108a0e31
Add missing lang tokens in M2M100Tokenizer.get_vocab (#18416) 2022-10-25 09:18:24 -04:00
Sylvain Gugger
d4eb52d13d
Refactor conversion function (#19799)
* Refactor conversion function

* Remove dupe line

* Fixes

* Fixes

* Use the right variable...

* Fix last test
2022-10-24 13:48:40 -04:00