transformers/tests/utils/test_image_utils.py
amyeroberts 4181320b8c
Add normalize to image transforms module (#19544)
* Adapt FE methods to transforms library

* Mixin for saving the image processor

* Base processor skeleton

* BatchFeature for packaging image processor outputs

* Initial image processor for GLPN

* REmove accidental import

* Fixup and docs

* Mixin for saving the image processor

* Fixup and docs

* Import BatchFeature from feature_extraction_utils

* Fixup and docs

* Fixup and docs

* Fixup and docs

* Fixup and docs

* BatchFeature for packaging image processor outputs

* Import BatchFeature from feature_extraction_utils

* Import BatchFeature from feature_extraction_utils

* Fixup and docs

* Fixup and docs

* BatchFeature for packaging image processor outputs

* Import BatchFeature from feature_extraction_utils

* Fixup and docs

* Mixin for saving the image processor

* Fixup and docs

* Add rescale back and remove ImageType

* fix import mistake

* Fix enum var reference

* Can transform and specify image data format

* Remove redundant function

* Update reference

* Data format flag for rescale

* Fix typo

* Fix dimension check

* Fixes to make IP and FE outputs match

* Add tests for transforms

* Add test for utils

* Update some docstrings

* Make sure in channels last before converting to PIL

* Remove default to numpy batching

* Fix up

* Add docstring and model_input_types

* Use feature processor config from hub

* Alias GLPN feature extractor to image processor

* Alias feature extractor mixin

* Add return_numpy=False flag for resize

* Fix up

* Fix up

* Use different frameworks safely

* Safely import PIL

* Call function checking if PIL available

* Only import if vision available

* Address Sylvain PR comments
Co-authored-by: Sylvain.gugger@gmail.com

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/image_transforms.py

Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>

* Update src/transformers/models/glpn/feature_extraction_glpn.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Add in docstrings

* Fix TFSwinSelfAttention to have relative position index as non-trainable weight (#18226)

Signed-off-by: Seunghwan Hong <seunghwan@scatterlab.co.kr>

* Refactor `TFSwinLayer` to increase serving compatibility (#18352)

* Refactor `TFSwinLayer` to increase serving compatibility

Signed-off-by: Seunghwan Hong <seunghwan@scatterlab.co.kr>

* Fix missed parameters while refactoring

Signed-off-by: Seunghwan Hong <seunghwan@scatterlab.co.kr>

* Fix window_reverse to calculate batch size

Signed-off-by: Seunghwan Hong <harrydrippin@gmail.com>
Co-Authored-By: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Add TF prefix to TF-Res test class (#18481)

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Remove py.typed (#18485)

* Fix pipeline tests (#18487)

* Fix pipeline tests

* Make sure all pipelines tests run with init changes

* Use new huggingface_hub tools for download models (#18438)

* Draft new cached_file

* Initial draft for config and model

* Small fixes

* Fix first batch of tests

* Look in cache when internet is down

* Fix last tests

* Bad black, not fixing all quality errors

* Make diff less

* Implement change for TF and Flax models

* Add tokenizer and feature extractor

* For compatibility with main

* Add utils to move the cache and auto-do it at first use.

* Quality

* Deal with empty commit shas

* Deal with empty etag

* Address review comments

* Fix `test_dbmdz_english` by updating expected values (#18482)

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Move cache folder to huggingface/hub for consistency with hf_hub (#18492)

* Move cache folder to just huggingface

* Thank you VsCode for this needless import

* Move to hub

* Forgot one

* Update some expected values in `quicktour.mdx` for `resampy 0.3.0` (#18484)

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Forgot one new_ for cache migration

* disable Onnx test for google/long-t5-tglobal-base (#18454)

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Typo reported by Joel Grus on TWTR (#18493)

* Just re-reading the whole doc every couple of months 😬 (#18489)

* Delete valohai.yaml

* NLP => ML

* typo

* website supports https

* datasets

* 60k + modalities

* unrelated link fixing for accelerate

* Ok those links were actually broken

* Fix link

* Make `AutoTokenizer` auto-link

* wording tweak

* add at least one non-nlp task

* `transformers-cli login` => `huggingface-cli login` (#18490)

* zero chance anyone's using that constant no?

* `transformers-cli login` => `huggingface-cli login`

* `transformers-cli repo create` => `huggingface-cli repo create`

* `make style`

* Add seed setting to image classification example (#18519)

* [DX fix] Fixing QA pipeline streaming a dataset. (#18516)

* [DX fix] Fixing QA pipeline streaming a dataset.

QuestionAnsweringArgumentHandler would iterate over the whole dataset
effectively killing all properties of the pipeline.
This restores nice properties when using `Dataset` or `Generator` since
those are meant to be consumed lazily.

* Handling TF better.

* Clean up hub (#18497)

* Clean up utils.hub

* Remove imports

* More fixes

* Last fix

* update fsdp docs (#18521)

* updating fsdp documentation

* typo fix

* Fix compatibility with 1.12 (#17925)

* Fix compatibility with 1.12

* Remove pin from examples requirements

* Update torch scatter version

* Fix compatibility with 1.12

* Remove pin from examples requirements

* Update torch scatter version

* fix torch.onnx.symbolic_opset12 import

* Reject bad version

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Remove debug statement

* Specify en in doc-builder README example (#18526)

Co-authored-by: Ankur Goyal <ankur@impira.com>

* New cache fixes: add safeguard before looking in folders (#18522)

* unpin resampy (#18527)

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

*  update to use interlibrary links instead of Markdown (#18500)

* Add example of multimodal usage to pipeline tutorial (#18498)

* 📝 add example of multimodal usage to pipeline tutorial

* 🖍 apply feedbacks

* 🖍 apply niels feedback

* [VideoMAE] Add model to doc tests (#18523)

* Add videomae to doc tests

* Add pip install decord

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>

* Update perf_train_gpu_one.mdx (#18532)

* Update no_trainer.py scripts to include accelerate gradient accumulation wrapper (#18473)

* Added accelerate gradient accumulation wrapper to run_image_classification_no_trainer.py example script

* make fixup changes

* PR comments

* changed input to Acceletor based on PR comment, ran make fixup

* Added comment explaining the sync_gradients statement

* Fixed lr scheduler max steps

* Changed run_clm_no_trainer.py script to use accelerate gradient accum wrapper

* Fixed all scripts except wav2vec2 pretraining to use accelerate gradient accum wrapper

* Added accelerate gradient accum wrapper for wav2vec2_pretraining_no_trainer.py script

* make fixup and lr_scheduler step inserted back into run_qa_beam_search_no_trainer.py

* removed changes to run_wav2vec2_pretraining_no_trainer.py script and fixed using wrong constant in qa_beam_search_no_trainer.py script

* Add Spanish translation of converting_tensorflow_models.mdx (#18512)

* Add file in spanish docs to be translated

* Finish translation to Spanish

* Improve Spanish  wording

* Add suggested changes from review

* Spanish translation of summarization.mdx (#15947) (#18477)

* Add Spanish translation of summarization.mdx

* Apply suggestions from code review

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Let's not cast them all (#18471)

* add correct dtypes when checking for params dtype

* forward contrib credits

* Update src/transformers/modeling_utils.py

Co-authored-by: Thomas Wang <24695242+thomasw21@users.noreply.github.com>

* more comments

- added more comments on why we cast only floating point parameters

* Update src/transformers/modeling_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: sgugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Thomas Wang <24695242+thomasw21@users.noreply.github.com>

* fix: data2vec-vision Onnx ready-made configuration. (#18427)

* feat: add the data2vec conf that are missing https://huggingface.co/docs/transformers/serialization

* fix: wrong config

* Add mt5 onnx config (#18394)

* update features

* MT5OnnxConfig added with updated with tests and docs

* fix imports

* fix onnc_config_cls for mt5

Co-authored-by: Thomas Chaigneau <thomas.deeptools.ai>

* Minor update of `run_call_with_unpacked_inputs` (#18541)

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* BART - Fix attention mask device issue on copied models (#18540)

* attempt to fix attn mask device

* fix bart `_prepare_decoder_attention_mask`

- add correct device
- run `make fix-copies` to propagate the fix

* Adding a new `align_to_words` param to qa pipeline. (#18010)

* Adding a new `align_to_words` param to qa pipeline.

* Update src/transformers/pipelines/question_answering.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Import protection.

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* 📝 update metric with evaluate (#18535)

* Restore _init_weights value in no_init_weights (#18504)

* Recover _init_weights value in no_init_weights

For potential nested use. 
In addition, users might modify private no_init_weights as well.

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Remove private variable change check

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Clean up comment

* 📝 update documentation build section (#18548)

* `bitsandbytes` - `Linear8bitLt` integration into `transformers` models (#17901)

* first commit

* correct replace function

* add final changes

- works like charm!
- cannot implement tests yet
- tested

* clean up a bit

* add bitsandbytes dependencies

* working version

- added import function
- added bitsandbytes utils file

* small fix

* small fix

- fix import issue

* fix import issues

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* refactor a bit

- move bitsandbytes utils to utils
- change comments on functions

* reformat docstring

- reformat docstring on init_empty_weights_8bit

* Update src/transformers/__init__.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* revert bad formatting

* change to bitsandbytes

* refactor a bit

- remove init8bit since it is useless

* more refactoring

- fixed init empty weights issue
- added threshold param

* small hack to make it work

* Update src/transformers/modeling_utils.py

* Update src/transformers/modeling_utils.py

* revmoe the small hack

* modify utils file

* make style + refactor a bit

* create correctly device map

* add correct dtype for device map creation

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* apply suggestions

- remove with torch.grad
- do not rely on Python bool magic!

* add docstring

 - add docstring for new kwargs

* add docstring

- comment `replace_8bit_linear` function
- fix weird formatting

* - added more documentation
- added new utility function for memory footprint tracking
- colab demo to add

* few modifs

- typo doc
- force cast into float16 when load_in_8bit is enabled

* added colab link

* add test architecture + docstring a bit

* refactor a bit testing class

* make style + refactor a bit

* enhance checks

- add more checks
- start writing saving test

* clean up a bit

* male style

* add more details on doc

* add more tests

- still needs to fix 2 tests

* replace by "or"

- could not fix it from GitHub GUI

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* refactor a bit testing code + add readme

* make style

* fix import issue

* Update src/transformers/modeling_utils.py

Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>

* add few comments

* add more doctring + make style

* more docstring

* raise error when loaded in 8bit

* make style

* add warning if loaded on CPU

* add small sanity check

* fix small comment

* add bitsandbytes on dockerfile

* Improve documentation

- improve documentation from comments

* add few comments

* slow tests pass on the VM but not on the CI VM

* Fix merge conflict

* make style

* another test should pass on a multi gpu setup

* fix bad import in testing file

* Fix slow tests

- remove dummy batches
- no more CUDA illegal memory errors

* odify dockerfile

* Update docs/source/en/main_classes/model.mdx

* Update Dockerfile

* Update model.mdx

* Update Dockerfile

* Apply suggestions from code review

* few modifications

- lm head can stay on disk/cpu
- change model name so that test pass

* change test value

- change test value to the correct output
- torch bmm changed to baddmm in bloom modeling when merging

* modify installation guidelines

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* replace `n`by `name`

* merge `load_in_8bit` and `low_cpu_mem_usage`

* first try - keep the lm head in full precision

* better check

- check the attribute `base_model_prefix` instead of computing the number of parameters

* added more tests

* Update src/transformers/utils/bitsandbytes.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Merge branch 'integration-8bit' of https://github.com/younesbelkada/transformers into integration-8bit

* improve documentation

- fix typos for installation
- change title in the documentation

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>

* TF: XLA-trainable DeBERTa v2 (#18546)

* fix deberta issues

* add different code paths for gpu and tpu

* shorter gpu take along axis

* Stable Dropout without tf cond

* variable must be float

* Preserve hub-related kwargs in AutoModel.from_pretrained (#18545)

* Preserve hub-related kwargs in AutoModel.from_pretrained

* Fix tests

* Remove debug statement

* TF Examples Rewrite (#18451)

* Finished QA example

* Dodge a merge conflict

* Update text classification and LM examples

* Update NER example

* New Keras metrics WIP, fix NER example

* Update NER example

* Update MC, summarization and translation examples

* Add XLA warnings when shapes are variable

* Make sure batch_size is consistently scaled by num_replicas

* Add PushToHubCallback to all models

* Add docs links for KerasMetricCallback

* Add docs links for prepare_tf_dataset and jit_compile

* Correct inferred model names

* Don't assume the dataset has 'lang'

* Don't assume the dataset has 'lang'

* Write metrics in text classification

* Add 'framework' to TrainingArguments and TFTrainingArguments

* Export metrics in all examples and add tests

* Fix training args for Flax

* Update command line args for translation test

* make fixup

* Fix accidentally running other tests in fp16

* Remove do_train/do_eval from run_clm.py

* Remove do_train/do_eval from run_mlm.py

* Add tensorflow tests to circleci

* Fix circleci

* Update examples/tensorflow/language-modeling/run_mlm.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update examples/tensorflow/test_tensorflow_examples.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update examples/tensorflow/translation/run_translation.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update examples/tensorflow/token-classification/run_ner.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Fix save path for tests

* Fix some model card kwargs

* Explain the magical -1000

* Actually enable tests this time

* Skip text classification PR until we fix shape inference

* make fixup

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Use commit hash to look in cache instead of calling head (#18534)

* Use commit hash to look in cache instead of calling head

* Add tests

* Add attr for local configs too

* Stupid typos

* Fix tests

* Update src/transformers/utils/hub.py

Co-authored-by: Julien Chaumond <julien@huggingface.co>

* Address Julien's comments

Co-authored-by: Julien Chaumond <julien@huggingface.co>

* `pipeline` support for `device="mps"` (or any other string) (#18494)

* `pipeline` support for `device="mps"` (or any other string)

* Simplify `if` nesting

* Update src/transformers/pipelines/base.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix? @sgugger

* passing `attr=None` is not the same as not passing `attr` 🤯

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update philosophy to include other preprocessing classes (#18550)

* 📝 update philosophy to include other preprocessing classes

* 🖍 apply feedbacks

* Properly move cache when it is not in default path (#18563)

* Adds CLIP to models exportable with ONNX (#18515)

* onnx config for clip

* default opset as 14

* changes from the original repo

* input values order fix

* outputs fix

* remove unused import

* ran make fix-copies

* black format

* review comments: forward ref, import fix, model change revert, .to cleanup

* make style

* formatting fixes

* revert groupvit

* comment for cast to int32

* comment fix

* make .T as .t() for onnx conversion

* ran make fix-copies

* remove unneeded comment

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix copies

* remove comment

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* raise atol for MT5OnnxConfig (#18560)

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* fix string (#18568)

* Segformer TF: fix output size in documentation (#18572)

* Segformer TF: fix output size in doc

* Segformer pytorch: fix output size in doc

Co-authored-by: Maxime Gardoni <maxime.gardoni@ecorobotix.com>

* Fix resizing bug in OWL-ViT (#18573)

* Fixes resizing bug in OWL-ViT
* Defaults to square resize if size is set to an int
* Sets do_center_crop default value to False

* Fix LayoutLMv3 documentation (#17932)

* fix typos

* fix sequence_length docs of LayoutLMv3Model

* delete trailing white spaces

* fix layoutlmv3 docs more

* apply make fixup & quality

* change to two versions of input docstring

* apply make fixup & quality

* Skip broken tests

* Change BartLearnedPositionalEmbedding's forward method signature to support Opacus training (#18486)

* changing BartLearnedPositionalEmbedding forward signature and references to it

* removing debugging dead code (thanks style checker)

* blackened modeling_bart file

* removing copy inconsistencies via make fix-copies

* changing references to copied signatures in Bart variants

* make fix-copies once more

* using expand over repeat (thanks @michaelbenayoun)

* expand instead of repeat for all model copies

Co-authored-by: Daniel Jones <jonesdaniel@microsoft.com>

* german docs translation (#18544)

* Create _config.py

* Create _toctree.yml

* Create index.mdx

not sure about "du / ihr" oder "sie"

* Create quicktour.mdx

* Update _toctree.yml

* Update build_documentation.yml

* Update build_pr_documentation.yml

* fix build

* Update index.mdx

* Update quicktour.mdx

* Create installation.mdx

* Update _toctree.yml

* Deberta V2: Fix critical trace warnings to allow ONNX export (#18272)

* Fix critical trace warnings to allow ONNX export

* Force input to `sqrt` to be float type

* Cleanup code

* Remove unused import statement

* Update model sew

* Small refactor

Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>

* Use broadcasting instead of repeat

* Implement suggestion

Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>

* Match deberta v2 changes in sew_d

* Improve code quality

* Update code quality

* Consistency of small refactor

* Match changes in sew_d

Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>

* [FX] _generate_dummy_input supports audio-classification models for labels (#18580)

* Support audio classification architectures for labels generation, as well as provides a flag to print warnings or not

* Use ENV_VARS_TRUE_VALUES

* Fix docstrings with last version of hf-doc-builder styler (#18581)

* Fix docstrings with last version of hf-doc-builder styler

* Remove empty Parameter block

* Bump nbconvert from 6.0.1 to 6.3.0 in /examples/research_projects/lxmert (#18565)

Bumps [nbconvert](https://github.com/jupyter/nbconvert) from 6.0.1 to 6.3.0.
- [Release notes](https://github.com/jupyter/nbconvert/releases)
- [Commits](https://github.com/jupyter/nbconvert/compare/6.0.1...6.3.0)

---
updated-dependencies:
- dependency-name: nbconvert
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump nbconvert in /examples/research_projects/visual_bert (#18566)

Bumps [nbconvert](https://github.com/jupyter/nbconvert) from 6.0.1 to 6.3.0.
- [Release notes](https://github.com/jupyter/nbconvert/releases)
- [Commits](https://github.com/jupyter/nbconvert/compare/6.0.1...6.3.0)

---
updated-dependencies:
- dependency-name: nbconvert
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* fix owlvit tests, update docstring examples (#18586)

* Return the permuted hidden states if return_dict=True (#18578)

* Load sharded pt to flax (#18419)

* initial commit

* add small test

* add cross pt tf flag to test

* fix quality

* style

* update test with new repo

* fix failing test

* update

* fix wrong param ordering

* style

* update based on review

* update related to recent new caching mechanism

* quality

* Update based on review

Co-authored-by: sgugger <sylvain.gugger@gmail.com>

* quality and style

* Update src/transformers/modeling_flax_utils.py
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Add type hints for ViLT models (#18577)

* Add type hints for Vilt models

* Add missing return type for TokenClassification class

* update doc for perf_train_cpu_many, add intel mpi introduction (#18576)

* update doc for perf_train_cpu_many, add mpi introduction

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* Update docs/source/en/perf_train_cpu_many.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/en/perf_train_cpu_many.mdx

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* typos (#18594)

* FSDP bug fix for `load_state_dict` (#18596)

* Add `TFAutoModelForSemanticSegmentation` to the main `__init__.py` (#18600)

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Generate: validate `model_kwargs` (and catch typos in generate arguments) (#18261)

* validate generate model_kwargs

* generate tests -- not all models have an attn mask

* Supporting seq2seq models for `bitsandbytes` integration (#18579)

* Supporting seq2seq models for `bitsandbytes` integration

- `bitsandbytes` integration supports now seq2seq models
- check if a model has tied weights as an additional check

* small modification

- tie the weights before looking at tied weights!

* Add Donut (#18488)

* First draft

* Improve script

* Update script

* Make conversion work

* Add final_layer_norm attribute to Swin's config

* Add DonutProcessor

* Convert more models

* Improve feature extractor and convert base models

* Fix bug

* Improve integration tests

* Improve integration tests and add model to README

* Add doc test

* Add feature extractor to docs

* Fix integration tests

* Remove register_buffer

* Fix toctree and add missing attribute

* Add DonutSwin

* Make conversion script work

* Improve conversion script

* Address comment

* Fix bug

* Fix another bug

* Remove deprecated method from docs

* Make Swin and Swinv2 untouched

* Fix code examples

* Fix processor

* Update model_type to donut-swin

* Add feature extractor tests, add token2json method, improve feature extractor

* Fix failing tests, remove integration test

* Add do_thumbnail for consistency

* Improve code examples

* Add code example for document parsing

* Add DonutSwin to MODEL_NAMES_MAPPING

* Add model to appropriate place in toctree

* Update namespace to appropriate organization

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>

* Fix URLs (#18604)

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>

* Update BLOOM parameter counts (#18531)

* Update BLOOM parameter counts

* Update BLOOM parameter counts

* [doc] fix anchors (#18591)

the manual anchors end up being duplicated with automatically added anchors and no longer work.

* [fsmt] deal with -100 indices in decoder ids (#18592)

* [fsmt] deal with -100 indices in decoder ids

Fixes: https://github.com/huggingface/transformers/issues/17945

decoder ids get the default index -100, which breaks the model - like t5 and many other models add a fix to replace -100 with the correct pad index. 

For some reason this use case hasn't been used with this model until recently - so this issue was there since the beginning it seems.

Any suggestions to how to add a simple test here? or perhaps we have something similar already? user's script is quite massive.

* style

* small change (#18584)

* Flax Remat for LongT5 (#17994)

* [Flax] Add remat (gradient checkpointing)

* fix variable naming in test

* flip: checkpoint using a method

* fix naming

* fix class naming

* apply PVP's suggestions from code review

* add gradient_checkpointing to examples

* Add gradient_checkpointing to run_mlm_flax

* Add remat to longt5

* Add gradient checkpointing test longt5

* Fix args errors

* Fix remaining tests

* Make fixup & quality fixes

* replace kwargs

* remove unecessary kwargs

* Make fixup changes

* revert long_t5_flax changes

* Remove return_dict and copy to LongT5

* Remove test_gradient_checkpointing

Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>

* mac m1 `mps` integration (#18598)

* mac m1 `mps` integration

* Update docs/source/en/main_classes/trainer.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* addressing comments

* Apply suggestions from code review

Co-authored-by: Dan Saattrup Nielsen <47701536+saattrupdan@users.noreply.github.com>

* resolve comment

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Dan Saattrup Nielsen <47701536+saattrupdan@users.noreply.github.com>

* Change scheduled CIs to use torch 1.12.1 (#18644)

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Add checks for some workflow jobs (#18583)

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* TF: Fix generation repetition penalty with XLA (#18648)

* Update longt5.mdx (#18634)

* Update run_translation_no_trainer.py (#18637)

* Update run_translation_no_trainer.py

found an error in selecting `no_decay` parameters and some small modifications when the user continues to train from a checkpoint

* fixs `no_decay` and `resume_step` issue

1. change `no_decay` list
2. if use continue to train their model from provided checkpoint, the `resume_step` will not be initialized properly if `args.gradient_accumulation_steps != 1`

* [bnb] Minor modifications (#18631)

* bnb minor modifications

- refactor documentation
- add troubleshooting README
- add PyPi library on DockerFile

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Apply suggestions from code review

* Apply suggestions from code review

* Apply suggestions from code review

* put in one block

- put bash instructions in one block

* update readme

- refactor a bit hardware requirements

* change text a bit

* Apply suggestions from code review

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

* apply suggestions

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

* add link to paper

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update tests/mixed_int8/README.md

* Apply suggestions from code review

* refactor a bit

* add instructions Turing & Amperer

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* add A6000

* clarify a bit

* remove small part

* Update tests/mixed_int8/README.md

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

* Examples: add Bloom support for token classification (#18632)

* examples: add Bloom support for token classification (FLAX, PyTorch and TensorFlow)

* examples: remove support for Bloom in token classication (FLAX and TensorFlow currently have no support for it)

* Fix Yolos ONNX export test (#18606)

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fixup

* Fix up

* Move PIL default arguments inside function for safe imports

* Add image utils to toctree

* Update `rescale` method to reflect changes in #18677

* Update docs/source/en/internal/image_processing_utils.mdx

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Address Niels PR comments

* Add normalize method to transforms library

* Apply suggestions from code review - remove defaults to None

Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix docstrings and revert to PIL.Image.XXX resampling

Use PIL.Image.XXX resampling values instead of PIL.Image.Resampling.XXX enum as it's only in the recent version >= 9.10 and version is not yet pinned and older version support deprecated

* Some more docstrings and PIL.Image tidy up

* Reorganise arguments so flags by modifiers

* Few last docstring fixes

* Add normalize to docs

* Accept PIL.Image inputs with deprecation warning

* Update src/transformers/image_transforms.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update warning to include version

* Trigger CI - hash clash on doc build

Signed-off-by: Seunghwan Hong <seunghwan@scatterlab.co.kr>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: Amy Roberts <amyeroberts@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Seunghwan Hong <harrydrippin@gmail.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
Co-authored-by: Ankur Goyal <ankrgyl@gmail.com>
Co-authored-by: Ankur Goyal <ankur@impira.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
Co-authored-by: Mishig Davaadorj <dmishig@gmail.com>
Co-authored-by: Rasmus Arpe Fogh Jensen <Rasmus.arpe@gmail.com>
Co-authored-by: Ian Castillo <7807897+donelianc@users.noreply.github.com>
Co-authored-by: AguilaCudicio <aguila.cudicio@gmail.com>
Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Thomas Wang <24695242+thomasw21@users.noreply.github.com>
Co-authored-by: Niklas Hansson <niklas.sven.hansson@gmail.com>
Co-authored-by: Thomas Chaigneau <t.chaigneau.tc@gmail.com>
Co-authored-by: YouJiacheng <1503679330@qq.com>
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Dhruv Karan <k4r4n.dhruv@gmail.com>
Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>
Co-authored-by: Maxime G <joihn@users.noreply.github.com>
Co-authored-by: Maxime Gardoni <maxime.gardoni@ecorobotix.com>
Co-authored-by: Wonseok Lee (Jack) <rollerkid02@snu.ac.kr>
Co-authored-by: Dan Jones <dan.j.jones2@gmail.com>
Co-authored-by: Daniel Jones <jonesdaniel@microsoft.com>
Co-authored-by: flozi00 <flozi00.fz@gmail.com>
Co-authored-by: iiLaurens <iiLaurens@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Wang, Yi <yi.a.wang@intel.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
Co-authored-by: Karim Foda <35491698+KMFODA@users.noreply.github.com>
Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
Co-authored-by: Dan Saattrup Nielsen <47701536+saattrupdan@users.noreply.github.com>
Co-authored-by: zhoutang776 <47708118+zhoutang776@users.noreply.github.com>
Co-authored-by: Stefan Schweter <stefan@schweter.it>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2022-10-17 17:02:14 +01:00

561 lines
25 KiB
Python

# coding=utf-8
# Copyright 2021 HuggingFace Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import unittest
import datasets
import numpy as np
import pytest
from transformers import is_torch_available, is_vision_available
from transformers.image_utils import ChannelDimension, get_channel_dimension_axis
from transformers.testing_utils import require_torch, require_vision
if is_torch_available():
import torch
if is_vision_available():
import PIL.Image
from transformers import ImageFeatureExtractionMixin
from transformers.image_utils import get_image_size, infer_channel_dimension_format, load_image
def get_random_image(height, width):
random_array = np.random.randint(0, 256, (height, width, 3), dtype=np.uint8)
return PIL.Image.fromarray(random_array)
@require_vision
class ImageFeatureExtractionTester(unittest.TestCase):
def test_conversion_image_to_array(self):
feature_extractor = ImageFeatureExtractionMixin()
image = get_random_image(16, 32)
# Conversion with defaults (rescale + channel first)
array1 = feature_extractor.to_numpy_array(image)
self.assertTrue(array1.dtype, np.float32)
self.assertEqual(array1.shape, (3, 16, 32))
# Conversion with rescale and not channel first
array2 = feature_extractor.to_numpy_array(image, channel_first=False)
self.assertTrue(array2.dtype, np.float32)
self.assertEqual(array2.shape, (16, 32, 3))
self.assertTrue(np.array_equal(array1, array2.transpose(2, 0, 1)))
# Conversion with no rescale and channel first
array3 = feature_extractor.to_numpy_array(image, rescale=False)
self.assertTrue(array3.dtype, np.uint8)
self.assertEqual(array3.shape, (3, 16, 32))
self.assertTrue(np.array_equal(array1, array3.astype(np.float32) * (1 / 255.0)))
# Conversion with no rescale and not channel first
array4 = feature_extractor.to_numpy_array(image, rescale=False, channel_first=False)
self.assertTrue(array4.dtype, np.uint8)
self.assertEqual(array4.shape, (16, 32, 3))
self.assertTrue(np.array_equal(array2, array4.astype(np.float32) * (1 / 255.0)))
def test_conversion_array_to_array(self):
feature_extractor = ImageFeatureExtractionMixin()
array = np.random.randint(0, 256, (16, 32, 3), dtype=np.uint8)
# By default, rescale (for an array of ints) and channel permute
array1 = feature_extractor.to_numpy_array(array)
self.assertTrue(array1.dtype, np.float32)
self.assertEqual(array1.shape, (3, 16, 32))
self.assertTrue(np.array_equal(array1, array.transpose(2, 0, 1).astype(np.float32) * (1 / 255.0)))
# Same with no permute
array2 = feature_extractor.to_numpy_array(array, channel_first=False)
self.assertTrue(array2.dtype, np.float32)
self.assertEqual(array2.shape, (16, 32, 3))
self.assertTrue(np.array_equal(array2, array.astype(np.float32) * (1 / 255.0)))
# Force rescale to False
array3 = feature_extractor.to_numpy_array(array, rescale=False)
self.assertTrue(array3.dtype, np.uint8)
self.assertEqual(array3.shape, (3, 16, 32))
self.assertTrue(np.array_equal(array3, array.transpose(2, 0, 1)))
# Force rescale to False and no channel permute
array4 = feature_extractor.to_numpy_array(array, rescale=False, channel_first=False)
self.assertTrue(array4.dtype, np.uint8)
self.assertEqual(array4.shape, (16, 32, 3))
self.assertTrue(np.array_equal(array4, array))
# Now test the default rescale for a float array (defaults to False)
array5 = feature_extractor.to_numpy_array(array2)
self.assertTrue(array5.dtype, np.float32)
self.assertEqual(array5.shape, (3, 16, 32))
self.assertTrue(np.array_equal(array5, array1))
@require_torch
def test_conversion_torch_to_array(self):
feature_extractor = ImageFeatureExtractionMixin()
tensor = torch.randint(0, 256, (16, 32, 3))
array = tensor.numpy()
# By default, rescale (for a tensor of ints) and channel permute
array1 = feature_extractor.to_numpy_array(array)
self.assertTrue(array1.dtype, np.float32)
self.assertEqual(array1.shape, (3, 16, 32))
self.assertTrue(np.array_equal(array1, array.transpose(2, 0, 1).astype(np.float32) * (1 / 255.0)))
# Same with no permute
array2 = feature_extractor.to_numpy_array(array, channel_first=False)
self.assertTrue(array2.dtype, np.float32)
self.assertEqual(array2.shape, (16, 32, 3))
self.assertTrue(np.array_equal(array2, array.astype(np.float32) * (1 / 255.0)))
# Force rescale to False
array3 = feature_extractor.to_numpy_array(array, rescale=False)
self.assertTrue(array3.dtype, np.uint8)
self.assertEqual(array3.shape, (3, 16, 32))
self.assertTrue(np.array_equal(array3, array.transpose(2, 0, 1)))
# Force rescale to False and no channel permute
array4 = feature_extractor.to_numpy_array(array, rescale=False, channel_first=False)
self.assertTrue(array4.dtype, np.uint8)
self.assertEqual(array4.shape, (16, 32, 3))
self.assertTrue(np.array_equal(array4, array))
# Now test the default rescale for a float tensor (defaults to False)
array5 = feature_extractor.to_numpy_array(array2)
self.assertTrue(array5.dtype, np.float32)
self.assertEqual(array5.shape, (3, 16, 32))
self.assertTrue(np.array_equal(array5, array1))
def test_conversion_image_to_image(self):
feature_extractor = ImageFeatureExtractionMixin()
image = get_random_image(16, 32)
# On an image, `to_pil_image1` is a noop.
image1 = feature_extractor.to_pil_image(image)
self.assertTrue(isinstance(image, PIL.Image.Image))
self.assertTrue(np.array_equal(np.array(image), np.array(image1)))
def test_conversion_array_to_image(self):
feature_extractor = ImageFeatureExtractionMixin()
array = np.random.randint(0, 256, (16, 32, 3), dtype=np.uint8)
# By default, no rescale (for an array of ints)
image1 = feature_extractor.to_pil_image(array)
self.assertTrue(isinstance(image1, PIL.Image.Image))
self.assertTrue(np.array_equal(np.array(image1), array))
# If the array is channel-first, proper reordering of the channels is done.
image2 = feature_extractor.to_pil_image(array.transpose(2, 0, 1))
self.assertTrue(isinstance(image2, PIL.Image.Image))
self.assertTrue(np.array_equal(np.array(image2), array))
# If the array has floating type, it's rescaled by default.
image3 = feature_extractor.to_pil_image(array.astype(np.float32) * (1 / 255.0))
self.assertTrue(isinstance(image3, PIL.Image.Image))
self.assertTrue(np.array_equal(np.array(image3), array))
# You can override the default to rescale.
image4 = feature_extractor.to_pil_image(array.astype(np.float32), rescale=False)
self.assertTrue(isinstance(image4, PIL.Image.Image))
self.assertTrue(np.array_equal(np.array(image4), array))
# And with floats + channel first.
image5 = feature_extractor.to_pil_image(array.transpose(2, 0, 1).astype(np.float32) * (1 / 255.0))
self.assertTrue(isinstance(image5, PIL.Image.Image))
self.assertTrue(np.array_equal(np.array(image5), array))
@require_torch
def test_conversion_tensor_to_image(self):
feature_extractor = ImageFeatureExtractionMixin()
tensor = torch.randint(0, 256, (16, 32, 3))
array = tensor.numpy()
# By default, no rescale (for a tensor of ints)
image1 = feature_extractor.to_pil_image(tensor)
self.assertTrue(isinstance(image1, PIL.Image.Image))
self.assertTrue(np.array_equal(np.array(image1), array))
# If the tensor is channel-first, proper reordering of the channels is done.
image2 = feature_extractor.to_pil_image(tensor.permute(2, 0, 1))
self.assertTrue(isinstance(image2, PIL.Image.Image))
self.assertTrue(np.array_equal(np.array(image2), array))
# If the tensor has floating type, it's rescaled by default.
image3 = feature_extractor.to_pil_image(tensor.float() / 255.0)
self.assertTrue(isinstance(image3, PIL.Image.Image))
self.assertTrue(np.array_equal(np.array(image3), array))
# You can override the default to rescale.
image4 = feature_extractor.to_pil_image(tensor.float(), rescale=False)
self.assertTrue(isinstance(image4, PIL.Image.Image))
self.assertTrue(np.array_equal(np.array(image4), array))
# And with floats + channel first.
image5 = feature_extractor.to_pil_image(tensor.permute(2, 0, 1).float() * (1 / 255.0))
self.assertTrue(isinstance(image5, PIL.Image.Image))
self.assertTrue(np.array_equal(np.array(image5), array))
def test_resize_image_and_array(self):
feature_extractor = ImageFeatureExtractionMixin()
image = get_random_image(16, 32)
array = np.array(image)
# Size can be an int or a tuple of ints.
resized_image = feature_extractor.resize(image, 8)
self.assertTrue(isinstance(resized_image, PIL.Image.Image))
self.assertEqual(resized_image.size, (8, 8))
resized_image1 = feature_extractor.resize(image, (8, 16))
self.assertTrue(isinstance(resized_image1, PIL.Image.Image))
self.assertEqual(resized_image1.size, (8, 16))
# Passing an array converts it to a PIL Image.
resized_image2 = feature_extractor.resize(array, 8)
self.assertTrue(isinstance(resized_image2, PIL.Image.Image))
self.assertEqual(resized_image2.size, (8, 8))
self.assertTrue(np.array_equal(np.array(resized_image), np.array(resized_image2)))
resized_image3 = feature_extractor.resize(image, (8, 16))
self.assertTrue(isinstance(resized_image3, PIL.Image.Image))
self.assertEqual(resized_image3.size, (8, 16))
self.assertTrue(np.array_equal(np.array(resized_image1), np.array(resized_image3)))
def test_resize_image_and_array_non_default_to_square(self):
feature_extractor = ImageFeatureExtractionMixin()
heights_widths = [
# height, width
# square image
(28, 28),
(27, 27),
# rectangular image: h < w
(28, 34),
(29, 35),
# rectangular image: h > w
(34, 28),
(35, 29),
]
# single integer or single integer in tuple/list
sizes = [22, 27, 28, 36, [22], (27,)]
for (height, width), size in zip(heights_widths, sizes):
for max_size in (None, 37, 1000):
image = get_random_image(height, width)
array = np.array(image)
size = size[0] if isinstance(size, (list, tuple)) else size
# Size can be an int or a tuple of ints.
# If size is an int, smaller edge of the image will be matched to this number.
# i.e, if height > width, then image will be rescaled to (size * height / width, size).
if height < width:
exp_w, exp_h = (int(size * width / height), size)
if max_size is not None and max_size < exp_w:
exp_w, exp_h = max_size, int(max_size * exp_h / exp_w)
elif width < height:
exp_w, exp_h = (size, int(size * height / width))
if max_size is not None and max_size < exp_h:
exp_w, exp_h = int(max_size * exp_w / exp_h), max_size
else:
exp_w, exp_h = (size, size)
if max_size is not None and max_size < size:
exp_w, exp_h = max_size, max_size
resized_image = feature_extractor.resize(image, size=size, default_to_square=False, max_size=max_size)
self.assertTrue(isinstance(resized_image, PIL.Image.Image))
self.assertEqual(resized_image.size, (exp_w, exp_h))
# Passing an array converts it to a PIL Image.
resized_image2 = feature_extractor.resize(array, size=size, default_to_square=False, max_size=max_size)
self.assertTrue(isinstance(resized_image2, PIL.Image.Image))
self.assertEqual(resized_image2.size, (exp_w, exp_h))
self.assertTrue(np.array_equal(np.array(resized_image), np.array(resized_image2)))
@require_torch
def test_resize_tensor(self):
feature_extractor = ImageFeatureExtractionMixin()
tensor = torch.randint(0, 256, (16, 32, 3))
array = tensor.numpy()
# Size can be an int or a tuple of ints.
resized_image = feature_extractor.resize(tensor, 8)
self.assertTrue(isinstance(resized_image, PIL.Image.Image))
self.assertEqual(resized_image.size, (8, 8))
resized_image1 = feature_extractor.resize(tensor, (8, 16))
self.assertTrue(isinstance(resized_image1, PIL.Image.Image))
self.assertEqual(resized_image1.size, (8, 16))
# Check we get the same results as with NumPy arrays.
resized_image2 = feature_extractor.resize(array, 8)
self.assertTrue(np.array_equal(np.array(resized_image), np.array(resized_image2)))
resized_image3 = feature_extractor.resize(array, (8, 16))
self.assertTrue(np.array_equal(np.array(resized_image1), np.array(resized_image3)))
def test_normalize_image(self):
feature_extractor = ImageFeatureExtractionMixin()
image = get_random_image(16, 32)
array = np.array(image)
mean = [0.1, 0.5, 0.9]
std = [0.2, 0.4, 0.6]
# PIL Image are converted to NumPy arrays for the normalization
normalized_image = feature_extractor.normalize(image, mean, std)
self.assertTrue(isinstance(normalized_image, np.ndarray))
self.assertEqual(normalized_image.shape, (3, 16, 32))
# During the conversion rescale and channel first will be applied.
expected = array.transpose(2, 0, 1).astype(np.float32) * (1 / 255.0)
np_mean = np.array(mean).astype(np.float32)[:, None, None]
np_std = np.array(std).astype(np.float32)[:, None, None]
expected = (expected - np_mean) / np_std
self.assertTrue(np.array_equal(normalized_image, expected))
def test_normalize_array(self):
feature_extractor = ImageFeatureExtractionMixin()
array = np.random.random((16, 32, 3))
mean = [0.1, 0.5, 0.9]
std = [0.2, 0.4, 0.6]
# mean and std can be passed as lists or NumPy arrays.
expected = (array - np.array(mean)) / np.array(std)
normalized_array = feature_extractor.normalize(array, mean, std)
self.assertTrue(np.array_equal(normalized_array, expected))
normalized_array = feature_extractor.normalize(array, np.array(mean), np.array(std))
self.assertTrue(np.array_equal(normalized_array, expected))
# Normalize will detect automatically if channel first or channel last is used.
array = np.random.random((3, 16, 32))
expected = (array - np.array(mean)[:, None, None]) / np.array(std)[:, None, None]
normalized_array = feature_extractor.normalize(array, mean, std)
self.assertTrue(np.array_equal(normalized_array, expected))
normalized_array = feature_extractor.normalize(array, np.array(mean), np.array(std))
self.assertTrue(np.array_equal(normalized_array, expected))
@require_torch
def test_normalize_tensor(self):
feature_extractor = ImageFeatureExtractionMixin()
tensor = torch.rand(16, 32, 3)
mean = [0.1, 0.5, 0.9]
std = [0.2, 0.4, 0.6]
# mean and std can be passed as lists or tensors.
expected = (tensor - torch.tensor(mean)) / torch.tensor(std)
normalized_tensor = feature_extractor.normalize(tensor, mean, std)
self.assertTrue(torch.equal(normalized_tensor, expected))
normalized_tensor = feature_extractor.normalize(tensor, torch.tensor(mean), torch.tensor(std))
self.assertTrue(torch.equal(normalized_tensor, expected))
# Normalize will detect automatically if channel first or channel last is used.
tensor = torch.rand(3, 16, 32)
expected = (tensor - torch.tensor(mean)[:, None, None]) / torch.tensor(std)[:, None, None]
normalized_tensor = feature_extractor.normalize(tensor, mean, std)
self.assertTrue(torch.equal(normalized_tensor, expected))
normalized_tensor = feature_extractor.normalize(tensor, torch.tensor(mean), torch.tensor(std))
self.assertTrue(torch.equal(normalized_tensor, expected))
def test_center_crop_image(self):
feature_extractor = ImageFeatureExtractionMixin()
image = get_random_image(16, 32)
# Test various crop sizes: bigger on all dimensions, on one of the dimensions only and on both dimensions.
crop_sizes = [8, (8, 64), 20, (32, 64)]
for size in crop_sizes:
cropped_image = feature_extractor.center_crop(image, size)
self.assertTrue(isinstance(cropped_image, PIL.Image.Image))
# PIL Image.size is transposed compared to NumPy or PyTorch (width first instead of height first).
expected_size = (size, size) if isinstance(size, int) else (size[1], size[0])
self.assertEqual(cropped_image.size, expected_size)
def test_center_crop_array(self):
feature_extractor = ImageFeatureExtractionMixin()
image = get_random_image(16, 32)
array = feature_extractor.to_numpy_array(image)
# Test various crop sizes: bigger on all dimensions, on one of the dimensions only and on both dimensions.
crop_sizes = [8, (8, 64), 20, (32, 64)]
for size in crop_sizes:
cropped_array = feature_extractor.center_crop(array, size)
self.assertTrue(isinstance(cropped_array, np.ndarray))
expected_size = (size, size) if isinstance(size, int) else size
self.assertEqual(cropped_array.shape[-2:], expected_size)
# Check result is consistent with PIL.Image.crop
cropped_image = feature_extractor.center_crop(image, size)
self.assertTrue(np.array_equal(cropped_array, feature_extractor.to_numpy_array(cropped_image)))
@require_torch
def test_center_crop_tensor(self):
feature_extractor = ImageFeatureExtractionMixin()
image = get_random_image(16, 32)
array = feature_extractor.to_numpy_array(image)
tensor = torch.tensor(array)
# Test various crop sizes: bigger on all dimensions, on one of the dimensions only and on both dimensions.
crop_sizes = [8, (8, 64), 20, (32, 64)]
for size in crop_sizes:
cropped_tensor = feature_extractor.center_crop(tensor, size)
self.assertTrue(isinstance(cropped_tensor, torch.Tensor))
expected_size = (size, size) if isinstance(size, int) else size
self.assertEqual(cropped_tensor.shape[-2:], expected_size)
# Check result is consistent with PIL.Image.crop
cropped_image = feature_extractor.center_crop(image, size)
self.assertTrue(torch.equal(cropped_tensor, torch.tensor(feature_extractor.to_numpy_array(cropped_image))))
@require_vision
class LoadImageTester(unittest.TestCase):
def test_load_img_local(self):
img = load_image("./tests/fixtures/tests_samples/COCO/000000039769.png")
img_arr = np.array(img)
self.assertEqual(
img_arr.shape,
(480, 640, 3),
)
def test_load_img_rgba(self):
dataset = datasets.load_dataset("hf-internal-testing/fixtures_image_utils", "image", split="test")
img = load_image(dataset[0]["file"]) # img with mode RGBA
img_arr = np.array(img)
self.assertEqual(
img_arr.shape,
(512, 512, 3),
)
def test_load_img_la(self):
dataset = datasets.load_dataset("hf-internal-testing/fixtures_image_utils", "image", split="test")
img = load_image(dataset[1]["file"]) # img with mode LA
img_arr = np.array(img)
self.assertEqual(
img_arr.shape,
(512, 768, 3),
)
def test_load_img_l(self):
dataset = datasets.load_dataset("hf-internal-testing/fixtures_image_utils", "image", split="test")
img = load_image(dataset[2]["file"]) # img with mode L
img_arr = np.array(img)
self.assertEqual(
img_arr.shape,
(381, 225, 3),
)
def test_load_img_exif_transpose(self):
dataset = datasets.load_dataset("hf-internal-testing/fixtures_image_utils", "image", split="test")
img_file = dataset[3]["file"]
img_without_exif_transpose = PIL.Image.open(img_file)
img_arr_without_exif_transpose = np.array(img_without_exif_transpose)
self.assertEqual(
img_arr_without_exif_transpose.shape,
(333, 500, 3),
)
img_with_exif_transpose = load_image(img_file)
img_arr_with_exif_transpose = np.array(img_with_exif_transpose)
self.assertEqual(
img_arr_with_exif_transpose.shape,
(500, 333, 3),
)
class UtilFunctionTester(unittest.TestCase):
def test_get_image_size(self):
# Test we can infer the size and channel dimension of an image.
image = np.random.randint(0, 256, (32, 64, 3))
self.assertEqual(get_image_size(image), (32, 64))
image = np.random.randint(0, 256, (3, 32, 64))
self.assertEqual(get_image_size(image), (32, 64))
# Test the channel dimension can be overriden
image = np.random.randint(0, 256, (3, 32, 64))
self.assertEqual(get_image_size(image, channel_dim=ChannelDimension.LAST), (3, 32))
def test_infer_channel_dimension(self):
# Test we fail with invalid input
with pytest.raises(ValueError):
infer_channel_dimension_format(np.random.randint(0, 256, (10, 10)))
with pytest.raises(ValueError):
infer_channel_dimension_format(np.random.randint(0, 256, (10, 10, 10, 10, 10)))
# Test we fail if neither first not last dimension is of size 3 or 1
with pytest.raises(ValueError):
infer_channel_dimension_format(np.random.randint(0, 256, (10, 1, 50)))
# Test we correctly identify the channel dimension
image = np.random.randint(0, 256, (3, 4, 5))
inferred_dim = infer_channel_dimension_format(image)
self.assertEqual(inferred_dim, ChannelDimension.FIRST)
image = np.random.randint(0, 256, (1, 4, 5))
inferred_dim = infer_channel_dimension_format(image)
self.assertEqual(inferred_dim, ChannelDimension.FIRST)
image = np.random.randint(0, 256, (4, 5, 3))
inferred_dim = infer_channel_dimension_format(image)
self.assertEqual(inferred_dim, ChannelDimension.LAST)
image = np.random.randint(0, 256, (4, 5, 1))
inferred_dim = infer_channel_dimension_format(image)
self.assertEqual(inferred_dim, ChannelDimension.LAST)
# We can take a batched array of images and find the dimension
image = np.random.randint(0, 256, (1, 3, 4, 5))
inferred_dim = infer_channel_dimension_format(image)
self.assertEqual(inferred_dim, ChannelDimension.FIRST)
def test_get_channel_dimension_axis(self):
# Test we correctly identify the channel dimension
image = np.random.randint(0, 256, (3, 4, 5))
inferred_axis = get_channel_dimension_axis(image)
self.assertEqual(inferred_axis, 0)
image = np.random.randint(0, 256, (1, 4, 5))
inferred_axis = get_channel_dimension_axis(image)
self.assertEqual(inferred_axis, 0)
image = np.random.randint(0, 256, (4, 5, 3))
inferred_axis = get_channel_dimension_axis(image)
self.assertEqual(inferred_axis, 2)
image = np.random.randint(0, 256, (4, 5, 1))
inferred_axis = get_channel_dimension_axis(image)
self.assertEqual(inferred_axis, 2)
# We can take a batched array of images and find the dimension
image = np.random.randint(0, 256, (1, 3, 4, 5))
inferred_axis = get_channel_dimension_axis(image)
self.assertEqual(inferred_axis, 1)