transformers/utils
NielsRogge 836921fdeb
Add UDOP (#22940)
* First draft

* More improvements

* More improvements

* More fixes

* Fix copies

* More improvements

* More fixes

* More improvements

* Convert checkpoint

* More improvements, set up tests

* Fix more tests

* Add UdopModel

* More improvements

* Fix equivalence test

* More fixes

* Redesign model

* Extend conversion script

* Use real inputs for conversion script

* Add image processor

* Improve conversion script

* Add UdopTokenizer

* Add fast tokenizer

* Add converter

* Update README's

* Add processor

* Add fully fledged tokenizer

* Add fast tokenizer

* Use processor in conversion script

* Add tokenizer tests

* Fix one more test

* Fix more tests

* Fix tokenizer tests

* Enable fast tokenizer tests

* Fix more tests

* Fix additional_special_tokens of fast tokenizer

* Fix tokenizer tests

* Fix more tests

* Fix equivalence test

* Rename image to pixel_values

* Rename seg_data to bbox

* More renamings

* Remove vis_special_token

* More improvements

* Add docs

* Fix copied from

* Update slow tokenizer

* Update fast tokenizer design

* Make text input optional

* Add first draft of processor tests

* Fix more processor tests

* Fix decoder_start_token_id

* Fix test_initialization

* Add integration test

* More improvements

* Improve processor, add test

* Add more copied from

* Add more copied from

* Add more copied from

* Add more copied from

* Remove print statement

* Update README and auto mapping

* Delete files

* Delete another file

* Remove code

* Fix test

* Fix docs

* Remove asserts

* Add doc tests

* Include UDOP in exotic model tests

* Add expected tesseract decodings

* Add sentencepiece

* Use same design as T5

* Add UdopEncoderModel

* Add UdopEncoderModel to tests

* More fixes

* Fix fast tokenizer

* Fix one more test

* Remove parallelisable attribute

* Fix copies

* Remove legacy file

* Copy from T5Tokenizer

* Fix rebase

* More fixes, copy from T5

* More fixes

* Fix init

* Use ArthurZ/udop for tests

* Make all model tests pass

* Remove UdopForConditionalGeneration from auto mapping

* Fix more tests

* fixups

* more fixups

* fix the tokenizers

* remove un-necessary changes

* nits

* nits

* replace truncate_sequences_boxes with truncate_sequences for fix-copies

* nit current path

* add a test for input ids

* ids that we should get taken from c9f7a32f57

* nits converting

* nits

* apply ruff

* nits

* nits

* style

* fix slow order of addition

* fix udop fast range as well

* fixup

* nits

* Add docstrings

* Fix gradient checkpointing

* Update code examples

* Skip tests

* Update integration test

* Address comment

* Make fixup

* Remove extra ids from tokenizer

* Skip test

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update year

* Address comment

* Address more comments

* Address comments

* Add copied from

* Update CI

* Rename script

* Update model id

* Add AddedToken, skip tests

* Update CI

* Fix doc tests

* Do not use Tesseract for the doc tests

* Remove kwargs

* Add original inputs

* Update casting

* Fix doc test

* Update question

* Update question

* Use LayoutLMv3ImageProcessor

* Update organization

* Improve docs

* Update forward signature

* Make images optional

* Remove deprecated device argument

* Add comment, add add_prefix_space

* More improvements

* Remove kwargs

---------

Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-03-04 18:49:02 +01:00
..
test_module AutoImageProcessor (#20111) 2022-11-08 19:54:41 +00:00
tf_ops Check TF ops for ONNX compliance (#10025) 2021-02-15 07:55:10 -05:00
add_pipeline_model_mapping_to_test.py A script to add/update pipeline_model_mapping systematically (#22180) 2023-04-06 18:08:14 +02:00
check_build.py Clean up CUDA kernels (#23455) 2023-05-18 14:14:43 -04:00
check_config_attributes.py Add UDOP (#22940) 2024-03-04 18:49:02 +01:00
check_config_docstrings.py Update all references to canonical models (#29001) 2024-02-16 08:16:58 +01:00
check_copies.py Add support for fine-tuning CLIP-like models using contrastive-image-text example (#29070) 2024-02-20 12:08:31 +00:00
check_doc_toc.py Doc checks (#25408) 2023-08-10 10:53:22 +02:00
check_docstrings.py [ gemma] Adds support for Gemma 💎 (#29167) 2024-02-21 14:21:28 +01:00
check_doctest_list.py Avoid many failing tests in doctesting (#27262) 2023-11-03 12:47:07 +01:00
check_dummies.py Doc checks (#25408) 2023-08-10 10:53:22 +02:00
check_inits.py Make using safetensors files automated. (#27571) 2023-12-01 15:51:10 +01:00
check_model_tester.py Add a new script to check model testers' config (#22063) 2023-03-13 19:11:19 +01:00
check_repo.py Add UDOP (#22940) 2024-03-04 18:49:02 +01:00
check_self_hosted_runner.py Tiny fix for check_self_hosted_runner.py (#24052) 2023-06-06 18:17:41 +02:00
check_support_list.py Fix the check of models supporting FA/SDPA not run (#28202) 2023-12-22 12:56:11 +01:00
check_table.py Add support for fine-tuning CLIP-like models using contrastive-image-text example (#29070) 2024-02-20 12:08:31 +00:00
check_task_guides.py More utils doc (#25457) 2023-08-17 07:58:35 +02:00
check_tf_ops.py Check TF ops for ONNX compliance (#10025) 2021-02-15 07:55:10 -05:00
create_dummy_models.py Update tiny model creation script (#27674) 2023-11-28 10:05:34 +01:00
custom_init_isort.py More utils doc (#25457) 2023-08-17 07:58:35 +02:00
download_glue_data.py Raise exceptions instead of asserts (#13907) 2021-10-07 12:44:23 +05:30
extract_warnings.py Make Slack CI reporting stronger (#21823) 2023-02-28 17:12:44 +01:00
get_ci_error_statistics.py Add artifact name in job step to maintain job / artifact correspondence (#28682) 2024-01-31 15:58:17 +01:00
get_github_job_time.py Make Slack CI reporting stronger (#21823) 2023-02-28 17:12:44 +01:00
get_modified_files.py exclude deleted files in the fixup script (#21436) 2023-02-03 12:57:02 -05:00
get_previous_daily_ci.py Fix a minor bug in CI slack report (#22906) 2023-04-21 20:36:35 +02:00
get_test_info.py Add an utility file to get information from test files (#21856) 2023-03-01 17:53:29 +01:00
not_doctested.txt Convert SlimSAM checkpoints (#28379) 2024-03-04 11:51:16 +01:00
notification_service_doc_tests.py Fix slack report failing for doctest (#27042) 2023-10-30 10:48:24 +01:00
notification_service.py [CI] Quantization workflow (#29046) 2024-02-28 10:09:25 -05:00
past_ci_versions.py (Re-)Enable Nightly + Past CI (#22393) 2023-03-30 21:06:35 +02:00
print_env.py Print more library versions in CI (#17384) 2022-06-02 10:24:16 +02:00
release.py More utils doc (#25457) 2023-08-17 07:58:35 +02:00
slow_documentation_tests.txt Add SeamlessM4T v2 (#27779) 2023-11-30 20:24:43 +01:00
sort_auto_mappings.py More utils doc (#25457) 2023-08-17 07:58:35 +02:00
split_model_tests.py Split daily CI using 2 level matrix (#28773) 2024-01-31 18:04:43 +01:00
tests_fetcher.py Update important model list (#29019) 2024-02-16 11:31:51 +01:00
update_metadata.py Add feature extraction mapping for automatic metadata update (#28944) 2024-02-26 10:35:37 +00:00
update_tiny_models.py Update tiny model summary file for recent models (#22637) 2023-04-06 22:52:59 +02:00