transformers/tests
Jinho Park 17fdd35481
Add BROS (#23190)
* add Bros boilerplate

* copy and pasted modeling_bros.py from official Bros repo

* update copyright of bros files

* copy tokenization_bros.py from official repo and update import path

* copy tokenization_bros_fast.py from official repo and update import path

* copy configuration_bros.py from official repo and update import path

* remove trailing period in copyright line

* copy and paste bros/__init__.py from official repo

* save formatting

* remove unused unnecessary pe_type argument - using only crel type

* resolve import issue

* remove unused model classes

* remove unnecessary tests

* remove unused classes

* fix original code's bug - layer_module's argument order

* clean up modeling auto

* add bbox to prepare_config_and_inputs

* set temporary value to hidden_size (32 is too low because of the of the
Bros' positional embedding)

* remove decoder test, update create_and_check* input arguemnts

* add missing variable to model tests

* do make fixup

* update bros.mdx

* add boilerate plate for no_head inference test

* update BROS_PRETRAINED_MODEL_ARCHIVE_LIST (add naver-clova-ocr prefix)

* add prepare_bros_batch_inputs function

* update modeling_common to add bbox inputs in Bros Model Test

* remove unnecessary model inference

* add test case

* add model_doc

* add test case for token_classification

* apply fixup

* update modeling code

* update BrosForTokenClassification loss calculation logic

* revert logits preprocessing logic to make sure logits have original shape

* - update class name

* - add BrosSpadeOutput
- update BrosConfig arguments

* add boilerate plate for no_head inference test

* add prepare_bros_batch_inputs function

* add test case

* add test case for token_classification

* update modeling code

* update BrosForTokenClassification loss calculation logic

* revert logits preprocessing logic to make sure logits have original shape

* apply masking on the fly

* add BrosSpadeForTokenLinking

* update class name
put docstring to the beginning of the file

* separate the logits calculation logic and loss calculation logic

* update logic for loss calculation so that logits shape doesn't change
when return

* update typo

* update prepare_config_and_inputs

* update dummy node initialization

* update last_hidden_states getting logic to consider when return_dict is False

* update box first token mask param

* bugfix: remove random attention mask generation

* update keys to ignore on load missing

* run make style and quality

* apply make style and quality of other codes

* update box_first_token_mask to bool type

* update index.md

* apply make style and quality

* apply make fix-copies

* pass check_repo

* update bros model doc

* docstring bugfix fix

* add checkpoint for doc, tokenizer for doc

* Update README.md

* Update docs/source/en/model_doc/bros.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update bros.md

* Update src/transformers/__init__.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/bros.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* apply suggestions from code review

* apply suggestions from code review

* revert test_processor_markuplm.py

* Update test_processor_markuplm.py

* apply suggestions from code review

* apply suggestions from code review

* apply suggestions from code review

* update BrosSpadeELForTokenClassification head name to entity linker

* add doc string for config params

* update class, var names to more explicit and apply suggestions from code review

* remove unnecessary keys to ignore

* update relation extractor to be initialized with config

* add bros processor

* apply make style and quality

* update bros.md

* remove bros tokenizer, add bros processor that wraps bert tokenizer

* revert change

* apply make fix-copies

* update processor code, update itc -> initial token, stc -> subsequent token

* add type hint

* remove unnecessary condition branches in embedding forward

* fix auto tokenizer fail

* update docstring for each classes

* update bbox input dimension as standard 2 points and convert them to 4
points in forward pass

* update bros docs

* apply suggestions from code review : update Bros -> BROS in bros.md

* 1. box prefix var -> bbox
2. update variable names to be more explicit

* replace einsum with torch matmul

* apply style and quality

* remove unused argument

* remove unused arguments

* update docstrings

* apply suggestions from code review: add BrosBboxEmbeddings, replace
einsum with classical matrix operations

* revert einsum update

* update bros processor

* apply suggestions from code review

* add conversion script for bros

* Apply suggestions from code review

* fix readme

* apply fix-copies

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-09-14 18:02:37 +01:00
..
benchmark [Test refactor 1/5] Per-folder tests reorganization (#15725) 2022-02-23 15:46:28 -05:00
bettertransformer Add methods to PreTrainedModel to use PyTorch's BetterTransformer (#21259) 2023-04-27 11:03:42 +02:00
deepspeed fix the deepspeed tests (#26021) 2023-09-13 10:26:53 +05:30
extended [tests] switch to torchrun (#22712) 2023-04-12 08:25:45 -07:00
fixtures [WIP] add SpeechT5 model (#18922) 2023-02-03 12:43:46 -05:00
generation Fix beam search when using model parallel (#24969) 2023-09-14 11:00:52 -04:00
models Add BROS (#23190) 2023-09-14 18:02:37 +01:00
optimization Make schedulers picklable by making lr_lambda fns global (#21768) 2023-03-02 12:08:43 -05:00
peft_integration [PEFT] Fix PEFT + gradient checkpointing (#25846) 2023-09-14 13:01:58 +02:00
pipelines [Whisper] Fix word-level timestamps for audio < 30 seconds (#25607) 2023-09-14 17:42:35 +01:00
quantization [RWKV] Final fix RWMV 4bit (#26134) 2023-09-13 16:30:20 +02:00
repo_utils Document check copies (#25291) 2023-08-04 14:56:29 +02:00
sagemaker Avoid invalid escape sequences, use raw strings (#22936) 2023-04-25 09:17:56 -04:00
tokenization [ PreTrainedTokenizerFast] Keep properties from fast tokenizer (#25053) 2023-07-25 18:45:01 +02:00
tools Add support for for loops in python interpreter (#24429) 2023-06-26 09:58:14 -04:00
trainer enable optuna multi-objectives feature (#25969) 2023-09-12 18:01:22 +01:00
utils Add missing Maskformer dataclass decorator, add dataclass check in ModelOutput for subclasses (#25638) 2023-09-14 10:30:49 +01:00
__init__.py GPU text generation: mMoved the encoded_prompt to correct device 2020-01-06 15:11:12 +01:00
test_backbone_common.py Add ViTDet (#25524) 2023-08-29 10:03:52 +01:00
test_configuration_common.py Deal with nested configs better in base class (#25237) 2023-08-04 14:56:09 +02:00
test_configuration_utils.py Deal with nested configs better in base class (#25237) 2023-08-04 14:56:09 +02:00
test_feature_extraction_common.py Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00
test_feature_extraction_utils.py Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00
test_image_processing_common.py Input data format (#25464) 2023-08-16 17:45:02 +01:00
test_image_processing_utils.py Run hub tests (#24807) 2023-07-13 15:25:45 -04:00
test_image_transforms.py Add input_data_format argument, image transforms (#25462) 2023-08-11 15:09:31 +01:00
test_modeling_common.py Add BROS (#23190) 2023-09-14 18:02:37 +01:00
test_modeling_flax_common.py Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00
test_modeling_flax_utils.py Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00
test_modeling_tf_common.py Skip test_onnx_runtime_optimize for now (#25560) 2023-08-17 11:23:16 +02:00
test_modeling_tf_utils.py Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00
test_modeling_utils.py Skip warning if tracing with dynamo (#25581) 2023-09-08 21:13:33 +02:00
test_pipeline_mixin.py Add Text-To-Speech pipeline (#24952) 2023-08-17 17:34:47 +01:00
test_sequence_feature_extraction_common.py Fix typo (#25966) 2023-09-05 10:12:25 +02:00
test_tokenization_common.py Overhaul Conversation class and prompt templating (#25323) 2023-09-14 15:10:34 +01:00
test_tokenization_utils.py Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00