transformers/docs/source
NielsRogge b6ddb08a66
Add LayoutLMv2 + LayoutXLM (#12604)
* First commit

* Make style

* Fix dummy objects

* Add Detectron2 config

* Add LayoutLMv2 pooler

* More improvements, add documentation

* More improvements

* Add model tests

* Add clarification regarding image input

* Improve integration test

* Fix bug

* Fix another bug

* Fix another bug

* Fix another bug

* More improvements

* Make more tests pass

* Make more tests pass

* Improve integration test

* Remove gradient checkpointing and add head masking

* Add integration test

* Add LayoutLMv2ForSequenceClassification to the tests

* Add LayoutLMv2ForQuestionAnswering

* More improvements

* More improvements

* Small improvements

* Fix _LazyModule

* Fix fast tokenizer

* Move sync_batch_norm to a separate method

* Replace dummies by requires_backends

* Move calculation of visual bounding boxes to separate method + update README

* Add models to main init

* First draft

* More improvements

* More improvements

* More improvements

* More improvements

* More improvements

* Remove is_split_into_words

* More improvements

* Simplify tesseract - no use of pandas anymore

* Add LayoutLMv2Processor

* Update is_pytesseract_available

* Fix bugs

* Improve feature extractor

* Fix bug

* Add print statement

* Add truncation of bounding boxes

* Add tests for LayoutLMv2FeatureExtractor and LayoutLMv2Tokenizer

* Improve tokenizer tests

* Make more tokenizer tests pass

* Make more tests pass, add integration tests

* Finish integration tests

* More improvements

* More improvements - update API of the tokenizer

* More improvements

* Remove support for VQA training

* Remove some files

* Improve feature extractor

* Improve documentation and one more tokenizer test

* Make quality and small docs improvements

* Add batched tests for LayoutLMv2Processor, remove fast tokenizer

* Add truncation of labels

* Apply suggestions from code review

* Improve processor tests

* Fix failing tests and add suggestion from code review

* Fix tokenizer test

* Add detectron2 CI job

* Simplify CI job

* Comment out non-detectron2 jobs and specify number of processes

* Add pip install torchvision

* Add durations to see which tests are slow

* Fix tokenizer test and make model tests smaller

* First draft

* Use setattr

* Possible fix

* Proposal with configuration

* First draft of fast tokenizer

* More improvements

* Enable fast tokenizer tests

* Make more tests pass

* Make more tests pass

* More improvements

* Add padding to fast tokenizer

* Make more tests pass

* Make more tests pass

* Make all tests pass for fast tokenizer

* Make fast tokenizer support overflowing boxes and labels

* Add support for overflowing_labels to slow tokenizer

* Add support for fast tokenizer to the processor

* Update processor tests for both slow and fast tokenizers

* Add head models to model mappings

* Make style & quality

* Remove Detectron2 config file

* Add configurable option to label all subwords

* Fix test

* Skip visual segment embeddings in test

* Use ResNet-18 backbone in tests instead of ResNet-101

* Proposal

* Re-enable all jobs on CI

* Fix installation of tesseract

* Fix failing test

* Fix index table

* Add LayoutXLM doc page, first draft of code examples

* Improve documentation a lot

* Update expected boxes for Tesseract 4.0.0 beta

* Use offsets to create labels instead of checking if they start with ##

* Update expected boxes for Tesseract 4.1.1

* Fix conflict

* Make variable names cleaner, add docstring, add link to notebooks

* Revert "Fix conflict"

This reverts commit a9b46ce9afe47ebfcfe7b45e6a121d49e74ef2c5.

* Revert to make integration test pass

* Apply suggestions from @LysandreJik's review

* Address @patrickvonplaten's comments

* Remove fixtures DocVQA in favor of dataset on the hub

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-08-30 12:35:42 +02:00
_static Documentation for patch v4.9.2 2021-08-09 16:14:17 +02:00
imgs [doc] DP/PP/TP/etc parallelism (#12524) 2021-07-09 17:39:09 -07:00
internal Fix doc building error 2021-08-12 05:49:02 -04:00
main_classes [Flax] Correct flax docs (#12782) 2021-08-04 16:31:23 +02:00
model_doc Add LayoutLMv2 + LayoutXLM (#12604) 2021-08-30 12:35:42 +02:00
add_new_model.rst consistent nn. and nn.functional: part 5 docs (#12161) 2021-06-14 13:34:32 -07:00
benchmarks.rst [Docs] fixed broken link (#12205) 2021-06-16 15:14:53 -04:00
bertology.rst Fix documentation links always pointing to master. (#9217) 2021-01-05 06:18:48 -05:00
community.md docs: add HuggingArtists to community notebooks (#13050) 2021-08-10 09:36:44 +02:00
conf.py Add multilingual documentation support (#12952) 2021-07-30 20:56:14 +08:00
contributing.md Update installation page and add contributing to the doc (#5084) 2020-06-17 14:01:10 -04:00
converting_tensorflow_models.rst Examples reorg (#11350) 2021-04-21 11:11:20 -04:00
custom_datasets.rst Rename NLP library to Datasets library (#10920) 2021-03-26 08:07:59 -04:00
debugging.rst [debug] DebugUnderflowOverflow doesn't work with DP (#12816) 2021-07-21 09:36:02 -07:00
examples.md per_device instead of per_gpu/error thrown when argument unknown (#4618) 2020-05-27 11:36:55 -04:00
fast_tokenizers.rst Documentation about loading a fast tokenizer within Transformers (#11029) 2021-04-05 10:51:16 -04:00
favicon.ico Adding usage examples for common tasks (#2850) 2020-02-25 13:48:24 -05:00
glossary.rst Add video links to the documentation (#12162) 2021-06-15 06:37:37 -04:00
index.rst Add LayoutLMv2 + LayoutXLM (#12604) 2021-08-30 12:35:42 +02:00
installation.md Add mention of the huggingface_hub methods for offline mode (#12320) 2021-06-23 09:45:30 -04:00
migration.md consistent nn. and nn.functional: part 5 docs (#12161) 2021-06-14 13:34:32 -07:00
model_sharing.rst Add video links to the documentation (#12162) 2021-06-15 06:37:37 -04:00
model_summary.rst Add video links to the documentation (#12162) 2021-06-15 06:37:37 -04:00
multilingual.rst Examples reorg (#11350) 2021-04-21 11:11:20 -04:00
notebooks.md Update notebooks (#3620) 2020-04-06 14:32:39 -04:00
parallelism.md [parallelism doc] document Deepspeed-Inference and parallelformers (#12836) 2021-07-21 15:11:02 -07:00
performance.md [doc] performance: batch sizes (#12725) 2021-07-15 09:39:34 -07:00
perplexity.rst Create perplexity.rst (#13004) 2021-08-05 02:56:13 -04:00
philosophy.rst Minor documentation revisions from copyediting (#9266) 2020-12-23 10:15:49 -05:00
preprocessing.rst Add video links to the documentation (#12162) 2021-06-15 06:37:37 -04:00
pretrained_models.rst GPT Neo few fixes (#10968) 2021-03-30 11:15:55 -04:00
quicktour.rst Doctests job (#13088) 2021-08-12 03:42:25 -04:00
sagemaker.md remove documentation (#12657) 2021-07-12 18:02:51 +02:00
serialization.rst Add to ONNX docs (#13048) 2021-08-09 09:51:49 -04:00
task_summary.rst Doctests job (#13088) 2021-08-12 03:42:25 -04:00
testing.rst [doc] testing: how to trigger a self-push workflow (#12724) 2021-07-15 16:18:56 -07:00
tokenizer_summary.rst Add video links to the documentation (#12162) 2021-06-15 06:37:37 -04:00
training.rst fixed docs (#12646) 2021-07-12 12:03:13 -04:00
troubleshooting.md [troubleshooting] add 2 points of reference to the offline mode (#11236) 2021-04-14 08:39:23 -07:00