* add a test for a word only input
* make LukeForMaskedLM work without entity inputs
* update test
* add LukeForMaskedLM to MODEL_FOR_MASKED_LM_MAPPING_NAMES
* restore pyproject.toml
* empty line at the end of pyproject.toml
I think you mean to accept either an instance of `PreTrainedTokenizer` or `PreTrainedTokenizerFast` inside of the `pipeline(...)` factory function, if the `tokenizer` argument isn't a `str`.
* initial commit
* add init file
* update globakl init
* update index and dummy objects
* style
* update modelling auto
* fix initi typo in src/transformers
* fix typo in modeling tf auto, opt was in wrong mapping name
* fixed a slow test : saved_model
* style
* fix positionnal embedding if no position id is provided
* update tf test
* update test flax requirements
* fixed serialization
* update
* update tf name to allow smooth convertion
* update flax tests
* style
* fix test typo
* fix tf typo test
* add xla for generate support in causal LM
* fixed bug
* cleaned tf tests
* style
* removed from PT for slow tests
* fix typp
* opt test as slow
* trying to fix GPT2 undefined
* correct documentation and add to test doc
* update tf doc
* fix doc
* fake commit
* Apply suggestions from code review
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* update test based on review
* merged main layer for functionning test
* fixup + quality
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* update long comment
* make fix copies
Co-authored-by: Arthur <arthur@huggingface.co>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [Json dump] Make json prettier
* correct more tokenizeirs
* more patterns
* add aggressive test
* the aggressive test was actually useful :-)
* more tests
* Apply suggestions from code review
* Accumulate tokens into batches in PreTrainedTokenizerBase.add_tokens()
For tokenizers with a small number of special tokens or special tokens
with consecutive token IDs, this reduces the time complexity of creating
the trie from quadratic to linear, see also #16936.
* Extend explanation of batching added tokens
* Setup for Italian translation and add first document
- Add 'it' folder for files translated into Italian
- Add _config.py and _toctree.yml files
- Add translation of quicktour.mdx
* Fix style issue of italian documentation files
* Add 'it' to the languages section in the .github/workflows
* Remove - installation from _toctree for Italian
* Translation for index file
- Add index to _toctree.yml
- Add translation of index.mdx
* Fix typo in docs/source/it/index.mdx
* Translate code comments in docs/source/it/_config.py
Co-authored-by: Martina Fumanelli <martinafumanelli@Martinas-MBP.homenet.telecomitalia.it>
* Add onnx configuration for xlm
* Add supported features for xlm
* Add xlm to models exportable with onnx
* Add xlm architecture to test file
* Modify docs
* Make code quality fixes
* Support for Bart and LayoutLM, and partial support for XLNet
* Support for mbart
* A lot of new models supported
* Support for other models
* LayoutLM fix
* Use strings instead of classes
* Enablign `imageGPT` auto feature extractor.
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Small updates.
* Update after rebase to use `input_ids` instead of `pixel_values`.
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Make forward pass work
* More improvements
* Remove unused imports
* Remove timm dependency
* Improve loss calculation of token classifier
* Fix most tests
* Add docs
* Add model integration test
* Make all tests pass
* Add LayoutLMv3FeatureExtractor
* Improve integration test + make fixup
* Add example script
* Fix style
* Add LayoutLMv3Processor
* Fix style
* Add option to add visual labels
* Make more tokenizer tests pass
* Fix more tests
* Make more tests pass
* Fix bug and improve docs
* Fix import of processors
* Improve docstrings
* Fix toctree and improve docs
* Fix auto tokenizer
* Move tests to model folder
* Move tests to model folder
* change default behavior add_prefix_space
* add prefix space for fast
* add_prefix_spcae set to True for Fast
* no space before `unique_no_split` token
* add test to hightligh special treatment of added tokens
* fix `test_batch_encode_dynamic_overflowing` by building a long enough example
* fix `test_full_tokenizer` with add_prefix_token
* Fix tokenizer integration test
* Make the code more readable
* Add tests for LayoutLMv3Processor
* Fix style
* Add model to README and update init
* Apply suggestions from code review
* Replace asserts by value errors
* Add suggestion by @ducviet00
* Add model to doc tests
* Simplify script
* Improve README
* a step ahead to fix
* Update pair_input_test
* Make all tokenizer tests pass - phew
* Make style
* Add LayoutLMv3 to CI job
* Fix auto mapping
* Fix CI job name
* Make all processor tests pass
* Make tests of LayoutLMv2 and LayoutXLM consistent
* Add copied from statements to fast tokenizer
* Add copied from statements to slow tokenizer
* Remove add_visual_labels attribute
* Fix tests
* Add link to notebooks
* Improve docs of LayoutLMv3Processor
* Fix reference to section
Co-authored-by: SaulLu <lucilesaul.com@gmail.com>
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>