* fix set_transform link
* Update docs/source/en/preprocessing.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* use doc-builder syntax
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* remove docstrings CodeGen from objects_to_ignore
* autofix codegen docstrings
* fill in the missing types and docstrings
* fixup
* change descriptions to be in a separate line
* apply docstring suggestions from code review
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* update n_ctx description in CodeGenConfig
---------
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* Remove ChineseCLIPImageProcessor, ChineseCLIPTextConfig, ChineseCLIPVisionConfig from check_docstrings
* Run fix_and_overwrite for ChineseCLIPImageProcessor, ChineseCLIPTextConfig, ChineseCLIPVisionConfig
* Replace <fill_type> and <fill_docstring> in configuration_chinese_clip.py, image_processing_chinese_clip.py with type and docstring values
---------
Co-authored-by: vignesh-raghunathan <vignesh_raghunathan@intuit.com>
* initial commit
* add processor, add fuyu naming
* add draft processor
* fix processor
* remove dropout to fix loading of weights
* add image processing fixes from Pedro
* fix
* fix processor
* add basic processing fuyu test
* add documentation and TODO
* address comments, add tests, add doc
* replace assert with torch asserts
* add Mixins and fix tests
* clean imports
* add model tester, clean imports
* fix embedding test
* add updated tests from pre-release model
* Processor: return input_ids used for inference
* separate processing and model tests
* relax test tolerance for embeddings
* add test for logit comparison
* make sure fuyu image processor is imported in the init
* fix formatting
* more formatting issues
* and more
* fixups
* remove some stuff
* nits
* update init
* remove the fuyu file
* Update integration test with release model
* Update conversion script.
The projection is not used, as confirmed by the authors.
* improve generation
* Remove duplicate function
* Trickle down patches to model call
* processing fuyu updates
* remove things
* fix prepare_inputs_for_generation to fix generate()
* remove model_input
* update
* add generation tests
* nits
* draft leverage automodel and autoconfig
* nits
* fix dtype patch
* address comments, update READMEs and doc, include tests
* add working processing test, remove refs to subsequences
* add tests, remove Sequence classification
* processing
* update
* update the conversion script
* more processing cleanup
* safe import
* take out ModelTesterMixin for early release
* more cleanup
* more cleanup
* more cleanup
* and more
* register a buffer
* nits
* add postprocessing of generate output
* nits
* updates
* add one working test
* fix test
* make fixup works
* fixup
* Arthur's updates
* nits
* update
* update
* fix processor
* update tests
* pass more fixups
* fix
* nits
* don't import torch
* skip fuyu config for now
* fixup done
* fixup
* update
* oups
* nits
* Use input embeddings
* no buffer
* update
* styling processing fuyu
* fix test
* update licence
* protect torch import
* fixup and update not doctested
* kwargs should be passed
* updates
* update the imports in the test
* protect import
* protecting imports
* protect imports in type checking
* add testing decorators
* protect top level import structure
* fix typo
* fix check init
* move requires_backend to functions
* Imports
* Protect types
---------
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre@huggingface.co>
* fix
* last attempt
* current work
* fix forward compatibility
* save all special tokens
* current state
* revert additional changes
* updates
* remove tokenizer.model
* add a test and the fix
* nit
* revert one more break
* fix typefield issue
* quality
* more tests
* fix fields for FC
* more nits?
* new additional changes
* how
* some updates
* simplify all
* more nits
* revert some things to original
* nice
* nits
* a small hack
* more nits
* ahhaha
* fixup
* update
* make test run on ci
* use subtesting
* update
* Update .circleci/create_circleci_config.py
* updates
* fixup
* nits
* replace typo
* fix the test
* nits
* update
* None max diff pls
* a partial fix
* had to revert one thing
* test the fast
* updates
* fixup
* and more nits
* more fixes
* update
* Oupsy 👁️
* nits
* fix marian
* on our way to heaven
* Update src/transformers/models/t5/tokenization_t5.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
* fixup
* Update src/transformers/tokenization_utils_fast.py
Co-authored-by: Leo Tronchon <leo.tronchon@gmail.com>
* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Leo Tronchon <leo.tronchon@gmail.com>
* fix phobert
* skip some things, test more
* nits
* fixup
* fix deberta
* update
* update
* more updates
* skip one test
* more updates
* fix camembert
* can't test this one
* more good fixes
* kind of a major update
- separate what is only done in fast in fast init and refactor
- add_token(AddedToken(..., special = True)) ignores it in fast
- better loading
* fixup
* more fixups
* fix pegasus and mpnet
* remove skipped tests
* fix phoneme tokenizer if self.verbose
* fix individual models
* update common tests
* update testing files
* all over again
* nits
* skip test for markup lm
* fixups
* fix order of addition in fast by sorting the added tokens decoder
* proper defaults for deberta
* correct default for fnet
* nits on add tokens, string initialized to special if special
* skip irrelevant herbert tests
* main fixes
* update test added_tokens_serialization
* the fix for bart like models and class instantiating
* update bart
* nit!
* update idefics test
* fix whisper!
* some fixup
* fixups
* revert some of the wrong changes
* fixup
* fixup
* skip marian
* skip the correct tests
* skip for tf and flax as well
---------
Co-authored-by: Lysandre Debut <hi@lysand.re>
Co-authored-by: Leo Tronchon <leo.tronchon@gmail.com>
* Chore: Typo fixed in multiple files of docs/source/en/model_doc
* Update docs/source/en/model_doc/nllb-moe.md
Co-authored-by: Aryan V S <avs050602@gmail.com>
---------
Co-authored-by: Aryan V S <avs050602@gmail.com>
* Adjust length limits and allow naked conversation list inputs
* Adjust length limits and allow naked conversation list inputs
* Maybe use a slightly more reasonable limit than 1024
* Skip tests for old models that never supported this anyway
* Cleanup input docstrings
* More docstring cleanup + skip failing TF test
* Make fixup
* Remove BertGenerationTokenizer from objects to ignore
The file BertGenerationTokenizer is removed from
objects to ignore as a first step to fix docstring.
* Docstrings fix for BertGenerationTokenizer
Docstring fix is generated for BertGenerationTokenizer
by using check_docstrings.py.
* Fix docstring for BertGenerationTokenizer
Added sep_token type and docstring in BertGenerationTokenizer.
* Remove space in template comment
I think the space between the eos and bos tokens is not present in the actual template output. I'm using this documentation as a reference for everyone asking about prompting, so I would like to clarify whether there's a space or not :)
* Update fast tokenizer too
* Apply to Code Llama
* Link to original code snippet.
* Remove CanineConfig from check_docstrings
* Run fix_and_overwrite for CanineConfig
* Replace <fill_type> and <fill_docstring> in configuration_canine.py with type and docstring values
---------
Co-authored-by: vignesh-raghunathan <vignesh_raghunathan@intuit.com>