* try to stylify using ruff
* might need to remove these changes?
* use ruf format andruff check
* use isinstance instead of type comparision
* use # fmt: skip
* use # fmt: skip
* nits
* soem styling changes
* update ci job
* nits isinstance
* more files update
* nits
* more nits
* small nits
* check and format
* revert wrong changes
* actually use formatter instead of checker
* nits
* well docbuilder is overwriting this commit
* revert notebook changes
* try to nuke docbuilder
* style
* fix feature exrtaction test
* remve `indent-width = 4`
* fixup
* more nits
* update the ruff version that we use
* style
* nuke docbuilder styling
* leve the print for detected changes
* nits
* Remove file I/O
Co-authored-by: charliermarsh
<charlie.r.marsh@gmail.com>
* style
* nits
* revert notebook changes
* Add # fmt skip when possible
* Add # fmt skip when possible
* Fix
* More ` # fmt: skip` usage
* More ` # fmt: skip` usage
* More ` # fmt: skip` usage
* NIts
* more fixes
* fix tapas
* Another way to skip
* Recommended way
* Fix two more fiels
* Remove asynch
Remove asynch
---------
Co-authored-by: charliermarsh <charlie.r.marsh@gmail.com>
* fix
* last attempt
* current work
* fix forward compatibility
* save all special tokens
* current state
* revert additional changes
* updates
* remove tokenizer.model
* add a test and the fix
* nit
* revert one more break
* fix typefield issue
* quality
* more tests
* fix fields for FC
* more nits?
* new additional changes
* how
* some updates
* simplify all
* more nits
* revert some things to original
* nice
* nits
* a small hack
* more nits
* ahhaha
* fixup
* update
* make test run on ci
* use subtesting
* update
* Update .circleci/create_circleci_config.py
* updates
* fixup
* nits
* replace typo
* fix the test
* nits
* update
* None max dif pls
* a partial fix
* had to revert one thing
* test the fast
* updates
* fixup
* and more nits
* more fixes
* update
* Oupsy 👁️
* nits
* fix marian
* on our way to heaven
* Update src/transformers/models/t5/tokenization_t5.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
* fixup
* Update src/transformers/tokenization_utils_fast.py
Co-authored-by: Leo Tronchon <leo.tronchon@gmail.com>
* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Leo Tronchon <leo.tronchon@gmail.com>
* fix phobert
* skip some things, test more
* nits
* fixup
* fix deberta
* update
* update
* more updates
* skip one test
* more updates
* fix camembert
* can't test this one
* more good fixes
* kind of a major update
- seperate what is only done in fast in fast init and refactor
- add_token(AddedToken(..., speicla = True)) ignores it in fast
- better loading
* fixup
* more fixups
* fix pegasus and mpnet
* remove skipped tests
* fix phoneme tokenizer if self.verbose
* fix individual models
* update common tests
* update testing files
* all over again
* nits
* skip test for markup lm
* fixups
* fix order of addition in fast by sorting the added tokens decoder
* proper defaults for deberta
* correct default for fnet
* nits on add tokens, string initialized to special if special
* skip irrelevant herbert tests
* main fixes
* update test added_tokens_serialization
* the fix for bart like models and class instanciating
* update bart
* nit!
* update idefix test
* fix whisper!
* some fixup
* fixups
* revert some of the wrong chanegs
* fixup
* fixup
* skip marian
* skip the correct tests
* skip for tf and flax as well
---------
Co-authored-by: Lysandre Debut <hi@lysand.re>
Co-authored-by: Leo Tronchon <leo.tronchon@gmail.com>
* Add test_backbone for convnext
* Add TimmBackbone model
* Add check for backbone type
* Tidying up - config checks
* Update convnextv2
* Tidy up
* Fix indices & clearer comment
* Exceptions for config checks
* Correclty update config for tests
* Safer imports
* Safer safer imports
* Fix where decorators go
* Update import logic and backbone tests
* More import fixes
* Fixup
* Only import all_models if torch available
* Fix kwarg updates in from_pretrained & main rebase
* Tidy up
* Add tests for AutoBackbone
* Tidy up
* Fix import error
* Fix up
* Install nattan in doc_test_job
* Revert back to setting self._out_xxx directly
* Bug fix - out_indices mapping from out_features
* Fix tests
* Dont accept output_loading_info for Timm models
* Set out_xxx and don't remap
* Use smaller checkpoint for test
* Don't remap timm indices - check out_indices based on stage names
* Skip test as it's n/a
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Cleaner imports / spelling is hard
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* intiial commit
* new styling
* update
* just run doctest in CI
* remove more test for fast dev
* update
* update refs
* update path and fetch upstream
* update documentatyion trests
* typo
* parse pwd
* don't check for files that are in hidden folders
* just give paths relative to transformers
* update
* update
* update
* major refactoring
* make sure options is ok
* lest test that mdx is tested
* doctest glob
* nits
* update doctest nightly
* some cleaning
* run correct test on diff
* debug
* run on a single worker
* skip_cuda_test tampkate
* updates
* add rA and continue on failure
* test options
* parse `py` codeblock?
* we don't need to replace ignore results, don't remember whyu I put it
* cleanup
* more cleaning
* fix arg
* more cleaning
* clean an todo
* more pre-processing
* doctest-module has none so extra `- ` is needed
* remove logs
* nits
* doctest-modules ....
* oups
* let's use sugar
* make dataset go quiet
* add proper timeout
* nites
* spleling timeout
* update
* properly skip tests that have CUDSA
* proper skipping
* cleaning main and get tests to run
* remove make report?
* remove tee
* some updates
* tee was removed but is the full output still available?
* [all-test]
* only our tests
* don't touch tee in this PR
* no atee-sys
* proper sub
* monkey
* only replace call
* fix sub
* nits
* nits
* fix invalid syntax
* add skip cuda doctest env variable
* make sure all packages are installed
* move file
* update check repo
* revert changes
* nit
* finish cleanup
* fix re
* findall
* update don't test init files
* ignore pycache
* `-ignore-pycache` when running pytests
* try to fix the import missmatch error
* install dec
* pytest is required as doctest_utils imports things from it
* the only log issues were dataset, ignore results should work
* more cleaning
* Update .circleci/create_circleci_config.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* [ydshieh] empty string if cuda is found
* [ydshieh] fix condition
* style
* [ydshieh] fix
* Add comment
* style
* style
* show failure
* trigger CI
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Add kernel size to NATTEN's QK arguments.
The new NATTEN 0.14.5 supports PyTorch 2.0, but also adds an additional
argument to the QK operation to allow optional RPBs.
This ends up failing NATTEN tests.
This commit adds NATTEN back to circleci and adds the arguments to get
it working again.
* Force NATTEN >= 0.14.5
* Mark pipeline tests to skip them easily
* Mark the mixin as pipeline test
* Update src/transformers/testing_utils.py
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
---------
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* Add a new test to check config attributes being used
* Add a new test to check config attributes being used
* Add a new test to check config attributes being used
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply suggestions
* Update allowed cases - part 1
* Update allowed cases - part 2
* final
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Result of black 23.1
* Update target to Python 3.7
* Switch flake8 to ruff
* Configure isort
* Configure isort
* Apply isort with line limit
* Put the right black version
* adapt black in check copies
* Fix copies
* initial commit. added tip placeholders and a script
* removed unused imports, fixed paths
* fixed generated links
* make style
* split language modeling doc into two: causal language modeling and masked language modeling
* added check_task_guides.py to make fix-copies
* review feedback addressed
* Add DiNAT
* Adds DiNAT + tests
* Minor fixes
* Added HF model
* Add natten to dependencies.
* Cleanup
* Minor fixup
* Reformat
* Optional NATTEN import.
* Reformat & add doc to _toctree
* Reformat (finally)
* Dummy objects for DiNAT
* Add NAT + minor changes
Adds NAT as its own independent model + docs, tests
Adds NATTEN to ext deps to ensure ci picks it up.
* Remove natten from `all` and `dev-torch` deps, add manual pip install to ci tests
* Minor fixes.
* Fix READMEs.
* Requested changes to docs + minor fixes.
* Requested changes.
* Add NAT/DiNAT tests to layoutlm_job
* Correction to Dinat doc.
* Requested changes.
* Try PT1.13 by removing torch scatter
* Skip failing tests
* Style
* Remvoe testing extras for repo utils
* Try with all decorators
* Try to wipe the cache
* Fix all tests?
* Try this way
* Fix comma
* Update to main
* Try with less deps
* Quality
* Change the import of kenlm from github to pypi
* Change the import of kenlm from github to pypi in circleci config
* Fix code quality issues
* Fix isort issue, add kenlm in extras for audio
* Add kenlm to deps
* Add kenlm to deps
* Commit 'make fixup' changes
* Remove version from kenlm deps
* commit make fixup changes
* Remove manual installation of kenlm
* Remove manual installation of kenlm
* Remove manual installation of kenlm
* Generate config on the file
* Fake modif for all test launch
* Upload more artifacts
* Typo and quality
* Try converting th yml to txt
* Leave my long lines alone yaml
* Debug prints
* Debug prints v2
* Try without sorting
* Was it really working before?
* Typo
* Use a parameter
* Use a parameter?
* Typo
* Here is some JSON
* Another try
* Learning to read...
* Check default is used
* Does this work?
* With continuation
* WiP
* Use a parameter for test list
* Other fake modif
* With the comma
* Name the test step so it doesn't blow up
* Just one example modification
* Final steps
* Add nightlies
* Move config generator
* Add trigger for nightlies
* Better workflow
* Rebase on recent changes
* Fix config creation
* Fake modif in an example
* Now fake modif in one config file
* Fix install step in custom tokenizers test
* Fix generated config
* Better fix hopefully
* Finally test modif in setup
* final cleanup
* Rework pipeline tests
* Try to fix Flax tests
* Try to put it before
* Use a new decorator instead
* Remove ignore marker since it doesn't work
* Filter pipeline tests
* Woopsie
* Use the fitlered list
* Clean up and fake modif
* Remove init
* Revert fake modif
* add sudachipy and jumanpp tokenizers for bert_japanese
* use ImportError instead of ModuleNotFoundError in SudachiTokenizer and JumanppTokenizer
* put test cases of test_tokenization_bert_japanese in one line
* add require_sudachi and require_jumanpp decorator for testing
* add sudachi and pyknp(jumanpp) to dependencies
* remove sudachi_dict_small and sudachi_dict_full from dependencies
* empty commit for ci
* Fix test fetching for examples
* Fake example modif
* Debug statements
* Typo
* You need to persist the file...
* Revert change in example
* Remove debug statements
* Tests conditional run
* Syntax
* Deps
* Try early exit
* Another way
* Test with no tests to run
* Test all
* Typo
* Try this way
* With tests to run
* Mostly finished
* Typo
* With a modification in one file only
* No change, no tests
* Final cleanup
* Address review comments
* Automatic detection for framework to use when exporting to ONNX
* Log message change
* Incorporating PR comments, adding unit test
* Adding tf for pip install for run_tests_onnxruntime CI
* Restoring past changes to circleci yaml and test_onnx_v2.py, tests moved to tests/onnx/test_features.py
* Fixup
* Adding test to fetcher
* Updating circleci config to log more
* Changing test class name
* Comment typo fix in tests/onnx/test_features.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Moving torch_str/tf_str to self.framework_pt/tf
* Remove -rA flag in circleci config
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Finished QA example
* Dodge a merge conflict
* Update text classification and LM examples
* Update NER example
* New Keras metrics WIP, fix NER example
* Update NER example
* Update MC, summarization and translation examples
* Add XLA warnings when shapes are variable
* Make sure batch_size is consistently scaled by num_replicas
* Add PushToHubCallback to all models
* Add docs links for KerasMetricCallback
* Add docs links for prepare_tf_dataset and jit_compile
* Correct inferred model names
* Don't assume the dataset has 'lang'
* Don't assume the dataset has 'lang'
* Write metrics in text classification
* Add 'framework' to TrainingArguments and TFTrainingArguments
* Export metrics in all examples and add tests
* Fix training args for Flax
* Update command line args for translation test
* make fixup
* Fix accidentally running other tests in fp16
* Remove do_train/do_eval from run_clm.py
* Remove do_train/do_eval from run_mlm.py
* Add tensorflow tests to circleci
* Fix circleci
* Update examples/tensorflow/language-modeling/run_mlm.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update examples/tensorflow/test_tensorflow_examples.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update examples/tensorflow/translation/run_translation.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update examples/tensorflow/token-classification/run_ner.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Fix save path for tests
* Fix some model card kwargs
* Explain the magical -1000
* Actually enable tests this time
* Skip text classification PR until we fix shape inference
* make fixup
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Make forward pass work
* More improvements
* Remove unused imports
* Remove timm dependency
* Improve loss calculation of token classifier
* Fix most tests
* Add docs
* Add model integration test
* Make all tests pass
* Add LayoutLMv3FeatureExtractor
* Improve integration test + make fixup
* Add example script
* Fix style
* Add LayoutLMv3Processor
* Fix style
* Add option to add visual labels
* Make more tokenizer tests pass
* Fix more tests
* Make more tests pass
* Fix bug and improve docs
* Fix import of processors
* Improve docstrings
* Fix toctree and improve docs
* Fix auto tokenizer
* Move tests to model folder
* Move tests to model folder
* change default behavior add_prefix_space
* add prefix space for fast
* add_prefix_spcae set to True for Fast
* no space before `unique_no_split` token
* add test to hightligh special treatment of added tokens
* fix `test_batch_encode_dynamic_overflowing` by building a long enough example
* fix `test_full_tokenizer` with add_prefix_token
* Fix tokenizer integration test
* Make the code more readable
* Add tests for LayoutLMv3Processor
* Fix style
* Add model to README and update init
* Apply suggestions from code review
* Replace asserts by value errors
* Add suggestion by @ducviet00
* Add model to doc tests
* Simplify script
* Improve README
* a step ahead to fix
* Update pair_input_test
* Make all tokenizer tests pass - phew
* Make style
* Add LayoutLMv3 to CI job
* Fix auto mapping
* Fix CI job name
* Make all processor tests pass
* Make tests of LayoutLMv2 and LayoutXLM consistent
* Add copied from statements to fast tokenizer
* Add copied from statements to slow tokenizer
* Remove add_visual_labels attribute
* Fix tests
* Add link to notebooks
* Improve docs of LayoutLMv3Processor
* Fix reference to section
Co-authored-by: SaulLu <lucilesaul.com@gmail.com>
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* Automatically sort auto mappings
* Better class extraction
* Some auto class magic
* Adapt test and underlying behavior
* Remove re-used config
* Quality