Arthur
b15343de6f
[Patch-t5-tokenizer] Patches the changes on T5 to make sure previous behaviour is still valid for beginning of words ( #24622 )
...
* patch `_tokenize` function
* more tests
* properly fix
* fixup
* Update src/transformers/models/t5/tokenization_t5.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix without ifs
* update
* protect import
* add python processing
* is first needed
* add doc and update with legacy
* update
* fix T5 SPM converter
* styling
* fix T5 warning
* add is_seqio_available
* remove is_first
* revert some changes
* more tests and update
* update llama test battery
* fixup
* refactor T5 spm common tests
* draft the llama tests
* update
* update test
* nits
* refine
* name nit
* fix t5 tests
* fix T5
* update
* revert convert slow to fast changes that fail lots of tests
* legacy support
* fixup
* nits is first not defined
* don't use legacy behaviour for switch transformers
* style
* My attempt to check.
* nits
* fixes
* update
* fixup
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* updates
* fixup
* add legacy warning
* fixup
* warning_once nit
* update t5 documentation test
* update llama tok documentation
* add space to warning
* nits
* nit
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* last nits
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2023-07-11 15:02:18 +02:00
Arthur
b52a03cd3b
⚠️ ⚠️ [T5Tokenize] Fix T5 family tokenizers ⚠️ ⚠️ ( #24565 )
...
* don't add space before single letter chars that don't have a merge
* fix the fix
* fixup
* add a test
* more testing
* fixup
* hack to make sure fast is also fixed
* update switch transformers test
* revert convert slow
* Update src/transformers/models/t5/tokenization_t5.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add typechecking
* quality
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-06-30 07:00:43 +02:00
Aaron Gokaslan
5e8c8eb5ba
Apply ruff flake8-comprehensions ( #21694 )
2023-02-22 09:14:54 +01:00
Sylvain Gugger
6f79d26442
Update quality tooling for formatting ( #21480 )
...
* Result of black 23.1
* Update target to Python 3.7
* Switch flake8 to ruff
* Configure isort
* Configure isort
* Apply isort with line limit
* Put the right black version
* adapt black in check copies
* Fix copies
2023-02-06 18:10:56 -05:00
Pengfei Liu
8ad06b7c13
using raw string for regex to search <extra_id> ( #21162 )
...
* using raw string for regex to search <extra_id>
* fix the same issue in test file: `tokenization_t5.py`
2023-01-18 09:43:54 -05:00
raghavanone
03ae1f060b
change the way sentinel tokens can be retrieved ( #20373 )
...
* change the way sentinel tokens can be retrieved
* Fix line length for doc string
* Fix line length for doc string
* Add a stronger test for T5 tokenization
* Format file changes
* Make a stronger test for filtering sentinel tokens
* fix file format issues
2022-11-23 09:35:44 -05:00
Sylvain Gugger
986526a0e4
Replace as_target context managers by direct calls ( #18325 )
...
* Preliminary work on tokenizers
* Quality + fix tests
* Treat processors
* Fix pad
* Remove all uses of in tests, docs and examples
* Replace all as_target_tokenizer
* Fix tests
* Fix quality
* Update examples/flax/image-captioning/run_image_captioning_flax.py
Co-authored-by: amyeroberts <amy@huggingface.co>
* Style
Co-authored-by: amyeroberts <amy@huggingface.co>
2022-07-29 08:09:09 -04:00
Yih-Dar
19420fd99e
Move test model folders ( #17034 )
...
* move test model folders (TODO: fix imports and others)
* fix (potentially partially) imports (in model test modules)
* fix (potentially partially) imports (in tokenization test modules)
* fix (potentially partially) imports (in feature extraction test modules)
* fix import utils.test_modeling_tf_core
* fix path ../fixtures/
* fix imports about generation.test_generation_flax_utils
* fix more imports
* fix fixture path
* fix get_test_dir
* update module_to_test_file
* fix get_tests_dir from wrong transformers.utils
* update config.yml (CircleCI)
* fix style
* remove missing imports
* update new model script
* update check_repo
* update SPECIAL_MODULE_TO_TEST_MAP
* fix style
* add __init__
* update self-scheduled
* fix add_new_model scripts
* check one way to get location back
* python setup.py build install
* fix import in test auto
* update self-scheduled.yml
* update slack notification script
* Add comments about artifact names
* fix for yolos
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-05-03 14:42:02 +02:00