transformers/tests
Arthur ef7e93699a
[Tokenizer] Fix slow and fast serialization (#26570)
* fix

* last attempt

* current work

* fix forward compatibility

* save all special tokens

* current state

* revert additional changes

* updates

* remove tokenizer.model

* add a test and the fix

* nit

* revert one more break

* fix typefield issue

* quality

* more tests

* fix fields for FC

* more nits?

* new additional changes

* how

* some updates

* simplify all

* more nits

* revert some things to original

* nice

* nits

* a small hack

* more nits

* ahhaha

* fixup

* update

* make test run on ci

* use subtesting

* update

* Update .circleci/create_circleci_config.py

* updates

* fixup

* nits

* replace typo

* fix the test

* nits

* update

* None max dif pls

* a partial fix

* had to revert one thing

* test the fast

* updates

* fixup

* and more nits

* more fixes

* update

* Oupsy 👁️

* nits

* fix marian

* on our way to heaven

* Update src/transformers/models/t5/tokenization_t5.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* fixup

* Update src/transformers/tokenization_utils_fast.py

Co-authored-by: Leo Tronchon <leo.tronchon@gmail.com>

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Leo Tronchon <leo.tronchon@gmail.com>

* fix phobert

* skip some things, test more

* nits

* fixup

* fix deberta

* update

* update

* more updates

* skip one test

* more updates

* fix camembert

* can't test this one

* more good fixes

* kind of a major update

- separate what is only done in fast in fast init and refactor
- add_token(AddedToken(..., special=True)) ignores it in fast
- better loading

* fixup

* more fixups

* fix pegasus and mpnet

* remove skipped tests

* fix phoneme tokenizer if self.verbose

* fix individual models

* update common tests

* update testing files

* all over again

* nits

* skip test for markup lm

* fixups

* fix order of addition in fast by sorting the added tokens decoder

* proper defaults for deberta

* correct default for fnet

* nits on add tokens, string initialized to special if special

* skip irrelevant herbert tests

* main fixes

* update test added_tokens_serialization

* the fix for bart like models and class instantiating

* update bart

* nit!

* update idefics test

* fix whisper!

* some fixup

* fixups

* revert some of the wrong changes

* fixup

* fixup

* skip marian

* skip the correct tests

* skip for tf and flax as well

---------

Co-authored-by: Lysandre Debut <hi@lysand.re>
Co-authored-by: Leo Tronchon <leo.tronchon@gmail.com>
2023-10-18 16:30:53 +02:00
benchmark [Test refactor 1/5] Per-folder tests reorganization (#15725) 2022-02-23 15:46:28 -05:00
bettertransformer Fixed malapropism error (#26660) 2023-10-09 11:04:57 +02:00
deepspeed fix the deepspeed tests (#26021) 2023-09-13 10:26:53 +05:30
extended remove SharedDDP as it is deprecated (#25702) 2023-10-06 16:03:11 +02:00
fixtures [WIP] add SpeechT5 model (#18922) 2023-02-03 12:43:46 -05:00
fsdp Skip TrainerIntegrationFSDP::test_basic_run_with_cpu_offload if torch < 2.1 (#26764) 2023-10-12 18:22:09 +02:00
generation [Assistant Generation] Improve Encoder Decoder (#26701) 2023-10-11 15:52:20 +02:00
models [Tokenizer] Fix slow and fast serialization (#26570) 2023-10-18 16:30:53 +02:00
optimization Make schedulers picklable by making lr_lambda fns global (#21768) 2023-03-02 12:08:43 -05:00
peft_integration [PEFT] Final fixes (#26559) 2023-10-03 14:53:09 +02:00
pipelines Add many missing spaces in adjacent strings (#26751) 2023-10-12 10:28:40 +02:00
quantization 🚨🚨🚨 [Quantization] Store the original dtype in the config as a private attribute 🚨🚨🚨 (#26761) 2023-10-16 19:56:53 +02:00
repo_utils Docstring check (#26052) 2023-10-04 15:13:37 +02:00
sagemaker Add many missing spaces in adjacent strings (#26751) 2023-10-12 10:28:40 +02:00
tokenization [Tokenizer] Fix slow and fast serialization (#26570) 2023-10-18 16:30:53 +02:00
tools Add support for for loops in python interpreter (#24429) 2023-06-26 09:58:14 -04:00
trainer enable optuna multi-objectives feature (#25969) 2023-09-12 18:01:22 +01:00
utils Fix failing MusicgenTest .test_pipeline_text_to_audio (#26586) 2023-10-06 15:53:59 +02:00
__init__.py GPU text generation: Moved the encoded_prompt to correct device 2020-01-06 15:11:12 +01:00
test_backbone_common.py [AutoBackbone] Add test (#26094) 2023-09-18 23:47:54 +02:00
test_configuration_common.py Deal with nested configs better in base class (#25237) 2023-08-04 14:56:09 +02:00
test_configuration_utils.py Deal with nested configs better in base class (#25237) 2023-08-04 14:56:09 +02:00
test_feature_extraction_common.py Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00
test_feature_extraction_utils.py Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00
test_image_processing_common.py Input data format (#25464) 2023-08-16 17:45:02 +01:00
test_image_processing_utils.py Run hub tests (#24807) 2023-07-13 15:25:45 -04:00
test_image_transforms.py Add input_data_format argument, image transforms (#25462) 2023-08-11 15:09:31 +01:00
test_modeling_common.py [FA2] Fix flash attention 2 fine-tuning with Falcon (#26852) 2023-10-17 15:38:03 +02:00
test_modeling_flax_common.py Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00
test_modeling_flax_utils.py Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00
test_modeling_tf_common.py Skip test_onnx_runtime_optimize for now (#25560) 2023-08-17 11:23:16 +02:00
test_modeling_tf_utils.py Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00
test_modeling_utils.py skip flaky hub tests (#26594) 2023-10-04 17:47:55 +02:00
test_pipeline_mixin.py Fix failing MusicgenTest .test_pipeline_text_to_audio (#26586) 2023-10-06 15:53:59 +02:00
test_sequence_feature_extraction_common.py Fix typo (#25966) 2023-09-05 10:12:25 +02:00
test_tokenization_common.py [Tokenizer] Fix slow and fast serialization (#26570) 2023-10-18 16:30:53 +02:00
test_tokenization_utils.py Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00