transformers/tests
Matthijs Hollemans cd927a4736
add word-level timestamps to Whisper (#23205)
* let's go!

* initial implementation of token-level timestamps

* only return a single timestamp per token

* remove token probabilities

* fix return type

* fix doc comment

* strip special tokens

* rename

* revert to not stripping special tokens

* only support models that have alignment_heads

* add integration test

* consistently name it token-level timestamps

* small DTW tweak

* initial support for ASR pipeline

* fix pipeline doc comments

* resolve token timestamps in pipeline with chunking

* change warning when no final timestamp is found

* return word-level timestamps

* fixup

* fix bug that skipped final word in each chunk

* fix failing unit tests

* merge punctuations into the words

* also return word tokens

* also return token indices

* add (failing) unit test for combine_tokens_into_words

* make combine_tokens_into_words private

* restore OpenAI's punctuation rules

* add pipeline tests

* make requested changes

* PR review changes

* fix failing pipeline test

* small stuff from PR

* only return words and their timestamps, not segments

* move alignment_heads into generation config

* forgot to set alignment_heads in pipeline tests

* tiny comment fix

* grr
2023-06-21 17:48:21 +02:00
benchmark [Test refactor 1/5] Per-folder tests reorganization (#15725) 2022-02-23 15:46:28 -05:00
bettertransformer Add methods to PreTrainedModel to use PyTorch's BetterTransformer (#21259) 2023-04-27 11:03:42 +02:00
bnb [tests] fix bitsandbytes import issue (#24151) 2023-06-09 21:53:11 -07:00
deepspeed accelerate deepspeed and gradient accumulation integrate (#23236) 2023-05-31 15:16:22 +05:30
extended [tests] switch to torchrun (#22712) 2023-04-12 08:25:45 -07:00
fixtures [WIP] add SpeechT5 model (#18922) 2023-02-03 12:43:46 -05:00
generation Generate: add SequenceBiasLogitsProcessor (#24334) 2023-06-21 11:14:41 +01:00
models add word-level timestamps to Whisper (#23205) 2023-06-21 17:48:21 +02:00
onnx Fix issue introduced in PR #23163 (#23363) 2023-05-15 11:38:44 +02:00
optimization Make schedulers picklable by making lr_lambda fns global (#21768) 2023-03-02 12:08:43 -05:00
pipelines add word-level timestamps to Whisper (#23205) 2023-06-21 17:48:21 +02:00
repo_utils Fix expected value in tests of the test fetcher (#24077) 2023-06-07 11:38:56 -04:00
sagemaker Avoid invalid escape sequences, use raw strings (#22936) 2023-04-25 09:17:56 -04:00
tokenization Update quality tooling for formatting (#21480) 2023-02-06 18:10:56 -05:00
tools Fix image segmentation tool bug (#23897) 2023-06-15 08:09:31 -04:00
trainer 🚨🚨🚨 Replace DataLoader logic for Accelerate in Trainer, remove unneeded tests 🚨🚨🚨 (#24028) 2023-06-12 11:23:37 -04:00
utils Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
__init__.py
test_backbone_common.py Add TimmBackbone model (#22619) 2023-06-06 17:11:30 +01:00
test_configuration_common.py Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00
test_configuration_utils.py Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00
test_feature_extraction_common.py Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00
test_feature_extraction_utils.py Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00
test_image_processing_common.py Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00
test_image_processing_utils.py Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00
test_image_transforms.py Bug fix - flip_channel_order for channels first images (#23701) 2023-05-31 17:12:27 +01:00
test_modeling_common.py Fix gradient checkpointing + fp16 autocast for most models (#24247) 2023-06-21 17:04:59 +02:00
test_modeling_flax_common.py Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00
test_modeling_flax_utils.py Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00
test_modeling_tf_common.py Add test for proper TF input signatures (#24320) 2023-06-16 17:03:13 +01:00
test_modeling_tf_utils.py Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00
test_modeling_utils.py Tied weights load (#24310) 2023-06-16 10:55:42 -04:00
test_pipeline_mixin.py Update tiny models for pipeline testing. (#24364) 2023-06-20 14:43:10 +02:00
test_sequence_feature_extraction_common.py Apply ruff flake8-comprehensions (#21694) 2023-02-22 09:14:54 +01:00
test_tokenization_common.py Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00
test_tokenization_utils.py Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00