* Add Blip2 model in VQA pipeline
* use require_torch_gpu for test_large_model_pt_blip2
* use can_generate in vqa pipeline
* test Blip2ForConditionalGeneration using float16
* remove custom can_generate from Blip2ForConditionalGeneration
* add AutoModelForTextToSpeech class
* add TTS pipeline and tessting
* add docstrings to text_to_speech pipeline
* fix torch dependency
* corrector 'processor is None' case in Pipeline
* correct repo id
* modify text-to-speech -> text-to-audio
* remove processor
* rename text_to_speech pipelines files to text_audio
* add textToWaveform and textToSpectrogram instead of textToAudio classes
* update TTS pipeline to the bare minimum
* update tests TTS pipeline
* make style and erase useless import torch in TTS pipeline tests
* modify how to check if generate or forward in TTS pipeline
* remove unnecessary extra new lines
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* refactor input_texts -> text_inputs
* correct docstrings of TTS.__call__
* correct the shape of generated waveform
* take care of Bark tokenizer special case
* correct run_pipeline_test TTS
* make style
* update TTS docstrings
* address Sylvain nit refactors
* make style
* refactor into one liners
* correct squeeze
* correct way to test if forward or generate
* Update output audio waveform shape
* make style
* correct import
* modify how the TTS pipeline test if a model can generate
* align shape output of TTS pipeline with consistent shape
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* [ASR Pipeline] Fix init
* refactor test
* change default kwarg setting
* only perform checks if we have to
* override init
* move pre/forward/post checks to sanitize
* Fix rescaling bug
* Add tests
* Update integration tests
* Fix up
* Update src/transformers/image_transforms.py
* Update test - new possible order in list
* add llama
* add other readmes
* update padding id in readme
* add link to paper
* fix paths and tokenizer
* more nits
* styling
* fit operation in 2 lines when possible
* nits
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add form
* update reademe
* update readme, we don't have a default pad token
* update test and tokenization
* LLaMA instead of Llama
* nits
* add expected text
* add greeedy output
* styling
* Update src/transformers/models/llama/modeling_llama.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* sequential device map
* skip relevant changes
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Allow dict input for audio classification pipeline
* make style
* Empty commit to trigger CI
* Empty commit to trigger CI
* check for torchaudio
* add pip instructions
Co-authored-by: Sylvain <sylvain.gugger@gmail.com>
* Update src/transformers/pipelines/audio_classification.py
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
* asr -> audio class
* asr -> audio class
---------
Co-authored-by: Sylvain <sylvain.gugger@gmail.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
* let's go!
* initial implementation of token-level timestamps
* only return a single timestamp per token
* remove token probabilities
* fix return type
* fix doc comment
* strip special tokens
* rename
* revert to not stripping special tokens
* only support models that have alignment_heads
* add integration test
* consistently name it token-level timestamps
* small DTW tweak
* initial support for ASR pipeline
* fix pipeline doc comments
* resolve token timestamps in pipeline with chunking
* change warning when no final timestamp is found
* return word-level timestamps
* fixup
* fix bug that skipped final word in each chunk
* fix failing unit tests
* merge punctuations into the words
* also return word tokens
* also return token indices
* add (failing) unit test for combine_tokens_into_words
* make combine_tokens_into_words private
* restore OpenAI's punctuation rules
* add pipeline tests
* make requested changes
* PR review changes
* fix failing pipeline test
* small stuff from PR
* only return words and their timestamps, not segments
* move alignment_heads into generation config
* forgot to set alignment_heads in pipeline tests
* tiny comment fix
* grr
* Chunkable classification pipeline
The TokenClassificationPipeline is now able to process sequences longer than 512. No matter the framework, the model, the tokenizer. We just have to pass process_all=True and a stride number (optional). The behavior remains the same if you don't pass these optional parameters. For overlapping parts when using stride above 0, we consider only the max scores for each overlapped token in all chunks where the token is.
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* update with latest black format
* update black format
* Update token_classification.py
* Update token_classification.py
* format correction
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update comments
* Update src/transformers/pipelines/token_classification.py
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
* Update token_classification.py
Correct spaces, remove process_all and keep only stride. If stride is provided, the pipeline is applied to the whole text.
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update chunk aggregation
Update the chunk aggregation strategy based on entities aggregation.
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
Remove unnecessary pop from outputs dict
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update token_classification.py
* Update src/transformers/pipelines/token_classification.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add chunking tests
* correct formating
* correct formatting
* correct model id for test chunking
* update scores with nested simplify
* Update test_pipelines_token_classification.py
* Update test_pipelines_token_classification.py
* update model to a tiny one
* Update test_pipelines_token_classification.py
* Adding smaller test for chunking.
* Fixup
* Update token_classification.py
* Update src/transformers/pipelines/token_classification.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/pipelines/token_classification.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [WIP] whisper refacto to support language output.
* Handling merges.
* A bit more cleanup and comments.
* Many improvements.
Lots of details everywhere.
* Cleanup old code and tests.
* Handle lone timestamp tokens (just recover when something bad happens).
* Adding return_language example.
* No ffmpeg.
* Hmm.
* Some corrections.
* Both fast and slow.
* New black.
* Update src/transformers/models/whisper/tokenization_whisper.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/whisper/tokenization_whisper.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Remove print.
* Undoing tests modifications.
* Smaller test modifications.
* Rename.
* Remove maxDiff.
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Mark pipeline tests to skip them easily
* Mark the mixin as pipeline test
* Update src/transformers/testing_utils.py
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
---------
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* add pipeline
* update init
* add zero shot to init
* update inits and correct checkpoints
* update base to support input features
* add tests
* Update src/transformers/pipelines/zero_shot_audio_classification.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/pipelines/zero_shot_audio_classification.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* update pieline code
* use tiny checkpoint
* nits and expected value with tiny model
* style
* last nit on tests values
* fix styling
* fix collate fn that was casting t float
* update
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* fix: Change is_last chunk calc and add conditional break
* format fix
* account for 0 and full stride_rights, add comment
* add new test
* make style
* update slow whisper asr test timestamps
* use nested_simplify on output and round timestamp to hundreths place
* Result of black 23.1
* Update target to Python 3.7
* Switch flake8 to ruff
* Configure isort
* Configure isort
* Apply isort with line limit
* Put the right black version
* adapt black in check copies
* Fix copies
* update whisper logit processor
* add generate for whisper
* remove part of the whisper specific code from pipeline
* update logit processes
* major update
* enforce first timestamp
* update generate
* add more tests
* update new decoding strategy
* Apply suggestions from code review
* update docstring
* fixup
* default config will not have multilingual ar
* update expected tokenizer size, see pull on the hub for whisper-tiny
* Fixing the pipeline with image processor.
* Update the slow test.
* Using only the first image processor.
* Include exclusion mecanism for Image processor.
* Do not handle Gitconfig, deemed as a bug.
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Remove `conversational` changes. They are not supposed to be here.
* Address first row of comments.
* Remove OneFormer modifications.
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>