* initial working additions
* clean and rename, add cond stripping initial prompt to decode
* cleanup, edit create_initial_prompt_ids, add tests
* repo consistency, flip order of conditional
* fix error, move the processor fn to the tokenizer
* repo consistency, update test ids to corresponding tokenizer
* use convert_tokens_to_ids not get_vocab...
* use actual conditional in generate
* make sytle
* initial address comments
* initial working add new params to pipeline
* first draft of sequential generation for condition_on_previous_text
* add/update tests, make compatible with timestamps
* make compatible with diff. input kwargs and max length
* add None check
* add temperature check
* flip temp check operand
* refocusing to prev pr scope
* remove the params too
* make style
* edits, move max length incorporating prompt to whisper
* address comments
* remove asr pipeline prompt decoding, fix indexing
* address comments (more tests, validate prompt)
* un-comment out tests (from debug)
* remove old comment
* address comments
* fix typo
* remove timestamp token from test
* make style
* cleanup
* copy method to fast tokenizer, set max_new_tokens for test
* prompt_ids type just pt
* address Amy's comments
* make style
* add `get_input_embeddings` to `WhisperForAudioClassification`
* add common tests
* fix another common test
* Update tests/models/whisper/test_modeling_whisper.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix style
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* [Whisper] Add model for audio classification
* make fix-copies
* add to docs
* add docstring
* empty returns
* add code example
* switch to fleurs
* stick everything on one line
* Result of black 23.1
* Update target to Python 3.7
* Switch flake8 to ruff
* Configure isort
* Configure isort
* Apply isort with line limit
* Put the right black version
* adapt black in check copies
* Fix copies
* add draft logit processor
* add template functions
* update timesapmt processor parameters
* draft script
* simplify code
* cleanup
* fixup and clean
* update pipeline
* style
* clean up previous idea
* add tokenization utils
* update tokenizer and asr output
* fit whisper type
* style and update test
* clean test
* style test
* update tests
* update error test
* udpate code (not based on review yet)
* update tokenization
* update asr pipeline
* update code
* cleanup and update test
* fmt
* remove text verificatino
* cleanup
* cleanup
* add model test
* update tests
* update code add docstring
* update code and add docstring
* fix pipeline tests
* add draft logit processor
add template functions
update timesapmt processor parameters
draft script
simplify code
cleanup
fixup and clean
update pipeline
style
clean up previous idea
add tokenization utils
update tokenizer and asr output
fit whisper type
style and update test
clean test
style test
update tests
update error test
udpate code (not based on review yet)
update tokenization
update asr pipeline
update code
cleanup and update test
fmt
remove text verificatino
cleanup
cleanup
add model test
update tests
update code add docstring
update code and add docstring
fix pipeline tests
* Small update.
* Fixup.
* Tmp.
* More support.
* Making `forced_decoder_ids` non mandatory for users to set.
* update and fix first bug
* properly process sequence right after merge if last
* tofo
* allow list inputs + compute begin index better
* start adding tests
* add the 3 edge cases
* style
* format sequences
* fixup
* update
* update
* style
* test passes, edge cases should be good
* update last value
* remove Trie
* update tests and expec ted values
* handle bigger chunk_length
* clean tests a bit
* refactor chunk iter and clean pipeline
* update tests
* style
* refactor chunk iter and clean pipeline
* upade
* resolve comments
* Apply suggestions from code review
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
* take stride right into account
* update test expected values
* Update code based on review
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
* major refactor
* add correct strides for tests
* Update src/transformers/pipelines/automatic_speech_recognition.py
* fix whisper timestamp test
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
* add draft logit processor
* add template functions
* update timesapmt processor parameters
* draft script
* simplify code
* cleanup
* fixup and clean
* update pipeline
* style
* clean up previous idea
* add tokenization utils
* update tokenizer and asr output
* fit whisper type
* style and update test
* clean test
* style test
* update tests
* update error test
* udpate code (not based on review yet)
* update tokenization
* update asr pipeline
* update code
* cleanup and update test
* fmt
* remove text verificatino
* cleanup
* cleanup
* add model test
* update tests
* update code add docstring
* update code and add docstring
* fix pipeline tests
* add draft logit processor
add template functions
update timesapmt processor parameters
draft script
simplify code
cleanup
fixup and clean
update pipeline
style
clean up previous idea
add tokenization utils
update tokenizer and asr output
fit whisper type
style and update test
clean test
style test
update tests
update error test
udpate code (not based on review yet)
update tokenization
update asr pipeline
update code
cleanup and update test
fmt
remove text verificatino
cleanup
cleanup
add model test
update tests
update code add docstring
update code and add docstring
fix pipeline tests
* Small update.
* Fixup.
* Tmp.
* More support.
* Making `forced_decoder_ids` non mandatory for users to set.
* update and fix first bug
* properly process sequence right after merge if last
* tofo
* allow list inputs + compute begin index better
* start adding tests
* add the 3 edge cases
* style
* format sequences
* fixup
* update
* update
* style
* test passes, edge cases should be good
* update last value
* remove Trie
* update tests and expec ted values
* handle bigger chunk_length
* clean tests a bit
* refactor chunk iter and clean pipeline
* update tests
* style
* refactor chunk iter and clean pipeline
* upade
* resolve comments
* Apply suggestions from code review
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
* take stride right into account
* update test expected values
* Update code based on review
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
* move generation_*.py src files into generation/*.py
* populate generation.__init__ with lazy loading
* move imports and references from generation.xxx.object to generation.object