Yih-Dar
36ee128375
Fix WhisperModelTest
( #21883 )
...
* force on the same device
* fix tests
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-01 20:41:27 +01:00
Yih-Dar
871c31a6f1
🔥 Rework pipeline testing by removing PipelineTestCaseMeta
🚀 ( #21516 )
...
* Add PipelineTesterMixin
* remove class PipelineTestCaseMeta
* move validate_test_components
* Add for ViT
* Add to SPECIAL_MODULE_TO_TEST_MAP
* style and quality
* Add feature-extraction
* update
* raise instead of skip
* add tiny_model_summary.json
* more explicit
* skip tasks not in mapping
* add availability check
* Add Copyright
* A way to disable irrelevant tests
* update with main
* remove disable_irrelevant_tests
* skip tests
* better skip message
* better skip message
* Add all pipeline task tests
* revert
* Import PipelineTesterMixin
* subclass test classes with PipelineTesterMixin
* Add pipeline_model_mapping
* Fix import after adding pipeline_model_mapping
* Fix style and quality after adding pipeline_model_mapping
* Fix one more import after adding pipeline_model_mapping
* Fix style and quality after adding pipeline_model_mapping
* Fix test issues
* Fix import requirements
* Fix mapping for MobileViTModelTest
* Update
* Better skip message
* pipeline_model_mapping could not be None
* Remove some PipelineTesterMixin
* Fix typo
* revert tests_fetcher.py
* update
* rename
* revert
* Remove PipelineTestCaseMeta from ZeroShotAudioClassificationPipelineTests
* style and quality
* test fetcher for all pipeline/model tests
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-02-28 19:40:57 +01:00
Arthur
087436c98e
Fix-ci-whisper ( #21767 )
...
* fix history
* input_features instead of input ids for TFWhisper doctest
* use translate instead of transcribe
2023-02-24 11:39:25 +01:00
bofeng huang
c8545d2a9c
[Whisper] Add SpecAugment ( #21298 )
...
* Return and rescale attention_mask
* Add SpecAugment to Whisper modeling
* Fix test
* Update docstring
* Add SpecAug related parameters to model config
* Add the _mask_input_features function to doc
* Fix quality
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Remove dev comments
* Add test
* Resolve conflict
* feat: mask {feature, time} prob fast tests
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-02-24 11:07:52 +01:00
Jonatan Kłosko
deafc24388
Add WhisperTokenizerFast ( #21222 )
...
* Add WhisperTokenizerFast
* Fixup
* Up
* Up
* Improve tests
* Update src/transformers/models/whisper/tokenization_whisper_fast.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Keep stride in whisper pipeline test
* Remove unknown token special case
* Reduce vocabulary size in tests
* Fix vocab size assertion
* Sync copied changes from WhisperTokenizer
* Skip pipeline tests
* Update assertion
* Remove Whisper tokenizer dependency on sentencepiece
* Format
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-02-21 06:58:54 +01:00
Sylvain Gugger
c87bbe1ff0
Fix quality
2023-02-20 03:27:09 -05:00
Andy Ehrenberg
2840272c5f
add flax whisper implementation ( #20479 )
...
* add flax whisper implementation
* revert change to setup
* remove unused imports
* revert generation changes
* flax whisper docs
* docs
* import order
* import sorting
* isort
* add dummy objects
* doc formatting
* formatting
* remove trailing whitespaces
* fix flax whisper docs
* add generation logic to unlock flax whisper
* remove scans
* give credits to Flax Bart implementation
* remove unused imports
* add license
* remove assert
* more credits to Bart
* fix style
* formatting
* support left padding
* add flax whisper generation test
* remove copied from comments whenever not a full copy
* fix docstrings for logits processors
* revert change to FlaxForceTokensLogitsProcessor
* revert doc changes
* improve generation docs
* reorganize
* formatting
* cleanup docs
* add tests
* handle empty list case
* fix forced decoder ids in flax tests
* add flax whisper to inits
* update dummy objects
* docs for FlaxAutoModelForSpeechSeq2Seq
* fix decoder_position_ids computation in pretrained model decode/__call__ fns
* add Copied from statements as necessary
* compute position_ids only in __call__ and decode methods of pretrained model subclasses
* improve readability of compute positional embeddings
* check dimensionality of input_features instead of hidden_states
* copied from statement for init_cache
* formatting
* fix copies
* fix copies
* pass attention mask to encoder layers
* fix decoder module outputs
* set dtype
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* smaller flax model for whisper test
* Update src/transformers/generation/flax_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/whisper/modeling_flax_whisper.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update tests/models/whisper/test_modeling_flax_whisper.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* cleanup
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/whisper/modeling_flax_whisper.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* bias cleanup
* doc fix
* align style for force tokens processor
* readability
* fix input shape in tests
* revert FlaxGenerationMixin docstring
* formatting
* fix tests
* fix imports
* consistent encoder hidden states
* consistent hidden states
* input shapes
* typo
* partial class trick
* partial class for input shape
* base_class with correct input shape
* partial base classes
* match by name
* set main_input_name
* compare on names
* formatting
* remove unused import
* safer position ids computation
* safer position id computation
* Update src/transformers/models/whisper/modeling_flax_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update src/transformers/models/whisper/modeling_flax_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* remove identical inherited tests
* fix prompt ids in tests
* use generation config
* use jnp array
* better var names
* more explicit bias use
* import transformers
* formatting
* test formatting
* remove unused imports
* remove unused imports
* formatting
* isort
* docs
* fix ln orders for encoder hidden states
* whisper unique generation stuff
* flake
* use finfo for attention bias
* docs
* Update src/transformers/generation/flax_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* docs
* add timestamp flax test
* jit for timestamps
* formatting
* clean up timestamps processor
* formatting
* remove if_true
* cleanup
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-02-20 09:17:40 +01:00
Yih-Dar
cbecf121cd
Fix env. variable type issue in testing ( #21609 )
...
* fix env issue
* fix env issue
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-02-13 20:53:26 +01:00
Joao Gante
24273268b7
Generate: Fix flaky indexing error in test_constrained_beam_search_generate_dict_output
( #21561 )
2023-02-13 15:12:07 +00:00
Quentin Meeus
5b72b3412b
Remove CLI spams with Whisper FeatureExtractor ( #21267 )
...
* Remove CLI spams with Whisper FeatureExtractor
The Whisper feature extractor representation includes the mel filters, a list of lists that is printed as ~16,000 lines. This needlessly spams the command line. I added a `__repr__` method that replaces this list with the string "<array of shape (80, 201)>".
* Remove mel_filters from to_dict output
Credits to @ArthurZucker
* remove unused import
* update feature extraction tests for the changes in to_dict
2023-02-10 09:15:16 -05:00
Sylvain Gugger
97d3390fc8
Skip failing test for now
2023-02-09 20:11:26 -05:00
Sylvain Gugger
6f79d26442
Update quality tooling for formatting ( #21480 )
...
* Result of black 23.1
* Update target to Python 3.7
* Switch flake8 to ruff
* Configure isort
* Configure isort
* Apply isort with line limit
* Put the right black version
* adapt black in check copies
* Fix copies
2023-02-06 18:10:56 -05:00
Arthur
0dff407d71
[Whisper] another patch ( #21324 )
...
* another patch
* fix timestamp test modeling
* let it be negative when the token is None
2023-01-27 16:35:16 +01:00
Arthur
6f3faf3863
[WHISPER] Small patch ( #21307 )
...
* add small patch
* update tests, forced decoder ids do not take priority over generation config
* fix two new tests
2023-01-25 22:49:23 +01:00
Arthur
255257f3ea
[Whisper] Refactor whisper ( #21252 )
...
* update whisper logit processor
* add generate for whisper
* remove part of the whisper specific code from pipeline
* update logit processes
* major update
* enforce first timestamp
* update generate
* add more tests
* update new decoding strategy
* Apply suggestions from code review
* update docstring
* fixup
* default config will not have multilingual ar
* update expected tokenizer size, see pull on the hub for whisper-tiny
2023-01-25 13:09:43 +01:00
Arthur
e9b4800dda
[Whisper] Fix timestamp processor ( #21187 )
...
* add draft logit processor
* add template functions
* update timestamp processor parameters
* draft script
* simplify code
* cleanup
* fixup and clean
* update pipeline
* style
* clean up previous idea
* add tokenization utils
* update tokenizer and asr output
* fit whisper type
* style and update test
* clean test
* style test
* update tests
* update error test
* update code (not based on review yet)
* update tokenization
* update asr pipeline
* update code
* cleanup and update test
* fmt
* remove text verification
* cleanup
* cleanup
* add model test
* update tests
* update code add docstring
* update code and add docstring
* fix pipeline tests
* Small update.
* Fixup.
* Tmp.
* More support.
* Making `forced_decoder_ids` non mandatory for users to set.
* update and fix first bug
* properly process sequence right after merge if last
* tofo
* allow list inputs + compute begin index better
* start adding tests
* add the 3 edge cases
* style
* format sequences
* fixup
* update
* update
* style
* test passes, edge cases should be good
* update last value
* remove Trie
* update tests and expected values
* handle bigger chunk_length
* clean tests a bit
* refactor chunk iter and clean pipeline
* update tests
* style
* refactor chunk iter and clean pipeline
* update
* resolve comments
* Apply suggestions from code review
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
* take stride right into account
* update test expected values
* Update code based on review
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
* major refactor
* add correct strides for tests
* Update src/transformers/pipelines/automatic_speech_recognition.py
* fix whisper timestamp test
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
2023-01-19 16:25:56 +01:00
Arthur
bb300ac686
Whisper Timestamp processor and prediction ( #20620 )
...
* add draft logit processor
* add template functions
* update timestamp processor parameters
* draft script
* simplify code
* cleanup
* fixup and clean
* update pipeline
* style
* clean up previous idea
* add tokenization utils
* update tokenizer and asr output
* fit whisper type
* style and update test
* clean test
* style test
* update tests
* update error test
* update code (not based on review yet)
* update tokenization
* update asr pipeline
* update code
* cleanup and update test
* fmt
* remove text verification
* cleanup
* cleanup
* add model test
* update tests
* update code add docstring
* update code and add docstring
* fix pipeline tests
* Small update.
* Fixup.
* Tmp.
* More support.
* Making `forced_decoder_ids` non mandatory for users to set.
* update and fix first bug
* properly process sequence right after merge if last
* tofo
* allow list inputs + compute begin index better
* start adding tests
* add the 3 edge cases
* style
* format sequences
* fixup
* update
* update
* style
* test passes, edge cases should be good
* update last value
* remove Trie
* update tests and expected values
* handle bigger chunk_length
* clean tests a bit
* refactor chunk iter and clean pipeline
* update tests
* style
* refactor chunk iter and clean pipeline
* update
* resolve comments
* Apply suggestions from code review
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
* take stride right into account
* update test expected values
* Update code based on review
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
2023-01-17 15:50:09 +01:00
Sanchit Gandhi
77382e918d
[Whisper] Fix forced decoder ids ( #20652 )
...
* [Whisper] Fix forced decoder ids
* fix test
2022-12-07 16:44:13 +00:00
Sanchit Gandhi
74fb524e20
[Whisper] Fix decoder ids methods ( #20599 )
...
* [Whisper] Fix decoder ids methods
* enum property
2022-12-05 18:45:22 +00:00
Arthur
761b3fad92
Expected output for the test changed ( #20493 )
2022-11-30 15:07:28 +01:00
Arthur
11b2e45ccc
[WHISPER] Update modeling tests ( #20162 )
...
* Update modeling tests
* update tokenization test
* typo
* nit
* fix expected attention outputs
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update tests from review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* remove problematics kwargs passed to the padding function
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-11-15 11:04:58 +01:00
Sanchit Gandhi
905e5773a3
[processor] Add 'model input names' property ( #20117 )
...
* [processor] Add 'model input names' property
* add test
* no f string
* add generic property method to mixin
* copy to multimodal
* copy to vision
* tests for all audio
* remove ad-hoc tests
* style
* fix flava test
* fix test
* fix processor code
2022-11-10 19:29:20 +00:00
Joao Gante
f270b960d6
Generate: move generation_*.py src files into generation/*.py ( #20096 )
...
* move generation_*.py src files into generation/*.py
* populate generation.__init__ with lazy loading
* move imports and references from generation.xxx.object to generation.object
2022-11-09 15:34:08 +00:00
Sanchit Gandhi
06d488061f
[Whisper Tokenizer] Make more user-friendly ( #19921 )
...
* [Whisper Tokenizer] Make more user-friendly
* use property
* make indexing rigorous
* small clean-up
* tests
* skip seq2seq tests
* remove multilingual arg
* reorder args
* collapse to one function
Co-authored-by: ArthurZucker <arthur@huggingface.co>
* option to override attributes
Co-authored-by: ArthurZucker <arthur@huggingface.co>
* add to docs
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* make comment more clear
Co-authored-by: sgugger <sylvain@huggingface.co>
* don't add special tokens in get_decoder_prompt_ids
* add test for set_prefix_tokens
Co-authored-by: ArthurZucker <arthur@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: sgugger <sylvain@huggingface.co>
2022-11-03 14:22:40 +00:00
Yih-Dar
3436842102
Run some TF Whisper tests in subprocesses to avoid GPU OOM ( #19772 )
...
* Run some TF Whisper tests in subprocesses to avoid GPU OOM
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-10-21 21:59:18 +02:00
Arthur
d51ca32404
fix tests ( #19670 )
2022-10-18 06:45:48 +02:00
Sanchit Gandhi
c937f0b954
[Whisper] Don't return attention mask in feat extractor ( #19521 )
...
* [Whisper] Don't return attention mask in feat extractor
* remove attention mask from test
* fix failing tests
* quality
2022-10-14 14:36:03 +01:00
Sanchit Gandhi
bbd150e92f
[Whisper] Freeze params of encoder ( #19527 )
...
* [Whisper] Freeze params of encoder
* add tests
2022-10-13 09:50:02 +01:00
Yih-Dar
440bbd44aa
Update WhisperModelIntegrationTests.test_large_batched_generation
( #19472 )
...
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-10-11 14:39:24 +02:00
amyeroberts
e3f028f3af
Add TF whisper ( #19378 )
...
* simplify loop
* add feature extractor
* add model
* start conversion
* add dropout
* initial commit of test files
* conversion for all models
* update processor for correct padding
* update feature extraction
* update integration test logits match
* fmt: off for the logits
* on the fly mel bank
* small nit
* update test
* update tokenizer
* nit feature extraction
* update
* update tokenizer test
* adds logit processor and update tokenizer to get suppress tokens
* style
* clean convert
* revert to original modeling tf utils
* Update
* update
* nit
* clean convert file
* update tests and nits
* quality
* slow generation test
* ffn_dim to allow customization
* update readme
* add to toctree
* start fixing integration tests
* update tests and code
* fix feature extractor
* fix config tests common
* update code to fix tests
* fix feature extractor
* nit feature extraction
* update test for new feature extractor
* style
* add abstract
* large logits with custom decoder input ids
* wrap around is torch available
* fix feature extractor
* correct logits for whisper small.en
* nit
* fix encoder_attention_mask
* some fixes
* remove unnecessary inputs
* nits
* add normalizer file
* update test tokenization
* fix attention mask not defined
* fix generate
* remove useless encoder attention mask
* update test modeling whisper
* update config to add second non suppress tokens
* nits on feature extractor
* nit for test tokenizers
* update tests
* update tests
* update tokenization test
* fixup
* invalidated hf token. Clean convert openai to whisper
* fix logit tests
* fixup
* Add model to README
* Fix doc tests
* clean merge
* revert toc_tree changes
* remove useless LogitProcessor
* Update whisper .mdx
* update config file doc
* update configuration docstring
* update test tokenization
* update test tokenization
* update tokenization whisper
Added copied from where needed
* update feature extraction
* nit test name
* style
* quality
* remove get suppress tokens and update non_speech tokens global variables
* Update src/transformers/models/whisper/feature_extraction_whisper.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* clean modeling whisper and test
Removed the attention mask arguments that are deprecated
* fix large test
* Add multilingual audio test, and translate test
* style
* fix large multilingual test
* nits
* add copied from for attention layer
* remove attention masks in doc
* add english normalizer
* Update docs/source/en/model_doc/whisper.mdx
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* update tokenization test
* remove copied from in whisper attention : no bias in k_proj only
* wrap around dependencies in english normalizer
* style
* correct import generation logits
* for now, wrap feature extractor with torch
* remove torch dependencies for feature extraction and style
* Update src/transformers/models/whisper/convert_openai_whisper_to_tfms.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/whisper/configuration_whisper.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update docs/source/en/model_doc/whisper.mdx
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* fixup
* nit
* update logits
* style
* nit
* nits and fix final tests
* add `is_more_itertools_available` to utils
* quality
* add begin suppress tokens, suppress tokens to generate args and config
* clean suppressTokensLogitProcessor in generation logits
* Nit naming
* add suppressTokensAtBegin
* update tests, suppress tokens to None or correct values
* nit and style
* update RAG to fit test and generate_logit
* add copy pasted statement on english normalizer
* add arguments to config_common_kwargs
* Update src/transformers/generation_utils.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/generation_logits_process.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* revert changes based on reviews
* update doc and nits
* Update src/transformers/models/whisper/configuration_whisper.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* more nits
* last nits
* update test configuration common
* add BART name in decoder attention mask documentation
* Update src/transformers/models/whisper/modeling_whisper.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* style
* nit
* nit
* add english.json file to git
* nits on documentation
* nit
* nits
* last styling
* add main toctree file
* remove sentence piece dependency
* clean init file
* fix tokenizer that has no dependencies on sentencepiece
* update whisper init file, nit
* remove english.json file
* add get decoder prompt id
* All weights loading
* Remove hanging pdb
* Fixup and tidy up
* Use same copied from as PT model
* Remove whitespace changes
* Remove torch references
* Tie embeddings
* Remove logits processor input to generate
* Update logit values
* revert changes and add forced logit processor
* nit
* clean normalizer
* remove protected
* Add logit processors and update generation code & tests
* Some tidy up
* Update docstring
* update
* update based on review
* Update src/transformers/models/whisper/configuration_whisper.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/whisper/configuration_whisper.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update to reflect changes on the PT model branch
* Tidy up
* Remove extra whitespace
* Fix test - make input ids small enough we can append
* Include upstream changes on main
* PR comments - add batch tests, remove comments & defaults
* Fix model output imports
* Update src/transformers/models/whisper/modeling_tf_whisper.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/generation_tf_logits_process.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/models/whisper/modeling_tf_whisper.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/models/whisper/modeling_tf_whisper.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update tests/models/whisper/test_modeling_tf_whisper.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/models/whisper/modeling_tf_whisper.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/models/whisper/modeling_tf_whisper.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update docstring example
* Update src/transformers/models/whisper/modeling_tf_whisper.py
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Remove changes to adjust_logits_during_generation function
* Update src/transformers/models/whisper/modeling_tf_whisper.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Tidy up imports that don't require TF
* Update tests - skip and no more skip
* Update tests/generation/test_generation_tf_logits_process.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/models/whisper/modeling_tf_whisper.py
* Update src/transformers/models/whisper/modeling_tf_whisper.py
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Add training flags
* Add (skipped) XLA generation tests
* Add embedding correctness test
* Add constant ids for generation tests
* Make logits finding a bit tidier
* Remove unused args
* xla generation enabled
* Don't skip XLA tests anymore
* Fix tests - add position ids to expected signature and update rag generation
* Undo method reorder
* Remove added whitespace
* Remove copy-paste gradient checkpoint ref
* Remove
* Trigger CI - (issue with refs when pulling)
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: NielsRogge <niels.rogge1@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
2022-10-10 14:48:17 +01:00
Arthur
45e14038f2
Add WhisperModel to transformers ( #19166 )
...
* simplify loop
* add feature extractor
* add model
* start conversion
* add dropout
* initial commit of test files
* conversion for all models
* update processor for correct padding
* update feature extraction
* update integration test logits match
* fmt: off for the logits
* on the fly mel bank
* small nit
* update test
* update tokenizer
* nit feature extraction
* update
* update tokenizer test
* adds logit processor and update tokenizer to get suppress tokens
* style
* clean convert
* revert to original modeling tf utils
* Update
* update
* nit
* clean convert file
* update tests and nits
* quality
* slow generation test
* ffn_dim to allow customization
* update readme
* add to toctree
* start fixing integration tests
* update tests and code
* fix feature extractor
* fix config tests common
* update code to fix tests
* fix feature extractor
* nit feature extraction
* update test for new feature extractor
* style
* add abstract
* large logits with custom decoder input ids
* wrap around is torch available
* fix feature extractor
* correct logits for whisper small.en
* nit
* fix encoder_attention_mask
* some fixes
* remove unnecessary inputs
* nits
* add normalizer file
* update test tokenization
* fix attention mask not defined
* Add model to README
* Fix doc tests
* fix generate
* remove useless encoder attention mask
* update test modeling whisper
* update config to add second non suppress tokens
* nits on feature extractor
* nit for test tokenizers
* update tests
* update tests
* update tokenization test
* fixup
* invalidated hf token. Clean convert openai to whisper
* fix logit tests
* fixup
* clean merge
* revert toc_tree changes
* remove useless LogitProcessor
* Update whisper .mdx
* update config file doc
* update configuration docstring
* update test tokenization
* update test tokenization
* update tokenization whisper
Added copied from where needed
* update feature extraction
* nit test name
* style
* quality
* remove get suppress tokens and update non_speech tokens global variables
* Update src/transformers/models/whisper/feature_extraction_whisper.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* clean modeling whisper and test
Removed the attention mask arguments that are deprecated
* fix large test
* Add multilingual audio test, and translate test
* style
* fix large multilingual test
* nits
* Update docs/source/en/model_doc/whisper.mdx
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add copied from for attention layer
* remove attention masks in doc
* add english normalizer
* update tokenization test
* remove copied from in whisper attention : no bias in k_proj only
* wrap around dependencies in english normalizer
* style
* correct import generation logits
* for now, wrap feature extractor with torch
* Update src/transformers/models/whisper/convert_openai_whisper_to_tfms.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/whisper/configuration_whisper.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update docs/source/en/model_doc/whisper.mdx
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* remove torch dependencies for feature extraction and style
* fixup
* nit
* update logits
* style
* nit
* nits and fix final tests
* add `is_more_itertools_available` to utils
* quality
* add begin suppress tokens, suppress tokens to generate args and config
* clean suppressTokensLogitProcessor in generation logits
* Nit naming
* add suppressTokensAtBegin
* update tests, suppress tokens to None or correct values
* nit and style
* update RAG to fit test and generate_logit
* add copy pasted statement on english normalizer
* add arguments to config_common_kwargs
* Update src/transformers/generation_utils.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/generation_logits_process.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/whisper/configuration_whisper.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* revert changes based on reviews
* update doc and nits
* more nits
* last nits
* update test configuration common
* add BART name in decoder attention mask documentation
* Update src/transformers/models/whisper/modeling_whisper.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* style
* nit
* nit
* add english.json file to git
* nits on documentation
* nit
* nits
* last styling
* add main toctree file
* remove sentence piece dependency
* clean init file
* fix tokenizer that has no dependencies on sentencepiece
* update whisper init file, nit
* remove english.json file
* add get decoder prompt id
* revert changes and add forced logit processor
* nit
* clean normalizer
* remove protected
* update
* Update src/transformers/models/whisper/configuration_whisper.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* update based on review
* Update src/transformers/models/whisper/configuration_whisper.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add batched tests
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: NielsRogge <niels.rogge1@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-10-05 22:28:31 +02:00