* add draft logit processor
* add template functions
* update timestamp processor parameters
* draft script
* simplify code
* cleanup
* fixup and clean
* update pipeline
* style
* clean up previous idea
* add tokenization utils
* update tokenizer and asr output
* fit whisper type
* style and update test
* clean test
* style test
* update tests
* update error test
* update code (not based on review yet)
* update tokenization
* update asr pipeline
* update code
* cleanup and update test
* fmt
* remove text verification
* cleanup
* cleanup
* add model test
* update tests
* update code add docstring
* update code and add docstring
* fix pipeline tests
* Small update.
* Fixup.
* Tmp.
* More support.
* Making `forced_decoder_ids` non mandatory for users to set.
* update and fix first bug
* properly process sequence right after merge if last
* todo
* allow list inputs + compute begin index better
* start adding tests
* add the 3 edge cases
* style
* format sequences
* fixup
* update
* update
* style
* test passes, edge cases should be good
* update last value
* remove Trie
* update tests and expected values
* handle bigger chunk_length
* clean tests a bit
* refactor chunk iter and clean pipeline
* update tests
* style
* refactor chunk iter and clean pipeline
* update
* resolve comments
* Apply suggestions from code review
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
* take stride right into account
* update test expected values
* Update code based on review
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
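
The chunking commits above ("refactor chunk iter", "take stride right into account") can be illustrated with a simplified sketch. This is not the pipeline's actual code: the function name, arguments, and return shape are assumptions chosen for illustration, and it works on a plain list instead of audio features. Each chunk overlaps its neighbours by the stride, and the stride is reported as zero on the outer edge of the first and last chunk.

```python
def chunk_iter(samples, chunk_len, stride_left, stride_right):
    """Yield (chunk, (length, left, right), is_last) for overlapping chunks.

    Hypothetical helper: `left` and `right` are the number of items at each
    end of the chunk that overlap a neighbour and can be dropped on merge.
    """
    step = chunk_len - stride_left - stride_right
    if step <= 0:
        raise ValueError("chunk_len must exceed stride_left + stride_right")
    for start in range(0, len(samples), step):
        end = start + chunk_len
        chunk = samples[start:end]
        is_last = end >= len(samples)
        # No overlap to discard at the very beginning or the very end.
        left = 0 if start == 0 else stride_left
        right = 0 if is_last else stride_right
        yield chunk, (len(chunk), left, right), is_last
        if is_last:
            break


chunks = list(chunk_iter(list(range(10)), chunk_len=6, stride_left=1, stride_right=1))
```

With ten samples, a chunk length of 6 and a stride of 1 on each side, this yields two chunks; dropping the reported strides leaves samples 0 to 4 from the first and 5 to 9 from the second, so every sample is covered exactly once.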
* Clarify and add missing typical_p docstring.
* Make the docstring easier to understand.
* Clarify typical_p docstring
Accept the suggestion by @stevhliu for paraphrasing the docstring.
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Use the same docstring as in GenerationConfig
Follow the suggestion by @stevhliu in the pull request conversation.
* Fix docstring spacing.
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Add num_workers for prepare_tf_dataset
* Bugfix in the default collator and change default tensor type
* Remove the "num_workers" arg and move it to a new PR
* Fixing #20783
* Update src/transformers/pipelines/base.py
* Fixing some tests.
* Fixup.
* Remove ffmpeg dep + a bit more relaxed for bigbird QA precision.
* Better dataset.
* Prevent failing on TF.
* Better condition. We can't use `can_use_iterator` since we cannot use it
directly.
* Optimize inference only mode memory if ipex is used
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* fix code style
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* docs: add wandb metrics and model checkpointing to callback docstrings
* docs: update reference to wandb documentation
* fix: change default of `"WANDB_WATCH"` from `"gradients"` to `"false"`
* feature: add `on_save` method and update `"WANDB_LOG_MODEL"` behaviour
* fix: use default wandb run names instead of `output_dir`
- removes duplicated run names from wandb workspace
- models can be logged with corresponding run names
* fix: edit deprecation warning based on review suggestions
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix: change indentation of docstrings
* fix: change indentation of docstrings and run fixup
* fix: empty commit for circleci permissions issue
* fix: format deprecation doc strings review suggestion
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* docs: Highlight WANDB_DISABLED arg in documentation
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* fix: run fixup after updating docstrings
Co-authored-by: Bharat Ramanathan <ramanathan.parameshwaran@gohuddl.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
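
Since the `"WANDB_WATCH"` default moves from `"gradients"` to `"false"`, anyone relying on the old behaviour would presumably need to opt in explicitly. A minimal sketch, assuming the variable is read from the environment before training starts; the helper name is hypothetical and not part of the library:

```python
import os


def restore_previous_wandb_watch(env=os.environ):
    """Opt back in to gradient watching unless a value is already set.

    Hypothetical helper: `setdefault` leaves any explicit user choice alone.
    """
    env.setdefault("WANDB_WATCH", "gradients")
    return env["WANDB_WATCH"]
```

Passing a plain dict instead of `os.environ` makes the helper easy to try out without touching the real environment.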
* [Fix] Make the attention head size in distilbert an object attribute
* Fix code style
Co-authored-by: Felix Joehnk <fjoehnk@N73GCH2NDH.corp.proofpoint.com>
* Add support for turning off the model uploading in ClearML
* Add documentation for the CLEARML_LOG_MODEL environment variable
* Adjust new doc addition to the new style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Dudu Lasry <dudu.lasry@viz.ai>
[NumPy] Remove references to deprecated NumPy type aliases.
This change replaces references to a number of deprecated NumPy type aliases (np.bool, np.int, np.float, np.complex, np.object, np.str) with their recommended replacement (bool, int, float, complex, object, str).
NumPy 1.24 drops the deprecated aliases, so we must remove uses before updating NumPy.
Co-authored-by: Peter Hawkins <phawkins@google.com>
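
As an illustration of the alias replacement described above, here is a rough sketch of a source rewrite using only the standard library. The helper name is hypothetical, and it assumes NumPy is imported as `np`; it is not the tool used for this change. The word boundary keeps sized or underscored types such as `np.int32` and `np.bool_` untouched.

```python
import re

# The deprecated aliases listed in the commit message.
DEPRECATED_ALIASES = ("bool", "int", "float", "complex", "object", "str")

# Match `np.<alias>` only when the alias is a whole word, so that
# `np.int32`, `np.float64` or `np.bool_` are left alone.
_ALIAS_RE = re.compile(r"\bnp\.(" + "|".join(DEPRECATED_ALIASES) + r")\b")


def replace_deprecated_aliases(source: str) -> str:
    """Rewrite `np.bool` -> `bool`, `np.int` -> `int`, and so on."""
    return _ALIAS_RE.sub(r"\1", source)
```

Because this is a plain text substitution, it would also rewrite matches inside comments and string literals, so the result should be reviewed before committing.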