* Enabling dataset iteration on pipelines.
Unifying parameters under `set_parameters` function.
Small fix.
Last fixes after rebase
Remove print.
Fixing text2text `generate_kwargs`
No more `self.max_length`.
Fixing TF-only conversational.
Consistency in start/stop index over TF/PT.
Speeding up drastically on TF (nasty bug where max_length would increase a ton).
Adding test for support for non fast tokenizers.
Fixing GPU usage on zero-shot.
Fix working on TF.
Update src/transformers/pipelines/base.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Update src/transformers/pipelines/base.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Small cleanup.
Remove all asserts + simple format.
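The dataset-iteration support added in the first commit above is easiest to see in use. Below is a minimal, hedged sketch assuming the `datasets` library and the `KeyDataset` helper (whose import path has varied across transformers versions); the task and dataset are illustrative only:

```python
# Hedged sketch of dataset iteration on a pipeline. `KeyDataset` wraps a
# datasets.Dataset so the pipeline can stream one column; the task, model
# and dataset are illustrative, and the KeyDataset import path may differ
# across transformers versions.
from datasets import load_dataset
from transformers import pipeline
from transformers.pipelines.pt_utils import KeyDataset

pipe = pipeline("text-classification")
dataset = load_dataset("imdb", split="test")

# Iterating the pipeline over the dataset streams inputs through
# preprocess -> _forward -> postprocess instead of one call per example.
for prediction in pipe(KeyDataset(dataset, "text")):
    print(prediction)
```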
* Fixing audio-classification for large PR.
* Overly explicit null checking.
* Encapsulating GPU/CPU pytorch manipulation directly within `base.py`.
* Removed internal state for parameters of the pipeline.
Instead of implicitly overriding internal state, we moved to real
named arguments on every `preprocess`, `_forward`, and `postprocess`
function.
`_sanitize_parameters` is now used to split all kwargs of both
`__init__` and `__call__` into the three kinds of named parameters.
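For context, a minimal sketch of the pattern described above; the pipeline subclass and its `top_k` parameter are illustrative, not from this PR:

```python
# Hedged sketch: `_sanitize_parameters` splits all kwargs (from __init__
# and __call__ alike) into three dicts, one per stage. No internal state
# is mutated; each stage receives its parameters as named arguments.
from transformers import Pipeline


class MyPipeline(Pipeline):
    def _sanitize_parameters(self, top_k=None, **kwargs):
        preprocess_kwargs = {}
        forward_kwargs = {}
        postprocess_kwargs = {}
        if top_k is not None:
            postprocess_kwargs["top_k"] = top_k
        return preprocess_kwargs, forward_kwargs, postprocess_kwargs

    def preprocess(self, inputs):
        return self.tokenizer(inputs, return_tensors=self.framework)

    def _forward(self, model_inputs):
        return self.model(**model_inputs)

    def postprocess(self, model_outputs, top_k=5):
        # `top_k` arrives as a real named argument, not via self.
        return model_outputs.logits.topk(top_k)
```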
* Move import warnings.
* Small fixes.
* Quality.
* Another small fix, using the CI to debug faster.
* Last fixes.
* Last fix.
* Small cleanup of tensor moving.
* is not None.
* Adding a bunch of docs + an iteration test.
* Fixing doc style.
* KeyDataset = None guard.
* Removing the CUDA test for pipelines (was only for testing).
* Even more simple iteration test.
* Correct import.
* Long day.
* Fixes in docs.
* [WIP] migrating object detection.
* Fixed the target_size bug.
* Fixup.
* Bad variable name.
* Fixing `ensure_on_device` so it respects the original ModelOutput.
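A hedged sketch of the device-handling idea behind this fix: tensors are moved recursively, and a `ModelOutput` is rebuilt as the same subclass instead of being collapsed into a plain dict. Illustrative, not the exact `base.py` code.

```python
# Move every tensor in a nested structure to `device`, preserving the
# container types, including the original ModelOutput subclass.
import torch
from transformers.utils import ModelOutput  # transformers.file_utils in older versions


def ensure_tensor_on_device(inputs, device):
    if isinstance(inputs, ModelOutput):
        # Rebuild the same subclass so callers keep attribute access.
        return type(inputs)(
            **{k: ensure_tensor_on_device(v, device) for k, v in inputs.items()}
        )
    if isinstance(inputs, dict):
        return {k: ensure_tensor_on_device(v, device) for k, v in inputs.items()}
    if isinstance(inputs, (list, tuple)):
        return type(inputs)(ensure_tensor_on_device(v, device) for v in inputs)
    if isinstance(inputs, torch.Tensor):
        return inputs.to(device)
    return inputs
```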
* Moving slow tokenizer to the Trie world.
* Adding more docstrings to the Trie.
* Fixing doctest (incompatible with our format?)
* Update src/transformers/tokenization_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Adding a lot more comment into the internals of this algorithm.
* Cleaner doc.
* Fixing the naming.
* Update src/transformers/tokenization_utils.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* quality.
* Fixing longest-first match.
* Small improvements to cuts + more tests + canine-resistant test.
* Fixing fast test.
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
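A minimal sketch of the algorithm the Trie commits above refer to: a character trie built from the added/special tokens, used to split raw text in one left-to-right pass. It is simplified (greedy longest match, no overlapping-match bookkeeping) relative to the real `transformers.tokenization_utils.Trie`.

```python
# Hedged sketch of the Trie idea used to cut text on special tokens.
class Trie:
    def __init__(self):
        self.data = {}

    def add(self, word: str):
        node = self.data
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = True  # end-of-token marker

    def split(self, text: str):
        cuts, i = [0], 0
        while i < len(text):
            node, j, end = self.data, i, None
            # Walk as far as the trie allows, remembering the longest
            # token that ends along the way ("longest first" matching).
            while j < len(text) and text[j] in node:
                node = node[text[j]]
                j += 1
                if "" in node:
                    end = j
            if end is not None:
                cuts.extend([i, end])
                i = end
            else:
                i += 1
        cuts.append(len(text))
        return [text[a:b] for a, b in zip(cuts, cuts[1:]) if a != b]


trie = Trie()
trie.add("[CLS]")
print(trie.split("[CLS] Hello"))  # ['[CLS]', ' Hello']
```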
* [docs] update dead quickstart link on reusing past for GPT2
The dead link has been replaced by two links to the forward and call
methods of the GPT2 class for torch and tensorflow respectively.
* [docs] fix formatting for gpt2 page update
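For context, a hedged PyTorch sketch of the "reusing past" pattern those links document (the greedy next-token step is purely illustrative):

```python
# Sketch of "reusing past" with GPT-2 in PyTorch: the first forward pass
# returns past_key_values, and later steps only feed the newest token.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("Hello, my dog", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, use_cache=True)
past = out.past_key_values  # cached keys/values for every attention layer

# The next step passes only the new token; the cache supplies the context.
next_token = out.logits[:, -1:].argmax(dim=-1)
with torch.no_grad():
    out = model(input_ids=next_token, past_key_values=past, use_cache=True)
```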
* refactor GPT Config to allow dynamic properties
* make attribute_map a class attribute
* remove old code
* update unit test to test config: Add test for common properties setter
* update unit test to test config: Add test for common properties passed as parameters to __init__
* update to black code format
* Allow that setters are not defined for certain config classes
* update config classes to implement attribute_map
* bugfix lxmert config - id2label was not defined when num_labels was set
* update broken configs - add attribute_maps
* update bart config
* update black codestyle
* update documentation on common config attributes
* update GPTJ config to new attribute map
* update docs on common attributes
* gptj config: add max_position_embeddings
* gptj config: format with black
* update speech to text 2 config
* format doc file to max_len 119
* update config template
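A hedged, simplified sketch of the `attribute_map` mechanism these commits add (the real logic lives in `PretrainedConfig.__getattr__`/`__setattr__`; the config class below is illustrative):

```python
# `attribute_map` maps common config names to model-specific ones, so
# e.g. `hidden_size` can proxy `n_embd` on GPT-style configs.
class Config:
    attribute_map = {}

    def __setattr__(self, key, value):
        if key in self.attribute_map:
            key = self.attribute_map[key]
        super().__setattr__(key, value)

    def __getattr__(self, key):
        # Only reached when normal lookup fails, so no recursion risk.
        if key != "attribute_map" and key in self.attribute_map:
            return getattr(self, self.attribute_map[key])
        raise AttributeError(key)


class GPT2LikeConfig(Config):
    attribute_map = {
        "hidden_size": "n_embd",
        "max_position_embeddings": "n_positions",
    }

    def __init__(self, n_embd=768, n_positions=1024):
        self.n_embd = n_embd
        self.n_positions = n_positions


cfg = GPT2LikeConfig()
print(cfg.hidden_size)              # 768, proxied to n_embd
cfg.max_position_embeddings = 2048
print(cfg.n_positions)              # 2048, set through the map
```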
* [docs] Update perplexity.rst to use negative log likelihood
Model `forward` returns the negative log likelihood. The document correctly defines and calculates perplexity, but the description and variable names are inconsistent, which might cause confusion.
* [docs] restyle perplexity.rst
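The inconsistency being fixed: the model's returned `loss` already is the mean negative log-likelihood, so perplexity is just its exponential. A hedged sketch, where `model` (a causal LM) and `windows` (an iterable of input_ids tensors) are assumed:

```python
# Perplexity from the mean negative log-likelihood. The actual doc also
# weights each window by its token count, omitted here for brevity.
import torch

nlls = []
for input_ids in windows:
    with torch.no_grad():
        out = model(input_ids, labels=input_ids)
    nlls.append(out.loss)  # `loss` is the mean negative log-likelihood

ppl = torch.exp(torch.stack(nlls).mean())
```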
* correct order of overflowing_tokens for slow tokenizer (issue fix#13148)
* Python 3.9 requires sentencepiece version 0.1.94 or above
* slicing of ids fixed in truncate_sequences()
* Update setup.py
* Correct order of overflowing tokens for pair of sentences
* code reformatted
* Update tokenization_utils_base.py
* reformatting file
* test to check single_input added
* missing function restored
* test to check pair_input overflowing tokens order
* test to check pair_input overflowing tokens order
* test to check pair_input overflowing tokens order
* added an error message for a pair of sequences with the longest_first strategy
* test for pair_input modified
* variable name corrected
* fixed a typo in error message
* requested changes implemented
* required test added
* Corrected the message to match test message
* added error message for LukeTokenizer
* lost test recovered
* docstring for truncate_sequences and prepare_for_model updated
* docstring for luke tokenizer updated
* updated ENCODE_PLUS_ADDITIONAL_KWARGS_DOCSTRING
* aligned text and fixed punctuation
* improved style and quality of code
* fixed error_msg in truncate_sequences
* replaced encode_plus method with regular call method
* clean up
* rephrased the docstring
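For reference, a hedged sketch of the behavior this PR fixes, using a slow tokenizer (fast tokenizers report overflow differently); the model name is illustrative:

```python
# With a slow (Python) tokenizer, tokens cut off by truncation come back
# under "overflowing_tokens"; this PR makes their order match the
# original sequence.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoded = tokenizer(
    "a fairly long sentence that will not fit into this tiny budget",
    max_length=8,
    truncation=True,
    return_overflowing_tokens=True,
)
print(encoded["input_ids"])           # truncated ids, length 8
print(encoded["overflowing_tokens"])  # cut-off tokens, in original order
```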