* Add check-ops script
* Finish to implement check_tf_ops and start the test
* Make the test mandatory only for BERT
* Update tf_ops folder
* Remove useless classes
* Add the ONNX test for GPT2 and BART
* Add a onnxruntime slow test + better opset flexibility
* Fix test + apply style
* fix tests
* Switch min opset from 12 to 10
* Update src/transformers/file_utils.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Fix GPT2
* Remove extra shape_list usage
* Fix GPT2
* Address Morgan's comments
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Conversion from slow to fast for BPE spm vocabs contained an error.
- There is only 1 test currently (tokenizers + slow) that used the modified path
and it's reformer, which does not contain any ids modification so the
bug was silent for now.
- The real issue is that vocab variable was overloaded by
SentencePieceExtractor, leading to Slow specific vocab oddities to be
completely ignored
- The bug was reported here https://github.com/huggingface/transformers/issues/9518
- Ran the complete tokenization test suite with slow without error
(`RUN_SLOW=1 pytest -sv tests/test_tokenization_*`)
* Remove rebase error.
* Adding the fixture.
* Cleaning up `ConversationalPipeline` to support more than DialoGPT.
Currently ConversationalPipeline was heavily biased towards DialoGPT
,which is the default model for this pipeline.
This PR proposes changes to put back the modifications specific to
DialoGPT into tokenizer-specific behavior wherever possible, by
creating `_build_conversation_input_ids` function that takes
conversation as input, and returns a list of ints corresponding
to the tokens. It feels natural to put here because all models
have probably different strategies to build input_ids from the
full conversation and it's the tokenizer's job to transform strings
into tokens (and vice-versa)
If `_build_conversation_input_ids` is missing, previous behavior is
used so we don't break anything so far (except for blenderbot where it's a fix).
This PR also contains a fix for too long inputs. There used
to be dead code for trying to limit the size of incoming input.
The introduced fixed is that we limit
within `_build_conversation_input_ids` to `tokenizer.model_max_length`.
It corresponds to the intent of the removed dead code and is actually
better because it corresponds to `model_max_length` which is different
from `max_length` (which is a default parameter for `generate`).
- Removed `history` logic from the Conversation as it's not relevant
anymore because tokenization logic has been moved to tokenizer.
And tokenizer cannot save any cache, and conversation cannot know
what is relevant or not.
Also it's not usable from `blenderbot` because the input_ids are
not append only (EOS tokens is always at the end).
- Added `iter_texts` method on `Conversation` because all
the code was literred with some form of this iteration of
past/generated_responses.
* Removing torch mention in types.
* Adding type checking to `_build_conversation_input_ids`.
* Fixing import in strings.
Adding new `encoder_no_repeat_ngram_size` to `generate`.
Blenderbot results seemed off compared to original ParlAI script:
`https://parl.ai/projects/recipes/`. Notably the model seems
to repeat a lot what was said during the conversation.
The actual problem was that `no_repeat_ngram_size` actually applies
to the `encoder_input_ids` but HF's `no_repeat_ngram_size` applies
to the previously generated ids (within the decoder). The history
conversation of blenderbot is within the `encoder` part so that
explains why HF's implementation had the repetitions.
This fix was focused on blenderbot *not* small and added tests
for those because they are quite different in configuration.
This change includes:
- Adding a new EncoderNoRepeatLogitProcessor.
- Adding 1 new arg to `generate` (`encoder_no_repeat_ngram_size`)
- Adding 1 new config parameter `encoder_no_repeat_ngram_size`.
- Adding 2 tests, one for the pipeline (high level, inputs exhibited
repeat behavior, one low level for EncoderNoRepeatLogitProcessor)
- Factored NoRepeatLogitProcessor so that logic could be reused.
Further work:
- Blenderbot conversational pipeline still does not behave correctly
as they way input is prepared within the pipeline is still incorrect
(follow up PR)
- Blenderbot allows the bot to have personas, which is done by
prepending "your personna: XXXX" to the input, this could be explored
too in a follow up PR.
@patrickvonplaten
@LysandreJik
* Update src/transformers/generation_logits_process.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/generation_utils.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/generation_utils.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/configuration_utils.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Doc quality.
* Fixing test.
* Last fixes.
* Fixing to account for batch_size.
* Update src/transformers/configuration_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/generation_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>