Commit Graph

175 Commits

Author SHA1 Message Date
Yung-Sung Chuang
d094d8d9ec
Generate: Add new decoding strategy "DoLa" in .generate() (#29619)
Co-authored-by: Joao Gante <joao@huggingface.co>
2024-07-09 17:37:38 +01:00
jiqing-feng
7f91f168a1
fix assisted decoding (#31401)
* fix assisted decoding

* check None

* fix typo

* fix _prepare_special_tokens

* fix style

* fix lint

* add tests for assisted decoding

* fix style

* fix tests check
2024-07-03 09:22:56 +01:00
Joao Gante
82486e5995
🚨🚨 TextGenerationPipeline: rely on the tokenizer default kwargs (#31747)
* rely on the tokenizer default kwargs

* fix a few tests
2024-07-02 16:17:42 +02:00
Sanchit Gandhi
a9701953ff
[whisper] static kv cache (#31166)
* make work with cache abstraction

* correct for static cache

* hacks for compile

* make fast

* fix

* fix pos ids

* generate

* fix sdpa

* fix sdpa cache pos

* fix fa2

* clean fa2

* integrate cache into generate

* make style

* copies

* more copies

* update eager

* update sdpa

* update fa2

* simplify

* use cache pos

* always compute cross-cache for debug

* avoid recompiles
Co-authored-by: Arthur Zucker <arthur@huggingface.co>

* fix fix

* fix fix fix

* more fix

* try encoder-decoder cache (too messy)

* revert encoder-decoder cache

* check cross-attn cache

* use enc-dec dataclass

* use richer enc-dec dataclass

* clean-up

* revert static cache changes

* small fixes

* revert to cpu flag

* fix copies

* add static slow test

* past k/v docstring

* more docstrings

* cache_position docstrings

* add to docs

* add enc-dec cache to docs

* make style

* fix after rebase

* fix beam

* style

* fix generation strategies

* fix most decoder-only tests

* style

* skip test

* more clean up

* small docstrings

* Apply suggestions from code review

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* add todo

* only crop self-attn

* check cache in mixin

* style

* fix re-compile after rebase

* move `is_updated` logic to enc-dec wrapper

* revert back

* revert cache back

* finalise design

* fix

* fix fix

* style

* Update src/transformers/cache_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* deprecate

* updates

* final updates

* style

* style

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-07-02 13:24:15 +01:00
amyeroberts
1de7dc7403
Skip tests properly (#31308)
* Skip tests properly

* [test_all]

* Add 'reason' as kwarg for skipTest

* [test_all] Fix up

* [test_all]
2024-06-26 21:59:08 +01:00
Joao Gante
1fd60fec75
RWKV: enable generation tests (#31490)
* add rwkv tests

* has_attentions set in individual tests
2024-06-20 14:15:01 +01:00
Joao Gante
83259e406d
Mamba: add generative tests (#31478) 2024-06-19 10:27:23 +01:00
Matt
28316d0e8b
Fix single letter stop strings (#31448)
* Fix single letter stop strings

* Change the 0 to a 1 to avoid potential empty vector headaches later

* Restructure for clarity

* Update tests/generation/test_stopping_criteria.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Add the unsqueeze

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-06-18 14:07:16 +01:00
Raushan Turganbay
5fabd1e83b
Generation: fix handling of special tokens (#31254)
* fix special tokens in generatioon

* fix test

* add warning

* fix the check

* warn once

* fix
2024-06-06 15:21:32 +05:00
Raushan Turganbay
83238eeebc
Pass device in Logits Processor's init (#29804)
* add device in logits processor

* remove device when not needed

* codestyle

* tests

* forgot `melody` version

* Update src/transformers/models/whisper/generation_whisper.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* codestyle

* updates

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2024-06-04 10:19:19 +05:00
Ahmed Moubtahij
39b2ff69d6
Token healing (#30081)
* token healing impl + trie with extensions

* make fixup

* prefix-robust space tokenization

* examples readme and requirements

* make fixup

* allow input prompt and model

* redundant defaults

* Specialized Trie

* make fixup

* updated tests with new inherited Tree

* input ids to auto device_map

* rm unused import

* Update src/transformers/generation/utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* naming convention

* Revert "naming convention"

This reverts commit dd39d9c5b7a969e2d8a8d2a8e54f121b82dc44f0.

* naming convention

* last -hopefully- changes

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-06-03 10:53:15 +02:00
Raushan Turganbay
779bc360ff
Watermark: fix tests (#30961)
* fix tests

* style

* Update tests/generation/test_utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-05-28 17:07:42 +05:00
Raushan Turganbay
d583f1317b
Quantized KV Cache (#30483)
* clean-up

* Update src/transformers/cache_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/cache_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/cache_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fixup

* Update tests/quantization/quanto_integration/test_quanto.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update src/transformers/generation/configuration_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* more suggestions

* mapping if torch available

* run tests & add 'support_quantized' flag

* fix jamba test

* revert, will be fixed by another PR

* codestyle

* HQQ and versatile cache classes

* final update

* typo

* make tests happy

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2024-05-23 17:25:20 +05:00
Kamil Akesbi
eb1a77bbb0
Using assistant in AutomaticSpeechRecognitionPipeline with different encoder size (#30637)
* fiw input to generate in pipeline

* fixup

* pass input_features to generate with assistant

* error if model and assistant with different enc size

* fix

* apply review suggestions

* use self.config.is_encoder_decoder

* pass inputs to generate directly

* add slow tests

* Update src/transformers/generation/utils.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update tests/pipelines/test_pipelines_automatic_speech_recognition.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update tests/pipelines/test_pipelines_automatic_speech_recognition.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update tests/pipelines/test_pipelines_automatic_speech_recognition.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update tests/pipelines/test_pipelines_automatic_speech_recognition.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update tests/pipelines/test_pipelines_automatic_speech_recognition.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* apply review

* Update src/transformers/generation/utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/pipelines/test_pipelines_automatic_speech_recognition.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* apply code review

* update attributes encoder_xyz to check

* Update src/transformers/generation/utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* add slow test

* solve conflicts

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2024-05-23 09:59:38 +01:00
Raushan Turganbay
b1065aa08a
Generation: get special tokens from model config (#30899)
* fix

* let's do this way?

* codestyle

* update

* add tests
2024-05-22 18:15:41 +02:00
Raushan Turganbay
5ad960f1f4
Add Watermarking LogitsProcessor and WatermarkDetector (#29676)
* add watermarking processor

* remove the other hashing (context width=1 always)

* make style

* Update src/transformers/generation/logits_process.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/logits_process.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/logits_process.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/configuration_utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* update watermarking process

* add detector

* update tests to use detector

* fix failing tests

* rename `input_seq`

* make style

* doc for processor

* minor fixes

* docs

* make quality

* Update src/transformers/generation/configuration_utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/logits_process.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/watermarking.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/watermarking.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/watermarking.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* add PR suggestions

* let's use lru_cache's default max size (128)

* import processor if torch available

* maybe like this

* lets move the config to torch independet file

* add docs

* tiny docs fix to make the test happy

* Update src/transformers/generation/configuration_utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/watermarking.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* PR suggestions

* add docs

* fix test

* fix docs

* address pr comments

* style

* Revert "style"

This reverts commit 7f33cc34ff.

* correct style

* make doctest green

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2024-05-14 13:31:39 +05:00
Joao Gante
7130a22db9
Generate: consistently handle special tokens as tensors (#30624)
* tmp commit

* [test_all] mvp

* missing not

* [test_all] final test fixes

* fix musicgen_melody and rag

* [test_all] empty commit

* PR comments

* Update src/transformers/generation/utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-05-09 18:01:57 +01:00
Joao Gante
df53c6e5d9
Generate: add min_p sampling (#30639)
* min_p

* more relaxed test to avoid numerical issues

* Update src/transformers/generation/logits_process.py

Co-authored-by: menhguin <minh1228@gmail.com>

* Update src/transformers/generation/configuration_utils.py

Co-authored-by: menhguin <minh1228@gmail.com>

* docstring clarifications

* PR comments

* Update tests/generation/test_logits_process.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* make fixup

---------

Co-authored-by: menhguin <minh1228@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-05-09 14:36:53 +01:00
Raushan Turganbay
77b59dce9f
Fix on "cache position" for assisted generation (#30068)
* clean commit history I hope

* get kv seq length correctly

* PR suggestions

* Update src/transformers/testing_utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* add comment

* give gpt bigcode it's own overriden method

* remove code

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2024-04-23 16:23:36 +05:00
Matt
0d84901cb7
Terminator strings for generate() (#28932)
* stash commit (will discard all of this)

* stash commit

* First commit - needs a lot of testing!

* Add a test

* Fix imports and make the tests actually test something

* Tests pass!

* Rearrange test

* Add comments (but it's still a bit confusing)

* Stop storing the tokenizer

* Comment fixup

* Fix for input_ids with a single sequence

* Update tests to test single sequences

* make fixup

* Fix incorrect use of isin()

* Expand tests to catch more cases

* Expand tests to catch more cases

* make fixup

* Fix length calculation and update tests

* Handle Ġ as a space replacement too

* Update src/transformers/generation/stopping_criteria.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Add optimizations from Joao's suggestion

* Remove TODO

* Update src/transformers/generation/stopping_criteria.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update tests/generation/test_stopping_criteria.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* make fixup

* Rename some variables and remove some debugging clauses for clarity

* Add tests for the sub-methods

* Clarify one test slightly

* Add stop_strings to GenerationConfig

* generate() supports stop_string arg, asks for tokenizer if not provided

* make fixup

* Cleanup code and rename variables for clarity

* Update tokenizer error

* Update tokenizer passing, handle generation on GPU

* Slightly more explanation cleanup

* More comment cleanup

* Factor out the token cleanup so it's more obvious what we're doing, and we can change it later

* Careful with that cleanup!

* Cleanup + optimizations to _get_matching_positions

* More minor performance tweaks

* Implement caching and eliminate some expensive ops (startup time: 200ms -> 9ms)

* Remove the pin_memory call

* Parallelize across all stop strings!

* Quick fix for tensor devices

* Update embeddings test for the new format

* Fix test imports

* Manual patching for BERT-like tokenizers

* Return a bool vector instead of a single True/False

* Better comment

* Better comment

* Add tests from @zucchini-nlp

* Amy's list creation nit

* tok_list -> token_list

* Push a big expanded docstring (should we put it somewhere else?)

* Expand docstrings

* Docstring fixups

* Rebase

* make fixup

* Make a properly general method for figuring out token strings

* Fix naming throughout the functions

* Move cache, refactor, fix tests

* Add comment

* Remove finished TODO

* Remove finished TODO

* make fixup

* Update src/transformers/generation/stopping_criteria.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update and shorten docstring

* Update tests to be shorter/clearer and test specific cases

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-04-22 14:13:04 +01:00
Raushan Turganbay
b1cd48740e
Do not remove half seq length in generation tests (#30016)
* remove seq length from generation tests

* style and quality

* [test_all] & PR suggestion

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update tests/generation/test_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* [test all] remove unused variables

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-04-19 17:32:52 +01:00
tomeras91
3f20877da9
Add jamba (#29943)
* Add jamba arch

* apply "make fix-copies" changes

* fix link to model in JambaConfig docstring

* Add n_ctx in modeling file because repo-consistency wants that

* Add jamba to flash attention and sdpa documentation

* mamba dt_proj quant fix now works for LoRA as well

* override test_left_padding_compatibility and use a more permissive tolerance. left padding numerical difference are accentuated by mamba layers

* add jamba to tokenization auto

* fix comments of shape (PR #24 in the model page: https://huggingface.co/ai21labs/Jamba-v0.1/discussions/24)

* simple PR fixes

* remove unnecessary kwargs from JambaAttentionDecoderLayer and JambaMambaDecoderLayer

* remove the LoRA hack for the mamba dt_proj bias. It was solved in huggingface/peft#1530 (https://github.com/huggingface/peft/pull/1530)

* Add copied comment on JambaMLP (it's the same as MixtralMLP)

* remove padding_mask warnings. It's not supported anymore

* fix docstring. Float instead of int

* A few more minor PR fixes

* (1) lowercase names for mamba layernorms (2) remove _apply_inner_layernorms and do it directly in the forward pass

* Return None attention weights from mamba layers. Append to all attentions only if not None.

* remove some leftover jamba archive lists

* Better separation between expert vs non-expert layers. non-expert layers return None as router_logits, and it is not concatenated to all_router_logits returned from JambaModel

* no need to take router_logits at config.expert_layer_offset anymore. result.router_logits now holds results only for expert layers

* Add Jamba paper on READMEs

* (1) rename n_ctx -> max_position_embeddings (2) don't use it in the modeling file since it's not needed (set it as an exception to check_config_attributes)

* Add copied from comment

* remove the code path for apply_inner_layernorms=False. Jamba always has the inner mamba layernorms

* clearer docstring for _convert_to_standard_cache

* style fixes

* Change calc_logits_for_entire_prompt (bool) to num_logits_to_keep (int). Adapt assisted decoding code tp use it. Also small change in low memory beam search decoding path to support this new int value in model_inputs

* rename test so it still overrides what its meant to override

* draft

* oups

* nit

* remove more complexe logic

* fix names used in config

* fix fix fix

* style

* fix some more failing tests

* generate did not init the cache 🙃

* more small nits

* typo

* config.mamba_expand * config.hidden_size for the intermediate size of the mamba shapes

* fix init of pkv with torch.tensor()

* empty tensor

* fix some init issues

* stupid changes required by generate because it does not even support it's own DynamicCache class

* more fixes

* fix general assisted gen cache_position bug

* tests passing

* Add offsets and periods as SPECIAL_CASES_TO_ALLOW in check_config_attributes.py

* fix reorder_cache to reorder mamba states and override some more functions in HybridMambaAttentionDynamicCache

* no need to override test_past_key_values_format() and _check_past_key_values_for_generate() in tests anymore

* fix docstrings and typehints for past_key_values

* style fixes

* fix docs

* change typehint due to copy from Mixtral

* forgot import

* import order

* Add configuration_jamba and modeling_jamba to not_doctested because the model is too big to download (in docstring of JambaForCausalLM.forward)

* Add integration test with tiny tandom Jamba model on hub

* fix flash attention cache shapes

* bring back forgotten hidden states

* rename HybridMambaAttentionDynamicCache.seqlen_offset to has_previous_state (and make bool) and bugfix - it should be set to True after a finished forward pass of the entire model

* align integration test after modeling fixes

* bugfix - mamba can use precomputed states only of forward pass is on a single token

* bugfix - mamba can use precomputed states only if they match the batch size

* typo

* remove making _prepare_4d_causal_attention_mask a leaf function

* stop using past_seq_len.get_seq_length(). Use cache positions instead. Adjust test (test_decoder_model_past_with_large_inputs) accordingly

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
2024-04-18 11:04:02 +02:00
Raushan Turganbay
41579763ee
Fix length related warnings in speculative decoding (#29585)
* avoid generation length warning

* add tests

* Update src/transformers/generation/candidate_generator.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* add tests and minor fixes

* refine `min_new_tokens`

* Update src/transformers/generation/candidate_generator.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* add method to prepare length arguments

* add test for min length

* Update src/transformers/generation/candidate_generator.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* fix variable naming

* empty commit for tests

* trigger tests (empty)

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2024-04-10 12:45:07 +05:00
Matt
ec59a42192
Revert workaround for TF safetensors loading (#30128)
* See if we can get tests to pass with the fixed weights

* See if we can get tests to pass with the fixed weights

* Replace the revisions now that we don't need them anymore
2024-04-09 11:04:18 +01:00
amyeroberts
7f9aff910b
Patch fix - don't use safetensors for TF models (#30118)
* Patch fix - don't use safetensors for TF models

* Skip test for TF for now

* Update for another test
2024-04-08 13:29:20 +01:00
théo gigant
fed27ffc7e
Adding FlaxNoRepeatNGramLogitsProcessor (#29677)
* fix issue with logit processor in beam search in Flax

* adding FlaxNoRepeatNGramLogitsProcessor class + unit test

* style correction and code verification

* add FlaxNoRepeatNGramLogitsProcessor to the test_processor_list and test_processor_list_jitted tests

* fix an issue where ngrams are banned only if they appear ==1 time + update description of get_previous_ngrams

* replace non-jit compatible masking of ngrams that are not yet generated with jittable version

* Revert "fix issue with logit processor in beam search in Flax"

This reverts commit 09b70d7e4d.

* add FlaxNoRepeatNGramLogitsProcessor to _get_logits_processor

* change the method of casting to boolean of banned tokens indices

* fix code style

* remove some useless operations + significantly faster computation of update indices using jax.lax.fori_loop

* remove useless loop iterations

* set some variables that were calculated and used multiple times

* fix format
2024-04-02 11:39:33 +02:00
Arthur
83b26dd79d
[generate] fix breaking change for patch (#29976)
* fix bug and add tests

* nit

* otherway to get the cur len instead of attention mask

* more places where this might have been broken

* nit

* oups

* inputs_embeds vs input_embeds

* test generated outptus

* style

* nit

* fix

* skip failing biogpt
2024-04-02 09:51:45 +02:00
Joao Gante
c9f6e5e351
Generate: move misplaced test (#29902) 2024-04-01 12:45:25 +01:00
Raushan Turganbay
0efcf32351
Move eos_token_id to stopping criteria (#29459)
* add eos stopping criteria

* minor fix

* Update tests/generation/test_stopping_criteria.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* check eos is not None and fix tests

* make style and fixup

* Update src/transformers/generation/stopping_criteria.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update tests/generation/test_utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update tests/generation/test_utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/__init__.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/generation/stopping_criteria.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/generation/stopping_criteria.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/generation/stopping_criteria.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* camel case everywhere

* call stopping criteria list for candidate ids

* make style  and fixup

* Empty commit

* Empty commit to pass flaky test

* set max length in PromptLookupCandidateGenerator

* Update src/transformers/generation/utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* lets fix this typo in docs

* Update src/transformers/generation/utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/generation/utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* update PR

* empty commit

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-03-27 12:18:10 +00:00
Zhihao Lin
998b5bb56f
Allow bos_token_id is None during the generation with inputs_embeds (#29772)
* update

* add ut

* update
2024-03-26 12:51:00 +00:00
Raushan Turganbay
fadb053379
Change in-place operations to out-of-place in LogitsProcessors (#29680)
* change in-place -> out-of-place

* add tests

* add more tests

* naming consistency

* fix doctest

* forgot min-length processors

* empty

* Revert "fix doctest"

This reverts commit 4772768457.

* revert change in docstring

* Update tests/generation/test_logits_process.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/generation/test_logits_process.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-03-21 16:37:33 +00:00
Raushan Turganbay
425ba56cdf
Clean-up generation tests after moving methods to private (#29582)
* clean-up tests

* refine comments

* fix musicgen tests

* make style

* remove slow decorator from a test

* more clean-up

* fix other failing tests
2024-03-19 17:03:31 +00:00
Fanli Lin
1ea3ad1aec
[tests] use torch_device instead of auto for model testing (#29531)
* use torch_device

* skip for XPU

* Update tests/generation/test_utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-03-08 11:21:43 +00:00
Joao Gante
bc764f4263
Generate: left-padding test, revisited (#29515)
* left-padding test revisited

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-03-08 10:06:46 +00:00
Joao Gante
ffe60fdcd6
v4.39 deprecations 🧼 (#29492) 2024-03-07 10:44:43 +00:00
Joao Gante
700d48fb2d
Generate: get generation mode from the generation config instance 🧼 (#29441) 2024-03-06 11:18:35 +00:00
FredericOdermatt
871ba71dfa
GenerationConfig validate both constraints and force_words_ids (#29163)
GenerationConfig validate both options for constrained decoding: constraints and force_words_ids
2024-02-27 01:43:52 +01:00
Raushan Turganbay
8f2f0f0f85
Track each row separately for stopping criteria (#29116) 2024-02-26 16:06:16 +00:00
Joao Gante
a7755d2409
Generate: unset GenerationConfig parameters do not raise warning (#29119) 2024-02-20 11:34:31 +00:00
Max Baak
08cd694ef0
ENH: added new output_logits option to generate function (#28667)
output_logits option behaves like output_scores, but returns the raw, unprocessed prediction logit scores,
ie. the values before they undergo logit processing and/or warping. The latter happens by default for the
regular output scores.

It's useful to have the unprocessed logit scores in certain circumstances. For example, unprocessed logit scores
are very useful with causallm models when one wants to determine the probability of a certain answer, e.g.
when asking a question with a yes/no answer. In that case getting the next-token probabilities of both "yes" and
"no" (and/or their relative ratio) is of interest for classification. The reason for getting these _before_ logit
processing and/or warping is b/c a) that can change the probabilities or b) reject the tokens of interest / reduce
the number of tokens to just 1.

For an example use-case see paper TabLLM: Few-shot Classification of Tabular Data with Large Language Models
by Stefan Hegselmann, Alejandro Buendia, Hunter Lang, Monica Agrawal, Xiaoyi Jiang, and David Sontag.
https://arxiv.org/abs/2210.10723

In addition:
- added dedicated unit test: tests/generation/test_utils/test_return_unprocessed_logit_scores
  which tests return of logics with output_logits=True in generation.
- set output_logits=True in all other generation unit tests, that also have output_scores=True.

Implemented @gante's and @amyeroberts review feedback

Co-authored-by: kx79wq <max.baak@ing.com>
2024-02-19 17:34:17 +00:00
Jonathan Mamou
258da40efd
fix num_assistant_tokens with heuristic schedule (#28759)
* fix heuristic num_assistant_tokens_schedule

* Update src/transformers/generation/configuration_utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/candidate_generator.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update utils.py

check that candidate_generator.assistant_model exists since some some speculations (like ngram and PLD) don't have assistant_model attribute

* Update src/transformers/generation/candidate_generator.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update tests/generation/test_utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* make fixup

* merge conflict

* fix docstring

* make fixup

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-02-16 11:44:58 +00:00
Raushan Turganbay
aee11fe427
Fix max_length criteria when using inputs_embeds (#28994)
* fix max_length for inputs_embeds

* make style

* Update src/transformers/generation/utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Static Cache: load models with MQA or GQA (#28975)

* fix

* fix tests

* fix tests

* Update src/transformers/generation/utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* more fixes

* make style

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-02-16 11:25:12 +00:00
Lysandre Debut
f497f564bb
Update all references to canonical models (#29001)
* Script & Manual edition

* Update
2024-02-16 08:16:58 +01:00
Raushan Turganbay
d628664688
Support batched input for decoder start ids (#28887)
* support batched input for decoder start ids

* Fix typos

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* minor changes

* fix: decoder_start_id as list

* empty commit

* empty commit

* empty commit

* empty commit

* empty commit

* empty commit

* empty commit

* empty commit

* empty commit

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2024-02-08 16:00:53 +00:00
Matt
415e9a0980
Add tf_keras imports to prepare for Keras 3 (#28588)
* Port core files + ESM (because ESM code is odd)

* Search-replace in modelling code

* Fix up transfo_xl as well

* Fix other core files + tests (still need to add correct import to tests)

* Fix cookiecutter

* make fixup, fix imports in some more core files

* Auto-add imports to tests

* Cleanup, add imports to sagemaker tests

* Use correct exception for importing tf_keras

* Fixes in modeling_tf_utils

* make fixup

* Correct version parsing code

* Ensure the pipeline tests correctly revert to float32 after each test

* Ensure the pipeline tests correctly revert to float32 after each test

* More tf.keras -> keras

* Add dtype cast

* Better imports of tf_keras

* Add a cast for tf.assign, just in case

* Fix callback imports
2024-01-30 17:26:36 +00:00
amyeroberts
9e8f35fa28
Mark test_constrained_beam_search_generate as flaky (#28757)
* Make test_constrained_beam_search_generate as flaky

* Update tests/generation/test_utils.py
2024-01-29 15:22:25 +00:00
Ofir Zafrir
9efec11400
Fix _speculative_sampling implementation (#28508) 2024-01-19 14:07:31 +00:00
Saibo-creator
d4fc1eb498
feat: Sequential beam search (#26304) 2024-01-19 11:36:54 +00:00
Joao Gante
f4f57f9dfa
Config: warning when saving generation kwargs in the model config (#28514) 2024-01-16 18:31:01 +00:00
Joao Gante
7e0ddf89f4
Generate: consolidate output classes (#28494) 2024-01-15 17:04:08 +00:00