Yaswanth Gali
7aee036e54
Iterative generation using Input embeds and past_key_values ( #35890 )
* Iterative generation using input embeds
* ruff fix
* Added Testcase
* Updated comment
* ♻️ Refactored testcase
* Skip test for these models
* Continue generation using input embeds and cache
* Skip generate_continue_from_embeds test
* Refactor `prepare_input_for_generation` func
* Continue generation using input embeds and cache
* Modular changes fix
* Overwrite 'prepare_inputs_for_generation' function
2025-02-06 11:06:05 +01:00
Cyril Vallez
3f860dba55
Fix mask slicing for models with HybridCache ( #35681 )
* correctly slice
* check mask
* Update modular_gemma2.py
* fix
* add tests
* fix typo
* finally fix mask slicing
* Finally correctly slice in all cases!!
* add test for all attention functions
* small fix in tests
* trick around dynamo tracing issue
* last update
* more robust
* kwargs propagation
* make it explicit for checkpointing
* apply modular
2025-01-28 14:35:00 +01:00
Cyril Vallez
ab1afd56f5
Fix some tests ( #35682 )
* cohere tests
* glm tests
* cohere2 model name
* create decorator
* update
* fix cohere2 completions
* style
* style
* style
* add cuda in comments
2025-01-17 12:10:43 +00:00
Joao Gante
94af1c0aa2
[generate] return Cache object even if passed in a legacy format ( #35673 )
* generate returns a Cache object by default
* fix tests
* fix test for encoder-decoder models
2025-01-16 17:06:24 +00:00
Cyril Vallez
3a4ae6eace
Refactor/fix Cohere2 ( #35594 )
* refactor/fix cohere2
* add kwargs
* tests
* remove func and import it
2025-01-09 17:54:57 +01:00
alexrs-cohere
64478c7631
Add Cohere2 model ( #35224 )
2024-12-13 09:35:50 +01:00