Commit Graph

3959 Commits

Author SHA1 Message Date
Yoni Gozlan
5bcbdff159
Modify ProcessorTesterMixin for better generalization (#32637)
* Add padding="max_length" to tokenizer kwargs and change crop_size to size for image_processor kwargs

* remove crop_size argument in align processor tests to be coherent with base tests

* Add pad_token when loading tokenizer if needed, change test override tokenizer kwargs, remove unnecessary test overwrites in grounding dino
2024-08-13 11:48:53 -04:00
Sai-Suraj-27
c3cd9d807e
Fix: Fixed directory path for utils folder in test_tokenization_utils.py (#32601)
* Removed un-necessary expressions.

* Fixed directory path for utils folder in test_tokenization_utils.py
2024-08-13 16:48:15 +01:00
Bertrand Thia
cc25757a44
Add Depth Anything V2 Metric models (#32126)
* add checkpoint and repo names

* adapt head to support metric depth estimation

* add max_depth output scaling

* add expected logits

* improve docs

* fix docstring

* add checkpoint and repo names

* adapt head to support metric depth estimation

* add max_depth output scaling

* add expected logits

* improve docs

* fix docstring

* rename depth_estimation to depth_estimation_type

* add integration test

* Refactored tests to include metric depth model inference test
* Integration test pass when the timm backbone lines are commented (L220-L227)

* address feedback

* replace model path to use organization path

* formatting

* delete deprecated TODO

* address feedback

* [run_slow] depth_anything
2024-08-13 16:16:30 +02:00
Eric Hartford
481e15604a
Add support for GrokAdamW optimizer (#32521)
* add grokadamw

* reformat

* code review feedback, unit test

* reformat

* reformat
2024-08-13 13:20:28 +01:00
Fanli Lin
b5016d5de7
fix tensors on different devices in WhisperGenerationMixin (#32316)
* fix

* enable on xpu

* no manual remove

* move to device

* remove to

* add move to
2024-08-13 11:29:57 +01:00
Pablo Montalvo
a5a8291ad1
Fix tests (#32649)
* skip failing tests

* [no-filter]

* [no-filter]

* fix wording catch in FA2 test

* [no-filter]

* trigger normal CI without filtering
2024-08-13 09:46:21 +01:00
Lysandre Debut
29c3a0fa01
Automatically add transformers tag to the modelcard (#32623)
* Automatically add `transformers` tag to the modelcard

* Specify library_name and test
2024-08-13 07:59:01 +02:00
Raushan Turganbay
a29eabd0eb
Expand inputs in processors for VLMs (#30962)
* let it be

* draft

* should not have changed

* add warnings

* fix & add tests

* fix tests

* ipnuts embeds cannot be passed with pixels

* more updates

* paligemma ready!

* minor typos

* update blip-2

* fix tests & raise error

* docstring

* add blip2 test

* tmp

* add image seq length to config

* update docstring

* delete

* fix tests

* fix blip

* fix paligemma

* out-of-place scatter

* add llava-next-video

* Update src/transformers/models/blip_2/modeling_blip_2.py

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

* remove tmp

* codestyle

* nits

* more nits

* remove overriding in tests

* comprehension when merging video

* fix-copies

* revert changes for embeds test

* fix tests after making comprehension

* Update src/transformers/models/blip_2/processing_blip_2.py

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

* Update src/transformers/models/blip_2/processing_blip_2.py

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

* more updates

* fix tests

---------

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
2024-08-13 10:14:39 +05:00
Quentin Gallouédec
f1c8542ff7
"to be not" -> "not to be" (#32636)
* "to be not" -> "not to be"

* Update sam.md

* Update trainer.py

* Update modeling_utils.py

* Update test_modeling_utils.py

* Update test_modeling_utils.py
2024-08-12 20:20:17 +01:00
Sai-Suraj-27
ce4b28830a
fix: Fixed failing test_find_base_model_checkpoint (#32638)
Fixed failing test_find_base_model_checkpoint.
2024-08-12 19:51:30 +01:00
Raushan Turganbay
8f2b6d5e3d
Fix: FA2 with packed training (#32487)
* fix check

* add tests

* [run-slow] llama, gemma2

* oops, whisper actually runs but needed some special treatment
2024-08-12 13:40:07 +05:00
Younes Belkada
7c11491208
Add new model (#32615)
* v1 - working version

* fix

* fix

* fix

* fix

* rename to correct name

* fix title

* fixup

* rename files

* fix

* add copied from on tests

* rename to `FalconMamba` everywhere and fix bugs

* fix quantization + accelerate

* fix copies

* add `torch.compile` support

* fix tests

* fix tests and add slow tests

* copies on config

* merge the latest changes

* fix tests

* add few lines about instruct

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix

* fix tests

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-08-12 08:22:47 +02:00
Arthur
e4522fe399
fix slow integration gemma2 test (#32534)
no empty revision
2024-08-09 11:28:22 +02:00
Guang Yang
0164560353
Fixed test test_static_cache_exportability with torch 2.4.0 (#32516)
Workaround the export issue in torch 2.4

Co-authored-by: Guang Yang <guangyang@fb.com>
2024-08-08 18:13:40 +01:00
Pablo Montalvo
044281605f
Fix generate with inputs_embeds as input (#32493)
* I think inputs_embeds has ndim == 3

* fix sequence length catch

* add generate test

* [run-slow]olmo, persimmon, gemma, gemma2, qwen2, llama

* skip whisper

* fix bart test

* more fixes
2024-08-08 18:44:53 +02:00
Yunfei Chu
16ed0640be
Add Qwen2-Audio (#32137)
* add qwen2audio

* Update check_repo.py

* fix style

* fix test

* fix style

* add model size

* Qwen2AudioEncoderModel->Qwen2AudioEncoder; add copy info

* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>

* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>

* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>

* switch the attention_mask and the feature_attention_mask

* add to PRIVATE_MODELS in check_repo.py; add to MODEL_NAMES_TO_IGNORE in check_table.py

* fix initialization

* update chat_template

* fix consistency issue after copy

* add docstrings to _merge_input_ids_with_audio_features

* add copied from to prepare_inputs_for_generation

* add more details to docs

* rm comment

* add init_std

* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>

* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>

* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>

* Update src/transformers/models/qwen2_audio/modeling_qwen2_audio.py

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>

* update

* Update docs/source/en/model_doc/qwen2_audio.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* update tests

* rm ignore_index

* update processor

* rm ffmpeg_read

* Update tests/models/qwen2_audio/test_modeling_qwen2_audio.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2_audio.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2_audio.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2_audio.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* update

* typo

* [run_slow] qwen2_audio

* [run_slow] qwen2_audio

* [run_slow] qwen2_audio

* fix quality

* [run_slow] qwen2_audio

* [run_slow] qwen2_audio

* [run_slow] qwen2_audio

* add official model

---------

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-08-08 15:47:24 +02:00
Sangbum Daniel Choi
d3b3551750
Uniformize kwargs for processors - GroundingDINO (#31964)
* fix typo

* uniform kwargs

* make style

* add comments

* remove return_tensors

* remove common_kwargs from processor since it propagates

* make style

* return_token_type_ids to True

* revert the default imagekwargs since does not accept any value in the image processro

* revert processing_utils.py

* make style

* add molbap's commit

* fix typo

* fix common processor

* remain

* Revert "add molbap's commit"

This reverts commit a476c6ee88.

* add unsync PR

* revert

* make CI happy

* nit

* import annotationformat
2024-08-08 14:03:08 +01:00
Aymeric Roucher
e0d82534cc
Agents use grammar (#31735)
* Allow optional use of grammars to constrain generation
2024-08-07 11:42:52 +02:00
Raushan Turganbay
a30c865f99
Cache: new Cache format in decoder-only models (#31421)
* draft bart with new cache

* add cache for decoder-only models

* revert utils

* modify docstring

* revert bart

* minor fixes

* fix copies (not related)

* revert tests

* remove enc-dec related code

* remove bloom

* remove opt (enc-dec)

* update docstring

* git, codegen, gpt_neo, gpt_neox, gpj

* clean up

* copied from statements

* revert

* tmp

* update warning msg

* forgot git

* add more flags

* run-slow git,codegen,gpt_neo,gpt_neox,gpj

* add cache flag to VLMs

* remove files

* style

* video LLMs also need a flag

* style

* llava will go in another PR

* style

* [run-slow] codegen, falcon, git, gpt_neo, gpt_neox, gptj, idefics

* Update src/transformers/models/gpt_neo/modeling_gpt_neo.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* copy from

* deprecate until v4.45 and warn if not training

* nit

* fix test

* test static cache

* add more tests and fix models

* fix copies

* return sliding window mask

* run slow tests & fix + codestyle

* one more falcon fix for alibi

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-08-07 10:02:16 +05:00
Pablo Montalvo
80b90e7b2f
Add codestral mamba2 (#32080)
* add new model like

* draft cuda forward - mismatched keys (sharding on conv1)

* match keys successfully

* fix split

* get generation/forward running (wrong gens, norm?)

* :update

* some refactoring

* fixes

* works up until copy to cache

* fix

* update

* NON WORKING VERSION

* version that work?

* nit

* fix config

* fix conversion script

* working cuda forward

* nit

* update

* simplifcation

* make mamba slow simple work

* no einops

* todo

* fix style

* no einops

* update fix no einsum

* nit

* remove einops

* bug: scan_output differs strongly

* add rms norm option

* fix fast + slow generation with and w/o cache ✔️

* draft integration tests

* remove a big chunk of the einsum

* fix slow, fast generations, without any einsum

* fix copies

* fix structure

* fix up modeling and tests

* fix tests

* clamping is indeed worse

* recover mamba2 cache test

* fix copies

* no cache position (yet)

* fix tf tests

* fix matmul for generate

* fixup

* skip cache tests for now

* [run-slow]mamba2

* tune out hidden states for padding

* test batched generation

* propagate attention mask changes

* fix past length

* fix integration test

* style

* address comments

* update readme

* add mamba2 version check

* fix tests

* [run-slow]mamba2

* skip edge tests

* [run-slow]mamba2

* last fixup

* [run-slow]mamba2

* update README

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
2024-08-06 16:39:52 +02:00
Ao Tang
6a03942db7
Add Nemotron HF Support (#31699)
* Add nemotron support

* fix inference

* add unit test

* add layernorm1p as a class to avoid meta device mismatch

* test fixed

* Add copied_from statements

* remove pretraining_tp args

* remove nemotronlayernorm

* force LN computation done in FP32

* remove nemotrontokenizer and use llamatokenizer

* license update

* add option for kv_channels for minitron8b

* remove assert

* o_proj fixed

* o_proj reshape

* add gated_proj option

* typo

* remove todos

* fix broken test after merging latest main

* remove nezha/nat after meging main

* chnage default config to 15b model

* add nemo conversion script

* rename conversion script

* remove gate_proj option

* pr comment resolved

* fix unit test

* rename kv_channels to head_dim

* resolve PR issue

* add nemotron md

* fix broken tests

* refactor rope for nemotron

* test fix

* remove linearscaling

* whitespace and import

* fix some copied-from

* code style fix

* reformatted

* add position_embedding to nemotronattention

* rope refactor to only use config, copied-from fix

* format

* Run make fix-copies

* nemotron md with autodoc

* doc  fix

* fix order

* pass check_config_docstrings.py

* fix config_attributes

* remove all llama BC related code

* Use PreTrainedTokenizerFast

* ruff check examples

* conversion script update

* add nemotron to toctree
2024-08-06 15:42:05 +02:00
Francisco Kurucz
438d06c95a
Fix get large model config for Switch Transformer encoder only tester (#32438) 2024-08-06 11:48:32 +01:00
Pavel Iakubovskii
fb66ef8147
Update kwargs validation for preprocess with decorator (#32024)
* BLIP preprocess

* BIT preprocess

* BRIDGETOWER preprocess

* CHAMELEON preprocess

* CHINESE_CLIP preprocess

* CONVNEXT preprocess

* DEIT preprocess

* DONUT preprocess

* DPT preprocess

* FLAVA preprocess

* EFFICIENTNET preprocess

* FUYU preprocess

* GLPN preprocess

* IMAGEGPT preprocess

* INTRUCTBLIPVIDEO preprocess

* VIVIT preprocess

* ZOEDEPTH preprocess

* VITMATTE preprocess

* VIT preprocess

* VILT preprocess

* VIDEOMAE preprocess

* VIDEOLLAVA

* TVP processing

* TVP fixup

* SWIN2SR preprocess

* SIGLIP preprocess

* SAM preprocess

* RT-DETR preprocess

* PVT preprocess

* POOLFORMER preprocess

* PERCEIVER preprocess

* OWLVIT preprocess

* OWLV2 preprocess

* NOUGAT preprocess

* MOBILEVIT preprocess

* MOBILENETV2 preprocess

* MOBILENETV1 preprocess

* LEVIT preprocess

* LAYOUTLMV2 preprocess

* LAYOUTLMV3 preprocess

* Add test

* Update tests
2024-08-06 11:33:05 +01:00
Fanli Lin
e85d86398a
add the missing flash attention test marker (#32419)
* add flash attention check

* fix

* fix

* add the missing marker

* bug fix

* add one more

* remove order

* add one more
2024-08-06 11:18:58 +01:00
amyeroberts
7e5d46ded4
Respect the config's attn_implementation if set (#32383)
* Respect the config's attn if set

* Update test - can override in from_config

* Fix
2024-08-05 16:33:19 +01:00
Sai-Suraj-27
458b0cd2c5
fix: Updated test_embeded_special_tokens for luke and mluke models (#32413)
Fixed tokenizertests for luke, mluke models.
2024-08-05 15:19:42 +01:00
Abdi
baf7e5c927
Persist embedding type of BART and mBART models after resize (#32242)
* fix: persist embedding type of MBartConditonalGeneration after resize

* fix: persist embedding type of BartConditonalGeneration after resize
2024-08-05 14:15:36 +01:00
Ita Zaporozhets
3d7c2f9dea
#32184 save total_vocab_size (#32240)
* save total_vocab_size = vocab_size + user added tokens to speed up operation

* updating length when added_tokens_decoder is set

* add test len(tokenizer)
2024-08-05 09:22:48 +02:00
Raushan Turganbay
3bb646a54f
Phi3 tests: fix typing for Python 3.8 (#32388)
fix phi
2024-08-05 11:58:42 +05:00
TechInterMezzo
05ae3a300d
fix: SeamlessM4TFeatureExtractor stride remainder (#32088)
* fix: SeamlessM4TFeatureExtractor stride remainder

* Added attention mask size test

* Reran ruff for style correction
2024-08-05 08:40:58 +02:00
Joao Gante
083e13b7c4
RoPE: Add numerical tests (#32380)
tests! :D
2024-08-02 09:39:45 +01:00
Zach Mueller
82efc53513
Yell at the user if zero-3 init wasn't performed, but expected to have been done (#32299)
* Test this zach

* Test for improper init w/o zero3

* Move back

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Get rid of stars in warning

* Make private

* Make clear

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-08-01 15:18:43 -04:00
OsamaS99
51ab25e293
Fixed Hybrid Cache Shape Initialization. (#32163)
* fixed hybrid cache init, added test

* Fix Test Typo

---------

Co-authored-by: Aaron Haag <aaron.haag@siemens.com>
2024-08-01 13:57:42 +01:00
Nikos Karampatziakis
ca59d6f77c
Offloaded KV Cache (#31325)
* Initial implementation of OffloadedCache

* enable usage via cache_implementation

* Address feedback, add tests, remove legacy methods.

* Remove flash-attn, discover synchronization bugs, fix bugs

* Prevent usage in CPU only mode

* Add a section about offloaded KV cache to the docs

* Fix typos in docs

* Clarifications and better explanation of streams
2024-08-01 14:42:07 +02:00
Omar Salman
b4727a1216
Fix conflicting key in init kwargs in PreTrainedTokenizerBase (#31233)
* Fix conflicting key in init kwargs in PreTrainedTokenizerBase

* Update code to check for callable key in save_pretrained

* Apply PR suggestions

* Invoke CI

* Updates based on PR suggestion
2024-08-01 14:32:13 +02:00
Ita Zaporozhets
2229ebe722
update clean_up_tokenization_spaces warning (#32371) 2024-08-01 13:57:41 +02:00
Lunwen He
48ed24c50a
Remove size check between attn_weights and kv_seq_len for phi3 (#32339)
* Remove size check between attn_weights and kv_seq_len

* add unit tests
2024-08-01 13:49:00 +02:00
Sanchit Gandhi
e234061cdd
[whisper] compile compatibility with long-form decoding (#31772)
* [whisper] compile compatibility with long-form decoding

* clarify comment

* fix after rebase

* finalise

* fix bsz

* fix cache split

* remove contiguous

* style

* finish

* update doc

* prevent cuda graph trace
2024-08-01 18:10:56 +08:00
fxmarty
92abe60334
>3-5x faster torch.compile forward compilation for autoregressive decoder models (#32227)
* draft

* apply changes to all relevant archs

* rerun ci - check_docstrings.py failing?

* fix docstring

* move 2D->4D mask creation to modeling file

* repo consistency

* fix the batch size = 1 case - calling contiguous is not enough

* nit

* style

* propagate to gemma/gemma-2

* prepare inputs for gemma generation

* implement test and tiny fix in gemma2

* Update src/transformers/models/bloom/modeling_bloom.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix copies

* ci pass

* fix gemma's test_compile_static_cache tests

* flacky

* retrigger ci

---------

Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-08-01 02:03:07 +08:00
amyeroberts
5f1fcc299c
[Idefics2] - Fix FA2 call for Perceiver layer (#32275)
* Fix FA2 call for Perciever layer

* [run_slow] idefics2

* [run_slow] idefics2

* [run_slow] idefics2

* Fix up

* [run_slow] idefics2

* [run_slow] idefics2

* [run_slow] idefics2
2024-07-31 14:51:04 +01:00
Joao Gante
b75ad56620
Llama 3.1: Fix incorrect inv_freq assignment (#32330)
fix 💩
2024-07-31 11:12:46 +01:00
Raushan Turganbay
7f552e28e0
Gemma2 and flash-attention (#32188)
* enable flash-attn & static cache

* this works, not the prev

* fix for sliding window layers

* not needed anymore
2024-07-31 10:33:38 +05:00
Joshua Lochner
6e2d04e429
Fix slow GemmaTokenizer and improve SPM slow -> fast conversion process (#32191)
* Remove user-defined tokens which can be obtained through merges

* Remove debug line

* formatting

* Refactor spm slow -> fast converter

* revert unnecessary refactor

* set comprehension

* remove test files

* Use `vocab_scores`

* Always replace spiece underline with space in decode

* we no longer need token filtering

* Add save fast load slow unit test

* Remove tokenizers version check

* Remove duplicate code

* Make `<start_of_turn>` and `<end_of_turn>` special tokens

* Bias merge priority with length if score is the same

* Add unit test for merge priority

* CI
2024-07-30 23:36:38 +02:00
Guang Yang
811a9caa21
Make static cache compatible with torch.export (#32168) 2024-07-29 18:19:15 +01:00
Sanchit Gandhi
7f5d644e69
[pipeline] fix padding for 1-d tensors (#31776)
* [pipeline] fix padding for 1-d tensors

* add test

* make style

* Update tests/pipelines/test_pipelines_automatic_speech_recognition.py

Co-authored-by: Kamil Akesbi <45195979+kamilakesbi@users.noreply.github.com>

* Update tests/pipelines/test_pipelines_automatic_speech_recognition.py

---------

Co-authored-by: Kamil Akesbi <45195979+kamilakesbi@users.noreply.github.com>
2024-07-29 21:24:42 +08:00
Kamil Akesbi
3fbaaaa64d
Whisper tokenizer word level timestamps (#32197)
* fix _fix_key in PreTrainedModel

* fix _find_longest_common_sequence

* add test

* remove result.json

* nit

* update test
2024-07-29 11:19:52 +01:00
Joao Gante
7ffe25f2b9
Generate: end-to-end compilation (#30788)
* mvp

* added test (a few models need fixes)

* fix a few test cases

* test nits

* harder test 😈

* revert changes in stablelm

* test with improved condition

* add todo

* tmp commit

* merged with main

* nits

* add todo

* final corrections

* add docs for generation compilation

* docs nits

* add  tip

* PR suggestions

* add more details to the compilation docs

* fix cache positions

* cache is now init in generate; update docs

* tag test as flaky

* docs

* post rebase make fixup and other nits

* remove unintended changes

* whisper (encoder-decoder) not supported

* move token default updates to ; add tests for token defaults

* push changes

* manual rebase

* chameleon doesn't support this

* fix test_static_cache_mha_mqa_gqa (broken in another PR)

* docs: dynamic is better with end-to-end compilation
2024-07-29 10:52:13 +01:00
Raushan Turganbay
f739687684
🚨 Bloom support for cache class (#31445)
* bloom dynamic cache

* bloom follows standard cache format

* no skips for bloom anymore

* use cache position when possible

* clean up

* codestyle

* Update src/transformers/models/bloom/modeling_bloom.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/bloom/modeling_bloom.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/bloom/modeling_bloom.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* pr comments

* isinstance fix

* address comments

* make musicgen test happy

* [run-slow] bloom

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-29 10:58:59 +05:00
Raushan Turganbay
81233c069c
Flash-Attn: fix generation when no attention mask or no pading (#32241)
* fix

* fix prev test (half of failures)

* [run-slow] llama, gemma2

* [run-slow] llama, gemma2
2024-07-26 14:45:55 +05:00
Fanli Lin
27c7f971c0
[tests] fix static cache implementation is not compatible with attn_implementation==flash_attention_2 (#32039)
* add flash attention check

* fix

* fix
2024-07-26 11:41:27 +02:00