Commit Graph

30 Commits

Author SHA1 Message Date
Matt
53fb245eb6
🚨 🚨 Inherited CausalLM Tests (#37590)
* stash commit

* Experiment 1: Try just Gemma

* Experiment 1: Just try Gemma

* make fixup

* Trigger tests

* stash commit

* Try adding Gemma3 as well

* make fixup

* Correct attrib names

* Correct pipeline model mapping

* Add in all_model_classes for Gemma1 again

* Move the pipeline model mapping around again

* make fixup

* Revert Gemma3 changes since it's a VLM

* Let's try Falcon

* Correct attributes

* Correct attributes

* Let's try just overriding get_config() for now

* Do Nemotron too

* And Llama!

* Do llama/persimmon

* Correctly skip tests

* Fix Persimmon

* Include Phimoe

* Fix Gemma2

* Set model_tester_class correctly

* Add GLM

* More models!

* models models models

* make fixup

* Add Qwen3 + Qwen3MoE

* Correct import

* make fixup

* Add the QuestionAnswering classes

* Add the QuestionAnswering classes

* Move pipeline mapping to the right place

* Jetmoe too

* Stop RoPE testing models with no RoPE

* Fix up JetMOE a bit

* Fix up JetMOE a bit

* Can we just force pad_token_id all the time?

* make fixup

* fix starcoder2

* Move pipeline mapping

* Fix RoPE skipping

* Fix RecurrentGemma tests

* Fix Falcon tests

* Add MoE attributes

* Fix values for RoPE testing

* Make sure we set bos_token_id and eos_token_id in an appropriate range

* make fixup

* Fix GLM4

* Add mamba attributes

* Revert bits of JetMOE

* Re-add the JetMOE skips

* Update tests/causal_lm_tester.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Add licence

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-05-23 18:29:31 +01:00
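Note: the commit above consolidates per-model causal-LM tests into a shared tester that model test files inherit. A minimal sketch of that inheritance pattern follows; the class and attribute names (CausalLMModelTester, model_tester_class, ...) are assumptions about tests/causal_lm_tester.py rather than quotations from it.

```python
# Illustrative sketch only; the real helper in tests/causal_lm_tester.py may differ.
import unittest

from transformers import GemmaConfig, GemmaForCausalLM


class CausalLMModelTester:
    """Builds one tiny config shared by every decoder-only model test."""

    config_class = None  # each model's tester points this at its config class

    def get_config(self):
        # Keep special token ids inside the tiny vocabulary range.
        return self.config_class(
            vocab_size=99,
            hidden_size=32,
            intermediate_size=37,
            num_hidden_layers=2,
            num_attention_heads=4,
            num_key_value_heads=2,
            pad_token_id=0,
            bos_token_id=1,
            eos_token_id=2,
        )


class GemmaModelTester(CausalLMModelTester):
    config_class = GemmaConfig


class GemmaModelTest(unittest.TestCase):
    model_tester_class = GemmaModelTester
    all_model_classes = (GemmaForCausalLM,)

    def setUp(self):
        self.model_tester = self.model_tester_class()

    def test_tiny_model_builds(self):
        model = GemmaForCausalLM(self.model_tester.get_config())
        self.assertEqual(model.config.num_hidden_layers, 2)
```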
Joao Gante
362fa37da2
[test] update test_past_key_values_format (#37614)
allow custom shapes
2025-04-22 11:07:34 +01:00
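Note: PR #37614 loosens the KV-cache shape assertion so models with non-standard caches can declare their own expected shapes. A rough sketch of the check being generalized (legacy tuple cache format assumed; the override hook in GenerationTesterMixin may be named differently):

```python
def check_past_key_values_format(past_key_values, expected_shape):
    """Sketch: every layer's key/value tensors must match the expected shape.

    expected_shape is (batch, num_kv_heads, seq_len, head_dim) for most decoder-only
    models, but after this change a model test can supply a custom shape instead.
    """
    for layer_key, layer_value in past_key_values:
        assert tuple(layer_key.shape) == expected_shape, layer_key.shape
        assert tuple(layer_value.shape) == expected_shape, layer_value.shape
```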
cyyever
1e6b546ea6
Use Python 3.9 syntax in tests (#37343)
Signed-off-by: cyy <cyyever@outlook.com>
2025-04-08 14:12:08 +02:00
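Note: "Python 3.9 syntax" here mostly means PEP 585 built-in generics. A representative before/after (not copied from the PR); `Optional` stays because `X | None` needs Python 3.10:

```python
# Before: typing-module generics (pre-3.9 style)
from typing import Dict, List, Optional, Tuple


def collate_old(batch: List[Dict[str, int]], max_length: Optional[int] = None) -> Tuple[int, int]:
    return len(batch), max_length or 0


# After: built-in generics (PEP 585, Python 3.9+)
from typing import Optional


def collate_new(batch: list[dict[str, int]], max_length: Optional[int] = None) -> tuple[int, int]:
    return len(batch), max_length or 0
```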
Matt
2d46a08b63
Purge unused ModelTester code (#37085)
* Purge correctly this time

* Remove more methods from recent PRs

* make fixup
2025-04-03 17:48:35 +01:00
Afanti
26c83490d2
chore: fix typos in the tests directory (#36813)
* chore: fix typos in the tests

* chore: fix typos in the tests

* chore: fix typos in the tests

* chore: fix typos in the tests

* chore: fix typos in the tests

* chore: fix typos in the tests

* chore: fix typos in the tests

* chore: fix typos in the tests

* chore: fix typos in the tests

* chore: fix typos in the tests

* chore: fix typos in the tests

* chore: fix typos in the tests

* chore: fix typos in the tests

* fix: format codes

* chore: fix copy mismatch issue

* fix: format codes

* chore: fix copy mismatch issue

* chore: fix copy mismatch issue

* chore: fix copy mismatch issue

* chore: restore previous words

* chore: revert unexpected changes
2025-03-21 10:20:05 +01:00
Fanli Lin
4dbf17c17f
[tests] enable bnb tests on xpu (#36233)
* fix failed test

* fix device

* fix more device cases

* add more cases

* fix empty cache

* Update test_4bit.py

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-02-24 11:30:15 +01:00
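Note: enabling the bitsandbytes tests on XPU mostly means replacing hard-coded CUDA calls with device-agnostic dispatch. A hedged sketch of the pattern (transformers' own testing helpers may spell this differently):

```python
import torch

# Pick whichever accelerator is present instead of assuming CUDA.
if torch.cuda.is_available():
    torch_device = "cuda"
elif hasattr(torch, "xpu") and torch.xpu.is_available():
    torch_device = "xpu"
else:
    torch_device = "cpu"


def empty_cache(device: str) -> None:
    """Free the caching allocator on the active backend (no-op on CPU)."""
    if device == "cuda":
        torch.cuda.empty_cache()
    elif device == "xpu":
        torch.xpu.empty_cache()
```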
Joao Gante
62c7ea0201
CI: avoid human error, automatically infer generative models (#33212)
* tmp commit

* move tests to the right class

* remove ALL all_generative_model_classes = ...

* skip tf roberta

* skip InstructBlipForConditionalGenerationDecoderOnlyTest

* videollava

* reduce diff

* reduce diff

* remove  on vlms

* fix a few more

* manual rebase bits

* more manual rebase

* remove all manual generative model class test entries

* fix up to ernie

* a few more removals

* handle remaining cases

* recurrent gemma

* it's better here

* make fixup

* tf idefics is broken

* tf bert + generate is broken

* don't touch tf :()

* don't touch tf :(

* make fixup

* better comments for test skips

* revert tf changes

* remove empty line removal

* one more

* missing one
2025-02-13 16:27:11 +01:00
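Note: instead of every test file maintaining `all_generative_model_classes` by hand, the generative subset can be derived from the public `can_generate()` classmethod. A sketch of that inference (the mixin property below is illustrative, not the exact code from the PR):

```python
class GenerationTesterMixinSketch:
    all_model_classes = ()  # filled in by each model's test class

    @property
    def all_generative_model_classes(self):
        # PreTrainedModel.can_generate() reports whether the class supports .generate().
        return tuple(cls for cls in self.all_model_classes if cls.can_generate())
```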
Arthur
b912f5ee43
use torch.testing.assert_close instead to get more details about errors in CIs (#35659)
* use torch.testing.assert_close instead to get more details about errors in CIs

* fix

* style

* test_all

* revert for I bert

* fixes and updates

* more image processing fixes

* more image processors

* fix mamba and co

* style

* less strict

* ok I won't be strict

* skip and be done

* up
2025-01-24 16:55:28 +01:00
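Note: the point of the switch above is error reporting. `torch.testing.assert_close` raises with the largest absolute/relative difference and the mismatch count, whereas `self.assertTrue(torch.allclose(...))` only prints "False is not true". For example:

```python
import torch

expected = torch.tensor([1.0, 2.0, 3.0])
actual = torch.tensor([1.0, 2.0, 3.1])

try:
    torch.testing.assert_close(actual, expected, rtol=1e-4, atol=1e-4)
except AssertionError as err:
    # The message pinpoints the failing element, e.g.
    # "Greatest absolute difference: 0.1 at index (2,) ..."
    print(err)
```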
Arthur
2c47618c1a
🚨All attention refactor🚨 (#35235)
* refactor LlamaAttention

* minimal changes

* fix llama

* update

* modular gemmas

* modular nits

* modular updates

* nits

* simplify

* gpt2

* more modular and fixes

* granite

* modular modular modular

* nits

* update

* qwen2 + starcoder2

* mostly gemma2

* Update image_processing_auto.py

* fix

* Update modular_starcoder2.py

* fix

* remove all copied from attentions

* remove gcv

* make fix-copies

* oups

* oups2.0

* fix some modulars + all copied from

* should be good now

* revert unwanted changes

* Update modeling_decision_transformer.py

* finish cleanup

* Update modeling_olmo.py

* consistency

* re-add gradient checkpointing attribute

* fix

* style

* make config necessary

* bis

* bis

* Update modeling_my_new_model2.py

* is_causal attr

* fix

* remove past kv return from decoder layer

* fix

* default rope config

* correctly fix rope config

* fix bias

* fix gpt2 attention output

* fix test

* fix inits

* fix default sdpa

* fix default sdpa implementation

* harmonize classes

* fix mistral

* fix sliding window models

* mixtral

* be more explicit

* style

* fix

* several fixes

* Update modeling_dbrx.py

* fix test

* olmo + phi

* rotary

* style

* phi

* phi again

* again

* kwargs

* Update test_modeling_common.py

* skip fx tracing tests

* Update modeling_utils.py

* gemma 2

* again

* Update modeling_recurrent_gemma.py

* gemma2

* granite

* style

* starcoder

* Update sdpa_attention.py

* switch args

* Update modeling_mllama.py

* fix

* cache type tests

* gpt2

* Update test_modeling_common.py

* fix

* consistency

* fix shape with encoder

* should be the last one

* tests non model

* most comments

* small oupsi

* be more explicit in modulars

* more explicit modulars

* CIs! it works locally

* add kwargs to _flash_attention_forward

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2024-12-18 16:53:39 +01:00
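Note: the refactor routes models through named attention backends instead of per-model copied attention classes. A deliberately simplified illustration of that dispatch idea (the real interface, e.g. ALL_ATTENTION_FUNCTIONS and eager_attention_forward, carries more arguments):

```python
import torch
import torch.nn.functional as F


def eager_attention(q, k, v, mask=None, scaling=None):
    scaling = scaling if scaling is not None else q.shape[-1] ** -0.5
    scores = (q @ k.transpose(-2, -1)) * scaling
    if mask is not None:
        scores = scores + mask
    return scores.softmax(dim=-1) @ v


def sdpa_attention(q, k, v, mask=None, scaling=None):
    return F.scaled_dot_product_attention(q, k, v, attn_mask=mask, scale=scaling)


# One registry, selected by a config-level implementation name.
ATTENTION_FUNCTIONS = {"eager": eager_attention, "sdpa": sdpa_attention}

q = k = v = torch.randn(1, 4, 8, 16)  # (batch, heads, seq_len, head_dim)
out = ATTENTION_FUNCTIONS["sdpa"](q, k, v)
```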
Cyril Vallez
d363e71d0e
🧹 Remove deprecated RotaryEmbedding parts in the Attention layers (#34858)
* update

* style

* fix missing args

* remove last trace of old rope classes

* remove deprecated copied from

* fix copies

* trigger CIs

* post rebase clean-up

* reverse mistral

* cleanup after dropping commits

* Add comment
2024-12-11 11:16:52 +01:00
Joao Gante
186b8dc190
Tests: upgrade test_eager_matches_sdpa_generate (#34386) 2024-10-25 11:55:07 +01:00
Pavel Iakubovskii
48461c0fe2
Make pipeline able to load processor (#32514)
* Refactor get_test_pipeline

* Fixup

* Fixing tests

* Add processor loading in tests

* Restructure processors loading

* Add processor to the pipeline

* Move model loading to the top of the test

* Update `get_test_pipeline`

* Fixup

* Add class-based flags for loading processors

* Change `is_pipeline_test_to_skip` signature

* Skip t5 failing test for slow tokenizer

* Fixup

* Fix copies for T5

* Fix typo

* Add try/except for tokenizer loading (kosmos-2 case)

* Fixup

* Llama no longer fails for long generation

* Revert processor pass in text-generation test

* Fix docs

* Switch back to json file for image processors and feature extractors

* Add processor type check

* Remove except for tokenizers

* Fix docstring

* Fix empty lists for tests

* Fixup

* Fix load check

* Ensure we have non-empty test cases

* Update src/transformers/pipelines/__init__.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* Update src/transformers/pipelines/base.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* Rework comment

* Better docs, add note about pipeline components

* Change warning to error raise

* Fixup

* Refine pipeline docs

---------

Co-authored-by: Lysandre Debut <hi@lysand.re>
2024-10-09 16:46:11 +01:00
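Note: a hedged usage sketch of what the PR enables, passing a full processor into a pipeline (checkpoint and task are examples; previously only tokenizer / image_processor / feature_extractor could be passed):

```python
from transformers import AutoProcessor, pipeline

checkpoint = "microsoft/git-base-coco"  # example multimodal checkpoint
processor = AutoProcessor.from_pretrained(checkpoint)

# The `processor` argument is the part this PR adds to the pipeline API.
captioner = pipeline("image-to-text", model=checkpoint, processor=processor)
print(captioner("http://images.cocodataset.org/val2017/000000039769.jpg"))
```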
Raushan Turganbay
65bb284448
Compile compatibility for decoder-only models (#32617)
* squash into one commit

* add qwen2-vl for rope standardization

* fix mistral compile

* fix qwen2-vl

* fix-copies
2024-09-09 10:59:04 +02:00
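Note: "compile compatibility" in practice means generation can run under torch.compile with a static KV cache so tensor shapes stay fixed. A hedged example of how that is typically exercised (checkpoint is just an example):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen2-0.5B"  # example decoder-only checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)

# A static cache keeps shapes constant so the compiled graph is reused across steps.
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)
inputs = tokenizer("Compiled generation test:", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20, cache_implementation="static")
print(tokenizer.decode(out[0], skip_special_tokens=True))
```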
amyeroberts
1de7dc7403
Skip tests properly (#31308)
* Skip tests properly

* [test_all]

* Add 'reason' as kwarg for skipTest

* [test_all] Fix up

* [test_all]
2024-06-26 21:59:08 +01:00
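Note: "skipping properly" means every skip carries an explicit reason that shows up in the test report. unittest supports this directly:

```python
import unittest


class ExampleModelTest(unittest.TestCase):
    @unittest.skip(reason="Model has no attention-mask support")  # decorator form
    def test_attention_mask(self):
        ...

    def test_cache_format(self):
        if not getattr(self, "returns_cache", False):
            # In-test skip: reported as skipped with this reason, not silently passed.
            self.skipTest(reason="Model does not return a KV cache")
```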
Arthur
673440d073
update ruff version (#30932)
* update ruff version

* fix research projects

* Empty

* Fix errors

---------

Co-authored-by: Lysandre <lysandre@huggingface.co>
2024-05-22 06:40:15 +02:00
Nilabhra Roy Chowdhury
e52741f601
Support for Falcon2-11B (#30771)
* remove unrelated changes

* remove unrelated changes on phi and stable LM

* add: Test for Falcon 10B

* fix: formatting

* fix: loading the falcon 10B in 8 bit precision using bitsandbytes.

* fix: device placement

* fix: broken tests.

* fix: backwards compatibility for falcon 1B architecture.

* chore: updated test.

* chore: test_modeling_falcon.py to use the 11B model.

* chore: minor edit

* chore: formatting.

---------

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
2024-05-13 13:32:43 +02:00
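Note: one bullet above loads the larger Falcon checkpoint in 8-bit via bitsandbytes; the usual pattern for that is:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 8-bit quantized load keeps the 11B checkpoint within a single-GPU memory budget.
quant_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-11B",
    quantization_config=quant_config,
    device_map="auto",
)
```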
fxmarty
1897874edc
Fix falcon with SDPA, alibi but no passed mask (#30123)
* fix falcon without attention_mask & alibi

* add test

* Update tests/models/falcon/test_modeling_falcon.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-04-08 22:25:07 +08:00
Yih-Dar
43d17c1836
Mark test_eager_matches_sdpa_generate flaky for some models (#29479)
* fix

* revert for qwen2

* revert for qwen2

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-03-29 11:51:20 +01:00
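Note: transformers marks such tests with a retry decorator from its testing utilities; a hedged sketch of how a model test opts in (decorator name and signature assumed from transformers.testing_utils):

```python
# Assumes transformers.testing_utils.is_flaky exists in this decorator form.
from transformers.testing_utils import is_flaky


class FalconModelTestSketch:
    @is_flaky(max_attempts=3, description="SDPA vs eager numerics are borderline for some models")
    def test_eager_matches_sdpa_generate(self):
        ...
```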
Joao Gante
441de62f49
RoPE models: add numerical sanity-check test for RoPE scaling (#29808)
* add hard rope scaling test

* make fixup

* quick rope scaling tests

* add copy statements
2024-03-28 11:25:50 +00:00
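Note: the sanity check exercises the `rope_scaling` config knob: with linear scaling by factor f, position f·p of the scaled embedding must reproduce position p of the unscaled one. A sketch against a recent transformers version where LlamaRotaryEmbedding is config-driven (at the time of this PR the scaling variants were separate classes):

```python
import torch
from transformers import LlamaConfig
from transformers.models.llama.modeling_llama import LlamaRotaryEmbedding

base_cfg = LlamaConfig(max_position_embeddings=2048)
scaled_cfg = LlamaConfig(max_position_embeddings=2048, rope_scaling={"type": "linear", "factor": 2.0})

base_rope = LlamaRotaryEmbedding(config=base_cfg)
scaled_rope = LlamaRotaryEmbedding(config=scaled_cfg)

x = torch.zeros(1, 1, 1)  # only dtype/device are read from this tensor
positions = torch.arange(8).unsqueeze(0)

cos_base, _ = base_rope(x, positions)
cos_scaled, _ = scaled_rope(x, positions * 2)  # scaled position 2p should match unscaled p

torch.testing.assert_close(cos_base, cos_scaled)
```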
fxmarty
80377eb018
F.scaled_dot_product_attention support (#26572)
* add sdpa

* wip

* cleaning

* add ref

* yet more cleaning

* and more :)

* wip llama

* working llama

* add output_attentions=True support

* bigcode sdpa support

* fixes

* gpt-bigcode support, require torch>=2.1.1

* add falcon support

* fix conflicts falcon

* style

* fix attention_mask definition

* remove output_attentions from attnmaskconverter

* support whisper without removing any Copied from statement

* fix mbart default to eager renaming

* fix typo in falcon

* fix is_causal in SDPA

* check is_flash_attn_2_available in the models init as well in case the model is not initialized through from_pretrained

* add warnings when falling back on the manual implementation

* precise doc

* wip replace _flash_attn_enabled by config.attn_implementation

* fix typo

* add tests

* style

* add a copy.deepcopy on the config in from_pretrained, as we do not want to modify it inplace

* obey to config.attn_implementation if a config is passed in from_pretrained

* fix is_torch_sdpa_available when torch is not installed

* remove dead code

* Update src/transformers/modeling_attn_mask_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/bart/modeling_bart.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* remove duplicate pretraining_tp code

* add dropout in llama

* precise comment on attn_mask

* add fmt: off for _unmask_unattended docstring

* precise num_masks comment

* nuke pretraining_tp in LlamaSDPAAttention following Arthur's suggestion

* cleanup modeling_utils

* backward compatibility

* fix style as requested

* style

* improve documentation

* test pass

* style

* add _unmask_unattended tests

* skip meaningless tests for idefics

* hard_check SDPA requirements when specifically requested

* standardize the use of XXX_ATTENTION_CLASSES

* fix SDPA bug with mem-efficient backend on CUDA when using fp32

* fix test

* rely on SDPA is_causal parameter to handle the causal mask in some cases

* fix FALCON_ATTENTION_CLASSES

* remove _flash_attn_2_enabled occurrences

* fix test

* add OPT to the list of supported flash models

* improve test

* properly test on different SDPA backends, on different dtypes & properly handle separately the pad tokens in the test

* remove remaining _flash_attn_2_enabled occurrence

* Update src/transformers/modeling_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/en/perf_infer_gpu_one.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* remove use_attn_implementation

* fix docstring & slight bug

* make attn_implementation internal (_attn_implementation)

* typos

* fix tests

* deprecate use_flash_attention_2=True

* fix test

* add back llama that was removed by mistake

* fix tests

* remove _flash_attn_2_enabled occurrences bis

* add check & test that passed attn_implementation is valid

* fix falcon torchscript export

* fix device of mask in tests

* add tip about torch.jit.trace and move bt doc below sdpa

* fix parameterized.expand order

* move tests from test_modeling_attn_mask_utils to test_modeling_utils as a relevant test class is already there

* update sdpaattention class with the new cache

* Update src/transformers/configuration_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/bark/modeling_bark.py

* address review comments

* WIP torch.jit.trace fix. left: test both eager & sdpa

* add test for torch.jit.trace for both eager/sdpa

* fix falcon with torch==2.0 that needs to use sdpa

* fix doc

* hopefully last fix

* fix key_value_length that has no default now in mask converter

* is it flaky?

* fix speculative decoding bug

* tests do pass

* fix following #27907

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-12-09 05:38:14 +09:00
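Note: PR #26572 wires torch.nn.functional.scaled_dot_product_attention into the listed models and exposes the choice of backend at load time. In current releases the opt-in looks like this (SDPA later became the default where available):

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b", torch_dtype=torch.bfloat16, attn_implementation="sdpa"
)

# Each attention layer then ends up calling something equivalent to:
q = k = v = torch.randn(1, 8, 16, 64)  # (batch, heads, seq_len, head_dim)
attn_out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```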
Yih-Dar
623432dcc9
Skip pipeline tests for 2 models for now (#27687)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-11-24 09:43:20 +01:00
Tom Aarsen
05ea7b79e6
Refactor: Use Llama RoPE implementation for Falcon (#26933)
* Use Llama RoPE implementation for Falcon

+ Add copy functionalities

* Use standard cache format for Falcon

* Simplify apply_rotary_pos_emb, copy from Llama

* Remove unnecessary cache conversion test

We don't need to convert any caches anymore!

* Resolve copy complaint
2023-11-03 11:05:55 +00:00
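Note: the refactor swaps Falcon's bespoke rotary code for the Llama-style helpers; their core (simplified, without the position_ids plumbing) is:

```python
import torch


def rotate_half(x):
    """Rotate the two halves of the last dimension: (x1, x2) -> (-x2, x1)."""
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)


def apply_rotary_pos_emb(q, k, cos, sin):
    """Apply precomputed cos/sin tables to the query and key states."""
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed
```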
Matt
bdb391e9c6
Fix Falcon generation test (#26770) 2023-10-13 15:10:27 +01:00
Lysandre Debut
67239f7360
Revert falcon exception (#26472)
* Revert "Falcon: fix revision propagation (#26006)"

This reverts commit 118c676ef3.

* Revert "Put Falcon back (#25960)"

This reverts commit 22a69f1d7d.
2023-10-02 09:13:19 +02:00
Joao Gante
a796f7eea6
Falcon: batched generation (#26137) 2023-09-13 17:00:52 +01:00
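Note: batched generation with a decoder-only model such as Falcon needs left padding and a pad token; a typical setup (checkpoint is an example):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "tiiuae/falcon-7b"
tokenizer = AutoTokenizer.from_pretrained(name, padding_side="left")
tokenizer.pad_token = tokenizer.eos_token  # Falcon ships no dedicated pad token

model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16, device_map="auto")

prompts = ["The Falcon is a", "Batched generation requires"]
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
out = model.generate(**inputs, max_new_tokens=20, pad_token_id=tokenizer.pad_token_id)
print(tokenizer.batch_decode(out, skip_special_tokens=True))
```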
Lysandre Debut
22a69f1d7d
Put Falcon back (#25960)
* Put Falcon back

* Update src/transformers/models/auto/configuration_auto.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update test

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-09-04 14:17:09 -04:00
Joao Gante
53e2fd785b
Falcon: Add RoPE scaling (#25878) 2023-09-01 12:05:53 +01:00
Yih-Dar
bd90cda9a6
CI with num_hidden_layers=2 🚀🚀🚀 (#25266)
* CI with layers=2

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-08-02 20:22:36 +02:00
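Note: the speed-up comes from building the randomly initialized test models with only two layers and tiny dimensions, e.g.:

```python
from transformers import FalconConfig, FalconForCausalLM

tiny_config = FalconConfig(
    vocab_size=99,
    hidden_size=32,
    num_hidden_layers=2,   # the change above: CI models use 2 layers
    num_attention_heads=4,
    new_decoder_architecture=True,
)
model = FalconForCausalLM(tiny_config)
print(sum(p.numel() for p in model.parameters()))  # well under a million parameters
```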
Yih-Dar
1b4f6199c6
Update tiny model info. and pipeline testing (#25213)
* update tiny_model_summary.json

* update

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-07-31 19:35:33 +02:00
Matt
b3ab3fac1d
Falcon port (#24523)
* Initial commit

* Update src/transformers/models/falcon/configuration_falcon.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/falcon/configuration_falcon.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Cleanup config docstring

* Update src/transformers/models/falcon/configuration_falcon.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Convert to relative imports

* Remove torch < 1.8 warning

* Restructure cos_sin header

* qkv -> query, key, value

* Refactor attention calculation

* Add a couple of config variables to account for the different checkpoints

* Successful merging of the code paths!

* Fix misplaced line in the non-parallel attention path

* Update config and tests

* Add a pad_token_id when testing

* Support output_attentions when alibi is None

* make fixup

* Skip KV cache shape test

* No more _keys_to_ignore_on_load_missing

* Simplify self attention a bit

* Simplify self attention a bit

* make fixup

* stash commit

* Some more attention mask updates

* Should pass all tests except assisted generation!

* Add big model generation test

* make fixup

* Add temporary workaround for test

* Test overrides for assisted generation

* Update src/transformers/models/falcon/modeling_falcon.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/falcon/modeling_falcon.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/falcon/modeling_falcon.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update tests/models/falcon/test_modeling_falcon.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Test overrides for assisted generation

* Add generation demo

* Update copyright

* Make the docstring model actually small

* Add module-level docstring

* Remove all assertions

* Add copied from bloom

* Reformat the QKV layer

* Add copied from bloom

* Update src/transformers/models/falcon/modeling_falcon.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Remove unused line and reformat

* No single letter variables

* Cleanup return names

* Add copied from line

* Remove the deprecated arguments blocks

* Change the embeddings test to an alibi on/off test

* Remove position_ids from FalconForQA

* Remove old check for token type IDs

* Fix the alibi path when multi_query is False

* Update src/transformers/models/falcon/modeling_falcon.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/falcon/modeling_falcon.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/falcon/test_modeling_falcon.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update config naming

* Fix typo for new_decoder_architecture

* Add some comments

* Fix docstring

* Fix docstring

* Create range in the right dtype from the start

* Review comment cleanup

* n_head_kv -> num_kv_heads

* self.alibi -> self.use_alibi

* self.num_kv -> self.num_kv_heads

* Reorder config args

* Made alibi arguments Optional

* Add all model docstrings

* Add extra checkpoints

* Add author info for Falcon

* Stop removing token_type_ids because our checkpoints shouldn't return it anymore

* Add one hopeful comment for the future

* Fix typo

* Update tests, fix cache issue for generation

* Use -1e9 instead of -inf to avoid float overflow

* Recompute the rotary embeddings much less often

* Re-enable disabled tests

* One final fix to attention mask calculation, and update tests

* Cleanup targeting falcon-40b equivalency

* Post-rebase docs update

* Update docstrings, especially in the config

* More descriptive variable names, and comments where we can't rename them

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-07-11 13:36:31 +01:00
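Note: the port above ships with a generation demo; the shortest way to run the ported architecture today is through a text-generation pipeline (7B checkpoint as an example):

```python
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="tiiuae/falcon-7b",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
print(generator("Falcon is a causal decoder-only model trained on", max_new_tokens=30)[0]["generated_text"])
```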