* stash commit
* Experiment 1: Try just Gemma
* Experiment 1: Just try Gemma
* make fixup
* Trigger tests
* stash commit
* Try adding Gemma3 as well
* make fixup
* Correct attrib names
* Correct pipeline model mapping
* Add in all_model_classes for Gemma1 again
* Move the pipeline model mapping around again
* make fixup
* Revert Gemma3 changes since it's a VLM
* Let's try Falcon
* Correct attributes
* Let's try just overriding get_config() for now
* Do Nemotron too
* And Llama!
* Do llama/persimmon
* Correctly skip tests
* Fix Persimmon
* Include Phimoe
* Fix Gemma2
* Set model_tester_class correctly
* Add GLM
* More models!
* models models models
* make fixup
* Add Qwen3 + Qwen3MoE
* Correct import
* make fixup
* Add the QuestionAnswering classes
* Move pipeline mapping to the right place
* Jetmoe too
* Stop RoPE testing models with no RoPE
* Fix up JetMOE a bit
* Can we just force pad_token_id all the time?
* make fixup
* fix starcoder2
* Move pipeline mapping
* Fix RoPE skipping
* Fix RecurrentGemma tests
* Fix Falcon tests
* Add MoE attributes
* Fix values for RoPE testing
* Make sure we set bos_token_id and eos_token_id in an appropriate range
* make fixup
* Fix GLM4
* Add mamba attributes
* Revert bits of JetMOE
* Re-add the JetMOE skips
* Update tests/causal_lm_tester.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add licence
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* chore: fix typos in the tests
* fix: format code
* chore: fix copy mismatch issue
* fix: format code
* chore: fix copy mismatch issue
* chore: restore previous words
* chore: revert unexpected changes
* tmp commit
* move tests to the right class
* remove ALL all_generative_model_classes = ...
* skip tf roberta
* skip InstructBlipForConditionalGenerationDecoderOnlyTest
* videollava
* reduce diff
* remove on vlms
* fix a few more
* manual rebase bits
* more manual rebase
* remove all manual generative model class test entries
* fix up to ernie
* a few more removals
* handle remaining cases
* recurrent gemma
* it's better here
* make fixup
* tf idefics is broken
* tf bert + generate is broken
* don't touch tf :(
* make fixup
* better comments for test skips
* revert tf changes
* remove empty line removal
* one more
* missing one
* use torch.testing.assert_close instead to get more details about errors in CI
* fix
* style
* test_all
* revert for I-BERT
* fixes and updates
* more image processing fixes
* more image processors
* fix mamba and co
* style
* less strict
* ok I won't be strict
* skip and be done
* up
* Add MistralForTokenClassification
* Add tests and docs
* Add token classification for Mixtral and Qwen2
* Save Llama for token classification draft
* Add token classification support for Llama, Gemma, Persimmon, StableLm and StarCoder2
* Formatting
* Add token classification support for Qwen2Moe model
* Add dropout layer to each ForTokenClassification model
* Add copied from in tests
* Update src/transformers/models/llama/modeling_llama.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Propagate suggested changes
* Style
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* init: add StableLm 2 support
* add integration test for parallel residual and qk layernorm
* update(modeling): match qk norm naming for consistency with phi/persimmon
* fix(tests): run fwd/bwd on random init test model to jitter norm weights off identity
* `use_parallel_residual`: add copy pointer to `GPTNeoXLayer.forward`
* refactor: rename head states var in `StableLmLayerNormPerHead`
* tests: update test model and add generate check