Support tensor-valued _extra_state values
TransformerEngine uses the pytorch get/set_extra_state API to store FP8
layer config information as bytes Tensor in the _extra_state entry in
the state dict. With recent changes to from_pretrained, this
functionality has broken and loading a model that uses this API doesn't
appear to work. This PR fixes the save/load pretrained functions for
extra state entries that use a pytorch tensor, and adds a (currently
x-failing) test for a dictionary extra state.
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
* start refactoring whisper
* revert for now
* first step
* carry over attn fixes
* check if this works
* whisper has an off by one somewhere - cutting mask in any interface
* make it based on interface
* remove some tests that were skipped but now work
* some fixes for whisper tests
* interface changes
* change the order of fix
* some attention adjustments for eager + TP
* fix scaling
* mask changes
* why does whisper contain those extra seq lens?
* fix from config for fa2 as input_ids is invalid
* fix another test
* another fix
* disable flex attn due to compile issues
* copies and refactor for qwen audio since it somewhat relies on whisper
* fix scaling and smaller things
* retrigger
* new new interface version + more fixups
* adjust qwen
* add comment
* forgot this one
* change copies as whisper cuts on the mask
* add guard
* add flex attention
* switch to new mask function + add skips for torchscript
* remove old api with cache position
* last changes?
* trigger ci
* Updated OLMo2 model card
* added command line
* Add suggestions
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Added suggestions
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Indented code block as per suggestions
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update granite.md
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* update granite.md
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* minor fixes
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Modified BART documentation wrt to issue #36979.
* Modified BART documentation wrt to issue #36979.
* fixed a typo.
* Update docs/source/en/model_doc/bart.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bart.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bart.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bart.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bart.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bart.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* blank commit.
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Updated BERTweet model card.
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* updated toctree (EN).
* Updated BERTweet model card.
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* updated toctree (EN).
* Updated BERTweet model card.
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* updated toctree (EN).
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Updated BigBird Model card as per #36979.
* Update docs/source/en/model_doc/big_bird.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/big_bird.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/big_bird.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/big_bird.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* update model page.
* update model page.
* Update docs/source/en/model_doc/mamba2.md
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* update the model page.
* update.
* Apply suggestions from code review
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* Apply the suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* add an quantization example and update the toctree.
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* remove the additional comma
---------
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* standardize
* fix tests
* batch update some processors, not final yet
* oke, now I tested that everything indeed runs. Still needs prettification
* emu3
* fixup
* gemma3 but it doesn't generate anything
* fuyu
* update
* why?
* Update src/transformers/models/aya_vision/processing_aya_vision.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* address comments
* bc
* why do we need to guard import this every time?
* i hate guarded imports
* i am blind
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Firstly: Better detection of when we're a custom class
* Trigger tests
* Let's break everything
* make fixup
* fix mistaken line doubling
* Let's try to get rid of it from config classes at least
* Let's try to get rid of it from config classes at least
* Fixup image processor
* no more circular import
* Let's go back to setting `_auto_class` again
* Let's go back to setting `_auto_class` again
* stash commit
* Revert the irrelevant changes until we figure out AutoConfig
* Change tests since we're breaking expectations
* make fixup
* do the same for all custom classes
* Cleanup for feature extractor tests
* Cleanup tokenization tests too
* typo
* Fix tokenizer tests
* make fixup
* fix image processor test
* make fixup
* Remove warning from register_for_auto_class
* Stop adding model info to auto map entirely
* Remove todo
* Remove the other todo
* Let's start slapping _auto_class on models why not
* Let's start slapping _auto_class on models why not
* Make sure the tests know what's up
* Make sure the tests know what's up
* Completely remove add_model_info_to_*
* Start adding _auto_class to models
* Start adding _auto_class to models
* Add a flaky decorator
* Add a flaky decorator and import
* stash commit
* More message cleanup
* make fixup
* fix indent
* Fix trust_remote_code prompts
* make fixup
* correct indentation
* Reincorporate changes into dynamic_module_utils
* Update call to trust_remote_code
* make fixup
* Fix video processors too
* Fix video processors too
* Remove is_flaky additions
* make fixup
* let's try a non-regex solution
* make fixup
* Slight adjustment
* Let's just use the original code with a check
* slight tweak to conditional
* slight tweak to conditional