transformers.enable_full_determinism enables deterministic
flash attention using `FLASH_ATTENTION_DETERMINISTIC`
800510c67b/src/transformers/trainer_utils.py (L79)
However, the current check uses a global variable, `deterministic_g`,
which reads the environment variable at import time. This causes
issues because users can call
`transformers.enable_full_determinism` after
`transformers.modeling_flash_attention_utils` has been imported. This
behavior was introduced in
https://github.com/huggingface/transformers/pull/33932/files#r1806668579
to fix a graph break.
This PR fixes the issue by delaying the environment variable check until
the first time `_flash_attention_forward` is executed, which resolves
the problem without reintroducing the graph break.
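The deferred check described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `_deterministic`, `is_deterministic` and `attention_forward` are hypothetical names, and parsing the variable as the string `"1"` is an assumption.

```python
import os
from typing import Optional

# Hypothetical module-level cache; None means "not checked yet".
_deterministic: Optional[bool] = None


def is_deterministic() -> bool:
    """Read FLASH_ATTENTION_DETERMINISTIC on the first call, then reuse the cached value."""
    global _deterministic
    if _deterministic is None:
        # Assumption: the flag is enabled when the variable equals "1".
        _deterministic = os.environ.get("FLASH_ATTENTION_DETERMINISTIC", "0") == "1"
    return _deterministic


def attention_forward() -> str:
    # Stand-in for `_flash_attention_forward`: the flag is resolved here,
    # on first execution, rather than when the module is imported.
    return "deterministic" if is_deterministic() else "non-deterministic"
```

With this shape, setting the environment variable after the module is imported but before the first forward call is still honoured, and later calls reuse the cached value instead of reading the environment again.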
Signed-off-by: Hollow Man <hollowman@opensuse.org>
* squash commits
* rename gpu
* rename accelerator
* change _toctree.yml
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: sdp <sdp@a4bf01943ff7.jf.intel.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* A shallow copy in groundingdino
Fixes #37333
* Remove an empty line in the GroundingDinoForObjectDetection class
* Translate comments in the GroundingDinoForObjectDetection class from French to English
* make it go brrrr
* date time
* update
* fix
* up
* uppp
* up
* no number i
* update
* fix
* [paligemma] fix processor with suffix (#38365)
fix pg processor
* [video utils] group and reorder by number of frames (#38374)
fix
* Fix convert to original state dict for VLMs (#38385)
* fix convert to original state dict
* fix
* lint
* Update modeling_utils.py
* update
* warn
* no verbose
* final
* ouft
* style
---------
Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
Co-authored-by: hoshi-hiyouga <hiyouga@buaa.edu.cn>
* Use dict comprehension to create dict
* Fix type annotation
Union[Any] doesn't really make any sense
* Remove methods that are already implemented in the `UserDict` parent
class
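As a rough illustration of the two cleanups above (building the mapping with a dict comprehension and relying on what `UserDict` already provides), consider the sketch below. `Encoding` and its `lengths` helper are hypothetical and are not the class touched by these commits.

```python
from collections import UserDict


class Encoding(UserDict):
    """Hypothetical dict-like container used only for illustration."""

    def __init__(self, names, values):
        # A dict comprehension builds the mapping in a single expression.
        super().__init__({name: value for name, value in zip(names, values)})

    def lengths(self):
        # Only behavior that UserDict does not already offer needs a method.
        return {key: len(value) for key, value in self.data.items()}
```

`__getitem__`, `__contains__`, `__len__`, `keys()`, `values()` and `items()` are inherited from `UserDict`, so re-implementing them in the subclass adds nothing.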
* updates
* fixup
* fix tests
* fix test
* fix
* let it be here for now, till monday
* two more fixes
* persimmon
* fixup
* fix
* fixup
* make sure fuyu runs now that LM has new attn API
* fixup + tests
* qwen vl uses new mask interface as well
* qwen image features format
* update
* remove image_sizes
* address comments
* i am dumb...
* feat: add cache retention for requests
* fix: propagate `manual_eviction` param & refactor `finish_request`
`finish_request` now only takes `request_id: str` as an input rather
than the full `RequestState`, which was not needed and simplifies
calling from `ContinuousBatchingManager::evict_request_from_cache`
* refactor: pop req from `active_requests`
* Apply style fixes
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
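The `finish_request` refactor and the `manual_eviction` flag from the commits above might look roughly like this. It is a sketch under stated assumptions: the `Scheduler` class, its `cache` dictionary and the exact eviction semantics are hypothetical; only the names `finish_request`, `request_id`, `RequestState`, `active_requests`, `manual_eviction` and `evict_request_from_cache` come from the commit messages.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RequestState:
    """Hypothetical stand-in for the per-request state object."""
    request_id: str
    tokens: List[int] = field(default_factory=list)


class Scheduler:
    """Hypothetical scheduler; the logic is illustrative only."""

    def __init__(self, manual_eviction: bool = False) -> None:
        self.manual_eviction = manual_eviction
        self.active_requests: Dict[str, RequestState] = {}
        self.cache: Dict[str, List[int]] = {}

    def add_request(self, state: RequestState) -> None:
        self.active_requests[state.request_id] = state
        self.cache[state.request_id] = list(state.tokens)

    def finish_request(self, request_id: str) -> None:
        # Only the id is needed: pop the request from the active set.
        self.active_requests.pop(request_id, None)
        # Assumption: with manual eviction, the cache entry is retained
        # until the caller evicts it explicitly.
        if not self.manual_eviction:
            self.cache.pop(request_id, None)

    def evict_request_from_cache(self, request_id: str) -> None:
        # Explicit eviction path for callers that asked for cache retention.
        self.cache.pop(request_id, None)
```

Taking a plain `request_id` keeps the call site simple for a manager that only knows the id of the request it wants to evict.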
* Update vit_mae.md
* badge float:right
* Update docs/source/en/model_doc/vit_mae.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/vit_mae.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/vit_mae.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/vit_mae.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/vit_mae.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/vit_mae.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/vit_mae.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/vit_mae.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/vit_mae.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update model_doc/vit_mae.md
* fix
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Updated the Model docs - for the ALIGN model
* Update docs/source/en/model_doc/align.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/align.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Updated align.md
* Update docs/source/en/model_doc/align.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/align.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update align.md
* fix
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Support tensor-valued _extra_state values
TransformerEngine uses the pytorch get/set_extra_state API to store FP8
layer config information as bytes Tensor in the _extra_state entry in
the state dict. With recent changes to from_pretrained, this
functionality has broken and loading a model that uses this API doesn't
appear to work. This PR fixes the save/load pretrained functions for
extra state entries that use a pytorch tensor, and adds a (currently
x-failing) test for a dictionary extra state.
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
* start refactoring whisper
* revert for now
* first step
* carry over attn fixes
* check if this works
* whisper has an off by one somewhere - cutting mask in any interface
* make it based on interface
* remove some tests that were skipped but now work
* some fixes for whisper tests
* interface changes
* change the order of fix
* some attention adjustments for eager + TP
* fix scaling
* mask changes
* why does whisper contain those extra seq lens?
* fix from config for fa2 as input_ids is invalid
* fix another test
* another fix
* disable flex attn due to compile issues
* copies and refactor for qwen audio since it somewhat relies on whisper
* fix scaling and smaller things
* retrigger
* new new interface version + more fixups
* adjust qwen
* add comment
* forgot this one
* change copies as whisper cuts on the mask
* add guard
* add flex attention
* switch to new mask function + add skips for torchscript
* remove old api with cache position
* last changes?
* trigger ci
* Updated OLMo2 model card
* added command line
* Add suggestions
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Added suggestions
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Indented code block as per suggestions
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update granite.md
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* update granite.md
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* minor fixes
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Modified BART documentation with respect to issue #36979.
* Modified BART documentation with respect to issue #36979.
* fixed a typo.
* Update docs/source/en/model_doc/bart.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bart.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bart.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bart.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bart.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bart.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* blank commit.
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Updated BERTweet model card.
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* updated toctree (EN).
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Updated BigBird Model card as per #36979.
* Update docs/source/en/model_doc/big_bird.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/big_bird.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/big_bird.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/big_bird.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>