* Add saving in the new format (but no loading yet!)
* Add saving in the new format (but no loading yet!)
* A new approach to template files!
* make fixup
* make fixup, set correct dir
* Some progress but need to rework for cached_file
* Rework loading handling again
* Small fixes
* Looks like it's working now!
* make fixup
* Working!
* make fixup
* make fixup
* Add TODO so I don't miss it
* Cleaner control flow with one less indent
* Copy the new logic to processing_utils as well
* Proper support for dicts of templates
* make fixup
* define the file/dir names in a single place
* Update the processor chat template reload test as well
* Add processor loading of multiple templates
* Flatten correctly to match tokenizers
* Better support when files are empty sometimes
* Stop creating those empty templates
* Revert changes now we don't have empty templates
* Revert changes now we don't have empty templates
* Don't support separate template files on the legacy path
* Rework/simplify loading code
* Make sure it's always a chat_template key in chat_template.json
* Update processor handling of multiple templates
* Add a full save-loading test to the tokenizer tests as well
* Correct un-flattening
* New test was incorrect
* Correct error/offline handling
* Better exception handling
* More error handling cleanup
* Add skips for test failing on main
* Reorder to fix errors
* make fixup
* clarify legacy processor file docs and location
* Update src/transformers/processing_utils.py
Co-authored-by: Lucain <lucainp@gmail.com>
* Update src/transformers/processing_utils.py
Co-authored-by: Lucain <lucainp@gmail.com>
* Update src/transformers/processing_utils.py
Co-authored-by: Lucain <lucainp@gmail.com>
* Update src/transformers/processing_utils.py
Co-authored-by: Lucain <lucainp@gmail.com>
* Rename to _jinja and _legacy
* Stop saving multiple templates in the legacy format
* Cleanup the processing code
* Cleanup the processing code more
* make fixup
* make fixup
* correct reformatting
* Use correct dir name
* Fix import location
* Use save_jinja_files instead of save_raw_chat_template_files
* Correct the test for saving multiple processor templates
* Fix type hint
* Update src/transformers/utils/hub.py
Co-authored-by: Julien Chaumond <julien@huggingface.co>
* Patch llava_onevision test
* Update src/transformers/processing_utils.py
Co-authored-by: Julien Chaumond <julien@huggingface.co>
* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Julien Chaumond <julien@huggingface.co>
* Refactor chat template saving out into a separate function
* Update tests for the new default
* Don't do chat template saving logic when chat template isn't there
* Ensure save_jinja_files is propagated to tokenizer correctly
* Trigger tests
* Update more tests to new default
* Trigger tests
---------
Co-authored-by: Lucain <lucainp@gmail.com>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
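A rough sketch of the save/load round trip these commits describe, for illustration only. `save_chat_templates` and `load_chat_templates` are hypothetical helpers, not the library functions. The file name `chat_template.jinja` and the directory name `additional_chat_templates` are assumptions. The `save_jinja_files` flag and the `chat_template` key in `chat_template.json` are taken from the commit messages above.

```python
import json
import tempfile
from pathlib import Path

# Assumed names, defined in a single place; the real constants may differ.
CHAT_TEMPLATE_FILE = "chat_template.jinja"
LEGACY_CHAT_TEMPLATE_FILE = "chat_template.json"
ADDITIONAL_TEMPLATES_DIR = "additional_chat_templates"


def save_chat_templates(save_dir, chat_template, save_jinja_files=True):
    """Hypothetical helper: save one template (str) or a dict of named templates."""
    save_dir = Path(save_dir)
    save_dir.mkdir(parents=True, exist_ok=True)
    if chat_template is None:
        return []  # no template, so no (empty) template files are created
    if not save_jinja_files:
        # Legacy path: one JSON file that always uses the "chat_template" key.
        if isinstance(chat_template, dict):
            # Assumption: this sketch simply refuses multiple templates here.
            raise ValueError("Multiple templates are not saved in the legacy format.")
        path = save_dir / LEGACY_CHAT_TEMPLATE_FILE
        path.write_text(json.dumps({"chat_template": chat_template}), encoding="utf-8")
        return [path]
    templates = chat_template if isinstance(chat_template, dict) else {"default": chat_template}
    written = []
    for name, text in templates.items():
        if name == "default":
            path = save_dir / CHAT_TEMPLATE_FILE
        else:
            path = save_dir / ADDITIONAL_TEMPLATES_DIR / f"{name}.jinja"
            path.parent.mkdir(exist_ok=True)
        path.write_text(text, encoding="utf-8")
        written.append(path)
    return written


def load_chat_templates(save_dir):
    """Hypothetical helper: prefer raw template files, fall back to legacy JSON."""
    save_dir = Path(save_dir)
    templates = {}
    default_path = save_dir / CHAT_TEMPLATE_FILE
    if default_path.is_file():
        templates["default"] = default_path.read_text(encoding="utf-8")
    extra_dir = save_dir / ADDITIONAL_TEMPLATES_DIR
    if extra_dir.is_dir():
        for path in sorted(extra_dir.glob("*.jinja")):
            templates[path.stem] = path.read_text(encoding="utf-8")
    if not templates:
        legacy_path = save_dir / LEGACY_CHAT_TEMPLATE_FILE
        if legacy_path.is_file():
            return json.loads(legacy_path.read_text(encoding="utf-8"))["chat_template"]
        return None
    # A lone default template is returned as a plain string, otherwise a dict.
    return templates["default"] if set(templates) == {"default"} else templates


# Round trip with two named templates in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    save_chat_templates(tmp, {"default": "{{ messages }}", "tool_use": "{{ tools }}"})
    reloaded = load_chat_templates(tmp)
```

Keeping the file and directory names as module-level constants follows the "define the file/dir names in a single place" commit; everything else about the layout is a guess at one reasonable shape.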
* the fix that did not get in
* add kernels
* full graph does not work
* simpler is better
* Update src/transformers/integrations/hub_kernels.py
Co-authored-by: Daniël de Kok <me@danieldk.eu>
* Update src/transformers/integrations/fbgemm_fp8.py
Co-authored-by: Daniël de Kok <me@danieldk.eu>
* Update src/transformers/integrations/hub_kernels.py
Co-authored-by: Daniël de Kok <me@danieldk.eu>
* fixup
---------
Co-authored-by: Daniël de Kok <me@danieldk.eu>
Corrects the file path used to locate the CUDA kernels
for the Deformable Attention module. This ensures that
the kernels are loaded correctly, resolving potential
errors during module initialization and usage.
Previously, dropped tokens were passed through the identity function,
paired with an expert weight that was never applied to the hidden states.
This was misleading, because dropping a token means its expert weight is zero.
Instead of trying to fix the weight, we take the simpler approach of initializing the output with zeros.
Fixes issue https://github.com/huggingface/transformers/issues/37017
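A toy illustration of the difference, using plain Python scalars instead of tensors. `sparse_mlp`, its arguments, and the convention that `None` marks a dropped token are all hypothetical, and the final scaling by the router probability is one reading of the description above, not a copy of the model code.

```python
def sparse_mlp(hidden_states, expert_index, router_probs, experts, init="zeros"):
    """Toy top-1 mixture-of-experts layer over a list of scalar tokens.

    expert_index[i] is the expert chosen for token i, or None when the
    token was dropped (for example because the expert was over capacity).
    """
    if init == "zeros":
        # New behaviour: dropped tokens contribute nothing.
        next_states = [0.0] * len(hidden_states)
    else:
        # Old behaviour: dropped tokens keep their input (identity).
        next_states = list(hidden_states)
    for i, idx in enumerate(expert_index):
        if idx is not None:
            next_states[i] = experts[idx](hidden_states[i])
    # The router weight scales every token, including dropped ones.
    return [p * s for p, s in zip(router_probs, next_states)]


hidden = [1.0, 2.0, 3.0]
chosen = [0, None, 1]                # the middle token is dropped
probs = [0.5, 0.25, 0.5]
experts = [lambda x: 2.0 * x, lambda x: x + 1.0]

with_zeros = sparse_mlp(hidden, chosen, probs, experts, init="zeros")
with_identity = sparse_mlp(hidden, chosen, probs, experts, init="identity")
```

With zero initialization the dropped token's output is exactly zero, so a surrounding residual connection (if any) is the only path that carries it forward; with the identity initialization it came out scaled by a router weight that no expert had produced.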
* add classifier head to donut
* add to transformers __init__
* add to auto model
* fix typo
* add loss for image classification
* add checkpoint
* remove unneeded import
* reorder imports
* format
* consistency
* add test of classifier
* add doc
* try ignore
* update loss for all swin models
* fix tests and some clean up
* make one general test for each modality
* remove redundant merging of kwargs
* edge cases
* dont enforce slow when reloading
* fix gemma3 tests
* has to adapt llama 4 after rebase
* also remove from overridden tests
* should be green now
* from_pretrained should handle xpu case
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* fmt
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
---------
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* debugging improvements
* add debugging details
* add more debugging details
* debug more
* the fix that did not get in
* First fix flex
* fix query offset
* fix flex first
* fix device mask creation for speed
* small mask creation sdpa
* Update flex_attention.py
* remove chunked prefill from HybridChunkedCache
* never seen such a messed-up merge
* clean up layers + output
* add summary json file
* Efficient general cache
* Update cache_utils.py
* cleanup
* fix?
* fix!
* oops, typo
* not everywhere
* more fixes
* revert unrelated changes
* Fix but ugly for now -> should use pad instead
* oops
* re-initialize the cache
* Use pad to simplify
* style
* correct slicing
---------
Co-authored-by: Pablo <pablo.montalvo.leroux@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* add peft model in constant
* add test
* fix formatting
* make fixup execute
* change code
* check by self.task
* add test
* fixup test code
* fix minor typo
* fix pipeline test
* apply maintainers' requests
* initial draft
* make documentation simpler
* Update docs/source/en/quantization/selecting.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/selecting.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/selecting.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/selecting.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/selecting.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/selecting.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/selecting.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/selecting.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/selecting.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/selecting.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* turn pros and cons into tables
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* add links to each quant method page
* separate calibration vs no calibration methods
* add calibration time estimates
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* add changed
* Revert "add changed"
This reverts commit 0a0166a1fe.
* update with NEW MODEL class called GLM4
* update
* Update glm4.md
* Name
* style
* fix copies
* fixup test
---------
Co-authored-by: Yuxuan Zhang <2448370773@qq.com>
fix conversion script no_rope_layers
`no_rope_layers` should be either a list of NoPE layers or None; when it is None, it is created in the config from `no_rope_layer_interval`
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
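A minimal sketch of the fallback described above, as an illustration rather than the config's actual code. `resolve_no_rope_layers` is a hypothetical helper, the default interval of 4 is arbitrary, and the encoding (1 for a layer that uses RoPE, 0 for a NoPE layer) is an assumption.

```python
def resolve_no_rope_layers(num_hidden_layers, no_rope_layer_interval=4, no_rope_layers=None):
    """Hypothetical helper: use an explicit list, or derive it from the interval."""
    if no_rope_layers is not None:
        return list(no_rope_layers)
    # Assumed encoding: 1 = layer uses RoPE, 0 = NoPE layer.
    # Every `no_rope_layer_interval`-th layer (counting from 1) is a NoPE layer.
    return [
        int((layer_idx + 1) % no_rope_layer_interval != 0)
        for layer_idx in range(num_hidden_layers)
    ]


derived = resolve_no_rope_layers(8, no_rope_layer_interval=4)
explicit = resolve_no_rope_layers(4, no_rope_layers=[1, 0, 1, 0])
```

Under this reading, a conversion script that passes None leaves the decision to the config, which avoids writing a value of the wrong type into the converted checkpoint.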
* Preserve requires_grad in pre quantized model
Summary:
Discovered while running lm-eval on some models: the current
code always sets requires_grad to True.
Test Plan:
lm_eval --model hf --model_args pretrained=jerryzh168/phi4-torchao-gguf-q4_k --tasks hellaswag --device cuda:0 --batch_size 8
* ruff format
---------
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
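The pattern behind this fix can be sketched without any framework. `Param` is a stand-in class, not a real tensor type, and `replace_weight` is a hypothetical helper; the only point is that a constructor defaulting to `requires_grad=True` silently overrides the flag unless it is passed through.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """Stand-in for a framework parameter that defaults to trainable."""
    data: list
    requires_grad: bool = True


def replace_weight(module, name, new_data, preserve=True):
    """Hypothetical helper: swap in new data for module[name]."""
    old = module[name]
    if preserve:
        # Carry the flag over instead of relying on the constructor default.
        module[name] = Param(new_data, requires_grad=old.requires_grad)
    else:
        # Old behaviour: the default makes every reloaded weight trainable.
        module[name] = Param(new_data)


frozen = {"weight": Param([0.0], requires_grad=False)}
replace_weight(frozen, "weight", [1.0])

buggy = {"weight": Param([0.0], requires_grad=False)}
replace_weight(buggy, "weight", [1.0], preserve=False)
```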
* More limited setUp -> setUpClass conversion
* make fixup
* Trigger tests
* Fixup UDOP
* Missed a spot
* tearDown -> tearDownClass where appropriate
* Couple more class fixes
* Fixups for UDOP and VisionTextDualEncoder
* Ignore errors when removing the tmpdir, in case it already got cleaned up somewhere
* CLIP fixes
* More correct classmethods
* Wav2Vec2Bert fixes
* More methods become static
* More class methods
* More class methods
* Revert changes for integration tests / modeling files
* Use a different tempdir for tests that actually write to it
* Remove addClassCleanup and just use tearDownClass
* Remove changes in modeling files
* Cleanup get_processor_dict() for got_ocr2
* Fix regression on Wav2Vec2BERT test that was masked by this before
* Rework tests that modify the tmpdir
* make fix-copies
* revert clvp modeling test changes
* Fix CLIP processor test
* make fix-copies
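The shape of the conversion, sketched with the standard library only. `ProcessorSaveLoadTest` and the fixture file are made up for illustration and do not correspond to any test in the repository.

```python
import os
import shutil
import tempfile
import unittest


class ProcessorSaveLoadTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Build the expensive fixture once per class instead of once per test.
        cls.tmpdirname = tempfile.mkdtemp()
        with open(os.path.join(cls.tmpdirname, "config.json"), "w") as f:
            f.write("{}")

    @classmethod
    def tearDownClass(cls):
        # Ignore errors in case the directory was already cleaned up elsewhere.
        shutil.rmtree(cls.tmpdirname, ignore_errors=True)

    def test_reads_shared_fixture(self):
        # Read-only tests can safely share the class-level directory.
        self.assertTrue(os.path.isfile(os.path.join(self.tmpdirname, "config.json")))

    def test_writes_use_own_tmpdir(self):
        # Tests that write get a private directory so they cannot affect others.
        with tempfile.TemporaryDirectory() as tmpdir:
            path = os.path.join(tmpdir, "output.json")
            with open(path, "w") as f:
                f.write("{}")
            self.assertTrue(os.path.isfile(path))
```

The trade-off is that class-level state is shared across tests, which is why tests that modify the directory are given their own temporary directory here.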