* add more cases
* fix method not found in unittest
Signed-off-by: Lin, Fanli <fanli.lin@intel.com>
* fix more cases
* add more models
* add all
* no unittest.case
* remove for oneformer
* fix style
---------
Signed-off-by: Lin, Fanli <fanli.lin@intel.com>
* draft, run model as compressed/uncompressed mode
* draft
* run run_compressed=False
* run_compressed as attr
* set run_compressed=False using quantization_config
* remove redundant line
* make is_qat_trainable dependent on run_compressed status
* add tests
* lint
* fill in docstring
* add decompress
* comments
* decompress if model is compressed and not run_compressed
* apply_quant_config logic fix -- populate statedict properly
* comments
* remove non compressed model
* make is_compressed as property
* cosmetic
* run apply_quant_config for non-compressed models -- populate scales and zeropoints
* add pathway for decompressing sparse models
* typo on is_quantization_compressed
* lint
* fix typo
* Add files
* Init
* Add TimmWrapperModel
* Fix up
* Some fixes
* Fix up
* Remove old file
* Sort out import orders
* Fix some model loading
* Compatible with pipeline and trainer
* Fix up
* Delete test_timm_model_1/config.json
* Remove accidentally committed files
* Delete src/transformers/models/modeling_timm_wrapper.py
* Remove empty imports; fix transformations applied
* Tidy up
* Add image classification model to special cases
* Create pretrained model; enable device_map='auto'
* Enable most tests; fix init order
* Sort imports
* [run-slow] timm_wrapper
* Pass num_classes into timm.create_model
* Remove train transforms from image processor
* Update timm creation with pretrained=False
* Fix gamma/beta issue for timm models
* Fixing gamma and beta renaming for timm models
* Simplify config and model creation
* Remove attn_implementation diff
* Fixup
* Docstrings
* Fix warning msg text according to test case
* Fix device_map auto
* Set dtype and device for pixel_values in forward
* Enable output hidden states
* Enable tests for hidden_states and model parallel
* Remove default scriptable arg
* Refactor inner model
* Update timm version
* Fix _find_mismatched_keys function
* Change inheritance for Classification model (fix weights loading with device_map)
* Minor bugfix
* Disable save pretrained for image processor
* Rename hook method for loaded keys correction
* Rename state dict keys on save, remove `timm_model` prefix, make checkpoint compatible with `timm`
* Managing num_labels <-> num_classes attributes
* Enable loading checkpoints in Trainer to resume training
* Update error message for output_hidden_states
* Add output hidden states test
* Decouple base and classification models
* Add more test cases
* Add save-load-to-timm test
* Fix test name
* Fixup
* Add do_pooling
* Add test for do_pooling
* Fix doc
* Add tests for TimmWrapperModel
* Add validation for `num_classes=0` in timm config + test for DINO checkpoint
* Adjust atol for test
* Fix docs
* dev-ci
* dev-ci
* Add tests for image processor
* Update docs
* Update init to new format
* Update docs in configuration
* Fix some docs in image processor
* Improve docs for modeling
* fix for is_timm_checkpoint
* Update code examples
* Fix header
* Fix typehint
* Increase tolerance a bit
* Fix Path
* Fixing model parallel tests
* Disable "parallel" tests
* Add comment for metadata
* Refactor AutoImageProcessor for timm wrapper loading
* Remove custom test_model_outputs_equivalence
* Add require_timm decorator
* Fix comment
* Make image processor work with older timm versions and tensor input
* Save config instead of whole model in image processor tests
* Add docstring for `image_processor_filename`
* Sanitize kwargs for timm image processor
* Fix doc style
* Update check for tensor input
* Update normalize
* Remove _load_timm_model function
---------
Co-authored-by: Amy Roberts <22614925+amyeroberts@users.noreply.github.com>
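The key renaming mentioned above (dropping the `timm_model` prefix on save so that the checkpoint can be read by `timm` directly) can be pictured with plain dictionaries. This is only a sketch: the helper names, the exact prefix string `timm_model.`, and the example keys are assumptions for illustration, not the wrapper's actual code.

```python
# Hypothetical helpers; the real wrapper may implement this differently.
PREFIX = "timm_model."  # assumed prefix, based on the commit message wording


def strip_prefix(state_dict, prefix=PREFIX):
    """Return a copy of state_dict with `prefix` removed from keys that start with it."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


def add_prefix(state_dict, prefix=PREFIX):
    """Inverse step for loading: prepend `prefix` to keys that lack it."""
    return {
        (key if key.startswith(prefix) else prefix + key): value
        for key, value in state_dict.items()
    }


# Made-up keys and values standing in for real tensors.
wrapped = {"timm_model.conv1.weight": [0.1], "timm_model.fc.bias": [0.0]}
saved = strip_prefix(wrapped)   # keys as a plain timm checkpoint would expect them
restored = add_prefix(saved)    # keys as the wrapper would expect them on load
```

Keeping the two directions symmetric means a save followed by a load returns the original key set, which is the property a save-load-to-timm test would want to check.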
Original issue: https://github.com/huggingface/peft/issues/2256
There is a potential error when using load_best_model_at_end=True with a
prompt learning PEFT method. This is because Trainer uses load_adapter
under the hood but with some prompt learning methods, there is an
optimization on the saved model to remove parameters that are not
required for inference, which in turn requires a change to the model
architecture. This is why load_adapter will fail in such cases and users
should instead set load_best_model_at_end=False and use
PeftModel.from_pretrained. As this is not obvious, we now intercept the
error and add a helpful error message.
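The interception described above amounts to catching the loading failure and re-raising it with guidance. A minimal sketch of that pattern follows; the function names, the exception type, and the message wording are assumptions for illustration rather than the actual Trainer code.

```python
# Illustrative only: these names are hypothetical stand-ins, not Trainer internals.
HINT = (
    "Loading the best adapter failed. With prompt learning PEFT methods, "
    "set load_best_model_at_end=False and load the checkpoint with "
    "PeftModel.from_pretrained instead."
)


def load_best_adapter(load_adapter, checkpoint_dir):
    """Call `load_adapter` and attach a helpful hint if it raises."""
    try:
        return load_adapter(checkpoint_dir)
    except RuntimeError as exc:  # assumed exception type
        # Chain the original exception so the underlying cause stays visible.
        raise RuntimeError(f"{HINT} Original error: {exc}") from exc


def failing_load_adapter(checkpoint_dir):
    # Simulates the mismatch caused by a checkpoint trimmed for inference.
    raise RuntimeError("size mismatch for prompt_encoder")


try:
    load_best_adapter(failing_load_adapter, "checkpoint-100")
except RuntimeError as err:
    message = str(err)
```

Chaining with `from exc` keeps the original traceback available while the user sees the workaround first.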
* Support BatchNorm in Hubert pos_conv_emb as in fairseq
* Correct the new defaults (#34377)
* Correct the new defaults
* CIs
* add check
* Update utils.py
* Update utils.py
* Add the max_length in generate test checking shape without passing length
* style
* CIs
* fix fx CI issue
* [auto. ping] Avoid sending empty info + add more team members (#34383)
* update
* update
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Fix glm (#34388)
* Fix duplicated
* fix import
* Use non nested images and batched text Idefics2/3 (#34222)
* add support for non nested images and add tests
* add tests error scenario
* fix style
* added single and no image to error tests
* Fix onnx non-exportable inplace aten op (#34376)
* fix onnx non-exportable inplace op
* mistral, qwen2, qwen2_vl, starcoder2
* fixup copies
* Fix right padding in LLaVA models (#34305)
* fix right pad llavas
* device mismatch
* no filter (#34391)
* no filter
* no filter
* no filter
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* SynthID: better example (#34372)
* better example
* Update src/transformers/generation/configuration_utils.py
* Update src/transformers/generation/logits_process.py
* nits
* Tests: upgrade `test_eager_matches_sdpa_generate` (#34386)
* Fix bnb training test failure (#34414)
* Fix bnb training test: compatibility with OPTSdpaAttention
* Avoid check expected exception when it is on CUDA (#34408)
* update
* update
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Fix typos in agents_advanced.md (#34405)
* [docs] Cache implementations (#34325)
cache
* [run-slow] hubert
* Support BatchNorm in Hubert pos_conv_emb as in fairseq
Add conversion integration test, and make batchnorm explicit variable
* Support BatchNorm in Hubert pos_conv_emb as in fairseq
fix make fixup styling changes
* [run-slow] hubert
* Support BatchNorm in Hubert pos_conv_emb as in fairseq
* [run-slow] hubert
* Support BatchNorm in Hubert pos_conv_emb as in fairseq
Add conversion integration test, and make batchnorm explicit variable
* Support BatchNorm in Hubert pos_conv_emb as in fairseq
fix make fixup styling changes
* [run-slow] hubert
* [run-slow] hubert
---------
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
Co-authored-by: Rudy Delouya <rudy.delouya@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* fix GA bugs and add unit test
* narrow down model loss unit test diff gap
* format code to make ruff happy
* send num_items_in_batch argument to decoder
* fix GA loss bug in BertLMHeadModel
* use TinyStories-33M to narrow down diff gap
* format code
* missing .config
* avoid add extra args
---------
Co-authored-by: kangsheng <kangsheng@meituan.com>
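The loss mismatch these commits address (assuming GA stands for gradient accumulation) appears when per-micro-batch mean losses are averaged even though the micro-batches hold different numbers of tokens. The arithmetic below is a self-contained sketch with made-up per-token losses; `num_items_in_batch` is used here only as the total token count across the accumulated micro-batches, which is an assumption about what the argument carries.

```python
# Made-up per-token losses for two micro-batches of unequal length.
micro_batches = [
    [2.0, 2.0, 2.0, 2.0],  # 4 tokens
    [1.0],                 # 1 token
]

# Reference: one large batch, mean over all tokens.
all_tokens = [loss for batch in micro_batches for loss in batch]
full_batch_loss = sum(all_tokens) / len(all_tokens)

# Naive accumulation: mean per micro-batch, then mean over accumulation steps.
naive_loss = sum(sum(b) / len(b) for b in micro_batches) / len(micro_batches)

# Token-aware accumulation: sum each micro-batch, divide by the total token count.
num_items_in_batch = sum(len(b) for b in micro_batches)
fixed_loss = sum(sum(b) / num_items_in_batch for b in micro_batches)

print(full_batch_loss, naive_loss, fixed_loss)
```

With these numbers the naive average gives 1.5 while the full-batch and token-aware values both come out at about 1.8, so the short micro-batch is over-weighted unless the normalizer is shared across accumulation steps.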
* gpt neox flex attention + refactor
* some formatting
* small fix on dropout
* add assertion on flex attn test
* flaky ci :(
* add head mask support
* style
* handle dtype, replace torch where
* fixup flex with output attns
* code review and several other fixes
* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* style
* remove unnecessary comment
* remove incorrect comment
* make flex attn check more agnostic to versions and centralized
* change peft input dtype check to value since q and k could be affected by other stuff like RoPE
* i forgor
* flaky
* code review and small fixes
* Update src/transformers/models/gpt_neox/modeling_gpt_neox.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Use torch.nn.attention.sdpa_kernel instead of deprecated torch.backends.cuda.sdp_kernel
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
* Fix test_eager_matches_sdpa_inference for XPU backend
As of PyTorch 2.5 XPU backend supports only torch.nn.attention.SDPBackend.MATH
which is implemented on PyTorch level using aten operators and is device
agnostic with respect to implementation of each aten operator. Thus, we can
reuse CUDA (or CPU) MATH weights for XPU.
Fixes: #34888
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
* Use torch.amp.autocast instead of deprecated torch.cuda.amp.autocast in nemotron
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
---------
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
* [PEFT] Set eval mode when loading PEFT adapter
Resolves #34469
When calling model.load_adapter to load a PEFT adapter, by default the
adapter should be set to eval mode. This is now correctly done. Users
can still pass is_trainable=True to load the adapter in training mode.
* Linter
* Initial draft
* Add .jinja file loading for processors
* Add processor saving of naked chat template files
* make fixup
* Add save-load test for tokenizers
* Add save-load test for tokenizers
* stash commit
* Try popping the file
* make fixup
* Pop the arg correctly
* Pop the arg correctly
* Add processor test
* Fix processor code
* stash commit
* Processor clobbers child tokenizer's chat template
* Processor clobbers child tokenizer's chat template
* make fixup
* Split processor/tokenizer files to avoid interactions
* fix test
* Expand processor tests
* Rename arg to "save_raw_chat_template" across all classes
* Update processor warning
* Move templates to single file
* Move templates to single file
* Improve testing for processor/tokenizer clashes
* Improve testing for processor/tokenizer clashes
* Extend saving test
* Test file priority correctly
* make fixup
* Don't pop the chat template file before the slow tokenizer gets a look
* Remove breakpoint
* make fixup
* Fix error
* fix test_tiny_timestamp_generation
* fix test_large_timestamp_generation
* fix test_whisper_shortform_single_batch_prev_cond
* fix test_whisper_shortform_multi_batch_hard_prev_cond
* return_timestamps necessary with long form
* fix test_default_multilingual_transcription_long_form
* fix test_tiny_token_timestamp_generation_longform
* fix test_whisper_longform_multi_batch_hard
* Update tests/models/whisper/test_modeling_whisper.py
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* fix typo
* do not expect special tokens
* fix test_whisper_longform_single_batch_beam
* fix test_whisper_longform_multi_batch_hard_prev_cond
* update test_whisper_longform_multi_batch_hard_prev_cond
* update test_whisper_longform_multi_batch_hard_prev_cond
* these tests do not make sense anymore
* this test does not make sense anymore
* make fixup
* suggested nits
* add test with forced_decoder_ids
* this test does not make sense anymore
* change assert for unittest test cases
* make fixup
* test with prompt_ids and task and language
* fix unittest test case call
* fix test_tiny_generation
* fix test_tiny_en_generation
* fix test_tiny_en_batched_generation
* fix test_tiny_longform_timestamps_generation
* fix test_tiny_timestamp_generation
* fix test_large_generation
* fix test_large_batched_generation
* fix test_large_generation_multilingual
* fix test_large_timestamp_generation
* fix test_large_timestamp_generation
* fix test_tiny_token_timestamp_generation_longform
* fix test_tiny_en_batched_generation
* make fixup
* [run-slow] whisper
---------
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* allow unused parameter passthrough when chunking in asr pipelines
* format code
* format
* run fixup
* update tests
* update parameters to pipeline in test
* updates parameters in tests
* change spelling in gitignore
* revert .gitignore to main
* add git ignore of devcontainer folder
* assert asr output follows expected inference output type
* run fixup
* Remove .devcontainer from .gitignore
* remove compliance check
* Add Nemotron GGUF Loading Support
* fix the Nemotron architecture assignment
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Do not load for meta device
* Make some minor improvements
* Add test
* Update tests/utils/test_modeling_utils.py
Update test parameters
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Make the test simpler
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>