* Fixed typo in multi gpu docs and OLMoE version
* Fixed typos in docs for agents, agents advanced, knowledge distillation, and image feature extraction
* Fixed incorrect usage of model.image_guided_detection in zero shot object detection docs
* Use torch.nn.attention.sdpa_kernel instead of deprecated torch.backends.cuda.sdp_kernel
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
* Fix test_eager_matches_sdpa_inference for XPU backend
As of PyTorch 2.5, the XPU backend supports only torch.nn.attention.SDPBackend.MATH,
which is implemented at the PyTorch level using aten operators and is device
agnostic with respect to the implementation of each aten operator. Thus, we can
reuse CUDA (or CPU) MATH weights for XPU.
Fixes: #34888
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
* Use torch.amp.autocast instead of deprecated torch.cuda.amp.autocast in nemotron
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
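A minimal sketch of the device-agnostic autocast call, assuming a CPU tensor so it runs anywhere. The `linear` layer and inputs are illustrative only and are not the nemotron code.

```python
import torch

torch.manual_seed(0)
linear = torch.nn.Linear(8, 8)
x = torch.randn(2, 8)

# torch.amp.autocast takes the device type explicitly, so the same line works
# for "cuda", "xpu", or "cpu". The deprecated torch.cuda.amp.autocast was
# tied to CUDA.
with torch.amp.autocast(device_type=x.device.type, dtype=torch.bfloat16):
    y_autocast = linear(x)

# Autocast can also be switched off for a region that must stay in full precision.
with torch.amp.autocast(device_type=x.device.type, enabled=False):
    y_full = linear(x)

print(y_autocast.dtype, y_full.dtype)
```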
---------
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
* [PEFT] Set eval mode when loading PEFT adapter
Resolves #34469
When calling model.load_adapter to load a PEFT adapter, by default the
adapter should be set to eval mode. This is now correctly done. Users
can still pass is_trainable=True to load the adapter in training mode.
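To illustrate why the mode matters, here is a plain PyTorch sketch. It does not call PEFT or model.load_adapter; `TinyAdapter` is a hypothetical stand-in for an adapter that contains dropout. In eval mode dropout is disabled, so repeated forward passes give identical outputs.

```python
import torch
from torch import nn


class TinyAdapter(nn.Module):
    """Hypothetical adapter-like module with dropout (not a PEFT class)."""

    def __init__(self):
        super().__init__()
        self.dropout = nn.Dropout(p=0.5)
        self.proj = nn.Linear(4, 4, bias=False)

    def forward(self, x):
        return self.proj(self.dropout(x))


torch.manual_seed(0)
adapter = TinyAdapter()
x = torch.ones(1, 4)

# Eval mode: dropout is a no-op, so inference is deterministic.
adapter.eval()
deterministic_in_eval = torch.equal(adapter(x), adapter(x))

# Training mode must be requested explicitly, analogous to is_trainable=True.
adapter.train()
training_flag = adapter.training
print(deterministic_in_eval, training_flag)
```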
* Linter
* explain release_memory
* Update docs/source/en/llm_tutorial_optimization.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Add docs/source/ar/benchmarks.md to Add_docs_source_ar_benchmarks.md
* Update docs/source/ar/benchmarks.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/benchmarks.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/benchmarks.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/benchmarks.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/benchmarks.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/benchmarks.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/benchmarks.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/benchmarks.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/benchmarks.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/benchmarks.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/benchmarks.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update _toctree.yml
* Update benchmarks.md
---------
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Initial draft
* Add .jinja file loading for processors
* Add processor saving of naked chat template files
* make fixup
* Add save-load test for tokenizers
* Add save-load test for tokenizers
* stash commit
* Try popping the file
* make fixup
* Pop the arg correctly
* Pop the arg correctly
* Add processor test
* Fix processor code
* stash commit
* Processor clobbers child tokenizer's chat template
* Processor clobbers child tokenizer's chat template
* make fixup
* Split processor/tokenizer files to avoid interactions
* fix test
* Expand processor tests
* Rename arg to "save_raw_chat_template" across all classes
* Update processor warning
* Move templates to single file
* Move templates to single file
* Improve testing for processor/tokenizer clashes
* Improve testing for processor/tokenizer clashes
* Extend saving test
* Test file priority correctly
* make fixup
* Don't pop the chat template file before the slow tokenizer gets a look
* Remove breakpoint
* make fixup
* Fix error
* change apply_rotary_pos_emb
* upload for glm-edge
* remove useless part
* follow the suggestion
* fix
* format
* format
* test
* format again
* format again
* remove modular change
* remove modular change
* does this apply_rotary_pos_emb need to be modified?
* fix with this
* format
* format
* ruff check
* modify modular_glm failed
* remove partial_rotary_factor of function partial_rotary_factor
* fix wrong change of examples/research_projects
* revert
* remove line 118
* use q_rot
* fix test_tiny_timestamp_generation
* fix test_large_timestamp_generation
* fix test_whisper_shortform_single_batch_prev_cond
* fix test_whisper_shortform_multi_batch_hard_prev_cond
* return_timestamps necessary with long form
* fix test_default_multilingual_transcription_long_form
* fix test_tiny_token_timestamp_generation_longform
* fix test_whisper_longform_multi_batch_hard
* Update tests/models/whisper/test_modeling_whisper.py
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* fix typo
* do not expect special tokens
* fix test_whisper_longform_single_batch_beam
* fix test_whisper_longform_multi_batch_hard_prev_cond
* update test_whisper_longform_multi_batch_hard_prev_cond
* update test_whisper_longform_multi_batch_hard_prev_cond
* these tests do not make sense anymore
* this test does not make sense anymore
* make fixup
* suggested nits
* add test with forced_decoder_ids
* this test does not make sense anymore
* change assert for unittest test cases
* make fixup
* test with prompt_ids and task and language
* fix unittest test case call
* fix test_tiny_generation
* fix test_tiny_en_generation
* fix test_tiny_en_batched_generation
* fix test_tiny_longform_timestamps_generation
* fix test_tiny_timestamp_generation
* fix test_large_generation
* fix test_large_batched_generation
* fix test_large_generation_multilingual
* fix test_large_timestamp_generation
* fix test_large_timestamp_generation
* fix test_tiny_token_timestamp_generation_longform
* fix test_tiny_en_batched_generation
* make fixup
* [run-slow] whisper
---------
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* Updated documentation and added conversion utility
* Update docs/source/en/tiktoken.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tiktoken.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Moved util function to integration folder + allow for str
* Update formatting
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Updated formatting
* style changes
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>