Yao Matrix
8f6b27eb5c
enable `test_assisted_decoding_in_different_gpu` test on XPU ( #37120 )
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-04-01 11:22:59 +02:00
jiqing-feng
737cbd2109
Fix llava xpu tests. ( #37130 )
* fix llava 4bit xpu test
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix llava 4bit xpu test
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix format
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix format
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-04-01 11:10:13 +02:00
jiqing-feng
3a6ab46a0b
add gpt2 test on XPU ( #37028 )
* add gpt2 test on XPU
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* auto dtype has been fixed
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* convert model to train mode
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-04-01 11:09:29 +02:00
cyyever
786d9c5ed9
Fix more inefficient PT operations ( #37060 )
* Fix inefficient operations
* Remove cpu() call
* Reorder detach()
* Reorder detach()
* tolist without detach
* item without detach
* Update src/transformers/models/rag/modeling_rag.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update tests/models/encodec/test_modeling_encodec.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Use detach().cpu().numpy
* Revert some numpy operations
* More fixes
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-03-31 16:31:24 +01:00
Pavel Iakubovskii
a1e389e637
Refactor `return_dict` logic to remove complicated if/else paths ( #36794 )
* SAM
* CLIP
* SigLIP
* GOT-OCR2 (depends on SAM)
* SigLIP2 (depends on SigLIP)
* trigger tests
* Fix SAM
* Fix missed indexing, use named attributes
* Llama
* Aria
* Bamba
* Update llama: missed outputs return type
* (fixup) Aria
* DiffLlama
* Emu3
* Gemma
* Gemma2
* Paligemma
* Fix paligemma
* Gemma3
* GLM
* Helium
* JetMoe
* Jamba
* Mistral
* Mistral
* Mixtral
* Nemotron
* Olmo
* Olmo2
* Persimmon
* Phi
* Phi3
* PhiMoe
* Qwen2
* Qwen2_moe
* StableLM
* Starcoder2
* Add return_dict decorator
* SAM
* Update decorator: compile, export, trace - friendly
* Llama (decorator)
* SAM (decorator)
* Add decorator `can_return_tuple`
* Llama
* Update to decorator
* Update CLIP
* Update decorator to store `_is_top_level_module` in self
* Update decorator to correctly handle compile/export
* Remove is_torchdynamo_compiling constraint, all work fine with self attribute assignment
* Typing
* GPT NeoX
* Fixup
* Fix attribute Granite
* Fix return type mixtral
* Update Gemma3
* Fix Cohere and Cohere2
* Fixup
* Fix corner case for Phi4, when activation is shared
* (fix-copies) deepseekv3, phi4
* Fixup
* Apply to qwen3/qwen3_moe
* Fix
2025-03-31 16:23:37 +01:00
Cyril Vallez
f304318f5f
Remove low_cpu_mem_usage and _fast_init ( #36963 )
* Remove low_cpu_mem_usage and _fast_init
* Update deepspeed.py
* Update modeling_utils.py
* remove the first 2 tests everywhere
* Update test_modeling_common.py
* remove what was remaining about fast_init
* fix logic and simplify
* mismatched keys logic update
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* fix 2 models init_weights
* extend to others
* remove grad
* Update modeling_fsmt.py
* init weights in tests
* style
* Update test_modeling_fsmt.py
* more old models
* fix more init_weights
* copies
* fix
* style
* Update modeling_lxmert.py
* fix inits
* more and more
* more
* should finalize
* style
* Update modeling_dinov2_with_registers.py
* fix
* Update modeling_encoder_decoder.py
* fix
* style
* Update modeling_lxmert.py
* post rebase cleanup
* Update modeling_informer.py
* back to start for device
* fix
* add test to detect all failing cases correctly
* Update test_modeling_common.py
* fix
* fix
* sam
* style
* Update modeling_maskformer_swin.py
* CIs
* CIs
* remove test - will add it on separate PR
* fix
* fix
* Update modeling_sam.py
* CIs
* CIs
* CIs
* convnext
* suggestions
* CIs
* fix copies after merge
---------
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-03-31 17:18:43 +02:00
Raushan Turganbay
8805600406
[qwen3] fix generation tests ( #37142 )
* do not skip tests
* fix qwen3-moe as well
* fixup
* fixup
2025-03-31 16:33:41 +02:00
Zhen
e686fed635
[Feature] Support using FlashAttention2 on Ascend NPU ( #36696 )
* [Feature] Support using flash-attention on Ascend NPU
* Fix qwen3 and qwen3_moe modular conversion mismatch
2025-03-31 16:12:58 +02:00
Yih-Dar
a03cee7a1d
skip ( #37141 )
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-31 15:38:40 +02:00
Guang Yang
3b07ca78bb
Export T5 (encoder-decoder) to ExecuTorch ( #36486 )
Co-authored-by: Guang Yang <guangyang@fb.com>
2025-03-31 12:10:26 +02:00
Fanli Lin
475664e2c6
[tests] remove cuda-only test marker in `AwqConfigTest` ( #37032 )
* enable on xpu
* add xpu support
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-03-31 11:53:02 +02:00
Armaghan Shakir
0710e9b1e8
Create and Expose SamVisionModel as public for better accessibility ( #36493 )
* move encoder below
* auto modeling
* write SamVisionTester
* fix vision attention shape
* fix SamVisionTest
* minor changes to SamVisionTest
* Revert "fix vision attention shape"
This reverts commit d2a4083ae5.
* fix attention output shape in new tests
* remove encoder examples
* run modular on got_ocr2
* code formatting
* fix got_ocr2
* ruff fixes
* code quality
* add sam_vision in auto modeling and auto configuration
* remove composite test
* updated index.md
* add TFSamVisionEncoder to __init__
* fix public TFSamVisionEncoder
* remove outdated todo comment
* set test_torch_exportable
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* rename: VisionEncoder -> VisionModel
* bring back original SamVisionEncoder
* rename back: VisionEncoderOutput -> VisionModelOutput
* undo changes in SamModelTester
* reuse SamVisionEncoder in SamVisionModel
---------
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-03-31 11:45:07 +02:00
cyyever
f99c279d20
Remove deprecated code ( #37059 )
* Remove deprecated code
* fix get_loading_attributes
* fix error
* skip test
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-03-31 11:15:35 +02:00
huismiling
d0b65bb479
[MLU] Fix FA2 check error, remove deepspeed-mlu deps. ( #36159 )
* add Cambricon MLUs support
* fix mlu device rng state
* up for quality check
* up mlu to support fp16
* fix mlu device dependency error
* fix mlu device dependency error
* enable mlu device for bf16
* fix mlu device memory tracker
* Cambricon support SDPA and flash_attn
* MLU devices: Checks if `mlu` is available via a `cndev-based` check which won't trigger the drivers and leave mlu
* Fix mlu FA2 check. Remove deepspeed-mlu check. add mlu tests support.
* fix testing errors.
* Merge branch 'hf/main' into main
* fix get_device_count error.
* fix mlu testing utils.
* fix code quality and style.
* switch to @require_torch_multi_accelerator
2025-03-31 11:02:49 +02:00
jiqing-feng
286393fbb1
enable tp on CPU ( #36299 )
* enable tp on CPU
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* get rank from cpu
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* update
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* enable TP tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix comment
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* em print
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix model id
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix conflict
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix index and add doc
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-03-31 10:55:47 +02:00
Bo Zheng
6acd5aecb3
Adding Qwen3 and Qwen3MoE ( #36878 )
* Initial commit for Qwen3
* fix and add tests for qwen3 & qwen3_moe
* rename models for tests.
* fix
* fix
* fix and add docs.
* fix model name in docs.
* simplify modular and fix configuration issues
* Fix the red CI: ruff was updated
* revert ruff, version was wrong
* fix qwen3moe.
* fix
* make sure MOE can load
* fix copies
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
2025-03-31 09:50:49 +02:00
Joao Gante
9fd9476005
[generate] beam search -- fix output cropping ( #37080 )
* handle jagged beams
* better comment
* bart -- beam search tests print special tokens
* more bart test updates
* more tests!
* better comment
2025-03-28 18:57:51 +01:00
Minho Ryu
eca74d1367
[WIP] add deepseek-v3 ( #35926 )
* init commit
* style
* take comments into account
* add deepseekv3 modeling
* remove redundant code
* apply make style
* apply fix-copies
* make format
* add init files
* rename deepseekv3 into deepseek_v3 based on its model_type
* rename deepseekv3 into deepseek_v3 based on its model_type
* deepseek-v3 not deepseek_v3
* set model_type as deepseek_v3
* use default docs
* apply make
* fill type and docstring
* add rope_config_validation
* use custom DeepseekV3MLP
* hold code only for checkpoints configuration; remove redundant
* revise rope yarn for DeepSeek variation
* rename DeepSeek-V3
* some refactoring
* revise load_hook to work properly; make moe func trainable; use llama instead of mixtral
* fix attention forward
* use -1 for unchanged dims when using expand
* refactor DeepseekV3TopkRouter
* use reshape_for_rope instead of load_hook; revise attention forward for TP; rename q_head_dim with qk_head_dim
* register pre_hook and hook both
* make style
* use n_shared_experts
* Update src/transformers/models/deepseek_v3/configuration_deepseek_v3.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add test file
* update modeling_file according to modular file
* make style
* add mapping for DeepseekV3ForSequenceClassification
* remove aux_loss_alpha
* add deepseek_v3 for perf
* add deepseek_v3
* rename test as deepseekv3
* use tiny-deepseek-v3
* remove DeepseekV3ForSequenceClassification
* cache before padding
* remote output_router_logits
* Revert "remote output_router_logits"
This reverts commit f264f800d0.
* remove output_router_logits
* make e_score_correction_bias as buffer
* skip tests not compatible
* make style
* make e_score_correction_bias as buffer
* use rope_interleave instead of load_hook
* skip tests not compatible with MLA
* add doc for rope_interleave
* fix typo
* remove torch.no_grad for selecting topk
* fix post merge issue
* merge with main and simplify
* nits
* final
* small fixes
* fix
* support TP better
* stash
* changes currently requires
* remove synch
* more fixes for TP
* temp fix for TP: some attention layers' FP8 scales are too small + shared is local colwise and anything is local if FP8 because weights are used
* updates to have generation work!
* push most of the changes
* reorder functions + call for contributions!
* update readme
* nits
* update
* ruff was updated on main
* merge with main and fix copies
* revert unrelated changes
* route all tokens to all experts when testing to avoid no gradient issues
* finish fixing all tests
* fixup
* nit
* clean config
* last readme changes
* nit
* do cnit
* typo
* last nit
* one more one more
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: arthur@huggingface.co <arthur@ip-26-0-165-131.ec2.internal>
2025-03-28 15:56:59 +01:00
Yih-Dar
1fcaad6df9
Use `lru_cache` for tokenization tests ( #36818 )
* fix
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-28 15:09:35 +01:00
cyyever
6cc9c8d7d1
Remove deprecated batch_size parameter ( #37007 )
2025-03-27 15:01:56 +00:00
cyyever
41a0e58e5b
Set weights_only in torch.load ( #36991 )
2025-03-27 14:55:50 +00:00
Joao Gante
29f322d04d
[generate, cache] handle more complex device maps ( #37014 )
2025-03-27 14:33:20 +00:00
eustlb
fb8e6c50e4
[audio utils] fix fft_bin_width computation ( #36603 )
* fix fft_bin_width computation
* update docstring + enforce correct params
* update test with correct value
* update test
* update feature extractors for concerned models
* update
* make
* update docstring
* update docstring
2025-03-27 15:20:02 +01:00
Raushan Turganbay
e97c760006
[chat templates] support loading audio from video ( #36955 )
* add audio from video
* typos
* delete print
* comments
2025-03-27 14:46:11 +01:00
Sungyoon Jeong
d1eafe8d4e
Optimize `to_py_obj` for python-native numeric lists and scalars ( #36885 )
* Optimize to_py_obj for python-native numeric lists and scalars
* Fix bug that tuple is not converted to list
* Try np.array for more robust type checking
* Apply review and add tests for to_py_obj
2025-03-27 14:16:46 +01:00
jiqing-feng
0e56fb69a2
fix pegasus init weights and other copied models ( #36844 )
* fix pegasus init weights
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix the rest of models
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix test
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix informer init
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* init weight before checking
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix roformer tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix roformer tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-03-27 14:14:30 +01:00
Mohamed Mekkouri
92429057d9
Skip FP8 linear tests for device capability < 9.0 ( #37008 )
* skip fp8 linear
* add capability check
* format
2025-03-27 12:38:37 +01:00
Yih-Dar
d13c390d01
Mark 2 tests as flaky for now ( #37038 )
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-27 10:59:47 +01:00
Abu Bakr Soliman
49b5ab6a27
Support QuestionAnswering Module for ModernBert based models. ( #35566 )
* push ModernBertForQuestionAnswering
* update ModernBertForQuestionAnswering
* update __init__ loading
* set imports for ModernBertForQuestionAnswering
* update ModernBertForQuestionAnswering
* remove debugging logs
* update init_weights method
* remove custom initialization for ModernBertForQuestionAnswering
* apply make fix-copies
* apply make style
* apply make fix-copies
* append ModernBertForQuestionAnswering to the pipeline supported models
* remove unused file
* remove invalid autoload value
* update en/model_doc/modernbert.md
* apply make fixup command
* make fixup
* Update dummies
* update usage tips for ModernBertForQuestionAnswering
* update usage tips for ModernBertForQuestionAnswering
* add init
* add lint
* add consistency
* update init test
* change text to trigger stuck text
* use self.loss_function instead of custom loss
By @Cyrilvallez
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* Update modeling_modernbert.py
make comparable commit to even it out
* Match whitespace
* whitespace
---------
Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Orion Weller <wellerorion@gmail.com>
Co-authored-by: Orion Weller <31665361+orionw@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-03-26 21:24:18 +01:00
cyyever
2b550c47b2
Remove deprecated training arguments ( #36946 )
* Remove deprecated training arguments
* More fixes
* More fixes
* More fixes
2025-03-26 16:44:48 +00:00
cyyever
e7139d06f5
Fix tensor dtype mismatch ( #36985 )
* Fix tensor dtype mismatch
* update
* update
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-26 10:37:46 +01:00
湛露先生
ebd2029483
Change GPUS to GPUs ( #36945 )
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-03-25 17:25:39 +01:00
Yih-Dar
c6814b4ee8
Update ruff to `0.11.2` ( #36962 )
* update
* update
* update
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-25 16:00:11 +01:00
Joao Gante
bc1c90a755
[Utils] torch version checks optionally accept dev versions ( #36847 )
2025-03-25 10:58:58 +00:00
Raushan Turganbay
0f733110a6
Support `return_tensors` in audio chat templates ( #34601 )
* add audio chat templates
* update
* update
* nit
* green ci
* we don't care about the order anymore
* clean up after rebase
* overridden tests rename
* rename shieldgemma also
* one more rename
* require_read_token
* remove images/videos
* retrigger CI flaky
2025-03-25 11:08:47 +01:00
Afanti
19085c28da
fix typos in the tests directory ( #36932 )
* chore: fix typos in test codes
* chore: fix typos in test codes
* chore: fix typos in test codes
* chore: fix typos in test codes
* chore: fix typos in test codes
* chore: fix typos in test codes
* chore: fix typos in test codes
* chore: fix typos in test codes
* chore: format codes
2025-03-25 10:49:24 +01:00
Guang Yang
69bcb86c58
Export for Phi4-mini ( #36780 )
* Export for Phi4-mini
* Update tests/models/phi3/test_modeling_phi3.py
---------
Co-authored-by: Guang Yang <guangyang@fb.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-03-25 10:46:38 +01:00
Cyril Vallez
4303d88c09
Add Phi4 multimodal ( #36939 )
* raw start
* update
* update
* add to imports
* update
* up
* simplify configs
* clean configs
* style
* typos
* Update convert_phi4_multimodal_weights_to_hf.py
* Update convert_phi4_multimodal_weights_to_hf.py
* fix
* up
* up
* up
* Update convert_phi4_multimodal_weights_to_hf.py
* Update convert_phi4_multimodal_weights_to_hf.py
* up
* up
* up
* Update feature_extraction_phi4_multimodal.py
* up
* up
* up
* up
* up
* simplify configs
* typo
* cut code
* typo
* typo
* typo
* re
* typo
* up
* up
* up
* add tests
* fix
* fix
* Update test_modeling_phi4_multimodal.py
* up
* Update test_modeling_phi4_multimodal.py
* doc
* fix
* up
* up
* up
* up
* up
* up
* simplify
* up
* simplify
* config docstrings
* cleanup
* clean
* typo
* typo
* fix
* Update phi4_multimodal.md
* fix
* fix
* Update test_modeling_phi4_multimodal.py
* update
* simplify reshapes and permutes
* up
* simplify special tokens
* simplify processor a lot
* Update processing_phi4_multimodal.py
* Update processing_phi4_multimodal.py
* switch to fast processor
* image processor
* Update image_processing_phi4_multimodal_fast.py
* add lora extraction to converter
* Update convert_phi4_multimodal_weights_to_hf.py
* Update __init__.py
* add AudioInput type in audio_utils
* rewrite feature_extraction: support torch batched FFT
* input_audio_embeds -> audio_input_features, input_image_embeds -> image_pixel_values
* test update
* not mono channel warning update
* remove auto maps from processor
* kargs dispatch in processor
* simplify kwargs dispatch
* simplify merging
* remove default sampling rate
* style
* Update test_modeling_phi4_multimodal.py
* update doc
* doc
* torch only feature extractor
* make fake tokens adjustable
* Update feature_extraction_phi4_multimodal.py
* fix
* Update processing_phi4_multimodal.py
* simplify mask
* last touch
* fix copies
* style
* Update audio_utils.py
* style
* Update feature_extraction_phi4_multimodal.py
* Update __init__.py
* docstrings
* copies
* fix all checks
* back to fix-copies
* trigger CIs
* Update feature_extraction_phi4_multimodal.py
* improve tests with multimodal inputs
* trigger CIs
---------
Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
2025-03-25 09:55:21 +01:00
Raushan Turganbay
47e5432805
Deprecate #36741 and map Causal to Conditional ( #36917 )
* deprecate the prev fix
* reword warning and update docs
* reword warning
* tests
* don't bloat `get_text_config()`
2025-03-25 09:13:56 +01:00
Yoni Gozlan
91455c1825
Fix processor kwargs qwen2 vl ( #36890 )
* Fix qwen2_vl and qwen2_5_vl processors custom images kwargs
* change version warning
2025-03-24 13:19:26 -04:00
gautham
48385aa4f4
Added support for seed in `DataCollatorForWholeWordMask` ( #36903 )
* Added support for seed in `DataCollatorForWholeWordMask`, and also wrote tests.
Also fixed bugs where the code hardcoded values for mask replacement probability and random replacement probability, instead of using the values passed by the user.
* formatting issues
* Used better way to generate seed in TF. Made tests more consistent.
2025-03-24 16:57:17 +00:00
omahs
cbf924b76c
Fix typos ( #36910 )
* fix typos
* fix typos
* fix typos
* fix typos
2025-03-24 14:08:29 +00:00
Yih-Dar
340500b1a9
Use another repo. for Mistral3 processor testing ( #36925 )
* fix
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-24 14:36:05 +01:00
Raushan Turganbay
57f551c78d
[chameleon] fix num image token check ( #36918 )
* [chameleon] fix num image token check
* embed after merging image token
* skip this also
* mistral require_read_token
2025-03-24 12:36:08 +01:00
Yoni Gozlan
beb9b5b022
Fix Pan and Scan on batched images Gemma3 ( #36864 )
* process flattened images in fast image proc
* process flattened images in low proc and add tests
* remove print
* add unbalanced batch test pas image proc
* fix integration tests
2025-03-21 13:56:00 -04:00
Cyril Vallez
dd3933dd65
Simplify keep_in_fp32_modules logic ( #36722 )
* better regex everywhere
* fix
* Update test_modeling_instructblip.py
* BC with explanations this time otherwise it makes no sense at all
* Update test_modeling_instructblip.py
* style
* CIs
* update _keep_in_fp32_modules in blip2
* Update modeling_utils.py
* Update modeling_utils.py
* style
* CIs
* add check
* trigger CIs
* Update modeling_utils.py
* trigger CIs
2025-03-21 16:12:59 +01:00
Sukriti Sharma
90e2df5d55
fix: loss computation after embeddings resize - mllama ( #36840 )
* move loss to generation class
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* code cleanup
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* test for resize and loss computation
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fix tests
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fix:test for resize and loss
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fix resize embedding mllama test
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* review changes
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
---------
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
2025-03-21 14:47:59 +01:00
Raushan Turganbay
523f6e743c
Fix: dtype cannot be str ( #36262 )
* fix
* this wasn't supposed to be here, revert
* refine tests a bit more
2025-03-21 13:27:47 +01:00
Pablo Montalvo
2638d54e78
Gemma 3 tests expect greedy decoding ( #36882 )
tests expect greedy decoding
2025-03-21 12:36:39 +01:00
Joao Gante
94f487626a
[generate] model defaults being inherited only happens for newer models ( #36881 )
2025-03-21 11:01:09 +00:00