* Add Doge Model
* Fix code quality
* Rollback an error commit
* Fix config for open-source weights
* Revert "Fix config for open-source weights"
This reverts commit 229cdcac10.
* Add modular_doge
* Update Doge inherits from Llama
* Fix import bug
* [docs] Add usage of doge model
* Fix Doge import pretrainedconfig from modeling_utils to configuration_utils
* [docs] remove trust remote code from doge
* Fix dynamo bug in doge model
* Update docstrings
* Import apply_rotary_pos_emb and repeat_kv from Llama
* Fix all nits
* Fix code quality
* Fix some bugs
* Fix code quality
* Remove inherited `_update_causal_mask` from Llama
This leads to incorrect weight initialization.
* Fix the wrong tensor orderings in DogeCDMoE
* Fix attention mask bug
We have to provide attention_mask for dynamic mask computation
* Modify most implementations to inherit from Llama
But there are two problems:
1. `flex_attention_forward` is not updated properly
2. `Example` error in the forward method of DogeForCausalLM
* Modify CDMoE for batch efficient implementation
* Uniform MoE configuration names, just like QwenMoE
* Fix code quality
* Fix code quality
* Fix code quality
* Add tp plan of CDMoE Module
* Hybird DMA with sliding window
* Update valid tokens greater than window size
* Fix code quality
* Add `convert_doge_weights_to_hf`
* Fix STATE_DICT_MAPPING in convert_doge_weights_to_hf.py
* Fix nits in modular_doge
* Fix code quality
* Fix all nits
* Fix all nits
* Make sure the attention function is updated inside the class
* Fix code quality issues in the Doge model and add a test for it
* Fix `test_generate`
* Fix code quality
* Fix nits fllowing suggestions
* Fix code quality
* Fix code quality issues
* Fix nits
* Fix code quality nits
* Fix the missing parameters in the configuration.
* Fix the missing parameters in the configuration.
* Fix nits
* Add initialization of attention
* Fix last nits
* Simplify dynamic mask generation logic
* Rename router_logits to gate_logits for matching latest changes of MixtralModel
* Rename typings for matching latest changes of MixtralModel
* Fixes typo in comment
* Update src/transformers/models/doge/modular_doge.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Fix code quality issues to match other modular
* Fix code quality issues to match other modular
* Fix the static compilation errors
* Update model weights link
* Fix code quality issues to match other modular
* reapply modular and support for new outputs
* style
* simplify a lot
* fix import location
* reapply modular
* fix
* fix integration test
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* Fix errors when use verl to train GLM4.1v model
* Support glm4v load from AutoModelForVision2Seq
* Set glm4v model _checkpoint_conversion_mapping attr from None to {}
* Update modeling_auto.py
* fix(decoding): stop beam search per-instance when heuristic satisfied
Previously, when early_stopping is set to `False`, the early-stopping heuristic only halted generation when **all** batch instances reached the criterion. This caused instances that are impossible (suggested by the heuristic) to improve keep generating, leading to inconsistent and overlong outputs across the batch.
Now we apply the heuristic **per-instance**: once a certain instance of batch has its all beams impossibe to improve, we mark that instance finished while letting others continue. This restores expected behavior and ensures consistency in batched generation.
* Add test case GenerationIntegrationTests.test_beam_search_early_stop_heuristic
* Update naming improvement_possibility -> is_early_stop_heuristic_unsatisfied
* Add comments for early stop heuristic
* Update src/transformers/generation/utils.py
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
- Complete Apache License text in Italian documentation
- Remove duplicate variable assignment in Perceiver converter
- Fix typo in MODEL_FOR_VISION_2_SEQ_MAPPING_NAMES constant
* chameleon xpu bnb groundtruth update on bnb triton backend since we are
deprecating ipex backend
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* enable hqq uts on XPU, all passed
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* fix style
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* fix comment
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
---------
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* update the glm4 model readme
* update test
* update GLM-4.1V model
* update as format
* update
* fix some tests
* fix the rest
* fix on a10, not t4
* nit: dummy import
---------
Co-authored-by: raushan <raushan@huggingface.co>
* Update LED model card
* Remove extra arguments
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* fix bug using FSDP V1 will lead to model device not properly set
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
* update the code
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
---------
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
* [video processors] Support float fps for precise frame sampling
Enable fractional fps values (e.g., 1.5, 29.97) in video processors
for more precise frame sampling control.
- Change fps type from int to float across all video processors
- Maintain backward compatibility with integer values
Extends: #38105
* [video processors] Refine fps typing to Union[int, float]
Change fps type from Optional[float] to Optional[Union[int, float]]
for more explicit type information about supporting both integer
and floating-point frame rates.
- Update type hints and docstrings across 8 files
- Maintain backward compatibility
- Clarify support for both int and float values
Extends: #38105
* Revert "[video processors] Support float fps for precise frame sampling"
This reverts commit 7360d6e661.
* just update 2 files
* update other models as well just making fix-copies
* also add the changes needed to modeling utils
* put this on the pretrained model instead
* nits and fixes
* update generic, fix to use config value
* update other modelings
* use transformers kwargs instead
* update
* update
* update other models
* update
* updates
* update
* update
* update
* fix
* finally
* very small nits
* this fixes more tests
* fix other models as well!
* update modularqwen2
* update models based on qwen2
* update
* update
* remove the **flash stuff in favor of noraml kwargs
* update
* propagate gemma?
* remove output attentions
* propagate
* support cross attention edge case
* same
* test this
* fixes
* more fix
* update
* update
* fix conflicts
* update
* fix emu3
* fix emu3
* move the fix a bit
* quel enfer
* some fixes, loss_kwargs should never had been
* finish fixing gemma3n
* fix small lm3
* fix another one
* fix csm now
* fux csm and mistral
* fix mistral now
* small fixes
* fix janusss
* only for some models
* fixup
* phix phi3
* more fixes?
* dose this fix it?
* update
* holy shit it was just graph breaks
* protect torch
* updates
* fix samhq?
* fix moonshine
* more moonshine fixes, 3 failures left!
* nits
* generic needs to support more
* more fixes to moonshine!
* fix cross attention outputs!
* fix csm!
* nits
* fix stupid kosmos2
* current updates
* fixes
* use output recorder?
* nicer!
* a little bit of magic
* update
* fix protect
* fix
* small fixes
* protect import
* fix a bunch of more models
* fix fixups
* fix some of the last ones
* nit
* partly fix phi
* update
* fix import path
* make something that is fullgraph compatible just to be sure
* typing was wrong on llama so the rest was wrong as well
* fucking ugly but at least it is still exportable
* syle
* supposed to fix moonshine, it still breaks
* fix some default
* fix the last bits of sam
* update samhq
* more fixes to am hq
* nit
* fix all output+hidden states and output_attentions!
* fix?
* fix diffllama
* updates to fix initialization on the sam pips
* ups there was a bug
* fix the last sam hq test
* fix gotocr
* fix gotocr2!
* fixes
* skip stupid tests
* there was one left :)
* fixup
* fix fix copies issues with this test file
* fix copies for sam_hq
* rm some comments
* skip 2 more failing tests
* fix
* fix everything
* Apply suggestions from code review
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
* add more doc!
* fix public init
* fix modular qwen3
---------
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>