* Initial commit for Qwen3
* fix and add tests for qwen3 & qwen3_moe
* rename models for tests.
* fix
* fix
* fix and add docs.
* fix model name in docs.
* simplify modular and fix configuration issues
* Fix the red CI: ruff was updated
* revert ruff, version was wrong
* fix qwen3moe.
* fix
* make sure MOE can load
* fix copies
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
* init commit
* style
* take comments into account
* add deepseekv3 modeling
* remove redundant code
* apply make style
* apply fix-copies
* make format
* add init files
* rename deepseekv3 into deepseek_v3 based on its model_type
* rename deepseekv3 into deepseek_v3 based on its model_type
* deepseek-v3 not deepseek_v3
* set model_type as deepseek_v3
* use default docs
* apply make
* fill type and docstring
* add rope_config_validation
* use custom DeepseekV3MLP
* hold code only for checkpoints congifuration; remove redundant
* revise rope yarn for DeepSeek variation
* rename DeepSeek-V3
* some refactoring
* revise load_hook to work properly; make moe func trainable; use llama instead of mixtral
* fix attention forward
* use -1 for not-changing dim when to use exapnd
* refactor DeepseekV3TopkRouter
* use reshape_for_rope instead of load_hook; revise attention forward for TP; rename q_head_dim with qk_head_dim
* register pre_hook and hook both
* make style
* use n_shared_experts
* Update src/transformers/models/deepseek_v3/configuration_deepseek_v3.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add test file
* update modeling_file according to modular file
* make style
* add mapping for DeepseekV3ForSequenceClassification
* remove aux_loss_alpha
* add deepseek_v3 for perf
* add deepseek_v3
* rename test as deepseekv3
* use tiny-deepseek-v3
* remove DeepseekV3ForSequenceClassification
* cache before padding
* remote output_router_logits
* Revert "remote output_router_logits"
This reverts commit f264f800d0.
* remove output_router_logits
* make e_score_correction_bias as buffer
* skip tests not compatible
* make style
* make e_score_correction_bias as buffer
* use rope_interleave instead of load_hook
* skip tests not compatible with MLA
* add doc for rope_interleave
* fix typo
* remove torch.no_grad for selecting topk
* fix post merge issue
* mrege with main and simplify
* nits
* final
* small fixes
* fix
* support TP better
* stash
* changes currently requires
* remove synch
* more fixes for TP
* temp fix for TP : some attention layers's FP8 scales are too small + shared is local colwise and anything is local if FP8 because weights are used
* updates to have generation work!
* push most of the changes
* reorder functions + call for contributions!
* update readme
* nits
* update
* ruff was updated on main
* merge with main and fix copies
* revert unrelated changes
* route all tokens to all experts when testing to avoid no gradient iddues
* finish fixing all tests
* fixup
* nit
* clean config
* last readme changes
* nit
* do cnit
* typo
* last nit
* one more one more
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: arthur@huggingface.co <arthur@ip-26-0-165-131.ec2.internal>
* add audio chat templates
* update
* update
* nit
* green ci
* we dont care about the order anymore
* clean up after rebase
* overriden tests rename
* rename shieldgemma also
* one more rename
* require_read_token
* removde images/videos
* retrigger CI flaky
* chore: fix typos in test codes
* chore: fix typos in test codes
* chore: fix typos in test codes
* chore: fix typos in test codes
* chore: fix typos in test codes
* chore: fix typos in test codes
* chore: fix typos in test codes
* chore: fix typos in test codes
* chore: format codes
* process flattened images in fast image proc
* process flattened images in low proc and add tests
* remove print
* add unbalanced batch test pas image proc
* fix integration tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* fix: format codes
* chore: fix copy mismatch issue
* fix: format codes
* chore: fix copy mismatch issue
* chore: fix copy mismatch issue
* chore: fix copy mismatch issue
* chore: restore previous words
* chore: revert unexpected changes
* add prompt depth anything model by modular transformer
* add prompt depth anything docs and imports
* update code style according transformers doc
* update code style: import order issue is fixed by custom_init_isort
* fix depth shape from B,1,H,W to B,H,W which is as the same as Depth Anything
* move prompt depth anything to vision models in _toctree.yml
* update backbone test; there is no need for resnet18 backbone test
* update init file & pass RUN_SLOW tests
* update len(prompt_depth) to prompt_depth.shape[0]
Co-authored-by: Joshua Lochner <admin@xenova.com>
* fix torch_int/model_doc
* fix typo
* update PromptDepthAnythingImageProcessor
* fix typo
* fix typo for prompt depth anything doc
* update promptda overview image link of huggingface repo
* fix some typos in promptda doc
* Update image processing to include pad_image, prompt depth position, and related explanations for better clarity and functionality.
* add copy disclaimer for prompt depth anything image processing
* fix some format typos in image processing and conversion scripts
* fix nn.ReLU(False) to nn.ReLU()
* rename residual layer as it's a sequential layer
* move size compute to a separate line/variable for easier debug in modular prompt depth anything
* fix modular format for prompt depth anything
* update modular prompt depth anything
* fix scale to meter and some internal funcs warp
* fix code style in image_processing_prompt_depth_anything.py
* fix issues in image_processing_prompt_depth_anything.py
* fix issues in image_processing_prompt_depth_anything.py
* fix issues in prompt depth anything
* update converting script similar to mllamma
* update testing for modeling prompt depth anything
* update testing for image_processing_prompt_depth_anything
* fix assertion in image_processing_prompt_depth_anything
* Update src/transformers/models/prompt_depth_anything/modular_prompt_depth_anything.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Update src/transformers/models/prompt_depth_anything/modular_prompt_depth_anything.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Update docs/source/en/model_doc/prompt_depth_anything.md
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Update docs/source/en/model_doc/prompt_depth_anything.md
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* update some testing
* fix testing
* fix
* add return doc for forward of prompt depth anything
* Update src/transformers/models/prompt_depth_anything/modular_prompt_depth_anything.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Update tests/models/prompt_depth_anything/test_modeling_prompt_depth_anything.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* fix prompt depth order
* fix format for testing prompt depth anything
* fix minor issues in prompt depth anything doc
* fix format for modular prompt depth anything
* revert format for modular prompt depth anything
* revert format for modular prompt depth anything
* update format for modular prompt depth anything
* fix parallel testing errors
* fix doc for prompt depth anything
* Add header
* Fix imports
* Licence header
---------
Co-authored-by: Joshua Lochner <admin@xenova.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Add expectation classes + tests
* Use typing Union instead of |
* Use bits to track score in properties cmp method
* Add exceptions and tests + comments
* Remove compute cap minor as it is not needed currently
* Simplify. Remove Properties class
* Add example Exceptions usage
* Expectations as dict subclass
* Update example Exceptions usage
* Refactor. Improve type name. Document score fn.
* Rename to DeviceProperties.