* PLM template
* A working PLM with fixed image features
* hacked processor
* First version that reproduced PLM output using PE from timm.
* Simplify and fix tie_word_embeddings
* Use PIL resize. Simplify conversion.
* First version that works with video input.
* Simplified image preprocessing (not batched)
* Minor fixes after rebasing on main.
* Video processor based on new API.
* Revert to use _preprocess for image processor.
* refactor with modular
* fix tie_word_embeddings
* Testing with timm PE
* check in missed conversion from modular to model.py
* First working version of PLM with Eva PE. PLM-1B and 3B outputs are exactly the same as before. PLM-8B output has some differences.
* address review comments
* Fixed batching when video and image examples are mixed.
* Simplify PE configuration.
* Enable AutoModel for PerceptionEncoder.
* Update PE config style.
* update all headers
* Minor fixes.
* Move lm_head to PerceptionLMForConditionalGeneration.
* Fix vit_G model specification.
* Fix for test_modeling_perception_lm.py
* Image processing refactoring to use more common parts.
* Fix processor test.
* update tests to use model from hub
* More test fixes.
* Integration test ground-truth update after rebasing; probably due to video preprocessing changes.
* update test media path to hub
* Stop tracking local scripts
* address some review comments
* refactor image processing.
* small fixes
* update documentation and minor fixes
* remove scripts
* Minor fix for CI
* Fix image processing
* CI and doc fix
* CI formatting fix
* ruff fix
* ruff formatting
* ran utils/sort_auto_mappings.py
* update docstring
* more docstring updates
* add vision_input_type default fallback for image processing
* more verbose variable naming
* test update
* Remove PE and PEConfig; use AutoModel (TimmWrapper) instead (see the sketch after this list)
* Minor cleanup.
* Minor Fix: remove any ref to PE. Ruff format and check.
* fix docstring
* Fix modular/model consistency. Improve docstring.
* Fix PerceptionLMForConditionalGenerationModelTest
* ruff fix
* fix for check_repo
* minor formatting
* Add dummy size arg to fix processor test.
* Update docstring for PerceptionLMConfig
* Minor fixes from review feedback.
* Revert some minor changes per reviewer feedback.
* update base_model_prefix
* address reviewer feedback
* fix comment in modeling file
* address reviewer feedback
* ruff format
* Pre-merge test update.
* reapply modular and fix checkpoint name
* processor test path
* use modular a bit more
* remove dead code
* add token decorator
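A minimal sketch of the AutoModel (TimmWrapper) loading path mentioned above; the checkpoint name is illustrative, not PLM's actual vision tower:

```python
from transformers import AutoModel

# timm hub checkpoints load through transformers' TimmWrapper, which
# removes the need for a dedicated PE/PEConfig implementation.
vision_tower = AutoModel.from_pretrained("timm/vit_base_patch16_224.augreg_in21k")
```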
---------
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* Updated Switch Transformers model card with standardized format (Issue #36979)
* Apply reviewer suggestions to the new standardised Switch Transformer's model card
* Update switch_transformers.md
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* changes for video
* update modular
* change get_video_features
* update video token replacement
* update modular
* add test and fix typo
* lint
* fix order
* lint
* fix
* remove dependency
* lint
* lint
* remove todo
* resize video for test
* lint
* fix test
* Add a new processor for video_test
* fix test
Also add notes asking users to set `TORCHDYNAMO_CAPTURE_SCALAR_OUTPUTS=1`
or call `torch._dynamo.config.capture_scalar_outputs = True`, as currently
this will cause a graph break.
Signed-off-by: Hollow Man <hollowman@opensuse.org>
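A minimal sketch of the runtime workaround (standard PyTorch 2.x API):

```python
import torch

# Equivalent to launching with TORCHDYNAMO_CAPTURE_SCALAR_OUTPUTS=1:
# allow torch.compile to trace through tensor.item() calls instead of
# inserting a graph break at each scalar read.
torch._dynamo.config.capture_scalar_outputs = True
```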
* ensure the query is updated during training
avoid unused parameters that DDP does not like
* avoid a crash when `kwargs` contain `padding=True`
trainers often pass this argument automatically
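A toy illustration of both pitfalls (the module and helper are hypothetical sketches, not the model's actual code):

```python
import torch
from torch import nn

class ToyProjector(nn.Module):
    """Hypothetical module: if self.query were skipped during a training
    forward pass, DDP would flag an unused parameter (or require
    find_unused_parameters=True, which is slower)."""

    def __init__(self, dim: int = 8):
        super().__init__()
        self.query = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Always touch self.query so its gradient participates in DDP's
        # bucketed all-reduce.
        return x + self.query

def robust_call(fn, *args, **kwargs):
    # Trainers often inject padding=True automatically; drop it before
    # forwarding to a callee that does not accept it (hypothetical helper).
    kwargs.pop("padding", None)
    return fn(*args, **kwargs)

out = robust_call(ToyProjector(), torch.randn(2, 8), padding=True)
```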
* minor
* Remove mel_spec lazy init and rename it to mel_filters.
This ensures save_pretrained will not crash when saving the processor during training:
d5d007a1a0/src/transformers/feature_extraction_utils.py (L595)
* minor - most feature extractors have a `sampling_rate` property
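A hedged sketch of the eager-initialization pattern (a toy extractor, not the actual class; `mel_filter_bank` is transformers' audio utility):

```python
from transformers.audio_utils import mel_filter_bank

class ToyFeatureExtractor:
    """Hypothetical extractor: build mel_filters eagerly in __init__ so
    nothing is left half-initialized when save_pretrained serializes
    the object's attributes."""

    def __init__(self, sampling_rate: int = 16000, n_fft: int = 400,
                 num_mel_bins: int = 80):
        # Plain attribute, matching the `sampling_rate` property most
        # feature extractors expose.
        self.sampling_rate = sampling_rate
        self.mel_filters = mel_filter_bank(
            num_frequency_bins=1 + n_fft // 2,
            num_mel_filters=num_mel_bins,
            min_frequency=0.0,
            max_frequency=sampling_rate / 2,
            sampling_rate=sampling_rate,
        )
```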
* speedup relative position embeddings
* fix several issues in model saving/loading:
- avoid modifying `self._hf_peft_config_loaded` when saving
- adapter_config automatically points to the original base model; a finetuned version should point to the model save dir.
- fix model weight names, which are changed by adding an adapter.
* minor
* minor
* minor
* fixing a crash without peft active
* add todo to replace einsum
* granite speech speedups:
1. register attention_dist to avoid a CPU-to-GPU transfer in every layer.
2. pad_sequence is much faster than per-sample padding + concat.
3. avoid moving audio back to the CPU when using a compute device.
* support audio.shape=(1,L)
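A rough sketch of these speedups and the new shape handling (the module is hypothetical; `attention_dist` and the sizes are illustrative):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# 2. One batched pad is much faster than per-sample padding + concat.
samples = [torch.randn(n) for n in (300, 512, 128)]
batch = pad_sequence(samples, batch_first=True)  # shape (3, 512)

# Accept audio of shape (1, L) as well as (L,).
audio = torch.randn(1, 16000)
if audio.dim() == 2 and audio.size(0) == 1:
    audio = audio.squeeze(0)

class ToyLayer(torch.nn.Module):
    """1. A registered (non-persistent) buffer moves with the module to the
    compute device, avoiding a CPU-to-GPU transfer on every forward."""

    def __init__(self):
        super().__init__()
        self.register_buffer("attention_dist", torch.arange(512), persistent=False)
```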
* docs: update LLaVA-NeXT model card
* Update docs/source/en/model_doc/llava_next.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/llava_next.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/llava_next.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/llava_next.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/llava_next.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/llava_next.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/llava_next.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/llava_next.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* [docs] Updated llava_next model card
* Update docs/source/en/model_doc/llava_next.md remove image sources
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* [fix] Change Flash Attention to SDPA badge
* [doc] fixed quantization example
* docs: updated contribution details and badges
* Update llava_next.md
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update marian.md
This update improves the Marian model card to follow the Hugging Face standardized model card format. The changes include:
- Added a clear description of MarianMT, its architecture, and how it differs from other models.
- Provided usage examples for Pipeline and AutoModel.
- Added a quantization example for optimizing model inference.
- Included instructions and examples for multilingual translation with language codes.
- Added an Attention Mask Visualizer example.
- Added a Resources section with relevant links to papers, the Marian framework, language codes, tokenizer guides, and quantization documentation.
- Fixed formatting issues in the code blocks for correct rendering.
This update improves the readability, usability, and consistency of the Marian model documentation for users.
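The Pipeline usage added to the card looks roughly like this (a sketch; the card's exact snippet may differ):

```python
from transformers import pipeline

# English -> German translation with a MarianMT checkpoint.
translator = pipeline("translation_en_to_de", model="Helsinki-NLP/opus-mt-en-de")
print(translator("Hello, how are you?"))
```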
* Update docs/source/en/model_doc/marian.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/marian.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/marian.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/marian.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/marian.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/marian.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/marian.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/marian.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/marian.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/marian.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/marian.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/marian.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/marian.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/marian.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/marian.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/marian.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/marian.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update marian.md
* Update marian.md
* Update marian.md
* Update marian.md
* Update docs/source/en/model_doc/marian.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update marian.md
* Update marian.md
* Update marian.md
* Update marian.md
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* add initial structure
* doc fixes, add model base logic
* update init files
* some fixes to config and modular
* some improvements for attention
* format
* remove unused attn
* some fixes for moe layer and for decoder
* adapt _compute_yarn_parameters for deepseek (see the sketch after this list)
* format
* small fix
* fix for decoder forward
* add tests, small refactoring
* fix dummies
* fix init
* fix doc
* fix config docs
* add sequence doc, fix init for gate
* fix issues in tests
* fix config doc
* remove unused args
* some fixes and refactoring after review
* fix doc for config
* small fixes for config args
* revert config refactoring
* small refactoring
* minor fixes after rebase
* small fix after merge
* fix modular
* remove rotaryembd from public init
* small test fix
* some rotary pos calculation improvement
* fix format
* some improvements and fixes
* fix config
* some refactoring
* adjust some unit tests
* skip test
* small fixes and tests adjustment
* reapply modular
* fix all tests except Integration
* fix integration tests
* cleanup BC stuff
* rope
* fix integration tests based on A10
* style
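The YaRN adaptation mentioned above, in simplified form (a sketch loosely following transformers' `_compute_yarn_parameters`, not the exact implementation; the defaults are illustrative):

```python
import math

import torch

def yarn_inv_freq(dim: int = 64, base: float = 10000.0, factor: float = 4.0,
                  beta_fast: float = 32.0, beta_slow: float = 1.0,
                  original_max_position: int = 4096) -> torch.Tensor:
    """Blend unscaled (extrapolated) and linearly scaled (interpolated)
    RoPE frequencies with a per-dimension linear ramp."""
    pos_freqs = base ** (torch.arange(0, dim, 2).float() / dim)
    extrapolation = 1.0 / pos_freqs             # original RoPE frequencies
    interpolation = 1.0 / (factor * pos_freqs)  # linearly scaled frequencies

    def correction_dim(num_rotations: float) -> float:
        # Dimension whose wavelength completes num_rotations over the
        # original context length.
        return (dim * math.log(original_max_position / (num_rotations * 2 * math.pi))
                / (2 * math.log(base)))

    low = max(math.floor(correction_dim(beta_fast)), 0)
    high = min(math.ceil(correction_dim(beta_slow)), dim - 1)
    # Ramp is 0 for high-frequency dims (keep extrapolation) and 1 for
    # low-frequency dims (full interpolation).
    ramp = ((torch.arange(dim // 2).float() - low) / max(high - low, 1)).clamp(0, 1)
    extrapolation_factor = 1 - ramp
    return interpolation * (1 - extrapolation_factor) + extrapolation * extrapolation_factor
```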
---------
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* Add Doge Model
* Fix code quality
* Rollback an error commit
* Fix config for open-source weights
* Revert "Fix config for open-source weights"
This reverts commit 229cdcac10.
* Add modular_doge
* Update Doge to inherit from Llama
* Fix import bug
* [docs] Add usage of doge model
* Fix Doge to import PretrainedConfig from configuration_utils instead of modeling_utils
* [docs] remove trust remote code from doge
* Fix dynamo bug in doge model
* Update docstrings
* Import apply_rotary_pos_emb and repeat_kv from Llama
* Fix all nits
* Fix code quality
* Fix some bugs
* Fix code quality
* Remove inherited `_update_causal_mask` from Llama
This leads to incorrect weight initialization.
* Fix the wrong tensor orderings in DogeCDMoE
* Fix attention mask bug
We have to provide attention_mask for dynamic mask computation
* Modify most implementations to inherit from Llama
But there are two problems:
1. `flex_attention_forward` is not updated properly
2. `Example` error in the forward method of DogeForCausalLM
* Modify CDMoE for batch efficient implementation
* Unify MoE configuration names, just like QwenMoE
* Fix code quality
* Fix code quality
* Fix code quality
* Add tp plan of CDMoE Module
* Hybrid DMA with sliding window (see the mask sketch after this list)
* Update valid tokens greater than window size
* Fix code quality
* Add `convert_doge_weights_to_hf`
* Fix STATE_DICT_MAPPING in convert_doge_weights_to_hf.py
* Fix nits in modular_doge
* Fix code quality
* Fix all nits
* Fix all nits
* Make sure the attention function is updated inside the class
* Fix code quality issues in the Doge model and add a test for it
* Fix `test_generate`
* Fix code quality
* Fix nits following suggestions
* Fix code quality
* Fix code quality issues
* Fix nits
* Fix code quality nits
* Fix the missing parameters in the configuration.
* Fix the missing parameters in the configuration.
* Fix nits
* Add initialization of attention
* Fix last nits
* Simplify dynamic mask generation logic
* Rename router_logits to gate_logits for matching latest changes of MixtralModel
* Rename typings for matching latest changes of MixtralModel
* Fixes typo in comment
* Update src/transformers/models/doge/modular_doge.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Fix code quality issues to match other modular
* Fix code quality issues to match other modular
* Fix the static compilation errors
* Update model weights link
* Fix code quality issues to match other modular
* reapply modular and support for new outputs
* style
* simplify a lot
* fix import location
* reapply modular
* fix
* fix integration test
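A toy version of the causal plus sliding-window masking referenced above (a sketch, not Doge's actual dynamic mask implementation):

```python
import torch

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean (seq_len, seq_len) mask: position i may attend to j iff
    j <= i and i - j < window."""
    idx = torch.arange(seq_len)
    causal = idx[None, :] <= idx[:, None]
    recent = (idx[:, None] - idx[None, :]) < window
    return causal & recent

mask = sliding_window_causal_mask(seq_len=6, window=3)
```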
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>