* Handle empty change indices in RLE conversion for masks
* [test] Add unit tests for RLE encoding of masks in SamProcessor
* [test] Update RLE conversion tests to use TensorFlow implementation
* [test] Fix formatting in SamProcessorTest according to check_code_quality action
* [test] Fix formatting in SamProcessorTest according to check_code_quality
* [test] Refactored rle test cases into one test and used tf tensors in tf test cases
* [test] Fix: removed self parameter from refactored methods
* [test] Removed nested methods in run-length encoding tests for PyTorch and TensorFlow
* [test] Added descriptions to individual run-length encoding tests for PyTorch and TensorFlow.
* initial POC
* batch mix feature
* fix tests
* fix tests
* make style
* do not skip and instead fix tests
* update
* return back the test
* correct text with the correct ckpt
* start
* So far: 30%
* Small fix
* Continuing update
* Continuing
* Forgot to check if not None
* Continuing refactor
* Fix if else
* Fix ref
* Should make tests pass
* Keep grad norm same
* Document
* Apply suggestions from code review
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Err instead of info for logging RNG state error
* Separate out to func
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Support for generate argument return_dict_in_generate=True, instead of returning an error
* fix: call test with return_dict_in_generate=True
* fix: Only import torch if it is present
* update: Encapsulate output_dict changes
* fix: added back original comments
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* correctly slice
* check mask
* Update modular_gemma2.py
* fix
* add tests
* fix typo
* finally fix mask slicing
* Finally correctly slice in all cases!!
* add test for all attention functions
* small fix in tests
* trick around dynamo tracing issue
* last update
* more robust
* kwargs propagation
* make it explicit for checkpointing
* apply modular
* Add some tp plans!
* More tp plans!
* Add it in the comment
* style
* Update configuration_mixtral.py
* Update configuration_phi.py
* update the layout according to special archs
* fix mixtral
* style
* trigger CIs
* trigger CIs
* CIs
* olmo2
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Added `segmentation_maps` support for DPT image processor
* Added tests for dpt image processor
* Moved preprocessing into separate functions
* Added # Copied from statements
* Fixed # Copied from statements
* First commit
* Finish model implementation
* Register zamba2
* generated modeling and configuration
* generated modeling and configuration
* added hybrid cache
* fix attention_mask in mamba
* dropped unused loras
* fix flash2
* config docstrings
* fix config and fwd pass
* make fixup fixes
* test_modeling_zamba2
* small fixes
* make fixup fixes
* Fix modular model converter
* added inheritances in modular, renamed zamba cache
* modular rebase
* new modular conversion
* fix generated modeling file
* fixed import for Zamba2RMSNormGated
* modular file cleanup
* make fixup and model tests
* dropped inheritance for Zamba2PreTrainedModel
* make fixup and unit tests
* Add inheritance of rope from GemmaRotaryEmbedding
* moved rope to model init
* drop del self.self_attn and del self.feed_forward
* fix tests
* renamed lora -> adapter
* rewrote adapter implementation
* fixed tests
* Fix torch_forward in mamba2 layer
* Fix torch_forward in mamba2 layer
* Fix torch_forward in mamba2 layer
* Dropped adapter in-place sum
* removed rope from attention init
* updated rope
* created get_layers method
* make fixup fix
* make fixup fixes
* make fixup fixes
* update to new attention standard
* update to new attention standard
* make fixup fixes
* minor fixes
* cache_position
* removed cache_position, position_ids, use_cache
* remove config from modular
* removed config from modular (2)
* import apply_rotary_pos_emb from llama
* fixed rope_kwargs
* Instantiate cache in Zamba2Model
* fix cache
* fix @slow decorator
* small fix in modular file
* Update docs/source/en/model_doc/zamba2.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* several minor fixes
* inherit mamba2decoder fwd and drop position_ids in mamba
* removed docstrings from modular
* reinstate zamba2 attention decoder fwd
* use regex for tied keys
* Revert "use regex for tied keys"
This reverts commit 9007a522b1.
* use regex for tied keys
* add cpu to slow forward tests
* dropped config.use_shared_mlp_adapter
* Update docs/source/en/model_doc/zamba2.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* re-convert from modular
---------
Co-authored-by: root <root@node-2.us-southcentral1-a.compute.internal>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>