When preparing the causal attention mask at this point, the mask comes
in as a float tensor in which the minimum value marks masked positions.
It is not correct to convert it to bool and treat it as a boolean mask, as
this inverts the mask: the masked (nonzero) entries become `True` and the unmasked (zero) entries become `False`,
whereas `torch.nn.functional.scaled_dot_product_attention` expects a masked position to be `False`.
I suspect that the `sdpa` implementation variant may not have been
thoroughly tested, which is why this error was not caught earlier.
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
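The following is a minimal sketch of the inversion and of one correct conversion. It assumes an additive causal mask built with `torch.finfo(torch.float32).min`. The tensor shapes and variable names are illustrative and are not taken from the patched code.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
seq_len, head_dim = 4, 8
min_val = torch.finfo(torch.float32).min

# Additive float causal mask: 0.0 = may attend, min_val = masked (future positions).
float_mask = torch.triu(torch.full((seq_len, seq_len), min_val), diagonal=1)

# Buggy conversion: nonzero -> True, so masked positions become True,
# which SDPA interprets as "may attend". The mask is inverted.
inverted_bool_mask = float_mask.bool()

# Correct conversion: only positions holding 0.0 may be attended to.
bool_mask = float_mask == 0

q, k, v = (torch.randn(1, 1, seq_len, head_dim) for _ in range(3))
out_float = F.scaled_dot_product_attention(q, k, v, attn_mask=float_mask)
out_bool = F.scaled_dot_product_attention(q, k, v, attn_mask=bool_mask)
out_causal = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```

Comparing the float mask with `0` preserves the SDPA convention that `True` means "may attend". Passing the additive float mask through unchanged would also work, because SDPA accepts float masks and adds them to the attention scores.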
Add Llama4TextModel to AutoModel mapping
Passing a Llama4TextConfig to AutoModel.from_config raises a ValueError, although it is expected to instantiate a Llama4TextModel.
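The toy config-to-model registry below shows why a missing mapping entry surfaces as a `ValueError`. It is a hypothetical sketch and not the Transformers auto-class code. `ToyConfig`, `MODEL_MAPPING`, `resolve_model_class`, and the `"llama4_text"` key are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToyConfig:
    model_type: str  # hypothetical key used to look up the model class


# Hypothetical registry: model_type -> name of the base model class.
MODEL_MAPPING = {"llama": "LlamaModel"}


def resolve_model_class(config: ToyConfig) -> str:
    """Look up the model class for a config, as an auto-class lookup would."""
    try:
        return MODEL_MAPPING[config.model_type]
    except KeyError:
        raise ValueError(
            f"Unrecognized configuration for model_type={config.model_type!r}"
        ) from None


# Before the fix: there is no entry for the text-only config, so the lookup fails.
try:
    resolve_model_class(ToyConfig("llama4_text"))
except ValueError as err:
    before = str(err)

# The fix amounts to registering the missing entry.
MODEL_MAPPING["llama4_text"] = "Llama4TextModel"
after = resolve_model_class(ToyConfig("llama4_text"))
```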
bnb quant tests: remove obsolete trust_remote_code test
The MPT model is now natively integrated in Transformers and no longer requires trust_remote_code=True. This removes the failing test_get_keys_to_not_convert_trust_remote_code and related usage, which depended on remote code and caused CI issues due to missing dependencies (e.g., triton_pre_mlir).
* fix sliding attn
* make style
* Update tests/test_modeling_common.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* on second thought, should default to `True` for BC (backward compatibility)
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* use device agnostic APIs in tests
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* more
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* add reset_peak_memory_stats API
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* update
---------
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
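The device-agnostic pattern can be sketched as a dispatch table keyed by device type. With this approach, tests call a single helper instead of `torch.cuda.*` directly. The helper name, the table, and the per-backend functions below are hypothetical. Backends are simulated with recorded calls so that the sketch runs without accelerator hardware.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Records calls so the sketch is observable without real hardware.
calls: List[Tuple[str, str]] = []


# Hypothetical per-backend implementations; real tests would wrap
# e.g. torch.cuda.reset_peak_memory_stats or another backend's equivalent.
def _cuda_reset(device: str) -> None:
    calls.append(("cuda", device))


def _xpu_reset(device: str) -> None:
    calls.append(("xpu", device))


# Dispatch table keyed by device type; None marks a backend with nothing to reset.
RESET_PEAK_MEMORY_STATS: Dict[str, Optional[Callable[[str], None]]] = {
    "cuda": _cuda_reset,
    "xpu": _xpu_reset,
    "cpu": None,
}


def reset_peak_memory_stats_for(device: str) -> None:
    """Device-agnostic entry point used by tests (hypothetical name)."""
    device_type = device.split(":")[0]  # "cuda:0" -> "cuda"
    fn = RESET_PEAK_MEMORY_STATS.get(device_type)  # unknown backends: no-op
    if fn is not None:
        fn(device)


for dev in ("cuda:0", "xpu", "cpu", "mps"):
    reset_peak_memory_stats_for(dev)
```

Adding a backend then means adding one table entry rather than editing every test.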
* Update modular_qwen2_5_omni.py
fix the error when loading a model quantized with AutoAWQ.
* Update modeling_qwen2_5_omni.py
sync code with modular_qwen2_5_omni.py
* pipeline generation defaults
* add max_new_tokens=20 in test pipelines
* pop all kwargs that are used to parameterize generation config
* add a class attribute that tells us whether a pipeline calls generate
* tmp commit
* pt text gen pipeline tests passing
* remove failing tf tests
* fix text gen pipeline mixin test corner case
* update text_to_audio pipeline tests
* trigger tests
* a few more tests
* skips
* some more audio tests
* not slow
* broken
* lower severity of generation mode errors
* fix all asr pipeline tests
* nit
* skip
* image to text pipeline tests
* text2text pipeline
* last pipelines
* fix flaky
* PR comments
* handle generate attrs more carefully in models that can't generate
* same as above
* tmp commit (imports broken)
* working version; update tests
* remove line break
* shorter msg
* dola checks need num_beams=1; other minor PR comments
* update the early Trainer failure on an invalid generation config
* make fixup
* test msg
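The kwarg handling described above (a class attribute that marks pipelines which call `generate`, and popping every kwarg that parameterizes the generation config) can be sketched as follows. This is a hypothetical illustration and not the Transformers pipeline code. `calls_generate`, `split_kwargs`, the toy pipeline classes, and the field subset are all assumptions.

```python
from typing import Any, Dict, Tuple

# Hypothetical subset of generation-config field names.
GENERATION_CONFIG_FIELDS = {"max_new_tokens", "num_beams", "do_sample", "temperature"}


class ToyPipeline:
    # Hypothetical class attribute: does this pipeline call model.generate()?
    calls_generate = False

    def split_kwargs(self, **kwargs: Any) -> Tuple[Dict[str, Any], Dict[str, Any]]:
        """Return (generation_kwargs, remaining_kwargs)."""
        generation_kwargs: Dict[str, Any] = {}
        if self.calls_generate:
            # Pop every kwarg that parameterizes the generation config so it
            # is not also forwarded to preprocessing or postprocessing.
            for name in list(kwargs):
                if name in GENERATION_CONFIG_FIELDS:
                    generation_kwargs[name] = kwargs.pop(name)
        return generation_kwargs, kwargs


class ToyTextGenerationPipeline(ToyPipeline):
    calls_generate = True


class ToyFeatureExtractionPipeline(ToyPipeline):
    pass  # never calls generate, so generation kwargs are left untouched


gen, rest = ToyTextGenerationPipeline().split_kwargs(max_new_tokens=20, return_full_text=False)
gen2, rest2 = ToyFeatureExtractionPipeline().split_kwargs(max_new_tokens=20)
```

Keying the behavior on a class attribute means pipelines whose models cannot generate never touch generation settings.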
* Fix ModuleNotFoundError for torchao.prototype.low_bit_optim since torchao v0.11.0
* Fix space on blank line
* update torchao's AdamW4bit and AdamW8bit import for v0.11.0
* Apply style fixes
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
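A version-gated import keeps both old and new torchao releases working. The sketch below assumes that from v0.11.0 the low-bit optimizers live in `torchao.optim`. That path is an assumption based on the torchao release, because this commit message does not name the new module. The helper names are hypothetical. The version comparison uses only the standard library.

```python
import importlib
import importlib.metadata
import re
from typing import Tuple


def _release_tuple(version: str) -> Tuple[int, ...]:
    """'0.11.0+cu126' -> (0, 11, 0); ignores local and pre-release suffixes."""
    release = version.split("+")[0]
    parts = []
    for piece in release.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:  # pad "0.11" -> (0, 11, 0) so comparisons are fair
        parts.append(0)
    return tuple(parts)


def low_bit_optim_module(torchao_version: str) -> str:
    if _release_tuple(torchao_version) >= (0, 11, 0):
        return "torchao.optim"  # assumed new location from v0.11.0
    return "torchao.prototype.low_bit_optim"


def load_low_bit_adamw():
    """Import AdamW4bit/AdamW8bit from whichever module the installed torchao uses."""
    module_name = low_bit_optim_module(importlib.metadata.version("torchao"))
    module = importlib.import_module(module_name)
    return module.AdamW4bit, module.AdamW8bit
```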
* add args support to fast image processors
* add comment for clarity
* fix-copies
* Handle child-class arguments passed either positionally or as kwargs in the `__call__` and `preprocess` functions
* revert support for args passed as kwargs in overridden preprocess
* fix image processor errors
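Supporting positional arguments in a processor comes down to mapping them onto parameter names in declaration order, and rejecting an argument that is supplied both ways. The sketch below is hypothetical. `merge_args_into_kwargs` and the option names are assumptions for illustration, not the fast image processor API.

```python
from typing import Any, Dict, Sequence, Tuple


def merge_args_into_kwargs(
    valid_names: Sequence[str], args: Tuple[Any, ...], kwargs: Dict[str, Any]
) -> Dict[str, Any]:
    """Map positional args onto parameter names, in declaration order."""
    if len(args) > len(valid_names):
        raise TypeError(
            f"expected at most {len(valid_names)} positional arguments, got {len(args)}"
        )
    merged = dict(kwargs)
    for name, value in zip(valid_names, args):
        if name in merged:
            # The same argument was given both positionally and by keyword.
            raise TypeError(f"got multiple values for argument {name!r}")
        merged[name] = value
    return merged


# Hypothetical processor options, in the order a child class declares them.
NAMES = ("do_resize", "size", "do_normalize")
merged = merge_args_into_kwargs(
    NAMES, (True, {"height": 224, "width": 224}), {"do_normalize": False}
)
```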
* Add flash-attention-2 backend for ESM-2
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
* update extended_attention_mask for fa2
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
* add test_flash_attn_2_equivalence test
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
---------
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
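The mask update for FA2 can be pictured as a branch on the attention implementation. The sketch below assumes that the flash-attention path consumes the raw 2D padding mask, while the eager path needs the broadcastable 4D additive "extended" mask. The function name is hypothetical. This is an illustration and not the ESM-2 code.

```python
import torch


def prepare_attention_mask(
    attention_mask: torch.Tensor,
    attn_implementation: str,
    dtype: torch.dtype = torch.float32,
) -> torch.Tensor:
    """attention_mask: (batch, seq_len) with 1 = real token, 0 = padding."""
    if attn_implementation == "flash_attention_2":
        # Assumption: the FA2 path takes the 2D padding mask and unpads internally.
        return attention_mask
    # Eager path: additive mask of shape (batch, 1, 1, seq_len),
    # 0.0 where attention is allowed and the dtype minimum where it is not.
    extended = attention_mask[:, None, None, :].to(dtype)
    return (1.0 - extended) * torch.finfo(dtype).min


mask = torch.tensor([[1, 1, 0]])
fa2_mask = prepare_attention_mask(mask, "flash_attention_2")
eager_mask = prepare_attention_mask(mask, "eager")
```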
* enable optional RMS in BitLinear
* Fix naming
* Import RMS from Llama using config.*
* make fix-copies
* ran CI loop
* remove default BitNetQuantConfig values
* Fix BitNetQuantConfig to be Optional
* Fix config docstrings to match Optional
* Edit docstrings to match standards
---------
Co-authored-by: steinmetzc <codysteinmetz7@gmail.com>
Co-authored-by: codys12 <steinmetzc@dh-mgmt4.hpc.msoe.edu>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
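An optional RMS norm on the layer input can be sketched as a flag that decides whether a norm module exists at all. This is a hypothetical illustration. `ToyBitLinear`, `use_rms_norm`, and `rms_norm_eps` are assumed names. The RMS norm is written out locally, in the style of Llama's RMSNorm, rather than imported. The ternary weight and activation quantization of a real BitLinear layer is deliberately omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Llama-style RMS norm: x / sqrt(mean(x^2) + eps), scaled by a learned weight."""

    def __init__(self, dim: int, eps: float = 1e-6) -> None:
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        variance = x.pow(2).mean(-1, keepdim=True)
        return self.weight * x * torch.rsqrt(variance + self.eps)


class ToyBitLinear(nn.Module):
    """Linear layer with an optional input RMS norm (quantization omitted)."""

    def __init__(self, in_features: int, out_features: int,
                 use_rms_norm: bool = False, rms_norm_eps: float = 1e-6) -> None:
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        # The norm only exists when enabled, so disabled layers pay nothing for it.
        self.input_norm = RMSNorm(in_features, rms_norm_eps) if use_rms_norm else None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.input_norm is not None:
            x = self.input_norm(x)
        return F.linear(x, self.weight)


torch.manual_seed(0)
x = torch.randn(2, 16)
plain = ToyBitLinear(16, 8)
normed = ToyBitLinear(16, 8, use_rms_norm=True)
```

Because RMS normalization is (up to `eps`) invariant to a positive rescaling of the input, the normalized layer gives nearly the same output for `x` and `10 * x`.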
* Include output embedding as well with `include_embedding` flag
Summary:
As in the title.
Test Plan:
python tests/quantization/torchao_integration/test_torchao.py -k test_include_embedding
* format
* rename include_embedding to include_input_output_embeddings
---------
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
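The effect of the flag can be pictured as a filter over module names that skips the input and output embeddings unless they are explicitly opted in. This is a hypothetical sketch, not the torchao integration code. The module names, the default of `False`, and the `select_modules_to_quantize` helper are assumptions.

```python
from typing import Iterable, List

# Hypothetical module names for the input embedding and the output projection.
INPUT_OUTPUT_EMBEDDINGS = {"model.embed_tokens", "lm_head"}


def select_modules_to_quantize(
    module_names: Iterable[str], include_input_output_embeddings: bool = False
) -> List[str]:
    """Skip the input/output embeddings unless the flag opts them in."""
    return [
        name
        for name in module_names
        if include_input_output_embeddings or name not in INPUT_OUTPUT_EMBEDDINGS
    ]


names = ["model.embed_tokens", "model.layers.0.mlp.up_proj", "lm_head"]
default_selection = select_modules_to_quantize(names)
with_embeddings = select_modules_to_quantize(names, include_input_output_embeddings=True)
```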