* remove it from all py files
* remove it from the doc
* remove it from examples
* style
* remove traces of _fast_init
* Update test_peft_integration.py
* CIs
* use device agnostic APIs in test cases
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* add one more
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* xpu now supports integer device id, aligning to CUDA behaviors
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* update to use device_properties
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* update comment
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix comments
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
---------
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* chore: fix typos in the tests
* fix: format codes
* chore: fix copy mismatch issue
* fix: format codes
* chore: fix copy mismatch issue
* chore: fix copy mismatch issue
* chore: fix copy mismatch issue
* chore: restore previous words
* chore: revert unexpected changes
* tmp commit
* move tests to the right class
* remove ALL all_generative_model_classes = ...
* skip tf roberta
* skip InstructBlipForConditionalGenerationDecoderOnlyTest
* videollava
* reduce diff
* reduce diff
* remove on vlms
* fix a few more
* manual rebase bits
* more manual rebase
* remove all manual generative model class test entries
* fix up to ernie
* a few more removals
* handle remaining cases
* recurrent gemma
* it's better here
* make fixup
* tf idefics is broken
* tf bert + generate is broken
* don't touch tf :()
* don't touch tf :(
* make fixup
* better comments for test skips
* revert tf changes
* remove empty line removal
* one more
* missing one
* use torch.testing.assertclose instead to get more details about error in cis
* fix
* style
* test_all
* revert for I bert
* fixes and updates
* more image processing fixes
* more image processors
* fix mamba and co
* style
* less strick
* ok I won't be strict
* skip and be done
* up
* first adding diffllama
* add Diff Attention and other but still with errors
* complate make attention Diff-Attention
* fix some bugs which may be caused by transformer-cli while adding model
* fix a bug caused by forgetting KV cache...
* Update src/transformers/models/diffllama/modeling_diffllama.py
You don't need to divide by 2 if we use same number of attention heads as llama. instead you can just split in forward.
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* Update src/transformers/models/diffllama/modeling_diffllama.py
fit to changeing "num_heads // 2" place
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* Update src/transformers/models/diffllama/modeling_diffllama.py
new codes are more meaningful than before
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* Update src/transformers/models/diffllama/modeling_diffllama.py
new codes are more meaningful than before
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* Update src/transformers/models/diffllama/modeling_diffllama.py
fit to changeing "num_heads // 2" place
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* Update src/transformers/models/diffllama/modeling_diffllama.py
fix 2times divide by sqrt(self.head_dim)
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* Update src/transformers/models/diffllama/modeling_diffllama.py
fix 2times divide by sqrt(self.head_dim)
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* Update src/transformers/models/diffllama/modeling_diffllama.py
fit to changeing "num_heads // 2" place.
and more visible
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* I found Attention missed implemented from paper still on e072544a3b.
* re-implemented
* adding groupnorm
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* align with transformers code style
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* fix typo
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* adding groupnorm
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* change SdpaAttention to DiffSdpaAttention
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* fix bug
* Update src/transformers/models/diffllama/modeling_diffllama.py
resolve "not same outputs" problem
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* fix bugs of places of "GroupNorm with scale" and etc
* Revert "fix bugs of places of "GroupNorm with scale" and etc"
This reverts commit 26307d92f6.
* simplify multiple of attention (matmul) operations into one by repeating value_states
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* simplify multiple of attention (matmul) operations into one by repeating value_states
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* simplify multiple of attention (matmul) operations into one by repeating value_states
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* remove missed type
* add diffllama model_doc
* apply make style/quality
* apply review comment about model
* apply review comment about test
* place diffllama alphabetically on the src/transformers/__init__.py
* fix forgot code
* Supports parameters that are not initialized with standard deviation 0 in the conventional method
* add DiffLlamaConfig to CONFIG_CLASSES_TO_IGNORE_FOR_DOCSTRING_CHECKPOINT_CHECK on utils/check_config_docstrings.py
* remove unused property of config
* add to supported model list
* add to spda supported model list
* fix copyright, remove pretraining_tensor_parallel, and modify for initialization test
* remove unused import and etc.
* empty commit
* empty commit
* empty commit
* apply modular transformers but with bugs
* revert prev commit
* create src/transformers/model/diffllama/modular_diffllama.py
* run utils/modular_model_converter.py
* empty commit
* leaner modular diffllama
* remove more and more in modular_diffllama.pt
* remove more and more in modular_diffllama.pt
* resolve missing docstring entries
* force reset
* convert modular
---------
Co-authored-by: Minho Ryu <ryumin93@gmail.com>