* Add MLCD model
* Update codes for auto-mapping
* Add test scripts for MLCD
* Update doc for MLCD model
* Fix import error
* Fix import error
* Fix CI error for attention_outputs
* Fix code style for CI
* Fix code style for CI
* Fix code style for CI
* Fix code style for CI
* Fix code style for CI
* Fix CI error for initialization
* Fix code style for CI
* Fix code style for CI
* Reformat codes and docs for CI test
* Reformat codes and docs for CI test
* Remove unused attributes for CI test
* Fix style for CI test
* List MLCD in flash_attn doc
* Fix: typos, modulars, refactors from suggestions
* Refactoring convert_mlcd_weights_to_hf.py from suggestions
* Fix: docs conflicts
* Fix error for CI test
* Fix style for CI test
* Add integration test for MLCD
* Refactoring by class inheritance
* Fix: refactor attention interface, adjust codes
* Fix: merging conflicts
* Fix: merging conflicts
* Fix: style for CI test
* Fix: style for CI test
* Fix: set test_resize_embeddings to be False
* Fix: initializer for CI test
* Fix: conflicts, CI test, warning and refactoring
* Fix: merging conflicts
* Refactor
* Update docs
* Fix mistakes
* Remove unused args and fix multi-gpu error
* Revert position_embeddings
* Solve conflicts
* Solve conflicts
* Remove dummy
* Update _init_weights
* Update _init_weights
* Update _init_weights for CI test