transformers/tests
Arthur dcb183f4bd
[MPT] Add MosaicML's MPT model to transformers (#24629)
* draft add new model like

* some cleaning of the config

* nits

* add nested configs

* nits

* update

* update

* added layer norms + triton kernels

* consider only LPLayerNorm for now.

* update

* all keys match.

* Update

* fixing nits here and there

* working forward pass.

* removed einops dependency

* nits

* format

* add alibi

* byebye head mask

* refactor attention

* nits.

* format

* fix nits.

* nuke and updates

* nuke tokenizer test

* don't reshape query with kv heads

* added a bit of documentation.

* remove unneeded things

* nuke more stuff

* nit

* logits match - same generations

* rm unneeded methods

* 1 remaining failing CI test

* nit

* fix nits

* fix docs

* fix docs

* rm tokenizer

* fixup

* fixup

* fixup and fix tests

* fixed configuration object.

* use correct activation

* few minor fixes

* clarify docs a bit

* logits match to 1e-12

* skip and unskip a test

* added some slow tests.

* fix readme

* add more details

* Update docs/source/en/model_doc/mpt.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix configuration issues

* more fixes in config

* added more models

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* remove unneeded position ids

* fix some comments

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* revert suggestion

* mpt alibi + added batched generation

* Update src/transformers/models/mpt/__init__.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* remove init config

* Update src/transformers/models/mpt/configuration_mpt.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix nit

* add another slow test

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fits in one line

* some refactor because make fixup doesn't pass

* add ft notebook

* update md

* correct doc path

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-07-25 14:32:40 +02:00
..
benchmark
bettertransformer
bnb [gpt2-int8] Add gpt2-xl int8 test (#24543) 2023-06-28 18:02:13 +02:00
deepspeed accelerate deepspeed and gradient accumulation integrate (#23236) 2023-05-31 15:16:22 +05:30
extended
fixtures
generation Contrastive Search peak memory reduction (#24120) 2023-07-20 18:46:53 +01:00
models [MPT] Add MosaicML's MPT model to transformers (#24629) 2023-07-25 14:32:40 +02:00
optimization
pipelines [Llama2] Add support for Llama 2 (#24891) 2023-07-18 15:18:31 -04:00
repo_utils Fix expected value in tests of the test fetcher (#24077) 2023-06-07 11:38:56 -04:00
sagemaker
tokenization
tools Add support for for loops in python interpreter (#24429) 2023-06-26 09:58:14 -04:00
trainer Add dispatch_batches to training arguments (#25038) 2023-07-24 09:27:19 -04:00
utils Make (TF) CI faster (test only a subset of model classes) (#24592) 2023-06-30 16:54:54 +02:00
__init__.py
test_backbone_common.py Add TimmBackbone model (#22619) 2023-06-06 17:11:30 +01:00
test_configuration_common.py Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00
test_configuration_utils.py Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00
test_feature_extraction_common.py Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00
test_feature_extraction_utils.py Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00
test_image_processing_common.py Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00
test_image_processing_utils.py Run hub tests (#24807) 2023-07-13 15:25:45 -04:00
test_image_transforms.py Bug fix - flip_channel_order for channels first images (#23701) 2023-05-31 17:12:27 +01:00
test_modeling_common.py Fix last models for common tests that are too big. (#25058) 2023-07-25 07:56:04 -04:00
test_modeling_flax_common.py Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00
test_modeling_flax_utils.py Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00
test_modeling_tf_common.py Speed up TF tests by reducing hidden layer counts (#24595) 2023-06-30 16:30:33 +01:00
test_modeling_tf_utils.py Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00
test_modeling_utils.py Show a warning for missing attention masks when pad_token_id is not None (#24510) 2023-06-30 08:19:39 -04:00
test_pipeline_mixin.py Update tiny models for pipeline testing. (#24364) 2023-06-20 14:43:10 +02:00
test_sequence_feature_extraction_common.py
test_tokenization_common.py Fix TypeError: Object of type int64 is not JSON serializable (#24340) 2023-06-27 12:15:49 +01:00
test_tokenization_utils.py Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00