transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-08-02 19:21:31 +06:00


Arthur dcb183f4bd [`MPT`] Add MosaicML's `MPT` model to transformers (#24629)	2023-07-25 14:32:40 +02:00
* draft add new model like
* some cleaning of the config
* nits
* add nested configs
* nits
* update
* update
* added layer norms + triton kernels
* consider only LPLayerNorm for now.
* update
* all keys match.
* Update
* fixing nits here and there
* working forward pass.
* removed einops dependency
* nits
* format
* add alibi
* byebye head mask
* refactor attention
* nits.
* format
* fix nits.
* nuke and updates
* nuke tokenizer test
* don't reshape query with kv heads
* added a bit of documentation.
* remove unneeded things
* nuke more stuff
* nit
* logits match - same generations
* rm unneeded methods
* 1 remaining failing CI test
* nit
* fix nits
* fix docs
* fix docs
* rm tokenizer
* fixup
* fixup
* fixup and fix tests
* fixed configuration object.
* use correct activation
* few minor fixes
* clarify docs a bit
* logits match to 1e-12
* skip and unskip a test
* added some slow tests.
* fix readme
* add more details
* Update docs/source/en/model_doc/mpt.md
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Apply suggestions from code review
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix configuration issues
* more fixes in config
* added more models
* Apply suggestions from code review
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* remove unneeded position ids
* fix some comments
* Apply suggestions from code review
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* revert suggestion
* mpt alibi + added batched generation
* Update src/transformers/models/mpt/__init__.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* remove init config
* Update src/transformers/models/mpt/configuration_mpt.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix nit
* add another slow test
* Apply suggestions from code review
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fits in one line
* some refactor because make fixup doesn't pass
* add ft notebook
* update md
* correct doc path
---------
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
benchmark
bettertransformer
bnb	[`gpt2-int8`] Add gpt2-xl int8 test (#24543)	2023-06-28 18:02:13 +02:00
deepspeed	accelerate deepspeed and gradient accumulation integrate (#23236)	2023-05-31 15:16:22 +05:30
extended
fixtures
generation	Contrastive Search peak memory reduction (#24120)	2023-07-20 18:46:53 +01:00
models	[`MPT`] Add MosaicML's `MPT` model to transformers (#24629)	2023-07-25 14:32:40 +02:00
optimization
pipelines	[`Llama2`] Add support for Llama 2 (#24891)	2023-07-18 15:18:31 -04:00
repo_utils	Fix expected value in tests of the test fetcher (#24077)	2023-06-07 11:38:56 -04:00
sagemaker
tokenization
tools	Add support for for loops in python interpreter (#24429)	2023-06-26 09:58:14 -04:00
trainer	Add dispatch_batches to training arguments (#25038)	2023-07-24 09:27:19 -04:00
utils	Make (TF) CI faster (test only a subset of model classes) (#24592)	2023-06-30 16:54:54 +02:00
__init__.py
test_backbone_common.py	Add TimmBackbone model (#22619)	2023-06-06 17:11:30 +01:00
test_configuration_common.py	Split common test from core tests (#24284)	2023-06-15 07:30:24 -04:00
test_configuration_utils.py	Split common test from core tests (#24284)	2023-06-15 07:30:24 -04:00
test_feature_extraction_common.py	Split common test from core tests (#24284)	2023-06-15 07:30:24 -04:00
test_feature_extraction_utils.py	Split common test from core tests (#24284)	2023-06-15 07:30:24 -04:00
test_image_processing_common.py	Split common test from core tests (#24284)	2023-06-15 07:30:24 -04:00
test_image_processing_utils.py	Run hub tests (#24807)	2023-07-13 15:25:45 -04:00
test_image_transforms.py	Bug fix - flip_channel_order for channels first images (#23701)	2023-05-31 17:12:27 +01:00
test_modeling_common.py	Fix last models for common tests that are too big. (#25058)	2023-07-25 07:56:04 -04:00
test_modeling_flax_common.py	Split common test from core tests (#24284)	2023-06-15 07:30:24 -04:00
test_modeling_flax_utils.py	Split common test from core tests (#24284)	2023-06-15 07:30:24 -04:00
test_modeling_tf_common.py	Speed up TF tests by reducing hidden layer counts (#24595)	2023-06-30 16:30:33 +01:00
test_modeling_tf_utils.py	Split common test from core tests (#24284)	2023-06-15 07:30:24 -04:00
test_modeling_utils.py	Show a warning for missing attention masks when pad_token_id is not None (#24510)	2023-06-30 08:19:39 -04:00
test_pipeline_mixin.py	Update tiny models for pipeline testing. (#24364)	2023-06-20 14:43:10 +02:00
test_sequence_feature_extraction_common.py
test_tokenization_common.py	Fix TypeError: Object of type int64 is not JSON serializable (#24340)	2023-06-27 12:15:49 +01:00
test_tokenization_utils.py	Split common test from core tests (#24284)	2023-06-15 07:30:24 -04:00