transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-31 02:02:21 +06:00

Anton Vlasjuk 5a2aedca1e [`Mamba2`] Fix caching, slow path, and multi-gpu (#35154)	2024-12-20 09:27:47 +01:00
* fixup mamba2 - caching and several other small fixes
* fixup cached forward
* correct fix this time
* fixup cache - we do not need to extend the attn mask it's handled by generate (gives total ids + mask at each step)
* remove unnecessary (un)squeeze
* fixup cache position
* simplify a few things
* [run-slow] mamba2
* multi gpu attempt two
* [run-slow] mamba2
* [run-slow] mamba2
* [run-slow] mamba2
* [run-slow] mamba2
* add newer slow path fix
* [run-slow] mamba2
agents	Add token cost + runtime monitoring to Agent and HfEngine children (#34548)	2024-12-03 13:14:52 +01:00
benchmark	[Test refactor 1/5] Per-folder tests reorganization (#15725)	2022-02-23 15:46:28 -05:00
bettertransformer	Fixed malapropism error (#26660)	2023-10-09 11:04:57 +02:00
deepspeed	Trainer - deprecate tokenizer for processing_class (#32385)	2024-10-02 14:08:46 +01:00
extended	[tests] skip tests for xpu (#33553)	2024-09-19 19:28:04 +01:00
fixtures	Implementation of SuperPoint and AutoModelForKeypointDetection (#28966)	2024-03-19 14:43:02 +00:00
fsdp	FSDP grad accum fix (#34645)	2024-11-15 22:28:06 +01:00
generation	Add the Bamba Model (#34982)	2024-12-18 20:18:17 +01:00
models	[`Mamba2`] Fix caching, slow path, and multi-gpu (#35154)	2024-12-20 09:27:47 +01:00
optimization	fix: Fixed the `1st argument` name in classmethods (#31907)	2024-07-11 12:11:50 +01:00
peft_integration	[PEFT] Better Trainer error when prompt learning with loading best model at the end (#35087)	2024-12-11 12:44:39 +01:00
pipelines	Fix seamless TTS generate (#34968)	2024-12-11 15:38:42 +01:00
quantization	change bnb tests (#34713)	2024-12-18 09:49:59 -05:00
repo_utils	Refactor CI: more explicit (#30674)	2024-08-30 18:17:25 +02:00
sagemaker	Trainer - deprecate tokenizer for processing_class (#32385)	2024-10-02 14:08:46 +01:00
tokenization	VLM: special multimodal Tokenizer (#34461)	2024-11-04 16:37:51 +01:00
tp	Simplify Tensor Parallel implementation with PyTorch TP (#34184)	2024-11-18 19:51:49 +01:00
trainer	Fix GA loss bugs and add unit test (#35121)	2024-12-09 09:57:41 +01:00
utils	🚨All attention refactor🚨 (#35235)	2024-12-18 16:53:39 +01:00
__init__.py	GPU text generation: Moved the encoded_prompt to the correct device	2020-01-06 15:11:12 +01:00
test_backbone_common.py	Align backbone stage selection with out_indices & out_features (#27606)	2023-12-20 18:33:17 +00:00
test_configuration_common.py	Load sub-configs from composite configs (#34410)	2024-11-05 11:34:01 +01:00
test_feature_extraction_common.py	Split common test from core tests (#24284)	2023-06-15 07:30:24 -04:00
test_image_processing_common.py	Fall back to slow image processor in ImageProcessingAuto when no fast processor available (#34785)	2024-12-15 14:00:36 -05:00
test_image_transforms.py	fix: center_crop occasionally outputs off-by-one dimension matrix (#30934)	2024-05-21 13:56:52 +01:00
test_modeling_common.py	Fix some fa2 tests (#35340)	2024-12-19 17:05:25 +01:00
test_modeling_flax_common.py	🚨All attention refactor🚨 (#35235)	2024-12-18 16:53:39 +01:00
test_modeling_tf_common.py	🚨All attention refactor🚨 (#35235)	2024-12-18 16:53:39 +01:00
test_pipeline_mixin.py	Add image text to text pipeline (#34170)	2024-10-31 15:48:11 -04:00
test_processing_common.py	Separate chat templates into a single file (#33957)	2024-11-26 14:18:04 +00:00
test_sequence_feature_extraction_common.py	Fix typo (#25966)	2023-09-05 10:12:25 +02:00
test_tokenization_common.py	Separate chat templates into a single file (#33957)	2024-11-26 14:18:04 +00:00