transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-31 10:12:23 +06:00

Arthur 0fe44059ae Add recurrent gemma (#30143)	2024-04-10 16:59:13 +02:00

* Fork.
* RecurrentGemma initial commit.
* Updating __init__.py.
* Minor modification to how we initialize the cache. Changing how the config specifies the architecture.
* Reformat code to 4 spaces. Fixed a few typos.
* Fixed the forward pass. Still unclear on the cache?
* Fixed the RecurrentGemmaForCausalLM
* Minor comment that we might not need attention_mask and output_attention arguments.
* Now cache should work as well.
* Adding a temporary example to check whether the model generation works.
* Adding the tests and updating imports.
* Adding the example file missing in the previous commit.
* First working example.
* Removing .gitignore and reverting parts of __init__.
* Re-add .gitignore.
* Addressing comments for configuration.
* Move mask creation to `_prepare_inputs_for_generation`.
* First try at integration tests: 1. AttributeError: 'GriffinCausalLMOutput' object has no attribute 'attentions'. 2. `cache_position` not passed
* Transfoering between machines.
* Running normal tests.
* Minor fix.
* More fixes.
* Addressing more comments.
* Minor fixes.
* first stab at cleanup
* more refactoring
* fix copies and else
* renaming and get init to work
* fix causal mask creation
* update
* nit
* fix a hell lot of things
* updates
* update conversion script
* make all keys importable
* nits
* add auto mappings
* properly convert ffw_up and down
* add scaling
* fix generations
* for recurrent dtype
* update
* fix going beyong window
* fixup
* add missing files
* current updates to remove last einops
* finish modeling refactor
* TADA
* fix compile
* fix most failing testt ? ?
* update tests
* refactor and update
* update
* nits, fixup and update tests
* more fixup
* nits
* fix imports
* test format
* fixups
* nits
* tuple typing
* fix code quality
* add model card
* fix doc
* skip most generation tests
* nits
* style
* doc fixes
* fix pr and check_copies?
* last nit
* oupsy
* Apply suggestions from code review Co-authored-by: Lysandre Debut <hi@lysand.re>
* update
* Update src/transformers/models/recurrent_gemma/convert_recurrent_gemma_to_hf.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* update based on review
* doc nit
* fix quality
* quality
* fix slow test model path
* update default dype
* ignore attributes that can be safely ignored in check config attributes
* 0lallalala come on
* save nit
* style
* remove to dict update
* make sure we can also run in float16
* style

---------

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: Aleksandar Botev <botev@google.com>
Co-authored-by: Leonard Berrada <lberrada@users.noreply.github.com>
Co-authored-by: anushanf <anushanf@google.com>
Co-authored-by: botev <botevmg@gmail.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

..
benchmark
bettertransformer	Fixed malapropism error (#26660)	2023-10-09 11:04:57 +02:00
deepspeed	Fix failing DeepSpeed model zoo tests (#30112)	2024-04-09 12:01:47 +05:30
extended	[tests] make `test_trainer_log_level_replica` to run on accelerators with more than 2 devices (#29609)	2024-03-13 17:44:35 +00:00
fixtures	Implementation of SuperPoint and AutoModelForKeypointDetection (#28966)	2024-03-19 14:43:02 +00:00
fsdp	Add support for FSDP+QLoRA and DeepSpeed ZeRO3+QLoRA (#29587)	2024-03-13 22:03:02 +05:30
generation	Fix length related warnings in speculative decoding (#29585)	2024-04-10 12:45:07 +05:00
models	Add recurrent gemma (#30143)	2024-04-10 16:59:13 +02:00
optimization	Make schedulers picklable by making lr_lambda fns global (#21768)	2023-03-02 12:08:43 -05:00
peft_integration	FIX [`CI`]: Fix failing tests for peft integration (#29330)	2024-02-29 03:56:16 +01:00
pipelines	Revert workaround for TF safetensors loading (#30128)	2024-04-09 11:04:18 +01:00
quantization	Fix quantization tests (#29914)	2024-04-09 17:10:29 +02:00
repo_utils	Allow `# Ignore copy` (#27328)	2023-12-07 10:00:08 +01:00
sagemaker	Update all references to canonical models (#29001)	2024-02-16 08:16:58 +01:00
tokenization	Remove static pretrained maps from the library's internals (#29112)	2024-03-25 10:33:38 +01:00
tools	Add support for for loops in python interpreter (#24429)	2023-06-26 09:58:14 -04:00
trainer	Rework tests to compare trainer checkpoint args (#29883)	2024-03-30 22:19:17 -04:00
utils	Update `tests/utils/tiny_model_summary.json` (#29941)	2024-04-03 09:25:01 +02:00
__init__.py
test_backbone_common.py	Align backbone stage selection with out_indices & out_features (#27606)	2023-12-20 18:33:17 +00:00
test_cache_utils.py	Generate: add tests for caches with `pad_to_multiple_of` (#29462)	2024-03-06 10:57:04 +00:00
test_configuration_common.py	[ `PretrainedConfig`] Improve messaging (#27438)	2023-11-15 14:10:39 +01:00
test_configuration_utils.py	[tests] remove deprecated tests for model loading (#29450)	2024-03-15 14:18:41 +00:00
test_feature_extraction_common.py	Split common test from core tests (#24284)	2023-06-15 07:30:24 -04:00
test_feature_extraction_utils.py	[tests] remove deprecated tests for model loading (#29450)	2024-03-15 14:18:41 +00:00
test_image_processing_common.py	Raise unused kwargs image processor (#29063)	2024-02-20 16:20:20 +01:00
test_image_processing_utils.py	[tests] remove deprecated tests for model loading (#29450)	2024-03-15 14:18:41 +00:00
test_image_transforms.py	Normalize floating point cast (#27249)	2023-11-10 15:35:27 +00:00
test_modeling_common.py	Fix slow tests for important models to be compatible with A10 runners (#29905)	2024-04-09 13:28:54 +02:00
test_modeling_flax_common.py	[Flax] Update no init test for Flax v0.7.1 (#28735)	2024-01-26 18:20:39 +00:00
test_modeling_flax_utils.py	Enable safetensors conversion from PyTorch to other frameworks without the torch requirement (#27599)	2024-01-23 10:28:23 +01:00
test_modeling_tf_common.py	Add tf_keras imports to prepare for Keras 3 (#28588)	2024-01-30 17:26:36 +00:00
test_modeling_tf_utils.py	Cast bfloat16 to float32 for Numpy conversions (#29755)	2024-03-21 14:04:11 +00:00
test_modeling_utils.py	[tests] make 2 tests device-agnostic (#30008)	2024-04-10 14:46:39 +02:00
test_pipeline_mixin.py	Image Feature Extraction pipeline (#28216)	2024-02-05 14:50:07 +00:00
test_processing_common.py	Don't save `processor_config.json` if a processor has no extra attribute (#28584)	2024-01-19 09:59:14 +00:00
test_sequence_feature_extraction_common.py	Fix typo (#25966)	2023-09-05 10:12:25 +02:00
test_tokenization_common.py	skip `test_encode_decode_fast_slow_all_tokens` for now (#30044)	2024-04-05 09:07:41 +02:00
test_tokenization_utils.py	[tests] remove deprecated tests for model loading (#29450)	2024-03-15 14:18:41 +00:00