transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-31 02:02:21 +06:00

mig-mfreitas 34b43211d7 Add YaRN and Dynamic-YaRN RoPE Scaling Methods (#30910)	2024-07-23 10:07:58 +01:00

* Add YaRN and Dynamic-YaRN RoPE Scaling Methods

  YaRN (Yet another RoPE extension method) combines the NTK-By-Parts Interpolation and Attention Scaling methods, improving upon existing RoPE interpolation methods for longer context window sizes.

  Fine-tuned models maintain their original performance across benchmarks while enabling efficient extrapolation and transfer learning for quicker convergence, especially in compute-limited environments.

  We implement YaRN and Dynamic-YaRN for the following list of models:
  - LLaMA
  - Falcon
  - GPT-NeoX
  - Olmo
  - Persimmon
  - Phi
  - StableLM
  - OpenLLaMA

  New unit tests are added to assert YaRN's correct behavior on both short and long sequence inputs.

  For more details, please refer to https://arxiv.org/abs/2309.00071.

  Co-authored-by: Miguel Almeida <miguel.pessanha.almeida@tecnico.ulisboa.pt>

* Refactor YaRN implementation for LLaMA

  Iterate on YaRN implementation for LLaMA and remove diff from remaining models for increased PR modularity.

  This commit includes the following changes:
  - Merge 'yarn_rope_scaling' and 'rope_scaling' dictionaries
  - Remove unnecessary attributes ('extrapolation_factor' and 'finetuned') from YaRN classes
  - Inherit 'forward' method in YaRN classes from superclass
  - Rename 'yarn' method to 'compute_yarn_scaling'
  - Extend YaRN tests with further assertions
  - Fix style inconsistencies

  Co-authored-by: Miguel Monte e Freitas <miguelmontefreitas@tecnico.ulisboa.pt>

* Refactor Tensor Building Logic for YaRN

  - Comply with the tensor building logic introduced in #30743
  - Add referencing to the optimized Attention Factor equation
  - Remove Dynamic YaRN for a more agile deployment

  Co-authored-by: mig-mfreitas <mig-mfreitas@users.noreply.github.com>

* remove unwanted file

---------

Co-authored-by: Miguel Almeida <miguel.pessanha.almeida@tecnico.ulisboa.pt>
Co-authored-by: mig-mfreitas <mig-mfreitas@users.noreply.github.com>
Co-authored-by: Joao Gante <joao@huggingface.co>

agents	fix: Fixed raising `TypeError` instead of `ValueError` for invalid type (#32111)	2024-07-22 17:46:17 +01:00
benchmark
bettertransformer	Fixed malapropism error (#26660)	2023-10-09 11:04:57 +02:00
deepspeed	[tests] fix deepspeed zero3 config for `test_stage3_nvme_offload` (#31881)	2024-07-16 16:11:37 +02:00
extended	Skip tests properly (#31308)	2024-06-26 21:59:08 +01:00
fixtures	Implementation of SuperPoint and AutoModelForKeypointDetection (#28966)	2024-03-19 14:43:02 +00:00
fsdp	Llama et al. / FSDP : Fix breaking change in 4.40 for FSDP (#31161)	2024-06-26 14:50:08 +01:00
generation	Generate: store special token tensors under a unique variable name (#31980)	2024-07-22 14:06:49 +01:00
models	Add YaRN and Dynamic-YaRN RoPE Scaling Methods (#30910)	2024-07-23 10:07:58 +01:00
optimization	fix: Fixed the `1st argument` name in classmethods (#31907)	2024-07-11 12:11:50 +01:00
peft_integration	FIX [`CI`]: Fix failing tests for peft integration (#29330)	2024-02-29 03:56:16 +01:00
pipelines	Remove `trust_remote_code` when loading Libri Dummy (#31748)	2024-07-23 14:54:38 +08:00
quantization	Add new quant method (#32047)	2024-07-22 20:21:59 +02:00
repo_utils	Allow `# Ignore copy` (#27328)	2023-12-07 10:00:08 +01:00
sagemaker	Fixed `log messages` that are resulting in TypeError due to too many arguments (#32017)	2024-07-17 10:56:44 +01:00
tokenization	Skip tests properly (#31308)	2024-06-26 21:59:08 +01:00
trainer	Fix tests after `huggingface_hub` 0.24 (#32054)	2024-07-19 19:32:39 +01:00
utils	Remove `trust_remote_code` when loading Libri Dummy (#31748)	2024-07-23 14:54:38 +08:00
__init__.py
test_backbone_common.py	Align backbone stage selection with out_indices & out_features (#27606)	2023-12-20 18:33:17 +00:00
test_configuration_common.py	Move some test files (`tets/test_xxx_utils.py`) to `tests/utils` (#31730)	2024-07-02 13:46:03 +02:00
test_feature_extraction_common.py	Split common test from core tests (#24284)	2023-06-15 07:30:24 -04:00
test_image_processing_common.py	Skip tests properly (#31308)	2024-06-26 21:59:08 +01:00
test_image_transforms.py	fix: center_crop occasionally outputs off-by-one dimension matrix (#30934)	2024-05-21 13:56:52 +01:00
test_modeling_common.py	Chameleon: add model (#31534)	2024-07-17 10:41:43 +05:00
test_modeling_flax_common.py	add sdpa to ViT [follow up of #29325] (#30555)	2024-05-16 10:56:11 +01:00
test_modeling_tf_common.py	Port IDEFICS to tensorflow (#26870)	2024-05-13 15:59:46 +01:00
test_pipeline_mixin.py	fix: Fixed raising `TypeError` instead of `ValueError` for invalid type (#32111)	2024-07-22 17:46:17 +01:00
test_processing_common.py	add initial design for uniform processors + align model (#31197)	2024-06-13 16:27:16 +02:00
test_sequence_feature_extraction_common.py	Fix typo (#25966)	2023-09-05 10:12:25 +02:00
test_tokenization_common.py	Return assistant generated tokens mask in apply_chat_template (#30650)	2024-07-22 18:24:43 +01:00