transformers/tests
mig-mfreitas 34b43211d7
Add YaRN and Dynamic-YaRN RoPE Scaling Methods (#30910)
* Add YaRN and Dynamic-YaRN RoPE Scaling Methods

YaRN (Yet another RoPE extension method) combines the NTK-By-Parts
Interpolation and Attention Scaling methods, improving upon existing
RoPE interpolation methods for longer context window sizes.

Fine-tuned models maintain their original performance across benchmarks
while enabling efficient extrapolation and transfer learning for
quicker convergence, especially in compute-limited environments.

We implement YaRN and Dynamic-YaRN for the following models:

 - LLaMA
 - Falcon
 - GPT-NeoX
 - Olmo
 - Persimmon
 - Phi
 - StableLM
 - OpenLLaMA

New unit tests are added to assert YaRN's correct behavior on both
short and long sequence inputs.

For more details, please refer to https://arxiv.org/abs/2309.00071.
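
As a rough illustration of how the two ingredients named above fit together, here is a minimal pure-Python sketch of the inverse-frequency blend (NTK-By-Parts) and the attention scaling factor. It follows the formulas in the paper linked above and is not the code added by this commit. The function names (`yarn_correction_dim`, `yarn_inv_freq`, `yarn_attention_factor`) are hypothetical, and the `beta_fast=32` / `beta_slow=1` defaults and the example dimensions are assumptions taken from the paper's LLaMA settings.

```python
import math


def yarn_correction_dim(num_rotations, dim, base, max_position):
    """Index of the RoPE dimension that completes `num_rotations` full turns
    over `max_position` tokens (hypothetical helper name)."""
    return dim * math.log(max_position / (num_rotations * 2 * math.pi)) / (2 * math.log(base))


def yarn_inv_freq(dim, base, factor, max_position, beta_fast=32.0, beta_slow=1.0):
    """NTK-By-Parts blend: high-frequency dimensions keep their original
    frequency, low-frequency dimensions are divided by `factor`."""
    half = dim // 2
    # Unscaled ("extrapolation") and uniformly scaled ("interpolation") frequencies.
    extrapolation = [1.0 / base ** (2 * i / dim) for i in range(half)]
    interpolation = [f / factor for f in extrapolation]

    # Boundaries of the linear ramp between the two regimes.
    low = max(math.floor(yarn_correction_dim(beta_fast, dim, base, max_position)), 0)
    high = min(math.ceil(yarn_correction_dim(beta_slow, dim, base, max_position)), dim - 1)
    if high == low:
        high += 0.001  # avoid division by zero in the ramp

    inv_freq = []
    for i in range(half):
        ramp = min(max((i - low) / (high - low), 0.0), 1.0)
        keep = 1.0 - ramp  # 1.0 keeps the original frequency, 0.0 fully interpolates
        inv_freq.append(interpolation[i] * (1.0 - keep) + extrapolation[i] * keep)
    return inv_freq


def yarn_attention_factor(factor):
    """Attention scaling term from the paper: 0.1 * ln(factor) + 1."""
    return 0.1 * math.log(factor) + 1.0


if __name__ == "__main__":
    # Illustrative values only: head dim 128, base 10000, 4096 -> 16384 tokens.
    freqs = yarn_inv_freq(dim=128, base=10000.0, factor=4.0, max_position=4096)
    print(freqs[0], freqs[-1], yarn_attention_factor(4.0))
```

In this sketch the first (fastest-rotating) dimension is left untouched and the last (slowest) one is divided by the full scaling factor, with a linear ramp in between.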

Co-authored-by: Miguel Almeida <miguel.pessanha.almeida@tecnico.ulisboa.pt>

* Refactor YaRN implementation for LLaMA

Iterate on YaRN implementation for LLaMA and remove diff from remaining
models for increased PR modularity.

This commit includes the following changes:
- Merge 'yarn_rope_scaling' and 'rope_scaling' dictionaries
- Remove unnecessary attributes ('extrapolation_factor' and 'finetuned')
  from YaRN classes
- Inherit 'forward' method in YaRN classes from superclass
- Rename 'yarn' method to 'compute_yarn_scaling'
- Extend YaRN tests with further assertions
- Fix style inconsistencies

Co-authored-by: Miguel Monte e Freitas <miguelmontefreitas@tecnico.ulisboa.pt>

* Refactor Tensor Building Logic for YaRN

- Comply with the tensor building logic introduced in #30743
- Add a reference to the optimized Attention Factor equation
- Remove Dynamic YaRN for a more agile deployment

Co-authored-by: mig-mfreitas <mig-mfreitas@users.noreply.github.com>

* remove unwanted file

---------

Co-authored-by: Miguel Almeida <miguel.pessanha.almeida@tecnico.ulisboa.pt>
Co-authored-by: mig-mfreitas <mig-mfreitas@users.noreply.github.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
2024-07-23 10:07:58 +01:00
agents fix: Fixed raising TypeError instead of ValueError for invalid type (#32111) 2024-07-22 17:46:17 +01:00
benchmark
bettertransformer Fixed malapropism error (#26660) 2023-10-09 11:04:57 +02:00
deepspeed [tests] fix deepspeed zero3 config for test_stage3_nvme_offload (#31881) 2024-07-16 16:11:37 +02:00
extended Skip tests properly (#31308) 2024-06-26 21:59:08 +01:00
fixtures Implementation of SuperPoint and AutoModelForKeypointDetection (#28966) 2024-03-19 14:43:02 +00:00
fsdp Llama et al. / FSDP : Fix breaking change in 4.40 for FSDP (#31161) 2024-06-26 14:50:08 +01:00
generation Generate: store special token tensors under a unique variable name (#31980) 2024-07-22 14:06:49 +01:00
models Add YaRN and Dynamic-YaRN RoPE Scaling Methods (#30910) 2024-07-23 10:07:58 +01:00
optimization fix: Fixed the 1st argument name in classmethods (#31907) 2024-07-11 12:11:50 +01:00
peft_integration FIX [CI]: Fix failing tests for peft integration (#29330) 2024-02-29 03:56:16 +01:00
pipelines Remove trust_remote_code when loading Libri Dummy (#31748) 2024-07-23 14:54:38 +08:00
quantization Add new quant method (#32047) 2024-07-22 20:21:59 +02:00
repo_utils Allow # Ignore copy (#27328) 2023-12-07 10:00:08 +01:00
sagemaker Fixed log messages that are resulting in TypeError due to too many arguments (#32017) 2024-07-17 10:56:44 +01:00
tokenization Skip tests properly (#31308) 2024-06-26 21:59:08 +01:00
trainer Fix tests after huggingface_hub 0.24 (#32054) 2024-07-19 19:32:39 +01:00
utils Remove trust_remote_code when loading Libri Dummy (#31748) 2024-07-23 14:54:38 +08:00
__init__.py
test_backbone_common.py Align backbone stage selection with out_indices & out_features (#27606) 2023-12-20 18:33:17 +00:00
test_configuration_common.py Move some test files (tests/test_xxx_utils.py) to tests/utils (#31730) 2024-07-02 13:46:03 +02:00
test_feature_extraction_common.py Split common test from core tests (#24284) 2023-06-15 07:30:24 -04:00
test_image_processing_common.py Skip tests properly (#31308) 2024-06-26 21:59:08 +01:00
test_image_transforms.py fix: center_crop occasionally outputs off-by-one dimension matrix (#30934) 2024-05-21 13:56:52 +01:00
test_modeling_common.py Chameleon: add model (#31534) 2024-07-17 10:41:43 +05:00
test_modeling_flax_common.py add sdpa to ViT [follow up of #29325] (#30555) 2024-05-16 10:56:11 +01:00
test_modeling_tf_common.py Port IDEFICS to tensorflow (#26870) 2024-05-13 15:59:46 +01:00
test_pipeline_mixin.py fix: Fixed raising TypeError instead of ValueError for invalid type (#32111) 2024-07-22 17:46:17 +01:00
test_processing_common.py add initial design for uniform processors + align model (#31197) 2024-06-13 16:27:16 +02:00
test_sequence_feature_extraction_common.py Fix typo (#25966) 2023-09-05 10:12:25 +02:00
test_tokenization_common.py Return assistant generated tokens mask in apply_chat_template (#30650) 2024-07-22 18:24:43 +01:00