* fix long seq bug
* fixed format
* fixed fn copy inconsistency
* Addressed comments
* added a unit test
* fixed cache position
* Added a warning msg to the forward fn
* fixed test case
* more precise name
* better docstrings
* Update src/transformers/cache_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* chore(root): Initial commit of Phi-3 files.
* fix(root): Fixes Phi-3 missing on readme.
* fix(root): Ensures files are consistent.
* fix(phi3): Fixes unit tests.
* fix(tests): Fixes style of phi-3 test file.
* chore(tests): Adds integration tests for Phi-3.
* fix(phi3): Removes additional flash-attention usage, e.g., swiglu and rmsnorm.
* fix(phi3): Fixes incorrect docstrings.
* fix(phi3): Fixes docstring typos.
* fix(phi3): Adds support for Su and Yarn embeddings.
* fix(phi3): Improves according to first batch of reviews.
* fix(phi3): Uses up_states instead of y in Phi3MLP.
* fix(phi3): Uses gemma rotary embedding to support torch.compile.
* fix(phi3): Improves how rotary embedding classes are defined.
* fix(phi3): Fixes inv_freq not being re-computed for extended RoPE.
* fix(phi3): Adds last suggestions to modeling file.
* fix(phi3): Splits inv_freq calculation in two lines.