* Fix issues in add and is_done for BeamHypotheses
* make newly added arguments optional for better compatibility
* Directly use cur_len as generated_len, add note for retrocompatibility
* update test expectation
* make cur_len represents the length of the entire sequence including the decoder prompt
* remove redundant if/else in testing
* Draft version of new KV Caching
This should allow Attention Sinks (https://github.com/tomaarsen/attention_sinks)
/ StreamingLLM (https://arxiv.org/abs/2309.17453) to be easily implemented
in a third-party or in transformers directly
* Address numerous PR suggestions
1. Move layer_idx from cache to ...Attention. Removes confusing set_layer_idx magic.
2. Always convert past_key_values to Cache instance at the start of ...Attention, removes all other isinstance calls.
3. Remove __bool__ and __getitem__ magic as they're confusing.
4. past_key_values.update(key, value, idx) now returns key, value.
5. Add use_legacy_cache flag, defaults to None, i.e. Falsey. This breaks generate for now, until 1) the cache is used is generate() or 2) use_legacy_cache is defaulted to True in generate() until we change it in another PR.
6. Separate key_cache and value_cache.
Some work is still needed to see if the SinkCache can conveniently be implemented with just one update method.
* Implement the SinkCache through backward+forward rotations
* Integrate (Sink)Cache with Llama FA2
* Set use_legacy_cache=True as default, allows for test passes
* Move from/to_legacy_cache to ...Model class
* Undo unnecessary newline change
* Remove copy utility from deprecated OpenLlama
* Match import style
* manual rebase with main
* Cache class working with generate (#1)
* Draft version of new KV Caching
This should allow Attention Sinks (https://github.com/tomaarsen/attention_sinks)
/ StreamingLLM (https://arxiv.org/abs/2309.17453) to be easily implemented
in a third-party or in transformers directly
* Address numerous PR suggestions
1. Move layer_idx from cache to ...Attention. Removes confusing set_layer_idx magic.
2. Always convert past_key_values to Cache instance at the start of ...Attention, removes all other isinstance calls.
3. Remove __bool__ and __getitem__ magic as they're confusing.
4. past_key_values.update(key, value, idx) now returns key, value.
5. Add use_legacy_cache flag, defaults to None, i.e. Falsey. This breaks generate for now, until 1) the cache is used is generate() or 2) use_legacy_cache is defaulted to True in generate() until we change it in another PR.
6. Separate key_cache and value_cache.
Some work is still needed to see if the SinkCache can conveniently be implemented with just one update method.
* Integrate (Sink)Cache with Llama FA2
* Move from/to_legacy_cache to ...Model class
* Undo unnecessary newline change
* Match import style
* working generate
* Add tests; Simplify code; Apply changes to Mistral and Persimmon
* fix rebase mess
* a few more manual fixes
* last manual fix
* propagate changes to phi
* upgrade test
* add use_legacy_cache docstring; beef up tests
* reintroduce unwanted deletes
---------
Co-authored-by: Tom Aarsen <Cubiegamedev@gmail.com>
* move import
* add default to model_kwargs.get('use_legacy_cache')
* correct failing test
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* apply PR suggestions
* fix failing test
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com>
* PR comments
* tmp commit
* add docstrings
* more tests, more docstrings, add to docs
* derp
* tmp commit
* tmp dbg
* more dbg
* fix beam search bug
* cache can be a list of tuples in some models
* fix group beam search
* all but sinkcache integration tests
* fix sink cache and add hard integration test
* now also compatible with input_embeds input
* PR comments
* add Cache support to Phi+FA2
* make fixup
---------
Co-authored-by: Joao Gante <joao@huggingface.co>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Add models
* Add more models
* Update docs/source/ja/model_doc/convnextv2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/model_doc/convbert.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/model_doc/codegen.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update translation errors and author names
* link update
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Updates the Distributed CPU documentation to add a Kubernetes example
* Small edits
* Fixing link
* Adding missing new lines
* Minor edits
* Update to include Dockerfile snippet
* Add comment about tuning env var
* Updates based on review comments
* Un-skip tests
* Add aliasing support to tf_to_pt_weight_rename
* Refactor tf-to-pt weight rename for simplicity
* Patch mobilebert
* Let us pray that the transfo-xl one works
* Add XGLM rename
* Expand the test to see if we can get more models to break
* Expand the test to see if we can get more models to break
* Fix MPNet (it was actually an unrelated bug)
* Fix MPNet (it was actually an unrelated bug)
* Add speech2text fix
* Update src/transformers/modeling_tf_pytorch_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/mobilebert/modeling_tf_mobilebert.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update to always return a tuple from tf_to_pt_weight_rename
* reformat
* Add a couple of missing tuples
* Remove the extra test for tie_word_embeddings since it didn't cause any unexpected failures anyway
* Revert changes to modeling_tf_mpnet.py
* Skip MPNet test and add explanation
* Add weight link for BART
* Add TODO to clean this up a bit
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* update `create_model_card` to properly save peft details when using Trainer with PEFT
* nit
* Apply suggestions from code review
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
---------
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
* add model like
* logits match
* minor fixes
* fixes
* up
* up
* add todo
* llava processor
* keep the processor simple
* add conversion script
* fixup
* fix copies
* up
* add to index
* fix config + logits
* fix
* refactor
* more refactor
* more refactor
* fix copies
* add authors
* v1 tests
* add `LlavaProcessor` in init
* remove unneeded import
* up
* up
* docs
* up
* fix CI
* fix CI
* add attention mask in test
* make fixup
* remove the vision model
* that' s the dirty way to do it
* nits
* nits
* updates
* add more tests
* add input tests
* fixup
* more styling
* nits
* updates amd cleanup
* fixup the generation expected results
* fix the testing script
* some cleanup and simplification which does not work yet but almost there!
* make correct dispatch operations
* vectorize works for batch of images and text
* last todos
* nits
* update test and modeling code
* remove useless function for now
* fix few issues
* fix generation
* some nits
* add bakllava
* nits
* remove duplicated code
* finis merge
* cleanup
* missed this line
* fill the todos
* add left padding offset
* add left and rignt padding logic
* bool to properly index
* make sure
* more cleanups
* batch is fixed 😉
* add correct device for tensor creation
* fix some dtype missmatch
* ruff
* update conversion script
* Update src/transformers/__init__.py
* fa 2 support + fix conversion script
* more
* correct reshaping
* fix test dict
* fix copies by ignoring
* fix nit
* skip clip vision model
* fixup
* fixup
* LlavaForVisionText2Text -> LlavaForCausalLM
* update
* fix
* raise correct errors
* fix
* docs
* nuke for now
* nits here and there
* fixup
* fix remaining tests
* update LlavaForConditionalGeneration instead of CausalLM
* fixups
* pipeline support
* slow and piepline tests
* supports batch
* nits
* cleanup
* fix first integration tests
* add pad token where needed
* correct etsts
* fixups
* update pipeline testr
* fix quality
* nits
* revert unneeded change
* nit
* use BatchFeature
* from ...feature_extraction_utils import BatchFeature
* nits
* nits
* properly update
* more f*** nits
* fix copies
* comment
* keep slow test slow
* Update src/transformers/models/llava/processing_llava.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add piepline example
* add pixel values in docstrign
* update pr doctest
* fix
* fix slow tests
* remove hack
* fixup
* small note
* forward contrib credits from PR25789
* forward contrib credits from original implementation and work
* add arthur
* Update src/transformers/models/llava/processing_llava.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
* update docstring
* nit
* move to not doctested because of timeout issues
* fixup
* add description
* more
* fix-copies
* fix docs
* add beam search
* add more comments
* add typehints on processor
* add speedup plot
* update slow tests and docs
* push test
* push batched test
* fix batched generation with different number of images
* remove benchmark due to a bug
* fix test
* fix copies
* add gcolab demo
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: shauray8 <shauray8@users.noreply.github.com>
Co-authored-by: haotian-liu <haotian-liu@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
* Copies `modeling_flax_gpt_neo.py` to start
* MLP Block. WIP Attention and Block
* Adds Flax implementation of `LlamaMLP`
Validated with in-file test.
Some slight numeric differences, but assuming it isn't an issue
* Adds `FlaxLlamaRMSNorm` layer
`flax.linen` includes `RMSNorm` layer but not necessarily in all
versions. Hence, we add in-file.
* Adds FlaxLlamaAttention
Copied from GPT-J as it has efficient caching implementation as well as
rotary embeddings.
Notice numerically different, but not by a huge amount. Needs
investigating
* Adds `FlaxLlamaDecoderLayer`
numerically inaccurate, debugging..
* debugging rotary mismatch
gptj uses interleaved whilst llama uses contiguous
i think they match now but still final result is wrong.
maybe drop back to just debugging attention layer?
* fixes bug with decoder layer
still somewhat numerically inaccurate, but close enough for now
* adds markers for what to implement next
the structure here diverges a lot from the PT version.
not a big fan of it, but just get something working for now
* implements `FlaxLlamaBlockCollection`]
tolerance must be higher than expected, kinda disconcerting
* Adds `FlaxLlamaModule`
equivalent PyTorch model is `LlamaModel`
yay! a language model🤗
* adds `FlaxLlamaForCausalLMModule`
equivalent to `LlamaForCausalLM`
still missing returning dict or tuple, will add later
* start porting pretrained wrappers
realised it probably needs return dict as a prereq
* cleanup, quality, style
* readds `return_dict` and model output named tuples
* (tentatively) pretrained wrappers work 🔥
* fixes numerical mismatch in `FlaxLlamaRMSNorm`
seems `jax.lax.rsqrt` does not match `torch.sqrt`.
manually computing `1 / jax.numpy.sqrt` results in matching values.
* [WIP] debugging numerics
* numerical match
I think issue was accidental change of backend. forcing CPU fixes test.
We expect some mismatch on GPU.
* adds in model and integration tests for Flax Llama
summary of failing:
- mul invalid combination of dimensions
- one numerical mismatch
- bf16 conversion (maybe my local backend issue)
- params are not FrozenDict
* adds missing TYPE_CHECKING import and `make fixup`
* adds back missing docstrings
needs review on quality of docstrings, not sure what is required.
Furthermore, need to check if `CHECKPOINT_FOR_DOC` is valid. See TODO
* commenting out equivalence test as can just use common
* debugging
* Fixes bug where mask and pos_ids were swapped in pretrained models
This results in all tests passing now 🔥
* cleanup of modeling file
* cleanup of test file
* Resolving simpler review comments
* addresses more minor review comments
* fixing introduced pytest errors from review
* wip additional slow tests
* wip tests
need to grab a GPU machine to get real logits for comparison
otherwise, slow tests should be okay
* `make quality`, `make style`
* adds slow integration tests
- checking logits
- checking hidden states
- checking generation outputs
* `make fix-copies`
* fix mangled function following `make fix-copies`
* adds missing type checking imports
* fixes missing parameter checkpoint warning
* more finegrained 'Copied from' tags
avoids issue of overwriting `LLAMA_INPUTS_DOCSTRING`
* swaps import guards
??? how did these get swapped initially?
* removing `inv_freq` again as pytorch version has now removed
* attempting to get CI to pass
* adds doc entries for llama flax models
* fixes typo in __init__.py imports
* adds back special equivalence tests
these come from the gpt neo flax tests. there is special behaviour for these models that needs to override the common version
* overrides tests with dummy to see if CI passes
need to fill in these tests later
* adds my contribution to docs
* `make style; make quality`
* replaces random masking with fixed to work with flax version
* `make quality; make style`
* Update src/transformers/models/llama/modeling_flax_llama.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update src/transformers/models/llama/modeling_flax_llama.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update src/transformers/models/llama/modeling_flax_llama.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update src/transformers/models/llama/modeling_flax_llama.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update src/transformers/models/llama/modeling_flax_llama.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update src/transformers/models/llama/modeling_flax_llama.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* updates `x`->`tensor` in `rotate_half`
* addresses smaller review comments
* Update docs/source/en/model_doc/llama.md
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* adds integration test class
* adds `dtype` to rotary embedding to cast outputs
* adds type to flax llama rotary layer
* `make style`
* `make fix-copies`
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* applies suggestions from review
* Update modeling_flax_llama.py
* `make fix-copies`
* Update tests/models/llama/test_modeling_llama.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update src/transformers/models/llama/modeling_flax_llama.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* fixes shape mismatch in FlaxLlamaMLP
* applies some suggestions from reviews
* casts attn output logits to f32 regardless of dtype
* adds attn bias using `LlamaConfig.attention_bias`
* adds Copied From comments to Flax Llama test
* mistral and persimmon test change -copy from llama
* updates docs index
* removes Copied from in tests
it was preventing `make fix-copies` from succeeding
* quality and style
* ignores FlaxLlama input docstring
* adds revision to `_CHECKPOINT_FOR_DOC`
* repo consistency and quality
* removes unused import
* removes copied from from Phi test
now diverges from llama tests following FlaxLlama changes
* adds `_REAL_CHECKPOINT_FOR_DOC`
* removes refs from pr tests
* reformat to make ruff happy
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Add models
* Add models and update `_toctree.yml`
* Update docs/source/ja/model_doc/chinese_clip.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/model_doc/camembert.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/model_doc/bros.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/model_doc/bros.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/model_doc/blip-2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/model_doc/camembert.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* solve merge conflicts and update paper titles
* Update docs/source/ja/model_doc/bridgetower.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/model_doc/canine.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/model_doc/chinese_clip.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update the authons name in bros..md
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Copy perplexity.md file to es/ folder
* Adding perplexity to es/_toctree.yml
* Translate first section
* Calculating PPL section translate
* Example section translate
* fix translate of log-likehood
* Fix title translate
* Fix \ in second paragraph
* Change verosimilitud for log-likelihood
* Run 'make style'
* v1 fusing modules
* add fused mlp support
* up
* fix CI
* block save_pretrained
* fixup
* small fix
* add new condition
* add v1 docs
* add some comments
* style
* fix nit
* adapt from suggestion
* add check
* change arg names
* change variables name
* Update src/transformers/integrations/awq.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* style
* split up into 3 different private methods
* more conditions
* more checks
* add fused tests for custom models
* fix
* fix tests
* final update docs
* final fixes
* fix importlib metadata
* Update src/transformers/utils/quantization_config.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* change it to `do_fuse`
* nit
* Update src/transformers/utils/quantization_config.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/utils/quantization_config.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/utils/quantization_config.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* few fixes
* revert
* fix test
* fix copies
* raise error if model is not quantized
* add test
* use quantization_config.config when fusing
* Update src/transformers/modeling_utils.py
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Create asr.md
* Create audio_classification.md
* Create document_question_answering.md
* Update document_question_answering.md
* add
* add
* ggg
* gg
* add masked_language_modeling.md
* add monocular_depth estimation
* new
* dd
* add
* add
* cl
* add
* Add Traslation.md
* hgf
* Added docs to Toctree file
* Update docs/source/ja/tasks/asr.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/tasks/asr.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/tasks/image_classification.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/tasks/idefics.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/tasks/image_captioning.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Fix docs and revert changes
* Update docs/source/en/tasks/idefics.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/tasks/language_modeling.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/tasks/language_modeling.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/tasks/language_modeling.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/tasks/prompting.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/tasks/masked_language_modeling.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/tasks/masked_language_modeling.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/tasks/prompting.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/tasks/object_detection.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/tasks/semantic_segmentation.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/tasks/semantic_segmentation.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/tasks/token_classification.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/tasks/translation.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/tasks/visual_question_answering.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/tasks/summarization.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* changes in review 1 and 2
* add
* Update docs/source/ja/tasks/asr.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/tasks/translation.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* changes
* Update docs/source/ja/_toctree.yml
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/_toctree.yml
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/_toctree.yml
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update _toctree.yml
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>