Commit Graph

16502 Commits

Author SHA1 Message Date
OsamaS99
51ab25e293
Fixed Hybrid Cache Shape Initialization. (#32163)
* fixed hybrid cache init, added test

* Fix Test Typo

---------

Co-authored-by: Aaron Haag <aaron.haag@siemens.com>
2024-08-01 13:57:42 +01:00
Joao Gante
e3d8285a84
Docker: add speech dep to the consistency docker image (#32374) 2024-08-01 13:46:11 +01:00
Nikos Karampatziakis
ca59d6f77c
Offloaded KV Cache (#31325)
* Initial implementation of OffloadedCache

* enable usage via cache_implementation

* Address feedback, add tests, remove legacy methods.

* Remove flash-attn, discover synchronization bugs, fix bugs

* Prevent usage in CPU only mode

* Add a section about offloaded KV cache to the docs

* Fix typos in docs

* Clarifications and better explanation of streams
2024-08-01 14:42:07 +02:00
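For reference, the offloaded cache added in the entry above is opt-in through the `cache_implementation` generation argument, and it requires a GPU (the commit explicitly blocks CPU-only use). A minimal usage sketch, with an illustrative checkpoint:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative small checkpoint; any decoder-only model is used the same way.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").to("cuda")

inputs = tokenizer("Offloaded KV caching keeps most past keys/values off the GPU:", return_tensors="pt").to("cuda")

# Inactive layers' keys/values live in CPU memory and are prefetched on a
# separate CUDA stream (the "streams" the commit's docs clarify).
out = model.generate(**inputs, max_new_tokens=32, cache_implementation="offloaded")
print(tokenizer.decode(out[0], skip_special_tokens=True))
```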
Omar Salman
b4727a1216
Fix conflicting key in init kwargs in PreTrainedTokenizerBase (#31233)
* Fix conflicting key in init kwargs in PreTrainedTokenizerBase

* Update code to check for callable key in save_pretrained

* Apply PR suggestions

* Invoke CI

* Updates based on PR suggestion
2024-08-01 14:32:13 +02:00
Viktor Scherbakov
db8c7caeb6
Empty list in defaults for LLaMA special tokens during weights conversion (#32342)
empty list in defaults
2024-08-01 14:30:10 +02:00
Ita Zaporozhets
2229ebe722
update clean_up_tokenization_spaces warning (#32371) 2024-08-01 13:57:41 +02:00
Hanna Yukhymenko
05c1f9af9a
Check device map for saving tokenizer config on TPU (fix for issue #31971) (#32043)
* Remove TPU device map for saving tokenizer config

* Update tokenization_utils_base.py

* Fix error msg when passing non-string device into tokenizer

* Fix error message for non-string tokenizer device

* Print out tokenizer device type in error msg

* Update tokenization_utils_base.py
2024-08-01 13:52:05 +02:00
nv-guomingz
9e28284032
add missing attribute _supports_param_buffer_assignment for gpt-j. (#32359)
Co-authored-by: Guoming Zhang <37257613+nv-guomingz@users.noreply.github.com>
2024-08-01 13:51:20 +02:00
Lunwen He
48ed24c50a
Remove size check between attn_weights and kv_seq_len for phi3 (#32339)
* Remove size check between attn_weights and kv_seq_len

* add unit tests
2024-08-01 13:49:00 +02:00
Sanchit Gandhi
e234061cdd
[whisper] compile compatibility with long-form decoding (#31772)
* [whisper] compile compatibility with long-form decoding

* clarify comment

* fix after rebase

* finalise

* fix bsz

* fix cache split

* remove contiguous

* style

* finish

* update doc

* prevent cuda graph trace
2024-08-01 18:10:56 +08:00
Sanchit Gandhi
9451a38526
[enc-dec cache] fix bug in indexing (#32370) 2024-08-01 16:05:27 +08:00
Raushan Turganbay
453e74884f
LLaVa: add cache class attribute (#32278)
cache class flag
2024-08-01 09:48:03 +05:00
Ricardo
14ee2326e5
fix: warmup_steps check for training_args (#32236) 2024-07-31 23:34:22 +01:00
Sai-Suraj-27
53f0c9c290
fix: Removed unnecessary @staticmethod decorator (#32361)
* Fixed staticmethods with self as first argument.

* Fixed staticmethods with self as first argument.

* Fixed staticmethods with self as first argument.

* Fixed staticmethods with self as first argument.
2024-07-31 20:56:50 +01:00
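The pattern removed above is a `@staticmethod` whose first parameter is `self`: the decorator stops Python from binding the instance, so `self` silently becomes whatever positional argument the caller supplies. A hypothetical before/after (these classes are illustrative, not from the PR):

```python
class BrokenConverter:
    @staticmethod
    def convert(self, value):  # bug: no instance is bound; `self` is just the caller's first argument
        return self.scale * value

class FixedConverter:
    def __init__(self, scale):
        self.scale = scale

    def convert(self, value):  # decorator removed; `self` is the bound instance again
        return self.scale * value

print(FixedConverter(scale=2).convert(21))  # 42
```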
fxmarty
92abe60334
>3-5x faster torch.compile forward compilation for autoregressive decoder models (#32227)
* draft

* apply changes to all relevant archs

* rerun ci - check_docstrings.py failing?

* fix docstring

* move 2D->4D mask creation to modeling file

* repo consistency

* fix the batch size = 1 case - calling contiguous is not enough

* nit

* style

* propagate to gemma/gemma-2

* prepare inputs for gemma generation

* implement test and tiny fix in gemma2

* Update src/transformers/models/bloom/modeling_bloom.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix copies

* ci pass

* fix gemma's test_compile_static_cache tests

* flaky

* retrigger ci

---------

Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-08-01 02:03:07 +08:00
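The "2D->4D mask creation" step above refers to expanding a `(batch, src_len)` padding mask into the `(batch, 1, tgt_len, src_len)` additive bias that attention scores consume; keeping that step in the modeling file lets `torch.compile` trace it cheaply. A generic sketch (the helper name is illustrative, not the exact function the PR moved):

```python
import torch

def to_4d_attention_mask(mask_2d: torch.Tensor, tgt_len: int, dtype=torch.float32) -> torch.Tensor:
    """Expand a (batch, src_len) 0/1 padding mask to a (batch, 1, tgt_len, src_len) additive bias."""
    bsz, src_len = mask_2d.shape
    expanded = mask_2d[:, None, None, :].expand(bsz, 1, tgt_len, src_len).to(dtype)
    # 1 -> attend (bias 0.0), 0 -> masked (large negative bias)
    return (1.0 - expanded) * torch.finfo(dtype).min

print(to_4d_attention_mask(torch.tensor([[1, 1, 0]]), tgt_len=3).shape)  # torch.Size([1, 1, 3, 3])
```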
Aymeric Roucher
b46bd8b9d2
Fix error when streaming to gradio with non-string tool arguments (#32360)
Fix error when streaming agent run to gradio with non-string tool arguments
2024-07-31 18:44:53 +02:00
Joao Gante
ef177a5e1c
Gemma 2: support assisted generation (#32357) 2024-07-31 16:04:48 +01:00
amyeroberts
5f1fcc299c
[Idefics2] - Fix FA2 call for Perceiver layer (#32275)
* Fix FA2 call for Perceiver layer

* [run_slow] idefics2

* [run_slow] idefics2

* [run_slow] idefics2

* Fix up

* [run_slow] idefics2

* [run_slow] idefics2

* [run_slow] idefics2
2024-07-31 14:51:04 +01:00
Joao Gante
b75ad56620
Llama 3.1: Fix incorrect inv_freq assignment (#32330)
fix 💩
2024-07-31 11:12:46 +01:00
Raushan Turganbay
7f552e28e0
Gemma2 and flash-attention (#32188)
* enable flash-attn & static cache

* this works, not the prev

* fix for sliding window layers

* not needed anymore
2024-07-31 10:33:38 +05:00
Raushan Turganbay
a3264332cf
LLaVA-NeXT: fix anyres shapes (#32314)
fix
2024-07-31 10:01:12 +05:00
Joshua Lochner
6e2d04e429
Fix slow GemmaTokenizer and improve SPM slow -> fast conversion process (#32191)
* Remove user-defined tokens which can be obtained through merges

* Remove debug line

* formatting

* Refactor spm slow -> fast converter

* revert unnecessary refactor

* set comprehension

* remove test files

* Use `vocab_scores`

* Always replace spiece underline with space in decode

* we no longer need token filtering

* Add save fast load slow unit test

* Remove tokenizers version check

* Remove duplicate code

* Make `<start_of_turn>` and `<end_of_turn>` special tokens

* Bias merge priority with length if score is the same

* Add unit test for merge priority

* CI
2024-07-30 23:36:38 +02:00
Joao Gante
026a173a64
Repo checks: skip docstring checks if not in the diff (#32328)
* tmp

* skip files not in the diff

* use git.Repo instead of an external subprocess

* add tiny change to confirm that the diff is working on pushed changes

* add make quality task

* more profesh main commit reference
2024-07-30 18:56:10 +01:00
fkrasnov2
516af4bb63
fixes #32329: The Torch code is correct - to get an average of 10% o… (#32335)
fixes #32329: The Torch code is correct - to get an average of 10% of the total, we want to take 50% of the remainder after we've already masked 80% with [MASK] in the previous step.
2024-07-30 18:21:45 +01:00
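The percentages in the entry above are the standard BERT-style 80/10/10 masking split: 80% of selected tokens become [MASK], and taking 50% of the remaining 20% gives the 10% that are replaced by a random token, leaving the final 10% untouched. A sketch of that arithmetic in the spirit of the library's MLM collator (shapes and probabilities are the usual defaults, shown for illustration):

```python
import torch

labels_shape = (4, 16)
selected = torch.bernoulli(torch.full(labels_shape, 0.15)).bool()  # tokens chosen for prediction

# 80% of the selected tokens -> [MASK]
masked = torch.bernoulli(torch.full(labels_shape, 0.8)).bool() & selected

# 50% of the remaining 20% -> 10% of the total get a random token
randomized = torch.bernoulli(torch.full(labels_shape, 0.5)).bool() & selected & ~masked

# the rest (the final 10% of the total) keep their original token
unchanged = selected & ~masked & ~randomized
```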
Wing Lian
62c60a3018
fixes to properly shard FSDP across cpu and meta for cpu_efficient_loading for prequantized 4bit (#32276) 2024-07-30 18:55:59 +02:00
Sai-Suraj-27
1627108033
fix: Added missing raise keyword for a few exceptions (#32333)
Fixed raising of a few exceptions.
2024-07-30 17:53:03 +01:00
plaggy
bd54ed2ed7
Alternative agent plan (#32295)
* new agent plan

* plan type assertion

* style corrections

* better prompt naming

* make fixup
2024-07-30 18:48:18 +02:00
Joao Gante
e68ec18ce2
Docs: formatting nits (#32247)
* doc formatting nits

* ignore non-autodocs

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/esm/modeling_esm.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/esm/modeling_esm.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* make fixup

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-30 15:49:14 +01:00
Yoach Lacombe
2fbbcf5007
Fix M4T for ASR pipeline (#32296)
* tentative fix

* do the same for M4T
2024-07-30 16:00:13 +02:00
Luc Georges
084b5094eb
feat(ci): set fetch-depth: 0 in trufflehog checkout step (#31663) 2024-07-30 14:49:26 +02:00
Teddy Ferdinan
20528f067c
Cast epochs_trained to int when resuming training (#32286)
* fix epochs_trained as int when resuming training

* refactor

---------

Co-authored-by: teddyferdinan <teddy.ferdinan@pwr.edu.pl>
2024-07-30 11:25:54 +02:00
Isotr0py
934fe1504e
Fix GGUF dequantize for gguf==0.9.1 (#32298)
* fix gguf dequantize for gguf==0.9.1

* fix old version

* make style
2024-07-30 11:01:00 +02:00
Gilad Turok
3e8106d253
Docs: fix GaLore optimizer code example (#32249)
Docs: fix GaLore optimizer example

Fix incorrect usage of GaLore optimizer in Transformers trainer code example.

The GaLore optimizer uses low-rank gradient updates to reduce memory usage. GaLore is quite popular and is implemented by the authors in [https://github.com/jiaweizzhao/GaLore](https://github.com/jiaweizzhao/GaLore). A few months ago GaLore was added to the HuggingFace Transformers library in https://github.com/huggingface/transformers/pull/29588.

Documentation of the Trainer module includes a few code examples of how to use GaLore. However, the `optim_target_modules` argument to `TrainingArguments` is incorrect, as discussed in https://github.com/huggingface/transformers/pull/29588#issuecomment-2006289512. This pull request fixes this issue.
2024-07-30 09:19:24 +02:00
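A sketch of the corrected usage with the properly spelled `optim_target_modules` argument (the target patterns below are illustrative; GaLore itself additionally requires the `galore-torch` package at train time):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./galore-test",
    max_steps=100,
    per_device_train_batch_size=2,
    optim="galore_adamw",                  # selects the GaLore optimizer
    optim_target_modules=["attn", "mlp"],  # module-name patterns that receive low-rank updates
)
```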
Yih-Dar
f0bc49e7f6
use torch 2.4 in 2 CI jobs (#32302)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-07-29 22:12:21 +02:00
Aymeric Roucher
a24a9a66f4
Add stream messages from agent run for gradio chatbot (#32142)
* Add stream_to_gradio method for running agent in gradio demo
2024-07-29 20:12:44 +02:00
Guang Yang
811a9caa21
Make static cache compatible with torch.export (#32168) 2024-07-29 18:19:15 +01:00
Sanchit Gandhi
7f5d644e69
[pipeline] fix padding for 1-d tensors (#31776)
* [pipeline] fix padding for 1-d tensors

* add test

* make style

* Update tests/pipelines/test_pipelines_automatic_speech_recognition.py

Co-authored-by: Kamil Akesbi <45195979+kamilakesbi@users.noreply.github.com>

* Update tests/pipelines/test_pipelines_automatic_speech_recognition.py

---------

Co-authored-by: Kamil Akesbi <45195979+kamilakesbi@users.noreply.github.com>
2024-07-29 21:24:42 +08:00
Kamil Akesbi
3fbaaaa64d
Whisper tokenizer word level timestamps (#32197)
* fix _fix_key in PreTrainedModel

* fix _find_longest_common_sequence

* add test

* remove result.json

* nit

* update test
2024-07-29 11:19:52 +01:00
Joao Gante
7ffe25f2b9
Generate: end-to-end compilation (#30788)
* mvp

* added test (a few models need fixes)

* fix a few test cases

* test nits

* harder test 😈

* revert changes in stablelm

* test with improved condition

* add todo

* tmp commit

* merged with main

* nits

* add todo

* final corrections

* add docs for generation compilation

* docs nits

* add tip

* PR suggestions

* add more details to the compilation docs

* fix cache positions

* cache is now init in generate; update docs

* tag test as flaky

* docs

* post rebase make fixup and other nits

* remove unintended changes

* whisper (encoder-decoder) not supported

* move token default updates to ; add tests for token defaults

* push changes

* manual rebase

* chameleon doesn't support this

* fix test_static_cache_mha_mqa_gqa (broken in another PR)

* docs: dynamic is better with end-to-end compilation
2024-07-29 10:52:13 +01:00
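Per the docs added in this PR, compilation now wraps `generate` itself rather than only the model's `forward`, paired with a fixed-shape static cache so the traced graph stays stable across decoding steps. A minimal sketch with an illustrative checkpoint:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").to("cuda")

# Static cache avoids shape-driven recompilation during decoding.
model.generation_config.cache_implementation = "static"
model.generate = torch.compile(model.generate, fullgraph=True, mode="reduce-overhead")

inputs = tokenizer("Compiled end to end:", return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```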
Sai-Suraj-27
49928892d6
fix(docs): Fixed a link in docs (#32274)
Fixed a link in docs.
2024-07-29 10:50:43 +01:00
Fanli Lin
6494479f1d
make p_mask a numpy array before passing to select_starts_ends (#32076)
* fix

* bug fix

* refine

* fix
2024-07-29 10:29:11 +01:00
Joao Gante
535fe78b9f
Repo: remove exceptions in check_docstrings (#32259)
remove exceptions
2024-07-29 11:06:05 +02:00
Sai-Suraj-27
a2ad9d5ad5
fix: Fixed wrong argument passed to convert_blip_checkpoint function call (#32262)
Removed one wrong argument passed to convert_blip_checkpoint function call.
2024-07-29 10:43:09 +02:00
leejet
5019aabfac
Optimize t5 tokenize logic to avoid redundant calls (#32270)
* Optimize t5 tokenize logic to avoid redundant calls

* fix and overwrite copies
2024-07-29 09:51:43 +02:00
Yih-Dar
f2122cc6eb
Upload new model failure report to Hub (#32264)
upload

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-07-29 09:42:54 +02:00
Raushan Turganbay
f739687684
🚨 Bloom support for cache class (#31445)
* bloom dynamic cache

* bloom follows standard cache format

* no skips for bloom anymore

* use cache position when possible

* clean up

* codestyle

* Update src/transformers/models/bloom/modeling_bloom.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/bloom/modeling_bloom.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/bloom/modeling_bloom.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* pr comments

* isinstance fix

* address comments

* make musicgen test happy

* [run-slow] bloom

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-29 10:58:59 +05:00
Joao Gante
44f6fdd74f
Llama 3.1: replace for loop by tensor ops at inv_freq initialization (#32244)
* replace for loop by tensor ops

* rm assert; readability
2024-07-27 10:19:46 +01:00
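The loop replaced above computed the Llama 3.1 RoPE frequency scaling one element at a time; the same result falls out of `torch.where` applied to the whole `inv_freq` tensor. A hedged sketch of the vectorized form (the constants are the published Llama 3.1 defaults, used here for illustration):

```python
import math
import torch

dim, base = 128, 500000.0
inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float) / dim))

factor, low_freq_factor, high_freq_factor, old_context_len = 8.0, 1.0, 4.0, 8192
wavelen = 2 * math.pi / inv_freq
smooth = (old_context_len / wavelen - low_freq_factor) / (high_freq_factor - low_freq_factor)

inv_freq = torch.where(
    wavelen > old_context_len / low_freq_factor,       # low-frequency band: fully scaled down
    inv_freq / factor,
    torch.where(
        wavelen < old_context_len / high_freq_factor,  # high-frequency band: left unscaled
        inv_freq,
        (1 - smooth) * inv_freq / factor + smooth * inv_freq,  # smooth interpolation in between
    ),
)
```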
Yih-Dar
8da9068730
More flexible trigger condition (#32251)
update

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-07-26 20:52:45 +02:00
Raushan Turganbay
81233c069c
Flash-Attn: fix generation when no attention mask or no padding (#32241)
* fix

* fix prev test (half of failures)

* [run-slow] llama, gemma2

* [run-slow] llama, gemma2
2024-07-26 14:45:55 +05:00
Fanli Lin
27c7f971c0
[tests] fix static cache implementation is not compatible with attn_implementation==flash_attention_2 (#32039)
* add flash attention check

* fix

* fix
2024-07-26 11:41:27 +02:00