Francisco Kurucz
13dc6b0853
Fix documentation links and code reference to model llava-next ( #32434 )
2024-08-05 15:14:50 -07:00
amyeroberts
7e5d46ded4
Respect the config's attn_implementation if set ( #32383 )
...
* Respect the config's attn if set
* Update test - can override in from_config
* Fix
2024-08-05 16:33:19 +01:00
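A minimal sketch of the behavior the commit above targets (model id illustrative; `_attn_implementation` is the private attribute the config carries):

```python
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("gpt2")
config._attn_implementation = "eager"  # attention backend recorded on the config

# the loader now respects the config's setting...
model = AutoModelForCausalLM.from_config(config)

# ...while an explicit argument to from_config still overrides it
model = AutoModelForCausalLM.from_config(config, attn_implementation="sdpa")
```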
Sai-Suraj-27
458b0cd2c5
fix: Updated test_embeded_special_tokens for luke and mluke models ( #32413 )
...
Fixed tokenizer tests for the luke and mluke models.
2024-08-05 15:19:42 +01:00
Abdi
baf7e5c927
Persist embedding type of BART and mBART models after resize ( #32242 )
...
* fix: persist embedding type of MBartForConditionalGeneration after resize
* fix: persist embedding type of BartForConditionalGeneration after resize
2024-08-05 14:15:36 +01:00
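A sketch of the invariant the fix restores, assuming BART's scaled-embedding subclass for the input embeddings (model id illustrative):

```python
from transformers import BartForConditionalGeneration

model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
embed_cls = type(model.get_input_embeddings())  # a scaled-embedding subclass, not plain nn.Embedding

# growing the vocabulary must not silently swap in a different embedding class
model.resize_token_embeddings(model.config.vocab_size + 8)
assert type(model.get_input_embeddings()) is embed_cls
```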
Francisco Kurucz
f5f1e52f6c
Fix documentation references to google/bit-50 model ( #32407 )
2024-08-05 10:18:28 +02:00
Nicholas Broad
ea5da52ebc
add values for neftune ( #32399 )
...
I always forget what typical values are, and I have to look at the paper every time. This will be a helpful reminder.
2024-08-05 09:51:58 +02:00
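For reference, NEFTune is switched on through a single training argument; the noise magnitudes the paper typically uses are 5, 10, and 15 (the `output_dir` here is illustrative):

```python
from transformers import TrainingArguments

# neftune_noise_alpha controls the magnitude of the noise added to
# embedding vectors during training
args = TrainingArguments(output_dir="out", neftune_noise_alpha=5)
```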
Ita Zaporozhets
3d7c2f9dea
#32184 save total_vocab_size ( #32240 )
...
* save total_vocab_size = vocab_size + user added tokens to speed up operation
* updating length when added_tokens_decoder is set
* add test len(tokenizer)
2024-08-05 09:22:48 +02:00
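A small sketch of what the cached `total_vocab_size` speeds up, on the slow-tokenizer path (model id illustrative):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2", use_fast=False)
tok.add_tokens(["<extra_0>", "<extra_1>"])

# len() now reads the cached total (base vocab + user-added tokens)
# instead of recomputing it on every call
assert len(tok) == tok.vocab_size + 2
```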
Raushan Turganbay
3bb646a54f
Phi3 tests: fix typing for Python 3.8 ( #32388 )
...
fix phi
2024-08-05 11:58:42 +05:00
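A generic illustration of this class of fix, not the phi3 test code itself: Python 3.8 cannot parse builtin generics in annotations, so the `typing` equivalents are used instead.

```python
from typing import List, Optional  # list[int] / int | None need Python >= 3.9/3.10

def take_last(ids: Optional[List[int]] = None) -> List[int]:
    return ids[-1:] if ids else []
```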
TechInterMezzo
05ae3a300d
fix: SeamlessM4TFeatureExtractor stride remainder ( #32088 )
...
* fix: SeamlessM4TFeatureExtractor stride remainder
* Added attention mask size test
* Reran ruff for style correction
2024-08-05 08:40:58 +02:00
dependabot[bot]
847bb856d5
Bump keras from 2.8.0 to 2.13.1 in /examples/research_projects/decision_transformer ( #32393 )
...
Bump keras in /examples/research_projects/decision_transformer
Bumps [keras](https://github.com/keras-team/keras) from 2.8.0 to 2.13.1.
- [Release notes](https://github.com/keras-team/keras/releases)
- [Commits](https://github.com/keras-team/keras/compare/v2.8.0...v2.13.1)
---
updated-dependencies:
- dependency-name: keras
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-08-05 08:38:34 +02:00
Xueshen Liu
621fb3c0ed
MixtralFlashAttention2: put "plus 1" inside parentheses when calculating rotary_seq_len, allowing None position_ids input. ( #31500 )
...
* Mixtral: remove unnecessary plus 1 when calculating rotary_seq_len, allowing position_ids=None (auto-generating position_ids could be unsafe)
* fix typo [:-1] to [:, -1]
* to meet formatting requirement
* to meet formatting requirement
* remove white space
* MixtralFlashAttention2: put "+ 1" inside parentheses when calculating rotary_seq_len, allowing None position_ids input. Fix format/style issue.
* propagate to starcoder2, phi3, mixtral and qwen2
* update qwen2_moe
2024-08-03 20:07:55 +02:00
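A sketch of the corrected computation, with names paraphrased from the modeling code rather than copied verbatim:

```python
from typing import Optional

import torch

def rotary_seq_len(kv_seq_len: int, position_ids: Optional[torch.Tensor]) -> int:
    # the "+ 1" now sits inside max(...): the largest position index plus one
    # is compared against the key/value length, and position_ids=None no
    # longer crashes (the caller falls back to kv_seq_len)
    if position_ids is None:
        return kv_seq_len
    return max(kv_seq_len, position_ids[:, -1].max().item() + 1)
```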
Shaopeng Fu
7c31d05b59
fix: (issue #32124 ) Exception raised when running transformers/examples/flax/language-modeling/t5_tokenizer_model.py ( #32157 )
...
fix: Exception raised when running t5_tokenizer_model.py.
2024-08-03 18:24:11 +02:00
Sanchit Gandhi
c1aa0edb48
[generate] only require an attention mask for mps with torch<2.4 ( #32367 )
...
* up
* style
* stopping
2024-08-02 17:32:50 +08:00
Joao Gante
083e13b7c4
RoPE: Add numerical tests ✨ ( #32380 )
...
tests! :D
2024-08-02 09:39:45 +01:00
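One quantity such numerical tests can pin down, sketched here from the standard RoPE definition (not copied from the test file):

```python
import torch

def rope_inv_freq(dim: int, base: float = 10000.0) -> torch.Tensor:
    # standard RoPE inverse frequencies: 1 / base^(2i/dim)
    return 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))

print(rope_inv_freq(8))  # tensor([1.0000, 0.1000, 0.0100, 0.0010])
```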
Raushan Turganbay
2af199c42b
Update docs ( #32368 )
...
nits
2024-08-02 09:54:16 +05:00
Zach Mueller
82efc53513
Yell at the user if zero-3 init wasn't performed but was expected to have been ( #32299 )
...
* Test this zach
* Test for improper init w/o zero3
* Move back
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Get rid of stars in warning
* Make private
* Make clear
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-08-01 15:18:43 -04:00
OsamaS99
51ab25e293
Fixed Hybrid Cache Shape Initialization. ( #32163 )
...
* fixed hybrid cache init, added test
* Fix Test Typo
---------
Co-authored-by: Aaron Haag <aaron.haag@siemens.com>
2024-08-01 13:57:42 +01:00
Joao Gante
e3d8285a84
Docker: add speech dep to the consistency docker image ( #32374 )
2024-08-01 13:46:11 +01:00
Nikos Karampatziakis
ca59d6f77c
Offloaded KV Cache ( #31325 )
...
* Initial implementation of OffloadedCache
* enable usage via cache_implementation
* Address feedback, add tests, remove legacy methods.
* Remove flash-attn, discover synchronization bugs, fix bugs
* Prevent usage in CPU only mode
* Add a section about offloaded KV cache to the docs
* Fix typos in docs
* Clarifications and better explanation of streams
2024-08-01 14:42:07 +02:00
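A minimal usage sketch (CUDA required, as the commit notes; model id illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# with the offloaded cache, per-layer key/value tensors live on the CPU and
# are prefetched to the GPU just in time, trading speed for memory
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").to("cuda")

inputs = tok("Offloading saves memory because", return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=32, cache_implementation="offloaded")
print(tok.decode(out[0], skip_special_tokens=True))
```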
Omar Salman
b4727a1216
Fix conflicting key in init kwargs in PreTrainedTokenizerBase ( #31233 )
...
* Fix conflicting key in init kwargs in PreTrainedTokenizerBase
* Update code to check for callable key in save_pretrained
* Apply PR suggestions
* Invoke CI
* Updates based on PR suggestion
2024-08-01 14:32:13 +02:00
Viktor Scherbakov
db8c7caeb6
Empty list in defaults for LLaMA special tokens during weights conversion ( #32342 )
...
empty list in defaults
2024-08-01 14:30:10 +02:00
Ita Zaporozhets
2229ebe722
update clean_up_tokenization_spaces warning ( #32371 )
2024-08-01 13:57:41 +02:00
Hanna Yukhymenko
05c1f9af9a
Check device map for saving tokenizer config on TPU (fix for issue #31971 ) ( #32043 )
...
* Remove TPU device map for saving tokenizer config
* Update tokenization_utils_base.py
* Fix error msg when passing non-string device into tokenizer
* Fix error message for non-string tokenizer device
* Print out tokenizer device type in error msg
* Update tokenization_utils_base.py
2024-08-01 13:52:05 +02:00
nv-guomingz
9e28284032
add missing attribute _supports_param_buffer_assignment for gpt-j. ( #32359 )
...
Co-authored-by: Guoming Zhang <37257613+nv-guomingz@users.noreply.github.com>
2024-08-01 13:51:20 +02:00
Lunwen He
48ed24c50a
Remove size check between attn_weights and kv_seq_len for phi3 ( #32339 )
...
* Remove size check between attn_weights and kv_seq_len
* add unit tests
2024-08-01 13:49:00 +02:00
Sanchit Gandhi
e234061cdd
[whisper] compile compatibility with long-form decoding ( #31772 )
...
* [whisper] compile compatibility with long-form decoding
* clarify comment
* fix after rebase
* finalise
* fix bsz
* fix cache split
* remove contiguous
* style
* finish
* update doc
* prevent cuda graph trace
2024-08-01 18:10:56 +08:00
Sanchit Gandhi
9451a38526
[enc-dec cache] fix bug in indexing ( #32370 )
2024-08-01 16:05:27 +08:00
Raushan Turganbay
453e74884f
LLaVa: add cache class attribute ( #32278 )
...
cache class flag
2024-08-01 09:48:03 +05:00
Ricardo
14ee2326e5
fix: warmup_steps check for training_args ( #32236 )
2024-07-31 23:34:22 +01:00
Sai-Suraj-27
53f0c9c290
fix: Removed unnecessary @staticmethod decorator ( #32361 )
...
* Fixed staticmethods with self as first argument.
* Fixed staticmethods with self as first argument.
* Fixed staticmethods with self as first argument.
* Fixed staticmethods with self as first argument.
2024-07-31 20:56:50 +01:00
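The anti-pattern being removed, reduced to a toy class (names hypothetical):

```python
class Widget:
    def __init__(self, name: str):
        self._name = name

    # previously decorated with @staticmethod despite taking self; dropping
    # the decorator restores normal instance-method binding
    def name(self) -> str:
        return self._name

print(Widget("w").name())  # prints "w"
```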
fxmarty
92abe60334
>3-5x faster torch.compile forward compilation for autoregressive decoder models ( #32227 )
...
* draft
* apply changes to all relevant archs
* rerun ci - check_docstrings.py failing?
* fix docstring
* move 2D->4D mask creation to modeling file
* repo consistency
* fix the batch size = 1 case - calling contiguous is not enough
* nit
* style
* propagate to gemma/gemma-2
* prepare inputs for gemma generation
* implement test and tiny fix in gemma2
* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix copies
* ci pass
* fix gemma's test_compile_static_cache tests
* flaky
* retrigger ci
---------
Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-08-01 02:03:07 +08:00
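The speedup applies when the decoder forward is compiled, for example with a static cache; a sketch under those assumptions (model id illustrative, timings vary by setup):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# compiling the forward pass is where the reduced graph complexity pays off
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

inputs = tok("Compilation is faster when", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=16, cache_implementation="static")
```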
Aymeric Roucher
b46bd8b9d2
Fix error when streaming to gradio with non-string tool arguments ( #32360 )
...
Fix error when streaming agent run to gradio with non-string tool arguments
2024-07-31 18:44:53 +02:00
Joao Gante
ef177a5e1c
Gemma 2: support assisted generation ( #32357 )
2024-07-31 16:04:48 +01:00
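A sketch of assisted generation with a smaller Gemma 2 drafting for a larger one (model ids illustrative; both models must share a tokenizer):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/gemma-2-9b")
target = AutoModelForCausalLM.from_pretrained("google/gemma-2-9b")
assistant = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b")

# the assistant drafts candidate tokens that the target verifies in one pass
inputs = tok("The capital of France is", return_tensors="pt")
out = target.generate(**inputs, assistant_model=assistant, max_new_tokens=20)
```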
amyeroberts
5f1fcc299c
[Idefics2] - Fix FA2 call for Perceiver layer ( #32275 )
...
* Fix FA2 call for Perceiver layer
* [run_slow] idefics2
* [run_slow] idefics2
* [run_slow] idefics2
* Fix up
* [run_slow] idefics2
* [run_slow] idefics2
* [run_slow] idefics2
2024-07-31 14:51:04 +01:00
Joao Gante
b75ad56620
Llama 3.1: Fix incorrect inv_freq assignment ( #32330 )
...
fix 💩
2024-07-31 11:12:46 +01:00
Raushan Turganbay
7f552e28e0
Gemma2 and flash-attention ( #32188 )
...
* enable flash-attn & static cache
* this works, not the prev
* fix for sliding window layers
* not needed anymore
2024-07-31 10:33:38 +05:00
Raushan Turganbay
a3264332cf
LLaVA-NeXT: fix anyres shapes ( #32314 )
...
fix
2024-07-31 10:01:12 +05:00
Joshua Lochner
6e2d04e429
Fix slow GemmaTokenizer and improve SPM slow -> fast conversion process ( #32191 )
...
* Remove user-defined tokens which can be obtained through merges
* Remove debug line
* formatting
* Refactor spm slow -> fast converter
* revert unnecessary refactor
* set comprehension
* remove test files
* Use `vocab_scores`
* Always replace spiece underline with space in decode
* we no longer need token filtering
* Add save fast load slow unit test
* Remove tokenizers version check
* Remove duplicate code
* Make `<start_of_turn>` and `<end_of_turn>` special tokens
* Bias merge priority with length if score is the same
* Add unit test for merge priority
* CI
2024-07-30 23:36:38 +02:00
Joao Gante
026a173a64
Repo checks: skip docstring checks if not in the diff ( #32328 )
...
* tmp
* skip files not in the diff
* use git.Repo instead of an external subprocess
* add tiny change to confirm that the diff is working on pushed changes
* add make quality task
* more professional main commit reference
2024-07-30 18:56:10 +01:00
fkrasnov2
516af4bb63
fixes #32329: The Torch code is correct - to get an average of 10% o… ( #32335 )
...
fixes #32329: The Torch code is correct - to get an average of 10% of the total, we want to take 50% of the remainder after we've already masked 80% with [MASK] in the previous step.
2024-07-30 18:21:45 +01:00
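The arithmetic in question, sketched in the style of the standard masked-LM collator (shapes illustrative): 80% of the selected tokens become [MASK], and half of the remaining 20%, i.e. 10% of the total, become random tokens.

```python
import torch

masked_indices = torch.bernoulli(torch.full((4, 16), 0.15)).bool()  # 15% selected

# 80% of the selected tokens -> [MASK]
indices_replaced = torch.bernoulli(torch.full((4, 16), 0.8)).bool() & masked_indices

# 50% of the remainder -> random token (0.5 * 0.2 = 10% of the total)
indices_random = (
    torch.bernoulli(torch.full((4, 16), 0.5)).bool() & masked_indices & ~indices_replaced
)
```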
Wing Lian
62c60a3018
fixes to properly shard FSDP across cpu and meta devices for cpu_efficient_loading of prequantized 4bit models ( #32276 )
2024-07-30 18:55:59 +02:00
Sai-Suraj-27
1627108033
fix: Added missing raise keyword for a few exceptions ( #32333 )
...
Fixed raising of a few exceptions.
2024-07-30 17:53:03 +01:00
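The bug class being fixed, in miniature (function name hypothetical): an exception constructed on its own line silently does nothing unless it is raised.

```python
def check_positive(value: int) -> None:
    if value <= 0:
        # before the fix: `ValueError(...)` without `raise` was a no-op
        raise ValueError(f"expected a positive value, got {value}")
```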
plaggy
bd54ed2ed7
Alternative agent plan ( #32295 )
...
* new agent plan
* plan type assertion
* style corrections
* better prompt naming
* make fixup
2024-07-30 18:48:18 +02:00
Joao Gante
e68ec18ce2
Docs: formatting nits ( #32247 )
...
* doc formatting nits
* ignore non-autodocs
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/esm/modeling_esm.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/esm/modeling_esm.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* make fixup
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-30 15:49:14 +01:00
Yoach Lacombe
2fbbcf5007
Fix M4T for ASR pipeline ( #32296 )
...
* tentative fix
* do the same for M4T
2024-07-30 16:00:13 +02:00
Luc Georges
084b5094eb
feat(ci): set fetch-depth: 0 in trufflehog checkout step ( #31663 )
2024-07-30 14:49:26 +02:00
Teddy Ferdinan
20528f067c
Cast epochs_trained to int when resuming training ( #32286 )
...
* fix epochs_trained as int when resuming training
* refactor
---------
Co-authored-by: teddyferdinan <teddy.ferdinan@pwr.edu.pl>
2024-07-30 11:25:54 +02:00
Isotr0py
934fe1504e
Fix GGUF dequantize for gguf==0.9.1 ( #32298 )
...
* fix gguf dequantize for gguf==0.9.1
* fix old version
* make style
2024-07-30 11:01:00 +02:00
Gilad Turok
3e8106d253
Docs: fix GaLore optimizer code example ( #32249 )
...
Docs: fix GaLore optimizer example
Fix incorrect usage of the GaLore optimizer in the Transformers trainer code example.
The GaLore optimizer uses low-rank gradient updates to reduce memory usage. GaLore is quite popular and is implemented by the authors in [jiaweizzhao/GaLore](https://github.com/jiaweizzhao/GaLore). A few months ago GaLore was added to the HuggingFace Transformers library in https://github.com/huggingface/transformers/pull/29588.
Documentation of the Trainer module includes a few code examples of how to use GaLore. However, the `optim_target_modules` argument passed to `TrainingArguments` in those examples is incorrect, as discussed in https://github.com/huggingface/transformers/pull/29588#issuecomment-2006289512. This pull request fixes that issue.
2024-07-30 09:19:24 +02:00
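The corrected usage pattern, assuming `galore-torch` is installed (target-module patterns illustrative):

```python
from transformers import TrainingArguments

# optim picks a GaLore variant; optim_target_modules is a list of
# module-name patterns that receive the low-rank gradient updates
args = TrainingArguments(
    output_dir="out",
    optim="galore_adamw",
    optim_target_modules=[r".*attn.*", r".*mlp.*"],
)
```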
Yih-Dar
f0bc49e7f6
use torch 2.4 in 2 CI jobs ( #32302 )
...
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-07-29 22:12:21 +02:00