Commit Graph

16502 Commits

Author SHA1 Message Date
OsamaS99
51ab25e293
Fixed Hybrid Cache Shape Initialization. (#32163)
* fixed hybrid cache init, added test

* Fix Test Typo

---------

Co-authored-by: Aaron Haag <aaron.haag@siemens.com>
2024-08-01 13:57:42 +01:00
Joao Gante
e3d8285a84
Docker: add speech dep to the consistency docker image (#32374) 2024-08-01 13:46:11 +01:00
Nikos Karampatziakis
ca59d6f77c
Offloaded KV Cache (#31325)
* Initial implementation of OffloadedCache

* enable usage via cache_implementation

* Address feedback, add tests, remove legacy methods.

* Remove flash-attn, discover synchronization bugs, fix bugs

* Prevent usage in CPU only mode

* Add a section about offloaded KV cache to the docs

* Fix typos in docs

* Clarifications and better explanation of streams
2024-08-01 14:42:07 +02:00
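For reference, the offloaded cache added in the entry above is opt-in through the `cache_implementation` generation argument, and it requires a GPU (the commit explicitly blocks CPU-only use). A minimal usage sketch, with an illustrative checkpoint:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative small checkpoint; any decoder-only model is used the same way.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").to("cuda")

inputs = tokenizer("Offloaded KV caching keeps most past keys/values off the GPU:", return_tensors="pt").to("cuda")

# Inactive layers' keys/values live in CPU memory and are prefetched on a
# separate CUDA stream (the "streams" the commit's docs clarify).
out = model.generate(**inputs, max_new_tokens=32, cache_implementation="offloaded")
print(tokenizer.decode(out[0], skip_special_tokens=True))
```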
Omar Salman
b4727a1216
Fix conflicting key in init kwargs in PreTrainedTokenizerBase (#31233)
* Fix conflicting key in init kwargs in PreTrainedTokenizerBase

* Update code to check for callable key in save_pretrained

* Apply PR suggestions

* Invoke CI

* Updates based on PR suggestion
2024-08-01 14:32:13 +02:00
Viktor Scherbakov
db8c7caeb6
Empty list in defaults for LLaMA special tokens during weights conversion (#32342)
empty list in defaults
2024-08-01 14:30:10 +02:00
Ita Zaporozhets
2229ebe722
update clean_up_tokenization_spaces warning (#32371) 2024-08-01 13:57:41 +02:00
Hanna Yukhymenko
05c1f9af9a
Check device map for saving tokenizer config on TPU (fix for issue #31971) (#32043)
* Remove TPU device map for saving tokenizer config

* Update tokenization_utils_base.py

* Fix error msg when passing non-string device into tokenizer

* Fix error message for non-string tokenizer device

* Print out tokenizer device type in error msg

* Update tokenization_utils_base.py
2024-08-01 13:52:05 +02:00
nv-guomingz
9e28284032
add missing attribute _supports_param_buffer_assignment for gpt-j. (#32359)
Co-authored-by: Guoming Zhang <37257613+nv-guomingz@users.noreply.github.com>
2024-08-01 13:51:20 +02:00
Lunwen He
48ed24c50a
Remove size check between attn_weights and kv_seq_len for phi3 (#32339)
* Remove size check between attn_weights and kv_seq_len

* add unit tests
2024-08-01 13:49:00 +02:00
Sanchit Gandhi
e234061cdd
[whisper] compile compatibility with long-form decoding (#31772)
* [whisper] compile compatibility with long-form decoding

* clarify comment

* fix after rebase

* finalise

* fix bsz

* fix cache split

* remove contiguous

* style

* finish

* update doc

* prevent cuda graph trace
2024-08-01 18:10:56 +08:00
Sanchit Gandhi
9451a38526
[enc-dec cache] fix bug in indexing (#32370) 2024-08-01 16:05:27 +08:00
Raushan Turganbay
453e74884f
LLaVa: add cache class attribute (#32278)
cache class flag
2024-08-01 09:48:03 +05:00
Ricardo
14ee2326e5
fix: warmup_steps check for training_args (#32236) 2024-07-31 23:34:22 +01:00
Sai-Suraj-27
53f0c9c290
fix: Removed unnecessary @staticmethod decorator (#32361)
* Fixed staticmethods with self as first argument.

* Fixed staticmethods with self as first argument.

* Fixed staticmethods with self as first argument.

* Fixed staticmethods with self as first argument.
2024-07-31 20:56:50 +01:00
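The pattern removed above is a `@staticmethod` whose first parameter is `self`: the decorator stops Python from binding the instance, so `self` silently becomes whatever positional argument the caller supplies. A hypothetical before/after (these classes are illustrative, not from the PR):

```python
class BrokenConverter:
    @staticmethod
    def convert(self, value):  # bug: no instance is bound; `self` is just the caller's first argument
        return self.scale * value

class FixedConverter:
    def __init__(self, scale):
        self.scale = scale

    def convert(self, value):  # decorator removed; `self` is the bound instance again
        return self.scale * value

print(FixedConverter(scale=2).convert(21))  # 42
```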
fxmarty
92abe60334
>3-5x faster torch.compile forward compilation for autoregressive decoder models (#32227)
* draft

* apply changes to all relevant archs

* rerun ci - check_docstrings.py failing?

* fix docstring

* move 2D->4D mask creation to modeling file

* repo consistency

* fix the batch size = 1 case - calling contiguous is not enough

* nit

* style

* propagate to gemma/gemma-2

* prepare inputs for gemma generation

* implement test and tiny fix in gemma2

* Update src/transformers/models/bloom/modeling_bloom.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix copies

* ci pass

* fix gemma's test_compile_static_cache tests

* flaky

* retrigger ci

---------

Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-08-01 02:03:07 +08:00
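The "2D->4D mask creation" step above refers to expanding a `(batch, src_len)` padding mask into the `(batch, 1, tgt_len, src_len)` additive bias that attention scores consume; keeping that step in the modeling file lets `torch.compile` trace it cheaply. A generic sketch (the helper name is illustrative, not the exact function the PR moved):

```python
import torch

def to_4d_attention_mask(mask_2d: torch.Tensor, tgt_len: int, dtype=torch.float32) -> torch.Tensor:
    """Expand a (batch, src_len) 0/1 padding mask to a (batch, 1, tgt_len, src_len) additive bias."""
    bsz, src_len = mask_2d.shape
    expanded = mask_2d[:, None, None, :].expand(bsz, 1, tgt_len, src_len).to(dtype)
    # 1 -> attend (bias 0.0), 0 -> masked (large negative bias)
    return (1.0 - expanded) * torch.finfo(dtype).min

print(to_4d_attention_mask(torch.tensor([[1, 1, 0]]), tgt_len=3).shape)  # torch.Size([1, 1, 3, 3])
```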
Aymeric Roucher
b46bd8b9d2
Fix error when streaming to gradio with non-string tool arguments (#32360)
Fix error when streaming agent run to gradio with non-string tool arguments
2024-07-31 18:44:53 +02:00
Joao Gante
ef177a5e1c
Gemma 2: support assisted generation (#32357) 2024-07-31 16:04:48 +01:00
amyeroberts
5f1fcc299c
[Idefics2] - Fix FA2 call for Perceiver layer (#32275)
* Fix FA2 call for Perceiver layer

* [run_slow] idefics2

* [run_slow] idefics2

* [run_slow] idefics2

* Fix up

* [run_slow] idefics2

* [run_slow] idefics2

* [run_slow] idefics2
2024-07-31 14:51:04 +01:00
Joao Gante
b75ad56620
Llama 3.1: Fix incorrect inv_freq assignment (#32330)
fix 💩
2024-07-31 11:12:46 +01:00
Raushan Turganbay
7f552e28e0
Gemma2 and flash-attention (#32188)
* enable flash-attn & static cache

* this works, not the prev

* fix for sliding window layers

* not needed anymore
2024-07-31 10:33:38 +05:00
Raushan Turganbay
a3264332cf
LLaVA-NeXT: fix anyres shapes (#32314)
fix
2024-07-31 10:01:12 +05:00
Joshua Lochner
6e2d04e429
Fix slow GemmaTokenizer and improve SPM slow -> fast conversion process (#32191)
* Remove user-defined tokens which can be obtained through merges

* Remove debug line

* formatting

* Refactor spm slow -> fast converter

* revert unnecessary refactor

* set comprehension

* remove test files

* Use `vocab_scores`

* Always replace spiece underline with space in decode

* we no longer need token filtering

* Add save fast load slow unit test

* Remove tokenizers version check

* Remove duplicate code

* Make `<start_of_turn>` and `<end_of_turn>` special tokens

* Bias merge priority with length if score is the same

* Add unit test for merge priority

* CI
2024-07-30 23:36:38 +02:00
Joao Gante
026a173a64
Repo checks: skip docstring checks if not in the diff (#32328)
* tmp

* skip files not in the diff

* use git.Repo instead of an external subprocess

* add tiny change to confirm that the diff is working on pushed changes

* add make quality task

* more profesh main commit reference
2024-07-30 18:56:10 +01:00
fkrasnov2
516af4bb63
fixes #32329: The Torch code is correct - to get an average of 10% o… (#32335)
fixes #32329: The Torch code is correct - to get an average of 10% of the total, we want to take 50% of the remainder after we've already masked 80% with [MASK] in the previous step.
2024-07-30 18:21:45 +01:00
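The percentages in the entry above are the standard BERT-style 80/10/10 masking split: 80% of selected tokens become [MASK], and taking 50% of the remaining 20% gives the 10% that are replaced by a random token, leaving the final 10% untouched. A sketch of that arithmetic in the spirit of the library's MLM collator (shapes and probabilities are the usual defaults, shown for illustration):

```python
import torch

labels_shape = (4, 16)
selected = torch.bernoulli(torch.full(labels_shape, 0.15)).bool()  # tokens chosen for prediction

# 80% of the selected tokens -> [MASK]
masked = torch.bernoulli(torch.full(labels_shape, 0.8)).bool() & selected

# 50% of the remaining 20% -> 10% of the total get a random token
randomized = torch.bernoulli(torch.full(labels_shape, 0.5)).bool() & selected & ~masked

# the rest (the final 10% of the total) keep their original token
unchanged = selected & ~masked & ~randomized
```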
Wing Lian
62c60a3018
fixes to properly shard FSDP across cpu and meta for cpu_efficient_loading for prequantized 4bit (#32276) 2024-07-30 18:55:59 +02:00
Sai-Suraj-27
1627108033
fix: Added missing raise keyword for a few exceptions (#32333)
Fixed raising of a few exceptions.
2024-07-30 17:53:03 +01:00
plaggy
bd54ed2ed7
Alternative agent plan (#32295)
* new agent plan

* plan type assertion

* style corrections

* better prompt naming

* make fixup
2024-07-30 18:48:18 +02:00
Joao Gante
e68ec18ce2
Docs: formatting nits (#32247)
* doc formatting nits

* ignore non-autodocs

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/esm/modeling_esm.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/esm/modeling_esm.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* make fixup

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-30 15:49:14 +01:00
Yoach Lacombe
2fbbcf5007
Fix M4T for ASR pipeline (#32296)
* tentative fix

* do the same for M4T
2024-07-30 16:00:13 +02:00
Luc Georges
084b5094eb
feat(ci): set fetch-depth: 0 in trufflehog checkout step (#31663) 2024-07-30 14:49:26 +02:00
Teddy Ferdinan
20528f067c
Cast epochs_trained to int when resuming training (#32286)
* fix epochs_trained as int when resuming training

* refactor

---------

Co-authored-by: teddyferdinan <teddy.ferdinan@pwr.edu.pl>
2024-07-30 11:25:54 +02:00
Isotr0py
934fe1504e
Fix GGUF dequantize for gguf==0.9.1 (#32298)
* fix gguf dequantize for gguf==0.9.1

* fix old version

* make style
2024-07-30 11:01:00 +02:00
Gilad Turok
3e8106d253
Docs: fix GaLore optimizer code example (#32249)
Docs: fix GaLore optimizer example

Fix incorrect usage of GaLore optimizer in Transformers trainer code example.

The GaLore optimizer uses low-rank gradient updates to reduce memory usage. GaLore is quite popular and is implemented by the authors in [https://github.com/jiaweizzhao/GaLore](https://github.com/jiaweizzhao/GaLore). A few months ago GaLore was added to the HuggingFace Transformers library in https://github.com/huggingface/transformers/pull/29588.

Documentation of the Trainer module includes a few code examples of how to use GaLore. However, the `optim_target_modules` argument to `TrainingArguments` is incorrect, as discussed in https://github.com/huggingface/transformers/pull/29588#issuecomment-2006289512. This pull request fixes this issue.
2024-07-30 09:19:24 +02:00
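A sketch of the corrected usage with the properly spelled `optim_target_modules` argument (the target patterns below are illustrative; GaLore itself additionally requires the `galore-torch` package at train time):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./galore-test",
    max_steps=100,
    per_device_train_batch_size=2,
    optim="galore_adamw",                  # selects the GaLore optimizer
    optim_target_modules=["attn", "mlp"],  # module-name patterns that receive low-rank updates
)
```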
Yih-Dar
f0bc49e7f6
use torch 2.4 in 2 CI jobs (#32302)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-07-29 22:12:21 +02:00
Aymeric Roucher
a24a9a66f4
Add stream messages from agent run for gradio chatbot (#32142)
* Add stream_to_gradio method for running agent in gradio demo
2024-07-29 20:12:44 +02:00
Guang Yang
811a9caa21
Make static cache compatible with torch.export (#32168) 2024-07-29 18:19:15 +01:00
Sanchit Gandhi
7f5d644e69
[pipeline] fix padding for 1-d tensors (#31776)
* [pipeline] fix padding for 1-d tensors

* add test

* make style

* Update tests/pipelines/test_pipelines_automatic_speech_recognition.py

Co-authored-by: Kamil Akesbi <45195979+kamilakesbi@users.noreply.github.com>

* Update tests/pipelines/test_pipelines_automatic_speech_recognition.py

---------

Co-authored-by: Kamil Akesbi <45195979+kamilakesbi@users.noreply.github.com>
2024-07-29 21:24:42 +08:00
Kamil Akesbi
3fbaaaa64d
Whisper tokenizer word level timestamps (#32197)
* fix _fix_key in PreTrainedModel

* fix _find_longest_common_sequence

* add test

* remove result.json

* nit

* update test
2024-07-29 11:19:52 +01:00
Joao Gante
7ffe25f2b9
Generate: end-to-end compilation (#30788)
* mvp

* added test (a few models need fixes)

* fix a few test cases

* test nits

* harder test 😈

* revert changes in stablelm

* test with improved condition

* add todo

* tmp commit

* merged with main

* nits

* add todo

* final corrections

* add docs for generation compilation

* docs nits

* add tip

* PR suggestions

* add more details to the compilation docs

* fix cache positions

* cache is now init in generate; update docs

* tag test as flaky

* docs

* post rebase make fixup and other nits

* remove unintended changes

* whisper (encoder-decoder) not supported

* move token default updates to ; add tests for token defaults

* push changes

* manual rebase

* chameleon doesn't support this

* fix test_static_cache_mha_mqa_gqa (broken in another PR)

* docs: dynamic is better with end-to-end compilation
2024-07-29 10:52:13 +01:00
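Per the docs added in this PR, compilation now wraps `generate` itself rather than only the model's `forward`, paired with a fixed-shape static cache so the traced graph stays stable across decoding steps. A minimal sketch with an illustrative checkpoint:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").to("cuda")

# Static cache avoids shape-driven recompilation during decoding.
model.generation_config.cache_implementation = "static"
model.generate = torch.compile(model.generate, fullgraph=True, mode="reduce-overhead")

inputs = tokenizer("Compiled end to end:", return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```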
Sai-Suraj-27
49928892d6
fix(docs): Fixed a link in docs (#32274)
Fixed a link in docs.
2024-07-29 10:50:43 +01:00
Fanli Lin
6494479f1d
make p_mask a numpy array before passing to select_starts_ends (#32076)
* fix

* bug fix

* refine

* fix
2024-07-29 10:29:11 +01:00
Joao Gante
535fe78b9f
Repo: remove exceptions in check_docstrings (#32259)
remove exceptions
2024-07-29 11:06:05 +02:00
Sai-Suraj-27
a2ad9d5ad5
fix: Fixed wrong argument passed to convert_blip_checkpoint function call (#32262)
Removed one wrong argument passed to convert_blip_checkpoint function call.
2024-07-29 10:43:09 +02:00
leejet
5019aabfac
Optimize t5 tokenize logic to avoid redundant calls (#32270)
* Optimize t5 tokenize logic to avoid redundant calls

* fix and overwrite copies
2024-07-29 09:51:43 +02:00
Yih-Dar
f2122cc6eb
Upload new model failure report to Hub (#32264)
upload

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-07-29 09:42:54 +02:00
Raushan Turganbay
f739687684
🚨 Bloom support for cache class (#31445)
* bloom dynamic cache

* bloom follows standard cache format

* no skips for bloom anymore

* use cache position when possible

* clean up

* codestyle

* Update src/transformers/models/bloom/modeling_bloom.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/bloom/modeling_bloom.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/bloom/modeling_bloom.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* pr comments

* isinstance fix

* address comments

* make musicgen test happy

* [run-slow] bloom

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-29 10:58:59 +05:00
Joao Gante
44f6fdd74f
Llama 3.1: replace for loop by tensor ops at inv_freq initialization (#32244)
* replace for loop by tensor ops

* rm assert; readability
2024-07-27 10:19:46 +01:00
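The loop replaced above computed the Llama 3.1 RoPE frequency scaling one element at a time; the same result falls out of `torch.where` applied to the whole `inv_freq` tensor. A hedged sketch of the vectorized form (the constants are the published Llama 3.1 defaults, used here for illustration):

```python
import math
import torch

dim, base = 128, 500000.0
inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float) / dim))

factor, low_freq_factor, high_freq_factor, old_context_len = 8.0, 1.0, 4.0, 8192
wavelen = 2 * math.pi / inv_freq
smooth = (old_context_len / wavelen - low_freq_factor) / (high_freq_factor - low_freq_factor)

inv_freq = torch.where(
    wavelen > old_context_len / low_freq_factor,       # low-frequency band: fully scaled down
    inv_freq / factor,
    torch.where(
        wavelen < old_context_len / high_freq_factor,  # high-frequency band: left unscaled
        inv_freq,
        (1 - smooth) * inv_freq / factor + smooth * inv_freq,  # smooth interpolation in between
    ),
)
```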
Yih-Dar
8da9068730
More flexible trigger condition (#32251)
update

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-07-26 20:52:45 +02:00
Raushan Turganbay
81233c069c
Flash-Attn: fix generation when no attention mask or no padding (#32241)
* fix

* fix prev test (half of failures)

* [run-slow] llama, gemma2

* [run-slow] llama, gemma2
2024-07-26 14:45:55 +05:00
Fanli Lin
27c7f971c0
[tests] fix static cache implementation is not compatible with attn_implementation==flash_attention_2 (#32039)
* add flash attention check

* fix

* fix
2024-07-26 11:41:27 +02:00