Francisco Kurucz
13dc6b0853
Fix documentation links and code reference to model llava-next ( #32434 )
2024-08-05 15:14:50 -07:00
amyeroberts
7e5d46ded4
Respect the config's attn_implementation if set ( #32383 )
...
* Respect the config's attn if set
* Update test - can override in from_config
* Fix
2024-08-05 16:33:19 +01:00
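A minimal sketch of the behavior the commit above targets (model id illustrative; `_attn_implementation` is the private attribute the config carries):

```python
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("gpt2")
config._attn_implementation = "eager"  # attention backend recorded on the config

# the loader now respects the config's setting...
model = AutoModelForCausalLM.from_config(config)

# ...while an explicit argument to from_config still overrides it
model = AutoModelForCausalLM.from_config(config, attn_implementation="sdpa")
```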
Sai-Suraj-27
458b0cd2c5
fix: Updated test_embeded_special_tokens for luke and mluke models ( #32413 )
...
Fixed tokenizer tests for the luke and mluke models.
2024-08-05 15:19:42 +01:00
Abdi
baf7e5c927
Persist embedding type of BART and mBART models after resize ( #32242 )
...
* fix: persist embedding type of MBartForConditionalGeneration after resize
* fix: persist embedding type of BartForConditionalGeneration after resize
2024-08-05 14:15:36 +01:00
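A sketch of the invariant the fix restores, assuming BART's scaled-embedding subclass for the input embeddings (model id illustrative):

```python
from transformers import BartForConditionalGeneration

model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
embed_cls = type(model.get_input_embeddings())  # a scaled-embedding subclass, not plain nn.Embedding

# growing the vocabulary must not silently swap in a different embedding class
model.resize_token_embeddings(model.config.vocab_size + 8)
assert type(model.get_input_embeddings()) is embed_cls
```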
Francisco Kurucz
f5f1e52f6c
Fix documentation references to google/bit-50 model ( #32407 )
2024-08-05 10:18:28 +02:00
Nicholas Broad
ea5da52ebc
add values for neftune ( #32399 )
...
I always forget what typical values are, and I have to look at the paper every time. This will be a helpful reminder.
2024-08-05 09:51:58 +02:00
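For reference, NEFTune is switched on through a single training argument; the noise magnitudes the paper typically uses are 5, 10, and 15 (the `output_dir` here is illustrative):

```python
from transformers import TrainingArguments

# neftune_noise_alpha controls the magnitude of the noise added to
# embedding vectors during training
args = TrainingArguments(output_dir="out", neftune_noise_alpha=5)
```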
Ita Zaporozhets
3d7c2f9dea
#32184 save total_vocab_size ( #32240 )
...
* save total_vocab_size = vocab_size + user added tokens to speed up operation
* updating length when added_tokens_decoder is set
* add test len(tokenizer)
2024-08-05 09:22:48 +02:00
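A small sketch of what the cached `total_vocab_size` speeds up, on the slow-tokenizer path (model id illustrative):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2", use_fast=False)
tok.add_tokens(["<extra_0>", "<extra_1>"])

# len() now reads the cached total (base vocab + user-added tokens)
# instead of recomputing it on every call
assert len(tok) == tok.vocab_size + 2
```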
Raushan Turganbay
3bb646a54f
Phi3 tests: fix typing for Python 3.8 ( #32388 )
...
fix phi
2024-08-05 11:58:42 +05:00
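A generic illustration of this class of fix, not the phi3 test code itself: Python 3.8 cannot parse builtin generics in annotations, so the `typing` equivalents are used instead.

```python
from typing import List, Optional  # list[int] / int | None need Python >= 3.9/3.10

def take_last(ids: Optional[List[int]] = None) -> List[int]:
    return ids[-1:] if ids else []
```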
TechInterMezzo
05ae3a300d
fix: SeamlessM4TFeatureExtractor stride remainder ( #32088 )
...
* fix: SeamlessM4TFeatureExtractor stride remainder
* Added attention mask size test
* Reran ruff for style correction
2024-08-05 08:40:58 +02:00
dependabot[bot]
847bb856d5
Bump keras from 2.8.0 to 2.13.1 in /examples/research_projects/decision_transformer ( #32393 )
...
Bump keras in /examples/research_projects/decision_transformer
Bumps [keras](https://github.com/keras-team/keras) from 2.8.0 to 2.13.1.
- [Release notes](https://github.com/keras-team/keras/releases)
- [Commits](https://github.com/keras-team/keras/compare/v2.8.0...v2.13.1)
---
updated-dependencies:
- dependency-name: keras
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-08-05 08:38:34 +02:00
Xueshen Liu
621fb3c0ed
MixtralFlashAttention2: put "plus 1" inside parentheses when calculating rotary_seq_len, allowing None position_ids input. ( #31500 )
...
* Mixtral: remove unnecessary plus 1 when calculating rotary_seq_len, allowing position_ids=None (auto-generating position_ids could be unsafe)
* fix typo [:-1] to [:, -1]
* to meet formatting requirement
* to meet formatting requirement
* remove white space
* MixtralFlashAttention2: put "+ 1" inside parentheses when calculating rotary_seq_len, allowing None position_ids input. Fix format/style issue.
* propagate to starcoder2, phi3, mixtral and qwen2
* update qwen2_moe
2024-08-03 20:07:55 +02:00
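A sketch of the corrected computation, with names paraphrased from the modeling code rather than copied verbatim:

```python
from typing import Optional

import torch

def rotary_seq_len(kv_seq_len: int, position_ids: Optional[torch.Tensor]) -> int:
    # the "+ 1" now sits inside max(...): the largest position index plus one
    # is compared against the key/value length, and position_ids=None no
    # longer crashes (the caller falls back to kv_seq_len)
    if position_ids is None:
        return kv_seq_len
    return max(kv_seq_len, position_ids[:, -1].max().item() + 1)
```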
Shaopeng Fu
7c31d05b59
fix: (issue #32124 ) Exception raised when running transformers/examples/flax/language-modeling/t5_tokenizer_model.py ( #32157 )
...
fix: Exception raised when running t5_tokenizer_model.py.
2024-08-03 18:24:11 +02:00
Sanchit Gandhi
c1aa0edb48
[generate] only require an attention mask for mps with torch<2.4 ( #32367 )
...
* up
* style
* stopping
2024-08-02 17:32:50 +08:00
Joao Gante
083e13b7c4
RoPE: Add numerical tests ✨ ( #32380 )
...
tests! :D
2024-08-02 09:39:45 +01:00
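One quantity such numerical tests can pin down, sketched here from the standard RoPE definition (not copied from the test file):

```python
import torch

def rope_inv_freq(dim: int, base: float = 10000.0) -> torch.Tensor:
    # standard RoPE inverse frequencies: 1 / base^(2i/dim)
    return 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))

print(rope_inv_freq(8))  # tensor([1.0000, 0.1000, 0.0100, 0.0010])
```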
Raushan Turganbay
2af199c42b
Update docs ( #32368 )
...
nits
2024-08-02 09:54:16 +05:00
Zach Mueller
82efc53513
Yell at the user if zero-3 init wasn't performed but was expected to have been ( #32299 )
...
* Test this zach
* Test for improper init w/o zero3
* Move back
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Get rid of stars in warning
* Make private
* Make clear
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-08-01 15:18:43 -04:00
OsamaS99
51ab25e293
Fixed Hybrid Cache Shape Initialization. ( #32163 )
...
* fixed hybrid cache init, added test
* Fix Test Typo
---------
Co-authored-by: Aaron Haag <aaron.haag@siemens.com>
2024-08-01 13:57:42 +01:00
Joao Gante
e3d8285a84
Docker: add speech dep to the consistency docker image ( #32374 )
2024-08-01 13:46:11 +01:00
Nikos Karampatziakis
ca59d6f77c
Offloaded KV Cache ( #31325 )
...
* Initial implementation of OffloadedCache
* enable usage via cache_implementation
* Address feedback, add tests, remove legacy methods.
* Remove flash-attn, discover synchronization bugs, fix bugs
* Prevent usage in CPU only mode
* Add a section about offloaded KV cache to the docs
* Fix typos in docs
* Clarifications and better explanation of streams
2024-08-01 14:42:07 +02:00
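A minimal usage sketch (CUDA required, as the commit notes; model id illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# with the offloaded cache, per-layer key/value tensors live on the CPU and
# are prefetched to the GPU just in time, trading speed for memory
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").to("cuda")

inputs = tok("Offloading saves memory because", return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=32, cache_implementation="offloaded")
print(tok.decode(out[0], skip_special_tokens=True))
```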
Omar Salman
b4727a1216
Fix conflicting key in init kwargs in PreTrainedTokenizerBase ( #31233 )
...
* Fix conflicting key in init kwargs in PreTrainedTokenizerBase
* Update code to check for callable key in save_pretrained
* Apply PR suggestions
* Invoke CI
* Updates based on PR suggestion
2024-08-01 14:32:13 +02:00
Viktor Scherbakov
db8c7caeb6
Empty list in defaults for LLaMA special tokens during weights conversion ( #32342 )
...
empty list in defaults
2024-08-01 14:30:10 +02:00
Ita Zaporozhets
2229ebe722
update clean_up_tokenization_spaces warning ( #32371 )
2024-08-01 13:57:41 +02:00
Hanna Yukhymenko
05c1f9af9a
Check device map for saving tokenizer config on TPU (fix for issue #31971 ) ( #32043 )
...
* Remove TPU device map for saving tokenizer config
* Update tokenization_utils_base.py
* Fix error msg when passing non-string device into tokenizer
* Fix error message for non-string tokenizer device
* Print out tokenizer device type in error msg
* Update tokenization_utils_base.py
2024-08-01 13:52:05 +02:00
nv-guomingz
9e28284032
add missing attribute _supports_param_buffer_assignment for gpt-j. ( #32359 )
...
Co-authored-by: Guoming Zhang <37257613+nv-guomingz@users.noreply.github.com>
2024-08-01 13:51:20 +02:00
Lunwen He
48ed24c50a
Remove size check between attn_weights and kv_seq_len for phi3 ( #32339 )
...
* Remove size check between attn_weights and kv_seq_len
* add unit tests
2024-08-01 13:49:00 +02:00
Sanchit Gandhi
e234061cdd
[whisper] compile compatibility with long-form decoding ( #31772 )
...
* [whisper] compile compatibility with long-form decoding
* clarify comment
* fix after rebase
* finalise
* fix bsz
* fix cache split
* remove contiguous
* style
* finish
* update doc
* prevent cuda graph trace
2024-08-01 18:10:56 +08:00
Sanchit Gandhi
9451a38526
[enc-dec cache] fix bug in indexing ( #32370 )
2024-08-01 16:05:27 +08:00
Raushan Turganbay
453e74884f
LLaVa: add cache class attribute ( #32278 )
...
cache class flag
2024-08-01 09:48:03 +05:00
Ricardo
14ee2326e5
fix: warmup_steps check for training_args ( #32236 )
2024-07-31 23:34:22 +01:00
Sai-Suraj-27
53f0c9c290
fix: Removed unnecessary @staticmethod decorator ( #32361 )
...
* Fixed staticmethods with self as first argument.
* Fixed staticmethods with self as first argument.
* Fixed staticmethods with self as first argument.
* Fixed staticmethods with self as first argument.
2024-07-31 20:56:50 +01:00
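The anti-pattern being removed, reduced to a toy class (names hypothetical):

```python
class Widget:
    def __init__(self, name: str):
        self._name = name

    # previously decorated with @staticmethod despite taking self; dropping
    # the decorator restores normal instance-method binding
    def name(self) -> str:
        return self._name

print(Widget("w").name())  # prints "w"
```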
fxmarty
92abe60334
>3-5x faster torch.compile forward compilation for autoregressive decoder models ( #32227 )
...
* draft
* apply changes to all relevant archs
* rerun ci - check_docstrings.py failing?
* fix docstring
* move 2D->4D mask creation to modeling file
* repo consistency
* fix the batch size = 1 case - calling contiguous is not enough
* nit
* style
* propagate to gemma/gemma-2
* prepare inputs for gemma generation
* implement test and tiny fix in gemma2
* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix copies
* ci pass
* fix gemma's test_compile_static_cache tests
* flaky
* retrigger ci
---------
Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-08-01 02:03:07 +08:00
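The speedup applies when the decoder forward is compiled, for example with a static cache; a sketch under those assumptions (model id illustrative, timings vary by setup):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# compiling the forward pass is where the reduced graph complexity pays off
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

inputs = tok("Compilation is faster when", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=16, cache_implementation="static")
```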
Aymeric Roucher
b46bd8b9d2
Fix error when streaming to gradio with non-string tool arguments ( #32360 )
...
Fix error when streaming agent run to gradio with non-string tool arguments
2024-07-31 18:44:53 +02:00
Joao Gante
ef177a5e1c
Gemma 2: support assisted generation ( #32357 )
2024-07-31 16:04:48 +01:00
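A sketch of assisted generation with a smaller Gemma 2 drafting for a larger one (model ids illustrative; both models must share a tokenizer):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/gemma-2-9b")
target = AutoModelForCausalLM.from_pretrained("google/gemma-2-9b")
assistant = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b")

# the assistant drafts candidate tokens that the target verifies in one pass
inputs = tok("The capital of France is", return_tensors="pt")
out = target.generate(**inputs, assistant_model=assistant, max_new_tokens=20)
```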
amyeroberts
5f1fcc299c
[Idefics2] - Fix FA2 call for Perceiver layer ( #32275 )
...
* Fix FA2 call for Perceiver layer
* [run_slow] idefics2
* [run_slow] idefics2
* [run_slow] idefics2
* Fix up
* [run_slow] idefics2
* [run_slow] idefics2
* [run_slow] idefics2
2024-07-31 14:51:04 +01:00
Joao Gante
b75ad56620
Llama 3.1: Fix incorrect inv_freq assignment ( #32330 )
...
fix 💩
2024-07-31 11:12:46 +01:00
Raushan Turganbay
7f552e28e0
Gemma2 and flash-attention ( #32188 )
...
* enable flash-attn & static cache
* this works, not the prev
* fix for sliding window layers
* not needed anymore
2024-07-31 10:33:38 +05:00
Raushan Turganbay
a3264332cf
LLaVA-NeXT: fix anyres shapes ( #32314 )
...
fix
2024-07-31 10:01:12 +05:00
Joshua Lochner
6e2d04e429
Fix slow GemmaTokenizer and improve SPM slow -> fast conversion process ( #32191 )
...
* Remove user-defined tokens which can be obtained through merges
* Remove debug line
* formatting
* Refactor spm slow -> fast converter
* revert unnecessary refactor
* set comprehension
* remove test files
* Use `vocab_scores`
* Always replace spiece underline with space in decode
* we no longer need token filtering
* Add save fast load slow unit test
* Remove tokenizers version check
* Remove duplicate code
* Make `<start_of_turn>` and `<end_of_turn>` special tokens
* Bias merge priority with length if score is the same
* Add unit test for merge priority
* CI
2024-07-30 23:36:38 +02:00
Joao Gante
026a173a64
Repo checks: skip docstring checks if not in the diff ( #32328 )
...
* tmp
* skip files not in the diff
* use git.Repo instead of an external subprocess
* add tiny change to confirm that the diff is working on pushed changes
* add make quality task
* more professional main commit reference
2024-07-30 18:56:10 +01:00
fkrasnov2
516af4bb63
fixes #32329: The Torch code is correct - to get an average of 10% o… ( #32335 )
...
fixes #32329: The Torch code is correct - to get an average of 10% of the total, we want to take 50% of the remainder after we've already masked 80% with [MASK] in the previous step.
2024-07-30 18:21:45 +01:00
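The arithmetic in question, sketched in the style of the standard masked-LM collator (shapes illustrative): 80% of the selected tokens become [MASK], and half of the remaining 20%, i.e. 10% of the total, become random tokens.

```python
import torch

masked_indices = torch.bernoulli(torch.full((4, 16), 0.15)).bool()  # 15% selected

# 80% of the selected tokens -> [MASK]
indices_replaced = torch.bernoulli(torch.full((4, 16), 0.8)).bool() & masked_indices

# 50% of the remainder -> random token (0.5 * 0.2 = 10% of the total)
indices_random = (
    torch.bernoulli(torch.full((4, 16), 0.5)).bool() & masked_indices & ~indices_replaced
)
```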
Wing Lian
62c60a3018
fixes to properly shard FSDP across cpu and meta devices for cpu_efficient_loading of prequantized 4bit models ( #32276 )
2024-07-30 18:55:59 +02:00
Sai-Suraj-27
1627108033
fix: Added missing raise keyword for a few exceptions ( #32333 )
...
Fixed raising of a few exceptions.
2024-07-30 17:53:03 +01:00
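The bug class being fixed, in miniature (function name hypothetical): an exception constructed on its own line silently does nothing unless it is raised.

```python
def check_positive(value: int) -> None:
    if value <= 0:
        # before the fix: `ValueError(...)` without `raise` was a no-op
        raise ValueError(f"expected a positive value, got {value}")
```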
plaggy
bd54ed2ed7
Alternative agent plan ( #32295 )
...
* new agent plan
* plan type assertion
* style corrections
* better prompt naming
* make fixup
2024-07-30 18:48:18 +02:00
Joao Gante
e68ec18ce2
Docs: formatting nits ( #32247 )
...
* doc formatting nits
* ignore non-autodocs
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/esm/modeling_esm.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/esm/modeling_esm.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* make fixup
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-30 15:49:14 +01:00
Yoach Lacombe
2fbbcf5007
Fix M4T for ASR pipeline ( #32296 )
...
* tentative fix
* do the same for M4T
2024-07-30 16:00:13 +02:00
Luc Georges
084b5094eb
feat(ci): set fetch-depth: 0 in trufflehog checkout step ( #31663 )
2024-07-30 14:49:26 +02:00
Teddy Ferdinan
20528f067c
Cast epochs_trained to int when resuming training ( #32286 )
...
* fix epochs_trained as int when resuming training
* refactor
---------
Co-authored-by: teddyferdinan <teddy.ferdinan@pwr.edu.pl>
2024-07-30 11:25:54 +02:00
Isotr0py
934fe1504e
Fix GGUF dequantize for gguf==0.9.1 ( #32298 )
...
* fix gguf dequantize for gguf==0.9.1
* fix old version
* make style
2024-07-30 11:01:00 +02:00
Gilad Turok
3e8106d253
Docs: fix GaLore optimizer code example ( #32249 )
...
Docs: fix GaLore optimizer example
Fix incorrect usage of the GaLore optimizer in the Transformers trainer code example.
The GaLore optimizer uses low-rank gradient updates to reduce memory usage. GaLore is quite popular and is implemented by the authors in [jiaweizzhao/GaLore](https://github.com/jiaweizzhao/GaLore). A few months ago GaLore was added to the HuggingFace Transformers library in https://github.com/huggingface/transformers/pull/29588.
Documentation of the Trainer module includes a few code examples of how to use GaLore. However, the `optim_target_modules` argument passed to `TrainingArguments` in those examples is incorrect, as discussed in https://github.com/huggingface/transformers/pull/29588#issuecomment-2006289512. This pull request fixes that issue.
2024-07-30 09:19:24 +02:00
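The corrected usage pattern, assuming `galore-torch` is installed (target-module patterns illustrative):

```python
from transformers import TrainingArguments

# optim picks a GaLore variant; optim_target_modules is a list of
# module-name patterns that receive the low-rank gradient updates
args = TrainingArguments(
    output_dir="out",
    optim="galore_adamw",
    optim_target_modules=[r".*attn.*", r".*mlp.*"],
)
```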
Yih-Dar
f0bc49e7f6
use torch 2.4 in 2 CI jobs ( #32302 )
...
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-07-29 22:12:21 +02:00