transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-26 07:49:01 +06:00

Author	SHA1	Message	Date
Raushan Turganbay	7f552e28e0	Gemma2 and flash-attention (#32188) * enable flash-attn & static cache * this works, not the prev * fix for sliding window layers * not needed anymore	2024-07-31 10:33:38 +05:00
Raushan Turganbay	a3264332cf	LLaVA-NeXT: fix anyres shapes (#32314) fix	2024-07-31 10:01:12 +05:00
Joshua Lochner	6e2d04e429	Fix slow GemmaTokenizer and improve SPM slow -> fast conversion process (#32191) * Remove user-defined tokens which can be obtained through merges * Remove debug line * formatting * Refactor spm slow -> fast converter * revert unnecessary refactor * set comprehension * remove test files * Use `vocab_scores` * Always replace spiece underline with space in decode * we no longer need token filtering * Add save fast load slow unit test * Remove tokenizers version check * Remove duplicate code * Make `<start_of_turn>` and `<end_of_turn>` special tokens * Bias merge priority with length if score is the same * Add unit test for merge priority * CI	2024-07-30 23:36:38 +02:00
Joao Gante	026a173a64	Repo checks: skip docstring checks if not in the diff (#32328) * tmp * skip files not in the diff * use git.Repo instead of an external subprocess * add tiny change to confirm that the diff is working on pushed changes * add make quality task * more profesh main commit reference	2024-07-30 18:56:10 +01:00
fkrasnov2	516af4bb63	fixes #32329: The Torch code is correct - to get an average of 10% o… (#32335) fixes #32329: The Torch code is correct - to get an average of 10% of the total, we want to take 50% of the remainder after we've already masked 80% with [MASK] in the previous step.	2024-07-30 18:21:45 +01:00
Wing Lian	62c60a3018	fixes to properly shard FSDP across cpu and meta for cpu_efficient_loading for prequantized 4bit (#32276)	2024-07-30 18:55:59 +02:00
Sai-Suraj-27	1627108033	fix: Added missing raise keyword for few exceptions (#32333) Fixed raising of few exceptions.	2024-07-30 17:53:03 +01:00
plaggy	bd54ed2ed7	Alternative agent plan (#32295) * new agent plan * plan type assertion * style corrections * better prompt naming * make fixup	2024-07-30 18:48:18 +02:00
Joao Gante	e68ec18ce2	Docs: formatting nits (#32247) * doc formatting nits * ignore non-autodocs * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/esm/modeling_esm.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/esm/modeling_esm.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * make fixup --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-07-30 15:49:14 +01:00
Yoach Lacombe	2fbbcf5007	Fix M4T for ASR pipeline (#32296) * tentative fix * do the same for M4T	2024-07-30 16:00:13 +02:00
Luc Georges	084b5094eb	feat(ci): set `fetch-depth: 0` in trufflehog checkout step (#31663)	2024-07-30 14:49:26 +02:00
Teddy Ferdinan	20528f067c	Cast epochs_trained to int when resuming training (#32286) * fix epochs_trained as int when resuming training * refactor --------- Co-authored-by: teddyferdinan <teddy.ferdinan@pwr.edu.pl>	2024-07-30 11:25:54 +02:00
Isotr0py	934fe1504e	Fix GGUF dequantize for `gguf==0.9.1` (#32298) * fix gguf dequantize for gguf==0.9.1 * fix old version * make style	2024-07-30 11:01:00 +02:00
Gilad Turok	3e8106d253	Docs: fix GaLore optimizer code example (#32249) Docs: fix GaLore optimizer example Fix incorrect usage of GaLore optimizer in Transformers trainer code example. The GaLore optimizer uses low-rank gradient updates to reduce memory usage. GaLore is quite popular and is implemented by the authors in [https://github.com/jiaweizzhao/GaLore](https://github.com/jiaweizzhao/GaLore). A few months ago GaLore was added to the HuggingFace Transformers library in https://github.com/huggingface/transformers/pull/29588. Documentation of the Trainer module includes a few code examples of how to use GaLore. However, the `optim_target_modules` argument to the `TrainingArguments` function is incorrect, as discussed in https://github.com/huggingface/transformers/pull/29588#issuecomment-2006289512. This pull request fixes this issue.	2024-07-30 09:19:24 +02:00
Yih-Dar	f0bc49e7f6	use torch 2.4 in 2 CI jobs (#32302) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-07-29 22:12:21 +02:00
Aymeric Roucher	a24a9a66f4	Add stream messages from agent run for gradio chatbot (#32142) * Add stream_to_gradio method for running agent in gradio demo	2024-07-29 20:12:44 +02:00
Guang Yang	811a9caa21	Make static cache compatible with torch.export (#32168)	2024-07-29 18:19:15 +01:00
Sanchit Gandhi	7f5d644e69	[pipeline] fix padding for 1-d tensors (#31776) * [pipeline] fix padding for 1-d tensors * add test * make style * Update tests/pipelines/test_pipelines_automatic_speech_recognition.py Co-authored-by: Kamil Akesbi <45195979+kamilakesbi@users.noreply.github.com> * Update tests/pipelines/test_pipelines_automatic_speech_recognition.py --------- Co-authored-by: Kamil Akesbi <45195979+kamilakesbi@users.noreply.github.com>	2024-07-29 21:24:42 +08:00
Kamil Akesbi	3fbaaaa64d	Whisper tokenizer word level timestamps (#32197) * fix _fix_key in PreTrainedModel * fix _find_longest_common_sequence * add test * remove result.json * nit * update test	2024-07-29 11:19:52 +01:00
Joao Gante	7ffe25f2b9	Generate: end-to-end compilation (#30788) * mvp * added test (a few models need fixes) * fix a few test cases * test nits * harder test 😈 * revert changes in stablelm * test with improved condition * add todo * tmp commit * merged with main * nits * add todo * final corrections * add docs for generation compilation * docs nits * add tip * PR suggestions * add more details to the compilation docs * fix cache positions * cache is now init in generate; update docs * tag test as flaky * docs * post rebase make fixup and other nits * remove unintended changes * whisper (encoder-decoder) not supported * move token default updates to ; add tests for token defaults * push changes * manual rebase * chameleon doesn't support this * fix test_static_cache_mha_mqa_gqa (broken in another PR) * docs: dynamic is better with end-to-end compilation	2024-07-29 10:52:13 +01:00
Sai-Suraj-27	49928892d6	fix(docs): Fixed a link in docs (#32274) Fixed a link in docs.	2024-07-29 10:50:43 +01:00
Fanli Lin	6494479f1d	make `p_mask` a numpy array before passing to `select_starts_ends` (#32076) * fix * bug fix * refine * fix	2024-07-29 10:29:11 +01:00
Joao Gante	535fe78b9f	Repo: remove exceptions in `check_docstrings` (#32259) remove exceptions	2024-07-29 11:06:05 +02:00
Sai-Suraj-27	a2ad9d5ad5	fix: Fixed wrong argument passed to `convert_blip_checkpoint` function call (#32262) Removed one wrong argument passed to convert_blip_checkpoint function call.	2024-07-29 10:43:09 +02:00
leejet	5019aabfac	Optimize t5 tokenize logic to avoid redundant calls (#32270) * Optimize t5 tokenize logic to avoid redundant calls * fix and overwrite copies	2024-07-29 09:51:43 +02:00
Yih-Dar	f2122cc6eb	Upload new model failure report to Hub (#32264) upload Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-07-29 09:42:54 +02:00
Raushan Turganbay	f739687684	🚨 Bloom support for cache class (#31445) * bloom dynamic cache * bloom follows standard cache format * no skips for bloom anymore * use cache position when possible * clean up * codestyle * Update src/transformers/models/bloom/modeling_bloom.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/bloom/modeling_bloom.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/bloom/modeling_bloom.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * pr comments * isinstance fix * address comments * make musicgen test happy * [run-slow] bloom --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-07-29 10:58:59 +05:00
Joao Gante	44f6fdd74f	Llama 3.1: replace for loop by tensor ops at inv_freq initialization (#32244) * replace for loop by tensor ops * rm assert; readability	2024-07-27 10:19:46 +01:00
Yih-Dar	8da9068730	More flexible trigger condition (#32251) update Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-07-26 20:52:45 +02:00
Raushan Turganbay	81233c069c	Flash-Attn: fix generation when no attention mask or no padding (#32241) * fix * fix prev test (half of failures) * [run-slow] llama, gemma2 * [run-slow] llama, gemma2	2024-07-26 14:45:55 +05:00
Fanli Lin	27c7f971c0	[tests] fix `static` cache implementation is not compatible with `attn_implementation==flash_attention_2` (#32039) * add flash attention check * fix * fix	2024-07-26 11:41:27 +02:00
Connor Anderson	5f841c74b6	Add check for `target_sizes is None` in `post_process_image_guided_detection` for owlv2 (#31934) * Add check for target_sizes is None in post_process_image_guided_detection * Make sure Owlvit and Owlv2 in sync * Fix incorrect indentation; add check for correct size of target_sizes	2024-07-26 10:05:46 +01:00
Rohit Dwivedula	f9756d9edb	Adds: extra_repr for RMSNorm layers in most models (#32204) * adds: extra_repr() to RMSNorm layers in multiple models * adds: extra_repr for deprecated models as well * formatting as per style guide	2024-07-26 11:05:38 +02:00
Sai-Suraj-27	b8e5cd5396	Refactor: Removed unnecessary `object` base class (#32230) * Refactored to remove unnecessary object base class. * small fix.	2024-07-26 10:33:02 +02:00
João Nadkarni	1c7ebf1d6e	don't log base model architecture in wandb if log model is false (#32143) * don't log base model architecture in wandb if log model is false * Update src/transformers/integrations/integration_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * convert log model setting into an enum * fix formatting --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-07-26 09:38:59 +02:00
Raushan Turganbay	c46edfb823	Resize embeds with DeepSpeed (#32214) * fix resize when deepspeed * deepspeed uses new embeds * we needed this	2024-07-26 10:52:06 +05:00
Raushan Turganbay	fad15fba78	Llava: generate without images (#32183) * llava w/o images * tests	2024-07-26 10:17:27 +05:00
Raushan Turganbay	4ab33c2d81	Generation: stop at `eos` for assisted decoding (#31301) * fix * move changes to prompt lookup * add test * set eos in assistant model * style * fix flakiness * changes for new `main` * Update tests/generation/test_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/generation/test_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * add comment to explain --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-07-26 10:16:06 +05:00
Pavel Iakubovskii	9d6c0641c4	Fix code snippet for Grounding DINO (#32229) Fix code snippet for grounding-dino	2024-07-25 19:20:47 +01:00
jrhe	3a83ec48a6	Allow a specific microphone to be used by the ffmpeg audio pipeline utility functions. Default to using the currently active microphone on Mac (#31846) * use currently active microphone on mac for ffmpeg_microphone * Allow ffmpeg_microphone device to be specified Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-07-25 17:16:13 +01:00
Huazhong Ji	6ed0bf1e85	translate philosophy.md to chinese (#32177) * translate philosophy.md to chinese * add the missing link	2024-07-25 09:01:06 -07:00
Yih-Dar	df6eee9201	Follow up for #31973 (#32025) * fix * [test_all] trigger full CI --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-07-25 16:12:23 +02:00
Kashif Rasul	de2318894e	[warnings] fix E721 warnings (#32223) fix E721 warnings	2024-07-25 15:12:23 +02:00
Kashif Rasul	9b9a54e61b	[BigBird Pegasus] set _supports_param_buffer_assignment to False (#32222) set _supports_param_buffer_assignment to False	2024-07-25 15:11:43 +02:00
Austin	1ecedf1d9e	Update question_answering.py (#32208)	2024-07-25 13:20:27 +01:00
Huazhong Ji	f53a5dec7b	remove unnecessary guard code related with pytorch versions 1.4.2 ~ 1.7.0 (#32210) remove unnecessary guard code related with pytorch versions 1.4.2 ~ 1.7.0	2024-07-25 11:04:04 +02:00
Sanchit Gandhi	5658e749ad	[whisper] fix short-form output type (#32178) * [whisper] fix short-form output type * add test * make style * update long-form tests * fixes * last fix * finalise test	2024-07-25 16:58:02 +08:00
Sai-Suraj-27	85a1269e19	fix: Replaced deprecated `unittest method` with the correct one (#32198) Replaced deprecated unittest method with the correct one.	2024-07-24 18:00:21 +01:00
Matt	edd68f4ed8	🚨 No more default chat templates (#31733) * No more default chat templates * Add the template to the GPT-SW3 tests since it's not available by default now * Fix GPT2 test * Fix Bloom test * Fix Bloom test * Remove default templates again	2024-07-24 17:36:32 +01:00
Penut Chen	1c122a46dc	Support dequantizing GGUF FP16 format (#31783) * support gguf fp16 * support gguf bf16 with pytorch * add gguf f16 test * remove bf16	2024-07-24 17:59:59 +02:00

19383 Commits