transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-31 10:12:23 +06:00

Author	SHA1	Message	Date
Sai-Suraj-27	49928892d6	fix(docs): Fixed a link in docs (#32274 ) Fixed a link in docs.	2024-07-29 10:50:43 +01:00
Fanli Lin	6494479f1d	make `p_mask` a numpy array before passing to `select_starts_ends` (#32076 ) * fix * bug fix * refine * fix	2024-07-29 10:29:11 +01:00
Joao Gante	535fe78b9f	Repo: remove exceptions in `check_docstrings` (#32259 ) remove exceptions	2024-07-29 11:06:05 +02:00
Sai-Suraj-27	a2ad9d5ad5	fix: Fixed wrong argument passed to `convert_blip_checkpoint` function call (#32262 ) Removed one wrong argument passed to convert_blip_checkpoint function call.	2024-07-29 10:43:09 +02:00
leejet	5019aabfac	Optimize t5 tokenize logic to avoid redundant calls (#32270 ) * Optimize t5 tokenize logic to avoid redundant calls * fix and overwrite copies	2024-07-29 09:51:43 +02:00
Yih-Dar	f2122cc6eb	Upload new model failure report to Hub (#32264 ) upload Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-07-29 09:42:54 +02:00
Raushan Turganbay	f739687684	🚨 Bloom support for cache class (#31445 ) * bloom dynamic cache * bloom follows standard cache format * no skips for bloom anymore * use cache position when possible * clean up * codestyle * Update src/transformers/models/bloom/modeling_bloom.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/bloom/modeling_bloom.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/bloom/modeling_bloom.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * pr comments * isinstance fix * address comments * make musicgen test happy * [run-slow] bloom --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-07-29 10:58:59 +05:00
Joao Gante	44f6fdd74f	Llama 3.1: replace for loop by tensor ops at inv_freq initialization (#32244 ) * replace for loop by tensor ops * rm assert; readability	2024-07-27 10:19:46 +01:00
Yih-Dar	8da9068730	More flexible trigger condition (#32251 ) update Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-07-26 20:52:45 +02:00
Raushan Turganbay	81233c069c	Flash-Attn: fix generation when no attention mask or no padding (#32241 ) * fix * fix prev test (half of failures) * [run-slow] llama, gemma2 * [run-slow] llama, gemma2	2024-07-26 14:45:55 +05:00
Fanli Lin	27c7f971c0	[tests] fix `static` cache implementation is not compatible with `attn_implementation==flash_attention_2` (#32039 ) * add flash attention check * fix * fix	2024-07-26 11:41:27 +02:00
Connor Anderson	5f841c74b6	Add check for `target_sizes is None` in `post_process_image_guided_detection` for owlv2 (#31934 ) * Add check for target_sizes is None in post_process_image_guided_detection * Make sure Owlvit and Owlv2 in sync * Fix incorrect indentation; add check for correct size of target_sizes	2024-07-26 10:05:46 +01:00
Rohit Dwivedula	f9756d9edb	Adds: extra_repr for RMSNorm layers in most models (#32204 ) * adds: extra_repr() to RMSNorm layers in multiple models * adds: extra_repr for deprecated models as well * formatting as per style guide	2024-07-26 11:05:38 +02:00
Sai-Suraj-27	b8e5cd5396	Refactor: Removed un-necessary `object` base class (#32230 ) * Refactored to remove un-necessary object base class. * small fix.	2024-07-26 10:33:02 +02:00
João Nadkarni	1c7ebf1d6e	don't log base model architecture in wandb if log model is false (#32143 ) * don't log base model architecture in wandb if log model is false * Update src/transformers/integrations/integration_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * convert log model setting into an enum * fix formatting --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-07-26 09:38:59 +02:00
Raushan Turganbay	c46edfb823	Resize embeds with DeepSpeed (#32214 ) * fix resize when deepspeed * deepspeed uses new embeds * we needed this	2024-07-26 10:52:06 +05:00
Raushan Turganbay	fad15fba78	Llava: generate without images (#32183 ) * llava w/o images * tests	2024-07-26 10:17:27 +05:00
Raushan Turganbay	4ab33c2d81	Generation: stop at `eos` for assisted decoding (#31301 ) * fix * move changes to prompt lookup * add test * set eos in assistant model * style * fix flakiness * changes for new `main` * Update tests/generation/test_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/generation/test_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * add comment to explain --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-07-26 10:16:06 +05:00
Pavel Iakubovskii	9d6c0641c4	Fix code snippet for Grounding DINO (#32229 ) Fix code snippet for grounding-dino	2024-07-25 19:20:47 +01:00
jrhe	3a83ec48a6	Allow a specific microphone to be used by the ffmpeg audio pipeline utility functions. Default to using the currently active microphone on Mac (#31846 ) * use currently active microphone on mac for ffmpeg_microphone * Allow ffmpeg_microphone device to be specified Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-07-25 17:16:13 +01:00
Huazhong Ji	6ed0bf1e85	translate philosophy.md to chinese (#32177 ) * translate philosophy.md to chinese * add the missing link	2024-07-25 09:01:06 -07:00
Yih-Dar	df6eee9201	Follow up for #31973 (#32025 ) * fix * [test_all] trigger full CI --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-07-25 16:12:23 +02:00
Kashif Rasul	de2318894e	[warnings] fix E721 warnings (#32223 ) fix E721 warnings	2024-07-25 15:12:23 +02:00
Kashif Rasul	9b9a54e61b	[BigBird Pegasus] set _supports_param_buffer_assignment to False (#32222 ) set _supports_param_buffer_assignment to False	2024-07-25 15:11:43 +02:00
Austin	1ecedf1d9e	Update question_answering.py (#32208 )	2024-07-25 13:20:27 +01:00
Huazhong Ji	f53a5dec7b	remove unnecessary guard code related with pytorch versions 1.4.2 ~ 1.7.0 (#32210 ) remove unnecessary guard code related with pytorch versions 1.4.2 ~ 1.7.0	2024-07-25 11:04:04 +02:00
Sanchit Gandhi	5658e749ad	[whisper] fix short-form output type (#32178 ) * [whisper] fix short-form output type * add test * make style * update long-form tests * fixes * last fix * finalise test	2024-07-25 16:58:02 +08:00
Sai-Suraj-27	85a1269e19	fix: Replaced deprecated `unittest method` with the correct one (#32198 ) Replaced deprecated unittest method with the correct one.	2024-07-24 18:00:21 +01:00
Matt	edd68f4ed8	🚨 No more default chat templates (#31733 ) * No more default chat templates * Add the template to the GPT-SW3 tests since it's not available by default now * Fix GPT2 test * Fix Bloom test * Fix Bloom test * Remove default templates again	2024-07-24 17:36:32 +01:00
Penut Chen	1c122a46dc	Support dequantizing GGUF FP16 format (#31783 ) * support gguf fp16 * support gguf bf16 with pytorch * add gguf f16 test * remove bf16	2024-07-24 17:59:59 +02:00
Marc Sun	af0e4b7b37	Fix float8_e4m3fn in modeling_utils (#32193 ) * Fix float8_e4m3fn in modeling_utils * style * fix * comment	2024-07-24 17:14:05 +02:00
Raushan Turganbay	1392a6867f	Fix resize embedding with Deepspeed (#32192 ) fix resize when deepspeed	2024-07-24 19:26:20 +05:00
Arthur	8d2534c4d0	let's not warn when someone is running a forward (#32176 ) * let's not warn when someone is running a forward without cache + self.training * more models * fixup	2024-07-24 16:06:39 +02:00
Joao Gante	e0182f3bd7	RoPE: relaxed rope validation (#32182 ) * relaxed rope check * lets also accept rope_type=None, defaulting to the original implementation * type and rope_type can coexist	2024-07-24 15:00:48 +01:00
amyeroberts	165116bc14	Remove conversational pipeline tests (#32099 ) Remove conversation pipeline tests	2024-07-24 14:03:40 +01:00
Dr. Artificial曾小健	5f4ee98a7a	Update qwen2.md (#32108 ) * Update qwen2.md outdated description * Update qwen2.md amended * Update qwen2.md Update * Update qwen2.md fix wrong version code, now good to go	2024-07-24 11:54:41 +01:00
조준래	8678879f1d	fix: default value reflects the runtime environment variables rather than the ones present at import time. (#32153 ) * fix: default value reflects the runtime environment variables rather than the ones present at import time. * Fix: Change `deterministic` to None by default; use env var if None	2024-07-24 11:38:49 +01:00
Rohit Dwivedula	01be5b4879	adds: extra_repr() to MambaRMSNorm to include hidden size / size of weights in the layer (#32171 ) * adds: extra_repr() to MambaRMSNorm to include the hidden size of the layer * style fix with ruff:	2024-07-24 09:09:59 +02:00
Fanli Lin	c85510f958	[docs] change temperature to a positive value (#32077 ) fix	2024-07-23 17:47:51 +01:00
Sai-Suraj-27	bc2adb0112	fix: Fixed an if condition that is always evaluating to true (#32160 ) Fixed an if condition always evaluating to true.	2024-07-23 16:52:41 +01:00
Joao Gante	23f6a43f82	fix (#32162 )	2024-07-23 16:48:16 +01:00
Lysandre	d5a99dfcee	Llama 3.1 conversion Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>	2024-07-23 17:13:25 +02:00
Lysandre	ff0d708fe6	Dev version: v4.44.0.dev0	2024-07-23 17:12:47 +02:00
Sai-Suraj-27	d2c687b3f1	Updated `ruff` to the latest version (#31926 ) * Updated ruff version and fixed the required code according to the latest version. * Updated ruff version and fixed the required code according to the latest version. * Added noqa directive to ignore 1 error shown by ruff	2024-07-23 17:07:31 +02:00
RhuiDih	9cf4f2aa9a	Enhancing SFT Training Efficiency Using Packing and FlashAttention2 with Position IDs (#31629 ) * add DataCollatorBatchFlattening * Update data_collator.py * change name * new FA2 flow if position_ids is provided * add comments * minor fix * minor fix data collator * add test cases for models * add test case for data collator * remove extra code * formatting for ruff check and check_repo.py * ruff format ruff format tests src utils * custom_init_isort.py	2024-07-23 15:56:41 +02:00
Deep Gandhi	7d92009af6	Added additional kwarg for successful running of optuna hyperparameter search (#31924 ) Update integration_utils.py Added additional kwarg	2024-07-23 14:41:52 +01:00
Alvaro Moran	63700628ad	feat(cache): StaticCache uses index_copy_ to avoid useless copy (#31857 ) * feat(cache): StaticCache uses index_copy_ to avoid useless copy Using index_copy_ allows for explicit in-place change of the tensor. Some backends (XLA) will otherwise copy the tensor, making the code slower and using more memory. Proposed implementation will end up using less memory and on XLA will result in less compilation, but the change is also quite generic, making no change whatsoever on CUDA or CPU backend. * feat(cache): SlidingWindowCache uses index_copy_ to avoid useless copy Applying the same change done in StaticCache. * fix(cache): fallback of index_copy_ when not implemented * fix(cache): in index_copy_ ensure tensors are on same device * [run slow] llama * fix(cache): add move of cache_position to same device in SlidingWindowCache * Revert "[run slow] llama" This reverts commit `02608dd142`.	2024-07-23 14:18:19 +02:00
amyeroberts	a009fbdab3	Fix typing to be compatible with later py versions (#32155 )	2024-07-23 12:23:34 +01:00
Sanchit Gandhi	3263b34354	Revert "Incorrect Whisper long-form decoding timestamps " (#32148 ) Revert "Incorrect Whisper long-form decoding timestamps (#32003)" This reverts commit `cd48553fc8`.	2024-07-23 18:34:30 +08:00
Amit Garg	034b477847	Rename Phi-3 rope scaling type (#31436 ) * renamed phi3 rope_scaling type * fixed trailing whitespaces * fixed test * added warning * fixed format	2024-07-23 12:33:22 +02:00

16463 Commits