transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-27 00:09:00 +06:00

Author	SHA1	Message	Date
Connor Anderson	5f841c74b6	Add check for `target_sizes is None` in `post_process_image_guided_detection` for owlv2 (#31934 ) * Add check for target_sizes is None in post_process_image_guided_detection * Make sure Owlvit and Owlv2 in sync * Fix incorrect indentation; add check for correct size of target_sizes	2024-07-26 10:05:46 +01:00
Rohit Dwivedula	f9756d9edb	Adds: extra_repr for RMSNorm layers in most models (#32204 ) * adds: extra_repr() to RMSNorm layers in multiple models * adds: extra_repr for deprecated models as well * formatting as per style guide	2024-07-26 11:05:38 +02:00
Sai-Suraj-27	b8e5cd5396	Refactor: Removed un-necessary `object` base class (#32230 ) * Refactored to remove un-necessary object base class. * small fix.	2024-07-26 10:33:02 +02:00
João Nadkarni	1c7ebf1d6e	don't log base model architecture in wandb if log model is false (#32143 ) * don't log base model architecture in wandb if log model is false * Update src/transformers/integrations/integration_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * convert log model setting into an enum * fix formatting --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-07-26 09:38:59 +02:00
Raushan Turganbay	c46edfb823	Resize embeds with DeepSpeed (#32214 ) * fix resize when deepspeed * deepspeed uses new embeds * we needed this	2024-07-26 10:52:06 +05:00
Raushan Turganbay	fad15fba78	Llava: generate without images (#32183 ) * llava w/o images * tests	2024-07-26 10:17:27 +05:00
Raushan Turganbay	4ab33c2d81	Generation: stop at `eos` for assisted decoding (#31301 ) * fix * move changes to prompt lookup * add test * set eos in assistant model * style * fix flakiness * changes for new `main` * Update tests/generation/test_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/generation/test_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * add comment to explain --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-07-26 10:16:06 +05:00
Pavel Iakubovskii	9d6c0641c4	Fix code snippet for Grounding DINO (#32229 ) Fix code snippet for grounding-dino	2024-07-25 19:20:47 +01:00
jrhe	3a83ec48a6	Allow a specific microphone to be used by the ffmpeg audio pipeline utility functions. Default to using the currently active microphone on Mac (#31846 ) * use currently active microphone on mac for ffmpeg_microphone * Allow ffmpeg_microphone device to be specified Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-07-25 17:16:13 +01:00
Huazhong Ji	6ed0bf1e85	translate philosophy.md to chinese (#32177 ) * translate philosophy.md to chinese * add the missing link	2024-07-25 09:01:06 -07:00
Yih-Dar	df6eee9201	Follow up for #31973 (#32025 ) * fix * [test_all] trigger full CI --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-07-25 16:12:23 +02:00
Kashif Rasul	de2318894e	[warnings] fix E721 warnings (#32223 ) fix E721 warnings	2024-07-25 15:12:23 +02:00
Kashif Rasul	9b9a54e61b	[BigBird Pegasus] set _supports_param_buffer_assignment to False (#32222 ) set _supports_param_buffer_assignment to False	2024-07-25 15:11:43 +02:00
Austin	1ecedf1d9e	Update question_answering.py (#32208 )	2024-07-25 13:20:27 +01:00
Huazhong Ji	f53a5dec7b	remove unnecessary guard code related with pytorch versions 1.4.2 ~ 1.7.0 (#32210 ) remove unnecessary guard code related with pytorch versions 1.4.2 ~ 1.7.0	2024-07-25 11:04:04 +02:00
Sanchit Gandhi	5658e749ad	[whisper] fix short-form output type (#32178 ) * [whisper] fix short-form output type * add test * make style * update long-form tests * fixes * last fix * finalise test	2024-07-25 16:58:02 +08:00
Sai-Suraj-27	85a1269e19	fix: Replaced deprecated `unittest method` with the correct one (#32198 ) Replaced deprecated unittest method with the correct one.	2024-07-24 18:00:21 +01:00
Matt	edd68f4ed8	🚨 No more default chat templates (#31733 ) * No more default chat templates * Add the template to the GPT-SW3 tests since it's not available by default now * Fix GPT2 test * Fix Bloom test * Fix Bloom test * Remove default templates again	2024-07-24 17:36:32 +01:00
Penut Chen	1c122a46dc	Support dequantizing GGUF FP16 format (#31783 ) * support gguf fp16 * support gguf bf16 with pytorch * add gguf f16 test * remove bf16	2024-07-24 17:59:59 +02:00
Marc Sun	af0e4b7b37	Fix float8_e4m3fn in modeling_utils (#32193 ) * Fix float8_e4m3fn in modeling_utils * style * fix * comment	2024-07-24 17:14:05 +02:00
Raushan Turganbay	1392a6867f	Fix resize embedding with Deepspeed (#32192 ) fix resize when deepspeed	2024-07-24 19:26:20 +05:00
Arthur	8d2534c4d0	let's not warn when someone is running a forward (#32176 ) * let's not warn when someone is running a forward without cache + self.training * more models * fixup	2024-07-24 16:06:39 +02:00
Joao Gante	e0182f3bd7	RoPE: relaxed rope validation (#32182 ) * relaxed rope check * lets also accept rope_type=None, defaulting to the original implementation * type and rope_type can coexist	2024-07-24 15:00:48 +01:00
amyeroberts	165116bc14	Remove conversational pipeline tests (#32099 ) Remove conversation pipeline tests	2024-07-24 14:03:40 +01:00
Dr. Artificial曾小健	5f4ee98a7a	Update qwen2.md (#32108 ) * Update qwen2.md outdated description * Update qwen2.md amended * Update qwen2.md Update * Update qwen2.md fix wrong version code, now good to go	2024-07-24 11:54:41 +01:00
조준래	8678879f1d	fix: default value reflects the runtime environment variables rather than the ones present at import time. (#32153 ) * fix: default value reflects the runtime environment variables rather than the ones present at import time. * Fix: Change `deterministic` to None by default; use env var if None	2024-07-24 11:38:49 +01:00
Rohit Dwivedula	01be5b4879	adds: extra_repr() to MambaRMSNorm to include hidden size / size of weights in the layer (#32171 ) * adds: extra_repr() to MambaRMSNorm to include the hidden size of the layer * style fix with ruff:	2024-07-24 09:09:59 +02:00
Fanli Lin	c85510f958	[docs] change temperature to a positive value (#32077 ) fix	2024-07-23 17:47:51 +01:00
Sai-Suraj-27	bc2adb0112	fix: Fixed an if condition that is always evaluating to true (#32160 ) Fixed an if condition always evaluating to true.	2024-07-23 16:52:41 +01:00
Joao Gante	23f6a43f82	fix (#32162 )	2024-07-23 16:48:16 +01:00
Lysandre	d5a99dfcee	Llama 3.1 conversion Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>	2024-07-23 17:13:25 +02:00
Lysandre	ff0d708fe6	Dev version: v4.44.0.dev0	2024-07-23 17:12:47 +02:00
Sai-Suraj-27	d2c687b3f1	Updated `ruff` to the latest version (#31926 ) * Updated ruff version and fixed the required code according to the latest version. * Updated ruff version and fixed the required code according to the latest version. * Added noqa directive to ignore 1 error shown by ruff	2024-07-23 17:07:31 +02:00
RhuiDih	9cf4f2aa9a	Enhancing SFT Training Efficiency Using Packing and FlashAttention2 with Position IDs (#31629 ) * add DataCollatorBatchFlattening * Update data_collator.py * change name * new FA2 flow if position_ids is provided * add comments * minor fix * minor fix data collator * add test cases for models * add test case for data collator * remove extra code * formatting for ruff check and check_repo.py * ruff format ruff format tests src utils * custom_init_isort.py	2024-07-23 15:56:41 +02:00
Deep Gandhi	7d92009af6	Added additional kwarg for successful running of optuna hyperparameter search (#31924 ) Update integration_utils.py Added additional kwarg	2024-07-23 14:41:52 +01:00
Alvaro Moran	63700628ad	feat(cache): StaticCache uses index_copy_ to avoid useless copy (#31857 ) * feat(cache): StaticCache uses index_copy_ to avoid useless copy Using index_copy_ allows for explicit in-place change of the tensor. Some backends (XLA) will otherwise copy the tensor, making the code slower and using more memory. Proposed implementation will end up using less memory and on XLA will result in less compilation, but the change is also quite generic, making no change whatsoever on CUDA or CPU backend. * feat(cache): SlidingWindowCache uses index_copy_ to avoid useless copy Applying the same change done in StaticCache. * fix(cache): fallback of index_copy_ when not implemented * fix(cache): in index_copy_ ensure tensors are on same device * [run slow] llama * fix(cache): add move of cache_position to same device in SlidingWindowCache * Revert "[run slow] llama" This reverts commit `02608dd142`.	2024-07-23 14:18:19 +02:00
amyeroberts	a009fbdab3	Fix typing to be compatible with later py versions (#32155 )	2024-07-23 12:23:34 +01:00
Sanchit Gandhi	3263b34354	Revert "Incorrect Whisper long-form decoding timestamps " (#32148 ) Revert "Incorrect Whisper long-form decoding timestamps (#32003)" This reverts commit `cd48553fc8`.	2024-07-23 18:34:30 +08:00
Amit Garg	034b477847	Rename Phi-3 rope scaling type (#31436 ) * renamed phi3 rope_scaling type * fixed trailing whitespaces * fixed test * added warning * fixed format	2024-07-23 12:33:22 +02:00
Alexandre TL	bab32d6fe9	Added mamba.py backend (#30139 ) * Update README.md * tests: forward ok * backward test done * done testing * removed check. scripts * Update README.md * added use_mambapy arg * fixed typo in warning * protected imports w/ mambapy package * delete pscan.py + raise rather than assert * Update import_utils.py * fix whitespaces and unused import * trailing whitespace + import block unformatted * Update modeling_mamba.py * transpose before pscan * shape comment * ran make style * use_mambapy=False by default Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * ran make fix-copies --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-07-23 12:32:19 +02:00
Merve Noyan	9ced33ca7f	Fix video batching to videollava (#32139 ) --------- Co-authored-by: Merve Noyan <mervenoyan@Merve-MacBook-Pro.local>	2024-07-23 13:23:23 +03:00
Cyril Vallez	a5b226ce98	Fix flash attention speed issue (#32028 ) Add the lru_cache for speed	2024-07-23 12:21:23 +02:00
Ita Zaporozhets	a1844a3209	gguf conversion add_prefix_space=None for llama3 (#31937 ) * gguf conversion forces add_prefix_space=False for llama3, this is not required and forces from_slow, which fails. changing to None + test * typo * clean test	2024-07-23 11:45:54 +02:00
Joao Gante	2e113422b3	Llama: RoPE refactor (#32135 ) Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-07-23 10:42:55 +01:00
bayllama	5a4a76edb7	Modify resize_token_embeddings to ensure output type is same as input (#31979 ) * Change resize_token_embeddings to make it return same Class that is passed to it * Add explanatory comment as requested in review * Add explanatory comments for add resizing function in lxmert * Add comment for padding_idx and moving _resize_bias in lxmert to LxmertForPreTraining --------- Co-authored-by: Prashanth Sateesh <prasatee@Prashanths-MBP.attlocal.net> Co-authored-by: Prashanth Sateesh <prasatee@Prashanths-MacBook-Pro.local>	2024-07-23 10:28:44 +01:00
Daniel Lok	1535a2c93d	Disable quick init for TapasPreTrainedModel (#32149 ) add attribute to model Signed-off-by: Daniel Lok <daniel.lok@databricks.com>	2024-07-23 10:26:00 +01:00
mig-mfreitas	34b43211d7	Add YaRN and Dynamic-YaRN RoPE Scaling Methods (#30910 ) * Add YaRN and Dynamic-YaRN RoPE Scaling Methods YaRN (Yet another RoPE extension method) combines the NTK-By-Parts Interpolation and Attention Scaling methods, improving upon existing RoPE interpolation methods for longer context window sizes. Fine-tuned models maintain their original performance across benchmarks while enabling efficient extrapolation and transfer learning for quicker convergence, especially in compute-limited environments. We implement YaRN and Dynamic-YaRN for the following list of models: - LLaMA - Falcon - GPT-NeoX - Olmo - Persimmon - Phi - StableLM - OpenLLaMA New unit tests are added to assert YaRN's correct behavior on both short and long sequence inputs. For more details, please refer to https://arxiv.org/abs/2309.00071. Co-authored-by: Miguel Almeida <miguel.pessanha.almeida@tecnico.ulisboa.pt> * Refactor YaRN implementation for LLaMA Iterate on YaRN implementation for LLaMA and remove diff from remaining models for increased PR modularity. This commit includes the following changes: - Merge 'yarn_rope_scaling' and 'rope_scaling' dictionaries - Remove unnecessary attributes ('extrapolation_factor' and 'finetuned') from YaRN classes - Inherit 'forward' method in YaRN classes from superclass - Rename 'yarn' method to 'compute_yarn_scaling' - Extend YaRN tests with further assertions - Fix style inconsistencies Co-authored-by: Miguel Monte e Freitas <miguelmontefreitas@tecnico.ulisboa.pt> * Refactor Tensor Building Logic for YaRN - Comply with the tensor building logic introduced in #30743 - Add referencing to the optimized Attention Factor equation - Remove Dynamic YaRN for a more agile deployment Co-authored-by: mig-mfreitas <mig-mfreitas@users.noreply.github.com> * remove unwanted file --------- Co-authored-by: Miguel Almeida <miguel.pessanha.almeida@tecnico.ulisboa.pt> Co-authored-by: mig-mfreitas <mig-mfreitas@users.noreply.github.com> Co-authored-by: Joao Gante <joao@huggingface.co>	2024-07-23 10:07:58 +01:00
KonradSzafer	7405c1c77e	Add method to retrieve used chat template (#32032 ) encapsulate chat template logic	2024-07-23 10:56:21 +02:00
Anton Vlasjuk	605f3245dc	Fix mask creations of `GPTNeoX` and `GPT2` (#31944 ) * fix mask creation of gpt2 and gpt_neox caused by me * forgot the reshape of masks when shape > 2 * add tests for gpt neox and gpt2 * nit on a comment	2024-07-23 10:11:12 +02:00
Sanchit Gandhi	2782aadae2	[modelling] remove un-necessary transpose for fa2 attention (#31749 ) * [whisper] remove un-necessary transpose for fa2 attention * propagate	2024-07-23 14:55:16 +08:00
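
Note on commit 8678879f1d: its message describes a default value that should reflect the environment variables at run time rather than those present at import time, with `deterministic` changed to None by default and the environment variable used if it is None. A minimal Python sketch of that general pattern follows. It is not the code from the commit; the variable name `EXAMPLE_DETERMINISTIC` and the functions `run_frozen` and `run_runtime` are hypothetical names introduced only for illustration.

```python
import os
from typing import Optional

# Hypothetical environment variable name, chosen only for this illustration.
ENV_VAR = "EXAMPLE_DETERMINISTIC"

# Simulate "import time" with the variable unset.
os.environ.pop(ENV_VAR, None)


def _env_flag() -> bool:
    """Read the flag from the environment as it is right now."""
    return os.environ.get(ENV_VAR, "0") == "1"


def run_frozen(deterministic: bool = _env_flag()) -> bool:
    # The default expression is evaluated once, when the function is defined,
    # so later changes to the environment are never seen.
    return deterministic


def run_runtime(deterministic: Optional[bool] = None) -> bool:
    # None is a sentinel: the environment is consulted on every call,
    # and only when the caller did not pass an explicit value.
    if deterministic is None:
        deterministic = _env_flag()
    return deterministic


# The variable changes after the functions have been defined.
os.environ[ENV_VAR] = "1"
print(run_frozen())   # uses the value captured at definition time
print(run_runtime())  # resolves the default at call time
os.environ.pop(ENV_VAR, None)
```

An explicit argument still takes precedence in `run_runtime`, so callers can override the environment without modifying it.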


16502 Commits