Sai-Suraj-27
49928892d6
fix(docs): Fixed a link in docs ( #32274 )
...
Fixed a link in docs.
2024-07-29 10:50:43 +01:00
Fanli Lin
6494479f1d
make `p_mask` a numpy array before passing to `select_starts_ends` ( #32076 )
...
* fix
* bug fix
* refine
* fix
2024-07-29 10:29:11 +01:00
Joao Gante
535fe78b9f
Repo: remove exceptions in `check_docstrings` ( #32259 )
...
remove exceptions
2024-07-29 11:06:05 +02:00
Sai-Suraj-27
a2ad9d5ad5
fix: Fixed wrong argument passed to `convert_blip_checkpoint` function call ( #32262 )
...
Removed one wrong argument passed to convert_blip_checkpoint function call.
2024-07-29 10:43:09 +02:00
leejet
5019aabfac
Optimize t5 tokenize logic to avoid redundant calls ( #32270 )
...
* Optimize t5 tokenize logic to avoid redundant calls
* fix and overwrite copies
2024-07-29 09:51:43 +02:00
Yih-Dar
f2122cc6eb
Upload new model failure report to Hub ( #32264 )
...
upload
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-07-29 09:42:54 +02:00
Raushan Turganbay
f739687684
🚨 Bloom support for cache class ( #31445 )
...
* bloom dynamic cache
* bloom follows standard cache format
* no skips for bloom anymore
* use cache position when possible
* clean up
* codestyle
* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* pr comments
* isinstance fix
* address comments
* make musicgen test happy
* [run-slow] bloom
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-29 10:58:59 +05:00
Joao Gante
44f6fdd74f
Llama 3.1: replace for loop by tensor ops at inv_freq initialization ( #32244 )
...
* replace for loop by tensor ops
* rm assert; readability
2024-07-27 10:19:46 +01:00
Yih-Dar
8da9068730
More flexible trigger condition ( #32251 )
...
update
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-07-26 20:52:45 +02:00
Raushan Turganbay
81233c069c
Flash-Attn: fix generation when no attention mask or no padding ( #32241 )
...
* fix
* fix prev test (half of failures)
* [run-slow] llama, gemma2
* [run-slow] llama, gemma2
2024-07-26 14:45:55 +05:00
Fanli Lin
27c7f971c0
[tests] fix `static` cache implementation is not compatible with `attn_implementation==flash_attention_2` ( #32039 )
...
* add flash attention check
* fix
* fix
2024-07-26 11:41:27 +02:00
Connor Anderson
5f841c74b6
Add check for `target_sizes is None` in `post_process_image_guided_detection` for owlv2 ( #31934 )
...
* Add check for target_sizes is None in post_process_image_guided_detection
* Make sure Owlvit and Owlv2 in sync
* Fix incorrect indentation; add check for correct size of target_sizes
2024-07-26 10:05:46 +01:00
Rohit Dwivedula
f9756d9edb
Adds: extra_repr for RMSNorm layers in most models ( #32204 )
...
* adds: extra_repr() to RMSNorm layers in multiple models
* adds: extra_repr for deprecated models as well
* formatting as per style guide
2024-07-26 11:05:38 +02:00
Sai-Suraj-27
b8e5cd5396
Refactor: Removed un-necessary `object` base class ( #32230 )
...
* Refactored to remove un-necessary object base class.
* small fix.
2024-07-26 10:33:02 +02:00
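A minimal illustration of the refactor this commit describes (not the code it touched): in Python 3 every class already inherits from `object`, so naming it as an explicit base adds nothing. The class names below are hypothetical.

```python
# Hypothetical classes, for illustration only.
class WithExplicitBase(object):  # Python 2 era spelling
    pass


class WithoutExplicitBase:  # equivalent in Python 3
    pass


# Both classes end up with the same single base class.
print(WithExplicitBase.__bases__)     # (<class 'object'>,)
print(WithoutExplicitBase.__bases__)  # (<class 'object'>,)
```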
João Nadkarni
1c7ebf1d6e
don't log base model architecture in wandb if log model is false ( #32143 )
...
* don't log base model architecture in wandb if log model is false
* Update src/transformers/integrations/integration_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* convert log model setting into an enum
* fix formatting
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-26 09:38:59 +02:00
Raushan Turganbay
c46edfb823
Resize embeds with DeepSpeed ( #32214 )
...
* fix resize when deepspeed
* deepspeed uses new embeds
* we needed this
2024-07-26 10:52:06 +05:00
Raushan Turganbay
fad15fba78
Llava: generate without images ( #32183 )
...
* llava w/o images
* tests
2024-07-26 10:17:27 +05:00
Raushan Turganbay
4ab33c2d81
Generation: stop at `eos` for assisted decoding ( #31301 )
...
* fix
* move changes to prompt lookup
* add test
* set eos in assistant model
* style
* fix flakiness
* changes for new `main`
* Update tests/generation/test_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/generation/test_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add comment to explain
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-26 10:16:06 +05:00
Pavel Iakubovskii
9d6c0641c4
Fix code snippet for Grounding DINO ( #32229 )
...
Fix code snippet for grounding-dino
2024-07-25 19:20:47 +01:00
jrhe
3a83ec48a6
Allow a specific microphone to be used by the ffmpeg audio pipeline utility functions. Default to using the currently active microphone on Mac ( #31846 )
...
* use currently active microphone on mac for ffmpeg_microphone
* Allow ffmpeg_microphone device to be specified
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-07-25 17:16:13 +01:00
Huazhong Ji
6ed0bf1e85
translate philosophy.md to chinese ( #32177 )
...
* translate philosophy.md to chinese
* add the missing link
2024-07-25 09:01:06 -07:00
Yih-Dar
df6eee9201
Follow up for #31973 ( #32025 )
...
* fix
* [test_all] trigger full CI
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-07-25 16:12:23 +02:00
Kashif Rasul
de2318894e
[warnings] fix E721 warnings ( #32223 )
...
fix E721 warnings
2024-07-25 15:12:23 +02:00
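For context, E721 is the pycodestyle rule that flags direct comparisons of type objects and recommends `isinstance()` instead. The sketch below uses hypothetical helper names and is not taken from the commit; it shows why the two forms can disagree.

```python
def is_int_type_compare(value):
    # Flagged by E721: compares type objects directly and rejects subclasses.
    return type(value) == int  # noqa: E721


def is_int_isinstance(value):
    # Preferred form: also accepts subclasses of int.
    return isinstance(value, int)


# bool is a subclass of int, so the two checks disagree on True.
print(is_int_type_compare(True))  # False
print(is_int_isinstance(True))    # True
```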
Kashif Rasul
9b9a54e61b
[BigBird Pegasus] set _supports_param_buffer_assignment to False ( #32222 )
...
set _supports_param_buffer_assignment to False
2024-07-25 15:11:43 +02:00
Austin
1ecedf1d9e
Update question_answering.py ( #32208 )
2024-07-25 13:20:27 +01:00
Huazhong Ji
f53a5dec7b
remove unnecessary guard code related with pytorch versions 1.4.2 ~ 1.7.0 ( #32210 )
...
remove unnecessary guard code related with pytorch versions 1.4.2 ~ 1.7.0
2024-07-25 11:04:04 +02:00
Sanchit Gandhi
5658e749ad
[whisper] fix short-form output type ( #32178 )
...
* [whisper] fix short-form output type
* add test
* make style
* update long-form tests
* fixes
* last fix
* finalise test
2024-07-25 16:58:02 +08:00
Sai-Suraj-27
85a1269e19
fix: Replaced deprecated `unittest method` with the correct one ( #32198 )
...
Replaced deprecated unittest method with the correct one.
2024-07-24 18:00:21 +01:00
Matt
edd68f4ed8
🚨 No more default chat templates ( #31733 )
...
* No more default chat templates
* Add the template to the GPT-SW3 tests since it's not available by default now
* Fix GPT2 test
* Fix Bloom test
* Fix Bloom test
* Remove default templates again
2024-07-24 17:36:32 +01:00
Penut Chen
1c122a46dc
Support dequantizing GGUF FP16 format ( #31783 )
...
* support gguf fp16
* support gguf bf16 with pytorch
* add gguf f16 test
* remove bf16
2024-07-24 17:59:59 +02:00
Marc Sun
af0e4b7b37
Fix float8_e4m3fn in modeling_utils ( #32193 )
...
* Fix float8_e4m3fn in modeling_utils
* style
* fix
* comment
2024-07-24 17:14:05 +02:00
Raushan Turganbay
1392a6867f
Fix resize embedding with Deepspeed ( #32192 )
...
fix resize when deepspeed
2024-07-24 19:26:20 +05:00
Arthur
8d2534c4d0
let's not warn when someone is running a forward ( #32176 )
...
* let's not warn when someone is running a forward without cache + self.training
* more models
* fixup
2024-07-24 16:06:39 +02:00
Joao Gante
e0182f3bd7
RoPE: relaxed rope validation ( #32182 )
...
* relaxed rope check
* lets also accept rope_type=None, defaulting to the original implementation
* type and rope_type can coexist
2024-07-24 15:00:48 +01:00
amyeroberts
165116bc14
Remove conversational pipeline tests ( #32099 )
...
Remove conversation pipeline tests
2024-07-24 14:03:40 +01:00
Dr. Artificial曾小健
5f4ee98a7a
Update qwen2.md ( #32108 )
...
* Update qwen2.md
outdated description
* Update qwen2.md
amended
* Update qwen2.md
Update
* Update qwen2.md
fix wrong version code, now good to go
2024-07-24 11:54:41 +01:00
조준래
8678879f1d
fix: default value reflects the runtime environment variables rather than the ones present at import time. ( #32153 )
...
* fix: default value reflects the runtime environment variables rather than the ones present at import time.
* Fix: Change `deterministic` to None by default; use env var if None
2024-07-24 11:38:49 +01:00
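The pattern named in this commit (default to `None`, then read the environment variable when the function is called) can be sketched as follows. This is a generic illustration, not the patched code: the function names and the `EXAMPLE_DETERMINISTIC` variable are hypothetical.

```python
import os

# Hypothetical variable name, for illustration only.
ENV_NAME = "EXAMPLE_DETERMINISTIC"
os.environ.pop(ENV_NAME, None)  # start from a known state: variable unset


def run_frozen(deterministic=os.environ.get(ENV_NAME, "0") == "1"):
    # The default above was evaluated once, when the function was defined.
    return deterministic


def run_lazy(deterministic=None):
    # None means "not specified": read the environment at call time.
    if deterministic is None:
        deterministic = os.environ.get(ENV_NAME, "0") == "1"
    return deterministic


os.environ[ENV_NAME] = "1"  # set after the functions were defined
print(run_frozen())  # False: still reflects the environment at definition time
print(run_lazy())    # True: reflects the environment at call time
```

An explicit argument still wins in the lazy version, so `run_lazy(False)` returns `False` regardless of the environment.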
Rohit Dwivedula
01be5b4879
adds: extra_repr() to MambaRMSNorm to include hidden size / size of weights in the layer ( #32171 )
...
* adds: extra_repr() to MambaRMSNorm to include the hidden size of the layer
* style fix with ruff:
2024-07-24 09:09:59 +02:00
Fanli Lin
c85510f958
[docs] change temperature to a positive value ( #32077 )
...
fix
2024-07-23 17:47:51 +01:00
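Some background on why a documented temperature should be positive: in sampling, logits are typically divided by the temperature before the softmax, so zero would divide by zero and a negative value would invert the ranking. The sketch below is a generic, pure-Python illustration with a hypothetical function name, not the library's implementation.

```python
import math


def softmax_with_temperature(logits, temperature):
    # Generic sketch; the function name is hypothetical.
    if temperature <= 0:
        raise ValueError("temperature must be a strictly positive number")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(logits, 0.5)  # lower temperature: more peaked
flat = softmax_with_temperature(logits, 2.0)   # higher temperature: flatter
print(sharp[0] > flat[0])  # True
```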
Sai-Suraj-27
bc2adb0112
fix: Fixed an if condition that is always evaluating to true ( #32160 )
...
Fixed an if condition always evaluating to true.
2024-07-23 16:52:41 +01:00
Joao Gante
23f6a43f82
fix ( #32162 )
2024-07-23 16:48:16 +01:00
Lysandre
d5a99dfcee
Llama 3.1 conversion
...
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
2024-07-23 17:13:25 +02:00
Lysandre
ff0d708fe6
Dev version: v4.44.0.dev0
2024-07-23 17:12:47 +02:00
Sai-Suraj-27
d2c687b3f1
Updated `ruff` to the latest version ( #31926 )
...
* Updated ruff version and fixed the required code according to the latest version.
* Updated ruff version and fixed the required code according to the latest version.
* Added noqa directive to ignore 1 error shown by ruff
2024-07-23 17:07:31 +02:00
RhuiDih
9cf4f2aa9a
Enhancing SFT Training Efficiency Using Packing and FlashAttention2 with Position IDs ( #31629 )
...
* add DataCollatorBatchFlattening
* Update data_collator.py
* change name
* new FA2 flow if position_ids is provided
* add comments
* minor fix
* minor fix data collator
* add test cases for models
* add test case for data collator
* remove extra code
* formatting for ruff check and check_repo.py
* ruff format
ruff format tests src utils
* custom_init_isort.py
2024-07-23 15:56:41 +02:00
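The packing idea in this commit's title can be sketched as follows, assuming that examples are concatenated into one padding-free sequence and that `position_ids` restart at 0 for each example so the boundaries remain recoverable. The helper below is hypothetical and is not the `DataCollatorBatchFlattening` implementation.

```python
def flatten_batch(sequences):
    # Hypothetical helper; not the DataCollatorBatchFlattening implementation.
    input_ids = []
    position_ids = []
    for seq in sequences:
        input_ids.extend(seq)
        # Positions restart at 0 for every example, which marks the boundaries.
        position_ids.extend(range(len(seq)))
    return {"input_ids": input_ids, "position_ids": position_ids}


batch = flatten_batch([[11, 12, 13], [21, 22]])
print(batch["input_ids"])     # [11, 12, 13, 21, 22]
print(batch["position_ids"])  # [0, 1, 2, 0, 1]
```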
Deep Gandhi
7d92009af6
Added additional kwarg for successful running of optuna hyperparameter search ( #31924 )
...
Update integration_utils.py
Added additional kwarg
2024-07-23 14:41:52 +01:00
Alvaro Moran
63700628ad
feat(cache): StaticCache uses index_copy_ to avoid useless copy ( #31857 )
...
* feat(cache): StaticCache uses index_copy_ to avoid useless copy
Using index_copy_ allows for explicit in-place change of the tensor.
Some backends (XLA) will otherwise copy the tensor, making the code
slower and using more memory.
Proposed implementation will end up using less memory and on XLA will
result in less compilation, but the change is also quite generic, making
no change whatsoever on CUDA or CPU backend.
* feat(cache): SlidingWindowCache uses index_copy_ to avoid useless copy
Applying the same change done in StaticCache.
* fix(cache): fallback of index_copy_ when not implemented
* fix(cache): in index_copy_ ensure tensors are on same device
* [run slow] llama
* fix(cache): add move of cache_position to same device in SlidingWindowCache
* Revert "[run slow] llama"
This reverts commit 02608dd142.
2024-07-23 14:18:19 +02:00
amyeroberts
a009fbdab3
Fix typing to be compatible with later py versions ( #32155 )
2024-07-23 12:23:34 +01:00
Sanchit Gandhi
3263b34354
Revert "Incorrect Whisper long-form decoding timestamps" ( #32148 )
...
Revert "Incorrect Whisper long-form decoding timestamps (#32003 )"
This reverts commit cd48553fc8.
2024-07-23 18:34:30 +08:00
Amit Garg
034b477847
Rename Phi-3 rope scaling type ( #31436 )
...
* renamed phi3 rope_scaling type
* fixed trailing whitespaces
* fixed test
* added warning
* fixed format
2024-07-23 12:33:22 +02:00