Sylvain Gugger
786092a35e
Rework a bit the LLaMA conversion script ( #22236 )
...
* Update LLaMA conversion script
* Doc
* Fix the weight size for the 13B checkpoint
* Update src/transformers/models/llama/convert_llama_weights_to_hf.py
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
---------
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2023-03-20 11:30:36 -04:00
Sylvain Gugger
43efd7cb13
Fix balanced and auto device_map ( #22271 )
2023-03-20 11:24:17 -04:00
yqy2001
89f0fda5d3
Fix the gradient checkpointing bug of the llama model ( #22270 )
...
fix grad ckpt bug of llama
2023-03-20 10:26:50 -04:00
heya5
cf0af9a31b
[Trainer] Add optional communication backends for torch.distributed when using GPU ( #22247 )
...
Update training_args.py
2023-03-20 09:17:34 -04:00
Nicola Procopio
c4bf6f38bd
Italian translation perf_infer_cpu ( #22243 )
...
* added translated files
added perf_train_cpu and perf_train_cpu_many
* updated toctree
* updated toctree
* added file
perf_infer_cpu.mdx
* italian translation perf_infer_cpu.mdx
2023-03-20 09:16:07 -04:00
yesinkim
466144d440
[Docs] fix typos in some tokenizer docs ( #22256 )
...
[Docs] fix typos
Co-authored-by: yesinkim <yesinkim@yesinkimui-MacBookAir.local>
2023-03-20 12:17:31 +00:00
Pasquale Minervini
a48310de47
Update training_args.py -- a nightly install is not required anymore for torch.compile ( #22266 )
...
Update training_args.py
A nightly install is not required anymore for `torch.compile`.
2023-03-20 12:00:05 +00:00
Stas Bekman
60d51ef512
[trainer] param count for deepspeed zero3 ( #22193 )
...
[trainer] param count for zero3
2023-03-17 11:02:55 -07:00
Guangyuan Ma
cf601b902f
Fix Unnecessary move of tensors from CPU to GPU in LlamaRotaryEmbedding ( #22234 )
...
push
2023-03-17 13:56:32 -04:00
Yih-Dar
bec075612a
Revert "Use dash==2.8.1 for now for daily CI" ( #22233 )
...
Revert "Use `dash==2.8.1` for now for daily CI (#22227 )"
This reverts commit 53218671d9.
2023-03-17 16:54:27 +01:00
Ali Hassani
3028b20a71
Fix natten ( #22229 )
...
* Add kernel size to NATTEN's QK arguments.
The new NATTEN 0.14.5 supports PyTorch 2.0, but also adds an additional
argument to the QK operation to allow optional RPBs.
This ends up failing NATTEN tests.
This commit adds NATTEN back to circleci and adds the arguments to get
it working again.
* Force NATTEN >= 0.14.5
2023-03-17 11:07:55 -04:00
Seb0
074490b2c2
fix(docs): fix task guide links in model docs ( #22226 )
...
fix(docs): task guide links in model docs
2023-03-17 14:30:17 +00:00
Maria Khalusova
314cdf7c25
Removed .mdx extension in two links ( #22230 )
...
removed .mdx extension
2023-03-17 10:27:12 -04:00
lewtun
f251441387
Add LlamaForSequenceClassification ( #22209 )
...
* Add LlamaForSequenceClassification
* Update src/transformers/models/llama/modeling_llama.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/models/llama/modeling_llama.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Add docstring
* Add test
* Add input embedding getter and setter
* Remove dead code
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-03-17 14:39:26 +01:00
Wang, Yi
675d2a5a00
fix AutoTP in deepspeed could not work for bloom ( #22196 )
...
* fix AutoTP in deepspeed could not work for bloom
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* add a method in BloomModel to build alibi
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
---------
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2023-03-17 09:28:17 -04:00
Sylvain Gugger
00934026a4
LLaMA house-keeping ( #22216 )
...
* LLaMA house-keeping
* Doc links
2023-03-17 08:55:15 -04:00
Maria Khalusova
42f8f76402
Depth estimation task guide ( #22205 )
...
* added doc to toc, auto tip with supported models, mention of task guide in model docs
* make style
* removed "see also"
* minor fix
2023-03-17 08:36:23 -04:00
Yih-Dar
53218671d9
Use dash==2.8.1 for now for daily CI ( #22227 )
...
Use dash 2.8.1 for now
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-17 13:27:14 +01:00
wangpeng
af1c864cdc
fix code example in mgp-str doc ( #22219 )
...
Co-authored-by: yue kun <yuekun.wp@alibaba-inc.com>
2023-03-17 09:40:06 +00:00
Kevin Turner
33d033d694
fix typos in llama.mdx ( #22223 )
2023-03-17 08:43:18 +00:00
Yih-Dar
97a3d16a69
Hotfix for natten issue with torch 2.0.0 on CircleCI ( #22218 )
...
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-16 23:57:26 +01:00
Yih-Dar
5110e5748e
🔥 py38 + torch 2 🔥 🔥 🔥 🚀 ( #22204 )
...
* py38 + torch 2
* increment cache versions
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-16 22:59:23 +01:00
Susnato Dhar
fb366b9a2a
fixes a typo in WhisperFeatureExtractor docs. ( #22208 )
...
* fixes a typo
* .
2023-03-16 16:08:05 +00:00
Younes Belkada
da3ba3a167
[XGLM] Add accelerate support for XGLM ( #22207 )
...
* add `accelerate` support for XGLM
* fix order
2023-03-16 16:18:05 +01:00
SatyaJandhyalaAtMS
a88a4dae19
Temporarily fix ONNX model exporting error ( #21830 )
...
* Temporarily fix https://github.com/microsoft/onnx-converters-private/issues/143
* Reduced column width
* Fix formatting.
* Revert "Temporarily fix https://github.com/microsoft/onnx-converters-private/issues/143 "
This reverts commit 6e95a108042118d204da447729f3834affa354fc.
* Fix export error.
* Revert "Fix formatting."
This reverts commit 8310f60da10358edbdf77a2a2f3c83ee55066cb8.
* Propagated changes made in SwinV2 to Swin2SR
2023-03-16 10:56:26 -04:00
Yih-Dar
4c5c0af7e5
Update tiny model creation script ( #22202 )
...
* Update UNCONVERTIBLE_MODEL_ARCHITECTURES
* Deal with 2 model tester classes in single test file
* Deal with 2 model tester classes in single test file
* Deal with 2 model tester classes in single test file
* make style and quality
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-16 14:21:58 +01:00
Jason Phang
464d420775
LLaMA Implementation ( #21955 )
...
* LLaMA
* sharding and docs
* tweak
* black
* inits
* ruff
* LLAMA_PRETRAINED_CONFIG_ARCHIVE_MAP
* init
* no checkpoint
* docs
* ruff
* type_vocab_size
* tokenizer fixes
* tokenizer fixes
* Update tokenization_llama.py
* Update tokenization_llama.py
* Update configuration_llama.py
* Update modeling_llama.py
* tokenizer add_bos by default
* licenses
* remove decoder
* norms and mlp
* rope overhaul
* tweaks
* black
* mention OPT implementation
* off-by-one naming
* typo
* fix
* tokenization fix and slicing bug
* padding config
* cleanup
* black
* update tests
* undo typo
* fix vocab caching logic
* ruff
* docbuilder
* attn fix from BlackSamorez
* initial feedback
* typo
* docs
* llama case
* llama case
* load checkpoint docs
* comment about tokenizer
* tokenizer defaults
* clear past_key_values if use_cache=False
* last tweaks
* last tweaks
* last tweaks
* last tweaks
---------
Co-authored-by: Stella Biderman <stellabiderman@gmail.com>
2023-03-16 09:01:15 -04:00
Jason Phang
0041be5b3d
LLaMA Implementation ( #21955 )
...
* LLaMA
* sharding and docs
* tweak
* black
* inits
* ruff
* LLAMA_PRETRAINED_CONFIG_ARCHIVE_MAP
* init
* no checkpoint
* docs
* ruff
* type_vocab_size
* tokenizer fixes
* tokenizer fixes
* Update tokenization_llama.py
* Update tokenization_llama.py
* Update configuration_llama.py
* Update modeling_llama.py
* tokenizer add_bos by default
* licenses
* remove decoder
* norms and mlp
* rope overhaul
* tweaks
* black
* mention OPT implementation
* off-by-one naming
* typo
* fix
* tokenization fix and slicing bug
* padding config
* cleanup
* black
* update tests
* undo typo
* fix vocab caching logic
* ruff
* docbuilder
* attn fix from BlackSamorez
* initial feedback
* typo
* docs
* llama case
* llama case
* load checkpoint docs
* comment about tokenizer
* tokenizer defaults
* clear past_key_values if use_cache=False
* last tweaks
* last tweaks
* last tweaks
* last tweaks
---------
Co-authored-by: Stella Biderman <stellabiderman@gmail.com>
2023-03-16 09:00:53 -04:00
Baelish03
09922da4a7
Italian Translation of migration.mdx ( #22183 )
...
* Translation Italian: migration
* Update migration.mdx
minor fixes
* Update _toctree.yml
* Delete migration.mdx
* Add italian translation of migration.mdx
* Update of migration.mdx translation and toctree
2023-03-16 12:00:07 +00:00
Yih-Dar
52a57f7c7c
Update expected values in MgpstrModelIntegrationTest ( #22195 )
...
Update values
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-16 11:48:52 +00:00
Alara Dirik
1485bd9c02
Fix typo in Align docs ( #22199 )
...
Fix align docs typo
2023-03-16 13:41:48 +03:00
Yih-Dar
1c4a9acc73
Fix DeepSpeed CI ( #22194 )
...
* Deal with torch-tensorrt
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-16 05:52:40 +01:00
Prathik Rao
7c4999e495
t5 remove data dependency ( #22097 )
...
* t5 remove data dependency
* make style
* make fix-copies
---------
Co-authored-by: Prathik Rao <prathikrao@microsoft.com>
2023-03-15 16:11:15 -04:00
Anahita Bhiwandiwalla
16121bae5c
Update BridgeTowerForContrastiveLearning ( #22145 )
...
* Use return_loss for BridgeTowerForContrastiveLearning, add example
* fix tests
* Update example in BridgeTowerForContrastiveLearning
* Update test_modeling_bridgetower.py
* update model output format
* minor update
* Update src/transformers/models/bridgetower/modeling_bridgetower.py
* make style
---------
Co-authored-by: Tiep Le <97980157+tileintel@users.noreply.github.com>
Co-authored-by: Tiep Le <tiep.le@intel.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-15 20:54:38 +01:00
Sylvain Gugger
42ad693b7b
Regression pipeline device ( #22190 )
...
* Fix regression in pipeline when device=-1 is passed
* Add regression test
2023-03-15 14:13:38 -04:00
amyeroberts
737681477c
Revert 22152 MaskedImageCompletionOutput changes ( #22187 )
...
Revert changes
2023-03-15 18:37:23 +01:00
浮躁的小螃蟹
7b0e2cfdfb
Fix: unfinished_sequences with correct device ( #22184 )
...
Fix: unfinished_sequences with correct device
The original code was causing errors when running torch.jit.trace due to the tensor options being incorrect. I fixed this by using torch.ones to create a tensor with the correct device and dtype. This should resolve the issue with running torch.jit.trace.
2023-03-15 16:27:19 +00:00
Sylvain Gugger
f7329751fe
Run all tests by default ( #22162 )
2023-03-14 17:30:43 -04:00
Sylvain Gugger
b7036f4912
Load optimizer state on CPU to avoid CUDA OOM ( #22159 )
2023-03-14 17:30:32 -04:00
Sylvain Gugger
ebdb185bef
v4.28.0.dev0
2023-03-14 13:49:10 -04:00
Sylvain Gugger
c52c5282ef
Revert "Enforce same behavior as PyTorch 2.0 for older versions" ( #22163 )
...
Revert "Enforce same behavior as PyTorch 2.0 for older versions (#22136 )"
This reverts commit 1c801d65eb.
2023-03-14 13:45:46 -04:00
Stas Bekman
085bf5c1fe
[trainer] add --optim adamw_torch_fused for pt-2.0+ ( #22144 )
...
* [trainer] add --optim adamw_torch_fused
* change optim default
* deal with non-torch
* revert default change; prep; add fp16/amp assert
* typo
* typo
2023-03-14 10:22:03 -07:00
amyeroberts
c6318c3788
to_pil - don't rescale if int and in range 0-255 ( #22158 )
...
* Don't rescale if int and in range 0-255
* Raise value error if int values too large
* Update tests/test_image_transforms.py
* Update tests/test_image_transforms.py
2023-03-14 15:43:44 +00:00
Alara Dirik
3b22bfbc6a
Create MaskedImageCompletionOutput and fix ViT docs ( #22152 )
...
* create MaskedImageCompletionOutput
* fix bugs
* fix bugs
2023-03-14 13:55:18 +00:00
Sylvain Gugger
b45192ec47
Fix big model inference for T5 models in float16 ( #22095 )
...
* Fix big model inference for T5 models in float16
* Apply suggestions from code review
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Style
* Trigger CI with latest release
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-03-14 09:20:16 -04:00
Nicola Procopio
7f5ad6c35b
Translation Italian: perf_train_cpu and perf_train_cpu_many ( #22151 )
...
* added translated files
added perf_train_cpu and perf_train_cpu_many
* updated toctree
2023-03-14 11:09:36 +00:00
Yih-Dar
ff88703501
Update 2 doctest expected values for torch 2.0.0 ( #22148 )
...
update values
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-14 09:13:16 +00:00
Alara Dirik
cdddfbffa1
Add ConvNeXT V2 ( #21679 )
...
* Add ConvNeXt V2 to transformers
* TF model is separated from the PR to fix issues
2023-03-14 12:08:14 +03:00
Yih-Dar
6c2ad00c46
Move is_pipeline_test_to_skip to specific model test classes ( #21999 )
...
* Move `is_pipeline_test_to_skip` to specific model test classes
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-14 10:03:02 +01:00
Arthur
2beabd24f0
[ 🛠️ ] Fix-whisper-breaking-changes ( #21965 )
...
* temp fix
* temporary fix
* update
* fix tests
* fixup
* update based on review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* update to fix tests
* update docstring
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
2023-03-14 09:23:48 +01:00