Commit Graph

12387 Commits

Author SHA1 Message Date
Alara Dirik
0558914dff
Add MaskedImageModelingOutput (#22212)
* Add MaskedImageModelingOutput
2023-03-22 07:35:47 +03:00
Yih-Dar
0dcb46e7a4
Final update of doctest (#22299)
* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-22 01:00:33 +01:00
Stas Bekman
89a0a9eace
[deepspeed] offload + non-cpuadam optimizer exception doc (#22044)
* [deepspeed] offload + non-cpuadam optimizer exception doc

* deps
2023-03-21 17:00:05 -07:00
Ali Hassani
5990743fdd
Correct NATTEN function signatures and force new version (#22298)
2023-03-21 17:21:34 -04:00
Yanming W
d35f729649
Restore fp16 support on xla gpu device (#22300)
2023-03-21 16:32:43 -04:00
Yih-Dar
67c2dbdb54
Time to Say Goodbye, torch 1.7 and 1.8 (#22291)
* time to say goodbye, torch 1.7 and 1.8

* clean up torch_int_div

* clean up is_torch_less_than_1_8-9

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-21 19:22:01 +01:00
Davide Gazzè
86c7931a70
Add translation perf_infer_gpu_one for it (#22296)
Add translation
2023-03-21 13:07:30 -04:00
Yih-Dar
d0b942d1dc
fix more doctests (#22292)
* fix more doctests

* fix style

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-21 16:16:17 +01:00
Yih-Dar
48327c5718
More doctests (#22268)
* all doctests

* Skip failed tests

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-21 13:27:30 +01:00
Gerald Cuder
5a2b77a6c1
Fix error in mixed precision training of TFCvtModel (#22267)
* Make sure CVT can be trained using mixed precision

* Add test for keras-fit with mixed-precision

* Update tests/models/cvt/test_modeling_tf_cvt.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

---------

Co-authored-by: gcuder <Gerald.Cuder@iacapps.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2023-03-21 12:12:57 +00:00
Andrei Panferov
330d8b991f
replace_8bit_linear modules_to_not_convert default value fix (#22238)
* Fixed modules_to_not_convert default value

* Fixed modules_to_not_convert docstring

* Update src/transformers/utils/bitsandbytes.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/utils/bitsandbytes.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* ["lm_head"] if modules_to_not_convert is None

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-03-21 10:16:07 +00:00
amyeroberts
c07a02a4b7
Update vision docstring bool masked pos (#22237)
* Add bool_masked_pos to forward docstrings

* Add note about mask ratio - videomae

* Fix up

* Fix indenting
2023-03-20 20:06:16 +00:00
Maria Khalusova
7bd8650512
Example of pad_to_multiple_of for padding and truncation guide & docstring update (#22278)
* added an example of pad_to_multiple_of

* make style

* addressed feedback
2023-03-20 14:18:55 -04:00
Antoni Viros
fb0a38b4f2
Move torch.compile() wrapping after DDP/FSDP wrapping to ensure correct graph breaks during training (#22279)
2023-03-20 13:54:01 -04:00
amyeroberts
8ac29fe090
Fix doc links (#22274)
2023-03-20 17:07:31 +00:00
Sylvain Gugger
da005253b8
Proper map location for optimizer load (#22273)
* Proper map location for optimizer load

* What happened to my code?
2023-03-20 11:30:46 -04:00
Sylvain Gugger
786092a35e
Rework a bit the LLaMA conversion script (#22236)
* Update LLaMA conversion script

* Doc

* Fix the weight size for the 13B checkpoint

* Update src/transformers/models/llama/convert_llama_weights_to_hf.py

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

---------

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2023-03-20 11:30:36 -04:00
Sylvain Gugger
43efd7cb13
Fix balanced and auto device_map (#22271)
2023-03-20 11:24:17 -04:00
yqy2001
89f0fda5d3
Fix the gradient checkpointing bug of the llama model (#22270)
fix grad ckpt bug of llama
2023-03-20 10:26:50 -04:00
heya5
cf0af9a31b
[Trainer] Add optional communication backends for torch.distributed when using GPU (#22247)
Update training_args.py
2023-03-20 09:17:34 -04:00
Nicola Procopio
c4bf6f38bd
Italian translation perf_infer_cpu (#22243)
* added translated files

added perf_train_cpu and perf_train_cpu_many

* updated toctree

* updated toctree

* added file

perf_infer_cpu.mdx

* italian translation perf_infer_cpu.mdx
2023-03-20 09:16:07 -04:00
yesinkim
466144d440
[Docs] fix typos in some tokenizer docs (#22256)
[Docs] fix typos

Co-authored-by: yesinkim <yesinkim@yesinkimui-MacBookAir.local>
2023-03-20 12:17:31 +00:00
Pasquale Minervini
a48310de47
Update training_args.py -- a nightly install is not required anymore for torch.compile (#22266)
Update training_args.py

A nightly install is not required anymore for `torch.compile`.
2023-03-20 12:00:05 +00:00
Stas Bekman
60d51ef512
[trainer] param count for deepspeed zero3 (#22193)
[trainer] param count for zero3
2023-03-17 11:02:55 -07:00
Guangyuan Ma
cf601b902f
Fix Unnecessary move of tensors from CPU to GPU in LlamaRotaryEmbedding (#22234)
push
2023-03-17 13:56:32 -04:00
Yih-Dar
bec075612a
Revert "Use dash==2.8.1 for now for daily CI" (#22233)
Revert "Use `dash==2.8.1` for now for daily CI (#22227)"

This reverts commit 53218671d9.
2023-03-17 16:54:27 +01:00
Ali Hassani
3028b20a71
Fix natten (#22229)
* Add kernel size to NATTEN's QK arguments.

The new NATTEN 0.14.5 supports PyTorch 2.0, but also adds an additional
argument to the QK operation to allow optional RPBs.

This ends up failing NATTEN tests.

This commit adds NATTEN back to circleci and adds the arguments to get
it working again.

* Force NATTEN >= 0.14.5
2023-03-17 11:07:55 -04:00
Seb0
074490b2c2
fix(docs): fix task guide links in model docs (#22226)
fix(docs): task guide links in model docs
2023-03-17 14:30:17 +00:00
Maria Khalusova
314cdf7c25
Removed .mdx extension in two links (#22230)
removed .mdx extension
2023-03-17 10:27:12 -04:00
lewtun
f251441387
Add LlamaForSequenceClassification (#22209)
* Add LlamaForSequenceClassification

* Update src/transformers/models/llama/modeling_llama.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_llama.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Add docstring

* Add test

* Add input embedding getter and setter

* Remove dead code

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-03-17 14:39:26 +01:00
Wang, Yi
675d2a5a00
fix AutoTP in deepspeed could not work for bloom (#22196)
* fix AutoTP in deepspeed could not work for bloom

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* add a method in BloomModel to build ailib

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

---------

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2023-03-17 09:28:17 -04:00
Sylvain Gugger
00934026a4
LLaMA house-keeping (#22216)
* LLaMA house-keeping

* Doc links
2023-03-17 08:55:15 -04:00
Maria Khalusova
42f8f76402
Depth estimation task guide (#22205)
* added doc to toc, auto tip with supported models, mention of task guide in model docs

* make style

* removed "see also"

* minor fix
2023-03-17 08:36:23 -04:00
Yih-Dar
53218671d9
Use dash==2.8.1 for now for daily CI (#22227)
Use dash 2.8.1 for now

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-17 13:27:14 +01:00
wangpeng
af1c864cdc
fix code example in mgp-str doc (#22219)
Co-authored-by: yue kun <yuekun.wp@alibaba-inc.com>
2023-03-17 09:40:06 +00:00
Kevin Turner
33d033d694
fix typos in llama.mdx (#22223)
2023-03-17 08:43:18 +00:00
Yih-Dar
97a3d16a69
Hotfix for natten issue with torch 2.0.0 on CircleCI (#22218)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-16 23:57:26 +01:00
Yih-Dar
5110e5748e
🔥py38 + torch 2 🔥🔥🔥🚀 (#22204)
* py38 + torch 2

* increment cache versions

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-16 22:59:23 +01:00
Susnato Dhar
fb366b9a2a
fixes a typo in WhisperFeatureExtractor docs. (#22208)
* fixes a typo

* .
2023-03-16 16:08:05 +00:00
Younes Belkada
da3ba3a167
[XGLM] Add accelerate support for XGLM (#22207)
* add `accelerate` support for XGLM

* fix order
2023-03-16 16:18:05 +01:00
SatyaJandhyalaAtMS
a88a4dae19
Temporarily fix ONNX model exporting error (#21830)
* Temporarily fix https://github.com/microsoft/onnx-converters-private/issues/143

* Reduced column width

* Fix formatting.

* Revert "Temporarily fix https://github.com/microsoft/onnx-converters-private/issues/143"

This reverts commit 6e95a108042118d204da447729f3834affa354fc.

* Fix export error.

* Revert "Fix formatting."

This reverts commit 8310f60da10358edbdf77a2a2f3c83ee55066cb8.

* Propagated changes made in SwinV2 to Swin2SR
2023-03-16 10:56:26 -04:00
Yih-Dar
4c5c0af7e5
Update tiny model creation script (#22202)
* Update UNCONVERTIBLE_MODEL_ARCHITECTURES

* Deal with 2 model tester classes in single test file

* Deal with 2 model tester classes in single test file

* Deal with 2 model tester classes in single test file

* make style and quality

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-16 14:21:58 +01:00
Jason Phang
464d420775
LLaMA Implementation (#21955)
* LLaMA

* sharding and docs

* tweak

* black

* inits

* ruff

* LLAMA_PRETRAINED_CONFIG_ARCHIVE_MAP

* init

* no checkpoint

* docs

* ruff

* type_vocab_size

* tokenizer fixes

* tokenizer fixes

* Update tokenization_llama.py

* Update tokenization_llama.py

* Update configuration_llama.py

* Update modeling_llama.py

* tokenizer add_bos by default

* licenses

* remove decoder

* norms and mlp

* rope overhaul

* tweaks

* black

* mention OPT implementation

* off-by-one naming

* typo

* fix

* tokenization fix and slicing bug

* padding config

* cleanup

* black

* update tests

* undo typo

* fix vocab caching logic

* ruff

* docbuilder

* attn fix from BlackSamorez

* initial feedback

* typo

* docs

* llama case

* llama case

* load checkpoint docs

* comment about tokenizer

* tokenizer defaults

* clear past_key_values if use_cache=False

* last tweaks

* last tweaks

* last tweaks

* last tweaks

---------

Co-authored-by: Stella Biderman <stellabiderman@gmail.com>
2023-03-16 09:01:15 -04:00
Jason Phang
0041be5b3d
LLaMA Implementation (#21955)
* LLaMA

* sharding and docs

* tweak

* black

* inits

* ruff

* LLAMA_PRETRAINED_CONFIG_ARCHIVE_MAP

* init

* no checkpoint

* docs

* ruff

* type_vocab_size

* tokenizer fixes

* tokenizer fixes

* Update tokenization_llama.py

* Update tokenization_llama.py

* Update configuration_llama.py

* Update modeling_llama.py

* tokenizer add_bos by default

* licenses

* remove decoder

* norms and mlp

* rope overhaul

* tweaks

* black

* mention OPT implementation

* off-by-one naming

* typo

* fix

* tokenization fix and slicing bug

* padding config

* cleanup

* black

* update tests

* undo typo

* fix vocab caching logic

* ruff

* docbuilder

* attn fix from BlackSamorez

* initial feedback

* typo

* docs

* llama case

* llama case

* load checkpoint docs

* comment about tokenizer

* tokenizer defaults

* clear past_key_values if use_cache=False

* last tweaks

* last tweaks

* last tweaks

* last tweaks

---------

Co-authored-by: Stella Biderman <stellabiderman@gmail.com>
2023-03-16 09:00:53 -04:00
Baelish03
09922da4a7
Italian Translation of migration.mdx (#22183)
* Translation Italian: migration

* Update migration.mdx

minor fixes

* Update _toctree.yml

* Delete migration.mdx

* Add italian translation of migration.mdx

* Update of migration.mdx translation and toctree
2023-03-16 12:00:07 +00:00
Yih-Dar
52a57f7c7c
Update expected values in MgpstrModelIntegrationTest (#22195)
Update values

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-16 11:48:52 +00:00
Alara Dirik
1485bd9c02
Fix typo in Align docs (#22199)
Fix align docs typo
2023-03-16 13:41:48 +03:00
Yih-Dar
1c4a9acc73
Fix DeepSpeed CI (#22194)
* Deal with torch-tensorrt

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-16 05:52:40 +01:00
Prathik Rao
7c4999e495
t5 remove data dependency (#22097)
* t5 remove data dependency

* make style

* make fix-copies

---------

Co-authored-by: Prathik Rao <prathikrao@microsoft.com>
2023-03-15 16:11:15 -04:00
Anahita Bhiwandiwalla
16121bae5c
Update BridgeTowerForContrastiveLearning (#22145)
* Use return_loss for BridgeTowerForContrastiveLearning, add example

* fix tests

* Update example in BridgeTowerForContrastiveLearning

* Update test_modeling_bridgetower.py

* update model output format

* minor update

* Update src/transformers/models/bridgetower/modeling_bridgetower.py

* make style

---------

Co-authored-by: Tiep Le <97980157+tileintel@users.noreply.github.com>
Co-authored-by: Tiep Le <tiep.le@intel.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-15 20:54:38 +01:00