Joao Gante
12febc20db
Generate: Export TF generate with a TF tokenizer ( #22310 )
* Export TF generate with a TF tokenizer
* remove unused lines
2023-03-22 15:00:20 +00:00
Sylvain Gugger
5fd4e3c87c
Enforce max_memory for device_map strategies ( #22311 )
Enforce for device_map strategies
2023-03-22 09:22:07 -04:00
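The entry above is about honoring per-device memory budgets when building a `device_map`. As a framework-free illustration only (a hypothetical `assign_device_map`, not the transformers implementation), the core idea can be sketched as a greedy placement that never exceeds each device's cap:

```python
# Hypothetical sketch of enforcing max_memory in a device_map strategy:
# place each module on the first device whose remaining budget can hold it.
def assign_device_map(module_sizes, max_memory):
    """module_sizes: {name: bytes}; max_memory: {device: byte budget}."""
    used = {device: 0 for device in max_memory}
    device_map = {}
    for name, size in module_sizes.items():
        for device, budget in max_memory.items():
            if used[device] + size <= budget:
                device_map[name] = device
                used[device] += size
                break
        else:
            raise ValueError(f"{name!r} does not fit under the given max_memory")
    return device_map
```

The dicts rely on Python 3.7+ insertion order, so earlier devices fill up first, spilling later modules to the next device (e.g. GPU first, then CPU).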
silentghoul-spec
48bef3a734
Fixed bug to calculate correct xpath_sub_list in MarkupLMTokenizer ( #22302 )
Fixed bug to calculate correct xpath_sub_list in MarkupLMTokenizer. Earlier xpath_sub_list was same as xpath_tags_list
Co-authored-by: dusejat <dusejat@amazon.com>
2023-03-22 12:07:49 +00:00
Nick Hill
4e94c6c008
Fix position embeddings for GPT-J and CodeGen ( #22069 )
* Revert "[GPT-J] add deprecation warning (#21869 )"
This reverts commit fb76994c41.
* Fix position embeddings for GPT-J and CodeGen
* Address review comments from @gante
* Fix "Copied from" comment referencing wrong function
* Fix copy/paste mistake
* Fix training path
* Hopefully make torch.fx happy
* Move position_ids long cast
* Revert "Hopefully make torch.fx happy"
This reverts commit e41a6f4cad3ff441124c7457b19cfb630d4ca025.
* Changes to help with torch.fx tracing
* Linter fix
* Correct position_ids tensor type hint
* Work-around torch.fx tracing issue
* Get the changes to work with torch.fx
* Address review comment from @michaelbenayoun
* Another small adjustment
* Add explanatory comment; small code tidyup
2023-03-22 11:14:54 +00:00
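The GPT-J/CodeGen fix above concerns how rotary position embeddings are indexed. As a minimal pure-Python sketch (a hypothetical `rotary_angles` helper, not the model code), rotary embeddings assign each position `p` the angles `p / base**(2j/dim)`; looking these up by an explicit `position_ids` list, instead of assuming positions `0..seq_len-1`, is the kind of change involved:

```python
# Illustrative only: rotary position angles theta[p][j] = p / base**(2j/dim),
# gathered by explicit position_ids rather than an implicit 0..seq_len-1 range.
def rotary_angles(position_ids, dim, base=10000.0):
    inv_freq = [base ** (-(2 * j) / dim) for j in range(dim // 2)]
    return [[p * f for f in inv_freq] for p in position_ids]
```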
Connor Henderson
8e6c34b390
fix: Allow only test_file in pytorch and flax summarization ( #22293 )
allow only test_file in pytorch and flax summarization
2023-03-22 10:46:56 +00:00
Wang, Yi
4ccaf268fb
add low_cpu_mem_usage option in run_clm.py example which will benefit… ( #22288 )
* add low_cpu_mem_usage option in run_clm.py example which will benefit LLM loading
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* update all the example and README under language-modeling
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
---------
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2023-03-22 10:42:39 +00:00
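As a back-of-the-envelope model (illustrative numbers and a hypothetical helper, not a real measurement) of why `low_cpu_mem_usage` benefits large-model loading: the naive path materializes randomly-initialized weights and the checkpoint copy at the same time, so peak CPU use is roughly twice the model size, while the low-memory path fills each tensor in place:

```python
# Rough model of peak CPU memory during checkpoint loading (not measured):
# naive loading holds an initialized copy plus the checkpoint copy at once.
def peak_load_bytes(param_bytes, low_cpu_mem_usage=False):
    total = sum(param_bytes.values())
    return total if low_cpu_mem_usage else 2 * total
```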
jiqing-feng
8472a224fb
Enable traced model for text-generation task ( #22265 )
2023-03-22 10:19:26 +00:00
Alara Dirik
0558914dff
Add MaskedImageModelingOutput ( #22212 )
* Add MaskedImageModelingOutput
2023-03-22 07:35:47 +03:00
Yih-Dar
0dcb46e7a4
Final update of doctest ( #22299 )
* update
* update
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-22 01:00:33 +01:00
Stas Bekman
89a0a9eace
[deepspeed] offload + non-cpuadam optimizer exception doc ( #22044 )
* [deepspeed] offload + non-cpuadam optimizer exception doc
* deps
2023-03-21 17:00:05 -07:00
Ali Hassani
5990743fdd
Correct NATTEN function signatures and force new version ( #22298 )
2023-03-21 17:21:34 -04:00
Yanming W
d35f729649
Restore fp16 support on xla gpu device ( #22300 )
2023-03-21 16:32:43 -04:00
Yih-Dar
67c2dbdb54
Time to Say Goodbye, torch 1.7 and 1.8 ( #22291 )
* time to say goodbye, torch 1.7 and 1.8
* clean up torch_int_div
* clean up is_torch_less_than_1_8-9
* update
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-21 19:22:01 +01:00
Davide Gazzè
86c7931a70
Add translation perf_infer_gpu_one for it ( #22296 )
Add translation
2023-03-21 13:07:30 -04:00
Yih-Dar
d0b942d1dc
fix more doctests ( #22292 )
* fix more doctests
* fix style
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-21 16:16:17 +01:00
Yih-Dar
48327c5718
More doctests ( #22268 )
* all doctests
* Skip failed tests
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-21 13:27:30 +01:00
Gerald Cuder
5a2b77a6c1
Fix error in mixed precision training of TFCvtModel ( #22267 )
* Make sure CVT can be trained using mixed precision
* Add test for keras-fit with mixed-precision
* Update tests/models/cvt/test_modeling_tf_cvt.py
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
---------
Co-authored-by: gcuder <Gerald.Cuder@iacapps.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2023-03-21 12:12:57 +00:00
Andrei Panferov
330d8b991f
replace_8bit_linear modules_to_not_convert default value fix ( #22238 )
* Fixed modules_to_not_convert default value
* Fixed modules_to_not_convert docstring
* Update src/transformers/utils/bitsandbytes.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/utils/bitsandbytes.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* ["lm_head"] if modules_to_not_convert is None
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-03-21 10:16:07 +00:00
amyeroberts
c07a02a4b7
Update vision docstring bool masked pos ( #22237 )
* Add bool_masked_pos to forward docstrings
* Add note about mask ratio - videomae
* Fix up
* Fix indenting
2023-03-20 20:06:16 +00:00
Maria Khalusova
7bd8650512
Example of pad_to_multiple_of for padding and truncation guide & docstring update ( #22278 )
* added an example of pad_to_multiple_of
* make style
* addressed feedback
2023-03-20 14:18:55 -04:00
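The arithmetic behind `pad_to_multiple_of` is just rounding the sequence length up to the nearest multiple, e.g. so padded tensors land on hardware-friendly sizes like multiples of 8 for tensor cores. A tiny sketch with a hypothetical `padded_length` helper:

```python
# Round seq_len up to the nearest multiple, as a padding target
# (illustration of the pad_to_multiple_of behavior, not the tokenizer code).
def padded_length(seq_len: int, pad_to_multiple_of: int) -> int:
    return -(-seq_len // pad_to_multiple_of) * pad_to_multiple_of
```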
Antoni Viros
fb0a38b4f2
Move torch.compile() wrapping after DDP/FSDP wrapping to ensure correct graph breaks during training ( #22279 )
2023-03-20 13:54:01 -04:00
amyeroberts
8ac29fe090
Fix doc links ( #22274 )
2023-03-20 17:07:31 +00:00
Sylvain Gugger
da005253b8
Proper map location for optimizer load ( #22273 )
* Proper map location for optimizer load
* What happened to my code?
2023-03-20 11:30:46 -04:00
Sylvain Gugger
786092a35e
Rework a bit the LLaMA conversion script ( #22236 )
* Update LLaMA conversion script
* Doc
* Fix the weight size for the 13B checkpoint
* Update src/transformers/models/llama/convert_llama_weights_to_hf.py
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
---------
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2023-03-20 11:30:36 -04:00
Sylvain Gugger
43efd7cb13
Fix balanced and auto device_map ( #22271 )
2023-03-20 11:24:17 -04:00
yqy2001
89f0fda5d3
Fix the gradient checkpointing bug of the llama model ( #22270 )
fix grad ckpt bug of llama
2023-03-20 10:26:50 -04:00
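For context on the entry above, gradient checkpointing trades compute for memory: keep activations only at segment boundaries during the forward pass, then rebuild intermediate activations from the nearest checkpoint when the backward pass needs them. A toy, framework-free sketch with illustrative names (not the llama fix itself):

```python
# Toy recompute scheme behind gradient checkpointing (illustration only).
def forward_with_checkpoints(x, layers, every):
    saved = [x]  # always keep the input as the first checkpoint
    for i, layer in enumerate(layers):
        x = layer(x)
        if (i + 1) % every == 0:
            saved.append(x)  # checkpoint every `every` layers
    return x, saved

def recompute(saved, layers, every, target):
    """Recompute the activation after layers[:target] from the nearest checkpoint."""
    seg = target // every
    x = saved[seg]
    for layer in layers[seg * every:target]:
        x = layer(x)
    return x
```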
heya5
cf0af9a31b
[Trainer] Add optional communication backends for torch.distributed when using GPU ( #22247 )
Update training_args.py
2023-03-20 09:17:34 -04:00
Nicola Procopio
c4bf6f38bd
Italian translation perf_infer_cpu ( #22243 )
* added translated files: perf_train_cpu and perf_train_cpu_many
* updated toctree
* updated toctree
* added file perf_infer_cpu.mdx
* italian translation perf_infer_cpu.mdx
2023-03-20 09:16:07 -04:00
yesinkim
466144d440
[Docs] fix typos in some tokenizer docs ( #22256 )
[Docs] fix typos
Co-authored-by: yesinkim <yesinkim@yesinkimui-MacBookAir.local>
2023-03-20 12:17:31 +00:00
Pasquale Minervini
a48310de47
Update training_args.py -- a nightly install is not required anymore for torch.compile ( #22266 )
Update training_args.py
A nightly install is not required anymore for `torch.compile`.
2023-03-20 12:00:05 +00:00
Stas Bekman
60d51ef512
[trainer] param count for deepspeed zero3 ( #22193 )
[trainer] param count for zero3
2023-03-17 11:02:55 -07:00
Guangyuan Ma
cf601b902f
Fix Unnecessary move of tensors from CPU to GPU in LlamaRotaryEmbedding ( #22234 )
push
2023-03-17 13:56:32 -04:00
Yih-Dar
bec075612a
Revert "Use dash==2.8.1 for now for daily CI" ( #22233 )
Revert "Use `dash==2.8.1` for now for daily CI (#22227 )"
This reverts commit 53218671d9.
2023-03-17 16:54:27 +01:00
Ali Hassani
3028b20a71
Fix natten ( #22229 )
* Add kernel size to NATTEN's QK arguments.
The new NATTEN 0.14.5 supports PyTorch 2.0, but also adds an additional
argument to the QK operation to allow optional RPBs.
This ends up failing NATTEN tests.
This commit adds NATTEN back to circleci and adds the arguments to get
it working again.
* Force NATTEN >= 0.14.5
2023-03-17 11:07:55 -04:00
Seb0
074490b2c2
fix(docs): fix task guide links in model docs ( #22226 )
fix(docs): task guide links in model docs
2023-03-17 14:30:17 +00:00
Maria Khalusova
314cdf7c25
Removed .mdx extension in two links ( #22230 )
removed .mdx extension
2023-03-17 10:27:12 -04:00
lewtun
f251441387
Add LlamaForSequenceClassification ( #22209 )
* Add LlamaForSequenceClassification
* Update src/transformers/models/llama/modeling_llama.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/models/llama/modeling_llama.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Add docstring
* Add test
* Add input embedding getter and setter
* Remove dead code
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-03-17 14:39:26 +01:00
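Sequence-classification heads on causal LMs typically pool from the last non-padding token, since only that position has attended to the whole input. A framework-free sketch of that position lookup (illustrative helper, not the modeling code):

```python
# Find the index of the last non-padding token, the position a causal-LM
# classification head would typically score the sequence from (illustration).
def last_token_index(input_ids, pad_token_id):
    for i in range(len(input_ids) - 1, -1, -1):
        if input_ids[i] != pad_token_id:
            return i
    return 0
```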
Wang, Yi
675d2a5a00
Fix AutoTP in DeepSpeed not working for BLOOM ( #22196 )
* fix AutoTP in deepspeed could not work for bloom
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* add a method in BloomModel to build ailib
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
---------
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2023-03-17 09:28:17 -04:00
Sylvain Gugger
00934026a4
LLaMA house-keeping ( #22216 )
* LLaMA house-keeping
* Doc links
2023-03-17 08:55:15 -04:00
Maria Khalusova
42f8f76402
Depth estimation task guide ( #22205 )
* added doc to toc, auto tip with supported models, mention of task guide in model docs
* make style
* removed "see also"
* minor fix
2023-03-17 08:36:23 -04:00
Yih-Dar
53218671d9
Use dash==2.8.1 for now for daily CI ( #22227 )
Use dash 2.8.1 for now
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-17 13:27:14 +01:00
wangpeng
af1c864cdc
fix code example in mgp-str doc ( #22219 )
Co-authored-by: yue kun <yuekun.wp@alibaba-inc.com>
2023-03-17 09:40:06 +00:00
Kevin Turner
33d033d694
fix typos in llama.mdx ( #22223 )
2023-03-17 08:43:18 +00:00
Yih-Dar
97a3d16a69
Hotfix for natten issue with torch 2.0.0 on CircleCI ( #22218 )
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-16 23:57:26 +01:00
Yih-Dar
5110e5748e
🔥 py38 + torch 2 🔥 🔥 🔥 🚀 ( #22204 )
* py38 + torch 2
* increment cache versions
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-16 22:59:23 +01:00
Susnato Dhar
fb366b9a2a
fixes a typo in WhisperFeatureExtractor docs. ( #22208 )
* fixes a typo
* .
2023-03-16 16:08:05 +00:00
Younes Belkada
da3ba3a167
[XGLM] Add accelerate support for XGLM ( #22207 )
* add `accelerate` support for XGLM
* fix order
2023-03-16 16:18:05 +01:00
SatyaJandhyalaAtMS
a88a4dae19
Temporarily fix ONNX model exporting error ( #21830 )
* Temporarily fix https://github.com/microsoft/onnx-converters-private/issues/143
* Reduced column width
* Fix formatting.
* Revert "Temporarily fix https://github.com/microsoft/onnx-converters-private/issues/143 "
This reverts commit 6e95a108042118d204da447729f3834affa354fc.
* Fix export error.
* Revert "Fix formatting."
This reverts commit 8310f60da10358edbdf77a2a2f3c83ee55066cb8.
* Propagated changes made in SwinV2 to Swin2SR
2023-03-16 10:56:26 -04:00
Yih-Dar
4c5c0af7e5
Update tiny model creation script ( #22202 )
* Update UNCONVERTIBLE_MODEL_ARCHITECTURES
* Deal with 2 model tester classes in single test file
* Deal with 2 model tester classes in single test file
* Deal with 2 model tester classes in single test file
* make style and quality
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-16 14:21:58 +01:00
Jason Phang
464d420775
LLaMA Implementation ( #21955 )
* LLaMA
* sharding and docs
* tweak
* black
* inits
* ruff
* LLAMA_PRETRAINED_CONFIG_ARCHIVE_MAP
* init
* no checkpoint
* docs
* ruff
* type_vocab_size
* tokenizer fixes
* tokenizer fixes
* Update tokenization_llama.py
* Update tokenization_llama.py
* Update configuration_llama.py
* Update modeling_llama.py
* tokenizer add_bos by default
* licenses
* remove decoder
* norms and mlp
* rope overhaul
* tweaks
* black
* mention OPT implementation
* off-by-one naming
* typo
* fix
* tokenization fix and slicing bug
* padding config
* cleanup
* black
* update tests
* undo typo
* fix vocab caching logic
* ruff
* docbuilder
* attn fix from BlackSamorez
* initial feedback
* typo
* docs
* llama case
* llama case
* load checkpoint docs
* comment about tokenizer
* tokenizer defaults
* clear past_key_values if use_cache=False
* last tweaks
* last tweaks
* last tweaks
* last tweaks
---------
Co-authored-by: Stella Biderman <stellabiderman@gmail.com>
2023-03-16 09:01:15 -04:00