Commit Graph

15053 Commits

Author SHA1 Message Date
Stas Bekman
61f79b2986
[gptj] support older pytorch version (#22325)
* [gptj] support older pytorch version

* contributor

* contributor

* make copies

---------

Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2023-03-22 18:35:04 -07:00
Sylvain Gugger
80e3b36361
Really fix quality due to ruff release 2023-03-22 20:56:22 -04:00
Sylvain
ef28df0572 Fix quality due to ruff release 2023-03-22 20:45:08 -04:00
Stas Bekman
73fdc8c5b4
[deepspeed zero3] need generate(synced_gpus=True, ...) (#22242)
* [deepspeed zero3] need generate(synced_gpus=True, ...)

* fix

* rework per Sylvain's suggestion

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-03-22 12:18:57 -07:00
Yih-Dar
8b05ace014
Fix PipelineTests skip conditions (#22320)
* check what tests fail

* Skip failing tests

* Skip failing tests

* Skip failing tests

* Skip failing tests

* clean up

* clean up

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-22 20:02:24 +01:00
Luc CAILLIAU
d62e7d8842
Chunkable token classification pipeline (#21771)
* Chunkable classification pipeline 

The TokenClassificationPipeline is now able to process sequences longer than 512. No matter the framework, the model, the tokenizer. We just have to pass process_all=True and a stride number (optional). The behavior remains the same if you don't pass these optional parameters. For overlapping parts when using stride above 0, we consider only the max scores for each overlapped token in all chunks where the token is.

* Update token_classification.py

* Update token_classification.py

* Update token_classification.py

* Update token_classification.py

* Update token_classification.py

* Update token_classification.py

* Update token_classification.py

* Update token_classification.py

* Update token_classification.py

* Update token_classification.py

* Update token_classification.py

* Update token_classification.py

* update with latest black format

* update black format

* Update token_classification.py

* Update token_classification.py

* format correction

* Update token_classification.py

* Update token_classification.py

* Update token_classification.py

* Update token_classification.py

* Update comments

* Update src/transformers/pipelines/token_classification.py

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

* Update token_classification.py

Correct spaces, remove process_all and keep only stride. If stride is provided, the pipeline is applied to the whole text.

* Update token_classification.py

* Update token_classification.py

* Update token_classification.py

* Update token_classification.py

* Update token_classification.py

* Update token_classification.py

* Update token_classification.py

* Update token_classification.py

* Update chunk aggregation

Update the chunk aggregation strategy based on entities aggregation.

* Update token_classification.py

* Update token_classification.py

* Update token_classification.py

* Update token_classification.py

* Update token_classification.py

* Update token_classification.py

* Update token_classification.py

* Update token_classification.py

* Update token_classification.py

* Update token_classification.py

* Update token_classification.py

* Update token_classification.py

Remove unnecessary pop from outputs dict

* Update token_classification.py

* Update token_classification.py

* Update token_classification.py

* Update token_classification.py

* Update token_classification.py

* Update token_classification.py

* Update token_classification.py

* Update src/transformers/pipelines/token_classification.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* add chunking tests

* correct formating

* correct formatting

* correct model id for test chunking

* update scores with nested simplify

* Update test_pipelines_token_classification.py

* Update test_pipelines_token_classification.py

* update model to a tiny one

* Update test_pipelines_token_classification.py

* Adding smaller test for chunking.

* Fixup

* Update token_classification.py

* Update src/transformers/pipelines/token_classification.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/pipelines/token_classification.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

---------

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-03-22 14:13:20 -04:00
Tom Aarsen
f48d3314e4
docs: Resolve incorrect type typo in trainer methods (#22316)
Resolve incorrect type typo in trainer methods
2023-03-22 11:57:08 -04:00
Younes Belkada
0f68a7f408
Add Pix2Struct (#21400)
* v1 all keys match

* clean up

* forward pass ok

* add correct image transform

* generate works, logits matching

* clean up

* more refactor

* revert

* revert

* clean up

* clean ups

* clean up

* refactor

* refactor

* fix doc

* fix tokenizer test

* fix toctree

* revert toctree

* oops

* few fixes

* replace to `pixel_embeds`

* make fixup

* test processing & feat extractor

* fix some tests

* more fixes

* make fixup

* clean up

* more clean up

* add a single slow test

* fix test

* make fixup

* fix

* fix authors

* fix toctree

* update docs

* add docstring

* revert change

* Update src/transformers/models/pix2struct/__init__.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix tokenizer

* fix processor test

* fix test

* make fixup

* refactor

* fix config

* Update src/transformers/models/pix2struct/image_processing_pix2struct.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* format

* fix

* Update src/transformers/models/pix2struct/image_processing_pix2struct.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* make fixup

* add docstring

* fix issues

* fix

* fix

* fix

* add slow test

* fix

* fix

* fix batched issue

* fix training issues

* fix ci test

* fix slow test

* fix conversion script

* remove unneeded classes

* fix slow test

* fix require backends

* fix masked fill

* revert

* fix softmax

* add large models support

* fix conditional generation

* few fixes

* add instructions

* rm unneeded file

* Update src/transformers/models/pix2struct/convert_pix2struct_original_pytorch_to_hf.py

* fix ci test

* fix ci test really

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix nit

* fix nits

* fix image processors nits

* docstring

* clean up

* fix nit

* fix tests

* docstring nit

* fix reshape

* Update src/transformers/models/pix2struct/image_processing_pix2struct.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* fix nit

* fix repetition

* refactor processor

* make patch size consistent

* refactor forward

* fix docstring

* fix max_patches issue

* update docstirng

* update docstring

* fix coped from

* add skip reasons

* few fixes

* Update src/transformers/models/pix2struct/image_processing_pix2struct.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* format

* fix doctests

* refactor and fix

* fix doc build issue

* fix processor test

* small fix conversion script

* replace correct weights

* make fixup

* fix some issues

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* revert config and fixes

* Update src/transformers/models/pix2struct/image_processing_pix2struct.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* more details

* fixes

* fix processor

* fix processor test

* fix

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* make fixup

* fix processor

* Update src/transformers/models/pix2struct/modeling_pix2struct.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* add copied

* make fixup

* fix copies

* update docstring

* refactor

* fix docstring

* fix conversion script

* fix vqa issue

* replace to `flattened_patches`

* nit

* fix numpy issue

* fix image processors

* add batched vqa support

* fix vqa conversion

* make fixup

* fix conversion script

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* make fixup

* add correct docstring

* update docstring

* fix module level + channel dim

* use `make_list_of_images`

* refactor

* correct docstring

* fix authors

* remove `data_format`

* add header text test

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* make fixup

* add checkpoints

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2023-03-22 16:53:52 +01:00
Joao Gante
fd3eb3e3cd
Beef up Llama tests (#22314)
* tmp commit

* beef up llama tests
2023-03-22 15:20:48 +00:00
Joao Gante
12febc20db
Generate: Export TF generate with a TF tokenizer (#22310)
* Export TF generate with a TF tokenizer

* remove unused lines
2023-03-22 15:00:20 +00:00
Sylvain Gugger
5fd4e3c87c
Enforce max_memory for device_map strategies (#22311)
Enforce  for device_map strategies
2023-03-22 09:22:07 -04:00
silentghoul-spec
48bef3a734
Fixed bug to calculate correct xpath_sub_list in MarkupLMTokenizer (#22302)
Fixed bug to calculate correct xpath_sub_list in MarkupLMTokenizer. Earlier xpath_sub_list was same as xpath_tags_list

Co-authored-by: dusejat <dusejat@amazon.com>
2023-03-22 12:07:49 +00:00
Nick Hill
4e94c6c008
Fix position embeddings for GPT-J and CodeGen (#22069)
* Revert "[GPT-J] add deprecation warning (#21869)"

This reverts commit fb76994c41.

* Fix position embeddings for GPT-J and CodeGen

* Address review comments from @gante

* Fix "Copied from" comment referencing wrong function

* Fix copy/paste mistake

* Fix training path

* Hopefully make torch.fx happy

* Move position_ids long cast

* Revert "Hopefully make torch.fx happy"

This reverts commit e41a6f4cad3ff441124c7457b19cfb630d4ca025.

* Changes to help with torch.fx tracing

* Linter fix

* Correct position_ids tensor type hint

* Work-around torch.fx tracing issue

* Get the changes to work with torch.fx

* Address review comment from @michaelbenayoun

* Another small adjustment

* Add explanatory comment; small code tidyup
2023-03-22 11:14:54 +00:00
Connor Henderson
8e6c34b390
fix: Allow only test_file in pytorch and flax summarization (#22293)
allow only test_file in pytorch and flax summarization
2023-03-22 10:46:56 +00:00
Wang, Yi
4ccaf268fb
add low_cpu_mem_usage option in run_clm.py example which will benefit… (#22288)
* add low_cpu_mem_usage option in run_clm.py example which will benefit LLM loading

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* update all the example and README under language-modeling

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

---------

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2023-03-22 10:42:39 +00:00
jiqing-feng
8472a224fb
Enable traced model for text-generation task (#22265) 2023-03-22 10:19:26 +00:00
Alara Dirik
0558914dff
Add MaskedImageModelingOutput (#22212)
* Add MaskedImageModelingOutput
2023-03-22 07:35:47 +03:00
Yih-Dar
0dcb46e7a4
Final update of doctest (#22299)
* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-22 01:00:33 +01:00
Stas Bekman
89a0a9eace
[deepspeed] offload + non-cpuadam optimizer exception doc (#22044)
* [deepspeed] offload + non-cpuadam optimizer exception doc

* deps
2023-03-21 17:00:05 -07:00
Ali Hassani
5990743fdd
Correct NATTEN function signatures and force new version (#22298) 2023-03-21 17:21:34 -04:00
Yanming W
d35f729649
Restore fp16 support on xla gpu device (#22300) 2023-03-21 16:32:43 -04:00
Yih-Dar
67c2dbdb54
Time to Say Goodbye, torch 1.7 and 1.8 (#22291)
* time to say goodbye, torch 1.7 and 1.8

* clean up torch_int_div

* clean up is_torch_less_than_1_8-9

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-21 19:22:01 +01:00
Davide Gazzè
86c7931a70
Add translation perf_infer_gpu_one for it (#22296)
Add translation
2023-03-21 13:07:30 -04:00
Yih-Dar
d0b942d1dc
fix more doctests (#22292)
* fix more doctests

* fix style

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-21 16:16:17 +01:00
Yih-Dar
48327c5718
More doctests (#22268)
* all doctests

* Skip failed tests

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-21 13:27:30 +01:00
Gerald Cuder
5a2b77a6c1
Fix error in mixed precision training of TFCvtModel (#22267)
* Make sure CVT can be trained using mixed precision

* Add test for keras-fit with mixed-precision

* Update tests/models/cvt/test_modeling_tf_cvt.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

---------

Co-authored-by: gcuder <Gerald.Cuder@iacapps.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2023-03-21 12:12:57 +00:00
Andrei Panferov
330d8b991f
replace_8bit_linear modules_to_not_convert default value fix (#22238)
* Fixed modules_to_not_convert default value

* Fixed modules_to_not_convert docstring

* Update src/transformers/utils/bitsandbytes.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/utils/bitsandbytes.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* ["lm_head"] if modules_to_not_convert is None

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-03-21 10:16:07 +00:00
amyeroberts
c07a02a4b7
Update vision docstring bool masked pos (#22237)
* Add bool_masked_pos to forward docstrings

* Add note about mask ratio - videomae

* Fix up

* Fix indenting
2023-03-20 20:06:16 +00:00
Maria Khalusova
7bd8650512
Example of pad_to_multiple_of for padding and truncation guide & docstring update (#22278)
* added an example of pad_to_multiple_of

* make style

* addressed feedback
2023-03-20 14:18:55 -04:00
Antoni Viros
fb0a38b4f2
Move torch.compile() wrapping after DDP/FSDP wrapping to ensure correct graph breaks during training (#22279) 2023-03-20 13:54:01 -04:00
amyeroberts
8ac29fe090
Fix doc links (#22274) 2023-03-20 17:07:31 +00:00
Sylvain Gugger
da005253b8
Proper map location for optimizer load (#22273)
* Proper map location for optimizer load

* What happened to my code?
2023-03-20 11:30:46 -04:00
Sylvain Gugger
786092a35e
Rework a bit the LLaMA conversion script (#22236)
* Update LLaMA conversion script

* Doc

* Fix the weight size for the 13B checkpoint

* Update src/transformers/models/llama/convert_llama_weights_to_hf.py

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

---------

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2023-03-20 11:30:36 -04:00
Sylvain Gugger
43efd7cb13
Fix balanced and auto device_map (#22271) 2023-03-20 11:24:17 -04:00
yqy2001
89f0fda5d3
Fix the gradient checkpointing bug of the llama model (#22270)
fix grad ckpt bug of llama
2023-03-20 10:26:50 -04:00
heya5
cf0af9a31b
[Trainer] Add optional communication backends for torch.distributed when using GPU (#22247)
Update training_args.py
2023-03-20 09:17:34 -04:00
Nicola Procopio
c4bf6f38bd
Italian translation perf_infer_cpu (#22243)
* added translated files

added perf_train_cpu and perf_train_cpu_many

* updated toctree

* updated toctree

* added file

perf_infer_cpu.medx

* italian translation perf_infer_cpu.mdx
2023-03-20 09:16:07 -04:00
yesinkim
466144d440
[Docs] fix typos in some tokenizer docs (#22256)
[Docs] fix typos

Co-authored-by: yesinkim <yesinkim@yesinkimui-MacBookAir.local>
2023-03-20 12:17:31 +00:00
Pasquale Minervini
a48310de47
Update training_args.py -- a nightly install is not required anymore for torch.compile (#22266)
Update training_args.py

A nightly install is not required anymore for `torch.compile`.
2023-03-20 12:00:05 +00:00
Stas Bekman
60d51ef512
[trainer] param count for deepspeed zero3 (#22193)
[trainer] param count for zero3
2023-03-17 11:02:55 -07:00
Guangyuan Ma
cf601b902f
Fix Unnecessary move of tensors from CPU to GPU in LlamaRotaryEmbedding (#22234)
push
2023-03-17 13:56:32 -04:00
Yih-Dar
bec075612a
Revert "Use dash==2.8.1 for now for daily CI" (#22233)
Revert "Use `dash==2.8.1` for now for daily CI (#22227)"

This reverts commit 53218671d9.
2023-03-17 16:54:27 +01:00
Ali Hassani
3028b20a71
Fix natten (#22229)
* Add kernel size to NATTEN's QK arguments.

The new NATTEN 0.14.5 supports PyTorch 2.0, but also adds an additional
argument to the QK operation to allow optional RPBs.

This ends up failing NATTEN tests.

This commit adds NATTEN back to circleci and adds the arguments to get
it working again.

* Force NATTEN >= 0.14.5
2023-03-17 11:07:55 -04:00
Seb0
074490b2c2
fix(docs): fix task guide links in model docs (#22226)
fix(docs): task guide links in model docs
2023-03-17 14:30:17 +00:00
Maria Khalusova
314cdf7c25
Removed .mdx extension in two links (#22230)
removed .mdx extension
2023-03-17 10:27:12 -04:00
lewtun
f251441387
Add LlamaForSequenceClassification (#22209)
* Add LlamaForSequenceClassification

* Update src/transformers/models/llama/modeling_llama.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_llama.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Add docstring

* Add test

* Add input embedding getter and setter

* Remove dead code

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-03-17 14:39:26 +01:00
Wang, Yi
675d2a5a00
fix AutoTP in deepspeed could not work for bloom (#22196)
* fix AutoTP in deepspeed could not work for bloom

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* add a method in BloomModel to build ailib

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

---------

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2023-03-17 09:28:17 -04:00
Sylvain Gugger
00934026a4
LLaMA house-keeping (#22216)
* LLaMA house-keeping

* Doc links
2023-03-17 08:55:15 -04:00
Maria Khalusova
42f8f76402
Depth estimation task guide (#22205)
* added doc to toc, auto tip with  supported models, mention of task guide in model docs

* make style

* removed "see also"

* minor fix
2023-03-17 08:36:23 -04:00
Yih-Dar
53218671d9
Use dash==2.8.1 for now for daily CI (#22227)
Use dash 2.8.1 for now

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-17 13:27:14 +01:00