Commit Graph

13418 Commits

Author SHA1 Message Date
Samin Yasar
4f08887053
Add Multimodal heading and Document question answering in task_summary.mdx (#23318)
* add multimodal heading and docqa

* fix sentence

* task_summary data type = modality clarification

* change the multimodal example to a smaller model
2023-07-17 13:51:19 +01:00
dependabot[bot]
38dfb86958
Bump cryptography from 41.0.0 to 41.0.2 in /examples/research_projects/decision_transformer (#24833)
Bump cryptography in /examples/research_projects/decision_transformer

Bumps [cryptography](https://github.com/pyca/cryptography) from 41.0.0 to 41.0.2.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/cryptography/compare/41.0.0...41.0.2)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-17 07:17:17 -04:00
namespace-Pt
18d42bfd23
Remove unused code in GPT-Neo (#24826)
1
2023-07-17 07:07:47 -04:00
Sohyun Sim
9771ad33be
🌐 [i18n-KO] Translated custom_tools.mdx to Korean (#24580)
* docs: ko: custom_tools.mdx

* feat: deepl draft

* fix: change .mdx to .md

* fix: resolve suggestions

* fix: resolve suggestions
2023-07-17 07:04:10 -04:00
statelesshz
8ba26c18cf
deprecate sharded_ddp training argument (#24825)
* deprecate fairscale's ShardedDDP

* fix code style

* roll back

* deprecate the `sharded_ddp` training argument

---------

Co-authored-by: jihuazhong <jihuazhong1@huawei.com>
2023-07-17 06:57:42 -04:00
Kadir Nar
5bb4430edc
[🔗 Docs] Fixed Incorrect Migration Link (#24793)
* [🔗 Docs] Fixed Incorrect Migration Link

* Update README.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-07-14 17:47:50 -04:00
Sylvain Gugger
1023705440
Check models used for common tests are small (#24824)
* First models

* Conditional DETR

* Treat DETR models, skip others

* Skip LayoutLMv2 as well

* Fix last tests
2023-07-14 14:43:19 -04:00
Dario Sučić
a865b62e07
set correct model input names for gptsw3tokenizer (#24788)
2023-07-14 18:13:45 +01:00
Nicolas Patry
50726f9ea7
Fixing double use_auth_token.pop (preventing private models from being visible). (#24812)
Fixing double `use_auth_token.pop` (preventing private models from
being visible).

Should fix: https://github.com/huggingface/transformers/issues/14334#issuecomment-1634527833

Repro: Have a private repo, with `vocab.json` (spread out files for the
tokenizer) and use `AutoTokenizer.from_pretrained(...,
use_auth_token="token")`.
2023-07-14 15:20:02 +02:00
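The failure mode named in the commit above (a second `use_auth_token.pop`) can be illustrated without the library itself. This is a minimal sketch, not the transformers code: `load_buggy` and `load_fixed` are hypothetical names, and the only assumption is that the token travels through a `**kwargs` dict.

```python
def load_buggy(**kwargs):
    # First pop consumes the token from kwargs.
    token = kwargs.pop("use_auth_token", None)
    # A second pop further down the call path finds nothing and
    # silently overwrites the token with the default, None.
    token = kwargs.pop("use_auth_token", None)
    return token


def load_fixed(**kwargs):
    # Pop exactly once and pass the value along explicitly.
    token = kwargs.pop("use_auth_token", None)
    return token


print(load_buggy(use_auth_token="token"))  # None
print(load_fixed(use_auth_token="token"))  # token
```

With the token lost, a request for a file in a private repo would go out unauthenticated, which matches the symptom described in the commit message.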
Sylvain Gugger
91d7df58b6
Copy code when using local trust remote code (#24785)
* Copy code when using local trust remote code

* Remote upgrade strategy

* Revert "Remote upgrade strategy"

This reverts commit 4f0392f5d7.
2023-07-13 16:57:20 -04:00
Sylvain Gugger
f32303d519
Run hub tests (#24807)
* Run hub tests

* [all-test] Run tests please!

* [all-test] Add vision dep for hub tests

* Fix tests
2023-07-13 15:25:45 -04:00
Fady Nakhla
9d7a0871e2
Use _BaseAutoModelClass's register method (#24810)
Switching _BaseAutoModelClass from_pretrained and from_config to use the register classmethod that it defines rather than using the _LazyAutoMapping register method directly. This makes use of the additional consistency check within the base model's register.
2023-07-13 15:24:51 -04:00
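The idea in the commit above, routing registration through a classmethod that validates before delegating to the underlying mapping, might look roughly like the sketch below. All class names here (`LazyMapping`, `BaseAutoClass`, `MyConfig`, `OtherConfig`, `MyModel`) are hypothetical stand-ins, and the specific check shown (the model's declared `config_class` must match the key) is an assumption about what a "consistency check" could be.

```python
class LazyMapping:
    """Hypothetical stand-in for a config-class -> model-class mapping."""

    def __init__(self):
        self._extra = {}

    def register(self, key, value):
        # Low-level registration: only rejects duplicate keys.
        if key in self._extra:
            raise ValueError(f"{key.__name__} is already registered")
        self._extra[key] = value


class BaseAutoClass:
    """Hypothetical auto class that owns a mapping."""

    _mapping = LazyMapping()

    @classmethod
    def register(cls, config_class, model_class):
        # Extra consistency check that the low-level mapping does not do.
        declared = getattr(model_class, "config_class", None)
        if declared is not config_class:
            raise ValueError(
                f"{model_class.__name__} declares config_class={declared!r}, "
                f"which does not match {config_class.__name__}"
            )
        cls._mapping.register(config_class, model_class)


class MyConfig:
    pass


class OtherConfig:
    pass


class MyModel:
    config_class = MyConfig


BaseAutoClass.register(MyConfig, MyModel)  # passes the check
try:
    BaseAutoClass.register(OtherConfig, MyModel)  # mismatched pair
except ValueError as err:
    print(err)
```

Calling `LazyMapping.register` directly would have accepted the mismatched pair, which is the gap the classmethod closes in this sketch.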
Georgie Mathews
0866705022
Update setup.py to be compatible with pipenv (#24789)
2023-07-13 12:56:43 -04:00
Matt
c0ca73dc98
Remove Falcon docs for the release until TGI is ready (#24808)
* Remove Falcon docs for the release until TGI is ready

* Update toctree
2023-07-13 17:27:58 +01:00
dymil
f9a711df4a
Fix typo 'submosules' (#24809)
2023-07-13 16:56:53 +01:00
amyeroberts
eebce4470c
Add accelerate version in transformers-cli env (#24806)
* Add accelerate version in transformers-cli env

* Add accelerate config
2023-07-13 16:50:19 +01:00
Joao Gante
34d9409427
Llama/GPTNeoX: add RoPE scaling (#24653)
* add rope_scaling

* tmp commit

* add gptneox

* add tests

* GPTNeoX can now handle long inputs, so the pipeline test was wrong

* Update src/transformers/models/open_llama/configuration_open_llama.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* remove ntk

* remove redundant validation

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-07-13 16:47:30 +01:00
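The commit above only says that `rope_scaling` was added and that an "ntk" variant was removed. As a rough illustration of what scaling rotary position embeddings can mean, the sketch below shows linear scaling (position interpolation), where positions are divided by a factor before the rotation angles are computed. The function name `rope_angles`, its parameters, and the choice of linear scaling are assumptions for illustration, not the code from the PR.

```python
def rope_angles(position, dim, base=10000.0, scaling_factor=1.0):
    """Rotation angles for one position (hypothetical helper).

    With scaling_factor > 1, positions are compressed so that a longer
    sequence maps onto the position range seen during training.
    """
    scaled_position = position / scaling_factor
    # One angle per pair of hidden dimensions.
    return [scaled_position / base ** (2 * i / dim) for i in range(dim // 2)]


# With a factor of 2, position 8 gets the same angles as unscaled position 4.
print(rope_angles(8, dim=4, scaling_factor=2.0) == rope_angles(4, dim=4))  # True
```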
Sylvain Gugger
9342c8fb82
Deprecate models (#24787)
* Deprecate some models

* Fix imports

* Fix inits too

* Remove tests

* Add deprecated banner to documentation

* Remove from init

* Fix auto classes

* Style

* Remote upgrade strategy 1

* Remove site package cache

* Revert this part

* Fix typo...

* Update utils

* Update docs/source/en/model_doc/bort.md

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

* Address review comments

* With all files saved

---------

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2023-07-13 11:46:54 -04:00
Yih-Dar
717dadc6f3
Skip torchscript tests for MusicgenForConditionalGeneration (#24782)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-07-13 15:54:18 +02:00
amyeroberts
e367a9770f
Fix MobileVitV2 doctest checkpoint (#24805)
* Fix doctest checkpoint

* Add import torch for mobilevit
2023-07-13 14:47:59 +01:00
Yih-Dar
e538189931
Upgrade jax/jaxlib/flax pin versions (#24791)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-07-13 13:57:30 +02:00
Bram Vanroy
6ba4d5de3a
[DOC] Clarify relationship between load_best_model_at_end and save_total_limit (#24614)
* Update training_args.py

Clarify the relationship between `load_best_model_at_end` and `save_total_limit`.

* fix: faulty quotes

* make quality

* Update src/transformers/training_args.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* DOCS: add explicit `True`

* DOCS: make style/quality

---------

Co-authored-by: Bram Vanroy <Bram.Vanroy@UGent.be>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-07-13 07:36:16 -04:00
SeongBeomLEE
21946a8cf4
[fix] Change the condition of ValueError in "convert_checkpoint_from_transformers_to_megatron" (#24769)
* fix: half inference error

norm_factor is still torch.float32 after using model.half

So I changed it to register_buffer so I can change it to torch.float16 after using model.half

* fix: Added a variable "persistent=False"

* run make style

* [fix] Change the condition of ValueError
convert_checkpoint_from_transformers_to_megatron

* [fix] error wording
layers -> attention heads
2023-07-13 11:57:56 +01:00
Liyang90
1f6f32c243
Removing unnecessary device=device in modeling_llama.py (#24696)
* Update modeling_llama.py

Removing unnecessary `device=device`

* fix in all occurrences of _make_causal_mask
2023-07-13 10:30:22 +01:00
Yih-Dar
906afa1d5c
Revert "Unpin protobuf in docker file (for daily CI)" (#24800)
Revert "Unpin protobuf in docker file (for daily CI) (#24761)"

This reverts commit 45025d92f8.
2023-07-13 04:19:45 +02:00
Zach Mueller
f1732e1374
Rm duplicate pad_across_processes (#24780)
Rm duplicate
2023-07-12 11:47:21 -04:00
Lysandre Debut
cfc8a05305
Remove WWT from README (#24672)
2023-07-12 10:58:08 -04:00
Pedro Cuenca
395e566a42
gpt-bigcode: avoid zero_ to support Core ML (#24755)
gpt-bigcode: avoid `zeros_` to support Core ML.

In-place `zeros_` is not supported by the Core ML conversion process.
This PR replaces it with `zeros_like` so conversion can proceed.

The change only affects a workaround for a PyTorch bug on the `cpu`
device.
2023-07-12 16:38:25 +02:00
Zach Mueller
0284285501
Fix pad across processes dim in trainer and not being able to set the timeout (#24775)
* dim, and rm copy

* Don't rm copy for now

* Oops

* pad index

* Should be a working test

* Tickle down ddp timeout

* Put fix back in now that testing locally is done

* Better comment specifying timeout

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-07-12 10:01:51 -04:00
Yih-Dar
4f85aaa6c9
Update default values of bos/eos token ids in CLIPTextConfig (#24773)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-07-12 13:50:26 +02:00
Bauke Brenninkmeijer
fc9e387dc0
Replacement of 20 asserts with exceptions (#24757)
* initial replacements of asserts with errors/exceptions

* replace assert with exception in generation, align and bart

* reset formatting change

* reset another formatting issue

* Apply suggestion

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* don't touch this file

* change to 'is not False'

* fix type

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-07-12 07:45:09 -04:00
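For readers unfamiliar with why asserts get replaced, here is a minimal before/after sketch. The function names and the check itself are hypothetical and not taken from the PR; the point is only that `assert` statements are stripped when Python runs with `-O` and raise a bare `AssertionError`, while an explicit exception always fires and carries a specific type.

```python
def total_logits_assert(batch_size, num_labels):
    # Disappears under `python -O`, so invalid input would pass silently.
    assert num_labels > 0, "num_labels must be positive"
    return batch_size * num_labels


def total_logits_raise(batch_size, num_labels):
    # Always checked, and callers can catch ValueError specifically.
    if num_labels <= 0:
        raise ValueError(f"num_labels must be positive, got {num_labels}")
    return batch_size * num_labels


try:
    total_logits_raise(4, 0)
except ValueError as err:
    print(err)  # num_labels must be positive, got 0
```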
Joao Gante
430a04a75a
Docs: Update logit processors __call__ docs (#24729)
* tmp commit

* __call__ docs

* kwargs documented; shorter input_ids doc

* nit

* Update src/transformers/generation/logits_process.py
2023-07-12 12:21:30 +01:00
amyeroberts
6e2f069650
Add MobileVitV2 to doctests (#24771)
* Add to doctests

* Alphabetical order
2023-07-12 12:06:17 +01:00
Zach Mueller
7edc33ac7a
Fix eval_accumulation_steps leading to incorrect metrics (#24756)
Fix eval steps
2023-07-12 05:49:12 -04:00
Yih-Dar
45025d92f8
Unpin protobuf in docker file (for daily CI) (#24761)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-07-11 23:55:55 +02:00
Sylvain Gugger
6aadb8d016
Allow existing configs to be registered (#24760)
2023-07-11 16:52:34 -04:00
Gaurav Kumbhat
4c0e251dc7
🐛 Handle empty gen_kwargs for seq2seq trainer prediction_step function (#24759)
* 🐛 Handle empty gen_kwargs for seq2seq trainer prediction_step fn

Signed-off-by: gkumbhat <kumbhat.gaurav@gmail.com>

* Update src/transformers/trainer_seq2seq.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

---------

Signed-off-by: gkumbhat <kumbhat.gaurav@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-07-11 16:48:06 -04:00
Zach Mueller
253d43d46d
Fix lr scheduler not being reset on reruns (#24758)
* Try this

* Solved!

* Rm extraneous

* Rm extraneous

* self

* Args'

* Check for if we created the lr scheduler

* Move comment

* Clean
2023-07-11 16:37:04 -04:00
Yih-Dar
1be0145d6a
Skip some slow tests for doctesting in PRs (Circle)CI (#24753)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-07-11 22:08:14 +02:00
NielsRogge
bb13a92859
[InstructBLIP] Fix bos token of LLaMa checkpoints (#24492)
* Add fix

* Fix doctest
2023-07-11 20:43:01 +01:00
janEbert
aac4c79968
Fix non-deterministic Megatron-LM checkpoint name (#24674)
Fix non-deterministic checkpoint name

`os.listdir`'s order is not deterministic, which is a problem when
querying the first listed file as in the code (`os.listdir(...)[0]`).

This can return a checkpoint name such as `distrib_optim.pt`, which does
not include desired information such as the saved arguments originally
given to Megatron-LM.
2023-07-11 19:55:04 +01:00
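The problem described in the commit above, `os.listdir(...)[0]` returning whichever file the filesystem happens to list first, can be avoided by sorting and filtering before picking. The sketch below is one way to do that and is not necessarily the fix the commit applied; `pick_checkpoint`, its `skip` parameter, and the file name `model_optim_rng.pt` are illustrative assumptions.

```python
import os
import tempfile


def pick_checkpoint(directory, skip=("distrib_optim.pt",)):
    """Return a checkpoint file name independent of listing order."""
    # Sort for determinism and drop names known to lack the saved arguments.
    candidates = sorted(name for name in os.listdir(directory) if name not in skip)
    if not candidates:
        raise FileNotFoundError(f"no usable checkpoint file in {directory}")
    return candidates[0]


with tempfile.TemporaryDirectory() as tmp:
    for name in ("distrib_optim.pt", "model_optim_rng.pt"):
        open(os.path.join(tmp, name), "w").close()
    chosen = pick_checkpoint(tmp)

print(chosen)  # model_optim_rng.pt
```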
Sylvain Gugger
33aafc26ee
Skip keys not in the state dict when finding mismatched weights (#24749)
2023-07-11 12:40:21 -04:00
Zehan Li
3d8697261e
add gradient checkpointing for distilbert (#24719)
* add gradient checkpointing for distilbert

* reformatted
2023-07-11 11:29:47 -04:00
Joao Gante
2642d8d04b
Docs: add kwargs type to fix formatting (#24733)
2023-07-11 16:21:29 +01:00
Connor Henderson
5739726fcc
fix: Text splitting in the BasicTokenizer (#22280)
* fix: Apostrophe splitting in the BasicTokenizer for CLIPTokenizer

* account for apostrophe at start of new word

* remove _run_split_on_punc, use re.findall instead

* remove debugging, make style and quality

* use pattern and punc splitting, repo-consistency will fail

* remove commented out debugging

* adds bool args to BasicTokenizer, remove pattern

* do_split_on_punc default True

* clean stray comments and line breaks

* rebase, repo-consistency

* update to just do punctuation split

* add unicode normalizing back

* remove redundant line
2023-07-11 11:07:58 -04:00
Justin Martin
2489e380e4
Fix typo in LocalAgent (#24736)
2023-07-11 09:04:50 -04:00
Jegor Kitškerkin
8a5e8a9c2a
Add ViViT (#22518)
* Add model

* Add ability to get classification head weights

* Add docs

* Add imports to __init__.py

* Run style

* Fix imports and add mdx doc

* Run style

* Fix copyright

* Fix config docstring

* Remove imports of ViViTLayer and load_tf_weights_in_vivit

* Remove FeatureExtractor and replace with ImageProcessor everywhere

* Remove ViViTForPreTraining from vivit.mdx

* Change ViViT -> Vivit everywhere

* Add model_doc to _toctree.yml

* Replace tuples with lists in arguments of VivitConfig

* Rename patch_size to tubelet_size in TubeletEmbeddings

* Fix checkpoint names

* Add tests

* Remove unused num_frames

* Fix imports for VivitImageProcessor

* Minor fixes

* Decrease number of frames in VivitModelTester from 32 to 16

* Decrease number of frames in VivitModelTester from 16 to 8

* Add initialization for pos embeddings

* Rename Vivit -> ViViT in some places

* Fix docstring and formatting

* Rename TubeletEmbeddings -> VivitTubeletEmbeddings

* Remove load_tf_weights_in_vivit

* Change checkpoint name

* Remove Vivit _TOKENIZER_FOR_DOC

* Fix

* Fix VivitTubeletEmbeddings and pass config object as parameter

* Use image_size and num_frames instead of video_size

* Change conversion script and fix differences with the orig implementation

* Fix docstrings

* Add attention head pruning

* Run style and fixup

* Fix tests

* Add ViViT to video_classification.mdx

* Save processor in conversion script

* Fix

* Add image processor test

* Run fixup and style

* Run fix-copies

* Update tests/models/vivit/test_modeling_vivit.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/vivit/test_modeling_vivit.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vivit/modeling_vivit.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Use PyAV instead of decord

* Add unittest.skip

* Run style

* Remove unneeded test

* Update docs/source/en/model_doc/vivit.mdx

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vivit/configuration_vivit.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vivit/modeling_vivit.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vivit/image_processing_vivit.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vivit/modeling_vivit.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vivit/modeling_vivit.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vivit/image_processing_vivit.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vivit/modeling_vivit.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Add model

* Add docs

* Run style

* Fix imports and add mdx doc

* Remove FeatureExtractor and replace with ImageProcessor everywhere

* Change ViViT -> Vivit everywhere

* Rename Vivit -> ViViT in some places

* Update src/transformers/models/vivit/image_processing_vivit.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Run make style

* Remove inputs save

* Fix image processor

* Fix

* Run `make style`

* Decrease parameters of VivitModelTester

* Decrease tubelet size

* Rename vivit.mdx

* Update src/transformers/models/vivit/image_processing_vivit.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vivit/image_processing_vivit.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vivit/image_processing_vivit.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Fix default values in image_processing_vivit.py

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-07-11 14:04:04 +01:00
Arthur
b15343de6f
[Patch-t5-tokenizer] Patches the changes on T5 to make sure previous behaviour is still valid for beginning of words (#24622)
* patch `_tokenize` function

* more tests

* properly fix

* fixup

* Update src/transformers/models/t5/tokenization_t5.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix without ifs

* update

* protect import

* add python processing

* is first needed

* add doc and update with legacy

* update

* fix T5 SPM converter

* styling

* fix T5 warning

* add is_seqio_available

* remove is_first

* revert some changes

* more tests and update

* update llama test battery

* fixup

* refactor T5 spm common tests

* draft the llama tests

* update

* update test

* nits

* refine

* name nit

* fix t5 tests

* fix T5

* update

* revert convert slow to fast changes that fail lots of tests

* legacy support

* fixup

* nits is first not defined

* don't use legacy behaviour for switch transformers

* style

* My attempt to check.

* nits

* fixes

* update

* fixup

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* updates

* fixup

* add legacy warning

* fixup

* warning_once nit

* update t5 documentation test

* update llama tok documentation

* add space to warning

* nits

* nit

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* last nits

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2023-07-11 15:02:18 +02:00
Matt
b3ab3fac1d
Falcon port (#24523)
* Initial commit

* Update src/transformers/models/falcon/configuration_falcon.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/falcon/configuration_falcon.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Cleanup config docstring

* Update src/transformers/models/falcon/configuration_falcon.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Convert to relative imports

* Remove torch < 1.8 warning

* Restructure cos_sin header

* qkv -> query, key, value

* Refactor attention calculation

* Add a couple of config variables to account for the different checkpoints

* Successful merging of the code paths!

* Fix misplaced line in the non-parallel attention path

* Update config and tests

* Add a pad_token_id when testing

* Support output_attentions when alibi is None

* make fixup

* Skip KV cache shape test

* No more _keys_to_ignore_on_load_missing

* Simplify self attention a bit

* Simplify self attention a bit

* make fixup

* stash commit

* Some more attention mask updates

* Should pass all tests except assisted generation!

* Add big model generation test

* make fixup

* Add temporary workaround for test

* Test overrides for assisted generation

* Update src/transformers/models/falcon/modeling_falcon.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/falcon/modeling_falcon.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/falcon/modeling_falcon.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update tests/models/falcon/test_modeling_falcon.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Test overrides for assisted generation

* Add generation demo

* Update copyright

* Make the docstring model actually small

* Add module-level docstring

* Remove all assertions

* Add copied from bloom

* Reformat the QKV layer

* Add copied from bloom

* Update src/transformers/models/falcon/modeling_falcon.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Remove unused line and reformat

* No single letter variables

* Cleanup return names

* Add copied from line

* Remove the deprecated arguments blocks

* Change the embeddings test to an alibi on/off test

* Remove position_ids from FalconForQA

* Remove old check for token type IDs

* Fix the alibi path when multi_query is False

* Update src/transformers/models/falcon/modeling_falcon.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/falcon/modeling_falcon.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/falcon/test_modeling_falcon.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update config naming

* Fix typo for new_decoder_architecture

* Add some comments

* Fix docstring

* Fix docstring

* Create range in the right dtype from the start

* Review comment cleanup

* n_head_kv -> num_kv_heads

* self.alibi -> self.use_alibi

* self.num_kv -> self.num_kv_heads

* Reorder config args

* Made alibi arguments Optional

* Add all model docstrings

* Add extra checkpoints

* Add author info for Falcon

* Stop removing token_type_ids because our checkpoints shouldn't return it anymore

* Add one hopeful comment for the future

* Fix typo

* Update tests, fix cache issue for generation

* Use -1e9 instead of -inf to avoid float overflow

* Recompute the rotary embeddings much less often

* Re-enable disabled tests

* One final fix to attention mask calculation, and update tests

* Cleanup targeting falcon-40b equivalency

* Post-rebase docs update

* Update docstrings, especially in the config

* More descriptive variable names, and comments where we can't rename them

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-07-11 13:36:31 +01:00
Marc Sun
35eac0df75
add link to accelerate doc (#24601)
2023-07-10 17:49:30 -04:00