12474 Commits

Author SHA1 Message Date
dependabot[bot]
6fc44656b4
Bump redis from 4.5.3 to 4.5.4 in /examples/research_projects/decision_transformer (#22494)
Bump redis in /examples/research_projects/decision_transformer

Bumps [redis](https://github.com/redis/redis-py) from 4.5.3 to 4.5.4.
- [Release notes](https://github.com/redis/redis-py/releases)
- [Changelog](https://github.com/redis/redis-py/blob/master/CHANGES)
- [Commits](https://github.com/redis/redis-py/compare/v4.5.3...v4.5.4)

---
updated-dependencies:
- dependency-name: redis
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-03-31 10:50:33 -04:00
Nicolas Patry
d143087d18
Making sure we can use safetensors to serialize all the time. (#22437)
* Making sure we can use safetensors to serialize all the time.

* Expanding the tests for increased coverage.

* Update the test.

* Getting current state of affairs.

* Tentative fix.

* Fixing black version.

* Fixing the worst offenders.

* Try to modify fewer files.

* Fixing blip_2 (Weird solution right now).

* Fixing deta.

* Fix blip ?

* Missing extra newline.

* No deta modification.

* Adding some comments.

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Addressing comments.

* Addressing comments.

* creating warn_once.

* Warning_once !

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-03-31 16:07:35 +02:00
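The last two bullets of the commit above mention a "warn once" helper. A minimal sketch of that general pattern follows, assuming a standard-library logger wrapped in a cache. The logger name, the function name `warning_once`, and the example message are illustrative assumptions, not code taken from the PR.

```python
import functools
import logging

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("warn_once_demo")


@functools.lru_cache(maxsize=None)
def warning_once(message: str) -> None:
    """Emit `message` as a warning only the first time it is seen."""
    # lru_cache remembers each distinct message, so a repeated call
    # returns from the cache without reaching logger.warning again.
    logger.warning(message)


for _ in range(3):
    # Hypothetical message; it is logged once, not three times.
    warning_once("example warning emitted inside a loop")
```

The cache key is the message string, so two different messages are each logged once.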
Yih-Dar
516077b3b0
Update Wav2Vec2ProcessorWithLM doc example (#22474)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-31 14:17:40 +02:00
lewtun
da68fd691c
Relax eos_token_id < 0 checks in generate() from ValueError to warning (#22472)
* Relax eos_token_id < 0 checks from ValueError to warning

* Fix style

* Replace warnings with logger

* Use warning vs warn
2023-03-31 09:09:40 +02:00
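The commit above downgrades a hard failure to a logged warning. The sketch below shows that kind of relaxation under stated assumptions: the function name `check_eos_token_id`, the logger name, and the message text are hypothetical and do not reproduce the actual `generate()` code.

```python
import logging

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("generate_demo")


def check_eos_token_id(eos_token_id) -> bool:
    """Return True when no id is negative; otherwise warn and return False."""
    if eos_token_id is None:
        return True
    # Accept either a single id or a list of ids.
    ids = [eos_token_id] if isinstance(eos_token_id, int) else list(eos_token_id)
    if any(token_id < 0 for token_id in ids):
        # A strict version would raise ValueError here. Logging keeps the
        # caller running. Logger.warning is used because Logger.warn is a
        # deprecated alias.
        logger.warning("eos_token_id=%s contains a negative value", eos_token_id)
        return False
    return True
```

Callers that still want strict behaviour can raise on a `False` return value themselves.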
Yih-Dar
0fe6c6bdca
(Re-)Enable Nightly + Past CI (#22393)
* Enable Nightly + Past CI

* put schedule

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-30 21:06:35 +02:00
Manuel de Prada
d5de578c22
Docs fix: Multinomial sampling decoding needs "num_beams=1", since by default it is usually not 1. (#22473)
Fix: Multinomial sampling needs "num_beams=1", since by default is 5.
2023-03-30 11:04:12 -04:00
Joao Gante
165dd6dc91
Llama: support for max_position_embeddings (#22471)
* Llama now supports max_position_embeddings

* Save config; Cosmetic edits
2023-03-30 15:54:01 +01:00
Arthur
349e1242d9
[NLLB-MoE] model_type update for auto mapping (#22470)
edit default model type and testing path set to hf-internal-testing
2023-03-30 15:36:07 +02:00
Roy Hvaara
11426641dc
Guard imports of PreTrainedTokenizerFast on is_tokenizers_available (#22285)
Guard imports that use the tokenizers library
2023-03-30 09:16:03 -04:00
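Guarding an import on the availability of an optional dependency, as the commit above describes, can be sketched with the standard library alone. The helper `is_package_available` and the variable `fast_backend` are hypothetical stand-ins for illustration; they are not the library's own `is_tokenizers_available`.

```python
import importlib.util


def is_package_available(name: str) -> bool:
    """Report whether a top-level package can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


# Import the optional dependency only when it is installed, so that
# importing this module does not fail on machines without it.
if is_package_available("tokenizers"):
    import tokenizers

    fast_backend = tokenizers
else:
    fast_backend = None
```

Code that needs the optional package can then test `fast_backend is None` and raise a clear error at call time instead of at import time.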
amyeroberts
4d7a5b5ba3
🚨🚨🚨 Fix ordering of height, width for BLIP image processor (#22466)
Fix ordering of height,width for BLIP
2023-03-30 14:02:16 +01:00
Joao Gante
228792a9dc
Generate: basic token streaming (#22449)
* haha tokens go brrrr
2023-03-30 12:00:12 +01:00
amyeroberts
f0aeb1be17
Skip flaky NLLB Moe test for now (#22463)
Skip flaky test for now
2023-03-30 11:30:19 +01:00
amyeroberts
154c6bb7ac
Rescale image back if it was scaled during PIL conversion (#22458)
* Rescale image back if it was scaled during PIL conversion

* do_rescale is defined if PIL image passed in
2023-03-30 11:29:11 +01:00
amyeroberts
c15f937581
Move common properties to BackboneMixin (#21855)
* Move common properties to BackboneMixin

* Fix failing tests

* Update ConvNextV2 backbone
2023-03-30 10:04:11 +01:00
Stefan Heng
cd73b9a8c1
Update: ignore padding support for TransfoXL training when n_clusters==0 (#22457)
* Update: ignore padding support for TransfoXL training when n_clusters==0

* Update: transformer XL always pad

* Update: drop doc
2023-03-29 14:36:39 -04:00
Sylvain Gugger
2194943a34
Pin ruff (#22455) 2023-03-29 14:07:06 -04:00
Sylvain Gugger
4c295a265b
Update release instructions (#22454) 2023-03-29 14:05:42 -04:00
Yih-Dar
97440e9c75
Avoid using personal HF token in CI (#22453)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-29 19:45:06 +02:00
Sabine
173193ccd0
Update Neptune docs (#22452) 2023-03-29 13:15:38 -04:00
jeffhataws
5e89a435c8
Revert "Fix --bf16 option support for Neuron after PR #22300" (#22451)
This reverts commit fd81746dbe.
2023-03-29 12:59:13 -04:00
Younes Belkada
b844f8a9ab
[Pix2Struct] Fix slow test (#22448)
fix slow test
2023-03-29 17:40:45 +02:00
Sylvain Gugger
55dae94c0c
Revert "Error (also in original) model, scaling only q matrix not qk.T dot product (qk.T/sqrt(dim_per_head))" (#22444)
Revert "Error (also in original) model, scaling only q matrix not qk.T dot product (qk.T/sqrt(dim_per_head)) (#21627)"

This reverts commit bad8300837.
2023-03-29 10:59:42 -04:00
Yih-Dar
8894b81742
Use real tokenizers if tiny version(s) creation has issue(s) (#22428)
Fix some tiny model creation issues

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-29 16:16:23 +02:00
Sylvain Gugger
9b494a1537
Don't hard error when cache version can't be converted to int (#22427) 2023-03-29 09:46:30 -04:00
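A tolerant integer parse of the kind the commit title above suggests might look like the sketch below. The function name `parse_cache_version` and the fallback value of 0 are assumptions made for illustration, not the library's actual behaviour.

```python
def parse_cache_version(raw: str, default: int = 0) -> int:
    """Parse a version string as an int, falling back to `default` on bad input."""
    try:
        return int(raw.strip())
    except ValueError:
        # Unreadable content (empty or non-numeric) is treated as
        # "unknown version" rather than aborting the whole program.
        return default
```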
Younes Belkada
8252e24a77
[Generate] Add conditional generation for multimodal models (#22424)
* add conditional generation

* add comments
2023-03-29 15:35:30 +02:00
Younes Belkada
33f4cb1093
[bnb] fix bnb failing test (#22439)
* fix bnb failing test

* fix

* fix

* fixup
2023-03-29 15:13:00 +02:00
Nolwenn Bernard
fab1de72f1
Hyperparameter search reporting to W&B (#22440)
Fixes #22429
2023-03-29 09:09:57 -04:00
Arthur
8d9c3836be
Add clean_up_tokenization_spaces to config (#22341)
* add draft changes

* fix failing wav2vec

* style

* make sure that the argument is saved + add tests

* style

* fixup

* update test

* default clean_up_tokenization_spaces to False for Bloom and Llama

* Update code based on review

Co-authored-by: Nicolas Patry <patry.nicolas@gmail.com>

* style

* quality

---------

Co-authored-by: Nicolas Patry <patry.nicolas@gmail.com>
2023-03-29 13:21:07 +02:00
Joao Gante
b29fd6971d
MBart: Fix docs and doctests (#22422)
Fix docs and doctests
2023-03-28 15:42:02 +01:00
Jeff Rasley
ae5fc2db87
[performance] ensure causal_mask is created directly on device (#22378)
* ensure causal_mask is created directly on device

* add copy tag to opt, update bart implementation

* add device to all _make_causal_mask copies

* formatting fixes

* more manual fixes due to unlinked versions of _prepare_decoder_attention_mask
2023-03-28 09:17:03 -04:00
fpgaminer
ed57c979b9
Fix bug in perplexity guide calculations and update perplexity numbers. Fixes #22348 (#22411)
Fix bug in perplexity guide calculations and update perplexity numbers.
2023-03-28 09:09:17 -04:00
dependabot[bot]
32ff06403d
Bump redis from 4.1.4 to 4.5.3 in /examples/research_projects/decision_transformer (#22410)
Bump redis in /examples/research_projects/decision_transformer

Bumps [redis](https://github.com/redis/redis-py) from 4.1.4 to 4.5.3.
- [Release notes](https://github.com/redis/redis-py/releases)
- [Changelog](https://github.com/redis/redis-py/blob/master/CHANGES)
- [Commits](https://github.com/redis/redis-py/compare/v4.1.4...v4.5.3)

---
updated-dependencies:
- dependency-name: redis
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-03-27 20:23:55 -04:00
Kshiteej K
3ec7a47664
[neptune] fix checkpoint bug with relative out_dir (#22102)
* [neptune] fix checkpoint bug with relative out_dir

* update imports

* reformat with black

* check neptune without imports

* fix typing-related issue

* run black on code

* use os.path.sep instead of raw \

* simplify imports and remove type annotation

* make ruff happy

* apply review suggestions

---------

Co-authored-by: Aleksander Wojnarowicz <alwojnarowicz@gmail.com>
2023-03-27 15:00:16 -04:00
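One bullet in the commit above replaces a hard-coded backslash with `os.path.sep`. A small sketch of why that matters for relative output directories follows; the helper name `split_relative_path` and the example directory names are hypothetical.

```python
import os


def split_relative_path(path: str) -> list:
    """Split a path into its components using the platform's separator."""
    # normpath collapses "./" prefixes and, on Windows, converts "/" to "\\".
    normalized = os.path.normpath(path)
    # Splitting on os.path.sep works on every platform, whereas a raw "\\"
    # would only split correctly on Windows.
    return [part for part in normalized.split(os.path.sep) if part not in ("", ".")]


parts = split_relative_path(os.path.join(".", "out", "checkpoint-500"))
```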
Arthur
19ade2426a
[WIP]NLLB-MoE Adds the moe model (#22024)
* Initial commit

* update modeling code

* update doc

* add functions necessary

* fix imports

* revert changes

* fixup

* more styling to get going

* remove standalone encoder

* update code

* styling

* fix config and model

* update code and some refactoring

* make more tests pass

* Adding NLLB-200 - MoE - 54.5B for no language left behind
Fixes #21300

* fix more common tests

* style

* update testing file

* update

* update

* Router2 doc

* update check config with sparse layer

* add dummy router

* update current conversion script

* create on the fly conversion script

* Fixup

* style

* style 2

* fix empty return

* fix return

* Update default config sparse layers

* easier to create sparse layers

* update

* update conversion script

* update modeling

* add to toctree

* styling

* make ruff happy

* update docstring

* update conversion script

* update, will break tests but implementing top2

* update

* local groups are supported here

* ⚠️ Support for local groups is now removed ⚠️

This is because it has to work with model parallelism that we do not support

* finish simplification

* Fix forward

* style

* fixup

* Update modelling and test, refactoring

* update tests

* remove final layer_norm as it is done in the FF

* routing works! Logits test added

* nit in test

* remove top1router

* style

* make sure sparse are tested. Had to change route_tokens a little bit

* add support for unslip models when converting

* fixup

* style

* update tests

* update test

* REFACTOR

* encoder outputs match!

* style

* update testing

* 🎉encoder and decoder logits match 🎉

* styling

* update tests

* cleanup tests

* fix router test and CIs

* cleanup

* cleanup test styling

* fix tests

* Finally the generation tests match!

* cleanup

* update test

* style testing file

* remove script

* cleanup

* more cleanup

* nits

* update

* NLLB tokenizer is wrong and will be fixed soon

* use LongTensors

* update tests

* revert some small changes

* fix second expert sampling and batch prioritized routing

* update tests

* finish last tests

* make ruff happy

* update

* ruff again

* style

* Update docs/source/en/model_doc/nllb-moe.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Updates based on review

* style and fix import issue

* nit

* more nits

* cleanup

* styling

* update test_seconde_expert_policy

* fix name

* last nit on the markdown examples

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-03-27 19:42:00 +02:00
Sylvain Gugger
057e1d7473
Fix quality 2023-03-27 13:17:14 -04:00
Donny Greenberg
f02e3a2b18
Hardware Auto-Setup for Examples (#22319)
* Add initial remote hardware auto-setup docs

* Fix a few typos and clarify some language

* Add missing dependency

* Update self-hosted launch script with Sylvain's comments.

* Formatting.

* Trigger CI

* Style
2023-03-27 13:07:53 -04:00
Joao Gante
738944c9ee
Trainer: missing None check (#22404)
missing None check
2023-03-27 18:04:28 +01:00
Joao Gante
53155b520d
Trainer: move Seq2SeqTrainer imports under the typing guard (#22401) 2023-03-27 16:39:26 +01:00
NielsRogge
0e708178ed
[Pix2Struct] Add support to resize embeddings (#22394)
* First draft

* Fix integration test

* Remove script

* Fix test and typos

* Fix one more test

* Skip tied embeddings test

* Remove line

* Address comments
2023-03-27 11:38:07 -04:00
Sylvain Gugger
f6b80a0139
Transformers env safetensors (#22400)
* Report safetensors version in transformers-cli env

* Styling

* Trigger CI maybe
2023-03-27 11:12:42 -04:00
Younes Belkada
d324b70f00
[bnb] Force requires_grad to be False (#22396)
force requires_grad to be `False`
2023-03-27 16:55:55 +02:00
Joao Gante
7dcd8703ef
Generate: support for left-padding on GPTNeoX and Llama (#22382) 2023-03-27 15:48:23 +01:00
Nathan Fradet
5506d04969
Seq2seq trainer generation config arg (#22323)
* seq2seq trainer and training arguments accepting GenerationConfig arg

* seq2seq Trainer and training arguments docstring fixes

* Update training_args_seq2seq.py docstring

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Fixing trainer_seq2seq.py docstring

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* seq2seq trainer: legacy gen args back & GenerationConfig created at init

* Seq2seq trainer: fix in case gen_config.max_new_tokens is None

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* seq2seq trainer: adding legacy arg retrocompatibility

* seq2seq trainer: evaluate and predict untouched

* Apply suggestions from code review

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* seq2seq trainer: adding init args, keeping IDEs hints

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-03-27 15:47:35 +01:00
Vladislav Sokolovskii
03966cacf9
Wav2Vec2ProcessorWithLM can return N best hypotheses now (#22235)
* Wav2Vec2ProcessorWithLM can return N best hypotheses now

Signed-off-by: Vladislav Sokolovskii <vladislav@parrothq.com>

* Wav2Vec2ProcessorWithLM n_best cannot be None

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Batch decoding can return N best hypotheses now

batch_decode was extended with the same functionality as decode
function, N best hypotheses per sample can be returned

Signed-off-by: Vladislav Sokolovskii <vladislav@parrothq.com>

---------

Signed-off-by: Vladislav Sokolovskii <vladislav@parrothq.com>
Co-authored-by: Vladislav Sokolovskii <vladislav@parrothq.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-03-27 10:37:46 -04:00
кѳѳsнī
66d1eee682
load_in_8bit now respects 'balanced' device maps in multi-gpu environments (#22377)
balanced 8bit memory
2023-03-27 10:34:52 -04:00
Sylvain Gugger
8cfc6678da
Adapt find_tied_parameters to handle breaking change in Accelerate (#22360) 2023-03-27 10:11:14 -04:00
Nicola Procopio
204737fcc5
Translated documentation in italian (#22388)
* updated toctree

* added and translated mdx documents
2023-03-27 09:48:49 -04:00
Charlie-Bell
d5c2c71c0f
Changed world_size() to get_world_size() bugfix (#22381)
Edited one line in src/transformers/generation/utils.py. Changed dist.world_size() to dist.get_world_size() since world_size() doesn't exist in torch.distributed.
2023-03-27 09:24:25 -04:00
Joao Gante
c746eb1603
TensorFlow: additional missing cmake dependencies in CI (#22383)
* missing cmake

* more cmake
2023-03-27 09:20:56 -04:00
Stas Bekman
cae78c46d6
[safetensors] don't use in torch<1.10 (#22370)
* [safetensors] don't use in pt<1.10

* better fix
2023-03-24 16:23:27 -04:00