Commit Graph

15053 Commits

Author SHA1 Message Date
Sylvain Gugger
55dae94c0c
Revert "Error (also in original) model, scaling only q matrix not qk.T dot product (qk.T/sqrt(dim_per_head))" (#22444)
Revert "Error (also in original) model, scaling only q matrix not qk.T dot product (qk.T/sqrt(dim_per_head)) (#21627)"

This reverts commit bad8300837.
2023-03-29 10:59:42 -04:00
Yih-Dar
8894b81742
Use real tokenizers if tiny version(s) creation has issue(s) (#22428)
Fix some tiny model creation issues

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-29 16:16:23 +02:00
Sylvain Gugger
9b494a1537
Don't hard error when cache version can't be converted to int (#22427) 2023-03-29 09:46:30 -04:00
Younes Belkada
8252e24a77
[Generate] Add conditional generation for multimodal models (#22424)
* add conditional generation

* add comments
2023-03-29 15:35:30 +02:00
Younes Belkada
33f4cb1093
[bnb] fix bnb failing test (#22439)
* fix bnb failing test

* fix

* fix

* fixup
2023-03-29 15:13:00 +02:00
Nolwenn Bernard
fab1de72f1
Hyperparameter search reporting to W&B (#22440)
Fixes #22429
2023-03-29 09:09:57 -04:00
Arthur
8d9c3836be
Add clean_up_tokenization_spaces to config (#22341)
* add draft changes

* fix failing wav2vec

* style

* make sure that the argument is saved + add tests

* style

* fixup

* update test

* default clean_up_tokenization_spaces to False for Bloom and Llama

* Update code based on review

Co-authored-by: Nicolas Patry <patry.nicolas@gmail.com>

* style

* quality

---------

Co-authored-by: Nicolas Patry <patry.nicolas@gmail.com>
2023-03-29 13:21:07 +02:00
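The commit above exposes `clean_up_tokenization_spaces` as a saved tokenizer setting. A minimal sketch of the user-facing behaviour, assuming a standard Hub tokenizer (the checkpoint name is only illustrative):

```python
from transformers import AutoTokenizer

# Pass the flag at load time; per this PR it is persisted with the tokenizer config,
# so it survives save_pretrained / from_pretrained round trips.
tokenizer = AutoTokenizer.from_pretrained("gpt2", clean_up_tokenization_spaces=False)

ids = tokenizer("Hello , world !").input_ids
# With cleanup disabled, decode keeps the spaces before the punctuation marks.
print(tokenizer.decode(ids))

tokenizer.save_pretrained("./my-tokenizer")  # the setting travels with the saved tokenizer
```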
Joao Gante
b29fd6971d
MBart: Fix docs and doctests (#22422)
Fix docs and doctests
2023-03-28 15:42:02 +01:00
Jeff Rasley
ae5fc2db87
[performance] ensure causal_mask is created directly on device (#22378)
* ensure causal_mask is created directly on device

* add copy tag to opt, update bart implementation

* add device to all _make_causal_mask copies

* formatting fixes

* more manual fixes due to unlinked versions of _prepare_decoder_attention_mask
2023-03-28 09:17:03 -04:00
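The pattern applied in the commit above, as a hedged sketch modelled on the `_make_causal_mask` helpers in the affected modeling files (exact signatures vary per model):

```python
import torch

def _make_causal_mask(input_ids_shape, dtype, device, past_key_values_length=0):
    # Sketch: build the causal mask directly on the target device, rather than on CPU
    # followed by a .to(device), saving one host-to-device copy per forward pass.
    bsz, tgt_len = input_ids_shape
    mask = torch.full((tgt_len, tgt_len), torch.finfo(dtype).min, device=device)
    mask_cond = torch.arange(mask.size(-1), device=device)
    mask.masked_fill_(mask_cond < (mask_cond + 1).view(mask.size(-1), 1), 0)
    mask = mask.to(dtype)
    if past_key_values_length > 0:
        mask = torch.cat(
            [torch.zeros(tgt_len, past_key_values_length, dtype=dtype, device=device), mask],
            dim=-1,
        )
    return mask[None, None, :, :].expand(bsz, 1, tgt_len, tgt_len + past_key_values_length)
```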
fpgaminer
ed57c979b9
Fix bug in perplexity guide calculations and update perplexity numbers. Fixes #22348 (#22411)
Fix bug in perplexity guide calculations and update perplexity numbers.
2023-03-28 09:09:17 -04:00
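For context on the commit above: perplexity is the exponential of the average negative log-likelihood over the evaluated tokens. A minimal sketch of the fixed-length computation the guide builds on (the checkpoint name is illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # illustrative; the guide uses a GPT-2 variant
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

text = "Perplexity is the exponentiated average negative log-likelihood of a sequence."
encodings = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing labels makes the model return the mean cross-entropy over predicted tokens.
    outputs = model(encodings.input_ids, labels=encodings.input_ids)

print(f"perplexity: {torch.exp(outputs.loss).item():.2f}")
```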
dependabot[bot]
32ff06403d
Bump redis from 4.1.4 to 4.5.3 in /examples/research_projects/decision_transformer (#22410)
Bump redis in /examples/research_projects/decision_transformer

Bumps [redis](https://github.com/redis/redis-py) from 4.1.4 to 4.5.3.
- [Release notes](https://github.com/redis/redis-py/releases)
- [Changelog](https://github.com/redis/redis-py/blob/master/CHANGES)
- [Commits](https://github.com/redis/redis-py/compare/v4.1.4...v4.5.3)

---
updated-dependencies:
- dependency-name: redis
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-03-27 20:23:55 -04:00
Kshiteej K
3ec7a47664
[neptune] fix checkpoint bug with relative out_dir (#22102)
* [neptune] fix checkpoint bug with relative out_dir

* update imports

* reformat with black

* check neptune without imports

* fix typing-related issue

* run black on code

* use os.path.sep instead of raw \

* simplify imports and remove type annotation

* make ruff happy

* apply review suggestions

---------

Co-authored-by: Aleksander Wojnarowicz <alwojnarowicz@gmail.com>
2023-03-27 15:00:16 -04:00
Arthur
19ade2426a
[WIP] NLLB-MoE Adds the MoE model (#22024)
* Initial commit

* update modeling code

* update doc

* add functions necessary

* fix imports

* revert changes

* fixup

* more styling to get going

* remove standalone encoder

* update code

* styling

* fix config and model

* update code and some refactoring

* make more tests pass

* Adding NLLB-200 - MoE - 54.5B for no language left behind
Fixes #21300

* fix more common tests

* style

* update testing file

* update

* update

* Router2 doc

* update check config with sparse layer

* add dummy router

* update current conversion script

* create on the fly conversion script

* Fixup

* style

* style 2

* fix empty return

* fix return

* Update default config sparse layers

* easier to create sparse layers

* update

* update conversion script

* update modeling

* add to toctree

* styling

* make ruff happy

* update docstring

* update conversion script

* update, will break tests but implementing top2

* update

* local groups are supported here

* ⚠️ Support for local groups is now removed ⚠️

This is because it has to work with model parallelism, which we do not support

* finish simplification

* Fix forward

* style

* fixup

* Update modelling and test, refactoring

* update tests

* remove final layer norm as it is done in the FF

* routing works! Logits test added

* nit in test

* remove top1router

* style

* make sure sparse layers are tested. Had to change route_tokens a little bit

* add support for unslip models when converting

* fixup

* style

* update tests

* update test

* REFACTOR

* encoder outputs match!

* style

* update testing

* 🎉encoder and decoder logits match 🎉

* styling

* update tests

* cleanup tests

* fix router test and CIs

* cleanup

* cleanup test styling

* fix tests

* Finally the generation tests match!

* cleanup

* update test

* style testing file

* remove script

* cleanup

* more cleanup

* nits

* update

* NLLB tokenizer is wrong and will be fixed soon

* use LongTensors

* update tests

* revert some small changes

* fix second expert sampling and batch prioritized routing

* update tests

* finish last tests

* make ruff happy

* update

* ruff again

* style

* Update docs/source/en/model_doc/nllb-moe.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Updates based on review

* style and fix import issue

* nit

* more nits

* cleanup

* styling

* update test_seconde_expert_policy

* fix name

* last nit on the markdown examples

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-03-27 19:42:00 +02:00
Sylvain Gugger
057e1d7473
Fix quality 2023-03-27 13:17:14 -04:00
Donny Greenberg
f02e3a2b18
Hardware Auto-Setup for Examples (#22319)
* Add initial remote hardware auto-setup docs

* Fix a few typos and clarify some language

* Add missing dependency

* Update self-hosted launch script with Sylvain's comments.

* Formatting.

* Trigger CI

* Style
2023-03-27 13:07:53 -04:00
Joao Gante
738944c9ee
Trainer: missing None check (#22404)
missing None check
2023-03-27 18:04:28 +01:00
Joao Gante
53155b520d
Trainer: move Seq2SeqTrainer imports under the typing guard (#22401) 2023-03-27 16:39:26 +01:00
NielsRogge
0e708178ed
[Pix2Struct] Add support to resize embeddings (#22394)
* First draft

* Fix integration test

* Remove script

* Fix test and typos

* Fix one more test

* Skip tied embeddings test

* Remove line

* Address comments
2023-03-27 11:38:07 -04:00
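With the commit above, Pix2Struct follows the standard embedding-resizing API. A hedged sketch (the checkpoint name and added token are illustrative):

```python
from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor

checkpoint = "google/pix2struct-base"  # illustrative checkpoint
model = Pix2StructForConditionalGeneration.from_pretrained(checkpoint)
processor = Pix2StructProcessor.from_pretrained(checkpoint)

# Add a new token, then resize the text embeddings to match the enlarged vocabulary.
processor.tokenizer.add_tokens(["<my_new_token>"])
model.resize_token_embeddings(len(processor.tokenizer))
```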
Sylvain Gugger
f6b80a0139
Transformers env safetensors (#22400)
* Report safetensors version in transformers-cli env

* Styling

* Trigger CI maybe
2023-03-27 11:12:42 -04:00
Younes Belkada
d324b70f00
[bnb] Force requires_grad to be False (#22396)
force `requires_grad` to be `False`
2023-03-27 16:55:55 +02:00
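What the commit above enforces, sketched from the user side (illustrative checkpoint; needs a GPU and the bitsandbytes package):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",  # illustrative checkpoint
    load_in_8bit=True,
    device_map="auto",
)

# With this fix the int8 weights come back frozen (requires_grad=False), since they
# cannot be trained directly; fine-tuning on top is done through adapters instead.
print(all(not p.requires_grad for p in model.parameters()))
```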
Joao Gante
7dcd8703ef
Generate: support for left-padding on GPTNeoX and Llama (#22382) 2023-03-27 15:48:23 +01:00
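The commit above makes batched generation with left-padding work for these decoder-only models; a hedged usage sketch (the checkpoint name is illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "EleutherAI/pythia-70m"  # illustrative GPTNeoX checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint, padding_side="left")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(checkpoint)

prompts = ["Hello, my name is", "The capital of France is"]
inputs = tokenizer(prompts, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=10, pad_token_id=tokenizer.pad_token_id)
print(tokenizer.batch_decode(out, skip_special_tokens=True))
```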
Nathan Fradet
5506d04969
Seq2seq trainer generation config arg (#22323)
* seq2seq trainer and training arguments accepting GenerationConfig arg

* seq2seq Trainer and training arguments docstring fixes

* Update training_args_seq2seq.py docstring

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Fixing trainer_seq2seq.py docstring

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* seq2seq trainer: legacy gen args back & GenerationConfig created at init

* Seq2seq trainer: fix in case gen_config.max_new_tokens is None

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* seq2seq trainer: adding legacy arg retrocompatibility

* seq2seq trainer and training arguments accepting GenerationConfig arg

* seq2seq Trainer and training arguments docstring fixes

* Update training_args_seq2seq.py docstring

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Fixing trainer_seq2seq.py docstring

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* seq2seq trainer: legacy gen args back & GenerationConfig created at init

* Seq2seq trainer: fix in case gen_config.max_new_tokens is None

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* seq2seq trainer: adding legacy arg retrocompatibility

* seq2seq trainer: evaluate and predict untouched

* Apply suggestions from code review

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* seq2seq trainer: adding init args, keeping IDEs hints

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-03-27 15:47:35 +01:00
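A hedged sketch of the argument added by the commit above, on the training-arguments side (model and dataset wiring omitted; the argument name follows the PR description):

```python
from transformers import GenerationConfig, Seq2SeqTrainingArguments

gen_config = GenerationConfig(max_new_tokens=64, num_beams=4)

# Per the PR, generation_config may be a GenerationConfig, a path, or a model id.
args = Seq2SeqTrainingArguments(
    output_dir="out",
    predict_with_generate=True,
    generation_config=gen_config,
)
```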
Vladislav Sokolovskii
03966cacf9
Wav2Vec2ProcessorWithLM can return N best hypotheses now (#22235)
* Wav2Vec2ProcessorWithLM can return N best hypotheses now

Signed-off-by: Vladislav Sokolovskii <vladislav@parrothq.com>

* Wav2Vec2ProcessorWithLM n_best cannot be None

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Batch decoding can return N best hypotheses now

batch_decode was extended with the same functionality as decode
function, N best hypotheses per sample can be returned

Signed-off-by: Vladislav Sokolovskii <vladislav@parrothq.com>

---------

Signed-off-by: Vladislav Sokolovskii <vladislav@parrothq.com>
Co-authored-by: Vladislav Sokolovskii <vladislav@parrothq.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-03-27 10:37:46 -04:00
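A hedged sketch of the extended decoding call from the commit above (`n_best` is the parameter named in the commit body; the checkpoint is illustrative and the logits are stand-ins for a Wav2Vec2ForCTC forward pass):

```python
import numpy as np
from transformers import Wav2Vec2ProcessorWithLM

# Requires pyctcdecode and kenlm; the checkpoint must ship an n-gram language model.
processor = Wav2Vec2ProcessorWithLM.from_pretrained("patrickvonplaten/wav2vec2-base-100h-with-lm")

# Stand-in logits with shape (batch, time, vocab); real ones come from the acoustic model.
logits = np.random.randn(2, 200, processor.tokenizer.vocab_size).astype(np.float32)

# n_best > 1 returns the N best hypotheses per sample instead of only the top one,
# for decode and (per this PR) batch_decode alike.
decoded = processor.batch_decode(logits, n_best=3)
print(decoded.text)
```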
кѳѳsнī
66d1eee682
load_in_8bit now respects 'balanced' device maps in multi-gpu environments (#22377)
balanced 8bit memory
2023-03-27 10:34:52 -04:00
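The multi-GPU usage the commit above unlocks, as a hedged sketch (illustrative checkpoint; needs bitsandbytes and at least two GPUs):

```python
from transformers import AutoModelForCausalLM

# Previously an 8-bit load effectively ignored "balanced"-style device maps; with the
# fix the quantized weights are spread evenly across the visible GPUs.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-6.7b",  # illustrative checkpoint
    load_in_8bit=True,
    device_map="balanced",
)
print(set(model.hf_device_map.values()))  # should list more than one device
```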
Sylvain Gugger
8cfc6678da
Adapt find_tied_parameters to handle breaking change in Accelerate (#22360) 2023-03-27 10:11:14 -04:00
Nicola Procopio
204737fcc5
Translated documentation in italian (#22388)
* updated toctree

* added and translated mdx documents
2023-03-27 09:48:49 -04:00
Charlie-Bell
d5c2c71c0f
Changed world_size() to get_world_size() bugfix (#22381)
Edited one line in src/transformers/generation/utils.py: changed dist.world_size() to dist.get_world_size(), since world_size() doesn't exist in torch.distributed.
2023-03-27 09:24:25 -04:00
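For reference, a minimal sketch of the corrected call (an initialized default process group is assumed):

```python
import torch.distributed as dist

# torch.distributed has no world_size(); the accessor is get_world_size(), which
# returns the number of processes in the default group once it is initialized.
world_size = dist.get_world_size() if dist.is_available() and dist.is_initialized() else 1
```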
Joao Gante
c746eb1603
TensorFlow: additional missing cmake dependencies in CI (#22383)
* missing cmake

* more cmake
2023-03-27 09:20:56 -04:00
Stas Bekman
cae78c46d6
[safetensors] don't use in torch<1.10 (#22370)
* [safetensors] don't use in pt<1.10

* better fix
2023-03-24 16:23:27 -04:00
Sylvain Gugger
cfab34e188
Fix TF pipeline job 2023-03-24 16:16:43 -04:00
Stas Bekman
500fce073b
[Trainer] add disclaimer that full_determinism is slow (#22368) 2023-03-24 12:46:41 -07:00
Shubhamai
a0cbbba31f
Resnet flax (#21472)
* [WIP] flax resnet

* added pretrained flax models, results reproducible

* Added pretrained flax models, results reproducible

* working on tests

* no real code change, just some comments

* [flax] adding support for batch norm layers

* fixing bugs related to pt+flax integration

* removing loss from modeling flax output class

* fixing classifier tests

* fixing comments, model output

* cleaning comments

* review changes

* review changes

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* renaming Flax to PyTorch

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-03-24 19:45:57 +00:00
Joao Gante
88dae78f4d
TensorFlow: pin maximum version to 2.12 (#22364) 2023-03-24 18:45:03 +00:00
Samuel Bubán
3a7f5fa9d2
Improve error message (#22361)
* Improve error message

* Fix consistency
2023-03-24 18:09:01 +00:00
Sylvain Gugger
6587125c0a
Pin tensorflow-text to go with tensorflow (#22362)
* Pin tensorflow-text to go with tensorflow

* Make it more convenient to pin TensorFlow

* setup doesn't like f-strings
2023-03-24 10:54:06 -04:00
Yih-Dar
01203475c9
Update docker files to use official torch 2.0.0 (#22357)
* update docker files to use official torch 2.0.0

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-24 14:29:05 +01:00
Mitch Naylor
57f25f4b7f
Add Mega: Moving Average Equipped Gated Attention (#21766)
* add mega file structure and plain pytorch version of mega source code

* added config class with old naming conventions

* filled in mega documentation

* added config class and embeddings with optional token types

* updated notes

* starting the conversion process, deleted intermediate and added use_cache back to config

* renamed config attributes in modeling_mega.py

* checkpointing before refactoring incremental decoding functions

* removed stateful incremental key/values for EMA and self-attention

* refactored MovingAverageGatedAttention to remove stateful k/v history and use unified attention mask

* MovingAverageGatedAttention works with incremental decoding + past values, added sequence length enforcement

* more comments in MovingAverageGatedAttention + checkpointing before GatedCrossAttention

* bug fix in attention mask handling in MovingAverageGatedAttention

* removed incremental state from GatedCrossAttention and removed IncrementalState class

* finished gated cross attention and got MegaLayer working

* fixed causal masking in mega decoder

* fixed how padding and causal masks are passed through MegaLayer with and without k/v caching

* finished MegaModel; tested with encoder, decoder-only, and cross-attention type inputs; started work on downstream classes; removed mentions of position_ids

* added optional dense hidden layer for masked and causal LM classes

* docstring updates in MultiHeadEMA and GatedCrossAttention, removed unnecessary inputs in cross-attention

* removed before_attn_fn in Mega class and updated docstrings and comments up to there

* bug fix in MovingAverageGatedAttention masking

* working conversion of MLM checkpoint in scratchpad script -- perfect matches

* moved arg for hidden dense layer in LM head to config; discovered issue where from_pretrained is renaming gamma and beta parameters

* renamed gamma and beta parameters to avoid HF renaming when loading from checkpoint

* finished checkpoint conversion script

* cleanup old class in mega config script

* removed 'copied from' statements and passing integration tests

* added num_attention_heads=1 to config for integration compatibility, decoder tests working, generation tests failing

* fixed tuple output of megamodel

* all common tests passing after fixing issues in decoder, gradient retention, and initialization

* added mega-specific tests, ready for more documentation and style checks

* updated docstrings; checkpoint before style fixes

* style and quality checks, fixed initialization problem in float_tensor, ready for PR

* added mega to toctree

* removed unnecessary arg in megaconfig

* removed unused arg and fixed code samples with leftover roberta models

* Apply suggestions from code review

Applied all suggestions except the one renaming a class, as I'll need to update that throughout

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fixed issue where .view breaks batch dimension, conversion script fixed with absolute imports, updated readme with Mega->MEGA

* removed asserts in Mega code, renamed sequencenorm, gatedcrossattention, and NFFN, replaced get_activation_fn with ACTFN, and added sequencenorm to layer norms

* reformatted .forward() docstrings to match style and removed unused mask input in cross-attention

* removed all reset_parameters() methods and rolled into MegaPreTrainedModel._init_weights()

* renamed all single-letter variables and improved readability in tensor size comments, Mega->MEGA in 2 documentation files

* variable names in NFFN

* manual Mega->MEGA changes in docs

* Mega->MEGA in config auto

* style and quality fixes

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* renamed parameters and variables with confusing names, added copied from statements, moved fft conv to its own method, other cleanup from PR comments

* commit before dealing with merge conflicts

* made new attention activation functions available in ACT2FN and added generation test from OPT

* style and quality in activations and tests

* documentation fixes, renaming variables in dropout and rotary positions, used built-in causal masking, encoders->layers in MegaModel, moved comments into docstrings

* style and quality fixes after latest updates, before rotary position ids

* causal mask in MegaBlock docstring + added missing device passing

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* added Mega prefixes where missing, reverted MegaSequenceNorm to if-else, other module renaming requested in PR

* style and quality fixes + readme updates pointing to main

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-03-24 08:17:27 -04:00
Joao Gante
0fa46524ac
Generate: Add GPTNeoX integration test (#22346) 2023-03-24 11:33:16 +00:00
Ashwin Mathur
b79607656b
Fix typo in Greedy Search Description (#22345)
Fix typo in greedy search docs
2023-03-24 07:32:18 -04:00
James Reed
c0fa2aa0b8
[HFTracer] Make embeddings ops take on the dtype of the weight (#22347)
* [HFTracer] Make embeddings ops take on the dtype of the weight

* fix bug
2023-03-24 07:04:51 -04:00
Yih-Dar
e8cc02555e
Automatically create/update tiny models (#22275)
* Automatically create or update tiny models

* Skip failed tests

* update workflow file

* use revision

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-23 19:14:17 +01:00
кѳѳsнī
a92e0ad2e2
Enable training Llama with model or pipeline parallelism (#22329)
* Llama - Move target tokens to final pipeline device if needed

* Update src/transformers/models/llama/modeling_llama.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_llama.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-03-23 13:15:51 -04:00
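The fix in the commit above is the usual pattern for naive model parallelism, sketched here in isolation (simplified from a causal-LM loss computation; the function is illustrative, not the actual modeling code):

```python
import torch
from torch.nn import CrossEntropyLoss

def causal_lm_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # With layers spread over several devices, the logits live on the last pipeline
    # stage, so the target tokens must be moved there before the loss is computed.
    shift_logits = logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous().to(shift_logits.device)
    loss_fct = CrossEntropyLoss()
    return loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))
```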
Joao Gante
502fec779b
Generate: add test for left-padding support (#22322) 2023-03-23 17:00:22 +00:00
jeffhataws
ec9b18f62d
Fix --bf16 option support for Neuron after PR #22300 (#22307)
This PR fixes the "RuntimeError: No CUDA GPUs are available"
when running with --bf16 option on Neuron.

Related PRs:
https://github.com/huggingface/transformers/pull/20684
https://github.com/huggingface/transformers/pull/22300
2023-03-23 12:27:13 -04:00
Batese2001
aef488c503
Added type hints to TFDeiTModel (#22327)
* Added type hints to TFDeiTModel

* make style

---------

Co-authored-by: Matt <rocketknight1@gmail.com>
2023-03-23 15:31:32 +00:00
Samuel Larkin
59b9351b78
Minor typo in pipeline FillMaskPipeline's documentation. (#22339) 2023-03-23 11:14:11 -04:00
Sylvain Gugger
506e7c6361
Fix various imports (#22281)
* Fix various imports

* Fix copies

* Fix import
2023-03-23 10:34:17 -04:00
Quentin Lhoest
053c2153f8
Mention why one needs to specify max_steps in Trainer (#22333)
* Mention why one needs to specify max_steps in Trainer

* dummy change to trigger CI
2023-03-23 15:26:51 +01:00
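The situation documented by the commit above, as a hedged sketch: with a streaming/iterable dataset the Trainer cannot derive the number of optimization steps from a number of epochs, so `max_steps` has to be given explicitly (values are illustrative):

```python
from transformers import TrainingArguments

# An IterableDataset has no __len__, so epochs cannot be converted into steps;
# max_steps must therefore be set explicitly.
args = TrainingArguments(
    output_dir="out",
    max_steps=1_000,
    per_device_train_batch_size=8,
)
```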
mollerup23
5a9eb31477
Fixed gradient checkpoint bug for TimeSeriesTransformer (#22272)
* Fixed gradient checkpoint bug for this model

* Updating PR indentation (maintainer feedback)

* make fixup

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
2023-03-23 08:45:13 -04:00
Younes Belkada
ff20f9cf36
[MBart] Add accelerate support for MBart (#22309)
add `accelerate` support for MBart
2023-03-23 10:34:43 +01:00