Commit Graph

15053 Commits

Author SHA1 Message Date
Sylvain Gugger
55dae94c0c
Revert "Error (also in original) model, scaling only q matrix not qk.T dot product (qk.T/sqrt(dim_per_head))" (#22444)
Revert "Error (also in original) model, scaling only q matrix not qk.T dot product (qk.T/sqrt(dim_per_head)) (#21627)"

This reverts commit bad8300837.
2023-03-29 10:59:42 -04:00
Yih-Dar
8894b81742
Use real tokenizers if tiny version(s) creation has issue(s) (#22428)
Fix some tiny model creation issues

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-29 16:16:23 +02:00
Sylvain Gugger
9b494a1537
Don't hard error when cache version can't be converted to int (#22427) 2023-03-29 09:46:30 -04:00
Younes Belkada
8252e24a77
[Generate] Add conditional generation for multimodal models (#22424)
* add conditional generation

* add comments
2023-03-29 15:35:30 +02:00
Younes Belkada
33f4cb1093
[bnb] fix bnb failing test (#22439)
* fix bnb failing test

* fix

* fix

* fixup
2023-03-29 15:13:00 +02:00
Nolwenn Bernard
fab1de72f1
Hyperparameter search reporting to W&B (#22440)
Fixes #22429
2023-03-29 09:09:57 -04:00
Arthur
8d9c3836be
Add clean_up_tokenization_spaces to config (#22341)
* add draft changes

* fix failing wav2vec

* style

* make sure that the argument is saved + add tests

* style

* fixup

* update test

* default clean_up_tokenization_spaces to False for Bloom and Llama

* Update code based on review

Co-authored-by: Nicolas Patry <patry.nicolas@gmail.com>

* style

* quality

---------

Co-authored-by: Nicolas Patry <patry.nicolas@gmail.com>
2023-03-29 13:21:07 +02:00
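The commit above exposes `clean_up_tokenization_spaces` as a saved tokenizer setting. A minimal sketch of the user-facing behaviour, assuming a standard Hub tokenizer (the checkpoint name is only illustrative):

```python
from transformers import AutoTokenizer

# Pass the flag at load time; per this PR it is persisted with the tokenizer config,
# so it survives save_pretrained / from_pretrained round trips.
tokenizer = AutoTokenizer.from_pretrained("gpt2", clean_up_tokenization_spaces=False)

ids = tokenizer("Hello , world !").input_ids
# With cleanup disabled, decode keeps the spaces before the punctuation marks.
print(tokenizer.decode(ids))

tokenizer.save_pretrained("./my-tokenizer")  # the setting travels with the saved tokenizer
```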
Joao Gante
b29fd6971d
MBart: Fix docs and doctests (#22422)
Fix docs and doctests
2023-03-28 15:42:02 +01:00
Jeff Rasley
ae5fc2db87
[performance] ensure causal_mask is created directly on device (#22378)
* ensure causal_mask is created directly on device

* add copy tag to opt, update bart implementation

* add device to all _make_causal_mask copies

* formatting fixes

* more manual fixes due to unlinked versions of _prepare_decoder_attention_mask
2023-03-28 09:17:03 -04:00
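The pattern applied in the commit above, as a hedged sketch modelled on the `_make_causal_mask` helpers in the affected modeling files (exact signatures vary per model):

```python
import torch

def _make_causal_mask(input_ids_shape, dtype, device, past_key_values_length=0):
    # Sketch: build the causal mask directly on the target device, rather than on CPU
    # followed by a .to(device), saving one host-to-device copy per forward pass.
    bsz, tgt_len = input_ids_shape
    mask = torch.full((tgt_len, tgt_len), torch.finfo(dtype).min, device=device)
    mask_cond = torch.arange(mask.size(-1), device=device)
    mask.masked_fill_(mask_cond < (mask_cond + 1).view(mask.size(-1), 1), 0)
    mask = mask.to(dtype)
    if past_key_values_length > 0:
        mask = torch.cat(
            [torch.zeros(tgt_len, past_key_values_length, dtype=dtype, device=device), mask],
            dim=-1,
        )
    return mask[None, None, :, :].expand(bsz, 1, tgt_len, tgt_len + past_key_values_length)
```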
fpgaminer
ed57c979b9
Fix bug in perplexity guide calculations and update perplexity numbers. Fixes #22348 (#22411)
Fix bug in perplexity guide calculations and update perplexity numbers.
2023-03-28 09:09:17 -04:00
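For context on the commit above: perplexity is the exponential of the average negative log-likelihood over the evaluated tokens. A minimal sketch of the fixed-length computation the guide builds on (the checkpoint name is illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # illustrative; the guide uses a GPT-2 variant
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

text = "Perplexity is the exponentiated average negative log-likelihood of a sequence."
encodings = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing labels makes the model return the mean cross-entropy over predicted tokens.
    outputs = model(encodings.input_ids, labels=encodings.input_ids)

print(f"perplexity: {torch.exp(outputs.loss).item():.2f}")
```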
dependabot[bot]
32ff06403d
Bump redis from 4.1.4 to 4.5.3 in /examples/research_projects/decision_transformer (#22410)
Bump redis in /examples/research_projects/decision_transformer

Bumps [redis](https://github.com/redis/redis-py) from 4.1.4 to 4.5.3.
- [Release notes](https://github.com/redis/redis-py/releases)
- [Changelog](https://github.com/redis/redis-py/blob/master/CHANGES)
- [Commits](https://github.com/redis/redis-py/compare/v4.1.4...v4.5.3)

---
updated-dependencies:
- dependency-name: redis
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-03-27 20:23:55 -04:00
Kshiteej K
3ec7a47664
[neptune] fix checkpoint bug with relative out_dir (#22102)
* [neptune] fix checkpoint bug with relative out_dir

* update imports

* reformat with black

* check neptune without imports

* fix typing-related issue

* run black on code

* use os.path.sep instead of raw \

* simplify imports and remove type annotation

* make ruff happy

* apply review suggestions

---------

Co-authored-by: Aleksander Wojnarowicz <alwojnarowicz@gmail.com>
2023-03-27 15:00:16 -04:00
Arthur
19ade2426a
[WIP] NLLB-MoE Adds the MoE model (#22024)
* Initial commit

* update modeling code

* update doc

* add functions necessary

* fix imports

* revert changes

* fixup

* more styling to get going

* remove standalone encoder

* update code

* styling

* fix config and model

* update code and some refactoring

* make more tests pass

* Adding NLLB-200 - MoE - 54.5B for no language left behind
Fixes #21300

* fix more common tests

* style

* update testing file

* update

* update

* Router2 doc

* update check config with sparse layer

* add dummy router

* update current conversion script

* create on the fly conversion script

* Fixup

* style

* style 2

* fix empty return

* fix return

* Update default config sparse layers

* easier to create sparse layers

* update

* update conversion script

* update modeling

* add to toctree

* styling

* make ruff happy

* update docstring

* update conversion script

* update, will break tests but implementing top2

* update

* local groups are supported here

* ⚠️ Support for local groups is now removed ⚠️

This is because it has to work with model parallelism, which we do not support

* finish simplification

* Fix forward

* style

* fixup

* Update modelling and test, refactoring

* update tests

* remove final layer norm as it is done in the FF

* routing works! Logits test added

* nit in test

* remove top1router

* style

* make sure sparse layers are tested. Had to change route_tokens a little bit

* add support for unslip models when converting

* fixup

* style

* update tests

* update test

* REFACTOR

* encoder outputs match!

* style

* update testing

* 🎉encoder and decoder logits match 🎉

* styling

* update tests

* cleanup tests

* fix router test and CIs

* cleanup

* cleanup test styling

* fix tests

* Finally the generation tests match!

* cleanup

* update test

* style testing file

* remove script

* cleanup

* more cleanup

* nits

* update

* NLLB tokenizer is wrong and will be fixed soon

* use LongTensors

* update tests

* revert some small changes

* fix second expert sampling and batch prioritized routing

* update tests

* finish last tests

* make ruff happy

* update

* ruff again

* style

* Update docs/source/en/model_doc/nllb-moe.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Updates based on review

* style and fix import issue

* nit

* more nits

* cleanup

* styling

* update test_seconde_expert_policy

* fix name

* last nit on the markdown examples

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-03-27 19:42:00 +02:00
Sylvain Gugger
057e1d7473
Fix quality 2023-03-27 13:17:14 -04:00
Donny Greenberg
f02e3a2b18
Hardware Auto-Setup for Examples (#22319)
* Add initial remote hardware auto-setup docs

* Fix a few typos and clarify some language

* Add missing dependency

* Update self-hosted launch script with Sylvain's comments.

* Formatting.

* Trigger CI

* Style
2023-03-27 13:07:53 -04:00
Joao Gante
738944c9ee
Trainer: missing None check (#22404)
missing None check
2023-03-27 18:04:28 +01:00
Joao Gante
53155b520d
Trainer: move Seq2SeqTrainer imports under the typing guard (#22401) 2023-03-27 16:39:26 +01:00
NielsRogge
0e708178ed
[Pix2Struct] Add support to resize embeddings (#22394)
* First draft

* Fix integration test

* Remove script

* Fix test and typos

* Fix one more test

* Skip tied embeddings test

* Remove line

* Address comments
2023-03-27 11:38:07 -04:00
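With the commit above, Pix2Struct follows the standard embedding-resizing API. A hedged sketch (the checkpoint name and added token are illustrative):

```python
from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor

checkpoint = "google/pix2struct-base"  # illustrative checkpoint
model = Pix2StructForConditionalGeneration.from_pretrained(checkpoint)
processor = Pix2StructProcessor.from_pretrained(checkpoint)

# Add a new token, then resize the text embeddings to match the enlarged vocabulary.
processor.tokenizer.add_tokens(["<my_new_token>"])
model.resize_token_embeddings(len(processor.tokenizer))
```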
Sylvain Gugger
f6b80a0139
Transformers env safetensors (#22400)
* Report safetensors version in transformers-cli env

* Styling

* Trigger CI maybe
2023-03-27 11:12:42 -04:00
Younes Belkada
d324b70f00
[bnb] Force requires_grad to be False (#22396)
force `requires_grad` to be `False`
2023-03-27 16:55:55 +02:00
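What the commit above enforces, sketched from the user side (illustrative checkpoint; needs a GPU and the bitsandbytes package):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",  # illustrative checkpoint
    load_in_8bit=True,
    device_map="auto",
)

# With this fix the int8 weights come back frozen (requires_grad=False), since they
# cannot be trained directly; fine-tuning on top is done through adapters instead.
print(all(not p.requires_grad for p in model.parameters()))
```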
Joao Gante
7dcd8703ef
Generate: support for left-padding on GPTNeoX and Llama (#22382) 2023-03-27 15:48:23 +01:00
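The commit above makes batched generation with left-padding work for these decoder-only models; a hedged usage sketch (the checkpoint name is illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "EleutherAI/pythia-70m"  # illustrative GPTNeoX checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint, padding_side="left")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(checkpoint)

prompts = ["Hello, my name is", "The capital of France is"]
inputs = tokenizer(prompts, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=10, pad_token_id=tokenizer.pad_token_id)
print(tokenizer.batch_decode(out, skip_special_tokens=True))
```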
Nathan Fradet
5506d04969
Seq2seq trainer generation config arg (#22323)
* seq2seq trainer and training arguments accepting GenerationConfig arg

* seq2seq Trainer and training arguments docstring fixes

* Update training_args_seq2seq.py docstring

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Fixing trainer_seq2seq.py docstring

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* seq2seq trainer: legacy gen args back & GenerationConfig created at init

* Seq2seq trainer: fix in case gen_config.max_new_tokens is None

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* seq2seq trainer: adding legacy arg retrocompatibility

* seq2seq trainer and training arguments accepting GenerationConfig arg

* seq2seq Trainer and training arguments docstring fixes

* Update training_args_seq2seq.py docstring

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Fixing trainer_seq2seq.py docstring

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* seq2seq trainer: legacy gen args back & GenerationConfig created at init

* Seq2seq trainer: fix in case gen_config.max_new_tokens is None

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* seq2seq trainer: adding legacy arg retrocompatibility

* seq2seq trainer: evaluate and predict untouched

* Apply suggestions from code review

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* seq2seq trainer: adding init args, keeping IDEs hints

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-03-27 15:47:35 +01:00
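A hedged sketch of the argument added by the commit above, on the training-arguments side (model and dataset wiring omitted; the argument name follows the PR description):

```python
from transformers import GenerationConfig, Seq2SeqTrainingArguments

gen_config = GenerationConfig(max_new_tokens=64, num_beams=4)

# Per the PR, generation_config may be a GenerationConfig, a path, or a model id.
args = Seq2SeqTrainingArguments(
    output_dir="out",
    predict_with_generate=True,
    generation_config=gen_config,
)
```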
Vladislav Sokolovskii
03966cacf9
Wav2Vec2ProcessorWithLM can return N best hypotheses now (#22235)
* Wav2Vec2ProcessorWithLM can return N best hypotheses now

Signed-off-by: Vladislav Sokolovskii <vladislav@parrothq.com>

* Wav2Vec2ProcessorWithLM n_best cannot be None

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Batch decoding can return N best hypotheses now

batch_decode was extended with the same functionality as decode
function, N best hypotheses per sample can be returned

Signed-off-by: Vladislav Sokolovskii <vladislav@parrothq.com>

---------

Signed-off-by: Vladislav Sokolovskii <vladislav@parrothq.com>
Co-authored-by: Vladislav Sokolovskii <vladislav@parrothq.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-03-27 10:37:46 -04:00
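A hedged sketch of the extended decoding call from the commit above (`n_best` is the parameter named in the commit body; the checkpoint is illustrative and the logits are stand-ins for a Wav2Vec2ForCTC forward pass):

```python
import numpy as np
from transformers import Wav2Vec2ProcessorWithLM

# Requires pyctcdecode and kenlm; the checkpoint must ship an n-gram language model.
processor = Wav2Vec2ProcessorWithLM.from_pretrained("patrickvonplaten/wav2vec2-base-100h-with-lm")

# Stand-in logits with shape (batch, time, vocab); real ones come from the acoustic model.
logits = np.random.randn(2, 200, processor.tokenizer.vocab_size).astype(np.float32)

# n_best > 1 returns the N best hypotheses per sample instead of only the top one,
# for decode and (per this PR) batch_decode alike.
decoded = processor.batch_decode(logits, n_best=3)
print(decoded.text)
```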
кѳѳsнī
66d1eee682
load_in_8bit now respects 'balanced' device maps in multi-gpu environments (#22377)
balanced 8bit memory
2023-03-27 10:34:52 -04:00
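The multi-GPU usage the commit above unlocks, as a hedged sketch (illustrative checkpoint; needs bitsandbytes and at least two GPUs):

```python
from transformers import AutoModelForCausalLM

# Previously an 8-bit load effectively ignored "balanced"-style device maps; with the
# fix the quantized weights are spread evenly across the visible GPUs.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-6.7b",  # illustrative checkpoint
    load_in_8bit=True,
    device_map="balanced",
)
print(set(model.hf_device_map.values()))  # should list more than one device
```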
Sylvain Gugger
8cfc6678da
Adapt find_tied_parameters to handle breaking change in Accelerate (#22360) 2023-03-27 10:11:14 -04:00
Nicola Procopio
204737fcc5
Translated documentation in italian (#22388)
* updated toctree

* added and translated mdx documents
2023-03-27 09:48:49 -04:00
Charlie-Bell
d5c2c71c0f
Changed world_size() to get_world_size() bugfix (#22381)
Edited one line in src/transformers/generation/utils.py: changed dist.world_size() to dist.get_world_size(), since world_size() doesn't exist in torch.distributed.
2023-03-27 09:24:25 -04:00
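For reference, a minimal sketch of the corrected call (an initialized default process group is assumed):

```python
import torch.distributed as dist

# torch.distributed has no world_size(); the accessor is get_world_size(), which
# returns the number of processes in the default group once it is initialized.
world_size = dist.get_world_size() if dist.is_available() and dist.is_initialized() else 1
```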
Joao Gante
c746eb1603
TensorFlow: additional missing cmake dependencies in CI (#22383)
* missing cmake

* more cmake
2023-03-27 09:20:56 -04:00
Stas Bekman
cae78c46d6
[safetensors] don't use in torch<1.10 (#22370)
* [safetensors] don't use in pt<1.10

* better fix
2023-03-24 16:23:27 -04:00
Sylvain Gugger
cfab34e188
Fix TF pipeline job 2023-03-24 16:16:43 -04:00
Stas Bekman
500fce073b
[Trainer] add disclaimer that full_determinism is slow (#22368) 2023-03-24 12:46:41 -07:00
Shubhamai
a0cbbba31f
Resnet flax (#21472)
* [WIP] flax resnet

* added pretrained flax models, results reproducible

* Added pretrained flax models, results reproducible

* working on tests

* no real code change, just some comments

* [flax] adding support for batch norm layers

* fixing bugs related to pt+flax integration

* removing loss from modeling flax output class

* fixing classifier tests

* fixing comments, model output

* cleaning comments

* review changes

* review changes

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* renaming Flax to PyTorch

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-03-24 19:45:57 +00:00
Joao Gante
88dae78f4d
TensorFlow: pin maximum version to 2.12 (#22364) 2023-03-24 18:45:03 +00:00
Samuel Bubán
3a7f5fa9d2
Improve error message (#22361)
* Improve error message

* Fix consistency
2023-03-24 18:09:01 +00:00
Sylvain Gugger
6587125c0a
Pin tensorflow-text to go with tensorflow (#22362)
* Pin tensorflow-text to go with tensorflow

* Make it more convenient to pin TensorFlow

* setup doesn't like f-strings
2023-03-24 10:54:06 -04:00
Yih-Dar
01203475c9
Update docker files to use official torch 2.0.0 (#22357)
* update docker files to use official torch 2.0.0

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-24 14:29:05 +01:00
Mitch Naylor
57f25f4b7f
Add Mega: Moving Average Equipped Gated Attention (#21766)
* add mega file structure and plain pytorch version of mega source code

* added config class with old naming conventions

* filled in mega documentation

* added config class and embeddings with optional token types

* updated notes

* starting the conversion process, deleted intermediate and added use_cache back to config

* renamed config attributes in modeling_mega.py

* checkpointing before refactoring incremental decoding functions

* removed stateful incremental key/values for EMA and self-attention

* refactored MovingAverageGatedAttention to remove stateful k/v history and use unified attention mask

* MovingAverageGatedAttention works with incremental decoding + past values, added sequence length enforcement

* more comments in MovingAverageGatedAttention + checkpointing before GatedCrossAttention

* bug fix in attention mask handling in MovingAverageGatedAttention

* removed incremental state from GatedCrossAttention and removed IncrementalState class

* finished gated cross attention and got MegaLayer working

* fixed causal masking in mega decoder

* fixed how padding and causal masks are passed through MegaLayer with and without k/v caching

* finished MegaModel; tested with encoder, decoder-only, and cross-attention type inputs; started work on downstream classes; removed mentions of position_ids

* added optional dense hidden layer for masked and causal LM classes

* docstring updates in MultiHeadEMA and GatedCrossAttention, removed unnecessary inputs in cross-attention

* removed before_attn_fn in Mega class and updated docstrings and comments up to there

* bug fix in MovingAverageGatedAttention masking

* working conversion of MLM checkpoint in scratchpad script -- perfect matches

* moved arg for hidden dense layer in LM head to config; discovered issue where from_pretrained is renaming gamma and beta parameters

* renamed gamma and beta parameters to avoid HF renaming when loading from checkpoint

* finished checkpoint conversion script

* cleanup old class in mega config script

* removed 'copied from' statements and passing integration tests

* added num_attention_heads=1 to config for integration compatibility, decoder tests working, generation tests failing

* fixed tuple output of megamodel

* all common tests passing after fixing issues in decoder, gradient retention, and initialization

* added mega-specific tests, ready for more documentation and style checks

* updated docstrings; checkpoint before style fixes

* style and quality checks, fixed initialization problem in float_tensor, ready for PR

* added mega to toctree

* removed unnecessary arg in megaconfig

* removed unused arg and fixed code samples with leftover roberta models

* Apply suggestions from code review

Applied all suggestions except the one renaming a class, as I'll need to update that throughout

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fixed issue where .view breaks batch dimension, conversion script fixed with absolute imports, updated readme with Mega->MEGA

* removed asserts in Mega code, renamed sequencenorm, gatedcrossattention, and NFFN, replaced get_activation_fn with ACTFN, and added sequencenorm to layer norms

* reformatted .forward() docstrings to match style and removed unused mask input in cross-attention

* removed all reset_parameters() methods and rolled into MegaPreTrainedModel._init_weights()

* renamed all single-letter variables and improved readability in tensor size comments, Mega->MEGA in 2 documentation files

* variable names in NFFN

* manual Mega->MEGA changes in docs

* Mega->MEGA in config auto

* style and quality fixes

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* renamed parameters and variables with confusing names, added copied from statements, moved fft conv to its own method, other cleanup from PR comments

* commit before dealing with merge conflicts

* made new attention activation functions available in ACT2FN and added generation test from OPT

* style and quality in activations and tests

* documentation fixes, renaming variables in dropout and rotary positions, used built-in causal masking, encoders->layers in MegaModel, moved comments into docstrings

* style and quality fixes after latest updates, before rotary position ids

* causal mask in MegaBlock docstring + added missing device passing

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* added Mega prefixes where missing, reverted MegaSequenceNorm to if-else, other module renaming requested in PR

* style and quality fixes + readme updates pointing to main

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-03-24 08:17:27 -04:00
Joao Gante
0fa46524ac
Generate: Add GPTNeoX integration test (#22346) 2023-03-24 11:33:16 +00:00
Ashwin Mathur
b79607656b
Fix typo in Greedy Search Description (#22345)
Fix typo in greedy search docs
2023-03-24 07:32:18 -04:00
James Reed
c0fa2aa0b8
[HFTracer] Make embeddings ops take on the dtype of the weight (#22347)
* [HFTracer] Make embeddings ops take on the dtype of the weight

* fix bug
2023-03-24 07:04:51 -04:00
Yih-Dar
e8cc02555e
Automatically create/update tiny models (#22275)
* Automatically create or update tiny models

* Skip failed tests

* update workflow file

* use revision

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-03-23 19:14:17 +01:00
кѳѳsнī
a92e0ad2e2
Enable training Llama with model or pipeline parallelism (#22329)
* Llama - Move target tokens to final pipeline device if needed

* Update src/transformers/models/llama/modeling_llama.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_llama.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-03-23 13:15:51 -04:00
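The fix in the commit above is the usual pattern for naive model parallelism, sketched here in isolation (simplified from a causal-LM loss computation; the function is illustrative, not the actual modeling code):

```python
import torch
from torch.nn import CrossEntropyLoss

def causal_lm_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # With layers spread over several devices, the logits live on the last pipeline
    # stage, so the target tokens must be moved there before the loss is computed.
    shift_logits = logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous().to(shift_logits.device)
    loss_fct = CrossEntropyLoss()
    return loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))
```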
Joao Gante
502fec779b
Generate: add test for left-padding support (#22322) 2023-03-23 17:00:22 +00:00
jeffhataws
ec9b18f62d
Fix --bf16 option support for Neuron after PR #22300 (#22307)
This PR fixes the "RuntimeError: No CUDA GPUs are available"
when running with --bf16 option on Neuron.

Related PRs:
https://github.com/huggingface/transformers/pull/20684
https://github.com/huggingface/transformers/pull/22300
2023-03-23 12:27:13 -04:00
Batese2001
aef488c503
Added type hints to TFDeiTModel (#22327)
* Added type hints to TFDeiTModel

* make style

---------

Co-authored-by: Matt <rocketknight1@gmail.com>
2023-03-23 15:31:32 +00:00
Samuel Larkin
59b9351b78
Minor typo in pipeline FillMaskPipeline's documentation. (#22339) 2023-03-23 11:14:11 -04:00
Sylvain Gugger
506e7c6361
Fix various imports (#22281)
* Fix various imports

* Fix copies

* Fix import
2023-03-23 10:34:17 -04:00
Quentin Lhoest
053c2153f8
Mention why one needs to specify max_steps in Trainer (#22333)
* Mention why one needs to specify max_steps in Trainer

* dummy change to trigger CI
2023-03-23 15:26:51 +01:00
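The situation documented by the commit above, as a hedged sketch: with a streaming/iterable dataset the Trainer cannot derive the number of optimization steps from a number of epochs, so `max_steps` has to be given explicitly (values are illustrative):

```python
from transformers import TrainingArguments

# An IterableDataset has no __len__, so epochs cannot be converted into steps;
# max_steps must therefore be set explicitly.
args = TrainingArguments(
    output_dir="out",
    max_steps=1_000,
    per_device_train_batch_size=8,
)
```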
mollerup23
5a9eb31477
Fixed gradient checkpoint bug for TimeSeriesTransformer (#22272)
* Fixed gradient checkpoint bug for this model

* Updating PR indentation (maintainer feedback)

* make fixup

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
2023-03-23 08:45:13 -04:00
Younes Belkada
ff20f9cf36
[MBart] Add accelerate support for MBart (#22309)
add `accelerate` support for MBart
2023-03-23 10:34:43 +01:00