transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-31 02:02:21 +06:00

Author	SHA1	Message	Date
bytebarde	be3fd8a262	[Flash Attention 2] Add flash attention 2 for GPT-J (#28295 ) * initial implementation of flash attention for gptj * modify flash attention and overwrite test_flash_attn_2_generate_padding_right * update flash attention support list * remove the copy line in the `CodeGenBlock` * address copy mechanism * Update src/transformers/models/gptj/modeling_gptj.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Add GPTJ attention classes * add expected outputs in the gptj test * Ensure repo consistency with 'make fix-copies' --------- Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-03-13 08:43:00 +01:00
Younes Belkada	d522afea13	[`Gemma`] Supports converting directly in half-precision (#29529 ) * Update convert_gemma_weights_to_hf.py * Update src/transformers/models/gemma/convert_gemma_weights_to_hf.py * fixup	2024-03-12 22:44:49 +01:00
Joao Gante	d47966536c	Examples: check `max_position_embeddings` in the translation example (#29600 ) check max_position_embeddings	2024-03-12 18:58:12 +00:00
Bharat Ramanathan	6b660d5ed5	Fix: handle logging of scalars in Weights & Biases summary (#29612 ) fix: handle logging of scalars in wandb summary fixes: #29430	2024-03-12 18:26:09 +00:00
Raushan Turganbay	8e64ba2890	Add tests for batching support (#29297 ) * add tests for batching support * Update src/transformers/models/fastspeech2_conformer/modeling_fastspeech2_conformer.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/models/fastspeech2_conformer/modeling_fastspeech2_conformer.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update tests/test_modeling_common.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update tests/test_modeling_common.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update tests/test_modeling_common.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * fixes and comments * use cosine distance for conv models * skip mra model testing * Update tests/models/vilt/test_modeling_vilt.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * finzalize and make style * check model type by input names * Update tests/models/vilt/test_modeling_vilt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fixed batch size for all testers * Revert "fixed batch size for all testers" This reverts commit `525f3a0a05`. * add batch_size for all testers * dict from model output * do not skip layoutlm * bring back some code from git revert * Update tests/test_modeling_common.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/test_modeling_common.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * clean-up * where did minus go in tolerance * make whisper happy * deal with consequences of losing minus * deal with consequences of losing minus * maskformer needs its own test for happiness * fix more models * tag flaky CV models from Amy's approval * make codestyle --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-03-12 17:46:19 +00:00
Furkan Akkurt	11163fff58	Fix typo ; Update quantization.md (#29615 ) Update quantization.md	2024-03-12 16:32:50 +00:00
Yih-Dar	a15bd3af4e	Update flava tests (#29611 ) * update * update * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-03-12 17:04:53 +01:00
Matt	df1542581e	Set env var to hold Keras at Keras 2 (#29598 ) * Set env var to hold Keras at Keras 2 * Add Amy's update * make fixup * Use a warning instead	2024-03-12 13:49:57 +00:00
Hilco van der Wilk	b6404866cd	Update legacy Repository usage in various example files (#29085 ) * Update legacy Repository usage in `examples/pytorch/text-classification/run_glue_no_trainer.py` Marked for deprecation here https://huggingface.co/docs/huggingface_hub/guides/upload#legacy-upload-files-with-git-lfs * Fix import order * Replace all example usage of deprecated Repository * Fix remaining repo call and rename args variable * Revert removing creation of gitignore files and don't change research examples	2024-03-12 13:20:49 +00:00
tomigee	f1a565a39f	Implemented add_pooling_layer arg to TFBertModel (#29603 ) Implemented add_pooling_layer argument	2024-03-12 13:01:55 +00:00
Kola	50ec493363	Fix typo (determine) (#29606 ) * Fix type (determine) * ruff * Update src/transformers/models/mamba/configuration_mamba.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-03-12 12:56:51 +00:00
Matt	81ec8028f9	Stop passing None to compile() in TF examples (#29597 ) * Fix examples to stop passing None to compile(), rework example invocation for run_text_classification.py * Add Amy's fix	2024-03-12 12:22:29 +00:00
Dries Verachtert	73efe896df	Fix minor typo: softare => software (#29602 )	2024-03-12 10:39:56 +00:00
Raushan Turganbay	6cc5411d81	Fix Fuyu doc typos (#29601 ) fix fuyu docs	2024-03-12 10:16:21 +00:00
Pedro Cuenca	b382a09e28	Experimental loading of MLX files (#29511 ) * Experimental loading of MLX files * Update exception message * Add test * Style * Use model from hf-internal-testing	2024-03-11 18:42:06 +00:00
fzyzcjy	73a27345d4	Tiny improvement for doc (#29581 ) * Update add_new_model.md * Update docs/source/en/add_new_model.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-03-11 17:43:35 +00:00
Amrit Gupta	b45c0f55e0	Fixed broken link (#29558 ) Fixed broken link for Resources -> Token Classification -> Finetuning BERT for named-entity	2024-03-11 17:26:38 +00:00
Klaus Hipp	c1e478aa7f	Add missing localized READMEs to the copies check (#29575 ) * Add missing localized READMEs to the copies check * Run check to resolve all inconsistencies	2024-03-11 17:17:42 +00:00
yuanzhoulvpi	47c9570903	fix error: TypeError: Object of type Tensor is not JSON serializable … (#29568 ) fix error: TypeError: Object of type Tensor is not JSON serializable trainer Co-authored-by: Zach Mueller <muellerzr@gmail.com>	2024-03-11 17:15:36 +00:00
Yih-Dar	e5eb55b88b	Don't use a subset in test fetcher if on `main` branch (#28816 ) save ci life Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-03-11 16:58:06 +01:00
Klaus Hipp	dd1c905215	[Docs] Fix FastSpeech2Conformer model doc links (#29574 ) [Docs] Fix FastSpeech2Conformer links	2024-03-11 14:14:03 +00:00
Yitong Huang	873d9bb3cc	Make torch xla available on GPU (#29334 ) * add USE_TORCH_XLA env * rename torch_tpu to torch_xla * better is_torch_xla_available; fix some fsdp and performance issues * fix format * fix bug when pjrt_device is cpu * fix bug * fix the deprecation handling --------- Co-authored-by: anw90 <ang868@gmail.com> Co-authored-by: wangang.wa <wangang.wa@alibaba-inc.com>	2024-03-11 14:07:16 +00:00
Damith Senanayake	9a3f4d4daf	Bark model Flash Attention 2 Enabling to pass on check_device_map parameter to super() (#29357 ) * Fixing error #29332. The _check_and_enable_flash_attn_2() method receives a check_device_map parameter and fails. * style fixup	2024-03-11 12:44:12 +00:00
Tanay Mehta	6d67837f06	Add Fill-in-the-middle training objective example - PyTorch (#27464 ) * add: initial script to train clm fim * fix: if training model from scratch, new tokens will be added and embeddings resized * fix: fixed attention_mask errors when generating FIM data * fix: file formatted using black * add: run_fim_no_trainer.py and fixed some comments in run_fim.py * add: added fim examples to the README.md and ran code fixup * fix: little bug in both fim training scripts * fix: remove comment from notebook and added a note on fim related params * fix: minor typo in README * add: suggested minor changes to README and run_fim.py * add: gradient_accumulation_steps and gradient_checkpointing args * add: improved model embedding resizing * add: pad_to_multiple_of and attn_implementation params * add: requested minor changes * add: deepspeed zero compatibility * add: resize embeddings layer with zero3 support for fim model initialization	2024-03-11 12:14:02 +00:00
j-gc	d80c9a3497	[`Docs`] fixed minor typo (#29555 )	2024-03-11 11:05:16 +00:00
Arthur	4f27ee936a	[`Mamba doc`] Post merge updates (#29472 ) * post merge update * nit * oups	2024-03-11 09:46:24 +01:00
Winston H	0290ec19c9	feat: use `warning_advice` for tensorflow warning (#29540 ) feat: use `warning_advice` instead of tensorflow warning	2024-03-08 17:27:30 +00:00
Zach Mueller	469c13280d	Fix eval thread fork bomb (#29538 ) * Fix eval thread fork bomb * Keep eval dl persistent and prepare after so free_memory doesn't destroy it * Add note * Quality	2024-03-08 11:04:18 -05:00
Fanli Lin	3f6973db06	[tests] use the correct `n_gpu` in `TrainerIntegrationTest::test_train_and_eval_dataloaders` for XPU (#29307 ) * fix n_gpu * fix style	2024-03-08 10:52:25 -05:00
Yoach Lacombe	1ba89dc2d2	Fix WhisperNoSpeechDetection when input is full silence (#29065 ) fix total silence input with no_speech_threshold	2024-03-08 14:31:05 +00:00
Yun Dai	697f05bab3	fix typos in FSDP config parsing logic in `TrainingArguments` (#29189 ) fix FSDP config	2024-03-08 08:36:30 -05:00
Jonatan Kłosko	608fa5496c	Make sliding window size inclusive in eager attention (#29519 ) * Make sliding window size inclusive in eager attention * Fix tests	2024-03-08 12:53:17 +00:00
liangjs	f386c51ad9	StableLM: Fix dropout argument type error (#29236 ) * fix stablelm dropout argument type error * fix docs of _flash_attention_forward * fix all docs of _flash_attention_forward * fix docs of _flash_attention_forward in starcoder2 --------- Co-authored-by: oliang <oliang@tencent.com>	2024-03-08 11:58:25 +00:00
Fanli Lin	1ea3ad1aec	[tests] use `torch_device` instead of `auto` for model testing (#29531 ) * use torch_device * skip for XPU * Update tests/generation/test_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-03-08 11:21:43 +00:00
Clémentine Fourrier	14536c339a	Typo fix in error message (#29535 )	2024-03-08 11:20:31 +00:00
Wang, Yi	8ee1d47203	fix image-to-text batch incorrect output issue (#29342 ) * fix image-to-text batch incorrect output issue Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> * add ci test Signed-off-by: Wang, Yi <yi.a.wang@intel.com> * update ci test Signed-off-by: Wang, Yi <yi.a.wang@intel.com> --------- Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> Signed-off-by: Wang, Yi <yi.a.wang@intel.com>	2024-03-08 11:11:10 +00:00
Fanli Lin	8e589c83b6	[tests] add the missing `require_sacremoses` decorator (#29504 ) * add sacremoses check * fix style * for FlaubertTokenizer * HerbertTokenizer fix * add typeHint * Update src/transformers/testing_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * make less skipped * make quality * remove import --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-03-08 10:13:54 +00:00
Joao Gante	bc764f4263	Generate: left-padding test, revisited (#29515 ) * left-padding test revisited * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-03-08 10:06:46 +00:00
Pedro Cuenca	631fa7bf6b	Typo in mlx tensor support (#29509 ) Potential typo in mlx support	2024-03-08 09:47:44 +00:00
Nick DeGroot	b338a6c3b8	Fix `VisionEncoderDecoder` Positional Arg (#29497 ) * 🐛 Fix vision encoder decoder positional arg * ✅ Add test for VisionEncoderDecoder with LayoutLMv3 encoder --------- Co-authored-by: Nick DeGroot <1966472+nickthegroot@users.noreply.github.com>	2024-03-07 20:45:51 +00:00
Alvaro Bartolome	ddf177ee4a	Set `inputs` as kwarg in `TextClassificationPipeline` (#29495 ) * Set `inputs` as kwarg in `TextClassificationPipeline` This change has been done to align the `TextClassificationPipeline` with the rest of the pipelines, and to be able to e.g. `pipeline(**{"inputs": "text"})` which wouldn't be possible since the `*args` were being used instead. * Add `noqa: C409` on `tuple([inputs],)` Even though is discouraged by the linter, the cast `tuple(list(...),)` is required here, as otherwise the original list in `inputs` will be transformed into a `tuple` and the elements 1...N will be ignored by the `Pipeline` * Run `ruff format` * Simplify `tuple` conversion with `(inputs,)` Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> --------- Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>	2024-03-07 20:43:57 +00:00
amyeroberts	4ed9ae623d	test_generation_config_is_loaded_with_model - fall back to pytorch model for now (#29521 ) * Fall back to pytorch model for now * Fix up	2024-03-07 17:30:28 +00:00
Alex Ishida	45c0651090	Add support for metadata format MLX (#29335 ) Add support for loading safetensors files saved with metadata format mlx.	2024-03-07 14:51:59 +01:00
Raushan Turganbay	923733c22b	Flava multimodal add attention mask (#29446 ) * flava multimodal add attn mask * make style * check mask is not None	2024-03-07 12:45:47 +01:00
Ashok Pon Kumar	9288e759ad	fix: Avoid error when fsdp_config is missing xla_fsdp_v2 (#29480 ) Signed-off-by: Ashok Pon Kumar Sree Prakash <ashokponkumar@gmail.com>	2024-03-07 12:44:23 +01:00
Lysandre Debut	f6133d767a	Revert "Automatic safetensors conversion when lacking these files (#2… (#29507 ) Revert "Automatic safetensors conversion when lacking these files (#29390)" This reverts commit `a69cbf4e64`.	2024-03-07 12:12:41 +01:00
Joao Gante	ffe60fdcd6	v4.39 deprecations 🧼 (#29492 )	2024-03-07 10:44:43 +00:00
regisss	979fccc90f	Enable BLIP for auto VQA (#29499 ) * Enable BLIP for auto VQA * Make style * Add VQA to BLIP pipeline tests	2024-03-07 10:28:01 +01:00
Park Jun	d45f47ab7f	Fix: Disable torch.autocast in RotaryEmbedding of Gemma and LLaMa for MPS device (#29439 ) * Fix: Disable torch.autocast in RotaryEmbedding of Gemma and LLaMa for MPS devices * Update src/transformers/models/gemma/modeling_gemma.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update llama ang gemma rope use cpu in mps device --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-03-07 00:57:22 +01:00
Glen Taggart	2a939f20ff	Substantially reduce memory usage in _update_causal_mask for large batches by using .expand instead of .repeat [needs tests+sanity check] (#29413 ) * try to fix gemma mem use * fix: handle attention mask dim==2 case * remove logits=logits.float() * clean up + add llama * apply formatting * readability edit: swap order of items being multiplied * revert change unrelated to PR * revert black autoformat * switch to one .to * Accept style edits Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-03-07 00:56:25 +01:00

15334 Commits