transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-31 02:02:21 +06:00

Author	SHA1	Message	Date
Sylvain Gugger	786092a35e	Rework a bit the LLaMA conversion script (#22236 ) * Update LLaMA conversion script * Doc * Fix the weight size for the 13B checkpoint * Update src/transformers/models/llama/convert_llama_weights_to_hf.py Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr> --------- Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>	2023-03-20 11:30:36 -04:00
Sylvain Gugger	43efd7cb13	Fix balanced and auto device_map (#22271 )	2023-03-20 11:24:17 -04:00
yqy2001	89f0fda5d3	Fix the gradient checkpointing bug of the llama model (#22270 ) fix grad ckpt bug of llama	2023-03-20 10:26:50 -04:00
heya5	cf0af9a31b	[Trainer] Add optional communication backends for torch.distributed when using GPU (#22247 ) Update training_args.py	2023-03-20 09:17:34 -04:00
Nicola Procopio	c4bf6f38bd	Italian translation perf_infer_cpu (#22243 ) * added translated files added perf_train_cpu and perf_train_cpu_many * updated toctree * updated toctree * added file perf_infer_cpu.medx * italian translation perf_infer_cpu.mdx	2023-03-20 09:16:07 -04:00
yesinkim	466144d440	[Docs] fix typos in some tokenizer docs (#22256 ) [Docs] fix typos Co-authored-by: yesinkim <yesinkim@yesinkimui-MacBookAir.local>	2023-03-20 12:17:31 +00:00
Pasquale Minervini	a48310de47	Update training_args.py -- a nightly install is not required anymore for torch.compile (#22266 ) Update training_args.py A nightly install is not required anymore for `torch.compile`.	2023-03-20 12:00:05 +00:00
Stas Bekman	60d51ef512	[trainer] param count for deepspeed zero3 (#22193 ) [trainer] param count for zero3	2023-03-17 11:02:55 -07:00
Guangyuan Ma	cf601b902f	Fix Unnecessary move of tensors from CPU to GPU in LlamaRotaryEmbedding (#22234 ) push	2023-03-17 13:56:32 -04:00
Yih-Dar	bec075612a	Revert "Use `dash==2.8.1` for now for daily CI" (#22233 ) Revert "Use `dash==2.8.1` for now for daily CI (#22227)" This reverts commit `53218671d9`.	2023-03-17 16:54:27 +01:00
Ali Hassani	3028b20a71	Fix natten (#22229 ) * Add kernel size to NATTEN's QK arguments. The new NATTEN 0.14.5 supports PyTorch 2.0, but also adds an additional argument to the QK operation to allow optional RPBs. This ends up failing NATTEN tests. This commit adds NATTEN back to circleci and adds the arguments to get it working again. * Force NATTEN >= 0.14.5	2023-03-17 11:07:55 -04:00
Seb0	074490b2c2	fix(docs): fix task guide links in model docs (#22226 ) fix(docs): task guide links in model docs	2023-03-17 14:30:17 +00:00
Maria Khalusova	314cdf7c25	Removed .mdx extension in two links (#22230 ) removed .mdx extension	2023-03-17 10:27:12 -04:00
lewtun	f251441387	Add LlamaForSequenceClassification (#22209 ) * Add LlamaForSequenceClassification * Update src/transformers/models/llama/modeling_llama.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update src/transformers/models/llama/modeling_llama.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Add docstring * Add test * Add input embedding getter and setter * Remove dead code --------- Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>	2023-03-17 14:39:26 +01:00
Wang, Yi	675d2a5a00	fix AutoTP in deepspeed could not work for bloom (#22196 ) * fix AutoTP in deepspeed could not work for bloom Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> * add a method in BloomModel to build ailib Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> --------- Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>	2023-03-17 09:28:17 -04:00
Sylvain Gugger	00934026a4	LLaMA house-keeping (#22216 ) * LLaMA house-keeping * Doc links	2023-03-17 08:55:15 -04:00
Maria Khalusova	42f8f76402	Depth estimation task guide (#22205 ) * added doc to toc, auto tip with supported models, mention of task guide in model docs * make style * removed "see also" * minor fix	2023-03-17 08:36:23 -04:00
Yih-Dar	53218671d9	Use `dash==2.8.1` for now for daily CI (#22227 ) Use dash 2.8.1 for now Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-03-17 13:27:14 +01:00
wangpeng	af1c864cdc	fix code example in mgp-str doc (#22219 ) Co-authored-by: yue kun <yuekun.wp@alibaba-inc.com>	2023-03-17 09:40:06 +00:00
Kevin Turner	33d033d694	fix typos in llama.mdx (#22223 )	2023-03-17 08:43:18 +00:00
Yih-Dar	97a3d16a69	Hotfix for natten issue with torch 2.0.0 on CircleCI (#22218 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-03-16 23:57:26 +01:00
Yih-Dar	5110e5748e	🔥py38 + torch 2 🔥🔥🔥🚀 (#22204 ) * py38 + torch 2 * increment cache versions --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-03-16 22:59:23 +01:00
Susnato Dhar	fb366b9a2a	fixes a typo in WhisperFeatureExtractor docs. (#22208 ) * fixes a typo * .	2023-03-16 16:08:05 +00:00
Younes Belkada	da3ba3a167	[`XGLM`] Add `accelerate` support for XGLM (#22207 ) * add `accelerate` support for XGLM * fix order	2023-03-16 16:18:05 +01:00
SatyaJandhyalaAtMS	a88a4dae19	Temporarily fix ONNX model exporting error (#21830 ) * Temporarily fix https://github.com/microsoft/onnx-converters-private/issues/143 * Reduced column width * Fix formatting. * Revert "Temporarily fix https://github.com/microsoft/onnx-converters-private/issues/143" This reverts commit 6e95a108042118d204da447729f3834affa354fc. * Fix export error. * Revert "Fix formatting." This reverts commit 8310f60da10358edbdf77a2a2f3c83ee55066cb8. * Propagated changes made in SwinV2 to Swin2SR	2023-03-16 10:56:26 -04:00
Yih-Dar	4c5c0af7e5	Update tiny model creation script (#22202 ) * Update UNCONVERTIBLE_MODEL_ARCHITECTURES * Deal with 2 model tester classes in single test file * Deal with 2 model tester classes in single test file * Deal with 2 model tester classes in single test file * make style and quality --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-03-16 14:21:58 +01:00
Jason Phang	464d420775	LLaMA Implementation (#21955 ) * LLaMA * sharding and docs * tweak * black * inits * ruff * LLAMA_PRETRAINED_CONFIG_ARCHIVE_MAP * init * no checkpoint * docs * ruff * type_vocab_size * tokenizer fixes * tokenizer fixes * Update tokenization_llama.py * Update tokenization_llama.py * Update configuration_llama.py * Update modeling_llama.py * tokenizer add_bos by default * licenses * remove decoder * norms and mlp * rope overhaul * tweaks * black * mention OPT implementation * off-by-one naming * typo * fix * tokenization fix and slicing bug * padding config * cleanup * black * update tests * undo typo * fix vocab caching logic * ruff * docbuilder * attn fix from BlackSamorez * initial feedback * typo * docs * llama case * llama case * load checkpoint docs * comment about tokenizer * tokenizer defaults * clear past_key_values if use_cache=False * last tweaks * last tweaks * last tweaks * last tweaks --------- Co-authored-by: Stella Biderman <stellabiderman@gmail.com>	2023-03-16 09:01:15 -04:00
Jason Phang	0041be5b3d	LLaMA Implementation (#21955 ) * LLaMA * sharding and docs * tweak * black * inits * ruff * LLAMA_PRETRAINED_CONFIG_ARCHIVE_MAP * init * no checkpoint * docs * ruff * type_vocab_size * tokenizer fixes * tokenizer fixes * Update tokenization_llama.py * Update tokenization_llama.py * Update configuration_llama.py * Update modeling_llama.py * tokenizer add_bos by default * licenses * remove decoder * norms and mlp * rope overhaul * tweaks * black * mention OPT implementation * off-by-one naming * typo * fix * tokenization fix and slicing bug * padding config * cleanup * black * update tests * undo typo * fix vocab caching logic * ruff * docbuilder * attn fix from BlackSamorez * initial feedback * typo * docs * llama case * llama case * load checkpoint docs * comment about tokenizer * tokenizer defaults * clear past_key_values if use_cache=False * last tweaks * last tweaks * last tweaks * last tweaks --------- Co-authored-by: Stella Biderman <stellabiderman@gmail.com>	2023-03-16 09:00:53 -04:00
Baelish03	09922da4a7	Italian Translation of migration.mdx (#22183 ) * Tranlstion Italian: migration * Update migration.mdx minor fixes * Update _toctree.yml * Delete migration.mdx * Add italian translation of migration.mdx * Update of migration.mdx translation and toctree	2023-03-16 12:00:07 +00:00
Yih-Dar	52a57f7c7c	Update expected values in `MgpstrModelIntegrationTest` (#22195 ) Update values Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-03-16 11:48:52 +00:00
Alara Dirik	1485bd9c02	Fix typo in Align docs (#22199 ) Fix align docs typo	2023-03-16 13:41:48 +03:00
Yih-Dar	1c4a9acc73	Fix DeepSpeed CI (#22194 ) * Deal with torch-tensorrt --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-03-16 05:52:40 +01:00
Prathik Rao	7c4999e495	t5 remove data dependency (#22097 ) * t5 remove data dependency * make style * make fix-copies --------- Co-authored-by: Prathik Rao <prathikrao@microsoft.com>	2023-03-15 16:11:15 -04:00
Anahita Bhiwandiwalla	16121bae5c	Update BridgeTowerForContrastiveLearning (#22145 ) * Use return_loss for BridgeTowerForContrastiveLearning, add example * fix tests * Update example in BridgeTowerForContrastiveLearning * Update test_modeling_bridgetower.py * update model output format * minor update * Update src/transformers/models/bridgetower/modeling_bridgetower.py * make style --------- Co-authored-by: Tiep Le <97980157+tileintel@users.noreply.github.com> Co-authored-by: Tiep Le <tiep.le@intel.com> Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-03-15 20:54:38 +01:00
Sylvain Gugger	42ad693b7b	Regression pipeline device (#22190 ) * Fix regression in pipeline when device=-1 is passed * Add regression test	2023-03-15 14:13:38 -04:00
amyeroberts	737681477c	Revert 22152 MaskedImageCompletionOutput changes (#22187 ) Revert changes	2023-03-15 18:37:23 +01:00
浮躁的小螃蟹	7b0e2cfdfb	Fix: unfinished_sequences with correct device (#22184 ) Fix: unfinished_sequences with correct device The original code was causing errors when running torch.jit.trace due to the tensor options being incorrect. I fixed this by using torch.ones to create a tensor with the correct device and dtype. This should resolve the issue with running torch.jit.trace.	2023-03-15 16:27:19 +00:00
Sylvain Gugger	f7329751fe	Run all tests by default (#22162 )	2023-03-14 17:30:43 -04:00
Sylvain Gugger	b7036f4912	Load optimizer state on CPU to avoid CUDA OOM (#22159 )	2023-03-14 17:30:32 -04:00
Sylvain Gugger	ebdb185bef	v4.28.0.dev0	2023-03-14 13:49:10 -04:00
Sylvain Gugger	c52c5282ef	Revert "Enforce same behavior as PyTorch 2.0 for older versions" (#22163 ) Revert "Enforce same behavior as PyTorch 2.0 for older versions (#22136)" This reverts commit `1c801d65eb`.	2023-03-14 13:45:46 -04:00
Stas Bekman	085bf5c1fe	[trainer] add `--optim adamw_torch_fused` for pt-2.0+ (#22144 ) * [trainer] add --optim adamw_torch_fused * change optim default * deal with non-torch * revert default change; prep; add fp16/amp assert * typo * typo	2023-03-14 10:22:03 -07:00
amyeroberts	c6318c3788	to_pil - don't rescale if int and in range 0-255 (#22158 ) * Don't rescale if in and in range 0-255 * Raise value error if int values too large * Update tests/test_image_transforms.py * Update tests/test_image_transforms.py	2023-03-14 15:43:44 +00:00
Alara Dirik	3b22bfbc6a	Create MaskedImageCompletionOutput and fix ViT docs (#22152 ) * create MaskedImageCompletionOutput * fix bugs * fix bugs	2023-03-14 13:55:18 +00:00
Sylvain Gugger	b45192ec47	Fix big model inference for T5 models in float16 (#22095 ) * Fix big model inference for T5 models in float16 * Apply suggestions from code review Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Style * Trigger CI with latest release --------- Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>	2023-03-14 09:20:16 -04:00
Nicola Procopio	7f5ad6c35b	Translation Italian: perf_train_cpu and perf_train_cpu_many (#22151 ) * added translated files added perf_train_cpu and perf_train_cpu_many * updated toctree	2023-03-14 11:09:36 +00:00
Yih-Dar	ff88703501	Update 2 doctest expected values for torch 2.0.0 (#22148 ) update values Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-03-14 09:13:16 +00:00
Alara Dirik	cdddfbffa1	Add ConvNeXT V2 (#21679 ) * Add ConvNeXt V2 to transformers * TF model is separated from the PR to fix issues	2023-03-14 12:08:14 +03:00
Yih-Dar	6c2ad00c46	Move `is_pipeline_test_to_skip` to specific model test classes (#21999 ) * Move `is_pipeline_test_to_skip` to specific model test classes --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-03-14 10:03:02 +01:00
Arthur	2beabd24f0	[🛠️] Fix-whisper-breaking-changes (#21965 ) * temp fix * temporary fix * update * fix tests * fixup * update based on reveiew Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * update to fix tests * update docstring --------- Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>	2023-03-14 09:23:48 +01:00

12371 Commits