transformers

Mirror of https://github.com/huggingface/transformers.git, synced 2025-07-31 02:02:21 +06:00

Author	SHA1	Message	Date
Fanli Lin	c403441339	[docs] add the missing huggingface hub username (#33431 ) * add username * update username * add username	2024-09-11 09:56:40 -07:00
daejin	ecf7024bde	Fix: Cast prefetch_bucket_size to integer for deepspeed >= 0.15 (#33402 ) Fix: Cast prefetch bucket size to integer in zero_optimization	2024-09-11 14:25:48 +02:00
Jonathan Mamou	7a51cbc65f	Dynamic number of speculative tokens in order to accelerate speculative decoding (#33258 ) * optimal Speculation Lookahead based on probability * update peer finished condition * add support to do_sample True * add stopping criteria * gitignore * add print * remove prints * minor * minor * git ignore * adding test to stopping ConfidenceCriteria * doc + format * add doc * Update .gitignore * update docstring and default value of assistant_confidence_threshold * add docstring * Update src/transformers/generation/configuration_utils.py implicit default value (None) Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * style fix --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2024-09-11 14:22:28 +02:00
Albert Villanova del Moral	42babe8548	Remove deprecated task in load_dataset (#33433 )	2024-09-11 14:18:32 +02:00
Lysandre Debut	91f19a5b18	Fix failing windows (#33436 ) * Encoding * style	2024-09-11 14:06:16 +02:00
Theia Vogel	e719b65c31	Fix `FbgemmFp8Linear` not preserving tensor shape (#33239 ) * add tests for linear shape behavior * fix linear shape behavior ended up adding the reshape at the end, after f8f8bf16_rowwise, because adding it directly after quantize_fp8_per_row caused f8f8bf16_rowwise to drop the seq_len dimension. (i.e., (17, 23, 1014) -> (17, 1024)) * save shape up front + comment	2024-09-11 13:26:44 +02:00
Ita Zaporozhets	781bbc4d98	use diff internal model in tests (#33387 ) * use diff internal model in tests * use diff internal model in tests	2024-09-11 11:27:00 +02:00
Guang Yang	f38590dade	Make StaticCache configurable at model construct time (#32830 ) * Make StaticCache configurable at model construct time * integrations import structure * add new doc file to toc --------- Co-authored-by: Guang Yang <guangyang@fb.com> Co-authored-by: Joao Gante <joao@huggingface.co>	2024-09-10 16:35:57 +01:00
Bruno Hays	dfee4f2362	Update WhisperTokenizer Doc: Timestamps and Previous Tokens Behaviour (#33390 ) * added doc explaining behaviour regarding tokens timestamps and previous tokens * copied changes to faster tokenizer --------- Co-authored-by: Bruno Hays <bruno.hays@illuin.tech>	2024-09-10 16:49:28 +02:00
Rishiraj Acharya	6ed2b10942	Bug Fix: Update hub.py to fix NoneType error (#33315 ) * Bug Fix: Update hub.py Bug: TypeError: argument of type 'NoneType' is not iterable Analysis: The error `TypeError: argument of type 'NoneType' is not iterable` suggests that `model_card.data.tags` is `None`, and the code is trying to iterate through it using `not in`. Fix: 1. Check if `model_card.data.tags` is `None` before the loop: Since you're checking the variable `tags` before the loop, you should also ensure that `model_card.data.tags` is not `None`. You can do this by initializing `model_card.data.tags` to an empty list if it's `None`. 2. Updated code: Add a check and initialize the `tags` if it is `None` before proceeding with the iteration. This way, if `model_card.data.tags` is `None`, it gets converted to an empty list before checking the contents. This prevents the `TypeError`. * Update hub.py	2024-09-10 16:39:19 +02:00
Alazar	96429e74a8	Add support for GGUF Phi-3 (#31844 ) * Update docs for GGUF supported models * Add tensor mappings and define class GGUFPhi3Converter * Fix tokenizer * Working version * Attempt to fix some CI failures * Run ruff format * Add vocab, merges, decoder methods like LlamaConverter * Resolve conflicts since Qwen2Moe was added to gguf - I missed one place when resolving conflict - I also made a mistake with tests_ggml.py and now has been fixed to reflect its master version.	2024-09-10 13:32:38 +02:00
Maciej Adamiak	8e8e7d8558	fixed Mask2Former image processor segmentation maps handling (#33364 ) * fixed mask2former image processor segmentation maps handling * introduced review suggestions * introduced review suggestions	2024-09-10 11:19:56 +01:00
Raushan Turganbay	7d2d6ce9cb	VLM: fixes after refactor (#32907 ) * leave only half of the changes * fix tests * [run-slow] llava, llava_next, llava_next_video, vipllava, video_llava * fix tests, first try * [run-slow] llava, llava_next, llava_next_video, vipllava, video_llava * fix, second try * [run-slow] llava, llava_next, llava_next_video, vipllava, video_llava * fix * [run-slow] llava, llava_next, llava_next_video, vipllava, video_llava	2024-09-10 12:02:37 +02:00
Lysandre Debut	f24f084329	Import structure & first three model refactors (#31329 ) * Import structure & first three model refactors * Register -> Export. Export all in __all__. Sensible defaults according to filename. * Apply most comments from Amy and some comments from Lucain Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Lucain Pouget <lucainp@gmail.com> * Style * Add comment * Clearer .py management * Raise if not in backend mapping * More specific type * More efficient listdir * Misc fixes --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Lucain Pouget <lucainp@gmail.com>	2024-09-10 11:10:53 +02:00
Younes Belkada	7f112caac2	Fix import of `FalconMambaForCausalLM` (#33381 ) * fix build issues with FM kernels * try another approach * test * fix * add init files * push fix * fix * fixup * fix duplicate * fix * fix * fix	2024-09-10 09:14:54 +02:00
amyeroberts	f745e7d3f9	Remove repeated prepare_images in processor tests (#33163 ) * Remove repeated prepare_images * Address comments - update docstring; explanatory comment	2024-09-09 13:20:27 +01:00
Lysandre Debut	0574fa668b	Adjust templates (#33384 ) * Adjust templates * Update .github/ISSUE_TEMPLATE/bug-report.yml Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Chat templates --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-09-09 14:00:43 +02:00
Raushan Turganbay	65bb284448	Compile compatibilty for decoder-only models (#32617 ) * squash into one commit * add qwen2-vl for rope standardization * fix mistral compile * fix qwen2-vl * fix-copies	2024-09-09 10:59:04 +02:00
Nilay Bhatnagar	eedd21b9e7	Fixed Majority of the Typos in `transformers[en]` Documentation (#33350 ) * Fixed typo: insted to instead * Fixed typo: relase to release * Fixed typo: nighlty to nightly * Fixed typos: versatible, benchamarks, becnhmark to versatile, benchmark, benchmarks * Fixed typo in comment: quantizd to quantized * Fixed typo: architecutre to architecture * Fixed typo: contibution to contribution * Fixed typo: Presequities to Prerequisites * Fixed typo: faste to faster * Fixed typo: extendeding to extending * Fixed typo: segmetantion_maps to segmentation_maps * Fixed typo: Alternativelly to Alternatively * Fixed incorrectly defined variable: output to output_disabled * Fixed typo in library name: tranformers.onnx to transformers.onnx * Fixed missing import: import tensorflow as tf * Fixed incorrectly defined variable: token_tensor to tokens_tensor * Fixed missing import: import torch * Fixed incorrectly defined variable and typo: uromaize to uromanize * Fixed incorrectly defined variable and typo: uromaize to uromanize * Fixed typo in function args: numpy.ndarry to numpy.ndarray * Fixed Inconsistent Library Name: Torchscript to TorchScript * Fixed Inconsistent Class Name: OneformerProcessor to OneFormerProcessor * Fixed Inconsistent Class Named Typo: TFLNetForMultipleChoice to TFXLNetForMultipleChoice * Fixed Inconsistent Library Name Typo: Pytorch to PyTorch * Fixed Inconsistent Function Name Typo: captureWarning to captureWarnings * Fixed Inconsistent Library Name Typo: Pytorch to PyTorch * Fixed Inconsistent Class Name Typo: TrainingArgument to TrainingArguments * Fixed Inconsistent Model Name Typo: Swin2R to Swin2SR * Fixed Inconsistent Model Name Typo: EART to BERT * Fixed Inconsistent Library Name Typo: TensorFLow to TensorFlow * Fixed Broken Link for Speech Emotion Classification with Wav2Vec2 * Fixed minor missing word Typo * Fixed minor missing word Typo * Fixed minor missing word Typo * Fixed minor missing word Typo * Fixed minor missing word Typo * Fixed minor missing word Typo * Fixed minor missing word Typo * Fixed minor missing word Typo * Fixed Punctuation: Two commas * Fixed Punctuation: No Space between XLM-R and is * Fixed Punctuation: No Space between [~accelerate.Accelerator.backward] and method * Added backticks to display model.fit() in codeblock * Added backticks to display openai-community/gpt2 in codeblock * Fixed Minor Typo: will to with * Fixed Minor Typo: is to are * Fixed Minor Typo: in to on * Fixed Minor Typo: inhibits to exhibits * Fixed Minor Typo: they need to it needs * Fixed Minor Typo: cast the load the checkpoints To load the checkpoints * Fixed Inconsistent Class Name Typo: TFCamembertForCasualLM to TFCamembertForCausalLM * Fixed typo in attribute name: outputs.last_hidden_states to outputs.last_hidden_state * Added missing verbosity level: fatal * Fixed Minor Typo: take To takes * Fixed Minor Typo: heuristic To heuristics * Fixed Minor Typo: setting To settings * Fixed Minor Typo: Content To Contents * Fixed Minor Typo: millions To million * Fixed Minor Typo: difference To differences * Fixed Minor Typo: while extract To which extracts * Fixed Minor Typo: Hereby To Here * Fixed Minor Typo: addition To additional * Fixed Minor Typo: supports To supported * Fixed Minor Typo: so that benchmark results TO as a consequence, benchmark * Fixed Minor Typo: a To an * Fixed Minor Typo: a To an * Fixed Minor Typo: Chain-of-though To Chain-of-thought	2024-09-09 10:47:24 +02:00
Aymeric Roucher	489cbfd6d3	Add visit webpage tool (#33353 ) * Add VisitWebpageTool	2024-09-09 10:32:42 +02:00
Wing Lian	62aecd85ff	schedulefree optimizers (#30079 ) * schedulefree optimizers * fix train instead of eval for optimizer * fixes and update docs * chore: lint * add tests and drop overly-verbose _32bit suffix * chore: lint * fix for docs * fix code review issues * use duck-typing to avoid per-optimizer patches * fixup style * fixup style * warn if incorrect accelerate version with schedule free Co-authored-by: Aman Gupta Karmani <aman@tmm1.net> --------- Co-authored-by: Aman Karmani <aman@tmm1.net>	2024-09-09 09:51:39 +02:00
Raushan Turganbay	60226fdc1d	Fix quantized cache tests (#33351 ) * fix * fix * better fix * Update src/transformers/generation/configuration_utils.py Co-authored-by: Lysandre Debut <hi@lysand.re> --------- Co-authored-by: Lysandre Debut <hi@lysand.re>	2024-09-09 09:09:58 +02:00
Nicholas Broad	66bc4def95	add sdpa mbart (#32033 ) * add sdpa mbart useful for donut * update sdpa docs * formatting * add self._use_sdpa in mbartencoder * use self.config to check attn * retrigger checks * [run-slow] mbart	2024-09-06 17:31:24 -07:00
Daniel Lok	a70286f827	Update author for QLorA/PEFT community notebook (#33338 ) update author Signed-off-by: Daniel Lok <daniel.lok@databricks.com>	2024-09-06 22:50:26 +02:00
Matt	d7b04ea14d	Fix Prefill docs (#33352 ) last -> final	2024-09-06 17:57:54 +01:00
Joao Gante	6ff6069fa7	RoPE: fix BC warning (#33331 )	2024-09-06 16:15:11 +01:00
Arthur	2d757002fc	red-ci on main, fix copies (#33356 ) * fix copies * ???	2024-09-06 17:06:39 +02:00
Ita Zaporozhets	e48e5f1f13	Support reading tiktoken tokenizer.model file (#31656 ) * use existing TikTokenConverter to read tiktoken tokenizer.model file * del test file * create titktoken integration file * adding tiktoken llama test * ALTNATIVE IMPLEMENTATION: supports llama 405B * fix one char * remove redundant line * small fix * rm unused import * flag for converting from tiktokeng * remove unneeded file * ruff * remove llamatiktokenconverter, stick to general converter * tiktoken support v2 * update test * remove stale changes * udpate doc * protect import * use is_protobuf_available * add templateprocessor in tiktokenconverter * reverting templateprocessor from tiktoken support * update test * add require_tiktoken * dev-ci * trigger build * trigger build again * dev-ci * [build-ci-image] tiktoken * dev-ci * dev-ci * dev-ci * dev-ci * change tiktoken file name * feedback review * feedback rev * applying feedback, removing tiktoken converters * conform test * adding docs for review * add doc file for review * add doc file for review * add doc file for review * support loading model without config.json file * Revert "support loading model without config.json file" This reverts commit 2753602e51c34cef2f184eb11f36d2ad1b02babb. * remove dev var * updating docs * safely import protobuf * fix protobuf import error * fix protobuf import error * trying isort to fix ruff error * fix ruff error * try to fix ruff again * try to fix ruff again * try to fix ruff again * doc table of contents * add fix for consistency.dockerfile torchaudio * ruff * applying feedback * minor typo * merging with push-ci-image * clean up imports * revert dockerfile consistency	2024-09-06 14:24:02 +02:00
Shiyu	342e800086	support 3D attention mask in bert (#32105 ) * support 3D/4D attention mask in bert * test cases * update doc * fix doc	2024-09-06 14:20:48 +02:00
GeLee	2b18354106	add self.head_dim for VisionAttention in Qwen2-VL (#33211 ) * add self.head_dim for VisionAttention in Qwen2-VL * add self.head_dim for VisionAttention in Qwen2-VL * fix ci * black the test_modeling_qwen2_vl.py * use ruff to format test_modeling_qwen2_vl.py * [run-slow] qwen2_vl * use tying for python3.8 * fix the import format * use ruff to fix the ci error I001 * [run-slow] qwen2_vl * remove unused import * commit for rebase * use ruff fix ci * [run-slow] qwen2_vl --------- Co-authored-by: root <liji>	2024-09-06 17:19:29 +05:00
Amir Mohammad Fakhimi	3314fe1760	Add validation for maximum sequence length in modeling_whisper.py (#33196 ) * Add validation for maximum sequence length in modeling_whisper.py Added a validation check to ensure that the sequence length of labels does not exceed the maximum allowed length of 448 tokens. If the sequence length exceeds this limit, a ValueError is raised with a descriptive error message. This change prevents the model from encountering errors or unexpected behavior due to excessively long sequences during training or fine-tuning, ensuring consistent input dimensions and improving overall robustness. * Change exception message in src/transformers/models/whisper/modeling_whisper.py The exception message is for whisper's label's sequence max length. Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * Change 448 to config.max_target_positions in src/transformers/models/whisper/modeling_whisper.py It's for whisper's config.max_target_positions. Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * Change method's documentation in src/transformers/models/whisper/modeling_whisper.py * Add test for maximum label's sequence length in test_modeling_whisper.py * Add self to modeling_whisper.py * Update test_modeling_whisper.py with respect to automatic validations * Update modeling_whisper.py with respect to ci/circleci: check_code_quality * Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality * Update test_modeling_whisper.py with respect to ci/circleci: tests_generate * Update test_modeling_whisper.py with respect to ci/circleci: tests_generate * Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality * Separate test_labels_sequence_max_length tests in test_modeling_whisper.py * Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality * Remove assert from test_modeling_whisper.py * Add max_target_positions to WhisperModelTester in test_modeling_whisper.py * Update test_modeling_whisper.py with respect to ci/circleci: check_code_quality * Update test_modeling_whisper.py with respect to ci/circleci: tests_generate * Update test_modeling_whisper.py * Change test_labels_sequence_max_length_error_after_changing_config in test_modeling_whisper.py * Change self.config.max_target_positions to self.max_target_positions modeling_whisper.py * Add new tests in test_modeling_whisper.py * Update test_modeling_whisper.py --------- Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>	2024-09-06 14:09:49 +02:00
Ita Zaporozhets	363301f221	support loading model without config.json file (#32356 ) * support loading model without config.json file * fix condition * update tests * add test * ruff * ruff * ruff	2024-09-06 13:49:47 +02:00
Xuehai Pan	e1c2b69c34	Load dynamic module (remote code) only once if code isn't change (#33162 ) * Load remote code only once * Use hash as load indicator * Add a new option `force_reload` for old behavior (i.e. always reload) * Add test for dynamic module is cached * Add more type annotations to improve code readability * Address comments from code review	2024-09-06 12:49:35 +01:00
Shijie	1bd9d1c899	fix qwen2vl vision eager-attention (#33213 ) * fix-qwen2vl-vision-eager-attention * code-quality * Update src/transformers/models/qwen2_vl/modeling_qwen2_vl.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * code-quality --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-09-06 13:42:17 +02:00
Sanchit Gandhi	51d15eb1c1	[whisper] alternative fix for long-form timestamps (#32131 ) * [whisper] alternative fix for long-form timestamps * update test	2024-09-06 12:57:08 +02:00
Joao Gante	2b789f27f3	Docs: add more cross-references to the KV cache docs (#33323 ) * add more cross-references * nit * import guard * more import guards * nit * Update src/transformers/generation/configuration_utils.py	2024-09-06 10:22:00 +01:00
Raushan Turganbay	1759bb9126	Fix: StaticCache & `inputs_embeds` (#32932 ) squash commit	2024-09-06 12:56:59 +05:00
Daniel Lok	5792c459ed	Add a community notebook for fine-tuning with QLoRA, PEFT, and MLflow (#33319 ) add notebook for finetuning with mlflow Signed-off-by: Daniel Lok <daniel.lok@databricks.com>	2024-09-06 09:35:01 +02:00
Shijie	21fac7abba	simple align qwen2vl kv_seq_len calculation with qwen2 (#33161 ) * qwen2vl_align_kv_seqlen_to_qwen2 * flash att test * [run-slow] qwen2_vl * [run-slow] qwen2_vl fix OOM * [run-slow] qwen2_vl * Update tests/models/qwen2_vl/test_modeling_qwen2_vl.py Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz> * Update tests/models/qwen2_vl/test_modeling_qwen2_vl.py Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz> * code quality --------- Co-authored-by: baishuai.bs <1051314669@qq.com> Co-authored-by: ShuaiBai623 <baishuai623@icloud.com> Co-authored-by: ShuaiBai623 <43326198+ShuaiBai623@users.noreply.github.com> Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>	2024-09-05 21:19:30 +05:00
Vladislav Bronzov	5d11de4a2f	Add Qwen2Moe GGUF loading support (#33264 ) * update gguf doc, config and tensor mapping * add qwen2moe architecture support, GGUFQwen2MoeConverter and q4 unit tests * apply code style fixes * reformat files * assign GGUFQwen2Converter to qwen2_moe	2024-09-05 17:42:03 +02:00
Michelle Habonneau	132e87500e	Update SECURITY.md (#32680 ) updated reporting a vulnerability section	2024-09-05 16:41:01 +02:00
Joshua Lochner	c6d2848a23	🚨 Fix `torch.jit.trace` for `interpolate_pos_encoding` in all vision models (#33226 ) * Fix `torch.jit.tracing` for `interpolate_pos_encoding` in all vision models * Apply formatting * Add missing `self.config = config` * Fix copies * Fix hiera interpolation unit test * Formatting * Update `_import_structure` * make style * Fix docstring * Use `# Copied from` instead of utils * DeiT variable renaming (`class_and_dist_pos_embed`) * Fix Hiera `interpolate_pos_encoding`	2024-09-05 16:17:34 +02:00
Niklas Muennighoff	03164ba14e	Add paper link (#33305 )	2024-09-05 15:49:28 +02:00
Younes Belkada	47b096412d	Fix: Fix `FalconMamba` training issues due to incompatible kernels (#33195 ) * fix FM training kernels * fix copies * fix copies * propagate to slow path * make it BC * add comment * fix test	2024-09-05 11:55:08 +02:00
Raushan Turganbay	43df47d8e7	Llava Onevision: add model (#32673 ) * working version * fix copies * update * tests * update docs * codestyle * add more tests * add returns for docs * clean up * Update src/transformers/models/llava_onevision/processing_llava_onevision.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * updates * codestyle * style * shouldn't be reversed * [run-slow] llava_onevision * [run-slow] llava_onevision * add pooling in videos * [run-slow] llava_onevision * num-logits-to-keep * [run-slow] llava_onevision * [run-slow] llava_onevision * Update tests/test_modeling_common.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * video matched orig impl * fix tests * chat template was modified * Update docs/source/en/model_doc/llava_onevision.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * add morer info in the doc page --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-09-05 14:43:20 +05:00
Yoni Gozlan	9230d78e76	Add validate images and text inputs order util for processors and test_processing_utils (#33285 ) * Add validate images and test processing utils * Remove encoded text from possible inputs in tests * Removed encoded inputs as valid in processing_utils * change text input check to be recursive * change text check to all element of lists and not just the first one in recursive checks	2024-09-04 13:50:31 -04:00
Matthew Douglas	b3909989d3	Fix excessive CPU memory usage with FSDP and cpu_ram_efficient_loading (#33154 )	2024-09-04 18:37:54 +02:00
Yoach Lacombe	a1faf22f2c	[BUG] fix upper nltk version (#33301 ) fix upper nltk version	2024-09-04 18:28:08 +02:00
Aymeric Roucher	cfd92c64f5	Add new documentation page for advanced agent usage (#33265 ) * Add new documentation page for advanced agent usage	2024-09-04 18:19:54 +02:00
Matt	01c8c6c419	Add a warning to the chat template docs about the tool_calls format (#33277 ) * Add a warning to the chat template docs * Add a warning to the chat template docs * Add a warning to the chat template docs	2024-09-04 17:13:34 +01:00


16800 Commits