transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-28 00:32:25 +06:00

Author	SHA1	Message	Date
Ranggi Hwang	9b85e405ab	[`SwitchTransformer`] Significant performance improvement on MoE blocks (#31173 ) * SwitchTransformer MoE layer performance improvement * make fixup * comments about shapes * make fixup	2024-06-06 09:10:12 +02:00
graham	8177aa0e1a	no need for explicit EXTRA_TOKENS in processing_paligemma.py (#31022 ) no need for explicit EXTRA_TOKENS	2024-06-06 08:41:41 +02:00
amyeroberts	940fde8daf	Skip failing JetMOE generation tests (#31266 ) Skip failing tests for now	2024-06-05 19:06:46 +01:00
Cyril Vallez	bd5091df8d	Reduce by 2 the memory requirement in `generate()` 🔥🔥🔥 (#30536 ) * Fix contrastive_search for new cache structure, and improve performance by removing inefficient torch.stack(torch.split(x, top_k, dim=0)) * Fix _contrastive_search for non-standard cache using ellipsis slicing * Fix all outputs.logits memory leaks for all decoding strategies! * Fix small error in _contrastive_search() * Make all necessary change and revert for the new class * Apply coding style * Remove pipes in type hints for compatibility * correct type hint * apply style * Use DynamicCache by default and solve conflicts * Fix rebase issues * Add `_supports_dynamic_cache_class` in models for models that support DynamicCache but not other caches to make DynamicCache the default for more models * Create generation config to return legacy format by default, or to choose not to * style * Fix case when use_cache is False * Remove default DynamicCache in assiste_decoding if assistant_model does not support it + fix _seen_tokens when cropping cache * Update prepare_inputs_for_generation() for case with empty DynamicCache * Correct return of args in _assisted_decoding * Remove EfficientDynamicCache as it is no longer needed * Correct mistake in generation config * Move cache logic of assisted decoding to AssistedCandidateGenerator.__init__ * change DynamicCache function names from "split" to "batch_split" for readability + apply coding style * Remove `_supports_dynamic_cache_class` attribute after rebase * Correct missing line lost in conflict resolution during rebasing * Add special case for Jamba * Fix jamba test * Coding style * coding style * Correct missing import in rebasing * Simplify _validate_model_kwargs based on removal of _supports_dynamic_cache attribute * Simplify code paths in _contrastive_search * coding style * Update docstrings of cache methods * Update prepare_inputs_for_generation() -> past_key_values are always Cache objects	2024-06-05 17:05:01 +02:00
Yih-Dar	d6276f0fc5	Add condition to `benchmark` job in `push-important-models.yml` (#31259 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-06-05 15:19:16 +02:00
Dhaivat Bhatt	b72752f068	Fix circular reference issue in CLIPTokenizerFast (#31075 )	2024-06-05 14:01:13 +02:00
bastrob	464d986b6c	Add missing Flaubert tokenizer tests (#30492 ) * add flaubert tokenization test, enrich inheritance in FlaubertTokenizer. * fix quality code ci * ensure parameter consistency * fix ci * fix copyright year and flatten vocab list. * fix style	2024-06-05 13:52:16 +02:00
Huazhong Ji	41cf4097f7	enable deterministic mode for npu (#31253 )	2024-06-05 07:35:35 -04:00
Vaibhav Srivastav	4a6024921f	doc: add info about wav2vec2 bert in older wav2vec2 models. (#31120 ) * doc: add info about wav2vec2 bert in older wav2vec2 models. * apply suggestions from review. * forward contrib credits from review --------- Co-authored-by: Sanchit Gandhi <sanchit-gandhi@users.noreply.github.com>	2024-06-05 11:56:11 +01:00
dependabot[bot]	c39aaea972	Bump transformers from 3.5.1 to 4.38.0 in /examples/research_projects/deebert (#31244 ) Bump transformers in /examples/research_projects/deebert Bumps [transformers](https://github.com/huggingface/transformers) from 3.5.1 to 4.38.0. - [Release notes](https://github.com/huggingface/transformers/releases) - [Commits](https://github.com/huggingface/transformers/compare/v3.5.1...v4.38.0) --- updated-dependencies: - dependency-name: transformers dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-06-05 11:12:58 +01:00
amyeroberts	54659048a2	Early labels validation (#31240 ) * Move label validation checks - fail early * Remove some formatting changes - add back labels change wav2vec2	2024-06-05 10:50:55 +01:00
Yih-Dar	03ea160937	Benchmark GitHub Actions workflow (#31163 ) * benchmark workflow * benchmark workflow * benchmark workflow * benchmark workflow * build * build * build * build * build * build * build * build * build * build * build * build * build * build --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-06-05 10:39:00 +02:00
James Braza	63fb253df0	Fixing `name 'torch' is not defined` in `bitsandbytes` integration (#31243 ) Fixed torch definition error	2024-06-05 08:00:30 +02:00
Yury Sulsky	66875ac070	Specify dtype=torch.bool to avoid xla error (#31191 ) The StoppingCriteriaList allocates is_done without specifying dtype=torch.bool. On XLA this allocates a float tensor and causes a failure on the following line: is_done = is_done | criteria(input_ids, scores, **kwargs) by attempting to OR float with bool.	2024-06-05 07:50:54 +02:00
dependabot[bot]	8685b3c5d2	Bump transformers from 4.26.0 to 4.38.0 in /examples/research_projects/vqgan-clip (#31242 ) Bump transformers in /examples/research_projects/vqgan-clip Bumps [transformers](https://github.com/huggingface/transformers) from 4.26.0 to 4.38.0. - [Release notes](https://github.com/huggingface/transformers/releases) - [Commits](https://github.com/huggingface/transformers/compare/v4.26.0...v4.38.0) --- updated-dependencies: - dependency-name: transformers dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-06-04 22:11:45 +01:00
Yih-Dar	3714f3f86b	Upload (daily) CI results to Hub (#31168 ) * build * build * build * build * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-06-04 21:20:54 +02:00
amyeroberts	99de3a844b	Move out common backbone config param validation (#31144 ) * Move out common validation * Add missing backbone config arguments	2024-06-04 18:15:37 +01:00
Younes Belkada	485d913dfb	Blip: Deprecate `BlipModel` (#31235 ) * deprecate blip * mention deprecation on docs	2024-06-04 18:29:45 +02:00
Yih-Dar	fd3238b4b0	Fix `MistralIntegrationTest` (#31231 ) * fix * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-06-04 18:04:08 +02:00
Manuel Faysse	2965b20459	add no split modules for xlmrobertaxl (#31223 )	2024-06-04 15:46:19 +01:00
Jacklanda	821b772ab9	Add new line switch before logging *** Running {description} * (#31225 ) ✨ Add new line switch before logging "* Running {description} ***". Signed-off-by: jacklanda <yonyonlau@gmail.com>	2024-06-04 13:38:17 +01:00
amyeroberts	4ba66fdb4c	Fix pipeline tests - torch imports (#31227 ) * Fix pipeline tests - torch imports * Framework-dependent float conversion	2024-06-04 12:30:23 +01:00
Chujie Zheng	6b22a8f2d8	fix bf16 issue in text classification pipeline (#30996 ) * fix logits dtype * Add bf16/fp16 tests for text_classification pipeline * Update test_pipelines_text_classification.py * fix * fix	2024-06-04 11:20:48 +01:00
Kristen Pereira	de460e28e1	Add dynamic resolution input/interpolate position embedding to deit (#31131 ) * Added interpolate pos encoding feature and test to deit * Added interpolate pos encoding feature and test for deit TF model * re-added accidentally deleted test for multi_gpu * storing only patch_size instead of entire config and removed commented code * Update modeling_tf_deit.py to remove extra line Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-06-04 10:29:01 +01:00
Raushan Turganbay	d64e4da713	Video-LLaVa: handle any number of frames (#31221 ) video-llava can handle more frames	2024-06-04 14:20:03 +05:00
Max Strobel	36ade4a32b	fix(PatchTST): Wrong dropout used for PretainHead (#31117 ) * fix(PatchTST): Wrong dropout used for PretainHead * feat(PatchTST): remove unused config.dropout --------- Co-authored-by: Strobel Maximilian (IFAG PSS SIS SCE ACM) <Maximilian.Strobel@infineon.com>	2024-06-04 10:11:36 +01:00
DomHudson	e83cf58145	Fix sentence fragment within test comments (#31218 )	2024-06-04 10:09:24 +01:00
Raushan Turganbay	83238eeebc	Pass device in Logits Processor's init (#29804 ) * add device in logits processor * remove device when not needed * codestyle * tests * forgot `melody` version * Update src/transformers/models/whisper/generation_whisper.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * codestyle * updates --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2024-06-04 10:19:19 +05:00
Aaron Jimenez	c73ee1333d	[docs] Spanish translation of tokenizer_summary.md (#31154 ) * add tokenizer_summary to es/_toctree.yml * add tokenizer_summary to es/ * fix link to Transformes XL in en/ * translate until Subword tokenization section * fix GPT link in en/ * fix other GPT link in en/ * fix typo in en/ * translate the doc * run make fixup * Remove .md in Transformer XL link * fix some link issues in es/ * fix typo	2024-06-03 16:52:23 -07:00
Yih-Dar	8a1a23ae4d	Fix GPU OOM for `mistral.py::Mask4DTestHard` (#31212 ) * build * build * build * build --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-06-03 19:25:15 +02:00
miivanov90	df5abae894	Set greater_is_better to False if metric_for_best_model ends with "loss" (#31142 ) * update to not(endswith(loss)) * ruff formatting	2024-06-03 17:52:28 +01:00
Younes Belkada	924c46d40c	Cohere: Fix copied from (#31213 ) Update modeling_cohere.py	2024-06-03 18:29:31 +02:00
Jade Choghari	98dd842339	Wrong translation FR : Contents = Contenu (#31186 ) Update index.md - Contents = Contenu French typo - Contents = Contenu	2024-06-03 17:40:14 +02:00
Qubitium	c6c78733d7	Rename sanity_evaluation to eval_on_start (#31192 ) * Rename sanity_evaluation to eval_on_start * move arg back to last	2024-06-03 16:32:21 +01:00
Bojun Feng	c230504b36	Fix typo in utils (#31169 ) fix typo	2024-06-03 17:27:53 +02:00
Sangbum Daniel Choi	874ac129bb	fix the get_size_with_aspect_ratio in max_size situation (#30902 ) * fix the get_size_with_aspect_ratio in max_size situation * make fix-up * add more general solution * consider when max_size is not defined * fix typo * fix typo * simple fix * fix error * fix if else error * fix error of size overwrite * fix yolos image processing * fix detr image processing * make * add longest related test script * Update src/transformers/models/yolos/image_processing_yolos.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * add more test * add test script about longest size * remove deprecated --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-06-03 16:12:08 +01:00
Isotr0py	e4628434d8	Add Qwen2 GGUF loading support (#31175 ) * add qwen2 gguf support * Update docs * fix qwen2 tokenizer * add qwen2 gguf test * fix typo in qwen2 gguf test * format code * Remove mistral, clarify the error message * format code * add typing and update docstring	2024-06-03 14:55:10 +01:00
Yih-Dar	df848acc5d	Fix `test_compile_static_cache` (#30991 ) * fix * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-06-03 15:16:28 +02:00
NielsRogge	70c8713872	🚨 [Mistral and friends] Update MLP (#31057 ) Update MLP	2024-06-03 14:57:07 +02:00
Joao Gante	d475f76745	SlidingWindowCache: reduce differences to other Cache classes (#30970 ) * tmp commit * sliding window with fewer differences * make fixup + rebase * missing overwrite	2024-06-03 14:04:24 +02:00
fxmarty	221aaec6ec	Ignore non-causal mask in more cases with SDPA (#30138 ) * update non-causal mask for sdpa * add test * update docstrings * add one more test * fix cross attention bug * gentler atol/rtol	2024-06-03 19:08:41 +08:00
Pavithra Devi M	f4f696255f	Fix Cannot convert [array()] to EagerTensor of dtype int64 (#31109 ) While running the model.prepare_tf_dataset() method, it raises the error below: ``` TypeError: Cannot convert [array([322., 1.])] to EagerTensor of dtype int64 ``` This happens in the "DataCollatorForSeq2Seq" function when we try to convert the labels to tensors. While converting the labels to tensors, the labels can be in the format of a list of lists or a list of ndarrays. There is no problem converting the list-of-lists labels. There is a problem when the ndarrays in the list hold float values (like below). ``` [array([322., 1.])] ``` So the exception is raised while trying to convert this label to tensors using the code below. ``` batch["labels"] = tf.constant(batch["labels"], dtype=tf.int64) ``` The labels are always integer values, so they were converted to float values in the label padding operation below. ``` batch["labels"] = [ call(label) if padding_side == "right" else np.concatenate([[self.label_pad_token_id] * (max_label_length - len(label)), label]) for label in labels ] ``` Here we have 2 cases: 1 - Concatenating an array having integer padding token value with labels. 2 - Concatenating an empty array with labels. ---------------------------------------------------------------------------------------- Case 1: Concatenating an array having integer padding token value with labels. WORKS AS EXPECTED: ---------------------------------------------------------------------------------------- ``` label = np.array([233, 1]) max_label_length = 4 label_pad_token_id = -100 np.concatenate([[label_pad_token_id] * (max_label_length - len(label)), label]) o/p: array([-100, -100, 233, 1]) ``` ---------------------------------------------------------------------------------------- Case 2: Concatenating an empty array with labels. GIVES THE ISSUE: This scenario can happen when the label has the maximum label length -- no padding needed. ---------------------------------------------------------------------------------------- ``` label = np.array([233, 1]) max_label_length = 2 label_pad_token_id = -100 np.concatenate([[label_pad_token_id] * (max_label_length - len(label)), label]) o/p: array([233., 1.]) ``` ---------------------------------------------------------------------------------------- Solution: ---------------------------------------------------------------------------------------- We need to concatenate an ndarray of dtype int with labels. AFTER FIX: ---------- Case 1: ``` label = np.array([233, 1]) max_label_length = 4 label_pad_token_id = -100 np.concatenate([np.array([label_pad_token_id] * (max_label_length - len(label)), dtype=np.int64),label]) o/p: array([-100, -100, 233, 1]) ``` Case 2: ``` label = np.array([233, 1]) max_label_length = 2 label_pad_token_id = -100 np.concatenate([np.array([label_pad_token_id] * (max_label_length - len(label)), dtype=np.int64),label]) o/p: array([233, 1]) ```	2024-06-03 10:49:03 +01:00
Arthur	1749841a0e	[`GemmaModel`] fix small typo (#31202 ) * fixes * fix-copies	2024-06-03 11:02:38 +02:00
Ahmed Moubtahij	39b2ff69d6	Token healing (#30081 ) * token healing impl + trie with extensions * make fixup * prefix-robust space tokenization * examples readme and requirements * make fixup * allow input prompt and model * redundant defaults * Specialized Trie * make fixup * updated tests with new inherited Tree * input ids to auto device_map * rm unused import * Update src/transformers/generation/utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * naming convention * Revert "naming convention" This reverts commit dd39d9c5b7a969e2d8a8d2a8e54f121b82dc44f0. * naming convention * last -hopefully- changes --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-06-03 10:53:15 +02:00
amyeroberts	5b5b48b11d	Remove copied froms for deprecated models (#31153 ) * Remove copied froms for deprecated models * Remove automatically in script	2024-06-03 09:42:53 +01:00
CharlesCNorton	97e5a7072c	Fix typo: use_safetenstors to use_safetensors (#31184 ) Corrected a typo in security.md. Changed `use_safetenstors` to `use_safetensors` in the section discussing the usage of safe formats for loading models to prevent arbitrary code execution.	2024-06-03 10:33:02 +02:00
Arthur	96eb06286b	Diff converter v2 (#30868 ) * current working example! * commit regex and result file * update * nit * push the conversion file * oups * roadmap and nits * attempt diffs for 3 files * persimmon * nit * add diff file that is the same as the modeling_llama.py * fix rope nits * updates * updates with converted versions * give some breathing space to the code * delete * update * update * push the actual result * update regex patterns * update regex patterns * fix some issues * fix some issues * fix some issues * updates * updates * updates * updates * updates * revert changes done to llama * updates * update gemma * updates * oups * current state * current state * update * ouiiii * nit * clear diffs * nit * fixup * update * doc 🚀 * 🔥 * for now use gemma * deal with comments * style * handle funtions * deal with assigns * todos * process inheritage * keep decorators? * 🤗 * deal with duplicates * fixup * correctly remove duplicate code * run ruff post script * ruff deals pretty well with imports, let's leave it to him * ah maybe not lol * for now remove all imports from child. * nit * conversion of llama * okay * convert starcoder2 * synch with main * update llama diff * updates * https://docs.astral.sh/ruff/rules/redefined-while-unused/ fixes the imports, bit needs later version of ruff * updates * okay actual state * non zero exit * update! * revert unrelated * remove other diff files * updates * cleanup * update * less diff! 
* stash * current updates * updates * No need for call * finished fining deps * update * current changes * current state * current state * new status * nit * finally * fixes * nits * order is now expected * use logger info instead of prints * fixup * up * nit * update * nits * update * correct merge * update * update * update * add warning * update caution message * update * better merging strategy * copy class statements :wink * fixups * nits * update * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * nits * smaller header * do cleanup some stuff * even simpler header? * fixup * updates * ruff * update examples * nit * TODO * state * OUUUUUUF * current state * nits * final state * add a readme * fixup * remove diff llama * fix * nit * dummy noy funny * ruff format tests src utils --check * everless diffs * less diffs and fix test * fixes * naming nit? * update converter and add supper example * nits * updated for function signatures * update * update * add converted dummies * autoformat * single target assign fix * fixup * fix some imports * fixes * don't push them * `# noqa: F841` --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-05-31 18:37:43 +02:00
Vallepu Vamsi Krishna	372baec2e6	Added description of quantization_config (#31133 ) * Description of quantization_config Added missing description about quantization_config in replace_with_bnb_linear for better readability. * Removed trailing spaces	2024-05-31 18:23:11 +02:00
Pavel Iakubovskii	cdc813113a	Instance segmentation examples (#31084 ) * Initial setup * Metrics * Overfit on two batches * Train 40 epochs * Memory leak debugging * Trainer fine-tuning * Draft * Fixup * Trained end-to-end * Add requirements * Rewrite evaluator * nits * Add readme * Add instance-segmentation to the table * Support void masks * Remove sh * Update docs * Add pytorch test * Add accelerate test * Update examples/pytorch/instance-segmentation/README.md * Update examples/pytorch/instance-segmentation/run_instance_segmentation.py * Update examples/pytorch/instance-segmentation/run_instance_segmentation_no_trainer.py * Update examples/pytorch/instance-segmentation/run_instance_segmentation_no_trainer.py * Update examples/pytorch/instance-segmentation/run_instance_segmentation.py * Fix consistency oneformer * Fix imports * Fix imports sort * Apply suggestions from code review Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update examples/pytorch/instance-segmentation/run_instance_segmentation.py Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com> * Add resources to docs * Update examples/pytorch/instance-segmentation/README.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update examples/pytorch/instance-segmentation/README.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Remove explicit model_type argument * Fix tests * Update readme * Note about other models --------- Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-05-31 16:56:17 +01:00
Aymeric Roucher	9837a25481	Add streaming, various fixes (#30838 ) * Implement streaming run in ReAct agents * Allow additional imports in code agents * Python interpreter: support classes and exceptions, fixes	2024-05-31 14:16:23 +02:00

19383 Commits