transformers

Mirror of https://github.com/huggingface/transformers.git (synced 2025-07-30 09:42:22 +06:00)

Author	SHA1	Message	Date
Yoach Lacombe	569f6c7d43	Fix FA2 tests (#29909 ) * fix FA2 tests * refactor inference test name	2024-04-01 07:51:00 +00:00
Zach Mueller	3b8e2932ce	Rework tests to compare trainer checkpoint args (#29883 ) * Start rework * Fix failing test * Include max * Update src/transformers/trainer.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-03-30 22:19:17 -04:00
TechxGenus	6e584070d4	[`BC`] Fix BC for AWQ quant (#29965 ) fix awq quant	2024-03-30 19:37:25 +01:00
Bo Zheng	46d636818b	Update model card and link of blog post. (#29928 ) * Update qwen2_moe.md * update link of blogpost. * fixup --------- Co-authored-by: bozheng-hit <dsoul0621@gmail.com>	2024-03-30 17:49:03 +01:00
Gary Wang	f6701bc664	Reset alarm signal when the function is ended (#29706 ) Fixes #29690	2024-03-30 17:41:27 +01:00
Alexander Jipa	e644b60038	fix: get mlflow version from mlflow-skinny (#29918 ) Co-authored-by: Alexander Jipa <azzhipa@amazon.com>	2024-03-30 17:38:29 +01:00
Jacky Lee	156d30da94	Add warning message for `run_qa.py` (#29867 ) * improve: error message for best model metric * update: raise warning instead of error	2024-03-30 17:02:31 +01:00
Jacky Lee	6fd93fe93a	Fix rope theta for OpenLlama (#29893 ) fix: rope_theta for open llama	2024-03-30 16:30:52 +01:00
fzyzcjy	5ad7f17002	Super tiny fix 12 typos about "with with" (#29926 ) * with with * style	2024-03-29 14:31:31 +00:00
Yih-Dar	43d17c1836	Mark `test_eager_matches_sdpa_generate` flaky for some models (#29479 ) * fix * revert for qwen2 * revert for qwen2 * update * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-03-29 11:51:20 +01:00
MariaHei	ba56ed0869	Update installs in image classification doc (#29947 ) Trainer with PyTorch now requires accelerate to be installed. Partly resolves huggingface/transformers#29174	2024-03-28 14:26:27 -07:00
Arthur	536ea2aca2	[`LlamaSlowConverter`] Slow to Fast better support (#29797 ) * fix * fix test * style * nit * rather rely on concert token to id * fix quality * Update src/transformers/convert_slow_tokenizer.py	2024-03-28 16:19:32 +01:00
VINAYAKK GARG	e203646871	Fix doc issue #29758 in DebertaV2Config class (#29842 ) Fix doc issue in DebertaV2Config class Co-authored-by: Vinayakk Garg <vigar@akamai.com>	2024-03-28 14:49:57 +00:00
Arthur	2bbbf1be5b	[`BC`] Fix BC for other libraries (#29934 ) * fi xbc? * nit	2024-03-28 15:13:23 +01:00
Yu Chin Fabian Lim	4df5b9b4b2	Allow GradientAccumulationPlugin to be configured from AcceleratorConfig (#29589 ) * add gradient_accumulation_kwargs to AcceleratorConfig * add suggestions from @muellerzr to docstrings, new behavior and tests * Documentation suggestions from @muellerz Co-authored-by: Zach Mueller <muellerzr@gmail.com> * addressed @muellerzr comments regarding tests and test utils * moved accelerate version to top of file. * @muellerzr's variable fix Co-authored-by: Zach Mueller <muellerzr@gmail.com> * address @amyeroberts. fix tests and docstrings * address @amyeroberts additional suggestions --------- Co-authored-by: Yu Chin Fabian Lim <flim@sg.ibm.com> Co-authored-by: Zach Mueller <muellerzr@gmail.com>	2024-03-28 14:01:40 +00:00
Arthur	a2a7f71604	[ `TokenizationLlama`] fix the way we convert tokens to strings to keep leading spaces 🚨 breaking fix (#29453 ) * nit * update test and fix test * fixup	2024-03-28 13:58:40 +01:00
Arthur	e677479c81	[`Mamba`] from pretrained issue with `self.embeddings` (#29851 ) * nit * update * oups * Update src/transformers/models/mamba/modeling_mamba.py Co-authored-by: Lysandre Debut <hi@lysand.re> --------- Co-authored-by: Lysandre Debut <hi@lysand.re>	2024-03-28 13:54:51 +01:00
Joao Gante	441de62f49	RoPE models: add numerical sanity-check test for RoPE scaling (#29808 ) * add hard rope scaling test * make fixup * quick rope scaling tests * add copy statements	2024-03-28 11:25:50 +00:00
Christopher Keibel	aac7099c92	add functions to inspect model and optimizer status to trainer.py (#29838 ) * add functions to get number of params which require grad, get optimizer group for parameters and get learning rates of param groups to trainer.py * add tests and raise ValueError when optimizer is None * add second layer to test and freeze its weigths * check if torch is available before running tests * use decorator to check if torch is available Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fix test indentation Co-authored-by: Zach Mueller <muellerzr@gmail.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Zach Mueller <muellerzr@gmail.com>	2024-03-28 10:37:16 +00:00
amyeroberts	855b95ce34	Safe import of LRScheduler (#29919 ) * Safe import of LRScheduler * Update src/transformers/trainer_pt_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/trainer_pt_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Fix up --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-03-28 09:54:51 +00:00
Aymeric Roucher	c9d2e855ea	Add beam search visualizer to the doc (#29876 )	2024-03-28 09:54:08 +00:00
Joao Gante	248d5d23a2	Tests: replace `torch.testing.assert_allclose` by `torch.testing.assert_close` (#29915 ) * replace torch.testing.assert_allclose by torch.testing.assert_close * missing atol rtol	2024-03-28 09:53:31 +00:00
Fanli Lin	7c19fafe44	[doc] fix some typos and add `xpu` to the testing documentation (#29894 ) fix typo	2024-03-28 09:42:49 +00:00
Eduardo Pacheco	22d159ddf9	Adding Flash Attention 2 Support for GPT2 (#29226 ) * First commit to add flash attention 2 for GPT-2 * more improvements * Make GPT2 pass tests and fixed Decison Transformers copies * Fixed missing arg * fix copies * Added expected speedup * Update src/transformers/models/gpt2/modeling_gpt2.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/gpt2/modeling_gpt2.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/gpt2/modeling_gpt2.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Added test * Fixed attn attribute * Update docs/source/en/model_doc/gpt2.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update docs/source/en/model_doc/gpt2.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update Decision transformer attentions * More updates * Passing tests * Fix copies * Fix copies part 2 * Decision transformer updates * Update src/transformers/models/gpt2/modeling_gpt2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Fix copies * Decision transformer not supporting flash attn * Addressed comments * Addressed comments * Addressed comments --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-03-28 09:31:24 +00:00
Arthur	3a7e68362b	[`pipeline`]. Zero shot add doc warning (#29845 ) * add doc warning * fix build pr	2024-03-28 09:10:26 +01:00
Arthur	543889f3f6	[`GptNeox`] don't gather on pkv when using the trainer (#29892 ) don't gather on pkv when using the trainer	2024-03-28 08:56:53 +01:00
Arthur	b256516a8c	[`make fix-copies`] update and help (#29924 ) * add some help * style	2024-03-28 08:56:14 +01:00
Minseo Kang	d9dc993fdd	Fix typo in T5Block error message (#29881 )	2024-03-28 03:30:29 +01:00
Lorenzo Verardo	a25037beb9	MixtralSparseMoeBlock: add gate jitter (#29865 ) This commit adds gate jitter to MixtralSparseMoeBlock's input data before passing it through the MoE layer, if turned on.	2024-03-27 16:14:26 +01:00
huismiling	75769744e9	add Cambricon MLUs support (#29627 ) * add Cambricon MLUs support * fix mlu device rng state * up for quality check * up mlu to support fp16 * fix mlu device dependency error * fix mlu device dependency error * enable mlu device for bf16 * fix mlu device memory tracker	2024-03-27 15:54:28 +01:00
Raushan Turganbay	0efcf32351	Move `eos_token_id` to stopping criteria (#29459 ) * add eos stopping criteria * minor fix * Update tests/generation/test_stopping_criteria.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * check eos is not None and fix tests * make style and fixup * Update src/transformers/generation/stopping_criteria.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update tests/generation/test_utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update tests/generation/test_utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/generation/__init__.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/generation/stopping_criteria.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/generation/stopping_criteria.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/generation/stopping_criteria.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * camel case everywhere * call stopping criteria list for candidate ids * make style and fixup * Empty commit * Empty commit to pass flaky test * set max length in PromptLookupCandidateGenerator * Update src/transformers/generation/utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * lets fix this typo in docs * Update src/transformers/generation/utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/generation/utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * update PR * empty commit --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-03-27 12:18:10 +00:00
Marc Sun	31c575bcf1	fix fuyu device_map compatibility (#29880 ) fix foward	2024-03-27 10:18:48 +01:00
Lysandre Debut	4d8427f739	Reimplement "Automatic safetensors conversion when lacking these files" (#29846 ) * Automatic safetensors conversion when lacking these files (#29390) * Automatic safetensors conversion when lacking these files * Remove debug * Thread name * Typo * Ensure that raises do not affect the main thread * Catch all errors	2024-03-27 08:58:08 +01:00
Hovnatan Karapetyan	a81cf9ee90	Fix 29807, sinusoidal positional encodings overwritten by post_init() (#29813 ) * Check for requires_grad when initing weights * Add unit test * Move sinusoidal positional encoding generation after post_init() * Add modules to skip init list * Move create_sinusoidal_embeddings to _init_weights	2024-03-27 06:28:00 +01:00
Anton Vlasjuk	cefb819f7a	Mamba `slow_forward` gradient fix (#29563 ) * FIX: Cached slow forward in mamba - additionally added mamba cached test - added unused test (mamba causal lm forward and backward) - fixed typo: "causl" --> "causal" * formatting * fix: use real `slow_forward` call instead of torch module's * add shape assertion for mixer block test * adjust shape assertion	2024-03-27 04:52:12 +01:00
Bo Zheng	1c39974a4c	Add Qwen2MoE (#29377 ) * add support for qwen2 MoE models * update docs * add support for qwen2 MoE models * update docs * update model name & test * update readme * update class names & readme & model_doc of Qwen2MoE. * update architecture name * fix qwen2_moe tests * use Qwen2Tokenizer instead of Qwen2MoeTokenizer * update modeling_qwen2_moe.py * fix model architecture * fix qwen2_moe tests * use Qwen2Tokenizer instead of Qwen2MoeTokenizer * update modeling_qwen2_moe.py * fix model architecture * fix style * fix test when there are sparse and non sparse layers * fixup * Update README.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fixup * fixup * add archive back * add support for qwen2 MoE models * update docs * update model name & test * update readme * update class names & readme & model_doc of Qwen2MoE. * update architecture name * fix qwen2_moe tests * use Qwen2Tokenizer instead of Qwen2MoeTokenizer * update modeling_qwen2_moe.py * fix model architecture * fixup * fix qwen2_moe tests * use Qwen2Tokenizer instead of Qwen2MoeTokenizer * fix style * fix test when there are sparse and non sparse layers * fixup * add archive back * fix integration test * fixup --------- Co-authored-by: bozheng-hit <dsoul0621@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-03-27 02:11:55 +01:00
Benjamin Minixhofer	8e08acad6b	Support `num_attention_heads` != `num_key_value_heads` in Flax Llama Implementation (#29557 ) * fix tinyllama flax modelling * rename vars to minimize changes * move * formatting * remove unused var	2024-03-27 02:08:43 +01:00
Lucain	f01e1609bf	Set custom_container in build docs workflows (#29855 )	2024-03-26 14:46:02 +01:00
Ilyas Moutawwakil	07d79520ef	Disable AMD memory benchmarks (#29871 ) * remove py3nvml to skip amd memory benchmarks * uninstall pynvml from docker images	2024-03-26 14:43:12 +01:00
Yanyi Liu	ef60995858	Add `cosine_with_min_lr` scheduler in Trainer (#29341 ) * Add cosine_with_min_lr scheduler * Update error message for missing min_lr or min_lr_rate	2024-03-26 13:57:07 +01:00
Zhihao Lin	998b5bb56f	Allow `bos_token_id is None` during the generation with `inputs_embeds` (#29772 ) * update * add ut * update	2024-03-26 12:51:00 +00:00
Michael	b9ceb03df8	[docs] Indent ordered list in add_new_model.md (#29796 )	2024-03-26 12:03:39 +00:00
Merve Noyan	de81a677c4	Fix header in IFE task guide (#29859 ) Update image_feature_extraction.md	2024-03-26 12:32:37 +01:00
yunxiangtang	b32bf85b58	Replace 'decord' with 'av' in VideoClassificationPipeline (#29747 ) * replace the 'decord' with 'av' in VideoClassificationPipeline * fix the check of backend in VideoClassificationPipeline * adjust the order of imports * format 'video_classification.py' * format 'video_classification.py' with ruff --------- Co-authored-by: wanqiancheng <13541261013@163.com>	2024-03-26 10:12:24 +00:00
Jonathan Flynn	b5a6d6eeab	Add warnings if training args differ from checkpoint trainer state (#29255 ) * add warnings if training args differ from checkpoint args stored in trainer_state.json * run formatting and styling * add a test * format and styling --------- Co-authored-by: Jonathan Flynn <jonl.flynn@guardian.co.uk>	2024-03-26 07:13:13 +01:00
Johannes Kolbe	7eb3ba8224	remove quotes in code example (#29812 ) Co-authored-by: Johannes <johannes.kolbe@tech.better.team>	2024-03-25 13:26:54 +00:00
Arthur Zucker	e3e16ddc3c	[`revert commit`] revert `00a09ed448`	2024-03-25 22:01:01 +09:00
Arthur Zucker	00a09ed448	fix 😭	2024-03-25 21:57:31 +09:00
Yuki Watanabe	8e9a2207b3	Populate torch_dtype from model to pipeline (#28940 ) * Populate torch_dtype from model to pipeline Signed-off-by: B-Step62 <yuki.watanabe@databricks.com> * use property Signed-off-by: B-Step62 <yuki.watanabe@databricks.com> * lint Signed-off-by: B-Step62 <yuki.watanabe@databricks.com> * Remove default handling Signed-off-by: B-Step62 <yuki.watanabe@databricks.com> --------- Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>	2024-03-25 10:46:40 +01:00
yhuang	afe73aed54	Fix the behavior of collecting 'num_input_tokens_seen' (#29099 ) fix the behavior of collecting 'num_input_tokens_seen' See https://github.com/huggingface/transformers/issues/28791 for more details.	2024-03-25 10:43:46 +01:00


19383 Commits