transformers/docs/source
Minho Ryu eca74d1367
[WIP] add deepseek-v3 (#35926)
* init commit

* style

* take comments into account

* add deepseekv3 modeling

* remove redundant code

* apply make style

* apply fix-copies

* make format

* add init files

* rename deepseekv3 into deepseek_v3 based on its model_type

* rename deepseekv3 into deepseek_v3 based on its model_type

* deepseek-v3 not deepseek_v3

* set model_type as deepseek_v3

* use default docs

* apply make

* fill type and docstring

* add rope_config_validation
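
For context, `rope_config_validation` is the shared helper in `transformers.modeling_rope_utils` that checks a config's `rope_scaling` dict. A minimal sketch of how it is typically wired into a config `__init__` (the config class here is an illustrative stand-in, not the actual DeepseekV3Config):

```python
from transformers import PretrainedConfig
from transformers.modeling_rope_utils import rope_config_validation


class TinyRopeConfig(PretrainedConfig):
    # Illustrative stand-in config, not the actual DeepseekV3Config.
    model_type = "tiny_rope"

    def __init__(self, rope_theta=10000.0, rope_scaling=None, **kwargs):
        self.rope_theta = rope_theta
        self.rope_scaling = rope_scaling
        # Fails fast on an unknown rope_type or missing scaling fields,
        # instead of erroring later inside the rotary embedding.
        rope_config_validation(self)
        super().__init__(**kwargs)
```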

* use custom DeepseekV3MLP

* hold code only for checkpoint configuration; remove redundant code

* revise rope yarn for DeepSeek variation

* rename DeepSeek-V3

* some refactoring

* revise load_hook to work properly; make moe func trainable; use llama instead of mixtral

* fix attention forward

* use -1 for unchanged dims when using expand
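
A quick illustration of the `expand` semantics referenced here: passing -1 leaves that dimension unchanged, and the result is a broadcast view rather than a copy.

```python
import torch

scores = torch.randn(2, 8, 1)       # (batch, experts, 1)
# -1 means "keep this dimension as-is"; only the last dim is broadcast.
# expand returns a view, so no data is copied.
tiled = scores.expand(-1, -1, 4)    # (2, 8, 4)
assert tiled.shape == (2, 8, 4)
```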

* refactor DeepseekV3TopkRouter
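
For readers following along, a minimal sketch of what a top-k router of this shape looks like; the class name, sigmoid scoring, and normalization are illustrative, not the exact `DeepseekV3TopkRouter` implementation:

```python
import torch
from torch import nn


class TopkRouterSketch(nn.Module):
    # Minimal sketch of a top-k MoE router; names and shapes are
    # illustrative, not the exact DeepseekV3TopkRouter API.
    def __init__(self, hidden_size: int, n_experts: int, top_k: int):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(hidden_size, n_experts, bias=False)

    def forward(self, hidden_states: torch.Tensor):
        # Sigmoid keeps each expert's score independent of the others.
        scores = self.gate(hidden_states).sigmoid()
        topk_weights, topk_idx = scores.topk(self.top_k, dim=-1)
        # Normalize the selected weights so they sum to 1 per token.
        topk_weights = topk_weights / topk_weights.sum(dim=-1, keepdim=True)
        return topk_idx, topk_weights
```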

* use reshape_for_rope instead of load_hook; revise attention forward for TP; rename q_head_dim to qk_head_dim

* register both pre_hook and hook

* make style

* use n_shared_experts

* Update src/transformers/models/deepseek_v3/configuration_deepseek_v3.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* add test file

* update modeling_file according to modular file

* make style

* add mapping for DeepseekV3ForSequenceClassification

* remove aux_loss_alpha

* add deepseek_v3 for perf

* add deepseek_v3

* rename test to deepseekv3

* use tiny-deepseek-v3

* remove DeepseekV3ForSequenceClassification

* cache before padding

* remove output_router_logits

* Revert "remove output_router_logits"

This reverts commit f264f800d0.

* remove output_router_logits

* make e_score_correction_bias a buffer
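
A short sketch of why a buffer fits here, assuming the bias is updated outside of gradient descent (names are illustrative):

```python
import torch
from torch import nn


class RouterBiasSketch(nn.Module):
    # Sketch: the correction bias as a buffer rather than an nn.Parameter.
    def __init__(self, n_experts: int):
        super().__init__()
        # A buffer travels with .to()/.cuda() and is saved in the state dict,
        # but has no gradient, so the optimizer never touches it.
        self.register_buffer("e_score_correction_bias", torch.zeros(n_experts))

    def select(self, scores: torch.Tensor, top_k: int = 2):
        # The bias shifts which experts win top-k, but the returned routing
        # weights still come from the unbiased scores.
        _, idx = (scores + self.e_score_correction_bias).topk(top_k, dim=-1)
        return idx, scores.gather(-1, idx)
```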

* skip tests not compatible

* make style

* make e_score_correction_bias a buffer

* use rope_interleave instead of load_hook
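
The idea: the checkpoint stores the rotary dimensions pairwise-interleaved, so rather than permuting weights in a state-dict load hook, the layout can be rearranged on the fly at forward time. A hedged sketch of that reordering (shapes and naming are illustrative):

```python
import torch


def deinterleave_rope(x: torch.Tensor) -> torch.Tensor:
    # Reorder the head dim from the pairwise-interleaved layout
    # [x0, y0, x1, y1, ...] stored in the checkpoint to the half-split
    # layout [x0, x1, ..., y0, y1, ...] expected by rotate_half,
    # at forward time instead of in a state-dict load hook.
    b, h, s, d = x.shape
    return x.view(b, h, s, d // 2, 2).transpose(-1, -2).reshape(b, h, s, d)
```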

* skip tests not compatible with MLA

* add doc for rope_interleave

* fix typo

* remove torch.no_grad for selecting topk
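
This is safe because top-k indices never carry gradients anyway, while the selected values stay on the autograd graph; a small demonstration:

```python
import torch

# Top-k indices are integers and never carry gradients, while the selected
# values stay on the autograd graph, so no torch.no_grad() guard is needed.
logits = torch.randn(4, 8, requires_grad=True)
weights, idx = logits.sigmoid().topk(2, dim=-1)
weights.sum().backward()
print(logits.grad.shape)  # torch.Size([4, 8])
```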

* fix post merge issue

* merge with main and simplify

* nits

* final

* small fixes

* fix

* support TP better

* stash

* changes currently required

* remove synch

* more fixes for TP

* temp fix for TP: some attention layers' FP8 scales are too small + the shared expert is local colwise and anything is local if FP8 because the weights are used

* updates to have generation work!

* push most of the changes

* reorder functions + call for contributions!

* update readme

* nits

* update

* ruff was updated on main

* merge with main and fix copies

* revert unrelated changes

* route all tokens to all experts when testing to avoid no-gradient issues
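
A sketch of the test-side trick, assuming `DeepseekV3Config` exposes `n_routed_experts` and `num_experts_per_tok` (tiny sizes are illustrative test values):

```python
from transformers import DeepseekV3Config

# With top-k equal to the number of routed experts, every token reaches
# every expert, so no expert parameter is left without a gradient.
# Sizes here are illustrative test values.
config = DeepseekV3Config(n_routed_experts=8, num_experts_per_tok=8)
```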

* finish fixing all tests

* fixup

* nit

* clean config

* last readme changes

* nit

* doc nit

* typo

* last nit

* one more one more

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: arthur@huggingface.co <arthur@ip-26-0-165-131.ec2.internal>
2025-03-28 15:56:59 +01:00
| Name | Last commit | Last updated |
|------|-------------|--------------|
| ar | Remove research projects (#36645) | 2025-03-11 13:47:38 +00:00 |
| de | Fix typos (#36910) | 2025-03-24 14:08:29 +00:00 |
| en | [WIP] add deepseek-v3 (#35926) | 2025-03-28 15:56:59 +01:00 |
| es | Fix typos (#36910) | 2025-03-24 14:08:29 +00:00 |
| fr | Remove research projects (#36645) | 2025-03-11 13:47:38 +00:00 |
| hi | [i18n-HI] Translated TFLite page to Hindi (#34572) | 2024-11-04 09:40:30 -08:00 |
| it | Fix typos (#36910) | 2025-03-24 14:08:29 +00:00 |
| ja | Just import torch AdamW instead (#36177) | 2025-03-19 18:29:40 +00:00 |
| ko | 🌐 [i18n-KO] Translated codegen.md to Korean (#36698) | 2025-03-14 09:31:18 -07:00 |
| ms | Remove research projects (#36645) | 2025-03-11 13:47:38 +00:00 |
| pt | Fix typos (#36910) | 2025-03-24 14:08:29 +00:00 |
| te | Fix typos in translated quicktour docs (#35302) | 2024-12-17 09:32:00 -08:00 |
| tr | Translate index.md to Turkish (#27093) | 2023-11-08 08:35:20 -05:00 |
| zh | Just import torch AdamW instead (#36177) | 2025-03-19 18:29:40 +00:00 |
| _config.py | [#29174] ImportError Fix: Trainer with PyTorch requires accelerate>=0.20.1 Fix (#29888) | 2024-04-08 14:21:16 +01:00 |