transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-13 09:40:06 +06:00

Minho Ryu eca74d1367 [WIP] add deepseek-v3 (#35926)	2025-03-28 15:56:59 +01:00
* init commit
* style
* take comments into account
* add deepseekv3 modeling
* remove redundant code
* apply make style
* apply fix-copies
* make format
* add init files
* rename deepseekv3 into deepseek_v3 based on its model_type
* rename deepseekv3 into deepseek_v3 based on its model_type
* deepseek-v3 not deepseek_v3
* set model_type as deepseek_v3
* use default docs
* apply make
* fill type and docstring
* add rope_config_validation
* use custom DeepseekV3MLP
* hold code only for checkpoints congifuration; remove redundant
* revise rope yarn for DeepSeek variation
* rename DeepSeek-V3
* some refactoring
* revise load_hook to work properly; make moe func trainable; use llama instead of mixtral
* fix attention forward
* use -1 for not-changing dim when to use exapnd
* refactor DeepseekV3TopkRouter
* use reshape_for_rope instead of load_hook; revise attention forward for TP; rename q_head_dim with qk_head_dim
* register pre_hook and hook both
* make style
* use n_shared_experts
* Update src/transformers/models/deepseek_v3/configuration_deepseek_v3.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add test file
* update modeling_file according to modular file
* make style
* add mapping for DeepseekV3ForSequenceClassification
* remove aux_loss_alpha
* add deepseek_v3 for perf
* add deepseek_v3
* rename test as deepseekv3
* use tiny-deepseek-v3
* remove DeepseekV3ForSequenceClassification
* cache before padding
* remote output_router_logits
* Revert "remote output_router_logits"
  This reverts commit `f264f800d0`.
* remove output_router_logits
* make e_score_correction_bias as buffer
* skip tests not compatible
* make style
* make e_score_correction_bias as buffer
* use rope_interleave instead of load_hook
* skip tests not compatible with MLA
* add doc for rope_interleave
* fix typo
* remove torch.no_grad for selecting topk
* fix post merge issue
* mrege with main and simplify
* nits
* final
* small fixes
* fix
* support TP better
* stash
* changes currently requires
* remove synch
* more fixes for TP
* temp fix for TP : some attention layers's FP8 scales are too small + shared is local colwise and anything is local if FP8 because weights are used
* updates to have generation work!
* push most of the changes
* reorder functions + call for contributions!
* update readme
* nits
* update
* ruff was updated on main
* merge with main and fix copies
* revert unrelated changes
* route all tokens to all experts when testing to avoid no gradient iddues
* finish fixing all tests
* fixup
* nit
* clean config
* last readme changes
* nit
* do cnit
* typo
* last nit
* one more one more
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: arthur@huggingface.co <arthur@ip-26-0-165-131.ec2.internal>

internal	Allow easy registration of custom attention functions (#36889)	2025-03-26 16:15:06 +01:00
main_classes	Support loading Quark quantized models in Transformers (#36372)	2025-03-20 15:40:51 +01:00
model_doc	[WIP] add deepseek-v3 (#35926)	2025-03-28 15:56:59 +01:00
quantization	Support loading Quark quantized models in Transformers (#36372)	2025-03-20 15:40:51 +01:00
tasks	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
_config.py	Add optimized `PixtralImageProcessorFast` (#34836)	2024-11-28 16:04:05 +01:00
_redirects.yml	Docs / Quantization: Redirect deleted page (#31063)	2024-05-28 18:29:22 +02:00
_toctree.yml	[WIP] add deepseek-v3 (#35926)	2025-03-28 15:56:59 +01:00
accelerate.md	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
add_new_model.md	Add support for fast image processors in add-new-model-like CLI (#36313)	2025-03-13 14:16:37 -04:00
add_new_pipeline.md	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
agents.md	chore: Fix typos in docs and examples (#36524)	2025-03-04 13:47:41 +00:00
attention_interface.md	Allow easy registration of custom attention functions (#36889)	2025-03-26 16:15:06 +01:00
attention.md	[Docs] Fix broken links and syntax issues (#28918)	2024-02-08 14:13:35 -08:00
backbones.md	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
cache_explanation.md	Fix typos (#36910)	2025-03-24 14:08:29 +00:00
chat_extras.md	Update chat_extras.md with content correction (#36599)	2025-03-07 13:09:02 +00:00
chat_templating_multimodal.md	Fix typos (#36910)	2025-03-24 14:08:29 +00:00
chat_templating_writing.md	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
chat_templating.md	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
community.md	Fixed Majority of the Typos in `transformers[en]` Documentation (#33350)	2024-09-09 10:47:24 +02:00
contributing.md	Enable doc in Spanish (#16518)	2022-04-04 10:25:46 -04:00
conversations.md	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
custom_models.md	Fix typos (#36910)	2025-03-24 14:08:29 +00:00
debugging.md	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
deepspeed.md	chore: Fix typos in docs and examples (#36524)	2025-03-04 13:47:41 +00:00
executorch.md	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
fast_tokenizers.md	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
feature_extractors.md	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
fsdp.md	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
generation_features.md	chore: Fix typos in docs and examples (#36524)	2025-03-04 13:47:41 +00:00
generation_strategies.md	fix typos in the docs directory (#36639)	2025-03-11 09:41:41 -07:00
gguf.md	Fix gguf docs (#36601)	2025-03-11 15:29:14 +01:00
glossary.md	Fix typos (#31819)	2024-07-08 11:52:47 +01:00
gpu_selection.md	Fix typos (#36910)	2025-03-24 14:08:29 +00:00
how_to_hack_models.md	fix typos in the docs directory (#36639)	2025-03-11 09:41:41 -07:00
hpo_train.md	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
image_processors.md	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
index.md	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
installation.md	Update installation.md (#36826)	2025-03-21 16:32:02 -07:00
kv_cache.md	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
llm_optims.md	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
llm_tutorial_optimization.md	fix typos in the docs directory (#36639)	2025-03-11 09:41:41 -07:00
llm_tutorial.md	chore: Fix typos in docs and examples (#36524)	2025-03-04 13:47:41 +00:00
model_memory_anatomy.md	Enable BNB multi-backend support (#31098)	2024-09-24 03:40:56 -06:00
model_sharing.md	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
model_summary.md	model_summary.md - Restore link to Harvard's Annotated Transformer. (#29702)	2024-03-23 18:29:39 -07:00
models.md	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
modular_transformers.md	Support custom dosctrings in modular (#36726)	2025-03-18 14:00:54 -04:00
notebooks.md	Enable doc in Spanish (#16518)	2022-04-04 10:25:46 -04:00
optimizers.md	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
pad_truncation.md	Fixed Majority of the Typos in `transformers[en]` Documentation (#33350)	2024-09-09 10:47:24 +02:00
peft.md	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
perf_hardware.md	chore: Fix typos in docs and examples (#36524)	2025-03-04 13:47:41 +00:00
perf_infer_cpu.md	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
perf_infer_gpu_multi.md	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
perf_infer_gpu_one.md	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
perf_torch_compile.md	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
perf_train_cpu_many.md	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
perf_train_cpu.md	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
perf_train_gpu_many.md	Mention UltraScale Playbook 🌌 in docs (#36589)	2025-03-06 14:48:11 -08:00
perf_train_gpu_one.md	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
perf_train_special.md	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
perf_train_tpu_tf.md	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
perplexity.md	[docs] use device-agnostic API instead of cuda (#34913)	2024-11-26 09:23:34 -08:00
philosophy.md	[docs] fixed links with 404 (#27327)	2023-11-06 19:45:03 +00:00
pipeline_gradio.md	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
pipeline_tutorial.md	chore: Fix typos in docs and examples (#36524)	2025-03-04 13:47:41 +00:00
pipeline_webserver.md	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
pr_checks.md	Fixed Majority of the Typos in `transformers[en]` Documentation (#33350)	2024-09-09 10:47:24 +02:00
processors.md	[docs] Fix image link (#36869)	2025-03-25 11:34:21 -07:00
quicktour.md	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
run_scripts.md	Remove research projects (#36645)	2025-03-11 13:47:38 +00:00
serialization.md	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
serving.md	[docs] Serving LLMs (#36522)	2025-03-10 13:14:19 -07:00
task_summary.md	[doctest] Fixes (#35863)	2025-01-26 15:26:38 -08:00
tasks_explained.md	fix: Wrong task mentioned in docs (#34757)	2024-11-18 18:42:28 +00:00
testing.md	chore: Fix typos in docs and examples (#36524)	2025-03-04 13:47:41 +00:00
tf_xla.md	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
tflite.md	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
tokenizer_summary.md	[docs] Spanish translation of tokenizer_summary.md (#31154)	2024-06-03 16:52:23 -07:00
tools.md	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
torchscript.md	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
trainer.md	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
training.md	[docs] Redesign (#31757)	2025-03-03 10:33:46 -08:00
troubleshooting.md	Update all references to canonical models (#29001)	2024-02-16 08:16:58 +01:00