transformers/docs/source/en
Jaeyong Sung 583db52bc6
Add Dia model (#38405)
* add dia model

* add tokenizer files

* cleanup some stuff

* brut copy paste code

* rough cleanup of the modeling code

* nuke some stuff

* more nuking

* more cleanups

* updates

* add multiLayerEmbedding vectorization

* nits

* more modeling simplifications

* updates

* update rope

* update rope

* just fixup

* update configuration files

* more cleanup!

* default config values

* update

* forgotten comma

* another comma!

* update, more cleanups

* just more nits

* more config cleanups

* time for the encoder

* fix

* small nit

* nits

* n

* refacto a bit

* cleanup

* update cv script

* fix last issues

* fix last nits

* styling

* small fixes

* just run 1 generation

* fixes

* nits

* fix conversion

* fix

* more fixes

* full generate

* ouf!

* fixes!

* updates

* fix

* fix cvrt

* fixup

* nits

* delete wrong test

* update

* update

* test tokenization

* let's start changing things bit by bit - fix encoder step

* removing custom generation, moving to GenerationMixin

* add encoder decoder attention masks for generation

* mask changes, correctness checked against ad29837 in dia repo

* refactor a bit already --> next cache

* too important not to push :)

* minimal cleanup + more todos

* make main overwrite modeling utils

* add cfg filter & eos filter

* add eos countdown & delay pattern

* update eos countdown

* add max step eos countdown

* fix tests

* fix some things

* fix generation with testing

* move cfg & eos stuff to logits processor

* make RepetitionPenaltyLogitsProcessor flexible

- can accept 3D scores like (batch_size, channel, vocab)

* fix input_ids concatenation dimension in GenerationMixin for flexibility

* Add DiaHangoverLogitsProcessor and DiaExponentialDecayLengthPenalty classes; refactor logits processing in DiaForConditionalGeneration to utilize new configurations and improve flexibility.

* Add stopping criteria

* refactor

* move delay pattern from processor to modeling like musicgen.

- add docs
- change eos countdown to eos delay pattern

* fix processor & fix tests

* refactor types

* refactor imports

* format code

* fix docstring to pass ci

* add docstring to DiaConfig & add DiaModel to test

* fix docstring

* add docstring

* fix some bugs

* check

* porting / merging results from other branch - IMPORTANT: it very likely breaks generation, the goal is to have a proper forward path first

* experimental testing of left padding for first channel

* whoops

* Fix merge to make generation work

* fix cfg filter

* add position ids

* add todos, break things

* revert changes to generation --> we will force 2d but go 3d on custom stuff

* refactor a lot, change prepare decoder ids to work with left padding (needs testing), add todos

* some first fixes to get to 10. in generation

* some more generation fixes / adjustment

* style + rope fixes

* move cfg out, simplify a few things, more todos

* nit

* start working on custom logit processors

* nit

* quick fixes

* cfg top k

* more refactor of logits processing, needs a decision if gen config gets the new attributes or if we move it to config or similar

* lets keep changes to core code minimal, only eos scaling is questionable atm

* simpler eos delay logits processor

* that was for debugging :D

* proof of concept rope

* small fix on device mismatch

* cfg fixes + delay logits max len

* transformers rope

* modular dia

* more cleanup

* keep modeling consistently 3D, generate handles 2D internally

* decoder starts with bos if nothing

* post processing prototype

* style

* lol

* force sample / greedy + fixes on padding

* style

* fixup tokenization

* nits

* revert

* start working on dia tests

* fix a lot of tests

* more test fixes

* nit

* more test fixes + some features to simplify code more

* more cleanup

* forgot that one

* autodocs

* small consistency fixes

* fix regression

* small fixes

* dia feature extraction

* docs

* wip processor

* fix processor order

* processing goes brrr

* transpose before

* small fix

* fix major bug but needs now a closer look into the custom processors esp cfg

* small thing on logits

* nits

* simplify indices and shifts

* add simpler version of padding tests back (temporarily)

* add logit processor tests

* starting tests on processor

* fix mask application during generation

* some fixes on the weights conversion

* style + fixup logits order

* simplify conversion

* nit

* remove padding tests

* nits on modeling

* hmm

* fix tests

* trigger

* probably gonna be reverted, just a quick design around audio tokenizer

* fixup typing

* post merge + more typing

* initial design for audio tokenizer

* more design changes

* nit

* more processor tests and style related things

* add to init

* protect import

* not sure why tbh

* add another protect

* more fixes

* wow

* it aint stopping :D

* another missed type issue

* ...

* change design around audio tokenizer to prioritize init and go for auto - in regards to the review

* change to new causal mask function + docstrings

* change ternary

* docs

* remove todo, i dont think its essential tbh

* remove pipeline as current pipelines do not fit in the current scheme, same as csm

* closer to wrapping up the processor

* text to audio, just for demo purposes (will likely be reverted)

* check if it's this

* save audio function

* ensure no grad

* fixes on prefixed audio, hop length is used via preprocess dac, device fixes

* integration tests (tested locally on a100) + some processor utils / fixes

* style

* nits

* another round of smaller things

* docs + some fixes (generate one might be big)

* mystery solved

* small fix on conversion

* add abstract audio tokenizer, change init check to abstract class

* nits

* update docs + fix some processing :D

* change inheritance scheme for audio tokenizer

* delete dead / unnecessary code in copied generate loop

* last nits on new pipeline behavior (+ todo on tests) + style

* trigger

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Vasqu <antonprogamer@gmail.com>
2025-06-26 11:04:23 +00:00
internal Remove all traces of low_cpu_mem_usage (#38792) 2025-06-12 16:39:33 +02:00
main_classes Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
model_doc Add Dia model (#38405) 2025-06-26 11:04:23 +00:00
quantization Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
reference Enhance Model Loading By Providing Parallelism, Uses Optional Env Flag (#36835) 2025-05-23 16:39:47 +00:00
tasks No more Tuple, List, Dict (#38797) 2025-06-17 19:37:18 +01:00
_config.py Add optimized PixtralImageProcessorFast (#34836) 2024-11-28 16:04:05 +01:00
_redirects.yml Docs / Quantization: Redirect deleted page (#31063) 2024-05-28 18:29:22 +02:00
_toctree.yml Add Dia model (#38405) 2025-06-26 11:04:23 +00:00
accelerate.md change fsdp_strategy to fsdp in TrainingArguments in accelerate doc (#38807) 2025-06-13 15:32:40 +00:00
accelerator_selection.md [docs] add xpu environment variable for gpu selection (#38194) 2025-05-30 16:05:07 +00:00
add_new_model.md No more Tuple, List, Dict (#38797) 2025-06-17 19:37:18 +01:00
add_new_pipeline.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
agents.md [agents] remove agents 🧹 (#37368) 2025-04-11 18:42:37 +01:00
attention_interface.md No more Tuple, List, Dict (#38797) 2025-06-17 19:37:18 +01:00
auto_docstring.md [AutoDocstring] Based on inspect parsing of the signature (#33771) 2025-05-08 17:46:07 -04:00
backbones.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
cache_explanation.md [docs] Format fix (#38414) 2025-06-03 09:53:23 -07:00
chat_extras.md Update chat_extras.md with content correction (#36599) 2025-03-07 13:09:02 +00:00
chat_templating_multimodal.md [chat-template] Unify tests and clean up 🧼 (#37275) 2025-04-10 14:42:32 +02:00
chat_templating_writing.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
chat_templating.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
community.md Fixed Majority of the Typos in transformers[en] Documentation (#33350) 2024-09-09 10:47:24 +02:00
contributing.md Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
conversations.md [chat] generate parameterization powered by GenerationConfig and UX-related changes (#38047) 2025-05-12 14:04:41 +01:00
custom_models.md No more Tuple, List, Dict (#38797) 2025-06-17 19:37:18 +01:00
debugging.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
deepspeed.md chore: Fix typos in docs and examples (#36524) 2025-03-04 13:47:41 +00:00
executorch.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
fast_tokenizers.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
feature_extractors.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
fsdp.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
generation_features.md chore: Fix typos in docs and examples (#36524) 2025-03-04 13:47:41 +00:00
generation_strategies.md Fix custom generate from local directory (#38916) 2025-06-20 17:36:57 +01:00
gguf.md Fix gguf docs (#36601) 2025-03-11 15:29:14 +01:00
glossary.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
how_to_hack_models.md [doc] fix bugs in how_to_hack_models.md (#38198) 2025-05-19 10:37:54 -07:00
hpo_train.md [Nit] Add Note on SigOpt being in Public Archive Mode (#38610) 2025-06-05 14:07:23 -07:00
image_processors.md 🔴 Video processors as a separate class (#35206) 2025-05-12 11:55:51 +02:00
index.md [docs] Update docs moved to the course (#38800) 2025-06-13 12:02:27 -07:00
installation.md byebye torch 2.0 (#37277) 2025-04-07 15:19:47 +02:00
kv_cache.md [docs] update cache docs with new info (#38775) 2025-06-13 07:10:56 +00:00
llm_optims.md [CI] green llama tests (#37244) 2025-04-03 14:15:53 +01:00
llm_tutorial_optimization.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
llm_tutorial.md No more Tuple, List, Dict (#38797) 2025-06-17 19:37:18 +01:00
model_memory_anatomy.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
model_sharing.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
models.md Fix grammatical error in models documentation (#39019) 2025-06-25 14:55:22 +00:00
modular_transformers.md [modular] CLI allows positional arguments, and more defaults names for the optional arg (#38979) 2025-06-23 12:40:01 +02:00
notebooks.md Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
optimizers.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
pad_truncation.md Fixed Majority of the Typos in transformers[en] Documentation (#33350) 2024-09-09 10:47:24 +02:00
peft.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
perf_hardware.md chore: Fix typos in docs and examples (#36524) 2025-03-04 13:47:41 +00:00
perf_infer_cpu.md remove ipex_optimize_model usage (#38632) 2025-06-06 20:04:44 +02:00
perf_infer_gpu_multi.md Fix: make docs work better with doc builder (#38213) 2025-05-20 08:23:03 +00:00
perf_infer_gpu_one.md Small typo lines 47 and 199 perf_infer_gpu_one.md (#37938) 2025-05-06 14:32:55 +01:00
perf_torch_compile.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
perf_train_cpu_many.md remove ipex_optimize_model usage (#38632) 2025-06-06 20:04:44 +02:00
perf_train_cpu.md remove ipex_optimize_model usage (#38632) 2025-06-06 20:04:44 +02:00
perf_train_gaudi.md Add Intel Gaudi doc (#37855) 2025-04-29 13:28:06 -07:00
perf_train_gpu_many.md docs: fix typo (#37567) 2025-04-17 14:54:44 +01:00
perf_train_gpu_one.md [docs] Typos - Single GPU efficient training features (#38964) 2025-06-23 12:33:10 -07:00
perf_train_special.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
perf_train_tpu_tf.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
perplexity.md [docs] use device-agnostic API instead of cuda (#34913) 2024-11-26 09:23:34 -08:00
philosophy.md [docs] fixed links with 404 (#27327) 2023-11-06 19:45:03 +00:00
pipeline_gradio.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
pipeline_tutorial.md chore: Fix typos in docs and examples (#36524) 2025-03-04 13:47:41 +00:00
pipeline_webserver.md fix and enhance pipeline_webserver.md (#36992) 2025-04-15 08:35:05 -07:00
pr_checks.md Fixed Majority of the Typos in transformers[en] Documentation (#33350) 2024-09-09 10:47:24 +02:00
processors.md [docs] add Audio import (#38195) 2025-05-19 13:16:35 +00:00
quicktour.md Add Hugging Face authentication procedure for IDEs (PyCharm, VS Code,… (#38954) 2025-06-24 11:48:15 -07:00
run_scripts.md Remove research projects (#36645) 2025-03-11 13:47:38 +00:00
serialization.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
serving.md fix docs serving typos. (#37936) 2025-05-06 14:32:44 +01:00
testing.md [tests] remove TF tests (uses of require_tf) (#38944) 2025-06-25 17:29:10 +00:00
tf_xla.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
tflite.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
tokenizer_summary.md Use HF papers (#38184) 2025-06-13 11:07:09 +00:00
tools.md [agents] remove agents 🧹 (#37368) 2025-04-11 18:42:37 +01:00
torchscript.md Fix wording in torchscript.md (#38004) 2025-05-08 16:47:45 +01:00
trainer.md feat: add flexible Liger Kernel configuration to TrainingArguments (#38911) 2025-06-19 15:54:08 +00:00
training.md [docs] Redesign (#31757) 2025-03-03 10:33:46 -08:00
troubleshooting.md Update all references to canonical models (#29001) 2024-02-16 08:16:58 +01:00
video_processors.md 🔴 Video processors as a separate class (#35206) 2025-05-12 11:55:51 +02:00