eustlb
|
798f948e88
|
Add CSM model (#36719)
* draft structure
* depth decoder with forward pre hook
* full model forward draft
* draft update
* depth decoder update
* ConversationalSpeechModelForCausalLM updates
* add generate
* max length criteria small fix
* update
* updates
* generation update
* update in loss compute
* conversion script
* update for correct input embeddings
* handle interleaved rope
* update
* update
* update
* support compile
* update training
* add doc
* update doc
* correct inits
* ConversationalSpeechModel -> Csm
* conf update
* name update
* tests CsmForCausalLMTest
* convert use cached_file
* conf + modeling updates
* generate utils handle third dim shape
* integration test
* modeling + conf updates
* common test handle more than 2 dims
* add nested audio list utils
* processing handle nested audio list
* csm processing draft
* mimi util
* init updates
* modular update
* convert modular
* processing update
* csm tests update
* generate tests handle third dim
* generate utils handle third dim
* propagate _get_initial_cache_position update
* tied_weight_keys update + convert correctly
* fix inputs_embeds
* revert audio nested list
* batch inference update + return audio
* audio_utils update
* processor update
* some more integration tests
* remove old test
* processing output labels
* improve
* fix
* update rope values with equivalent ones
* conversion update
* update tests
* handle depth decoder generation config
* remove default eos_token_id
* make style
* revert modeling_mimi
* add default generation_config
* remove sdpa since handled by default
* make
* fix conflict
* fix conflicts
* correct naming
* correct imports
* make
* causal -> conditional naming
* causal -> conditional naming
* auto update
* make
* make
* add doc
* test update
* fix weight init
* audio tokens offsets as buffer
* 4d mask in conditional class
* make
* doc update
* fix causal mask
* fix causal mask
* doc update
* doc update
* add processor doc
* update doc
* fix 4d causal mask
* update make_list_of_audio
* do not default to mutable
* remove duplicates
* remove useless reset_parameters
* use GradientCheckpointingLayer
* use can_return_tuple
* formatting
* prepend placeholder in _sample
* torch compile fix
* some more fixes
* convert modular
* fix
* default max_length in convert
* handle depth decoder generation config correctly
* clearer formulation
* handle output_loading_info
* handle softmax warning
* add doc
* propagate _get_initial_cache_position changes
* generation in its own module
* add processor tests
* fix compile with cuda graphs
* fix compile with cuda graphs
* add csm.md
* include CSM loss
* doc nit
* doc nit
* doc nit
* Update docs/source/en/model_doc/csm.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add save_audio to processor
* Update src/transformers/models/csm/modular_csm.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* doc update
* simplify audio_codes_mask computation
* doc update
* simplify loss computation
* fix static cache test
* fix
* remove comment
* simplify encoded length computation
* use hf-internal-testing
* doc update
* cast to float before numpy
* nit
* mem efficient codebook head
* nit
* cat input values with cutoffs
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
|
2025-05-07 10:20:13 -04:00 |
|