mirror of
https://github.com/huggingface/transformers.git
synced 2025-07-30 01:32:23 +06:00
* first raw version of the bark integration
* working code on small models with single run
* add converting script from suno weights 2 hf
* many changes
* correct past_kv output
* working implementation for inference
* update the converting script according to the architecture changes
* add a working end-to-end inference code
* remove some comments and make small changes
* remove unecessary comment
* add docstrings and ensure no unecessary intermediary output during audio generation
* remove done TODOs
* make style + add config docstrings
* modification for batch inference support on the whole model
* add details to .generation_audio method
* add copyright
* convert EncodecModel from original library to transformers implementation
* add two class in order to facilitate model and sub-models loading from the hub
* add support of loading the whole model
* add BarkProcessor
* correct modeling according to processor output
* Add proper __init__ and auto support
* Add up-to-date copyright/license message
* add relative import instead of absolute
* cleaner head_dim computation
* small comment removal or changes
* more verbose LayerNorm init method
* specify eps for clearer comprehension
* more verbose variable naming in the MLP module
* remove unecessary BarkBlock parameter
* clearer code in the forward pass of the BarkBlock
* remove _initialize_modules method for cleaner code
* Remove unnecessary methods from sub-models
* move code to remove unnecessary function
* rename a variable for clarity and change an assert
* move code and change variable name for clarity
* remove unnecessary asserts
* correct small bug
* correct a comment
* change variable names for clarity
* remove asserts
* change import from absolute to relative
* correct small error due to comma missing + correct import
* Add attribute Bark config
* add first version of tests
* update attention_map
* add tie_weights and resize_token_embeddings for fineModel
* correct getting attention_mask in generate_text_semantic
* remove Bark inference trick
* leave more choices in barkProcessor
* remove _no_split_modules
* fixe error in forward of block and introduce clearer notations
* correct converting script with last changes
* make style + add draft bark.mdx
* correct BarkModelTest::test_generate_text_semantic
* add Bark in main README
* add dummy_pt_objects for Bark
* add missing models in the main init
* correct test_decoder_model_past_with_large_inputs
* disable torchscript test
* change docstring of BarkProcessor
* Add test_processor_bark
* make style
* correct copyrights
* add bark.mdx + make style, quality and consistency
* Apply suggestions from code review Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Remove unnecessary test method
* simply logic of a test
* Only check first ids for slow audio generation
* split full end-to-end generation tests
* remove unneccessary comment
* change submodel names for clearer naming
* remove ModuleDict from modeling_bark
* combine two if statements
* ensure that an edge misued won't happen
* modify variable name
* move code snippet to the right place (coarse instead of semantic)
* change BarkSemanticModule -> BarkSemanticModel
* align BarkProcessor with transformers paradigm
* correct BarkProcessor tests with last commit changes
* change _validate_voice_preset to an instance method instead of a class method
* tie_weights already called with post_init
* add codec_model config to configuration
* update bark modeling tests with recent BarkProcessor changes
* remove SubModelPretrainedModel + change speakers embeddings prompt type in BarkModel
* change absolute imports to relative
* remove TODO
* change docstrings
* add examples to docs and docstrings
* make style
* uses BatchFeature in BarkProcessor insteads of dict
* continue improving docstrings and docs + make style
* correct docstrings examples
* more comprehensible speaker_embeddings load/Save
* rename speaker_embeddings_dict -> speaker_embeddings
* correct bark.mdx + add bark to documentation_tests
* correct docstrings configuration_bark
* integrate last nit suggestions
* integrate BarkGeneration configs
* make style
* remove bark tests from documentation_tests.txt because timeout - tested manually
* add proper generation config initialization
* small bark.mdx documentation changes
* rename bark.mdx -> bark.md
* add torch.no_grad behind BarkModel.generate_audio()
* replace assert by ValueError in convert_suno_to_hf.py
* integrate a series of short comments from reviewer
* move SemanticLogitsProcessors and remove .detach() from Bark docs and docstrings
* actually remove SemanticLogitsProcessor from modeling_bark.oy
* BarkProcessor returns a single output instead of tuple + correct docstrings
* make style + correct bug
* add initializer_range to BarkConfig + correct slow modeling tests
* add .clone() to history_prompt.coarse_prompt to avoid modifying input array
* Making sure no extra "`" are present
* remove extra characters in modeling_bark.py
* Correct output if history_prompt is None
* remove TODOs
* remove ravel comment
* completing generation_configuration_bark.py docstrings
* change docstrings - number of audio codebooks instead of Encodec codebooks
* change 'bias' docstrings in configuration_bark.py
* format code
* rename BarkModel.generate_audio -> BarkModel.generate_speech
* modify AutoConfig instead of EncodecConfig in BarkConfig
* correct AutoConfig wrong init
* refactor BarkModel and sub-models generate_coarse, generate_fine, generate_text_semantic
* remove SemanticLogitsProcessor and replace it with SuppressTokensLogitsProcessor
* move nb_codebook related config arguments to BarkFineConfig
* rename bark.mdx -> bark.md
* correcting BarkModelConfig from_pretrained + remove keys_to_ignore
* correct bark.md with correct hub path
* correct code bug in bark.md
* correct list tokens_to_suppress
* modify Processor to load nested speaker embeddings in a safer way
* correct batch sampling in BarkFineModel.generate_fine
* Apply suggestions from code review Small docstrings correction and code improvements Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* give more details about num_layers in docstrings
* correct indentation mistake
* correct submodelconfig order of docstring variables
* put audio models in alphabetical order in utils/check_repo.my
* remove useless line from test_modeling_bark.py
* makes BarkCoarseModelTest inherits from (ModelTesterMixin, GenerationTesterMixin, unittest.TestCase) instead of BarkSemanticModelTest
* make a Tester class for each sub-model instead of inheriting
* add test_resize_embeddings=True for Bark sub-models
* add Copied from transformers.models.gpt_neo.modeling_gpt_neo.GPTNeoSelfAttention._split_heads
* remove 'Copied fom Bark' comment
* remove unneccessary comment
* change np.min -> min in modeling_bark.py
* refactored all custom layers to have Bark prefix
* add attention_mask as an argument of generate_text_semantic
* refactor sub-models start docstrings to have more precise config class definition
* move _tied_weights_keys overriding
* add docstrings to generate_xxx in modeling_bark.py
* add loading whole BarkModel to convert_suno_to_hf
* refactor attribute and variable names
* make style convert_suno
* update bark checkpoints
* remove never entered if statement
* move bark_modeling docstrings after BarkPretrainedModel class definition
* refactor modeling_bark.py: kv -> key_values
* small nits - code refactoring and removing unecessary lines from _init_weights
* nits - replace inplace method by variable assigning
* remove *optional* when necessary
* remove some lines in generate_speech
* add default value for optional parameter
* Refactor preprocess_histories_before_coarse -> preprocess_histories Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* correct usage after refactoring
* refactor Bark's generate_xxx -> generate and modify docstrings and tests accordingly
* update docstrings python in configuration_bark.py
* add bark files in utils/documentation_test.txt
* correct docstrings python snippet
* add the ability to use parameters in the form of e.g coarse_temperature
* add semantic_max_new_tokens in python snippet in docstrings for quicker generation
* Reformate sub-models kwargs in BakModel.generate Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* correct kwargs in BarkModel.generate
* correct attention_mask kwarg in BarkModel.generate
* add tests for sub-models args in BarkModel.generate and correct BarkFineModel.test_generate_fp16
* enrich BarkModel.generate docstrings with a description of how to use the kwargs

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

File | ||
---|---|---|
albert.md | ||
align.md | ||
altclip.md | ||
audio-spectrogram-transformer.md | ||
auto.md | ||
autoformer.md | ||
bark.md | ||
bart.md | ||
barthez.md | ||
bartpho.md | ||
beit.md | ||
bert-generation.md | ||
bert-japanese.md | ||
bert.md | ||
bertweet.md | ||
big_bird.md | ||
bigbird_pegasus.md | ||
biogpt.md | ||
bit.md | ||
blenderbot-small.md | ||
blenderbot.md | ||
blip-2.md | ||
blip.md | ||
bloom.md | ||
bort.md | ||
bridgetower.md | ||
byt5.md | ||
camembert.md | ||
canine.md | ||
chinese_clip.md | ||
clap.md | ||
clip.md | ||
clipseg.md | ||
codegen.md | ||
conditional_detr.md | ||
convbert.md | ||
convnext.md | ||
convnextv2.md | ||
cpm.md | ||
cpmant.md | ||
ctrl.md | ||
cvt.md | ||
data2vec.md | ||
deberta-v2.md | ||
deberta.md | ||
decision_transformer.md | ||
deformable_detr.md | ||
deit.md | ||
deplot.md | ||
deta.md | ||
detr.md | ||
dialogpt.md | ||
dinat.md | ||
distilbert.md | ||
dit.md | ||
donut.md | ||
dpr.md | ||
dpt.md | ||
efficientformer.md | ||
efficientnet.md | ||
electra.md | ||
encodec.md | ||
encoder-decoder.md | ||
ernie_m.md | ||
ernie.md | ||
esm.md | ||
flan-t5.md | ||
flan-ul2.md | ||
flaubert.md | ||
flava.md | ||
fnet.md | ||
focalnet.md | ||
fsmt.md | ||
funnel.md | ||
git.md | ||
glpn.md | ||
gpt_bigcode.md | ||
gpt_neo.md | ||
gpt_neox_japanese.md | ||
gpt_neox.md | ||
gpt-sw3.md | ||
gpt2.md | ||
gptj.md | ||
gptsan-japanese.md | ||
graphormer.md | ||
groupvit.md | ||
herbert.md | ||
hubert.md | ||
ibert.md | ||
imagegpt.md | ||
informer.md | ||
instructblip.md | ||
jukebox.md | ||
layoutlm.md | ||
layoutlmv2.md | ||
layoutlmv3.md | ||
layoutxlm.md | ||
led.md | ||
levit.md | ||
lilt.md | ||
llama.md | ||
longformer.md | ||
longt5.md | ||
luke.md | ||
lxmert.md | ||
m2m_100.md | ||
marian.md | ||
markuplm.md | ||
mask2former.md | ||
maskformer.md | ||
matcha.md | ||
mbart.md | ||
mctct.md | ||
mega.md | ||
megatron_gpt2.md | ||
megatron-bert.md | ||
mgp-str.md | ||
mluke.md | ||
mms.md | ||
mobilebert.md | ||
mobilenet_v1.md | ||
mobilenet_v2.md | ||
mobilevit.md | ||
mobilevitv2.md | ||
mpnet.md | ||
mra.md | ||
mt5.md | ||
musicgen.md | ||
mvp.md | ||
nat.md | ||
nezha.md | ||
nllb-moe.md | ||
nllb.md | ||
nystromformer.md | ||
oneformer.md | ||
open-llama.md | ||
openai-gpt.md | ||
opt.md | ||
owlvit.md | ||
pegasus_x.md | ||
pegasus.md | ||
perceiver.md | ||
phobert.md | ||
pix2struct.md | ||
plbart.md | ||
poolformer.md | ||
prophetnet.md | ||
qdqbert.md | ||
rag.md | ||
realm.md | ||
reformer.md | ||
regnet.md | ||
rembert.md | ||
resnet.md | ||
retribert.md | ||
roberta-prelayernorm.md | ||
roberta.md | ||
roc_bert.md | ||
roformer.md | ||
rwkv.md | ||
sam.md | ||
segformer.md | ||
sew-d.md | ||
sew.md | ||
speech_to_text_2.md | ||
speech_to_text.md | ||
speech-encoder-decoder.md | ||
speecht5.md | ||
splinter.md | ||
squeezebert.md | ||
swiftformer.md | ||
swin.md | ||
swin2sr.md | ||
swinv2.md | ||
switch_transformers.md | ||
t5.md | ||
t5v1.1.md | ||
table-transformer.md | ||
tapas.md | ||
tapex.md | ||
time_series_transformer.md | ||
timesformer.md | ||
trajectory_transformer.md | ||
transfo-xl.md | ||
trocr.md | ||
tvlt.md | ||
ul2.md | ||
umt5.md | ||
unispeech-sat.md | ||
unispeech.md | ||
upernet.md | ||
van.md | ||
videomae.md | ||
vilt.md | ||
vision-encoder-decoder.md | ||
vision-text-dual-encoder.md | ||
visual_bert.md | ||
vit_hybrid.md | ||
vit_mae.md | ||
vit_msn.md | ||
vit.md | ||
vivit.md | ||
wav2vec2_phoneme.md | ||
wav2vec2-conformer.md | ||
wav2vec2.md | ||
wavlm.md | ||
whisper.md | ||
xclip.md | ||
xglm.md | ||
xlm-prophetnet.md | ||
xlm-roberta-xl.md | ||
xlm-roberta.md | ||
xlm-v.md | ||
xlm.md | ||
xlnet.md | ||
xls_r.md | ||
xlsr_wav2vec2.md | ||
xmod.md | ||
yolos.md | ||
yoso.md |