mirror of
https://github.com/huggingface/transformers.git
synced 2025-07-06 22:30:09 +06:00
* first raw commit
* still POC
* tentative convert script
* almost working speech encoder conversion scripts
* intermediate code for encoder/decoders
* add modeling code
* first version of speech encoder
* make style
* add new adapter layer architecture
* add adapter block
* add first tentative config
* add working speech encoder conversion
* base model convert works now
* make style
* remove unnecessary classes
* remove unecessary functions
* add modeling code speech encoder
* rework logics
* forward pass of sub components work
* add modeling codes
* some config modifs and modeling code modifs
* save WIP
* new edits
* same output speech encoder
* correct attention mask
* correct attention mask
* fix generation
* new generation logics
* erase comments
* make style
* fix typo
* add some descriptions
* new state
* clean imports
* add tests
* make style
* make beam search and num_return_sequences>1 works
* correct edge case issue
* correct SeamlessM4TConformerSamePadLayer copied from
* replace ACT2FN relu by nn.relu
* remove unecessary return variable
* move back a class
* change name conformer_attention_mask ->conv_attention_mask
* better nit code
* add some Copied from statements
* small nits
* small nit in dict.get
* rename t2u model -> conditionalgeneration
* ongoing refactoring of structure
* update models architecture
* remove SeamlessM4TMultiModal classes
* add tests
* adapt tests
* some non-working code for vocoder
* add seamlessM4T vocoder
* remove buggy line
* fix some hifigan related bugs
* remove hifigan specifc config
* change
* add WIP tokenization
* add seamlessM4T working tokenzier
* update tokenization
* add tentative feature extractor
* Update converting script
* update working FE
* refactor input_values -> input_features
* update FE
* changes in generation, tokenizer and modeling
* make style and add t2u_decoder_input_ids
* add intermediate outputs for ToSpeech models
* add vocoder to speech models
* update valueerror
* update FE with languages
* add vocoder convert
* update config docstrings and names
* update generation code and configuration
* remove todos and update config.pad_token_id to generation_config.pad_token_id
* move block vocoder
* remove unecessary code and uniformize tospeech code
* add feature extractor import
* make style and fix some copies from
* correct consistency + make fix-copies
* add processor code
* remove comments
* add fast tokenizer support
* correct pad_token_id in M4TModel
* correct config
* update tests and codes + make style
* make some suggested correstion - correct comments and change naming
* rename some attributes
* rename some attributes
* remove unecessary sequential
* remove option to use dur predictor
* nit
* refactor hifigan
* replace normalize_mean and normalize_var with do_normalize + save lang ids to generation config
* add tests
* change tgt_lang logic
* update generation ToSpeech
* add support import SeamlessM4TProcessor
* fix generate
* make tests
* update integration tests, add option to only return text and update tokenizer fast
* fix wrong function call
* update import and convert script
* update integration tests + update repo id
* correct paths and add first test
* update how new attention masks are computed
* update tests
* take first care of batching in vocoder code
* add batching with the vocoder
* add waveform lengths to model outputs
* make style
* add generate kwargs + forward kwargs of M4TModel
* add docstrings forward methods
* reformate docstrings
* add docstrings t2u model
* add another round of modeling docstrings + reformate speaker_id -> spkr_id
* make style
* fix check_repo
* make style
* add seamlessm4t to toctree
* correct check_config_attributes
* write config docstrings + some modifs
* make style
* add docstrings tokenizer
* add docstrings to processor, fe and tokenizers
* make style
* write first version of model docs
* fix FE + correct FE test
* fix tokenizer + add correct integration tests
* fix most tokenization tests
* make style
* correct most processor test
* add generation tests and fix num_return_sequences > 1
* correct integration tests -still one left
* make style
* correct position embedding
* change numbeams to 1
* refactor some modeling code and correct one test
* make style
* correct typo
* refactor intermediate fnn
* refactor feedforward conformer
* make style
* remove comments
* make style
* fix tokenizer tests
* make style
* correct processor tests
* make style
* correct S2TT integration
* Apply suggestions from Sanchit code review
  Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* correct typo
* replace torch.nn->nn + make style
* change Output naming (waveforms -> waveform) and ordering
* nit renaming and formating
* remove return None when not necessary
* refactor SeamlessM4TConformerFeedForward
* nit typo
* remove almost copied from comments
* add a copied from comment and remove an unecessary dropout
* remove inputs_embeds from speechencoder
* remove backward compatibiliy function
* reformate class docstrings for a few components
* remove unecessary methods
* split over 2 lines smthg hard to read
* make style
* replace two steps offset by one step as suggested
* nice typo
* move warnings
* remove useless lines from processor
* make generation non-standard test more robusts
* remove torch.inference_mode from tests
* split integration tests
* enrich md
* rename control_symbol_vocoder_offset->vocoder_offset
* clean convert file
* remove tgt_lang and src_lang from FE
* change generate docstring of ToText models
* update generate docstring of tospeech models
* unify how to deal withtext_decoder_input_ids
* add default spkr_id
* unify tgt_lang for t2u_model
* simplify tgt_lang verification
* remove a todo
* change config docstring
* make style
* simplify t2u_tgt_lang_id
* make style
* enrich/correct comments
* enrich .md
* correct typo in docstrings
* add torchaudio dependency
* update tokenizer
* make style and fix copies
* modify SeamlessM4TConverter with new tokenizer behaviour
* make style
* correct small typo docs
* fix import
* update docs and add requirement to tests
* add convert_fairseq2_to_hf in utils/not_doctested.txt
* update FE
* fix imports and make style
* remove torchaudio in FE test
* add seamless_m4t.md to utils/not_doctested.txt
* nits and change the way docstring dataset is loaded
* move checkpoints from ylacombe/ to facebook/ orga
* refactor warning/error to be in the 119 line width limit
* round overly precised floats
* add stereo audio behaviour
* refactor .md and make style
* enrich docs with more precised architecture description
* readd undocumented models
* make fix-copies
* apply some suggestions
* Apply suggestions from code review
  Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* correct bug from previous commit
* refactor a parameter allowing to clean the code + some small nits
* clean tokenizer
* make style and fix
* make style
* clean tokenizers arguments
* add precisions for some tests
* move docs from not_tested to slow
* modify tokenizer according to last comments
* add copied from statements in tests
* correct convert script
* correct parameter docstring style
* correct tokenization
* correct multi gpus
* make style
* clean modeling code
* make style
* add copied from statements
* add copied statements
* add support with ASR pipeline
* remove file added inadvertently
* fix docstrings seamlessM4TModel
* add seamlessM4TConfig to OBJECTS_TO_IGNORE due of unconventional markdown
* add seamlessm4t to assisted generation ignored models

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
albert.md
align.md
altclip.md
audio-spectrogram-transformer.md
auto.md
autoformer.md
bark.md
bart.md
barthez.md
bartpho.md
beit.md
bert-generation.md
bert-japanese.md
bert.md
bertweet.md
big_bird.md
bigbird_pegasus.md
biogpt.md
bit.md
blenderbot-small.md
blenderbot.md
blip-2.md
blip.md
bloom.md
bort.md
bridgetower.md
bros.md
byt5.md
camembert.md
canine.md
chinese_clip.md
clap.md
clip.md
clipseg.md
code_llama.md
codegen.md
conditional_detr.md
convbert.md
convnext.md
convnextv2.md
cpm.md
cpmant.md
ctrl.md
cvt.md
data2vec.md
deberta-v2.md
deberta.md
decision_transformer.md
deformable_detr.md
deit.md
deplot.md
deta.md
detr.md
dialogpt.md
dinat.md
dinov2.md
distilbert.md
dit.md
donut.md
dpr.md
dpt.md
efficientformer.md
efficientnet.md
electra.md
encodec.md
encoder-decoder.md
ernie_m.md
ernie.md
esm.md
falcon.md
flan-t5.md
flan-ul2.md
flaubert.md
flava.md
fnet.md
focalnet.md
fsmt.md
funnel.md
fuyu.md
git.md
glpn.md
gpt_bigcode.md
gpt_neo.md
gpt_neox_japanese.md
gpt_neox.md
gpt-sw3.md
gpt2.md
gptj.md
gptsan-japanese.md
graphormer.md
groupvit.md
herbert.md
hubert.md
ibert.md
idefics.md
imagegpt.md
informer.md
instructblip.md
jukebox.md
layoutlm.md
layoutlmv2.md
layoutlmv3.md
layoutxlm.md
led.md
levit.md
lilt.md
llama.md
llama2.md
longformer.md
longt5.md
luke.md
lxmert.md
m2m_100.md
marian.md
markuplm.md
mask2former.md
maskformer.md
matcha.md
mbart.md
mctct.md
mega.md
megatron_gpt2.md
megatron-bert.md
mgp-str.md
mistral.md
mluke.md
mms.md
mobilebert.md
mobilenet_v1.md
mobilenet_v2.md
mobilevit.md
mobilevitv2.md
mpnet.md
mpt.md
mra.md
mt5.md
musicgen.md
mvp.md
nat.md
nezha.md
nllb-moe.md
nllb.md
nougat.md
nystromformer.md
oneformer.md
open-llama.md
openai-gpt.md
opt.md
owlv2.md
owlvit.md
pegasus_x.md
pegasus.md
perceiver.md
persimmon.md
phobert.md
pix2struct.md
plbart.md
poolformer.md
pop2piano.md
prophetnet.md
pvt.md
qdqbert.md
rag.md
realm.md
reformer.md
regnet.md
rembert.md
resnet.md
retribert.md
roberta-prelayernorm.md
roberta.md
roc_bert.md
roformer.md
rwkv.md
sam.md
seamless_m4t.md
segformer.md
sew-d.md
sew.md
speech_to_text_2.md
speech_to_text.md
speech-encoder-decoder.md
speecht5.md
splinter.md
squeezebert.md
swiftformer.md
swin.md
swin2sr.md
swinv2.md
switch_transformers.md
t5.md
t5v1.1.md
table-transformer.md
tapas.md
tapex.md
time_series_transformer.md
timesformer.md
trajectory_transformer.md
transfo-xl.md
trocr.md
tvlt.md
ul2.md
umt5.md
unispeech-sat.md
unispeech.md
upernet.md
van.md
videomae.md
vilt.md
vision-encoder-decoder.md
vision-text-dual-encoder.md
visual_bert.md
vit_hybrid.md
vit_mae.md
vit_msn.md
vit.md
vitdet.md
vitmatte.md
vits.md
vivit.md
wav2vec2_phoneme.md
wav2vec2-conformer.md
wav2vec2.md
wavlm.md
whisper.md
xclip.md
xglm.md
xlm-prophetnet.md
xlm-roberta-xl.md
xlm-roberta.md
xlm-v.md
xlm.md
xlnet.md
xls_r.md
xlsr_wav2vec2.md
xmod.md
yolos.md
yoso.md