transformers/docs/source/en
Yoach Lacombe cb45f71c4d
Add Seamless M4T model (#25693)
* first raw commit

* still POC

* tentative convert script

* almost working speech encoder conversion scripts

* intermediate code for encoder/decoders

* add modeling code

* first version of speech encoder

* make style

* add new adapter layer architecture

* add adapter block

* add first tentative config

* add working speech encoder conversion

* base model convert works now

* make style

* remove unnecessary classes

* remove unecessary functions

* add modeling code speech encoder

* rework logics

* forward pass of sub components work

* add modeling codes

* some config modifs and modeling code modifs

* save WIP

* new edits

* same output speech encoder

* correct attention mask

* correct attention mask

* fix generation

* new generation logics

* erase comments

* make style

* fix typo

* add some descriptions

* new state

* clean imports

* add tests

* make style

* make beam search and num_return_sequences>1 works

* correct edge case issue

* correct SeamlessM4TConformerSamePadLayer copied from

* replace ACT2FN relu by nn.relu

* remove unecessary return variable

* move back a class

* change name conformer_attention_mask ->conv_attention_mask

* better nit code

* add some Copied from statements

* small nits

* small nit in dict.get

* rename t2u model -> conditionalgeneration

* ongoing refactoring of structure

* update models architecture

* remove SeamlessM4TMultiModal classes

* add tests

* adapt tests

* some non-working code for vocoder

* add seamlessM4T vocoder

* remove buggy line

* fix some hifigan related bugs

* remove hifigan specifc config

* change

* add WIP tokenization

* add seamlessM4T working tokenzier

* update tokenization

* add tentative feature extractor

* Update converting script

* update working FE

* refactor input_values -> input_features

* update FE

* changes in generation, tokenizer and modeling

* make style and add t2u_decoder_input_ids

* add intermediate outputs for ToSpeech models

* add vocoder to speech models

* update valueerror

* update FE with languages

* add vocoder convert

* update config docstrings and names

* update generation code and configuration

* remove todos and update config.pad_token_id to generation_config.pad_token_id

* move block vocoder

* remove unecessary code and uniformize tospeech code

* add feature extractor import

* make style and fix some copies from

* correct consistency + make fix-copies

* add processor code

* remove comments

* add fast tokenizer support

* correct pad_token_id in M4TModel

* correct config

* update tests and codes  + make style

* make some suggested correstion - correct comments and change naming

* rename some attributes

* rename some attributes

* remove unecessary sequential

* remove option to use dur predictor

* nit

* refactor hifigan

* replace normalize_mean and normalize_var with do_normalize + save lang ids to generation config

* add tests

* change tgt_lang logic

* update generation ToSpeech

* add support import SeamlessM4TProcessor

* fix generate

* make tests

* update integration tests, add option to only return text and update tokenizer fast

* fix wrong function call

* update import and convert script

* update integration tests + update repo id

* correct paths and add first test

* update how new attention masks are computed

* update tests

* take first care of batching in vocoder code

* add batching with the vocoder

* add waveform lengths to model outputs

* make style

* add generate kwargs + forward kwargs of M4TModel

* add docstrings forward methods

* reformate docstrings

* add docstrings t2u model

* add another round of modeling docstrings + reformate speaker_id -> spkr_id

* make style

* fix check_repo

* make style

* add seamlessm4t to toctree

* correct check_config_attributes

* write config docstrings + some modifs

* make style

* add docstrings tokenizer

* add docstrings to processor, fe and tokenizers

* make style

* write first version of model docs

* fix FE + correct FE test

* fix tokenizer + add correct integration tests

* fix most tokenization tests

* make style

* correct most processor test

* add generation tests and fix num_return_sequences > 1

* correct integration tests -still one left

* make style

* correct position embedding

* change numbeams to 1

* refactor some modeling code and correct one test

* make style

* correct typo

* refactor intermediate fnn

* refactor feedforward conformer

* make style

* remove comments

* make style

* fix tokenizer tests

* make style

* correct processor tests

* make style

* correct S2TT integration

* Apply suggestions from Sanchit code review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* correct typo

* replace torch.nn->nn + make style

* change Output naming (waveforms -> waveform) and ordering

* nit renaming and formating

* remove return None when not necessary

* refactor SeamlessM4TConformerFeedForward

* nit typo

* remove almost copied from comments

* add a copied from comment and remove an unecessary dropout

* remove inputs_embeds from speechencoder

* remove backward compatibiliy function

* reformate class docstrings for a few components

* remove unecessary methods

* split over 2 lines smthg hard to read

* make style

* replace two steps offset by one step as suggested

* nice typo

* move warnings

* remove useless lines from processor

* make generation non-standard test more robusts

* remove torch.inference_mode from tests

* split integration tests

* enrich md

* rename control_symbol_vocoder_offset->vocoder_offset

* clean convert file

* remove tgt_lang and src_lang from FE

* change generate docstring of ToText models

* update generate docstring of tospeech models

* unify how to deal withtext_decoder_input_ids

* add default spkr_id

* unify tgt_lang for t2u_model

* simplify tgt_lang verification

* remove a todo

* change config docstring

* make style

* simplify t2u_tgt_lang_id

* make style

* enrich/correct comments

* enrich .md

* correct typo in docstrings

* add torchaudio dependency

* update tokenizer

* make style and fix copies

* modify SeamlessM4TConverter with new tokenizer behaviour

* make style

* correct small typo docs

* fix import

* update docs and add requirement to tests

* add convert_fairseq2_to_hf in utils/not_doctested.txt

* update FE

* fix imports and make style

* remove torchaudio in FE test

* add seamless_m4t.md to utils/not_doctested.txt

* nits and change the way docstring dataset is loaded

* move checkpoints from ylacombe/ to facebook/ orga

* refactor warning/error to be in the 119 line width limit

* round overly precised floats

* add stereo audio behaviour

* refactor .md and make style

* enrich docs with more precised architecture description

* readd undocumented models

* make fix-copies

* apply some suggestions

* Apply suggestions from code review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* correct bug from previous commit

* refactor a parameter allowing to clean the code + some small nits

* clean tokenizer

* make style and fix

* make style

* clean tokenizers arguments

* add precisions for some tests

* move docs from not_tested to slow

* modify tokenizer according to last comments

* add copied from statements in tests

* correct convert script

* correct parameter docstring style

* correct tokenization

* correct multi gpus

* make style

* clean modeling code

* make style

* add copied from statements

* add copied statements

* add support with ASR pipeline

* remove file added inadvertently

* fix docstrings seamlessM4TModel

* add seamlessM4TConfig to OBJECTS_TO_IGNORE due of unconventional markdown

* add seamlessm4t to assisted generation ignored models

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-10-23 14:49:48 +02:00
..
internal Generate: add missing logits processors docs (#25653) 2023-08-25 11:56:17 +01:00
main_classes Fixed typos (#26810) 2023-10-16 09:52:29 +02:00
model_doc Add Seamless M4T model (#25693) 2023-10-23 14:49:48 +02:00
tasks Add Seamless M4T model (#25693) 2023-10-23 14:49:48 +02:00
_config.py Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
_toctree.yml Add Seamless M4T model (#25693) 2023-10-23 14:49:48 +02:00
accelerate.md Fix typos (#25936) 2023-09-04 11:15:12 +01:00
add_new_model.md Update add_new_model.md (#26365) 2023-09-25 12:58:11 +02:00
add_new_pipeline.md Update add_new_pipeline.md (#26197) 2023-09-19 00:41:16 +02:00
add_tensorflow_model.md Remove utils/documentation_tests.txt (#26213) 2023-09-18 13:33:01 +02:00
attention.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
autoclass_tutorial.md Update autoclass_tutorial.md (#25929) 2023-09-04 11:16:49 +01:00
benchmarks.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
bertology.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
big_models.md Fix typos (#25936) 2023-09-04 11:15:12 +01:00
chat_templating.md Update chat template docs with more tips on writing a template (#26625) 2023-10-06 12:04:40 +01:00
community.md Update community.md (#25928) 2023-09-04 11:16:34 +01:00
contributing.md Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
create_a_model.md Fix typos (#25936) 2023-09-04 11:15:12 +01:00
custom_models.md Fix typos (#25936) 2023-09-04 11:15:12 +01:00
custom_tools.md [doc] Always call it Agents for consistency (#25958) 2023-09-05 12:27:20 +01:00
debugging.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
fast_tokenizers.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
generation_strategies.md [docs] navigation improvement between text gen pipelines and text gen params (#26477) 2023-09-29 09:43:39 +02:00
glossary.md Fix typos (#25936) 2023-09-04 11:15:12 +01:00
hpo_train.md enable optuna multi-objectives feature (#25969) 2023-09-12 18:01:22 +01:00
index.md Add Seamless M4T model (#25693) 2023-10-23 14:49:48 +02:00
installation.md [docs] Update offline mode docs (#26478) 2023-09-29 09:42:21 +02:00
llm_tutorial_optimization.md Add LLM doc (#26058) 2023-10-16 16:09:50 +02:00
llm_tutorial.md Generate: update basic llm tutorial (#26937) 2023-10-19 16:53:28 +01:00
model_memory_anatomy.md Fix typos (#25936) 2023-09-04 11:15:12 +01:00
model_sharing.md Fix typos (#25936) 2023-09-04 11:15:12 +01:00
model_summary.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
multilingual.md Fix typo in example code (#25583) 2023-08-18 07:58:59 +02:00
notebooks.md Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
pad_truncation.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
peft.md [PEFT] Peft integration alternative design (#25077) 2023-08-18 19:08:03 +02:00
perf_hardware.md 🌐 [i18n-KO] Translated perf_hardware.md to Korean (#24966) 2023-07-25 07:44:24 -04:00
perf_infer_cpu.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
perf_infer_gpu_many.md [core ] Integrate Flash attention 2 in most used models (#25598) 2023-09-22 17:42:10 +02:00
perf_infer_gpu_one.md [Mistral] Add Flash Attention-2 support for mistral (#26464) 2023-10-03 13:44:46 +02:00
perf_infer_special.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
perf_torch_compile.md Fix rendering for torch.compile() docs (#25432) 2023-08-10 13:25:00 +02:00
perf_train_cpu_many.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
perf_train_cpu.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
perf_train_gpu_many.md deprecate sharded_ddp training argument (#24825) 2023-07-17 06:57:42 -04:00
perf_train_gpu_one.md [core ] Integrate Flash attention 2 in most used models (#25598) 2023-09-22 17:42:10 +02:00
perf_train_special.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
perf_train_tpu_tf.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
perf_train_tpu.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
performance.md [docs] Performance docs tidy up, part 1 (#23963) 2023-07-24 08:57:24 -04:00
perplexity.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
philosophy.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
pipeline_tutorial.md [ASR Pipe] Improve docs and error messages (#26476) 2023-09-29 18:32:37 +01:00
pipeline_webserver.md Suggestions on Pipeline_webserver (#25570) 2023-08-18 10:17:44 +02:00
pr_checks.md Docstring check (#26052) 2023-10-04 15:13:37 +02:00
preprocessing.md fix set_transform link docs (#26856) 2023-10-20 11:16:37 +02:00
quicktour.md [TYPO] fix typo/format in quicktour.md (#25519) 2023-08-16 08:03:23 +02:00
run_scripts.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
sagemaker.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
serialization.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
task_summary.md Fix doctest (#25031) 2023-07-25 22:10:06 +02:00
tasks_explained.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
testing.md add japanese documentation (#26138) 2023-10-11 10:26:37 -07:00
tf_xla.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
tflite.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
tokenizer_summary.md Fix typo: Roberta -> RoBERTa (#25302) 2023-08-03 14:17:30 -07:00
torchscript.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
training.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
transformers_agents.md [doc] Always call it Agents for consistency (#25958) 2023-09-05 12:27:20 +01:00
troubleshooting.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00