
Auto Classes
In many cases, the architecture you want to use can be guessed from the name or the path of the pretrained model you
are supplying to the from_pretrained()
method. AutoClasses are here to do this job for you so that you
automatically retrieve the relevant model given the name/path to the pretrained weights/config/vocabulary.
Instantiating one of AutoConfig, AutoModel, and AutoTokenizer will directly create a class of the relevant architecture. For instance,
model = AutoModel.from_pretrained("google-bert/bert-base-cased")
will create a model that is an instance of BertModel.
There is one class of AutoModel
for each task, and for each backend (PyTorch, TensorFlow, or Flax).
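For example, a minimal sketch of loading a model with a task-specific head (assuming PyTorch is installed; the BERT checkpoint below has no fine-tuned classification head, so its head weights are newly initialized and the snippet only illustrates how the class is resolved):

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Both auto classes read the checkpoint's config to resolve the concrete
# architecture (here a BERT tokenizer and a BERT sequence-classification model).
tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")
model = AutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-cased")

inputs = tokenizer("Auto classes pick the architecture for you.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)
```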
Extending the Auto Classes
Each of the auto classes has a method that lets it be extended with your custom classes. For instance, if you have defined a
custom model class NewModel, make sure you also have a NewModelConfig; then you can add them to the auto
classes like this:
from transformers import AutoConfig, AutoModel
AutoConfig.register("new-model", NewModelConfig)
AutoModel.register(NewModelConfig, NewModel)
You will then be able to use the auto classes like you would usually do!
If your NewModelConfig is a subclass of PretrainedConfig, make sure its
model_type attribute is set to the same key you use when registering the config (here "new-model").
Likewise, if your NewModel is a subclass of PreTrainedModel, make sure its
config_class attribute is set to the same class you use when registering the model (here NewModelConfig).
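Putting the two requirements together, a minimal sketch of a custom config/model pair and its registration could look like the following (NewModel and NewModelConfig are illustrative placeholders from the example above, not classes shipped with the library):

```python
import torch.nn as nn

from transformers import AutoConfig, AutoModel, PretrainedConfig, PreTrainedModel


class NewModelConfig(PretrainedConfig):
    # Must match the key passed to AutoConfig.register below.
    model_type = "new-model"

    def __init__(self, hidden_size=64, **kwargs):
        self.hidden_size = hidden_size
        super().__init__(**kwargs)


class NewModel(PreTrainedModel):
    # Must match the config class passed to AutoModel.register below.
    config_class = NewModelConfig

    def __init__(self, config):
        super().__init__(config)
        self.layer = nn.Linear(config.hidden_size, config.hidden_size)
        self.post_init()  # standard weight-init hook for PreTrainedModel subclasses

    def forward(self, inputs):
        return self.layer(inputs)


AutoConfig.register("new-model", NewModelConfig)
AutoModel.register(NewModelConfig, NewModel)

# After registration, the auto classes resolve this architecture as usual.
config = AutoConfig.for_model("new-model")
model = AutoModel.from_config(config)
```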
AutoConfig
autodoc AutoConfig
AutoTokenizer
autodoc AutoTokenizer
AutoFeatureExtractor
autodoc AutoFeatureExtractor
AutoImageProcessor
autodoc AutoImageProcessor
AutoVideoProcessor
autodoc AutoVideoProcessor
AutoProcessor
autodoc AutoProcessor
Generic model classes
The following auto classes are available for instantiating a base model class without a specific head.
AutoModel
autodoc AutoModel
TFAutoModel
autodoc TFAutoModel
FlaxAutoModel
autodoc FlaxAutoModel
Generic pretraining classes
The following auto classes are available for instantiating a model with a pretraining head.
AutoModelForPreTraining
autodoc AutoModelForPreTraining
TFAutoModelForPreTraining
autodoc TFAutoModelForPreTraining
FlaxAutoModelForPreTraining
autodoc FlaxAutoModelForPreTraining
Natural Language Processing
The following auto classes are available for the following natural language processing tasks.
AutoModelForCausalLM
autodoc AutoModelForCausalLM
TFAutoModelForCausalLM
autodoc TFAutoModelForCausalLM
FlaxAutoModelForCausalLM
autodoc FlaxAutoModelForCausalLM
AutoModelForMaskedLM
autodoc AutoModelForMaskedLM
TFAutoModelForMaskedLM
autodoc TFAutoModelForMaskedLM
FlaxAutoModelForMaskedLM
autodoc FlaxAutoModelForMaskedLM
AutoModelForMaskGeneration
autodoc AutoModelForMaskGeneration
TFAutoModelForMaskGeneration
autodoc TFAutoModelForMaskGeneration
AutoModelForSeq2SeqLM
autodoc AutoModelForSeq2SeqLM
TFAutoModelForSeq2SeqLM
autodoc TFAutoModelForSeq2SeqLM
FlaxAutoModelForSeq2SeqLM
autodoc FlaxAutoModelForSeq2SeqLM
AutoModelForSequenceClassification
autodoc AutoModelForSequenceClassification
TFAutoModelForSequenceClassification
autodoc TFAutoModelForSequenceClassification
FlaxAutoModelForSequenceClassification
autodoc FlaxAutoModelForSequenceClassification
AutoModelForMultipleChoice
autodoc AutoModelForMultipleChoice
TFAutoModelForMultipleChoice
autodoc TFAutoModelForMultipleChoice
FlaxAutoModelForMultipleChoice
autodoc FlaxAutoModelForMultipleChoice
AutoModelForNextSentencePrediction
autodoc AutoModelForNextSentencePrediction
TFAutoModelForNextSentencePrediction
autodoc TFAutoModelForNextSentencePrediction
FlaxAutoModelForNextSentencePrediction
autodoc FlaxAutoModelForNextSentencePrediction
AutoModelForTokenClassification
autodoc AutoModelForTokenClassification
TFAutoModelForTokenClassification
autodoc TFAutoModelForTokenClassification
FlaxAutoModelForTokenClassification
autodoc FlaxAutoModelForTokenClassification
AutoModelForQuestionAnswering
autodoc AutoModelForQuestionAnswering
TFAutoModelForQuestionAnswering
autodoc TFAutoModelForQuestionAnswering
FlaxAutoModelForQuestionAnswering
autodoc FlaxAutoModelForQuestionAnswering
AutoModelForTextEncoding
autodoc AutoModelForTextEncoding
TFAutoModelForTextEncoding
autodoc TFAutoModelForTextEncoding
Computer vision
The following auto classes are available for the following computer vision tasks.
AutoModelForDepthEstimation
autodoc AutoModelForDepthEstimation
AutoModelForImageClassification
autodoc AutoModelForImageClassification
TFAutoModelForImageClassification
autodoc TFAutoModelForImageClassification
FlaxAutoModelForImageClassification
autodoc FlaxAutoModelForImageClassification
AutoModelForVideoClassification
autodoc AutoModelForVideoClassification
AutoModelForKeypointDetection
autodoc AutoModelForKeypointDetection
AutoModelForMaskedImageModeling
autodoc AutoModelForMaskedImageModeling
TFAutoModelForMaskedImageModeling
autodoc TFAutoModelForMaskedImageModeling
AutoModelForObjectDetection
autodoc AutoModelForObjectDetection
AutoModelForImageSegmentation
autodoc AutoModelForImageSegmentation
AutoModelForImageToImage
autodoc AutoModelForImageToImage
AutoModelForSemanticSegmentation
autodoc AutoModelForSemanticSegmentation
TFAutoModelForSemanticSegmentation
autodoc TFAutoModelForSemanticSegmentation
AutoModelForInstanceSegmentation
autodoc AutoModelForInstanceSegmentation
AutoModelForUniversalSegmentation
autodoc AutoModelForUniversalSegmentation
AutoModelForZeroShotImageClassification
autodoc AutoModelForZeroShotImageClassification
TFAutoModelForZeroShotImageClassification
autodoc TFAutoModelForZeroShotImageClassification
AutoModelForZeroShotObjectDetection
autodoc AutoModelForZeroShotObjectDetection
Audio
The following auto classes are available for the following audio tasks.
AutoModelForAudioClassification
autodoc AutoModelForAudioClassification
TFAutoModelForAudioClassification
autodoc TFAutoModelForAudioClassification
AutoModelForAudioFrameClassification
autodoc AutoModelForAudioFrameClassification
AutoModelForCTC
autodoc AutoModelForCTC
AutoModelForSpeechSeq2Seq
autodoc AutoModelForSpeechSeq2Seq
TFAutoModelForSpeechSeq2Seq
autodoc TFAutoModelForSpeechSeq2Seq
FlaxAutoModelForSpeechSeq2Seq
autodoc FlaxAutoModelForSpeechSeq2Seq
AutoModelForAudioXVector
autodoc AutoModelForAudioXVector
AutoModelForTextToSpectrogram
autodoc AutoModelForTextToSpectrogram
AutoModelForTextToWaveform
autodoc AutoModelForTextToWaveform
AutoModelForAudioTokenization
autodoc AutoModelForAudioTokenization
Multimodal
The following auto classes are available for the following multimodal tasks.
AutoModelForTableQuestionAnswering
autodoc AutoModelForTableQuestionAnswering
TFAutoModelForTableQuestionAnswering
autodoc TFAutoModelForTableQuestionAnswering
AutoModelForDocumentQuestionAnswering
autodoc AutoModelForDocumentQuestionAnswering
TFAutoModelForDocumentQuestionAnswering
autodoc TFAutoModelForDocumentQuestionAnswering
AutoModelForVisualQuestionAnswering
autodoc AutoModelForVisualQuestionAnswering
AutoModelForVision2Seq
autodoc AutoModelForVision2Seq
TFAutoModelForVision2Seq
autodoc TFAutoModelForVision2Seq
FlaxAutoModelForVision2Seq
autodoc FlaxAutoModelForVision2Seq
AutoModelForImageTextToText
autodoc AutoModelForImageTextToText
Time Series
AutoModelForTimeSeriesPrediction
autodoc AutoModelForTimeSeriesPrediction