mirror of
https://github.com/huggingface/transformers.git
synced 2025-07-17 19:48:23 +06:00
* add model like clip * update * text model ok * clap text works * some refactor - `CLAPVision` to `CLAPAudio` - refactor kwargs of audio modules * more refactor * more refactor * more refactor * correct fusion * more refactor * new modules * add basic processor * fixup * remove whisper copioed from * audio logits match * add doc * correct filters mel and add maxlength * style * few fixes * forward passes * fixup * fixup * some clean up * remove mels form the dictionnary * pad after the repeat * update padding when dsmaller * fix padding * style * use swin patch merging * use copied from swin * processor with any tokenizer * more copied from * some clean up * more refactor * fix mel when rand_trunc * style * remove unused imports * update processing * remove image processing tests * add testing fiel * fixmodeling issues * replace with `is_longer` * clap in serialization * more refactor * `make fixup` * make fixup * fix feature extractor * update test feature extractor * `make fixup` * clean up config * more clean up * more cleanup * update tests * refactor tests and inits * removeCLAP vision config * remove CLAP from image procssing auto and dummy vision objects * update inits * style * re order classes in modeling clap * Use roberta tokenizer as the other weights are not open sourced * small cleaup * remove tokenization CLAP * processor tokenizr is roberta * update feature extraction doc * remove vclap from model zero shot * update f_min and f_max to frequency_xx * some changes - fix modeling keys - add `is_longer` in the forward pass - make fixup * make fixup * consistent behavior ebtween rand_crop and fusion * add numpy resize and bilinear and documentation * move resizing to image utils * clean feature extraction * import resize from correct file * resize in image transforms * update * style * style * nit * remove unused arguments form the feature extractor * style * few fixes + make fixup * oops * fix more tests * add zero shot audio classification
pipeline * update zeroshot classification pipeline * fixup * fix copies * all CI tests pass * make fixup + fix docs * fix docs * fix docs * update tests pip;eline * update zero shot pipeline * update feature extraction clap * update tokenization auto * use nested simplify * update pipeline tests * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * split in two lines * fixes * refactor * clean up * add integration tests * update config docstring * style * update processor * fix processor test * fix feat extractor tests * update docs * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix readmes * fix tips * Update src/transformers/models/auto/configuration_auto.py * update doc and remove todo -> properly explained * fix idx and typo * typoe * cleanup config * cleanup tests, styles and doc * ignore docstyle on image transform * add conversion script * remove the `clap` indx in favor of `CLAP` * update __init * nits * Update src/transformers/pipelines/__init__.py * fix bug * clarifiy config * fix copy * fix init * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * fix model output * fix comment * make fixup * make fixup * rename to `Clap` * replace to `Clap` * replace to `Clap` * repo consistency * again repo-consistency * make fixup * Apply suggestions from code review Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * add config * changes * update conversion * Apply suggestions from code review Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * remove unused function * update based on code reviews * style * more comments * cleanup * clean up * style * apply suggestions * Empty commit * pipeline will be added in a different PR * update calls to audio utils functions * update pipeline init * style * style * styling again * use 
pad * fix repo-consistency * update utils and add doc for audio utils * clean up resize by using torch. update inits accordingly * style * CLap's tokenizer is RobertA * add audio utils to internal toctreee * update totctree * style * update documentation and normalize naming accross audio utils and feature extraction clap * style * clean up * update doc and typos * fix doctest * update modelin code, got rid of a lot of reshaping * style on added doc audio utils * update modeling clap * style * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * docstringvariables with CLAP * rename key * update modeling CLAP * update audio utils docstring * update processing clap * fix readmes * fix toctree * udpate configuration clap * fix init * make fixup * fix * fix * update naming * update * update checkpoint path * Apply suggestions from code review * Major refactoring * Update src/transformers/models/clap/configuration_clap.py * merge --------- Co-authored-by: younesbelkada <younesbelkada@gmail.com> Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> |
||
---|---|---|
.. | ||
albert.mdx | ||
altclip.mdx | ||
audio-spectrogram-transformer.mdx | ||
auto.mdx | ||
bart.mdx | ||
barthez.mdx | ||
bartpho.mdx | ||
beit.mdx | ||
bert-generation.mdx | ||
bert-japanese.mdx | ||
bert.mdx | ||
bertweet.mdx | ||
big_bird.mdx | ||
bigbird_pegasus.mdx | ||
biogpt.mdx | ||
bit.mdx | ||
blenderbot-small.mdx | ||
blenderbot.mdx | ||
blip-2.mdx | ||
blip.mdx | ||
bloom.mdx | ||
bort.mdx | ||
bridgetower.mdx | ||
byt5.mdx | ||
camembert.mdx | ||
canine.mdx | ||
chinese_clip.mdx | ||
clap.mdx | ||
clip.mdx | ||
clipseg.mdx | ||
codegen.mdx | ||
conditional_detr.mdx | ||
convbert.mdx | ||
convnext.mdx | ||
cpm.mdx | ||
ctrl.mdx | ||
cvt.mdx | ||
data2vec.mdx | ||
deberta-v2.mdx | ||
deberta.mdx | ||
decision_transformer.mdx | ||
deformable_detr.mdx | ||
deit.mdx | ||
deta.mdx | ||
detr.mdx | ||
dialogpt.mdx | ||
dinat.mdx | ||
distilbert.mdx | ||
dit.mdx | ||
donut.mdx | ||
dpr.mdx | ||
dpt.mdx | ||
efficientformer.mdx | ||
electra.mdx | ||
encoder-decoder.mdx | ||
ernie_m.mdx | ||
ernie.mdx | ||
esm.mdx | ||
flan-t5.mdx | ||
flaubert.mdx | ||
flava.mdx | ||
fnet.mdx | ||
fsmt.mdx | ||
funnel.mdx | ||
git.mdx | ||
glpn.mdx | ||
gpt_neo.mdx | ||
gpt_neox_japanese.mdx | ||
gpt_neox.mdx | ||
gpt-sw3.mdx | ||
gpt2.mdx | ||
gptj.mdx | ||
graphormer.mdx | ||
groupvit.mdx | ||
herbert.mdx | ||
hubert.mdx | ||
ibert.mdx | ||
imagegpt.mdx | ||
jukebox.mdx | ||
layoutlm.mdx | ||
layoutlmv2.mdx | ||
layoutlmv3.mdx | ||
layoutxlm.mdx | ||
led.mdx | ||
levit.mdx | ||
lilt.mdx | ||
longformer.mdx | ||
longt5.mdx | ||
luke.mdx | ||
lxmert.mdx | ||
m2m_100.mdx | ||
marian.mdx | ||
markuplm.mdx | ||
mask2former.mdx | ||
maskformer.mdx | ||
mbart.mdx | ||
mctct.mdx | ||
megatron_gpt2.mdx | ||
megatron-bert.mdx | ||
mluke.mdx | ||
mobilebert.mdx | ||
mobilenet_v1.mdx | ||
mobilenet_v2.mdx | ||
mobilevit.mdx | ||
mpnet.mdx | ||
mt5.mdx | ||
mvp.mdx | ||
nat.mdx | ||
nezha.mdx | ||
nllb.mdx | ||
nystromformer.mdx | ||
oneformer.mdx | ||
openai-gpt.mdx | ||
opt.mdx | ||
owlvit.mdx | ||
pegasus_x.mdx | ||
pegasus.mdx | ||
perceiver.mdx | ||
phobert.mdx | ||
plbart.mdx | ||
poolformer.mdx | ||
prophetnet.mdx | ||
qdqbert.mdx | ||
rag.mdx | ||
realm.mdx | ||
reformer.mdx | ||
regnet.mdx | ||
rembert.mdx | ||
resnet.mdx | ||
retribert.mdx | ||
roberta-prelayernorm.mdx | ||
roberta.mdx | ||
roc_bert.mdx | ||
roformer.mdx | ||
segformer.mdx | ||
sew-d.mdx | ||
sew.mdx | ||
speech_to_text_2.mdx | ||
speech_to_text.mdx | ||
speech-encoder-decoder.mdx | ||
speecht5.mdx | ||
splinter.mdx | ||
squeezebert.mdx | ||
swin.mdx | ||
swin2sr.mdx | ||
swinv2.mdx | ||
switch_transformers.mdx | ||
t5.mdx | ||
t5v1.1.mdx | ||
table-transformer.mdx | ||
tapas.mdx | ||
tapex.mdx | ||
time_series_transformer.mdx | ||
timesformer.mdx | ||
trajectory_transformer.mdx | ||
transfo-xl.mdx | ||
trocr.mdx | ||
tvlt.mdx | ||
ul2.mdx | ||
unispeech-sat.mdx | ||
unispeech.mdx | ||
upernet.mdx | ||
van.mdx | ||
videomae.mdx | ||
vilt.mdx | ||
vision-encoder-decoder.mdx | ||
vision-text-dual-encoder.mdx | ||
visual_bert.mdx | ||
vit_hybrid.mdx | ||
vit_mae.mdx | ||
vit_msn.mdx | ||
vit.mdx | ||
wav2vec2_phoneme.mdx | ||
wav2vec2-conformer.mdx | ||
wav2vec2.mdx | ||
wavlm.mdx | ||
whisper.mdx | ||
xclip.mdx | ||
xglm.mdx | ||
xlm-prophetnet.mdx | ||
xlm-roberta-xl.mdx | ||
xlm-roberta.mdx | ||
xlm-v.mdx | ||
xlm.mdx | ||
xlnet.mdx | ||
xls_r.mdx | ||
xlsr_wav2vec2.mdx | ||
xmod.mdx | ||
yolos.mdx | ||
yoso.mdx |