Mirror of https://github.com/huggingface/transformers.git
* Initial commit
* Make some fixes
* Make PT model full forward pass
* Drop TF & Flax implementation, fix copies etc.
* Add Flax model and update some corresponding stuff
* Drop some TF things
* Update config and Flax local attn
* Add encoder_attention_type to config
* Update docs
* Do some cleansing
* Fix some issues -> make style; add some docs
* Fix position_bias + mask addition + update tests
* Fix repo consistency
* Fix model consistency by removing Flax operation over attn_mask
* [WIP] Add PT TGlobal LongT5
* [WIP] Add Flax TGlobal model
* [WIP] Update Flax model to use the right attention type in the encoder
* Fix Flax TGlobal model forward pass
* Make use of global_relative_attention_bias
* Add test suites for TGlobal model
* Fix minor bugs, clean code
* Fix PT-Flax equivalence, though not yet convinced of correctness
* Fix LocalAttn implementation to match the original impl. + update READMEs
* Few updates
* Update: [Flax] improve large model init and loading #16148
* Add ckpt conversion script according to #16853 + handle torch device placement
* Minor updates to conversion script
* Typo: AutoModelForSeq2SeqLM -> FlaxAutoModelForSeq2SeqLM
* GPU support + dtype fix
* Apply some suggestions from code review

  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
  Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Remove (de)parallelize stuff
* Edit shape comments
* Update README.md
* make fix-copies
* Remove caching logic for local & tglobal attention
* Apply another batch of suggestions from code review
* Add missing checkpoints
* Format converting scripts
* Drop (de)parallelize links from longT5 mdx
* Fix converting script + revert config file change
* Revert "Remove caching logic for local & tglobal attention"

  This reverts commit 2a619828f6ddc3e65bd9bb1725a12b77fa883a46.

* Stash caching logic in Flax model
* Make side relative bias always used
* Drop caching logic in PT model
* Return side bias as it was
* Drop all remaining model parallel logic
* Remove clamp statements
* Move test files to the proper place
* Update docs with new version of hf-doc-builder
* Fix test imports
* Make some minor improvements
* Add missing checkpoints to docs
* Make TGlobal model compatible with torch.onnx.export
* Replace some np.ndarray with jnp.ndarray
* Fix TGlobal for ONNX conversion + update docs
* Fix _make_global_fixed_block_ids and masked neg value
* Update Flax model
* Style and quality
* Fix imports
* Remove load_tf_weights_in_longt5 from init and fix copies
* Add slow test for TGlobal model
* Typo fix
* Drop obsolete is_parallelizable and one warning
* Update __init__ files to fix repo-consistency
* Fix pipeline test
* Fix some device placements
* [WIP] Update tests -- need to generate summaries to update expected_summary
* Fix quality
* Update LongT5 model card
* Update (slow) summarization tests
* make style
* Rename checkpoints
* Finish
* Fix Flax tests

Co-authored-by: phungvanduy <pvduy23@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: patil-suraj <surajp815@gmail.com>
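For orientation, here is a minimal usage sketch of what this change adds. It is not code from the commits above; it assumes the public `transformers` API after the LongT5 merge (`LongT5ForConditionalGeneration`, `AutoTokenizer`) and the `google/long-t5-tglobal-base` checkpoint name referenced in the LongT5 docs.

```python
# Minimal sketch (not from this commit): load a TGlobal LongT5 checkpoint
# and generate a summary. Class and checkpoint names are assumptions based
# on the public transformers API after the LongT5 merge.
from transformers import AutoTokenizer, LongT5ForConditionalGeneration

checkpoint = "google/long-t5-tglobal-base"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = LongT5ForConditionalGeneration.from_pretrained(checkpoint)

# The encoder attention variant added by this change is exposed via the
# config; "local" and "transient-global" are the two options the commits
# mention (encoder_attention_type).
print(model.config.encoder_attention_type)

long_document = "..."  # LongT5 targets long inputs (thousands of tokens)
inputs = tokenizer(long_document, return_tensors="pt",
                   truncation=True, max_length=4096)
summary_ids = model.generate(**inputs, max_length=64)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

The `-local-` checkpoints would be loaded the same way; only the config's `encoder_attention_type` differs.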
albert.mdx
auto.mdx
bart.mdx
barthez.mdx
bartpho.mdx
beit.mdx
bert-generation.mdx
bert-japanese.mdx
bert.mdx
bertweet.mdx
big_bird.mdx
bigbird_pegasus.mdx
blenderbot-small.mdx
blenderbot.mdx
bloom.mdx
bort.mdx
byt5.mdx
camembert.mdx
canine.mdx
clip.mdx
convbert.mdx
convnext.mdx
cpm.mdx
ctrl.mdx
cvt.mdx
data2vec.mdx
deberta-v2.mdx
deberta.mdx
decision_transformer.mdx
deit.mdx
detr.mdx
dialogpt.mdx
distilbert.mdx
dit.mdx
dpr.mdx
dpt.mdx
electra.mdx
encoder-decoder.mdx
flaubert.mdx
flava.mdx
fnet.mdx
fsmt.mdx
funnel.mdx
glpn.mdx
gpt_neo.mdx
gpt_neox.mdx
gpt2.mdx
gptj.mdx
herbert.mdx
hubert.mdx
ibert.mdx
imagegpt.mdx
layoutlm.mdx
layoutlmv2.mdx
layoutlmv3.mdx
layoutxlm.mdx
led.mdx
levit.mdx
longformer.mdx
longt5.mdx
luke.mdx
lxmert.mdx
m2m_100.mdx
marian.mdx
maskformer.mdx
mbart.mdx
mctct.mdx
megatron_gpt2.mdx
megatron-bert.mdx
mluke.mdx
mobilebert.mdx
mpnet.mdx
mt5.mdx
nystromformer.mdx
openai-gpt.mdx
opt.mdx
pegasus.mdx
perceiver.mdx
phobert.mdx
plbart.mdx
poolformer.mdx
prophetnet.mdx
qdqbert.mdx
rag.mdx
realm.mdx
reformer.mdx
regnet.mdx
rembert.mdx
resnet.mdx
retribert.mdx
roberta.mdx
roformer.mdx
segformer.mdx
sew-d.mdx
sew.mdx
speech_to_text_2.mdx
speech_to_text.mdx
speech-encoder-decoder.mdx
splinter.mdx
squeezebert.mdx
swin.mdx
t5.mdx
t5v1.1.mdx
tapas.mdx
tapex.mdx
trajectory_transformer.mdx
transfo-xl.mdx
trocr.mdx
unispeech-sat.mdx
unispeech.mdx
van.mdx
vilt.mdx
vision-encoder-decoder.mdx
vision-text-dual-encoder.mdx
visual_bert.mdx
vit_mae.mdx
vit.mdx
wav2vec2_phoneme.mdx
wav2vec2-conformer.mdx
wav2vec2.mdx
wavlm.mdx
xglm.mdx
xlm-prophetnet.mdx
xlm-roberta-xl.mdx
xlm-roberta.mdx
xlm.mdx
xlnet.mdx
xls_r.mdx
xlsr_wav2vec2.mdx
yolos.mdx
yoso.mdx