mirror of
https://github.com/huggingface/transformers.git
synced 2025-07-31 10:12:23 +06:00
![]() * init bigbird * model.__init__ working, conversion script ready, config updated * add conversion script * BigBirdEmbeddings working :) * slightly update conversion script * BigBirdAttention working :) ; some bug in layer.output.dense * add debugger-notebook * forward() working for BigBirdModel :) ; replaced gelu with gelu_fast * tf code adapted to torch till rand_attn in bigbird_block_sparse_attention ; till now everything working :) * BigBirdModel working in block-sparse attention mode :) * add BigBirdForPreTraining * small fix * add tokenizer for BigBirdModel * fix config & hence modeling * fix base prefix * init testing * init tokenizer test * pos_embed must be absolute, attn_type=original_full when add_cross_attn=True , nsp loss is optional in BigBirdForPreTraining, add assert statements * remove position_embedding_type arg * complete normal tests * add comments to block sparse attention * add attn_probs for sliding & global tokens * create fn for block sparse attn mask creation * add special tests * restore pos embed arg * minor fix * attn probs update * make big bird fully gpu friendly * fix tests * remove pruning * correct tokenzier & minor fixes * update conversion script , remove norm_type * tokenizer-inference test add * remove extra comments * add docs * save intermediate * finish trivia_qa conversion * small update to forward * correct qa and layer * better error message * BigBird QA ready * fix rebased * add triva-qa debugger notebook * qa setup * fixed till embeddings * some issue in q/k/v_layer * fix bug in conversion-script * fixed till self-attn * qa fixed except layer norm * add qa end2end test * fix gradient ckpting ; other qa test * speed-up big bird a bit * hub_id=google * clean up * make quality * speed up einsum with bmm * finish perf improvements for big bird * remove wav2vec2 tok * fix tokenizer * include docs * correct docs * add helper to auto pad block size * make style * remove fast tokenizer for now * fix some * add pad test * finish * fix some bugs * fix another bug * fix buffer tokens * fix comment and merge from master * add comments * make style * commit some suggestions Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Fix typos * fix some more suggestions * add another patch Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * fix copies * another path Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * update * update nit suggestions * make style Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Lysandre Debut <lysandre@huggingface.co> |
||
---|---|---|
.. | ||
albert.rst | ||
auto.rst | ||
bart.rst | ||
barthez.rst | ||
bert.rst | ||
bertgeneration.rst | ||
bertweet.rst | ||
bigbird.rst | ||
blenderbot_small.rst | ||
blenderbot.rst | ||
bort.rst | ||
camembert.rst | ||
convbert.rst | ||
ctrl.rst | ||
deberta_v2.rst | ||
deberta.rst | ||
dialogpt.rst | ||
distilbert.rst | ||
dpr.rst | ||
electra.rst | ||
encoderdecoder.rst | ||
flaubert.rst | ||
fsmt.rst | ||
funnel.rst | ||
gpt.rst | ||
gpt2.rst | ||
herbert.rst | ||
ibert.rst | ||
layoutlm.rst | ||
led.rst | ||
longformer.rst | ||
lxmert.rst | ||
m2m_100.rst | ||
marian.rst | ||
mbart.rst | ||
mobilebert.rst | ||
mpnet.rst | ||
mt5.rst | ||
pegasus.rst | ||
phobert.rst | ||
prophetnet.rst | ||
rag.rst | ||
reformer.rst | ||
retribert.rst | ||
roberta.rst | ||
speech_to_text.rst | ||
squeezebert.rst | ||
t5.rst | ||
tapas.rst | ||
transformerxl.rst | ||
wav2vec2.rst | ||
xlm.rst | ||
xlmprophetnet.rst | ||
xlmroberta.rst | ||
xlnet.rst | ||
xlsr_wav2vec2.rst |