transformers/docs/source/en
Daniel Stancl (commit a72f1c9f5b)
Add LongT5 model (#16792)
* Initial commit

* Make some fixes

* Make PT model full forward pass

* Drop TF & Flax implementation, fix copies etc

* Add Flax model and update some corresponding stuff

* Drop some TF things

* Update config and flax local attn

* Add encoder_attention_type to config
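
For context, the option this commit introduces can be exercised directly on the config; a minimal sketch, assuming the merged `LongT5Config` exposes `encoder_attention_type` with the values `"local"` and `"transient-global"`:

```python
# Minimal sketch: pick the encoder attention variant via the config.
# Assumption: `encoder_attention_type` accepts "local" or "transient-global".
from transformers import LongT5Config, LongT5ForConditionalGeneration

config = LongT5Config(encoder_attention_type="transient-global")  # or "local"
model = LongT5ForConditionalGeneration(config)  # randomly initialized model
print(model.config.encoder_attention_type)
```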

* .

* Update docs

* Do some cleansing

* Fix some issues -> make style; add some docs

* Fix position_bias + mask addition + Update tests

* Fix repo consistency

* Fix model consistency by removing flax operation over attn_mask

* [WIP] Add PT TGlobal LongT5

* .

* [WIP] Add flax tglobal model

* [WIP] Update flax model to use the right attention type in the encoder

* Fix flax tglobal model forward pass

* Make use of global_relative_attention_bias

* Add test suites for TGlobal model

* Fix minor bugs, clean code

* Fix PT-Flax equivalence, though not yet convinced of correctness

* Fix LocalAttn implementation to match the original impl. + update READMEs

* Few updates

* Update: [Flax] improve large model init and loading #16148

* Add ckpt conversion script according to #16853 + handle torch device placement
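
The PR's dedicated conversion script is not reproduced here; as a hedged illustration of the same idea, a Flax checkpoint can be moved into PyTorch with the generic cross-framework loading path (the local paths below are hypothetical):

```python
# Generic Flax -> PyTorch conversion sketch (not the PR's script).
# Assumption: a Flax LongT5 checkpoint lives at the hypothetical path below.
from transformers import LongT5ForConditionalGeneration

pt_model = LongT5ForConditionalGeneration.from_pretrained(
    "path/to/flax_checkpoint", from_flax=True
)
pt_model.to("cpu")  # explicit device placement before saving
pt_model.save_pretrained("path/to/pytorch_checkpoint")
```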

* Minor updates to conversion script.

* Typo: AutoModelForSeq2SeqLM -> FlaxAutoModelForSeq2SeqLM
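
The class named in this fix is the Flax auto class; a minimal loading and generation sketch, assuming one of the released LongT5 checkpoints on the Hub as an example:

```python
# Sketch: load a converted LongT5 checkpoint with the Flax auto class.
# Assumption: "google/long-t5-tglobal-base" is used as an example checkpoint.
from transformers import AutoTokenizer, FlaxAutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")
model = FlaxAutoModelForSeq2SeqLM.from_pretrained("google/long-t5-tglobal-base")

inputs = tokenizer("summarize: studies have shown that ...", return_tensors="np")
outputs = model.generate(**inputs, max_length=32)
print(tokenizer.decode(outputs.sequences[0], skip_special_tokens=True))
```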

* gpu support + dtype fix

* Apply some suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Remove (de)parallelize stuff
* Edit shape comments
* Update README.md
* make fix-copies

* Remove caching logic for local & tglobal attention

* Apply another batch of suggestions from code review

* Add missing checkpoints
* Format converting scripts
* Drop (de)parallelize links from LongT5 mdx

* Fix converting script + revert config file change

* Revert "Remove caching logic for local & tglobal attention"

This reverts commit 2a619828f6ddc3e65bd9bb1725a12b77fa883a46.

* Stash caching logic in Flax model

* Always use side relative bias

* Drop caching logic in PT model

* Return side bias as it was

* Drop all remaining model parallel logic

* Remove clamp statements

* Move test files to the proper place

* Update docs with new version of hf-doc-builder

* Fix test imports

* Make some minor improvements

* Add missing checkpoints to docs
* Make TGlobal model compatible with torch.onnx.export (see the export sketch below)
* Replace some np.ndarray with jnp.ndarray
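
As referenced above, ONNX compatibility of the TGlobal model can be smoke-tested with a plain `torch.onnx.export` call; this is only a sketch, assuming dummy inputs suffice and that caching is disabled so the traced forward pass returns a flat tuple:

```python
# Sketch: trace the full encoder-decoder forward pass with torch.onnx.export.
# Assumptions: dummy inputs are enough for a smoke test, and disabling
# use_cache/return_dict keeps the traced outputs ONNX-friendly.
import torch
from transformers import AutoTokenizer, LongT5ForConditionalGeneration

model = LongT5ForConditionalGeneration.from_pretrained("google/long-t5-tglobal-base")
tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")
model.eval()
model.config.use_cache = False    # no past_key_values in the traced graph
model.config.return_dict = False  # tuple outputs are easier to name

enc = tokenizer("summarize: " + "long input " * 256, return_tensors="pt")
decoder_input_ids = torch.full((1, 1), model.config.decoder_start_token_id, dtype=torch.long)

torch.onnx.export(
    model,
    (enc["input_ids"], enc["attention_mask"], decoder_input_ids),
    "longt5_tglobal.onnx",
    input_names=["input_ids", "attention_mask", "decoder_input_ids"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "encoder_sequence"},
        "attention_mask": {0: "batch", 1: "encoder_sequence"},
        "decoder_input_ids": {0: "batch", 1: "decoder_sequence"},
    },
    opset_version=13,
)
```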

* Fix TGlobal for ONNX conversion + update docs

* fix _make_global_fixed_block_ids and masked negative value

* update flax model

* style and quality

* fix imports

* remove load_tf_weights_in_longt5 from init and fix copies

* add slow test for TGlobal model

* typo fix

* Drop obsolete is_parallelizable and one warning

* Update __init__ files to fix repo-consistency

* fix pipeline test

* Fix some device placements

* [WIP] Update tests -- need to generate summaries to update expected_summary

* Fix quality

* Update LongT5 model card

* Update (slow) summarization tests
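
The slow summarization tests exercise long-input generation; a rough stand-in for that usage, with placeholder text and the base TGlobal checkpoint assumed as an example:

```python
# Long-document summarization sketch (placeholder input; the checkpoint choice
# is an assumption -- any LongT5 seq2seq checkpoint follows the same pattern).
from transformers import AutoTokenizer, LongT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")
model = LongT5ForConditionalGeneration.from_pretrained("google/long-t5-tglobal-base")

long_document = "replace this with a multi-thousand-token article " * 400
inputs = tokenizer(long_document, max_length=16384, truncation=True, return_tensors="pt")

summary_ids = model.generate(**inputs, max_length=128, num_beams=2, early_stopping=True)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```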

* make style

* rename checkpoints

* finish

* fix flax tests

Co-authored-by: phungvanduy <pvduy23@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: patil-suraj <surajp815@gmail.com>
2022-06-13 22:36:58 +02:00
Name | Last commit | Last updated
internal | Allow from transformers import TypicalLogitsWarper (#17477) | 2022-06-03 11:08:35 +02:00
main_classes | Add Visual Question Answering (VQA) pipeline (#17286) | 2022-06-13 07:49:44 -04:00
model_doc | Add LongT5 model (#16792) | 2022-06-13 22:36:58 +02:00
tasks | Update audio examples with MInDS-14 (#16633) | 2022-04-08 15:55:42 -05:00
_config.py | Enable doc in Spanish (#16518) | 2022-04-04 10:25:46 -04:00
_toctree.yml | Add LongT5 model (#16792) | 2022-06-13 22:36:58 +02:00
accelerate.mdx | Enable doc in Spanish (#16518) | 2022-04-04 10:25:46 -04:00
add_new_model.mdx | Enable doc in Spanish (#16518) | 2022-04-04 10:25:46 -04:00
add_new_pipeline.mdx | Enable doc in Spanish (#16518) | 2022-04-04 10:25:46 -04:00
autoclass_tutorial.mdx | Enable doc in Spanish (#16518) | 2022-04-04 10:25:46 -04:00
benchmarks.mdx | Enable doc in Spanish (#16518) | 2022-04-04 10:25:46 -04:00
bertology.mdx | Enable doc in Spanish (#16518) | 2022-04-04 10:25:46 -04:00
big_models.mdx | Make Trainer compatible with sharded checkpoints (#17053) | 2022-05-03 09:55:10 -04:00
community.mdx | Enable doc in Spanish (#16518) | 2022-04-04 10:25:46 -04:00
contributing.md | Enable doc in Spanish (#16518) | 2022-04-04 10:25:46 -04:00
converting_tensorflow_models.mdx | Enable doc in Spanish (#16518) | 2022-04-04 10:25:46 -04:00
create_a_model.mdx | Enable doc in Spanish (#16518) | 2022-04-04 10:25:46 -04:00
custom_models.mdx | Update custom_models.mdx (#16964) | 2022-04-27 16:46:55 +02:00
debugging.mdx | Enable doc in Spanish (#16518) | 2022-04-04 10:25:46 -04:00
fast_tokenizers.mdx | Enable doc in Spanish (#16518) | 2022-04-04 10:25:46 -04:00
glossary.mdx | Enable doc in Spanish (#16518) | 2022-04-04 10:25:46 -04:00
index.mdx | Add LongT5 model (#16792) | 2022-06-13 22:36:58 +02:00
installation.mdx | Enable doc in Spanish (#16518) | 2022-04-04 10:25:46 -04:00
migration.mdx | Enable doc in Spanish (#16518) | 2022-04-04 10:25:46 -04:00
model_sharing.mdx | Enable doc in Spanish (#16518) | 2022-04-04 10:25:46 -04:00
model_summary.mdx | Enable doc in Spanish (#16518) | 2022-04-04 10:25:46 -04:00
multilingual.mdx | Enable doc in Spanish (#16518) | 2022-04-04 10:25:46 -04:00
notebooks.md | Enable doc in Spanish (#16518) | 2022-04-04 10:25:46 -04:00
pad_truncation.mdx | Enable doc in Spanish (#16518) | 2022-04-04 10:25:46 -04:00
perf_hardware.mdx | [WIP] [doc] performance/scalability revamp (#15723) | 2022-05-16 13:36:41 +02:00
perf_train_cpu.mdx | Extend Transformers Trainer Class to Enable CPU AMP and Integrate Intel Extension for PyTorch (#17138) | 2022-06-08 09:41:57 -04:00
perf_train_gpu_many.mdx | [WIP] [doc] performance/scalability revamp (#15723) | 2022-05-16 13:36:41 +02:00
perf_train_gpu_one.mdx | [WIP] [doc] performance/scalability revamp (#15723) | 2022-05-16 13:36:41 +02:00
performance.mdx | Extend Transformers Trainer Class to Enable CPU AMP and Integrate Intel Extension for PyTorch (#17138) | 2022-06-08 09:41:57 -04:00
perplexity.mdx | Enable doc in Spanish (#16518) | 2022-04-04 10:25:46 -04:00
philosophy.mdx | Enable doc in Spanish (#16518) | 2022-04-04 10:25:46 -04:00
pipeline_tutorial.mdx | docs(transformers): fix typo (#17263) | 2022-05-16 17:04:30 -04:00
pr_checks.mdx | Add a check on config classes docstring checkpoints (#17012) | 2022-04-30 10:40:46 +02:00
preprocessing.mdx | Fixing the output of code examples in the preprocessing chapter (#17162) | 2022-05-10 12:16:28 -04:00
quicktour.mdx | Fix doc test quicktour dataset (#16929) | 2022-04-25 16:26:59 +02:00
run_scripts.mdx | Fix all docs for accelerate install directions (#17145) | 2022-05-09 15:45:18 -04:00
sagemaker.mdx | Enable doc in Spanish (#16518) | 2022-04-04 10:25:46 -04:00
serialization.mdx | Add LongT5 model (#16792) | 2022-06-13 22:36:58 +02:00
task_summary.mdx | [Doctests] Correct task summary (#16644) | 2022-04-11 14:59:35 +02:00
testing.mdx | Enable doc in Spanish (#16518) | 2022-04-04 10:25:46 -04:00
tokenizer_summary.mdx | Enable doc in Spanish (#16518) | 2022-04-04 10:25:46 -04:00
training.mdx | Enable doc in Spanish (#16518) | 2022-04-04 10:25:46 -04:00
troubleshooting.mdx | Enable doc in Spanish (#16518) | 2022-04-04 10:25:46 -04:00