transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-31 02:02:21 +06:00


Daniel Stancl a72f1c9f5b Add `LongT5` model (#16792)	2022-06-13 22:36:58 +02:00

* Initial commit
* Make some fixes
* Make PT model full forward pass
* Drop TF & Flax implementation, fix copies etc
* Add Flax model and update some corresponding stuff
* Drop some TF things
* Update config and flax local attn
* Add encoder_attention_type to config
* .
* Update docs
* Do some cleansing
* Fix some issues -> make style; add some docs
* Fix position_bias + mask addition + Update tests
* Fix repo consistency
* Fix model consistency by removing flax operation over attn_mask
* [WIP] Add PT TGlobal LongT5
* .
* [WIP] Add flax tglobal model
* [WIP] Update flax model to use the right attention type in the encoder
* Fix flax tglobal model forward pass
* Make the use of global_relative_attention_bias
* Add test suites for TGlobal model
* Fix minor bugs, clean code
* Fix pt-flax equivalence though not convinced with correctness
* Fix LocalAttn implementation to match the original impl. + update READMEs
* Few updates
* Update: [Flax] improve large model init and loading #16148
* Add ckpt conversion script accoring to #16853 + handle torch device placement
* Minor updates to conversion script.
* Typo: AutoModelForSeq2SeqLM -> FlaxAutoModelForSeq2SeqLM
* gpu support + dtype fix
* Apply some suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* * Remove (de)parallelize stuff
* Edit shape comments
* Update README.md
* make fix-copies
* Remove caching logic for local & tglobal attention
* Apply another batch of suggestions from code review
* Add missing checkpoints
* Format converting scripts
* Drop (de)parallelize links from longT5 mdx
* Fix converting script + revert config file change
* Revert "Remove caching logic for local & tglobal attention"

This reverts commit 2a619828f6ddc3e65bd9bb1725a12b77fa883a46.

* Stash caching logic in Flax model
* Make side relative bias used always
* Drop caching logic in PT model
* Return side bias as it was
* Drop all remaining model parallel logic
* Remove clamp statements
* Move test files to the proper place
* Update docs with new version of hf-doc-builder
* Fix test imports
* Make some minor improvements
* Add missing checkpoints to docs
* Make TGlobal model compatible with torch.onnx.export
* Replace some np.ndarray with jnp.ndarray
* Fix TGlobal for ONNX conversion + update docs
* fix _make_global_fixed_block_ids and masked neg value
* update flax model
* style and quality
* fix imports
* remove load_tf_weights_in_longt5 from init and fix copies
* add slow test for TGlobal model
* typo fix
* Drop obsolete is_parallelizable and one warning
* Update __init__ files to fix repo-consistency
* fix pipeline test
* Fix some device placements
* [wip]: Update tests -- need to generate summaries to update expected_summary
* Fix quality
* Update LongT5 model card
* Update (slow) summarization tests
* make style
* rename checkpoitns
* finish
* fix flax tests

Co-authored-by: phungvanduy <pvduy23@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: patil-suraj <surajp815@gmail.com>
__init__.py	[Test refactor 1/5] Per-folder tests reorganization (#15725)	2022-02-23 15:46:28 -05:00
test_pipelines_audio_classification.py	[Test refactor 1/5] Per-folder tests reorganization (#15725)	2022-02-23 15:46:28 -05:00
test_pipelines_automatic_speech_recognition.py	Black preview (#17217)	2022-05-12 16:25:55 -04:00
test_pipelines_common.py	Black preview (#17217)	2022-05-12 16:25:55 -04:00
test_pipelines_conversational.py	[Test refactor 1/5] Per-folder tests reorganization (#15725)	2022-02-23 15:46:28 -05:00
test_pipelines_feature_extraction.py	[Test refactor 1/5] Per-folder tests reorganization (#15725)	2022-02-23 15:46:28 -05:00
test_pipelines_fill_mask.py	Running a pipeline of `float16`. (#17637)	2022-06-09 19:04:42 +02:00
test_pipelines_image_classification.py	[Test refactor 1/5] Per-folder tests reorganization (#15725)	2022-02-23 15:46:28 -05:00
test_pipelines_image_segmentation.py	Add `ForInstanceSegmentation` models to `image-segmentation` pipelines (#15937)	2022-03-09 10:19:05 +01:00
test_pipelines_object_detection.py	[Test refactor 1/5] Per-folder tests reorganization (#15725)	2022-02-23 15:46:28 -05:00
test_pipelines_question_answering.py	Adding `batch_size` test to QA pipeline. (#17330)	2022-05-19 14:28:12 -04:00
test_pipelines_summarization.py	Add `LongT5` model (#16792)	2022-06-13 22:36:58 +02:00
test_pipelines_table_question_answering.py	Black preview (#17217)	2022-05-12 16:25:55 -04:00
test_pipelines_text_classification.py	Adding `top_k` argument to `text-classification` pipeline. (#17606)	2022-06-09 18:33:10 +02:00
test_pipelines_text_generation.py	Black preview (#17217)	2022-05-12 16:25:55 -04:00
test_pipelines_text2text_generation.py	Fixing return type tensor with `num_return_sequences>1`. (#16828)	2022-04-20 16:11:51 +02:00
test_pipelines_token_classification.py	Attention mask is important in the case of batching... (#16222)	2022-03-18 10:02:12 +01:00
test_pipelines_translation.py	Black preview (#17217)	2022-05-12 16:25:55 -04:00
test_pipelines_visual_question_answering.py	Add Visual Question Answering (VQA) pipeline (#17286)	2022-06-13 07:49:44 -04:00
test_pipelines_zero_shot_image_classification.py	The tests were not updated after the addition of `torch.diag` (#15890)	2022-03-03 15:33:49 +01:00
test_pipelines_zero_shot.py	Black preview (#17217)	2022-05-12 16:25:55 -04:00