transformers/docs/source/en/tasks
Joel Lamy-Poirier e0921c6b53
Add GPTBigCode model (Optimized GPT2 with MQA from Santacoder & BigCode) (#22575)
* Add model with cli tool

* Remove unwanted stuff

* Add new code

* Remove inference runner

* Style

* Fix checks

* Test updates

* make fixup

* fix docs

* fix doc

* fix test

* hopefully fix pipeline tests

* refactor

* fix CIs

* add comment

* rename to `GPTBigCodeForCausalLM`

* correct readme

* make fixup + docs

* make fixup

* fixes

* fixes

* Remove pruning

* Remove import

* Doc updates

* More pruning removal

* Combine copies

* Single MQA implementation, remove kv cache pre-allocation and padding

* Update doc

* Revert refactor to match gpt2 style

* Merge back key and value caches, fix some type hints

* Update doc

* Fix position ids with padding (PR 21080)

* Add conversion script temporarily

* Update conversion script

* Remove checkpoint conversion

* New model

* Fix MQA test

* Fix copies

* try fix tests

* FIX TEST!!

* remove `DoubleHeadsModel`

* add MQA tests

* add slow tests

* clean up

* add CPU checker

* final fixes

* fixes

- fix GPU issue
- fixed slow tests
- skip disk offload

* fix final issue

* Simplify and comment baddbmm fix

* Remove unnecessary code

* Transpose tweaks

* Use beta=1 on cpu, improve tests

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
2023-04-10 10:57:21 +02:00
asr.mdx Added "Open in Colab" to task guides (#21729) 2023-02-22 08:32:35 -05:00
audio_classification.mdx [Whisper] Add model for audio classification (#21754) 2023-03-07 16:20:21 +01:00
document_question_answering.mdx Add: document question answering task guide (#21518) 2023-02-13 09:24:56 -05:00
image_captioning.mdx [Tasks] Adds image captioning (#21512) 2023-02-10 22:52:12 +05:30
image_classification.mdx Fix doc links (#22274) 2023-03-20 17:07:31 +00:00
language_modeling.mdx Add GPTBigCode model (Optimized GPT2 with MQA from Santacoder & BigCode) (#22575) 2023-04-10 10:57:21 +02:00
masked_language_modeling.mdx Add Mega: Moving Average Equipped Gated Attention (#21766) 2023-03-24 08:17:27 -04:00
monocular_depth_estimation.mdx Depth estimation task guide (#22205) 2023-03-17 08:36:23 -04:00
multiple_choice.mdx Add Mega: Moving Average Equipped Gated Attention (#21766) 2023-03-24 08:17:27 -04:00
object_detection.mdx Update quality tooling for formatting (#21480) 2023-02-06 18:10:56 -05:00
question_answering.mdx Add Mega: Moving Average Equipped Gated Attention (#21766) 2023-03-24 08:17:27 -04:00
semantic_segmentation.mdx Fix doc links (#22274) 2023-03-20 17:07:31 +00:00
sequence_classification.mdx Add GPTBigCode model (Optimized GPT2 with MQA from Santacoder & BigCode) (#22575) 2023-04-10 10:57:21 +02:00
summarization.mdx [WIP]NLLB-MoE Adds the moe model (#22024) 2023-03-27 19:42:00 +02:00
token_classification.mdx Add GPTBigCode model (Optimized GPT2 with MQA from Santacoder & BigCode) (#22575) 2023-04-10 10:57:21 +02:00
translation.mdx [WIP]NLLB-MoE Adds the moe model (#22024) 2023-03-27 19:42:00 +02:00
video_classification.mdx Automated compatible models list for task guides (#21338) 2023-01-27 13:19:28 -05:00
zero_shot_image_classification.mdx Zero-shot image classification task guide (#22132) 2023-03-13 10:57:17 -04:00
zero_shot_object_detection.mdx Add: task guide for zero shot object detection (#21829) 2023-02-28 10:23:08 -05:00