transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-13 17:48:22 +06:00

Joel Lamy-Poirier e0921c6b53 Add GPTBigCode model (Optimized GPT2 with MQA from Santacoder & BigCode) (#22575)	2023-04-10 10:57:21 +02:00

* Add model with cli tool
* Remove unwanted stuff
* Add new code
* Remove inference runner
* Style
* Fix checks
* Test updates
* make fixup
* fix docs
* fix doc
* fix test
* hopefully fix pipeline tests
* refactor
* fix CIs
* add comment
* rename to `GPTBigCodeForCausalLM`
* correct readme
* make fixup + docs
* make fixup
* fixes
* fixes
* Remove pruning
* Remove import
* Doc updates
* More pruning removal
* Combine copies
* Single MQA implementation, remove kv cache pre-allocation and padding
* Update doc
* Revert refactor to match gpt2 style
* Merge back key and value caches, fix some type hints
* Update doc
* Fix position ids pith padding (PR 21080)
* Add conversion script temporarily
* Update conversion script
* Remove checkpoint conversion
* New model
* Fix MQA test
* Fix copies
* try fix tests
* FIX TEST!!
* remove `DoubleHeadsModel`
* add MQA tests
* add slow tests
* clean up
* add CPU checker
* final fixes
* fixes
  - fix GPU issue
  - fixed slow tests
  - skip disk offload
* fix final issue
* Simplify and comment baddbmm fix
* Remove unnecessary code
* Transpose tweaks
* Use beta=1 on cpu, improve tests

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>

asr.mdx	Added "Open in Colab" to task guides (#21729)	2023-02-22 08:32:35 -05:00
audio_classification.mdx	[Whisper] Add model for audio classification (#21754)	2023-03-07 16:20:21 +01:00
document_question_answering.mdx	Add: document question answering task guide (#21518)	2023-02-13 09:24:56 -05:00
image_captioning.mdx	[Tasks] Adds image captioning (#21512)	2023-02-10 22:52:12 +05:30
image_classification.mdx	Fix doc links (#22274)	2023-03-20 17:07:31 +00:00
language_modeling.mdx	Add GPTBigCode model (Optimized GPT2 with MQA from Santacoder & BigCode) (#22575)	2023-04-10 10:57:21 +02:00
masked_language_modeling.mdx	Add Mega: Moving Average Equipped Gated Attention (#21766)	2023-03-24 08:17:27 -04:00
monocular_depth_estimation.mdx	Depth estimation task guide (#22205)	2023-03-17 08:36:23 -04:00
multiple_choice.mdx	Add Mega: Moving Average Equipped Gated Attention (#21766)	2023-03-24 08:17:27 -04:00
object_detection.mdx	Update quality tooling for formatting (#21480)	2023-02-06 18:10:56 -05:00
question_answering.mdx	Add Mega: Moving Average Equipped Gated Attention (#21766)	2023-03-24 08:17:27 -04:00
semantic_segmentation.mdx	Fix doc links (#22274)	2023-03-20 17:07:31 +00:00
sequence_classification.mdx	Add GPTBigCode model (Optimized GPT2 with MQA from Santacoder & BigCode) (#22575)	2023-04-10 10:57:21 +02:00
summarization.mdx	[WIP]`NLLB-MoE` Adds the moe model (#22024)	2023-03-27 19:42:00 +02:00
token_classification.mdx	Add GPTBigCode model (Optimized GPT2 with MQA from Santacoder & BigCode) (#22575)	2023-04-10 10:57:21 +02:00
translation.mdx	[WIP]`NLLB-MoE` Adds the moe model (#22024)	2023-03-27 19:42:00 +02:00
video_classification.mdx	Automated compatible models list for task guides (#21338)	2023-01-27 13:19:28 -05:00
zero_shot_image_classification.mdx	Zero-shot image classification task guide (#22132)	2023-03-13 10:57:17 -04:00
zero_shot_object_detection.mdx	Add: task guide for zero shot object detection (#21829)	2023-02-28 10:23:08 -05:00