transformers/tests
Joel Lamy-Poirier e0921c6b53
Add GPTBigCode model (Optimized GPT2 with MQA from Santacoder & BigCode) (#22575)
* Add model with cli tool

* Remove unwanted stuff

* Add new code

* Remove inference runner

* Style

* Fix checks

* Test updates

* make fixup

* fix docs

* fix doc

* fix test

* hopefully fix pipeline tests

* refactor

* fix CIs

* add comment

* rename to `GPTBigCodeForCausalLM`

* correct readme

* make fixup + docs

* make fixup

* fixes

* fixes

* Remove pruning

* Remove import

* Doc updates

* More pruning removal

* Combine copies

* Single MQA implementation, remove kv cache pre-allocation and padding

* Update doc

* Revert refactor to match gpt2 style

* Merge back key and value caches, fix some type hints

* Update doc

* Fix position ids with padding (PR 21080)

* Add conversion script temporarily

* Update conversion script

* Remove checkpoint conversion

* New model

* Fix MQA test

* Fix copies

* try fix tests

* FIX TEST!!

* remove `DoubleHeadsModel`

* add MQA tests

* add slow tests

* clean up

* add CPU checker

* final fixes

* fixes

- fix GPU issue
- fixed slow tests
- skip disk offload

* fix final issue

* Simplify and comment baddbmm fix

* Remove unnecessary code

* Transpose tweaks

* Use beta=1 on cpu, improve tests

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
2023-04-10 10:57:21 +02:00
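The commit title refers to MQA (multi-query attention), in which every query head shares a single key/value projection, shrinking the KV cache by roughly the number of heads. Below is a minimal, hypothetical sketch of that idea only; names, shapes, and the function itself are illustrative and are not the GPTBigCode implementation.

```python
import torch

def multi_query_attention(hidden, w_q, w_kv, n_heads):
    # hidden: (batch, seq, embed_dim); w_q: (embed_dim, embed_dim); w_kv: (embed_dim, 2 * head_dim)
    batch, seq, embed_dim = hidden.shape
    head_dim = embed_dim // n_heads
    # Queries keep one projection per head, as in standard multi-head attention.
    q = (hidden @ w_q).view(batch, seq, n_heads, head_dim).transpose(1, 2)  # (b, h, s, d)
    # Keys and values are projected once and shared by every query head (the MQA part).
    k, v = (hidden @ w_kv).split(head_dim, dim=-1)                          # each (b, s, d)
    scores = q @ k.transpose(-1, -2).unsqueeze(1) / head_dim ** 0.5         # (b, h, s, s), broadcast over heads
    out = scores.softmax(dim=-1) @ v.unsqueeze(1)                           # (b, h, s, d)
    return out.transpose(1, 2).reshape(batch, seq, embed_dim)

# Illustrative usage with toy sizes:
# x = torch.randn(2, 5, 64)
# y = multi_query_attention(x, torch.randn(64, 64), torch.randn(64, 2 * 8), n_heads=8)
# y.shape -> torch.Size([2, 5, 64])
```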
benchmark [Test refactor 1/5] Per-folder tests reorganization (#15725) 2022-02-23 15:46:28 -05:00
deepspeed [deepspeed] offload + non-cpuadam optimizer exception (#22043) 2023-03-09 08:12:57 -08:00
extended Apply ruff flake8-comprehensions (#21694) 2023-02-22 09:14:54 +01:00
fixtures [WIP] add SpeechT5 model (#18922) 2023-02-03 12:43:46 -05:00
generation Generate: TextIteratorStreamer timeout (#22576) 2023-04-05 09:57:46 +01:00
mixed_int8 [bnb] fix bnb failing test (#22439) 2023-03-29 15:13:00 +02:00
models Add GPTBigCode model (Optimized GPT2 with MQA from Santacoder & BigCode) (#22575) 2023-04-10 10:57:21 +02:00
onnx Time to Say Goodbye, torch 1.7 and 1.8 (#22291) 2023-03-21 19:22:01 +01:00
optimization Make schedulers picklable by making lr_lambda fns global (#21768) 2023-03-02 12:08:43 -05:00
pipelines Soft error whisper. (#22475) 2023-04-04 16:21:57 +02:00
repo_utils Test fetch v2 (#22367) 2023-03-31 16:18:43 -04:00
sagemaker Apply ruff flake8-comprehensions (#21694) 2023-02-22 09:14:54 +01:00
tokenization Update quality tooling for formatting (#21480) 2023-02-06 18:10:56 -05:00
trainer Implemented safetensors checkpoints save/load for Trainer (#22498) 2023-04-04 09:05:04 -04:00
utils Update tiny model summary file for recent models (#22637) 2023-04-06 22:52:59 +02:00
__init__.py GPU text generation: Moved the encoded_prompt to the correct device 2020-01-06 15:11:12 +01:00
test_backbone_common.py Backbone add mixin tests (#22542) 2023-04-06 13:50:15 +01:00
test_configuration_common.py Remove set_access_token usage + fail tests if FutureWarning (#22051) 2023-03-09 09:23:48 -05:00
test_feature_extraction_common.py Remove set_access_token usage + fail tests if FutureWarning (#22051) 2023-03-09 09:23:48 -05:00
test_image_processing_common.py Remove set_access_token usage + fail tests if FutureWarning (#22051) 2023-03-09 09:23:48 -05:00
test_image_transforms.py Rescale image back if it was scaled during PIL conversion (#22458) 2023-03-30 11:29:11 +01:00
test_modeling_common.py Fix inverted conditional in TF common test! (#22540) 2023-04-04 21:59:54 +01:00
test_modeling_flax_common.py Remove set_access_token usage + fail tests if FutureWarning (#22051) 2023-03-09 09:23:48 -05:00
test_modeling_tf_common.py Fix inverted conditional in TF common test! (#22540) 2023-04-04 21:59:54 +01:00
test_pipeline_mixin.py Make tiny model creation + pipeline testing more robust (#22500) 2023-04-06 17:45:55 +02:00
test_sequence_feature_extraction_common.py Apply ruff flake8-comprehensions (#21694) 2023-02-22 09:14:54 +01:00
test_tokenization_common.py Fix llama tokenizer (#22402) 2023-04-03 09:07:32 -04:00