transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-05 13:50:13 +06:00

Author	SHA1	Message	Date
s-JoL	c2c99dc7ef	add open-llama model with ckpt (#22795) * update Open-Llama model * update * update format * update doc * update * update stable embedding test * update test case * update format * update readme * fix typo * update name * remove tokenizer and update format * remove convert_open_llama_weights_to_hf * update warning and doc_string --------- Co-authored-by: songliang.bayesian <songliang.bayesian@bytedance.com>	2023-04-28 11:01:32 -04:00
Sugawara	6daa9cb515	add GPTNeoXForSequenceClassification (#22671) * add GPTNeoXForSequenceClassification * move the labels to logits.device (ref: #22561) * fix	2023-04-10 11:52:23 -04:00
Joel Lamy-Poirier	e0921c6b53	Add GPTBigCode model (Optimized GPT2 with MQA from Santacoder & BigCode) (#22575 ) * Add model with cli tool * Remove unwanted stuff * Add new code * Remove inference runner * Style * Fix checks * Test updates * make fixup * fix docs * fix doc * fix test * hopefully fix pipeline tests * refactor * fix CIs * add comment * rename to `GPTBigCodeForCausalLM` * correct readme * make fixup + docs * make fixup * fixes * fixes * Remove pruning * Remove import * Doc updates * More pruning removal * Combine copies * Single MQA implementation, remove kv cache pre-allocation and padding * Update doc * Revert refactor to match gpt2 style * Merge back key and value caches, fix some type hints * Update doc * Fix position ids pith padding (PR 21080) * Add conversion script temporarily * Update conversion script * Remove checkpoint conversion * New model * Fix MQA test * Fix copies * try fix tests * FIX TEST!! * remove `DoubleHeadsModel` * add MQA tests * add slow tests * clean up * add CPU checker * final fixes * fixes - fix GPU issue - fixed slow tests - skip disk offload * fix final issue * Simplify and comment baddbmm fix * Remove unnecessary code * Transpose tweaks * Use beta=1 on cpu, improve tests --------- Co-authored-by: younesbelkada <younesbelkada@gmail.com>	2023-04-10 10:57:21 +02:00
Mitch Naylor	57f25f4b7f	Add Mega: Moving Average Equipped Gated Attention (#21766 ) * add mega file structure and plain pytorch version of mega source code * added config class with old naming conventions * filled in mega documentation * added config class and embeddings with optional token types * updated notes * starting the conversion process, deleted intermediate and added use_cache back to config * renamed config attributes in modeling_mega.py * checkpointing before refactoring incremental decoding functions * removed stateful incremental key/values for EMA and self-attention * refactored MovingAverageGatedAttention to remove stateful k/v history and use unified attention mask * MovingAverageGatedAttention works with incremental decoding + past values, added sequence length enforcement * more comments in MovingAverageGatedAttention + checkpointing before GatedCrossAttention * bug fix in attention mask handling in MovingAverageGatedAttention * removed incremental state from GatedCrossAttention and removed IncrementalState class * finished gated cross attention and got MegaLayer working * fixed causal masking in mega decoder * fixed how padding and causal masks are passed through MegaLayer with and without k/v caching * finished MegaModel; tested with encoder, decoder-only, and cross-attention type inputs; started work on downstream classes; removed mentions of position_ids * added optional dense hidden layer for masked and causal LM classes * docstring updates in MultiHeadEMA and GatedCrossAttention, removed unnecessary inputs in cross-attention * removed before_attn_fn in Mega class and updated docstrings and comments up to there * bug fix in MovingAverageGatedAttention masking * working conversion of MLM checkpoint in scratchpad script -- perfect matches * moved arg for hidden dense layer in LM head to config; discovered issue where from_pretrained is renaming gamma and beta parameters * renamed gamma and beta parameters to avoid HF renaming when loading from 
checkpoint * finished checkpoint conversion script * cleanup old class in mega config script * removed 'copied from' statements and passing integration tests * added num_attention_heads=1 to config for integration compatibility, decoder tests working, generation tests failing * fixed tuple output of megamodel * all common tests passing after fixing issues in decoder, gradient retention, and initialization * added mega-specific tests, ready for more documentation and style checks * updated docstrings; checkpoint before style fixes * style and quality checks, fixed initialization problem in float_tensor, ready for PR * added mega to toctree * removed unnecessary arg in megaconfig * removed unused arg and fixed code samples with leftover roberta models * Apply suggestions from code review Applied all suggestions except the one renaming a class, as I'll need to update that througout Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fixed issue where .view breaks batch dimension, conversion script fixed with absolute imports, updated readme with Mega->MEGA * removed asserts in Mega code, renamed sequencenorm, gatedcrossattention, and NFFN, replaced get_activation_fn with ACTFN, and added sequencenorm to layer norms * reformatted .forward() docstrings to match style and removed unused mask input in cross-attention * removed all reset_parameters() methods and rolled into MegaPreTrainedModel._init_weights() * renamed all single-letter variables and improved readability in tensor size comments, Mega->MEGA in 2 documentation files * variable names in NFFN * manual Mega->MEGA changes in docs * Mega->MEGA in config auto * style and quality fixes * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * renamed parameters and variables with confusing names, added copied from statements, moved fft conv to its own method, other cleanup from PR comments * commit before dealing with merge conflicts * made 
new attention activation functions available in ACT2FN and added generation test from OPT * style and quality in activations and tests * documentation fixes, renaming variables in dropout and rotary positions, used built-in causal masking, encoders->layers in MegaModel, moved comments into docstrings * style and quality fixes after latest updates, before rotary position ids * causal mask in MegaBlock docstring + added missing device passing * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update README.md Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * added Mega prefixes where missing, reverted MegaSequenceNorm to if-else, other module renaming requested in PR * style and quality fixes + readme updates pointing to main --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2023-03-24 08:17:27 -04:00
amyeroberts	8ac29fe090	Fix doc links (#22274)	2023-03-20 17:07:31 +00:00
lewtun	f251441387	Add LlamaForSequenceClassification (#22209) * Add LlamaForSequenceClassification * Update src/transformers/models/llama/modeling_llama.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update src/transformers/models/llama/modeling_llama.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Add docstring * Add test * Add input embedding getter and setter * Remove dead code --------- Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>	2023-03-17 14:39:26 +01:00
Thomas Paviot	ba2a5f13f7	Fix en documentation typos (#21799) * fix wrong url * typos in english documentation	2023-02-27 08:36:36 +01:00
Susnato Dhar	0c9c8472e6	Add Ernie-M Model to huggingface (#21349 ) * config and tokenization(fast too) changed and ErnieEncoder added * Slow Tokenization Added * Tokenizer(slow) is now working and Fast Tokenizer removed * Added Config code * Added Base Model and utils * ErnieMModel is now working * All added except tests * All tests passed except ErnieUIEM * All tests passed * all fixes done * all fixes done * fixed MAP * fixed check_code_quality * fixed Build PR Documentation issue * Added changes(comments) and also updated to the latest upstream/main * Added fixup * Added # Copied comments * Added fixup * Added more comments and some nits * Added fixup * Fixed README_hd.md * Added more fixes * ErnieMTokenizer (being sentencepiece) protected and other docs edited * Added code_quality fix * Fixed for * Added more fix * modified AZ * ernie-m tokenization test added! * attention mask part fixed(with 0->self.config.pad_token_id) * applied make fixup	2023-02-15 09:24:56 -05:00
Jannis Vamvas	b0d539ccad	Add X-MOD (#20939 ) * Add X-MOD to Readme * Add documentation for X-MOD * Implement X-MOD * Fix formatting of X-MOD docs * Change signature of X-MOD forward methods to use lang_ids * Minor changes * Rebase with main and run make fix-copies * Make suggested changes to docstrings * Improve code readability Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Fix code style * Conversion script: Remove asserts and type annotations * Remove _TOKENIZER_FOR_DOC * XMOD -> Xmod * Update copyright note * Fix doctests * Fix docstring * Add integration test for FillMaskPipeline * Revert "Add integration test for FillMaskPipeline" This reverts commit 4381eb3b1d0f5d85785f89caba83928e6efa6d1f. * Add end-to-end integration test for mask fill * make style * Rebase with main and make fix-copies --------- Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>	2023-02-10 15:32:06 +01:00
Steven Liu	0a75717602	Fix task guide formatting (#21409) fix formatting	2023-02-02 10:06:26 -08:00
Maria Khalusova	73a2ff6974	Automated compatible models list for task guides (#21338) * initial commit. added tip placeholders and a script * removed unused imports, fixed paths * fixed generated links * make style * split language modeling doc into two: causal language modeling and masked language modeling * added check_task_guides.py to make fix-copies * review feedback addressed	2023-01-27 13:19:28 -05:00
Steven Liu	d896029e27	Add inference section to task guides (#18781 ) * 📝 start adding inference section to task guides * ✨ make style * 📝 add multiple choice * add rest of inference sections * make style * add compute_metric, push_to_hub, pipeline * make style * add updated sequence and token classification * make style * make edits in token classification * add audio classification * make style * add asr * make style * add image classification * make style * add summarization * make style * add translation * make style * add multiple choice * add language modeling * add qa * make style * review and edits * apply reviews * make style * fix call to processor * apply audio reviews * update to better asr model * make style	2022-11-21 10:06:21 -08:00
Matt	2b9513fdab	Update TF fine-tuning docs (#18654 ) * Update TF fine-tuning docs * Fix formatting * Add some section headers so the right sidebar works better * Squiggly it * Update docs/source/en/training.mdx Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/training.mdx Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/training.mdx Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/training.mdx Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/training.mdx Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/training.mdx Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/training.mdx Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/training.mdx Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/training.mdx Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/training.mdx Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/training.mdx Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/training.mdx Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/training.mdx Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Explain things in the text, not the comments * Make the two dataset creation methods into a list * Move the advice about collation out of a <Tip> * Edits for clarity * Edits for clarity * Edits for clarity * Replace `to_tf_dataset` with `prepare_tf_dataset` in the fine-tuning pages * Restructure the page a little bit * Restructure the page a little bit * Restructure the page a little bit 
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2022-09-07 13:30:07 +01:00
Sylvain Gugger	2e90c3df8f	Doc to dataset (#18037) * Link to the Datasets doc * Remove unwanted file	2022-07-06 12:10:06 -04:00
Sylvain Gugger	b9a768b3ff	Enable doc in Spanish (#16518) * Reorganize doc for multilingual support * Fix style * Style * Toc trees * Adapt templates	2022-04-04 10:25:46 -04:00

15 Commits