* registering a buffer for token_type_ids to fix the error of the device id getting hardcoded when tracing
* style format
* adding a persistent flag to the registered buffers that prevents adding them to the state_dict and addresses the backward compatibility issue
* adding a try/except to the fix, as the persistent flag is only available from PyTorch > 1.6
* adding version check
* added a condition to only use the token_type_ids buffer when it is autogenerated, not passed by the user
* adding comments and making the condition where token_type_ids is None use the registered buffer
* taking position embeddings out of the if block
* adding comments
* handling the case if buffer for position_ids was not registered
* reverted the changes on position_ids, fixed the issue with the size of the token_type_ids buffer, moved the modification for generated token_type_ids to BertModel instead of Embeddings
* reverting the handling of token_type_ids in the None case to the previous version
* reverting changes on position_ids, adding back the if block
* changes added by running make fix-copies
* changes added by running make fix-copies and added the import version as it was getting used
* fixing the import format
* modified to use a temp tensor for the trimmed and expanded token_type_ids buffer (sketched after this commit group)
* changes made by fix-copies after temp tensor modifications
* clean up
* Nit
* modified to support device conversion on traced models
* changes based on the latest master
* Adapt templates
* Add version import
Co-authored-by: Ubuntu <ubuntu@ip-172-31-32-81.us-west-2.compute.internal>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
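Taken together, the commits above converge on a non-persistent buffer plus a version guard. A minimal sketch of the pattern, with the class and config simplified from the actual BERT embeddings code:

```python
import torch
from packaging import version
from torch import nn


class Embeddings(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.word_embeddings = nn.Embedding(config.vocab_size, config.hidden_size)
        self.token_type_embeddings = nn.Embedding(config.type_vocab_size, config.hidden_size)
        # persistent=False keeps the buffer out of the state_dict (backward
        # compatibility); the flag only exists from PyTorch > 1.6, hence the guard
        if version.parse(torch.__version__) > version.parse("1.6.0"):
            self.register_buffer(
                "token_type_ids",
                torch.zeros((1, config.max_position_embeddings), dtype=torch.long),
                persistent=False,
            )

    def forward(self, input_ids, token_type_ids=None):
        batch_size, seq_length = input_ids.shape
        if token_type_ids is None:
            if hasattr(self, "token_type_ids"):
                # trim the registered buffer to the sequence length and expand it
                # to the batch via a temp tensor; tracing picks the device up from
                # the buffer instead of hardcoding the one seen at trace time
                buffered = self.token_type_ids[:, :seq_length]
                token_type_ids = buffered.expand(batch_size, seq_length)
            else:
                token_type_ids = torch.zeros_like(input_ids)
        return self.word_embeddings(input_ids) + self.token_type_embeddings(token_type_ids)


from types import SimpleNamespace

cfg = SimpleNamespace(vocab_size=100, hidden_size=16, type_vocab_size=2, max_position_embeddings=32)
emb = Embeddings(cfg)
print(emb(torch.randint(0, 100, (2, 8))).shape)  # torch.Size([2, 8, 16])
```

Because the buffer travels with the module, moving a traced model with `.to(device)` carries it along with the parameters, which is what makes device conversion on traced models work.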
* Fixing a bug that appears when using distillation (and potentially other uses).
During the backward pass PyTorch complains with:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
This happens because the QA model code modifies the start_positions and end_positions input tensors in place via the clamp_ function: as a consequence the teacher and the student both modify the inputs, and the backward pass fails.
* Fixing the QA clamp_ bug in all models (sketched below).
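A self-contained sketch of the in-place vs. out-of-place distinction behind the fix (shapes invented for illustration):

```python
import torch

start_logits = torch.randn(4, 128)             # (batch, seq_len)
start_positions = torch.randint(0, 200, (4,))  # may contain out-of-range targets
end_positions = torch.randint(0, 200, (4,))

ignored_index = start_logits.size(1)
# Before: the in-place clamp_ mutated the caller's tensors, so a teacher and a
# student sharing the same start/end positions both modified them, and autograd
# raised the in-place modification RuntimeError.
#   start_positions.clamp_(0, ignored_index)
# After: the out-of-place clamp leaves the shared input tensors untouched.
start_positions = start_positions.clamp(0, ignored_index)
end_positions = end_positions.clamp(0, ignored_index)
```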
* Add cross_attn_head_mask to BART
* Fix cross_attentions in TFBart-like models
* This commit enables returning of `cross_attentions`
for TFBart-like models
* It also fixes attention head masking in the cross-attention module
* Update TF model templates
* Fix missing , in TF model templates
* Fix typo: congig -> config
* Fix cross-attention head mask for Torch BART models
* Fix head masking for cross-attention module for the following
models: BART, Blenderbot, Blenderbot_small, M2M_100, Marian, MBart,
Pegasus
* Enable test_headmasking for M2M_100 model
* Fix cross_head_mask for FSMT, LED and T5
* This commit fixes `head_mask` for cross-attention modules
in the following models: FSMT, LED, T5
* It also contains some smaller doc changes so that it is
perfectly clear that the shape of `cross_head_mask`
is the same as that of `decoder_head_mask` (usage sketched after this group)
* Update template
* Fix template for BartForCausalLM
* Fix cross_head_mask for Speech2Text models
* Fix cross_head_mask in templates
* Fix args order in BartForCausalLM template
* Fix doc in BART templates
* Make naming more explicit
* `cross_head_mask` -> `cross_attn_head_mask`
* `cross_layer_head_mask` -> `cross_attn_layer_head_mask`
* Fix doc
* make style quality
* Fix speech2text docstring
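A usage sketch of the new argument, assuming the `facebook/bart-base` checkpoint; as noted above, the mask has the same shape as `decoder_head_mask`, i.e. `(decoder_layers, decoder_attention_heads)`:

```python
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")

inputs = tokenizer("Hello world", return_tensors="pt")
cross_attn_head_mask = torch.ones(
    model.config.decoder_layers, model.config.decoder_attention_heads
)
cross_attn_head_mask[0, 0] = 0.0  # mask head 0 of the first decoder layer's cross-attention

outputs = model(
    **inputs,
    cross_attn_head_mask=cross_attn_head_mask,
    output_attentions=True,
)
# cross_attentions is now returned: one (batch, num_heads, tgt_len, src_len) tensor per layer
print(outputs.cross_attentions[0].shape)
```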
* Assumption of padding_idx < 2 might not hold
* Use offset instead of 2
* Fix with black
* Change behavior to a warning instead, for backward compatibility.
* Fix with black
* Remove warning
* Make padding_idx non-required
* padding_idx fix for blenderbot
* padding_idx fix for blenderbot_small
* padding_idx fix for led
* padding_idx fix for mbart
* Remove extra whitespaces
* padding_idx fix for template
* Fix the mistake of padding_idx being passed to nn.Embedding
* Fixed padding_idx passed to the positional embedding in the template
* Remove padding_idx from the PyTorch learned positional embeddings (sketched below)
* Remove accidentally added quotes
* Remove padding_idx from tf learned positional embeddings
* Remove zeroing of weights in __init__
Co-authored-by: Wang Ming Rui <mingrui.wang@C02CJTUYMD6M.local>
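A minimal sketch of the offset-based learned positional embedding this series lands on (class name assumed): a fixed offset of 2 replaces the old `padding_idx < 2` assumption, and `padding_idx` is no longer passed to `nn.Embedding`:

```python
import torch
from torch import nn


class LearnedPositionalEmbedding(nn.Embedding):
    def __init__(self, num_embeddings: int, embedding_dim: int):
        # positions are always shifted by a fixed offset instead of relying on
        # padding_idx, so the embedding table needs `offset` extra rows
        self.offset = 2
        super().__init__(num_embeddings + self.offset, embedding_dim)

    def forward(self, input_ids_shape: torch.Size, past_key_values_length: int = 0):
        seq_len = input_ids_shape[1]
        positions = torch.arange(
            past_key_values_length, past_key_values_length + seq_len, dtype=torch.long
        )
        return super().forward(positions + self.offset)


pos_emb = LearnedPositionalEmbedding(num_embeddings=512, embedding_dim=16)
print(pos_emb(torch.Size([2, 10])).shape)  # torch.Size([10, 16])
```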
* change tokenizer requirement
* split line
* Correct typo from list to str
* improve style
* make the other function pretty as well
* add comment
* correct typo
* add new test
* pass tests for tokenizers without a padding token (sketched below)
* Apply suggestions from code review
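A small sketch of the behavior the new test exercises, using GPT-2, whose tokenizer ships without a padding token:

```python
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # no pad token by default

try:
    tokenizer(["short", "a much longer sequence"], padding=True, return_tensors="pt")
except ValueError as err:
    print(err)  # asks the user to select a token to use as pad_token

tokenizer.pad_token = tokenizer.eos_token  # one common workaround
batch = tokenizer(["short", "a much longer sequence"], padding=True, return_tensors="pt")
print(batch["input_ids"].shape)
```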
* Fix the computation of attention_probs when head_mask is provided (sketched below).
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Apply changes to the template
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
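A minimal sketch (function and variable names assumed) of how a per-head mask enters the attention probabilities, so the masked probabilities are the ones actually used downstream and returned:

```python
import torch


def attention_with_head_mask(attention_scores, value, head_mask=None, dropout_p=0.0):
    # softmax over the key dimension gives the attention probabilities
    attention_probs = torch.nn.functional.softmax(attention_scores, dim=-1)
    attention_probs = torch.nn.functional.dropout(attention_probs, p=dropout_p)
    if head_mask is not None:
        # head_mask broadcasts over (batch, num_heads, query_len, key_len)
        attention_probs = attention_probs * head_mask
    context = torch.matmul(attention_probs, value)
    # return the masked probs so output_attentions stays consistent with
    # what was used to weight the values
    return context, attention_probs


scores = torch.randn(2, 12, 5, 5)
value = torch.randn(2, 12, 5, 64)
head_mask = torch.ones(1, 12, 1, 1)
head_mask[0, 0] = 0.0  # disable head 0
context, probs = attention_with_head_mask(scores, value, head_mask)
```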
* Update past_key_values in gpt2 (#9391)
* Update generation_utils, and rename some items
* Update modeling_gpt2 to avoid an error in gradient_checkpointing
* Remove '_reorder_cache' from utils and add variations to XLNet, TransfoXL, and GPT-2
* Change the location of '_reorder_cache' to the modeling files (sketched below)
* Add '_reorder_cache' in modeling_ctrl
* Fix a bug of my last commit in CTRL
* Add '_reorder_cache' to GPT2DoubleHeadsModel
* Manage 'use_cache' in config of test_modeling_gpt2
* Clean up the doc string
* Update src/transformers/models/gpt2/modeling_gpt2.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix the doc string (GPT-2, CTRL)
* improve gradient_checkpointing behavior
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
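A sketch of the per-model `_reorder_cache` hook this series moves out of the generation utilities and into the modeling files, using GPT-2's batch-first `past_key_values` layout:

```python
from typing import Tuple

import torch


def _reorder_cache(past: Tuple[Tuple[torch.Tensor, ...], ...], beam_idx: torch.Tensor):
    # GPT-2 stores past_key_values as one (key, value) pair per layer with the
    # batch dimension first, so reordering beams is an index_select on dim 0
    return tuple(
        tuple(past_state.index_select(0, beam_idx) for past_state in layer_past)
        for layer_past in past
    )


# toy usage: 2 layers, 3 beams, 4 heads, sequence length 5, head dim 8
past = tuple((torch.randn(3, 4, 5, 8), torch.randn(3, 4, 5, 8)) for _ in range(2))
reordered = _reorder_cache(past, torch.tensor([2, 0, 1]))
print(reordered[0][0].shape)  # torch.Size([3, 4, 5, 8])
```

Models with a different cache layout (e.g. XLNet or TransfoXL) override this hook with their own variation, which is why it lives in each modeling file rather than in a shared utility.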