transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-24 23:08:57 +06:00

Author	SHA1	Message	Date
Sylvain Gugger	27d4639779	Make gradient_checkpointing a training argument (#13657 ) * Make gradient_checkpointing a training argument * Update src/transformers/modeling_utils.py Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> * Update src/transformers/configuration_utils.py Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> * Fix tests * Style * document Gradient Checkpointing as a performance feature * Small rename * PoC for not using the config * Adapt BC to new PoC * Forgot to save * Rollout changes to all other models * Fix typo Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> Co-authored-by: Stas Bekman <stas@stason.org>	2021-09-22 07:51:38 -04:00
Bhadresh Savani	3fbb55c757	[Flax] Fixes typo in Bart based Flax Models (#13565 )	2021-09-15 11:03:52 +05:30
Nils Reimers	c8be8a9adb	Update model configs - Allow setters for common properties (#13026 ) * refactor GPT Config to allow dyn. properties * make attribute_map a class attribute * remove old code * update unit test to test config: Add test for common properties setter * update unit test to test config: Add test for common properties passed as parameters to __init__ * update to black code format * Allow that setters are not defined for certain config classes * update config classes to implement attribute_map * bugfix lxmert config - id2labels was not defined when num_labels was set * update broken configs - add attribute_maps * update bart config * update black codestyle * update documentation on common config attributes * update GPTJ config to new attribute map * update docs on common attributes * gptj config: add max_position_embeddings * gptj config: format with black * update speech to text 2 config * format doc file to max_len 119 * update config template	2021-09-06 16:30:13 +02:00
Patrick von Platen	02039352b2	Update README.md	2021-09-01 09:50:21 +02:00
Jonathan Chang	d160782a53	Add template for adding flax models (#12441 ) * Add option to add flax * Add flax template for __init__.py * Add flax template for .rst * Copy TF modeling template * Add a missing line in modeling_tf_... template * Update first half of modeling_flax_.. * Update encoder flax template * Copy test_modeling_tf... as test_modeling_flax... * Replace some TF to Flax in test_modeling_flax_... * Replace tf to np some function might not work, like _assert_tensors_equal * Replace remaining tf to np (might not work) * Fix cookiecutter * Add Flax in to_replace_... template * Update transformers-cli add-new-model * Save generate_flax in configuration.json This will be read by transformers-cli * Fix to_replace_... and cli * Fix replace cli * Fix cookiecutter name * Move docstring earlier to avoid not defined error * Fix a missing Module * Add encoder-decoder flax template from bart * Fix flax test * Make style * Fix endif * Fix replace all "utf-8 -> unp-8" * Update comment * Fix flax template (add missing ..._DOCSTRING) * Use flax_bart imports in template (was t5) * Fix unp * Update templates/adding_a_new_model/tests * Revert "Fix unp" This reverts commit `dc9002a41d`. * Remove one line of copied from to suppress CI error * Use generate_tensorflow_pytorch_and_flax * Add a missing part * fix typo * fix flax config * add examples for flax * small rename * correct modeling imports * correct auto loading * corrects some flax tests * correct small typo * correct as type * finish modif * correct more templates * final fixes * add file testers * up * make sure tests match template regex * correct pytorch * correct tf * correct more tf * correct imports * minor error * minor error * correct init * more fixes * correct more flax tests * correct flax test * more fixes * correct docs * update * fix Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2021-09-01 09:49:03 +02:00
Jongheon Kim	ef8d6f2b4a	Set missing seq_length variable when using inputs_embeds with ALBERT & Remove code duplication (#13152 ) * Set seq_length variable when using inputs_embeds * remove code duplication	2021-08-31 06:51:25 -04:00
Sylvain Gugger	9870093f7b	[WIP] Disentangle auto modules from other modeling files (#13023 ) * Initial work * All auto models * All tf auto models * All flax auto models * Tokenizers * Add feature extractors * Fix typos * Fix other typo * Use the right config * Remove old mapping names and update logic in AutoTokenizer * Update check_table * Fix copies and check_repo script * Fix last test * Add back name * clean up * Update template * Update template * Forgot a ) * Use alternative to fixup * Fix TF model template * Address review comments * Address review comments * Style	2021-08-06 13:12:30 +02:00
Sylvain Gugger	790f1c9545	Fix template for inputs docstrings (#12976 )	2021-08-03 08:28:25 +02:00
Lysandre Debut	c3d9ac7607	Expose get_config() on ModelTesters (#12812 ) * Expose get_config() on ModelTesters * Typo	2021-07-21 04:13:11 -04:00
Sylvain Gugger	0a6b9048d1	Init pickle (#12567 ) * Try to pickle transformers * Deal with special objs better * Make picklable	2021-07-08 07:20:46 -04:00
Michal Szutenberg	0d2bffad31	Remove tf.roll wherever not needed (#12512 ) It was used in shift_right. After this change TF code is more similar to Pytorch implementations Also, TF graphs are optimized (one node less)	2021-07-07 16:17:30 +01:00
Sylvain Gugger	9eda6b52e2	Add all XxxPreTrainedModel to the main init (#12314 ) * Add all XxxPreTrainedModel to the main init * Add to template * Add to template bis * Add FlaxT5	2021-06-23 10:40:54 -04:00
Hamid Shojanazeri	af6e01c5bc	Fix for the issue of device-id getting hardcoded for token_type_ids during Tracing [WIP] (#11252 ) * registering a buffer for token_type_ids, to pass the error of device-id getting hardcoded when tracing * sytle format * adding persistent flag to the resgitered buffers that prevent from adding them to the state_dict and addresses the Backward compatibility issue * adding the try catch to the fix as persistent flag is only available from PT >1.6 * adding version check * added the condition to only use the token_type_ids buffer when its autogenerated not passed by user * adding comments and making the conidtion where token_type_ids are None to use the registered buffer * taking out position-embeddding from the if block * adding comments * handling the case if buffer for position_ids was not registered * reverted the changes on position_ids, fix the issue with size of token_type_ids buffer, moved the modification for generated token_type_ids to Bertmodel, instead of Embeddings * reverting the token_type_ids in case of None to the previous version * reverting changes on position_ids adding back the if block * changes added by running make fix-copies * changes added by running make fix-copies and added the import version as it was getting used * changes added by running make fix-copies * changes added by running make fix-copies * fixing the import format * fixing the import format * modified to use temp tensor for trimed and expanded token_type_ids buffer * changes made by fix-copies after temp tensor modifications * changes made by fix-copies after temp tensor modifications * changes made by fix-copies after temp tensor modifications * clean up * clean up * clean up * clean up * Nit * Nit * Nit * modified according to support device conversion on traced models * modified according to support device conversion on traced models * modified according to support device conversion on traced models * modified according to support device conversion on traced models * changes based on latest in master * Adapt templates * Add version import Co-authored-by: Ubuntu <ubuntu@ip-172-31-32-81.us-west-2.compute.internal> Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>	2021-06-22 05:21:30 -04:00
Xa9aX ツ	f3558bbcfd	Depreciate pythonic Mish and support PyTorch 1.9 version of Mish (#12240 ) * Moved Mish to Torch 1.9 version * Run black formatting	2021-06-18 09:13:45 -04:00
Stas Bekman	a156da9a23	consistent nn. and nn.functional: p2 templates (#12153 )	2021-06-14 11:41:24 -07:00
François Lagunas	f8bd8c6c7e	Fixes bug that appears when using QA bert and distilation. (#12026 ) * Fixing bug that appears when using distilation (and potentially other uses). During backward pass Pytorch complains with: RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation This happens because the QA model code modifies the start_positions and end_positions input tensors, using clamp_ function: as a consequence the teacher and the student both modifies the inputs, and backward pass fails. * Fixing all models QA clamp_ bug.	2021-06-07 11:21:59 -04:00
Suraj Patil	185122ef22	fix docs of past_key_values (#12049 )	2021-06-07 15:24:03 +05:30
Sylvain Gugger	7eee950ac3	Re-styling in seq2seq attention (#11613 )	2021-05-06 14:24:19 -04:00
Daniel Stancl	38a716cd41	TF BART models - Add `cross_attentions` to model output and fix cross-attention head masking (#10699 ) * Add cross_attn_head_mask to BART * Fix cross_attentions in TFBart-like models * This commit enables returning of `cross_attentions` for TFBart-like models * It also fixes attention head masking in cross-attenion module * Update TF model templates * Fix missing , in TF model templates * Fix typo: congig -> config	2021-04-26 14:16:21 +02:00
Daniel Stancl	e3ff165aa5	Fix cross-attention head mask for Torch encoder-decoder models (#10605 ) * Fix cross-attention head mask for Torch BART models * Fix head masking for cross-attention module for the following models: BART, Blenderbot, Blenderbot_small, M2M_100, Marian, MBart, Pegasus * Enable test_headmasking for M2M_100 model * Fix cross_head_mask for FSMT, LED and T5 * This commit fixes `head_mask` for cross-attention modules in the following models: FSMT, LED, T5 * It also contains some smaller changes in doc so that it is be perfectly clear the shape of `cross_head_mask` is the same as of `decoder_head_mask` * Update template * Fix template for BartForCausalLM * Fix cross_head_mask for Speech2Text models * Fix cross_head_mask in templates * Fix args order in BartForCausalLM template * Fix doc in BART templates * Make more explicit naming * `cross_head_mask` -> `cross_attn_head_mask` * `cross_layer_head_mask` -> `cross_attn_layer_head_mask` * Fix doc * make style quality * Fix speech2text docstring	2021-04-23 18:58:06 +02:00
Sylvain Gugger	74712e22f3	Honor contributors to models (#11329 ) * Honor contributors to models * Fix typo * Address review comments * Add more authors	2021-04-21 09:47:27 -04:00
Sylvain Gugger	45fc8c7951	Make `get_special_tokens_mask` consider all tokens (#11163 )	2021-04-09 11:57:44 -04:00
Stas Bekman	c9035e4537	fix: The 'warn' method is deprecated (#11105 ) * The 'warn' method is deprecated * fix test	2021-04-07 09:20:06 -04:00
Sylvain Gugger	acc3bd9d2a	Enforce string-formatting with f-strings (#10980 ) * First third * Styling and fix mistake * Quality * All the rest * Treat %s and %d * typo * Missing ) * Apply suggestions from code review Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Lysandre Debut <lysandre@huggingface.co>	2021-03-31 10:00:27 -04:00
Sylvain Gugger	700229f8a4	Fixes in the templates (#10951 ) * Fixes in the templates * Define in all cases * Dimensionality -> Dimension Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>	2021-03-29 17:36:13 -04:00
Sylvain Gugger	2295d783d5	Copy tokenizer files in each of their repo (#10624 ) * Move tokenizer files in each repo * Fix mBART50 tests * Fix mBART tests * Fix Marian tests * Update templates	2021-03-10 11:26:23 -05:00
Sylvain Gugger	7da995c00c	Fix embeddings for PyTorch 1.8 (#10549 ) * Fix embeddings for PyTorch 1.8 * Try with PyTorch 1.8.0 * Fix embeddings init * Fix copies * Typo * More typos	2021-03-05 16:18:48 -05:00
Patrick von Platen	2d2ed2cc18	[T5] Fix speed degradation bug t5 (#10496 ) * fix speed degradation bug t5 * fix for all models * fix code quality	2021-03-03 12:42:41 +03:00
mingruimingrui	894db6701e	Bugfix: Removal of padding_idx in BartLearnedPositionalEmbedding (#10200 ) * Assumption of padding_idx <2 might not stand * Use offset instead of 2 * Fix with black * Change behavior to warning instead for backward compatibility. * Fix with black * Remove warning * Make padding_idx non-required * padding_idx fix for blenderbot * padding_idx fix for blenderbot_small * padding_idx fix for led * padding_idx fix for mbart * Remove extra whitespaces * padding_idx fix for template * Fix padding_idx passed to nn.Embedding mistake * Fixed padding_idx passed to positional embedding in template * Remove padding_idx from pytorch learned positional embeddings * Remove accidentally added quotes * Remove padding_idx from tf learned positional embeddings * Remove zeroing of weights in __init__ Co-authored-by: Wang Ming Rui <mingrui.wang@C02CJTUYMD6M.local>	2021-02-25 14:33:13 +03:00
Julien Plu	83d803ba02	Making TF BART-like models XLA and AMP compliant (#10191 ) * Update BART * Update Blenderbot * Update BlenderbotSmall * Update Marian * Update MBart * Update MBart * Update Pegasus * Update template * Fix Marian and Pegasus * Apply style * Default initializer * Default initializer * Default initializer * Remove int32 casts * Fix template * Remove more cast	2021-02-17 17:48:56 +01:00
Julien Plu	31b0560ab4	Add AMP for Albert (#10141 )	2021-02-15 17:18:33 +01:00
Julien Plu	570218878a	Fix TF template (#10189 ) * Fix template * Update Seq2Seq tests	2021-02-15 09:21:57 -05:00
Patrick von Platen	8e13b73593	Update README.md	2021-02-11 18:35:27 +03:00
Patrick von Platen	d6b4f48ecb	Update ADD_BIG_BIRD.md	2021-02-11 18:34:17 +03:00
Patrick von Platen	4cda2d73ef	Update ADD_BIG_BIRD.md	2021-02-09 19:58:35 +03:00
Julien Plu	b82fe7d258	Replace strided slice with tf.expand_dims (#10078 ) * Replace tf.newaxis -> tf.expand_dims * Fix tests * Fix tests * Use reshape when a tensors needs a double expand * Fix GPT2 * Fix GPT2	2021-02-09 11:48:28 -05:00
Lysandre Debut	c9df1b1d53	Model templates (#10072 )	2021-02-08 09:07:02 -05:00
Julien Plu	cdd8659231	Fix TF template (#10069 ) * Fix template * Fix template	2021-02-08 08:10:50 -05:00
Julien Plu	31563e056d	Restore TF embeddings and attention layers to their previous version (#9890 ) * Refacto BERT * Restore all the concerned models * Remove print * Update template * Apply Sylvain's and Morgan's comments * Fix cast * Put the cast inside call * Remove cond in ebds * Fix funnel * Restore previous dot product (attention_scores) computation * Add ConvBERT and BART * Make all the S2S models ONNX compliant * Fix test * Fix check copies	2021-02-08 14:36:30 +03:00
Lysandre Debut	ae37ceacbd	Fix typo (#10064 )	2021-02-08 06:02:05 -05:00
Patrick von Platen	89be094e29	[Templates] Add template "call-for-model" markdown and "call-for-big-bird" markdown (#9921 ) * add big bird * change teacher to mentor * add proposal template * adapt template * delete old template * correct some links * finish template * create big bird from template * add big bird * improve boxes * finish boxes * add pointers for BigBird * finish big bird * up * up * up * up * apply lysandres and sylvains suggestions * delete bogus file * correct markdown * try different style * try different style * finalize	2021-02-05 15:47:54 +03:00
Lysandre Debut	e89c959af9	Fix model templates (#9999 )	2021-02-04 07:47:26 -05:00
demSd	00031785a8	BartForCausalLM analogs to `ProphetNetForCausalLM` (#9128 ) * initiliaze bart4causalLM * create BartDecoderWrapper, setters/getters * delete spaces * forward and additional methods * update cache function, loss function, remove ngram* params in data class. * add bartcausallm, bartdecoder testing * correct bart for causal lm * remove at * add mbart as well * up * fix typo * up * correct * add pegasusforcausallm * add blenderbotforcausallm * add blenderbotsmallforcausallm * add marianforcausallm * add test for MarianForCausalLM * add Pegasus test * add BlenderbotSmall test * add blenderbot test * fix a fail * fix an import fail * a fix * fix * Update modeling_pegasus.py * fix models * fix inputs_embeds setting getter * adapt tests * correct repo utils check * finish test improvement * fix tf models as well * make style * make fix-copies * fix copies * run all tests * last changes * fix all tests Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2021-02-04 11:56:12 +03:00
Patrick von Platen	0f4dc5d864	fix typo in naming (#9944 )	2021-02-02 12:22:42 +03:00
Patrick von Platen	538b3b4607	[Tokenizer Utils Base] Make pad function more flexible (#9928 ) * change tokenizer requirement * split line * Correct typo from list to str * improve style * make other function pretty as well * add comment * correct typo * add new test * pass tests for tok without padding token * Apply suggestions from code review	2021-02-02 10:35:27 +03:00
Patrick von Platen	0e3be1ac8f	Add new model docs (#9667 ) * add new model logic * fix docs * change structure * improve add_new_model * push new changes * up * up * correct spelling * improve docstring * correct line length * update readme * correct links * correct typos * only add rst file for now * Apply suggestions from code review 1 Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> Co-authored-by: Bram Vanroy <Bram.Vanroy@UGent.be> * Apply suggestions from code review Co-authored-by: Bram Vanroy <Bram.Vanroy@UGent.be> Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> Co-authored-by: Stefan Schweter <stefan@schweter.it> Co-authored-by: Bram Vanroy <Bram.Vanroy@UGent.be> * Apply suggestions from code review Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> Co-authored-by: Pierric Cistac <Pierrci@users.noreply.github.com> * finish adding all suggestions * make style * apply Niels feedback * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * apply sylvains suggestions Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> Co-authored-by: Bram Vanroy <Bram.Vanroy@UGent.be> Co-authored-by: Stefan Schweter <stefan@schweter.it> Co-authored-by: Pierric Cistac <Pierrci@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2021-02-01 17:55:10 +03:00
Sylvain Gugger	d85691ac75	Doc title in the template (#9910 )	2021-02-01 03:05:31 -05:00
Funtowicz Morgan	2ee9f9b69e	Fix computation of attention_probs when head_mask is provided. (#9853 ) * Fix computation of attention_probs when head_mask is provided. Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * Apply changes to the template Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>	2021-01-28 06:11:52 -05:00
Lysandre Debut	763ece2fea	Fix model templates (#9842 )	2021-01-27 08:20:58 -05:00
Julien Plu	bd701ab1a0	Fix template (#9840 )	2021-01-27 07:40:30 -05:00

1 2 3 4

169 Commits