transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-31 10:12:23 +06:00

Author	SHA1	Message	Date
Lysandre Debut	d415882b41	Remove tolerance + drop_rows_to_fit by default (#9507 ) * Remove tolerance + drop_rows_to_fit by default * remove drop_rows_to_fit	2021-01-11 08:02:41 -05:00
Julien Plu	1243ee7d0c	Full rework of the TF input/output embeddings and bias resizing (#9193 ) * Start rework resizing * Rework bias/decoder resizing * Full resizing rework * Full resizing rework * Start to update the models with the new approach * Finish to update the models * Update all the tests * Update the template * Fix tests * Fix tests * Test a new approach * Refactoring * Refactoring * Refactoring * New rework * Rework BART * Rework bert+blenderbot * Rework CTRL * Rework Distilbert * Rework DPR * Rework Electra * Rework Flaubert * Rework Funnel * Rework GPT2 * Rework Longformer * Rework Lxmert * Rework marian+mbart * Rework mobilebert * Rework mpnet * Rework openai * Rework pegasus * Rework Roberta * Rework T5 * Rework xlm+xlnet * Rework template * Fix TFT5EncoderOnly + DPRs * Restore previous methods * Fix Funnel * Fix CTRL and TransforXL * Apply style * Apply Sylvain's comments * Restore a test in DPR * Address the comments * Fix bug * Apply style * remove unused import * Fix test * Forgot a method * missing test * Trigger CI * naming update * Rebase * Trigger CI	2021-01-11 06:27:28 -05:00
Nicolas Patry	96f1f74aaf	Fixing tests. It seems master changed something in the warnings. (#9483 ) Trying to keep warning tests for now. Should be discarded if it becomes too hard to maintain.	2021-01-10 15:08:20 +01:00
Nicolas Patry	02e05fb0a5	Making Conversation possible to create directly a full conversation (#9434 ) * Cleaning up conversation tests. * Adding tests that don't require downloading models + conversation can be fully created from static state. * Making tests non flaky (by fixing generation length) * Bumping isort version. * Doc cleanup. * Remove unused test in this PR. * Torch import guard for TF. * Missing torch guard. * Small mistake in doc. * Actual uses `_history` and `_index` cache. + remove dead enumerate + improve warning message. * Update src/transformers/pipelines/conversational.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/pipelines/conversational.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/pipelines/conversational.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Adding comments and cleaner code to address history copy. * Improving pipeline name in tests. * Change tokenizer to a real one (still created at runtime with no external dependency) * Simplify DummyTok, reverse changes on tokenization. * Removing DummyTok. Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2021-01-08 14:33:25 +01:00
Julien Plu	4fbcf8ea49	Fix TF input for np.ndarray (#9294 ) * Fix input for np.ndarray" * add a test * add a test * Add a test * Apply style * Fix test	2021-01-08 08:23:29 -05:00
Patrick von Platen	f33a6f3446	[TFGPT2] - Fix flaky past_key_values test (#9460 ) * fix tf flakey * remove test files	2021-01-07 16:12:08 +01:00
Patrick von Platen	a400fe8931	[LED Test] fix common inputs pt for flaky pt-tf led test (#9459 ) * fix common inputs pt flakey led * fix other tests correspondingly	2021-01-07 12:29:03 +01:00
Julien Plu	812045adcc	New serving (#9419 ) * Add a serving method * Add albert * Add serving for BERT and BART * Add more models * Finish the serving addition * Temp fix * Restore DPR * Fix funnel attribute * Fix attributes GPT2 * Fix OpenAIGPT attribute * Fix T5 attributes * Fix Bart attributes * Fix TransfoXL attributes * Add versioning * better test * Update template * Fix Flaubert * Fix T5 * Apply style * Remove unused imports * Deactivate extra parameters * Remove too long test + saved_model default to False * Ignore the saved model test for some models * Fix some inputs * Fix mpnet serving * Trigger CI * Address all comments	2021-01-07 11:48:49 +01:00
Patrick von Platen	b8462b5b2a	[GenerationOutputs] Fix GenerationOutputs Tests (#9443 ) * fix generation models * fix led * fix docs * add is_decoder * fix last docstrings * make style * fix t5 cross attentions * correct t5	2021-01-06 19:37:02 +01:00
Sylvain Gugger	0c96262f7d	Fast transformers import part 1 (#9441 ) * Don't import libs to check they are available * Don't import integrations at init * Add importlib_metdata to deps * Remove old vars references * Avoid syntax error * Adapt testing utils * Try to appease torchhub * Add dependency * Remove more private variables * Fix typo * Another typo * Refine the tf availability test	2021-01-06 12:17:24 -05:00
Simon Brandeis	c89f1bc92e	Add flags to return scores, hidden states and / or attention weights in GenerationMixin (#9150 ) * Define new output dataclasses for greedy generation * Add output_[...] flags in greedy generation methods Added output_attentions, output_hidden_states, output_scores flags in generate and greedy_search methods in GenerationMixin. * [WIP] Implement logic and tests for output flags in generation * Update GreedySearchOutput classes & docstring * Implement greedy search output accumulation logic Update greedy_search unittests Fix generate method return value docstring Properly init flags with the default config * Update configuration to add output_scores flag * Fix test_generation_utils Sort imports and fix isinstance tests for GreedySearchOutputs * Fix typo in generation_utils * Add return_dict_in_generate for backwards compatibility * Add return_dict_in_generate flag in config * Fix tyPo in configuration * Fix handling of attentions and hidden_states flags * Make style & quality * first attempt attentions * some corrections * improve tests * special models requires special test * disable xlm test for now * clean tests * fix for tf * isort * Add output dataclasses for other generation methods * Add logic to return dict in sample generation * Complete test for sample generation - Pass output_attentions and output_hidden_states flags to encoder in encoder-decoder models - Fix import satements order in test_generation_utils file * Add logic to return dict in sample generation - Refactor tests to avoid using self.assertTrue, which provides scarce information when the test fails - Add tests for the three beam_search methods: vanilla, sample and grouped * Style doc * Fix copy-paste error in generation tests * Rename logits to scores and refactor * Refactor group_beam_search for consistency * make style * add sequences_scores * fix all tests * add docs * fix beam search finalize test * correct docstring * clean some files * Made suggested changes to the documentation * Style doc ? * Style doc using the Python util * Update src/transformers/generation_utils.py * fix empty lines * fix all test Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2021-01-06 17:11:42 +01:00
Stas Bekman	9f675b05d4	[trainer] self.model_wrapped + _model_unwrap (#9390 ) * model wrapped + model_unwrap * cleanup * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * style * deprecation warning * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2021-01-06 06:50:11 -05:00
Patrick von Platen	eef66035a2	[PyTorch Bart] Split Bart into different models (#9343 ) * first try * remove old template * finish bart * finish mbart * delete unnecessary line * init pegasus * save intermediate * correct pegasus * finish pegasus * remove cookie cutter leftover * add marian * finish blenderbot * replace in file * correctly split blenderbot * delete "old" folder * correct "add statement" * adapt config for tf comp * correct configs for tf * remove ipdb * fix more stuff * fix mbart * push pegasus fix * fix mbart * more fixes * fix research projects code * finish docs for bart, mbart, and marian * delete unnecessary file * correct attn typo * correct configs * remove pegasus for seq class * correct peg docs * correct peg docs * finish configs * further improve docs * add copied from statements to mbart * fix copied from in mbart * add copy statements to marian * add copied from to marian * add pegasus copied from * finish pegasus * finish copied from * Apply suggestions from code review * make style * backward comp blenderbot * apply lysandres and sylvains suggestions * apply suggestions * push last fixes * fix docs * fix tok tests * fix imports code style * fix doc	2021-01-05 22:00:05 +01:00
Patrick von Platen	189387e9b2	LED (#9278 ) * create model * add integration * save current state * make integration tests pass * add one more test * add explanation to tests * remove from bart * add padding * remove unnecessary test * make all tests pass * re-add cookie cutter tests * finish PyTorch * fix attention test * Update tests/test_modeling_common.py * revert change * remove unused file * add string to doc * save intermediate * make tf integration tests pass * finish tf * fix doc * fix docs again * add led to doctree * add to auto tokenizer * added tips for led * make style * apply jplus statements * correct tf longformer * apply lysandres suggestions * apply sylvains suggestions * Apply suggestions from code review	2021-01-05 13:14:30 +01:00
Julien Plu	4225740a7b	Use stable functions (#9369 )	2021-01-05 03:58:26 -05:00
Stas Bekman	143289dcf7	[test_model_parallelization] multiple fixes (#9354 )	2021-01-04 12:09:12 -08:00
Patrick von Platen	61443cd7d9	[GPT2] Correct gradient checkpointing (#9308 ) * correct gpt2 * fix gpt2 * fix use_cache ordering * correct past tolerance * fix for all cases * style	2020-12-25 23:28:12 +01:00
Ratthachat (Jung)	f3a3b91d6f	Proposed Fix : [RagSequenceForGeneration] generate "without" input_ids (#9220 ) * Create modeling_tf_dpr.py * Add TFDPR * Add back TFPegasus, TFMarian, TFMBart, TFBlenderBot last commit accidentally deleted these 4 lines, so I recover them back * Add TFDPR * Add TFDPR * clean up some comments, add TF input-style doc string * Add TFDPR * Make return_dict=False as default * Fix return_dict bug (in .from_pretrained) * Add get_input_embeddings() * Create test_modeling_tf_dpr.py The current version is already passed all 27 tests! Please see the test run at : https://colab.research.google.com/drive/1czS_m9zy5k-iSJbzA_DP1k1xAAC_sdkf?usp=sharing * fix quality * delete init weights * run fix copies * fix repo consis * del config_class, load_tf_weights They shoud be 'pytorch only' * add config_class back after removing it, test failed ... so totally only removing "use_tf_weights = None" on Lysandre suggestion * newline after .. note:: * import tf, np (Necessary for ModelIntegrationTest) * slow_test from_pretrained with from_pt=True At the moment we don't have TF weights (since we don't have official official TF model) Previously, I did not run slow test, so I missed this bug * Add simple TFDPRModelIntegrationTest Note that this is just a test that TF and Pytorch gives approx. the same output. However, I could not test with the official DPR repo's output yet * upload correct tf model * remove position_ids as missing keys * fix RagSeq generate with context_input_ids fix RagSeq generate with context_input_ids * apply style * delete unused lines * Add test_rag_sequence_generate_batch_from_context_input_ids * Readability improved * stylying * Stylize * typos * add check_model_generate_from_context_input_ids * make style * Apply suggestions from code review * make style2 Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: patrickvonplaten <patrick@huggingface.co>	2020-12-24 13:38:00 +01:00
Suraj Patil	88ef8893cd	Add caching mechanism to BERT, RoBERTa (#9183 ) * add past_key_values * add use_cache option * make mask before cutting ids * adjust position_ids according to past_key_values * flatten past_key_values * fix positional embeds * fix _reorder_cache * set use_cache to false when not decoder, fix attention mask init * add test for caching * add past_key_values for Roberta * fix position embeds * add caching test for roberta * add doc * make style * doc, fix attention mask, test * small fixes * adress patrick's comments * input_ids shouldn't start with pad token * use_cache only when decoder * make consistent with bert * make copies consistent * add use_cache to encoder * add past_key_values to tapas attention * apply suggestions from code review * make coppies consistent * add attn mask in tests * remove copied from longformer * apply suggestions from code review * fix bart test * nit * simplify model outputs * fix doc * fix output ordering	2020-12-23 23:01:32 +05:30
Patrick von Platen	cbe63949d7	Model Templates for Seq2Seq (#9251 ) * adapt cookie cutter * fix copy past statement * delete copy statements for now * remove unused import from template * make doc rst * correct config docstring * correct training * correct inputs processing tf enc dec * make style * adapt templates * clean tabs * correct tensor -> Tensor naming * correct indent * correct templates * fix the test * break lines to avoid > 119 * Apply suggestions from code review	2020-12-22 23:41:20 +01:00
Sylvain Gugger	490b39e614	Seq2seq trainer (#9241 ) * Add label smoothing in Trainer * Add options for scheduler and Adafactor in Trainer * Put Seq2SeqTrainer in the main lib * Apply suggestions from code review Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Address review comments and adapt scripts * Documentation * Move test not using script to tests folder Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2020-12-22 11:33:44 -05:00
Patrick von Platen	e9d77ccd5a	[EncoderDecoder] Make tests more aggressive (#9256 ) * add tests * make style and fix bart bug * fix bart past key value edge case * correct tf bart test * fix gpt2 tf * fix t5 test	2020-12-22 17:00:04 +01:00
Patrick von Platen	9a12b9696f	[MPNet] Add slow to fast tokenizer converter (#9233 ) * add converter * delet unnecessary comments	2020-12-21 15:41:34 +01:00
Suraj Patil	f4432b7e01	add base model classes to bart subclassed models (#9230 ) * add base model classes to bart subclassed models * add doc	2020-12-21 19:56:46 +05:30
TobiasNorlund	08abdabda1	Fixed beam search generation for GPT2 and T5 (#9219 )	2020-12-21 08:05:23 -05:00
sandip	e0e255be1f	Added TF TransfoXL Sequence Classification (#9169 ) * TF Transfoxl seq classification * Update test_modeling_tf_transfo_xl.py Added num_labels to config level * TF Transfoxl seq classification * Update test_modeling_tf_transfo_xl.py Added num_labels to config level * code refactor * code refactor * code refator	2020-12-19 14:44:04 +01:00
Sylvain Gugger	1198ba8fba	Add timing inside Trainer (#9196 ) * Add timing inside Trainer * Fix tests * Add n_objs for train * Sort logs	2020-12-18 15:10:39 -05:00
Sylvain Gugger	9a25c5bd3a	Add new run_swag example (#9175 ) * Add new run_swag example * Add check * Add sample * Apply suggestions from code review Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Very important change to make Lysandre happy Co-authored-by: Lysandre Debut <lysandre@huggingface.co>	2020-12-18 14:19:24 -05:00
sandip	467e9158b4	Added TF CTRL Sequence Classification (#9151 ) * Added TF CTRL Sequence Classification * code refactor	2020-12-17 18:10:57 -05:00
Lysandre Debut	1c1a2ffbff	TableQuestionAnsweringPipeline (#9145 ) * AutoModelForTableQuestionAnswering * TableQuestionAnsweringPipeline * Apply suggestions from Patrick's code review Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Sylvain and Patrick comments * Better PyTorch/TF error message * Add integration tests * Argument Handler naming Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com> * Fix docs to appease the documentation gods Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2020-12-16 12:31:50 -05:00
Lysandre Debut	07384baf7a	AutoModelForTableQuestionAnswering (#9154 ) * AutoModelForTableQuestionAnswering * Update src/transformers/models/auto/modeling_auto.py * Style	2020-12-16 12:14:33 -05:00
Patrick von Platen	640e6fe190	[Flax] Align FlaxBertForMaskedLM with BertForMaskedLM, implement from_pretrained, init (#9054 ) * save intermediate * save intermediate * save intermediate * correct flax bert model file * new module / model naming * make style * almost finish BERT * finish roberta * make fix-copies * delete keys file * last refactor * fixes in run_mlm_flax.py * remove pooled from run_mlm_flax.py` * fix gelu \| gelu_new * remove Module from inits * splits * dirty print * preventing warmup_steps == 0 * smaller splits * make fix-copies * dirty print * dirty print * initial_evaluation argument * declaration order fix * proper model initialization/loading * proper initialization * run_mlm_flax improvements: improper model inputs bugfix + automatic dataset splitting + tokenizers parallelism warning + avoiding warmup_steps=0 bug * removed tokenizers warning hack, fixed model re-initialization * reverted training_args.py changes * fix flax from pretrained * improve test in flax * apply sylvains tips * update init * make 0.3.0 compatible * revert tevens changes * revert tevens changes 2 * finalize revert * fix bug * add docs * add pretrained to init * Update src/transformers/modeling_flax_utils.py * fix copies * final improvements Co-authored-by: TevenLeScao <teven.lescao@gmail.com>	2020-12-16 13:03:32 +01:00
NielsRogge	1551e2dc6d	[WIP] Tapas v4 (tres) (#9117 ) * First commit: adding all files from tapas_v3 * Fix multiple bugs including soft dependency and new structure of the library * Improve testing by adding torch_device to inputs and adding dependency on scatter * Use Python 3 inheritance rather than Python 2 * First draft model cards of base sized models * Remove model cards as they are already on the hub * Fix multiple bugs with integration tests * All model integration tests pass * Remove print statement * Add test for convert_logits_to_predictions method of TapasTokenizer * Incorporate suggestions by Google authors * Fix remaining tests * Change position embeddings sizes to 512 instead of 1024 * Comment out positional embedding sizes * Update PRETRAINED_VOCAB_FILES_MAP and PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES * Added more model names * Fix truncation when no max length is specified * Disable torchscript test * Make style & make quality * Quality * Address CI needs * Test the Masked LM model * Fix the masked LM model * Truncate when overflowing * More much needed docs improvements * Fix some URLs * Some more docs improvements * Test PyTorch scatter * Set to slow + minify * Calm flake8 down * First commit: adding all files from tapas_v3 * Fix multiple bugs including soft dependency and new structure of the library * Improve testing by adding torch_device to inputs and adding dependency on scatter * Use Python 3 inheritance rather than Python 2 * First draft model cards of base sized models * Remove model cards as they are already on the hub * Fix multiple bugs with integration tests * All model integration tests pass * Remove print statement * Add test for convert_logits_to_predictions method of TapasTokenizer * Incorporate suggestions by Google authors * Fix remaining tests * Change position embeddings sizes to 512 instead of 1024 * Comment out positional embedding sizes * Update PRETRAINED_VOCAB_FILES_MAP and PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES * Added more model names * Fix truncation when no max length is specified * Disable torchscript test * Make style & make quality * Quality * Address CI needs * Test the Masked LM model * Fix the masked LM model * Truncate when overflowing * More much needed docs improvements * Fix some URLs * Some more docs improvements * Add add_pooling_layer argument to TapasModel Fix comments by @sgugger and @patrickvonplaten * Fix issue in docs + fix style and quality * Clean up conversion script and add task parameter to TapasConfig * Revert the task parameter of TapasConfig Some minor fixes * Improve conversion script and add test for absolute position embeddings * Improve conversion script and add test for absolute position embeddings * Fix bug with reset_position_index_per_cell arg of the conversion cli * Add notebooks to the examples directory and fix style and quality * Apply suggestions from code review * Move from `nielsr/` to `google/` namespace * Apply Sylvain's comments Co-authored-by: sgugger <sylvain.gugger@gmail.com> Co-authored-by: Rogge Niels <niels.rogge@howest.be> Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr> Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: sgugger <sylvain.gugger@gmail.com>	2020-12-15 17:08:49 -05:00
Sylvain Gugger	ad895af98d	Add possibility to switch between APEX and AMP in Trainer (#9137 ) * Add possibility to switch between APEX and AMP in Trainer * Update src/transformers/training_args.py Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> * Address review comments * Update src/transformers/training_args.py Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>	2020-12-15 16:38:10 -05:00
Lysandre Debut	0b2f46fa9e	Add large model config (#9140 )	2020-12-15 16:03:59 -05:00
Patrick von Platen	abc573f51a	[TF Bart] Refactor TFBart (#9029 ) * reorder file * delete unnecesarry function * make style * save intermediate * fix attention masks * correct tf bart past key values * solve merge conflict bug * correct tensor dims * save intermediate tf * change attn layer * fix typo re-order past * inputs_embeds * make fix copies * finish tests * fix graph mode * appyl lysandres suggestions	2020-12-15 17:31:28 +01:00
sandip	389aba34bf	Added TF OpenAi GPT1 Sequence Classification (#9105 ) * TF OpenAI GPT Sequence Classification * Update src/transformers/models/openai/modeling_tf_openai.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Lysandre Debut <lysandre@huggingface.co>	2020-12-15 11:27:08 -05:00
Julien Plu	ef2d4cd445	Fix tf2.4 (#9120 ) * Fix tests for TF 2.4 * Remove <2.4 limitation * Add version condition * Update tests/test_optimization_tf.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update tests/test_optimization_tf.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update tests/test_optimization_tf.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2020-12-15 10:10:46 -05:00
Lysandre Debut	6ccea0486f	Fix T5 model parallel tes (#9107 ) k	2020-12-15 09:51:12 -05:00
Julien Plu	df3f4d2aef	Fix T5 and BART for TF (#9063 ) * Fix T5 for graphe compilation+execution * Fix BART * Fix import * Fix naming * fix attribute name * Oops * fix import * fix tests * fix tests * Update test * Add mising import * Address Patrick's comments * Style * Address Patrick's comment	2020-12-14 18:47:00 +01:00
Ahmed Elnaggar	a9c8bff724	Add parallelization support for T5EncoderModel (#9082 ) * add model parallelism to T5EncoderModel add model parallelism to T5EncoderModel * remove decoder from T5EncoderModel parallelize * uodate T5EncoderModel docs * Extend T5ModelTest for T5EncoderModel * fix T5Stask using range for get_device_map * fix style Co-authored-by: Ahmed Elnaggar <elnaggar@rostlab.informatik.tu-muenchen.de>	2020-12-14 12:00:45 -05:00
Patrick von Platen	fa1ddced9e	[RAG, Bart] Align RAG, Bart cache with T5 and other models of transformers (#9098 ) * fix rag * fix slow test * fix past in bart	2020-12-14 12:32:26 +01:00
Julien Plu	51d9c569fa	Fix embeddings resizing in TF models (#8657 ) * Resize the biases in same time than the embeddings * Trigger CI * Biases are not reset anymore * Remove get_output_embeddings + better LM model detection in generation utils * Apply style * First test on BERT * Update docstring + new name * Apply the new resizing logic to all the models * fix tests * Apply style * Update the template * Fix naming * Fix naming * Apply style * Apply style * Remove unused import * Revert get_output_embeddings * Trigger CI * Update num parameters * Restore get_output_embeddings in TFPretrainedModel and add comments * Style * Add decoder resizing * Style * Fix tests * Separate bias and decoder resize * Fix tests * Fix tests * Apply style * Add bias resizing in MPNet * Trigger CI * Apply style	2020-12-13 23:05:24 -05:00
Patrick von Platen	9cc9f4122e	Make ProphetNetModel really compatible with EncoderDecoder (#9033 ) * improve * finish * upload model * fix lm head * fix test	2020-12-11 16:59:54 +01:00
Sylvain Gugger	8d4bb02056	Refactor FLAX tests (#9034 )	2020-12-10 15:57:39 -05:00
Patrick von Platen	06971ac4f9	[Bart] Refactor - fix issues, consistency with the library, naming (#8900 ) * remove make on the fly linear embedding * start refactor * big first refactor * save intermediate * save intermediat * correct mask issue * save tests * refactor padding masks * make all tests pass * further refactor * make pegasus test pass * fix bool if * fix leftover tests * continue * bart renaming * delete torchscript test hack * fix imports in tests * correct shift * fix docs and repo cons * re-add fix for FSTM * typo in test * fix typo * fix another typo * continue * hot fix 2 for tf * small fixes * refactor types linting * continue * finish refactor * fix import in tests * better bart names * further refactor and add test * delete hack * apply sylvains and lysandres commens * small perf improv * further perf improv * improv perf * fix typo * make style * small perf improv	2020-12-09 20:55:24 +01:00
Funtowicz Morgan	75627148ee	Flax Masked Language Modeling training example (#8728 ) * Remove "Model" suffix from Flax models to look more 🤗 Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Initial working (forward + backward) for Flax MLM training example. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Simply code Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Addressing comments, using module and moving to LM task. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Restore parameter name "module" wrongly renamed model. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Restore correct output ordering... Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Actually commit the example 😅 Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Add FlaxBertModelForMaskedLM after rebasing. Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * Make it possible to initialize the training from scratch Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * Reuse flax linen example of cross entropy loss Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * Added specific data collator for flax Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * Remove todo for data collator Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * Added evaluation step Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * Added ability to provide dtype to support bfloat16 on TPU Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * Enable flax tensorboard output Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * Enable jax.pmap support. Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * Ensure batches are correctly sized to be dispatched with jax.pmap Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * Enable bfloat16 with --fp16 cmdline args Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * Correctly export metrics to tensorboard Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * Added dropout and ability to use it. Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * Effectively enable & disable during training and evaluation steps. Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * Oops. Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * Enable specifying kernel initializer scale Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * Style. Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * Added warmup step to the learning rate scheduler. Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * Fix typo. Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * Print training loss Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * Make style Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * fix linter issue (flake8) Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * Fix model matching Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * Fix dummies Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * Fix non default dtype on Flax models Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * Use the same create_position_ids_from_input_ids for FlaxRoberta Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * Make Roberta attention as Bert Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * fix copy Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * Wording. Co-authored-by: Marc van Zee <marcvanzee@gmail.com> Co-authored-by: Marc van Zee <marcvanzee@gmail.com>	2020-12-09 17:13:56 +01:00
StillKeepTry	df2af6d8b8	Add MP Net 2 (#9004 )	2020-12-09 10:32:43 -05:00
Patrick von Platen	02d0e0355c	Diverse beam search 2 (#9006 ) * diverse beam search * bug fixes * bug fixes * bug fix * separate out diverse_beam_search function * separate out diverse_beam_search function * bug fix * improve code quality * bug fix * bug fix * separate out diverse beam search scorer * code format * code format * code format * code format * add test * code format * documentation changes * code quality * add slow integration tests * more general name * refactor into logits processor * add test * avoid too much copy paste * refactor * add to docs * fix-copies * bug fix * Revert "bug fix" This reverts commit `c99eb5a8dc`. * improve comment * implement sylvains feedback Co-authored-by: Ayush Jain <a.jain@sprinklr.com> Co-authored-by: ayushtiku5 <40797286+ayushtiku5@users.noreply.github.com>	2020-12-09 15:00:37 +01:00
Sylvain Gugger	447808c85f	New squad example (#8992 ) * Add new SQUAD example * Same with a task-specific Trainer * Address review comment. * Small fixes * Initial work for XLNet * Apply suggestions from code review Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Final clean up and working XLNet script * Test and debug * Final working version * Add new SQUAD example * Same with a task-specific Trainer * Address review comment. * Small fixes * Initial work for XLNet * Apply suggestions from code review Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Final clean up and working XLNet script * Test and debug * Final working version * Add tick * Update README * Address review comments Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2020-12-08 14:39:29 -05:00
guillaume-be	7809eb82ae	Removed unused `encoder_hidden_states` and `encoder_attention_mask` (#8972 ) * Removed unused `encoder_hidden_states` and `encoder_attention_mask` from MobileBert * Removed decoder tests for MobileBert * Removed now unnecessary import	2020-12-08 12:04:34 -05:00
Julien Plu	bf7f79cd57	Optional layers (#8961 ) * Apply on BERT and ALBERT * Update TF Bart * Add input processing to TF BART * Add input processing for TF CTRL * Add input processing to TF Distilbert * Add input processing to TF DPR * Add input processing to TF Electra * Add deprecated arguments * Add input processing to TF XLM * remove unused imports * Add input processing to TF Funnel * Add input processing to TF GPT2 * Add input processing to TF Longformer * Add input processing to TF Lxmert * Apply style * Add input processing to TF Mobilebert * Add input processing to TF GPT * Add input processing to TF Roberta * Add input processing to TF T5 * Add input processing to TF TransfoXL * Apply style * Rebase on master * Fix wrong model name * Fix BART * Apply style * Put the deprecated warnings in the input processing function * Remove the unused imports * Raise an error when len(kwargs)>0 * test ModelOutput instead of TFBaseModelOutput * Address Patrick's comments * Address Patrick's comments * Add boolean processing for the inputs * Take into account the optional layers * Add missing/unexpected weights in the other models * Apply style * rename parameters * Apply style * Remove useless * Remove useless * Remove useless * Update num parameters * Fix tests * Address Patrick's comment * Remove useless attribute	2020-12-08 09:14:09 -05:00
Sylvain Gugger	00aa9dbca2	Copyright (#8970 ) * Add copyright everywhere missing * Style	2020-12-07 18:36:34 -05:00
Julien Chaumond	28fa014a1f	transformers-cli: LFS multipart uploads (> 5GB) (#8663 ) * initial commit * [cli] lfs commands * Fix FileSlice * Tweak to FileSlice * [hf_api] Backport filetype arg from `datasets` cc @lhoestq * Silm down the CI while i'm working * Ok let's try this in CI * Update config.yml * Do not try this at home * one more try * Update lfs.py * Revert "Tweak to FileSlice" This reverts commit `d7e32c4b35`. * Update test_hf_api.py * Update test_hf_api.py * Update test_hf_api.py * CI still green? * make CI green again? * Update test_hf_api.py * make CI red again? * Update test_hf_api.py * add CI style back * Fix CI? * oh my * doc + switch back to real staging endpoint * Apply suggestions from code review Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Pierric Cistac <Pierrci@users.noreply.github.com> * Fix docblock + f-strings Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Pierric Cistac <Pierrci@users.noreply.github.com>	2020-12-07 16:38:39 -05:00
sandip	483e13273f	Add TFGPT2ForSequenceClassification based on DialogRPT (#8714 ) * Add TFGPT2ForSequenceClassification based on DialogRPT * Add TFGPT2ForSequenceClassification based on DialogRPT * TFGPT2ForSequenceClassification based on DialogRPT-refactored code, implemented review comments and added input processing * Add TFGPT2ForSequenceClassification based on DialogRPT * TFGPT2ForSequenceClassification based on DialogRPT-refactored code, implemented review comments and added input processing * code refactor for latest other TF PR * code refactor * code refactor * Update modeling_tf_gpt2.py	2020-12-07 16:58:37 +01:00
Lysandre Debut	aa60b230ec	Patch model parallel test (#8920 ) * Patch model parallel test * Remove line * Remove `ci_*` from scheduled branches	2020-12-03 17:15:47 -05:00
Patrick von Platen	443f67e887	[PyTorch] Refactor Resize Token Embeddings (#8880 ) * fix resize tokens * correct mobile_bert * move embedding fix into modeling_utils.py * refactor * fix lm head resize * refactor * break lines to make sylvain happy * add news tests * fix typo * improve test * skip bart-like for now * check if base_model = get(...) is necessary * clean files * improve test * fix tests * revert style templates * Update templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/modeling_{{cookiecutter.lowercase_modelname}}.py	2020-12-02 19:19:50 +01:00
Nicolas Patry	a8c3f9aa76	Warning about too long input for fast tokenizers too (#8799 ) * Warning about too long input for fast tokenizers too If truncation is not set in tokenizers, but the tokenization is too long for the model (`model_max_length`), we used to trigger a warning that The input would probably fail (which it most likely will). This PR re-enables the warning for fast tokenizers too and uses common code for the trigger to make sure it's consistent across. * Checking for pair of inputs too. * Making the function private and adding it's doc. * Remove formatting ?? in odd place. * Missed uppercase.	2020-12-02 10:18:28 -05:00
sandip	f6b44e6190	Transfoxl seq classification (#8868 ) * Transfoxl sequence classification * Transfoxl sequence classification	2020-12-02 10:08:32 -05:00
Sylvain Gugger	7c10dd22ae	Better support for resuming training (#8878 )	2020-12-01 13:45:21 -05:00
elk-cloner	4a9e502a36	Ctrl for sequence classification (#8812 ) * add CTRLForSequenceClassification * pass local test * merge with master * fix modeling test for sequence classification * fix deco * fix assert	2020-12-01 09:49:27 +01:00
Nicolas Patry	d8fc26e919	NerPipeline (TokenClassification) now outputs offsets of words (#8781 ) * NerPipeline (TokenClassification) now outputs offsets of words - It happens that the offsets are missing, it forces the user to pattern match the "word" from his input, which is not always feasible. For instance if a sentence contains the same word twice, then there is no way to know which is which. - This PR proposes to fix that by outputting 2 new keys for this pipelines outputs, "start" and "end", which correspond to the string offsets of the word. That means that we should always have the invariant: ```python input[entity["start"]: entity["end"]] == entity["entity_group"] # or entity["entity"] if not grouped ``` * Fixing doc style	2020-11-30 14:05:08 -05:00
Funtowicz Morgan	51b071313b	Attempt to fix Flax CI error(s) (#8829 ) * Slightly increase tolerance between pytorch and flax output Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * test_multiple_sentences doesn't require torch Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * Simplify parameterization on "jit" to use boolean rather than str Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * Use `require_torch` on `test_multiple_sentences` because we pull the weight from the hub. Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * Rename "jit" parameter to "use_jit" for (hopefully) making it self-documenting. Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * Remove pytest.mark.parametrize which seems to fail in some circumstances Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * Fix unused imports. Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * Fix style. Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * Give default parameters values for traced model. Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * Review comment: Change sentences to sequences Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2020-11-30 13:43:17 -05:00
Ahmed Elnaggar	40ecaf0c2b	Add T5 Encoder for Feature Extraction (#8717 ) * Add T5 Encoder class for feature extraction * fix T5 encoder add_start_docstrings indent * update init with T5 encoder * update init with TFT5ModelEncoder * remove TFT5ModelEncoder * change T5ModelEncoder order in init * add T5ModelEncoder to transformers init * clean T5ModelEncoder * update init with TFT5ModelEncoder * add TFModelEncoder for Tensorflow * update init with TFT5ModelEncoder * Update src/transformers/models/t5/modeling_t5.py change output from Seq2SeqModelOutput to BaseModelOutput Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * remove encoder_outputs 1. remove encoder_outputs from the function call. 2. remove the encoder_outputs If statement. 3. remove isinstance from return_dict. * Authorize missing decoder keys * remove unnecessary input parameters remove pask_key_values and use_cache * remove use_cache remove use_cache from the forward method * add doctoring for T5 encoder add doctoring for T5 encoder with T5_ENCODER_INPUTS_DOCSTRING * change return_dict to dot access * add T5_ENCODER_INPUTS_DOCSTRING for TF T5 * change TFT5Encoder output type to BaseModelOutput * remove unnecessary parameters for TFT5Encoder * remove unnecessary if statement * add import BaseModelOutput * fix BaseModelOutput typo to TFBaseModelOutput * update T5 doc with T5ModelEncoder * add T5ModelEncoder to tests * finish pytorch * finish docs and mt5 * add mtf to init * fix init * remove n_positions * finish PR * Update src/transformers/models/mt5/modeling_mt5.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Update src/transformers/models/t5/modeling_t5.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Update src/transformers/models/t5/modeling_tf_t5.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Update src/transformers/models/mt5/modeling_tf_mt5.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * make style Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Lysandre Debut <lysandre@huggingface.co>	2020-11-30 08:34:40 +01:00
Patrick von Platen	5ced23dc84	[Pegasus] Refactor Tokenizer (#8731 ) * refactor * further refactor * fix the rest tomorrow * save intermediate * finish slow tokenizer * make more tests pass * finish refactor * fix comment * clean further * fix name * fix naming * Update src/transformers/models/reformer/tokenization_reformer.py * Apply suggestions from code review * Apply suggestions from code review * refactor * fix init tokenizers * refactor * improve convert * refactor * correct convert slow tokenizer * final fix for Pegasus Tok * remove ipdb * improve links	2020-11-29 16:57:43 +01:00
Lysandre Debut	18c32eeb21	Model parallel tests should return, not pass in non model parallel settings. (#8825 )	2020-11-27 16:41:29 -05:00
Max Del	0a921b6459	BART & FSMT: fix decoder not returning hidden states from the last layer (#8597 ) * Fix decoder not returning hidden states from the last layer * Resolve conflict * Change the way to gather hidden states * Add decoder hidden states test * Make pytest and black happy * Remove redundant line * remove new line Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>	2020-11-27 18:35:34 +01:00
Moussa Kamal Eddine	81fe0bf085	Add barthez model (#8393 ) * Add init barthez * Add barthez model, tokenizer and docs BARThez is a pre-trained french seq2seq model that uses BART objective. * Apply suggestions from code review docs typos Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Add license * Change URLs scheme * Remove barthez model keep tokenizer * Fix style * Fix quality * Update tokenizer * Add fast tokenizer * Add fast tokenizer test Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2020-11-27 12:31:42 -05:00
Patrick von Platen	a7d46a0609	Fix dpr<>bart config for RAG (#8808 ) * correct dpr test and bert pos fault * fix dpr bert config problem * fix layoutlm * add config to dpr as well	2020-11-27 16:26:45 +01:00
Patrick von Platen	a2cf37595e	[Flax test] Add require pytorch to flix flax test (#8816 ) * try flax fix * same for roberta	2020-11-27 14:40:42 +01:00
Kristian Holsheimer	f8eda599bd	[FlaxBert] Fix non-broadcastable attention mask for batched forward-passes (#8791 ) * [FlaxBert] Fix non-broadcastable attention mask for batched forward-passes * [FlaxRoberta] Fix non-broadcastable attention mask * Use jax.numpy instead of ordinary numpy (otherwise not jit-able) * Partially revert "Use jax.numpy ..." * Add tests for batched forward passes * Avoid unnecessary OOMs due to preallocation of GPU memory by XLA * Auto-fix style * Re-enable GPU memory preallocation but with mem fraction < 1/paralleism	2020-11-27 13:21:19 +01:00
Patrick von Platen	2a6fbe6a40	[XLNet] Fix mems behavior (#8567 ) * fix mems in xlnet * fix use_mems * fix use_mem_len * fix use mems * clean docs * fix tf typo * make xlnet tf for generation work * fix tf test * refactor use cache * add use cache for missing models * correct use_cache in generate * correct use cache in tf generate * fix tf * correct getattr typo * make sylvain happy * change in docs as well * do not apply to cookie cutter statements * fix tf test * make pytorch model fully backward compatible	2020-11-25 16:54:59 -05:00
Joe Davison	369f1d77b4	Return correct Bart hidden state tensors (#8747 ) * bart output hidden states upstream * same w/ decoder * add tests * fix prophetnet * fix gpt2 and ctrl * fix fstm and skip test for reformer and longformer * fix all models Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2020-11-25 22:06:04 +01:00
Lysandre Debut	138f45c184	Fix QA argument handler (#8765 ) * Fix QA argument handler * Attempt to get a better fix for QA (#8768) Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>	2020-11-25 14:02:15 -05:00
Julien Plu	29d4992453	New TF model inputs (#8602 ) * Apply on BERT and ALBERT * Update TF Bart * Add input processing to TF BART * Add input processing for TF CTRL * Add input processing to TF Distilbert * Add input processing to TF DPR * Add input processing to TF Electra * Add input processing for TF Flaubert * Add deprecated arguments * Add input processing to TF XLM * remove unused imports * Add input processing to TF Funnel * Add input processing to TF GPT2 * Add input processing to TF Longformer * Add input processing to TF Lxmert * Apply style * Add input processing to TF Mobilebert * Add input processing to TF GPT * Add input processing to TF Roberta * Add input processing to TF T5 * Add input processing to TF TransfoXL * Apply style * Rebase on master * Bug fix * Retry to bugfix * Retry bug fix * Fix wrong model name * Try another fix * Fix BART * Fix input precessing * Apply style * Put the deprecated warnings in the input processing function * Remove the unused imports * Raise an error when len(kwargs)>0 * test ModelOutput instead of TFBaseModelOutput * Bug fix * Address Patrick's comments * Address Patrick's comments * Address Sylvain's comments * Add the new inputs in new Longformer models * Update the template with the new input processing * Remove useless assert * Apply style * Trigger CI	2020-11-24 13:55:00 -05:00
Stas Bekman	82d443a7fd	[core] implement support for run-time dependency version checking (#8645 ) * implement support for run-time dependency version checking * try not escaping ! * use findall that works on py36 * small tweaks * autoformatter worship * simplify * shorter names * add support for non-versioned checks * add deps * revert * tokenizers not required, check version only if installed * make a proper distutils cmd and add make target * tqdm must be checked before tokenizers * workaround the DistributionNotFound peculiar setup * handle the rest of packages in setup.py * fully sync setup.py's install_requires - to check them all * nit * make install_requires more readable * typo * Update setup.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * restyle * add types * simplify * simplify2 Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2020-11-24 13:22:25 -05:00
Lysandre Debut	e09e54fd9d	MT5 should have an autotokenizer (#8743 ) * MT5 should have an autotokenizer * Different configurations should be able to point to same tokenizers	2020-11-24 09:50:25 -05:00
Lysandre Debut	6fdd0bb231	Fix slow tests v2 (#8746 ) * Fix BART test * Fix MBART tests * Remove erroneous line from yaml * Update tests/test_modeling_bart.py * Quality	2020-11-24 09:35:12 -05:00
zhiheng-huang	2c83b3c38d	Support various BERT relative position embeddings (2nd) (#8276 ) * Support BERT relative position embeddings * Fix typo in README.md * Address review comment * Fix failing tests * [tiny] Fix style_doc.py check by adding an empty line to configuration_bert.py * make fix copies * fix configs of electra and albert and fix longformer * remove copy statement from longformer * fix albert * fix electra * Add bert variants forward tests for various position embeddings * [tiny] Fix style for test_modeling_bert.py * improve docstring * [tiny] improve docstring and remove unnecessary dependency * [tiny] Remove unused import * re-add to ALBERT * make embeddings work for ALBERT * add test for albert Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2020-11-24 14:40:53 +01:00
LysandreJik	7f2c00913a	TF BERT test update	2020-11-23 18:20:19 -05:00
LysandreJik	e1b7e10d5f	Update TF BERT test	2020-11-23 18:19:12 -05:00
Colin Brochtrup	8ffc01a76a	Add early stopping callback to pytorch trainer (#8581 ) * Add early stopping patience and minimum threshold metric must improve to prevent early stopping to pytorch trainer * Add early stopping test * Set patience counter to 0 if best metric not defined yet * Make early stopping a callback. Add callback event for updating the best metric for early stopping callback to trigger on. * Run make style * make funciton name sensible * Improve new argument docstring wording and hope that flakey CI test passes. * Use on_evaluation callback instead of custom. Remove some debug printing * Move early stopping arguments and state into early stopping callback * Run make style * Remove old code * Fix docs formatting. make style went rogue on me. * Remove copied attributes and fix variable * Add assertions on training arguments instead of mutating them. Move comment out of public docs. * Make separate test for early stopping callback. Add test of invalid arguments. * Run make style... I remembered before CI this time! * appease flake8 * Add EarlyStoppingCallback to callback docs * Make docstring EarlyStoppingCallabck match other callbacks. * Fix typo in docs	2020-11-23 17:25:35 -05:00
Stas Bekman	e84786aaa6	consistent ignore keys + make private (#8737 ) * consistent ignore keys + make private * style * - authorized_missing_keys => _keys_to_ignore_on_load_missing - authorized_unexpected_keys => _keys_to_ignore_on_load_unexpected * move public doc of private attributes to private comment	2020-11-23 12:33:13 -08:00
alexorona	1cd9be2aeb	gpt2 and t5 parallel modeling (#8696 ) * gpt2 and t5 parallel modeling * model_parallel utils update * adding missing model_parallel_utils Adds missing model_parallel_utils and reverses the changes to code in modeling_gpt2 and modeling_t5 * training_args reformat Reformatted training_args * style formatting Style formatting doc string length on training_args and model_parallel_utils * style changes make style && make quality for training_args and model_parallel_utils. * adding tests * minor change in trainer reverts loss calculation * Update training_args.py * Update training_args.py added back docstring language for adam_beta1 and adam_beta2 * Update trainer.py * Update src/transformers/trainer.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Fix style & rebase Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>	2020-11-23 14:41:23 -05:00
Julien Chaumond	0cc5ab1333	Improve bert-japanese tokenizer handling (#8659 ) * Make ci fail * Try to make tests actually run? * CI finally failing? * Fix CI * Revert "Fix CI" This reverts commit `ca7923be73`. * Ooops wrong one * one more try * Ok ok let's move this elsewhere * Alternative to globals() (#8667) * Alternative to globals() * Error is raised later so return None * Sentencepiece not installed make some tokenizers None * Apply Lysandre wisdom * Slightly clearer comment? cc @sgugger Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2020-11-23 11:15:02 -05:00
Yossi Synett	18c8cf000b	Fix bug in x-attentions output for roberta and harden test to catch it (#8660 )	2020-11-23 13:28:29 +01:00
Patrick von Platen	9c0afdaf7b	fix flaky ci (#8694 )	2020-11-20 22:07:21 +01:00
Sylvain Gugger	6494910f27	Add sentencepiece to the CI and fix tests (#8672 ) * Fix the CI and tests * Fix quality * Remove that m form nowhere	2020-11-19 16:44:20 -05:00
Stas Bekman	42111f1d56	[tokenizers] convert_to_tensors: don't reconvert when the type is already right (#8283 ) * don't reconvert when the type is already right * better name * adjust logic as suggested * merge	2020-11-19 12:06:01 -08:00
Zhylko Dima	ca0109bd68	`disable_ngram_loss` fix for prophetnet (#8554 ) * `disable_ngram_loss` fix for prophetnet * add changes documentation * fix _compute_loss to use mean reduction and -100 to masked tokens & remove unnecessary arguments * mean label smoothing loss * small refactor * fix test Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>	2020-11-19 19:18:07 +01:00
Sylvain Gugger	4208f496ee	Better filtering of the model outputs in Trainer (#8633 ) * Better filtering of the model outputs in Trainer * Fix examples tests * Add test for Lysandre	2020-11-19 10:43:15 -05:00
Lysandre Debut	f2e07e7272	Fix a bunch of slow tests (#8634 ) * CI should install `sentencepiece` * Requiring TF * Fixing some TFDPR bugs * remove return_dict=False/True hack Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>	2020-11-19 10:41:41 -05:00
elk-cloner	5362bb8a6b	Tf longformer for sequence classification (#8231 ) * working on LongformerForSequenceClassification * add TFLongformerForMultipleChoice * add TFLongformerForTokenClassification * use add_start_docstrings_to_model_forward * test TFLongformerForSequenceClassification * test TFLongformerForMultipleChoice * test TFLongformerForTokenClassification * remove test from repo * add test and doc for TFLongformerForSequenceClassification, TFLongformerForTokenClassification, TFLongformerForMultipleChoice * add requested classes to modeling_tf_auto.py update dummy_tf_objects fix tests fix bugs in requested classes * pass all tests except test_inputs_embeds * sync with master * pass all tests except test_inputs_embeds * pass all tests * pass all tests * work on test_inputs_embeds * fix style and quality * make multi choice work * fix TFLongformerForTokenClassification signature * fix TFLongformerForMultipleChoice, TFLongformerForSequenceClassification signature * fix mult choice * fix mc hint * fix input embeds * fix input embeds * refactor input embeds * fix copy issue * apply sylvains changes and clean more Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2020-11-19 10:37:27 -05:00
Sylvain Gugger	1e62e999e8	Fixes the training resuming with gradient accumulation (#8624 )	2020-11-18 12:00:11 -05:00
Nicola De Cao	2f9d49b389	Adding PrefixConstrainedLogitsProcessor (#8529 ) * Adding PrefixConstrainedLogitsProcessor * fixing RAG and style_doc * fixing black (v20 instead of v19) * Improving doc in generation_logits_process.py * Improving docs and typing in generation_utils.py * docs improvement * adding test and fixing doc typo * fixing doc_len * isort on test * fixed test * improve docstring a bit Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2020-11-18 17:06:25 +01:00
Sylvain Gugger	dd52804f5f	Remove deprecated (#8604 ) * Remove old deprecated arguments Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr> * Remove needless imports * Fix tests Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>	2020-11-17 15:11:29 -05:00
Lysandre Debut	3095ee9dab	Tokenizers should be framework agnostic (#8599 ) * Tokenizers should be framework agnostic * Run the slow tests * Not testing * Fix documentation * Apply suggestions from code review Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2020-11-17 14:03:03 -05:00
Patrick von Platen	86822a358b	T5 & mT5 (#8552 ) * add mt5 and t5v1_1 model * fix tests * correct some imports * add tf model * finish tf t5 * improve examples * fix copies * clean doc	2020-11-17 12:23:09 +01:00
Sylvain Gugger	c89bdfbe72	Reorganize repo (#8580 ) * Put models in subfolders * Styling * Fix imports in tests * More fixes in test imports * Sneaky hidden imports * Fix imports in doc files * More sneaky imports * Finish fixing tests * Fix examples * Fix path for copies * More fixes for examples * Fix dummy files * More fixes for example * More model import fixes * Is this why you're unhappy GitHub? * Fix imports in conver command	2020-11-16 21:43:42 -05:00
Sylvain Gugger	1073a2bde5	Switch `return_dict` to `True` by default. (#8530 ) * Use the CI to identify failing tests * Remove from all examples and tests * More default switch * Fixes * More test fixes * More fixes * Last fixes hopefully * Use the CI to identify failing tests * Remove from all examples and tests * More default switch * Fixes * More test fixes * More fixes * Last fixes hopefully * Run on the real suite * Fix slow tests	2020-11-16 11:43:00 -05:00

1 2 3 4 5 ...

806 Commits