transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-05 22:00:09 +06:00

Author	SHA1	Message	Date
Funtowicz Morgan	d490b5d500	Fast Tokenizers save pretrained should return the list of generated file paths. (#2918) * Correctly return the tuple of generated file(s) when calling save_pretrained Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Quality and format. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>	2020-02-20 00:58:04 +01:00
Funtowicz Morgan	e676764241	Override build_inputs_with_special_tokens for fast tokenizers (#2912) * Override build_inputs_with_special_tokens for fast impl + unittest. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Quality + format. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>	2020-02-19 16:09:51 -05:00
Funtowicz Morgan	3f3fa7f7da	Integrate fast tokenizers library inside transformers (#2674 ) * Implemented fast version of tokenizers Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Bumped tokenizers version requirements to latest 0.2.1 Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Added matching tests Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Matching OpenAI GPT tokenization ! Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Matching GPT2 on tokenizers Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Expose add_prefix_space as constructor parameter for GPT2 Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Matching Roberta tokenization ! Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Removed fast implementation of CTRL. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Binding TransformerXL tokenizers to Rust. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Updating tests accordingly. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Added tokenizers as top-level modules. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Black & isort. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Rename LookupTable to WordLevel to match Rust side. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Black. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Use "fast" suffix instead of "ru" for rust tokenizers implementations. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Introduce tokenize() method on fast tokenizers. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * encode_plus dispatchs to batch_encode_plus Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * batch_encode_plus now dispatchs to encode if there is only one input element. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Bind all the encode_plus parameter to the forwarded batch_encode_plus call. 
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Bump tokenizers dependency to 0.3.0 Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Formatting. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Fix tokenization_auto with support for new (python, fast) mapping schema. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Give correct fixtures path in test_tokenization_fast.py for the CLI. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Expose max_len_ properties on BertTokenizerFast Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Move max_len_ properties to PreTrainedTokenizerFast and override in specific subclasses. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * _convert_encoding should keep the batch axis tensor if only one sample in the batch. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Add warning message for RobertaTokenizerFast if used for MLM. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Added use_fast (bool) parameter on AutoTokenizer.from_pretrained(). This allows to easily enable/disable Rust-based tokenizer instantiation. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Let's tokenizers handle all the truncation and padding stuff. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Allow to provide tokenizer arguments during pipeline creation. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Update test_fill_mask pipeline to not use fast tokenizers. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Fix too much parameters for convert_encoding. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * When enabling padding, max_length should be set to None. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Avoid returning nested tensors of length 1 when calling encode_plus Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Ensure output is padded when return_tensor is not None. 
Tensor creation requires the inital list input to be of the exact same size. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Disable transfoxl unittest if pytorch is not available (required to load the model) Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * encode_plus should not remove the leading batch axis if return_tensor is set Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Temporary disable fast tokenizers on QA pipelines. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Fix formatting issues. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Update tokenizers to 0.4.0 * Update style * Enable truncation + stride unit test on fast tokenizers. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Add unittest ensuring special_tokens set match between Python and Rust. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Ensure special_tokens are correctly set during construction. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Give more warning feedback to the user in case of padding without pad_token. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * quality & format. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Added possibility to add a single token as str Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Added unittest for add_tokens and add_special_tokens on fast tokenizers. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Fix rebase mismatch on pipelines qa default model. QA requires cased input while the tokenizers would be uncased. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Addressing review comment: Using offset mapping relative to the original string + unittest. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Addressing review comment: save_vocabulary requires folder and file name Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Addressing review comment: Simplify import for Bert. 
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Addressing review comment: truncate_and_pad disables padding according to the same heuristic than the one enabling padding. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Addressing review comment: Remove private member access in tokenize() Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Addressing review comment: Bump tokenizers dependency to 0.4.2 Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * format & quality. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Addressing review comment: Use named arguments when applicable. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Addressing review comment: Add Github link to Roberta/GPT2 space issue on masked input. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Addressing review comment: Move max_len_single_sentence / max_len_sentences_pair to PreTrainedTokenizerFast + tests. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Addressing review comment: Relax type checking to include tuple and list object. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Addressing review comment: Document the truncate_and_pad manager behavior. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Raise an exception if return_offsets_mapping is not available with the current tokenizer. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Ensure padding is set on the tokenizers before setting any padding strategy + unittest. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * On pytorch we need to stack tensor to get proper new axis. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Generalize tests to different framework removing hard written return_tensors="..." 
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Bump tokenizer dependency for num_special_tokens_to_add Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Overflowing tokens in batch_encode_plus are now stacked over the batch axis. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Improved error message for padding strategy without pad token. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Bumping tokenizers dependency to 0.5.0 for release. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Optimizing convert_encoding around 4x improvement. 🚀 Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * expose pad_to_max_length in encode_plus to avoid duplicating the parameters in kwargs Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Generate a proper overflow_to_sampling_mapping when return_overflowing_tokens is True. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Fix unittests for overflow_to_sampling_mapping not being returned as tensor. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Format & quality. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Remove perfect alignment constraint for Roberta (allowing 1% difference max) Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Triggering final CI Co-authored-by: MOI Anthony <xn1t0x@gmail.com>	2020-02-19 11:35:40 -05:00
Sam Shleifer	20fc18fbda	Skip flaky test_tf_question_answering (#2845) * Skip flaky test * Style	2020-02-18 16:14:50 -05:00
Joe Davison	f1e8a51f08	Preserve spaces in GPT-2 tokenizers (#2778) * Preserve spaces in GPT-2 tokenizers Preserves spaces after special tokens in GPT-2 and inherited (RoBERTa) tokenizers, enabling correct BPE encoding. Automatically inserts a space in front of first token in encode function when adding special tokens. * Add tokenization preprocessing method * Add framework argument to pipeline factory Also fixes pipeline test issue. Each test input now treated as a distinct sequence.	2020-02-13 13:29:43 -05:00
Sam Shleifer	ef74b0f07a	get_activation('relu') provides a simple mapping from strings i… (#2807) * activations.py contains a mapping from string to activation function * resolves some `gelu` vs `gelu_new` ambiguity	2020-02-13 08:28:33 -05:00
Oleksiy Syvokon	ee5de0ba44	BERT decoder: Fix causal mask dtype. PyTorch < 1.3 requires multiplication operands to be of the same type. This was violated when using default attention mask (i.e., attention_mask=None in arguments) given BERT in the decoder mode. In particular, this was breaking Model2Model and made tutorial from the quickstart failing.	2020-02-11 15:19:22 -05:00
VictorSanh	d8b43600fd	omission	2020-02-07 15:28:13 -05:00
VictorSanh	ee5a6856ca	distilbert-base-cased weights + Readmes + omissions	2020-02-07 15:28:13 -05:00
Lysandre	2184f87003	RoBERTa TensorFlow Tests	2020-02-04 18:05:35 -05:00
Lysandre	e615269cb8	Correct slow test	2020-02-04 18:05:35 -05:00
Lysandre	5f96ebc0be	Style	2020-02-04 18:05:35 -05:00
Lysandre	950c6a4f09	Flaubert PyTorch tests	2020-02-04 18:05:35 -05:00
Lysandre	d28b81dc29	RoBERTa Pytorch tests	2020-02-04 18:05:35 -05:00
sshleifer	9e5b549b4d	fix default getattr	2020-02-04 16:38:52 -05:00
sshleifer	25848a6094	double quotes	2020-02-04 16:38:52 -05:00
sshleifer	cbcb83f21d	minor cleanup of test_attention_outputs	2020-02-04 16:38:52 -05:00
Lysandre	1e82cd8457	Flaubert auto tokenizer + tests cc @julien-c	2020-01-31 14:16:52 -05:00
Julien Chaumond	9fa836a73f	fill_mask helper (#2576) * fill_mask helper * [poc] FillMaskPipeline * Revert "[poc] FillMaskPipeline" This reverts commit `67eeea55b0`. * Revert "fill_mask helper" This reverts commit `cacc17b884`. * README: clarify that Pipelines can also do text-classification cf. question at the AI&ML meetup last week, @mfuntowicz * Fix test: test feature-extraction pipeline * Test tweaks * Slight refactor of existing pipeline (in preparation of new FillMaskPipeline) * Extraneous doc * More robust way of doing this @mfuntowicz as we don't rely on the model name anymore (see AutoConfig) * Also add RobertaConfig as a quickfix for wrong token_type_ids * cs * [BIG] FillMaskPipeline	2020-01-30 18:15:42 -05:00
Lysandre	df27648bd9	Rename test_examples to test_doc_samples	2020-01-30 10:07:22 -05:00
Lysandre	e63a81dd25	Style	2020-01-29 16:29:20 -05:00
Lysandre	217349016a	Copy object instead of passing the reference	2020-01-29 16:15:39 -05:00
Lysandre	ea2600bd5f	Absolute definitive HeisenDistilBug solve cc @julien-c @thomwolf	2020-01-27 21:58:36 -05:00
thomwolf	0e31e06a75	Add AutoModelForPreTraining	2020-01-27 14:27:07 -05:00
Lysandre	875c4ae48f	Definitive HeisenDistilBug fix cc @julien-c @thomwolf	2020-01-27 12:09:58 -05:00
Lysandre	24d5ad1dcc	Run the examples in slow	2020-01-23 09:38:45 -05:00
Lysandre	f81b6c95f2	Flake8 violation	2020-01-23 09:38:45 -05:00
Lysandre	632675ea88	Can test examples spread over multiple blocks	2020-01-23 09:38:45 -05:00
Lysandre	eaa6b9afc6	Require Torch when testing examples	2020-01-23 09:38:45 -05:00
Lysandre	64abd3e0aa	Multi-line examples can be tested + ALBERT patch for CircleCI All tests should now work fine.	2020-01-23 09:38:45 -05:00
Lysandre	837577256b	Automatic testing of examples The CircleCI test should fail.	2020-01-23 09:38:45 -05:00
Mark Neumann	65a89a8976	Fix BasicTokenizer to respect `never_split` parameters (#2557) * add failing test * fix call to _run_split_on_punc * format with black	2020-01-17 14:57:56 -05:00
Julien Chaumond	23a2cea8cb	Tokenizer.from_pretrained: fetch all possible files remotely	2020-01-16 16:47:19 -05:00
Julien Chaumond	9d8fd2d40e	tokenizer.save_pretrained: only save file if non-empty	2020-01-16 16:47:19 -05:00
Thomas Wolf	dc17f2a111	Merge pull request #2538 from huggingface/py3_super 💄 super	2020-01-16 13:17:15 +01:00
Julien Chaumond	d9fa1bad72	Fix failing torchscript test for xlnet model.parameters() order is apparently not stable (only for xlnet, for some reason)	2020-01-15 20:22:21 -05:00
Julien Chaumond	83a41d39b3	💄 super	2020-01-15 18:33:50 -05:00
Julien Chaumond	eb59e9f705	Graduate sst-2 to a canonical one	2020-01-15 16:28:50 +00:00
Julien Chaumond	e184ad13cf	Close #2392	2020-01-15 15:43:44 +00:00
Julien Chaumond	715fa638a7	Merge branch 'master' into from_scratch_training	2020-01-14 18:58:21 +00:00
Lysandre	100e3b6f21	Bias should be resized with the weights Created a link between the linear layer bias and the model attribute bias. This does not change anything for the user nor for the conversion scripts, but allows the `resize_token_embeddings` method to resize the bias as well as the weights of the decoder. Added a test.	2020-01-14 13:43:45 -05:00
Julien Chaumond	764f836d52	Update test_tokenization_auto.py	2020-01-13 22:50:34 -05:00
Julien Chaumond	d5831acb07	Update test_tokenization_auto.py	2020-01-13 22:47:33 -05:00
Julien Chaumond	ed6cd597cc	Update test_tokenization_auto.py	2020-01-13 22:46:35 -05:00
Julien Chaumond	5cb463a714	Update test_tokenization_auto.py	2020-01-13 22:38:29 -05:00
Julien Chaumond	0304628590	Map configs to models and tokenizers	2020-01-13 23:11:44 +00:00
Julien Chaumond	1fc855e456	[tests] Safety checks on CONFIG_MAPPING	2020-01-13 21:52:55 +00:00
Julien Chaumond	cf8a70bf68	More AutoConfig tests	2020-01-11 03:43:57 +00:00
Julien Chaumond	c6f682c1eb	flake	2020-01-11 03:18:31 +00:00
Julien Chaumond	4d1c98c012	AutoConfig + other Auto classes honor model_type	2020-01-11 02:46:17 +00:00
Julien Chaumond	2f32dfd33b	Convention: name mixins mixins	2020-01-11 01:24:29 +00:00
Julien Chaumond	055e80cfad	rm old ConfigTester	2020-01-10 21:36:18 +00:00
Julien Chaumond	84c0aa1868	num_parameters helper	2020-01-10 17:40:02 +00:00
alberduris	81d6841b4b	GPU text generation: Moved the encoded_prompt to correct device	2020-01-06 15:11:12 +01:00
alberduris	dd4df80f0b	Moved the encoded_prompts to correct device	2020-01-06 15:11:12 +01:00
Aymeric Augustin	0ffc8eaf53	Enforce target version for black. This should stabilize formatting.	2020-01-05 12:52:14 -05:00
Julien Chaumond	594ca6dead	[debug] Debug Heisenbug, the old school way.	2019-12-29 10:07:21 -05:00
Julien Chaumond	f78ebc22ad	[cli] Add ability to delete remote object	2019-12-27 22:53:49 -05:00
Thomas Wolf	9f5f646442	Merge pull request #2211 from huggingface/fast-tokenizers Fast tokenizers	2019-12-27 10:24:29 +01:00
Anthony MOI	2818e50569	Add tests for fast tokenizers	2019-12-24 13:29:01 -05:00
Aymeric Augustin	e6c0019c80	Remove unused variables in tests.	2019-12-23 22:38:18 +01:00
Aymeric Augustin	1c62e87b34	Use built-in open(). On Python 3, `open is io.open`.	2019-12-22 18:38:56 +01:00
Aymeric Augustin	798b3b3899	Remove sys.version_info[0] == 2 or 3.	2019-12-22 18:38:42 +01:00
Aymeric Augustin	8af25b1664	Remove six.	2019-12-22 17:56:09 +01:00
Aymeric Augustin	c824d15aa1	Remove __future__ imports.	2019-12-22 17:47:54 +01:00
Aymeric Augustin	00204f2b4c	Replace CommonTestCases for tokenizers with a mixin. This is the same change as for (TF)CommonTestCases for modeling.	2019-12-22 15:35:25 +01:00
Aymeric Augustin	a3c5883f2c	Rename file for consistency.	2019-12-22 15:35:25 +01:00
Aymeric Augustin	daf8bebcdd	Remove unused GPTModelTester. It isn't imported anywhere.	2019-12-22 15:35:25 +01:00
Aymeric Augustin	345c23a60f	Replace (TF)CommonTestCases for modeling with a mixin. I suspect the wrapper classes were created in order to prevent the abstract base class (TF)CommonModelTester from being included in test discovery and running, because that would fail. I solved this by replacing the abstract base class with a mixin. Code changes are just de-indenting and automatic reformattings performed by black to use the extra line space.	2019-12-22 15:35:18 +01:00
Aymeric Augustin	7e98e211f0	Remove unittest.main() in test modules. This construct isn't used anymore these days. Running python tests/test_foo.py puts the tests/ directory on PYTHONPATH, which isn't representative of how we run tests. Use python -m unittest tests/test_foo.py instead.	2019-12-22 14:42:03 +01:00
Aymeric Augustin	ced0a94204	Switch test files to the standard test_*.py scheme.	2019-12-22 14:15:13 +01:00
Aymeric Augustin	067395d5c5	Move tests outside of library.	2019-12-22 13:47:17 +01:00
thomwolf	1484d67de9	[LARGE] updating all tests and API	2019-07-02 12:13:17 +02:00
thomwolf	4f8b5f687c	add fix for serialization of tokenizer	2019-06-29 23:35:21 +02:00
thomwolf	d9184620f9	fix tests and new API	2019-06-29 23:10:40 +02:00
thomwolf	7e3070ae4f	add from_pretrained method to all configuration classes	2019-06-26 11:12:00 +02:00
thomwolf	93e9971c54	fix tests	2019-06-26 10:02:45 +02:00
thomwolf	62d78aa37e	updating GLUE utils for compatibility with XLNet	2019-06-24 14:36:11 +02:00
thomwolf	c946bb51a6	fix xlnet tokenizer and python2	2019-06-22 22:28:49 +02:00
thomwolf	ebd2cb8d74	update from_pretrained to load XLNetModel as well	2019-06-21 21:08:44 +02:00
thomwolf	483cbc36a9	test deviation with tf model: max ~1e-3 should be ok	2019-06-21 16:38:01 +02:00
thomwolf	24d8068982	weights loading script ok	2019-06-21 12:33:44 +02:00
thomwolf	32da75486b	add tokenizer and tests	2019-06-21 11:09:51 +02:00
thomwolf	45709d7532	model running with simple inputs	2019-06-21 00:28:42 +02:00
thomwolf	34d706a0e1	pruning in bertology	2019-06-19 15:25:49 +02:00
thomwolf	33d3db5c43	updating head masking, readme and docstrings	2019-06-17 15:51:28 +02:00
thomwolf	965f172de6	output all hidden layers states in GPT/GPT-2	2019-06-17 14:34:12 +02:00
thomwolf	f12007e421	add head masking and pruning to openai GPT	2019-06-17 14:19:40 +02:00
thomwolf	b860e47cf5	add head masking and pruning to gpt-2	2019-06-17 14:12:10 +02:00
thomwolf	7220d47a1c	adding head pruning and tests	2019-06-17 13:20:45 +02:00
thomwolf	96c4d3d988	add head masking tests	2019-06-17 12:17:26 +02:00
thomwolf	5e1207b8ad	add attention to all bert models and add test	2019-06-14 16:28:25 +02:00
thomwolf	bcc9e93e6f	fix test	2019-06-14 15:38:20 +02:00
thomwolf	a3274ac40b	adding attention outputs in bert	2019-06-03 16:11:45 -05:00
thomwolf	c30139a013	add special tokens to gpt-2	2019-04-30 10:45:26 +02:00
lukovnikov	56a47ce2b7	- replaced OpenAIGPTAdam with OpenAIAdam in docs	2019-04-25 16:05:28 +02:00
lukovnikov	704037ad51	- updated docs for new LR API - added some images for illustration - updated comments in optimization	2019-04-25 15:59:39 +02:00
lukovnikov	bb7557d3ab	- removed __all__ in optimization - removed unused plotting code - using ABC for LRSchedule - added some schedule object init tests	2019-04-21 13:48:33 +02:00
lukovnikov	34ccc8ebf4	Merge remote-tracking branch 'upstream/master'	2019-04-21 13:16:15 +02:00
thomwolf	34ae5bf838	small clean up in tests	2019-04-17 14:52:12 +02:00
thomwolf	265550ec34	relax network connection requirements	2019-04-17 14:22:35 +02:00
thomwolf	31d387604c	adding s3 model tests with --runslow	2019-04-17 11:58:27 +02:00
thomwolf	bc70779bf0	fixed GPT-2 tokenization on python 2	2019-04-17 10:56:15 +02:00
thomwolf	18a8a15f78	improving GPT2 tokenization and adding tests	2019-04-16 17:00:55 +02:00
thomwolf	9761aa4845	add to_json_file method to configuration classes	2019-04-15 14:12:08 +02:00
thomwolf	e8568a3b17	fixing tests	2019-04-15 12:55:38 +02:00
thomwolf	870b734bfd	added tokenizers serialization tests	2019-04-15 12:03:56 +02:00
lukovnikov	20686b78fc	schedule fix	2019-04-03 18:13:52 +02:00
lukovnikov	1b4ce76c38	schedule fix	2019-04-03 17:40:12 +02:00
lukovnikov	23bd2eebf5	schedule fix	2019-04-03 17:10:34 +02:00
lukovnikov	91a073f804	schedule fix	2019-04-03 17:10:08 +02:00
lukovnikov	b64cc63a77	optimization schedule test update	2019-04-03 16:42:40 +02:00
lukovnikov	d164867d90	- updated docs for optimization	2019-04-03 16:13:51 +02:00
lukovnikov	262a9992d7	class weights	2019-03-18 18:29:12 +01:00
thomwolf	2dd8f524f5	removing test for long sequences error following #337	2019-03-06 10:10:41 +01:00
thomwolf	009ee86a19	fix tests - bump up version	2019-02-17 23:57:23 +01:00
thomwolf	ffd623823d	adding gpt2	2019-02-17 23:38:51 +01:00
thomwolf	884ca81d87	transposing the inputs of Transformer-XL to have a unified interface	2019-02-11 13:19:59 +01:00
thomwolf	0a9860daa7	tests pass on python 2 and 3	2019-02-11 10:47:52 +01:00
thomwolf	2071a9b86e	fix python 2.7 imports	2019-02-11 10:35:36 +01:00
thomwolf	b514a60c36	added tests for OpenAI GPT and Transformer-XL tokenizers	2019-02-11 10:17:16 +01:00
thomwolf	9bdcba53fd	fix tests	2019-02-09 17:07:12 +01:00
thomwolf	1320e4ec0c	mc_token_mask => mc_token_ids	2019-02-09 16:58:53 +01:00
thomwolf	2df41663f1	added test	2019-02-07 17:05:49 +01:00
thomwolf	ba9e4eb354	fix unicode in tokenization tests	2019-02-06 00:28:00 +01:00
thomwolf	448937c00d	python 2 compatibility	2019-02-06 00:07:46 +01:00
thomwolf	98c96fb1a7	splitting position and tokens embeddings in OpenAI GPT - updating tf imports - tests	2019-01-29 10:31:42 +01:00
thomwolf	a45a9cc0e1	update tests	2019-01-28 17:16:02 +01:00
thomwolf	dc5df92fa8	added LM head for OpenAI	2019-01-08 17:18:47 +01:00
thomwolf	3cf12b235a	added tests + fixed losses	2019-01-08 16:24:23 +01:00
Patrick Lewis	78cf7b4ab4	added code to raise value error for bert tokenizer for covert_tokens_to_indices	2018-12-18 14:41:30 +00:00
thomwolf	0f544625f4	fix swag example for work with apex	2018-12-13 13:35:59 +01:00
thomwolf	52c53f39d0	clean up apex integration	2018-12-13 13:02:17 +01:00
thomwolf	85fff78c2d	compatibility PT 1.0 and 0.4.1	2018-12-13 12:48:13 +01:00
Deyu Fu	c8ea286048	change to apex for better fp16 and multi-gpu support	2018-12-11 17:13:58 -08:00
thomwolf	7f7c41b0c1	tests for all model classes with and without labels	2018-11-30 22:54:33 +01:00
thomwolf	757750d6f6	fix tests	2018-11-17 11:58:14 +01:00
thomwolf	1de35b624b	preparing for first release	2018-11-15 20:56:10 +01:00
Yaser Martinez Palenzuela	4d124baf8f	Add test for Chinese tokenization	2018-11-05 23:04:29 +01:00
thomwolf	3d291dea4a	clean up tests	2018-11-04 21:27:19 +01:00
thomwolf	87da161c2a	finishing model test	2018-11-04 21:27:10 +01:00
thomwolf	f8276008df	update readme, file names, removing TF code, moving tests	2018-11-03 23:35:14 +01:00

5042 Commits