Custom schedulers are currently implemented by subclassing PyTorch's LambdaLR
class and passing a method of the subclass to the __init__
function of LambdaLR. This approach is not appropriate for several
reasons:
1. there is no need to define a class that only defines an
__init__() method;
2. instantiating the parent class by passing it a method of the child class
creates a circular reference, which leads to memory leaks; see issues #1742 and #1134, and the sketch below.
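A minimal sketch of the problematic pattern (the class name and the constant-with-warmup schedule are illustrative): the bound method passed to the parent's __init__ references `self`, and LambdaLR stores that method back on `self`, so the two keep each other alive.

```python
from torch.optim.lr_scheduler import LambdaLR

class WarmupConstantSchedule(LambdaLR):
    def __init__(self, optimizer, warmup_steps, last_epoch=-1):
        self.warmup_steps = warmup_steps
        # `self.lr_lambda` is a bound method referencing `self`;
        # LambdaLR stores it on the instance, creating a reference cycle.
        super().__init__(optimizer, self.lr_lambda, last_epoch=last_epoch)

    def lr_lambda(self, step):
        if step < self.warmup_steps:
            return float(step) / float(max(1.0, self.warmup_steps))
        return 1.0
```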
In this commit we replace these wrapper classes with functions that
instantiate `LambdaLR` with a custom learning rate function. We use a
closure to specify the parameters of the latter. We also do a bit of
renaming within the functions to make the behaviour explicit, and remove
docstrings that are no longer necessary.
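The same schedule written with the new pattern, again as a sketch (the function name is illustrative): the learning rate function is a closure over the warmup parameter, so nothing holds a reference back to a scheduler instance.

```python
from torch.optim.lr_scheduler import LambdaLR

def get_constant_schedule_with_warmup(optimizer, num_warmup_steps, last_epoch=-1):
    def lr_lambda(current_step):
        # `num_warmup_steps` is captured by the closure.
        if current_step < num_warmup_steps:
            return float(current_step) / float(max(1.0, num_warmup_steps))
        return 1.0

    return LambdaLR(optimizer, lr_lambda, last_epoch=last_epoch)
```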
As pointed out in #1545, when using an uncased model and adding
a new uncased token, the tokenizer does not correctly identify the
token when the input text contains it in a cased form.
For instance, if we load bert-base-uncased into BertTokenizer and
then use .add_tokens() to add "cool-token", we get the expected
result for .tokenize('this is a cool-token'). However, we get a
possibly unexpected result for .tokenize('this is a cOOl-Token'),
which in fact mirrors the result the former produced before the new
token was added.
This commit adds
- functionality to PreTrainedTokenizer to handle this
situation when a tokenizer (currently Bert, DistilBert,
and XLNet) has the do_lower_case=True kwarg, by:
1) lowercasing tokens added with .add_tokens()
2) lowercasing text at the beginning of .tokenize()
- a new common test case for tokenizers
https://github.com/huggingface/transformers/issues/1545
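A short illustration of the behaviour this fixes (the exact "before" subword split depends on the vocabulary, so it is indicative only):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
tokenizer.add_tokens(["cool-token"])

tokenizer.tokenize("this is a cool-token")
# ['this', 'is', 'a', 'cool-token']

tokenizer.tokenize("this is a cOOl-Token")
# before this commit: split as if the token had never been added,
# e.g. ['this', 'is', 'a', 'cool', '-', 'token']
# after this commit:  ['this', 'is', 'a', 'cool-token']
```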
We currently initialize `encoder_attention_mask` when it is `None`,
whether the stack is that of an encoder or a decoder. Since this
may lead to bugs that are difficult to track down, I added a condition
that checks whether the current stack is a decoder.
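A sketch of the intended behaviour, written here as a hypothetical standalone helper (the actual change lives in the model's forward pass; names are illustrative):

```python
import torch

def default_encoder_attention_mask(is_decoder, encoder_hidden_states,
                                   encoder_attention_mask):
    # Only build a default all-ones cross-attention mask when the
    # stack is a decoder attending to encoder outputs.
    if (is_decoder and encoder_hidden_states is not None
            and encoder_attention_mask is None):
        batch_size, seq_length, _ = encoder_hidden_states.size()
        encoder_attention_mask = torch.ones(
            (batch_size, seq_length), device=encoder_hidden_states.device
        )
    return encoder_attention_mask
```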
Mish is a new activation function proposed here: https://arxiv.org/abs/1908.08681
It has seen some recent success and has been adopted in spaCy, Thinc, TensorFlow Addons, and fastai-dev.
All benchmarks recorded so far (including against ReLU, Swish, and GELU) are available in the repository: https://github.com/digantamisra98/Mish
It might be a good addition to experiment with, especially in the BERT model.
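For reference, the function itself is a one-liner; a minimal PyTorch sketch:

```python
import torch
import torch.nn.functional as F

def mish(x):
    # Mish(x) = x * tanh(softplus(x)), as defined in the paper above.
    return x * torch.tanh(F.softplus(x))
```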