Thomas Wolf
9f5f646442
Merge pull request #2211 from huggingface/fast-tokenizers
...
Fast tokenizers
2019-12-27 10:24:29 +01:00
Aymeric Augustin
9024b19994
Auto-format (fixes previous commit).
2019-12-27 10:13:52 +01:00
Aymeric Augustin
3233b58ad4
Quote square brackets in shell commands.
...
This ensures compatibility with zsh.
Fix #2316.
2019-12-27 08:50:25 +01:00
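Note on 3233b58ad4: with its default options, zsh treats unquoted square brackets as a glob pattern, so a command such as `pip install transformers[docs]` typically fails with "no matches found" before pip even runs. The sketch below shows one way to generate a safely quoted command. The helper name and the requirement string are illustrative assumptions, not taken from the commit itself.

```python
import shlex


def pip_install_command(requirement: str) -> str:
    """Build a copy-pasteable pip command with the requirement shell-quoted.

    Hypothetical helper for illustration; the commit edited documentation
    by hand rather than generating commands.
    """
    # shlex.quote wraps the string in single quotes when it contains
    # characters the shell could interpret, such as '[' and ']'.
    return "pip install " + shlex.quote(requirement)


command = pip_install_command("transformers[docs]")
print(command)
```

The quoted form works in bash as well, so documentation can use it unconditionally.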
Anthony MOI
e6ec24fa88
Better added_tokens handling
2019-12-26 16:49:48 -05:00
Anthony MOI
599db139f9
Code style update
2019-12-26 15:13:30 -05:00
Anthony MOI
835b76a46f
Handle unk_token
...
As we discussed, this is handled here directly
cc @thomwolf
2019-12-26 14:42:55 -05:00
Anthony MOI
7ead04ce14
FastPreTrainedTokenizer => PreTrainedTokenizerFast
2019-12-26 14:39:39 -05:00
Anthony MOI
1f82a5d910
Update for changes in tokenizers API
2019-12-26 14:37:55 -05:00
Thomas Wolf
8c67b529f6
Merge pull request #2324 from kashif/patch-1
...
Typo in serving.py
2019-12-26 12:38:06 +01:00
Kashif Rasul
7211541ade
Typo in serving.py
2019-12-26 12:21:40 +01:00
Thomas Wolf
aeef4823ab
Merge pull request #2303 from patrickvonplaten/fix_error_with_repetition_penalty
...
fix repetition penalty error in modeling_utils.py
2019-12-25 22:39:20 +01:00
Thomas Wolf
0412f3d929
Merge pull request #2291 from aaugustin/fix-flake8-F841
...
Fix F841 flake8 warning
2019-12-25 22:37:42 +01:00
Thomas Wolf
8742c95461
Merge pull request #2289 from patrickvonplaten/fix_effective_batch_size_lang_gen_xlm
...
fix bug in prepare inputs for language generation for xlm for effective batch_size > 1
2019-12-25 22:30:46 +01:00
Thomas Wolf
1240be3ed9
Merge pull request #2312 from vitaliyradchenko/fix_special_and_add_tokens_loading
...
Correct tokenization for special and added tokens
2019-12-25 20:52:30 +01:00
vitaliyradchenko
b262577d17
add special tokens to unique_added_tokens_encoder
2019-12-25 18:31:35 +02:00
vitaliyradchenko
83a2347952
fixed lack of added and special tokens
2019-12-25 18:03:19 +02:00
Thomas Wolf
cea04a2443
Merge pull request #2310 from ShnitzelKiller/scatter-unfix
...
revert erroneous fix #2276
2019-12-25 12:43:22 +01:00
James Noeckel
e1844d9a45
use positional arguments due to inconsistent API
2019-12-25 01:34:02 -08:00
James Noeckel
9fb7addd4d
revert erroneous fix
2019-12-24 22:26:09 -08:00
Anthony MOI
734d29b03d
tokenizers is now a real dependency
2019-12-24 13:32:41 -05:00
Anthony MOI
2818e50569
Add tests for fast tokenizers
2019-12-24 13:29:01 -05:00
Anthony MOI
31c56f2e0b
Fix style
2019-12-24 12:43:27 -05:00
Anthony MOI
951ae99bea
BertTokenizerFast
2019-12-24 12:24:24 -05:00
Anthony MOI
041eac2d6d
GPT2TokenizerFast
2019-12-24 12:24:14 -05:00
Anthony MOI
3471ff0d35
FastPreTrainedTokenizer
2019-12-24 12:23:30 -05:00
patrickvonplaten
18e5bdbec5
fix repetition penalty error in modeling_utils.py
2019-12-24 17:18:05 +01:00
patrickvonplaten
f18ac4c28e
fix sequence length for prepare_inputs for xlnet
2019-12-24 16:43:24 +01:00
patrickvonplaten
359dc43837
fix effective batch_size error in prepare_inputs also for xlnet
2019-12-24 16:33:20 +01:00
patrickvonplaten
d98a384cb0
fix bug in prepare inputs for language generation for xlm for effective batch_size > 1
2019-12-24 16:29:54 +01:00
thomwolf
3e0cf49514
adding back last dropout in TF 2.0 T5
2019-12-24 11:30:56 +01:00
thomwolf
35d32308de
adding back final dropout in T5
2019-12-24 11:29:49 +01:00
Thomas Wolf
81db12c3ba
Merge pull request #2271 from aaugustin/improve-setup-and-requirements
...
Improve setup and requirements
2019-12-24 11:21:20 +01:00
Aymeric Augustin
10724a8123
Run the slow tests every Monday morning.
2019-12-24 09:09:43 +01:00
Aymeric Augustin
a8d34e534e
Remove [--editable] in install instructions.
...
Use -e only in docs targeted at contributors.
If a user copy-pastes a command line with [--editable], they will hit
an error. If they don't know the --editable option, we're giving them
a choice to make before they can move forwards, but this isn't a choice
they need to make right now.
2019-12-24 08:46:08 +01:00
Aymeric Augustin
e74c73a85d
Enable F841 warning in flake8.
2019-12-23 22:38:23 +01:00
Aymeric Augustin
e6c0019c80
Remove unused variables in tests.
2019-12-23 22:38:18 +01:00
Aymeric Augustin
495580dad1
Remove unused variables in templates.
2019-12-23 22:38:18 +01:00
Aymeric Augustin
71f94a8a1c
Remove unused variables in src.
2019-12-23 22:38:09 +01:00
Aymeric Augustin
81422c4e6d
Remove unused variables in examples.
2019-12-23 22:29:02 +01:00
Aymeric Augustin
072750f4dc
Merge pull request #2288 from aaugustin/better-handle-optional-imports
...
Improve handling of optional imports
2019-12-23 22:28:47 +01:00
Aymeric Augustin
4621ad6f9d
Use the same pattern as everywhere else.
...
This is really just for consistency.
2019-12-23 21:30:04 +01:00
Aymeric Augustin
a31d4a2971
Reraise ImportError when sentencepiece isn't installed.
...
Else, the next line fails with a confusing exception because the spm
variable isn't defined.
2019-12-23 21:27:42 +01:00
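Note on a31d4a2971: the pattern described above can be sketched as follows. This is a minimal illustration, assuming a generic helper around `importlib`; the function name and log message are hypothetical and do not reproduce the actual tokenizer code.

```python
import importlib
import logging

logger = logging.getLogger(__name__)


def load_optional_module(name: str):
    """Import an optional dependency, logging a hint before re-raising."""
    try:
        return importlib.import_module(name)
    except ImportError:
        logger.warning("You need to install %s to use this feature.", name)
        # A bare `raise` propagates the original ImportError. Without it,
        # execution would continue and later code would fail with a
        # NameError on the module variable that was never bound.
        raise
```

Re-raising keeps the failure at the import site, where the cause is obvious to the user.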
Aymeric Augustin
c8b0c1e551
Improve exception type.
...
ImportError isn't really appropriate when there's no import involved.
2019-12-23 21:27:38 +01:00
Aymeric Augustin
4c09a96096
Simplify re-raising exceptions.
...
Most modules use the simpler `raise` version. Normalize those that don't.
2019-12-23 21:20:54 +01:00
Aymeric Augustin
5565dcdd35
Remove warning when scikit-learn isn't available.
...
Most users don't need it.
2019-12-23 21:16:26 +01:00
Aymeric Augustin
8a6881822a
Run some tests on Python 3.7.
...
This will improve version coverage.
2019-12-23 21:06:23 +01:00
Aymeric Augustin
7a865821d9
Remove stray egg-info directory automatically.
...
If a user or contributor ran `pip install -e .` on transformers < 3.0,
pip created a transformers.egg-info directory next to the transformers
directory at the root of the repository.
In transformers 3.0, the source is in a `src` subdirectory.
`pip install -e .` creates a transformers.egg-info directory there.
However, pip will still pick transformers.egg-info from the previous
location. This is a bug: https://github.com/pypa/pip/issues/5466
Users and contributors are likely to hit this problem because the
documentation for transformers 3.0 relies heavily on extras (extras_require),
which didn't exist in earlier versions and so aren't defined in a stale
transformers.egg-info directory.
If such a directory exists, remove it. It's autogenerated, gitignored
and not supposed to contain anything of value.
2019-12-23 21:06:23 +01:00
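Note on 7a865821d9: the cleanup described above could look roughly like the sketch below. The function name, its parameters, and where it would be called from are assumptions for illustration; the commit's actual implementation is not shown in this log.

```python
import shutil
from pathlib import Path


def remove_stale_egg_info(repo_root: Path, package: str = "transformers") -> bool:
    """Delete a leftover <package>.egg-info directory at the repository root.

    Returns True if a stale directory was found and removed.
    """
    stale = repo_root / f"{package}.egg-info"
    if stale.is_dir():
        # The directory is autogenerated and gitignored, so removing it
        # loses nothing; the next `pip install -e .` recreates it under src/.
        shutil.rmtree(stale)
        return True
    return False
```

Calling it a second time is a no-op, so it is safe to run on every install.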
Aymeric Augustin
70373a5f7c
Update contribution instructions.
...
Also provide shortcuts in a Makefile.
2019-12-23 21:05:30 +01:00
Aymeric Augustin
c3783399db
Remove redundant requirements with transformers.
2019-12-23 19:17:27 +01:00
Aymeric Augustin
d79e9c9a9a
Remove docs/requirements.txt.
...
It's superseded by the "docs" extras.
2019-12-23 19:17:07 +01:00