transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-23 22:38:58 +06:00

Paul O'Leary McCann cf3cf304ca Replace mecab-python3 with fugashi for Japanese tokenization (#6086)	2020-07-31 04:41:14 -04:00

* Replace mecab-python3 with fugashi

  This replaces mecab-python3 with fugashi for Japanese tokenization. I am the maintainer of both projects.

  Both projects are MeCab wrappers, so the underlying C++ code is the same. fugashi is the newer wrapper and doesn't use SWIG, so for basic use of the MeCab API it's easier to use.

  This code ensures the use of a version of ipadic installed via pip, which should make versioning and tracking down issues easier.

  fugashi has wheels for Windows, OSX, and Linux, which will help with issues with installing old versions of mecab-python3 on Windows. Compared to mecab-python3, because fugashi doesn't use SWIG, it doesn't require a C++ runtime to be installed on Windows.

  In adding this change I removed some code dealing with `cursor`, `token_start`, and `token_end` variables. These variables didn't seem to be used for anything; it is unclear to me why they were there.

  I ran the tests and they passed, though I couldn't figure out how to run the slow tests (`--runslow` gave an error) and didn't try testing with TensorFlow.

* Style fix

* Remove unused variable

  Forgot to delete this...

* Adapt doc with install instructions

* Fix typo

Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
_static	Add forum link in the docs (#5637)	2020-07-09 15:13:22 -04:00
imgs	Guide to fixed-length model perplexity evaluation (#5449)	2020-07-07 16:04:15 -06:00
internal	Doc tokenizer (#6110)	2020-07-30 14:51:19 -04:00
main_classes	Doc tokenizer (#6110)	2020-07-30 14:51:19 -04:00
model_doc	Actually the extra_id are from 0-99 and not from 1-100 (#5967)	2020-07-30 06:13:29 -04:00
benchmarks.rst	[Docs] Benchmark docs (#5360)	2020-06-29 16:08:57 +02:00
bertology.rst	[doc] Fix broken links + remove crazy big notebook	2020-05-07 18:44:18 -04:00
conf.py	Release: v3.0.2	2020-07-06 18:49:44 -04:00
contributing.md	Update installation page and add contributing to the doc (#5084)	2020-06-17 14:01:10 -04:00
converting_tensorflow_models.rst	Add ALBERT to the Tensorflow to Pytorch model conversion cli (#3933)	2020-05-11 13:10:00 -04:00
examples.md	per_device instead of per_gpu/error thrown when argument unknown (#4618)	2020-05-27 11:36:55 -04:00
favicon.ico	Adding usage examples for common tasks (#2850)	2020-02-25 13:48:24 -05:00
glossary.rst	Fix typo in glossary (#5466)	2020-07-02 09:19:33 -04:00
index.rst	Doc tokenizer (#6110)	2020-07-30 14:51:19 -04:00
installation.md	Update installation page and add contributing to the doc (#5084)	2020-06-17 14:01:10 -04:00
migration.md	Add hugs (#5225)	2020-06-24 07:56:14 -04:00
model_sharing.rst	How to share model cards with the CLI (#5374)	2020-06-30 08:59:32 -04:00
model_summary.rst	Update model_summary.rst (#5737)	2020-07-27 05:34:02 -04:00
multilingual.rst	Refactor Code samples; Test code samples (#5036)	2020-06-25 16:46:00 -04:00
notebooks.md	Update notebooks (#3620)	2020-04-06 14:32:39 -04:00
perplexity.rst	tiny ppl doc typo fix (#5751)	2020-07-14 10:39:44 -06:00
philosophy.rst	Add hugs (#5225)	2020-06-24 07:56:14 -04:00
preprocessing.rst	Tokenization tutorial (#5257)	2020-06-24 18:43:20 -04:00
pretrained_models.rst	Replace mecab-python3 with fugashi for Japanese tokenization (#6086)	2020-07-31 04:41:14 -04:00
quicktour.rst	Switch from return_tuple to return_dict (#6138)	2020-07-30 09:17:00 -04:00
serialization.rst	Enable ONNX/ONNXRuntime optimizations through converter script (#6131)	2020-07-31 09:45:13 +02:00
task_summary.rst	Update doc to new model outputs (#5946)	2020-07-21 18:13:55 -04:00
tokenizer_summary.rst	doc fixes (#5613)	2020-07-08 19:52:44 -04:00
training.rst	Switch from return_tuple to return_dict (#6138)	2020-07-30 09:17:00 -04:00