transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-31 02:02:21 +06:00

Author	SHA1	Message	Date
Julien Demouth	02ec02d6d3	Add nvidia megatron models (#10911 ) * Add support for NVIDIA Megatron models * Add support for NVIDIA Megatron GPT2 and BERT Add the megatron_gpt2 model. That model reuses the existing GPT2 model. This commit includes a script to convert a Megatron-GPT2 checkpoint downloaded from NVIDIA GPU Cloud. See examples/megatron-models/README.md for details. Add the megatron_bert model. That model is implemented as a modification of the existing BERT model in Transformers. This commit includes a script to convert a Megatron-BERT checkpoint downloaded from NVIDIA GPU Cloud. See examples/megatron-models/README.md for details. * Update src/transformers/models/megatron_bert/configuration_megatron_bert.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Update src/transformers/models/megatron_bert/configuration_megatron_bert.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Update src/transformers/models/megatron_bert/configuration_megatron_bert.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Remove model.half in tests + add "# Copied ..." Remove the model.half() instruction which makes tests fail on the CPU. Add a comment "# Copied ..." before many classes in the model to enable automatic tracking in CI between the new Megatron classes and the original Bert ones. 
* Fix issues * Fix Flax/TF tests * Fix copyright * Update src/transformers/models/megatron_bert/configuration_megatron_bert.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Update src/transformers/models/megatron_bert/configuration_megatron_bert.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Update docs/source/model_doc/megatron_bert.rst Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update docs/source/model_doc/megatron_gpt2.rst Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/__init__.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py Co-authored-by: Sylvain Gugger 
<35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Resolve 
most of 'sgugger' comments * Fix conversion issue + Run make fix-copies/quality/docs * Apply suggestions from code review * Causal LM & merge * Fix init * Add CausalLM to last auto class Co-authored-by: Julien Demouth <jdemouth@nvidia.com> Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>	2021-04-08 14:09:11 -04:00
Stas Bekman	c6d664849b	[DeepSpeed] ZeRO Stage 3 (#10753 ) * synced gpus * fix * fix * need to use t5-small for quality tests * notes * complete merge * fix a disappearing std stream problem * start zero3 tests * wip * tune params * sorting out the pre-trained model loading * reworking generate loop wip * wip * style * fix tests * split the tests * refactor tests * wip * parameterized * fix * workout the resume from non-ds checkpoint pass + test * cleanup * remove no longer needed code * split getter/setter functions * complete the docs * suggestions * gpus and their compute capabilities link * Apply suggestions from code review Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * style * remove invalid paramgd * automatically configure zero3 params that rely on hidden size * make _get_resized_embeddings zero3-aware * add test exercising resize_token_embeddings() * add docstring Co-authored-by: Lysandre Debut <lysandre@huggingface.co>	2021-04-08 09:53:01 -07:00
Stas Bekman	acc851e1ff	[run_clm] clarify why we get the tokenizer warning on long input (#11145 ) * clarify why we get the warning here * Update examples/language-modeling/run_clm.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * wording * style Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2021-04-08 09:46:28 -07:00
Yusuke Mori	5bf5d50c8d	Typo fix of the name of BertLMHeadModel in BERT doc (#11133 )	2021-04-08 08:22:58 -04:00
Jannis Born	f8e90d6fb9	Fix typing error in Trainer class (prediction_step) (#11138 ) * fix: docstrings in prediction_step * ci: Satisfy line length requirements * ci: character length requirements	2021-04-08 08:22:25 -04:00
Sylvain Gugger	ffe0761777	Fix and refactor check_repo (#11127 )	2021-04-07 17:56:21 -04:00
Philipp Schmid	3fd7eee18f	Adds use_auth_token with pipelines (#11123 ) * added model_kwargs to infer_framework_from_model * added model_kwargs to tokenizer * added use_auth_token as named parameter * added dynamic get for use_auth_token	2021-04-07 20:32:59 +02:00
Stas Bekman	1c15128312	[versions] handle version requirement ranges (#11110 ) * handle version requirement ranges * add mixed requirement test * cleanup	2021-04-07 09:09:38 -07:00
Vasudev Gupta	7442801df5	fix tests (#11109 )	2021-04-07 10:07:26 -04:00
Lysandre Debut	c0d97cee13	Adds a note to resize the token embedding matrix when adding special … (#11120 ) * Adds a note to resize the token embedding matrix when adding special tokens * Remove superfluous space	2021-04-07 10:06:45 -04:00
Sylvain Gugger	02f7c2fe66	Some styling of the training table in Notebooks (#11118 )	2021-04-07 10:00:33 -04:00
Sylvain Gugger	11505fa139	Dummies multi backend (#11100 ) * Replaces requires_xxx by one generic method * Quality and update check_dummies * Fix inits check * Post-merge cleanup	2021-04-07 09:56:40 -04:00
Stas Bekman	424419f549	[examples] fix white space (#11099 ) these get concatenated without whitespace, so fix it	2021-04-07 09:20:58 -04:00
Stas Bekman	c9035e4537	fix: The 'warn' method is deprecated (#11105 ) * The 'warn' method is deprecated * fix test	2021-04-07 09:20:06 -04:00
Leo Gao	247bed3857	GPTNeo: handle padded wte (#11079 ) * GPTNeo: handle padded wte * Switch to config.vocab_size * apply review suggestion Co-authored-by: Suraj Patil <surajp815@gmail.com>	2021-04-07 17:35:20 +05:30
cronoik	083ad7d46c	dead link fixed (#11103 )	2021-04-07 07:50:47 -04:00
Sylvain Gugger	fd338abdeb	Style	2021-04-06 19:54:13 -04:00
SHYAM SUNDER KUMAR	aef4cf8c52	accelerate question answering examples with no trainer (#11091 ) * accelerate question answering examples with no trainer * removed train and eval flags also fixed fill np array function * Update examples/question-answering/run_qa_beam_search_no_trainer.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update examples/question-answering/run_qa_no_trainer.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2021-04-06 19:35:21 -04:00
Sylvain Gugger	403d530eec	Auto feature extractor (#11097 ) * AutoFeatureExtractor * Init and first tests * Tests * Damn you gitignore * Quality * Defensive test for when not all backends are here * Use pattern for Speech2Text models	2021-04-06 19:20:08 -04:00
Stas Bekman	520198f56f	[doc] gpt-neo (#11098 ) make the example work	2021-04-06 16:42:06 -04:00
Lysandre	9853c5dd58	Development on v4.6.0dev0	2021-04-06 12:53:25 -04:00
Lysandre	4906a29f7f	Release v4.5.0	2021-04-06 12:37:47 -04:00
Suraj Patil	2a8115f083	[WIP] GPT Neo cleanup (#10985 ) * better names * add attention mixin * all slow tests in one class * make helper methods static so we can test * add local attention tests * better names * doc * apply review suggestions	2021-04-06 12:24:15 -04:00
Philipp Schmid	76800fb8e6	added new merged Trainer test (#11090 )	2021-04-06 15:12:21 +02:00
Philipp Schmid	b219d6b5a5	added social thumbnail for docs (#11083 )	2021-04-06 14:56:18 +02:00
Sylvain Gugger	6c1bee7d89	Link to new blog	2021-04-06 08:55:40 -04:00
Stas Bekman	f7328de46d	HF emoji unicode doesn't work in console (#11081 ) It doesn't look like using 🤗 is a great idea for printing to console. See attachment. This PR proposes to replace 🤗 with "HuggingFace" for an exception message. @LysandreJik	2021-04-06 08:03:00 -04:00
Hemil Desai	6ab7d1a429	Add Readme for language modeling scripts with accelerate (#11073 )	2021-04-05 20:56:12 -04:00
Sylvain Gugger	2199608ca6	Make a base init in FeatureExtractionMixin (#11074 )	2021-04-05 18:02:28 -04:00
Sylvain Gugger	04ceee7d24	Fix distributed gather for tuples of tensors of varying sizes (#11071 )	2021-04-05 16:21:49 -04:00
Sylvain Gugger	f05a8a0c5e	Document common config attributes (#11070 )	2021-04-05 15:29:01 -04:00
Sylvain Gugger	090e3e6896	Add center_crop to ImageFeatureExtractionMixin (#11066 )	2021-04-05 15:28:51 -04:00
konstin	abb7430003	Replace pkg_resources with importlib_metadata (#11061 ) * Replace pkg_resources with importlib_metadata Fixes #10964. The other reason for this change is that pkg_resources has been [deprecated](`8fe85c22ce`) in favor of importlib_metadata. * Reduce to a single importlib_metadata import switch * Trigger CI Co-authored-by: Stas Bekman <stas@stason.org>	2021-04-05 12:12:19 -07:00
Hemil Desai	b51b87c41d	Add `examples/language_modeling/run_clm_no_trainer.py` (#11026 ) * Initial draft for clm no trainer * Remove unwanted args * Fix bug * Update examples/language-modeling/run_clm_no_trainer.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2021-04-05 12:27:52 -04:00
Amala Deshmukh	e1c02e018c	Add example for registering callbacks with trainers (#10928 ) * Add example for callback registry Resolves: #9036 * Update callback registry documentation * Added comments for other ways to register callback	2021-04-05 12:27:23 -04:00
Lysandre Debut	9f4e0c23d6	Documentation about loading a fast tokenizer within Transformers (#11029 ) * Documentation about loading a fast tokenizer within Transformers * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * style Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2021-04-05 10:51:16 -04:00
Sylvain Gugger	6c25f5228e	Refactor AutoModel classes and add Flax Auto classes (#11027 ) * Refactor AutoModel classes and add Flax Auto classes * Add new objects to the init * Fix hubconf and sort models * Fix TF tests * Missing coma * Update src/transformers/models/auto/auto_factory.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Fix init * Fix dummies * Other init to fix Co-authored-by: Lysandre Debut <lysandre@huggingface.co>	2021-04-05 10:11:28 -04:00
Lysandre Debut	eb3479e7cf	Some models have no tokenizers (#11064 )	2021-04-05 09:37:49 -04:00
Lysandre Debut	773e4c7263	Remove unnecessary space (#11060 )	2021-04-05 09:36:20 -04:00
Lysandre Debut	ef62f038fd	Pin docutils (#11062 ) * Pin docutils * Versions table	2021-04-05 09:35:21 -04:00
Eren Şahin	6e31014110	[doc] update code-block rendering (#11053 ) double : prevents code-block section to be rendered, so made it single :	2021-04-05 09:06:07 -04:00
Stas Bekman	3d39226a51	s\|Pretrained\|PreTrained\| (#11048 )	2021-04-04 18:08:42 -07:00
Sylvain Gugger	b0d49fd536	Add a script to check inits are consistent (#11024 )	2021-04-04 20:41:34 -04:00
versis	335c0ca35c	fixed typo: logging instead of logger (#11025 )	2021-04-02 09:22:22 -04:00
Philipp Schmid	34e1bec649	added new notebook and merge of trainer (#11015 ) * added new notebook and merge of trainer * Update docs/source/sagemaker.md Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Lysandre Debut <lysandre@huggingface.co>	2021-04-01 23:13:47 +02:00
Julien Chaumond	e8da77d181	[doc] no more bucket	2021-04-01 14:25:47 -04:00
Joe Davison	f4ad3d8cea	minor typo fix negative log-likelihood	2021-04-01 11:58:37 -06:00
cronoik	57c1749efa	DebertaTokenizer Rework closes #10258 (#10703 ) * closes #10258 * typo * reworked deberta test * implemented the comments from BigBird01 regarding sequence pair encoding of deberta * Update style * VOCAB_FILES_NAMES is now a oneliner as suggested by @sgugger Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * added #fmt: on as requested by @sgugger * Style Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Lysandre Debut <lysandre@huggingface.co>	2021-04-01 13:53:53 -04:00
NielsRogge	30677dc743	Add Vision Transformer and ViTFeatureExtractor (#10950 ) * Squash all commits into one * Update ViTFeatureExtractor to use image_utils instead of torchvision * Remove torchvision and add Pillow * Small docs improvement * Address most comments by @sgugger * Fix tests * Clean up conversion script * Pooler first draft * Fix quality * Improve conversion script * Make style and quality * Make fix-copies * Minor docs improvements * Should use fix-copies instead of manual handling * Revert "Should use fix-copies instead of manual handling" This reverts commit `fd4e591bce`. * Place ViT in alphabetical order Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2021-04-01 11:16:05 -04:00
cchen-dialpad	af6732225c	Improve the speed of adding tokens from added_tokens.json (#10780 ) * use bisect to add one token to unique_no_split_tokens * fix style	2021-04-01 08:56:12 -04:00

6940 Commits