transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-27 00:09:00 +06:00

Author	SHA1	Message	Date
Sylvain Gugger	700229f8a4	Fixes in the templates (#10951 ) * Fixes in the templates * Define in all cases * Dimensionality -> Dimension Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>	2021-03-29 17:36:13 -04:00
Stas Bekman	05c966f24b	[vulnerability] dep fix (#10954 ) Fixes https://github.com/huggingface/transformers/security/dependabot/examples/research_projects/lxmert/requirements.txt/Pygments/open @LysandreJik	2021-03-29 17:25:47 -04:00
Stas Bekman	fb7fca718a	[trainer metrics] fix cpu mem metrics; reformat runtime metric (#10937 ) * fix cpu mem metrics; reformat runtime metric * adjust dependency * extend docs * soft dependency * cleanup * fix the runtime metric issue * restore * move docs, cross reference from 2 places, improve * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2021-03-29 13:47:02 -07:00
Daniel Stancl	5057213bcc	Add `examples/multiple-choice/run_swag_no_trainer.py` (#10934 ) * Initial commit * Another bunch of updates * make style quality + delete debug arg from bash script * Use compute_metrics func * Do a few fixes * Add copyright * Fix typos	2021-03-29 16:41:09 -04:00
pcuenca	ae6b6963ad	Allow use of pre-computed lengths when grouping by length. (#10953 ) A new argument `length_column_name` has been added to `TrainingArguments`, with default value `"length"`. If this column exists and `group_by_length` is `True`, the train sampler will use it for grouping rather than computing it before training starts. This is an optimization that allows the user to prepare data for fast processing, preventing sequential access to the dataset as described in issue #10909.	2021-03-29 15:44:19 -04:00
Sylvain Gugger	4002f95eb6	Remove duplicate code	2021-03-29 15:27:12 -04:00
Daniel Stancl	d7b50ce469	Add `examples/run_ner_no_trainer.py` (#10902 ) * Add NER example with accelerate library * This commit contains the first (yet really unfinished) version of a script for showing how to train HuggingFace model with their new accelerate library. * Fix metric calculation * make style quality * mv ner_no_trainer to token-classification dir * Delete --debug flag from running script * hf_datasets -> raw_datasets * Make a few slight adjustments * Add an informative comment + rewrite a help comment * Change header * Fix a few things * Enforce to use fast tokenizers only * DataCollatorWithPadding -> DataCollatorForTokenClassification * Change bash script: python3 -> accelerate launch * make style * Add a few missing things (see below) * Add a max-length padding to predictions and labels to enable accelerate gather functionality * Add PyTorch no trainer example to the example README.md * Remove --do-train from args as being redundant for now * DataCollatorWithPadding -> DataCollatorForTokenClassification * Remove some obsolete args.do_train conditions from the script * Delete --do_train from bash running script * Delete use_slow_tokenizer from args * Add unintentionally removed flag --label_all_tokens * Delete --debug flag from running script	2021-03-29 15:11:23 -04:00
Sylvain Gugger	06a6fea782	Instantiate model only once in pipeline (#10888 ) * Instantiate model only once in pipeline * Remove documentation of deprecated method * Add FutureWarning * Update src/transformers/pipelines/base.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Lysandre Debut <lysandre@huggingface.co>	2021-03-29 10:39:14 -04:00
Masatoshi Suzuki	cc2366bbb9	Ignore not initialized NO_CONFIG_TOKENIZERs (#10936 )	2021-03-29 10:26:15 -04:00
WybeKoper	ddea8771c6	Updated colab links in readme of examples (#10932 ) Co-authored-by: WybeKoper <WybeKoper@users.noreply.github.com>	2021-03-29 08:47:09 -04:00
Guillaume Filion	b3544e4cc5	Return global attentions (see #7514 ) (#10906 )	2021-03-29 15:00:23 +03:00
Bhadresh Savani	4f21e1ddd6	fixed filename (#10939 )	2021-03-28 09:48:12 -07:00
Sylvain Gugger	b0595d33c1	Add ImageFeatureExtractionMixin (#10905 ) * Add ImageFeatureExtractionMixin * Add dummy vision objects * Add require_vision * Add tests * Fix test	2021-03-26 11:23:56 -04:00
Stas Bekman	3c27d246e5	[vulnerability] fix dependency (#10914 ) this PR fixes https://github.com/huggingface/transformers/security/dependabot/examples/research_projects/lxmert/requirements.txt/PyYAML/open	2021-03-26 09:06:11 -04:00
Tomy Hsieh	4b2b50aa7b	Rename NLP library to Datasets library (#10920 ) * Rename NLP library to Datasets library * Update github template * Fix styling	2021-03-26 08:07:59 -04:00
lexhuismans	86c6f8a8b1	Fix comment (#10886 )	2021-03-25 21:23:56 +03:00
Sylvain Gugger	9856c9213d	Reorder init imports	2021-03-25 12:51:43 -04:00
Sylvain Gugger	e70068a719	Fix typo	2021-03-25 12:40:25 -04:00
Sylvain Gugger	f183a7a3c3	Sort init imports	2021-03-25 12:38:54 -04:00
Amir Tahmasbi	4684bfc757	Layout lm tf 2 (#10636 ) * Added embeddings layer * Added layoutlm layers, main model, maskedlm and token classification classes * Added model classes to tf auto models * Added model to PT to TF conversion script * Added model to doc README * Added tests * Removed unused imports * Added layoutlm model, test, and doc for sequence classification, and fix imports in __init__.py * Made tests pass! * Fixed typos in imports and docs * Fixed a typo in embeddings layer * Removed imports * Fixed formatting issues, imports, tests * Added layoutlm layers, main model, maskedlm and token classification classes * Added model classes to tf auto models * Added model to PT to TF conversion script * Removed unused imports * Added layoutlm model, test, and doc for sequence classification, and fix imports in __init__.py * Made tests pass! * Fixed typos in imports and docs * Removed imports * Fixed small formatting issues * Removed duplicates import from main __init__.py * Changed default arg to true for adding pooling layer to tf layoutlm * Fixed formatting issues * Style * Added copied from to classes copied from bert * Fixed doc strings examples to work with layoutlm inputs * Removed PyTorch reference in doc strings example * Added integration tests * Cleaned up initialization file * Updated model checkpoint identifiers * Fixed imports Co-authored-by: Amir Tahmasbi <amir@ehsai.ca> Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>	2021-03-25 12:32:38 -04:00
Philipp Schmid	1a3e0c4fe6	make local setup clearer and added missing links (#10899 )	2021-03-25 09:01:31 -04:00
Jethro Kuan	5f1491d3b3	run_glue_no_trainer: datasets -> raw_datasets (#10898 ) Use the correct variable (raw_datasets) instead of the module (datasets) where appropriate.	2021-03-25 08:28:17 -04:00
Sidd Karamcheti	1c06240e1b	Update training args ignore_skip_data -> ignore_data_skip (#10891 )	2021-03-24 16:44:51 -04:00
Sylvain Gugger	3b20e910b4	Remove version warning in pretrained BART models (#10890 ) * Remove version warning in pretrained BART models * Put it at the base model	2021-03-24 15:21:40 -04:00
Lysandre Debut	3c12e3c1c4	Fix overflowing bad word ids (#10889 ) * Removes overflowing bad word IDs * Raise warning	2021-03-24 15:13:56 -04:00
Eliza Szczechla	1f5ea9e04a	Add notebook on fine-tuning Bart (#10883 ) Co-authored-by: Eliza <eliza@habanero.tiger.com.pl>	2021-03-24 11:03:37 -04:00
imzhengzx	f81077fcf3	error type of tokenizer in __init__ definition (#10879 ) the original code in line 246 is ``` tokenizer: Optional["PreTrainedTokenizerBase"] = None, ``` it should be ``` tokenizer: Optional[PreTrainedTokenizerBase] = None, ```	2021-03-24 11:00:14 -04:00
Sylvain Gugger	1aed2b908e	Add new notebook links in the docs (#10876 )	2021-03-24 09:45:08 -04:00
Sylvain Gugger	a735f727cc	Fix test_trainer_distributed (#10875 )	2021-03-23 19:03:06 -04:00
Philipp Schmid	8c297cdb30	Sm trainer smp init fix (#10870 ) * rewrote is_sagemaker_model_parallel_available * added is_sagemaker_model_parallel_available to SageMakerTrainer * removed unnecessary mp_parameters as TrainingArguments * make style happy * added mp_parameters again to parse mp-specific args.	2021-03-23 20:07:55 +01:00
RafaelWO	d4d4447d53	fixed prefix_allowed_tokens_fn docstring in generate() (#10862 )	2021-03-23 13:48:22 -04:00
Bhadresh Savani	7ef40120a0	[Examples] Added predict stage and Updated Example Template (#10868 ) * added predict stage * added test keyword in exception message * removed example specific saving predictions * fixed f-string error * removed extra line Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>	2021-03-23 10:37:59 -07:00
Stas Bekman	fb2b89840b	[file_utils] import refactor (#10859 ) * import refactor * fix the fallback	2021-03-23 09:41:41 -07:00
Lysandre	3f48b2bc3e	Update stable docs	2021-03-23 11:01:16 -04:00
Philipp Schmid	77ffd5edd5	Amazon SageMaker Documentation (#10867 ) * added finished documentation * changed version from 1.6 to 1.6.0 for distributed * updated versions * updated urls	2021-03-23 10:56:44 -04:00
Sylvain Gugger	bf1f43fbd7	Update the example template for a no Trainer option (#10865 )	2021-03-23 10:02:39 -04:00
Marta Maślankowska	2eb596f085	Fix p_mask cls token masking in qa pipeline (#10863 )	2021-03-23 09:08:39 -04:00
Bhadresh Savani	eb330e8904	fixed typo (#10861 )	2021-03-23 08:15:28 -04:00
Stas Bekman	e21f89f64c	fix nan in full-fp16 label_smoothing eval (#10815 )	2021-03-22 19:23:24 -07:00
Sylvain Gugger	b5b957a65c	Make convert_to_onnx runnable as script again (#10857 )	2021-03-22 22:16:39 -04:00
Patrick von Platen	77bf3fe787	[Generate] Add save mode logits processor to remove nans and infs if necessary (#10769 ) * push * finish * finish * make fix copies * change name	2021-03-23 01:00:05 +03:00
Eliza Szczechla	9f8fa4e973	Use DataCollatorForSeq2Seq in run_summarization in all cases (#10856 ) Co-authored-by: Eliza <eliza@habanero.tiger.com.pl>	2021-03-22 15:05:39 -04:00
Ruan Chaves	a8d4d6776d	Modify the Trainer class to handle simultaneous execution of Ray Tune and Weights & Biases (#10823 ) * Modify the _hp_search_setup method on the Trainer class to handle the wandb argument passed by Ray Tune to model config. * Reformat single quotes as double quotes.	2021-03-22 14:04:51 -04:00
Boris Dayma	125ccead71	feat(wandb): logging and configuration improvements (#10826 ) * feat: ensure unique artifact id * feat: allow manual init * fix: simplify reinit logic * fix: no dropped value + immediate commits * fix: wandb use in sagemaker * docs: improve documentation and formatting * fix: typos * docs: improve formatting	2021-03-22 10:45:17 -04:00
Sidd Karamcheti	b230181d41	Add simple one character fix so that on_step_begin and on_step_end are called at the right times (#10839 )	2021-03-22 09:15:39 -04:00
Stas Bekman	24ab5b08a3	[makefile] autogenerate target (#10814 ) * autogenerate target * clarify comment	2021-03-22 09:14:22 -04:00
Sebastian Olsson	2c6684239f	Correct AutoConfig call docstrings (#10822 )	2021-03-22 09:12:44 -04:00
Stas Bekman	8fb4671811	[vulnerability] in example deps fix (#10817 ) Takes care of: https://github.com/huggingface/transformers/security/dependabot/examples/research_projects/lxmert/requirements.txt/jinja2/open @LysandreJik Co-authored-by: Lysandre Debut <lysandre@huggingface.co>	2021-03-22 09:05:24 -04:00
dependabot[bot]	dbfe379514	Bump jinja2 from 2.11.2 to 2.11.3 in /examples/research_projects/lxmert (#10818 ) Bumps [jinja2](https://github.com/pallets/jinja) from 2.11.2 to 2.11.3. - [Release notes](https://github.com/pallets/jinja/releases) - [Changelog](https://github.com/pallets/jinja/blob/master/CHANGES.rst) - [Commits](https://github.com/pallets/jinja/compare/2.11.2...2.11.3) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2021-03-22 08:54:50 -04:00
Qiushi Pan	29904a967b	Update FINE_TUNE_XLSR_WAV2VEC2.md (#10849 ) Fix typo.	2021-03-22 07:58:59 -04:00
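
Commit ae6b6963ad above describes a `length_column_name` argument on `TrainingArguments` (default `"length"`) that lets the train sampler reuse a precomputed length column when `group_by_length` is `True`. The idea can be sketched in plain Python as below. This is only an illustration under stated assumptions: the helpers `add_length_column` and `group_indices_by_length` are hypothetical names, the `input_ids` field is assumed, and the grouping shown is a simple sort-and-chunk, not the library's actual sampler.

```python
from typing import Dict, List


def add_length_column(examples: List[Dict], column: str = "length") -> List[Dict]:
    """Hypothetical helper: store each example's token count under `column`.

    Doing this once, ahead of training, is the preparation step the commit
    message refers to; the column name matches the documented default.
    """
    return [{**ex, column: len(ex["input_ids"])} for ex in examples]


def group_indices_by_length(
    examples: List[Dict], column: str = "length", batch_size: int = 2
) -> List[List[int]]:
    """Hypothetical helper: batch indices so that similar lengths sit together.

    Only the precomputed column is read, so the (possibly large) token lists
    are never touched here. This is a plain sort-and-chunk for illustration.
    """
    missing = [i for i, ex in enumerate(examples) if column not in ex]
    if missing:
        raise KeyError(f"column {column!r} missing for examples {missing}")
    # Longest examples first, then cut the ordering into fixed-size batches.
    order = sorted(range(len(examples)), key=lambda i: examples[i][column], reverse=True)
    return [order[i : i + batch_size] for i in range(0, len(order), batch_size)]


# Toy tokenized examples (assumed structure).
examples = [
    {"input_ids": [1, 2, 3]},
    {"input_ids": [1]},
    {"input_ids": [1, 2, 3, 4, 5]},
    {"input_ids": [1, 2]},
]
with_lengths = add_length_column(examples)
batches = group_indices_by_length(with_lengths)
print(batches)
```

With the real library, the equivalent preparation would be adding such a column to the training dataset and leaving `length_column_name` at its default, as the commit message states.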


7165 Commits