transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-31 02:02:21 +06:00

Author	SHA1	Message	Date
cody-moveworks	a54961c5f7	Make OpenAIGPTTokenizer work with SpaCy 2.x and 3.x (#15019 ) * Make OpenAIGPTTokenizer work with SpaCy 3.x SpaCy 3.x introduced an API change to creating the tokenizer that breaks OpenAIGPTTokenizer. The old API for creating the tokenizer in SpaCy 2.x no longer works under SpaCy 3.x, but the new API for creating the tokenizer in SpaCy 3.x DOES work under SpaCy 2.x. Switching to the new API should allow OpenAIGPTTokenizer to work under both SpaCy 2.x and SpaCy 3.x versions. * Add is_spacy_available and is_ftfy_available methods to file utils * Add spacy and ftfy unittest decorator to testing utils * Add tests for OpenAIGPTTokenizer that require spacy and ftfy * Modify CircleCI config to run tests that require spacy and ftfy * Remove unneeded unittest decorators and reuse test code * Run make fixup	2022-01-10 07:53:20 -05:00
Kamal Raj	9fbf7c87c3	Update check_repo.py (#15014 ) added new line	2022-01-10 06:55:43 -05:00
Yih-Dar	0a03a86813	fix model table cell text alignment (#14999 ) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-01-10 06:44:11 -05:00
Patrick von Platen	d72343d2b8	[Wav2Vec2 Speech Event] Add speech event v2 (#15083 ) * up * up * up * up * up * up * improve * up * up * Update src/transformers/trainer.py * up * up * up	2022-01-10 10:46:21 +01:00
yoquankara	768e6c1449	Fix convert for newer megatron-lm bert model (#14082 ) * Fix convert for newer megatron-lm models * Save megatron-bert config in a proper way * Fix code style	2022-01-08 11:33:55 -08:00
Yih-Dar	623b4f7c63	[VisionTextDualEncoder] Add token_type_ids param (#15073 ) * fix doc example - TypeError: get_text_features() got an unexpected keyword argument 'token_type_ids' * add token_type_ids param Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-01-07 20:02:49 +01:00
Yih-Dar	ac224bb079	[Fix doc examples] Add missing from_pretrained (#15044 ) * fix doc example - ValueError: Parameter config should be an instance of class `PretrainedConfig` * Update src/transformers/models/segformer/modeling_segformer.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * update Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>	2022-01-07 16:55:59 +01:00
K.C. Tung	f18c6fa94c	Resubmit changes after rebase to master (#14982 )	2022-01-07 08:34:12 +01:00
Yih-Dar	cc406da4de	[VisionTextDualEncoder] Fix doc example Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-01-06 17:59:06 +01:00
flozi00	b67f345d00	Update run_speech_recognition_seq2seq.py (#14967 )	2022-01-06 19:26:45 +03:00
Tavin Turner	f71fb5c36e	Add 'with torch.no_grad()' to BertGeneration integration test forward passes (#14963 )	2022-01-06 10:39:13 -05:00
Nicolas Patry	d2183a46fb	Remove old asserts. (#15012 )	2022-01-06 09:45:41 -05:00
NielsRogge	83c552d390	Add detectron2 to Github actions (#15053 )	2022-01-06 08:53:58 -05:00
Matt Churgin	5ab87cd4da	wrapped forward passes in torch.no_grad() (#15037 )	2022-01-06 08:48:49 -05:00
Nicolas Patry	5a06118b39	Enabling `TF` on `image-classification` pipeline. (#15030 )	2022-01-06 14:16:00 +01:00
Yih-Dar	9f89fa02ed	Add Flax image captioning example (#14864 ) * add image captioning example * update README * fix style & quality * simplify * apply review suggestions * Apply suggestions from code review Co-authored-by: Suraj Patil <surajp815@gmail.com> * Apply suggestions from code review Co-authored-by: Suraj Patil <surajp815@gmail.com> * Apply review suggestions * add comments about using np instead jax array * remove unused lines * add model creation script * only support from_pretrained * fix style * fix * not use cache_dir when creating model * fix tokenizer creation * update README * fix quality * apply suggestion * simplify some blocks * Update examples/flax/image-captioning/README.md * Update examples/flax/image-captioning/run_image_captioning_flax.py Co-authored-by: Suraj Patil <surajp815@gmail.com> * apply suggestion Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: Suraj Patil <surajp815@gmail.com>	2022-01-06 14:00:54 +01:00
Suraj Patil	2e9af29494	[CLIP] Fix TF test (#15042 )	2022-01-05 16:58:42 +01:00
Patrick von Platen	443fdaf29f	[SpeechEncoderDecoder] Fix from pretrained (#15043 )	2022-01-05 16:54:39 +01:00
Patrick von Platen	ae929dcbbd	[CLIP] Fix PT test (#15041 )	2022-01-05 14:21:04 +01:00
Nicolas Patry	65cb94ff77	Adding QoL for `batch_size` arg (like others enabled everywhere). (#15027 ) * Adding QoL for `batch_size` arg (like others enabled everywhere). * Typo.	2022-01-05 12:16:23 +01:00
Yih-Dar	e34dd055e9	Fix doc example: mask_time_indices (numpy) has no attribute 'to' (#15033 ) * fix doc example - AttributeError: 'numpy.ndarray' object has no attribute 'to' * fix more * Apply suggestions from code review * Update src/transformers/models/unispeech/modeling_unispeech.py Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2022-01-05 11:34:08 +01:00
Stas Bekman	927f654427	[megatron convert] PYTHONPATH requirements (#14956 ) * [megatron convert] PYTHONPATH requirements * more info	2022-01-05 04:09:52 -05:00
Kevin Ko	857ab55c01	[doc] Update parallelism.mdx (#15018 ) * Update parallelism.mdx * Update parallelism.mdx	2022-01-04 09:58:27 -08:00
Nicolas Patry	19d37c2dd3	Hotfix `chunk_length_s` instead of `_ms`. (#15029 ) * Hotfix `chunk_length_s` instead of `_ms`. * Adding fix of `pad_token` which should be last/previous token for CTC proper decoding * Fixing ChunkPipeline unwrapping. * Adding a PackIterator specific test.	2022-01-04 14:07:44 +01:00
Daniel Stancl	21aecc0971	Add Flax RoFormer (#15005 ) * Add FlaxRoFormer * Clean code + make quality * Fix output pooling for FlaxRoFormerForMultipleChoiceModule * Apply suggestions from code review * add flax model to repos Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2022-01-04 13:23:10 +01:00
milyiyo	9e1775dd23	Fix a little typo (#15002 )	2022-01-04 12:59:47 +01:00
flozi00	774ed4a027	Fix Code block (#14983 )	2022-01-04 12:59:20 +01:00
Kevin Ko	f2ab21833f	Update parallelism.mdx (#15013 ) * Update parallelism.mdx * Update parallelism.mdx * Update parallelism.mdx * Update parallelism.mdx * Update parallelism.mdx * Update parallelism.mdx * Update parallelism.mdx * Update parallelism.mdx	2022-01-03 11:49:27 -08:00
Patrick von Platen	dbac8899fe	[Tests] Correct Wav2Vec2 & WavLM tests (#15015 ) * up * up * up	2022-01-03 20:19:04 +01:00
Yih-Dar	0b4c3a1a53	fix missing import (#15016 ) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-01-03 19:11:47 +01:00
Anton Lozhkov	38f95d1846	Large audio chunking for the existing ASR pipeline (#14896 ) * Naive ASR chunking * Fixing batching for ASR. Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>	2022-01-03 16:54:17 +01:00
Nicolas Patry	d33dc7966a	Improve truncation_side (#14947 ) * Enabling `truncation_side` for Slow and Fast tokenizer. Co-Authored-by: Niels Rogge <48327001+NielsRogge@users.noreply.github.com> * Disable failing tests. * Layout xlm. * assert -> assertEqual. Co-authored-by: Niels Rogge <48327001+NielsRogge@users.noreply.github.com>	2022-01-03 16:18:39 +01:00
Nicolas Patry	8c2618e6aa	Fixing t2t pipelines lists outputs. (#15008 ) Backward compatibility broken in https://github.com/huggingface/transformers/pull/14988	2022-01-03 14:49:58 +01:00
Sylvain Gugger	8f6373c61c	Map model_type and doc pages names (#14944 ) * Map model_type and doc pages names * Add script * Fix typo * Quality * Manual check for Auto Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>	2022-01-03 05:08:55 -05:00
Sylvain Gugger	e68c3756fe	Allow training to resume even if RNG states are not properly loaded (#14994 ) * Allow training to resume even if RNG states are not properly loaded * Proper f-string	2021-12-30 17:03:20 -05:00
Nicolas Patry	08cb5718ec	Enabling `tokenizers` upgrade. (#14941 ) * Enabling `tokenizers` upgrade. * Moved ugly comment. * Tokenizers==0.11.1 needs an update to keep borrow checker happy in highly contiguous calls. * Support both 0.11.1 and 0.11.0	2021-12-30 17:30:58 +01:00
Nicolas Patry	f8a989cfb2	Adding `num_return_sequences` support for text2text generation. (#14988 ) * Adding `num_return_sequences` support for text2text generation. Co-Authored-By: Enze <pu.miao@foxmail.com> * Update tests/test_pipelines_text2text_generation.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update tests/test_pipelines_text2text_generation.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Enze <pu.miao@foxmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2021-12-30 16:17:15 +01:00
Patrick von Platen	c043ce6cfd	[Generate] correct encoder_outputs are passed without attention_mask (#14980 ) * [Generate] correct encoder_outputs are passed without attention_mask * Apply suggestions from code review * up	2021-12-30 10:16:03 +01:00
Patrick von Platen	a1392883ce	[AutoProcessor] Correct AutoProcessor and automatically add processor… (#14881 ) * [AutoProcessor] Correct AutoProcessor and automatically add processor class * up * up * up * up * up * up * up * up * continue tomorrow * up * up * up * make processor class private * fix loop	2021-12-30 09:56:43 +01:00
Nicolas Patry	d7d60df0ec	Fixing a pathological case for slow tokenizers (#14981 ) * Fixing a pathological case for slow tokenizers * Update src/transformers/tokenization_utils.py	2021-12-30 09:10:34 +01:00
Stas Bekman	d1ba56d8d8	remove absl workaround as it's no longer needed (#14909 ) the absl workaround hasn't been needed since 2019-04 https://github.com/abseil/abseil-py/issues/99 so it should be safe to remove it.	2021-12-29 17:18:03 -05:00
Jake Tae	04cddaf402	refactor: replace `assert` with `ValueError` (#14970 )	2021-12-29 10:09:54 -05:00
Patrick von Platen	600496fa50	[Wav2Vec2] Rename model's feature extractor to feature encoder (#14959 ) * rename classes * clean up more namings * remove bogus file * Apply suggestions from code review * Apply suggestions from code review * replace more names * more regex replace * make style * correct * correct more * make style * finish * correct more in wav2vec2 * make style * improve freeze_extractor * add aliases * add tf aliases	2021-12-28 20:33:23 +01:00
Patrick von Platen	1bfa347707	[Tests] Speed up tokenizer tests (#14964 ) * speed up canine and mluke * speed up mbart and mbart50 toks * upload files	2021-12-28 17:02:50 +01:00
Patrick von Platen	f80775df2b	Update README.md (#14965 )	2021-12-28 13:41:27 +01:00
Patrick von Platen	1e847b40c0	[WavLM] give model for precision (#14958 )	2021-12-28 11:07:05 +01:00
Patrick von Platen	1c121916f3	Add Speech Seq2Seq Training script (#14792 ) * start * add gradient checkpointing and feature extractor freezing * Apply suggestions from code review * up * up * up * correct * up * more changes * up * up * up * remove rst	2021-12-28 10:20:51 +01:00
Stas Bekman	10fd4fa1a6	[doc] :class: hunt (#14955 ) * [doc] :class: hunt * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * fix the fix + style Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2021-12-27 17:17:38 -08:00
Sylvain Gugger	2c5597f6c7	Style	2021-12-27 19:18:08 -05:00
Sylvain Gugger	b5e2b183af	Doc styler examples (#14953 ) * Fix bad examples * Add black formatting to style_doc * Use first nonempty line * Put it at the right place * Don't add spaces to empty lines * Better templates * Deal with triple quotes in docstrings * Result of style_doc * Enable mdx treatment and fix code examples in MDXs * Result of doc styler on doc source files * Last fixes * Break copy from	2021-12-27 19:07:46 -05:00

8635 Commits