transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-05 13:50:13 +06:00

Author	SHA1	Message	Date
jiqing-feng	49a0bef4c1	enable low-precision pipeline (#31625 ) * enable low-precision pipeline * fix parameter for ASR * reformat * fix asr bug * fix bug for zero-shot * add dtype check * rm useless comments * add np.float16 check * Update src/transformers/pipelines/image_classification.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/pipelines/token_classification.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * fix comments * fix asr check * make fixup * No more need for is_torch_available() --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> Co-authored-by: Matt <rocketknight1@gmail.com>	2024-09-20 16:43:30 -07:00
Joao Gante	7542fac2c7	Pipeline: no side-effects on `model.config` and `model.generation_config` 🔫 (#33480)	2024-09-18 15:43:06 +01:00
Yoach Lacombe	f883827c0a	Fix tests in ASR pipeline (#33545)	2024-09-18 16:25:45 +02:00
Raushan Turganbay	65bb284448	Compile compatibilty for decoder-only models (#32617 ) * squash into one commit * add qwen2-vl for rope standardization * fix mistral compile * fix qwen2-vl * fix-copies	2024-09-09 10:59:04 +02:00
Matt	52a0213755	Add assistant prefill for chat templates and TextGenerationPipeline (#33198 ) * Add assistant prefill to chat templates * Add assistant prefill to pipeline * Add assistant prefill to pipeline * Tweak another test that ended in assistant message * Update tests that ended in assistant messages * Update tests that ended in assistant messages * Replace assistant_prefill with continue_final_message * Allow passing continue_final_message to pipeline * Small fixup * Add continue_final_message as a pipeline kwarg * Update docstrings * Move repos to hf-internal-testing! * Update src/transformers/tokenization_utils_base.py Co-authored-by: Lysandre Debut <hi@lysand.re> * Add explanatory comment * make fixup * Update chat templating docs to explain continue_last_message --------- Co-authored-by: Lysandre Debut <hi@lysand.re>	2024-09-02 13:23:47 +01:00
Arthur	b017a9eb11	Refactor CI: more explicit (#30674 ) * don't run custom when not needed? * update test fetcher filtering * fixup and updates * update * update * reduce burden * nit * nit * mising comma * this? * this? * more parallelism * more * nit for real parallelism on tf and torch examples * update * update * update * update * update * update * update * update * update * update * update * update * update to make it more custom * update to make it more custom * update to make it more custom * update to make it more custom * update * update * update * update * update * update * use correct path * fix path to test files and examples * filter-tests * filter? * filter? * filter? * nits * fix naming of the artifacts to be pushed * list vs files * list vs files * fixup * fix list of all tests * fix the install steps * fix the install steps * fix the config * fix the config * only split if needed * only split if needed * extend should fix it * extend should fix it * arg * arg * update * update * run tests * run tests * run tests * more nits * update * update * update * update * update * update * update * simpler way to show the test, reduces the complexity of the generated config * simpler way to show the test, reduces the complexity of the generated config * style * oups * oups * fix import errors * skip some tests for now * update doctestjob * more parallelism * fixup * test only the test in examples * test only the test in examples * nits * from Arthur * fix generated congi * update * update * show tests * oups * oups * fix torch job for now * use single upload setp * oups * fu*k fix * nit * update * nit * fix * fixes * [test-all] * add generate marker and generate job * oups * torch job runs not generate tests * let repo utils test all utils * UPdate * styling * fix repo utils test * more parallel please * don't test * update * bit more verbose sir * more * hub were skipped * split by classname * revert * maybe? 
* Amazing catch Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com> * fix * update * update * maybe non capturing * manual convert? * pass artifacts as parameters as otherwise the config is too long * artifact.json * store output * might not be safe? * my token * mmm? * use CI job IS * can't get a proper id? * ups * build num * update * echo url * this? * this! * fix * wget * ish * dang * udpdate * there we go * update * update * pass all * not .txt * update * fetcg * fix naming * fix * up * update * update * ?? * update * more updates * update * more * skip * oups * pr documentation tests are currently created differently * update * hmmmm * oups * curl -L * update * ???? * nit * mmmm * ish * ouf * update * ish * update * update * updatea * nit * nit * up * oups * documentation_test fix * test hub tests everything, just marker * update * fix * test_hub is the only annoying one now * tf threads? * oups * not sure what is happening? * fix? * just use folder for stating hub * I am getting fucking annoyed * fix the test? * update * uupdate * ? * fixes * add comment! * nit --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>	2024-08-30 18:17:25 +02:00
Matt	38d58a4427	Fix local repos with remote code not registering for pipelines (#33100 ) * Extremely experimental fix! * Try removing the clause entirely * Add test * make fixup * stash commit * Remove breakpoint * Add anti-regression test * make fixup * Move repos to hf-internal-testing!	2024-08-30 16:56:22 +01:00
Juan Pizarro	7591ca5bc5	🚨 Add Blip2ForImageTextRetrieval (#29261 ) * add Blip2ForImageTextRetrieval * use one line and remove unnecessary space in tests Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * use value from the config, rather than hardcoded * change order of params in Blip2QFormerModel.forward * update docstring * fix style * update test_inference_opt * move embeddings out of Blip2QFormerModel * remove from_vision_qformer_configs * remove autocast float16 in Blip2QFormerModel * rename fiels into vision_projection,text_projection,use_image_text_matching_head * use CLIPOutput for Blip2ImageTextMatchingModelOutput * remove past_key_values_length from Blip2TextEmbeddings * fix small typo in the CLIPOutput docstring * add Blip2ForImageTextRetrieval to Zero Shot Image Classification mapping * update docstring and add require_torch_fp16 * rollback test_inference_opt * use use_image_text_matching_head=True in convert * skip test_model_get_set_embeddings * fix create_rename_keys error on new itm fields * revert to do scale after dot product between "query" and "key" * fix ValueError on convert script for blip2-opt-2.7b * update org of paths to Salesforce * add is_pipeline_test_to_skip for VisualQuestionAnsweringPipelineTests * [run_slow] blip_2 * removed Blip2ForImageTextRetrieval from IGNORE_NON_AUTO_CONFIGURED * fix docstring of Blip2ImageTextMatchingModelOutput * [run_slow] blip_2 * fix multi-gpu tests * [run_slow] blip_2 * [run_slow] blip_2 --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-08-27 18:50:27 +01:00
Matt	9956c2bc98	Add a fix for custom code tokenizers in pipelines (#32300 ) * Add a fix for the case when tokenizers are passed as a string * Support image processors and feature extractors as well * Reverting load_feature_extractor and load_image_processor * Add test * Test is torch-only * Add tests for preprocessors and feature extractors and move test * Extremely experimental fix * Revert that change, wrong branch! * Typo! * Split tests	2024-08-27 14:39:57 +01:00
Fanli Lin	b5016d5de7	fix tensors on different devices in `WhisperGenerationMixin` (#32316 ) * fix * enable on xpu * no manual remove * move to device * remove to * add move to	2024-08-13 11:29:57 +01:00
Sanchit Gandhi	7f5d644e69	[pipeline] fix padding for 1-d tensors (#31776 ) * [pipeline] fix padding for 1-d tensors * add test * make style * Update tests/pipelines/test_pipelines_automatic_speech_recognition.py Co-authored-by: Kamil Akesbi <45195979+kamilakesbi@users.noreply.github.com> * Update tests/pipelines/test_pipelines_automatic_speech_recognition.py --------- Co-authored-by: Kamil Akesbi <45195979+kamilakesbi@users.noreply.github.com>	2024-07-29 21:24:42 +08:00
amyeroberts	165116bc14	Remove conversational pipeline tests (#32099) Remove conversation pipeline tests	2024-07-24 14:03:40 +01:00
Sanchit Gandhi	f83c6f1d02	Remove `trust_remote_code` when loading Libri Dummy (#31748 ) * [whisper integration] use parquet dataset for testing * propagate to others * more propagation * last one	2024-07-23 14:54:38 +08:00
Sai-Suraj-27	12b6880c81	fix: Fixed raising `TypeError` instead of `ValueError` for invalid type (#32111 ) * Raised TypeError instead of ValueError for invalid types. * Updated formatting using ruff. * Retrieved few changes. * Retrieved few changes. * Updated tests accordingly.	2024-07-22 17:46:17 +01:00
Robin Bakker	b31d595040	Add language to word timestamps for Whisper (#31572 ) * add language to words _collate_word_timestamps uses the return_language flag to determine whether the language of the chunk should be added to the word's information * ran style checks added missing comma * add new language test test that the pipeline can return both the language and timestamp * remove model configuration in test Removed model configurations that do not influence test results * remove model configuration in test Removed model configurations that do not influence test results	2024-07-17 21:32:53 +01:00
Yih-Dar	4879ac2b33	Avoid failure `TFBlipModelTest::test_pipeline_image_to_text` (#31827 ) * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-07-08 13:49:21 +02:00
Billy Cao	ac26260436	Allow FP16 or other precision inference for Pipelines (#31342 ) * cast image features to model.dtype where needed to support FP16 or other precision in pipelines * Update src/transformers/pipelines/image_feature_extraction.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Use .to instead * Add FP16 pipeline support for zeroshot audio classification * Remove unused torch imports * Add docs on FP16 pipeline * Remove unused import * Add FP16 tests to pipeline mixin * Add fp16 placeholder for mask_generation pipeline test * Add FP16 tests for all pipelines * Fix formatting * Remove torch_dtype arg from is_pipeline_test_to_skip* * Fix format * trigger ci --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-07-05 17:21:50 +01:00
Joao Gante	82486e5995	🚨🚨 TextGenerationPipeline: rely on the tokenizer default kwargs (#31747 ) * rely on the tokenizer default kwargs * fix a few tests	2024-07-02 16:17:42 +02:00
amyeroberts	1de7dc7403	Skip tests properly (#31308 ) * Skip tests properly * [test_all] * Add 'reason' as kwarg for skipTest * [test_all] Fix up * [test_all]	2024-06-26 21:59:08 +01:00
jiqing-feng	a958c4a801	fix output data type of image classification (#31444 ) * fix output data type of image classification * add tests for low-precision pipeline * add bf16 pipeline tests * fix bf16 tests * Update tests/pipelines/test_pipelines_image_classification.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fix import * fix import torch * fix style --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-06-25 11:14:39 +01:00
Hiroshi Matsuda	0e23e60a5a	Fix bug about add_special_tokens and so on (#31496 ) * fix bug about add_special_tokens and so on * improve add_special_tokens and padding behavior * add a test case for add_special_tokens and padding	2024-06-24 14:05:16 +01:00
Albert Villanova del Moral	a14b055b65	Pass datasets trust_remote_code (#31406 ) * Pass datasets trust_remote_code * Pass trust_remote_code in more tests * Add trust_remote_dataset_code arg to some tests * Revert "Temporarily pin datasets upper version to fix CI" This reverts commit `b7672826ca`. * Pass trust_remote_code in librispeech_asr_dummy docstrings * Revert "Pin datasets<2.20.0 for examples" This reverts commit `833fc17a3e`. * Pass trust_remote_code to all examples * Revert "Add trust_remote_dataset_code arg to some tests" to research_projects * Pass trust_remote_code to tests * Pass trust_remote_code to docstrings * Fix flax examples tests requirements * Pass trust_remote_dataset_code arg to tests * Replace trust_remote_dataset_code with trust_remote_code in one example * Fix duplicate trust_remote_code * Replace args.trust_remote_dataset_code with args.trust_remote_code * Replace trust_remote_dataset_code with trust_remote_code in parser * Replace trust_remote_dataset_code with trust_remote_code in dataclasses * Replace trust_remote_dataset_code with trust_remote_code arg	2024-06-17 17:29:13 +01:00
Matt	065729a692	Remove ConversationalPipeline and Conversation object (#31165 ) * Remove ConversationalPipeline and Conversation object, as they have been deprecated for some time and are due for removal * Update not-doctested.txt * Fix JA and ZH docs * Fix JA and ZH docs some more * Fix JA and ZH docs some more	2024-06-07 17:50:18 +01:00
Vu Huy Nguyen	f9296249a3	Pipeline VQA: Add support for list of images and questions as pipeline input (#31217 ) * Add list check for image and question * Handle passing two lists and update docstring * Add tests * Add support for dataset * Add test for dataset as input * fixup * fix unprotected import * fix unprotected import * fix import again * fix param type	2024-06-06 14:50:45 +01:00
amyeroberts	4ba66fdb4c	Fix pipeline tests - torch imports (#31227 ) * Fix pipeline tests - torch imports * Frameowrk dependant float conversion	2024-06-04 12:30:23 +01:00
Chujie Zheng	6b22a8f2d8	fix bf16 issue in text classification pipeline (#30996 ) * fix logits dtype * Add bf16/fp16 tests for text_classification pipeline * Update test_pipelines_text_classification.py * fix * fix	2024-06-04 11:20:48 +01:00
Kamil Akesbi	eb1a77bbb0	Using assistant in AutomaticSpeechRecognitionPipeline with different encoder size (#30637 ) * fiw input to generate in pipeline * fixup * pass input_features to generate with assistant * error if model and assistant with different enc size * fix * apply review suggestions * use self.config.is_encoder_decoder * pass inputs to generate directly * add slow tests * Update src/transformers/generation/utils.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * Update tests/pipelines/test_pipelines_automatic_speech_recognition.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * Update tests/pipelines/test_pipelines_automatic_speech_recognition.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * Update tests/pipelines/test_pipelines_automatic_speech_recognition.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * Update tests/pipelines/test_pipelines_automatic_speech_recognition.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * Update tests/pipelines/test_pipelines_automatic_speech_recognition.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * apply review * Update src/transformers/generation/utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/pipelines/test_pipelines_automatic_speech_recognition.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * apply code review * update attributes encoder_xyz to check * Update src/transformers/generation/utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/generation/utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/generation/utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * add slow test * solve conflicts 
--------- Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2024-05-23 09:59:38 +01:00
Sanchit Gandhi	0948c827de	[Whisper] Strip prompt before finding common subsequence (#27836)	2024-05-22 17:25:47 +01:00
Jonatan Kłosko	1518508467	Avoid extra chunk in speech recognition (#29539)	2024-05-22 14:07:51 +01:00
Hafedh	c11ac7857b	fix for custom pipeline configuration (#29004 ) * fix for custom pipeline configuration * fix for custom pipelines * remove extra exception * added test for custom pipelines extra tag * format with ruff * limit extra tag for first time only * format with ruff * improve tests for custom pipelines	2024-05-20 11:38:32 +02:00
Fanli Lin	69d9bca55a	enable Pipeline to get device from model (#30534 ) * check model.device * fix * style fix * move model device * remove print * add comment * fix * add unit test * optimize * change test names and add more cases * Update tests/pipelines/test_pipelines_common.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-05-13 15:00:39 +01:00
Kamil Akesbi	9c8979e35f	Word-level timestamps broken for short-form audio (#30325 ) * force chunk_length_s in AutomaticSpeechRecognitionPipeline * compute num_frames even when stride is None * add slow tests * fix test * Update src/transformers/pipelines/automatic_speech_recognition.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/pipelines/test_pipelines_automatic_speech_recognition.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * add input validation * fixup * small fix --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-05-07 10:17:27 +01:00
DarshanDeshpande	2ecefc3959	Add chat templating support for KeyDataset in text-generation pipeline (#30558 ) * added chat templating support for keydataset in generation pipeline * fixed and improved test * fix formatting test failures * Fix tests * Fix tests	2024-04-30 19:51:41 +01:00
Matt	2de5cb12be	Use the Keras set_random_seed in tests (#30504) Use the Keras set_random_seed to ensure reproducible weight initialization	2024-04-26 16:14:53 +01:00
Yih-Dar	28a22834bf	Fix all torch pipeline failures except one (#30290 ) * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-04-18 10:35:43 +02:00
Yih-Dar	eb75516e7c	Fix `Fatal Python error: Bus error` in `ZeroShotAudioClassificationPipelineTests` (#30283 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-04-17 11:47:30 +02:00
Hafedh	0eaef0c709	add `push_to_hub` to pipeline (#29172 ) * add `push_to_hub` to pipeline * fix docs * format with ruff * update save_pretrained * update save_pretrained * remove unnecessary comment * switch to push_to_hub method in DynamicPipelineTester * remove unused imports * update docs for add_new_pipeline * fix docs for add_new_pipeline * add comment * fix italien docs * changes to token retrieval for pipelines * Update src/transformers/pipelines/base.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-04-16 15:34:04 +01:00
Matt	ec59a42192	Revert workaround for TF safetensors loading (#30128 ) * See if we can get tests to pass with the fixed weights * See if we can get tests to pass with the fixed weights * Replace the revisions now that we don't need them anymore	2024-04-09 11:04:18 +01:00
amyeroberts	7f9aff910b	Patch fix - don't use safetensors for TF models (#30118 ) * Patch fix - don't use safetensors for TF models * Skip test for TF for now * Update for another test	2024-04-08 13:29:20 +01:00
Wang, Yi	79d62b2da2	if output is tuple like facebook/hf-seamless-m4t-medium, waveform is … (#29722 ) * if output is tuple like facebook/hf-seamless-m4t-medium, waveform is the first element Signed-off-by: Wang, Yi <yi.a.wang@intel.com> * add test and fix batch issue Signed-off-by: Wang, Yi <yi.a.wang@intel.com> * add dict output support for seamless_m4t Signed-off-by: Wang, Yi <yi.a.wang@intel.com> --------- Signed-off-by: Wang, Yi <yi.a.wang@intel.com>	2024-04-05 09:26:44 +02:00
Fanli Lin	e4f5b57a3b	[tests] fix the wrong output in `ImageToTextPipelineTests.test_conditional_generation_llava` (#29975) bug fix	2024-04-01 13:08:39 +02:00
yunxiangtang	b32bf85b58	Replace 'decord' with 'av' in VideoClassificationPipeline (#29747 ) * replace the 'decord' with 'av' in VideoClassificationPipeline * fix the check of backend in VideoClassificationPipeline * adjust the order of imports * format 'video_classification.py' * format 'video_classification.py' with ruff --------- Co-authored-by: wanqiancheng <13541261013@163.com>	2024-03-26 10:12:24 +00:00
Yuki Watanabe	8e9a2207b3	Populate torch_dtype from model to pipeline (#28940 ) * Populate torch_dtype from model to pipeline Signed-off-by: B-Step62 <yuki.watanabe@databricks.com> * use property Signed-off-by: B-Step62 <yuki.watanabe@databricks.com> * lint Signed-off-by: B-Step62 <yuki.watanabe@databricks.com> * Remove default handling Signed-off-by: B-Step62 <yuki.watanabe@databricks.com> --------- Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>	2024-03-25 10:46:40 +01:00
Wang, Yi	8ee1d47203	fix image-to-text batch incorrect output issue (#29342 ) * fix image-to-text batch incorrect output issue Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> * add ci test Signed-off-by: Wang, Yi <yi.a.wang@intel.com> * update ci test Signed-off-by: Wang, Yi <yi.a.wang@intel.com> --------- Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> Signed-off-by: Wang, Yi <yi.a.wang@intel.com>	2024-03-08 11:11:10 +00:00
Fanli Lin	fa7f3cf336	[tests] enable test_pipeline_accelerate_top_p on XPU (#29309 ) * use torch_device * Update tests/pipelines/test_pipelines_text_generation.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix style --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-03-05 09:16:05 +01:00
Fanli Lin	aade711d1e	[tests] enable automatic speech recognition pipeline tests on XPU (#29308 ) * use require_torch_gpu * enable on XPU	2024-03-04 08:24:38 +01:00
Raushan Turganbay	ddf7ac4237	Token level timestamps for long-form generation in Whisper (#29148)	2024-02-27 18:15:26 +00:00
amyeroberts	e770f0316d	[`pipeline`] Add pool option to image feature extraction pipeline (#28985 ) * Add pool option * PR comments - error message and exact outputs check	2024-02-20 20:22:08 +00:00
Matt	2f1003be86	Add chat support to text generation pipeline (#28945 ) * Add chat support to text generation pipeline * Better handling of single elements * Deprecate ConversationalPipeline * stash commit * Add missing add_special_tokens kwarg * Update chat templating docs to refer to TextGenerationPipeline instead of ConversationalPipeline * Add ✨TF✨ tests * @require_tf * Add type hint * Add specific deprecation version * Remove unnecessary do_sample * Remove todo - the discrepancy has been resolved * Update src/transformers/tokenization_utils_base.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/pipelines/text_generation.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-02-16 16:41:01 +00:00
Lysandre Debut	f497f564bb	Update all references to canonical models (#29001) * Script & Manual edition * Update	2024-02-16 08:16:58 +01:00
NielsRogge	f278ef20ed	[Nougat] Fix pipeline (#28242 ) * Fix pipeline * Remove print statements * Address comments * Address issue * Remove unused imports	2024-02-12 10:21:15 +01:00
Daniel Korat	abf8f54a01	⚠️ Raise `Exception` when trying to generate 0 tokens ⚠️ (#28621 ) * change warning to exception * Update src/transformers/generation/utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * validate `max_new_tokens` > 0 in `GenerationConfig` * fix truncation test parameterization in `TextGenerationPipelineTests` --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2024-02-07 13:42:01 +01:00
amyeroberts	ba3264b4e8	Image Feature Extraction pipeline (#28216 ) * Draft pipeline * Fixup * Fix docstrings * Update doctest * Update pipeline_model_mapping * Update docstring * Update tests * Update src/transformers/pipelines/image_feature_extraction.py Co-authored-by: Omar Sanseviero <osanseviero@gmail.com> * Fix docstrings - review comments * Remove pipeline mapping for composite vision models * Add to pipeline tests * Remove for flava (multimodal) * safe pil import * Add requirements for pipeline run * Account for super slow efficientnet * Review comments * Fix tests * Swap order of kwargs * Use build_pipeline_init_args * Add back FE pipeline for Vilt * Include image_processor_kwargs in docstring * Mark test as flaky * Update TODO * Update tests/pipelines/test_pipelines_image_feature_extraction.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Add license header --------- Co-authored-by: Omar Sanseviero <osanseviero@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-02-05 14:50:07 +00:00
Yoach Lacombe	7addc9346c	Correct wav2vec2-bert inputs_to_logits_ratio (#28821 ) * Correct wav2vec2-bert inputs_to_logits_ratio * correct ratio * correct ratio, clean asr pipeline * refactor on one line	2024-02-05 13:14:47 +00:00
Patrick von Platen	65a926e82b	[Whisper] Refactor forced_decoder_ids & prompt ids (#28687 ) * up * Fix more * Correct more * Fix more tests * fix fast tests * Fix more * fix more * push all files * finish all * make style * Fix timestamp wrap * make style * make style * up * up * up * Fix lang detection behavior * Fix lang detection behavior * Add lang detection test * Fix lang detection behavior * make style * Update src/transformers/models/whisper/generation_whisper.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * better error message * make style tests * add warning --------- Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>	2024-01-31 14:02:07 +02:00
Patrick von Platen	690fe73f20	[Whisper] Finalize batched SOTA long-form generation (#27658 ) * finalize * make fix copies whisper * [Tests] Make sure that we don't run tests mulitple times * Update src/transformers/models/whisper/modeling_whisper.py * [Tests] Make sure that we don't run tests mulitple times * fix more * improve * improve * improve further * improve more * improve * fix more * git commit and git push * fix more * fix more * fix more * New try * Fix more whisper stuff * Improve * correct more * correct more * correct more * Fix some tests * Add more tests * correct more * correct more * correct more * push * correct more * Fix more * Better * without dec mask * correct more * clean * save intermediate * Fix more * Fix VAD for large-v2 * Save new * Correct more * make cleaner * correct tests * correct src * Finish * Fix more * Fix more * finish * Fix edge cases * fix return_dict_in_generate * fix all tests * make style * add docstrings * add docstrings * Fix logit processor * make style * fix pipeline test * fix more style * Apply suggestions from code review * apply feedback Sanchit * correct more * Apply suggestions from code review Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * correct more * correct more * correct more * Fix staticmethod * correct more * fix * fix slow tests * make style * fix tokenizer test * fix tokenizer test * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * finish * finish * revert kwargs change --------- Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-01-19 14:04:17 +02:00
Yoach Lacombe	268fc1fdfa	Add w2v2bert to pipeline (#28585 ) * generalize asr pipeline to fbank models * change w2v2 pipeline output * Update test_pipelines_automatic_speech_recognition.py	2024-01-19 11:25:01 +00:00
thedamnedrhino	366c03271e	Tokenizer kwargs in textgeneration pipe (#28362 ) * added args to the pipeline * added test * more sensical tests * fixup * docs * typo ; * docs * made changes to support named args * fixed test * docs update * styles * docs * docs	2024-01-15 16:52:18 +01:00
Yih-Dar	59cd9de39d	Byebye torch 1.10 (#28207 ) * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-01-11 16:18:27 +01:00
amyeroberts	66964c00f6	Enable multi-label image classification in pipeline (#28433) Enable multi-label image classification	2024-01-11 10:29:38 +00:00
NielsRogge	3b742ea84c	Add SigLIP (#26522 ) * Add first draft * Use appropriate gelu function * More improvements * More improvements * More improvements * Convert checkpoint * More improvements * Improve docs, remove print statements * More improvements * Add link * remove unused masking function * begin tokenizer * do_lower_case * debug * set split_special_tokens=True * Remove script * Fix style * Fix rebase * Use same design as CLIP * Add fast tokenizer * Add SiglipTokenizer to init, remove extra_ids * Improve conversion script * Use smaller inputs in conversion script * Update conversion script * More improvements * Add processor to conversion script * Add tests * Remove print statements * Add tokenizer tests * Fix more tests * More improvements related to weight initialization * More improvements * Make more tests pass * More improvements * More improvements * Add copied from * Add canonicalize_text * Enable fast tokenizer tests * More improvements * Fix most slow tokenizer tests * Address comments * Fix style * Remove script * Address some comments * Add copied from to tests * Add more copied from * Add more copied from * Add more copied from * Remove is_flax_available * More updates * Address comment * Remove SiglipTokenizerFast for now * Add caching * Remove umt5 test * Add canonicalize_text inside _tokenize, thanks Arthur * Fix image processor tests * Skip tests which are not applicable * Skip test_initialization * More improvements * Compare pixel values * Fix doc tests, add integration test * Add do_normalize * Remove causal mask and leverage ignore copy * Fix attention_mask * Fix remaining tests * Fix dummies * Rename temperature and bias * Address comments * Add copied from to tokenizer tests * Add SiglipVisionModel to auto mapping * Add copied from to image processor tests * Improve doc * Remove SiglipVisionModel from index * Address comments * Improve docs * Simplify config * Add first draft * Make it like mistral * More improvements * Fix 
attention_mask * Fix output_attentions * Add note in docs * Convert multilingual model * Convert large checkpoint * Convert more checkpoints * Add pipeline support, correct image_mean and image_std * Use padding=max_length by default * Make processor like llava * Add code snippet * Convert more checkpoints * Set keep_punctuation_string=None as in OpenCLIP * Set normalized=False for special tokens * Fix doc test * Update integration test * Add figure * Update organization * Happy new year * Use AutoModel everywhere --------- Co-authored-by: patil-suraj <surajp815@gmail.com>	2024-01-08 18:17:16 +01:00
yuanwu2017	03b980990a	Don't check the device when device_map=auto (#28351 ) When running on a multi-card server with device_map=auto, the model will not always be allocated to device 0, because other processes may be using these cards. It will select the devices that can accommodate the model. Signed-off-by: yuanwu <yuan.wu@intel.com>	2024-01-05 12:21:29 +01:00
Yoach Lacombe	5da3db3fd5	[Whisper] Fix word-level timestamps with bs>1 or num_beams>1 (#28114 ) * fix frames * use smaller chunk length * correct beam search + tentative stride * fix whisper word timestamp in batch * add test batch generation with return token timestamps * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * clean a test * make style + correct typo * write clearer comments * explain test in comment --------- Co-authored-by: sanchit-gandhi <sanchit@huggingface.co> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>	2023-12-22 12:43:11 +00:00
Quentin Lhoest	26ea725bc0	Update fixtures-image-utils (#28080 ) * fix hf-internal-testing/fixtures_image_utils * fix test * comments	2023-12-15 16:58:36 +00:00
Matt	050e0b44f6	Proper build() methods for TF (#27794 ) * Add a convenience method for building in your own name scope * Second attempt at auto layer building * Revert "Second attempt at auto layer building" This reverts commit e03a3aaecf9ec41a805582b83cbdfe3290a631be. * Attempt #3 * Revert "Attempt #3" This reverts commit b9df7a0857560d29b5abbed6127d9e9eca77cf47. * Add missing attributes that we're going to need later * Add some attributes we're going to need later * A fourth attempt! Feel the power flow through you! * Revert "A fourth attempt! Feel the power flow through you!" This reverts commit 6bf4aaf3875d6f28485f50187617a4c616c8aff7. * Add more values we'll need later * TF refactor that we'll need later * Revert "TF refactor that we'll need later" This reverts commit ca07202fb5b7b7436b893baa8d688b4f348ea7b9. * Revert "Revert "TF refactor that we'll need later"" This reverts commit 1beb0f39f293ed9c27594575e1c849aadeb15c13. * make fixup * Attempt five! * Revert "Attempt five!" This reverts commit 3302207958dfd0374b0447a51c06eea51a506044. * Attempt six - this time don't add empty methods * Revert "Attempt six - this time don't add empty methods" This reverts commit 67d60129be75416b6beb8f47c7d38d77b18d79bb. * Attempt seven - better base model class detection! * Revert "Attempt seven - better base model class detection!" This reverts commit 5f14845e92ea0e87c598da933bfbfee10f553bc9. * Another attribute we'll need later * Try again with the missing attribute! * Revert "Try again with the missing attribute!" This reverts commit 760c6f30c5dffb3e04b0e73c34a77d1882a0fef7. * This is the attempt that will pierce the heavens! * Revert "This is the attempt that will pierce the heavens!" This reverts commit c868bb657de057aca7a5260350a3f831fc4dfee6. * Attempt seven - snag list is steadily decreasing * Revert "Attempt seven - snag list is steadily decreasing" This reverts commit 46fbd975deda64429bfb3e5fac4fc0370c00d316. * Attempt eight - will an empty snag list do it? 
* Revert "Attempt eight - will an empty snag list do it?" This reverts commit 7c8a3c2b083253649569e9877e02054ae5cec67b. * Fixes to Hubert issues that cause problems later * Trying again with Conv1D/SeparableConv fixes * Revert "Trying again with Conv1D/SeparableConv fixes" This reverts commit 55092bca952bc0f750aa1ffe246a640bf1e2036e. * Apply the build shape fixes to Wav2Vec2 as well * One more attempt! * Revert "One more attempt!" This reverts commit 5ac3e4cb01b9458cc93312873725f9444ae7261c. * Another attempt! * Revert "Another attempt!" This reverts commit ea16d890e019d7de8792a3b8e72f3b1c02adae50. * Let's see how many failures we get without the internal build method * Fix OpenAI * Fix MobileBERT * (Mostly) fix GroupVIT * Fix BLIP * One more BLIP fix * One more BLIP fix! * Fix Regnet * Finally fully fix GroupViT * Fix Data2Vec and add the new AdaptivePool * Fix Segformer * Fix Albert * Fix Deberta/DebertaV2 * Fix XLM * Actually fix XLM * Fix Flaubert * Fix lxmert * Fix Resnet * Fix ConvBERT * Fix ESM * Fix Convnext / ConvnextV2 * Fix SAM * Fix Efficientformer * Fix LayoutLMv3 * Fix speech_to_text * Fix mpnet and mobilevit * Fix Swin * Fix CTRL * Fix CVT * Fix DPR * Fix Wav2Vec2 * Fix T5 * Fix Hubert * Fix GPT2 * Fix Whisper * Fix DeiT * Fix the encoder-decoder / dual-encoder classes * make fix-copies * build in name scope * Fix summarization test * Fix tied weight names for BART + Blenderbot * Fix tied weight name building * Fix to TFESM weight building * Update TF SAM * Expand all the shapes out into Big Boy Shapes	2023-12-14 15:17:30 +00:00
Yih-Dar	e366937587	Fix 2 tests in `FillMaskPipelineTests` (#27889 ) * fix * fix * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-12-08 14:55:29 +01:00
Younes Belkada	44b5506d29	[`Llava`] Add Llava to transformers (#27662 ) * add model like * logits match * minor fixes * fixes * up * up * add todo * llava processor * keep the processor simple * add conversion script * fixup * fix copies * up * add to index * fix config + logits * fix * refactor * more refactor * more refactor * fix copies * add authors * v1 tests * add `LlavaProcessor` in init * remove unneeded import * up * up * docs * up * fix CI * fix CI * add attention mask in test * make fixup * remove the vision model * that' s the dirty way to do it * nits * nits * updates * add more tests * add input tests * fixup * more styling * nits * updates amd cleanup * fixup the generation expected results * fix the testing script * some cleanup and simplification which does not work yet but almost there! * make correct dispatch operations * vectorize works for batch of images and text * last todos * nits * update test and modeling code * remove useless function for now * fix few issues * fix generation * some nits * add bakllava * nits * remove duplicated code * finis merge * cleanup * missed this line * fill the todos * add left padding offset * add left and rignt padding logic * bool to properly index * make sure * more cleanups * batch is fixed 😉 * add correct device for tensor creation * fix some dtype missmatch * ruff * update conversion script * Update src/transformers/__init__.py * fa 2 support + fix conversion script * more * correct reshaping * fix test dict * fix copies by ignoring * fix nit * skip clip vision model * fixup * fixup * LlavaForVisionText2Text -> LlavaForCausalLM * update * fix * raise correct errors * fix * docs * nuke for now * nits here and there * fixup * fix remaining tests * update LlavaForConditionalGeneration instead of CausalLM * fixups * pipeline support * slow and piepline tests * supports batch * nits * cleanup * fix first integration tests * add pad token where needed * correct etsts * fixups * update pipeline testr * fix 
quality * nits * revert unneeded change * nit * use BatchFeature * from ...feature_extraction_utils import BatchFeature * nits * nits * properly update * more f*** nits * fix copies * comment * keep slow test slow * Update src/transformers/models/llava/processing_llava.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * add piepline example * add pixel values in docstrign * update pr doctest * fix * fix slow tests * remove hack * fixup * small note * forward contrib credits from PR25789 * forward contrib credits from original implementation and work * add arthur * Update src/transformers/models/llava/processing_llava.py Co-authored-by: Lysandre Debut <hi@lysand.re> * update docstring * nit * move to not doctested because of timeout issues * fixup * add description * more * fix-copies * fix docs * add beam search * add more comments * add typehints on processor * add speedup plot * update slow tests and docs * push test * push batched test * fix batched generation with different number of images * remove benchmark due to a bug * fix test * fix copies * add gcolab demo --------- Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: shauray8 <shauray8@users.noreply.github.com> Co-authored-by: haotian-liu <haotian-liu@users.noreply.github.com> Co-authored-by: Lysandre Debut <hi@lysand.re>	2023-12-07 09:30:47 +01:00
Sanchit Gandhi	3c15fd1990	[Seamless v2] Add FE to auto mapping (#27829 )	2023-12-04 16:34:13 +00:00
Yih-Dar	b8db265bc6	Update tiny model summary file (#27388 ) * update * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-11-23 21:00:39 +01:00
Patrick von Platen	4151fbb49c	[Whisper] Add sequential longform decoding (#27492 ) * [Whisper] Add seq gen * [Whisper] Add seq gen * more debug * Fix whisper logit processor * Improve whisper code further * Fix more * more debug * more debug * Improve further * Add tests * Prep for batch size > 1 * Get batch_size>1 working * Correct more * Add extensive tests * more debug * more debug * more debug * add more tests * more debug * Apply suggestions from code review * more debug * add comments to explain the code better * add comments to explain the code better * add comments to explain the code better * Add more examples * add comments to explain the code better * fix more * add comments to explain the code better * add comments to explain the code better * correct * correct * finalize * Apply suggestions from code review * Apply suggestions from code review	2023-11-22 13:27:34 +01:00
Arthur	651408a077	[`Styling`] stylify using ruff (#27144 ) * try to stylify using ruff * might need to remove these changes? * use ruf format andruff check * use isinstance instead of type comparision * use # fmt: skip * use # fmt: skip * nits * soem styling changes * update ci job * nits isinstance * more files update * nits * more nits * small nits * check and format * revert wrong changes * actually use formatter instead of checker * nits * well docbuilder is overwriting this commit * revert notebook changes * try to nuke docbuilder * style * fix feature exrtaction test * remve `indent-width = 4` * fixup * more nits * update the ruff version that we use * style * nuke docbuilder styling * leve the print for detected changes * nits * Remove file I/O Co-authored-by: charliermarsh <charlie.r.marsh@gmail.com> * style * nits * revert notebook changes * Add # fmt skip when possible * Add # fmt skip when possible * Fix * More ` # fmt: skip` usage * More ` # fmt: skip` usage * More ` # fmt: skip` usage * NIts * more fixes * fix tapas * Another way to skip * Recommended way * Fix two more fiels * Remove asynch Remove asynch --------- Co-authored-by: charliermarsh <charlie.r.marsh@gmail.com>	2023-11-16 17:43:19 +01:00
Lucain	fd65aa9818	Set `usedforsecurity=False` in hashlib methods (FIPS compliance) (#27483 ) * Set usedforsecurity=False in hashlib methods (FIPS compliance) * trigger ci * tokenizers version * deps * bump hfh version * let's try this	2023-11-16 14:29:53 +00:00
Sanchit Gandhi	a4616c6767	[Whisper] Fix pipeline test (#27442 )	2023-11-14 11:18:26 +00:00
Lucain	e38348ae8f	Fix RequestCounter to make it more future-proof (#27406 ) * Fix RequestCounter to make it more future-proof * code quality	2023-11-09 18:53:26 +01:00
Sanchit Gandhi	da7ea9a4e3	[Whisper] Block language/task args for English-only (#27322 ) * [Whisper] Block language/task args for English-only * Update src/transformers/models/whisper/modeling_whisper.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2023-11-07 10:04:23 +00:00
Yoach Lacombe	0ed6729bb1	Enrich TTS pipeline parameters naming (#26473 ) * enrich TTS pipeline docstring for clearer forward_params use * change token leghts * update Pipeline parameters * correct docstring and make style * fix tests * make style * change music prompt Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * raise errors if generate_kwargs with forward-only models * make style --------- Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2023-11-02 17:06:56 +00:00
Matt	05f2290114	Backward compatibility fix for the Conversation class (#27176 ) * Backward compatibility fix for the Conversation class * Explain what's going on in the conditional	2023-10-31 15:12:06 +00:00
Hz, Ji	f53041a753	device agnostic pipelines testing (#27129 ) * device agnostic pipelines testing * pass torch_device	2023-10-31 15:46:31 +01:00
Matt	08fadc8085	Shorten the conversation tests for speed + fixing position overflows (#26960 ) * Shorten the conversation tests for speed + fixing position overflows * Put max_new_tokens back to 5 * Remove test skips * Increase max_position_embeddings in blenderbot tests * Add skips for blenderbot_small * Correct TF test skip * make fixup * Reformat skips to use is_pipeline_test_to_skip * Update tests/models/blenderbot_small/test_modeling_blenderbot_small.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/blenderbot_small/test_modeling_flax_blenderbot_small.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/blenderbot_small/test_modeling_tf_blenderbot_small.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2023-10-31 14:20:04 +00:00
Matt	bdbcd5d482	Fix and re-enable ConversationalPipeline tests (#26907 ) * Fix and re-enable conversationalpipeline tests * Fix the batch test so the change only applies to conversational pipeline	2023-10-19 12:04:25 +01:00
Tom Aarsen	40ea9ab2a1	Add many missing spaces in adjacent strings (#26751 ) Add missing spaces in adjacent strings	2023-10-12 10:28:40 +02:00
Nathan Cahill	b5ca8fcd20	Add tokenizer kwargs to fill mask pipeline. (#26234 ) * add tokenizer kwarg inputs * Adding tokenizer_kwargs to _sanitize_parameters * Add truncation=True example to tests * Update test_pipelines_fill_mask.py * Update test_pipelines_fill_mask.py * make fix-copies and make style * Update fill_mask.py Replace single tick with double * make fix-copies * Style --------- Co-authored-by: Lysandre <lysandre@huggingface.co>	2023-10-03 10:25:10 +02:00
Yih-Dar	d9e4bc2895	Update tiny model information and pipeline tests (#26285 ) * Update tiny model summary file * add to pipeline tests * revert * fix import * fix import * fix * fix * update * update * update * fix * remove BarkModelTest * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-09-25 18:08:12 +02:00
LeviVasconcelos	576cd45a57	Add image to image pipeline (#25393 ) * Add image to image pipeline Add image to image pipeline * remove swin2sr from tf auto * make ImageToImage importable * make style make style make style make style * remove tf support * remove nonused imports * fix postprocessing * add important comments; add unit tests * add documentation * remove support for TF * make fixup * fix typehint Image.Image * fix documentation code * address review request; fix unittest type checking * address review request; fix unittest type checking * make fixup * address reviews * Update src/transformers/pipelines/image_to_image.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * enhance docs * make style * make style * improve docetest time * improve docetest time * Update tests/pipelines/test_pipelines_image_to_image.py Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com> * Update tests/pipelines/test_pipelines_image_to_image.py Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com> * make fixup * undo faulty merge * undo faulty merge * add image-to-image to test pipeline mixin * Update src/transformers/pipelines/image_to_image.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update tests/pipelines/test_pipelines_image_to_image.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * improve docs --------- Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2023-09-22 19:53:55 +03:00
Arthur	2da8853775	🚨🚨 🚨🚨 [`Tokenizer`] attemp to fix add_token issues🚨🚨 🚨🚨 (#23909 ) * fix test for bart. Order is correct now let's skip BPEs * ouf * styling * fix bert.... * slow refactoring * current updates * massive refactoring * update * NICE! * update to see where I am at * updates * update * update * revert * updates * updates * start supporting legacy_save * styling * big update * revert some changes * nits * nniiiiiice * small fixes * kinda fix t5 with new behaviour * major update * fixup * fix copies * today's updates * fix byt5 * upfate * update * update * updates * update vocab size test * Barthez does not use not need the fairseq offset ids * super calll must be after * calll super * move all super init * move other super init * fixup * nits * more fixes * nits * more fixes * nits * more fix * remove useless files * ouch all of them are affected * and more! * small imporvements * no more sanitize token * more changes around unique no split tokens * partially fix more things * keep legacy save but add warning * so... more fixes * updates * guess deberta tokenizer could be nuked * fixup * fixup did some bad things * nuke it if it breaks * remove prints and pretrain fast from slow with new format. * fixups * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * fiou * nit * by default specials should not be normalized? * update * remove brakpoint * updates * a lot of updates * fixup * fixes revert some changes to match fast * small nits * that makes it cleaner * fix camembert accordingly * update * some lest breaking changes * update * fixup * fix byt5 and whisper mostly * some more fixes, canine's byte vocab * fix gpt2 * fix most of the perceiver tests (4 left) * fix layout lmv3 * fixup * fix copies for gpt2 style * make sure to only warn once * fix perciever and gpt2 tests * some more backward compatibility: also read special tokens map because some ppl use it........////..... 
* fixup * add else when reading * nits * fresh updates * fix copies * will this make everything faster? * fixes * more fixes * update * more fixes * fixup * is the source of truth right? * sorry camembert for the troubles * current updates * fixup * update led * update * fix regression * fix single word * more model specific fixes * fix t5 tests * fixup * more comments * update * fix nllb * rstrip removed * small fixes * better handle additional_special_tokens and vocab sizes * fixing * styling * fix 4 / 21 * fixup * fix nlbb's tests * some fixes * fix t5 * fixes * style * fix canine tests * damn this is nice * nits * m2m100 nit * fixups * fixes! * fixup * stash * fix merge * revert bad change * fixup * correct order for code Llama * fix speecht5 post merge * styling * revert source of 11 fails * small nits * all changes in one go * fnet hack * fix 2 more tests * update based on main branch of tokenizers * fixup * fix VITS issues * more fixes * fix mgp test * fix camembert issues * oups camembert still has 2 failing tests * mluke fixes * decode fixes * small nits * nits * fix llama and vits * fix camembert * smal nits * more fixes when initialising a fast from a slow and etc * fix one of the last test * fix CPM tokenizer test * fixups * fix pop2piano * fixup * ⚠️ Change tokenizers required version ⚠️ * ⚠️ Change tokenizers required version ⚠️ * "tokenizers>=0.14,<0.15", don't forget smaller than * fix musicgen tests and pretraiendtokenizerfast * fix owlvit and all * update t5 * fix 800 red * fix tests * fix the fix of the fix of t5 * styling * documentation nits * cache _added_tokens_encoder * fixups * Nit * fix red tests * one last nit! 
* make eveything a lot simpler * Now it's over 😉 * few small nits * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * updates that work for now * tests that should no be skipped / changed and fixed next * fixup * i am ashamed * pushe the fix * update * fixups * nits * fix added_tokens_encoder * fix canine test * fix pegasus vocab * fix transfoXL * fixup * whisper needs to be fixed for train new * pegasus nits * more pegasus fixes * minor update * better error message in failed test * fix whisper failing test * fix whisper failing test * fix pegasus * fixup * fix **** pegasus * reset things * remove another file * attempts to fix the strange custome encoder and offset * nits here and there * update * fixup * nit * fix the whisper test * nits nits * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * updates based on review * some small update to potentially remove * nits * import rlu cache * Update src/transformers/tokenization_utils_base.py Co-authored-by: Lysandre Debut <hi@lysand.re> * move warning to `from_pretrained` * update tests results now that the special tokens are always added --------- Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Lysandre Debut <hi@lysand.re>	2023-09-18 20:28:36 +02:00
Matt	f0a6057fbc	Fix ConversationalPipeline tests (#26217 ) Add BlenderbotSmall templates and correct handling for conversation.past_user_inputs	2023-09-18 15:08:56 +01:00
Joshua Lochner	95fe0f5d80	[Whisper] Fix word-level timestamps for audio < 30 seconds (#25607 ) * Fix word-level timestamps for audio < 30 seconds * Fix code quality * fix unit tests * Fix unit tests * Fix unit test * temp: print out result * temp: set max diff to None * fix unit tests * fix typo * Fix typo Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Use generation config for `num_frames` * fix docs * Move `num_frames` to kwargs * compute stride/attn_mask once * mark test as slow --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>	2023-09-14 17:42:35 +01:00
Sanchit Gandhi	44a0490d3c	[MusicGen] Add sampling rate to config (#26136 ) * [MusicGen] Add sampling rate to config * remove tiny * make property * Update tests/pipelines/test_pipelines_text_to_audio.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * style --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2023-09-14 16:57:06 +01:00
Matt	866df66fe4	Overhaul Conversation class and prompt templating (#25323 ) * First commit while I figure this out * make fixup * Remove unused method * Store prompt attrib * Fix prompt argument for tests * Make same changes in fast tokenizer * Remove global prompts from fast tokenizer too * stash commit * stash commit * Migrate PromptConfig to its True Final Location * Replace Conversation entirely with the new class * Import/dependency fixes * Import/dependency fixes * Change format for lots of default prompts * More default prompt fixups * Revert llama old methods so we can compare * Fix some default configs * Fix some default configs * Fix misspelled kwarg * Fixes for Blenderbot * make fixup * little rebase cleanup * Add basic documentation * Quick doc fix * Truncate docstring for now * Add handling for the case when messages is a single string * Quick llama merges * Update conversational pipeline and tests * Add a couple of legacy properties for backward compatibility * More legacy handling * Add docstring for build_conversation_input_ids * Restructure PromptConfig * Let's start T E M P L A T I N G * Refactor all default configs to use templates instead * Revert changes to the special token properties since we don't need them anymore * More class templates * Make the sandbox even sandier * Everything replaced with pure templating * Remove docs for PromptConfig * Add testing and optional requirement boilerplate * Fix imports and make fixup * Fix LLaMA tests and add Conversation docstring * Finally get LLaMA working with the template system * Finally get LLaMA working with the template system * make fixup * make fixup * fmt-off for the long lists of test tokens * Rename method to apply_chat_template for now * Start on documentation * Make chat_template a property that reads through to the default if it's not set * Expand docs * Expand chat templating doc some more * trim/lstrip blocks by default and update doc * Few doc tweaks * rebase cleanup * Clarify 
docstring * rebase cleanup * rebase cleanup * make fixup * Quick doc edit * Reformat the standard template to match ChatML * Re-add PEFT check * Update docs/source/en/chat_templating.md Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Add apply_chat_template to the tokenizer doc * make fixup * Add doc links * Fix chat links * Fix chat links * Explain system messages in the doc * Add chat template test * Proper save-loading for chat template attribute * Add test skips for layout models * Remove _build_conversation_input_ids, add default_chat_template to code_llama * Make sure all LLaMA models are using the latest template * Remove default_system_prompt block in code_llama because it has no default prompt * Update ConversationPipeline preprocess * Add correct #Copied from links to the default_chat_templates * Remove unneeded type checking line * Add a dummy mark_processsed method * Reorganize Conversation to have *deprecated_kwargs Update chat_templating.md * Quick fix to LLAMA tests * Small doc tweaks * Add proper docstrings and "copied from" statements to all default chat templates * Merge use_default_system_prompt support for code_llama too * Improve clarity around self.chat_template * Docstring fix * Fix blenderbot default template * More doctest fix * Break out some tokenizer kwargs * Update doc to explain default templates * Quick tweaks to tokenizer args * Cleanups for tokenizer args * Add note about cacheing * Quick tweak to the chat-templating doc * Update the LLaMA template with error checking and correct system message embedding * make fixup * make fixup * add requires_jinja * Cleanup to expected output formatting * Add cacheing * Fix typo in llama default template * Update LLaMA tests * Update documentation * Improved legacy handling in the Conversation class * Update Jinja template with proper error handling * Quick bugfix * Proper exception raising * Change cacheing behaviour so it doesn't try to pickle an entire Jinja env * make fixup 
* rebase cleanup --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2023-09-14 15:10:34 +01:00
Arthur	d0354e5e86	[`CI`] Fix red CI and ERROR failed should show (#25995 ) * start with error too * fix ? * start with nit * one more path * use `job_name` * mark pipeline test as slow	2023-09-05 20:16:00 +02:00
Sanchit Gandhi	8d518013ef	[Wav2Vec2 Conformer] Fix inference float16 (#25985 ) * [Wav2Vec2 Conformer] Fix inference float16 * fix test * fix test more * clean pipe test	2023-09-05 18:26:06 +01:00
Sanchit Gandhi	b439129e74	[VITS] Add to TTA pipeline (#25906 ) * [VITS] Add to TTA pipeline * Update tests/pipelines/test_pipelines_text_to_audio.py Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * remove extra spaces --------- Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>	2023-09-01 16:39:00 +01:00
raghavanone	2be8a9098e	Save image_processor while saving pipeline (ImageSegmentationPipeline) (#25884 ) * Save image_processor while saving pipeline (ImageSegmentationPipeline) * Fix black issues	2023-08-31 16:08:20 +02:00
Juan Pizarro	09dc99517f	Add Blip2 model in VQA pipeline (#25532 ) * Add Blip2 model in VQA pipeline * use require_torch_gpu for test_large_model_pt_blip2 * use can_generate in vqa pipeline * test Blip2ForConditionalGeneration using float16 * remove custom can_generate from Blip2ForConditionalGeneration	2023-08-30 14:16:16 +01:00
Sanchit Gandhi	0218876822	[ASR Pipe Test] Fix CTC timestamps error message (#25727 )	2023-08-24 17:58:37 +01:00
Arthur	bc3e20dcf0	[`Llama`] remove prompt and fix prefix finetuning (#25565 ) * nit * update * make sure use_default_system_prompt is saved * update checkpointing * consistency * use_default_system_prompt for test	2023-08-18 13:39:23 +02:00
Yoach Lacombe	b8f69d0d10	Add Text-To-Speech pipeline (#24952 ) * add AutoModelForTextToSpeech class * add TTS pipeline and tessting * add docstrings to text_to_speech pipeline * fix torch dependency * corrector 'processor is None' case in Pipeline * correct repo id * modify text-to-speech -> text-to-audio * remove processor * rename text_to_speech pipelines files to text_audio * add textToWaveform and textToSpectrogram instead of textToAudio classes * update TTS pipeline to the bare minimum * update tests TTS pipeline * make style and erase useless import torch in TTS pipeline tests * modify how to check if generate or forward in TTS pipeline * remove unnecessary extra new lines * Apply suggestions from code review Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * refactor input_texts -> text_inputs * correct docstrings of TTS.__call__ * correct the shape of generated waveform * take care of Bark tokenizer special case * correct run_pipeline_test TTS * make style * update TTS docstrings * address Sylvain nit refactors * make style * refactor into one liners * correct squeeze * correct way to test if forward or generate * Update output audio waveform shape * make style * correct import * modify how the TTS pipeline test if a model can generate * align shape output of TTS pipeline with consistent shape --------- Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>	2023-08-17 17:34:47 +01:00
Sanchit Gandhi	36f183ebab	[ASR Pipeline] Fix init with timestamps (#25438 ) * [ASR Pipeline] Fix init * refactor test * change default kwarg setting * only perform checks if we have to * override init * move pre/forward/post checks to sanitize	2023-08-16 18:04:19 +01:00
Sanchit Gandhi	dedd11160d	[ASR Pipeline] Clarify return timestamps (#25344 ) * [ASR Pipeline] Clarify return timestamps * fix indentation * fix ctc check * fix ctc error message! * fix test * fix other test * add new tests * final comment	2023-08-08 10:16:00 +01:00
amyeroberts	05cda5df34	🚨🚨🚨 Fix rescale ViVit Efficientnet (#25174 ) * Fix rescaling bug * Add tests * Update integration tests * Fix up * Update src/transformers/image_transforms.py * Update test - new possible order in list	2023-07-28 19:52:51 +01:00

311 Commits