transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-24 14:58:56 +06:00

Author	SHA1	Message	Date
Matt	08fadc8085	Shorten the conversation tests for speed + fixing position overflows (#26960 ) * Shorten the conversation tests for speed + fixing position overflows * Put max_new_tokens back to 5 * Remove test skips * Increase max_position_embeddings in blenderbot tests * Add skips for blenderbot_small * Correct TF test skip * make fixup * Reformat skips to use is_pipeline_test_to_skip * Update tests/models/blenderbot_small/test_modeling_blenderbot_small.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/blenderbot_small/test_modeling_flax_blenderbot_small.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/blenderbot_small/test_modeling_tf_blenderbot_small.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2023-10-31 14:20:04 +00:00
Matt	bdbcd5d482	Fix and re-enable ConversationalPipeline tests (#26907 ) * Fix and re-enable conversationalpipeline tests * Fix the batch test so the change only applies to conversational pipeline	2023-10-19 12:04:25 +01:00
Tom Aarsen	40ea9ab2a1	Add many missing spaces in adjacent strings (#26751 ) Add missing spaces in adjacent strings	2023-10-12 10:28:40 +02:00
Nathan Cahill	b5ca8fcd20	Add tokenizer kwargs to fill mask pipeline. (#26234 ) * add tokenizer kwarg inputs * Adding tokenizer_kwargs to _sanitize_parameters * Add truncation=True example to tests * Update test_pipelines_fill_mask.py * Update test_pipelines_fill_mask.py * make fix-copies and make style * Update fill_mask.py Replace single tick with double * make fix-copies * Style --------- Co-authored-by: Lysandre <lysandre@huggingface.co>	2023-10-03 10:25:10 +02:00
Yih-Dar	d9e4bc2895	Update tiny model information and pipeline tests (#26285 ) * Update tiny model summary file * add to pipeline tests * revert * fix import * fix import * fix * fix * update * update * update * fix * remove BarkModelTest * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-09-25 18:08:12 +02:00
LeviVasconcelos	576cd45a57	Add image to image pipeline (#25393 ) * Add image to image pipeline Add image to image pipeline * remove swin2sr from tf auto * make ImageToImage importable * make style make style make style make style * remove tf support * remove nonused imports * fix postprocessing * add important comments; add unit tests * add documentation * remove support for TF * make fixup * fix typehint Image.Image * fix documentation code * address review request; fix unittest type checking * address review request; fix unittest type checking * make fixup * address reviews * Update src/transformers/pipelines/image_to_image.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * enhance docs * make style * make style * improve docetest time * improve docetest time * Update tests/pipelines/test_pipelines_image_to_image.py Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com> * Update tests/pipelines/test_pipelines_image_to_image.py Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com> * make fixup * undo faulty merge * undo faulty merge * add image-to-image to test pipeline mixin * Update src/transformers/pipelines/image_to_image.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update tests/pipelines/test_pipelines_image_to_image.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * improve docs --------- Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2023-09-22 19:53:55 +03:00
Arthur	2da8853775	🚨🚨 🚨🚨 [`Tokenizer`] attemp to fix add_token issues🚨🚨 🚨🚨 (#23909 ) * fix test for bart. Order is correct now let's skip BPEs * ouf * styling * fix bert.... * slow refactoring * current updates * massive refactoring * update * NICE! * update to see where I am at * updates * update * update * revert * updates * updates * start supporting legacy_save * styling * big update * revert some changes * nits * nniiiiiice * small fixes * kinda fix t5 with new behaviour * major update * fixup * fix copies * today's updates * fix byt5 * upfate * update * update * updates * update vocab size test * Barthez does not use not need the fairseq offset ids * super calll must be after * calll super * move all super init * move other super init * fixup * nits * more fixes * nits * more fixes * nits * more fix * remove useless files * ouch all of them are affected * and more! * small imporvements * no more sanitize token * more changes around unique no split tokens * partially fix more things * keep legacy save but add warning * so... more fixes * updates * guess deberta tokenizer could be nuked * fixup * fixup did some bad things * nuke it if it breaks * remove prints and pretrain fast from slow with new format. * fixups * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * fiou * nit * by default specials should not be normalized? * update * remove brakpoint * updates * a lot of updates * fixup * fixes revert some changes to match fast * small nits * that makes it cleaner * fix camembert accordingly * update * some lest breaking changes * update * fixup * fix byt5 and whisper mostly * some more fixes, canine's byte vocab * fix gpt2 * fix most of the perceiver tests (4 left) * fix layout lmv3 * fixup * fix copies for gpt2 style * make sure to only warn once * fix perciever and gpt2 tests * some more backward compatibility: also read special tokens map because some ppl use it........////..... * fixup * add else when reading * nits * fresh updates * fix copies * will this make everything faster? * fixes * more fixes * update * more fixes * fixup * is the source of truth right? * sorry camembert for the troubles * current updates * fixup * update led * update * fix regression * fix single word * more model specific fixes * fix t5 tests * fixup * more comments * update * fix nllb * rstrip removed * small fixes * better handle additional_special_tokens and vocab sizes * fixing * styling * fix 4 / 21 * fixup * fix nlbb's tests * some fixes * fix t5 * fixes * style * fix canine tests * damn this is nice * nits * m2m100 nit * fixups * fixes! * fixup * stash * fix merge * revert bad change * fixup * correct order for code Llama * fix speecht5 post merge * styling * revert source of 11 fails * small nits * all changes in one go * fnet hack * fix 2 more tests * update based on main branch of tokenizers * fixup * fix VITS issues * more fixes * fix mgp test * fix camembert issues * oups camembert still has 2 failing tests * mluke fixes * decode fixes * small nits * nits * fix llama and vits * fix camembert * smal nits * more fixes when initialising a fast from a slow and etc * fix one of the last test * fix CPM tokenizer test * fixups * fix pop2piano * fixup * ⚠️ Change tokenizers required version ⚠️ * ⚠️ Change tokenizers required version ⚠️ * "tokenizers>=0.14,<0.15", don't forget smaller than * fix musicgen tests and pretraiendtokenizerfast * fix owlvit and all * update t5 * fix 800 red * fix tests * fix the fix of the fix of t5 * styling * documentation nits * cache _added_tokens_encoder * fixups * Nit * fix red tests * one last nit! * make eveything a lot simpler * Now it's over 😉 * few small nits * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * updates that work for now * tests that should no be skipped / changed and fixed next * fixup * i am ashamed * pushe the fix * update * fixups * nits * fix added_tokens_encoder * fix canine test * fix pegasus vocab * fix transfoXL * fixup * whisper needs to be fixed for train new * pegasus nits * more pegasus fixes * minor update * better error message in failed test * fix whisper failing test * fix whisper failing test * fix pegasus * fixup * fix **** pegasus * reset things * remove another file * attempts to fix the strange custome encoder and offset * nits here and there * update * fixup * nit * fix the whisper test * nits nits * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * updates based on review * some small update to potentially remove * nits * import rlu cache * Update src/transformers/tokenization_utils_base.py Co-authored-by: Lysandre Debut <hi@lysand.re> * move warning to `from_pretrained` * update tests results now that the special tokens are always added --------- Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Lysandre Debut <hi@lysand.re>	2023-09-18 20:28:36 +02:00
Matt	f0a6057fbc	Fix ConversationalPipeline tests (#26217 ) Add BlenderbotSmall templates and correct handling for conversation.past_user_inputs	2023-09-18 15:08:56 +01:00
Joshua Lochner	95fe0f5d80	[Whisper] Fix word-level timestamps for audio < 30 seconds (#25607 ) * Fix word-level timestamps for audio < 30 seconds * Fix code quality * fix unit tests * Fix unit tests * Fix unit test * temp: print out result * temp: set max diff to None * fix unit tests * fix typo * Fix typo Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Use generation config for `num_frames` * fix docs * Move `num_frames` to kwargs * compute stride/attn_mask once * mark test as slow --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>	2023-09-14 17:42:35 +01:00
Sanchit Gandhi	44a0490d3c	[MusicGen] Add sampling rate to config (#26136 ) * [MusicGen] Add sampling rate to config * remove tiny * make property * Update tests/pipelines/test_pipelines_text_to_audio.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * style --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2023-09-14 16:57:06 +01:00
Matt	866df66fe4	Overhaul Conversation class and prompt templating (#25323 ) * First commit while I figure this out * make fixup * Remove unused method * Store prompt attrib * Fix prompt argument for tests * Make same changes in fast tokenizer * Remove global prompts from fast tokenizer too * stash commit * stash commit * Migrate PromptConfig to its True Final Location * Replace Conversation entirely with the new class * Import/dependency fixes * Import/dependency fixes * Change format for lots of default prompts * More default prompt fixups * Revert llama old methods so we can compare * Fix some default configs * Fix some default configs * Fix misspelled kwarg * Fixes for Blenderbot * make fixup * little rebase cleanup * Add basic documentation * Quick doc fix * Truncate docstring for now * Add handling for the case when messages is a single string * Quick llama merges * Update conversational pipeline and tests * Add a couple of legacy properties for backward compatibility * More legacy handling * Add docstring for build_conversation_input_ids * Restructure PromptConfig * Let's start T E M P L A T I N G * Refactor all default configs to use templates instead * Revert changes to the special token properties since we don't need them anymore * More class templates * Make the sandbox even sandier * Everything replaced with pure templating * Remove docs for PromptConfig * Add testing and optional requirement boilerplate * Fix imports and make fixup * Fix LLaMA tests and add Conversation docstring * Finally get LLaMA working with the template system * Finally get LLaMA working with the template system * make fixup * make fixup * fmt-off for the long lists of test tokens * Rename method to apply_chat_template for now * Start on documentation * Make chat_template a property that reads through to the default if it's not set * Expand docs * Expand chat templating doc some more * trim/lstrip blocks by default and update doc * Few doc tweaks * rebase cleanup * Clarify docstring * rebase cleanup * rebase cleanup * make fixup * Quick doc edit * Reformat the standard template to match ChatML * Re-add PEFT check * Update docs/source/en/chat_templating.md Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Add apply_chat_template to the tokenizer doc * make fixup * Add doc links * Fix chat links * Fix chat links * Explain system messages in the doc * Add chat template test * Proper save-loading for chat template attribute * Add test skips for layout models * Remove _build_conversation_input_ids, add default_chat_template to code_llama * Make sure all LLaMA models are using the latest template * Remove default_system_prompt block in code_llama because it has no default prompt * Update ConversationPipeline preprocess * Add correct #Copied from links to the default_chat_templates * Remove unneeded type checking line * Add a dummy mark_processsed method * Reorganize Conversation to have *deprecated_kwargs Update chat_templating.md * Quick fix to LLAMA tests * Small doc tweaks * Add proper docstrings and "copied from" statements to all default chat templates * Merge use_default_system_prompt support for code_llama too * Improve clarity around self.chat_template * Docstring fix * Fix blenderbot default template * More doctest fix * Break out some tokenizer kwargs * Update doc to explain default templates * Quick tweaks to tokenizer args * Cleanups for tokenizer args * Add note about cacheing * Quick tweak to the chat-templating doc * Update the LLaMA template with error checking and correct system message embedding * make fixup * make fixup * add requires_jinja * Cleanup to expected output formatting * Add cacheing * Fix typo in llama default template * Update LLaMA tests * Update documentation * Improved legacy handling in the Conversation class * Update Jinja template with proper error handling * Quick bugfix * Proper exception raising * Change cacheing behaviour so it doesn't try to pickle an entire Jinja env * make fixup * rebase cleanup --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2023-09-14 15:10:34 +01:00
Arthur	d0354e5e86	[`CI`] Fix red CI and ERROR failed should show (#25995 ) * start with error too * fix ? * start with nit * one more path * use `job_name` * mark pipeline test as slow	2023-09-05 20:16:00 +02:00
Sanchit Gandhi	8d518013ef	[Wav2Vec2 Conformer] Fix inference float16 (#25985 ) * [Wav2Vec2 Conformer] Fix inference float16 * fix test * fix test more * clean pipe test	2023-09-05 18:26:06 +01:00
Sanchit Gandhi	b439129e74	[VITS] Add to TTA pipeline (#25906 ) * [VITS] Add to TTA pipeline * Update tests/pipelines/test_pipelines_text_to_audio.py Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> * remove extra spaces --------- Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>	2023-09-01 16:39:00 +01:00
raghavanone	2be8a9098e	Save image_processor while saving pipeline (ImageSegmentationPipeline) (#25884 ) * Save image_processor while saving pipeline (ImageSegmentationPipeline) * Fix black issues	2023-08-31 16:08:20 +02:00
Juan Pizarro	09dc99517f	Add Blip2 model in VQA pipeline (#25532 ) * Add Blip2 model in VQA pipeline * use require_torch_gpu for test_large_model_pt_blip2 * use can_generate in vqa pipeline * test Blip2ForConditionalGeneration using float16 * remove custom can_generate from Blip2ForConditionalGeneration	2023-08-30 14:16:16 +01:00
Sanchit Gandhi	0218876822	[ASR Pipe Test] Fix CTC timestamps error message (#25727 )	2023-08-24 17:58:37 +01:00
Arthur	bc3e20dcf0	[`Llama`] remove prompt and fix prefix finetuning (#25565 ) * nit * update * make sure use_default_system_prompt is saved * update checkpointing * consistency * use_default_system_prompt for test	2023-08-18 13:39:23 +02:00
Yoach Lacombe	b8f69d0d10	Add Text-To-Speech pipeline (#24952 ) * add AutoModelForTextToSpeech class * add TTS pipeline and tessting * add docstrings to text_to_speech pipeline * fix torch dependency * corrector 'processor is None' case in Pipeline * correct repo id * modify text-to-speech -> text-to-audio * remove processor * rename text_to_speech pipelines files to text_audio * add textToWaveform and textToSpectrogram instead of textToAudio classes * update TTS pipeline to the bare minimum * update tests TTS pipeline * make style and erase useless import torch in TTS pipeline tests * modify how to check if generate or forward in TTS pipeline * remove unnecessary extra new lines * Apply suggestions from code review Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * refactor input_texts -> text_inputs * correct docstrings of TTS.__call__ * correct the shape of generated waveform * take care of Bark tokenizer special case * correct run_pipeline_test TTS * make style * update TTS docstrings * address Sylvain nit refactors * make style * refactor into one liners * correct squeeze * correct way to test if forward or generate * Update output audio waveform shape * make style * correct import * modify how the TTS pipeline test if a model can generate * align shape output of TTS pipeline with consistent shape --------- Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>	2023-08-17 17:34:47 +01:00
Sanchit Gandhi	36f183ebab	[ASR Pipeline] Fix init with timestamps (#25438 ) * [ASR Pipeline] Fix init * refactor test * change default kwarg setting * only perform checks if we have to * override init * move pre/forward/post checks to sanitize	2023-08-16 18:04:19 +01:00
Sanchit Gandhi	dedd11160d	[ASR Pipeline] Clarify return timestamps (#25344 ) * [ASR Pipeline] Clarify return timestamps * fix indentation * fix ctc check * fix ctc error message! * fix test * fix other test * add new tests * final comment	2023-08-08 10:16:00 +01:00
amyeroberts	05cda5df34	🚨🚨🚨 Fix rescale ViVit Efficientnet (#25174 ) * Fix rescaling bug * Add tests * Update integration tests * Fix up * Update src/transformers/image_transforms.py * Update test - new possible order in list	2023-07-28 19:52:51 +01:00
Arthur	07360b6c9c	[`Llama2`] Add support for Llama 2 (#24891 ) * add llama * add other readmes * update padding id in readme * add link to paper * fix paths and tokenizer * more nits * styling * fit operation in 2 lines when possible * nits * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * add form * update reademe * update readme, we don't have a default pad token * update test and tokenization * LLaMA instead of Llama * nits * add expected text * add greeedy output * styling * Update src/transformers/models/llama/modeling_llama.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * sequential device map * skip relevant changes --------- Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2023-07-18 15:18:31 -04:00
Yih-Dar	57da42ad05	Enable `ZeroShotAudioClassificationPipelineTests::test_small_model_pt` (#24882 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-07-18 15:08:53 +02:00
Yih-Dar	870dfc15b2	Skip failing `ZeroShotAudioClassificationPipelineTests::test_small_model_pt` for now (#24867 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-07-17 15:51:50 -04:00
Joao Gante	34d9409427	Llama/GPTNeoX: add RoPE scaling (#24653 ) * add rope_scaling * tmp commit * add gptneox * add tests * GPTNeoX can now handle long inputs, so the pipeline test was wrong * Update src/transformers/models/open_llama/configuration_open_llama.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * remove ntk * remove redundant validation --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2023-07-13 16:47:30 +01:00
Younes Belkada	914289ac4b	[`pipeline`] Fix str device issue (#24396 ) * fix str device issue * fixup * adapt from suggestions * forward contrib credits from suggestions * better fix * added backward compatibility for older PT versions * final fixes * oops * Attempting something with less branching. --------- Co-authored-by: amyeroberts <amyeroberts@users.noreply.github.com> Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>	2023-06-26 13:58:36 +02:00
Sanchit Gandhi	8767958fc1	Allow dict input for audio classification pipeline (#23445 ) * Allow dict input for audio classification pipeline * make style * Empty commit to trigger CI * Empty commit to trigger CI * check for torchaudio * add pip instructions Co-authored-by: Sylvain <sylvain.gugger@gmail.com> * Update src/transformers/pipelines/audio_classification.py Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com> * asr -> audio class * asr -> audio class --------- Co-authored-by: Sylvain <sylvain.gugger@gmail.com> Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>	2023-06-23 13:50:37 +01:00
Yih-Dar	652ece0710	Skip `test_conditional_generation_pt_pix2struct` in Past CI (torch < 1.11) (#24417 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-06-22 15:34:13 +02:00
Matthijs Hollemans	cd927a4736	add word-level timestamps to Whisper (#23205 ) * let's go! * initial implementation of token-level timestamps * only return a single timestamp per token * remove token probabilities * fix return type * fix doc comment * strip special tokens * rename * revert to not stripping special tokens * only support models that have alignment_heads * add integration test * consistently name it token-level timestamps * small DTW tweak * initial support for ASR pipeline * fix pipeline doc comments * resolve token timestamps in pipeline with chunking * change warning when no final timestamp is found * return word-level timestamps * fixup * fix bug that skipped final word in each chunk * fix failing unit tests * merge punctuations into the words * also return word tokens * also return token indices * add (failing) unit test for combine_tokens_into_words * make combine_tokens_into_words private * restore OpenAI's punctuation rules * add pipeline tests * make requested changes * PR review changes * fix failing pipeline test * small stuff from PR * only return words and their timestamps, not segments * move alignment_heads into generation config * forgot to set alignment_heads in pipeline tests * tiny comment fix * grr	2023-06-21 17:48:21 +02:00
Yih-Dar	c23d131eab	Update tiny models for pipeline testing. (#24364 ) * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-06-20 14:43:10 +02:00
Yih-Dar	eac8dede83	Skip some `TQAPipelineTests` tests in past CI (#24267 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-06-14 14:25:24 +02:00
Yih-Dar	d0d1632958	Fix Pipeline CI OOM issue (#24124 ) * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-06-09 16:49:02 +02:00
NielsRogge	2f424d7979	[image-to-text pipeline] Add conditional text support + GIT (#23362 ) * First draft * Remove print statements * Add conditional generation * Add more tests * Remove scripts * Remove BLIP specific linkes * Add support for pix2struct * Add fast test * Address comment * Fix style	2023-05-22 21:45:50 +02:00
Yih-Dar	5777c3cb3f	Fix (skip) a pipeline test for `RwkvModel` (#23444 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-05-18 14:54:23 +02:00
Joao Gante	b369e507aa	Generate: text generation pipeline no longer emits `max_length` warning when it is not set (#23139 )	2023-05-04 18:36:23 +01:00
Yih-Dar	975159bb61	Update tiny models and a few fixes (#22928 ) * run_check_tiny_models * update summary * update mixin * update pipeline_model_mapping * update pipeline_model_mapping * Update for gpt_bigcode --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-04-24 14:45:22 +02:00
Yih-Dar	1e1cb6f8e5	Fix `FillMaskPipelineTests` (#22894 ) * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-04-21 15:16:45 +02:00
Arthur	f143037789	Add `automatic-mask-generation` pipeline for Segment Anything Model (SAM) (#22840 ) * cleanup * updates * more refactoring * make style * update inits * support other inputs in base * update based on review Co-authored-by: Nicolas Patry <patry.nicolas@gmail.com> * Update tests/pipelines/test_pipelines_automatic_mask_generation.py Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com> * update * fixup * TODO x and y to refactor, _h _w refactored here * update docstring * more nits * style on these * more doc fix * rename variables * update * updates * style * update * fix `_mask_to_rle_pytorch` * styling * fix ask to rle, wrong outputs * add device arg * update * more updates, fix tets * udpate * update docstrings * styling * fixup * add notebook on the docs * update orginal sizes * fix docstring * updat condition on point_per-batch * updates tests * fix CI test * extend is required, append does not work! * fixup * fix CI tests * whit pixels left * address doc comments * fix doc * slow pipeline tests * update auto init * add revision * make fixup * update p!ipoeline tag when calling tests * alphabeitcal order in inits * fix copies * last style nits * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * reformat docstring * more reformat * address most of the comments * Update src/transformers/pipelines/mask_generation.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * final refactor * Update src/transformers/models/sam/image_processing_sam.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fixup and fix slow tests * revert --------- Co-authored-by: Nicolas Patry <patry.nicolas@gmail.com> Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com> Co-authored-by: younesbelkada <younesbelkada@gmail.com> Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2023-04-20 19:27:24 +02:00
Yih-Dar	5269718cb7	Don't use `LayoutLMv2` and `LayoutLMv3` in some pipeline tests (#22774 ) * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-04-17 17:45:20 +02:00
Nicolas Patry	a515d0a77c	Soft error whisper. (#22475 ) * Soft error whisper. * Fix format. --------- Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-94.taildb5d.ts.net>	2023-04-04 16:21:57 +02:00
Sylvain Gugger	80e3b36361	Really fix quality due to ruff release	2023-03-22 20:56:22 -04:00
Sylvain	ef28df0572	Fix quality due to ruff release	2023-03-22 20:45:08 -04:00
Luc CAILLIAU	d62e7d8842	Chunkable token classification pipeline (#21771 ) * Chunkable classification pipeline The TokenClassificationPipeline is now able to process sequences longer than 512. No matter the framework, the model, the tokenizer. We just have to pass process_all=True and a stride number (optional). The behavior remains the same if you don't pass these optional parameters. For overlapping parts when using stride above 0, we consider only the max scores for each overlapped token in all chunks where the token is. * Update token_classification.py * Update token_classification.py * Update token_classification.py * Update token_classification.py * Update token_classification.py * Update token_classification.py * Update token_classification.py * Update token_classification.py * Update token_classification.py * Update token_classification.py * Update token_classification.py * Update token_classification.py * update with latest black format * update black format * Update token_classification.py * Update token_classification.py * format correction * Update token_classification.py * Update token_classification.py * Update token_classification.py * Update token_classification.py * Update comments * Update src/transformers/pipelines/token_classification.py Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com> * Update token_classification.py Correct spaces, remove process_all and keep only stride. If stride is provided, the pipeline is applied to the whole text. * Update token_classification.py * Update token_classification.py * Update token_classification.py * Update token_classification.py * Update token_classification.py * Update token_classification.py * Update token_classification.py * Update token_classification.py * Update chunk aggregation Update the chunk aggregation strategy based on entities aggregation. * Update token_classification.py * Update token_classification.py * Update token_classification.py * Update token_classification.py * Update token_classification.py * Update token_classification.py * Update token_classification.py * Update token_classification.py * Update token_classification.py * Update token_classification.py * Update token_classification.py * Update token_classification.py Remove unnecessary pop from outputs dict * Update token_classification.py * Update token_classification.py * Update token_classification.py * Update token_classification.py * Update token_classification.py * Update token_classification.py * Update token_classification.py * Update src/transformers/pipelines/token_classification.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * add chunking tests * correct formating * correct formatting * correct model id for test chunking * update scores with nested simplify * Update test_pipelines_token_classification.py * Update test_pipelines_token_classification.py * update model to a tiny one * Update test_pipelines_token_classification.py * Adding smaller test for chunking. * Fixup * Update token_classification.py * Update src/transformers/pipelines/token_classification.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/pipelines/token_classification.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> --------- Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2023-03-22 14:13:20 -04:00
Yih-Dar	5110e5748e	🔥py38 + torch 2 🔥🔥🔥🚀 (#22204 ) * py38 + torch 2 * increment cache versions --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-03-16 22:59:23 +01:00
Sylvain Gugger	42ad693b7b	Regression pipeline device (#22190 ) * Fix regression in pipeline when device=-1 is passed * Add regression test	2023-03-15 14:13:38 -04:00
Lucain	923110b74f	Remove set_access_token usage + fail tests if FutureWarning (#22051 ) * Remove set_access_token usage + fail tests if FutureWarning * do not fail on FutureWarning in CI --------- Co-authored-by: testbot <lucainp@hf.co>	2023-03-09 09:23:48 -05:00
Yih-Dar	dfe9a31973	Update `AudioClassificationPipelineTests::test_small_model_pt` for PT 2.0.0 (#22023 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-03-08 13:56:47 +01:00
Nicolas Patry	1325459105	Refactor whisper asr pipeline to include language too. (#21427 ) * [WIP] whisper refacto to support language output. * Handling merges. * A bit more cleanup and comments. * Many improvements. Lots of details everywhere. * Cleanup old code and tests. * Handle lone timestamp tokens (just recover when something bad happens). * Adding return_language example. * No ffmpeg. * Hmm. * Some corrections. * Both fast and slow. * New black. * Update src/transformers/models/whisper/tokenization_whisper.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/whisper/tokenization_whisper.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Remove print. * Undoing tests modifications. * Smaller test modifications. * Rename. * Remove maxDiff. --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2023-03-02 18:12:19 +01:00
Sylvain Gugger	50a8ed3ee0	Mark pipeline tests to skip them easily (#21887 ) * Mark pipeline tests to skip them easily * Mark the mixin as pipeline test * Update src/transformers/testing_utils.py Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com> --------- Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>	2023-03-02 10:55:36 -05:00
Yih-Dar	871c31a6f1	🔥Rework pipeline testing by removing `PipelineTestCaseMeta` 🚀 (#21516 ) * Add PipelineTesterMixin * remove class PipelineTestCaseMeta * move validate_test_components * Add for ViT * Add to SPECIAL_MODULE_TO_TEST_MAP * style and quality * Add feature-extraction * update * raise instead of skip * add tiny_model_summary.json * more explicit * skip tasks not in mapping * add availability check * Add Copyright * A way to diable irrelevant tests * update with main * remove disable_irrelevant_tests * skip tests * better skip message * better skip message * Add all pipeline task tests * revert * Import PipelineTesterMixin * subclass test classes with PipelineTesterMixin * Add pipieline_model_mapping * Fix import after adding pipieline_model_mapping * Fix style and quality after adding pipieline_model_mapping * Fix one more import after adding pipieline_model_mapping * Fix style and quality after adding pipieline_model_mapping * Fix test issues * Fix import requirements * Fix mapping for MobileViTModelTest * Update * Better skip message * pipieline_model_mapping could not be None * Remove some PipelineTesterMixin * Fix typo * revert tests_fetcher.py * update * rename * revert * Remove PipelineTestCaseMeta from ZeroShotAudioClassificationPipelineTests * style and quality * test fetcher for all pipeline/model tests --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-02-28 19:40:57 +01:00
Arthur	cc44e72d14	[Pipeline] Add zero shot audio classificatoin pipeline (#21600 ) * add pipeline * update init * add zero shot to init * update inits and correct checkpoints * update base to support input features * add tests * Update src/transformers/pipelines/zero_shot_audio_classification.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update src/transformers/pipelines/zero_shot_audio_classification.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * update pieline code * use tiny checkpoint * nits and expected value with tiny model * style * last nit on tests values * fix styling * fix collate fn that was casting t float * update --------- Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>	2023-02-27 11:43:44 +01:00
Connor Henderson	279008adc3	fix: Change is_last chunk calc and add conditional break in chunk_iter (#21612 ) * fix: Change is_last chunk calc and add conditional break * format fix * account for 0 and full stride_rights, add comment * add new test * make style * update slow whisper asr test timestamps * use nested_simplify on output and round timestamp to hundreths place	2023-02-24 08:30:32 +01:00
Aaron Gokaslan	5e8c8eb5ba	Apply ruff flake8-comprehensions (#21694 )	2023-02-22 09:14:54 +01:00
Jonatan Kłosko	deafc24388	Add WhisperTokenizerFast (#21222 ) * Add WhisperTokenizerFast * Fixup * Up * Up * Improve tests * Update src/transformers/models/whisper/tokenization_whisper_fast.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Keep stride in whisper pipelien test * Remove unknown token special case * Reduce vocabulary size in tests * Fix vocab size assertion * Sync copied changes from WhisperTokenizer * Skip pipeline tests * Update assertion * Remove Whisper tokenizer dependency on sentencepiece * Format --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2023-02-21 06:58:54 +01:00
Connor Henderson	0f96c26de6	refactor: Make direct_transformers_import util (#21652 ) * refactor: Make direct_import util * edit direct import fn * add docstring * make import function specific to transformers only * edit doc string	2023-02-16 11:32:32 -05:00
Sylvain Gugger	9d1116e995	Update deprecated load_module (#21651 )	2023-02-15 15:57:24 -05:00
Younes Belkada	f83942684d	[`pipeline`] A simple fix for half-precision & 8bit models (#21479 ) * v1 fix * adapt from suggestions * make style * fix tests * add gpu tests * update docs * fix other tests * Apply suggestions from code review Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com> * better fix * make fixup * better example * revert changes * proposal * more elegant solution * Update src/transformers/pipelines/automatic_speech_recognition.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> --------- Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2023-02-10 10:26:17 +01:00
Sylvain Gugger	6f79d26442	Update quality tooling for formatting (#21480 ) * Result of black 23.1 * Update target to Python 3.7 * Switch flake8 to ruff * Configure isort * Configure isort * Apply isort with line limit * Put the right black version * adapt black in check copies * Fix copies	2023-02-06 18:10:56 -05:00
Yih-Dar	a6d8a149a8	Fix some pipeline tests (#21401 ) * fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-02-02 19:03:31 +01:00
Yih-Dar	c749bd405e	Pipeline testing - using tiny models on Hub (#20426 ) * rework pipeline tests * run pipeline tests * fix * fix * fix * revert the changes in get_test_pipeline() parameter list * fix expected error message * skip a test * clean up --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-01-30 10:39:43 +01:00
Nicolas Patry	8788fd0ceb	Moving to cleaner tokenizer version or `oneformer`. (#21292 ) Moving to cleaner tokenizer version.	2023-01-25 15:46:10 +01:00
Arthur	255257f3ea	[Whisper] Refactor whisper (#21252 ) * update whisper logit processor * add generate for whisper * remove part of the whisper specific code from pipeline * update logit processes * major update * enforce first timestamp * update generate * add more tests * update new decoding strategy * Apply suggestions from code review * update docstring * fixup * default config will not have multilingual ar * update expected tokenizer size, see pull on the hub for whisper-tiny	2023-01-25 13:09:43 +01:00
Nicolas Patry	99e7905422	Supporting `ImageProcessor` in place of `FeatureExtractor` for pipelines (#20851 ) * Fixing the pipeline with image processor. * Update the slow test. * Using only the first image processor. * Include exclusion mecanism for Image processor. * Do not handle Gitconfig, deemed as a bug. * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Remove `conversational` changes. They are not supposed to be here. * Address first row of comments. * Remove OneFormer modifications. Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2023-01-25 10:16:31 +01:00
Arthur	b80b2218b5	[ci-daily] Fix pipeline tests (#21257 ) * use streaming dataset * fix whisper's test * add rescale argument to chunk_iter	2023-01-23 19:32:49 +01:00
Arthur	5d3cb760a0	[Whispe] Fix pipeline after timestamp merges (#21198 ) * pass return_timestamps to pre-process * add a test to test it * test does not need device 0 * remove failing bit * update test	2023-01-20 10:31:40 +01:00
Arthur	e9b4800dda	[Whisper] Fix timestamp processor (#21187 ) * add draft logit processor * add template functions * update timesapmt processor parameters * draft script * simplify code * cleanup * fixup and clean * update pipeline * style * clean up previous idea * add tokenization utils * update tokenizer and asr output * fit whisper type * style and update test * clean test * style test * update tests * update error test * udpate code (not based on review yet) * update tokenization * update asr pipeline * update code * cleanup and update test * fmt * remove text verificatino * cleanup * cleanup * add model test * update tests * update code add docstring * update code and add docstring * fix pipeline tests * add draft logit processor add template functions update timesapmt processor parameters draft script simplify code cleanup fixup and clean update pipeline style clean up previous idea add tokenization utils update tokenizer and asr output fit whisper type style and update test clean test style test update tests update error test udpate code (not based on review yet) update tokenization update asr pipeline update code cleanup and update test fmt remove text verificatino cleanup cleanup add model test update tests update code add docstring update code and add docstring fix pipeline tests * Small update. * Fixup. * Tmp. * More support. * Making `forced_decoder_ids` non mandatory for users to set. * update and fix first bug * properly process sequence right after merge if last * tofo * allow list inputs + compute begin index better * start adding tests * add the 3 edge cases * style * format sequences * fixup * update * update * style * test passes, edge cases should be good * update last value * remove Trie * update tests and expec ted values * handle bigger chunk_length * clean tests a bit * refactor chunk iter and clean pipeline * update tests * style * refactor chunk iter and clean pipeline * upade * resolve comments * Apply suggestions from code review Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com> * take stride right into account * update test expected values * Update code based on review Co-authored-by: sgugger <sylvain.gugger@gmail.com> * major refactor * add correct strides for tests * Update src/transformers/pipelines/automatic_speech_recognition.py * fix whisper timestamp test Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com> Co-authored-by: sgugger <sylvain.gugger@gmail.com>	2023-01-19 16:25:56 +01:00
Sylvain Gugger	05e72aa0c4	Adapt repository creation to latest hf_hub (#21158 ) * Adapt repository creation to latest hf_hub * Update all examples * Fix other tests, add Flax examples * Address review comments	2023-01-18 11:14:00 -05:00
Arthur	bb300ac686	Whisper Timestamp processor and prediction (#20620 ) * add draft logit processor * add template functions * update timesapmt processor parameters * draft script * simplify code * cleanup * fixup and clean * update pipeline * style * clean up previous idea * add tokenization utils * update tokenizer and asr output * fit whisper type * style and update test * clean test * style test * update tests * update error test * udpate code (not based on review yet) * update tokenization * update asr pipeline * update code * cleanup and update test * fmt * remove text verificatino * cleanup * cleanup * add model test * update tests * update code add docstring * update code and add docstring * fix pipeline tests * add draft logit processor add template functions update timesapmt processor parameters draft script simplify code cleanup fixup and clean update pipeline style clean up previous idea add tokenization utils update tokenizer and asr output fit whisper type style and update test clean test style test update tests update error test udpate code (not based on review yet) update tokenization update asr pipeline update code cleanup and update test fmt remove text verificatino cleanup cleanup add model test update tests update code add docstring update code and add docstring fix pipeline tests * Small update. * Fixup. * Tmp. * More support. * Making `forced_decoder_ids` non mandatory for users to set. * update and fix first bug * properly process sequence right after merge if last * tofo * allow list inputs + compute begin index better * start adding tests * add the 3 edge cases * style * format sequences * fixup * update * update * style * test passes, edge cases should be good * update last value * remove Trie * update tests and expec ted values * handle bigger chunk_length * clean tests a bit * refactor chunk iter and clean pipeline * update tests * style * refactor chunk iter and clean pipeline * upade * resolve comments * Apply suggestions from code review Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com> * take stride right into account * update test expected values * Update code based on review Co-authored-by: sgugger <sylvain.gugger@gmail.com> Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com> Co-authored-by: sgugger <sylvain.gugger@gmail.com>	2023-01-17 15:50:09 +01:00
Nicolas Patry	488a179ce1	Fixing batching pipelines on single items for ChunkPipeline (#21132 ) * Fixing #20783 * Update src/transformers/pipelines/base.py * Fixing some tests. * Fixup. * Remove ffmpeg dep + a bit more relaxed for bigbird QA precision. * Better dataset. * Prevent failing on TF. * Better condition. We can't use `can_use_iterator` since we cannot use it directly.	2023-01-16 15:04:27 +01:00
Yih-Dar	b3a0aad37d	Fix past CI (#20967 ) * Fix for Past CI * make style * clean up * unindent 2 blocks Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-01-12 18:04:21 +01:00
Arthur	e3ecbaa4ab	Patch-past-refactor (#21050 ) * small patches, forgot a line * refactor PT * the actual fix	2023-01-09 18:12:13 +01:00
Sylvain Gugger	9a046cc14e	Skip failing test until Athur looks at it.	2023-01-08 04:53:20 -05:00
Alara Dirik	cd2457809f	Improve OWL-ViT postprocessing (#20980 ) * add post_process_object_detection method * style changes	2023-01-03 19:25:09 +03:00
NielsRogge	9c6f7485a6	Add GIT (GenerativeImage2Text) (#20295 ) * First draft * Make model instantiation work * Fix copied from statement * More fixes * Add correct output head * Improve configuration * Add conversion script * Improve conversion script * Remove token_type_ids * Fix conversion of projection layers * Convert all weights * Use cats image * Make logits match * Generate caption on cats image * Add GITProcessor * Update conversion script * Add support for more checkpoints * Fix conversion script * Add initial tests * Remove cross-attention * More improvements * Remove is_decoder * Improve model tests * Improve tests * Improve model outputs * Fix model outputs equivalence * Fix more tests * Remove unused code * Use generate to generate text, no use of cache for now * Use generate more appropriately * Fix config tests * Fix style * Add support for use_cache Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Fix style * Fix GIT vision encoder * Update README * Fix integration test * Set bos and eos token ids * Improve docs * Improve code * Add support for provided attention_mask * Add copied from statement * Fix gradient checkpointing test * Set model_input_names * Investigate model_input_names * Remove script * Fix model inputs * Fix docstring * Rename GIT to Git * Support more models * Add support for textvqa model * Add video support * Extend conversion script for video * Add support for large variant * Add support for more models * Fix config archive map * Update integration test * Fix README * Fix CLIP mean and std * Update processor * Fix use_cache for video, thanks @gante * Remove print statements * Remove assertion * Add processor tests * Fix model_input_names * Use Auto API for processor * Fix processor tests * Fix integration test * Fix pipeline test * Make tests faster * Update conversion script * Update conversion script * Convert more checkpoints * Update conversion script * Fix typo * Update docstrings * Improve code snippets * Fix doc tests * Add more code examplesé * Fix doc tests * Add integration tests * Fix unused variable * revert * Add GIT to Japanese README Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local> Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-01-03 14:17:18 +01:00
bofeng huang	47c9b22d08	Add generate kwargs to `AutomaticSpeechRecognitionPipeline` (#20952 ) * Add generate kwargs to AutomaticSpeechRecognitionPipeline * Add test for generation kwargs	2022-12-31 01:13:28 -05:00
bofeng huang	fe65657de1	Fix FP16 inference in TextGenerationPipeline (#20913 ) * add torch_dtype attribute to Pipeline * Use torch_dtype to cast input tensor type in AutomaticSpeechRecognitionPipeline * Fix code quality * Add TextGenerationPipeline fp16 test * Fix code quality * Remove useless require in tests Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com> Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>	2022-12-29 02:19:25 -05:00
Nicolas Patry	f7f0ec2f54	Adding support for `fp16` for asr pipeline. (#20864 ) * Supporting `fp16` for asr pipeline * Adding test. * Style. * Oops. * Flake8 update ? * Fixing flake8 ? * Revert "Flake8 update ?" This reverts commit `0b917fcb52`. * Style (acctidentally deleted flake8 F401.) * Move to a bigger test (no small whisper model, and s2t doesn't seem to accept torch_dtype=fp16). Also we need to use a GPU to actually compute on fp16. * Using BatchFeature capability.	2022-12-23 10:18:45 +01:00
Andreas Madsen	b4b613b102	Implement Roberta PreLayerNorm (#20305 ) * Copy RoBERTa * formatting * implement RoBERTa with prelayer normalization * update test expectations * add documentation * add convertion script for DinkyTrain weights * update checkpoint repo Unfortunately the original checkpoints assumes a hacked roberta model * add to RoBERTa-PreLayerNorm docs to toc * run utils/check_copies.py * lint files * remove unused import * fix check_repo reporting wrongly a test is missing * fix import error, caused by rebase * run make fix-copies * add RobertaPreLayerNormConfig to ROBERTA_EMBEDDING_ADJUSMENT_CONFIGS * Fix documentation <Facebook> -> Facebook Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fixup: Fix documentation <Facebook> -> Facebook Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Add missing Flax header Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * expected_slice -> EXPECTED_SLICE Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * update copies after rebase * add missing copied from statements * make fix-copies * make prelayernorm explicit in code * fix checkpoint path for the original implementation * add flax integration tests * improve docs * update utils/documentation_tests.txt * lint files * Remove Copyright notice Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * make fix-copies * Remove EXPECTED_SLICE calculation comments Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2022-12-19 09:30:17 +01:00
Nicolas Patry	3ee958207a	Fix object detection2 (#20798 ) * Revert "Fixing object detection with `layoutlm` (#20776)" This reverts commit `fca66abe2a`. * Better fix for layoutlm object detection. * Style.	2022-12-16 13:25:36 +01:00
Younes Belkada	4341f4e224	[Pipeline] skip feature extraction test if in `IMAGE_PROCESSOR_MAPPING` (#20790 ) skip feature extraction test if in `IMAGE_PROCESSOR_MAPPING`	2022-12-16 12:46:58 +01:00
Nicolas Patry	fca66abe2a	Fixing object detection with `layoutlm` (#20776 ) * Fixing object detection with layoutlm. * Fixup.	2022-12-15 18:46:43 +01:00
Younes Belkada	8891193e83	[Pipeline] fix failing bloom `pipeline` test (#20778 ) fix failing `pipeline` test	2022-12-15 18:46:00 +01:00
Nicolas Patry	a9912d2fca	Even more validation. (#20762 ) * Even more validation. * Fixing order.	2022-12-15 10:05:54 +01:00
Yih-Dar	a12c5cbcd8	Change a logic in pipeline test regarding TF (#20710 ) * Fix the pipeline test regarding TF * Fix the pipeline test regarding TF * update comment Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-12-13 13:42:36 +01:00
Nicolas Patry	53357e8196	Adding ValueError when imcompatible parameters are used. (#20729 )	2022-12-12 15:39:13 +01:00
Nathan Raw	9e56aff58a	Add video classification pipeline (#20151 ) * 🚧 wip video classification pipeline * 🚧 wip - add is_decord_available check * 🐛 add missing import * ✅ add tests * 🔧 add decord to setup extras * 🚧 add is_decord_available * ✨ add video-classification pipeline * 📝 add video classification pipe to docs * 🐛 add missing VideoClassificationPipeline import * 📌 add decord install in test runner * ✅ fix url inputs to video-classification pipeline * ✨ updates from review * 📝 add video cls pipeline to docs * 📝 add docstring * 🔥 remove unused import * 🔥 remove some code * 📝 docfix	2022-12-08 16:22:43 -05:00
Yih-Dar	cec5f7abd1	Update summarization `run_pipeline_test` (#20623 ) * update summarization run_pipeline_test * update Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-12-07 15:46:12 +01:00
Yih-Dar	9b14c1b6bf	Fix `AutomaticSpeechRecognitionPipelineTests.run_pipeline_test` (#20597 ) * Remove assert exception not triggered * Fix wrong expected exception string * fix * use assertRaisesRegex Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-12-06 15:48:49 +01:00
Arthur	538e5248b0	Ci-whisper-asr (#20588 ) * Expected output for the test changed * fix failing asr test	2022-12-05 16:50:38 +01:00
Yih-Dar	cc8aec6740	Add `require_torch` to 2 pipeline tests (#20585 ) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-12-05 16:06:39 +01:00
NielsRogge	4973d2a04c	Add Audio Spectogram Transformer (#19981 ) * First draft * Make conversion script work * Add id2label mapping, run code quality * Fix copies * Add first draft of feature extractor * Update conversion script to use feature extractor * Make more tests pass * Add docs * update input_features to input_values + pad by default to max length * Fix doc tests * Add feature extractor tests * Add proper padding/truncation to feature extractor * Add support for conversion of all audioset checkpoints * Improve docs and extend conversion script * Fix README * Rename spectogram to spectrogram * Fix copies * Add integration test * Remove dummy conv * Update to ast * Update organization * Fix init * Rename model to AST * Add require_torchaudio annotator * Move import of ASTFeatureExtractor under a is_speech_available * Fix rebase * Add pipeline config * Update name of classifier head * Rename time_dimension and frequency_dimension for clarity * Remove print statement * Fix pipeline test * Fix pipeline test * Fix index table * Fix init * Fix conversion script * Rename to ForAudioClassification * Fix index table Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>	2022-11-21 18:58:54 +01:00
Nicolas Patry	8e777b3ba4	[Proposal] Breaking change `zero-shot-object-detection` for improved consistency. (#20280 ) * [Proposal] Breaking change `zero-shot-object-detection` for improved consistency. This is a proposal to modify the output of `zero-shot-object-detection` to provide better alignment with other pipelines. The output is now strictly the same as `object-detection` whereas before it would output lists of lists. The name `candidate_labels` is used throughout for consistency with other `zero-shot` pipelines. The pipeline is changed to `ChunkPipeline` to support batching cleanly. This removes all the lists and list of lists shenanigans, it's now a matter of the base pipeline handling all this not this specific one. Breaking change: It did remove complex calls potentials `pipe(images = [image1, image2], text_queries=[candidates1, candidates2])` to support only `pipe([{"image": image1, "candidate_labels": candidates1}, {"image": image2, "candidate_labels": candidates2}])` when dealing with lists and/or datasets. We could keep them, but it will add a lot of complexity to the code base, since the pipeline is rather young, I'd rather break to keep the code simpler, but we can revert this. Breaking change: The name of the argument is now `image` instead of `images` since it expects by default only 1 image. This is revertable like the previous one. Breaking change: The types is now simplified and flattened: `pipe(inputs) == [{object1}, {object2}]` instead of the previous `pipe(inputs) == [[{object1}, {object1}], [{object2}]]` Where the different instances would be grouped by candidate labels within lists. IMHO this is not really desirable, since it would output empty lists and is only adding superflous indirection compared to `zero-shot-object-detection`. It is relatively change free in terms of how the results, it does change computation however since now the batching is handled by the pipeline itself. It did** change the results for the small models so there seems to be a real difference in how the models handle this. * Fixing the doctests. * Behind is_torch_available.	2022-11-18 15:57:28 +01:00
Younes Belkada	163ac3d3ee	Add Switch transformers (#19323 ) * first commit * add more comments * add router v1 * clean up - remove `tf` modeling files * clean up - remove `tf` modeling files * clean up * v0 routers * added more router - Implemented `ExpertsChooseMaskedRouter` - added tests - 2 more routers to implement * last router * improved docstring - completed the docstring in `router.py` - added more args in the config * v0 sparse mlp * replace wrong naming * forward pass run * update MOE layer * small router update * fixup * consistency * remove scatter router * remove abstract layer * update test and model for integration testing * v1 conversion * update * hardcode hack * all keys match * add gin conversion, without additional libraries * update conversion sctipy * delete router file * update tests wrt router deletion * fix router issues * update expert code * update, logits match, code needsREFACTORING * Refactor code Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com> * add generate tests Co-authored-by: younesbelkada <younesbelkada@gmail.com> * add support for router loss Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com> * fix forward error * refactor a bit * remove `FlaxSwitchTransformers` modules * more tests pass * Update code Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com> * fixup * fix tests * fix doc * fix doc + tokenization * fix tokenizer test * fix test * fix loss output * update code for backward pass * add loss support * update documentation * fix documentation, clean tokenizer * more doc fix, cleanup example_switch * fix failing test * fix test * fix test * fix loss issue * move layer * update doc and fix router capacity usage * fixup * add sparse mlp index for documentation on hub * fixup * test sparse mix architecture * Apply suggestions from code review * Update docs/source/en/model_doc/switch_transformers.mdx * fixup on update * fix tests * fix another test * attempt fix * Update src/transformers/models/switch_transformers/configuration_switch_transformers.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/switch_transformers/convert_switch_transformers_original_flax_checkpoint_to_pytorch.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * try * all tests pass * fix jitter noise * Apply suggestions from code review * doc tests pass * Update src/transformers/models/switch_transformers/modeling_switch_transformers.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/switch_transformers/modeling_switch_transformers.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * remove assert * change config order * fix readme japanese * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * remove parallelizable tests + add one liners * remove ONNX config * fix nits - add `T5Tokenizer` in auto mapping - remove `Switch Transformers` from ONNX supported models * remove `_get_router` * remove asserts * add check in test for `router_dtype` * add `SwitchTransformersConfig` in `run_pipeline_test` * Update tests/pipelines/test_pipelines_summarization.py * add huge model conversion script * fix slow tests - add better casting for `Linear8bitLt` - remove `torchscript` tests * add make dir * style on new script * fix nits - doctest - remove `_keys_to_ignore_on_load_unexpected` * Update src/transformers/models/switch_transformers/configuration_switch_transformers.py * add google as authors * fix year * remove last `assert` statements * standardize vertical spaces * fix failing import * fix another failing test * Remove strange àuthorized_keys` * removing todo and padding that is never used Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com> Co-authored-by: ybelkada <younes@huggingface.co> Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Arthur Zucker <arthur@huggingface.co>	2022-11-15 13:06:45 +01:00
Yih-Dar	f9909fbf85	Make `ImageSegmentationPipelineTests` less flaky (#20147 ) * Fix ImageSegmentationPipelineTests * Use 0.9 * no zip * links to show images * links to show images * rebase Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-11-15 09:14:55 +01:00
Nicolas Patry	25c451e5a0	Adding chunking for whisper (all seq2seq actually). Very crude matching algorithm. (#20104 ) * Very crude matching algorithm. * Fixing tests. * Removing comments * Adding warning + fix short matches. * Cleanup tests. * Quality. * Less noisy. * Fixup.	2022-11-14 22:32:50 +01:00
Bartosz Szmelczynski	78a471ff71	Fix tapas scatter (#20149 ) * First draft * Remove scatter dependency * Add require_torch * update vectorized sum test, add clone call * remove artifacts * fix style * fix style v2 * remove "scatter" mentions from the code base * fix isort error Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-11-14 01:04:26 -05:00
Sylvain Gugger	9740a03f61	Skip broken test	2022-11-10 14:59:32 -05:00
Nicolas Patry	d066c3731b	Adding support for LayoutLMvX variants for `object-detection`. (#20143 ) * Adding support for LayoutLMvX variants for `object-detection`. * Revert bogs `layoutlm` feature extractor which does not exist (it was a V2 model) . * Updated condition. * Handling the comments.	2022-11-10 11:33:38 +01:00
Nicolas Patry	ec6878f6ca	Now supporting pathlike in pipelines too. (#20030 )	2022-11-03 09:14:45 +01:00
Nicolas Patry	5fd5990dce	Factored out some code in the `image-segmentation` pipeline. (#19727 ) * Factored out some code in the image-segmentation pipeline Re-enable `small_model_pt`. Re-enable `small_model_pt`. Enabling the current test with the current values. Debugging the values on the CI. More logs ? Printing doesn't work ? Using the CI values instead. Seems to be a Pillow sensitivity. Added a test showcasing that models not supporting some tasks get a clear error. Factored out code. Further factor out. Fixup. Bad rebase. Put `panoptic` before `instance` as it should be a superset. * Fixing tests. * Adding subtasks tests + Fixes `instance` segmentation which was broken due to default and non kwargs arguments. * Fix bad replace.	2022-10-26 10:44:36 +02:00
Rak Alexey	d3f4cef74d	fix image2test args forwarding (#19648 ) * fix image2test args forwarding * fix issues * Proposing the update to the PR. * Fixup. Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>	2022-10-24 09:49:24 -04:00
Alara Dirik	cca51aa151	Fix image segmentation pipeline errors, resolve backward compatibility issues (#19768 ) * Fix panoptic segmentation and pipeline * Update ImageSegmentationPipeline tests and reenable test_small_model_pt * Resolve backward compatibility issues	2022-10-21 18:09:58 +03:00
Yih-Dar	3aaabaa214	Update `ImageToTextPipelineTests.test_small_model_tf` (#19785 ) * update expected values for the correct TF checkpoint * Run test * Clean up * fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-10-21 14:35:20 +02:00
Nicolas Patry	a40386669f	`image-segmentation` pipeline: re-enable `small_model_pt` test. (#19716 ) * Re-enable `small_model_pt`. Re-enable `small_model_pt`. Enabling the current test with the current values. Debugging the values on the CI. More logs ? Printing doesn't work ? Using the CI values instead. Seems to be a Pillow sensitivity. * Update src/transformers/pipelines/image_segmentation.py Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com> Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>	2022-10-20 11:57:11 +02:00
Yih-Dar	bed2edb99f	Specify TF framework explicitly in more pipeline tests (#19748 ) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-10-19 16:24:03 +02:00
David Yang	a23819ed6a	Clean up deprecation warnings (#19654 ) * Clean up deprecation warnings Notes: Changed some strings in tests to raw strings, which will change the literal content of the strings as they are fed into whatever machine handles them. Test cases for past in the past/past_key_values switch changed/removed due to warning of impending removal * Add PILImageResampling abstraction for PIL.Image.Resampling	2022-10-18 13:34:47 -04:00
Yih-Dar	06a82a49ae	Specify TF framework in TF-related pipeline tests (#19719 ) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-10-18 17:40:28 +02:00
Nicolas Patry	63d13d768b	Improving `image-segmentation` pipeline tests. (#19710 ) This PR (https://github.com/huggingface/transformers/pull/19367) introduced a few breaking changes: - Removed an argument `mask_threshold`. - Broke the default behavior (instance vs panoptic in the function call) https://github.com/huggingface/transformers/pull/19367/files#diff-60f846b86fb6a21d4caf60f5b3d593a04accb8f248de3029cccae2ff898c5bc3R119-R120 - Broke the actual masks: https://github.com/huggingface/transformers/pull/1961 This PR is the start of a handful that will aim at bringing back the old behavior(s). - tests should not have to specify `task` by default, unless we want to modify the behavior and have a lower form of segmentation running) - `test_small_model_pt` should be working. This specific PR starts with adding more information to the masks hash because missing the actual mask was actual easy to miss (the hashes do change, but it was easy to miss that one code path wasn't properly updated). So we go from a simple `hash` to ``` {"hash": #smaller hash, "shape": (h, w), "white_pixels": n} ``` The `shape` should help make sure the interpolation of the mask works correctly, the `white_pixels` hopefully helps detect big regressions in their amount when the hash gets modified.	2022-10-18 16:33:53 +02:00
Nicolas Patry	ee2a80ecc0	add return_tensors parameter for feature_extraction 2 (#19707 ) * add return_tensors parameter for feature_extraction w/ test add return_tensor parameter for feature extraction Revert "Merge branch 'feature-extraction-return-tensor' of https://github.com/ajsanjoaquin/transformers into feature-extraction-return-tensor" This reverts commit d559da743b87914e111a84a98ba6dbb70d08ad88, reversing changes made to bbef89278650c04c090beb65637a8e9572dba222. call parameter directly Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com> Fixup. Update src/transformers/pipelines/feature_extraction.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Fix the imports. * Fixing the test by not overflowing the model capacity. Co-authored-by: AJ San Joaquin <ajsanjoaquin@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2022-10-18 16:29:00 +02:00
Nicolas Patry	713eab45d3	🚨 🚨 🚨 [Breaking change] Deformable DETR intermediate representations (#19678 ) * [Breaking change] Deformable DETR intermediate representations - Fixes naturally the `object-detection` pipeline. - Moves from `[n_decoders, batch_size, ...]` to `[batch_size, n_decoders, ...]` instead. * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2022-10-18 09:00:39 -04:00
Arthur	d356b89f3c	fix test whisper with new max length (#19668 )	2022-10-18 08:56:37 +02:00
Sylvain Gugger	f2ecb9eec4	Revert "add return_tensor parameter for feature extraction (#19257 )" (#19680 ) This reverts commit `35bd089a24`.	2022-10-17 11:56:29 -04:00
Ayrton San Joaquin	35bd089a24	add return_tensor parameter for feature extraction (#19257 ) * add return_tensors parameter for feature_extraction w/ test add return_tensor parameter for feature extraction Revert "Merge branch 'feature-extraction-return-tensor' of https://github.com/ajsanjoaquin/transformers into feature-extraction-return-tensor" This reverts commit d559da743b87914e111a84a98ba6dbb70d08ad88, reversing changes made to bbef89278650c04c090beb65637a8e9572dba222. * call parameter directly Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com> * Fixup. * Update src/transformers/pipelines/feature_extraction.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2022-10-17 11:17:26 -04:00
Ankur Goyal	cbc1abc4af	A few CI fixes for `DocumentQuestionAnsweringPipeline` (#19584 ) * Fixes * update expected values * style * fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-10-17 15:35:27 +02:00
Matt	3b3024da70	TF port of ESM (#19587 ) * Partial TF port for ESM model * Add ESM-TF tests * Add the various imports for TF-ESM * TF weight conversion almost ready * Stop ignoring the decoder weights in PT * Add tests and lots of fixes * fix-copies * Fix imports, add model docs * Add get_vocab() to tokenizer * Fix vocab links for pretrained files * Allow multiple inputs with a sep * Use EOS as SEP token because ESM vocab lacks SEP * Correctly return special tokens mask from ESM tokenizer * make fixup * Stop testing unsupported embedding resizing * Handle TF bias correctly * Skip all models with slow tokenizers in the token classification test * Fixing the batch/unbatcher of pipelines to accomodate the `None` being passed around. * Fixing pipeline bug caused by slow tokenizer being different. * Update src/transformers/models/esm/modeling_tf_esm.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/models/esm/modeling_tf_esm.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/models/esm/modeling_tf_esm.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update set_input_embeddings and the copyright notices Co-authored-by: Your Name <you@example.com> Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com> Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2022-10-17 14:16:16 +01:00
Sivaudha	8aad4363d8	Fix pipeline predict transform methods (#19657 ) * Remove key word argument X from pipeline predict and transform methods As __call__ of pipeline clasees require one positional argument, passing the input as a keyword argument inside predict, transform methods, causing __call__ to fail. Hence in this commit the keyword argument is modified into positional argument. * Implement basic tests for scikitcompat pipeline interface * Seperate tests instead of running with parameterized based on framework as both frameworks will not be active at the same time	2022-10-17 09:06:20 -04:00
Nicolas Patry	463226e2ee	Improve error messaging for ASR pipeline. (#19570 ) * Improve error messaging for ASR pipeline. - Raise error early (in `_sanitize`) so users don't waste time trying to run queries with invalid params. - Fix the error was after using `config.inputs_to_logits_ratio` so our check was masked by the failing property does not exist. - Added some manual check on s2t for the error message. No non ctc model seems to be used by the default runner (they are all skipped). * Removing pdb. * Stop the early error it doesn't really work :(.	2022-10-14 17:12:21 +02:00
Yih-Dar	62f28bc152	Fix `ImageToTextPipelineTests.test_small_model_tf` (#19565 ) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-10-14 16:29:54 +02:00
amyeroberts	83a2e694f1	Cast masks to np.unit8 before converting to PIL.Image.Image (#19616 ) * Cast masks to np.unit8 before converting to PIL.Image.Image * Update tests * Fixup	2022-10-14 09:30:45 -04:00
Ritik Nandwal	e94384e4d8	Add depth estimation pipeline (#18618 ) * Add initial files for depth estimation pipelines * Add test file for depth estimation pipeline * Update model mapping names * Add updates for depth estimation output * Add generic test * Hopefully fixing the tests. * Check if test passes * Add make fixup and make fix-copies changes after rebase with main * Rebase with main * Fixing up depth pipeline. * This is not used anymore. * Fixing the test. `Image` is a module `Image.Image` is the type. * Update docs/source/en/main_classes/pipelines.mdx Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2022-10-12 08:54:20 -04:00
Quancore	70a058bc65	Added tokenize keyword arguments to feature extraction pipeline (#19382 ) * Added tokenize keyword arguments to feature extraction pipeline * Reverted truncation parameter * Import numpy moved to top	2022-10-11 12:54:41 -04:00
Ankur Goyal	a3008c5a6d	Implement multiple span support for DocumentQuestionAnswering (#19204 ) * Implement multiple span support * Address comments * Add tests + fix bugs	2022-10-11 10:47:55 -04:00
Arthur	b722a6be72	Fix whisper for `pipeline` (#19482 ) * update feature extractor params * update attention mask handling * fix doc and pipeline test * add warning when skipping test * add whisper translation and transcription test * fix build doc test	2022-10-11 07:17:53 -04:00
Sylvain Gugger	d92e22d1f2	Remove ref to is_pipeline_test	2022-10-07 21:38:07 -04:00
Sylvain Gugger	9ac586b3c8	Rework pipeline tests (#19366 ) * Rework pipeline tests * Try to fix Flax tests * Try to put it before * Use a new decorator instead * Remove ignore marker since it doesn't work * Filter pipeline tests * Woopsie * Use the fitlered list * Clean up and fake modif * Remove init * Revert fake modif	2022-10-07 18:01:58 -04:00
Alara Dirik	983451a13e	Improve and fix ImageSegmentationPipeline (#19367 ) - Fixes the image segmentation pipeline test failures caused by changes to the postprocessing methods of supported models - Updates the ImageSegmentationPipeline tests - Improves docs, adds 'task' argument to optionally perform semantic, instance or panoptic segmentation	2022-10-07 23:34:41 +03:00
Amrit Sahu	e9a49babee	[WIP] Add ZeroShotObjectDetectionPipeline (#18445 ) (#18930 ) * Add ZeroShotObjectDetectionPipeline (#18445) * Add AutoModelForZeroShotObjectDetection task This commit also adds the following - Add explicit _processor method for ZeroShotObjectDetectionPipeline. This is necessary as pipelines don't auto infer processors yet and `OwlVitProcessor` wraps tokenizer and feature_extractor together, to process multiple images at once - Add auto tests and other tests for ZeroShotObjectDetectionPipeline * Add AutoModelForZeroShotObjectDetection task This commit also adds the following - Add explicit _processor method for ZeroShotObjectDetectionPipeline. This is necessary as pipelines don't auto infer processors yet and `OwlVitProcessor` wraps tokenizer and feature_extractor together, to process multiple images at once - Add auto tests and other tests for ZeroShotObjectDetectionPipeline * Add batching for ZeroShotObjectDetectionPipeline * Fix doc-string ZeroShotObjectDetectionPipeline * Fix output format: ZeroShotObjectDetectionPipeline	2022-10-07 10:00:19 -04:00
Sylvain Gugger	7e7f62bfa7	Fix pipeline tests for Roberta-like tokenizers (#19365 ) * Fix pipeline tests for Roberta-like tokenizers * Fix fix	2022-10-05 17:48:14 -04:00
Sylvain Gugger	c875a96eb1	Test failing test while we resolve the issue. (#19355 )	2022-10-05 12:23:48 -04:00
Karim Foda	e396358104	Add stop sequence to text generation pipeline (#18444 )	2022-09-30 14:26:51 +01:00
Sylvain Gugger	f16bbf1475	Skip pipeline tests (#19248 )	2022-09-29 12:25:15 -04:00
Ankur Goyal	67403413bd	Change document question answering pipeline to always return an array (#19071 ) Co-authored-by: Ankur Goyal <ankur@impira.com>	2022-09-20 15:17:57 +02:00
amyeroberts	30a28f5227	Update image segmentation pipeline test (#18731 ) * Updated test values The image segmentation pipeline tests - tests/pipelines/test_pipelines_image_segmentation.py - were failing after the merging of #1849 (`49e44b216b`). This was due to the difference in rescaling. Previously the images were rescaled by `image = image / 255`. In the new commit, a `rescale` method was added, and images rescaled using `image = image * scale`. This was known to cause small differences in the processed images (see [PR comment](https://github.com/huggingface/transformers/pull/18499#discussion_r940347575)). Testing locally, changing the `rescale` method to divide by a scale factor (255) resulted in the tests passing. It was therefore decided the test values could be updated, as there was no logic difference between the commits. * Use double quotes, like previous example * Fix up	2022-09-15 07:32:31 -04:00
Yih-Dar	6a9726ec0e	Fix `DocumentQuestionAnsweringPipelineTests` (#19023 ) * Fix DocumentQuestionAnsweringPipelineTests Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-09-14 16:13:20 +02:00
NielsRogge	59407bbeb3	Add Deformable DETR (#17281 ) * First draft * More improvements * Improve model, add custom CUDA code * Import torch before * Add script that imports custom layer * Add everything in new ops directory * Import custom layer in modeling file * Fix ARCHIVE_MAP typo * Creating the custom kernel on the fly. * Import custom layer in modeling file * More improvements * Fix CUDA loading * More improvements * Improve conversion script * Improve conversion script * Make it work until encoder_outputs * Make forward pass work * More improvements * Make logits match original implementation * Make implementation also support single_scale model * Add support for single_scale and dilation checkpoint * Add support for with_box_refine model * Support also two stage model * Improve tests * Fix more tests * Make more tests pass * Upload all models to the hub * Clean up some code * Improve decoder outputs * Rename intermediate hidden states and reference points * Improve model outputs * Move tests to dedicated folder * Improve model outputs * Fix retain_grad test * Improve docs * Clean up and make test_initialization pass * Improve variable names * Add copied from statements * Improve docs * Fix style * Improve docs * Improve docs, move tests to model folder * Fix rebase * Remove DetrForSegmentation from auto mapping * Apply suggestions from code review * Improve variable names and docstrings * Apply some more suggestions from code review * Apply suggestion from code review * better docs and variables names * hint to num_queries and two_stage confusion * remove asserts and code refactor * add exception if two_stage is True and with_box_refine is False * use f-strings * Improve docs and variable names * Fix code quality * Fix rebase * Add require_torch_gpu decorator * Add pip install ninja to CI jobs * Apply suggestion of @sgugger * Remove DeformableDetrForObjectDetection from auto mapping * Remove DeformableDetrModel from auto mapping * Add model to toctree * Add model back to mappings, skip model in pipeline tests * Apply @sgugger's suggestion * Fix imports in the init * Fix copies * Add CPU implementation * Comment out GPU function * Undo previous change * Apply more suggestions * Remove require_torch_gpu annotator * Fix quality * Add logger.info * Fix logger * Fix variable names * Fix initializaztion * Add missing initialization * Update checkpoint name * Add model to doc tests * Add CPU/GPU equivalence test * Add Deformable DETR to pipeline tests * Skip model for object detection pipeline Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com> Co-authored-by: Nouamane Tazi <nouamane98@gmail.com> Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>	2022-09-14 11:45:21 +02:00
Ankur Goyal	2ef7742117	Add DocumentQuestionAnswering pipeline (#18414 ) * [WIP] Skeleton of VisualQuestionAnweringPipeline extended to support LayoutLM-like models * Fixup * Use the full encoding * Basic refactoring to DocumentQuestionAnsweringPipeline * Cleanup * Improve args, docs, and implement preprocessing * Integrate OCR * Refactor question_answering pipeline * Use refactored QA code in the document qa pipeline * Fix tests * Some small cleanups * Use a string type annotation for Image.Image * Update encoding with image features * Wire through the basic docs * Handle invalid response * Handle empty word_boxes properly * Docstring fix * Integrate Donut model * Fixup * Incorporate comments * Address comments * Initial incorporation of tests * Address Comments * Change assert to ValueError * Comments * Wrap `score` in float to make it JSON serializable * Incorporate AutoModeLForDocumentQuestionAnswering changes * Fixup * Rename postprocess function * Fix auto import * Applying comments * Improve docs * Remove extra assets and add copyright * Address comments Co-authored-by: Ankur Goyal <ankur@impira.com>	2022-09-07 13:38:49 -04:00
Sylvain Gugger	71ff88fa4f	Further reduce the number of alls to head for cached objects (#18871 ) * Further reduce the number of alls to head for cached models/tokenizers/pipelines * Fix tests * Address review comments	2022-09-06 12:34:37 -04:00
OlivierDehaene	129d73294e	Fix naming issue with ImageToText pipeline (#18864 ) Co-authored-by: Olivier Dehaene <olivier@huggingface.co>	2022-09-02 07:55:30 -04:00
OlivierDehaene	ddb69e5af8	Add Image To Text Generation pipeline (#18821 ) * Add Image2TextGenerationPipeline to supported pipelines * Add Flax and Tensorflow support * Add Flax and Tensorflow small tests * Add default model for Tensorflow * Add docstring * Fix doc style * Add tiny models for pytorch and flax * Remove flax from pipeline. Fix tests * Use ydshieh/vit-gpt2-coco-en as a default for both PyTorch and Tensorflow * Fix Tensorflow support Co-authored-by: Olivier Dehaene <olivier@huggingface.co>	2022-09-01 12:07:14 -04:00
Sylvain Gugger	0d0aada564	Use commit hash to look in cache instead of calling head (#18534 ) * Use commit hash to look in cache instead of calling head * Add tests * Add attr for local configs too * Stupid typos * Fix tests * Update src/transformers/utils/hub.py Co-authored-by: Julien Chaumond <julien@huggingface.co> * Address Julien's comments Co-authored-by: Julien Chaumond <julien@huggingface.co>	2022-08-10 11:55:18 -04:00
Nicolas Patry	9f5fe63548	Adding a new `align_to_words` param to qa pipeline. (#18010 ) * Adding a new `align_to_words` param to qa pipeline. * Update src/transformers/pipelines/question_answering.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Import protection. Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2022-08-09 18:50:02 +02:00
Nicolas Patry	a4562552eb	[DX fix] Fixing QA pipeline streaming a dataset. (#18516 ) * [DX fix] Fixing QA pipeline streaming a dataset. QuestionAnsweringArgumentHandler would iterate over the whole dataset effectively killing all properties of the pipeline. This restores nice properties when using `Dataset` or `Generator` since those are meant to be consumed lazily. * Handling TF better.	2022-08-08 14:25:56 +02:00
Yih-Dar	280db2e39c	Fix `test_dbmdz_english` by updating expected values (#18482 ) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-08-05 16:49:54 +02:00
Sylvain Gugger	70fa1a8d26	Fix pipeline tests (#18487 ) * Fix pipeline tests * Make sure all pipelines tests run with init changes	2022-08-05 09:14:51 -04:00
Nicolas Patry	586dcf6b21	Fixing issue where generic model types wouldn't load properly with the pipeline (#18392 ) * Adding a better error message when the model is improperly configured within transformers. * Update src/transformers/pipelines/__init__.py * Black version. * Overriding task aliases so that tokenizer+feature_extractor values are correct. * Fixing task aliases by overriding their names early * X. * Fixing feature-extraction. * black again. * Normalizing `translation` too. * Fixing last few corner cases. translation need to use its non normalized name (translation_XX_to_YY, so that the task_specific_params are correctly overloaded). This can be removed and cleaned up in a later PR. `speech-encode-decoder` actually REQUIRES to pass a `tokenizer` manually so the error needs to be discarded when the `tokenizer` is already there. * doc-builder fix. * Fixing the real issue. * Removing dead code. * Do not import the actual config classes.	2022-08-05 08:45:07 +02:00
David	042f420364	Update pipeline word heuristic to work with whitespace in token offsets (#18402 ) * Update pipeline word heuristic to work with whitespace in token offsets This change checks for whitespace in the input string at either the character preceding the token or in the first character of the token. This works with tokenizers that return offsets excluding whitespace between words or with offsets including whitespace. fixes #18111 starting * Use smaller model, ensure expected tokenization * Re-run CI (please squash)	2022-08-02 15:31:01 -04:00
Sylvain Gugger	dc9147ff36	Custom pipeline (#18079 ) * Initial work * More work * Add tests for custom pipelines on the Hub * Protect import * Make the test work for TF as well * Last PyTorch specific bit * Add documentation * Style * Title in toc * Bad names! * Update docs/source/en/add_new_pipeline.mdx Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr> * Auto stash before merge of "custom_pipeline" and "origin/custom_pipeline" * Address review comments * Address more review comments * Update src/transformers/pipelines/__init__.py Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr> Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>	2022-07-19 12:02:35 +02:00
Nicolas Patry	ccc0897804	Adding support for `device_map` directly in `pipeline(..)` function. (#17902 ) * Adding support for `device_map` directly in `pipeline(..)` function. * Updating the docstring. * Adding a better docstring * Put back type hints. * Blacked. (`make fixup` didn't work ??!!)	2022-07-15 15:54:26 +02:00
Sylvain Gugger	6c8017a5c8	Fix image segmentation and object detection pipeline tests (#18100 )	2022-07-11 12:41:56 -04:00

1 2 3 4 5 ...

283 Commits