* rm already deprecated padding max length
* truncate_strategy AS AN ARG has already been deprecated for a few years
* fix
* rm test_padding_to_max_length
* rm pad_to_max_length=True in other tests
* rm from common
* missed fnet
* support fast image processor layoutlmv3
* make style
* add warning and update test
* make style
* Update src/transformers/models/layoutlmv3/image_processing_layoutlmv3_fast.py
* Update image_processing_auto.py
---------
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
* use torch.testing.assert_close instead to get more details about errors in CIs
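A minimal sketch of the swap this refers to, with illustrative tensor values: `torch.testing.assert_close` reports the mismatched elements and tolerances on failure, which a plain `assertTrue(torch.allclose(...))` does not.

```python
import torch

expected = torch.tensor([0.1, 0.2, 0.3])
actual = torch.tensor([0.1, 0.2, 0.3001])

# Old pattern: a failed allclose assertion only reports "False"
# self.assertTrue(torch.allclose(actual, expected, atol=1e-3))

# New pattern: failure messages include the greatest absolute/relative
# difference and the number of mismatched elements
torch.testing.assert_close(actual, expected, rtol=1e-3, atol=1e-3)
```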
* fix
* style
* test_all
* revert for IBert
* fixes and updates
* more image processing fixes
* more image processors
* fix mamba and co
* style
* less strict
* ok I won't be strict
* skip and be done
* up
* kinda works
* update
* add tests
* update
* use special tokens in processors
* typo
* fix copies
* fix
* fix moshi after rebase
* update
* fix tests
* update
* Update docs/source/en/main_classes/tokenizer.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* update docs
* test for load time adding tokens
* fix some more tests which are now fetched better
* one more fix
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Pass datasets trust_remote_code
* Pass trust_remote_code in more tests
* Add trust_remote_dataset_code arg to some tests
* Revert "Temporarily pin datasets upper version to fix CI"
This reverts commit b7672826ca.
* Pass trust_remote_code in librispeech_asr_dummy docstrings
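A hedged sketch of the docstring snippet being touched here, assuming the usual dummy LibriSpeech checkpoint used in the ASR examples; the only substantive change is the explicit `trust_remote_code=True`.

```python
from datasets import load_dataset

# Dummy ASR dataset used throughout the audio docstrings; trust_remote_code=True
# is now passed explicitly so newer `datasets` releases don't prompt or fail.
ds = load_dataset(
    "hf-internal-testing/librispeech_asr_dummy",
    "clean",
    split="validation",
    trust_remote_code=True,
)
sample = ds[0]["audio"]
```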
* Revert "Pin datasets<2.20.0 for examples"
This reverts commit 833fc17a3e.
* Pass trust_remote_code to all examples
* Revert "Add trust_remote_dataset_code arg to some tests" to research_projects
* Pass trust_remote_code to tests
* Pass trust_remote_code to docstrings
* Fix flax examples tests requirements
* Pass trust_remote_dataset_code arg to tests
* Replace trust_remote_dataset_code with trust_remote_code in one example
* Fix duplicate trust_remote_code
* Replace args.trust_remote_dataset_code with args.trust_remote_code
* Replace trust_remote_dataset_code with trust_remote_code in parser
* Replace trust_remote_dataset_code with trust_remote_code in dataclasses
* Replace trust_remote_dataset_code with trust_remote_code arg
* Draft fast image processors
* Draft working fast version
* py3.8 compatible cache
* Enable loading fast image processors through auto
* Tidy up; rescale behaviour based on input type
* Enable tests for fast image processors
* Smarter rescaling
* Don't default to Fast
* Safer imports
* Add necessary Pillow requirement
* Woops
* Add AutoImageProcessor test
* Fix up
* Fix test for imagegpt
* Fix test
* Review comments
* Add warning for TF and JAX input types
* Rearrange
* Return transforms
* NumpyToTensor transformation
* Rebase - include changes from upstream in ImageProcessingMixin
* Safe typing
* Fix up
* convert mean/std to tensor to rescale
* Don't store transforms in state
* Fix up
* Update src/transformers/image_processing_utils_fast.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/auto/image_processing_auto.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/auto/image_processing_auto.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/auto/image_processing_auto.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Warn if fast image processor available
* Update src/transformers/models/vit/image_processing_vit_fast.py
* Transpose incoming numpy images to be in CHW format
* Update mapping names based on packages, auto set fast to None
* Fix up
* Fix
* Add AutoImageProcessor.from_pretrained(checkpoint, use_fast=True) test
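A minimal sketch of what that test exercises, assuming a ViT checkpoint for illustration: `use_fast=True` asks `AutoImageProcessor` to return the fast (torchvision-backed) image processor when one is available.

```python
from transformers import AutoImageProcessor

# Checkpoint is illustrative; any model with a registered fast image processor works
processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224", use_fast=True)
print(type(processor).__name__)  # e.g. "ViTImageProcessorFast" when available
```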
* Update src/transformers/models/vit/image_processing_vit_fast.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Add equivalence and speed tests
* Fix up
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* change cis
* nits
* update
* minor updates
* [push-ci-image]
* nit [push-ci-image]
* nitsssss
* [build-ci-image]
* [push-ci-image]
* [push-ci-image]
* both
* [push-ci-image]
* this?
* [push-ci-image]
* pypi-kenlm needs g++
* [push-ci-image]
* nit
* more nits [push-ci-image]
* nits [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* add vision
* [push-ci-image]
* [push-ci-image]
* add new dummy files but will need to update them [push-ci-image]
* [push-ci-image]
* show package size as well
* [push-ci-image]
* potentially ignore failures
* workflow updates
* nits [push-ci-image]
* [push-ci-image]
* fix consistency
* clean nvidia triton
* also show big packages [push-ci-image]
* nit
* update
* another one
* line escape?
* add accelerate [push-ci-image]
* updates [push-ci-image]
* nits to run tests, no push-ci
* try to parse skip reasons to make sure nothing is skipped that should not be skipped
* nit?
* always show skipped reasons
* nits
* better parsing of the test outputs
* action="store_true",
* failure on failed
* show matched
* debug
* update short summary with skipped, failed and errors
* nits
* nits
* cool updates
* remove docbuilder
* fix
* always run checks
* oups
* nits
* don't error out on library printing
* non-zero exit codes
* no warning
* nit
* WAT?
* format nit
* [push-ci-image]
* fail if fail is needed
* [push-ci-image]
* soundfile for torch-light?
* [push-ci-image]
* order is important [push-ci-image]
* [push-ci-image] reduce even further
* [push-ci-image]
* use pytest rich !
* yes [push-ci-image]
* oupsy
* bring back the full traceback, but pytest rich should help
* nit
* [push-ci-image]
* re run
* nit
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* empty push to trigger
* [push-ci-image]
* nit? [push-ci-image]
* empty
* try to install timm with no deps
* [push-ci-image]
* oups [push-ci-image]
* [push-ci-image]
* [push-ci-image] ?
* [push-ci-image] open ssh client for git checkout fast
* empty for torch light
* updates [push-ci-image]
* nit
* @v4 for checkout
* [push-ci-image]
* [push-ci-image]
* fix fetch tests with parallelism
* [push-ci-image]
* more parallelism
* nit
* more nits
* empty to re-trigger
* empty to re-trigger
* split by timing
* did not work with previous commit
* junit.xml
* no path?
* mmm this?
* junitxml format
* split by timing
* nit
* fix junit family
* now we can test if the xunit1 is compatible!
* this?
* fully list tests
* update
* update
* oups
* finally
* use classname
* remove working directory to make sure the path does not interfere
* okay now junit should have the correct path
* name split?
* sort by classname is what makes most sense
* some testing
* name
* oups
* test something fun
* autodetect
* 18?
* nit
* file size?
* up
* 4 is best
* update to see versions
* better print
* [push-ci-image]
* [push-ci-image]
* please install the correct keras version
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* uv is giving me trouble
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* nits
* [push-ci-image]
* [push-ci-image]
* install issues and pins
* tapas as well
* nits
* more parallelism
* short tb
* soundfile
* soundfile
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* oups
* [push-ci-image]
* fix some things
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* use torch-light for hub
* small git lfs for hub job
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* fix tf tapas
* [push-ci-image]
* nits
* [push-ci-image]
* don't update the test
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* no use them
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* update tf proba
* [push-ci-image]
* [push-ci-image]
* woops
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* test with built dockers
* [push-ci-image]
* skip annoying tests
* revert fix copy
* update test values
* update
* last skip and fixup
* nit
* ALL GOOOD
* quality
* Update tests/models/layoutlmv2/test_image_processing_layoutlmv2.py
* Update docker/quality.dockerfile
Co-authored-by: Lysandre Debut <hi@lysand.re>
* Update src/transformers/models/tapas/modeling_tf_tapas.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <hi@lysand.re>
* use torch-speed
* updates
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* [push-ci-image]
* ken-lm is giving me trouble [push-ci-image]
* [push-ci-image]
* [push-ci-image]
---------
Co-authored-by: Lysandre Debut <hi@lysand.re>
* try to stylify using ruff
* might need to remove these changes?
* use ruff format and ruff check
* use isinstance instead of type comparison
* use # fmt: skip
* use # fmt: skip
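A small illustration, with made-up values, of the two styling conventions adopted in this pass: `isinstance()` instead of direct `type(...)` comparison, and `# fmt: skip` to keep the formatter away from hand-aligned lines.

```python
value = 3.0

# Before: if type(value) == float:
if isinstance(value, float):
    print("got a float")

# Hand-aligned mapping the formatter should leave alone
LOOKUP = {1: "one",   2: "two",   3: "three"}  # fmt: skip
```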
* nits
* some styling changes
* update ci job
* nits isinstance
* more files update
* nits
* more nits
* small nits
* check and format
* revert wrong changes
* actually use formatter instead of checker
* nits
* well docbuilder is overwriting this commit
* revert notebook changes
* try to nuke docbuilder
* style
* fix feature extraction test
* remove `indent-width = 4`
* fixup
* more nits
* update the ruff version that we use
* style
* nuke docbuilder styling
* leave the print for detected changes
* nits
* Remove file I/O
Co-authored-by: charliermarsh <charlie.r.marsh@gmail.com>
* style
* nits
* revert notebook changes
* Add # fmt skip when possible
* Add # fmt skip when possible
* Fix
* More ` # fmt: skip` usage
* More ` # fmt: skip` usage
* More ` # fmt: skip` usage
* Nits
* more fixes
* fix tapas
* Another way to skip
* Recommended way
* Fix two more files
* Remove asynch
Remove asynch
---------
Co-authored-by: charliermarsh <charlie.r.marsh@gmail.com>
* fix test for bart. Order is correct now let's skip BPEs
* ouf
* styling
* fix bert....
* slow refactoring
* current updates
* massive refactoring
* update
* NICE!
* update to see where I am at
* updates
* update
* update
* revert
* updates
* updates
* start supporting legacy_save
* styling
* big update
* revert some changes
* nits
* nniiiiiice
* small fixes
* kinda fix t5 with new behaviour
* major update
* fixup
* fix copies
* today's updates
* fix byt5
* update
* update
* update
* updates
* update vocab size test
* Barthez does not need the fairseq offset ids
* super call must be after
* call super
* move all super init
* move other super init
* fixup
* nits
* more fixes
* nits
* more fixes
* nits
* more fix
* remove useless files
* ouch all of them are affected
* and more!
* small improvements
* no more sanitize token
* more changes around unique no split tokens
* partially fix more things
* keep legacy save but add warning
* so... more fixes
* updates
* guess deberta tokenizer could be nuked
* fixup
* fixup did some bad things
* nuke it if it breaks
* remove prints and pretrain fast from slow with new format.
* fixups
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fiou
* nit
* by default specials should not be normalized?
* update
* remove breakpoint
* updates
* a lot of updates
* fixup
* fixes: revert some changes to match fast
* small nits
* that makes it cleaner
* fix camembert accordingly
* update
* some less breaking changes
* update
* fixup
* fix byt5 and whisper mostly
* some more fixes, canine's byte vocab
* fix gpt2
* fix most of the perceiver tests (4 left)
* fix layoutlmv3
* fixup
* fix copies for gpt2 style
* make sure to only warn once
* fix perceiver and gpt2 tests
* some more backward compatibility: also read special tokens map because some people still use it
* fixup
* add else when reading
* nits
* fresh updates
* fix copies
* will this make everything faster?
* fixes
* more fixes
* update
* more fixes
* fixup
* is the source of truth right?
* sorry camembert for the troubles
* current updates
* fixup
* update led
* update
* fix regression
* fix single word
* more model specific fixes
* fix t5 tests
* fixup
* more comments
* update
* fix nllb
* rstrip removed
* small fixes
* better handle additional_special_tokens and vocab sizes
* fixing
* styling
* fix 4 / 21
* fixup
* fix nllb's tests
* some fixes
* fix t5
* fixes
* style
* fix canine tests
* damn this is nice
* nits
* m2m100 nit
* fixups
* fixes!
* fixup
* stash
* fix merge
* revert bad change
* fixup
* correct order for code Llama
* fix speecht5 post merge
* styling
* revert source of 11 fails
* small nits
* all changes in one go
* fnet hack
* fix 2 more tests
* update based on main branch of tokenizers
* fixup
* fix VITS issues
* more fixes
* fix mgp test
* fix camembert issues
* oups camembert still has 2 failing tests
* mluke fixes
* decode fixes
* small nits
* nits
* fix llama and vits
* fix camembert
* small nits
* more fixes when initialising a fast tokenizer from a slow one, etc.
* fix one of the last test
* fix CPM tokenizer test
* fixups
* fix pop2piano
* fixup
* ⚠️ Change tokenizers required version ⚠️
* ⚠️ Change tokenizers required version ⚠️
* "tokenizers>=0.14,<0.15", don't forget smaller than
* fix musicgen tests and PreTrainedTokenizerFast
* fix owlvit and all
* update t5
* fix 800 red
* fix tests
* fix the fix of the fix of t5
* styling
* documentation nits
* cache _added_tokens_encoder
* fixups
* Nit
* fix red tests
* one last nit!
* make everything a lot simpler
* Now it's over 😉
* few small nits
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* updates that work for now
* tests that should not be skipped / changed and fixed next
* fixup
* i am ashamed
* push the fix
* update
* fixups
* nits
* fix added_tokens_encoder
* fix canine test
* fix pegasus vocab
* fix transfoXL
* fixup
* whisper needs to be fixed for train new
* pegasus nits
* more pegasus fixes
* minor update
* better error message in failed test
* fix whisper failing test
* fix whisper failing test
* fix pegasus
* fixup
* fix **** pegasus
* reset things
* remove another file
* attempts to fix the strange custom encoder and offset
* nits here and there
* update
* fixup
* nit
* fix the whisper test
* nits nits
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* updates based on review
* some small update to potentially remove
* nits
* import lru_cache
* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
* move warning to `from_pretrained`
* update tests results now that the special tokens are always added
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
* First commit while I figure this out
* make fixup
* Remove unused method
* Store prompt attrib
* Fix prompt argument for tests
* Make same changes in fast tokenizer
* Remove global prompts from fast tokenizer too
* stash commit
* stash commit
* Migrate PromptConfig to its True Final Location
* Replace Conversation entirely with the new class
* Import/dependency fixes
* Import/dependency fixes
* Change format for lots of default prompts
* More default prompt fixups
* Revert llama old methods so we can compare
* Fix some default configs
* Fix some default configs
* Fix misspelled kwarg
* Fixes for Blenderbot
* make fixup
* little rebase cleanup
* Add basic documentation
* Quick doc fix
* Truncate docstring for now
* Add handling for the case when messages is a single string
* Quick llama merges
* Update conversational pipeline and tests
* Add a couple of legacy properties for backward compatibility
* More legacy handling
* Add docstring for build_conversation_input_ids
* Restructure PromptConfig
* Let's start T E M P L A T I N G
* Refactor all default configs to use templates instead
* Revert changes to the special token properties since we don't need them anymore
* More class templates
* Make the sandbox even sandier
* Everything replaced with pure templating
* Remove docs for PromptConfig
* Add testing and optional requirement boilerplate
* Fix imports and make fixup
* Fix LLaMA tests and add Conversation docstring
* Finally get LLaMA working with the template system
* Finally get LLaMA working with the template system
* make fixup
* make fixup
* fmt-off for the long lists of test tokens
* Rename method to apply_chat_template for now
* Start on documentation
* Make chat_template a property that reads through to the default if it's not set
* Expand docs
* Expand chat templating doc some more
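A hedged usage sketch of the API these docs describe; the checkpoint is illustrative, and any chat model whose tokenizer defines a chat template behaves the same way.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
# Renders the messages through the tokenizer's Jinja chat_template
prompt = tokenizer.apply_chat_template(messages, tokenize=False)
print(prompt)
```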
* trim/lstrip blocks by default and update doc
* Few doc tweaks
* rebase cleanup
* Clarify docstring
* rebase cleanup
* rebase cleanup
* make fixup
* Quick doc edit
* Reformat the standard template to match ChatML
* Re-add PEFT check
* Update docs/source/en/chat_templating.md
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Add apply_chat_template to the tokenizer doc
* make fixup
* Add doc links
* Fix chat links
* Fix chat links
* Explain system messages in the doc
* Add chat template test
* Proper save-loading for chat template attribute
* Add test skips for layout models
* Remove _build_conversation_input_ids, add default_chat_template to code_llama
* Make sure all LLaMA models are using the latest template
* Remove default_system_prompt block in code_llama because it has no default prompt
* Update ConversationPipeline preprocess
* Add correct #Copied from links to the default_chat_templates
* Remove unneeded type checking line
* Add a dummy mark_processed method
* Reorganize Conversation to have **deprecated_kwargs
* Update chat_templating.md
* Quick fix to LLAMA tests
* Small doc tweaks
* Add proper docstrings and "copied from" statements to all default chat templates
* Merge use_default_system_prompt support for code_llama too
* Improve clarity around self.chat_template
* Docstring fix
* Fix blenderbot default template
* More doctest fix
* Break out some tokenizer kwargs
* Update doc to explain default templates
* Quick tweaks to tokenizer args
* Cleanups for tokenizer args
* Add note about caching
* Quick tweak to the chat-templating doc
* Update the LLaMA template with error checking and correct system message embedding
* make fixup
* make fixup
* add requires_jinja
* Cleanup to expected output formatting
* Add caching
* Fix typo in llama default template
* Update LLaMA tests
* Update documentation
* Improved legacy handling in the Conversation class
* Update Jinja template with proper error handling
* Quick bugfix
* Proper exception raising
* Change caching behaviour so it doesn't try to pickle an entire Jinja env
* make fixup
* rebase cleanup
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* draft changes
* update and add tests
* styling for now
* move test
* path to usable model
* update test
* small update
* update bert-based tokenizers
* don't use kwargs for _tokenize
* don't use kwargs for _tokenize
* fix copies
* update
* update test for special tokenizers
* fixup
* skip two tests
* remove pdb breakpoint()
* wowo
* rewrite custom tests
* nits
* revert change in target keys
* fix markuplm
* update documentation of the argument
* Refactor image processor test mixin
- Move test_call_numpy, test_call_pytorch, test_call_pil to mixin
- Rename mixin to reflect handling of logic more than saving
- Add prepare_image_inputs, expected_image_outputs for tests
* Fix for oneformer
* hidden layers, huh, what are they good for (absolutely nothing)
* Some tests break with 1 hidden layer, use 2
* Use 1 hidden layer in a few slow models
* Use num_hidden_layers=2 everywhere
* Slightly higher tol for groupvit
* Slightly higher tol for groupvit
* Rework TF type hints to use | None instead of Optional[] for tf.Tensor
* Rework TF type hints to use | None instead of Optional[] for tf.Tensor
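A minimal sketch of the annotation style this refers to, assuming support for Python < 3.10: `tf.Tensor | None` is only legal there as a deferred annotation, hence the `from __future__ import annotations` imports mentioned in the next items. The helper shown is illustrative.

```python
from __future__ import annotations  # makes `X | None` legal in hints on Python < 3.10

import tensorflow as tf


def mask_logits(logits: tf.Tensor, attention_mask: tf.Tensor | None = None) -> tf.Tensor:
    # Before this change the hint would have been Optional[tf.Tensor]
    if attention_mask is not None:
        logits = logits + (1.0 - tf.cast(attention_mask, logits.dtype)) * -1e9
    return logits
```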
* Don't forget the imports
* Add the imports to tests too
* make fixup
* Refactor tests that depended on get_type_hints
* Better test refactor
* Fix an old hidden bug in the test_keras_fit input creation code
* Fix for the Deit tests
LayoutLMv3TokenizerFast produces empty 'Ġ' token with `offset_mapping = (0, 0)`.
Next token is wrongly assumed to also be beginning of word and isn't
correctly assigned `pad_token_label`.
Modify the test with text that produces a 'Ġ' token.
Remove copy check from LayoutLMv2TokenizerFast for `_batch_encode_plus`.
solves issue: #19978
* Result of black 23.1
* Update target to Python 3.7
* Switch flake8 to ruff
* Configure isort
* Configure isort
* Apply isort with line limit
* Put the right black version
* adapt black in check copies
* Fix copies
* Update imports and test fetcher
* Revert but keep test fetcher update
* Fix imports
* Fix all imports
* Replace fe with ip names
* Add generate kwargs to `AutomaticSpeechRecognitionPipeline` (#20952)
* Add generate kwargs to AutomaticSpeechRecognitionPipeline
* Add test for generation kwargs
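A hedged example of what the new argument enables, assuming a small Whisper checkpoint for illustration; the audio path is a placeholder.

```python
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")
# generate_kwargs is forwarded to model.generate() during decoding
result = asr("sample.flac", generate_kwargs={"max_new_tokens": 40})
print(result["text"])
```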
* Update image processor parameters if creating with kwargs (#20866)
* Update parameters if creating with kwargs
* Shallow copy to prevent mutating input
* Pass all args in constructor dict - warnings in init
* Fix typo
* Rename tester class
* Rebase and tidy up
* Fixup
* Use ImageProcessingSavingTestMixin
* Update property ref in tests
* Update property ref in tests
* Update recently merged in models
* Small fix
Co-authored-by: bofeng huang <bofenghuang7@gmail.com>
* add warning to let the user know that the method is slower than for a fast tokenizer
* user warnings
* fix layoutlmv2
* fix layout*
* change warnings into logger.warning