transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-08-02 19:21:31 +06:00

Author	SHA1	Message	Date
Sanchit Gandhi	06d488061f	[Whisper Tokenizer] Make more user-friendly (#19921 ) * [Whisper Tokenizer] Make more user-friendly * use property * make indexing rigorous * small clean-up * tests * skip seq2seq tests * remove multilingual arg * reorder args * collapse to one function Co-authored-by: ArthurZucker <arthur@huggingface.co> * option to override attributes Co-authored-by: ArthurZucker <arthur@huggingface.co> * add to docs * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * make comment more clear Co-authored-by: sgugger <sylvain@huggingface.co> * don't add special tokens in get_decoder_prompt_ids * add test for set_prefix_tokens Co-authored-by: ArthurZucker <arthur@huggingface.co> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: sgugger <sylvain@huggingface.co>	2022-11-03 14:22:40 +00:00
Nicolas Patry	ec6878f6ca	Now supporting pathlike in pipelines too. (#20030 )	2022-11-03 09:14:45 +01:00
Ben Eyal	9f9ddcc2de	🚨 🚨 🚨 Fix Issue 15003: SentencePiece Tokenizers Not Adding Special Tokens in `convert_tokens_to_string` (#15775 ) * Add test for SentencePiece not adding special tokens to strings * Add SentencePieceStringConversionMixin to fix issue 15003 * Fix conversion from tokens to string for most SentencePiece tokenizers Tokenizers fixed: - AlbertTokenizer - BarthezTokenizer - CamembertTokenizer - FNetTokenizer - M2M100Tokenizer - MBart50Tokenizer - PegasusTokenizer - Speech2TextTokenizer * Fix MarianTokenizer, adjust SentencePiece test to accomodate vocab * Fix DebertaV2Tokenizer * Ignore LayoutXLMTokenizer in SentencePiece string conversion test * Run 'make style' and 'make quality' * Clean convert_tokens_to_string test Instead of explicitly ignoring LayoutXLMTokenizer in the test, override the test in LayoutLMTokenizationTest and do nothing in it. * Remove commented out code * Improve robustness of convert_tokens_to_string test Instead of comparing lengths of re-tokenized text and input_ids, check that converting all special tokens to string yields a string with all special tokens. * Inline and remove SentencePieceStringConversionMixin The convert_tokens_to_string method is now implemented in each relevant SentencePiece tokenizer. * Run 'make style' and 'make quality' * Revert removal of space in convert_tokens_to_string * Remove redundant import * Revert test text to original * Uncomment the lowercasing of the reverse_text variable * Mimic Rust tokenizer behavior for tokenizers - Albert - Barthez - Camembert - MBart50 - T5 * Fix accidentally skipping test in wrong tokenizer * Add test for equivalent Rust and slow tokenizer behavior * Override _decode in BigBirdTokenizer to mimic Rust behavior * Override _decode in FNetTokenizer to mimic Rust behavior * Override _decode in XLNetTokenizer to mimic Rust behavior * Remove unused 're' import * Update DebertaV2Tokenizer to mimic Rust tokenizer * Deberta tokenizer now behaves like Albert and its `convert_tokens_to_string` is not tested. * Ignore problematic tests in Deberta V2 * Add comment on why the Deberta V2 tests are skipped	2022-11-02 15:45:38 -04:00
Yih-Dar	f69eb24b5a	Improve model tester (#19984 ) * part 1 * part 2 * part 3 * fix * For CANINE * For ESMFold Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-11-02 17:38:44 +01:00
amyeroberts	9aedce99b0	Update auto processor to check image processor created (#20021 )	2022-11-02 15:19:33 +00:00
Sylvain Gugger	49b77b89ea	Quality (#20002 )	2022-11-02 09:53:37 -04:00
Yih-Dar	c6c9db3d0c	Fix gradient checkpoint test in encoder-decoder (#20017 ) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-11-02 14:15:09 +01:00
amyeroberts	a6b7759880	Add Image Processors (#19796 ) * Add CLIP image processor * Crop size as dict too * Update warning * Actually use logger this time * Normalize doesn't change dtype of input * Add perceiver image processor * Tidy up * Add DPT image processor * Add Vilt image processor * Tidy up * Add poolformer image processor * Tidy up * Add LayoutLM v2 and v3 imsge processors * Tidy up * Add Flava image processor * Tidy up * Add deit image processor * Tidy up * Add ConvNext image processor * Tidy up * Add levit image processor * Add segformer image processor * Add in post processing * Fix up * Add ImageGPT image processor * Fixup * Add mobilevit image processor * Tidy up * Add postprocessing * Fixup * Add VideoMAE image processor * Tidy up * Add ImageGPT image processor * Fixup * Add ViT image processor * Tidy up * Add beit image processor * Add mobilevit image processor * Tidy up * Add postprocessing * Fixup * Fix up * Fix flava and remove tree module * Fix image classification pipeline failing tests * Update feature extractor in trainer scripts * Update pad_if_smaller to accept tuple and int size * Update for image segmentation pipeline * Update src/transformers/models/perceiver/image_processing_perceiver.py Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com> * Update src/transformers/image_processing_utils.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/beit/image_processing_beit.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * PR comments - docstrings; remove accidentally added resize; var names * Update docstrings * Add exception if size is not in the right format * Fix exception check * Fix up * Use shortest_edge in tuple in script Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com> Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>	2022-11-02 11:57:36 +00:00
Joao Gante	831590f6a9	Generate: contrastive search with full optional outputs (#19963 ) * Use beam search functionality; Add extra outputs and test * Add full tests for contrastive search * Add error message on unconventional cache format	2022-11-01 18:15:36 +00:00
Mohit Sharma	c796b6dea6	Added onnx config whisper (#19525 ) * Added onnx config whisper * added whisper support onnx * add audio input data * added whisper support onnx * fixed the seqlength value * Updated the whisper onnx ocnfig * restore files to old version * removed attention mask from inputs * Updated get_dummy_input_onnxruntime docstring * Updated relative imports and token generation * update docstring	2022-11-01 07:50:42 -04:00
Matt	7f9b7b3f0e	Add ESMFold (#19977 ) * initial commit * First draft that gets outputs without crashing! * Add all the ported openfold dependencies * testing * Restructure config files for ESMFold * Debugging to find output discrepancies * Mainly style * Make model runnable without extra deps * Remove utils and merge them to the modeling file * Use correct gelu and remove some debug prints * More cleanup * Update esm docs * Update conversion script to support ESMFold properly * Port some top-level changes from ESMFold repo * Expand EsmFold docstrings * Make attention_mask optional (default to all 1s) * Add inference test for ESMFold * Use config and not n kwargs * Add modeling output class * Remove einops * Remove chunking in ESM FFN * Update tests for ESMFold * Quality * REpo consistency * Remove tree dependency from ESMFold * make fixup * Add an error in case my structure map function breaks later * Remove needless code * Stop auto-casting the LM to float16 so CPU tests pass * Stop auto-casting the LM to float16 so CPU tests pass * Final test updates * Split test file * Copyright and quality * Unpin PyTorch to see built doc * Fix config file to_dict() method * Add some docstrings to the output * Skip TF checkpoint tests for ESM until we reupload those * make fixup * More docstrings * Unpin to get even with main * Flag example to write Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>	2022-10-31 21:32:58 -04:00
NielsRogge	4c9e0f029e	Add support for gradient checkpointing (#19990 ) Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>	2022-10-31 18:37:17 +01:00
NielsRogge	0b294c2334	[Conditional, Deformable DETR] Add postprocessing methods (#19709 ) * Add postprocessing methods * Update docs * Add fix * Add test * Add test for deformable detr postprocessing * Add post processing methods for segmentation * Update code examples * Add post_process to make the pipeline work * Apply updates Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>	2022-10-31 08:28:44 +01:00
Raghav Prabhakar	0d4c45c585	Add Onnx Config for ImageGPT (#19868 ) * add Onnx Config for ImageGPT * add generate_dummy_inputs for onnx config * add TYPE_CHECKING clause * Update doc for generate_dummy_inputs Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2022-10-28 09:39:53 -04:00
donguk.lim	347ba38cb4	Support segformer fx (#19924 ) * Support segformer fx * Add fx_compatible attribute to test_modeling_segformer.py * Update glpn model (fx support) glpn model was copied from segformer. * Update utils/fx.py \| add semantic-segmentation for SegformerForSemanticSegmentation model * Fix minor import order(isort) * Add random input generation for segformer fx Co-authored-by: noelbird <lduldu00228@gmail.com>	2022-10-28 08:44:38 -04:00
Sylvain Gugger	6c24443ff5	Safetensors tf (#19900 ) * Wip * Add safetensors support for TensorFlow * First tests * Add final test for now * Retrigger CI like this * Update src/transformers/modeling_tf_utils.py Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr> Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>	2022-10-27 15:56:29 -04:00
Antonio Carlos Falcão Petri	ea118ae2e1	Fix bug in Wav2Vec2's GPU tests (#19803 ) * Fix tests when running on GPU * Fix tests that require mp.set_start_method	2022-10-27 09:00:03 -04:00
Yih-Dar	f1e42bc50e	Some fixes regarding auto mappings and test class names (#19923 ) * Add pegasus_x * ViTMSN * ESM Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-10-27 14:38:59 +02:00
Younes Belkada	7629656926	`accelerate` support for `RoBERTa` family (#19906 )	2022-10-26 22:41:53 +02:00
Patrick von Platen	6d023270f6	Allow flax subfolder (#19902 ) * add first generation tutorial * [Flax] Add subfolder functionality * [Flax] Add subfolder functionality * up * finish * delete file and re-add test	2022-10-26 18:33:23 +02:00
Yih-Dar	688c3e8e40	Update `max_diff` in `test_save_load_fast_init_to_base` (#19849 ) * Fix test_save_load_fast_init_to_base * Fix test_save_load_fast_init_to_base * update Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-10-26 17:09:47 +02:00
Nicolas Patry	5fd5990dce	Factored out some code in the `image-segmentation` pipeline. (#19727 ) * Factored out some code in the image-segmentation pipeline Re-enable `small_model_pt`. Re-enable `small_model_pt`. Enabling the current test with the current values. Debugging the values on the CI. More logs ? Printing doesn't work ? Using the CI values instead. Seems to be a Pillow sensitivity. Added a test showcasing that models not supporting some tasks get a clear error. Factored out code. Further factor out. Fixup. Bad rebase. Put `panoptic` before `instance` as it should be a superset. * Fixing tests. * Adding subtasks tests + Fixes `instance` segmentation which was broken due to default and non kwargs arguments. * Fix bad replace.	2022-10-26 10:44:36 +02:00
Yih-Dar	f9257843b5	Fix incorrect model<->tokenizer mapping in tokenization testing (#19872 ) * Fix model-tokenizer mapping Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-10-25 16:02:13 +02:00
Lysandre Debut	eedaba682f	[Past CI] Vilt only supports PT >= v1.10 (#19851 ) * Support for Vilt in v1.9 * Skip if not higher or equal than 1.10 * Move test :) * I am bad at python	2022-10-25 15:59:35 +02:00
Guillaume Klein	ab108a0e31	Add missing lang tokens in M2M100Tokenizer.get_vocab (#18416 )	2022-10-25 09:18:24 -04:00
Sylvain Gugger	d4eb52d13d	Refactor conversion function (#19799 ) * Refactor conversion function * Remove dupe line * Fixes * Fixes * Use the right variable... * Fix last test	2022-10-24 13:48:40 -04:00
Yih-Dar	8b2501b4b9	Update `LEDModelIntegrationTests` expected values (#19841 ) * Update expected values * fix style Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-10-24 16:05:26 +02:00
Rak Alexey	d3f4cef74d	fix image2test args forwarding (#19648 ) * fix image2test args forwarding * fix issues * Proposing the update to the PR. * Fixup. Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>	2022-10-24 09:49:24 -04:00
Yih-Dar	3436842102	Run some TF Whisper tests in subprocesses to avoid GPU OOM (#19772 ) * Run some TF Whisper tests in subprocesses to avoid GPU OOM Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-10-21 21:59:18 +02:00
Joao Gante	e0b825a8d0	Generate: contrastive search test updates (#19787 ) * contrastive search test updates * make fixup	2022-10-21 19:10:08 +01:00
Alara Dirik	cca51aa151	Fix image segmentation pipeline errors, resolve backward compatibility issues (#19768 ) * Fix panoptic segmentation and pipeline * Update ImageSegmentationPipeline tests and reenable test_small_model_pt * Resolve backward compatibility issues	2022-10-21 18:09:58 +03:00
Yih-Dar	3a1aeea3c5	Fix CTRL `test_torchscrip_xxx` CI by updating `_create_and_check_torchscript` (#19786 ) * Run inputs before trace * Run inputs before trace Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-10-21 16:23:13 +02:00
Hao Wang	31565ff0fd	Add sentencepiece to BertJapaneseTokenizer (#19769 ) * support sentencepiece for bertjapanesetokenizer * add test vocab file for sentencepiece, bertjapanesetokenizer * make BasicTokenizer be identical to transformers.models.bert.tokenization_bert.BasicTokenizer * fix missing of \n in comment * fix init argument missing in tests * make spm_file be optional, exclude spiece.model from tests/fixtures, and add description comments * make comment length less than 119 * apply doc style check	2022-10-21 10:04:49 -04:00
Yih-Dar	3aaabaa214	Update `ImageToTextPipelineTests.test_small_model_tf` (#19785 ) * update expected values for the correct TF checkpoint * Run test * Clean up * fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-10-21 14:35:20 +02:00
Yih-Dar	84f6bee5da	PT <-> TF for composite models (#19732 ) * First step of PT->TF for composite models * Update the tests * For VisionEncoderDecoderModel * Fix * Fix * Add comment * Fix * clean up import * Save memory * For (TF)EncoderDecoderModel * For (TF)EncoderDecoderModel Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-10-21 12:40:39 +02:00
Nicolas Patry	a40386669f	`image-segmentation` pipeline: re-enable `small_model_pt` test. (#19716 ) * Re-enable `small_model_pt`. Re-enable `small_model_pt`. Enabling the current test with the current values. Debugging the values on the CI. More logs ? Printing doesn't work ? Using the CI values instead. Seems to be a Pillow sensitivity. * Update src/transformers/pipelines/image_segmentation.py Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com> Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>	2022-10-20 11:57:11 +02:00
amyeroberts	5041bc3511	Image transforms add center crop (#19718 ) * Add center crop to transforms library * Return PIL images if PIL image input by default * Fixup and add docstring * Trigger CI * Update src/transformers/image_transforms.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/image_transforms.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * PR comments - move comments; unindent Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2022-10-19 16:15:01 +01:00
Yih-Dar	bed2edb99f	Specify TF framework explicitly in more pipeline tests (#19748 ) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-10-19 16:24:03 +02:00
GMFTBY	71786b10c5	Adding the state-of-the-art contrastive search decoding methods for the codebase of generation_utils.py (#19477 ) * add: the contrastive search for generaton_utils * add: testing scripts for contrastive search under examples/text-generation * update the quality of codes * revise the docstring; make the generation_contrastive_search.py scripts; * revise the examples/pytorch/text-generation/run_generation_contrastive_search.py to the auto-APIs format * revise the necessary documents * fix: revise the docstring of generation_contrastive_search.py * Fix the code indentation * fix: revise the nits and examples in contrastive_search docstring. * fix the copyright * delete generation_contrastive_search.py * revise the logic in contrastive_search * update the intergration test and the docstring * run the tests over * add the slow decorate to the contrastive_search intergrate test * add more test * do the style, quality, consistency checks	2022-10-19 10:17:46 +01:00
Sylvain Gugger	a929f81e92	Repo utils test (#19696 ) * Create repo utils test job * Last occurence * Add tests for tests_fetcher * Better filtering * Let's learn more * Should fix * Should fix * Remove debug * Style * WiP WiP WiP WiP WiP WiP WiP WiP WiP * Quality * address review comments * Fix link	2022-10-18 13:47:36 -04:00
David Yang	a23819ed6a	Clean up deprecation warnings (#19654 ) * Clean up deprecation warnings Notes: Changed some strings in tests to raw strings, which will change the literal content of the strings as they are fed into whatever machine handles them. Test cases for past in the past/past_key_values switch changed/removed due to warning of impending removal * Add PILImageResampling abstraction for PIL.Image.Resampling	2022-10-18 13:34:47 -04:00
Sylvain Gugger	fb0bd7b7a8	Fix activations being all the same module (#19728 )	2022-10-18 11:56:45 -04:00
Yih-Dar	06a82a49ae	Specify TF framework in TF-related pipeline tests (#19719 ) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2022-10-18 17:40:28 +02:00
Nicolas Patry	63d13d768b	Improving `image-segmentation` pipeline tests. (#19710 ) This PR (https://github.com/huggingface/transformers/pull/19367) introduced a few breaking changes: - Removed an argument `mask_threshold`. - Broke the default behavior (instance vs panoptic in the function call) https://github.com/huggingface/transformers/pull/19367/files#diff-60f846b86fb6a21d4caf60f5b3d593a04accb8f248de3029cccae2ff898c5bc3R119-R120 - Broke the actual masks: https://github.com/huggingface/transformers/pull/1961 This PR is the start of a handful that will aim at bringing back the old behavior(s). - tests should not have to specify `task` by default, unless we want to modify the behavior and have a lower form of segmentation running) - `test_small_model_pt` should be working. This specific PR starts with adding more information to the masks hash because missing the actual mask was actual easy to miss (the hashes do change, but it was easy to miss that one code path wasn't properly updated). So we go from a simple `hash` to ``` {"hash": #smaller hash, "shape": (h, w), "white_pixels": n} ``` The `shape` should help make sure the interpolation of the mask works correctly, the `white_pixels` hopefully helps detect big regressions in their amount when the hash gets modified.	2022-10-18 16:33:53 +02:00
Nicolas Patry	ee2a80ecc0	add return_tensors parameter for feature_extraction 2 (#19707 ) * add return_tensors parameter for feature_extraction w/ test add return_tensor parameter for feature extraction Revert "Merge branch 'feature-extraction-return-tensor' of https://github.com/ajsanjoaquin/transformers into feature-extraction-return-tensor" This reverts commit d559da743b87914e111a84a98ba6dbb70d08ad88, reversing changes made to bbef89278650c04c090beb65637a8e9572dba222. call parameter directly Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com> Fixup. Update src/transformers/pipelines/feature_extraction.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Fix the imports. * Fixing the test by not overflowing the model capacity. Co-authored-by: AJ San Joaquin <ajsanjoaquin@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2022-10-18 16:29:00 +02:00
NielsRogge	dd523da577	Add table transformer [v2] (#19614 ) * First draft * Add conversion script * Make conversion work * Upload checkpoints * Add final fixes * Revert changes of conditional and deformable detr * Fix toctree, add and remove copied from * Use model type * Improve docs * Improve code example * Update copies * Add copied formt * Don't update conditional detr * Don't update deformable detr	2022-10-18 15:20:09 +02:00
Nicolas Patry	713eab45d3	🚨 🚨 🚨 [Breaking change] Deformable DETR intermediate representations (#19678 ) * [Breaking change] Deformable DETR intermediate representations - Fixes naturally the `object-detection` pipeline. - Moves from `[n_decoders, batch_size, ...]` to `[batch_size, n_decoders, ...]` instead. * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2022-10-18 09:00:39 -04:00
Antonio Carlos Falcão Petri	af150e4a1c	Allow user-managed Pool in Wav2Vec2ProcessorWithLM.batch_decode (#18351 ) * [Wav2Vec2] Allow user-managed Pool in Wav2Vec2ProcessorWithLM.batch_decode * [Wav2Vec2] Add user-managed LM's pool tests and usage examples * Improve styling Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * [Wav2Vec2] Fix hyperlink references Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2022-10-18 08:48:03 -04:00
NielsRogge	90071fe42b	Improve DETR models (#19644 ) * Improve DETR models * Fix Deformable DETR loss and matcher * Fixup * Fix integration tests * Improve variable names * Apply suggestion * Fix copies * Fix DeformableDetrLoss * Make Conditional DETR copy from Deformable DETR * Copy from deformable detr's hungarian matcher * Fix bug	2022-10-18 10:29:14 +02:00
Arthur	d356b89f3c	fix test whisper with new max length (#19668 )	2022-10-18 08:56:37 +02:00


2216 Commits