transformers

Mirror of https://github.com/huggingface/transformers.git, synced 2025-07-15 10:38:23 +06:00

Author	SHA1	Message	Date
Yih-Dar	73893df864	Fix `Owlv2ModelIntegrationTest::test_inference_object_detection` (#27793) * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-12-04 09:45:22 +01:00
Yih-Dar	5a551df92b	Fix `TvpModelIntegrationTests` (#27792) * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-12-04 09:40:42 +01:00
Arthur	c0b9db0914	[`ModelOnTheFlyConversionTester`] Mark as slow for now (#27823) * mark test as slow for now * style	2023-12-04 08:33:15 +01:00
NielsRogge	7edf8bfafd	Improve forward signature test (#27729) * First draft * Extend test_forward_signature * Update tests/test_modeling_common.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Revert suggestion --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2023-12-04 07:38:22 +01:00
Nicolas Patry	7b6324e18e	Make using safetensors files automated. (#27571 ) * [WIP] Make using safetensors files automated. If `use_safetensors=True` is used, and it doesn't exist: - Don't crash just yet - Lookup for an open PR containing it. - If yes, use that instead - If not, touch the space to convert, wait for conversion to be finished and the PR to be opened - Use that new PR - Profit. * Remove the token. * [Auto Safetensors] Websocket -> SSE (#27656) * Websocket -> SSE * Support sharded + tests +cleanup a * env var * Apply suggestions from code review * Thanks Simon * Thanks Wauplin Co-authored-by: Wauplin <lucainp@gmail.com> * Cleanup * Update tests * Tests should pass * Apply to other tests * Extend extension * relax requirement on latest hfh * Revert * Correct private handling & debug statements * Skip gated repos as of now * Address review comments Co-authored-by: ArthurZucker <arthur.zucker@gmail.com> --------- Co-authored-by: Lysandre Debut <hi@lysand.re> Co-authored-by: Lysandre <lysandre@huggingface.co> Co-authored-by: Wauplin <lucainp@gmail.com> Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr> Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>	2023-12-01 15:51:10 +01:00
Yoach Lacombe	29f1aee3b6	Add SeamlessM4T v2 (#27779 ) * add working convertion script * first non-working version of modeling code * update modeling code (working) * make style * make fix-copies * add config docstrings * add config to ignore docstrings formatage due to unconventional markdown * fix copies * fix generation num_return_sequences * enrich docs * add and fix tests beside integration tests * update integration tests * update repo id * add tie weights and make style * correct naming in .md * fix imports and so on * correct docstrings * fix fp16 speech forward * fix speechencoder attention * make style * fix copied from * rename SeamlessM4Tv2-v2 to SeamlessM4Tv2 * Apply suggestions on configuration Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * remove useless public models * fix private models + better naming for T2U models * clean speech encoder relative position embeddings * refactor chunk attention * add docstrings to chunk attention method * improve naming and docstrings * rename some attention variables + add temperature sampling in T2U model * rename DOCSTRINGS variable names * make style + remove 2 useless config parameters * enrich model card * remove any attention_head reference + fix temperature in T2U * new fmt and make style * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * rename spkr_id->speaker_id and change docstrings of get_char_input_ids * simplify v2attention * make style * Update seamless_m4t_v2.md * update code and tests with last update * update repo ids * fill article name, abstract andauthors * update not_doctested and slow_doc tests --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2023-11-30 20:24:43 +01:00
Joao Gante	510270af34	Generate: `GenerationConfig` throws an exception when `generate` args are passed (#27757)	2023-11-30 14:16:31 +00:00
Kashif Rasul	af8acc4760	[Time series] Add patchtst (#27581 ) * add distribution head to forecasting * formatting * Add generate function for forecasting * Add generate function to prediction task * formatting * use argsort * add past_observed_mask ordering * fix arguments * docs * add back test_model_outputs_equivalence test * formatting * cleanup * formatting * use ACT2CLS * formatting * fix add_start_docstrings decorator * add distribution head and generate function to regression task add distribution head and generate function to regression task. Also made add PatchTSTForForecastingOutput, PatchTSTForRegressionOutput. * add distribution head and generate function to regression task add distribution head and generate function to regression task. Also made add PatchTSTForForecastingOutput, PatchTSTForRegressionOutput. * fix typos * add forecast_masking * fixed tests * use set_seed * fix doc test * formatting * Update docs/source/en/model_doc/patchtst.md Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * better var names * rename PatchTSTTranspose * fix argument names and docs string * remove compute_num_patches and unused class * remove assert * renamed to PatchTSTMasking * use num_labels for classification * use num_labels * use default num_labels from super class * move model_type after docstring * renamed PatchTSTForMaskPretraining * bs -> batch_size * more review fixes * use hidden_state * rename encoder layer and block class * remove commented seed_number * edit docstring * Add docstring * formatting * use past_observed_mask * doc suggestion * make fix-copies * use Args: * add docstring * add docstring * change some variable names and add PatchTST before some class names * formatting * fix argument types * fix tests * change x variable to patch_input * format * formatting * fix-copies * Update tests/models/patchtst/test_modeling_patchtst.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * move loss to forward * Update 
src/transformers/models/patchtst/modeling_patchtst.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/transformers/models/patchtst/modeling_patchtst.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/transformers/models/patchtst/modeling_patchtst.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/transformers/models/patchtst/modeling_patchtst.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/transformers/models/patchtst/modeling_patchtst.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * formatting * fix a bug when pre_norm is set to True * output_hidden_states is set to False as default * set pre_norm=True as default * format docstring * format * output_hidden_states is None by default * add missing docs * better var names * docstring: remove default to False in output_hidden_states * change labels name to target_values in regression task * format * fix tests * change to forecast_mask_ratios and random_mask_ratio * change mask names * change future_values to target_values param in the prediction class * remove nn.Sequential and make PatchTSTBatchNorm class * black * fix argument name for prediction * add output_attentions option * add output_attentions to PatchTSTEncoder * formatting * Add attention output option to all classes * Remove PatchTSTEncoderBlock * create PatchTSTEmbedding class * use config in PatchTSTPatchify * Use config in PatchTSTMasking class * add channel_attn_weights * Add PatchTSTScaler class * add output_attentions arg to test function * format * Update doc with image patchtst.md * fix-copies * rename Forecast <-> Prediction * change name of a few parameters to match with PatchTSMixer. * Remove ForForecasting class to match with other time series models. 
make style * Remove PatchTSTForForecasting in the test * remove PatchTSTForForecastingOutput class * change test_forecast_head to test_prediction_head * style * fix docs * fix tests * change num_labels to num_targets * Remove PatchTSTTranspose * remove arguments in PatchTSTMeanScaler * remove arguments in PatchTSTStdScaler * add config as an argument to all the scaler classes * reformat * Add norm_eps for batchnorm and layernorm * reformat. * reformat * edit docstring * update docstring * change variable name pooling to pooling_type * fix output_hidden_states as tuple * fix bug when calling PatchTSTBatchNorm * change stride to patch_stride * create PatchTSTPositionalEncoding class and restructure the PatchTSTEncoder * formatting * initialize scalers with configs * edit output_hidden_states * style * fix forecast_mask_patches doc string * doc improvements * move summary to the start * typo * fix docstring * turn off masking when using prediction, regression, classification * return scaled output * adjust output when using distribution head * remove _num_patches function in the config * get config.num_patches from patchifier init * add output_attentions docstring, remove tuple in output_hidden_states * change SamplePatchTSTPredictionOutput and SamplePatchTSTRegressionOutput to SamplePatchTSTOutput * remove print("model_class: ", model_class) * change encoder_attention_heads to num_attention_heads * change norm to norm_layer * change encoder_layers to num_hidden_layers * change shared_embedding to share_embedding, shared_projection to share_projection * add output_attentions * more robust check of norm_type * change dropout_path to path_dropout * edit docstring * remove positional_encoding function and add _init_pe in PatchTSTPositionalEncoding * edit shape of cls_token and initialize it * add a check on the num_input_channels. 
* edit head_dim in the Prediction class to allow the use of cls_token * remove some positional_encoding_type options, remove learn_pe arg, initalize pe * change Exception to ValueError * format * norm_type is "batchnorm" * make style * change cls_token shape * Change forecast_mask_patches to num_mask_patches. Remove forecast_mask_ratios. * Bring PatchTSTClassificationHead on top of PatchTSTForClassification * change encoder_ffn_dim to ffn_dim and edit the docstring. * update variable names to match with the config * add generation tests * change num_mask_patches to num_forecast_mask_patches * Add examples explaining the use of these models * make style * Revert "Revert "[time series] Add PatchTST (#25927)" (#27486)" This reverts commit `78f6ed6c70`. * make style * fix default std scaler's minimum_scale * fix docstring * close code blocks * Update docs/source/en/model_doc/patchtst.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/patchtst/test_modeling_patchtst.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/patchtst/modeling_patchtst.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/patchtst/configuration_patchtst.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/patchtst/modeling_patchtst.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/patchtst/modeling_patchtst.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/patchtst/modeling_patchtst.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/patchtst/modeling_patchtst.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update 
src/transformers/models/patchtst/modeling_patchtst.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/patchtst/modeling_patchtst.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/patchtst/modeling_patchtst.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fix tests * add add_start_docstrings * move examples to the forward's docstrings * update prepare_batch * update test * fix test_prediction_head * fix generation test * use seed to create generator * add output_hidden_states and config.num_patches * add loc and scale args in PatchTSTForPredictionOutput * edit outputs if if not return_dict * use self.share_embedding to check instead checking type. * remove seed * make style * seed is an optional int * fix test * generator device * Fix assertTrue test * swap order of items in outputs when return_dict=False. * add mask_type and random_mask_ratio to unittest * Update modeling_patchtst.py * add add_start_docstrings for regression model * make style * update model path * Edit the ValueError comment in forecast_masking * update examples * make style * fix commented code * update examples: remove config from from_pretrained call * Edit example outputs * Set default target_values to None * remove config setting in regression example * Update configuration_patchtst.py * Update configuration_patchtst.py * remove config from examples * change default d_model and ffn_dim * norm_eps default * set has_attentions to Trye and define self.seq_length = self.num_patche * update docstring * change variable mask_input to do_mask_input * fix blank space. * change logger.debug to logger.warning. * remove unused PATCHTST_INPUTS_DOCSTRING * remove all_generative_model_classes * set test_missing_keys=True * remove undefined params in the docstring. 
--------- Co-authored-by: nnguyen <nnguyen@us.ibm.com> Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Nam Nguyen <namctin@gmail.com> Co-authored-by: Wesley Gifford <79663411+wgifford@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2023-11-29 13:36:38 +01:00
Susnato Dhar	dfbd209c25	CLVP Fixes (#27547) * fixes * more fixes * style fix * more fix * comments	2023-11-28 17:40:01 +01:00
Yih-Dar	30e92ea323	Trigger corresponding pipeline tests if `tests/utils/tiny_model_summary.json` is modified (#27693) * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-11-28 17:21:21 +01:00
NielsRogge	1fb3c23b41	Add BeitBackbone (#25952) * First draft * Add backwards compatibility * More improvements * More improvements * Improve error message * Address comment * Add conversion script * Fix style * Update code snippet * Adddress comment * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2023-11-28 08:38:32 +00:00
Charbel Abi Daher	2ca73e5ee3	Fixed passing scheduler-specific kwargs via TrainingArguments lr_scheduler_kwargs (#27595) * Fix passing scheduler-specific kwargs through TrainingArguments `lr_scheduler_kwargs` * Added test for lr_scheduler_kwargs	2023-11-28 08:33:45 +01:00
NielsRogge	59499bbe8b	Update forward signature test for vision models (#27681) * Update forward signature * Empty-Commit	2023-11-27 15:48:17 +01:00
jiqing-feng	1d7f406e19	fix assisted decoding assistant model inputs (#27503) * fix assisted decoding attention_cat * fix attention_mask for assisted decoding * fix attention_mask len * fix attn len * Use a more clean way to prepare assistant models inputs * fix param meaning * fix param name * fix assistant model inputs * update token type ids * fix assistant kwargs copy * add encoder-decoder tests of assisted decoding * check if assistant kwargs contains updated keys * revert test * fix whisper tests * fix assistant kwargs * revert whisper test * delete _extend funcs	2023-11-27 14:23:54 +00:00
Yanan Xie	b09912c8f4	Fix mistral generate for long prompt / response (#27548) * Fix mistral generate for long prompt / response * Add unit test * fix linter * fix linter * fix test * add assisted generation test for mistral and load the model in 4 bit + fa2	2023-11-27 10:18:41 +01:00
Yih-Dar	35551f9a0f	Fix `TVPModelTest` (#27695) * fix * fix * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-11-24 19:47:50 +01:00
Yih-Dar	7293fdc5b9	Deprecate `TransfoXL` (#27607) * fix * fix * trigger * Apply suggestions from code review Co-authored-by: Lysandre Debut <hi@lysand.re> * tic * revert * revert --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: Lysandre Debut <hi@lysand.re>	2023-11-24 11:48:02 +01:00
Yih-Dar	623432dcc9	Skip pipeline tests for 2 models for now (#27687) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-11-24 09:43:20 +01:00
Yih-Dar	b8db265bc6	Update tiny model summary file (#27388) * update * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-11-23 21:00:39 +01:00
dg845	7f6a804d30	Add UnivNet Vocoder Model for Tortoise TTS Diffusers Integration (#24799 ) * initial commit * Add inital testing files and modify __init__ files to add UnivNet imports. * Fix some bugs * Add checkpoint conversion script and add references to transformers pre-trained model. * Add UnivNet entries for auto. * Add initial docs for UnivNet. * Handle input and output shapes in UnivNetGan.forward and add initial docstrings. * Write tests and make them pass. * Write docs. * Add UnivNet doc to _toctree.yml and improve docs. * fix typo * make fixup * make fix-copies * Add upsample_rates parameter to config and improve config documentation. * make fixup * make fix-copies * Remove unused upsample_rates config parameter. * apply suggestions from review * make style * Verify and add reason for skipped tests inherited from ModelTesterMixin. * Add initial UnivNetGan integration tests * make style * Remove noise_length input to UnivNetGan and improve integration tests. * Fix bug and make style * Make UnivNet integration tests pass * Add initial code for UnivNetFeatureExtractor. * make style * Add initial tests for UnivNetFeatureExtractor. * make style * Properly initialize weights for UnivNetGan * Get feature extractor fast tests passing * make style * Get feature extractor integration tests passing * Get UnivNet integration tests passing * make style * Add UnivNetGan usage example * make style and use feature extractor from hub in integration tests * Update tips in docs * apply suggestions from review * make style * Calculate padding directly instead of using get_padding methods. * Update UnivNetFeatureExtractor.to_dict to be UnivNet-specific. * Update feature extractor to support using model(*inputs) and add the ability to generate noise and pad the end of the spectrogram in __call__. Perform padding before generating noise to ensure the shapes are correct. * Rename UnivNetGan.forward's noise_waveform argument to noise_sequence. 
* make style * Add tests to test generating noise and padding the end for UnivNetFeatureExtractor.__call__. * Add tests for checking batched vs unbatched inputs for UnivNet feature extractor and model. * Add expected mean and stddev checks to the integration tests and make them pass. * make style * Make it possible to use model(*inputs), where inputs is the output of the feature extractor. fix typo in UnivNetGanConfig example * Calculate spectrogram_zero from other config values. * apply suggestions from review * make style * Refactor UnivNet conversion script to use load_state_dict (following persimmon). * Rename UnivNetFeatureExtractor to UnivNetGanFeatureExtractor. * make style * Switch to using torch.tensor and torch.testing.assert_close for testing expected values/slices. * make style * Use config in UnivNetGan modeling blocks. * make style * Rename the spectrogram argument of UnivNetGan.forward to input_features, following Whisper. * make style * Improving padding documentation. * Add UnivNet usage example to the docs. * apply suggestions from review * Move dynamic_range_compression computation into the mel_spectrogram method of the feature extractor. * Improve UnivNetGan.forward return docstring. * Update table in docs/source/en/index.md. * make fix-copies * Rename UnivNet components to have pattern UnivNet. make style * make fix-copies * Update docs * make style * Increase tolerance on flaky unbatched integration test. * Remove torch.no_grad decorators from UnivNet integration tests to try to avoid flax/Tensorflow test errors. * Add padding_mask argument to UnivNetModel.forward and add batch_decode feature extractor method to remove padding. * Update documentation and clean up padding code. * make style * make style * Remove torch dependency from UnivNetFeatureExtractor. * make style * Fix UnivNetModel usage example * Clean up feature extractor code/docstrings. 
* apply suggestions from review * make style * Add comments for tests skipped via ModelTesterMixin flags. * Add comment for model parallel tests skipped via the test_model_parallel ModelTesterMixin flag. * Add # Copied from statements to copied UnivNetFeatureExtractionTest tests. * Simplify UnivNetFeatureExtractorTest.test_batch_decode. * Add support for unbatched padding_masks in UnivNetModel.forward. * Refactor unbatched padding_mask support. * make style	2023-11-22 17:21:36 +01:00
Patrick von Platen	4151fbb49c	[Whisper] Add sequential longform decoding (#27492 ) * [Whisper] Add seq gen * [Whisper] Add seq gen * more debug * Fix whisper logit processor * Improve whisper code further * Fix more * more debug * more debug * Improve further * Add tests * Prep for batch size > 1 * Get batch_size>1 working * Correct more * Add extensive tests * more debug * more debug * more debug * add more tests * more debug * Apply suggestions from code review * more debug * add comments to explain the code better * add comments to explain the code better * add comments to explain the code better * Add more examples * add comments to explain the code better * fix more * add comments to explain the code better * add comments to explain the code better * correct * correct * finalize * Apply suggestions from code review * Apply suggestions from code review	2023-11-22 13:27:34 +01:00
fxmarty	7f04373865	Explicitely specify `use_cache=True` in Flash Attention tests (#27635) explicit use_cache=True	2023-11-22 01:53:10 +09:00
jiqing-feng	c770600fde	TVP model (#25856 ) * tvp model for video grounding add tokenizer auto fix param in TVPProcessor add docs clear comments and enable different torch dtype add image processor test and model test and fix code style * fix conflict * fix model doc * fix image processing tests * fix tvp tests * remove torch in processor * fix grammar error * add more details on tvp.md * fix model arch for loss, grammar, and processor * add docstring and do not regard TvpTransformer, TvpVisionModel as individual model * use pad_image * update copyright * control first downsample stride * reduce first only works for ResNetBottleNeckLayer * fix param name * fix style * add testing * fix style * rm init_weight * fix style * add post init * fix comments * do not test TvpTransformer * fix warning * fix style * fix example * fix config map * add link in config * fix comments * fix style * rm useless param * change attention * change test * add notes * fix comments * fix tvp * import checkpointing * fix gradient checkpointing * Use a more accurate example in readme * update * fix copy * fix style * update readme * delete print * remove tvp test_forward_signature * remove TvpTransformer * fix test init model * merge main and make style * fix tests and others * fix image processor * fix style and model_input_names * fix tests	2023-11-21 16:41:55 +00:00
amyeroberts	0145c6825e	Fix tracing dinov2 (#27561) * Enable tracing with DINOv2 model * ABC * Add note to model doc	2023-11-21 14:28:38 +00:00
fxmarty	82cc0a79ac	Fix flash attention bugs with Mistral and Falcon (#27625) * fix various bugs with flash attention * bump * fix test * fix mistral * use skiptest instead of return that may be misleading * fix on review	2023-11-21 23:20:44 +09:00
Leo Tronchon	851a4f7088	Idefics: Fix information leak with cross attention gate in modeling (#26839 ) * fix image_attention gate in idefics modeling * update comment * cleaner gating * fix gate condition * create attention gate once * update comment * update doc of cross-attention forward * improve comment * bring back no_images * pass cross_attention_gate similarly to no_images gate * add information on gate shape * fix no_images placement * make tests for gate * take off no_images logic * update test based on comments * raise value error if cross_attention_gate is None * send cross_attention_gate to device * Revert "send cross_attention_gate to device" This reverts commit `054f842284`. * send cross_attention_gate to device * fix device in test + nit * fill hidden_states with zeros instead of multiplying with the gate * style * Update src/transformers/models/idefics/modeling_idefics.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/idefics/modeling_idefics.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2023-11-21 13:26:01 +01:00
Dave Berenbaum	8eb9e29d8d	dvclive callback: warn instead of fail when logging non-scalars (#27608) * dvclive callback: warn instead of fail when logging non-scalars * tests: log lr as scalar	2023-11-21 09:29:51 +01:00
Younes Belkada	e66984f995	[`FA-2`] Add fa2 support for `from_config` (#26914) * add fa2 support for from_config * Update test_modeling_common.py	2023-11-20 16:45:55 +01:00
Joel Tang	dbf7bfafa7	Fix idx2sym not loaded from pretrained vocab file in Transformer XL (#27589) * Load idx2sym from pretrained vocab file in Transformer XL When loading vocab file from a pretrained tokenizer for Transformer XL, although the pickled vocabulary file contains a idx2sym key, it isn't loaded, because it is discarded as the empty list already exists as an attribute. Solution is to explicitly take it into account, just like for sym2idx. * ran make style	2023-11-20 07:56:18 +01:00
V.Prasanna kumar	ffbcfc0166	Broken links fixed related to datasets docs (#27569) fixed the broken links belogs to dataset library of transformers	2023-11-17 13:44:09 -08:00
Joao Gante	913d03dc5e	Generate: fix flaky tests (#27543)	2023-11-17 10:15:00 +00:00
Yih-Dar	fe3ce061c4	Skip some fuyu tests (#27553) * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-11-17 10:35:04 +01:00
Joao Gante	12b50c6130	Generate: improve assisted generation tests (#27540)	2023-11-16 18:54:20 +00:00
Arthur	651408a077	[`Styling`] stylify using ruff (#27144 ) * try to stylify using ruff * might need to remove these changes? * use ruf format andruff check * use isinstance instead of type comparision * use # fmt: skip * use # fmt: skip * nits * soem styling changes * update ci job * nits isinstance * more files update * nits * more nits * small nits * check and format * revert wrong changes * actually use formatter instead of checker * nits * well docbuilder is overwriting this commit * revert notebook changes * try to nuke docbuilder * style * fix feature exrtaction test * remve `indent-width = 4` * fixup * more nits * update the ruff version that we use * style * nuke docbuilder styling * leve the print for detected changes * nits * Remove file I/O Co-authored-by: charliermarsh <charlie.r.marsh@gmail.com> * style * nits * revert notebook changes * Add # fmt skip when possible * Add # fmt skip when possible * Fix * More ` # fmt: skip` usage * More ` # fmt: skip` usage * More ` # fmt: skip` usage * NIts * more fixes * fix tapas * Another way to skip * Recommended way * Fix two more fiels * Remove asynch Remove asynch --------- Co-authored-by: charliermarsh <charlie.r.marsh@gmail.com>	2023-11-16 17:43:19 +01:00
Lucain	fd65aa9818	Set `usedforsecurity=False` in hashlib methods (FIPS compliance) (#27483) * Set usedforsecurity=False in hashlib methods (FIPS compliance) * trigger ci * tokenizers version * deps * bump hfh version * let's try this	2023-11-16 14:29:53 +00:00
Patrick von Platen	5603fad247	Revert "add attention_mask and position_ids in assisted model" (#27523) * Revert "add attention_mask and position_ids in assisted model (#26892)" This reverts commit `184f60dcec`. * more debug	2023-11-16 14:50:39 +01:00
Marc Sun	1ac599d90f	Fix offload disk for loading derivated model checkpoint into base model (#27253) * fix * style * add test	2023-11-15 14:58:08 -05:00
Arthur	48ba1e074f	[ `PretrainedConfig`] Improve messaging (#27438) * import hf error * nits * fixup * catch the error at the correct place * style * improve message a tiny bit * Update src/transformers/utils/hub.py Co-authored-by: Lucain <lucainp@gmail.com> * add a test --------- Co-authored-by: Lucain <lucainp@gmail.com>	2023-11-15 14:10:39 +01:00
Xin Qiu	453079c7f8	🚨🚨 Fix beam score calculation issue for decoder-only models (#27351) * Fix beam score calculation issue for decoder-only models * Update beam search test and fix code quality issue * Fix beam_sample, group_beam_search and constrained_beam_search * Split test for pytorch and TF, add documentation --------- Co-authored-by: Xin Qiu <xin.qiu@sentient.ai>	2023-11-15 12:49:14 +00:00
Arthur	1e0e2dd376	[`CircleCI`] skip test_assisted_decoding_sample for everyone (#27511) * skip 4 tests * nits * style * wow it's not my day * skip new failing tests * style * skip for NLLB MoE as well * skip `test_assisted_decoding_sample` for everyone	2023-11-15 10:17:51 +01:00
NielsRogge	cc0dc24bc9	[Fuyu] Add tests (#27001) * Add tests * Add integration test * More improvements * Fix tests * Fix style * Skip gradient checkpointing tests * Update script * Remove scripts * Remove Fuyu from auto mapping * Fix integration test * More improvements * Remove file * Add Fuyu to slow documentation tests * Address comments * Clarify comment	2023-11-15 09:33:04 +01:00
Arthur	186c077513	[`CI-test_torch`] skip test_tf_from_pt_safetensors and `test_assisted_decoding_sample` (#27508) * skip 4 tests * nits * style * wow it's not my day * skip new failing tests * style * skip for NLLB MoE as well	2023-11-15 08:39:29 +01:00
Zach Mueller	067c4a310d	Have seq2seq just use gather (#27025) * Have seq2seq just use gather * Change * Reset after * Make slow * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Clean * Simplify and just use gather * Update tests/trainer/test_trainer_seq2seq.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * gather always for seq2seq --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2023-11-14 14:54:44 -05:00
amyeroberts	78f6ed6c70	Revert "[time series] Add PatchTST (#25927)" (#27486) The model was merged before final review and approval. This reverts commit `2ac5b9325e`.	2023-11-14 12:24:00 +00:00
Sanchit Gandhi	a4616c6767	[Whisper] Fix pipeline test (#27442)	2023-11-14 11:18:26 +00:00
Sihan Chen	4309abedbc	Add speecht5 batch generation and fix wrong attention mask when padding (#25943 ) * fix speecht5 wrong attention mask when padding * enable batch generation and add parameter attention_mask * fix doc * fix format * batch postnet inputs, return batched lengths, and consistent to old api * fix format * fix format * fix the format * fix doc-builder error * add test, cross attention and docstring * optimize code based on reviews * docbuild * refine * not skip slow test * add consistent dropout for batching * loose atol * add another test regarding to the consistency of vocoder * fix format * refactor * add return_concrete_lengths as parameter for consistency w/wo batching * fix review issues * fix cross_attention issue	2023-11-14 09:54:09 +00:00
Arthur	e107ae364e	[`CI-test_torch`] skip `test_tf_from_pt_safetensors` for 4 models (#27481 ) * skip 4 tests * nits * style * wow it's not my day	2023-11-14 10:34:03 +01:00
Younes Belkada	d71fa9f618	[`Peft`] `modules_to_save` support for peft integration (#27466 ) * `modules_to_save` support for peft integration * Update docs/source/en/peft.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * slightly elaborate test --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2023-11-14 10:32:57 +01:00
Gift Sinthong	2ac5b9325e	[time series] Add PatchTST (#25927 ) * Initial commit of PatchTST model classes Co-authored-by: Phanwadee Sinthong <phsinthong@gmail.com> Co-authored-by: Nam Nguyen <namctin@gmail.com> Co-authored-by: Vijay Ekambaram <vijaykr.e@gmail.com> Co-authored-by: Ngoc Diep Do <55230119+diepi@users.noreply.github.com> Co-authored-by: Wesley Gifford <79663411+wgifford@users.noreply.github.com> * Add PatchTSTForPretraining * update to include classification Co-authored-by: Phanwadee Sinthong <phsinthong@gmail.com> Co-authored-by: Nam Nguyen <namctin@gmail.com> Co-authored-by: Vijay Ekambaram <vijaykr.e@gmail.com> Co-authored-by: Ngoc Diep Do <55230119+diepi@users.noreply.github.com> Co-authored-by: Wesley Gifford <79663411+wgifford@users.noreply.github.com> * clean up auto files * Add PatchTSTForPrediction * Fix relative import * Replace original PatchTSTEncoder with ChannelAttentionPatchTSTEncoder * temporary adding absolute path + add PatchTSTForForecasting class * Update base PatchTSTModel + Unittest * Update ForecastHead to use the config class * edit cv_random_masking, add mask to model output * Update configuration_patchtst.py * add masked_loss to the pretraining * add PatchEmbeddings * Update configuration_patchtst.py * edit loss which considers mask in the pretraining * remove patch_last option * Add commits from internal repo * Update ForecastHead * Add model weight initilization + unittest * Update PatchTST unittest to use local import * PatchTST integration tests for pretraining and prediction * Added PatchTSTForRegression + update unittest to include label generation * Revert unrelated model test file * Combine similar output classes * update PredictionHead * Update configuration_patchtst.py * Add Revin * small edit to PatchTSTModelOutputWithNoAttention * Update modeling_patchtst.py * Updating integration test for forecasting * Fix unittest after class structure changed * docstring updates * change input_size to num_input_channels * more 
formatting * Remove some unused params * Add a comment for pretrained models * add channel_attention option add channel_attention option and remove unused positional encoders. * Update PatchTST models to use HF's MultiHeadAttention module * Update paper + github urls * Fix hidden_state return value * Update integration test to use PatchTSTForForecasting * Adding dataclass decorator for model output classes * Run fixup script * Rename model repos for integration test * edit argument explanation * change individual option to shared_projection * style * Rename integration test + import cleanup * Fix outpu_hidden_states return value * removed unused mode * added std, mean and nops scaler * add initial distributional loss for predition * fix typo in docs * add generate function * formatting * add num_parallel_samples * Fix a typo * copy weighted_average function, edit PredictionHead * edit PredictionHead * add distribution head to forecasting * formatting * Add generate function for forecasting * Add generate function to prediction task * formatting * use argsort * add past_observed_mask ordering * fix arguments * docs * add back test_model_outputs_equivalence test * formatting * cleanup * formatting * use ACT2CLS * formatting * fix add_start_docstrings decorator * add distribution head and generate function to regression task add distribution head and generate function to regression task. Also made add PatchTSTForForecastingOutput, PatchTSTForRegressionOutput. * add distribution head and generate function to regression task add distribution head and generate function to regression task. Also made add PatchTSTForForecastingOutput, PatchTSTForRegressionOutput. 
* fix typos * add forecast_masking * fixed tests * use set_seed * fix doc test * formatting * Update docs/source/en/model_doc/patchtst.md Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * better var names * rename PatchTSTTranspose * fix argument names and docs string * remove compute_num_patches and unused class * remove assert * renamed to PatchTSTMasking * use num_labels for classification * use num_labels * use default num_labels from super class * move model_type after docstring * renamed PatchTSTForMaskPretraining * bs -> batch_size * more review fixes * use hidden_state * rename encoder layer and block class * remove commented seed_number * edit docstring * Add docstring * formatting * use past_observed_mask * doc suggestion * make fix-copies * use Args: * add docstring * add docstring * change some variable names and add PatchTST before some class names * formatting * fix argument types * fix tests * change x variable to patch_input * format * formatting * fix-copies * Update tests/models/patchtst/test_modeling_patchtst.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * move loss to forward * Update src/transformers/models/patchtst/modeling_patchtst.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/transformers/models/patchtst/modeling_patchtst.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/transformers/models/patchtst/modeling_patchtst.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/transformers/models/patchtst/modeling_patchtst.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/transformers/models/patchtst/modeling_patchtst.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * formatting * fix a bug when pre_norm is set to True * output_hidden_states is set to False as default * set pre_norm=True as default * format docstring * format * output_hidden_states is None by default 
* add missing docs * better var names * docstring: remove default to False in output_hidden_states * change labels name to target_values in regression task * format * fix tests * change to forecast_mask_ratios and random_mask_ratio * change mask names * change future_values to target_values param in the prediction class * remove nn.Sequential and make PatchTSTBatchNorm class * black * fix argument name for prediction * add output_attentions option * add output_attentions to PatchTSTEncoder * formatting * Add attention output option to all classes * Remove PatchTSTEncoderBlock * create PatchTSTEmbedding class * use config in PatchTSTPatchify * Use config in PatchTSTMasking class * add channel_attn_weights * Add PatchTSTScaler class * add output_attentions arg to test function * format * Update doc with image patchtst.md * fix-copies * rename Forecast <-> Prediction * change name of a few parameters to match with PatchTSMixer. * Remove ForForecasting class to match with other time series models. make style * Remove PatchTSTForForecasting in the test * remove PatchTSTForForecastingOutput class * change test_forecast_head to test_prediction_head * style * fix docs * fix tests * change num_labels to num_targets * Remove PatchTSTTranspose * remove arguments in PatchTSTMeanScaler * remove arguments in PatchTSTStdScaler * add config as an argument to all the scaler classes * reformat * Add norm_eps for batchnorm and layernorm * reformat. 
* reformat * edit docstring * update docstring * change variable name pooling to pooling_type * fix output_hidden_states as tuple * fix bug when calling PatchTSTBatchNorm * change stride to patch_stride * create PatchTSTPositionalEncoding class and restructure the PatchTSTEncoder * formatting * initialize scalers with configs * edit output_hidden_states * style * fix forecast_mask_patches doc string --------- Co-authored-by: Gift Sinthong <gift.sinthong@ibm.com> Co-authored-by: Nam Nguyen <namctin@gmail.com> Co-authored-by: Vijay Ekambaram <vijaykr.e@gmail.com> Co-authored-by: Ngoc Diep Do <55230119+diepi@users.noreply.github.com> Co-authored-by: Wesley Gifford <79663411+wgifford@users.noreply.github.com> Co-authored-by: Wesley M. Gifford <wmgifford@us.ibm.com> Co-authored-by: nnguyen <nnguyen@us.ibm.com> Co-authored-by: Ngoc Diep Do <diiepy@gmail.com> Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com> Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2023-11-13 19:06:32 +01:00
Younes Belkada	7b139023c3	[`AWQ` ] Addresses TODO for awq tests (#27467 ) addresses todo for awq tests	2023-11-13 18:18:41 +01:00
NielsRogge	2422c38de6	Add DINOv2 depth estimation (#26092 ) * First draft * Fix style * More improvements * Fix tests * Fix tests * Convert checkpoint * Improve DPTImageProcessor * Remove scripts, improve conversion script * Remove print statements * Fix test * Improve docstring * More improvements * Fix style * Fix image processor * Add tests * Address comments * Address comments * Make bias backwards compatible * Address comment * Address comment * Address comment * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Address comments * Add flag * Add tests * Make tests smaller * Use regular BackboneOutput * Fix all tests * Update test * Convert more checkpoints * Convert giant checkpoints, add integration test * Rename size_divisibility to size_divisor --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2023-11-13 16:20:42 +00:00
Lysandre Debut	68ae3be7f5	Fix `from_pt` flag when loading with `safetensors` (#27394 ) * Fix * Tests * Fix	2023-11-13 15:18:19 +01:00
Lysandre Debut	9dc8fe1b32	Default to msgpack for safetensors (#27460 ) * Default to msgpack for safetensors * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2023-11-13 15:17:01 +01:00
Arthur	b97cab7e6d	Remove-auth-token (#27060 ) * don't use `use_auth_token`internally * let's use token everywhere * fixup	2023-11-13 14:20:54 +01:00
amyeroberts	ed115b3473	Normalize floating point cast (#27249 ) * Normalize image - cast input images to float32. This is done if the input image isn't of floating type. Issues can occur when do_rescale=False is set in an image processor. When this happens, the image passed to the call is of type uint8 because of the type casting that happens in resize because of the PIL image library. As the mean and std values are cast to match the image dtype, this can cause NaNs and infs to appear in the normalized image, as the floating values being used to divide the image are now set to 0. The reason the mean and std values are cast is because previously they were set as float32 by default. However, if the input image was of type float16, the normalization would result in the image being upcast to float32 too. * Add tests * Remove float32 cast	2023-11-10 15:35:27 +00:00
Susnato Dhar	e1c3ac2551	Add Phi-1 and Phi-1_5 (#26170 ) * only dir not even init * init * tokenizer removed and reference of codegen added * modeling file updated a lot remaining app_rotary_emb * conversion script done * conversion script fixed, a lot of factoring done and most tests pass * added token_clf and extractive_QA_head * integration tests pass * flash attn tests pass! * config done * more docs in modeling file * some style fix * style and others * doc test error fix * more doc fix * some attention fixes * most fixes * style and other fixes * docs fix and config * doc fix * some comments * conversion script updated * conversion script updated * Revert "conversion script updated" This reverts commit e92378c54084ec0747041b113083d1746ecb6c7f. * final comments * add Phi to language_modeling.md * edit phi.md file * rebase and fix * removed phi-1.5 example * changed model_type from 'phi'->'mixformer-sequential' * small change * small change * revert small change * changed mixformer-sequential->phi * small change * added phi-1.5 example instead of phi-1 * doc test might pass now * rebase and small change * added the dropout layer * more fixes * modified .md file * very very small doc change	2023-11-10 15:28:30 +00:00
Arthur	68afca3e69	[`AttentionMaskConverter`] Fix-mask-inf (#27114 ) * fix? * actual fix * fixups * add dataclass to the attention mask converter * refine testing suite * make sure there are no overflows * update the test	2023-11-10 15:22:43 +01:00
Susnato Dhar	7e9f10ac94	Add CLVP (#24745 ) * init commit * attention arch done except rotary emb * rotary emb done * text encoder working * outputs matching * arch first pass done * make commands done, tests and docs remaining * all tests passed, only docs remaining * docs done * doc-builder fix * convert script removed(not relevant) * minor comments done * added ckpt conversion script * tokenizer done * very minor fix of index.md 2 * mostly make fixup related * all done except fe and rotary emb * very small change * removed unidecode dependency * style changes * tokenizer removed require_backends * added require_inflect to tokenizer tests * removed VOCAB_FILES in tokenizer test * inflect dependency removed * added rotary pos emb cache and simplified the apply method * style * little doc change * more comments * feature extractor added * added processor * auto-regressive config added * added CLVPConditioningEncoder * comments done except the test one * weights added successfully (NOT tested) * tokenizer fix with numbers * generate outputs matching * almost tests passing Integ tests not written * Integ tests added * major CUDA error fixed * docs done * rebase and multiple fixes * fixed rebase overwrites * generate code simplified and tests for AutoRegressive model added * minor changes * refactored gpt2 code in clvp file * weights done and all code refactored * mostly done except the fast_tokenizer * doc test fix * config file's doc fixes * more config fix * more comments * tokenizer comments mostly done * modeling file mostly refactored and can load modules * ClvpEncoder tested * ClvpDecoder, ClvpModel and ClvpForCausalLM tested * integration and all tests passed * more fixes * docs almost done * ckpt conversion refactored * style and some failing tests fix * comments * temporary output fix but test_assisted_decoding_matches_greedy_search test fails * majority changes done * use_cache outputs same now! Along with the assisted_greedy_decoding test fix * more comments * more comments * prepare_inputs_for_generation fixed and _prepare_model_inputs added * style fix * clvp.md change * moved clvpconditionalencoder norms * add model to new index * added tokenizer input_ids_with_special_tokens * small fix * config mostly done * added config-tester and changed conversion script * more comments * comments * style fix * some comments * tokenizer changed back to prev state * small comments * added output hidden states for the main model * style fix * comments * small change * revert small change * . * Update clvp.md * Update test_modeling_clvp.py * :) * some minor change * new fixes * remove to_dict from FE	2023-11-10 13:49:10 +00:00
Younes Belkada	fd685cfd59	[`Quantization`] Add str to enum conversion for AWQ (#27320 ) * add str to enum conversion * fixup * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2023-11-10 13:45:00 +01:00
Yoach Lacombe	51a98c40ee	remove failing tests and clean FE files (#27414 ) * remove failing tests and clean FE files * remove same similar text from tvlt	2023-11-09 18:35:42 +00:00
Lucain	e38348ae8f	Fix RequestCounter to make it more future-proof (#27406 ) * Fix RequestCounter to make it more future-proof * code quality	2023-11-09 18:53:26 +01:00
Yih-Dar	3258ff9330	use `pytest.mark` directly (#27390 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-11-09 13:32:54 +01:00
Hz, Ji	c5d7754b11	device-agnostic deepspeed testing (#27342 )	2023-11-09 12:34:13 +01:00
amyeroberts	9999b73968	Skip failing cache call tests (#27393 ) * Skip failing cache call tests * Fixup	2023-11-09 11:03:37 +00:00
Arthur	085ea7e56c	[`CodeLlamaTokenizer`] Nit, update __init__ to make sure the AddedTokens are not normalized because they are special (#27359 ) * make sure tokens are properly initialized for codellama slow * add more pretrained models * style * test more tokenizers checkpoints	2023-11-09 10:15:10 +01:00
Sourab Mangrulkar	7ecd229ba4	Smangrul/fix failing ds ci tests (#27358 ) * fix failing DeepSpeed CI tests due to `safetensors` being default * debug * remove debug statements * resolve comments * Update test_deepspeed.py	2023-11-09 11:47:24 +05:30
Sergii Dymchenko	0e402e1478	Update deprecated `torch.range` in `test_modeling_ibert.py` (#27355 ) * Update deprecated torch.range * Remove comment	2023-11-08 20:58:36 +01:00
Yoach Lacombe	a5bee89c9d	Add Flash Attention 2 support to Bark (#27364 ) * change handmade attention mask to _prepare_4d_attention_mask * add flashattention2 support in Bark * add flashattention2 tests on BarkSemanticModel * make style * fix flashattention and tests + make style * fix memory leak and allow Bark to pass flash attention to sub-models * make style * Apply suggestions from code review Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * remove unnecessary code from tests + justify overriding * Update tests/models/bark/test_modeling_bark.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * make style --------- Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2023-11-08 17:06:35 +00:00
Sanchit Gandhi	f16ff0f07e	MusicGen Update (#27084 ) * [MusicGen] Add stereo model * safe serialization * Update src/transformers/models/musicgen/modeling_musicgen.py * split over 2 lines * fix slow tests on cuda	2023-11-08 13:26:02 +00:00
Yoach Lacombe	be74b2ead6	Add numpy alternative to FE using torchaudio (#26339 ) * add audio_utils usage in the FE of SpeechToText * clean unnecessary parameters of AudioSpectrogramTransformer FE * add audio_utils usage in AST * add serialization tests and function to FEs * make style * remove use_torchaudio and move to_dict to FE * test audio_utils usage * make style and fix import (remove torchaudio dependency import) * fix torch dependency for jax and tensor tests * fix typo * clean tests with suggestions * add lines to test if is_speech_available is False	2023-11-08 07:39:37 +00:00
Yoach Lacombe	ac5d4cf6de	Fix Bark batching feature (#27271 ) * fix bark batching * make style * add tests and make style	2023-11-07 18:32:00 +00:00
Joao Gante	90b4adc1f1	Generate: skip tests on unsupported models instead of passing (#27265 )	2023-11-07 12:08:28 +00:00
Sanchit Gandhi	da7ea9a4e3	[Whisper] Block language/task args for English-only (#27322 ) * [Whisper] Block language/task args for English-only * Update src/transformers/models/whisper/modeling_whisper.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2023-11-07 10:04:23 +00:00
Yih-Dar	1b20e2bb42	Fix `Kosmos2Processor` batch mode (#27323 ) * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-11-06 19:05:50 +01:00
Hz, Ji	1ffc4dee5b	enable memory tracker metrics for npu (#27280 )	2023-11-06 13:44:21 +00:00
Susnato Dhar	1ac2463dfe	[`FA2`] Add flash attention for `DistilBert` (#26489 ) * flash attention added for DistilBert * fixes * removed padding_masks * Update modeling_distilbert.py * Update test_modeling_distilbert.py * style fix	2023-11-03 16:07:54 +00:00
Younes Belkada	8f1a43cd91	[`PEFT` / `Tests` ] Fix peft integration failing tests (#27258 ) fix peft integration issues	2023-11-03 12:23:02 +01:00
Tom Aarsen	05ea7b79e6	Refactor: Use Llama RoPE implementation for Falcon (#26933 ) * Use Llama RoPE implementation for Falcon + Add copy functionalities * Use standard cache format for Falcon * Simplify apply_rotary_pos_emb, copy from Llama * Remove unnecessary cache conversion test We don't need to convert any caches anymore! * Resolve copy complaint	2023-11-03 11:05:55 +00:00
Yoach Lacombe	0ed6729bb1	Enrich TTS pipeline parameters naming (#26473 ) * enrich TTS pipeline docstring for clearer forward_params use * change token lengths * update Pipeline parameters * correct docstring and make style * fix tests * make style * change music prompt Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * raise errors if generate_kwargs with forward-only models * make style --------- Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2023-11-02 17:06:56 +00:00
Joao Gante	a6c82d4567	Generate: return `past_key_values` (#25086 )	2023-11-02 15:39:21 +00:00
Nicolas Patry	8801861d2d	Fixing m4t. (#27240 ) * Fixing m4t. * Trying to remove comparison ? Odd test failure. * Adding shared. But why on earth does it hang ???? * Putting back the model weights checks the test is silently failing on cuda. * Fix style + unremoved comment.	2023-11-02 15:32:17 +01:00
Lysandre Debut	443bf5e9e2	Fix safetensors failing tests (#27231 ) * Fix Kosmos2 * Fix ProphetNet * Fix MarianMT * Fix M4T * XLM ProphetNet * ProphetNet fix * XLM ProphetNet * Final M4T fixes * Tied weights keys * Revert M4T changes * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2023-11-02 15:03:09 +01:00
Pablo Montalvo	8a312956fd	Fuyu: improve image processing (#27007 ) * Fix Fuyu image scaling bug It could produce negative padding and hence inference errors for certain image sizes. * initial rework commit * add batching capabilities, refactor image processing * add functional batching for a list of images and texts * make args explicit * Fuyu processing update (#27133) * Add file headers * Add file headers * First pass - preprocess method with standard args * First pass image processor rework * Small tweaks * More args and docstrings * Tidying iterating over batch * Tidying up * Modify to have quick tests (for now) * Fix up * BatchFeature * Passing tests * Add tests for processor * Sense check when patchifying * Add some tests * FuyuBatchFeature * Post-process box coordinates * Update to `size` in processor * Remove unused and duplicate constants * Store unpadded dims after resize * Fix up * Return FuyuBatchFeature * Get unpadded sizes after resize * Update exception * Fix return * Convert input `<box>` coordinates to model format. * Post-process point coords, support multiple boxes/points in a single sequence * Replace constants * Update src/transformers/models/fuyu/image_processing_fuyu.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Preprocess List[List[image]] * Update src/transformers/models/fuyu/image_processing_fuyu.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update to Amy's latest state. 
* post-processing returns a list of tensors * Fix error when target_sizes is None Co-authored-by: Pablo Montalvo <pablo.montalvo.leroux@gmail.com> * Update src/transformers/models/fuyu/image_processing_fuyu.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update src/transformers/models/fuyu/image_processing_fuyu.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update src/transformers/models/fuyu/image_processing_fuyu.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update src/transformers/models/fuyu/image_processing_fuyu.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Review comments * Update src/transformers/models/fuyu/image_processing_fuyu.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Fix up * Fix up --------- Co-authored-by: Ubuntu <ubuntu@ip-172-31-72-126.ec2.internal> Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: Pablo Montalvo <pablo.montalvo.leroux@gmail.com> * Fix conflicts in fuyu_follow_up_image_processing (#27228) fixing conflicts and updating on main * Revert "Fix conflicts in fuyu_follow_up_image_processing" (#27232) Revert "Fix conflicts in fuyu_follow_up_image_processing (#27228)" This reverts commit `acce10b6c6`. --------- Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-72-126.ec2.internal>	2023-11-02 12:25:41 +01:00
Younes Belkada	9b25c164bd	[`core` / `Quantization`] Fix for 8bit serialization tests (#27234 ) * fix for 8bit serialization * added regression tests. * fixup	2023-11-02 12:03:51 +01:00
Patrick von Platen	af3de8d87c	[Whisper, Bart, MBart] Add Flash Attention 2 (#27203 ) * add whisper fa2 * correct * change all * correct * correct * fix more * fix more * fix more * fix more * fix more * fix more * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fix more * fix more * fix more * fix more * fix more --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2023-11-01 21:03:01 +01:00
Lysandre Debut	95020f208e	Fix CPU offload + disk offload tests (#27204 ) Fix disk offload tests + weight sharing issues	2023-11-01 19:25:23 +01:00
Marc Sun	c9e72f55b2	Add exllamav2 better (#27111 ) * add_ xllamav2 arg * add test * style * add check * add doc * replace by use_exllama_v2 * fix tests * fix doc * style * better condition * fix logic * add deprecate msg * deprecate exllama * remove disable_exllama from the linter * remove * fix warning * Revert the commits deprecating exllama * deprecate disable_exllama for use_exllama * fix * fix loading attribute * better handling of args * remove disable_exllama from init and linter * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * better arg * fix warning * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * switch to dict * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * style * nits * style * better tests * style --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2023-11-01 13:09:21 -04:00
Andi Powers Holmes	f8afb2b2ec	Add TensorFlow implementation of ConvNeXTv2 (#25558 ) * Add type annotations to TFConvNextDropPath * Use tf.debugging.assert_equal for TFConvNextEmbeddings shape check * Add TensorFlow implementation of ConvNeXTV2 * check_docstrings: add TFConvNextV2Model to exclusions TFConvNextV2Model and TFConvNextV2ForImageClassification have docstrings which are equivalent to their PyTorch cousins, but a parsing issue prevents them from passing the test. Adding exclusions for these two classes as discussed in #25558.	2023-11-01 15:09:55 +00:00
Patrick von Platen	391d14e810	[WhisperForCausalLM] Add WhisperForCausalLM for speculative decoding (#27195 ) * finish * add tests * fix all tests * [Assistant Decoding] Add test * fix more * better * finish * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * finish --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2023-11-01 16:01:53 +01:00
Younes Belkada	ae093eef01	[`core` / `Quantization` ] AWQ integration (#27045 ) * working v1 * oops * Update src/transformers/modeling_utils.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * fixup * oops * push * more changes * add docs * some fixes * fix copies * add v1 doc * added installation guide * relax constraints * revert * attempt llm-awq * oops * oops * fixup * raise error when incorrect cuda compute capability * nit * add instructions for llm-awq * fixup * fix copies * fixup and docs * change * few changes + add demo * add v1 tests * add autoawq in dockerfile * finalize * Update tests/quantization/autoawq/test_awq.py * fix test * fix * fix issue * Update src/transformers/integrations/awq.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update docs/source/en/main_classes/quantization.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update docs/source/en/main_classes/quantization.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/integrations/awq.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/integrations/awq.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * add link to example script * Update docs/source/en/main_classes/quantization.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * add more content * add more details * add link to quantization docs * camel case + change backend class name * change to string * fixup * raise errors if libs not installed * change to `bits` and `group_size` * nit * nit * Apply suggestions from code review Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * disable training * address some comments and fix nits * fix * final nits and fix tests * adapt to our new runners * make fix-copies * Update src/transformers/utils/quantization_config.py Co-authored-by: amyeroberts 
<22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/utils/quantization_config.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/integrations/awq.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/integrations/awq.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * move to top * add conversion test * final nit * add more elaborated test --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2023-11-01 09:06:31 +01:00
Hz, Ji	82c7e87987	device agnostic fsdp testing (#27120 ) * make fsdp test cases device agnostic * make style	2023-11-01 07:17:06 +01:00
Lysandre Debut	113ebf80ac	Safetensors serialization by default (#27064 ) * Safetensors serialization by default * First pass on the tests * Second pass on the tests * Third pass on the tests * Fix TF weight loading from TF-format safetensors * Specific encoder-decoder fixes for weight crossloading * Add VisionEncoderDecoder fixes for TF too * Change filename test for pt-to-tf * One missing fix for TFVisionEncoderDecoder * Fix the other crossload test * Support for flax + updated tests * Apply suggestions from code review Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * Sanchit's comments * Sanchit's comments 2 * Nico's comments * Fix tests * cleanup * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: Matt <rocketknight1@gmail.com> Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2023-10-31 19:16:49 +01:00
Hz, Ji	50378cbf6c	device agnostic models testing (#27146 ) * device agnostic models testing * add decorator `require_torch_fp16` * make style * apply review suggestion * Oops, the fp16 decorator was misused	2023-10-31 18:12:14 +01:00
Younes Belkada	4bb50aa212	[`Quantization` / `tests` ] Fix bnb MPT test (#27178 ) fix bnb mpt test	2023-10-31 16:25:53 +01:00
Matt	05f2290114	Backward compatibility fix for the Conversation class (#27176 ) * Backward compatibility fix for the Conversation class * Explain what's going on in the conditional	2023-10-31 15:12:06 +00:00
Younes Belkada	309a90664f	[FEAT] Add Neftune into transformers Trainer (#27141 ) * add v1 neftune * use `unwrap_model` instead * add test + docs * Apply suggestions from code review Co-authored-by: Zach Mueller <muellerzr@gmail.com> * more details * fixup * Update docs/source/en/main_classes/trainer.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * refactor a bit * more elaborated test * fix unwrap issue --------- Co-authored-by: Zach Mueller <muellerzr@gmail.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2023-10-31 16:03:59 +01:00
Hz, Ji	f53041a753	device agnostic pipelines testing (#27129 ) * device agnostic pipelines testing * pass torch_device	2023-10-31 15:46:31 +01:00
Matt	08fadc8085	Shorten the conversation tests for speed + fixing position overflows (#26960 ) * Shorten the conversation tests for speed + fixing position overflows * Put max_new_tokens back to 5 * Remove test skips * Increase max_position_embeddings in blenderbot tests * Add skips for blenderbot_small * Correct TF test skip * make fixup * Reformat skips to use is_pipeline_test_to_skip * Update tests/models/blenderbot_small/test_modeling_blenderbot_small.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/blenderbot_small/test_modeling_flax_blenderbot_small.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/blenderbot_small/test_modeling_tf_blenderbot_small.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2023-10-31 14:20:04 +00:00
Younes Belkada	f7ea959b96	[`core`/ `GC` / `tests`] Stronger GC tests (#27124 ) * stronger GC tests * better tests and skip failing tests * break down into 3 sub-tests * break down into 3 sub-tests * refactor a bit * more refactor * fix * last nit * credits contrib and suggestions * credits contrib and suggestions --------- Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2023-10-30 19:53:46 +01:00
Hz, Ji	5bbf671276	Device agnostic trainer testing (#27131 )	2023-10-30 18:16:40 +00:00
Younes Belkada	6b466771b0	[`tests` / `Quantization`] Fix bnb test (#27145 ) * fix bnb test * link to GH issue	2023-10-30 15:43:08 +01:00
Yih-Dar	576994963f	Fix some tests using `"common_voice"` (#27147 ) * Use mozilla-foundation/common_voice_11_0 * Update expected values * Update expected values * For test_word_time_stamp_integration --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-10-30 15:27:15 +01:00
Yih-Dar	691fd8fdde	Add `Kosmos-2` model (#24709 ) * Add KOSMOS-2 model * update * update * update * address review comment - 001 * address review comment - 002 * address review comment - 003 * style * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fix * address review comment - 004 * address review comment - 005 * address review comment - 006 * address review comment - 007 * address review comment - 008 * address review comment - 009 * address review comment - 010 * address review comment - 011 * update readme * fix * fix * fix * [skip ci] fix * revert the change in _decode * fix docstring * fix docstring * Update docs/source/en/model_doc/kosmos-2.md Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * no more Kosmos2Tokenizer * style * remove "returned when being computed by the model" * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * UTM5 Atten * fix attn mask * use present_key_value_states instead of next_decoder_cache * style * conversion scripts * conversion scripts * conversion scripts * Add _reorder_cache * fix doctest and copies * rename 1 * rename 2 * rename 3 * make fixup * fix table * fix docstring * rename 4 * change repo_id * remove tip * update md file * make style * update md file * put docs/source/en/model_doc/kosmos-2.md to slow * update conversion script * Use CLIPImageProcessor in Kosmos2Processor * Remove Kosmos2ImageProcessor * Remove to_dict in Kosmos2Config * Remove files * fix import * Update conversion * normalized=False * Not using hardcoded values like <image> * elt --> element * Apply suggestion * Not using hardcoded values like </image> * No assert * No nested functions * Fix md file * copy * update doc * fix docstring * fix name * Remove _add_remove_spaces_around_tag_tokens * Remove dummy docstring of _preprocess_single_example * Use `BatchEncoding` * temp * temp * temp * Update * Update 
* Make Kosmos2ProcessorTest a bit pretty * Update gradient checkpointing * Fix gradient checkpointing test * Remove one liner remove_special_fields * Simplify conversion script * fix add_eos_token * update readme * update tests * Change to microsoft/kosmos-2-patch14-224 * style * Fix doc --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2023-10-30 13:32:17 +01:00
Younes Belkada	5fbed2d7ca	[`Trainer` / `GC`] Add `gradient_checkpointing_kwargs` in trainer and training arguments (#27068 ) * add `gradient_checkpointing_kwargs` in trainer and training arguments * add comment * add test - currently failing * now tests pass	2023-10-30 12:41:48 +01:00
Patrick von Platen	ac5893756b	[Attention Mask] Refactor all encoder-decoder attention mask (#27086 ) * [FA2 Bart] Add FA2 to all Bart-like * better * Refactor attention mask * remove all customized atteniton logic * format * mass rename * replace _expand_mask * replace _expand_mask * mass rename * add pt files * mass replace & rename * mass replace & rename * mass replace & rename * mass replace & rename * Update src/transformers/models/idefics/modeling_idefics.py * fix more * clean more * fix more * make style * fix again * finish * finish * finish * finish * finish * finish * finish * finish * finish * finish * Apply suggestions from code review * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * small fix mistral * finish * finish * finish * finish --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2023-10-27 16:42:01 +02:00
Isaac Chung	e2bffcfafd	Add early stopping for Bark generation via logits processor (#26675 ) * add early stopping logits processor * black formmated * indent * follow method signature * actual logic * check for None * address comments on docstrings and method signature * add unit test under `LogitsProcessorTest` wip * unit test passing * black formatted * condition per sample * add to BarkModelIntegrationTests * wip BarkSemanticModelTest * rename and add to kwargs handling * not add to BarkSemanticModelTest * correct logic and assert last outputs tokens different in test * doc-builder style * read from kwargs as well * assert len of with less than that of without * ruff * add back seed and test case * add original impl default suggestion * doc-builder * rename and use softmax * switch back to LogitsProcessor and update docs wording * camelCase and spelling and saving compute * assert strictly less than * assert less than * expand test_generate_semantic_early_stop instead	2023-10-27 11:07:33 +01:00
Arthur	90ee9cea19	Revert "add exllamav2 arg" (#27102 ) Revert "add exllamav2 arg (#26437)" This reverts commit `8214d6e7b1`.	2023-10-27 11:23:06 +02:00
Zach Mueller	34a640642b	Save TB logs as part of push_to_hub (#27022 ) * Support runs/ * Upload runs folder as part of push to hub * Add a test * Add to test deps * Update with proposed solution from Slack * Ensure that repo gets deleted in tests	2023-10-26 12:13:19 -04:00
Marc Sun	8214d6e7b1	add exllamav2 arg (#26437 ) * add_ xllamav2 arg * add test * style * add check * add doc * replace by use_exllama_v2 * fix tests * fix doc * style * better condition * fix logic * add deprecate msg	2023-10-26 10:15:05 -04:00
Arthur	4864d08d3e	Add-support for commit description (#26704 ) * fix * update * revert * add docstring * good to go * update * add a test	2023-10-26 12:37:09 +02:00
Younes Belkada	06e782da4e	[`core`] Refactor of `gradient_checkpointing` (#27020 ) * v1 * fix * remove `create_custom_forward` * fixup * fixup * add test and fix all failing GC tests * remove all remaining `create_custom_forward` methods * fix idefics bug * fixup * replace with `__call__` * add comment * quality	2023-10-25 12:16:15 +02:00
Arthur	9286f0ac39	Skip-test (#27062 ) * skip plbart test * nits * update	2023-10-25 10:47:33 +02:00
JB (Don)	a0fd34483f	Add a default decoder_attention_mask for EncoderDecoderModel during training (#26752 ) * Add a default decoder_attention_mask for EncoderDecoderModel during training Since we are already creating the default decoder_input_ids from the labels, we should also create a default decoder_attention_mask to go with it. * Fix test constant that relied on manual_seed() The test was changed to use a decoder_attention_mask that ignores padding instead (which is the default one created by BERT when attention_mask is None). * Create the decoder_attention_mask using decoder_input_ids instead of labels * Fix formatting in test	2023-10-24 18:26:16 +01:00
Alex McKinney	9da451713d	Device agnostic testing (#25870 ) * adds agnostic decorators and availability fns * renaming decorators and fixing imports * updating some representative example tests bloom, opt, and reformer for now * wip device agnostic functions * lru cache to device checking functions * adds `TRANSFORMERS_TEST_DEVICE_SPEC` if present, imports the target file and updates device to function mappings * comments `TRANSFORMERS_TEST_DEVICE_SPEC` code * extra checks on device name * `make style; make quality` * updates default functions for agnostic calls * applies suggestions from review * adds `is_torch_available` guard * Add spec file to docs, rename function dispatch names to backend_* * add backend import to docs example for spec file * change instances of to * Move register backend to before device check as per @statelesshz changes * make style * make opt test require fp16 to run --------- Co-authored-by: arsalanu <arsalanu@graphcore.ai> Co-authored-by: arsalanu <hzji210@gmail.com>	2023-10-24 16:49:26 +02:00
Xuehai Pan	cc7803c0a6	Register ModelOutput as supported torch pytree nodes (#26618 ) * Register ModelOutput as supported torch pytree nodes * Test ModelOutput as supported torch pytree nodes * Update type hints for pytree unflatten functions	2023-10-24 11:02:40 +02:00
Patrick von Platen	33f98cfded	Remove ambiguous `padding_mask` and instead use a 2D->4D Attn Mask Mapper (#26792 ) * [Attn Mask Converter] refactor attn mask * up * Apply suggestions from code review Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com> * improve * rename * better cache * renaming * improve more * improve * fix bug * finalize * make style & make fix-copies * correct more * start moving attention_mask * fix llama * improve falcon * up * improve more * improve more * Update src/transformers/models/owlv2/modeling_owlv2.py * make style * make style * rename to converter * Apply suggestions from code review --------- Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>	2023-10-23 18:54:00 +02:00
Yoach Lacombe	cb45f71c4d	Add Seamless M4T model (#25693 ) * first raw commit * still POC * tentative convert script * almost working speech encoder conversion scripts * intermediate code for encoder/decoders * add modeling code * first version of speech encoder * make style * add new adapter layer architecture * add adapter block * add first tentative config * add working speech encoder conversion * base model convert works now * make style * remove unnecessary classes * remove unecessary functions * add modeling code speech encoder * rework logics * forward pass of sub components work * add modeling codes * some config modifs and modeling code modifs * save WIP * new edits * same output speech encoder * correct attention mask * correct attention mask * fix generation * new generation logics * erase comments * make style * fix typo * add some descriptions * new state * clean imports * add tests * make style * make beam search and num_return_sequences>1 works * correct edge case issue * correct SeamlessM4TConformerSamePadLayer copied from * replace ACT2FN relu by nn.relu * remove unecessary return variable * move back a class * change name conformer_attention_mask ->conv_attention_mask * better nit code * add some Copied from statements * small nits * small nit in dict.get * rename t2u model -> conditionalgeneration * ongoing refactoring of structure * update models architecture * remove SeamlessM4TMultiModal classes * add tests * adapt tests * some non-working code for vocoder * add seamlessM4T vocoder * remove buggy line * fix some hifigan related bugs * remove hifigan specifc config * change * add WIP tokenization * add seamlessM4T working tokenzier * update tokenization * add tentative feature extractor * Update converting script * update working FE * refactor input_values -> input_features * update FE * changes in generation, tokenizer and modeling * make style and add t2u_decoder_input_ids * add intermediate outputs for ToSpeech models * add vocoder to speech 
models * update valueerror * update FE with languages * add vocoder convert * update config docstrings and names * update generation code and configuration * remove todos and update config.pad_token_id to generation_config.pad_token_id * move block vocoder * remove unecessary code and uniformize tospeech code * add feature extractor import * make style and fix some copies from * correct consistency + make fix-copies * add processor code * remove comments * add fast tokenizer support * correct pad_token_id in M4TModel * correct config * update tests and codes + make style * make some suggested correstion - correct comments and change naming * rename some attributes * rename some attributes * remove unecessary sequential * remove option to use dur predictor * nit * refactor hifigan * replace normalize_mean and normalize_var with do_normalize + save lang ids to generation config * add tests * change tgt_lang logic * update generation ToSpeech * add support import SeamlessM4TProcessor * fix generate * make tests * update integration tests, add option to only return text and update tokenizer fast * fix wrong function call * update import and convert script * update integration tests + update repo id * correct paths and add first test * update how new attention masks are computed * update tests * take first care of batching in vocoder code * add batching with the vocoder * add waveform lengths to model outputs * make style * add generate kwargs + forward kwargs of M4TModel * add docstrings forward methods * reformate docstrings * add docstrings t2u model * add another round of modeling docstrings + reformate speaker_id -> spkr_id * make style * fix check_repo * make style * add seamlessm4t to toctree * correct check_config_attributes * write config docstrings + some modifs * make style * add docstrings tokenizer * add docstrings to processor, fe and tokenizers * make style * write first version of model docs * fix FE + correct FE test * fix tokenizer + add correct 
integration tests * fix most tokenization tests * make style * correct most processor test * add generation tests and fix num_return_sequences > 1 * correct integration tests -still one left * make style * correct position embedding * change numbeams to 1 * refactor some modeling code and correct one test * make style * correct typo * refactor intermediate fnn * refactor feedforward conformer * make style * remove comments * make style * fix tokenizer tests * make style * correct processor tests * make style * correct S2TT integration * Apply suggestions from Sanchit code review Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * correct typo * replace torch.nn->nn + make style * change Output naming (waveforms -> waveform) and ordering * nit renaming and formating * remove return None when not necessary * refactor SeamlessM4TConformerFeedForward * nit typo * remove almost copied from comments * add a copied from comment and remove an unecessary dropout * remove inputs_embeds from speechencoder * remove backward compatibiliy function * reformate class docstrings for a few components * remove unecessary methods * split over 2 lines smthg hard to read * make style * replace two steps offset by one step as suggested * nice typo * move warnings * remove useless lines from processor * make generation non-standard test more robusts * remove torch.inference_mode from tests * split integration tests * enrich md * rename control_symbol_vocoder_offset->vocoder_offset * clean convert file * remove tgt_lang and src_lang from FE * change generate docstring of ToText models * update generate docstring of tospeech models * unify how to deal withtext_decoder_input_ids * add default spkr_id * unify tgt_lang for t2u_model * simplify tgt_lang verification * remove a todo * change config docstring * make style * simplify t2u_tgt_lang_id * make style * enrich/correct comments * enrich .md * correct typo in docstrings * add torchaudio dependency * update 
tokenizer * make style and fix copies * modify SeamlessM4TConverter with new tokenizer behaviour * make style * correct small typo docs * fix import * update docs and add requirement to tests * add convert_fairseq2_to_hf in utils/not_doctested.txt * update FE * fix imports and make style * remove torchaudio in FE test * add seamless_m4t.md to utils/not_doctested.txt * nits and change the way docstring dataset is loaded * move checkpoints from ylacombe/ to facebook/ orga * refactor warning/error to be in the 119 line width limit * round overly precised floats * add stereo audio behaviour * refactor .md and make style * enrich docs with more precised architecture description * readd undocumented models * make fix-copies * apply some suggestions * Apply suggestions from code review Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * correct bug from previous commit * refactor a parameter allowing to clean the code + some small nits * clean tokenizer * make style and fix * make style * clean tokenizers arguments * add precisions for some tests * move docs from not_tested to slow * modify tokenizer according to last comments * add copied from statements in tests * correct convert script * correct parameter docstring style * correct tokenization * correct multi gpus * make style * clean modeling code * make style * add copied from statements * add copied statements * add support with ASR pipeline * remove file added inadvertently * fix docstrings seamlessM4TModel * add seamlessM4TConfig to OBJECTS_TO_IGNORE due of unconventional markdown * add seamlessm4t to assisted generation ignored models --------- Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2023-10-23 14:49:48 +02:00
Arthur	ef978d0a7b	skip two tests (#27013 ) * skip two tests * skip torch as well * fixup	2023-10-23 12:52:05 +02:00
Pedro Cuenca	c030fc8913	Fix Fuyu image scaling bug (#26918 ) * Fix Fuyu image scaling bug It could produce negative padding and hence inference errors for certain image sizes. * Fix aspect ratio scaling test	2023-10-20 13:46:06 +02:00
Matt	bdbcd5d482	Fix and re-enable ConversationalPipeline tests (#26907 ) * Fix and re-enable conversationalpipeline tests * Fix the batch test so the change only applies to conversational pipeline	2023-10-19 12:04:25 +01:00
Pablo Montalvo	caa0ff0bf1	Add fuyu model (#26911 ) * initial commit * add processor, add fuyu naming * add draft processor * fix processor * remove dropout to fix loading of weights * add image processing fixes from Pedro * fix * fix processor * add basic processing fuyu test * add documentation and TODO * address comments, add tests, add doc * replace assert with torch asserts * add Mixins and fix tests * clean imports * add model tester, clean imports * fix embedding test * add updated tests from pre-release model * Processor: return input_ids used for inference * separate processing and model tests * relax test tolerance for embeddings * add test for logit comparison * make sure fuyu image processor is imported in the init * fix formattingh * more formatting issues * and more * fixups * remove some stuff * nits * update init * remove the fuyu file * Update integration test with release model * Update conversion script. The projection is not used, as confirmed by the authors. * improve geenration * Remove duplicate function * Trickle down patches to model call * processing fuyu updates * remove things * fix prepare_inputs_for_generation to fix generate() * remove model_input * update * add generation tests * nits * draft leverage automodel and autoconfig * nits * fix dtype patch * address comments, update READMEs and doc, include tests * add working processing test, remove refs to subsequences * add tests, remove Sequence classification * processing * update * update the conversion script * more processing cleanup * safe import * take out ModelTesterMixin for early release * more cl;eanup * more cleanup * more cleanup * and more * register a buffer * nits * add postprocessing of generate output * nits * updates * add one working test * fix test * make fixup works * fixup * Arthur's updates * nits * update * update * fix processor * update tests * passe more fixups * fix * nits * don't import torch * skip fuyu config for now * fixup done * fixup * update * oups * 
nits * Use input embeddings * no buffer * update * styling processing fuyu * fix test * update licence * protect torch import * fixup and update not doctested * kwargs should be passed * udpates * update the impofixuprts in the test * protect import * protecting imports * protect imports in type checking * add testing decorators * protect top level import structure * fix typo * fix check init * move requires_backend to functions * Imports * Protect types --------- Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: ArthurZucker <arthur.zucker@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Lysandre <lysandre@huggingface.co>	2023-10-18 15:24:11 -07:00
Younes Belkada	5a73316bed	[`FA-2`] Final fix for FA2 dtype (#26846 ) * final fix for FA2 dtype * try * oops * Update src/transformers/models/falcon/modeling_falcon.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * apply fix everywhere --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2023-10-18 19:48:55 +02:00
Matt	de55ead1f1	Emergency PR to skip conversational tests to fix CI (#26906 )	2023-10-18 15:33:43 +01:00
Arthur	ef7e93699a	[`Tokenizer`] Fix slow and fast serialization (#26570 ) * fix * last attempt * current work * fix forward compatibility * save all special tokens * current state * revert additional changes * updates * remove tokenizer.model * add a test and the fix * nit * revert one more break * fix typefield issue * quality * more tests * fix fields for FC * more nits? * new additional changes * how * some updates * simplify all * more nits * revert some things to original * nice * nits * a small hack * more nits * ahhaha * fixup * update * make test run on ci * use subtesting * update * Update .circleci/create_circleci_config.py * updates * fixup * nits * replace typo * fix the test * nits * update * None max dif pls * a partial fix * had to revert one thing * test the fast * updates * fixup * and more nits * more fixes * update * Oupsy 👁️ * nits * fix marian * on our way to heaven * Update src/transformers/models/t5/tokenization_t5.py Co-authored-by: Lysandre Debut <hi@lysand.re> * fixup * Update src/transformers/tokenization_utils_fast.py Co-authored-by: Leo Tronchon <leo.tronchon@gmail.com> * Update src/transformers/tokenization_utils_base.py Co-authored-by: Leo Tronchon <leo.tronchon@gmail.com> * fix phobert * skip some things, test more * nits * fixup * fix deberta * update * update * more updates * skip one test * more updates * fix camembert * can't test this one * more good fixes * kind of a major update - seperate what is only done in fast in fast init and refactor - add_token(AddedToken(..., speicla = True)) ignores it in fast - better loading * fixup * more fixups * fix pegasus and mpnet * remove skipped tests * fix phoneme tokenizer if self.verbose * fix individual models * update common tests * update testing files * all over again * nits * skip test for markup lm * fixups * fix order of addition in fast by sorting the added tokens decoder * proper defaults for deberta * correct default for fnet * nits on add tokens, string initialized to special 
if special * skip irrelevant herbert tests * main fixes * update test added_tokens_serialization * the fix for bart like models and class instanciating * update bart * nit! * update idefix test * fix whisper! * some fixup * fixups * revert some of the wrong chanegs * fixup * fixup * skip marian * skip the correct tests * skip for tf and flax as well --------- Co-authored-by: Lysandre Debut <hi@lysand.re> Co-authored-by: Leo Tronchon <leo.tronchon@gmail.com>	2023-10-18 16:30:53 +02:00
Yoach Lacombe	db611aabee	🚨 🚨 Raise error when no speaker embeddings in speecht5._generate_speech (#26418 ) * add warning when no speaker embeddings in speecht5._generate_speech * modify warning to error * adapt generation test	2023-10-17 15:59:35 +02:00
Younes Belkada	41c42f85f6	[`FA2`] Fix flash attention 2 fine-tuning with Falcon (#26852 ) fix fa2 + dropout issue	2023-10-17 15:38:03 +02:00
Yih-Dar	b8f1cde931	Fix Mistral OOM again (#26847 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-10-16 22:47:20 +02:00
Younes Belkada	fd6a0ade9b	🚨🚨🚨 [`Quantization`] Store the original dtype in the config as a private attribute 🚨🚨🚨 (#26761 ) * First step * fix * add adjustements for gptq * change to `_pre_quantization_dtype` * Update src/transformers/modeling_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix serialization * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fixup --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2023-10-16 19:56:53 +02:00
Matt	14b04b4b9c	Conversation pipeline fixes (#26795 ) * Adjust length limits and allow naked conversation list inputs * Adjust length limits and allow naked conversation list inputs * Maybe use a slightly more reasonable limit than 1024 * Skip tests for old models that never supported this anyway * Cleanup input docstrings * More docstring cleanup + skip failing TF test * Make fixup	2023-10-16 17:27:45 +01:00
NielsRogge	762af3e3c7	Add OWLv2, bis (#26668 ) * First draft * Update conversion script * Update copied from statements * Fix style * Add copied from to config * Add copied from to processor * Run make fixup * Add docstring * Update docstrings * Add method * Improve docstrings * Fix docstrings * Improve docstrings * Remove onnx * Add flag * Address comments * Add copied from to model tests * Add flag to conversion script * Add code snippet * Address more comments * Address comment * Improve conversion script * More improvements * Add expected objectness logits * Skip test * Improve conversion script * Extend conversion script * Convert large checkpoint * Fix doc tests * Convert all checkpoints, update integration tests * Add checkpoint_path arg * Fix repo_id	2023-10-13 16:41:24 +02:00
Matt	bdb391e9c6	Fix Falcon generation test (#26770 )	2023-10-13 15:10:27 +01:00
Matt	c9785d956b	Disable default system prompt for LLaMA (#26765 ) * Disable default system prompt for LLaMA * Update test to not expect default prompt	2023-10-13 14:48:38 +01:00
Yih-Dar	21da3b2461	Update expect outputs of `IdeficsProcessorTest.test_tokenizer_padding` (#26779 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-10-13 09:52:10 +02:00
Yih-Dar	3e93dd295b	Skip `TrainerIntegrationFSDP::test_basic_run_with_cpu_offload` if `torch < 2.1` (#26764 ) * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-10-12 18:22:09 +02:00
Heinz-Alexander Fuetterer	883ed4b344	chore: fix typos (#26756 )	2023-10-12 18:00:27 +02:00
Yih-Dar	a243cdca2a	Fix `PerceiverModelIntegrationTest::test_inference_masked_lm` (#26760 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-10-12 17:43:06 +02:00
Yih-Dar	db5e0c3292	Fix `MistralIntegrationTest` OOM (#26754 ) * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-10-12 12:31:11 +02:00
Yih-Dar	72256bc72a	Fix `PersimmonIntegrationTest` OOM (#26750 ) * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-10-12 11:24:18 +02:00
Tom Aarsen	40ea9ab2a1	Add many missing spaces in adjacent strings (#26751 ) Add missing spaces in adjacent strings	2023-10-12 10:28:40 +02:00
Patrick von Platen	da69de17e8	[Assistant Generation] Improve Encoder Decoder (#26701 ) * [Assistant Generation] Improve enc dec * save more * Fix logit processor checks * Clean * make style * fix deprecation * fix generation test * Apply suggestions from code review * fix biogpt * make style	2023-10-11 15:52:20 +02:00
Yih-Dar	5334796d20	`Copied from` for test files (#26713 ) * copied statement for test files --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-10-11 14:12:09 +02:00
Billy Bradley	dcc49d8a7e	In assisted decoding, pass model_kwargs to model's forward call (fix prepare_input_for_generation in all models) (#25242 ) * In assisted decoding, pass model_kwargs to model's forward call Previously, assisted decoding would ignore any additional kwargs that it doesn't explicitly handle. This was inconsistent with other generation methods, which pass the model_kwargs through prepare_inputs_for_generation and forward the returned dict to the model's forward call. The prepare_inputs_for_generation method needs to be amended in all models, as previously it only kept the last input ID when a past_key_values was passed. * Improve variable names in _extend_attention_mask * Refactor extending token_type_ids into a function * Replace deepcopy with copy to optimize performance * Update new persimmon model with llama changes for assisted generation * Update new mistral model for assisted generation with prepare_inputs_for_generation * Update position_ids creation in falcon prepare_inputs_for_generation to support assisted generation	2023-10-11 13:18:42 +02:00
Thien Tran	1e3c9ddacc	Make Whisper Encoder's sinusoidal PE non-trainable by default (#26032 ) * set encoder's PE as non-trainable * freeze flax * init sinusoids * add test for non-trainable embed positions * simplify TF encoder embed_pos * revert tf * clean up * add sinusoidal init for jax * make consistent sinusoidal function * fix dtype * add default dtype * use numpy for sinusoids. fix jax * add sinusoid init for TF * fix * use custom embedding * use specialized init for each impl * fix sinusoids init. add test for pytorch * fix TF dtype * simplify sinusoid init for flax and tf * add tests for TF * change default dtype to float32 * add sinusoid test for flax * Update src/transformers/models/whisper/modeling_flax_whisper.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * Update src/transformers/models/whisper/modeling_tf_whisper.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * move sinusoidal init to _init_weights --------- Co-authored-by: sanchit-gandhi <sanchit@huggingface.co> Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>	2023-10-11 09:08:54 +01:00
Shreyas S	86a4e5a96b	Fixed malapropism error (#26660 ) Update test_integration.py Fixed malapropism clone>copy	2023-10-09 11:04:57 +02:00
Arthur	9ad815e412	[`LlamaTokenizerFast`] Adds edge cases for the template processor (#26606 ) * make sure eos and bos are properly handled for fast tokenizer * fix code llama as well * nits * fix the conversion script as well * fix failing test	2023-10-06 16:40:54 +02:00
statelesshz	27597fea07	remove SharedDDP as it is deprecated (#25702 ) * remove SharedDDP as it was drepracated * apply review suggestion * make style * Oops,forgot to remove the compute_loss context manager in Seq2SeqTrainer. * remove the unnecessary conditional statement * keep the logic of IPEX * clean code * mix precision setup & make fixup --------- Co-authored-by: statelesshz <jihuazhong1@huawei.com>	2023-10-06 16:03:11 +02:00
Yih-Dar	e840aa67e8	Fix failing `MusicgenTest .test_pipeline_text_to_audio` (#26586 ) * fix * fix * Fix * Fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-10-06 15:53:59 +02:00
fxmarty	64845307b3	Remove unnecessary unsqueeze - squeeze in rotary positional embedding (#26162 ) * remove unnecessary unsqueeze-squeeze in llama * correct other models * fix * revert gpt_neox_japanese * fix copie * fix test	2023-10-06 18:25:15 +09:00
Tianqi Liu	65aabafe2f	Update tokenization_code_llama_fast.py (#26576 ) * Update tokenization_code_llama_fast.py * Update test_tokenization_code_llama.py * Update test_tokenization_code_llama.py	2023-10-06 10:49:02 +02:00
Towdo	af38c837ee	Fixed inconsistency in several fast tokenizers (#26561 )	2023-10-06 10:40:47 +02:00
Marvin Gabler	0a3b9d02fe	#26566 swin2 sr allow in out channels (#26568 ) * feat: close #26566, changed model & config files to accept arbitary in and out channels * updated docstrings * fix: linter error * fix: update Copy docstrings * fix: linter update * fix: rename num_channels_in to num_channels to prevent breaking changes * fix: make num_channels_out None per default * Update src/transformers/models/swin2sr/configuration_swin2sr.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix: update tests to include num_channels_out * fix:linter * fix: remove normalization with precomputed rgb values when #input_channels!=#output_channels --------- Co-authored-by: marvingabler <marvingabler@outlook.de> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2023-10-05 15:20:38 +02:00
Younes Belkada	e6d250e4cd	[`core`] fix silent bug `keep_in_fp32` modules (#26589 ) * fix silent bug `keep_in_fp32` modules * final fix * added a common test. * Trigger CI * revert	2023-10-05 14:44:31 +02:00
Yih-Dar	54e17a15dc	Fix failing tests on `main` due to torch 2.1 (#26607 ) * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-10-05 10:27:05 +02:00
Arthur	c037b2e340	skip flaky hub tests (#26594 ) skip flaky	2023-10-04 17:47:55 +02:00
dg845	9deb18ca1a	Add # Copied from statements to audio feature extractors that use the floats_list function (#26581 ) Add # Copied from statements to audio feature extractors that use the floats_list function.	2023-10-04 17:09:48 +02:00
Sylvain Gugger	03af4c42a6	Docstring check (#26052 ) * Fix number of minimal calls to the Hub with peft integration * Alternate design * And this way? * Revert * Nits to fix * Add util * Print when changes are made * Add list to ignore * Add more rules * Manual fixes * deal with kwargs * deal with enum defaults * avoid many digits for floats * Manual fixes * Fix regex * Fix regex * Auto fix * Style * Apply script * Add ignored list * Add check that templates are filled * Adding to CI checks * Add back semi-fix * Ignore more objects * More auto-fixes * Ignore missing objects * Remove temp semi-fix * Fixes * Update src/transformers/models/pvt/configuration_pvt.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update utils/check_docstrings.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/utils/quantization_config.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Deal with float defaults * Fix small defaults * Address review comment * Treat * Post-rebase cleanup * Address review comment * Update src/transformers/models/deprecated/mctct/configuration_mctct.py Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr> * Address review comment --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>	2023-10-04 15:13:37 +02:00
Lysandre Debut	5c66378cea	[Tokenizers] Skip tests temporarily (#26574 ) * Skip tests temporarily * style * Add additional test	2023-10-03 19:43:42 +02:00
Sanchit Gandhi	57f44dc428	[Whisper] Allow basic text normalization (#26149 ) * [Whisper] Allow basic text normalization * up * style copies	2023-10-03 17:57:16 +01:00
Younes Belkada	2aef9a9601	[`PEFT`] Final fixes (#26559 ) * fix issues with PEFT * logger warning futurewarning issues * fixup * adapt from suggestions * oops * rm test	2023-10-03 14:53:09 +02:00
Younes Belkada	ae9a344cce	[`Mistral`] Add Flash Attention-2 support for `mistral` (#26464 ) * add FA-2 support for mistral * fixup * add sliding windows * fixing few nits * v1 slicing cache - logits do not match * add comment * fix bugs * more mem efficient * add warning once * add warning once * oops * fixup * more comments * copy * add safety checker * fixup * Update src/transformers/models/mistral/modeling_mistral.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * copied from * up * raise when padding side is right * fixup * add doc + few minor changes * fixup --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2023-10-03 13:44:46 +02:00
Sanchit Gandhi	768aa3d9cd	[Wav2Vec2 and Co] Update init tests for PT 2.1 (#26494 )	2023-10-03 10:52:34 +02:00
Nathan Cahill	b5ca8fcd20	Add tokenizer kwargs to fill mask pipeline. (#26234 ) * add tokenizer kwarg inputs * Adding tokenizer_kwargs to _sanitize_parameters * Add truncation=True example to tests * Update test_pipelines_fill_mask.py * Update test_pipelines_fill_mask.py * make fix-copies and make style * Update fill_mask.py Replace single tick with double * make fix-copies * Style --------- Co-authored-by: Lysandre <lysandre@huggingface.co>	2023-10-03 10:25:10 +02:00
Arthur	bab3331906	Code-llama-nit (#26300 ) * fix encoding when the fill token is None * add tests and edge cases * fiuxp * Update tests/models/code_llama/test_tokenization_code_llama.py	2023-10-02 18:29:27 +02:00
Arthur	63864e057f	Fix model integration ci (#26322 ) * fix wav2vec2 * nit * stash * one more file to update * fix byt5 * vocab size is 256, don't change that! * use other revision * test persimon in smaller size * style * tests * nits * update add tokens from pretrained * test tokenization * nits * potential fnet fix? * more nits * nits * correct test * assert close * udpate * ouch * fix it * some more nits * FINALLU * use `adept` checkpoints * more adept checkpoints * that was invlved!	2023-10-02 13:55:46 +02:00
Younes Belkada	6824461f2a	[`core`/ `auto` ] Fix bnb test with code revision + bug with code revision (#26431 ) * fix bnb test with code revision * fix test * Apply suggestions from code review * Update src/transformers/models/auto/auto_factory.py * Update src/transformers/models/auto/auto_factory.py * Update src/transformers/models/auto/auto_factory.py	2023-10-02 11:35:07 +02:00
Lysandre Debut	67239f7360	Revert falcon exception (#26472 ) * Revert "Falcon: fix revision propagation (#26006)" This reverts commit `118c676ef3`. * Revert "Put Falcon back (#25960)" This reverts commit `22a69f1d7d`.	2023-10-02 09:13:19 +02:00
Yih-Dar	391177441b	Avoid all-zeor attnetion mask used in testing (#26469 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-09-29 11:06:06 +02:00
Yih-Dar	9b23d0de0e	Skip 2 failing persimmon pipeline tests for now (#26485 ) skip Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-09-29 10:52:18 +02:00
Marc Sun	5e11d72d4d	fix_mbart_tied_weights (#26422 ) * fix_mbart_tied_weights * add test	2023-09-28 15:08:35 +02:00
Younes Belkada	38e96324ef	[`PEFT`] introducing `adapter_kwargs` for loading adapters from different Hub location (`subfolder`, `revision`) than the base model (#26270 ) * make use of adapter_revision * v1 adapter kwargs * fix CI * fix CI * fix CI * fixup * add BC * Update src/transformers/integrations/peft.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fixup * change it to error * Update src/transformers/modeling_utils.py * Update src/transformers/modeling_utils.py * fixup * change * Update src/transformers/integrations/peft.py --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2023-09-28 11:13:03 +02:00
Chris Bamford	72958fcd3c	[Mistral] Mistral-7B-v0.1 support (#26447 ) * [Mistral] Mistral-7B-v0.1 support * fixing names * slightly longer test * fixups * not_doctested * wrongly formatted references * make fixuped --------- Co-authored-by: Timothee Lacroix <t@eugen.ai> Co-authored-by: timlacroix <t@mistral.ai>	2023-09-27 18:30:46 +02:00
Younes Belkada	3ca18d6d09	[`PEFT`] Fix PEFT multi adapters support (#26407 ) * fix PEFT multi adapters support * refactor a bit * save pretrained + BC + added tests * Update src/transformers/integrations/peft.py Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com> * add more tests * add suggestion * final changes * adapt a bit * fixup * Update src/transformers/integrations/peft.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * adapt from suggestions --------- Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2023-09-27 16:45:31 +02:00
Younes Belkada	153755ee38	[`FA` / `tests`] Add use_cache tests for FA models (#26415 ) * add use_cache tests for FA * fixup	2023-09-27 12:21:54 +02:00
Shauray Singh	abd2531034	Fix padding for IDEFICS (#26396 ) * fix * fixup * tests * fixup	2023-09-27 10:56:07 +02:00
sanjeevk-os	6ce6a5adb9	added support for gradient checkpointing in ESM models (#26386 )	2023-09-26 10:15:53 +02:00
NielsRogge	ace74d16bd	Add Nougat (#25942 ) * Add conversion script * Add NougatImageProcessor * Add crop margin * More improvements * Add docs, READMEs * Remove print statements * Include model_max_length * Add NougatTokenizerFast * Fix imports * Improve postprocessing * Improve image processor * Fix image processor * Improve normalize method * More improvements * More improvements * Add processor, improve docs * Simplify fast tokenizer * Remove test file * Fix docstrings * Use NougatProcessor in conversion script * Add is_levensthein_available * Add tokenizer tests * More improvements * Use numpy instead of opencv * Add is_cv2_available * Fix cv2_available * Add is_nltk_available * Add image processor tests, improve crop_margin * Add integration tests * Improve integration test * Use do_rescale instead of hacks, thanks Amy * Remove random_padding * Address comments * Address more comments * Add import * Address more comments * Address more comments * Address comment * Address comment * Set max_model_input_sizes * Add tests * Add requires_backends * Add Nougat to exotic tests * Use to_pil_image * Address comment regarding nltk * Add NLTK * Improve variable names, integration test * Add test * refactor, document, and test regexes * remove named capture groups, add comments * format * add non-markdown fixed tokenization * format * correct flakyness of args parse * add regex comments * test functionalities for crop_image, align long axis and expected output * add regex tests * remove cv2 dependency * test crop_margin equality between cv2 and python * refactor table regexes to markdown add newline * change print to log, improve doc * fix high count tables correction * address PR comments: naming, linting, asserts * Address comments * Add copied from * Update conversion script * Update conversion script to convert both small and base versions * Add inference example * Add more info * Fix style * Add require annotators to test * Define all keyword arguments explicitly * 
Move cv2 annotator * Add tokenizer init method * Transfer checkpoints * Add reference to Donut * Address comments * Skip test * Remove cv2 method * Add copied from statements * Use cached_property * Fix docstring * Add file to not doctested --------- Co-authored-by: Pablo Montalvo <pablo.montalvo.leroux@gmail.com>	2023-09-26 07:06:04 +02:00
Yih-Dar	d9e4bc2895	Update tiny model information and pipeline tests (#26285 ) * Update tiny model summary file * add to pipeline tests * revert * fix import * fix import * fix * fix * update * update * update * fix * remove BarkModelTest * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2023-09-25 18:08:12 +02:00
LeviVasconcelos	576cd45a57	Add image to image pipeline (#25393 ) * Add image to image pipeline Add image to image pipeline * remove swin2sr from tf auto * make ImageToImage importable * make style make style make style make style * remove tf support * remove nonused imports * fix postprocessing * add important comments; add unit tests * add documentation * remove support for TF * make fixup * fix typehint Image.Image * fix documentation code * address review request; fix unittest type checking * address review request; fix unittest type checking * make fixup * address reviews * Update src/transformers/pipelines/image_to_image.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * enhance docs * make style * make style * improve docetest time * improve docetest time * Update tests/pipelines/test_pipelines_image_to_image.py Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com> * Update tests/pipelines/test_pipelines_image_to_image.py Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com> * make fixup * undo faulty merge * undo faulty merge * add image-to-image to test pipeline mixin * Update src/transformers/pipelines/image_to_image.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update tests/pipelines/test_pipelines_image_to_image.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * improve docs --------- Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2023-09-22 19:53:55 +03:00
Younes Belkada	368a58e61c	[`core` ] Integrate Flash attention 2 in most used models (#25598 ) * v1 * oops * working v1 * fixup * add some TODOs * fixup * padding support + try with module replacement * nit * alternative design * oops * add `use_cache` support for llama * v1 falcon * nit * a bit of refactor * nit * nits nits * add v1 padding support falcon (even though it seemed to work before) * nit * falcon works * fixup * v1 tests * nit * fix generation llama flash * update tests * fix tests + nits * fix copies * fix nit * test- padding mask * stype * add more mem efficient support * Update src/transformers/modeling_utils.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * fixup * nit * fixup * remove it from config when saving * fixup * revert docstring * add more checks * use values * oops * new version * fixup * add same trick for falcon * nit * add another test * change tests * fix issues with GC and also falcon * fixup * oops * Update src/transformers/models/falcon/modeling_falcon.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * add init_rope * updates * fix copies * fixup * fixup * more clarification * fixup * right padding tests * add docs * add FA in docker image * more clarifications * add some figures * add todo * rectify comment * Change to FA2 * Update docs/source/en/perf_infer_gpu_one.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * split in two lines * change test name * add more tests * some clean up * remove `rearrange` deps * add more docs * revert changes on dockerfile * Revert "revert changes on dockerfile" This reverts commit `8d72a66b4b`. 
* revert changes on dockerfile * Apply suggestions from code review Co-authored-by: Lysandre Debut <hi@lysand.re> * address some comments * docs * use inheritance * Update src/transformers/testing_utils.py Co-authored-by: Lysandre Debut <hi@lysand.re> * fixup * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/modeling_utils.py * final comments * clean up * style * add cast + warning for PEFT models * fixup --------- Co-authored-by: Felix Marty <9808326+fxmarty@users.noreply.github.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Lysandre Debut <hi@lysand.re>	2023-09-22 17:42:10 +02:00
Yoach Lacombe	9a30753485	Porting the torchaudio kaldi fbank implementation to audio_utils (#26182 ) * add kaldi fbank * make style * add herz_to_mel_kaldi tests * add mel to hertz kaldi test * integration tests * correct test and remove comment * make style * Apply suggestions from code review Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * change parameter name * Apply suggestions from Arthur review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update remove_dc_offset description * fix bug + make style * fix error in using np.exp instead of np.power * make style --------- Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2023-09-21 17:52:47 +02:00
Lysandre Debut	26ba56ccbd	Fix FSMT weight sharing (#26292 )	2023-09-21 14:46:05 +02:00
fxmarty	da971b2271	Keep relevant weights in fp32 when `model._keep_in_fp32_modules` is set even when `accelerate` is not installed (#26225 ) * fix bug where weight would not be kept in fp32 * nit * address review comments * fix test	2023-09-21 19:00:03 +09:00
Arthur	f94c9b3d86	include changes from llama (#26260 ) * include changes from llama * add a test	2023-09-20 17:19:30 +02:00
Jinho Park	37c205eb5d	Update bros checkpoint (#26277 ) * fix bros integration test * update bros checkpoint	2023-09-20 10:22:07 +02:00
Sourab Mangrulkar	86ffd5ffa2	fix name error when accelerate is not available (#26278 ) * fix name error when accelerate is not available * fix `is_fsdp_available`	2023-09-20 08:02:55 +02:00
Sourab Mangrulkar	382ba670ed	FSDP tests and checkpointing fixes (#26180 ) * add fsdp tests * Update test_fsdp.py * Update test_fsdp.py * fixes * checks * Update trainer.py * fix * fixes for saving/resuming checkpoints * fixes * add tests and delete debug statements * fixing tests * Update test_fsdp.py * fix tests * fix tests * minor nits * fix code style and quality * refactor and modularize test code * reduce the time of tests * reduce the test time * fix test * reduce test time * reduce test time * fix failing tests * fix * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * resolve comments --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2023-09-20 10:26:16 +05:30
Sam Passaglia	8e3980a290	[FIX] resize_token_embeddings (#26102 ) * fix roundup command * add test for resize_token_embeddings * Update tests/test_modeling_common.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * style --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2023-09-19 21:44:41 +02:00
NielsRogge	7d6354e047	Add ViTMatte (#25843 ) * First draft * Simplify image processor * Fix rebase * Address comments * Address more comments * Address more comments * Address more comments * Address more comments * Improve pad_image * Add tests * Update integration test * Fix image processor tests * Fix model tests * Convert checkpoints * Fix doc tests * Remove file * Apply suggestions * Address comments * Fix typing hint * Add batch_norm_eps * Address comments * Fix style	2023-09-19 10:56:10 -03:00
Lucain	04191ea1e6	Fix gated repo tests (#26257 ) * Fix gated repo tests * Apply suggestions from code review	2023-09-19 13:25:12 +02:00
NielsRogge	de8bec6df3	[AutoBackbone] Add test (#26094 ) * Add test * Add config_class	2023-09-18 23:47:54 +02:00
Arthur	2da8853775	🚨🚨 🚨🚨 [`Tokenizer`] attemp to fix add_token issues🚨🚨 🚨🚨 (#23909 ) * fix test for bart. Order is correct now let's skip BPEs * ouf * styling * fix bert.... * slow refactoring * current updates * massive refactoring * update * NICE! * update to see where I am at * updates * update * update * revert * updates * updates * start supporting legacy_save * styling * big update * revert some changes * nits * nniiiiiice * small fixes * kinda fix t5 with new behaviour * major update * fixup * fix copies * today's updates * fix byt5 * upfate * update * update * updates * update vocab size test * Barthez does not use not need the fairseq offset ids * super calll must be after * calll super * move all super init * move other super init * fixup * nits * more fixes * nits * more fixes * nits * more fix * remove useless files * ouch all of them are affected * and more! * small imporvements * no more sanitize token * more changes around unique no split tokens * partially fix more things * keep legacy save but add warning * so... more fixes * updates * guess deberta tokenizer could be nuked * fixup * fixup did some bad things * nuke it if it breaks * remove prints and pretrain fast from slow with new format. * fixups * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * fiou * nit * by default specials should not be normalized? * update * remove brakpoint * updates * a lot of updates * fixup * fixes revert some changes to match fast * small nits * that makes it cleaner * fix camembert accordingly * update * some lest breaking changes * update * fixup * fix byt5 and whisper mostly * some more fixes, canine's byte vocab * fix gpt2 * fix most of the perceiver tests (4 left) * fix layout lmv3 * fixup * fix copies for gpt2 style * make sure to only warn once * fix perciever and gpt2 tests * some more backward compatibility: also read special tokens map because some ppl use it........////..... 
* fixup * add else when reading * nits * fresh updates * fix copies * will this make everything faster? * fixes * more fixes * update * more fixes * fixup * is the source of truth right? * sorry camembert for the troubles * current updates * fixup * update led * update * fix regression * fix single word * more model specific fixes * fix t5 tests * fixup * more comments * update * fix nllb * rstrip removed * small fixes * better handle additional_special_tokens and vocab sizes * fixing * styling * fix 4 / 21 * fixup * fix nlbb's tests * some fixes * fix t5 * fixes * style * fix canine tests * damn this is nice * nits * m2m100 nit * fixups * fixes! * fixup * stash * fix merge * revert bad change * fixup * correct order for code Llama * fix speecht5 post merge * styling * revert source of 11 fails * small nits * all changes in one go * fnet hack * fix 2 more tests * update based on main branch of tokenizers * fixup * fix VITS issues * more fixes * fix mgp test * fix camembert issues * oups camembert still has 2 failing tests * mluke fixes * decode fixes * small nits * nits * fix llama and vits * fix camembert * smal nits * more fixes when initialising a fast from a slow and etc * fix one of the last test * fix CPM tokenizer test * fixups * fix pop2piano * fixup * ⚠️ Change tokenizers required version ⚠️ * ⚠️ Change tokenizers required version ⚠️ * "tokenizers>=0.14,<0.15", don't forget smaller than * fix musicgen tests and pretraiendtokenizerfast * fix owlvit and all * update t5 * fix 800 red * fix tests * fix the fix of the fix of t5 * styling * documentation nits * cache _added_tokens_encoder * fixups * Nit * fix red tests * one last nit! 
* make eveything a lot simpler * Now it's over 😉 * few small nits * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * updates that work for now * tests that should no be skipped / changed and fixed next * fixup * i am ashamed * pushe the fix * update * fixups * nits * fix added_tokens_encoder * fix canine test * fix pegasus vocab * fix transfoXL * fixup * whisper needs to be fixed for train new * pegasus nits * more pegasus fixes * minor update * better error message in failed test * fix whisper failing test * fix whisper failing test * fix pegasus * fixup * fix **** pegasus * reset things * remove another file * attempts to fix the strange custome encoder and offset * nits here and there * update * fixup * nit * fix the whisper test * nits nits * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * updates based on review * some small update to potentially remove * nits * import rlu cache * Update src/transformers/tokenization_utils_base.py Co-authored-by: Lysandre Debut <hi@lysand.re> * move warning to `from_pretrained` * update tests results now that the special tokens are always added --------- Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Lysandre Debut <hi@lysand.re>	2023-09-18 20:28:36 +02:00
Lysandre Debut	77ed9fa1a9	[FSMT] Fix non-shared weights (#26187 ) * Fix non-shared weights * Add tests * Edit tied weights keys	2023-09-18 16:58:38 +02:00
Matt	f0a6057fbc	Fix ConversationalPipeline tests (#26217 ) Add BlenderbotSmall templates and correct handling for conversation.past_user_inputs	2023-09-18 15:08:56 +01:00
Julien Chaumond	bc7ce1808f	moved `ctrl` to `Salesforce/ctrl` (#26183 ) * moved `ctrl` to `Salesforce/ctrl` redirects should theoretically work, but still updating those repo references for clarity * Fixup * Slow doc tests * Add modeling file --------- Co-authored-by: Lysandre <lysandre@huggingface.co>	2023-09-18 13:52:43 +02:00
Patrick von Platen	0a55d9f737	[PEFT] Allow PEFT model dict to be loaded (#25721 ) * Allow PEFT model dict to be loaded * make style * make style * Apply suggestions from code review * address comments * fixup * final change * added tests * fix test * better logic for handling if adapter has been loaded * Update tests/peft_integration/test_peft_integration.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> --------- Co-authored-by: younesbelkada <younesbelkada@gmail.com> Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2023-09-15 18:22:01 +02:00
Arthur	eb644980eb	Fix pad to multiple of (#25732 ) * nits * update the test * nits * update * fix bark * fix bark tests and allow padding to multiple of without new tokens	2023-09-15 11:53:39 -04:00
Sanchit Gandhi	c7b4d0b4e2	[Whisper] Check length of prompt + max new tokens (#26164 )	2023-09-15 15:46:31 +01:00
Sanchit Gandhi	d70fab8b20	[TTA Pipeline] Test MusicGen and VITS (#26146 )	2023-09-15 10:00:36 +01:00
Leo Tronchon	869733ab62	IDEFICS: allow interpolation of vision's pos embeddings (#26029 ) * add pos embed interpolation for vision encoder * style * update config with interpolate_pos_encoding arg * fix imports formatting * take off copied from on vision embeddings * add test for image embeddings interpolation * add credit for interpolation code * Update src/transformers/models/idefics/configuration_idefics.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/idefics/vision.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fix condition to check nbr image patches match shape of pos embeddings * use kwargs in the forward methods for interpolation * fix tests * have interpolate_pos_encoding default to False instead of None * Update tests/models/idefics/test_modeling_idefics.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/idefics/test_modeling_idefics.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/idefics/test_modeling_idefics.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/idefics/configuration_idefics.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * take off for loop meant to print k,v * add interpolate_pos_encoding arg in prepare_inputs_for_generation * add test for interpolated generation * fix edge case num_patches == num_positions and height == width * add test for edge case * fix pos_embed in interpolate * allow interpolation in bf16 with upcasting * Update src/transformers/models/idefics/vision.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/idefics/vision.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * add multiple images tests for interpolation and generation --------- 
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2023-09-14 19:27:40 -04:00
Jinho Park	17fdd35481	Add BROS (#23190 ) * add Bros boilerplate * copy and pasted modeling_bros.py from official Bros repo * update copyright of bros files * copy tokenization_bros.py from official repo and update import path * copy tokenization_bros_fast.py from official repo and update import path * copy configuration_bros.py from official repo and update import path * remove trailing period in copyright line * copy and paste bros/__init__.py from official repo * save formatting * remove unused unnecessary pe_type argument - using only crel type * resolve import issue * remove unused model classes * remove unnecessary tests * remove unused classes * fix original code's bug - layer_module's argument order * clean up modeling auto * add bbox to prepare_config_and_inputs * set temporary value to hidden_size (32 is too low because of the of the Bros' positional embedding) * remove decoder test, update create_and_check* input arguemnts * add missing variable to model tests * do make fixup * update bros.mdx * add boilerate plate for no_head inference test * update BROS_PRETRAINED_MODEL_ARCHIVE_LIST (add naver-clova-ocr prefix) * add prepare_bros_batch_inputs function * update modeling_common to add bbox inputs in Bros Model Test * remove unnecessary model inference * add test case * add model_doc * add test case for token_classification * apply fixup * update modeling code * update BrosForTokenClassification loss calculation logic * revert logits preprocessing logic to make sure logits have original shape * - update class name * - add BrosSpadeOutput - update BrosConfig arguments * add boilerate plate for no_head inference test * add prepare_bros_batch_inputs function * add test case * add test case for token_classification * update modeling code * update BrosForTokenClassification loss calculation logic * revert logits preprocessing logic to make sure logits have original shape * apply masking on the fly * add BrosSpadeForTokenLinking * update class name put docstring 
to the beginning of the file * separate the logits calculation logic and loss calculation logic * update logic for loss calculation so that logits shape doesn't change when return * update typo * update prepare_config_and_inputs * update dummy node initialization * update last_hidden_states getting logic to consider when return_dict is False * update box first token mask param * bugfix: remove random attention mask generation * update keys to ignore on load missing * run make style and quality * apply make style and quality of other codes * update box_first_token_mask to bool type * update index.md * apply make style and quality * apply make fix-copies * pass check_repo * update bros model doc * docstring bugfix fix * add checkpoint for doc, tokenizer for doc * Update README.md * Update docs/source/en/model_doc/bros.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update bros.md * Update src/transformers/__init__.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/bros.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * apply suggestions from code review * apply suggestions from code review * revert test_processor_markuplm.py * Update test_processor_markuplm.py * apply suggestions from code review * apply suggestions from code review * apply suggestions from code review * update BrosSpadeELForTokenClassification head name to entity linker * add doc string for config params * update class, var names to more explicit and apply suggestions from code review * remove unnecessary keys to ignore * update relation extractor to be initialized with config * add bros processor * apply make style and quality * update bros.md * remove bros tokenizer, add bros processor that wraps bert tokenizer * revert change * apply make fix-copies * update 
processor code, update itc -> initial token, stc -> subsequent token * add type hint * remove unnecessary condition branches in embedding forward * fix auto tokenizer fail * update docstring for each classes * update bbox input dimension as standard 2 points and convert them to 4 points in forward pass * update bros docs * apply suggestions from code review : update Bros -> BROS in bros.md * 1. box prefix var -> bbox 2. update variable names to be more explicit * replace einsum with torch matmul * apply style and quality * remove unused argument * remove unused arguments * update docstrings * apply suggestions from code review: add BrosBboxEmbeddings, replace einsum with classical matrix operations * revert einsum update * update bros processor * apply suggestions from code review * add conversion script for bros * Apply suggestions from code review * fix readme * apply fix-copies --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2023-09-14 18:02:37 +01:00

3386 Commits