* start - docs, SpeechT5 copy and rename
* add relevant code from FastSpeech2 draft, have tests pass
* make it an actual conformer, demo ex.
* matching inference with original repo, includes debug code
* refactor nn.Sequentials, start more desc. var names
* more renaming
* more renaming
* vocoder scratchwork
* matching vocoder outputs
* hifigan vocoder conversion script
* convert model script, rename some config vars
* replace postnet with speecht5's implementation
* passing common tests, file cleanup
* expand testing, add output hidden states and attention
* tokenizer + passing tokenizer tests
* variety of updates and tests
* g2p_en package setup
* import structure edits
* docstrings and cleanup
* repo consistency
* deps
* small cleanup
* forward signature param order
* address comments except for masks and labels
* address comments on attention_mask and labels
* address second round of comments
* remove old unneeded line
* address comments part 1
* address comments pt 2
* rename auto mapping
* fixes for failing tests
* address comments part 3 (bart-like, train loss)
* make style
* pass config where possible
* add forward method + tests to WithHifiGan model
* make style
* address arg passing and generate_speech comments
* address Arthur comments
* address Arthur comments pt2
* lint changes
* Sanchit comment
* add g2p-en to doctest deps
* move up self.encoder
* onnx compatible tensor method
* fix is symbolic
* fix paper url
* move models to espnet org
* make style
* make fix-copies
* update docstring
* Arthur comments
* update docstring w/ new updates
* add model architecture images
* header size
* md wording update
* make style
* First draft
* More improvements
* More improvements
* Make all tests pass
* Remove script
* Update image processor
* Address comments
* Use new gradient checkpointing method
* Convert checkpoints, add integration test
* Do not keep aspect ratio for now
* Set keep_aspect_ratio=False for beit, add integration test
* Remove print statement
* fixes: update the is_batched condition to also detect batched audio data passed as a torch.Tensor, not only as an np.ndarray
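A minimal sketch of the broadened check this commit describes; the helper name `_is_batched` is illustrative, not the extractor's actual code:

```python
import numpy as np
import torch

def _is_batched(raw_speech) -> bool:
    # lists/tuples of clips, 2-D numpy arrays, and now also 2-D torch
    # tensors all count as batched audio input
    if isinstance(raw_speech, (list, tuple)):
        return True
    if isinstance(raw_speech, np.ndarray) and len(raw_speech.shape) > 1:
        return True
    return isinstance(raw_speech, torch.Tensor) and len(raw_speech.shape) > 1
```

Note that a later commit in this PR removes the torch framework dependency again, so the shipped check avoids a top-level torch import.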
* Update src/transformers/models/seamless_m4t/feature_extraction_seamless_m4t.py
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* refactor: code refactoring to remove torch framework dependency
* docs: updated docstring to add torch tensor compatibility
* test: add test cases to incorporate torch tensor inputs
* test: ran make fix-copies for code conformity
* test: refactor test to separate the test_call into test_call_numpy and test_call_torch
---------
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
* Fix vision text dual encoder
* Small cleanup for wav2vec2 (not fixed yet)
* Small fix for vision_encoder_decoder
* Fix SAM builds
* Update TFBertTokenizer test with modern exporting + tokenizer
* Fix DeBERTa
* Fix DeBERTav2
* Try RAG fix but it's impossible to test locally
* Actually fix RAG now that I got FAISS working somehow
* Fix Wav2Vec2, add sermon
* Fix Hubert
* some nits
* update test
* add support d\sd[a
* remove some dummy inputs
* all good
* style
* nits
* fixes
* fix more copies
* nits
* styling
* fix
* Update src/transformers/models/mistral/modeling_mistral.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* add a slow test just to be sure
* fixup
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Iterate over out_features instead of stage_names
* Update for all backbones
* Add tests
* Fix
* Align timm backbone behaviour with other backbones
* Fix tests
* Stricter checks on set out_features and out_indices
* Revert back stage selection logic
* Remove out-of-order logic
* Document restriction in docstrings
* move code to Trainer.evaluate to enable use of that function with multiple datasets
* test
* update doc string
* and a tip
* forgot the type
---------
Co-authored-by: Prof. Peter Schneider-Kamp <jps@ordbogen.com>
* edits to _prepare_4d_causal_attention_mask()
* initial tests for 4d mask
* attention_mask_for_sdpa support
* added test for inner model hidden states
* added autotest decorators
* test mask dtype to torch.int64
* torch.testing.assert_close
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* torch_device and @torch_gpu in tests
* upd tests
* +torch decorators
* torch decorators fixed
* more decorators!
* even more decorators
* fewer decorators
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add a convenience method for building in your own name scope
* Second attempt at auto layer building
* Revert "Second attempt at auto layer building"
This reverts commit e03a3aaecf9ec41a805582b83cbdfe3290a631be.
* Attempt #3
* Revert "Attempt #3"
This reverts commit b9df7a0857560d29b5abbed6127d9e9eca77cf47.
* Add missing attributes that we're going to need later
* Add some attributes we're going to need later
* A fourth attempt! Feel the power flow through you!
* Revert "A fourth attempt! Feel the power flow through you!"
This reverts commit 6bf4aaf3875d6f28485f50187617a4c616c8aff7.
* Add more values we'll need later
* TF refactor that we'll need later
* Revert "TF refactor that we'll need later"
This reverts commit ca07202fb5b7b7436b893baa8d688b4f348ea7b9.
* Revert "Revert "TF refactor that we'll need later""
This reverts commit 1beb0f39f293ed9c27594575e1c849aadeb15c13.
* make fixup
* Attempt five!
* Revert "Attempt five!"
This reverts commit 3302207958dfd0374b0447a51c06eea51a506044.
* Attempt six - this time don't add empty methods
* Revert "Attempt six - this time don't add empty methods"
This reverts commit 67d60129be75416b6beb8f47c7d38d77b18d79bb.
* Attempt seven - better base model class detection!
* Revert "Attempt seven - better base model class detection!"
This reverts commit 5f14845e92ea0e87c598da933bfbfee10f553bc9.
* Another attribute we'll need later
* Try again with the missing attribute!
* Revert "Try again with the missing attribute!"
This reverts commit 760c6f30c5dffb3e04b0e73c34a77d1882a0fef7.
* This is the attempt that will pierce the heavens!
* Revert "This is the attempt that will pierce the heavens!"
This reverts commit c868bb657de057aca7a5260350a3f831fc4dfee6.
* Attempt seven - snag list is steadily decreasing
* Revert "Attempt seven - snag list is steadily decreasing"
This reverts commit 46fbd975deda64429bfb3e5fac4fc0370c00d316.
* Attempt eight - will an empty snag list do it?
* Revert "Attempt eight - will an empty snag list do it?"
This reverts commit 7c8a3c2b083253649569e9877e02054ae5cec67b.
* Fixes to Hubert issues that cause problems later
* Trying again with Conv1D/SeparableConv fixes
* Revert "Trying again with Conv1D/SeparableConv fixes"
This reverts commit 55092bca952bc0f750aa1ffe246a640bf1e2036e.
* Apply the build shape fixes to Wav2Vec2 as well
* One more attempt!
* Revert "One more attempt!"
This reverts commit 5ac3e4cb01b9458cc93312873725f9444ae7261c.
* Another attempt!
* Revert "Another attempt!"
This reverts commit ea16d890e019d7de8792a3b8e72f3b1c02adae50.
* Let's see how many failures we get without the internal build method
* Fix OpenAI
* Fix MobileBERT
* (Mostly) fix GroupVIT
* Fix BLIP
* One more BLIP fix
* One more BLIP fix!
* Fix Regnet
* Finally fully fix GroupViT
* Fix Data2Vec and add the new AdaptivePool
* Fix Segformer
* Fix Albert
* Fix Deberta/DebertaV2
* Fix XLM
* Actually fix XLM
* Fix Flaubert
* Fix lxmert
* Fix Resnet
* Fix ConvBERT
* Fix ESM
* Fix Convnext / ConvnextV2
* Fix SAM
* Fix Efficientformer
* Fix LayoutLMv3
* Fix speech_to_text
* Fix mpnet and mobilevit
* Fix Swin
* Fix CTRL
* Fix CVT
* Fix DPR
* Fix Wav2Vec2
* Fix T5
* Fix Hubert
* Fix GPT2
* Fix Whisper
* Fix DeiT
* Fix the encoder-decoder / dual-encoder classes
* make fix-copies
* build in name scope
* Fix summarization test
* Fix tied weight names for BART + Blenderbot
* Fix tied weight name building
* Fix to TFESM weight building
* Update TF SAM
* Expand all the shapes out into Big Boy Shapes
* fix a typo and add an illustrative test
* appease black
* reduce code duplication and add Annotion type back with a pending deprecation warning
* remove unused code
* change warning type
* black formatting fix
* change enum deprecation approach to support 3.8 and earlier
* add stacklevel
* fix black issue
* fix ruff issues
* fix ruff issues
* move tests to own mixin
* include yolos
* fix black formatting issue
* fix black formatting issue
* use logger instead of warnings and include target version for deprecation
* Skip nn.Module.reset_parameters
* Actually skip
* Check quality
* Maybe change all inits
* Fix init issues: only modify public functions
* Add a small test for now
* Style
* test updates
* style
* nice test
* style
* make it even faster
* one more second
* remove fx incompatible
* Update tests/test_modeling_common.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
* Update tests/test_modeling_common.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
* skip
* fix quality
* protect the import
---------
Co-authored-by: Lysandre Debut <hi@lysand.re>
* [DETA] fix freeze/unfreeze function
* Update src/transformers/models/deta/modeling_deta.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/deta/modeling_deta.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add freeze/unfreeze test case in DETA
* fix type
* fix typo 2
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add sdpa
* wip
* cleaning
* add ref
* yet more cleaning
* and more :)
* wip llama
* working llama
* add output_attentions=True support
* bigcode sdpa support
* fixes
* gpt-bigcode support, require torch>=2.1.1
* add falcon support
* fix conflicts falcon
* style
* fix attention_mask definition
* remove output_attentions from attnmaskconverter
* support whisper without removing any Copied from statement
* fix mbart default to eager renaming
* fix typo in falcon
* fix is_causal in SDPA
* check is_flash_attn_2_available in the models init as well in case the model is not initialized through from_pretrained
* add warnings when falling back on the manual implementation
* precise doc
* wip replace _flash_attn_enabled by config.attn_implementation
* fix typo
* add tests
* style
* add a copy.deepcopy on the config in from_pretrained, as we do not want to modify it inplace
* obey to config.attn_implementation if a config is passed in from_pretrained
* fix is_torch_sdpa_available when torch is not installed
* remove dead code
* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/bart/modeling_bart.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* remove duplicate pretraining_tp code
* add dropout in llama
* precise comment on attn_mask
* add fmt: off for _unmask_unattended docstring
* precise num_masks comment
* nuke pretraining_tp in LlamaSDPAAttention following Arthur's suggestion
* cleanup modeling_utils
* backward compatibility
* fix style as requested
* style
* improve documentation
* test pass
* style
* add _unmask_unattended tests
* skip meaningless tests for idefics
* hard_check SDPA requirements when specifically requested
* standardize the use of XXX_ATTENTION_CLASSES
* fix SDPA bug with mem-efficient backend on CUDA when using fp32
* fix test
* rely on SDPA is_causal parameter to handle the causal mask in some cases
* fix FALCON_ATTENTION_CLASSES
* remove _flash_attn_2_enabled occurrences
* fix test
* add OPT to the list of supported flash models
* improve test
* properly test on different SDPA backends, on different dtypes & properly handle separately the pad tokens in the test
* remove remaining _flash_attn_2_enabled occurrence
* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/modeling_attn_mask_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/perf_infer_gpu_one.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* remove use_attn_implementation
* fix docstring & slight bug
* make attn_implementation internal (_attn_implementation)
* typos
* fix tests
* deprecate use_flash_attention_2=True
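For context, the user-facing interface these commits converge on, as a hedged sketch (model id illustrative):

```python
from transformers import AutoModelForCausalLM

# pick the attention backend at load time; replaces use_flash_attention_2=True
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    attn_implementation="sdpa",  # or "eager" / "flash_attention_2"
)
```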
* fix test
* add back llama that was removed by mistake
* fix tests
* remove _flash_attn_2_enabled occurrences again
* add check & test that passed attn_implementation is valid
* fix falcon torchscript export
* fix device of mask in tests
* add tip about torch.jit.trace and move bt doc below sdpa
* fix parameterized.expand order
* move tests from test_modeling_attn_mask_utils to test_modeling_utils as a relevant test class is already there
* update sdpaattention class with the new cache
* Update src/transformers/configuration_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/bark/modeling_bark.py
* address review comments
* WIP torch.jit.trace fix. left: test both eager & sdpa
* add test for torch.jit.trace for both eager/sdpa
* fix falcon with torch==2.0 that needs to use sdpa
* fix doc
* hopefully last fix
* fix key_value_length that has no default now in mask converter
* is it flaky?
* fix speculative decoding bug
* tests do pass
* fix following #27907
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Fulfill request
* Add test
* Better test
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Better test
* Better test
* More comments
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Fix issues in add and is_done for BeamHypotheses
* make newly added arguments optional for better compatibility
* Directly use cur_len as generated_len, add note for retrocompatibility
* update test expectation
* make cur_len represent the length of the entire sequence including the decoder prompt
* remove redundant if/else in testing
* Draft version of new KV Caching
This should allow Attention Sinks (https://github.com/tomaarsen/attention_sinks)
/ StreamingLLM (https://arxiv.org/abs/2309.17453) to be easily implemented
in a third-party or in transformers directly
* Address numerous PR suggestions
1. Move layer_idx from cache to ...Attention. Removes confusing set_layer_idx magic.
2. Always convert past_key_values to Cache instance at the start of ...Attention, removes all other isinstance calls.
3. Remove __bool__ and __getitem__ magic as they're confusing.
4. past_key_values.update(key, value, idx) now returns key, value (see the sketch after this list).
5. Add use_legacy_cache flag, defaults to None, i.e. falsy. This breaks generate for now, until either 1) the cache is used in generate() or 2) use_legacy_cache is defaulted to True in generate(), to be changed in another PR.
6. Separate key_cache and value_cache.
Some work is still needed to see if the SinkCache can conveniently be implemented with just one update method.
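A minimal sketch of the update pattern from point 4 above, assuming the DynamicCache class this PR introduces:

```python
import torch
from transformers.cache_utils import DynamicCache

cache = DynamicCache()
for step in range(3):
    # fresh key/value states for one decoding step: (batch, heads, seq, head_dim)
    key = torch.randn(1, 4, 1, 8)
    value = torch.randn(1, 4, 1, 8)
    # update() appends to this layer's cache and returns the full tensors
    key, value = cache.update(key, value, layer_idx=0)
print(key.shape)  # torch.Size([1, 4, 3, 8])
```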
* Implement the SinkCache through backward+forward rotations
* Integrate (Sink)Cache with Llama FA2
* Set use_legacy_cache=True as default, allows for test passes
* Move from/to_legacy_cache to ...Model class
* Undo unnecessary newline change
* Remove copy utility from deprecated OpenLlama
* Match import style
* manual rebase with main
* Cache class working with generate (#1)
* Draft version of new KV Caching
This should allow Attention Sinks (https://github.com/tomaarsen/attention_sinks)
/ StreamingLLM (https://arxiv.org/abs/2309.17453) to be easily implemented
in a third-party or in transformers directly
* Address numerous PR suggestions
1. Move layer_idx from cache to ...Attention. Removes confusing set_layer_idx magic.
2. Always convert past_key_values to Cache instance at the start of ...Attention, removes all other isinstance calls.
3. Remove __bool__ and __getitem__ magic as they're confusing.
4. past_key_values.update(key, value, idx) now returns key, value.
5. Add use_legacy_cache flag, defaults to None, i.e. falsy. This breaks generate for now, until either 1) the cache is used in generate() or 2) use_legacy_cache is defaulted to True in generate(), to be changed in another PR.
6. Separate key_cache and value_cache.
Some work is still needed to see if the SinkCache can conveniently be implemented with just one update method.
* Integrate (Sink)Cache with Llama FA2
* Move from/to_legacy_cache to ...Model class
* Undo unnecessary newline change
* Match import style
* working generate
* Add tests; Simplify code; Apply changes to Mistral and Persimmon
* fix rebase mess
* a few more manual fixes
* last manual fix
* propagate changes to phi
* upgrade test
* add use_legacy_cache docstring; beef up tests
* reintroduce unwanted deletes
---------
Co-authored-by: Tom Aarsen <Cubiegamedev@gmail.com>
* move import
* add default to model_kwargs.get('use_legacy_cache')
* correct failing test
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* apply PR suggestions
* fix failing test
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com>
* PR comments
* tmp commit
* add docstrings
* more tests, more docstrings, add to docs
* derp
* tmp commit
* tmp dbg
* more dbg
* fix beam search bug
* cache can be a list of tuples in some models
* fix group beam search
* all but sinkcache integration tests
* fix sink cache and add hard integration test
* now also compatible with input_embeds input
* PR comments
* add Cache support to Phi+FA2
* make fixup
---------
Co-authored-by: Joao Gante <joao@huggingface.co>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Un-skip tests
* Add aliasing support to tf_to_pt_weight_rename
* Refactor tf-to-pt weight rename for simplicity
* Patch mobilebert
* Let us pray that the transfo-xl one works
* Add XGLM rename
* Expand the test to see if we can get more models to break
* Expand the test to see if we can get more models to break
* Fix MPNet (it was actually an unrelated bug)
* Fix MPNet (it was actually an unrelated bug)
* Add speech2text fix
* Update src/transformers/modeling_tf_pytorch_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/mobilebert/modeling_tf_mobilebert.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update to always return a tuple from tf_to_pt_weight_rename
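A sketch of the resulting convention — the hook returns a tuple of candidate PyTorch names so a single TF weight can alias several; the names below are hypothetical:

```python
def tf_to_pt_weight_rename(self, tf_weight: str):
    # hypothetical alias: one TF weight may correspond to either PT name
    if tf_weight == "shared/embeddings:0":
        return ("shared.weight", "lm_head.weight")
    # the common case: still a tuple, with a single candidate
    return (tf_weight,)
```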
* reformat
* Add a couple of missing tuples
* Remove the extra test for tie_word_embeddings since it didn't cause any unexpected failures anyway
* Revert changes to modeling_tf_mpnet.py
* Skip MPNet test and add explanation
* Add weight link for BART
* Add TODO to clean this up a bit
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add model like
* logits match
* minor fixes
* fixes
* up
* up
* add todo
* llava processor
* keep the processor simple
* add conversion script
* fixup
* fix copies
* up
* add to index
* fix config + logits
* fix
* refactor
* more refactor
* more refactor
* fix copies
* add authors
* v1 tests
* add `LlavaProcessor` in init
* remove unneeded import
* up
* up
* docs
* up
* fix CI
* fix CI
* add attention mask in test
* make fixup
* remove the vision model
* that's the dirty way to do it
* nits
* nits
* updates
* add more tests
* add input tests
* fixup
* more styling
* nits
* updates and cleanup
* fixup the generation expected results
* fix the testing script
* some cleanup and simplification which does not work yet but almost there!
* make correct dispatch operations
* vectorize works for batch of images and text
* last todos
* nits
* update test and modeling code
* remove useless function for now
* fix few issues
* fix generation
* some nits
* add bakllava
* nits
* remove duplicated code
* finish merge
* cleanup
* missed this line
* fill the todos
* add left padding offset
* add left and right padding logic
* bool to properly index
* make sure
* more cleanups
* batch is fixed 😉
* add correct device for tensor creation
* fix some dtype mismatch
* ruff
* update conversion script
* Update src/transformers/__init__.py
* fa 2 support + fix conversion script
* more
* correct reshaping
* fix test dict
* fix copies by ignoring
* fix nit
* skip clip vision model
* fixup
* fixup
* LlavaForVisionText2Text -> LlavaForCausalLM
* update
* fix
* raise correct errors
* fix
* docs
* nuke for now
* nits here and there
* fixup
* fix remaining tests
* update LlavaForConditionalGeneration instead of CausalLM
* fixups
* pipeline support
* slow and pipeline tests
* supports batch
* nits
* cleanup
* fix first integration tests
* add pad token where needed
* correct tests
* fixups
* update pipeline tests
* fix quality
* nits
* revert unneeded change
* nit
* use BatchFeature
* from ...feature_extraction_utils import BatchFeature
* nits
* nits
* properly update
* more f*** nits
* fix copies
* comment
* keep slow test slow
* Update src/transformers/models/llava/processing_llava.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add piepline example
* add pixel values in docstring
* update pr doctest
* fix
* fix slow tests
* remove hack
* fixup
* small note
* forward contrib credits from PR25789
* forward contrib credits from original implementation and work
* add arthur
* Update src/transformers/models/llava/processing_llava.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
* update docstring
* nit
* move to not doctested because of timeout issues
* fixup
* add description
* more
* fix-copies
* fix docs
* add beam search
* add more comments
* add typehints on processor
* add speedup plot
* update slow tests and docs
* push test
* push batched test
* fix batched generation with different number of images
* remove benchmark due to a bug
* fix test
* fix copies
* add gcolab demo
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: shauray8 <shauray8@users.noreply.github.com>
Co-authored-by: haotian-liu <haotian-liu@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
* Copies `modeling_flax_gpt_neo.py` to start
* MLP Block. WIP Attention and Block
* Adds Flax implementation of `LlamaMLP`
Validated with in-file test.
Some slight numeric differences, but assuming it isn't an issue
* Adds `FlaxLlamaRMSNorm` layer
`flax.linen` includes `RMSNorm` layer but not necessarily in all
versions. Hence, we add in-file.
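A minimal RMSNorm sketch of the kind added in-file (not the exact PR code):

```python
import flax.linen as nn
import jax.numpy as jnp

class RMSNorm(nn.Module):
    eps: float = 1e-6

    @nn.compact
    def __call__(self, hidden_states):
        weight = self.param("weight", nn.initializers.ones, (hidden_states.shape[-1],))
        # compute the variance in float32 for stability
        variance = jnp.mean(jnp.square(hidden_states.astype(jnp.float32)), axis=-1, keepdims=True)
        # 1 / sqrt rather than lax.rsqrt; see the numerics note further down
        hidden_states = hidden_states * (1.0 / jnp.sqrt(variance + self.eps))
        return weight * hidden_states
```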
* Adds FlaxLlamaAttention
Copied from GPT-J as it has efficient caching implementation as well as
rotary embeddings.
Notice numerically different, but not by a huge amount. Needs
investigating
* Adds `FlaxLlamaDecoderLayer`
numerically inaccurate, debugging..
* debugging rotary mismatch
gptj uses interleaved whilst llama uses contiguous
i think they match now but still final result is wrong.
maybe drop back to just debugging attention layer?
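An illustration of the pairing mismatch described above, assuming the usual conventions (GPT-J rotates interleaved even/odd pairs, Llama rotates contiguous half-dimension pairs):

```python
import numpy as np

dim = 8
x = np.arange(dim)
# GPT-J style: interleaved pairs -> (x0, x1), (x2, x3), ...
interleaved = x.reshape(-1, 2)
# Llama style: each index pairs with its counterpart half a head away
# -> (x0, x4), (x1, x5), ...
contiguous = np.stack([x[: dim // 2], x[dim // 2:]], axis=-1)
```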
* fixes bug with decoder layer
still somewhat numerically inaccurate, but close enough for now
* adds markers for what to implement next
the structure here diverges a lot from the PT version.
not a big fan of it, but just get something working for now
* implements `FlaxLlamaBlockCollection`
tolerance must be higher than expected, kinda disconcerting
* Adds `FlaxLlamaModule`
equivalent PyTorch model is `LlamaModel`
yay! a language model🤗
* adds `FlaxLlamaForCausalLMModule`
equivalent to `LlamaForCausalLM`
still missing returning dict or tuple, will add later
* start porting pretrained wrappers
realised it probably needs return dict as a prereq
* cleanup, quality, style
* readds `return_dict` and model output named tuples
* (tentatively) pretrained wrappers work 🔥
* fixes numerical mismatch in `FlaxLlamaRMSNorm`
seems `jax.lax.rsqrt` does not match `torch.sqrt`.
manually computing `1 / jax.numpy.sqrt` results in matching values.
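A tiny repro of the kind of discrepancy being described (values illustrative):

```python
import jax
import jax.numpy as jnp

x = jnp.float32(3.0)
fast = jax.lax.rsqrt(x)      # hardware reciprocal square root
manual = 1.0 / jnp.sqrt(x)   # tracks torch's sqrt-based computation more closely
# the two can disagree in the low-order bits, enough to fail tight
# equivalence checks against the PyTorch reference
print(fast, manual)
```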
* [WIP] debugging numerics
* numerical match
I think issue was accidental change of backend. forcing CPU fixes test.
We expect some mismatch on GPU.
* adds in model and integration tests for Flax Llama
summary of failing:
- mul invalid combination of dimensions
- one numerical mismatch
- bf16 conversion (maybe my local backend issue)
- params are not FrozenDict
* adds missing TYPE_CHECKING import and `make fixup`
* adds back missing docstrings
needs review on quality of docstrings, not sure what is required.
Furthermore, need to check if `CHECKPOINT_FOR_DOC` is valid. See TODO
* commenting out equivalence test as we can just use the common one
* debugging
* Fixes bug where mask and pos_ids were swapped in pretrained models
This results in all tests passing now 🔥
* cleanup of modeling file
* cleanup of test file
* Resolving simpler review comments
* addresses more minor review comments
* fixing introduced pytest errors from review
* wip additional slow tests
* wip tests
need to grab a GPU machine to get real logits for comparison
otherwise, slow tests should be okay
* `make quality`, `make style`
* adds slow integration tests
- checking logits
- checking hidden states
- checking generation outputs
* `make fix-copies`
* fix mangled function following `make fix-copies`
* adds missing type checking imports
* fixes missing parameter checkpoint warning
* more finegrained 'Copied from' tags
avoids issue of overwriting `LLAMA_INPUTS_DOCSTRING`
* swaps import guards
??? how did these get swapped initially?
* removing `inv_freq` again as the pytorch version has now removed it
* attempting to get CI to pass
* adds doc entries for llama flax models
* fixes typo in __init__.py imports
* adds back special equivalence tests
these come from the gpt neo flax tests. there is special behaviour for these models that needs to override the common version
* overrides tests with dummy to see if CI passes
need to fill in these tests later
* adds my contribution to docs
* `make style; make quality`
* replaces random masking with fixed to work with flax version
* `make quality; make style`
* Update src/transformers/models/llama/modeling_flax_llama.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update src/transformers/models/llama/modeling_flax_llama.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update src/transformers/models/llama/modeling_flax_llama.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update src/transformers/models/llama/modeling_flax_llama.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update src/transformers/models/llama/modeling_flax_llama.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update src/transformers/models/llama/modeling_flax_llama.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* updates `x`->`tensor` in `rotate_half`
* addresses smaller review comments
* Update docs/source/en/model_doc/llama.md
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* adds integration test class
* adds `dtype` to rotary embedding to cast outputs
* adds type to flax llama rotary layer
* `make style`
* `make fix-copies`
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* applies suggestions from review
* Update modeling_flax_llama.py
* `make fix-copies`
* Update tests/models/llama/test_modeling_llama.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Update src/transformers/models/llama/modeling_flax_llama.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* fixes shape mismatch in FlaxLlamaMLP
* applies some suggestions from reviews
* casts attn output logits to f32 regardless of dtype
* adds attn bias using `LlamaConfig.attention_bias`
* adds Copied From comments to Flax Llama test
* mistral and persimmon test change - copy from llama
* updates docs index
* removes Copied from in tests
it was preventing `make fix-copies` from succeeding
* quality and style
* ignores FlaxLlama input docstring
* adds revision to `_CHECKPOINT_FOR_DOC`
* repo consistency and quality
* removes unused import
* removes copied from from Phi test
now diverges from llama tests following FlaxLlama changes
* adds `_REAL_CHECKPOINT_FOR_DOC`
* removes refs from pr tests
* reformat to make ruff happy
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* v1 fusing modules
* add fused mlp support
* up
* fix CI
* block save_pretrained
* fixup
* small fix
* add new condition
* add v1 docs
* add some comments
* style
* fix nit
* adapt from suggestion
* add check
* change arg names
* change variables name
* Update src/transformers/integrations/awq.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* style
* split up into 3 different private methods
* more conditions
* more checks
* add fused tests for custom models
* fix
* fix tests
* final update docs
* final fixes
* fix importlib metadata
* Update src/transformers/utils/quantization_config.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* change it to `do_fuse`
* nit
* Update src/transformers/utils/quantization_config.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/utils/quantization_config.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/utils/quantization_config.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* few fixes
* revert
* fix test
* fix copies
* raise error if model is not quantized
* add test
* use quantization_config.config when fusing
* Update src/transformers/modeling_utils.py
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Added test cases for rembert referring to albert and reformer test_tokenization
* removed CURL_CA_BUNDLE=''
* Added flag test_sentencepiece_ignore_case and space_between_special_tokens to True
* Overrided test_added_tokens_serialization
* As the slow->fast conversion failed due to the different initialization of [MASK] for slow and fast tokenizers, make the [MASK] token initialization uniform between the fast and slow tokenizers
* Added a few more test cases in test_encode_decode_round_trip and modified the slow tokenizer's mask_token to be an AddedToken instance with lstrip=True
* Added a few test cases in test_encode_decode_round_trip and also modified the slow tokenizer of rembert to have mask_token as an AddedToken with lstrip=True
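A sketch of the uniform initialization these commits describe:

```python
from transformers import AddedToken

# give the slow tokenizer the same [MASK] behaviour as the fast one, so
# "x [MASK]" and "x[MASK]" round-trip identically in both
mask_token = AddedToken("[MASK]", lstrip=True, rstrip=False)
```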
* Cleaned the code and added fmt: skip to avoid line breaks after make style + added comments to indicate the copied test cases
* Corrected few comments
* Fixed quality issue
* Ran fix-copies
* Fixed a few minor issues, as (make fix-copies) broke a few test cases while stripping the text
* Reverted the changes made by repo-consistency
---------
Co-authored-by: Kokane <kokanen@apac.corpdir.net>
* [WIP] Make using safetensors files automated.
If `use_safetensors=True` is used, and it doesn't exist:
- Don't crash just yet
- Lookup for an open PR containing it.
- If yes, use that instead
- If not, touch the space to convert, wait for conversion to be finished
and the PR to be opened
- Use that new PR
- Profit.
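The resulting usage pattern, as a sketch (model id illustrative):

```python
from transformers import AutoModel

# if the repo has no safetensors weights yet, transformers looks for an open
# conversion PR on the Hub (or triggers the conversion Space and waits),
# then loads the converted weights from that PR
model = AutoModel.from_pretrained("some-org/some-model", use_safetensors=True)
```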
* Remove the token.
* [Auto Safetensors] Websocket -> SSE (#27656)
* Websocket -> SSE
* Support sharded + tests + cleanup
* env var
* Apply suggestions from code review
* Thanks Simon
* Thanks Wauplin
Co-authored-by: Wauplin <lucainp@gmail.com>
* Cleanup
* Update tests
* Tests should pass
* Apply to other tests
* Extend extension
* relax requirement on latest hfh
* Revert
* Correct private handling & debug statements
* Skip gated repos as of now
* Address review comments
Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
---------
Co-authored-by: Lysandre Debut <hi@lysand.re>
Co-authored-by: Lysandre <lysandre@huggingface.co>
Co-authored-by: Wauplin <lucainp@gmail.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
* add working conversion script
* first non-working version of modeling code
* update modeling code (working)
* make style
* make fix-copies
* add config docstrings
* add config to ignore docstrings formatage due to unconventional markdown
* fix copies
* fix generation num_return_sequences
* enrich docs
* add and fix tests besides integration tests
* update integration tests
* update repo id
* add tie weights and make style
* correct naming in .md
* fix imports and so on
* correct docstrings
* fix fp16 speech forward
* fix speechencoder attention
* make style
* fix copied from
* rename SeamlessM4T-v2 to SeamlessM4Tv2
* Apply suggestions on configuration
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* remove useless public models
* fix private models + better naming for T2U models
* clean speech encoder relative position embeddings
* refactor chunk attention
* add docstrings to chunk attention method
* improve naming and docstrings
* rename some attention variables + add temperature sampling in T2U model
* rename DOCSTRINGS variable names
* make style + remove 2 useless config parameters
* enrich model card
* remove any attention_head reference + fix temperature in T2U
* new fmt and make style
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* rename spkr_id->speaker_id and change docstrings of get_char_input_ids
* simplify v2attention
* make style
* Update seamless_m4t_v2.md
* update code and tests with last update
* update repo ids
* fill in article name, abstract, and authors
* update not_doctested and slow_doc tests
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add distribution head to forecasting
* formatting
* Add generate function for forecasting
* Add generate function to prediction task
* formatting
* use argsort
* add past_observed_mask ordering
* fix arguments
* docs
* add back test_model_outputs_equivalence test
* formatting
* cleanup
* formatting
* use ACT2CLS
* formatting
* fix add_start_docstrings decorator
* add distribution head and generate function to regression task
add distribution head and generate function to regression task. Also added PatchTSTForForecastingOutput and PatchTSTForRegressionOutput.
* fix typos
* add forecast_masking
* fixed tests
* use set_seed
* fix doc test
* formatting
* Update docs/source/en/model_doc/patchtst.md
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* better var names
* rename PatchTSTTranspose
* fix argument names and docs string
* remove compute_num_patches and unused class
* remove assert
* renamed to PatchTSTMasking
* use num_labels for classification
* use num_labels
* use default num_labels from super class
* move model_type after docstring
* renamed PatchTSTForMaskPretraining
* bs -> batch_size
* more review fixes
* use hidden_state
* rename encoder layer and block class
* remove commented seed_number
* edit docstring
* Add docstring
* formatting
* use past_observed_mask
* doc suggestion
* make fix-copies
* use Args:
* add docstring
* add docstring
* change some variable names and add PatchTST before some class names
* formatting
* fix argument types
* fix tests
* change x variable to patch_input
* format
* formatting
* fix-copies
* Update tests/models/patchtst/test_modeling_patchtst.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* move loss to forward
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* formatting
* fix a bug when pre_norm is set to True
* output_hidden_states is set to False as default
* set pre_norm=True as default
* format docstring
* format
* output_hidden_states is None by default
* add missing docs
* better var names
* docstring: remove default to False in output_hidden_states
* change labels name to target_values in regression task
* format
* fix tests
* change to forecast_mask_ratios and random_mask_ratio
* change mask names
* change future_values to target_values param in the prediction class
* remove nn.Sequential and make PatchTSTBatchNorm class
* black
* fix argument name for prediction
* add output_attentions option
* add output_attentions to PatchTSTEncoder
* formatting
* Add attention output option to all classes
* Remove PatchTSTEncoderBlock
* create PatchTSTEmbedding class
* use config in PatchTSTPatchify
* Use config in PatchTSTMasking class
* add channel_attn_weights
* Add PatchTSTScaler class
* add output_attentions arg to test function
* format
* Update doc with image patchtst.md
* fix-copies
* rename Forecast <-> Prediction
* change name of a few parameters to match with PatchTSMixer.
* Remove *ForForecasting class to match with other time series models.
* make style
* Remove PatchTSTForForecasting in the test
* remove PatchTSTForForecastingOutput class
* change test_forecast_head to test_prediction_head
* style
* fix docs
* fix tests
* change num_labels to num_targets
* Remove PatchTSTTranspose
* remove arguments in PatchTSTMeanScaler
* remove arguments in PatchTSTStdScaler
* add config as an argument to all the scaler classes
* reformat
* Add norm_eps for batchnorm and layernorm
* reformat.
* reformat
* edit docstring
* update docstring
* change variable name pooling to pooling_type
* fix output_hidden_states as tuple
* fix bug when calling PatchTSTBatchNorm
* change stride to patch_stride
* create PatchTSTPositionalEncoding class and restructure the PatchTSTEncoder
* formatting
* initialize scalers with configs
* edit output_hidden_states
* style
* fix forecast_mask_patches doc string
* doc improvements
* move summary to the start
* typo
* fix docstring
* turn off masking when using prediction, regression, classification
* return scaled output
* adjust output when using distribution head
* remove _num_patches function in the config
* get config.num_patches from patchifier init
* add output_attentions docstring, remove tuple in output_hidden_states
* change SamplePatchTSTPredictionOutput and SamplePatchTSTRegressionOutput to SamplePatchTSTOutput
* remove print("model_class: ", model_class)
* change encoder_attention_heads to num_attention_heads
* change norm to norm_layer
* change encoder_layers to num_hidden_layers
* change shared_embedding to share_embedding, shared_projection to share_projection
* add output_attentions
* more robust check of norm_type
* change dropout_path to path_dropout
* edit docstring
* remove positional_encoding function and add _init_pe in PatchTSTPositionalEncoding
* edit shape of cls_token and initialize it
* add a check on the num_input_channels.
* edit head_dim in the Prediction class to allow the use of cls_token
* remove some positional_encoding_type options, remove learn_pe arg, initialize pe
* change Exception to ValueError
* format
* norm_type is "batchnorm"
* make style
* change cls_token shape
* Change forecast_mask_patches to num_mask_patches. Remove forecast_mask_ratios.
* Bring PatchTSTClassificationHead on top of PatchTSTForClassification
* change encoder_ffn_dim to ffn_dim and edit the docstring.
* update variable names to match with the config
* add generation tests
* change num_mask_patches to num_forecast_mask_patches
* Add examples explaining the use of these models
* make style
* Revert "Revert "[time series] Add PatchTST (#25927)" (#27486)"
This reverts commit 78f6ed6c70.
* make style
* fix default std scaler's minimum_scale
* fix docstring
* close code blocks
* Update docs/source/en/model_doc/patchtst.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/patchtst/test_modeling_patchtst.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/patchtst/configuration_patchtst.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix tests
* add add_start_docstrings
* move examples to the forward's docstrings
* update prepare_batch
* update test
* fix test_prediction_head
* fix generation test
* use seed to create generator
* add output_hidden_states and config.num_patches
* add loc and scale args in PatchTSTForPredictionOutput
* edit outputs if not return_dict
* use self.share_embedding to check instead of checking type.
* remove seed
* make style
* seed is an optional int
* fix test
* generator device
* Fix assertTrue test
* swap order of items in outputs when return_dict=False.
* add mask_type and random_mask_ratio to unittest
* Update modeling_patchtst.py
* add add_start_docstrings for regression model
* make style
* update model path
* Edit the ValueError comment in forecast_masking
* update examples
* make style
* fix commented code
* update examples: remove config from from_pretrained call
* Edit example outputs
* Set default target_values to None
* remove config setting in regression example
* Update configuration_patchtst.py
* Update configuration_patchtst.py
* remove config from examples
* change default d_model and ffn_dim
* norm_eps default
* set has_attentions to True and define self.seq_length = self.num_patches
* update docstring
* change variable mask_input to do_mask_input
* fix blank space.
* change logger.debug to logger.warning.
* remove unused PATCHTST_INPUTS_DOCSTRING
* remove all_generative_model_classes
* set test_missing_keys=True
* remove undefined params in the docstring.
---------
Co-authored-by: nnguyen <nnguyen@us.ibm.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Nam Nguyen <namctin@gmail.com>
Co-authored-by: Wesley Gifford <79663411+wgifford@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Fix mistral generate for long prompt / response
* Add unit test
* fix linter
* fix linter
* fix test
* add assisted generation test for mistral and load the model in 4 bit + fa2
* initial commit
* Add inital testing files and modify __init__ files to add UnivNet imports.
* Fix some bugs
* Add checkpoint conversion script and add references to transformers pre-trained model.
* Add UnivNet entries for auto.
* Add initial docs for UnivNet.
* Handle input and output shapes in UnivNetGan.forward and add initial docstrings.
* Write tests and make them pass.
* Write docs.
* Add UnivNet doc to _toctree.yml and improve docs.
* fix typo
* make fixup
* make fix-copies
* Add upsample_rates parameter to config and improve config documentation.
* make fixup
* make fix-copies
* Remove unused upsample_rates config parameter.
* apply suggestions from review
* make style
* Verify and add reason for skipped tests inherited from ModelTesterMixin.
* Add initial UnivNetGan integration tests
* make style
* Remove noise_length input to UnivNetGan and improve integration tests.
* Fix bug and make style
* Make UnivNet integration tests pass
* Add initial code for UnivNetFeatureExtractor.
* make style
* Add initial tests for UnivNetFeatureExtractor.
* make style
* Properly initialize weights for UnivNetGan
* Get feature extractor fast tests passing
* make style
* Get feature extractor integration tests passing
* Get UnivNet integration tests passing
* make style
* Add UnivNetGan usage example
* make style and use feature extractor from hub in integration tests
* Update tips in docs
* apply suggestions from review
* make style
* Calculate padding directly instead of using get_padding methods.
* Update UnivNetFeatureExtractor.to_dict to be UnivNet-specific.
* Update feature extractor to support using model(**inputs) and add the ability to generate noise and pad the end of the spectrogram in __call__.
* Perform padding before generating noise to ensure the shapes are correct.
* Rename UnivNetGan.forward's noise_waveform argument to noise_sequence.
* make style
* Add tests to test generating noise and padding the end for UnivNetFeatureExtractor.__call__.
* Add tests for checking batched vs unbatched inputs for UnivNet feature extractor and model.
* Add expected mean and stddev checks to the integration tests and make them pass.
* make style
* Make it possible to use model(**inputs), where inputs is the output of the feature extractor.
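The model(**inputs) pattern this enables, as a sketch (checkpoint id and output attribute assumed):

```python
import numpy as np
from transformers import UnivNetFeatureExtractor, UnivNetModel

feature_extractor = UnivNetFeatureExtractor.from_pretrained("dg845/univnet-dev")
model = UnivNetModel.from_pretrained("dg845/univnet-dev")

audio = np.random.randn(24000)  # stand-in for a real 24 kHz waveform
inputs = feature_extractor(audio, sampling_rate=24000, return_tensors="pt")
waveform = model(**inputs).waveforms  # output attribute name assumed
```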
* fix typo in UnivNetGanConfig example
* Calculate spectrogram_zero from other config values.
* apply suggestions from review
* make style
* Refactor UnivNet conversion script to use load_state_dict (following persimmon).
* Rename UnivNetFeatureExtractor to UnivNetGanFeatureExtractor.
* make style
* Switch to using torch.tensor and torch.testing.assert_close for testing expected values/slices.
* make style
* Use config in UnivNetGan modeling blocks.
* make style
* Rename the spectrogram argument of UnivNetGan.forward to input_features, following Whisper.
* make style
* Improving padding documentation.
* Add UnivNet usage example to the docs.
* apply suggestions from review
* Move dynamic_range_compression computation into the mel_spectrogram method of the feature extractor.
* Improve UnivNetGan.forward return docstring.
* Update table in docs/source/en/index.md.
* make fix-copies
* Rename UnivNet components to have pattern UnivNet*.
* make style
* make fix-copies
* Update docs
* make style
* Increase tolerance on flaky unbatched integration test.
* Remove torch.no_grad decorators from UnivNet integration tests to try to avoid flax/Tensorflow test errors.
* Add padding_mask argument to UnivNetModel.forward and add batch_decode feature extractor method to remove padding.
* Update documentation and clean up padding code.
* make style
* make style
* Remove torch dependency from UnivNetFeatureExtractor.
* make style
* Fix UnivNetModel usage example
* Clean up feature extractor code/docstrings.
* apply suggestions from review
* make style
* Add comments for tests skipped via ModelTesterMixin flags.
* Add comment for model parallel tests skipped via the test_model_parallel ModelTesterMixin flag.
* Add # Copied from statements to copied UnivNetFeatureExtractionTest tests.
* Simplify UnivNetFeatureExtractorTest.test_batch_decode.
* Add support for unbatched padding_masks in UnivNetModel.forward.
* Refactor unbatched padding_mask support.
* make style
* [Whisper] Add seq gen
* [Whisper] Add seq gen
* more debug
* Fix whisper logit processor
* Improve whisper code further
* Fix more
* more debug
* more debug
* Improve further
* Add tests
* Prep for batch size > 1
* Get batch_size>1 working
* Correct more
* Add extensive tests
* more debug
* more debug
* more debug
* add more tests
* more debug
* Apply suggestions from code review
* more debug
* add comments to explain the code better
* add comments to explain the code better
* add comments to explain the code better
* Add more examples
* add comments to explain the code better
* fix more
* add comments to explain the code better
* add comments to explain the code better
* correct
* correct
* finalize
* Apply suggestions from code review
* Apply suggestions from code review
* tvp model for video grounding
add tokenizer auto
fix param in TVPProcessor
add docs
clear comments and enable different torch dtype
add image processor test and model test and fix code style
* fix conflict
* fix model doc
* fix image processing tests
* fix tvp tests
* remove torch in processor
* fix grammar error
* add more details on tvp.md
* fix model arch for loss, grammar, and processor
* add docstring and do not regard TvpTransformer, TvpVisionModel as individual models
* use pad_image
* update copyright
* control first downsample stride
* reduce first only works for ResNetBottleNeckLayer
* fix param name
* fix style
* add testing
* fix style
* rm init_weight
* fix style
* add post init
* fix comments
* do not test TvpTransformer
* fix warning
* fix style
* fix example
* fix config map
* add link in config
* fix comments
* fix style
* rm useless param
* change attention
* change test
* add notes
* fix comments
* fix tvp
* import checkpointing
* fix gradient checkpointing
* Use a more accurate example in readme
* update
* fix copy
* fix style
* update readme
* delete print
* remove tvp test_forward_signature
* remove TvpTransformer
* fix test init model
* merge main and make style
* fix tests and others
* fix image processor
* fix style and model_input_names
* fix tests
* fix image_attention gate in idefics modeling
* update comment
* cleaner gating
* fix gate condition
* create attention gate once
* update comment
* update doc of cross-attention forward
* improve comment
* bring back no_images
* pass cross_attention_gate similarly to no_images gate
* add information on gate shape
* fix no_images placement
* make tests for gate
* take off no_images logic
* update test based on comments
* raise value error if cross_attention_gate is None
* send cross_attention_gate to device
* Revert "send cross_attention_gate to device"
This reverts commit 054f842284.
* send cross_attention_gate to device
* fix device in test + nit
* fill hidden_states with zeros instead of multiplying with the gate
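A sketch of that last change (shapes and names assumed):

```python
import torch

batch, seq_len, hidden = 2, 5, 16
hidden_states = torch.randn(batch, seq_len, hidden)
cross_attention_gate = torch.tensor([[1, 1, 0, 1, 0], [0, 1, 1, 1, 1]])

# fill gated-off positions with zeros instead of multiplying by the gate
mask = (cross_attention_gate == 0).unsqueeze(-1)
hidden_states = hidden_states.masked_fill(mask, 0.0)
```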
* style
* Update src/transformers/models/idefics/modeling_idefics.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/idefics/modeling_idefics.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Load idx2sym from pretrained vocab file in Transformer XL
When loading a vocab file from a pretrained tokenizer for Transformer XL, the pickled vocabulary file contains an idx2sym key, but it isn't loaded: it is discarded because an empty list already exists as an attribute.
The solution is to take it into account explicitly, just like sym2idx.
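A toy sketch of the described fix; the class and guard are illustrative, not the tokenizer's exact code:

```python
import pickle

class ToyVocab:
    def __init__(self, vocab_file):
        # pre-initialized defaults that previously shadowed the pickled values
        self.idx2sym = []
        self.sym2idx = {}
        with open(vocab_file, "rb") as f:
            vocab_dict = pickle.load(f)
        for key, value in vocab_dict.items():
            # load sym2idx AND idx2sym explicitly, even though empty
            # defaults for them already exist as attributes
            if key not in self.__dict__ or key in ("sym2idx", "idx2sym"):
                self.__dict__[key] = value
```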
* ran make style
* try to stylify using ruff
* might need to remove these changes?
* use ruf format andruff check
* use isinstance instead of type comparision
* use # fmt: skip
* use # fmt: skip
* nits
* soem styling changes
* update ci job
* nits isinstance
* more files update
* nits
* more nits
* small nits
* check and format
* revert wrong changes
* actually use formatter instead of checker
* nits
* well docbuilder is overwriting this commit
* revert notebook changes
* try to nuke docbuilder
* style
* fix feature exrtaction test
* remve `indent-width = 4`
* fixup
* more nits
* update the ruff version that we use
* style
* nuke docbuilder styling
* leve the print for detected changes
* nits
Remove file I/O
Co-authored-by: charliermarsh <charlie.r.marsh@gmail.com>
* style
* nits
* revert notebook changes
* Add # fmt skip when possible
* Add # fmt skip when possible
* Fix
* More ` # fmt: skip` usage
* More ` # fmt: skip` usage
* More ` # fmt: skip` usage
* Nits
* more fixes
* fix tapas
* Another way to skip
* Recommended way
* Fix two more files
* Remove asynch
---------
Co-authored-by: charliermarsh <charlie.r.marsh@gmail.com>
* import hf error
* nits
* fixup
* catch the error at the correct place
* style
* improve message a tiny bit
* Update src/transformers/utils/hub.py
Co-authored-by: Lucain <lucainp@gmail.com>
* add a test
---------
Co-authored-by: Lucain <lucainp@gmail.com>
* skip 4 tests
* nits
* style
* wow it's not my day
* skip new failing tests
* style
* skip for NLLB MoE as well
* skip `test_assisted_decoding_sample` for everyone
* Have seq2seq just use gather
* Change
* Reset after
* Make slow
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Clean
* Simplify and just use gather
* Update tests/trainer/test_trainer_seq2seq.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* gather always for seq2seq
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix speecht5 wrong attention mask when padding
* enable batch generation and add parameter attention_mask
* fix doc
* fix format
* batch postnet inputs, return batched lengths, and stay consistent with the old API
* fix format
* fix format
* fix the format
* fix doc-builder error
* add test, cross attention and docstring
* optimize code based on reviews
* docbuild
* refine
* do not skip slow test
* add consistent dropout for batching
* loose atol
* add another test regarding to the consistency of vocoder
* fix format
* refactor
* add return_concrete_lengths as a parameter for consistency with/without batching
* fix review issues
* fix cross_attention issue
* Initial commit of PatchTST model classes
Co-authored-by: Phanwadee Sinthong <phsinthong@gmail.com>
Co-authored-by: Nam Nguyen <namctin@gmail.com>
Co-authored-by: Vijay Ekambaram <vijaykr.e@gmail.com>
Co-authored-by: Ngoc Diep Do <55230119+diepi@users.noreply.github.com>
Co-authored-by: Wesley Gifford <79663411+wgifford@users.noreply.github.com>
* Add PatchTSTForPretraining
* update to include classification
Co-authored-by: Phanwadee Sinthong <phsinthong@gmail.com>
Co-authored-by: Nam Nguyen <namctin@gmail.com>
Co-authored-by: Vijay Ekambaram <vijaykr.e@gmail.com>
Co-authored-by: Ngoc Diep Do <55230119+diepi@users.noreply.github.com>
Co-authored-by: Wesley Gifford <79663411+wgifford@users.noreply.github.com>
* clean up auto files
* Add PatchTSTForPrediction
* Fix relative import
* Replace original PatchTSTEncoder with ChannelAttentionPatchTSTEncoder
* temporarily adding absolute path + add PatchTSTForForecasting class
* Update base PatchTSTModel + Unittest
* Update ForecastHead to use the config class
* edit cv_random_masking, add mask to model output
* Update configuration_patchtst.py
* add masked_loss to the pretraining
* add PatchEmbeddings
* Update configuration_patchtst.py
* edit loss which considers mask in the pretraining
* remove patch_last option
* Add commits from internal repo
* Update ForecastHead
* Add model weight initialization + unittest
* Update PatchTST unittest to use local import
* PatchTST integration tests for pretraining and prediction
* Added PatchTSTForRegression + update unittest to include label generation
* Revert unrelated model test file
* Combine similar output classes
* update PredictionHead
* Update configuration_patchtst.py
* Add Revin
* small edit to PatchTSTModelOutputWithNoAttention
* Update modeling_patchtst.py
* Updating integration test for forecasting
* Fix unittest after class structure changed
* docstring updates
* change input_size to num_input_channels
* more formatting
* Remove some unused params
* Add a comment for pretrained models
* add channel_attention option
add channel_attention option and remove unused positional encoders.
* Update PatchTST models to use HF's MultiHeadAttention module
* Update paper + github urls
* Fix hidden_state return value
* Update integration test to use PatchTSTForForecasting
* Adding dataclass decorator for model output classes
* Run fixup script
* Rename model repos for integration test
* edit argument explanation
* change individual option to shared_projection
* style
* Rename integration test + import cleanup
* Fix output_hidden_states return value
* removed unused mode
* added std, mean and nops scaler
* add initial distributional loss for prediction
* fix typo in docs
* add generate function
* formatting
* add num_parallel_samples
* Fix a typo
* copy weighted_average function, edit PredictionHead
* edit PredictionHead
* add distribution head to forecasting
* formatting
* Add generate function for forecasting
* Add generate function to prediction task
* formatting
* use argsort
* add past_observed_mask ordering
* fix arguments
* docs
* add back test_model_outputs_equivalence test
* formatting
* cleanup
* formatting
* use ACT2CLS
* formatting
* fix add_start_docstrings decorator
* add distribution head and generate function to regression task
add distribution head and generate function to regression task. Also added PatchTSTForForecastingOutput and PatchTSTForRegressionOutput.
* fix typos
* add forecast_masking
* fixed tests
* use set_seed
* fix doc test
* formatting
* Update docs/source/en/model_doc/patchtst.md
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* better var names
* rename PatchTSTTranspose
* fix argument names and docs string
* remove compute_num_patches and unused class
* remove assert
* renamed to PatchTSTMasking
* use num_labels for classification
* use num_labels
* use default num_labels from super class
* move model_type after docstring
* renamed PatchTSTForMaskPretraining
* bs -> batch_size
* more review fixes
* use hidden_state
* rename encoder layer and block class
* remove commented seed_number
* edit docstring
* Add docstring
* formatting
* use past_observed_mask
* doc suggestion
* make fix-copies
* use Args:
* add docstring
* add docstring
* change some variable names and add PatchTST before some class names
* formatting
* fix argument types
* fix tests
* change x variable to patch_input
* format
* formatting
* fix-copies
* Update tests/models/patchtst/test_modeling_patchtst.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* move loss to forward
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* formatting
* fix a bug when pre_norm is set to True
* output_hidden_states is set to False as default
* set pre_norm=True as default
* format docstring
* format
* output_hidden_states is None by default
* add missing docs
* better var names
* docstring: remove default to False in output_hidden_states
* change labels name to target_values in regression task
* format
* fix tests
* change to forecast_mask_ratios and random_mask_ratio
* change mask names
* change future_values to target_values param in the prediction class
* remove nn.Sequential and make PatchTSTBatchNorm class
* black
* fix argument name for prediction
* add output_attentions option
* add output_attentions to PatchTSTEncoder
* formatting
* Add attention output option to all classes
* Remove PatchTSTEncoderBlock
* create PatchTSTEmbedding class
* use config in PatchTSTPatchify
* Use config in PatchTSTMasking class
* add channel_attn_weights
* Add PatchTSTScaler class
* add output_attentions arg to test function
* format
* Update doc with image patchtst.md
* fix-copies
* rename Forecast <-> Prediction
* change name of a few parameters to match with PatchTSMixer.
* Remove *ForForecasting class to match with other time series models.
* make style
* Remove PatchTSTForForecasting in the test
* remove PatchTSTForForecastingOutput class
* change test_forecast_head to test_prediction_head
* style
* fix docs
* fix tests
* change num_labels to num_targets
* Remove PatchTSTTranspose
* remove arguments in PatchTSTMeanScaler
* remove arguments in PatchTSTStdScaler
* add config as an argument to all the scaler classes
* reformat
* Add norm_eps for batchnorm and layernorm
* reformat.
* reformat
* edit docstring
* update docstring
* change variable name pooling to pooling_type
* fix output_hidden_states as tuple
* fix bug when calling PatchTSTBatchNorm
* change stride to patch_stride
* create PatchTSTPositionalEncoding class and restructure the PatchTSTEncoder
* formatting
* initialize scalers with configs
* edit output_hidden_states
* style
* fix forecast_mask_patches doc string
---------
Co-authored-by: Gift Sinthong <gift.sinthong@ibm.com>
Co-authored-by: Nam Nguyen <namctin@gmail.com>
Co-authored-by: Vijay Ekambaram <vijaykr.e@gmail.com>
Co-authored-by: Ngoc Diep Do <55230119+diepi@users.noreply.github.com>
Co-authored-by: Wesley Gifford <79663411+wgifford@users.noreply.github.com>
Co-authored-by: Wesley M. Gifford <wmgifford@us.ibm.com>
Co-authored-by: nnguyen <nnguyen@us.ibm.com>
Co-authored-by: Ngoc Diep Do <diiepy@gmail.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Normalize image - cast input images to float32.
This is done if the input image isn't of floating type. Issues can occur when do_rescale=False is set in an image processor: the image passed to the call is then of type uint8, because of the type casting that happens in resize via the PIL image library. As the mean and std values are cast to match the image dtype, this can produce NaNs and infs in the normalized image, since the floating-point values used to divide the image get truncated to 0.
The mean and std values are cast because previously they defaulted to float32; however, if the input image was of type float16, normalization would then upcast the image to float32 too. A sketch of the fix is shown below.
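A minimal sketch of the idea, assuming a simplified `normalize` helper (not the library's exact signature):

```python
import numpy as np

def normalize(image: np.ndarray, mean, std) -> np.ndarray:
    # Cast integer images (e.g. uint8 coming out of a PIL resize) to float32
    # first; otherwise mean/std are truncated to the integer dtype
    # (0.5 -> 0) and the division below yields infs/NaNs.
    if not np.issubdtype(image.dtype, np.floating):
        image = image.astype(np.float32)
    mean = np.asarray(mean, dtype=image.dtype)
    std = np.asarray(std, dtype=image.dtype)
    return (image - mean) / std
```

Note that float16 inputs stay float16 here, matching the "Remove float32 cast" follow-up.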
* Add tests
* Remove float32 cast
* only dir not even init
* init
* tokenizer removed and reference of codegen added
* modeling file updated a lot; app_rotary_emb remaining
* conversion script done
* conversion script fixed, a lot of refactoring done, and most tests pass
* added token_clf and extractive_QA_head
* integration tests pass
* flash attn tests pass!
* config done
* more docs in modeling file
* some style fix
* style and others
* doc test error fix
* more doc fix
* some attention fixes
* most fixes
* style and other fixes
* docs fix and config
* doc fix
* some comments
* conversion script updated
* conversion script updated
* Revert "conversion script updated"
This reverts commit e92378c54084ec0747041b113083d1746ecb6c7f.
* final comments
* add Phi to language_modeling.md
* edit phi.md file
* rebase and fix
* removed phi-1.5 example
* changed model_type from 'phi'->'mixformer-sequential'
* small change
* small change
* revert small change
* changed mixformer-sequential->phi
* small change
* added phi-1.5 example instead of phi-1
* doc test might pass now
* rebase and small change
* added the dropout layer
* more fixes
* modified .md file
* very very small doc change
* fix?
* actual fix
* fixups
* add dataclass to the attention mask converter
* refine testing suite
* make sure there are no overflows
* update the test
* init commit
* attention arch done except rotary emb
* rotary emb done
* text encoder working
* outputs matching
* arch first pass done
* make commands done, tests and docs remaining
* all tests passed, only docs remaining
* docs done
* doc-builder fix
* convert script removed (not relevant)
* minor comments done
* added ckpt conversion script
* tokenizer done
* very minor fix of index.md 2
* mostly make fixup related
* all done except fe and rotary emb
* very small change
* removed unidecode dependency
* style changes
* tokenizer removed require_backends
* added require_inflect to tokenizer tests
* removed VOCAB_FILES in tokenizer test
* inflect dependency removed
* added rotary pos emb cache and simplified the apply method
* style
* little doc change
* more comments
* feature extractor added
* added processor
* auto-regressive config added
* added CLVPConditioningEncoder
* comments done except the test one
* weights added successfully (NOT tested)
* tokenizer fix with numbers
* generate outputs matching
* almost all tests passing; integration tests not written
* Integ tests added
* major CUDA error fixed
* docs done
* rebase and multiple fixes
* fixed rebase overwrites
* generate code simplified and tests for AutoRegressive model added
* minor changes
* refactored gpt2 code in clvp file
* weights done and all code refactored
* mostly done except the fast_tokenizer
* doc test fix
* config file's doc fixes
* more config fix
* more comments
* tokenizer comments mostly done
* modeling file mostly refactored and can load modules
* ClvpEncoder tested
* ClvpDecoder, ClvpModel and ClvpForCausalLM tested
* integration and all tests passed
* more fixes
* docs almost done
* ckpt conversion refactored
* style and some failing tests fix
* comments
* temporary output fix but test_assisted_decoding_matches_greedy_search test fails
* majority changes done
* use_cache outputs same now! Along with the assisted_greedy_decoding test fix
* more comments
* more comments
* prepare_inputs_for_generation fixed and _prepare_model_inputs added
* style fix
* clvp.md change
* moved ClvpConditioningEncoder norms
* add model to new index
* added tokenizer input_ids_with_special_tokens
* small fix
* config mostly done
* added config-tester and changed conversion script
* more comments
* comments
* style fix
* some comments
* tokenizer changed back to prev state
* small comments
* added output hidden states for the main model
* style fix
* comments
* small change
* revert small change
* .
* Update clvp.md
* Update test_modeling_clvp.py
* :)
* some minor change
* new fixes
* remove to_dict from FE
* add audio_utils usage in the FE of SpeechToText
* clean unnecessary parameters of AudioSpectrogramTransformer FE
* add audio_utils usage in AST
* add serialization tests and function to FEs
* make style
* remove use_torchaudio and move to_dict to FE
* test audio_utils usage
* make style and fix import (remove torchaudio dependency import)
* fix torch dependency for jax and tensor tests
* fix typo
* clean tests with suggestions
* add lines to test if is_speech_available is False
* Use Llama RoPE implementation for Falcon
+ Add copy functionalities
* Use standard cache format for Falcon
* Simplify apply_rotary_pos_emb, copy from Llama
* Remove unnecessary cache conversion test
We don't need to convert any caches anymore!
* Resolve copy complaint
* Fixing m4t.
* Trying to remove comparison? Odd test failure.
* Adding shared. But why on earth does it hang ????
* Putting back the model weights checks; the test is silently failing on cuda.
* Fix style + unremoved comment.
* Fix Fuyu image scaling bug
It could produce negative padding, and hence inference errors, for certain image sizes; see the sketch below.
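Illustratively, the guard amounts to clamping the computed padding at zero (a hypothetical helper, not the exact patch):

```python
def compute_padding(target_size: int, scaled_size: int) -> int:
    # For some aspect ratios the rescaled image overshoots the target
    # canvas, making `target - scaled` negative; clamp to zero so the
    # padding op never receives a negative amount.
    return max(target_size - scaled_size, 0)
```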
* initial rework commit
* add batching capabilities, refactor image processing
* add functional batching for a list of images and texts
* make args explicit
* Fuyu processing update (#27133)
* Add file headers
* Add file headers
* First pass - preprocess method with standard args
* First pass image processor rework
* Small tweaks
* More args and docstrings
* Tidying iterating over batch
* Tidying up
* Modify to have quick tests (for now)
* Fix up
* BatchFeature
* Passing tests
* Add tests for processor
* Sense check when patchifying
* Add some tests
* FuyuBatchFeature
* Post-process box coordinates
* Update to `size` in processor
* Remove unused and duplicate constants
* Store unpadded dims after resize
* Fix up
* Return FuyuBatchFeature
* Get unpadded sizes after resize
* Update exception
* Fix return
* Convert input `<box>` coordinates to model format.
* Post-process point coords, support multiple boxes/points in a single
sequence
* Replace constants
* Update src/transformers/models/fuyu/image_processing_fuyu.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Preprocess List[List[image]]
* Update src/transformers/models/fuyu/image_processing_fuyu.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update to Amy's latest state.
* post-processing returns a list of tensors
* Fix error when target_sizes is None
Co-authored-by: Pablo Montalvo <pablo.montalvo.leroux@gmail.com>
* Update src/transformers/models/fuyu/image_processing_fuyu.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update src/transformers/models/fuyu/image_processing_fuyu.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update src/transformers/models/fuyu/image_processing_fuyu.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update src/transformers/models/fuyu/image_processing_fuyu.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Review comments
* Update src/transformers/models/fuyu/image_processing_fuyu.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Fix up
* Fix up
---------
Co-authored-by: Ubuntu <ubuntu@ip-172-31-72-126.ec2.internal>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Pablo Montalvo <pablo.montalvo.leroux@gmail.com>
* Fix conflicts in fuyu_follow_up_image_processing (#27228)
fixing conflicts and updating on main
* Revert "Fix conflicts in fuyu_follow_up_image_processing" (#27232)
Revert "Fix conflicts in fuyu_follow_up_image_processing (#27228)"
This reverts commit acce10b6c6.
---------
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-72-126.ec2.internal>
* add whisper fa2
* correct
* change all
* correct
* correct
* fix more
* fix more
* fix more
* fix more
* fix more
* fix more
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix more
* fix more
* fix more
* fix more
* fix more
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Add type annotations to TFConvNextDropPath
* Use tf.debugging.assert_equal for TFConvNextEmbeddings shape check
* Add TensorFlow implementation of ConvNeXTV2
* check_docstrings: add TFConvNextV2Model to exclusions
TFConvNextV2Model and TFConvNextV2ForImageClassification have docstrings
which are equivalent to their PyTorch cousins, but a parsing issue prevents them
from passing the test.
Adding exclusions for these two classes as discussed in #25558.
* Safetensors serialization by default
* First pass on the tests
* Second pass on the tests
* Third pass on the tests
* Fix TF weight loading from TF-format safetensors
* Specific encoder-decoder fixes for weight crossloading
* Add VisionEncoderDecoder fixes for TF too
* Change filename test for pt-to-tf
* One missing fix for TFVisionEncoderDecoder
* Fix the other crossload test
* Support for flax + updated tests
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Sanchit's comments
* Sanchit's comments 2
* Nico's comments
* Fix tests
* cleanup
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* stronger GC tests
* better tests and skip failing tests
* break down into 3 sub-tests
* break down into 3 sub-tests
* refactor a bit
* more refactor
* fix
* last nit
* credits contrib and suggestions
* credits contrib and suggestions
---------
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add early stopping logits processor
* black formatted
* indent
* follow method signature
* actual logic
* check for None
* address comments on docstrings and method signature
* add unit test under `LogitsProcessorTest` wip
* unit test passing
* black formatted
* condition per sample
* add to BarkModelIntegrationTests
* wip BarkSemanticModelTest
* rename and add to kwargs handling
* not add to BarkSemanticModelTest
* correct logic and assert last outputs tokens different in test
* doc-builder style
* read from kwargs as well
* assert len of with less than that of without
* ruff
* add back seed and test case
* add original impl default suggestion
* doc-builder
* rename and use softmax
* switch back to LogitsProcessor and update docs wording
* camelCase and spelling and saving compute
* assert strictly less than
* assert less than
* expand test_generate_semantic_early_stop instead
* Support runs/
* Upload runs folder as part of push to hub
* Add a test
* Add to test deps
* Update with proposed solution from Slack
* Ensure that repo gets deleted in tests
* Add a default decoder_attention_mask for EncoderDecoderModel during training
Since we are already creating the default decoder_input_ids from the labels, we should also
create a default decoder_attention_mask to go with it.
* Fix test constant that relied on manual_seed()
The test was changed to use a decoder_attention_mask that ignores padding instead (which is
the default one created by BERT when attention_mask is None).
* Create the decoder_attention_mask using decoder_input_ids instead of labels
* Fix formatting in test
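A sketch of the behaviour added in this series, with a hypothetical helper name; the default mask simply ignores padding in the shifted decoder inputs:

```python
import torch

def default_decoder_attention_mask(decoder_input_ids: torch.Tensor, pad_token_id: int) -> torch.Tensor:
    # decoder_input_ids are derived by shifting the labels right, so the
    # matching default mask just zeroes out the padding positions.
    return decoder_input_ids.ne(pad_token_id).long()
```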
* adds agnostic decorators and availability fns
* renaming decorators and fixing imports
* updating some representative example tests
bloom, opt, and reformer for now
* wip device agnostic functions
* lru cache to device checking functions
* adds `TRANSFORMERS_TEST_DEVICE_SPEC`
if present, imports the target file and updates the device-to-function mappings; a sketch of such a spec file follows below
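A sketch of what such a spec file might look like (the device name, the `torch.xpu` calls, and the exact mapping variable names are assumptions):

```python
# my_device_spec.py -- pointed to by TRANSFORMERS_TEST_DEVICE_SPEC
import torch

DEVICE_NAME = "xpu"  # assumed out-of-tree accelerator

# Map the backend_* dispatch helpers to device-specific callables
# (assumes a torch build that exposes torch.xpu).
MANUAL_SEED_FN = torch.xpu.manual_seed
EMPTY_CACHE_FN = torch.xpu.empty_cache
DEVICE_COUNT_FN = torch.xpu.device_count
```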
* comments `TRANSFORMERS_TEST_DEVICE_SPEC` code
* extra checks on device name
* `make style; make quality`
* updates default functions for agnostic calls
* applies suggestions from review
* adds `is_torch_available` guard
* Add spec file to docs, rename function dispatch names to backend_*
* add backend import to docs example for spec file
* change instances of to
* Move register backend to before device check as per @statelesshz changes
* make style
* make opt test require fp16 to run
---------
Co-authored-by: arsalanu <arsalanu@graphcore.ai>
Co-authored-by: arsalanu <hzji210@gmail.com>
* Register ModelOutput as supported torch pytree nodes
* Test ModelOutput as supported torch pytree nodes
* Update type hints for pytree unflatten functions
* first raw commit
* still POC
* tentative convert script
* almost working speech encoder conversion scripts
* intermediate code for encoder/decoders
* add modeling code
* first version of speech encoder
* make style
* add new adapter layer architecture
* add adapter block
* add first tentative config
* add working speech encoder conversion
* base model convert works now
* make style
* remove unnecessary classes
* remove unnecessary functions
* add modeling code speech encoder
* rework logics
* forward pass of sub components work
* add modeling codes
* some config modifs and modeling code modifs
* save WIP
* new edits
* same output speech encoder
* correct attention mask
* correct attention mask
* fix generation
* new generation logics
* erase comments
* make style
* fix typo
* add some descriptions
* new state
* clean imports
* add tests
* make style
* make beam search and num_return_sequences>1 work
* correct edge case issue
* correct SeamlessM4TConformerSamePadLayer copied from
* replace ACT2FN relu by nn.relu
* remove unnecessary return variable
* move back a class
* change name conformer_attention_mask -> conv_attention_mask
* better nit code
* add some Copied from statements
* small nits
* small nit in dict.get
* rename t2u model -> conditionalgeneration
* ongoing refactoring of structure
* update models architecture
* remove SeamlessM4TMultiModal classes
* add tests
* adapt tests
* some non-working code for vocoder
* add seamlessM4T vocoder
* remove buggy line
* fix some hifigan related bugs
* remove hifigan specifc config
* change
* add WIP tokenization
* add working seamlessM4T tokenizer
* update tokenization
* add tentative feature extractor
* Update converting script
* update working FE
* refactor input_values -> input_features
* update FE
* changes in generation, tokenizer and modeling
* make style and add t2u_decoder_input_ids
* add intermediate outputs for ToSpeech models
* add vocoder to speech models
* update valueerror
* update FE with languages
* add vocoder convert
* update config docstrings and names
* update generation code and configuration
* remove todos and update config.pad_token_id to generation_config.pad_token_id
* move block vocoder
* remove unnecessary code and uniformize tospeech code
* add feature extractor import
* make style and fix some copies from
* correct consistency + make fix-copies
* add processor code
* remove comments
* add fast tokenizer support
* correct pad_token_id in M4TModel
* correct config
* update tests and codes + make style
* make some suggested corrections - correct comments and change naming
* rename some attributes
* rename some attributes
* remove unnecessary sequential
* remove option to use dur predictor
* nit
* refactor hifigan
* replace normalize_mean and normalize_var with do_normalize + save lang ids to generation config
* add tests
* change tgt_lang logic
* update generation ToSpeech
* add support import SeamlessM4TProcessor
* fix generate
* make tests
* update integration tests, add option to only return text and update tokenizer fast
* fix wrong function call
* update import and convert script
* update integration tests + update repo id
* correct paths and add first test
* update how new attention masks are computed
* update tests
* first take care of batching in the vocoder code
* add batching with the vocoder
* add waveform lengths to model outputs
* make style
* add generate kwargs + forward kwargs of M4TModel
* add docstrings forward methods
* reformat docstrings
* add docstrings t2u model
* add another round of modeling docstrings + rename speaker_id -> spkr_id
* make style
* fix check_repo
* make style
* add seamlessm4t to toctree
* correct check_config_attributes
* write config docstrings + some modifs
* make style
* add docstrings tokenizer
* add docstrings to processor, fe and tokenizers
* make style
* write first version of model docs
* fix FE + correct FE test
* fix tokenizer + add correct integration tests
* fix most tokenization tests
* make style
* correct most processor test
* add generation tests and fix num_return_sequences > 1
* correct integration tests - still one left
* make style
* correct position embedding
* change numbeams to 1
* refactor some modeling code and correct one test
* make style
* correct typo
* refactor intermediate ffn
* refactor feedforward conformer
* make style
* remove comments
* make style
* fix tokenizer tests
* make style
* correct processor tests
* make style
* correct S2TT integration
* Apply suggestions from Sanchit code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* correct typo
* replace torch.nn->nn + make style
* change Output naming (waveforms -> waveform) and ordering
* nit renaming and formating
* remove return None when not necessary
* refactor SeamlessM4TConformerFeedForward
* nit typo
* remove almost copied from comments
* add a copied from comment and remove an unnecessary dropout
* remove inputs_embeds from speechencoder
* remove backward compatibility function
* reformat class docstrings for a few components
* remove unnecessary methods
* split something hard to read over 2 lines
* make style
* replace two steps offset by one step as suggested
* fix typo
* move warnings
* remove useless lines from processor
* make the non-standard generation test more robust
* remove torch.inference_mode from tests
* split integration tests
* enrich md
* rename control_symbol_vocoder_offset -> vocoder_offset
* clean convert file
* remove tgt_lang and src_lang from FE
* change generate docstring of ToText models
* update generate docstring of tospeech models
* unify how to deal with text_decoder_input_ids
* add default spkr_id
* unify tgt_lang for t2u_model
* simplify tgt_lang verification
* remove a todo
* change config docstring
* make style
* simplify t2u_tgt_lang_id
* make style
* enrich/correct comments
* enrich .md
* correct typo in docstrings
* add torchaudio dependency
* update tokenizer
* make style and fix copies
* modify SeamlessM4TConverter with new tokenizer behaviour
* make style
* correct small typo docs
* fix import
* update docs and add requirement to tests
* add convert_fairseq2_to_hf in utils/not_doctested.txt
* update FE
* fix imports and make style
* remove torchaudio in FE test
* add seamless_m4t.md to utils/not_doctested.txt
* nits and change the way docstring dataset is loaded
* move checkpoints from ylacombe/ to facebook/ orga
* refactor warnings/errors to fit the 119-character line width limit
* round overly precise floats
* add stereo audio behaviour
* refactor .md and make style
* enrich docs with a more precise architecture description
* readd undocumented models
* make fix-copies
* apply some suggestions
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* correct bug from previous commit
* refactor a parameter allowing to clean the code + some small nits
* clean tokenizer
* make style and fix
* make style
* clean tokenizers arguments
* add clarifications for some tests
* move docs from not_tested to slow
* modify tokenizer according to last comments
* add copied from statements in tests
* correct convert script
* correct parameter docstring style
* correct tokenization
* correct multi gpus
* make style
* clean modeling code
* make style
* add copied from statements
* add copied statements
* add support with ASR pipeline
* remove file added inadvertently
* fix docstrings seamlessM4TModel
* add seamlessM4TConfig to OBJECTS_TO_IGNORE due to unconventional markdown
* add seamlessm4t to assisted generation ignored models
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* initial commit
* add processor, add fuyu naming
* add draft processor
* fix processor
* remove dropout to fix loading of weights
* add image processing fixes from Pedro
* fix
* fix processor
* add basic processing fuyu test
* add documentation and TODO
* address comments, add tests, add doc
* replace assert with torch asserts
* add Mixins and fix tests
* clean imports
* add model tester, clean imports
* fix embedding test
* add updated tests from pre-release model
* Processor: return input_ids used for inference
* separate processing and model tests
* relax test tolerance for embeddings
* add test for logit comparison
* make sure fuyu image processor is imported in the init
* fix formatting
* more formatting issues
* and more
* fixups
* remove some stuff
* nits
* update init
* remove the fuyu file
* Update integration test with release model
* Update conversion script.
The projection is not used, as confirmed by the authors.
* improve generation
* Remove duplicate function
* Trickle down patches to model call
* processing fuyu updates
* remove things
* fix prepare_inputs_for_generation to fix generate()
* remove model_input
* update
* add generation tests
* nits
* draft leverage automodel and autoconfig
* nits
* fix dtype patch
* address comments, update READMEs and doc, include tests
* add working processing test, remove refs to subsequences
* add tests, remove Sequence classification
* processing
* update
* update the conversion script
* more processing cleanup
* safe import
* take out ModelTesterMixin for early release
* more cleanup
* more cleanup
* more cleanup
* and more
* register a buffer
* nits
* add postprocessing of generate output
* nits
* updates
* add one working test
* fix test
* make fixup works
* fixup
* Arthur's updates
* nits
* update
* update
* fix processor
* update tests
* pass more fixups
* fix
* nits
* don't import torch
* skip fuyu config for now
* fixup done
* fixup
* update
* oups
* nits
* Use input embeddings
* no buffer
* update
* styling processing fuyu
* fix test
* update licence
* protect torch import
* fixup and update not doctested
* kwargs should be passed
* updates
* update the imports in the test
* protect import
* protecting imports
* protect imports in type checking
* add testing decorators
* protect top level import structure
* fix typo
* fix check init
* move requires_backend to functions
* Imports
* Protect types
---------
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre@huggingface.co>
* fix
* last attempt
* current work
* fix forward compatibility
* save all special tokens
* current state
* revert additional changes
* updates
* remove tokenizer.model
* add a test and the fix
* nit
* revert one more break
* fix typefield issue
* quality
* more tests
* fix fields for FC
* more nits?
* new additional changes
* how
* some updates
* simplify all
* more nits
* revert some things to original
* nice
* nits
* a small hack
* more nits
* ahhaha
* fixup
* update
* make test run on ci
* use subtesting
* update
* Update .circleci/create_circleci_config.py
* updates
* fixup
* nits
* replace typo
* fix the test
* nits
* update
* None max diff pls
* a partial fix
* had to revert one thing
* test the fast
* updates
* fixup
* and more nits
* more fixes
* update
* Oupsy 👁️
* nits
* fix marian
* on our way to heaven
* Update src/transformers/models/t5/tokenization_t5.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
* fixup
* Update src/transformers/tokenization_utils_fast.py
Co-authored-by: Leo Tronchon <leo.tronchon@gmail.com>
* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Leo Tronchon <leo.tronchon@gmail.com>
* fix phobert
* skip some things, test more
* nits
* fixup
* fix deberta
* update
* update
* more updates
* skip one test
* more updates
* fix camembert
* can't test this one
* more good fixes
* kind of a major update
- separate what is only done in fast in fast init and refactor
- add_token(AddedToken(..., special = True)) ignores it in fast
- better loading
* fixup
* more fixups
* fix pegasus and mpnet
* remove skipped tests
* fix phoneme tokenizer if self.verbose
* fix individual models
* update common tests
* update testing files
* all over again
* nits
* skip test for markup lm
* fixups
* fix order of addition in fast by sorting the added tokens decoder
* proper defaults for deberta
* correct default for fnet
* nits on add tokens, string initialized to special if special
* skip irrelevant herbert tests
* main fixes
* update test added_tokens_serialization
* the fix for bart-like models and class instantiation
* update bart
* nit!
* update idefics test
* fix whisper!
* some fixup
* fixups
* revert some of the wrong changes
* fixup
* fixup
* skip marian
* skip the correct tests
* skip for tf and flax as well
---------
Co-authored-by: Lysandre Debut <hi@lysand.re>
Co-authored-by: Leo Tronchon <leo.tronchon@gmail.com>
* Adjust length limits and allow naked conversation list inputs
* Adjust length limits and allow naked conversation list inputs
* Maybe use a slightly more reasonable limit than 1024
* Skip tests for old models that never supported this anyway
* Cleanup input docstrings
* More docstring cleanup + skip failing TF test
* Make fixup
* In assisted decoding, pass model_kwargs to model's forward call
Previously, assisted decoding would ignore any additional kwargs
that it doesn't explicitly handle. This was inconsistent with other
generation methods, which pass the model_kwargs through
prepare_inputs_for_generation and forward the returned dict to the
model's forward call.
The prepare_inputs_for_generation method needs to be amended in all
models, as previously it only kept the last input ID when a past_key_values
was passed.
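In sketch form (the prepare_inputs_for_generation hook is the real one; the wrapper name is illustrative), each forward call in assisted decoding now looks like:

```python
def assisted_forward(model, input_ids, **model_kwargs):
    # Route all model_kwargs through prepare_inputs_for_generation, as the
    # other generation methods do, instead of calling forward on bare ids.
    model_inputs = model.prepare_inputs_for_generation(input_ids, **model_kwargs)
    return model(**model_inputs, return_dict=True)
```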
* Improve variable names in _extend_attention_mask
* Refactor extending token_type_ids into a function
* Replace deepcopy with copy to optimize performance
* Update new persimmon model with llama changes for assisted generation
* Update new mistral model for assisted generation with prepare_inputs_for_generation
* Update position_ids creation in falcon prepare_inputs_for_generation to support assisted generation
* remove SharedDDP as it was deprecated
* apply review suggestion
* make style
* Oops, forgot to remove the compute_loss context manager in Seq2SeqTrainer.
* remove the unnecessary conditional statement
* keep the logic of IPEX
* clean code
* mixed precision setup & make fixup
---------
Co-authored-by: statelesshz <jihuazhong1@huawei.com>
* add FA-2 support for mistral
* fixup
* add sliding windows
* fixing few nits
* v1 slicing cache - logits do not match
* add comment
* fix bugs
* more mem efficient
* add warning once
* add warning once
* oops
* fixup
* more comments
* copy
* add safety checker
* fixup
* Update src/transformers/models/mistral/modeling_mistral.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* copied from
* up
* raise when padding side is right
* fixup
* add doc + few minor changes
* fixup
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add tokenizer kwarg inputs
* Adding tokenizer_kwargs to _sanitize_parameters
* Add truncation=True example to tests
* Update test_pipelines_fill_mask.py
* Update test_pipelines_fill_mask.py
* make fix-copies and make style
* Update fill_mask.py
Replace single tick with double
* make fix-copies
* Style
---------
Co-authored-by: Lysandre <lysandre@huggingface.co>
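A usage sketch of the new pass-through (the `tokenizer_kwargs` name comes from the commits above; the checkpoint is illustrative):

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
# tokenizer_kwargs are forwarded to the tokenizer call, e.g. to truncate
# overly long inputs instead of erroring out.
outputs = fill_mask(
    "Paris is the [MASK] of France.",
    tokenizer_kwargs={"truncation": True},
)
```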
* fix wav2vec2
* nit
* stash
* one more file to update
* fix byt5
* vocab size is 256, don't change that!
* use other revision
* test persimmon in smaller size
* style
* tests
* nits
* update add tokens from pretrained
* test tokenization
* nits
* potential fnet fix?
* more nits
* nits
* correct test
* assert close
* update
* ouch
* fix it
* some more nits
* FINALLY
* use `adept` checkpoints
* more adept checkpoints
* that was involved!
* make use of adapter_revision
* v1 adapter kwargs
* fix CI
* fix CI
* fix CI
* fixup
* add BC
* Update src/transformers/integrations/peft.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fixup
* change it to error
* Update src/transformers/modeling_utils.py
* Update src/transformers/modeling_utils.py
* fixup
* change
* Update src/transformers/integrations/peft.py
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix PEFT multi adapters support
* refactor a bit
* save pretrained + BC + added tests
* Update src/transformers/integrations/peft.py
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
* add more tests
* add suggestion
* final changes
* adapt a bit
* fixup
* Update src/transformers/integrations/peft.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* adapt from suggestions
---------
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add kaldi fbank
* make style
* add herz_to_mel_kaldi tests
* add mel to hertz kaldi test
* integration tests
* correct test and remove comment
* make style
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* change parameter name
* Apply suggestions from Arthur review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update remove_dc_offset description
* fix bug + make style
* fix error in using np.exp instead of np.power
* make style
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix test for bart. Order is correct now; let's skip BPEs
* ouf
* styling
* fix bert....
* slow refactoring
* current updates
* massive refactoring
* update
* NICE!
* update to see where I am at
* updates
* update
* update
* revert
* updates
* updates
* start supporting legacy_save
* styling
* big update
* revert some changes
* nits
* nniiiiiice
* small fixes
* kinda fix t5 with new behaviour
* major update
* fixup
* fix copies
* today's updates
* fix byt5
* update
* update
* update
* updates
* update vocab size test
* Barthez does not need the fairseq offset ids
* super call must be after
* call super
* move all super init
* move other super init
* fixup
* nits
* more fixes
* nits
* more fixes
* nits
* more fix
* remove useless files
* ouch all of them are affected
* and more!
* small improvements
* no more sanitize token
* more changes around unique no split tokens
* partially fix more things
* keep legacy save but add warning
* so... more fixes
* updates
* guess deberta tokenizer could be nuked
* fixup
* fixup did some bad things
* nuke it if it breaks
* remove prints and pretrain fast from slow with new format.
* fixups
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fiou
* nit
* by default specials should not be normalized?
* update
* remove breakpoint
* updates
* a lot of updates
* fixup
* fixes: revert some changes to match fast
* small nits
* that makes it cleaner
* fix camembert accordingly
* update
* some less breaking changes
* update
* fixup
* fix byt5 and whisper mostly
* some more fixes, canine's byte vocab
* fix gpt2
* fix most of the perceiver tests (4 left)
* fix layout lmv3
* fixup
* fix copies for gpt2 style
* make sure to only warn once
* fix perceiver and gpt2 tests
* some more backward compatibility: also read the special tokens map because some people use it
* fixup
* add else when reading
* nits
* fresh updates
* fix copies
* will this make everything faster?
* fixes
* more fixes
* update
* more fixes
* fixup
* is the source of truth right?
* sorry camembert for the troubles
* current updates
* fixup
* update led
* update
* fix regression
* fix single word
* more model specific fixes
* fix t5 tests
* fixup
* more comments
* update
* fix nllb
* rstrip removed
* small fixes
* better handle additional_special_tokens and vocab sizes
* fixing
* styling
* fix 4 / 21
* fixup
* fix nllb's tests
* some fixes
* fix t5
* fixes
* style
* fix canine tests
* damn this is nice
* nits
* m2m100 nit
* fixups
* fixes!
* fixup
* stash
* fix merge
* revert bad change
* fixup
* correct order for code Llama
* fix speecht5 post merge
* styling
* revert source of 11 fails
* small nits
* all changes in one go
* fnet hack
* fix 2 more tests
* update based on main branch of tokenizers
* fixup
* fix VITS issues
* more fixes
* fix mgp test
* fix camembert issues
* oups camembert still has 2 failing tests
* mluke fixes
* decode fixes
* small nits
* nits
* fix llama and vits
* fix camembert
* small nits
* more fixes when initialising a fast tokenizer from a slow one, etc.
* fix one of the last test
* fix CPM tokenizer test
* fixups
* fix pop2piano
* fixup
* ⚠️ Change tokenizers required version ⚠️
* ⚠️ Change tokenizers required version ⚠️
* "tokenizers>=0.14,<0.15", don't forget smaller than
* fix musicgen tests and PreTrainedTokenizerFast
* fix owlvit and all
* update t5
* fix 800 red
* fix tests
* fix the fix of the fix of t5
* styling
* documentation nits
* cache _added_tokens_encoder
* fixups
* Nit
* fix red tests
* one last nit!
* make everything a lot simpler
* Now it's over 😉
* few small nits
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* updates that work for now
* tests that should not be skipped / changed and fixed next
* fixup
* i am ashamed
* push the fix
* update
* fixups
* nits
* fix added_tokens_encoder
* fix canine test
* fix pegasus vocab
* fix transfoXL
* fixup
* whisper needs to be fixed for train new
* pegasus nits
* more pegasus fixes
* minor update
* better error message in failed test
* fix whisper failing test
* fix whisper failing test
* fix pegasus
* fixup
* fix **** pegasus
* reset things
* remove another file
* attempts to fix the strange custom encoder and offset
* nits here and there
* update
* fixup
* nit
* fix the whisper test
* nits nits
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* updates based on review
* some small update to potentially remove
* nits
* import lru cache
* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
* move warning to `from_pretrained`
* update tests results now that the special tokens are always added
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
* moved `ctrl` to `Salesforce/ctrl`
redirects should theoretically work, but still updating those repo references for clarity
* Fixup
* Slow doc tests
* Add modeling file
---------
Co-authored-by: Lysandre <lysandre@huggingface.co>
* Allow PEFT model dict to be loaded
* make style
* make style
* Apply suggestions from code review
* address comments
* fixup
* final change
* added tests
* fix test
* better logic for handling if adapter has been loaded
* Update tests/peft_integration/test_peft_integration.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add pos embed interpolation for vision encoder
* style
* update config with interpolate_pos_encoding arg
* fix imports formatting
* take off copied from on vision embeddings
* add test for image embeddings interpolation
* add credit for interpolation code
* Update src/transformers/models/idefics/configuration_idefics.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/idefics/vision.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix condition to check nbr image patches match shape of pos embeddings
* use kwargs in the forward methods for interpolation
* fix tests
* have interpolate_pos_encoding default to False instead of None
* Update tests/models/idefics/test_modeling_idefics.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/idefics/test_modeling_idefics.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/idefics/test_modeling_idefics.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/idefics/configuration_idefics.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* take off for loop meant to print k,v
* add interpolate_pos_encoding arg in prepare_inputs_for_generation
* add test for interpolated generation
* fix edge case num_patches == num_positions and height == width
* add test for edge case
* fix pos_embed in interpolate
* allow interpolation in bf16 with upcasting
* Update src/transformers/models/idefics/vision.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/idefics/vision.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add multiple images tests for interpolation and generation
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
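The interpolation itself follows the usual ViT/DINO recipe; a sketch under that assumption (not the exact IDEFICS code):

```python
import torch
import torch.nn.functional as F

def interpolate_pos_encoding(pos_embed: torch.Tensor, new_h: int, new_w: int) -> torch.Tensor:
    # pos_embed: (1, 1 + num_patches, dim), with a leading CLS position.
    cls_pos, patch_pos = pos_embed[:, :1], pos_embed[:, 1:]
    dim = pos_embed.shape[-1]
    old = int(patch_pos.shape[1] ** 0.5)
    # Resize the square grid of patch positions to the new patch grid.
    patch_pos = patch_pos.reshape(1, old, old, dim).permute(0, 3, 1, 2)
    patch_pos = F.interpolate(patch_pos, size=(new_h, new_w), mode="bicubic", align_corners=False)
    patch_pos = patch_pos.permute(0, 2, 3, 1).reshape(1, new_h * new_w, dim)
    return torch.cat([cls_pos, patch_pos], dim=1)
```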
* add Bros boilerplate
* copy and pasted modeling_bros.py from official Bros repo
* update copyright of bros files
* copy tokenization_bros.py from official repo and update import path
* copy tokenization_bros_fast.py from official repo and update import path
* copy configuration_bros.py from official repo and update import path
* remove trailing period in copyright line
* copy and paste bros/__init__.py from official repo
* save formatting
* remove unused pe_type argument - only the crel type is used
* resolve import issue
* remove unused model classes
* remove unnecessary tests
* remove unused classes
* fix original code's bug - layer_module's argument order
* clean up modeling auto
* add bbox to prepare_config_and_inputs
* set temporary value for hidden_size (32 is too low because of the Bros positional embedding)
* remove decoder test, update create_and_check* input arguments
* add missing variable to model tests
* do make fixup
* update bros.mdx
* add boilerplate for no_head inference test
* update BROS_PRETRAINED_MODEL_ARCHIVE_LIST (add naver-clova-ocr prefix)
* add prepare_bros_batch_inputs function
* update modeling_common to add bbox inputs in Bros Model Test
* remove unnecessary model inference
* add test case
* add model_doc
* add test case for token_classification
* apply fixup
* update modeling code
* update BrosForTokenClassification loss calculation logic
* revert logits preprocessing logic to make sure logits have original shape
* - update class name
* - add BrosSpadeOutput
- update BrosConfig arguments
* add boilerplate for no_head inference test
* add prepare_bros_batch_inputs function
* add test case
* add test case for token_classification
* update modeling code
* update BrosForTokenClassification loss calculation logic
* revert logits preprocessing logic to make sure logits have original shape
* apply masking on the fly
* add BrosSpadeForTokenLinking
* update class name
put docstring to the beginning of the file
* separate the logits calculation logic and loss calculation logic
* update logic for loss calculation so that logits shape doesn't change when returned
* update typo
* update prepare_config_and_inputs
* update dummy node initialization
* update last_hidden_states getting logic to consider when return_dict is False
* update box first token mask param
* bugfix: remove random attention mask generation
* update keys to ignore on load missing
* run make style and quality
* apply make style and quality of other codes
* update box_first_token_mask to bool type
* update index.md
* apply make style and quality
* apply make fix-copies
* pass check_repo
* update bros model doc
* docstring bugfix
* add checkpoint for doc, tokenizer for doc
* Update README.md
* Update docs/source/en/model_doc/bros.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update bros.md
* Update src/transformers/__init__.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/bros.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* apply suggestions from code review
* apply suggestions from code review
* revert test_processor_markuplm.py
* Update test_processor_markuplm.py
* apply suggestions from code review
* apply suggestions from code review
* apply suggestions from code review
* update BrosSpadeELForTokenClassification head name to entity linker
* add doc string for config params
* update class, var names to more explicit and apply suggestions from code review
* remove unnecessary keys to ignore
* update relation extractor to be initialized with config
* add bros processor
* apply make style and quality
* update bros.md
* remove bros tokenizer, add bros processor that wraps bert tokenizer
* revert change
* apply make fix-copies
* update processor code, update itc -> initial token, stc -> subsequent token
* add type hint
* remove unnecessary condition branches in embedding forward
* fix auto tokenizer fail
* update docstring for each classes
* update bbox input dimension as standard 2 points and convert them to 4
points in forward pass
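Illustratively (a hypothetical helper, not the exact BROS code), the expansion from two corners to four points could look like:

```python
import torch

def expand_bbox_to_four_points(bbox: torch.Tensor) -> torch.Tensor:
    # bbox: (..., 4) as (x1, y1, x2, y2); return the four explicit corners
    # (x1,y1), (x2,y1), (x2,y2), (x1,y2) flattened to (..., 8).
    x1, y1, x2, y2 = bbox.unbind(-1)
    return torch.stack([x1, y1, x2, y1, x2, y2, x1, y2], dim=-1)
```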
* update bros docs
* apply suggestions from code review : update Bros -> BROS in bros.md
* 1. box prefix var -> bbox
2. update variable names to be more explicit
* replace einsum with torch matmul
* apply style and quality
* remove unused argument
* remove unused arguments
* update docstrings
* apply suggestions from code review: add BrosBboxEmbeddings, replace
einsum with classical matrix operations
* revert einsum update
* update bros processor
* apply suggestions from code review
* add conversion script for bros
* Apply suggestions from code review
* fix readme
* apply fix-copies
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Fix word-level timestamps for audio < 30 seconds
* Fix code quality
* fix unit tests
* Fix unit tests
* Fix unit test
* temp: print out result
* temp: set max diff to None
* fix unit tests
* fix typo
* Fix typo
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Use generation config for `num_frames`
* fix docs
* Move `num_frames` to kwargs
* compute stride/attn_mask once
* mark test as slow
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
* Fix GPTNeoX beam search when using parallelize
* Fix beam search idx device when using model parallel
* remove onnx related stuff
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix: move test_beam_search_on_multi_gpu to GenerationTesterMixin
* fix: add right item to _no_split_modules of MegaPreTrainedModel
* fix: add num_beams within parallelized beam_search test
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* First commit while I figure this out
* make fixup
* Remove unused method
* Store prompt attrib
* Fix prompt argument for tests
* Make same changes in fast tokenizer
* Remove global prompts from fast tokenizer too
* stash commit
* stash commit
* Migrate PromptConfig to its True Final Location
* Replace Conversation entirely with the new class
* Import/dependency fixes
* Import/dependency fixes
* Change format for lots of default prompts
* More default prompt fixups
* Revert llama old methods so we can compare
* Fix some default configs
* Fix some default configs
* Fix misspelled kwarg
* Fixes for Blenderbot
* make fixup
* little rebase cleanup
* Add basic documentation
* Quick doc fix
* Truncate docstring for now
* Add handling for the case when messages is a single string
* Quick llama merges
* Update conversational pipeline and tests
* Add a couple of legacy properties for backward compatibility
* More legacy handling
* Add docstring for build_conversation_input_ids
* Restructure PromptConfig
* Let's start T E M P L A T I N G
* Refactor all default configs to use templates instead
* Revert changes to the special token properties since we don't need them anymore
* More class templates
* Make the sandbox even sandier
* Everything replaced with pure templating
* Remove docs for PromptConfig
* Add testing and optional requirement boilerplate
* Fix imports and make fixup
* Fix LLaMA tests and add Conversation docstring
* Finally get LLaMA working with the template system
* Finally get LLaMA working with the template system
* make fixup
* make fixup
* fmt-off for the long lists of test tokens
* Rename method to apply_chat_template for now
* Start on documentation
* Make chat_template a property that reads through to the default if it's not set
* Expand docs
* Expand chat templating doc some more
* trim/lstrip blocks by default and update doc
* Few doc tweaks
* rebase cleanup
* Clarify docstring
* rebase cleanup
* rebase cleanup
* make fixup
* Quick doc edit
* Reformat the standard template to match ChatML
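For reference, a ChatML-style template of the kind described here can be set and used like this (the exact default template may differ):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # illustrative checkpoint
tokenizer.chat_template = (
    "{% for message in messages %}"
    "{{ '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n' }}"
    "{% endfor %}"
)
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi there!"},
]
print(tokenizer.apply_chat_template(messages, tokenize=False))
```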
* Re-add PEFT check
* Update docs/source/en/chat_templating.md
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Add apply_chat_template to the tokenizer doc
* make fixup
* Add doc links
* Fix chat links
* Fix chat links
* Explain system messages in the doc
* Add chat template test
* Proper save-loading for chat template attribute
* Add test skips for layout models
* Remove _build_conversation_input_ids, add default_chat_template to code_llama
* Make sure all LLaMA models are using the latest template
* Remove default_system_prompt block in code_llama because it has no default prompt
* Update ConversationPipeline preprocess
* Add correct #Copied from links to the default_chat_templates
* Remove unneeded type checking line
* Add a dummy mark_processed method
* Reorganize Conversation to have **deprecated_kwargs
* Update chat_templating.md
* Quick fix to LLAMA tests
* Small doc tweaks
* Add proper docstrings and "copied from" statements to all default chat templates
* Merge use_default_system_prompt support for code_llama too
* Improve clarity around self.chat_template
* Docstring fix
* Fix blenderbot default template
* More doctest fix
* Break out some tokenizer kwargs
* Update doc to explain default templates
* Quick tweaks to tokenizer args
* Cleanups for tokenizer args
* Add note about caching
* Quick tweak to the chat-templating doc
* Update the LLaMA template with error checking and correct system message embedding
* make fixup
* make fixup
* add requires_jinja
* Cleanup to expected output formatting
* Add caching
* Fix typo in llama default template
* Update LLaMA tests
* Update documentation
* Improved legacy handling in the Conversation class
* Update Jinja template with proper error handling
* Quick bugfix
* Proper exception raising
* Change caching behaviour so it doesn't try to pickle an entire Jinja env
* make fixup
* rebase cleanup
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* [Whisper Tokenizer] Fix tests after adding timestamps
* fix s2t tokenizer tests
* fix vocab test
* backwards comp
* fix tests
* comment
* style
* fix last test
* fix fast
* make faster
* move logic to decode
* remove skip test
* fix decode with offsets
* fix special tokens
* empty commit to re-trigger ci
* use lru cache
* Add @dataclass to MaskFormerPixelDecoderOutput
* Add dataclass check if subclass of ModelOutout
* Use unittest assertRaises rather than pytest per contribution doc
* Update src/transformers/utils/generic.py per suggested change
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add: check to remove metaspace from marian tokenizer
* fix: metaspace character being removed from everywhere
* fix: remove redundant check at top
* add: test for marian tokenizer decode fix
* fix: simplified the test
* enable optuna multi-objective feature
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* update hpo doc
* update docstring
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* extend direction to List[str] type
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
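A hedged usage sketch of the multi-objective feature (assumes a prepared `trainer` and a `compute_objective` returning one value per objective); with several objectives, `direction` becomes a list with one entry per objective:
```python
best_trials = trainer.hyperparameter_search(
    backend="optuna",
    n_trials=10,
    direction=["minimize", "maximize"],  # e.g. minimize loss, maximize accuracy
    hp_space=lambda trial: {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
    },
)
```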
* Update src/transformers/integrations/integration_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Fix issues in test_exponential_decay_length_penalty
Fix tests which were broken and add validation of negative scores.
The current test didn't take into account that ExponentialDecayLengthPenalty updates the score in place, resulting in updates to the base tested Tensor.
In addition, the gt assert had empty Tensors due to indexing along the batch dimension.
The test is currently expected to fail, to demonstrate the ExponentialDecayLengthPenalty issues with negative scores.
* Fix ExponentialDecayLengthPenalty negative logits issue
In cases where the scores are negative, ExponentialDecayLengthPenalty decreases the score of eos_token_id instead of increasing it.
To fix this issue, we compute the penalty on the absolute value of the score and add it to the original score.
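A simplified sketch of that fix (function name illustrative, not the actual LogitsProcessor code); the key point is that the added penalty is derived from the absolute value of the current eos score, so a negative score is still pushed upward:
```python
import torch

def exponential_eos_penalty(scores, eos_token_id, regulation_factor, penalty_idx):
    # penalty_idx: how many tokens have been generated past regulation_start.
    if penalty_idx > 0:
        eos_scores = scores[:, eos_token_id]
        penalty = torch.abs(eos_scores) * (pow(regulation_factor, penalty_idx) - 1)
        scores[:, eos_token_id] = eos_scores + penalty
    return scores
```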
* Add examples for ExponentialDecayLengthPenalty
* Fix styling issue in ExponentialDecayLengthPenalty doc
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Style and quality fix
* Fix example outputs
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* initial commit
* updates
* nits
* update conversion script
* update conversion script
* use path to load
* add tips etc
* some modeling logic
* modeling update
* more nits
* nits
* normal layer norm
* update config and doc
* nits
* update doc remove unused
* update
* fix inits and stuff
* fixup
* revert wrong changes
* updates
* more nits
* add default config values to the configuration file
* fixup happy
* update
* 2 tests left
* update readmes
* more nits
* slow test and more documentation
* update readme
* fix licences
* styling
* use fast if possible when saving tokenizer
* remove todo
* remove tokenization tests
* small last nits
* Apply suggestions from code review
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* nits to skip the timeout doctest
* fix integration test
* fix test
* update eos token
* update to allow fast tokenization
* styling
* fix codeLlama as well for the update post processor
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add more copied from statements
* update
* doc passes doctest
* remove `# final layer norm?`
* change docstring prompt
* update
* Update README.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* don't doctest the conversion script as it requires more packages
* don't init a model in the config
* oups
* fix doctest
---------
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add new arg for gptq
* add tests
* add min version autogptq
* fix order
* skip test
* fix
* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix style
* change model path
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add support for deepspeed optimizer and HF scheduler
* fix bug
* fix the import
* fix issue with deepspeed scheduler saving for hf optim + hf scheduler scenario
* fix loading of hf scheduler when loading deepspeed checkpoint
* fix import of `DeepSpeedSchedulerWrapper`
* add tests
* add the comment and skip the failing tests
* address comment
* Put Falcon back
* Update src/transformers/models/auto/configuration_auto.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update test
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add Blip2 model in VQA pipeline
* use require_torch_gpu for test_large_model_pt_blip2
* use can_generate in vqa pipeline
* test Blip2ForConditionalGeneration using float16
* remove custom can_generate from Blip2ForConditionalGeneration
* return when length is zero
* Add tests
Co-authored-by: Avnish Narayan <38871737+avnishn@users.noreply.github.com>
* Co-authored-by: avnishn <38871737+avnishn@users.noreply.github.com>
* codeLlama doc should not be on Main
* update test
---------
Co-authored-by: Avnish Narayan <38871737+avnishn@users.noreply.github.com>
* fixing name position_embeddings to object_queries
* [fix] renaming variable and docstring to object queries
* [fix] comment position_embedding to object queries
* [feat] changes from make-fix-copies to keep consistency
* Revert "[feat] changes from make-fix-copies to keep consistency"
This reverts commit 56e3e9ede1.
* [tests] fix wrong expected score
* [fix] wrong assignment causing wrong tensor shapes
* [fix] fixing position_embeddings to object queries to keep consistency (make fix copies)
* [fix] make fix copies, renaming position_embeddings to object_queries
* [fix] positional_embeddings to object queries, fixes from make fix copies
* [fix] comments from make fix copies
* [fix] adding args validation to keep version support
* [fix] adding args validation to keep version support -conditional detr
* [fix] adding args validation to keep version support - maskformer
* [style] make fixup style fixes
* [feat] adding args checking
* [feat] fixcopies and args checking
* make fixup
* make fixup
---------
Co-authored-by: Lorenzobattistela <lorenzobattistela@gmail.com>
* add all
* Revert "Delete .github directory"
This reverts commit 9b0ff7b052e2b20b629a26fb13606b78a42944d1.
* make conversion script backward compatible
* fixup
* more styling
* copy to llama changes
* fix repo consistency
* nits
* document correct classes
* updates
* more fixes
* nits
* update auto mappings
* add readmes
* small updates
* llama-code replace with llama_code
* make fixup
* updates to the testing suite
* fix fast nits
* more small fixes
* fix decode
* fix template processing
* properly reset the normalizer
* nits processor
* tokenization tests pass
* styling
* last tests
* additional nits
* one test is left
* nits
Co-authored-by: faabian <faabian@users.noreply.github.com>
* update failing test
* fixup
* remove decode infilling; users should handle it on their own after generation, since padding can be a problem
* update
* make test slow and more meaningful
* fixup
* doc update
* fixup
* Apply suggestions from code review
* add kwargs doc
* tokenizer requires `requires_backend`
* type requires_backends
* CodeLlama instead of LlamaCode
* more name changes
* nits
* make doctests happy
* small pipeline nits
* last nit
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* update
* add codellama to toctree
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Correct attention mask dtype
* reformat code
* add a test for boolean mask
* convert test to fast test
* delete unwanted print
* use assertTrue for testing
* Add FlaxClipTextModelWithProjection
This is necessary to support the Flax port of Stable Diffusion XL: fb6d705fb5/text_encoder_2/config.json (L3)
Co-authored-by: Martin Müller <martin.muller.me@gmail.com>
Co-authored-by: Juan Acevedo <juancevedo@gmail.com>
* Use FlaxCLIPTextModelOutput
* make fix-copies again
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Use `return_dict` for consistency with other uses.
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Fix docstring example.
* Add new model to FlaxCLIPTextModelTest
* Add to IGNORE_NON_AUTO_CONFIGURED list
* Fix naming convention.
---------
Co-authored-by: Martin Müller <martin.muller.me@gmail.com>
Co-authored-by: Juan Acevedo <juancevedo@gmail.com>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* properly support Sequence of pretokenizers
* actual fix
* make sure the fix works. Tests are definitely not passing yet!
* hacky way
* add TODO
* update
* add a todo
* nits
* rename test
* nits
* rename test
* add: NumberNormalizer works for integers, floats, common currencies, negative numbers and percentages
* fix: renamed number normalizer class and added normalization to SpeechT5Processor
* fix: restyled with black and ruff, should pass code quality tests
* fix: moved normalization to tokenizer and other small changes to normalizer
* add: test for normalization and changed the existing full tokenizer test
* fix: tokenization tests now pass, made changes to existing tokenization where normalization is covered; added normalize arg to func signature
* fix: changed default normalize setting to False, modified the tests a bit
* fix: added support for comma separated numbers, tokenization on the fly with kwargs and normalizer getter setter funcs
* init commit
* config updated also some modeling
* Processor and Model config combined
* extraction pipeline (up to before spectrogram & mel_conditioner) added but not properly tested
* model loading successful!
* feature extractor done!
* FE can now be called from HF
* postprocessing added in fe file
* same as prev commit
* Pop2PianoConfig doc done
* cfg docs slightly changed
* fe docs done
* batched
* batched working!
* temp
* v1
* checking
* trying to go with generate
* with generate and model tests passed
* before rebasing
* .
* tests done, docs done, remaining: others & nits
* nits
* LogMelSpectrogram shifted to FeatureExtractor
* is_tf removed from pop2piano/init
* import solved
* tokenization tests added
* minor fixes regarding modeling_pop2piano
* tokenizer changed to only return midi_object and other changes
* Updated paper abstract (camera-ready version) (#2)
* more comments and nits
* ruff changes
* code quality fix
* sg comments
* t5 change added and rebased
* comments except batching
* batching done
* comments
* small doc fix
* example removed from modeling
* ckpt
* forward is compatible with fe and generation done
* comments
* comments
* code-quality fix (maybe)
* ckpts changed
* doc file changed from mdx to md
* test fixes
* tokenizer test fix
* changes
* nits done, main changes remaining
* code modified
* Pop2PianoProcessor added with tests
* other comments
* added Pop2PianoProcessor to dummy_objects
* added require_onnx to modeling file
* changes
* update .md file
* remove extra line in index.md
* back to the main index
* added pop2piano to index
* Added tokenizer.__call__ with valid args and batch_decode and aligned the processor part too
* changes
* added return types to 2 tokenizer methods
* the PR build test might work now
* added backends
* PR build fix
* vocab added
* comments
* refactored vocab into 1 file
* added conversion script
* comments
* essentia version changed in .md
* comments
* more tokenizer tests added
* minor fix
* tests extended for output accuracy check
* small fix
---------
Co-authored-by: Jongho Choi <sweetcocoa@snu.ac.kr>
* a draft version
* v2 integration
* fix
* make it more generic so it works for IA3
* add set adapter and multiple adapters support
* fixup
* adapt a bit
* oops
* oops
* oops
* adapt more
* fix
* add more refactor
* now works with model class
* change it to an instance method as it causes issues with `jit`.
* add CR
* change method name
* add `add_adapter` method
* clean up
* Update src/transformers/adapters/peft_mixin.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add moe utils
* fixup
* Update src/transformers/adapters/peft_mixin.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* adapt
* oops
* fixup
* add is_peft_available
* remove `requires_backend`
* trainer compatibility
* fixup + docstring
* more details
* trigger CI
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/modeling_utils.py
* fixup + is_main_process
* added `save_peft_format` in save_pretrained
* up
* fix nits here and there
* nits here and there.
* docs
* revert `encoding="utf-8"`
* comment
* added slow tests before the PEFT release.
* fixup and nits
* let's stay in the safe zone
* added more comments
* v1 docs
* add remaining docs
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* move to `lib_integrations`
* fixup
* this time fixup
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* address final comments
* refactor to use `token`
* add PEFT to DockerFile for slow tests.
* added pipeline support.
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* draft changes
* update and add tests
* styling for now
* move test
* path to usable model
* update test
* small update
* update bert-based tokenizers
* don't use kwargs for _tokenize
* don't use kwargs for _tokenize
* fix copies
* update
* update test for special tokenizers
* fixup
* skip two tests
* remove pdb breakpoint()
* wowo
* rewrite custom tests
* nits
* revert change in target keys
* fix markup lm
* update documentation of the argument
* Replaces calls to `.cuda` with `.to(torch_device)` in tests
`torch.Tensor.cuda()` is a pre-0.4 way of changing a tensor's device. It is recommended to prefer `.to(...)` for greater flexibility and error handling. Furthermore, this makes the tests more consistent with one another (they tend to use `.to(torch_device)`) and ensures the correct device backend is used (if `torch_device` is neither `cpu` nor `cuda`).
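For illustration, the pattern the replacement follows (`torch_device` here is a simplified stand-in for the test suite's device selection):
```python
import torch

torch_device = "cuda" if torch.cuda.is_available() else "cpu"  # simplified stand-in

# Before: hard-codes CUDA and fails on any other backend.
# tensor = torch.ones(3).cuda()

# After: runs on whichever device the test suite selected.
tensor = torch.ones(3).to(torch_device)
```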
* addressing review comments
* more formatting changes in Bloom test
* `make style`
* Update tests/models/bloom/test_modeling_bloom.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fixes style failures
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add AutoModelForTextToSpeech class
* add TTS pipeline and tessting
* add docstrings to text_to_speech pipeline
* fix torch dependency
* correct the 'processor is None' case in Pipeline
* correct repo id
* modify text-to-speech -> text-to-audio
* remove processor
* rename text_to_speech pipelines files to text_audio
* add textToWaveform and textToSpectrogram instead of textToAudio classes
* update TTS pipeline to the bare minimum
* update tests TTS pipeline
* make style and erase useless import torch in TTS pipeline tests
* modify how to check if generate or forward in TTS pipeline
* remove unnecessary extra new lines
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* refactor input_texts -> text_inputs
* correct docstrings of TTS.__call__
* correct the shape of generated waveform
* take care of Bark tokenizer special case
* correct run_pipeline_test TTS
* make style
* update TTS docstrings
* address Sylvain nit refactors
* make style
* refactor into one liners
* correct squeeze
* correct way to test if forward or generate
* Update output audio waveform shape
* make style
* correct import
* modify how the TTS pipeline test if a model can generate
* align shape output of TTS pipeline with consistent shape
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* fix EVERYTHING
* more fixes
* ⚗️⚗️ Tokenizer magic ⚗️⚗️
* wrong value but test passes for the TODO
* update
* update
* safe protobuf import?
* style
* non gated repo
* update
* fixup
* Update src/transformers/models/llama/tokenization_llama.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/llama/tokenization_llama.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/t5/test_tokenization_t5.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* nits
* fix t5 too
* use assert equal
* fix llama decoding
* nits on t5
* fixup
* only remove the prefix space, not other spaces
* more decoding tests and more todos
* fix CI as well
* fixup
* skip failing test on CI (it's tf, it's ok)
* skip test_subword_regularization_tokenizer that is also crashing on the CI for TF
* update llama
* revert good fixes
* fixup
* empty
* explain why we need to encode with an additional token
* better warning?
* nits
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix
* revert changes and update resizing of embedding layer
* use warning
* fixup
* more styling nits
* fix all tests that overload the embedding tests
* 👀👀 remove breakpoint
* remove useless overload + overload correctly where needed
* resize lm head with new vocab size
* revert unnecessary changes
* style
* fix CIs!
* fix last CI tests, adapt bark and Marian
* fixup
* [ASR Pipeline] Fix init
* refactor test
* change default kwarg setting
* only perform checks if we have to
* override init
* move pre/forward/post checks to sanitize
* Add copied from statements for image processors
* Move out rescale and normalize to base image processor
* Remove rescale and normalize from vit (post rebase)
* Update docstrings and tidy up
* PR comments
* Add input_data_format as preprocess argument
* Resolve tests and tidy up
* Remove num_channels argument
* Update doc strings -> default ints not in code formatting
* Make training args fully immutable
* Working tests, PyTorch
* In test_trainer
* during testing
* Use proper dataclass way
* Fix test
* Another one
* Fix tf
* Lingering slow
* Exception
* Clean
* Refactor image processor test mixin
- Move test_call_numpy, test_call_pytorch, test_call_pil to mixin
- Rename mixin to reflect handling of logic more than saving
- Add prepare_image_inputs, expected_image_outputs for tests
* Fix for oneformer
* Register ModelOutput subclasses as supported torch.utils._pytree nodes
Fixes #25357, where DDP with static_graph=True does not sync gradients when calling backward() over tensors contained in ModelOutput subclasses
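Roughly what such a registration looks like, as a sketch against torch's internal `_pytree` helpers (the output class here is illustrative):
```python
from dataclasses import dataclass
from typing import Optional

import torch
import torch.utils._pytree as pytree
from transformers.utils import ModelOutput

@dataclass
class MyModelOutput(ModelOutput):
    logits: Optional[torch.Tensor] = None

# Flatten to (tensors, keys) and rebuild from them, so pytree traversal
# (used e.g. by DDP's static-graph bookkeeping) can reach the tensors inside.
pytree._register_pytree_node(
    MyModelOutput,
    lambda output: (list(output.values()), list(output.keys())),
    lambda values, keys: MyModelOutput(**dict(zip(keys, values))),
)
```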
* Add test for torch pytree ModelOutput serialization and deserialization
* Deal better with nested configs
* Fixes
* More fixes
* Fix last test
* Clean up existing configs
* Remove hack in MPT Config
* Update src/transformers/configuration_utils.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Fix setting a nested config via dict in the kwargs
* Adapt common test
* Add test for nested config load with dict
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update InstructBLIP values
Note: the tests are not independent. Running the test independently produces different logits compared to running all the integration tests
* Update test values after rescale update
* Remove left over commented out code
* Revert to previous rescaling logic
* Update rescale tests
* Fix rescaling bug
* Add tests
* Update integration tests
* Fix up
* Update src/transformers/image_transforms.py
* Update test - new possible order in list
* Initial addition of t5forsequenceclassification
* Adding imports and adding tests
* Formatting
* Running make fix-copies
* Adding mt5forseq
* Formatting
* run make fix-copies
* Adding to docs
* Add model_parallel
* Fix bug
* Fix
* Remove TODO
* Fixing tests for T5ForSequenceClassification
* Undo changes to dependency_versions_table.py
* Change classification head to work with T5Config directly
* Change seq length to let tests pass
* PR comments for formatting
* Formatting
* Initial addition of UMT5ForSequenceClassification
* Adding to inits and formatting
* run make fix-copies
* Add doc for UMT5ForSeqClass
* Update UMT5 config
* Fix docs
* Skip torch fx test for SequenceClassification
* Formatting
* Add skip to UMT5 tests as well
* Fix umt5 tests
* Running make fix-copies
* PR comments
* Fix for change to sentence_representation
* Rename seq_len to hidden_size since that's what it is
* Use base_model to follow format of the rest of the library
* Update docs
* Extract the decoder_input_ids changes and make one liner
* Make one-liner
* pull and push updates
* add docs
* fix modeling
* Add and run test
* make copies
* add task
* fix tests and fix small issues
* Checks on a Pull Request
* fix docs
* add desc pvt.md
* Resolve typo in check_repo.py
* Specify encoding when opening modeling files
* Deprecate the OpenLlama architecture
* Add disclaimer pointing to Llama
I'm open to different wordings here
* Match the capitalisation of LLaMA
* add llama
* add other readmes
* update padding id in readme
* add link to paper
* fix paths and tokenizer
* more nits
* styling
* fit operation in 2 lines when possible
* nits
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add form
* update readme
* update readme, we don't have a default pad token
* update test and tokenization
* LLaMA instead of Llama
* nits
* add expected text
* add greedy output
* styling
* Update src/transformers/models/llama/modeling_llama.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* sequential device map
* skip relevant changes
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* first raw version of the bark integration
* working code on small models with single run
* add conversion script from suno weights to HF
* many changes
* correct past_kv output
* working implementation for inference
* update the converting script according to the architecture changes
* add a working end-to-end inference code
* remove some comments and make small changes
* remove unnecessary comment
* add docstrings and ensure no unnecessary intermediary output during audio generation
* remove done TODOs
* make style + add config docstrings
* modification for batch inference support on the whole model
* add details to .generate_audio method
* add copyright
* convert EncodecModel from original library to transformers implementation
* add two classes in order to facilitate model and sub-model loading from the hub
* add support of loading the whole model
* add BarkProcessor
* correct modeling according to processor output
* Add proper __init__ and auto support
* Add up-to-date copyright/license message
* add relative import instead of absolute
* cleaner head_dim computation
* small comment removal or changes
* more verbose LayerNorm init method
* specify eps for clearer comprehension
* more verbose variable naming in the MLP module
* remove unnecessary BarkBlock parameter
* clearer code in the forward pass of the BarkBlock
* remove _initialize_modules method for cleaner code
* Remove unnecessary methods from sub-models
* move code to remove unnecessary function
* rename a variable for clarity and change an assert
* move code and change variable name for clarity
* remove unnecessary asserts
* correct small bug
* correct a comment
* change variable names for clarity
* remove asserts
* change import from absolute to relative
* correct small error due to missing comma + correct import
* Add attribute Bark config
* add first version of tests
* update attention_map
* add tie_weights and resize_token_embeddings for fineModel
* correct getting attention_mask in generate_text_semantic
* remove Bark inference trick
* leave more choices in barkProcessor
* remove _no_split_modules
* fix error in forward of block and introduce clearer notations
* correct converting script with last changes
* make style + add draft bark.mdx
* correct BarkModelTest::test_generate_text_semantic
* add Bark in main README
* add dummy_pt_objects for Bark
* add missing models in the main init
* correct test_decoder_model_past_with_large_inputs
* disable torchscript test
* change docstring of BarkProcessor
* Add test_processor_bark
* make style
* correct copyrights
* add bark.mdx + make style, quality and consistency
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Remove unnecessary test method
* simply logic of a test
* Only check first ids for slow audio generation
* split full end-to-end generation tests
* remove unnecessary comment
* change submodel names for clearer naming
* remove ModuleDict from modeling_bark
* combine two if statements
* ensure that an edge-case misuse won't happen
* modify variable name
* move code snippet to the right place (coarse instead of semantic)
* change BarkSemanticModule -> BarkSemanticModel
* align BarkProcessor with transformers paradigm
* correct BarkProcessor tests with last commit changes
* change _validate_voice_preset to an instance method instead of a class method
* tie_weights already called with post_init
* add codec_model config to configuration
* update bark modeling tests with recent BarkProcessor changes
* remove SubModelPretrainedModel + change speakers embeddings prompt type in BarkModel
* change absolute imports to relative
* remove TODO
* change docstrings
* add examples to docs and docstrings
* make style
* use BatchFeature in BarkProcessor instead of dict
* continue improving docstrings and docs + make style
* correct docstrings examples
* more comprehensible speaker_embeddings load/save
* rename speaker_embeddings_dict -> speaker_embeddings
* correct bark.mdx + add bark to documentation_tests
* correct docstrings configuration_bark
* integrate last nit suggestions
* integrate BarkGeneration configs
* make style
* remove bark tests from documentation_tests.txt because timeout - tested manually
* add proper generation config initialization
* small bark.mdx documentation changes
* rename bark.mdx -> bark.md
* add torch.no_grad behind BarkModel.generate_audio()
* replace assert by ValueError in convert_suno_to_hf.py
* integrate a series of short comments from reviewer
* move SemanticLogitsProcessors and remove .detach() from Bark docs and docstrings
* actually remove SemanticLogitsProcessor from modeling_bark.py
* BarkProcessor returns a single output instead of tuple + correct docstrings
* make style + correct bug
* add initializer_range to BarkConfig + correct slow modeling tests
* add .clone() to history_prompt.coarse_prompt to avoid modifying input array
* Making sure no extra "`" are present
* remove extra characters in modeling_bark.py
* Correct output if history_prompt is None
* remove TODOs
* remove ravel comment
* completing generation_configuration_bark.py docstrings
* change docstrings - number of audio codebooks instead of Encodec codebooks
* change 'bias' docstrings in configuration_bark.py
* format code
* rename BarkModel.generate_audio -> BarkModel.generate_speech
* modify AutoConfig instead of EncodecConfig in BarkConfig
* correct AutoConfig wrong init
* refactor BarkModel and sub-models generate_coarse, generate_fine, generate_text_semantic
* remove SemanticLogitsProcessor and replace it with SuppressTokensLogitsProcessor
* move nb_codebook related config arguments to BarkFineConfig
* rename bark.mdx -> bark.md
* correcting BarkModelConfig from_pretrained + remove keys_to_ignore
* correct bark.md with correct hub path
* correct code bug in bark.md
* correct list tokens_to_suppress
* modify Processor to load nested speaker embeddings in a safer way
* correct batch sampling in BarkFineModel.generate_fine
* Apply suggestions from code review
Small docstrings correction and code improvements
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* give more details about num_layers in docstrings
* correct indentation mistake
* correct submodelconfig order of docstring variables
* put audio models in alphabetical order in utils/check_repo.py
* remove useless line from test_modeling_bark.py
* make BarkCoarseModelTest inherit from (ModelTesterMixin, GenerationTesterMixin, unittest.TestCase) instead of BarkSemanticModelTest
* make a Tester class for each sub-model instead of inheriting
* add test_resize_embeddings=True for Bark sub-models
* add Copied from transformers.models.gpt_neo.modeling_gpt_neo.GPTNeoSelfAttention._split_heads
* remove 'Copied from Bark' comment
* remove unnecessary comment
* change np.min -> min in modeling_bark.py
* refactored all custom layers to have Bark prefix
* add attention_mask as an argument of generate_text_semantic
* refactor sub-models start docstrings to have more precise config class definition
* move _tied_weights_keys overriding
* add docstrings to generate_xxx in modeling_bark.py
* add loading whole BarkModel to convert_suno_to_hf
* refactor attribute and variable names
* make style convert_suno
* update bark checkpoints
* remove never entered if statement
* move bark_modeling docstrings after BarkPretrainedModel class definition
* refactor modeling_bark.py: kv -> key_values
* small nits - code refactoring and removing unnecessary lines from _init_weights
* nits - replace inplace method by variable assigning
* remove *optional* when necessary
* remove some lines in generate_speech
* add default value for optional parameter
* Refactor preprocess_histories_before_coarse -> preprocess_histories
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* correct usage after refactoring
* refactor Bark's generate_xxx -> generate and modify docstrings and tests accordingly
* update docstrings python in configuration_bark.py
* add bark files in utils/documentation_test.txt
* correct docstrings python snippet
* add the ability to use parameters in the form of e.g. coarse_temperature
* add semantic_max_new_tokens in python snippet in docstrings for quicker generation
* Reformat sub-models kwargs in BarkModel.generate
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* correct kwargs in BarkModel.generate
* correct attention_mask kwarg in BarkModel.generate
* add tests for sub-models args in BarkModel.generate and correct BarkFineModel.test_generate_fp16
* enrich BarkModel.generate docstrings with a description of how to use the kwargs
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* dim, and rm copy
* Don't rm copy for now
* Oops
* pad index
* Should be a working test
* Trickle down ddp timeout
* Put fix back in now that testing locally is done
* Better comment specifying timeout
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix: Apostrophe splitting in the BasicTokenizer for CLIPTokenizer
* account for apostrophe at start of new word
* remove _run_split_on_punc, use re.findall instead
* remove debugging, make style and quality
* use pattern and punc splitting, repo-consistency will fail
* remove commented out debugging
* adds bool args to BasicTokenizer, remove pattern
* do_split_on_punc default True
* clean stray comments and line breaks
* rebase, repo-consistency
* update to just do punctuation split
* add unicode normalizing back
* remove redundant line
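An illustrative regex in the spirit of this change (a hypothetical simplification, not the tokenizer's actual pattern): keep a leading apostrophe attached to its contraction, and split everything else on punctuation:
```python
import re

text = "you're quick, aren't you?"
tokens = re.findall(r"[^\s\w']+|'?\w+", text)
# -> ["you", "'re", "quick", ",", "aren", "'t", "you", "?"]
```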
* Initial commit
* Update src/transformers/models/falcon/configuration_falcon.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/falcon/configuration_falcon.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Cleanup config docstring
* Update src/transformers/models/falcon/configuration_falcon.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Convert to relative imports
* Remove torch < 1.8 warning
* Restructure cos_sin header
* qkv -> query, key, value
* Refactor attention calculation
* Add a couple of config variables to account for the different checkpoints
* Successful merging of the code paths!
* Fix misplaced line in the non-parallel attention path
* Update config and tests
* Add a pad_token_id when testing
* Support output_attentions when alibi is None
* make fixup
* Skip KV cache shape test
* No more _keys_to_ignore_on_load_missing
* Simplify self attention a bit
* Simplify self attention a bit
* make fixup
* stash commit
* Some more attention mask updates
* Should pass all tests except assisted generation!
* Add big model generation test
* make fixup
* Add temporary workaround for test
* Test overrides for assisted generation
* Update src/transformers/models/falcon/modeling_falcon.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/falcon/modeling_falcon.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/falcon/modeling_falcon.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update tests/models/falcon/test_modeling_falcon.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Test overrides for assisted generation
* Add generation demo
* Update copyright
* Make the docstring model actually small
* Add module-level docstring
* Remove all assertions
* Add copied from bloom
* Reformat the QKV layer
* Add copied from bloom
* Update src/transformers/models/falcon/modeling_falcon.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Remove unused line and reformat
* No single letter variables
* Cleanup return names
* Add copied from line
* Remove the deprecated arguments blocks
* Change the embeddings test to an alibi on/off test
* Remove position_ids from FalconForQA
* Remove old check for token type IDs
* Fix the alibi path when multi_query is False
* Update src/transformers/models/falcon/modeling_falcon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/falcon/modeling_falcon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/falcon/test_modeling_falcon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update config naming
* Fix typo for new_decoder_architecture
* Add some comments
* Fix docstring
* Fix docstring
* Create range in the right dtype from the start
* Review comment cleanup
* n_head_kv -> num_kv_heads
* self.alibi -> self.use_alibi
* self.num_kv -> self.num_kv_heads
* Reorder config args
* Made alibi arguments Optional
* Add all model docstrings
* Add extra checkpoints
* Add author info for Falcon
* Stop removing token_type_ids because our checkpoints shouldn't return it anymore
* Add one hopeful comment for the future
* Fix typo
* Update tests, fix cache issue for generation
* Use -1e9 instead of -inf to avoid float overflow
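As a generic illustration of why: a large finite negative fill value avoids the `inf - inf` style overflows (which produce NaNs) that `float("-inf")` can trigger in later arithmetic, while still being "effectively zero probability" after softmax:
```python
import torch

scores = torch.randn(2, 4)
keep = torch.tensor([[1, 1, 1, 0], [1, 1, 0, 0]], dtype=torch.bool)

# Rather than float("-inf"), mask with a large finite negative value.
masked = scores.masked_fill(~keep, -1e9)
probs = masked.softmax(dim=-1)  # masked positions get ~0 probability
```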
* Recompute the rotary embeddings much less often
* Re-enable disabled tests
* One final fix to attention mask calculation, and update tests
* Cleanup targeting falcon-40b equivalency
* Post-rebase docs update
* Update docstrings, especially in the config
* More descriptive variable names, and comments where we can't rename them
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* hidden layers, huh, what are they good for (absolutely nothing)
* Some tests break with 1 hidden layer, use 2
* Use 1 hidden layer in a few slow models
* Use num_hidden_layers=2 everywhere
* Slightly higher tol for groupvit
* Slightly higher tol for groupvit
* Adding warning messages to BERT for missing attention masks
These warning messages appear when there are pad tokens within the input ids and
no attention masks are given. The warning message should only show up once.
* Adding warning messages to BERT for missing attention masks
These warning messages are shown when the pad_token_id is not None
and no attention masks are given. The warning message should only
show up once.
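A condensed sketch of the check (the helper name and exact heuristic are assumptions here, not the model code verbatim):
```python
def warn_if_padding_and_no_attention_mask(input_ids, attention_mask, pad_token_id, logger):
    # Only warn when a pad token id exists and no mask was supplied.
    if attention_mask is not None or pad_token_id is None:
        return
    # Checking the first and last positions is enough to catch typical padding.
    if pad_token_id in input_ids[:, [0, -1]]:
        logger.warning_once(
            "We strongly recommend passing an `attention_mask` since your input_ids may contain padding."
        )
```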
* Ran fix copies to copy over the changes to some of the other models
* Add logger.warning_once.cache_clear() to the test
* Shows warning when there are no attention masks and input_ids start/end with pad tokens
* Use warning_once() instead and fix indexing in the input_ids check
---------
Co-authored-by: JB Lau <hckyn@voyager2.local>
* don't add space before single letter chars that don't have a merge
* fix the fix
* fixup
* add a test
* more testing
* fixup
* hack to make sure fast is also fixed
* update switch transformers test
* revert convert slow
* Update src/transformers/models/t5/tokenization_t5.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add typechecking
* quality
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Preliminary work on some models
* Fix test load missing and make sure nonpersistent buffers are tested
* Always ignore nonpersistent buffers if in state_dict
* Treat models
* More models
* Treat remaining models
* Fix quality
* Fix tests
* Remove draft
* This test is not needed anymore
* Fix copies
* Fix last test
* Newly added models
* Fix last tests
* Address review comments
* Fix TypeError: Object of type int64 is not JSON serializable
* Convert numpy.float64 and numpy.int64 to float and int for json serialization
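The fix boils down to coercing NumPy scalars to builtins before serialization, roughly:
```python
import json

import numpy as np

# The stock json encoder raises "Object of type int64 is not JSON serializable"
# on NumPy scalar types, so convert them to plain Python int/float first.
metrics = {"epoch": np.int64(3), "eval_f1": np.float64(0.91)}
serializable = {
    key: (int(value) if isinstance(value, np.integer)
          else float(value) if isinstance(value, np.floating)
          else value)
    for key, value in metrics.items()
}
print(json.dumps(serializable))
```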
* Black reformatted examples/pytorch/token-classification/run_ner_no_trainer.py
* make style
* Squash 88 commits
* Use markdown
* Remove mdx files due to bad rebase
* Fix modeling files due to bad rebase
* Fix style
* Update comment
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Allow dict input for audio classification pipeline
* make style
* Empty commit to trigger CI
* Empty commit to trigger CI
* check for torchaudio
* add pip instructions
Co-authored-by: Sylvain <sylvain.gugger@gmail.com>
* Update src/transformers/pipelines/audio_classification.py
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
* asr -> audio class
* asr -> audio class
---------
Co-authored-by: Sylvain <sylvain.gugger@gmail.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
* Replace python random with torch.rand to enable dynamo.export
* revert changes to flax model code
* Remove unused random import
* Fix torch template
* Move torch.manual_seed(0) to the right location
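The substitution in a nutshell (the layerdrop value is assumed for illustration): Python's `random` module is opaque to dynamo tracing, while a torch op stays in the traced graph:
```python
import torch

layerdrop = 0.1  # assumed value for illustration

# Before (breaks dynamo.export):
# skip_layer = random.uniform(0, 1) < layerdrop

# After (traceable):
skip_layer = torch.rand([]) < layerdrop
```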
* Refactor hyperparameter search backends
* Simpler refactoring without abstract base class
* black
* review comments:
specify name in class
use methods instead of callable class attributes
name constant better
* review comments: safer bool checking, log multiple available backends
* test ALL_HYPERPARAMETER_SEARCH_BACKENDS vs HPSearchBackend in unit test, not module. format with black.
* copyright
* let's go!
* initial implementation of token-level timestamps
* only return a single timestamp per token
* remove token probabilities
* fix return type
* fix doc comment
* strip special tokens
* rename
* revert to not stripping special tokens
* only support models that have alignment_heads
* add integration test
* consistently name it token-level timestamps
* small DTW tweak
* initial support for ASR pipeline
* fix pipeline doc comments
* resolve token timestamps in pipeline with chunking
* change warning when no final timestamp is found
* return word-level timestamps
* fixup
* fix bug that skipped final word in each chunk
* fix failing unit tests
* merge punctuations into the words
* also return word tokens
* also return token indices
* add (failing) unit test for combine_tokens_into_words
* make combine_tokens_into_words private
* restore OpenAI's punctuation rules
* add pipeline tests
* make requested changes
* PR review changes
* fix failing pipeline test
* small stuff from PR
* only return words and their timestamps, not segments
* move alignment_heads into generation config
* forgot to set alignment_heads in pipeline tests
* tiny comment fix
* grr
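For reference, a hedged end-to-end usage sketch of the word-level timestamps at the pipeline level (model id and audio file assumed):
```python
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")
result = asr("sample.flac", return_timestamps="word")
# result["chunks"] is a list like:
# [{"text": " Hello", "timestamp": (0.5, 0.9)}, ...]
```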
* Fix saved_model_creation_extended
* Skip the BLIP model creation test for now
* Fix TF SAM test
* Fix longformer tests
* Fix Wav2Vec2
* Add a skip for XLNet
* make fixup
* make fix-copies
* Add comments
* Add test for proper input signatures
* No more signature pruning
* Test the dummy inputs are valid too
* fine-tine -> fine-tune
* Fix indent in test_dataset_conversion
* Use tied weight keys
* More
* Fix tied weight missing warning
* Only give info on unexpected keys with different classes
* Deal with empty archs
* Fix tests
* Refine test
* Fix one BLIP arg not being optional, remove misspelled arg
* Remove the lxmert test overrides and just use the base test_saved_model_creation
* saved_model_creation fixes and re-enabling tests across the board
* Remove unnecessary skip
* Stop caching sinusoidal embeddings in speech_to_text
* Fix transfo_xl compilation
* Fix transfo_xl compilation
* Fix the conditionals in xglm
* Set the save spec only when building
* Clarify comment
* Move comment correctly
* Correct embeddings generation for speech2text
* Mark RAG generation tests as @slow
* Remove redundant else:
* Add comment to clarify the save_spec line in build()
* Fix size tests for XGLM at last!
* make fixup
* Remove one band_part operation
* Mark test_keras_fit as @slow
* Revert whisper change and modify the test_compile_tf_model test
* make fixup
* Tweak test slightly
* Add functional model saving to test
* Ensure TF can infer shapes for data2vec
* Add override for efficientformer
* Mark test as slow
* Stop storing references to bound methods in tf.functions
* Remove the gc.collect calls now that we resolved the underlying problem
* Remove the default signature from model.serving entirely, big cleanup
* Remove _prune_signature as self.input_signature can prune itself
* Restore serving docstring
* Update int support test to check the input signature
* Make sure other tests also use model.input_signature and not serving.input_signature
* Restore _prune_signature
* Remove the doctest GC now it's no longer needed
* Correct core tests to use the pruned sig
* order lines correctly in core tests
* Add eager_serving back with a deprecation warning