transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-20 04:58:22 +06:00

Author	SHA1	Message	Date
Younes Belkada	c876d12127	FIX / TST: Fix expected results on Mistral slow test (A10) (#30909 ) Update test_modeling_mistral.py	2024-05-21 09:14:14 +02:00
Longjie Zheng	616bb11d48	Add torch.compile for Mistral (#30642 ) * first version * fix sliding window * fix style * add sliding window cache * fix style * address comments * fix test * fix style * move sliding window check inside cache init * revert changes on irrelevant files & add comment on SlidingWindowCache * address comments & fix style fix style * update causal mask * [run-slow] mistral * [run-slow] mistral * [run-slow] mistral * [run-slow] mistral * [run-slow] mistral * [run-slow] llama * [run-slow] mistral * [run-slow] mistral * [run-slow] mistral * revert CI from a10 to t4 * wrap up	2024-05-20 16:27:24 +02:00
Zach Mueller	92d1d97c05	Introduce configured_state arg for accelerator_config (#29781 ) * Introduce configured_state * Include note on tuning * Allow for users to have defined a state already * Include tests * Add note on hpam tune * Guard a bit better * Update src/transformers/training_args.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/training_args.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Finish rebase * Finish rebase * Guard carefully * Fixup test * Refactor * Fin refactor * Comment * Update wrt feedback --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-05-20 09:21:40 -04:00
Yoach Lacombe	e6708709cb	Add AutoFeatureExtractor support to Wav2Vec2ProcessorWithLM (#28706 ) * Add AutoFeatureExtractor support to Wav2Vec2ProcessorWithLM * update with a type filter * add raises error test * fix added test	2024-05-20 13:40:42 +02:00
Hafedh	c11ac7857b	fix for custom pipeline configuration (#29004 ) * fix for custom pipeline configuration * fix for custom pipelines * remove extra exception * added test for custom pipelines extra tag * format with ruff * limit extra tag for first time only * format with ruff * improve tests for custom pipelines	2024-05-20 11:38:32 +02:00
Kamil Akesbi	1c2bb3ac54	add return_token_timestamps to WhisperProcessor (#30812 ) * compute num_frames in WhisperFeatureExtractor * add return_num_frames in WhisperFeatureProcessor + adapt pipeline * return_timestamps renaming + pipeline fix * fix * fix * fix * add tests * Update src/transformers/models/whisper/feature_extraction_whisper.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * apply review changes * fix * Update src/transformers/models/whisper/feature_extraction_whisper.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * Update tests/models/whisper/test_modeling_whisper.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * apply review * fix * review changes * Update src/transformers/models/whisper/feature_extraction_whisper.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * make style quality * EXPECTED_OUTPUT in single line * small numpy->torch fix * fix --------- Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-05-20 09:53:58 +01:00
Raushan Turganbay	5d0bf59b4d	LLaVa-Next: Update docs with batched inference (#30857 ) * update docs with batch ex * Update docs/source/en/model_doc/llava_next.md Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * accept nested list of img --------- Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>	2024-05-20 13:45:56 +05:00
Benjamin Warner	cd6bd0af34	Add support for torch.compile dynamic shapes (#30560 ) * add torch.compile dynamic support * Add SDPA dynamic shapes compile test & improve SDPA comment * comment consistency	2024-05-20 10:36:57 +02:00
Joseph Enguehard	07bf2dff78	Add TokenClassification for Mistral, Mixtral and Qwen2 (#29878 ) * Add MistralForTokenClassification * Add tests and docs * Add token classification for Mixtral and Qwen2 * Save llma for token classification draft * Add token classification support for Llama, Gemma, Persimmon, StableLm and StarCoder2 * Formatting * Add token classification support for Qwen2Moe model * Add dropout layer to each ForTokenClassification model * Add copied from in tests * Update src/transformers/models/llama/modeling_llama.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Propagate suggested changes * Style --------- Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>	2024-05-20 10:06:57 +02:00
Abhiroop Tejomay	481a957814	Enable dynamic resolution input for Swin Transformer and variants (#30656 ) * add interpolation of positional encoding support to swin * add style changes * use default image processor and make size a dictionary Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * remove logits testing Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Refactor image size validation logic when interpolation is disabled Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * remove asserts in modeling Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * add dynamic resolution input support to swinv2 * change size to ensure interpolation encoding path is triggered * set interpolate_pos_encoding default value to False Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * set interpolate_pos_encoding default value to False Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * set interpolate_pos_encoding default value to False Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * set interpolate_pos_encoding default value to False Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * set interpolate_pos_encoding default value to False Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * set interpolate_pos_encoding default value to False Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * set interpolate_pos_encoding default value to False Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * set interpolate_pos_encoding default value to False * add dynamic resolution input to donut swin * add dynamic resolution input to maskformer swin --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-05-17 18:38:46 +01:00
Pavel Iakubovskii	bf646fbf2d	Add fixed resize and pad strategy for object detection (#30742 ) * Add resize and pad strategy * Merge get_size functions * Add pad_size + tests to object detection models * Fixup * Update docstrings * Fixup	2024-05-17 16:21:26 +01:00
Arthur	0a9300f474	Support arbitrary processor (#30875 ) * Support arbitrary processor * fix * nit * update * nit * nit * fix and revert * add a small test * better check * fixup * bug so let's just use class for now * oups * .	2024-05-17 16:51:31 +02:00
Younes Belkada	3d7d3a87a0	TEST: Add llama logits tests (#30835 ) * add llama logits test * fix * fix tests " " * fix for a10 * format * format * fix * [run-slow] remove fmt: skip * Your commit message * test commit * Revert "test commit" This reverts commit `b66e01e55f`. * [run-slow]llama * Update tests/models/llama/test_modeling_llama.py * [run-slow]llama * empty commit	2024-05-17 12:23:00 +02:00
Yih-Dar	1b3dba9417	Make `Gemma` work with `torch.compile` (#30775 ) * fix * [run-slow] gemma * add test * add `test_compile_static_cache` * fix * style * remove subprocess * use attribute * fix * style * update * [run-slow] dbrx,gemma,jetmoe,phi3,recurrent_gemma --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-05-16 13:41:33 +02:00
Joao Gante	9d889f870e	Cache: add new flag to distinguish models that `Cache` but not static cache (#30800 ) * jamba cache * new flag * generate exception	2024-05-16 12:08:35 +01:00
hyenal	1c21f48a50	add sdpa to ViT [follow up of #29325 ] (#30555 ) remove blank line (+1 squashed commit) Squashed commits: [24ccd2061] [run-slow]vit_msn,vision_encoder_decoder (+24 squashed commits) Squashed commits: [08bd27e7a] [run-slow]vit_msn,vision_encoder_decoder [ec96a8db3] [run-slow]vit_msn [ead817eca] fix vit msn multi gpu [d12cdc8fd] [run-slow]audio_spectrogram_transformer,deit,vision_encoder_decoder,vision_text_dual_encoder,vit,vit_hybrid,vit_mae,vit_msn,videomae,yolos [3fdbfa88f] doc [a3ff33e4a] finish implementation [e20b7b7fb] Update test_modeling_common.py [e290c5810] Update test_modeling_flax_common.py [d3af86f46] comment [ff7dd32d8] more comments [59b137889] suggestion [7e2ba6d67] attn_implementation as attribute of the class [fe66ab71f] minor [38642b568] Apply suggestions from code review Accept comments Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> [22cde7d52] Update tests/test_modeling_common.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> [48e137cc6] Update tests/test_modeling_common.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> [99f4c679f] Update tests/test_modeling_common.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> [96cf20a6d] Update src/transformers/models/vit_msn/modeling_vit_msn.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> [c59377d23] Update src/transformers/models/vit_mae/modeling_vit_mae.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> [b70a47259] Update tests/models/vision_text_dual_encoder/test_modeling_vision_text_dual_encoder.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> [00c84d216] [run-slow]audio_spectrogram_transformer,deit,vision_encoder_decoder,vision_text_dual_encoder,vit,vit_hybrid,vit_mae,vit_msn,videomae,yolos [61f00ebb0] all tests are passing locally [e9e0b82b7] vision encoder/decoder [4d5076b56] test-vision (+20 squashed commits) Squashed commits: [d1add8db9] yolo [9fde65716] fix flax [986566c28] minor [ca2f21d1f] vit [3333efd7a] easy models change [ebfc21402] [run-slow]audio_spectrogram_transformer,deit,vision_encoder_decoder,vision_text_dual_encoder,vit,vit_hybrid,vit_mae,vit_msn,videomae,yolos [b8b8603ed] [run-slow]vision_encoder_decoder,vision_text_dual_encoder,yolos [48ecc7e26] all tests are passing locally [bff7fc366] minor [62f88306f] fix yolo and text_encoder tests [121507555] [run-slow]audio_spectrogram_transformer,deit,vit,vit_hybrid,vit_mae,vit_msn,videomae [1064cae0a] [run-slow]vision_encoder_decoder,vision_text_dual_encoder,yolos [b7f52ff3a] [run-slow]audio_spectrogram_transformer,deit,vit,vit_hybrid,vit_mae,vit_msn,videomae [cffaa10dd] fix-copies [ef6c511c4] test vit hybrid [7d4ba8644] vit hybrid [66f919033] [run-slow]audio_spectrogram_transformer,deit,vit,vit_hybrid,vit_mae,vit_msn,videomae [1fcc0a031] fixes [cfde6eb21] fixup [e77df1ed3] all except yolo end encoder decoder (+17 squashed commits) Squashed commits: [602913e22] vit + vit_mae are working [547f6c4cc] RUN_SLOW=1 pytest tests/models/audio_spectrogram_transformer/ tests/models/deit/ tests/models/videomae/ passes [61a97dfa9] it s the complete opposite... [aefab37d4] fix more tests [71802a1b9] fix all torch tests [40b12eb58] encoder - decoder tests [941552b69] slow decorator where appropriate [14d055d80] has_attentions to yolo and msn [3381fa19f] add correct name [e261316a7] repo consistency [31c6d0c08] fixup [9d214276c] minor fix [11ed2e1b7] chore [eca6644c4] add sdpa to vit-based models [cffbf390b] make fix-copies result [6468319b0] fix style [d324cd02a] add sdpa for vit Co-authored-by: Liubov Yaronskaya <luba.yaronskaya@gmail.com>	2024-05-16 10:56:11 +01:00
Edoardo Cetin	4b3eb19fa7	Fix llama model sdpa attention forward function masking bug when output_attentions=True (#30652 ) * Fix llama model forward function with attention=True, same-length encoded sequence. * Fix style * propagate fix to modeling_cohere, gemma, dbrx, and olmo (which copy the same sdpa masking logic from llama) * Fix style * ignore unnecessary sdpa mask converter when output_attentions=True * add tests checking sdpa and eager outputs match when output_attentions=True * Split if statements in two lines Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Fix formatting * Add fix to new jetmoe model * Add missing output_attentions argument to jetmoe mask creation --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-05-15 19:48:19 +02:00
Younes Belkada	3f435823e0	FEAT / Bitsandbytes: Add `dequantize` API for bitsandbytes quantized models (#30806 ) * add method * change method name * more comments * Apply suggestions from code review Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fixup * add docstrings and fix comment * warn users on the de-quantized dtype * Update src/transformers/quantizers/base.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/integrations/bitsandbytes.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * final suggestion - use private method --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-05-15 17:17:09 +02:00
Xuan-Phi Nguyen	5ca085b882	Better llava next. (#29850 ) * Better llava next. - Batched forward with multiple image of different sizes (number of patches). - Support training, for cases without any image. - Support multi-image in same sequence. e.g: ["<image> <image> the first image is a dog while the second is a cat", "<image> <image> <image> <image> these 4 image are..."] Current limitation: - Haven't done testing - Only support right padding (for training) - left padding (batched generation) is not ready yet. - PR not ready. * fix bugs in batched generation * add tests * fix batch-gen bugs, left-padding positions and incorrect attention mask * remove better modeling llava * fix formatting * fix test * fix test * fix testing * fix test * fix formatting * Update src/transformers/models/llava_next/modeling_llava_next.py add clarity Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update modeling_llava_next.py remove assert * fix bug modeling_llava_next.py * update modeling * fix bugs * fix format * fix error * fix new_token_positions * Update modeling_llava_next.py * update formatting * add args * removecomments * add slow tests for batched inference * failing tf/flax tests * this one ic correct * Update src/transformers/models/llava_next/modeling_llava_next.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fix docs * make fixup * more fixup * add test for batch equivalence * Update tests/models/llava_next/test_modeling_llava_next.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/llava_next/image_processing_llava_next.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/llava_next/image_processing_llava_next.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/llava_next/modeling_llava_next.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/llava_next/modeling_llava_next.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/llava_next/modeling_llava_next.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/llava_next/modeling_llava_next.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/llava_next/modeling_llava_next.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/llava_next/modeling_llava_next.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * pr comments * hardcode padding side for bs=1 * update * [run-slow] llava_next * [run-slow] llava_next * make fix-copies --------- Co-authored-by: NGUYEN, Xuan Phi <x.nguyen@alibaba-inc.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: raushan <raushan@huggingface.co> Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>	2024-05-15 19:02:56 +05:00
Sourab Mangrulkar	bdfefbadaf	Update ds_config_zero3.json (#30829 )	2024-05-15 10:02:31 -04:00
amyeroberts	64c06df325	Jamba - Skip 4d custom attention mask test (#30826 ) * Jamba - Skip 4d custom attention mask test * Skip assistant greedy test	2024-05-15 13:57:28 +01:00
Lysandre Debut	a42844955f	Loading GGUF files support (#30391 ) * Adds support for loading GGUF files Co-authored-by: Younes Belkada <younesbelkada@gmail.com> Co-authored-by: 99991 <99991@users.noreply.github.com> * add q2_k q3_k q5_k support from @99991 * fix tests * Update doc * Style * Docs * fix CI * Update docs/source/en/gguf.md * Update docs/source/en/gguf.md * Compute merges * change logic * add comment for clarity * add comment for clarity * Update src/transformers/models/auto/tokenization_auto.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * change logic * Update src/transformers/modeling_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * change * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/modeling_gguf_pytorch_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * put back comment * add comment about mistral * comments and added tests * fix unconsistent type * more * fix tokenizer * Update src/transformers/modeling_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * address comments about tests and tokenizer + add added_tokens * from_gguf -> gguf_file * replace on docs too --------- Co-authored-by: Younes Belkada <younesbelkada@gmail.com> Co-authored-by: 99991 <99991@users.noreply.github.com> Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-05-15 14:28:20 +02:00
Raushan Turganbay	bd9f4d7951	Add Video Llava (#29733 ) * add model draft * update docstring * add tests * support image and video as input * update for better handling of mixed input and clean-up a bit * bug when mixed inputs & add tests * Update README.md Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Merge remote-tracking branch 'upstream/main' into video_llava * link to abstract of paper in README * fix test * fix-copies * make tests happy * skip docstest for now * do not run doctest for now * Update src/transformers/models/video_llava/processing_video_llava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/video_llava/image_processing_video_llava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/video_llava/image_processing_video_llava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/video_llava/image_processing_video_llava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/video_llava/image_processing_video_llava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/video_llava/test_modeling_video_llava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/video_llava/image_processing_video_llava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * address review comments * failing tests * Fix vocab_size in common tests for VLMs * codestyle * Update src/transformers/models/video_llava/configuration_video_llava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/video_llava/configuration_video_llava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/video_llava/modeling_video_llava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/video_llava/modeling_video_llava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/video_llava.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/video_llava.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/video_llava/image_processing_video_llava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/video_llava.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/video_llava/processing_video_llava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/video_llava/test_modeling_video_llava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/video_llava/test_modeling_video_llava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/video_llava/test_modeling_video_llava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * PR suggestions * fix-copies * Update src/transformers/models/video_llava/configuration_video_llava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/video_llava/configuration_video_llava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * add full example in docs * clean-up with new model-id * [run-slow] video_llava * update docstring * [run-slow] video_llava * remove all achive maps * fix some tests * test was supposed to be skipped for llava :) --------- Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-05-15 16:42:29 +05:00
Ondřej Cífka	be3aa43e5f	Support mixed-language batches in `WhisperGenerationMixin` (#29688 ) * Add support for mixing languages in a single batch * Update docstring * Enable different detected languages in batch * Do not require input_features * Test list of languages * Fix comment * Make init_tokens length-1 if possible, broadcast at the end * Test for ValueError with language list of incorrect length * Slow test for batched multilingual transcription * fixup * Apply suggestions from code review Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * Address review, refactor * Second attempt to move this line where it was originally * Split test, fix a bug --------- Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>	2024-05-15 09:53:17 +02:00
Pablo Montalvo	1360801a69	Add PaliGemma (#30814 ) * add new model like * add state dict slicing + new model config * update palma config and weights, passes vision activations * fix * update * reorder loading/unpacking * clean up * add debug statements * change device * fix * debugging * fix noncausal mask * fixup sdpa + causal mask * fix activation function * remove debug before changing modeling file * add variants * debug attention mask in generate * revert to non-debug sdpa * revert gemma modifications * add custom language modeling * use Processor * add language modeling file to init * try thin wrapper around generate * Update * update mask * breakpoints galore * remove conflict * switch to left-padding * add incomplete model doc * add paligemma global files * batch rename paligemma * make generation match outputs and captioning * style * style * remove copied from + doc * remove more copied from * remove copy from projector * minor fix * update config and style * add readme - dummy * CORRECT image captioning * moving to args * add siglip proper + fix merging image + text features * take update_causal_mask from upstream * remove breakpoint * leverage AutoModel * fix input_ids slicing * make siglip head conditional * remove encoder_decoder value * remove unneeded modeling file * add commented 4d attention mask * FIXED generation with 4D mask * Update src/transformers/models/siglip/modeling_siglip.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix left padding detection * shuffle order of verifications * fix missing labels for training * fix * vectorize merging of features, improve slicing * improve testing before conversion * handle merging in processor * image token index depends on checkpoint * add variants, save processor too * save processors, base tokenizer off spm file * expand model embeddings due to additional image token * pass image processing args * add convert rgb to siglip processor * add \n token separately * fix tokenizer and prompts * fix docstrings * change to camel * fix casing * debug pos_ids and sdpa * pass and use cache_position * add flag for newline tokenization * Update src/transformers/models/paligemma/processing_paligemma.py Co-authored-by: Merve Noyan <merveenoyan@gmail.com> * simplify conversion script * add copied from * add precision to conversion script * Update src/transformers/models/paligemma/modeling_paligemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * clean up * Shift attention mask from `1:` After discussion with @molbap * add docs, fix quality * quality, tied weights inheritance, and logits/label alignment * fix more tests * pass attn_implementation to language model correctly * add SiglipVisionTransformer to no split modules * skip paligemma test for sdpa dispatch to flash * skip incompatible tests * quality * [broken archive maps] * Apply suggestions - remove archive lists - style - take shape of inputs_embeds for batch Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/utils/dummy_pt_objects.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * simplify conversion script * add suggestions * add suggestions * add copied from * fix * move labels out * revert * fix * remove placeholder labels if None * use cache_position * fix quality + docstrings * fix quality * fix paligemma 4d gemma mask incompatibility * fix config docstring * fix query and attn_mask dtype --------- Co-authored-by: ArthurZucker <arthur.zucker@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Merve Noyan <merveenoyan@gmail.com> Co-authored-by: Pedro Cuenca <pedro@huggingface.co>	2024-05-14 22:07:15 +02:00
Yikang Shen	ccdabc5642	Add JetMoE model (#30005 ) * init jetmoe code * update archive maps * remove flax import * fix import error * update README * ruff fix * update readme * fix * update config * fix issue * merge files * fix model bug * fix test * auto fix * model size * add comments * fix form * add flash attention support * fix attention head number * fix init * fix support list * sort auto mapping * fix test * fix docs * update test * fix test * fix test * change variable name * fix config * fix init * update format * clean code * fix config * fix config * change default config * update config * fix issues * update formate * update config argument * update format * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * change to mixtral aux loss * change to cache_position * debug * fix bugs * debug * fix format * fix format * fix copy * fix format * fix format * fix sort * fix sort * fix sort * add copy comment * add copy from * remove debug code * revert readme update * add copy * debug * remove debug code * fix flash attention * add comments * clean code * clean format * fix format * fix format * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * change variable name * add copied from * fix variable name * remove deprecated functinos * sync to llama implementation * fix format * fix copy * fix format * update format * remove repr * add comment for moe weight * fix copy * Update src/transformers/models/jetmoe/configuration_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * add comments and reformat config * fix format * fix format * fix format * update test * update doc string in config * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * update config doc * update attention cache * fix format * fix copy --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>	2024-05-14 16:32:01 +02:00
Raushan Turganbay	5ad960f1f4	Add Watermarking LogitsProcessor and WatermarkDetector (#29676 ) * add watermarking processor * remove the other hashing (context width=1 always) * make style * Update src/transformers/generation/logits_process.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/generation/logits_process.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/generation/logits_process.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/generation/configuration_utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * update watermarking process * add detector * update tests to use detector * fix failing tests * rename `input_seq` * make style * doc for processor * minor fixes * docs * make quality * Update src/transformers/generation/configuration_utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/generation/logits_process.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/generation/watermarking.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/generation/watermarking.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/generation/watermarking.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * add PR suggestions * let's use lru_cache's default max size (128) * import processor if torch available * maybe like this * lets move the config to torch independet file * add docs * tiny docs fix to make the test happy * Update src/transformers/generation/configuration_utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/generation/watermarking.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * PR suggestions * add docs * fix test * fix docs * address pr comments * style * Revert "style" This reverts commit `7f33cc34ff`. * correct style * make doctest green --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2024-05-14 13:31:39 +05:00
fxmarty	37bba2a32d	CI: update to ROCm 6.0.2 and test MI300 (#30266 ) * update to ROCm 6.0.2 and test MI300 * add callers for mi300 * update dockerfile * fix trainer tests * remove apex * style * Update tests/trainer/test_trainer_seq2seq.py * Update tests/trainer/test_trainer_seq2seq.py * Update tests/trainer/test_trainer_seq2seq.py * Update tests/trainer/test_trainer_seq2seq.py * update to torch 2.3 * add workflow dispatch target * we may need branches: mi300-ci after all * nit * fix docker build * nit * add check runner * remove docker-gpu * fix issues * fix --------- Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-05-13 18:14:36 +02:00
Marc Sun	539ed75d50	skip low_cpu_mem_usage tests (#30782 )	2024-05-13 18:00:43 +02:00
Alazar	94306352f4	Port IDEFICS to tensorflow (#26870 ) * Initial commit * Just a copy of modeling_idefics.py that will be ported to TF * - Prepend TF to the name of all classes - Convert pytorch ops to TF (not all operations are converted yet) * Add TF imports * Add autotranslated files * Add TF classes to model_tf_auto.py * Add the TF classes in model_doc * include auto-translated code * Adopted from auto-translated version * Add a forgotten super().build * Add test code for TF version. * Fix indentation and load pytorch weights for now * Some fixes. Many tests are still failing but some are passing now. - I have added TODO's for some of the hacks I made to unblock me and I will address them soon - I have the processing_idefics.py hacked in my view to support TF temporarily * Add ALL_LAYERNORM_LAYERS to match pytorch * Revert "Add ALL_LAYERNORM_LAYERS to match pytorch" This reverts commit 7e0a35119b4d7a6284d04d8c543fba1b29e573c9 as it is not needed in the tf implementation. * Fix freeze_relevant_params() * Some more fixes * Fix test_attention_outputs * Add tf stuff to processing_idefics.py processing_idefics.py supports both pytorch and tf now. test_processor_idefics.py for pytorch is passing, so i didn't break anything but still some issues with tf. I also need to add tf tests in test_processor_idefics.py. * Pass return_tensors to image processing code and fix test * Pass return_tensors to the image processor __init__ * Fix several test cases - Make input to some of the forward pass of type `TFModelInputType` - Decorate main layer forward pass with `@unpack_inputs` - Decorate main layer with `@keras_serializable` - Pass `inputs` to TFIdeficsModel * Some more fixes forgotten in last commit * Fix processing code and vision_tf.py * Fix perceiver bug * Import from * Auto-add build() methods + style pass * Fix build() errors due to `None` being passed as shape to some layers * Change name in TFIdeficsForVisionText2Text to attribute in IdeficsForVisionText2Text * Fix pytorch weights load for tf2 There were a lot of `name=` missing in weight initialization code. * Attempt to fix CI * Add back accidently removed line * Remove torch-specific stuff from the TF test file * make fix-copies, make style, remove autotranslated files * Fixes to imports/docstrings * Let's try the from future import in desperation * Fix the core random_attention_mask fn to match the torch/flax behaviour * Clean random_attention_mask up correctly * Remove torch-only test * Fix loss shape, couple of nits * make style * Don't test for OOB embeddings because IDEFICS uses those deliberately * Fix loss computation to handle masking * Fix test failures when flattening * Fix some test failures - Add cross attention gate which was missing and wasn't being passed arround - Fix overwriting of image_attention_mask due to hack I had for dummy inputs * Add a proper stateless scaled_dot_product_attention * make style * Adding missing attribute from the PyTorch version * Small cleanups to decoupledlinearlayer in case that helps * Pass epsilon to LayerNormalization * Attemp to fix pytorch weight cross-loading for TFIdeficsEmbedding * Fix a bug in TFIdeficsGatedCrossAttentionLayer * Patching up build() methods * Constant self.inv_freq * Constant self.inv_freq * First working version The TF implementation works now, there was a bug in the TFIdeficsDecoupledLinear where the weights were mis-intialized (in_features,out_features) when it should be: (out_features, in_features) I have tested this so far with tiny-random and idefics-9b-instruct and gives correct output. I also dumped the final outputs for both pytorch and TF and they are identical. * Fix some test failures * remove print statement * Fix return_tensors * Fix CI test failure check_code_quality * Attempt to fix CI failures by running `make fixup` The hardcoded IDs in test_modeling_tf_idefics.py are for the integration test and makes that file unreadable and should probably be moved to a seperate file. * Attempt to fix tests_pr_documentation_tests * Fix a test failure in test_image_processing_idefics.py * Fix test test_pt_tf_model_equivalence * Fix a few failures * Tiny fix * Some minor fixes * Remove a duplicate test * Override a few test failures for IDEFICS - `test_keras_save_load` is passing now - `test_compile_tf_model` is still failing * Fix processing_idefics.py after rebase * Guard import keras with is_tf_available * fix check code quality * fix check code quality * Minor fixes * Skip test_save_load temporarily This test passed on my local box but fails on the CI, skipping for now to see if there are other remaining failures on the CI. * Run `ruff format tests src utils` * Fix last failing test, `test_compile_tf_model` * Add fixes for vision_tf.py I forgot to add this file in last commit. * Minor fixes * Replace "<<<" with "<<" for doc tests IDEFICS-9B is too big for doctest runner, so don't run it there * Make code more readable * Fix bug after code review I added a layer_norm_eps to IdeficsConfig but I don't even need it since the vision config has a layer_norm_eps. * Fix after code review Use original code tokenizer.convert_tokens_to_ids * Keep PyTorch as the default return_tensors * Fixes to modeling_tf after code review * Fixes from code review - Remove all references of `TF_IDEFICS_PRETRAINED_MODEL_ARCHIVE_LIST` - Pass 1e-5 to LayerNormalization in perceiver * Run ruff * Undo a change * Refactor processing code after Matt's suggestion * Remove TODO's that aren't needed anymore * For pytorch, Use original pytorch processing code from main Since this PR is a TF port it shouldn't make any modifications to pytorch IDEFICS code. This changes undo's the pytorch processing modifications I made and uses original code from main. * Update tests/models/idefics/test_modeling_idefics.py * Update tests/models/idefics/test_modeling_tf_idefics.py * Add missing imports for is_pt_tf_cross_test * [DO NOT MERGE]: This is a commit for debugging and will be reverted The cross test `test_pt_tf_model_equivalence` passes locally but fails when running on the CI. This commit is to help debug that and will be reverted. * Revert "[DO NOT MERGE]: This is a commit for debugging and will be reverted" This reverts commit 8f0d709ec5bd46685fb0b4259d914ffee794875b. * [DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted * [DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted * Revert "[DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted" This reverts commit 998cc38b8c3d313bf5e5eb55a7f5b7b881897b89. * Revert "[DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted" This reverts commit 1c695ac4219c4ae4d39b330b01744dc27deb7dd4. * Don't skip test_save_load IIRC test_save_load was also failing on the CI but not on my local box, it might be easier to debug that on the CI first than the cross tests * Debugging commit, will be reverted * Revert "Debugging commit, will be reverted" This reverts commit 8eafc8e41e20c4e95a3a90834f06a6e9f445e2d5. * Override `test_save_load` and push model to save Maybe this will help me repro this weird bug * pass my repo_id * add endpoint * Pass a temp (write) token just for this CI * Undo last few commits, still pushing to hub for model debugging The issue seems to be with save_pretrained(), when I looked at the model saved from the CI test failure it is basically empty and has no weights. `self.save_weights(..)` seems to be failing in save_pretrained but needs more debugging * Add logging to modeling tf utils, will be reverted just for debugging * Debugging, will revert * Revert "Debugging, will revert" This reverts commit 9d0d3075fb7c82d8cde3a5c76bc8f3876c5c55d3. * Revert "Add logging to modeling tf utils, will be reverted just for debugging" This reverts commit 774b6b7b1c17b3ce5d7634ade768f2f686cee617. * Remove `test_save_load` The CI failures are gone after my latest rebase, no idea why but I was still saving the model to my hub on HF and the tf_model.h5 file now has everything. * Run make fix-copies * Run ruff format tests src utils * Debugging commit, will be reverted * Run ruff, also trigger CI run * Run ruff again * Undo debugging commit --------- Co-authored-by: Matt <rocketknight1@gmail.com> Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>	2024-05-13 15:59:46 +01:00
Fanli Lin	69d9bca55a	enable Pipeline to get device from model (#30534 ) * check model.device * fix * style fix * move model device * remove print * add comment * fix * add unit test * optimize * change test names and add more cases * Update tests/pipelines/test_pipelines_common.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-05-13 15:00:39 +01:00
Poedator	a0779b9e19	Llama: fix custom 4D masks, v2 (#30348 ) * 4d mask fixes * Update custom 4D mask logic * test moved to mixin * extra tests 4d mask * upd 4d mask and StaticCache handling * added Mask4DTestHard to mistral tests * post-rebase fixes * test fixes for StaticCache * make fix-copies * upd 1 after #30476 * fix common tests * rm elif attention_mask.dim() == 4: * tests combined, fixed, mixtral supported * bigbird style chg reverted * rm if attention_mask.dim() == 2 * modeling_llama formatting chg --------- Co-authored-by: Joao Gante <joao@huggingface.co>	2024-05-13 13:46:06 +02:00
Nilabhra Roy Chowdhury	e52741f601	Support for Falcon2-11B (#30771 ) * remove unrelated changes * remove unrelated changes on phi and stable LM * add: Test for Falcon 10B * fix: formatting * fix: loading the falcon 10B in 8 bit precision using bitsanbytes. * fix: device placement * fix: broken tests. * fix: backwards compatibility for falcon 1B architecture. * chore: updated test. * chore: test_modeling_falcon.py to use the 11B model. * chore: minor edit * chore: formating. --------- Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>	2024-05-13 13:32:43 +02:00
Zafir Stojanovski	f63d822242	Blip dynamic input resolution (#30722 ) * blip with interpolated pos encoding * feat: Add interpolate_pos_encoding option to other models from `BLIP` family. * include check for textual generated content in tests	2024-05-13 12:20:16 +01:00
Marc Sun	de6e0db184	[awq] replace scale when we have GELU (#30074 ) * fix awq test * style * add log * new fix * style * only modifying impacted model in the end * rename function	2024-05-13 11:41:03 +02:00
Joao Gante	7130a22db9	Generate: consistently handle special tokens as tensors (#30624 ) * tmp commit * [test_all] mvp * missing not * [test_all] final test fixes * fix musicgen_melody and rag * [test_all] empty commit * PR comments * Update src/transformers/generation/utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-05-09 18:01:57 +01:00
Jacky Lee	218f44135f	Fix image post-processing for OWLv2 (#30686 ) * feat: add note about owlv2 * fix: post processing coordinates * remove: workaround document * fix: extra quotes * update: owlv2 docstrings * fix: copies check * feat: add unit test for resize * Update tests/models/owlv2/test_image_processor_owlv2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-05-09 17:02:03 +01:00
Joao Gante	df53c6e5d9	Generate: add `min_p` sampling (#30639 ) * min_p * more relaxed test to avoid numerical issues * Update src/transformers/generation/logits_process.py Co-authored-by: menhguin <minh1228@gmail.com> * Update src/transformers/generation/configuration_utils.py Co-authored-by: menhguin <minh1228@gmail.com> * docstring clarifications * PR comments * Update tests/generation/test_logits_process.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * make fixup --------- Co-authored-by: menhguin <minh1228@gmail.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-05-09 14:36:53 +01:00
Lysandre Debut	297b732bdf	Removal of deprecated maps (#30576 ) * [test_all] Remove all imports Remove remaining ARCHIVE MAPS Remove remaining PRETRAINED maps * review comments * [test_all] empty commit to trigger tests	2024-05-09 14:15:56 +02:00
Jacky Lee	8c5b3c19cf	Enable dynamic resolution for vivit (#30630 ) * feat: enable dynamic resolution for vivit * fix: formatting * remove: print statement for testing * Update src/transformers/models/vivit/modeling_vivit.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/vivit/modeling_vivit.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/vivit/modeling_vivit.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/vivit/test_modeling_vivit.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/vivit/test_modeling_vivit.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/vivit/modeling_vivit.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/vivit/test_modeling_vivit.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/vivit/modeling_vivit.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/vivit/modeling_vivit.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/vivit/modeling_vivit.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/vivit/modeling_vivit.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fix: style check --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-05-09 11:23:39 +01:00
David Xue	60293bd210	Add dynamic resolution input/interpolate position embedding to SigLIP (#30719 ) * Add interpolate positional encoding to siglip * Change # of patches for siglip interpolation test * fix formatting * Apply nit suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-05-09 11:10:38 +01:00
Joao Gante	f26e407370	Cache: models return input cache type (#30716 )	2024-05-08 18:26:34 +01:00
Anton Vlasjuk	71c1985069	Immutability for data collators (#30603 ) * immutability fix for seq2seq as well as immutability tests for the collators * ensure we don't act on none labels and formatting * remove tf/pt in respective tests as they are not required * more type error fixes tf/np * remove todo * apply suggestions from code review * formatting / style	2024-05-08 17:54:49 +01:00
David Xue	cf7bed9832	Add safetensors to model not found error msg for default use_safetensors value (#30602 ) * add safetensors to model not found error for default use_safetensors=None case * format code w/ ruff * fix assert true typo	2024-05-07 17:55:27 +01:00
Aymeric Roucher	0ba15cedbc	Reboot Agents (#30387 ) * Create CodeAgent and ReactAgent * Fix formatting errors * Update documentation for agents * Add custom errors, improve logging * Support variable usage in ReactAgent * add messages * Add message passing format * Create React Code Agent * Update * Refactoring * Fix errors * Improve python interpreter * Only non-tensor inputs should be sent to device * Calculator tool slight refactor * Improve docstrings * Refactor * Fix tests * Fix more tests * Fix even more tests * Fix tests by replacing output and input types * Fix operand type issue * two small fixes * EM TTS * Fix agent running type errors * Change text to speech tests to allow changed outputs * Update doc with new agent types * Improve code interpreter * If max iterations reached, provide a real answer instead of an error * Add edge case in interpreter * Add safe imports to the interpreter * Interpreter tweaks: tuples and listcomp * Make style * Make quality * Add dictcomp to interpreter * Rename ReactJSONAgent to ReactJsonAgent * Misc changes * ToolCollection * Rename agent's logger to self.logger * Add while loops to interpreter * Update doc with new tools. still need to mention collections * Add collections to the doc * Small fixes on logs and interpretor * Fix toolbox return type * Docs + fixup * Skip doctests * Correct prompts with improved examples and formatting * Update prompt * Remove outdated docs * Change agent to accept Toolbox object for tools * Remove calculator tool * Propagate removal of calculator in doc * Fix 2 failing workflows * Simplify additional argument passing * AgentType audio * Minor changes: function name, types * Remove calculator tests * Fix test * Fix torch requirement * Fix final answer tests * Style fixes * Fix tests * Update docstrings with calculator removal * Small type hint fixes * Update tests/agents/test_translation.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update tests/agents/test_python_interpreter.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/agents/default_tools.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/agents/tools.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update tests/agents/test_agents.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/bert/configuration_bert.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/agents/tools.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/agents/speech_to_text.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update tests/agents/test_speech_to_text.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update tests/agents/test_tools_common.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * pygments * Answer comments * Cleaning up * Simplifying init for all agents * Improving prompts and making code nicer * Style fixes * Add multiple comparator test in interpreter * Style fixes * Improve BERT example in documentation * Add examples to doc * Fix python interpreter quality * Logging improvements * Change test flag to agents * Quality fix * Add example for HfEngine * Improve conversation example for HfEngine * typo fix * Verify doc * Update docs/source/en/agents.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/agents/agents.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/agents/prompts.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/agents/python_interpreter.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update docs/source/en/agents.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Fix style issues * local s2t tool --------- Co-authored-by: Cyril Kondratenko <kkn1993@gmail.com> Co-authored-by: Lysandre <lysandre@huggingface.co> Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-05-07 12:59:49 +02:00
Kamil Akesbi	9c8979e35f	Word-level timestamps broken for short-form audio (#30325 ) * force chunk_length_s in AutomaticSpeechRecognitionPipeline * compute num_frames even when stride is None * add slow tests * fix test * Update src/transformers/pipelines/automatic_speech_recognition.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/pipelines/test_pipelines_automatic_speech_recognition.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * add input validation * fixup * small fix --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-05-07 10:17:27 +01:00
JB (Don)	54a2361a29	Adding _tie_weights() to prediction heads to support low_cpu_mem_usage=True (#29024 ) * Adding _tie_weights() to prediction heads to support low_cpu_mem_usage=True * Testing for the non-safe-tensors case, since the default is safe-tensors already * Running fixup/fix-copies * Adding accelerate annotations to tests	2024-05-07 11:12:21 +02:00
Nate Cibik	df475bf8e6	Trainer - add cache clearing and the option for batched eval metrics computation (#28769 ) * Added cache clearing for GPU efficiency. * Added cache clearing for GPU efficiency. * Added batch_eval_metrics capability * Ran make fixup * Fixed bug * Fixed whitespace issue * Fixed outdated condition * Updated docstrings with instructions for batch_eval_metrics. Updated end of dataloader logic * Added first version of batch_eval_metrics Trainer test * Fixed batch_eval_metrics Trainer tests for both eval and predict * Fixed batch_eval_metrics behavior for new Trainer variables * Fixed batch_eval_metrics Trainer tests * Ran fixup	2024-05-06 08:23:40 -04:00
Clara Pohland	e076953079	Trainer._load_from_checkpoint - support loading multiple Peft adapters (#30505 ) * Trainer: load checkpoint model with multiple adapters * Trainer._load_from_checkpoint support multiple active adapters * PeftModel.set_adapter does not support multiple adapters yet * Trainer._load_from_checkpoint test multiple adapters --------- Co-authored-by: Clara Luise Pohland <clara-luise.pohland@telekom.de>	2024-05-06 08:22:52 -04:00
Younes Belkada	9c772ac888	Quantization / HQQ: Fix HQQ tests on our runner (#30668 ) Update test_hqq.py	2024-05-06 11:33:52 +02:00
Arthur	307f632bb2	[`CI update`] Try to use dockers and no cache (#29202 ) * change cis * nits * update * minor updates * [push-ci-image] * nit [push-ci-image] * nitsssss * [build-ci-image] * [push-ci-image] * [push-ci-image] * both * [push-ci-image] * this? * [push-ci-image] * pypi-kenlm needs g++ * [push-ci-image] * nit * more nits [push-ci-image] * nits [push-ci-image] * [push-ci-image] * [push-ci-image] * [push-ci-image] * add vision * [push-ci-image] * [push-ci-image] * add new dummy file but will need to update them [push-ci-image] * [push-ci-image] * show package size as well * [push-ci-image] * potentially ignore failures * workflow updates * nits [push-ci-image] * [push-ci-image] * fix consistency * clean nciida triton * also show big packages [push-ci-image] * nit * update * another one * line escape? * add accelerate [push-ci-image] * updates [push-ci-image] * nits to run tests, no push-ci * try to parse skip reason to make sure nothing is skipped that should no be skippped * nit? * always show skipped reasons * nits * better parsing of the test outputs * action="store_true", * failure on failed * show matched * debug * update short summary with skipped, failed and errors * nits * nits * coolu pdates * remove docbuilder * fix * always run checks * oups * nits * don't error out on library printing * non zero exi codes * no warning * nit * WAT? * format nit * [push-ci-image] * fail if fail is needed * [push-ci-image] * sound file for torch light? * [push-ci-image] * order is important [push-ci-image] * [push-ci-image] reduce even further * [push-ci-image] * use pytest rich ! * yes [push-ci-image] * oupsy * bring back the full traceback, but pytest rich should help * nit * [push-ci-image] * re run * nit * [push-ci-image] * [push-ci-image] * [push-ci-image] * empty push to trigger * [push-ci-image] * nit? [push-ci-image] * empty * try to install timm with no deps * [push-ci-image] * oups [push-ci-image] * [push-ci-image] * [push-ci-image] ? * [push-ci-image] open ssh client for git checkout fast * empty for torch light * updates [push-ci-image] * nit * @v4 for checkout * [push-ci-image] * [push-ci-image] * fix fetch tests with parallelism * [push-ci-image] * more parallelism * nit * more nits * empty to re-trigger * empty to re-trigger * split by timing * did not work with previous commit * junit.xml * no path? * mmm this? * junitxml format * split by timing * nit * fix junit family * now we can test if the xunit1 is compatible! * this? * fully list tests * update * update * oups * finally * use classname * remove working directory to make sure the path does not interfere * okay no juni should have the correct path * name split? * sort by classname is what make most sense * some testing * naem * oups * test something fun * autodetect * 18? * nit * file size? * uip * 4 is best * update to see versions * better print * [push-ci-image] * [push-ci-image] * please install the correct keras version * [push-ci-image] * [push-ci-image] * [push-ci-image] * [push-ci-image] * [push-ci-image] * uv is fucking me up * [push-ci-image] * [push-ci-image] * [push-ci-image] * nits * [push-ci-image] * [push-ci-image] * install issues an pins * tapas as well * nits * more paralellism * short tb * soundfile * soundfile * [push-ci-image] * [push-ci-image] * [push-ci-image] * oups * [push-ci-image] * fix some things * [push-ci-image] * [push-ci-image] * [push-ci-image] * [push-ci-image] * use torch-light for hub * small git lfs for hub job * [push-ci-image] * [push-ci-image] * [push-ci-image] * [push-ci-image] * fix tf tapas * [push-ci-image] * nits * [push-ci-image] * don't update the test * [push-ci-image] * [push-ci-image] * [push-ci-image] * no use them * [push-ci-image] * [push-ci-image] * [push-ci-image] * [push-ci-image] * update tf proba * [push-ci-image] * [push-ci-image] * woops * [push-ci-image] * [push-ci-image] * [push-ci-image] * [push-ci-image] * [push-ci-image] * [push-ci-image] * test with built dockers * [push-ci-image] * skip annoying tests * revert fix copy * update test values * update * last skip and fixup * nit * ALL GOOOD * quality * Update tests/models/layoutlmv2/test_image_processing_layoutlmv2.py * Update docker/quality.dockerfile Co-authored-by: Lysandre Debut <hi@lysand.re> * Update src/transformers/models/tapas/modeling_tf_tapas.py Co-authored-by: Lysandre Debut <hi@lysand.re> * Apply suggestions from code review Co-authored-by: Lysandre Debut <hi@lysand.re> * use torch-speed * updates * [push-ci-image] * [push-ci-image] * [push-ci-image] * [push-ci-image] * fuck ken-lm [push-ci-image] * [push-ci-image] * [push-ci-image] --------- Co-authored-by: Lysandre Debut <hi@lysand.re>	2024-05-06 10:10:32 +02:00
mobicham	59952994c4	Add HQQ quantization support (#29637 ) * update HQQ transformers integration * push import_utils.py * add force_hooks check in modeling_utils.py * fix \| with Optional * force bias as param * check bias is Tensor * force forward for multi-gpu * review fixes pass * remove torch grad() * if any key in linear_tags fix * add cpu/disk check * isinstance return * add multigpu test + refactor tests * clean hqq_utils imports in hqq.py * clean hqq_utils imports in quantizer_hqq.py * delete hqq_utils.py * Delete src/transformers/utils/hqq_utils.py * ruff init * remove torch.float16 from __init__ in test * refactor test * isinstance -> type in quantizer_hqq.py * cpu/disk device_map check in quantizer_hqq.py * remove type(module) nn.linear check in quantizer_hqq.py * add BaseQuantizeConfig import inside HqqConfig init * remove hqq import in hqq.py * remove accelerate import from test_hqq.py * quant config.py doc update * add hqqconfig to main_classes doc * make style * __init__ fix * ruff __init__ * skip_modules list * hqqconfig format fix * hqqconfig doc fix * hqqconfig doc fix * hqqconfig doc fix * hqqconfig doc fix * hqqconfig doc fix * hqqconfig doc fix * hqqconfig doc fix * hqqconfig doc fix * hqqconfig doc fix * test_hqq.py remove mistral comment * remove self.using_multi_gpu is False * torch_dtype default val set and logger.info * hqq.py isinstance fix * remove torch=None * torch_device test_hqq * rename test_hqq * MODEL_ID in test_hqq * quantizer_hqq setattr fix * quantizer_hqq typo fix * imports quantizer_hqq.py * isinstance quantizer_hqq * hqq_layer.bias reformat quantizer_hqq * Step 2 as comment in quantizer_hqq * prepare_for_hqq_linear() comment * keep_in_fp32_modules fix * HqqHfQuantizer reformat * quantization.md hqqconfig * quantization.md model example reformat * quantization.md # space * quantization.md space }) * quantization.md space }) * quantization_config fix doc Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * axis value check in quantization_config * format * dynamic config explanation * quant config method in quantization.md * remove shard-level progress * .cuda fix modeling_utils * test_hqq fixes * make fix-copies --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-05-02 17:51:49 +01:00
Jonghwan Hyeon	4c940934da	Output `None` as attention when layer is skipped (#30597 ) * Output `None` as attention when layer is skipped * Add test for output_attentions	2024-05-02 17:25:19 +01:00
Richard Brown	f95302584b	🚨 Update image_processing_vitmatte.py (#30566 ) * Update image_processing_vitmatte.py * add test * [run-slow]vitmatte	2024-05-02 11:00:07 +01:00
Michael Benayoun	fbabd6746f	Fix for Neuron (#30259 )	2024-05-02 10:24:47 +02:00
Fraser Mince	5090ea3f68	Fix llava half precision and autocast issues (#29721 ) * Ensure input_embeds and image_features are the same dtype in autocast * Fix nans in half precision llava-next and fix autocasting behavior. * Fix styling issues. * fix randn newline instantiation * fix broken slow llava test * Fix llava next init. * fix styling issues * [run-slow]llava,llava_next * fix styling issues	2024-05-01 17:49:44 +01:00
Raushan Turganbay	38a4bf79ad	Encoder-decoder models: move embedding scale to nn.Module (#30410 ) * move scaling to nn.Module * let the test be here for now (need to fix) * failing tests * last failing models * Revert commit `4c14817f38` * clean-up * oops forgot * codestyle * raise NotImplemented when possible * Update tests/test_modeling_common.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * skip tests in respective modeling files --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-05-01 12:33:00 +05:00
Raushan Turganbay	9d31b32e9d	Use text config's vocab size in testing models (#30568 ) use text config's vocab size	2024-05-01 12:32:45 +05:00
Yih-Dar	78fdd64dcf	Remove `use_square_size` after loading (#30567 ) * fix * add test --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-04-30 21:11:37 +02:00
DarshanDeshpande	2ecefc3959	Add chat templating support for KeyDataset in text-generation pipeline (#30558 ) * added chat templating support for keydataset in generation pipeline * fixed and improved test * fix formatting test failures * Fix tests * Fix tests	2024-04-30 19:51:41 +01:00
Jiarui Xu	0cdb6b3f92	BlipModel: get_multimodal_features method (#30438 ) * add_blip_get_multimodal_feautres * Fix docstring error * reimplement get_multimodal_features * fix error * recheck code quality * add new necessary tests	2024-04-30 19:01:01 +01:00
Anton Vlasjuk	9112520b15	Fix seq2seq collator padding (#30556 ) * fix seq2seq data collator to respect the given padding strategy further added tests for the seq2seq data collator in the style of the `data_collator_for_token_classification` (pt, tf, np) * formatting and change bool equals "==" to "is" * add missed return types in tests * update numpy test as it can handle unequal shapes, not like pt or tf	2024-04-30 18:32:30 +01:00
Joao Gante	75bbfd5b22	Cache: Static cache as a standalone object (#30476 )	2024-04-30 16:37:19 +01:00
Eduardo Pacheco	6d4cabda26	[SegGPT] Fix seggpt image processor (#29550 ) * Fixed SegGptImageProcessor to handle 2D and 3D prompt mask inputs * Added new test to check prompt mask equivalence * New proposal * Better proposal * Removed unnecessary method * Updated seggpt docs * Introduced do_convert_rgb * nits	2024-04-26 19:40:12 +01:00
amyeroberts	c793b26f2e	load_image - decode b64encode and encodebytes strings (#30192 ) * Decode b64encode and encodebytes strings * Remove conditional encode -- image is always a string	2024-04-26 18:21:47 +01:00
amyeroberts	aafa7ce72b	[`DETR`] Remove timm hardcoded logic in modeling files (#29038 ) * Enable instantiating model with pretrained backbone weights * Clarify pretrained import * Use load_backbone instead * Add backbone_kwargs to config * Fix up * Add tests * Tidy up * Enable instantiating model with pretrained backbone weights * Update tests so backbone checkpoint isn't passed in * Clarify pretrained import * Update configs - docs and validation check * Update src/transformers/utils/backbone_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Clarify exception message * Update config init in tests * Add test for when use_timm_backbone=True * Use load_backbone instead * Add use_timm_backbone to the model configs * Add backbone_kwargs to config * Pass kwargs to constructors * Draft * Fix tests * Add back timm - weight naming * More tidying up * Whoops * Tidy up * Handle when kwargs are none * Update tests * Revert test changes * Deformable detr test - don't use default * Don't mutate; correct model attributes * Add some clarifying comments * nit - grammar is hard --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-04-26 16:55:24 +01:00
JB (Don)	dfa7b580e9	[`BERT`] Add support for sdpa (#28802 ) * Adding SDPA support for BERT * Using the proper input name for testing model input in inference() * Adding documentation for SDPA in BERT model page * Use the stable link for the documentation * Adding a gate to only call .contiguous() for torch < 2.2.0 * Additions and fixes to the documentation * Minor updates to documentation * Adding extra requirements needed for the contiguous() bug * Adding "Adapted from" in plcae of the "Copied from" * Add benchmark speedup tables to the documentation * Minor fixes to the documentation * Use ClapText as a replacemenet for Bert in the Copied-From * Some more fixes for the fix-copies references * Overriding the test_eager_matches_sdpa_generate in bert tests to not load with low_cpu_mem_usage [test all] * Undo changes to separate test * Refactored SDPA self attention code for KV projections * Change use_sdpa to attn_implementation * Fix test_sdpa_can_dispatch_on_flash by preparing input (required for MultipleChoice models)	2024-04-26 16:23:44 +01:00
Matt	2de5cb12be	Use the Keras set_random_seed in tests (#30504 ) Use the Keras set_random_seed to ensure reproducible weight initialization	2024-04-26 16:14:53 +01:00
Michael Goin	20081c743e	Update `dtype_byte_size` to handle torch.float8_e4m3fn/float8_e5m2 types (#30488 ) * Update modeling_utils/dtype_byte_size to handle float8 types * Add a test for dtype_byte_size * Format * Fix bool	2024-04-26 11:26:43 +01:00
Raushan Turganbay	e60491adc9	Fix Llava for 0-embeddings (#30473 )	2024-04-25 20:28:51 +05:00
Zach Mueller	ad697f1801	Introduce Stateful Callbacks (#29666 ) * Introduce saveable callbacks * Add note * Test for non-present and flag * Support early stopping and refusing to train further * Update docstring * More saving * Import oopsie * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Make it go through TrainerArguments * Document * Fix test * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Rework to allow for duplicates * CLean * Fix failing tests --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-04-25 11:00:09 -04:00
Alexander Visheratin	7b1170b0fa	Add WSD scheduler (#30231 ) * Added WSD scheduler. * Added tests. * Fixed errors. * Fix formatting. * CI fixes.	2024-04-25 12:07:21 +01:00
Yoach Lacombe	90cb55bf77	🚨 Add training compatibility for Musicgen-like models (#29802 ) * first modeling code * make repository * still WIP * update model * add tests * add latest change * clean docstrings and copied from * update docstrings md and readme * correct chroma function * correct copied from and remove unreleated test * add doc to toctree * correct imports * add convert script to notdoctested * Add suggestion from Sanchit Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * correct get_uncoditional_inputs docstrings * modify README according to SANCHIT feedback * add chroma to audio utils * clean librosa and torchaudio hard dependencies * fix FE * refactor audio decoder -> audio encoder for consistency with previous musicgen * refactor conditional -> encoder * modify sampling rate logics * modify license at the beginning * refactor all_self_attns->all_attentions * remove ignore copy from causallm generate * add copied from for from_sub_models * fix make copies * add warning if audio is truncated * add copied from where relevant * remove artefact * fix convert script * fix torchaudio and FE * modify chroma method according to feedback-> better naming * refactor input_values->input_features * refactor input_values->input_features and fix import fe * add input_features to docstrigs * correct inputs_embeds logics * remove dtype conversion * refactor _prepare_conditional_hidden_states_kwargs_for_generation ->_prepare_encoder_hidden_states_kwargs_for_generation * change warning for chroma length * Update src/transformers/models/musicgen_melody/convert_musicgen_melody_transformers.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * change way to save wav, using soundfile * correct docs and change to soundfile * fix import * fix init proj layers * add draft training * fix cross entropy * clean loss computation * fix labels * remove line breaks from md * fix issue with docstrings * add FE suggestions * improve is in logics and remove useless imports * remove custom from_pretrained * simplify docstring code * add suggestions for modeling tests * make style * update converting script with sanity check * remove encoder attention mask from conditional generation * replace musicgen melody checkpoints with official orga * rename ylacombe->facebook in checkpoints * fix copies * remove unecessary warning * add shape in code docstrings * add files to slow doc tests * fix md bug and add md to not_tested * make fix-copies * fix hidden states test and batching * update training code * add training tests for melody * add training for o.g musicgen * fix copied from * remove final todos * make style * fix style * add suggestions from review * add ref to the original loss computation code * rename method + fix labels in tests * make style --------- Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>	2024-04-25 12:51:19 +02:00
amyeroberts	aca4a1037f	Don't run fp16 MusicGen tests on CPU (#30466 )	2024-04-25 11:14:07 +01:00
Gustavo de Rosa	c9693db2fc	Phi-3 (#30423 ) * chore(root): Initial commit of Phi-3 files. * fix(root): Fixes Phi-3 missing on readme. * fix(root): Ensures files are consistent. * fix(phi3): Fixes unit tests. * fix(tests): Fixes style of phi-3 test file. * chore(tests): Adds integration tests for Phi-3. * fix(phi3): Removes additional flash-attention usage, .e.g, swiglu and rmsnorm. * fix(phi3): Fixes incorrect docstrings. * fix(phi3): Fixes docstring typos. * fix(phi3): Adds support for Su and Yarn embeddings. * fix(phi3): Improves according first batch of reviews. * fix(phi3): Uses up_states instead of y in Phi3MLP. * fix(phi3): Uses gemma rotary embedding to support torch.compile. * fix(phi3): Improves how rotary embedding classes are defined. * fix(phi3): Fixes inv_freq not being re-computed for extended RoPE. * fix(phi3): Adds last suggestions to modeling file. * fix(phi3): Splits inv_freq calculation in two lines.	2024-04-24 17:32:09 +02:00
Eduardo Pacheco	d26c14139c	[SegGPT] Fix loss calculation (#30421 ) * Fixed main train issues * Added loss test * Update src/transformers/models/seggpt/modeling_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Added missing labels arg in SegGptModel forward * Fixed typo * Added slow test to test loss calculation --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-04-24 15:24:34 +01:00
Fanli Lin	16c8e176f9	[tests] make test device-agnostic (#30444 ) * make device-agnostic * clean code	2024-04-24 11:21:27 +01:00
Arthur	9a4a119c10	[`Llava`] + CIs fix red cis and llava integration tests (#30440 ) * nit * nit and fmt skip * fixup * Update src/transformers/convert_slow_tokenizer.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * set to true --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-04-24 10:51:35 +02:00
Pavel Iakubovskii	767e351840	Fix YOLOS image processor resizing (#30436 ) * Add test for square image that fails * Fix for square images * Extend test cases * Fix resizing in tests * Style fixup	2024-04-24 09:50:17 +01:00
Arthur	e34da3ee3c	[`LlamaTokenizerFast`] Refactor default llama (#28881 ) * push legacy to fast as well * super strange * Update src/transformers/convert_slow_tokenizer.py * make sure we are BC * fix Llama test * nit * revert * more test * style * update * small update w.r.t tokenizers * nit * don't split * lol * add a test for `add_prefix_space=False` * fix gemma tokenizer as well * update * fix gemma * nicer failures * fixup * update * fix the example for legacy = False * use `huggyllama/llama-7b` for the PR doctest * nit * use from_slow * fix llama	2024-04-23 23:12:59 +02:00
Raushan Turganbay	77b59dce9f	Fix on "cache position" for assisted generation (#30068 ) * clean commit history I hope * get kv seq length correctly * PR suggestions * Update src/transformers/testing_utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * add comment * give gpt bigcode it's own overriden method * remove code --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2024-04-23 16:23:36 +05:00
Fanli Lin	2d61823fa2	[tests] add `require_torch_sdpa` for test that needs sdpa support (#30408 ) * add cuda flag * check for sdpa * add bitsandbytes	2024-04-23 10:39:38 +01:00
Eduardo Pacheco	c651ea982b	[Grounding DINO] Add support for cross-attention in GroundingDinoMultiHeadAttention (#30364 ) * Added cross attention support * Fixed dtypes * Fixed assumption * Moved to decoder	2024-04-23 09:56:14 +01:00
zhong zhuang	b4c18a830a	[FEAT]: EETQ quantizer support (#30262 ) * [FEAT]: EETQ quantizer support * Update quantization.md * Update docs/source/en/main_classes/quantization.md Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update docs/source/en/quantization.md Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update docs/source/en/quantization.md Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/integrations/__init__.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/integrations/__init__.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/integrations/eetq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/integrations/eetq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/integrations/eetq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update tests/quantization/eetq_integration/test_eetq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/quantizers/auto.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/quantizers/auto.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/quantizers/auto.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/quantizers/quantizer_eetq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update tests/quantization/eetq_integration/test_eetq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/quantizers/quantizer_eetq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update tests/quantization/eetq_integration/test_eetq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update tests/quantization/eetq_integration/test_eetq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * [FEAT]: EETQ quantizer support * [FEAT]: EETQ quantizer support * remove whitespaces * update quantization.md * style * Update docs/source/en/quantization.md Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * add copyright * Update quantization.md * Update docs/source/en/quantization.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/quantization.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Address the comments by amyeroberts * style --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Marc Sun <marc@huggingface.co> Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-04-22 20:38:58 +01:00
Kamil Akesbi	569743f510	Add sdpa and fa2 the Wav2vec2 family. (#30121 ) * add sdpa to wav2vec. Co-authored-by: kamilakesbi <kamil@huggingface.co> Co-authored-by: jp1924 <jp42maru@gmail.com> * add fa2 to wav2vec2 * add tests * fix attention_mask compatibility with fa2 * minor dtype fix * replace fa2 slow test * fix fa2 slow test * apply code review + add fa2 batch test * add sdpa and fa2 to hubert * sdpa and fa2 to data2vec_audio * sdpa and fa2 to Sew * sdpa to unispeech + unispeech sat * small fix * attention mask in tests Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * add_speedup_benchmark_to_doc --------- Co-authored-by: kamil@huggingface.co <kamil.akesbi@gmail.com> Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>	2024-04-22 18:30:38 +01:00
Pavel Iakubovskii	13b3b90ab1	Fix DETA save_pretrained (#30326 ) * Add class_embed to tied weights for DETA * Fix test_tied_weights_keys for DETA model * Replace error raise with assert statement	2024-04-22 17:11:13 +01:00
Joao Gante	6c7335e053	Jamba: fix left-padding test (#30389 ) fix test	2024-04-22 17:02:55 +01:00
Matt	0d84901cb7	Terminator strings for generate() (#28932 ) * stash commit (will discard all of this) * stash commit * First commit - needs a lot of testing! * Add a test * Fix imports and make the tests actually test something * Tests pass! * Rearrange test * Add comments (but it's still a bit confusing) * Stop storing the tokenizer * Comment fixup * Fix for input_ids with a single sequence * Update tests to test single sequences * make fixup * Fix incorrect use of isin() * Expand tests to catch more cases * Expand tests to catch more cases * make fixup * Fix length calculation and update tests * Handle Ġ as a space replacement too * Update src/transformers/generation/stopping_criteria.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Add optimizations from Joao's suggestion * Remove TODO * Update src/transformers/generation/stopping_criteria.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update tests/generation/test_stopping_criteria.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * make fixup * Rename some variables and remove some debugging clauses for clarity * Add tests for the sub-methods * Clarify one test slightly * Add stop_strings to GenerationConfig * generate() supports stop_string arg, asks for tokenizer if not provided * make fixup * Cleanup code and rename variables for clarity * Update tokenizer error * Update tokenizer passing, handle generation on GPU * Slightly more explanation cleanup * More comment cleanup * Factor out the token cleanup so it's more obvious what we're doing, and we can change it later * Careful with that cleanup! * Cleanup + optimizations to _get_matching_positions * More minor performance tweaks * Implement caching and eliminate some expensive ops (startup time: 200ms -> 9ms) * Remove the pin_memory call * Parallelize across all stop strings! * Quick fix for tensor devices * Update embeddings test for the new format * Fix test imports * Manual patching for BERT-like tokenizers * Return a bool vector instead of a single True/False * Better comment * Better comment * Add tests from @zucchini-nlp * Amy's list creation nit * tok_list -> token_list * Push a big expanded docstring (should we put it somewhere else?) * Expand docstrings * Docstring fixups * Rebase * make fixup * Make a properly general method for figuring out token strings * Fix naming throughout the functions * Move cache, refactor, fix tests * Add comment * Remove finished TODO * Remove finished TODO * make fixup * Update src/transformers/generation/stopping_criteria.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update and shorten docstring * Update tests to be shorter/clearer and test specific cases --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-04-22 14:13:04 +01:00
Howard Liberty	f16caf44bb	Add FSDP config for CPU RAM efficient loading through accelerate (#30002 ) * Add FSDP config for CPU RAM efficient loading * Style fix * Update src/transformers/training_args.py Co-authored-by: Zach Mueller <muellerzr@gmail.com> * Update src/transformers/training_args.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Add sync_module_states and cpu_ram_efficient_loading validation logic * Update src/transformers/training_args.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Style --------- Co-authored-by: Zach Mueller <muellerzr@gmail.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-04-22 13:15:28 +01:00
João David	d2cec09baa	Add TF swiftformer (#23342 ) * Duplicate swiftformer * Convert SwiftFormerPatchEmbedding * Convert SwiftFormerEmbeddings * Convert TFSwiftFormerMlp * Convert TFSwiftFormerConvEncoder * Convert TFSwiftFormerLocalRepresentation * convert TFSwiftFormerEncoderBlock * Convert SwiftFormerStage * Convert SwiftFormerEncoder * Add TFSWiftFormerPreTrainedModel * Convert SwiftFormerForImageClassification * Add kwargs and start drop path * Fix syntax * Change Model class name * Add TFSwiftFormer to __init__ * Duplicate test_modeling_swiftformer * First test conversions * Change require_torch to require_tf * Add exports to swiftformer __init__ * Add TFSwiftFormerModel wrapper * Fix __init__ and run black * Remove docstring from MainLayer, fix padding * Use keras.layers.Activation on keras.Sequential * Fix swiftformer exports * Fix activation layer from config * Remove post_inits * Use tf.keras.layers.ZeroPadding2D * Convert torch normalize * Change tf test input shape * Fix softmax and reduce_sum * Convert expand_dims and repeat * Add missing reshape and tranpose * Simplify TFSwiftFormerEncoderBlock.call * Fix mismatch in patch embeddings * Fix expected output shape to match channels last * Fix swiftformer typo * Disable test_onnx * Fix TFSwiftFormerForImageClassification call * Add unpack inputs * Convert flatten(2).mean(-1) * Change vision dummy inputs (to be reviewed) * Change test_forward_signature to use .call * Fix @unpack_inputs * Set return_tensors="tf" and rename class * Rename wrongly named patch_embeddings layer * Add serving_output and change dummy_input shape * Make dimensions BCHW and transpose inside embedding layer * Change SwiftFormerEncoderBlock * Fix ruff problems * Add image size to swiftformer config * Change tranpose to MainLayer and use -1 for reshape * Remove serving_outputs and dummy_inputs * Remove test_initialization test from tf model * Make Sequential component a separate layer * Fix layers' names * Tranpose encoder outputs * Fix tests and check if hidden states is not None * Fix TFSwiftFormerForImageClassification * Run make fixup * Run make fix-copies * Update modeling_tf_auto * Update docs * Fix modeling auto mapping * Update modelint_tf_swiftformer docs * Fill image_size doc and type * Add reduction=None to loss computation * Update docs * make style * Debug: Delete the tip to see if that changes anything * Re-add tip * Remove add_code_sample_docstrings * Remove unused import * Get the debug to actually tell us the problem it has with the docs * Try a substitution to match the PyTorch file? * Add swiftformer to ignore list * Add build() methods * Update copyright year Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Remove FIXME comment * Remove from_pt * Update copyright year Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Rename one-letter variables * Remove FIXMEs related to momentum * Remove old TODO comment * Remove outstanding FIXME comments * Get dropout rate from config * Add specific dropout config for MLP * Add convencoder dropout to config * Pass config to SwiftFormerDropPath layer * Fix drop_path variable name and add Adapted from comment * Run ruff * Removed copied from comment * Run fix copies * Change drop_path to identity to match pt * Cleanup build() methods and move to new keras imports * Update docs/source/en/model_doc/swiftformer.md Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> * Raise error if drop_path_rate > 0.0 * Apply suggestions from code review Replace (self.dim), with self.dim, Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> * Remove drop_path function * Add training to TFSwiftFormerEncoder * Set self.built = True last Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Should have been added to previous commit Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Change default_feature_extractor to default_image_processor Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Import Keras from modeling_tf_utils * Remove relative import * Run ruff --fix * Move import keras to tf_available * Add copied from comment to test_forward_signature * Reduce batch size and num_labels * Extract loss logic to hf_compute_loss * Run ruff format --------- Co-authored-by: Matt <rocketknight1@gmail.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>	2024-04-19 18:31:43 +01:00
hoshi-hiyouga	21c912e79c	Fix config + attn_implementation in AutoModelForCausalLM.from_pretrained (#30299 ) * Update modeling_utils.py * Update test_modeling_utils.py * Update test_modeling_utils.py * Update test_modeling_utils.py	2024-04-19 17:45:53 +01:00
Raushan Turganbay	b1cd48740e	Do not remove half seq length in generation tests (#30016 ) * remove seq length from generation tests * style and quality * [test_all] & PR suggestion Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update tests/generation/test_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * [test all] remove unused variables --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-04-19 17:32:52 +01:00
Marc Sun	b4fd49b6c5	Update unwrap from accelerate (#29933 ) * Use unwrap with the one in accelerate * oups * update unwrap * fix * wording * raise error instead * comment * doc * Update src/transformers/modeling_utils.py Co-authored-by: Zach Mueller <muellerzr@gmail.com> * style * put else --------- Co-authored-by: Zach Mueller <muellerzr@gmail.com>	2024-04-19 18:05:34 +02:00
Sanchit Gandhi	4ed0e51cc3	[Whisper] Fix slow tests (#30152 ) * fix tests * style * more fixes * move model to device * move logits to cpu * update expected values * use ungated dataset * fix * fix * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-04-19 13:21:46 +02:00
Sanchit Gandhi	cd09a8dfbc	[Feature Extractors] Fix kwargs to pre-trained (#30260 ) fixes	2024-04-19 11:16:08 +01:00
Jacky Lee	30b453206d	Enable multi-device for some models (#30207 ) * feat: multidevice for resnet * feat: yes! resnet * fix: compare all elements in tuple * feat: support for regnet * feat: support for convnextv2 * feat: support for bit * feat: support for cvt * feat: add support for focalnet * feat: support for yolos * feat: support for glpn * feat: support for imagegpt * feat: support for levit * feat: support for mgp_str * feat: support for mobilnet_v1 * feat: support for mobilnet_v2 * feat: support for mobilevit * feat: support for mobilevitv2 * feat: support for poolformer * fix: copies * fix: code quality check * update: upstream changes from main * fix: consistency check * feat: support for sam * feat: support for switchformer * feat: support for swin * feat: support for swinv2 * feat: support for timesformer * feat: suport for trocr * feat: support for upernet * fix: check copies * update: rerun CI * update: rerun again, maybe * update: one more rerun --------- Co-authored-by: Jacky Lee <jackylee328@gmail.com>	2024-04-19 09:24:44 +01:00
NielsRogge	ecfe9be705	[UDOP] Add special tokens to tokenizer (#29594 ) * Add special tokens * Add special tokens * Use fmt * Uncomment code * Add test * Remove scripts * Address comments * Improve tests * Address comment * Remove flag	2024-04-19 09:06:01 +02:00
Zach Mueller	60d5f8f9f0	🚨🚨🚨Deprecate `evaluation_strategy` to `eval_strategy`🚨🚨🚨 (#30190 ) * Alias * Note alias * Tests and src * Rest * Clean * Change typing? * Fix tests * Deprecation versions	2024-04-18 12:49:43 -04:00
Albert Villanova del Moral	c86d020ead	Fix test transposing image with EXIF Orientation tag (#30319 ) * Fix test with exif_transpose image * Replace datasets with PIL to load image in tests	2024-04-18 17:41:20 +01:00
Younes Belkada	5728b5ad00	FIX: Fixes unexpected behaviour for Llava / LLama & AWQ Fused modules + revert #30070 at the same time (#30317 ) * Update awq.py * style * revert felix PR * fix * add felix comments	2024-04-18 15:51:17 +02:00

1 2 3 4 5 ...

3756 Commits