transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-04 21:30:07 +06:00

Author	SHA1	Message	Date
Juri Ganitkevitch	ec29d25d9f	Add missing None check for hf_quantizer (#28804 ) * Add missing None check for hf_quantizer * Add test, fix logic. * make style * Switch test model to Mistral * Comment * Update tests/test_modeling_utils.py --------- Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>	2024-02-02 09:34:12 +01:00
Yih-Dar	0754217c82	Use `LoggingLevel` context manager in 3 tests (#28575 ) * inside with LoggingLevel * remove is_flaky --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-01-18 13:41:25 +00:00
Joao Gante	f4f57f9dfa	Config: warning when saving generation kwargs in the model config (#28514 )	2024-01-16 18:31:01 +00:00
fxmarty	66db33ddc8	Fix mismatching loading in from_pretrained with/without accelerate (#28414 ) * fix mismatching behavior in from_pretrained with/without accelerate * meaningful refactor * remove added space * add test * fix model on the hub * comment * use tiny model * style	2024-01-16 14:29:51 +01:00
Younes Belkada	1b9a2e4c80	[`core`/ FEAT] Add the possibility to push custom tags using `PreTrainedModel` itself (#28405 ) * v1 tags * remove unneeded conversion * v2 * rm unneeded warning * add more utility methods * Update src/transformers/utils/hub.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/utils/hub.py Co-authored-by: Lucain <lucainp@gmail.com> * Update src/transformers/utils/hub.py Co-authored-by: Lucain <lucainp@gmail.com> * more enhancements * oops * merge tags * clean up * revert unneeded change * add extensive docs * more docs * more kwargs * add test * oops * fix test * Update src/transformers/modeling_utils.py Co-authored-by: Omar Sanseviero <osanseviero@gmail.com> * Update src/transformers/utils/hub.py Co-authored-by: Lucain <lucainp@gmail.com> * Update src/transformers/modeling_utils.py * Update src/transformers/trainer.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/modeling_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * add more conditions * more logic --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Lucain <lucainp@gmail.com> Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>	2024-01-15 14:48:07 +01:00
amyeroberts	4e36a6cd00	Mark two logger tests as flaky (#28458 ) * Mark two logger tests as flaky * Add description to is_flaky	2024-01-12 11:58:59 +00:00
Poedator	f85a1e82c1	4D `attention_mask` support (#27539 ) * edits to _prepare_4d_causal_attention_mask() * initial tests for 4d mask * attention_mask_for_sdpa support * added test for inner model hidden * added autotest decorators * test mask dtype to torch.int64 * torch.testing.assert_close Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * torch_device and @torch_gpu in tests * upd tests * +torch decorators * torch decorators fixed * more decorators! * even more decorators * fewer decorators --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2023-12-17 11:08:04 +01:00
Younes Belkada	1e20931765	[`FA-2`] Fix fa-2 issue when passing `config` to `from_pretrained` (#28043 ) * fix fa-2 issue * fix test * Update src/transformers/modeling_utils.py Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com> * clenaer fix * up * add more robust tests * Update src/transformers/modeling_utils.py Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com> * fixup * Update src/transformers/modeling_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * pop * add test --------- Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2023-12-15 11:08:27 +01:00
fxmarty	80377eb018	F.scaled_dot_product_attention support (#26572 ) * add sdpa * wip * cleaning * add ref * yet more cleaning * and more :) * wip llama * working llama * add output_attentions=True support * bigcode sdpa support * fixes * gpt-bigcode support, require torch>=2.1.1 * add falcon support * fix conflicts falcon * style * fix attention_mask definition * remove output_attentions from attnmaskconverter * support whisper without removing any Copied from statement * fix mbart default to eager renaming * fix typo in falcon * fix is_causal in SDPA * check is_flash_attn_2_available in the models init as well in case the model is not initialized through from_pretrained * add warnings when falling back on the manual implementation * precise doc * wip replace _flash_attn_enabled by config.attn_implementation * fix typo * add tests * style * add a copy.deepcopy on the config in from_pretrained, as we do not want to modify it inplace * obey to config.attn_implementation if a config is passed in from_pretrained * fix is_torch_sdpa_available when torch is not installed * remove dead code * Update src/transformers/modeling_attn_mask_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/modeling_attn_mask_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/modeling_attn_mask_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/modeling_attn_mask_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/modeling_attn_mask_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/bart/modeling_bart.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * remove duplicate pretraining_tp code * add dropout in llama * precise comment on attn_mask * add fmt: off for _unmask_unattended docstring * precise num_masks comment * nuke pretraining_tp in LlamaSDPAAttention following Arthur's suggestion * cleanup modeling_utils * backward compatibility * fix style as requested * style * improve documentation * test pass * style * add _unmask_unattended tests * skip meaningless tests for idefics * hard_check SDPA requirements when specifically requested * standardize the use if XXX_ATTENTION_CLASSES * fix SDPA bug with mem-efficient backend on CUDA when using fp32 * fix test * rely on SDPA is_causal parameter to handle the causal mask in some cases * fix FALCON_ATTENTION_CLASSES * remove _flash_attn_2_enabled occurences * fix test * add OPT to the list of supported flash models * improve test * properly test on different SDPA backends, on different dtypes & properly handle separately the pad tokens in the test * remove remaining _flash_attn_2_enabled occurence * Update src/transformers/modeling_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/modeling_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/modeling_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/modeling_attn_mask_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update docs/source/en/perf_infer_gpu_one.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * remove use_attn_implementation * fix docstring & slight bug * make attn_implementation internal (_attn_implementation) * typos * fix tests * deprecate use_flash_attention_2=True * fix test * add back llama that was removed by mistake * fix tests * remove _flash_attn_2_enabled occurences bis * add check & test that passed attn_implementation is valid * fix falcon torchscript export * fix device of mask in tests * add tip about torch.jit.trace and move bt doc below sdpa * fix parameterized.expand order * move tests from test_modeling_attn_mask_utils to test_modeling_utils as a relevant test class is already there * update sdpaattention class with the new cache * Update src/transformers/configuration_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/bark/modeling_bark.py * address review comments * WIP torch.jit.trace fix. left: test both eager & sdpa * add test for torch.jit.trace for both eager/sdpa * fix falcon with torch==2.0 that needs to use sdpa * fix doc * hopefully last fix * fix key_value_length that has no default now in mask converter * is it flacky? * fix speculative decoding bug * tests do pass * fix following #27907 --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2023-12-09 05:38:14 +09:00
fxmarty	307a7d0be8	[⚠️ removed a default argument] Make `AttentionMaskConverter` compatible with `torch.compile(..., fullgraph=True)` (#27868 ) * remove bugged torch.float32 default * add test * fix tests * fix test * fix doc	2023-12-08 18:44:47 +09:00
Arthur	c0b9db0914	[`ModelOnTheFlyConversionTester`] Mark as slow for now (#27823 ) * mark test as slow for now * style	2023-12-04 08:33:15 +01:00
Nicolas Patry	7b6324e18e	Make using safetensors files automated. (#27571 ) * [WIP] Make using safetensors files automated. If `use_safetensors=True` is used, and it doesn't exist: - Don't crash just yet - Lookup for an open PR containing it. - If yes, use that instead - If not, touch the space to convert, wait for conversion to be finished and the PR to be opened - Use that new PR - Profit. * Remove the token. * [Auto Safetensors] Websocket -> SSE (#27656) * Websocket -> SSE * Support sharded + tests +cleanup a * env var * Apply suggestions from code review * Thanks Simon * Thanks Wauplin Co-authored-by: Wauplin <lucainp@gmail.com> * Cleanup * Update tests * Tests should pass * Apply to other tests * Extend extension * relax requirement on latest hfh * Revert * Correct private handling & debug statements * Skip gated repos as of now * Address review comments Co-authored-by: ArthurZucker <arthur.zucker@gmail.com> --------- Co-authored-by: Lysandre Debut <hi@lysand.re> Co-authored-by: Lysandre <lysandre@huggingface.co> Co-authored-by: Wauplin <lucainp@gmail.com> Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr> Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>	2023-12-01 15:51:10 +01:00
Marc Sun	1ac599d90f	Fix offload disk for loading derivated model checkpoint into base model (#27253 ) * fix * style * add test	2023-11-15 14:58:08 -05:00
Arthur	b97cab7e6d	Remove-auth-token (#27060 ) * don't use `use_auth_token`internally * let's use token everywhere * fixup	2023-11-13 14:20:54 +01:00
Arthur	68afca3e69	[`AttentionMaskConverter`] ]Fix-mask-inf (#27114 ) * fix? * actual fix * fixups * add dataclass to the attention mask converter * refine testing suite * make sure there are no overflows * update the test	2023-11-10 15:22:43 +01:00
Lysandre Debut	113ebf80ac	Safetensors serialization by default (#27064 ) * Safetensors serialization by default * First pass on the tests * Second pass on the tests * Third pass on the tests * Fix TF weight loading from TF-format safetensors * Specific encoder-decoder fixes for weight crossloading * Add VisionEncoderDecoder fixes for TF too * Change filename test for pt-to-tf * One missing fix for TFVisionEncoderDecoder * Fix the other crossload test * Support for flax + updated tests * Apply suggestions from code review Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * Sanchit's comments * Sanchit's comments 2 * Nico's comments * Fix tests * cleanup * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: Matt <rocketknight1@gmail.com> Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2023-10-31 19:16:49 +01:00
Hz, Ji	50378cbf6c	device agnostic models testing (#27146 ) * device agnostic models testing * add decorator `require_torch_fp16` * make style * apply review suggestion * Oops, the fp16 decorator was misused	2023-10-31 18:12:14 +01:00
Patrick von Platen	ac5893756b	[Attention Mask] Refactor all encoder-decoder attention mask (#27086 ) * [FA2 Bart] Add FA2 to all Bart-like * better * Refactor attention mask * remove all customized atteniton logic * format * mass rename * replace _expand_mask * replace _expand_mask * mass rename * add pt files * mass replace & rename * mass replace & rename * mass replace & rename * mass replace & rename * Update src/transformers/models/idefics/modeling_idefics.py * fix more * clean more * fix more * make style * fix again * finish * finish * finish * finish * finish * finish * finish * finish * finish * finish * Apply suggestions from code review * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * small fix mistral * finish * finish * finish * finish --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2023-10-27 16:42:01 +02:00
Arthur	4864d08d3e	Add-support for commit description (#26704 ) * fix * update * revert * add dosctring * good to go * update * add a test	2023-10-26 12:37:09 +02:00
Arthur	c037b2e340	skip flaky hub tests (#26594 ) skip flaky	2023-10-04 17:47:55 +02:00
Angela Yi	6c26faa159	Skip warning if tracing with dynamo (#25581 ) * Ignore warning if tracing with dynamo * fix import error * separate to function * add test	2023-09-08 21:13:33 +02:00
Joao Gante	3e41cf13fc	Generate: Load generation config when `device_map` is passed (#25413 )	2023-08-10 10:54:26 +01:00
JB (Don)	78a2b19fc8	Show a warning for missing attention masks when pad_token_id is not None (#24510 ) * Adding warning messages to BERT for missing attention masks These warning messages when there are pad tokens within the input ids and no attention masks are given. The warning message should only show up once. * Adding warning messages to BERT for missing attention masks These warning messages are shown when the pad_token_id is not None and no attention masks are given. The warning message should only show up once. * Ran fix copies to copy over the changes to some of the other models * Add logger.warning_once.cache_clear() to the test * Shows warning when there are no attention masks and input_ids start/end with pad tokens * Using warning_once() instead and fix indexing in input_ids check --------- Co-authored-by: JB Lau <hckyn@voyager2.local>	2023-06-30 08:19:39 -04:00
Sylvain Gugger	8e5d1619b3	Clean load keys (#24505 ) * Preliminary work on some models * Fix test load missing and make sure nonpersistent buffers are tested * Always ignore nonpersistent buffers if in state_dict * Treat models * More models * Treat remaining models * Fix quality * Fix tests * Remove draft * This test is not needed anymore * Fix copies * Fix last test * Newly added models * Fix last tests * Address review comments	2023-06-27 14:45:40 -04:00
Sylvain Gugger	096f2cf126	Tied weights load (#24310 ) * Use tied weight keys * More * Fix tied weight missing warning * Only give info on unexpected keys with different classes * Deal with empty archs * Fix tests * Refine test	2023-06-16 10:55:42 -04:00
Sylvain Gugger	372f50030b	Split common test from core tests (#24284 )	2023-06-15 07:30:24 -04:00

26 Commits