Mirror of https://github.com/huggingface/transformers.git (synced 2025-08-02 03:01:07 +06:00)
* Add jamba arch
* apply "make fix-copies" changes
* fix link to model in JambaConfig docstring
* Add n_ctx in modeling file because repo-consistency wants that
* Add jamba to flash attention and sdpa documentation
* mamba dt_proj quant fix now works for LoRA as well
* override test_left_padding_compatibility and use a more permissive tolerance. Left padding numerical differences are accentuated by mamba layers
* add jamba to tokenization auto
* fix shape comments (PR #24 on the model page: https://huggingface.co/ai21labs/Jamba-v0.1/discussions/24)
* simple PR fixes
* remove unnecessary kwargs from JambaAttentionDecoderLayer and JambaMambaDecoderLayer
* remove the LoRA hack for the mamba dt_proj bias. It was solved in huggingface/peft#1530 (https://github.com/huggingface/peft/pull/1530)
* Add copied comment on JambaMLP (it's the same as MixtralMLP)
* remove padding_mask warnings. It's not supported anymore
* fix docstring. Float instead of int
* A few more minor PR fixes
* (1) lowercase names for mamba layernorms (2) remove _apply_inner_layernorms and do it directly in the forward pass
* Return None attention weights from mamba layers. Append to all attentions only if not None.
* remove some leftover jamba archive lists
* Better separation between expert vs non-expert layers. Non-expert layers return None as router_logits, and it is not concatenated to all_router_logits returned from JambaModel
* no need to take router_logits at config.expert_layer_offset anymore. result.router_logits now holds results only for expert layers
* Add the Jamba paper to the READMEs
* (1) rename n_ctx -> max_position_embeddings (2) don't use it in the modeling file since it's not needed (set it as an exception to check_config_attributes)
* Add copied-from comment
* remove the code path for apply_inner_layernorms=False. Jamba always has the inner mamba layernorms
* clearer docstring for _convert_to_standard_cache
* style fixes
* Change calc_logits_for_entire_prompt (bool) to num_logits_to_keep (int) and adapt the assisted decoding code to use it. Also a small change in the low-memory beam search decoding path to support this new int value in model_inputs (see the sketch below)
* rename test so it still overrides what it's meant to override
* draft
* oops
* nit
* remove more complex logic
* fix names used in config
* fix fix fix
* style
* fix some more failing tests
* generate did not init the cache 🙃
* more small nits
* typo
* config.mamba_expand * config.hidden_size for the intermediate size of the mamba shapes
* fix init of pkv with torch.tensor()
* empty tensor
* fix some init issues
* stupid changes required by generate because it does not even support its own DynamicCache class
* more fixes
* fix general assisted gen cache_position bug
* tests passing
* Add offsets and periods as SPECIAL_CASES_TO_ALLOW in check_config_attributes.py
* fix reorder_cache to reorder mamba states and override some more functions in HybridMambaAttentionDynamicCache
* no need to override test_past_key_values_format() and _check_past_key_values_for_generate() in tests anymore
* fix docstrings and typehints for past_key_values
* style fixes
* fix docs
* change typehint due to copy from Mixtral
* forgot import
* import order
* Add configuration_jamba and modeling_jamba to not_doctested because the model is too big to download (in docstring of JambaForCausalLM.forward)
* Add integration test with tiny random Jamba model on the hub
* fix flash attention cache shapes
* bring back forgotten hidden states
* rename HybridMambaAttentionDynamicCache.seqlen_offset to has_previous_state (and make it a bool), and bugfix - it should be set to True after a finished forward pass of the entire model
* align integration test after modeling fixes
* bugfix - mamba can use precomputed states only if the forward pass is on a single token
* bugfix - mamba can use precomputed states only if they match the batch size
* typo
* remove making _prepare_4d_causal_attention_mask a leaf function
* stop using past_seq_len.get_seq_length(). Use cache positions instead. Adjust test (test_decoder_model_past_with_large_inputs) accordingly

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
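The switch from `calc_logits_for_entire_prompt` (bool) to `num_logits_to_keep` (int) is easiest to see in code. Below is a minimal, hypothetical sketch, not the actual `JambaForCausalLM` implementation (the `TinyLMHead` module and its exact slicing are illustrative assumptions): computing logits only for the last `num_logits_to_keep` positions keeps memory bounded on long prompts, while an integer, unlike the old boolean, still lets assisted decoding request logits for the last few candidate tokens.

```python
# Hedged sketch of the num_logits_to_keep idea; names and defaults are
# assumptions for illustration, not the real modeling_jamba code.
import torch
import torch.nn as nn


class TinyLMHead(nn.Module):
    def __init__(self, hidden_size: int = 16, vocab_size: int = 32):
        super().__init__()
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)

    def forward(self, hidden_states: torch.Tensor, num_logits_to_keep: int = 0) -> torch.Tensor:
        # num_logits_to_keep == 0: compute logits for every position
        # (roughly the old calc_logits_for_entire_prompt=True behaviour).
        # num_logits_to_keep == k: only the last k positions, which saves
        # memory on long prompts while still covering the k candidate
        # tokens that assisted decoding needs to verify.
        if num_logits_to_keep > 0:
            hidden_states = hidden_states[:, -num_logits_to_keep:, :]
        return self.lm_head(hidden_states)


if __name__ == "__main__":
    head = TinyLMHead()
    hidden = torch.randn(2, 10, 16)                      # (batch, seq_len, hidden_size)
    print(head(hidden).shape)                            # torch.Size([2, 10, 32]) - full sequence
    print(head(hidden, num_logits_to_keep=3).shape)      # torch.Size([2, 3, 32]) - last 3 tokens
```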
test_module
tf_ops
add_pipeline_model_mapping_to_test.py
check_build.py
check_config_attributes.py
check_config_docstrings.py
check_copies.py
check_doc_toc.py
check_docstrings.py
check_doctest_list.py
check_dummies.py
check_inits.py
check_model_tester.py
check_repo.py
check_self_hosted_runner.py
check_support_list.py
check_table.py
check_task_guides.py
check_tf_ops.py
create_dummy_models.py
custom_init_isort.py
download_glue_data.py
extract_warnings.py
get_ci_error_statistics.py
get_github_job_time.py
get_modified_files.py
get_previous_daily_ci.py
get_test_info.py
important_models.txt
not_doctested.txt
notification_service_doc_tests.py
notification_service_quantization.py
notification_service.py
past_ci_versions.py
print_env.py
release.py
slow_documentation_tests.txt
sort_auto_mappings.py
split_doctest_jobs.py
split_model_tests.py
tests_fetcher.py
update_metadata.py
update_tiny_models.py