transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-04 13:20:12 +06:00


tomeras91 3f20877da9 Add jamba (#29943)	2024-04-18 11:04:02 +02:00

* Add jamba arch
* apply "make fix-copies" changes
* fix link to model in JambaConfig docstring
* Add n_ctx in modeling file because repo-consistency wants that
* Add jamba to flash attention and sdpa documentation
* mamba dt_proj quant fix now works for LoRA as well
* override test_left_padding_compatibility and use a more permissive tolerance. left padding numerical difference are accentuated by mamba layers
* add jamba to tokenization auto
* fix comments of shape (PR #24 in the model page: https://huggingface.co/ai21labs/Jamba-v0.1/discussions/24)
* simple PR fixes
* remove unnecessary kwargs from JambaAttentionDecoderLayer and JambaMambaDecoderLayer
* remove the LoRA hack for the mamba dt_proj bias. It was solved in huggingface/peft#1530 (https://github.com/huggingface/peft/pull/1530)
* Add copied comment on JambaMLP (it's the same as MixtralMLP)
* remove padding_mask warnings. It's not supported anymore
* fix docstring. Float instead of int
* A few more minor PR fixes
* (1) lowercase names for mamba layernorms (2) remove _apply_inner_layernorms and do it directly in the forward pass
* Return None attention weights from mamba layers. Append to all attentions only if not None.
* remove some leftover jamba archive lists
* Better separation between expert vs non-expert layers. non-expert layers return None as router_logits, and it is not concatenated to all_router_logits returned from JambaModel
* no need to take router_logits at config.expert_layer_offset anymore. result.router_logits now holds results only for expert layers
* Add Jamba paper on READMEs
* (1) rename n_ctx -> max_position_embeddings (2) don't use it in the modeling file since it's not needed (set it as an exception to check_config_attributes)
* Add copied from comment
* remove the code path for apply_inner_layernorms=False. Jamba always has the inner mamba layernorms
* clearer docstring for _convert_to_standard_cache
* style fixes
* Change calc_logits_for_entire_prompt (bool) to num_logits_to_keep (int). Adapt assisted decoding code tp use it. Also small change in low memory beam search decoding path to support this new int value in model_inputs
* rename test so it still overrides what its meant to override
* draft
* oups
* nit
* remove more complexe logic
* fix names used in config
* fix fix fix
* style
* fix some more failing tests
* generate did not init the cache 🙃
* more small nits
* typo
* config.mamba_expand
* config.hidden_size for the intermediate size of the mamba shapes
* fix init of pkv with torch.tensor()
* empty tensor
* fix some init issues
* stupid changes required by generate because it does not even support it's own DynamicCache class
* more fixes
* fix general assisted gen cache_position bug
* tests passing
* Add offsets and periods as SPECIAL_CASES_TO_ALLOW in check_config_attributes.py
* fix reorder_cache to reorder mamba states and override some more functions in HybridMambaAttentionDynamicCache
* no need to override test_past_key_values_format() and _check_past_key_values_for_generate() in tests anymore
* fix docstrings and typehints for past_key_values
* style fixes
* fix docs
* change typehint due to copy from Mixtral
* forgot import
* import order
* Add configuration_jamba and modeling_jamba to not_doctested because the model is too big to download (in docstring of JambaForCausalLM.forward)
* Add integration test with tiny tandom Jamba model on hub
* fix flash attention cache shapes
* bring back forgotten hidden states
* rename HybridMambaAttentionDynamicCache.seqlen_offset to has_previous_state (and make bool) and bugfix - it should be set to True after a finished forward pass of the entire model
* align integration test after modeling fixes
* bugfix - mamba can use precomputed states only of forward pass is on a single token
* bugfix - mamba can use precomputed states only if they match the batch size
* typo
* remove making _prepare_4d_causal_attention_mask a leaf function
* stop using past_seq_len.get_seq_length(). Use cache positions instead. Adjust test (test_decoder_model_past_with_large_inputs) accordingly

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Joao Gante <joao@huggingface.co>

asr.md	Add new meta w2v2-conformer BERT-like model (#28165)	2024-01-18 13:37:34 +00:00
audio_classification.md	Add new meta w2v2-conformer BERT-like model (#28165)	2024-01-18 13:37:34 +00:00
document_question_answering.md	Migrate doc files to Markdown. (#24376)	2023-06-20 18:07:47 -04:00
idefics.md	[Docs] Fix broken links and syntax issues (#28918)	2024-02-08 14:13:35 -08:00
image_captioning.md	[Docs] Fix backticks in inline code and documentation links (#28875)	2024-02-06 11:15:44 -08:00
image_classification.md	[Trainer] Undo #29896 (#30129)	2024-04-09 12:55:42 +02:00
image_feature_extraction.md	Fix header in IFE task guide (#29859)	2024-03-26 12:32:37 +01:00
image_to_image.md	Image-to-Image Task Guide (#26595)	2023-10-16 15:12:03 +02:00
knowledge_distillation_for_image_classification.md	fixed typos (issue 27919) (#27920)	2023-12-11 18:44:23 -05:00
language_modeling.md	Add jamba (#29943)	2024-04-18 11:04:02 +02:00
mask_generation.md	Mask Generation Task Guide (#28897)	2024-02-14 18:29:49 +00:00
masked_language_modeling.md	Update all references to canonical models (#29001)	2024-02-16 08:16:58 +01:00
monocular_depth_estimation.md	Add Depth Anything (#28654)	2024-01-25 09:34:50 +01:00
multiple_choice.md	Update all references to canonical models (#29001)	2024-02-16 08:16:58 +01:00
object_detection.md	[Trainer] Undo #29896 (#30129)	2024-04-09 12:55:42 +02:00
prompting.md	Fix doctest more (for `docs/source/en`) (#30247)	2024-04-15 14:10:59 +02:00
question_answering.md	fix the post-processing link (#29091)	2024-02-19 10:15:58 +00:00
semantic_segmentation.md	[docs] Fix image segmentation guide (#30132)	2024-04-09 09:08:37 -07:00
sequence_classification.md	Add jamba (#29943)	2024-04-18 11:04:02 +02:00
summarization.md	Update all references to canonical models (#29001)	2024-02-16 08:16:58 +01:00
text-to-speech.md	Add FastSpeech2Conformer (#23439)	2024-01-03 18:01:06 +00:00
token_classification.md	Update all references to canonical models (#29001)	2024-02-16 08:16:58 +01:00
translation.md	Configuring Translation Pipelines documents update #27753 (#29986)	2024-04-17 11:27:49 +02:00
video_classification.md	[Trainer] Undo #29896 (#30129)	2024-04-09 12:55:42 +02:00
visual_question_answering.md	VQA task guide (#25244)	2023-08-09 08:29:06 -04:00
zero_shot_image_classification.md	[docs] Fix model reference in zero shot image classification example (#26206)	2023-09-19 00:45:12 +02:00
zero_shot_object_detection.md	[Docs] Update README and default pipelines (#28864)	2024-02-12 10:21:36 +01:00