* Enhance documentation to explain chat-based few-shot prompting
Updates the documentation on few-shot prompting to illustrate how to structure examples using the chat-based format for instruction-tuned models.
* Update docs/source/en/tasks/prompting.md
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Update docs/source/en/tasks/prompting.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tasks/prompting.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tasks/prompting.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tasks/prompting.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* fix typos
---------
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* tokenize inputs directly in apply_chat_template
* refactor processing
* revert changes processing llava
* Update docs
* fix issue with str being iterable
* add test chat text only
* change function name
* VDR task guide
* Add to toctree
* Update docs/source/en/tasks/visual_document_retrieval.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tasks/visual_document_retrieval.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tasks/visual_document_retrieval.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tasks/visual_document_retrieval.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tasks/visual_document_retrieval.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tasks/visual_document_retrieval.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tasks/visual_document_retrieval.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tasks/visual_document_retrieval.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tasks/visual_document_retrieval.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tasks/visual_document_retrieval.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Add implementation for DataCollatorForMultipleChoice based on docs.
* Add DataCollatorForMultipleChoice to import structure.
* Remove custom DataCollatorForMultipleChoice implementations from example scripts.
* Remove custom implementations of DataCollatorForMultipleChoice from docs in English, Spanish, Japanese and Korean.
* Refactor torch version of DataCollatorForMultipleChoice to be more easily understandable.
* Apply suggested changes and run make fixup.
* fix copies, style and fixup
* add missing documentation
* nits
* fix docstring
* style
* nits
* isort
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
* Fixed typo in multi gpu docs and OLMoE version
* Fixed typos in docs for agents, agents advanced, knowledge distillation, and image feature extraction
* Fixed incorrect usage of model.image_guided_detection in zero shot object detection docs
* Added image-text-to-text pipeline to task guide
* Update docs/source/en/tasks/image_text_to_text.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tasks/image_text_to_text.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tasks/image_text_to_text.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tasks/image_text_to_text.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Merge codeblocks
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* add XPU path
* use accelerate API
* Update docs/source/en/tasks/semantic_segmentation.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* update more places with accelerate API
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* add colorize_depth and matplotlib availability check
* add post_process_depth_estimation for zoedepth + tests
* add post_process_depth_estimation for DPT + tests
* add post_process_depth_estimation in DepthEstimationPipeline & special case for zoedepth
* run `make fixup`
* fix import related error on tests
* fix more import related errors on test
* forgot some `torch` calls in declerations
* remove `torch` call in zoedepth tests that caused error
* updated docs for depth estimation
* small fix for `colorize` input/output types
* remove `colorize_depth`, fix various names, remove matplotlib dependency
* fix formatting
* run fixup
* different images for test
* update examples in `forward` functions
* fixed broken links
* fix output types for docs
* possible format fix inside `<Tip>`
* Readability related updates
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Readability related update
* cleanup after merge
* refactor `post_process_depth_estimation` to return dict; simplify ZoeDepth's `post_process_depth_estimation`
* rewrite dict merging to support python 3.8
---------
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Add Auto model for image-text-to-text
* Remove donut from processing auto, add chameleon ti image text to text models
* add qwen2_vl and llava_onevision
* add pixtral to auto model for image-text-to-text
* add mllama and idefics3
* remove models in IGNORE_NON_AUTO_CONFIGURED
* add AutoModelForImageTextToText to tests and doc
* Trainer - deprecate tokenizer for processing_class
* Extend chage across Seq2Seq trainer and docs
* Add tests
* Update to FutureWarning and add deprecation version
* add xpu note
* add one more case
* add more
* Update docs/source/en/run_scripts.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Fixed typo: insted to instead
* Fixed typo: relase to release
* Fixed typo: nighlty to nightly
* Fixed typos: versatible, benchamarks, becnhmark to versatile, benchmark, benchmarks
* Fixed typo in comment: quantizd to quantized
* Fixed typo: architecutre to architecture
* Fixed typo: contibution to contribution
* Fixed typo: Presequities to Prerequisites
* Fixed typo: faste to faster
* Fixed typo: extendeding to extending
* Fixed typo: segmetantion_maps to segmentation_maps
* Fixed typo: Alternativelly to Alternatively
* Fixed incorrectly defined variable: output to output_disabled
* Fixed typo in library name: tranformers.onnx to transformers.onnx
* Fixed missing import: import tensorflow as tf
* Fixed incorrectly defined variable: token_tensor to tokens_tensor
* Fixed missing import: import torch
* Fixed incorrectly defined variable and typo: uromaize to uromanize
* Fixed incorrectly defined variable and typo: uromaize to uromanize
* Fixed typo in function args: numpy.ndarry to numpy.ndarray
* Fixed Inconsistent Library Name: Torchscript to TorchScript
* Fixed Inconsistent Class Name: OneformerProcessor to OneFormerProcessor
* Fixed Inconsistent Class Named Typo: TFLNetForMultipleChoice to TFXLNetForMultipleChoice
* Fixed Inconsistent Library Name Typo: Pytorch to PyTorch
* Fixed Inconsistent Function Name Typo: captureWarning to captureWarnings
* Fixed Inconsistent Library Name Typo: Pytorch to PyTorch
* Fixed Inconsistent Class Name Typo: TrainingArgument to TrainingArguments
* Fixed Inconsistent Model Name Typo: Swin2R to Swin2SR
* Fixed Inconsistent Model Name Typo: EART to BERT
* Fixed Inconsistent Library Name Typo: TensorFLow to TensorFlow
* Fixed Broken Link for Speech Emotion Classification with Wav2Vec2
* Fixed minor missing word Typo
* Fixed minor missing word Typo
* Fixed minor missing word Typo
* Fixed minor missing word Typo
* Fixed minor missing word Typo
* Fixed minor missing word Typo
* Fixed minor missing word Typo
* Fixed minor missing word Typo
* Fixed Punctuation: Two commas
* Fixed Punctuation: No Space between XLM-R and is
* Fixed Punctuation: No Space between [~accelerate.Accelerator.backward] and method
* Added backticks to display model.fit() in codeblock
* Added backticks to display openai-community/gpt2 in codeblock
* Fixed Minor Typo: will to with
* Fixed Minor Typo: is to are
* Fixed Minor Typo: in to on
* Fixed Minor Typo: inhibits to exhibits
* Fixed Minor Typo: they need to it needs
* Fixed Minor Typo: cast the load the checkpoints To load the checkpoints
* Fixed Inconsistent Class Name Typo: TFCamembertForCasualLM to TFCamembertForCausalLM
* Fixed typo in attribute name: outputs.last_hidden_states to outputs.last_hidden_state
* Added missing verbosity level: fatal
* Fixed Minor Typo: take To takes
* Fixed Minor Typo: heuristic To heuristics
* Fixed Minor Typo: setting To settings
* Fixed Minor Typo: Content To Contents
* Fixed Minor Typo: millions To million
* Fixed Minor Typo: difference To differences
* Fixed Minor Typo: while extract To which extracts
* Fixed Minor Typo: Hereby To Here
* Fixed Minor Typo: addition To additional
* Fixed Minor Typo: supports To supported
* Fixed Minor Typo: so that benchmark results TO as a consequence, benchmark
* Fixed Minor Typo: a To an
* Fixed Minor Typo: a To an
* Fixed Minor Typo: Chain-of-though To Chain-of-thought
* wip
* fix __init__.py
* add docs
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* address comments 1
* work on make fixup
* pass configs down
* add sdpa attention
* remove DbrxBlock
* add to configuration_auto
* docstring now passes formatting test
* fix style
* update READMEs
* add dbrx to modeling_auto
* make fix-copies generated this
* add DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP
* config docstring passes formatting test
* rename moe_loss_weight to router_aux_loss_coef
* add to flash-attn documentation
* fix model-path in tests
* Explicitly make `"suli"` the default `ffn_act_fn`
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* default to using router_aux_loss_coef over ffn_config[moe_loss_weight]
* fix _flash_attn_uses_top_left_mask and is_causal
* fix tests path
* don't use token type IDs
* follow Llama and remove token_type_ids from test
* init ConfigTester differently so tests pass
* remove multiple choice test
* remove question + answer test
* remove sequence classification test
* remove token classification test
* copy Llama tests and remove token_type_ids from test inputs
* do not test pruning or headmasking; style code
* add _tied_weights_keys parameter to pass test
* add type hints
* fix type check
* update config tester
* remove masked_lm test
* remove encoder tests
* initialize DbrxModelTester with correct params
* style
* torch_dtype does not rely on torch
* run make fixup, fix-copies
* use https://huggingface.co/v2ray/dbrx-base-fixed/blob/main/modeling_dbrx.py
* add copyright info
* fix imports and DbrxRotaryEmbedding
* update DbrxModel docstring
* use copies
* change model path in docstring
* use config in DbrxFFN
* fix flashattention2, sdpaattention
* input config to DbrXAttention, DbrxNormAttentionNorm
* more fixes
* fix
* fix again!
* add informative comment
* fix ruff?
* remove print statement + style
* change doc-test
* fix doc-test
* fix docstring
* delete commented out text
* make defaults match dbrx-instruct
* replace `router_aux_loss_coef` with `moe_loss_weight`
* is_decoder=True
* remove is_decoder from configtester
* implement sdpa properly
* make is_decoder pass tests
* start on the GenerationTesterMixin tests
* add dbrx to sdpa documentation
* skip weight typing test
* style
* initialize smaller model
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Add DBRX to toctree
* skip test_new_cache_format
* make config defaults smaller again
* add pad_token_id
* remove pad_token_id from config
* Remove all references to DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP
* Update src/transformers/models/dbrx/__init__.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/dbrx/modeling_dbrx.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/model_doc/dbrx.md
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Update src/transformers/models/dbrx/configuration_dbrx.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/model_doc/dbrx.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix typo
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* update docs, fix configuration_auto.py
* address pr comments
* remove is_decoder flag
* slice
* fix requires grad
* remove grad
* disconnect differently
* remove grad
* enable grads
* patch
* detach expert
* nissan al ghaib
* Update modeling_dbrx.py
* Update src/transformers/models/dbrx/modeling_dbrx.py
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* replace "Gemma" with "Dbrx"
* remove # type: ignore
* don't hardcode vocab_size
* remove ToDo
* Re-add removed idefics2 line
* Update test to use tiny-random!
* Remove TODO
* Remove one more case of loading the entire dbrx-instruct in the tests
* Update src/transformers/models/dbrx/modeling_dbrx.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* address some comments
* small model
* add dbrx to tokenization_auto
* More docstrings with add_start_docstrings
* Dbrx for now
* add PipelineTesterMixin
* Update src/transformers/models/dbrx/configuration_dbrx.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* remove flash-attn2 import error
* fix docstring
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add useage example
* put on one line
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix ffn_act_fn
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* change "dbrx" to "DBRX" for display purposes.
* fix __init__.py?
* fix __init__.py
* fix README
* return the aux_loss
* remove extra spaces
* fix configuration_auto.py
* fix format in tokenization_auto
* remove new line
* add more useage examples
---------
Co-authored-by: Abhi Venigalla <abhi.venigalla@databricks.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Eitan Turok <eitan.turok@databricks.com>
Co-authored-by: Eitan Turok <150733043+eitanturok@users.noreply.github.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: Eitan Turok <eitanturok@gmail.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Add jamba arch
* apply "make fix-copies" changes
* fix link to model in JambaConfig docstring
* Add n_ctx in modeling file because repo-consistency wants that
* Add jamba to flash attention and sdpa documentation
* mamba dt_proj quant fix now works for LoRA as well
* override test_left_padding_compatibility and use a more permissive tolerance. left padding numerical difference are accentuated by mamba layers
* add jamba to tokenization auto
* fix comments of shape (PR #24 in the model page: https://huggingface.co/ai21labs/Jamba-v0.1/discussions/24)
* simple PR fixes
* remove unnecessary kwargs from JambaAttentionDecoderLayer and JambaMambaDecoderLayer
* remove the LoRA hack for the mamba dt_proj bias. It was solved in huggingface/peft#1530 (https://github.com/huggingface/peft/pull/1530)
* Add copied comment on JambaMLP (it's the same as MixtralMLP)
* remove padding_mask warnings. It's not supported anymore
* fix docstring. Float instead of int
* A few more minor PR fixes
* (1) lowercase names for mamba layernorms (2) remove _apply_inner_layernorms and do it directly in the forward pass
* Return None attention weights from mamba layers. Append to all attentions only if not None.
* remove some leftover jamba archive lists
* Better separation between expert vs non-expert layers. non-expert layers return None as router_logits, and it is not concatenated to all_router_logits returned from JambaModel
* no need to take router_logits at config.expert_layer_offset anymore. result.router_logits now holds results only for expert layers
* Add Jamba paper on READMEs
* (1) rename n_ctx -> max_position_embeddings (2) don't use it in the modeling file since it's not needed (set it as an exception to check_config_attributes)
* Add copied from comment
* remove the code path for apply_inner_layernorms=False. Jamba always has the inner mamba layernorms
* clearer docstring for _convert_to_standard_cache
* style fixes
* Change calc_logits_for_entire_prompt (bool) to num_logits_to_keep (int). Adapt assisted decoding code tp use it. Also small change in low memory beam search decoding path to support this new int value in model_inputs
* rename test so it still overrides what its meant to override
* draft
* oups
* nit
* remove more complexe logic
* fix names used in config
* fix fix fix
* style
* fix some more failing tests
* generate did not init the cache 🙃
* more small nits
* typo
* config.mamba_expand * config.hidden_size for the intermediate size of the mamba shapes
* fix init of pkv with torch.tensor()
* empty tensor
* fix some init issues
* stupid changes required by generate because it does not even support it's own DynamicCache class
* more fixes
* fix general assisted gen cache_position bug
* tests passing
* Add offsets and periods as SPECIAL_CASES_TO_ALLOW in check_config_attributes.py
* fix reorder_cache to reorder mamba states and override some more functions in HybridMambaAttentionDynamicCache
* no need to override test_past_key_values_format() and _check_past_key_values_for_generate() in tests anymore
* fix docstrings and typehints for past_key_values
* style fixes
* fix docs
* change typehint due to copy from Mixtral
* forgot import
* import order
* Add configuration_jamba and modeling_jamba to not_doctested because the model is too big to download (in docstring of JambaForCausalLM.forward)
* Add integration test with tiny tandom Jamba model on hub
* fix flash attention cache shapes
* bring back forgotten hidden states
* rename HybridMambaAttentionDynamicCache.seqlen_offset to has_previous_state (and make bool) and bugfix - it should be set to True after a finished forward pass of the entire model
* align integration test after modeling fixes
* bugfix - mamba can use precomputed states only of forward pass is on a single token
* bugfix - mamba can use precomputed states only if they match the batch size
* typo
* remove making _prepare_4d_causal_attention_mask a leaf function
* stop using past_seq_len.get_seq_length(). Use cache positions instead. Adjust test (test_decoder_model_past_with_large_inputs) accordingly
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Joao Gante <joao@huggingface.co>