* add audio_token attribute to proc
* expand input_ids
* and legacy and expanded input_ids
* test update
* split lines
* add possibility not to provide eos and bos audio tokens
* raise errors
* test incorrect number of audio tokens
* add example
* fmt
* typo
* first adding diffllama
* add Diff Attention and other but still with errors
* complate make attention Diff-Attention
* fix some bugs which may be caused by transformer-cli while adding model
* fix a bug caused by forgetting KV cache...
* Update src/transformers/models/diffllama/modeling_diffllama.py
You don't need to divide by 2 if we use same number of attention heads as llama. instead you can just split in forward.
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* Update src/transformers/models/diffllama/modeling_diffllama.py
fit to changeing "num_heads // 2" place
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* Update src/transformers/models/diffllama/modeling_diffllama.py
new codes are more meaningful than before
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* Update src/transformers/models/diffllama/modeling_diffllama.py
new codes are more meaningful than before
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* Update src/transformers/models/diffllama/modeling_diffllama.py
fit to changeing "num_heads // 2" place
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* Update src/transformers/models/diffllama/modeling_diffllama.py
fix 2times divide by sqrt(self.head_dim)
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* Update src/transformers/models/diffllama/modeling_diffllama.py
fix 2times divide by sqrt(self.head_dim)
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* Update src/transformers/models/diffllama/modeling_diffllama.py
fit to changeing "num_heads // 2" place.
and more visible
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* I found Attention missed implemented from paper still on e072544a3b.
* re-implemented
* adding groupnorm
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* align with transformers code style
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* fix typo
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* adding groupnorm
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* change SdpaAttention to DiffSdpaAttention
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* fix bug
* Update src/transformers/models/diffllama/modeling_diffllama.py
resolve "not same outputs" problem
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* fix bugs of places of "GroupNorm with scale" and etc
* Revert "fix bugs of places of "GroupNorm with scale" and etc"
This reverts commit 26307d92f6.
* simplify multiple of attention (matmul) operations into one by repeating value_states
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* simplify multiple of attention (matmul) operations into one by repeating value_states
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* simplify multiple of attention (matmul) operations into one by repeating value_states
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* remove missed type
* add diffllama model_doc
* apply make style/quality
* apply review comment about model
* apply review comment about test
* place diffllama alphabetically on the src/transformers/__init__.py
* fix forgot code
* Supports parameters that are not initialized with standard deviation 0 in the conventional method
* add DiffLlamaConfig to CONFIG_CLASSES_TO_IGNORE_FOR_DOCSTRING_CHECKPOINT_CHECK on utils/check_config_docstrings.py
* remove unused property of config
* add to supported model list
* add to spda supported model list
* fix copyright, remove pretraining_tensor_parallel, and modify for initialization test
* remove unused import and etc.
* empty commit
* empty commit
* empty commit
* apply modular transformers but with bugs
* revert prev commit
* create src/transformers/model/diffllama/modular_diffllama.py
* run utils/modular_model_converter.py
* empty commit
* leaner modular diffllama
* remove more and more in modular_diffllama.pt
* remove more and more in modular_diffllama.pt
* resolve missing docstring entries
* force reset
* convert modular
---------
Co-authored-by: Minho Ryu <ryumin93@gmail.com>
* Add French translation of task_summary and tasks_explained
---------
Co-authored-by: Aymeric Roucher <69208727+aymeric-roucher@users.noreply.github.com>
* إضافة الترجمة العربية: summarization.md
* Update docs/source/ar/tasks/summarization.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/summarization.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/summarization.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/summarization.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/summarization.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/summarization.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/summarization.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/summarization.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/summarization.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/summarization.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/summarization.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/summarization.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/summarization.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/summarization.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/summarization.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update _toctree.yml
---------
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* إضافة الترجمة العربية: question_answering.md
* Update question_answering.md
* Update docs/source/ar/tasks/question_answering.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/question_answering.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/question_answering.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/question_answering.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/question_answering.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/question_answering.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/question_answering.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/question_answering.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/question_answering.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/question_answering.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/question_answering.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/question_answering.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/question_answering.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/question_answering.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tasks/question_answering.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update _toctree.yml
---------
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Improve modular transformers documentation
- Adds hints to general contribution guides
- Lists which utils scripts are available to generate single-files from modular files and check their content
* Show commands in copyable code cells
---------
Co-authored-by: Joel Koch <joel@bitcrowd.net>
* initial cut of modernbert for transformers
* small bug fixes
* fixes
* Update import
* Use compiled mlp->mlp_norm to match research implementation
* Propagate changes in modular to modeling
* Replace duplicate attn_out_dropout in favor of attention_dropout
cc @warner-benjamin let me know if the two should remain separate!
* Update BOS to CLS and EOS to SEP
Please confirm @warner-benjamin
* Set default classifier bias to False, matching research repo
* Update tie_word_embeddings description
* Fix _init_weights for ForMaskedLM
* Match base_model_prefix
* Add compiled_head to match research repo outputs
* Fix imports for ModernBertForMaskedLM
* Just use "gelu" default outright for classifier
* Fix config name typo: initalizer -> initializer
* Remove some unused parameters in docstring. Still lots to edit there!
* Compile the embeddings forward
Not having this resulted in very slight differences - so small it wasn't even noticed for the base model, only for the large model.
But the tiny difference for large propagated at the embedding layer through the rest of the model, leading to notable differences of ~0.0084 average per value, up to 0.2343 for the worst case.
* Add drafts for ForSequenceClassification/ForTokenClassification
* Add initial SDPA support (not exactly equivalent to FA2 yet!)
During testing, FA2 and SDPA still differ by about 0.0098 per value in the token embeddings. It still predicts the correct mask fills, but I'd like to get it fully 1-1 if possible.
* Only use attention dropout if training
* Add initial eager attention support (also not equivalent to FA2 yet!)
Frustratingly, I also can't get eager to be equivalent to FA2 (or sdpa), but it does get really close, i.e. avg ~0.010 difference per value.
Especially if I use fp32 for both FA2&eager, avg ~0.0029 difference per value
The fill-mask results are good with eager.
* Add initial tests, output_attentions, output_hidden_states, prune_heads
Tests are based on BERT, not all tests pass yet: 23 failed, 79 passed, 100 skipped
* Remove kwargs from ModernBertForMaskedLM
Disable sparse_prediction by default to match the normal HF, can be enabled via config
* Remove/adjust/skip improper tests; warn if padding but no attn mask
* Run formatting etc.
* Run python utils/custom_init_isort.py
* FlexAttention with unpadded sequences(matches FA2 within bf16 numerics)
* Reformat init_weights based on review
* self -> module in attention forwards
* Remove if config.tie_word_embeddings
* Reformat output projection on a different line
* Remove pruning
* Remove assert
* Call contiguous() to simplify paths
* Remove prune_qkv_linear_layer
* Format code
* Keep as kwargs, only use if needed
* Remove unused codepaths & related config options
* Remove 3d attn_mask test; fix token classification tuple output
* Reorder: attention_mask above position_ids, fixes gradient checkpointing
* Fix usage if no FA2 or torch v2.5+
* Make torch.compile/triton optional
Should we rename 'compile'? It's a bit vague
* Separate pooling options into separate functions (cls, mean) - cls as default
* Simplify _pad_modernbert_output, remove unused labels path
* Update tied weights to remove decoder.weight, simplify decoder loading
* Adaptively set config.compile based on hf_device_map/device/resize, etc.
* Update ModernBertConfig docstring
* Satisfy some consistency checks, add unfinished docs
* Only set compile to False if there's more than 1 device
* Add docstrings for public ModernBert classes
* Dont replace docstring returns - ends up being duplicate
* Fix mistake in toctree
* Reformat toctree
* Patched FlexAttention, SDPA, Eager with Local Attention
* Implement FA2 -> SDPA -> Eager attn_impl defaulting, crucial
both to match the original performance, and to get the highest inference speed without requiring users to manually pick FA2
* Patch test edge case with Idefics3 not working with 'attn_implementation="sdpa"'
* Repad all_hidden_states as well
* rename config.compile to reference_compile
* disable flex_attention since it crashes
* Update modernbert.md
* Using dtype min to mask in eager
* Fully remove flex attention for now
It's only compatible with the nightly torch 2.6, so we'll leave it be for now. It's also slower than eager/sdpa.
Also, update compile -> reference_compile in one more case
* Call contiguous to allow for .view()
* Copyright 2020 -> 2024
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update/simplify __init__ structure
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Remove "... if dropout_prob > 0 else identity"
As dropout with 0.0 should be efficient like identity
* re-use existing pad/unpad functions instead of creating new ones
* remove flexattention method
* Compute attention_mask and local_attention_mask once in modeling
* Simplify sequence classification prediction heads, only CLS now
Users can make custom heads if they feel like it
Also removes the unnecessary pool parameter
* Simplify module.training in eager attn
* Also export ModernBertPreTrainedModel
* Update the documentation with links to finetuning scripts
* Explain local_attention_mask parameter in docstring
* Simplify _autoset_attn_implementation, rely on super()
* Keep "in" to initialize Prediction head
Doublechecked with Benjamin that it's correct/what we used for pretraining
* add back mean pooling
* Use the pooling head in TokenClassification
* update copyright
* Reset config._attn_implementation_internal on failure
* Allow optional attention_mask in ForMaskedLM head
* fix failing run_slow tests
* Add links to the paper
* Remove unpad_no_grad, always pad/unpad without gradients
* local_attention_mask -> sliding_window_mask
* Revert "Use the pooling head in TokenClassification"
This reverts commit 99c38badd1.
There was no real motivation, no info on whether having this bigger head does anything useful.
* Simplify pooling, 2 options via if-else
---------
Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com>
Co-authored-by: Tom Aarsen <Cubiegamedev@gmail.com>
Co-authored-by: Said Taghadouini <taghadouinisaid@gmail.com>
Co-authored-by: Benjamin Clavié <ben@clavie.eu>
Co-authored-by: Antoine Chaffin <ant54600@hotmail.fr>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* docs: fix typo quickstart snippet in ColPali's model card
* docs: clean the ColPali's model card
* docs: make the `ColPaliForRetrieval`'s docstring more concise
* docs: add missing bash command used to convert weights for `vidore/colpali-v1.3-hf`
* initial commit for PR
Co-authored-by: Gabe Goodhart <gabe.l.hart@gmail.com>
* rename dynamic cache
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* add more unit tests
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* add integration test
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* add integration test
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* Add modular bamba file
* Remove trainer changes from unrelated PR
* Modify modular and cofig to get model running
* Fix some CI errors and beam search
* Fix a plethora of bugs from CI/docs/etc
* Add bamba to models with special caches
* Updat to newer mamba PR for mamba sublayer
* fix test_left_padding_compatibility
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* fix style
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* fix remaining tests
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* missed this test
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* ran make style
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* move slow tag to integration obj
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* make style
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* address comments
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* fix modular
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* left out one part of modular
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* change model
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* Make Rotary modular as well
* Update bamba.md
Added overview, update Model inference card and added config
* Update bamba.md
* Update bamba.md
* Update bamba.md
Minor fixes
* Add docs for config and model back
Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>
* Add warning when using fast kernels
* replaced generate example
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
* Address comments from PR
Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>
* Propagate attention fixes
Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>
* Fix attention interfaces to the new API
Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>
* Fix API for decoder layer
Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>
* Remove extra weights
Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>
---------
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>
Co-authored-by: Gabe Goodhart <gabe.l.hart@gmail.com>
Co-authored-by: Antoni Viros i Martin <aviros@ibm.com>
Co-authored-by: divya-kumari32 <72085811+divya-kumari32@users.noreply.github.com>
Co-authored-by: Antoni Viros <ani300@gmail.com>
* Add Cohere2 docs details
* Update docs/source/en/model_doc/cohere2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Add Falcon3 documentation
* Update Falcon3 documentation
* Change Falcon to Falcon3
* Update docs and run make fix-copies
* Add blog post and huggingface models links
* refactor image_processing_auto logic
* fix fast image processor tests
* Fix tests fast vit image processor
* Add safeguard when use_fast True and torchvision not available
* change default use_fast back to None, add warnings
* remove debugging print
* call get_image_processor_class_from_name once
* Add files
* Init
* Add TimmWrapperModel
* Fix up
* Some fixes
* Fix up
* Remove old file
* Sort out import orders
* Fix some model loading
* Compatible with pipeline and trainer
* Fix up
* Delete test_timm_model_1/config.json
* Remove accidentally commited files
* Delete src/transformers/models/modeling_timm_wrapper.py
* Remove empty imports; fix transformations applied
* Tidy up
* Add image classifcation model to special cases
* Create pretrained model; enable device_map='auto'
* Enable most tests; fix init order
* Sort imports
* [run-slow] timm_wrapper
* Pass num_classes into timm.create_model
* Remove train transforms from image processor
* Update timm creation with pretrained=False
* Fix gamma/beta issue for timm models
* Fixing gamma and beta renaming for timm models
* Simplify config and model creation
* Remove attn_implementation diff
* Fixup
* Docstrings
* Fix warning msg text according to test case
* Fix device_map auto
* Set dtype and device for pixel_values in forward
* Enable output hidden states
* Enable tests for hidden_states and model parallel
* Remove default scriptable arg
* Refactor inner model
* Update timm version
* Fix _find_mismatched_keys function
* Change inheritance for Classification model (fix weights loading with device_map)
* Minor bugfix
* Disable save pretrained for image processor
* Rename hook method for loaded keys correction
* Rename state dict keys on save, remove `timm_model` prefix, make checkpoint compatible with `timm`
* Managing num_labels <-> num_classes attributes
* Enable loading checkpoints in Trainer to resume training
* Update error message for output_hidden_states
* Add output hidden states test
* Decouple base and classification models
* Add more test cases
* Add save-load-to-timm test
* Fix test name
* Fixup
* Add do_pooling
* Add test for do_pooling
* Fix doc
* Add tests for TimmWrapperModel
* Add validation for `num_classes=0` in timm config + test for DINO checkpoint
* Adjust atol for test
* Fix docs
* dev-ci
* dev-ci
* Add tests for image processor
* Update docs
* Update init to new format
* Update docs in configuration
* Fix some docs in image processor
* Improve docs for modeling
* fix for is_timm_checkpoint
* Update code examples
* Fix header
* Fix typehint
* Increase tolerance a bit
* Fix Path
* Fixing model parallel tests
* Disable "parallel" tests
* Add comment for metadata
* Refactor AutoImageProcessor for timm wrapper loading
* Remove custom test_model_outputs_equivalence
* Add require_timm decorator
* Fix comment
* Make image processor work with older timm versions and tensor input
* Save config instead of whole model in image processor tests
* Add docstring for `image_processor_filename`
* Sanitize kwargs for timm image processor
* Fix doc style
* Update check for tensor input
* Update normalize
* Remove _load_timm_model function
---------
Co-authored-by: Amy Roberts <22614925+amyeroberts@users.noreply.github.com>
* add "Translating Benchmarks.md to Chinese "
* Removed all the English original text (which was previously kept as comments in the document) and refined some of the Chinese expressions.
* Add docs/source/ar/community.md to Add_docs_source_ar_community.md
* Update community.md
* Update community.md
* Update community.md
* Update _toctree.yml - add community.md
* Update docs/source/ar/community.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Create how_to_hack_models.md
* Create modular_transformers.md
* Create tiktoken.md
* Update _toctree.yml
* Update docs/source/ar/how_to_hack_models.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/how_to_hack_models.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/how_to_hack_models.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/how_to_hack_models.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/how_to_hack_models.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/how_to_hack_models.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/how_to_hack_models.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/how_to_hack_models.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/modular_transformers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/modular_transformers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/modular_transformers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/modular_transformers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/modular_transformers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/modular_transformers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/modular_transformers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/modular_transformers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/modular_transformers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tiktoken.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/tiktoken.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
---------
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* add commen to offloading
* Update docs/source/en/kv_cache.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Fixed typo in multi gpu docs and OLMoE version
* Fixed typos in docs for agents, agents advanced, knowledge distillation, and image feature extraction
* Fixed incorrect usage of model.image_guided_detection in zero shot object detection docs
* explain release_memory
* Update docs/source/en/llm_tutorial_optimization.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Add docs/source/ar/benchmarks.md to Add_docs_source_ar_benchmarks.md
* Update docs/source/ar/benchmarks.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/benchmarks.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/benchmarks.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/benchmarks.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/benchmarks.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/benchmarks.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/benchmarks.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/benchmarks.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/benchmarks.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/benchmarks.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/benchmarks.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update _toctree.yml
* Update benchmarks.md
---------
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Updated documentation and added conversion utility
* Update docs/source/en/tiktoken.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tiktoken.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Moved util function to integration folder + allow for str
* Update formatting
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Updated formatting
* style changes
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add Nemotron GGUF Loading Support
* fix the Nemotron architecture assignation
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Added image-text-to-text pipeline to task guide
* Update docs/source/en/tasks/image_text_to_text.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tasks/image_text_to_text.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tasks/image_text_to_text.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tasks/image_text_to_text.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Merge codeblocks
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* add deformable detr image processor fast
* add fast processor to doc
* fix copies
* nit docstring
* Add tests gpu/cpu and fix docstrings
* fix docstring
* import changes from detr
* fix imports
* rebase and fix
* fix input data format change in detr and rtdetr fast
* Fix post process function called in the instance segmentation example of mask2former
* fix description and additional notes for post_process_instance_segmentation of maskformers
* remove white space in maskformers post_process_instance_segmentation doc
* change image.size[::-1] to height and width for clarity in segmentation examples
* Add model skeletion with transformers-cli add-new-model-like
* Convert config to modular, add rms_norm_eps, delete clip_qkv
* Convert model to modular, add RMSNorm
* Add flash attention with qk norm and no qkv clipping
* Add decoder layer with RMSNorm after attention/feedforward layers
* Add base and causal model
* Add converter improvements from OLMo repo
* Update weight loading in OLMo to HF converter
* Set correct default for rms_norm_eps
* Set correct pipeline_model_mapping in test
* Run make fixup
* Fix model type
* Re-run modular conversion
* Manually set config docs to fix build errors
* Convert olmo-1124 to olmo_1124 to fix flash attention docs errors
* Start updating tests
* Update tests
* Copy upstream test_eager_matches_sdpa_inference_1_bfloat16 changes to olmo_1124
* Rename input_layernorm and post_attention_layernorm to reflect their ops better
* Use correct tokenizer
* Remove test unsupported by GPT2 tokenizer
* Create GenerationConfig outside of from_pretrained call
* Use simpler init file structure
* Add explicit __all__ to support simplified init
* Make safetensor serialization the default
* Update OLMo November 2024 docs
* add XPU path
* use accelerate API
* Update docs/source/en/tasks/semantic_segmentation.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* update more places with accelerate API
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Add docs/source/ar/torchscript.md to Add_docs_source_ar_torchscript.md
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/torchscript.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Merge troubleshooting.md with this Branch
* Update _toctree.yml
* Update torchscript.md
* Update troubleshooting.md
---------
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Add docs/source/ar/trainer.md to Add_docs_source_ar_trainer.md
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update trainer.md
* Update trainer.md
* Update trainer.md
* Create _toctree.yml
* Delete docs/source/ar/_toctree.yml
* Update _toctree.yml - add trainer
* Update _toctree.yml
* merge serialization.md into this branch
* merge sagemaker.md into this PR
* Update _toctree.yml
* Update docs/source/ar/trainer.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ar/trainer.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* kinda works
* update
* add tests
* update
* use special tokens in processors
* typo
* fix copies
* fix
* fix moshi after rebase
* update
* fix tests
* update
* Update docs/source/en/main_classes/tokenizer.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* update docs
* test for load time adding tokens
* fix some more tests which are now fetched better
* one more fix
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add docs/source/ar/multilingual.md to Add_docs_source_ar_multilingual.md
* Update docs/source/ar/multilingual.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/multilingual.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/multilingual.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/multilingual.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/multilingual.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/multilingual.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/multilingual.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/multilingual.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/multilingual.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/multilingual.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/multilingual.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/multilingual.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/multilingual.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/multilingual.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/multilingual.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/multilingual.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update _toctree.yml
* Update _toctree.yml
* Add Translated files to branch for merg
* Update _toctree.yml
* Update _toctree.yml
* Update custom_models.md
* Update chat_templating.md
* Update docs/source/ar/create_a_model.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update create_a_model.md
* Update gguf.md
* Update gguf.md
* Update gguf.md
* Update gguf.md
---------
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* update doc
* Update docs/source/en/perf_train_cpu.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* delete closing tip
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Standardize image-text-to-text-models-output
add post_process_image_text_to_text to chameleon and cleanup
Fix legacy kwarg behavior and deprecation warning
add post_process_image_text_to_text to qwen2_vl and llava_onevision
Add post_process_image_text_to_text to idefics3, mllama, pixtral processor
* nit var name post_process_image_text_to_text udop
* nit fix deprecation warnings
* Add image-text-to-text pipeline
* add support for image url in chat template for pipeline
* Reformat to be fully compatible with chat templates
* Add tests chat template
* Fix imports and tests
* Add pipeline tag
* change logic handling of single prompt ans multiple images
* add pipeline mapping to models
* fix batched inference
* fix tests
* Add manual batching for preprocessing
* Fix outputs with nested images
* Add support for all common processing kwargs
* Add default padding when multiple text inputs (batch size>1)
* nit change version deprecation warning
* Add support for text only inference
* add chat_template warnings
* Add pipeline tests and add copied from post process function
* Fix batched pipeline tests
* nit
* Fix pipeline tests blip2
* remove unnecessary max_new_tokens
* revert processing kosmos2 and remove unnecessary max_new_tokens
* fix pipeline tests idefics
* Force try loading processor if pipeline supports it
* revert load_processor change
* hardcode loading only processor
* remove unnecessary try except
* skip imagetexttotext tests for kosmos2 as tiny model causes problems
* Make code clearer
* Address review comments
* remove preprocessing logic from pipeline
* fix fuyu
* add BC resize fuyu
* Move post_process_image_text_to_text to ProcessorMixin
* add guard in post_process
* fix zero shot object detection pipeline
* add support for generator input in pipeline
* nit
* change default image-text-to-text model to llava onevision
* fix owlv2 size dict
* Change legacy deprecation warning to only show when True
* add fast image processor rtdetr
* add gpu/cpu test and fix docstring
* remove prints
* add to doc
* nit docstring
* avoid iterating over images/annotations several times
* change torch typing
* Add image processor fast documentation
* add mamba architecture for gguf
* add logic for weights conversion, some fixes and refactoring
* add lm_head layers, unit test refactoring
* more fixes for tests
* remove lm_head creation
* remove unused comments
* feat: Added int conversion and unwrapping
* test: added tests for post_process_keypoint_detection of SuperPointImageProcessor
* docs: changed docs to include post_process_keypoint_detection method and switched from opencv to matplotlib
* test: changed test to not depend on SuperPointModel forward
* test: added missing require_torch decorator
* docs: changed pyplot parameters for the keypoints to be more visible in the example
* tests: changed import torch location to make test_flax and test_tf
* Revert "tests: changed import torch location to make test_flax and test_tf"
This reverts commit 39b32a2f69.
* tests: fixed import
* chore: applied suggestions from code review
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* tests: fixed import
* tests: fixed import (bis)
* tests: fixed import (ter)
* feat: added choice of type for target_size and changed tests accordingly
* docs: updated code snippet to reflect the addition of target size type choice in post process method
* tests: fixed imports (...)
* tests: fixed imports (...)
* style: formatting file
* docs: fixed typo from image[0] to image.size[0]
* docs: added output image and fixed some tests
* Update docs/source/en/model_doc/superpoint.md
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* fix: included SuperPointKeypointDescriptionOutput in TYPE_CHECKING if statement and changed tests results to reflect changes to SuperPoint from absolute keypoints coordinates to relative
* docs: changed SuperPoint's docs to print output instead of just accessing
* style: applied make style
* docs: added missing output type and precision in docstring of post_process_keypoint_detection
* perf: deleted loop to perform keypoint conversion in one statement
* fix: moved keypoint conversion at the end of model forward
* docs: changed SuperPointInterestPointDecoder to SuperPointKeypointDecoder class name and added relative (x, y) coordinates information to its method
* fix: changed type hint
* refactor: removed unnecessary brackets
* revert: SuperPointKeypointDecoder to SuperPointInterestPointDecoder
* Update docs/source/en/model_doc/superpoint.md
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
---------
Co-authored-by: Steven Bucaille <steven.bucaille@buawei.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Add docs/source/ar/fast_tokenizers.md to Add_docs_source_ar_fast_tokenizers.md
* Update _toctree.yml
* Update _toctree.yml
* Update docs/source/ar/_toctree.yml
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/fast_tokenizers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/fast_tokenizers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/fast_tokenizers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/fast_tokenizers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/fast_tokenizers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/fast_tokenizers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/fast_tokenizers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/fast_tokenizers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/fast_tokenizers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* Update docs/source/ar/fast_tokenizers.md
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
---------
Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
* translated gguf.md into chinese
* Apply suggestions from code review
I have updated the PR accordingly.Thank you very much for detailed guidance,and I 'll pay more attention to the details next time.
Co-authored-by: Isotr0py <2037008807@qq.com>
* Apply suggestions from code review
Co-authored-by: Isotr0py <2037008807@qq.com>
---------
Co-authored-by: Isotr0py <2037008807@qq.com>
* Add SynthIDTextWatermarkLogitsProcessor
* esolving comments.
* Resolving comments.
* esolving commits,
* Improving SynthIDWatermark tests.
* switch to PT version
* detector as pretrained model + style
* update training + style
* rebase
* Update logits_process.py
* Improving SynthIDWatermark tests.
* Shift detector training to wikitext negatives and stabilize with lower learning rate.
* Clean up.
* in for 7B
* cleanup
* upport python 3.8.
* README and final cleanup.
* HF Hub upload and initiaze.
* Update requirements for synthid_text.
* Adding SynthIDTextWatermarkDetector.
* Detector testing.
* Documentation changes.
* Copyrights fix.
* Fix detector api.
* ironing out errors
* ironing out errors
* training checks
* make fixup and make fix-copies
* docstrings and add to docs
* copyright
* BC
* test docstrings
* move import
* protect type hints
* top level imports
* watermarking example
* direct imports
* tpr fpr meaning
* process_kwargs
* SynthIDTextWatermarkingConfig docstring
* assert -> exception
* example updates
* no immutable dict (cant be serialized)
* pack fn
* einsum equivalent
* import order
* fix test on gpu
* add detector example
---------
Co-authored-by: Sumedh Ghaisas <sumedhg@google.com>
Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: sumedhghaisas2 <138781311+sumedhghaisas2@users.noreply.github.com>
Co-authored-by: raushan <raushan@huggingface.co>
* add colorize_depth and matplotlib availability check
* add post_process_depth_estimation for zoedepth + tests
* add post_process_depth_estimation for DPT + tests
* add post_process_depth_estimation in DepthEstimationPipeline & special case for zoedepth
* run `make fixup`
* fix import related error on tests
* fix more import related errors on test
* forgot some `torch` calls in declerations
* remove `torch` call in zoedepth tests that caused error
* updated docs for depth estimation
* small fix for `colorize` input/output types
* remove `colorize_depth`, fix various names, remove matplotlib dependency
* fix formatting
* run fixup
* different images for test
* update examples in `forward` functions
* fixed broken links
* fix output types for docs
* possible format fix inside `<Tip>`
* Readability related updates
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Readability related update
* cleanup after merge
* refactor `post_process_depth_estimation` to return dict; simplify ZoeDepth's `post_process_depth_estimation`
* rewrite dict merging to support python 3.8
---------
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* first try
* codestyle
* idefics2 is happy
* [run-slow] llava, llava_next, video_llava, vipllava, llava_next_video, idefics, idefics2, kosmos2, fuyu, blip, blip_2, instructblip, instructblipvideo, paligemma
* fix-copies
* [run-slow] llava, llava_next, video_llava, vipllava, llava_next_video, idefics, idefics2, kosmos2, fuyu, blip, blip_2, instructblip, instructblipvideo
* blip-2 needs to init vision from config
* when was this removed O_o
* minor fix
* tests
* this way?
* tests
* model-agnostic code
* codestyle
* add tests for idefics
* modify general test for VLMs
* no generation test for vlm yet!
* no generation test here also
* wanr in VIT-SDPA if output attn
* add more tests
* user can pass dict as attn impl
* repo consistency
* update
* muicgen
* no prints
* forgot speech enc-dec and clip
* how many composite models we have?
* musicgen meelody is same as mudicgen
* +siglip
* fix tests + add some more
* remove idefics custom overriden code
* make idefics2 automappable
* nits
* skip tests
* doctests
* Update src/transformers/models/idefics2/configuration_idefics2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/clip/test_modeling_clip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/idefics2/test_modeling_idefics2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/idefics2/test_modeling_idefics2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/configuration_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* major update, no need for automap
* clean up
* add FA2 test
* more tests
* style
* skip tests
* why did these started failing now?
* no attributes for FA2 needed
* one tiny test
* address comment about FA2 false warning
* style
* add new models and resolve conflicts
* fix copies
* let it be this way for now, come back tomorrow to review
* some more fixes
* update
* more updates
* update
* fix copies
* style and tests
* another big update
* fix tests
* fix tests
* update
* another update
* fix tests
* fix copies
* fix tests
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>