* new card for mbart and mbart50
* removed comment BADGES
* Update mBart overview
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* fix typo (MBart to mBart)
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* maybe fix typo
* update typo and combine notes
* changed notes
* changed the example sentence
* fixed grammatical error and removed some lines from notes example
* missed one word
* removed documentation resources and added some lines of example code back in notes.
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* docs: ko: siglip.md
* feat: nmt draft
* fix: manual edits
* chore: Correct document title to kebab-case format
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Apply suggestions from code review
Convert unnatural language to natural Korean
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
* Restructure torchao quantization examples
Summary:
Mainly structured the examples by hardwares and then listed
the recommended quantization methods for each hardware H100 GPU, A100 GPU and CPU
Also added example for push_to_hub
Test Plan:
not required
Reviewers:
Subscribers:
Tasks:
Tags:
* update
* drop float8 cpu
* address comments and simplify
* small update
* link update
* minor update
* Added documentation for phi model
* Update phi.md
* Update phi.md
* Update phi.md
* Update docs/source/en/model_doc/phi.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/phi.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/phi.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/phi.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Updated model card
* Update phi.md
* Update phi.md
* Update phi.md
* Update docs/source/en/model_doc/phi.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Jihad <jihadhammoud_@hotmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update longformer.md
* Update longformer.md
* Update docs/source/en/model_doc/longformer.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/longformer.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update longformer.md
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* edit siglip model card
* fix syntax
* Update docs/source/en/model_doc/siglip.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/siglip.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/siglip.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/siglip.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/siglip.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/siglip.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* address comments
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Update generation_strategies.md
The prompt text shown in the example does not match what is inside the generated output. As the generated output always include the prompt, the correct prompt should be "Hugging Face is an open-source company".
* initial commit
* add convert internvl
* add first end-to-end working internvl
* nit prompt and image proc
* add working chat template
* add conversion llama-based models
* add tests
* pass all tests
* fix isort
* fix modular after main merge
* add video processing for internvl
* add support for interlaced images and videos
* Remove processing and config from modular, add more tests
* add llama model tests
* Modify processor for compatibility with refactored got ocr image processor
* add comments in processor
* Add docs and nits
* change video processing to use custom sample_indices_fn
* rebase and fix tests
* add processor tests
* Add changes Raushan review
* Use the new attention interface for the vision model
* nits
* add support for custom video_load_backend
* remove mention to InternVLTokenizer
* refactor vision model to simplify logic
* refactor processor for better readibility
* fix copies
* fix require av processor test
* refactor internVL vision
* Update processor and fix processing tests
* fix docstring
* update convert_weights for internvl3
* change image processor to fast by default
* remove do_center_crop=True in convert_weights
* force use_cache to True
* push_to_hub before reloading
* fix internVLVision for larger models
* update convert weight for qk norm
* fix convert_weights
* fix eos_token_id in convert
* update docs and integration tests
* make modifs after review
* fix wrong k_norm and reduce modular
* change image_token_index to image_token_id
* change checkpoint to OpenGVLab org
* last nits
* explicitely del self.num_key_value_groups
* add extra special tokens
* Iterative generation using input embeds
* Add Janus model
* discard changes
* Janus imports
* Refactor config and processor
* Added Vision tower of Janus
* Import Janus Image processor
* Vision tower fixes
* Refactor code
* Added VQ Model
* Complete model integration
* temp conversion script
* processor refactor
* Adding files to facilitate pulling
* Fixes after debugging
* Skip test for these models
* Add Janus Model
* discard changes
* Janus imports
* Refactor config and processor
* Added Vision tower of Janus
* Import Janus Image processor
* Vision tower fixes
* Refactor code
* Added VQ Model
* Complete model integration
* temp conversion script
* processor refactor
* Adding files to facilitate pulling
* Fixes after debugging
* Refactor to Text config
* ✨ Added generate function
* Saving intermediate convert file. Still need to read configs from the hub and convert them to our format.
* Adding version that reads from the JSON files. Still have to tweak some parameters manually.
* relative imports
* Initial tests
* Refactor image processor
* Seemingly working version of the conversion script, will need to test further.
* Adding command message
* Fixing conflicting JanusTextConfig class
* Incorporating some of the discussed changes.
* Small fix to create dir.
* Removing system from JINJA template
* Adding draft processor tests
* style fixes
* Minor fixes and enhancement
* added generation config
* Initial tests
* Small modifications, tests are now passing.
* Small changes I noticed while reading code.
* more fixes
* Added JanusModel class
* Small merge adaptations
* Small merge adaptations
* Image processing tests passing
* More tests and fixes
* Convert script updated and refactored
* Tests and cleanup
* make style
* Postprocessing for image generation
* generate refactor
* fixes
* - Passing tests that write a part of the model to cpu (e.g. test_cpu_offload)
- Passing tests of dispatching SDPA
- Only gradient checkpointing tests are left.
* Removing temporary code
* Changes
* Writing change to modular
* Added JanusVisionModel. SDPA dispatch tests pass more robustly. Gradient checkpoint tests are next
* Gradient checkpoint tests passing
* Removing debug code
* Major generate refactor 😮💨
* Temp changes for testing
* Green quality CI
* 2 out of 4 integration tests passing
* breadcrumbs
* Usage Examples
* Regenerate modeling after merge
* dirty code
* JanusIntegrationTest are passing
* breadcrumbs
* happy CI
* fixes
* Changing template
* nits
* Text generation logits matching original codebase at 100% precision
* Remove ./tmp from git tracking
* Remove ./tmp from git tracking
* Checkpointing changes after reviewing
* Fixing code in docstrings
* CHanging comments and small bug in convert file
* Fixing bug in image_token_id for 7B version
* Removing line that was added by both of us
* Pushing changes after discussion. Only one left is to change the key mapping for convert file.
* Updating module file
* New convert file using dict. Tested that it is equivalent to the old one by:
- comparing keys in a script
- comparing checksums of the output files between version generated with the current convert script and those generated with the old script. This is a more reliable test.
* revert changes
* mistake
* consistency change for CI
* make style
* doc fixes
* more fixes
* experimenting with masking out pad token
* checkpoint
* Batched generation with multi-images working for 1B models. Will test 7B next.
* Device fix.
* Writing changes to modular, previous ones were written to modeling just for quick testing.
* Using passed processor attention mask (only in modeling for now)
* Matching performance done in the non-standard way
* Working version of batched generation. Will change how some args are passed to make it more similar to language case
* More compliant version of the code
* Removed duplicated `_prepare_4d_causal_attention_mask_with_cache_position`
* Updating modular file, making masked filling with paddings more efficient
* Slightly more efficient version
* Modifying JanusVisionModel to be a wrapper
* Fixing test to comply with new names
* Modular overhaul
* More refactoring
* - Changing JanusVisionModel back
- Changing forward pass
- Adding boi token to the comparison
* - Removing whole context model_ids
- Using inherited implementation of prepare_inputs_for_generation
* Moving the way boi token is passed to the model
* Fixing sdpa test
* Minor changes
* testing changes
* Minor fix
* - Adding postprocessing test
- checking values of generated image on integration test
* changes
* Removing pooled attention vision module, fixing convert script as a consequence
* More changes
* Fixes
* Draft after merge
* Bug fixes
* More bug fix
* Fixing docs
* Nits
* Refactor return dict
* Moving image post processing test to main processor post process
* Passing guidance_scale as kwarg
* make style
* 🔥 refactor
* make style
* Update and green CI
* Nits and tests update
* up
* Added MID block
* fix
* Dead code
* update testcase
* update
* model_id change
* init_weight changes
---------
Co-authored-by: hsilva664 <metallic-silver@hotmail.com>
* add support for fast tokenizer
* make style
* fix according to reviews
* make style
* relax slow_fast_equivalence mean diff
---------
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
* added efficientnet image preprocessor but tests fail
* ruff checks pass
* ruff formatted
* properly pass rescale_offset through the functions
* - corrected indentation, ordering of methods
- reshape test passes when casted to float64
- equivalence test doesn't pass
* all tests now pass
- changes order of rescale, normalize acc to slow
- rescale_offset defaults to False acc to slow
- resample was causing difference in fast and slow. Changing test to bilinear resolves this difference
* ruff reformat
* F.InterpolationMode.NEAREST_EXACT gives TypeError: Object of type InterpolationMode is not JSON serializable
* fixes offset not being applied when do_rescale and do_normalization are both true
* - using nearest_exact sampling
- added tests for rescale + normalize
* resolving reviews
---------
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
* initial documentation
* rename mask to attention_mask
* smaller tests
* fixup
* fix copies
* move to time series section
* sort docs
* isort fix
* batch_size is not a configuration
* rename to TimesFMModelForPrediction
* initial script
* add check_outputs
* remove dropout_rate
* works with torch.Tensor inputs
* rename script
* fix docstrings
* fix freq when window_size is given
* add loss
* fix _quantile_loss
* formatting
* fix isort
* add weight init
* add support for sdpa and flash_attention_2
* fixes for flash_attention
* formatting
* remove flash_attention
* fix tests
* fix file name
* fix quantile loss
* added initial TimesFMModelIntegrationTests
* fix formatting
* fix import order
* fix _quantile_loss
* add doc for SDPA
* use timesfm 2.0
* bug fix in timesfm decode function.
* compare mean forecasts
* refactor type hints, use CamelCase
* consolidate decode func
* more readable code for weight conversion
* fix-copies
* simpler init
* renaem TimesFmMLP
* use T5LayerNorm
* fix tests
* use initializer_range
* TimesFmModel instead of TimesFmDecoder
* TimesFmPositionalEmbedding takes config for its init
* 2.0-500m-pytorch default configs
* use TimesFmModel
* fix formatting
* ignore TimesFmModel for testing
* fix docstring
* override generate as its not needed
* add doc strings
* fix logging
* add docstrings to output data classes
* initial copy from t5
* added config and attention layers
* add TimesFMPositionalEmbedding
* calcuate scale_factor once
* add more configs and TimesFMResidualBlock
* fix input_dims
* standardize code format with black
* remove unneeded modules
* TimesFM Model
* order of imports
* copy from Google official implementation
* remove covariate forecasting
* Adapting TimesFM to HF format
* restructing in progress
* adapted to HF convention
* timesfm test
* the model runs
* fixing unit tests
* fixing unit tests in progress
* add post_init
* do not change TimesFMOutput
* fixing unit tests
* all unit tests passed
* remove timesfm_layers
* add intermediate_size and initialize with config
* initial documentation
* rename mask to attention_mask
* smaller tests
* fixup
* fix copies
* move to time series section
* sort docs
* isort fix
* batch_size is not a configuration
* rename to TimesFMModelForPrediction
* initial script
* add check_outputs
* remove dropout_rate
* works with torch.Tensor inputs
* rename script
* fix docstrings
* fix freq when window_size is given
* add loss
* fix _quantile_loss
* formatting
* fix isort
* add weight init
* add support for sdpa and flash_attention_2
* fixes for flash_attention
* formatting
* remove flash_attention
* fix tests
* fix file name
* fix quantile loss
* added initial TimesFMModelIntegrationTests
* fix formatting
* fix import order
* fix _quantile_loss
* add doc for SDPA
* use timesfm 2.0
* bug fix in timesfm decode function.
* compare mean forecasts
* refactor type hints, use CamelCase
* consolidate decode func
* more readable code for weight conversion
* fix-copies
* simpler init
* renaem TimesFmMLP
* use T5LayerNorm
* fix tests
* use initializer_range
* TimesFmModel instead of TimesFmDecoder
* TimesFmPositionalEmbedding takes config for its init
* 2.0-500m-pytorch default configs
* use TimesFmModel
* fix formatting
* ignore TimesFmModel for testing
* fix docstring
* override generate as its not needed
* add doc strings
* fix logging
* add docstrings to output data classes
* add _CHECKPOINT_FOR_DOC
* fix comments
* Revert "fix comments"
This reverts commit 8deeb3e191.
* add _prepare_4d_attention_mask
* we do not have generative model classes
* use Cache
* return past_key_values
* modules initialized with config only
* update year
* Update docs/source/en/model_doc/timesfm.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* add layer_idx to cache
* modular timesfm
* fix test
* unwrap sequential class
* fix toctree
* remove TimesFmOnnxConfig
* fix modular
* remove TimesFmStackedDecoder
* split qkv layer into individual layers
* rename projection layers
* use ALL_ATTENTION_FUNCTIONS
* is_causal is True
* rename config
* does not support flash_attn_2
* formatting
* fix typo in docsstring
* rename inputs
* add time series mapping
* Update src/transformers/models/olmo2/modeling_olmo2.py
* Update src/transformers/models/moonshine/modeling_moonshine.py
* use updated arguments
* fix class name
* add MODEL_FOR_TIME_SERIES_PREDICTION_MAPPING
* isort
* consolidate _preprocess into forward
* fix a typo
* fix a typo
* fix toc
* fix modular
* remove aaserts
* use self.config._attn_implementation
* move to _postprocess_output
* remove timesfm_get_large_negative_number
* use view unstead of multiple unsqueeze
* make helpers static methods of the Model
* use to_tuple
* use to_tuple if not return_dict
* remove unused intitialization block as its incorporated in nn.Linear
* remove unused num_key_value_groups
* use the same convention as the masking method
* update modular
* do not use unsqueeze
* use view instead of unsqueeze
* use buffer for inv_timescales
* formatting
* modular conversion
* remove unneeded intialization
* add missing docstrings
* remove cache
* use simple_eager_attention_forward
* support tp_plan
* support for flex and flash attention masks
* Revert "support for flex and flash attention masks"
This reverts commit def36c4fcf.
* fix device
* fix tests on gpu
* remove unsued large model test
* removed unneeded comments
* add example usage
* fix style
* add import
* Update docs/source/en/model_doc/timesfm.md
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* inherit from LlamaRMSNorm
* use can_return_tuple decorator
* remvoe return_dict
* fix year
* Update docs/source/en/model_doc/timesfm.md
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* pretrained does not inherit from GenerationMixin
* use model for integration test
---------
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Rajat Sen <rsen91@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
* refactor docs
* add serialization
* Update docs/source/en/quantization/torchao.md
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* reorder
* add link
* change automatic to autoquant
Co-authored-by: DerekLiu35 <91234588+DerekLiu35@users.noreply.github.com>
* Update docs/source/en/quantization/torchao.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/torchao.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/torchao.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/torchao.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/torchao.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/torchao.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/torchao.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/torchao.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/torchao.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/torchao.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/torchao.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/torchao.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/torchao.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/torchao.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/torchao.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/torchao.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/torchao.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/torchao.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/torchao.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/torchao.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/torchao.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/quantization/torchao.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* nits
* refactor
* add colab
* update
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: DerekLiu35 <91234588+DerekLiu35@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Refactor ColPali model documentation
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Include quantisation exemple + real images
* simpler image loading
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update VITS model card
* Update docs/source/en/model_doc/vits.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/vits.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/vits.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/vits.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update vits.md
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* VDR task guide
* Add to toctree
* Update docs/source/en/tasks/visual_document_retrieval.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tasks/visual_document_retrieval.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tasks/visual_document_retrieval.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tasks/visual_document_retrieval.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tasks/visual_document_retrieval.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tasks/visual_document_retrieval.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tasks/visual_document_retrieval.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tasks/visual_document_retrieval.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tasks/visual_document_retrieval.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tasks/visual_document_retrieval.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* fix and enhance pipeline_webserver.md
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
* Update docs/source/en/pipeline_webserver.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/pipeline_webserver.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* use pipe
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
---------
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Add MLCD model
* Update codes for auto-mapping
* Add test scripts for MLCD
* Update doc for MLCD model
* Fix import error
* Fix import error
* Fix CI error for attention_outputs
* Fix code style for CI
* Fix code style for CI
* Fix code style for CI
* Fix code style for CI
* Fix code style for CI
* Fix CI error for initialization
* Fix code style for CI
* Fix code style for CI
* Reformat codes and docs for CI test
* Reformat codes and docs for CI test
* Remove unused attributes for CI test
* Fix style for CI test
* List MLCD in flash_attn doc
* Fix: typos, modulars, refactors from suggestions
* Refactoring convert_mlcd_weights_to_hf.py from suggestions
* Fix: docs conflicts
* Fix error for CI test
* Fix style for CI test
* Add integration test for MLCD
* Refactoring by class inheritance
* Fix: refactor attention interface, adjust codes
* Fix: merging conflicts
* Fix: merging conflicts
* Fix: style for CI test
* Fix: style for CI test
* Fix: set test_resize_embeddings to be False
* Fix: initializer for CI test
* Fix: conflicts, CI test, warning and refactoring
* Fix: merging conflicts
* Refactor
* Update docs
* Fix mistakes
* Remove unused args and fix multi-gpu error
* Revert position_embeddings
* Solve conflicts
* Solve conflicts
* Remove dummy
* Update _init_weights
* Update _init_weights
* Update _init_weights for CI test
* Add ImageProcessorFast to BiT processor
* propose a fast processor and add tests
* all tests pass except one
* run make
* remove useless print
* use same test as clip
* apply make
* Update src/transformers/models/bit/image_processing_bit_fast.py
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
* Update setup.py
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
* Update src/transformers/models/bit/image_processing_bit_fast.py
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
* apply review comment
---------
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
* support fast image processor layoutlmv3
* make style
* add warning and update test
* make style
* Update src/transformers/models/layoutlmv3/image_processing_layoutlmv3_fast.py
* Update image_processing_auto.py
---------
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
* support flava fast image processor
* run style and quality
* update test
* update according to reviews
* make style
* update comment on BICUBIC
* make style
---------
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
* add test and fast image processor
* make style
* Update src/transformers/models/perceiver/image_processing_perceiver_fast.py
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
* make style
---------
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
* First pass at speech granite
Add encoder / projector, rename things
* Combine into one model file with causal lm outputs for forward
* Add loss calc
* Fix config loading
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
* Split new / old loading logic
* Use transformers integration for loading peft adapters
* Add generation wrapper for selective lora enablement
* Add note for qformer encoder automodel
* Guard torch/audio imports in feature extractor
* Handle granite speech autoclasses
* Handle optional deps in package structure for granite speech
* Add granite pretrained model def for init
* Add dummy objects for torch/torchaudio
* Add tests for granite speech processor
* Minor formatting fixes and refactoring
* Add options for falling back to config in forward
* Tentative model docstrings for granite speech
* Fix config type
* Remove legacy load
* Allow non-lora variants for granite speech
* Override weight tying for llm
* Use text config instead of llm config
* Add output embeddings getter to fix weight tying
* Fix relative imports
* computing the number of audio features, based on the raw audio sequence.
* collating audio inputs, and keeping the original lengths.
* asserted we have text. otherwise we can't specify the audio special token.
* assering the number of audio-symbols/audios match correctly.
running get validated_audios only when audio is present
* indentation bugfix + supporting different feature lengths when expanding audio.
* redundant, done in _get_validated_text
* adapting the tests:
- we must have text (not either audio or text)
- _get_num_audio_features takes a list of raw lengths, provided it insetad.
* Minor cleanup, remove unused import
* Add more tests for batch feature processing
* Allow setting offset in rel position embeddings
* Add config option for warning if peft is not installed w/ lora
* Port blip2 qformer code into granite speech
* Add sad test for numpy arr processing
* Allow numpy arrays / tuples in granite speech processor
* Fix config type for projector
* - pad instead of creating a zeros tensor, to keep the original dtype/device (support bfloat16)
- cast input_features to the model dtype (support bfloat16)
* merge Blip2QFormerConfig to GraniteSpeechProjectorConfig
* prevent a crash when re-saving/loading the model (line 109)
* consider additional edge cases during preprocessing.
* consider additional edge cases during preprocessing.
* add features mask for batched inference (bugfix)
* Minor refactor, remove multiaudio processor tests
* Add set input/output embeddings for granite speech
* Fix feature dim check in processor test
* Pop input features in embed test for granite speech
* Small fixes for test edge cases
Add granite speech to seq2seq causal lm mapping names
* Add small tests for granite speech model
* Fix data parallelism test
* Standardize model class names
* Fix check for copies
* Fix misaligned init check
* Skip granite speech in checkpoint check
* Use default for tie_word_embeddings in granite speech
* Fix non documentation granite speech repo issues
* Fix comments and docstring checks
* Add placeholder docs for granite speech
* Fix test naming collision
* Code formatting
* Rerun torch dummy obj regen
* Fix save pretrained for granite speech
* Import sorting
* Fix tests typo
* Remove offset hack
* Pass args through encoder config
* Remove unused prune heads from blip2
* removing einsum. replaced with explicit multiplication (relative positional encodings) and sdpa attention.
* remove Sequential from ConformerFeedForward and ConformerConvModule. + fix for sdpa attention
* remove GraniteSpeechConformerScale
* rename to hidden_states
* rename conformer layers to self.layers, remove the first linear from the list to keep the list homogenous.
* move pre-norm to the attention/feedforward blocks (avoid complex module wrapping)
* adding pre_norm into forward
* feature extractor refactoring to resemble how it's done in phi4multimodal.
* rename feature_extractor to audio_processor
* bugfix: input_feature_mask fix to get the exact number tokens.
* Fix pytest decorator in processor test
* Add (disabled) integration tests for granite speech
* Fix handling of optional feature masking
* Loosen validation in processing for vLLM compatability
* Formatting fixes
* Update init structure to mirror llama
* Make granite speech projector generic
* Update test config to reflect generic projector
* Formatting fixes
* Fix typos, add license
* Fix undefined var in input processing
* Cleanup and expose ctc encoder
* Add missing config docstrings
* Better var names, type hints, etc
* Set attn context size in init
* Add max pos emb to encoder config
* Cleanup feature extractor
* Add granite speech architecture details
* Remove granite speech qformer ref
* Add paper link, explicit calc for qkv
* Calculate padding directly in depthwise conv1d init
* Raise value error instead of asserting
* Reorder class defs (classes used at top)
* Precompute relpos distances
* Run formatting
* Pass attention distances through forward
* Apply suggestions from code review
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
* Add todo for using common batch feature extraction
* Rename audios/features
* Ensure chat template may be provided to processor
* Move granite speech docs to audio models
* Add todos for input proc refactoring
* Fix import order
* Guard torch import
* Use relative imports
* Require torch backend for processor in granite speech
* Add backend guards in feature extractor
---------
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
Co-authored-by: Avihu Dekel <avihu.dekel@ibm.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>