mirror of
https://github.com/huggingface/transformers.git
synced 2025-07-03 21:00:08 +06:00
5427eaad43
699 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
![]() |
5427eaad43
|
[i18n-ur] Added README_ur.md file (#33461)
* Urdu docs added * fixed the misaligned issue. |
||
![]() |
87134662f7
|
[i18n-ar] add README_ar.md to README.md (#32583)
* Update README.md * Update README.md * Add README_ar.md to i18n/README_de.md * Add README_ar.md to i18n/README_es.md * Add README_ar.md to i18n/README_fr.md * Add README_ar.md to i18n/README_hd.md * Add README_ar.md to i18n/README_ja.md * Add README_ar.md to i18n/README_ko.md * Add README_ar.md to i18n/README_pt-br.md * Add README_ar.md to i18n/README_ru.md * Add README_ar.md to i18n/README_te.md * Add README_ar.md to i18n/README_vi.md * Add README_ar.md to i18n/README_vi.md * Add README_ar.md to i18n/README_zh-hans.md * Add README_ar.md to i18n/README_zh-hant.md * Create README_ar.md |
||
![]() |
2daf2c3eaa
|
docs: move translations to i18n (#31584)
docs: move translations to i18n |
||
![]() |
9f863d9a5b
|
README underline between badges fix (#31376)
Badge underline fix |
||
![]() |
c6bba94040
|
Remove mentions of models in the READMEs and link to the documentation page in which they are featured. (#30420)
* REAMDEs * REAMDEs v2 |
||
![]() |
ce8e64fbe2 | Dev version | ||
![]() |
005b957fb8
|
Add DBRX Model (#29921)
* wip * fix __init__.py * add docs * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * address comments 1 * work on make fixup * pass configs down * add sdpa attention * remove DbrxBlock * add to configuration_auto * docstring now passes formatting test * fix style * update READMEs * add dbrx to modeling_auto * make fix-copies generated this * add DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP * config docstring passes formatting test * rename moe_loss_weight to router_aux_loss_coef * add to flash-attn documentation * fix model-path in tests * Explicitly make `"suli"` the default `ffn_act_fn` Co-authored-by: Wing Lian <wing.lian@gmail.com> * default to using router_aux_loss_coef over ffn_config[moe_loss_weight] * fix _flash_attn_uses_top_left_mask and is_causal * fix tests path * don't use token type IDs * follow Llama and remove token_type_ids from test * init ConfigTester differently so tests pass * remove multiple choice test * remove question + answer test * remove sequence classification test * remove token classification test * copy Llama tests and remove token_type_ids from test inputs * do not test pruning or headmasking; style code * add _tied_weights_keys parameter to pass test * add type hints * fix type check * update config tester * remove masked_lm test * remove encoder tests * initialize DbrxModelTester with correct params * style * torch_dtype does not rely on torch * run make fixup, fix-copies * use https://huggingface.co/v2ray/dbrx-base-fixed/blob/main/modeling_dbrx.py * add copyright info * fix imports and DbrxRotaryEmbedding * update DbrxModel docstring * use copies * change model path in docstring * use config in DbrxFFN * fix flashattention2, sdpaattention * input config to DbrXAttention, DbrxNormAttentionNorm * more fixes * fix * fix again! * add informative comment * fix ruff? * remove print statement + style * change doc-test * fix doc-test * fix docstring * delete commented out text * make defaults match dbrx-instruct * replace `router_aux_loss_coef` with `moe_loss_weight` * is_decoder=True * remove is_decoder from configtester * implement sdpa properly * make is_decoder pass tests * start on the GenerationTesterMixin tests * add dbrx to sdpa documentation * skip weight typing test * style * initialize smaller model Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> * Add DBRX to toctree * skip test_new_cache_format * make config defaults smaller again * add pad_token_id * remove pad_token_id from config * Remove all references to DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP * Update src/transformers/models/dbrx/__init__.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/dbrx/modeling_dbrx.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update docs/source/en/model_doc/dbrx.md Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> * Update src/transformers/models/dbrx/configuration_dbrx.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update docs/source/en/model_doc/dbrx.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix typo * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * update docs, fix configuration_auto.py * address pr comments * remove is_decoder flag * slice * fix requires grad * remove grad * disconnect differently * remove grad * enable grads * patch * detach expert * nissan al ghaib * Update modeling_dbrx.py * Update src/transformers/models/dbrx/modeling_dbrx.py Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> * replace "Gemma" with "Dbrx" * remove # type: ignore * don't hardcode vocab_size * remove ToDo * Re-add removed idefics2 line * Update test to use tiny-random! * Remove TODO * Remove one more case of loading the entire dbrx-instruct in the tests * Update src/transformers/models/dbrx/modeling_dbrx.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * address some comments * small model * add dbrx to tokenization_auto * More docstrings with add_start_docstrings * Dbrx for now * add PipelineTesterMixin * Update src/transformers/models/dbrx/configuration_dbrx.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * remove flash-attn2 import error * fix docstring Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * add useage example * put on one line Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fix ffn_act_fn Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * change "dbrx" to "DBRX" for display purposes. * fix __init__.py? * fix __init__.py * fix README * return the aux_loss * remove extra spaces * fix configuration_auto.py * fix format in tokenization_auto * remove new line * add more useage examples --------- Co-authored-by: Abhi Venigalla <abhi.venigalla@databricks.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Eitan Turok <eitan.turok@databricks.com> Co-authored-by: Eitan Turok <150733043+eitanturok@users.noreply.github.com> Co-authored-by: Wing Lian <wing.lian@gmail.com> Co-authored-by: Eitan Turok <eitanturok@gmail.com> Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> Co-authored-by: Matt <rocketknight1@gmail.com> Co-authored-by: Your Name <you@example.com> Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> |
||
![]() |
3f20877da9
|
Add jamba (#29943)
* Add jamba arch * apply "make fix-copies" changes * fix link to model in JambaConfig docstring * Add n_ctx in modeling file because repo-consistency wants that * Add jamba to flash attention and sdpa documentation * mamba dt_proj quant fix now works for LoRA as well * override test_left_padding_compatibility and use a more permissive tolerance. left padding numerical difference are accentuated by mamba layers * add jamba to tokenization auto * fix comments of shape (PR #24 in the model page: https://huggingface.co/ai21labs/Jamba-v0.1/discussions/24) * simple PR fixes * remove unnecessary kwargs from JambaAttentionDecoderLayer and JambaMambaDecoderLayer * remove the LoRA hack for the mamba dt_proj bias. It was solved in huggingface/peft#1530 (https://github.com/huggingface/peft/pull/1530) * Add copied comment on JambaMLP (it's the same as MixtralMLP) * remove padding_mask warnings. It's not supported anymore * fix docstring. Float instead of int * A few more minor PR fixes * (1) lowercase names for mamba layernorms (2) remove _apply_inner_layernorms and do it directly in the forward pass * Return None attention weights from mamba layers. Append to all attentions only if not None. * remove some leftover jamba archive lists * Better separation between expert vs non-expert layers. non-expert layers return None as router_logits, and it is not concatenated to all_router_logits returned from JambaModel * no need to take router_logits at config.expert_layer_offset anymore. result.router_logits now holds results only for expert layers * Add Jamba paper on READMEs * (1) rename n_ctx -> max_position_embeddings (2) don't use it in the modeling file since it's not needed (set it as an exception to check_config_attributes) * Add copied from comment * remove the code path for apply_inner_layernorms=False. Jamba always has the inner mamba layernorms * clearer docstring for _convert_to_standard_cache * style fixes * Change calc_logits_for_entire_prompt (bool) to num_logits_to_keep (int). Adapt assisted decoding code tp use it. Also small change in low memory beam search decoding path to support this new int value in model_inputs * rename test so it still overrides what its meant to override * draft * oups * nit * remove more complexe logic * fix names used in config * fix fix fix * style * fix some more failing tests * generate did not init the cache 🙃 * more small nits * typo * config.mamba_expand * config.hidden_size for the intermediate size of the mamba shapes * fix init of pkv with torch.tensor() * empty tensor * fix some init issues * stupid changes required by generate because it does not even support it's own DynamicCache class * more fixes * fix general assisted gen cache_position bug * tests passing * Add offsets and periods as SPECIAL_CASES_TO_ALLOW in check_config_attributes.py * fix reorder_cache to reorder mamba states and override some more functions in HybridMambaAttentionDynamicCache * no need to override test_past_key_values_format() and _check_past_key_values_for_generate() in tests anymore * fix docstrings and typehints for past_key_values * style fixes * fix docs * change typehint due to copy from Mixtral * forgot import * import order * Add configuration_jamba and modeling_jamba to not_doctested because the model is too big to download (in docstring of JambaForCausalLM.forward) * Add integration test with tiny tandom Jamba model on hub * fix flash attention cache shapes * bring back forgotten hidden states * rename HybridMambaAttentionDynamicCache.seqlen_offset to has_previous_state (and make bool) and bugfix - it should be set to True after a finished forward pass of the entire model * align integration test after modeling fixes * bugfix - mamba can use precomputed states only of forward pass is on a single token * bugfix - mamba can use precomputed states only if they match the batch size * typo * remove making _prepare_4d_causal_attention_mask a leaf function * stop using past_seq_len.get_seq_length(). Use cache positions instead. Adjust test (test_decoder_model_past_with_large_inputs) accordingly --------- Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com> Co-authored-by: Joao Gante <joao@huggingface.co> |
||
![]() |
e4ea19b958
|
Add OLMo model family (#29890)
* Add OLMo using add-new-model-like with Llama * Fix incorrect tokenizer for OLMo * Copy-paste relevant OLMo methods and their imports * Add OLMo config * Modify OLMo config to follow HF conventions * Remove unneeded Llama code from OLMo model * Add ability for OLMo model to output attentions * Add OLMoPreTrainedModel and OLMoModel * Add OLMoForCausalLM * Minor fixes to OLMo model for style and missing functions * Implement OLMo tokenizer * Implement OLMo to HF conversion script * Add tests for OLMo model * Add tests for OLMo fast tokenizer * Add auto-generated dummy objects * Remove unimplemented OLMo classes from auto and init classes and re-format * Add README and associated auto-generated files * Use OLMo names for common properties * Run make fixup * Remove `|` from OLMo typing * Remove unneeded tokenization_olmo.py * Revert model, config and converter to add-new-model-like Llama * Move logic for adding bos/eos token into GPTNeoxTokenizerFast * Change OLMoConfig defaults to match OLMo-7B * Use GPTNeoXToknizerFast in OLMo tokenizer tests * Modify auto-generated OLMoModelTests to work for OLMo * Add non-parametric layer norm OLMoLayerNorm * Update weight conversion script for OLMo * Fix __init__ and auto structure for OLMo * Fix errors from make fixup * Remove OLMoTokenizerFast from documentation * Add missing 'Copied from' for OLMoModel._update_causal_mask * Run make fix-copies * Rearrange string replacements in OLMoForCausalLM Copied from * Move OLMo and Llama CausalLM.forward example into global constants * Fix OLMO_GENERATION_EXAMPLE doc string typo * Add option for qkv clipping to OLMo * Rearrange OLMoConfig kwargs in convert_olmo_weights_to_hf * Add clip_qkv to OLMoConfig in convert_olmo_weights_to_hf * Fix OLMo tokenization bug using conversion script * Keep model in full precision after conversion * Do not add eos token automatically * Update references to OLMo model in HF Hub * Do not add eos token during encoding by default * Fix Llama generation example * Run make fixup * OLMo 7B integration test fix * Remove unneeded special case for OLMoConfig * OLMo 7B Twin 2T integration test fix * Fix test_model_7b_greedy_generation * Remove test_compile_static_cache * Fix OLMo and Llama generation example * Run make fixup * Revert "OLMo 7B integration test fix" This reverts commit |
||
![]() |
6b78360e6d
|
Add Idefics2 (#30253)
* Initial add model additions * Test * All weights loading * Can perform full forward pass * Local and remote the same * Matching local and remote * Fixup * Idefics2Model importable; fixup docstrings * Don't skip by default * Remove deprecated use_resampler arg * Remove self.config * DecoupledLinear takes config * Tidy up * Enable eager attention and tidy up * Most tests passing * Update for batch of processed images * Add image processor * Update doc pages * Update conversion script * Remove erroneous breakpoint * Remove accidendtal spelling change * Update to reflect changes on hub - make generate work * Fix up * Image processor tests * Update tests * Add a processor * Add a processor * Update convert script * Update modeling file - remove fixmes * Bug fix * Add processing test * Use processor * Fix up * Update src/transformers/models/idefics2/modeling_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * Update src/transformers/models/idefics2/modeling_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * Fix test * Update config - PR comments and defaults align with checkpoint * Reviewer comments * Add copied froms for flahs attention * Update src/transformers/models/idefics2/modeling_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Remove qk_layer_norm and freeze_layers functionality * Fix * Remove freeze_layer options from config * Sync with upstream main * Fix attention shapes siglip * Remove Llava-next refs - TO REBASE * Use AutoModel for text model * Add comment to explain vision embeddings * Fix issue with tie_word_embeddings * Address review comments * Fix and fix up * Chat templates for idefics * Fix copies * Fix * Add layer norms to FA2 * Fix tests * Apply suggestions from code review Co-authored-by: Victor SANH <victorsanh@gmail.com> * Fix * Review comments * Update src/transformers/models/idefics2/modeling_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * Update inputs merger * Merge weights in correct order * Update convert script * Update src/transformers/models/idefics2/processing_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * Update template * Model code examples (fix idefics too) * More review comments * Tidy up * Update processing * Fix attention mask preparation * Update inputs_merger inputs * Vectorize inputs_merger * Update src/transformers/models/idefics2/__init__.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/idefics2/modeling_idefics2.py * Review comments * saying bye to the `qk_layer_norms` * Simplify * Update latents * Remove erroneuous readme changes * Return images when applying chat template * Fix bug - prompt images are for a single sample * Update src/transformers/models/idefics2/modeling_idefics2.py * image splitting * fix test * some more comment * some comment * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/idefics2/image_processing_idefics2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update processor * Update model tests * Update src/transformers/models/idefics2/processing_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * Update src/transformers/models/idefics2/processing_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * Don't add BOS in template * Update src/transformers/models/idefics2/processing_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * Remove index in examples * Update tests to reflect #13 * Update src/transformers/models/idefics2/processing_idefics2.py Co-authored-by: Victor SANH <victorsanh@gmail.com> * PR comment - consistent typing * Update readme and model doc * Update docs * Update checkpoint references * Update examples * Fix and update tests * Small addition * Update tests - remove copied from as no ignore placement copy could be found * Update example * small fixes * Update docs/source/en/model_doc/idefics2.md Co-authored-by: Victor SANH <victorsanh@gmail.com> * Update docs/source/en/model_doc/idefics2.md Co-authored-by: Victor SANH <victorsanh@gmail.com> * Update README.md Co-authored-by: Victor SANH <victorsanh@gmail.com> * Connector model as bridge * Fix up * Fix up * Don't pass model inputs for generation kwargs update * IDEFICS-2 -> Idefics2 * Remove config archive name * IDEFICS-2 -> Idefics2 * Add back llava-next * Update readmes * Add requirements for processor tester * Use custom convert_to_rgb to avoid possible BC * Fix doc example * Fix doc example * Skip model doc tests - as model to large * More doc example - account for image splitting * Update src/transformers/image_transforms.py * Fix config doctest --------- Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> Co-authored-by: ArthurZucker <arthur.zucker@gmail.com> Co-authored-by: Victor SANH <victorsanh@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> |
||
![]() |
b752ad3019
|
Adding grounding dino (#26087)
* Fixed typo when converting weigths to GroundingDINO vision backbone * Final modifications on modeling * Removed unnecessary class * Fixed convert structure * Added image processing * make fixup partially completed * Now text_backbone_config has its own class * Modified convert script * Removed unnecessary config attribute * Added new function to generate sub sentence mask * Renamed parameters with gamma in the name as it's currently not allowed * Removed tokenization and image_processing scripts since we'll map from existing models * Fixed some issues with configuration * Just some modifications on conversion script * Other modifications * Copied deformable detr * First commit * Added bert to model * Bert validated * Created Text and Fusion layers for Encoder * Adapted Encoder layer * Fixed typos * Adjusted Encoder * Converted encoder to hf * Modified Decoder Layer * Modified main decoder class * Removed copy comments * Fixed forward from GroundingDINOModel and GroundingDINODecoder * Added all necessary layers, configurations and forward logic up to GroundingDINOModel * Added all layers to convertion * Fixed outputs for GroundingDINOModel and GroundingDINOForObjectDetection * Fixed mask input to encoders and fixed nn.MultiheadAttention batch first and attn output * Fixed forward from GroundingDINOTextEnhancerLayer * Fixed output bug with GroundingDINODeformableLayer * Fixed bugs that prevent GroundingDINOForObjectDetection to run forward method * Fixed attentions to be passed correctly * Passing temperature arg when creating Sine position embedding * Removed copy comments * Added temperature argument for position embedding * Fixed typo when converting weigths to GroundingDINO vision backbone * Final modifications on modeling * Removed unnecessary class * Fixed convert structure * Added image processing * make fixup partially completed * Now text_backbone_config has its own class * Modified convert script * Removed unnecessary config attribute * Added new function to generate sub sentence mask * Renamed parameters with gamma in the name as it's currently not allowed * Removed tokenization and image_processing scripts since we'll map from existing models * Fixed some issues with configuration * Just some modifications on conversion script * Other modifications * Fix style * Improve fixup * Improve conversion script * Improve conversion script * Add GroundingDINOProcessor * More improvements * Return token type ids * something * Fix more tests * More improvements * More cleanup * More improvements * Fixed tests, improved modeling and config * More improvements and fixing tests * Improved tests and modeling * Improved tests and added image processor * Improved tests inference * More improvements * More test improvements * Fixed last test * Improved docstrings and comments * Fix style * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com> * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com> * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com> * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com> * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com> * Better naming * Better naming * Added Copied statement * Added Copied statement * Moved param init from GroundingDINOBiMultiHeadAttention * Better naming * Fixing clamp style * Better naming * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/grounding_dino/configuration_grounding_dino.py Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com> * Update src/transformers/models/grounding_dino/convert_grounding_dino_to_hf.py Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com> * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com> * Improving conversion script * Improved config * Improved naming * Improved naming again * Improved grouding-dino.md * Moved grounding dino to multimodal * Update src/transformers/models/grounding_dino/convert_grounding_dino_to_hf.py Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com> * Fixed docstrings and style * Fix docstrings * Remove timm attributes * Reorder imports * More improvements * Add Grounding DINO to pipeline * Remove model from check_repo * Added grounded post_process to GroundingDINOProcessor * Fixed style * Fixed GroundingDINOTextPrenetConfig docstrings * Aligned inputs.keys() when both image and text are passed with model_input_names * Added tests for GroundingDINOImageProcessor and GroundingDINOProcessor * Testing post_process_grounded_object_detection from GroundingDINOProcessor at test_inference_object_detection_head * Fixed order * Marked test with require_torch * Temporarily changed repo_id * More improvements * Fix style * Final improvements * Improve annotators * Fix style * Add is_torch_available * Remove type hints * vocab_tokens as one liner * Removed print statements * Renamed GroundingDINOTextPrenetConfig to GroundingDINOTextConfig * remove unnecessary comments * Removed unnecessary tests on conversion script * Renamed GroundingDINO to camel case GroundingDino * Fixed GroundingDinoProcessor docstrings * loading MSDA kernels in the modeling file * Fix copies * Replace nn.multiheadattention * Replace nn.multiheadattention * Fixed inputs for GroundingDinoMultiheadAttention & order of modules * Fixed processing to avoid messing with inputs * Added more tips for GroundingDino * Make style * Chaning name to align with SAM * Replace final nn.multiheadattention * Fix model tests * Update year, remove GenerationTesterMixin * Address comments * Address more comments * Rename TextPrenet to TextModel * Rename hidden_states * Address more comments * Address more comments * Address comment * Address more comments * Address merge * Address comment * Address comment * Address comment * Make style * Added layer norm eps to layer norms * Address more comments * More fixes * Fixed equivalence * Make fixup * Remove print statements * Address comments * Address comments * Address comments * Address comments * Address comments * Address comments * Add comment * Address comment * Remove overwriting of test * Fix bbox_embed * Improve decoder_bbox_embed_share * Simplify outputs * Updated post_process_grounded_object_detection * Renamed sources to feature_maps * Improved tests for Grounding Dino ImageProcessor and Processor * Fixed test requirements and imports * Fixed image_processing * Fixed processor tests * Fixed imports for image processing tests * Fix copies * Updated modeling * Fix style * Moved functions to correct position * Fixed copy issues * Update src/transformers/models/deformable_detr/modeling_deformable_detr.py Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com> * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com> * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com> * Keeping consistency custom cuda kernels for MSDA * Make GroundingDinoProcessor logic clearer * Updated Grounding DINO checkpoints * Changed tests to correct structure * Updated gpu-cpu equivalence test * fix copies * Update src/transformers/models/grounding_dino/processing_grounding_dino.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/grounding_dino/processing_grounding_dino.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/grounding_dino/configuration_grounding_dino.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Fixed erros and style * Fix copies * Removed inheritance from PreTrainedModel from GroundingDinoTextModel * Fixed GroundingDinoTextModel * Fixed type of default backbone config * Fixed missing methods for GroundingDinoTextModel and Added timm support for GroundingDinoConvEncoder * Addressed comments * Addressed batched image processing tests * Addressed zero shot test comment * Addressed tip comment * Removed GroundingDinoTextModel from check_repo * Removed inplace masking * Addressed comments * Addressed comments * Addressed comments * Fix copies * Fixing timm test * Fixed batching equivalence test * Update docs/source/en/model_doc/grounding-dino.md Co-authored-by: Tianqi Xu <40522713+dandansamax@users.noreply.github.com> * Update docs/source/en/model_doc/grounding-dino.md Co-authored-by: Tianqi Xu <40522713+dandansamax@users.noreply.github.com> * Update docs/source/en/model_doc/grounding-dino.md Co-authored-by: Tianqi Xu <40522713+dandansamax@users.noreply.github.com> * Addressed more comments * Added a new comment * Reduced image size * Addressed more comments * Nits * Nits * Changed the way text_config is initialized * Update src/transformers/models/grounding_dino/processing_grounding_dino.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: Niels <niels.rogge1@gmail.com> Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com> Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: Eduardo Pacheco <eduardo.pacheco@limehome.com> Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Tianqi Xu <40522713+dandansamax@users.noreply.github.com> |
||
![]() |
0fe44059ae
|
Add recurrent gemma (#30143)
* Fork. * RecurrentGemma initial commit. * Updating __init__.py. * Minor modification to how we initialize the cache. Changing how the config specifies the architecture. * Reformat code to 4 spaces. Fixed a few typos. * Fixed the forward pass. Still unclear on the cache? * Fixed the RecurrentGemmaForCausalLM * Minor comment that we might not need attention_mask and output_attention arguments. * Now cache should work as well. * Adding a temporary example to check whether the model generation works. * Adding the tests and updating imports. * Adding the example file missing in the previous commit. * First working example. * Removing .gitignore and reverting parts of __init__. * Re-add .gitignore. * Addressing comments for configuration. * Move mask creation to `_prepare_inputs_for_generation`. * First try at integration tests: 1. AttributeError: 'GriffinCausalLMOutput' object has no attribute 'attentions'. 2. `cache_position` not passed * Transfoering between machines. * Running normal tests. * Minor fix. * More fixes. * Addressing more comments. * Minor fixes. * first stab at cleanup * more refactoring * fix copies and else * renaming and get init to work * fix causal mask creation * update * nit * fix a hell lot of things * updates * update conversion script * make all keys importable * nits * add auto mappings * properly convert ffw_up and down * add scaling * fix generations * for recurrent dtype * update * fix going beyong window * fixup * add missing files * current updates to remove last einops * finish modeling refactor * TADA * fix compile * fix most failing testt ? ? * update tests * refactor and update * update * nits, fixup and update tests * more fixup * nits * fix imports * test format * fixups * nits * tuple typing * fix code quality * add model card * fix doc * skip most generation tests * nits * style * doc fixes * fix pr and check_copies? * last nit * oupsy * Apply suggestions from code review Co-authored-by: Lysandre Debut <hi@lysand.re> * update * Update src/transformers/models/recurrent_gemma/convert_recurrent_gemma_to_hf.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * update based on review * doc nit * fix quality * quality * fix slow test model path * update default dype * ignore attributes that can be safely ignored in check config attributes * 0lallalala come on * save nit * style * remove to dict update * make sure we can also run in float16 * style --------- Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> Co-authored-by: Aleksandar Botev <botev@google.com> Co-authored-by: Leonard Berrada <lberrada@users.noreply.github.com> Co-authored-by: anushanf <anushanf@google.com> Co-authored-by: botev <botevmg@gmail.com> Co-authored-by: Lysandre Debut <hi@lysand.re> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> |
||
![]() |
46d636818b
|
Update model card and link of blog post. (#29928)
* Update qwen2_moe.md * update link of blogpost. * fixup --------- Co-authored-by: bozheng-hit <dsoul0621@gmail.com> |
||
![]() |
1c39974a4c
|
Add Qwen2MoE (#29377)
* add support for qwen2 MoE models * update docs * add support for qwen2 MoE models * update docs * update model name & test * update readme * update class names & readme & model_doc of Qwen2MoE. * update architecture name * fix qwen2_moe tests * use Qwen2Tokenizer instead of Qwen2MoeTokenizer * update modeling_qwen2_moe.py * fix model architecture * fix qwen2_moe tests * use Qwen2Tokenizer instead of Qwen2MoeTokenizer * update modeling_qwen2_moe.py * fix model architecture * fix style * fix test when there are sparse and non sparse layers * fixup * Update README.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fixup * fixup * add archive back * add support for qwen2 MoE models * update docs * update model name & test * update readme * update class names & readme & model_doc of Qwen2MoE. * update architecture name * fix qwen2_moe tests * use Qwen2Tokenizer instead of Qwen2MoeTokenizer * update modeling_qwen2_moe.py * fix model architecture * fixup * fix qwen2_moe tests * use Qwen2Tokenizer instead of Qwen2MoeTokenizer * fix style * fix test when there are sparse and non sparse layers * fixup * add archive back * fix integration test * fixup --------- Co-authored-by: bozheng-hit <dsoul0621@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> |
||
![]() |
d91fd7f92c
|
Add LLaVa-1.6, bis (#29586)
* First draft * Fix tests, add docs * Improve docstrings * Fix test * Address comments * Address comments * Remove vocab_size attribute * Remove batch_size * Address comment * Add image processor tests * Support fx * Update docstring * Add support for 34b * Convert 34b model * Add integration tests * Update checkpoints * Convert vicuna-13b, remove doc tests * Remove script * Remove file * Address comments * Improve docstrings * Deprecate vocab_size * Remove aspect_ratio_setting * Address comments * Update READMEs * Add tips about chat templates * Fix tests * Deprecate vocab_size safely * Update tests --------- Co-authored-by: Amy Roberts <22614925+amyeroberts@users.noreply.github.com> |
||
![]() |
1248f09252 | v4.40.0.dev.0 | ||
![]() |
56baa03380
|
Implementation of SuperPoint and AutoModelForKeypointDetection (#28966)
* Added SuperPoint docs * Added tests * Removed commented part * Commit to create and fix add_superpoint branch with a new branch * Fixed dummy_pt_objects * Committed missing files * Fixed README.md * Apply suggestions from code review Fixed small changes Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Moved ImagePointDescriptionOutput from modeling_outputs.py to modeling_superpoint.py * Removed AutoModelForKeypointDetection and related stuff * Fixed inconsistencies in image_processing_superpoint.py * Moved infer_on_model logic simply in test_inference * Fixed bugs, added labels to forward method with checks whether it is properly a None value, also added tests about this logic in test_modeling_superpoint.py * Added tests to SuperPointImageProcessor to ensure that images are properly converted to grayscale * Removed remaining mentions of MODEL_FOR_KEYPOINT_DETECTION_MAPPING * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Fixed from (w, h) to (h, w) as input for tests * Removed unnecessary condition * Moved last_hidden_state to be the first returned * Moved last_hidden_state to be the first returned (bis) * Moved last_hidden_state to be the first returned (ter) * Switched image_width and image_height in tests to match recent changes * Added config as first SuperPointConvBlock init argument * Reordered README's after merge * Added missing first config argument to SuperPointConvBlock instantiations * Removed formatting error * Added SuperPoint to README's de, pt-br, ru, te and vi * Checked out README_fr.md * Fixed README_fr.md * Test fix README_fr.md * Test fix README_fr.md * Last make fix-copies ! * Updated checkpoint path * Removed unused SuperPoint doc * Added missing image * Update src/transformers/models/superpoint/modeling_superpoint.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Removed unnecessary import * Update src/transformers/models/superpoint/modeling_superpoint.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Added SuperPoint to _toctree.yml --------- Co-authored-by: steven <steven.bucaillle@gmail.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Steven Bucaille <steven.bucaille@buawei.com> |
||
![]() |
c43b380e70
|
Add MusicGen Melody (#28819)
* first modeling code * make repository * still WIP * update model * add tests * add latest change * clean docstrings and copied from * update docstrings md and readme * correct chroma function * correct copied from and remove unreleated test * add doc to toctree * correct imports * add convert script to notdoctested * Add suggestion from Sanchit Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * correct get_uncoditional_inputs docstrings * modify README according to SANCHIT feedback * add chroma to audio utils * clean librosa and torchaudio hard dependencies * fix FE * refactor audio decoder -> audio encoder for consistency with previous musicgen * refactor conditional -> encoder * modify sampling rate logics * modify license at the beginning * refactor all_self_attns->all_attentions * remove ignore copy from causallm generate * add copied from for from_sub_models * fix make copies * add warning if audio is truncated * add copied from where relevant * remove artefact * fix convert script * fix torchaudio and FE * modify chroma method according to feedback-> better naming * refactor input_values->input_features * refactor input_values->input_features and fix import fe * add input_features to docstrigs * correct inputs_embeds logics * remove dtype conversion * refactor _prepare_conditional_hidden_states_kwargs_for_generation ->_prepare_encoder_hidden_states_kwargs_for_generation * change warning for chroma length * Update src/transformers/models/musicgen_melody/convert_musicgen_melody_transformers.py Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * change way to save wav, using soundfile * correct docs and change to soundfile * fix import * fix init proj layers * remove line breaks from md * fix issue with docstrings * add FE suggestions * improve is in logics and remove useless imports * remove custom from_pretrained * simplify docstring code * add suggestions for modeling tests * make style * update converting script with sanity check * remove encoder attention mask from conditional generation * replace musicgen melody checkpoints with official orga * rename ylacombe->facebook in checkpoints * fix copies * remove unecessary warning * add shape in code docstrings * add files to slow doc tests * fix md bug and add md to not_tested * make fix-copies * fix hidden states test and batching --------- Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> |
||
![]() |
0e4a1c3401
|
Cohere Model Release (#29622)
* Cohere Model Release (#1) Cohere Model Release * Remove unnecessary files and code (#2) Some cleanup * Delete cohere-model directory (#3) * Make Fix (#5) * Pr fixes (#6) * fixes for pr * pr fixes for the format * pr fixes for the format * src/transformers/models/auto/tokenization_auto.py * Tokenizer test (#8) * tokenizer test * format fix * Adding Docs and other minor changes (#7) * Add modeling tests (#9) * Smol Fix (#11) * tokenization tests are fixed * format fixes * fix pr doc tests * fix pr doc tests * fix pr doc tests * fix pr style check * small changes in cohere.md * FIX: Address final comments for transformers integration (#13) * fix modeling final nits and add proper test file * for now leave empty tests * add integration test * push new test * fix modeling cohere (#14) * Update chat templates to use the new API (#15) --------- Co-authored-by: ahmetustun <ahmetustun89@gmail.com> Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> |
||
![]() |
1fc505b816
|
Add PvT-v2 Model (#26812)
* Added pytests for pvt-v2, all passed
* Added pvt_v2 to docs/source/end/model_doc
* Ran fix-copies and fixup. All checks passed
* Added additional ReLU for linear attention mode
* pvt_v2_b2_linear converted and working
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* PvT-v2 now works in AutoModel
* Reverted batch eval changes for PR
* Expanded type support for Pvt-v2 config
* Fixed config docstring. Added channels property
* Fixed model names in tests
* Fixed config backbone compat. Added additional type support for image size in config
* Fixed config backbone compat
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* Set key and value layers to use separate linear modules. Fixed pruning function
* Set AvgPool to 7
* Fixed issue in init
* PvT-v2 now works in AutoModel
* Successful conversion of pretrained weights for PVT-v2
* Successful conversion of pretrained weights for PVT-v2 models
* Added pytests for pvt-v2, all passed
* Ran fix-copies and fixup. All checks passed
* Added additional ReLU for linear attention mode
* pvt_v2_b2_linear converted and working
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* Set key and value layers to use separate linear modules. Fixed pruning function
* Set AvgPool to 7
* Fixed issue in init
* PvT-v2 now works in AutoModel
* Successful conversion of pretrained weights for PVT-v2
* Successful conversion of pretrained weights for PVT-v2 models
* Added pytests for pvt-v2, all passed
* Ran fix-copies and fixup. All checks passed
* Added additional ReLU for linear attention mode
* pvt_v2_b2_linear converted and working
* Reverted batch eval changes for PR
* Updated index.md
* Expanded type support for Pvt-v2 config
* Fixed config docstring. Added channels property
* Fixed model names in tests
* Fixed config backbone compat
* Ran fix-copies
* Fixed PvtV2Backbone tests
* Added TFRegNet to OBJECTS_TO_IGNORE in check_docstrings.py
* Fixed backbone stuff and fixed tests: all passing
* Ran make fixup
* Made modifications for code checks
* Remove ONNX config from configuration_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Use explicit image size dict in test_modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Make image_size optional in test_modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Remove _ntuple use in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Remove reference to fp16_enabled
* Model modules now take config as first argument even when not used
* Replaced abbreviations for "SR" and "AP" with explicit "spatialreduction" and "averagepooling"
* All LayerNorm now instantiates with config.layer_norm_eps
* Added docstring for depth-wise conv layer
* PvtV2Config now only takes Union[int, Tuple[int, int]] for image size
* Refactored PVTv2 in prep for gradient checkpointing
* Gradient checkpointing ready to test
* Removed override of _set_gradient_checkpointing
* Cleaned out old code
* Applied code fixup
* Applied code fixup
* Began debug of pvt_v2 tests
* Leave handling of num_labels to base pretrained config class
* Deactivated gradient checkpointing tests until it is fixed
* Removed PvtV2ImageProcessor which duped PvtImageProcessor
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* Set key and value layers to use separate linear modules. Fixed pruning function
* Set AvgPool to 7
* Fixed issue in init
* PvT-v2 now works in AutoModel
* Successful conversion of pretrained weights for PVT-v2
* Successful conversion of pretrained weights for PVT-v2 models
* Added pytests for pvt-v2, all passed
* Added pvt_v2 to docs/source/end/model_doc
* Ran fix-copies and fixup. All checks passed
* Added additional ReLU for linear attention mode
* pvt_v2_b2_linear converted and working
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* PvT-v2 now works in AutoModel
* Reverted batch eval changes for PR
* Expanded type support for Pvt-v2 config
* Fixed config docstring. Added channels property
* Fixed model names in tests
* Fixed config backbone compat. Added additional type support for image size in config
* Fixed config backbone compat
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* Set key and value layers to use separate linear modules. Fixed pruning function
* Set AvgPool to 7
* Fixed issue in init
* PvT-v2 now works in AutoModel
* Successful conversion of pretrained weights for PVT-v2
* Successful conversion of pretrained weights for PVT-v2 models
* Added pytests for pvt-v2, all passed
* Ran fix-copies and fixup. All checks passed
* Added additional ReLU for linear attention mode
* pvt_v2_b2_linear converted and working
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* Set key and value layers to use separate linear modules. Fixed pruning function
* Set AvgPool to 7
* Fixed issue in init
* PvT-v2 now works in AutoModel
* Successful conversion of pretrained weights for PVT-v2
* Successful conversion of pretrained weights for PVT-v2 models
* Added pytests for pvt-v2, all passed
* Ran fix-copies and fixup. All checks passed
* Added additional ReLU for linear attention mode
* pvt_v2_b2_linear converted and working
* Reverted batch eval changes for PR
* Expanded type support for Pvt-v2 config
* Fixed config docstring. Added channels property
* Fixed model names in tests
* Fixed config backbone compat
* Ran fix-copies
* Fixed PvtV2Backbone tests
* Added TFRegNet to OBJECTS_TO_IGNORE in check_docstrings.py
* Fixed backbone stuff and fixed tests: all passing
* Ran make fixup
* Made modifications for code checks
* Remove ONNX config from configuration_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Use explicit image size dict in test_modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Make image_size optional in test_modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Remove _ntuple use in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Remove reference to fp16_enabled
* Model modules now take config as first argument even when not used
* Replaced abbreviations for "SR" and "AP" with explicit "spatialreduction" and "averagepooling"
* All LayerNorm now instantiates with config.layer_norm_eps
* Added docstring for depth-wise conv layer
* PvtV2Config now only takes Union[int, Tuple[int, int]] for image size
* Refactored PVTv2 in prep for gradient checkpointing
* Gradient checkpointing ready to test
* Removed override of _set_gradient_checkpointing
* Cleaned out old code
* Applied code fixup
* Applied code fixup
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* PvT-v2 now works in AutoModel
* Ran fix-copies and fixup. All checks passed
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* PvT-v2 now works in AutoModel
* Reverted batch eval changes for PR
* Fixed config docstring. Added channels property
* Fixed config backbone compat
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* PvT-v2 now works in AutoModel
* Ran fix-copies and fixup. All checks passed
* Allowed for batching of eval metrics
* copied models/pvt to adapt to pvt_v2
* First commit of pvt_v2
* PvT-v2 now works in AutoModel
* Fixed config backbone compat
* Ran fix-copies
* Began debug of pvt_v2 tests
* Leave handling of num_labels to base pretrained config class
* Deactivated gradient checkpointing tests until it is fixed
* Removed PvtV2ImageProcessor which duped PvtImageProcessor
* Fixed issue from rebase
* Fixed issue from rebase
* Set tests for gradient checkpointing to skip those using reentrant since it isn't supported
* Fixed issue from rebase
* Fixed issue from rebase
* Changed model name in docs
* Removed duplicate PvtV2Backbone
* Work around type switching issue in tests
* Fix model name in config comments
* Update docs/source/en/model_doc/pvt_v2.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Changed name of variable from 'attn_reduce' to 'sr_type'
* Changed name of variable from 'attn_reduce' to 'sr_type'
* Changed from using 'sr_type' to 'linear_attention' for clarity
* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
Removed old code
* Changed from using 'sr_type' to 'linear_attention' for clarity
* Fixed Class names to be more descriptive
* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
Removed outdated code
* Moved paper abstract to single line in pvt_v2.md
* Added usage tips to pvt_v2.md
* Simplified module inits by passing layer_idx
* Fixed typing for hidden_act in PvtV2Config
* Removed unusued import
* Add pvt_v2 to docs/source/en/_toctree.yml
* Updated documentation in docs/source/en/model_doc/pvt_v2.md to be more comprehensive.
* Updated documentation in docs/source/en/model_doc/pvt_v2.md to be more comprehensive.
* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
Move function parameters to single line
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
Update year of copyright to 2024
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
Make code more explicit
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Updated sr_ratio to be more explicit spatial_reduction_ratio
* Removed excess type hints in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Move params to single line in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Removed needless comment in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update copyright date in pvt_v2.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Moved params to single line in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Updated copyright date in configuration_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Cleaned comments in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Renamed spatial_reduction Conv2D operation
* Revert "Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
"
This reverts commit
|
||
![]() |
dd1c905215
|
[Docs] Fix FastSpeech2Conformer model doc links (#29574)
[Docs] Fix FastSpeech2Conformer links |
||
![]() |
fb1c62e973
|
[Add Mamba ] Adds support for the Mamba models (#28094)
* initial-commit
* start cleaning
* small nits
* small nits
* current updates
* add kernels
* small refactoring little step
* add comments
* styling
* nit
* nits
* Style
* Small changes
* Push dummy mambda simple slow
* nit
* Use original names
* Use original names and remove norm
* Updates for inference params
* Style nd updates
* nits
* Match logits
* Add a test
* Add expected generated text
* nits doc, imports and styling
* style
* oups
* dont install kernels, invite users to install the required kernels
* let use use the original packages
* styling
* nits
* fix some copieds
* update doc
* fix-copies
* styling done
* nits
* fix import check
* run but wrong cuda ress
* mamba CUDA works :)
* fix the fast path
* config naming nits
* conversion script is not required at this stage
* finish fixing the fast path: generation make sense now!
* nit
* Let's start working on the CIs
* style
* better style
* more nits
* test nit
* quick fix for now
* nits
* nit
* nit
* nit
* nits
* update test rest
* fixup
* update test
* nit
* some fixes
* nits
* update test values
* fix styling
* nit
* support peft
* integrations tests require torchg
* also add slow markers
* styling
* chose forward wisely
* nits
* update tests
* fix gradient checkpointing
* fixup
* nit
* fix doc
* check copies
* fix the docstring
* fix some more tests
* style
* fix beam search
* add init schene
* update
* nit
* fix
* fixup the doc
* fix the doc
* fixup
* tentative update but slow is no longer good
* nit
* should we always use float32?
* nits
* revert wrong changes
* res in float32
* cleanup
* skip fmt for now
* update generation values
* update test values running original model
* fixup
* update tests + rename inference_params to cache_params + make sure training does not use cache_params
* small nits
* more nits
* fix final CIs
* style
* nit doc
* I hope final doc nits
* nit
* 🫠
* final touch!
* fix torch import
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <hi@lysand.re>
* Apply suggestions from code review
* fix fix and fix
* fix base model prefix!
* nit
* Update src/transformers/models/mamba/__init__.py
* Update docs/source/en/model_doc/mamba.md
Co-authored-by: Lysandre Debut <hi@lysand.re>
* nit
---------
Co-authored-by: Lysandre Debut <hi@lysand.re>
|
||
![]() |
ebccb09169
|
[docs] Update starcoder2 paper link (#29418)
Update starcoder2 paper link |
||
![]() |
836921fdeb
|
Add UDOP (#22940)
* First draft
* More improvements
* More improvements
* More fixes
* Fix copies
* More improvements
* More fixes
* More improvements
* Convert checkpoint
* More improvements, set up tests
* Fix more tests
* Add UdopModel
* More improvements
* Fix equivalence test
* More fixes
* Redesign model
* Extend conversion script
* Use real inputs for conversion script
* Add image processor
* Improve conversion script
* Add UdopTokenizer
* Add fast tokenizer
* Add converter
* Update README's
* Add processor
* Add fully fledged tokenizer
* Add fast tokenizer
* Use processor in conversion script
* Add tokenizer tests
* Fix one more test
* Fix more tests
* Fix tokenizer tests
* Enable fast tokenizer tests
* Fix more tests
* Fix additional_special_tokens of fast tokenizer
* Fix tokenizer tests
* Fix more tests
* Fix equivalence test
* Rename image to pixel_values
* Rename seg_data to bbox
* More renamings
* Remove vis_special_token
* More improvements
* Add docs
* Fix copied from
* Update slow tokenizer
* Update fast tokenizer design
* Make text input optional
* Add first draft of processor tests
* Fix more processor tests
* Fix decoder_start_token_id
* Fix test_initialization
* Add integration test
* More improvements
* Improve processor, add test
* Add more copied from
* Add more copied from
* Add more copied from
* Add more copied from
* Remove print statement
* Update README and auto mapping
* Delete files
* Delete another file
* Remove code
* Fix test
* Fix docs
* Remove asserts
* Add doc tests
* Include UDOP in exotic model tests
* Add expected tesseract decodings
* Add sentencepiece
* Use same design as T5
* Add UdopEncoderModel
* Add UdopEncoderModel to tests
* More fixes
* Fix fast tokenizer
* Fix one more test
* Remove parallelisable attribute
* Fix copies
* Remove legacy file
* Copy from T5Tokenizer
* Fix rebase
* More fixes, copy from T5
* More fixes
* Fix init
* Use ArthurZ/udop for tests
* Make all model tests pass
* Remove UdopForConditionalGeneration from auto mapping
* Fix more tests
* fixups
* more fixups
* fix the tokenizers
* remove un-necessary changes
* nits
* nits
* replace truncate_sequences_boxes with truncate_sequences for fix-copies
* nit current path
* add a test for input ids
* ids that we should get taken from
|
||
![]() |
63caa370e6
|
Starcoder2 model - bis (#29215)
* Copy model * changes * misc * fixes * add embed and residual dropout (#30) * misc * remove rms norm and gated MLP * remove copied mentions where its not a copy anymore * remove unused _shape * copied from mistral instead * fix copies * fix copies * add not doctested * fix * fix copyright * Update docs/source/en/model_doc/starcoder2.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/starcoder2/configuration_starcoder2.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/starcoder2/configuration_starcoder2.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix doc * revert some changes * add fa2 tests * fix styling nit * fix * push dummy docs --------- Co-authored-by: Joel Lamy-Poirier <joel.lamy-poirier@servicenow.com> Co-authored-by: younesbelkada <younesbelkada@gmail.com> Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> |
||
![]() |
3fcfbe7549
|
Adding SegGPT (#27735)
* First commit * Improvements * More improvements * Converted original checkpoint to HF checkpoint * Fix style * Fixed forward * More improvements * More improvements * Update src/transformers/models/seggpt/modeling_seggpt.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Remove asserts * Remove unnecessary attributes * Changed model name to camel case * Improve forward doc * Improve tests * More improvements * Fix copies * Fix doc * Make SegGptImageProcessor more flexible * Added few-shot test * Fix style * Update READMEs and docs * Update READMEs * Make inputs required * Add SegGptForImageSegmentation * Make tests pass * Rename to out_indicies * Update src/transformers/models/seggpt/image_processing_seggpt.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/seggpt/image_processing_seggpt.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Fixed naming convention * Copying SegGptMlp from modeling_sam.py * Some minor improvements * Remove mlp_ratio * Fix docstrings * Fixed docstring match * Objects defined before use * Storing only patch_size and beta for SegGptLoss * removed _prepare_inputs method * Removed modified from headers * Renamed to output_indicies * Removed unnecessary einsums * Update tests/models/seggpt/test_modeling_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/seggpt/test_modeling_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/seggpt/test_modeling_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/seggpt/image_processing_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/seggpt/image_processing_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/seggpt/image_processing_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/seggpt/modeling_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/seggpt/modeling_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Fixing issues * Raise error as soon as possible * More fixes * Fix merge * Added palette to SegGptImageProcessor * Fixed typo * Fixed shape typo * Added permute before doing palette to class mapping * Fixed style * Fixed and added tests * Fixed docstrings * Matching SegFormer API for post_processing_semantic_segmentation * Fixed copies * Fixed SegGptImageProcessor to handle both binary and RGB masks * Updated docstrings of SegGptImageProcessor * Update src/transformers/models/seggpt/image_processing_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/seggpt.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/seggpt/configuration_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/seggpt/convert_seggpt_to_hf.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/seggpt/image_processing_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/seggpt/modeling_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/seggpt/image_processing_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/seggpt/image_processing_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/seggpt/image_processing_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/seggpt/modeling_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/seggpt/test_image_processing_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/seggpt/test_modeling_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/seggpt/modeling_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/seggpt/modeling_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/seggpt/modeling_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Object definitions above & fix style * Renamed output_indices to intermediate_feature_indices * Removed unnecessary check on bool_masked_pos * Loss first in the outputs * Added validation for do_normalize * Improved SegGptImageProcessor and added new tests * Added comment * Added docstrings to SegGptLoss * Reimplemented ensemble condition logic in SegGptEncoder * Update src/transformers/models/seggpt/__init__.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/seggpt/modeling_seggpt.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/seggpt/convert_seggpt_to_hf.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/seggpt/configuration_seggpt.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Updated docstrings to use post_process_semantic_segmentation * Fixed typo on docstrings * moved pixel values test to test_image_processing_seggpt * Addressed comments * Update src/transformers/models/seggpt/configuration_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/seggpt/image_processing_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/seggpt/configuration_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/seggpt/modeling_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Updated docstrings for SegGptLoss * Address comments * Added SegGpt example to model docs * Update src/transformers/models/seggpt/modeling_seggpt.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * moved patchify and unpatchify * Rename checkpoint * Renamed intermediate_features to intermediate_hidden_states for consistency * Update src/transformers/models/seggpt/configuration_seggpt.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Replaced post_process_masks for post_process_semantic_segmentation in the docs --------- Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: Niels <niels.rogge1@gmail.com> Co-authored-by: Eduardo Pacheco <eduardo.pacheco@limehome.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> |
||
![]() |
c29135046a
|
[i18n-vi] Translate README.md to Vietnamese (#29229)
* Add Tiếng Việt language support * Add Vietnamese translation link to README.md * update README_vi.md |
||
![]() |
594c1277b2
|
[ gemma ] Adds support for Gemma 💎 (#29167)
* inital commit * update * update conversion checkpoint * update conversion script * nits * some fixes * nits * merge * fix permute * nits * fix * nits * nits * nits * fix rope * fix both rope * nites * style * make sure flax works * fix flax init code * fix foward * nits * print flax generation out * current code * nits * SIIIIIIIIIIIIIIIIIII * update * add new tokenizer * correct fast tokenizer * fix conversion * more comments * fix modeling and conversion * nits and nits * nits testing * add some tokenization tests * add some edge cases * add slow tests and fix them * fixup * fix copies for modeling * fix copies * add 7B slow tests * fix * fix * fix tests * make tokenizer cis go green * styling * last tokenizer nits * update jax tests * fix flax for 7b * add jit testing 🤗 * cleanups * isolated nit, inv_freq for rotary_emb.inv_freq * propagate to jax * Apply suggestions from code review Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * adjust test * fix conversion script * change name * correct file names * update conversion script * Fix bos and eos token ids in the model configuration (#3) * update modelling * update conversion script * add static cache for gemma * fix sdpa generate * fix batched * multiple fixes * fix FA2 * final fix * Rename a few missing strings and filenames (#4) * merge with upstream main * fix copies * fix copies * fix fixup * fix fixup * fix * fix * final tests * fix fx gemma tests * fix fx bf16/fp16 tests * update slow fx tests * fx slow tests: one logits, one generation * move jit test standalone * Apply suggestions from code review * nits * tokenizer updates * more tokenization updates: custom GemmaSentencepieceExtrator * style * Update src/transformers/cache_utils.py * Update src/transformers/models/gemma/__init__.py * Update tests/models/gemma/test_modeling_flax_gemma.py * small nits * style * update tokenization test * fix the rotary embedding * with style * fix slow tests * WARNING this commit might be very important for precisions * Update tests/models/gemma/test_modeling_flax_gemma.py * Update src/transformers/models/gemma/configuration_gemma.py Co-authored-by: Lysandre Debut <hi@lysand.re> * Update src/transformers/models/gemma/modeling_flax_gemma.py Co-authored-by: Lysandre Debut <hi@lysand.re> * small nits here and there! * forgotten nit * remove on the fly computation of inv_freq * revert previous change, let's be safe and for now re-compute freq cis to make sure it's in float * Apply suggestions from code review Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update src/transformers/models/gemma/convert_gemma_weights_to_hf.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update src/transformers/models/gemma/convert_gemma_weights_to_hf.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_flax_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_tokenization_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_tokenization_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_tokenization_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_tokenization_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * nit conversion script link * fix some tests * add not doctest and pr doctest * repo consistency * fix last CIs 🚀 * update all readmes --------- Co-authored-by: younesbelkada <younesbelkada@gmail.com> Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by: sanchit-gandhi <sanchit@huggingface.co> Co-authored-by: Lysandre Debut <hi@lysand.re> |
||
![]() |
1a77f07f65 | v4.39.dev.0 | ||
![]() |
f497f564bb
|
Update all references to canonical models (#29001)
* Script & Manual edition * Update |
||
![]() |
de6029a059
|
Add StableLM (#28810)
* Add `StableLM` * fix(model): re-create from `huggingface-cli add-new-model-like persimmon` * fix: re-add changes to address comments * fix(readme): add links to paper * fix(tokenization_auto): remove `GPTNeoXTokenizerFastFast` ref * fix(tests): re-add `@slow` decorator to integration tests * fix(tests): import slow... * fix(readme_hd): remove whitespace edit * fix(tokenizer): auto tokenizer tuple * skip doctests for `modeling_stablelm` |
||
![]() |
ef5ab72f4b
|
[Docs] Update README and default pipelines (#28864)
* Update README and docs * Update README * Update README |
||
![]() |
58e3d23e97
|
[i18n-de] Translate README.md to German (#28933)
* Translate README.md to German * Add links to README_de.md * Remove invisible characters in README * Change to a formal tone and fix punctuation marks |
||
![]() |
1c31b7aa3b
|
[Docs] Add missing language options and fix broken links (#28852)
* Add missing entries to the language selector * Add links to the Colab and AWS Studio notebooks for ONNX * Use anchor links in CONTRIBUTING.md * Fix broken hyperlinks due to spaces * Fix links to OpenAI research articles * Remove confusing footnote symbols from author names, as they are also considered invalid markup |
||
![]() |
cd2eb8cb2b
|
Add French translation: french README.md (#28696)
* doc: french README Signed-off-by: ThibaultLengagne <thibaultl@padok.fr> * doc: Add Depth Anything Signed-off-by: ThibaultLengagne <thibaultl@padok.fr> * doc: Add french link in other docs Signed-off-by: ThibaultLengagne <thibaultl@padok.fr> * doc: Add missing links in fr docs * doc: fix several mistakes in translation Signed-off-by: ThibaultLengagne <thibaultl@padok.fr> --------- Signed-off-by: ThibaultLengagne <thibaultl@padok.fr> Co-authored-by: Sarapuce <alexandreh@padok.fr> |
||
![]() |
963db81a5a
|
Add Depth Anything (#28654)
* First draft * More improvements * More improvements * More improvements * More improvements * Add docs * Remove file * Add copied from * Address comments * Address comments * Address comments * Fix style * Update docs * Convert all checkpoints, add integration test * Rename checkpoints * Add pretrained backbone attributes * Fix default config * Address comment * Add figure to docs * Fix bug thanks to @xenova * Update conversion script * Fix integration test |
||
![]() |
b2748a6efd | v4.38.dev.0 | ||
![]() |
d2cdefb9ec
|
Add new meta w2v2-conformer BERT-like model (#28165)
* first commit * correct default value non causal * update config and modeling code * update converting checkpoint * clean modeling and fix tests * make style * add new config parameters to docstring * fix copied from statements * Apply suggestions from code review Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * make position_embeddings_type docstrings clearer * clean converting script * remove function not used * clean modeling file * apply suggestion for test file + add convert script to not_doctested * modify tests according to review - cleaner logic and more tests * Apply nit suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * add checker of valid position embeddings type * instantiate new layer norm layer with the right eps * fix freeze_feature_encoder since it can be None in some cases * add test same output in convert script * restore wav2vec2conformer and add new model * create processor and FE + clean * add new model code * fix convert script and set default config parameters * correct model id paths * make style * make fix-copies and cleaning files * fix copied from statements * complete .md and fixe copies * clean convert script argument defaults * fix config parameters docstrings * fix config docstring * add copied from and enrich FE tests * fix copied from and repo-consistency * add autotokenizer * make test input length shorter and change docstring code * fix docstrings and copied from * add add_adapter to ASR training example * make testing of adapters more robust * adapt to multi adapter layers * refactor input_values->input_features and remove w2v2-bert feature extractor * remove pretraining model * remove depreciated features and useless lines * add copied from and ignore statements to modeling tests * remove pretraining model #2 * change import in convert script * change default in convert script * update readme and remove useless line * Update tests/models/wav2vec2_bert/test_processor_wav2vec2_bert.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * refactor BERT to Bert for consistency * remove useless ignore copy statement * add persistent to buffer in rotary * add eps in LayerNorm init and remove copied from * add adapter activation parameters and add copied from statements * Fix copied statements and add unitest.skip reasons * add copied statement in test_processor * refactor processor * make style * replace numpy random by torch rand * remove expected output CTC * improve converting script with processor class * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * remove gumbel class * remove tests related to previously deleted class * Update src/transformers/models/wav2vec2_bert/configuration_wav2vec2_bert.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * correct typos * remove uused parameters * update processor to takes both text and audio * update checkpoints * update expected output and add ctc expected output * add label_attention_mask * replace pt with np in processor tests * fix typo * revert to behaviour with labels_attention_mask --------- Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> |
||
![]() |
d6ffe74dfa
|
Add qwen2 (#28436)
* add config, modeling, and tokenization * add auto and init * update readme * update readme * update team name * fixup * fixup * update config * update code style * update for fixup * update for fixup * update for fixup * update for testing * update for testing * fix bug for config and tokenization * fix bug for bos token * not doctest * debug tokenizer * not doctest * debug tokenization * debug init for tokenizer * fix style * update init * delete if in token auto * add tokenizer doc * add tokenizer in init * Update dummy_tokenizers_objects.py * update * update * debug * Update tokenization_qwen2.py * debug * Update convert_slow_tokenizer.py * add copies * add copied from and make style * update files map * update test * fix style * fix merge reading and update tests * fix tests * fix tests * fix style * debug a variable in readme * Update src/transformers/models/qwen2/configuration_qwen2.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * update test and copied from * fix style * update qwen2 tokenization and tests * Update tokenization_qwen2.py * delete the copied from after property * fix style * update tests * update tests * add copied from * fix bugs * update doc * add warning for sliding window attention * update qwen2 tokenization * fix style * Update src/transformers/models/qwen2/modeling_qwen2.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix tokenizer fast --------- Co-authored-by: Ren Xuancheng <jklj077@users.noreply.github.com> Co-authored-by: renxuancheng.rxc <renxuancheng.rxc@alibaba-inc.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> |
||
![]() |
59cd9de39d
|
Byebye torch 1.10 (#28207)
* fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> |
||
![]() |
ffd3710391
|
Fix number of models in README.md (#28430) | ||
![]() |
3b742ea84c
|
Add SigLIP (#26522)
* Add first draft * Use appropriate gelu function * More improvements * More improvements * More improvements * Convert checkpoint * More improvements * Improve docs, remove print statements * More improvements * Add link * remove unused masking function * begin tokenizer * do_lower_case * debug * set split_special_tokens=True * Remove script * Fix style * Fix rebase * Use same design as CLIP * Add fast tokenizer * Add SiglipTokenizer to init, remove extra_ids * Improve conversion script * Use smaller inputs in conversion script * Update conversion script * More improvements * Add processor to conversion script * Add tests * Remove print statements * Add tokenizer tests * Fix more tests * More improvements related to weight initialization * More improvements * Make more tests pass * More improvements * More improvements * Add copied from * Add canonicalize_text * Enable fast tokenizer tests * More improvements * Fix most slow tokenizer tests * Address comments * Fix style * Remove script * Address some comments * Add copied from to tests * Add more copied from * Add more copied from * Add more copied from * Remove is_flax_available * More updates * Address comment * Remove SiglipTokenizerFast for now * Add caching * Remove umt5 test * Add canonicalize_text inside _tokenize, thanks Arthur * Fix image processor tests * Skip tests which are not applicable * Skip test_initialization * More improvements * Compare pixel values * Fix doc tests, add integration test * Add do_normalize * Remove causal mask and leverage ignore copy * Fix attention_mask * Fix remaining tests * Fix dummies * Rename temperature and bias * Address comments * Add copied from to tokenizer tests * Add SiglipVisionModel to auto mapping * Add copied from to image processor tests * Improve doc * Remove SiglipVisionModel from index * Address comments * Improve docs * Simplify config * Add first draft * Make it like mistral * More improvements * Fix attention_mask * Fix output_attentions * Add note in docs * Convert multilingual model * Convert large checkpoint * Convert more checkpoints * Add pipeline support, correct image_mean and image_std * Use padding=max_length by default * Make processor like llava * Add code snippet * Convert more checkpoints * Set keep_punctuation_string=None as in OpenCLIP * Set normalized=False for special tokens * Fix doc test * Update integration test * Add figure * Update organization * Happy new year * Use AutoModel everywhere --------- Co-authored-by: patil-suraj <surajp815@gmail.com> |
||
![]() |
5d36025ca1
|
README: install transformers from conda-forge channel (#28313)
Switch to the conda-forge channel for transformer installation, as the huggingface channel does not offer the latest version. Fixes #28248 |
||
![]() |
d83ff5eeff
|
Add FastSpeech2Conformer (#23439)
* start - docs, SpeechT5 copy and rename * add relevant code from FastSpeech2 draft, have tests pass * make it an actual conformer, demo ex. * matching inference with original repo, includes debug code * refactor nn.Sequentials, start more desc. var names * more renaming * more renaming * vocoder scratchwork * matching vocoder outputs * hifigan vocoder conversion script * convert model script, rename some config vars * replace postnet with speecht5's implementation * passing common tests, file cleanup * expand testing, add output hidden states and attention * tokenizer + passing tokenizer tests * variety of updates and tests * g2p_en pckg setup * import structure edits * docstrings and cleanup * repo consistency * deps * small cleanup * forward signature param order * address comments except for masks and labels * address comments on attention_mask and labels * address second round of comments * remove old unneeded line * address comments part 1 * address comments pt 2 * rename auto mapping * fixes for failing tests * address comments part 3 (bart-like, train loss) * make style * pass config where possible * add forward method + tests to WithHifiGan model * make style * address arg passing and generate_speech comments * address Arthur comments * address Arthur comments pt2 * lint changes * Sanchit comment * add g2p-en to doctest deps * move up self.encoder * onnx compatible tensor method * fix is symbolic * fix paper url * move models to espnet org * make style * make fix-copies * update docstring * Arthur comments * update docstring w/ new updates * add model architecture images * header size * md wording update * make style |
||
![]() |
3ed3e3190c | Dev version | ||
![]() |
c7f076a00e
|
Adds VIP-llava to transformers (#27932)
* v1 * add-new-model-like * revert * fix forward and conversion script * revert * fix copies * fixup * fix * Update docs/source/en/index.md * Apply suggestions from code review * push * fix * fixes here and there * up * fixup and fix tests * Apply suggestions from code review * add docs * fixup * fixes * docstring * add docstring * fixup * docstring * fixup * nit * docs * more copies * fix copies * nit * update test |
||
![]() |
accccdd008
|
[Add Mixtral ] Adds support for the Mixtral MoE (#27942)
* up * up * test * logits ok * up * up * few fixes * conversion script * up * nits * nits * update * nuke * more updates * nites * fix many issues * nit * scatter * nit * nuke megablocks * nits * fix conversion script * nit * remove * nits * nit * update * oupsssss * change * nits device * nits * fixup * update * merge * add copied from * fix the copy mentions * update tests * more fixes * nits * conversion script * add parts of the readme * Update tests/models/mixtral/test_modeling_mixtral.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * new test + conversion script * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Apply suggestions from code review * fix * fix copies * fix copies * ooops * fix config * Apply suggestions from code review * fix nits * nit * add copies * add batched tests * docs * fix flash attention * let's add more verbose * add correct outputs * support router ouptus * ignore copies where needed * fix * cat list if list is given for now * nits * Update docs/source/en/model_doc/mixtral.md * finish router refactoring * fix forward * fix expected values * nits * fixup * fix * fix bug * fix * fix dtype mismatch * fix * grrr grrr I support item assignment * fix CI * docs * fixup * remove some copied form * fix weird diff * skip doctest fast on the config and modeling * mark that is supports flash attention in the doc * update * Update src/transformers/models/mixtral/modeling_mixtral.py Co-authored-by: Lysandre Debut <hi@lysand.re> * Update docs/source/en/model_doc/mixtral.md Co-authored-by: Lysandre Debut <hi@lysand.re> * revert router logits config issue * update doc accordingly * Update src/transformers/models/mixtral/convert_mixtral_weights_to_hf.py * nits * use torch testing asssert close * fixup * doc nits --------- Co-authored-by: younesbelkada <younesbelkada@gmail.com> Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by: Lysandre Debut <hi@lysand.re> |
||
![]() |
7ea21f1f03
|
[LLaVa] Some improvements (#27895)
* More improvements * Improve variable names * Update READMEs, improve docs |
||
![]() |
44b5506d29
|
[Llava ] Add Llava to transformers (#27662)
* add model like
* logits match
* minor fixes
* fixes
* up
* up
* add todo
* llava processor
* keep the processor simple
* add conversion script
* fixup
* fix copies
* up
* add to index
* fix config + logits
* fix
* refactor
* more refactor
* more refactor
* fix copies
* add authors
* v1 tests
* add `LlavaProcessor` in init
* remove unneeded import
* up
* up
* docs
* up
* fix CI
* fix CI
* add attention mask in test
* make fixup
* remove the vision model
* that' s the dirty way to do it
* nits
* nits
* updates
* add more tests
* add input tests
* fixup
* more styling
* nits
* updates amd cleanup
* fixup the generation expected results
* fix the testing script
* some cleanup and simplification which does not work yet but almost there!
* make correct dispatch operations
* vectorize works for batch of images and text
* last todos
* nits
* update test and modeling code
* remove useless function for now
* fix few issues
* fix generation
* some nits
* add bakllava
* nits
* remove duplicated code
* finis merge
* cleanup
* missed this line
* fill the todos
* add left padding offset
* add left and rignt padding logic
* bool to properly index
* make sure
* more cleanups
* batch is fixed 😉
* add correct device for tensor creation
* fix some dtype missmatch
* ruff
* update conversion script
* Update src/transformers/__init__.py
* fa 2 support + fix conversion script
* more
* correct reshaping
* fix test dict
* fix copies by ignoring
* fix nit
* skip clip vision model
* fixup
* fixup
* LlavaForVisionText2Text -> LlavaForCausalLM
* update
* fix
* raise correct errors
* fix
* docs
* nuke for now
* nits here and there
* fixup
* fix remaining tests
* update LlavaForConditionalGeneration instead of CausalLM
* fixups
* pipeline support
* slow and piepline tests
* supports batch
* nits
* cleanup
* fix first integration tests
* add pad token where needed
* correct etsts
* fixups
* update pipeline testr
* fix quality
* nits
* revert unneeded change
* nit
* use BatchFeature
* from ...feature_extraction_utils import BatchFeature
* nits
* nits
* properly update
* more f*** nits
* fix copies
* comment
* keep slow test slow
* Update src/transformers/models/llava/processing_llava.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add piepline example
* add pixel values in docstrign
* update pr doctest
* fix
* fix slow tests
* remove hack
* fixup
* small note
* forward contrib credits from PR25789
* forward contrib credits from original implementation and work
* add arthur
* Update src/transformers/models/llava/processing_llava.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
* update docstring
* nit
* move to not doctested because of timeout issues
* fixup
* add description
* more
* fix-copies
* fix docs
* add beam search
* add more comments
* add typehints on processor
* add speedup plot
* update slow tests and docs
* push test
* push batched test
* fix batched generation with different number of images
* remove benchmark due to a bug
* fix test
* fix copies
* add gcolab demo
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: shauray8 <shauray8@users.noreply.github.com>
Co-authored-by: haotian-liu <haotian-liu@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
|
||
![]() |
b242d0f297
|
[Time series] Add PatchTSMixer (#26247)
* patchtsmixer initial commit * x,y->context_values,target_values, unittest addded * cleanup code * minor * return hidden states * model tests, partial integration tests * ettm notebook temporary * minor * config mask bug fix, tests updated * final ETT notebooks * add selfattn * init * added docstrings * PatchTSMixerForPretraining -> PatchTSMixerForMaskPretraining * functionality tests added * add start and input docstrings * docstring edits * testcase edits * minor changes * docstring error fixed * ran make fixup * finalize integration tests and docs * minor * cleaned gitignore * added dataclass decorator, ran black formatter * ran ruff * formatting * add slow decorator * renamed in_Channel to input_size and default to 1 * shorten dataclass names * use smaller model for testing * moved the 3 heads to the modeling file * use scalers instead of revin * support forecast_channel_indices * fix regression scaling * undo reg. scaling * removed unneeded classes * forgot missing * add more layers * add copied positional_encoding * use patchmask from patchtst * removed dependency on layers directory * formatting * set seed * removed unused imports * fixed forward signature test * adding distributional head for PatchTSMixerForecasting * add generate to forecast * testcases for generate * add generate and distributional head for regression * raise Exception for negative values for neg binominal distribution * formatting changes * remove copied from patchtst and add TODO for test passing * make copies * doc edits * minor changes * format issues * minor changes * minor changes * format docstring * change some class names to PatchTSMixer + class name Transpose to PatchTSMixerTranspose GatedAttention to PatchTSMixerGatedAttention * change NormLayer to PatchTSMixerNormLayer * change MLP to PatchTSMixerMLP * change PatchMixer to PatchMixerBlock, FeatureMixer to FeatureMixerBlock * change ChannelFeatureMixer to ChannelFeatureMixerBlock * change PatchMasking to PatchTSMixerMasking * change Patchify to PatchTSMixerPatchify * list to `list` * fix docstrings * formatting * change bs to batch_size, edit forecast_masking * edit random_masking * change variable name and update docstring in PatchTSMixerMasking * change variable name and update docstring in InjectScalerStatistics4D * update forward call in PatchTSMixerTranspose * change variable name and update docstring in PatchTSMixerNormLayer * change variable name and update docstring in PatchTSMixerMLP * change variable name and update docstring in ChannelFeatureMixerBlock * formatting * formatting issues * docstring issue * fixed observed_mask type in docstrings * use FloatTensor type * formatting * fix rescaling issue in forecasting, fixed integration tests * add docstring from decorator * fix docstring * Update README.md Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/patchtsmixer/configuration_patchtsmixer.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/patchtsmixer/modeling_patchtsmixer.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/patchtsmixer/configuration_patchtsmixer.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/models/patchtsmixer/modeling_patchtsmixer.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * PatchTSMixerChannelFeatureMixerBlock * formatting * ForPretraining * use num_labels instead of n_classes * remove commented out code * docstring fixed * nn.functional used instead of one letter F * x_tmp renamed * one letter variable x removed from forward calls * one letter variable y removed * remove commented code * rename patch_size, in_channels, PatchTSMixerBackbone * add config to heads * add config to heads tests * code reafactoring to use config instead of passing individual params * Cdocstring fixes part 1 * docstring fixes part 2 * removed logger.debug * context_values -> past_values * formatting changes * pe -> positional_encoding * removed unused target variable * self.mode logic fixed * formatting change * edit docstring and var name * change n_targets to num_targets * rename input_size to num_input_channels * add head names with prefix PatchTSMixer * edit docstring in PatchTSMixerForRegression * fix var name change in testcases * add PatchTSMixerAttention * return dict for all exposed classes, test cases added * format * move loss function to forward call * make style * adding return dict/tuple * make repo-consistency * remove flatten mode * code refactoring * rename data * remove PatchTSMixer and keep only PatchTSMixerEncoder * docstring fixes * removed unused code * format * format * remove contiguous and formatting changes * remove model description from config * replace asserts with ValueError * remove nn.Sequential from PatchTSMixerNormLayer * replace if-else with map * remove all nn.Sequential * format * formatting * fix gradient_checkpointing error after merge, and formatting * make fix-copies * remove comments * reshape * doesnt support gradient checkpointing * corect Patchify * masking updates * batchnorm copy from * format checks * scaler edits * remove comments * format changes * remove self.config * correct class PatchTSMixerMLP(nn.Module): * makr fix * doc updates * fix-copies * scaler class correction * doc edits * scaler edits * update readme with links * injectstatistics add * fix-copies * add norm_eps option to LayerNorm * format changes * fix copies * correct make copies * use parametrize * fix doc string * add docs to toctree * make style * doc segmenting * docstring edit * change forecast to prediction * edit doc * doc edits * remove PatchTSMixerTranspose * add PatchTSMixerPositionalEncoding and init position_enc * remove positional_encoding * edit forecast_masking, remove forecast_mask_ratios * fix broken code * var rename target_values -> future_values * num_features -> d_model * fix broken code after master merge * repo consistency * use postional embedding * prediction_logits -> prediction_outputs, make fix-copies * uncommented @slow * minor changes * loss first in tuple * tuple and dict same ordering * style edits * minor changes * dict/tuple consistent enablement * Update src/transformers/models/patchtsmixer/modeling_patchtsmixer.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update tests/models/patchtsmixer/test_modeling_patchtsmixer.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/patchtsmixer/modeling_patchtsmixer.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix formatting * formatting * usage tip * test on cpu only * add sample usage * change PatchTSMixerForClassification to PatchTSMixerForTimeSeriesClassification * push changes * fix copies * std scaling set to default True case * minor changes * stylechanges --------- Co-authored-by: Arindam Jati <arindam.jati@ibm.com> Co-authored-by: vijaye12 <vijaye12@in.ibm.com> Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com> Co-authored-by: nnguyen <nnguyen@us.ibm.com> Co-authored-by: vijaye12 <vijaykr.e@gmail.com> Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: Nam Nguyen <namctin@gmail.com> Co-authored-by: Wesley Gifford <79663411+wgifford@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> |