* wip
* fix __init__.py
* add docs
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* address comments 1
* work on make fixup
* pass configs down
* add sdpa attention
* remove DbrxBlock
* add to configuration_auto
* docstring now passes formatting test
* fix style
* update READMEs
* add dbrx to modeling_auto
* make fix-copies generated this
* add DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP
* config docstring passes formatting test
* rename moe_loss_weight to router_aux_loss_coef
* add to flash-attn documentation
* fix model-path in tests
* Explicitly make `"suli"` the default `ffn_act_fn`
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* default to using router_aux_loss_coef over ffn_config[moe_loss_weight]
* fix _flash_attn_uses_top_left_mask and is_causal
* fix tests path
* don't use token type IDs
* follow Llama and remove token_type_ids from test
* init ConfigTester differently so tests pass
* remove multiple choice test
* remove question + answer test
* remove sequence classification test
* remove token classification test
* copy Llama tests and remove token_type_ids from test inputs
* do not test pruning or headmasking; style code
* add _tied_weights_keys parameter to pass test
* add type hints
* fix type check
* update config tester
* remove masked_lm test
* remove encoder tests
* initialize DbrxModelTester with correct params
* style
* torch_dtype does not rely on torch
* run make fixup, fix-copies
* use https://huggingface.co/v2ray/dbrx-base-fixed/blob/main/modeling_dbrx.py
* add copyright info
* fix imports and DbrxRotaryEmbedding
* update DbrxModel docstring
* use copies
* change model path in docstring
* use config in DbrxFFN
* fix flashattention2, sdpaattention
* input config to DbrXAttention, DbrxNormAttentionNorm
* more fixes
* fix
* fix again!
* add informative comment
* fix ruff?
* remove print statement + style
* change doc-test
* fix doc-test
* fix docstring
* delete commented out text
* make defaults match dbrx-instruct
* replace `router_aux_loss_coef` with `moe_loss_weight`
* is_decoder=True
* remove is_decoder from configtester
* implement sdpa properly
* make is_decoder pass tests
* start on the GenerationTesterMixin tests
* add dbrx to sdpa documentation
* skip weight typing test
* style
* initialize smaller model
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Add DBRX to toctree
* skip test_new_cache_format
* make config defaults smaller again
* add pad_token_id
* remove pad_token_id from config
* Remove all references to DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP
* Update src/transformers/models/dbrx/__init__.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/dbrx/modeling_dbrx.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/model_doc/dbrx.md
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Update src/transformers/models/dbrx/configuration_dbrx.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/model_doc/dbrx.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix typo
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* update docs, fix configuration_auto.py
* address pr comments
* remove is_decoder flag
* slice
* fix requires grad
* remove grad
* disconnect differently
* remove grad
* enable grads
* patch
* detach expert
* nissan al ghaib
* Update modeling_dbrx.py
* Update src/transformers/models/dbrx/modeling_dbrx.py
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* replace "Gemma" with "Dbrx"
* remove # type: ignore
* don't hardcode vocab_size
* remove ToDo
* Re-add removed idefics2 line
* Update test to use tiny-random!
* Remove TODO
* Remove one more case of loading the entire dbrx-instruct in the tests
* Update src/transformers/models/dbrx/modeling_dbrx.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* address some comments
* small model
* add dbrx to tokenization_auto
* More docstrings with add_start_docstrings
* Dbrx for now
* add PipelineTesterMixin
* Update src/transformers/models/dbrx/configuration_dbrx.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* remove flash-attn2 import error
* fix docstring
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add useage example
* put on one line
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix ffn_act_fn
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* change "dbrx" to "DBRX" for display purposes.
* fix __init__.py?
* fix __init__.py
* fix README
* return the aux_loss
* remove extra spaces
* fix configuration_auto.py
* fix format in tokenization_auto
* remove new line
* add more useage examples
---------
Co-authored-by: Abhi Venigalla <abhi.venigalla@databricks.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Eitan Turok <eitan.turok@databricks.com>
Co-authored-by: Eitan Turok <150733043+eitanturok@users.noreply.github.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: Eitan Turok <eitanturok@gmail.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Fix typos and grammar mistakes in docs and examples
* Fix typos in docstrings and comments
* Fix spelling of `tokenizer` in model tests
* Remove erroneous spaces in decorators
* Remove extra spaces in Markdown link texts
* Put models in subfolders
* Styling
* Fix imports in tests
* More fixes in test imports
* Sneaky hidden imports
* Fix imports in doc files
* More sneaky imports
* Finish fixing tests
* Fix examples
* Fix path for copies
* More fixes for examples
* Fix dummy files
* More fixes for example
* More model import fixes
* Is this why you're unhappy GitHub?
* Fix imports in conver command
* configuration_squeezebert.py
thin wrapper around bert tokenizer
fix typos
wip sb model code
wip modeling_squeezebert.py. Next step is to get the multi-layer-output interface working
set up squeezebert to use BertModelOutput when returning results.
squeezebert documentation
formatting
allow head mask that is an array of [None, ..., None]
docs
docs cont'd
path to vocab
docs and pointers to cloud files (WIP)
line length and indentation
squeezebert model cards
formatting of model cards
untrack modeling_squeezebert_scratchpad.py
update aws paths to vocab and config files
get rid of stub of NSP code, and advise users to pretrain with mlm only
fix rebase issues
redo rebase of modeling_auto.py
fix issues with code formatting
more code format auto-fixes
move squeezebert before bert in tokenization_auto.py and modeling_auto.py because squeezebert inherits from bert
tests for squeezebert modeling and tokenization
fix typo
move squeezebert before bert in modeling_auto.py to fix inheritance problem
disable test_head_masking, since squeezebert doesn't yet implement head masking
fix issues exposed by the test_modeling_squeezebert.py
fix an issue exposed by test_tokenization_squeezebert.py
fix issue exposed by test_modeling_squeezebert.py
auto generated code style improvement
issue that we inherited from modeling_xxx.py: SqueezeBertForMaskedLM.forward() calls self.cls(), but there is no self.cls, and I think the goal was actually to call self.lm_head()
update copyright
resolve failing 'test_hidden_states_output' and remove unused encoder_hidden_states and encoder_attention_mask
docs
add integration test. rename squeezebert-mnli --> squeezebert/squeezebert-mnli
autogenerated formatting tweaks
integrate feedback from patrickvonplaten and sgugger to programming style and documentation strings
* tiny change to order of imports