transformers/docs/source/en
Matt b3ab3fac1d
Falcon port (#24523)
* Initial commit

* Update src/transformers/models/falcon/configuration_falcon.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/falcon/configuration_falcon.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Cleanup config docstring

* Update src/transformers/models/falcon/configuration_falcon.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Convert to relative imports

* Remove torch < 1.8 warning

* Restructure cos_sin header

* qkv -> query, key, value

* Refactor attention calculation

* Add a couple of config variables to account for the different checkpoints

* Successful merging of the code paths!

* Fix misplaced line in the non-parallel attention path

* Update config and tests

* Add a pad_token_id when testing

* Support output_attentions when alibi is None

* make fixup

* Skip KV cache shape test

* No more _keys_to_ignore_on_load_missing

* Simplify self attention a bit

* Simplify self attention a bit

* make fixup

* stash commit

* Some more attention mask updates

* Should pass all tests except assisted generation!

* Add big model generation test

* make fixup

* Add temporary workaround for test

* Test overrides for assisted generation

* Update src/transformers/models/falcon/modeling_falcon.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/falcon/modeling_falcon.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/falcon/modeling_falcon.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update tests/models/falcon/test_modeling_falcon.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Test overrides for assisted generation

* Add generation demo

* Update copyright

* Make the docstring model actually small

* Add module-level docstring

* Remove all assertions

* Add copied from bloom

* Reformat the QKV layer

* Add copied from bloom

* Update src/transformers/models/falcon/modeling_falcon.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Remove unused line and reformat

* No single letter variables

* Cleanup return names

* Add copied from line

* Remove the deprecated arguments blocks

* Change the embeddings test to an alibi on/off test

* Remove position_ids from FalconForQA

* Remove old check for token type IDs

* Fix the alibi path when multi_query is False

* Update src/transformers/models/falcon/modeling_falcon.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/falcon/modeling_falcon.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/falcon/test_modeling_falcon.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update config naming

* Fix typo for new_decoder_architecture

* Add some comments

* Fix docstring

* Fix docstring

* Create range in the right dtype from the start

* Review comment cleanup

* n_head_kv -> num_kv_heads

* self.alibi -> self.use_alibi

* self.num_kv -> self.num_kv_heads

* Reorder config args

* Made alibi arguments Optional

* Add all model docstrings

* Add extra checkpoints

* Add author info for Falcon

* Stop removing token_type_ids because our checkpoints shouldn't return it anymore

* Add one hopeful comment for the future

* Fix typo

* Update tests, fix cache issue for generation

* Use -1e9 instead of -inf to avoid float overflow

* Recompute the rotary embeddings much less often

* Re-enable disabled tests

* One final fix to attention mask calculation, and update tests

* Cleanup targeting falcon-40b equivalency

* Post-rebase docs update

* Update docstrings, especially in the config

* More descriptive variable names, and comments where we can't rename them

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-07-11 13:36:31 +01:00
internal Generate: add SequenceBiasLogitsProcessor (#24334) 2023-06-21 11:14:41 +01:00
main_classes add link to accelerate doc (#24601) 2023-07-10 17:49:30 -04:00
model_doc Falcon port (#24523) 2023-07-11 13:36:31 +01:00
tasks Falcon port (#24523) 2023-07-11 13:36:31 +01:00
_config.py Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
_toctree.yml Falcon port (#24523) 2023-07-11 13:36:31 +01:00
accelerate.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
add_new_model.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
add_new_pipeline.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
add_tensorflow_model.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
attention.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
autoclass_tutorial.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
benchmarks.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
bertology.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
big_models.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
community.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
contributing.md Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
create_a_model.md Update old existing feature extractor references (#24552) 2023-06-29 10:17:36 +01:00
custom_models.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
custom_tools.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
debugging.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
fast_tokenizers.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
generation_strategies.md Generate: group_beam_search requires diversity_penalty>0.0 (#24456) 2023-06-27 10:46:39 +01:00
glossary.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
hpo_train.md Update RayTune doc link for Hyperparameter tuning (#24422) 2023-06-22 10:38:01 -04:00
index.md Falcon port (#24523) 2023-07-11 13:36:31 +01:00
installation.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
model_sharing.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
model_summary.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
multilingual.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
notebooks.md Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
pad_truncation.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
perf_hardware.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
perf_infer_cpu.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
perf_infer_gpu_many.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
perf_infer_gpu_one.md Docs: 4 bit doc corrections (#24572) 2023-06-29 13:13:20 +01:00
perf_infer_special.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
perf_train_cpu_many.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
perf_train_cpu.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
perf_train_gpu_many.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
perf_train_gpu_one.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
perf_train_special.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
perf_train_tpu_tf.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
perf_train_tpu.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
performance.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
perplexity.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
philosophy.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
pipeline_tutorial.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
pipeline_webserver.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
pr_checks.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
preprocessing.md Removal of deprecated vision methods and specify deprecation versions (#24570) 2023-06-29 15:09:51 +01:00
quicktour.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
run_scripts.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
sagemaker.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
serialization.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
task_summary.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
tasks_explained.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
testing.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
tf_xla.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
tflite.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
tokenizer_summary.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
torchscript.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
training.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
transformers_agents.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
troubleshooting.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00