transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-14 18:18:24 +06:00

History

Matt b3ab3fac1d Falcon port (#24523 ) * Initial commit * Update src/transformers/models/falcon/configuration_falcon.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/falcon/configuration_falcon.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Cleanup config docstring * Update src/transformers/models/falcon/configuration_falcon.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Convert to relative imports * Remove torch < 1.8 warning * Restructure cos_sin header * qkv -> query, key, value * Refactor attention calculation * Add a couple of config variables to account for the different checkpoints * Successful merging of the code paths! * Fix misplaced line in the non-parallel attention path * Update config and tests * Add a pad_token_id when testing * Support output_attentions when alibi is None * make fixup * Skip KV cache shape test * No more _keys_to_ignore_on_load_missing * Simplify self attention a bit * Simplify self attention a bit * make fixup * stash commit * Some more attention mask updates * Should pass all tests except assisted generation! * Add big model generation test * make fixup * Add temporary workaround for test * Test overrides for assisted generation * Update src/transformers/models/falcon/modeling_falcon.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/falcon/modeling_falcon.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/falcon/modeling_falcon.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update tests/models/falcon/test_modeling_falcon.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Test overrides for assisted generation * Add generation demo * Update copyright * Make the docstring model actually small * Add module-level docstring * Remove all assertions * Add copied from bloom * Reformat the QKV layer * Add copied from bloom * Update src/transformers/models/falcon/modeling_falcon.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Remove unused line and reformat * No single letter variables * Cleanup return names * Add copied from line * Remove the deprecated arguments blocks * Change the embeddings test to an alibi on/off test * Remove position_ids from FalconForQA * Remove old check for token type IDs * Fix the alibi path when multi_query is False * Update src/transformers/models/falcon/modeling_falcon.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/falcon/modeling_falcon.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/falcon/test_modeling_falcon.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update config naming * Fix typo for new_decoder_architecture * Add some comments * Fix docstring * Fix docstring * Create range in the right dtype from the start * Review comment cleanup * n_head_kv -> num_kv_heads * self.alibi -> self.use_alibi * self.num_kv -> self.num_kv_heads * Reorder config args * Made alibi arguments Optional * Add all model docstrings * Add extra checkpoints * Add author info for Falcon * Stop removing token_type_ids because our checkpoints shouldn't return it anymore * Add one hopeful comment for the future * Fix typo * Update tests, fix cache issue for generation * Use -1e9 instead of -inf to avoid float overflow * Recompute the rotary embeddings much less often * Re-enable disabled tests * One final fix to attention mask calculation, and update tests * Cleanup targeting falcon-40b equivalency * Post-rebase docs update * Update docstrings, especially in the config * More descriptive variable names, and comments where we can't rename them --------- Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>		2023-07-11 13:36:31 +01:00
..
internal	Generate: add SequenceBiasLogitsProcessor (#24334 )	2023-06-21 11:14:41 +01:00
main_classes	add link to accelerate doc (#24601 )	2023-07-10 17:49:30 -04:00
model_doc	Falcon port (#24523 )	2023-07-11 13:36:31 +01:00
tasks	Falcon port (#24523 )	2023-07-11 13:36:31 +01:00
_config.py	Enable doc in Spanish (#16518 )	2022-04-04 10:25:46 -04:00
_toctree.yml	Falcon port (#24523 )	2023-07-11 13:36:31 +01:00
accelerate.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
add_new_model.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
add_new_pipeline.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
add_tensorflow_model.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
attention.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
autoclass_tutorial.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
benchmarks.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
bertology.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
big_models.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
community.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
contributing.md	Enable doc in Spanish (#16518 )	2022-04-04 10:25:46 -04:00
create_a_model.md	Update old existing feature extractor references (#24552 )	2023-06-29 10:17:36 +01:00
custom_models.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
custom_tools.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
debugging.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
fast_tokenizers.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
generation_strategies.md	Generate: `group_beam_search` requires `diversity_penalty>0.0` (#24456 )	2023-06-27 10:46:39 +01:00
glossary.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
hpo_train.md	Update RayTune doc link for Hyperparameter tuning (#24422 )	2023-06-22 10:38:01 -04:00
index.md	Falcon port (#24523 )	2023-07-11 13:36:31 +01:00
installation.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
model_sharing.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
model_summary.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
multilingual.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
notebooks.md	Enable doc in Spanish (#16518 )	2022-04-04 10:25:46 -04:00
pad_truncation.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
perf_hardware.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
perf_infer_cpu.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
perf_infer_gpu_many.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
perf_infer_gpu_one.md	Docs: 4 bit doc corrections (#24572 )	2023-06-29 13:13:20 +01:00
perf_infer_special.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
perf_train_cpu_many.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
perf_train_cpu.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
perf_train_gpu_many.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
perf_train_gpu_one.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
perf_train_special.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
perf_train_tpu_tf.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
perf_train_tpu.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
performance.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
perplexity.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
philosophy.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
pipeline_tutorial.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
pipeline_webserver.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
pr_checks.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
preprocessing.md	Removal of deprecated vision methods and specify deprecation versions (#24570 )	2023-06-29 15:09:51 +01:00
quicktour.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
run_scripts.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
sagemaker.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
serialization.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
task_summary.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
tasks_explained.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
testing.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
tf_xla.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
tflite.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
tokenizer_summary.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
torchscript.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
training.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
transformers_agents.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00
troubleshooting.md	Migrate doc files to Markdown. (#24376 )	2023-06-20 18:07:47 -04:00