transformers/utils
Alex McKinney 75336c1794
Add Llama Flax Implementation (#24587)
* Copies `modeling_flax_gpt_neo.py` to start

* MLP Block. WIP Attention and Block

* Adds Flax implementation of `LlamaMLP`
Validated with an in-file test.
Some slight numeric differences, but assuming they aren't an issue
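
For reference, a minimal sketch of the gated-SiLU (SwiGLU) MLP used by Llama, written as illustrative Flax code rather than the exact module from this PR; the `hidden_size`/`intermediate_size` attribute names are assumptions:

```python
import flax.linen as nn
import jax.numpy as jnp


class FlaxLlamaMLP(nn.Module):
    hidden_size: int          # assumed attribute names; the real module
    intermediate_size: int    # reads these from the model config
    dtype: jnp.dtype = jnp.float32

    @nn.compact
    def __call__(self, hidden_states):
        # Llama's MLP is gated: down(silu(gate(x)) * up(x)), with no biases
        gate = nn.Dense(self.intermediate_size, use_bias=False, dtype=self.dtype)(hidden_states)
        up = nn.Dense(self.intermediate_size, use_bias=False, dtype=self.dtype)(hidden_states)
        return nn.Dense(self.hidden_size, use_bias=False, dtype=self.dtype)(nn.silu(gate) * up)
```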

* Adds `FlaxLlamaRMSNorm` layer
`flax.linen` includes an `RMSNorm` layer, but not necessarily in all
versions. Hence, we add it in-file.
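
A minimal in-file RMSNorm along these lines (illustrative sketch, not the PR's exact code):

```python
import flax.linen as nn
import jax.numpy as jnp


class FlaxLlamaRMSNorm(nn.Module):
    hidden_size: int   # assumed attribute name
    eps: float = 1e-6
    dtype: jnp.dtype = jnp.float32

    @nn.compact
    def __call__(self, hidden_states):
        # RMSNorm normalises by the root mean square only: no mean subtraction, no bias
        variance = jnp.mean(jnp.square(hidden_states.astype(jnp.float32)), axis=-1, keepdims=True)
        # 1 / sqrt rather than lax.rsqrt (see the numerics fix further down this log)
        hidden_states = hidden_states * (1.0 / jnp.sqrt(variance + self.eps))
        weight = self.param("weight", nn.initializers.ones, (self.hidden_size,))
        return (weight * hidden_states).astype(self.dtype)
```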

* Adds FlaxLlamaAttention
Copied from GPT-J, as it has an efficient caching implementation as well as
rotary embeddings.
Output is numerically different, but not by a huge amount. Needs
investigating.

* Adds `FlaxLlamaDecoderLayer`
numerically inaccurate, debugging...

* debugging rotary mismatch
GPT-J uses interleaved rotary embeddings whilst Llama uses contiguous ones.
I think they match now, but the final result is still wrong.
Maybe drop back to just debugging the attention layer?
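
The difference in layout, as an illustrative sketch (function names follow the usual conventions, not necessarily this PR's code):

```python
import jax.numpy as jnp


def rotate_every_two(x):
    # GPT-J style: channels are paired up as (0, 1), (2, 3), ... (interleaved)
    x1, x2 = x[..., ::2], x[..., 1::2]
    return jnp.stack([-x2, x1], axis=-1).reshape(*x.shape[:-1], -1)


def rotate_half(x):
    # Llama style: the head dim is split into two contiguous halves
    x1, x2 = jnp.split(x, 2, axis=-1)
    return jnp.concatenate([-x2, x1], axis=-1)
```

Mixing a GPT-J-style attention skeleton with a Llama-style rotation (or vice versa) gives exactly this close-but-wrong behaviour.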

* fixes bug with decoder layer
still somewhat numerically inaccurate, but close enough for now

* adds markers for what to implement next
The structure here diverges a lot from the PyTorch version.
Not a big fan of it, but just getting something working for now.

* implements `FlaxLlamaBlockCollection`
the tolerance must be higher than expected, which is kinda disconcerting

* Adds `FlaxLlamaModule`
the equivalent PyTorch model is `LlamaModel`
yay! a language model 🤗

* adds `FlaxLlamaForCausalLMModule`
equivalent to `LlamaForCausalLM`
still missing returning a dict or tuple; will add later

* start porting pretrained wrappers
realised it probably needs `return_dict` as a prerequisite

* cleanup, quality, style

* readds `return_dict` and model output named tuples
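
The usual transformers Flax pattern, sketched here with a hypothetical helper name:

```python
from transformers.modeling_flax_outputs import FlaxBaseModelOutput


def wrap_outputs(hidden_states, all_hidden_states, all_attentions, return_dict: bool):
    # return a plain tuple of the non-None values, or a named model output
    if not return_dict:
        return tuple(v for v in (hidden_states, all_hidden_states, all_attentions) if v is not None)
    return FlaxBaseModelOutput(
        last_hidden_state=hidden_states,
        hidden_states=all_hidden_states,
        attentions=all_attentions,
    )
```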

* (tentatively) pretrained wrappers work 🔥

* fixes numerical mismatch in `FlaxLlamaRMSNorm`
seems `jax.lax.rsqrt` does not match `torch.rsqrt`.
manually computing `1 / jax.numpy.sqrt` results in matching values.
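
A quick way to see the discrepancy (illustrative):

```python
import jax
import jax.numpy as jnp

x = jnp.linspace(0.1, 10.0, 8, dtype=jnp.float32)
fast = jax.lax.rsqrt(x)      # fused reciprocal square root
manual = 1.0 / jnp.sqrt(x)   # this variant matched the PyTorch reference
print(jnp.max(jnp.abs(fast - manual)))  # small but nonzero on some backends
```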

* [WIP] debugging numerics

* numerical match
I think the issue was an accidental change of backend; forcing CPU fixes the test.
We expect some mismatch on GPU.
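
One way to pin the backend for such comparisons (not necessarily how the tests in this PR do it):

```python
import os

os.environ["JAX_PLATFORM_NAME"] = "cpu"  # must be set before jax is first imported

import jax

print(jax.devices())  # should now report only CPU devices
```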

* adds in model and integration tests for Flax Llama
summary of failures:
- `mul`: invalid combination of dimensions
- one numerical mismatch
- bf16 conversion (maybe an issue with my local backend)
- params are not `FrozenDict`

* adds missing TYPE_CHECKING import and `make fixup`

* adds back missing docstrings
needs review on docstring quality; not sure what is required.
Furthermore, need to check whether `CHECKPOINT_FOR_DOC` is valid. See TODO

* commenting out equivalence test, as we can just use the common one

* debugging

* Fixes a bug where the mask and pos_ids were swapped in the pretrained model wrappers
This results in all tests passing now 🔥

* cleanup of modeling file

* cleanup of test file

* Resolving simpler review comments

* addresses more minor review comments

* fixes pytest errors introduced by the review changes

* wip additional slow tests

* wip tests
need to grab a GPU machine to get real logits for comparison;
otherwise, slow tests should be okay

* `make quality`, `make style`

* adds slow integration tests
- checking logits
- checking hidden states
- checking generation outputs

* `make fix-copies`

* fix mangled function following `make fix-copies`

* adds missing type checking imports

* fixes missing parameter checkpoint warning

* more fine-grained `Copied from` tags
avoids the issue of overwriting `LLAMA_INPUTS_DOCSTRING`
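
i.e. tagging individual classes and methods instead of whole files, along these lines (the source class here is illustrative):

```python
# Copied from transformers.models.gpt_neo.modeling_flax_gpt_neo.FlaxGPTNeoBlockCollection with GPTNeo->Llama
```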

* swaps import guards
??? how did these get swapped initially?

* removing `inv_freq` again, as the PyTorch version has now removed it

* attempting to get CI to pass

* adds doc entries for llama flax models

* fixes typo in __init__.py imports

* adds back special equivalence tests
these come from the GPT-Neo Flax tests; there is special behaviour for these
models that needs to override the common version

* overrides tests with dummies to see if CI passes
need to fill these tests in later

* adds my contribution to docs

* `make style; make quality`

* replaces random masking with a fixed mask to work with the Flax version

* `make quality; make style`

* Update src/transformers/models/llama/modeling_flax_llama.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_flax_llama.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_flax_llama.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_flax_llama.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_flax_llama.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_flax_llama.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* updates `x`->`tensor` in `rotate_half`

* addresses smaller review comments

* Update docs/source/en/model_doc/llama.md

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* adds integration test class

* adds `dtype` to rotary embedding to cast outputs

* adds type to flax llama rotary layer
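
Sketch of the idea (illustrative helper, not this PR's exact code): build the sin/cos tables and cast them at the end so callers see a consistent dtype:

```python
import jax.numpy as jnp


def sincos_positions(num_positions, rotary_dim, dtype=jnp.float32):
    # standard rotary frequency table, cast to the requested dtype on the way out
    inv_freq = 1.0 / (10000.0 ** (jnp.arange(0, rotary_dim, 2) / rotary_dim))
    freqs = jnp.einsum("i,j->ij", jnp.arange(num_positions), inv_freq)
    return jnp.sin(freqs).astype(dtype), jnp.cos(freqs).astype(dtype)
```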

* `make style`

* `make fix-copies`

* Apply suggestions from code review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* applies suggestions from review

* Update modeling_flax_llama.py

* `make fix-copies`

* Update tests/models/llama/test_modeling_llama.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_flax_llama.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* fixes shape mismatch in FlaxLlamaMLP

* applies some suggestions from reviews

* casts attn output logits to f32 regardless of dtype
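
If this refers to the attention softmax, the common pattern looks roughly like this (names assumed): upcast the logits to float32 before the softmax and cast back afterwards, so half-precision runs stay stable:

```python
import jax
import jax.numpy as jnp


def attention_weights_in_f32(attn_logits, dtype):
    # softmax always runs in float32, regardless of the module dtype
    weights = jax.nn.softmax(attn_logits.astype(jnp.float32), axis=-1)
    return weights.astype(dtype)
```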

* adds attn bias using `LlamaConfig.attention_bias`
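
i.e. the q/k/v/o projections toggle their bias term on the config flag, roughly as follows (sketch; only `attention_bias` itself comes from `LlamaConfig`):

```python
import flax.linen as nn


def make_attention_proj(config, features, dtype):
    # bias on the attention projections is controlled by config.attention_bias
    return nn.Dense(features, use_bias=config.attention_bias, dtype=dtype)
```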

* adds Copied From comments to Flax Llama test

* changes Mistral and Persimmon tests to copy from Llama

* updates docs index

* removes `Copied from` in tests
it was preventing `make fix-copies` from succeeding

* quality and style

* ignores FlaxLlama input docstring

* adds revision to `_CHECKPOINT_FOR_DOC`

* repo consistency and quality

* removes unused import

* removes `Copied from` from Phi test
now diverges from Llama tests following the FlaxLlama changes

* adds `_REAL_CHECKPOINT_FOR_DOC`

* removes refs from PR tests

* reformat to make ruff happy

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
2023-12-07 07:05:00 +01:00
test_module AutoImageProcessor (#20111) 2022-11-08 19:54:41 +00:00
tf_ops Check TF ops for ONNX compliance (#10025) 2021-02-15 07:55:10 -05:00
add_pipeline_model_mapping_to_test.py A script to add/update pipeline_model_mapping systematically (#22180) 2023-04-06 18:08:14 +02:00
check_build.py Clean up CUDA kernels (#23455) 2023-05-18 14:14:43 -04:00
check_config_attributes.py Add SeamlessM4T v2 (#27779) 2023-11-30 20:24:43 +01:00
check_config_docstrings.py [Check] Fix config docstring (#26222) 2023-09-18 19:58:01 +02:00
check_copies.py [Styling] stylify using ruff (#27144) 2023-11-16 17:43:19 +01:00
check_doc_toc.py Doc checks (#25408) 2023-08-10 10:53:22 +02:00
check_docstrings.py Add Llama Flax Implementation (#24587) 2023-12-07 07:05:00 +01:00
check_doctest_list.py Avoid many failing tests in doctesting (#27262) 2023-11-03 12:47:07 +01:00
check_dummies.py Doc checks (#25408) 2023-08-10 10:53:22 +02:00
check_inits.py Make using safetensors files automated. (#27571) 2023-12-01 15:51:10 +01:00
check_model_tester.py Add a new script to check model testers' config (#22063) 2023-03-13 19:11:19 +01:00
check_repo.py [Time series] Add PatchTSMixer (#26247) 2023-12-05 15:31:35 +01:00
check_self_hosted_runner.py Tiny fix for check_self_hosted_runner.py (#24052) 2023-06-06 18:17:41 +02:00
check_table.py Add madlad-400 MT models (#27471) 2023-11-28 13:19:50 +00:00
check_task_guides.py More utils doc (#25457) 2023-08-17 07:58:35 +02:00
check_tf_ops.py Check TF ops for ONNX compliance (#10025) 2021-02-15 07:55:10 -05:00
create_dummy_models.py Update tiny model creation script (#27674) 2023-11-28 10:05:34 +01:00
custom_init_isort.py More utils doc (#25457) 2023-08-17 07:58:35 +02:00
download_glue_data.py Raise exceptions instead of asserts (#13907) 2021-10-07 12:44:23 +05:30
extract_warnings.py Make Slack CI reporting stronger (#21823) 2023-02-28 17:12:44 +01:00
get_ci_error_statistics.py Show diff between 2 CI runs on Slack reports (#22798) 2023-04-19 19:27:37 +02:00
get_github_job_time.py Make Slack CI reporting stronger (#21823) 2023-02-28 17:12:44 +01:00
get_modified_files.py exclude deleted files in the fixup script (#21436) 2023-02-03 12:57:02 -05:00
get_previous_daily_ci.py Fix a minor bug in CI slack report (#22906) 2023-04-21 20:36:35 +02:00
get_test_info.py Add an utility file to get information from test files (#21856) 2023-03-01 17:53:29 +01:00
not_doctested.txt Add SeamlessM4T v2 (#27779) 2023-11-30 20:24:43 +01:00
notification_service_doc_tests.py Fix slack report failing for doctest (#27042) 2023-10-30 10:48:24 +01:00
notification_service.py restructure AMD scheduled CI (#27743) 2023-12-04 15:32:05 +01:00
past_ci_versions.py (Re-)Enable Nightly + Past CI (#22393) 2023-03-30 21:06:35 +02:00
print_env.py Print more library versions in CI (#17384) 2022-06-02 10:24:16 +02:00
release.py More utils doc (#25457) 2023-08-17 07:58:35 +02:00
slow_documentation_tests.txt Add SeamlessM4T v2 (#27779) 2023-11-30 20:24:43 +01:00
sort_auto_mappings.py More utils doc (#25457) 2023-08-17 07:58:35 +02:00
tests_fetcher.py Trigger corresponding pipeline tests if tests/utils/tiny_model_summary.json is modified (#27693) 2023-11-28 17:21:21 +01:00
update_metadata.py Update processor mapping for hub snippets (#27477) 2023-11-14 20:05:54 +00:00
update_tiny_models.py Update tiny model summary file for recent models (#22637) 2023-04-06 22:52:59 +02:00