transformers/docs/source/en
Kian Sierra McGettigan f7076cd346
Flax mistral (#26943)
* direct copy from llama work

* mistral modules forward pass working

* flax mistral forward pass with sliding window

* added tests

* added layer collection approach

* Revert "added layer collection approach"

This reverts commit 0e2905bf22.

* Revert "Revert "added layer collection approach""

This reverts commit fb17b6187a.

* fixed attention outputs

* added mistral to init and auto

* fixed import name

* fixed layernorm weight dtype

* freeze initialized weights

* make sure conversion considers bfloat16

* added backend

* added docstrings

* added cache

* fixed sliding window causal mask

* passes cache tests

* passed all tests

* applied make style

* removed commented out code

* applied fix-copies ignored other model changes

* applied make fix-copies

* removed unused functions

* passed generation integration test

* slow tests pass

* fixed slow tests

* changed default dtype from jax.numpy.float32 to float32 for docstring check

* skip cache test for FlaxMistralForSequenceClassification since, if pad_token_id is in input_ids, it doesn't score previous input_ids

* updated checkpoint since from_pt not included

* applied black style

* removed unused args

* Applied styling and fixup

* changed checkpoint for doc back

* fixed ref after adding it to hf hub

* Add dummy ckpt

* applied styling

* added tokenizer to new ckpt

* fixed slice format

* fix init and slice

* changed ref for placeholder TODO

* added copies from Llama

* applied styling

* applied fix-copies

* fixed docs

* update weight dtype reconversion for sharded weights

* removed Nullable input ids

* Removed unnecessary output attentions in Module

* added embedding weight initialization

* removed unused past_key_values

* fixed deterministic

* Fixed RMS Norm and added copied from

* removed input_embeds

* applied make style

* removed nullable input ids from sequence classification model

* added copied from GPTJ

* added copied from Llama on FlaxMistralDecoderLayer

* added copied from to FlaxMistralPreTrainedModel methods

* fix test deprecation warning

* freeze gpt neox random_params and fix copies

* applied make style

* fixed doc issue

* skipped docstring test to align # copied from

* applied make style

* removed FlaxMistralForSequenceClassification

* removed unused padding_idx

* removed more sequence classification

* removed sequence classification

* applied styling and consistency

* added copied from in tests

* removed sequence classification test logic

* applied styling

* applied make style

* removed freeze and fixed copies

* undo test change

* changed repeat_kv to tile

* fixed to key value groups

* updated copyright year

* split casual_mask

* empty to rerun failed pt_flax_equivalence test FlaxWav2Vec2ModelTest

* went back to 2023 for tests_pr_documentation_tests

* went back to 2024

* changed tile to repeat

* applied make style

* empty for retry on Wav2Vec2
2024-01-31 14:19:02 +01:00
internal Generate: consolidate output classes (#28494) 2024-01-15 17:04:08 +00:00
main_classes Improve Backbone API docs (#28666) 2024-01-25 11:51:58 +00:00
model_doc Flax mistral (#26943) 2024-01-31 14:19:02 +01:00
tasks [docs] Fix datasets in guides (#28715) 2024-01-26 09:29:07 -08:00
_config.py [Styling] stylify using ruff (#27144) 2023-11-16 17:43:19 +01:00
_redirects.yml Extended semantic segmentation to image segmentation (#27039) 2023-11-23 15:58:21 +00:00
_toctree.yml [HfQuantizer] Move it to "Developer guides" (#28768) 2024-01-30 07:20:20 +01:00
accelerate.md Fix typos (#25936) 2023-09-04 11:15:12 +01:00
add_new_model.md Update add_new_model.md (#26365) 2023-09-25 12:58:11 +02:00
add_new_pipeline.md Fix broken link on page (#28451) 2024-01-11 09:26:13 -08:00
add_tensorflow_model.md Remove utils/documentation_tests.txt (#26213) 2023-09-18 13:33:01 +02:00
attention.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
autoclass_tutorial.md Docs for AutoBackbone & Backbone (#27456) 2023-12-11 08:22:17 -05:00
benchmarks.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
bertology.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
big_models.md Fix typos (#25936) 2023-09-04 11:15:12 +01:00
chat_templating.md Update chat template warnings/guides (#27634) 2023-11-27 18:40:10 +00:00
community.md Update community.md (#25928) 2023-09-04 11:16:34 +01:00
contributing.md Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
create_a_model.md [docs] fixed links with 404 (#27327) 2023-11-06 19:45:03 +00:00
custom_models.md Add config tip to custom model docs (#28601) 2024-01-22 13:46:04 +00:00
custom_tools.md [doc] Always call it Agents for consistency (#25958) 2023-09-05 12:27:20 +01:00
debugging.md [docs] DeepSpeed (#28542) 2024-01-24 08:31:28 -08:00
deepspeed.md [docs] Fix doc format (#28684) 2024-01-24 11:18:59 -08:00
fast_tokenizers.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
fsdp.md [docs] Trainer docs (#28145) 2023-12-20 10:37:23 -08:00
generation_strategies.md Generate: fix speculative decoding (#28166) 2023-12-20 18:55:35 +00:00
glossary.md [Doc] Spanish translation of glossary.md (#27958) 2023-12-13 09:21:59 -08:00
hf_quantizer.md HfQuantizer class for quantization-related stuff in modeling_utils.py (#26610) 2024-01-30 02:48:25 +01:00
hpo_train.md Remove-auth-token (#27060) 2023-11-13 14:20:54 +01:00
index.md Flax mistral (#26943) 2024-01-31 14:19:02 +01:00
installation.md README: install transformers from conda-forge channel (#28313) 2024-01-04 09:36:16 -08:00
llm_tutorial_optimization.md F.scaled_dot_product_attention support (#26572) 2023-12-09 05:38:14 +09:00
llm_tutorial.md Generate: All logits processors are documented and have examples (#27796) 2023-12-07 15:11:35 +00:00
model_memory_anatomy.md Fix typos (#25936) 2023-09-04 11:15:12 +01:00
model_sharing.md Fix typos (#25936) 2023-09-04 11:15:12 +01:00
model_summary.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
multilingual.md Fix typo in example code (#25583) 2023-08-18 07:58:59 +02:00
notebooks.md Enable doc in Spanish (#16518) 2022-04-04 10:25:46 -04:00
pad_truncation.md [Doc] Spanish translation of pad_truncation.md (#27890) 2023-12-08 10:32:18 -08:00
peft.md [Peft] modules_to_save support for peft integration (#27466) 2023-11-14 10:32:57 +01:00
perf_hardware.md docs: replace torch.distributed.run by torchrun (#27528) 2023-11-27 16:26:33 +00:00
perf_infer_cpu.md [docs] Update CPU/GPU inference docs (#26881) 2023-10-31 09:44:51 -07:00
perf_infer_gpu_one.md Add qwen2 (#28436) 2024-01-17 16:02:22 +01:00
perf_torch_compile.md Fix rendering for torch.compile() docs (#25432) 2023-08-10 13:25:00 +02:00
perf_train_cpu_many.md Doc (#28431) 2024-01-11 08:55:48 -08:00
perf_train_cpu.md improve efficient training on CPU documentation (#28646) 2024-01-24 09:07:13 -08:00
perf_train_gpu_many.md [docs] Improve visualization for vertical parallelism (#28583) 2024-01-25 17:55:11 +00:00
perf_train_gpu_one.md Improving Training Performance and Scalability Documentation (#28497) 2024-01-16 11:30:26 +01:00
perf_train_special.md [docs] MPS (#28016) 2023-12-15 13:17:29 -08:00
perf_train_tpu_tf.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
performance.md [docs] Update CPU/GPU inference docs (#26881) 2023-10-31 09:44:51 -07:00
perplexity.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
philosophy.md [docs] fixed links with 404 (#27327) 2023-11-06 19:45:03 +00:00
pipeline_tutorial.md [ASR Pipe] Improve docs and error messages (#26476) 2023-09-29 18:32:37 +01:00
pipeline_webserver.md Suggestions on Pipeline_webserver (#25570) 2023-08-18 10:17:44 +02:00
pr_checks.md Docstring check (#26052) 2023-10-04 15:13:37 +02:00
preprocessing.md [docs] Update preprocessing.md (#28719) 2024-01-26 11:58:57 +00:00
quantization.md HfQuantizer class for quantization-related stuff in modeling_utils.py (#26610) 2024-01-30 02:48:25 +01:00
quicktour.md [TYPO] fix typo/format in quicktour.md (#25519) 2023-08-16 08:03:23 +02:00
run_scripts.md docs: replace torch.distributed.run by torchrun (#27528) 2023-11-27 16:26:33 +00:00
sagemaker.md [docs] fixed links with 404 (#27327) 2023-11-06 19:45:03 +00:00
serialization.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
task_summary.md [Doc] Fix token link in What 🤗 Transformers can do (#28123) 2023-12-18 15:06:54 -08:00
tasks_explained.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
testing.md [docs] DeepSpeed (#28542) 2024-01-24 08:31:28 -08:00
tf_xla.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
tflite.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
tokenizer_summary.md Fix typo: Roberta -> RoBERTa (#25302) 2023-08-03 14:17:30 -07:00
torchscript.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00
trainer.md [docs] Trainer docs (#28145) 2023-12-20 10:37:23 -08:00
training.md Fix semantic error in evaluation section (#27675) 2023-11-24 12:41:16 +01:00
transformers_agents.md [doc] Always call it Agents for consistency (#25958) 2023-09-05 12:27:20 +01:00
troubleshooting.md Migrate doc files to Markdown. (#24376) 2023-06-20 18:07:47 -04:00