[docs] Llama3 (#31662)

Move the quick usage example to the top of the page
Steven Liu 2024-06-27 10:32:51 -07:00 committed by GitHub
parent e44b878c02
commit 464aa74659

@@ -16,6 +16,15 @@ rendered properly in your Markdown viewer.
# Llama3
```py3
import transformers
import torch
model_id = "meta-llama/Meta-Llama-3-8B"
pipeline = transformers.pipeline("text-generation", model=model_id, model_kwargs={"torch_dtype": torch.bfloat16}, device_map="auto")
pipeline("Hey how are you doing today?")
```
## Overview
@@ -66,20 +75,7 @@ model = AutoModelForCausalLM.from_pretrained("/output/path")
Note that executing the script requires enough CPU RAM to host the whole model in float16 precision (even though the biggest versions
come in several checkpoints, each checkpoint contains a part of each weight of the model, so they all need to be loaded in RAM). For the 70B model, that is 145GB of RAM.
- When using Flash Attention 2 via `attn_implementation="flash_attention_2"`, don't pass `torch_dtype` to the `from_pretrained` class method and use Automatic Mixed Precision training. When using `Trainer`, simply set either `fp16` or `bf16` to `True`. Otherwise, make sure you are using `torch.autocast`. This is required because Flash Attention only supports the `fp16` and `bf16` data types. A sketch of this setup follows below.
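
The following is a minimal sketch of that tip, not part of this commit: the model is loaded with Flash Attention 2 but without `torch_dtype`, and the cast to `bf16` is handled by `torch.autocast` at inference time (with `Trainer`, setting `bf16=True` in `TrainingArguments` plays the same role). The generation parameters are illustrative.

```py3
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# No torch_dtype here: the cast to bf16 happens inside autocast below
# (or via fp16=True / bf16=True in TrainingArguments when training with Trainer).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    attn_implementation="flash_attention_2",
    device_map="auto",
)

inputs = tokenizer("Hey how are you doing today?", return_tensors="pt").to(model.device)
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```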
## Quick usage
```py3
import transformers
import torch
model_id = "meta-llama/Meta-Llama-3-8B"
pipeline = transformers.pipeline("text-generation", model=model_id, model_kwargs={"torch_dtype": torch.bfloat16}, device_map="auto")
pipeline("Hey how are you doing today?")
```
## Resources
A ton of cool resources are already available on the documentation page of [~llama2], inviting contributors to add new resources curated for Llama3 here! 🤗
A ton of cool resources are already available on the documentation page of [Llama2](./llama2), inviting contributors to add new resources curated for Llama3 here! 🤗