mirror of
https://github.com/huggingface/transformers.git
synced 2025-07-31 02:02:21 +06:00
Make siglip examples clearer and error free (#33667)
Update siglip.md This was already partially fixed relative to the deployed docs. But the partial fix made it inconsistent. Additionally, giving the full text ("This is a photo of...") is likely not the desired output.
This commit is contained in:
parent
7f9a9ca1e0
commit
55b7a0404e
@ -85,7 +85,7 @@ If you want to do the pre- and postprocessing yourself, here's how to do that:
|
||||
|
||||
>>> candidate_labels = ["2 cats", "2 dogs"]
|
||||
# follows the pipeline prompt template to get same results
|
||||
>>> candidate_labels = [f'This is a photo of {label}.' for label in candidate_labels]
|
||||
>>> texts = [f'This is a photo of {label}.' for label in candidate_labels]
|
||||
>>> # important: we pass `padding=max_length` since the model was trained with this
|
||||
>>> inputs = processor(text=texts, images=image, padding="max_length", return_tensors="pt")
|
||||
|
||||
@ -94,7 +94,7 @@ If you want to do the pre- and postprocessing yourself, here's how to do that:
|
||||
|
||||
>>> logits_per_image = outputs.logits_per_image
|
||||
>>> probs = torch.sigmoid(logits_per_image) # these are the probabilities
|
||||
>>> print(f"{probs[0][0]:.1%} that image 0 is '{texts[0]}'")
|
||||
>>> print(f"{probs[0][0]:.1%} that image 0 is '{candidate_labels[0]}'")
|
||||
31.9% that image 0 is 'a photo of 2 cats'
|
||||
```
|
||||
|
||||
@ -140,9 +140,9 @@ To load and run a model using Flash Attention 2, refer to the snippet below:
|
||||
|
||||
>>> candidate_labels = ["2 cats", "2 dogs"]
|
||||
# follows the pipeline prompt template to get same results
|
||||
>>> candidate_labels = [f'This is a photo of {label}.' for label in candidate_labels]
|
||||
>>> texts = [f'This is a photo of {label}.' for label in candidate_labels]
|
||||
# important: we pass `padding=max_length` since the model was trained with this
|
||||
>>> inputs = processor(text=candidate_labels, images=image, padding="max_length", return_tensors="pt")
|
||||
>>> inputs = processor(text=texts, images=image, padding="max_length", return_tensors="pt")
|
||||
>>> inputs.to(device)
|
||||
|
||||
>>> with torch.no_grad():
|
||||
@ -240,4 +240,4 @@ Below is an expected speedup diagram that compares inference time between the na
|
||||
## SiglipForImageClassification
|
||||
|
||||
[[autodoc]] SiglipForImageClassification
|
||||
- forward
|
||||
- forward
|
||||
|
Loading…
Reference in New Issue
Block a user