transformers/tests
Matt 4563ba2c6f
Fix StopStringCriteria to handle tokens above len(tokenizer) (#35797)
* Fix StopStringCriteria to handle tokens above len(tokenizer)

This fixes #35244 by clipping token IDs to be within the tokenizer's vocabulary size before performing the embedding lookup. This prevents index errors when model.config.vocab_size > len(tokenizer).

The fix:
1. Adds a clamp operation to ensure token IDs are within bounds
2. Adds a test case to verify the behavior
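
A minimal sketch of the clamping idea described above, in plain Python rather than the tensor code the library uses. The names `clamp_token_ids`, `lookup`, and `table` are hypothetical and chosen only for illustration; this is not the actual `StopStringCriteria` implementation.

```python
def clamp_token_ids(token_ids, table_size):
    """Clip every ID into the valid index range [0, table_size - 1]."""
    return [min(max(tid, 0), table_size - 1) for tid in token_ids]


def lookup(table, token_ids):
    """Index into the table only after clamping, so no ID is out of range."""
    safe_ids = clamp_token_ids(token_ids, len(table))
    return [table[tid] for tid in safe_ids]


# Hypothetical 4-row table standing in for per-token data sized by len(tokenizer).
table = ["a", "b", "c", "<extra>"]

# IDs 7 and 1000 exceed the table size, as can happen when
# model.config.vocab_size > len(tokenizer).
result = lookup(table, [0, 2, 7, 1000])
```

Without the clamp, the last two lookups would raise an `IndexError`; with it, every out-of-range ID maps to the final row of the table.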

* Use self.stop_strings instead of stop_strings

* Handle clipping correctly

* make fixup

* Update test to the new embedding vecs

* Use much bigger values in the mismatch test

* Typo fix

* Slight simplification

---------

Co-authored-by: openhands <openhands@all-hands.dev>
2025-02-06 16:53:28 +00:00
| Name | Last commit | Date |
| --- | --- | --- |
| agents | use torch.testing.assertclose instead to get more details about error in cis (#35659) | 2025-01-24 16:55:28 +01:00 |
| bettertransformer | use torch.testing.assertclose instead to get more details about error in cis (#35659) | 2025-01-24 16:55:28 +01:00 |
| deepspeed | DeepSpeed github repo move sync (#36021) | 2025-02-05 08:19:31 -08:00 |
| extended | [tests] skip tests for xpu (#33553) | 2024-09-19 19:28:04 +01:00 |
| fixtures | Implementation of SuperPoint and AutoModelForKeypointDetection (#28966) | 2024-03-19 14:43:02 +00:00 |
| fsdp | [tests] make cuda-only tests device-agnostic (#35607) | 2025-01-13 14:48:39 +01:00 |
| generation | Fix StopStringCriteria to handle tokens above len(tokenizer) (#35797) | 2025-02-06 16:53:28 +00:00 |
| models | Paligemma: fix generation with Gemma2 (#36044) | 2025-02-06 14:31:32 +01:00 |
| optimization | Update unwrap_and_save_reload_schedule to use weights_only=False (#35952) | 2025-01-29 14:30:57 +01:00 |
| peft_integration | use torch.testing.assertclose instead to get more details about error in cis (#35659) | 2025-01-24 16:55:28 +01:00 |
| pipelines | Output dicts support in text generation pipeline (#35092) | 2025-01-29 14:44:46 +00:00 |
| quantization | Fix words typos in ggml test. (#36060) | 2025-02-06 15:32:40 +00:00 |
| repo_utils | Fix modular edge case + modular sorting order (#35562) | 2025-01-09 17:17:52 +01:00 |
| sagemaker | Trainer - deprecate tokenizer for processing_class (#32385) | 2024-10-02 14:08:46 +01:00 |
| tokenization | tokenizer train from iterator without pre_tokenizers (#35396) | 2025-01-09 15:34:43 +01:00 |
| tp | Update-tp test (#35844) | 2025-02-03 09:37:02 +01:00 |
| trainer | layernorm_decay_fix (#35927) | 2025-02-04 11:01:49 +01:00 |
| utils | Nail in edge case of torch dtype being overriden permantly in the case of an error (#35845) | 2025-02-06 09:05:23 -05:00 |
| `__init__.py` | | |
| test_audio_classification_top_k.py | Fix Audio Classification Pipeline top_k Documentation Mismatch and Bug #35736 (#35771) | 2025-02-05 16:25:08 +00:00 |
| test_backbone_common.py | Align backbone stage selection with out_indices & out_features (#27606) | 2023-12-20 18:33:17 +00:00 |
| test_configuration_common.py | Load sub-configs from composite configs (#34410) | 2024-11-05 11:34:01 +01:00 |
| test_feature_extraction_common.py | | |
| test_image_processing_common.py | Refactoring of ImageProcessorFast (#35069) | 2025-02-04 17:52:31 -05:00 |
| test_image_transforms.py | fix: center_crop occasionally outputs off-by-one dimension matrix (#30934) | 2024-05-21 13:56:52 +01:00 |
| test_modeling_common.py | Fix model kwargs (#35875) | 2025-02-06 11:35:25 -05:00 |
| test_modeling_flax_common.py | 🚨All attention refactor🚨 (#35235) | 2024-12-18 16:53:39 +01:00 |
| test_modeling_tf_common.py | 🚨All attention refactor🚨 (#35235) | 2024-12-18 16:53:39 +01:00 |
| test_pipeline_mixin.py | Add image text to text pipeline (#34170) | 2024-10-31 15:48:11 -04:00 |
| test_processing_common.py | VLMs: major clean up 🧼 (#34502) | 2025-01-08 10:35:23 +01:00 |
| test_sequence_feature_extraction_common.py | Fix typo (#25966) | 2023-09-05 10:12:25 +02:00 |
| test_tokenization_common.py | apply_chat_template: consistent behaviour for return_assistant_tokens_mask=True return_tensors=True (#35582) | 2025-02-04 10:27:52 +01:00 |