transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-03 12:50:06 +06:00

Alex Brooks 623d395aff Add Granite Speech Support (#36801)	2025-04-11 18:52:00 +02:00

* First pass at speech granite. Add encoder / projector, rename things
* Combine into one model file with causal lm outputs for forward
* Add loss calc
* Fix config loading
  Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
* Split new / old loading logic
* Use transformers integration for loading peft adapters
* Add generation wrapper for selective lora enablement
* Add note for qformer encoder automodel
* Guard torch/audio imports in feature extractor
* Handle granite speech autoclasses
* Handle optional deps in package structure for granite speech
* Add granite pretrained model def for init
* Add dummy objects for torch/torchaudio
* Add tests for granite speech processor
* Minor formatting fixes and refactoring
* Add options for falling back to config in forward
* Tentative model docstrings for granite speech
* Fix config type
* Remove legacy load
* Allow non-lora variants for granite speech
* Override weight tying for llm
* Use text config instead of llm config
* Add output embeddings getter to fix weight tying
* Fix relative imports
* computing the number of audio features, based on the raw audio sequence.
* collating audio inputs, and keeping the original lengths.
* asserted we have text. otherwise we can't specify the audio special token.
* asserting the number of audio-symbols/audios match correctly. running get validated_audios only when audio is present
* indentation bugfix + supporting different feature lengths when expanding audio.
* redundant, done in _get_validated_text
* adapting the tests:
  - we must have text (not either audio or text)
  - _get_num_audio_features takes a list of raw lengths, provided it instead.
* Minor cleanup, remove unused import
* Add more tests for batch feature processing
* Allow setting offset in rel position embeddings
* Add config option for warning if peft is not installed w/ lora
* Port blip2 qformer code into granite speech
* Add sad test for numpy arr processing
* Allow numpy arrays / tuples in granite speech processor
* Fix config type for projector
* - pad instead of creating a zeros tensor, to keep the original dtype/device (support bfloat16)
  - cast input_features to the model dtype (support bfloat16)
* merge Blip2QFormerConfig to GraniteSpeechProjectorConfig
* prevent a crash when re-saving/loading the model (line 109)
* consider additional edge cases during preprocessing.
* consider additional edge cases during preprocessing.
* add features mask for batched inference (bugfix)
* Minor refactor, remove multiaudio processor tests
* Add set input/output embeddings for granite speech
* Fix feature dim check in processor test
* Pop input features in embed test for granite speech
* Small fixes for test edge cases. Add granite speech to seq2seq causal lm mapping names
* Add small tests for granite speech model
* Fix data parallelism test
* Standardize model class names
* Fix check for copies
* Fix misaligned init check
* Skip granite speech in checkpoint check
* Use default for tie_word_embeddings in granite speech
* Fix non documentation granite speech repo issues
* Fix comments and docstring checks
* Add placeholder docs for granite speech
* Fix test naming collision
* Code formatting
* Rerun torch dummy obj regen
* Fix save pretrained for granite speech
* Import sorting
* Fix tests typo
* Remove offset hack
* Pass args through encoder config
* Remove unused prune heads from blip2
* removing einsum. replaced with explicit multiplication (relative positional encodings) and sdpa attention.
* remove Sequential from ConformerFeedForward and ConformerConvModule. + fix for sdpa attention
* remove GraniteSpeechConformerScale
* rename to hidden_states
* rename conformer layers to self.layers, remove the first linear from the list to keep the list homogenous.
* move pre-norm to the attention/feedforward blocks (avoid complex module wrapping)
* adding pre_norm into forward
* feature extractor refactoring to resemble how it's done in phi4multimodal.
* rename feature_extractor to audio_processor
* bugfix: input_feature_mask fix to get the exact number tokens.
* Fix pytest decorator in processor test
* Add (disabled) integration tests for granite speech
* Fix handling of optional feature masking
* Loosen validation in processing for vLLM compatibility
* Formatting fixes
* Update init structure to mirror llama
* Make granite speech projector generic
* Update test config to reflect generic projector
* Formatting fixes
* Fix typos, add license
* Fix undefined var in input processing
* Cleanup and expose ctc encoder
* Add missing config docstrings
* Better var names, type hints, etc
* Set attn context size in init
* Add max pos emb to encoder config
* Cleanup feature extractor
* Add granite speech architecture details
* Remove granite speech qformer ref
* Add paper link, explicit calc for qkv
* Calculate padding directly in depthwise conv1d init
* Raise value error instead of asserting
* Reorder class defs (classes used at top)
* Precompute relpos distances
* Run formatting
* Pass attention distances through forward
* Apply suggestions from code review
  Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
* Add todo for using common batch feature extraction
* Rename audios/features
* Ensure chat template may be provided to processor
* Move granite speech docs to audio models
* Add todos for input proc refactoring
* Fix import order
* Guard torch import
* Use relative imports
* Require torch backend for processor in granite speech
* Add backend guards in feature extractor

Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
Co-authored-by: Avihu Dekel <avihu.dekel@ibm.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>

test_module	AutoImageProcessor (#20111)	2022-11-08 19:54:41 +00:00
tf_ops	Check TF ops for ONNX compliance (#10025)	2021-02-15 07:55:10 -05:00
add_pipeline_model_mapping_to_test.py	update ruff version (#30932)	2024-05-22 06:40:15 +02:00
check_bad_commit.py	Fix `utils/check_bad_commit.py` (#37272)	2025-04-04 12:18:20 +02:00
check_build.py	Use `deformable_detr` kernel from the Hub (#36853)	2025-03-21 13:08:47 +01:00
check_config_attributes.py	Multiple llama4 fixe (#37353)	2025-04-08 11:14:49 +02:00
check_config_docstrings.py	Add Granite Speech Support (#36801)	2025-04-11 18:52:00 +02:00
check_copies.py	chore: fix typos in utils module (#36668)	2025-03-13 15:12:44 +00:00
check_doc_toc.py	update ruff version (#30932)	2024-05-22 06:40:15 +02:00
check_docstrings.py	Expose blip2qformer (#37254)	2025-04-08 12:04:33 +02:00
check_doctest_list.py	update ruff version (#30932)	2024-05-22 06:40:15 +02:00
check_dummies.py	Add llama4 (#37307)	2025-04-05 22:02:22 +02:00
check_inits.py	Simplify soft dependencies and update the dummy-creation process (#36827)	2025-04-11 11:08:36 +02:00
check_model_tester.py	Add a new script to check model testers' config (#22063)	2023-03-13 19:11:19 +01:00
check_modular_conversion.py	[modular] Sort modular skips (#36304)	2025-03-20 10:55:12 +00:00
check_repo.py	Simplify soft dependencies and update the dummy-creation process (#36827)	2025-04-11 11:08:36 +02:00
check_self_hosted_runner.py	Tiny fix for `check_self_hosted_runner.py` (#24052)	2023-06-06 18:17:41 +02:00
check_tf_ops.py	Check TF ops for ONNX compliance (#10025)	2021-02-15 07:55:10 -05:00
create_dependency_mapping.py	Modular Conversion --fix_and_overwrite on Windows (#36583)	2025-03-06 13:12:30 +00:00
create_dummy_models.py	CI: fix `efficientnet` pipeline timeout and prevent future similar issues due to large image size (#33123)	2024-08-27 11:58:27 +01:00
custom_init_isort.py	chore: fix typos in utils module (#36668)	2025-03-13 15:12:44 +00:00
deprecate_models.py	chore: fix typos in utils module (#36668)	2025-03-13 15:12:44 +00:00
download_glue_data.py	Update ruff to `0.11.2` (#36962)	2025-03-25 16:00:11 +01:00
extract_warnings.py	update github actions packages' version to suppress warnings (#30249)	2024-04-15 15:08:09 +02:00
fetch_hub_objects_for_ci.py	Try to avoid/reduce some remaining CI job failures (#37202)	2025-04-02 14:39:57 +02:00
get_ci_error_statistics.py	Add artifact name in job step to maintain job / artifact correspondence (#28682)	2024-01-31 15:58:17 +01:00
get_github_job_time.py	Update ruff to `0.11.2` (#36962)	2025-03-25 16:00:11 +01:00
get_modified_files.py	exclude deleted files in the fixup script (#21436)	2023-02-03 12:57:02 -05:00
get_previous_daily_ci.py	Ping team members for new failed tests in daily CI (#34171)	2024-10-17 16:11:52 +02:00
get_test_info.py	CI: fix `efficientnet` pipeline timeout and prevent future similar issues due to large image size (#33123)	2024-08-27 11:58:27 +01:00
important_models.txt	ENH: [`CI`] Add new workflow to run slow tests of important models on push main if they are modified (#29235)	2024-04-12 10:01:28 +02:00
models_to_deprecate.py	update ruff version (#30932)	2024-05-22 06:40:15 +02:00
modular_model_converter.py	Introduce modular files for speech models (#35902)	2025-04-04 11:46:27 +02:00
not_doctested.txt	Simplify soft dependencies and update the dummy-creation process (#36827)	2025-04-11 11:08:36 +02:00
notification_service_doc_tests.py	Refactor doctest (#30210)	2024-04-15 13:20:36 +02:00
notification_service_quantization.py	Update ruff to `0.11.2` (#36962)	2025-03-25 16:00:11 +01:00
notification_service.py	Fix new failure reports not including anything other than `tests/models/` (#37415)	2025-04-10 14:47:23 +02:00
past_ci_versions.py	Update ruff to `0.11.2` (#36962)	2025-03-25 16:00:11 +01:00
patch_helper.py	[`Patch helper`] update to not have to checkout main (#34006)	2024-10-09 09:21:46 +02:00
pr_slow_ci_models.py	notify new model merged to `main` (#36375)	2025-02-24 17:53:18 +01:00
print_env.py	Print more library versions in CI (#17384)	2022-06-02 10:24:16 +02:00
process_bad_commit_report.py	Tiny update after #34383 (#34404)	2024-10-28 12:01:05 +01:00
process_circleci_workflow_test_reports.py	Update ruff to `0.11.2` (#36962)	2025-03-25 16:00:11 +01:00
process_test_artifacts.py	fix the parallel number of CI nodes when it is smaller than number of tests (#33276)	2024-09-03 16:53:21 +02:00
release.py	Remove research projects (#36645)	2025-03-11 13:47:38 +00:00
set_cuda_devices_for_ci.py	Fix Cohere CI (#31263)	2024-06-10 15:16:58 +02:00
slow_documentation_tests.txt	Update CodeLlama references (#30218)	2024-05-09 22:57:52 +02:00
sort_auto_mappings.py	update ruff version (#30932)	2024-05-22 06:40:15 +02:00
split_doctest_jobs.py	chore: fix typos in utils module (#36668)	2025-03-13 15:12:44 +00:00
split_model_tests.py	consistent job / pytest report / artifact name correspondence (#30392)	2024-04-24 22:32:42 +02:00
tests_fetcher.py	Fix the test fetcher (#37452)	2025-04-11 12:19:27 +02:00
update_metadata.py	Update ruff to `0.11.2` (#36962)	2025-03-25 16:00:11 +01:00
update_tiny_models.py	Mention model_info.id instead of model_info.modelId (#32106)	2024-07-22 14:14:47 +01:00