bd-fhir-national/ops/version-upgrade-integration.md
2026-03-16 00:02:58 +06:00

ICD-11 Version Upgrade — HAPI Integration

Audience: ICD-11 Terminology Pipeline team, DGHS FHIR ops
Related: version_upgrade.py (OCL import pipeline)
HAPI endpoint: DELETE /admin/terminology/cache


Overview

When a new ICD-11 MMS release is imported into OCL, entries in the HAPI server's 24-hour terminology validation cache become stale. Vendors submitting resources after the import, but before those entries expire, will have their ICD-11 codes validated against results cached from the old OCL data. A code that is new in the release but was looked up before the import (cache miss → OCL hit with old data → cached as invalid) will be incorrectly rejected until its entry expires. Removed or reclassified codes that were previously valid will continue to be accepted from cache.

The cache flush endpoint resolves this. Calling it after OCL import forces the next validation call for every ICD-11 code to hit OCL directly, repopulating the cache with the new version's data.
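
The stale-cache behaviour described above can be reproduced with a small in-memory model. This is a minimal sketch, not HAPI's cache implementation: the `TtlValidationCache` class, its `validate` and `flush` methods, and the set standing in for OCL are hypothetical names introduced only for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class TtlValidationCache:
    """Toy model of a validation cache with a fixed time-to-live (hypothetical)."""

    def __init__(self, lookup: Callable[[str], bool], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup      # stands in for a $validate-code call to OCL
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[bool, float]] = {}

    def validate(self, code: str) -> bool:
        now = self._clock()
        entry = self._entries.get(code)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]        # cache hit: the answer may be stale
        result = self._lookup(code)  # cache miss: ask the backing store
        self._entries[code] = (result, now)
        return result

    def flush(self) -> int:
        """Evict every entry and return how many were removed."""
        evicted = len(self._entries)
        self._entries.clear()
        return evicted


# The set stands in for the codes OCL currently reports as valid.
ocl_valid_codes = {"OLD1"}
cache = TtlValidationCache(lambda code: code in ocl_valid_codes,
                           ttl_seconds=24 * 3600)

before_new = cache.validate("NEW1")  # looked up before the import: cached as invalid
before_old = cache.validate("OLD1")  # cached as valid

# The new release is imported: NEW1 is added, OLD1 is removed.
ocl_valid_codes.add("NEW1")
ocl_valid_codes.discard("OLD1")

stale_new = cache.validate("NEW1")   # still rejected, served from cache
stale_old = cache.validate("OLD1")   # still accepted, served from cache

evicted = cache.flush()
fresh_new = cache.validate("NEW1")   # re-queried against the new data
fresh_old = cache.validate("OLD1")
```

Until the flush, both answers come from cache and reflect the old release; after it, each code is looked up again on first use.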


Step-by-step upgrade procedure

The following steps must be executed in this exact order. Deviating from the order (e.g., flushing before OCL import completes) causes the cache to repopulate with old data and requires a second flush.

Step 1  OCL: import new ICD-11 MMS release
Step 2  OCL: patch concept_class for Diagnosis + Finding concepts
Step 3  OCL: repopulate bd-condition-icd11-diagnosis-valueset collection
Step 4  OCL: verify $validate-code returns correct results for new codes
Step 5  HAPI: flush terminology cache        ← this document
Step 6  HAPI: verify validation with new codes
Step 7  DGHS: notify vendors of new release

Steps 1-4 are handled by version_upgrade.py. This document covers Steps 5-6 and the exact integration between the two systems.


Step 4 — Pre-flush verification (run before calling HAPI)

Before flushing the HAPI cache, verify that OCL is serving correct results for the new release. If the cache is flushed while OCL is still serving incorrect results, it repopulates with those results and must be flushed again once OCL is fixed.

4a — Verify a new code is valid in OCL

Pick a code that is new in this release (not in the previous release).

NEW_CODE="XY9Z"  # Replace with an actual new code from the release notes

curl -s "https://tr.ocl.dghs.gov.bd/api/fhir/ValueSet/\$validate-code\
?url=https://fhir.dghs.gov.bd/core/ValueSet/bd-condition-icd11-diagnosis-valueset\
&system=http://id.who.int/icd/release/11/mms\
&code=${NEW_CODE}" | jq '.parameter[] | select(.name=="result") | .valueBoolean'

# Expected: true

4b — Verify a Device-class code is rejected by OCL

Device-class codes must be rejected by the bd-condition-icd11-diagnosis-valueset (which restricts to Diagnosis + Finding only).

DEVICE_CODE="XA7RE2"  # Example Device class code — use an actual one

curl -s "https://tr.ocl.dghs.gov.bd/api/fhir/ValueSet/\$validate-code\
?url=https://fhir.dghs.gov.bd/core/ValueSet/bd-condition-icd11-diagnosis-valueset\
&system=http://id.who.int/icd/release/11/mms\
&code=${DEVICE_CODE}" | jq '.parameter[] | select(.name=="result") | .valueBoolean'

# Expected: false

4c — Verify a deprecated code is invalid

If this release deprecates or removes any codes, verify they are now rejected.

DEPRECATED_CODE="..."  # From release notes

curl -s "https://tr.ocl.dghs.gov.bd/api/fhir/ValueSet/\$validate-code\
?url=https://fhir.dghs.gov.bd/core/ValueSet/bd-condition-icd11-diagnosis-valueset\
&system=http://id.who.int/icd/release/11/mms\
&code=${DEPRECATED_CODE}" | jq '.parameter[] | select(.name=="result") | .valueBoolean'

# Expected: false (if deprecated) or true (if still valid)

Do not proceed to Step 5 until all 4a-4c verifications pass.
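
The three checks above all read the same field from the `$validate-code` response. A minimal Python sketch of that extraction, equivalent to the jq filter, is shown below together with a comparison against the expected outcomes. The function names, the check labels, and the sample responses are illustrative assumptions; the samples are hand-written in the shape of a FHIR Parameters resource, not captured from OCL.

```python
from typing import Dict, List, Optional


def extract_result(parameters: dict) -> Optional[bool]:
    """Return the boolean 'result' parameter, or None if it is absent."""
    for param in parameters.get("parameter", []):
        if param.get("name") == "result":
            return param.get("valueBoolean")
    return None


def failed_checks(observed: Dict[str, Optional[bool]],
                  expected: Dict[str, bool]) -> List[str]:
    """List the checks whose observed result differs from the expected one."""
    return [name for name, want in expected.items()
            if observed.get(name) is not want]


# Hand-written sample responses (illustrative only).
sample_valid = {
    "resourceType": "Parameters",
    "parameter": [{"name": "result", "valueBoolean": True}],
}
sample_invalid = {
    "resourceType": "Parameters",
    "parameter": [
        {"name": "result", "valueBoolean": False},
        {"name": "message", "valueString": "code not in value set"},
    ],
}

observed = {
    "4a_new_code": extract_result(sample_valid),
    "4b_device_code": extract_result(sample_invalid),
    "4c_deprecated_code": extract_result(sample_invalid),
}
expected = {
    "4a_new_code": True,
    "4b_device_code": False,
    "4c_deprecated_code": False,
}
failures = failed_checks(observed, expected)
```

A missing `result` parameter is reported as a failure rather than treated as `false`, so a malformed or error response cannot pass a check that expects rejection.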


Step 5 — Flush the HAPI terminology cache

5a — Obtain fhir-admin token

The cache flush endpoint requires the fhir-admin Keycloak role. The fhir-admin-pipeline client is the designated service account for this operation (see ops/keycloak-setup.md, Part 2).

# In version_upgrade.py — add this function

import json
import os

import requests

KEYCLOAK_TOKEN_URL = "https://auth.dghs.gov.bd/realms/hris/protocol/openid-connect/token"
FHIR_ADMIN_CLIENT_ID = "fhir-admin-pipeline"
FHIR_ADMIN_CLIENT_SECRET = os.environ["FHIR_ADMIN_CLIENT_SECRET"]  # from secrets vault
# Read from the HAPI_BASE_URL environment variable; falls back to production
HAPI_BASE_URL = os.environ.get("HAPI_BASE_URL", "https://fhir.dghs.gov.bd")


def get_fhir_admin_token() -> str:
    """Obtain a fhir-admin Bearer token from Keycloak."""
    response = requests.post(
        KEYCLOAK_TOKEN_URL,
        data={
            "grant_type":    "client_credentials",
            "client_id":     FHIR_ADMIN_CLIENT_ID,
            "client_secret": FHIR_ADMIN_CLIENT_SECRET,
        },
        timeout=30,
    )
    response.raise_for_status()
    token_data = response.json()
    access_token = token_data["access_token"]

    # Verify the token contains fhir-admin role before using it
    # (parse the middle segment of the JWT, which is base64url-encoded)
    import base64
    payload_b64 = access_token.split(".")[1]
    # Restore the padding that JWTs strip; adds nothing if already aligned
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))

    realm_roles = claims.get("realm_access", {}).get("roles", [])
    if "fhir-admin" not in realm_roles:
        raise ValueError(
            f"fhir-admin-pipeline token does not contain fhir-admin role. "
            f"Roles present: {realm_roles}. "
            f"Check Keycloak service account role assignment."
        )

    return access_token

5b — Record pre-flush cache state

def get_cache_stats(admin_token: str) -> dict:
    """Retrieve current HAPI terminology cache statistics."""
    response = requests.get(
        f"{HAPI_BASE_URL}/admin/terminology/cache/stats",
        headers={"Authorization": f"Bearer {admin_token}"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


# Usage:
admin_token = get_fhir_admin_token()
stats_before = get_cache_stats(admin_token)
print(f"Cache before flush: {stats_before['totalEntries']} entries "
      f"({stats_before['liveEntries']} live, "
      f"{stats_before['expiredEntries']} expired)")

5c — Execute cache flush

def flush_hapi_terminology_cache(admin_token: str) -> dict:
    """
    Flush the HAPI ICD-11 terminology validation cache.

    Must be called AFTER:
      - OCL ICD-11 import is complete
      - concept_class patch is applied
      - bd-condition-icd11-diagnosis-valueset is repopulated
      - $validate-code verified returning correct results

    Returns the flush summary from HAPI.
    Raises requests.HTTPError on failure.
    """
    response = requests.delete(
        f"{HAPI_BASE_URL}/admin/terminology/cache",
        headers={"Authorization": f"Bearer {admin_token}"},
        timeout=60,  # allow time for HAPI to process across all replicas
    )

    if response.status_code == 403:
        raise PermissionError(
            "Cache flush rejected: fhir-admin role not present in token. "
            "Check Keycloak fhir-admin-pipeline service account configuration."
        )
    response.raise_for_status()

    result = response.json()
    print(f"HAPI cache flush completed: {result['entriesEvicted']} entries evicted "
          f"at {result['timestamp']}")
    return result


# Full upgrade function to add to version_upgrade.py:
def post_ocl_import_hapi_integration(icd11_version: str) -> None:
    """
    Call after successful OCL import and verification.
    Flushes HAPI cache and verifies the new version validates correctly.

    Args:
        icd11_version: The new ICD-11 version string, e.g. "2025-01"
    """
    print(f"\n=== HAPI integration: ICD-11 {icd11_version} ===")

    # Step 5a: get admin token
    print("Obtaining fhir-admin token...")
    admin_token = get_fhir_admin_token()
    print("Token obtained.")

    # Step 5b: record pre-flush state
    stats_before = get_cache_stats(admin_token)
    print(f"Pre-flush cache: {stats_before['totalEntries']} entries")

    # Step 5c: flush
    print("Flushing HAPI terminology cache...")
    flush_result = flush_hapi_terminology_cache(admin_token)
    print(f"Flush complete: {flush_result['entriesEvicted']} entries evicted")

    # Step 6: post-flush verification (see below)
    verify_hapi_validates_new_version(admin_token, icd11_version)

    print(f"=== HAPI integration complete for ICD-11 {icd11_version} ===\n")

Step 6 — Post-flush verification

After the flush, verify that HAPI is now validating against the new OCL data. This confirms the end-to-end pipeline from OCL → HAPI cache → vendor validation.

6a — Submit a test Condition with a new ICD-11 code

The fhir-admin-pipeline client cannot submit the test resource with its own token: it holds the fhir-admin role, but the FHIR resource endpoints require the mci-api role. Either use a dedicated test vendor client for resource submission, or temporarily assign mci-api to the admin client for testing.

Recommended approach: use a dedicated test vendor client (fhir-vendor-test-pipeline) with mci-api role for post-upgrade verification.

def verify_hapi_validates_new_version(
        admin_token: str, icd11_version: str) -> None:
    """
    Verifies HAPI is now accepting codes from the new ICD-11 version.
    Uses the $validate-code operation directly against HAPI (not resource submission)
    to avoid needing mci-api role on the admin client.

    Note: HAPI's $validate-code endpoint proxies to OCL via the validation chain.
    A successful result confirms the cache was flushed AND OCL is returning
    correct results for the new version.
    """
    # Use a known-valid code from the new release
    # This should be parameterised with the actual new code from release notes
    test_code = get_test_code_for_version(icd11_version)  # implement per release
    valueset_url = (
        "https://fhir.dghs.gov.bd/core/ValueSet/"
        "bd-condition-icd11-diagnosis-valueset"
    )

    response = requests.get(
        f"{HAPI_BASE_URL}/fhir/ValueSet/$validate-code",
        params={
            "url":    valueset_url,
            "system": "http://id.who.int/icd/release/11/mms",
            "code":   test_code,
        },
        headers={"Authorization": f"Bearer {admin_token}"},
        timeout=30,
    )

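    if response.status_code in (401, 403):
        # $validate-code requires mci-api, which the admin token lacks;
        # use a vendor test token here. 403 is handled as well as 401,
        # matching the missing-role response of the flush endpoint.
        print("WARNING: $validate-code requires mci-api role. "
              "Skipping HAPI direct verification. "
              "Verify manually by submitting a test Condition resource.")
        return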

    response.raise_for_status()
    result = response.json()

    valid = next(
        (p["valueBoolean"] for p in result.get("parameter", [])
         if p["name"] == "result"),
        None
    )

    if valid is True:
        print(f"✓ HAPI verification passed: code '{test_code}' "
              f"valid in new ICD-11 {icd11_version}")
    else:
        message = next(
            (p.get("valueString") for p in result.get("parameter", [])
             if p["name"] == "message"),
            "no message"
        )
        raise ValueError(
            f"HAPI verification FAILED: code '{test_code}' rejected after cache flush. "
            f"Message: {message}. "
            f"Check OCL import completed correctly for ICD-11 {icd11_version}."
        )
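
`verify_hapi_validates_new_version` calls `get_test_code_for_version`, which is left to be implemented per release. One possible shape, sketched here as an assumption rather than the pipeline's actual implementation, is a lookup table that fails loudly when no code has been configured for the version being upgraded. The `TEST_CODES_BY_VERSION` name is hypothetical, and "XY9Z" is the same placeholder used in Step 4a, not a real ICD-11 code.

```python
from typing import Dict

# Placeholder entries only: replace each value with a code taken from that
# release's notes. "XY9Z" is not a real ICD-11 code.
TEST_CODES_BY_VERSION: Dict[str, str] = {
    "2025-01": "XY9Z",
}


def get_test_code_for_version(icd11_version: str) -> str:
    """Return the configured verification code, failing loudly if none is set."""
    try:
        return TEST_CODES_BY_VERSION[icd11_version]
    except KeyError:
        raise LookupError(
            f"No verification code configured for ICD-11 {icd11_version}. "
            "Add one from the release notes before running the upgrade."
        ) from None


configured_code = get_test_code_for_version("2025-01")
```

Raising instead of falling back to a default keeps a forgotten entry from silently verifying against a code that already existed in the previous release.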

Integration into version_upgrade.py — call site

Add to the end of your main upgrade function, after the OCL verification steps:

def run_upgrade(icd11_version: str) -> None:
    """Main upgrade entry point."""

    # --- Existing steps (your current implementation) ---
    print(f"Starting ICD-11 {icd11_version} upgrade...")

    # 1. Import ICD-11 concepts into OCL
    import_concepts_to_ocl(icd11_version)

    # 2. Patch concept_class for Diagnosis + Finding
    patch_concept_class(icd11_version)

    # 3. Repopulate bd-condition-icd11-diagnosis-valueset
    repopulate_condition_valueset(icd11_version)

    # 4. Verify OCL $validate-code
    verify_ocl_validate_code(icd11_version)

    # --- New: HAPI integration ---
    # 5-6. Flush HAPI cache and verify
    post_ocl_import_hapi_integration(icd11_version)

    # 7. Notify vendors
    notify_vendors_of_upgrade(icd11_version)

    print(f"ICD-11 {icd11_version} upgrade complete.")

Environment variables required by version_upgrade.py

Add to your upgrade pipeline's secrets configuration:

# Keycloak admin client for HAPI cache management
FHIR_ADMIN_CLIENT_SECRET=<secret from keycloak-setup.md Part 2>

# HAPI server base URL
HAPI_BASE_URL=https://fhir.dghs.gov.bd
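
A pipeline can check these variables once at start-up so that a missing secret is reported before any OCL work begins. The following is a minimal sketch under stated assumptions: `load_hapi_settings` is a hypothetical helper, and treating `HAPI_BASE_URL` as optional with the production URL as its default is an assumption of this sketch, not a requirement stated above.

```python
import os
from typing import Dict, Mapping, Optional

REQUIRED_VARS = ("FHIR_ADMIN_CLIENT_SECRET",)
# Assumption: default to the production base URL when the variable is unset.
DEFAULTS = {"HAPI_BASE_URL": "https://fhir.dghs.gov.bd"}


def load_hapi_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect the HAPI integration settings, failing fast on missing secrets."""
    source = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not source.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    settings = {name: source[name] for name in REQUIRED_VARS}
    for name, default in DEFAULTS.items():
        # Strip a trailing slash so URLs can be joined with "/admin/..."
        settings[name] = source.get(name, default).rstrip("/")
    return settings


# Example with an explicit mapping instead of the real environment.
example_settings = load_hapi_settings({
    "FHIR_ADMIN_CLIENT_SECRET": "example-only",
    "HAPI_BASE_URL": "https://fhir.dghs.gov.bd/",
})
```
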

Rollback procedure

If post-flush verification fails (HAPI is not accepting new codes):

  1. Do not re-run the flush straight away: the cache has just been emptied, so re-flushing has no effect until the cause is fixed.
  2. Check OCL directly: curl https://tr.ocl.dghs.gov.bd/api/fhir/ValueSet/$validate-code?...
  3. If OCL is returning wrong results: the OCL import is incomplete. Re-run steps 1-4.
  4. If OCL is returning correct results but HAPI is not: check HAPI logs for OCL connectivity errors. OCL may have returned HTTP 5xx during the first post-flush validation call, triggering fail-open behaviour.
  5. After fixing OCL: flush the cache again (it has repopulated with bad data).

# Emergency manual flush via curl
ADMIN_TOKEN=$(curl -s -X POST \
  "https://auth.dghs.gov.bd/realms/hris/protocol/openid-connect/token" \
  -d "grant_type=client_credentials" \
  -d "client_id=fhir-admin-pipeline" \
  -d "client_secret=${FHIR_ADMIN_CLIENT_SECRET}" \
  | jq -r '.access_token')

curl -s -X DELETE \
  -H "Authorization: Bearer ${ADMIN_TOKEN}" \
  https://fhir.dghs.gov.bd/admin/terminology/cache | jq .
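
The decision points in the rollback list can be expressed as a small function for use in pipeline logging or alerting. This is a sketch of the list's logic only; `RollbackAction` and `next_rollback_action` are hypothetical names, and the two boolean inputs are assumed to come from the OCL and HAPI `$validate-code` checks described earlier.

```python
from enum import Enum


class RollbackAction(Enum):
    NONE = "verification passed; nothing to roll back"
    RERUN_OCL_IMPORT = "OCL import incomplete: re-run steps 1-4, then flush again"
    CHECK_HAPI_LOGS = "check HAPI logs for OCL connectivity errors"


def next_rollback_action(ocl_correct: bool, hapi_correct: bool) -> RollbackAction:
    """Map the two verification outcomes to the next step in the rollback list."""
    if not ocl_correct:
        # OCL itself is wrong, so anything HAPI has cached since the flush
        # is suspect regardless of what HAPI currently reports.
        return RollbackAction.RERUN_OCL_IMPORT
    if not hapi_correct:
        return RollbackAction.CHECK_HAPI_LOGS
    return RollbackAction.NONE


action = next_rollback_action(ocl_correct=True, hapi_correct=False)
```
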

Cache warm-up after flush

The HAPI cache repopulates organically as vendors submit resources. There is no pre-warming mechanism. The first vendor submission after a flush for each code will take up to 10 seconds (OCL timeout) rather than sub-millisecond (cache hit). At pilot scale (50 vendors, <36,941 distinct codes in use), this is acceptable.

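At national scale, consider a pre-warming job that submits $validate-code requests for the top-N most frequently submitted ICD-11 codes immediately after the flush. The top-N list is derivable from the audit.audit_events table. The first query below ranks rejected codes from audit.fhir_rejected_submissions and is not suitable as a warm-up list; the second, which ranks accepted Condition submissions, is the one to use: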

SELECT invalid_code, COUNT(*) as frequency
FROM audit.fhir_rejected_submissions
WHERE rejection_code = 'TERMINOLOGY_INVALID_CODE'
  AND submission_time > NOW() - INTERVAL '90 days'
GROUP BY invalid_code
ORDER BY frequency DESC
LIMIT 100;
-- Note: the query above returns rejected codes, so it is not a warm-up list.
-- Use accepted codes from audit.audit_events instead:

SELECT
    (validation_messages ->> 0) as code_info,
    COUNT(*) as frequency
FROM audit.audit_events
WHERE outcome = 'ACCEPTED'
  AND resource_type = 'Condition'
  AND event_time > NOW() - INTERVAL '90 days'
GROUP BY 1
ORDER BY frequency DESC
LIMIT 200;
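
A pre-warming job along these lines could be sketched as follows. This is an illustration under assumptions, not an existing part of version_upgrade.py: `prewarm_cache` is a hypothetical function, the `validate` callable is assumed to wrap a `$validate-code` request to HAPI made with a token that holds the required role, and the worker count is arbitrary. The stand-in validator below replaces the HTTP call so the sketch runs offline.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional, Tuple


def prewarm_cache(
    codes: Iterable[str],
    validate: Callable[[str], bool],
    max_workers: int = 4,
) -> Tuple[Dict[str, bool], Dict[str, str]]:
    """Call validate() once per distinct code; return (results, errors)."""
    unique_codes = list(dict.fromkeys(codes))  # de-duplicate, keep order

    def _attempt(code: str) -> Tuple[str, Optional[bool], Optional[str]]:
        try:
            return code, validate(code), None
        except Exception as exc:  # one failed code must not stop the job
            return code, None, str(exc)

    results: Dict[str, bool] = {}
    errors: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for code, valid, error in pool.map(_attempt, unique_codes):
            if error is None:
                results[code] = bool(valid)
            else:
                errors[code] = error
    return results, errors


def fake_validate(code: str) -> bool:
    """Stand-in for the HTTP call; simulates one timeout."""
    if code == "TIMEOUT":
        raise TimeoutError("simulated OCL timeout")
    return code.startswith("A")


results, errors = prewarm_cache(["A1", "B2", "A1", "TIMEOUT"], fake_validate)
```

Codes that fail are collected rather than retried, so the job can report them and leave the cache to fill organically for those entries.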