bd-fhir-national/ops/version-upgrade-integration.md

# ICD-11 Version Upgrade — HAPI Integration

**Audience:** ICD-11 Terminology Pipeline team, DGHS FHIR ops
**Related:** `version_upgrade.py` (OCL import pipeline)
**HAPI endpoint:** `DELETE /admin/terminology/cache`

---

## Overview

When a new ICD-11 MMS release is imported into OCL, the HAPI server's
24-hour terminology validation cache becomes stale. Vendors submitting
resources after the import, but before the cache expires, will have their
ICD-11 codes validated against results cached from the **old** OCL data. New
codes that were looked up before the import completed (cache miss → OCL
lookup against old data → cached as invalid) will keep being rejected as
invalid. Removed or reclassified codes that were previously valid will
continue to be accepted from cache.

**The cache flush endpoint resolves this.** Calling it after OCL import
forces the next validation call for every ICD-11 code to hit OCL directly,
repopulating the cache with the new version's data.
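
The staleness problem and the effect of a flush can be illustrated with a toy model. This is only a sketch of a generic TTL cache, not HAPI's actual cache implementation; the class name `TtlValidationCache` and the codes `OLD1` and `NEW1` are made up for the example.

```python
import time
from typing import Callable, Dict, Tuple


class TtlValidationCache:
    """Toy stand-in for a validation cache with a fixed time-to-live."""

    def __init__(self, lookup: Callable[[str], bool], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup      # backend lookup (OCL, in this document)
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[bool, float]] = {}

    def validate(self, code: str) -> bool:
        entry = self._entries.get(code)
        if entry is not None and self._clock() < entry[1]:
            return entry[0]        # cache hit: the backend is not consulted
        result = self._lookup(code)  # cache miss or expired entry
        self._entries[code] = (result, self._clock() + self._ttl)
        return result

    def flush(self) -> int:
        """Drop every entry and return how many were evicted."""
        evicted = len(self._entries)
        self._entries.clear()
        return evicted


# Simulated terminology server whose set of valid codes changes on import.
valid_codes = {"OLD1"}
cache = TtlValidationCache(lambda code: code in valid_codes, ttl_seconds=24 * 3600)

# Both codes are looked up (and cached) before the import.
before_import = (cache.validate("NEW1"), cache.validate("OLD1"))

# The import adds NEW1 and removes OLD1.
valid_codes.add("NEW1")
valid_codes.discard("OLD1")

stale = (cache.validate("NEW1"), cache.validate("OLD1"))  # served from cache
evicted = cache.flush()
fresh = (cache.validate("NEW1"), cache.validate("OLD1"))  # served by the backend
```

In this model `stale` equals `before_import` even though the backend has changed, and only the lookups after `flush()` reflect the new release.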

---

## Step-by-step upgrade procedure

The following steps must be executed **in this exact order**. Deviating
from the order (e.g., flushing before OCL import completes) causes the
cache to repopulate with old data and requires a second flush.

```
Step 1  OCL: import new ICD-11 MMS release
Step 2  OCL: patch concept_class for Diagnosis + Finding concepts
Step 3  OCL: repopulate bd-condition-icd11-diagnosis-valueset collection
Step 4  OCL: verify $validate-code returns correct results for new codes
Step 5  HAPI: flush terminology cache        ← this document
Step 6  HAPI: verify validation with new codes
Step 7  DGHS: notify vendors of new release
```

Steps 1-4 are handled by `version_upgrade.py`. This document covers
Steps 5-6, the Step 4 checks that gate them, and the exact integration
between the two systems.

---

## Step 4 — Pre-flush verification (run before calling HAPI)

Before flushing the HAPI cache, verify that OCL is serving correct results
for the new release. Flushing a cache backed by an incorrect OCL state
degrades validation quality.

### 4a — Verify a new code is valid in OCL

Pick a code that is **new** in this release (not in the previous release).

```bash
NEW_CODE="XY9Z"  # Replace with an actual new code from the release notes

curl -s "https://tr.ocl.dghs.gov.bd/api/fhir/ValueSet/\$validate-code\
?url=https://fhir.dghs.gov.bd/core/ValueSet/bd-condition-icd11-diagnosis-valueset\
&system=http://id.who.int/icd/release/11/mms\
&code=${NEW_CODE}" | jq '.parameter[] | select(.name=="result") | .valueBoolean'

# Expected: true
```

### 4b — Verify a Device-class code is rejected by OCL

Device-class codes must be rejected by the bd-condition-icd11-diagnosis-valueset
(which restricts to Diagnosis + Finding only).

```bash
DEVICE_CODE="XA7RE2"  # Example Device class code — use an actual one

curl -s "https://tr.ocl.dghs.gov.bd/api/fhir/ValueSet/\$validate-code\
?url=https://fhir.dghs.gov.bd/core/ValueSet/bd-condition-icd11-diagnosis-valueset\
&system=http://id.who.int/icd/release/11/mms\
&code=${DEVICE_CODE}" | jq '.parameter[] | select(.name=="result") | .valueBoolean'

# Expected: false
```

### 4c — Verify a deprecated code is invalid

If this release deprecates or removes any codes, verify they are now rejected.

```bash
DEPRECATED_CODE="..."  # From release notes

curl -s "https://tr.ocl.dghs.gov.bd/api/fhir/ValueSet/\$validate-code\
?url=https://fhir.dghs.gov.bd/core/ValueSet/bd-condition-icd11-diagnosis-valueset\
&system=http://id.who.int/icd/release/11/mms\
&code=${DEPRECATED_CODE}" | jq '.parameter[] | select(.name=="result") | .valueBoolean'

# Expected: false if the release notes list the code as removed;
#           true only if the release notes say it remains valid
```

Do not proceed to Step 5 until all 4a-4c verifications pass.
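
The three checks above could be scripted along the following lines. This is a minimal sketch: `extract_result` and `run_preflush_checks` are hypothetical helper names, and the `validate` argument is assumed to be a caller-supplied function that performs the `$validate-code` request against OCL for one code and returns the parsed FHIR `Parameters` JSON.

```python
from typing import Callable, Dict, Optional


def extract_result(parameters: dict) -> Optional[bool]:
    """Return the boolean 'result' parameter of a FHIR Parameters resource."""
    for param in parameters.get("parameter", []):
        if param.get("name") == "result":
            return param.get("valueBoolean")
    return None  # no 'result' parameter present


def run_preflush_checks(validate: Callable[[str], dict],
                        expectations: Dict[str, bool]) -> Dict[str, bool]:
    """
    Call `validate` for each code and compare the outcome with the expectation.

    `expectations` maps a code to the result it should produce, for example
    a new code to True and a Device-class code to False.
    Returns a {code: passed} mapping; a missing result counts as a failure.
    """
    return {
        code: extract_result(validate(code)) == expected
        for code, expected in expectations.items()
    }


def all_checks_passed(results: Dict[str, bool]) -> bool:
    """True only when every individual check passed."""
    return all(results.values())
```

A caller would stop the pipeline before Step 5 whenever `all_checks_passed(...)` is false.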

---

## Step 5 — Flush the HAPI terminology cache

### 5a — Obtain fhir-admin token

The cache flush endpoint requires the `fhir-admin` Keycloak role.
The `fhir-admin-pipeline` client is the designated service account for
this operation (see `ops/keycloak-setup.md`, Part 2).

```python
# In version_upgrade.py: add this function

import json
import os

import requests

KEYCLOAK_TOKEN_URL = "https://auth.dghs.gov.bd/realms/hris/protocol/openid-connect/token"
FHIR_ADMIN_CLIENT_ID = "fhir-admin-pipeline"
FHIR_ADMIN_CLIENT_SECRET = os.environ["FHIR_ADMIN_CLIENT_SECRET"]  # from secrets vault
# Overridable via the HAPI_BASE_URL environment variable (see below)
HAPI_BASE_URL = os.environ.get("HAPI_BASE_URL", "https://fhir.dghs.gov.bd")


def get_fhir_admin_token() -> str:
    """Obtain a fhir-admin Bearer token from Keycloak."""
    response = requests.post(
        KEYCLOAK_TOKEN_URL,
        data={
            "grant_type":    "client_credentials",
            "client_id":     FHIR_ADMIN_CLIENT_ID,
            "client_secret": FHIR_ADMIN_CLIENT_SECRET,
        },
        timeout=30,
    )
    response.raise_for_status()
    token_data = response.json()
    access_token = token_data["access_token"]

    # Verify the token contains fhir-admin role before using it
    # (parse the middle segment of the JWT; the signature is not checked here)
    import base64
    payload_b64 = access_token.split(".")[1]
    # JWT segments are base64url-encoded with padding stripped; restore it
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))

    realm_roles = claims.get("realm_access", {}).get("roles", [])
    if "fhir-admin" not in realm_roles:
        raise ValueError(
            f"fhir-admin-pipeline token does not contain fhir-admin role. "
            f"Roles present: {realm_roles}. "
            f"Check Keycloak service account role assignment."
        )

    return access_token
```

### 5b — Check cache state before flush (optional but recommended)

```python
def get_cache_stats(admin_token: str) -> dict:
    """Retrieve current HAPI terminology cache statistics."""
    response = requests.get(
        f"{HAPI_BASE_URL}/admin/terminology/cache/stats",
        headers={"Authorization": f"Bearer {admin_token}"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


# Usage:
admin_token = get_fhir_admin_token()
stats_before = get_cache_stats(admin_token)
print(f"Cache before flush: {stats_before['totalEntries']} entries "
      f"({stats_before['liveEntries']} live, "
      f"{stats_before['expiredEntries']} expired)")
```

### 5c — Execute cache flush

```python
def flush_hapi_terminology_cache(admin_token: str) -> dict:
    """
    Flush the HAPI ICD-11 terminology validation cache.

    Must be called AFTER:
      - OCL ICD-11 import is complete
      - concept_class patch is applied
      - bd-condition-icd11-diagnosis-valueset is repopulated
      - $validate-code verified returning correct results

    Returns the flush summary from HAPI.
    Raises requests.HTTPError on failure.
    """
    response = requests.delete(
        f"{HAPI_BASE_URL}/admin/terminology/cache",
        headers={"Authorization": f"Bearer {admin_token}"},
        timeout=60,  # allow time for HAPI to process across all replicas
    )

    if response.status_code == 403:
        raise PermissionError(
            "Cache flush rejected: fhir-admin role not present in token. "
            "Check Keycloak fhir-admin-pipeline service account configuration."
        )
    response.raise_for_status()

    result = response.json()
    print(f"HAPI cache flush completed: {result['entriesEvicted']} entries evicted "
          f"at {result['timestamp']}")
    return result


# Full upgrade function to add to version_upgrade.py:
def post_ocl_import_hapi_integration(icd11_version: str) -> None:
    """
    Call after successful OCL import and verification.
    Flushes HAPI cache and verifies the new version validates correctly.

    Args:
        icd11_version: The new ICD-11 version string, e.g. "2025-01"
    """
    print(f"\n=== HAPI integration: ICD-11 {icd11_version} ===")

    # Step 5a: get admin token
    print("Obtaining fhir-admin token...")
    admin_token = get_fhir_admin_token()
    print("Token obtained.")

    # Step 5b: record pre-flush state
    stats_before = get_cache_stats(admin_token)
    print(f"Pre-flush cache: {stats_before['totalEntries']} entries")

    # Step 5c: flush
    print("Flushing HAPI terminology cache...")
    flush_result = flush_hapi_terminology_cache(admin_token)
    print(f"Flush complete: {flush_result['entriesEvicted']} entries evicted")

    # Step 6: post-flush verification (see below)
    verify_hapi_validates_new_version(admin_token, icd11_version)

    print(f"=== HAPI integration complete for ICD-11 {icd11_version} ===\n")
```
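
The pre-flush statistics can also be used to sanity-check the flush. The following is a sketch under two assumptions that this document does not confirm: that `entriesEvicted` is comparable with `totalEntries`, and that a second call to the stats endpoint after the flush reflects the emptied cache. `flush_warnings` is a hypothetical helper name.

```python
from typing import List


def flush_warnings(stats_before: dict, flush_result: dict,
                   stats_after: dict) -> List[str]:
    """Return human-readable warnings if the flush looks ineffective."""
    warnings: List[str] = []
    before = stats_before.get("totalEntries", 0)
    evicted = flush_result.get("entriesEvicted", 0)
    after = stats_after.get("totalEntries", 0)

    # A populated cache that reports zero evictions suggests the flush
    # did not reach the cache that the stats endpoint describes.
    if before > 0 and evicted == 0:
        warnings.append(
            f"Cache held {before} entries but flush evicted none.")

    # The cache may start refilling at once, but it should not already be
    # as large as it was before the flush.
    if before > 0 and after >= before:
        warnings.append(
            f"Cache still holds {after} entries after flush (was {before}).")

    return warnings
```

An empty list means nothing looked suspicious; it does not prove that every replica was flushed.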

---

## Step 6 — Post-flush verification

After the flush, verify that HAPI is now validating against the new OCL data.
This confirms the end-to-end pipeline from OCL → HAPI cache → vendor validation.

### 6a — Submit a test Condition with a new ICD-11 code

The `fhir-admin-pipeline` client cannot submit the test resource as
configured: it holds the `fhir-admin` role, but the FHIR resource endpoints
require the `mci-api` role. Either use a dedicated test vendor client for
resource submission, or temporarily assign `mci-api` to the admin client for
testing.

**Recommended approach:** use a dedicated test vendor client
(`fhir-vendor-test-pipeline`) with the `mci-api` role for post-upgrade
verification. The function below takes a lighter route: it calls
`$validate-code` on HAPI with the admin token and falls back to a manual
check if that call is refused.

```python
def verify_hapi_validates_new_version(
        admin_token: str, icd11_version: str) -> None:
    """
    Verifies HAPI is now accepting codes from the new ICD-11 version.
    Uses the $validate-code operation directly against HAPI (not resource submission)
    to avoid needing mci-api role on the admin client.

    Note: HAPI's $validate-code endpoint proxies to OCL via the validation chain.
    A successful result confirms the cache was flushed AND OCL is returning
    correct results for the new version.
    """
    # Use a known-valid code from the new release
    # This should be parameterised with the actual new code from release notes
    test_code = get_test_code_for_version(icd11_version)  # implement per release
    valueset_url = (
        "https://fhir.dghs.gov.bd/core/ValueSet/"
        "bd-condition-icd11-diagnosis-valueset"
    )

    response = requests.get(
        f"{HAPI_BASE_URL}/fhir/ValueSet/$validate-code",
        params={
            "url":    valueset_url,
            "system": "http://id.who.int/icd/release/11/mms",
            "code":   test_code,
        },
        headers={"Authorization": f"Bearer {admin_token}"},
        timeout=30,
    )

    if response.status_code in (401, 403):
        # The admin token was refused (401 unauthenticated or 403 missing role):
        # $validate-code requires mci-api, so a vendor test token is needed here
        print("WARNING: $validate-code requires mci-api role. "
              "Skipping HAPI direct verification. "
              "Verify manually by submitting a test Condition resource.")
        return

    response.raise_for_status()
    result = response.json()

    valid = next(
        (p["valueBoolean"] for p in result.get("parameter", [])
         if p["name"] == "result"),
        None
    )

    if valid is True:
        print(f"✓ HAPI verification passed: code '{test_code}' "
              f"valid in new ICD-11 {icd11_version}")
    else:
        message = next(
            (p.get("valueString") for p in result.get("parameter", [])
             if p["name"] == "message"),
            "no message"
        )
        raise ValueError(
            f"HAPI verification FAILED: code '{test_code}' rejected after cache flush. "
            f"Message: {message}. "
            f"Check OCL import completed correctly for ICD-11 {icd11_version}."
        )
```
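
`get_test_code_for_version` is left to be implemented per release. One possible shape, shown only as a sketch, is a lookup in a per-release mapping. The environment variable name `ICD11_TEST_CODES_JSON` and the JSON format are assumptions, not part of the existing pipeline, and the codes must come from the actual release notes.

```python
import json
import os
from typing import Dict, Optional

# Hypothetical: a JSON object such as {"<version>": "<new code>"} kept in the
# pipeline configuration. The variable name is an assumption.
TEST_CODES_ENV_VAR = "ICD11_TEST_CODES_JSON"


def get_test_code_for_version(icd11_version: str,
                              mapping: Optional[Dict[str, str]] = None) -> str:
    """
    Return a code that is new in `icd11_version`, for post-flush verification.

    `mapping` can be passed directly (useful in tests); otherwise it is read
    from the environment variable named by TEST_CODES_ENV_VAR.
    """
    if mapping is None:
        mapping = json.loads(os.environ.get(TEST_CODES_ENV_VAR, "{}"))
    try:
        return mapping[icd11_version]
    except KeyError:
        raise LookupError(
            f"No test code configured for ICD-11 {icd11_version}. "
            f"Add one from the release notes before running the upgrade."
        ) from None
```

Failing loudly when no code is configured keeps the pipeline from reporting a verification that never ran.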

---

## Integration into version_upgrade.py — call site

Add to the end of your main upgrade function, after the OCL verification steps:

```python
def run_upgrade(icd11_version: str) -> None:
    """Main upgrade entry point."""

    # --- Existing steps (your current implementation) ---
    print(f"Starting ICD-11 {icd11_version} upgrade...")

    # 1. Import ICD-11 concepts into OCL
    import_concepts_to_ocl(icd11_version)

    # 2. Patch concept_class for Diagnosis + Finding
    patch_concept_class(icd11_version)

    # 3. Repopulate bd-condition-icd11-diagnosis-valueset
    repopulate_condition_valueset(icd11_version)

    # 4. Verify OCL $validate-code
    verify_ocl_validate_code(icd11_version)

    # --- New: HAPI integration ---
    # 5-6. Flush HAPI cache and verify
    post_ocl_import_hapi_integration(icd11_version)

    # 7. Notify vendors
    notify_vendors_of_upgrade(icd11_version)

    print(f"ICD-11 {icd11_version} upgrade complete.")
```

---

## Environment variables required by version_upgrade.py

Add to your upgrade pipeline's secrets configuration:

```bash
# Keycloak admin client for HAPI cache management
FHIR_ADMIN_CLIENT_SECRET=<secret from keycloak-setup.md Part 2>

# HAPI server base URL
HAPI_BASE_URL=https://fhir.dghs.gov.bd
```
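
A start-up check for these variables lets the pipeline fail before Step 1 rather than after the OCL import. This is a minimal sketch; `missing_env_vars` is a hypothetical helper name.

```python
import os
from typing import Iterable, List, Mapping

REQUIRED_ENV_VARS = ("FHIR_ADMIN_CLIENT_SECRET", "HAPI_BASE_URL")


def missing_env_vars(names: Iterable[str] = REQUIRED_ENV_VARS,
                     env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names that are unset or blank in `env`, in the given order."""
    return [name for name in names if not env.get(name, "").strip()]


def require_env_vars(env: Mapping[str, str] = os.environ) -> None:
    """Raise before any upgrade step runs if configuration is incomplete."""
    missing = missing_env_vars(env=env)
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing))
```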

---

## Rollback procedure

If post-flush verification fails (HAPI is not accepting new codes):

1. **Do not re-run the flush yet.** The cache is already empty, so re-flushing now has no effect.
2. Check OCL directly: `curl "https://tr.ocl.dghs.gov.bd/api/fhir/ValueSet/\$validate-code?..."`
3. If OCL is returning wrong results: the OCL import is incomplete. Re-run steps 1-4.
4. If OCL is returning correct results but HAPI is not: check HAPI logs for OCL
   connectivity errors. OCL may have returned HTTP 5xx during the first post-flush
   validation call, triggering fail-open behaviour.
5. After fixing OCL, flush the cache again, because it will have repopulated with
   bad data in the meantime.

```bash
# Emergency manual flush via curl
ADMIN_TOKEN=$(curl -s -X POST \
  "https://auth.dghs.gov.bd/realms/hris/protocol/openid-connect/token" \
  -d "grant_type=client_credentials" \
  -d "client_id=fhir-admin-pipeline" \
  -d "client_secret=${FHIR_ADMIN_CLIENT_SECRET}" \
  | jq -r '.access_token')

curl -s -X DELETE \
  -H "Authorization: Bearer ${ADMIN_TOKEN}" \
  https://fhir.dghs.gov.bd/admin/terminology/cache | jq .
```
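
The decision steps in this section can be summarised as a small function. It is a sketch that encodes only the branching described above; the function and constant names are hypothetical, and the two inputs are assumed to come from the Step 4 and Step 6 checks.

```python
NO_ACTION = "No action needed."
RERUN_OCL_STEPS = "OCL import incomplete: re-run steps 1-4, then flush again."
CHECK_HAPI_LOGS = ("Check HAPI logs for OCL connectivity errors, "
                   "then flush again once OCL is reachable.")


def next_action(ocl_correct: bool, hapi_correct: bool) -> str:
    """Map the two verification outcomes to the next rollback step."""
    if hapi_correct:
        return NO_ACTION
    if not ocl_correct:
        # HAPI can only be as correct as the OCL data behind it.
        return RERUN_OCL_STEPS
    # OCL is fine but HAPI is not: suspect the HAPI-to-OCL connection.
    return CHECK_HAPI_LOGS
```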

---

## Cache warm-up after flush

The HAPI cache repopulates organically as vendors submit resources.
There is no pre-warming mechanism. The first vendor submission after a flush
for each code will take up to 10 seconds (OCL timeout) rather than sub-millisecond
(cache hit). At pilot scale (50 vendors, <36,941 distinct codes in use),
this is acceptable.

At national scale, consider a pre-warming job that submits $validate-code requests
for the top-N most frequently submitted ICD-11 codes immediately after the flush.
The top-N list is derivable from the `audit.audit_events` table:

```sql
-- Draft query, kept for reference: these are the most frequently REJECTED
-- codes, so they are not warm-up candidates.
SELECT invalid_code, COUNT(*) AS frequency
FROM audit.fhir_rejected_submissions
WHERE rejection_code = 'TERMINOLOGY_INVALID_CODE'
  AND submission_time > NOW() - INTERVAL '90 days'
GROUP BY invalid_code
ORDER BY frequency DESC
LIMIT 100;

-- Warm-up candidates: most frequently ACCEPTED Condition submissions.
SELECT
    (validation_messages ->> 0) as code_info,
    COUNT(*) as frequency
FROM audit.audit_events
WHERE outcome = 'ACCEPTED'
  AND resource_type = 'Condition'
  AND event_time > NOW() - INTERVAL '90 days'
GROUP BY 1
ORDER BY frequency DESC
LIMIT 200;
```
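
A pre-warming job along those lines might look like the sketch below. It assumes that a `$validate-code` call against HAPI populates the same cache that resource validation reads, which should be confirmed before relying on it. `prewarm_cache` is a hypothetical name, and `validate` is a caller-supplied function that performs one `$validate-code` request and returns its boolean result.

```python
from typing import Callable, Dict, Iterable


def prewarm_cache(codes: Iterable[str],
                  validate: Callable[[str], bool]) -> Dict[str, int]:
    """
    Validate each code once so that later vendor submissions hit the cache.

    Failures (for example an OCL timeout) are counted and skipped so that
    one slow code does not abort the whole warm-up run.
    """
    summary = {"valid": 0, "invalid": 0, "failed": 0}
    for code in codes:
        try:
            outcome = "valid" if validate(code) else "invalid"
        except Exception:
            outcome = "failed"
        summary[outcome] += 1
    return summary
```

The codes would come from the top-N query above, and the returned summary can be logged next to the flush result.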