bd-fhir-national/ops/version-upgrade-integration.md
2026-03-16 00:02:58 +06:00


# ICD-11 Version Upgrade — HAPI Integration
**Audience:** ICD-11 Terminology Pipeline team, DGHS FHIR ops
**Related:** `version_upgrade.py` (OCL import pipeline)
**HAPI endpoint:** `DELETE /admin/terminology/cache`
---
## Overview
When a new ICD-11 MMS release is imported into OCL, the HAPI server's
24-hour terminology validation cache becomes stale. Until each cached entry
expires, vendors submitting resources after the import have their ICD-11 codes
validated against results computed from the **old** OCL data. New codes that
were looked up before the import (cache miss → OCL queried while it still held
the old release → result cached as invalid) continue to be rejected. Removed or
reclassified codes that were previously valid continue to be accepted from cache.
**The cache flush endpoint resolves this.** Calling it after the OCL import completes
forces the next validation call for every ICD-11 code to hit OCL directly,
repopulating the cache with the new version's data.
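The staleness can be reproduced with a toy model. This sketch is not HAPI's cache implementation: `TtlValidationCache`, the placeholder codes and the injected `lookup` callable (standing in for the OCL `$validate-code` call) are hypothetical, and a fixed clock keeps the 24-hour TTL from expiring during the demonstration.

```python
import time
from typing import Callable, Dict, Tuple


class TtlValidationCache:
    """Illustrative model only (not HAPI's code): code -> (is_valid, expiry time)."""

    def __init__(self, lookup: Callable[[str], bool], ttl_seconds: float = 24 * 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup          # stands in for the OCL $validate-code call
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[bool, float]] = {}

    def validate(self, code: str) -> bool:
        now = self._clock()
        hit = self._entries.get(code)
        if hit is not None and hit[1] > now:
            return hit[0]              # cache hit: OCL is not consulted
        result = self._lookup(code)    # miss or expired entry: ask OCL
        self._entries[code] = (result, now + self._ttl)
        return result

    def flush(self) -> int:
        evicted = len(self._entries)
        self._entries.clear()
        return evicted


# Demonstration with a fixed clock, so no entry expires.
ocl_release = {"OLD-CODE"}                        # OCL before the import
cache = TtlValidationCache(lambda c: c in ocl_release, clock=lambda: 0.0)
before_import = cache.validate("NEW-CODE")        # miss -> OCL says invalid -> cached
ocl_release.add("NEW-CODE")                       # OCL import adds the new code
after_import = cache.validate("NEW-CODE")         # hit -> stale "invalid"
evicted = cache.flush()
after_flush = cache.validate("NEW-CODE")          # miss -> OCL now says valid
print(before_import, after_import, evicted, after_flush)  # False False 1 True
```

Only the flush (or TTL expiry) makes the corrected OCL answer visible; importing into OCL alone does not.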
---
## Step-by-step upgrade procedure
The following steps must be executed **in this exact order**. Deviating
from the order (e.g., flushing before OCL import completes) causes the
cache to repopulate with old data and requires a second flush.
```
Step 1 OCL: import new ICD-11 MMS release
Step 2 OCL: patch concept_class for Diagnosis + Finding concepts
Step 3 OCL: repopulate bd-condition-icd11-diagnosis-valueset collection
Step 4 OCL: verify $validate-code returns correct results for new codes
Step 5 HAPI: flush terminology cache ← this document
Step 6 HAPI: verify validation with new codes
Step 7 DGHS: notify vendors of new release
```
Steps 1-4 are handled by `version_upgrade.py`. This document covers
Steps 5-6 and the exact integration between the two systems.
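Because ordering mistakes are the main failure mode, the pipeline can refuse to flush until Steps 1-4 have been recorded as done. A minimal sketch, assuming `version_upgrade.py` tracks completed steps under the hypothetical identifiers below; it checks completion only, not the order in which the steps ran.

```python
from typing import Iterable, List

# Hypothetical step identifiers mirroring Steps 1-4 above.
PRE_FLUSH_STEPS = [
    "ocl_import",
    "concept_class_patch",
    "valueset_repopulated",
    "ocl_validate_code_verified",
]


def missing_pre_flush_steps(completed: Iterable[str]) -> List[str]:
    """Return the pre-flush steps not yet completed, in their required order."""
    done = set(completed)
    return [step for step in PRE_FLUSH_STEPS if step not in done]


def assert_ready_to_flush(completed: Iterable[str]) -> None:
    """Raise instead of flushing while any of Steps 1-4 is outstanding."""
    missing = missing_pre_flush_steps(completed)
    if missing:
        raise RuntimeError(
            "Refusing to flush HAPI cache; incomplete OCL steps: " + ", ".join(missing)
        )
```

Calling `assert_ready_to_flush(completed_steps)` immediately before the flush turns an out-of-order run into a hard failure instead of a cache full of old data.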
---
## Step 4 — Pre-flush verification (run before calling HAPI)
Before flushing the HAPI cache, verify that OCL is serving correct results
for the new release. Flushing while OCL still serves incorrect results only
repopulates the cache with those incorrect results.
### 4a — Verify a new code is valid in OCL
Pick a code that is **new** in this release (not in the previous release).
```bash
NEW_CODE="XY9Z" # Replace with an actual new code from the release notes
curl -s "https://tr.ocl.dghs.gov.bd/api/fhir/ValueSet/\$validate-code\
?url=https://fhir.dghs.gov.bd/core/ValueSet/bd-condition-icd11-diagnosis-valueset\
&system=http://id.who.int/icd/release/11/mms\
&code=${NEW_CODE}" | jq '.parameter[] | select(.name=="result") | .valueBoolean'
# Expected: true
```
### 4b — Verify a Device-class code is rejected by OCL
Device-class codes must be rejected by the bd-condition-icd11-diagnosis-valueset
(which restricts to Diagnosis + Finding only).
```bash
DEVICE_CODE="XA7RE2" # Example Device class code — use an actual one
curl -s "https://tr.ocl.dghs.gov.bd/api/fhir/ValueSet/\$validate-code\
?url=https://fhir.dghs.gov.bd/core/ValueSet/bd-condition-icd11-diagnosis-valueset\
&system=http://id.who.int/icd/release/11/mms\
&code=${DEVICE_CODE}" | jq '.parameter[] | select(.name=="result") | .valueBoolean'
# Expected: false
```
### 4c — Verify a deprecated code is invalid
If this release deprecates or removes any codes, verify they are now rejected.
```bash
DEPRECATED_CODE="..." # From release notes
curl -s "https://tr.ocl.dghs.gov.bd/api/fhir/ValueSet/\$validate-code\
?url=https://fhir.dghs.gov.bd/core/ValueSet/bd-condition-icd11-diagnosis-valueset\
&system=http://id.who.int/icd/release/11/mms\
&code=${DEPRECATED_CODE}" | jq '.parameter[] | select(.name=="result") | .valueBoolean'
# Expected: false. A true result means OCL still treats the code as active; confirm against the release notes
```
Do not proceed to Step 5 until all 4a-4c verifications pass.
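The same three checks can be scripted so the pipeline fails fast instead of relying on someone reading `jq` output. A minimal sketch using only the standard library for URL construction and response parsing; the helper names are hypothetical and the HTTP GET itself is left to the caller.

```python
from typing import Dict, List, Optional
from urllib.parse import urlencode

OCL_FHIR_BASE = "https://tr.ocl.dghs.gov.bd/api/fhir"
VALUESET_URL = "https://fhir.dghs.gov.bd/core/ValueSet/bd-condition-icd11-diagnosis-valueset"
ICD11_SYSTEM = "http://id.who.int/icd/release/11/mms"


def validate_code_url(code: str, base: str = OCL_FHIR_BASE) -> str:
    """Build a ValueSet/$validate-code GET URL with properly encoded query parameters."""
    query = urlencode({"url": VALUESET_URL, "system": ICD11_SYSTEM, "code": code})
    return f"{base}/ValueSet/$validate-code?{query}"


def result_flag(parameters: dict) -> Optional[bool]:
    """Extract the 'result' valueBoolean from a FHIR Parameters resource (None if absent)."""
    for param in parameters.get("parameter", []):
        if param.get("name") == "result":
            return param.get("valueBoolean")
    return None


def check_expectations(results: Dict[str, Optional[bool]],
                       expected: Dict[str, bool]) -> List[str]:
    """Return the codes whose actual result differs from the expected one."""
    return [code for code, want in expected.items() if results.get(code) != want]
```

Each URL can be fetched with, for example, `requests.get(url, timeout=30).json()` and passed to `result_flag`; a non-empty list from `check_expectations` means Step 5 must not start.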
---
## Step 5 — Flush the HAPI terminology cache
### 5a — Obtain fhir-admin token
The cache flush endpoint requires the `fhir-admin` Keycloak role.
The `fhir-admin-pipeline` client is the designated service account for
this operation (see `ops/keycloak-setup.md`, Part 2).
```python
# In version_upgrade.py: add these imports, settings and function
import base64
import json
import os

import requests

KEYCLOAK_TOKEN_URL = "https://auth.dghs.gov.bd/realms/hris/protocol/openid-connect/token"
FHIR_ADMIN_CLIENT_ID = "fhir-admin-pipeline"
FHIR_ADMIN_CLIENT_SECRET = os.environ["FHIR_ADMIN_CLIENT_SECRET"]  # from secrets vault
HAPI_BASE_URL = os.environ.get("HAPI_BASE_URL", "https://fhir.dghs.gov.bd")


def get_fhir_admin_token() -> str:
    """Obtain a fhir-admin Bearer token from Keycloak (client credentials grant)."""
    response = requests.post(
        KEYCLOAK_TOKEN_URL,
        data={
            "grant_type": "client_credentials",
            "client_id": FHIR_ADMIN_CLIENT_ID,
            "client_secret": FHIR_ADMIN_CLIENT_SECRET,
        },
        timeout=30,
    )
    response.raise_for_status()
    access_token = response.json()["access_token"]

    # Verify the token contains the fhir-admin role before using it.
    # The JWT payload (middle segment) is base64url-encoded with padding removed;
    # the signature is not checked here, this is only a local sanity check.
    payload_b64 = access_token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore 0-3 padding characters
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    realm_roles = claims.get("realm_access", {}).get("roles", [])
    if "fhir-admin" not in realm_roles:
        raise ValueError(
            "fhir-admin-pipeline token does not contain fhir-admin role. "
            f"Roles present: {realm_roles}. "
            "Check Keycloak service account role assignment."
        )
    return access_token
```
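Keycloak access tokens are short-lived (the lifespan is a realm setting), so a token obtained at the start of a long run may expire before Step 6. A sketch of an expiry check, assuming the token carries the standard JWT `exp` claim; `jwt_claims`, `seconds_until_expiry` and `needs_refresh` are hypothetical helpers and, like the role check above, decode the payload without verifying the signature.

```python
import base64
import json
import time
from typing import Optional


def jwt_claims(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying its signature (sanity checks only)."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)   # JWTs strip base64url padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def seconds_until_expiry(token: str, now: Optional[float] = None) -> float:
    """Seconds left before the token's 'exp' claim (negative if already expired)."""
    current = time.time() if now is None else now
    return jwt_claims(token)["exp"] - current


def needs_refresh(token: str, margin_seconds: float = 60,
                  now: Optional[float] = None) -> bool:
    """True if the token expires within margin_seconds and should be re-fetched."""
    return seconds_until_expiry(token, now) <= margin_seconds
```

Before Step 6, for example: `if needs_refresh(admin_token): admin_token = get_fhir_admin_token()`.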
### 5b — Check cache state before flush (optional but recommended)
```python
def get_cache_stats(admin_token: str) -> dict:
    """Retrieve current HAPI terminology cache statistics."""
    response = requests.get(
        f"{HAPI_BASE_URL}/admin/terminology/cache/stats",
        headers={"Authorization": f"Bearer {admin_token}"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


# Usage (requires get_fhir_admin_token() from 5a):
admin_token = get_fhir_admin_token()
stats_before = get_cache_stats(admin_token)
print(f"Cache before flush: {stats_before['totalEntries']} entries "
      f"({stats_before['liveEntries']} live, "
      f"{stats_before['expiredEntries']} expired)")
```
### 5c — Execute cache flush
```python
def flush_hapi_terminology_cache(admin_token: str) -> dict:
    """
    Flush the HAPI ICD-11 terminology validation cache.

    Must be called AFTER:
      - OCL ICD-11 import is complete
      - concept_class patch is applied
      - bd-condition-icd11-diagnosis-valueset is repopulated
      - $validate-code is verified to return correct results

    Returns the flush summary from HAPI.
    Raises PermissionError on HTTP 403 and requests.HTTPError on other failures.
    """
    response = requests.delete(
        f"{HAPI_BASE_URL}/admin/terminology/cache",
        headers={"Authorization": f"Bearer {admin_token}"},
        timeout=60,  # allow time for HAPI to process across all replicas
    )
    if response.status_code == 403:
        raise PermissionError(
            "Cache flush rejected: fhir-admin role not present in token. "
            "Check Keycloak fhir-admin-pipeline service account configuration."
        )
    response.raise_for_status()
    result = response.json()
    print(f"HAPI cache flush completed: {result['entriesEvicted']} entries evicted "
          f"at {result['timestamp']}")
    return result


# Full upgrade function to add to version_upgrade.py:
def post_ocl_import_hapi_integration(icd11_version: str) -> None:
    """
    Call after successful OCL import and verification.
    Flushes the HAPI cache and verifies the new version validates correctly.

    Args:
        icd11_version: The new ICD-11 version string, e.g. "2025-01"
    """
    print(f"\n=== HAPI integration: ICD-11 {icd11_version} ===")

    # Step 5a: get admin token
    print("Obtaining fhir-admin token...")
    admin_token = get_fhir_admin_token()
    print("Token obtained.")

    # Step 5b: record pre-flush state
    stats_before = get_cache_stats(admin_token)
    print(f"Pre-flush cache: {stats_before['totalEntries']} entries")

    # Step 5c: flush (flush_hapi_terminology_cache prints the eviction summary)
    print("Flushing HAPI terminology cache...")
    flush_hapi_terminology_cache(admin_token)

    # Step 6: post-flush verification (see below)
    verify_hapi_validates_new_version(admin_token, icd11_version)

    print(f"=== HAPI integration complete for ICD-11 {icd11_version} ===\n")
```
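Flushing a second time is harmless, so transient network failures during the flush can be retried. A sketch of a generic retry wrapper with exponential backoff; `with_retries` is hypothetical and not part of `requests`. Because `requests` raises its own exception types rather than the built-in `ConnectionError`/`TimeoutError`, pass them explicitly via `retry_on`.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation(), retrying only the listed exceptions with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise                                # out of attempts: surface the error
            sleep(base_delay * 2 ** (attempt - 1))   # 2s, 4s, ... with the defaults
    raise AssertionError("unreachable")
```

In the pipeline this might be called as `with_retries(lambda: flush_hapi_terminology_cache(admin_token), retry_on=(requests.ConnectionError, requests.Timeout))`. A 403 (`PermissionError`) or other HTTP error is deliberately not retried, since repeating it cannot succeed.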
---
## Step 6 — Post-flush verification
After the flush, verify that HAPI is now validating against the new OCL data.
This confirms the end-to-end pipeline from OCL → HAPI cache → vendor validation.
### 6a — Submit a test Condition with a new ICD-11 code
Submitting a test resource requires the `mci-api` role. The `fhir-admin-pipeline`
client holds `fhir-admin` but not `mci-api`, so it cannot submit FHIR resources
as-is. Use a dedicated test vendor client for resource submission, or temporarily
assign `mci-api` to the admin client for testing.
**Recommended approach:** use a dedicated test vendor client
(`fhir-vendor-test-pipeline`) with `mci-api` role for post-upgrade verification.
```python
def verify_hapi_validates_new_version(
        admin_token: str, icd11_version: str) -> None:
    """
    Verify that HAPI now accepts codes from the new ICD-11 version.

    Calls HAPI's $validate-code operation directly rather than submitting a
    resource. HAPI's $validate-code proxies to OCL via the validation chain, so a
    successful result confirms the cache was flushed AND OCL is returning correct
    results for the new version. If HAPI restricts $validate-code to mci-api, the
    admin token is refused and the check must be done manually (see 6a).
    """
    # Use a known-valid code from the new release.
    # This should be parameterised with the actual new code from the release notes.
    test_code = get_test_code_for_version(icd11_version)  # implement per release
    valueset_url = (
        "https://fhir.dghs.gov.bd/core/ValueSet/"
        "bd-condition-icd11-diagnosis-valueset"
    )
    response = requests.get(
        f"{HAPI_BASE_URL}/fhir/ValueSet/$validate-code",
        params={
            "url": valueset_url,
            "system": "http://id.who.int/icd/release/11/mms",
            "code": test_code,
        },
        headers={"Authorization": f"Bearer {admin_token}"},
        timeout=30,
    )
    if response.status_code in (401, 403):
        # A missing mci-api role normally yields 403; fall back to a manual check.
        print("WARNING: $validate-code requires mci-api role. "
              "Skipping HAPI direct verification. "
              "Verify manually by submitting a test Condition resource.")
        return
    response.raise_for_status()
    result = response.json()
    valid = next(
        (p.get("valueBoolean") for p in result.get("parameter", [])
         if p.get("name") == "result"),
        None,
    )
    if valid is True:
        print(f"✓ HAPI verification passed: code '{test_code}' "
              f"valid in new ICD-11 {icd11_version}")
    else:
        message = next(
            (p.get("valueString") for p in result.get("parameter", [])
             if p.get("name") == "message"),
            "no message",
        )
        raise ValueError(
            f"HAPI verification FAILED: code '{test_code}' rejected after cache flush. "
            f"Message: {message}. "
            f"Check OCL import completed correctly for ICD-11 {icd11_version}."
        )
```
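For the resource-submission route recommended in 6a, the test payload can be a minimal Condition. A sketch assuming FHIR R4; `build_test_condition` and the patient reference in the usage note are hypothetical, and the DGHS Condition profile may require further elements that this sketch omits.

```python
from typing import Any, Dict

ICD11_MMS_SYSTEM = "http://id.who.int/icd/release/11/mms"


def build_test_condition(code: str, patient_reference: str,
                         display: str = "") -> Dict[str, Any]:
    """Minimal FHIR R4 Condition carrying one ICD-11 MMS coding (test payload only)."""
    coding: Dict[str, Any] = {"system": ICD11_MMS_SYSTEM, "code": code}
    if display:
        coding["display"] = display
    return {
        "resourceType": "Condition",
        "clinicalStatus": {"coding": [{
            "system": "http://terminology.hl7.org/CodeSystem/condition-clinical",
            "code": "active",
        }]},
        "code": {"coding": [coding]},
        "subject": {"reference": patient_reference},
    }
```

For example, `build_test_condition("XY9Z", "Patient/test-1")` could be POSTed to `{HAPI_BASE_URL}/fhir/Condition` with a `fhir-vendor-test-pipeline` token; a FHIR create that succeeds returns HTTP 201.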
---
## Integration into version_upgrade.py — call site
Add to the end of your main upgrade function, after the OCL verification steps:
```python
def run_upgrade(icd11_version: str) -> None:
    """Main upgrade entry point."""
    # --- Existing steps (your current implementation) ---
    print(f"Starting ICD-11 {icd11_version} upgrade...")

    # 1. Import ICD-11 concepts into OCL
    import_concepts_to_ocl(icd11_version)
    # 2. Patch concept_class for Diagnosis + Finding
    patch_concept_class(icd11_version)
    # 3. Repopulate bd-condition-icd11-diagnosis-valueset
    repopulate_condition_valueset(icd11_version)
    # 4. Verify OCL $validate-code
    verify_ocl_validate_code(icd11_version)

    # --- New: HAPI integration ---
    # 5-6. Flush HAPI cache and verify
    post_ocl_import_hapi_integration(icd11_version)

    # 7. Notify vendors
    notify_vendors_of_upgrade(icd11_version)
    print(f"ICD-11 {icd11_version} upgrade complete.")
```
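If `version_upgrade.py` is run from the command line, the version argument can be validated before any OCL work starts. A sketch using `argparse`, assuming release identifiers follow the `YYYY-MM` form of the example `"2025-01"`; `parse_icd11_version` is hypothetical.

```python
import argparse
import re
from typing import List, Optional

# Assumption: ICD-11 MMS release identifiers look like "2025-01" (YYYY-MM).
ICD11_VERSION_PATTERN = re.compile(r"\d{4}-(0[1-9]|1[0-2])")


def parse_icd11_version(argv: Optional[List[str]] = None) -> str:
    """Parse and validate the ICD-11 release argument for run_upgrade()."""
    parser = argparse.ArgumentParser(description="ICD-11 OCL import + HAPI cache flush")
    parser.add_argument("icd11_version", help='ICD-11 MMS release, e.g. "2025-01"')
    args = parser.parse_args(argv)
    if not ICD11_VERSION_PATTERN.fullmatch(args.icd11_version):
        # parser.error() prints usage and exits with status 2
        parser.error(f"invalid ICD-11 version {args.icd11_version!r}; expected YYYY-MM")
    return args.icd11_version
```

A typical entry point would be `if __name__ == "__main__": run_upgrade(parse_icd11_version())`.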
---
## Environment variables required by version_upgrade.py
Add to your upgrade pipeline's secrets configuration:
```bash
# Keycloak admin client for HAPI cache management
FHIR_ADMIN_CLIENT_SECRET=<secret from keycloak-setup.md Part 2>
# HAPI server base URL
HAPI_BASE_URL=https://fhir.dghs.gov.bd
```
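Reading both variables in one place lets the pipeline fail before Step 1, rather than mid-upgrade, when the secret is missing. A sketch; `UpgradeConfig` and `load_upgrade_config` are hypothetical, and the default base URL mirrors the value above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class UpgradeConfig:
    fhir_admin_client_secret: str
    hapi_base_url: str


def load_upgrade_config(environ: Optional[Mapping[str, str]] = None) -> UpgradeConfig:
    """Read upgrade settings, failing fast if the Keycloak client secret is absent."""
    env = os.environ if environ is None else environ
    secret = env.get("FHIR_ADMIN_CLIENT_SECRET", "")
    if not secret:
        raise RuntimeError(
            "FHIR_ADMIN_CLIENT_SECRET is not set; see ops/keycloak-setup.md Part 2"
        )
    # Strip a trailing slash so f"{base}/admin/..." never produces "//".
    base_url = env.get("HAPI_BASE_URL", "https://fhir.dghs.gov.bd").rstrip("/")
    return UpgradeConfig(fhir_admin_client_secret=secret, hapi_base_url=base_url)
```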
---
## Rollback procedure
If post-flush verification fails (HAPI is not accepting new codes):
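1. **Do not re-run the flush yet.** Until the root cause is fixed, a new flush only lets the cache repopulate with the same bad results.
2. Check OCL directly by re-running the Step 4 checks (4a-4c): `curl 'https://tr.ocl.dghs.gov.bd/api/fhir/ValueSet/$validate-code?...'` (single quotes stop the shell from expanding `$validate`).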
3. If OCL is returning wrong results: the OCL import is incomplete. Re-run steps 1-4.
4. If OCL is returning correct results but HAPI is not: check HAPI logs for OCL
connectivity errors. OCL may have returned HTTP 5xx during the first post-flush
validation call, triggering fail-open behaviour.
5. After fixing OCL: flush the cache again (it has repopulated with bad data).
```bash
# Emergency manual flush via curl
ADMIN_TOKEN=$(curl -s -X POST \
"https://auth.dghs.gov.bd/realms/hris/protocol/openid-connect/token" \
-d "grant_type=client_credentials" \
-d "client_id=fhir-admin-pipeline" \
-d "client_secret=${FHIR_ADMIN_CLIENT_SECRET}" \
| jq -r '.access_token')
curl -s -X DELETE \
-H "Authorization: Bearer ${ADMIN_TOKEN}" \
https://fhir.dghs.gov.bd/admin/terminology/cache | jq .
```
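The branches of the rollback checklist can be encoded so an automated run reports the next action instead of only failing. A sketch; `RollbackAction` and `rollback_action` are hypothetical, and the two booleans are assumed to come from the Step 6 (HAPI) and Step 4 (OCL) `$validate-code` results for the same new code.

```python
from enum import Enum


class RollbackAction(Enum):
    NONE = "HAPI verification passed; no rollback needed"
    REIMPORT_OCL = "OCL returns wrong results: re-run Steps 1-4, then flush again"
    CHECK_CONNECTIVITY = ("OCL is correct but HAPI is not: check HAPI logs for OCL "
                          "connectivity errors, fix them, then flush again")


def rollback_action(hapi_accepts_new_code: bool,
                    ocl_accepts_new_code: bool) -> RollbackAction:
    """Map the two observable checks onto the rollback steps above."""
    if hapi_accepts_new_code:
        return RollbackAction.NONE
    if not ocl_accepts_new_code:
        return RollbackAction.REIMPORT_OCL
    return RollbackAction.CHECK_CONNECTIVITY
```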
---
## Cache warm-up after flush
The HAPI cache repopulates organically as vendors submit resources.
There is no pre-warming mechanism. The first vendor submission after a flush
for each code will take up to 10 seconds (OCL timeout) rather than sub-millisecond
(cache hit). At pilot scale (50 vendors, <36,941 distinct codes in use),
this is acceptable.
At national scale, consider a pre-warming job that submits $validate-code requests
for the top-N most frequently submitted ICD-11 codes immediately after the flush.
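The top-N list is derivable from the audit schema. The first query below returns the
most frequently *rejected* codes (`audit.fhir_rejected_submissions`), which are not
the ones to pre-warm; the second counts accepted Condition submissions in the
`audit.audit_events` table and is the one to use:
```sql
-- Most frequently rejected codes (for reference only; not a pre-warming list)
SELECT invalid_code, COUNT(*) AS frequency
FROM audit.fhir_rejected_submissions
WHERE rejection_code = 'TERMINOLOGY_INVALID_CODE'
  AND submission_time > NOW() - INTERVAL '90 days'
GROUP BY invalid_code
ORDER BY frequency DESC
LIMIT 100;

-- Pre-warming list: most frequent accepted Condition submissions.
-- validation_messages ->> 0 is the first element of the JSON array; confirm it
-- carries the ICD-11 code before feeding these results to a pre-warming job.
SELECT
  (validation_messages ->> 0) AS code_info,
  COUNT(*) AS frequency
FROM audit.audit_events
WHERE outcome = 'ACCEPTED'
  AND resource_type = 'Condition'
  AND event_time > NOW() - INTERVAL '90 days'
GROUP BY 1
ORDER BY frequency DESC
LIMIT 200;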
```
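A pre-warming job can then replay those codes through HAPI. A sketch, assuming that a `$validate-code` call via HAPI populates the same cache used for resource validation (the Step 6 docstring states it goes through the same validation chain); `prewarm` is hypothetical, and the `validate` callable is injected so the job can be dry-run without HAPI. Keep `max_workers` small: each cache miss costs an OCL call of up to 10 seconds.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def prewarm(codes: Iterable[str], validate: Callable[[str], bool],
            max_workers: int = 4) -> Dict[str, int]:
    """Issue one validation call per distinct code so HAPI caches each result.

    `validate` is expected to call HAPI's ValueSet/$validate-code for one code
    and return its boolean result.
    """
    unique = list(dict.fromkeys(codes))       # de-duplicate, keep frequency order
    summary = {"valid": 0, "invalid": 0, "error": 0}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(validate, code) for code in unique]
        for future in futures:
            try:
                summary["valid" if future.result() else "invalid"] += 1
            except Exception:                  # one OCL timeout must not abort the job
                summary["error"] += 1
    return summary
```

A non-zero `error` count after a run is worth checking against the HAPI logs, since it may indicate the same OCL connectivity problems described under the rollback procedure.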