BD FHIR National — Technical Operations Document
System: National FHIR R4 Repository and Validation Engine
Published by: DGHS / MoHFW Bangladesh
IG: BD Core FHIR IG v0.2.1
HAPI FHIR: 7.2.0
Stack: Java 17 · Spring Boot 3.2.5 · PostgreSQL 15 · Docker Compose
Table of Contents
- System Purpose and Architecture
- Repository Structure
- How the System Works
- Infrastructure Components
- Security Model
- Validation Pipeline
- Audit and Forensics
- CI/CD Pipeline
- First Deployment — Step by Step
- Routine Operations
- ICD-11 Version Upgrade
- Scaling
- Troubleshooting
- Architecture Decisions You Must Not Reverse
1. System Purpose and Architecture
This system is the national FHIR R4 repository for Bangladesh. It serves three purposes simultaneously:
Repository — Stores validated FHIR R4 resources submitted by hospitals, clinics, diagnostic labs, and pharmacies (collectively: vendors). No unvalidated resource enters storage.
Validation engine — Every incoming resource is validated against BD Core FHIR IG profiles AND against the national ICD-11 terminology authority (OCL) before storage. Invalid resources are rejected with HTTP 422 and a FHIR OperationOutcome describing exactly what failed.
HIE gateway — Acts as the national Health Information Exchange boundary. The system enforces that only authenticated, authorised, and clinically valid data enters the national record.
Traffic flow
Vendor system
│
│ POST /fhir/Condition
│ Authorization: Bearer {token}
▼
Centralised nginx proxy ← TLS termination, routing (managed separately)
│
▼
HAPI server :8080
│
├─ KeycloakJwtInterceptor ← validates JWT, extracts facility identity
├─ ClusterExpressionValidator ← validates ICD-11 cluster expressions
├─ RequestValidatingInterceptor ← validates against BD Core IG profiles
├─ BdTerminologyValidationSupport ← validates ICD-11 codes against OCL
│
├─ [ACCEPTED] → HFJ_RESOURCE (postgres-fhir)
│ AuditEventEmitter → audit.audit_events (postgres-audit)
│
└─ [REJECTED] → 422 OperationOutcome to vendor
RejectedSubmissionSink → audit.fhir_rejected_submissions (postgres-audit)
AuditEventEmitter → audit.audit_events (postgres-audit)
External service dependencies
| Service | URL | Purpose | Failure behaviour |
|---|---|---|---|
| Keycloak | https://auth.dghs.gov.bd/realms/hris | JWT validation, JWKS | Fail closed (all requests rejected) |
| OCL | https://tr.ocl.dghs.gov.bd/api/fhir | ICD-11 terminology validation | Fail open (resource accepted with audit record) |
| Cluster validator | https://icd11.dghs.gov.bd/cluster/validate | Postcoordinated ICD-11 expressions | Fail open (resource accepted with audit record) |
The fail-open policy for OCL and the cluster validator is deliberate: service continuity during external service outages takes precedence over complete validation coverage. Every fail-open event is recorded in the audit log. Treat any OCL or cluster validator outage as a high-priority incident.
2. Repository Structure
bd-fhir-national/
├── .env.example ← copy to .env, fill secrets
├── docker-compose.yml ← production orchestration
├── pom.xml ← parent Maven POM, version pins
├── hapi-overlay/
│ ├── Dockerfile ← multi-stage build
│ ├── pom.xml ← runtime dependencies
│ └── src/main/
│ ├── java/bd/gov/dghs/fhir/
│ │ ├── BdFhirApplication.java ← Spring Boot entry point
│ │ ├── audit/
│ │ │ ├── AuditEventEmitter.java ← async INSERT to audit_events
│ │ │ └── RejectedSubmissionSink.java ← async INSERT to rejected_submissions
│ │ ├── config/
│ │ │ ├── DataSourceConfig.java ← dual datasource, dual Flyway
│ │ │ ├── FhirServerConfig.java ← validation chain, IG loading
│ │ │ └── SecurityConfig.java ← interceptor registration
│ │ ├── init/
│ │ │ └── IgPackageInitializer.java ← advisory lock IG loader
│ │ ├── interceptor/
│ │ │ ├── AuditEventInterceptor.java ← audit hook
│ │ │ └── KeycloakJwtInterceptor.java ← JWT auth
│ │ ├── terminology/
│ │ │ ├── BdTerminologyValidationSupport.java ← OCL integration
│ │ │ └── TerminologyCacheManager.java ← cache flush endpoint
│ │ └── validator/
│ │ └── ClusterExpressionValidator.java ← cluster expression check
│ └── resources/
│ ├── application.yaml ← all Spring/HAPI configuration
│ ├── logback-spring.xml ← structured JSON logging
│ ├── db/migration/
│ │ ├── fhir/V1__hapi_schema.sql ← HAPI JPA schema (Flyway)
│ │ └── audit/V2__audit_schema.sql ← audit schema (Flyway)
│ └── packages/
│ └── .gitkeep ← CI places IG .tgz here
├── ops/
│ ├── deployment-guide.md
│ ├── keycloak-setup.md
│ ├── project-manifest.md
│ ├── scaling-roadmap.md
│ └── version-upgrade-integration.md
└── postgres/
├── fhir/
│ ├── init.sql ← template only — replace with init.sh before deploy
│ └── postgresql.conf ← PostgreSQL tuning for HAPI workload
└── audit/
├── init.sql ← template only — replace with init.sh before deploy
└── postgresql.conf ← PostgreSQL tuning for audit workload
3. How the System Works
Startup sequence
When a HAPI container starts, the following happens in order. If any step fails, the container exits and Docker restarts it.
1. Flyway (FHIR schema) runs V1__hapi_schema.sql against postgres-fhir using the superuser credential. It creates all HAPI JPA tables, sequences, and indexes, and is skipped if already applied.
2. Flyway (audit schema) runs V2__audit_schema.sql against postgres-audit. It creates the partitioned audit_events and fhir_rejected_submissions tables with monthly partitions pre-created through 2027, and is skipped if already applied.
3. Hibernate validation checks that the schema exactly matches HAPI's entity mappings (ddl-auto: validate) and fails loudly if tables are missing or wrong.
4. IgPackageInitializer acquires a PostgreSQL advisory lock on postgres-fhir, loads the BD Core IG package from the classpath into HAPI's NpmPackageValidationSupport, writes metadata to the NPM_PACKAGE tables, and releases the lock. The advisory lock prevents race conditions when multiple replicas start simultaneously: only one replica writes the metadata row, and subsequent replicas find it already present and skip.
5. KeycloakJwtInterceptor fetches the Keycloak JWKS endpoint and caches the signing keys. If Keycloak is unreachable at startup, the interceptor fails to initialise and the container exits.
6. The server begins accepting traffic.
Request lifecycle — accepted resource
1. KeycloakJwtInterceptor
└─ extracts Bearer token from Authorization header
└─ verifies signature against cached Keycloak JWKS
└─ verifies exp, iss = https://auth.dghs.gov.bd/realms/hris
└─ verifies mci-api role present in realm_access or resource_access
└─ extracts client_id, sub, sending_facility
└─ sets request attributes, populates MDC for log correlation
2. AuditEventInterceptor (pre-validation hook)
└─ invokes ClusterExpressionValidator
└─ scans Coding elements with system = http://id.who.int/icd/release/11/mms
└─ if icd11-cluster-expression extension present → calls cluster validator middleware
└─ if raw postcoordination chars (&, /, %) in code without extension → rejects immediately
3. RequestValidatingInterceptor
└─ runs FhirInstanceValidator against ValidationSupportChain:
1. DefaultProfileValidationSupport (base FHIR R4 profiles)
2. CommonCodeSystemsTerminologyService (UCUM, MimeType, etc.)
3. SnapshotGeneratingValidationSupport (differential → snapshot)
4. InMemoryTerminologyServerValidationSupport (cache layer)
5. NpmPackageValidationSupport (BD Core IG profiles)
6. BdTerminologyValidationSupport (OCL $validate-code for ICD-11)
└─ any ERROR severity issue → throws UnprocessableEntityException → 422
4. HAPI JPA persistence
└─ resource written to HFJ_RESOURCE, HFJ_RES_VER, SPIDX tables
5. AuditEventInterceptor (post-storage hook)
└─ async: INSERT into audit.audit_events (outcome = ACCEPTED)
6. HTTP 201 Created → vendor
Request lifecycle — rejected resource
1-3. Same as above up to validation failure
4. UnprocessableEntityException thrown with FHIR OperationOutcome
5. AuditEventInterceptor (exception hook)
└─ async: INSERT full payload into audit.fhir_rejected_submissions
└─ async: INSERT into audit.audit_events (outcome = REJECTED)
6. HTTP 422 Unprocessable Entity → vendor
Body: OperationOutcome with issue[].diagnostics and issue[].expression
ICD-11 terminology validation detail
BdTerminologyValidationSupport intercepts every call to validate an ICD-11 coded element:
- Cache check: if the code was validated in the last 24 hours, the result is served from a ConcurrentHashMap. No OCL call is made.
- Cache miss: call OCL $validate-code with system=http://id.who.int/icd/release/11/mms. For Condition.code, include url=https://fhir.dghs.gov.bd/core/ValueSet/bd-condition-icd11-diagnosis-valueset to enforce the Diagnosis + Finding class restriction.
- OCL returns result=true: cache as valid and return valid to the chain.
- OCL returns result=false: cache as invalid and return an error to the chain → 422.
- OCL timeout or 5xx: log WARN and return null (not supported), so the code is not rejected (fail open).
No $expand attempts: isValueSetSupported() returns false for ICD-11 ValueSets, so $expand is never attempted. This is intentional, because OCL does not support $expand.
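The following minimal plain-Java sketch makes the decision flow above concrete. It is not the real BdTerminologyValidationSupport. The class IcdCodeCache, the OclOutcome enum and OclUnavailableException are hypothetical. The OCL $validate-code call and the clock are injected, so the caching and fail-open logic can be exercised without a network.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch of the 24-hour ICD-11 validation cache with fail-open behaviour.
final class IcdCodeCache {

    enum OclOutcome { VALID, INVALID }

    // Stands in for an OCL timeout or 5xx response (hypothetical type).
    static final class OclUnavailableException extends RuntimeException {
        OclUnavailableException(String message) { super(message); }
    }

    private record Entry(boolean valid, Instant cachedAt) {}

    private static final Duration TTL = Duration.ofHours(24);

    private final ConcurrentHashMap<String, Entry> cache = new ConcurrentHashMap<>();
    private final Function<String, OclOutcome> oclValidateCode; // stand-in for OCL $validate-code
    private final Supplier<Instant> clock;

    IcdCodeCache(Function<String, OclOutcome> oclValidateCode, Supplier<Instant> clock) {
        this.oclValidateCode = oclValidateCode;
        this.clock = clock;
    }

    // Optional.of(true/false) is a definite answer; Optional.empty() means "unknown",
    // which the caller treats as fail open (the resource is not rejected on this basis).
    Optional<Boolean> validate(String code) {
        Instant now = clock.get();
        Entry hit = cache.get(code);
        if (hit != null && now.isBefore(hit.cachedAt().plus(TTL))) {
            return Optional.of(hit.valid()); // fresh cache entry: no OCL call
        }
        try {
            boolean valid = oclValidateCode.apply(code) == OclOutcome.VALID;
            cache.put(code, new Entry(valid, now)); // both valid and invalid results are cached
            return Optional.of(valid);
        } catch (OclUnavailableException e) {
            System.err.println("WARN OCL unavailable, failing open for " + code + ": " + e.getMessage());
            return Optional.empty(); // nothing cached, so the next request retries OCL
        }
    }

    // Similar in spirit to the admin cache-flush endpoint: drop every entry.
    int flush() {
        int evicted = cache.size();
        cache.clear();
        return evicted;
    }

    public static void main(String[] args) {
        IcdCodeCache cache = new IcdCodeCache(
                code -> code.equals("1C62.0") ? OclOutcome.VALID : OclOutcome.INVALID,
                Instant::now);
        System.out.println(cache.validate("1C62.0")); // Optional[true]
        System.out.println(cache.validate("1C62.0")); // Optional[true], served from cache
        System.out.println(cache.flush());            // 1
    }
}
```

The cache stores invalid results as well as valid ones. A vendor that keeps resubmitting the same bad code therefore does not trigger repeated OCL calls. Outage results are never cached, so validation resumes as soon as OCL recovers. A cached answer can outlive an ICD-11 import in OCL by up to 24 hours, which is why the upgrade procedure in Section 11 flushes the cache.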
4. Infrastructure Components
Docker services
| Service | Image | Purpose | Networks |
|---|---|---|---|
| hapi | Private registry | HAPI FHIR application | frontend, backend-fhir, backend-audit |
| postgres-fhir | postgres:15-alpine | FHIR resource store | backend-fhir |
| postgres-audit | postgres:15-alpine | Immutable audit store | backend-audit |
| pgbouncer-fhir | bitnami/pgbouncer:1.22.1 | Connection pool → postgres-fhir | backend-fhir |
| pgbouncer-audit | bitnami/pgbouncer:1.22.1 | Connection pool → postgres-audit | backend-audit |
Network isolation
backend-fhir and backend-audit are marked internal: true — no external internet access from these networks. The database containers cannot reach external services and external services cannot reach the databases directly.
pgBouncer configuration
Both pgBouncer instances run in session mode, and this is mandatory. HAPI uses Hibernate, which relies on prepared statements, and IgPackageInitializer relies on session-level PostgreSQL advisory locks. Transaction-mode pgBouncer breaks both. Do not change the pool mode.
Pool sizing at pilot phase (1 HAPI replica):
| Pool | HikariCP max per replica | pgBouncer pool_size | PostgreSQL max_connections |
|---|---|---|---|
| FHIR | 5 | 20 | 30 |
| Audit | 2 | 10 | 20 |
At 3 replicas: 15 FHIR connections, 6 audit connections — both within pool limits.
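The 3-replica figure follows from simple arithmetic. pgBouncer runs in session mode, so every connection a HikariCP pool keeps open occupies one pgBouncer server connection. Demand on each pool is therefore replicas × HikariCP max per replica. The hypothetical helper below makes that rule explicit for the pilot numbers in the table.

```java
// Pool-sizing arithmetic for the pilot-phase numbers in the table above (hypothetical helper).
final class PoolSizing {

    // In session mode each open HikariCP connection pins one pgBouncer server
    // connection, so the demand on a pool is simply replicas x HikariCP max.
    static int connectionsNeeded(int replicas, int hikariMaxPerReplica) {
        return replicas * hikariMaxPerReplica;
    }

    static boolean fits(int replicas, int hikariMaxPerReplica, int pgBouncerPoolSize) {
        return connectionsNeeded(replicas, hikariMaxPerReplica) <= pgBouncerPoolSize;
    }

    static int maxReplicas(int hikariMaxPerReplica, int pgBouncerPoolSize) {
        return pgBouncerPoolSize / hikariMaxPerReplica; // integer division rounds down
    }

    public static void main(String[] args) {
        for (int replicas = 1; replicas <= 5; replicas++) {
            System.out.printf("replicas=%d  FHIR %2d/20 %-4s  audit %2d/10 %s%n",
                    replicas,
                    connectionsNeeded(replicas, 5), fits(replicas, 5, 20) ? "ok" : "OVER",
                    connectionsNeeded(replicas, 2), fits(replicas, 2, 10) ? "ok" : "OVER");
        }
        System.out.println("max FHIR replicas:  " + maxReplicas(5, 20)); // 4
        System.out.println("max audit replicas: " + maxReplicas(2, 10)); // 5
    }
}
```

The FHIR pool is the binding constraint. Four replicas use exactly 20 server connections, and a fifth would need 25.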
Databases
postgres-fhir contains all HAPI JPA tables. Schema managed by Flyway V1__hapi_schema.sql. ddl-auto: validate means Hibernate never modifies the schema — Flyway owns all DDL. If a HAPI upgrade requires schema changes, write a new Flyway migration.
postgres-audit contains the audit schema only. Two tables, both partitioned by month. Schema managed by Flyway V2__audit_schema.sql against postgres-audit (separate Flyway instance, separate history table flyway_audit_schema_history).
Volumes
| Volume | Contents | Backup priority |
|---|---|---|
| postgres-fhir-data | All FHIR resources | Critical (primary data) |
| postgres-audit-data | All audit records, rejected payloads | Critical (forensic/legal) |
| hapi-logs | Structured JSON application logs | Medium (operational) |
5. Security Model
Authentication
Every request to FHIR endpoints (except GET /fhir/metadata and /actuator/health/**) requires a valid Bearer token issued by Keycloak realm hris.
KeycloakJwtInterceptor performs these checks in order, rejecting with HTTP 401 on any failure:
1. Authorization: Bearer header present and non-empty
2. JWT signature valid against Keycloak JWKS (RS256 only; symmetric algorithms rejected)
3. exp claim in the future (not expired)
4. iss claim exactly equals https://auth.dghs.gov.bd/realms/hris
5. mci-api role present in realm_access.roles OR in resource_access.{client-id}.roles
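The exp, iss and role checks above can be sketched without a JWT library. The sketch assumes the signature has already been verified and the payload decoded into a Map, as a JSON parser would produce. ClaimChecks and its methods are hypothetical; the rejection codes are those listed in Section 6. One assumption to note: the sketch accepts mci-api under any client entry in resource_access, whereas the real interceptor may only inspect a specific client ID.

```java
import java.time.Instant;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the post-signature claim checks (exp, iss, mci-api role).
final class ClaimChecks {
    static final String ISSUER = "https://auth.dghs.gov.bd/realms/hris";
    static final String REQUIRED_ROLE = "mci-api";

    // Returns empty when all checks pass, otherwise the rejection code for the first failure.
    static Optional<String> check(Map<String, Object> claims, Instant now) {
        Object exp = claims.get("exp"); // seconds since the epoch
        if (!(exp instanceof Number n) || !Instant.ofEpochSecond(n.longValue()).isAfter(now)) {
            return Optional.of("AUTH_TOKEN_EXPIRED");
        }
        if (!ISSUER.equals(claims.get("iss"))) { // exact string match, no prefix matching
            return Optional.of("AUTH_TOKEN_INVALID_ISSUER");
        }
        if (!hasRequiredRole(claims)) {
            return Optional.of("AUTH_TOKEN_MISSING_ROLE");
        }
        return Optional.empty();
    }

    // mci-api in realm_access.roles, or in resource_access.{client}.roles for any client (assumption).
    static boolean hasRequiredRole(Map<String, Object> claims) {
        if (rolesOf(claims.get("realm_access")).contains(REQUIRED_ROLE)) {
            return true;
        }
        if (claims.get("resource_access") instanceof Map<?, ?> clients) {
            for (Object client : clients.values()) {
                if (rolesOf(client).contains(REQUIRED_ROLE)) {
                    return true;
                }
            }
        }
        return false;
    }

    private static Collection<?> rolesOf(Object access) {
        if (access instanceof Map<?, ?> m && m.get("roles") instanceof Collection<?> roles) {
            return roles;
        }
        return List.of();
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        Map<String, Object> claims = Map.of(
                "exp", now.getEpochSecond() + 300,
                "iss", ISSUER,
                "realm_access", Map.of("roles", List.of("mci-api")));
        System.out.println(check(claims, now)); // Optional.empty (all checks pass)
    }
}
```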
The JWKS is cached locally with a 1-hour TTL. On receiving a JWT with an unknown kid, the JWKS is immediately re-fetched regardless of TTL — this handles Keycloak key rotation without delay.
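The hypothetical sketch below shows the caching rule, with the HTTP fetch and the clock injected. JwksCache is not a class in this codebase, and keys are typed as Object to keep it library-free. The re-fetch on an unknown kid is what lets Keycloak rotate keys without rejecting fresh tokens.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch of JWKS key caching: 1-hour TTL plus an immediate
// re-fetch when a token presents a kid that is not in the cached key set.
final class JwksCache {
    private static final Duration TTL = Duration.ofHours(1);

    private final Supplier<Map<String, Object>> fetchKeysByKid; // stand-in for GET .../openid-connect/certs
    private final Supplier<Instant> clock;
    private Map<String, Object> keys = Map.of();
    private Instant fetchedAt = Instant.MIN;

    JwksCache(Supplier<Map<String, Object>> fetchKeysByKid, Supplier<Instant> clock) {
        this.fetchKeysByKid = fetchKeysByKid;
        this.clock = clock;
    }

    synchronized Optional<Object> keyFor(String kid) {
        Instant now = clock.get();
        boolean refreshed = false;
        if (!now.isBefore(fetchedAt.plus(TTL))) { // TTL elapsed, or never fetched
            refresh(now);
            refreshed = true;
        }
        Object key = keys.get(kid);
        if (key == null && !refreshed) {          // unknown kid: Keycloak may have rotated keys
            refresh(now);
            key = keys.get(kid);
        }
        return Optional.ofNullable(key);          // empty means the token is rejected (401)
    }

    private void refresh(Instant now) {
        keys = Map.copyOf(fetchKeysByKid.get());
        fetchedAt = now;
    }

    public static void main(String[] args) {
        JwksCache cache = new JwksCache(() -> Map.of("kid-1", "rsa-public-key-1"), Instant::now);
        System.out.println(cache.keyFor("kid-1")); // Optional[rsa-public-key-1]
        System.out.println(cache.keyFor("kid-9")); // Optional.empty
    }
}
```

A design caveat: forcing a re-fetch on every unknown kid lets a client sending forged tokens trigger one JWKS request per call. Implementations commonly rate-limit forced refreshes. This document does not say whether KeycloakJwtInterceptor does.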
Authorisation
Vendors — must have mci-api role. Client naming convention: fhir-vendor-{organisation-id}.
Admin operations (cache flush endpoint) — must have fhir-admin role. Only the fhir-admin-pipeline service account and DGHS system administrators hold this role.
Keycloak client setup for new vendors
See ops/keycloak-setup.md for the full procedure. Summary:
1. Create client fhir-vendor-{org-id} in the hris realm: confidential, service accounts enabled, standard flow off.
2. Assign the mci-api role to the service account.
3. Add a sending_facility user attribute with the DGHS facility code.
4. Add a User Attribute token mapper for sending_facility → token claim sending_facility.
5. Deliver client_id and client_secret to the vendor.
If a vendor token is missing the sending_facility claim, HAPI logs WARN on every submission and uses client_id as the facility identifier in audit records. This is a data quality issue — configure the mapper.
Vendor token flow
# Vendor obtains token
POST https://auth.dghs.gov.bd/realms/hris/protocol/openid-connect/token
grant_type=client_credentials
client_id=fhir-vendor-{org-id}
client_secret={secret}
→ { "access_token": "eyJ...", "expires_in": 300 }
# Vendor submits resource
POST https://fhir.dghs.gov.bd/fhir/Condition
Authorization: Bearer eyJ...
Content-Type: application/fhir+json
{ ... }
Tokens expire in 5 minutes (Keycloak default). Vendor systems must refresh before expiry.
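On the vendor side, "refresh before expiry" usually means caching the token and fetching a new one slightly before expires_in elapses. The sketch below is illustrative only. TokenHolder is a hypothetical class, the 30-second safety margin is an assumed value, and the Supplier stands in for the client_credentials POST shown above.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

// Hypothetical vendor-side helper: reuse an access token until shortly before it expires.
final class TokenHolder {

    // The fields of the token endpoint response that matter here.
    record Token(String accessToken, long expiresInSeconds) {}

    private static final Duration SAFETY_MARGIN = Duration.ofSeconds(30); // assumed value

    private final Supplier<Token> fetchToken; // stand-in for the client_credentials POST
    private final Supplier<Instant> clock;
    private String current;
    private Instant refreshAt = Instant.MIN;

    TokenHolder(Supplier<Token> fetchToken, Supplier<Instant> clock) {
        this.fetchToken = fetchToken;
        this.clock = clock;
    }

    // Returns a token expected to still be valid, fetching a new one once the margin is reached.
    synchronized String bearer() {
        Instant now = clock.get();
        if (current == null || !now.isBefore(refreshAt)) {
            Token token = fetchToken.get();
            current = token.accessToken();
            refreshAt = now.plusSeconds(token.expiresInSeconds()).minus(SAFETY_MARGIN);
        }
        return current;
    }

    public static void main(String[] args) {
        int[] issued = {0};
        TokenHolder holder = new TokenHolder(() -> new Token("token-" + (++issued[0]), 300), Instant::now);
        System.out.println(holder.bearer()); // token-1
        System.out.println(holder.bearer()); // token-1 again: still inside the 300 s lifetime
    }
}
```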
6. Validation Pipeline
BD Core IG profiles
The following resource types are validated against BD Core IG profiles:
| Resource type | Profile URL |
|---|---|
| Patient | https://fhir.dghs.gov.bd/core/StructureDefinition/bd-patient |
| Condition | https://fhir.dghs.gov.bd/core/StructureDefinition/bd-condition |
| Encounter | https://fhir.dghs.gov.bd/core/StructureDefinition/bd-encounter |
| Observation | https://fhir.dghs.gov.bd/core/StructureDefinition/bd-observation |
| Practitioner | https://fhir.dghs.gov.bd/core/StructureDefinition/bd-practitioner |
| Organization | https://fhir.dghs.gov.bd/core/StructureDefinition/bd-organization |
| Location | https://fhir.dghs.gov.bd/core/StructureDefinition/bd-location |
| Medication | https://fhir.dghs.gov.bd/core/StructureDefinition/bd-medication |
| MedicationRequest | https://fhir.dghs.gov.bd/core/StructureDefinition/bd-medicationrequest |
| Immunization | https://fhir.dghs.gov.bd/core/StructureDefinition/bd-immunization |
Resources of any other type are stored with meta.tag = https://fhir.dghs.gov.bd/tags|unvalidated-profile. They are not rejected. They can be queried with _tag=https://fhir.dghs.gov.bd/tags|unvalidated-profile.
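The profile table and the unvalidated-profile rule amount to a simple lookup. This hypothetical sketch (ProfileRouting is not part of the codebase) copies the profile URLs from the table verbatim. In the running server the decision is made inside HAPI's validation configuration, which is not shown here.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical lookup mirroring the BD Core IG profile table above.
final class ProfileRouting {
    private static final String BASE = "https://fhir.dghs.gov.bd/core/StructureDefinition/";
    static final String UNVALIDATED_TAG = "https://fhir.dghs.gov.bd/tags|unvalidated-profile";

    private static final Map<String, String> PROFILES = Map.of(
            "Patient", BASE + "bd-patient",
            "Condition", BASE + "bd-condition",
            "Encounter", BASE + "bd-encounter",
            "Observation", BASE + "bd-observation",
            "Practitioner", BASE + "bd-practitioner",
            "Organization", BASE + "bd-organization",
            "Location", BASE + "bd-location",
            "Medication", BASE + "bd-medication",
            "MedicationRequest", BASE + "bd-medicationrequest",
            "Immunization", BASE + "bd-immunization");

    static Optional<String> profileFor(String resourceType) {
        return Optional.ofNullable(PROFILES.get(resourceType));
    }

    // What happens to a submitted resource of this type: profile validation, or storage with the tag.
    static String describe(String resourceType) {
        return profileFor(resourceType)
                .map(url -> "validate against " + url)
                .orElse("store with meta.tag " + UNVALIDATED_TAG);
    }

    public static void main(String[] args) {
        System.out.println(describe("Condition"));
        System.out.println(describe("AllergyIntolerance"));
    }
}
```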
ICD-11 cluster expression format
BD Core IG defines a specific pattern for postcoordinated ICD-11 expressions. Raw postcoordinated strings in Coding.code are prohibited.
Correct format:
"code": {
"coding": [{
"system": "http://id.who.int/icd/release/11/mms",
"code": "1C62.0",
"extension": [{
"url": "icd11-cluster-expression",
"valueString": "1C62.0/http%3A%2F%2Fid.who.int%2F..."
}]
}]
}
Prohibited format (rejected with 422):
"code": {
"coding": [{
"system": "http://id.who.int/icd/release/11/mms",
"code": "1C62.0&has_severity=mild"
}]
}
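The rule that rejects the prohibited format can be sketched as follows: raw postcoordination characters (&, / or %) in an ICD-11 MMS code without the icd11-cluster-expression extension. ClusterSyntaxCheck and its Coding record are hypothetical stand-ins for ClusterExpressionValidator and HAPI's Coding type. The extension URL is copied from the example above.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the "raw postcoordination without extension" rule.
final class ClusterSyntaxCheck {
    static final String ICD11_MMS = "http://id.who.int/icd/release/11/mms";
    static final String CLUSTER_EXTENSION_URL = "icd11-cluster-expression"; // as in the example above

    // Minimal stand-in for a FHIR Coding: system, code, and the URLs of its extensions.
    record Coding(String system, String code, List<String> extensionUrls) {}

    // The postcoordination characters named in the request lifecycle: &, / and %.
    static boolean hasRawPostcoordination(String code) {
        return code.chars().anyMatch(ch -> ch == '&' || ch == '/' || ch == '%');
    }

    // Returns the rejection code, or empty if this check passes. Codings that carry the
    // cluster-expression extension are not rejected here; they go to the cluster validator.
    static Optional<String> check(Coding coding) {
        if (!ICD11_MMS.equals(coding.system())) {
            return Optional.empty(); // only ICD-11 MMS codings are inspected
        }
        boolean hasExtension = coding.extensionUrls().contains(CLUSTER_EXTENSION_URL);
        if (!hasExtension && hasRawPostcoordination(coding.code())) {
            return Optional.of("CLUSTER_STEM_MISSING_EXTENSION");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Prohibited format: prints Optional[CLUSTER_STEM_MISSING_EXTENSION]
        System.out.println(check(new Coding(ICD11_MMS, "1C62.0&has_severity=mild", List.of())));
        // Correct format (stem code plus extension): prints Optional.empty
        System.out.println(check(new Coding(ICD11_MMS, "1C62.0", List.of(CLUSTER_EXTENSION_URL))));
    }
}
```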
Rejection codes
The rejection_code column in audit.fhir_rejected_submissions contains one of:
| Code | Meaning |
|---|---|
| PROFILE_VIOLATION | Resource violates a BD Core IG SHALL constraint |
| TERMINOLOGY_INVALID_CODE | ICD-11 code not found in OCL |
| TERMINOLOGY_INVALID_CLASS | ICD-11 code exists but is not Diagnosis/Finding class |
| CLUSTER_EXPRESSION_INVALID | Cluster expression failed cluster validator |
| CLUSTER_STEM_MISSING_EXTENSION | Raw postcoordinated string without extension |
| AUTH_TOKEN_MISSING | No Bearer token |
| AUTH_TOKEN_EXPIRED | Token exp in the past |
| AUTH_TOKEN_INVALID_SIGNATURE | Signature verification failed |
| AUTH_TOKEN_MISSING_ROLE | mci-api role absent |
| AUTH_TOKEN_INVALID_ISSUER | iss does not match Keycloak realm |
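The table does not say which HTTP status each code produces. The rules earlier in this document imply a mapping: every KeycloakJwtInterceptor check fails with 401, and profile, terminology and cluster failures return 422. A triage helper based on those rules might look like this. RejectionCodes is a hypothetical name, not a class in the repository.

```java
// Hypothetical mapping of the rejection codes above to the HTTP status a vendor receives.
final class RejectionCodes {

    enum Code {
        PROFILE_VIOLATION(422),
        TERMINOLOGY_INVALID_CODE(422),
        TERMINOLOGY_INVALID_CLASS(422),
        CLUSTER_EXPRESSION_INVALID(422),
        CLUSTER_STEM_MISSING_EXTENSION(422),
        AUTH_TOKEN_MISSING(401),
        AUTH_TOKEN_EXPIRED(401),
        AUTH_TOKEN_INVALID_SIGNATURE(401),
        AUTH_TOKEN_MISSING_ROLE(401),
        AUTH_TOKEN_INVALID_ISSUER(401);

        final int httpStatus;

        Code(int httpStatus) {
            this.httpStatus = httpStatus;
        }

        // 401s are fixed in Keycloak or vendor client configuration; 422s are fixed in the payload.
        boolean isAuthFailure() {
            return httpStatus == 401;
        }
    }

    public static void main(String[] args) {
        for (Code c : Code.values()) {
            System.out.printf("%-32s %d %s%n", c, c.httpStatus, c.isAuthFailure() ? "auth" : "content");
        }
    }
}
```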
7. Audit and Forensics
Two audit stores
audit.audit_events — one row per request outcome. Always written, accepted and rejected. Contains: event_type, operation, resource_type, resource_id, outcome, outcome_detail, sending_facility, client_id, subject, request_ip, request_id, validation_messages (JSONB).
audit.fhir_rejected_submissions — one row per rejected write. Contains: full resource payload as submitted (TEXT, not JSONB), rejection_code, rejection_reason, element_path, violated_profile, invalid_code, invalid_system.
Immutability
The audit_writer_login PostgreSQL user has INSERT only on the audit schema. The HAPI JVM connects to postgres-audit as this user. No UPDATE or DELETE is possible from the application layer regardless of what the application code attempts. Only a PostgreSQL superuser can modify audit records.
Partitioning
Both audit tables are partitioned by month (PARTITION BY RANGE (event_time)). Monthly partitions are pre-created through December 2027. A cron job must create next-month partitions on the 20th of each month. If this lapses, INSERT fails with a hard error.
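The body of audit.create_next_month_partitions() is not shown in this document. Assuming it uses standard PostgreSQL declarative range partitioning, the statement it issues for one table would resemble the output of this sketch. The partition naming scheme audit.<table>_YYYY_MM and the PartitionDdl class are hypothetical.

```java
import java.time.LocalDate;
import java.time.YearMonth;

// Hypothetical sketch: build the DDL for next month's partition of an audit table.
final class PartitionDdl {

    static String createNextMonth(String table, LocalDate today) {
        YearMonth next = YearMonth.from(today).plusMonths(1);
        LocalDate from = next.atDay(1);             // inclusive lower bound
        LocalDate to = next.plusMonths(1).atDay(1); // exclusive upper bound
        String partition = String.format("%s_%d_%02d", table, next.getYear(), next.getMonthValue());
        return String.format(
                "CREATE TABLE IF NOT EXISTS audit.%s PARTITION OF audit.%s FOR VALUES FROM ('%s') TO ('%s');",
                partition, table, from, to);
    }

    public static void main(String[] args) {
        // The cron job runs on the 20th; the last pre-created partition is December 2027.
        LocalDate cronRun = LocalDate.of(2027, 12, 20);
        for (String table : new String[] {"audit_events", "fhir_rejected_submissions"}) {
            System.out.println(createNextMonth(table, cronRun));
        }
    }
}
```

The upper bound of a range partition is exclusive, so each partition covers the 1st of its month up to, but not including, the 1st of the next. Running on the 20th leaves more than a week to notice and fix a failed run before INSERTs start failing.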
Set up the cron job immediately after first deployment:
# On the host running postgres-audit
crontab -e
# Add as a single line (cron does not support backslash line continuation):
0 0 20 * * docker exec bd-postgres-audit psql -U audit_maintainer_login -d auditdb -c "SELECT audit.create_next_month_partitions();" >> /var/log/bd-fhir-partition-maintenance.log 2>&1
Useful audit queries
-- Rejection rate by vendor, last 7 days
SELECT client_id,
COUNT(*) AS total,
SUM(CASE WHEN outcome='REJECTED' THEN 1 ELSE 0 END) AS rejected,
ROUND(100.0 * SUM(CASE WHEN outcome='REJECTED' THEN 1 ELSE 0 END) / COUNT(*), 1) AS pct
FROM audit.audit_events
WHERE event_time > NOW() - INTERVAL '7 days'
AND event_type IN ('OPERATION','VALIDATION_FAILURE')
GROUP BY client_id ORDER BY pct DESC;
-- Retrieve rejected payloads for a vendor
SELECT submission_time, resource_type, rejection_code, rejection_reason, element_path
FROM audit.fhir_rejected_submissions
WHERE client_id = 'fhir-vendor-{org-id}'
ORDER BY submission_time DESC LIMIT 20;
-- Auth failures
SELECT event_time, client_id, outcome_detail, request_ip
FROM audit.audit_events
WHERE event_type = 'AUTH_FAILURE'
ORDER BY event_time DESC LIMIT 20;
8. CI/CD Pipeline
The production server never builds. It only pulls pre-built images from the private registry.
CI pipeline steps (on CI machine)
# 1. Obtain BD Core IG package and place it
cp /path/to/bd.gov.dghs.core-0.2.1.tgz \
hapi-overlay/src/main/resources/packages/
# 2. Run tests (TestContainers spins up real PostgreSQL — no H2)
mvn test -pl hapi-overlay -am
# 3. Build Docker image (multi-stage: Maven builder + JRE runtime)
docker build \
--build-arg IG_PACKAGE=bd.gov.dghs.core-0.2.1.tgz \
--build-arg BUILD_VERSION=1.0.0 \
--build-arg GIT_COMMIT=$(git rev-parse --short HEAD) \
-t your-registry.dghs.gov.bd/bd-fhir-hapi:1.0.0 \
-f hapi-overlay/Dockerfile \
.
# 4. Push to private registry
docker push your-registry.dghs.gov.bd/bd-fhir-hapi:1.0.0
The packages/ directory must contain exactly one .tgz file matching HAPI_IG_PACKAGE_CLASSPATH in .env. If the directory is empty or the filename does not match, the container fails startup immediately with a clear error message.
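The same rule can be checked on the CI machine before docker build, so a missing or mismatched package fails the pipeline rather than the production container. This is a hypothetical pre-flight helper, not the repository's startup check. It assumes you pass the bare file name that HAPI_IG_PACKAGE_CLASSPATH refers to, because the exact format of that variable is not shown here.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical CI pre-flight check mirroring the startup rule for the packages/ directory.
final class IgPackageCheck {

    // Lists the *.tgz file names in the directory, sorted so error messages are deterministic.
    static List<String> listTgz(Path packagesDir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(packagesDir, "*.tgz")) {
            for (Path p : entries) {
                names.add(p.getFileName().toString());
            }
        }
        names.sort(null); // natural (alphabetical) order
        return names;
    }

    // Returns an error message, or empty when exactly one .tgz is present and it is the expected file.
    static Optional<String> check(List<String> tgzNames, String expectedFileName) {
        if (tgzNames.isEmpty()) {
            return Optional.of("no .tgz file found in packages/");
        }
        if (tgzNames.size() > 1) {
            return Optional.of("expected exactly one .tgz, found " + tgzNames);
        }
        if (!tgzNames.get(0).equals(expectedFileName)) {
            return Optional.of("found " + tgzNames.get(0) + " but expected " + expectedFileName);
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Path.of(args.length > 0 ? args[0] : "hapi-overlay/src/main/resources/packages");
        String expected = args.length > 1 ? args[1] : "bd.gov.dghs.core-0.2.1.tgz";
        Optional<String> error = check(listTgz(dir), expected);
        if (error.isPresent()) {
            System.err.println("IG package check failed: " + error.get());
            System.exit(1);
        }
        System.out.println("IG package check passed: " + expected);
    }
}
```

Java 11 and later can run a single source file directly, so a CI step could invoke it as `java IgPackageCheck.java hapi-overlay/src/main/resources/packages bd.gov.dghs.core-0.2.1.tgz` just before the build.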
9. First Deployment — Step by Step
Prerequisites
- Ubuntu 22.04 LTS, minimum 8GB RAM, 4 vCPU, 100GB disk
- Outbound HTTPS to Keycloak, OCL, cluster validator, private registry
- Docker image already built and pushed (see Section 8)
- Keycloak configured (see ops/keycloak-setup.md)
Step 1 — Install Docker
sudo apt-get update
sudo apt-get install -y ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
-o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] \
https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io \
docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
# log out and back in
Step 2 — Prepare application directory
sudo mkdir -p /opt/bd-fhir-national
sudo chown $USER:$USER /opt/bd-fhir-national
# rsync project files from CI/deployment machine (excluding source tree)
rsync -avz --exclude='.git' --exclude='hapi-overlay/target' \
--exclude='hapi-overlay/src' \
./bd-fhir-national/ deploy@server:/opt/bd-fhir-national/
Step 3 — Create .env
cd /opt/bd-fhir-national
cp .env.example .env
chmod 600 .env
nano .env # fill all <CHANGE_ME> values
# verify: grep CHANGE_ME .env should return nothing
Step 4 — Fix init scripts (CRITICAL — do not skip)
The postgres/fhir/init.sql and postgres/audit/init.sql files are templates with placeholder passwords. PostgreSQL Docker does not perform variable substitution in .sql files. Replace them with .sh scripts that read from environment variables.
# FHIR database init script
cat > /opt/bd-fhir-national/postgres/fhir/init.sh <<'EOF'
#!/bin/bash
set -e
psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "$POSTGRES_DB" <<-EOSQL
DO \$\$ BEGIN
IF NOT EXISTS (SELECT 1 FROM pg_roles WHERE rolname = '${FHIR_DB_APP_USER}') THEN
CREATE USER ${FHIR_DB_APP_USER} WITH NOSUPERUSER NOCREATEDB NOCREATEROLE
NOINHERIT LOGIN CONNECTION LIMIT 30 PASSWORD '${FHIR_DB_APP_PASSWORD}';
END IF;
END \$\$;
GRANT CONNECT ON DATABASE ${POSTGRES_DB} TO ${FHIR_DB_APP_USER};
GRANT USAGE ON SCHEMA public TO ${FHIR_DB_APP_USER};
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO ${FHIR_DB_APP_USER};
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT USAGE, SELECT ON SEQUENCES TO ${FHIR_DB_APP_USER};
EOSQL
EOF
chmod +x /opt/bd-fhir-national/postgres/fhir/init.sh
# Audit database init script
cat > /opt/bd-fhir-national/postgres/audit/init.sh <<'EOF'
#!/bin/bash
set -e
psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "$POSTGRES_DB" <<-EOSQL
DO \$\$ BEGIN
IF NOT EXISTS (SELECT 1 FROM pg_roles WHERE rolname = '${AUDIT_DB_WRITER_USER}') THEN
CREATE USER ${AUDIT_DB_WRITER_USER} WITH NOSUPERUSER NOCREATEDB NOCREATEROLE
NOINHERIT LOGIN CONNECTION LIMIT 20 PASSWORD '${AUDIT_DB_WRITER_PASSWORD}';
END IF;
IF NOT EXISTS (SELECT 1 FROM pg_roles WHERE rolname = '${AUDIT_DB_MAINTAINER_USER}') THEN
CREATE USER ${AUDIT_DB_MAINTAINER_USER} WITH NOSUPERUSER NOCREATEDB NOCREATEROLE
NOINHERIT LOGIN CONNECTION LIMIT 5 PASSWORD '${AUDIT_DB_MAINTAINER_PASSWORD}';
END IF;
END \$\$;
GRANT CONNECT ON DATABASE ${POSTGRES_DB} TO ${AUDIT_DB_WRITER_USER};
GRANT CONNECT ON DATABASE ${POSTGRES_DB} TO ${AUDIT_DB_MAINTAINER_USER};
EOSQL
EOF
chmod +x /opt/bd-fhir-national/postgres/audit/init.sh
Update docker-compose.yml — in both postgres services, change the init volume mount from .sql to .sh, and pass the necessary env vars to postgres-audit:
# postgres-fhir volumes: change
- ./postgres/fhir/init.sh:/docker-entrypoint-initdb.d/init.sh:ro
# add to postgres-fhir environment:
FHIR_DB_APP_USER: ${FHIR_DB_APP_USER}
FHIR_DB_APP_PASSWORD: ${FHIR_DB_APP_PASSWORD}
# postgres-audit volumes: change
- ./postgres/audit/init.sh:/docker-entrypoint-initdb.d/init.sh:ro
# add to postgres-audit environment:
AUDIT_DB_WRITER_USER: ${AUDIT_DB_WRITER_USER}
AUDIT_DB_WRITER_PASSWORD: ${AUDIT_DB_WRITER_PASSWORD}
AUDIT_DB_MAINTAINER_USER: ${AUDIT_DB_MAINTAINER_USER}
AUDIT_DB_MAINTAINER_PASSWORD: ${AUDIT_DB_MAINTAINER_PASSWORD}
Step 5 — Registry login
docker login your-registry.dghs.gov.bd
docker compose --env-file .env pull
Step 6 — Start databases
docker compose --env-file .env up -d postgres-fhir postgres-audit
# wait for healthy (match "(healthy)" so that a container reporting "(unhealthy)" does not end the loop)
until docker compose --env-file .env ps postgres-fhir | grep -q "(healthy)"; do sleep 3; done
until docker compose --env-file .env ps postgres-audit | grep -q "(healthy)"; do sleep 3; done
Step 7 — Verify database users
docker exec bd-postgres-fhir psql -U postgres -d fhirdb \
-c "SELECT rolname FROM pg_roles WHERE rolname='hapi_app';"
# Expected: hapi_app
docker exec bd-postgres-audit psql -U postgres -d auditdb \
-c "SELECT rolname FROM pg_roles WHERE rolname IN ('audit_writer_login','audit_maintainer_login');"
# Expected: two rows
Step 8 — Start pgBouncer and HAPI
docker compose --env-file .env up -d pgbouncer-fhir pgbouncer-audit
docker compose --env-file .env up -d hapi
# Follow startup — takes 60-120 seconds
docker compose --env-file .env logs -f hapi
Expected log sequence:
Running FHIR Flyway migrations... → V1 applied
Running Audit Flyway migrations... → V2 applied
Advisory lock acquired... → IG loading begins
BD Core IG package loaded... → IG ready
BdTerminologyValidationSupport initialised...
KeycloakJwtInterceptor initialised...
HAPI RestfulServer interceptors registered...
Tomcat started on port(s): 8080
Started BdFhirApplication in XX seconds
Step 9 — Verify health
# Internal (direct to HAPI)
docker exec $(docker compose --env-file .env ps -q hapi | head -1) \
curl -s http://localhost:8080/actuator/health | jq .
# All components must show status: UP
# FHIR metadata
docker exec $(docker compose --env-file .env ps -q hapi | head -1) \
curl -s http://localhost:8080/fhir/metadata | jq '.software'
# Expected: { "name": "BD FHIR National Repository", "version": "0.2.1" }
Step 10 — Set up partition maintenance cron
crontab -e
# Add as a single line (cron does not support backslash line continuation):
0 0 20 * * docker exec bd-postgres-audit psql -U audit_maintainer_login -d auditdb -c "SELECT audit.create_next_month_partitions();" >> /var/log/bd-fhir-partition-maintenance.log 2>&1
Step 11 — Run acceptance tests
Run all tests from Section 9.3 of ops/deployment-guide.md. All nine must pass before the system is declared production-ready.
10. Routine Operations
View logs
# All services
docker compose --env-file .env logs -f
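# HAPI logs as structured JSON (--no-log-prefix drops the "hapi-1 | " prefix so each line parses as JSON)
docker compose --env-file .env logs -f --no-log-prefix hapi | jq -R 'try fromjson'
# Filter for rejections (case-insensitive; lines without a message field are skipped)
docker compose --env-file .env logs --no-log-prefix hapi | \
jq -R 'try fromjson | select((.message // "") | test("rejected"; "i"))'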
Deploy a new image version
# Update image tag in .env
nano /opt/bd-fhir-national/.env
# Change HAPI_IMAGE to new tag
# Pull and redeploy
docker compose --env-file .env pull hapi
docker compose --env-file .env up -d --no-deps hapi
# Verify startup
docker compose --env-file .env logs -f hapi
Scale HAPI replicas
docker compose --env-file .env up -d --scale hapi=3
# No other configuration changes needed at 3 replicas.
# pgBouncer pool_size=20 supports up to 4 replicas at HikariCP max=5.
# At 5+ replicas: increase PGBOUNCER_DEFAULT_POOL_SIZE and postgres max_connections first.
Restart a service
docker compose --env-file .env restart hapi
docker compose --env-file .env restart postgres-fhir # causes brief HAPI downtime
Full stack restart
docker compose --env-file .env down
docker compose --env-file .env up -d
Check pgBouncer pool status
docker exec bd-pgbouncer-fhir psql -h localhost -p 5432 -U pgbouncer pgbouncer \
-c "SHOW POOLS;"
11. ICD-11 Version Upgrade
When a new ICD-11 MMS release is imported into OCL, the HAPI terminology cache becomes stale. The upgrade pipeline must flush the cache after OCL import. Full procedure in ops/version-upgrade-integration.md. Summary:
Order is mandatory:
1. OCL: import new ICD-11 concepts
2. OCL: patch concept_class for Diagnosis + Finding
3. OCL: repopulate bd-condition-icd11-diagnosis-valueset
4. OCL: verify $validate-code returns correct results for new codes
5. HAPI: flush terminology cache
6. HAPI: verify new codes validate correctly
Step 5 — cache flush:
# Get fhir-admin token
ADMIN_TOKEN=$(curl -s -X POST \
"https://auth.dghs.gov.bd/realms/hris/protocol/openid-connect/token" \
-d "grant_type=client_credentials" \
-d "client_id=fhir-admin-pipeline" \
-d "client_secret=${FHIR_ADMIN_CLIENT_SECRET}" \
| jq -r '.access_token')
# Flush — run from inside Docker network (admin endpoint is network-restricted)
docker exec $(docker compose --env-file .env ps -q hapi | head -1) \
curl -s -X DELETE \
-H "Authorization: Bearer ${ADMIN_TOKEN}" \
http://localhost:8080/admin/terminology/cache | jq .
# Expected: { "status": "flushed", "entriesEvicted": N }
IG version upgrade (when BD Core IG advances to a new version):
1. Place the new .tgz in hapi-overlay/src/main/resources/packages/ and remove the old one.
2. Update HAPI_IG_PACKAGE_CLASSPATH and HAPI_IG_VERSION in .env.
3. Build and push a new Docker image on the CI machine.
4. Deploy the new image on the production server.
12. Scaling
Current capacity (Phase 1 — Pilot)
| Metric | Capacity |
|---|---|
| HAPI replicas | 1 |
| Vendors | <50 |
| Resources/day | <10,000 |
| PostgreSQL connections (FHIR) | 5 |
| PostgreSQL connections (Audit) | 2 |
Scaling to Phase 2 (Regional — up to 500 vendors, 100,000 resources/day)
# Scale HAPI to 3 replicas — no other changes required
docker compose --env-file .env up -d --scale hapi=3
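Beyond 3 replicas, review pool sizing first. The current FHIR pool (pgBouncer pool_size 20 at HikariCP max 5 per replica) supports at most 4 replicas, so update pgBouncer pool sizes and PostgreSQL max_connections before scaling to 5 or more. See ops/scaling-roadmap.md for the full capacity matrix and Phase 3 (national scale → Kubernetes) guidance.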
13. Troubleshooting
Container not starting
docker compose --env-file .env logs hapi | tail -50
| Log message | Cause | Fix |
|---|---|---|
| STARTUP FAILURE: BD Core IG package not found | .tgz missing from image | Rebuild image with package in packages/ |
| FHIR Flyway configuration missing | SPRING_FLYWAY_* env vars not set | Check .env |
| password authentication failed for user "hapi_app" | init.sh not run or wrong password | Verify Step 4 of deployment, check .env passwords |
| Advisory lock acquisition timed out | Another replica acquired the lock and crashed mid-init | Check pg_locks on postgres-fhir, kill stale lock |
| Connection refused to Keycloak JWKS | Keycloak unreachable at startup | Check network connectivity, Keycloak health |
| Schema-validation: missing table | Flyway did not run | Check SPRING_FLYWAY_* env vars, check flyway_schema_history table |
401 on all authenticated requests
# Check JWKS endpoint is reachable from inside the container
docker exec $(docker compose --env-file .env ps -q hapi | head -1) \
curl -s https://auth.dghs.gov.bd/realms/hris/protocol/openid-connect/certs | jq '.keys | length'
# Expected: 1 or more keys
If JWKS is unreachable, all requests will be rejected with 401 (fail closed). Check firewall rules — the HAPI container must have outbound HTTPS to Keycloak.
422 on all ICD-11 coded submissions
# Check OCL is reachable
docker exec $(docker compose --env-file .env ps -q hapi | head -1) \
curl -s -o /dev/null -w "%{http_code}" \
"https://tr.ocl.dghs.gov.bd/api/fhir/metadata"
# Expected: 200
# Check a specific code manually
docker exec $(docker compose --env-file .env ps -q hapi | head -1) \
curl -s "https://tr.ocl.dghs.gov.bd/api/fhir/ValueSet/\$validate-code?\
url=https://fhir.dghs.gov.bd/core/ValueSet/bd-condition-icd11-diagnosis-valueset\
&system=http://id.who.int/icd/release/11/mms&code=1C62.0" | jq .
If OCL is unreachable, the system should be fail-open (codes accepted). If codes are being rejected despite OCL being reachable, check OCL's $validate-code response directly.
Audit writes failing
# Check HAPI logs for "AUDIT WRITE FAILED"
docker compose --env-file .env logs hapi | grep "AUDIT WRITE FAILED"
# Check audit datasource health
docker exec $(docker compose --env-file .env ps -q hapi | head -1) \
curl -s http://localhost:8080/actuator/health | jq '.components.auditDb'
Partition missing (INSERT to audit failing)
# Check which partitions exist
docker exec bd-postgres-audit psql -U postgres -d auditdb -c "
SELECT c.relname FROM pg_class c
JOIN pg_inherits i ON i.inhrelid = c.oid
JOIN pg_class p ON p.oid = i.inhparent
JOIN pg_namespace n ON n.oid = p.relnamespace
WHERE n.nspname = 'audit' AND p.relname = 'audit_events'
ORDER BY c.relname DESC LIMIT 3;"
# Create missing partition manually
docker exec bd-postgres-audit psql -U postgres -d auditdb \
-c "SELECT audit.create_next_month_partitions();"
Check disk usage
docker system df -v
df -h /var/lib/docker
14. Architecture Decisions You Must Not Reverse
These decisions are load-bearing. Reversing any of them without fully understanding the consequences will break the system.
PostgreSQL only — no H2, not even for tests.
The test suite uses TestContainers to spin up real PostgreSQL 15. H2 is not on the classpath. Using H2 masks database-specific behaviour (advisory locks, partitioning, JSONB) and produces false-green test results.
Validation on ALL requests — no vendor exemptions.
The RequestValidatingInterceptor runs on every write. There is no per-vendor or per-resource-type bypass. This is the HIE boundary enforcement. A bypass for one vendor breaks the national data quality guarantee for everyone downstream.
OCL is the single terminology authority.
There is no local ICD-11 concept store. All ICD-11 validation goes to OCL. This means OCL availability affects HAPI validation quality. Keep OCL healthy. Do not add a local fallback without understanding the implications for version consistency.
$expand is never attempted for ICD-11 ValueSets.
OCL does not support $expand. The isValueSetSupported() override returns false for all ICD-11 ValueSets. Do not remove this — removing it causes HAPI to attempt $expand, receive an empty response, and reject every ICD-11 coded resource regardless of whether the code is valid.
pgBouncer must remain in session mode.
Hibernate uses prepared statements and advisory locks. Transaction mode pgBouncer breaks both. Do not change PGBOUNCER_POOL_MODE to transaction.
Flyway owns all DDL — Hibernate never modifies schema.
ddl-auto: validate means Hibernate will refuse to start if the schema does not match its entities, but it will never ALTER or CREATE tables. If a HAPI upgrade changes entity mappings, write a Flyway migration. Never change ddl-auto to update in production.
Audit writes are append-only.
The audit_writer_login PostgreSQL user has INSERT only. The application cannot UPDATE or DELETE audit records regardless of what the code does. This is enforced at the database level. Do not grant additional privileges to this user.
The IG package is bundled in the Docker image.
The .tgz is a build-time artifact, not a runtime configuration. There is no hot-reload. An IG upgrade requires a new Docker image build and deployment. This is by design — it ties IG version to container version, making deployments auditable and rollbacks clean.