bd-fhir-national/ops/project-manifest.md
2026-03-16 00:02:58 +06:00

BD FHIR National — Project Manifest & Pre-Flight Checklist

Project: BD Core FHIR National Repository and Validation Engine
IG Version: BD Core FHIR IG v0.2.1
FHIR Version: R4 (4.0.1)
HAPI Version: 7.2.0
Published by: DGHS/MoHFW Bangladesh
Generated: 2025


Complete file manifest

Build and orchestration

| File | Step | Purpose |
| --- | --- | --- |
| pom.xml | 1 | Parent Maven POM. HAPI 7.2.0 BOM, Spring Boot 3.2.5, all version pins. |
| hapi-overlay/pom.xml | 2 | Child module POM. All runtime dependencies. Fat JAR output: bd-fhir-hapi.jar. |
| hapi-overlay/Dockerfile | 4 | Multi-stage build: Maven builder + eclipse-temurin:17-jre runtime. tini as PID 1. |
| docker-compose.yml | 4 | Production orchestration: HAPI, 2× PostgreSQL, 2× pgBouncer, nginx. Scaling roadmap in comments. |
| .env.example | 4 | Environment variable template. Copy to .env, fill secrets, chmod 600. |

Database

| File | Step | Purpose |
| --- | --- | --- |
| hapi-overlay/src/main/resources/db/migration/fhir/V1__hapi_schema.sql | 3 | HAPI 7.2.0 JPA schema. All tables, sequences, indexes. Flyway-managed. Partition comments at 10M+ rows. |
| hapi-overlay/src/main/resources/db/migration/audit/V2__audit_schema.sql | 3 | Audit schema. Partitioned audit_events and fhir_rejected_submissions by month 2025-2027. INSERT-only role grants. create_next_month_partitions() maintenance function. |
| postgres/fhir/postgresql.conf | 4 | PostgreSQL 15 tuning for HAPI JPA workload. 2GB container. SSD-optimised. |
| postgres/audit/postgresql.conf | 4 | PostgreSQL 15 tuning for audit INSERT workload. 1GB container. |
| postgres/fhir/init.sql | 4 | Template; replace with init.sh per deployment-guide.md §1.6 before first deploy. |
| postgres/audit/init.sql | 4 | Template; replace with init.sh per deployment-guide.md §1.6 before first deploy. |

Application configuration

| File | Step | Purpose |
| --- | --- | --- |
| hapi-overlay/src/main/resources/application.yaml | 5 | Complete Spring Boot + HAPI configuration. Dual datasource, dual Flyway, HAPI R4, validation chain, actuator, structured logging. All secrets via env vars. |
| hapi-overlay/src/main/resources/logback-spring.xml | 5 | Structured JSON logging via logstash-logback-encoder. Async appenders. MDC field inclusion. |

Java source — entry point

| File | Step | Purpose |
| --- | --- | --- |
| hapi-overlay/src/main/java/bd/gov/dghs/fhir/BdFhirApplication.java | 12 | Spring Boot entry point. @EnableAsync activates audit async executor. |

Java source — configuration

| File | Step | Purpose |
| --- | --- | --- |
| hapi-overlay/src/main/java/bd/gov/dghs/fhir/config/DataSourceConfig.java | 6 | Dual datasource wiring. Primary FHIR datasource (HikariCP, pgBouncer session mode). Secondary audit datasource (INSERT-only). Dual Flyway instances. auditDbHealthIndicator using INSERT test. oclHealthIndicator. entityManagerFactory bound explicitly to FHIR datasource. |
| hapi-overlay/src/main/java/bd/gov/dghs/fhir/config/FhirServerConfig.java | 6 | Validation support chain (6 supports in dependency order). NpmPackageValidationSupport loading BD Core IG. RequestValidatingInterceptor with failOnSeverity=ERROR. unvalidatedProfileTagInterceptor for unknown resource types. Startup IG presence check. |
| hapi-overlay/src/main/java/bd/gov/dghs/fhir/config/SecurityConfig.java | 8 | Registers JWT, validation, and audit interceptors into HAPI RestfulServer in correct order. HTTPS enforcement filter. Security response headers filter. |

Java source — initialisation

| File | Step | Purpose |
| --- | --- | --- |
| hapi-overlay/src/main/java/bd/gov/dghs/fhir/init/IgPackageInitializer.java | 9 | InitializingBean that loads BD Core IG with PostgreSQL advisory lock. Prevents multi-replica NPM_PACKAGE race condition. djb2 hash for stable lock key. |
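
The "stable lock key" idea can be illustrated outside the Java code. The sketch below is not the IgPackageInitializer implementation; it is a minimal Python rendering of the classic djb2 hash, folded into the signed 64-bit range that PostgreSQL's pg_advisory_lock(bigint) accepts. The lock name passed in is a hypothetical example, since the string the real initializer hashes is not shown in this manifest.

```python
def djb2_lock_key(name: str) -> int:
    """Hash a lock name with djb2 and fold it into a signed 64-bit integer."""
    h = 5381
    for byte in name.encode("utf-8"):
        # Classic djb2 step (h = h * 33 + byte), wrapped to 64 bits like a Java long.
        h = (h * 33 + byte) & 0xFFFFFFFFFFFFFFFF
    # Reinterpret the unsigned value as signed, because the lock key is a bigint.
    return h - (1 << 64) if h >= (1 << 63) else h


# Hypothetical lock name, used only for illustration.
key = djb2_lock_key("bd.gov.dghs.core-0.2.1")
print(f"SELECT pg_advisory_lock({key});")
```

Every replica computes the same key from the same string, which is what lets them contend for a single lock. Python's built-in hash() would not work for this purpose because string hashes are salted per process.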

Java source — interceptors

| File | Step | Purpose |
| --- | --- | --- |
| hapi-overlay/src/main/java/bd/gov/dghs/fhir/interceptor/KeycloakJwtInterceptor.java | 8 | Nimbus JOSE+JWT with RemoteJWKSet (1-hour TTL, kid-based refresh). Validates: signature, expiry, issuer, mci-api role. Extracts: client_id, subject, sending_facility. Sets all REQUEST_ATTR_* constants. MDC population and guaranteed cleanup. GET /fhir/metadata and actuator health exempt. |
| hapi-overlay/src/main/java/bd/gov/dghs/fhir/interceptor/AuditEventInterceptor.java | 9 | Three-hook interceptor: (1) cluster expression pre-validation, (2) accepted resource audit at STORAGE_PRESTORAGE_*, (3) rejected resource audit at SERVER_HANDLE_EXCEPTION. Routes to AuditEventEmitter and RejectedSubmissionSink asynchronously. |

Java source — terminology

| File | Step | Purpose |
| --- | --- | --- |
| hapi-overlay/src/main/java/bd/gov/dghs/fhir/terminology/BdTerminologyValidationSupport.java | 7 | Custom IValidationSupport. Forces $validate-code for ICD-11. Suppresses $expand via isValueSetSupported()=false. 24-hour ConcurrentHashMap cache with TTL eviction. Retry with exponential backoff. Fail-open on OCL outage. flushCache() called by TerminologyCacheManager. |
| hapi-overlay/src/main/java/bd/gov/dghs/fhir/terminology/TerminologyCacheManager.java | 7 | REST controller: DELETE /admin/terminology/cache and GET /admin/terminology/cache/stats. Requires fhir-admin role (read from REQUEST_ATTR_IS_ADMIN). Called by ICD-11 version upgrade pipeline. |
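
The interaction between the 24-hour cache, the flush endpoint, and fail-open behaviour is easier to reason about with a small model. The following Python sketch is illustrative only: the class and function names are hypothetical, it is single-threaded (the Java class uses a ConcurrentHashMap), it omits the retry with exponential backoff, and it assumes that fail-open results are not cached, which this manifest does not state.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Key = Tuple[str, str]  # (code system, code)


class TtlCache:
    """Minimal TTL cache: an entry expires ttl_seconds after it was stored."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Key, Tuple[float, bool]] = {}

    def get(self, key: Key) -> Optional[bool]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[key]  # evict the expired entry on access
            return None
        return value

    def put(self, key: Key, value: bool) -> None:
        self._entries[key] = (self._clock(), value)

    def flush(self) -> int:
        """Drop everything and return how many entries were removed."""
        removed = len(self._entries)
        self._entries.clear()
        return removed


def validate_code(cache: TtlCache, system: str, code: str,
                  remote_lookup: Callable[[str, str], bool]) -> bool:
    """Return the cached verdict if fresh, otherwise ask the terminology service."""
    key = (system, code)
    cached = cache.get(key)
    if cached is not None:
        return cached
    try:
        verdict = remote_lookup(system, code)
    except OSError:
        # Fail-open: accept the code, but do not cache the answer (assumption),
        # so it is checked again once the service is reachable.
        return True
    cache.put(key, verdict)
    return verdict
```

The model shows why a flush matters after an ICD-11 upgrade: until an entry expires or the cache is flushed, validate_code never consults the terminology service for that code again.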

Java source — validator

| File | Step | Purpose |
| --- | --- | --- |
| hapi-overlay/src/main/java/bd/gov/dghs/fhir/validator/ClusterExpressionValidator.java | 7 | Detects icd11-cluster-expression extension on ICD-11 Coding elements. Rejects raw postcoordinated strings (contains &, /, % without extension) with 422. Calls https://icd11.dghs.gov.bd/cluster/validate for full expression validation. Fail-open on cluster validator outage. |
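
The raw-string rejection rule can be sketched as a pure function over a FHIR Coding represented as a dict. This is not the ClusterExpressionValidator code. The ICD-11 system URI and the match on the extension URL suffix are assumptions made for illustration, and the codes in the usage below are placeholders rather than real ICD-11 codes.

```python
from typing import Any, Dict

# Assumption: the system URI the IG uses for ICD-11 codings.
ICD11_SYSTEM = "http://id.who.int/icd/release/11/mms"
# Assumption: the extension is recognised by the last segment of its URL.
CLUSTER_EXTENSION_SUFFIX = "icd11-cluster-expression"
# Characters that mark a postcoordinated expression, as listed in the manifest.
POSTCOORD_CHARS = frozenset("&/%")


def has_cluster_extension(coding: Dict[str, Any]) -> bool:
    """True when the Coding carries the cluster expression extension."""
    return any(
        str(ext.get("url", "")).endswith(CLUSTER_EXTENSION_SUFFIX)
        for ext in coding.get("extension", [])
    )


def is_raw_postcoordinated(coding: Dict[str, Any]) -> bool:
    """True when an ICD-11 Coding puts a postcoordinated string in `code`
    without declaring it through the extension (the 422 case)."""
    if coding.get("system") != ICD11_SYSTEM:
        return False
    code = str(coding.get("code", ""))
    return any(ch in POSTCOORD_CHARS for ch in code) and not has_cluster_extension(coding)


# Placeholder codes, for illustration only.
print(is_raw_postcoordinated({"system": ICD11_SYSTEM, "code": "AAAA&BBBB"}))
print(is_raw_postcoordinated({"system": ICD11_SYSTEM, "code": "AAAA"}))
```

A Coding that passes this check and does carry the extension would still need the remote cluster validation call, which the sketch leaves out.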

Java source — audit

| File | Step | Purpose |
| --- | --- | --- |
| hapi-overlay/src/main/java/bd/gov/dghs/fhir/audit/AuditEventEmitter.java | 9 | @Async INSERT to audit.audit_events. Immutable (INSERT only; audit_writer role enforces at DB level). Serialises validationMessages as JSONB. Truncates fields to column lengths. Logs ERROR on write failure (audit gap is a high-priority incident). |
| hapi-overlay/src/main/java/bd/gov/dghs/fhir/audit/RejectedSubmissionSink.java | 9 | @Async INSERT to audit.fhir_rejected_submissions. Stores full resource payload as TEXT (preserves exact bytes). 4MB payload cap (anti-DoS). Machine-readable rejection_code for programmatic analysis. |
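
Both audit writers bound what they store: fields are truncated to column lengths and payloads are capped at 4MB. The sketch below shows one way such bounds can be applied. The helper names are hypothetical, and it assumes an oversized payload is truncated rather than dropped; the manifest does not say which the sink does.

```python
from typing import Optional, Tuple

MAX_PAYLOAD_BYTES = 4 * 1024 * 1024  # the 4MB cap named in the manifest


def truncate_field(value: Optional[str], max_len: int) -> Optional[str]:
    """Trim a string so it fits a VARCHAR(max_len) column; None passes through."""
    return None if value is None else value[:max_len]


def cap_payload(payload: str, limit_bytes: int = MAX_PAYLOAD_BYTES) -> Tuple[str, bool]:
    """Return (payload, was_truncated), measuring the limit in UTF-8 bytes."""
    raw = payload.encode("utf-8")
    if len(raw) <= limit_bytes:
        return payload, False
    # Cut on a byte boundary, then drop a partial multi-byte character if one remains.
    return raw[:limit_bytes].decode("utf-8", errors="ignore"), True
```

Measuring in bytes rather than characters keeps the cap meaningful for payloads containing Bengali or other multi-byte text.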

Infrastructure

| File | Step | Purpose |
| --- | --- | --- |
| nginx/nginx.conf | 10 | Reverse proxy. TLS 1.2/1.3 only. Rate limiting: FHIR 10r/s, admin 6r/m, metadata 5r/s. /admin/ restricted to 172.20.0.0/16. /actuator/ restricted to internal network. /fhir/metadata unauthenticated. All other paths → HAPI. |
| hapi-overlay/src/main/resources/packages/.gitkeep | 12 | Marks the IG package directory for git. CI pipeline places bd.gov.dghs.core-{version}.tgz here before docker build. |

Operations

| File | Step | Purpose |
| --- | --- | --- |
| ops/keycloak-setup.md | 10 | fhir-admin role creation. fhir-admin-pipeline client setup. Vendor client template. sending_facility mapper configuration. Token verification tests. |
| ops/version-upgrade-integration.md | 10 | ICD-11 upgrade pipeline integration. Pre-flush OCL verification. get_fhir_admin_token(), flush_hapi_terminology_cache(), verify_hapi_validates_new_version() Python functions. post_ocl_import_hapi_integration() call site. Rollback procedure. |
| ops/scaling-roadmap.md | 10 | Phase 1→2→3 thresholds and changes. Monthly partition maintenance cron. PostgreSQL monitoring queries. IG upgrade procedure. Key Prometheus metrics and alert thresholds. |
| ops/deployment-guide.md | 11 | Step-by-step Ubuntu 22.04 deployment. Docker install, daemon config, registry auth. PostgreSQL init script fix (critical). First-deploy sequence. Nine acceptance tests. Rolling upgrade procedure. Operational runbook. |
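
The pipeline functions themselves live in ops/version-upgrade-integration.md and are not reproduced here. As a rough orientation, the sketch below only builds the two HTTP requests such a pipeline would need: a client-credentials token request and the cache flush call. The function names and the hapi_base_url parameter are hypothetical. The token URL is derived from the realm issuer using Keycloak's standard openid-connect path, and because /admin/ is restricted to the internal network, the base URL must be an address reachable from inside it.

```python
import urllib.parse
import urllib.request

# Issuer from this manifest plus Keycloak's standard token endpoint path.
TOKEN_URL = "https://auth.dghs.gov.bd/realms/hris/protocol/openid-connect/token"


def build_token_request(client_secret: str,
                        client_id: str = "fhir-admin-pipeline") -> urllib.request.Request:
    """Build (but do not send) an OAuth2 client-credentials token request."""
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def build_cache_flush_request(hapi_base_url: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) the DELETE /admin/terminology/cache request."""
    return urllib.request.Request(
        hapi_base_url.rstrip("/") + "/admin/terminology/cache",
        method="DELETE",
        headers={"Authorization": f"Bearer {access_token}"},
    )
```

Passing either request to urllib.request.urlopen(request, timeout=10) would send it. The token response is JSON whose access_token field supplies the bearer token for the flush call.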

Pre-flight checklist

Work through this list top to bottom before running docker compose up. Each item guards against a failure mode documented in real HAPI deployments. Do not skip items marked CRITICAL.


CI machine (before docker build)

  • [CRITICAL] bd.gov.dghs.core-0.2.1.tgz placed in hapi-overlay/src/main/resources/packages/
    Symptom if missing: startup fails with STARTUP FAILURE: BD Core IG package not found. Container will not start.

  • HAPI_IG_PACKAGE_CLASSPATH in docker-compose.yml matches the .tgz filename exactly
    Symptom if mismatch: same STARTUP FAILURE as above.

  • Docker image built with correct --build-arg values and pushed to private registry
    Verify: docker manifest inspect your-registry.dghs.gov.bd/bd-fhir-hapi:1.0.0

  • Image tag in .env.example (and your .env) matches the pushed image tag
    Symptom if mismatch: docker compose pull pulls wrong image or fails.


Production server (before docker compose up)

  • [CRITICAL] postgres/fhir/init.sql replaced with init.sh (deployment-guide.md §1.6)
    Symptom if skipped: the hapi_app user is never created. Flyway migrations succeed, but HAPI fails at runtime with an authentication error against postgres-fhir.

  • [CRITICAL] postgres/audit/init.sql replaced with init.sh (deployment-guide.md §1.6)
    Symptom if skipped: audit_writer_login is never created. HAPI starts, but every audit write fails with FATAL: password authentication failed for user "audit_writer_login".

  • docker-compose.yml postgres-audit service updated to mount init.sh (not init.sql) and to pass the AUDIT_DB_WRITER_USER/PASSWORD/MAINTAINER_* env vars
    Follows from the init.sh fix above.

  • .env file created, all <CHANGE_ME> values replaced, chmod 600 .env
    Verify: grep CHANGE_ME .env returns no output.

  • TLS_CERT_PATH and TLS_KEY_PATH in .env point to files that exist on the server
    Verify: ls -la "$(grep '^TLS_CERT_PATH=' .env | cut -d= -f2-)" "$(grep '^TLS_KEY_PATH=' .env | cut -d= -f2-)"

  • Server can reach all external services from within the Docker network:

    # Test from inside a temporary container on the Docker network
    docker run --rm --network bd-fhir-national_backend-fhir alpine sh -c \
      "apk add -q curl && curl -s -o /dev/null -w '%{http_code}' \
      https://auth.dghs.gov.bd/realms/hris/.well-known/openid-configuration"
    # Expected: 200
    

    Symptom if unreachable: KeycloakJwtInterceptor fails to fetch JWKS on startup. All authenticated requests return 401 even with valid tokens.

  • random_page_cost in both postgresql.conf files matches your storage type
    1.1 for SSD (default in this project), 4.0 for spinning HDD
    Symptom if wrong: with 4.0 on SSD, the query planner overestimates the cost of index access and chooses sequential scans instead. FHIR search performance degrades at >100k resources.

  • Docker and Docker Compose v2 installed (docker compose version, not docker-compose)
    Symptom if wrong: docker-compose (v1) does not support deploy.replicas or condition: service_healthy.

  • Private registry credentials stored in ~/.docker/config.json
    Verify: docker login your-registry.dghs.gov.bd
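
Several of the checks above concern the .env file and can be run in one pass. The following is a minimal sketch, assuming a simple KEY=value file format and a POSIX host; the function names are hypothetical and it does not replace the manual verification steps.

```python
import stat
from pathlib import Path
from typing import Dict, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, skipping blanks and comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def check_env_file(path: Path) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems: List[str] = []
    mode = stat.S_IMODE(path.stat().st_mode)
    if mode & 0o077:  # any group or other permission bit is set
        problems.append(f"{path} has mode {mode:o}, expected 600")
    env = parse_env(path.read_text())
    for key, value in env.items():
        if "CHANGE_ME" in value:
            problems.append(f"{key} still contains a placeholder")
    for key in ("TLS_CERT_PATH", "TLS_KEY_PATH"):
        target = env.get(key)
        if not target or not Path(target).is_file():
            problems.append(f"{key} does not point to an existing file")
    return problems
```

Calling check_env_file(Path(".env")) from the deployment directory returns the outstanding problems as plain strings, which makes it straightforward to fail a deploy script when the list is not empty.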


Keycloak (before first vendor submission)

  • [CRITICAL] fhir-admin realm role created in hris realm (keycloak-setup.md Part 1)
    Symptom if missing: fhir-admin-pipeline service account has no role to assign. Cache flush endpoint returns 403 for all callers.

  • [CRITICAL] fhir-admin-pipeline client created with fhir-admin role assigned (keycloak-setup.md Part 2)
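    Symptom if missing: the version upgrade pipeline cannot flush the cache. After an ICD-11 upgrade, cached results from the previous version keep being served, so codes may be wrongly accepted or rejected for up to 24 hours (the cache TTL).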

  • At least one vendor client created (fhir-vendor-TEST-FAC-001 for acceptance testing) with mci-api role and sending_facility attribute mapper (keycloak-setup.md Parts 3-4)
    Symptom if missing: acceptance Test 1 returns 401. All vendor submissions rejected.

  • Token from test vendor client decoded and verified to contain:

    • iss: https://auth.dghs.gov.bd/realms/hris
    • azp: fhir-vendor-TEST-FAC-001
    • realm_access.roles: contains mci-api
    • sending_facility: non-empty facility code
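      Verify with: echo "$TOKEN" | cut -d. -f2 | tr '_-' '/+' | base64 -d 2>/dev/null | jq .
      (JWT segments are base64url-encoded, so - and _ must be translated before base64 -d can decode them.)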
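
The claim checks in this list can also be scripted. The sketch below decodes the payload segment with correct base64url padding and compares it against the four expectations listed above. It does not verify the signature, so it is suitable only for inspecting a token you have just obtained yourself; the function names are hypothetical.

```python
import base64
import json
from typing import Any, Dict, List

EXPECTED_ISSUER = "https://auth.dghs.gov.bd/realms/hris"


def decode_jwt_payload(token: str) -> Dict[str, Any]:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    segment = token.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    segment += "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(segment))


def check_vendor_claims(claims: Dict[str, Any], expected_client_id: str) -> List[str]:
    """Return a list of problems; an empty list means all four checks passed."""
    problems: List[str] = []
    if claims.get("iss") != EXPECTED_ISSUER:
        problems.append("iss does not match the hris realm issuer")
    if claims.get("azp") != expected_client_id:
        problems.append("azp does not match the vendor client id")
    if "mci-api" not in claims.get("realm_access", {}).get("roles", []):
        problems.append("realm_access.roles does not contain mci-api")
    if not claims.get("sending_facility"):
        problems.append("sending_facility is empty or missing")
    return problems
```

For the acceptance-test client, check_vendor_claims(decode_jwt_payload(token), "fhir-vendor-TEST-FAC-001") should return an empty list.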

Post-startup verification

  • All health indicators report UP:

    curl -s http://localhost:8080/actuator/health | jq '.components | map_values(.status)'
    # Expected keys: auditDb, db, livenessState, ocl, readinessState
    # Every value must be "UP"
    
  • FHIR metadata accessible unauthenticated and shows correct IG version:

    curl -s https://fhir.dghs.gov.bd/fhir/metadata | jq '.software.version'
    # Expected: "0.2.1"
    
  • Flyway migration history shows V1 and V2 applied cleanly:

    docker exec bd-postgres-fhir psql -U postgres -d fhirdb \
      -c "SELECT version, description, success FROM flyway_schema_history;"
    # Expected (migration V1__hapi_schema.sql): 1 | hapi schema | t
    
    docker exec bd-postgres-audit psql -U postgres -d auditdb \
      -c "SELECT version, description, success FROM flyway_audit_schema_history;"
    # Expected (migration V2__audit_schema.sql): 2 | audit schema | t
    
  • Audit tables accepting inserts (INSERT-only role working):

    docker exec bd-postgres-audit psql -U audit_writer_login -d auditdb -c \
      "INSERT INTO audit.health_check (check_id) VALUES (gen_random_uuid()) 
       ON CONFLICT DO NOTHING; SELECT 'audit insert ok';"
    # Expected: audit insert ok
    
  • Run all nine acceptance tests from deployment-guide.md Part 3
    Tests 1-9 must all produce the expected HTTP status codes before the server is declared production-ready.
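
The health check is a natural candidate for automation in a deploy script. The sketch below takes the parsed JSON body of /actuator/health and lists every expected component that is missing or not UP. It assumes the endpoint is configured to expose component details, as the expected output above implies; the function name is hypothetical.

```python
from typing import Any, Dict, List

# Component names taken from the expected output in this checklist.
EXPECTED_COMPONENTS = ("auditDb", "db", "livenessState", "ocl", "readinessState")


def unhealthy_components(health: Dict[str, Any]) -> List[str]:
    """Return the names of components that are missing or not UP, sorted."""
    components = health.get("components", {})
    missing = [name for name in EXPECTED_COMPONENTS if name not in components]
    down = [name for name, body in components.items() if body.get("status") != "UP"]
    return sorted(missing + down)


# Example body with one failing indicator.
sample = {
    "status": "DOWN",
    "components": {
        "auditDb": {"status": "UP"},
        "db": {"status": "UP"},
        "livenessState": {"status": "UP"},
        "ocl": {"status": "DOWN"},
        "readinessState": {"status": "UP"},
    },
}
print(unhealthy_components(sample))
```

In a script, the body would come from json.load() on the response of the health URL, and a non-empty result would abort the rollout.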


Operational readiness (before announcing to vendors)

  • Partition maintenance cron configured on audit database host (scaling-roadmap.md)
    Run: docker exec bd-postgres-audit psql -U postgres -d auditdb -c "SELECT audit.create_next_month_partitions();" and verify that it creates next month's partitions without error.

  • Log shipping to ELK configured (for example, a Filebeat agent installed and shipping /app/logs/)
    Minimum: verify that the output of docker compose logs hapi is in JSON format.

  • FHIR_ADMIN_CLIENT_SECRET stored in version upgrade pipeline's secrets vault
    Required by ops/version-upgrade-integration.md before next ICD-11 release.

  • Next ICD-11 version upgrade date noted — cache flush must be coordinated with OCL import completion
    See ops/version-upgrade-integration.md for the 7-step procedure.

  • Vendor onboarding runbook prepared citing ops/keycloak-setup.md Parts 3-4
    Each new vendor requires: Keycloak client, mci-api role, sending_facility mapper, credentials delivery.


Architecture decision record — key decisions frozen in this implementation

The following decisions were finalised through the pre-implementation challenge process and are reflected throughout the codebase. They are not configurable at runtime without code changes.

| Decision | Rationale | Where enforced |
| --- | --- | --- |
| PostgreSQL only, no H2 | National infrastructure requires production-grade persistence | DataSourceConfig.java, Flyway migrations, docker-compose.yml |
| Validation on ALL requests | No vendor exemptions: uniform HIE boundary | RequestValidatingInterceptor with failOnSeverity=ERROR |
| OCL is single terminology authority | No local ICD-11 copy; live validation | BdTerminologyValidationSupport, chain position 6 |
| $expand failures never cause rejection | Known OCL limitation | isValueSetSupported()=false, expandValueSet() returns null |
| Only $validate-code failures cause 422 | Distinguish expansion from validation | BdTerminologyValidationSupport.validateCode() |
| Keycloak hris realm, mci-api role, no basic auth | Single authentication authority | KeycloakJwtInterceptor, SecurityConfig |
| Audit log append-only, separate instance | Immutability, forensic separation | postgres-audit separate container, audit_writer INSERT-only role |
| Rejected payloads stored forensically | Vendor debugging, dispute resolution | RejectedSubmissionSink, audit.fhir_rejected_submissions |
| IG bundled in Docker image | Reproducible builds, no runtime URL dependency | Dockerfile COPY, IgPackageInitializer |
| Cluster expressions via extension, not raw code | BD Core IG decided pattern | ClusterExpressionValidator, POSTCOORD_CHARS rejection |
| Fail-open for OCL/cluster validator outages | Service continuity over perfect validation | BdTerminologyValidationSupport catch blocks, ClusterExpressionValidator catch blocks |
| meta.tag = unvalidated-profile for unknown types | FHIR-native, queryable, no schema changes | unvalidatedProfileTagInterceptor in FhirServerConfig |
| pgBouncer session mode | Hibernate prepared statement compatibility | docker-compose.yml PGBOUNCER_POOL_MODE: session |
| Flyway bypasses pgBouncer for migrations | DDL transaction safety | SPRING_FLYWAY_URL points to postgres-fhir:5432 directly |
| Advisory lock for IG initialisation | Multi-replica startup race prevention | IgPackageInitializer djb2 lock key |
| Two MDC cleanup hooks | Thread pool MDC leak prevention | KeycloakJwtInterceptor COMPLETED_NORMALLY + COMPLETED |