bd-fhir-national/ops/project-manifest.md
2026-03-16 00:02:58 +06:00

# BD FHIR National — Project Manifest & Pre-Flight Checklist
**Project:** BD Core FHIR National Repository and Validation Engine
**IG Version:** BD Core FHIR IG v0.2.1
**FHIR Version:** R4 (4.0.1)
**HAPI Version:** 7.2.0
**Published by:** DGHS/MoHFW Bangladesh
**Generated:** 2025
---
## Complete file manifest
### Build and orchestration
| File | Step | Purpose |
|------|------|---------|
| `pom.xml` | 1 | Parent Maven POM. HAPI 7.2.0 BOM, Spring Boot 3.2.5, all version pins. |
| `hapi-overlay/pom.xml` | 2 | Child module POM. All runtime dependencies. Fat JAR output: `bd-fhir-hapi.jar`. |
| `hapi-overlay/Dockerfile` | 4 | Multi-stage build: Maven builder + eclipse-temurin:17-jre runtime. tini as PID 1. |
| `docker-compose.yml` | 4 | Production orchestration: HAPI, 2× PostgreSQL, 2× pgBouncer, nginx. Scaling roadmap in comments. |
| `.env.example` | 4 | Environment variable template. Copy to `.env`, fill secrets, `chmod 600`. |
### Database
| File | Step | Purpose |
|------|------|---------|
| `hapi-overlay/src/main/resources/db/migration/fhir/V1__hapi_schema.sql` | 3 | HAPI 7.2.0 JPA schema. All tables, sequences, indexes. Flyway-managed. Partition comments at 10M+ rows. |
| `hapi-overlay/src/main/resources/db/migration/audit/V2__audit_schema.sql` | 3 | Audit schema. Partitioned `audit_events` and `fhir_rejected_submissions` by month 2025-2027. INSERT-only role grants. `create_next_month_partitions()` maintenance function. |
| `postgres/fhir/postgresql.conf` | 4 | PostgreSQL 15 tuning for HAPI JPA workload. 2GB container. SSD-optimised. |
| `postgres/audit/postgresql.conf` | 4 | PostgreSQL 15 tuning for audit INSERT workload. 1GB container. |
| `postgres/fhir/init.sql` | 4 | Template — **replace with `init.sh`** per deployment-guide.md §1.6 before first deploy. |
| `postgres/audit/init.sql` | 4 | Template — **replace with `init.sh`** per deployment-guide.md §1.6 before first deploy. |
### Application configuration
| File | Step | Purpose |
|------|------|---------|
| `hapi-overlay/src/main/resources/application.yaml` | 5 | Complete Spring Boot + HAPI configuration. Dual datasource, dual Flyway, HAPI R4, validation chain, actuator, structured logging. All secrets via env vars. |
| `hapi-overlay/src/main/resources/logback-spring.xml` | 5 | Structured JSON logging via logstash-logback-encoder. Async appenders. MDC field inclusion. |
### Java source — entry point
| File | Step | Purpose |
|------|------|---------|
| `hapi-overlay/src/main/java/bd/gov/dghs/fhir/BdFhirApplication.java` | 12 | Spring Boot entry point. `@EnableAsync` activates audit async executor. |
### Java source — configuration
| File | Step | Purpose |
|------|------|---------|
| `hapi-overlay/src/main/java/bd/gov/dghs/fhir/config/DataSourceConfig.java` | 6 | Dual datasource wiring. Primary FHIR datasource (HikariCP, pgBouncer session mode). Secondary audit datasource (INSERT-only). Dual Flyway instances. `auditDbHealthIndicator` using INSERT test. `oclHealthIndicator`. `entityManagerFactory` bound explicitly to FHIR datasource. |
| `hapi-overlay/src/main/java/bd/gov/dghs/fhir/config/FhirServerConfig.java` | 6 | Validation support chain (6 supports in dependency order). `NpmPackageValidationSupport` loading BD Core IG. `RequestValidatingInterceptor` with failOnSeverity=ERROR. `unvalidatedProfileTagInterceptor` for unknown resource types. Startup IG presence check. |
| `hapi-overlay/src/main/java/bd/gov/dghs/fhir/config/SecurityConfig.java` | 8 | Registers JWT, validation, and audit interceptors into HAPI RestfulServer in correct order. HTTPS enforcement filter. Security response headers filter. |
### Java source — initialisation
| File | Step | Purpose |
|------|------|---------|
| `hapi-overlay/src/main/java/bd/gov/dghs/fhir/init/IgPackageInitializer.java` | 9 | `InitializingBean` that loads BD Core IG with PostgreSQL advisory lock. Prevents multi-replica NPM_PACKAGE race condition. djb2 hash for stable lock key. |
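
The djb2 lock key mentioned for `IgPackageInitializer` can be illustrated from an operator's shell. The following is a minimal sketch of the classic djb2 string hash (seed 5381, multiply by 33, add each character code), assuming ASCII input and 64-bit wrapping arithmetic. The function name `djb2_lock_key` and the lock name in the usage comment are hypothetical, and the Java implementation may differ in input string or integer width, so the values are not guaranteed to match the running server.

```bash
#!/usr/bin/env bash
# Sketch only: classic djb2 hash over an ASCII string.
# djb2_lock_key is a hypothetical helper, not part of the project.
djb2_lock_key() {
  local s=$1 hash=5381 i ord
  for (( i = 0; i < ${#s}; i++ )); do
    # printf with a leading quote yields the numeric code of the character.
    printf -v ord '%d' "'${s:i:1}"
    # hash * 33 + character code; bash arithmetic wraps at 64 bits.
    hash=$(( hash * 33 + ord ))
  done
  echo "$hash"
}

# Example (lock name is illustrative only):
#   psql -c "SELECT pg_try_advisory_lock($(djb2_lock_key 'bd-core-ig-init'));"
```

A deterministic hash gives every replica the same `bigint` key for the same lock name, which is what lets a PostgreSQL advisory lock serialise the IG load across replicas.
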
### Java source — interceptors
| File | Step | Purpose |
|------|------|---------|
| `hapi-overlay/src/main/java/bd/gov/dghs/fhir/interceptor/KeycloakJwtInterceptor.java` | 8 | Nimbus JOSE+JWT with `RemoteJWKSet` (1-hour TTL, kid-based refresh). Validates: signature, expiry, issuer, `mci-api` role. Extracts: `client_id`, `subject`, `sending_facility`. Sets all `REQUEST_ATTR_*` constants. MDC population and guaranteed cleanup. `GET /fhir/metadata` and actuator health exempt. |
| `hapi-overlay/src/main/java/bd/gov/dghs/fhir/interceptor/AuditEventInterceptor.java` | 9 | Three-hook interceptor: (1) cluster expression pre-validation, (2) accepted resource audit at `STORAGE_PRESTORAGE_*`, (3) rejected resource audit at `SERVER_HANDLE_EXCEPTION`. Routes to `AuditEventEmitter` and `RejectedSubmissionSink` asynchronously. |
### Java source — terminology
| File | Step | Purpose |
|------|------|---------|
| `hapi-overlay/src/main/java/bd/gov/dghs/fhir/terminology/BdTerminologyValidationSupport.java` | 7 | Custom `IValidationSupport`. Forces `$validate-code` for ICD-11. Suppresses `$expand` via `isValueSetSupported()=false`. 24-hour `ConcurrentHashMap` cache with TTL eviction. Retry with exponential backoff. Fail-open on OCL outage. `flushCache()` called by `TerminologyCacheManager`. |
| `hapi-overlay/src/main/java/bd/gov/dghs/fhir/terminology/TerminologyCacheManager.java` | 7 | REST controller: `DELETE /admin/terminology/cache` and `GET /admin/terminology/cache/stats`. Requires `fhir-admin` role (read from `REQUEST_ATTR_IS_ADMIN`). Called by ICD-11 version upgrade pipeline. |
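
The retry and fail-open behaviour described for `BdTerminologyValidationSupport` can be sketched in shell. This is an illustration of the control flow, not the Java code. The function name `validate_fail_open`, the attempt count, the delay, and the exit-status convention of the wrapped command are all assumptions made for this example.

```bash
#!/usr/bin/env bash
# Sketch only: retry with exponential backoff, then fail open.
# Assumed exit-status convention of the wrapped checker command:
#   0 = code valid, 1 = code invalid, anything else = service unreachable.
validate_fail_open() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt status
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    status=0
    "$@" || status=$?
    case $status in
      0) echo "valid";   return 0 ;;   # definitive answer, no retry
      1) echo "invalid"; return 1 ;;   # definitive answer, no retry
    esac
    # Transient failure: wait, double the delay, and try again.
    if (( attempt < max_attempts )); then
      sleep "$delay"
      delay=$(( delay * 2 ))
    fi
  done
  # All attempts failed for transport reasons: accept rather than block.
  echo "unverified"
  return 0
}
```

Only transport failures are retried. A definitive "invalid" answer is returned immediately, so an outage can never turn into a rejection and a real rejection is never masked by the fail-open path. To use the sketch, wrap the real terminology call in a function that maps its response onto the exit-status convention above.
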
### Java source — validator
| File | Step | Purpose |
|------|------|---------|
| `hapi-overlay/src/main/java/bd/gov/dghs/fhir/validator/ClusterExpressionValidator.java` | 7 | Detects `icd11-cluster-expression` extension on ICD-11 `Coding` elements. Rejects raw postcoordinated strings (contains `&`, `/`, `%` without extension) with 422. Calls `https://icd11.dghs.gov.bd/cluster/validate` for full expression validation. Fail-open on cluster validator outage. |
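
The routing rule described for `ClusterExpressionValidator` can be sketched as a three-way decision. The function name `classify_icd11_coding`, its `yes`/`no` flag, and the returned labels are hypothetical. The sample strings in the comments are placeholders rather than verified ICD-11 expressions, apart from `1A00`.

```bash
#!/usr/bin/env bash
# Sketch only: decide how an ICD-11 Coding would be handled.
#   $1 = Coding.code value
#   $2 = "yes" if the icd11-cluster-expression extension is present, else "no"
classify_icd11_coding() {
  local code=$1 has_extension=$2
  if [ "$has_extension" = "yes" ]; then
    # Extension present: send the expression to the cluster validator.
    echo "cluster-validate"
    return 0
  fi
  case $code in
    *[\&/%]*)
      # Raw postcoordinated string without the extension: reject with 422.
      echo "reject-422" ;;
    *)
      # Plain stem code: ordinary $validate-code path.
      echo "validate-code" ;;
  esac
}

# classify_icd11_coding '1A00' no       -> validate-code
# classify_icd11_coding 'STEM&EXT' no   -> reject-422
# classify_icd11_coding 'STEM&EXT' yes  -> cluster-validate
```
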
### Java source — audit
| File | Step | Purpose |
|------|------|---------|
| `hapi-overlay/src/main/java/bd/gov/dghs/fhir/audit/AuditEventEmitter.java` | 9 | `@Async` INSERT to `audit.audit_events`. Immutable (INSERT only — `audit_writer` role enforces at DB level). Serialises `validationMessages` as JSONB. Truncates fields to column lengths. Logs ERROR on write failure (audit gap is a high-priority incident). |
| `hapi-overlay/src/main/java/bd/gov/dghs/fhir/audit/RejectedSubmissionSink.java` | 9 | `@Async` INSERT to `audit.fhir_rejected_submissions`. Stores full resource payload as TEXT (preserves exact bytes). 4MB payload cap (anti-DoS). Machine-readable `rejection_code` for programmatic analysis. |
### Infrastructure
| File | Step | Purpose |
|------|------|---------|
| `nginx/nginx.conf` | 10 | Reverse proxy. TLS 1.2/1.3 only. Rate limiting: FHIR 10r/s, admin 6r/m, metadata 5r/s. `/admin/` restricted to `172.20.0.0/16`. `/actuator/` restricted to internal network. `/fhir/metadata` unauthenticated. All other paths → HAPI. |
| `hapi-overlay/src/main/resources/packages/.gitkeep` | 12 | Marks the IG package directory for git. CI pipeline places `bd.gov.dghs.core-{version}.tgz` here before `docker build`. |
### Operations
| File | Step | Purpose |
|------|------|---------|
| `ops/keycloak-setup.md` | 10 | `fhir-admin` role creation. `fhir-admin-pipeline` client setup. Vendor client template. `sending_facility` mapper configuration. Token verification tests. |
| `ops/version-upgrade-integration.md` | 10 | ICD-11 upgrade pipeline integration. Pre-flush OCL verification. `get_fhir_admin_token()`, `flush_hapi_terminology_cache()`, `verify_hapi_validates_new_version()` Python functions. `post_ocl_import_hapi_integration()` call site. Rollback procedure. |
| `ops/scaling-roadmap.md` | 10 | Phase 1→2→3 thresholds and changes. Monthly partition maintenance cron. PostgreSQL monitoring queries. IG upgrade procedure. Key Prometheus metrics and alert thresholds. |
| `ops/deployment-guide.md` | 11 | Step-by-step Ubuntu 22.04 deployment. Docker install, daemon config, registry auth. PostgreSQL init script fix (critical). First-deploy sequence. Nine acceptance tests. Rolling upgrade procedure. Operational runbook. |
---
## Pre-flight checklist
Work through this list top to bottom before running `docker compose up`.
Each item guards against a documented failure mode from real HAPI deployments.
**Do not skip items marked CRITICAL.**
---
### CI machine (before docker build)
- [ ] **[CRITICAL]** `bd.gov.dghs.core-0.2.1.tgz` placed in `hapi-overlay/src/main/resources/packages/`
*Symptom if missing: startup fails with `STARTUP FAILURE: BD Core IG package not found`. Container will not start.*
- [ ] `HAPI_IG_PACKAGE_CLASSPATH` in `docker-compose.yml` matches the `.tgz` filename exactly
*Symptom if mismatch: same STARTUP FAILURE as above.*
- [ ] Docker image built with correct `--build-arg` values and pushed to private registry
*Verify: `docker manifest inspect your-registry.dghs.gov.bd/bd-fhir-hapi:1.0.0`*
- [ ] Image tag in `.env.example` (and your `.env`) matches the pushed image tag
*Symptom if mismatch: `docker compose pull` pulls wrong image or fails.*
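
The first CI item can be automated as a guard that runs before `docker build`. This is a minimal sketch: `check_ig_package` is a hypothetical helper name, and it only checks that a non-empty `.tgz` with the expected file name exists. It does not verify the package contents or the `HAPI_IG_PACKAGE_CLASSPATH` value.

```bash
#!/usr/bin/env bash
# Sketch only: fail fast when the BD Core IG package is missing.
#   $1 = packages directory, $2 = IG version (for example 0.2.1)
check_ig_package() {
  local packages_dir=$1 ig_version=$2
  local tgz="$packages_dir/bd.gov.dghs.core-$ig_version.tgz"
  # -s is true only for an existing file with size greater than zero.
  if [ ! -s "$tgz" ]; then
    echo "MISSING: $tgz" >&2
    return 1
  fi
  echo "OK: $tgz"
}

# Example, from the repository root, before building the image:
#   check_ig_package hapi-overlay/src/main/resources/packages 0.2.1 || exit 1
```
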
---
### Production server (before docker compose up)
- [ ] **[CRITICAL]** `postgres/fhir/init.sql` replaced with `init.sh` (deployment-guide.md §1.6)
*Symptom if skipped: `hapi_app` user is never created. Flyway migrations succeed but HAPI runtime fails with authentication error to postgres-fhir.*
- [ ] **[CRITICAL]** `postgres/audit/init.sql` replaced with `init.sh` (deployment-guide.md §1.6)
*Symptom if skipped: `audit_writer_login` never created. HAPI starts but all audit writes fail with `FATAL: password authentication failed for user "audit_writer_login"`.*
- [ ] `docker-compose.yml` `postgres-audit` service updated to mount `init.sh` (not `init.sql`) and to pass the `AUDIT_DB_WRITER_USER/PASSWORD/MAINTAINER_*` env vars
*Follows from the init.sh fix above.*
- [ ] `.env` file created, all `<CHANGE_ME>` values replaced, `chmod 600 .env`
*Verify: `grep CHANGE_ME .env` returns no output.*
- [ ] `TLS_CERT_PATH` and `TLS_KEY_PATH` in `.env` point to files that exist on the server
*Verify: `ls -la "$(grep '^TLS_CERT_PATH=' .env | cut -d= -f2-)"`, then repeat with `TLS_KEY_PATH`.*
- [ ] Server can reach all external services from within the Docker network:
```bash
# Test from inside a temporary container on the Docker network
docker run --rm --network bd-fhir-national_backend-fhir alpine sh -c \
"apk add -q curl && curl -s -o /dev/null -w '%{http_code}' \
https://auth.dghs.gov.bd/realms/hris/.well-known/openid-configuration"
# Expected: 200
```
*Symptom if unreachable: KeycloakJwtInterceptor fails to fetch JWKS on startup. All authenticated requests return 401 even with valid tokens.*
- [ ] `random_page_cost` in both `postgresql.conf` files matches your storage type
`1.1` for SSD (default in this project), `4.0` for spinning HDD
*Symptom if wrong: query planner chooses sequential scans over indexes. FHIR search performance degrades at >100k resources.*
- [ ] Docker and Docker Compose v2 installed (`docker compose version`, not `docker-compose`)
*Symptom if wrong: `docker-compose` (v1) does not support `deploy.replicas` or `condition: service_healthy`.*
- [ ] Private registry credentials stored in `~/.docker/config.json`
*Verify: `docker login your-registry.dghs.gov.bd`*
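
The `.env` checks in this section can be combined into one script. The sketch below assumes GNU `stat` (as on Ubuntu 22.04) and unquoted `KEY=value` lines in `.env`. The function name `check_env_file` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: verify placeholders, file mode, and TLS paths in a .env file.
# Prints one line per problem and returns non-zero if any were found.
check_env_file() {
  local env_file=$1 problems=0 mode key path
  if grep -q 'CHANGE_ME' "$env_file"; then
    echo "placeholder values remain in $env_file"
    problems=1
  fi
  mode=$(stat -c '%a' "$env_file")   # GNU coreutils syntax
  if [ "$mode" != "600" ]; then
    echo "mode is $mode, expected 600"
    problems=1
  fi
  for key in TLS_CERT_PATH TLS_KEY_PATH; do
    # Take everything after the first '=' so paths containing '=' survive.
    path=$(grep -m1 "^$key=" "$env_file" | cut -d= -f2-) || path=""
    if [ -z "$path" ] || [ ! -f "$path" ]; then
      echo "$key does not point to an existing file"
      problems=1
    fi
  done
  return "$problems"
}

# Example: check_env_file .env && docker compose up -d
```
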
---
### Keycloak (before first vendor submission)
- [ ] **[CRITICAL]** `fhir-admin` realm role created in `hris` realm (keycloak-setup.md Part 1)
*Symptom if missing: `fhir-admin-pipeline` service account has no role to assign. Cache flush endpoint returns 403 for all callers.*
- [ ] **[CRITICAL]** `fhir-admin-pipeline` client created with `fhir-admin` role assigned (keycloak-setup.md Part 2)
*Symptom if missing: version upgrade pipeline cannot flush the cache. After an ICD-11 upgrade, stale cached results cause codes to be wrongly accepted or rejected for up to 24 hours.*
- [ ] At least one vendor client created (`fhir-vendor-TEST-FAC-001` for acceptance testing) with `mci-api` role and `sending_facility` attribute mapper (keycloak-setup.md Parts 3-4)
*Symptom if missing: acceptance Test 1 returns 401. All vendor submissions rejected.*
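- [ ] Token from test vendor client decoded and verified to contain:
  - `iss`: `https://auth.dghs.gov.bd/realms/hris`
  - `azp`: `fhir-vendor-TEST-FAC-001`
  - `realm_access.roles`: contains `mci-api`
  - `sending_facility`: non-empty facility code

  *Verify with: `echo "$TOKEN" | cut -d. -f2 | tr '_-' '/+' | base64 -d 2>/dev/null | jq .` (JWT segments are base64url-encoded, so `-` and `_` must be mapped back to `+` and `/` before decoding; unpadded input may additionally trigger an `invalid input` error, which `2>/dev/null` hides.)*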
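
A more robust way to inspect the claims is to restore the base64 alphabet and padding before decoding. JWT segments use base64url without `=` padding (RFC 7515), which plain `base64 -d` does not accept in every case. The helper name `jwt_payload` is hypothetical, and the sketch only decodes the payload; it does not verify the signature.

```bash
#!/usr/bin/env bash
# Sketch only: print the decoded payload (second segment) of a JWT.
# No signature verification is performed.
jwt_payload() {
  local segment
  # Take the payload segment and map base64url characters to standard base64.
  segment=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the '=' padding that JWT encoding strips.
  case $(( ${#segment} % 4 )) in
    2) segment="${segment}==" ;;
    3) segment="${segment}=" ;;
  esac
  printf '%s' "$segment" | base64 -d
}

# Example:
#   jwt_payload "$TOKEN" | jq '{iss, azp, roles: .realm_access.roles, sending_facility}'
```
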
---
### Post-startup verification
- [ ] All health indicators GREEN:
```bash
curl -s http://localhost:8080/actuator/health | jq '.components | map_values(.status)'
# Expected components: auditDb, db, livenessState, ocl, readinessState
# Every component must report "UP"
```
- [ ] FHIR metadata accessible unauthenticated and shows correct IG version:
```bash
curl -s https://fhir.dghs.gov.bd/fhir/metadata | jq '.software.version'
# Expected: "0.2.1"
```
- [ ] Flyway migration history shows V1 and V2 applied cleanly:
```bash
docker exec bd-postgres-fhir psql -U postgres -d fhirdb \
-c "SELECT version, description, success FROM flyway_schema_history;"
# Expected: 1 | hapi schema | t   (Flyway stores the version without the "V" prefix and the description with spaces)
docker exec bd-postgres-audit psql -U postgres -d auditdb \
-c "SELECT version, description, success FROM flyway_audit_schema_history;"
# Expected: 2 | audit schema | t
```
- [ ] Audit tables accepting inserts (INSERT-only role working):
```bash
docker exec bd-postgres-audit psql -U audit_writer_login -d auditdb -c \
"INSERT INTO audit.health_check (check_id) VALUES (gen_random_uuid())
ON CONFLICT DO NOTHING; SELECT 'audit insert ok';"
# Expected: audit insert ok
```
- [ ] **Run all nine acceptance tests** from deployment-guide.md Part 3
Tests 1-9 must all produce the expected HTTP status codes before the server is declared production-ready.
---
### Operational readiness (before announcing to vendors)
- [ ] Partition maintenance cron configured on audit database host (scaling-roadmap.md)
*Run: `docker exec bd-postgres-audit psql -U postgres -d auditdb -c "SELECT audit.create_next_month_partitions();"` and verify that it creates next month's partitions without error.*
- [ ] Log shipping to ELK configured (or Filebeat agent installed and shipping `/app/logs/`)
*Minimum: verify that `docker compose logs hapi` shows log lines in JSON format.*
- [ ] `FHIR_ADMIN_CLIENT_SECRET` stored in version upgrade pipeline's secrets vault
*Required by `ops/version-upgrade-integration.md` before next ICD-11 release.*
- [ ] Next ICD-11 version upgrade date noted — cache flush must be coordinated with OCL import completion
*See `ops/version-upgrade-integration.md` for the 7-step procedure.*
- [ ] Vendor onboarding runbook prepared citing `ops/keycloak-setup.md` Parts 3-4
*Each new vendor requires: Keycloak client, `mci-api` role, `sending_facility` mapper, credentials delivery.*
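
When checking the partition maintenance item, it helps to know which date range next month's partition should cover. The sketch below computes the half-open range `[start, end)` for the month after a given `YYYY-MM`. The helper name `next_month_bounds` is hypothetical, and the actual partition naming and bounds are defined by `audit.create_next_month_partitions()` in the audit migration, not by this script.

```bash
#!/usr/bin/env bash
# Sketch only: print "START END" for the month following YYYY-MM,
# where START is inclusive and END is exclusive (first day of the month after).
next_month_bounds() {
  local year=${1%-*}
  local month=$(( 10#${1#*-} ))   # 10# avoids octal parsing of "08" and "09"
  local y1=$year m1=$(( month + 1 ))
  if (( m1 > 12 )); then m1=1; y1=$(( year + 1 )); fi
  local y2=$y1 m2=$(( m1 + 1 ))
  if (( m2 > 12 )); then m2=1; y2=$(( y1 + 1 )); fi
  printf '%04d-%02d-01 %04d-%02d-01\n' "$y1" "$m1" "$y2" "$m2"
}

# Example: next_month_bounds "$(date +%Y-%m)"
```

Comparing this range with the bounds of the newest partition shown by `\d+ audit.audit_events` in `psql` is one way to confirm that the cron job ran.
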
---
## Architecture decision record — key decisions frozen in this implementation
The following decisions were finalised through the pre-implementation challenge process
and are reflected throughout the codebase. They are not configurable at runtime
without code changes.
| Decision | Rationale | Where enforced |
|----------|-----------|---------------|
| PostgreSQL only, no H2 | National infrastructure requires production-grade persistence | `DataSourceConfig.java`, Flyway migrations, `docker-compose.yml` |
| Validation on ALL requests | No vendor exemptions — uniform HIE boundary | `RequestValidatingInterceptor` with `failOnSeverity=ERROR` |
| OCL is single terminology authority | No local ICD-11 copy — live validation | `BdTerminologyValidationSupport`, chain position 6 |
| `$expand` failures never cause rejection | Known OCL limitation | `isValueSetSupported()=false`, `expandValueSet()` returns null |
| Only `$validate-code` failures cause 422 | Distinguish expansion from validation | `BdTerminologyValidationSupport.validateCode()` |
| Keycloak `hris` realm, `mci-api` role, no basic auth | Single authentication authority | `KeycloakJwtInterceptor`, `SecurityConfig` |
| Audit log append-only, separate instance | Immutability, forensic separation | `postgres-audit` separate container, `audit_writer` INSERT-only role |
| Rejected payloads stored forensically | Vendor debugging, dispute resolution | `RejectedSubmissionSink`, `audit.fhir_rejected_submissions` |
| IG bundled in Docker image | Reproducible builds, no runtime URL dependency | `Dockerfile` COPY, `IgPackageInitializer` |
| Cluster expressions via extension, not raw code | BD Core IG decided pattern | `ClusterExpressionValidator`, `POSTCOORD_CHARS` rejection |
| Fail-open for OCL/cluster validator outages | Service continuity over perfect validation | `BdTerminologyValidationSupport` catch blocks, `ClusterExpressionValidator` catch blocks |
| `meta.tag = unvalidated-profile` for unknown types | FHIR-native, queryable, no schema changes | `unvalidatedProfileTagInterceptor` in `FhirServerConfig` |
| pgBouncer session mode | Hibernate prepared statement compatibility | `docker-compose.yml` `PGBOUNCER_POOL_MODE: session` |
| Flyway bypasses pgBouncer for migrations | DDL transaction safety | `SPRING_FLYWAY_URL` points to `postgres-fhir:5432` directly |
| Advisory lock for IG initialisation | Multi-replica startup race prevention | `IgPackageInitializer` djb2 lock key |
| Two MDC cleanup hooks | Thread pool MDC leak prevention | `KeycloakJwtInterceptor` `COMPLETED_NORMALLY` + `COMPLETED` |