6. Operational Documentation
6.1 Deployment procedures
Front-end (built today)
The app is a static export deployed to S3 behind CloudFront via AWS CDK (Python). The CDK app builds the site itself.
# 1. Install app dependencies (CDK runs next build during deploy)
npm install
# 2. Regenerate documentation artefacts (see build order below)
npm run gen:docs # per-screen Markdown from the help registry
npm run build:docs # docs site data module from /docs Markdown
npm run verify:help # assert every walkthrough anchor resolves
# 3. Deploy infrastructure (builds out/, uploads, invalidates CDN)
cd infrastructure
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cdk deploy # or ./deploy.sh
cdk deploy runs next build (which, with output: "export", emits out/), uploads out/ to the private S3 bucket via Origin Access Control, and issues a CloudFront /* invalidation, so visitors always get the latest build. To inspect without rebuilding, use cdk synth --app "python3 app.py" when out/ already exists.
Documentation build order (important)
The /docs site is generated, not hand-rendered. The order matters because each step feeds the next:
flowchart LR
  A[Edit help content + /docs Markdown] --> B[npm run gen:docs]
  B --> C[npm run build:docs]
  C --> D[lib/docs-generated.ts]
  D --> E[next build -> /docs/* static pages]
1. npm run gen:docs regenerates the per-screen Markdown in /docs from components/help/content/*.
2. npm run build:docs walks docs/product/**, docs/developer/** and the per-screen Markdown, converts to HTML, builds the navigation and search index, and writes lib/docs-generated.ts.
3. next build statically renders every /docs/* page (via generateStaticParams) into out/.
Commit the regenerated artefacts together with the source change. See AGENTS.md for the rule that help content, walkthroughs and /docs must stay in lock-step.
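The build order above can be sketched as a small runner that executes the npm scripts in sequence and stops at the first failure. This is a minimal illustration, not part of the repository: the names DOC_BUILD_STEPS and run_steps are hypothetical, and placing verify:help between build:docs and the build follows the deploy listing earlier in this section.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

# Order taken from the build-order description: each step feeds the next.
# (Hypothetical constant; the script names come from package.json as documented.)
DOC_BUILD_STEPS: List[List[str]] = [
    ["npm", "run", "gen:docs"],     # per-screen Markdown from the help registry
    ["npm", "run", "build:docs"],   # writes lib/docs-generated.ts
    ["npm", "run", "verify:help"],  # assert every walkthrough anchor resolves
    ["npm", "run", "build"],        # production static build into out/
]


def run_steps(
    steps: Sequence[Sequence[str]],
    runner: Optional[Callable[[Sequence[str]], int]] = None,
) -> List[str]:
    """Run each command in order and stop at the first non-zero exit code.

    `runner` takes a command and returns its exit code. It defaults to
    subprocess.run, and can be replaced with a fake for dry runs or tests.
    """
    if runner is None:
        runner = lambda cmd: subprocess.run(list(cmd)).returncode
    completed: List[str] = []
    for cmd in steps:
        code = runner(cmd)
        if code != 0:
            raise RuntimeError(f"step failed: {' '.join(cmd)} (exit {code})")
        completed.append(" ".join(cmd))
    return completed
```

Calling run_steps(DOC_BUILD_STEPS) from the repository root would run the real scripts. Failing fast matters here because a stale lib/docs-generated.ts would otherwise be rendered into out/ without any error.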
Back-end (production, Planned)
Domain services, the database and the event bus deploy as separate stacks (containers or serverless). CI/CD promotes them through environments with automated tests, runs migrations with a forward-only, reversible strategy, and uses blue-green or canary cutover for zero-downtime releases. The static front-end is configured with the environment's API base URL.
6.2 Environment configuration
| Variable | Scope | Purpose |
|---|---|---|
| NEXT_PUBLIC_API_BASE_URL | Front-end (public) | Base URL the lib/data.ts seam calls in production |
| CERTIFICATE_ARN | Infra (config.py) | ACM certificate in us-east-1 for CloudFront |
| HOSTED_ZONE_NAME / HOSTED_ZONE_ID | Infra (config.py) | Route 53 zone for the domain |
| DOMAIN_NAME | Infra (config.py) | Public hostname (today mock.globalclinic.app) |
| Database URL, KMS key ids, provider keys | Back-end (secrets store) | Never in source or the static bundle; server-side only |
Configuration principles: only public values use NEXT_PUBLIC_*; everything secret lives in the secrets manager per environment; the bucket name is derived (gc-mocks-<account>-<region_without_dashes>), so no manual naming. Environments are isolated (dev, staging, production) with separate accounts or strict boundaries.
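The derived bucket name and the public-variable rule can be illustrated with two small helpers. This is a sketch under assumptions: the function names are hypothetical, the account ID in the usage note is a placeholder, and the real derivation lives in the CDK app.

```python
def bucket_name(account: str, region: str) -> str:
    """Derive the bucket name as gc-mocks-<account>-<region_without_dashes>."""
    return f"gc-mocks-{account}-{region.replace('-', '')}"


def is_public_variable(name: str) -> bool:
    """Only NEXT_PUBLIC_* variables may end up in the static bundle."""
    return name.startswith("NEXT_PUBLIC_")
```

For example, bucket_name("123456789012", "us-east-1") returns "gc-mocks-123456789012-useast1", and is_public_variable("CERTIFICATE_ARN") returns False.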
6.3 Monitoring and observability
- Front-end / CDN. CloudFront and S3 metrics (requests, 4xx and 5xx, cache hit ratio, origin latency); real-user monitoring for web vitals; synthetic checks on the marketing site and /docs.
- Back-end (Planned). Structured logs, metrics and distributed traces per service. Golden signals (latency, traffic, errors, saturation) per service and per integration adapter.
- Business and SLO dashboards. Stage SLA performance and breaches (from stage owner/slaTarget), lead-to-case funnel by corridor and channel, time-to-first-response and time-to-quote, escrow release latency and reconciliation breaks, visa submission and grant rates, and communication delivery rates.
- Alerting. Page on money path failures (escrow, refunds, reconciliation mismatch), visa SLA breaches, integration circuit-breaker trips, auth anomaly spikes, and screening hits. Route by severity to the on-call rotation.
- Audit and security monitoring. Audit events and security signals flow to monitoring; anomalies (cross-tenant access attempts, failed-auth spikes) alert.
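A stage SLA breach, as tracked on the dashboards above, could be computed along these lines. This is a sketch only: the Stage class is hypothetical, and it assumes slaTarget is expressed in hours, which this section does not state.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Stage:
    """Hypothetical view of a case stage with its owner and SLA target."""
    name: str
    owner: str
    sla_target_hours: float  # assumption: slaTarget is a duration in hours
    entered_at: datetime


def sla_breached(stage: Stage, now: datetime) -> bool:
    """True once the time spent in the stage exceeds its SLA target."""
    return now - stage.entered_at > timedelta(hours=stage.sla_target_hours)
```

A dashboard or alert rule would evaluate this for each open stage, passing the current time in UTC, and group the breaches by stage.owner.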
6.4 Backup and recovery
- Front-end. The static site is reproducible from next build; the bucket is intentionally disposable. Recovery is a redeploy. Keep the source repository and CDK as the source of truth.
- Database (Planned). Automated, encrypted backups with point-in-time recovery; periodic restore drills to validate backups; cross-region copies for disaster recovery.
- Object storage (documents, Planned). Versioning and cross-region replication; lifecycle rules aligned to the retention schedule; deletes are soft and audited.
- Ledger and audit. Append-only and backed up with the database; immutability preserved across restores.
- Targets. Define and test RPO and RTO per data class; money and clinical data carry the strictest targets. Document the runbook for each recovery scenario.
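Testing RPO and RTO per data class can be as simple as comparing each restore drill against its target. The sketch below is illustrative: the class and function names are hypothetical, and any durations passed in are values the team must define, since this section sets none.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryTarget:
    """Recovery objectives for one data class (hypothetical structure)."""
    rpo: timedelta  # maximum tolerated data loss, measured in time
    rto: timedelta  # maximum tolerated time to restore service


def drill_meets_target(
    target: RecoveryTarget, data_loss: timedelta, restore_time: timedelta
) -> bool:
    """A restore drill passes only if both RPO and RTO are met."""
    return data_loss <= target.rpo and restore_time <= target.rto
```

Recording one result per drill and per data class gives the runbook an objective pass or fail signal instead of a subjective one.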
6.5 Incident response procedures
flowchart LR
  D[Detect: alert / report] --> T[Triage + severity]
  T --> C[Contain]
  C --> M[Mitigate / restore]
  M --> R[Resolve + verify]
  R --> P[Postmortem + actions]
- Severity. SEV1 (patient safety or money at risk, or a breach) pages 24/7; SEV2 (journey-blocking) is answered within the stage SLA; SEV3 (degraded) is handled the next business day.
- Roles. An incident commander coordinates; a communications lead handles internal and patient or clinic comms; subject experts mitigate. The Medical Director is engaged for any clinical-safety incident.
- Data breach. A suspected breach triggers the breach workflow and the regulatory notification timelines in the compliance section (for example GDPR's 72-hour rule). Preserve audit logs; do not destroy evidence.
- Money incidents. Place affected escrow tranches on hold; reconcile against the payment partner; never improvise a release.
- Postmortem. Blameless, with timeline, root cause, and tracked corrective actions. The repository provides an incident-response workflow for triage and postmortem.
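The severity rules above translate directly into a triage helper. This is a minimal sketch: the function and parameter names are hypothetical, while the conditions and response expectations are the ones listed under Severity.

```python
from enum import Enum


class Severity(Enum):
    SEV1 = "pages 24/7"
    SEV2 = "responds within the stage SLA"
    SEV3 = "next business day"


def classify(
    patient_safety_risk: bool,
    money_at_risk: bool,
    breach: bool,
    journey_blocking: bool,
) -> Severity:
    """Pick the highest severity whose condition holds."""
    if patient_safety_risk or money_at_risk or breach:
        return Severity.SEV1
    if journey_blocking:
        return Severity.SEV2
    # Anything else is treated as degraded service.
    return Severity.SEV3
```

Checking the SEV1 conditions first means an incident that is both journey-blocking and a money risk is never under-classified.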
6.6 Support and troubleshooting
Support uses the contextual help system and the edge-case playbooks. Common issues and first steps:
| Symptom | Likely cause | First step |
|---|---|---|
| Visa "Submit" stays disabled | A linked document is not yet verified | Open Documents; confirm the linked item is uploaded and verified; the visa item then reads Verified |
| Document upload rejected | File type or size, or failed scan | Re-upload a valid file within type and size limits |
| Payment failed at funding | Method or currency not valid for the corridor; provider decline | Retry or switch method; confirm no charge occurred; escalate to Finance if ambiguous |
| Travel option unavailable | Provider inventory changed | Travel Desk proposes an equivalent; selection stays editable |
| /docs page missing after a content change | build:docs not run before next build | Run gen:docs then build:docs, then rebuild |
| Walkthrough step points at nothing | A UI element or its data-tour anchor moved | Run verify:help; fix the anchor or step in the ScreenDoc |
| Clinic not visible on the marketplace | Clinic status is not verified | Complete onboarding and verification |
Escalation: clinical to the Medical Director (24/7), money to Finance, visa or travel to the relevant desk, security or data to the incident process. Every support interaction that touches a patient case is consent-gated and audited.
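The escalation paths can be captured in a lookup so that support tooling never guesses a destination. The sketch below is hypothetical: the category keys are illustrative, and "Visa desk" is an assumed name for the relevant desk, which the text does not name.

```python
# Escalation targets from the rule above. Keys are illustrative category labels.
ESCALATION = {
    "clinical": "Medical Director (24/7)",
    "money": "Finance",
    "visa": "Visa desk",      # assumption: the text only says "the relevant desk"
    "travel": "Travel Desk",
    "security": "Incident process",
    "data": "Incident process",
}


def escalate(category: str) -> str:
    """Return the escalation target, failing loudly on unknown categories."""
    try:
        return ESCALATION[category]
    except KeyError:
        raise ValueError(f"no escalation path for category: {category!r}") from None
```

Raising on an unknown category is a deliberate choice here: a silent default could route a clinical or money issue to the wrong team.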
6.7 Everyday commands
npm run dev # local dev server
npm run typecheck # tsc --noEmit
npm run verify:help # validate walkthrough anchors
npm run gen:docs # regenerate per-screen /docs from help content
npm run build:docs # build the /docs site data module (lib/docs-generated.ts)
npm run build # production static build into out/