High Availability

openFHIR Enterprise caches mappings, templates, and related configuration in memory to reduce database load and improve response times. In a single-node deployment this is transparent. In a multi-node (HA) deployment, a change made through one node, such as uploading a new mapping or OPT (operational template), must be reflected on all peers immediately. Without coordination, peer nodes would continue serving stale cached data until they are restarted.

To address this, openFHIR Enterprise propagates cache invalidation events across all running instances automatically. No external message broker or additional infrastructure is required.

Peer discovery

Nodes need to find each other at startup. Three discovery modes are available, selected via the CLUSTER_DISCOVERY environment variable:

| Mode | When to use | Notes |
| --- | --- | --- |
| MULTICAST (default) | Local development, Docker Compose, VMs on the same network segment | Works out of the box on most networks. May be blocked in some cloud or restricted environments. |
| DNS | Kubernetes | Requires a headless Service so that the DNS name resolves to individual pod IPs. |
| TCP | VM / bare-metal, or any environment where multicast is unavailable | Requires a static list of all peer addresses provided upfront. |

Configuration reference

| Environment variable | Default | Description |
| --- | --- | --- |
| CLUSTER_DISCOVERY | MULTICAST | Peer discovery mode: MULTICAST, DNS, or TCP. |
| CLUSTER_DNS_QUERY | (empty) | Headless DNS name to resolve when using DNS mode (e.g. a Kubernetes headless Service hostname). |
| CLUSTER_INITIAL_HOSTS | (empty) | Comma-separated list of peer addresses when using TCP mode, in the form host[port],host[port]. |
| CLUSTER_PORT | 7800 | Port used for cluster communication. |

Per-environment setup

Single node

No configuration is required. The node forms a cluster of one and cache invalidation operates locally as usual.

Docker Compose

Multicast works on the default bridge network that Compose creates for a project, so no additional configuration is needed when all replicas run on the same Docker host.

services:
  openfhir-node-1:
    image: openfhir-enterprise:latest
    # no cluster configuration needed — MULTICAST is the default

  openfhir-node-2:
    image: openfhir-enterprise:latest

If multicast is disabled on the network, switch to TCP mode and set CLUSTER_INITIAL_HOSTS on every node to the full list of service names, each followed by its cluster port in square brackets. As in the VM / bare-metal setup, the list includes the node itself.

services:
  openfhir-node-1:
    image: openfhir-enterprise:latest
    environment:
      CLUSTER_DISCOVERY: TCP
      CLUSTER_INITIAL_HOSTS: openfhir-node-1[7800],openfhir-node-2[7800]

  openfhir-node-2:
    image: openfhir-enterprise:latest
    environment:
      CLUSTER_DISCOVERY: TCP
      CLUSTER_INITIAL_HOSTS: openfhir-node-1[7800],openfhir-node-2[7800]
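
If port 7800 is already in use, CLUSTER_PORT can be changed. The sketch below assumes that the port in square brackets has to match each peer's CLUSTER_PORT, which is how the configuration reference reads. Port 7900 is an arbitrary example value, and the host list names every node including the node itself, following the rule stated for TCP mode under VM / bare-metal.

```yaml
services:
  openfhir-node-1:
    image: openfhir-enterprise:latest
    environment:
      CLUSTER_DISCOVERY: TCP
      CLUSTER_PORT: "7900"   # example value; any free port, identical on all nodes
      # assumption: bracketed ports must match CLUSTER_PORT of the listed peer
      CLUSTER_INITIAL_HOSTS: "openfhir-node-1[7900],openfhir-node-2[7900]"

  openfhir-node-2:
    image: openfhir-enterprise:latest
    environment:
      CLUSTER_DISCOVERY: TCP
      CLUSTER_PORT: "7900"
      CLUSTER_INITIAL_HOSTS: "openfhir-node-1[7900],openfhir-node-2[7900]"
```

The cluster port only needs to be reachable between the containers, so it does not have to be published to the host.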

Kubernetes

Use DNS mode with a headless Service. Set CLUSTER_DNS_QUERY to the fully qualified DNS name of the headless Service (e.g. openfhir-cluster.default.svc.cluster.local). The DNS name must resolve to the IPs of all running pods.
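
A headless Service is a regular Service with clusterIP set to None, which makes cluster DNS return one record per matching pod rather than a single virtual IP. The following is a minimal sketch, not a manifest shipped with the product. The Service name and namespace are taken from the example hostname above. The app: openfhir selector label and the port name are assumptions and must be adapted to the labels on your own pods.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: openfhir-cluster      # first label of openfhir-cluster.default.svc.cluster.local
  namespace: default
spec:
  clusterIP: None             # headless: DNS resolves to the individual pod IPs
  selector:
    app: openfhir             # assumed pod label; replace with your Deployment's label
  ports:
    - name: cluster           # assumed port name
      port: 7800              # default CLUSTER_PORT
      targetPort: 7800
```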

The pod’s service account requires read access to pods and endpoints in the deployment namespace so that peer IPs can be resolved at startup.

env:
  - name: CLUSTER_DISCOVERY
    value: DNS
  - name: CLUSTER_DNS_QUERY
    value: openfhir-cluster.default.svc.cluster.local
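
The read access for the service account mentioned above could be granted with a namespaced Role and RoleBinding along these lines. This is a sketch under assumptions: the ServiceAccount name openfhir and the Role name openfhir-peer-reader are hypothetical, the namespace follows the example hostname, and the verb list is a guess at what "read access" covers, since the exact verbs are not specified here.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: openfhir-peer-reader        # hypothetical name
  namespace: default
rules:
  - apiGroups: [""]                 # core API group
    resources: ["pods", "endpoints"]
    verbs: ["get", "list", "watch"] # assumed set of read verbs
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: openfhir-peer-reader
  namespace: default
subjects:
  - kind: ServiceAccount
    name: openfhir                  # hypothetical; use the account your pods run as
    namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: openfhir-peer-reader
```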

VM / bare-metal

Use TCP mode and set CLUSTER_INITIAL_HOSTS to the addresses and cluster ports of all nodes. Every node must include the full list of peers, including itself.

CLUSTER_DISCOVERY=TCP
CLUSTER_INITIAL_HOSTS=10.0.0.1[7800],10.0.0.2[7800],10.0.0.3[7800]

Behaviour and guarantees

Cache invalidation is best-effort. If a peer is temporarily unreachable when a change is made, it will not receive the invalidation event for that change. It will serve stale data until the next request for the same resource triggers a reload from the database, or until the node is restarted. This is an acceptable trade-off given that the database remains the authoritative source of truth at all times.

All nodes in the cluster must run the same version of openFHIR Enterprise.