High Availability
=================

openFHIR Enterprise caches mappings, templates, and related configuration in memory to reduce database load and improve response times. In a single-node deployment this is transparent.

In a multi-node (HA) deployment, a change made through one node, such as uploading a new mapping or OPT, must be reflected on all peers immediately. Without coordination, peer nodes continue serving stale cached data until they are restarted.

To address this, openFHIR Enterprise propagates cache invalidation events across all running instances automatically. No external message broker or additional infrastructure is required.

Peer discovery
--------------

Nodes need to find each other at startup. Three discovery modes are available, selected via the ``CLUSTER_DISCOVERY`` environment variable:

.. list-table::
   :header-rows: 1
   :widths: 15 30 55

   * - Mode
     - When to use
     - Notes
   * - ``MULTICAST`` *(default)*
     - Local development, Docker Compose, VMs on the same network segment
     - Works out of the box on most networks. May be blocked in some cloud or restricted environments.
   * - ``DNS``
     - Kubernetes
     - Requires a headless Service so that the DNS name resolves to individual pod IPs.
   * - ``TCP``
     - VM / bare-metal, or any environment where multicast is unavailable
     - Requires a static list of all peer addresses provided upfront.

Configuration reference
-----------------------

.. list-table::
   :header-rows: 1
   :widths: 30 15 55

   * - Environment variable
     - Default
     - Description
   * - ``CLUSTER_DISCOVERY``
     - ``MULTICAST``
     - Peer discovery mode: ``MULTICAST``, ``DNS``, or ``TCP``.
   * - ``CLUSTER_DNS_QUERY``
     - *(empty)*
     - Headless DNS name to resolve when using ``DNS`` mode (e.g. a Kubernetes headless Service hostname).
   * - ``CLUSTER_INITIAL_HOSTS``
     - *(empty)*
     - Comma-separated list of peer addresses when using ``TCP`` mode, in the form ``host[port],host[port]``.
   * - ``CLUSTER_PORT``
     - ``7800``
     - Port used for cluster communication.

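
The ``host[port],host[port]`` format expected by ``CLUSTER_INITIAL_HOSTS`` can be assembled from a plain list of hosts. The following is a minimal sketch, not part of openFHIR Enterprise: the ``PEERS`` array and its addresses are illustrative placeholders, and the script only assumes the documented default port of ``7800``.

.. code-block:: bash

   #!/usr/bin/env bash
   # Hypothetical helper: build a CLUSTER_INITIAL_HOSTS value from a host list.
   set -euo pipefail

   # Fall back to the documented default cluster port when none is set.
   CLUSTER_PORT="${CLUSTER_PORT:-7800}"

   # Placeholder peer addresses; replace with the real node addresses.
   PEERS=("10.0.0.1" "10.0.0.2" "10.0.0.3")

   hosts=""
   for peer in "${PEERS[@]}"; do
     # Prefix a comma only when the list already has an entry.
     hosts="${hosts:+${hosts},}${peer}[${CLUSTER_PORT}]"
   done

   export CLUSTER_DISCOVERY=TCP
   export CLUSTER_INITIAL_HOSTS="${hosts}"
   echo "${CLUSTER_INITIAL_HOSTS}"

With ``CLUSTER_PORT`` unset, the printed value has the same shape as the static list shown for VM / bare-metal deployments.
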

Per-environment setup
---------------------

Single node
~~~~~~~~~~~

No configuration is required. The node forms a cluster of one and cache invalidation operates locally as usual.

Docker Compose
~~~~~~~~~~~~~~

Multicast works on Docker's default bridge network, so no additional configuration is needed when running multiple replicas in Compose.

.. code-block:: yaml

   services:
     openfhir-node-1:
       image: openfhir-enterprise:latest
       # no cluster configuration needed; MULTICAST is the default
     openfhir-node-2:
       image: openfhir-enterprise:latest

If multicast is disabled on the network, switch to ``TCP`` mode and set ``CLUSTER_INITIAL_HOSTS`` to the list of all peer service names and their cluster port.

.. code-block:: yaml

   services:
     openfhir-node-1:
       image: openfhir-enterprise:latest
       environment:
         CLUSTER_DISCOVERY: TCP
         CLUSTER_INITIAL_HOSTS: openfhir-node-2[7800]
     openfhir-node-2:
       image: openfhir-enterprise:latest
       environment:
         CLUSTER_DISCOVERY: TCP
         CLUSTER_INITIAL_HOSTS: openfhir-node-1[7800]

Kubernetes
~~~~~~~~~~

Use ``DNS`` mode with a headless Service. Set ``CLUSTER_DNS_QUERY`` to the fully qualified DNS name of the headless Service (e.g. ``openfhir-cluster.default.svc.cluster.local``). The DNS name must resolve to the IPs of all running pods.

The pod's service account requires read access to ``pods`` and ``endpoints`` in the deployment namespace so that peer IPs can be resolved at startup.

.. code-block:: yaml

   env:
     - name: CLUSTER_DISCOVERY
       value: DNS
     - name: CLUSTER_DNS_QUERY
       value: openfhir-cluster.default.svc.cluster.local

VM / bare-metal
~~~~~~~~~~~~~~~

Use ``TCP`` mode and set ``CLUSTER_INITIAL_HOSTS`` to the addresses and cluster ports of all nodes. Every node must include the full list of peers, including itself.

.. code-block:: bash

   CLUSTER_DISCOVERY=TCP
   CLUSTER_INITIAL_HOSTS=10.0.0.1[7800],10.0.0.2[7800],10.0.0.3[7800]

Behaviour and guarantees
------------------------

Cache invalidation is best-effort.
If a peer is temporarily unreachable when a change is made, it will not receive the invalidation event for that change. It will serve stale data until the next request for the same resource triggers a reload from the database, or until the node is restarted. This is an acceptable trade-off given that the database remains the authoritative source of truth at all times.

All nodes in the cluster must run the same version of openFHIR Enterprise.