Planning for Disaster Recovery
With the SaaS version of Run:ai, most of the disaster-recovery burden shifts to Run:ai, so data backup is not a concern in such environments.
With the self-hosted version, the IT organization is responsible for backing up data ahead of a possible disaster and for learning how to recover when needed.
Run:ai uses an internal PostgreSQL database. The database is stored on a Kubernetes Persistent Volume (PV). You must provide a backup solution for the database, using one of the following options:
- (Recommended) Back up the PV.
- Use the company's enterprise PostgreSQL solution, if one exists, instead of the in-place instance that Run:ai spawns.
Run:ai stores metric history using Thanos. Thanos is configured to store data on a persistent volume. The recommendation is to back up the PV.
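One possible way to back up the database and metrics volumes is a CSI `VolumeSnapshot`, which only works if the cluster's storage driver supports snapshots. The following is a minimal sketch, not a Run:ai-provided procedure: it builds one snapshot manifest per volume and prints them as JSON that can be piped to `kubectl apply -f -`. The namespace, PVC names, and snapshot class below are hypothetical placeholders and must be replaced with the values from your own cluster.

```python
import json


def volume_snapshot(name: str, namespace: str, pvc_name: str, snapshot_class: str) -> dict:
    """Build a CSI VolumeSnapshot manifest for one PersistentVolumeClaim."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "volumeSnapshotClassName": snapshot_class,
            "source": {"persistentVolumeClaimName": pvc_name},
        },
    }


# Hypothetical values: look up the real ones with `kubectl get pvc -A`
# and `kubectl get volumesnapshotclass`.
NAMESPACE = "runai-backend"
PVCS = {"database": "example-postgresql-pvc", "metrics": "example-thanos-pvc"}
SNAPSHOT_CLASS = "example-snapshot-class"

manifests = [
    volume_snapshot(f"{label}-backup", NAMESPACE, pvc, SNAPSHOT_CLASS)
    for label, pvc in PVCS.items()
]

# kubectl accepts JSON as well as YAML, and a v1 List wraps several objects.
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2))
```

Depending on the storage backend, a snapshot may live in the same storage system as the original volume, so it may also need to be copied off-site to be useful after a disaster.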
During the installation of Run:ai, you created two values files:
- One for the Run:ai control plane (also called 'backend'). See kubernetes or OpenShift.
- One for the cluster. See kubernetes or OpenShift.
Save these files, or extract a current version of each file by using the upgrade script.
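If the installation was done with Helm, the values supplied to a release can also be read back with `helm get values`. This is an alternative to the upgrade script mentioned above, not a description of it. The sketch below assumes Helm-based installs, and the release and namespace names are hypothetical and should be checked with `helm list -A`. It prints the commands by default and only runs them when `save_values` is called explicitly.

```python
import subprocess
from pathlib import Path
from typing import List


def helm_get_values_cmd(release: str, namespace: str) -> List[str]:
    """Build the `helm get values` command for one release."""
    return ["helm", "get", "values", release, "--namespace", namespace, "--output", "yaml"]


def save_values(release: str, namespace: str, out_dir: Path) -> Path:
    """Run the command and write the output to <out_dir>/<release>-values.yaml."""
    result = subprocess.run(
        helm_get_values_cmd(release, namespace),
        check=True,
        capture_output=True,
        text=True,
    )
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"{release}-values.yaml"
    target.write_text(result.stdout)
    return target


# Hypothetical (release, namespace) pairs: verify with `helm list -A`.
RELEASES = [("runai-backend", "runai-backend"), ("runai-cluster", "runai")]

# Print the commands only; call save_values(...) to actually run them.
for release, namespace in RELEASES:
    print(" ".join(helm_get_values_cmd(release, namespace)))
```

Store the resulting files outside the cluster, for example alongside the volume backups, so that they remain available when the cluster itself is lost.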
Administrators may also create templates. Templates are stored as ConfigMaps in the
To recover Run:ai:
- Re-create the Kubernetes/OpenShift cluster.
- Recover the persistent volumes for metrics and database.
- Re-install the Run:ai control plane. Use the stored values file. If needed, modify the values file to connect to the restored PostgreSQL PV. Connect Prometheus to the stored metrics PV.
- Re-install the cluster. Use the stored values file or download a new file from the Administration UI.
- If the cluster is configured such that Projects do not create namespaces automatically, you will need to re-create the namespaces and apply role bindings as discussed in kubernetes or OpenShift.
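For the last step, re-creating namespaces is easier if their names were recorded before the disaster. The sketch below assumes such a list exists and generates plain `Namespace` manifests from it. The namespace names are hypothetical, and any labels or role bindings that Run:ai requires are deliberately not shown, since they should be taken from the installation documentation referenced above.

```python
import json


def namespace_manifest(name: str) -> dict:
    """Build a bare Namespace manifest; labels and role bindings are not included."""
    return {"apiVersion": "v1", "kind": "Namespace", "metadata": {"name": name}}


# Hypothetical names, e.g. saved earlier from `kubectl get namespaces`.
PROJECT_NAMESPACES = ["example-team-a", "example-team-b"]

namespace_list = {
    "apiVersion": "v1",
    "kind": "List",
    "items": [namespace_manifest(name) for name in PROJECT_NAMESPACES],
}

# The JSON output can be piped to `kubectl apply -f -`.
print(json.dumps(namespace_list, indent=2))
```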