Rationarium Disaster Recovery Plan
Effective Date: January 1, 2026
Review Cycle: Annual
Plan Owner: Chief Executive Officer / Chief Technology Officer, Rationarium Inc.
Last Documented Review: December 9, 2025
Purpose
This plan describes how Rationarium Inc. detects, responds to, and recovers from unplanned disruptions to its WeBWorK hosting services, and defines recovery objectives that customers can rely on when planning their own continuity obligations.
Scope
This plan covers all production WeBWorK hosting environments operated by Rationarium Inc., including:
- Single-tenant customer WeBWorK virtual machines
- Customer databases and file storage
- Configuration and operational artifacts required to rebuild a customer environment
- Monitoring, alerting, and backup infrastructure
Recovery Objectives
| Objective | Target | Measured From |
|---|---|---|
| Recovery Point Objective (RPO) | ≤ 24 hours | Last successful nightly backup |
| Recovery Time Objective (RTO) — single instance | ≤ 60 minutes | Incident declared |
| Recovery Time Objective (RTO) — multi-instance or regional event | ≤ 48 hours | Incident declared |
| Customer Notification Time | ≤ 4 hours | Incident confirmed |
Threats Covered
- Single virtual machine failure (kernel panic, disk corruption, memory exhaustion, accidental deletion)
- Data-center availability event affecting the DigitalOcean nyc3 region
- Hosting-provider outage requiring migration to an alternate provider
- Malicious or accidental data loss within a customer instance
- Configuration or deployment error affecting customer access
Data Protection Controls
All customer environments use the following protections:
- Nightly encrypted backups of the WeBWorK database and the WeBWorK root directory (courses, templates, user uploads), with a seven-day rolling retention. Backup integrity is verified at capture time.
- VM snapshot images maintained at DigitalOcean for each customer instance. Snapshots are refreshed after significant configuration or software changes.
- Off-box backup storage separate from the production virtual machine, so that a host-level compromise cannot destroy both the live data and the backups.
- Infrastructure-as-code artifacts — Docker compose files, installation scripts, and configuration templates — retained in version control.
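The capture-time verification described above can be sketched as a small shell step. This is an illustrative sketch only: the paths, archive naming, and function name are assumptions, not Rationarium's actual backup tooling, and the encryption step is elided.

```shell
#!/usr/bin/env bash
# Sketch of a nightly backup step with capture-time integrity verification.
# Paths and names are illustrative assumptions; in the real pipeline an
# encryption step (e.g. gpg) and an off-box copy would follow.
set -euo pipefail

backup_dir() {
  local src="$1" dest="$2"
  local stamp archive
  stamp=$(date +%Y%m%d)
  archive="$dest/backup-$stamp.tar.gz"

  # Capture the WeBWorK root (courses, templates, uploads) as one archive.
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")"

  # Record a checksum alongside the archive ...
  sha256sum "$archive" > "$archive.sha256"

  # ... and verify integrity at capture time: the archive must list cleanly
  # and the checksum must match before the backup counts as successful.
  tar -tzf "$archive" > /dev/null
  sha256sum --check --quiet "$archive.sha256"

  echo "$archive"
}
```

Verifying at capture time means a corrupt archive fails the nightly job immediately, rather than being discovered during a restore.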
Recovery Procedures
Scenario A — Single VM Failure
- Incident detected via the monitoring stack or a customer report.
- The failed instance is assessed; where recoverable, services are restarted via the automated watchdog (status-check.sh).
- If the virtual machine is unrecoverable, a replacement VM is provisioned from the most recent snapshot image.
- The latest nightly database backup is restored.
- DNS, TLS, and LMS integrations are verified.
- The customer is notified of restoration and any data-loss window.
Target RTO: 60 minutes from incident declaration.
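The plan names status-check.sh as the automated watchdog. The sketch below shows the kind of liveness check such a script might perform; the pidfile convention and restart command are hypothetical illustrations, not the script's actual contents.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a watchdog check in the spirit of status-check.sh.
# The pidfile path and restart command are illustrative assumptions.

service_alive() {
  # A service is considered alive when its pidfile names a running process.
  local pidfile="$1" pid
  [ -f "$pidfile" ] || return 1
  pid=$(cat "$pidfile")
  kill -0 "$pid" 2>/dev/null
}

check_and_restart() {
  local pidfile="$1" restart_cmd="$2"
  if service_alive "$pidfile"; then
    echo "ok"
  else
    echo "restarting"
    $restart_cmd   # e.g. "systemctl restart webwork2" in production
  fi
}
```

Run periodically (e.g. from cron), a check of this shape handles the recoverable half of Scenario A without operator involvement.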
Scenario B — DigitalOcean nyc3 Regional Event
- Incident detected via the monitoring stack or DigitalOcean status notifications.
- Replacement infrastructure is provisioned in an alternate DigitalOcean region (for example nyc1 or sfo3) or, if DigitalOcean is unavailable, at a backup hosting provider.
- Latest off-box backups are deployed to the new environment.
- Customer DNS records are updated to point to the new endpoints.
- LMS integration credentials and callbacks are reissued or migrated as needed.
- Customers are notified as each instance is restored.
Target RTO: 48 hours from incident declaration, contingent on capacity at the alternate region or backup hosting provider.
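Before customers are notified that an instance is restored after a regional failover, it is worth confirming that each customer hostname actually resolves to the new environment. A minimal sketch of that verification pass, with hostnames as illustrative placeholders:

```shell
#!/usr/bin/env bash
# Sketch of a post-failover DNS verification pass (Scenario B). Hostnames
# passed in are illustrative; the real list would come from customer records.

endpoint_resolves() {
  # getent consults the same resolver chain the OS uses (/etc/hosts, DNS).
  getent hosts "$1" > /dev/null
}

verify_endpoints() {
  local failed=0 host
  for host in "$@"; do
    if endpoint_resolves "$host"; then
      echo "$host: resolves"
    else
      echo "$host: NOT resolving yet (DNS may still be propagating)"
      failed=1
    fi
  done
  return $failed
}
```

A nonzero exit signals that at least one record has not propagated, so the per-customer restoration notice should be held back.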
Scenario C — Intra-Instance Data Loss
- The customer reports loss — for example, an accidental course deletion, bulk grade overwrite, or corrupted import.
- Relevant tables or directories are restored from the most recent nightly backup into a staging environment.
- Affected data is merged or replaced in the live instance in coordination with the customer administrator.
- A post-incident review is conducted with the customer.
Target RTO: Varies by complexity; typically four hours from request to remediation.
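Restoring "relevant tables" into staging typically means slicing one table's section out of a full database dump. The sketch below shows one way to do that against a mysqldump-style file; the file layout assumption (each table preceded by a "Table structure for table" header) matches mysqldump's default output, but table and file names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch of a selective-restore step for Scenario C: extract a single table's
# section from a mysqldump-style backup so it can be loaded into staging
# without touching the rest of the database.

extract_table() {
  local dump="$1" table="$2"
  # mysqldump delimits each table with a "Table structure for table" header;
  # print from this table's header up to (not including) the next one.
  awk -v t="$table" '
    $0 ~ "^-- Table structure for table `" t "`" { inside = 1 }
    inside && $0 ~ "^-- Table structure for table `" && $0 !~ "`" t "`" { exit }
    inside { print }
  ' "$dump"
}
```

The extracted SQL is then loaded into the staging database, reviewed with the customer administrator, and only then merged into the live instance.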
Communication
- Customer notification. The affected customer is notified within four hours of incident confirmation. Updates are delivered at least daily until the incident is resolved.
- Status page. Ongoing incidents that affect availability are posted to https://status.rationarium.org.
- Post-incident report. Customers receive a written summary of root cause, resolution, and preventive measures within five business days of restoration.
Testing
Rationarium tests components of this plan on a rolling basis rather than through a single annual full test. The following activities exercise the plan on an ongoing basis:
- Backup restoration tests are performed at least quarterly against a non-production environment.
- Snapshot provisioning is exercised whenever a new customer instance is provisioned or migrated.
- Cross-region provisioning is exercised when staging new production environments.
- Runbook walkthroughs are conducted at least annually.
Last documented plan review and component test: April 9, 2026.
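A quarterly backup restoration test can be as simple as unpacking an archive into a scratch directory and checking its contents against the checksums recorded at backup time. The sketch below illustrates that shape; paths are assumptions, and the real test would run in a non-production environment as the plan requires.

```shell
#!/usr/bin/env bash
# Sketch of a quarterly restore test: unpack a backup archive into a scratch
# directory and confirm the contents match the checksums recorded at capture.
set -euo pipefail

restore_test() {
  local archive="$1" manifest="$2" scratch
  scratch=$(mktemp -d)

  # Restore into the scratch area, never onto production paths.
  tar -xzf "$archive" -C "$scratch"

  # The manifest holds "sha256  relative/path" lines captured at backup time;
  # verify every restored file against it.
  (cd "$scratch" && sha256sum --check --quiet "$manifest")

  rm -rf "$scratch"
  echo "restore test passed"
}
```

A restore test exercises the property that matters most: not that backups exist, but that they can be turned back into a working environment.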
Roles and Responsibilities
Given Rationarium’s operating scale, disaster-recovery responsibilities are consolidated in the CEO/CTO role:
- Incident declaration — CEO/CTO, or automated alert escalation
- Customer communication — CEO/CTO
- Technical recovery — CEO/CTO
- Post-incident review — CEO/CTO
Standards Alignment
This plan aligns with the disaster-recovery practices described in NIST SP 800-34 Rev. 1 (Contingency Planning Guide for Federal Information Systems), scaled to Rationarium’s size and risk profile.
Review
This plan is reviewed annually and updated following any material incident, architectural change, or change of hosting provider.