Rationarium Disaster Recovery Plan
Effective Date: January 1, 2026
Review Cycle: Annual
Plan Owner: Chief Executive Officer / Chief Technology Officer, Rationarium Inc.
Last Documented Review: December 9, 2025
Purpose
This plan describes how Rationarium Inc. detects, responds to, and recovers from unplanned disruptions to its WeBWorK hosting services, and defines recovery objectives that customers can rely on when planning their own continuity obligations.
Scope
This plan covers all production WeBWorK hosting environments operated by Rationarium Inc., including:
- Single-tenant customer WeBWorK virtual machines
- Customer databases and file storage
- Configuration and operational artifacts required to rebuild a customer environment
- Monitoring, alerting, and backup infrastructure
Recovery Objectives
| Objective | Target | Measured From |
|---|---|---|
| Recovery Point Objective (RPO) | ≤ 24 hours | Last successful nightly backup |
| Recovery Time Objective (RTO) — single instance | ≤ 60 minutes | Incident declared |
| Recovery Time Objective (RTO) — multi-instance or regional event | ≤ 48 hours | Incident declared |
| Customer Notification Time | ≤ 4 hours | Incident confirmed |
Threats Covered
- Single virtual machine failure (kernel panic, disk corruption, memory exhaustion, accidental deletion)
- Data-center availability event affecting the DigitalOcean nyc3 region
- Hosting-provider outage requiring migration to an alternate provider
- Malicious or accidental data loss within a customer instance
- Configuration or deployment error affecting customer access
Data Protection Controls
All customer environments use the following protections:
- Nightly encrypted backups of the WeBWorK database and the WeBWorK root directory (courses, templates, user uploads), with a seven-day rolling retention. Backup integrity is verified at capture time.
- VM snapshot images maintained at DigitalOcean for each customer instance. Snapshots are refreshed after significant configuration or software changes.
- Off-box backup storage separate from the production virtual machine, so that a host-level compromise cannot destroy both the live data and the backups.
- Infrastructure-as-code artifacts — Docker compose files, installation scripts, and configuration templates — retained in version control.
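The capture-time verification described above can be sketched as a small shell step. This is an illustrative sketch only: the paths, archive naming, and function name are assumptions, not Rationarium's actual backup tooling, and the encryption step is elided.

```shell
#!/usr/bin/env bash
# Sketch of a nightly backup step with capture-time integrity verification.
# Paths and names are illustrative assumptions; in the real pipeline an
# encryption step (e.g. gpg) and an off-box copy would follow.
set -euo pipefail

backup_dir() {
  local src="$1" dest="$2"
  local stamp archive
  stamp=$(date +%Y%m%d)
  archive="$dest/backup-$stamp.tar.gz"

  # Capture the WeBWorK root (courses, templates, uploads) as one archive.
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")"

  # Record a checksum alongside the archive ...
  sha256sum "$archive" > "$archive.sha256"

  # ... and verify integrity at capture time: the archive must list cleanly
  # and the checksum must match before the backup counts as successful.
  tar -tzf "$archive" > /dev/null
  sha256sum --check --quiet "$archive.sha256"

  echo "$archive"
}
```

Verifying at capture time means a corrupt archive fails the nightly job immediately, rather than being discovered during a restore.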
Recovery Procedures
Scenario A — Single VM Failure
- Incident detected via the monitoring stack or a customer report.
- The failed instance is assessed; where recoverable, services are restarted via the automated watchdog (status-check.sh).
- If the virtual machine is unrecoverable, a replacement VM is provisioned from the most recent snapshot image.
- The latest nightly database backup is restored.
- DNS, TLS, and LMS integrations are verified.
- The customer is notified of restoration and any data-loss window.
Target RTO: 60 minutes from incident declaration.
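The plan names status-check.sh as the automated watchdog. The sketch below shows the kind of liveness check such a script might perform; the pidfile convention and restart command are hypothetical illustrations, not the script's actual contents.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a watchdog check in the spirit of status-check.sh.
# The pidfile path and restart command are illustrative assumptions.

service_alive() {
  # A service is considered alive when its pidfile names a running process.
  local pidfile="$1" pid
  [ -f "$pidfile" ] || return 1
  pid=$(cat "$pidfile")
  kill -0 "$pid" 2>/dev/null
}

check_and_restart() {
  local pidfile="$1" restart_cmd="$2"
  if service_alive "$pidfile"; then
    echo "ok"
  else
    echo "restarting"
    $restart_cmd   # e.g. "systemctl restart webwork2" in production
  fi
}
```

Run periodically (e.g. from cron), a check of this shape handles the recoverable half of Scenario A without operator involvement.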
Scenario B — DigitalOcean nyc3 Regional Event
- Incident detected via the monitoring stack or DigitalOcean status notifications.
- Replacement infrastructure is provisioned in an alternate DigitalOcean region (for example nyc1 or sfo3) or, if DigitalOcean is unavailable, at a backup hosting provider.
- Latest off-box backups are deployed to the new environment.
- Customer DNS records are updated to point to the new endpoints.
- LMS integration credentials and callbacks are reissued or migrated as needed.
- Customers are notified as each instance is restored.
Target RTO: 48 hours from incident declaration, contingent on capacity at the alternate region or backup hosting provider.
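Before customers are notified that an instance is restored after a regional failover, it is worth confirming that each customer hostname actually resolves to the new environment. A minimal sketch of that verification pass, with hostnames as illustrative placeholders:

```shell
#!/usr/bin/env bash
# Sketch of a post-failover DNS verification pass (Scenario B). Hostnames
# passed in are illustrative; the real list would come from customer records.

endpoint_resolves() {
  # getent consults the same resolver chain the OS uses (/etc/hosts, DNS).
  getent hosts "$1" > /dev/null
}

verify_endpoints() {
  local failed=0 host
  for host in "$@"; do
    if endpoint_resolves "$host"; then
      echo "$host: resolves"
    else
      echo "$host: NOT resolving yet (DNS may still be propagating)"
      failed=1
    fi
  done
  return $failed
}
```

A nonzero exit signals that at least one record has not propagated, so the per-customer restoration notice should be held back.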
Scenario C — Intra-Instance Data Loss
- The customer reports loss — for example, an accidental course deletion, bulk grade overwrite, or corrupted import.
- Relevant tables or directories are restored from the most recent nightly backup into a staging environment.
- Affected data is merged or replaced in the live instance in coordination with the customer administrator.
- A post-incident review is conducted with the customer.
Target RTO: Varies by complexity; typically four hours from request to remediation.
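Restoring "relevant tables" into staging typically means slicing one table's section out of a full database dump. The sketch below shows one way to do that against a mysqldump-style file; the file layout assumption (each table preceded by a "Table structure for table" header) matches mysqldump's default output, but table and file names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch of a selective-restore step for Scenario C: extract a single table's
# section from a mysqldump-style backup so it can be loaded into staging
# without touching the rest of the database.

extract_table() {
  local dump="$1" table="$2"
  # mysqldump delimits each table with a "Table structure for table" header;
  # print from this table's header up to (not including) the next one.
  awk -v t="$table" '
    $0 ~ "^-- Table structure for table `" t "`" { inside = 1 }
    inside && $0 ~ "^-- Table structure for table `" && $0 !~ "`" t "`" { exit }
    inside { print }
  ' "$dump"
}
```

The extracted SQL is then loaded into the staging database, reviewed with the customer administrator, and only then merged into the live instance.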
Communication
- Customer notification. The affected customer is notified within four hours of incident confirmation. Updates are delivered at least daily until the incident is resolved.
- Status page. Ongoing incidents that affect availability are posted to https://status.rationarium.org.
- Post-incident report. Customers receive a written summary of root cause, resolution, and preventive measures within five business days of restoration.
Testing
Rationarium tests components of this plan on a rolling basis rather than through a single annual full test. The following activities exercise the plan on an ongoing basis:
- Backup restoration tests are performed at least quarterly against a non-production environment.
- Snapshot provisioning is exercised whenever a new customer instance is provisioned or migrated.
- Cross-region provisioning is exercised when staging new production environments.
- Runbook walkthroughs are conducted at least annually.
Last documented plan review and component test: April 9, 2026.
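A quarterly backup restoration test can be as simple as unpacking an archive into a scratch directory and checking its contents against the checksums recorded at backup time. The sketch below illustrates that shape; paths are assumptions, and the real test would run in a non-production environment as the plan requires.

```shell
#!/usr/bin/env bash
# Sketch of a quarterly restore test: unpack a backup archive into a scratch
# directory and confirm the contents match the checksums recorded at capture.
set -euo pipefail

restore_test() {
  local archive="$1" manifest="$2" scratch
  scratch=$(mktemp -d)

  # Restore into the scratch area, never onto production paths.
  tar -xzf "$archive" -C "$scratch"

  # The manifest holds "sha256  relative/path" lines captured at backup time;
  # verify every restored file against it.
  (cd "$scratch" && sha256sum --check --quiet "$manifest")

  rm -rf "$scratch"
  echo "restore test passed"
}
```

A restore test exercises the property that matters most: not that backups exist, but that they can be turned back into a working environment.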
Roles and Responsibilities
Given Rationarium’s operating scale, disaster-recovery responsibilities are consolidated in the CEO/CTO role:
- Incident declaration — CEO/CTO, or automated alert escalation
- Customer communication — CEO/CTO
- Technical recovery — CEO/CTO
- Post-incident review — CEO/CTO
Standards Alignment
This plan aligns with the disaster-recovery practices described in NIST SP 800-34 Rev. 1 (Contingency Planning Guide for Federal Information Systems), scaled to Rationarium’s size and risk profile.
Review
This plan is reviewed annually and updated following any material incident, architectural change, or change of hosting provider.