Synthetic Users Disaster Recovery Plan

Purpose: To ensure rapid and efficient recovery of Synthetic Users' operations in the event of a disaster, specifically focusing on critical dependencies like Heroku and OpenAI.
Scope: This DRP covers processes and protocols for switching operations from Heroku to AWS in the event of a Heroku failure and from OpenAI to Anthropic in the event of an OpenAI failure.

Primary Dependencies: Heroku for application hosting, OpenAI for AI model integrations.
Secondary Options: AWS for application hosting, Anthropic for AI model integrations.

Recovery Time Objective (RTO): The maximum acceptable time to restore critical functions after a disaster.
- Heroku to AWS Migration: 4 hours
- OpenAI to Anthropic Switch: 2 hours
Recovery Point Objective (RPO): The maximum acceptable amount of data loss measured in time.
- Data Backup: 1 hour

Detection and Assessment: Monitor and quickly identify service disruption on Heroku. Confirm the outage's scope and expected duration.
Activation of AWS Environment:
- Pre-configured AWS environments should be maintained, mirroring the Heroku setup.
- Initiate the AWS environment, ensuring all services and databases are operational.
Data Migration:
- Last data backup from Heroku (or directly from the database if accessible) is restored to AWS.
- Ensure the RPO of 1 hour is met by verifying the data integrity post-migration.
DNS Update:
- Update DNS records to point to the AWS environment, minimizing the switch-over time to meet the RTO of 2 hours.
Verification and Monitoring:
- Conduct thorough testing to confirm operational functionality on AWS.
- Monitor performance and stability closely following the switch.

Detection and Assessment: Identify failure in OpenAI services impacting operations. Assess the impact on service offerings.
Switch to Anthropic:
- Pre-configure Anthropic models to match the functionality provided by OpenAI models closely.
- Redirect API calls from OpenAI to Anthropic, ensuring minimal changes to the integration layer.
Verification and Adjustment:
- Test the integration thoroughly to ensure that Anthropic models perform as expected.
- Adjust configurations as needed to optimize performance and accuracy.
Communication:
- Inform internal teams about the switch to manage expectations and provide updated documentation if necessary.
- Notify key clients of the change, emphasizing the continuity of service and quality.

Review and Analysis: After the recovery, conduct a detailed review to analyze the response's effectiveness, documenting lessons learned.
Plan Update: Update the DRP based on feedback and any changes in the technological landscape or business requirements.

Annual DRP Testing: Simulate disaster scenarios annually to test the effectiveness of the DRP, focusing on the switch from Heroku to AWS and OpenAI to Anthropic.
DRP Updates: Review and update the DRP semi-annually or following significant changes in technology or business operations.

DRP Document: Maintain a comprehensive, accessible DRP document detailing all protocols, procedures, and recovery objectives.
Training: Regularly train relevant staff on DRP protocols, ensuring clear understanding and readiness to act in the event of a disaster.

Synthetic Users Disaster Recovery Plan ​