# Recovery Procedures

## Overview

This document describes recovery procedures for when node-hardening changes cause loss of remote access or system instability.

## Prerequisites

- Console access via IPMI, VNC, or physical connection
- Knowledge of backup file locations
- Root or sudo access

## Scenario 1: SSH Access Lost

### Symptoms
- Cannot SSH to the server
- Connection refused or timeout

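The two symptoms usually point at different causes: a refused connection typically means nothing is listening on the port, while a timeout typically means packets are being dropped (see Scenario 2). A minimal sketch for telling them apart from a working machine, assuming GNU `timeout` and bash's `/dev/tcp` redirection are available; the function names `classify_probe` and `probe_port` are illustrative and not part of the skill:

```bash
#!/usr/bin/env bash
# Map the exit status of the probe to a symptom label.
# 0 = connected, 124 = GNU timeout expired, anything else = immediate failure
# (usually "connection refused", but an unresolvable host lands here too).
classify_probe() {
  case "$1" in
    0)   echo "open" ;;
    124) echo "timeout" ;;
    *)   echo "refused" ;;
  esac
}

# Attempt a TCP connection to HOST on PORT (default 22) with a 5 second limit.
probe_port() {
  local host="$1" port="${2:-22}" status=0
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null || status=$?
  classify_probe "$status"
}
```

For example, `probe_port server 22` prints one of `open`, `timeout`, or `refused`, where `server` is a placeholder for the node's address.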
### Recovery Steps

1. **Access console** (IPMI/VNC/physical)

2. **Run emergency restore**:
   ```bash
   cd ~/.claude/skills/node-hardening
   ./scripts/rollback/emergency_restore.sh
   ```

3. **If `emergency_restore.sh` fails**, manually restore:
   ```bash
   # Disable UFW
   sudo ufw --force disable

   # Restore SSH config
   sudo cp /path/to/outputs/backups/sshd_config.before /etc/ssh/sshd_config

   # Restart SSH (the unit is named ssh on Debian/Ubuntu)
   sudo systemctl restart ssh
   # or, where the unit is named sshd (e.g. RHEL-family systems)
   sudo systemctl restart sshd
   ```

4. **Verify from another terminal**:
   ```bash
   ssh user@server
   ```

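The manual restore in step 3 overwrites the current `sshd_config`, which discards the evidence of what went wrong. A minimal sketch of a variant that keeps a copy first; the function name `restore_sshd_config` and the `.broken` suffix are illustrative assumptions, not something the skill's scripts are known to provide:

```bash
#!/usr/bin/env bash
# Restore an sshd_config backup while keeping the current file for inspection.
# BACKUP is required; TARGET defaults to the system config path.
restore_sshd_config() {
  local backup="$1" target="${2:-/etc/ssh/sshd_config}"

  # Refuse to proceed if the backup is missing or empty.
  if [ ! -s "$backup" ]; then
    echo "backup missing or empty: $backup" >&2
    return 1
  fi

  # Keep the current (possibly broken) config so the failure can be diagnosed.
  if [ -f "$target" ]; then
    cp -p "$target" "${target}.broken" || return 1
  fi

  cp "$backup" "$target"
}
```

Run it as root against the real paths, then check the restored file with `sshd -t` (OpenSSH's configuration test mode) before restarting the service.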
## Scenario 2: Firewall Blocking All Traffic

### Symptoms
- All network services unreachable
- SSH, HTTP, and HTTPS all time out

### Recovery Steps

1. **Access console** (IPMI/VNC/physical)

2. **Disable UFW**:
   ```bash
   sudo ufw --force disable
   ```

3. **Verify that UFW is inactive** (the output should read `Status: inactive`):
   ```bash
   sudo ufw status verbose
   ```

4. **Restore from backup if available**:
   ```bash
   sudo iptables-restore < /path/to/outputs/backups/iptables_rules_before.txt
   ```

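Before feeding a backup to `iptables-restore` in step 4, it may be worth checking that the file has the expected shape. The sketch below assumes the backup was produced by `iptables-save` (the document does not say how it was created); the function name `looks_like_iptables_save` is illustrative:

```bash
#!/usr/bin/env bash
# Succeed if FILE looks like iptables-save output:
# non-empty, with at least one table header ("*filter", "*nat", ...)
# and at least one COMMIT line.
looks_like_iptables_save() {
  local file="$1"
  [ -s "$file" ] || return 1
  grep -q '^\*' "$file" && grep -q '^COMMIT$' "$file"
}

# Possible use at the console (path is the document's placeholder):
#   f=/path/to/outputs/backups/iptables_rules_before.txt
#   looks_like_iptables_save "$f" \
#     && sudo iptables-restore --test < "$f" \
#     && sudo iptables-restore < "$f"
```

The `--test` flag makes `iptables-restore` parse the ruleset without committing it, so a malformed file is rejected before any rule changes.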
## Scenario 3: fail2ban Blocking Legitimate Access

### Symptoms
- SSH works from some IPs but not others
- Intermittent connection failures

### Recovery Steps

1. **Check banned IPs**:
   ```bash
   sudo fail2ban-client status sshd
   ```

2. **Unban IP**:
   ```bash
   sudo fail2ban-client set sshd unbanip <IP_ADDRESS>
   ```

3. **Whitelist operator IP** in `/etc/fail2ban/jail.local`:
   ```ini
   [DEFAULT]
   ignoreip = 127.0.0.1/8 ::1 <OPERATOR_IP>
   ```

4. **Restart fail2ban**:
   ```bash
   sudo systemctl restart fail2ban
   ```

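Steps 1 and 2 can be combined so that an address is unbanned only if it is actually listed. This is a sketch that assumes the status output contains a `Banned IP list:` line followed by space-separated addresses, which may differ between fail2ban versions; the helper name `is_banned` is illustrative:

```bash
#!/usr/bin/env bash
# Read `fail2ban-client status <jail>` output on stdin and
# succeed if IP appears in the "Banned IP list:" line.
is_banned() {
  local ip="$1"
  sed -n 's/.*Banned IP list:[[:space:]]*//p' \
    | tr -s '[:space:]' '\n' \
    | grep -Fxq -- "$ip"
}

# Possible use (203.0.113.7 is a documentation placeholder address):
#   sudo fail2ban-client status sshd | is_banned 203.0.113.7 \
#     && sudo fail2ban-client set sshd unbanip 203.0.113.7
```

Matching whole lines with `grep -Fx` avoids treating `203.0.113.7` as a match for `203.0.113.70`.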
## Backup Locations

| File | Description |
|------|-------------|
| `outputs/backups/sshd_config.before` | Original SSH configuration |
| `outputs/backups/ufw_status_before.txt` | UFW state before changes |
| `outputs/backups/iptables_rules_before.txt` | iptables rules before changes |

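Since every recovery path depends on these files, a quick presence check before applying changes can save time later. A minimal sketch using the file names from the table; the function name `check_backups` and the default directory argument are illustrative:

```bash
#!/usr/bin/env bash
# Report any expected backup file that is missing or empty under DIR.
# Returns non-zero if at least one file is missing or empty.
check_backups() {
  local dir="${1:-outputs/backups}" missing=0 name
  for name in sshd_config.before ufw_status_before.txt iptables_rules_before.txt; do
    if [ ! -s "${dir}/${name}" ]; then
      echo "missing: ${dir}/${name}"
      missing=1
    fi
  done
  return "$missing"
}
```

Calling `check_backups /path/to/outputs/backups` prints one line per absent file and succeeds only when all three are present and non-empty.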
## Prevention

1. **Always keep a secondary SSH session open** during changes
2. **Test from a different network** before closing sessions
3. **Have console access ready** before running apply scripts
4. **Review plan output** before running apply

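The list above relies on the operator still having a way back in. In the same spirit, a timed automatic rollback can undo a change unless it is explicitly cancelled. The sketch below is an assumption-laden illustration, not part of the node-hardening scripts: `arm_rollback` and `disarm_rollback` are hypothetical helpers, and a scheduler such as `at` or `systemd-run` may be a more robust choice in practice.

```bash
#!/usr/bin/env bash
# Schedule COMMAND to run after DELAY seconds unless cancelled first.
# Prints the PID of the background timer. nohup keeps the timer alive
# if the SSH session that started it is dropped.
arm_rollback() {
  local delay="$1"; shift
  nohup bash -c 'sleep "$1" && shift && "$@"' _ "$delay" "$@" \
    </dev/null >/dev/null 2>&1 &
  echo "$!"
}

# Cancel a timer started by arm_rollback, given its PID.
disarm_rollback() {
  kill "$1" 2>/dev/null
}

# Possible use as root (the 300 second delay is an arbitrary example):
#   timer="$(arm_rollback 300 ufw --force disable)"
#   ... apply changes, confirm that a fresh SSH login works ...
#   disarm_rollback "$timer"
```

If the new SSH login never succeeds, the timer fires and disables UFW without anyone needing console access.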
## Contact

If recovery procedures fail, escalate to the infrastructure team with:
- Node name and IP
- Time of last successful access
- Changes that were applied
- Error messages from recovery attempts
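To keep escalations consistent, the four items above can be gathered into one plain-text summary. A minimal sketch in which every value is supplied by the operator; the function name `escalation_report` and the output layout are illustrative assumptions:

```bash
#!/usr/bin/env bash
# Print an escalation summary from caller-supplied values.
# Arguments: node name, node IP, time of last successful access,
# description of applied changes, file holding recovery error output.
escalation_report() {
  local node="$1" ip="$2" last_access="$3" changes="$4" errors_file="$5"
  printf 'Node: %s (%s)\n' "$node" "$ip"
  printf 'Last successful access: %s\n' "$last_access"
  printf 'Changes applied: %s\n' "$changes"
  printf 'Recovery errors:\n'
  if [ -s "$errors_file" ]; then
    # Indent captured error output under its heading.
    sed 's/^/  /' "$errors_file"
  else
    printf '  (none captured)\n'
  fi
}
```

Redirecting the output of each recovery command to a file during the attempt (for example with `2>&1 | tee -a recovery.log`) gives the last argument something to read.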