# Recovery Procedures

## Overview

This document describes recovery procedures for when node-hardening changes cause loss of remote access or system instability.

## Prerequisites

- Console access via IPMI, VNC, or physical connection
- Knowledge of backup file locations
- Root or sudo access

## Scenario 1: SSH Access Lost

### Symptoms
- Cannot SSH to the server
- Connection is refused or times out

### Recovery Steps

1. **Access console** (IPMI/VNC/physical)

2. **Run emergency restore**:
   ```bash
   cd ~/.claude/skills/node-hardening
   ./scripts/rollback/emergency_restore.sh
   ```

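3. **If `emergency_restore.sh` fails**, restore manually:
   ```bash
   # Disable UFW so the firewall cannot block SSH
   sudo ufw --force disable

   # Restore the original SSH config (adjust the path to your outputs directory)
   sudo cp /path/to/outputs/backups/sshd_config.before /etc/ssh/sshd_config

   # Validate the restored config before restarting the daemon
   sudo sshd -t

   # Restart SSH: the unit is `ssh` on Debian/Ubuntu and `sshd` on RHEL-family systems
   sudo systemctl restart ssh || sudo systemctl restart sshd
   ```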

4. **Verify from another machine** (not from the console session):
   ```bash
   ssh user@server
   ```
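
The manual restore has to restart either `ssh` or `sshd`, depending on the distribution. A minimal sketch, assuming a systemd host, that picks the unit name from `systemctl list-unit-files` output. `pick_ssh_unit` is a hypothetical helper for illustration and is not part of the node-hardening scripts:

```bash
# Hypothetical helper: reads `systemctl list-unit-files` lines on stdin and
# prints "ssh" or "sshd". Returns 1 if neither unit is listed.
pick_ssh_unit() {
    units=$(awk '{print $1}')   # first column is the unit file name
    if printf '%s\n' "$units" | grep -qx 'ssh\.service'; then
        echo ssh
    elif printf '%s\n' "$units" | grep -qx 'sshd\.service'; then
        echo sshd
    else
        return 1
    fi
}

# Usage on the console (needs root; left commented out on purpose):
#   unit=$(systemctl list-unit-files --type=service --no-legend | pick_ssh_unit) \
#     && sudo sshd -t && sudo systemctl restart "$unit"
```

Running `sshd -t` first means a broken config is reported before the restart rather than after it.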

## Scenario 2: Firewall Blocking All Traffic

### Symptoms
- All network services unreachable
- SSH, HTTP, and HTTPS all time out

### Recovery Steps

1. **Access console** (IPMI/VNC/physical)

2. **Disable UFW**:
   ```bash
   sudo ufw --force disable
   ```

3. **Confirm UFW is inactive** (expect `Status: inactive`):
   ```bash
   sudo ufw status verbose
   ```

4. **Restore iptables rules from backup, if available** (the file must be in `iptables-save` format):
   ```bash
   sudo iptables-restore < /path/to/outputs/backups/iptables_rules_before.txt
   ```
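
`iptables-restore` only accepts input in `iptables-save` format, so it can help to sanity-check the backup before loading it. A rough sketch, assuming the backup was written by `iptables-save`. `looks_like_iptables_save` is a hypothetical helper, and its check is a heuristic rather than a full parse:

```bash
# Hypothetical helper: succeeds if the file has at least one "*table" header
# and a COMMIT line, as files written by iptables-save do.
looks_like_iptables_save() {
    [ -r "$1" ] || return 1
    grep -q '^\*[a-z]' "$1" && grep -qx 'COMMIT' "$1"
}

# Usage on the console (needs root; left commented out on purpose):
#   backup=/path/to/outputs/backups/iptables_rules_before.txt
#   looks_like_iptables_save "$backup" \
#     && sudo iptables-restore --test "$backup" \
#     && sudo iptables-restore "$backup"
```

The `--test` pass parses the ruleset without committing it, so a malformed backup is rejected before the live rules are touched.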

## Scenario 3: fail2ban Blocking Legitimate Access

### Symptoms
- SSH works from some IPs but not others
- Intermittent connection failures

### Recovery Steps

1. **Check banned IPs**:
   ```bash
   sudo fail2ban-client status sshd
   ```

2. **Unban IP**:
   ```bash
   sudo fail2ban-client set sshd unbanip <IP_ADDRESS>
   ```

3. **Whitelist operator IP** in `/etc/fail2ban/jail.local`:
   ```ini
   [DEFAULT]
   ignoreip = 127.0.0.1/8 ::1 <OPERATOR_IP>
   ```

4. **Restart fail2ban**:
   ```bash
   sudo systemctl restart fail2ban
   ```
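
To confirm that a specific address is the one being blocked, the status output can be filtered instead of read by eye. A small sketch, assuming the usual `Banned IP list:` line in `fail2ban-client status <jail>` output. `is_banned` is a hypothetical helper:

```bash
# Hypothetical helper: $1 is the IP to look for; stdin is the output of
# `fail2ban-client status <jail>`. Succeeds only on an exact match.
is_banned() {
    sed -n 's/.*Banned IP list:[[:space:]]*//p' \
        | tr -s ' \t' '\n' \
        | grep -Fqx -- "$1"
}

# Usage (needs root; left commented out on purpose):
#   if sudo fail2ban-client status sshd | is_banned "<IP_ADDRESS>"; then
#       sudo fail2ban-client set sshd unbanip "<IP_ADDRESS>"
#   fi
```

Matching whole entries with `grep -Fx` avoids treating `203.0.113.5` as a hit for `203.0.113.50`.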

## Backup Locations

| File | Description |
|------|-------------|
| `outputs/backups/sshd_config.before` | Original SSH configuration |
| `outputs/backups/ufw_status_before.txt` | UFW state before changes |
| `outputs/backups/iptables_rules_before.txt` | iptables rules before changes |
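
Recovery depends on these files existing, so it is worth checking for them before relying on this runbook. A minimal sketch using the file names from the table above. `missing_backups` is a hypothetical helper, and the backup directory is passed in because its absolute location is not specified here:

```bash
# Hypothetical helper: $1 is the backup directory. Prints the name of each
# expected backup file that is missing or empty; prints nothing if all exist.
missing_backups() {
    for name in sshd_config.before ufw_status_before.txt iptables_rules_before.txt; do
        [ -s "$1/$name" ] || printf '%s\n' "$name"
    done
}

# Usage (left commented out on purpose):
#   missing_backups /path/to/outputs/backups
```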

## Prevention

1. **Always keep a secondary SSH session open** during changes
2. **Test from a different network** before closing sessions
3. **Have console access ready** before running apply scripts
4. **Review plan output** before running apply

## Contact

If recovery procedures fail, escalate to the infrastructure team with:
- Node name and IP
- Time of last successful access
- Changes that were applied
- Error messages from recovery attempts
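
The items above can be gathered into one block of text to paste into the escalation. A sketch only: `escalation_report` is a hypothetical helper, and the output layout is an assumption rather than a format the infrastructure team requires:

```bash
# Hypothetical helper. Arguments:
#   $1 node name, $2 node IP, $3 time of last successful access,
#   $4 summary of applied changes, $5 file holding recovery error messages
escalation_report() {
    printf 'Node: %s (%s)\n' "$1" "$2"
    printf 'Last successful access: %s\n' "$3"
    printf 'Changes applied: %s\n' "$4"
    printf 'Recovery errors:\n'
    if [ -r "$5" ]; then
        sed 's/^/  /' "$5"          # indent each captured error line
    else
        printf '  (none captured)\n'
    fi
}

# Usage (left commented out on purpose):
#   escalation_report "$(hostname)" "<NODE_IP>" "<LAST_ACCESS_TIME>" \
#       "<CHANGES_APPLIED>" /tmp/recovery_errors.log
```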