Initial commit: VaultMesh Skills collection

Collection of operational skills for VaultMesh infrastructure including:

- backup-sovereign: Backup and recovery operations
- btc-anchor: Bitcoin anchoring
- cloudflare-tunnel-manager: Cloudflare tunnel management
- container-registry: Container registry operations
- disaster-recovery: Disaster recovery procedures
- dns-sovereign: DNS management
- eth-anchor: Ethereum anchoring
- gitea-bootstrap: Gitea setup and configuration
- hetzner-bootstrap: Hetzner server provisioning
- merkle-forest: Merkle tree operations
- node-hardening: Node security hardening
- operator-bootstrap: Operator initialization
- proof-verifier: Cryptographic proof verification
- rfc3161-anchor: RFC3161 timestamping
- secrets-vault: Secrets management

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-27 00:25:00 +00:00
commit eac77ef7b4
213 changed files with 11724 additions and 0 deletions
--- a/node-hardening/references/recovery_procedures.md
+++ b/node-hardening/references/recovery_procedures.md
@@ -0,0 +1,123 @@
+# Recovery Procedures
+
+## Overview
+
+This document describes how to recover when node-hardening changes cause loss of remote access or system instability.
+
+## Prerequisites
+
+- Console access via IPMI, VNC, or physical connection
+- Knowledge of backup file locations
+- Root or sudo access
+
+## Scenario 1: SSH Access Lost
+
+### Symptoms
+- Cannot SSH to the server
+- Connection is refused or times out
+
+### Recovery Steps
+
+1. **Access console** (IPMI/VNC/physical)
+
+2. **Run emergency restore**:
+   ```bash
+   cd ~/.claude/skills/node-hardening
+   ./scripts/rollback/emergency_restore.sh
+   ```
+
+3. **If emergency_restore fails**, manually restore:
+   ```bash
+   # Disable UFW
+   sudo ufw --force disable
+
+   # Restore SSH config
+   sudo cp /path/to/outputs/backups/sshd_config.before /etc/ssh/sshd_config
+
+   # Restart SSH (the unit is named "ssh" on Debian/Ubuntu)
+   sudo systemctl restart ssh
+   # or, where the unit is named "sshd" (e.g. RHEL, Fedora)
+   sudo systemctl restart sshd
+   ```
+
+4. **Verify remote access from a terminal on another machine**:
+   ```bash
+   ssh user@server
+   ```
+
+## Scenario 2: Firewall Blocking All Traffic
+
+### Symptoms
+- All network services unreachable
+- SSH, HTTP, and HTTPS all time out
+
+### Recovery Steps
+
+1. **Access console** (IPMI/VNC/physical)
+
+2. **Disable UFW**:
+   ```bash
+   sudo ufw --force disable
+   ```
+
+3. **Confirm UFW reports inactive**:
+   ```bash
+   sudo ufw status verbose
+   ```
+
+4. **Restore from backup if available**:
+   ```bash
+   sudo sh -c 'iptables-restore < /path/to/outputs/backups/iptables_rules_before.txt'
+   ```
+
+## Scenario 3: fail2ban Blocking Legitimate Access
+
+### Symptoms
+- SSH works from some IPs but not others
+- Intermittent connection failures
+
+### Recovery Steps
+
+1. **Check banned IPs**:
+   ```bash
+   sudo fail2ban-client status sshd
+   ```
+
+2. **Unban IP**:
+   ```bash
+   sudo fail2ban-client set sshd unbanip <IP_ADDRESS>
+   ```
+
+3. **Whitelist operator IP** in `/etc/fail2ban/jail.local`:
+   ```ini
+   [DEFAULT]
+   ignoreip = 127.0.0.1/8 ::1 <OPERATOR_IP>
+   ```
+
+4. **Restart fail2ban**:
+   ```bash
+   sudo systemctl restart fail2ban
+   ```
+
+## Backup Locations
+
+| File | Description |
+|------|-------------|
+| `outputs/backups/sshd_config.before` | Original SSH configuration |
+| `outputs/backups/ufw_status_before.txt` | UFW state before changes |
+| `outputs/backups/iptables_rules_before.txt` | iptables rules before changes |
+
+## Prevention
+
+1. **Always keep a secondary SSH session open** during changes
+2. **Test from a different network** before closing sessions
+3. **Have console access ready** before running apply scripts
+4. **Review plan output** before running apply
+
+## Contact
+
+If recovery procedures fail, escalate to the infrastructure team with:
+- Node name and IP
+- Time of last successful access
+- Changes that were applied
+- Error messages from recovery attempts