Initial commit: VaultMesh Skills collection
Collection of operational skills for VaultMesh infrastructure including:
- backup-sovereign: Backup and recovery operations
- btc-anchor: Bitcoin anchoring
- cloudflare-tunnel-manager: Cloudflare tunnel management
- container-registry: Container registry operations
- disaster-recovery: Disaster recovery procedures
- dns-sovereign: DNS management
- eth-anchor: Ethereum anchoring
- gitea-bootstrap: Gitea setup and configuration
- hetzner-bootstrap: Hetzner server provisioning
- merkle-forest: Merkle tree operations
- node-hardening: Node security hardening
- operator-bootstrap: Operator initialization
- proof-verifier: Cryptographic proof verification
- rfc3161-anchor: RFC3161 timestamping
- secrets-vault: Secrets management

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
123
node-hardening/references/recovery_procedures.md
Normal file
@@ -0,0 +1,123 @@
# Recovery Procedures

## Overview

This document describes how to recover a node when node-hardening changes cause loss of remote access or system instability.

## Prerequisites

- Console access via IPMI, VNC, or physical connection
- Knowledge of backup file locations
- Root or sudo access

## Scenario 1: SSH Access Lost

### Symptoms

- Cannot SSH to the server
- Connection is refused or times out

### Recovery Steps

1. **Access the console** (IPMI/VNC/physical)

2. **Run the emergency restore**:
   ```bash
   cd ~/.claude/skills/node-hardening
   ./scripts/rollback/emergency_restore.sh
   ```

3. **If `emergency_restore.sh` fails**, restore manually:
   ```bash
   # Disable UFW so the firewall cannot block SSH
   sudo ufw --force disable

   # Restore the SSH config saved before hardening
   sudo cp /path/to/outputs/backups/sshd_config.before /etc/ssh/sshd_config

   # Check the restored config for syntax errors before restarting
   sudo sshd -t

   # Restart SSH: the unit is usually "ssh" on Debian/Ubuntu and "sshd" elsewhere
   sudo systemctl restart ssh || sudo systemctl restart sshd
   ```

4. **Verify SSH access from another machine**:
   ```bash
   ssh user@server
   ```

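The manual restore in step 3 overwrites the live `sshd_config`, which discards the failing configuration that may be needed for later diagnosis. The following is a minimal sketch of a safer copy, assuming a hypothetical helper named `restore_config` that is not part of the node-hardening skill; the backup path in the usage comment is the same placeholder used above.

```bash
# Sketch only: restore_config is a hypothetical helper, not part of the skill.
#
# restore_config BACKUP TARGET
# Saves the current TARGET as TARGET.broken, then copies BACKUP over TARGET.
# Refuses to touch TARGET if BACKUP is missing or empty.
restore_config() {
    backup="$1"
    target="$2"
    if [ ! -s "$backup" ]; then
        echo "backup missing or empty: $backup" >&2
        return 1
    fi
    # Keep the failing config so it can be inspected after access is restored.
    if [ -e "$target" ]; then
        cp -p "$target" "$target.broken" || return 1
    fi
    cp "$backup" "$target"
}

# Usage (as root, from the console):
#   restore_config /path/to/outputs/backups/sshd_config.before /etc/ssh/sshd_config
#   sshd -t && { systemctl restart ssh || systemctl restart sshd; }
```

Keeping the `.broken` copy makes it possible to diff the two files afterwards and see which hardening change caused the lockout.
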
## Scenario 2: Firewall Blocking All Traffic

### Symptoms

- All network services are unreachable
- SSH, HTTP, and HTTPS all time out

### Recovery Steps

1. **Access the console** (IPMI/VNC/physical)

2. **Disable UFW**:
   ```bash
   sudo ufw --force disable
   ```

3. **Confirm UFW is inactive**:
   ```bash
   sudo ufw status verbose
   ```
   The output should read `Status: inactive`.

4. **Restore the pre-change rules from backup, if available**:
   ```bash
   sudo iptables-restore < /path/to/outputs/backups/iptables_rules_before.txt
   ```

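`iptables-restore` expects input in the format written by `iptables-save`, so it can help to check the backup file before loading it. The sketch below assumes the backup was captured with `iptables-save`, which the source does not state; `looks_like_iptables_save` is a hypothetical helper name.

```bash
# Sketch only: looks_like_iptables_save is a hypothetical helper.
#
# looks_like_iptables_save FILE
# Returns 0 if FILE is readable and contains at least one "*<table>" line
# and one "COMMIT" line, as iptables-save output does.
looks_like_iptables_save() {
    [ -r "$1" ] || return 1
    grep -q '^\*[a-z]' "$1" && grep -q '^COMMIT$' "$1"
}

# Usage (the path is the same placeholder used above):
#   f=/path/to/outputs/backups/iptables_rules_before.txt
#   looks_like_iptables_save "$f" && sudo iptables-restore --test < "$f"
```

The `--test` flag makes `iptables-restore` parse the ruleset without committing it, so a malformed file is caught before any live rules change.
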
## Scenario 3: fail2ban Blocking Legitimate Access

### Symptoms

- SSH works from some IPs but not from others
- Intermittent connection failures

### Recovery Steps

1. **Check banned IPs**:
   ```bash
   sudo fail2ban-client status sshd
   ```

2. **Unban the affected IP**:
   ```bash
   sudo fail2ban-client set sshd unbanip <IP_ADDRESS>
   ```

3. **Whitelist the operator IP** in `/etc/fail2ban/jail.local`:
   ```ini
   [DEFAULT]
   ignoreip = 127.0.0.1/8 ::1 <OPERATOR_IP>
   ```

4. **Restart fail2ban** to apply the change:
   ```bash
   sudo systemctl restart fail2ban
   ```

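When the banned list is long, the check in step 1 can be scripted. This is a minimal sketch, assuming the status output contains a line of the form `Banned IP list: <ip> <ip> ...` with space-separated addresses; `is_banned` is a hypothetical helper name.

```bash
# Sketch only: is_banned is a hypothetical helper.
#
# is_banned IP
# Reads `fail2ban-client status <jail>` output on stdin and returns 0 if IP
# appears as a whole field on the "Banned IP list:" line, 1 otherwise.
is_banned() {
    awk -v ip="$1" '
        /Banned IP list:/ {
            # Drop everything up to the label; awk then re-splits the fields.
            sub(/.*Banned IP list:/, "")
            for (i = 1; i <= NF; i++) if ($i == ip) found = 1
        }
        END { exit !found }
    '
}

# Usage:
#   sudo fail2ban-client status sshd | is_banned <IP_ADDRESS> && echo "banned"
```

Comparing whole fields rather than substrings avoids a false match such as `203.0.113.5` against `203.0.113.50`.
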
## Backup Locations

| File | Description |
|------|-------------|
| `outputs/backups/sshd_config.before` | Original SSH configuration |
| `outputs/backups/ufw_status_before.txt` | UFW state before changes |
| `outputs/backups/iptables_rules_before.txt` | iptables rules before changes |

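Recovery depends on these files existing, so it may be worth confirming that before applying any hardening changes. The sketch below checks the three file names from the table in a given backup directory; `missing_backups` is a hypothetical helper, and the directory argument is an assumption because the source does not give the full path.

```bash
# Sketch only: missing_backups is a hypothetical helper.
#
# missing_backups DIR
# Prints the name of each expected backup file that is absent or empty in DIR.
# Returns 0 if all three are present, 1 otherwise.
missing_backups() {
    dir="$1"
    rc=0
    for f in sshd_config.before ufw_status_before.txt iptables_rules_before.txt; do
        if [ ! -s "$dir/$f" ]; then
            echo "$f"
            rc=1
        fi
    done
    return "$rc"
}

# Usage:
#   missing_backups /path/to/outputs/backups || echo "do not apply yet" >&2
```
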
## Prevention

1. **Always keep a secondary SSH session open** during changes
2. **Test from a different network** before closing sessions
3. **Have console access ready** before running apply scripts
4. **Review plan output** before running apply

## Contact

If the recovery procedures fail, escalate to the infrastructure team with:

- Node name and IP
- Time of last successful access
- Changes that were applied
- Error messages from recovery attempts

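The escalation details above can be gathered into a consistent text block from the console. This is a minimal sketch, assuming a hypothetical helper named `escalation_report`; the node IP, last access time, and applied changes are passed in by the operator, and error messages still need to be attached by hand.

```bash
# Sketch only: escalation_report is a hypothetical helper.
#
# escalation_report NODE_IP LAST_ACCESS CHANGES
# Prints a five-line summary to paste into the escalation message.
escalation_report() {
    printf 'Node name: %s\n' "$(uname -n)"
    printf 'Node IP: %s\n' "$1"
    printf 'Last successful access: %s\n' "$2"
    printf 'Changes applied: %s\n' "$3"
    printf 'Report generated: %s\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')"
}

# Usage:
#   escalation_report <NODE_IP> "<LAST_ACCESS_TIME>" "<CHANGES_APPLIED>"
```
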