# Docker TLS Security Setup

This directory contains scripts and configuration for securing Docker API access with TLS authentication, replacing the insecure socket mounting approach.

## Overview

Previously, the session-manager service mounted the Docker socket (`/var/run/docker.sock`) directly into containers, granting full root access to the host Docker daemon. This is a critical security vulnerability.

This setup replaces socket mounting with authenticated TLS API access over the network.

## Security Benefits

- ✅ **No socket mounting**: Eliminates privilege escalation risk
- ✅ **Mutual TLS authentication**: Both client and server authenticate
- ✅ **Encrypted communication**: All API calls are encrypted
- ✅ **Certificate-based access**: Granular access control
- ✅ **Network isolation**: API access is network-bound, not filesystem-bound

## Docker Service Abstraction

The session-manager now uses a `DockerService` abstraction layer that separates Docker operations from business logic, which makes testing, maintenance, and future Docker client changes easier.

### Architecture Benefits

- 🧪 **Testability**: MockDockerService enables testing without Docker daemon
- 🔧 **Maintainability**: Clean separation of concerns
- 🔄 **Flexibility**: Easy to swap Docker client implementations
- 📦 **Dependency Injection**: SessionManager receives DockerService via constructor
- ⚡ **Performance**: Both async and sync Docker operations supported

### Service Interface

```python
from typing import List, Optional

# Method signatures only; ContainerInfo is the session-manager's own type.
class DockerService:
    async def create_container(self, name: str, image: str, **kwargs) -> "ContainerInfo": ...
    async def start_container(self, container_id: str) -> None: ...
    async def stop_container(self, container_id: str, timeout: int = 10) -> None: ...
    async def remove_container(self, container_id: str, force: bool = False) -> None: ...
    async def get_container_info(self, container_id: str) -> Optional["ContainerInfo"]: ...
    async def list_containers(self, all: bool = False) -> List["ContainerInfo"]: ...
    async def ping(self) -> bool: ...
```

### Testing

Run the comprehensive test suite:

```bash
# Test Docker service abstraction
./docker/scripts/test-docker-service.py

# Results: 7/7 tests passed ✅
# - Service Interface ✅
# - Error Handling ✅
# - Async vs Sync Modes ✅
# - Container Info Operations ✅
# - Context Management ✅
# - Integration Patterns ✅
# - Performance and Scaling ✅
```

### Usage in SessionManager

```python
# Dependency injection pattern
session_manager = SessionManager(docker_service=DockerService(use_async=True))

# Or with mock for testing
test_manager = SessionManager(docker_service=MockDockerService())
```
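
To illustrate why the injected interface makes testing possible without a daemon, here is a minimal sketch of an in-memory stand-in. `InMemoryDockerService` and the fields of `ContainerInfo` are hypothetical names chosen for this example; the project's actual `MockDockerService` may look quite different.

```python
import asyncio
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ContainerInfo:
    # Hypothetical shape; the real ContainerInfo may carry more fields.
    container_id: str
    name: str
    image: str
    status: str = "created"


class InMemoryDockerService:
    """Illustrative stand-in that mirrors the DockerService method names."""

    def __init__(self) -> None:
        self._containers: Dict[str, ContainerInfo] = {}
        self._ids = itertools.count(1)

    async def create_container(self, name: str, image: str, **kwargs) -> ContainerInfo:
        info = ContainerInfo(container_id=f"mock-{next(self._ids)}", name=name, image=image)
        self._containers[info.container_id] = info
        return info

    async def start_container(self, container_id: str) -> None:
        self._containers[container_id].status = "running"

    async def stop_container(self, container_id: str, timeout: int = 10) -> None:
        self._containers[container_id].status = "exited"

    async def remove_container(self, container_id: str, force: bool = False) -> None:
        self._containers.pop(container_id, None)

    async def get_container_info(self, container_id: str) -> Optional[ContainerInfo]:
        return self._containers.get(container_id)

    async def list_containers(self, all: bool = False) -> List[ContainerInfo]:
        # Like `docker ps`, only running containers are listed unless all=True.
        return [c for c in self._containers.values() if all or c.status == "running"]

    async def ping(self) -> bool:
        return True


async def demo() -> List[str]:
    service = InMemoryDockerService()
    info = await service.create_container("session-1", "example/image:latest")
    await service.start_container(info.container_id)
    return [c.name for c in await service.list_containers()]
```

Any object with these coroutine methods can be passed as `docker_service`, so business logic can be exercised with `asyncio.run(demo())`-style tests and no Docker daemon.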

## File Structure

```
docker/
├── certs/                      # Generated TLS certificates (not in git)
├── scripts/
│   ├── generate-certs.sh       # Certificate generation script
│   ├── setup-docker-tls.sh     # Docker daemon TLS configuration
│   └── test-tls-connection.py  # Connection testing script
├── daemon.json                 # Docker daemon TLS configuration
└── .env.example                # Environment configuration template
```

## Quick Start

### 1. Generate TLS Certificates

```bash
# Generate certificates for development
DOCKER_ENV=development ./docker/scripts/generate-certs.sh

# Or for production with custom settings
DOCKER_ENV=production \
DOCKER_HOST_IP=your-server-ip \
DOCKER_HOST_NAME=your-docker-host \
./docker/scripts/generate-certs.sh
```

### 2. Configure Docker Daemon

**For local development (Docker Desktop):**
```bash
# Certificates are automatically mounted in docker-compose.yml
docker-compose up -d
```

**For production/server setup:**
```bash
# Configure system Docker daemon with TLS
sudo ./docker/scripts/setup-docker-tls.sh
```

### 3. Configure Environment

```bash
# Copy and customize environment file
cp docker/.env.example .env

# Edit .env with your settings
# DOCKER_HOST_IP=host.docker.internal  # for Docker Desktop
# DOCKER_HOST_IP=your-server-ip        # for production
```

### 4. Test Configuration

```bash
# Test TLS connection
./docker/scripts/test-tls-connection.py

# Start services
docker-compose --env-file .env up -d session-manager

# Check logs
docker-compose logs session-manager
```

## Configuration Options

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `DOCKER_TLS_VERIFY` | `1` | Enable TLS verification |
| `DOCKER_CERT_PATH` | `./docker/certs` | Certificate directory path |
| `DOCKER_HOST` | `tcp://host.docker.internal:2376` | Docker daemon endpoint |
| `DOCKER_TLS_PORT` | `2376` | TLS port for Docker API |
| `DOCKER_CA_CERT` | `./docker/certs/ca.pem` | CA certificate path |
| `DOCKER_CLIENT_CERT` | `./docker/certs/client-cert.pem` | Client certificate path |
| `DOCKER_CLIENT_KEY` | `./docker/certs/client-key.pem` | Client key path |
| `DOCKER_HOST_IP` | `host.docker.internal` | Docker host IP address or hostname |
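
As a rough sketch of how client code might resolve these variables, the function below reads them with the defaults from the table. `DockerTLSSettings` and `load_tls_settings` are hypothetical names, and deriving the individual certificate paths from `DOCKER_CERT_PATH` when they are unset is an assumption, not documented behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DockerTLSSettings:
    host: str
    verify: bool
    ca_cert: str
    client_cert: str
    client_key: str


def load_tls_settings(env: Mapping[str, str] = os.environ) -> DockerTLSSettings:
    """Resolve TLS settings from the environment, using the documented defaults."""
    # Assumption: per-file paths fall back to files inside DOCKER_CERT_PATH.
    cert_dir = env.get("DOCKER_CERT_PATH", "./docker/certs")
    return DockerTLSSettings(
        host=env.get("DOCKER_HOST", "tcp://host.docker.internal:2376"),
        verify=env.get("DOCKER_TLS_VERIFY", "1") == "1",
        ca_cert=env.get("DOCKER_CA_CERT", f"{cert_dir}/ca.pem"),
        client_cert=env.get("DOCKER_CLIENT_CERT", f"{cert_dir}/client-cert.pem"),
        client_key=env.get("DOCKER_CLIENT_KEY", f"{cert_dir}/client-key.pem"),
    )
```

Passing the environment as a mapping keeps the function easy to test with a plain dictionary.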

### Certificate Generation Options

| Variable | Default | Description |
|----------|---------|-------------|
| `DOCKER_ENV` | `development` | Environment name for certificates |
| `DOCKER_HOST_IP` | `127.0.0.1` | IP address for server certificate |
| `DOCKER_HOST_NAME` | `localhost` | Hostname for server certificate |
| `DAYS` | `3650` | Certificate validity in days |

## Production Deployment

### Certificate Management

1. **Generate certificates on a secure machine**
2. **Distribute to servers securely** (SCP, Ansible, etc.)
3. **Set proper permissions**:
   ```bash
   chmod 444 /etc/docker/certs/*.pem      # certs readable by all
   chmod 400 /etc/docker/certs/*-key.pem  # keys readable by root only
   ```
4. **Rotate certificates regularly** (every 6-12 months)
5. **Revoke compromised certificates** and regenerate
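
The permission step can be verified programmatically. The following is a small sketch, assuming the `*-key.pem` naming convention used above; `find_permission_problems` is a hypothetical helper, not part of the repository scripts.

```python
import os
import stat
from pathlib import Path
from typing import List


def find_permission_problems(cert_dir: str) -> List[str]:
    """Report .pem files whose mode differs from 444 (certs) or 400 (keys)."""
    problems = []
    for pem in sorted(Path(cert_dir).glob("*.pem")):
        mode = stat.S_IMODE(pem.stat().st_mode)
        expected = 0o400 if pem.name.endswith("-key.pem") else 0o444
        if mode != expected:
            problems.append(f"{pem.name}: mode {mode:o}, expected {expected:o}")
    return problems
```

An empty list means every certificate and key in the directory has the mode recommended in step 3.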

### Docker Daemon Configuration

For production servers, use the `setup-docker-tls.sh` script or manually configure `/etc/docker/daemon.json`:

```json
{
  "tls": true,
  "tlsverify": true,
  "tlscacert": "/etc/docker/certs/ca.pem",
  "tlscert": "/etc/docker/certs/server-cert.pem",
  "tlskey": "/etc/docker/certs/server-key.pem",
  "hosts": ["tcp://0.0.0.0:2376"],
  "iptables": false,
  "bridge": "none",
  "live-restore": true,
  "userland-proxy": false,
  "no-new-privileges": true
}
```

### Security Hardening

- **Firewall**: Only allow TLS port (2376) from trusted networks
- **TLS version**: Ensure the daemon and clients negotiate a modern TLS version (TLS 1.3 where supported)
- **Certificate pinning**: Consider certificate pinning in client code
- **Monitoring**: Log and monitor Docker API access
- **Rate limiting**: Implement API rate limiting

## Troubleshooting

### Common Issues

**"Connection refused"**
- Check if Docker daemon is running with TLS
- Verify `DOCKER_HOST` points to correct endpoint
- Ensure firewall allows port 2376

**"TLS handshake failed"**
- Verify certificates exist and have correct permissions
- Check certificate validity dates
- Ensure CA certificate is correct

**"Permission denied"**
- Check certificate file permissions (444 for certs, 400 for keys)
- Ensure client certificate is signed by the CA

### Debug Commands

```bash
# Test TLS connection manually
docker --tlsverify \
  --tlscacert=./docker/certs/ca.pem \
  --tlscert=./docker/certs/client-cert.pem \
  --tlskey=./docker/certs/client-key.pem \
  -H tcp://host.docker.internal:2376 \
  version

# Check certificate validity
openssl x509 -in ./docker/certs/server-cert.pem -text -noout

# Test from container
docker-compose exec session-manager ./docker/scripts/test-tls-connection.py
```

## Migration from Socket Mounting

### Before (Insecure)
```yaml
volumes:
  - /var/run/docker.sock:/var/run/docker.sock
```

### After (Secure)
```yaml
volumes:
  - ./docker/certs:/etc/docker/certs:ro
environment:
  - DOCKER_TLS_VERIFY=1
  - DOCKER_HOST=tcp://host.docker.internal:2376
```

### Code Changes Required

Update Docker client initialization:
```python
import os

import docker

# Before
self.docker_client = docker.from_env()

# After
tls_config = docker.tls.TLSConfig(
    ca_cert=os.getenv('DOCKER_CA_CERT'),
    client_cert=(os.getenv('DOCKER_CLIENT_CERT'), os.getenv('DOCKER_CLIENT_KEY')),
    verify=True
)
# Build the client directly against the TLS endpoint rather than calling
# docker.from_env() and then replacing its low-level API object.
self.docker_client = docker.DockerClient(
    base_url=os.getenv('DOCKER_HOST'),
    tls=tls_config
)
```

## Dynamic Host IP Detection

The session-manager service now includes robust host IP detection to support proxy routing across different Docker environments:

### Supported Environments

- **Docker Desktop (Mac/Windows)**: Uses `host.docker.internal` resolution
- **Linux Docker**: Reads gateway from `/proc/net/route`
- **Cloud environments**: Respects `DOCKER_HOST_GATEWAY` and `GATEWAY` environment variables
- **Custom networks**: Tests connectivity to common Docker gateway IPs

### Detection Methods (in priority order)

1. **Docker Internal**: Resolves `host.docker.internal` (Docker Desktop)
2. **Environment Variables**: Checks `HOST_IP`, `DOCKER_HOST_GATEWAY`, `GATEWAY`
3. **Route Table**: Parses `/proc/net/route` for default gateway
4. **Network Connection**: Tests connectivity to determine local routing
5. **Common Gateways**: Falls back to known Docker bridge IPs
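
The route-table step (method 3) can be sketched as follows. This is an illustration, not the service's actual code: `default_gateway_from_route_table` is a hypothetical name, and the decoding assumes a little-endian host (x86, most ARM), where `/proc/net/route` stores addresses as byte-swapped hexadecimal.

```python
import socket
import struct
from typing import Optional

RTF_GATEWAY = 0x2  # Linux route flag: the route goes through a gateway


def default_gateway_from_route_table(route_table: str) -> Optional[str]:
    """Return the default gateway from /proc/net/route content, if any."""
    for line in route_table.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        destination, gateway, flags = fields[1], fields[2], int(fields[3], 16)
        # The default route has destination 0.0.0.0 and the gateway flag set.
        if destination == "00000000" and flags & RTF_GATEWAY:
            return socket.inet_ntoa(struct.pack("<L", int(gateway, 16)))
    return None


SAMPLE = (
    "Iface\tDestination\tGateway\tFlags\tRefCnt\tUse\tMetric\tMask\tMTU\tWindow\tIRTT\n"
    "eth0\t00000000\t010011AC\t0003\t0\t0\t0\t00000000\t0\t0\t0\n"
    "eth0\t000011AC\t00000000\t0001\t0\t0\t0\t0000FFFF\t0\t0\t0\n"
)
```

With the sample above, the gateway field `010011AC` decodes to `172.17.0.1`, the usual address of the default Docker bridge. On a real Linux host the input would come from reading `/proc/net/route`.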

### Configuration

The detection is automatic and cached for 5 minutes. Override with:

```bash
# Force specific host IP
export HOST_IP=192.168.1.100

# Or in docker-compose.yml
environment:
  - HOST_IP=your-host-ip
```

### Testing

```bash
# Test host IP detection
./docker/scripts/test-host-ip-detection.py

# Run integration test
./docker/scripts/test-integration.sh
```

### Troubleshooting

**"Could not detect Docker host IP"**
- Check network configuration: `docker network inspect bridge`
- Verify environment variables
- Test connectivity: `ping host.docker.internal`
- Set explicit `HOST_IP` if needed

**Proxy routing fails**
- Verify detected IP is accessible from containers
- Check firewall rules blocking container-to-host traffic
- Ensure Docker network allows communication

## Structured Logging

Comprehensive logging infrastructure with structured JSON logs, request tracking, and production-ready log management for debugging and monitoring.

### Log Features

- **Structured JSON Logs**: Machine-readable logs for production analysis
- **Request ID Tracking**: Trace requests across distributed operations
- **Human-Readable Development**: Clear logs for local development
- **Performance Metrics**: Built-in request timing and performance tracking
- **Security Event Logging**: Audit trail for security-related events
- **Log Rotation**: Automatic log rotation with size limits

### Configuration

```bash
# Log level and format
export LOG_LEVEL=INFO   # DEBUG, INFO, WARNING, ERROR, CRITICAL
export LOG_FORMAT=auto  # json, human, auto (detects environment)

# File logging
export LOG_FILE=/var/log/lovdata-chat.log
export LOG_MAX_SIZE_MB=10  # Max log file size
export LOG_BACKUP_COUNT=5  # Number of backup files

# Output control
export LOG_CONSOLE=true       # Enable console logging
export LOG_FILE_ENABLED=true  # Enable file logging
```

### Testing Structured Logging

```bash
# Test logging functionality and formatters
./docker/scripts/test-structured-logging.py
```

### Log Analysis

**JSON Format (Production):**
```json
{
  "timestamp": "2024-01-15T10:30:45.123Z",
  "level": "INFO",
  "logger": "session_manager.main",
  "message": "Session created successfully",
  "request_id": "req-abc123",
  "session_id": "ses-xyz789",
  "operation": "create_session",
  "duration_ms": 245.67
}
```

**Human-Readable Format (Development):**
```
2024-01-15 10:30:45 [INFO ] session_manager.main:create_session:145 [req-abc123] - Session created successfully
```

### Request Tracing

All logs include request IDs for tracing operations across the system:

```python
with RequestContext():
    log_session_operation(session_id, "created")
    # All subsequent logs in this context include request_id
```

## Database Persistence

Session data is now stored in PostgreSQL for reliability, multi-instance deployment support, and elimination of JSON file corruption vulnerabilities.

### Database Configuration

```bash
# PostgreSQL connection settings
export DB_HOST=localhost      # Database host
export DB_PORT=5432           # Database port
export DB_USER=lovdata        # Database user
export DB_PASSWORD=password   # Database password (use a strong secret in production)
export DB_NAME=lovdata_chat   # Database name

# Connection pool settings
export DB_MIN_CONNECTIONS=5            # Minimum pool connections
export DB_MAX_CONNECTIONS=20           # Maximum pool connections
export DB_MAX_QUERIES=50000            # Max queries per connection
export DB_MAX_INACTIVE_LIFETIME=300.0  # Max idle connection lifetime (seconds)
```
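
A sketch of how these variables might be gathered into pool arguments is shown below. The keyword names (`min_size`, `max_size`, `max_queries`, `max_inactive_connection_lifetime`) follow asyncpg's `create_pool` parameters, which is an assumption since the driver is not named here; `load_pool_settings` is hypothetical, and the fallback values simply mirror the example exports above.

```python
import os
from typing import Any, Dict, Mapping


def load_pool_settings(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Collect connection and pool settings from the environment."""
    settings: Dict[str, Any] = {
        "host": env.get("DB_HOST", "localhost"),
        "port": int(env.get("DB_PORT", "5432")),
        "user": env.get("DB_USER", "lovdata"),
        "password": env.get("DB_PASSWORD"),  # no fallback: must be supplied
        "database": env.get("DB_NAME", "lovdata_chat"),
        "min_size": int(env.get("DB_MIN_CONNECTIONS", "5")),
        "max_size": int(env.get("DB_MAX_CONNECTIONS", "20")),
        "max_queries": int(env.get("DB_MAX_QUERIES", "50000")),
        "max_inactive_connection_lifetime": float(
            env.get("DB_MAX_INACTIVE_LIFETIME", "300.0")
        ),
    }
    # Catch an inverted pool range before any connection is attempted.
    if settings["min_size"] > settings["max_size"]:
        raise ValueError("DB_MIN_CONNECTIONS must not exceed DB_MAX_CONNECTIONS")
    return settings
```

Validating the pool bounds at startup turns a misconfiguration into an immediate, readable error.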

### Storage Backend Selection

```bash
# Enable database storage (recommended for production)
export USE_DATABASE_STORAGE=true

# Or use JSON file storage (legacy/development)
export USE_DATABASE_STORAGE=false
```

### Database Schema

**Sessions Table:**
- `session_id` (VARCHAR, Primary Key): Unique session identifier
- `container_name` (VARCHAR): Docker container name
- `container_id` (VARCHAR): Docker container ID
- `host_dir` (VARCHAR): Host directory path
- `port` (INTEGER): Container port
- `auth_token` (VARCHAR): Authentication token
- `created_at` (TIMESTAMP): Creation timestamp
- `last_accessed` (TIMESTAMP): Last access timestamp
- `status` (VARCHAR): Session status (creating, running, stopped, error)
- `metadata` (JSONB): Additional session metadata

**Indexes:**
- Primary key on `session_id`
- Status index for filtering active sessions
- Last accessed index for cleanup operations
- Created at index for session listing

### Testing Database Persistence

```bash
# Test database connection and operations
./docker/scripts/test-database-persistence.py
```

### Health Monitoring

The `/health` endpoint now includes database status:

```json
{
  "storage_backend": "database",
  "database": {
    "status": "healthy",
    "total_sessions": 15,
    "active_sessions": 8,
    "database_size": "25 MB"
  }
}
```

### Migration Strategy

**From JSON File to Database:**
1. **Backup existing sessions** (if any)
2. **Set environment variables** for database connection
3. **Enable database storage**: `USE_DATABASE_STORAGE=true`
4. **Restart service** - automatic schema creation and migration
5. **Verify data migration** in health endpoint
6. **Monitor performance** and adjust connection pool settings

**Backward Compatibility:**
- JSON file storage remains available for development
- Automatic fallback if database is unavailable
- Zero-downtime migration possible

## Container Health Monitoring

Active monitoring of Docker containers with automatic failure detection and recovery mechanisms to prevent stuck sessions and improve system reliability.

### Health Monitoring Features

- **Periodic Health Checks**: Continuous monitoring of running containers every 30 seconds
- **Automatic Failure Detection**: Identifies unhealthy or failed containers
- **Smart Restart Logic**: Automatic container restart with configurable limits
- **Health History Tracking**: Maintains health check history for analysis
- **Status Integration**: Updates session status based on container health

### Configuration

```bash
# Health check intervals and timeouts
CONTAINER_HEALTH_CHECK_INTERVAL=30  # Check every 30 seconds
CONTAINER_HEALTH_TIMEOUT=10.0       # Health check timeout
CONTAINER_MAX_RESTART_ATTEMPTS=3    # Max restart attempts
CONTAINER_RESTART_DELAY=5           # Delay between restarts
CONTAINER_FAILURE_THRESHOLD=3       # Failures before restart
```

### Health Status Types

- **HEALTHY**: Container running normally with optional health checks passing
- **UNHEALTHY**: Container running but health checks failing
- **RESTARTING**: Container being restarted due to failures
- **FAILED**: Container stopped or permanently failed
- **UNKNOWN**: Unable to determine container status

### Testing Health Monitoring

```bash
# Test health monitoring functionality
./docker/scripts/test-container-health.py
```

### Health Endpoints

**System Health:**
```
GET /health  # Includes container health statistics
```

**Detailed Container Health:**
```
GET /health/container               # Overall health stats
GET /health/container/{session_id}  # Specific session health
```

**Health Response:**
```json
{
  "container_health": {
    "monitoring_active": true,
    "check_interval": 30,
    "total_sessions_monitored": 5,
    "sessions_with_failures": 1,
    "session_ses123": {
      "total_checks": 10,
      "healthy_checks": 8,
      "failed_checks": 2,
      "average_response_time": 45.2
    }
  }
}
```

### Recovery Mechanisms

1. **Health Check Failure**: Container marked as unhealthy
2. **Consecutive Failures**: After threshold, automatic restart initiated
3. **Restart Attempts**: Limited to prevent infinite restart loops
4. **Session Status Update**: Session status reflects container health
5. **Logging & Alerts**: Comprehensive logging of health events
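
The recovery steps above amount to a small state machine. Here is a minimal sketch under stated assumptions: `RestartPolicy` and `HealthTracker` are hypothetical names, the defaults mirror `CONTAINER_FAILURE_THRESHOLD` and `CONTAINER_MAX_RESTART_ATTEMPTS`, and resetting the failure counter after a restart is a design guess rather than documented behavior.

```python
from dataclasses import dataclass


@dataclass
class RestartPolicy:
    failure_threshold: int = 3     # mirrors CONTAINER_FAILURE_THRESHOLD
    max_restart_attempts: int = 3  # mirrors CONTAINER_MAX_RESTART_ATTEMPTS


class HealthTracker:
    """Track consecutive failures for one container and decide its status."""

    def __init__(self, policy: RestartPolicy) -> None:
        self.policy = policy
        self.consecutive_failures = 0
        self.restart_attempts = 0

    def record(self, healthy: bool) -> str:
        """Return HEALTHY, UNHEALTHY, RESTARTING, or FAILED for this check."""
        if healthy:
            self.consecutive_failures = 0
            return "HEALTHY"
        self.consecutive_failures += 1
        if self.consecutive_failures < self.policy.failure_threshold:
            return "UNHEALTHY"
        if self.restart_attempts >= self.policy.max_restart_attempts:
            return "FAILED"  # restart budget exhausted, stop looping
        self.restart_attempts += 1
        self.consecutive_failures = 0  # assumption: a restart resets the streak
        return "RESTARTING"
```

Capping `restart_attempts` is what prevents the infinite restart loop mentioned in step 3.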

### Integration Benefits

- **Proactive Monitoring**: Detects issues before users are affected
- **Automatic Recovery**: Reduces manual intervention requirements
- **Improved Reliability**: Prevents stuck sessions and system instability
- **Operational Visibility**: Detailed health metrics and history
- **Scalable Architecture**: Works with multiple concurrent sessions

## Session Authentication

OpenCode servers now require token-based authentication for secure individual user sessions, preventing unauthorized access and ensuring session isolation.

### Authentication Features

- **Token Generation**: Unique cryptographically secure tokens per session
- **Automatic Expiry**: Configurable token lifetime (default 24 hours)
- **Token Rotation**: Ability to rotate tokens for enhanced security
- **Session Isolation**: Each user session has its own authentication credentials
- **Proxy Integration**: Authentication headers automatically included in proxy requests

### Configuration

```bash
# Token configuration
export SESSION_TOKEN_LENGTH=32            # Token length in characters
export SESSION_TOKEN_EXPIRY_HOURS=24      # Token validity period
export SESSION_TOKEN_SECRET=auto          # Token signing secret (auto-generated)
export TOKEN_CLEANUP_INTERVAL_MINUTES=60  # Expired token cleanup interval
```
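
A minimal sketch of token generation and validation with the standard library follows. The function names are hypothetical, and using hexadecimal tokens so that the length in characters is exact is an assumption; the real service may encode or sign tokens differently.

```python
import hmac
import secrets
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def generate_session_token(
    length: int = 32, expiry_hours: int = 24, now: Optional[datetime] = None
) -> Tuple[str, datetime]:
    """Return a random hex token of `length` characters and its expiry time."""
    now = now or datetime.now(timezone.utc)
    token = secrets.token_hex(length // 2)  # two hex characters per random byte
    return token, now + timedelta(hours=expiry_hours)


def is_token_valid(
    presented: str, expected: str, expires_at: datetime, now: Optional[datetime] = None
) -> bool:
    """Check expiry, then compare tokens in constant time."""
    now = now or datetime.now(timezone.utc)
    return now < expires_at and hmac.compare_digest(presented, expected)
```

`hmac.compare_digest` avoids leaking how many leading characters matched, which a plain `==` comparison can do through timing.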

### Testing Authentication

```bash
# Test authentication functionality
./docker/scripts/test-session-auth.py

# End-to-end authentication testing
./docker/scripts/test-auth-end-to-end.sh
```

### API Endpoints

**Authentication Management:**
- `GET /sessions/{id}/auth` - Get session authentication info
- `POST /sessions/{id}/auth/rotate` - Rotate session token
- `GET /auth/sessions` - List authenticated sessions

**Health Monitoring:**
```json
{
  "authenticated_sessions": 3,
  "status": "healthy"
}
```

### Security Benefits

- **Session Isolation**: Users cannot access each other's OpenCode servers
- **Token Expiry**: Automatic cleanup prevents token accumulation
- **Secure Generation**: Cryptographically secure random tokens
- **Proxy Security**: Authentication headers prevent unauthorized proxy access

## HTTP Connection Pooling

Proxy requests now use a global HTTP connection pool instead of creating a new httpx client for each request, which removes per-request connection setup and improves proxy performance.

### Connection Pool Benefits

- **Eliminated Connection Overhead**: No more client creation/teardown per request
- **Connection Reuse**: Persistent keep-alive connections reduce latency
- **Improved Throughput**: Handle significantly more concurrent proxy requests
- **Reduced Resource Usage**: Lower memory and CPU overhead for HTTP operations
- **Better Scalability**: Support higher request rates with the same system resources

### Pool Configuration

The connection pool is automatically configured with optimized settings:

```python
# Connection pool settings
max_keepalive_connections=20  # Max idle keep-alive connections kept in the pool
max_connections=100           # Max total connections
keepalive_expiry=300.0        # Idle keep-alive connections expire after 5 minutes
connect_timeout=10.0          # Connection establishment timeout (seconds)
read_timeout=30.0             # Read operation timeout (seconds)
```

### Performance Testing

```bash
# Test HTTP connection pool functionality
./docker/scripts/test-http-connection-pool.py

# Load test proxy performance improvements
./docker/scripts/test-http-pool-load.sh
```

### Health Monitoring

The `/health` endpoint now includes HTTP connection pool status:

```json
{
  "http_connection_pool": {
    "status": "healthy",
    "config": {
      "max_keepalive_connections": 20,
      "max_connections": 100,
      "keepalive_expiry": 300.0
    }
  }
}
```

## Async Docker Operations

Docker operations now run asynchronously using aiodocker to eliminate blocking calls in FastAPI's async event loop, significantly improving concurrency and preventing thread pool exhaustion.

### Async Benefits

- **Non-Blocking Operations**: Container creation, management, and cleanup no longer block the event loop
- **Improved Concurrency**: Handle multiple concurrent user sessions without performance degradation
- **Better Scalability**: Support higher throughput with the same system resources
- **Thread Pool Preservation**: Prevent exhaustion of async thread pools
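
The effect of a blocking call on the event loop can be demonstrated with the standard library alone. In this sketch, `time.sleep` stands in for a synchronous Docker SDK call and `asyncio.sleep` for an awaitable one; the function names are illustrative and no Docker client is involved.

```python
import asyncio
import time


async def blocking_create(delay: float) -> None:
    # Simulates a synchronous client call: the whole event loop stalls.
    time.sleep(delay)


async def non_blocking_create(delay: float) -> None:
    # Simulates an awaitable client call: other tasks run while this waits.
    await asyncio.sleep(delay)


async def timed(worker, n: int = 5, delay: float = 0.1) -> float:
    """Run n workers concurrently and return the elapsed wall-clock time."""
    start = time.perf_counter()
    await asyncio.gather(*(worker(delay) for _ in range(n)))
    return time.perf_counter() - start


def compare() -> tuple:
    blocking = asyncio.run(timed(blocking_create))
    non_blocking = asyncio.run(timed(non_blocking_create))
    return blocking, non_blocking
```

Five blocking workers run one after another and take roughly five times the delay, while five awaitable workers overlap and finish in roughly one delay.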

### Configuration

```bash
# Enable async Docker operations (recommended)
export USE_ASYNC_DOCKER=true

# Or disable for sync mode (legacy)
export USE_ASYNC_DOCKER=false
```

### Testing Async Operations

```bash
# Test async Docker functionality
./docker/scripts/test-async-docker.py

# Load test concurrent operations
./docker/scripts/test-async-docker-load.sh
```

### Performance Impact

Async operations provide significant performance improvements:

- **Concurrent Sessions**: Handle 10+ concurrent container operations without blocking
- **Response Times**: Faster session creation under load
- **Resource Efficiency**: Better CPU utilization with non-blocking I/O
- **Scalability**: Support more users per server instance

## Resource Limits Enforcement

Container resource limits are now actively enforced to prevent resource exhaustion attacks and ensure fair resource allocation across user sessions.

### Configurable Limits

| Environment Variable | Default | Description |
|---------------------|---------|-------------|
| `CONTAINER_MEMORY_LIMIT` | `4g` | Memory limit per container |
| `CONTAINER_CPU_QUOTA` | `100000` | CPU quota (microseconds per period) |
| `CONTAINER_CPU_PERIOD` | `100000` | CPU period (microseconds) |
| `MAX_CONCURRENT_SESSIONS` | `3` | Maximum concurrent user sessions |
| `MEMORY_WARNING_THRESHOLD` | `0.8` | Memory usage warning threshold (80%) |
| `CPU_WARNING_THRESHOLD` | `0.9` | CPU usage warning threshold (90%) |
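
To make the defaults concrete: the CPU share a container receives is the quota divided by the period, and Docker memory suffixes are binary units. The helpers below are hypothetical and only illustrate that arithmetic.

```python
_UNITS = {"b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def cpu_count_from_quota(quota_us: int, period_us: int) -> float:
    """Number of CPUs a container may use: quota divided by period."""
    return quota_us / period_us


def parse_memory_limit(value: str) -> int:
    """Convert a Docker-style limit such as '4g' or '512m' to bytes."""
    value = value.strip().lower()
    if value[-1] in _UNITS:
        return int(value[:-1]) * _UNITS[value[-1]]
    return int(value)  # a bare number is already in bytes
```

With the defaults in the table, a quota of `100000` over a period of `100000` allows one full CPU, and `4g` is 4 × 1024³ bytes.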

### Resource Protection Features

- **Memory Limits**: Prevents containers from consuming unlimited RAM
- **CPU Quotas**: Ensures fair CPU allocation across sessions
- **Session Throttling**: Blocks new sessions when resources are constrained
- **System Monitoring**: Continuous resource usage tracking
- **Graceful Degradation**: Alerts and throttling before system failure

### Testing Resource Limits

```bash
# Test resource limit configuration and validation
./docker/scripts/test-resource-limits.py

# Load testing with enforcement verification
./docker/scripts/test-resource-limits-load.sh
```

### Health Monitoring

The `/health` endpoint now includes comprehensive resource information:

```json
{
  "resource_limits": {
    "memory_limit": "4g",
    "cpu_quota": 100000,
    "max_concurrent_sessions": 3
  },
  "system_resources": {
    "memory_percent": 0.65,
    "cpu_percent": 0.45
  },
  "resource_alerts": []
}
```

### Resource Alert Levels

- **Warning**: System resources approaching limits (80% memory, 90% CPU)
- **Critical**: System resources at dangerous levels (95%+ usage)
- **Throttling**: New sessions blocked when critical alerts active
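
The alert levels can be expressed as a short classification function. This is a sketch under assumptions: usage values are fractions between 0 and 1 (as in the `/health` example), the `level:resource` string format is invented for illustration, and `resource_alerts` and `should_throttle` are hypothetical names.

```python
from typing import List


def resource_alerts(
    memory: float,
    cpu: float,
    memory_warning: float = 0.8,
    cpu_warning: float = 0.9,
    critical: float = 0.95,
) -> List[str]:
    """Classify current usage into warning and critical alerts."""
    alerts = []
    for name, usage, warn in (("memory", memory, memory_warning), ("cpu", cpu, cpu_warning)):
        if usage >= critical:
            alerts.append(f"critical:{name}")
        elif usage >= warn:
            alerts.append(f"warning:{name}")
    return alerts


def should_throttle(alerts: List[str]) -> bool:
    """New sessions are blocked while any critical alert is active."""
    return any(alert.startswith("critical:") for alert in alerts)
```

For the sample health response (65% memory, 45% CPU) the function returns an empty list, matching `"resource_alerts": []`.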

## Security Audit Checklist

- [ ] TLS certificates generated with strong encryption
- [ ] Certificate permissions set correctly (400/444)
- [ ] No socket mounting in docker-compose.yml
- [ ] Environment variables properly configured
- [ ] TLS connection tested successfully
- [ ] Host IP detection working correctly
- [ ] Proxy routing functional across environments
- [ ] Resource limits properly configured and enforced
- [ ] Session throttling prevents resource exhaustion
- [ ] System resource monitoring active
- [ ] Certificate rotation process documented
- [ ] Firewall rules restrict Docker API access
- [ ] Docker daemon configured with security options
- [ ] Monitoring and logging enabled for API access