Migration from Docker to Standalone Python Server (#73)

* Migration from Docker to standalone server
Migration handling
Fixed tests
Use simpler in-memory storage
Support for concurrent logging to disk
Simplified direct connections to localhost

* Migration from Docker / Redis to standalone script
Updated tests
Updated run script
Fixed requirements
Use dotenv
Ask once whether the user would like to install the MCP server in Claude Desktop
Updated docs

* More cleanup; removed remaining references to Docker

* Cleanup

* Comments

* Fixed tests

* Fix GitHub Actions workflow for standalone Python architecture

- Install requirements-dev.txt for pytest and testing dependencies
- Remove Docker setup from simulation tests (now standalone)
- Simplify linting job to use requirements-dev.txt
- Update simulation tests to run directly without Docker

Fixes unit test failures in CI due to missing pytest dependency.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Remove simulation tests from GitHub Actions

- Remove simulation-tests job that makes real API calls
- Keep only unit tests (mocked, no API costs) and linting
- Simulation tests should be run manually with real API keys
- Reduces CI costs and complexity

GitHub Actions now only runs:
- Unit tests (569 tests, all mocked)
- Code quality checks (ruff, black)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Fixed tests

* Fixed tests

---------

Co-authored-by: Claude <noreply@anthropic.com>
Beehive Innovations committed via GitHub on 2025-06-18 23:41:22 +04:00
commit 4151c3c3a5 (parent 9d72545ecd)
121 changed files with 2842 additions and 3168 deletions

View File

@@ -4,7 +4,7 @@ Utility functions for Zen MCP Server
from .file_types import CODE_EXTENSIONS, FILE_CATEGORIES, PROGRAMMING_EXTENSIONS, TEXT_EXTENSIONS
from .file_utils import expand_paths, read_file_content, read_files
from .security_config import EXCLUDED_DIRS, SECURITY_ROOT
from .security_config import EXCLUDED_DIRS
from .token_utils import check_token_limit, estimate_tokens
__all__ = [
@@ -15,7 +15,6 @@ __all__ = [
"PROGRAMMING_EXTENSIONS",
"TEXT_EXTENSIONS",
"FILE_CATEGORIES",
"SECURITY_ROOT",
"EXCLUDED_DIRS",
"estimate_tokens",
"check_token_limit",

View File

@@ -3,15 +3,29 @@ Conversation Memory for AI-to-AI Multi-turn Discussions
This module provides conversation persistence and context reconstruction for
stateless MCP (Model Context Protocol) environments. It enables multi-turn
conversations between Claude and Gemini by storing conversation state in Redis
conversations between Claude and Gemini by storing conversation state in memory
across independent request cycles.
CRITICAL ARCHITECTURAL REQUIREMENT:
This conversation memory system is designed for PERSISTENT MCP SERVER PROCESSES.
It uses in-memory storage that persists only within a single Python process.
⚠️ IMPORTANT: This system will NOT work correctly if MCP tool calls are made
as separate subprocess invocations (each subprocess starts with empty memory).
WORKING SCENARIO: Claude Desktop with persistent MCP server process
FAILING SCENARIO: Simulator tests calling server.py as individual subprocesses
Root cause of test failures: Each subprocess call loses the conversation
state from previous calls because memory is process-specific, not shared
across subprocess boundaries.
ARCHITECTURE OVERVIEW:
The MCP protocol is inherently stateless - each tool request is independent
with no memory of previous interactions. This module bridges that gap by:
1. Creating persistent conversation threads with unique UUIDs
2. Storing complete conversation context (turns, files, metadata) in Redis
2. Storing complete conversation context (turns, files, metadata) in memory
3. Reconstructing conversation history when tools are called with continuation_id
4. Supporting cross-tool continuation - seamlessly switch between different tools
while maintaining full conversation context and file references
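As a companion to the numbered steps above, a minimal usage sketch follows. The import path and the add_turn keyword arguments are assumptions inferred from the signatures visible elsewhere in this diff, not an exact reproduction of the project's API.

# Hypothetical lifecycle sketch for in-memory conversation threads (assumed import path).
from utils.conversation_memory import add_turn, create_thread, get_thread

# 1. First tool call: create a thread and return its UUID to Claude.
thread_id = create_thread(tool_name="chat", initial_request={"prompt": "Review utils/file_utils.py"})

# 2. Record the completed exchange as a turn (keyword names here are illustrative).
add_turn(thread_id, role="assistant", content="Here is my review of the file ...")

# 3. A later call from any tool passes the UUID back as continuation_id and rebuilds
#    the full context from process-local memory, as long as the server process persists.
context = get_thread(thread_id)
if context is not None:
    print(f"{len(context.turns)} turn(s) recorded for thread {thread_id}")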
@@ -35,9 +49,9 @@ Key Features:
most recent file context is preserved when token limits require exclusions.
- Automatic turn limiting (20 turns max) to prevent runaway conversations
- Context reconstruction for stateless request continuity
- Redis-based persistence with automatic expiration (3 hour TTL)
- In-memory persistence with automatic expiration (3 hour TTL)
- Thread-safe operations for concurrent access
- Graceful degradation when Redis is unavailable
- Graceful degradation when storage is unavailable
DUAL PRIORITIZATION STRATEGY (Files & Conversations):
The conversation memory system implements sophisticated prioritization for both files and
@@ -187,26 +201,16 @@ class ThreadContext(BaseModel):
initial_context: dict[str, Any] # Original request parameters
def get_redis_client():
def get_storage():
"""
Get Redis client from environment configuration
Creates a Redis client using the REDIS_URL environment variable.
Defaults to localhost:6379/0 if not specified.
Get in-memory storage backend for conversation persistence.
Returns:
redis.Redis: Configured Redis client with decode_responses=True
Raises:
ValueError: If redis package is not installed
InMemoryStorage: Thread-safe in-memory storage backend
"""
try:
import redis
from .storage_backend import get_storage_backend
redis_url = os.getenv("REDIS_URL", "redis://localhost:6379/0")
return redis.from_url(redis_url, decode_responses=True)
except ImportError:
raise ValueError("redis package required. Install with: pip install redis")
return get_storage_backend()
def create_thread(tool_name: str, initial_request: dict[str, Any], parent_thread_id: Optional[str] = None) -> str:
@@ -251,10 +255,10 @@ def create_thread(tool_name: str, initial_request: dict[str, Any], parent_thread
initial_context=filtered_context,
)
# Store in Redis with configurable TTL to prevent indefinite accumulation
client = get_redis_client()
# Store in memory with configurable TTL to prevent indefinite accumulation
storage = get_storage()
key = f"thread:{thread_id}"
client.setex(key, CONVERSATION_TIMEOUT_SECONDS, context.model_dump_json())
storage.setex(key, CONVERSATION_TIMEOUT_SECONDS, context.model_dump_json())
logger.debug(f"[THREAD] Created new thread {thread_id} with parent {parent_thread_id}")
@@ -263,7 +267,7 @@ def create_thread(tool_name: str, initial_request: dict[str, Any], parent_thread
def get_thread(thread_id: str) -> Optional[ThreadContext]:
"""
Retrieve thread context from Redis
Retrieve thread context from in-memory storage
Fetches complete conversation context for cross-tool continuation.
This is the core function that enables tools to access conversation
@@ -278,22 +282,22 @@ def get_thread(thread_id: str) -> Optional[ThreadContext]:
Security:
- Validates UUID format to prevent injection attacks
- Handles Redis connection failures gracefully
- Handles storage connection failures gracefully
- No error information leakage on failure
"""
if not thread_id or not _is_valid_uuid(thread_id):
return None
try:
client = get_redis_client()
storage = get_storage()
key = f"thread:{thread_id}"
data = client.get(key)
data = storage.get(key)
if data:
return ThreadContext.model_validate_json(data)
return None
except Exception:
# Silently handle errors to avoid exposing Redis details
# Silently handle errors to avoid exposing storage details
return None
@@ -313,8 +317,7 @@ def add_turn(
Appends a new conversation turn to an existing thread. This is the core
function for building conversation history and enabling cross-tool
continuation. Each turn preserves the tool and model that generated it,
and tracks file reception order using atomic Redis counters.
continuation. Each turn preserves the tool and model that generated it.
Args:
thread_id: UUID of the conversation thread
@@ -333,7 +336,7 @@ def add_turn(
Failure cases:
- Thread doesn't exist or expired
- Maximum turn limit reached
- Redis connection failure
- Storage connection failure
Note:
- Refreshes thread TTL to configured timeout on successful update
@@ -370,14 +373,14 @@ def add_turn(
context.turns.append(turn)
context.last_updated_at = datetime.now(timezone.utc).isoformat()
# Save back to Redis and refresh TTL
# Save back to storage and refresh TTL
try:
client = get_redis_client()
storage = get_storage()
key = f"thread:{thread_id}"
client.setex(key, CONVERSATION_TIMEOUT_SECONDS, context.model_dump_json()) # Refresh TTL to configured timeout
storage.setex(key, CONVERSATION_TIMEOUT_SECONDS, context.model_dump_json()) # Refresh TTL to configured timeout
return True
except Exception as e:
logger.debug(f"[FLOW] Failed to save turn to Redis: {type(e).__name__}")
logger.debug(f"[FLOW] Failed to save turn to storage: {type(e).__name__}")
return False
@@ -591,11 +594,9 @@ def _plan_file_inclusion_by_size(all_files: list[str], max_file_tokens: int) ->
for file_path in all_files:
try:
from utils.file_utils import estimate_file_tokens, translate_path_for_environment
from utils.file_utils import estimate_file_tokens
translated_path = translate_path_for_environment(file_path)
if os.path.exists(translated_path) and os.path.isfile(translated_path):
if os.path.exists(file_path) and os.path.isfile(file_path):
# Use centralized token estimation for consistency
estimated_tokens = estimate_file_tokens(file_path)
@@ -613,7 +614,7 @@ def _plan_file_inclusion_by_size(all_files: list[str], max_file_tokens: int) ->
else:
files_to_skip.append(file_path)
# More descriptive message for missing files
if not os.path.exists(translated_path):
if not os.path.exists(file_path):
logger.debug(
f"[FILES] Skipping {file_path} - file no longer exists (may have been moved/deleted since conversation)"
)
@@ -724,7 +725,7 @@ def build_conversation_history(context: ThreadContext, model_context=None, read_
Performance Characteristics:
- O(n) file collection with newest-first prioritization
- Intelligent token budgeting prevents context window overflow
- Redis-based persistence with automatic TTL management
- In-memory persistence with automatic TTL management
- Graceful degradation when files are inaccessible or too large
"""
# Get the complete thread chain
@@ -851,10 +852,7 @@ def build_conversation_history(context: ThreadContext, model_context=None, read_
except Exception as e:
# More descriptive error handling for missing files
try:
from utils.file_utils import translate_path_for_environment
translated_path = translate_path_for_environment(file_path)
if not os.path.exists(translated_path):
if not os.path.exists(file_path):
logger.info(
f"File no longer accessible for conversation history: {file_path} - file was moved/deleted since conversation (marking as excluded)"
)

View File

@@ -79,7 +79,7 @@ TEXT_DATA = {
".csv", # CSV
".tsv", # TSV
".gitignore", # Git ignore
".dockerfile", # Docker
".dockerfile", # Dockerfile
".makefile", # Make
".cmake", # CMake
".gradle", # Gradle
@@ -221,7 +221,7 @@ TOKEN_ESTIMATION_RATIOS = {
# Logs and data
".log": 4.5, # Log files - timestamps, messages, stack traces
".csv": 3.1, # CSV - data with delimiters
# Docker and infrastructure
# Infrastructure files
".dockerfile": 3.7, # Dockerfile - commands and paths
".tf": 3.5, # Terraform - infrastructure as code
}
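For orientation, these ratios appear to express characters per token and feed estimate_file_tokens() later in this change; the helper below is a hypothetical sketch of that relationship under those assumptions, not the project's actual implementation.

import os

# Hypothetical sketch: estimate tokens from file size and an extension-specific
# characters-per-token ratio (assumed interpretation of the table above).
def rough_token_estimate(file_path: str, ratios: dict[str, float], default_ratio: float = 3.5) -> int:
    if not os.path.isfile(file_path):
        return 0
    ext = os.path.splitext(file_path)[1].lower()
    ratio = ratios.get(ext, default_ratio)
    return int(os.path.getsize(file_path) / ratio)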

View File

@@ -45,7 +45,7 @@ from pathlib import Path
from typing import Callable, Optional
from .file_types import BINARY_EXTENSIONS, CODE_EXTENSIONS, IMAGE_EXTENSIONS, TEXT_EXTENSIONS
from .security_config import CONTAINER_WORKSPACE, EXCLUDED_DIRS, MCP_SIGNATURE_FILES, SECURITY_ROOT, WORKSPACE_ROOT
from .security_config import EXCLUDED_DIRS, is_dangerous_path
from .token_utils import DEFAULT_CONTEXT_WINDOW, estimate_tokens
@@ -92,44 +92,32 @@ def is_mcp_directory(path: Path) -> bool:
path: Directory path to check
Returns:
True if this appears to be the MCP directory
True if this is the MCP server directory or a subdirectory
"""
if not path.is_dir():
return False
# Check for multiple signature files to be sure
matches = 0
for sig_file in MCP_SIGNATURE_FILES:
if (path / sig_file).exists():
matches += 1
if matches >= 3: # Require at least 3 matches to be certain
logger.info(f"Detected MCP directory at {path}, will exclude from scanning")
return True
return False
# Get the directory where the MCP server is running from
# __file__ is utils/file_utils.py, so parent.parent is the MCP root
mcp_server_dir = Path(__file__).parent.parent.resolve()
# Check if the given path is the MCP server directory or a subdirectory
try:
path.resolve().relative_to(mcp_server_dir)
logger.info(f"Detected MCP server directory at {path}, will exclude from scanning")
return True
except ValueError:
# Not a subdirectory of MCP server
return False
def get_user_home_directory() -> Optional[Path]:
"""
Get the user's home directory based on environment variables.
In Docker, USER_HOME should be set to the mounted home path.
Outside Docker, we use Path.home() or environment variables.
Get the user's home directory.
Returns:
User's home directory path or None if not determinable
User's home directory path
"""
# Check for explicit USER_HOME env var (set in docker-compose.yml)
user_home = os.environ.get("USER_HOME")
if user_home:
return Path(user_home).resolve()
# In container, check if we're running in Docker
if CONTAINER_WORKSPACE.exists():
# We're in Docker but USER_HOME not set - use WORKSPACE_ROOT as fallback
if WORKSPACE_ROOT:
return Path(WORKSPACE_ROOT).resolve()
# Outside Docker, use system home
return Path.home()
@@ -291,155 +279,51 @@ def _add_line_numbers(content: str) -> str:
return "\n".join(numbered_lines)
def translate_path_for_environment(path_str: str) -> str:
"""
Translate paths between host and container environments as needed.
This is the unified path translation function that should be used by all
tools and utilities throughout the codebase. It handles:
1. Docker host-to-container path translation (host paths -> /workspace/...)
2. Direct mode (no translation needed)
3. Internal server files (conf/custom_models.json)
4. Security validation and error handling
Docker Path Translation Logic:
- Input: /Users/john/project/src/file.py (host path from Claude)
- WORKSPACE_ROOT: /Users/john/project (host path in env var)
- Output: /workspace/src/file.py (container path for file operations)
Args:
path_str: Original path string from the client (absolute host path)
Returns:
Translated path appropriate for the current environment
"""
# Handle built-in server config file - no translation needed
if _is_builtin_custom_models_config(path_str):
return path_str
if not WORKSPACE_ROOT or not WORKSPACE_ROOT.strip() or not CONTAINER_WORKSPACE.exists():
if path_str.startswith("/app/"):
# Convert Docker internal paths to local relative paths for standalone mode
relative_path = path_str[5:] # Remove "/app/" prefix
if relative_path.startswith("/"):
relative_path = relative_path[1:] # Remove leading slash if present
return "./" + relative_path
# No other translation needed for standalone mode
return path_str
# Check if the path is already a container path (starts with /workspace)
if path_str.startswith(str(CONTAINER_WORKSPACE) + "/") or path_str == str(CONTAINER_WORKSPACE):
# Path is already translated to container format, return as-is
return path_str
try:
# Use os.path.realpath for security - it resolves symlinks completely
# This prevents symlink attacks that could escape the workspace
real_workspace_root = Path(os.path.realpath(WORKSPACE_ROOT))
# For the host path, we can't use realpath if it doesn't exist in the container
# So we'll use Path().resolve(strict=False) instead
real_host_path = Path(path_str).resolve(strict=False)
# Security check: ensure the path is within the mounted workspace
# This prevents path traversal attacks (e.g., ../../../etc/passwd)
relative_path = real_host_path.relative_to(real_workspace_root)
# Construct the container path
container_path = CONTAINER_WORKSPACE / relative_path
# Log the translation for debugging (but not sensitive paths)
if str(container_path) != path_str:
logger.info(f"Translated host path to container: {path_str} -> {container_path}")
return str(container_path)
except ValueError:
# Path is not within the host's WORKSPACE_ROOT
# In Docker, we cannot access files outside the mounted volume
logger.warning(
f"Path '{path_str}' is outside the mounted workspace '{WORKSPACE_ROOT}'. "
f"Docker containers can only access files within the mounted directory."
)
# Return a clear error path that will fail gracefully
return f"/inaccessible/outside/mounted/volume{path_str}"
except Exception as e:
# Log unexpected errors but don't expose internal details to clients
logger.warning(f"Path translation failed for '{path_str}': {type(e).__name__}")
# Return a clear error path that will fail gracefully
return f"/inaccessible/translation/error{path_str}"
def resolve_and_validate_path(path_str: str) -> Path:
"""
Resolves, translates, and validates a path against security policies.
Resolves and validates a path against security policies.
This is the primary security function that ensures all file access
is properly sandboxed. It enforces three critical policies:
1. Translate host paths to container paths if applicable (Docker environment)
2. All paths must be absolute (no ambiguity)
3. All paths must resolve to within PROJECT_ROOT (sandboxing)
This function ensures safe file access by:
1. Requiring absolute paths (no ambiguity)
2. Resolving symlinks to prevent deception
3. Blocking access to dangerous system directories
Args:
path_str: Path string (must be absolute)
Returns:
Resolved Path object that is guaranteed to be within PROJECT_ROOT
Resolved Path object that is safe to access
Raises:
ValueError: If path is not absolute or otherwise invalid
PermissionError: If path is outside allowed directory
PermissionError: If path is in a dangerous location
"""
# Step 1: Translate Docker paths first (if applicable)
# This must happen before any other validation
translated_path_str = translate_path_for_environment(path_str)
# Step 1: Create a Path object
user_path = Path(path_str)
# Step 2: Create a Path object from the (potentially translated) path
user_path = Path(translated_path_str)
# Step 3: Security Policy - Require absolute paths
# Step 2: Security Policy - Require absolute paths
# Relative paths could be interpreted differently depending on working directory
if not user_path.is_absolute():
raise ValueError(f"Relative paths are not supported. Please provide an absolute path.\nReceived: {path_str}")
# Step 4: Resolve the absolute path (follows symlinks, removes .. and .)
# Step 3: Resolve the absolute path (follows symlinks, removes .. and .)
# This is critical for security as it reveals the true destination of symlinks
resolved_path = user_path.resolve()
# Step 5: Security Policy - Ensure the resolved path is within PROJECT_ROOT
# This prevents directory traversal attacks (e.g., /project/../../../etc/passwd)
try:
resolved_path.relative_to(SECURITY_ROOT)
except ValueError:
# Provide detailed error for debugging while avoiding information disclosure
logger.warning(
f"Access denied - path outside workspace. "
f"Requested: {path_str}, Resolved: {resolved_path}, Workspace: {SECURITY_ROOT}"
)
# Step 4: Check against dangerous paths
if is_dangerous_path(resolved_path):
logger.warning(f"Access denied - dangerous path: {resolved_path}")
raise PermissionError(f"Access to system directory denied: {path_str}")
# Step 5: Check if it's the home directory root
if is_home_directory_root(resolved_path):
raise PermissionError(
f"Path outside workspace: {path_str}\nWorkspace: {SECURITY_ROOT}\nResolved path: {resolved_path}"
f"Cannot scan entire home directory: {path_str}\n" f"Please specify a subdirectory within your home folder."
)
return resolved_path
def translate_file_paths(file_paths: Optional[list[str]]) -> Optional[list[str]]:
"""
Translate a list of file paths for the current environment.
This function should be used by all tools to consistently handle path translation
for file lists. It applies the unified path translation to each path in the list.
Args:
file_paths: List of file paths to translate, or None
Returns:
List of translated paths, or None if input was None
"""
if not file_paths:
return file_paths
return [translate_path_for_environment(path) for path in file_paths]
def expand_paths(paths: list[str], extensions: Optional[set[str]] = None) -> list[str]:
"""
Expand paths to individual files, handling both files and directories.
@@ -474,23 +358,12 @@ def expand_paths(paths: list[str], extensions: Optional[set[str]] = None) -> lis
# Safety checks for directory scanning
if path_obj.is_dir():
resolved_workspace = SECURITY_ROOT.resolve()
resolved_path = path_obj.resolve()
# Check 1: Prevent reading entire workspace root
if resolved_path == resolved_workspace:
logger.warning(
f"Ignoring request to read entire workspace directory: {path}. "
f"Please specify individual files or subdirectories instead."
)
continue
# Check 2: Prevent scanning user's home directory root
# Check 1: Prevent scanning user's home directory root
if is_home_directory_root(path_obj):
logger.warning(f"Skipping home directory root: {path}. Please specify a project subdirectory instead.")
continue
# Check 3: Skip if this is the MCP's own directory
# Check 2: Skip if this is the MCP's own directory
if is_mcp_directory(path_obj):
logger.info(
f"Skipping MCP server directory: {path}. The MCP server code is excluded from project scans."
@@ -573,15 +446,6 @@ def read_file_content(
# Return error in a format that provides context to the AI
logger.debug(f"[FILES] Path validation failed for {file_path}: {type(e).__name__}: {e}")
error_msg = str(e)
# Add Docker-specific help if we're in Docker and path is inaccessible
if WORKSPACE_ROOT and CONTAINER_WORKSPACE.exists():
# We're in Docker
error_msg = (
f"File is outside the Docker mounted directory. "
f"When running in Docker, only files within the mounted workspace are accessible. "
f"Current mounted directory: {WORKSPACE_ROOT}. "
f"To access files in a different directory, please run Claude from that directory."
)
content = f"\n--- ERROR ACCESSING FILE: {file_path} ---\nError: {error_msg}\n--- END FILE ---\n"
tokens = estimate_tokens(content)
logger.debug(f"[FILES] Returning error content for {file_path}: {tokens} tokens")
@@ -761,12 +625,10 @@ def estimate_file_tokens(file_path: str) -> int:
Estimated token count for the file
"""
try:
translated_path = translate_path_for_environment(file_path)
if not os.path.exists(translated_path) or not os.path.isfile(translated_path):
if not os.path.exists(file_path) or not os.path.isfile(file_path):
return 0
file_size = os.path.getsize(translated_path)
file_size = os.path.getsize(file_path)
# Get the appropriate ratio for this file type
from .file_types import get_token_estimation_ratio
@@ -911,11 +773,10 @@ def read_json_file(file_path: str) -> Optional[dict]:
Parsed JSON data as dict, or None if file doesn't exist or invalid
"""
try:
translated_path = translate_path_for_environment(file_path)
if not os.path.exists(translated_path):
if not os.path.exists(file_path):
return None
with open(translated_path, encoding="utf-8") as f:
with open(file_path, encoding="utf-8") as f:
return json.load(f)
except (json.JSONDecodeError, OSError):
return None
@@ -934,10 +795,9 @@ def write_json_file(file_path: str, data: dict, indent: int = 2) -> bool:
True if successful, False otherwise
"""
try:
translated_path = translate_path_for_environment(file_path)
os.makedirs(os.path.dirname(translated_path), exist_ok=True)
os.makedirs(os.path.dirname(file_path), exist_ok=True)
with open(translated_path, "w", encoding="utf-8") as f:
with open(file_path, "w", encoding="utf-8") as f:
json.dump(data, f, indent=indent, ensure_ascii=False)
return True
except (OSError, TypeError):
@@ -955,9 +815,8 @@ def get_file_size(file_path: str) -> int:
File size in bytes, or 0 if file doesn't exist or error
"""
try:
translated_path = translate_path_for_environment(file_path)
if os.path.exists(translated_path) and os.path.isfile(translated_path):
return os.path.getsize(translated_path)
if os.path.exists(file_path) and os.path.isfile(file_path):
return os.path.getsize(file_path)
return 0
except OSError:
return 0
@@ -974,8 +833,7 @@ def ensure_directory_exists(file_path: str) -> bool:
True if directory exists or was created, False on error
"""
try:
translated_path = translate_path_for_environment(file_path)
directory = os.path.dirname(translated_path)
directory = os.path.dirname(file_path)
if directory:
os.makedirs(directory, exist_ok=True)
return True
@@ -1010,15 +868,14 @@ def read_file_safely(file_path: str, max_size: int = 10 * 1024 * 1024) -> Option
File content as string, or None if file too large or unreadable
"""
try:
translated_path = translate_path_for_environment(file_path)
if not os.path.exists(translated_path) or not os.path.isfile(translated_path):
if not os.path.exists(file_path) or not os.path.isfile(file_path):
return None
file_size = os.path.getsize(translated_path)
file_size = os.path.getsize(file_path)
if file_size > max_size:
return None
with open(translated_path, encoding="utf-8", errors="ignore") as f:
with open(file_path, encoding="utf-8", errors="ignore") as f:
return f.read()
except OSError:
return None

View File

@@ -55,7 +55,7 @@ def find_git_repositories(start_path: str, max_depth: int = 5) -> list[str]:
try:
# Create Path object - no need to resolve yet since the path might be
# a translated Docker path that doesn't exist on the host
# a translated path that doesn't exist
start_path = Path(start_path)
# Basic validation - must be absolute

View File

@@ -2,15 +2,14 @@
Security configuration and path validation constants
This module contains security-related constants and configurations
for file access control and workspace management.
for file access control.
"""
import os
from pathlib import Path
# Dangerous paths that should never be used as WORKSPACE_ROOT
# Dangerous paths that should never be scanned
# These would give overly broad access and pose security risks
DANGEROUS_WORKSPACE_PATHS = {
DANGEROUS_PATHS = {
"/",
"/etc",
"/usr",
@@ -18,7 +17,6 @@ DANGEROUS_WORKSPACE_PATHS = {
"/var",
"/root",
"/home",
"/workspace", # Container path - WORKSPACE_ROOT should be host path
"C:\\",
"C:\\Windows",
"C:\\Program Files",
@@ -88,87 +86,19 @@ EXCLUDED_DIRS = {
"vendor",
}
# MCP signature files - presence of these indicates the MCP's own directory
# Used to prevent the MCP from scanning its own codebase
MCP_SIGNATURE_FILES = {
"zen_server.py",
"server.py",
"tools/precommit.py",
"utils/file_utils.py",
"prompts/tool_prompts.py",
}
# Workspace configuration
WORKSPACE_ROOT = os.environ.get("WORKSPACE_ROOT")
CONTAINER_WORKSPACE = Path("/workspace")
def validate_workspace_security(workspace_root: str) -> None:
def is_dangerous_path(path: Path) -> bool:
"""
Validate that WORKSPACE_ROOT is set to a safe directory.
Check if a path is in the dangerous paths list.
Args:
workspace_root: The workspace root path to validate
Raises:
RuntimeError: If the workspace root is unsafe
"""
if not workspace_root:
return
# Resolve to canonical path for comparison
resolved_workspace = Path(workspace_root).resolve()
# Special check for /workspace - common configuration mistake
if str(resolved_workspace) == "/workspace":
raise RuntimeError(
f"Configuration Error: WORKSPACE_ROOT should be set to the HOST path, not the container path. "
f"Found: WORKSPACE_ROOT={workspace_root} "
f"Expected: WORKSPACE_ROOT should be set to your host directory path (e.g., $HOME) "
f"that contains all files Claude might reference. "
f"This path gets mounted to /workspace inside the Docker container."
)
# Check against other dangerous paths
if str(resolved_workspace) in DANGEROUS_WORKSPACE_PATHS:
raise RuntimeError(
f"Security Error: WORKSPACE_ROOT '{workspace_root}' is set to a dangerous system directory. "
f"This would give access to critical system files. "
f"Please set WORKSPACE_ROOT to a specific project directory."
)
# Additional check: prevent filesystem root
if resolved_workspace.parent == resolved_workspace:
raise RuntimeError(
f"Security Error: WORKSPACE_ROOT '{workspace_root}' cannot be the filesystem root. "
f"This would give access to the entire filesystem. "
f"Please set WORKSPACE_ROOT to a specific project directory."
)
def get_security_root() -> Path:
"""
Determine the security boundary for file access.
path: Path to check
Returns:
Path object representing the security root directory
True if the path is dangerous and should not be accessed
"""
# In Docker: use /workspace (container directory)
# In tests/direct mode: use WORKSPACE_ROOT (host directory)
if CONTAINER_WORKSPACE.exists():
# Running in Docker container
return CONTAINER_WORKSPACE
elif WORKSPACE_ROOT:
# Running in tests or direct mode with WORKSPACE_ROOT set
return Path(WORKSPACE_ROOT).resolve()
else:
# Fallback for backward compatibility (should not happen in normal usage)
return Path.home()
# Validate security on import if WORKSPACE_ROOT is set
if WORKSPACE_ROOT:
validate_workspace_security(WORKSPACE_ROOT)
# Export the computed security root
SECURITY_ROOT = get_security_root()
try:
resolved = path.resolve()
return str(resolved) in DANGEROUS_PATHS or resolved.parent == resolved
except Exception:
return True # If we can't resolve, consider it dangerous
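A quick illustration of the new helper's behavior for a few representative paths (results assume the DANGEROUS_PATHS set shown above and a typical filesystem layout; the project path is hypothetical):

from pathlib import Path

from utils.security_config import is_dangerous_path

print(is_dangerous_path(Path("/usr")))                  # True: listed system directory
print(is_dangerous_path(Path("/")))                     # True: filesystem root (parent == itself)
print(is_dangerous_path(Path("/home/alice/project")))   # False: an ordinary project directory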

utils/storage_backend.py (new file, 113 lines)

View File

@@ -0,0 +1,113 @@
"""
In-memory storage backend for conversation threads
This module provides a thread-safe, in-memory alternative to Redis for storing
conversation contexts. It's designed for ephemeral MCP server sessions where
conversations only need to persist during a single Claude session.
⚠️ PROCESS-SPECIFIC STORAGE: This storage is confined to a single Python process.
Data stored in one process is NOT accessible from other processes or subprocesses.
This is why simulator tests that run server.py as separate subprocesses cannot
share conversation state between tool calls.
Key Features:
- Thread-safe operations using locks
- TTL support with automatic expiration
- Background cleanup thread for memory management
- Singleton pattern for consistent state within a single process
- Drop-in replacement for Redis storage (for single-process scenarios)
"""
import logging
import os
import threading
import time
from typing import Optional
logger = logging.getLogger(__name__)
class InMemoryStorage:
"""Thread-safe in-memory storage for conversation threads"""
def __init__(self):
self._store: dict[str, tuple[str, float]] = {}
self._lock = threading.Lock()
# Match Redis behavior: cleanup interval based on conversation timeout
# Run cleanup at 1/10th of timeout interval (e.g., 18 mins for 3 hour timeout)
timeout_hours = int(os.getenv("CONVERSATION_TIMEOUT_HOURS", "3"))
self._cleanup_interval = (timeout_hours * 3600) // 10
self._cleanup_interval = max(300, self._cleanup_interval) # Minimum 5 minutes
self._shutdown = False
# Start background cleanup thread
self._cleanup_thread = threading.Thread(target=self._cleanup_worker, daemon=True)
self._cleanup_thread.start()
logger.info(
f"In-memory storage initialized with {timeout_hours}h timeout, cleanup every {self._cleanup_interval//60}m"
)
def set_with_ttl(self, key: str, ttl_seconds: int, value: str) -> None:
"""Store value with expiration time"""
with self._lock:
expires_at = time.time() + ttl_seconds
self._store[key] = (value, expires_at)
logger.debug(f"Stored key {key} with TTL {ttl_seconds}s")
def get(self, key: str) -> Optional[str]:
"""Retrieve value if not expired"""
with self._lock:
if key in self._store:
value, expires_at = self._store[key]
if time.time() < expires_at:
logger.debug(f"Retrieved key {key}")
return value
else:
# Clean up expired entry
del self._store[key]
logger.debug(f"Key {key} expired and removed")
return None
def setex(self, key: str, ttl_seconds: int, value: str) -> None:
"""Redis-compatible setex method"""
self.set_with_ttl(key, ttl_seconds, value)
def _cleanup_worker(self):
"""Background thread that periodically cleans up expired entries"""
while not self._shutdown:
time.sleep(self._cleanup_interval)
self._cleanup_expired()
def _cleanup_expired(self):
"""Remove all expired entries"""
with self._lock:
current_time = time.time()
expired_keys = [k for k, (_, exp) in self._store.items() if exp < current_time]
for key in expired_keys:
del self._store[key]
if expired_keys:
logger.debug(f"Cleaned up {len(expired_keys)} expired conversation threads")
def shutdown(self):
"""Graceful shutdown of background thread"""
self._shutdown = True
if self._cleanup_thread.is_alive():
self._cleanup_thread.join(timeout=1)
# Global singleton instance
_storage_instance = None
_storage_lock = threading.Lock()
def get_storage_backend() -> InMemoryStorage:
"""Get the global storage instance (singleton pattern)"""
global _storage_instance
if _storage_instance is None:
with _storage_lock:
if _storage_instance is None:
_storage_instance = InMemoryStorage()
logger.info("Initialized in-memory conversation storage")
return _storage_instance
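For reference, a short usage sketch of this backend; the key follows the thread:<uuid> convention used by conversation memory, and the concrete values are illustrative only.

from utils.storage_backend import get_storage_backend

storage = get_storage_backend()  # process-wide singleton; state is NOT shared across subprocesses
storage.setex("thread:1234", 3 * 3600, '{"turns": []}')  # Redis-style set-with-TTL (3 hours)
print(storage.get("thread:1234"))     # '{"turns": []}' while the TTL has not expired
print(storage.get("thread:missing"))  # None for unknown or expired keys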