Migration from Docker to Standalone Python Server (#73)
* Migration from docker to standalone server: migration handling, fixed tests, simpler in-memory storage, support for concurrent logging to disk, simplified direct connections to localhost
* Migration from docker/redis to standalone script: updated tests, updated run script, fixed requirements, use dotenv, offer once to install the MCP in Claude Desktop, updated docs
* More cleanup; removed references to docker
* Cleanup
* Comments
* Fixed tests
* Fix GitHub Actions workflow for standalone Python architecture
  - Install requirements-dev.txt for pytest and testing dependencies
  - Remove Docker setup from simulation tests (now standalone)
  - Simplify linting job to use requirements-dev.txt
  - Update simulation tests to run directly without Docker
  Fixes unit test failures in CI caused by the missing pytest dependency.
* Remove simulation tests from GitHub Actions
  - Removed the simulation-tests job that makes real API calls
  - Keep only unit tests (mocked, no API costs) and linting
  - Simulation tests should be run manually with real API keys
  - Reduces CI costs and complexity
  GitHub Actions now runs only unit tests (569 tests, all mocked) and code quality checks (ruff, black).
* Fixed tests

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Claude <noreply@anthropic.com>
commit 4151c3c3a5 · parent 9d72545ecd · committed by GitHub
utils/__init__.py
@@ -4,7 +4,7 @@ Utility functions for Zen MCP Server
 from .file_types import CODE_EXTENSIONS, FILE_CATEGORIES, PROGRAMMING_EXTENSIONS, TEXT_EXTENSIONS
 from .file_utils import expand_paths, read_file_content, read_files
-from .security_config import EXCLUDED_DIRS, SECURITY_ROOT
+from .security_config import EXCLUDED_DIRS
 from .token_utils import check_token_limit, estimate_tokens
 
 __all__ = [
@@ -15,7 +15,6 @@ __all__ = [
     "PROGRAMMING_EXTENSIONS",
     "TEXT_EXTENSIONS",
     "FILE_CATEGORIES",
-    "SECURITY_ROOT",
     "EXCLUDED_DIRS",
    "estimate_tokens",
    "check_token_limit",
utils/conversation_memory.py
@@ -3,15 +3,29 @@ Conversation Memory for AI-to-AI Multi-turn Discussions
 
 This module provides conversation persistence and context reconstruction for
 stateless MCP (Model Context Protocol) environments. It enables multi-turn
-conversations between Claude and Gemini by storing conversation state in Redis
+conversations between Claude and Gemini by storing conversation state in memory
 across independent request cycles.
 
+CRITICAL ARCHITECTURAL REQUIREMENT:
+This conversation memory system is designed for PERSISTENT MCP SERVER PROCESSES.
+It uses in-memory storage that persists only within a single Python process.
+
+⚠️ IMPORTANT: This system will NOT work correctly if MCP tool calls are made
+as separate subprocess invocations (each subprocess starts with empty memory).
+
+WORKING SCENARIO: Claude Desktop with persistent MCP server process
+FAILING SCENARIO: Simulator tests calling server.py as individual subprocesses
+
+Root cause of test failures: Each subprocess call loses the conversation
+state from previous calls because memory is process-specific, not shared
+across subprocess boundaries.
+
 ARCHITECTURE OVERVIEW:
 The MCP protocol is inherently stateless - each tool request is independent
 with no memory of previous interactions. This module bridges that gap by:
 
 1. Creating persistent conversation threads with unique UUIDs
-2. Storing complete conversation context (turns, files, metadata) in Redis
+2. Storing complete conversation context (turns, files, metadata) in memory
 3. Reconstructing conversation history when tools are called with continuation_id
 4. Supporting cross-tool continuation - seamlessly switch between different tools
    while maintaining full conversation context and file references
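Aside (not part of the diff): the docstring's process-boundary caveat can be shown directly. A sketch, assuming the module layout this diff implies (`utils/conversation_memory.py`); the `"chat"` tool name is illustrative:

```python
# Within one persistent server process, module-level storage is shared:
from utils.conversation_memory import create_thread, get_thread

thread_id = create_thread("chat", {"prompt": "hello"})  # "chat" is hypothetical
assert get_thread(thread_id) is not None  # found: same process, same memory

# A second `python server.py` subprocess would construct a fresh, empty
# InMemoryStorage, so get_thread(thread_id) there would return None.
```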
@@ -35,9 +49,9 @@ Key Features:
   most recent file context is preserved when token limits require exclusions.
 - Automatic turn limiting (20 turns max) to prevent runaway conversations
 - Context reconstruction for stateless request continuity
-- Redis-based persistence with automatic expiration (3 hour TTL)
+- In-memory persistence with automatic expiration (3 hour TTL)
 - Thread-safe operations for concurrent access
-- Graceful degradation when Redis is unavailable
+- Graceful degradation when storage is unavailable
 
 DUAL PRIORITIZATION STRATEGY (Files & Conversations):
 The conversation memory system implements sophisticated prioritization for both files and
@@ -187,26 +201,16 @@ class ThreadContext(BaseModel):
     initial_context: dict[str, Any]  # Original request parameters
 
 
-def get_redis_client():
+def get_storage():
     """
-    Get Redis client from environment configuration
-
-    Creates a Redis client using the REDIS_URL environment variable.
-    Defaults to localhost:6379/0 if not specified.
+    Get in-memory storage backend for conversation persistence.
 
     Returns:
-        redis.Redis: Configured Redis client with decode_responses=True
-
-    Raises:
-        ValueError: If redis package is not installed
+        InMemoryStorage: Thread-safe in-memory storage backend
     """
-    try:
-        import redis
-
-        redis_url = os.getenv("REDIS_URL", "redis://localhost:6379/0")
-        return redis.from_url(redis_url, decode_responses=True)
-    except ImportError:
-        raise ValueError("redis package required. Install with: pip install redis")
+    from .storage_backend import get_storage_backend
+
+    return get_storage_backend()
 
 
 def create_thread(tool_name: str, initial_request: dict[str, Any], parent_thread_id: Optional[str] = None) -> str:
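Worth noting why this swap stays so small: the call sites below only ever use two Redis methods, `setex` and `get`. A minimal sketch of that implicit interface — the `ConversationStore` Protocol name is ours, not the codebase's:

```python
from typing import Optional, Protocol

class ConversationStore(Protocol):
    """The only surface area get_storage() consumers rely on."""
    def setex(self, key: str, ttl_seconds: int, value: str) -> None: ...
    def get(self, key: str) -> Optional[str]: ...

def save_thread(store: ConversationStore, thread_id: str, payload_json: str) -> None:
    # Mirrors the pattern used by create_thread()/add_turn() below.
    store.setex(f"thread:{thread_id}", 3 * 60 * 60, payload_json)  # 3h TTL
```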
@@ -251,10 +255,10 @@ def create_thread(tool_name: str, initial_request: dict[str, Any], parent_thread
         initial_context=filtered_context,
     )
 
-    # Store in Redis with configurable TTL to prevent indefinite accumulation
-    client = get_redis_client()
+    # Store in memory with configurable TTL to prevent indefinite accumulation
+    storage = get_storage()
     key = f"thread:{thread_id}"
-    client.setex(key, CONVERSATION_TIMEOUT_SECONDS, context.model_dump_json())
+    storage.setex(key, CONVERSATION_TIMEOUT_SECONDS, context.model_dump_json())
 
     logger.debug(f"[THREAD] Created new thread {thread_id} with parent {parent_thread_id}")
@@ -263,7 +267,7 @@ def create_thread(tool_name: str, initial_request: dict[str, Any], parent_thread
 
 def get_thread(thread_id: str) -> Optional[ThreadContext]:
     """
-    Retrieve thread context from Redis
+    Retrieve thread context from in-memory storage
 
     Fetches complete conversation context for cross-tool continuation.
     This is the core function that enables tools to access conversation
@@ -278,22 +282,22 @@ def get_thread(thread_id: str) -> Optional[ThreadContext]:
 
     Security:
     - Validates UUID format to prevent injection attacks
-    - Handles Redis connection failures gracefully
+    - Handles storage connection failures gracefully
     - No error information leakage on failure
     """
     if not thread_id or not _is_valid_uuid(thread_id):
         return None
 
     try:
-        client = get_redis_client()
+        storage = get_storage()
         key = f"thread:{thread_id}"
-        data = client.get(key)
+        data = storage.get(key)
 
         if data:
             return ThreadContext.model_validate_json(data)
         return None
     except Exception:
-        # Silently handle errors to avoid exposing Redis details
+        # Silently handle errors to avoid exposing storage details
         return None
@@ -313,8 +317,7 @@ def add_turn(
 
     Appends a new conversation turn to an existing thread. This is the core
     function for building conversation history and enabling cross-tool
-    continuation. Each turn preserves the tool and model that generated it,
-    and tracks file reception order using atomic Redis counters.
+    continuation. Each turn preserves the tool and model that generated it.
 
     Args:
         thread_id: UUID of the conversation thread
@@ -333,7 +336,7 @@ def add_turn(
     Failure cases:
     - Thread doesn't exist or expired
     - Maximum turn limit reached
-    - Redis connection failure
+    - Storage connection failure
 
     Note:
     - Refreshes thread TTL to configured timeout on successful update
@@ -370,14 +373,14 @@ def add_turn(
     context.turns.append(turn)
     context.last_updated_at = datetime.now(timezone.utc).isoformat()
 
-    # Save back to Redis and refresh TTL
+    # Save back to storage and refresh TTL
     try:
-        client = get_redis_client()
+        storage = get_storage()
         key = f"thread:{thread_id}"
-        client.setex(key, CONVERSATION_TIMEOUT_SECONDS, context.model_dump_json())  # Refresh TTL to configured timeout
+        storage.setex(key, CONVERSATION_TIMEOUT_SECONDS, context.model_dump_json())  # Refresh TTL to configured timeout
         return True
     except Exception as e:
-        logger.debug(f"[FLOW] Failed to save turn to Redis: {type(e).__name__}")
+        logger.debug(f"[FLOW] Failed to save turn to storage: {type(e).__name__}")
         return False
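The save path is a plain read-modify-write of one JSON blob per thread, and rewriting it is also what refreshes the TTL. A runnable sketch of that round-trip against the new backend (the key and turn shape are illustrative):

```python
import json

from utils.storage_backend import get_storage_backend

storage = get_storage_backend()
key = "thread:123e4567-e89b-12d3-a456-426614174000"  # illustrative UUID key

storage.setex(key, 3 * 60 * 60, json.dumps({"turns": []}))       # initial save
context = json.loads(storage.get(key))                           # read
context["turns"].append({"role": "assistant", "content": "hi"})  # modify
storage.setex(key, 3 * 60 * 60, json.dumps(context))             # write + refresh TTL
```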
@@ -591,11 +594,9 @@ def _plan_file_inclusion_by_size(all_files: list[str], max_file_tokens: int) ->
 
     for file_path in all_files:
         try:
-            from utils.file_utils import estimate_file_tokens, translate_path_for_environment
+            from utils.file_utils import estimate_file_tokens
 
-            translated_path = translate_path_for_environment(file_path)
-
-            if os.path.exists(translated_path) and os.path.isfile(translated_path):
+            if os.path.exists(file_path) and os.path.isfile(file_path):
                 # Use centralized token estimation for consistency
                 estimated_tokens = estimate_file_tokens(file_path)
@@ -613,7 +614,7 @@ def _plan_file_inclusion_by_size(all_files: list[str], max_file_tokens: int) ->
             else:
                 files_to_skip.append(file_path)
                 # More descriptive message for missing files
-                if not os.path.exists(translated_path):
+                if not os.path.exists(file_path):
                     logger.debug(
                         f"[FILES] Skipping {file_path} - file no longer exists (may have been moved/deleted since conversation)"
                     )
@@ -724,7 +725,7 @@ def build_conversation_history(context: ThreadContext, model_context=None, read_
     Performance Characteristics:
     - O(n) file collection with newest-first prioritization
     - Intelligent token budgeting prevents context window overflow
-    - Redis-based persistence with automatic TTL management
+    - In-memory persistence with automatic TTL management
     - Graceful degradation when files are inaccessible or too large
     """
     # Get the complete thread chain
@@ -851,10 +852,7 @@ def build_conversation_history(context: ThreadContext, model_context=None, read_
         except Exception as e:
             # More descriptive error handling for missing files
             try:
-                from utils.file_utils import translate_path_for_environment
-
-                translated_path = translate_path_for_environment(file_path)
-                if not os.path.exists(translated_path):
+                if not os.path.exists(file_path):
                     logger.info(
                         f"File no longer accessible for conversation history: {file_path} - file was moved/deleted since conversation (marking as excluded)"
                     )
||||
@@ -79,7 +79,7 @@ TEXT_DATA = {
|
||||
".csv", # CSV
|
||||
".tsv", # TSV
|
||||
".gitignore", # Git ignore
|
||||
".dockerfile", # Docker
|
||||
".dockerfile", # Dockerfile
|
||||
".makefile", # Make
|
||||
".cmake", # CMake
|
||||
".gradle", # Gradle
|
||||
@@ -221,7 +221,7 @@ TOKEN_ESTIMATION_RATIOS = {
     # Logs and data
     ".log": 4.5,  # Log files - timestamps, messages, stack traces
     ".csv": 3.1,  # CSV - data with delimiters
-    # Docker and infrastructure
+    # Infrastructure files
     ".dockerfile": 3.7,  # Dockerfile - commands and paths
     ".tf": 3.5,  # Terraform - infrastructure as code
 }
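These ratios read as estimated bytes per token for each file type. A hedged sketch of how estimate_file_tokens() presumably applies them — the size/ratio division is an assumption based on the values shown, not confirmed by this diff:

```python
import os

def rough_token_estimate(file_path: str, ratio: float = 3.5) -> int:
    # Assumed semantics: file size in bytes / (bytes per token) ~= tokens.
    # e.g. a 9,000-byte .log file at 4.5 bytes/token -> ~2,000 tokens.
    return int(os.path.getsize(file_path) / ratio)
```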
utils/file_utils.py
@@ -45,7 +45,7 @@ from pathlib import Path
 from typing import Callable, Optional
 
 from .file_types import BINARY_EXTENSIONS, CODE_EXTENSIONS, IMAGE_EXTENSIONS, TEXT_EXTENSIONS
-from .security_config import CONTAINER_WORKSPACE, EXCLUDED_DIRS, MCP_SIGNATURE_FILES, SECURITY_ROOT, WORKSPACE_ROOT
+from .security_config import EXCLUDED_DIRS, is_dangerous_path
 from .token_utils import DEFAULT_CONTEXT_WINDOW, estimate_tokens
@@ -92,44 +92,32 @@ def is_mcp_directory(path: Path) -> bool:
         path: Directory path to check
 
     Returns:
-        True if this appears to be the MCP directory
+        True if this is the MCP server directory or a subdirectory
     """
     if not path.is_dir():
         return False
 
-    # Check for multiple signature files to be sure
-    matches = 0
-    for sig_file in MCP_SIGNATURE_FILES:
-        if (path / sig_file).exists():
-            matches += 1
-            if matches >= 3:  # Require at least 3 matches to be certain
-                logger.info(f"Detected MCP directory at {path}, will exclude from scanning")
-                return True
-    return False
+    # Get the directory where the MCP server is running from
+    # __file__ is utils/file_utils.py, so parent.parent is the MCP root
+    mcp_server_dir = Path(__file__).parent.parent.resolve()
+
+    # Check if the given path is the MCP server directory or a subdirectory
+    try:
+        path.resolve().relative_to(mcp_server_dir)
+        logger.info(f"Detected MCP server directory at {path}, will exclude from scanning")
+        return True
+    except ValueError:
+        # Not a subdirectory of MCP server
+        return False
 
 
 def get_user_home_directory() -> Optional[Path]:
     """
-    Get the user's home directory based on environment variables.
-
-    In Docker, USER_HOME should be set to the mounted home path.
-    Outside Docker, we use Path.home() or environment variables.
+    Get the user's home directory.
 
     Returns:
-        User's home directory path or None if not determinable
+        User's home directory path
     """
-    # Check for explicit USER_HOME env var (set in docker-compose.yml)
-    user_home = os.environ.get("USER_HOME")
-    if user_home:
-        return Path(user_home).resolve()
-
-    # In container, check if we're running in Docker
-    if CONTAINER_WORKSPACE.exists():
-        # We're in Docker but USER_HOME not set - use WORKSPACE_ROOT as fallback
-        if WORKSPACE_ROOT:
-            return Path(WORKSPACE_ROOT).resolve()
-
-    # Outside Docker, use system home
     return Path.home()
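The rewritten check leans on a standard pathlib idiom: Path.relative_to() raises ValueError when one path is not contained in another. A self-contained demonstration, with POSIX-style paths for illustration:

```python
from pathlib import Path

def is_inside(path: Path, root: Path) -> bool:
    """True if `path` is `root` itself or any descendant of it."""
    try:
        path.resolve().relative_to(root.resolve())
        return True
    except ValueError:  # relative_to() raises when path is outside root
        return False

assert is_inside(Path("/tmp/project/src"), Path("/tmp/project"))
assert not is_inside(Path("/etc"), Path("/tmp/project"))
```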
@@ -291,155 +279,51 @@ def _add_line_numbers(content: str) -> str:
     return "\n".join(numbered_lines)
 
 
-def translate_path_for_environment(path_str: str) -> str:
-    """
-    Translate paths between host and container environments as needed.
-
-    This is the unified path translation function that should be used by all
-    tools and utilities throughout the codebase. It handles:
-    1. Docker host-to-container path translation (host paths -> /workspace/...)
-    2. Direct mode (no translation needed)
-    3. Internal server files (conf/custom_models.json)
-    4. Security validation and error handling
-
-    Docker Path Translation Logic:
-    - Input: /Users/john/project/src/file.py (host path from Claude)
-    - WORKSPACE_ROOT: /Users/john/project (host path in env var)
-    - Output: /workspace/src/file.py (container path for file operations)
-
-    Args:
-        path_str: Original path string from the client (absolute host path)
-
-    Returns:
-        Translated path appropriate for the current environment
-    """
-    # Handle built-in server config file - no translation needed
-    if _is_builtin_custom_models_config(path_str):
-        return path_str
-    if not WORKSPACE_ROOT or not WORKSPACE_ROOT.strip() or not CONTAINER_WORKSPACE.exists():
-        if path_str.startswith("/app/"):
-            # Convert Docker internal paths to local relative paths for standalone mode
-            relative_path = path_str[5:]  # Remove "/app/" prefix
-            if relative_path.startswith("/"):
-                relative_path = relative_path[1:]  # Remove leading slash if present
-            return "./" + relative_path
-        # No other translation needed for standalone mode
-        return path_str
-
-    # Check if the path is already a container path (starts with /workspace)
-    if path_str.startswith(str(CONTAINER_WORKSPACE) + "/") or path_str == str(CONTAINER_WORKSPACE):
-        # Path is already translated to container format, return as-is
-        return path_str
-
-    try:
-        # Use os.path.realpath for security - it resolves symlinks completely
-        # This prevents symlink attacks that could escape the workspace
-        real_workspace_root = Path(os.path.realpath(WORKSPACE_ROOT))
-        # For the host path, we can't use realpath if it doesn't exist in the container
-        # So we'll use Path().resolve(strict=False) instead
-        real_host_path = Path(path_str).resolve(strict=False)
-
-        # Security check: ensure the path is within the mounted workspace
-        # This prevents path traversal attacks (e.g., ../../../etc/passwd)
-        relative_path = real_host_path.relative_to(real_workspace_root)
-
-        # Construct the container path
-        container_path = CONTAINER_WORKSPACE / relative_path
-
-        # Log the translation for debugging (but not sensitive paths)
-        if str(container_path) != path_str:
-            logger.info(f"Translated host path to container: {path_str} -> {container_path}")
-
-        return str(container_path)
-
-    except ValueError:
-        # Path is not within the host's WORKSPACE_ROOT
-        # In Docker, we cannot access files outside the mounted volume
-        logger.warning(
-            f"Path '{path_str}' is outside the mounted workspace '{WORKSPACE_ROOT}'. "
-            f"Docker containers can only access files within the mounted directory."
-        )
-        # Return a clear error path that will fail gracefully
-        return f"/inaccessible/outside/mounted/volume{path_str}"
-    except Exception as e:
-        # Log unexpected errors but don't expose internal details to clients
-        logger.warning(f"Path translation failed for '{path_str}': {type(e).__name__}")
-        # Return a clear error path that will fail gracefully
-        return f"/inaccessible/translation/error{path_str}"
-
-
 def resolve_and_validate_path(path_str: str) -> Path:
     """
-    Resolves, translates, and validates a path against security policies.
+    Resolves and validates a path against security policies.
 
-    This is the primary security function that ensures all file access
-    is properly sandboxed. It enforces three critical policies:
-    1. Translate host paths to container paths if applicable (Docker environment)
-    2. All paths must be absolute (no ambiguity)
-    3. All paths must resolve to within PROJECT_ROOT (sandboxing)
+    This function ensures safe file access by:
+    1. Requiring absolute paths (no ambiguity)
+    2. Resolving symlinks to prevent deception
+    3. Blocking access to dangerous system directories
 
     Args:
         path_str: Path string (must be absolute)
 
     Returns:
-        Resolved Path object that is guaranteed to be within PROJECT_ROOT
+        Resolved Path object that is safe to access
 
     Raises:
         ValueError: If path is not absolute or otherwise invalid
-        PermissionError: If path is outside allowed directory
+        PermissionError: If path is in a dangerous location
     """
-    # Step 1: Translate Docker paths first (if applicable)
-    # This must happen before any other validation
-    translated_path_str = translate_path_for_environment(path_str)
-
-    # Step 2: Create a Path object from the (potentially translated) path
-    user_path = Path(translated_path_str)
+    # Step 1: Create a Path object
+    user_path = Path(path_str)
 
-    # Step 3: Security Policy - Require absolute paths
+    # Step 2: Security Policy - Require absolute paths
     # Relative paths could be interpreted differently depending on working directory
     if not user_path.is_absolute():
         raise ValueError(f"Relative paths are not supported. Please provide an absolute path.\nReceived: {path_str}")
 
-    # Step 4: Resolve the absolute path (follows symlinks, removes .. and .)
+    # Step 3: Resolve the absolute path (follows symlinks, removes .. and .)
     # This is critical for security as it reveals the true destination of symlinks
     resolved_path = user_path.resolve()
 
-    # Step 5: Security Policy - Ensure the resolved path is within PROJECT_ROOT
-    # This prevents directory traversal attacks (e.g., /project/../../../etc/passwd)
-    try:
-        resolved_path.relative_to(SECURITY_ROOT)
-    except ValueError:
-        # Provide detailed error for debugging while avoiding information disclosure
-        logger.warning(
-            f"Access denied - path outside workspace. "
-            f"Requested: {path_str}, Resolved: {resolved_path}, Workspace: {SECURITY_ROOT}"
-        )
-        raise PermissionError(
-            f"Path outside workspace: {path_str}\nWorkspace: {SECURITY_ROOT}\nResolved path: {resolved_path}"
-        )
+    # Step 4: Check against dangerous paths
+    if is_dangerous_path(resolved_path):
+        logger.warning(f"Access denied - dangerous path: {resolved_path}")
+        raise PermissionError(f"Access to system directory denied: {path_str}")
+
+    # Step 5: Check if it's the home directory root
+    if is_home_directory_root(resolved_path):
+        raise PermissionError(
+            f"Cannot scan entire home directory: {path_str}\n" f"Please specify a subdirectory within your home folder."
+        )
 
     return resolved_path
 
 
-def translate_file_paths(file_paths: Optional[list[str]]) -> Optional[list[str]]:
-    """
-    Translate a list of file paths for the current environment.
-
-    This function should be used by all tools to consistently handle path translation
-    for file lists. It applies the unified path translation to each path in the list.
-
-    Args:
-        file_paths: List of file paths to translate, or None
-
-    Returns:
-        List of translated paths, or None if input was None
-    """
-    if not file_paths:
-        return file_paths
-
-    return [translate_path_for_environment(path) for path in file_paths]
-
-
 def expand_paths(paths: list[str], extensions: Optional[set[str]] = None) -> list[str]:
     """
     Expand paths to individual files, handling both files and directories.
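A usage sketch of the slimmed-down validator, matching the Raises contract above (the path is illustrative):

```python
from utils.file_utils import resolve_and_validate_path

try:
    safe_path = resolve_and_validate_path("/home/alice/project/notes.txt")
    print(f"OK to read: {safe_path}")
except ValueError:
    print("Absolute paths only")  # a relative path was passed
except PermissionError:
    print("Access denied")        # dangerous directory or home-directory root
```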
@@ -474,23 +358,12 @@ def expand_paths(paths: list[str], extensions: Optional[set[str]] = None) -> lis
 
         # Safety checks for directory scanning
         if path_obj.is_dir():
-            resolved_workspace = SECURITY_ROOT.resolve()
-            resolved_path = path_obj.resolve()
-
-            # Check 1: Prevent reading entire workspace root
-            if resolved_path == resolved_workspace:
-                logger.warning(
-                    f"Ignoring request to read entire workspace directory: {path}. "
-                    f"Please specify individual files or subdirectories instead."
-                )
-                continue
-
-            # Check 2: Prevent scanning user's home directory root
+            # Check 1: Prevent scanning user's home directory root
             if is_home_directory_root(path_obj):
                 logger.warning(f"Skipping home directory root: {path}. Please specify a project subdirectory instead.")
                 continue
 
-            # Check 3: Skip if this is the MCP's own directory
+            # Check 2: Skip if this is the MCP's own directory
             if is_mcp_directory(path_obj):
                 logger.info(
                     f"Skipping MCP server directory: {path}. The MCP server code is excluded from project scans."
@@ -573,15 +446,6 @@ def read_file_content(
         # Return error in a format that provides context to the AI
         logger.debug(f"[FILES] Path validation failed for {file_path}: {type(e).__name__}: {e}")
         error_msg = str(e)
-        # Add Docker-specific help if we're in Docker and path is inaccessible
-        if WORKSPACE_ROOT and CONTAINER_WORKSPACE.exists():
-            # We're in Docker
-            error_msg = (
-                f"File is outside the Docker mounted directory. "
-                f"When running in Docker, only files within the mounted workspace are accessible. "
-                f"Current mounted directory: {WORKSPACE_ROOT}. "
-                f"To access files in a different directory, please run Claude from that directory."
-            )
         content = f"\n--- ERROR ACCESSING FILE: {file_path} ---\nError: {error_msg}\n--- END FILE ---\n"
         tokens = estimate_tokens(content)
         logger.debug(f"[FILES] Returning error content for {file_path}: {tokens} tokens")
@@ -761,12 +625,10 @@ def estimate_file_tokens(file_path: str) -> int:
         Estimated token count for the file
     """
     try:
-        translated_path = translate_path_for_environment(file_path)
-
-        if not os.path.exists(translated_path) or not os.path.isfile(translated_path):
+        if not os.path.exists(file_path) or not os.path.isfile(file_path):
             return 0
 
-        file_size = os.path.getsize(translated_path)
+        file_size = os.path.getsize(file_path)
 
         # Get the appropriate ratio for this file type
         from .file_types import get_token_estimation_ratio
@@ -911,11 +773,10 @@ def read_json_file(file_path: str) -> Optional[dict]:
         Parsed JSON data as dict, or None if file doesn't exist or invalid
     """
     try:
-        translated_path = translate_path_for_environment(file_path)
-        if not os.path.exists(translated_path):
+        if not os.path.exists(file_path):
             return None
 
-        with open(translated_path, encoding="utf-8") as f:
+        with open(file_path, encoding="utf-8") as f:
             return json.load(f)
     except (json.JSONDecodeError, OSError):
         return None
@@ -934,10 +795,9 @@ def write_json_file(file_path: str, data: dict, indent: int = 2) -> bool:
         True if successful, False otherwise
     """
     try:
-        translated_path = translate_path_for_environment(file_path)
-        os.makedirs(os.path.dirname(translated_path), exist_ok=True)
+        os.makedirs(os.path.dirname(file_path), exist_ok=True)
 
-        with open(translated_path, "w", encoding="utf-8") as f:
+        with open(file_path, "w", encoding="utf-8") as f:
             json.dump(data, f, indent=indent, ensure_ascii=False)
         return True
     except (OSError, TypeError):
@@ -955,9 +815,8 @@ def get_file_size(file_path: str) -> int:
         File size in bytes, or 0 if file doesn't exist or error
     """
     try:
-        translated_path = translate_path_for_environment(file_path)
-        if os.path.exists(translated_path) and os.path.isfile(translated_path):
-            return os.path.getsize(translated_path)
+        if os.path.exists(file_path) and os.path.isfile(file_path):
+            return os.path.getsize(file_path)
         return 0
     except OSError:
         return 0
@@ -974,8 +833,7 @@ def ensure_directory_exists(file_path: str) -> bool:
         True if directory exists or was created, False on error
     """
    try:
-        translated_path = translate_path_for_environment(file_path)
-        directory = os.path.dirname(translated_path)
+        directory = os.path.dirname(file_path)
         if directory:
             os.makedirs(directory, exist_ok=True)
         return True
@@ -1010,15 +868,14 @@ def read_file_safely(file_path: str, max_size: int = 10 * 1024 * 1024) -> Option
         File content as string, or None if file too large or unreadable
     """
     try:
-        translated_path = translate_path_for_environment(file_path)
-        if not os.path.exists(translated_path) or not os.path.isfile(translated_path):
+        if not os.path.exists(file_path) or not os.path.isfile(file_path):
             return None
 
-        file_size = os.path.getsize(translated_path)
+        file_size = os.path.getsize(file_path)
         if file_size > max_size:
             return None
 
-        with open(translated_path, encoding="utf-8", errors="ignore") as f:
+        with open(file_path, encoding="utf-8", errors="ignore") as f:
             return f.read()
     except OSError:
         return None
utils/git_utils.py
@@ -55,7 +55,7 @@ def find_git_repositories(start_path: str, max_depth: int = 5) -> list[str]:
 
     try:
         # Create Path object - no need to resolve yet since the path might be
-        # a translated Docker path that doesn't exist on the host
+        # a translated path that doesn't exist
         start_path = Path(start_path)
 
         # Basic validation - must be absolute
utils/security_config.py
@@ -2,15 +2,14 @@
 Security configuration and path validation constants
 
 This module contains security-related constants and configurations
-for file access control and workspace management.
+for file access control.
 """
 
-import os
 from pathlib import Path
 
-# Dangerous paths that should never be used as WORKSPACE_ROOT
+# Dangerous paths that should never be scanned
 # These would give overly broad access and pose security risks
-DANGEROUS_WORKSPACE_PATHS = {
+DANGEROUS_PATHS = {
     "/",
     "/etc",
     "/usr",
@@ -18,7 +17,6 @@ DANGEROUS_WORKSPACE_PATHS = {
     "/var",
     "/root",
     "/home",
-    "/workspace",  # Container path - WORKSPACE_ROOT should be host path
     "C:\\",
     "C:\\Windows",
     "C:\\Program Files",
@@ -88,87 +86,19 @@ EXCLUDED_DIRS = {
     "vendor",
 }
 
-# MCP signature files - presence of these indicates the MCP's own directory
-# Used to prevent the MCP from scanning its own codebase
-MCP_SIGNATURE_FILES = {
-    "zen_server.py",
-    "server.py",
-    "tools/precommit.py",
-    "utils/file_utils.py",
-    "prompts/tool_prompts.py",
-}
-
-# Workspace configuration
-WORKSPACE_ROOT = os.environ.get("WORKSPACE_ROOT")
-CONTAINER_WORKSPACE = Path("/workspace")
-
-
-def validate_workspace_security(workspace_root: str) -> None:
+def is_dangerous_path(path: Path) -> bool:
     """
-    Validate that WORKSPACE_ROOT is set to a safe directory.
+    Check if a path is in the dangerous paths list.
 
     Args:
-        workspace_root: The workspace root path to validate
-
-    Raises:
-        RuntimeError: If the workspace root is unsafe
-    """
-    if not workspace_root:
-        return
-
-    # Resolve to canonical path for comparison
-    resolved_workspace = Path(workspace_root).resolve()
-
-    # Special check for /workspace - common configuration mistake
-    if str(resolved_workspace) == "/workspace":
-        raise RuntimeError(
-            f"Configuration Error: WORKSPACE_ROOT should be set to the HOST path, not the container path. "
-            f"Found: WORKSPACE_ROOT={workspace_root} "
-            f"Expected: WORKSPACE_ROOT should be set to your host directory path (e.g., $HOME) "
-            f"that contains all files Claude might reference. "
-            f"This path gets mounted to /workspace inside the Docker container."
-        )
-
-    # Check against other dangerous paths
-    if str(resolved_workspace) in DANGEROUS_WORKSPACE_PATHS:
-        raise RuntimeError(
-            f"Security Error: WORKSPACE_ROOT '{workspace_root}' is set to a dangerous system directory. "
-            f"This would give access to critical system files. "
-            f"Please set WORKSPACE_ROOT to a specific project directory."
-        )
-
-    # Additional check: prevent filesystem root
-    if resolved_workspace.parent == resolved_workspace:
-        raise RuntimeError(
-            f"Security Error: WORKSPACE_ROOT '{workspace_root}' cannot be the filesystem root. "
-            f"This would give access to the entire filesystem. "
-            f"Please set WORKSPACE_ROOT to a specific project directory."
-        )
-
-
-def get_security_root() -> Path:
-    """
-    Determine the security boundary for file access.
+        path: Path to check
 
     Returns:
-        Path object representing the security root directory
+        True if the path is dangerous and should not be accessed
     """
-    # In Docker: use /workspace (container directory)
-    # In tests/direct mode: use WORKSPACE_ROOT (host directory)
-    if CONTAINER_WORKSPACE.exists():
-        # Running in Docker container
-        return CONTAINER_WORKSPACE
-    elif WORKSPACE_ROOT:
-        # Running in tests or direct mode with WORKSPACE_ROOT set
-        return Path(WORKSPACE_ROOT).resolve()
-    else:
-        # Fallback for backward compatibility (should not happen in normal usage)
-        return Path.home()
-
-
-# Validate security on import if WORKSPACE_ROOT is set
-if WORKSPACE_ROOT:
-    validate_workspace_security(WORKSPACE_ROOT)
-
-# Export the computed security root
-SECURITY_ROOT = get_security_root()
+    try:
+        resolved = path.resolve()
+        return str(resolved) in DANGEROUS_PATHS or resolved.parent == resolved
+    except Exception:
+        return True  # If we can't resolve, consider it dangerous
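Expected behavior of the new helper, sketched as assertions (results can vary with symlinks and platform path resolution):

```python
from pathlib import Path

from utils.security_config import is_dangerous_path

assert is_dangerous_path(Path("/"))     # filesystem root: parent == self
assert is_dangerous_path(Path("/etc"))  # listed in DANGEROUS_PATHS
assert not is_dangerous_path(Path("/home/alice/project"))  # ordinary project dir
```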
utils/storage_backend.py — new file (113 lines)
@@ -0,0 +1,113 @@
+"""
+In-memory storage backend for conversation threads
+
+This module provides a thread-safe, in-memory alternative to Redis for storing
+conversation contexts. It's designed for ephemeral MCP server sessions where
+conversations only need to persist during a single Claude session.
+
+⚠️ PROCESS-SPECIFIC STORAGE: This storage is confined to a single Python process.
+Data stored in one process is NOT accessible from other processes or subprocesses.
+This is why simulator tests that run server.py as separate subprocesses cannot
+share conversation state between tool calls.
+
+Key Features:
+- Thread-safe operations using locks
+- TTL support with automatic expiration
+- Background cleanup thread for memory management
+- Singleton pattern for consistent state within a single process
+- Drop-in replacement for Redis storage (for single-process scenarios)
+"""
+
+import logging
+import os
+import threading
+import time
+from typing import Optional
+
+logger = logging.getLogger(__name__)
+
+
+class InMemoryStorage:
+    """Thread-safe in-memory storage for conversation threads"""
+
+    def __init__(self):
+        self._store: dict[str, tuple[str, float]] = {}
+        self._lock = threading.Lock()
+        # Match Redis behavior: cleanup interval based on conversation timeout
+        # Run cleanup at 1/10th of timeout interval (e.g., 18 mins for 3 hour timeout)
+        timeout_hours = int(os.getenv("CONVERSATION_TIMEOUT_HOURS", "3"))
+        self._cleanup_interval = (timeout_hours * 3600) // 10
+        self._cleanup_interval = max(300, self._cleanup_interval)  # Minimum 5 minutes
+        self._shutdown = False
+
+        # Start background cleanup thread
+        self._cleanup_thread = threading.Thread(target=self._cleanup_worker, daemon=True)
+        self._cleanup_thread.start()
+
+        logger.info(
+            f"In-memory storage initialized with {timeout_hours}h timeout, cleanup every {self._cleanup_interval//60}m"
+        )
+
+    def set_with_ttl(self, key: str, ttl_seconds: int, value: str) -> None:
+        """Store value with expiration time"""
+        with self._lock:
+            expires_at = time.time() + ttl_seconds
+            self._store[key] = (value, expires_at)
+            logger.debug(f"Stored key {key} with TTL {ttl_seconds}s")
+
+    def get(self, key: str) -> Optional[str]:
+        """Retrieve value if not expired"""
+        with self._lock:
+            if key in self._store:
+                value, expires_at = self._store[key]
+                if time.time() < expires_at:
+                    logger.debug(f"Retrieved key {key}")
+                    return value
+                else:
+                    # Clean up expired entry
+                    del self._store[key]
+                    logger.debug(f"Key {key} expired and removed")
+        return None
+
+    def setex(self, key: str, ttl_seconds: int, value: str) -> None:
+        """Redis-compatible setex method"""
+        self.set_with_ttl(key, ttl_seconds, value)
+
+    def _cleanup_worker(self):
+        """Background thread that periodically cleans up expired entries"""
+        while not self._shutdown:
+            time.sleep(self._cleanup_interval)
+            self._cleanup_expired()
+
+    def _cleanup_expired(self):
+        """Remove all expired entries"""
+        with self._lock:
+            current_time = time.time()
+            expired_keys = [k for k, (_, exp) in self._store.items() if exp < current_time]
+            for key in expired_keys:
+                del self._store[key]
+
+            if expired_keys:
+                logger.debug(f"Cleaned up {len(expired_keys)} expired conversation threads")
+
+    def shutdown(self):
+        """Graceful shutdown of background thread"""
+        self._shutdown = True
+        if self._cleanup_thread.is_alive():
+            self._cleanup_thread.join(timeout=1)
+
+
+# Global singleton instance
+_storage_instance = None
+_storage_lock = threading.Lock()
+
+
+def get_storage_backend() -> InMemoryStorage:
+    """Get the global storage instance (singleton pattern)"""
+    global _storage_instance
+    if _storage_instance is None:
+        with _storage_lock:
+            if _storage_instance is None:
+                _storage_instance = InMemoryStorage()
+                logger.info("Initialized in-memory conversation storage")
+    return _storage_instance
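To round out the new file, a small usage example (a sketch; the one-second TTL is only to make expiry observable):

```python
import time

from utils.storage_backend import get_storage_backend

storage = get_storage_backend()  # singleton: every caller gets the same object
storage.set_with_ttl("thread:demo", 1, "payload")

assert storage.get("thread:demo") == "payload"
time.sleep(1.1)
assert storage.get("thread:demo") is None  # expired; removed lazily on read
```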