Performance improvements when embedding files:

- Exit early at the MCP boundary if files won't fit within the chosen model's context
- Encourage Claude to re-run with a smaller, more relevant file selection
- Check file sizes before embedding
- Drop files from older conversation turns when building continuations, giving priority to newer files
- List the excluded files in the response returned to Claude
- Improved tests
- Improved precommit prompt
- Added a new Low severity level to precommit
- Improved documentation of file embedding strategy
- Refactor
This commit is contained in:
Fahad
2025-06-16 05:51:52 +04:00
parent 56333cbd86
commit 91077e3810
16 changed files with 1557 additions and 308 deletions

View File

@@ -30,12 +30,33 @@ Key Features:
- Turn-by-turn conversation history storage with tool attribution
- Cross-tool continuation support - switch tools while preserving context
- File context preservation - files shared in earlier turns remain accessible
- NEWEST-FIRST FILE PRIORITIZATION - when the same file appears in multiple turns,
references from newer turns take precedence over older ones. This ensures the
most recent file context is preserved when token limits require exclusions.
- Automatic turn limiting (20 turns max) to prevent runaway conversations
- Context reconstruction for stateless request continuity
- Redis-based persistence with automatic expiration (3 hour TTL)
- Thread-safe operations for concurrent access
- Graceful degradation when Redis is unavailable
FILE PRIORITIZATION STRATEGY:
The conversation memory system implements a sophisticated file prioritization algorithm
that ensures newer file references always take precedence over older ones:
1. When collecting files across conversation turns, the system walks BACKWARDS through
turns (newest to oldest) and builds a unique file list
2. If the same file path appears in multiple turns, only the reference from the
NEWEST turn is kept in the final list
3. This "newest-first" ordering is preserved throughout the entire pipeline:
- get_conversation_file_list() establishes the order
- build_conversation_history() maintains it during token budgeting
- When token limits are hit, OLDER files are excluded first
4. This strategy works across conversation chains - files from newer turns in ANY
thread take precedence over files from older turns in ANY thread
This approach ensures that when token limits force file exclusions, the most
recently referenced and contextually relevant files are preserved.
USAGE EXAMPLE:
1. Tool A creates thread: create_thread("analyze", request_data) → returns UUID
2. Tool A adds response: add_turn(UUID, "assistant", response, files=[...], tool_name="analyze")
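A minimal sketch of the newest-first collection described above (simplified, hypothetical turn data; the real logic lives in get_conversation_file_list() shown later in this diff):
turns = [
    {"files": ["main.py", "utils.py"]},   # Turn 1 (oldest)
    {"files": ["test.py"]},               # Turn 2
    {"files": ["main.py", "config.py"]},  # Turn 3 (newest)
]
seen, ordered = set(), []
for turn in reversed(turns):              # walk newest -> oldest
    for path in turn["files"]:
        if path not in seen:              # newest reference wins
            seen.add(path)
            ordered.append(path)
print(ordered)  # ['main.py', 'config.py', 'test.py', 'utils.py']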
@@ -262,11 +283,12 @@ def add_turn(
model_metadata: Optional[dict[str, Any]] = None,
) -> bool:
"""
Add turn to existing thread with atomic file ordering.
Appends a new conversation turn to an existing thread. This is the core
function for building conversation history and enabling cross-tool
continuation. Each turn preserves the tool and model that generated it,
and tracks file reception order using atomic Redis counters.
Args:
thread_id: UUID of the conversation thread
@@ -289,7 +311,7 @@ def add_turn(
Note:
- Refreshes thread TTL to configured timeout on successful update
- Turn limits prevent runaway conversations
- File references are preserved for cross-tool access with atomic ordering
- Model information enables cross-provider conversations
"""
logger.debug(f"[FLOW] Adding {role} turn to {thread_id} ({tool_name})")
@@ -374,77 +396,212 @@ def get_thread_chain(thread_id: str, max_depth: int = 20) -> list[ThreadContext]
def get_conversation_file_list(context: ThreadContext) -> list[str]:
"""
Extract all unique files from conversation turns with newest-first prioritization.
This function implements the core file prioritization logic used throughout the
conversation memory system. It walks backwards through conversation turns
(from newest to oldest) and collects unique file references, ensuring that
when the same file appears in multiple turns, the reference from the NEWEST
turn takes precedence.
PRIORITIZATION ALGORITHM:
1. Iterate through turns in REVERSE order (index len-1 down to 0)
2. For each turn, process files in the order they appear in turn.files
3. Add file to result list only if not already seen (newest reference wins)
4. Skip duplicate files that were already added from newer turns
This ensures that:
- Files from newer conversation turns appear first in the result
- When the same file is referenced multiple times, only the newest reference is kept
- The order reflects the most recent conversation context
Example:
Turn 1: files = ["main.py", "utils.py"]
Turn 2: files = ["test.py"]
Turn 3: files = ["main.py", "config.py"] # main.py appears again
Result: ["main.py", "config.py", "test.py", "utils.py"]
(main.py from Turn 3 takes precedence over Turn 1)
Args:
context: ThreadContext containing all conversation turns to process
Returns:
list[str]: Unique file paths ordered by newest reference first.
Empty list if no turns exist or no files are referenced.
Performance:
- Time Complexity: O(n*m) where n=turns, m=avg files per turn
- Space Complexity: O(f) where f=total unique files
- Uses set for O(1) duplicate detection
"""
if not context.turns:
logger.debug("[FILES] No turns found, returning empty file list")
return []
# Collect files by walking backwards (newest to oldest turns)
seen_files = set()
file_list = []
logger.debug(f"[FILES] Collecting files from {len(context.turns)} turns")
logger.debug(f"[FILES] Collecting files from {len(context.turns)} turns (newest first)")
# Process turns in reverse order (newest first) - this is the CORE of newest-first prioritization
# By iterating from len-1 down to 0, we encounter newer turns before older turns
# When we find a duplicate file, we skip it because the newer version is already in our list
for i in range(len(context.turns) - 1, -1, -1): # REVERSE: newest turn first
turn = context.turns[i]
if turn.files:
logger.debug(f"[FILES] Turn {i + 1} has {len(turn.files)} files: {turn.files}")
for file_path in turn.files:
if file_path not in seen_files:
# First time seeing this file - add it (this is the NEWEST reference)
seen_files.add(file_path)
file_list.append(file_path)
logger.debug(f"[FILES] Added new file: {file_path} (from turn {i + 1})")
else:
# File already seen from a NEWER turn - skip this older reference
logger.debug(f"[FILES] Skipping duplicate file: {file_path} (newer version already included)")
logger.debug(f"[FILES] Final file list ({len(file_list)}): {file_list}")
return file_list
def _plan_file_inclusion_by_size(all_files: list[str], max_file_tokens: int) -> tuple[list[str], list[str], int]:
"""
Plan which files to include based on size constraints.
This is ONLY used for conversation history building, not MCP boundary checks.
Args:
all_files: List of files to consider for inclusion
max_file_tokens: Maximum tokens available for file content
Returns:
Tuple of (files_to_include, files_to_skip, estimated_total_tokens)
"""
if not all_files:
return [], [], 0
files_to_include = []
files_to_skip = []
total_tokens = 0
logger.debug(f"[FILES] Planning inclusion for {len(all_files)} files with budget {max_file_tokens:,} tokens")
for file_path in all_files:
try:
from utils.file_utils import estimate_file_tokens, translate_path_for_environment
translated_path = translate_path_for_environment(file_path)
if os.path.exists(translated_path) and os.path.isfile(translated_path):
# Use centralized token estimation for consistency
estimated_tokens = estimate_file_tokens(file_path)
if total_tokens + estimated_tokens <= max_file_tokens:
files_to_include.append(file_path)
total_tokens += estimated_tokens
logger.debug(
f"[FILES] Including {file_path} - {estimated_tokens:,} tokens (total: {total_tokens:,})"
)
else:
files_to_skip.append(file_path)
logger.debug(
f"[FILES] Skipping {file_path} - would exceed budget (needs {estimated_tokens:,} tokens)"
)
else:
files_to_skip.append(file_path)
logger.debug(f"[FILES] Skipping {file_path} - file not accessible")
except Exception as e:
files_to_skip.append(file_path)
logger.debug(f"[FILES] Skipping {file_path} - error: {type(e).__name__}: {e}")
logger.debug(
f"[FILES] Inclusion plan: {len(files_to_include)} include, {len(files_to_skip)} skip, {total_tokens:,} tokens"
)
return files_to_include, files_to_skip, total_tokens
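A rough usage sketch (hypothetical file names and token estimates) of the planner above; note that a skipped file does not stop the loop, so a later, smaller file can still fit:
# Suppose estimate_file_tokens() returns 6,000 for recent.py, 3,500 for config.py and 2,000 for legacy.py
include, skip, used = _plan_file_inclusion_by_size(
    ["recent.py", "config.py", "legacy.py"],  # already ordered newest-first
    max_file_tokens=10_000,
)
# include == ["recent.py", "config.py"]   (6,000 + 3,500 = 9,500 tokens)
# skip    == ["legacy.py"]                (adding 2,000 would exceed the 10,000 budget)
# used    == 9500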
def build_conversation_history(context: ThreadContext, model_context=None, read_files_func=None) -> tuple[str, int]:
"""
Build formatted conversation history for tool prompts with embedded file contents.
Creates a comprehensive conversation history that includes both conversation turns and
file contents, with intelligent prioritization to maximize relevant context within
token limits. This function enables stateless tools to access complete conversation
context from previous interactions, including cross-tool continuations.
FILE PRIORITIZATION BEHAVIOR:
Files from newer conversation turns are prioritized over files from older turns.
When the same file appears in multiple turns, the reference from the NEWEST turn
takes precedence. This ensures the most recent file context is preserved when
token limits require file exclusions.
CONVERSATION CHAIN HANDLING:
If the thread has a parent_thread_id, this function traverses the entire chain
to include complete conversation history across multiple linked threads. File
prioritization works across the entire chain, not just the current thread.
TOKEN MANAGEMENT:
- Uses model-specific token allocation (file_tokens + history_tokens)
- Files are embedded ONCE at the start to prevent duplication
- Conversation turns are processed newest-first but presented chronologically
- Stops adding turns when token budget would be exceeded
- Gracefully handles token limits with informative notes
Args:
context: ThreadContext containing the conversation to format
model_context: ModelContext for token allocation (optional, uses DEFAULT_MODEL fallback)
read_files_func: Optional function to read files (primarily for testing)
Returns:
tuple[str, int]: (formatted_conversation_history, total_tokens_used)
Returns ("", 0) if no conversation turns exist
Returns ("", 0) if no conversation turns exist in the context
Output Format:
=== CONVERSATION HISTORY (CONTINUATION) ===
Thread: <thread_id>
Tool: <original_tool_name>
Turn <current>/<max_allowed>
You are continuing this conversation thread from where it left off.
=== FILES REFERENCED IN THIS CONVERSATION ===
The following files have been shared and analyzed during our conversation.
[NOTE: X files omitted due to size constraints]
Refer to these when analyzing the context and requests below:
<embedded_file_contents_with_line_numbers>
=== END REFERENCED FILES ===
Previous conversation turns:
--- Turn 1 (Claude) ---
Files used in this turn: file1.py, file2.py
<turn_content>
--- Turn 2 (Gemini using analyze via google/gemini-2.5-flash) ---
Files used in this turn: file3.py
<turn_content>
=== END CONVERSATION HISTORY ===
IMPORTANT: You are continuing an existing conversation thread...
This is turn X of the conversation - use the conversation history above...
Cross-Tool Collaboration:
This formatted history allows any tool to "see" both conversation context AND
file contents from previous tools, enabling seamless handoffs between analyze,
codereview, debug, chat, and other tools while maintaining complete context.
Performance Characteristics:
- O(n) file collection with newest-first prioritization
- Intelligent token budgeting prevents context window overflow
- Redis-based persistence with automatic TTL management
- Graceful degradation when files are inaccessible or too large
"""
# Get the complete thread chain
if context.parent_thread_id:
@@ -453,19 +610,25 @@ def build_conversation_history(context: ThreadContext, model_context=None, read_
# Collect all turns from all threads in chain
all_turns = []
total_turns = 0
for thread in chain:
all_turns.extend(thread.turns)
total_turns += len(thread.turns)
# Use centralized file collection logic for consistency across the entire chain
# This ensures files from newer turns across ALL threads take precedence
# over files from older turns, maintaining the newest-first prioritization
# even when threads are chained together
temp_context = ThreadContext(
thread_id="merged_chain",
created_at=context.created_at,
last_updated_at=context.last_updated_at,
tool_name=context.tool_name,
turns=all_turns, # All turns from entire chain in chronological order
initial_context=context.initial_context,
)
all_files = get_conversation_file_list(temp_context) # Applies newest-first logic to entire chain
logger.debug(f"[THREAD] Built history from {len(chain)} threads with {total_turns} total turns")
else:
# Single thread, no parent chain
@@ -511,101 +674,91 @@ def build_conversation_history(context: ThreadContext, model_context=None, read_
"",
]
# Embed files referenced in this conversation with size-aware selection
if all_files:
logger.debug(f"[FILES] Starting embedding for {len(all_files)} files")
# Plan file inclusion based on size constraints
# CRITICAL: all_files is already ordered by newest-first prioritization from get_conversation_file_list()
# So when _plan_file_inclusion_by_size() hits token limits, it naturally excludes OLDER files first
# while preserving the most recent file references - exactly what we want!
files_to_include, files_to_skip, estimated_tokens = _plan_file_inclusion_by_size(all_files, max_file_tokens)
if files_to_skip:
logger.info(f"[FILES] Skipping {len(files_to_skip)} files due to size constraints: {files_to_skip}")
if files_to_include:
history_parts.extend(
[
"=== FILES REFERENCED IN THIS CONVERSATION ===",
"The following files have been shared and analyzed during our conversation.",
(
""
if not files_to_skip
else f"[NOTE: {len(files_to_skip)} files omitted due to size constraints]"
),
"Refer to these when analyzing the context and requests below:",
"",
]
)
if read_files_func is None:
from utils.file_utils import read_file_content
# Process files for embedding
file_contents = []
total_tokens = 0
files_included = 0
for file_path in files_to_include:
try:
logger.debug(f"[FILES] Processing file {file_path}")
formatted_content, content_tokens = read_file_content(file_path)
if formatted_content:
file_contents.append(formatted_content)
total_tokens += content_tokens
files_included += 1
logger.debug(
f"File embedded in conversation history: {file_path} ({content_tokens:,} tokens)"
)
logger.debug(
f"[FILES] Successfully embedded {file_path} - {content_tokens:,} tokens (total: {total_tokens:,})"
)
else:
logger.debug(f"File skipped (empty content): {file_path}")
except Exception as e:
logger.warning(
f"Failed to embed file in conversation history: {file_path} - {type(e).__name__}: {e}"
)
continue
if file_contents:
files_content = "".join(file_contents)
if files_to_skip:
files_content += (
f"\n[NOTE: {len(files_to_skip)} additional file(s) were omitted due to size constraints. "
f"These were older files from earlier conversation turns.]\n"
)
history_parts.append(files_content)
logger.debug(
f"Conversation history file embedding complete: {files_included} files embedded, {len(files_to_skip)} omitted, {total_tokens:,} total tokens"
)
else:
history_parts.append("(No accessible files found)")
logger.debug(f"[FILES] No accessible files found from {len(files_to_include)} planned files")
else:
# Fallback to original read_files function for backward compatibility
files_content = read_files_func(all_files)
if files_content:
# Add token validation for the combined file content
from utils.token_utils import check_token_limit
within_limit, estimated_tokens = check_token_limit(files_content)
if within_limit:
history_parts.append(files_content)
else:
# Handle token limit exceeded for conversation files
error_message = f"ERROR: The total size of files referenced in this conversation has exceeded the context limit and cannot be displayed.\nEstimated tokens: {estimated_tokens}, but limit is {max_file_tokens}."
history_parts.append(error_message)
else:
history_parts.append("(No accessible files found)")
history_parts.extend(
[

View File

@@ -178,3 +178,65 @@ def is_binary_file(file_path: str) -> bool:
from pathlib import Path
return Path(file_path).suffix.lower() in BINARY_EXTENSIONS
# File-type specific token-to-byte ratios for accurate token estimation
# Based on empirical analysis of file compression characteristics and tokenization patterns
TOKEN_ESTIMATION_RATIOS = {
# Programming languages
".py": 3.5, # Python - moderate verbosity
".js": 3.2, # JavaScript - compact syntax
".ts": 3.3, # TypeScript - type annotations add tokens
".jsx": 3.1, # React JSX - JSX tags are tokenized efficiently
".tsx": 3.0, # React TSX - combination of TypeScript + JSX
".java": 3.6, # Java - verbose syntax, long identifiers
".cpp": 3.7, # C++ - preprocessor directives, templates
".c": 3.8, # C - function definitions, struct declarations
".go": 3.9, # Go - explicit error handling, package names
".rs": 3.5, # Rust - similar to Python in verbosity
".php": 3.3, # PHP - mixed HTML/code, variable prefixes
".rb": 3.6, # Ruby - descriptive method names
".swift": 3.4, # Swift - modern syntax, type inference
".kt": 3.5, # Kotlin - similar to modern languages
".scala": 3.2, # Scala - functional programming, concise
# Scripts and configuration
".sh": 4.1, # Shell scripts - commands and paths
".bat": 4.0, # Batch files - similar to shell
".ps1": 3.8, # PowerShell - more structured than bash
".sql": 3.8, # SQL - keywords and table/column names
# Data and configuration formats
".json": 2.5, # JSON - lots of punctuation and quotes
".yaml": 3.0, # YAML - structured but readable
".yml": 3.0, # YAML (alternative extension)
".xml": 2.8, # XML - tags and attributes
".toml": 3.2, # TOML - similar to config files
# Documentation and text
".md": 4.2, # Markdown - natural language with formatting
".txt": 4.0, # Plain text - mostly natural language
".rst": 4.1, # reStructuredText - documentation format
# Web technologies
".html": 2.9, # HTML - tags and attributes
".css": 3.4, # CSS - properties and selectors
# Logs and data
".log": 4.5, # Log files - timestamps, messages, stack traces
".csv": 3.1, # CSV - data with delimiters
# Docker and infrastructure
".dockerfile": 3.7, # Dockerfile - commands and paths
".tf": 3.5, # Terraform - infrastructure as code
}
def get_token_estimation_ratio(file_path: str) -> float:
"""
Get the token estimation ratio for a file based on its extension.
Args:
file_path: Path to the file
Returns:
Token-to-byte ratio for the file type (default: 3.5 for unknown types)
"""
from pathlib import Path
extension = Path(file_path).suffix.lower()
return TOKEN_ESTIMATION_RATIOS.get(extension, 3.5) # Conservative default
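As a quick illustration (hypothetical file and size), the ratio converts a byte count into a token estimate, which estimate_file_tokens() below applies automatically:
ratio = get_token_estimation_ratio("src/server.py")  # 3.5 for .py files
estimated_tokens = int(7_000 / ratio)                 # a 7,000-byte Python file ~= 2,000 tokens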

View File

@@ -18,10 +18,12 @@ Security Model:
- Symbolic links are resolved to ensure they stay within bounds
"""
import json
import logging
import os
import time
from pathlib import Path
from typing import Callable, Optional
from .file_types import BINARY_EXTENSIONS, CODE_EXTENSIONS, IMAGE_EXTENSIONS, TEXT_EXTENSIONS
from .security_config import CONTAINER_WORKSPACE, EXCLUDED_DIRS, MCP_SIGNATURE_FILES, SECURITY_ROOT, WORKSPACE_ROOT
@@ -689,3 +691,349 @@ def read_files(
result = "\n\n".join(content_parts) if content_parts else ""
logger.debug(f"[FILES] read_files complete: {len(result)} chars, {total_tokens:,} tokens used")
return result
def estimate_file_tokens(file_path: str) -> int:
"""
Estimate tokens for a file using file-type aware ratios.
Args:
file_path: Path to the file
Returns:
Estimated token count for the file
"""
try:
translated_path = translate_path_for_environment(file_path)
if not os.path.exists(translated_path) or not os.path.isfile(translated_path):
return 0
file_size = os.path.getsize(translated_path)
# Get the appropriate ratio for this file type
from .file_types import get_token_estimation_ratio
ratio = get_token_estimation_ratio(file_path)
return int(file_size / ratio)
except Exception:
return 0
def check_files_size_limit(files: list[str], max_tokens: int, threshold_percent: float = 1.0) -> tuple[bool, int, int]:
"""
Check if a list of files would exceed token limits.
Args:
files: List of file paths to check
max_tokens: Maximum allowed tokens
threshold_percent: Percentage of max_tokens to use as threshold (0.0-1.0)
Returns:
Tuple of (within_limit, total_estimated_tokens, file_count)
"""
if not files:
return True, 0, 0
total_estimated_tokens = 0
file_count = 0
threshold = int(max_tokens * threshold_percent)
for file_path in files:
try:
estimated_tokens = estimate_file_tokens(file_path)
total_estimated_tokens += estimated_tokens
if estimated_tokens > 0: # Only count accessible files
file_count += 1
except Exception:
# Skip files that can't be accessed for size check
continue
within_limit = total_estimated_tokens <= threshold
return within_limit, total_estimated_tokens, file_count
class LogTailer:
"""
General-purpose log file tailer with rotation detection.
This class provides a reusable way to monitor log files for new content,
automatically handling log rotation and maintaining position tracking.
"""
def __init__(self, file_path: str, initial_seek_end: bool = True):
"""
Initialize log tailer for a specific file.
Args:
file_path: Path to the log file to monitor
initial_seek_end: If True, start monitoring from end of file
"""
self.file_path = file_path
self.position = 0
self.last_size = 0
self.initial_seek_end = initial_seek_end
# Ensure file exists and initialize position
Path(self.file_path).touch()
if self.initial_seek_end and os.path.exists(self.file_path):
self.last_size = os.path.getsize(self.file_path)
self.position = self.last_size
def read_new_lines(self) -> list[str]:
"""
Read new lines since last call, handling rotation.
Returns:
List of new lines from the file
"""
if not os.path.exists(self.file_path):
return []
try:
current_size = os.path.getsize(self.file_path)
# Check for log rotation (file size decreased)
if current_size < self.last_size:
self.position = 0
self.last_size = current_size
with open(self.file_path, encoding="utf-8", errors="ignore") as f:
f.seek(self.position)
new_lines = f.readlines()
self.position = f.tell()
self.last_size = current_size
# Strip whitespace from each line
return [line.strip() for line in new_lines if line.strip()]
except OSError:
return []
def monitor_continuously(
self,
line_handler: Callable[[str], None],
check_interval: float = 0.5,
stop_condition: Optional[Callable[[], bool]] = None,
):
"""
Monitor file continuously and call handler for each new line.
Args:
line_handler: Function to call for each new line
check_interval: Seconds between file checks
stop_condition: Optional function that returns True to stop monitoring
"""
while True:
try:
if stop_condition and stop_condition():
break
new_lines = self.read_new_lines()
for line in new_lines:
line_handler(line)
time.sleep(check_interval)
except KeyboardInterrupt:
break
except Exception as e:
logger.warning(f"Error monitoring log file {self.file_path}: {e}")
time.sleep(1)
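A small usage sketch for the tailer above (hypothetical log path):
tailer = LogTailer("/tmp/mcp_server.log", initial_seek_end=True)
# One-shot poll: returns only the lines written since the previous call
for line in tailer.read_new_lines():
    print(line)
# Continuous monitoring: streams new lines until interrupted
tailer.monitor_continuously(print, check_interval=0.5)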
def read_json_file(file_path: str) -> Optional[dict]:
"""
Read and parse a JSON file with proper error handling.
Args:
file_path: Path to the JSON file
Returns:
Parsed JSON data as dict, or None if file doesn't exist or invalid
"""
try:
translated_path = translate_path_for_environment(file_path)
if not os.path.exists(translated_path):
return None
with open(translated_path, encoding="utf-8") as f:
return json.load(f)
except (json.JSONDecodeError, OSError):
return None
def write_json_file(file_path: str, data: dict, indent: int = 2) -> bool:
"""
Write data to a JSON file with proper formatting.
Args:
file_path: Path to write the JSON file
data: Dictionary data to serialize
indent: JSON indentation level
Returns:
True if successful, False otherwise
"""
try:
translated_path = translate_path_for_environment(file_path)
os.makedirs(os.path.dirname(translated_path), exist_ok=True)
with open(translated_path, "w", encoding="utf-8") as f:
json.dump(data, f, indent=indent, ensure_ascii=False)
return True
except (OSError, TypeError):
return False
def get_file_size(file_path: str) -> int:
"""
Get file size in bytes with proper error handling.
Args:
file_path: Path to the file
Returns:
File size in bytes, or 0 if file doesn't exist or error
"""
try:
translated_path = translate_path_for_environment(file_path)
if os.path.exists(translated_path) and os.path.isfile(translated_path):
return os.path.getsize(translated_path)
return 0
except OSError:
return 0
def ensure_directory_exists(file_path: str) -> bool:
"""
Ensure the parent directory of a file path exists.
Args:
file_path: Path to file (directory will be created for parent)
Returns:
True if directory exists or was created, False on error
"""
try:
translated_path = translate_path_for_environment(file_path)
directory = os.path.dirname(translated_path)
if directory:
os.makedirs(directory, exist_ok=True)
return True
except OSError:
return False
def is_text_file(file_path: str) -> bool:
"""
Check if a file is likely a text file based on extension and content.
Args:
file_path: Path to the file
Returns:
True if file appears to be text, False otherwise
"""
from .file_types import is_text_file as check_text_type
return check_text_type(file_path)
def read_file_safely(file_path: str, max_size: int = 10 * 1024 * 1024) -> Optional[str]:
"""
Read a file with size limits and encoding handling.
Args:
file_path: Path to the file
max_size: Maximum file size in bytes (default 10MB)
Returns:
File content as string, or None if file too large or unreadable
"""
try:
translated_path = translate_path_for_environment(file_path)
if not os.path.exists(translated_path) or not os.path.isfile(translated_path):
return None
file_size = os.path.getsize(translated_path)
if file_size > max_size:
return None
with open(translated_path, encoding="utf-8", errors="ignore") as f:
return f.read()
except OSError:
return None
def check_total_file_size(files: list[str], model_name: Optional[str] = None) -> Optional[dict]:
"""
Check if total file sizes would exceed token threshold before embedding.
IMPORTANT: This performs STRICT REJECTION at MCP boundary.
No partial inclusion - either all files fit or request is rejected.
This forces Claude to make better file selection decisions.
Args:
files: List of file paths to check
model_name: Model name for context-aware thresholds, or None for default
Returns:
Dict with MCP_CODE_TOO_LARGE response if too large, None if acceptable
"""
if not files:
return None
# Get model-specific token allocation (dynamic thresholds)
if not model_name:
from config import DEFAULT_MODEL
model_name = DEFAULT_MODEL
# Handle auto mode gracefully
if model_name.lower() == "auto":
from providers.registry import ModelProviderRegistry
model_name = ModelProviderRegistry.get_preferred_fallback_model()
from utils.model_context import ModelContext
model_context = ModelContext(model_name)
token_allocation = model_context.calculate_token_allocation()
# Dynamic threshold based on model capacity
context_window = token_allocation.total_tokens
if context_window >= 1_000_000: # Gemini-class models
threshold_percent = 0.8 # Can be more generous
elif context_window >= 500_000: # Mid-range models
threshold_percent = 0.7 # Moderate
else: # OpenAI-class models (200K)
threshold_percent = 0.6 # Conservative
max_file_tokens = int(token_allocation.file_tokens * threshold_percent)
# Use centralized file size checking (threshold already applied to max_file_tokens)
within_limit, total_estimated_tokens, file_count = check_files_size_limit(files, max_file_tokens)
if not within_limit:
return {
"status": "code_too_large",
"content": (
f"The selected files are too large for analysis "
f"(estimated {total_estimated_tokens:,} tokens, limit {max_file_tokens:,}). "
f"Please select fewer, more specific files that are most relevant "
f"to your question, then invoke the tool again."
),
"content_type": "text",
"metadata": {
"total_estimated_tokens": total_estimated_tokens,
"limit": max_file_tokens,
"file_count": file_count,
"threshold_percent": threshold_percent,
"model_context_window": context_window,
"instructions": "Reduce file selection and try again - all files must fit within budget",
},
}
return None # Proceed with ALL files
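A hedged sketch of how a tool might apply this check at the MCP boundary (hypothetical variable and model names; the actual call sites live in the tool implementations):
size_check = check_total_file_size(requested_files, model_name="gemini-2.5-flash")
if size_check is not None:
    # Strict rejection: surface the code_too_large response so Claude can
    # re-run with a smaller, more relevant file selection
    return size_check
# All files fit - proceed with embedding them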