Performance improvements when embedding files:

- Exit early at the MCP boundary if the requested files won't fit within the context window of the chosen model (sketched after this list)
- Encourage Claude to re-run the request with a more suitable context
- Check file sizes before embedding
- Drop files from older conversation turns when building continuations, giving priority to newer files
- List the excluded files in the response so Claude knows what was omitted
- Improved tests
- Improved precommit prompt
- Added a new Low severity level to precommit
- Improved documentation of file embedding strategy
- Refactor
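
The MCP-boundary early exit described in the first bullet might look something like the minimal Python sketch below. The 4-bytes-per-token ratio, the function names, and the error payload shape are illustrative assumptions, not this commit's actual API:

import os

BYTES_PER_TOKEN = 4  # rough heuristic; real estimators vary by model


def estimate_tokens(path: str) -> int:
    # Cheap size-based estimate so no file content has to be read
    return os.path.getsize(path) // BYTES_PER_TOKEN


def check_files_fit(files: list[str], max_file_tokens: int) -> dict | None:
    # Return an early-exit error payload when files exceed the budget,
    # prompting Claude to retry with fewer files or a larger-context model
    total = sum(estimate_tokens(f) for f in files if os.path.isfile(f))
    if total > max_file_tokens:
        return {
            "status": "error",
            "content": (
                f"Files require ~{total:,} tokens but only {max_file_tokens:,} "
                "are available. Re-run with fewer files or a larger-context model."
            ),
        }
    return None  # fits; proceed with the tool call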
Author: Fahad
Date: 2025-06-16 05:51:52 +04:00
parent 56333cbd86
commit 91077e3810
16 changed files with 1557 additions and 308 deletions


@@ -30,12 +30,33 @@ Key Features:
- Turn-by-turn conversation history storage with tool attribution
- Cross-tool continuation support - switch tools while preserving context
- File context preservation - files shared in earlier turns remain accessible
- - Automatic turn limiting (5 turns max) to prevent runaway conversations
+ - NEWEST-FIRST FILE PRIORITIZATION - when the same file appears in multiple turns,
+   references from newer turns take precedence over older ones. This ensures the
+   most recent file context is preserved when token limits require exclusions.
+ - Automatic turn limiting (20 turns max) to prevent runaway conversations
- Context reconstruction for stateless request continuity
- - Redis-based persistence with automatic expiration (1 hour TTL)
+ - Redis-based persistence with automatic expiration (3 hour TTL)
- Thread-safe operations for concurrent access
- Graceful degradation when Redis is unavailable
FILE PRIORITIZATION STRATEGY:
The conversation memory system implements a sophisticated file prioritization algorithm
that ensures newer file references always take precedence over older ones:
1. When collecting files across conversation turns, the system walks BACKWARDS through
turns (newest to oldest) and builds a unique file list
2. If the same file path appears in multiple turns, only the reference from the
NEWEST turn is kept in the final list
3. This "newest-first" ordering is preserved throughout the entire pipeline:
- get_conversation_file_list() establishes the order
- build_conversation_history() maintains it during token budgeting
- When token limits are hit, OLDER files are excluded first
4. This strategy works across conversation chains - files from newer turns in ANY
thread take precedence over files from older turns in ANY thread
This approach ensures that when token limits force file exclusions, the most
recently referenced and contextually relevant files are preserved.
USAGE EXAMPLE:
1. Tool A creates thread: create_thread("analyze", request_data) → returns UUID
2. Tool A adds response: add_turn(UUID, "assistant", response, files=[...], tool_name="analyze")
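
The newest-first strategy spelled out in this docstring reduces to a reverse walk with a seen-set. A standalone sketch, with turns simplified to bare lists of file paths (an assumption; the real code walks ThreadContext.turns):

def collect_files_newest_first(turns: list[list[str]]) -> list[str]:
    seen: set[str] = set()
    result: list[str] = []
    for turn_files in reversed(turns):  # newest turn first
        for path in turn_files:
            if path not in seen:  # newest reference wins; older duplicates skipped
                seen.add(path)
                result.append(path)
    return result


# Three turns, oldest first; main.py appears in turns 1 and 3
turns = [["main.py", "utils.py"], ["test.py"], ["main.py", "config.py"]]
assert collect_files_newest_first(turns) == ["main.py", "config.py", "test.py", "utils.py"]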
@@ -262,11 +283,12 @@ def add_turn(
model_metadata: Optional[dict[str, Any]] = None,
) -> bool:
"""
- Add turn to existing thread
+ Add turn to existing thread with atomic file ordering.
Appends a new conversation turn to an existing thread. This is the core
function for building conversation history and enabling cross-tool
- continuation. Each turn preserves the tool and model that generated it.
+ continuation. Each turn preserves the tool and model that generated it,
+ and tracks file reception order using atomic Redis counters.
Args:
thread_id: UUID of the conversation thread
@@ -289,7 +311,7 @@ def add_turn(
Note:
- Refreshes thread TTL to configured timeout on successful update
- Turn limits prevent runaway conversations
- - File references are preserved for cross-tool access
+ - File references are preserved for cross-tool access with atomic ordering
- Model information enables cross-provider conversations
"""
logger.debug(f"[FLOW] Adding {role} turn to {thread_id} ({tool_name})")
@@ -374,77 +396,212 @@ def get_thread_chain(thread_id: str, max_depth: int = 20) -> list[ThreadContext]
def get_conversation_file_list(context: ThreadContext) -> list[str]:
"""
- Get all unique files referenced across all turns in a conversation.
+ Extract all unique files from conversation turns with newest-first prioritization.
- This function extracts and deduplicates file references from all conversation
- turns to enable efficient file embedding - files are read once and shared
- across all turns rather than being embedded multiple times.
+ This function implements the core file prioritization logic used throughout the
+ conversation memory system. It walks backwards through conversation turns
+ (from newest to oldest) and collects unique file references, ensuring that
+ when the same file appears in multiple turns, the reference from the NEWEST
+ turn takes precedence.
PRIORITIZATION ALGORITHM:
1. Iterate through turns in REVERSE order (index len-1 down to 0)
2. For each turn, process files in the order they appear in turn.files
3. Add file to result list only if not already seen (newest reference wins)
4. Skip duplicate files that were already added from newer turns
This ensures that:
- Files from newer conversation turns appear first in the result
- When the same file is referenced multiple times, only the newest reference is kept
- The order reflects the most recent conversation context
Example:
Turn 1: files = ["main.py", "utils.py"]
Turn 2: files = ["test.py"]
Turn 3: files = ["main.py", "config.py"] # main.py appears again
Result: ["main.py", "config.py", "test.py", "utils.py"]
(main.py from Turn 3 takes precedence over Turn 1)
Args:
- context: ThreadContext containing the complete conversation
+ context: ThreadContext containing all conversation turns to process
Returns:
- list[str]: Deduplicated list of file paths referenced in the conversation
+ list[str]: Unique file paths ordered by newest reference first.
+     Empty list if no turns exist or no files are referenced.
Performance:
- Time Complexity: O(n*m) where n=turns, m=avg files per turn
- Space Complexity: O(f) where f=total unique files
- Uses set for O(1) duplicate detection
"""
if not context.turns:
logger.debug("[FILES] No turns found, returning empty file list")
return []
- # Collect all unique files from all turns, preserving order of first appearance
+ # Collect files by walking backwards (newest to oldest turns)
seen_files = set()
- unique_files = []
+ file_list = []
- logger.debug(f"[FILES] Collecting files from {len(context.turns)} turns")
+ logger.debug(f"[FILES] Collecting files from {len(context.turns)} turns (newest first)")
- for i, turn in enumerate(context.turns):
+ # Process turns in reverse order (newest first) - this is the CORE of newest-first prioritization
+ # By iterating from len-1 down to 0, we encounter newer turns before older turns
+ # When we find a duplicate file, we skip it because the newer version is already in our list
+ for i in range(len(context.turns) - 1, -1, -1):  # REVERSE: newest turn first
+     turn = context.turns[i]
if turn.files:
logger.debug(f"[FILES] Turn {i + 1} has {len(turn.files)} files: {turn.files}")
for file_path in turn.files:
if file_path not in seen_files:
# First time seeing this file - add it (this is the NEWEST reference)
seen_files.add(file_path)
- unique_files.append(file_path)
- logger.debug(f"[FILES] Added new file: {file_path}")
+ file_list.append(file_path)
+ logger.debug(f"[FILES] Added new file: {file_path} (from turn {i + 1})")
else:
logger.debug(f"[FILES] Duplicate file skipped: {file_path}")
else:
logger.debug(f"[FILES] Turn {i + 1} has no files")
# File already seen from a NEWER turn - skip this older reference
logger.debug(f"[FILES] Skipping duplicate file: {file_path} (newer version already included)")
logger.debug(f"[FILES] Final unique file list ({len(unique_files)}): {unique_files}")
return unique_files
logger.debug(f"[FILES] Final file list ({len(file_list)}): {file_list}")
return file_list
def _plan_file_inclusion_by_size(all_files: list[str], max_file_tokens: int) -> tuple[list[str], list[str], int]:
"""
Plan which files to include based on size constraints.
This is ONLY used for conversation history building, not MCP boundary checks.
Args:
all_files: List of files to consider for inclusion
max_file_tokens: Maximum tokens available for file content
Returns:
Tuple of (files_to_include, files_to_skip, estimated_total_tokens)
"""
if not all_files:
return [], [], 0
files_to_include = []
files_to_skip = []
total_tokens = 0
logger.debug(f"[FILES] Planning inclusion for {len(all_files)} files with budget {max_file_tokens:,} tokens")
for file_path in all_files:
try:
from utils.file_utils import estimate_file_tokens, translate_path_for_environment
translated_path = translate_path_for_environment(file_path)
if os.path.exists(translated_path) and os.path.isfile(translated_path):
# Use centralized token estimation for consistency
estimated_tokens = estimate_file_tokens(file_path)
if total_tokens + estimated_tokens <= max_file_tokens:
files_to_include.append(file_path)
total_tokens += estimated_tokens
logger.debug(
f"[FILES] Including {file_path} - {estimated_tokens:,} tokens (total: {total_tokens:,})"
)
else:
files_to_skip.append(file_path)
logger.debug(
f"[FILES] Skipping {file_path} - would exceed budget (needs {estimated_tokens:,} tokens)"
)
else:
files_to_skip.append(file_path)
logger.debug(f"[FILES] Skipping {file_path} - file not accessible")
except Exception as e:
files_to_skip.append(file_path)
logger.debug(f"[FILES] Skipping {file_path} - error: {type(e).__name__}: {e}")
logger.debug(
f"[FILES] Inclusion plan: {len(files_to_include)} include, {len(files_to_skip)} skip, {total_tokens:,} tokens"
)
return files_to_include, files_to_skip, total_tokens
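
To see the planner's greedy behavior in isolation: files arrive already ordered newest-first, and a file is skipped (not a hard stop) as soon as it would push the running total past the budget. The sizes below are made-up stand-ins for estimate_file_tokens:

sizes = {"config.py": 2_000, "main.py": 9_000, "utils.py": 5_000}
budget = 12_000  # max_file_tokens

included, skipped, total = [], [], 0
for path in ["config.py", "main.py", "utils.py"]:  # newest first
    tokens = sizes[path]
    if total + tokens <= budget:
        included.append(path)
        total += tokens
    else:
        skipped.append(path)  # an older file is dropped first

print(included, skipped, total)  # ['config.py', 'main.py'] ['utils.py'] 11000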
def build_conversation_history(context: ThreadContext, model_context=None, read_files_func=None) -> tuple[str, int]:
"""
Build formatted conversation history for tool prompts with embedded file contents.
- Creates a formatted string representation of the conversation history that includes
- full file contents from all referenced files. Files are embedded only ONCE at the
- start, even if referenced in multiple turns, to prevent duplication and optimize
- token usage.
+ Creates a comprehensive conversation history that includes both conversation turns and
+ file contents, with intelligent prioritization to maximize relevant context within
+ token limits. This function enables stateless tools to access complete conversation
+ context from previous interactions, including cross-tool continuations.
- If the thread has a parent chain, this function traverses the entire chain to
- include the complete conversation history.
FILE PRIORITIZATION BEHAVIOR:
Files from newer conversation turns are prioritized over files from older turns.
When the same file appears in multiple turns, the reference from the NEWEST turn
takes precedence. This ensures the most recent file context is preserved when
token limits require file exclusions.
CONVERSATION CHAIN HANDLING:
If the thread has a parent_thread_id, this function traverses the entire chain
to include complete conversation history across multiple linked threads. File
prioritization works across the entire chain, not just the current thread.
TOKEN MANAGEMENT:
- Uses model-specific token allocation (file_tokens + history_tokens)
- Files are embedded ONCE at the start to prevent duplication
- Conversation turns are processed newest-first but presented chronologically
- Stops adding turns when token budget would be exceeded
- Gracefully handles token limits with informative notes
Args:
- context: ThreadContext containing the complete conversation
- model_context: ModelContext for token allocation (optional, uses DEFAULT_MODEL if not provided)
- read_files_func: Optional function to read files (for testing)
+ context: ThreadContext containing the conversation to format
+ model_context: ModelContext for token allocation (optional, uses DEFAULT_MODEL fallback)
+ read_files_func: Optional function to read files (primarily for testing)
Returns:
tuple[str, int]: (formatted_conversation_history, total_tokens_used)
Returns ("", 0) if no conversation turns exist
Returns ("", 0) if no conversation turns exist in the context
- Format:
-     - Header with thread metadata and turn count
-     - All referenced files embedded once with full contents
-     - Each turn shows: role, tool used, which files were used, content
-     - Clear delimiters for AI parsing
-     - Continuation instruction at end
Output Format:
=== CONVERSATION HISTORY (CONTINUATION) ===
Thread: <thread_id>
Tool: <original_tool_name>
Turn <current>/<max_allowed>
You are continuing this conversation thread from where it left off.
- Note:
-     This formatted history allows tools to "see" both conversation context AND
-     file contents from previous tools, enabling true cross-tool collaboration
-     while preventing duplicate file embeddings.
=== FILES REFERENCED IN THIS CONVERSATION ===
The following files have been shared and analyzed during our conversation.
[NOTE: X files omitted due to size constraints]
Refer to these when analyzing the context and requests below:
<embedded_file_contents_with_line_numbers>
=== END REFERENCED FILES ===
Previous conversation turns:
--- Turn 1 (Claude) ---
Files used in this turn: file1.py, file2.py
<turn_content>
--- Turn 2 (Gemini using analyze via google/gemini-2.5-flash) ---
Files used in this turn: file3.py
<turn_content>
=== END CONVERSATION HISTORY ===
IMPORTANT: You are continuing an existing conversation thread...
This is turn X of the conversation - use the conversation history above...
Cross-Tool Collaboration:
This formatted history allows any tool to "see" both conversation context AND
file contents from previous tools, enabling seamless handoffs between analyze,
codereview, debug, chat, and other tools while maintaining complete context.
Performance Characteristics:
- O(n) file collection with newest-first prioritization
- Intelligent token budgeting prevents context window overflow
- Redis-based persistence with automatic TTL management
- Graceful degradation when files are inaccessible or too large
"""
# Get the complete thread chain
if context.parent_thread_id:
@@ -453,19 +610,25 @@ def build_conversation_history(context: ThreadContext, model_context=None, read_
# Collect all turns from all threads in chain
all_turns = []
- all_files_set = set()
total_turns = 0
for thread in chain:
all_turns.extend(thread.turns)
total_turns += len(thread.turns)
- # Collect files from this thread
- for turn in thread.turns:
-     if turn.files:
-         all_files_set.update(turn.files)
- all_files = list(all_files_set)
# Use centralized file collection logic for consistency across the entire chain
# This ensures files from newer turns across ALL threads take precedence
# over files from older turns, maintaining the newest-first prioritization
# even when threads are chained together
temp_context = ThreadContext(
thread_id="merged_chain",
created_at=context.created_at,
last_updated_at=context.last_updated_at,
tool_name=context.tool_name,
turns=all_turns, # All turns from entire chain in chronological order
initial_context=context.initial_context,
)
all_files = get_conversation_file_list(temp_context) # Applies newest-first logic to entire chain
logger.debug(f"[THREAD] Built history from {len(chain)} threads with {total_turns} total turns")
else:
# Single thread, no parent chain
@@ -511,101 +674,91 @@ def build_conversation_history(context: ThreadContext, model_context=None, read_
"",
]
- # Embed all files referenced in this conversation once at the start
+ # Embed files referenced in this conversation with size-aware selection
if all_files:
logger.debug(f"[FILES] Starting embedding for {len(all_files)} files")
- history_parts.extend(
-     [
-         "=== FILES REFERENCED IN THIS CONVERSATION ===",
-         "The following files have been shared and analyzed during our conversation.",
-         "Refer to these when analyzing the context and requests below:",
-         "",
-     ]
- )
- if read_files_func is None:
-     from utils.file_utils import read_file_content
# Plan file inclusion based on size constraints
# CRITICAL: all_files is already ordered by newest-first prioritization from get_conversation_file_list()
# So when _plan_file_inclusion_by_size() hits token limits, it naturally excludes OLDER files first
# while preserving the most recent file references - exactly what we want!
files_to_include, files_to_skip, estimated_tokens = _plan_file_inclusion_by_size(all_files, max_file_tokens)
- # Optimized: read files incrementally with token tracking
- file_contents = []
- total_tokens = 0
- files_included = 0
- files_truncated = 0
if files_to_skip:
logger.info(f"[FILES] Skipping {len(files_to_skip)} files due to size constraints: {files_to_skip}")
- for file_path in all_files:
-     try:
-         logger.debug(f"[FILES] Processing file {file_path}")
-         # Correctly unpack the tuple returned by read_file_content
-         formatted_content, content_tokens = read_file_content(file_path)
-         if formatted_content:
-             # read_file_content already returns formatted content, use it directly
-             # Check if adding this file would exceed the limit
-             if total_tokens + content_tokens <= max_file_tokens:
if files_to_include:
history_parts.extend(
[
"=== FILES REFERENCED IN THIS CONVERSATION ===",
"The following files have been shared and analyzed during our conversation.",
(
""
if not files_to_skip
else f"[NOTE: {len(files_to_skip)} files omitted due to size constraints]"
),
"Refer to these when analyzing the context and requests below:",
"",
]
)
if read_files_func is None:
from utils.file_utils import read_file_content
# Process files for embedding
file_contents = []
total_tokens = 0
files_included = 0
for file_path in files_to_include:
try:
logger.debug(f"[FILES] Processing file {file_path}")
formatted_content, content_tokens = read_file_content(file_path)
if formatted_content:
file_contents.append(formatted_content)
total_tokens += content_tokens
files_included += 1
logger.debug(
f"File embedded in conversation history: {file_path} ({content_tokens:,} tokens)"
)
logger.debug(
f"[FILES] Successfully embedded {file_path} - {content_tokens:,} tokens (total: {total_tokens:,})"
)
- else:
-     files_truncated += 1
-     logger.debug(
-         f"File truncated due to token limit: {file_path} ({content_tokens:,} tokens, would exceed {max_file_tokens:,} limit)"
-     )
-     logger.debug(
-         f"[FILES] File {file_path} would exceed token limit - skipping (would be {total_tokens + content_tokens:,} tokens)"
-     )
-     # Stop processing more files
-     break
else:
logger.debug(f"File skipped (empty content): {file_path}")
logger.debug(f"[FILES] File {file_path} has empty content - skipping")
except Exception as e:
# Skip files that can't be read but log the failure
logger.warning(
f"Failed to embed file in conversation history: {file_path} - {type(e).__name__}: {e}"
)
logger.debug(f"[FILES] Failed to read file {file_path} - {type(e).__name__}: {e}")
continue
logger.debug(f"File skipped (empty content): {file_path}")
except Exception as e:
logger.warning(
f"Failed to embed file in conversation history: {file_path} - {type(e).__name__}: {e}"
)
continue
- if file_contents:
-     files_content = "".join(file_contents)
-     if files_truncated > 0:
-         files_content += (
-             f"\n[NOTE: {files_truncated} additional file(s) were truncated due to token limit]\n"
-         )
-     history_parts.append(files_content)
-     logger.debug(
-         f"Conversation history file embedding complete: {files_included} files embedded, {files_truncated} truncated, {total_tokens:,} total tokens"
-     )
-     logger.debug(
-         f"[FILES] File embedding summary - {files_included} embedded, {files_truncated} truncated, {total_tokens:,} tokens total"
-     )
- else:
-     history_parts.append("(No accessible files found)")
-     logger.debug(
-         f"Conversation history file embedding: no accessible files found from {len(all_files)} requested"
-     )
-     logger.debug(f"[FILES] No accessible files found from {len(all_files)} requested files")
- else:
-     # Fallback to original read_files function for backward compatibility
-     files_content = read_files_func(all_files)
-     if files_content:
-         # Add token validation for the combined file content
-         from utils.token_utils import check_token_limit
-         within_limit, estimated_tokens = check_token_limit(files_content)
-         if within_limit:
if file_contents:
files_content = "".join(file_contents)
if files_to_skip:
files_content += (
f"\n[NOTE: {len(files_to_skip)} additional file(s) were omitted due to size constraints. "
f"These were older files from earlier conversation turns.]\n"
)
history_parts.append(files_content)
logger.debug(
f"Conversation history file embedding complete: {files_included} files embedded, {len(files_to_skip)} omitted, {total_tokens:,} total tokens"
)
- else:
-     # Handle token limit exceeded for conversation files
-     error_message = f"ERROR: The total size of files referenced in this conversation has exceeded the context limit and cannot be displayed.\nEstimated tokens: {estimated_tokens}, but limit is {max_file_tokens}."
-     history_parts.append(error_message)
history_parts.append("(No accessible files found)")
logger.debug(f"[FILES] No accessible files found from {len(files_to_include)} planned files")
- else:
-     history_parts.append("(No accessible files found)")
# Fallback to original read_files function for backward compatibility
files_content = read_files_func(all_files)
if files_content:
# Add token validation for the combined file content
from utils.token_utils import check_token_limit
within_limit, estimated_tokens = check_token_limit(files_content)
if within_limit:
history_parts.append(files_content)
else:
# Handle token limit exceeded for conversation files
error_message = f"ERROR: The total size of files referenced in this conversation has exceeded the context limit and cannot be displayed.\nEstimated tokens: {estimated_tokens}, but limit is {max_file_tokens}."
history_parts.append(error_message)
else:
history_parts.append("(No accessible files found)")
history_parts.extend(
[