Performance improvements when embedding files:

- Exit early at the MCP boundary if the requested files won't fit within the context window of the chosen model (sketched after this list)
- Encourage Claude to re-run the request with a more suitable context
- Check file sizes before embedding
- Drop files from older conversation turns when building continuations, giving priority to newer files
- List the excluded files in the response so Claude knows what was omitted
- Improved tests
- Improved precommit prompt
- Added a new Low severity level to precommit
- Improved documentation of file embedding strategy
- Refactor
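
The MCP-boundary early exit described in the first bullet might look something like the minimal Python sketch below. The 4-bytes-per-token ratio, the function names, and the error payload shape are illustrative assumptions, not this commit's actual API:

import os

BYTES_PER_TOKEN = 4  # rough heuristic; real estimators vary by model


def estimate_tokens(path: str) -> int:
    # Cheap size-based estimate so no file content has to be read
    return os.path.getsize(path) // BYTES_PER_TOKEN


def check_files_fit(files: list[str], max_file_tokens: int) -> dict | None:
    # Return an early-exit error payload when files exceed the budget,
    # prompting Claude to retry with fewer files or a larger-context model
    total = sum(estimate_tokens(f) for f in files if os.path.isfile(f))
    if total > max_file_tokens:
        return {
            "status": "error",
            "content": (
                f"Files require ~{total:,} tokens but only {max_file_tokens:,} "
                "are available. Re-run with fewer files or a larger-context model."
            ),
        }
    return None  # fits; proceed with the tool call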
Author: Fahad
Date: 2025-06-16 05:51:52 +04:00
parent 56333cbd86
commit 91077e3810
16 changed files with 1557 additions and 308 deletions


@@ -30,12 +30,33 @@ Key Features:
- Turn-by-turn conversation history storage with tool attribution
- Cross-tool continuation support - switch tools while preserving context
- File context preservation - files shared in earlier turns remain accessible
- - Automatic turn limiting (5 turns max) to prevent runaway conversations
+ - NEWEST-FIRST FILE PRIORITIZATION - when the same file appears in multiple turns,
+   references from newer turns take precedence over older ones. This ensures the
+   most recent file context is preserved when token limits require exclusions.
+ - Automatic turn limiting (20 turns max) to prevent runaway conversations
- Context reconstruction for stateless request continuity
- - Redis-based persistence with automatic expiration (1 hour TTL)
+ - Redis-based persistence with automatic expiration (3 hour TTL)
- Thread-safe operations for concurrent access
- Graceful degradation when Redis is unavailable
FILE PRIORITIZATION STRATEGY:
The conversation memory system implements a sophisticated file prioritization algorithm
that ensures newer file references always take precedence over older ones:
1. When collecting files across conversation turns, the system walks BACKWARDS through
turns (newest to oldest) and builds a unique file list
2. If the same file path appears in multiple turns, only the reference from the
NEWEST turn is kept in the final list
3. This "newest-first" ordering is preserved throughout the entire pipeline:
- get_conversation_file_list() establishes the order
- build_conversation_history() maintains it during token budgeting
- When token limits are hit, OLDER files are excluded first
4. This strategy works across conversation chains - files from newer turns in ANY
thread take precedence over files from older turns in ANY thread
This approach ensures that when token limits force file exclusions, the most
recently referenced and contextually relevant files are preserved.
USAGE EXAMPLE:
1. Tool A creates thread: create_thread("analyze", request_data) → returns UUID
2. Tool A adds response: add_turn(UUID, "assistant", response, files=[...], tool_name="analyze")
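
The newest-first strategy spelled out in this docstring reduces to a reverse walk with a seen-set. A standalone sketch, with turns simplified to bare lists of file paths (an assumption; the real code walks ThreadContext.turns):

def collect_files_newest_first(turns: list[list[str]]) -> list[str]:
    seen: set[str] = set()
    result: list[str] = []
    for turn_files in reversed(turns):  # newest turn first
        for path in turn_files:
            if path not in seen:  # newest reference wins; older duplicates skipped
                seen.add(path)
                result.append(path)
    return result


# Three turns, oldest first; main.py appears in turns 1 and 3
turns = [["main.py", "utils.py"], ["test.py"], ["main.py", "config.py"]]
assert collect_files_newest_first(turns) == ["main.py", "config.py", "test.py", "utils.py"]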
@@ -262,11 +283,12 @@ def add_turn(
model_metadata: Optional[dict[str, Any]] = None,
) -> bool:
"""
- Add turn to existing thread
+ Add turn to existing thread with atomic file ordering.
Appends a new conversation turn to an existing thread. This is the core
function for building conversation history and enabling cross-tool
- continuation. Each turn preserves the tool and model that generated it.
+ continuation. Each turn preserves the tool and model that generated it,
+ and tracks file reception order using atomic Redis counters.
Args:
thread_id: UUID of the conversation thread
@@ -289,7 +311,7 @@ def add_turn(
Note:
- Refreshes thread TTL to configured timeout on successful update
- Turn limits prevent runaway conversations
- - File references are preserved for cross-tool access
+ - File references are preserved for cross-tool access with atomic ordering
- Model information enables cross-provider conversations
"""
logger.debug(f"[FLOW] Adding {role} turn to {thread_id} ({tool_name})")
@@ -374,77 +396,212 @@ def get_thread_chain(thread_id: str, max_depth: int = 20) -> list[ThreadContext]
def get_conversation_file_list(context: ThreadContext) -> list[str]:
"""
- Get all unique files referenced across all turns in a conversation.
+ Extract all unique files from conversation turns with newest-first prioritization.
- This function extracts and deduplicates file references from all conversation
- turns to enable efficient file embedding - files are read once and shared
- across all turns rather than being embedded multiple times.
+ This function implements the core file prioritization logic used throughout the
+ conversation memory system. It walks backwards through conversation turns
+ (from newest to oldest) and collects unique file references, ensuring that
+ when the same file appears in multiple turns, the reference from the NEWEST
+ turn takes precedence.
PRIORITIZATION ALGORITHM:
1. Iterate through turns in REVERSE order (index len-1 down to 0)
2. For each turn, process files in the order they appear in turn.files
3. Add file to result list only if not already seen (newest reference wins)
4. Skip duplicate files that were already added from newer turns
This ensures that:
- Files from newer conversation turns appear first in the result
- When the same file is referenced multiple times, only the newest reference is kept
- The order reflects the most recent conversation context
Example:
Turn 1: files = ["main.py", "utils.py"]
Turn 2: files = ["test.py"]
Turn 3: files = ["main.py", "config.py"] # main.py appears again
Result: ["main.py", "config.py", "test.py", "utils.py"]
(main.py from Turn 3 takes precedence over Turn 1)
Args:
- context: ThreadContext containing the complete conversation
+ context: ThreadContext containing all conversation turns to process
Returns:
- list[str]: Deduplicated list of file paths referenced in the conversation
+ list[str]: Unique file paths ordered by newest reference first.
+     Empty list if no turns exist or no files are referenced.
Performance:
- Time Complexity: O(n*m) where n=turns, m=avg files per turn
- Space Complexity: O(f) where f=total unique files
- Uses set for O(1) duplicate detection
"""
if not context.turns:
logger.debug("[FILES] No turns found, returning empty file list")
return []
- # Collect all unique files from all turns, preserving order of first appearance
+ # Collect files by walking backwards (newest to oldest turns)
seen_files = set()
- unique_files = []
+ file_list = []
- logger.debug(f"[FILES] Collecting files from {len(context.turns)} turns")
+ logger.debug(f"[FILES] Collecting files from {len(context.turns)} turns (newest first)")
- for i, turn in enumerate(context.turns):
+ # Process turns in reverse order (newest first) - this is the CORE of newest-first prioritization
+ # By iterating from len-1 down to 0, we encounter newer turns before older turns
+ # When we find a duplicate file, we skip it because the newer version is already in our list
+ for i in range(len(context.turns) - 1, -1, -1):  # REVERSE: newest turn first
+     turn = context.turns[i]
if turn.files:
logger.debug(f"[FILES] Turn {i + 1} has {len(turn.files)} files: {turn.files}")
for file_path in turn.files:
if file_path not in seen_files:
# First time seeing this file - add it (this is the NEWEST reference)
seen_files.add(file_path)
- unique_files.append(file_path)
- logger.debug(f"[FILES] Added new file: {file_path}")
+ file_list.append(file_path)
+ logger.debug(f"[FILES] Added new file: {file_path} (from turn {i + 1})")
else:
logger.debug(f"[FILES] Duplicate file skipped: {file_path}")
else:
logger.debug(f"[FILES] Turn {i + 1} has no files")
# File already seen from a NEWER turn - skip this older reference
logger.debug(f"[FILES] Skipping duplicate file: {file_path} (newer version already included)")
logger.debug(f"[FILES] Final unique file list ({len(unique_files)}): {unique_files}")
return unique_files
logger.debug(f"[FILES] Final file list ({len(file_list)}): {file_list}")
return file_list
def _plan_file_inclusion_by_size(all_files: list[str], max_file_tokens: int) -> tuple[list[str], list[str], int]:
"""
Plan which files to include based on size constraints.
This is ONLY used for conversation history building, not MCP boundary checks.
Args:
all_files: List of files to consider for inclusion
max_file_tokens: Maximum tokens available for file content
Returns:
Tuple of (files_to_include, files_to_skip, estimated_total_tokens)
"""
if not all_files:
return [], [], 0
files_to_include = []
files_to_skip = []
total_tokens = 0
logger.debug(f"[FILES] Planning inclusion for {len(all_files)} files with budget {max_file_tokens:,} tokens")
for file_path in all_files:
try:
from utils.file_utils import estimate_file_tokens, translate_path_for_environment
translated_path = translate_path_for_environment(file_path)
if os.path.exists(translated_path) and os.path.isfile(translated_path):
# Use centralized token estimation for consistency
estimated_tokens = estimate_file_tokens(file_path)
if total_tokens + estimated_tokens <= max_file_tokens:
files_to_include.append(file_path)
total_tokens += estimated_tokens
logger.debug(
f"[FILES] Including {file_path} - {estimated_tokens:,} tokens (total: {total_tokens:,})"
)
else:
files_to_skip.append(file_path)
logger.debug(
f"[FILES] Skipping {file_path} - would exceed budget (needs {estimated_tokens:,} tokens)"
)
else:
files_to_skip.append(file_path)
logger.debug(f"[FILES] Skipping {file_path} - file not accessible")
except Exception as e:
files_to_skip.append(file_path)
logger.debug(f"[FILES] Skipping {file_path} - error: {type(e).__name__}: {e}")
logger.debug(
f"[FILES] Inclusion plan: {len(files_to_include)} include, {len(files_to_skip)} skip, {total_tokens:,} tokens"
)
return files_to_include, files_to_skip, total_tokens
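
To see the planner's greedy behavior in isolation: files arrive already ordered newest-first, and a file is skipped (not a hard stop) as soon as it would push the running total past the budget. The sizes below are made-up stand-ins for estimate_file_tokens:

sizes = {"config.py": 2_000, "main.py": 9_000, "utils.py": 5_000}
budget = 12_000  # max_file_tokens

included, skipped, total = [], [], 0
for path in ["config.py", "main.py", "utils.py"]:  # newest first
    tokens = sizes[path]
    if total + tokens <= budget:
        included.append(path)
        total += tokens
    else:
        skipped.append(path)  # an older file is dropped first

print(included, skipped, total)  # ['config.py', 'main.py'] ['utils.py'] 11000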
def build_conversation_history(context: ThreadContext, model_context=None, read_files_func=None) -> tuple[str, int]:
"""
Build formatted conversation history for tool prompts with embedded file contents.
- Creates a formatted string representation of the conversation history that includes
- full file contents from all referenced files. Files are embedded only ONCE at the
- start, even if referenced in multiple turns, to prevent duplication and optimize
- token usage.
+ Creates a comprehensive conversation history that includes both conversation turns and
+ file contents, with intelligent prioritization to maximize relevant context within
+ token limits. This function enables stateless tools to access complete conversation
+ context from previous interactions, including cross-tool continuations.
- If the thread has a parent chain, this function traverses the entire chain to
- include the complete conversation history.
FILE PRIORITIZATION BEHAVIOR:
Files from newer conversation turns are prioritized over files from older turns.
When the same file appears in multiple turns, the reference from the NEWEST turn
takes precedence. This ensures the most recent file context is preserved when
token limits require file exclusions.
CONVERSATION CHAIN HANDLING:
If the thread has a parent_thread_id, this function traverses the entire chain
to include complete conversation history across multiple linked threads. File
prioritization works across the entire chain, not just the current thread.
TOKEN MANAGEMENT:
- Uses model-specific token allocation (file_tokens + history_tokens)
- Files are embedded ONCE at the start to prevent duplication
- Conversation turns are processed newest-first but presented chronologically
- Stops adding turns when token budget would be exceeded
- Gracefully handles token limits with informative notes
Args:
- context: ThreadContext containing the complete conversation
- model_context: ModelContext for token allocation (optional, uses DEFAULT_MODEL if not provided)
- read_files_func: Optional function to read files (for testing)
+ context: ThreadContext containing the conversation to format
+ model_context: ModelContext for token allocation (optional, uses DEFAULT_MODEL fallback)
+ read_files_func: Optional function to read files (primarily for testing)
Returns:
tuple[str, int]: (formatted_conversation_history, total_tokens_used)
Returns ("", 0) if no conversation turns exist
Returns ("", 0) if no conversation turns exist in the context
- Format:
-     - Header with thread metadata and turn count
-     - All referenced files embedded once with full contents
-     - Each turn shows: role, tool used, which files were used, content
-     - Clear delimiters for AI parsing
-     - Continuation instruction at end
Output Format:
=== CONVERSATION HISTORY (CONTINUATION) ===
Thread: <thread_id>
Tool: <original_tool_name>
Turn <current>/<max_allowed>
You are continuing this conversation thread from where it left off.
- Note:
-     This formatted history allows tools to "see" both conversation context AND
-     file contents from previous tools, enabling true cross-tool collaboration
-     while preventing duplicate file embeddings.
=== FILES REFERENCED IN THIS CONVERSATION ===
The following files have been shared and analyzed during our conversation.
[NOTE: X files omitted due to size constraints]
Refer to these when analyzing the context and requests below:
<embedded_file_contents_with_line_numbers>
=== END REFERENCED FILES ===
Previous conversation turns:
--- Turn 1 (Claude) ---
Files used in this turn: file1.py, file2.py
<turn_content>
--- Turn 2 (Gemini using analyze via google/gemini-2.5-flash) ---
Files used in this turn: file3.py
<turn_content>
=== END CONVERSATION HISTORY ===
IMPORTANT: You are continuing an existing conversation thread...
This is turn X of the conversation - use the conversation history above...
Cross-Tool Collaboration:
This formatted history allows any tool to "see" both conversation context AND
file contents from previous tools, enabling seamless handoffs between analyze,
codereview, debug, chat, and other tools while maintaining complete context.
Performance Characteristics:
- O(n) file collection with newest-first prioritization
- Intelligent token budgeting prevents context window overflow
- Redis-based persistence with automatic TTL management
- Graceful degradation when files are inaccessible or too large
"""
# Get the complete thread chain
if context.parent_thread_id:
@@ -453,19 +610,25 @@ def build_conversation_history(context: ThreadContext, model_context=None, read_
# Collect all turns from all threads in chain
all_turns = []
- all_files_set = set()
total_turns = 0
for thread in chain:
all_turns.extend(thread.turns)
total_turns += len(thread.turns)
- # Collect files from this thread
- for turn in thread.turns:
-     if turn.files:
-         all_files_set.update(turn.files)
- all_files = list(all_files_set)
# Use centralized file collection logic for consistency across the entire chain
# This ensures files from newer turns across ALL threads take precedence
# over files from older turns, maintaining the newest-first prioritization
# even when threads are chained together
temp_context = ThreadContext(
thread_id="merged_chain",
created_at=context.created_at,
last_updated_at=context.last_updated_at,
tool_name=context.tool_name,
turns=all_turns, # All turns from entire chain in chronological order
initial_context=context.initial_context,
)
all_files = get_conversation_file_list(temp_context) # Applies newest-first logic to entire chain
logger.debug(f"[THREAD] Built history from {len(chain)} threads with {total_turns} total turns")
else:
# Single thread, no parent chain
@@ -511,101 +674,91 @@ def build_conversation_history(context: ThreadContext, model_context=None, read_
"",
]
- # Embed all files referenced in this conversation once at the start
+ # Embed files referenced in this conversation with size-aware selection
if all_files:
logger.debug(f"[FILES] Starting embedding for {len(all_files)} files")
- history_parts.extend(
-     [
-         "=== FILES REFERENCED IN THIS CONVERSATION ===",
-         "The following files have been shared and analyzed during our conversation.",
-         "Refer to these when analyzing the context and requests below:",
-         "",
-     ]
- )
- if read_files_func is None:
-     from utils.file_utils import read_file_content
# Plan file inclusion based on size constraints
# CRITICAL: all_files is already ordered by newest-first prioritization from get_conversation_file_list()
# So when _plan_file_inclusion_by_size() hits token limits, it naturally excludes OLDER files first
# while preserving the most recent file references - exactly what we want!
files_to_include, files_to_skip, estimated_tokens = _plan_file_inclusion_by_size(all_files, max_file_tokens)
- # Optimized: read files incrementally with token tracking
- file_contents = []
- total_tokens = 0
- files_included = 0
- files_truncated = 0
if files_to_skip:
logger.info(f"[FILES] Skipping {len(files_to_skip)} files due to size constraints: {files_to_skip}")
- for file_path in all_files:
-     try:
-         logger.debug(f"[FILES] Processing file {file_path}")
-         # Correctly unpack the tuple returned by read_file_content
-         formatted_content, content_tokens = read_file_content(file_path)
-         if formatted_content:
-             # read_file_content already returns formatted content, use it directly
-             # Check if adding this file would exceed the limit
-             if total_tokens + content_tokens <= max_file_tokens:
if files_to_include:
history_parts.extend(
[
"=== FILES REFERENCED IN THIS CONVERSATION ===",
"The following files have been shared and analyzed during our conversation.",
(
""
if not files_to_skip
else f"[NOTE: {len(files_to_skip)} files omitted due to size constraints]"
),
"Refer to these when analyzing the context and requests below:",
"",
]
)
if read_files_func is None:
from utils.file_utils import read_file_content
# Process files for embedding
file_contents = []
total_tokens = 0
files_included = 0
for file_path in files_to_include:
try:
logger.debug(f"[FILES] Processing file {file_path}")
formatted_content, content_tokens = read_file_content(file_path)
if formatted_content:
file_contents.append(formatted_content)
total_tokens += content_tokens
files_included += 1
logger.debug(
f"File embedded in conversation history: {file_path} ({content_tokens:,} tokens)"
)
logger.debug(
f"[FILES] Successfully embedded {file_path} - {content_tokens:,} tokens (total: {total_tokens:,})"
)
- else:
-     files_truncated += 1
-     logger.debug(
-         f"File truncated due to token limit: {file_path} ({content_tokens:,} tokens, would exceed {max_file_tokens:,} limit)"
-     )
-     logger.debug(
-         f"[FILES] File {file_path} would exceed token limit - skipping (would be {total_tokens + content_tokens:,} tokens)"
-     )
-     # Stop processing more files
-     break
else:
logger.debug(f"File skipped (empty content): {file_path}")
logger.debug(f"[FILES] File {file_path} has empty content - skipping")
except Exception as e:
# Skip files that can't be read but log the failure
logger.warning(
f"Failed to embed file in conversation history: {file_path} - {type(e).__name__}: {e}"
)
logger.debug(f"[FILES] Failed to read file {file_path} - {type(e).__name__}: {e}")
continue
logger.debug(f"File skipped (empty content): {file_path}")
except Exception as e:
logger.warning(
f"Failed to embed file in conversation history: {file_path} - {type(e).__name__}: {e}"
)
continue
- if file_contents:
-     files_content = "".join(file_contents)
-     if files_truncated > 0:
-         files_content += (
-             f"\n[NOTE: {files_truncated} additional file(s) were truncated due to token limit]\n"
-         )
-     history_parts.append(files_content)
-     logger.debug(
-         f"Conversation history file embedding complete: {files_included} files embedded, {files_truncated} truncated, {total_tokens:,} total tokens"
-     )
-     logger.debug(
-         f"[FILES] File embedding summary - {files_included} embedded, {files_truncated} truncated, {total_tokens:,} tokens total"
-     )
- else:
-     history_parts.append("(No accessible files found)")
-     logger.debug(
-         f"Conversation history file embedding: no accessible files found from {len(all_files)} requested"
-     )
-     logger.debug(f"[FILES] No accessible files found from {len(all_files)} requested files")
- else:
-     # Fallback to original read_files function for backward compatibility
-     files_content = read_files_func(all_files)
-     if files_content:
-         # Add token validation for the combined file content
-         from utils.token_utils import check_token_limit
-         within_limit, estimated_tokens = check_token_limit(files_content)
-         if within_limit:
if file_contents:
files_content = "".join(file_contents)
if files_to_skip:
files_content += (
f"\n[NOTE: {len(files_to_skip)} additional file(s) were omitted due to size constraints. "
f"These were older files from earlier conversation turns.]\n"
)
history_parts.append(files_content)
logger.debug(
f"Conversation history file embedding complete: {files_included} files embedded, {len(files_to_skip)} omitted, {total_tokens:,} total tokens"
)
- else:
-     # Handle token limit exceeded for conversation files
-     error_message = f"ERROR: The total size of files referenced in this conversation has exceeded the context limit and cannot be displayed.\nEstimated tokens: {estimated_tokens}, but limit is {max_file_tokens}."
-     history_parts.append(error_message)
history_parts.append("(No accessible files found)")
logger.debug(f"[FILES] No accessible files found from {len(files_to_include)} planned files")
- else:
-     history_parts.append("(No accessible files found)")
# Fallback to original read_files function for backward compatibility
files_content = read_files_func(all_files)
if files_content:
# Add token validation for the combined file content
from utils.token_utils import check_token_limit
within_limit, estimated_tokens = check_token_limit(files_content)
if within_limit:
history_parts.append(files_content)
else:
# Handle token limit exceeded for conversation files
error_message = f"ERROR: The total size of files referenced in this conversation has exceeded the context limit and cannot be displayed.\nEstimated tokens: {estimated_tokens}, but limit is {max_file_tokens}."
history_parts.append(error_message)
else:
history_parts.append("(No accessible files found)")
history_parts.extend(
[