Selective fixes from PR #35: Model-specific rate limits & robustness improvements (#37)

* feat: apply local user changes and fixes * ;D * Implement OpenAI support, model-specific rate limiting, and robustness fixes * docs: update pr title * feat: ensure unique openai models endpoint * fix: startup banner alignment and removed duplicates * feat: add model fallback system with --fallback flag * fix: accounts cli hanging after completion * feat: add exit option to accounts cli menu * fix: remove circular dependency warning for fallback flag * feat: show active modes in banner and hide their flags * Remove OpenAI compatibility and fallback features from PR #35 Cherry-picked selective fixes from PR #35 while removing: - OpenAI-compatible API endpoints (/openai/v1/*) - Model fallback system (fallback-config.js) - Thinking block skip for Gemini models - Unnecessary files (pullrequest.md, test-fix.js, test-openai.js) Retained improvements: - Network error handling with retry logic - Model-specific rate limiting - Enhanced health check with quota info - CLI fixes (exit option, process.exit) - Startup banner alignment (debug mode only) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * banner alignment fix * Refactor: Model-specific rate limits and cleanup deprecated code - Remove global rate limit fields (isRateLimited, rateLimitResetTime) in favor of model-specific limits (modelRateLimits[modelId]) - Remove deprecated wrapper functions (is429Error, isAuthInvalidError) from handlers - Filter fetchAvailableModels to only return Claude and Gemini models - Fix getCurrentStickyAccount() to pass model param after waiting - Update /account-limits endpoint to show model-specific limits - Remove multi-account OAuth flow to avoid state mismatch errors 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * fix: show (x/y) limited status in account-limits table - Status is now "ok" only when all models are available - Shows "(x/y) limited" when x out of y models are exhausted - Provides better visibility into partial rate limiting 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * docs: update CLAUDE.md with model-specific rate limiting - Document modelRateLimits[modelId] for per-model rate tracking - Add isNetworkError() helper to utilities section 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: M1noa <minoa@minoa.cat> Co-authored-by: Minoa <altgithub@minoa.cat> Co-authored-by: Claude <noreply@anthropic.com>
2026-01-03 15:33:49 +05:30
parent 2d05dd5b62
commit 9c4a712a9a
15 changed files with 474 additions and 194 deletions
--- a/src/format/request-converter.js
+++ b/src/format/request-converter.js
@@ -152,6 +152,16 @@ export function convertAnthropicToGoogle(anthropicRequest) {
            if (thinkingBudget) {
                thinkingConfig.thinking_budget = thinkingBudget;
                logger.debug(`[RequestConverter] Claude thinking enabled with budget: ${thinkingBudget}`);
+
+                // Validate max_tokens > thinking_budget as required by the API
+                const currentMaxTokens = googleRequest.generationConfig.maxOutputTokens;
+                if (currentMaxTokens && currentMaxTokens <= thinkingBudget) {
+                    // Bump max_tokens to allow for some response content
+                    // Default to budget + 8192 (standard output buffer)
+                    const adjustedMaxTokens = thinkingBudget + 8192;
+                    logger.warn(`[RequestConverter] max_tokens (${currentMaxTokens}) <= thinking_budget (${thinkingBudget}). Adjusting to ${adjustedMaxTokens} to satisfy API requirements`);
+                    googleRequest.generationConfig.maxOutputTokens = adjustedMaxTokens;
+                }
            } else {
                logger.debug('[RequestConverter] Claude thinking enabled (no budget specified)');
            }