Count responses with thinking content (but no text) as successful, and validate actual response status instead of hardcoding 200.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>