Reading the response — content blocks, stop_reason, usage (step 3/9) · llm apis

Six stop reasons, two that drive 99% of code

Your agent loop's main control flow IS branching on stop_reason. Get this wrong and the loop either exits early or never exits. Six values you'll see (the first two drive 99% of real code; the other four are exception paths you still have to handle):

`end_turn` — the model is done

The model finished saying what it had to say. Read the text from content, return it to the user, exit the loop. This is the "normal" end state for non-agent calls.

`tool_use` — the model wants you to run something

The model emitted at least one tool_use block. Your code:

Runs each tool by name with its input.
Appends a user-role turn with tool_result blocks (one per tool_use_id).
Calls the model again with the appended history.

The loop continues until end_turn (or you hit your iteration cap).

`max_tokens` — the response was cut off

You set max_tokens=1024 and the model wanted to write more. The output is truncated mid-sentence. Two options:

Bump max_tokens and retry the whole call (simple, sometimes wasteful).
Continue the conversation by sending the truncated response back as the assistant's prior turn and asking "continue."

In production agents, you cap max_tokens aggressively and detect this stop reason as a signal that the task was too big. Catching it loudly beats silently shipping a half-answer.

`stop_sequence` — a custom stop hit

You passed stop_sequences=["\n\nUser:", "STOP"] and one of those strings appeared in the output. Rare in production; common in fine-tuned-model legacy code. If you see this, you know the prompt set up custom delimiters; the response was intentionally cut at one.

`pause_turn` — server-side tool needs more turns

Anthropic-only. When you use Anthropic's hosted tools (their server runs the tool, not yours) and the tool's internal loop needs more turns than fit in one response, you get pause_turn as a checkpoint, not an error. The fix: just call the API again with the same messages — the server resumes from where it paused. This is a continuation signal, not a timeout.

`refusal` — the model declined to comply

The model determined the request crosses a safety boundary and declined. The response still has structure (you'll see the refusal in content), but it isn't a regular answer and retrying the same prompt won't change the result. Surface it to the user plainly; don't dress it up as a generic error and don't put it in a retry loop.

What this maps to in code

def drive_loop(response, messages, run_tool):
    if response.stop_reason == "end_turn":
        return extract_text(response.content), "done"

    if response.stop_reason == "tool_use":
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": run_tool(block.name, block.input),
                })
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})
        return None, "continue"

    if response.stop_reason == "max_tokens":
        return None, "truncated"

    if response.stop_reason == "refusal":
        # safety refusal — show plainly, don't retry
        return None, "refusal"

    # stop_sequence, pause_turn: handle as needed
    return None, response.stop_reason

Two branches drive 99% of agent code. The others are exception paths. The bug step 7 fixes: hardcoding if response.stop_reason == "stop" (a value that doesn't exist in the API — it's a common misremembering of end_turn).

Anthropic vs OpenAI: the same field, slightly different

OpenAI's Responses API exposes response.status (completed, incomplete, failed) at the response level, plus per-output finish_reason (stop, length, tool_calls, content_filter). Same idea, different vocabulary. Code that targets both providers typically wraps stop_reason / finish_reason into a normalized enum.

The principle is universal: every modern LLM API tells you why it stopped, and your code branches on that field.

Six stop reasons, two that drive 99% of code

`end_turn` — the model is done

The model finished saying what it had to say. Read the text from content, return it to the user, exit the loop. This is the "normal" end state for non-agent calls.

`tool_use` — the model wants you to run something

The model emitted at least one tool_use block. Your code:

Runs each tool by name with its input.
Appends a user-role turn with tool_result blocks (one per tool_use_id).
Calls the model again with the appended history.

The loop continues until end_turn (or you hit your iteration cap).

`max_tokens` — the response was cut off

You set max_tokens=1024 and the model wanted to write more. The output is truncated mid-sentence. Two options:

Bump max_tokens and retry the whole call (simple, sometimes wasteful).
Continue the conversation by sending the truncated response back as the assistant's prior turn and asking "continue."

In production agents, you cap max_tokens aggressively and detect this stop reason as a signal that the task was too big. Catching it loudly beats silently shipping a half-answer.

`stop_sequence` — a custom stop hit

`pause_turn` — server-side tool needs more turns

`refusal` — the model declined to comply

What this maps to in code

def drive_loop(response, messages, run_tool):
    if response.stop_reason == "end_turn":
        return extract_text(response.content), "done"

    if response.stop_reason == "tool_use":
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": run_tool(block.name, block.input),
                })
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})
        return None, "continue"

    if response.stop_reason == "max_tokens":
        return None, "truncated"

    if response.stop_reason == "refusal":
        # safety refusal — show plainly, don't retry
        return None, "refusal"

    # stop_sequence, pause_turn: handle as needed
    return None, response.stop_reason

Anthropic vs OpenAI: the same field, slightly different

The principle is universal: every modern LLM API tells you why it stopped, and your code branches on that field.

Reading the response — content blocks, stop_reason, usage — step 3 of 9

Six stop reasons, two that drive 99% of code

end_turn — the model is done

tool_use — the model wants you to run something

max_tokens — the response was cut off

stop_sequence — a custom stop hit

pause_turn — server-side tool needs more turns

refusal — the model declined to comply

What this maps to in code

Anthropic vs OpenAI: the same field, slightly different

Six stop reasons, two that drive 99% of code

end_turn — the model is done

tool_use — the model wants you to run something

max_tokens — the response was cut off

stop_sequence — a custom stop hit

pause_turn — server-side tool needs more turns

refusal — the model declined to comply

What this maps to in code

Anthropic vs OpenAI: the same field, slightly different

`end_turn` — the model is done

`tool_use` — the model wants you to run something

`max_tokens` — the response was cut off

`stop_sequence` — a custom stop hit

`pause_turn` — server-side tool needs more turns

`refusal` — the model declined to comply

`end_turn` — the model is done

`tool_use` — the model wants you to run something

`max_tokens` — the response was cut off

`stop_sequence` — a custom stop hit

`pause_turn` — server-side tool needs more turns

`refusal` — the model declined to comply