Prompt caching correctly — the variable input goes LAST (step 9/9) · production tradeoffs

Checkpoint

One last thing before we move on. Same surface as a write step — but the lesson doesn't complete until this passes.

Final drill. Build a cache strategy decision helper. Write recommend_strategy(stable_tokens, variable_tokens, calls, max_gap_minutes, input_rate) that:

Computes uncached cost, 5m-cached cost, and 1h-cached cost.
- 5m-cached writes 1.25× input rate; 1h-cached writes 2× input rate.
- Both read at 0.1× input rate.
- 5m cache: if max_gap_minutes <= 5, write once and read on later calls. If max_gap_minutes > 5, approximate a miss on each call, so the stable prefix is written every time.
- 1h cache: write once, read for all subsequent calls.
Returns a dict {"uncached": <cost>, "cached_5m": <cost>, "cached_1h": <cost>, "winner": <strategy with lowest cost>}.
All costs rounded to 2 places.

Two scenarios run for you. Expected output:

short gaps (2 min): cached_5m=0.33 cached_1h=0.35 winner=cached_5m
gappy session (20 min gaps): cached_5m=3.06 cached_1h=0.35 winner=cached_1h

Checkpoint

One last thing before we move on. Same surface as a write step — but the lesson doesn't complete until this passes.

Final drill. Build a cache strategy decision helper. Write recommend_strategy(stable_tokens, variable_tokens, calls, max_gap_minutes, input_rate) that:

Computes uncached cost, 5m-cached cost, and 1h-cached cost.
- 5m-cached writes 1.25× input rate; 1h-cached writes 2× input rate.
- Both read at 0.1× input rate.
- 5m cache: if max_gap_minutes <= 5, write once and read on later calls. If max_gap_minutes > 5, approximate a miss on each call, so the stable prefix is written every time.
- 1h cache: write once, read for all subsequent calls.
Returns a dict {"uncached": <cost>, "cached_5m": <cost>, "cached_1h": <cost>, "winner": <strategy with lowest cost>}.
All costs rounded to 2 places.

Two scenarios run for you. Expected output:

short gaps (2 min): cached_5m=0.33 cached_1h=0.35 winner=cached_5m
gappy session (20 min gaps): cached_5m=3.06 cached_1h=0.35 winner=cached_1h

full-screen editor opens — close anytime to keep reading.

Prompt caching correctly — the variable input goes LAST — step 9 of 9