promptdojo_

Retrieval that finds the right thing — top-k, thresholds, and the rerank step everyone skips — step 9 of 9

Checkpoint

One last thing before we move on. Same surface as a write step — but the lesson doesn't complete until this passes.

Final drill. Build rag_retrieve(query_vec, indexed, k, threshold) that:

  • Takes indexed as a list of (chunk_id, doc_vec) pairs.
  • Computes cosine similarity between query_vec and each doc_vec (use cosine provided in the starter).
  • Filters by score >= threshold.
  • Dedupes by chunk_id, keeping the highest-scoring occurrence.
  • Returns the top-k chunk_ids in descending score order.

Three cases run. Expected output:

['policy/refund_terms', 'policy/refund_eligibility']
[]
['policy/refund_terms']

full-screen editor opens — close anytime to keep reading.