Checkpoint
One last thing before we move on. Same surface as a write step — but the lesson doesn't complete until this passes.
Final drill. Build a structure-aware chunker WITH overlap. Write
chunk_document(text, max_chars, overlap, separators) that:
- First runs the recursive split (algorithm from step 8) to get chunks ≤ max_chars.
- Then adds overlap: for each chunk after the first, prepend the LAST overlap characters of the previous chunk.
- Returns the overlap-augmented chunk list.
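One way the two stages might fit together is sketched below. The step 8 algorithm is not reproduced here, so recursive_split is a hypothetical stand-in: it assumes separators are tried in order, that separators are dropped from the output, and that text is hard-cut when no separator is left. It also assumes the overlap is taken from the previous chunk as it was before any overlap was added.

```python
def recursive_split(text, max_chars, separators):
    """Hypothetical stand-in for the step 8 splitter (assumed behaviour)."""
    if len(text) <= max_chars:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Nothing left to split on: hard cut every max_chars characters.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    sep, rest = separators[0], separators[1:]
    chunks = []
    for piece in text.split(sep):
        if piece:  # skip empty pieces produced by adjacent separators
            chunks.extend(recursive_split(piece, max_chars, rest))
    return chunks


def chunk_document(text, max_chars, overlap, separators):
    # Stage 1: structure-aware split into chunks of at most max_chars.
    base = recursive_split(text, max_chars, separators)
    # Stage 2: prepend the tail of the previous (pre-overlap) chunk.
    result = []
    for i, chunk in enumerate(base):
        if i == 0 or overlap <= 0:
            result.append(chunk)
        else:
            result.append(base[i - 1][-overlap:] + chunk)
    return result


chunks = chunk_document("alpha beta\n\ngamma delta\n\nepsilon", 12, 4, ["\n\n", " "])
print(chunks)  # ['alpha beta', 'betagamma delta', 'eltaepsilon']
```

Two details are worth noticing. The overlap is added after the size check, so an augmented chunk can be longer than max_chars. The guard on overlap <= 0 matters because a slice of [-0:] would return the whole previous chunk rather than nothing.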
Three cases run for you. Expected output:
3 chunks, lengths: [22, 30, 27]
first chunk: 'First short paragraph.'
second chunk starts with: 'ragraph.Second paragrap'