read_text, write_text, glob — the three you'll use every day

The intro showed read_text and write_text. This step adds the third call you'll meet on day one of any AI project: glob. Together they cover the 80% case for "read some files, do something, write the result."

`read_text` and `write_text`

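p.read_text() opens the file at p, reads the whole contents into a string, and closes the file. If you don't pass encoding, most current Python versions fall back to the platform's locale encoding, which is usually UTF-8 on Linux and macOS but often isn't on Windows. Pass encoding="utf-8" explicitly when the script has to behave the same everywhere.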

p.write_text(content) opens the file at p for writing, writes the string, and closes the file. If the file exists, it gets overwritten: no append, no warning. If the parent directory doesn't exist, the call raises FileNotFoundError. (Call p.parent.mkdir(parents=True, exist_ok=True) first to make sure the directory exists.)
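
A minimal sketch of the overwrite behaviour, run in a temporary directory so it is safe to try. The file name is made up for illustration, and the encoding is passed explicitly so the result doesn't depend on platform defaults. Because write_text has no append mode, the sketch appends by opening the file in "a" mode instead.

```python
import tempfile
from pathlib import Path

# Throwaway directory so the sketch is safe to run; the file name is illustrative.
note = Path(tempfile.mkdtemp()) / "note.txt"

note.write_text("first draft", encoding="utf-8")
note.write_text("second draft", encoding="utf-8")  # replaces the file, no warning
after_overwrite = note.read_text(encoding="utf-8")

# write_text cannot append; open the file in "a" mode to add to it.
with note.open("a", encoding="utf-8") as f:
    f.write("\nappended line")
after_append = note.read_text(encoding="utf-8")

print(after_overwrite)            # second draft
print(after_append.splitlines())  # ['second draft', 'appended line']
```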

There's also read_bytes and write_bytes for binary content (PDFs, images, anything not text). Same shape, different return type.
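
As a rough illustration of the binary pair, the sketch below writes a few arbitrary bytes and reads them back. The payload and file name are invented for the example; a real PDF or image would work the same way.

```python
import tempfile
from pathlib import Path

# Illustrative file in a throwaway directory.
blob = Path(tempfile.mkdtemp()) / "sample.bin"

payload = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF])  # arbitrary non-text bytes
written = blob.write_bytes(payload)  # returns the number of bytes written
data = blob.read_bytes()             # returns bytes, not str

print(written, type(data).__name__, data == payload)  # 6 bytes True
```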

`glob` — find files matching a pattern

p.glob(pattern) returns an iterator of paths inside the directory p whose names match pattern. The pattern syntax is the one your shell uses:

`*` matches any sequence of characters within a single path segment
`?` matches exactly one character
`**` matches zero or more directories (recursive)
`[abc]` matches one character from the set

Common cases:

base.glob("*.csv")        # all CSVs directly inside `base`
base.glob("*/*.csv")      # CSVs one level deep
base.glob("**/*.csv")     # CSVs anywhere underneath, recursive
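
To see how those three patterns differ, here is a small sketch that builds a throwaway directory tree and runs each one against it. The tree layout and the names helper are assumptions made up for the demonstration.

```python
import tempfile
from pathlib import Path

# Build a small illustrative tree inside a temporary directory.
base = Path(tempfile.mkdtemp())
for rel in ["top.csv", "2023/jan.csv", "2023/q1/deep.csv", "notes.txt"]:
    f = base / rel
    f.parent.mkdir(parents=True, exist_ok=True)
    f.write_text("x", encoding="utf-8")

def names(pattern):
    """Hypothetical helper: matches as sorted paths relative to base."""
    return sorted(p.relative_to(base).as_posix() for p in base.glob(pattern))

direct = names("*.csv")       # ['top.csv']
one_deep = names("*/*.csv")   # ['2023/jan.csv']
anywhere = names("**/*.csv")  # ['2023/jan.csv', '2023/q1/deep.csv', 'top.csv']
print(direct, one_deep, anywhere)
```

Note that the recursive pattern also picks up top.csv, because ** can match zero directories.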

glob returns an iterator, not a list. You usually want to wrap it in sorted(...) or list(...) if the order matters or you'll iterate twice. sorted gives you alphabetical order by full path, which is deterministic and almost always what you want for a file listing.
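
The single-use nature of the iterator is easy to trip over, so here is a short sketch of it, again in a temporary directory with made-up file names.

```python
import tempfile
from pathlib import Path

base = Path(tempfile.mkdtemp())
for name in ["b.txt", "a.txt", "c.txt"]:
    (base / name).write_text(name, encoding="utf-8")

matches = base.glob("*.txt")
first_pass = [p.name for p in matches]   # three names, in filesystem order
second_pass = [p.name for p in matches]  # iterator already consumed, so empty

# sorted(...) builds a list you can reuse, in a predictable order.
stable = [p.name for p in sorted(base.glob("*.txt"))]
print(len(first_pass), second_pass, stable)
```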

A worked example

The editor on the right writes three files into /tmp/notes, two .txt and one .md, then globs only the text ones:

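from pathlib import Path

# Create the working directory (a no-op if it already exists).
base = Path("/tmp/notes")
base.mkdir(exist_ok=True)

# Write two .txt files and one .md file.
(base / "alpha.txt").write_text("first")
(base / "beta.txt").write_text("second")
(base / "gamma.md").write_text("third")

# Read back only the .txt files, in sorted order.
for txt in sorted(base.glob("*.txt")):
    print(txt.name, "->", txt.read_text())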

Output:

alpha.txt -> first
beta.txt -> second

The .md file is filtered out because the pattern *.txt only matches .txt extensions. sorted makes sure alpha comes before beta regardless of insertion order. txt.name strips the directory, giving just the filename.

This short shape (make a directory, write a few files, glob and read each one) is what AI generates dozens of times in any data pipeline, log processor, or scratch script.

Where AI specifically gets this wrong

Two patterns worth flagging.

One: forgetting mkdir(exist_ok=True). Cursor writes p.write_text(...) against a path whose parent directory doesn't exist yet, and the call blows up with FileNotFoundError. The fix is one line — p.parent.mkdir(parents=True, exist_ok=True) before the write. AI sometimes skips this and only adds it after the first crash.
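
One way to make the fix hard to forget is to wrap it in a small helper. The sketch below is only an illustration: write_text_safely is a hypothetical name, not part of pathlib, and the paths are invented.

```python
import tempfile
from pathlib import Path

def write_text_safely(path: Path, content: str) -> Path:
    """Hypothetical helper: make sure the parent directory exists, then write."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")
    return path

root = Path(tempfile.mkdtemp())
target = root / "runs" / "2024" / "log.txt"  # neither parent directory exists yet

# The bare write fails because the parent directories are missing.
try:
    target.write_text("ok", encoding="utf-8")
    bare_write_failed = False
except FileNotFoundError:
    bare_write_failed = True

out = write_text_safely(target, "ok")
print(bare_write_failed, out.read_text(encoding="utf-8"))  # True ok
```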

Two: globbing without sorted. The order of glob results is whatever the filesystem happens to return, and nothing guarantees it: it can look stable on one machine and come out differently on another. If your script does anything based on file order (concatenation, batching, IDs), wrap the glob in sorted(...) or you'll ship a bug that only shows up in production.
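
A minimal sketch of order-dependent work done safely: concatenating numbered parts. The part-NN.txt naming scheme is an assumption made up for the example.

```python
import tempfile
from pathlib import Path

base = Path(tempfile.mkdtemp())
# Written out of order on purpose; the names are illustrative.
for name, text in [("part-02.txt", "world"), ("part-01.txt", "hello "), ("part-03.txt", "!")]:
    (base / name).write_text(text, encoding="utf-8")

# sorted(...) fixes the order regardless of what the filesystem returns.
combined = "".join(
    p.read_text(encoding="utf-8") for p in sorted(base.glob("part-*.txt"))
)
print(combined)  # hello world!
```

The zero-padded numbers matter here: the sort compares names as text, so an unpadded part-10.txt would land before part-2.txt.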

Run the editor. Notice the .md file gets created but never read — the pattern *.txt filters it out cleanly.
