Whasuk Lee

Agents

A shared verification spine, applied across domains. The same idea everywhere: the agent that produces a result never grades it — a held-out verifier does.

LenaLab

Computer vision · visual odometry

A Claude agent authors visual-odometry algorithms from scratch in a sealed sandbox; the verifier grades them on held-out trajectories. Best result: 0.033 m metric on an unseen scene, beating the classical reference.

↳ built on Touchstone

Lodestar

SLAM testbed

A 3D testbed that grades SLAM against fixed, hidden ground truth across progressive rungs (pose-graph → RGB-D VO → full SLAM with loop closure). The verifier is constant; only the solver domain swaps.

↳ built on Touchstone
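
The "constant verifier, swappable solver" split described above can be sketched in a few lines. This is an illustrative sketch only, not Lodestar's code: the names `make_verifier`, `stationary_solver`, and `constant_velocity_solver` are hypothetical, and position RMSE is an assumed stand-in for whatever metric the testbed actually uses. The point it shows is that the ground truth lives only inside the verifier's closure, so a solver can never read it.

```python
import math
from typing import Callable, List, Sequence, Tuple

Position = Tuple[float, float, float]
# A solver maps input frames to estimated positions. It never receives ground truth.
Solver = Callable[[Sequence[str]], List[Position]]


def make_verifier(ground_truth: Sequence[Position]) -> Callable[[Solver, Sequence[str]], float]:
    """Build a verifier that keeps the ground truth private (hypothetical helper)."""
    truth = tuple(ground_truth)  # frozen copy, visible only inside this closure

    def verify(solver: Solver, frames: Sequence[str]) -> float:
        estimate = solver(frames)
        if len(estimate) != len(truth):
            return math.inf  # a malformed trajectory gets the worst possible score
        squared = [
            sum((e - t) ** 2 for e, t in zip(est, ref))
            for est, ref in zip(estimate, truth)
        ]
        # Assumed metric: root-mean-square position error, in the trajectory's units.
        return math.sqrt(sum(squared) / len(squared))

    return verify


def stationary_solver(frames: Sequence[str]) -> List[Position]:
    """Toy solver: assumes the camera never moves."""
    return [(0.0, 0.0, 0.0) for _ in frames]


def constant_velocity_solver(frames: Sequence[str]) -> List[Position]:
    """Toy solver: assumes one unit of motion along x per frame."""
    return [(float(i), 0.0, 0.0) for i, _ in enumerate(frames)]


# One verifier, built once; only the solver changes between runs.
verify = make_verifier([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)])
frames = ["frame0", "frame1", "frame2"]
scores = {
    "stationary": verify(stationary_solver, frames),
    "constant_velocity": verify(constant_velocity_solver, frames),
}
```

Moving to a harder rung would mean passing a different solver to the same `verify`; the grading code stays untouched.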

Touchstone

Verification spine

The domain-agnostic harness: a held-out evaluator, crash-resumable registry, token/experiment budget, single-GPU lease, and a calibration gate that refuses to open if the grader can't tell a good run from a broken one.
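
A calibration gate of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not Touchstone's implementation: `calibration_gate`, `GateResult`, `min_margin`, and the toy graders are all hypothetical names, and the rule "the broken run must score worse than the good run by at least a margin" is one plausible way to decide whether a grader can tell the two apart.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class GateResult:
    is_open: bool
    good_score: float
    broken_score: float
    margin: float


def calibration_gate(
    grade: Callable[[Sequence[float]], float],
    known_good: Sequence[float],
    known_broken: Sequence[float],
    min_margin: float,
    lower_is_better: bool = True,
) -> GateResult:
    """Open only if the grader separates a known-good run from a known-broken one.

    Hypothetical rule: the broken run must be worse than the good run
    by at least `min_margin` in the grader's own units.
    """
    good = grade(known_good)
    broken = grade(known_broken)
    margin = (broken - good) if lower_is_better else (good - broken)
    return GateResult(margin >= min_margin, good, broken, margin)


# Toy hidden reference, standing in for held-out ground truth.
HIDDEN_TRUTH = [0.0, 1.0, 2.0, 3.0]


def mean_abs_error(run: Sequence[float]) -> float:
    """A grader that actually compares the run against the hidden reference."""
    return sum(abs(a - b) for a, b in zip(run, HIDDEN_TRUTH)) / len(HIDDEN_TRUTH)


def blind_grader(run: Sequence[float]) -> float:
    """A defective grader that returns the same score for every run."""
    return 0.5


good_run = [0.0, 1.0, 2.0, 3.0]
broken_run = [5.0, 5.0, 5.0, 5.0]

sane = calibration_gate(mean_abs_error, good_run, broken_run, min_margin=1.0)
blind = calibration_gate(blind_grader, good_run, broken_run, min_margin=1.0)
```

Here `sane.is_open` is true because the broken run scores 3.5 against 0.0 for the good run, while `blind.is_open` is false because the margin is zero. In a harness like the one described, no experiment would be scored until the gate opens.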

Also

Blueberry

Perception research lab (legacy)

An earlier self-running perception lab driven by a CLI loop. Its honest arc is kept as a record of the verification-first idea working before Touchstone generalized it: a differentiable-pose result, then the lab's own dismantling of that result (it doesn't generalize to indoor scenes and loses to a strong baseline).