Agents
A shared verification spine, applied across domains. The same idea everywhere: the agent that produces a result never grades it — a held-out verifier does.
LenaLab
Computer vision · visual odometry
A Claude agent authors visual-odometry algorithms from scratch in a sealed sandbox; the verifier grades them on held-out trajectories. Best result: 0.033 m metric on an unseen scene, beating the classical reference.
↳ built on Touchstone
Lodestar
SLAM testbed
A 3D testbed that grades SLAM against fixed, hidden ground truth across progressive rungs (pose-graph → RGB-D VO → full SLAM with loop closure). The verifier is constant; only the solver domain swaps.
↳ built on Touchstone
Touchstone
· Verification spineThe domain-agnostic harness: a held-out evaluator, crash-resumable registry, token/experiment budget, single-GPU lease, and a calibration gate that refuses to open if the grader can't tell a good run from a broken one.
GitHub →Also
Blueberry
legacyPerception research lab (legacy)
An earlier self-running perception lab on a CLI loop. Its honest arc — a differentiable-pose beat, then its own dismantling (doesn't generalize indoor; loses to a strong baseline) — is kept as a record of the verification-first idea working before Touchstone generalized it.