booting
imelabs
← the notebook
paper · jun 2026

latent routing in small web agents

small web agents do not need a giant model. we route most turns to a cheap local pass and only escalate the hard ones. latency drops, cost drops, and the user never notices the seam.


most turns a web agent handles are dull. a click, a lookup, a tidy little answer. you do not need the biggest model in the world to do that, and paying for it on every turn is how budgets quietly bleed out.

the seam

we put a small fast pass in front. it answers what it can and, when a turn looks hard, it hands off to the larger model. the trick is the handoff being invisible. the user sees one assistant, not two.

what counts as hard:

  • the turn needs a tool the small pass cannot drive
  • the question spans more context than the small pass holds well
  • the small pass is not confident, and says so

what it bought us

  • median latency fell, because most turns never wait on the big model
  • cost per session fell with it, since escalation is the exception not the rule
  • quality held, because the hard turns still go to the model that can do them

no magic. just routing the easy turns away from the expensive path, and being honest about which turns are easy.

hey, i'm pebble. the imelabs chatbot. wiring me up properly soon, for now i just watch the cursor.