ir.traverse
Query-time graph traversal — the traverse operator (report 12).
Recursive retrieval, summary-routing, RAPTOR collapsed-tree, PPR /
spreading-activation, and beam walks are one operator — score frontier →
select → expand → test stop — parameterized by a pluggable
WalkPolicy. The engineering substance is termination, split in two:
safety (”will it stop?”) — structural, and enforced by the operator: a visited-set (keyed on the policy’s node id), a depth cap, and a node budget live in
WalkStateand are checked bytraverse()itself, so a buggy or adversarial policy cannot loop forever — the highest-leverage robustness decision when the graph may contain directed cycles;sufficiency (”should it stop?”) — injected and fallible: the policy’s
stop(default: never — run to budget).
The policy owns graph semantics; the operator owns the loop and the safety
primitives. store is passed to the policy verbatim and never interpreted by
the operator — collapsed-tree takes a Corpus (search to
seed, ledger to expand to chunks), an artifact-link policy a
CorpusGraph (neighbors).
Flat top-k + rerank stays the default (report 12: a strong flat retriever
beats most graph methods on simple lookup, and global graph methods cost
orders of magnitude more); traverse() is opt-in, and a policy earns
promotion only by beating flat on the eval set. The shipped
collapsed_tree_policy() is pure-vector (no LLM in the query loop): it
routes a query that matches an artifact’s summary surface down to that
artifact’s best chunk.
Returned SearchHits carry additive, JSON-clean walk
provenance — metadata["walk_depth"] and metadata["seed"] (the routing
node) — and compose with ir.select() / ir.disclose() unchanged.
- ir.traverse.DFLT_LEAF_KINDS = ('chunk', 'readme_chunk')
Default leaf surface kinds a collapsed-tree walk descends to (the results).
- ir.traverse.DFLT_MAX_DEPTH = 2
Default traversal bounds (operator-enforced safety).
- ir.traverse.DFLT_SEED_K = 10
Default number of summary surfaces a collapsed-tree walk routes from.
- ir.traverse.DFLT_SUMMARY_KINDS = ('description', 'synopsis', 'capability', 'document')
Default summary/router surface kinds a collapsed-tree walk seeds on.
- class ir.traverse.WalkPolicy(*args, **kwargs)[source]
The pluggable strategy of a walk — graph semantics, not safety.
seedproduces the initial frontier;scoreranks a node against the query;selectchooses which scored frontier nodes to commit/expand this step (beam/greedy — default: all, best-first);expandyields a node’s neighbors;node_idis the hashable visited-set key;stopis the injected sufficiency check;to_hitmaterializes a committed node as aSearchHit— orNonefor a router-only node (a summary that routes but is not itself a result).
- class ir.traverse.WalkState(query: str, max_depth: int, budget: int, visited: set = <factory>, results: list = <factory>, cache: dict = <factory>)[source]
The operator-owned state of one
traverse()call — the safety home.visited(node ids already committed),budget, andmax_depthare the structural safety primitives the operator enforces;resultsare the emitted hits;cacheis scratch space a policy may use (e.g. to embed the query once). A policy reads this but the operator enforces the bounds — a policy cannot opt out of termination.
- ir.traverse.collapsed_tree_policy(*, summary_kinds: Iterable[str] = ('description', 'synopsis', 'capability', 'document'), leaf_kinds: Iterable[str] = ('chunk', 'readme_chunk'), seed_k: int = 10) WalkPolicy[source]
The pure-vector summary-routing / collapsed-tree
WalkPolicy.Seeds on the top seed_k matches among
summary_kindssurfaces and descends to each routed artifact’sleaf_kindssurfaces (the emitted results), scored by cosine to the query. No LLM in the loop. A summary surface is a router (suppressed from results) only when its artifact has leaf surfaces; on a single-surface corpus (WholeTextdocument, Skillcapability) the summaries are leaf-less and emitted directly, so the walk degrades to flat-over-summaries instead of returning nothing.The defaults keep
document/capabilityinsummary_kindson purpose — that is what lets a WholeText / Skill corpus seed at all; the structural router check (above) is what keeps those seeds from being silently swallowed.>>> hits = traverse(q, corpus, policy=collapsed_tree_policy())
- ir.traverse.traverse(query: str, store: Any, *, policy: WalkPolicy, max_depth: int = 2, node_budget: int = 64, k: int = 10) list[SearchHit][source]
Walk store from query under policy, returning the top-k hits.
The loop — score the frontier → select → commit → expand — is the operator’s; the safety primitives are non-negotiable and enforced here: a node id is committed at most once (the visited-set), expansion stops at
max_depth, and no more thannode_budgetnodes are ever committed. A policy whoseexpandcycles forever and whosestopnever fires still terminates.- Parameters:
query – the user intent.
store – passed to policy verbatim — a
Corpusforcollapsed_tree_policy(), aCorpusGraphfor an artifact-link policy. The operator never inspects it.policy – the
WalkPolicy(e.g.collapsed_tree_policy()).max_depth – maximum expansion depth from a seed (safety).
node_budget – maximum nodes committed (safety).
k – number of hits to return.
- Returns:
the committed hits, best-first, top-k — each a
SearchHitwithmetadata["walk_depth"]/["seed"].