ir.traverse

Query-time graph traversal — the traverse operator (report 12).

Recursive retrieval, summary-routing, RAPTOR collapsed-tree, PPR / spreading-activation, and beam walks are one operatorscore frontier → select → expand → test stop — parameterized by a pluggable WalkPolicy. The engineering substance is termination, split in two:

  • safety (”will it stop?”) — structural, and enforced by the operator: a visited-set (keyed on the policy’s node id), a depth cap, and a node budget live in WalkState and are checked by traverse() itself, so a buggy or adversarial policy cannot loop forever — the highest-leverage robustness decision when the graph may contain directed cycles;

  • sufficiency (”should it stop?”) — injected and fallible: the policy’s stop (default: never — run to budget).

The policy owns graph semantics; the operator owns the loop and the safety primitives. store is passed to the policy verbatim and never interpreted by the operator — collapsed-tree takes a Corpus (search to seed, ledger to expand to chunks), an artifact-link policy a CorpusGraph (neighbors).

Flat top-k + rerank stays the default (report 12: a strong flat retriever beats most graph methods on simple lookup, and global graph methods cost orders of magnitude more); traverse() is opt-in, and a policy earns promotion only by beating flat on the eval set. The shipped collapsed_tree_policy() is pure-vector (no LLM in the query loop): it routes a query that matches an artifact’s summary surface down to that artifact’s best chunk.

Returned SearchHits carry additive, JSON-clean walk provenance — metadata["walk_depth"] and metadata["seed"] (the routing node) — and compose with ir.select() / ir.disclose() unchanged.

ir.traverse.DFLT_LEAF_KINDS = ('chunk', 'readme_chunk')

Default leaf surface kinds a collapsed-tree walk descends to (the results).

ir.traverse.DFLT_MAX_DEPTH = 2

Default traversal bounds (operator-enforced safety).

ir.traverse.DFLT_SEED_K = 10

Default number of summary surfaces a collapsed-tree walk routes from.

ir.traverse.DFLT_SUMMARY_KINDS = ('description', 'synopsis', 'capability', 'document')

Default summary/router surface kinds a collapsed-tree walk seeds on.

class ir.traverse.WalkPolicy(*args, **kwargs)[source]

The pluggable strategy of a walk — graph semantics, not safety.

seed produces the initial frontier; score ranks a node against the query; select chooses which scored frontier nodes to commit/expand this step (beam/greedy — default: all, best-first); expand yields a node’s neighbors; node_id is the hashable visited-set key; stop is the injected sufficiency check; to_hit materializes a committed node as a SearchHit — or None for a router-only node (a summary that routes but is not itself a result).

class ir.traverse.WalkState(query: str, max_depth: int, budget: int, visited: set = <factory>, results: list = <factory>, cache: dict = <factory>)[source]

The operator-owned state of one traverse() call — the safety home.

visited (node ids already committed), budget, and max_depth are the structural safety primitives the operator enforces; results are the emitted hits; cache is scratch space a policy may use (e.g. to embed the query once). A policy reads this but the operator enforces the bounds — a policy cannot opt out of termination.

ir.traverse.collapsed_tree_policy(*, summary_kinds: Iterable[str] = ('description', 'synopsis', 'capability', 'document'), leaf_kinds: Iterable[str] = ('chunk', 'readme_chunk'), seed_k: int = 10) WalkPolicy[source]

The pure-vector summary-routing / collapsed-tree WalkPolicy.

Seeds on the top seed_k matches among summary_kinds surfaces and descends to each routed artifact’s leaf_kinds surfaces (the emitted results), scored by cosine to the query. No LLM in the loop. A summary surface is a router (suppressed from results) only when its artifact has leaf surfaces; on a single-surface corpus (WholeText document, Skill capability) the summaries are leaf-less and emitted directly, so the walk degrades to flat-over-summaries instead of returning nothing.

The defaults keep document / capability in summary_kinds on purpose — that is what lets a WholeText / Skill corpus seed at all; the structural router check (above) is what keeps those seeds from being silently swallowed.

>>> hits = traverse(q, corpus, policy=collapsed_tree_policy())
ir.traverse.traverse(query: str, store: Any, *, policy: WalkPolicy, max_depth: int = 2, node_budget: int = 64, k: int = 10) list[SearchHit][source]

Walk store from query under policy, returning the top-k hits.

The loop — score the frontier → select → commit → expand — is the operator’s; the safety primitives are non-negotiable and enforced here: a node id is committed at most once (the visited-set), expansion stops at max_depth, and no more than node_budget nodes are ever committed. A policy whose expand cycles forever and whose stop never fires still terminates.

Parameters:
  • query – the user intent.

  • store – passed to policy verbatim — a Corpus for collapsed_tree_policy(), a CorpusGraph for an artifact-link policy. The operator never inspects it.

  • policy – the WalkPolicy (e.g. collapsed_tree_policy()).

  • max_depth – maximum expansion depth from a seed (safety).

  • node_budget – maximum nodes committed (safety).

  • k – number of hits to return.

Returns:

the committed hits, best-first, top-k — each a SearchHit with metadata["walk_depth"] / ["seed"].