Query Execution¶

How CongraphDB processes Cypher queries from parsing to execution.

Query Processing Pipeline¶

Cypher Query String
        │
        ▼
    ┌──────────┐
    │  Parser  │  →  AST (Abstract Syntax Tree)
    │ (cypher) │     using nom combinator
    └──────────┘
        │
        ▼
  ┌─────────────┐
  │  Binder     │  →  Bound Query
  │ (semantic) │     - Validates labels/properties
  └─────────────┘     - Binds variables to tables
        │
        ▼
 ┌────────────────┐
  │  Builder      │  →  Physical Operator Tree
  │ (executor)    │     - Constructs execution plan
  └────────────────┘
        │
        ▼
  ┌──────────────┐
  │   Executor   │  →  Results
  │ (execution)  │     - Streaming execution
  └──────────────┘

1. Parsing¶

The parser converts Cypher text into an AST using the nom combinator library.

Parser Structure¶

Located in src/query/cypher/:

mod.rs - Main parser module
clause.rs - Clause parsing (MATCH, RETURN, WHERE, etc.)
expr.rs - Expression parsing
pattern.rs - Pattern parsing for graph patterns
schema.rs - Schema statement parsing (CREATE NODE TABLE, etc.)
types.rs - AST type definitions

Example¶

Input:

MATCH (u:User {name: 'Alice'})-[:KNOWS]->(f:User)
RETURN u.name, f.name

AST:

name="__codelineno-2-1" href="#__codelineno-2-1">Query { clauses: [ Match { pattern: Pattern { parts: [ Node { variable: "u", label: "User", predicates: [Equals("name", "Alice")] }, Relationship { type: "KNOWS", direction: Outgoing }, Node { variable: "f", label: "User" } ] } }, Return { items: [ Property("u", "name"), Property("f", "name") ] } ] }

2. Binding (Semantic Analysis)¶

The binder performs semantic analysis on the AST, validating references and binding variables to specific tables.

Binder Structure¶

Located in src/query/binder/:

binder.rs - Main binder implementation
clause.rs - Clause-level binding
statement.rs - Statement-level binding
expression.rs - Expression binding and validation
pattern.rs - Pattern binding (nodes, relationships)
types.rs - Bound query types

Binding Process¶

Validate Labels: Check that all referenced labels exist in the catalog
Validate Properties: Check that property references are valid for their types
Bind Variables: Map variables to specific table sources
Type Checking: Ensure expressions are type-safe
Resolve Scopes: Handle variable scoping across query clauses

Example Binding¶

For the query above, binding produces:

BoundQuery {
  bound_clauses: [
    BoundMatch {
      bound_pattern: [
        BoundNode {
          variable: "u",
          table: "User",
          filters: [PropertyFilter("name", Equals, "Alice")]
        },
        BoundRelationship {
          type: "KNOWS",
          direction: Outgoing,
          table: "KNOWS"
        },
        BoundNode {
          variable: "f",
          table: "User",
          filters: []
        }
      ]
    },
    BoundReturn {
      items: [
        BoundProperty("u", "name", StringColumn),
        BoundProperty("f", "name", StringColumn)
      ]
    }
  ]
}

3. Building (Physical Planning)¶

The builder converts the bound query into a physical operator tree optimized for execution.

Builder Structure¶

Located in src/query/executor/builder.rs:

Converts bound clauses to operators
Handles operator composition
Manages variable scope across operators

Physical Operators¶

Operators are organized into three categories in src/query/operator/:

Streaming Operators (`streaming/`)¶

Operator	Description	File
`Scan`	Full table scan	`scan.rs`
`PatternMatch`	Graph pattern matching	`pattern_match.rs`
`OptionalMatch`	Optional pattern matching (LEFT JOIN)	`optional_match.rs`
`Transform`	Column projection and transformation	`transform.rs`

Non-Streaming Operators (`non_streaming/`)¶

Operator	Description	File
`Aggregate`	Aggregation (COUNT, SUM, AVG, etc.)	`aggregate.rs`
`Path`	Path finding operations	`path.rs`
`PathVarLength`	Variable-length path patterns	`path_var_length.rs`

Write Operators (`write/`)¶

Operator	Description	File
`Create`	CREATE clause execution	`create.rs`
`Merge`	MERGE clause execution	`merge.rs`
`Delete`	DELETE clause execution	`delete.rs`
`Update`	SET clause execution	`update.rs`

Example Physical Plan¶

Transform(u.name, f.name)
  └─ PatternMatch
      ├─ Scan(User) u
      │  └─ Filter(u.name = 'Alice')
      └─ Scan(Knows) -> Scan(User) f

4. Execution¶

The executor runs the physical operator tree using a streaming execution model.

Executor Structure¶

Located in src/query/executor/:

execution.rs - Main execution engine
result.rs - Result handling and streaming
stats.rs - Execution statistics
table_manager.rs - Table access during execution
utils.rs - Execution utilities

Streaming Execution¶

CongraphDB uses a streaming execution model (Volcano-style):

┌─────────────────────────────────────────────────┐
│              Operator Pipeline                  │
├─────────────────────────────────────────────────┤
│  Transform pulls from PatternMatch              │
│    PatternMatch pulls from Scan                 │
│      Scan reads from storage                    │
│    PatternMatch joins results                   │
│  Transform projects final columns               │
└─────────────────────────────────────────────────┘

Pull-Based Iteration¶

// Simplified execution model
while let Some(row) = operator.next() {
    // Process row
    // Return to client
}

Result Streaming¶

Results are streamed to the client to minimize memory usage:

Row by row: Each row is produced and sent immediately
No materialization: Intermediate results are not stored
Lazy evaluation: Operators only compute when needed

Query Modifiers¶

LIMIT and SKIP¶

LIMIT: Stops execution after N rows
SKIP: Skips first N rows
Optimization: Applied at the executor level

ORDER BY¶

Sorting: Uses external sort for large result sets
Optimization: Pushed down when possible

DISTINCT¶

Deduplication: Uses hashing for efficient duplicate removal
Optimization: Combined with aggregation when possible

Vector Query Execution¶

Vector similarity queries use HNSW for approximate nearest neighbor search:

Query: ORDER BY embedding <-> $query

Execution:
  1. Convert query to vector
  2. Traverse HNSW graph to find nearest candidates
  3. Compute exact distances for candidates
  4. Sort by distance
  5. Apply LIMIT

Write Query Execution¶

Write queries (CREATE, MERGE, DELETE, SET) use special write operators:

CREATE (u:User {name: 'Alice'})

Execution:
  1. Parse and bind
  2. Build operator tree with Create operator
  3. Execute Create operator:
     - Allocate new node ID
     - Write to WAL
     - Update in-memory structures
  4. Return created node

Error Handling¶

Errors can occur at any stage:

Stage	Error Type	Example
Parser	SyntaxError	Invalid Cypher syntax
Binder	SemanticError	Undefined label, type mismatch
Builder	PlanError	Invalid query structure
Executor	RuntimeError	I/O error, constraint violation

Performance Considerations¶

Memory Usage¶

Streaming: Results streamed to avoid large allocations
No intermediate materialization: Operators pass rows directly
Efficient representations: Compact data structures

I/O Optimization¶

Prefetching: Sequential read ahead
Batching: Group page reads
Caching: Keep hot pages in buffer pool

CPU Optimization¶

Zero-copy: Avoid unnecessary data copying
Cache locality: Columnar storage improves cache hits
Branch prediction: Minimize branches in hot paths