Architecture¶
CongraphDB is a high-performance, embedded graph database designed for Node.js (via napi-rs). This document outlines the current architecture following the major refactoring.
System Overview¶
┌─────────────────────────────────────────────┐
│ Node.js Application │
│ (Business logic, API endpoints, etc.) │
└─────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ napi-rs Bindings Layer │
│ (lib.rs - TypeScript ↔ Rust bridge) │
└─────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ Core Layer │
│ ┌──────────┐ ┌────────────┐ ┌───────────┐ │
│ │Database │ │Connection │ │Transaction│ │
│ └──────────┘ └────────────┘ └───────────┘ │
└─────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ Query Engine │
│ ┌─────────┐ ┌──────────┐ ┌─────────────┐ │
│ │ Parser │ │ Binder │ │ Executor │ │
│ │ (nom) │ │(semantic)│ │ + Operators │ │
│ └─────────┘ └──────────┘ └─────────────┘ │
└─────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ Table & Index Layer │
│ ┌──────────┐ ┌──────────┐ ┌────────────┐ │
│ │NodeTable │ │RelTable │ │IndexManager│ │
│ └──────────┘ └──────────┘ └────────────┘ │
└─────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ Storage Engine │
│ ┌──────────┐ ┌──────────┐ ┌────────────┐ │
│ │ Buffer │ │Storage │ │ WAL │ │
│ │ Manager │ │Manager │ │ Manager │ │
│ └──────────┘ └──────────┘ └────────────┘ │
└─────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ .cgraph file + .wal file │
│ (Persistent storage) │
└─────────────────────────────────────────────┘
High-Level Architecture¶
The system is organized into several key layers:
1.1 Core Layer (src/core)¶
- Database: The main entry point. Manages shared state (Catalog, StorageManager, BufferManager).
- Connection: Handles session state, transaction scope, and query execution entry points.
- Transaction: Implements ACID properties through a transaction manager and WAL.
1.2 Storage Layer (src/storage)¶
- BufferManager: Manages page caching and disk I/O.
- StorageManager: Handles page-level access to data files.
- Catalog: Persists schema information (Node and Relationship table definitions).
- WAL (Write-Ahead Log): Ensures durability and atomic commits.
1.3 Table & Index Layer (src/table, src/index)¶
- NodeTable: Manages node records and property columns.
- RelTable: Manages relationship records, connecting source and target nodes with specific types.
- IndexManager: Coordinates secondary indexes (HNSW, Hash) for fast property lookups.
1.4 Query Engine (src/query)¶
The query engine follows a classic pipeline:
- Parsing (
cypher): Usesnomto convert Cypher text into an Abstract Syntax Tree (AST). - Binding (
binder): Performs semantic analysis, validates labels/properties against the catalog, and binds variables to specific tables. - Planning (
executor/builder): Converts the bound query into a physical execution plan (operator tree). - Execution (
executor/execution): Coordinates the execution and manages result streaming. - Operators (
operator): Implementation of physical operators organized into: - Streaming operators: Scan, PatternMatch, OptionalMatch, Transform
- Non-streaming operators: Aggregate, Path, PathVarLength
- Write operators: Create, Merge, Delete, Update
Component Diagram¶
graph TD
API[Node.js API / napi-rs] --> Core[Core: Database/Connection/Transaction]
Core --> Query[Query Engine]
subgraph "Query Engine"
Query --> Parser[Parser: cypher/nom]
Parser --> Binder[Binder: semantic analysis]
Binder --> Builder[Builder: physical plan]
Builder --> Executor[Executor]
Executor --> Operators[Physical Operators]
end
Operators --> Table[Table Layer: NodeTable/RelTable]
Table --> Storage[Storage Layer: Buffer/Storage Manager]
Storage --> Disk[(Disk / WAL)]
Table --> Index[Index Layer: HNSW/Hash]
Index --> Storage
Core Components¶
1. napi-rs Bindings¶
- File:
src/lib.rs - Purpose: Bridge between JavaScript and Rust
- Technology: napi-rs with neon Serde serialization
#[napi]
impl Database {
#[napi(constructor)]
pub fn new(path: Option<String>) -> Self {
// ...
}
#[napi]
pub fn init(&mut self) {
// ...
}
}
2. Query Engine¶
- Directory:
src/query/ - Components:
- Parser (
cypher): Cypher → AST usingnomparser combinator - Binder: Semantic analysis with modular structure:
node- Node binding logicrel- Relationship binding logicexpression- Expression evaluationclause- Clause-level bindingpattern- Pattern matching bindingstatement- Statement-level binding
- Builder: Converts bound queries to physical operator tree
- Executor: Executes operator trees and manages result streaming
3. Storage Engine¶
- Directory:
src/storage/ - Components:
- Buffer Manager (
buffer): Page caching with LRU eviction - Storage Manager: Page-level file access
- Catalog (
catalog): Schema persistence - Page Manager: Memory-mapped file I/O
- WAL (
wal): Write-ahead logging for durability
4. Table Management¶
- Directory:
src/table/ - Components:
- Node Table: Columnar storage for nodes
- Rel Table: Adjacency lists for relationships
- Chunk Manager: Column chunk management
5. Index Structures¶
- Directory:
src/index/ - Components:
- HNSW (
hnsw): Vector similarity search - Hash: Fast exact match lookups
6. Algorithm Module¶
- Directory:
src/algorithm/ - Purpose: Graph algorithms operating directly on table layer
- Components:
- Centrality: PageRank, Betweenness, Closeness, Degree
- Community Detection: Louvain, Leiden, Spectral, SLPA, Infomap, Label Propagation, Walktrap
- Traversal: BFS, DFS
- Path: Dijkstra, Bidirectional Dijkstra
- Analytics: Triangle Count
- Configuration:
AlgorithmConfigwith direction, iterations, tolerance - Result Types: NodeScores, NodeLabels, TraversalOrder, ShortestPaths, TriangleCount
See also: Algorithm Internals for detailed algorithm documentation.
Recent Architecture Improvements¶
The recent refactoring introduced several benefits:
- Modular Binder: The binder is now split into
node,rel,expression,clause,pattern, andstatementmodules, making it easier to add support for complex Cypher features. - Separated Query Logic: The query execution is no longer a monolithic block.
builder.rsandexecution.rshave clear, distinct responsibilities. - Organized Operators: Physical operators are organized into
streaming,non_streaming, andwritemodules for better discoverability and maintenance.
Recommended Architecture Enhancements¶
3.1 Logical Optimization Layer¶
Status: Missing.
Proposal: Introduce a LogicalOptimizer between the Binder and the Builder.
- Predicate Pushdown: Move filters as close to the scan as possible.
- Projection Pruning: Only read columns that are actually needed in the final result or in intermediate filters.
- Constant Folding: Pre-evaluate constant expressions at compile time.
3.2 Cost-Based Optimizer (CBO)¶
Status: Rule-based (simple translation).
Proposal: Implement statistics collection (node counts, property cardinality, selectivity).
- Join Order Optimization: Use statistics to choose which labels to scan first in complex
MATCHpatterns. - Index Selection: Automatically choose between a full scan and an index lookup based on estimated cost.
3.3 Task-Based Execution Model¶
Status: Iterator-based (Pull model).
Proposal: While the current Volcano-style iterator model is simple and works well for embedded workloads, a task-based or push-based model could improve performance for highly concurrent or very large queries by better utilizing multiple cores.
3.4 Vector Search Integration¶
Status: Basic types exist.
Proposal: Deepen the integration of vector types and distance functions.
- HNSW Index: Add support for hierarchical navigable small world indexes for fast ANN (Approximate Nearest Neighbor) search, making CongraphDB a first-class choice for AI/RAG applications.
3.5 Storage Engine Abstraction¶
Status: Tightly coupled to custom page format.
Proposal: Define a StorageEngine trait to allow different storage backends (e.g., Memory-only, S3-backed cache, or pluggable compression algorithms).
Technology Stack¶
| Component | Technology |
|---|---|
| Language | Rust (edition 2021) |
| FFI | napi-rs |
| Parallelism | Rayon |
| Parsing | nom |
| Serialization | Serde, bincode |
| Compression | snap (Snappy) |
| Hashing | ahash |
Data Flow¶
Read Path¶
1. Application calls conn.query()
↓
2. Parse Cypher query (nom → AST)
↓
3. Bind query (semantic analysis)
↓
4. Build physical operator tree
↓
5. Execute operators (streaming/non-streaming)
↓
6. Read from buffer pool (or disk)
↓
7. Serialize results to JavaScript
↓
8. Return QueryResult
Write Path¶
1. Application calls conn.query()
↓
2. Parse and bind query
↓
3. Build operator tree with write operators
↓
4. Execute write operators
↓
5. Write to WAL (sync)
↓
6. Update in-memory structures
↓
7. Return success
↓
8. Background: Checkpoint to main file
Concurrency Model¶
Optimistic Concurrency Control (OCC)¶
CongraphDB v0.1.8+ implements OCC with: - Read/Write set tracking per transaction - Version-based conflict detection - LRU version cache (1000 entries default) - Lock-free atomic version reads - Adaptive retry system (up to 3x multiplier under contention)
OCC Workflow¶
- Read Phase - Record versions in read set
- Validation Phase - Check versions haven't changed
- Write Phase - Apply writes and increment versions
- Retry - On conflict, rollback and retry with backoff
Concurrency Model¶
- Readers: MVCC-based concurrent reads
- Writers: Single-writer concurrency (WAL serialization)
- Isolation: Serializable snapshot isolation
- Locking: RwLock for table access
Memory Management¶
- Buffer Pool: Fixed-size page cache (configurable)
- Page Size: 4KB default
- Eviction: LRU with clock algorithm
- Mapping: Memory-mapped I/O with
memmap2
Error Handling¶
- Strategy: Structured errors with
thiserror - Propagation: Errors bubble through napi-rs to JavaScript
- Recovery: WAL replay on startup
See Also¶
- Storage Format — On-disk structure
- Query Execution — Query processing details
- Binder Details — Semantic analysis and binding
- Operators — Physical operator reference