Overview

Conversational AI Memory via a Dynamic, Self-Describing Knowledge Graph

This work represents a shift from simple, stateless RAG to a structured, stateful AI architecture. In this architecture, memory is not merely a retrieved document. It is an evolving, self-describing model of both the external knowledge and the conversation itself.

Current Implementation Status:

KG architecture and services: Done
Consolidation Module: Done
Retrieval Module v1 (Hybrid Recall): Done, in testing
Retrieval Module v2 (GNN-powered): In design, literature review in progress

Figure: A graph sample from a conversation haystack

KG Design

The objective is to maintain a dynamic Knowledge Graph where structure and meaning are defined by node and edge attributes.

Node Design

Nodes are the core entities in our memory graph. Each node includes the following key attributes:

id: A system-generated unique identifier (e.g., <timestamp>_<sanitized_label>).
type: The node’s primary role, chosen from a predefined, hierarchical schema (e.g., category → event_node → sub_event).
label: A concise, human-readable name for the node (e.g., “Paris Conference Planning”).
description: A detailed, text-based account of the node’s content (e.g., a conversation summary, knowledge details).
description_embedding: A dense vector representation of the description, generated by a Sentence-Transformer model. It is used for semantic search and also serves as the node's initial input feature for the GNN.
status: The node’s lifecycle state, either active or archived.
update_time: A timestamp tracking the last modification.
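The attribute list above maps naturally onto a small record type. The following is a minimal sketch under stated assumptions. The class and helper names (`Node`, `sanitize_label`, `make_node_id`) are hypothetical. The field types are assumed. The millisecond timestamp in the id is one possible reading of `<timestamp>_<sanitized_label>`, not the project's confirmed format.

```python
import re
import time
from dataclasses import dataclass, field
from typing import List, Literal, Optional


@dataclass
class Node:
    """Hypothetical in-memory node; field names mirror the attributes above."""
    type: str                                  # e.g. "event_node", "knowledge_node", "event_category"
    label: str                                 # concise human-readable name
    description: str                           # detailed text content
    description_embedding: Optional[List[float]] = None  # filled by an embedding model
    status: Literal["active", "archived"] = "active"
    update_time: float = field(default_factory=time.time)
    id: str = ""

    def __post_init__(self) -> None:
        # Generate "<timestamp>_<sanitized_label>" when no id is supplied.
        if not self.id:
            self.id = make_node_id(self.label, self.update_time)


def sanitize_label(label: str) -> str:
    """Lowercase, collapse runs of non-alphanumerics into '_', trim underscores."""
    return re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_")


def make_node_id(label: str, ts: float) -> str:
    # Assumed format: integer milliseconds since the epoch, then the sanitized label.
    return f"{int(ts * 1000)}_{sanitize_label(label)}"
```

For example, `Node(type="event_node", label="Paris Conference Planning", description="...", update_time=1700000000.0).id` evaluates to `"1700000000000_paris_conference_planning"`.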

Relation (Edge) Design

Edges define the rich, typed relationships between nodes, forming the semantic and structural backbone of the graph.

label: The human-readable meaning of the relationship (e.g., HAS_GOAL, REFERENCES_KNOWLEDGE, PRECEDES).
type: The category of the relationship:
- structural: Defines the graph’s organizational backbone.
- semantic: Represents meaningful connections between entities.
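An edge record can be equally small. This sketch is illustrative only. The design above specifies just `label` and `type`, so the `source` and `target` node-id fields are assumptions added to make the record usable. Because `Literal` is not enforced at runtime, the two allowed edge types are validated explicitly.

```python
from dataclasses import dataclass
from typing import Literal

EdgeType = Literal["structural", "semantic"]


@dataclass(frozen=True)
class Edge:
    """Hypothetical typed edge; `source`/`target` (node ids) are assumed fields."""
    source: str
    target: str
    label: str       # e.g. "CONTAINS", "HAS_GOAL", "REFERENCES_KNOWLEDGE", "PRECEDES"
    type: EdgeType   # "structural" (backbone) or "semantic" (meaningful connection)

    def __post_init__(self) -> None:
        # Type hints are not checked at runtime, so reject unknown edge types here.
        if self.type not in ("structural", "semantic"):
            raise ValueError(f"unknown edge type: {self.type!r}")
```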

Module: Context Retrieval

The ability to retrieve relevant historical context is central to the agent’s intelligence. The project defines a two-stage evolution for this module.

V1: Hybrid Recall Engine

This initial system establishes a powerful, non-graph-aware retrieval baseline. It is used to generate the initial KG from historical data and serves as a crucial benchmark for the GNN.

Component 1: Multi-Intent Query Generator: An LLM analyzes the user’s turn and generates one or more precise query objects.
Component 2: Hybrid Search Powerhouse: For each query, it runs two parallel searches over all nodes:
- Keyword Search: A BM25 index on node description texts.
- Semantic Search: An ANN search using the description_embedding attribute.
Component 3: Fusion & Reranking: The results are merged using Reciprocal Rank Fusion (RRF) and optionally reranked by a lightweight model.
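The fusion step in Component 3 can be sketched in a few lines. This sketch assumes that each BM25 search and each ANN search (for every generated query) returns a ranked list of node ids. It implements standard Reciprocal Rank Fusion, score(d) = Σ_r 1 / (k + rank_r(d)), with 1-based ranks. The constant k = 60 is the value commonly used in the RRF literature, not a setting confirmed by this project. The optional reranker is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of node ids: score(d) = sum over lists of 1 / (k + rank).

    `rankings` may contain the BM25 and ANN result lists for every generated query.
    Ranks are 1-based; a node absent from a list contributes nothing for that list.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, node_id in enumerate(ranking, start=1):
            scores[node_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by node id for a deterministic order.
    return sorted(scores, key=lambda n: (-scores[n], n))
```

For a single query whose BM25 list is `["n3", "n1", "n7"]` and whose ANN list is `["n1", "n2", "n3"]`, the fused order is `["n1", "n3", "n2", "n7"]`. Here `n1` wins because it ranks highly in both lists. RRF uses only ranks, so BM25 and cosine scores never need to be calibrated against each other.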

V2: GNN-Powered Retrieval

This advanced system leverages the graph’s relational structure, with the aim of improving retrieval performance over the v1 baseline.

Core Model: A Heterogeneous Graph Transformer (HGT) trained to produce “context-aware” node embeddings.
Input: The GNN takes the pre-computed description_embedding of a node and its neighbors as its initial input features.
Output: A final, structure-aware embedding that encodes the node’s role and context within its graph neighborhood. This output embedding is stored in a dedicated vector database (e.g., Milvus) for retrieval.
Workflow: The retrieval process is a single, highly efficient ANN search in the GNN-produced embedding space, followed by the same optional reranking step.
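The data flow of V2 (input features, neighbor-informed output embeddings, and a single nearest-neighbor search) can be illustrated with a deliberately simplified stand-in. This is not an HGT. It performs one round of untyped mean neighbor aggregation, and it uses exact brute-force cosine search in place of an ANN index such as Milvus. The design above does not specify how a query is mapped into the GNN embedding space. Here the query's plain sentence embedding is compared directly, which is an assumption.

```python
from typing import List, Tuple

import numpy as np


def mean_aggregate(x: np.ndarray, edges: List[Tuple[int, int]]) -> np.ndarray:
    """One round of mean neighbor aggregation (a toy stand-in, NOT an HGT layer).

    x: (num_nodes, dim) description embeddings; edges: undirected (i, j) index pairs.
    Each output row is the average of the node's own vector and its neighbors' vectors.
    """
    x = x.astype(float)
    out = x.copy()
    counts = np.ones(len(x))
    for i, j in edges:
        out[i] += x[j]
        out[j] += x[i]
        counts[i] += 1
        counts[j] += 1
    return out / counts[:, None]


def top_k_cosine(query: np.ndarray, index: np.ndarray, k: int) -> np.ndarray:
    """Exact cosine top-k; a vector database would answer this approximately (ANN)."""
    q = query / np.linalg.norm(query)
    m = index / np.linalg.norm(index, axis=1, keepdims=True)
    sims = m @ q
    return np.argsort(-sims)[:k]
```

A trained HGT would replace `mean_aggregate` with learned, type-specific attention over neighbors. At query time the workflow stays the same: one vector search over precomputed node embeddings.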

Module: Context-Aware Consolidation

This module is the “intelligent writer” for the KG. It is decoupled into two distinct processes: a real-time, two-stage consolidation pipeline and an asynchronous classification service.

Real-time: Two-Stage Consolidation Pipeline

This pipeline runs for every new dialogue turn to integrate information immediately.

Stage 1: Strategy & Structure Generation (The “Architect” Model)
- Model: A powerful, reasoning-focused LLM (e.g., gemini-2.5-flash).
- Input: The new dialogue turn and the context retrieved by the active Retrieval Module (v1 or v2).
- Task: Decides what structural changes to make to the graph (add_node, update_node, add_relation) and emits them as a “blueprint” of operations. Any new event or knowledge nodes it creates are left “unclassified” for the asynchronous classification service.
Stage 2: Node Detail Population (The “Scribe” Model)
- Model: A lightweight, efficient model (e.g., a fine-tuned gemma-3).
- Input: The dialogue, retrieved context, and the architect’s blueprint.
- Task: Performs a focused information extraction task to populate the label and description for each operation in the blueprint. When a description is created or updated, the description_embedding is generated/updated accordingly.
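Once the scribe has filled in labels and descriptions, the blueprint has to be applied to the graph store. The following is a minimal in-memory sketch. The operation fields (`op`, `node_id`, `source`, `target`, and so on) and the function name `apply_blueprint` are hypothetical, not the project's actual schema. The `embed` callable stands in for the Sentence-Transformer model. Following the text above, the embedding is recomputed only when a description is created or changed.

```python
import time
from typing import Any, Callable, Dict, List

Embedder = Callable[[str], List[float]]  # stand-in for the Sentence-Transformer model


def apply_blueprint(
    nodes: Dict[str, Dict[str, Any]],
    edges: List[Dict[str, str]],
    ops: List[Dict[str, Any]],
    embed: Embedder,
) -> None:
    """Apply architect operations (already populated by the scribe) in order."""
    for op in ops:
        kind = op["op"]
        if kind == "add_node":
            # New event/knowledge nodes start without a CONTAINS parent ("unclassified").
            nodes[op["node_id"]] = {
                "type": op["type"],
                "label": op["label"],
                "description": op["description"],
                "description_embedding": embed(op["description"]),
                "status": "active",
                "update_time": time.time(),
            }
        elif kind == "update_node":
            node = nodes[op["node_id"]]
            if "label" in op:
                node["label"] = op["label"]
            if "description" in op and op["description"] != node["description"]:
                node["description"] = op["description"]
                # Re-embed only when the description actually changed.
                node["description_embedding"] = embed(op["description"])
            node["update_time"] = time.time()
        elif kind == "add_relation":
            if op["source"] not in nodes or op["target"] not in nodes:
                raise KeyError("relation endpoints must already exist")
            edges.append({"source": op["source"], "target": op["target"],
                          "label": op["label"], "type": op["type"]})
        else:
            raise ValueError(f"unknown blueprint operation: {kind!r}")
```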

Asynchronous: Node Classification Service

This is a background process that runs periodically to organize the graph.

Task: It scans the KG for event_node and knowledge_node entities that are “unclassified” (i.e., not yet linked to a parent event_category or knowledge_category node).
Mechanism: It uses a dedicated, lightweight classification model (which can be a fine-tuned LLM or a traditional classifier) to determine the most appropriate category for each unclassified node.
Action: Upon successful classification, it creates the structural CONTAINS relationship from the parent category node to the child node, thus completing the graph’s hierarchical organization.
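One pass of the background service can be sketched as follows. The sketch assumes that a node counts as classified once it has an incoming CONTAINS edge from a node of the matching category type. The `classify` callable is hypothetical and stands in for the fine-tuned LLM or traditional classifier. It receives the node and the candidate category ids and returns a category id, or None on failure. Invalid answers are ignored, so the node stays unclassified until a later pass.

```python
from typing import Callable, Dict, List, Optional

# Assumed mapping from a child node type to the category type that should contain it.
PARENT_TYPE = {"event_node": "event_category", "knowledge_node": "knowledge_category"}

Classifier = Callable[[Dict[str, str], List[str]], Optional[str]]


def classify_unclassified(
    nodes: Dict[str, Dict[str, str]],
    edges: List[Dict[str, str]],
    classify: Classifier,
) -> int:
    """Link each unclassified event/knowledge node to a category; return edges added."""
    category_types = set(PARENT_TYPE.values())
    # Nodes that already have a CONTAINS parent of a category type.
    classified = {
        e["target"] for e in edges
        if e["label"] == "CONTAINS" and nodes[e["source"]]["type"] in category_types
    }
    added = 0
    for node_id, node in nodes.items():
        parent_type = PARENT_TYPE.get(node["type"])
        if parent_type is None or node_id in classified:
            continue
        candidates = [cid for cid, c in nodes.items() if c["type"] == parent_type]
        category = classify(node, candidates)
        if category in candidates:  # skip failures and invalid answers
            edges.append({"source": category, "target": node_id,
                          "label": "CONTAINS", "type": "structural"})
            added += 1
    return added
```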

This decoupled approach makes the real-time consolidation process simpler and faster, while still ensuring the graph remains well-organized over time.

Next Step

The project follows a rigorous, two-version development plan to ensure robust evaluation and scientifically sound conclusions.

Step 1: Run the Baseline (v1)
- Use the v1 system to process a large historical dialogue corpus, generating the “base knowledge graph” needed for GNN training.
Step 2: Train the GNN (v2)
- On the static “base knowledge graph,” perform a large-scale, offline pre-training of the HGT model.
- Training Strategy: Employ a hybrid self-supervised learning objective, combining Link Prediction (to learn graph structure) and Contrastive Learning (to learn semantic robustness), as detailed in the research plan.
Step 3: Deploy and Evaluate
- Deploy the v2 system with the trained, frozen GNN encoder.
- On a held-out test set of conversations/queries, conduct a thorough comparative analysis of the v1 and v2 retrieval modules.
- The primary goal is to determine whether the structural awareness of the v2 GNN yields a statistically significant performance improvement over the strong v1 hybrid-search baseline.
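The hybrid objective in Step 2 can be written as L = L_LP + λ · L_CL. The following NumPy sketch computes both terms under stated assumptions. The link-prediction term is binary cross-entropy on dot-product scores for observed edges and sampled non-edges. The contrastive term is InfoNCE between two augmented views of the node embeddings, where row i of each view forms a positive pair. The weight λ (`lam`), the temperature τ (`tau`), the dot-product decoder, and the choice of InfoNCE are illustrative choices, not the settings of the research plan. Actual training would compute these losses in an autodiff framework on HGT outputs.

```python
import numpy as np


def link_prediction_loss(h: np.ndarray, pos: np.ndarray, neg: np.ndarray) -> float:
    """BCE on dot-product edge scores. h: (N, d); pos/neg: (E, 2) node-index pairs."""
    def scores(pairs: np.ndarray) -> np.ndarray:
        return np.sum(h[pairs[:, 0]] * h[pairs[:, 1]], axis=1)
    # -log sigmoid(s) = logaddexp(0, -s); -log(1 - sigmoid(s)) = logaddexp(0, s)
    pos_term = np.mean(np.logaddexp(0.0, -scores(pos)))
    neg_term = np.mean(np.logaddexp(0.0, scores(neg)))
    return float(pos_term + neg_term)


def info_nce_loss(z1: np.ndarray, z2: np.ndarray, tau: float = 0.5) -> float:
    """InfoNCE between two views; row i of z1 and row i of z2 form the positive pair."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau                           # (N, N) scaled cosine similarities
    m = sim.max(axis=1, keepdims=True)              # stabilise the log-sum-exp
    lse = m + np.log(np.exp(sim - m).sum(axis=1, keepdims=True))
    log_prob = sim - lse                            # row-wise log-softmax
    return float(-np.mean(np.diag(log_prob)))       # the diagonal is the target class


def hybrid_loss(h, pos, neg, z1, z2, lam: float = 0.5, tau: float = 0.5) -> float:
    """L = L_LP + lam * L_CL (lam and tau are assumed hyperparameters)."""
    return link_prediction_loss(h, pos, neg) + lam * info_nce_loss(z1, z2, tau)
```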

Bowen Notes


M7 - Complete Pipeline V1