mirror of
https://github.com/eliasstepanik/core.git
synced 2026-01-09 22:38:29 +00:00
feat: automatic space identification
This commit is contained in:
parent
0ad2bba2ad
commit
62db6c1d51
38
README.md
@@ -55,7 +55,7 @@ CORE memory achieves **88.24%** average accuracy in Locomo dataset across all re

## Overview

**Problem**

Developers waste time re-explaining context to AI tools. Hit token limits in Claude? Start fresh and lose everything. Switch from ChatGPT/Claude to Cursor? Explain your context again. Your conversations, decisions, and insights vanish between sessions. With every new AI tool, the cost of context switching grows.
@@ -64,11 +64,16 @@ Developers waste time re-explaining context to AI tools. Hit token limits in Cla

CORE is an open-source, unified, persistent memory layer for all your AI tools. Your context follows you from Cursor to Claude to ChatGPT to Claude Code. One knowledge graph remembers who said what, when, and why. Connect once, remember everywhere. Stop managing context and start building.

## 🚀 CORE Self-Hosting

Want to run CORE on your own infrastructure? Self-hosting gives you complete control over your data and deployment.

**Quick Deploy Options:**
[![Deploy on Railway](https://railway.com/button.svg)](https://railway.com/deploy/core?referralCode=LHvbIb&utm_medium=integration&utm_source=template&utm_campaign=generic)

**Prerequisites**:

@@ -80,15 +85,20 @@ Want to run CORE on your own infrastructure? Self-hosting gives you complete con

### Setup

1. Clone the repository:

```
git clone https://github.com/RedPlanetHQ/core.git
cd core
```

2. Configure environment variables in `core/.env`:

```
OPENAI_API_KEY=your_openai_api_key
```

3. Start the service:

```
docker-compose up -d
```
@@ -100,6 +110,7 @@ Once deployed, you can configure your AI providers (OpenAI, Anthropic) and start

Note: We tried open-source models like Ollama and GPT-OSS, but fact generation was not good enough; we are still figuring out how to improve that, and will then also support OSS models.

## 🚀 CORE Cloud

**Build your unified memory graph in 5 minutes:**

Don't want to manage infrastructure? CORE Cloud lets you build your personal memory system instantly - no setup, no servers, just memory that works.

@@ -115,24 +126,24 @@ Don't want to manage infrastructure? CORE Cloud lets you build your personal mem
## 🧩 Key Features

### 🧠 **Unified, Portable Memory**:

Add and recall your memory across **Cursor, Windsurf, Claude Desktop, Claude Code, Gemini CLI, AWS's Kiro, VS Code, and Roo Code** via MCP



### 🕸️ **Temporal + Reified Knowledge Graph**:

Remember the story behind every fact—track who said what, when, and why with rich relationships and full provenance, not just flat storage



### 🌐 **Browser Extension**:

Save conversations and content from ChatGPT, Grok, Gemini, Twitter, YouTube, blog posts, and any webpage directly into your CORE memory.

**How to Use Extension**

1. [Download the Extension](https://chromewebstore.google.com/detail/core-extension/cglndoindnhdbfcbijikibfjoholdjcc) from the Chrome Web Store.
2. Login to [CORE dashboard](https://core.heysol.ai)
   - Navigate to Settings (bottom left)
@@ -141,13 +152,12 @@ Save conversations and content from ChatGPT, Grok, Gemini, Twitter, YouTube, blo

https://github.com/user-attachments/assets/6e629834-1b9d-4fe6-ae58-a9068986036a

### 💬 **Chat with Memory**:

Ask questions like "What are my writing preferences?" with instant insights from your connected knowledge



### ⚡ **Auto-Sync from Apps**:

Automatically capture relevant context from Linear, Slack, Notion, GitHub and other connected apps into your CORE memory

@@ -156,16 +166,12 @@ Automatically capture relevant context from Linear, Slack, Notion, GitHub and ot



### 🔗 **MCP Integration Hub**:

Connect Linear, Slack, GitHub, Notion once to CORE—then use all their tools in Claude, Cursor, or any MCP client with a single URL


## How CORE creates memory

<img width="12885" height="3048" alt="memory-ingest-diagram" src="https://github.com/user-attachments/assets/c51679de-8260-4bee-bebf-aff32c6b8e13" />

@@ -179,7 +185,6 @@ CORE’s ingestion pipeline has four phases designed to capture evolving context

The Result: Instead of a flat database, CORE gives you a memory that grows and changes with you - preserving context, evolution, and ownership so agents can actually use it.



## How CORE recalls from memory

@@ -204,7 +209,7 @@ Explore our documentation to get the most out of CORE
- [Connect Core MCP with Claude](https://docs.heysol.ai/providers/claude)
- [Connect Core MCP with Cursor](https://docs.heysol.ai/providers/cursor)
- [Connect Core MCP with Claude Code](https://docs.heysol.ai/providers/claude-code)
- [Connect Core MCP with Codex](https://docs.heysol.ai/providers/codex)

- [Basic Concepts](https://docs.heysol.ai/overview)
- [API Reference](https://docs.heysol.ai/api-reference/get-user-profile)
@@ -249,6 +254,7 @@ Have questions or feedback? We're here to help:

<a href="https://github.com/RedPlanetHQ/core/graphs/contributors">
  <img src="https://contrib.rocks/image?repo=RedPlanetHQ/core" />
</a>

@@ -266,3 +272,5 @@ Have questions or feedback? We're here to help:
@@ -92,3 +92,69 @@ export const sessionCompactionQueue = new Queue("session-compaction-queue", {
    },
  },
});

/**
 * Space assignment queue
 * Handles assigning episodes to spaces based on semantic matching
 */
export const spaceAssignmentQueue = new Queue("space-assignment-queue", {
  connection: getRedisConnection(),
  defaultJobOptions: {
    attempts: 3,
    backoff: {
      type: "exponential",
      delay: 2000,
    },
    removeOnComplete: {
      age: 3600, // Keep completed jobs for one hour
      count: 1000,
    },
    removeOnFail: {
      age: 86400, // Keep failed jobs for 24 hours
    },
  },
});

/**
 * Space summary queue
 * Handles generating summaries for spaces
 */
export const spaceSummaryQueue = new Queue("space-summary-queue", {
  connection: getRedisConnection(),
  defaultJobOptions: {
    attempts: 3,
    backoff: {
      type: "exponential",
      delay: 2000,
    },
    removeOnComplete: {
      age: 3600,
      count: 1000,
    },
    removeOnFail: {
      age: 86400,
    },
  },
});

/**
 * Space discovery queue
 * Handles discovering and auto-creating thematic spaces based on entity clustering
 */
export const spaceDiscoveryQueue = new Queue("space-discovery-queue", {
  connection: getRedisConnection(),
  defaultJobOptions: {
    attempts: 2, // Fewer retries since it's a long-running analysis
    backoff: {
      type: "exponential",
      delay: 5000,
    },
    removeOnComplete: {
      age: 3600,
      count: 100,
    },
    removeOnFail: {
      age: 86400,
    },
  },
});
@@ -18,24 +18,37 @@ import {
  processConversationTitleCreation,
  type CreateConversationTitlePayload,
} from "~/jobs/conversation/create-title.logic";

import {
  processSessionCompaction,
  type SessionCompactionPayload,
} from "~/jobs/session/session-compaction.logic";
import {
  processSpaceAssignment,
  type SpaceAssignmentPayload,
} from "~/jobs/spaces/space-assignment.logic";
import {
  processSpaceSummary,
  type SpaceSummaryPayload,
} from "~/jobs/spaces/space-summary.logic";
import {
  processSpaceDiscovery,
  type SpaceDiscoveryPayload,
} from "~/jobs/spaces/space-discovery.logic";
import {
  enqueueIngestEpisode,
  enqueueSpaceAssignment,
  enqueueSessionCompaction,
  enqueueSpaceSummary,
} from "~/lib/queue-adapter.server";
import { logger } from "~/services/logger.service";
/**
 * Episode ingestion worker
 * Processes individual episode ingestion jobs with global concurrency
 *
 * Note: BullMQ uses a global concurrency limit (1 job max, matching the
 * worker option below). Trigger.dev uses per-user concurrency via
 * concurrencyKey. For most open-source deployments, global concurrency is
 * sufficient.
 */
export const ingestWorker = new Worker(
  "ingest-queue",
@@ -51,7 +64,7 @@ export const ingestWorker = new Worker(
  },
  {
    connection: getRedisConnection(),
    concurrency: 1, // Global limit: process one job at a time
  },
);
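
// A hedged sketch (not part of this commit) of how a BullMQ deployment could
// approximate Trigger.dev's per-user serialization mentioned above: BullMQ
// deduplicates adds that reuse an existing jobId, so a user-scoped id keeps
// at most one queued ingest job per user. The helper name, queue export, and
// payload type are assumptions.
//
// import { ingestQueue } from "~/bullmq/queues"; // assumed export
//
// async function enqueueIngestForUser(userId: string, payload: unknown) {
//   return ingestQueue.add("ingest-episode", payload, {
//     jobId: `ingest-${userId}`, // same id per user → duplicate adds are ignored
//     removeOnComplete: true, // free the id once the job finishes
//   });
// }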

@@ -108,6 +121,61 @@ export const sessionCompactionWorker = new Worker(
  },
);

/**
 * Space assignment worker
 * Handles assigning episodes to spaces based on semantic matching
 *
 * Note: Global concurrency of 1 ensures sequential processing.
 * Trigger.dev uses per-user concurrency via concurrencyKey.
 */
export const spaceAssignmentWorker = new Worker(
  "space-assignment-queue",
  async (job) => {
    const payload = job.data as SpaceAssignmentPayload;
    return await processSpaceAssignment(
      payload,
      // Callback to enqueue space summary
      enqueueSpaceSummary,
    );
  },
  {
    connection: getRedisConnection(),
    concurrency: 1, // Global limit: process one job at a time
  },
);

/**
 * Space summary worker
 * Handles generating summaries for spaces
 */
export const spaceSummaryWorker = new Worker(
  "space-summary-queue",
  async (job) => {
    const payload = job.data as SpaceSummaryPayload;
    return await processSpaceSummary(payload);
  },
  {
    connection: getRedisConnection(),
    concurrency: 1, // Process one space summary at a time
  },
);

/**
 * Space discovery worker
 * Handles discovering and auto-creating thematic spaces based on entity clustering
 */
export const spaceDiscoveryWorker = new Worker(
  "space-discovery-queue",
  async (job) => {
    const payload = job.data as SpaceDiscoveryPayload;
    return await processSpaceDiscovery(payload);
  },
  {
    connection: getRedisConnection(),
    concurrency: 1, // Process one space discovery at a time (long-running)
  },
);

/**
 * Graceful shutdown handler
 */
@@ -116,8 +184,10 @@ export async function closeAllWorkers(): Promise<void> {
    ingestWorker.close(),
    documentIngestWorker.close(),
    conversationTitleWorker.close(),
    sessionCompactionWorker.close(),
    spaceAssignmentWorker.close(),
    spaceSummaryWorker.close(),
    spaceDiscoveryWorker.close(),
  ]);
  logger.log("All BullMQ workers closed");
}
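
// A hedged wiring sketch (not part of this commit): a server entrypoint could
// hook closeAllWorkers into process shutdown signals so in-flight jobs finish
// before exit.
//
// process.on("SIGTERM", async () => {
//   await closeAllWorkers();
//   process.exit(0);
// });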

1201
apps/webapp/app/jobs/spaces/space-assignment.logic.ts
Normal file
File diff suppressed because it is too large
235
apps/webapp/app/jobs/spaces/space-discovery.logic.ts
Normal file
@@ -0,0 +1,235 @@
import { logger } from "~/services/logger.service";
import {
  discoverThematicSpaces,
  type SpaceProposal as ClusteringSpaceProposal,
} from "~/services/clustering.server";
import { SpaceService } from "~/services/space.server";
import { prisma } from "~/trigger/utils/prisma";

// ============================================================================
// Types
// ============================================================================

export interface SpaceDiscoveryPayload {
  userId: string;
  workspaceId: string;
  spaceIds?: string[]; // Optional: limit discovery to specific spaces
  minEpisodeCount?: number; // Minimum episodes per entity (default: 20)
  maxEntities?: number; // Maximum entities to analyze (default: 50)
  autoCreateThreshold?: number; // Auto-create spaces with confidence >= this (default: 80)
}
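
// A hedged usage sketch (not part of this commit): enqueueSpaceDiscovery from
// ~/lib/queue-adapter.server (added later in this change) takes this payload;
// omitted fields fall back to the defaults noted above. The caller-side ids
// are assumptions.
//
// import { enqueueSpaceDiscovery } from "~/lib/queue-adapter.server";
//
// const { id } = await enqueueSpaceDiscovery({
//   userId: user.id, // hypothetical caller-provided ids
//   workspaceId: workspace.id,
//   autoCreateThreshold: 80, // only auto-create proposals with >= 80% confidence
// });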

export interface CreatedSpace {
  id: string;
  name: string;
  description: string;
  confidence: number;
  estimatedEpisodeCount: number;
}

export interface SpaceDiscoveryJobResult {
  success: boolean;
  totalProposals: number;
  highConfidenceProposals: number;
  spacesCreated: number;
  createdSpaces: CreatedSpace[];
  stats: {
    totalEntities: number;
    totalEpisodes: number;
    clustersAnalyzed: number;
  };
  error?: string;
}

// ============================================================================
// Helper: Fetch Existing Spaces
// ============================================================================

/**
 * Fetch all existing spaces for a user to avoid duplicate proposals
 */
async function fetchExistingSpaces(
  workspaceId: string,
): Promise<Array<{ name: string; description: string | null }>> {
  try {
    const spaces = await prisma.space.findMany({
      where: {
        workspaceId,
      },
      select: {
        name: true,
        description: true,
      },
    });

    logger.info(`Fetched ${spaces.length} existing spaces for workspace`, {
      workspaceId,
    });

    return spaces;
  } catch (error) {
    logger.error("Failed to fetch existing spaces", { error, workspaceId });
    return [];
  }
}

// ============================================================================
// Helper: Auto-Create High-Confidence Spaces
// ============================================================================

/**
 * Automatically create spaces with confidence >= threshold
 */
async function autoCreateSpaces(
  proposals: ClusteringSpaceProposal[],
  userId: string,
  workspaceId: string,
  threshold: number,
): Promise<CreatedSpace[]> {
  const spaceService = new SpaceService();
  const createdSpaces: CreatedSpace[] = [];

  // Filter proposals by confidence threshold
  const highConfidenceProposals = proposals.filter(
    (p) => p.confidence >= threshold,
  );

  logger.info(
    `Auto-creating ${highConfidenceProposals.length} spaces with confidence >= ${threshold}%`,
  );

  for (const proposal of highConfidenceProposals) {
    try {
      // Create space using SpaceService
      const space = await spaceService.createSpace({
        name: proposal.name,
        description: proposal.intent, // Use intent as description
        userId,
        workspaceId,
      });

      logger.info(`Auto-created space: "${space.name}" (${space.id})`, {
        confidence: proposal.confidence,
        estimatedEpisodes: proposal.estimatedEpisodeCount,
      });

      createdSpaces.push({
        id: space.id,
        name: space.name,
        description: space.description || "",
        confidence: proposal.confidence,
        estimatedEpisodeCount: proposal.estimatedEpisodeCount,
      });
    } catch (error) {
      logger.error(`Failed to auto-create space "${proposal.name}": ${error}`, {
        proposal,
        error,
      });
      // Continue with other proposals even if one fails
    }
  }

  logger.info(`Successfully created ${createdSpaces.length} spaces`);
  return createdSpaces;
}

// ============================================================================
// Main Job Logic
// ============================================================================

/**
 * Process space discovery job
 *
 * Workflow:
 * 1. Fetch existing spaces to avoid duplicates
 * 2. Run entity-first clustering analysis (discoverThematicSpaces)
 * 3. Filter out proposals that match existing spaces
 * 4. Auto-create spaces with confidence >= threshold (default 80%)
 * 5. Return results with created spaces and statistics
 */
export async function processSpaceDiscovery(
  payload: SpaceDiscoveryPayload,
): Promise<SpaceDiscoveryJobResult> {
  const {
    userId,
    workspaceId,
    spaceIds,
    minEpisodeCount = 20,
    maxEntities = 50,
    autoCreateThreshold = 80,
  } = payload;

  logger.info("Starting space discovery job", {
    userId,
    workspaceId,
    spaceIds,
    minEpisodeCount,
    maxEntities,
    autoCreateThreshold,
  });

  try {
    // Step 1: Fetch existing spaces
    const existingSpaces = await fetchExistingSpaces(workspaceId);

    // Step 2: Run entity-first clustering to discover thematic spaces
    logger.info("Running entity clustering analysis...");
    const discoveryResult = await discoverThematicSpaces({
      userId,
      spaceIds,
      minEpisodeCount,
      maxEntities,
      existingSpaces, // Pass existing spaces to LLM to avoid duplicates
    });

    logger.info("Clustering analysis complete", {
      totalProposals: discoveryResult.proposals.length,
      totalEntities: discoveryResult.stats.totalEntities,
      totalEpisodes: discoveryResult.stats.totalEpisodes,
      clustersAnalyzed: discoveryResult.stats.clustersAnalyzed,
    });

    // Step 3: Auto-create high-confidence spaces
    // Note: LLM already filters out duplicates based on existingSpaces in prompt
    const createdSpaces = await autoCreateSpaces(
      discoveryResult.proposals,
      userId,
      workspaceId,
      autoCreateThreshold,
    );

    // Step 4: Count high-confidence proposals
    const highConfidenceProposals = discoveryResult.proposals.filter(
      (p) => p.confidence >= autoCreateThreshold,
    );

    const result: SpaceDiscoveryJobResult = {
      success: true,
      totalProposals: discoveryResult.proposals.length,
      highConfidenceProposals: highConfidenceProposals.length,
      spacesCreated: createdSpaces.length,
      createdSpaces,
      stats: discoveryResult.stats,
    };

    // Log as structured metadata; interpolating the object into the message
    // would print "[object Object]"
    logger.info("Space discovery job completed successfully", { result });

    return result;
  } catch (error) {
    logger.error("Space discovery job failed", { error, userId, workspaceId });

    return {
      success: false,
      totalProposals: 0,
      highConfidenceProposals: 0,
      spacesCreated: 0,
      createdSpaces: [],
      stats: {
        totalEntities: 0,
        totalEpisodes: 0,
        clustersAnalyzed: 0,
      },
      error: error instanceof Error ? error.message : "Unknown error",
    };
  }
}
723
apps/webapp/app/jobs/spaces/space-summary.logic.ts
Normal file
@@ -0,0 +1,723 @@
import { logger } from "~/services/logger.service";
import { SpaceService } from "~/services/space.server";
import { makeModelCall } from "~/lib/model.server";
import { runQuery } from "~/lib/neo4j.server";
import { updateSpaceStatus, SPACE_STATUS } from "~/trigger/utils/space-status";
import type { CoreMessage } from "ai";
import { z } from "zod";
import { getSpace, updateSpace } from "~/trigger/utils/space-utils";
import { getSpaceEpisodeCount } from "~/services/graphModels/space";

export interface SpaceSummaryPayload {
  userId: string;
  workspaceId: string;
  spaceId: string; // Single space only
  triggerSource?: "assignment" | "manual" | "scheduled";
}

interface SpaceEpisodeData {
  uuid: string;
  content: string;
  originalContent: string;
  source: string;
  createdAt: Date;
  validAt: Date;
  metadata: any;
  sessionId: string | null;
}

interface SpaceSummaryData {
  spaceId: string;
  spaceName: string;
  spaceDescription?: string;
  contextCount: number;
  summary: string;
  keyEntities: string[];
  themes: string[];
  confidence: number;
  lastUpdated: Date;
  isIncremental: boolean;
}

// Zod schema for LLM response validation
const SummaryResultSchema = z.object({
  summary: z.string(),
  keyEntities: z.array(z.string()),
  themes: z.array(z.string()),
  confidence: z.number().min(0).max(1),
});

const CONFIG = {
  maxEpisodesForSummary: 20, // Limit episodes per LLM call for performance
  minEpisodesForSummary: 1, // Minimum episodes to generate summary
  summaryEpisodeThreshold: 5, // Minimum new episodes required to trigger summary (configurable)
};
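
// Worked example of the thresholds above (hypothetical numbers): a space whose
// last summary covered 12 episodes is re-summarized once it reaches 17, since
// 17 - 12 >= summaryEpisodeThreshold (5); a run over 45 episodes is split into
// batches of maxEpisodesForSummary (20, 20, 5).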

export interface SpaceSummaryResult {
  success: boolean;
  spaceId: string;
  triggerSource: string;
  summary?: {
    statementCount: number;
    confidence: number;
    themesCount: number;
  } | null;
  reason?: string;
}

/**
 * Core business logic for space summary generation
 * This is shared between Trigger.dev and BullMQ implementations
 */
export async function processSpaceSummary(
  payload: SpaceSummaryPayload,
): Promise<SpaceSummaryResult> {
  const { userId, workspaceId, spaceId, triggerSource = "manual" } = payload;

  logger.info(`Starting space summary generation`, {
    userId,
    workspaceId,
    spaceId,
    triggerSource,
  });

  try {
    // Update status to processing
    await updateSpaceStatus(spaceId, SPACE_STATUS.PROCESSING, {
      userId,
      operation: "space-summary",
      metadata: { triggerSource, phase: "start_summary" },
    });

    // Generate summary for the single space
    const summaryResult = await generateSpaceSummary(
      spaceId,
      userId,
      triggerSource,
    );

    if (summaryResult) {
      // Store the summary
      await storeSummary(summaryResult);

      // Update status to ready after successful completion
      await updateSpaceStatus(spaceId, SPACE_STATUS.READY, {
        userId,
        operation: "space-summary",
        metadata: {
          triggerSource,
          phase: "completed_summary",
          contextCount: summaryResult.contextCount,
          confidence: summaryResult.confidence,
        },
      });

      logger.info(`Generated summary for space ${spaceId}`, {
        statementCount: summaryResult.contextCount,
        confidence: summaryResult.confidence,
        themes: summaryResult.themes.length,
        triggerSource,
      });

      return {
        success: true,
        spaceId,
        triggerSource,
        summary: {
          statementCount: summaryResult.contextCount,
          confidence: summaryResult.confidence,
          themesCount: summaryResult.themes.length,
        },
      };
    } else {
      // No summary generated - this could be due to insufficient episodes or no new episodes
      // This is not an error state, so update status to ready
      await updateSpaceStatus(spaceId, SPACE_STATUS.READY, {
        userId,
        operation: "space-summary",
        metadata: {
          triggerSource,
          phase: "no_summary_needed",
          reason: "Insufficient episodes or no new episodes to summarize",
        },
      });

      logger.info(
        `No summary generated for space ${spaceId} - insufficient or no new episodes`,
      );
      return {
        success: true,
        spaceId,
        triggerSource,
        summary: null,
        reason: "No episodes to summarize",
      };
    }
  } catch (error) {
    // Update status to error on exception
    try {
      await updateSpaceStatus(spaceId, SPACE_STATUS.ERROR, {
        userId,
        operation: "space-summary",
        metadata: {
          triggerSource,
          phase: "exception",
          error: error instanceof Error ? error.message : "Unknown error",
        },
      });
    } catch (statusError) {
      logger.warn(`Failed to update status to error for space ${spaceId}`, {
        statusError,
      });
    }

    logger.error(
      `Error in space summary generation for space ${spaceId}:`,
      error as Record<string, unknown>,
    );
    throw error;
  }
}

async function generateSpaceSummary(
  spaceId: string,
  userId: string,
  triggerSource?: "assignment" | "manual" | "scheduled",
): Promise<SpaceSummaryData | null> {
  try {
    // 1. Get space details
    const spaceService = new SpaceService();
    const space = await spaceService.getSpace(spaceId, userId);

    if (!space) {
      logger.warn(`Space ${spaceId} not found for user ${userId}`);
      return null;
    }

    // 2. Check episode count threshold (skip for manual triggers)
    if (triggerSource !== "manual") {
      const currentEpisodeCount = await getSpaceEpisodeCount(spaceId, userId);
      const lastSummaryEpisodeCount = space.contextCount || 0;
      const episodeDifference = currentEpisodeCount - lastSummaryEpisodeCount;

      // Skip incremental updates with fewer new episodes than the threshold;
      // a first summary (lastSummaryEpisodeCount === 0) always proceeds
      if (
        episodeDifference < CONFIG.summaryEpisodeThreshold &&
        lastSummaryEpisodeCount !== 0
      ) {
        logger.info(
          `Skipping summary generation for space ${spaceId}: only ${episodeDifference} new episodes (threshold: ${CONFIG.summaryEpisodeThreshold})`,
          {
            currentEpisodeCount,
            lastSummaryEpisodeCount,
            episodeDifference,
            threshold: CONFIG.summaryEpisodeThreshold,
          },
        );
        return null;
      }

      logger.info(
        `Proceeding with summary generation for space ${spaceId}: ${episodeDifference} new episodes (threshold: ${CONFIG.summaryEpisodeThreshold})`,
        {
          currentEpisodeCount,
          lastSummaryEpisodeCount,
          episodeDifference,
        },
      );
    }

    // 3. Check for existing summary
    const existingSummary = await getExistingSummary(spaceId);
    const isIncremental = existingSummary !== null;

    // 4. Get episodes (all or new ones based on existing summary)
    const episodes = await getSpaceEpisodes(
      spaceId,
      userId,
      isIncremental ? existingSummary?.lastUpdated : undefined,
    );

    // Handle case where no new episodes exist for incremental update
    if (isIncremental && episodes.length === 0) {
      logger.info(
        `No new episodes found for space ${spaceId}, skipping summary update`,
      );
      return null;
    }

    // Check minimum episode requirement for new summaries only
    if (!isIncremental && episodes.length < CONFIG.minEpisodesForSummary) {
      logger.info(
        `Space ${spaceId} has insufficient episodes (${episodes.length}) for new summary`,
      );
      return null;
    }

    // 5. Process episodes using unified approach
    let summaryResult;

    if (episodes.length > CONFIG.maxEpisodesForSummary) {
      logger.info(
        `Large space detected (${episodes.length} episodes). Processing in batches.`,
      );

      // Process in batches, each building on previous result
      const batches: SpaceEpisodeData[][] = [];
      for (let i = 0; i < episodes.length; i += CONFIG.maxEpisodesForSummary) {
        batches.push(episodes.slice(i, i + CONFIG.maxEpisodesForSummary));
      }

      let currentSummary = existingSummary?.summary || null;
      let currentThemes = existingSummary?.themes || [];
      let cumulativeConfidence = 0;

      for (const [batchIndex, batch] of batches.entries()) {
        logger.info(
          `Processing batch ${batchIndex + 1}/${batches.length} with ${batch.length} episodes`,
        );

        const batchResult = await generateUnifiedSummary(
          space.name,
          space.description as string,
          batch,
          currentSummary,
          currentThemes,
        );

        if (batchResult) {
          currentSummary = batchResult.summary;
          currentThemes = batchResult.themes;
          cumulativeConfidence += batchResult.confidence;
        } else {
          logger.warn(`Failed to process batch ${batchIndex + 1}`);
        }

        // Small delay between batches
        if (batchIndex < batches.length - 1) {
          await new Promise((resolve) => setTimeout(resolve, 500));
        }
      }

      summaryResult = currentSummary
        ? {
            summary: currentSummary,
            themes: currentThemes,
            confidence: Math.min(cumulativeConfidence / batches.length, 1.0),
          }
        : null;
    } else {
      logger.info(
        `Processing ${episodes.length} episodes with unified approach`,
      );

      // Use unified approach for smaller spaces
      summaryResult = await generateUnifiedSummary(
        space.name,
        space.description as string,
        episodes,
        existingSummary?.summary || null,
        existingSummary?.themes || [],
      );
    }

    if (!summaryResult) {
      logger.warn(`Failed to generate LLM summary for space ${spaceId}`);
      return null;
    }

    // Get the actual current counts from Neo4j
    const currentEpisodeCount = await getSpaceEpisodeCount(spaceId, userId);

    return {
      spaceId: space.uuid,
      spaceName: space.name,
      spaceDescription: space.description as string,
      contextCount: currentEpisodeCount,
      summary: summaryResult.summary,
      keyEntities: summaryResult.keyEntities || [],
      themes: summaryResult.themes,
      confidence: summaryResult.confidence,
      lastUpdated: new Date(),
      isIncremental,
    };
  } catch (error) {
    logger.error(
      `Error generating summary for space ${spaceId}:`,
      error as Record<string, unknown>,
    );
    return null;
  }
}

async function generateUnifiedSummary(
  spaceName: string,
  spaceDescription: string | undefined,
  episodes: SpaceEpisodeData[],
  previousSummary: string | null = null,
  previousThemes: string[] = [],
): Promise<{
  summary: string;
  themes: string[];
  confidence: number;
  keyEntities?: string[];
} | null> {
  try {
    const prompt = createUnifiedSummaryPrompt(
      spaceName,
      spaceDescription,
      episodes,
      previousSummary,
      previousThemes,
    );

    // Space summary generation requires HIGH complexity (creative synthesis, narrative generation)
    let responseText = "";
    await makeModelCall(
      false,
      prompt,
      (text: string) => {
        responseText = text;
      },
      undefined,
      "high",
    );

    return parseSummaryResponse(responseText);
  } catch (error) {
    logger.error(
      "Error generating unified summary:",
      error as Record<string, unknown>,
    );
    return null;
  }
}

function createUnifiedSummaryPrompt(
  spaceName: string,
  spaceDescription: string | undefined,
  episodes: SpaceEpisodeData[],
  previousSummary: string | null,
  previousThemes: string[],
): CoreMessage[] {
  // If there are no episodes and no previous summary, we cannot generate a meaningful summary
  if (episodes.length === 0 && previousSummary === null) {
    throw new Error(
      "Cannot generate summary without episodes or existing summary",
    );
  }

  const episodesText = episodes
    .map(
      (episode) =>
        `- ${episode.content} (Source: ${episode.source}, Session: ${episode.sessionId || "N/A"})`,
    )
    .join("\n");

  // Extract key entities and themes from episode content
  const contentWords = episodes
    .map((ep) => ep.content.toLowerCase())
    .join(" ")
    .split(/\s+/)
    .filter((word) => word.length > 3);

  const wordFrequency = new Map<string, number>();
  contentWords.forEach((word) => {
    wordFrequency.set(word, (wordFrequency.get(word) || 0) + 1);
  });

  const topEntities = Array.from(wordFrequency.entries())
    .sort(([, a], [, b]) => b - a)
    .slice(0, 10)
    .map(([word]) => word);

  const isUpdate = previousSummary !== null;

  return [
    {
      role: "system",
      content: `You are an expert at analyzing and summarizing episodes within semantic spaces based on the space's intent and purpose. Your task is to ${isUpdate ? "update an existing summary by integrating new episodes" : "create a comprehensive summary of episodes"}.

CRITICAL RULES:
1. Base your summary ONLY on insights derived from the actual content/episodes provided
2. Use the space's INTENT/PURPOSE (from description) to guide what to summarize and how to organize it
3. Write in a factual, neutral tone - avoid promotional language ("pivotal", "invaluable", "cutting-edge")
4. Be specific and concrete - reference actual content, patterns, and insights found in the episodes
5. If episodes are insufficient for meaningful insights, state that more data is needed

INTENT-DRIVEN SUMMARIZATION:
Your summary should SERVE the space's intended purpose. Examples:
- "Learning React" → Summarize React concepts, patterns, techniques learned
- "Project X Updates" → Summarize progress, decisions, blockers, next steps
- "Health Tracking" → Summarize metrics, trends, observations, insights
- "Guidelines for React" → Extract actionable patterns, best practices, rules
- "Evolution of design thinking" → Track how thinking changed over time, decision points
The intent defines WHY this space exists - organize content to serve that purpose.

INSTRUCTIONS:
${
  isUpdate
    ? `1. Review the existing summary and themes carefully
2. Analyze the new episodes for patterns and insights that align with the space's intent
3. Identify connecting points between existing knowledge and new episodes
4. Update the summary to seamlessly integrate new information while preserving valuable existing insights
5. Evolve themes by adding new ones or refining existing ones based on the space's purpose
6. Organize the summary to serve the space's intended use case`
    : `1. Analyze the semantic content and relationships within the episodes
2. Identify topics/sections that align with the space's INTENT and PURPOSE
3. Create a coherent summary that serves the space's intended use case
4. Organize the summary based on the space's purpose (not generic frequency-based themes)`
}
${isUpdate ? "7" : "5"}. Assess your confidence in the ${isUpdate ? "updated " : ""}summary quality (0.0-1.0)

INTENT-ALIGNED ORGANIZATION:
- Organize sections based on what serves the space's purpose
- Topics don't need minimum episode counts - relevance to intent matters most
- Each section should provide value aligned with the space's intended use
- For "guidelines" spaces: focus on actionable patterns
- For "tracking" spaces: focus on temporal patterns and changes
- For "learning" spaces: focus on concepts and insights gained
- Let the space's intent drive the structure, not rigid rules

${
  isUpdate
    ? `CONNECTION FOCUS:
- Entity relationships that span across batches/time
- Theme evolution and expansion
- Temporal patterns and progressions
- Contradictions or confirmations of existing insights
- New insights that complement existing knowledge`
    : ""
}

RESPONSE FORMAT:
Provide your response inside <output></output> tags with valid JSON. Include both HTML summary and markdown format.

<output>
{
"summary": "${isUpdate ? "Updated HTML summary that integrates new insights with existing knowledge. Write factually about what the statements reveal - mention specific entities, relationships, and patterns found in the data. Avoid marketing language. Use HTML tags for structure." : "Factual HTML summary based on patterns found in the statements. Report what the data actually shows - specific entities, relationships, frequencies, and concrete insights. Avoid promotional language. Use HTML tags like <p>, <strong>, <ul>, <li> for structure. Keep it concise and evidence-based."}",
"keyEntities": ["entity1", "entity2", "entity3"],
"themes": ["${isUpdate ? 'updated_theme1", "new_theme2", "evolved_theme3' : 'theme1", "theme2", "theme3'}"],
"confidence": 0.85
}
</output>

JSON FORMATTING RULES:
- HTML content in summary field is allowed and encouraged
- Escape quotes within strings as \"
- Escape HTML angle brackets if needed: < and >
- Use proper HTML tags for structure: <p>, <strong>, <em>, <ul>, <li>, <h3>, etc.
- HTML content should be well-formed and semantic

GUIDELINES:
${
  isUpdate
    ? `- Preserve valuable insights from existing summary
- Integrate new information by highlighting connections
- Themes should evolve naturally, don't replace wholesale
- The updated summary should read as a coherent whole
- Make the summary user-friendly and explain what value this space provides`
    : `- Report only what the episodes actually reveal - be specific and concrete
- Cite actual content and patterns found in the episodes
- Avoid generic descriptions that could apply to any space
- Use neutral, factual language - no "comprehensive", "robust", "cutting-edge" etc.
- Themes must be backed by at least 3 supporting episodes with clear evidence
- Better to have fewer, well-supported themes than many weak ones
- Confidence should reflect actual data quality and coverage, not aspirational goals`
}`,
    },
    {
      role: "user",
      content: `SPACE INFORMATION:
Name: "${spaceName}"
Intent/Purpose: ${spaceDescription || "No specific intent provided - organize naturally based on content"}

${
  isUpdate
    ? `EXISTING SUMMARY:
${previousSummary}

EXISTING THEMES:
${previousThemes.join(", ")}

NEW EPISODES TO INTEGRATE (${episodes.length} episodes):`
    : `EPISODES IN THIS SPACE (${episodes.length} episodes):`
}
${episodesText}

${
  episodes.length > 0
    ? `TOP WORDS BY FREQUENCY:
${topEntities.join(", ")}`
    : ""
}

${
  isUpdate
    ? "Please identify connections between the existing summary and new episodes, then update the summary to integrate the new insights coherently. Organize the summary to SERVE the space's intent/purpose. Remember: only summarize insights from the actual episode content."
    : "Please analyze the episodes and provide a comprehensive summary that SERVES the space's intent/purpose. Organize sections based on what would be most valuable for this space's intended use case. If the intent is unclear, organize naturally based on content patterns. Only summarize insights from actual episode content."
}`,
    },
  ];
}

async function getExistingSummary(spaceId: string): Promise<{
  summary: string;
  themes: string[];
  lastUpdated: Date;
  contextCount: number;
} | null> {
  try {
    const existingSummary = await getSpace(spaceId);

    if (existingSummary?.summary) {
      return {
        summary: existingSummary.summary,
        themes: existingSummary.themes,
        lastUpdated: existingSummary.summaryGeneratedAt || new Date(),
        contextCount: existingSummary.contextCount || 0,
      };
    }

    return null;
  } catch (error) {
    logger.warn(`Failed to get existing summary for space ${spaceId}:`, {
      error,
    });
    return null;
  }
}

async function getSpaceEpisodes(
  spaceId: string,
  userId: string,
  sinceDate?: Date,
): Promise<SpaceEpisodeData[]> {
  // Query episodes directly using Space-[:HAS_EPISODE]->Episode relationships
  const params: any = { spaceId, userId };

  let dateCondition = "";
  if (sinceDate) {
    dateCondition = "AND e.createdAt > $sinceDate";
    params.sinceDate = sinceDate.toISOString();
  }

  const query = `
    MATCH (space:Space {uuid: $spaceId, userId: $userId})-[:HAS_EPISODE]->(e:Episode {userId: $userId})
    WHERE e IS NOT NULL ${dateCondition}
    RETURN DISTINCT e
    ORDER BY e.createdAt DESC
  `;

  const result = await runQuery(query, params);

  return result.map((record) => {
    const episode = record.get("e").properties;
    return {
      uuid: episode.uuid,
      content: episode.content,
      originalContent: episode.originalContent,
      source: episode.source,
      createdAt: new Date(episode.createdAt),
      validAt: new Date(episode.validAt),
      metadata: JSON.parse(episode.metadata || "{}"),
      sessionId: episode.sessionId,
    };
  });
}

function parseSummaryResponse(response: string): {
  summary: string;
  themes: string[];
  confidence: number;
  keyEntities?: string[];
} | null {
  try {
    // Extract content from <output> tags
    const outputMatch = response.match(/<output>([\s\S]*?)<\/output>/);
    if (!outputMatch) {
      logger.warn("No <output> tags found in LLM summary response");
      logger.debug("Full LLM response:", { response });
      return null;
    }

    let jsonContent = outputMatch[1].trim();

    let parsed;
    try {
      parsed = JSON.parse(jsonContent);
    } catch (jsonError) {
      logger.warn("JSON parsing failed, attempting cleanup and retry", {
        originalError: jsonError,
        jsonContent: jsonContent.substring(0, 500) + "...", // Log first 500 chars
      });

      // Cleanup for common LLM JSON mistakes: stray markdown fences and
      // trailing commas. (Blindly escaping every quote would also escape
      // JSON's structural quotes and make the content unparseable.)
      jsonContent = jsonContent
        .replace(/^```(?:json)?\s*/i, "") // Strip an opening code fence
        .replace(/\s*```$/, "") // Strip a closing code fence
        .replace(/,\s*([}\]])/g, "$1"); // Remove trailing commas

      parsed = JSON.parse(jsonContent);
    }

    // Validate the response structure
    const validationResult = SummaryResultSchema.safeParse(parsed);
    if (!validationResult.success) {
      logger.warn("Invalid LLM summary response format:", {
        error: validationResult.error,
        parsedData: parsed,
      });
      return null;
    }

    return validationResult.data;
  } catch (error) {
    logger.error(
      "Error parsing LLM summary response:",
      error as Record<string, unknown>,
    );
    logger.debug("Failed response content:", { response });
    return null;
  }
}

async function storeSummary(summaryData: SpaceSummaryData): Promise<void> {
  try {
    // Store in PostgreSQL for API access and persistence
    await updateSpace(summaryData);

    // Also store in Neo4j for graph-based queries
    const query = `
      MATCH (space:Space {uuid: $spaceId})
      SET space.summary = $summary,
          space.keyEntities = $keyEntities,
          space.themes = $themes,
          space.summaryConfidence = $confidence,
          space.summaryContextCount = $contextCount,
          space.summaryLastUpdated = datetime($lastUpdated)
      RETURN space
    `;

    await runQuery(query, {
      spaceId: summaryData.spaceId,
      summary: summaryData.summary,
      keyEntities: summaryData.keyEntities,
      themes: summaryData.themes,
      confidence: summaryData.confidence,
      contextCount: summaryData.contextCount,
      lastUpdated: summaryData.lastUpdated.toISOString(),
    });

    logger.info(`Stored summary for space ${summaryData.spaceId}`, {
      themes: summaryData.themes.length,
      keyEntities: summaryData.keyEntities.length,
      confidence: summaryData.confidence,
    });
  } catch (error) {
    logger.error(
      `Error storing summary for space ${summaryData.spaceId}:`,
      error as Record<string, unknown>,
    );
    throw error;
  }
}

@@ -15,7 +15,9 @@ import type { z } from "zod";
import type { IngestBodyRequest } from "~/jobs/ingest/ingest-episode.logic";
import type { CreateConversationTitlePayload } from "~/jobs/conversation/create-title.logic";
import type { SessionCompactionPayload } from "~/jobs/session/session-compaction.logic";
import type { SpaceAssignmentPayload } from "~/jobs/spaces/space-assignment.logic";
import type { SpaceSummaryPayload } from "~/jobs/spaces/space-summary.logic";
import type { SpaceDiscoveryPayload } from "~/jobs/spaces/space-discovery.logic";

type QueueProvider = "trigger" | "bullmq";

@@ -144,22 +146,80 @@ export async function enqueueSessionCompaction(

/**
 * Enqueue space assignment job
 * (Helper for common job logic to call)
 */
export async function enqueueSpaceAssignment(
  payload: SpaceAssignmentPayload,
): Promise<{ id?: string }> {
  const provider = env.QUEUE_PROVIDER as QueueProvider;

  if (provider === "trigger") {
    const { triggerSpaceAssignment } = await import(
      "~/trigger/spaces/space-assignment"
    );
    const handler = await triggerSpaceAssignment(payload);
    return { id: handler.id };
  } else {
    // BullMQ
    const { spaceAssignmentQueue } = await import("~/bullmq/queues");
    const job = await spaceAssignmentQueue.add("space-assignment", payload, {
      jobId: `space-assignment-${payload.userId}-${payload.mode}-${Date.now()}`,
      attempts: 3,
      backoff: { type: "exponential", delay: 2000 },
    });
    return { id: job.id };
  }
}

/**
 * Enqueue space summary job
 */
export async function enqueueSpaceSummary(
  payload: SpaceSummaryPayload,
): Promise<{ id?: string }> {
  const provider = env.QUEUE_PROVIDER as QueueProvider;

  if (provider === "trigger") {
    const { triggerSpaceSummary } = await import(
      "~/trigger/spaces/space-summary"
    );
    const handler = await triggerSpaceSummary(payload);
    return { id: handler.id };
  } else {
    // BullMQ
    const { spaceSummaryQueue } = await import("~/bullmq/queues");
    const job = await spaceSummaryQueue.add("space-summary", payload, {
      jobId: `space-summary-${payload.spaceId}-${Date.now()}`,
      attempts: 3,
      backoff: { type: "exponential", delay: 2000 },
    });
    return { id: job.id };
  }
}

/**
 * Enqueue space discovery job
 * Discovers and auto-creates thematic spaces based on entity clustering
 */
export async function enqueueSpaceDiscovery(
  payload: SpaceDiscoveryPayload,
): Promise<{ id?: string }> {
  const provider = env.QUEUE_PROVIDER as QueueProvider;

  if (provider === "trigger") {
    const { triggerSpaceDiscovery } = await import(
      "~/trigger/spaces/space-discovery"
    );
    const handler = await triggerSpaceDiscovery(payload);
    return { id: handler.id };
  } else {
    // BullMQ
    const { spaceDiscoveryQueue } = await import("~/bullmq/queues");
    const job = await spaceDiscoveryQueue.add("space-discovery", payload, {
      jobId: `space-discovery-${payload.userId}-${Date.now()}`,
      attempts: 2, // Fewer retries for long-running analysis
      backoff: { type: "exponential", delay: 5000 },
    });
    return { id: job.id };
  }
}

@@ -19,7 +19,10 @@ import {
import { getModel } from "~/lib/model.server";
import { UserTypeEnum } from "@core/types";
import { nanoid } from "nanoid";
import {
  deletePersonalAccessToken,
  getOrCreatePersonalAccessToken,
} from "~/services/personalAccessToken.server";
import {
  hasAnswer,
  hasQuestion,
@@ -126,6 +129,7 @@ const { loader, action } = createHybridActionApiRoute(
    });

    result.consumeStream(); // no await
    await deletePersonalAccessToken(pat?.id);

    return result.toUIMessageStreamResponse({
      originalMessages: validatedMessages,
@@ -1,6 +1,7 @@
import { json } from "@remix-run/node";
import { z } from "zod";
import { prisma } from "~/db.server";
import { discoverThematicSpaces } from "~/services/clustering.server";
import { createHybridLoaderApiRoute } from "~/services/routeBuilders/apiBuilder.server";

// Schema for logs search parameters
@@ -29,6 +30,15 @@ export const loader = createHybridLoaderApiRoute(
    const sessionId = searchParams.sessionId;
    const skip = (page - 1) * limit;

    // Run discovery for the authenticated user
    const result = await discoverThematicSpaces({
      userId: authentication.userId,
    });

    // Access the results
    console.log(result.proposals); // Space proposals from LLM
    console.log(result.stats); // Overall statistics

    // Get user and workspace in one query
    const user = await prisma.user.findUnique({
      where: { id: authentication.userId },
@@ -0,0 +1,555 @@
import neo4j from "neo4j-driver";
import { driver } from "~/lib/neo4j.server";
import { logger } from "~/services/logger.service";
import { makeModelCall } from "~/lib/model.server";

// Helper function to safely convert Neo4j integers to JavaScript numbers
function toNumber(value: any): number {
  if (typeof value === "number") {
    return value;
  }
  if (value && typeof value.toNumber === "function") {
    return value.toNumber();
  }
  return Number(value);
}

// ============================================================================
// Type Definitions
// ============================================================================

export interface SpaceDiscoveryParams {
  userId: string;
  spaceIds?: string[];
  minEpisodeCount?: number;
  maxEntities?: number;
  existingSpaces?: Array<{ name: string; description: string | null }>; // Existing spaces to avoid duplicates
}

export interface EntityCluster {
  entity: string;
  entityUuid: string;
  episodeCount: number;
  topSubjects: Array<{ name: string; count: number }>;
  topObjects: Array<{ name: string; count: number }>;
  topPredicates: Array<{ name: string; count: number }>;
  sampleEpisodes: Array<{
    uuid: string;
    content: string;
    subject: string;
    predicate: string;
    object: string;
  }>;
}

export interface SpaceProposal {
  name: string;
  intent: string;
  confidence: number; // 0-100
  sourceEntities: string[]; // Which entities suggested this space
  keyEntities: string[];
  estimatedEpisodeCount: number;
  reasoning: string;
}

export interface SpaceDiscoveryResult {
  clusters: EntityCluster[];
  proposals: SpaceProposal[];
  stats: {
    totalEntities: number;
    totalEpisodes: number;
    clustersAnalyzed: number;
  };
}

// ============================================================================
// Step 1: Entity-Based Clustering
// ============================================================================

/**
 * Analyze entity clusters by grouping episodes by top entities
 * For each entity, find co-occurring subjects, objects, and predicates
 */
async function analyzeEntityClusters(
  userId: string,
  spaceIds: string[] | undefined,
  minEpisodeCount: number,
  maxEntities: number,
): Promise<EntityCluster[]> {
  const session = driver.session();

  try {
    logger.info("Analyzing entity clusters...");

    const spaceFilter = spaceIds?.length
      ? "AND any(sid IN ep.spaceIds WHERE sid IN $spaceIds)"
      : "";

    // Query: Get top entities (subjects + objects) with their episode context
    const query = `
      // Get entities that appear as either subject or object
      MATCH (entity:Entity {userId: $userId})
      MATCH (entity)<-[r:HAS_SUBJECT|HAS_OBJECT]-(stmt:Statement {userId: $userId})
            <-[:HAS_PROVENANCE]-(ep:Episode {userId: $userId})
      WHERE 1=1 ${spaceFilter}

      WITH entity, count(DISTINCT ep) as episodeCount
      WHERE episodeCount >= $minEpisodeCount

      // For top entities, get their context (subjects, objects, predicates, sample episodes)
      MATCH (entity)<-[r:HAS_SUBJECT|HAS_OBJECT]-(stmt:Statement {userId: $userId})
            <-[:HAS_PROVENANCE]-(ep:Episode {userId: $userId})
      WHERE 1=1 ${spaceFilter}

      MATCH (stmt)-[:HAS_SUBJECT]->(subj:Entity {userId: $userId})
      MATCH (stmt)-[:HAS_OBJECT]->(obj:Entity {userId: $userId})
      MATCH (stmt)-[:HAS_PREDICATE]->(pred:Entity {userId: $userId})

      WITH entity,
           episodeCount,
           collect(DISTINCT subj.name) as subjects,
           collect(DISTINCT obj.name) as objects,
           collect(DISTINCT pred.name) as predicates,
           collect(DISTINCT {
             uuid: ep.uuid,
             content: ep.content,
             subject: subj.name,
             predicate: pred.name,
             object: obj.name
           })[0..8] as sampleEpisodes

      RETURN
        entity.name as entityName,
        entity.uuid as entityUuid,
        episodeCount,
        subjects,
        objects,
        predicates,
        sampleEpisodes
      ORDER BY episodeCount DESC
      LIMIT $maxEntities
    `;

    const result = await session.run(query, {
      userId,
      spaceIds: spaceIds || [],
      minEpisodeCount: neo4j.int(minEpisodeCount),
      maxEntities: neo4j.int(maxEntities),
    });

    const clusters: EntityCluster[] = result.records.map((record) => {
      const subjects = record.get("subjects") as string[];
      const objects = record.get("objects") as string[];
      const predicates = record.get("predicates") as string[];
      const sampleEpisodes = record.get("sampleEpisodes") as Array<any>;

      return {
        entity: record.get("entityName"),
        entityUuid: record.get("entityUuid"),
        episodeCount: toNumber(record.get("episodeCount")),
        topSubjects: countFrequency(subjects).slice(0, 10),
        topObjects: countFrequency(objects).slice(0, 10),
        topPredicates: countFrequency(predicates).slice(0, 10),
        sampleEpisodes: sampleEpisodes.map((ep) => ({
          uuid: ep.uuid,
          content: ep.content || "",
          subject: ep.subject || "",
          predicate: ep.predicate || "",
          object: ep.object || "",
        })),
      };
    });

    logger.info(`Found ${clusters.length} entity clusters`);
    return clusters;
  } finally {
    await session.close();
  }
}
|
||||
|
||||
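
// NOTE (editor): `driver`, `logger`, `neo4j`, and `toNumber` are imported
// earlier in this file, outside this diff excerpt. As an illustration only,
// `toNumber` is assumed to convert a Neo4j Integer (or a plain number) into
// a JavaScript number, roughly like this sketch:
//
//   import neo4j, { type Integer } from "neo4j-driver";
//
//   function toNumber(value: Integer | number | null | undefined): number {
//     if (value == null) return 0;
//     return neo4j.isInt(value) ? (value as Integer).toNumber() : (value as number);
//   }
//
// This shape is inferred from usage, not the definition shipped in this commit.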
/**
 * Count the frequency of items in an array and return them sorted by count.
 */
function countFrequency(
  items: string[],
): Array<{ name: string; count: number }> {
  const counts = new Map<string, number>();
  items.forEach((item) => {
    counts.set(item, (counts.get(item) || 0) + 1);
  });

  return Array.from(counts.entries())
    .map(([name, count]) => ({ name, count }))
    .sort((a, b) => b.count - a.count);
}

// ============================================================================
// Step 2: Group Similar Entity Clusters
// ============================================================================

/**
 * Group entity clusters by similarity (co-occurring entities and predicates).
 * This helps merge related entities into thematic groups.
 */
function groupSimilarClusters(clusters: EntityCluster[]): EntityCluster[][] {
  // For now, return each cluster as its own group.
  // Future enhancement: use entity/predicate overlap to merge similar
  // clusters (see the sketch below).
  return clusters.map((cluster) => [cluster]);
}
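
// Editor's sketch (not part of this commit): one way the "future enhancement"
// above could work is to merge clusters whose top subjects, objects, and
// predicates have high Jaccard overlap. The 0.4 threshold is an arbitrary
// assumption for illustration.
//
//   function jaccard(a: Set<string>, b: Set<string>): number {
//     const intersection = [...a].filter((x) => b.has(x)).length;
//     const union = new Set([...a, ...b]).size;
//     return union === 0 ? 0 : intersection / union;
//   }
//
//   function groupBySimilarity(clusters: EntityCluster[]): EntityCluster[][] {
//     // Signature = the names appearing in a cluster's top subjects/objects/predicates
//     const signature = (c: EntityCluster) =>
//       new Set(
//         [...c.topSubjects, ...c.topObjects, ...c.topPredicates].map((x) => x.name),
//       );
//     const groups: EntityCluster[][] = [];
//     for (const cluster of clusters) {
//       const match = groups.find((g) =>
//         g.some((m) => jaccard(signature(m), signature(cluster)) >= 0.4),
//       );
//       if (match) match.push(cluster);
//       else groups.push([cluster]);
//     }
//     return groups;
//   }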
// ============================================================================
// Step 3: LLM Synthesis for Space Proposals
// ============================================================================

/**
 * Generate space proposals from entity clusters using the LLM.
 */
async function generateSpaceProposalsFromClusters(
  clusterGroups: EntityCluster[][],
  userId: string,
  existingSpaces?: Array<{ name: string; description: string | null }>,
): Promise<SpaceProposal[]> {
  logger.info("Generating space proposals from entity clusters...");

  // Flatten for prompt (treat each cluster separately for now)
  const clusters = clusterGroups.flat();

  const prompt = buildSpaceDiscoveryPrompt(clusters, existingSpaces);

  let proposals: SpaceProposal[] = [];

  await makeModelCall(
    false, // not streaming
    [
      {
        role: "user",
        content: prompt,
      },
    ],
    (text) => {
      try {
        const parsed = JSON.parse(text);
        proposals = (parsed.spaces || []).map((space: any) => ({
          ...space,
          sourceEntities: space.sourceEntities || [],
          keyEntities: space.keyEntities || [],
          estimatedEpisodeCount: space.estimatedEpisodeCount || 0,
        }));
      } catch (error) {
        logger.error(`Failed to parse LLM response: ${error}`);
        logger.error(`Response text: ${text}`);
      }
    },
    {
      temperature: 0.7,
      response_format: { type: "json_object" },
    },
    "high", // Use high complexity for better analysis
  );

  logger.info(`Generated ${proposals.length} space proposals`);
  return proposals;
}
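
// NOTE (editor): `makeModelCall` lives in ~/lib/model.server and is not shown
// in this diff. From its call sites in this commit it appears to take a
// streaming flag, the messages, a callback receiving the response text,
// optional provider options, and a complexity hint. A type-only sketch of
// that assumed shape:
//
//   type Complexity = "low" | "medium" | "high";
//   declare function makeModelCall(
//     stream: boolean,
//     messages: Array<{ role: string; content: string }>,
//     onText: (text: string) => void,
//     options?: Record<string, unknown>,
//     complexity?: Complexity,
//   ): Promise<void>;
//
// This signature is inferred from usage, not taken from the actual module.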
/**
 * Build the LLM prompt for space discovery from entity clusters.
 */
function buildSpaceDiscoveryPrompt(
  clusters: EntityCluster[],
  existingSpaces?: Array<{ name: string; description: string | null }>,
): string {
  const clusterDescriptions = clusters
    .map((cluster, idx) => {
      // Format top subjects, objects, and predicates
      const topSubjects = cluster.topSubjects
        .slice(0, 6)
        .map((s) => `"${s.name}" (${s.count})`)
        .join(", ");

      const topObjects = cluster.topObjects
        .slice(0, 6)
        .map((o) => `"${o.name}" (${o.count})`)
        .join(", ");

      const topPredicates = cluster.topPredicates
        .slice(0, 6)
        .map((p) => `"${p.name}" (${p.count})`)
        .join(", ");

      // Format sample episodes (truncate to 200 chars each)
      const episodeTexts = cluster.sampleEpisodes
        .slice(0, 4)
        .map(
          (ep, epIdx) =>
            `  ${epIdx + 1}. [${ep.subject} → ${ep.predicate} → ${ep.object}]\n     "${ep.content.substring(0, 200)}${ep.content.length > 200 ? "..." : ""}"`,
        )
        .join("\n");

      return `
### Entity ${idx + 1}: "${cluster.entity}"
- **Episodes**: ${cluster.episodeCount}
- **Top Subjects**: ${topSubjects}
- **Top Objects**: ${topObjects}
- **Top Predicates**: ${topPredicates}

**Sample Episodes**:
${episodeTexts}
`;
    })
    .join("\n");

  // Format existing spaces if provided
  const existingSpacesSection =
    existingSpaces && existingSpaces.length > 0
      ? `
## Existing Spaces (DO NOT DUPLICATE)

The user already has the following spaces. DO NOT propose spaces with similar names or intents:

${existingSpaces
  .map(
    (space, idx) =>
      `${idx + 1}. **"${space.name}"**${space.description ? `: ${space.description}` : ""}`,
  )
  .join("\n")}

IMPORTANT: Avoid proposing spaces that overlap with these existing ones. Focus on discovering NEW themes.
`
      : "";

  return `You are analyzing entity clusters from a knowledge graph to discover thematic spaces for organizing episodes.

Each **Entity Cluster** represents a prominent topic/concept with its associated episodes and related entities.
A **Space** is a thematic container that groups related episodes based on projects, topics, or domains.
${existingSpacesSection}
## Entity Clusters

${clusterDescriptions}

## Your Task

Analyze these entity clusters to identify 3-10 major THEMES that would make meaningful organizational spaces.

## Guidelines

1. **Look for related entities**: Group clusters that share common subjects/objects/predicates
   - Example: "Core", "Backend", "Frontend" with "part_of", "uses" → "Core Project Development"
   - Example: "Department-Specific Index", "Permission", "Configuration" → "Department Indexing Feature"

2. **Identify project/feature themes**: Technical content often organizes by:
   - Projects/codebases (e.g., "Core", "Apollo")
   - Features/capabilities (e.g., "Department Indexing", "API Development")
   - Components/layers (e.g., "Frontend", "Backend", "Database")
   - Cross-cutting concerns (e.g., "Security", "Performance")

3. **Consider entity relationships**:
   - Entities with overlapping subjects/objects likely belong together
   - Common predicates suggest similar types of content
   - Check sample episodes for thematic coherence

4. **Space naming**:
   - Use natural, descriptive names (2-6 words)
   - Names should reflect how the user would search for and think about the content
   - Prefer specific over generic (e.g., "Core Backend" > "Backend Code")

5. **Confidence scoring**:
   - 90-100: Very clear theme, strong entity clustering, coherent episodes
   - 75-89: Clear theme, good evidence from entities and episodes
   - 60-74: Moderate theme, reasonable grouping but some diversity
   - Below 60: Don't propose

## Output Format

Return ONLY valid JSON (no markdown, no explanation):

{
  "spaces": [
    {
      "name": "Core Project Development",
      "intent": "All discussions, code, and documentation related to the Core project including backend, frontend, and configuration",
      "confidence": 92,
      "sourceEntities": ["Core", "Backend", "Frontend"],
      "keyEntities": ["Core", "Backend", "Frontend", "Configuration", "API"],
      "estimatedEpisodeCount": 350,
      "reasoning": "Strong clustering around Core entity with clear project scope. Multiple related components and consistent technical predicates."
    },
    {
      "name": "Department Indexing & Permissions",
      "intent": "Feature development for department-specific indexes, permission filtering, and access control",
      "confidence": 85,
      "sourceEntities": ["Department-Specific Index", "Permission Filtering", "Index"],
      "keyEntities": ["Department-Specific Index", "Permission", "Backend", "Index", "Filtering"],
      "estimatedEpisodeCount": 280,
      "reasoning": "Clear feature theme with related permission and indexing concepts. Coherent technical discussions."
    }
  ]
}

Important:
- Propose 3-10 spaces maximum
- Each space must have confidence >= 60
- Avoid overlapping spaces - ensure distinct themes
- sourceEntities: List of the main entity clusters this space is built from
- keyEntities: All important entities that belong in this space
- estimatedEpisodeCount: Sum of the episode counts from the relevant entity clusters
- reasoning: Explain WHY these entities form a coherent theme`;
}

// ============================================================================
// Main Discovery Function
// ============================================================================

/**
 * Discover thematic spaces using entity-first analysis.
 *
 * Process:
 * 1. Analyze entity clusters (group episodes by top entities)
 * 2. Group similar clusters by entity/predicate overlap
 * 3. Use the LLM to synthesize clusters into thematic spaces
 */
export async function discoverThematicSpaces(
  params: SpaceDiscoveryParams,
): Promise<SpaceDiscoveryResult> {
  const {
    userId,
    spaceIds,
    minEpisodeCount = 30,
    maxEntities = 50,
    existingSpaces,
  } = params;

  const session = driver.session();

  try {
    logger.info(`Starting space discovery for user ${userId}`);

    // Step 1: Analyze entity clusters
    const clusters = await analyzeEntityClusters(
      userId,
      spaceIds,
      minEpisodeCount,
      maxEntities,
    );

    if (clusters.length === 0) {
      logger.info("No entity clusters found");
      return {
        clusters: [],
        proposals: [],
        stats: {
          totalEntities: 0,
          totalEpisodes: 0,
          clustersAnalyzed: 0,
        },
      };
    }

    // Step 2: Group similar clusters (future enhancement)
    const clusterGroups = groupSimilarClusters(clusters);

    // Step 3: Generate space proposals via LLM (with existing spaces to avoid duplicates)
    const proposals = await generateSpaceProposalsFromClusters(
      clusterGroups,
      userId,
      existingSpaces,
    );

    // Get overall stats
    const statsQuery = `
      MATCH (entity:Entity {userId: $userId})<-[:HAS_SUBJECT|HAS_OBJECT]-(:Statement {userId: $userId})<-[:HAS_PROVENANCE]-(ep:Episode {userId: $userId})
      RETURN count(DISTINCT entity) as totalEntities, count(DISTINCT ep) as totalEpisodes
    `;

    const statsResult = await session.run(statsQuery, { userId });

    const result: SpaceDiscoveryResult = {
      clusters,
      proposals,
      stats: {
        totalEntities:
          toNumber(statsResult.records[0]?.get("totalEntities")) || 0,
        totalEpisodes:
          toNumber(statsResult.records[0]?.get("totalEpisodes")) || 0,
        clustersAnalyzed: clusters.length,
      },
    };

    // Print summary
    printSpaceDiscoverySummary(result);

    return result;
  } catch (error) {
    logger.error(`Error in space discovery: ${error}`);
    throw error;
  } finally {
    await session.close();
  }
}
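
// Editor's usage sketch (hypothetical, not part of this commit): a caller
// might run discovery for a user and act on the high-confidence proposals.
// The ids and the 75 threshold are made up for illustration.
//
//   const result = await discoverThematicSpaces({
//     userId: "user-123",
//     minEpisodeCount: 30,
//     maxEntities: 50,
//     existingSpaces: [{ name: "Core Backend", description: null }],
//   });
//   for (const proposal of result.proposals) {
//     if (proposal.confidence >= 75) {
//       console.log(`Would create space: ${proposal.name}`);
//     }
//   }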
// ============================================================================
// Utilities
// ============================================================================

/**
 * Print a formatted summary of space discovery results.
 */
function printSpaceDiscoverySummary(result: SpaceDiscoveryResult): void {
  console.log("\n" + "=".repeat(80));
  console.log("THEMATIC SPACE DISCOVERY (Entity-First)");
  console.log("=".repeat(80));

  console.log("\nOVERALL STATISTICS:");
  console.log(`  Total Entities: ${result.stats.totalEntities}`);
  console.log(`  Total Episodes: ${result.stats.totalEpisodes}`);
  console.log(`  Entity Clusters Analyzed: ${result.stats.clustersAnalyzed}`);
  console.log(`  Space Proposals: ${result.proposals.length}`);

  if (result.clusters.length > 0) {
    console.log("\n" + "-".repeat(80));
    console.log("\nTOP ENTITY CLUSTERS:");
    result.clusters.slice(0, 10).forEach((cluster, idx) => {
      console.log(
        `  ${idx + 1}. "${cluster.entity}" - ${cluster.episodeCount} episodes`,
      );
      console.log(
        `     Top subjects: ${cluster.topSubjects
          .slice(0, 3)
          .map((s) => s.name)
          .join(", ")}`,
      );
      console.log(
        `     Top objects: ${cluster.topObjects
          .slice(0, 3)
          .map((o) => o.name)
          .join(", ")}`,
      );
      console.log(
        `     Top predicates: ${cluster.topPredicates
          .slice(0, 3)
          .map((p) => p.name)
          .join(", ")}`,
      );
    });
  }

  if (result.proposals.length > 0) {
    console.log("\n" + "-".repeat(80));
    console.log("\nSPACE PROPOSALS:");
    result.proposals.forEach((proposal, idx) => {
      console.log(
        `\n  ${idx + 1}. "${proposal.name}" (${proposal.confidence}% confidence)`,
      );
      console.log(`     Intent: ${proposal.intent}`);
      console.log(`     Episodes: ~${proposal.estimatedEpisodeCount}`);
      console.log(
        `     Source entities: ${proposal.sourceEntities.join(", ")}`,
      );
      console.log(
        `     Key entities: ${proposal.keyEntities.slice(0, 5).join(", ")}`,
      );
      console.log(`     Reasoning: ${proposal.reasoning}`);
    });
  }

  console.log("\n" + "=".repeat(80) + "\n");
}
@@ -58,6 +58,7 @@ async function createMcpServer(
  // Handle memory tools and integration meta-tools
  if (
    name.startsWith("memory_") ||
    name === "get_session_id" ||
    name === "get_integrations" ||
    name === "get_integration_actions" ||
    name === "execute_integration_action"

@@ -1,9 +1,9 @@
import { task } from "@trigger.dev/sdk";
import { z } from "zod";
import { IngestionQueue, IngestionStatus } from "@core/database";
import { IngestionStatus } from "@core/database";
import { logger } from "~/services/logger.service";
import { prisma } from "../utils/prisma";
import { IngestBodyRequest, ingestTask } from "./ingest";
import { type IngestBodyRequest, ingestTask } from "./ingest";

export const RetryNoCreditBodyRequest = z.object({
  workspaceId: z.string(),
@@ -43,9 +43,7 @@ export const retryNoCreditsTask = task({
      };
    }

    logger.log(
      `Found ${noCreditItems.length} NO_CREDITS episodes to retry`,
    );
    logger.log(`Found ${noCreditItems.length} NO_CREDITS episodes to retry`);

    const results = {
      total: noCreditItems.length,

File diff suppressed because it is too large

39  apps/webapp/app/trigger/spaces/space-discovery.ts  Normal file
@@ -0,0 +1,39 @@
import { queue, task } from "@trigger.dev/sdk/v3";
import { logger } from "~/services/logger.service";
import {
  processSpaceDiscovery,
  type SpaceDiscoveryPayload,
} from "~/jobs/spaces/space-discovery.logic";

export type { SpaceDiscoveryPayload };

export const spaceDiscoveryQueue = queue({
  name: "space-discovery-queue",
  concurrencyLimit: 1, // One discovery job at a time globally
});

export const spaceDiscoveryTask = task({
  id: "space-discovery",
  queue: spaceDiscoveryQueue,
  run: async (payload: SpaceDiscoveryPayload) => {
    logger.info(`[Trigger.dev] Starting space discovery task`, {
      userId: payload.userId,
      workspaceId: payload.workspaceId,
      minEpisodeCount: payload.minEpisodeCount,
      maxEntities: payload.maxEntities,
      autoCreateThreshold: payload.autoCreateThreshold,
    });

    // Use common business logic
    return await processSpaceDiscovery(payload);
  },
});

// Helper function to trigger the task
export async function triggerSpaceDiscovery(payload: SpaceDiscoveryPayload) {
  return await spaceDiscoveryTask.trigger(payload, {
    queue: "space-discovery-queue",
    concurrencyKey: payload.userId, // One discovery per user at a time
    tags: [payload.userId, payload.workspaceId, "space-discovery"],
  });
}
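
// Editor's usage sketch (hypothetical, not part of this commit): an API route
// or scheduled job could kick off discovery like this; the ids are made up.
//
//   await triggerSpaceDiscovery({
//     userId: "user-123",
//     workspaceId: "workspace-456",
//     minEpisodeCount: 30,
//     maxEntities: 50,
//   });
//
// Per the comments above, the queue's concurrencyLimit of 1 plus the
// per-user concurrencyKey means repeated calls for the same user queue up
// rather than run in parallel.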
@@ -1,557 +0,0 @@
import { task } from "@trigger.dev/sdk/v3";
import { logger } from "~/services/logger.service";
import { makeModelCall } from "~/lib/model.server";
import { runQuery } from "~/lib/neo4j.server";
import type { CoreMessage } from "ai";
import { z } from "zod";
import {
  EXPLICIT_PATTERN_TYPES,
  IMPLICIT_PATTERN_TYPES,
  type SpacePattern,
  type PatternDetectionResult,
} from "@core/types";
import { createSpacePattern, getSpace } from "../utils/space-utils";

interface SpacePatternPayload {
  userId: string;
  workspaceId: string;
  spaceId: string;
  triggerSource?:
    | "summary_complete"
    | "manual"
    | "assignment"
    | "scheduled"
    | "new_space"
    | "growth_threshold"
    | "ingestion_complete";
}

interface SpaceStatementData {
  uuid: string;
  fact: string;
  subject: string;
  predicate: string;
  object: string;
  createdAt: Date;
  validAt: Date;
  content?: string; // For implicit pattern analysis
}

interface SpaceThemeData {
  themes: string[];
  summary: string;
}

// Zod schemas for LLM response validation
const ExplicitPatternSchema = z.object({
  name: z.string(),
  type: z.string(),
  summary: z.string(),
  evidence: z.array(z.string()),
  confidence: z.number().min(0).max(1),
});

const ImplicitPatternSchema = z.object({
  name: z.string(),
  type: z.string(),
  summary: z.string(),
  evidence: z.array(z.string()),
  confidence: z.number().min(0).max(1),
});

const PatternAnalysisSchema = z.object({
  explicitPatterns: z.array(ExplicitPatternSchema),
  implicitPatterns: z.array(ImplicitPatternSchema),
});

const CONFIG = {
  minStatementsForPatterns: 5,
  maxPatternsPerSpace: 20,
  minPatternConfidence: 0.85,
};

export const spacePatternTask = task({
  id: "space-pattern",
  run: async (payload: SpacePatternPayload) => {
    const { userId, workspaceId, spaceId, triggerSource = "manual" } = payload;

    logger.info(`Starting space pattern detection`, {
      userId,
      workspaceId,
      spaceId,
      triggerSource,
    });

    try {
      // Get space data and check if it has enough content
      const space = await getSpaceForPatternAnalysis(spaceId);
      if (!space) {
        return {
          success: false,
          spaceId,
          error: "Space not found or insufficient data",
        };
      }

      // Get statements for pattern analysis
      const statements = await getSpaceStatementsForPatterns(spaceId, userId);

      if (statements.length < CONFIG.minStatementsForPatterns) {
        logger.info(
          `Space ${spaceId} has insufficient statements (${statements.length}) for pattern detection`,
        );
        return {
          success: true,
          spaceId,
          triggerSource,
          patterns: {
            explicitPatterns: [],
            implicitPatterns: [],
            totalPatternsFound: 0,
          },
        };
      }

      // Detect patterns
      const patternResult = await detectSpacePatterns(space, statements);

      if (patternResult) {
        // Store patterns
        await storePatterns(
          patternResult.explicitPatterns,
          patternResult.implicitPatterns,
          spaceId,
        );

        logger.info(`Generated patterns for space ${spaceId}`, {
          explicitPatterns: patternResult.explicitPatterns.length,
          implicitPatterns: patternResult.implicitPatterns.length,
          totalPatterns: patternResult.totalPatternsFound,
          triggerSource,
        });

        return {
          success: true,
          spaceId,
          triggerSource,
          patterns: {
            explicitPatterns: patternResult.explicitPatterns.length,
            implicitPatterns: patternResult.implicitPatterns.length,
            totalPatternsFound: patternResult.totalPatternsFound,
          },
        };
      } else {
        logger.warn(`Failed to detect patterns for space ${spaceId}`);
        return {
          success: false,
          spaceId,
          triggerSource,
          error: "Failed to detect patterns",
        };
      }
    } catch (error) {
      logger.error(
        `Error in space pattern detection for space ${spaceId}:`,
        error as Record<string, unknown>,
      );
      throw error;
    }
  },
});

async function getSpaceForPatternAnalysis(
  spaceId: string,
): Promise<SpaceThemeData | null> {
  try {
    const space = await getSpace(spaceId);

    if (!space || !space.themes || space.themes.length === 0) {
      logger.warn(
        `Space ${spaceId} not found or has no themes for pattern analysis`,
      );
      return null;
    }

    return {
      themes: space.themes,
      summary: space.summary || "",
    };
  } catch (error) {
    logger.error(
      `Error getting space for pattern analysis:`,
      error as Record<string, unknown>,
    );
    return null;
  }
}

async function getSpaceStatementsForPatterns(
  spaceId: string,
  userId: string,
): Promise<SpaceStatementData[]> {
  const query = `
    MATCH (s:Statement)
    WHERE s.userId = $userId
      AND s.spaceIds IS NOT NULL
      AND $spaceId IN s.spaceIds
      AND s.invalidAt IS NULL
    MATCH (s)-[:HAS_SUBJECT]->(subj:Entity)
    MATCH (s)-[:HAS_PREDICATE]->(pred:Entity)
    MATCH (s)-[:HAS_OBJECT]->(obj:Entity)
    RETURN s, subj.name as subject, pred.name as predicate, obj.name as object
    ORDER BY s.createdAt DESC
  `;

  const result = await runQuery(query, {
    spaceId,
    userId,
  });

  return result.map((record) => {
    const statement = record.get("s").properties;
    return {
      uuid: statement.uuid,
      fact: statement.fact,
      subject: record.get("subject"),
      predicate: record.get("predicate"),
      object: record.get("object"),
      createdAt: new Date(statement.createdAt),
      validAt: new Date(statement.validAt),
      content: statement.fact, // Use fact as content for implicit analysis
    };
  });
}

async function detectSpacePatterns(
  space: SpaceThemeData,
  statements: SpaceStatementData[],
): Promise<PatternDetectionResult | null> {
  try {
    // Extract explicit patterns from themes
    const explicitPatterns = await extractExplicitPatterns(
      space.themes,
      space.summary,
      statements,
    );

    // Extract implicit patterns from statement analysis
    const implicitPatterns = await extractImplicitPatterns(statements);

    return {
      explicitPatterns,
      implicitPatterns,
      totalPatternsFound: explicitPatterns.length + implicitPatterns.length,
      processingStats: {
        statementsAnalyzed: statements.length,
        themesProcessed: space.themes.length,
        implicitPatternsExtracted: implicitPatterns.length,
      },
    };
  } catch (error) {
    logger.error(
      "Error detecting space patterns:",
      error as Record<string, unknown>,
    );
    return null;
  }
}

async function extractExplicitPatterns(
  themes: string[],
  summary: string,
  statements: SpaceStatementData[],
): Promise<Omit<SpacePattern, "id" | "createdAt" | "updatedAt" | "spaceId">[]> {
  if (themes.length === 0) return [];

  const prompt = createExplicitPatternPrompt(themes, summary, statements);

  // Pattern extraction requires HIGH complexity (insight synthesis, pattern recognition)
  let responseText = "";
  await makeModelCall(
    false,
    prompt,
    (text: string) => {
      responseText = text;
    },
    undefined,
    "high",
  );

  const patterns = parseExplicitPatternResponse(responseText);

  return patterns.map((pattern) => ({
    name: pattern.name || `${pattern.type} pattern`,
    source: "explicit" as const,
    type: pattern.type,
    summary: pattern.summary,
    evidence: pattern.evidence,
    confidence: pattern.confidence,
    userConfirmed: "pending" as const,
  }));
}

async function extractImplicitPatterns(
  statements: SpaceStatementData[],
): Promise<Omit<SpacePattern, "id" | "createdAt" | "updatedAt" | "spaceId">[]> {
  if (statements.length < CONFIG.minStatementsForPatterns) return [];

  const prompt = createImplicitPatternPrompt(statements);

  // Implicit pattern discovery requires HIGH complexity (pattern recognition from statements)
  let responseText = "";
  await makeModelCall(
    false,
    prompt,
    (text: string) => {
      responseText = text;
    },
    undefined,
    "high",
  );

  const patterns = parseImplicitPatternResponse(responseText);

  return patterns.map((pattern) => ({
    name: pattern.name || `${pattern.type} pattern`,
    source: "implicit" as const,
    type: pattern.type,
    summary: pattern.summary,
    evidence: pattern.evidence,
    confidence: pattern.confidence,
    userConfirmed: "pending" as const,
  }));
}

function createExplicitPatternPrompt(
  themes: string[],
  summary: string,
  statements: SpaceStatementData[],
): CoreMessage[] {
  const statementsText = statements
    .map((stmt) => `[${stmt.uuid}] ${stmt.fact}`)
    .join("\n");

  const explicitTypes = Object.values(EXPLICIT_PATTERN_TYPES).join('", "');

  return [
    {
      role: "system",
      content: `You are an expert at extracting structured patterns from themes and supporting evidence.

Your task is to convert high-level themes into explicit patterns with supporting statement evidence.

INSTRUCTIONS:
1. For each theme, create a pattern that explains what it reveals about the user
2. Give each pattern a short, descriptive name (2-4 words)
3. Find supporting statement IDs that provide evidence for each pattern
4. Assess confidence based on evidence strength and theme clarity
5. Use appropriate pattern types from these guidelines: "${explicitTypes}"
   - "theme": High-level thematic content areas
   - "topic": Specific subject matter or topics of interest
   - "domain": Knowledge or work domains the user operates in
   - "interest_area": Areas of personal interest or hobby
6. You may suggest new pattern types if none of the guidelines fit well

RESPONSE FORMAT:
Provide your response inside <output></output> tags with valid JSON.

<output>
{
  "explicitPatterns": [
    {
      "name": "Short descriptive name for the pattern",
      "type": "theme",
      "summary": "Description of what this pattern reveals about the user",
      "evidence": ["statement_id_1", "statement_id_2"],
      "confidence": 0.85
    }
  ]
}
</output>`,
    },
    {
      role: "user",
      content: `THEMES TO ANALYZE:
${themes.map((theme, i) => `${i + 1}. ${theme}`).join("\n")}

SPACE SUMMARY:
${summary}

SUPPORTING STATEMENTS:
${statementsText}

Please extract explicit patterns from these themes and map them to supporting statement evidence.`,
    },
  ];
}

function createImplicitPatternPrompt(
  statements: SpaceStatementData[],
): CoreMessage[] {
  const statementsText = statements
    .map(
      (stmt) =>
        `[${stmt.uuid}] ${stmt.fact} (${stmt.subject} → ${stmt.predicate} → ${stmt.object})`,
    )
    .join("\n");

  const implicitTypes = Object.values(IMPLICIT_PATTERN_TYPES).join('", "');

  return [
    {
      role: "system",
      content: `You are an expert at discovering implicit behavioral patterns from statement analysis.

Your task is to identify hidden patterns in user behavior, preferences, and habits from statement content.

INSTRUCTIONS:
1. Analyze statement content for behavioral patterns, not explicit topics
2. Give each pattern a short, descriptive name (2-4 words)
3. Look for recurring behaviors, preferences, and working styles
4. Identify how the user approaches tasks, makes decisions, and interacts
5. Use appropriate pattern types from these guidelines: "${implicitTypes}"
   - "preference": Personal preferences and choices
   - "habit": Recurring behaviors and routines
   - "workflow": Work and process patterns
   - "communication_style": How the user communicates and expresses ideas
   - "decision_pattern": Decision-making approaches and criteria
   - "temporal_pattern": Time-based behavioral patterns
   - "behavioral_pattern": General behavioral tendencies
   - "learning_style": How the user learns and processes information
   - "collaboration_style": How the user works with others
6. You may suggest new pattern types if none of the guidelines fit well
7. Focus on what the statements reveal about how the user thinks, works, or behaves

RESPONSE FORMAT:
Provide your response inside <output></output> tags with valid JSON.

<output>
{
  "implicitPatterns": [
    {
      "name": "Short descriptive name for the pattern",
      "type": "preference",
      "summary": "Description of what this behavioral pattern reveals",
      "evidence": ["statement_id_1", "statement_id_2"],
      "confidence": 0.75
    }
  ]
}
</output>`,
    },
    {
      role: "user",
      content: `STATEMENTS TO ANALYZE FOR IMPLICIT PATTERNS:
${statementsText}

Please identify implicit behavioral patterns, preferences, and habits from these statements.`,
    },
  ];
}

function parseExplicitPatternResponse(response: string): Array<{
  name: string;
  type: string;
  summary: string;
  evidence: string[];
  confidence: number;
}> {
  try {
    const outputMatch = response.match(/<output>([\s\S]*?)<\/output>/);
    if (!outputMatch) {
      logger.warn("No <output> tags found in explicit pattern response");
      return [];
    }

    const parsed = JSON.parse(outputMatch[1].trim());
    const validationResult = z
      .object({
        explicitPatterns: z.array(ExplicitPatternSchema),
      })
      .safeParse(parsed);

    if (!validationResult.success) {
      logger.warn("Invalid explicit pattern response format:", {
        error: validationResult.error,
      });
      return [];
    }

    return validationResult.data.explicitPatterns.filter(
      (p) =>
        p.confidence >= CONFIG.minPatternConfidence && p.evidence.length >= 3, // Ensure at least 3 evidence statements
    );
  } catch (error) {
    logger.error(
      "Error parsing explicit pattern response:",
      error as Record<string, unknown>,
    );
    return [];
  }
}

function parseImplicitPatternResponse(response: string): Array<{
  name: string;
  type: string;
  summary: string;
  evidence: string[];
  confidence: number;
}> {
  try {
    const outputMatch = response.match(/<output>([\s\S]*?)<\/output>/);
    if (!outputMatch) {
      logger.warn("No <output> tags found in implicit pattern response");
      return [];
    }

    const parsed = JSON.parse(outputMatch[1].trim());
    const validationResult = z
      .object({
        implicitPatterns: z.array(ImplicitPatternSchema),
      })
      .safeParse(parsed);

    if (!validationResult.success) {
      logger.warn("Invalid implicit pattern response format:", {
        error: validationResult.error,
      });
      return [];
    }

    return validationResult.data.implicitPatterns.filter(
      (p) =>
        p.confidence >= CONFIG.minPatternConfidence && p.evidence.length >= 3, // Ensure at least 3 evidence statements
    );
  } catch (error) {
    logger.error(
      "Error parsing implicit pattern response:",
      error as Record<string, unknown>,
    );
    return [];
  }
}
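
// Editor's note (not part of this commit): parseExplicitPatternResponse and
// parseImplicitPatternResponse are identical except for the JSON key they
// validate (the two Zod schemas have the same shape). A hypothetical generic
// helper could collapse the duplication:
//
//   function parsePatternResponse(
//     response: string,
//     key: "explicitPatterns" | "implicitPatterns",
//   ): z.infer<typeof ExplicitPatternSchema>[] {
//     const outputMatch = response.match(/<output>([\s\S]*?)<\/output>/);
//     if (!outputMatch) return [];
//     try {
//       const parsed = JSON.parse(outputMatch[1].trim());
//       const result = z
//         .object({ [key]: z.array(ExplicitPatternSchema) })
//         .safeParse(parsed);
//       if (!result.success) return [];
//       return result.data[key].filter(
//         (p) =>
//           p.confidence >= CONFIG.minPatternConfidence &&
//           p.evidence.length >= 3,
//       );
//     } catch {
//       return [];
//     }
//   }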
async function storePatterns(
  explicitPatterns: Omit<
    SpacePattern,
    "id" | "createdAt" | "updatedAt" | "spaceId"
  >[],
  implicitPatterns: Omit<
    SpacePattern,
    "id" | "createdAt" | "updatedAt" | "spaceId"
  >[],
  spaceId: string,
): Promise<void> {
  try {
    const allPatterns = [...explicitPatterns, ...implicitPatterns];

    if (allPatterns.length === 0) return;

    // Store in PostgreSQL
    await createSpacePattern(spaceId, allPatterns);

    logger.info(`Stored ${allPatterns.length} patterns`, {
      explicit: explicitPatterns.length,
      implicit: implicitPatterns.length,
    });
  } catch (error) {
    logger.error("Error storing patterns:", error as Record<string, unknown>);
    throw error;
  }
}

// Helper function to trigger the task
export async function triggerSpacePattern(payload: SpacePatternPayload) {
  return await spacePatternTask.trigger(payload, {
    concurrencyKey: `space-pattern-${payload.spaceId}`, // Prevent parallel runs for the same space
    tags: [payload.userId, payload.spaceId, payload.triggerSource || "manual"],
  });
}
@ -1,62 +1,11 @@
|
||||
import { queue, task } from "@trigger.dev/sdk/v3";
|
||||
import { logger } from "~/services/logger.service";
|
||||
import { SpaceService } from "~/services/space.server";
|
||||
import { makeModelCall } from "~/lib/model.server";
|
||||
import { runQuery } from "~/lib/neo4j.server";
|
||||
import { updateSpaceStatus, SPACE_STATUS } from "../utils/space-status";
|
||||
import type { CoreMessage } from "ai";
|
||||
import { z } from "zod";
|
||||
import { triggerSpacePattern } from "./space-pattern";
|
||||
import { getSpace, updateSpace } from "../utils/space-utils";
|
||||
import {
|
||||
processSpaceSummary,
|
||||
type SpaceSummaryPayload,
|
||||
} from "~/jobs/spaces/space-summary.logic";
|
||||
|
||||
import { EpisodeType } from "@core/types";
|
||||
import { getSpaceEpisodeCount } from "~/services/graphModels/space";
|
||||
import { addToQueue } from "~/lib/ingest.server";
|
||||
|
||||
interface SpaceSummaryPayload {
|
||||
userId: string;
|
||||
workspaceId: string;
|
||||
spaceId: string; // Single space only
|
||||
triggerSource?: "assignment" | "manual" | "scheduled";
|
||||
}
|
||||
|
||||
interface SpaceEpisodeData {
|
||||
uuid: string;
|
||||
content: string;
|
||||
originalContent: string;
|
||||
source: string;
|
||||
createdAt: Date;
|
||||
validAt: Date;
|
||||
metadata: any;
|
||||
sessionId: string | null;
|
||||
}
|
||||
|
||||
interface SpaceSummaryData {
|
||||
spaceId: string;
|
||||
spaceName: string;
|
||||
spaceDescription?: string;
|
||||
contextCount: number;
|
||||
summary: string;
|
||||
keyEntities: string[];
|
||||
themes: string[];
|
||||
confidence: number;
|
||||
lastUpdated: Date;
|
||||
isIncremental: boolean;
|
||||
}
|
||||
|
||||
// Zod schema for LLM response validation
|
||||
const SummaryResultSchema = z.object({
|
||||
summary: z.string(),
|
||||
keyEntities: z.array(z.string()),
|
||||
themes: z.array(z.string()),
|
||||
confidence: z.number().min(0).max(1),
|
||||
});
|
||||
|
||||
const CONFIG = {
|
||||
maxEpisodesForSummary: 20, // Limit episodes for performance
|
||||
minEpisodesForSummary: 1, // Minimum episodes to generate summary
|
||||
summaryEpisodeThreshold: 5, // Minimum new episodes required to trigger summary (configurable)
|
||||
};
|
||||
export type { SpaceSummaryPayload };
|
||||
|
||||
export const spaceSummaryQueue = queue({
|
||||
name: "space-summary-queue",
|
||||
@ -67,735 +16,17 @@ export const spaceSummaryTask = task({
|
||||
id: "space-summary",
|
||||
queue: spaceSummaryQueue,
|
||||
run: async (payload: SpaceSummaryPayload) => {
|
||||
const { userId, workspaceId, spaceId, triggerSource = "manual" } = payload;
|
||||
|
||||
logger.info(`Starting space summary generation`, {
|
||||
userId,
|
||||
workspaceId,
|
||||
spaceId,
|
||||
triggerSource,
|
||||
logger.info(`[Trigger.dev] Starting space summary task`, {
|
||||
userId: payload.userId,
|
||||
spaceId: payload.spaceId,
|
||||
triggerSource: payload.triggerSource,
|
||||
});
|
||||
|
||||
try {
|
||||
// Update status to processing
|
||||
await updateSpaceStatus(spaceId, SPACE_STATUS.PROCESSING, {
|
||||
userId,
|
||||
operation: "space-summary",
|
||||
metadata: { triggerSource, phase: "start_summary" },
|
||||
});
|
||||
|
||||
// Generate summary for the single space
|
||||
const summaryResult = await generateSpaceSummary(
|
||||
spaceId,
|
||||
userId,
|
||||
triggerSource,
|
||||
);
|
||||
|
||||
if (summaryResult) {
|
||||
// Store the summary
|
||||
await storeSummary(summaryResult);
|
||||
|
||||
// Update status to ready after successful completion
|
||||
await updateSpaceStatus(spaceId, SPACE_STATUS.READY, {
|
||||
userId,
|
||||
operation: "space-summary",
|
||||
metadata: {
|
||||
triggerSource,
|
||||
phase: "completed_summary",
|
||||
contextCount: summaryResult.contextCount,
|
||||
confidence: summaryResult.confidence,
|
||||
},
|
||||
});
|
||||
|
||||
logger.info(`Generated summary for space ${spaceId}`, {
|
||||
statementCount: summaryResult.contextCount,
|
||||
confidence: summaryResult.confidence,
|
||||
themes: summaryResult.themes.length,
|
||||
triggerSource,
|
||||
});
|
||||
|
||||
return {
|
||||
success: true,
|
||||
spaceId,
|
||||
triggerSource,
|
||||
summary: {
|
||||
statementCount: summaryResult.contextCount,
|
||||
confidence: summaryResult.confidence,
|
||||
themesCount: summaryResult.themes.length,
|
||||
},
|
||||
};
|
||||
} else {
|
||||
// No summary generated - this could be due to insufficient episodes or no new episodes
|
||||
// This is not an error state, so update status to ready
|
||||
await updateSpaceStatus(spaceId, SPACE_STATUS.READY, {
|
||||
userId,
|
||||
operation: "space-summary",
|
||||
metadata: {
|
||||
triggerSource,
|
||||
phase: "no_summary_needed",
|
||||
reason: "Insufficient episodes or no new episodes to summarize",
|
||||
},
|
||||
});
|
||||
|
||||
logger.info(
|
||||
`No summary generated for space ${spaceId} - insufficient or no new episodes`,
|
||||
);
|
||||
return {
|
||||
success: true,
|
||||
spaceId,
|
||||
triggerSource,
|
||||
summary: null,
|
||||
reason: "No episodes to summarize",
|
||||
};
|
||||
}
|
||||
} catch (error) {
|
||||
// Update status to error on exception
|
||||
try {
|
||||
await updateSpaceStatus(spaceId, SPACE_STATUS.ERROR, {
|
||||
userId,
|
||||
operation: "space-summary",
|
||||
metadata: {
|
||||
triggerSource,
|
||||
phase: "exception",
|
||||
error: error instanceof Error ? error.message : "Unknown error",
|
||||
},
|
||||
});
|
||||
} catch (statusError) {
|
||||
logger.warn(`Failed to update status to error for space ${spaceId}`, {
|
||||
statusError,
|
||||
});
|
||||
}
|
||||
|
||||
logger.error(
|
||||
`Error in space summary generation for space ${spaceId}:`,
|
||||
error as Record<string, unknown>,
|
||||
);
|
||||
throw error;
|
||||
}
|
||||
// Use common business logic
|
||||
return await processSpaceSummary(payload);
|
||||
},
|
||||
});
|
||||
|
||||
async function generateSpaceSummary(
|
||||
spaceId: string,
|
||||
userId: string,
|
||||
triggerSource?: "assignment" | "manual" | "scheduled",
|
||||
): Promise<SpaceSummaryData | null> {
|
||||
try {
|
||||
// 1. Get space details
|
||||
const spaceService = new SpaceService();
|
||||
const space = await spaceService.getSpace(spaceId, userId);
|
||||
|
||||
if (!space) {
|
||||
logger.warn(`Space ${spaceId} not found for user ${userId}`);
|
||||
return null;
|
||||
}
|
||||
|
||||
// 2. Check episode count threshold (skip for manual triggers)
|
||||
if (triggerSource !== "manual") {
|
||||
const currentEpisodeCount = await getSpaceEpisodeCount(spaceId, userId);
|
||||
const lastSummaryEpisodeCount = space.contextCount || 0;
|
||||
const episodeDifference = currentEpisodeCount - lastSummaryEpisodeCount;
|
||||
|
||||
if (
|
||||
episodeDifference < CONFIG.summaryEpisodeThreshold ||
|
||||
lastSummaryEpisodeCount !== 0
|
||||
) {
|
||||
logger.info(
|
||||
`Skipping summary generation for space ${spaceId}: only ${episodeDifference} new episodes (threshold: ${CONFIG.summaryEpisodeThreshold})`,
|
||||
{
|
||||
currentEpisodeCount,
|
||||
lastSummaryEpisodeCount,
|
||||
episodeDifference,
|
||||
threshold: CONFIG.summaryEpisodeThreshold,
|
||||
},
|
||||
);
|
||||
return null;
|
||||
}
|
||||
|
||||
logger.info(
|
||||
`Proceeding with summary generation for space ${spaceId}: ${episodeDifference} new episodes (threshold: ${CONFIG.summaryEpisodeThreshold})`,
|
||||
{
|
||||
currentEpisodeCount,
|
||||
lastSummaryEpisodeCount,
|
||||
episodeDifference,
|
||||
},
|
||||
);
|
||||
}
|
||||
|
||||
// 2. Check for existing summary
|
||||
const existingSummary = await getExistingSummary(spaceId);
|
||||
const isIncremental = existingSummary !== null;
|
||||
|
||||
// 3. Get episodes (all or new ones based on existing summary)
|
||||
const episodes = await getSpaceEpisodes(
|
||||
spaceId,
|
||||
userId,
|
||||
isIncremental ? existingSummary?.lastUpdated : undefined,
|
||||
);
|
||||
|
||||
// Handle case where no new episodes exist for incremental update
|
||||
if (isIncremental && episodes.length === 0) {
|
||||
logger.info(
|
||||
`No new episodes found for space ${spaceId}, skipping summary update`,
|
||||
);
|
||||
return null;
|
||||
}
|
||||
|
||||
// Check minimum episode requirement for new summaries only
|
||||
if (!isIncremental && episodes.length < CONFIG.minEpisodesForSummary) {
|
||||
logger.info(
|
||||
`Space ${spaceId} has insufficient episodes (${episodes.length}) for new summary`,
|
||||
);
|
||||
return null;
|
||||
}
|
||||
|
||||
// 4. Process episodes using unified approach
|
||||
let summaryResult;
|
||||
|
||||
if (episodes.length > CONFIG.maxEpisodesForSummary) {
|
||||
logger.info(
|
||||
`Large space detected (${episodes.length} episodes). Processing in batches.`,
|
||||
);
|
||||
|
||||
// Process in batches, each building on previous result
|
||||
const batches: SpaceEpisodeData[][] = [];
|
||||
for (let i = 0; i < episodes.length; i += CONFIG.maxEpisodesForSummary) {
|
||||
batches.push(episodes.slice(i, i + CONFIG.maxEpisodesForSummary));
|
||||
}
|
||||
|
||||
let currentSummary = existingSummary?.summary || null;
|
||||
let currentThemes = existingSummary?.themes || [];
|
||||
let cumulativeConfidence = 0;
|
||||
|
||||
for (const [batchIndex, batch] of batches.entries()) {
|
||||
logger.info(
|
||||
`Processing batch ${batchIndex + 1}/${batches.length} with ${batch.length} episodes`,
|
||||
);
|
||||
|
||||
const batchResult = await generateUnifiedSummary(
|
||||
space.name,
|
||||
space.description as string,
|
||||
batch,
|
||||
currentSummary,
|
||||
currentThemes,
|
||||
);
|
||||
|
||||
if (batchResult) {
|
||||
currentSummary = batchResult.summary;
|
||||
currentThemes = batchResult.themes;
|
||||
cumulativeConfidence += batchResult.confidence;
|
||||
} else {
|
||||
logger.warn(`Failed to process batch ${batchIndex + 1}`);
|
||||
}
|
||||
|
||||
// Small delay between batches
|
||||
if (batchIndex < batches.length - 1) {
|
||||
await new Promise((resolve) => setTimeout(resolve, 500));
|
||||
}
|
||||
}
|
||||
|
||||
summaryResult = currentSummary
|
||||
? {
|
||||
summary: currentSummary,
|
||||
themes: currentThemes,
|
||||
confidence: Math.min(cumulativeConfidence / batches.length, 1.0),
|
||||
}
|
||||
: null;
|
||||
} else {
|
||||
logger.info(
|
||||
`Processing ${episodes.length} episodes with unified approach`,
|
||||
);
|
||||
|
||||
// Use unified approach for smaller spaces
|
||||
summaryResult = await generateUnifiedSummary(
|
||||
space.name,
|
||||
space.description as string,
|
||||
episodes,
|
||||
existingSummary?.summary || null,
|
||||
existingSummary?.themes || [],
|
||||
);
|
||||
}
|
||||
|
||||
if (!summaryResult) {
|
||||
logger.warn(`Failed to generate LLM summary for space ${spaceId}`);
|
||||
return null;
|
||||
}
|
||||
|
||||
// Get the actual current counts from Neo4j
|
||||
const currentEpisodeCount = await getSpaceEpisodeCount(spaceId, userId);
|
||||
|
||||
return {
|
||||
spaceId: space.uuid,
|
||||
spaceName: space.name,
|
||||
spaceDescription: space.description as string,
|
||||
contextCount: currentEpisodeCount,
|
||||
summary: summaryResult.summary,
|
||||
keyEntities: summaryResult.keyEntities || [],
|
||||
themes: summaryResult.themes,
|
||||
confidence: summaryResult.confidence,
|
||||
lastUpdated: new Date(),
|
||||
isIncremental,
|
||||
};
|
||||
} catch (error) {
|
||||
logger.error(
|
||||
`Error generating summary for space ${spaceId}:`,
|
||||
error as Record<string, unknown>,
|
||||
);
|
||||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
async function generateUnifiedSummary(
|
||||
spaceName: string,
|
||||
spaceDescription: string | undefined,
|
||||
episodes: SpaceEpisodeData[],
|
||||
previousSummary: string | null = null,
|
||||
previousThemes: string[] = [],
|
||||
): Promise<{
|
||||
summary: string;
|
||||
themes: string[];
|
||||
confidence: number;
|
||||
keyEntities?: string[];
|
||||
} | null> {
|
||||
try {
|
||||
const prompt = createUnifiedSummaryPrompt(
|
||||
spaceName,
|
||||
spaceDescription,
|
||||
episodes,
|
||||
previousSummary,
|
||||
previousThemes,
|
||||
);
|
||||
|
||||
// Space summary generation requires HIGH complexity (creative synthesis, narrative generation)
|
||||
let responseText = "";
|
||||
await makeModelCall(
|
||||
false,
|
||||
prompt,
|
||||
(text: string) => {
|
||||
responseText = text;
|
||||
},
|
||||
undefined,
|
||||
"high",
|
||||
);
|
||||
|
||||
return parseSummaryResponse(responseText);
|
||||
} catch (error) {
|
||||
logger.error(
|
||||
"Error generating unified summary:",
|
||||
error as Record<string, unknown>,
|
||||
);
|
||||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
function createUnifiedSummaryPrompt(
|
||||
spaceName: string,
|
||||
spaceDescription: string | undefined,
|
||||
episodes: SpaceEpisodeData[],
|
||||
previousSummary: string | null,
|
||||
previousThemes: string[],
|
||||
): CoreMessage[] {
|
||||
// If there are no episodes and no previous summary, we cannot generate a meaningful summary
|
||||
if (episodes.length === 0 && previousSummary === null) {
|
||||
throw new Error(
|
||||
"Cannot generate summary without episodes or existing summary",
|
||||
);
|
||||
}
|
||||
|
||||
const episodesText = episodes
|
||||
.map(
|
||||
(episode) =>
|
||||
`- ${episode.content} (Source: ${episode.source}, Session: ${episode.sessionId || "N/A"})`,
|
||||
)
|
||||
.join("\n");
|
||||
|
||||
// Extract key entities and themes from episode content
|
||||
const contentWords = episodes
|
||||
.map((ep) => ep.content.toLowerCase())
|
||||
.join(" ")
|
||||
.split(/\s+/)
|
||||
.filter((word) => word.length > 3);
|
||||
|
||||
const wordFrequency = new Map<string, number>();
|
||||
contentWords.forEach((word) => {
|
||||
wordFrequency.set(word, (wordFrequency.get(word) || 0) + 1);
|
||||
});
|
||||
|
||||
const topEntities = Array.from(wordFrequency.entries())
|
||||
.sort(([, a], [, b]) => b - a)
|
||||
.slice(0, 10)
|
||||
.map(([word]) => word);
|
||||
|
||||
const isUpdate = previousSummary !== null;
|
||||
|
||||
return [
|
||||
{
|
||||
role: "system",
|
||||
content: `You are an expert at analyzing and summarizing episodes within semantic spaces based on the space's intent and purpose. Your task is to ${isUpdate ? "update an existing summary by integrating new episodes" : "create a comprehensive summary of episodes"}.
|
||||
|
||||
CRITICAL RULES:
|
||||
1. Base your summary ONLY on insights derived from the actual content/episodes provided
|
||||
2. Use the space's INTENT/PURPOSE (from description) to guide what to summarize and how to organize it
|
||||
3. Write in a factual, neutral tone - avoid promotional language ("pivotal", "invaluable", "cutting-edge")
|
||||
4. Be specific and concrete - reference actual content, patterns, and insights found in the episodes
|
||||
5. If episodes are insufficient for meaningful insights, state that more data is needed
|
||||
|
||||
INTENT-DRIVEN SUMMARIZATION:
|
||||
Your summary should SERVE the space's intended purpose. Examples:
|
||||
- "Learning React" → Summarize React concepts, patterns, techniques learned
|
||||
- "Project X Updates" → Summarize progress, decisions, blockers, next steps
|
||||
- "Health Tracking" → Summarize metrics, trends, observations, insights
|
||||
- "Guidelines for React" → Extract actionable patterns, best practices, rules
|
||||
- "Evolution of design thinking" → Track how thinking changed over time, decision points
|
||||
The intent defines WHY this space exists - organize content to serve that purpose.
|
||||
|
||||
INSTRUCTIONS:
|
||||
${
|
||||
isUpdate
|
||||
? `1. Review the existing summary and themes carefully
|
||||
2. Analyze the new episodes for patterns and insights that align with the space's intent
|
||||
3. Identify connecting points between existing knowledge and new episodes
|
||||
4. Update the summary to seamlessly integrate new information while preserving valuable existing insights
|
||||
5. Evolve themes by adding new ones or refining existing ones based on the space's purpose
|
||||
6. Organize the summary to serve the space's intended use case`
|
||||
: `1. Analyze the semantic content and relationships within the episodes
|
||||
2. Identify topics/sections that align with the space's INTENT and PURPOSE
|
||||
3. Create a coherent summary that serves the space's intended use case
|
||||
4. Organize the summary based on the space's purpose (not generic frequency-based themes)`
|
||||
}
|
||||
${isUpdate ? "7" : "5"}. Assess your confidence in the ${isUpdate ? "updated" : ""} summary quality (0.0-1.0)
|
||||
|
||||
INTENT-ALIGNED ORGANIZATION:
|
||||
- Organize sections based on what serves the space's purpose
- Topics don't need minimum episode counts - relevance to intent matters most
- Each section should provide value aligned with the space's intended use
- For "guidelines" spaces: focus on actionable patterns
- For "tracking" spaces: focus on temporal patterns and changes
- For "learning" spaces: focus on concepts and insights gained
- Let the space's intent drive the structure, not rigid rules

${
  isUpdate
    ? `CONNECTION FOCUS:
- Entity relationships that span across batches/time
- Theme evolution and expansion
- Temporal patterns and progressions
- Contradictions or confirmations of existing insights
- New insights that complement existing knowledge`
    : ""
}

RESPONSE FORMAT:
Provide your response inside <output></output> tags with valid JSON. Include both HTML summary and markdown format.

<output>
{
  "summary": "${isUpdate ? "Updated HTML summary that integrates new insights with existing knowledge. Write factually about what the statements reveal - mention specific entities, relationships, and patterns found in the data. Avoid marketing language. Use HTML tags for structure." : "Factual HTML summary based on patterns found in the statements. Report what the data actually shows - specific entities, relationships, frequencies, and concrete insights. Avoid promotional language. Use HTML tags like <p>, <strong>, <ul>, <li> for structure. Keep it concise and evidence-based."}",
  "keyEntities": ["entity1", "entity2", "entity3"],
  "themes": ["${isUpdate ? 'updated_theme1", "new_theme2", "evolved_theme3' : 'theme1", "theme2", "theme3'}"],
  "confidence": 0.85
}
</output>

JSON FORMATTING RULES:
- HTML content in summary field is allowed and encouraged
- Escape quotes within strings as \"
- Escape HTML angle brackets if needed: &lt; and &gt;
- Use proper HTML tags for structure: <p>, <strong>, <em>, <ul>, <li>, <h3>, etc.
- HTML content should be well-formed and semantic

GUIDELINES:
${
  isUpdate
    ? `- Preserve valuable insights from existing summary
- Integrate new information by highlighting connections
- Themes should evolve naturally, don't replace wholesale
- The updated summary should read as a coherent whole
- Make the summary user-friendly and explain what value this space provides`
    : `- Report only what the episodes actually reveal - be specific and concrete
- Cite actual content and patterns found in the episodes
- Avoid generic descriptions that could apply to any space
- Use neutral, factual language - no "comprehensive", "robust", "cutting-edge" etc.
- Themes must be backed by at least 3 supporting episodes with clear evidence
- Better to have fewer, well-supported themes than many weak ones
- Confidence should reflect actual data quality and coverage, not aspirational goals`
}`,
    },
    {
      role: "user",
      content: `SPACE INFORMATION:
Name: "${spaceName}"
Intent/Purpose: ${spaceDescription || "No specific intent provided - organize naturally based on content"}

${
  isUpdate
    ? `EXISTING SUMMARY:
${previousSummary}

EXISTING THEMES:
${previousThemes.join(", ")}

NEW EPISODES TO INTEGRATE (${episodes.length} episodes):`
    : `EPISODES IN THIS SPACE (${episodes.length} episodes):`
}
${episodesText}

${
  episodes.length > 0
    ? `TOP WORDS BY FREQUENCY:
${topEntities.join(", ")}`
    : ""
}

${
  isUpdate
    ? "Please identify connections between the existing summary and new episodes, then update the summary to integrate the new insights coherently. Organize the summary to SERVE the space's intent/purpose. Remember: only summarize insights from the actual episode content."
    : "Please analyze the episodes and provide a comprehensive summary that SERVES the space's intent/purpose. Organize sections based on what would be most valuable for this space's intended use case. If the intent is unclear, organize naturally based on content patterns. Only summarize insights from actual episode content."
}`,
    },
  ];
}
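
// Editor's sketch (not part of this commit): the RESPONSE FORMAT above implies
// a validation schema roughly like this; the real SummaryResultSchema used by
// parseSummaryResponse below may differ in details.
import { z } from "zod";

const SummaryResultSchemaSketch = z.object({
  summary: z.string(), // HTML summary text
  keyEntities: z.array(z.string()).optional(),
  themes: z.array(z.string()),
  confidence: z.number().min(0).max(1),
});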

async function getExistingSummary(spaceId: string): Promise<{
  summary: string;
  themes: string[];
  lastUpdated: Date;
  contextCount: number;
} | null> {
  try {
    const existingSummary = await getSpace(spaceId);

    if (existingSummary?.summary) {
      return {
        summary: existingSummary.summary,
        themes: existingSummary.themes,
        lastUpdated: existingSummary.summaryGeneratedAt || new Date(),
        contextCount: existingSummary.contextCount || 0,
      };
    }

    return null;
  } catch (error) {
    logger.warn(`Failed to get existing summary for space ${spaceId}:`, {
      error,
    });
    return null;
  }
}

async function getSpaceEpisodes(
  spaceId: string,
  userId: string,
  sinceDate?: Date,
): Promise<SpaceEpisodeData[]> {
  // Query episodes directly using Space-[:HAS_EPISODE]->Episode relationships
  const params: any = { spaceId, userId };

  let dateCondition = "";
  if (sinceDate) {
    dateCondition = "AND e.createdAt > $sinceDate";
    params.sinceDate = sinceDate.toISOString();
  }

  const query = `
    MATCH (space:Space {uuid: $spaceId, userId: $userId})-[:HAS_EPISODE]->(e:Episode {userId: $userId})
    WHERE e IS NOT NULL ${dateCondition}
    RETURN DISTINCT e
    ORDER BY e.createdAt DESC
  `;

  const result = await runQuery(query, params);

  return result.map((record) => {
    const episode = record.get("e").properties;
    return {
      uuid: episode.uuid,
      content: episode.content,
      originalContent: episode.originalContent,
      source: episode.source,
      createdAt: new Date(episode.createdAt),
      validAt: new Date(episode.validAt),
      metadata: JSON.parse(episode.metadata || "{}"),
      sessionId: episode.sessionId,
    };
  });
}
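
// Editor's sketch (not part of this commit): the two helpers above compose
// naturally for incremental updates - fetch only episodes created after the
// last generated summary, or the full set when no summary exists yet.
async function getNewEpisodesSince(
  spaceId: string,
  userId: string,
): Promise<SpaceEpisodeData[]> {
  const existing = await getExistingSummary(spaceId);
  return getSpaceEpisodes(spaceId, userId, existing?.lastUpdated);
}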

function parseSummaryResponse(response: string): {
  summary: string;
  themes: string[];
  confidence: number;
  keyEntities?: string[];
} | null {
  try {
    // Extract content from <output> tags
    const outputMatch = response.match(/<output>([\s\S]*?)<\/output>/);
    if (!outputMatch) {
      logger.warn("No <output> tags found in LLM summary response");
      logger.debug("Full LLM response:", { response });
      return null;
    }

    let jsonContent = outputMatch[1].trim();

    let parsed;
    try {
      parsed = JSON.parse(jsonContent);
    } catch (jsonError) {
      logger.warn("JSON parsing failed, attempting cleanup and retry", {
        originalError: jsonError,
        jsonContent: jsonContent.substring(0, 500) + "...", // Log first 500 chars
      });

      // More aggressive cleanup for malformed JSON
      jsonContent = jsonContent
        .replace(/([^\\])"/g, '$1\\"') // Escape unescaped quotes
        .replace(/^"/g, '\\"') // Escape quotes at start
        .replace(/\\\\"/g, '\\"'); // Fix double-escaped quotes

      parsed = JSON.parse(jsonContent);
    }

    // Validate the response structure
    const validationResult = SummaryResultSchema.safeParse(parsed);
    if (!validationResult.success) {
      logger.warn("Invalid LLM summary response format:", {
        error: validationResult.error,
        parsedData: parsed,
      });
      return null;
    }

    return validationResult.data;
  } catch (error) {
    logger.error(
      "Error parsing LLM summary response:",
      error as Record<string, unknown>,
    );
    logger.debug("Failed response content:", { response });
    return null;
  }
}
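
// Editor's sketch (not part of this commit): how the parser above behaves on a
// well-formed response. The sample string is hypothetical. Note that the
// cleanup retry escapes every unescaped quote - including JSON's structural
// quotes - so genuinely malformed output will usually still throw on the
// second parse and fall through to the outer catch, returning null.
const sampleResponse = `<output>
{
  "summary": "<p><strong>API design</strong> discussions dominate this space.</p>",
  "keyEntities": ["API", "authentication"],
  "themes": ["api_design", "authentication"],
  "confidence": 0.8
}
</output>`;

const parsedSample = parseSummaryResponse(sampleResponse);
// parsedSample?.themes -> ["api_design", "authentication"]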

async function storeSummary(summaryData: SpaceSummaryData): Promise<void> {
  try {
    // Store in PostgreSQL for API access and persistence
    await updateSpace(summaryData);

    // Also store in Neo4j for graph-based queries
    const query = `
      MATCH (space:Space {uuid: $spaceId})
      SET space.summary = $summary,
          space.keyEntities = $keyEntities,
          space.themes = $themes,
          space.summaryConfidence = $confidence,
          space.summaryContextCount = $contextCount,
          space.summaryLastUpdated = datetime($lastUpdated)
      RETURN space
    `;

    await runQuery(query, {
      spaceId: summaryData.spaceId,
      summary: summaryData.summary,
      keyEntities: summaryData.keyEntities,
      themes: summaryData.themes,
      confidence: summaryData.confidence,
      contextCount: summaryData.contextCount,
      lastUpdated: summaryData.lastUpdated.toISOString(),
    });

    logger.info(`Stored summary for space ${summaryData.spaceId}`, {
      themes: summaryData.themes.length,
      keyEntities: summaryData.keyEntities.length,
      confidence: summaryData.confidence,
    });
  } catch (error) {
    logger.error(
      `Error storing summary for space ${summaryData.spaceId}:`,
      error as Record<string, unknown>,
    );
    throw error;
  }
}

/**
 * Process space summary sequentially: ingest document then trigger patterns
 */
async function processSpaceSummarySequentially({
  userId,
  workspaceId,
  spaceId,
  spaceName,
  summaryContent,
  triggerSource,
}: {
  userId: string;
  workspaceId: string;
  spaceId: string;
  spaceName: string;
  summaryContent: string;
  triggerSource:
    | "summary_complete"
    | "manual"
    | "assignment"
    | "scheduled"
    | "new_space"
    | "growth_threshold"
    | "ingestion_complete";
}): Promise<void> {
  // Step 1: Ingest summary as document synchronously
  await ingestSpaceSummaryDocument(spaceId, userId, spaceName, summaryContent);

  logger.info(
    `Successfully ingested space summary document for space ${spaceId}`,
  );

  // Step 2: Now trigger space patterns (patterns will have access to the ingested summary)
  await triggerSpacePattern({
    userId,
    workspaceId,
    spaceId,
    triggerSource,
  });

  logger.info(
    `Sequential processing completed for space ${spaceId}: summary ingested → patterns triggered`,
  );
}

/**
 * Ingest space summary as document synchronously
 */
async function ingestSpaceSummaryDocument(
  spaceId: string,
  userId: string,
  spaceName: string,
  summaryContent: string,
): Promise<void> {
  // Create the ingest body
  const ingestBody = {
    episodeBody: summaryContent,
    referenceTime: new Date().toISOString(),
    metadata: {
      documentType: "space_summary",
      spaceId,
      spaceName,
      generatedAt: new Date().toISOString(),
    },
    source: "space",
    spaceId,
    sessionId: spaceId,
    type: EpisodeType.DOCUMENT,
  };

  // Add to queue
  await addToQueue(ingestBody, userId);

  logger.info(`Queued space summary for synchronous ingestion`);

  return;
}

// Helper function to trigger the task
export async function triggerSpaceSummary(payload: SpaceSummaryPayload) {
  return await spaceSummaryTask.trigger(payload, {

@ -1,4 +1,3 @@
import { type SpacePattern } from "@core/types";
import { prisma } from "./prisma";

export const getSpace = async (spaceId: string) => {
@ -11,22 +10,6 @@ export const getSpace = async (spaceId: string) => {
  return space;
};

export const createSpacePattern = async (
  spaceId: string,
  allPatterns: Omit<
    SpacePattern,
    "id" | "createdAt" | "updatedAt" | "spaceId"
  >[],
) => {
  return await prisma.spacePattern.createMany({
    data: allPatterns.map((pattern) => ({
      ...pattern,
      spaceId,
      userConfirmed: pattern.userConfirmed as any, // Temporary cast until Prisma client is regenerated
    })),
  });
};

export const updateSpace = async (summaryData: {
  spaceId: string;
  summary: string;
@ -41,7 +24,7 @@ export const updateSpace = async (summaryData: {
      summary: summaryData.summary,
      themes: summaryData.themes,
      contextCount: summaryData.contextCount,
      summaryGeneratedAt: new Date().toISOString()
      summaryGeneratedAt: new Date().toISOString(),
    },
  });
};

@ -1,3 +1,4 @@
import { randomUUID } from "node:crypto";
import { EpisodeTypeEnum } from "@core/types";
import { addToQueue } from "~/lib/ingest.server";
import { logger } from "~/services/logger.service";
@ -19,24 +20,24 @@ const SearchParamsSchema = {
    description:
      "Search query optimized for knowledge graph retrieval. Choose the right query structure based on your search intent:\n\n" +
      "1. **Entity-Centric Queries** (Best for graph search):\n" +
      " - ✅ GOOD: \"User's preferences for code style and formatting\"\n" +
      " - ✅ GOOD: \"Project authentication implementation decisions\"\n" +
      " - ❌ BAD: \"user code style\"\n" +
      ' - ✅ GOOD: "User\'s preferences for code style and formatting"\n' +
      ' - ✅ GOOD: "Project authentication implementation decisions"\n' +
      ' - ❌ BAD: "user code style"\n' +
      " - Format: [Person/Project] + [relationship/attribute] + [context]\n\n" +
      "2. **Multi-Entity Relationship Queries** (Excellent for episode graph):\n" +
      " - ✅ GOOD: \"User and team discussions about API design patterns\"\n" +
      " - ✅ GOOD: \"relationship between database schema and performance optimization\"\n" +
      " - ❌ BAD: \"user team api design\"\n" +
      ' - ✅ GOOD: "User and team discussions about API design patterns"\n' +
      ' - ✅ GOOD: "relationship between database schema and performance optimization"\n' +
      ' - ❌ BAD: "user team api design"\n' +
      " - Format: [Entity1] + [relationship type] + [Entity2] + [context]\n\n" +
      "3. **Semantic Question Queries** (Good for vector search):\n" +
      " - ✅ GOOD: \"What causes authentication errors in production? What are the security requirements?\"\n" +
      " - ✅ GOOD: \"How does caching improve API response times compared to direct database queries?\"\n" +
      " - ❌ BAD: \"auth errors production\"\n" +
      ' - ✅ GOOD: "What causes authentication errors in production? What are the security requirements?"\n' +
      ' - ✅ GOOD: "How does caching improve API response times compared to direct database queries?"\n' +
      ' - ❌ BAD: "auth errors production"\n' +
      " - Format: Complete natural questions with full context\n\n" +
      "4. **Concept Exploration Queries** (Good for BFS traversal):\n" +
      " - ✅ GOOD: \"concepts and ideas related to database indexing and query optimization\"\n" +
      " - ✅ GOOD: \"topics connected to user authentication and session management\"\n" +
      " - ❌ BAD: \"database indexing concepts\"\n" +
      ' - ✅ GOOD: "concepts and ideas related to database indexing and query optimization"\n' +
      ' - ✅ GOOD: "topics connected to user authentication and session management"\n' +
      ' - ❌ BAD: "database indexing concepts"\n' +
      " - Format: [concept] + related/connected + [domain/context]\n\n" +
      "Avoid keyword soup queries - use complete phrases with proper context for best results.",
  },
|
||||
},
|
||||
},
|
||||
},
|
||||
{
|
||||
name: "get_session_id",
|
||||
description:
|
||||
"Get a new session ID for the MCP connection. USE THIS TOOL: When you need a session ID and don't have one yet. This generates a unique UUID to identify your MCP session. IMPORTANT: If any other tool requires a sessionId parameter and you don't have one, call this tool first to get a session ID. Returns: A UUID string to use as sessionId.",
|
||||
inputSchema: {
|
||||
type: "object",
|
||||
properties: {
|
||||
new: {
|
||||
type: "boolean",
|
||||
description: "Set to true to get a new sessionId.",
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
{
|
||||
name: "get_integrations",
|
||||
description:
|
||||
@ -162,7 +177,7 @@ export const memoryTools = [
|
||||
{
|
||||
name: "get_integration_actions",
|
||||
description:
|
||||
"Get list of actions available for a specific integration. USE THIS TOOL: After get_integrations to see what operations you can perform. For example, GitHub integration has actions like 'get_pr', 'get_issues', 'create_issue'. HOW TO USE: Provide the integrationSlug from get_integrations (like 'github', 'linear', 'slack'). Returns: Array of actions with name, description, and inputSchema for each.",
|
||||
"Get list of actions available for a specific integration. USE THIS TOOL: After get_integrations to see what operations you can perform. For example, GitHub integration has actions like 'get_pr', 'get_issues', 'create_issue'. HOW TO USE: Provide the integrationSlug from get_integrations (like 'github', 'linear', 'slack').",
|
||||
inputSchema: {
|
||||
type: "object",
|
||||
properties: {
|
||||
@ -178,7 +193,7 @@ export const memoryTools = [
|
||||
{
|
||||
name: "execute_integration_action",
|
||||
description:
|
||||
"Execute an action on an integration (fetch GitHub PR, create Linear issue, send Slack message, etc.). USE THIS TOOL: After using get_integration_actions to see available actions. HOW TO USE: 1) Set integrationSlug (like 'github'), 2) Set action name (like 'get_pr'), 3) Set arguments object with required parameters from the action's inputSchema. Returns: Result of the action execution.",
|
||||
"Execute an action on an integration (fetch GitHub PR, create Linear issue, send Slack message, etc.). USE THIS TOOL: After using get_integration_actions to see available actions. HOW TO USE: 1) Set integrationSlug (like 'github'), 2) Set action name (like 'get_pr'), 3) Set arguments object with required parameters from the action's inputSchema.",
|
||||
inputSchema: {
|
||||
type: "object",
|
||||
properties: {
|
||||
@ -243,6 +258,8 @@ export async function callMemoryTool(
|
||||
return await handleUserProfile(userId);
|
||||
case "memory_get_space":
|
||||
return await handleGetSpace({ ...args, userId });
|
||||
case "get_session_id":
|
||||
return await handleGetSessionId();
|
||||
case "get_integrations":
|
||||
return await handleGetIntegrations({ ...args, userId });
|
||||
case "get_integration_actions":
|
||||
@ -489,6 +506,35 @@ async function handleGetSpace(args: any) {
|
||||
}
|
||||
}
|
||||
|
||||
// Handler for get_session_id
|
||||
async function handleGetSessionId() {
|
||||
try {
|
||||
const sessionId = randomUUID();
|
||||
|
||||
return {
|
||||
content: [
|
||||
{
|
||||
type: "text",
|
||||
text: JSON.stringify({ sessionId }),
|
||||
},
|
||||
],
|
||||
isError: false,
|
||||
};
|
||||
} catch (error) {
|
||||
logger.error(`MCP get session id error: ${error}`);
|
||||
|
||||
return {
|
||||
content: [
|
||||
{
|
||||
type: "text",
|
||||
text: `Error generating session ID: ${error instanceof Error ? error.message : String(error)}`,
|
||||
},
|
||||
],
|
||||
isError: true,
|
||||
};
|
||||
}
|
||||
}
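
// Editor's sketch (not part of this commit): the intended session flow, assuming
// callMemoryTool(name, args, userId) as the switch above suggests. Obtain a
// sessionId once, then supply it to any tool whose inputSchema requires one.
async function exampleSessionFlow(userId: string) {
  const result = await callMemoryTool("get_session_id", { new: true }, userId);
  const { sessionId } = JSON.parse(result.content[0].text);
  return sessionId;
}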

// Handler for get_integrations
async function handleGetIntegrations(args: any) {
  try {

@ -1,6 +1,6 @@
import { logger } from "~/services/logger.service";
import { fetchAndSaveStdioIntegrations } from "~/trigger/utils/mcp";
import { initNeo4jSchemaOnce } from "~/lib/neo4j.server";
import { initNeo4jSchemaOnce, verifyConnectivity } from "~/lib/neo4j.server";
import { env } from "~/env.server";
import { initWorkers, shutdownWorkers } from "~/bullmq/start-workers";
import { trackConfig } from "~/services/telemetry.server";
@ -8,6 +8,31 @@ import { trackConfig } from "~/services/telemetry.server";
// Global flag to ensure startup only runs once per server process
let startupInitialized = false;

/**
 * Wait for Neo4j to be ready before initializing schema
 */
async function waitForNeo4j(maxRetries = 30, retryDelay = 2000) {
  logger.info("Waiting for Neo4j to be ready...");

  for (let i = 0; i < maxRetries; i++) {
    try {
      const connected = await verifyConnectivity();
      if (connected) {
        logger.info("✓ Neo4j is ready!");
        return true;
      }
    } catch (error) {
      // Connection failed, will retry
    }

    logger.info(`Neo4j not ready, retrying... (${i + 1}/${maxRetries})`);
    await new Promise((resolve) => setTimeout(resolve, retryDelay));
  }

  logger.error("Failed to connect to Neo4j after maximum retries");
  throw new Error("Failed to connect to Neo4j after maximum retries");
}

/**
 * Initialize all startup services once per server process
 * Safe to call multiple times - will only run initialization once
@ -80,6 +105,9 @@ export async function initializeStartupServices() {
  try {
    logger.info("Starting application initialization...");

    // Wait for Neo4j to be ready
    await waitForNeo4j();

    // Initialize Neo4j schema
    await initNeo4jSchemaOnce();
    logger.info("Neo4j schema initialization completed");

@ -14,9 +14,7 @@ description: "Get started with CORE in 5 minutes"

## Requirements

These are the minimum requirements for running the webapp and background job components. They can run on the same, or on separate machines.

It's fine to run everything on the same machine for testing. To be able to scale your workers, you will want to run them separately.
These are the minimum requirements for running CORE.

### Prerequisites

@ -27,7 +25,6 @@ To run CORE, you will need:

### System Requirements

**Webapp & Database Machine:**
- 4+ vCPU
- 8+ GB RAM
- 20+ GB Storage
@ -41,7 +38,7 @@ CORE offers multiple deployment approaches depending on your needs:

For a one-click deployment experience, use Railway:

[](https://railway.com/deploy/6aEd9C?referralCode=LHvbIb&utm_medium=integration&utm_source=template&utm_campaign=generic)
[](https://railway.com/deploy/core?referralCode=LHvbIb&utm_medium=integration&utm_source=template&utm_campaign=generic)

Railway will automatically set up all required services and handle the infrastructure for you.

@ -16,7 +16,7 @@ We provide version-tagged releases for self-hosted deployments. It's highly advi

For a quick one-click deployment, you can use Railway:

[](https://railway.com/deploy/6aEd9C?referralCode=LHvbIb&utm_medium=integration&utm_source=template&utm_campaign=generic)
[](https://railway.com/deploy/core?referralCode=LHvbIb&utm_medium=integration&utm_source=template&utm_campaign=generic)

Alternatively, you can follow our [Docker deployment guide](/self-hosting/docker) for manual setup.

@ -8,55 +8,55 @@ x-logging: &logging-config
version: "3.8"

services:
  core:
    container_name: core-app
    image: redplanethq/core:${VERSION}
    environment:
      - NODE_ENV=${NODE_ENV}
      - DATABASE_URL=${DATABASE_URL}
      - DIRECT_URL=${DIRECT_URL}
      - SESSION_SECRET=${SESSION_SECRET}
      - ENCRYPTION_KEY=${ENCRYPTION_KEY}
      - MAGIC_LINK_SECRET=${MAGIC_LINK_SECRET}
      - LOGIN_ORIGIN=${CORE_LOGIN_ORIGIN}
      - APP_ORIGIN=${CORE_APP_ORIGIN}
      - REDIS_HOST=${REDIS_HOST}
      - REDIS_PORT=${REDIS_PORT}
      - REDIS_PASSWORD=${REDIS_PASSWORD}
      - REDIS_TLS_DISABLED=${REDIS_TLS_DISABLED}
      - NEO4J_URI=${NEO4J_URI}
      - NEO4J_USERNAME=${NEO4J_USERNAME}
      - NEO4J_PASSWORD=${NEO4J_PASSWORD}
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - AUTH_GOOGLE_CLIENT_ID=${AUTH_GOOGLE_CLIENT_ID}
      - AUTH_GOOGLE_CLIENT_SECRET=${AUTH_GOOGLE_CLIENT_SECRET}
      - ENABLE_EMAIL_LOGIN=${ENABLE_EMAIL_LOGIN}
      - OLLAMA_URL=${OLLAMA_URL}
      - EMBEDDING_MODEL=${EMBEDDING_MODEL}
      - MODEL=${MODEL}
      - TRIGGER_PROJECT_ID=${TRIGGER_PROJECT_ID}
      - TRIGGER_SECRET_KEY=${TRIGGER_SECRET_KEY}
      - TRIGGER_API_URL=${API_ORIGIN}
      - POSTGRES_DB=${POSTGRES_DB}
      - EMAIL_TRANSPORT=${EMAIL_TRANSPORT}
      - REPLY_TO_EMAIL=${REPLY_TO_EMAIL}
      - FROM_EMAIL=${FROM_EMAIL}
      - RESEND_API_KEY=${RESEND_API_KEY}
      - COHERE_API_KEY=${COHERE_API_KEY}
      - QUEUE_PROVIDER=${QUEUE_PROVIDER}
      - TELEMETRY_ENABLED=${TELEMETRY_ENABLED}
      - TELEMETRY_ANONYMOUS=${TELEMETRY_ANONYMOUS}
    ports:
      - "3033:3000"
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_started
      neo4j:
        condition: service_healthy
    networks:
      - core
  # core:
  #   container_name: core-app
  #   image: redplanethq/core:${VERSION}
  #   environment:
  #     - NODE_ENV=${NODE_ENV}
  #     - DATABASE_URL=${DATABASE_URL}
  #     - DIRECT_URL=${DIRECT_URL}
  #     - SESSION_SECRET=${SESSION_SECRET}
  #     - ENCRYPTION_KEY=${ENCRYPTION_KEY}
  #     - MAGIC_LINK_SECRET=${MAGIC_LINK_SECRET}
  #     - LOGIN_ORIGIN=${CORE_LOGIN_ORIGIN}
  #     - APP_ORIGIN=${CORE_APP_ORIGIN}
  #     - REDIS_HOST=${REDIS_HOST}
  #     - REDIS_PORT=${REDIS_PORT}
  #     - REDIS_PASSWORD=${REDIS_PASSWORD}
  #     - REDIS_TLS_DISABLED=${REDIS_TLS_DISABLED}
  #     - NEO4J_URI=${NEO4J_URI}
  #     - NEO4J_USERNAME=${NEO4J_USERNAME}
  #     - NEO4J_PASSWORD=${NEO4J_PASSWORD}
  #     - OPENAI_API_KEY=${OPENAI_API_KEY}
  #     - AUTH_GOOGLE_CLIENT_ID=${AUTH_GOOGLE_CLIENT_ID}
  #     - AUTH_GOOGLE_CLIENT_SECRET=${AUTH_GOOGLE_CLIENT_SECRET}
  #     - ENABLE_EMAIL_LOGIN=${ENABLE_EMAIL_LOGIN}
  #     - OLLAMA_URL=${OLLAMA_URL}
  #     - EMBEDDING_MODEL=${EMBEDDING_MODEL}
  #     - MODEL=${MODEL}
  #     - TRIGGER_PROJECT_ID=${TRIGGER_PROJECT_ID}
  #     - TRIGGER_SECRET_KEY=${TRIGGER_SECRET_KEY}
  #     - TRIGGER_API_URL=${API_ORIGIN}
  #     - POSTGRES_DB=${POSTGRES_DB}
  #     - EMAIL_TRANSPORT=${EMAIL_TRANSPORT}
  #     - REPLY_TO_EMAIL=${REPLY_TO_EMAIL}
  #     - FROM_EMAIL=${FROM_EMAIL}
  #     - RESEND_API_KEY=${RESEND_API_KEY}
  #     - COHERE_API_KEY=${COHERE_API_KEY}
  #     - QUEUE_PROVIDER=${QUEUE_PROVIDER}
  #     - TELEMETRY_ENABLED=${TELEMETRY_ENABLED}
  #     - TELEMETRY_ANONYMOUS=${TELEMETRY_ANONYMOUS}
  #   ports:
  #     - "3033:3000"
  #   depends_on:
  #     postgres:
  #       condition: service_healthy
  #     redis:
  #       condition: service_started
  #     neo4j:
  #       condition: service_healthy
  #   networks:
  #     - core

  postgres:
    container_name: core-postgres