mirror of
https://github.com/eliasstepanik/core.git
synced 2026-01-09 22:48:39 +00:00
Feat: Space v3
* feat: space v3 * feat: connected space creation * fix: * fix: session_id for memory ingestion * chore: simplify gitignore patterns for agent directories --------- Co-authored-by: Manoj <saimanoj58@gmail.com>
This commit is contained in:
parent
c5407be54d
commit
c869096be8
@ -41,10 +41,7 @@ NEO4J_USERNAME=neo4j
|
||||
NEO4J_PASSWORD=27192e6432564f4788d55c15131bd5ac
|
||||
OPENAI_API_KEY=
|
||||
|
||||
|
||||
MAGIC_LINK_SECRET=27192e6432564f4788d55c15131bd5ac
|
||||
|
||||
|
||||
NEO4J_AUTH=neo4j/27192e6432564f4788d55c15131bd5ac
|
||||
OLLAMA_URL=http://ollama:11434
|
||||
|
||||
|
||||
26
.github/workflows/build-docker-image.yml
vendored
26
.github/workflows/build-docker-image.yml
vendored
@ -7,32 +7,6 @@ on:
|
||||
workflow_dispatch:
|
||||
|
||||
jobs:
|
||||
build-init:
|
||||
runs-on: ubuntu-latest
|
||||
|
||||
steps:
|
||||
- uses: actions/checkout@v2
|
||||
with:
|
||||
ref: main
|
||||
|
||||
- name: Set up QEMU
|
||||
uses: docker/setup-qemu-action@v1
|
||||
|
||||
- name: Set up Docker Buildx
|
||||
uses: docker/setup-buildx-action@v1
|
||||
|
||||
- name: Login to Docker Registry
|
||||
run: echo "${{ secrets.DOCKER_PASSWORD }}" | docker login -u "${{ secrets.DOCKER_USERNAME }}" --password-stdin
|
||||
|
||||
- name: Build and Push Frontend Docker Image
|
||||
uses: docker/build-push-action@v2
|
||||
with:
|
||||
context: .
|
||||
file: ./apps/init/Dockerfile
|
||||
platforms: linux/amd64,linux/arm64
|
||||
push: true
|
||||
tags: redplanethq/init:${{ github.ref_name }}
|
||||
|
||||
build-webapp:
|
||||
runs-on: ubuntu-latest
|
||||
|
||||
|
||||
17
.gitignore
vendored
17
.gitignore
vendored
@ -46,13 +46,14 @@ registry/
|
||||
|
||||
.cursor
|
||||
CLAUDE.md
|
||||
AGENTS.md
|
||||
|
||||
.claude
|
||||
.clinerules/byterover-rules.md
|
||||
.kilocode/rules/byterover-rules.md
|
||||
.roo/rules/byterover-rules.md
|
||||
.windsurf/rules/byterover-rules.md
|
||||
.cursor/rules/byterover-rules.mdc
|
||||
.kiro/steering/byterover-rules.md
|
||||
.qoder/rules/byterover-rules.md
|
||||
.augment/rules/byterover-rules.md
|
||||
.clinerules
|
||||
.kilocode
|
||||
.roo
|
||||
.windsurf
|
||||
.cursor
|
||||
.kiro
|
||||
.qoder
|
||||
.augment
|
||||
51
README.md
51
README.md
@ -55,7 +55,7 @@ CORE memory achieves **88.24%** average accuracy in Locomo dataset across all re
|
||||
|
||||
## Overview
|
||||
|
||||
**Problem**
|
||||
**Problem**
|
||||
|
||||
Developers waste time re-explaining context to AI tools. Hit token limits in Claude? Start fresh and lose everything. Switch from ChatGPT/Claude to Cursor? Explain your context again. Your conversations, decisions, and insights vanish between sessions. With every new AI tool, the cost of context switching grows.
|
||||
|
||||
@ -64,6 +64,7 @@ Developers waste time re-explaining context to AI tools. Hit token limits in Cla
|
||||
CORE is an open-source unified, persistent memory layer for all your AI tools. Your context follows you from Cursor to Claude to ChatGPT to Claude Code. One knowledge graph remembers who said what, when, and why. Connect once, remember everywhere. Stop managing context and start building.
|
||||
|
||||
## 🚀 CORE Self-Hosting
|
||||
|
||||
Want to run CORE on your own infrastructure? Self-hosting gives you complete control over your data and deployment.
|
||||
|
||||
**Quick Deploy Options:**
|
||||
@ -80,15 +81,20 @@ Want to run CORE on your own infrastructure? Self-hosting gives you complete con
|
||||
### Setup
|
||||
|
||||
1. Clone the repository:
|
||||
|
||||
```
|
||||
git clone https://github.com/RedPlanetHQ/core.git
|
||||
cd core
|
||||
```
|
||||
|
||||
2. Configure environment variables in `core/.env`:
|
||||
|
||||
```
|
||||
OPENAI_API_KEY=your_openai_api_key
|
||||
```
|
||||
|
||||
3. Start the service
|
||||
|
||||
```
|
||||
docker-compose up -d
|
||||
```
|
||||
@ -100,6 +106,7 @@ Once deployed, you can configure your AI providers (OpenAI, Anthropic) and start
|
||||
Note: We tried open-source models like Ollama or GPT OSS but facts generation were not good, we are still figuring out how to improve on that and then will also support OSS models.
|
||||
|
||||
## 🚀 CORE Cloud
|
||||
|
||||
**Build your unified memory graph in 5 minutes:**
|
||||
|
||||
Don't want to manage infrastructure? CORE Cloud lets you build your personal memory system instantly - no setup, no servers, just memory that works.
|
||||
@ -115,24 +122,24 @@ Don't want to manage infrastructure? CORE Cloud lets you build your personal mem
|
||||
|
||||
## 🧩 Key Features
|
||||
|
||||
### 🧠 **Unified, Portable Memory**:
|
||||
### 🧠 **Unified, Portable Memory**:
|
||||
|
||||
Add and recall your memory across **Cursor, Windsurf, Claude Desktop, Claude Code, Gemini CLI, AWS's Kiro, VS Code, and Roo Code** via MCP
|
||||
|
||||

|
||||
|
||||
|
||||
### 🕸️ **Temporal + Reified Knowledge Graph**:
|
||||
### 🕸️ **Temporal + Reified Knowledge Graph**:
|
||||
|
||||
Remember the story behind every fact—track who said what, when, and why with rich relationships and full provenance, not just flat storage
|
||||
|
||||

|
||||
|
||||
|
||||
### 🌐 **Browser Extension**:
|
||||
### 🌐 **Browser Extension**:
|
||||
|
||||
Save conversations and content from ChatGPT, Grok, Gemini, Twitter, YouTube, blog posts, and any webpage directly into your CORE memory.
|
||||
|
||||
**How to Use Extension**
|
||||
|
||||
1. [Download the Extension](https://chromewebstore.google.com/detail/core-extension/cglndoindnhdbfcbijikibfjoholdjcc) from the Chrome Web Store.
|
||||
2. Login to [CORE dashboard](https://core.heysol.ai)
|
||||
- Navigate to Settings (bottom left)
|
||||
@ -141,13 +148,12 @@ Save conversations and content from ChatGPT, Grok, Gemini, Twitter, YouTube, blo
|
||||
|
||||
https://github.com/user-attachments/assets/6e629834-1b9d-4fe6-ae58-a9068986036a
|
||||
|
||||
### 💬 **Chat with Memory**:
|
||||
|
||||
### 💬 **Chat with Memory**:
|
||||
Ask questions like "What are my writing preferences?" with instant insights from your connected knowledge
|
||||
|
||||

|
||||
|
||||
|
||||
### ⚡ **Auto-Sync from Apps**:
|
||||
|
||||
Automatically capture relevant context from Linear, Slack, Notion, GitHub and other connected apps into your CORE memory
|
||||
@ -156,16 +162,12 @@ Automatically capture relevant context from Linear, Slack, Notion, GitHub and ot
|
||||
|
||||

|
||||
|
||||
|
||||
### 🔗 **MCP Integration Hub**:
|
||||
### 🔗 **MCP Integration Hub**:
|
||||
|
||||
Connect Linear, Slack, GitHub, Notion once to CORE—then use all their tools in Claude, Cursor, or any MCP client with a single URL
|
||||
|
||||
|
||||

|
||||
|
||||
|
||||
|
||||
## How CORE create memory
|
||||
|
||||
<img width="12885" height="3048" alt="memory-ingest-diagram" src="https://github.com/user-attachments/assets/c51679de-8260-4bee-bebf-aff32c6b8e13" />
|
||||
@ -179,7 +181,6 @@ CORE’s ingestion pipeline has four phases designed to capture evolving context
|
||||
|
||||
The Result: Instead of a flat database, CORE gives you a memory that grows and changes with you - preserving context, evolution, and ownership so agents can actually use it.
|
||||
|
||||
|
||||

|
||||
|
||||
## How CORE recalls from memory
|
||||
@ -204,7 +205,7 @@ Explore our documentation to get the most out of CORE
|
||||
- [Connect Core MCP with Claude](https://docs.heysol.ai/providers/claude)
|
||||
- [Connect Core MCP with Cursor](https://docs.heysol.ai/providers/cursor)
|
||||
- [Connect Core MCP with Claude Code](https://docs.heysol.ai/providers/claude-code)
|
||||
- [Connect Core MCP with Codex](https://docs.heysol.ai/providers/codex)
|
||||
- [Connect Core MCP with Codex](https://docs.heysol.ai/providers/codex)
|
||||
|
||||
- [Basic Concepts](https://docs.heysol.ai/overview)
|
||||
- [API Reference](https://docs.heysol.ai/api-reference/get-user-profile)
|
||||
@ -249,21 +250,11 @@ Have questions or feedback? We're here to help:
|
||||
<a href="https://github.com/RedPlanetHQ/core/graphs/contributors">
|
||||
<img src="https://contrib.rocks/image?repo=RedPlanetHQ/core" />
|
||||
</a>
|
||||
<<<<<<< Updated upstream
|
||||
|
||||
<<<<<<< HEAD
|
||||
|
||||
# =======
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
> > > > > > > Stashed changes
|
||||
> > > > > > > 62db6c1 (feat: automatic space identification)
|
||||
|
||||
51
apps/init/.gitignore
vendored
51
apps/init/.gitignore
vendored
@ -1,51 +0,0 @@
|
||||
# See https://help.github.com/articles/ignoring-files/ for more about ignoring files.
|
||||
|
||||
# Dependencies
|
||||
node_modules
|
||||
.pnp
|
||||
.pnp.js
|
||||
|
||||
# Local env files
|
||||
.env
|
||||
.env.local
|
||||
.env.development.local
|
||||
.env.test.local
|
||||
.env.production.local
|
||||
|
||||
# Testing
|
||||
coverage
|
||||
|
||||
# Turbo
|
||||
.turbo
|
||||
|
||||
# Vercel
|
||||
.vercel
|
||||
|
||||
# Build Outputs
|
||||
.next/
|
||||
out/
|
||||
build
|
||||
dist
|
||||
.tshy/
|
||||
.tshy-build/
|
||||
|
||||
# Debug
|
||||
npm-debug.log*
|
||||
yarn-debug.log*
|
||||
yarn-error.log*
|
||||
|
||||
# Misc
|
||||
.DS_Store
|
||||
*.pem
|
||||
|
||||
docker-compose.dev.yaml
|
||||
|
||||
clickhouse/
|
||||
.vscode/
|
||||
registry/
|
||||
|
||||
.cursor
|
||||
CLAUDE.md
|
||||
|
||||
.claude
|
||||
|
||||
@ -1,70 +0,0 @@
|
||||
ARG NODE_IMAGE=node:20.11.1-bullseye-slim@sha256:5a5a92b3a8d392691c983719dbdc65d9f30085d6dcd65376e7a32e6fe9bf4cbe
|
||||
|
||||
FROM ${NODE_IMAGE} AS pruner
|
||||
|
||||
WORKDIR /core
|
||||
|
||||
COPY --chown=node:node . .
|
||||
RUN npx -q turbo@2.5.3 prune --scope=@redplanethq/init --docker
|
||||
RUN find . -name "node_modules" -type d -prune -exec rm -rf '{}' +
|
||||
|
||||
# Base strategy to have layer caching
|
||||
FROM ${NODE_IMAGE} AS base
|
||||
RUN apt-get update && apt-get install -y openssl dumb-init postgresql-client
|
||||
WORKDIR /core
|
||||
COPY --chown=node:node .gitignore .gitignore
|
||||
COPY --from=pruner --chown=node:node /core/out/json/ .
|
||||
COPY --from=pruner --chown=node:node /core/out/pnpm-lock.yaml ./pnpm-lock.yaml
|
||||
COPY --from=pruner --chown=node:node /core/out/pnpm-workspace.yaml ./pnpm-workspace.yaml
|
||||
|
||||
## Dev deps
|
||||
FROM base AS dev-deps
|
||||
WORKDIR /core
|
||||
# Corepack is used to install pnpm
|
||||
RUN corepack enable
|
||||
ENV NODE_ENV development
|
||||
RUN pnpm install --ignore-scripts --no-frozen-lockfile
|
||||
|
||||
## Production deps
|
||||
FROM base AS production-deps
|
||||
WORKDIR /core
|
||||
# Corepack is used to install pnpm
|
||||
RUN corepack enable
|
||||
ENV NODE_ENV production
|
||||
RUN pnpm install --prod --no-frozen-lockfile
|
||||
|
||||
## Builder (builds the init CLI)
|
||||
FROM base AS builder
|
||||
WORKDIR /core
|
||||
# Corepack is used to install pnpm
|
||||
RUN corepack enable
|
||||
|
||||
COPY --from=pruner --chown=node:node /core/out/full/ .
|
||||
COPY --from=dev-deps --chown=node:node /core/ .
|
||||
COPY --chown=node:node turbo.json turbo.json
|
||||
COPY --chown=node:node .configs/tsconfig.base.json .configs/tsconfig.base.json
|
||||
RUN pnpm run build --filter=@redplanethq/init...
|
||||
|
||||
# Runner
|
||||
FROM ${NODE_IMAGE} AS runner
|
||||
RUN apt-get update && apt-get install -y openssl postgresql-client ca-certificates
|
||||
WORKDIR /core
|
||||
RUN corepack enable
|
||||
ENV NODE_ENV production
|
||||
|
||||
COPY --from=base /usr/bin/dumb-init /usr/bin/dumb-init
|
||||
COPY --from=pruner --chown=node:node /core/out/full/ .
|
||||
COPY --from=production-deps --chown=node:node /core .
|
||||
COPY --from=builder --chown=node:node /core/apps/init/dist ./apps/init/dist
|
||||
|
||||
# Copy the trigger dump file
|
||||
COPY --chown=node:node apps/init/trigger.dump ./apps/init/trigger.dump
|
||||
|
||||
# Copy and set up entrypoint script
|
||||
COPY --chown=node:node apps/init/entrypoint.sh ./apps/init/entrypoint.sh
|
||||
RUN chmod +x ./apps/init/entrypoint.sh
|
||||
|
||||
USER node
|
||||
WORKDIR /core/apps/init
|
||||
ENTRYPOINT ["dumb-init", "--"]
|
||||
CMD ["./entrypoint.sh"]
|
||||
@ -1,197 +0,0 @@
|
||||
# Core CLI
|
||||
|
||||
🧠 **CORE - Contextual Observation & Recall Engine**
|
||||
|
||||
A Command-Line Interface for setting up and managing the Core development environment.
|
||||
|
||||
## Installation
|
||||
|
||||
```bash
|
||||
npm install -g @redplanethq/core
|
||||
```
|
||||
|
||||
## Commands
|
||||
|
||||
### `core init`
|
||||
|
||||
**One-time setup command** - Initializes the Core development environment with full configuration.
|
||||
|
||||
### `core start`
|
||||
|
||||
**Daily usage command** - Starts all Core services (Docker containers).
|
||||
|
||||
### `core stop`
|
||||
|
||||
**Daily usage command** - Stops all Core services (Docker containers).
|
||||
|
||||
## Getting Started
|
||||
|
||||
### Prerequisites
|
||||
|
||||
- **Node.js** (v18.20.0 or higher)
|
||||
- **Docker** and **Docker Compose**
|
||||
- **Git**
|
||||
- **pnpm** package manager
|
||||
|
||||
### Initial Setup
|
||||
|
||||
1. **Clone the Core repository:**
|
||||
```bash
|
||||
git clone https://github.com/redplanethq/core.git
|
||||
cd core
|
||||
```
|
||||
|
||||
2. **Run the initialization command:**
|
||||
```bash
|
||||
core init
|
||||
```
|
||||
|
||||
3. **The CLI will guide you through the complete setup process:**
|
||||
|
||||
#### Step 1: Prerequisites Check
|
||||
- The CLI shows a checklist of required tools
|
||||
- Confirms you're in the Core repository directory
|
||||
- Exits with instructions if prerequisites aren't met
|
||||
|
||||
#### Step 2: Environment Configuration
|
||||
|
||||
- Copies `.env.example` to `.env` in the root directory
|
||||
- Copies `trigger/.env.example` to `trigger/.env`
|
||||
- Skips copying if `.env` files already exist
|
||||
|
||||
#### Step 3: Docker Services Startup
|
||||
|
||||
- Starts main Core services: `docker compose up -d`
|
||||
- Starts Trigger.dev services: `docker compose up -d` (in trigger/ directory)
|
||||
- Shows real-time output with progress indicators
|
||||
|
||||
#### Step 4: Database Health Check
|
||||
|
||||
- Verifies PostgreSQL is running on `localhost:5432`
|
||||
- Retries for up to 60 seconds if needed
|
||||
|
||||
#### Step 5: Trigger.dev Setup (Interactive)
|
||||
|
||||
- **If Trigger.dev is not configured:**
|
||||
|
||||
1. Prompts you to open http://localhost:8030
|
||||
2. Asks you to login to Trigger.dev
|
||||
3. Guides you to create an organization and project
|
||||
4. Collects your Project ID and Secret Key
|
||||
5. Updates `.env` with your Trigger.dev configuration
|
||||
6. Restarts Core services with new configuration
|
||||
|
||||
- **If Trigger.dev is already configured:**
|
||||
- Skips setup and shows "Configuration already exists" message
|
||||
|
||||
#### Step 6: Docker Registry Login
|
||||
|
||||
- Displays docker login command with credentials from `.env`
|
||||
- Waits for you to complete the login process
|
||||
|
||||
#### Step 7: Trigger.dev Task Deployment
|
||||
|
||||
- Automatically runs: `npx trigger.dev@v4-beta login -a http://localhost:8030`
|
||||
- Deploys tasks with: `pnpm trigger:deploy`
|
||||
- Shows manual deployment instructions if automatic deployment fails
|
||||
|
||||
#### Step 8: Setup Complete!
|
||||
|
||||
- Confirms all services are running
|
||||
- Shows service URLs and connection information
|
||||
|
||||
## Daily Usage
|
||||
|
||||
After initial setup, use these commands for daily development:
|
||||
|
||||
### Start Services
|
||||
|
||||
```bash
|
||||
core start
|
||||
```
|
||||
|
||||
Starts all Docker containers for Core development.
|
||||
|
||||
### Stop Services
|
||||
|
||||
```bash
|
||||
core stop
|
||||
```
|
||||
|
||||
Stops all Docker containers.
|
||||
|
||||
## Service URLs
|
||||
|
||||
After setup, these services will be available:
|
||||
|
||||
- **Core Application**: http://localhost:3033
|
||||
- **Trigger.dev**: http://localhost:8030
|
||||
- **PostgreSQL**: localhost:5432
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Repository Not Found
|
||||
|
||||
If you run commands outside the Core repository:
|
||||
|
||||
- The CLI will ask you to confirm you're in the Core repository
|
||||
- If not, it provides instructions to clone the repository
|
||||
- Navigate to the Core repository directory before running commands again
|
||||
|
||||
### Docker Issues
|
||||
|
||||
- Ensure Docker is running
|
||||
- Check Docker Compose is installed
|
||||
- Verify you have sufficient system resources
|
||||
|
||||
### Trigger.dev Setup Issues
|
||||
|
||||
- Check container logs: `docker logs trigger-webapp --tail 50`
|
||||
- Ensure you can access http://localhost:8030
|
||||
- Verify your network allows connections to localhost
|
||||
|
||||
### Environment Variables
|
||||
|
||||
The CLI automatically manages these environment variables:
|
||||
|
||||
- `TRIGGER_PROJECT_ID` - Your Trigger.dev project ID
|
||||
- `TRIGGER_SECRET_KEY` - Your Trigger.dev secret key
|
||||
- Docker registry credentials for deployment
|
||||
|
||||
### Manual Trigger.dev Deployment
|
||||
|
||||
If automatic deployment fails, run manually:
|
||||
|
||||
```bash
|
||||
npx trigger.dev@v4-beta login -a http://localhost:8030
|
||||
pnpm trigger:deploy
|
||||
```
|
||||
|
||||
## Development Workflow
|
||||
|
||||
1. **First time setup:** `core init`
|
||||
2. **Daily development:**
|
||||
- `core start` - Start your development environment
|
||||
- Do your development work
|
||||
- `core stop` - Stop services when done
|
||||
|
||||
## Support
|
||||
|
||||
For issues and questions:
|
||||
|
||||
- Check the main Core repository: https://github.com/redplanethq/core
|
||||
- Review Docker container logs for troubleshooting
|
||||
- Ensure all prerequisites are properly installed
|
||||
|
||||
## Features
|
||||
|
||||
- 🚀 **One-command setup** - Complete environment initialization
|
||||
- 🔄 **Smart configuration** - Skips already configured components
|
||||
- 📱 **Real-time feedback** - Live progress indicators and output
|
||||
- 🐳 **Docker integration** - Full container lifecycle management
|
||||
- 🔧 **Interactive setup** - Guided configuration process
|
||||
- 🎯 **Error handling** - Graceful failure with recovery instructions
|
||||
|
||||
---
|
||||
|
||||
**Happy coding with Core!** 🎉
|
||||
@ -1,22 +0,0 @@
|
||||
#!/bin/sh
|
||||
|
||||
# Exit on any error
|
||||
set -e
|
||||
|
||||
echo "Starting init CLI..."
|
||||
|
||||
# Wait for database to be ready
|
||||
echo "Waiting for database connection..."
|
||||
until pg_isready -h "${DB_HOST:-localhost}" -p "${DB_PORT:-5432}" -U "${POSTGRES_USER:-docker}"; do
|
||||
echo "Database is unavailable - sleeping"
|
||||
sleep 2
|
||||
done
|
||||
|
||||
echo "Database is ready!"
|
||||
|
||||
# Run the init command
|
||||
echo "Running init command..."
|
||||
node ./dist/esm/index.js init
|
||||
|
||||
echo "Init completed successfully!"
|
||||
exit 0
|
||||
@ -1,145 +0,0 @@
|
||||
{
|
||||
"name": "@redplanethq/init",
|
||||
"version": "0.1.0",
|
||||
"description": "A init service to create trigger instance",
|
||||
"type": "module",
|
||||
"license": "MIT",
|
||||
"repository": {
|
||||
"type": "git",
|
||||
"url": "https://github.com/redplanethq/core",
|
||||
"directory": "apps/init"
|
||||
},
|
||||
"publishConfig": {
|
||||
"access": "public"
|
||||
},
|
||||
"keywords": [
|
||||
"typescript"
|
||||
],
|
||||
"files": [
|
||||
"dist",
|
||||
"trigger.dump"
|
||||
],
|
||||
"bin": {
|
||||
"core": "./dist/esm/index.js"
|
||||
},
|
||||
"tshy": {
|
||||
"selfLink": false,
|
||||
"main": false,
|
||||
"module": false,
|
||||
"dialects": [
|
||||
"esm"
|
||||
],
|
||||
"project": "./tsconfig.json",
|
||||
"exclude": [
|
||||
"**/*.test.ts"
|
||||
],
|
||||
"exports": {
|
||||
"./package.json": "./package.json",
|
||||
".": "./src/index.ts"
|
||||
}
|
||||
},
|
||||
"devDependencies": {
|
||||
"@epic-web/test-server": "^0.1.0",
|
||||
"@types/gradient-string": "^1.1.2",
|
||||
"@types/ini": "^4.1.1",
|
||||
"@types/object-hash": "3.0.6",
|
||||
"@types/polka": "^0.5.7",
|
||||
"@types/react": "^18.2.48",
|
||||
"@types/resolve": "^1.20.6",
|
||||
"@types/rimraf": "^4.0.5",
|
||||
"@types/semver": "^7.5.0",
|
||||
"@types/source-map-support": "0.5.10",
|
||||
"@types/ws": "^8.5.3",
|
||||
"cpy-cli": "^5.0.0",
|
||||
"execa": "^8.0.1",
|
||||
"find-up": "^7.0.0",
|
||||
"rimraf": "^5.0.7",
|
||||
"ts-essentials": "10.0.1",
|
||||
"tshy": "^3.0.2",
|
||||
"tsx": "4.17.0"
|
||||
},
|
||||
"scripts": {
|
||||
"clean": "rimraf dist .tshy .tshy-build .turbo",
|
||||
"typecheck": "tsc -p tsconfig.src.json --noEmit",
|
||||
"build": "tshy",
|
||||
"test": "vitest",
|
||||
"test:e2e": "vitest --run -c ./e2e/vitest.config.ts"
|
||||
},
|
||||
"dependencies": {
|
||||
"@clack/prompts": "^0.10.0",
|
||||
"@depot/cli": "0.0.1-cli.2.80.0",
|
||||
"@opentelemetry/api": "1.9.0",
|
||||
"@opentelemetry/api-logs": "0.52.1",
|
||||
"@opentelemetry/exporter-logs-otlp-http": "0.52.1",
|
||||
"@opentelemetry/exporter-trace-otlp-http": "0.52.1",
|
||||
"@opentelemetry/instrumentation": "0.52.1",
|
||||
"@opentelemetry/instrumentation-fetch": "0.52.1",
|
||||
"@opentelemetry/resources": "1.25.1",
|
||||
"@opentelemetry/sdk-logs": "0.52.1",
|
||||
"@opentelemetry/sdk-node": "0.52.1",
|
||||
"@opentelemetry/sdk-trace-base": "1.25.1",
|
||||
"@opentelemetry/sdk-trace-node": "1.25.1",
|
||||
"@opentelemetry/semantic-conventions": "1.25.1",
|
||||
"ansi-escapes": "^7.0.0",
|
||||
"braces": "^3.0.3",
|
||||
"c12": "^1.11.1",
|
||||
"chalk": "^5.2.0",
|
||||
"chokidar": "^3.6.0",
|
||||
"cli-table3": "^0.6.3",
|
||||
"commander": "^9.4.1",
|
||||
"defu": "^6.1.4",
|
||||
"dotenv": "^16.4.5",
|
||||
"dotenv-expand": "^12.0.2",
|
||||
"esbuild": "^0.23.0",
|
||||
"eventsource": "^3.0.2",
|
||||
"evt": "^2.4.13",
|
||||
"fast-npm-meta": "^0.2.2",
|
||||
"git-last-commit": "^1.0.1",
|
||||
"gradient-string": "^2.0.2",
|
||||
"has-flag": "^5.0.1",
|
||||
"import-in-the-middle": "1.11.0",
|
||||
"import-meta-resolve": "^4.1.0",
|
||||
"ini": "^5.0.0",
|
||||
"jsonc-parser": "3.2.1",
|
||||
"magicast": "^0.3.4",
|
||||
"minimatch": "^10.0.1",
|
||||
"mlly": "^1.7.1",
|
||||
"nypm": "^0.5.4",
|
||||
"nanoid": "3.3.8",
|
||||
"object-hash": "^3.0.0",
|
||||
"open": "^10.0.3",
|
||||
"knex": "3.1.0",
|
||||
"p-limit": "^6.2.0",
|
||||
"p-retry": "^6.1.0",
|
||||
"partysocket": "^1.0.2",
|
||||
"pkg-types": "^1.1.3",
|
||||
"polka": "^0.5.2",
|
||||
"pg": "8.16.3",
|
||||
"resolve": "^1.22.8",
|
||||
"semver": "^7.5.0",
|
||||
"signal-exit": "^4.1.0",
|
||||
"source-map-support": "0.5.21",
|
||||
"std-env": "^3.7.0",
|
||||
"supports-color": "^10.0.0",
|
||||
"tiny-invariant": "^1.2.0",
|
||||
"tinyexec": "^0.3.1",
|
||||
"tinyglobby": "^0.2.10",
|
||||
"uuid": "11.1.0",
|
||||
"ws": "^8.18.0",
|
||||
"xdg-app-paths": "^8.3.0",
|
||||
"zod": "3.23.8",
|
||||
"zod-validation-error": "^1.5.0"
|
||||
},
|
||||
"engines": {
|
||||
"node": ">=18.20.0"
|
||||
},
|
||||
"exports": {
|
||||
"./package.json": "./package.json",
|
||||
".": {
|
||||
"import": {
|
||||
"types": "./dist/esm/index.d.ts",
|
||||
"default": "./dist/esm/index.js"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
@ -1,14 +0,0 @@
|
||||
import { Command } from "commander";
|
||||
import { initCommand } from "../commands/init.js";
|
||||
import { VERSION } from "./version.js";
|
||||
|
||||
const program = new Command();
|
||||
|
||||
program.name("core").description("Core CLI - A Command-Line Interface for Core").version(VERSION);
|
||||
|
||||
program
|
||||
.command("init")
|
||||
.description("Initialize Core development environment (run once)")
|
||||
.action(initCommand);
|
||||
|
||||
program.parse(process.argv);
|
||||
@ -1,3 +0,0 @@
|
||||
import { env } from "../utils/env.js";
|
||||
|
||||
export const VERSION = env.VERSION;
|
||||
@ -1,36 +0,0 @@
|
||||
import { intro, outro, note } from "@clack/prompts";
|
||||
import { printCoreBrainLogo } from "../utils/ascii.js";
|
||||
import { initTriggerDatabase, updateWorkerImage } from "../utils/trigger.js";
|
||||
|
||||
export async function initCommand() {
|
||||
// Display the CORE brain logo
|
||||
printCoreBrainLogo();
|
||||
|
||||
intro("🚀 Core Development Environment Setup");
|
||||
|
||||
try {
|
||||
await initTriggerDatabase();
|
||||
await updateWorkerImage();
|
||||
|
||||
note(
|
||||
[
|
||||
"Your services will start running:",
|
||||
"",
|
||||
"• Core Application: http://localhost:3033",
|
||||
"• Trigger.dev: http://localhost:8030",
|
||||
"• PostgreSQL: localhost:5432",
|
||||
"",
|
||||
"You can now start developing with Core!",
|
||||
"",
|
||||
"ℹ️ When logging in to the Core Application, you can find the login URL in the Docker container logs:",
|
||||
" docker logs core-app --tail 50",
|
||||
].join("\n"),
|
||||
"🚀 Services Running"
|
||||
);
|
||||
outro("🎉 Setup Complete!");
|
||||
process.exit(0);
|
||||
} catch (error: any) {
|
||||
outro(`❌ Setup failed: ${error.message}`);
|
||||
process.exit(1);
|
||||
}
|
||||
}
|
||||
@ -1,3 +0,0 @@
|
||||
#!/usr/bin/env node
|
||||
|
||||
import "./cli/index.js";
|
||||
@ -1,29 +0,0 @@
|
||||
import chalk from "chalk";
|
||||
import { VERSION } from "../cli/version.js";
|
||||
|
||||
export function printCoreBrainLogo(): void {
|
||||
const brain = `
|
||||
██████╗ ██████╗ ██████╗ ███████╗
|
||||
██╔════╝██╔═══██╗██╔══██╗██╔════╝
|
||||
██║ ██║ ██║██████╔╝█████╗
|
||||
██║ ██║ ██║██╔══██╗██╔══╝
|
||||
╚██████╗╚██████╔╝██║ ██║███████╗
|
||||
╚═════╝ ╚═════╝ ╚═╝ ╚═╝╚══════╝
|
||||
|
||||
o o o
|
||||
o o---o---o o
|
||||
o---o o o---o---o
|
||||
o o---o---o---o o
|
||||
o---o o o---o---o
|
||||
o o---o---o o
|
||||
o o o
|
||||
|
||||
`;
|
||||
|
||||
console.log(chalk.cyan(brain));
|
||||
console.log(
|
||||
chalk.bold.white(
|
||||
` 🧠 CORE - Contextual Observation & Recall Engine ${VERSION ? chalk.gray(`(${VERSION})`) : ""}\n`
|
||||
)
|
||||
);
|
||||
}
|
||||
@ -1,24 +0,0 @@
|
||||
import { z } from "zod";
|
||||
|
||||
const EnvironmentSchema = z.object({
|
||||
// Version
|
||||
VERSION: z.string().default("0.1.24"),
|
||||
|
||||
// Database
|
||||
DB_HOST: z.string().default("localhost"),
|
||||
DB_PORT: z.string().default("5432"),
|
||||
TRIGGER_DB: z.string().default("trigger"),
|
||||
POSTGRES_USER: z.string().default("docker"),
|
||||
POSTGRES_PASSWORD: z.string().default("docker"),
|
||||
|
||||
// Trigger database
|
||||
TRIGGER_TASKS_IMAGE: z.string().default("redplanethq/proj_core:latest"),
|
||||
|
||||
// Node environment
|
||||
NODE_ENV: z
|
||||
.union([z.literal("development"), z.literal("production"), z.literal("test")])
|
||||
.default("development"),
|
||||
});
|
||||
|
||||
export type Environment = z.infer<typeof EnvironmentSchema>;
|
||||
export const env = EnvironmentSchema.parse(process.env);
|
||||
@ -1,182 +0,0 @@
|
||||
import Knex from "knex";
|
||||
import path from "path";
|
||||
import { fileURLToPath } from "url";
|
||||
import { env } from "./env.js";
|
||||
import { spinner, note, log } from "@clack/prompts";
|
||||
|
||||
const __filename = fileURLToPath(import.meta.url);
|
||||
const __dirname = path.dirname(__filename);
|
||||
|
||||
/**
|
||||
* Returns a PostgreSQL database URL for the given database name.
|
||||
* Throws if required environment variables are missing.
|
||||
*/
|
||||
export function getDatabaseUrl(dbName: string): string {
|
||||
const { POSTGRES_USER, POSTGRES_PASSWORD, DB_HOST, DB_PORT } = env;
|
||||
|
||||
if (!POSTGRES_USER || !POSTGRES_PASSWORD || !DB_HOST || !DB_PORT || !dbName) {
|
||||
throw new Error(
|
||||
"One or more required environment variables are missing: POSTGRES_USER, POSTGRES_PASSWORD, DB_HOST, DB_PORT, dbName"
|
||||
);
|
||||
}
|
||||
|
||||
return `postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@${DB_HOST}:${DB_PORT}/${dbName}`;
|
||||
}
|
||||
|
||||
/**
|
||||
* Checks if the database specified by TRIGGER_DB exists, and creates it if it does not.
|
||||
* Returns { exists: boolean, created: boolean } - exists indicates success, created indicates if database was newly created.
|
||||
*/
|
||||
export async function ensureDatabaseExists(): Promise<{ exists: boolean; created: boolean }> {
|
||||
const { TRIGGER_DB } = env;
|
||||
|
||||
if (!TRIGGER_DB) {
|
||||
throw new Error("TRIGGER_DB environment variable is missing");
|
||||
}
|
||||
|
||||
// Build a connection string to the default 'postgres' database
|
||||
const adminDbUrl = getDatabaseUrl("postgres");
|
||||
|
||||
// Create a Knex instance for the admin connection
|
||||
const adminKnex = Knex({
|
||||
client: "pg",
|
||||
connection: adminDbUrl,
|
||||
});
|
||||
|
||||
const s = spinner();
|
||||
s.start("Checking for Trigger.dev database...");
|
||||
|
||||
try {
|
||||
// Check if the database exists
|
||||
const result = await adminKnex.select(1).from("pg_database").where("datname", TRIGGER_DB);
|
||||
|
||||
if (result.length === 0) {
|
||||
s.message("Database not found. Creating...");
|
||||
// Database does not exist, create it
|
||||
await adminKnex.raw(`CREATE DATABASE "${TRIGGER_DB}"`);
|
||||
s.stop("Database created.");
|
||||
return { exists: true, created: true };
|
||||
} else {
|
||||
s.stop("Database exists.");
|
||||
return { exists: true, created: false };
|
||||
}
|
||||
} catch (err) {
|
||||
s.stop("Failed to ensure database exists.");
|
||||
log.warning("Failed to ensure database exists: " + (err as Error).message);
|
||||
return { exists: false, created: false };
|
||||
} finally {
|
||||
await adminKnex.destroy();
|
||||
}
|
||||
}
|
||||
|
||||
// Main initialization function
|
||||
export async function initTriggerDatabase() {
|
||||
const { TRIGGER_DB } = env;
|
||||
|
||||
if (!TRIGGER_DB) {
|
||||
throw new Error("TRIGGER_DB environment variable is missing");
|
||||
}
|
||||
|
||||
// Ensure the database exists
|
||||
const { exists, created } = await ensureDatabaseExists();
|
||||
if (!exists) {
|
||||
throw new Error("Failed to create or verify database exists");
|
||||
}
|
||||
|
||||
// Only run pg_restore if the database was newly created
|
||||
if (!created) {
|
||||
note("Database already exists, skipping restore from trigger.dump");
|
||||
return;
|
||||
}
|
||||
|
||||
// Run pg_restore with the trigger.dump file
|
||||
const dumpFilePath = path.join(__dirname, "../../../trigger.dump");
|
||||
const connectionString = getDatabaseUrl(TRIGGER_DB);
|
||||
|
||||
const s = spinner();
|
||||
s.start("Restoring database from trigger.dump...");
|
||||
|
||||
try {
|
||||
// Use execSync and capture stdout/stderr, send to spinner.log
|
||||
const { spawn } = await import("child_process");
|
||||
await new Promise<void>((resolve, reject) => {
|
||||
const child = spawn(
|
||||
"pg_restore",
|
||||
["--verbose", "--no-acl", "--no-owner", "-d", connectionString, dumpFilePath],
|
||||
{ stdio: ["ignore", "pipe", "pipe"] }
|
||||
);
|
||||
|
||||
child.stdout.on("data", (data) => {
|
||||
s.message(data.toString());
|
||||
});
|
||||
|
||||
child.stderr.on("data", (data) => {
|
||||
s.message(data.toString());
|
||||
});
|
||||
|
||||
child.on("close", (code) => {
|
||||
if (code === 0) {
|
||||
s.stop("Database restored successfully from trigger.dump");
|
||||
resolve();
|
||||
} else {
|
||||
s.stop("Failed to restore database.");
|
||||
log.warning(`Failed to restore database: pg_restore exited with code ${code}`);
|
||||
reject(new Error(`Database restore failed: pg_restore exited with code ${code}`));
|
||||
}
|
||||
});
|
||||
|
||||
child.on("error", (err) => {
|
||||
s.stop("Failed to restore database.");
|
||||
log.warning("Failed to restore database: " + err.message);
|
||||
reject(new Error(`Database restore failed: ${err.message}`));
|
||||
});
|
||||
});
|
||||
} catch (error: any) {
|
||||
s.stop("Failed to restore database.");
|
||||
log.warning("Failed to restore database: " + error.message);
|
||||
throw new Error(`Database restore failed: ${error.message}`);
|
||||
}
|
||||
}
|
||||
|
||||
export async function updateWorkerImage() {
|
||||
const { TRIGGER_DB, TRIGGER_TASKS_IMAGE } = env;
|
||||
|
||||
if (!TRIGGER_DB) {
|
||||
throw new Error("TRIGGER_DB environment variable is missing");
|
||||
}
|
||||
|
||||
const connectionString = getDatabaseUrl(TRIGGER_DB);
|
||||
|
||||
const knex = Knex({
|
||||
client: "pg",
|
||||
connection: connectionString,
|
||||
});
|
||||
|
||||
const s = spinner();
|
||||
s.start("Updating worker image reference...");
|
||||
|
||||
try {
|
||||
// Get the first record from WorkerDeployment table
|
||||
const firstWorkerDeployment = await knex("WorkerDeployment").select("id").first();
|
||||
|
||||
if (!firstWorkerDeployment) {
|
||||
s.stop("No WorkerDeployment records found, skipping image update");
|
||||
note("No WorkerDeployment records found, skipping image update");
|
||||
return;
|
||||
}
|
||||
|
||||
// Update the imageReference column with the TRIGGER_TASKS_IMAGE value
|
||||
await knex("WorkerDeployment").where("id", firstWorkerDeployment.id).update({
|
||||
imageReference: TRIGGER_TASKS_IMAGE,
|
||||
updatedAt: new Date(),
|
||||
});
|
||||
|
||||
s.stop(`Successfully updated worker image reference to: ${TRIGGER_TASKS_IMAGE}`);
|
||||
} catch (error: any) {
|
||||
s.stop("Failed to update worker image.");
|
||||
log.warning("Failed to update worker image: " + error.message);
|
||||
throw new Error(`Worker image update failed: ${error.message}`);
|
||||
} finally {
|
||||
await knex.destroy();
|
||||
}
|
||||
}
|
||||
Binary file not shown.
@ -1,40 +0,0 @@
|
||||
{
|
||||
"include": ["./src/**/*.ts"],
|
||||
"exclude": ["./src/**/*.test.ts"],
|
||||
"compilerOptions": {
|
||||
"target": "es2022",
|
||||
"lib": ["ES2022", "DOM", "DOM.Iterable", "DOM.AsyncIterable"],
|
||||
"module": "NodeNext",
|
||||
"moduleResolution": "NodeNext",
|
||||
"moduleDetection": "force",
|
||||
"verbatimModuleSyntax": false,
|
||||
"jsx": "react",
|
||||
|
||||
"strict": true,
|
||||
"alwaysStrict": true,
|
||||
"strictPropertyInitialization": true,
|
||||
"skipLibCheck": true,
|
||||
"forceConsistentCasingInFileNames": true,
|
||||
"noUnusedLocals": false,
|
||||
"noUnusedParameters": false,
|
||||
"noImplicitAny": true,
|
||||
"noImplicitReturns": true,
|
||||
"noImplicitThis": true,
|
||||
|
||||
"noFallthroughCasesInSwitch": true,
|
||||
"resolveJsonModule": true,
|
||||
|
||||
"removeComments": false,
|
||||
"esModuleInterop": true,
|
||||
"emitDecoratorMetadata": false,
|
||||
"experimentalDecorators": false,
|
||||
"downlevelIteration": true,
|
||||
"isolatedModules": true,
|
||||
"noUncheckedIndexedAccess": true,
|
||||
|
||||
"pretty": true,
|
||||
"isolatedDeclarations": false,
|
||||
"composite": true,
|
||||
"sourceMap": true
|
||||
}
|
||||
}
|
||||
@ -1,8 +0,0 @@
|
||||
import { configDefaults, defineConfig } from "vitest/config";
|
||||
|
||||
export default defineConfig({
|
||||
test: {
|
||||
globals: true,
|
||||
exclude: [...configDefaults.exclude, "e2e/**/*"],
|
||||
},
|
||||
});
|
||||
@ -92,3 +92,69 @@ export const sessionCompactionQueue = new Queue("session-compaction-queue", {
|
||||
},
|
||||
},
|
||||
});
|
||||
|
||||
/**
|
||||
* BERT topic analysis queue
|
||||
* Handles CPU-intensive topic modeling on user episodes
|
||||
*/
|
||||
export const bertTopicQueue = new Queue("bert-topic-queue", {
|
||||
connection: getRedisConnection(),
|
||||
defaultJobOptions: {
|
||||
attempts: 2, // Only 2 attempts due to long runtime
|
||||
backoff: {
|
||||
type: "exponential",
|
||||
delay: 5000,
|
||||
},
|
||||
removeOnComplete: {
|
||||
age: 7200, // Keep completed jobs for 2 hours
|
||||
count: 100,
|
||||
},
|
||||
removeOnFail: {
|
||||
age: 172800, // Keep failed jobs for 48 hours (for debugging)
|
||||
},
|
||||
},
|
||||
});
|
||||
|
||||
/**
|
||||
* Space assignment queue
|
||||
* Handles assigning episodes to spaces based on semantic matching
|
||||
*/
|
||||
export const spaceAssignmentQueue = new Queue("space-assignment-queue", {
|
||||
connection: getRedisConnection(),
|
||||
defaultJobOptions: {
|
||||
attempts: 3,
|
||||
backoff: {
|
||||
type: "exponential",
|
||||
delay: 2000,
|
||||
},
|
||||
removeOnComplete: {
|
||||
age: 3600,
|
||||
count: 1000,
|
||||
},
|
||||
removeOnFail: {
|
||||
age: 86400,
|
||||
},
|
||||
},
|
||||
});
|
||||
|
||||
/**
|
||||
* Space summary queue
|
||||
* Handles generating summaries for spaces
|
||||
*/
|
||||
export const spaceSummaryQueue = new Queue("space-summary-queue", {
|
||||
connection: getRedisConnection(),
|
||||
defaultJobOptions: {
|
||||
attempts: 3,
|
||||
backoff: {
|
||||
type: "exponential",
|
||||
delay: 2000,
|
||||
},
|
||||
removeOnComplete: {
|
||||
age: 3600,
|
||||
count: 1000,
|
||||
},
|
||||
removeOnFail: {
|
||||
age: 86400,
|
||||
},
|
||||
},
|
||||
});
|
||||
|
||||
@ -66,7 +66,6 @@ export async function initWorkers(): Promise<void> {
|
||||
queue: conversationTitleQueue,
|
||||
name: "conversation-title",
|
||||
},
|
||||
|
||||
{
|
||||
worker: sessionCompactionWorker,
|
||||
queue: sessionCompactionQueue,
|
||||
|
||||
@ -18,24 +18,39 @@ import {
|
||||
processConversationTitleCreation,
|
||||
type CreateConversationTitlePayload,
|
||||
} from "~/jobs/conversation/create-title.logic";
|
||||
|
||||
import {
|
||||
processSessionCompaction,
|
||||
type SessionCompactionPayload,
|
||||
} from "~/jobs/session/session-compaction.logic";
|
||||
import {
|
||||
processTopicAnalysis,
|
||||
type TopicAnalysisPayload,
|
||||
} from "~/jobs/bert/topic-analysis.logic";
|
||||
|
||||
import {
|
||||
enqueueIngestEpisode,
|
||||
enqueueSpaceAssignment,
|
||||
enqueueSessionCompaction,
|
||||
enqueueBertTopicAnalysis,
|
||||
enqueueSpaceSummary,
|
||||
} from "~/lib/queue-adapter.server";
|
||||
import { logger } from "~/services/logger.service";
|
||||
import {
|
||||
processSpaceAssignment,
|
||||
type SpaceAssignmentPayload,
|
||||
} from "~/jobs/spaces/space-assignment.logic";
|
||||
import {
|
||||
processSpaceSummary,
|
||||
type SpaceSummaryPayload,
|
||||
} from "~/jobs/spaces/space-summary.logic";
|
||||
|
||||
/**
|
||||
* Episode ingestion worker
|
||||
* Processes individual episode ingestion jobs with per-user concurrency
|
||||
* Processes individual episode ingestion jobs with global concurrency
|
||||
*
|
||||
* Note: Per-user concurrency is achieved by using userId as part of the jobId
|
||||
* when adding jobs to the queue, ensuring only one job per user runs at a time
|
||||
* Note: BullMQ uses global concurrency limit (5 jobs max).
|
||||
* Trigger.dev uses per-user concurrency via concurrencyKey.
|
||||
* For most open-source deployments, global concurrency is sufficient.
|
||||
*/
|
||||
export const ingestWorker = new Worker(
|
||||
"ingest-queue",
|
||||
@ -47,11 +62,12 @@ export const ingestWorker = new Worker(
|
||||
// Callbacks to enqueue follow-up jobs
|
||||
enqueueSpaceAssignment,
|
||||
enqueueSessionCompaction,
|
||||
enqueueBertTopicAnalysis,
|
||||
);
|
||||
},
|
||||
{
|
||||
connection: getRedisConnection(),
|
||||
concurrency: 5, // Process up to 5 jobs in parallel
|
||||
concurrency: 1, // Global limit: process up to 1 jobs in parallel
|
||||
},
|
||||
);
|
||||
|
||||
@ -108,6 +124,65 @@ export const sessionCompactionWorker = new Worker(
|
||||
},
|
||||
);
|
||||
|
||||
/**
|
||||
* BERT topic analysis worker
|
||||
* Handles CPU-intensive topic modeling
|
||||
*/
|
||||
export const bertTopicWorker = new Worker(
|
||||
"bert-topic-queue",
|
||||
async (job) => {
|
||||
const payload = job.data as TopicAnalysisPayload;
|
||||
return await processTopicAnalysis(
|
||||
payload,
|
||||
// Callback to enqueue space summary
|
||||
enqueueSpaceSummary,
|
||||
);
|
||||
},
|
||||
{
|
||||
connection: getRedisConnection(),
|
||||
concurrency: 2, // Process up to 2 analyses in parallel (CPU-intensive)
|
||||
},
|
||||
);
|
||||
|
||||
/**
|
||||
* Space assignment worker
|
||||
* Handles assigning episodes to spaces based on semantic matching
|
||||
*
|
||||
* Note: Global concurrency of 1 ensures sequential processing.
|
||||
* Trigger.dev uses per-user concurrency via concurrencyKey.
|
||||
*/
|
||||
export const spaceAssignmentWorker = new Worker(
|
||||
"space-assignment-queue",
|
||||
async (job) => {
|
||||
const payload = job.data as SpaceAssignmentPayload;
|
||||
return await processSpaceAssignment(
|
||||
payload,
|
||||
// Callback to enqueue space summary
|
||||
enqueueSpaceSummary,
|
||||
);
|
||||
},
|
||||
{
|
||||
connection: getRedisConnection(),
|
||||
concurrency: 1, // Global limit: process one job at a time
|
||||
},
|
||||
);
|
||||
|
||||
/**
|
||||
* Space summary worker
|
||||
* Handles generating summaries for spaces
|
||||
*/
|
||||
export const spaceSummaryWorker = new Worker(
|
||||
"space-summary-queue",
|
||||
async (job) => {
|
||||
const payload = job.data as SpaceSummaryPayload;
|
||||
return await processSpaceSummary(payload);
|
||||
},
|
||||
{
|
||||
connection: getRedisConnection(),
|
||||
concurrency: 1, // Process one space summary at a time
|
||||
},
|
||||
);
|
||||
|
||||
/**
|
||||
* Graceful shutdown handler
|
||||
*/
|
||||
@ -116,8 +191,10 @@ export async function closeAllWorkers(): Promise<void> {
|
||||
ingestWorker.close(),
|
||||
documentIngestWorker.close(),
|
||||
conversationTitleWorker.close(),
|
||||
|
||||
sessionCompactionWorker.close(),
|
||||
bertTopicWorker.close(),
|
||||
spaceSummaryWorker.close(),
|
||||
spaceAssignmentWorker.close(),
|
||||
]);
|
||||
logger.log("All BullMQ workers closed");
|
||||
}
|
||||
|
||||
250
apps/webapp/app/jobs/bert/topic-analysis.logic.ts
Normal file
250
apps/webapp/app/jobs/bert/topic-analysis.logic.ts
Normal file
@ -0,0 +1,250 @@
|
||||
import { exec } from "child_process";
|
||||
import { promisify } from "util";
|
||||
import { identifySpacesForTopics } from "~/jobs/spaces/space-identification.logic";
|
||||
import { assignEpisodesToSpace } from "~/services/graphModels/space";
|
||||
import { logger } from "~/services/logger.service";
|
||||
import { SpaceService } from "~/services/space.server";
|
||||
import { prisma } from "~/trigger/utils/prisma";
|
||||
|
||||
const execAsync = promisify(exec);
|
||||
|
||||
export interface TopicAnalysisPayload {
|
||||
userId: string;
|
||||
workspaceId: string;
|
||||
minTopicSize?: number;
|
||||
nrTopics?: number;
|
||||
}
|
||||
|
||||
export interface TopicAnalysisResult {
|
||||
topics: {
|
||||
[topicId: string]: {
|
||||
keywords: string[];
|
||||
episodeIds: string[];
|
||||
};
|
||||
};
|
||||
}
|
||||
|
||||
/**
|
||||
* Run BERT analysis using exec (for BullMQ/Docker)
|
||||
*/
|
||||
async function runBertWithExec(
|
||||
userId: string,
|
||||
minTopicSize: number,
|
||||
nrTopics?: number,
|
||||
): Promise<string> {
|
||||
let command = `python3 /core/apps/webapp/python/main.py ${userId} --json`;
|
||||
|
||||
if (minTopicSize) {
|
||||
command += ` --min-topic-size ${minTopicSize}`;
|
||||
}
|
||||
|
||||
if (nrTopics) {
|
||||
command += ` --nr-topics ${nrTopics}`;
|
||||
}
|
||||
|
||||
console.log(`[BERT Topic Analysis] Executing: ${command}`);
|
||||
|
||||
const { stdout, stderr } = await execAsync(command, {
|
||||
timeout: 300000, // 5 minutes
|
||||
maxBuffer: 10 * 1024 * 1024, // 10MB buffer for large outputs
|
||||
});
|
||||
|
||||
if (stderr) {
|
||||
console.warn(`[BERT Topic Analysis] Warnings:`, stderr);
|
||||
}
|
||||
|
||||
return stdout;
|
||||
}
|
||||
|
||||
/**
|
||||
* Process BERT topic analysis on user's episodes
|
||||
* This is the common logic shared between Trigger.dev and BullMQ
|
||||
*
|
||||
* NOTE: This function does NOT update workspace.metadata.lastTopicAnalysisAt
|
||||
* That should be done by the caller BEFORE enqueueing this job to prevent
|
||||
* duplicate analyses from racing conditions.
|
||||
*/
|
||||
export async function processTopicAnalysis(
|
||||
payload: TopicAnalysisPayload,
|
||||
enqueueSpaceSummary?: (params: {
|
||||
spaceId: string;
|
||||
userId: string;
|
||||
}) => Promise<any>,
|
||||
pythonRunner?: (
|
||||
userId: string,
|
||||
minTopicSize: number,
|
||||
nrTopics?: number,
|
||||
) => Promise<string>,
|
||||
): Promise<TopicAnalysisResult> {
|
||||
const { userId, workspaceId, minTopicSize = 10, nrTopics } = payload;
|
||||
|
||||
console.log(`[BERT Topic Analysis] Starting analysis for user: ${userId}`);
|
||||
console.log(
|
||||
`[BERT Topic Analysis] Parameters: minTopicSize=${minTopicSize}, nrTopics=${nrTopics || "auto"}`,
|
||||
);
|
||||
|
||||
try {
|
||||
const startTime = Date.now();
|
||||
|
||||
// Run BERT analysis using provided runner or default exec
|
||||
const runner = pythonRunner || runBertWithExec;
|
||||
const stdout = await runner(userId, minTopicSize, nrTopics);
|
||||
|
||||
const duration = Date.now() - startTime;
|
||||
console.log(`[BERT Topic Analysis] Completed in ${duration}ms`);
|
||||
|
||||
// Parse the JSON output
|
||||
const result: TopicAnalysisResult = JSON.parse(stdout);
|
||||
|
||||
// Log summary
|
||||
const topicCount = Object.keys(result.topics).length;
|
||||
const totalEpisodes = Object.values(result.topics).reduce(
|
||||
(sum, topic) => sum + topic.episodeIds.length,
|
||||
0,
|
||||
);
|
||||
|
||||
console.log(
|
||||
`[BERT Topic Analysis] Found ${topicCount} topics covering ${totalEpisodes} episodes`,
|
||||
);
|
||||
|
||||
// Step 2: Identify spaces for topics using LLM
|
||||
try {
|
||||
logger.info("[BERT Topic Analysis] Starting space identification", {
|
||||
userId,
|
||||
topicCount,
|
||||
});
|
||||
|
||||
const spaceProposals = await identifySpacesForTopics({
|
||||
userId,
|
||||
topics: result.topics,
|
||||
});
|
||||
|
||||
logger.info("[BERT Topic Analysis] Space identification completed", {
|
||||
userId,
|
||||
proposalCount: spaceProposals.length,
|
||||
});
|
||||
|
||||
// Step 3: Create or find spaces and assign episodes
|
||||
// Get existing spaces from PostgreSQL
|
||||
const existingSpacesFromDb = await prisma.space.findMany({
|
||||
where: { workspaceId },
|
||||
});
|
||||
const existingSpacesByName = new Map(
|
||||
existingSpacesFromDb.map((s) => [s.name.toLowerCase(), s]),
|
||||
);
|
||||
|
||||
for (const proposal of spaceProposals) {
|
||||
try {
|
||||
// Check if space already exists (case-insensitive match)
|
||||
let spaceId: string;
|
||||
const existingSpace = existingSpacesByName.get(
|
||||
proposal.name.toLowerCase(),
|
||||
);
|
||||
|
||||
if (existingSpace) {
|
||||
// Use existing space
|
||||
spaceId = existingSpace.id;
|
||||
logger.info("[BERT Topic Analysis] Using existing space", {
|
||||
spaceName: proposal.name,
|
||||
spaceId,
|
||||
});
|
||||
} else {
|
||||
// Create new space (creates in both PostgreSQL and Neo4j)
|
||||
// Skip automatic space assignment since we're manually assigning from BERT topics
|
||||
const spaceService = new SpaceService();
|
||||
const newSpace = await spaceService.createSpace({
|
||||
name: proposal.name,
|
||||
description: proposal.intent,
|
||||
userId,
|
||||
workspaceId,
|
||||
});
|
||||
spaceId = newSpace.id;
|
||||
logger.info("[BERT Topic Analysis] Created new space", {
|
||||
spaceName: proposal.name,
|
||||
spaceId,
|
||||
intent: proposal.intent,
|
||||
});
|
||||
}
|
||||
|
||||
// Collect all episode IDs from the topics in this proposal
|
||||
const episodeIds: string[] = [];
|
||||
for (const topicId of proposal.topics) {
|
||||
const topic = result.topics[topicId];
|
||||
if (topic) {
|
||||
episodeIds.push(...topic.episodeIds);
|
||||
}
|
||||
}
|
||||
|
||||
// Assign all episodes from these topics to the space
|
||||
if (episodeIds.length > 0) {
|
||||
await assignEpisodesToSpace(episodeIds, spaceId, userId);
|
||||
logger.info("[BERT Topic Analysis] Assigned episodes to space", {
|
||||
spaceName: proposal.name,
|
||||
spaceId,
|
||||
episodeCount: episodeIds.length,
|
||||
topics: proposal.topics,
|
||||
});
|
||||
|
||||
// Step 4: Trigger space summary if callback provided
|
||||
if (enqueueSpaceSummary) {
|
||||
await enqueueSpaceSummary({ spaceId, userId });
|
||||
logger.info("[BERT Topic Analysis] Triggered space summary", {
|
||||
spaceName: proposal.name,
|
||||
spaceId,
|
||||
});
|
||||
}
|
||||
}
|
||||
} catch (spaceError) {
|
||||
logger.error(
|
||||
"[BERT Topic Analysis] Failed to process space proposal",
|
||||
{
|
||||
proposal,
|
||||
error: spaceError,
|
||||
},
|
||||
);
|
||||
// Continue with other proposals
|
||||
}
|
||||
}
|
||||
} catch (spaceIdentificationError) {
|
||||
logger.error(
|
||||
"[BERT Topic Analysis] Space identification failed, returning topics only",
|
||||
{
|
||||
error: spaceIdentificationError,
|
||||
},
|
||||
);
|
||||
// Return topics even if space identification fails
|
||||
}
|
||||
|
||||
return result;
|
||||
} catch (error) {
|
||||
console.error(`[BERT Topic Analysis] Error:`, error);
|
||||
|
||||
if (error instanceof Error) {
|
||||
// Check for timeout
|
||||
if (error.message.includes("ETIMEDOUT")) {
|
||||
throw new Error(
|
||||
`Topic analysis timed out after 5 minutes. User may have too many episodes.`,
|
||||
);
|
||||
}
|
||||
|
||||
// Check for Python errors
|
||||
if (error.message.includes("python3: not found")) {
|
||||
throw new Error(`Python 3 is not installed or not available in PATH.`);
|
||||
}
|
||||
|
||||
// Check for Neo4j connection errors
|
||||
if (error.message.includes("Failed to connect to Neo4j")) {
|
||||
throw new Error(
|
||||
`Could not connect to Neo4j. Check NEO4J_URI and credentials.`,
|
||||
);
|
||||
}
|
||||
|
||||
// Check for no episodes
|
||||
if (error.message.includes("No episodes found")) {
|
||||
throw new Error(`No episodes found for userId: ${userId}`);
|
||||
}
|
||||
}
|
||||
|
||||
throw error;
|
||||
}
|
||||
}
|
||||
@ -7,6 +7,10 @@ import { prisma } from "~/trigger/utils/prisma";
|
||||
import { EpisodeType } from "@core/types";
|
||||
import { deductCredits, hasCredits } from "~/trigger/utils/utils";
|
||||
import { assignEpisodesToSpace } from "~/services/graphModels/space";
|
||||
import {
|
||||
shouldTriggerTopicAnalysis,
|
||||
updateLastTopicAnalysisTime,
|
||||
} from "~/services/bertTopicAnalysis.server";
|
||||
|
||||
export const IngestBodyRequest = z.object({
|
||||
episodeBody: z.string(),
|
||||
@ -55,6 +59,12 @@ export async function processEpisodeIngestion(
|
||||
sessionId: string;
|
||||
source: string;
|
||||
}) => Promise<any>,
|
||||
enqueueBertTopicAnalysis?: (params: {
|
||||
userId: string;
|
||||
workspaceId: string;
|
||||
minTopicSize?: number;
|
||||
nrTopics?: number;
|
||||
}) => Promise<any>,
|
||||
): Promise<IngestEpisodeResult> {
|
||||
try {
|
||||
logger.log(`Processing job for user ${payload.userId}`);
|
||||
@ -250,6 +260,44 @@ export async function processEpisodeIngestion(
|
||||
});
|
||||
}
|
||||
|
||||
// Auto-trigger BERT topic analysis if threshold met (20+ new episodes)
|
||||
try {
|
||||
if (
|
||||
currentStatus === IngestionStatus.COMPLETED &&
|
||||
enqueueBertTopicAnalysis
|
||||
) {
|
||||
const shouldTrigger = await shouldTriggerTopicAnalysis(
|
||||
payload.userId,
|
||||
payload.workspaceId,
|
||||
);
|
||||
|
||||
if (shouldTrigger) {
|
||||
logger.info(
|
||||
`Triggering BERT topic analysis after reaching 20+ new episodes`,
|
||||
{
|
||||
userId: payload.userId,
|
||||
workspaceId: payload.workspaceId,
|
||||
},
|
||||
);
|
||||
|
||||
await enqueueBertTopicAnalysis({
|
||||
userId: payload.userId,
|
||||
workspaceId: payload.workspaceId,
|
||||
minTopicSize: 10,
|
||||
});
|
||||
|
||||
// Update the last analysis timestamp
|
||||
await updateLastTopicAnalysisTime(payload.workspaceId);
|
||||
}
|
||||
}
|
||||
} catch (topicAnalysisError) {
|
||||
// Don't fail the ingestion if topic analysis fails
|
||||
logger.warn(`Failed to trigger topic analysis after ingestion:`, {
|
||||
error: topicAnalysisError,
|
||||
userId: payload.userId,
|
||||
});
|
||||
}
|
||||
|
||||
return { success: true, episodeDetails };
|
||||
} catch (err: any) {
|
||||
await prisma.ingestionQueue.update({
|
||||
|
||||
@ -36,7 +36,7 @@ export interface SessionCompactionResult {
|
||||
}
|
||||
|
||||
// Zod schema for LLM response validation
|
||||
const CompactionResultSchema = z.object({
|
||||
export const CompactionResultSchema = z.object({
|
||||
summary: z.string().describe("Consolidated narrative of the entire session"),
|
||||
confidence: z
|
||||
.number()
|
||||
@ -45,7 +45,7 @@ const CompactionResultSchema = z.object({
|
||||
.describe("Confidence score of the compaction quality"),
|
||||
});
|
||||
|
||||
const CONFIG = {
|
||||
export const CONFIG = {
|
||||
minEpisodesForCompaction: 5, // Minimum episodes to trigger compaction
|
||||
compactionThreshold: 1, // Trigger after N new episodes
|
||||
maxEpisodesPerBatch: 50, // Process in batches if needed
|
||||
|
||||
1201
apps/webapp/app/jobs/spaces/space-assignment.logic.ts
Normal file
1201
apps/webapp/app/jobs/spaces/space-assignment.logic.ts
Normal file
File diff suppressed because it is too large
Load Diff
229
apps/webapp/app/jobs/spaces/space-identification.logic.ts
Normal file
229
apps/webapp/app/jobs/spaces/space-identification.logic.ts
Normal file
@ -0,0 +1,229 @@
|
||||
/**
|
||||
* Space Identification Logic
|
||||
*
|
||||
* Uses LLM to identify appropriate spaces for topics discovered by BERT analysis
|
||||
*/
|
||||
|
||||
import { makeModelCall } from "~/lib/model.server";
|
||||
import { getAllSpacesForUser } from "~/services/graphModels/space";
|
||||
import { getEpisode } from "~/services/graphModels/episode";
|
||||
import { logger } from "~/services/logger.service";
|
||||
import type { SpaceNode } from "@core/types";
|
||||
|
||||
export interface TopicData {
|
||||
keywords: string[];
|
||||
episodeIds: string[];
|
||||
}
|
||||
|
||||
export interface SpaceProposal {
|
||||
name: string;
|
||||
intent: string;
|
||||
confidence: number;
|
||||
reason: string;
|
||||
topics: string[]; // Array of topic IDs
|
||||
}
|
||||
|
||||
interface IdentifySpacesParams {
|
||||
userId: string;
|
||||
topics: Record<string, TopicData>;
|
||||
}
|
||||
|
||||
/**
|
||||
* Identify spaces for topics using LLM analysis
|
||||
* Takes top 10 keywords and top 5 episodes per topic
|
||||
*/
|
||||
export async function identifySpacesForTopics(
|
||||
params: IdentifySpacesParams,
|
||||
): Promise<SpaceProposal[]> {
|
||||
const { userId, topics } = params;
|
||||
|
||||
// Get existing spaces for the user
|
||||
const existingSpaces = await getAllSpacesForUser(userId);
|
||||
|
||||
// Prepare topic data with top 10 keywords and top 5 episodes
|
||||
const topicsForAnalysis = await Promise.all(
|
||||
Object.entries(topics).map(async ([topicId, topicData]) => {
|
||||
// Take top 10 keywords
|
||||
const topKeywords = topicData.keywords.slice(0, 10);
|
||||
|
||||
// Take top 5 episodes and fetch their content
|
||||
const topEpisodeIds = topicData.episodeIds.slice(0, 5);
|
||||
const episodes = await Promise.all(
|
||||
topEpisodeIds.map((id) => getEpisode(id)),
|
||||
);
|
||||
|
||||
return {
|
||||
topicId,
|
||||
keywords: topKeywords,
|
||||
episodes: episodes
|
||||
.filter((e) => e !== null)
|
||||
.map((e) => ({
|
||||
content: e!.content.substring(0, 500), // Limit to 500 chars per episode
|
||||
})),
|
||||
episodeCount: topicData.episodeIds.length,
|
||||
};
|
||||
}),
|
||||
);
|
||||
|
||||
// Build the prompt
|
||||
const prompt = buildSpaceIdentificationPrompt(
|
||||
existingSpaces,
|
||||
topicsForAnalysis,
|
||||
);
|
||||
|
||||
logger.info("Identifying spaces for topics", {
|
||||
userId,
|
||||
topicCount: Object.keys(topics).length,
|
||||
existingSpaceCount: existingSpaces.length,
|
||||
});
|
||||
|
||||
// Call LLM with structured output
|
||||
let responseText = "";
|
||||
await makeModelCall(
|
||||
false, // not streaming
|
||||
[{ role: "user", content: prompt }],
|
||||
(text) => {
|
||||
responseText = text;
|
||||
},
|
||||
{
|
||||
temperature: 0.7,
|
||||
},
|
||||
"high", // Use high complexity for space identification
|
||||
);
|
||||
|
||||
// Parse the response
|
||||
const proposals = parseSpaceProposals(responseText);
|
||||
|
||||
logger.info("Space identification completed", {
|
||||
userId,
|
||||
proposalCount: proposals.length,
|
||||
});
|
||||
|
||||
return proposals;
|
||||
}
|
||||
|
||||
/**
|
||||
* Build the prompt for space identification
|
||||
*/
|
||||
function buildSpaceIdentificationPrompt(
|
||||
existingSpaces: SpaceNode[],
|
||||
topics: Array<{
|
||||
topicId: string;
|
||||
keywords: string[];
|
||||
episodes: Array<{ content: string }>;
|
||||
episodeCount: number;
|
||||
}>,
|
||||
): string {
|
||||
const existingSpacesSection =
|
||||
existingSpaces.length > 0
|
||||
? `## Existing Spaces
|
||||
|
||||
The user currently has these spaces:
|
||||
${existingSpaces.map((s) => `- **${s.name}**: ${s.description || "No description"} (${s.contextCount || 0} episodes)`).join("\n")}
|
||||
|
||||
When identifying new spaces, consider if topics fit into existing spaces or if new spaces are needed.`
|
||||
: `## Existing Spaces
|
||||
|
||||
The user currently has no spaces defined. This is a fresh start for space organization.`;
|
||||
|
||||
const topicsSection = `## Topics Discovered
|
||||
|
||||
BERT topic modeling has identified ${topics.length} distinct topics from the user's episodes. Each topic represents a cluster of semantically related content.
|
||||
|
||||
${topics
|
||||
.map(
|
||||
(t, idx) => `### Topic ${idx + 1} (ID: ${t.topicId})
|
||||
**Episode Count**: ${t.episodeCount}
|
||||
**Top Keywords**: ${t.keywords.join(", ")}
|
||||
|
||||
**Sample Episodes** (showing ${t.episodes.length} of ${t.episodeCount}):
|
||||
${t.episodes.map((e, i) => `${i + 1}. ${e.content}`).join("\n")}
|
||||
`,
|
||||
)
|
||||
.join("\n")}`;
|
||||
|
||||
return `You are a knowledge organization expert. Your task is to analyze discovered topics and identify appropriate "spaces" (thematic containers) for organizing episodic memories.
|
||||
|
||||
${existingSpacesSection}
|
||||
|
||||
${topicsSection}
|
||||
|
||||
## Task
|
||||
|
||||
Analyze the topics above and identify spaces that would help organize this content meaningfully. For each space:
|
||||
|
||||
1. **Consider existing spaces first**: If topics clearly belong to existing spaces, assign them there
|
||||
2. **Create new spaces when needed**: If topics represent distinct themes not covered by existing spaces
|
||||
3. **Group related topics**: Multiple topics can be assigned to the same space if they share a theme
|
||||
4. **Aim for 20-50 episodes per space**: This is the sweet spot for space cohesion
|
||||
5. **Focus on user intent**: What would help the user find and understand this content later?
|
||||
|
||||
## Output Format
|
||||
|
||||
Return your analysis as a JSON array of space proposals. Each proposal should have:
|
||||
|
||||
\`\`\`json
|
||||
[
|
||||
{
|
||||
"name": "Space name (use existing space name if assigning to existing space)",
|
||||
"intent": "Clear description of what this space represents",
|
||||
"confidence": 0.85,
|
||||
"reason": "Brief explanation of why these topics belong together",
|
||||
"topics": ["topic-id-1", "topic-id-2"]
|
||||
}
|
||||
]
|
||||
\`\`\`
|
||||
|
||||
**Important Guidelines**:
|
||||
- **confidence**: 0.0-1.0 scale indicating how confident you are this is a good grouping
|
||||
- **topics**: Array of topic IDs (use the exact IDs from above like "0", "1", "-1", etc.)
|
||||
- **name**: For existing spaces, use the EXACT name. For new spaces, create a clear, concise name
|
||||
- Only propose spaces with confidence >= 0.6
|
||||
- Each topic should only appear in ONE space proposal
|
||||
- Topic "-1" is the outlier topic (noise) - only include if it genuinely fits a theme
|
||||
|
||||
Return ONLY the JSON array, no additional text.`;
|
||||
}
|
||||
|
||||
/**
|
||||
* Parse space proposals from LLM response
|
||||
*/
|
||||
function parseSpaceProposals(responseText: string): SpaceProposal[] {
|
||||
try {
|
||||
// Extract JSON from markdown code blocks if present
|
||||
const jsonMatch = responseText.match(/```(?:json)?\s*(\[[\s\S]*?\])\s*```/);
|
||||
const jsonText = jsonMatch ? jsonMatch[1] : responseText;
|
||||
|
||||
const proposals = JSON.parse(jsonText.trim());
|
||||
|
||||
if (!Array.isArray(proposals)) {
|
||||
throw new Error("Response is not an array");
|
||||
}
|
||||
|
||||
// Validate and filter proposals
|
||||
return proposals
|
||||
.filter((p) => {
|
||||
return (
|
||||
p.name &&
|
||||
p.intent &&
|
||||
typeof p.confidence === "number" &&
|
||||
p.confidence >= 0.6 &&
|
||||
Array.isArray(p.topics) &&
|
||||
p.topics.length > 0
|
||||
);
|
||||
})
|
||||
.map((p) => ({
|
||||
name: p.name.trim(),
|
||||
intent: p.intent.trim(),
|
||||
confidence: p.confidence,
|
||||
reason: (p.reason || "").trim(),
|
||||
topics: p.topics.map((t: any) => String(t)),
|
||||
}));
|
||||
} catch (error) {
|
||||
logger.error("Failed to parse space proposals", {
|
||||
error,
|
||||
responseText: responseText.substring(0, 500),
|
||||
});
|
||||
return [];
|
||||
}
|
||||
}
|
||||
721
apps/webapp/app/jobs/spaces/space-summary.logic.ts
Normal file
721
apps/webapp/app/jobs/spaces/space-summary.logic.ts
Normal file
@ -0,0 +1,721 @@
|
||||
import { logger } from "~/services/logger.service";
|
||||
import { SpaceService } from "~/services/space.server";
|
||||
import { makeModelCall } from "~/lib/model.server";
|
||||
import { runQuery } from "~/lib/neo4j.server";
|
||||
import { updateSpaceStatus, SPACE_STATUS } from "~/trigger/utils/space-status";
|
||||
import type { CoreMessage } from "ai";
|
||||
import { z } from "zod";
|
||||
import { getSpace, updateSpace } from "~/trigger/utils/space-utils";
|
||||
import { getSpaceEpisodeCount } from "~/services/graphModels/space";
|
||||
|
||||
export interface SpaceSummaryPayload {
|
||||
userId: string;
|
||||
spaceId: string; // Single space only
|
||||
triggerSource?: "assignment" | "manual" | "scheduled";
|
||||
}
|
||||
|
||||
interface SpaceEpisodeData {
|
||||
uuid: string;
|
||||
content: string;
|
||||
originalContent: string;
|
||||
source: string;
|
||||
createdAt: Date;
|
||||
validAt: Date;
|
||||
metadata: any;
|
||||
sessionId: string | null;
|
||||
}
|
||||
|
||||
interface SpaceSummaryData {
|
||||
spaceId: string;
|
||||
spaceName: string;
|
||||
spaceDescription?: string;
|
||||
contextCount: number;
|
||||
summary: string;
|
||||
keyEntities: string[];
|
||||
themes: string[];
|
||||
confidence: number;
|
||||
lastUpdated: Date;
|
||||
isIncremental: boolean;
|
||||
}
|
||||
|
||||
// Zod schema for LLM response validation
|
||||
const SummaryResultSchema = z.object({
|
||||
summary: z.string(),
|
||||
keyEntities: z.array(z.string()),
|
||||
themes: z.array(z.string()),
|
||||
confidence: z.number().min(0).max(1),
|
||||
});
|
||||
|
||||
const CONFIG = {
|
||||
maxEpisodesForSummary: 20, // Limit episodes for performance
|
||||
minEpisodesForSummary: 1, // Minimum episodes to generate summary
|
||||
summaryEpisodeThreshold: 5, // Minimum new episodes required to trigger summary (configurable)
|
||||
};
|
||||
|
||||
export interface SpaceSummaryResult {
|
||||
success: boolean;
|
||||
spaceId: string;
|
||||
triggerSource: string;
|
||||
summary?: {
|
||||
statementCount: number;
|
||||
confidence: number;
|
||||
themesCount: number;
|
||||
} | null;
|
||||
reason?: string;
|
||||
}
|
||||
|
||||
/**
|
||||
* Core business logic for space summary generation
|
||||
* This is shared between Trigger.dev and BullMQ implementations
|
||||
*/
|
||||
export async function processSpaceSummary(
|
||||
payload: SpaceSummaryPayload,
|
||||
): Promise<SpaceSummaryResult> {
|
||||
const { userId, spaceId, triggerSource = "manual" } = payload;
|
||||
|
||||
logger.info(`Starting space summary generation`, {
|
||||
userId,
|
||||
spaceId,
|
||||
triggerSource,
|
||||
});
|
||||
|
||||
try {
|
||||
// Update status to processing
|
||||
await updateSpaceStatus(spaceId, SPACE_STATUS.PROCESSING, {
|
||||
userId,
|
||||
operation: "space-summary",
|
||||
metadata: { triggerSource, phase: "start_summary" },
|
||||
});
|
||||
|
||||
// Generate summary for the single space
|
||||
const summaryResult = await generateSpaceSummary(
|
||||
spaceId,
|
||||
userId,
|
||||
triggerSource,
|
||||
);
|
||||
|
||||
if (summaryResult) {
|
||||
// Store the summary
|
||||
await storeSummary(summaryResult);
|
||||
|
||||
// Update status to ready after successful completion
|
||||
await updateSpaceStatus(spaceId, SPACE_STATUS.READY, {
|
||||
userId,
|
||||
operation: "space-summary",
|
||||
metadata: {
|
||||
triggerSource,
|
||||
phase: "completed_summary",
|
||||
contextCount: summaryResult.contextCount,
|
||||
confidence: summaryResult.confidence,
|
||||
},
|
||||
});
|
||||
|
||||
logger.info(`Generated summary for space ${spaceId}`, {
|
||||
statementCount: summaryResult.contextCount,
|
||||
confidence: summaryResult.confidence,
|
||||
themes: summaryResult.themes.length,
|
||||
triggerSource,
|
||||
});
|
||||
|
||||
return {
|
||||
success: true,
|
||||
spaceId,
|
||||
triggerSource,
|
||||
summary: {
|
||||
statementCount: summaryResult.contextCount,
|
||||
confidence: summaryResult.confidence,
|
||||
themesCount: summaryResult.themes.length,
|
||||
},
|
||||
};
|
||||
} else {
|
||||
// No summary generated - this could be due to insufficient episodes or no new episodes
|
||||
// This is not an error state, so update status to ready
|
||||
await updateSpaceStatus(spaceId, SPACE_STATUS.READY, {
|
||||
userId,
|
||||
operation: "space-summary",
|
||||
metadata: {
|
||||
triggerSource,
|
||||
phase: "no_summary_needed",
|
||||
reason: "Insufficient episodes or no new episodes to summarize",
|
||||
},
|
||||
});
|
||||
|
||||
logger.info(
|
||||
`No summary generated for space ${spaceId} - insufficient or no new episodes`,
|
||||
);
|
||||
return {
|
||||
success: true,
|
||||
spaceId,
|
||||
triggerSource,
|
||||
summary: null,
|
||||
reason: "No episodes to summarize",
|
||||
};
|
||||
}
|
||||
} catch (error) {
|
||||
// Update status to error on exception
|
||||
try {
|
||||
await updateSpaceStatus(spaceId, SPACE_STATUS.ERROR, {
|
||||
userId,
|
||||
operation: "space-summary",
|
||||
metadata: {
|
||||
triggerSource,
|
||||
phase: "exception",
|
||||
error: error instanceof Error ? error.message : "Unknown error",
|
||||
},
|
||||
});
|
||||
} catch (statusError) {
|
||||
logger.warn(`Failed to update status to error for space ${spaceId}`, {
|
||||
statusError,
|
||||
});
|
||||
}
|
||||
|
||||
logger.error(
|
||||
`Error in space summary generation for space ${spaceId}:`,
|
||||
error as Record<string, unknown>,
|
||||
);
|
||||
throw error;
|
||||
}
|
||||
}
|
||||
|
||||
async function generateSpaceSummary(
|
||||
spaceId: string,
|
||||
userId: string,
|
||||
triggerSource?: "assignment" | "manual" | "scheduled",
|
||||
): Promise<SpaceSummaryData | null> {
|
||||
try {
|
||||
// 1. Get space details
|
||||
const spaceService = new SpaceService();
|
||||
const space = await spaceService.getSpace(spaceId, userId);
|
||||
|
||||
if (!space) {
|
||||
logger.warn(`Space ${spaceId} not found for user ${userId}`);
|
||||
return null;
|
||||
}
|
||||
|
||||
// 2. Check episode count threshold (skip for manual triggers)
|
||||
if (triggerSource !== "manual") {
|
||||
const currentEpisodeCount = await getSpaceEpisodeCount(spaceId, userId);
|
||||
const lastSummaryEpisodeCount = space.contextCount || 0;
|
||||
const episodeDifference = currentEpisodeCount - lastSummaryEpisodeCount;
|
||||
|
||||
if (
|
||||
episodeDifference < CONFIG.summaryEpisodeThreshold ||
|
||||
lastSummaryEpisodeCount !== 0
|
||||
) {
|
||||
logger.info(
|
||||
`Skipping summary generation for space ${spaceId}: only ${episodeDifference} new episodes (threshold: ${CONFIG.summaryEpisodeThreshold})`,
|
||||
{
|
||||
currentEpisodeCount,
|
||||
lastSummaryEpisodeCount,
|
||||
episodeDifference,
|
||||
threshold: CONFIG.summaryEpisodeThreshold,
|
||||
},
|
||||
);
|
||||
return null;
|
||||
}
|
||||
|
||||
logger.info(
|
||||
`Proceeding with summary generation for space ${spaceId}: ${episodeDifference} new episodes (threshold: ${CONFIG.summaryEpisodeThreshold})`,
|
||||
{
|
||||
currentEpisodeCount,
|
||||
lastSummaryEpisodeCount,
|
||||
episodeDifference,
|
||||
},
|
||||
);
|
||||
}
|
||||
|
||||
// 2. Check for existing summary
|
||||
const existingSummary = await getExistingSummary(spaceId);
|
||||
const isIncremental = existingSummary !== null;
|
||||
|
||||
// 3. Get episodes (all or new ones based on existing summary)
|
||||
const episodes = await getSpaceEpisodes(
|
||||
spaceId,
|
||||
userId,
|
||||
isIncremental ? existingSummary?.lastUpdated : undefined,
|
||||
);
|
||||
|
||||
// Handle case where no new episodes exist for incremental update
|
||||
if (isIncremental && episodes.length === 0) {
|
||||
logger.info(
|
||||
`No new episodes found for space ${spaceId}, skipping summary update`,
|
||||
);
|
||||
return null;
|
||||
}
|
||||
|
||||
// Check minimum episode requirement for new summaries only
|
||||
if (!isIncremental && episodes.length < CONFIG.minEpisodesForSummary) {
|
||||
logger.info(
|
||||
`Space ${spaceId} has insufficient episodes (${episodes.length}) for new summary`,
|
||||
);
|
||||
return null;
|
||||
}
|
||||
|
||||
// 4. Process episodes using unified approach
|
||||
let summaryResult;
|
||||
|
||||
if (episodes.length > CONFIG.maxEpisodesForSummary) {
|
||||
logger.info(
|
||||
`Large space detected (${episodes.length} episodes). Processing in batches.`,
|
||||
);
|
||||
|
||||
// Process in batches, each building on previous result
|
||||
const batches: SpaceEpisodeData[][] = [];
|
||||
for (let i = 0; i < episodes.length; i += CONFIG.maxEpisodesForSummary) {
|
||||
batches.push(episodes.slice(i, i + CONFIG.maxEpisodesForSummary));
|
||||
}
|
||||
|
||||
let currentSummary = existingSummary?.summary || null;
|
||||
let currentThemes = existingSummary?.themes || [];
|
||||
let cumulativeConfidence = 0;
|
||||
|
||||
for (const [batchIndex, batch] of batches.entries()) {
|
||||
logger.info(
|
||||
`Processing batch ${batchIndex + 1}/${batches.length} with ${batch.length} episodes`,
|
||||
);
|
||||
|
||||
const batchResult = await generateUnifiedSummary(
|
||||
space.name,
|
||||
space.description as string,
|
||||
batch,
|
||||
currentSummary,
|
||||
currentThemes,
|
||||
);
|
||||
|
||||
if (batchResult) {
|
||||
currentSummary = batchResult.summary;
|
||||
currentThemes = batchResult.themes;
|
||||
cumulativeConfidence += batchResult.confidence;
|
||||
} else {
|
||||
logger.warn(`Failed to process batch ${batchIndex + 1}`);
|
||||
}
|
||||
|
||||
// Small delay between batches
|
||||
if (batchIndex < batches.length - 1) {
|
||||
await new Promise((resolve) => setTimeout(resolve, 500));
|
||||
}
|
||||
}
|
||||
|
||||
summaryResult = currentSummary
|
||||
? {
|
||||
summary: currentSummary,
|
||||
themes: currentThemes,
|
||||
confidence: Math.min(cumulativeConfidence / batches.length, 1.0),
|
||||
}
|
||||
: null;
|
||||
} else {
|
||||
logger.info(
|
||||
`Processing ${episodes.length} episodes with unified approach`,
|
||||
);
|
||||
|
||||
// Use unified approach for smaller spaces
|
||||
summaryResult = await generateUnifiedSummary(
|
||||
space.name,
|
||||
space.description as string,
|
||||
episodes,
|
||||
existingSummary?.summary || null,
|
||||
existingSummary?.themes || [],
|
||||
);
|
||||
}
|
||||
|
||||
if (!summaryResult) {
|
||||
logger.warn(`Failed to generate LLM summary for space ${spaceId}`);
|
||||
return null;
|
||||
}
|
||||
|
||||
// Get the actual current counts from Neo4j
|
||||
const currentEpisodeCount = await getSpaceEpisodeCount(spaceId, userId);
|
||||
|
||||
return {
|
||||
spaceId: space.uuid,
|
||||
spaceName: space.name,
|
||||
spaceDescription: space.description as string,
|
||||
contextCount: currentEpisodeCount,
|
||||
summary: summaryResult.summary,
|
||||
keyEntities: summaryResult.keyEntities || [],
|
||||
themes: summaryResult.themes,
|
||||
confidence: summaryResult.confidence,
|
||||
lastUpdated: new Date(),
|
||||
isIncremental,
|
||||
};
|
||||
} catch (error) {
|
||||
logger.error(
|
||||
`Error generating summary for space ${spaceId}:`,
|
||||
error as Record<string, unknown>,
|
||||
);
|
||||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
async function generateUnifiedSummary(
|
||||
spaceName: string,
|
||||
spaceDescription: string | undefined,
|
||||
episodes: SpaceEpisodeData[],
|
||||
previousSummary: string | null = null,
|
||||
previousThemes: string[] = [],
|
||||
): Promise<{
|
||||
summary: string;
|
||||
themes: string[];
|
||||
confidence: number;
|
||||
keyEntities?: string[];
|
||||
} | null> {
|
||||
try {
|
||||
const prompt = createUnifiedSummaryPrompt(
|
||||
spaceName,
|
||||
spaceDescription,
|
||||
episodes,
|
||||
previousSummary,
|
||||
previousThemes,
|
||||
);
|
||||
|
||||
// Space summary generation requires HIGH complexity (creative synthesis, narrative generation)
|
||||
let responseText = "";
|
||||
await makeModelCall(
|
||||
false,
|
||||
prompt,
|
||||
(text: string) => {
|
||||
responseText = text;
|
||||
},
|
||||
undefined,
|
||||
"high",
|
||||
);
|
||||
|
||||
return parseSummaryResponse(responseText);
|
||||
} catch (error) {
|
||||
logger.error(
|
||||
"Error generating unified summary:",
|
||||
error as Record<string, unknown>,
|
||||
);
|
||||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
function createUnifiedSummaryPrompt(
|
||||
spaceName: string,
|
||||
spaceDescription: string | undefined,
|
||||
episodes: SpaceEpisodeData[],
|
||||
previousSummary: string | null,
|
||||
previousThemes: string[],
|
||||
): CoreMessage[] {
|
||||
// If there are no episodes and no previous summary, we cannot generate a meaningful summary
|
||||
if (episodes.length === 0 && previousSummary === null) {
|
||||
throw new Error(
|
||||
"Cannot generate summary without episodes or existing summary",
|
||||
);
|
||||
}
|
||||
|
||||
const episodesText = episodes
|
||||
.map(
|
||||
(episode) =>
|
||||
`- ${episode.content} (Source: ${episode.source}, Session: ${episode.sessionId || "N/A"})`,
|
||||
)
|
||||
.join("\n");
|
||||
|
||||
// Extract key entities and themes from episode content
|
||||
const contentWords = episodes
|
||||
.map((ep) => ep.content.toLowerCase())
|
||||
.join(" ")
|
||||
.split(/\s+/)
|
||||
.filter((word) => word.length > 3);
|
||||
|
||||
const wordFrequency = new Map<string, number>();
|
||||
contentWords.forEach((word) => {
|
||||
wordFrequency.set(word, (wordFrequency.get(word) || 0) + 1);
|
||||
});
|
||||
|
||||
const topEntities = Array.from(wordFrequency.entries())
|
||||
.sort(([, a], [, b]) => b - a)
|
||||
.slice(0, 10)
|
||||
.map(([word]) => word);
|
||||
|
||||
const isUpdate = previousSummary !== null;
|
||||
|
||||
return [
|
||||
{
|
||||
role: "system",
|
||||
content: `You are an expert at analyzing and summarizing episodes within semantic spaces based on the space's intent and purpose. Your task is to ${isUpdate ? "update an existing summary by integrating new episodes" : "create a comprehensive summary of episodes"}.
|
||||
|
||||
CRITICAL RULES:
|
||||
1. Base your summary ONLY on insights derived from the actual content/episodes provided
|
||||
2. Use the space's INTENT/PURPOSE (from description) to guide what to summarize and how to organize it
|
||||
3. Write in a factual, neutral tone - avoid promotional language ("pivotal", "invaluable", "cutting-edge")
|
||||
4. Be specific and concrete - reference actual content, patterns, and insights found in the episodes
|
||||
5. If episodes are insufficient for meaningful insights, state that more data is needed
|
||||
|
||||
INTENT-DRIVEN SUMMARIZATION:
|
||||
Your summary should SERVE the space's intended purpose. Examples:
|
||||
- "Learning React" → Summarize React concepts, patterns, techniques learned
|
||||
- "Project X Updates" → Summarize progress, decisions, blockers, next steps
|
||||
- "Health Tracking" → Summarize metrics, trends, observations, insights
|
||||
- "Guidelines for React" → Extract actionable patterns, best practices, rules
|
||||
- "Evolution of design thinking" → Track how thinking changed over time, decision points
|
||||
The intent defines WHY this space exists - organize content to serve that purpose.
|
||||
|
||||
INSTRUCTIONS:
|
||||
${
|
||||
isUpdate
|
||||
? `1. Review the existing summary and themes carefully
|
||||
2. Analyze the new episodes for patterns and insights that align with the space's intent
|
||||
3. Identify connecting points between existing knowledge and new episodes
|
||||
4. Update the summary to seamlessly integrate new information while preserving valuable existing insights
|
||||
5. Evolve themes by adding new ones or refining existing ones based on the space's purpose
|
||||
6. Organize the summary to serve the space's intended use case`
|
||||
: `1. Analyze the semantic content and relationships within the episodes
|
||||
2. Identify topics/sections that align with the space's INTENT and PURPOSE
|
||||
3. Create a coherent summary that serves the space's intended use case
|
||||
4. Organize the summary based on the space's purpose (not generic frequency-based themes)`
|
||||
}
|
||||
${isUpdate ? "7" : "5"}. Assess your confidence in the ${isUpdate ? "updated" : ""} summary quality (0.0-1.0)
|
||||
|
||||
INTENT-ALIGNED ORGANIZATION:
|
||||
- Organize sections based on what serves the space's purpose
|
||||
- Topics don't need minimum episode counts - relevance to intent matters most
|
||||
- Each section should provide value aligned with the space's intended use
|
||||
- For "guidelines" spaces: focus on actionable patterns
|
||||
- For "tracking" spaces: focus on temporal patterns and changes
|
||||
- For "learning" spaces: focus on concepts and insights gained
|
||||
- Let the space's intent drive the structure, not rigid rules
|
||||
|
||||
${
|
||||
isUpdate
|
||||
? `CONNECTION FOCUS:
|
||||
- Entity relationships that span across batches/time
|
||||
- Theme evolution and expansion
|
||||
- Temporal patterns and progressions
|
||||
- Contradictions or confirmations of existing insights
|
||||
- New insights that complement existing knowledge`
|
||||
: ""
|
||||
}
|
||||
|
||||
RESPONSE FORMAT:
|
||||
Provide your response inside <output></output> tags with valid JSON. Include both HTML summary and markdown format.
|
||||
|
||||
<output>
|
||||
{
|
||||
"summary": "${isUpdate ? "Updated HTML summary that integrates new insights with existing knowledge. Write factually about what the statements reveal - mention specific entities, relationships, and patterns found in the data. Avoid marketing language. Use HTML tags for structure." : "Factual HTML summary based on patterns found in the statements. Report what the data actually shows - specific entities, relationships, frequencies, and concrete insights. Avoid promotional language. Use HTML tags like <p>, <strong>, <ul>, <li> for structure. Keep it concise and evidence-based."}",
|
||||
"keyEntities": ["entity1", "entity2", "entity3"],
|
||||
"themes": ["${isUpdate ? 'updated_theme1", "new_theme2", "evolved_theme3' : 'theme1", "theme2", "theme3'}"],
|
||||
"confidence": 0.85
|
||||
}
|
||||
</output>
|
||||
|
||||
JSON FORMATTING RULES:
|
||||
- HTML content in summary field is allowed and encouraged
|
||||
- Escape quotes within strings as \"
|
||||
- Escape HTML angle brackets if needed: < and >
|
||||
- Use proper HTML tags for structure: <p>, <strong>, <em>, <ul>, <li>, <h3>, etc.
|
||||
- HTML content should be well-formed and semantic
|
||||
|
||||
GUIDELINES:
|
||||
${
|
||||
isUpdate
|
||||
? `- Preserve valuable insights from existing summary
|
||||
- Integrate new information by highlighting connections
|
||||
- Themes should evolve naturally, don't replace wholesale
|
||||
- The updated summary should read as a coherent whole
|
||||
- Make the summary user-friendly and explain what value this space provides`
|
||||
: `- Report only what the episodes actually reveal - be specific and concrete
|
||||
- Cite actual content and patterns found in the episodes
|
||||
- Avoid generic descriptions that could apply to any space
|
||||
- Use neutral, factual language - no "comprehensive", "robust", "cutting-edge" etc.
|
||||
- Themes must be backed by at least 3 supporting episodes with clear evidence
|
||||
- Better to have fewer, well-supported themes than many weak ones
|
||||
- Confidence should reflect actual data quality and coverage, not aspirational goals`
|
||||
}`,
|
||||
},
|
||||
{
|
||||
role: "user",
|
||||
content: `SPACE INFORMATION:
|
||||
Name: "${spaceName}"
|
||||
Intent/Purpose: ${spaceDescription || "No specific intent provided - organize naturally based on content"}
|
||||
|
||||
${
|
||||
isUpdate
|
||||
? `EXISTING SUMMARY:
|
||||
${previousSummary}
|
||||
|
||||
EXISTING THEMES:
|
||||
${previousThemes.join(", ")}
|
||||
|
||||
NEW EPISODES TO INTEGRATE (${episodes.length} episodes):`
|
||||
: `EPISODES IN THIS SPACE (${episodes.length} episodes):`
|
||||
}
|
||||
${episodesText}
|
||||
|
||||
${
|
||||
episodes.length > 0
|
||||
? `TOP WORDS BY FREQUENCY:
|
||||
${topEntities.join(", ")}`
|
||||
: ""
|
||||
}
|
||||
|
||||
${
|
||||
isUpdate
|
||||
? "Please identify connections between the existing summary and new episodes, then update the summary to integrate the new insights coherently. Organize the summary to SERVE the space's intent/purpose. Remember: only summarize insights from the actual episode content."
|
||||
: "Please analyze the episodes and provide a comprehensive summary that SERVES the space's intent/purpose. Organize sections based on what would be most valuable for this space's intended use case. If the intent is unclear, organize naturally based on content patterns. Only summarize insights from actual episode content."
|
||||
}`,
|
||||
},
|
||||
];
|
||||
}
|
||||
|
||||
async function getExistingSummary(spaceId: string): Promise<{
|
||||
summary: string;
|
||||
themes: string[];
|
||||
lastUpdated: Date;
|
||||
contextCount: number;
|
||||
} | null> {
|
||||
try {
|
||||
const existingSummary = await getSpace(spaceId);
|
||||
|
||||
if (existingSummary?.summary) {
|
||||
return {
|
||||
summary: existingSummary.summary,
|
||||
themes: existingSummary.themes,
|
||||
lastUpdated: existingSummary.summaryGeneratedAt || new Date(),
|
||||
contextCount: existingSummary.contextCount || 0,
|
||||
};
|
||||
}
|
||||
|
||||
return null;
|
||||
} catch (error) {
|
||||
logger.warn(`Failed to get existing summary for space ${spaceId}:`, {
|
||||
error,
|
||||
});
|
||||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
async function getSpaceEpisodes(
|
||||
spaceId: string,
|
||||
userId: string,
|
||||
sinceDate?: Date,
|
||||
): Promise<SpaceEpisodeData[]> {
|
||||
// Query episodes directly using Space-[:HAS_EPISODE]->Episode relationships
|
||||
const params: any = { spaceId, userId };
|
||||
|
||||
let dateCondition = "";
|
||||
if (sinceDate) {
|
||||
dateCondition = "AND e.createdAt > $sinceDate";
|
||||
params.sinceDate = sinceDate.toISOString();
|
||||
}
|
||||
|
||||
const query = `
|
||||
MATCH (space:Space {uuid: $spaceId, userId: $userId})-[:HAS_EPISODE]->(e:Episode {userId: $userId})
|
||||
WHERE e IS NOT NULL ${dateCondition}
|
||||
RETURN DISTINCT e
|
||||
ORDER BY e.createdAt DESC
|
||||
`;
|
||||
|
||||
const result = await runQuery(query, params);
|
||||
|
||||
return result.map((record) => {
|
||||
const episode = record.get("e").properties;
|
||||
return {
|
||||
uuid: episode.uuid,
|
||||
content: episode.content,
|
||||
originalContent: episode.originalContent,
|
||||
source: episode.source,
|
||||
createdAt: new Date(episode.createdAt),
|
||||
validAt: new Date(episode.validAt),
|
||||
metadata: JSON.parse(episode.metadata || "{}"),
|
||||
sessionId: episode.sessionId,
|
||||
};
|
||||
});
|
||||
}
|
||||
|
||||
function parseSummaryResponse(response: string): {
|
||||
summary: string;
|
||||
themes: string[];
|
||||
confidence: number;
|
||||
keyEntities?: string[];
|
||||
} | null {
|
||||
try {
|
||||
// Extract content from <output> tags
|
||||
const outputMatch = response.match(/<output>([\s\S]*?)<\/output>/);
|
||||
if (!outputMatch) {
|
||||
logger.warn("No <output> tags found in LLM summary response");
|
||||
logger.debug("Full LLM response:", { response });
|
||||
return null;
|
||||
}
|
||||
|
||||
let jsonContent = outputMatch[1].trim();
|
||||
|
||||
let parsed;
|
||||
try {
|
||||
parsed = JSON.parse(jsonContent);
|
||||
} catch (jsonError) {
|
||||
logger.warn("JSON parsing failed, attempting cleanup and retry", {
|
||||
originalError: jsonError,
|
||||
jsonContent: jsonContent.substring(0, 500) + "...", // Log first 500 chars
|
||||
});
|
||||
|
||||
// More aggressive cleanup for malformed JSON
|
||||
jsonContent = jsonContent
|
||||
.replace(/([^\\])"/g, '$1\\"') // Escape unescaped quotes
|
||||
.replace(/^"/g, '\\"') // Escape quotes at start
|
||||
.replace(/\\\\"/g, '\\"'); // Fix double-escaped quotes
|
||||
|
||||
parsed = JSON.parse(jsonContent);
|
||||
}
|
||||
|
||||
// Validate the response structure
|
||||
const validationResult = SummaryResultSchema.safeParse(parsed);
|
||||
if (!validationResult.success) {
|
||||
logger.warn("Invalid LLM summary response format:", {
|
||||
error: validationResult.error,
|
||||
parsedData: parsed,
|
||||
});
|
||||
return null;
|
||||
}
|
||||
|
||||
return validationResult.data;
|
||||
} catch (error) {
|
||||
logger.error(
|
||||
"Error parsing LLM summary response:",
|
||||
error as Record<string, unknown>,
|
||||
);
|
||||
logger.debug("Failed response content:", { response });
|
||||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
async function storeSummary(summaryData: SpaceSummaryData): Promise<void> {
|
||||
try {
|
||||
// Store in PostgreSQL for API access and persistence
|
||||
await updateSpace(summaryData);
|
||||
|
||||
// Also store in Neo4j for graph-based queries
|
||||
const query = `
|
||||
MATCH (space:Space {uuid: $spaceId})
|
||||
SET space.summary = $summary,
|
||||
space.keyEntities = $keyEntities,
|
||||
space.themes = $themes,
|
||||
space.summaryConfidence = $confidence,
|
||||
space.summaryContextCount = $contextCount,
|
||||
space.summaryLastUpdated = datetime($lastUpdated)
|
||||
RETURN space
|
||||
`;
|
||||
|
||||
await runQuery(query, {
|
||||
spaceId: summaryData.spaceId,
|
||||
summary: summaryData.summary,
|
||||
keyEntities: summaryData.keyEntities,
|
||||
themes: summaryData.themes,
|
||||
confidence: summaryData.confidence,
|
||||
contextCount: summaryData.contextCount,
|
||||
lastUpdated: summaryData.lastUpdated.toISOString(),
|
||||
});
|
||||
|
||||
logger.info(`Stored summary for space ${summaryData.spaceId}`, {
|
||||
themes: summaryData.themes.length,
|
||||
keyEntities: summaryData.keyEntities.length,
|
||||
confidence: summaryData.confidence,
|
||||
});
|
||||
} catch (error) {
|
||||
logger.error(
|
||||
`Error storing summary for space ${summaryData.spaceId}:`,
|
||||
error as Record<string, unknown>,
|
||||
);
|
||||
throw error;
|
||||
}
|
||||
}
|
||||
@ -15,7 +15,8 @@ import type { z } from "zod";
|
||||
import type { IngestBodyRequest } from "~/jobs/ingest/ingest-episode.logic";
|
||||
import type { CreateConversationTitlePayload } from "~/jobs/conversation/create-title.logic";
|
||||
import type { SessionCompactionPayload } from "~/jobs/session/session-compaction.logic";
|
||||
import { type SpaceAssignmentPayload } from "~/trigger/spaces/space-assignment";
|
||||
import type { SpaceAssignmentPayload } from "~/jobs/spaces/space-assignment.logic";
|
||||
import type { SpaceSummaryPayload } from "~/jobs/spaces/space-summary.logic";
|
||||
|
||||
type QueueProvider = "trigger" | "bullmq";
|
||||
|
||||
@ -144,22 +145,86 @@ export async function enqueueSessionCompaction(
|
||||
|
||||
/**
|
||||
* Enqueue space assignment job
|
||||
* (Helper for common job logic to call)
|
||||
*/
|
||||
export async function enqueueSpaceAssignment(
|
||||
payload: SpaceAssignmentPayload,
|
||||
): Promise<void> {
|
||||
): Promise<{ id?: string }> {
|
||||
const provider = env.QUEUE_PROVIDER as QueueProvider;
|
||||
|
||||
if (provider === "trigger") {
|
||||
const { triggerSpaceAssignment } = await import(
|
||||
"~/trigger/spaces/space-assignment"
|
||||
);
|
||||
await triggerSpaceAssignment(payload);
|
||||
const handler = await triggerSpaceAssignment(payload);
|
||||
return { id: handler.id };
|
||||
} else {
|
||||
// For BullMQ, space assignment is not implemented yet
|
||||
// You can add it later when needed
|
||||
console.warn("Space assignment not implemented for BullMQ yet");
|
||||
// BullMQ
|
||||
const { spaceAssignmentQueue } = await import("~/bullmq/queues");
|
||||
const job = await spaceAssignmentQueue.add("space-assignment", payload, {
|
||||
jobId: `space-assignment-${payload.userId}-${payload.mode}-${Date.now()}`,
|
||||
attempts: 3,
|
||||
backoff: { type: "exponential", delay: 2000 },
|
||||
});
|
||||
return { id: job.id };
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Enqueue space summary job
|
||||
*/
|
||||
export async function enqueueSpaceSummary(
|
||||
payload: SpaceSummaryPayload,
|
||||
): Promise<{ id?: string }> {
|
||||
const provider = env.QUEUE_PROVIDER as QueueProvider;
|
||||
|
||||
if (provider === "trigger") {
|
||||
const { triggerSpaceSummary } = await import(
|
||||
"~/trigger/spaces/space-summary"
|
||||
);
|
||||
const handler = await triggerSpaceSummary(payload);
|
||||
return { id: handler.id };
|
||||
} else {
|
||||
// BullMQ
|
||||
const { spaceSummaryQueue } = await import("~/bullmq/queues");
|
||||
const job = await spaceSummaryQueue.add("space-summary", payload, {
|
||||
jobId: `space-summary-${payload.spaceId}-${Date.now()}`,
|
||||
attempts: 3,
|
||||
backoff: { type: "exponential", delay: 2000 },
|
||||
});
|
||||
return { id: job.id };
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Enqueue BERT topic analysis job
|
||||
*/
|
||||
export async function enqueueBertTopicAnalysis(payload: {
|
||||
userId: string;
|
||||
workspaceId: string;
|
||||
minTopicSize?: number;
|
||||
nrTopics?: number;
|
||||
}): Promise<{ id?: string }> {
|
||||
const provider = env.QUEUE_PROVIDER as QueueProvider;
|
||||
|
||||
if (provider === "trigger") {
|
||||
const { bertTopicAnalysisTask } = await import(
|
||||
"~/trigger/bert/topic-analysis"
|
||||
);
|
||||
const handler = await bertTopicAnalysisTask.trigger(payload, {
|
||||
queue: "bert-topic-analysis",
|
||||
concurrencyKey: payload.userId,
|
||||
tags: [payload.userId, "bert-analysis"],
|
||||
});
|
||||
return { id: handler.id };
|
||||
} else {
|
||||
// BullMQ
|
||||
const { bertTopicQueue } = await import("~/bullmq/queues");
|
||||
const job = await bertTopicQueue.add("topic-analysis", payload, {
|
||||
jobId: `bert-${payload.userId}-${Date.now()}`,
|
||||
attempts: 2, // Only 2 attempts for expensive operations
|
||||
backoff: { type: "exponential", delay: 5000 },
|
||||
});
|
||||
return { id: job.id };
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
@ -29,12 +29,6 @@ Exclude:
|
||||
• Anything not explicitly consented to share
|
||||
don't store anything the user did not explicitly consent to share.`;
|
||||
|
||||
const githubDescription = `Everything related to my GitHub work - repos I'm working on, projects I contribute to, code I'm writing, PRs I'm reviewing. Basically my coding life on GitHub.`;
|
||||
|
||||
const healthDescription = `My health and wellness stuff - how I'm feeling, what I'm learning about my body, experiments I'm trying, patterns I notice. Whatever matters to me about staying healthy.`;
|
||||
|
||||
const fitnessDescription = `My workouts and training - what I'm doing at the gym, runs I'm going on, progress I'm making, goals I'm chasing. Anything related to physical exercise and getting stronger.`;
|
||||
|
||||
export async function createWorkspace(
|
||||
input: CreateWorkspaceDto,
|
||||
): Promise<Workspace> {
|
||||
@ -56,32 +50,7 @@ export async function createWorkspace(
|
||||
await ensureBillingInitialized(workspace.id);
|
||||
|
||||
// Create default spaces
|
||||
await Promise.all([
|
||||
spaceService.createSpace({
|
||||
name: "Profile",
|
||||
description: profileRule,
|
||||
userId: input.userId,
|
||||
workspaceId: workspace.id,
|
||||
}),
|
||||
spaceService.createSpace({
|
||||
name: "GitHub",
|
||||
description: githubDescription,
|
||||
userId: input.userId,
|
||||
workspaceId: workspace.id,
|
||||
}),
|
||||
spaceService.createSpace({
|
||||
name: "Health",
|
||||
description: healthDescription,
|
||||
userId: input.userId,
|
||||
workspaceId: workspace.id,
|
||||
}),
|
||||
spaceService.createSpace({
|
||||
name: "Fitness",
|
||||
description: fitnessDescription,
|
||||
userId: input.userId,
|
||||
workspaceId: workspace.id,
|
||||
}),
|
||||
]);
|
||||
await Promise.all([]);
|
||||
|
||||
try {
|
||||
const response = await sendEmail({ email: "welcome", to: user.email });
|
||||
|
||||
@ -19,7 +19,10 @@ import {
|
||||
import { getModel } from "~/lib/model.server";
|
||||
import { UserTypeEnum } from "@core/types";
|
||||
import { nanoid } from "nanoid";
|
||||
import { getOrCreatePersonalAccessToken } from "~/services/personalAccessToken.server";
|
||||
import {
|
||||
deletePersonalAccessToken,
|
||||
getOrCreatePersonalAccessToken,
|
||||
} from "~/services/personalAccessToken.server";
|
||||
import {
|
||||
hasAnswer,
|
||||
hasQuestion,
|
||||
@ -126,6 +129,7 @@ const { loader, action } = createHybridActionApiRoute(
|
||||
});
|
||||
|
||||
result.consumeStream(); // no await
|
||||
await deletePersonalAccessToken(pat?.id);
|
||||
|
||||
return result.toUIMessageStreamResponse({
|
||||
originalMessages: validatedMessages,
|
||||
|
||||
@ -1,6 +1,7 @@
|
||||
import { json } from "@remix-run/node";
|
||||
import { z } from "zod";
|
||||
import { prisma } from "~/db.server";
|
||||
|
||||
import { createHybridLoaderApiRoute } from "~/services/routeBuilders/apiBuilder.server";
|
||||
|
||||
// Schema for logs search parameters
|
||||
|
||||
@ -7,7 +7,10 @@ import { SpaceService } from "~/services/space.server";
|
||||
import { json } from "@remix-run/node";
|
||||
import { prisma } from "~/db.server";
|
||||
import { apiCors } from "~/utils/apiCors";
|
||||
import { isTriggerDeployment } from "~/lib/queue-adapter.server";
|
||||
import {
|
||||
enqueueSpaceAssignment,
|
||||
isTriggerDeployment,
|
||||
} from "~/lib/queue-adapter.server";
|
||||
|
||||
const spaceService = new SpaceService();
|
||||
|
||||
@ -74,6 +77,14 @@ const { action } = createHybridActionApiRoute(
|
||||
workspaceId: user.Workspace.id,
|
||||
});
|
||||
|
||||
await enqueueSpaceAssignment({
|
||||
userId: user.id,
|
||||
workspaceId: user.Workspace.id,
|
||||
mode: "new_space",
|
||||
newSpaceId: space.id,
|
||||
batchSize: 25, // Analyze recent statements for the new space
|
||||
});
|
||||
|
||||
return json({ space, success: true });
|
||||
}
|
||||
|
||||
|
||||
107
apps/webapp/app/services/bertTopicAnalysis.server.ts
Normal file
107
apps/webapp/app/services/bertTopicAnalysis.server.ts
Normal file
@ -0,0 +1,107 @@
|
||||
import { prisma } from "~/trigger/utils/prisma";
|
||||
import { logger } from "~/services/logger.service";
|
||||
import { runQuery } from "~/lib/neo4j.server";
|
||||
|
||||
interface WorkspaceMetadata {
|
||||
lastTopicAnalysisAt?: string;
|
||||
[key: string]: any;
|
||||
}
|
||||
|
||||
/**
|
||||
* Check if we should trigger a BERT topic analysis for this workspace
|
||||
* Criteria: 20+ new episodes since last analysis (or no previous analysis)
|
||||
*/
|
||||
export async function shouldTriggerTopicAnalysis(
|
||||
userId: string,
|
||||
workspaceId: string,
|
||||
): Promise<boolean> {
|
||||
try {
|
||||
// Get workspace metadata
|
||||
const workspace = await prisma.workspace.findUnique({
|
||||
where: { id: workspaceId },
|
||||
select: { metadata: true },
|
||||
});
|
||||
|
||||
if (!workspace) {
|
||||
logger.warn(`Workspace not found: ${workspaceId}`);
|
||||
return false;
|
||||
}
|
||||
|
||||
const metadata = (workspace.metadata || {}) as WorkspaceMetadata;
|
||||
const lastAnalysisAt = metadata.lastTopicAnalysisAt;
|
||||
|
||||
// Count episodes since last analysis
|
||||
const query = lastAnalysisAt
|
||||
? `
|
||||
MATCH (e:Episode {userId: $userId})
|
||||
WHERE e.createdAt > datetime($lastAnalysisAt)
|
||||
RETURN count(e) as newEpisodeCount
|
||||
`
|
||||
: `
|
||||
MATCH (e:Episode {userId: $userId})
|
||||
RETURN count(e) as totalEpisodeCount
|
||||
`;
|
||||
|
||||
const result = await runQuery(query, {
|
||||
userId,
|
||||
lastAnalysisAt,
|
||||
});
|
||||
|
||||
const episodeCount = lastAnalysisAt
|
||||
? result[0]?.get("newEpisodeCount")?.toNumber() || 0
|
||||
: result[0]?.get("totalEpisodeCount")?.toNumber() || 0;
|
||||
|
||||
logger.info(
|
||||
`[Topic Analysis Check] User: ${userId}, New episodes: ${episodeCount}, Last analysis: ${lastAnalysisAt || "never"}`,
|
||||
);
|
||||
|
||||
// Trigger if 20+ new episodes
|
||||
return episodeCount >= 20;
|
||||
} catch (error) {
|
||||
logger.error(
|
||||
`[Topic Analysis Check] Error checking episode count:`,
|
||||
error,
|
||||
);
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Update workspace metadata with last topic analysis timestamp
|
||||
*/
|
||||
export async function updateLastTopicAnalysisTime(
|
||||
workspaceId: string,
|
||||
): Promise<void> {
|
||||
try {
|
||||
const workspace = await prisma.workspace.findUnique({
|
||||
where: { id: workspaceId },
|
||||
select: { metadata: true },
|
||||
});
|
||||
|
||||
if (!workspace) {
|
||||
logger.warn(`Workspace not found: ${workspaceId}`);
|
||||
return;
|
||||
}
|
||||
|
||||
const metadata = (workspace.metadata || {}) as WorkspaceMetadata;
|
||||
|
||||
await prisma.workspace.update({
|
||||
where: { id: workspaceId },
|
||||
data: {
|
||||
metadata: {
|
||||
...metadata,
|
||||
lastTopicAnalysisAt: new Date().toISOString(),
|
||||
},
|
||||
},
|
||||
});
|
||||
|
||||
logger.info(
|
||||
`[Topic Analysis] Updated last analysis timestamp for workspace: ${workspaceId}`,
|
||||
);
|
||||
} catch (error) {
|
||||
logger.error(
|
||||
`[Topic Analysis] Error updating last analysis timestamp:`,
|
||||
error,
|
||||
);
|
||||
}
|
||||
}
|
||||
@ -45,6 +45,43 @@ export async function createSpace(
|
||||
};
|
||||
}
|
||||
|
||||
/**
|
||||
* Get all active spaces for a user
|
||||
*/
|
||||
export async function getAllSpacesForUser(
|
||||
userId: string,
|
||||
): Promise<SpaceNode[]> {
|
||||
const query = `
|
||||
MATCH (s:Space {userId: $userId})
|
||||
WHERE s.isActive = true
|
||||
|
||||
// Count episodes assigned to each space
|
||||
OPTIONAL MATCH (s)-[:HAS_EPISODE]->(e:Episode {userId: $userId})
|
||||
|
||||
WITH s, count(e) as episodeCount
|
||||
RETURN s, episodeCount
|
||||
ORDER BY s.createdAt DESC
|
||||
`;
|
||||
|
||||
const result = await runQuery(query, { userId });
|
||||
|
||||
return result.map((record) => {
|
||||
const spaceData = record.get("s").properties;
|
||||
const episodeCount = record.get("episodeCount") || 0;
|
||||
|
||||
return {
|
||||
uuid: spaceData.uuid,
|
||||
name: spaceData.name,
|
||||
description: spaceData.description,
|
||||
userId: spaceData.userId,
|
||||
createdAt: new Date(spaceData.createdAt),
|
||||
updatedAt: new Date(spaceData.updatedAt),
|
||||
isActive: spaceData.isActive,
|
||||
contextCount: Number(episodeCount),
|
||||
};
|
||||
});
|
||||
}
|
||||
|
||||
/**
|
||||
* Get a specific space by ID
|
||||
*/
|
||||
|
||||
@ -58,6 +58,7 @@ async function createMcpServer(
|
||||
// Handle memory tools and integration meta-tools
|
||||
if (
|
||||
name.startsWith("memory_") ||
|
||||
name === "get_session_id" ||
|
||||
name === "get_integrations" ||
|
||||
name === "get_integration_actions" ||
|
||||
name === "execute_integration_action"
|
||||
|
||||
@ -1,262 +0,0 @@
|
||||
import { logger } from "~/services/logger.service";
|
||||
import {
|
||||
getCompactedSessionBySessionId,
|
||||
getCompactionStats,
|
||||
getSessionEpisodes,
|
||||
type CompactedSessionNode,
|
||||
} from "~/services/graphModels/compactedSession";
|
||||
import { enqueueSessionCompaction } from "~/lib/queue-adapter.server";
|
||||
|
||||
/**
|
||||
* Configuration for session compaction
|
||||
*/
|
||||
export const COMPACTION_CONFIG = {
|
||||
minEpisodesForCompaction: 5, // Minimum episodes to trigger initial compaction
|
||||
compactionThreshold: 1, // Trigger update after N new episodes
|
||||
autoCompactionEnabled: true, // Enable automatic compaction
|
||||
};
|
||||
|
||||
/**
|
||||
* SessionCompactionService - Manages session compaction lifecycle
|
||||
*/
|
||||
export class SessionCompactionService {
|
||||
/**
|
||||
* Check if a session should be compacted
|
||||
*/
|
||||
async shouldCompact(sessionId: string, userId: string): Promise<{
|
||||
shouldCompact: boolean;
|
||||
reason: string;
|
||||
episodeCount?: number;
|
||||
newEpisodeCount?: number;
|
||||
}> {
|
||||
try {
|
||||
// Get existing compact
|
||||
const existingCompact = await getCompactedSessionBySessionId(sessionId, userId);
|
||||
|
||||
if (!existingCompact) {
|
||||
// No compact exists, check if we have enough episodes
|
||||
const episodeCount = await this.getSessionEpisodeCount(sessionId, userId);
|
||||
|
||||
if (episodeCount >= COMPACTION_CONFIG.minEpisodesForCompaction) {
|
||||
return {
|
||||
shouldCompact: true,
|
||||
reason: "initial_compaction",
|
||||
episodeCount,
|
||||
};
|
||||
}
|
||||
|
||||
return {
|
||||
shouldCompact: false,
|
||||
reason: "insufficient_episodes",
|
||||
episodeCount,
|
||||
};
|
||||
}
|
||||
|
||||
// Compact exists, check if we have enough new episodes
|
||||
const newEpisodeCount = await this.getNewEpisodeCount(
|
||||
sessionId,
|
||||
userId,
|
||||
existingCompact.endTime
|
||||
);
|
||||
|
||||
if (newEpisodeCount >= COMPACTION_CONFIG.compactionThreshold) {
|
||||
return {
|
||||
shouldCompact: true,
|
||||
reason: "update_compaction",
|
||||
newEpisodeCount,
|
||||
};
|
||||
}
|
||||
|
||||
return {
|
||||
shouldCompact: false,
|
||||
reason: "insufficient_new_episodes",
|
||||
newEpisodeCount,
|
||||
};
|
||||
} catch (error) {
|
||||
logger.error(`Error checking if session should compact`, {
|
||||
sessionId,
|
||||
userId,
|
||||
error: error instanceof Error ? error.message : String(error),
|
||||
});
|
||||
|
||||
return {
|
||||
shouldCompact: false,
|
||||
reason: "error",
|
||||
};
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Get total episode count for a session
|
||||
*/
|
||||
private async getSessionEpisodeCount(
|
||||
sessionId: string,
|
||||
userId: string
|
||||
): Promise<number> {
|
||||
const episodes = await getSessionEpisodes(sessionId, userId);
|
||||
return episodes.length;
|
||||
}
|
||||
|
||||
/**
|
||||
* Get count of new episodes since last compaction
|
||||
*/
|
||||
private async getNewEpisodeCount(
|
||||
sessionId: string,
|
||||
userId: string,
|
||||
afterTime: Date
|
||||
): Promise<number> {
|
||||
const episodes = await getSessionEpisodes(sessionId, userId, afterTime);
|
||||
return episodes.length;
|
||||
}
|
||||
|
||||
/**
|
||||
* Trigger compaction for a session
|
||||
*/
|
||||
async triggerCompaction(
|
||||
sessionId: string,
|
||||
userId: string,
|
||||
source: string,
|
||||
triggerSource: "auto" | "manual" | "threshold" = "auto"
|
||||
): Promise<{ success: boolean; taskId?: string; error?: string }> {
|
||||
try {
|
||||
// Check if compaction should be triggered
|
||||
const check = await this.shouldCompact(sessionId, userId);
|
||||
|
||||
if (!check.shouldCompact) {
|
||||
logger.info(`Compaction not needed`, {
|
||||
sessionId,
|
||||
userId,
|
||||
reason: check.reason,
|
||||
});
|
||||
|
||||
return {
|
||||
success: false,
|
||||
error: `Compaction not needed: ${check.reason}`,
|
||||
};
|
||||
}
|
||||
|
||||
// Trigger the compaction task
|
||||
logger.info(`Triggering session compaction`, {
|
||||
sessionId,
|
||||
userId,
|
||||
source,
|
||||
triggerSource,
|
||||
reason: check.reason,
|
||||
});
|
||||
|
||||
const handle = await enqueueSessionCompaction({
|
||||
userId,
|
||||
sessionId,
|
||||
source,
|
||||
triggerSource,
|
||||
});
|
||||
|
||||
logger.info(`Session compaction triggered`, {
|
||||
sessionId,
|
||||
userId,
|
||||
taskId: handle.id,
|
||||
});
|
||||
|
||||
return {
|
||||
success: true,
|
||||
taskId: handle.id,
|
||||
};
|
||||
} catch (error) {
|
||||
logger.error(`Failed to trigger compaction`, {
|
||||
sessionId,
|
||||
userId,
|
||||
error: error instanceof Error ? error.message : String(error),
|
||||
});
|
||||
|
||||
return {
|
||||
success: false,
|
||||
error: error instanceof Error ? error.message : "Unknown error",
|
||||
};
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Get compacted session for recall
|
||||
*/
|
||||
async getCompactForRecall(
|
||||
sessionId: string,
|
||||
userId: string
|
||||
): Promise<CompactedSessionNode | null> {
|
||||
try {
|
||||
return await getCompactedSessionBySessionId(sessionId, userId);
|
||||
} catch (error) {
|
||||
logger.error(`Error fetching compact for recall`, {
|
||||
sessionId,
|
||||
userId,
|
||||
error: error instanceof Error ? error.message : String(error),
|
||||
});
|
||||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Get compaction statistics for a user
|
||||
*/
|
||||
async getStats(userId: string): Promise<{
|
||||
totalCompacts: number;
|
||||
totalEpisodes: number;
|
||||
averageCompressionRatio: number;
|
||||
mostRecentCompaction: Date | null;
|
||||
}> {
|
||||
try {
|
||||
return await getCompactionStats(userId);
|
||||
} catch (error) {
|
||||
logger.error(`Error fetching compaction stats`, {
|
||||
userId,
|
||||
error: error instanceof Error ? error.message : String(error),
|
||||
});
|
||||
|
||||
return {
|
||||
totalCompacts: 0,
|
||||
totalEpisodes: 0,
|
||||
averageCompressionRatio: 0,
|
||||
mostRecentCompaction: null,
|
||||
};
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Auto-trigger compaction after episode ingestion
|
||||
* Called from ingestion pipeline
|
||||
*/
|
||||
async autoTriggerAfterIngestion(
|
||||
sessionId: string | null | undefined,
|
||||
userId: string,
|
||||
source: string
|
||||
): Promise<void> {
|
||||
// Skip if no sessionId or auto-compaction disabled
|
||||
if (!sessionId || !COMPACTION_CONFIG.autoCompactionEnabled) {
|
||||
return;
|
||||
}
|
||||
|
||||
try {
|
||||
const check = await this.shouldCompact(sessionId, userId);
|
||||
|
||||
if (check.shouldCompact) {
|
||||
logger.info(`Auto-triggering compaction after ingestion`, {
|
||||
sessionId,
|
||||
userId,
|
||||
reason: check.reason,
|
||||
});
|
||||
|
||||
// Trigger compaction asynchronously (don't wait)
|
||||
await this.triggerCompaction(sessionId, userId, source, "auto");
|
||||
}
|
||||
} catch (error) {
|
||||
// Log error but don't fail ingestion
|
||||
logger.error(`Error in auto-trigger compaction`, {
|
||||
sessionId,
|
||||
userId,
|
||||
error: error instanceof Error ? error.message : String(error),
|
||||
});
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Singleton instance
|
||||
export const sessionCompactionService = new SessionCompactionService();
|
||||
@ -16,8 +16,6 @@ import {
|
||||
updateSpace,
|
||||
} from "./graphModels/space";
|
||||
import { prisma } from "~/trigger/utils/prisma";
|
||||
import { trackFeatureUsage } from "./telemetry.server";
|
||||
import { enqueueSpaceAssignment } from "~/lib/queue-adapter.server";
|
||||
|
||||
export class SpaceService {
|
||||
/**
|
||||
@ -65,26 +63,7 @@ export class SpaceService {
|
||||
logger.info(`Created space ${space.id} successfully`);
|
||||
|
||||
// Track space creation
|
||||
trackFeatureUsage("space_created", params.userId).catch(console.error);
|
||||
|
||||
// Trigger automatic LLM assignment for the new space
|
||||
try {
|
||||
await enqueueSpaceAssignment({
|
||||
userId: params.userId,
|
||||
workspaceId: params.workspaceId,
|
||||
mode: "new_space",
|
||||
newSpaceId: space.id,
|
||||
batchSize: 25, // Analyze recent statements for the new space
|
||||
});
|
||||
|
||||
logger.info(`Triggered LLM space assignment for new space ${space.id}`);
|
||||
} catch (error) {
|
||||
// Don't fail space creation if LLM assignment fails
|
||||
logger.warn(
|
||||
`Failed to trigger LLM assignment for space ${space.id}:`,
|
||||
error as Record<string, unknown>,
|
||||
);
|
||||
}
|
||||
// trackFeatureUsage("space_created", params.userId).catch(console.error);
|
||||
|
||||
return space;
|
||||
}
|
||||
@ -197,9 +176,6 @@ export class SpaceService {
|
||||
logger.info(`Nothing to update to graph`);
|
||||
}
|
||||
|
||||
// Track space update
|
||||
trackFeatureUsage("space_updated", userId).catch(console.error);
|
||||
|
||||
logger.info(`Updated space ${spaceId} successfully`);
|
||||
return space;
|
||||
}
|
||||
|
||||
53
apps/webapp/app/trigger/bert/topic-analysis.ts
Normal file
53
apps/webapp/app/trigger/bert/topic-analysis.ts
Normal file
@ -0,0 +1,53 @@
|
||||
import { task } from "@trigger.dev/sdk/v3";
|
||||
import { python } from "@trigger.dev/python";
|
||||
import {
|
||||
processTopicAnalysis,
|
||||
type TopicAnalysisPayload,
|
||||
} from "~/jobs/bert/topic-analysis.logic";
|
||||
import { spaceSummaryTask } from "~/trigger/spaces/space-summary";
|
||||
|
||||
/**
|
||||
* Python runner for Trigger.dev using python.runScript
|
||||
*/
|
||||
async function runBertWithTriggerPython(
|
||||
userId: string,
|
||||
minTopicSize: number,
|
||||
nrTopics?: number,
|
||||
): Promise<string> {
|
||||
const args = [userId, "--json"];
|
||||
|
||||
if (nrTopics) {
|
||||
args.push("--nr-topics", String(nrTopics));
|
||||
}
|
||||
|
||||
console.log(
|
||||
`[BERT Topic Analysis] Running with Trigger.dev Python: args=${args.join(" ")}`,
|
||||
);
|
||||
|
||||
const result = await python.runScript("./python/main.py", args);
|
||||
return result.stdout;
|
||||
}
|
||||
|
||||
/**
|
||||
* Trigger.dev task for BERT topic analysis
|
||||
*
|
||||
* This is a thin wrapper around the common logic in jobs/bert/topic-analysis.logic.ts
|
||||
*/
|
||||
export const bertTopicAnalysisTask = task({
|
||||
id: "bert-topic-analysis",
|
||||
queue: {
|
||||
name: "bert-topic-analysis",
|
||||
concurrencyLimit: 3, // Max 3 parallel analyses to avoid CPU overload
|
||||
},
|
||||
run: async (payload: TopicAnalysisPayload) => {
|
||||
return await processTopicAnalysis(
|
||||
payload,
|
||||
// Callback to enqueue space summary
|
||||
async (params) => {
|
||||
await spaceSummaryTask.trigger(params);
|
||||
},
|
||||
// Python runner for Trigger.dev
|
||||
runBertWithTriggerPython,
|
||||
);
|
||||
},
|
||||
});
|
||||
@ -6,6 +6,7 @@ import {
|
||||
} from "~/jobs/ingest/ingest-episode.logic";
|
||||
import { triggerSpaceAssignment } from "../spaces/space-assignment";
|
||||
import { triggerSessionCompaction } from "../session/session-compaction";
|
||||
import { bertTopicAnalysisTask } from "../bert/topic-analysis";
|
||||
|
||||
const ingestionQueue = queue({
|
||||
name: "ingestion-queue",
|
||||
@ -32,6 +33,14 @@ export const ingestTask = task({
|
||||
async (params) => {
|
||||
await triggerSessionCompaction(params);
|
||||
},
|
||||
// Callback for BERT topic analysis
|
||||
async (params) => {
|
||||
await bertTopicAnalysisTask.trigger(params, {
|
||||
queue: "bert-topic-analysis",
|
||||
concurrencyKey: params.userId,
|
||||
tags: [params.userId, "bert-analysis"],
|
||||
});
|
||||
},
|
||||
);
|
||||
},
|
||||
});
|
||||
|
||||
@ -1,9 +1,9 @@
|
||||
import { task } from "@trigger.dev/sdk";
|
||||
import { z } from "zod";
|
||||
import { IngestionQueue, IngestionStatus } from "@core/database";
|
||||
import { IngestionStatus } from "@core/database";
|
||||
import { logger } from "~/services/logger.service";
|
||||
import { prisma } from "../utils/prisma";
|
||||
import { IngestBodyRequest, ingestTask } from "./ingest";
|
||||
import { type IngestBodyRequest, ingestTask } from "./ingest";
|
||||
|
||||
export const RetryNoCreditBodyRequest = z.object({
|
||||
workspaceId: z.string(),
|
||||
@ -43,9 +43,7 @@ export const retryNoCreditsTask = task({
|
||||
};
|
||||
}
|
||||
|
||||
logger.log(
|
||||
`Found ${noCreditItems.length} NO_CREDITS episodes to retry`,
|
||||
);
|
||||
logger.log(`Found ${noCreditItems.length} NO_CREDITS episodes to retry`);
|
||||
|
||||
const results = {
|
||||
total: noCreditItems.length,
|
||||
|
||||
File diff suppressed because it is too large
Load Diff
@ -1,557 +0,0 @@
|
||||
import { task } from "@trigger.dev/sdk/v3";
|
||||
import { logger } from "~/services/logger.service";
|
||||
import { makeModelCall } from "~/lib/model.server";
|
||||
import { runQuery } from "~/lib/neo4j.server";
|
||||
import type { CoreMessage } from "ai";
|
||||
import { z } from "zod";
|
||||
import {
|
||||
EXPLICIT_PATTERN_TYPES,
|
||||
IMPLICIT_PATTERN_TYPES,
|
||||
type SpacePattern,
|
||||
type PatternDetectionResult,
|
||||
} from "@core/types";
|
||||
import { createSpacePattern, getSpace } from "../utils/space-utils";
|
||||
|
||||
interface SpacePatternPayload {
|
||||
userId: string;
|
||||
workspaceId: string;
|
||||
spaceId: string;
|
||||
triggerSource?:
|
||||
| "summary_complete"
|
||||
| "manual"
|
||||
| "assignment"
|
||||
| "scheduled"
|
||||
| "new_space"
|
||||
| "growth_threshold"
|
||||
| "ingestion_complete";
|
||||
}
|
||||
|
||||
interface SpaceStatementData {
|
||||
uuid: string;
|
||||
fact: string;
|
||||
subject: string;
|
||||
predicate: string;
|
||||
object: string;
|
||||
createdAt: Date;
|
||||
validAt: Date;
|
||||
content?: string; // For implicit pattern analysis
|
||||
}
|
||||
|
||||
interface SpaceThemeData {
|
||||
themes: string[];
|
||||
summary: string;
|
||||
}
|
||||
|
||||
// Zod schemas for LLM response validation
|
||||
const ExplicitPatternSchema = z.object({
|
||||
name: z.string(),
|
||||
type: z.string(),
|
||||
summary: z.string(),
|
||||
evidence: z.array(z.string()),
|
||||
confidence: z.number().min(0).max(1),
|
||||
});
|
||||
|
||||
const ImplicitPatternSchema = z.object({
|
||||
name: z.string(),
|
||||
type: z.string(),
|
||||
summary: z.string(),
|
||||
evidence: z.array(z.string()),
|
||||
confidence: z.number().min(0).max(1),
|
||||
});
|
||||
|
||||
const PatternAnalysisSchema = z.object({
|
||||
explicitPatterns: z.array(ExplicitPatternSchema),
|
||||
implicitPatterns: z.array(ImplicitPatternSchema),
|
||||
});
|
||||
|
||||
const CONFIG = {
|
||||
minStatementsForPatterns: 5,
|
||||
maxPatternsPerSpace: 20,
|
||||
minPatternConfidence: 0.85,
|
||||
};
|
||||
|
||||
export const spacePatternTask = task({
|
||||
id: "space-pattern",
|
||||
run: async (payload: SpacePatternPayload) => {
|
||||
const { userId, workspaceId, spaceId, triggerSource = "manual" } = payload;
|
||||
|
||||
logger.info(`Starting space pattern detection`, {
|
||||
userId,
|
||||
workspaceId,
|
||||
spaceId,
|
||||
triggerSource,
|
||||
});
|
||||
|
||||
try {
|
||||
// Get space data and check if it has enough content
|
||||
const space = await getSpaceForPatternAnalysis(spaceId);
|
||||
if (!space) {
|
||||
return {
|
||||
success: false,
|
||||
spaceId,
|
||||
error: "Space not found or insufficient data",
|
||||
};
|
||||
}
|
||||
|
||||
// Get statements for pattern analysis
|
||||
const statements = await getSpaceStatementsForPatterns(spaceId, userId);
|
||||
|
||||
if (statements.length < CONFIG.minStatementsForPatterns) {
|
||||
logger.info(
|
||||
`Space ${spaceId} has insufficient statements (${statements.length}) for pattern detection`,
|
||||
);
|
||||
return {
|
||||
success: true,
|
||||
spaceId,
|
||||
triggerSource,
|
||||
patterns: {
|
||||
explicitPatterns: [],
|
||||
implicitPatterns: [],
|
||||
totalPatternsFound: 0,
|
||||
},
|
||||
};
|
||||
}
|
||||
|
||||
// Detect patterns
|
||||
const patternResult = await detectSpacePatterns(space, statements);
|
||||
|
||||
if (patternResult) {
|
||||
// Store patterns
|
||||
await storePatterns(
|
||||
patternResult.explicitPatterns,
|
||||
patternResult.implicitPatterns,
|
||||
spaceId,
|
||||
);
|
||||
|
||||
logger.info(`Generated patterns for space ${spaceId}`, {
|
||||
explicitPatterns: patternResult.explicitPatterns.length,
|
||||
implicitPatterns: patternResult.implicitPatterns.length,
|
||||
totalPatterns: patternResult.totalPatternsFound,
|
||||
triggerSource,
|
||||
});
|
||||
|
||||
return {
|
||||
success: true,
|
||||
spaceId,
|
||||
triggerSource,
|
||||
patterns: {
|
||||
explicitPatterns: patternResult.explicitPatterns.length,
|
||||
implicitPatterns: patternResult.implicitPatterns.length,
|
||||
totalPatternsFound: patternResult.totalPatternsFound,
|
||||
},
|
||||
};
|
||||
} else {
|
||||
logger.warn(`Failed to detect patterns for space ${spaceId}`);
|
||||
return {
|
||||
success: false,
|
||||
spaceId,
|
||||
triggerSource,
|
||||
error: "Failed to detect patterns",
|
||||
};
|
||||
}
|
||||
} catch (error) {
|
||||
logger.error(
|
||||
`Error in space pattern detection for space ${spaceId}:`,
|
||||
error as Record<string, unknown>,
|
||||
);
|
||||
throw error;
|
||||
}
|
||||
},
|
||||
});
|
||||
|
||||
async function getSpaceForPatternAnalysis(
|
||||
spaceId: string,
|
||||
): Promise<SpaceThemeData | null> {
|
||||
try {
|
||||
const space = await getSpace(spaceId);
|
||||
|
||||
if (!space || !space.themes || space.themes.length === 0) {
|
||||
logger.warn(
|
||||
`Space ${spaceId} not found or has no themes for pattern analysis`,
|
||||
);
|
||||
return null;
|
||||
}
|
||||
|
||||
return {
|
||||
themes: space.themes,
|
||||
summary: space.summary || "",
|
||||
};
|
||||
} catch (error) {
|
||||
logger.error(
|
||||
`Error getting space for pattern analysis:`,
|
||||
error as Record<string, unknown>,
|
||||
);
|
||||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
async function getSpaceStatementsForPatterns(
|
||||
spaceId: string,
|
||||
userId: string,
|
||||
): Promise<SpaceStatementData[]> {
|
||||
const query = `
|
||||
MATCH (s:Statement)
|
||||
WHERE s.userId = $userId
|
||||
AND s.spaceIds IS NOT NULL
|
||||
AND $spaceId IN s.spaceIds
|
||||
AND s.invalidAt IS NULL
|
||||
MATCH (s)-[:HAS_SUBJECT]->(subj:Entity)
|
||||
MATCH (s)-[:HAS_PREDICATE]->(pred:Entity)
|
||||
MATCH (s)-[:HAS_OBJECT]->(obj:Entity)
|
||||
RETURN s, subj.name as subject, pred.name as predicate, obj.name as object
|
||||
ORDER BY s.createdAt DESC
|
||||
`;
|
||||
|
||||
const result = await runQuery(query, {
|
||||
spaceId,
|
||||
userId,
|
||||
});
|
||||
|
||||
return result.map((record) => {
|
||||
const statement = record.get("s").properties;
|
||||
return {
|
||||
uuid: statement.uuid,
|
||||
fact: statement.fact,
|
||||
subject: record.get("subject"),
|
||||
predicate: record.get("predicate"),
|
||||
object: record.get("object"),
|
||||
createdAt: new Date(statement.createdAt),
|
||||
validAt: new Date(statement.validAt),
|
||||
content: statement.fact, // Use fact as content for implicit analysis
|
||||
};
|
||||
});
|
||||
}
|
||||
|
||||
async function detectSpacePatterns(
|
||||
space: SpaceThemeData,
|
||||
statements: SpaceStatementData[],
|
||||
): Promise<PatternDetectionResult | null> {
|
||||
try {
|
||||
// Extract explicit patterns from themes
|
||||
const explicitPatterns = await extractExplicitPatterns(
|
||||
space.themes,
|
||||
space.summary,
|
||||
statements,
|
||||
);
|
||||
|
||||
// Extract implicit patterns from statement analysis
|
||||
const implicitPatterns = await extractImplicitPatterns(statements);
|
||||
|
||||
return {
|
||||
explicitPatterns,
|
||||
implicitPatterns,
|
||||
totalPatternsFound: explicitPatterns.length + implicitPatterns.length,
|
||||
processingStats: {
|
||||
statementsAnalyzed: statements.length,
|
||||
themesProcessed: space.themes.length,
|
||||
implicitPatternsExtracted: implicitPatterns.length,
|
||||
},
|
||||
};
|
||||
} catch (error) {
|
||||
logger.error(
|
||||
"Error detecting space patterns:",
|
||||
error as Record<string, unknown>,
|
||||
);
|
||||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
async function extractExplicitPatterns(
|
||||
themes: string[],
|
||||
summary: string,
|
||||
statements: SpaceStatementData[],
|
||||
): Promise<Omit<SpacePattern, "id" | "createdAt" | "updatedAt" | "spaceId">[]> {
|
||||
if (themes.length === 0) return [];
|
||||
|
||||
const prompt = createExplicitPatternPrompt(themes, summary, statements);
|
||||
|
||||
// Pattern extraction requires HIGH complexity (insight synthesis, pattern recognition)
|
||||
let responseText = "";
|
||||
await makeModelCall(false, prompt, (text: string) => {
|
||||
responseText = text;
|
||||
}, undefined, 'high');
|
||||
|
||||
const patterns = parseExplicitPatternResponse(responseText);
|
||||
|
||||
return patterns.map((pattern) => ({
|
||||
name: pattern.name || `${pattern.type} pattern`,
|
||||
source: "explicit" as const,
|
||||
type: pattern.type,
|
||||
summary: pattern.summary,
|
||||
evidence: pattern.evidence,
|
||||
confidence: pattern.confidence,
|
||||
userConfirmed: "pending" as const,
|
||||
}));
|
||||
}
|
||||
|
||||
async function extractImplicitPatterns(
|
||||
statements: SpaceStatementData[],
|
||||
): Promise<Omit<SpacePattern, "id" | "createdAt" | "updatedAt" | "spaceId">[]> {
|
||||
if (statements.length < CONFIG.minStatementsForPatterns) return [];
|
||||
|
||||
const prompt = createImplicitPatternPrompt(statements);
|
||||
|
||||
// Implicit pattern discovery requires HIGH complexity (pattern recognition from statements)
|
||||
let responseText = "";
|
||||
await makeModelCall(false, prompt, (text: string) => {
|
||||
responseText = text;
|
||||
}, undefined, 'high');
|
||||
|
||||
const patterns = parseImplicitPatternResponse(responseText);
|
||||
|
||||
return patterns.map((pattern) => ({
|
||||
name: pattern.name || `${pattern.type} pattern`,
|
||||
source: "implicit" as const,
|
||||
type: pattern.type,
|
||||
summary: pattern.summary,
|
||||
evidence: pattern.evidence,
|
||||
confidence: pattern.confidence,
|
||||
userConfirmed: "pending" as const,
|
||||
}));
|
||||
}
|
||||
|
||||
function createExplicitPatternPrompt(
|
||||
themes: string[],
|
||||
summary: string,
|
||||
statements: SpaceStatementData[],
|
||||
): CoreMessage[] {
|
||||
const statementsText = statements
|
||||
.map((stmt) => `[${stmt.uuid}] ${stmt.fact}`)
|
||||
.join("\n");
|
||||
|
||||
const explicitTypes = Object.values(EXPLICIT_PATTERN_TYPES).join('", "');
|
||||
|
||||
return [
|
||||
{
|
||||
role: "system",
|
||||
content: `You are an expert at extracting structured patterns from themes and supporting evidence.
|
||||
|
||||
Your task is to convert high-level themes into explicit patterns with supporting statement evidence.
|
||||
|
||||
INSTRUCTIONS:
|
||||
1. For each theme, create a pattern that explains what it reveals about the user
|
||||
2. Give each pattern a short, descriptive name (2-4 words)
|
||||
3. Find supporting statement IDs that provide evidence for each pattern
|
||||
4. Assess confidence based on evidence strength and theme clarity
|
||||
5. Use appropriate pattern types from these guidelines: "${explicitTypes}"
|
||||
- "theme": High-level thematic content areas
|
||||
- "topic": Specific subject matter or topics of interest
|
||||
- "domain": Knowledge or work domains the user operates in
|
||||
- "interest_area": Areas of personal interest or hobby
|
||||
6. You may suggest new pattern types if none of the guidelines fit well
|
||||
|
||||
RESPONSE FORMAT:
|
||||
Provide your response inside <output></output> tags with valid JSON.
|
||||
|
||||
<output>
|
||||
{
|
||||
"explicitPatterns": [
|
||||
{
|
||||
"name": "Short descriptive name for the pattern",
|
||||
"type": "theme",
|
||||
"summary": "Description of what this pattern reveals about the user",
|
||||
"evidence": ["statement_id_1", "statement_id_2"],
|
||||
"confidence": 0.85
|
||||
}
|
||||
]
|
||||
}
|
||||
</output>`,
|
||||
},
|
||||
{
|
||||
role: "user",
|
||||
content: `THEMES TO ANALYZE:
|
||||
${themes.map((theme, i) => `${i + 1}. ${theme}`).join("\n")}
|
||||
|
||||
SPACE SUMMARY:
|
||||
${summary}
|
||||
|
||||
SUPPORTING STATEMENTS:
|
||||
${statementsText}
|
||||
|
||||
Please extract explicit patterns from these themes and map them to supporting statement evidence.`,
|
||||
},
|
||||
];
|
||||
}
|
||||
|
||||
function createImplicitPatternPrompt(
|
||||
statements: SpaceStatementData[],
|
||||
): CoreMessage[] {
|
||||
const statementsText = statements
|
||||
.map(
|
||||
(stmt) =>
|
||||
`[${stmt.uuid}] ${stmt.fact} (${stmt.subject} → ${stmt.predicate} → ${stmt.object})`,
|
||||
)
|
||||
.join("\n");
|
||||
|
||||
const implicitTypes = Object.values(IMPLICIT_PATTERN_TYPES).join('", "');
|
||||
|
||||
return [
|
||||
{
|
||||
role: "system",
|
||||
content: `You are an expert at discovering implicit behavioral patterns from statement analysis.
|
||||
|
||||
Your task is to identify hidden patterns in user behavior, preferences, and habits from statement content.
|
||||
|
||||
INSTRUCTIONS:
|
||||
1. Analyze statement content for behavioral patterns, not explicit topics
|
||||
2. Give each pattern a short, descriptive name (2-4 words)
|
||||
3. Look for recurring behaviors, preferences, and working styles
|
||||
4. Identify how the user approaches tasks, makes decisions, and interacts
|
||||
5. Use appropriate pattern types from these guidelines: "${implicitTypes}"
|
||||
- "preference": Personal preferences and choices
|
||||
- "habit": Recurring behaviors and routines
|
||||
- "workflow": Work and process patterns
|
||||
- "communication_style": How user communicates and expresses ideas
|
||||
- "decision_pattern": Decision-making approaches and criteria
|
||||
- "temporal_pattern": Time-based behavioral patterns
|
||||
- "behavioral_pattern": General behavioral tendencies
|
||||
- "learning_style": How user learns and processes information
|
||||
- "collaboration_style": How user works with others
|
||||
6. You may suggest new pattern types if none of the guidelines fit well
|
||||
7. Focus on what the statements reveal about how the user thinks, works, or behaves
|
||||
|
||||
RESPONSE FORMAT:
|
||||
Provide your response inside <output></output> tags with valid JSON.
|
||||
|
||||
<output>
|
||||
{
|
||||
"implicitPatterns": [
|
||||
{
|
||||
"name": "Short descriptive name for the pattern",
|
||||
"type": "preference",
|
||||
"summary": "Description of what this behavioral pattern reveals",
|
||||
"evidence": ["statement_id_1", "statement_id_2"],
|
||||
"confidence": 0.75
|
||||
}
|
||||
]
|
||||
}
|
||||
</output>`,
|
||||
},
|
||||
{
|
||||
role: "user",
|
||||
content: `STATEMENTS TO ANALYZE FOR IMPLICIT PATTERNS:
|
||||
${statementsText}
|
||||
|
||||
Please identify implicit behavioral patterns, preferences, and habits from these statements.`,
|
||||
},
|
||||
];
|
||||
}
|
||||
|
||||
function parseExplicitPatternResponse(response: string): Array<{
|
||||
name: string;
|
||||
type: string;
|
||||
summary: string;
|
||||
evidence: string[];
|
||||
confidence: number;
|
||||
}> {
|
||||
try {
|
||||
const outputMatch = response.match(/<output>([\s\S]*?)<\/output>/);
|
||||
if (!outputMatch) {
|
||||
logger.warn("No <output> tags found in explicit pattern response");
|
||||
return [];
|
||||
}
|
||||
|
||||
const parsed = JSON.parse(outputMatch[1].trim());
|
||||
const validationResult = z
|
||||
.object({
|
||||
explicitPatterns: z.array(ExplicitPatternSchema),
|
||||
})
|
||||
.safeParse(parsed);
|
||||
|
||||
if (!validationResult.success) {
|
||||
logger.warn("Invalid explicit pattern response format:", {
|
||||
error: validationResult.error,
|
||||
});
|
||||
return [];
|
||||
}
|
||||
|
||||
return validationResult.data.explicitPatterns.filter(
|
||||
(p) =>
|
||||
p.confidence >= CONFIG.minPatternConfidence && p.evidence.length >= 3, // Ensure at least 3 evidence statements
|
||||
);
|
||||
} catch (error) {
|
||||
logger.error(
|
||||
"Error parsing explicit pattern response:",
|
||||
error as Record<string, unknown>,
|
||||
);
|
||||
return [];
|
||||
}
|
||||
}
|
||||
|
||||
function parseImplicitPatternResponse(response: string): Array<{
|
||||
name: string;
|
||||
type: string;
|
||||
summary: string;
|
||||
evidence: string[];
|
||||
confidence: number;
|
||||
}> {
|
||||
try {
|
||||
const outputMatch = response.match(/<output>([\s\S]*?)<\/output>/);
|
||||
if (!outputMatch) {
|
||||
logger.warn("No <output> tags found in implicit pattern response");
|
||||
return [];
|
||||
}
|
||||
|
||||
const parsed = JSON.parse(outputMatch[1].trim());
|
||||
const validationResult = z
|
||||
.object({
|
||||
implicitPatterns: z.array(ImplicitPatternSchema),
|
||||
})
|
||||
.safeParse(parsed);
|
||||
|
||||
if (!validationResult.success) {
|
||||
logger.warn("Invalid implicit pattern response format:", {
|
||||
error: validationResult.error,
|
||||
});
|
||||
return [];
|
||||
}
|
||||
|
||||
return validationResult.data.implicitPatterns.filter(
|
||||
(p) =>
|
||||
p.confidence >= CONFIG.minPatternConfidence && p.evidence.length >= 3, // Ensure at least 3 evidence statements
|
||||
);
|
||||
} catch (error) {
|
||||
logger.error(
|
||||
"Error parsing implicit pattern response:",
|
||||
error as Record<string, unknown>,
|
||||
);
|
||||
return [];
|
||||
}
|
||||
}
|
||||
|
||||
async function storePatterns(
|
||||
explicitPatterns: Omit<
|
||||
SpacePattern,
|
||||
"id" | "createdAt" | "updatedAt" | "spaceId"
|
||||
>[],
|
||||
implicitPatterns: Omit<
|
||||
SpacePattern,
|
||||
"id" | "createdAt" | "updatedAt" | "spaceId"
|
||||
>[],
|
||||
spaceId: string,
|
||||
): Promise<void> {
|
||||
try {
|
||||
const allPatterns = [...explicitPatterns, ...implicitPatterns];
|
||||
|
||||
if (allPatterns.length === 0) return;
|
||||
|
||||
// Store in PostgreSQL
|
||||
await createSpacePattern(spaceId, allPatterns);
|
||||
|
||||
logger.info(`Stored ${allPatterns.length} patterns`, {
|
||||
explicit: explicitPatterns.length,
|
||||
implicit: implicitPatterns.length,
|
||||
});
|
||||
} catch (error) {
|
||||
logger.error("Error storing patterns:", error as Record<string, unknown>);
|
||||
throw error;
|
||||
}
|
||||
}
|
||||
|
||||
// Helper function to trigger the task
|
||||
export async function triggerSpacePattern(payload: SpacePatternPayload) {
|
||||
return await spacePatternTask.trigger(payload, {
|
||||
concurrencyKey: `space-pattern-${payload.spaceId}`, // Prevent parallel runs for the same space
|
||||
tags: [payload.userId, payload.spaceId, payload.triggerSource || "manual"],
|
||||
});
|
||||
}
|
||||
@ -1,62 +1,11 @@
|
||||
import { queue, task } from "@trigger.dev/sdk/v3";
|
||||
import { logger } from "~/services/logger.service";
|
||||
import { SpaceService } from "~/services/space.server";
|
||||
import { makeModelCall } from "~/lib/model.server";
|
||||
import { runQuery } from "~/lib/neo4j.server";
|
||||
import { updateSpaceStatus, SPACE_STATUS } from "../utils/space-status";
|
||||
import type { CoreMessage } from "ai";
|
||||
import { z } from "zod";
|
||||
import { triggerSpacePattern } from "./space-pattern";
|
||||
import { getSpace, updateSpace } from "../utils/space-utils";
|
||||
import {
|
||||
processSpaceSummary,
|
||||
type SpaceSummaryPayload,
|
||||
} from "~/jobs/spaces/space-summary.logic";
|
||||
|
||||
import { EpisodeType } from "@core/types";
|
||||
import { getSpaceEpisodeCount } from "~/services/graphModels/space";
|
||||
import { addToQueue } from "~/lib/ingest.server";
|
||||
|
||||
interface SpaceSummaryPayload {
|
||||
userId: string;
|
||||
workspaceId: string;
|
||||
spaceId: string; // Single space only
|
||||
triggerSource?: "assignment" | "manual" | "scheduled";
|
||||
}
|
||||
|
||||
interface SpaceEpisodeData {
|
||||
uuid: string;
|
||||
content: string;
|
||||
originalContent: string;
|
||||
source: string;
|
||||
createdAt: Date;
|
||||
validAt: Date;
|
||||
metadata: any;
|
||||
sessionId: string | null;
|
||||
}
|
||||
|
||||
interface SpaceSummaryData {
|
||||
spaceId: string;
|
||||
spaceName: string;
|
||||
spaceDescription?: string;
|
||||
contextCount: number;
|
||||
summary: string;
|
||||
keyEntities: string[];
|
||||
themes: string[];
|
||||
confidence: number;
|
||||
lastUpdated: Date;
|
||||
isIncremental: boolean;
|
||||
}
|
||||
|
||||
// Zod schema for LLM response validation
|
||||
const SummaryResultSchema = z.object({
|
||||
summary: z.string(),
|
||||
keyEntities: z.array(z.string()),
|
||||
themes: z.array(z.string()),
|
||||
confidence: z.number().min(0).max(1),
|
||||
});
|
||||
|
||||
const CONFIG = {
|
||||
maxEpisodesForSummary: 20, // Limit episodes for performance
|
||||
minEpisodesForSummary: 1, // Minimum episodes to generate summary
|
||||
summaryEpisodeThreshold: 5, // Minimum new episodes required to trigger summary (configurable)
|
||||
};
|
||||
export type { SpaceSummaryPayload };
|
||||
|
||||
export const spaceSummaryQueue = queue({
|
||||
name: "space-summary-queue",
|
||||
@ -67,735 +16,17 @@ export const spaceSummaryTask = task({
|
||||
id: "space-summary",
|
||||
queue: spaceSummaryQueue,
|
||||
run: async (payload: SpaceSummaryPayload) => {
|
||||
const { userId, workspaceId, spaceId, triggerSource = "manual" } = payload;
|
||||
|
||||
logger.info(`Starting space summary generation`, {
|
||||
userId,
|
||||
workspaceId,
|
||||
spaceId,
|
||||
triggerSource,
|
||||
logger.info(`[Trigger.dev] Starting space summary task`, {
|
||||
userId: payload.userId,
|
||||
spaceId: payload.spaceId,
|
||||
triggerSource: payload.triggerSource,
|
||||
});
|
||||
|
||||
try {
|
||||
// Update status to processing
|
||||
await updateSpaceStatus(spaceId, SPACE_STATUS.PROCESSING, {
|
||||
userId,
|
||||
operation: "space-summary",
|
||||
metadata: { triggerSource, phase: "start_summary" },
|
||||
});
|
||||
|
||||
// Generate summary for the single space
|
||||
const summaryResult = await generateSpaceSummary(
|
||||
spaceId,
|
||||
userId,
|
||||
triggerSource,
|
||||
);
|
||||
|
||||
if (summaryResult) {
|
||||
// Store the summary
|
||||
await storeSummary(summaryResult);
|
||||
|
||||
// Update status to ready after successful completion
|
||||
await updateSpaceStatus(spaceId, SPACE_STATUS.READY, {
|
||||
userId,
|
||||
operation: "space-summary",
|
||||
metadata: {
|
||||
triggerSource,
|
||||
phase: "completed_summary",
|
||||
contextCount: summaryResult.contextCount,
|
||||
confidence: summaryResult.confidence,
|
||||
},
|
||||
});
|
||||
|
||||
logger.info(`Generated summary for space ${spaceId}`, {
|
||||
statementCount: summaryResult.contextCount,
|
||||
confidence: summaryResult.confidence,
|
||||
themes: summaryResult.themes.length,
|
||||
triggerSource,
|
||||
});
|
||||
|
||||
return {
|
||||
success: true,
|
||||
spaceId,
|
||||
triggerSource,
|
||||
summary: {
|
||||
statementCount: summaryResult.contextCount,
|
||||
confidence: summaryResult.confidence,
|
||||
themesCount: summaryResult.themes.length,
|
||||
},
|
||||
};
|
||||
} else {
|
||||
// No summary generated - this could be due to insufficient episodes or no new episodes
|
||||
// This is not an error state, so update status to ready
|
||||
await updateSpaceStatus(spaceId, SPACE_STATUS.READY, {
|
||||
userId,
|
||||
operation: "space-summary",
|
||||
metadata: {
|
||||
triggerSource,
|
||||
phase: "no_summary_needed",
|
||||
reason: "Insufficient episodes or no new episodes to summarize",
|
||||
},
|
||||
});
|
||||
|
||||
logger.info(
|
||||
`No summary generated for space ${spaceId} - insufficient or no new episodes`,
|
||||
);
|
||||
return {
|
||||
success: true,
|
||||
spaceId,
|
||||
triggerSource,
|
||||
summary: null,
|
||||
reason: "No episodes to summarize",
|
||||
};
|
||||
}
|
||||
} catch (error) {
|
||||
// Update status to error on exception
|
||||
try {
|
||||
await updateSpaceStatus(spaceId, SPACE_STATUS.ERROR, {
|
||||
userId,
|
||||
operation: "space-summary",
|
||||
metadata: {
|
||||
triggerSource,
|
||||
phase: "exception",
|
||||
error: error instanceof Error ? error.message : "Unknown error",
|
||||
},
|
||||
});
|
||||
} catch (statusError) {
|
||||
logger.warn(`Failed to update status to error for space ${spaceId}`, {
|
||||
statusError,
|
||||
});
|
||||
}
|
||||
|
||||
logger.error(
|
||||
`Error in space summary generation for space ${spaceId}:`,
|
||||
error as Record<string, unknown>,
|
||||
);
|
||||
throw error;
|
||||
}
|
||||
// Use common business logic
|
||||
return await processSpaceSummary(payload);
|
||||
},
|
||||
});
|
||||
|
||||
async function generateSpaceSummary(
|
||||
spaceId: string,
|
||||
userId: string,
|
||||
triggerSource?: "assignment" | "manual" | "scheduled",
|
||||
): Promise<SpaceSummaryData | null> {
|
||||
try {
|
||||
// 1. Get space details
|
||||
const spaceService = new SpaceService();
|
||||
const space = await spaceService.getSpace(spaceId, userId);
|
||||
|
||||
if (!space) {
|
||||
logger.warn(`Space ${spaceId} not found for user ${userId}`);
|
||||
return null;
|
||||
}
|
||||
|
||||
// 2. Check episode count threshold (skip for manual triggers)
|
||||
if (triggerSource !== "manual") {
|
||||
const currentEpisodeCount = await getSpaceEpisodeCount(spaceId, userId);
|
||||
const lastSummaryEpisodeCount = space.contextCount || 0;
|
||||
const episodeDifference = currentEpisodeCount - lastSummaryEpisodeCount;
|
||||
|
||||
if (
|
||||
episodeDifference < CONFIG.summaryEpisodeThreshold ||
|
||||
lastSummaryEpisodeCount !== 0
|
||||
) {
|
||||
logger.info(
|
||||
`Skipping summary generation for space ${spaceId}: only ${episodeDifference} new episodes (threshold: ${CONFIG.summaryEpisodeThreshold})`,
|
||||
{
|
||||
currentEpisodeCount,
|
||||
lastSummaryEpisodeCount,
|
||||
episodeDifference,
|
||||
threshold: CONFIG.summaryEpisodeThreshold,
|
||||
},
|
||||
);
|
||||
return null;
|
||||
}
|
||||
|
||||
logger.info(
|
||||
`Proceeding with summary generation for space ${spaceId}: ${episodeDifference} new episodes (threshold: ${CONFIG.summaryEpisodeThreshold})`,
|
||||
{
|
||||
currentEpisodeCount,
|
||||
lastSummaryEpisodeCount,
|
||||
episodeDifference,
|
||||
},
|
||||
);
|
||||
}
|
||||
|
||||
// 2. Check for existing summary
|
||||
const existingSummary = await getExistingSummary(spaceId);
|
||||
const isIncremental = existingSummary !== null;
|
||||
|
||||
// 3. Get episodes (all or new ones based on existing summary)
|
||||
const episodes = await getSpaceEpisodes(
|
||||
spaceId,
|
||||
userId,
|
||||
isIncremental ? existingSummary?.lastUpdated : undefined,
|
||||
);
|
||||
|
||||
// Handle case where no new episodes exist for incremental update
|
||||
if (isIncremental && episodes.length === 0) {
|
||||
logger.info(
|
||||
`No new episodes found for space ${spaceId}, skipping summary update`,
|
||||
);
|
||||
return null;
|
||||
}
|
||||
|
||||
// Check minimum episode requirement for new summaries only
|
||||
if (!isIncremental && episodes.length < CONFIG.minEpisodesForSummary) {
|
||||
logger.info(
|
||||
`Space ${spaceId} has insufficient episodes (${episodes.length}) for new summary`,
|
||||
);
|
||||
return null;
|
||||
}
|
||||
|
||||
// 4. Process episodes using unified approach
|
||||
let summaryResult;
|
||||
|
||||
if (episodes.length > CONFIG.maxEpisodesForSummary) {
|
||||
logger.info(
|
||||
`Large space detected (${episodes.length} episodes). Processing in batches.`,
|
||||
);
|
||||
|
||||
// Process in batches, each building on previous result
|
||||
const batches: SpaceEpisodeData[][] = [];
|
||||
for (let i = 0; i < episodes.length; i += CONFIG.maxEpisodesForSummary) {
|
||||
batches.push(episodes.slice(i, i + CONFIG.maxEpisodesForSummary));
|
||||
}
|
||||
|
||||
let currentSummary = existingSummary?.summary || null;
|
||||
let currentThemes = existingSummary?.themes || [];
|
||||
let cumulativeConfidence = 0;
|
||||
|
||||
for (const [batchIndex, batch] of batches.entries()) {
|
||||
logger.info(
|
||||
`Processing batch ${batchIndex + 1}/${batches.length} with ${batch.length} episodes`,
|
||||
);
|
||||
|
||||
const batchResult = await generateUnifiedSummary(
|
||||
space.name,
|
||||
space.description as string,
|
||||
batch,
|
||||
currentSummary,
|
||||
currentThemes,
|
||||
);
|
||||
|
||||
if (batchResult) {
|
||||
currentSummary = batchResult.summary;
|
||||
currentThemes = batchResult.themes;
|
||||
cumulativeConfidence += batchResult.confidence;
|
||||
} else {
|
||||
logger.warn(`Failed to process batch ${batchIndex + 1}`);
|
||||
}
|
||||
|
||||
// Small delay between batches
|
||||
if (batchIndex < batches.length - 1) {
|
||||
await new Promise((resolve) => setTimeout(resolve, 500));
|
||||
}
|
||||
}
|
||||
|
||||
summaryResult = currentSummary
|
||||
? {
|
||||
summary: currentSummary,
|
||||
themes: currentThemes,
|
||||
confidence: Math.min(cumulativeConfidence / batches.length, 1.0),
|
||||
}
|
||||
: null;
|
||||
} else {
|
||||
logger.info(
|
||||
`Processing ${episodes.length} episodes with unified approach`,
|
||||
);
|
||||
|
||||
// Use unified approach for smaller spaces
|
||||
summaryResult = await generateUnifiedSummary(
|
||||
space.name,
|
||||
space.description as string,
|
||||
episodes,
|
||||
existingSummary?.summary || null,
|
||||
existingSummary?.themes || [],
|
||||
);
|
||||
}
|
||||
|
||||
if (!summaryResult) {
|
||||
logger.warn(`Failed to generate LLM summary for space ${spaceId}`);
|
||||
return null;
|
||||
}
|
||||
|
||||
// Get the actual current counts from Neo4j
|
||||
const currentEpisodeCount = await getSpaceEpisodeCount(spaceId, userId);
|
||||
|
||||
return {
|
||||
spaceId: space.uuid,
|
||||
spaceName: space.name,
|
||||
spaceDescription: space.description as string,
|
||||
contextCount: currentEpisodeCount,
|
||||
summary: summaryResult.summary,
|
||||
keyEntities: summaryResult.keyEntities || [],
|
||||
themes: summaryResult.themes,
|
||||
confidence: summaryResult.confidence,
|
||||
lastUpdated: new Date(),
|
||||
isIncremental,
|
||||
};
|
||||
} catch (error) {
|
||||
logger.error(
|
||||
`Error generating summary for space ${spaceId}:`,
|
||||
error as Record<string, unknown>,
|
||||
);
|
||||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
async function generateUnifiedSummary(
|
||||
spaceName: string,
|
||||
spaceDescription: string | undefined,
|
||||
episodes: SpaceEpisodeData[],
|
||||
previousSummary: string | null = null,
|
||||
previousThemes: string[] = [],
|
||||
): Promise<{
|
||||
summary: string;
|
||||
themes: string[];
|
||||
confidence: number;
|
||||
keyEntities?: string[];
|
||||
} | null> {
|
||||
try {
|
||||
const prompt = createUnifiedSummaryPrompt(
|
||||
spaceName,
|
||||
spaceDescription,
|
||||
episodes,
|
||||
previousSummary,
|
||||
previousThemes,
|
||||
);
|
||||
|
||||
// Space summary generation requires HIGH complexity (creative synthesis, narrative generation)
|
||||
let responseText = "";
|
||||
await makeModelCall(
|
||||
false,
|
||||
prompt,
|
||||
(text: string) => {
|
||||
responseText = text;
|
||||
},
|
||||
undefined,
|
||||
"high",
|
||||
);
|
||||
|
||||
return parseSummaryResponse(responseText);
|
||||
} catch (error) {
|
||||
logger.error(
|
||||
"Error generating unified summary:",
|
||||
error as Record<string, unknown>,
|
||||
);
|
||||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
function createUnifiedSummaryPrompt(
|
||||
spaceName: string,
|
||||
spaceDescription: string | undefined,
|
||||
episodes: SpaceEpisodeData[],
|
||||
previousSummary: string | null,
|
||||
previousThemes: string[],
|
||||
): CoreMessage[] {
|
||||
// If there are no episodes and no previous summary, we cannot generate a meaningful summary
|
||||
if (episodes.length === 0 && previousSummary === null) {
|
||||
throw new Error(
|
||||
"Cannot generate summary without episodes or existing summary",
|
||||
);
|
||||
}
|
||||
|
||||
const episodesText = episodes
|
||||
.map(
|
||||
(episode) =>
|
||||
`- ${episode.content} (Source: ${episode.source}, Session: ${episode.sessionId || "N/A"})`,
|
||||
)
|
||||
.join("\n");
|
||||
|
||||
// Extract key entities and themes from episode content
|
||||
const contentWords = episodes
|
||||
.map((ep) => ep.content.toLowerCase())
|
||||
.join(" ")
|
||||
.split(/\s+/)
|
||||
.filter((word) => word.length > 3);
|
||||
|
||||
const wordFrequency = new Map<string, number>();
|
||||
contentWords.forEach((word) => {
|
||||
wordFrequency.set(word, (wordFrequency.get(word) || 0) + 1);
|
||||
});
|
||||
|
||||
const topEntities = Array.from(wordFrequency.entries())
|
||||
.sort(([, a], [, b]) => b - a)
|
||||
.slice(0, 10)
|
||||
.map(([word]) => word);
|
||||
|
||||
const isUpdate = previousSummary !== null;
|
||||
|
||||
return [
|
||||
{
|
||||
role: "system",
|
||||
content: `You are an expert at analyzing and summarizing episodes within semantic spaces based on the space's intent and purpose. Your task is to ${isUpdate ? "update an existing summary by integrating new episodes" : "create a comprehensive summary of episodes"}.
|
||||
|
||||
CRITICAL RULES:
|
||||
1. Base your summary ONLY on insights derived from the actual content/episodes provided
|
||||
2. Use the space's INTENT/PURPOSE (from description) to guide what to summarize and how to organize it
|
||||
3. Write in a factual, neutral tone - avoid promotional language ("pivotal", "invaluable", "cutting-edge")
|
||||
4. Be specific and concrete - reference actual content, patterns, and insights found in the episodes
|
||||
5. If episodes are insufficient for meaningful insights, state that more data is needed
|
||||
|
||||
INTENT-DRIVEN SUMMARIZATION:
|
||||
Your summary should SERVE the space's intended purpose. Examples:
|
||||
- "Learning React" → Summarize React concepts, patterns, techniques learned
|
||||
- "Project X Updates" → Summarize progress, decisions, blockers, next steps
|
||||
- "Health Tracking" → Summarize metrics, trends, observations, insights
|
||||
- "Guidelines for React" → Extract actionable patterns, best practices, rules
|
||||
- "Evolution of design thinking" → Track how thinking changed over time, decision points
|
||||
The intent defines WHY this space exists - organize content to serve that purpose.
|
||||
|
||||
INSTRUCTIONS:
|
||||
${
|
||||
isUpdate
|
||||
? `1. Review the existing summary and themes carefully
|
||||
2. Analyze the new episodes for patterns and insights that align with the space's intent
|
||||
3. Identify connecting points between existing knowledge and new episodes
|
||||
4. Update the summary to seamlessly integrate new information while preserving valuable existing insights
|
||||
5. Evolve themes by adding new ones or refining existing ones based on the space's purpose
|
||||
6. Organize the summary to serve the space's intended use case`
|
||||
: `1. Analyze the semantic content and relationships within the episodes
|
||||
2. Identify topics/sections that align with the space's INTENT and PURPOSE
|
||||
3. Create a coherent summary that serves the space's intended use case
|
||||
4. Organize the summary based on the space's purpose (not generic frequency-based themes)`
|
||||
}
|
||||
${isUpdate ? "7" : "5"}. Assess your confidence in the ${isUpdate ? "updated" : ""} summary quality (0.0-1.0)
|
||||
|
||||
INTENT-ALIGNED ORGANIZATION:
|
||||
- Organize sections based on what serves the space's purpose
|
||||
- Topics don't need minimum episode counts - relevance to intent matters most
|
||||
- Each section should provide value aligned with the space's intended use
|
||||
- For "guidelines" spaces: focus on actionable patterns
|
||||
- For "tracking" spaces: focus on temporal patterns and changes
|
||||
- For "learning" spaces: focus on concepts and insights gained
|
||||
- Let the space's intent drive the structure, not rigid rules
|
||||
|
||||
${
|
||||
isUpdate
|
||||
? `CONNECTION FOCUS:
|
||||
- Entity relationships that span across batches/time
|
||||
- Theme evolution and expansion
|
||||
- Temporal patterns and progressions
|
||||
- Contradictions or confirmations of existing insights
|
||||
- New insights that complement existing knowledge`
|
||||
: ""
|
||||
}
|
||||
|
||||
RESPONSE FORMAT:
|
||||
Provide your response inside <output></output> tags with valid JSON. Include both HTML summary and markdown format.
|
||||
|
||||
<output>
|
||||
{
|
||||
"summary": "${isUpdate ? "Updated HTML summary that integrates new insights with existing knowledge. Write factually about what the statements reveal - mention specific entities, relationships, and patterns found in the data. Avoid marketing language. Use HTML tags for structure." : "Factual HTML summary based on patterns found in the statements. Report what the data actually shows - specific entities, relationships, frequencies, and concrete insights. Avoid promotional language. Use HTML tags like <p>, <strong>, <ul>, <li> for structure. Keep it concise and evidence-based."}",
|
||||
"keyEntities": ["entity1", "entity2", "entity3"],
|
||||
"themes": ["${isUpdate ? 'updated_theme1", "new_theme2", "evolved_theme3' : 'theme1", "theme2", "theme3'}"],
|
||||
"confidence": 0.85
|
||||
}
|
||||
</output>
|
||||
|
||||
JSON FORMATTING RULES:
|
||||
- HTML content in summary field is allowed and encouraged
|
||||
- Escape quotes within strings as \"
|
||||
- Escape HTML angle brackets if needed: < and >
|
||||
- Use proper HTML tags for structure: <p>, <strong>, <em>, <ul>, <li>, <h3>, etc.
|
||||
- HTML content should be well-formed and semantic
|
||||
|
||||
GUIDELINES:
|
||||
${
|
||||
isUpdate
|
||||
? `- Preserve valuable insights from existing summary
|
||||
- Integrate new information by highlighting connections
|
||||
- Themes should evolve naturally, don't replace wholesale
|
||||
- The updated summary should read as a coherent whole
|
||||
- Make the summary user-friendly and explain what value this space provides`
|
||||
: `- Report only what the episodes actually reveal - be specific and concrete
|
||||
- Cite actual content and patterns found in the episodes
|
||||
- Avoid generic descriptions that could apply to any space
|
||||
- Use neutral, factual language - no "comprehensive", "robust", "cutting-edge" etc.
|
||||
- Themes must be backed by at least 3 supporting episodes with clear evidence
|
||||
- Better to have fewer, well-supported themes than many weak ones
|
||||
- Confidence should reflect actual data quality and coverage, not aspirational goals`
|
||||
}`,
|
||||
},
|
||||
{
|
||||
role: "user",
|
||||
content: `SPACE INFORMATION:
|
||||
Name: "${spaceName}"
|
||||
Intent/Purpose: ${spaceDescription || "No specific intent provided - organize naturally based on content"}
|
||||
|
||||
${
|
||||
isUpdate
|
||||
? `EXISTING SUMMARY:
|
||||
${previousSummary}
|
||||
|
||||
EXISTING THEMES:
|
||||
${previousThemes.join(", ")}
|
||||
|
||||
NEW EPISODES TO INTEGRATE (${episodes.length} episodes):`
|
||||
: `EPISODES IN THIS SPACE (${episodes.length} episodes):`
|
||||
}
|
||||
${episodesText}
|
||||
|
||||
${
|
||||
episodes.length > 0
|
||||
? `TOP WORDS BY FREQUENCY:
|
||||
${topEntities.join(", ")}`
|
||||
: ""
|
||||
}
|
||||
|
||||
${
|
||||
isUpdate
|
||||
? "Please identify connections between the existing summary and new episodes, then update the summary to integrate the new insights coherently. Organize the summary to SERVE the space's intent/purpose. Remember: only summarize insights from the actual episode content."
|
||||
: "Please analyze the episodes and provide a comprehensive summary that SERVES the space's intent/purpose. Organize sections based on what would be most valuable for this space's intended use case. If the intent is unclear, organize naturally based on content patterns. Only summarize insights from actual episode content."
|
||||
}`,
|
||||
},
|
||||
];
|
||||
}
|
||||
|
||||
async function getExistingSummary(spaceId: string): Promise<{
|
||||
summary: string;
|
||||
themes: string[];
|
||||
lastUpdated: Date;
|
||||
contextCount: number;
|
||||
} | null> {
|
||||
try {
|
||||
const existingSummary = await getSpace(spaceId);
|
||||
|
||||
if (existingSummary?.summary) {
|
||||
return {
|
||||
summary: existingSummary.summary,
|
||||
themes: existingSummary.themes,
|
||||
lastUpdated: existingSummary.summaryGeneratedAt || new Date(),
|
||||
contextCount: existingSummary.contextCount || 0,
|
||||
};
|
||||
}
|
||||
|
||||
return null;
|
||||
} catch (error) {
|
||||
logger.warn(`Failed to get existing summary for space ${spaceId}:`, {
|
||||
error,
|
||||
});
|
||||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
async function getSpaceEpisodes(
|
||||
spaceId: string,
|
||||
userId: string,
|
||||
sinceDate?: Date,
|
||||
): Promise<SpaceEpisodeData[]> {
|
||||
// Query episodes directly using Space-[:HAS_EPISODE]->Episode relationships
|
||||
const params: any = { spaceId, userId };
|
||||
|
||||
let dateCondition = "";
|
||||
if (sinceDate) {
|
||||
dateCondition = "AND e.createdAt > $sinceDate";
|
||||
params.sinceDate = sinceDate.toISOString();
|
||||
}
|
||||
|
||||
const query = `
|
||||
MATCH (space:Space {uuid: $spaceId, userId: $userId})-[:HAS_EPISODE]->(e:Episode {userId: $userId})
|
||||
WHERE e IS NOT NULL ${dateCondition}
|
||||
RETURN DISTINCT e
|
||||
ORDER BY e.createdAt DESC
|
||||
`;
|
||||
|
||||
const result = await runQuery(query, params);
|
||||
|
||||
return result.map((record) => {
|
||||
const episode = record.get("e").properties;
|
||||
return {
|
||||
uuid: episode.uuid,
|
||||
content: episode.content,
|
||||
originalContent: episode.originalContent,
|
||||
source: episode.source,
|
||||
createdAt: new Date(episode.createdAt),
|
||||
validAt: new Date(episode.validAt),
|
||||
metadata: JSON.parse(episode.metadata || "{}"),
|
||||
sessionId: episode.sessionId,
|
||||
};
|
||||
});
|
||||
}
|
||||
|
||||
function parseSummaryResponse(response: string): {
|
||||
summary: string;
|
||||
themes: string[];
|
||||
confidence: number;
|
||||
keyEntities?: string[];
|
||||
} | null {
|
||||
try {
|
||||
// Extract content from <output> tags
|
||||
const outputMatch = response.match(/<output>([\s\S]*?)<\/output>/);
|
||||
if (!outputMatch) {
|
||||
logger.warn("No <output> tags found in LLM summary response");
|
||||
logger.debug("Full LLM response:", { response });
|
||||
return null;
|
||||
}
|
||||
|
||||
let jsonContent = outputMatch[1].trim();
|
||||
|
||||
let parsed;
|
||||
try {
|
||||
parsed = JSON.parse(jsonContent);
|
||||
} catch (jsonError) {
|
||||
logger.warn("JSON parsing failed, attempting cleanup and retry", {
|
||||
originalError: jsonError,
|
||||
jsonContent: jsonContent.substring(0, 500) + "...", // Log first 500 chars
|
||||
});
|
||||
|
||||
// More aggressive cleanup for malformed JSON
|
||||
jsonContent = jsonContent
|
||||
.replace(/([^\\])"/g, '$1\\"') // Escape unescaped quotes
|
||||
.replace(/^"/g, '\\"') // Escape quotes at start
|
||||
.replace(/\\\\"/g, '\\"'); // Fix double-escaped quotes
|
||||
|
||||
parsed = JSON.parse(jsonContent);
|
||||
}
|
||||
|
||||
// Validate the response structure
|
||||
const validationResult = SummaryResultSchema.safeParse(parsed);
|
||||
if (!validationResult.success) {
|
||||
logger.warn("Invalid LLM summary response format:", {
|
||||
error: validationResult.error,
|
||||
parsedData: parsed,
|
||||
});
|
||||
return null;
|
||||
}
|
||||
|
||||
return validationResult.data;
|
||||
} catch (error) {
|
||||
logger.error(
|
||||
"Error parsing LLM summary response:",
|
||||
error as Record<string, unknown>,
|
||||
);
|
||||
logger.debug("Failed response content:", { response });
|
||||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
async function storeSummary(summaryData: SpaceSummaryData): Promise<void> {
|
||||
try {
|
||||
// Store in PostgreSQL for API access and persistence
|
||||
await updateSpace(summaryData);
|
||||
|
||||
// Also store in Neo4j for graph-based queries
|
||||
const query = `
|
||||
MATCH (space:Space {uuid: $spaceId})
|
||||
SET space.summary = $summary,
|
||||
space.keyEntities = $keyEntities,
|
||||
space.themes = $themes,
|
||||
space.summaryConfidence = $confidence,
|
||||
space.summaryContextCount = $contextCount,
|
||||
space.summaryLastUpdated = datetime($lastUpdated)
|
||||
RETURN space
|
||||
`;
|
||||
|
||||
await runQuery(query, {
|
||||
spaceId: summaryData.spaceId,
|
||||
summary: summaryData.summary,
|
||||
keyEntities: summaryData.keyEntities,
|
||||
themes: summaryData.themes,
|
||||
confidence: summaryData.confidence,
|
||||
contextCount: summaryData.contextCount,
|
||||
lastUpdated: summaryData.lastUpdated.toISOString(),
|
||||
});
|
||||
|
||||
logger.info(`Stored summary for space ${summaryData.spaceId}`, {
|
||||
themes: summaryData.themes.length,
|
||||
keyEntities: summaryData.keyEntities.length,
|
||||
confidence: summaryData.confidence,
|
||||
});
|
||||
} catch (error) {
|
||||
logger.error(
|
||||
`Error storing summary for space ${summaryData.spaceId}:`,
|
||||
error as Record<string, unknown>,
|
||||
);
|
||||
throw error;
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Process space summary sequentially: ingest document then trigger patterns
|
||||
*/
|
||||
async function processSpaceSummarySequentially({
|
||||
userId,
|
||||
workspaceId,
|
||||
spaceId,
|
||||
spaceName,
|
||||
summaryContent,
|
||||
triggerSource,
|
||||
}: {
|
||||
userId: string;
|
||||
workspaceId: string;
|
||||
spaceId: string;
|
||||
spaceName: string;
|
||||
summaryContent: string;
|
||||
triggerSource:
|
||||
| "summary_complete"
|
||||
| "manual"
|
||||
| "assignment"
|
||||
| "scheduled"
|
||||
| "new_space"
|
||||
| "growth_threshold"
|
||||
| "ingestion_complete";
|
||||
}): Promise<void> {
|
||||
// Step 1: Ingest summary as document synchronously
|
||||
await ingestSpaceSummaryDocument(spaceId, userId, spaceName, summaryContent);
|
||||
|
||||
logger.info(
|
||||
`Successfully ingested space summary document for space ${spaceId}`,
|
||||
);
|
||||
|
||||
// Step 2: Now trigger space patterns (patterns will have access to the ingested summary)
|
||||
await triggerSpacePattern({
|
||||
userId,
|
||||
workspaceId,
|
||||
spaceId,
|
||||
triggerSource,
|
||||
});
|
||||
|
||||
logger.info(
|
||||
`Sequential processing completed for space ${spaceId}: summary ingested → patterns triggered`,
|
||||
);
|
||||
}
|
||||
|
||||
/**
|
||||
* Ingest space summary as document synchronously
|
||||
*/
|
||||
async function ingestSpaceSummaryDocument(
|
||||
spaceId: string,
|
||||
userId: string,
|
||||
spaceName: string,
|
||||
summaryContent: string,
|
||||
): Promise<void> {
|
||||
// Create the ingest body
|
||||
const ingestBody = {
|
||||
episodeBody: summaryContent,
|
||||
referenceTime: new Date().toISOString(),
|
||||
metadata: {
|
||||
documentType: "space_summary",
|
||||
spaceId,
|
||||
spaceName,
|
||||
generatedAt: new Date().toISOString(),
|
||||
},
|
||||
source: "space",
|
||||
spaceId,
|
||||
sessionId: spaceId,
|
||||
type: EpisodeType.DOCUMENT,
|
||||
};
|
||||
|
||||
// Add to queue
|
||||
await addToQueue(ingestBody, userId);
|
||||
|
||||
logger.info(`Queued space summary for synchronous ingestion`);
|
||||
|
||||
return;
|
||||
}
|
||||
|
||||
// Helper function to trigger the task
|
||||
export async function triggerSpaceSummary(payload: SpaceSummaryPayload) {
|
||||
return await spaceSummaryTask.trigger(payload, {
|
||||
|
||||
@ -1,4 +1,3 @@
|
||||
import { type SpacePattern } from "@core/types";
|
||||
import { prisma } from "./prisma";
|
||||
|
||||
export const getSpace = async (spaceId: string) => {
|
||||
@ -11,22 +10,6 @@ export const getSpace = async (spaceId: string) => {
|
||||
return space;
|
||||
};
|
||||
|
||||
export const createSpacePattern = async (
|
||||
spaceId: string,
|
||||
allPatterns: Omit<
|
||||
SpacePattern,
|
||||
"id" | "createdAt" | "updatedAt" | "spaceId"
|
||||
>[],
|
||||
) => {
|
||||
return await prisma.spacePattern.createMany({
|
||||
data: allPatterns.map((pattern) => ({
|
||||
...pattern,
|
||||
spaceId,
|
||||
userConfirmed: pattern.userConfirmed as any, // Temporary cast until Prisma client is regenerated
|
||||
})),
|
||||
});
|
||||
};
|
||||
|
||||
export const updateSpace = async (summaryData: {
|
||||
spaceId: string;
|
||||
summary: string;
|
||||
@ -41,7 +24,7 @@ export const updateSpace = async (summaryData: {
|
||||
summary: summaryData.summary,
|
||||
themes: summaryData.themes,
|
||||
contextCount: summaryData.contextCount,
|
||||
summaryGeneratedAt: new Date().toISOString()
|
||||
summaryGeneratedAt: new Date().toISOString(),
|
||||
},
|
||||
});
|
||||
};
|
||||
|
||||
@ -1,3 +1,4 @@
|
||||
import { randomUUID } from "node:crypto";
|
||||
import { EpisodeTypeEnum } from "@core/types";
|
||||
import { addToQueue } from "~/lib/ingest.server";
|
||||
import { logger } from "~/services/logger.service";
|
||||
@ -19,24 +20,24 @@ const SearchParamsSchema = {
|
||||
description:
|
||||
"Search query optimized for knowledge graph retrieval. Choose the right query structure based on your search intent:\n\n" +
|
||||
"1. **Entity-Centric Queries** (Best for graph search):\n" +
|
||||
" - ✅ GOOD: \"User's preferences for code style and formatting\"\n" +
|
||||
" - ✅ GOOD: \"Project authentication implementation decisions\"\n" +
|
||||
" - ❌ BAD: \"user code style\"\n" +
|
||||
' - ✅ GOOD: "User\'s preferences for code style and formatting"\n' +
|
||||
' - ✅ GOOD: "Project authentication implementation decisions"\n' +
|
||||
' - ❌ BAD: "user code style"\n' +
|
||||
" - Format: [Person/Project] + [relationship/attribute] + [context]\n\n" +
|
||||
"2. **Multi-Entity Relationship Queries** (Excellent for episode graph):\n" +
|
||||
" - ✅ GOOD: \"User and team discussions about API design patterns\"\n" +
|
||||
" - ✅ GOOD: \"relationship between database schema and performance optimization\"\n" +
|
||||
" - ❌ BAD: \"user team api design\"\n" +
|
||||
' - ✅ GOOD: "User and team discussions about API design patterns"\n' +
|
||||
' - ✅ GOOD: "relationship between database schema and performance optimization"\n' +
|
||||
' - ❌ BAD: "user team api design"\n' +
|
||||
" - Format: [Entity1] + [relationship type] + [Entity2] + [context]\n\n" +
|
||||
"3. **Semantic Question Queries** (Good for vector search):\n" +
|
||||
" - ✅ GOOD: \"What causes authentication errors in production? What are the security requirements?\"\n" +
|
||||
" - ✅ GOOD: \"How does caching improve API response times compared to direct database queries?\"\n" +
|
||||
" - ❌ BAD: \"auth errors production\"\n" +
|
||||
' - ✅ GOOD: "What causes authentication errors in production? What are the security requirements?"\n' +
|
||||
' - ✅ GOOD: "How does caching improve API response times compared to direct database queries?"\n' +
|
||||
' - ❌ BAD: "auth errors production"\n' +
|
||||
" - Format: Complete natural questions with full context\n\n" +
|
||||
"4. **Concept Exploration Queries** (Good for BFS traversal):\n" +
|
||||
" - ✅ GOOD: \"concepts and ideas related to database indexing and query optimization\"\n" +
|
||||
" - ✅ GOOD: \"topics connected to user authentication and session management\"\n" +
|
||||
" - ❌ BAD: \"database indexing concepts\"\n" +
|
||||
' - ✅ GOOD: "concepts and ideas related to database indexing and query optimization"\n' +
|
||||
' - ✅ GOOD: "topics connected to user authentication and session management"\n' +
|
||||
' - ❌ BAD: "database indexing concepts"\n' +
|
||||
" - Format: [concept] + related/connected + [domain/context]\n\n" +
|
||||
"Avoid keyword soup queries - use complete phrases with proper context for best results.",
|
||||
},
|
||||
@ -75,6 +76,11 @@ const IngestSchema = {
|
||||
description:
|
||||
"The conversation text to store. Include both what the user asked and what you answered. Keep it concise but complete.",
|
||||
},
|
||||
sessionId: {
|
||||
type: "string",
|
||||
description:
|
||||
"IMPORTANT: Session ID (UUID) is required to track the conversation session. If you don't have a sessionId in your context, you MUST call the get_session_id tool first to obtain one before calling memory_ingest.",
|
||||
},
|
||||
spaceIds: {
|
||||
type: "array",
|
||||
items: {
|
||||
@ -84,14 +90,14 @@ const IngestSchema = {
|
||||
"Optional: Array of space UUIDs (from memory_get_spaces). Add this to organize the memory by project. Example: If discussing 'core' project, include the 'core' space ID. Leave empty to store in general memory.",
|
||||
},
|
||||
},
|
||||
required: ["message"],
|
||||
required: ["message", "sessionId"],
|
||||
};
|
||||
|
||||
export const memoryTools = [
|
||||
{
|
||||
name: "memory_ingest",
|
||||
description:
|
||||
"Store conversation in memory for future reference. USE THIS TOOL: At the END of every conversation after fully answering the user. WHAT TO STORE: 1) User's question or request, 2) Your solution or explanation, 3) Important decisions made, 4) Key insights discovered. HOW TO USE: Put the entire conversation summary in the 'message' field. Optionally add spaceIds array to organize by project. Returns: Success confirmation with storage ID.",
|
||||
"Store conversation in memory for future reference. USE THIS TOOL: At the END of every conversation after fully answering the user. WHAT TO STORE: 1) User's question or request, 2) Your solution or explanation, 3) Important decisions made, 4) Key insights discovered. HOW TO USE: Put the entire conversation summary in the 'message' field. IMPORTANT: You MUST provide a sessionId - if you don't have one in your context, call get_session_id tool first to obtain it. Optionally add spaceIds array to organize by project. Returns: Success confirmation with storage ID.",
|
||||
inputSchema: IngestSchema,
|
||||
},
|
||||
{
|
||||
@ -150,6 +156,20 @@ export const memoryTools = [
|
||||
},
|
||||
},
|
||||
},
|
||||
{
|
||||
name: "get_session_id",
|
||||
description:
|
||||
"Get a new session ID for the MCP connection. USE THIS TOOL: When you need a session ID and don't have one yet. This generates a unique UUID to identify your MCP session. IMPORTANT: If any other tool requires a sessionId parameter and you don't have one, call this tool first to get a session ID. Returns: A UUID string to use as sessionId.",
|
||||
inputSchema: {
|
||||
type: "object",
|
||||
properties: {
|
||||
new: {
|
||||
type: "boolean",
|
||||
description: "Set to true to get a new sessionId.",
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
{
|
||||
name: "get_integrations",
|
||||
description:
|
||||
@ -243,6 +263,8 @@ export async function callMemoryTool(
|
||||
return await handleUserProfile(userId);
|
||||
case "memory_get_space":
|
||||
return await handleGetSpace({ ...args, userId });
|
||||
case "get_session_id":
|
||||
return await handleGetSessionId();
|
||||
case "get_integrations":
|
||||
return await handleGetIntegrations({ ...args, userId });
|
||||
case "get_integration_actions":
|
||||
@ -334,6 +356,7 @@ async function handleMemoryIngest(args: any) {
|
||||
source: args.source,
|
||||
type: EpisodeTypeEnum.CONVERSATION,
|
||||
spaceIds,
|
||||
sessionId: args.sessionId,
|
||||
},
|
||||
args.userId,
|
||||
);
|
||||
@ -462,7 +485,7 @@ async function handleGetSpace(args: any) {
|
||||
const spaceDetails = {
|
||||
id: space.id,
|
||||
name: space.name,
|
||||
description: space.description,
|
||||
summary: space.summary,
|
||||
};
|
||||
|
||||
return {
|
||||
@ -489,6 +512,35 @@ async function handleGetSpace(args: any) {
|
||||
}
|
||||
}
|
||||
|
||||
// Handler for get_session_id
|
||||
async function handleGetSessionId() {
|
||||
try {
|
||||
const sessionId = randomUUID();
|
||||
|
||||
return {
|
||||
content: [
|
||||
{
|
||||
type: "text",
|
||||
text: JSON.stringify({ sessionId }),
|
||||
},
|
||||
],
|
||||
isError: false,
|
||||
};
|
||||
} catch (error) {
|
||||
logger.error(`MCP get session id error: ${error}`);
|
||||
|
||||
return {
|
||||
content: [
|
||||
{
|
||||
type: "text",
|
||||
text: `Error generating session ID: ${error instanceof Error ? error.message : String(error)}`,
|
||||
},
|
||||
],
|
||||
isError: true,
|
||||
};
|
||||
}
|
||||
}
|
||||
|
||||
// Handler for get_integrations
|
||||
async function handleGetIntegrations(args: any) {
|
||||
try {
|
||||
|
||||
@ -44,6 +44,7 @@
|
||||
"@radix-ui/react-icons": "^1.3.0",
|
||||
"@radix-ui/react-label": "^2.0.2",
|
||||
"@radix-ui/react-popover": "^1.0.7",
|
||||
"@radix-ui/react-progress": "^1.1.4",
|
||||
"@radix-ui/react-scroll-area": "^1.0.5",
|
||||
"@radix-ui/react-select": "^2.0.0",
|
||||
"@radix-ui/react-separator": "^1.1.7",
|
||||
@ -53,7 +54,6 @@
|
||||
"@radix-ui/react-tabs": "^1.0.4",
|
||||
"@radix-ui/react-toast": "^1.1.5",
|
||||
"@radix-ui/react-tooltip": "^1.2.7",
|
||||
"@radix-ui/react-progress": "^1.1.4",
|
||||
"@remix-run/express": "2.16.7",
|
||||
"@remix-run/node": "2.1.0",
|
||||
"@remix-run/react": "2.16.7",
|
||||
@ -80,6 +80,7 @@
|
||||
"@tiptap/pm": "^2.11.9",
|
||||
"@tiptap/react": "^2.11.9",
|
||||
"@tiptap/starter-kit": "2.11.9",
|
||||
"@trigger.dev/python": "4.0.4",
|
||||
"@trigger.dev/react-hooks": "4.0.4",
|
||||
"@trigger.dev/sdk": "4.0.4",
|
||||
"ai": "5.0.78",
|
||||
@ -125,25 +126,25 @@
|
||||
"react": "^18.2.0",
|
||||
"react-calendar-heatmap": "^1.10.0",
|
||||
"react-dom": "^18.2.0",
|
||||
"react-hotkeys-hook": "^4.5.0",
|
||||
"react-markdown": "10.1.0",
|
||||
"react-resizable-panels": "^1.0.9",
|
||||
"react-hotkeys-hook": "^4.5.0",
|
||||
"react-virtualized": "^9.22.6",
|
||||
"resumable-stream": "2.2.8",
|
||||
"remix-auth": "^4.2.0",
|
||||
"remix-auth-oauth2": "^3.4.1",
|
||||
"remix-themes": "^2.0.4",
|
||||
"remix-typedjson": "0.3.1",
|
||||
"remix-utils": "^7.7.0",
|
||||
"resumable-stream": "2.2.8",
|
||||
"sigma": "^3.0.2",
|
||||
"stripe": "19.0.0",
|
||||
"simple-oauth2": "^5.1.0",
|
||||
"stripe": "19.0.0",
|
||||
"tailwind-merge": "^2.6.0",
|
||||
"tiptap-markdown": "0.9.0",
|
||||
"tailwind-scrollbar-hide": "^2.0.0",
|
||||
"tailwindcss-animate": "^1.0.7",
|
||||
"tailwindcss-textshadow": "^2.1.3",
|
||||
"tiny-invariant": "^1.3.1",
|
||||
"tiptap-markdown": "0.9.0",
|
||||
"zod": "3.25.76",
|
||||
"zod-error": "1.5.0",
|
||||
"zod-validation-error": "^1.5.0"
|
||||
|
||||
299
apps/webapp/python/README.md
Normal file
299
apps/webapp/python/README.md
Normal file
@ -0,0 +1,299 @@
|
||||
# BERT Topic Modeling CLI for Echo Episodes
|
||||
|
||||
This CLI tool performs topic modeling on Echo episodes using BERTopic. It connects to Neo4j, retrieves episodes with their pre-computed embeddings for a given user, and discovers thematic clusters using HDBSCAN clustering.
|
||||
|
||||
## Features
|
||||
|
||||
- Connects to Neo4j database to fetch episodes
|
||||
- Uses pre-computed embeddings (no need to regenerate them)
|
||||
- Performs semantic topic clustering with BERTopic
|
||||
- Displays topics with:
|
||||
- Top keywords per topic
|
||||
- Episode count per topic
|
||||
- Sample episodes for each topic
|
||||
- Configurable minimum topic size
|
||||
- Environment variable support for easy configuration
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Python 3.8+
|
||||
- Access to Neo4j database with episodes stored
|
||||
- Pre-computed embeddings stored in Neo4j (in `contentEmbedding` field)
|
||||
|
||||
## Installation
|
||||
|
||||
1. Navigate to the bert directory:
|
||||
|
||||
```bash
|
||||
cd apps/webapp/app/bert
|
||||
```
|
||||
|
||||
2. Install dependencies:
|
||||
|
||||
```bash
|
||||
pip install -r requirements.txt
|
||||
```
|
||||
|
||||
## Configuration
|
||||
|
||||
The CLI can read Neo4j connection details from:
|
||||
|
||||
1. **Environment variables** (recommended) - Create a `.env` file or export:
|
||||
|
||||
```bash
|
||||
export NEO4J_URI=bolt://localhost:7687
|
||||
export NEO4J_USERNAME=neo4j
|
||||
export NEO4J_PASSWORD=your_password
|
||||
```
|
||||
|
||||
2. **Command-line options** - Pass credentials directly as flags
|
||||
|
||||
3. **From project root** - The tool automatically loads `.env` from the project root
|
||||
|
||||
## Usage
|
||||
|
||||
### Basic Usage
|
||||
|
||||
Using environment variables (most common):
|
||||
|
||||
```bash
|
||||
python main.py <user_id>
|
||||
```
|
||||
|
||||
### Advanced Options
|
||||
|
||||
```bash
|
||||
python main.py <user_id> [OPTIONS]
|
||||
```
|
||||
|
||||
**Options:**
|
||||
|
||||
- `--min-topic-size INTEGER`: Minimum number of episodes per topic (default: 10)
|
||||
- `--nr-topics INTEGER`: Target number of topics for reduction (optional)
|
||||
- `--propose-spaces`: Generate space proposals using OpenAI (requires OPENAI_API_KEY)
|
||||
- `--openai-api-key TEXT`: OpenAI API key for space proposals (or use OPENAI_API_KEY env var)
|
||||
- `--json`: Output only final results in JSON format (suppresses all other output)
|
||||
- `--neo4j-uri TEXT`: Neo4j connection URI (default: bolt://localhost:7687)
|
||||
- `--neo4j-username TEXT`: Neo4j username (default: neo4j)
|
||||
- `--neo4j-password TEXT`: Neo4j password (required)
|
||||
|
||||
### Examples
|
||||
|
||||
1. **Basic usage with environment variables:**
|
||||
|
||||
```bash
|
||||
python main.py user-123
|
||||
```
|
||||
|
||||
2. **Custom minimum topic size:**
|
||||
|
||||
```bash
|
||||
python main.py user-123 --min-topic-size 10
|
||||
```
|
||||
|
||||
3. **Explicit credentials:**
|
||||
|
||||
```bash
|
||||
python main.py user-123 \
|
||||
--neo4j-uri bolt://neo4j:7687 \
|
||||
--neo4j-username neo4j \
|
||||
--neo4j-password mypassword
|
||||
```
|
||||
|
||||
4. **Using Docker compose Neo4j:**
|
||||
```bash
|
||||
python main.py user-123 \
|
||||
--neo4j-uri bolt://localhost:7687 \
|
||||
--neo4j-password 27192e6432564f4788d55c15131bd5ac
|
||||
```
|
||||
|
||||
5. **With space proposals:**
|
||||
```bash
|
||||
python main.py user-123 --propose-spaces
|
||||
```
|
||||
|
||||
6. **JSON output mode (for programmatic use):**
|
||||
```bash
|
||||
python main.py user-123 --json
|
||||
```
|
||||
|
||||
7. **JSON output with space proposals:**
|
||||
```bash
|
||||
python main.py user-123 --propose-spaces --json
|
||||
```
|
||||
|
||||
### Get Help
|
||||
|
||||
```bash
|
||||
python main.py --help
|
||||
```
|
||||
|
||||
## Output Formats
|
||||
|
||||
### Human-Readable Output (Default)
|
||||
|
||||
The CLI outputs:
|
||||
|
||||
```
|
||||
================================================================================
|
||||
BERT TOPIC MODELING FOR ECHO EPISODES
|
||||
================================================================================
|
||||
User ID: user-123
|
||||
Min Topic Size: 20
|
||||
================================================================================
|
||||
|
||||
✓ Connected to Neo4j at bolt://localhost:7687
|
||||
✓ Fetched 150 episodes with embeddings
|
||||
|
||||
🔍 Running BERTopic analysis (min_topic_size=20)...
|
||||
✓ Topic modeling complete
|
||||
|
||||
================================================================================
|
||||
TOPIC MODELING RESULTS
|
||||
================================================================================
|
||||
Total Topics Found: 5
|
||||
Total Episodes: 150
|
||||
================================================================================
|
||||
|
||||
────────────────────────────────────────────────────────────────────────────────
|
||||
Topic 0: 45 episodes
|
||||
────────────────────────────────────────────────────────────────────────────────
|
||||
Keywords: authentication, login, user, security, session, password, token, oauth, jwt, credentials
|
||||
|
||||
Sample Episodes (showing up to 3):
|
||||
1. [uuid-123]
|
||||
Discussing authentication flow for the new user login system...
|
||||
|
||||
2. [uuid-456]
|
||||
Implementing OAuth2 with JWT tokens for secure sessions...
|
||||
|
||||
3. [uuid-789]
|
||||
Password reset functionality with email verification...
|
||||
|
||||
────────────────────────────────────────────────────────────────────────────────
|
||||
Topic 1: 32 episodes
|
||||
────────────────────────────────────────────────────────────────────────────────
|
||||
Keywords: database, neo4j, query, graph, cypher, nodes, relationships, index, performance, optimization
|
||||
|
||||
Sample Episodes (showing up to 3):
|
||||
...
|
||||
|
||||
Topic -1 (Outliers): 8 episodes
|
||||
|
||||
================================================================================
|
||||
✓ Analysis complete!
|
||||
================================================================================
|
||||
|
||||
✓ Neo4j connection closed
|
||||
```
|
||||
|
||||
### JSON Output Mode (--json flag)
|
||||
|
||||
When using the `--json` flag, the tool outputs only a clean JSON object with no debug logs:
|
||||
|
||||
```json
|
||||
{
|
||||
"topics": {
|
||||
"0": {
|
||||
"keywords": ["authentication", "login", "user", "security", "session"],
|
||||
"episodeIds": ["uuid-123", "uuid-456", "uuid-789"]
|
||||
},
|
||||
"1": {
|
||||
"keywords": ["database", "neo4j", "query", "graph", "cypher"],
|
||||
"episodeIds": ["uuid-abc", "uuid-def"]
|
||||
}
|
||||
},
|
||||
"spaces": [
|
||||
{
|
||||
"name": "User Authentication",
|
||||
"intent": "Episodes about user authentication, login systems, and security belong in this space.",
|
||||
"confidence": 85,
|
||||
"topics": [0, 3],
|
||||
"estimatedEpisodes": 120
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**JSON Output Structure:**
|
||||
- `topics`: Dictionary of topic IDs with keywords and episode UUIDs
|
||||
- `spaces`: Array of space proposals (only if `--propose-spaces` is used)
|
||||
- `name`: Space name (2-5 words)
|
||||
- `intent`: Classification intent (1-2 sentences)
|
||||
- `confidence`: Confidence score (0-100)
|
||||
- `topics`: Source topic IDs that form this space
|
||||
- `estimatedEpisodes`: Estimated number of episodes in this space
|
||||
|
||||
**Use Cases for JSON Mode:**
|
||||
- Programmatic consumption by other tools
|
||||
- Piping output to jq or other JSON processors
|
||||
- Integration with CI/CD pipelines
|
||||
- Automated space creation workflows
|
||||
|
||||
## How It Works
|
||||
|
||||
1. **Connection**: Establishes connection to Neo4j database
|
||||
2. **Data Fetching**: Queries all episodes for the given userId that have:
|
||||
- Non-null `contentEmbedding` field
|
||||
- Non-empty content
|
||||
3. **Topic Modeling**: Runs BERTopic with:
|
||||
- Pre-computed embeddings (no re-embedding needed)
|
||||
- HDBSCAN clustering (automatic cluster discovery)
|
||||
- Keyword extraction via c-TF-IDF
|
||||
4. **Results**: Displays topics with keywords and sample episodes
|
||||
|
||||
## Neo4j Query
|
||||
|
||||
The tool uses this Cypher query to fetch episodes:
|
||||
|
||||
```cypher
|
||||
MATCH (e:Episode {userId: $userId})
|
||||
WHERE e.contentEmbedding IS NOT NULL
|
||||
AND size(e.contentEmbedding) > 0
|
||||
AND e.content IS NOT NULL
|
||||
AND e.content <> ''
|
||||
RETURN e.uuid as uuid,
|
||||
e.content as content,
|
||||
e.contentEmbedding as embedding,
|
||||
e.createdAt as createdAt
|
||||
ORDER BY e.createdAt DESC
|
||||
```
|
||||
|
||||
## Tuning Parameters
|
||||
|
||||
- **`--min-topic-size`**:
|
||||
- Smaller values (5-10): More granular topics, may include noise
|
||||
- Larger values (20-30): Broader topics, more coherent but fewer clusters
|
||||
- Recommended: Start with 20 and adjust based on your data
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### No episodes found
|
||||
|
||||
- Verify the userId exists in Neo4j
|
||||
- Check that episodes have `contentEmbedding` populated
|
||||
- Ensure episodes have non-empty `content` field
|
||||
|
||||
### Connection errors
|
||||
|
||||
- Verify Neo4j is running: `docker ps | grep neo4j`
|
||||
- Check URI format: should be `bolt://host:port`
|
||||
- Verify credentials are correct
|
||||
|
||||
### Too few/many topics
|
||||
|
||||
- Adjust `--min-topic-size` parameter
|
||||
- Need more topics: decrease the value (e.g., `--min-topic-size 10`)
|
||||
- Need fewer topics: increase the value (e.g., `--min-topic-size 30`)
|
||||
|
||||
## Dependencies
|
||||
|
||||
- `bertopic>=0.16.0` - Topic modeling
|
||||
- `neo4j>=5.14.0` - Neo4j Python driver
|
||||
- `click>=8.1.0` - CLI framework
|
||||
- `numpy>=1.24.0` - Numerical operations
|
||||
- `python-dotenv>=1.0.0` - Environment variable loading
|
||||
|
||||
## License
|
||||
|
||||
Part of the Echo project.
|
||||
384
apps/webapp/python/main.py
Normal file
384
apps/webapp/python/main.py
Normal file
@ -0,0 +1,384 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
BERT Topic Modeling CLI for Echo Episodes
|
||||
|
||||
This CLI tool connects to Neo4j, retrieves episodes with their embeddings for a given userId,
|
||||
and performs topic modeling using BERTopic to discover thematic clusters.
|
||||
"""
|
||||
|
||||
import os
|
||||
import sys
|
||||
import json
|
||||
from typing import List, Tuple, Dict, Any
|
||||
import click
|
||||
import numpy as np
|
||||
from neo4j import GraphDatabase
|
||||
from bertopic import BERTopic
|
||||
from bertopic.vectorizers import ClassTfidfTransformer
|
||||
from dotenv import load_dotenv
|
||||
from sklearn.feature_extraction.text import CountVectorizer
|
||||
from umap import UMAP
|
||||
from hdbscan import HDBSCAN
|
||||
|
||||
|
||||
class Neo4jConnection:
|
||||
"""Manages Neo4j database connection."""
|
||||
|
||||
def __init__(self, uri: str, username: str, password: str, quiet: bool = False):
|
||||
"""Initialize Neo4j connection.
|
||||
|
||||
Args:
|
||||
uri: Neo4j connection URI (e.g., bolt://localhost:7687)
|
||||
username: Neo4j username
|
||||
password: Neo4j password
|
||||
quiet: If True, suppress output messages
|
||||
"""
|
||||
self.quiet = quiet
|
||||
try:
|
||||
self.driver = GraphDatabase.driver(uri, auth=(username, password))
|
||||
# Verify connection
|
||||
self.driver.verify_connectivity()
|
||||
if not quiet:
|
||||
click.echo(f"✓ Connected to Neo4j at {uri}")
|
||||
except Exception as e:
|
||||
if not quiet:
|
||||
click.echo(f"✗ Failed to connect to Neo4j: {e}", err=True)
|
||||
sys.exit(1)
|
||||
|
||||
def close(self):
|
||||
"""Close the Neo4j connection."""
|
||||
if self.driver:
|
||||
self.driver.close()
|
||||
if not self.quiet:
|
||||
click.echo("✓ Neo4j connection closed")
|
||||
|
||||
def get_episodes_with_embeddings(self, user_id: str) -> Tuple[List[str], List[str], np.ndarray]:
|
||||
"""Fetch all episodes with their embeddings for a given user.
|
||||
|
||||
Args:
|
||||
user_id: The user ID to fetch episodes for
|
||||
|
||||
Returns:
|
||||
Tuple of (episode_uuids, episode_contents, embeddings_array)
|
||||
"""
|
||||
query = """
|
||||
MATCH (e:Episode {userId: $userId})
|
||||
WHERE e.contentEmbedding IS NOT NULL
|
||||
AND size(e.contentEmbedding) > 0
|
||||
AND e.content IS NOT NULL
|
||||
AND e.content <> ''
|
||||
RETURN e.uuid as uuid,
|
||||
e.content as content,
|
||||
e.contentEmbedding as embedding,
|
||||
e.createdAt as createdAt
|
||||
ORDER BY e.createdAt DESC
|
||||
"""
|
||||
|
||||
with self.driver.session() as session:
|
||||
result = session.run(query, userId=user_id)
|
||||
records = list(result)
|
||||
|
||||
if not records:
|
||||
if not self.quiet:
|
||||
click.echo(f"✗ No episodes found for userId: {user_id}", err=True)
|
||||
sys.exit(1)
|
||||
|
||||
uuids = []
|
||||
contents = []
|
||||
embeddings = []
|
||||
|
||||
for record in records:
|
||||
uuids.append(record["uuid"])
|
||||
contents.append(record["content"])
|
||||
embeddings.append(record["embedding"])
|
||||
|
||||
embeddings_array = np.array(embeddings, dtype=np.float32)
|
||||
|
||||
if not self.quiet:
|
||||
click.echo(f"✓ Fetched {len(contents)} episodes with embeddings")
|
||||
return uuids, contents, embeddings_array
|
||||
|
||||
|
||||
def run_bertopic_analysis(
|
||||
contents: List[str],
|
||||
embeddings: np.ndarray,
|
||||
min_topic_size: int = 20,
|
||||
nr_topics: int = None,
|
||||
quiet: bool = False
|
||||
) -> Tuple[BERTopic, List[int], List[float]]:
|
||||
"""Run BERTopic clustering on episode contents with improved configuration.
|
||||
|
||||
Args:
|
||||
contents: List of episode content strings
|
||||
embeddings: Pre-computed embeddings for the episodes
|
||||
min_topic_size: Minimum number of documents per topic
|
||||
nr_topics: Target number of topics (optional, for topic reduction)
|
||||
quiet: If True, suppress output messages
|
||||
|
||||
Returns:
|
||||
Tuple of (bertopic_model, topic_assignments, probabilities)
|
||||
"""
|
||||
if not quiet:
|
||||
click.echo(f"\n🔍 Running BERTopic analysis (min_topic_size={min_topic_size})...")
|
||||
|
||||
# Step 1: Configure UMAP for dimensionality reduction
|
||||
# More aggressive reduction helps find distinct clusters
|
||||
umap_model = UMAP(
|
||||
n_neighbors=15, # Balance between local/global structure
|
||||
n_components=5, # Reduce to 5 dimensions
|
||||
min_dist=0.0, # Allow tight clusters
|
||||
metric='cosine', # Use cosine similarity
|
||||
random_state=42
|
||||
)
|
||||
|
||||
# Step 2: Configure HDBSCAN for clustering
|
||||
# Tuned to find more granular topics
|
||||
hdbscan_model = HDBSCAN(
|
||||
min_cluster_size=min_topic_size, # Minimum episodes per topic
|
||||
min_samples=5, # More sensitive to local density
|
||||
metric='euclidean',
|
||||
cluster_selection_method='eom', # Excess of mass method
|
||||
prediction_data=True
|
||||
)
|
||||
|
||||
# Step 3: Configure vectorizer with stopword removal
|
||||
# Remove common English stopwords that pollute topic keywords
|
||||
vectorizer_model = CountVectorizer(
|
||||
stop_words='english', # Remove common English words
|
||||
min_df=2, # Word must appear in at least 2 docs
|
||||
max_df=0.95, # Ignore words in >95% of docs
|
||||
ngram_range=(1, 2) # Include unigrams and bigrams
|
||||
)
|
||||
|
||||
# Step 4: Configure c-TF-IDF with better parameters
|
||||
ctfidf_model = ClassTfidfTransformer(
|
||||
reduce_frequent_words=True, # Further reduce common words
|
||||
bm25_weighting=True # Use BM25 for better keyword extraction
|
||||
)
|
||||
|
||||
# Step 5: Initialize BERTopic with all custom components
|
||||
model = BERTopic(
|
||||
embedding_model=None, # Use pre-computed embeddings
|
||||
umap_model=umap_model,
|
||||
hdbscan_model=hdbscan_model,
|
||||
vectorizer_model=vectorizer_model,
|
||||
ctfidf_model=ctfidf_model,
|
||||
top_n_words=15, # More keywords per topic
|
||||
nr_topics=nr_topics, # Optional topic reduction
|
||||
calculate_probabilities=True,
|
||||
verbose=(not quiet)
|
||||
)
|
||||
|
||||
# Fit the model with pre-computed embeddings
|
||||
topics, probs = model.fit_transform(contents, embeddings=embeddings)
|
||||
|
||||
# Get topic count
|
||||
unique_topics = len(set(topics)) - (1 if -1 in topics else 0)
|
||||
if not quiet:
|
||||
click.echo(f"✓ Topic modeling complete - Found {unique_topics} topics")
|
||||
|
||||
return model, topics, probs
|
||||
|
||||
|
||||
def print_topic_results(
|
||||
model: BERTopic,
|
||||
topics: List[int],
|
||||
uuids: List[str],
|
||||
contents: List[str]
|
||||
):
|
||||
"""Print formatted topic results.
|
||||
|
||||
Args:
|
||||
model: Fitted BERTopic model
|
||||
topics: Topic assignments for each episode
|
||||
uuids: Episode UUIDs
|
||||
contents: Episode contents
|
||||
"""
|
||||
# Get topic info
|
||||
topic_info = model.get_topic_info()
|
||||
num_topics = len(topic_info) - 1 # Exclude outlier topic (-1)
|
||||
|
||||
click.echo(f"\n{'='*80}")
|
||||
click.echo(f"TOPIC MODELING RESULTS")
|
||||
click.echo(f"{'='*80}")
|
||||
click.echo(f"Total Topics Found: {num_topics}")
|
||||
click.echo(f"Total Episodes: {len(contents)}")
|
||||
click.echo(f"{'='*80}\n")
|
||||
|
||||
# Print each topic
|
||||
for idx, row in topic_info.iterrows():
|
||||
topic_id = row['Topic']
|
||||
count = row['Count']
|
||||
|
||||
# Skip outlier topic
|
||||
if topic_id == -1:
|
||||
click.echo(f"Topic -1 (Outliers): {count} episodes\n")
|
||||
continue
|
||||
|
||||
# Get top words for this topic
|
||||
topic_words = model.get_topic(topic_id)
|
||||
|
||||
click.echo(f"{'─'*80}")
|
||||
click.echo(f"Topic {topic_id}: {count} episodes")
|
||||
click.echo(f"{'─'*80}")
|
||||
|
||||
# Print top keywords
|
||||
if topic_words:
|
||||
keywords = [word for word, score in topic_words[:10]]
|
||||
click.echo(f"Keywords: {', '.join(keywords)}")
|
||||
|
||||
# Print sample episodes
|
||||
topic_episodes = [(uuid, content) for uuid, content, topic
|
||||
in zip(uuids, contents, topics) if topic == topic_id]
|
||||
|
||||
click.echo(f"\nSample Episodes (showing up to 3):")
|
||||
for i, (uuid, content) in enumerate(topic_episodes[:3]):
|
||||
# Truncate content for display
|
||||
truncated = content[:200] + "..." if len(content) > 200 else content
|
||||
click.echo(f" {i+1}. [{uuid}]")
|
||||
click.echo(f" {truncated}\n")
|
||||
|
||||
click.echo()
|
||||
|
||||
|
||||
def build_json_output(
|
||||
model: BERTopic,
|
||||
topics: List[int],
|
||||
uuids: List[str]
|
||||
) -> Dict[str, Any]:
|
||||
"""Build JSON output structure.
|
||||
|
||||
Args:
|
||||
model: Fitted BERTopic model
|
||||
topics: Topic assignments for each episode
|
||||
uuids: Episode UUIDs
|
||||
|
||||
Returns:
|
||||
Dictionary with topics data
|
||||
"""
|
||||
# Build topics dictionary
|
||||
topics_dict = {}
|
||||
topic_info = model.get_topic_info()
|
||||
|
||||
for idx, row in topic_info.iterrows():
|
||||
topic_id = row['Topic']
|
||||
|
||||
# Skip outlier topic
|
||||
if topic_id == -1:
|
||||
continue
|
||||
|
||||
# Get keywords
|
||||
topic_words = model.get_topic(topic_id)
|
||||
keywords = [word for word, score in topic_words[:10]] if topic_words else []
|
||||
|
||||
# Get episode IDs for this topic
|
||||
episode_ids = [uuid for uuid, topic in zip(uuids, topics) if topic == topic_id]
|
||||
|
||||
topics_dict[topic_id] = {
|
||||
"keywords": keywords,
|
||||
"episodeIds": episode_ids
|
||||
}
|
||||
|
||||
return {"topics": topics_dict}
|
||||
|
||||
|
||||
@click.command()
|
||||
@click.argument('user_id', type=str)
|
||||
@click.option(
|
||||
'--min-topic-size',
|
||||
default=10,
|
||||
type=int,
|
||||
help='Minimum number of episodes per topic (default: 10, lower = more granular topics)'
|
||||
)
|
||||
@click.option(
|
||||
'--nr-topics',
|
||||
default=None,
|
||||
type=int,
|
||||
help='Target number of topics for reduction (optional, e.g., 20 for ~20 topics)'
|
||||
)
|
||||
@click.option(
|
||||
'--neo4j-uri',
|
||||
envvar='NEO4J_URI',
|
||||
default='bolt://localhost:7687',
|
||||
help='Neo4j connection URI (default: bolt://localhost:7687)'
|
||||
)
|
||||
@click.option(
|
||||
'--neo4j-username',
|
||||
envvar='NEO4J_USERNAME',
|
||||
default='neo4j',
|
||||
help='Neo4j username (default: neo4j)'
|
||||
)
|
||||
@click.option(
|
||||
'--neo4j-password',
|
||||
envvar='NEO4J_PASSWORD',
|
||||
required=True,
|
||||
help='Neo4j password (required, can use NEO4J_PASSWORD env var)'
|
||||
)
|
||||
@click.option(
|
||||
'--json',
|
||||
'json_output',
|
||||
is_flag=True,
|
||||
default=False,
|
||||
help='Output only final results in JSON format (suppresses all other output)'
|
||||
)
|
||||
def main(user_id: str, min_topic_size: int, nr_topics: int, neo4j_uri: str, neo4j_username: str, neo4j_password: str, json_output: bool):
|
||||
"""
|
||||
Run BERTopic analysis on episodes for a given USER_ID.
|
||||
|
||||
This tool connects to Neo4j, retrieves all episodes with embeddings for the specified user,
|
||||
and performs topic modeling to discover thematic clusters.
|
||||
|
||||
Examples:
|
||||
|
||||
# Using environment variables from .env file
|
||||
python main.py user-123
|
||||
|
||||
# With custom min topic size
|
||||
python main.py user-123 --min-topic-size 10
|
||||
|
||||
# With explicit Neo4j credentials
|
||||
python main.py user-123 --neo4j-uri bolt://localhost:7687 --neo4j-password mypassword
|
||||
"""
|
||||
# Print header only if not in JSON mode
|
||||
if not json_output:
|
||||
click.echo(f"\n{'='*80}")
|
||||
click.echo("BERT TOPIC MODELING FOR ECHO EPISODES")
|
||||
click.echo(f"{'='*80}")
|
||||
click.echo(f"User ID: {user_id}")
|
||||
click.echo(f"Min Topic Size: {min_topic_size}")
|
||||
if nr_topics:
|
||||
click.echo(f"Target Topics: ~{nr_topics}")
|
||||
click.echo(f"{'='*80}\n")
|
||||
|
||||
# Connect to Neo4j (quiet mode if JSON output)
|
||||
neo4j_conn = Neo4jConnection(neo4j_uri, neo4j_username, neo4j_password, quiet=json_output)
|
||||
|
||||
try:
|
||||
# Fetch episodes with embeddings
|
||||
uuids, contents, embeddings = neo4j_conn.get_episodes_with_embeddings(user_id)
|
||||
|
||||
# Run BERTopic analysis
|
||||
model, topics, probs = run_bertopic_analysis(contents, embeddings, min_topic_size, nr_topics, quiet=json_output)
|
||||
|
||||
# Output results
|
||||
if json_output:
|
||||
# JSON output mode - only print JSON
|
||||
output = build_json_output(model, topics, uuids)
|
||||
click.echo(json.dumps(output, indent=2))
|
||||
else:
|
||||
# Normal output mode - print formatted results
|
||||
print_topic_results(model, topics, uuids, contents)
|
||||
|
||||
click.echo(f"{'='*80}")
|
||||
click.echo("✓ Analysis complete!")
|
||||
click.echo(f"{'='*80}\n")
|
||||
|
||||
finally:
|
||||
# Always close connection
|
||||
neo4j_conn.close()
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
# Load environment variables from .env file if present
|
||||
load_dotenv()
|
||||
main()
|
||||
8
apps/webapp/python/requirements.txt
Normal file
8
apps/webapp/python/requirements.txt
Normal file
@ -0,0 +1,8 @@
|
||||
bertopic>=0.16.0
|
||||
neo4j>=5.14.0
|
||||
click>=8.1.0
|
||||
numpy>=1.24.0
|
||||
python-dotenv>=1.0.0
|
||||
scikit-learn>=1.3.0
|
||||
umap-learn>=0.5.4
|
||||
hdbscan>=0.8.33
|
||||
@ -1,6 +1,7 @@
|
||||
import { defineConfig } from "@trigger.dev/sdk/v3";
|
||||
import { syncEnvVars } from "@trigger.dev/build/extensions/core";
|
||||
import { prismaExtension } from "@trigger.dev/build/extensions/prisma";
|
||||
import { pythonExtension } from "@trigger.dev/python/extension";
|
||||
|
||||
export default defineConfig({
|
||||
project: process.env.TRIGGER_PROJECT_ID as string,
|
||||
@ -23,6 +24,9 @@ export default defineConfig({
|
||||
dirs: ["./app/trigger"],
|
||||
build: {
|
||||
extensions: [
|
||||
pythonExtension({
|
||||
scripts: ["./python/*.py"],
|
||||
}),
|
||||
syncEnvVars(() => ({
|
||||
// ANTHROPIC_API_KEY: process.env.ANTHROPIC_API_KEY as string,
|
||||
// API_BASE_URL: process.env.API_BASE_URL as string,
|
||||
|
||||
@ -55,7 +55,16 @@ RUN pnpm run build --filter=webapp...
|
||||
|
||||
# Runner
|
||||
FROM ${NODE_IMAGE} AS runner
|
||||
RUN apt-get update && apt-get install -y openssl netcat-openbsd ca-certificates
|
||||
RUN apt-get update && apt-get install -y \
|
||||
openssl \
|
||||
netcat-openbsd \
|
||||
ca-certificates \
|
||||
python3 \
|
||||
python3-pip \
|
||||
python3-venv \
|
||||
gcc \
|
||||
g++ \
|
||||
&& rm -rf /var/lib/apt/lists/*
|
||||
WORKDIR /core
|
||||
RUN corepack enable
|
||||
ENV NODE_ENV production
|
||||
@ -69,6 +78,13 @@ COPY --from=builder --chown=node:node /core/apps/webapp/build ./apps/webapp/buil
|
||||
COPY --from=builder --chown=node:node /core/apps/webapp/public ./apps/webapp/public
|
||||
COPY --from=builder --chown=node:node /core/scripts ./scripts
|
||||
|
||||
# Install BERT Python dependencies
|
||||
COPY --chown=node:node apps/webapp/python/requirements.txt ./apps/webapp/python/requirements.txt
|
||||
RUN pip3 install --no-cache-dir -r ./apps/webapp/python/requirements.txt
|
||||
|
||||
# Copy BERT scripts
|
||||
COPY --chown=node:node apps/webapp/python/main.py ./apps/webapp/python/main.py
|
||||
|
||||
EXPOSE 3000
|
||||
|
||||
USER node
|
||||
|
||||
@ -14,9 +14,7 @@ description: "Get started with CORE in 5 minutes"
|
||||
|
||||
## Requirements
|
||||
|
||||
These are the minimum requirements for running the webapp and background job components. They can run on the same, or on separate machines.
|
||||
|
||||
It's fine to run everything on the same machine for testing. To be able to scale your workers, you will want to run them separately.
|
||||
These are the minimum requirements for running the core.
|
||||
|
||||
### Prerequisites
|
||||
|
||||
@ -27,7 +25,6 @@ To run CORE, you will need:
|
||||
|
||||
### System Requirements
|
||||
|
||||
**Webapp & Database Machine:**
|
||||
- 4+ vCPU
|
||||
- 8+ GB RAM
|
||||
- 20+ GB Storage
|
||||
@ -41,7 +38,7 @@ CORE offers multiple deployment approaches depending on your needs:
|
||||
|
||||
For a one-click deployment experience, use Railway:
|
||||
|
||||
[](https://railway.com/deploy/6aEd9C?referralCode=LHvbIb&utm_medium=integration&utm_source=template&utm_campaign=generic)
|
||||
[](https://railway.com/deploy/core?referralCode=LHvbIb&utm_medium=integration&utm_source=template&utm_campaign=generic)
|
||||
|
||||
Railway will automatically set up all required services and handle the infrastructure for you.
|
||||
|
||||
|
||||
@ -16,7 +16,7 @@ We provide version-tagged releases for self-hosted deployments. It's highly advi
|
||||
|
||||
For a quick one-click deployment, you can use Railway:
|
||||
|
||||
[](https://railway.com/deploy/6aEd9C?referralCode=LHvbIb&utm_medium=integration&utm_source=template&utm_campaign=generic)
|
||||
[](https://railway.com/deploy/core?referralCode=LHvbIb&utm_medium=integration&utm_source=template&utm_campaign=generic)
|
||||
|
||||
Alternatively, you can follow our [Docker deployment guide](/self-hosting/docker) for manual setup.
|
||||
|
||||
|
||||
@ -52,4 +52,6 @@ MODEL=gpt-4.1-2025-04-14
|
||||
## for opensource embedding model
|
||||
# EMBEDDING_MODEL=mxbai-embed-large
|
||||
|
||||
QUEUE_PROVIDER=bullmq
|
||||
QUEUE_PROVIDER=bullmq
|
||||
|
||||
TELEMETRY_ENABLED=false
|
||||
@ -33,6 +33,7 @@ services:
|
||||
- ENABLE_EMAIL_LOGIN=${ENABLE_EMAIL_LOGIN}
|
||||
- OLLAMA_URL=${OLLAMA_URL}
|
||||
- EMBEDDING_MODEL=${EMBEDDING_MODEL}
|
||||
- EMBEDDING_MODEL_SIZE=${EMBEDDING_MODEL_SIZE}
|
||||
- MODEL=${MODEL}
|
||||
- TRIGGER_PROJECT_ID=${TRIGGER_PROJECT_ID}
|
||||
- TRIGGER_SECRET_KEY=${TRIGGER_SECRET_KEY}
|
||||
|
||||
@ -0,0 +1,2 @@
|
||||
-- AlterTable
|
||||
ALTER TABLE "Workspace" ADD COLUMN "metadata" JSONB NOT NULL DEFAULT '{}';
|
||||
@ -694,6 +694,8 @@ model Workspace {
|
||||
slug String @unique
|
||||
icon String?
|
||||
|
||||
metadata Json @default("{}")
|
||||
|
||||
integrations String[]
|
||||
|
||||
userId String? @unique
|
||||
|
||||
2077
pnpm-lock.yaml
generated
2077
pnpm-lock.yaml
generated
File diff suppressed because it is too large
Load Diff
Loading…
x
Reference in New Issue
Block a user