Watson / Forensic AI Assistant

1. Overview & Goals

Watson is a professional-grade collaborative forensic investigation platform. It pairs human analysts with an AI assistant that works side by side — both feeding findings, IOCs, tasks, assets, and recommendations into a shared central investigation platform.

1.1 Core Principles

| Principle | Description |
| --- | --- |
| Collaborative | Human and AI analysts contribute equally; AI suggestions require human validation before becoming active |
| Modular | Every major component (LLM backend, MCP servers, vector DB) is pluggable and replaceable |
| Security-first | All API routes are authenticated and authorized; secrets managed via environment variables |
| Token-efficient | The AI never ingests raw artifacts wholesale; it uses targeted filtering, line counting, and truncation |
| Audit-ready | All entities carry full provenance metadata and soft-delete support |

2. Terminology & Glossary

| Term | Definition |
| --- | --- |
| Investigation | A scoped forensic case grouping all assets, findings, IOCs, tasks, recommendations, and analysts |
| Analyst | A human user assigned to one or more investigations |
| Finding | A discrete malicious or suspicious event identified during the investigation |
| IOC | Indicator of Compromise — a typed observable (hash, IP, domain, path, etc.) |
| Asset | A machine, account, or other entity subject to investigation |
| Recommendation | A mitigation or remediation action suggested for a finding |
| Task | An actionable item assigned to an analyst or the client |
| Hunt | A single autonomous investigation cycle performed by the Watson AI assistant |
| Timeline | A chronological view linking Findings, Assets, IOCs, and Recommendations |
| MCP Server | Model Context Protocol server — exposes tools that the AI assistant can invoke |
| DFIR MCP | The container that exposes forensic tools (Volatility, Plaso, etc.) via MCP |
| App MCP | The platform UI backend MCP interface consumed by the Watson AI assistant |
| Vector DB | Optional knowledge base storing security articles and investigation references |
| Summary | An AI-generated executive summary uploaded to the platform after each hunt |
| Creator Type | Enum distinguishing human vs. ai as the origin of an entity |
| Enabled | Boolean flag; AI-created entities default to false (require human validation) |


3. System Architecture

3.1 High-Level Component Diagram

┌──────────────────────────────────────────────────────────────────────┐
│                         Docker Network: watson-net                   │
│                                                                      │
│  ┌─────────────────────┐        ┌──────────────────────────────────┐ │
│  │   ui-frontend       │◄──────►│   ui-backend                     │ │
│  │   (VueJS / Node)    │        │   (Node / Express)               │ │
│  │   Port: 3000        │        │   Port: 4000                     │ │
│  │                     │        │   - REST API                     │ │
│  │                     │        │   - MCP-like App API             │ │
│  │                     │        │   - WebSocket (realtime)         │ │
│  └─────────────────────┘        └──────────────┬───────────────────┘ │
│                                                │                      │
│                              ┌─────────────────▼──────────────────┐  │
│                              │   MongoDB                           │  │
│                              │   Port: 27017                       │  │
│                              │   Mongoose ODM (ui-backend)         │  │
│                              └─────────────────────────────────────┘  │
│                                                                      │
│  ┌────────────────────────────────────────────────────────────────┐  │
│  │   adminjs (AdminJS + Express)                                  │  │
│  │   Port: 5000  — Dev & admin panel for all Mongoose models      │  │
│  └────────────────────────────────────────────────────────────────┘  │
│                                                                      │
│  ┌─────────────────────────────────────────────────────────────────┐ │
│  │   watson (AI Assistant — Python)                                │ │
│  └─────────────────────────────────────────────────────────────────┘ │
│                                                                      │
│  ┌─────────────────────┐        ┌──────────────────────────────────┐ │
│  │   dfir-mcp          │        │   vectordb (optional)            │ │
│  │   (Python/FastAPI)  │        │   (Qdrant)  + MCP wrapper        │ │
│  └─────────────────────┘        └──────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────┘

3.2 Container Summary

| Container | Image Base | Role |
| --- | --- | --- |
| ui-frontend | node:20-alpine | VueJS SPA served by Nginx |
| ui-backend | node:20-alpine | REST API + App MCP + WebSocket |
| mongodb | mongo:7 | Primary database (accessed via Mongoose ODM) |
| adminjs | node:20-alpine | Dev & admin panel for all Mongoose models |
| watson | python:3.12-slim | AI assistant orchestrator |
| dfir-mcp | ubuntu:22.04 + forensic tools | DFIR tool execution + MCP server |
| vectordb | qdrant/qdrant (or similar) | Vector knowledge store (optional) |
| vectordb-mcp | python:3.12-slim | MCP wrapper for the vector DB |

4. Data Models

Data models are defined as plain typed interfaces, fully decoupled from any database or persistence technology. They represent the canonical shape of each entity in the system.

4.1 BaseEntity

interface BaseEntity {
  _id: string;                       // UUID v4
  enabled: boolean;                  // AI-created entities default to false
  created: Date;
  updated: Date;
  deleted: Date | null;              // Soft delete
  creator: string;                   // User ID, "watson", or free string
  creator_type: "human" | "ai";
}
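
The validation flow implied by `enabled` and `creator_type` can be sketched as follows (an illustrative Python sketch, not platform code; the helper names are hypothetical):

```python
import uuid
from datetime import datetime, timezone

def new_entity(creator: str, creator_type: str) -> dict:
    """Build a BaseEntity skeleton. AI-created entities start disabled
    and become active only after human validation."""
    now = datetime.now(timezone.utc)
    return {
        "_id": str(uuid.uuid4()),
        "enabled": creator_type == "human",  # AI suggestions default to disabled
        "created": now,
        "updated": now,
        "deleted": None,   # soft delete: set a timestamp instead of removing
        "creator": creator,
        "creator_type": creator_type,
    }

def validate(entity: dict) -> dict:
    """Human validation of an AI suggestion flips it to enabled."""
    entity["enabled"] = True
    entity["updated"] = datetime.now(timezone.utc)
    return entity

suggestion = new_entity("watson", "ai")
assert suggestion["enabled"] is False
validate(suggestion)
assert suggestion["enabled"] is True
```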

4.2 User

interface User extends BaseEntity {
  username: string;                  // Unique
  email: string;                     // Unique
  password: string;                  // Hashed password
  role: string[];
  firstname: string;
  lastname: string;
  picture: string | null;
  last_login_at: Date | null;
}

4.3 Investigation

interface Investigation extends BaseEntity {
  title: string;
  description: string;
  status: "open" | "in_progress" | "closed" | "archived";
  severity: "critical" | "high" | "medium" | "low" | "informational";
  started: Date;
  closed: Date | null;
  assigned: string[];       // Array of User IDs
  default_system_prompt: string;     // Platform default, overridable
  custom_system_prompt: string | null; // Per-investigation override
  last_hunt_at: Date | null;
  last_summary_at: Date | null;
  last_summary_content: string | null; // AI executive summary (Markdown)
  tags: string[];
  metadata: Record<string, unknown>; // Extensible key-value bag
}

4.4 Asset

interface Asset extends BaseEntity {
  investigation: string; // Investigation id
  iocs: string[];                 // Associated IOCs
  findings: string[];             // Associated Findings
  type: "machine" | "account" | "network_share" | "service" | "email_address" | "other";
  name: string;                      // Hostname, username, IP, etc.
  description: string | null;
  properties: Record<string, string>; // OS version, domain, owner, etc.
  compromise_status: "unknown" | "suspected" | "confirmed" | "cleared";
  tags: string[];
}

4.5 IOC (Indicator of Compromise)

type IOCType =
  | "hash::sha1"
  | "hash::md5"
  | "hash::sha256"
  | "hash::sha512"
  | "ip::ipv4"
  | "ip::ipv6"
  | "domain::fqdn"
  | "domain::subdomain"
  | "url::full"
  | "file::path"
  | "file::name"
  | "file::extension"
  | "scheduled-task::name"
  | "registry::key"
  | "registry::value"
  | "string::email"
  | "string::useragent"
  | "string::string"
  | "certificate::thumbprint"
  | "mutex::name"
  | "pipe::name"
  | "service::name"
  | "process::name"
  | "network::port";

interface IOC extends BaseEntity {
  investigation: string;             // Investigation ID
  type: IOCType;
  value: string;                     // May contain wildcards (*) or be a regex if is_regex=true
  is_regex: boolean;
  description: string | null;
  confidence: "low" | "medium" | "high" | "confirmed";
  source: string | null;             // Origin (e.g., "analyst", "threat-intel", "watson")
  tlp: "white" | "green" | "amber" | "red"; // Traffic Light Protocol
  first_seen_at: Date | null;
  last_seen_at: Date | null;
  tags: string[];
  findings: string[];             // Findings this IOC was observed in
  assets: string[];               // Assets where this IOC was observed
  external_references: ExternalRef[];
}

interface ExternalRef {
  name: string;                      // e.g., "VirusTotal", "MISP"
  url: string | null;
  value: string | null;
}
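
How `value`, `is_regex`, and wildcards could interact when matching observables is sketched below (a hypothetical helper, assuming shell-style `*` semantics for plain values):

```python
import fnmatch
import re

def ioc_matches(ioc: dict, observed: str) -> bool:
    """Check an observed value against one IOC record.
    `value` may contain wildcards (*) or be a full regex when is_regex=True."""
    if ioc["is_regex"]:
        return re.search(ioc["value"], observed) is not None
    # fnmatch gives shell-style wildcard semantics for plain values;
    # both sides are lowercased for case-insensitive comparison
    return fnmatch.fnmatch(observed.lower(), ioc["value"].lower())

path_ioc = {"type": "file::path", "value": r"C:\Users\*\AppData\*\evil.exe", "is_regex": False}
ip_ioc = {"type": "ip::ipv4", "value": r"^10\.13\.37\.\d{1,3}$", "is_regex": True}

assert ioc_matches(path_ioc, r"C:\Users\bob\AppData\Roaming\evil.exe")
assert ioc_matches(ip_ioc, "10.13.37.42")
assert not ioc_matches(ip_ioc, "10.13.38.42")
```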

4.6 Finding

type FindingType =
  | "account_creation"
  | "account_modification"
  | "account_deletion"
  | "privilege_escalation"
  | "lateral_movement"
  | "persistence"
  | "defense_evasion"
  | "credential_access"
  | "discovery"
  | "collection"
  | "exfiltration"
  | "command_and_control"
  | "impact"
  | "execution"
  | "initial_access"
  | "file_creation"
  | "file_modification"
  | "file_deletion"
  | "process_creation"
  | "network_connection"
  | "registry_modification"
  | "scheduled_task"
  | "service_creation"
  | "log_tampering"
  | "suspicious_activity"
  | "other";

type FindingSeverity = "critical" | "high" | "medium" | "low" | "informational";

interface Finding extends BaseEntity {
  investigation: string;             // Investigation ID
  timestamp: Date;                   // When the event occurred
  utc_offset: string;                // e.g., "+02:00"
  location: string;                  // e.g., "DESKTOP-ABC123:/C/Windows/System32/winevt/..."
  source: string;                    // Source Asset the action originates from
  source_metadata: string;           // JSON metadata, e.g. {"port":"80","protocol":"XX"}
  destination: string;               // Destination Asset the action was performed on
  destination_metadata: string;      // JSON metadata, e.g. {"port":"80","protocol":"XX"}
  type: FindingType;
  severity: FindingSeverity;
  mitre_tactic: string | null;       // e.g., "TA0003"
  mitre_technique: string | null;    // e.g., "T1053.005"
  title: string;                     // Short one-liner title
  description: string;               // Human-readable description (one sentence to paragraph)
  content: string;                   // Raw artifact value, truncated at 500 chars
  content_truncated: boolean;        // Indicates if content was truncated
  iocs: string[];                 // Associated IOCs
  recommendations: string[];      // Associated Recommendations
  timeline_event_id: string | null;  // Link to timeline
  parent_finding_id: string | null;  // For correlated findings
  tags: string[];
  notes: string | null;      // Human annotation
}

Note: When an event concerns multiple assets, a separate Finding is created for each asset, referencing the same event with appropriate location/asset context.


4.7 Recommendation

type RecommendationStatus = "opened" | "in_progress" | "implemented" | "rejected" | "deferred";

interface Recommendation extends BaseEntity {
  investigation: string;
  findings: string[];                // Which findings this recommendation addresses
  title: string;
  description: string;
  priority: number;                  // 0 (most urgent) to 4 (least urgent)
  status: RecommendationStatus;
  effort: "low" | "medium" | "high";
  category: "prevention" | "detection" | "containment" | "eradication" | "recovery" | "hardening";
  completed: Date | null;
  tags: string[];
}

4.8 Task

type TaskType =
  | "collect_artifact"
  | "analyze_artifact"
  | "interview_user"
  | "contain_asset"
  | "disable_account"
  | "delete_account"
  | "block_ioc"
  | "patch_system"
  | "restore_system"
  | "notify_stakeholder"
  | "escalate"
  | "document"
  | "other";

type TaskStatus = "created" | "rejected" | "in_progress" | "completed" | "cancelled";

interface Task extends BaseEntity {
  investigation: string;
  type: TaskType;
  title: string;
  description: string;
  status: TaskStatus;
  assigned: string | null;        // User ID, free string (e.g., "client"), or null
  assigned_to_type: "user" | "client" | "external" | null;
  priority: "critical" | "high" | "medium" | "low";
  due_date: Date | null;
  completed: Date | null;
  comment: string | null;
  findings: string[];
  recommendations: string[];
  tags: string[];
}

4.9 Timeline Event

interface TimelineEvent extends BaseEntity {
  investigation: string;
  timestamp: Date;
  utc_offset: string;
  title: string;
  description: string | null;
  findings: string[];
  assets: string[];
  iocs: string[];
  recommendations: string[];
  type: FindingType | "milestone" | "analyst_note" | "ai_note";
  severity: FindingSeverity | null;
  color: string | null;              // Hex color override for visualization
  icon: string | null;               // Icon code override
}

4.10 Message (Analyst ↔ Watson Communication)

interface Message extends BaseEntity {
  investigation: string;
  sender: string;                 // User ID or "watson"
  sender_type: "human" | "ai";
  content: string;
  requires_response: boolean;        // Watson uses this for questions
  response_timeout_at: Date | null;  // Watson timeout deadline
  responded: Date | null;
  is_hunt_summary: boolean;          // True for post-hunt summaries
  hunt: string | null;            // ID of the specific hunt
  parent_message_id: string | null;
}

4.11 Hunt

type HuntStatus = "initializing" | "running" | "awaiting_analyst" | "summarizing" | "completed" | "failed";

interface Hunt extends BaseEntity {
  investigation: string;
  status: HuntStatus;
  started: Date;
  completed: Date | null;
  llm_model: string;                 // Snapshot of model used
  system_prompt_used: string;        // Snapshot of prompt used
  plan: string | null;               // AI-stated investigation plan
  memory_snapshot: string | null;    // Compressed hunt memory
  files_analyzed: string[];          // Paths of files analyzed in this hunt
  tool_calls_count: number;
  findings_suggested: number;
  iocs_suggested: number;
  assets_suggested: number;
  recommendations_suggested: number;
  tasks_suggested: number;
  tokens_used_input: number;
  tokens_used_output: number;
  summary_message_id: string | null; // Link to post-hunt Message
  error_log: string | null;
}

4.12 ArtifactType (Reference Table)

interface ArtifactType {
  id: string;
  name: string;                      // e.g., "Windows Event Log"
  category: string;                  // e.g., "logs", "memory", "filesystem", "network"
  description: string;
  default_path_pattern: string | null; // e.g., "C:/Windows/System32/winevt/Logs/*.evtx"
  parser_tool: string | null;        // Suggested DFIR tool
  os_compatibility: string[];        // ["windows", "linux", "macos"]
}

4.13 WatsonActivity (Realtime Feed)

type ActivityType = "tool_call" | "thinking" | "suggestion" | "question" | "summary" | "error" | "status";

interface WatsonActivity {
  id: string;
  investigation: string;
  hunt: string;
  timestamp: Date;
  type: ActivityType;
  message: string;                   // Human-readable description of what Watson is doing
  tool_name: string | null;
  tool_input_summary: string | null; // Truncated/sanitized view of tool input
  tool_output_summary: string | null;
  is_complete: boolean;
}

Broadcast via WebSocket to all clients subscribed to the investigation channel.

5. Platform UI — Features & API

5.1 Frontend (VueJS)

Framework: Vue 3 + Composition API + Pinia (state) + Vue Router + TailwindCSS + shadcn-vue

Pages & Views

Example of the main routes; additional routes may be added as needed.

| Route | View | Description |
| --- | --- | --- |
| / | Dashboard | All investigations overview, quick stats |
| /investigations | Investigation List | Table with filters, sorting, creation |
| /investigations/:id | Investigation Detail | Tabs: Overview, Timeline, Assets, IOCs, Findings, Recommendations, Tasks, Messages, Watson |
| /investigations/:id/timeline | Timeline View | Table and horizontal timeline modes |
| /investigations/:id/export | Export | Download CSV/ZIP/HTML report |
| /admin/users | User Management | Admin only |
| /admin/settings | Platform Settings | Global prompts, LLM config |
| /profile | User Profile | Self-management, API key generation |

Timeline Visualizer

Two display modes selectable by the user:

Table Mode:

  • Paginated, sortable table with columns: timestamp, severity icon, type icon, title, assets, IOCs count, creator badge
  • Color-coded severity rows
  • Inline filters: asset, type category, string search, date range, creator type (human/AI)
  • CSV export of current filter view
  • Add Finding button inline

Horizontal Timeline Mode:

  • Chronological axis (horizontal scroll if needed)
  • Rounded pill-shaped bars for each finding, color-coded by severity
  • Icons for finding type
  • Asset swimlanes (one row per asset)
  • Hover tooltip showing full details
  • Zoom controls (day / week / month / custom)
  • Click to open Finding detail modal
  • Same filter panel as table mode

Watson Realtime Panel

  • Live activity feed panel (WebSocket) showing what Watson is doing
  • Shows: current tool call, current thinking step, pending questions
  • Displays blinking indicator when Watson is actively running
  • Human can answer Watson questions inline from this panel

5.2 Backend API (Node/Express)

Base URL: /api/v1

Authentication: JWT Bearer token (Authorization: Bearer <token>) or API key header (X-API-Key: <key>)

Auth Routes

| Method | Route | Description |
| --- | --- | --- |
| POST | /auth/login | Returns JWT + refresh token |
| POST | /auth/refresh | Refresh JWT |
| POST | /auth/logout | Invalidate session |
| POST | /auth/apikey | Generate API key for current user |

Investigation Routes

| Method | Route | Description |
| --- | --- | --- |
| GET | /investigations | List all investigations for current user |
| POST | /investigations | Create investigation |
| GET | /investigations/:id | Get full investigation state |
| PATCH | /investigations/:id | Update investigation |
| DELETE | /investigations/:id | Soft-delete |
| GET | /investigations/:id/summary | Get latest AI summary |
| POST | /investigations/:id/export/csv | Export all data as ZIP of CSVs |
| POST | /investigations/:id/export/report | Generate HTML/Markdown report |
| GET | /investigations/:id/hunt/status | Current Watson hunt status |
| POST | /investigations/:id/hunt/trigger | Manually trigger a new hunt |

Asset / IOC / Finding / Recommendation / Task / Timeline Routes

For each entity {entity} in [assets, iocs, findings, recommendations, tasks, timeline]:

| Method | Route | Description |
| --- | --- | --- |
| GET | /investigations/:id/{entity} | List with filter/sort/page |
| POST | /investigations/:id/{entity} | Create |
| GET | /investigations/:id/{entity}/:eid | Get single |
| PATCH | /investigations/:id/{entity}/:eid | Update (human only for validated entities) |
| DELETE | /investigations/:id/{entity}/:eid | Soft delete |
| POST | /investigations/:id/{entity}/:eid/validate | Validate AI-created entity |
| POST | /investigations/:id/{entity}/:eid/reject | Reject AI-created entity |

Messages

| Method | Route | Description |
| --- | --- | --- |
| GET | /investigations/:id/messages | List messages |
| POST | /investigations/:id/messages | Post message (human) |
| POST | /investigations/:id/messages/:mid/respond | Respond to Watson question |

MCP-Compatible App API

These routes mirror the App MCP tool surface, usable by Watson AI with its API key.

| Method | Route | Description |
| --- | --- | --- |
| GET | /mcp/investigations/:id/state | Full current investigation state snapshot |
| GET | /mcp/investigations/:id/findings | Findings list (paginated, filterable) |
| GET | /mcp/investigations/:id/iocs | IOCs list |
| GET | /mcp/investigations/:id/assets | Assets list |
| GET | /mcp/investigations/:id/tasks | Tasks list |
| POST | /mcp/investigations/:id/findings | Suggest finding (sets enabled=false, creator_type="ai") |
| POST | /mcp/investigations/:id/iocs | Suggest IOC |
| POST | /mcp/investigations/:id/assets | Suggest asset |
| POST | /mcp/investigations/:id/recommendations | Suggest recommendation |
| POST | /mcp/investigations/:id/tasks | Suggest task |
| POST | /mcp/investigations/:id/summary | Upload hunt summary |
| POST | /mcp/investigations/:id/messages | Send message to analyst |
| GET | /mcp/investigations/:id/messages/pending | Get unanswered messages pending response |
| POST | /mcp/investigations/:id/hunt/:hid/activity | Push realtime activity event |

All MCP routes require the X-API-Key header and are rate-limited per key.
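
For illustration, a suggestion request to this API might be assembled as below (a stdlib-only sketch; the helper name is hypothetical, and the server is assumed to enforce enabled=false / creator_type="ai" regardless of what the client sends):

```python
import json

APP_MCP_BASE = "http://ui-backend:4000/mcp"  # from config.yaml

def build_suggestion(api_key: str, investigation_id: str, entity: str, body: dict) -> dict:
    """Assemble (but do not send) a suggestion request for the App MCP API.
    Setting enabled/creator_type client-side just makes the intent explicit;
    the platform forces these values server-side anyway."""
    return {
        "method": "POST",
        "url": f"{APP_MCP_BASE}/investigations/{investigation_id}/{entity}",
        "headers": {"X-API-Key": api_key, "Content-Type": "application/json"},
        "body": json.dumps({**body, "enabled": False, "creator_type": "ai", "creator": "watson"}),
    }

req = build_suggestion("secret", "inv-42", "findings",
                       {"title": "Suspicious scheduled task", "type": "scheduled_task", "severity": "high"})
assert req["url"] == "http://ui-backend:4000/mcp/investigations/inv-42/findings"
assert json.loads(req["body"])["enabled"] is False
```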

WebSocket

  • Endpoint: ws://ui-backend:4000/ws
  • Channels: investigation:{id} — emits WatsonActivity events, finding/IOC validation events, new message events

6. Watson AI Assistant

6.1 Configuration (config.yaml)

Example configuration file:

# Watson AI Assistant Configuration

# LLM Provider
llm:
  provider: "anthropic"              # anthropic | openai | ollama | custom
  model: "claude-opus-4"
  api_key_env: "LLM_API_KEY"        # Name of environment variable holding the key
  base_url: null                     # Override for Ollama or custom: "http://ollama:11434"
  temperature: 0.2
  max_tokens: 4096
  timeout_seconds: 120

# Platform App MCP
app_mcp:
  base_url: "http://ui-backend:4000/mcp"
  api_key_env: "APP_MCP_API_KEY"

# DFIR MCP
dfir_mcp:
  base_url: "http://dfir-mcp:8000"
  api_key_env: "DFIR_MCP_API_KEY"

# Vector DB MCP (optional)
vectordb_mcp:
  enabled: false
  base_url: "http://vectordb-mcp:7000"
  api_key_env: "VECTORDB_MCP_API_KEY"

# Additional MCPs (future extension)
additional_mcps: []
# - name: "malware-analysis-mcp"
#   base_url: "http://malware-mcp:9000"
#   api_key_env: "MALWARE_MCP_API_KEY"

# Artifact ingestion limits
artifact_limits:
  max_file_lines_before_grep: 10000  # Files larger than this require grep filtering
  max_grep_output_lines: 500
  max_content_chars: 3000            # Max chars passed to LLM per artifact chunk
  max_ioc_list_size: 100             # Max IOCs passed in a single context
  max_findings_context: 50           # Max findings passed in a single context

# Hunt behavior
hunt:
  analyst_response_timeout_seconds: 300  # 5 minutes before Watson continues without response
  memory_max_chars: 8000             # Max chars of hunt memory before compression
  max_tool_calls_per_hunt: 200
  enable_internet_check: false       # If false, Watson asks analyst for external lookups

# Logging
logging:
  level: "INFO"                      # DEBUG | INFO | WARNING | ERROR
  file: "/logs/watson.log"

6.2 LLM Adapter Layer

The LLM Adapter Layer is the component inside the Watson AI assistant that acts as a translator between Watson’s internal orchestration logic and any external language model provider. Its purpose is to ensure that the rest of Watson never needs to know which specific LLM it is talking to. From Watson’s perspective, it simply sends a prompt and receives a response — the complexity of dealing with each provider’s specific API format, authentication mechanism, SDK, and quirks is entirely hidden behind this layer.

Concretely, the adapter layer defines a common contract that every supported provider must fulfill. This contract covers the two fundamental operations Watson needs: sending a conversation — including the system prompt, the message history, and any available tool definitions — to the model and receiving back either a text response or a tool call instruction, and estimating token counts for a given piece of text so that Watson can manage its context window and enforce its configured limits before making a call.

Each provider is implemented as a self-contained adapter that satisfies this contract. The Anthropic adapter handles the specifics of the Anthropic Messages API, including its particular message format and how it expresses tool use. The OpenAI adapter does the same for the OpenAI Chat Completions API. The Ollama adapter communicates with a locally running Ollama instance over its REST API, enabling fully offline operation with open-source models. A generic custom adapter handles any additional OpenAI-compatible endpoint, which covers a broad range of self-hosted solutions such as vLLM, LM Studio, or any other server that exposes a compatible interface.

At startup, Watson reads the provider field from its configuration file, selects the corresponding adapter, initializes it with the appropriate credentials and parameters drawn from environment variables, and from that point forward all LLM communication flows exclusively through that adapter. Switching providers requires only a configuration change — no other part of Watson is affected.

The adapter layer also centralizes concerns that apply regardless of provider: enforcing the configured token budget before sending a request, applying the configured temperature and max token settings, handling transient API errors with retry logic, logging token usage back to the active Hunt record, and normalizing provider-specific error responses into a consistent internal error format that Watson’s orchestration layer can handle uniformly.

Concrete implementations:

  • AnthropicAdapter — uses anthropic SDK
  • OpenAIAdapter — uses openai SDK
  • OllamaAdapter — uses Ollama REST API
  • CustomAdapter — OpenAI-compatible base URL override

New adapters are added by subclassing LLMAdapter and registering in LLM_REGISTRY.
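
A minimal sketch of this contract and registry, using the `LLMAdapter` and `LLM_REGISTRY` names from above (the `echo` adapter is a stand-in for real SDK-backed adapters, and the 4-chars-per-token heuristic is an assumption):

```python
from abc import ABC, abstractmethod

class LLMAdapter(ABC):
    """Common contract every provider adapter must fulfil."""

    @abstractmethod
    def chat(self, system_prompt: str, messages: list[dict], tools: list[dict]) -> dict:
        """Send a conversation; return a text response or a tool call instruction."""

    def count_tokens(self, text: str) -> int:
        # crude default heuristic (~4 chars/token); real adapters use the provider tokenizer
        return max(1, len(text) // 4)

LLM_REGISTRY: dict[str, type[LLMAdapter]] = {}

def register(name: str):
    """Class decorator: make an adapter selectable via the `provider` config field."""
    def wrap(cls):
        LLM_REGISTRY[name] = cls
        return cls
    return wrap

@register("echo")
class EchoAdapter(LLMAdapter):
    """Stand-in adapter used here instead of real anthropic/openai SDK calls."""
    def chat(self, system_prompt, messages, tools):
        return {"type": "text", "content": messages[-1]["content"]}

# At startup, Watson looks up the configured provider and instantiates it once.
adapter = LLM_REGISTRY["echo"]()
reply = adapter.chat("You are Watson.", [{"role": "user", "content": "hello"}], [])
assert reply == {"type": "text", "content": "hello"}
```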

6.3 MCP Client Layer

The MCP Client Layer manages all communication between Watson and the external MCP servers it is connected to. Its role is symmetric to the LLM Adapter Layer: just as that layer abstracts away LLM provider differences, the MCP client layer abstracts away the specifics of each MCP server, giving Watson’s orchestration logic a single unified way to discover and invoke external tools regardless of their origin.

It is responsible for two things. First, tool discovery: at startup it connects to every configured MCP server, fetches each server’s tool manifest, and aggregates them into a single unified tool registry presented to the LLM at the beginning of each hunt. This is what allows the LLM to know what capabilities are available and reason about which tool to invoke at each investigation step.

Second, tool execution: when the LLM decides to call a tool, the client layer identifies which server owns it, formats and sends the request with the appropriate credentials, and returns the result to the orchestration logic in a normalized format. The orchestration layer remains completely unaware of which server was involved or how the call was handled.

Each MCP server has its own client instance held in a central registry. When a new server is added to the configuration, its client is initialized automatically and its tools are incorporated into the registry with no changes to Watson’s core logic — this is the mechanism that makes Watson’s toolset extensible through configuration alone.

Cross-cutting concerns handled uniformly across all servers include request timeouts, retry logic on transient failures, logging of every tool call to the active Hunt record, and truncation of oversized responses before they are passed back to the LLM.
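
The discovery-and-routing mechanism can be sketched as follows (stub clients stand in for real MCP connections; class and tool names are hypothetical):

```python
class MCPClient:
    """Minimal stand-in for one MCP server connection."""
    def __init__(self, name: str, tools: dict):
        self.name = name
        self._tools = tools  # tool name -> callable

    def list_tools(self) -> list[str]:
        return list(self._tools)

    def call(self, tool: str, **kwargs):
        return self._tools[tool](**kwargs)

class ToolRegistry:
    """Aggregates every server's manifest and routes each call to its owner,
    so the orchestration layer never knows which server handled it."""
    def __init__(self, clients: list[MCPClient]):
        self._owner = {}
        for client in clients:
            for tool in client.list_tools():
                self._owner[tool] = client

    def tools(self) -> list[str]:
        return sorted(self._owner)

    def invoke(self, tool: str, **kwargs):
        return self._owner[tool].call(tool, **kwargs)

dfir = MCPClient("dfir-mcp", {"grep_file": lambda path, pattern: f"grep {pattern} in {path}"})
app = MCPClient("app-mcp", {"suggest_finding": lambda title: f"suggested: {title}"})
registry = ToolRegistry([dfir, app])

assert registry.tools() == ["grep_file", "suggest_finding"]
assert registry.invoke("suggest_finding", title="Persistence via run key") == "suggested: Persistence via run key"
```

Adding a new MCP server in the configuration just appends one more client to the list passed to the registry; no orchestration code changes.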

6.4 Hunt Lifecycle

┌─────────────────────────────────────────────────────────────────────┐
│                         HUNT LIFECYCLE                              │
│                                                                     │
│  [1] Initialize Hunt                                                │
│       └─ Create Hunt record (status: initializing)                  │
│       └─ Load investigation state from App MCP                      │
│       └─ Load existing findings, IOCs, assets, tasks                │
│       └─ Compose system prompt (default + custom)                   │
│                                                                     │
│  [2] Plan Phase                                                     │
│       └─ AI states its investigation plan                           │
│       └─ Plan stored in Hunt.plan                                   │
│       └─ Activity broadcast: "Planning investigation..."            │
│                                                                     │
│  [3] Execution Phase (status: running)                              │
│       └─ Iterative tool call loop:                                  │
│           ├─ Call DFIR MCP tools (grep, evtx, mft, volatility...)   │
│           ├─ Respect artifact limits (grep first, truncate)         │
│           ├─ Track analyzed files (files_analyzed set)              │
│           ├─ Update hunt memory (compress when > max_chars)         │
│           ├─ Suggest findings/IOCs/assets/tasks via App MCP         │
│           ├─ Broadcast WatsonActivity after each tool call          │
│           └─ Check max_tool_calls_per_hunt limit                    │
│                                                                     │
│  [4] Question Phase (optional, status: awaiting_analyst)            │
│       └─ Post question message via App MCP                          │
│       └─ Start timeout timer                                        │
│       └─ If response received: incorporate into context, resume     │
│       └─ If timeout: log "no response", resume without answer       │
│                                                                     │
│  [5] Summary Phase (status: summarizing)                            │
│       └─ Gather all hunt findings (human + AI)                      │
│       └─ Generate executive summary (Markdown)                      │
│       └─ Upload summary via App MCP /summary route                  │
│       └─ Send hunt summary message to analyst                       │
│       └─ Mark Hunt as completed                                     │
│                                                                     │
│  [6] Idle — Wait for analyst to add new elements                    │
│       └─ Triggered again by platform when new assets/findings added │
└─────────────────────────────────────────────────────────────────────┘

Phase 1 — Initialize Hunt

Watson creates a Hunt record and takes an immutable snapshot of the LLM model and system prompt being used, ensuring the hunt is fully reproducible and auditable. It then queries the App MCP to load the full current investigation state: existing findings, IOCs, assets, tasks, and the list of files already analyzed in previous hunts. This prevents Watson from duplicating prior work and allows it to build on what is already known.

The system prompt is then assembled by merging the platform default prompt with any investigation-level custom prompt. The default provides Watson’s role, behavioral rules, and entity definitions. The custom prompt adds case-specific context from the lead analyst. Together they form the complete instruction set for the hunt.

Phase 2 — Plan Phase

Before touching any artifact, Watson reads the loaded investigation context and articulates its intended approach in plain language: where it will start, which artifacts it considers most relevant, and what hypotheses it wants to test. This plan is stored in the Hunt record and broadcast to the activity panel so analysts can review Watson’s reasoning before it acts and intervene if needed. It also serves as an auditable record of Watson’s intent for post-hunt review.

Phase 3 — Execution Phase

Watson enters an iterative tool call loop, repeatedly reasoning about what to investigate next, invoking a tool via the MCP client layer, processing the result, and deciding the next step. Several rules are enforced throughout:

  • File tracking: Watson skips any file already analyzed for the same purpose in this hunt, avoiding redundant work and unnecessary token consumption.
  • Artifact size limits: Files exceeding the configured line threshold must be queried with a targeted grep pattern rather than read in full. All tool outputs are truncated before being passed to the LLM.
  • Suggestions: As Watson discovers evidence it submits findings, IOCs, assets, recommendations, and tasks to the platform via the App MCP. All submissions are created in a disabled state pending human validation and appear in real time in the activity panel.
  • Hunt memory: After each tool call Watson updates a rolling compressed memory of its conclusions, confirmed or ruled-out hypotheses, and remaining threads. When memory exceeds its size limit it is compressed into a dense summary. This memory is included in every subsequent LLM prompt to maintain coherent reasoning across a long sequence of tool calls.

The loop continues until Watson judges the hunt complete or the maximum tool call limit is reached.
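The execution loop described above can be sketched in plain Python. All helper callables here (`call_llm`, `invoke_mcp_tool`) are hypothetical stand-ins for Watson's real orchestration and MCP client layers; the shape of the LLM decision dict is an assumption.

```python
# Sketch of the Phase 3 execution loop: reason -> tool call -> process,
# with file tracking and rolling memory, until done or capped.

def run_execution_phase(call_llm, invoke_mcp_tool, max_tool_calls=50):
    """Iterate until Watson judges the hunt complete or the cap is hit."""
    files_analyzed = set()   # file tracking: (path, purpose) pairs
    memory = []              # rolling hunt memory (compression omitted here)
    for step in range(max_tool_calls):
        decision = call_llm(memory)              # LLM picks the next action
        if decision["action"] == "finish":       # Watson judges hunt complete
            return {"steps": step, "reason": "complete", "memory": memory}
        key = (decision["args"].get("path"), decision.get("purpose"))
        if key in files_analyzed:                # skip redundant analysis
            memory.append(f"skipped duplicate analysis of {key[0]}")
            continue
        files_analyzed.add(key)
        result = invoke_mcp_tool(decision["tool"], decision["args"])
        memory.append(result[:500])              # truncated before reuse
    return {"steps": max_tool_calls, "reason": "max_tool_calls", "memory": memory}
```

Suggestion submission and activity broadcasting would hang off the same loop body; they are omitted to keep the control flow visible.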

Phase 4 — Question Phase (Optional)

!! Ask the analyst a question only when it is strictly required to continue the hunt.

At any point during execution, Watson may determine it needs information it cannot obtain from artifacts or tools alone — an uncollected artifact, an external threat intelligence lookup, or a clarification about the environment. In these cases Watson posts a focused question to the investigation message thread via the App MCP and transitions to awaiting_analyst status.

If the analyst responds before the configured timeout, Watson incorporates the answer and resumes. If no response arrives, Watson logs the absence, makes reasonable assumptions, and continues without it. The open question is flagged in the final summary. This mechanism ensures a hunt is never permanently blocked by an unavailable analyst.
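The question/timeout behavior can be sketched as follows. `post_question` and `poll_for_answer` are hypothetical stand-ins for App MCP calls, and the real implementation would sleep between polls rather than busy-count.

```python
# Sketch of Phase 4: ask, wait up to the timeout, then proceed with a
# documented assumption so the hunt is never permanently blocked.

def ask_analyst(post_question, poll_for_answer, question,
                timeout_s=300, poll_s=5):
    post_question(question)                  # goes to the message thread
    waited = 0.0
    while waited < timeout_s:
        answer = poll_for_answer()
        if answer is not None:               # analyst responded in time
            return {"answered": True, "answer": answer, "open": False}
        waited += poll_s                     # (time.sleep omitted in sketch)
    # No response: log, assume, continue; flagged as open in the summary
    return {"answered": False, "answer": None, "open": True}
```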

Phase 5 — Summary Phase

Once execution is complete, Watson gathers all findings from the hunt — both its own suggestions and those created by human analysts — and generates a concise executive summary in Markdown covering what was investigated, what was found, and what remains uncertain. The summary is uploaded to the platform via the App MCP and posted as a message in the investigation thread. The Hunt record is then marked as completed with final token usage and suggestion counts recorded.

Phase 6 — Idle

Watson returns to an idle state, waiting for analysts to enrich the investigation with new elements — additional assets, new findings from manual analysis, or updated context. When significant new elements are added the platform can trigger a new hunt, at which point the cycle restarts from phase 1 with the enriched investigation state as its foundation.

6.5 Artifact Ingestion Strategy

Watson never reads artifact files wholesale. Every file access follows the same disciplined strategy to keep LLM token consumption under control.

Before reading any file, Watson first checks its line count. If the file is below the configured threshold it is read in full. If it exceeds the threshold, Watson must supply a targeted grep pattern to extract only the relevant lines — reading the entire file is not permitted. If no pattern is available at that point, Watson skips the file and formulates a pattern before attempting it again. Regardless of how the content was obtained — full read or grep — the result is always truncated to the configured character limit before being passed to the LLM.

The three configuration values that govern this behavior are max_file_lines_before_grep (the line count above which grep is required), max_grep_output_lines (the maximum number of matching lines returned), and max_content_chars (the hard character cap applied to all output before it reaches the LLM). All three are adjustable in config.yaml.
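The gate described above might look like the following sketch, with `count_lines`, `read_full`, and `run_grep` as hypothetical wrappers over DFIR MCP tools and defaults standing in for the config.yaml values.

```python
# Sketch of the artifact ingestion gate: full read below the line
# threshold, targeted grep above it, hard character cap in all cases.

def ingest_artifact(path, count_lines, read_full, run_grep, pattern=None,
                    max_file_lines_before_grep=2000,
                    max_grep_output_lines=200,
                    max_content_chars=8000):
    if count_lines(path) <= max_file_lines_before_grep:
        content = read_full(path)            # small file: full read allowed
    elif pattern is None:
        return None                          # skip until a pattern is formulated
    else:
        lines = run_grep(path, pattern)[:max_grep_output_lines]
        content = "\n".join(lines)
    return content[:max_content_chars]       # hard cap before the LLM
```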

6.6 Hunt Memory

Watson maintains a rolling memory throughout a hunt to preserve continuity across what can be a long sequence of tool calls. This memory accumulates key findings, identified IOCs, open questions, analyst context, and the list of files already analyzed.

As new information is added, the memory grows. When it exceeds its configured size limit, Watson asks the LLM to compress its current contents into a dense summary, discarding verbose detail while retaining essential conclusions. This keeps the memory footprint stable over time and ensures it never consumes an excessive portion of the LLM context window.

At each tool call step, the current memory is injected into the LLM prompt, giving Watson a coherent picture of where the investigation stands without having to replay the entire history of the hunt.
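A minimal sketch of this mechanism, with `compress_fn` standing in for the real "summarize this memory" LLM call:

```python
# Sketch of rolling hunt memory with size-triggered compression.

class HuntMemory:
    def __init__(self, compress_fn, max_chars=4000):
        self.compress_fn = compress_fn
        self.max_chars = max_chars
        self.entries = []

    def add(self, note):
        self.entries.append(note)
        if sum(len(e) for e in self.entries) > self.max_chars:
            # Over budget: collapse everything into one dense summary
            self.entries = [self.compress_fn("\n".join(self.entries))]

    def render(self):
        """Injected into every subsequent LLM prompt."""
        return "\n".join(self.entries)
```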

6.7 System Prompt Structure

[ROLE]
You are Watson, an expert forensic AI assistant working alongside human analysts on a DFIR 
(Digital Forensics and Incident Response) investigation. You are methodical, precise, and 
evidence-driven. You never speculate beyond what the artifacts support. You prioritize 
high-value findings over exhaustive coverage.

[PLATFORM CONTEXT]
You have access to the following MCP servers:
- App MCP: the investigation platform. Use it to read the current investigation state 
  (findings, IOCs, assets, tasks) and to submit your suggestions.
- DFIR MCP: the forensic artifact store. Use it to run tools against collected artifacts 
  located under /data. Always explore /data structure first if you are unsure what is available.
- Vector DB MCP (if available): a knowledge base of threat intelligence articles and 
  investigation references. Query it when you need context on a TTP, malware family, 
  or threat actor before drawing conclusions.

[ENTITY DEFINITIONS]
These are the entities you will read and suggest. Understand them precisely.

Finding — a discrete malicious or suspicious event identified in an artifact.
  - timestamp: when the event occurred (from the artifact, not the current time)
  - utc_offset: UTC offset of the source system
  - location: full artifact path where the event was observed (e.g. DESKTOP-X:/C/Windows/System32/winevt/Security.evtx)
  - type: category of the event (e.g. account_creation, lateral_movement, persistence)
  - severity: critical | high | medium | low | informational
  - mitre_tactic / mitre_technique: populate whenever mappable (e.g. TA0003 / T1053.005)
  - title: one clear sentence summarizing the event
  - description: concise explanation of why this event is significant
  - content: raw artifact value, max 500 characters. Truncate if longer.
  - asset_ids: affected assets. One finding per asset — create separate findings if multiple assets are involved.
  - ioc_ids: IOCs observed in this finding
  - recommendation_ids: recommendations that address this finding

IOC — a typed observable associated with malicious activity.
  - Supported types: hash::sha256, hash::md5, hash::sha1, ip::ipv4, ip::ipv6, domain::fqdn,
    url::full, file::path, file::name, registry::key, registry::value, scheduled-task::name,
    service::name, process::name, mutex::name, string::email, certificate::thumbprint, and others.
  - value: the indicator value. Wildcards and regex are supported.
  - confidence: low | medium | high | confirmed
  - Always link IOCs to the findings and assets where they were observed.

Asset — a machine, account, or entity involved in the incident.
  - type: machine | account | network_share | service | email_address | other
  - compromise_status: unknown | suspected | confirmed | cleared
  - Update compromise status by suggesting a new asset with the revised status rather than 
    modifying an existing one.

Recommendation — a mitigation or remediation action addressing one or more findings.
  - Be specific and actionable. Vague recommendations have no value.
  - Link to the findings that motivated the recommendation.

Task — a concrete action to be performed by an analyst or the client.
  - Use tasks for actions that require human intervention: artifact collection, account 
    disabling, system isolation, stakeholder notification, etc.
  - Be specific about what needs to be done, by whom, and why.

[INVESTIGATION METHODOLOGY]
Follow this order of priority when deciding what to investigate:

1. Start with what is already known. Read all existing findings, IOCs, and assets from the 
   App MCP before touching any artifact. Build on prior work — do not rediscover what is 
   already documented.
2. Prioritize high-value artifacts first: event logs (Security, System, PowerShell), 
   prefetch, MFT, registry hives, scheduled tasks, services, and network connections.
3. Form a hypothesis before each tool call. Know what you are looking for and why before 
   you run a command. Do not explore blindly.
4. Follow the evidence. Each finding should inform the next step. Pivot from a suspicious 
   process to its network connections, from a file creation to its hash, from a logon event 
   to lateral movement indicators.
5. Map everything to MITRE ATT&CK where possible. This adds structure and helps analysts 
   understand the adversary's playbook.
6. When in doubt about a TTP or malware family, query the Vector DB MCP before concluding.

[ARTIFACT INGESTION RULES]
These rules are mandatory and must never be bypassed.

- Before reading any file, check its line count.
- Files below the threshold may be read in full.
- Files above the threshold must be queried with a targeted grep pattern. Never read a 
  large file in full.
- If you do not yet have a pattern for a large file, skip it and return to it once you 
  have formulated one.
- All tool output is truncated to the configured character limit before you process it.
- Never analyze the same file for the same purpose twice in a single hunt. Check your 
  memory before repeating a tool call.

[SUGGESTION RULES]
- Submit suggestions via the App MCP as you discover evidence — do not batch everything 
  at the end.
- All suggestions are created as disabled and require human validation before becoming active.
- Never modify existing human-created entities. If you want to refine one, create a new 
  suggestion referencing the original.
- Every finding must have a location, a timestamp, and at least one associated asset.
- Every IOC must have a type, a value, and a confidence level.
- Populate MITRE fields whenever you can map the finding to a technique.
- If an event involves multiple assets, create one finding per asset.

[COMMUNICATION]
- If you need information you cannot obtain from artifacts or tools — an uncollected 
  artifact, an external lookup, or an environmental clarification — post a focused and 
  specific question to the analyst via the App MCP message thread.
- Ask one question at a time. Do not bundle multiple questions into a single message.
- If no response is received within {timeout} seconds, log the absence, make a reasonable 
  documented assumption, and continue.
- If you need an external lookup (VirusTotal, MISP, Shodan, etc.) and internet access is 
  unavailable, ask the analyst to perform it and provide you with the result.

[EXECUTIVE SUMMARY]
At the end of the hunt, generate a concise executive summary in Markdown covering:
- What was investigated and which artifacts were examined
- Key findings, grouped by attack phase or MITRE tactic where possible
- Identified IOCs and compromised assets
- Confidence level in the overall assessment
- What remains uncertain or requires further investigation
- Recommended immediate actions

Upload the summary via the App MCP and post it as a message to the investigation thread.

[MEMORY]
The following is your current hunt memory. It contains your conclusions so far, open 
questions, and the list of files already analyzed. Treat it as ground truth for what 
you have already done.

{memory}

[INVESTIGATION CONTEXT]
The following context has been provided by the lead analyst for this specific investigation. 
It takes priority over your default methodology where they conflict.

{custom_system_prompt}

7. DFIR MCP Server

7.1 Overview

The dfir-mcp container is an Ubuntu 22.04 image with forensic tools pre-installed, exposing them as MCP-compliant HTTP endpoints.

Framework: Python + FastAPI
Authentication: X-API-Key header (shared secret, environment variable)
Data Volume: /data — investigation artifacts mounted here

7.2 Installed Forensic Tools

| Tool | Purpose |
| --- | --- |
| grep / zgrep | Pattern search in text / gzip files |
| zcat / cat | Read files |
| wc | Line/word/byte counts |
| find | File system traversal |
| strings / bstrings | Extract printable strings from binaries |
| Volatility 2 | Legacy Windows memory analysis |
| Volatility 3 | Modern cross-platform memory analysis |
| The Sleuth Kit (fls, icat, mmls) | Filesystem forensics |
| MFT Parser (mftparser / analyzeMFT) | NTFS MFT parsing |
| EVTX Parser (python-evtx / evtxtract) | Windows Event Log parsing |
| Plaso (log2timeline / psort) | Super timeline analysis |
| Autopsy / tsk_recover | File recovery |
| PhotoRec | File carving |
| MemprocFS | Memory file system |
| MobSF | Mobile forensics |
| Sherloq | Image forensics |
| YARA | Pattern/rule matching |
| ExifTool | Metadata extraction |
| binwalk | Binary analysis / firmware extraction |
| foremost | File carving |
| bulk_extractor | Bulk data extraction (emails, URLs, hashes) |
| hashdeep | File hashing and auditing |
| ssdeep | Fuzzy hashing |
| jq | JSON processing |
| python3 | Scripting and custom parsers |

7.3 MCP Tool Definitions

Every forensic tool available in the DFIR MCP container is exposed as a named MCP tool with a structured definition that serves two purposes: it tells the MCP client how to validate and route calls to that tool, and it tells the LLM what the tool does, when to use it, and what parameters it expects.

Each tool definition consists of a unique name, a plain-language description written specifically for the LLM to understand the tool’s purpose and appropriate use cases, and an input schema that defines the accepted parameters, their types, and which are required. All tool definitions are aggregated into the server’s tool manifest, which Watson fetches at startup and uses throughout the hunt to reason about what capabilities are available.

The quality of the description field is critical — it directly determines how well Watson decides when and how to invoke each tool. A poorly described tool will be underused or misused regardless of how well it is implemented.
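An illustrative tool definition in the shape described above. The exact manifest schema is an assumption; the field names follow the common MCP convention of `name` / `description` / `inputSchema`, and the YARA tool shown is hypothetical.

```python
# Example tool definition: routing/validation metadata for the MCP client,
# plus an LLM-facing description of purpose and appropriate use.

YARA_SCAN_TOOL = {
    "name": "yara_scan",
    "description": (
        "Scan a file or directory under /data with a YARA rule file. "
        "Use this when you suspect a known malware family or want to "
        "sweep artifacts for a specific byte or string pattern."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "target_path": {"type": "string",
                            "description": "Path under /data to scan"},
            "rule_path": {"type": "string",
                          "description": "Path to the .yar rule file"},
            "recursive": {"type": "boolean",
                          "description": "Recurse into directories"},
        },
        "required": ["target_path", "rule_path"],
    },
}
```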

7.4 API Structure

GET  /health                  Health check
GET  /tools                   MCP tool manifest
POST /tools/{tool_name}       Execute a tool
GET  /data/ls?path=...        Browse /data filesystem
GET  /data/info?path=...      File metadata (size, mtime, hash)

All routes require X-API-Key.
All command executions are sandboxed: no write access to /data by default, configurable allowed-write paths.
Execution timeout configurable per tool.
Output always truncated to configured limits.
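A client-side sketch of calling this API, assuming only what the route table above states (POST /tools/{tool_name}, X-API-Key header). The base URL and key are placeholders, and the request is built but not sent.

```python
import json
import urllib.request

def build_tool_call(base_url, api_key, tool_name, arguments):
    """Prepare an authenticated POST /tools/{tool_name} request."""
    return urllib.request.Request(
        url=f"{base_url}/tools/{tool_name}",
        data=json.dumps(arguments).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    # The caller would urlopen(...) this and parse the JSON response.
```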

8. Vector Knowledge Base

8.1 Overview

An optional container running a vector database — Qdrant by default, replaceable with Chroma, Weaviate, or any equivalent — stores security articles, threat reports, and investigation playbooks as vector embeddings. A lightweight Python MCP wrapper exposes this knowledge base to Watson as two tools: a semantic search tool for querying relevant content by topic, and an ingestion tool for adding new documents to the knowledge base.

Watson checks config.vectordb_mcp.enabled before attempting any call and operates normally when the knowledge base is disabled or unreachable. When it is available, Watson uses it to enrich its reasoning, querying for context on a TTP, malware family, or threat actor before drawing conclusions from artifact evidence alone.
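The enabled-check guard might be sketched as below. The config shape and the `knowledge_search` tool name come from the document; the client object and its method signature are hypothetical.

```python
# Sketch of guarded knowledge enrichment: query only when enabled,
# degrade gracefully so a missing vector DB never blocks a hunt.

def enrich_with_knowledge(config, vectordb_client, query, top_k=5):
    if not config.get("vectordb_mcp", {}).get("enabled", False):
        return []                        # proceed without enrichment
    try:
        return vectordb_client.knowledge_search(query, top_k=top_k)
    except ConnectionError:
        return []                        # unreachable: continue without it
```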

9. Security Controls

9.1 Authentication & Authorization

| Control | Implementation |
| --- | --- |
| API Authentication | JWT (RS256) for users; API keys (HMAC-SHA256 hashed in DB) for Watson/MCP |
| Token Expiry | Access token: 15 min; refresh token: 7 days |
| Role-Based Access Control | admin, analyst, viewer roles enforced on all routes |
| Investigation Isolation | Analysts can only access investigations they are assigned to |
| AI Segregation | Watson API key scoped to MCP routes only; cannot call admin or user management routes |

9.2 Input Validation & Injection Prevention

| Control | Implementation |
| --- | --- |
| Request Validation | Zod (backend) and Yup/Valibot (frontend) schema validation on all inputs |
| SQL Injection | Parameterized queries via Prisma ORM |
| Command Injection (DFIR MCP) | All tool arguments are strictly validated and shell-escaped; no raw string interpolation into shell commands; tools run via subprocess with argument lists, not shell strings |
| Path Traversal (DFIR MCP) | All paths validated to be within /data; symlink resolution checked |
| XSS | Vue.js template escaping; Content Security Policy headers |
| SSRF | DFIR MCP only allows tool calls; no arbitrary URL fetch |
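The command injection and path traversal controls can be sketched together. This is a simplified illustration, not the server's actual code; tool allowlisting and per-tool timeouts are reduced to parameters.

```python
import pathlib
import subprocess

DATA_ROOT = pathlib.Path("/data")

def validate_data_path(user_path):
    """Resolve symlinks and reject anything escaping /data."""
    resolved = pathlib.Path(user_path).resolve()
    if resolved != DATA_ROOT and DATA_ROOT not in resolved.parents:
        raise ValueError(f"path escapes /data: {user_path}")
    return resolved

def run_tool(binary, args, timeout=60):
    """Execute with an argument list so shell metacharacters are inert."""
    return subprocess.run([binary, *args], capture_output=True,
                          text=True, timeout=timeout)
```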

9.3 Transport Security

| Control | Implementation |
| --- | --- |
| TLS | All inter-container and external communication uses TLS in production (via Nginx reverse proxy with Let’s Encrypt or provided certificates) |
| HSTS | Strict-Transport-Security header enforced |
| CORS | Strict origin allowlist; no wildcard * in production |

9.4 Secrets Management

| Control | Implementation |
| --- | --- |
| Environment Variables | All secrets (API keys, DB password, JWT secret) passed via environment variables; never hardcoded |
| .env files | .env files explicitly in .gitignore; .env.example provided |
| DB Password | Strong random password generated at deploy time |
| Watson API Key | Separate key from human user keys; rotatable independently |

9.6 Audit Logging

All API requests and MCP tool calls are logged with:

  • Timestamp
  • Actor (user ID or API key hash)
  • Action
  • Resource ID
  • IP address
  • Response status

Logs are written to a dedicated audit log file and optionally to a syslog endpoint.

9.7 Data Integrity

  • All AI-created entities require explicit human validation before becoming enabled=true
  • Entities are soft-deleted (never hard-deleted unless explicitly purged by admin)
  • All updates preserve updated_at; a change history table is maintained for Findings

10. Processing Flows

10.1 Human Analyst Creates a Finding

Analyst ──POST /api/v1/investigations/:id/findings──► ui-backend
  └─ Validate schema (Zod)
  └─ Set creator_type = "human", enabled = true
  └─ Persist to PostgreSQL
  └─ Emit WebSocket event to channel investigation:{id}
  └─ Return 201 + created finding

10.2 Watson Suggests a Finding

Watson ──POST /mcp/investigations/:id/findings──► ui-backend
  └─ Validate API key → scope check (mcp only)
  └─ Set creator_type = "ai", enabled = false
  └─ Persist to PostgreSQL
  └─ Emit WebSocket event: "Watson suggested a new finding"
  └─ Analysts see yellow badge in timeline awaiting validation

10.3 Analyst Validates a Watson Finding

Analyst ──POST /api/v1/investigations/:id/findings/:fid/validate──► ui-backend
  └─ Set enabled = true
  └─ Update updated_at
  └─ Emit WebSocket event

10.4 Watson Hunt Execution

[Trigger] (manual POST or auto on new assets/findings)
  │
  ▼
Watson initializes Hunt record (status: initializing)
  │
  ▼
Fetch investigation state via App MCP
  │
  ▼
Formulate investigation plan → broadcast activity
  │
  ▼
Loop: LLM reasons → selects tool → calls MCP tool
  ├─ Update files_analyzed set
  ├─ Update hunt memory (compress if needed)
  ├─ Broadcast WatsonActivity
  └─ If suggestion → POST to App MCP (/findings, /iocs, etc.)
  │
  ▼
Question (if needed) → post Message → start timeout
  ├─ If response received: incorporate into context
  └─ If timeout: continue
  │
  ▼
Max tool calls reached OR LLM decides hunt is complete
  │
  ▼
Generate executive summary (all findings human+AI)
  │
  ▼
POST summary to /mcp/investigations/:id/summary
POST summary message to /mcp/investigations/:id/messages
  │
  ▼
Mark Hunt as completed

10.5 Timeline Construction

GET /api/v1/investigations/:id/timeline
  └─ Query findings with filters (asset, type, date, search, enabled)
  └─ Enrich with asset names and IOC counts
  └─ Sort by timestamp
  └─ Return paginated results

Frontend renders:
  ├─ Table mode: paginated rows with icons, colors, filter panel
  └─ Horizontal mode: swimlane bars by asset, chronological axis

11. Export & Reporting

11.1 CSV/ZIP Export

POST /api/v1/investigations/:id/export/csv

Returns a ZIP archive containing:

investigation_{id}_{date}/
  ├── investigation_summary.csv
  ├── findings.csv
  ├── iocs.csv
  ├── assets.csv
  ├── recommendations.csv
  ├── tasks.csv
  ├── timeline.csv
  └── messages.csv

11.2 HTML/Markdown Report Export

POST /api/v1/investigations/:id/export/report

Returns a self-contained HTML page or Markdown file with:

# [Investigation Title]

**Analysts:** [list]
**Start Date:** [date]
**Report Date:** [current date]
**Status:** [status]

---

## Executive Summary
[Last AI hunt summary — Markdown]

---

## Timeline of Findings
[Table of findings sorted by timestamp with severity, type, title, asset, IOCs]

---

## Recommendations
[Table: priority, title, description, status, assigned to]

---

## Tasks
[Table: type, title, assigned to, status, due date]

---

## Indicators of Compromise
[Table: type, value, confidence, TLP, source, first seen]

---

## Annexes

### A. Analyst Notes & Messages
[Hunt summaries and analyst thread]

### B. Assets
[Table: type, name, compromise status, properties]

12. Docker Compose — Dev & Production

Two Docker Compose files are provided: one for development and one for production. Both define the same set of services connected on a shared internal network, but differ in how they are configured and exposed.

In development, all service ports are published to the host to allow direct access from local tools such as MongoDB Compass, API clients, or browser devtools. Hot reload is enabled for the frontend and backend. AdminJS is enabled. Log levels are set to verbose. The DFIR data volume is mounted with read-write access.

In production, no service ports are exposed directly except through the Nginx reverse proxy, which handles TLS termination. AdminJS is disabled by default. Log levels are reduced. The DFIR data volume is mounted read-only. Resource limits are applied to each container. All secrets are injected exclusively via environment variables — never hardcoded.

The optional vector database services are gated behind a Docker Compose profile and are only started when explicitly activated, keeping the default deployment footprint minimal.

A .env.example file documents every required environment variable with placeholder values. A .env file — never committed to version control — is used to supply actual secrets and environment-specific parameters at runtime.

12.3 .env.example

An example environment file:

# MongoDB
MONGO_INITDB_ROOT_USERNAME=watson_admin
MONGO_INITDB_ROOT_PASSWORD=CHANGE_ME_STRONG_MONGO_PASSWORD
MONGO_INITDB_DATABASE=watson_db
MONGODB_URI=mongodb://watson_admin:CHANGE_ME_STRONG_MONGO_PASSWORD@mongodb:27017/watson_db?authSource=admin

# JWT
JWT_SECRET=CHANGE_ME_JWT_SECRET_256BIT
JWT_REFRESH_SECRET=CHANGE_ME_JWT_REFRESH_SECRET_256BIT

# LLM
LLM_PROVIDER=anthropic        # anthropic | openai | ollama | custom
LLM_API_KEY=sk-ant-...
LLM_MODEL=claude-opus-4
LLM_BASE_URL=                  # Leave empty unless using Ollama or custom

# MCP API Keys
APP_MCP_API_KEY=CHANGE_ME_APP_MCP_KEY
DFIR_MCP_API_KEY=CHANGE_ME_DFIR_MCP_KEY

# Vector DB (optional)
VECTORDB_MCP_ENABLED=false
VECTORDB_MCP_API_KEY=CHANGE_ME_VECTORDB_KEY

# Frontend URLs (production)
FRONTEND_API_URL=https://watson.yourdomain.com/api/v1
FRONTEND_WS_URL=wss://watson.yourdomain.com/ws

# DFIR MCP limits
DFIR_MAX_TIMEOUT=60
DFIR_MAX_CONCURRENT=4

13. Future Evolution & Extension Points

The table below provides a quick reference, followed by a detailed description of each evolution point.

Quick Reference

| Area | Extension Point |
| --- | --- |
| New LLM Providers | Subclass LLMAdapter, register in LLM_REGISTRY, add provider value to config |
| New MCP Servers | Add entry to additional_mcps in config.yaml; MCPClient auto-registers tools |
| New IOC Types | Extend IOCType enum in the IOC model |
| New Finding Types | Extend FindingType enum in the Finding model |
| New Forensic Tools | Add tool endpoint to DFIR MCP; auto-appears in tool manifest |
| New Task Types | Extend TaskType enum in the Task model |
| MITRE ATT&CK Enrichment | Cross-reference findings with ATT&CK via vector DB or local JSON |
| Threat Intel Feed | Ingest MISP/OpenCTI into vector DB; available to Watson via knowledge_search |
| Multi-tenancy | Add organizationId to User and Investigation; enforce in middleware |
| Notification System | Webhook/email/Slack on finding validation or hunt completion |
| Watson Feedback Loop | Analyst rates Watson suggestions; feedback stored for prompt tuning |
| Collaborative Editing | Real-time concurrent editing of findings and entities |

13.1 New LLM Providers

Watson’s AI layer is built around a provider-agnostic adapter interface. Each LLM provider is a self-contained adapter that implements a common contract: send a prompt, receive a response, and optionally count tokens. The active provider is selected at runtime from the configuration file.

Adding a new provider requires creating a new adapter class that implements the base interface, handling authentication, request formatting, response parsing, and error normalization specific to that provider. The new adapter is then registered in the central LLM_REGISTRY under a unique provider name string, and users can activate it by setting llm.provider in config.yaml with no other changes needed.

This design supports any provider that exposes a completion-style API, including commercial APIs (OpenAI, Anthropic, Mistral, Cohere, etc.), self-hosted open-source models via Ollama or vLLM, and any OpenAI-compatible endpoint. The base_url configuration field exists specifically to accommodate self-hosted deployments where the provider interface is compatible but the endpoint differs.
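The adapter contract and registry might be sketched as below. The `LLMAdapter` and `LLM_REGISTRY` names come from the document; the method signatures, the decorator, and the toy "echo" provider are assumptions for illustration.

```python
# Sketch of the provider-agnostic adapter interface with runtime selection.

LLM_REGISTRY = {}

class LLMAdapter:
    """Common contract every provider adapter implements."""
    def complete(self, prompt: str) -> str:
        raise NotImplementedError
    def count_tokens(self, text: str) -> int:
        return len(text) // 4          # crude default; providers may override

def register_provider(name):
    def decorator(cls):
        LLM_REGISTRY[name] = cls
        return cls
    return decorator

@register_provider("echo")             # toy provider for illustration only
class EchoAdapter(LLMAdapter):
    def complete(self, prompt):
        return f"echo: {prompt}"

def make_adapter(config):
    """Select the active provider from llm.provider in the config."""
    return LLM_REGISTRY[config["llm"]["provider"]]()
```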

Special consideration should be given when adding providers that do not support native tool/function calling, as Watson relies heavily on structured tool use. In such cases the adapter layer is responsible for emulating tool call behavior through prompt engineering and response parsing.


13.2 New MCP Servers

Watson’s MCP client layer is designed to support an arbitrary number of MCP servers simultaneously. Each server is registered in the configuration under additional_mcps with a name, base URL, and API key environment variable reference. At startup Watson initializes an MCPClient instance for each entry, fetches its tool manifest, and makes all discovered tools available to the LLM during hunts.

This means adding a new capability to Watson — for example a malware sandbox, a threat intelligence lookup service, a SIEM query interface, or a cloud provider API — requires only adding the MCP server to the configuration. No changes to Watson’s core orchestration logic are needed.

New MCP servers should follow the standard MCP specification so that tool definitions, parameter schemas, and response formats are consistent. The tool manifest fetched at startup is what Watson’s LLM uses to understand what tools are available and how to call them, so clear and descriptive tool definitions are essential for Watson to use them effectively.

Priority areas for future MCP servers include: a SIEM/EDR query interface (Splunk, Elastic, CrowdStrike), a threat intelligence platform connector (MISP, OpenCTI, VirusTotal), a cloud forensics interface (AWS CloudTrail, Azure Activity Log), a sandbox detonation service (Cuckoo, Any.run), and a case management integration (TheHive, JIRA).


13.3 New IOC Types

The IOCType enum defines the full set of supported indicator categories. The current set covers the most common forensic observables but the list is intentionally extensible.

Adding a new IOC type requires extending the enum in the IOC model interface. Downstream impacts are limited: the frontend type selector and badge renderer will need to include the new type, and Watson’s system prompt should be updated to describe the new type so it understands when and how to suggest it. No structural database or API changes are required.

Candidate future IOC types include asn::number, crypto::wallet_address, cloud::bucket_name, cloud::resource_id, bluetooth::mac, imei::device, jwt::claim, and ssh::fingerprint.
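A sketch of the extension, shown here as a Python enum for illustration (the document's model layer may define it differently). The existing members are taken from the supported-types list in the system prompt; ssh::fingerprint is one of the candidate additions above.

```python
from enum import Enum

class IOCType(str, Enum):
    HASH_SHA256 = "hash::sha256"
    IP_IPV4 = "ip::ipv4"
    DOMAIN_FQDN = "domain::fqdn"
    FILE_PATH = "file::path"
    # Adding a new type is one line here, plus a frontend selector/badge
    # entry and a system prompt update; no DB or API changes required.
    SSH_FINGERPRINT = "ssh::fingerprint"
```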


13.4 New Finding Types

The FindingType enum categorizes each forensic event. While the current set maps broadly to MITRE ATT&CK tactics and common forensic event categories, investigations may surface event types not yet represented.

Adding a new finding type requires extending the enum in the Finding model interface. The timeline visualizer uses the type to select display icons and default colors, so new types should be registered in the frontend’s type-to-icon and type-to-color mapping tables. Watson’s system prompt should also be updated to describe the new type.

As the platform matures and analysts identify recurring event patterns not covered by the current set, new types can be proposed, reviewed, and added without structural changes to the rest of the system. Candidate additions include cloud_resource_creation, cloud_permission_change, container_escape, firmware_modification, supply_chain_compromise, and data_destruction.


13.5 New Forensic Tools in DFIR MCP

The DFIR MCP server is designed as an open-ended toolbox. Any forensic tool that can be invoked from the command line on the container’s operating system can be exposed as an MCP tool by adding a new endpoint to the DFIR MCP FastAPI application.

Each new tool endpoint defines its name, a clear description (which Watson reads to decide when to use it), an input parameter schema, execution logic, output sanitization, and truncation behavior. Once the endpoint is defined it automatically appears in the MCP tool manifest that Watson fetches at startup and becomes immediately available for use in hunts.

When adding new tools, special attention should be paid to execution safety (no shell injection, path traversal protection, output size limits), execution timeout configuration, and writing a description detailed enough for Watson to understand what the tool does, what inputs it expects, and what the output format looks like. The quality of the tool description directly affects how well Watson uses it.

Future tool additions of interest include chainsaw for Windows event log hunting, hayabusa for fast EVTX timeline generation, LogonTracer for Active Directory lateral movement visualization, Volatility plugins for Linux and macOS memory images, Velociraptor artifact collection, Magnet AXIOM integration hooks, and cloud-native forensic CLI tools (AWS CLI, az CLI for Azure).


13.6 New Task Types

The TaskType enum defines the categories of actionable items that can be assigned during an investigation. As the platform is adopted across different types of engagements — red team debrief, cloud incident response, mobile forensics, OT/ICS investigations — new task categories will naturally emerge.

Adding a new task type requires only extending the enum in the Task model interface. The frontend task creation form and task list filters will reflect the new type automatically if they are driven by the enum values. Watson’s system prompt should be updated so it knows when to suggest tasks of the new type.


13.7 MITRE ATT&CK Enrichment

Findings already carry mitreTactic and mitreTechnique fields. A dedicated enrichment layer can be built on top of these to provide analysts with deeper context directly within the platform.

The enrichment service would consume the local MITRE ATT&CK STIX dataset (updated periodically) and provide automatic lookup of tactic and technique metadata — full name, description, associated sub-techniques, known threat actor groups, and suggested mitigations — whenever a finding is created or updated with a technique ID.

This enrichment can be surfaced in several ways: inline in the finding detail view, as a sidebar in the timeline, or as automated recommendation suggestions generated whenever a finding references a technique that has well-known mitigations in the ATT&CK knowledge base.

Watson can also leverage this enrichment during hunts by querying the vector DB (if enabled) or a local ATT&CK JSON dataset via a dedicated MCP tool, allowing it to reason about technique relationships and suggest related artifacts to investigate based on known adversary behavior patterns.


13.8 Threat Intelligence Feed Integration

The optional vector database provides a natural integration point for external threat intelligence. Structured feeds from platforms such as MISP, OpenCTI, or TAXII-compliant sources can be ingested, chunked, embedded, and stored in the vector DB, making them available to Watson via the knowledge_search tool during hunts.

This allows Watson to cross-reference newly discovered IOCs or findings against known threat actor campaigns, malware families, and previously documented TTPs without requiring direct internet access. The ingestion pipeline would run as a scheduled process, pulling new intelligence, generating embeddings, and upserting records into the vector DB.

The ingestion pipeline should normalize feed data into a consistent schema regardless of source format, deduplicate entries, tag records with their source and TLP classification, and respect TLP restrictions when surfacing results to Watson or displaying them in the UI. A management interface in the platform (or via AdminJS) would allow administrators to view indexed intelligence, trigger manual ingestion runs, and purge stale data.
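The normalization, deduplication, and TLP-handling steps above could be sketched as follows (the `IntelRecord` shape and TLP labels follow the TLP 2.0 naming convention; all field names are assumptions about the eventual schema):

```typescript
// Hypothetical normalized intel record produced by the ingestion pipeline.
interface IntelRecord {
  indicator: string;
  type: "ip" | "domain" | "hash" | "url";
  source: string;                              // e.g. "misp", "opencti"
  tlp: "clear" | "green" | "amber" | "red";
}

// Deduplicate by (type, indicator), keeping the first occurrence.
export function deduplicate(records: IntelRecord[]): IntelRecord[] {
  const seen = new Set<string>();
  return records.filter((r) => {
    const key = `${r.type}:${r.indicator.toLowerCase()}`;
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Respect TLP when surfacing results: TLP:RED records are never
// returned to Watson or rendered in shared views.
export function surfaceable(records: IntelRecord[]): IntelRecord[] {
  return records.filter((r) => r.tlp !== "red");
}
```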


13.9 Multi-tenancy

The current architecture assumes a single organization operates the platform. Adding multi-tenancy would allow multiple independent organizations to share a single deployment while maintaining strict data isolation between them.

The primary change is introducing an organizationId field on the User and Investigation models, with all child entities (findings, IOCs, assets, etc.) inheriting isolation through their parent investigation. All database queries and API middleware would be updated to enforce organization-scoping as a mandatory filter layer, preventing any cross-organization data access.
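The mandatory filter layer could look like the sketch below. The essential property is that `organizationId` is always injected from the authenticated session and any client-supplied value is discarded (the `Session` shape and filter format are assumptions):

```typescript
// Hypothetical session shape attached by the auth middleware.
interface Session {
  userId: string;
  organizationId: string;
}

// Build a database filter with organization scoping enforced.
// Client-supplied organizationId is stripped so it can never
// override the session's organization.
export function scopedFilter(
  session: Session,
  clientFilter: Record<string, unknown>,
): Record<string, unknown> {
  const { organizationId: _ignored, ...rest } = clientFilter;
  return { ...rest, organizationId: session.organizationId };
}
```

Applying this helper in every query path (rather than ad hoc per route) is what makes the scoping auditable during the security review mentioned below.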

Additional considerations include per-organization configuration of the default system prompt, LLM provider selection, custom branding, and storage quotas. An organization-level admin role distinct from the platform superadmin would be introduced to allow organizations to manage their own users, investigations, and settings independently.

This evolution requires careful security review of every API route and data access path to ensure that organization boundaries cannot be bypassed through direct ID enumeration or relationship traversal.


13.10 Notification System

Currently, analysts learn about new Watson suggestions and completed hunts by observing the real-time activity panel or checking the investigation page. A notification system would push these events proactively to analysts through their preferred channels.

Notification triggers would include:

- a hunt completing and a summary being posted
- Watson suggesting a batch of new findings or IOCs requiring validation
- a task being assigned to an analyst
- a task due date approaching
- a new message requiring a response
- an investigation status change

The notification system would be implemented as an event-driven module that subscribes to internal platform events (the same events already emitted over WebSocket) and dispatches them to configured delivery channels. Supported channels would initially include in-app notifications (a notification bell in the UI), email, Slack webhooks, and Microsoft Teams webhooks. The configuration for delivery channels would be manageable per-user (personal preferences) and per-investigation (investigation-level overrides).
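A minimal sketch of the event-driven dispatcher, assuming a discriminated-union event type and a simple channel interface (event kinds, field names, and the class shape are all illustrative):

```typescript
// Hypothetical platform events, mirroring those already sent over WebSocket.
type PlatformEvent =
  | { kind: "hunt.completed"; investigationId: string }
  | { kind: "task.assigned"; taskId: string; analystId: string };

// A delivery channel: in-app, email, Slack webhook, Teams webhook, etc.
interface Channel {
  name: string;
  deliver(event: PlatformEvent): void;
}

export class NotificationDispatcher {
  private subscriptions = new Map<PlatformEvent["kind"], Channel[]>();

  // Register a channel for a given event kind.
  subscribe(kind: PlatformEvent["kind"], channel: Channel): void {
    const list = this.subscriptions.get(kind) ?? [];
    this.subscriptions.set(kind, [...list, channel]);
  }

  // Fan an event out to every channel subscribed to its kind.
  emit(event: PlatformEvent): void {
    for (const ch of this.subscriptions.get(event.kind) ?? []) {
      ch.deliver(event);
    }
  }
}
```

Per-user and per-investigation preferences would then decide which channels are subscribed for which event kinds.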


13.11 Watson Feedback Loop

Watson’s effectiveness over time depends on feedback signals from analysts. A structured feedback loop would allow Watson to improve its suggestions based on real-world validation outcomes.

When an analyst validates or rejects a Watson-suggested finding, IOC, asset, recommendation, or task, that action is already recorded. Building on this, the platform would track per-investigation and aggregate statistics on Watson’s suggestion acceptance rate by type, severity, and investigation context. This data can be reviewed by platform administrators to identify systematic patterns — for example, Watson frequently suggesting low-confidence IOCs that are consistently rejected, or missing a particular class of finding.
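The acceptance-rate aggregation could be as simple as the sketch below (the `Suggestion` shape is an assumption about how validation outcomes are recorded):

```typescript
// Hypothetical record of a Watson suggestion and its validation outcome.
interface Suggestion {
  type: "finding" | "ioc" | "asset" | "recommendation" | "task";
  status: "validated" | "rejected";
}

// Acceptance rate per suggestion type, as a fraction in [0, 1].
export function acceptanceByType(
  suggestions: Suggestion[],
): Record<string, number> {
  const totals = new Map<string, { accepted: number; total: number }>();
  for (const s of suggestions) {
    const t = totals.get(s.type) ?? { accepted: 0, total: 0 };
    t.total += 1;
    if (s.status === "validated") t.accepted += 1;
    totals.set(s.type, t);
  }
  return Object.fromEntries(
    [...totals].map(([type, t]) => [type, t.accepted / t.total]),
  );
}
```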

In addition to passive tracking, analysts could be given an explicit thumbs-up / thumbs-down feedback control on each Watson suggestion with an optional comment field. This structured feedback, associated with the investigation context and system prompt used, creates a dataset that can be used to refine Watson’s system prompt, adjust its confidence thresholds, or fine-tune a locally hosted model in future iterations.

A feedback dashboard visible to administrators would surface these trends, allowing continuous improvement of Watson’s default system prompt and investigation strategies.


13.12 Collaborative Editing

The current model assumes that entities are created by one actor (human or AI) and subsequently read by others, with updates applied sequentially. As team sizes grow and multiple analysts work simultaneously on the same investigation, the need for real-time collaborative editing will emerge.

Collaborative editing would allow multiple analysts to edit a finding, recommendation, or task simultaneously without overwriting each other’s changes. This requires implementing an operational transformation or CRDT (Conflict-free Replicated Data Type) layer on top of the existing WebSocket infrastructure, so that incremental changes from each client are merged in a consistent order.

A simpler intermediate step before full CRDT implementation would be optimistic locking combined with a field-level merge strategy: the platform tracks which fields were changed in each update request, and if two analysts update different fields of the same entity concurrently, both changes are accepted and merged. Conflicts on the same field would be surfaced to both analysts for manual resolution via a simple diff view.
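The field-level merge described above can be sketched as a pure function (the `Update` shape, with each request carrying the set of fields it changed, is an assumption about the eventual API):

```typescript
type Entity = Record<string, unknown>;

// An update request declares which fields it changed.
interface Update {
  changed: string[];
  values: Entity;
}

// Merge two concurrent updates against a common base version.
// Disjoint field changes are both applied; same-field changes are
// reported as conflicts and left at the base value for manual
// resolution via a diff view.
export function mergeUpdates(
  base: Entity,
  a: Update,
  b: Update,
): { merged: Entity; conflicts: string[] } {
  const conflicts = a.changed.filter((f) => b.changed.includes(f));
  const merged = { ...base };
  for (const f of a.changed) {
    if (!conflicts.includes(f)) merged[f] = a.values[f];
  }
  for (const f of b.changed) {
    if (!conflicts.includes(f)) merged[f] = b.values[f];
  }
  return { merged, conflicts };
}
```

Because the merge is deterministic and side-effect free, it can run server-side on each update request without any of the ordering machinery a full CRDT layer requires.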

The real-time activity panel already establishes the WebSocket infrastructure needed for presence indicators — showing which analysts are currently viewing or editing a given entity — which is a valuable first step toward a fully collaborative experience.