SentinelIQ — Case Study

4 Log Format Parsers

3 Scoring Methods

Query → SQL

A+ Code Quality

The Problem

Every security analyst I've worked with spends most of their day on three things: pivoting between SIEM dashboards that don't talk to each other, hand-writing query DSLs against schemas they've memorized but resent, and trying to remember whether this spike of failed logins is "Tuesday morning normal" or "actually an incident."

SentinelIQ is built around a different default: ingest everything, embed it, score it three ways, and let the analyst ask questions in English. Logs are useless if they're locked behind a query language; alerts are noise if the model thinks a holiday weekend is anomalous; and a SIEM that can't tell you "show me failed SSH attempts from new IPs in the last hour" without you knowing the schema is just a $50K log archive.

Multi-Format Log Ingestion

Real security teams aren't running one log source. They're running a syslog collector for the Linux fleet, Windows Event collectors for the AD domain, a CEF feed off a network appliance, and a CSV dump from whatever EDR they bought last year. SentinelIQ has parsers for all four, normalizing into a single schema:

Syslog (RFC 5424) — structured priority, facility, severity, hostname, app-name, msgid, timestamp, message body. Tested against journald, rsyslog, and a Cisco ASA capture.
Windows Event Logs (EVTX/JSON) — channel, event-id, provider, level, computer, user-sid, payload data. Maps the noisy Microsoft event-id space to plain-English action labels (e.g., 4625 → "failed logon").
CEF (Common Event Format) — vendor, product, signature-id, name, severity, custom extensions. The lingua franca for network appliances.
Firewall CSV — flexible header detection so a Palo Alto export drops in next to a pfSense one without code changes.
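
As an illustration of the normalization step, here is a minimal sketch of an RFC 5424 syslog parser. It is not SentinelIQ's actual parser: the `NormalizedEvent` shape, the choice of app-name as the event type, and the function names are assumptions made for the example. The PRI decoding (facility = PRI // 8, severity = PRI % 8) follows the RFC. Structured data is matched only loosely, so escaped brackets inside parameter values may not be handled.

```python
import re
from dataclasses import dataclass

# Severity names indexed by numeric code, as listed in RFC 5424.
SEVERITIES = ["emergency", "alert", "critical", "error",
              "warning", "notice", "informational", "debug"]

# HEADER fields, then STRUCTURED-DATA ("-" or bracketed elements), then MSG.
RFC5424_LINE = re.compile(
    r"^<(?P<pri>\d{1,3})>(?P<version>\d{1,2}) "
    r"(?P<ts>\S+) (?P<host>\S+) (?P<app>\S+) (?P<procid>\S+) (?P<msgid>\S+) "
    r"(?P<sd>-|(?:\[.*?\])+)"
    r"(?: (?P<msg>.*))?$",
    re.DOTALL,
)


@dataclass
class NormalizedEvent:
    """Hypothetical normalized shape: ts, src, type, msg plus syslog extras."""
    ts: str
    src: str
    type: str
    msg: str
    facility: int
    severity: str


def parse_syslog(line: str) -> NormalizedEvent:
    """Parse one RFC 5424 line into the (assumed) normalized event shape."""
    match = RFC5424_LINE.match(line)
    if match is None:
        raise ValueError("not an RFC 5424 syslog line")
    pri = int(match["pri"])
    if pri > 191:  # facilities 0-23, severities 0-7: 23 * 8 + 7 = 191
        raise ValueError(f"PRI out of range: {pri}")
    facility, severity = divmod(pri, 8)
    return NormalizedEvent(
        ts=match["ts"],          # kept as the raw timestamp string
        src=match["host"],
        type=match["app"],       # assumption: app-name doubles as event type
        msg=match["msg"] or "",
        facility=facility,
        severity=SEVERITIES[severity],
    )
```

Calling `parse_syslog` on a line such as `<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 - 'su root' failed for lonvick on /dev/pts/8` yields facility 4 and severity "critical", since 34 = 4 * 8 + 2.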

Each event lands in SQLite for the timeline view and in ChromaDB as a 384-dim embedding, so the anomaly engine and NL query layer both reason over the same events.
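
A rough sketch of that dual write, using only the standard library. The table layout, the `open_store` and `store_event` names, and the `vector_sink` callable are all hypothetical; `vector_sink` stands in for whatever embeds the message and writes it to the vector store. The only point illustrated is that both stores share one event id, so a vector hit can be traced back to its timeline row.

```python
import sqlite3
import uuid


def open_store(path=":memory:"):
    """Open (or create) the timeline database. Schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS events (
               id   TEXT PRIMARY KEY,
               ts   TEXT NOT NULL,
               src  TEXT NOT NULL,
               type TEXT NOT NULL,
               msg  TEXT NOT NULL
           )"""
    )
    # The timeline view reads in time order, so index the timestamp.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_events_ts ON events (ts)")
    return conn


def store_event(conn, event, vector_sink):
    """Write one normalized event to SQLite, then hand it to the vector side.

    `vector_sink(event_id, text)` is a placeholder for the embedding and
    vector-store write; it receives the same id used as the SQLite key.
    """
    event_id = str(uuid.uuid4())
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO events (id, ts, src, type, msg) VALUES (?, ?, ?, ?, ?)",
            (event_id, event["ts"], event["src"], event["type"], event["msg"]),
        )
    vector_sink(event_id, event["msg"])
    return event_id
```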

Three Anomaly Scoring Methods

One scoring method = one blind spot. SentinelIQ runs three independent methods on every event and surfaces the disagreement, not just the consensus.

Method 01

Embedding distance

Each new event's vector is compared against its k nearest neighbors among the last N events from the same source. Outliers in semantic space surface even when the text itself looks routine.
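
One way this could look in plain Python, as a sketch rather than the project's implementation. The class name, the window and k defaults, and the choice of "mean cosine distance to the k nearest recent vectors, capped at 1.0" as the score are all assumptions; a real deployment would query the vector store instead of scanning a deque.

```python
import math
from collections import deque


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


class EmbeddingScorer:
    """Scores a vector by its mean distance to the k nearest recent vectors."""

    def __init__(self, window=500, k=12):
        self.window = window   # "last N events" per source
        self.k = k
        self.history = {}      # source -> deque of recent vectors

    def score(self, source, vector):
        recent = self.history.setdefault(source, deque(maxlen=self.window))
        if recent:
            distances = sorted(cosine_distance(vector, v) for v in recent)
            nearest = distances[: self.k]
            result = min(1.0, sum(nearest) / len(nearest))
        else:
            result = 0.0       # no baseline yet, so nothing to compare against
        recent.append(vector)  # the new event joins the baseline afterwards
        return result
```

A vector pointing the same way as recent traffic scores near 0, while one orthogonal to everything seen from that source scores near 1.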

Method 02

Frequency Z-score

Per-source, per-event-type rolling baseline. A 142-attempts-per-minute brute force isn't anomalous because the words are scary — it's anomalous because the rate is six standard deviations above this host's normal.
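
A minimal sketch of a rolling Z-score over per-minute counts, keyed by (source, event type). The window size, the minimum sample count, and the stdev floor of 1.0 (which keeps a perfectly flat baseline from producing a division by zero) are illustrative assumptions, not values from SentinelIQ.

```python
import statistics
from collections import defaultdict, deque


class FrequencyScorer:
    """Rolling per-(source, event_type) baseline of per-minute event counts."""

    def __init__(self, window=60, min_samples=5, stdev_floor=1.0):
        self.baselines = defaultdict(lambda: deque(maxlen=window))
        self.min_samples = min_samples
        self.stdev_floor = stdev_floor

    def zscore(self, source, event_type, count):
        """Return how many standard deviations `count` sits above the baseline."""
        history = self.baselines[(source, event_type)]
        z = 0.0
        if len(history) >= self.min_samples:
            mean = statistics.fmean(history)
            stdev = max(statistics.pstdev(history), self.stdev_floor)
            z = (count - mean) / stdev
        # Simplification: every count joins the baseline, including outliers.
        history.append(count)
        return z
```

With a baseline of two to four failed logins per minute on a host, a minute with 142 attempts lands far beyond six standard deviations, while a minute with three attempts stays near zero.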

Method 03

Time-of-day prior

Same event, 3 AM, weekend = different score than 10 AM, Tuesday. Catches lateral-movement patterns that would blend into business-hours noise if the timestamp were ignored.
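
A sketch of one possible time-of-day prior, assuming a histogram over (weekday, hour) buckets per event type. The scoring rule here (1 minus the bucket's count relative to the busiest bucket) is an assumption chosen for simplicity, as are the class and method names.

```python
from collections import Counter
from datetime import datetime


class TimeOfDayPrior:
    """Learns when each event type normally happens, by weekday and hour."""

    def __init__(self):
        self.counts = {}  # event_type -> Counter keyed by (weekday, hour)

    def observe(self, event_type, ts):
        bucket = (ts.weekday(), ts.hour)
        self.counts.setdefault(event_type, Counter())[bucket] += 1

    def score(self, event_type, ts):
        """0.0 for the busiest bucket, 1.0 for a bucket never seen before."""
        buckets = self.counts.get(event_type)
        if not buckets:
            return 0.0  # no history for this event type yet
        busiest = max(buckets.values())
        seen = buckets[(ts.weekday(), ts.hour)]  # Counter returns 0 if missing
        return 1.0 - seen / busiest
```

For example, after observing an event type only on Tuesdays at 10 AM, `score` returns 0.0 for another Tuesday 10 AM occurrence and 1.0 for the same event on a Saturday at 3 AM.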

Design principle

Disagreement is the signal. When two methods score an event high and one scores it low, the dashboard surfaces it as "investigate" — not "alert." That's where the actual analyst value lives.
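
The triage rule can be sketched as below. The 0.7 threshold, the function names, and the "alert" and "normal" labels for the unanimous cases are assumptions; the source only specifies that a split verdict becomes "investigate". Treating one-high, two-low the same as two-high, one-low is also an assumption. The combined score as a plain mean is an inference from the sample dashboard log, where 0.91, 0.88 and 0.82 appear alongside 0.87.

```python
def combined_score(scores):
    """Mean of the per-method scores (inferred, not confirmed by the source)."""
    return sum(scores.values()) / len(scores)


def triage(scores, high=0.7):
    """Label an event from per-method scores, e.g. {"M01": 0.9, "M02": 0.8, "M03": 0.2}.

    Unanimous high  -> "alert"        (assumed label)
    Unanimous low   -> "normal"       (assumed label)
    Any split       -> "investigate"  (the disagreement case from the text)
    """
    high_count = sum(1 for value in scores.values() if value >= high)
    if high_count == len(scores):
        return "alert"
    if high_count == 0:
        return "normal"
    return "investigate"
```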

Natural Language Query Engine

The killer feature: an analyst types "show failed logins in the last 24h from IPs we haven't seen this week" — the system translates that into a structured filter pipeline (event-type, time-range, source-IP cardinality lookup), runs it against SQLite + ChromaDB, and returns a result table with citations back to raw events.

The translation layer uses Ollama-served local LLMs for everything (no logs leaving the network), with a tool-call schema constraining what the model is allowed to emit. The model never writes raw SQL — it picks from a fixed set of typed filter primitives. Hallucinations show up as parse errors, not as silent wrong queries.
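
To make "hallucinations show up as parse errors" concrete, here is a minimal sketch of a validator for model output. The three primitives, their argument names, and the JSON shape are hypothetical stand-ins for the real tool-call schema, which the write-up does not spell out. Anything outside the allow-list is rejected before a query is ever built.

```python
import json
from typing import Any, Dict, List


class FilterParseError(ValueError):
    """Raised when model output is not a valid list of filter primitives."""


# Hypothetical primitives: filter name -> required argument names and types.
FILTER_SCHEMA: Dict[str, Dict[str, type]] = {
    "event_type": {"value": str},
    "time_range": {"last_hours": int},
    "new_source_ip": {"lookback_days": int},
}


def parse_filters(raw: str) -> List[Dict[str, Any]]:
    """Validate raw model output and return the accepted filters."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise FilterParseError(f"model output is not JSON: {exc}") from exc
    if not isinstance(payload, list):
        raise FilterParseError("expected a list of filters")

    validated = []
    for item in payload:
        if not isinstance(item, dict):
            raise FilterParseError("each filter must be an object")
        name = item.get("filter")
        if not isinstance(name, str) or name not in FILTER_SCHEMA:
            raise FilterParseError(f"unknown filter: {name!r}")
        spec = FILTER_SCHEMA[name]
        args = item.get("args")
        if not isinstance(args, dict) or set(args) != set(spec):
            raise FilterParseError(f"{name}: expected arguments {sorted(spec)}")
        for key, expected in spec.items():
            # type() rather than isinstance() so True is not accepted as an int.
            if type(args[key]) is not expected:
                raise FilterParseError(f"{name}.{key}: expected {expected.__name__}")
        validated.append({"filter": name, "args": dict(args)})
    return validated
```

Under this sketch, a request like "failed logins in the last 24h from IPs we haven't seen this week" would arrive as three entries (`event_type`, `time_range`, `new_source_ip`), and a model that tried to emit a `raw_sql` filter would raise `FilterParseError` instead of reaching the database.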

sentineliq // dashboard.log

[03:14:22] INFO Syslog ingestion started on :514
[03:14:23] WARN Anomaly score 0.87 — src 10.0.0.45 (M01:0.91 M02:0.88 M03:0.82)
[03:14:24] CRIT Brute-force detected — 142 attempts/min on sshd@bastion-01
[03:14:25] INFO Embedding vectors indexed: 24,891
[03:14:26] WARN Unusual port scan from 192.168.1.33 — 47 ports / 8s
[03:14:27] NLQ "show failed logins last 24h" → 312 results
[03:14:28] ALERT Lateral movement pattern (3 hosts, sequential SMB)
[03:14:29] INFO ChromaDB k-NN: 12 nearest events (cos-dist 0.21–0.34)
[03:14:30] NLQ "summarize last 5 alerts" → AI-gen incident report

Architecture

                    [ Log sources ]
   syslog      winevt      CEF        firewall.csv
     │            │          │             │
     └────────────┴──────────┴─────────────┘
                          ▼
              Format-agnostic parser layer
                          ▼
            Normalized event { ts, src, type, msg, ... }
                          ▼
       ┌──────────────────────┼──────────────────────┐
       ▼                      ▼                      ▼
   SQLite (timeline)     ChromaDB (vectors)     Anomaly
                                                  M01 / M02 / M03
       │                      │                      │
       └──────────────────────┼──────────────────────┘
                          ▼
              FastAPI  (query + WebSocket stream)
                          ▼
       ┌──────────────────────┼──────────────────────┐
       ▼                      ▼                      ▼
  Ollama LLM           React Dashboard          Incident
  NL → typed filters   timeline / heatmap      summarizer

Why It's a Useful Signal

I built SentinelIQ for one reason: in my QA & Cybersecurity Engineer role at NextgenID I'd written enough vulnerability assessments and FedRAMP / NIST 800-53 documentation to know what's missing in the day-to-day analyst workflow — and what won't get sold to teams that already have Splunk.

Local-first. Logs never leave the network. Ollama, ChromaDB, SQLite: everything runs on a single box. That's the only model that works for the federal-compliance environments where I do my day job.
Format-pluralism, not format-religion. No "convert your logs to our schema first" — meet the team where the logs already are.
Three independent methods, surfaced honestly. Disagreement between scorers is more interesting than agreement. Most SIEMs hide the variance; SentinelIQ shows it.
NL query as the front door, not a feature. The dashboard exists, but the primary interaction is "ask in English, get a citation back." That's the workflow change that matters.
