Case Study | AI × Cybersecurity | Hummus Development LLC

SentinelIQ.

An AI-powered SIEM companion that turns multi-format security logs into a queryable, anomaly-scored timeline — with a natural-language query layer that translates plain English into structured filters.

Stack · Python / FastAPI / React / ChromaDB
Status · Public on GitHub
Domain · SIEM · Threat Detection
Role · Solo design + build

4 Log Format Parsers
3 Scoring Methods
NL Query → SQL
A+ Code Quality

01

The Problem

Every security analyst I've worked with spends most of their day juggling three things: pivoting between SIEM dashboards that don't talk to each other, hand-writing DSL queries against schemas they've memorized but resent, and trying to remember whether this spike of failed logins is "Tuesday morning normal" or "actually an incident."

SentinelIQ is built around a different default: ingest everything, embed it, score it three ways, and let the analyst ask questions in English. Logs are useless if they're locked behind a query language; alerts are noise if the model thinks a holiday weekend is anomalous; and a SIEM that can't tell you "show me failed SSH attempts from new IPs in the last hour" without you knowing the schema is just a $50K log archive.

02

Multi-Format Log Ingestion

Real security teams aren't running one log source. They're running a syslog collector for the Linux fleet, Windows Event collectors for the AD domain, a CEF feed off a network appliance, and a CSV dump from whatever EDR they bought last year. SentinelIQ has parsers for all four, normalizing them into a single schema.
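
The architecture diagram gives the normalized event as { ts, src, type, msg, ... }. As a rough sketch of what one of the four parsers might look like, the block below parses a bare CEF header into that shape. The `NormalizedEvent` class, the `extra` field, and the `parse_cef` function are hypothetical names for illustration, not taken from the SentinelIQ repo. It also assumes the caller supplies `ts` and `src`, since a CEF line carries neither unless it is wrapped in a syslog header.

```python
import re
from dataclasses import dataclass, field


@dataclass
class NormalizedEvent:
    ts: str    # ISO-8601 timestamp
    src: str   # originating host or device
    type: str  # normalized event type
    msg: str   # human-readable message
    extra: dict = field(default_factory=dict)  # hypothetical catch-all for format-specific fields


# Split on pipes that are not escaped with a backslash (CEF escapes "|" as "\|").
_CEF_PIPE = re.compile(r"(?<!\\)\|")


def parse_cef(line: str, ts: str, src: str) -> NormalizedEvent:
    """Parse 'CEF:ver|vendor|product|dev_ver|sig_id|name|severity|extension'."""
    if not line.startswith("CEF:"):
        raise ValueError("not a CEF record")
    parts = _CEF_PIPE.split(line[4:], maxsplit=7)
    if len(parts) < 7:
        raise ValueError("truncated CEF header")
    header = [p.replace("\\|", "|") for p in parts[:7]]
    _version, vendor, product, _dev_version, sig_id, name, severity = header
    extension = parts[7] if len(parts) > 7 else ""
    return NormalizedEvent(
        ts=ts,
        src=src,
        type=sig_id,
        msg=name,
        extra={"vendor": vendor, "product": product,
               "severity": severity, "extension": extension},
    )
```

Keeping the format-specific leftovers in one side field is one way to let the timeline and the scorers depend only on the four shared fields.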

Each event lands in SQLite for the timeline view and ChromaDB as a 384-dim embedding so the anomaly engine and NL query layer can both reason over the same store.
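
A minimal sketch of the dual write, under stated assumptions: the table layout, `open_timeline`, `store_event`, and the `VectorSink` callable are all hypothetical. The sink stands in for the ChromaDB side (for example a thin wrapper around a collection's `add`), so the sketch runs with only the standard library.

```python
import json
import sqlite3
from typing import Callable, Sequence

# Hypothetical sink signature: (event_id, vector, metadata) -> None.
VectorSink = Callable[[str, Sequence[float], dict], None]


def open_timeline(path: str = ":memory:") -> sqlite3.Connection:
    """Create the timeline table and a timestamp index if they are missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        " id TEXT PRIMARY KEY, ts TEXT NOT NULL, src TEXT NOT NULL,"
        " type TEXT NOT NULL, msg TEXT NOT NULL, extra TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_events_ts ON events (ts)")
    return conn


def store_event(conn: sqlite3.Connection, sink: VectorSink, event_id: str,
                event: dict, vector: Sequence[float]) -> None:
    """Write the row and the vector under the same id."""
    # "with conn" commits on success and rolls back if anything inside raises,
    # so a failed vector write does not leave an orphaned timeline row.
    with conn:
        conn.execute(
            "INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)",
            (event_id, event["ts"], event["src"], event["type"],
             event["msg"], json.dumps(event.get("extra", {}))),
        )
        sink(event_id, vector,
             {"ts": event["ts"], "src": event["src"], "type": event["type"]})
```

Sharing one id across both stores is what lets a vector hit be traced back to its raw timeline row.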

03

Three Anomaly Scoring Methods

One scoring method = one blind spot. SentinelIQ runs three independent methods on every event and surfaces the disagreement, not just the consensus.

Method 01

Embedding distance

Each new event's vector is compared against its k nearest neighbors among the last N events from the same source. Outliers in semantic space surface even when the surface text looks routine.
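
One way this could work, as a sketch rather than the project's actual scorer: keep a bounded window of recent vectors and score each new vector by its mean cosine distance to the k closest ones. `EmbeddingScorer` and its defaults are assumptions (k=12 is borrowed from the sample dashboard log, the window size is arbitrary), and one instance per source is assumed.

```python
import math
from collections import deque
from typing import Deque, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; zero-length vectors are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


class EmbeddingScorer:
    """Hypothetical per-source scorer over the last `window` event vectors."""

    def __init__(self, window: int = 500, k: int = 12) -> None:
        self.k = k
        self.recent: Deque[Sequence[float]] = deque(maxlen=window)

    def score(self, vector: Sequence[float]) -> float:
        """Mean cosine distance to the k nearest recent vectors, then remember the vector."""
        if self.recent:
            nearest = sorted(cosine_distance(vector, v) for v in self.recent)[: self.k]
            result = sum(nearest) / len(nearest)
        else:
            result = 0.0  # nothing to compare against yet
        self.recent.append(vector)
        return result
```

The brute-force scan is fine for a window of a few hundred vectors; a vector store's own nearest-neighbor query would replace it at larger sizes.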

Method 02

Frequency Z-score

Per-source, per-event-type rolling baseline. A 142-attempts-per-minute brute force isn't anomalous because the words are scary — it's anomalous because the rate is six standard deviations above this host's normal.
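
A rolling baseline like this can be sketched with a bounded history of per-minute counts per (source, event type) key. Everything named here is an assumption: `FrequencyScorer`, the 60-minute window, the 10-sample warm-up, and the floor of 1.0 on the standard deviation, which keeps a perfectly flat baseline from producing a division by zero.

```python
import math
from collections import defaultdict, deque


class FrequencyScorer:
    """Hypothetical rolling z-score over per-minute event counts."""

    def __init__(self, window: int = 60, min_samples: int = 10) -> None:
        self.min_samples = min_samples
        # (src, event_type) -> the last `window` per-minute counts
        self.history = defaultdict(lambda: deque(maxlen=window))

    def score(self, src: str, event_type: str, count: int) -> float:
        """Z-score of this minute's count against the key's own baseline."""
        past = self.history[(src, event_type)]
        z = 0.0
        if len(past) >= self.min_samples:
            mean = sum(past) / len(past)
            variance = sum((c - mean) ** 2 for c in past) / len(past)
            std = max(math.sqrt(variance), 1.0)  # assumed floor for flat baselines
            z = (count - mean) / std
        past.append(count)  # the new count joins the baseline after scoring
        return z
```

Note that the sketch lets an anomalous minute enter the baseline; a stricter version might skip or clip counts above some z threshold.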

Method 03

Time-of-day prior

Same event, 3 AM, weekend = different score than 10 AM, Tuesday. Catches lateral-movement patterns that hide inside business-hours noise.
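
As an illustration only, a time-of-day prior can be as simple as a histogram over the 168 hour-of-week slots per event type. The scoring rule below (one minus the slot's count relative to the busiest slot) is an assumed stand-in, not necessarily the formula SentinelIQ uses, and `TimeOfDayPrior` is a hypothetical name.

```python
from collections import Counter, defaultdict
from datetime import datetime


class TimeOfDayPrior:
    """Hypothetical hour-of-week histogram per event type."""

    def __init__(self) -> None:
        # event_type -> Counter keyed by (weekday, hour), Monday == 0
        self.slots = defaultdict(Counter)

    def observe(self, event_type: str, ts: datetime) -> None:
        self.slots[event_type][(ts.weekday(), ts.hour)] += 1

    def score(self, event_type: str, ts: datetime) -> float:
        """0.0 for the busiest slot, approaching 1.0 for slots rarely or never seen."""
        seen = self.slots.get(event_type)
        if not seen:
            return 0.0  # no history yet, so no opinion
        busiest = max(seen.values())
        return 1.0 - seen[(ts.weekday(), ts.hour)] / busiest
```

With 50 logins observed on Tuesdays at 10 AM and one on a Saturday at 3 AM, the Tuesday slot scores 0.0 and the Saturday slot scores 0.98.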

Design principle

Disagreement is the signal. When two methods score an event high and one scores it low, the dashboard surfaces it as "investigate" — not "alert." That's where the actual analyst value lives.
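
The principle can be sketched as a small triage function. It assumes the three method scores have already been mapped onto a common 0 to 1 scale, as in the sample log line `M01:0.91 M02:0.88 M03:0.82`. The `triage` name, the 0.8 cut-off, and the "routine" label are assumptions for illustration.

```python
from typing import Sequence


def triage(scores: Sequence[float], high: float = 0.8) -> str:
    """Label an event from its per-method scores (assumed to share a 0..1 scale)."""
    n_high = sum(s >= high for s in scores)
    if n_high == len(scores):
        return "alert"        # every method agrees the event is anomalous
    if n_high > 0:
        return "investigate"  # methods disagree: at least one high, at least one not
    return "routine"          # no method flags it
```

For example, `triage([0.9, 0.85, 0.1])` returns `"investigate"`, while `triage([0.91, 0.88, 0.82])` returns `"alert"`.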

04

Natural Language Query Engine

The killer feature: an analyst types "show failed logins in the last 24h from IPs we haven't seen this week" — the system translates that into a structured filter pipeline (event-type, time-range, source-IP cardinality lookup), runs it against SQLite + ChromaDB, and returns a result table with citations back to raw events.

The translation layer uses Ollama-served local LLMs for everything (no logs leaving the network), with a tool-call schema constraining what the model is allowed to emit. The model never writes raw SQL — it picks from a fixed set of typed filter primitives. Hallucinations show up as parse errors, not as silent wrong queries.
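
To make the "parse errors, not silent wrong queries" idea concrete, here is a sketch of a validator that accepts the model's tool-call JSON and compiles it into parameterized SQL. The filter kinds (`event_type`, `last_hours`), the allowed event types, the `events` table, and `compile_filters` itself are hypothetical; the real primitive set is not shown in this write-up. It also assumes timestamps are stored as ISO-8601 strings in a single timezone, so string comparison orders them correctly.

```python
import json
from datetime import datetime, timedelta, timezone

# Hypothetical vocabulary; a real deployment would derive it from the parsers.
ALLOWED_EVENT_TYPES = {"login_failed", "login_ok", "port_scan"}
MAX_HOURS = 24 * 30


def compile_filters(tool_call_json: str, now: datetime) -> tuple[str, list]:
    """Validate model output and compile it to (sql, params); raise ValueError otherwise."""
    try:
        call = json.loads(tool_call_json)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not JSON: {exc}") from exc
    if not isinstance(call, dict) or not isinstance(call.get("filters"), list):
        raise ValueError("expected an object with a 'filters' list")
    if not call["filters"]:
        raise ValueError("at least one filter is required")

    clauses, params = [], []
    for f in call["filters"]:
        if not isinstance(f, dict):
            raise ValueError("each filter must be an object")
        kind = f.get("kind")
        if kind == "event_type":
            value = f.get("value")
            if not isinstance(value, str) or value not in ALLOWED_EVENT_TYPES:
                raise ValueError(f"unknown event type: {value!r}")
            clauses.append("type = ?")
            params.append(value)
        elif kind == "last_hours":
            hours = f.get("hours")
            if type(hours) is not int or not 1 <= hours <= MAX_HOURS:
                raise ValueError(f"invalid hours: {hours!r}")
            clauses.append("ts >= ?")
            params.append((now - timedelta(hours=hours)).isoformat())
        else:
            # Anything outside the fixed set is rejected, including attempts at raw SQL.
            raise ValueError(f"unknown filter kind: {kind!r}")

    sql = ("SELECT id, ts, src, type, msg FROM events WHERE "
           + " AND ".join(clauses) + " ORDER BY ts")
    return sql, params
```

Because values only ever reach SQLite as bound parameters and the clause text comes from a fixed set, a model that invents a filter fails loudly at validation instead of running a different query than the analyst asked for.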

sentineliq // dashboard.log
[03:14:22] INFO Syslog ingestion started on :514
[03:14:23] WARN Anomaly score 0.87 — src 10.0.0.45 (M01:0.91 M02:0.88 M03:0.82)
[03:14:24] CRIT Brute-force detected — 142 attempts/min on sshd@bastion-01
[03:14:25] INFO Embedding vectors indexed: 24,891
[03:14:26] WARN Unusual port scan from 192.168.1.33 — 47 ports / 8s
[03:14:27] NLQ "show failed logins last 24h" → 312 results
[03:14:28] ALERT Lateral movement pattern (3 hosts, sequential SMB)
[03:14:29] INFO ChromaDB k-NN: 12 nearest events (cos-dist 0.21–0.34)
[03:14:30] NLQ "summarize last 5 alerts" → AI-gen incident report
05

Architecture

                       [ Log sources ]
         syslog      winevt      CEF        firewall.csv
           │            │          │             │
           └────────────┴─────┬────┴─────────────┘
                              │
                Format-agnostic parser layer
                              │
        Normalized event { ts, src, type, msg, ... }
                              │
       ┌──────────────────────┼──────────────────────┐
       ▼                      ▼                      ▼
   SQLite (timeline)     ChromaDB (vectors)     Anomaly
                                                M01 / M02 / M03
       │                      │                      │
       └──────────────────────┼──────────────────────┘
                              │
             FastAPI  (query + WebSocket stream)
                              │
       ┌──────────────────────┼──────────────────────┐
       ▼                      ▼                      ▼
  Ollama LLM           React Dashboard          Incident
  NL → typed filters   timeline / heatmap      summarizer
06

Why It's a Useful Signal

I built SentinelIQ for one reason: as a QA & Cybersecurity Engineer at NextgenID, I'd written enough vulnerability assessments and FedRAMP / NIST 800-53 documentation to know what's missing from the day-to-day analyst workflow, and what won't get sold to teams that already have Splunk.
