AI Summarization¶
Every ingested article is processed by the configured LLM provider (OpenAI or Anthropic) to produce a structured intelligence summary. Summaries include an executive overview, novelty assessment, technical details, defensive mitigations, tags, and an attack flow sequence.
Summary Structure¶
The summarizer requests a structured JSON response with these fields:
Executive Summary¶
A 3-5 sentence paragraph covering the core intelligence — what happened, who is involved, what's affected, and why it matters.
Novelty Assessment¶
What's new or noteworthy about the threat. This highlights novel tactics, techniques, tooling, or targeting that distinguish this from routine activity. The novelty text is stored both within the markdown summary and as a separate novelty_notes field in the database, enabling the "What's Notable" section on the article detail page.
Technical Details¶
An array of bullet points capturing:
- Indicators of Compromise (IOCs)
- CVE identifiers
- Affected systems and versions
- Attack chain steps
- Malware capabilities
- Infrastructure details
Mitigations¶
Actionable defensive recommendations derived from the article content.
Tags¶
3-8 lowercase, hyphenated tags for categorization:
| Tag Type | Examples | Purpose |
|---|---|---|
| Category | ransomware, phishing, vulnerability |
Maps to broad threat categories |
| Entity | apt29, emotet, cobalt-strike |
MITRE ATT&CK threat actors and software |
| CVE | cve-2024-1234 |
Specific vulnerability identifiers |
Attack Flow¶
An ordered sequence of attack phases (when applicable). Each step includes:
{
"phase": "Initial Access",
"title": "Spearphishing with macro-enabled document",
"description": "Attacker sends targeted email with weaponized DOCX...",
"technique": "T1566.001"
}
See Attack Flow for details on the visualization.
Categorization¶
Tags drive the categorization system. Each tag is mapped to one of 9 broad threat categories using keyword rules:
| Category | Matching Tags |
|---|---|
| Malware | malware, trojan, backdoor, ransomware, infostealer, rootkit, wiper, loader, dropper, RAT, and 9 known RaaS groups |
| Vulnerabilities | CVE, exploit, RCE, zero-day, patch, privilege-escalation, buffer-overflow |
| Threat Actors | APT, campaign, nation-state, and 15+ named groups (lazarus, apt29, etc.) |
| Data Leaks | breach, data-leak, exfiltration, credential-dump |
| Phishing & Social Engineering | phishing, BEC, spearphishing, credential-theft, social-engineering |
| Supply Chain | supply-chain, dependency-confusion, typosquatting, npm, pypi |
| Botnet & DDoS | botnet, ddos, mirai, amplification |
| C2 & Offensive Tooling | cobalt-strike, metasploit, sliver, brute-ratel, C2 |
| IoT & Hardware | firmware, scada, ics, industrial, embedded |
Articles with tags matching multiple categories appear in each relevant category.
Subcategory Drill-Down¶
Three categories support entity-level drill-down:
- Threat Actors — Named groups (APT29, Lazarus, Turla, etc.)
- Malware — Families (Emotet, LockBit, QakBot, etc.)
- C2 & Offensive Tooling — Tools (Cobalt Strike, Sliver, etc.)
Entity names are matched against a MITRE ATT&CK lookup table containing 100+ threat actor groups and 200+ malware families. Only known MITRE entities become subcategories; generic tags are grouped under "General."
Entity Normalization¶
Versioned variants are canonicalized to their base name:
| Raw Tag | Normalized |
|---|---|
lockbit-3.0 |
lockbit |
lockbit-2.0 |
lockbit |
apt-29 |
apt29 |
This ensures articles about the same entity are grouped together regardless of version references.
Processing Details¶
Batch Processing¶
The summarize_pending() function processes up to 10 unsummarized articles per batch. Each article is summarized individually with its own API call.
Content Handling¶
- Article content is truncated to 12,000 characters before being sent to the LLM
- The summarizer uses
response_format: json_objectfor reliable structured output - Temperature is set to 0.3 for consistent, factual summaries
Retry Logic¶
- Up to 3 retries on rate limit or transient API errors
- OpenAI: Exponential backoff: 4 s, 8 s, 16 s between retries
- Anthropic: Exponential backoff starting at 10 s, doubling to a maximum of 120 s; honours the
Retry-Afterresponse header when provided by the API - Failed articles are marked with
model_used="failed"and skipped on subsequent runs; failed count is shown in the sidebar statistics
Model Configuration¶
The summarizer uses whichever model is set in config.json under openai_model (OpenAI provider) or anthropic_model (Anthropic provider). Token limits:
| Operation | Max Tokens | Temperature |
|---|---|---|
| Relevance check | 300 | 0.0 |
| Article summary | 2,500 | 0.3 |
| Category insight | 2,000 | 0.4 |
Quality vs. Cost
gpt-4o-mini handles most articles well. For articles requiring deeper analysis (APT campaigns, complex exploit chains), gpt-4o produces noticeably better attack flows and novelty assessments.