AI Summarization¶

Every ingested article is processed by the configured LLM provider (OpenAI or Anthropic) to produce a structured intelligence summary. Summaries include an executive overview, novelty assessment, technical details, defensive mitigations, tags, and an attack flow sequence.

Summary Structure¶

The summarizer requests a structured JSON response with these fields:

Executive Summary¶

A 3-5 sentence paragraph covering the core intelligence — what happened, who is involved, what's affected, and why it matters.

Novelty Assessment¶

What's new or noteworthy about the threat. This highlights novel tactics, techniques, tooling, or targeting that distinguish this from routine activity. The novelty text is stored both within the markdown summary and as a separate novelty_notes field in the database, enabling the "What's Notable" section on the article detail page.

Technical Details¶

An array of bullet points capturing:

Indicators of Compromise (IOCs)
CVE identifiers
Affected systems and versions
Attack chain steps
Malware capabilities
Infrastructure details

Mitigations¶

Actionable defensive recommendations derived from the article content.

Tags¶

3-8 lowercase, hyphenated tags for categorization:

Tag Type	Examples	Purpose
Category	`ransomware`, `phishing`, `vulnerability`	Maps to broad threat categories
Entity	`apt29`, `emotet`, `cobalt-strike`	MITRE ATT&CK threat actors and software
CVE	`cve-2024-1234`	Specific vulnerability identifiers

Attack Flow¶

An ordered sequence of attack phases (when applicable). Each step includes:

{
  "phase": "Initial Access",
  "title": "Spearphishing with macro-enabled document",
  "description": "Attacker sends targeted email with weaponized DOCX...",
  "technique": "T1566.001"
}

See Attack Flow for details on the visualization.

Categorization¶

Tags drive the categorization system. Each tag is mapped to one of 9 broad threat categories using keyword rules:

Category	Matching Tags
Malware	malware, trojan, backdoor, ransomware, infostealer, rootkit, wiper, loader, dropper, RAT, and 9 known RaaS groups
Vulnerabilities	CVE, exploit, RCE, zero-day, patch, privilege-escalation, buffer-overflow
Threat Actors	APT, campaign, nation-state, and 15+ named groups (lazarus, apt29, etc.)
Data Leaks	breach, data-leak, exfiltration, credential-dump
Phishing & Social Engineering	phishing, BEC, spearphishing, credential-theft, social-engineering
Supply Chain	supply-chain, dependency-confusion, typosquatting, npm, pypi
Botnet & DDoS	botnet, ddos, mirai, amplification
C2 & Offensive Tooling	cobalt-strike, metasploit, sliver, brute-ratel, C2
IoT & Hardware	firmware, scada, ics, industrial, embedded

Articles with tags matching multiple categories appear in each relevant category.

Subcategory Drill-Down¶

Three categories support entity-level drill-down:

Threat Actors — Named groups (APT29, Lazarus, Turla, etc.)
Malware — Families (Emotet, LockBit, QakBot, etc.)
C2 & Offensive Tooling — Tools (Cobalt Strike, Sliver, etc.)

Entity names are matched against a MITRE ATT&CK lookup table containing 100+ threat actor groups and 200+ malware families. Only known MITRE entities become subcategories; generic tags are grouped under "General."

Entity Normalization¶

Versioned variants are canonicalized to their base name:

Raw Tag	Normalized
`lockbit-3.0`	`lockbit`
`lockbit-2.0`	`lockbit`
`apt-29`	`apt29`

This ensures articles about the same entity are grouped together regardless of version references.

Processing Details¶

Batch Processing¶

The summarize_pending() function processes up to 10 unsummarized articles per batch. Each article is summarized individually with its own API call.

Content Handling¶

Article content is truncated to 12,000 characters before being sent to the LLM
The summarizer uses response_format: json_object for reliable structured output
Temperature is set to 0.3 for consistent, factual summaries

Retry Logic¶

Up to 3 retries on rate limit or transient API errors
OpenAI: Exponential backoff: 4 s, 8 s, 16 s between retries
Anthropic: Exponential backoff starting at 10 s, doubling to a maximum of 120 s; honours the Retry-After response header when provided by the API
Failed articles are marked with model_used="failed" and skipped on subsequent runs; failed count is shown in the sidebar statistics

Model Configuration¶

The summarizer uses whichever model is set in config.json under openai_model (OpenAI provider) or anthropic_model (Anthropic provider). Token limits:

Operation	Max Tokens	Temperature
Relevance check	300	0.0
Article summary	2,500	0.3
Category insight	2,000	0.4

Quality vs. Cost

gpt-4o-mini handles most articles well. For articles requiring deeper analysis (APT campaigns, complex exploit chains), gpt-4o produces noticeably better attack flows and novelty assessments.