Feed Aggregation
Threat Loom automatically collects threat intelligence articles from RSS/Atom feeds and the Malpedia research library. A built-in LLM relevance filter screens incoming articles so that only genuine threat intelligence enters the database.
Supported Formats
- RSS 2.0 — Standard syndication format used by most security blogs
- Atom — Alternative feed format, fully supported
The feed parser uses `feedparser` with a `requests`-based fallback for sites that block feed readers.
Pre-Configured Sources
Threat Loom ships with 13 curated cybersecurity feeds:
| # | Source | Type | Default |
|---|---|---|---|
| 1 | The Hacker News | News | Enabled |
| 2 | BleepingComputer | News | Enabled |
| 3 | Krebs on Security | Blog | Enabled |
| 4 | SecurityWeek | News | Enabled |
| 5 | Dark Reading | News | Enabled |
| 6 | CISA Alerts | Government | Enabled |
| 7 | Sophos News | Vendor Research | Enabled |
| 8 | Infosecurity Magazine | News | Enabled |
| 9 | HackRead | News | Enabled |
| 10 | SC Media | News | Disabled |
| 11 | Cyber Defense Magazine | News | Disabled |
| 12 | The Record | News | Enabled |
| 13 | Schneier on Security | Blog | Enabled |
How Fetching Works
RSS/Atom Pipeline
- Iterate enabled feeds — Each feed is processed sequentially
- Download feed XML — `requests` with a feed-reader User-Agent (20-second timeout)
- Parse entries — Extract title, URL, author, published date, image
- Date filtering — Skip articles older than the lookback period (default: 1 day)
- Skip file URLs — Ignore links to PDFs, DOCs, ZIPs, and other non-web content
- Deduplication — Skip articles whose URL is already in the database
- Relevance filtering — Batch-classify titles via LLM (see below)
- Insert — Store relevant articles in the database
- Update timestamp — Record `last_fetched` for the source
Lookback Period
When refreshing, you can specify a lookback window:
- By days — Fetch articles published within the last N days (default: 1)
- Since last fetch — Only fetch articles newer than each source's `last_fetched` timestamp
The scheduler uses the default 1-day lookback. Manual refreshes allow custom lookback periods via the UI.
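The two lookback modes reduce to computing a single cutoff date. A minimal sketch, assuming timezone-aware datetimes (the function names are hypothetical):

```python
from datetime import datetime, timedelta, timezone

def lookback_cutoff(days=1, last_fetched=None):
    """Oldest publish date to accept. In "since last fetch" mode the
    source's last_fetched timestamp is the cutoff; otherwise go back
    `days` from now (default: 1, matching the scheduler)."""
    if last_fetched is not None:
        return last_fetched
    return datetime.now(timezone.utc) - timedelta(days=days)

def is_fresh(published, cutoff):
    """Keep an article only if it was published on or after the cutoff."""
    return published >= cutoff
```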
Relevance Filtering
Not all articles from security feeds are threat intelligence. Many are product announcements, opinion pieces, or general IT news. Threat Loom uses an LLM to classify relevance:
- Batch titles — Up to 25 article titles per LLM call
- Classify — The model labels each as `RELEVANT` or `IRRELEVANT` based on threat research value
- Filter — Only articles classified as relevant are inserted
No API Key Fallback
If no OpenAI API key is configured, relevance filtering is skipped and all articles are accepted. This allows basic operation without an API key, though the database will contain more noise.
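The batching and fallback behavior can be sketched like this. The `llm_classify` callable is a stand-in for the real OpenAI call (not Threat Loom's actual interface); passing `None` models the no-key case:

```python
BATCH_SIZE = 25  # titles per LLM call, per the pipeline description

def filter_relevant(titles, llm_classify=None):
    """Return the titles the classifier labels RELEVANT.
    `llm_classify` maps a list of up to 25 titles to a list of
    "RELEVANT"/"IRRELEVANT" labels. When it is None (no OpenAI API key
    configured), filtering is skipped and every title is accepted."""
    if llm_classify is None:
        return list(titles)  # no-key fallback: accept everything
    relevant = []
    for i in range(0, len(titles), BATCH_SIZE):
        batch = titles[i:i + BATCH_SIZE]
        labels = llm_classify(batch)
        relevant.extend(t for t, label in zip(batch, labels) if label == "RELEVANT")
    return relevant
```

Batching 25 titles per call keeps token usage low compared to classifying each article individually.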
Malpedia Integration
Malpedia is a curated repository of threat research maintained by Fraunhofer FKIE. Threat Loom integrates with it as an additional article source.
How It Works
- Fetch BibTeX — Download the full bibliography from `/api/get/bib` (~4.5 MB, 60-second timeout)
- Parse entries — Extract title, URL, author, organization, and date using regex
- Date filter — Same lookback logic as RSS feeds
- Relevance check — Same LLM batch classification
- Insert — Store as articles with source "Malpedia"
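The regex parsing step might look roughly like this. The patterns below are a simplified sketch (they do not handle nested braces or all BibTeX field syntax) and are not Threat Loom's actual expressions:

```python
import re

# One BibTeX entry: "@type{key," up to the closing "}" on its own line.
ENTRY_RE = re.compile(r"@\w+\{[^,]+,(.*?)\n\}", re.DOTALL)
# One "field = {value}" pair inside an entry.
FIELD_RE = re.compile(r"(\w+)\s*=\s*\{([^}]*)\}")

def parse_bibtex(text):
    """Extract article dicts from a BibTeX bibliography; entries
    without both a title and a URL are skipped."""
    articles = []
    for m in ENTRY_RE.finditer(text):
        fields = dict(FIELD_RE.findall(m.group(1)))
        if "title" in fields and "url" in fields:
            articles.append({
                "title": fields["title"],
                "url": fields["url"],
                "author": fields.get("author", ""),
                "date": fields.get("date", ""),
            })
    return articles
```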
Setup
A Malpedia API key is required. See Configuration for setup instructions.
Adding Custom Feeds
You can add any RSS or Atom feed as a source. See Configuration — Adding Custom Feeds for instructions.
Finding Feed URLs
Most security blogs and news sites offer RSS feeds. Common URL patterns:
- `/feed/` or `/rss/`
- `/feed.xml` or `/rss.xml`
- Check the page source for `<link rel="alternate" type="application/rss+xml">`
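The page-source check can be automated with the standard library's HTML parser. A sketch under the assumption that the `rel` attribute is exactly `alternate` (real pages sometimes use multi-token `rel` values):

```python
from html.parser import HTMLParser

class FeedLinkFinder(HTMLParser):
    """Collect href values from <link rel="alternate"> tags that
    advertise an RSS or Atom feed."""
    FEED_TYPES = ("application/rss+xml", "application/atom+xml")

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and a.get("rel") == "alternate" and a.get("type") in self.FEED_TYPES:
            self.feeds.append(a.get("href"))

def discover_feeds(html):
    """Return all advertised feed URLs found in an HTML page."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return finder.feeds
```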
URL Ingestion
In addition to scheduled feed fetching, you can process specific article URLs on demand using the Ingest URLs button in the dashboard header.
How It Works
- Click Ingest URLs in the header
- Paste one URL per line in the dialog
- Click Process — the pipeline runs scrape → cost gate → summarize → embed for each new URL
URLs already in the database are skipped. Invalid schemes (non-http/https) are rejected. The job runs in the background using the same pipeline lock as a full refresh.
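The validation described above (one URL per line, scheme check, database dedup) can be sketched as follows; the function name and return shape are illustrative:

```python
from urllib.parse import urlparse

def validate_urls(text, known_urls):
    """Split dialog input (one URL per line) into (accepted, skipped).
    Non-http(s) schemes are rejected, and URLs already in the
    database (`known_urls`, a set) are skipped."""
    accepted, skipped = [], []
    for line in text.splitlines():
        url = line.strip()
        if not url:
            continue  # ignore blank lines
        if urlparse(url).scheme not in ("http", "https") or url in known_urls:
            skipped.append(url)
        else:
            accepted.append(url)
    return accepted, skipped
```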
API
This is equivalent to adding a one-off article source without modifying your feed list.
Skipped Content
The fetcher automatically skips URLs pointing to non-web content:
- Documents: `.pdf`, `.doc`, `.docx`
- Spreadsheets: `.xls`, `.xlsx`
- Archives: `.zip`, `.tar`, `.gz`
- Executables: `.exe`, `.msi`
- Images and other binary formats
These URLs cannot be scraped for text content and are filtered out during ingestion.
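This filter amounts to an extension check on the URL path. A minimal sketch; the image extensions listed are an illustrative subset, not the app's exact list:

```python
from urllib.parse import urlparse

SKIP_EXTENSIONS = {
    ".pdf", ".doc", ".docx",          # documents
    ".xls", ".xlsx",                  # spreadsheets
    ".zip", ".tar", ".gz",            # archives
    ".exe", ".msi",                   # executables
    ".png", ".jpg", ".jpeg", ".gif",  # images (illustrative subset)
}

def is_file_url(url):
    """True if the URL path ends in a non-web file extension.
    Only the path is checked, so query strings do not trigger a match."""
    path = urlparse(url).path.lower()
    return any(path.endswith(ext) for ext in SKIP_EXTENSIONS)
```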