Security Audit Report¶

Date: 2026-02-22 (updated) Scope: Threat Loom application — staged-file secret scanning, dependency vulnerabilities, static code analysis, secret scanning, container image security Tools: gitleaks 8.30.0, pip-audit 2.10.0, bandit 1.9.3, detect-secrets 1.5.0, Docker Scout (Docker 29.2.0)

Change Log¶

Date	Change
2026-02-14	Initial audit
2026-02-19	Re-audit after feature additions (Anthropic provider, time-period filter, cost estimation, URL validation, sidebar logo). Added gitleaks staged-file scan.
2026-02-22 (v3)	Re-audit after feature additions (generate-embeddings button, abort pipeline, clear DB by period, ingest URLs modal, LLM provider-aware API key check, sidebar stats redesign, failed-summary count, Anthropic rate-limit backoff, intelligence system prompt guardrails).
2026-02-22 (v4)	Re-audit after bug fixes and new features: fix "refresh since last fetch" returning 0 articles (web: no issue found; Android: DateUtils.parseIso SQLite format mismatch + FetchMalpediaUseCase missing lastFetched logic); add time-period filtering to Intelligence RAG (`get_article_ids_since_days`, `semantic_search(since_days)`, auto-detection of natural-language time references in `intelligence.py`).

Summary¶

Scan	Tool	Findings	Critical / High	Status
Staged-file secret scan	gitleaks 8.30.0	0	0 / 0	Pass
Dependency vulnerabilities	pip-audit	1	0 / 0	Remediated (unchanged)
Static code analysis	bandit	15	0 / 0	Reviewed — all false positives or acceptable patterns
Secret scanning	detect-secrets	6	—	Reviewed — all documentation placeholders
Container image	Docker Scout (Docker 29.2.0)	21	0 / 0	Accepted — all in OS base image, no fixes available

Overall assessment: No critical or high-severity issues. No secrets leaked in staged files. One medium-severity dependency CVE (pip) remains remediated in Docker. All static analysis findings are false positives or intentional exception-handling patterns. All detect-secrets findings are example/placeholder values in documentation files. Container image findings are unchanged (21 OS-level vulnerabilities, no application-level vulnerabilities).

The four modified Python files (app.py, database.py, embeddings.py, intelligence.py) introduced zero new bandit findings. The new get_article_ids_since_days() helper uses exclusively parameterized queries (no f-string SQL interpolation of user data) and is therefore clean. Line numbers in existing findings shifted by approximately +58 in database.py due to the new function insertion.

0. Staged-File Secret Scan (gitleaks)¶

Command:

D:/gitleaks/gitleaks.exe protect --staged --verbose

Staged files scanned: 4 files (~4.32 KB)

Result: No leaks found.

Gitleaks scanned the following staged files for API keys, tokens, credentials, and other secrets:

File	Description
`app.py`	Flask routes — added `since_days` parameter to `/api/intelligence/chat`
`database.py`	SQLite data layer — new `get_article_ids_since_days()` helper
`embeddings.py`	Semantic search — time-period filtering via `since_days`
`intelligence.py`	RAG chat — `_extract_since_days()` NLP helper and `since_days` propagation

No secrets were detected in any of the staged files. All API keys and credentials continue to be stored exclusively in data/config.json (excluded from git and Docker image).

1. Dependency Vulnerability Scan (pip-audit)¶

Command:

pip-audit --format json

Total packages scanned: 91 (unchanged)

Findings¶

Package	Version	CVE	Severity	Description	Status
pip	25.3	CVE-2026-1703	Medium	Path traversal when extracting maliciously crafted wheel archives. Files may be extracted outside the installation directory, limited to prefixes of the install path.	Remediated in Docker

No new CVEs were introduced by any of the newly added code. All runtime dependencies remain clean.

Remediation¶

Dockerfile: pip install --upgrade pip runs before installing dependencies, ensuring the container always uses the latest patched pip.
Local development: Developers should run pip install --upgrade pip in their virtual environment.
Note: pip is a build-time tool, not a runtime dependency. This CVE requires a maliciously crafted wheel to be installed, making exploitation unlikely in normal operation.

Clean packages (notable runtime deps)¶

Package	Version	Status
flask	3.1.2	Clean
openai	2.20.0	Clean
anthropic	0.82.0	Clean
requests	2.32.5	Clean
trafilatura	2.0.0	Clean
numpy	2.4.2	Clean
jinja2	3.1.6	Clean
werkzeug	3.1.5	Clean
lxml	6.0.2	Clean
certifi	2026.1.4	Clean
urllib3	2.6.3	Clean

2. Static Code Analysis (bandit)¶

Command:

bandit -r . --exclude ./venv,./site,./__pycache__,./android,./docs -f json

Total lines scanned: 5,049 across 14 files (up from 4,964 — 85 new lines across the four modified files and the new get_article_ids_since_days helper)

Findings¶

B608 — SQL Injection (11 instances in `database.py`) — FALSE POSITIVE¶

Location	Severity	Confidence	Context
database.py:621	Medium	Medium	`get_articles_by_ids()` — `IN ({ph})` placeholder construction
database.py:972	Medium	Medium	`get_categorized_articles()` — date_filter f-string
database.py:1065	Medium	Medium	`get_articles_for_category()` — date_filter f-string
database.py:1204	Medium	Medium	`delete_file_url_articles()` — `IN ({ph})` placeholder
database.py:1205	Medium	Medium	`delete_file_url_articles()` — `IN ({ph})` placeholder
database.py:1207	Medium	Medium	`delete_file_url_articles()` — `IN ({ph})` placeholder
database.py:1210	Medium	Medium	`delete_file_url_articles()` — `IN ({ph})` placeholder
database.py:1255	Medium	Medium	`clear_articles_before_days()` — `IN ({ph})` placeholder
database.py:1256	Medium	Medium	`clear_articles_before_days()` — `IN ({ph})` placeholder
database.py:1258	Medium	Medium	`clear_articles_before_days()` — `IN ({ph})` placeholder
database.py:1262	Medium	Medium	`clear_articles_before_days()` — `IN ({ph})` placeholder

Line numbers shifted by ~+58 from the previous audit due to the new get_article_ids_since_days() function (~35 lines) inserted before get_articles_by_ids(), plus minor additions in other modified files.

Analysis:

All eleven flagged lines fall into two safe patterns:

Pattern A — Dynamic IN clause (lines 621, 1204–1210, 1255–1262):

ph = ",".join("?" * len(ids))
conn.execute(f"DELETE FROM articles WHERE id IN ({ph})", ids)

The f-string interpolates only the ? placeholder string (e.g., ?,?,?), never user data. Values are passed via SQLite's parameterized binding. This is the standard safe pattern for dynamic IN clauses in SQLite.

Pattern B — Optional date filter (lines 972, 1065):

date_filter = ""
if since_days:
    date_filter = " AND date(a.published_date) >= date('now', ?)"
    params.append(f"-{since_days} days")
rows = conn.execute(f"...{date_filter}...", params)

The {date_filter} interpolated into the query is a hardcoded SQL string constant, not user input. The user-supplied since_days is cast to int in the Flask route (type=int) and then formatted as -N days — passed as a parameterized value to SQLite, never interpolated into the SQL string.

New function get_article_ids_since_days() (lines ~568–600): zero bandit findings. This function uses exclusively ?-parameterized queries without any f-string SQL interpolation of user data:

rows = conn.execute(
    "SELECT ae.article_id FROM article_embeddings ae "
    "JOIN articles a ON a.id = ae.article_id "
    "WHERE ae.model_used = ? "
    "AND date(COALESCE(a.published_date, a.fetched_date)) >= date('now', ?)",
    (model_used, cutoff),
).fetchall()

No user-controllable input reaches the SQL string in any of these locations.

Verdict: False positives. No fix required.

B112 — Try/Except/Continue (1 instance in `database.py`) — ACCEPTABLE¶

Location	Severity	Confidence
database.py:1197	Low	High

Analysis: The code catches exceptions during urlparse() on article URLs stored in the database and continues to the next row. This is intentional defensive coding to skip malformed URLs without crashing the cleanup function delete_file_url_articles(). Line shifted from 1139 (previous audit) due to new function insertion.

Verdict: Acceptable pattern. No fix required.

B105 — Hardcoded Password (2 instances in `config.py`) — FALSE POSITIVE¶

Location	Severity	Confidence	Context
config.py:54	Low	Medium	`"smtp_password": ""`
config.py:58	Low	Medium	`"report_token": ""`

Analysis: Both flags are empty string defaults in the configuration template. Neither contains an actual credential — both are placeholders that prompt the user to configure their own value. Real values are stored in data/config.json (excluded from git and Docker image) or passed via environment variables. Unchanged from previous audit.

Verdict: False positives. No fix required.

B110 — Try/Except/Pass (1 instance in `llm_client.py`) — ACCEPTABLE¶

Location	Severity	Confidence
llm_client.py:136	Low	High

Analysis:

try:
    ra = getattr(getattr(e, "response", None), "headers", {}).get("retry-after")
    if ra:
        wait = max(int(ra), retry_delay)
except Exception:
    pass

The except Exception: pass is intentional and narrowly scoped: if parsing the Retry-After response header fails for any reason (missing attribute, non-integer value, malformed header), the code silently falls back to the default retry_delay value. Unchanged from previous audit.

Verdict: Acceptable pattern. No fix required.

Files with zero findings¶

File	Lines
app.py	695
article_scraper.py	127
cost_tracker.py	46
embeddings.py	161
feed_fetcher.py	209
intelligence.py	201
malpedia_fetcher.py	169
mitre_data.py	957
notifier.py	172
scheduler.py	320
summarizer.py	650

intelligence.py grew from 161 to 201 LOC (new _extract_since_days() regex helper and chat() parameter changes) with zero new bandit findings. embeddings.py grew from 153 to 161 LOC with zero new bandit findings.

3. Secret Scanning (detect-secrets)¶

Command:

detect-secrets scan \
  --exclude-files 'venv/.*' \
  --exclude-files 'android/.*' \
  --exclude-files '.*\.(jpg|jpeg|png|ico|gif|svg)$'

Result: 6 findings across 3 files — all documentation placeholder values.

File	Line	Type	Content	Verdict
`docs/api-reference.md`	593	Secret Keyword	`"openai_api_key": "sk-proj-..."`	False positive — example placeholder in API docs (line shifted from 485 due to new endpoint documentation added above)
`docs/api-reference.md`	611	Secret Keyword	`"smtp_password": "app-password"`	False positive — label string in API docs (line shifted from 503)
`docs/getting-started.md`	75	Secret Keyword	`"openai_api_key": "sk-proj-your-key-here"`	False positive — placeholder in getting-started guide (unchanged)
`docs/security-audit.md`	—	Secret Keyword	(same hash as api-reference.md:593)	False positive — this audit document quoting previous findings
`docs/security-audit.md`	—	Secret Keyword	(same hash as api-reference.md:611)	False positive — this audit document quoting previous findings
`docs/security-audit.md`	—	Secret Keyword	(same hash as getting-started.md:75)	False positive — this audit document quoting previous findings

Line numbers in docs/security-audit.md are not fixed in this table because they depend on the final line count of this document after editing. The three findings are self-referential: this audit document quotes the same placeholder strings from the api-reference and getting-started findings tables above.

The 2 findings in docs/api-reference.md shifted from lines 485/503 to 593/611 (+108 lines) because new endpoint documentation for /api/intelligence/chat since_days parameter and related endpoints was added above the Settings section.

All 6 findings are in documentation files and contain clearly non-secret placeholder text. The sk-proj-... and sk-proj-your-key-here strings are not valid OpenAI API keys (real keys have 51+ character random suffixes). The string app-password is a generic label used in documentation examples, not a real credential.

No secrets detected in application source code, templates, or static assets.

The scan checks for:

AWS keys, Azure storage keys
OpenAI API keys, GitHub/GitLab tokens
High-entropy Base64/hex strings
JWT tokens, private keys
Basic auth credentials
Slack, Stripe, Twilio, SendGrid, Telegram tokens
Artifactory, Cloudant, IBM Cloud, Mailchimp, npm, PyPI tokens

Design notes:

API keys and SMTP credentials are stored in data/config.json (excluded from Docker image via .dockerignore) or passed via environment variables
No secrets are hardcoded in source code
The config.json template in docs uses placeholder values (sk-proj-your-key-here)
SMTP passwords are stored in plaintext in config.json, consistent with the existing pattern for API keys. For Docker deployments, prefer environment variables (SMTP_PASSWORD)

4. Container Image Scan (Docker Scout)¶

Command:

docker build -t threat-loom .
docker scout cves threat-loom

Scanner: Docker Scout (Docker 29.2.0) Image: threat-loom:latest (124 MB, 193 packages indexed) Base image: python:3.13-slim (Debian Trixie 13)

Container image was not rebuilt in this iteration (no changes to requirements.txt, Dockerfile, or base image). Results are unchanged from the 2026-02-22 (v3) audit.

Summary¶

Severity	Count	Change from v3
Critical	0	—
High	0	—
Medium	1	Unchanged
Low	20	Unchanged
Total	21	Unchanged

Findings¶

All 21 vulnerabilities are in Debian OS base packages shipped with python:3.13-slim. None are in application code or Python dependencies. All have no fix available from Debian upstream.

Package	Version	CVEs	Severity	Notes
tar	1.35+dfsg-3.1	CVE-2025-45582	Medium	Exploiting requires crafting malicious tar archives; application does not process tar files
tar	1.35+dfsg-3.1	CVE-2005-2541	Low	Permissions-preserving behavior; known for 20+ years, considered by-design
glibc	2.41-12+deb13u1	7 CVEs	Low	CVE-2019-9192, CVE-2019-1010025, CVE-2019-1010024, CVE-2019-1010023, CVE-2019-1010022, CVE-2018-20796, CVE-2010-4756 — all disputed or negligible impact
systemd	257.9-1~deb13u1	4 CVEs	Low	CVE-2023-31439, CVE-2023-31438, CVE-2023-31437, CVE-2013-4392 — container does not use systemd as init
coreutils	9.7-3	2 CVEs	Low	CVE-2025-5278, CVE-2017-18018 — local attack vector only
shadow	1:4.17.4-2	CVE-2007-5686	Low	Informational only
openssl	3.5.4-1~deb13u2	CVE-2010-0928	Low	Theoretical cache-timing attack; not practically exploitable
apt	3.0.3	CVE-2011-3374	Low	Repository signature validation edge case
perl	5.40.1-6	CVE-2011-4116	Low	Temp file race condition in File::Temp; application does not use Perl
util-linux	2.41-5	CVE-2022-0563	Low	chfn/chsh READLINE variable disclosure; not relevant in containers
sqlite3	3.46.1-7	CVE-2021-45346	Low	Disputed; considered a non-issue by SQLite maintainers

Risk assessment¶

No critical or high vulnerabilities in the container image
The single medium CVE (tar) requires an attacker to supply a malicious tar archive for extraction — the application does not process tar files at runtime
All 20 low CVEs are in OS packages with no upstream fixes and negligible practical risk in a containerized Python web application
No Python package vulnerabilities detected in the container (0 findings in application layer)
No new vulnerabilities introduced by application code changes in this iteration

Container hardening applied¶

Measure	Details
Minimal base image	`python:3.13-slim` — reduced attack surface vs full image
Non-root user	`appuser` created and set via `USER` directive
No cached packages	`pip install --no-cache-dir` — no pip cache in image
pip upgraded	Latest pip installed to avoid CVE-2026-1703
No secrets baked in	`config.json`, `*.db`, and `data/` excluded via `.dockerignore`
Minimal file copy	Only `.py`, `templates/`, `static/`, and `requirements.txt` copied

Alternative: Alpine-based image¶

Docker Scout recommends python:3.14-alpine as an alternative base image, which would reduce vulnerabilities from 21 to 2 (1M, 1L) and shrink image size significantly. Trade-off: Alpine uses musl libc instead of glibc, which may cause compatibility issues with some compiled Python packages (e.g., numpy). This should be evaluated in a future iteration.

5. Recommendations¶

Immediate (completed)¶

Upgrade pip in Dockerfile to fix CVE-2026-1703
Run container as non-root user
Exclude secrets and data files from Docker image via .dockerignore
Support API keys via environment variables (no hardcoding needed)
Validate feed URLs (http/https only) client-side and server-side before saving
Validate article URLs before rendering as href links (prevent javascript: / data: injection)

Ongoing¶

Re-run pip-audit periodically or add it to CI to catch new CVEs in dependencies
Add gitleaks protect --staged as a git pre-commit hook to catch secrets before every commit
Create a .secrets.baseline file for detect-secrets to formally track the documentation false positives and prevent re-alerting on them
Pin dependency versions in requirements.txt for reproducible builds (currently using >= minimum version constraints)
Consider adding bandit to a pre-commit hook or CI pipeline
Consider encrypting SMTP credentials at rest in config.json (currently plaintext, matching the existing API key pattern)
Evaluate Alpine-based Docker image to reduce OS-level CVE surface

Appendix: Tool Versions¶

Tool	Version
gitleaks	8.30.0
pip-audit	2.10.0
bandit	1.9.3
detect-secrets	1.5.0
Python	3.13
Docker	29.2.0
Docker Scout	built-in
pip	25.3 (26.0.1 in Docker)

Security Audit Report¶

Change Log¶

Summary¶

0. Staged-File Secret Scan (gitleaks)¶

1. Dependency Vulnerability Scan (pip-audit)¶

Findings¶

Remediation¶

Clean packages (notable runtime deps)¶

2. Static Code Analysis (bandit)¶

Findings¶

B608 — SQL Injection (11 instances in database.py) — FALSE POSITIVE¶

B112 — Try/Except/Continue (1 instance in database.py) — ACCEPTABLE¶

B105 — Hardcoded Password (2 instances in config.py) — FALSE POSITIVE¶

B110 — Try/Except/Pass (1 instance in llm_client.py) — ACCEPTABLE¶

Files with zero findings¶

3. Secret Scanning (detect-secrets)¶

4. Container Image Scan (Docker Scout)¶

Summary¶

Findings¶

Risk assessment¶

Container hardening applied¶

Alternative: Alpine-based image¶

5. Recommendations¶

Immediate (completed)¶

Ongoing¶

Appendix: Tool Versions¶

B608 — SQL Injection (11 instances in `database.py`) — FALSE POSITIVE¶

B112 — Try/Except/Continue (1 instance in `database.py`) — ACCEPTABLE¶

B105 — Hardcoded Password (2 instances in `config.py`) — FALSE POSITIVE¶

B110 — Try/Except/Pass (1 instance in `llm_client.py`) — ACCEPTABLE¶