Product Antifraud -- Log-Based Fraud Detection

Aspect Detail

Purpose Rule-based fraud detection from application logs (registration + auth flows)

Approach Pure Python + pandas -- counting, grouping, threshold problems

Not for ML classification -- intended for moderate volumes (~50K entries/day)

Outputs Markdown report + CSV alerts for security teams

When to Use This Skill

Task This Skill Applies

Building fraud detection scripts for registration or auth logs Yes

Analyzing K8s application logs for suspicious behavioral patterns Yes

Detecting bots, credential stuffing, or velocity abuse from structured logs Yes

Auditing logs for GDPR PII exposure (unmasked emails, phones, names) Yes

Designing tunable threshold-based rule engines with JSON config Yes

Reviewing or extending existing antifraud detection rules Yes

Building fraud alerting reports (Markdown + CSV) for security teams Yes

ML-based fraud scoring (real-time model inference) No -- use ai-ml-data-science

Application security hardening (OWASP, auth implementation) No -- use software-security-appsec

Infrastructure log analysis (access logs, firewall, WAF) No -- use ops-devops-platform

Real-time streaming fraud detection No -- use data-lake-platform

Quick-Start Checklist

Step Action Notes

1 Identify log type Registration (.txt.gz / .debug.gz) or auth (.log / .log.gz)

2 Create directory structure config/, reports/, script file

3 Build LogParser Match the timestamp format per log type: registration uses a comma (,) before milliseconds, auth uses a period (.)

4 Implement SessionAggregator pandas groupby for key dimensions (token, IP, device, email)

5 Create JSON config Default thresholds (see Configuration Pattern below)

6 Implement velocity rules R1-R12 or A1-A13 -- highest signal-to-noise ratio

7 Add bot detection rules R13-R17 or A14-A19

8 Add behavioral analysis rules R18-R22 or A20-A25

9 Enable PIIScanner GDPR compliance pass on all log lines

10 Test in discover mode --mode discover against example data

11 Tune thresholds Reduce false positives, verify known fraud patterns surface

12 Cross-node correlation Merge by token/session_id before aggregation
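Step 3's timestamp distinction can be sketched as follows. This is a minimal illustration, not the skill's actual parser: the source only states which separator precedes the milliseconds, so the date and time layout in `TIMESTAMP_FORMATS` and the name `parse_timestamp` are assumptions.

```python
from datetime import datetime

# Assumed layouts: the source only specifies the millisecond separator
# (comma for registration, period for auth). The date/time portion is hypothetical.
TIMESTAMP_FORMATS = {
    "registration": "%Y-%m-%d %H:%M:%S,%f",
    "auth": "%Y-%m-%d %H:%M:%S.%f",
}


def parse_timestamp(raw: str, log_type: str) -> datetime:
    """Parse a timestamp string with the format registered for log_type."""
    try:
        fmt = TIMESTAMP_FORMATS[log_type]
    except KeyError:
        raise ValueError(f"unknown log type: {log_type!r}") from None
    # strptime raises ValueError if the separator does not match the log type.
    return datetime.strptime(raw.strip(), fmt)


reg_ts = parse_timestamp("2025-01-15 10:30:00,123", "registration")
auth_ts = parse_timestamp("2025-01-15 10:30:00.123", "auth")
```

Keeping one format per log type means a line routed to the wrong parser fails loudly instead of silently losing its milliseconds.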

Quick Reference

Architecture (4 Layers)

Every fraud detection script follows this pattern:

    Log Files (.gz, .log)
        |
        v
    [1] LogFileReader       -- Walk dirs, handle .gz decompression, iterate lines
        |
        v
    [2] LogParser           -- Regex extraction -> dataclass (RegistrationEvent / AuthEvent)
        |
        v
    [3] SessionAggregator   -- Group by token/IP/device/email, compute features
        |
        v
    [4] RuleEngine + Report -- Evaluate rules, produce Markdown + CSV alerts
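Layer 1 of the pipeline above might look like the sketch below. The function name `iter_log_lines` and its `suffixes` parameter are hypothetical, and the real reader is described in references/log-parser-architecture.md. The sketch only shows one way to walk a directory and read `.gz` and plain files transparently using the standard library.

```python
import gzip
from pathlib import Path
from typing import Iterator, Tuple


def iter_log_lines(root: Path, suffixes: Tuple[str, ...] = (".gz", ".log")) -> Iterator[str]:
    """Yield lines from every matching file under root, decompressing .gz files."""
    for path in sorted(root.rglob("*")):
        # Path.suffix is the last extension only, so "x.txt.gz" matches ".gz".
        if not path.is_file() or path.suffix not in suffixes:
            continue
        opener = gzip.open if path.suffix == ".gz" else open
        # errors="replace" keeps one bad byte from aborting the whole run.
        with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
            for line in handle:
                yield line.rstrip("\n")
```

Yielding lines lazily keeps memory flat regardless of how many files a day of logs contains.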

Detection Rule Categories

Category Registration (R) Authentication (A) Reference

Fraud velocity R1-R12 A1-A13 references/registration-fraud-rules.md, references/auth-fraud-rules.md

Bot vs human R13-R17 A14-A19 references/bot-detection-patterns.md

Behavioral analysis R18-R22 A20-A25 references/behavioral-analysis-rules.md

GDPR PII scanning Both scripts Both scripts references/gdpr-pii-scanning.md
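To make the velocity category concrete, here is a hedged sketch of a single rule in pandas. It is not the definition of R1 or R2 (those live in references/registration-fraud-rules.md). The column names `device_serial`, `email_hash`, and `timestamp`, the function name, and the default threshold are all assumptions chosen for illustration.

```python
import pandas as pd


def device_email_velocity(events: pd.DataFrame, max_emails: int = 3,
                          window: str = "60min") -> pd.DataFrame:
    """Return (device, time window) buckets with more than max_emails distinct emails."""
    counts = (
        events.groupby(["device_serial", pd.Grouper(key="timestamp", freq=window)])["email_hash"]
        .nunique()
        .reset_index(name="distinct_emails")
    )
    return counts[counts["distinct_emails"] > max_emails].reset_index(drop=True)


# Hypothetical sample: device D1 registers four different emails within one hour.
events = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2025-01-15 10:01", "2025-01-15 10:05", "2025-01-15 10:20",
        "2025-01-15 10:40", "2025-01-15 10:10", "2025-01-15 12:00",
    ]),
    "device_serial": ["D1", "D1", "D1", "D1", "D2", "D1"],
    "email_hash": ["e1", "e2", "e3", "e4", "e5", "e1"],
})
flagged = device_email_velocity(events, max_emails=3)
```

The same groupby shape covers most velocity rules: swap the grouping key (IP, token, email) and the counted column.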

Rule Severity Quick Map

Severity Registration Examples Auth Examples

CRITICAL JNDI injection (R10), national ID exposure Personal data API response leaks

HIGH Email/device velocity (R1-R2), IP hopping (R6) Brute force (A1), credential stuffing (A2), session hijack (A4)

MEDIUM Partial phone masking, confirmation brute force (R8) Captcha trigger rate (A8), off-hours surge (A10)

LOW Sequential email patterns (R17) Auth method escalation (A21)

Decision Tree

    New fraud detection task:
    |
    +-- Registration logs?
    |   +-- .txt.gz / .debug.gz format?
    |   |   -> Use RegistrationEvent parser (references/log-parser-architecture.md)
    |   +-- What signals available?
    |       +-- Token, IP, DeviceSerial, Email, Phone -> R1-R12 velocity rules
    |       +-- Timing data -> R13-R15 bot detection
    |       +-- Platform field -> R12, R16 device fingerprinting
    |
    +-- Authentication logs?
    |   +-- .log / .log.gz format?
    |   |   -> Use AuthEvent parser (references/log-parser-architecture.md)
    |   +-- What signals available?
    |       +-- user_id, IP, device_id -> A1-A6 velocity rules
    |       +-- Fraud check weights -> A5, A11 risk scoring
    |       +-- Country field -> A4, A20 impossible travel
    |       +-- Auth type field -> A12, A21 method switching
    |
    +-- GDPR compliance audit?
        -> Run PIIScanner pass on both log types
        -> See references/gdpr-pii-scanning.md
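For the "Timing data" branch of the tree, a timing-based bot check could be sketched as below. This is an assumption-laden illustration rather than the logic of R13-R15. The function name `looks_scripted` is hypothetical, the "variance" threshold is interpreted as a standard deviation in milliseconds, and the two defaults simply reuse the values 50 ms and 2 s that appear in the skill's example configuration.

```python
from statistics import pstdev
from typing import List


def looks_scripted(step_times_ms: List[int], variance_threshold_ms: float = 50,
                   min_interval_s: float = 2) -> bool:
    """Flag a session whose step timing is too regular or too fast for a human."""
    intervals = [b - a for a, b in zip(step_times_ms, step_times_ms[1:])]
    if len(intervals) < 2:
        return False  # too few steps to judge regularity
    too_regular = pstdev(intervals) < variance_threshold_ms
    too_fast = min(intervals) < min_interval_s * 1000
    return too_regular or too_fast


bot_like = looks_scripted([0, 500, 1000, 1500])        # identical 500 ms gaps
human_like = looks_scripted([0, 3200, 9100, 12400])    # irregular, multi-second gaps
```

Requiring at least two intervals avoids flagging short sessions on a single fast click.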

CLI Interface Pattern

    # Discover mode: analyze example logs, output pattern statistics
    python registration_fraud.py examples/epa-registration/ --mode discover --output reports/

    # Detect mode: apply rules to new logs, generate alerts
    python registration_fraud.py /path/to/new-day-logs/ \
        --config config/registration_rules.json --output reports/

    # Auth fraud (same pattern)
    python auth_fraud.py examples/epa-identity-auth-publicapi/ --mode discover --output reports/
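The invocations above imply a positional log directory plus `--mode`, `--config`, and `--output` options. A minimal `argparse` sketch follows. The default of `detect` for `--mode` is an inference from the detect example omitting the flag, and the default output directory is an assumption.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI matching the documented invocation pattern."""
    parser = argparse.ArgumentParser(
        description="Rule-based fraud detection over application logs")
    parser.add_argument("log_dir", type=Path, help="directory containing log files")
    parser.add_argument("--mode", choices=["discover", "detect"], default="detect",
                        help="discover: pattern statistics; detect: apply rules (assumed default)")
    parser.add_argument("--config", type=Path, default=None,
                        help="JSON file with rule thresholds")
    parser.add_argument("--output", type=Path, default=Path("reports"),
                        help="directory for the Markdown report and CSV alerts")
    return parser


args = build_parser().parse_args(
    ["examples/epa-registration/", "--mode", "discover", "--output", "reports/"])
```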

Output Format

Output Filename Contents

Markdown report report_YYYYMMDD_HHMMSS.md Summary table, severity breakdown, detailed alerts with log line evidence

CSV export alerts_YYYYMMDD_HHMMSS.csv One row per alert, importable into SIEM/ticketing
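The two output files could be produced along these lines. The filename patterns come from the table above. The alert fields (`rule_id`, `severity`, `entity`, `evidence`), the function name `write_outputs`, and the report layout are hypothetical, and the Markdown part only covers the severity breakdown.

```python
import csv
from collections import Counter
from datetime import datetime
from pathlib import Path
from typing import Dict, List, Optional, Tuple

ALERT_FIELDS = ["rule_id", "severity", "entity", "evidence"]  # assumed schema


def write_outputs(alerts: List[Dict[str, str]], out_dir: Path,
                  now: Optional[datetime] = None) -> Tuple[Path, Path]:
    """Write alerts_<stamp>.csv and report_<stamp>.md; return both paths."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    out_dir.mkdir(parents=True, exist_ok=True)

    csv_path = out_dir / f"alerts_{stamp}.csv"
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=ALERT_FIELDS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(alerts)  # one row per alert

    md_path = out_dir / f"report_{stamp}.md"
    by_severity = Counter(alert["severity"] for alert in alerts)
    lines = ["# Fraud Detection Report", "", "| Severity | Alerts |", "| --- | --- |"]
    lines += [f"| {severity} | {count} |" for severity, count in by_severity.most_common()]
    md_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return csv_path, md_path
```

Accepting `now` as a parameter makes the filenames reproducible in tests.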

Tech Stack

Component Tool Notes

Runtime Python 3.10+ Standard library: re, gzip, json, csv, argparse, dataclasses, collections, datetime, pathlib, statistics

Data analysis pandas Time-window grouping and aggregation (only required pip dependency)

Report formatting tabulate (optional) Pretty markdown tables

Configuration Pattern

Rules use external JSON configs for tunable thresholds (no code changes needed):

    {
      "gdpr_pii_scanner": {
        "enabled": true,
        "check_emails": true,
        "check_phones": true,
        "check_names": true,
        "check_national_ids": true
      },
      "bot_detection": {
        "timing_variance_threshold_ms": 50,
        "min_human_step_interval_seconds": 2,
        "known_emulator_serials": ["000000000000000", "emulator-5554"],
        "scripting_user_agents": ["python-requests", "curl", "Go-http-client"]
      },
      "behavioral": {
        "impossible_travel_speed_kmh": 900,
        "burst_silence_ratio_threshold": 5.0,
        "session_abandonment_rate_threshold": 0.8
      }
    }
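One way to consume such a config is to overlay the JSON file on built-in defaults, so a partial file only overrides the thresholds it names. The sketch below assumes that behavior, which the source does not specify. `DEFAULTS`, `merge_config`, and `load_config` are hypothetical names, and `DEFAULTS` lists only a subset of the keys shown above.

```python
import json
from copy import deepcopy
from pathlib import Path
from typing import Optional

# Subset of the documented keys, used as fallback values (assumed behavior).
DEFAULTS = {
    "bot_detection": {"timing_variance_threshold_ms": 50,
                      "min_human_step_interval_seconds": 2},
    "behavioral": {"impossible_travel_speed_kmh": 900},
}


def merge_config(defaults: dict, overrides: dict) -> dict:
    """Recursively overlay overrides on defaults without mutating either."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def load_config(path: Optional[Path]) -> dict:
    """Return defaults, overlaid with the JSON file at path if one is given."""
    if path is None:
        return deepcopy(DEFAULTS)
    with path.open(encoding="utf-8") as handle:
        return merge_config(DEFAULTS, json.load(handle))


tuned = merge_config(DEFAULTS, {"bot_detection": {"timing_variance_threshold_ms": 80}})
```

A recursive merge matters here because a shallow `dict.update` would drop every sibling threshold in a section when a single one is overridden.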

Common Anti-Patterns

Anti-Pattern Why It Fails Instead

Hardcoded thresholds in code Cannot tune without redeployment External JSON config per rule

Single-dimension rules only Easy to evade by changing one variable Cross-correlate IP + device + email + timing

No deduplication Duplicate log lines inflate counts Deduplicate by (timestamp, request_id, message hash)

Ignoring multi-line entries Auth logs have stack traces across lines Parser must detect continuation lines

Treating all timestamps alike Registration uses a comma (,) before ms; auth uses a period (.) Normalize timestamp parsing per log type

Cross-node blind spots Same session spans K8s nodes Merge by token/session_id before aggregation

PII in fraud reports GDPR violation in the detection output itself Mask PII in report output, reference by hash/ID
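The last row above (mask PII, reference by hash/ID) can be illustrated for email addresses. This is a sketch under assumptions: the regex is a simplified email pattern rather than the scanner's actual one, and `pseudonymize`, `mask_emails`, and the `email#` token format are hypothetical. A salted hash is pseudonymization, so the salt needs to be kept out of the report.

```python
import hashlib
import re

# Simplified email pattern for illustration; not the PIIScanner's real regex.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def pseudonymize(value: str, salt: str) -> str:
    """Map a value to a stable short token so alerts can still be correlated."""
    digest = hashlib.sha256((salt + value.lower()).encode("utf-8")).hexdigest()
    return f"email#{digest[:12]}"


def mask_emails(text: str, salt: str) -> str:
    """Replace every email address in text with its pseudonymous token."""
    return EMAIL_RE.sub(lambda match: pseudonymize(match.group(0), salt), text)


masked = mask_emails("login failed for Alice@example.com from 10.0.0.1", salt="demo-salt")
```

Because the token is deterministic for a given salt, the same address produces the same reference across alerts, which preserves cross-correlation without exposing the address.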

Known Challenges

Challenge Impact Mitigation

Multi-line log entries Auth logs have stack traces across lines Detect continuation lines (leading whitespace, "at ", "Caused by:")

Duplicate log lines Registration logs inflate counts Deduplicate by (timestamp, request_id, message hash)

Masked data (MASKED) Auth logs limit email/phone correlation IP/device/user_id analysis still works

Different timestamp formats Registration uses a comma (,) before ms; auth uses a period (.) Normalize parsing per log type

Cross-node correlation Same session spans K8s nodes Merge by token/session_id before aggregation

Internal scanner noise Qualys scanner IP 10.7.2.171 triggers R10 Flag but annotate as likely internal scan
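The multi-line mitigation in the table above can be sketched as a small grouping pass that runs before regex parsing. The three continuation markers come from the table. The function names and the choice to skip blank lines are assumptions.

```python
from typing import Iterable, Iterator, List

CONTINUATION_PREFIXES = ("at ", "Caused by:")


def is_continuation(line: str) -> bool:
    """True if the line continues the previous entry (stack trace or wrapped text)."""
    return line[:1].isspace() or line.lstrip().startswith(CONTINUATION_PREFIXES)


def group_entries(lines: Iterable[str]) -> Iterator[str]:
    """Join continuation lines onto the entry they belong to."""
    current: List[str] = []
    for line in lines:
        if not line.strip():
            continue  # assumption: blank lines carry no information
        if current and is_continuation(line):
            current.append(line)
        else:
            if current:
                yield "\n".join(current)
            current = [line]
    if current:
        yield "\n".join(current)


raw = [
    "2025-01-15 10:30:00.123 ERROR login failed",
    "\tat com.example.Auth.login(Auth.java:42)",
    "Caused by: java.lang.IllegalStateException",
    "2025-01-15 10:30:01.456 INFO ok",
]
entries = list(group_entries(raw))
```

Grouping first means each stack trace counts as one event, so velocity rules are not inflated by trace depth.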

Trend Awareness Protocol

When users ask about current fraud detection approaches, search before answering:

Search Query Domain

1 "fintech fraud detection patterns 2026" Fraud patterns

2 "application log fraud analysis tools 2026" Tooling

3 "GDPR log compliance requirements 2026" Compliance

4 "bot detection registration abuse 2026" Bot detection

Navigation

Reference Guides

File Coverage

references/registration-fraud-rules.md Registration fraud rules R1-R12: thresholds, signals, detection logic

references/auth-fraud-rules.md Auth fraud rules A1-A13: thresholds, signals, detection logic

references/bot-detection-patterns.md Bot vs human: timing analysis, UA fingerprinting, speed checks, emulators

references/behavioral-analysis-rules.md Behavioral: impossible travel, session abandonment, burst-then-silence

references/gdpr-pii-scanning.md GDPR PII scanner: regex patterns, severity levels, config, report format

references/log-parser-architecture.md 4-layer architecture: LogFileReader, LogParser, SessionAggregator, RuleEngine

data/sources.json 18 curated antifraud, OWASP, GDPR, and log analysis resources

Related Skills

Skill Use For

software-security-appsec Application security patterns, OWASP Top 10

ai-ml-data-science ML-based fraud classification (when rule-based is insufficient)

data-analytics-engineering Data pipeline patterns for log aggregation

qa-observability Observability, structured logging, SIEM integration
