Email Forensics

A comprehensive email forensics skill for analyzing email messages, mailbox archives, and email metadata. It supports investigation of phishing attacks, business email compromise (BEC), and email spoofing, and extracts forensically valuable artifacts from email data.

Capabilities

Mailbox Parsing: Parse PST, OST, MBOX, EML, and MSG files
Header Analysis: Deep analysis of email headers and routing
Attachment Extraction: Extract and analyze email attachments
Phishing Detection: Identify phishing indicators and techniques
Spoofing Detection: Detect email spoofing and impersonation
Link Analysis: Extract and analyze URLs in email content
Timeline Generation: Create email-based communication timelines
Thread Reconstruction: Rebuild email conversation threads
Metadata Extraction: Extract sender, recipient, and routing metadata
Authentication Analysis: Analyze SPF, DKIM, and DMARC results

Quick Start

from email_forensics import EmailAnalyzer, MailboxParser, PhishingDetector

Parse mailbox file

parser = MailboxParser("/evidence/mailbox.pst")
emails = parser.get_all_messages()

Analyze single email

analyzer = EmailAnalyzer()
analysis = analyzer.analyze_file("/evidence/suspicious.eml")

Detect phishing

detector = PhishingDetector()
results = detector.scan_email(analysis)

Usage

Task 1: Mailbox Parsing

Input: Mailbox file (PST, OST, MBOX)

Process:

Load and validate mailbox file
Parse folder structure
Extract messages
Index metadata
Generate mailbox summary

Output: Parsed mailbox with message inventory

Example:

from email_forensics import MailboxParser

Parse Outlook PST file

parser = MailboxParser("/evidence/user_mailbox.pst")

Get mailbox info

info = parser.get_mailbox_info()
print(f"Mailbox type: {info.format}")
print(f"Total messages: {info.message_count}")
print(f"Total folders: {info.folder_count}")
print(f"Date range: {info.oldest_date} - {info.newest_date}")

List folders

folders = parser.get_folders()
for folder in folders:
    print(f"Folder: {folder.name}")
    print(f" Path: {folder.path}")
    print(f" Messages: {folder.message_count}")
    print(f" Unread: {folder.unread_count}")

Get messages from folder

inbox = parser.get_messages(folder_path="Inbox")
for msg in inbox:
    print(f"[{msg.date}] From: {msg.sender}")
    print(f" Subject: {msg.subject}")
    print(f" To: {msg.recipients}")
    print(f" Has attachments: {msg.has_attachments}")

Search messages

results = parser.search(
    query="confidential",
    search_body=True,
    search_subject=True
)
for r in results:
    print(f"Match: {r.subject}")
    print(f" Folder: {r.folder}")
    print(f" Match context: {r.context}")

Export messages

parser.export_messages(
    folder_path="Inbox",
    output_dir="/evidence/exported/",
    format="eml"
)

Generate mailbox report

parser.generate_report("/evidence/mailbox_report.html")
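
For MBOX archives specifically, the load, extract, and index steps above can be approximated with Python's standard library. The sketch below is independent of `email_forensics`; the helper name `summarize_mbox` and the keys of the returned dictionary are assumptions made for illustration. PST and OST files need a dedicated parser and are not covered here.

```python
import mailbox
from datetime import timezone
from email.utils import parsedate_to_datetime


def summarize_mbox(path):
    """Return a small inventory of an MBOX file (hypothetical helper)."""
    box = mailbox.mbox(path, create=False)  # raises if the file does not exist
    count = 0
    senders = set()
    dates = []
    try:
        for msg in box:
            count += 1
            if msg["From"]:
                senders.add(str(msg["From"]))
            try:
                when = parsedate_to_datetime(msg["Date"])
            except (TypeError, ValueError):
                continue  # missing or malformed Date header
            if when.tzinfo is None:
                # Assumption: treat an unknown offset as UTC so dates compare.
                when = when.replace(tzinfo=timezone.utc)
            dates.append(when)
    finally:
        box.close()
    return {
        "message_count": count,
        "senders": senders,
        "oldest": min(dates) if dates else None,
        "newest": max(dates) if dates else None,
    }
```

Opening with `create=False` avoids silently creating an empty file when an evidence path is mistyped.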

Task 2: Email Header Analysis

Input: Email message (EML, MSG, or raw headers)

Process:

Parse all header fields
Analyze routing path
Verify authentication
Detect anomalies
Generate header analysis

Output: Comprehensive header analysis

Example:

from email_forensics import HeaderAnalyzer

Analyze email headers

analyzer = HeaderAnalyzer()
analysis = analyzer.analyze_file("/evidence/suspicious.eml")

Get basic headers

print(f"From: {analysis.from_address}")
print(f"To: {analysis.to_addresses}")
print(f"Subject: {analysis.subject}")
print(f"Date: {analysis.date}")
print(f"Message-ID: {analysis.message_id}")

Analyze routing path

routing = analysis.get_routing_path()
for hop in routing:
    print(f"Hop {hop.number}:")
    print(f" From: {hop.from_server}")
    print(f" By: {hop.by_server}")
    print(f" Time: {hop.timestamp}")
    print(f" Delay: {hop.delay_seconds}s")

Get authentication results

auth = analysis.get_authentication()
print(f"SPF: {auth.spf_result}")
print(f" SPF domain: {auth.spf_domain}")
print(f"DKIM: {auth.dkim_result}")
print(f" DKIM domain: {auth.dkim_domain}")
print(f"DMARC: {auth.dmarc_result}")

Detect anomalies

anomalies = analysis.detect_anomalies()
for a in anomalies:
    print(f"ANOMALY: {a.type}")
    print(f" Description: {a.description}")
    print(f" Severity: {a.severity}")

Get original sender (envelope)

envelope = analysis.get_envelope_info()
print(f"Envelope From: {envelope.mail_from}")
print(f"Envelope To: {envelope.rcpt_to}")

Get X-headers

x_headers = analysis.get_x_headers()
for header, value in x_headers.items():
    print(f"{header}: {value}")

Export analysis

analysis.export_report("/evidence/header_analysis.html")
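
To make the routing and authentication steps concrete, here is a minimal standard-library sketch that is independent of `email_forensics`. It reads `Received` headers oldest-first (each server prepends its own, so the list is reversed) and pulls SPF, DKIM, and DMARC verdicts out of `Authentication-Results`. The sample message and the helper names `received_hops` and `auth_results` are assumptions for illustration. Only `Authentication-Results` headers added by a receiving server you trust should be relied on, because a sender can insert forged ones.

```python
import re
from datetime import timezone
from email import message_from_string
from email.utils import parsedate_to_datetime

SAMPLE = """\
Received: from mx.example.net (mx.example.net [203.0.113.5])
 by mail.example.org with ESMTP id abc123;
 Mon, 01 Jan 2024 10:00:07 +0000
Received: from sender.example.com (sender.example.com [198.51.100.9])
 by mx.example.net with ESMTP id xyz789;
 Mon, 01 Jan 2024 10:00:02 +0000
Authentication-Results: mail.example.org;
 spf=pass smtp.mailfrom=example.com;
 dkim=fail header.d=example.com;
 dmarc=fail header.from=example.com
From: Alice <alice@example.com>
Subject: Test

Body
"""


def _parse_stamp(text):
    """Parse an RFC 5322 date; return None if it cannot be parsed."""
    try:
        when = parsedate_to_datetime(text.strip())
    except (TypeError, ValueError):
        return None
    # Assumption: a timestamp without a usable offset is treated as UTC.
    return when if when.tzinfo else when.replace(tzinfo=timezone.utc)


def received_hops(msg):
    """Return Received hops in chronological order with per-hop delay."""
    hops = []
    for value in reversed(msg.get_all("Received", [])):
        # The timestamp follows the last semicolon of the header value.
        _, _, stamp = value.rpartition(";")
        hops.append({
            "raw": " ".join(value.split()),  # unfold continuation lines
            "time": _parse_stamp(stamp),
            "delay_seconds": None,
        })
    for prev, cur in zip(hops, hops[1:]):
        if prev["time"] and cur["time"]:
            cur["delay_seconds"] = (cur["time"] - prev["time"]).total_seconds()
    return hops


def auth_results(msg):
    """Collect the first spf/dkim/dmarc verdict from Authentication-Results."""
    results = {}
    for value in msg.get_all("Authentication-Results", []):
        for mech, verdict in re.findall(r"\b(spf|dkim|dmarc)=(\w+)", value, re.I):
            results.setdefault(mech.lower(), verdict.lower())
    return results


msg = message_from_string(SAMPLE)
hops = received_hops(msg)
auth = auth_results(msg)
```

A negative or unusually large `delay_seconds` between hops is one of the anomalies worth a closer look, since it can indicate clock skew or tampered headers.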

Task 3: Phishing Detection

Input: Email message

Process:

Analyze sender authenticity
Check URLs for malicious indicators
Analyze attachment risks
Detect social engineering
Calculate risk score

Output: Phishing analysis with risk assessment

Example:

from email_forensics import PhishingDetector, EmailAnalyzer

Initialize detector

detector = PhishingDetector()

Analyze email

analyzer = EmailAnalyzer()
email = analyzer.parse_file("/evidence/suspicious.eml")

Run phishing detection

result = detector.analyze(email)

print(f"Risk Score: {result.risk_score}/100")
print(f"Classification: {result.classification}")
print(f"Confidence: {result.confidence}")

Get indicators

for indicator in result.indicators:
    print(f"INDICATOR: {indicator.type}")
    print(f" Description: {indicator.description}")
    print(f" Weight: {indicator.weight}")
    print(f" Evidence: {indicator.evidence}")

Check sender authenticity

sender = result.sender_analysis
print(f"Sender: {sender.display_name} <{sender.address}>")
print(f" Display name mismatch: {sender.display_name_mismatch}")
print(f" Domain reputation: {sender.domain_reputation}")
print(f" First-time sender: {sender.first_time_sender}")

Analyze URLs

for url in result.url_analysis:
    print(f"URL: {url.url}")
    print(f" Domain: {url.domain}")
    print(f" Display text: {url.display_text}")
    print(f" Mismatch: {url.text_url_mismatch}")
    print(f" Shortened: {url.is_shortened}")
    print(f" Risk: {url.risk_level}")

Check attachments

for att in result.attachment_analysis:
    print(f"Attachment: {att.filename}")
    print(f" Type: {att.content_type}")
    print(f" Risk: {att.risk_level}")
    print(f" Double extension: {att.has_double_extension}")

Export report

detector.generate_report(result, "/evidence/phishing_report.html")
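
One of the URL indicators above, a link whose visible text names one host while its `href` points at another, can be illustrated with the standard library alone. This is a rough sketch and not the detector's actual logic; `AnchorCollector` and `find_text_url_mismatches` are hypothetical names, and the "looks like a URL" test is a deliberately simple heuristic.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class AnchorCollector(HTMLParser):
    """Collect (href, visible text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def _host(value):
    """Return the lowercased host of a URL-like string, or None."""
    if "://" not in value:
        value = "http://" + value  # let urlsplit treat bare domains as hosts
    try:
        return urlsplit(value).hostname
    except ValueError:
        return None


def find_text_url_mismatches(html):
    """Return (display text, href) pairs whose hosts disagree."""
    parser = AnchorCollector()
    parser.feed(html)
    mismatches = []
    for href, text in parser.links:
        # Only compare when the visible text itself looks like a URL or domain.
        looks_like_url = "." in text and " " not in text
        if looks_like_url and _host(text) != _host(href):
            mismatches.append((text, href))
    return mismatches
```

Links with descriptive text such as "Read the docs" are skipped on purpose, because there is no claimed host to compare against.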

Task 4: Attachment Analysis

Input: Email with attachments

Process:

Extract all attachments
Identify file types
Calculate hashes
Check for malware indicators
Extract metadata

Output: Attachment analysis with extracted files

Example:

from email_forensics import AttachmentAnalyzer

Initialize analyzer

analyzer = AttachmentAnalyzer()

Extract from single email

attachments = analyzer.extract_from_email(
    email_path="/evidence/email.eml",
    output_dir="/evidence/attachments/"
)

for att in attachments:
    print(f"Attachment: {att.filename}")
    print(f" Content-Type: {att.content_type}")
    print(f" Size: {att.size}")
    print(f" MD5: {att.md5}")
    print(f" SHA256: {att.sha256}")
    print(f" Detected type: {att.detected_type}")
    print(f" Type mismatch: {att.type_mismatch}")
    print(f" Extracted to: {att.output_path}")

Analyze specific attachment

detailed = analyzer.analyze_file("/evidence/attachments/document.pdf")
print(f"Metadata: {detailed.metadata}")
print(f"Embedded objects: {detailed.embedded_objects}")
print(f"Scripts: {detailed.contains_scripts}")
print(f"Macros: {detailed.contains_macros}")

Extract from mailbox

mailbox_attachments = analyzer.extract_from_mailbox(
    mailbox_path="/evidence/mailbox.pst",
    output_dir="/evidence/all_attachments/",
    filter_types=["application/pdf", "application/msword"]
)

Find suspicious attachments

suspicious = analyzer.find_suspicious(attachments)
for s in suspicious:
    print(f"SUSPICIOUS: {s.filename}")
    print(f" Reason: {s.reason}")

Check against malware hashes

malware = analyzer.check_malware_hashes("/hashsets/malware.txt")

Generate attachment report

analyzer.generate_report("/evidence/attachment_report.html")
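
The extraction and hashing steps can be sketched with the standard `email` and `hashlib` modules. The example below is an illustration under stated assumptions, not the `AttachmentAnalyzer` implementation: `attachment_inventory` is a hypothetical helper, the risky-extension set is illustrative rather than exhaustive, and it hashes payloads in memory without writing anything to disk.

```python
import hashlib
from email import message_from_bytes, policy

# Illustrative only; a real list would be longer and maintained over time.
RISKY_EXTENSIONS = {".exe", ".scr", ".js", ".vbs", ".bat", ".cmd"}


def attachment_inventory(raw_bytes):
    """Return name, type, size, and hashes for each attachment in a message."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    inventory = []
    for part in msg.iter_attachments():
        # Nested multipart parts have no single payload; treat them as empty.
        payload = part.get_payload(decode=True) or b""
        name = part.get_filename() or "(unnamed)"
        pieces = name.lower().rsplit(".", 2)
        inventory.append({
            "filename": name,
            "content_type": part.get_content_type(),
            "size": len(payload),
            "md5": hashlib.md5(payload).hexdigest(),
            "sha256": hashlib.sha256(payload).hexdigest(),
            # e.g. invoice.pdf.exe: a harmless-looking extension before a risky one
            "double_extension": len(pieces) == 3
            and ("." + pieces[2]) in RISKY_EXTENSIONS,
        })
    return inventory
```

Hashing the decoded payload rather than the base64 text is what makes the digest comparable with hash sets built from files on disk.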

Task 5: Email Timeline Creation

Input: Mailbox or collection of emails

Process:

Parse all messages
Extract timestamps
Build chronological timeline
Identify communication patterns
Visualize activity

Output: Email communication timeline

Example:

from email_forensics import EmailTimeline

Initialize timeline

timeline = EmailTimeline()

Add email sources

timeline.add_mailbox("/evidence/user1.pst")
timeline.add_mailbox("/evidence/user2.pst")
timeline.add_folder("/evidence/exported_emails/")

Build timeline

events = timeline.build()

for event in events:
    print(f"[{event.timestamp}] {event.direction}")
    print(f" From: {event.sender}")
    print(f" To: {event.recipients}")
    print(f" Subject: {event.subject}")

Filter by date range

filtered = timeline.filter_by_date(start="2024-01-01", end="2024-01-31")

Filter by participants

participant_emails = timeline.filter_by_participant("suspect@example.com")

Get communication patterns

patterns = timeline.analyze_patterns()
print(f"Total messages: {patterns.total_messages}")
print(f"Unique senders: {patterns.unique_senders}")
print(f"Unique recipients: {patterns.unique_recipients}")
print(f"Peak hours: {patterns.peak_hours}")
print(f"Top correspondents: {patterns.top_correspondents}")

Detect anomalies

anomalies = timeline.detect_anomalies()
for a in anomalies:
    print(f"ANOMALY: {a.description}")
    print(f" Time: {a.timestamp}")

Export timeline

timeline.export_csv("/evidence/email_timeline.csv")
timeline.generate_visualization("/evidence/email_timeline.html")
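
The core of the timeline steps is normalizing timestamps from different time zones before sorting. A minimal standard-library sketch follows; `build_timeline` is a hypothetical helper and treating offset-less dates as UTC is an assumption. The `Date` header is set by the sender and can be forged, so a careful investigation would cross-check it against `Received` timestamps.

```python
from datetime import timezone
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime


def build_timeline(raw_messages):
    """Return message events sorted by their Date header, normalized to UTC."""
    events = []
    for raw in raw_messages:
        msg = message_from_string(raw)
        try:
            when = parsedate_to_datetime(msg["Date"])
        except (TypeError, ValueError):
            # A real tool would record undated messages separately.
            continue
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)  # assumption: unknown offset is UTC
        events.append({
            "timestamp": when.astimezone(timezone.utc),
            "sender": parseaddr(msg.get("From", ""))[1],
            "subject": msg.get("Subject", ""),
        })
    return sorted(events, key=lambda e: e["timestamp"])
```

Without the UTC conversion, a message stamped 09:00 at -0500 would sort before one stamped 13:30 at +0000 even though it was sent later.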

Task 6: Email Thread Reconstruction

Input: Email messages

Process:

Group by conversation
Analyze In-Reply-To headers
Build thread hierarchy
Identify missing messages
Reconstruct full threads

Output: Reconstructed email threads

Example:

from email_forensics import ThreadReconstructor

Initialize reconstructor

reconstructor = ThreadReconstructor()

Load emails

reconstructor.load_mailbox("/evidence/mailbox.pst")

Reconstruct all threads

threads = reconstructor.reconstruct_all()

for thread in threads:
    print(f"Thread: {thread.subject}")
    print(f" Messages: {thread.message_count}")
    print(f" Participants: {thread.participants}")
    print(f" Duration: {thread.start_date} - {thread.end_date}")
    print(f" Complete: {thread.is_complete}")

    # Print thread hierarchy
    for msg in thread.messages:
        indent = "  " * msg.depth
        print(f"{indent}[{msg.date}] {msg.sender}: {msg.subject}")

Find specific thread

thread = reconstructor.find_thread(subject_contains="Project Alpha")

Find threads with missing messages

incomplete = reconstructor.find_incomplete_threads()
for t in incomplete:
    print(f"Incomplete: {t.subject}")
    print(f" Missing IDs: {t.missing_message_ids}")

Export threads

reconstructor.export_threads(
    output_dir="/evidence/threads/",
    format="mbox"
)

Generate thread report

reconstructor.generate_report("/evidence/threads_report.html")
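
The hierarchy-building and missing-message steps come down to linking each message to the `Message-ID` named in its `In-Reply-To` header. The sketch below shows that idea on plain dictionaries; the function names and the input shape are assumptions, and it ignores the `References` header and reply cycles, both of which a fuller implementation would need to handle.

```python
from collections import defaultdict


def build_threads(messages):
    """Link messages by In-Reply-To.

    messages: list of dicts with 'id' and 'in_reply_to' (None for thread starters).
    Returns (roots, children, missing) where missing holds referenced
    Message-IDs that are absent from the collection.
    """
    by_id = {m["id"]: m for m in messages}
    children = defaultdict(list)
    roots = []
    missing = set()
    for m in messages:
        parent = m.get("in_reply_to")
        if parent is None:
            roots.append(m["id"])
        else:
            children[parent].append(m["id"])
            if parent not in by_id:
                missing.add(parent)  # reply to a message we do not have
    return roots, children, missing


def walk(node, children, depth=0):
    """Yield (message id, depth) pairs in thread order, parents first."""
    yield node, depth
    for child in children.get(node, []):
        yield from walk(child, children, depth + 1)
```

Walking from each entry in `missing` as well as from each root recovers the partial threads whose first message was deleted or never collected.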

Task 7: Spoofing Detection

Input: Email message

Process:

Verify sender headers
Check authentication records
Analyze display name tricks
Compare envelope vs header
Detect impersonation

Output: Spoofing analysis results

Example:

from email_forensics import SpoofingDetector

Initialize detector

detector = SpoofingDetector()

Analyze email

result = detector.analyze_file("/evidence/suspicious.eml")

print(f"Spoofing detected: {result.is_spoofed}")
print(f"Confidence: {result.confidence}")

Header vs Envelope analysis

print(f"Header From: {result.header_from}")
print(f"Envelope From: {result.envelope_from}")
print(f"Mismatch: {result.from_mismatch}")

Display name analysis

display = result.display_name_analysis
print(f"Display Name: {display.name}")
print(f"Homograph attack: {display.homograph_detected}")
print(f"Executive impersonation: {display.executive_impersonation}")
print(f"Brand impersonation: {display.brand_impersonation}")

Authentication analysis

auth = result.authentication_analysis
print(f"SPF Pass: {auth.spf_pass}")
print(f"DKIM Pass: {auth.dkim_pass}")
print(f"DMARC Pass: {auth.dmarc_pass}")

Reply-To analysis

reply_to = result.reply_to_analysis
print(f"Reply-To: {reply_to.address}")
print(f"Reply-To differs from From: {reply_to.differs_from_sender}")

Get all indicators

for indicator in result.indicators:
    print(f"INDICATOR: {indicator.name}")
    print(f" Evidence: {indicator.evidence}")
    print(f" Severity: {indicator.severity}")

Export report

detector.generate_report(result, "/evidence/spoofing_analysis.html")
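
The envelope-versus-header comparison can be sketched with the standard library by comparing the domains in `From`, `Return-Path`, and `Reply-To`. This is a simplified illustration rather than the `SpoofingDetector` logic, and `sender_mismatches` is a hypothetical helper. A mismatch is an indicator, not proof: mailing lists and bulk-mail providers legitimately use a different `Return-Path` domain.

```python
from email import message_from_string
from email.utils import parseaddr


def _domain(header_value):
    """Return the lowercased domain of the address in a header, or None."""
    address = parseaddr(header_value or "")[1]
    if "@" not in address:
        return None
    return address.rpartition("@")[2].lower()


def sender_mismatches(raw):
    """Compare header From against Return-Path and Reply-To domains."""
    msg = message_from_string(raw)
    header_from = _domain(msg["From"])
    envelope_from = _domain(msg["Return-Path"])
    reply_to = _domain(msg["Reply-To"])
    display_name = parseaddr(msg["From"] or "")[0]
    return {
        "header_from": header_from,
        "envelope_from": envelope_from,
        "from_mismatch": bool(envelope_from) and envelope_from != header_from,
        "reply_to_differs": bool(reply_to) and reply_to != header_from,
        # A display name that itself contains an address is a common trick.
        "address_in_display_name": "@" in display_name,
    }
```

`Return-Path` is only present when the receiving server recorded it, so an absent value is reported as no mismatch rather than as evidence either way.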

Task 8: Link Analysis

Input: Email content

Process:

Extract all URLs
Analyze URL components
Check against threat intel
Detect URL obfuscation
Identify redirect chains

Output: URL analysis results

Example:

from email_forensics import LinkAnalyzer

Initialize analyzer

analyzer = LinkAnalyzer()

Extract links from email

links = analyzer.extract_from_email("/evidence/email.eml")

for link in links:
    print(f"URL: {link.url}")
    print(f" Display text: {link.display_text}")
    print(f" Domain: {link.domain}")
    print(f" TLD: {link.tld}")
    print(f" Text matches URL: {link.text_matches_url}")
    print(f" Is shortened: {link.is_shortened}")
    print(f" Is IP-based: {link.is_ip_based}")
    print(f" Risk score: {link.risk_score}")

Unshorten URLs

unshortened = analyzer.unshorten_urls(links)
for u in unshortened:
    print(f"Short: {u.short_url}")
    print(f"Final: {u.final_url}")
    print(f"Redirects: {u.redirect_count}")

Check against threat intelligence

threats = analyzer.check_threat_intel(
    links,
    feed_path="/feeds/malicious_urls.txt"
)
for t in threats:
    print(f"THREAT: {t.url}")
    print(f" Category: {t.category}")
    print(f" Source: {t.intel_source}")

Detect URL obfuscation

obfuscated = analyzer.detect_obfuscation(links)
for o in obfuscated:
    print(f"OBFUSCATED: {o.url}")
    print(f" Technique: {o.obfuscation_type}")
    print(f" Decoded: {o.decoded_url}")

Analyze link destinations (safe fetch)

destinations = analyzer.analyze_destinations(links, safe_mode=True)

Export link analysis

analyzer.generate_report("/evidence/link_analysis.html")
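
The offline parts of this task (extracting URLs and inspecting their components) need no network access and can be sketched with the standard library. The regular expression, the shortener list, and the names `analyze_url` and `extract_links` below are illustrative assumptions, not the `LinkAnalyzer` internals; threat-intelligence lookups and redirect following are left out.

```python
import ipaddress
import re
from urllib.parse import urlsplit

URL_PATTERN = re.compile(r"https?://[^\s<>\"')]+", re.IGNORECASE)
SHORTENERS = {"bit.ly", "t.co", "tinyurl.com", "goo.gl"}  # illustrative, not exhaustive


def analyze_url(url):
    """Return simple structural indicators for a single URL."""
    try:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
        netloc = parts.netloc
    except ValueError:  # e.g. malformed IPv6 literal
        host, netloc = "", ""
    try:
        ipaddress.ip_address(host)
        is_ip = True
    except ValueError:
        is_ip = False
    return {
        "url": url,
        "domain": host,
        "is_ip_based": is_ip,
        "is_shortened": host in SHORTENERS,
        # Punycode labels can hide look-alike (homograph) characters.
        "is_punycode": any(label.startswith("xn--") for label in host.split(".")),
        # user@host syntax can make the real host easy to overlook.
        "has_userinfo": "@" in netloc,
    }


def extract_links(text):
    """Find http(s) URLs in plain text and analyze each one."""
    return [analyze_url(u.rstrip(".,;")) for u in URL_PATTERN.findall(text)]
```

Trailing punctuation is stripped because URLs in prose are often followed by a period or comma that is not part of the link.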

Task 9: Business Email Compromise Analysis

Input: Email or mailbox

Process:

Identify BEC indicators
Detect urgency language
Analyze financial requests
Check sender legitimacy
Score BEC probability

Output: BEC analysis results

Example:

from email_forensics import BECDetector

Initialize BEC detector

detector = BECDetector()

Analyze single email

result = detector.analyze_email("/evidence/wire_request.eml")

print(f"BEC Score: {result.bec_score}/100")
print(f"Classification: {result.classification}")

Check BEC indicators

for indicator in result.indicators:
    print(f"INDICATOR: {indicator.type}")
    print(f" Description: {indicator.description}")
    print(f" Evidence: {indicator.evidence}")
    print(f" Weight: {indicator.weight}")

Language analysis

language = result.language_analysis
print(f"Urgency detected: {language.urgency_score}")
print(f"Authority claims: {language.authority_score}")
print(f"Financial keywords: {language.financial_keywords}")
print(f"Secrecy requests: {language.secrecy_score}")

Sender analysis

sender = result.sender_analysis
print(f"Claimed identity: {sender.claimed_identity}")
print(f"Actual sender: {sender.actual_address}")
print(f"Executive impersonation: {sender.executive_impersonation}")

Request analysis

request = result.request_analysis
print(f"Action requested: {request.action}")
print(f"Amount mentioned: {request.amount}")
print(f"Account details: {request.has_account_details}")
print(f"Wire transfer request: {request.wire_transfer}")

Scan mailbox for BEC

mailbox_results = detector.scan_mailbox("/evidence/mailbox.pst")
for r in mailbox_results.high_risk:
    print(f"HIGH RISK: {r.subject}")
    print(f" BEC Score: {r.bec_score}")

Generate BEC report

detector.generate_report("/evidence/bec_analysis.html")
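
As a rough illustration of how language indicators could feed a 0 to 100 score, the sketch below adds a fixed weight for each indicator category that matches. The phrase lists, the weights, and the name `bec_score` are all assumptions chosen for the example; they are not the scoring model used by `BECDetector`, and keyword matching alone will produce false positives on legitimate finance mail.

```python
import re

# Illustrative phrase lists and weights (summing to 100); tune for real use.
INDICATORS = {
    "urgency": (25, [r"\burgent\b", r"\bimmediately\b", r"\bas soon as possible\b"]),
    "financial": (35, [r"\bwire transfer\b", r"\bbank account\b", r"\binvoice\b"]),
    "secrecy": (25, [r"\bconfidential\b", r"\bdo not (tell|discuss|share)\b"]),
    "authority": (15, [r"\bceo\b", r"\bcfo\b", r"\bon behalf of\b"]),
}


def bec_score(text):
    """Return (score, hits) where hits maps category -> matched patterns."""
    lowered = text.lower()
    hits = {}
    score = 0
    for name, (weight, patterns) in INDICATORS.items():
        matched = [p for p in patterns if re.search(p, lowered)]
        if matched:
            hits[name] = matched
            score += weight  # each category counts once, however many phrases match
    return score, hits
```

Counting each category once keeps a message that repeats "urgent" ten times from outscoring one that combines urgency, secrecy, and a payment request.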

Task 10: Email Search and Export

Input: Mailbox file or email collection

Process:

Index email content
Execute search queries
Filter results
Export matches
Generate search report

Output: Search results with exported emails

Example:

from email_forensics import EmailSearcher

Initialize searcher

searcher = EmailSearcher("/evidence/mailbox.pst")

Build search index

searcher.build_index()

Search by keywords

results = searcher.search(
    query="confidential project",
    search_body=True,
    search_subject=True,
    search_attachments=True
)

for r in results:
    print(f"Match: {r.subject}")
    print(f" From: {r.sender}")
    print(f" Date: {r.date}")
    print(f" Score: {r.relevance_score}")
    print(f" Snippet: {r.snippet}")

Search by sender

sender_emails = searcher.search_by_sender("suspicious@example.com")

Search by date range

date_range = searcher.search_by_date(start="2024-01-01", end="2024-01-31")

Search by attachment name

with_attachments = searcher.search_by_attachment(filename_pattern="*.pdf")

Complex query

complex_results = searcher.advanced_search(
    sender_contains="@example.com",
    subject_contains="wire transfer",
    date_after="2024-01-01",
    has_attachments=True
)

Export search results

searcher.export_results(
    results,
    output_dir="/evidence/search_results/",
    format="eml",
    include_attachments=True
)

Generate search report

searcher.generate_report("/evidence/search_report.html")
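
The index-then-query steps can be pictured as a small inverted index that maps each word to the messages containing it. The sketch below is a teaching example with assumed names (`build_index`, `search`) and AND semantics for multi-word queries; it says nothing about how `EmailSearcher` stores its index or ranks results.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def build_index(documents):
    """documents: dict of message id -> text. Returns token -> set of ids."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in TOKEN.findall(text.lower()):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of messages containing every term in the query."""
    terms = TOKEN.findall(query.lower())
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

Building the index once makes repeated keyword searches over a large mailbox far cheaper than rescanning every message body per query.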

Configuration

Environment Variables

| Variable | Description | Required | Default |
|---|---|---|---|
| EMAIL_PARSER | Path to email parsing library | No | Built-in |
| THREAT_INTEL_FEED | URL threat intelligence feed | No | None |
| VT_API_KEY | VirusTotal API key | No | None |
| SAFE_BROWSE_KEY | Google Safe Browsing API key | No | None |
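
As a hedged illustration of how these optional variables might be consumed, the sketch below reads them from the environment and falls back to the documented defaults. How `email_forensics` actually loads its configuration is not stated here, so `load_settings` and the `"built-in"` placeholder value are assumptions.

```python
import os

# Defaults mirror the table above; "built-in" is a placeholder for the
# bundled parser, not a real path.
DEFAULTS = {
    "EMAIL_PARSER": "built-in",
    "THREAT_INTEL_FEED": None,
    "VT_API_KEY": None,
    "SAFE_BROWSE_KEY": None,
}


def load_settings(environ=None):
    """Return settings from the environment, using defaults for unset values."""
    environ = os.environ if environ is None else environ
    # Empty strings are treated the same as unset variables.
    return {name: environ.get(name) or default for name, default in DEFAULTS.items()}
```

Keeping API keys in environment variables rather than in scripts avoids leaving credentials inside case files and reports.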

Options

| Option | Type | Description |
|---|---|---|
| extract_attachments | boolean | Auto-extract attachments |
| decode_mime | boolean | Decode MIME-encoded content |
| parse_html | boolean | Parse HTML email bodies |
| safe_url_check | boolean | Safe URL verification |
| parallel | boolean | Enable parallel processing |

Examples

Example 1: Phishing Campaign Investigation

Scenario: Investigating a phishing campaign targeting the organization

from email_forensics import MailboxParser, PhishingDetector

Parse quarantined emails

parser = MailboxParser("/evidence/quarantine.pst")
emails = parser.get_all_messages()

Initialize phishing detector

detector = PhishingDetector()

Analyze all emails

phishing_emails = []
for email in emails:
    result = detector.analyze(email)
    if result.risk_score > 70:
        phishing_emails.append(result)
        print(f"PHISHING: {email.subject}")
        print(f" Risk: {result.risk_score}")
        print(f" Indicators: {len(result.indicators)}")

Extract IOCs from phishing emails

iocs = detector.extract_iocs(phishing_emails)
print(f"Malicious URLs: {len(iocs.urls)}")
print(f"Sender addresses: {len(iocs.senders)}")

Generate campaign report

detector.generate_campaign_report(phishing_emails, "/evidence/phishing_campaign.html")

Example 2: BEC Incident Investigation

Scenario: Investigating potential business email compromise

from email_forensics import BECDetector, EmailTimeline, SpoofingDetector

Analyze the suspicious request email

bec = BECDetector()
result = bec.analyze_email("/evidence/wire_request.eml")

print(f"BEC Score: {result.bec_score}")
print(f"Financial request: {result.request_analysis.wire_transfer}")

Check for spoofing

spoof = SpoofingDetector()
spoof_result = spoof.analyze_file("/evidence/wire_request.eml")
print(f"Spoofed: {spoof_result.is_spoofed}")

Build communication timeline

timeline = EmailTimeline()
timeline.add_mailbox("/evidence/cfo_mailbox.pst")
timeline.add_mailbox("/evidence/finance_mailbox.pst")

Find related emails

related = timeline.filter_by_participant(result.sender_analysis.actual_address)
print(f"Related emails from sender: {len(related)}")

Limitations

Large mailboxes may require significant processing time
Encrypted emails require decryption keys
Some proprietary formats may have limited support
URL analysis requires network access for verification
Attachment analysis depends on file type support
BEC detection may produce false positives
Header analysis accuracy depends on how well the original headers were preserved

Troubleshooting

Common Issue 1: PST File Corruption

Problem: Unable to parse PST file

Solution:

Use PST repair tools before analysis
Try different parsing libraries
Extract individual messages if possible

Common Issue 2: Encoded Content Not Decoded

Problem: Email body appears as encoded text

Solution:

Enable MIME decoding
Check for unusual character encodings
Try different decoding methods
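
When checking whether content is merely MIME-encoded rather than corrupted, the standard `email` package can decode both encoded-word headers and transfer-encoded bodies. The helpers below are a minimal sketch with hypothetical names (`decode_subject`, `decoded_body`), independent of the `decode_mime` option.

```python
from email import message_from_bytes, policy
from email.header import decode_header, make_header


def decode_subject(raw_value):
    """Decode RFC 2047 encoded-words such as =?utf-8?b?...?= into text."""
    return str(make_header(decode_header(raw_value)))


def decoded_body(raw_bytes):
    """Return the plain-text (or, failing that, HTML) body as decoded text."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    part = msg.get_body(preferencelist=("plain", "html"))
    # get_content() undoes base64/quoted-printable and applies the charset.
    return part.get_content() if part is not None else ""
```

If the output is still unreadable after this, the declared charset is a likely culprit and is worth comparing against the raw bytes.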

Common Issue 3: Missing Attachments

Problem: Attachments not extracted

Solution:

Check attachment size limits
Verify attachment format support
Look for inline attachments

Related Skills

network-forensics: Analyze email network traffic
browser-forensics: Webmail investigation
malware-forensics: Analyze malicious attachments
timeline-forensics: Integrate email timeline
log-forensics: Correlate with mail server logs

