Building Vulnerability Aging and SLA Tracking

Overview

With over 30,000 new vulnerabilities identified in 2024 (a 17% increase from the prior year), organizations must track how long vulnerabilities remain unpatched and whether remediation occurs within defined Service Level Agreements (SLAs). Vulnerability aging measures the time between discovery and remediation, while SLA tracking enforces severity-based deadlines. Industry benchmarks indicate standard SLAs of 14 days for critical, 30 days for high, 60 days for medium, and 90 days for low vulnerabilities, though more aggressive timelines (24-48 hours for actively exploited critical CVEs) are increasingly common. This skill covers designing SLA policies, building aging dashboards, implementing automated escalations, and generating compliance metrics.

Prerequisites

Vulnerability management platform with historical scan data
Asset inventory with criticality ratings
ITSM/ticketing system for remediation tracking
Reporting platform (Splunk, Elastic, Power BI, Grafana)
Stakeholder agreement on SLA timelines and escalation procedures

Core Concepts

Standard Vulnerability SLA Framework

Severity	CVSS Range	Standard SLA	Aggressive SLA	CISA KEV SLA
Critical	9.0-10.0	14 days	48 hours	BOD 22-01 due date
High	7.0-8.9	30 days	7 days	14 days
Medium	4.0-6.9	60 days	30 days	N/A
Low	0.1-3.9	90 days	60 days	N/A
Informational	0.0	Best effort	Best effort	N/A

Adaptive SLA Modifiers

Factor	Modifier	Rationale
Internet-facing asset	-50% SLA	Higher exposure risk
CISA KEV listed	Override to 48h	Active exploitation confirmed
EPSS > 0.7	-50% SLA	High exploitation probability
Tier 1 (crown jewel) asset	-25% SLA	Maximum business impact
Compensating control in place	+25% SLA	Risk partially mitigated
Vendor patch unavailable	Exception with review date	Cannot remediate yet

Key Performance Indicators (KPIs)

KPI	Formula	Target
Mean Time to Remediate (MTTR)	Avg(remediation_date - discovery_date)	< 30 days overall
SLA Compliance Rate	(Vulns remediated within SLA / Total vulns) * 100	>= 90%
Overdue Vulnerability Count	Count where age > SLA	Trending downward
Vulnerability Aging Distribution	Count by age bucket (0-14d, 15-30d, 31-60d, 60+d)	Majority in 0-30d
Remediation Velocity	Vulns closed per week	Trending upward
Exception Rate	(Exceptions / Total vulns) * 100	< 5%

Implementation Steps

Step 1: Define SLA Policy Document

Vulnerability Remediation SLA Policy v1.0

1. Scope: All information systems and applications
2. Severity Classification: Based on CVSS v4.0/v3.1 base score
3. SLA Timelines: See Standard SLA Framework table
4. Adaptive Modifiers: Applied based on asset context
5. Exception Process:
   - Must be documented with business justification
   - Requires compensating control description
   - Maximum extension: 90 days (one renewal)
   - CISO approval required for Critical/High exceptions
6. Escalation Path:
   - 50% SLA elapsed: Automated reminder to asset owner
   - 75% SLA elapsed: Escalation to manager
   - 100% SLA elapsed (overdue): CISO notification
   - 120% SLA elapsed: VP/CTO escalation
7. Metrics Reporting: Monthly to security committee

Step 2: Build the Aging Calculation Engine

import pandas as pd
from datetime import datetime, timedelta

class VulnerabilityAgingTracker:
    """Track vulnerability aging and SLA compliance."""

    SLA_DAYS = {
        "Critical": 14,
        "High": 30,
        "Medium": 60,
        "Low": 90,
    }

    def __init__(self, sla_overrides=None):
        if sla_overrides:
            self.SLA_DAYS.update(sla_overrides)

    def calculate_aging(self, vulns_df):
        """Calculate aging metrics for each vulnerability."""
        today = datetime.now()

        vulns_df["discovery_date"] = pd.to_datetime(vulns_df["discovery_date"])
        vulns_df["remediation_date"] = pd.to_datetime(
            vulns_df["remediation_date"], errors="coerce"
        )

        vulns_df["age_days"] = vulns_df.apply(
            lambda row: (row["remediation_date"] - row["discovery_date"]).days
            if pd.notna(row["remediation_date"])
            else (today - row["discovery_date"]).days,
            axis=1
        )

        vulns_df["sla_days"] = vulns_df["severity"].map(self.SLA_DAYS)
        vulns_df["sla_deadline"] = vulns_df["discovery_date"] + \
            pd.to_timedelta(vulns_df["sla_days"], unit="D")

        vulns_df["is_overdue"] = vulns_df.apply(
            lambda row: row["age_days"] > row["sla_days"]
            if pd.isna(row["remediation_date"]) else False,
            axis=1
        )

        vulns_df["sla_compliance"] = vulns_df.apply(
            lambda row: row["age_days"] <= row["sla_days"]
            if pd.notna(row["remediation_date"]) else None,
            axis=1
        )

        vulns_df["days_overdue"] = vulns_df.apply(
            lambda row: max(0, row["age_days"] - row["sla_days"])
            if row["is_overdue"] else 0,
            axis=1
        )

        vulns_df["sla_pct_elapsed"] = (
            vulns_df["age_days"] / vulns_df["sla_days"] * 100
        ).round(1)

        return vulns_df

    def generate_kpis(self, vulns_df):
        """Generate KPI summary from aging data."""
        open_vulns = vulns_df[vulns_df["remediation_date"].isna()]
        closed_vulns = vulns_df[vulns_df["remediation_date"].notna()]

        kpis = {
            "total_vulnerabilities": len(vulns_df),
            "open_vulnerabilities": len(open_vulns),
            "closed_vulnerabilities": len(closed_vulns),
            "overdue_count": open_vulns["is_overdue"].sum(),
            "mttr_days": closed_vulns["age_days"].mean() if len(closed_vulns) > 0 else 0,
            "sla_compliance_rate": (
                closed_vulns["sla_compliance"].mean() * 100
                if len(closed_vulns) > 0 else 0
            ),
        }

        kpis["overdue_by_severity"] = (
            open_vulns[open_vulns["is_overdue"]]
            .groupby("severity")
            .size()
            .to_dict()
        )

        return kpis

    def get_escalation_list(self, vulns_df):
        """Get vulnerabilities requiring escalation."""
        open_vulns = vulns_df[vulns_df["remediation_date"].isna()].copy()

        escalations = []
        for _, vuln in open_vulns.iterrows():
            pct = vuln["sla_pct_elapsed"]
            if pct >= 120:
                level = "VP/CTO Escalation"
            elif pct >= 100:
                level = "CISO Notification"
            elif pct >= 75:
                level = "Manager Escalation"
            elif pct >= 50:
                level = "Owner Reminder"
            else:
                continue

            escalations.append({
                "cve_id": vuln.get("cve_id", ""),
                "severity": vuln["severity"],
                "age_days": vuln["age_days"],
                "sla_days": vuln["sla_days"],
                "days_overdue": vuln["days_overdue"],
                "sla_pct": pct,
                "escalation_level": level,
                "asset": vuln.get("asset", ""),
                "owner": vuln.get("owner", ""),
            })

        return pd.DataFrame(escalations)

Step 3: Dashboard Visualization

# Grafana/Kibana query examples for vulnerability aging

# Age distribution histogram (Elasticsearch)
age_distribution_query = {
    "aggs": {
        "age_buckets": {
            "range": {
                "field": "age_days",
                "ranges": [
                    {"key": "0-7 days", "to": 8},
                    {"key": "8-14 days", "from": 8, "to": 15},
                    {"key": "15-30 days", "from": 15, "to": 31},
                    {"key": "31-60 days", "from": 31, "to": 61},
                    {"key": "61-90 days", "from": 61, "to": 91},
                    {"key": "90+ days", "from": 91},
                ]
            }
        }
    }
}

# SLA compliance trend (monthly)
sla_trend_query = {
    "aggs": {
        "monthly": {
            "date_histogram": {"field": "remediation_date", "interval": "month"},
            "aggs": {
                "within_sla": {
                    "filter": {"script": {
                        "source": "doc['age_days'].value <= doc['sla_days'].value"
                    }}
                }
            }
        }
    }
}

Best Practices

Start with achievable SLA targets and tighten them as processes mature
Adapt SLAs based on asset criticality and threat context, not just CVSS scores
Automate escalation notifications to reduce manual tracking overhead
Track MTTR trends month-over-month to demonstrate improvement
Build exception workflows that require documented compensating controls
Report SLA compliance to executive leadership monthly for accountability
Include aging metrics in security committee and board-level reporting
Integrate SLA tracking with ITSM ticketing for end-to-end remediation visibility

Common Pitfalls

Setting unrealistic SLA targets that teams cannot meet, causing SLA fatigue
Not adapting SLAs for asset criticality, treating all systems equally
Lacking exception processes, forcing teams to either ignore SLAs or request blanket waivers
Measuring only open vulnerability count without considering age and SLA compliance
Not tracking the SLA clock from discovery date (using report date instead)
Failing to re-baseline SLAs as team maturity improves

Related Skills

implementing-vulnerability-remediation-sla
building-executive-vulnerability-risk-report
implementing-security-metrics-and-kpis
performing-remediation-validation-scanning