observability first

Observability First (可观测性优先)

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "observability first" with this command: npx skills add fwrite0920/android-skills/fwrite0920-android-skills-observability-first

Observability First (可观测性优先)

Instructions

  • 先定义要观测的关键流程与指标

  • 先填写 Required Inputs(流程、阈值、负责人)并冻结

  • 依序创建 Crash/ANR、结构化日志、性能指标

  • 一次只补强一类信号,避免噪音扩散

  • 完成后对照 Quick Checklist

When to Use

  • 发布前需要稳定监控与回馈闭环

  • 事故频繁但缺乏可定位信息

  • 需要把性能与稳定性纳入日常决策

Example Prompts

  • "请创建支付流程的可观测性指标与事件"

  • "请设计 Crash/ANR 的告警门槛"

  • "帮我创建结构化日志的字段规格"

  • "请用 OpenTelemetry 追踪关键 API 调用链"

Workflow

  • 先确认 Required Inputs(关键流程、SLO、告警接收人)

  • 定义关键流程与 SLO,并建立事件命名规范

  • 创建 Crash/ANR 与结构化事件

  • 加入性能指标与告警门槛

  • 建立回馈回路(分析 -> 修复 -> 验证)与值班流程

  • 运行 Monitoring Gate 验收命令并记录结果

Practical Notes (2026)

  • 先有指标再谈优化,避免主观调整

  • 事件字段需一致,便于查找与汇整

  • 告警门槛要可运行且可回溯

  • 同一个业务事件只定义一个 canonical name,避免跨系统同义词

  • P0 告警必须有 owner 与响应时限(SLA),避免“看见但没人处理”

  • 仪表板要区分 release/build version,支持回归比对

Minimal Template

目标: 关键流程 owner: 告警接收渠道: SLO 窗口(日/周): 关键流程: 指标/事件: 告警门槛: 验收: Quick Checklist

Required Inputs (执行前输入)

  • 关键流程清单 (启动/登录/支付/列表等)

  • SLO (指标定义、统计窗口、目标值)

  • Owner & Oncall (每个 P0/P1 指标对应负责人)

  • 告警渠道 (PagerDuty/Slack/Email)

  • 事件命名与字段字典 (event name、必填字段、枚举值)

  • 发布维度 (build version、flavor、region)

Deliverables (完成后交付物)

  • SLO 文档 (含阈值、owner、响应时限)

  • 结构化事件字典 (字段说明 + 示例)

  • Crash/ANR 上下文策略 (custom keys 与分级)

  • 性能仪表板 (启动、网络、滚动、关键交易)

  • 告警规则 (P0/P1/P2)与升级路径

  • 反馈闭环记录模板 (问题 -> 修复 -> 指标回归)

Monitoring Gate (验收门槛)

1) 基础质量

./gradlew lint test assemble

2) 性能量测(若项目有 benchmark 模块)

./gradlew :benchmark:connectedBenchmarkAndroidTest

3) 关键信号验证(按项目脚本调整)

./gradlew :app:connectedDebugAndroidTest

4) 发布前手动核查

- Crashlytics/Performance dashboard 有最近 24h 数据

- P0 告警路由已验证(至少演练一次)

没有 benchmark 模块时,需在 PR 说明中记录替代量测方法与结果。

Signals & SLOs

关键流程清单

流程 P0 指标 SLO 目标

启动 Cold Start 时间 P95 < 1.5s

登录 成功率

99.5%

核心交易 完成率、延迟 成功率 > 99.9%, P95 < 3s

列表滚动 掉帧率 Jank < 1%

网络请求 成功率、延迟 成功率 > 99%, P95 < 500ms

指标分级

enum class MetricPriority { P0, // Crash/ANR 率、关键流程失败率 — 立即告警 P1, // 首次渲染时间、列表滚动流畅度 — 每日检视 P2 // 特定功能转换率或完成率 — 每周检视 }

Firebase Performance Monitoring

自定义 Trace

class CheckoutTracer @Inject constructor() {

fun &#x3C;T> traceCheckout(block: () -> T): T {
    val trace = Firebase.performance.newTrace("checkout_flow")
    trace.start()
    return try {
        val result = block()
        trace.putAttribute("result", "success")
        result
    } catch (e: Exception) {
        trace.putAttribute("result", "failure")
        trace.putAttribute("error", e.javaClass.simpleName)
        throw e
    } finally {
        trace.stop()
    }
}

}

// 使用 class CheckoutUseCase @Inject constructor( private val tracer: CheckoutTracer, private val repository: OrderRepository ) { suspend fun execute(order: Order): OrderResult { return tracer.traceCheckout { repository.submitOrder(order) } } }

网络请求自动追踪

class PerformanceInterceptor : Interceptor { override fun intercept(chain: Interceptor.Chain): Response { val request = chain.request() val metric = Firebase.performance.newHttpMetric( request.url.toString(), request.method ) metric.start()

    return try {
        val response = chain.proceed(request)
        metric.setResponseContentType(response.header("Content-Type"))
        metric.setHttpResponseCode(response.code)
        metric.setResponsePayloadSize(response.body?.contentLength() ?: 0)
        response
    } catch (e: IOException) {
        metric.putAttribute("result", "io_exception")
        throw e
    } finally {
        metric.stop()
    }
}

}

Structured Events

统一事件接口

interface AnalyticsTracker { fun track(event: AnalyticsEvent) }

data class AnalyticsEvent( val name: String, val params: Map<String, Any> = emptyMap() )

class CompositeTracker @Inject constructor( private val trackers: Set<@JvmSuppressWildcards AnalyticsTracker> ) : AnalyticsTracker { override fun track(event: AnalyticsEvent) { trackers.forEach { it.track(event) } } }

class FirebaseTracker @Inject constructor() : AnalyticsTracker { override fun track(event: AnalyticsEvent) { Firebase.analytics.logEvent(event.name) { event.params.forEach { (key, value) -> when (value) { is String -> param(key, value) is Long -> param(key, value) is Double -> param(key, value) is Bundle -> param(key, value) } } } } }

事件字段规格

object EventKeys { const val FLOW_ID = "flow_id" const val USER_TIER = "user_tier" const val BUILD_VERSION = "build_version" const val LATENCY_MS = "latency_ms" const val RESULT = "result" const val ERROR_CODE = "error_code" const val SCREEN_NAME = "screen_name" }

// 使用 tracker.track(AnalyticsEvent( name = "checkout_completed", params = mapOf( EventKeys.FLOW_ID to flowId, EventKeys.LATENCY_MS to duration, EventKeys.RESULT to "success", EventKeys.USER_TIER to "premium" ) ))

Crash / ANR Strategy

Crash 上下文增强

class CrashContextManager @Inject constructor() {

fun setFlowContext(flowName: String, params: Map&#x3C;String, String> = emptyMap()) {
    Firebase.crashlytics.apply {
        setCustomKey("current_flow", flowName)
        setCustomKey("flow_timestamp", System.currentTimeMillis().toString())
        params.forEach { (k, v) -> setCustomKey(k, v) }
    }
}

fun clearFlowContext() {
    Firebase.crashlytics.setCustomKey("current_flow", "none")
}

}

Non-Fatal 分级策略

enum class NonFatalSeverity { LOW, MEDIUM, HIGH }

class NonFatalReporter @Inject constructor() {

fun report(
    exception: Exception,
    severity: NonFatalSeverity,
    context: Map&#x3C;String, String> = emptyMap()
) {
    if (severity == NonFatalSeverity.LOW) return

    Firebase.crashlytics.apply {
        setCustomKey("severity", severity.name)
        context.forEach { (k, v) -> setCustomKey(k, v) }
        recordException(exception)
    }
}

}

Performance Signals

Startup 量测上报

class StartupMetricReporter @Inject constructor( private val tracker: AnalyticsTracker ) { private var processStartTime: Long = 0L

fun onProcessStart() {
    processStartTime = SystemClock.elapsedRealtime()
}

fun onFirstFrameRendered() {
    val duration = SystemClock.elapsedRealtime() - processStartTime
    tracker.track(AnalyticsEvent(
        name = "app_startup",
        params = mapOf(
            EventKeys.LATENCY_MS to duration,
            "startup_type" to "cold"
        )
    ))
}

}

CI Gate 性能门槛

.github/workflows/performance-gate.yml

name: Performance Gate on: pull_request: branches: [main] jobs: benchmark: runs-on: macos-latest steps: - uses: actions/checkout@v4 - name: Run Macrobenchmark uses: reactivecircus/android-emulator-runner@v2 with: api-level: 34 script: ./gradlew :benchmark:connectedBenchmarkAndroidTest - name: Check Startup Threshold run: | STARTUP_MS=$(jq '.benchmarks[0].metrics.timeToInitialDisplayMs.median' benchmark/build/outputs/connected_android_test_additional_output/benchmarkData.json) if (( $(echo "$STARTUP_MS > 1500" | bc -l) )); then echo "Startup ${STARTUP_MS}ms exceeds 1500ms threshold" exit 1 fi

Alerting & Feedback Loop

告警门槛

指标 告警门槛 动作

Crash-free rate < 99.5% P0 立即通知

ANR rate

0.5% P0 立即通知

API 成功率 < 99% P1 当日处理

Cold Start P95

2s P1 当日处理

Jank rate

5% P2 本周处理

回馈回路

发现问题 → 定位根因 → 修复 → 验证指标回归 → 更新 SLO ↑ │ └──────────────────────────────────────────────┘

Quick Checklist

  • Required Inputs 已填写并冻结(流程/SLO/owner/告警渠道)

  • 关键流程与 SLO 定义完成

  • 关键事件命名与字段字典完成(含必填字段)

  • Firebase Performance 自定义 Trace 覆盖核心流程

  • 事件字段统一且可查找

  • Crash/ANR 上下文增强(Custom Keys)

  • Non-Fatal 分级策略避免噪音

  • 性能指标有量测与 CI Gate 门槛

  • 告警门槛与回馈回路已创建

  • 每个 P0 指标有明确 owner 与响应时限

  • Monitoring Gate 已执行并记录结果

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

crash monitoring

No summary provided by upstream source.

Repository SourceNeeds Review
General

navigation patterns

No summary provided by upstream source.

Repository SourceNeeds Review
General

deep performance tuning

No summary provided by upstream source.

Repository SourceNeeds Review
General

dependency injection mastery

No summary provided by upstream source.

Repository SourceNeeds Review