AI Tech RSS Fetch
Core Goal
- Subscribe to RSS/Atom sources.
- Persist feed and entry metadata to SQLite.
- Deduplicate entries with layered identity keys plus content fingerprints.
- Keep only metadata; do not fetch full article bodies and do not summarize.
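The layered dedupe idea can be sketched in Python. This is a minimal sketch under stated assumptions, not the logic in scripts/rss_subscribe.py. The helper names (identity_keys, content_hash, canonicalize_url), the URL canonicalization rule, the hashed fields and the "high" confidence label are all hypothetical. It uses the guid, canonical_url and fallback_hash key types named in the output contract and omits legacy_guid, whose derivation the source does not describe.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def _sha256(*parts: str) -> str:
    # Join with a unit separator so ("ab", "c") and ("a", "bc") hash differently.
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


def canonicalize_url(url: str) -> str:
    # Hypothetical rule: lowercase scheme and host, drop the fragment and any
    # trailing slash on the path; the query string is kept unchanged.
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def identity_keys(entry: dict, feed_url: str) -> tuple[list[tuple[str, str]], str]:
    """Return (identity keys from strongest to weakest, match_confidence)."""
    keys = []
    guid = (entry.get("guid") or "").strip()
    link = (entry.get("link") or "").strip()
    if guid:
        keys.append(("guid", guid))
    if link:
        keys.append(("canonical_url", canonicalize_url(link)))
    if keys:
        return keys, "high"  # hypothetical label; the source only defines "low"
    # Neither guid nor link: hash stable metadata and flag the match as weak.
    fallback = _sha256(feed_url, entry.get("title") or "", entry.get("published") or "")
    return [("fallback_hash", fallback)], "low"


def content_hash(entry: dict) -> str:
    # Fingerprint of the stored metadata; a changed value flags an edited entry.
    return _sha256(entry.get("title") or "", entry.get("summary") or "")
```

Under this sketch, an incoming entry would count as a duplicate when any of its keys already exists in the identity table, and a differing content_hash on a matched entry would signal an update rather than a new item.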
Triggering Conditions
- Receive a request to subscribe to RSS feeds from URLs or OPML.
- Receive a request to run incremental RSS sync reliably.
- Need stable metadata persistence for downstream processing.
- Need dedupe-safe storage of feed items over repeated runs.
Workflow
- Prepare runtime and database.
- Ensure the dependency is installed:
python3 -m pip install feedparser
- In multi-agent runtimes, pin the database to an absolute path before running any command:
export AI_RSS_DB_PATH="/absolute/path/to/workspace-rss-bot/ai_rss.db"
- Initialize SQLite schema once:
python3 scripts/rss_subscribe.py init-db --db "$AI_RSS_DB_PATH"
- Add feed subscriptions.
- Add one feed URL:
python3 scripts/rss_subscribe.py add-feed --db "$AI_RSS_DB_PATH" --url "https://example.com/feed.xml"
- Import feeds from OPML:
python3 scripts/rss_subscribe.py import-opml --db "$AI_RSS_DB_PATH" --opml assets/hn-popular-blogs-2025.opml
- Run incremental sync.
- Fetch active feeds and store metadata:
python3 scripts/rss_subscribe.py sync --db "$AI_RSS_DB_PATH" --max-feeds 20 --max-items-per-feed 100
- Optional one-feed sync:
python3 scripts/rss_subscribe.py sync --db "$AI_RSS_DB_PATH" --feed-url "https://example.com/feed.xml"
- Query persisted metadata.
- List feeds:
python3 scripts/rss_subscribe.py list-feeds --db "$AI_RSS_DB_PATH" --limit 50
- List recent entries:
python3 scripts/rss_subscribe.py list-entries --db "$AI_RSS_DB_PATH" --limit 100
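An agent that wraps these commands might check the two runtime preconditions (an absolute AI_RSS_DB_PATH and an importable feedparser) before invoking the script. A minimal sketch: the helper names resolve_db_path, missing_dependency_hint and sync_command are hypothetical, and only the subcommands and flags shown above are used.

```python
import importlib.util
import os


def resolve_db_path(env=None) -> str:
    """Return AI_RSS_DB_PATH, refusing unset or relative paths."""
    env = os.environ if env is None else env
    path = env.get("AI_RSS_DB_PATH", "")
    if not path or not os.path.isabs(path):
        # A relative path would resolve against each agent's own working directory.
        raise ValueError("set AI_RSS_DB_PATH to an absolute path before any command")
    return path


def missing_dependency_hint():
    """Return install guidance if feedparser is not importable, else None."""
    if importlib.util.find_spec("feedparser") is None:
        return "feedparser is missing; run: python3 -m pip install feedparser"
    return None


def sync_command(db_path, max_feeds=20, max_items_per_feed=100, feed_url=None):
    """Build argv for the sync subcommand: a one-feed sync or a bounded full sync."""
    argv = ["python3", "scripts/rss_subscribe.py", "sync", "--db", db_path]
    if feed_url:
        argv += ["--feed-url", feed_url]
    else:
        argv += ["--max-feeds", str(max_feeds), "--max-items-per-feed", str(max_items_per_feed)]
    return argv
```

A caller could then run subprocess.run(sync_command(resolve_db_path()), check=True) once missing_dependency_hint() returns None.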
Input Requirements
- Supported inputs:
- RSS XML feed URLs.
- OPML feed list files.
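OPML files conventionally list each subscription as an outline element carrying an xmlUrl attribute, often nested under category outlines. The sketch below shows how feed URLs could be collected from such a file; feed_urls_from_opml is a hypothetical helper, not the import-opml implementation.

```python
import xml.etree.ElementTree as ET


def feed_urls_from_opml(opml_text: str) -> list[str]:
    """Collect unique xmlUrl attributes from every <outline>, in document order."""
    root = ET.fromstring(opml_text)
    seen, urls = set(), []
    for outline in root.iter("outline"):  # iter() also walks nested category outlines
        url = (outline.get("xmlUrl") or "").strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

For OPML from untrusted sources, a hardened parser such as defusedxml is a safer drop-in than the standard library parser.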
Output Contract (Metadata Only)
- Persist feeds metadata to SQLite: feed_url, feed_title, site_url, etag, last_modified, status fields.
- Persist entries metadata to SQLite: id, dedupe_key (compatibility snapshot of the primary identity key), guid, url, canonical_url, title, author, published_at, updated_at, summary, categories, content_hash, match_confidence, timestamps.
- Persist the entry_identities mapping table to SQLite: entry_id, key_type, key_value, created_at.
  - Supported key types: guid, canonical_url, legacy_guid, fallback_hash.
- Do not store generated summaries and do not create archive markdown files.
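One possible SQLite layout for the three tables is sketched below. The column names come from the contract above; the column types, the feed_id link from entries to feeds, the active and last_error status columns, the created_at timestamp and the UNIQUE (key_type, key_value) constraint are assumptions, not the schema that init-db actually creates.

```python
import sqlite3

# Hypothetical DDL derived from the field lists above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS feeds (
    id INTEGER PRIMARY KEY,
    feed_url TEXT NOT NULL UNIQUE,
    feed_title TEXT,
    site_url TEXT,
    etag TEXT,
    last_modified TEXT,
    active INTEGER NOT NULL DEFAULT 1,   -- assumed status field
    last_error TEXT                      -- assumed status field
);
CREATE TABLE IF NOT EXISTS entries (
    id INTEGER PRIMARY KEY,
    feed_id INTEGER NOT NULL REFERENCES feeds(id),  -- assumed link column
    dedupe_key TEXT NOT NULL,
    guid TEXT, url TEXT, canonical_url TEXT,
    title TEXT, author TEXT,
    published_at TEXT, updated_at TEXT,
    summary TEXT, categories TEXT,
    content_hash TEXT,
    match_confidence TEXT,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS entry_identities (
    entry_id INTEGER NOT NULL REFERENCES entries(id),
    key_type TEXT NOT NULL CHECK (key_type IN
        ('guid', 'canonical_url', 'legacy_guid', 'fallback_hash')),
    key_value TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (key_type, key_value)  -- one identity can point at only one entry
);
"""


def init_db(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)  # safe to rerun thanks to IF NOT EXISTS
    return conn
```

The uniqueness constraint on (key_type, key_value) is what would make repeated runs dedupe-safe in this design: re-inserting a known identity fails instead of creating a second entry.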
Configurable Parameters
- db_path / AI_RSS_DB_PATH (an absolute path is recommended in multi-agent runtimes)
- opml_path
- feed_urls
- max_feeds_per_run
- max_items_per_feed
- user_agent
- seen_ttl_days
- enable_conditional_get
- Example config: assets/config.example.json
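A minimal sketch of loading these parameters, assuming the config is a flat JSON object whose keys match the names above. The default values are placeholders: 20 and 100 mirror the sync example, while the user_agent string and seen_ttl_days value are hypothetical, as is the rule that AI_RSS_DB_PATH overrides db_path.

```python
import copy
import json
import os

# Parameter names come from the list above; default values are placeholders.
DEFAULTS = {
    "db_path": None,
    "opml_path": None,
    "feed_urls": [],
    "max_feeds_per_run": 20,
    "max_items_per_feed": 100,
    "user_agent": "ai-tech-rss-fetch (hypothetical)",
    "seen_ttl_days": 30,
    "enable_conditional_get": True,
}


def load_config(text: str, env=None) -> dict:
    """Merge a JSON config over DEFAULTS; AI_RSS_DB_PATH overrides db_path."""
    env = os.environ if env is None else env
    config = copy.deepcopy(DEFAULTS)  # deep copy so callers never mutate DEFAULTS
    config.update(json.loads(text))
    # Assumed precedence: the environment variable pins the DB path for every agent.
    if env.get("AI_RSS_DB_PATH"):
        config["db_path"] = env["AI_RSS_DB_PATH"]
    return config
```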
Error and Boundary Handling
- Feed HTTP/network failure: keep syncing other feeds and record
last_error. - Feed
304 Not Modified: skip entry parsing and keep state. - Missing
guidandlink: use hashed fallback identity and setmatch_confidence=low. - Dependency missing (
feedparser): return install guidance.
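The first two rules can be sketched as a per-feed sync loop with conditional GET. The names FeedState, FetchResult, urllib_fetch and sync_feeds, and the outcome labels, are hypothetical; the fetcher and parser are injected so the loop can be exercised without network access. feedparser.parse() also accepts etag= and modified= arguments and reports the HTTP status, so the real script may rely on that instead.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FeedState:
    feed_url: str
    etag: Optional[str] = None
    last_modified: Optional[str] = None
    last_error: Optional[str] = None


@dataclass
class FetchResult:
    status: int
    body: bytes = b""
    etag: Optional[str] = None
    last_modified: Optional[str] = None


def urllib_fetch(url: str, headers: dict) -> FetchResult:
    """Real fetcher; urllib raises HTTPError for 304, so map it back to a status."""
    req = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return FetchResult(resp.status, resp.read(),
                               resp.headers.get("ETag"), resp.headers.get("Last-Modified"))
    except urllib.error.HTTPError as err:
        if err.code == 304:
            return FetchResult(304)
        raise


def sync_feeds(feeds, fetch: Callable[[str, dict], FetchResult],
               parse: Callable[[bytes], list], conditional_get: bool = True) -> dict:
    """Sync each feed independently and return a per-feed outcome label."""
    outcomes = {}
    for feed in feeds:
        headers = {}
        if conditional_get and feed.etag:
            headers["If-None-Match"] = feed.etag
        if conditional_get and feed.last_modified:
            headers["If-Modified-Since"] = feed.last_modified
        try:
            result = fetch(feed.feed_url, headers)
            if result.status == 304:
                # Not Modified: skip entry parsing and keep the stored validators.
                outcomes[feed.feed_url] = "not_modified"
                continue
            entries = parse(result.body)
        except Exception as exc:
            # One failing feed must not stop the run: record it and move on.
            feed.last_error = f"{type(exc).__name__}: {exc}"
            outcomes[feed.feed_url] = "error"
            continue
        feed.etag = result.etag or feed.etag
        feed.last_modified = result.last_modified or feed.last_modified
        feed.last_error = None
        outcomes[feed.feed_url] = f"parsed {len(entries)}"
    return outcomes
```

The conditional_get flag stands in for the enable_conditional_get parameter; with it off, no validators are sent and every feed is refetched in full.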
Final Output Checklist (Required)
- core goal
- trigger conditions
- input requirements
- metadata schema
- dedupe and sync rules
- command workflow
- configurable parameters
- error handling
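When the user requests it, use the following simplified checklist verbatim (in order: core goal, input requirements, trigger conditions, metadata model, dedupe and sync rules, command workflow, configurable parameters, error handling):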
核心目标
输入需求
触发条件
元数据模型
去重与同步规则
命令流程
可配置参数
错误处理
References
- references/input-model.md
- references/output-rules.md
- references/time-range-rules.md
Assets
- assets/hn-popular-blogs-2025.opml (candidate feed pool)
- assets/config.example.json
Scripts
scripts/rss_subscribe.py