AI News Pipeline
Overview
This skill is executable on its own; the actual workflow scripts are bundled in scripts/.
Run them against the current workspace or pass --workspace /path/to/workspace explicitly.
Workspace Requirements
The target workspace should contain or accept these files and folders:
- config/sources.json
- config/international_sources.json
- companies.txt
- data/
- reports/
- state/
If the folders do not exist, the scripts create them.
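The workspace scaffolding described above can be sketched in a few lines. This is illustrative only: `prepare_workspace` is not part of the bundled scripts, but it shows the layout the scripts expect.

```python
from pathlib import Path

# Files the scripts expect to find, and folders they create if missing
# (per the workspace requirements above).
REQUIRED_FILES = ["config/sources.json", "config/international_sources.json", "companies.txt"]
REQUIRED_DIRS = ["config", "data", "reports", "state"]

def prepare_workspace(workspace: str) -> list:
    """Create missing folders and return the required files that are absent."""
    root = Path(workspace)
    for d in REQUIRED_DIRS:
        (root / d).mkdir(parents=True, exist_ok=True)
    return [f for f in REQUIRED_FILES if not (root / f).exists()]
```

Running this against a fresh directory creates the folder skeleton and tells you which config files you still need to supply.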
Install Dependencies
Install Python dependencies before first use:
python -m pip install -r /path/to/skill/scripts/requirements.txt
Available Entrypoints
Choose the bundled Python entrypoint that matches the job type.
Capture Only
Use this for high-frequency collection jobs. It only captures feeds, updates deduplication state, and writes raw and incremental data.
python /path/to/skill/scripts/run_capture_only.py --workspace /path/to/workspace
Report Only
Use this for scheduled delivery jobs. It reads already-collected data, calls the model for summaries and titles, updates the cumulative Excel files, and rebuilds the Word brief.
By default it uses the reporting window from yesterday 00:00 to today 08:00.
python /path/to/skill/scripts/run_report_only.py --workspace /path/to/workspace
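The default window rule ("yesterday 00:00 to today 08:00") can be expressed as a small sketch; the bundled script may compute it differently internally.

```python
from datetime import datetime, timedelta

def default_report_window(now: datetime):
    """Sketch of the default reporting window: yesterday 00:00 to today 08:00."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight - timedelta(days=1), midnight + timedelta(hours=8)
```

For a run at 2026-03-16 12:30 this yields the window 2026-03-15 00:00 to 2026-03-16 08:00.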
Optional time window:
python /path/to/skill/scripts/run_report_only.py --workspace /path/to/workspace --time-window "2026-03-15 00:00 to 2026-03-16 08:00"
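Judging from the example above, the flag takes `"<start> to <end>"` with `YYYY-MM-DD HH:MM` timestamps. A parser for that shape could look like the following sketch (the exact format accepted by the script is an assumption inferred from the example):

```python
from datetime import datetime

def parse_time_window(value: str):
    """Parse '"YYYY-MM-DD HH:MM to YYYY-MM-DD HH:MM"' into (start, end)."""
    start_s, _, end_s = value.partition(" to ")
    fmt = "%Y-%m-%d %H:%M"
    start = datetime.strptime(start_s.strip(), fmt)
    end = datetime.strptime(end_s.strip(), fmt)
    if end <= start:
        raise ValueError("time window end must be after start")
    return start, end
```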
Optional skip-AI mode:
python /path/to/skill/scripts/run_report_only.py --workspace /path/to/workspace --disable-ai
Full Workflow
python /path/to/skill/scripts/run_full_workflow.py --workspace /path/to/workspace
Optional time window:
python /path/to/skill/scripts/run_full_workflow.py --workspace /path/to/workspace --time-window "2026-03-15 00:00 to 2026-03-15 18:00"
Optional skip-AI mode:
python /path/to/skill/scripts/run_full_workflow.py --workspace /path/to/workspace --disable-ai
What Each Entrypoint Does
run_capture_only.py
- Collect domestic RSS items into data/YYYY-MM-DD.jsonl.
- Collect domestic raw items into data/domestic_raw_YYYY-MM-DD.jsonl.
- Collect international raw items into data/international_raw_YYYY-MM-DD.jsonl.
- Filter international items into data/international_YYYY-MM-DD.jsonl.
- Save per-source snapshots in snapshots/.
- Update RSS deduplication and source metrics in state/feed_state.json.
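The capture outputs are JSON Lines files (one JSON object per line), so they are easy to inspect directly. This loader is illustrative, not part of the skill:

```python
import json
from pathlib import Path

def load_jsonl(path: Path) -> list:
    """Read a capture file such as data/YYYY-MM-DD.jsonl into a list of dicts."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```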
run_report_only.py
- Read the selected time window from collected data.
- Build the cumulative domestic Excel output in reports/company_mentions.xlsx.
- Build the cumulative international Excel output in reports/international_company_mentions.xlsx.
- Call the model to generate domestic AI titles and AI summaries.
- Call the model to generate international AI titles, AI summaries, and impact scores.
- Build a merged daily Word brief in reports/.
run_full_workflow.py
- Run capture.
- Run domestic reporting.
- Run international reporting.
Inputs
- Domestic RSS config: config/sources.json
- International RSS config: config/international_sources.json
- Company list: companies.txt
- Volcengine key: ARK_API_KEY
- Optional model override: ARK_MODEL
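A sketch of how the two environment variables above might be consumed; the fallback model name "ark-default" is a placeholder for illustration, not the skill's real default.

```python
import os

def model_config(env=None):
    """Return (api_key, model) from ARK_API_KEY / ARK_MODEL, failing fast
    when the key is missing (AI steps cannot run without it)."""
    env = os.environ if env is None else env
    api_key = env.get("ARK_API_KEY")
    if not api_key:
        raise RuntimeError("ARK_API_KEY is not set; AI steps cannot run")
    return api_key, env.get("ARK_MODEL", "ark-default")  # optional override
```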
Important Behavior
- state/feed_state.json controls RSS deduplication.
- Excel files are cumulative.
- The Word brief is rebuilt per run.
- The Word international section only includes the top 5 items by impact score inside the selected time window.
- International items without a successful AI summary are excluded from the Word brief.
- AI cache files are deleted automatically after each run.
Troubleshooting
- If the workflow does not rerun old RSS items, check state/feed_state.json.
- If AI columns are empty, check whether ARK_API_KEY is set in the execution environment.
- If the user wants a full rebuild, delete the relevant daily data files and state/feed_state.json, then rerun.
- If the user needs exact commands or cloud prompts, read references/commands.md.
References
references/commands.md