knowledge-base-setup

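Quickly set up a local knowledge base with RAG natural-language search on a Mac Mini (M4).

Use cases:

  • Configuring a knowledge base on a new Mac: install and configure Ollama, the embedding model, scheduled tasks, and OCR-based document analysis from scratch
  • Troubleshooting problems such as garbled PDF text extraction, scheduled-task timeouts, and skill loading failures
  • Building a pipeline that analyzes documents automatically every day and sends a summary to Feishu at 08:00
  • Migrating or replicating the knowledge base: packaging the entire knowledge directory and configuration for a new computer

This skill walks you through creating the directory structure, installing dependencies, deploying the scripts, registering the scheduled tasks, and configuring OpenClaw.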

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.


Install skill "knowledge-base-setup" with this command: npx skills add seairteng/macmini-knowledge-base

Knowledge Base Setup

Quickly set up a local knowledge base with RAG search on a Mac Mini.

Quick Start

Method 1: One-step installation (recommended)

cd ~/.openclaw/workspace/skills/knowledge-base-setup/scripts
bash setup.sh <Feishu user ID>

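setup.sh automatically:

  1. Creates the directory structure
  2. Installs tesseract and the Python dependencies
  3. Downloads the nomic-embed-text model
  4. Deploys the analysis scripts to knowledge/.analysis/
  5. Updates the OpenClaw configuration (ollama provider + memorySearch)
  6. Restarts the gateway

After installation, register the scheduled tasks manually (setup.sh prints the exact commands).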

Method 2: Manual step-by-step installation

Run the following steps in order:

Step 1: Prepare the environment

brew install tesseract
pip3 install pytesseract pymupdf pdfplumber
# Install Ollama: https://ollama.com/download
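To confirm Step 1 worked before moving on, you could run a quick check like the one below. It is a minimal sketch, not part of the skill: it assumes the executables are named `tesseract` and `ollama` on your PATH, and that PyMuPDF is importable as `fitz`, its long-standing import name.

```python
import importlib.util
import shutil

# Executables that Step 1 is expected to put on PATH (assumed names).
REQUIRED_BINARIES = ["tesseract", "ollama"]
# Import names of the pip packages; PyMuPDF is imported as "fitz".
REQUIRED_MODULES = ["pytesseract", "fitz", "pdfplumber"]


def check_environment(binaries=REQUIRED_BINARIES, modules=REQUIRED_MODULES):
    """Return a dict mapping each dependency to True if it was found."""
    report = {}
    for name in binaries:
        report[f"bin:{name}"] = shutil.which(name) is not None
    for name in modules:
        # find_spec returns None for a top-level module that is not installed.
        report[f"py:{name}"] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for dep, ok in check_environment().items():
        print(f"{'OK     ' if ok else 'MISSING'} {dep}")
```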

Step 2: Download the embedding model

ollama pull nomic-embed-text
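With the model pulled, RAG search boils down to embedding the query and the document chunks, then ranking the chunks by cosine similarity. The sketch below only illustrates that idea; it is not how OpenClaw's memorySearch works internally. It assumes a recent Ollama that serves `POST /api/embed` (taking `model` and `input`, returning `embeddings`) at the same `baseUrl` used in Step 5; older Ollama releases only offer `/api/embeddings` with a `prompt` field.

```python
import json
import math
import urllib.request

# Same baseUrl as in the OpenClaw config (assumed default Ollama port).
OLLAMA_URL = "http://127.0.0.1:11434"


def embed(texts, model="nomic-embed-text", base_url=OLLAMA_URL, timeout=60):
    """Return one embedding vector per input text via Ollama's /api/embed."""
    payload = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["embeddings"]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec, doc_vecs):
    """Return document indices sorted from most to least similar to the query."""
    scores = [cosine(query_vec, v) for v in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
```

For example, with a hypothetical list of text `chunks`, `query_vec, *doc_vecs = embed(["quarterly GDP"] + chunks)` followed by `rank(query_vec, doc_vecs)` returns chunk indices, best match first. The Ollama app must be running for `embed` to succeed.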

Step 3: Create the directory structure

mkdir -p ~/.openclaw/workspace/knowledge/.analysis/summaries/archives
mkdir -p ~/.openclaw/workspace/knowledge/temp_docs
mkdir -p ~/.openclaw/workspace/knowledge/"Macro Financials"
touch ~/.openclaw/workspace/knowledge/文章目录.md

Step 4: Deploy the scripts (copy them from the skill directory)

cp ~/.openclaw/workspace/skills/knowledge-base-setup/scripts/run_analysis.py \
   ~/.openclaw/workspace/knowledge/.analysis/
cp ~/.openclaw/workspace/skills/knowledge-base-setup/scripts/generate_catalog.js \
   ~/.openclaw/workspace/knowledge/.analysis/
chmod +x ~/.openclaw/workspace/knowledge/.analysis/*.py
chmod +x ~/.openclaw/workspace/knowledge/.analysis/*.js
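If the analysis job later fails with file-not-found or permission errors, a quick check of the deployment target helps. A minimal sketch, assuming the two script names and the default path from Step 4; `check_deployment` is a hypothetical helper:

```python
import os
from pathlib import Path

# Default deployment target and script names from Step 4.
ANALYSIS_DIR = Path.home() / ".openclaw" / "workspace" / "knowledge" / ".analysis"
EXPECTED_SCRIPTS = ["run_analysis.py", "generate_catalog.js"]


def check_deployment(analysis_dir=ANALYSIS_DIR, scripts=EXPECTED_SCRIPTS):
    """Return a list of problems: scripts that are missing or not executable."""
    problems = []
    for name in scripts:
        path = Path(analysis_dir) / name
        if not path.is_file():
            problems.append(f"missing: {path}")
        elif not os.access(path, os.X_OK):  # chmod +x was not applied
            problems.append(f"not executable: {path}")
    return problems


if __name__ == "__main__":
    print(check_deployment() or "deployment looks complete")
```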

Step 5: Configure OpenClaw

Edit ~/.openclaw/openclaw.json and add the following, merging it into any existing keys:

{
  "models": {
    "providers": {
      "ollama": {
        "baseUrl": "http://127.0.0.1:11434",
        "api": "ollama",
        "models": [
          {"id": "nomic-embed-text", "name": "Nomic Embed Text"}
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "memorySearch": {
        "provider": "ollama",
        "model": "nomic-embed-text"
      }
    }
  }
}
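Because openclaw.json usually already holds other settings, pasting the fragment by hand can clobber existing keys. The sketch below deep-merges the Step 5 fragment into the file and writes a `.bak` copy first. It is a hypothetical helper, not something setup.sh is known to use, and it assumes the file is strict JSON: if yours contains comments or trailing commas, `json.loads` will fail and you should edit by hand instead. Lists are replaced rather than combined, so an existing `models` list under the ollama provider would be overwritten.

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".openclaw" / "openclaw.json"

# The Step 5 fragment, expressed as a Python dict.
KB_FRAGMENT = {
    "models": {
        "providers": {
            "ollama": {
                "baseUrl": "http://127.0.0.1:11434",
                "api": "ollama",
                "models": [{"id": "nomic-embed-text", "name": "Nomic Embed Text"}],
            }
        }
    },
    "agents": {
        "defaults": {
            "memorySearch": {"provider": "ollama", "model": "nomic-embed-text"}
        }
    },
}


def deep_merge(base, fragment):
    """Return a new dict: nested dicts are merged, other values from fragment win."""
    merged = dict(base)
    for key, value in fragment.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value  # scalars and lists are replaced, not combined
    return merged


def update_config(path=CONFIG_PATH, fragment=KB_FRAGMENT):
    """Merge fragment into the JSON file at path, writing a .bak copy first."""
    path = Path(path)
    current = {}
    if path.exists():
        text = path.read_text(encoding="utf-8")
        path.with_name(path.name + ".bak").write_text(text, encoding="utf-8")
        current = json.loads(text)  # assumes strict JSON (no comments)
    merged = deep_merge(current, fragment)
    path.write_text(json.dumps(merged, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
    return merged
```

After `update_config()` succeeds, restart the gateway so the new provider is picked up.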

Make sure the tools block contains:

"tools": {
    "alsoAllow": ["exec", "process"]
}
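Since `alsoAllow` is a list, overwriting it would drop any tools you had already allowed. A minimal sketch of an idempotent update, assuming the config has already been loaded into a Python dict (for example with `json.load`); `ensure_also_allow` is a hypothetical helper name:

```python
import json


def ensure_also_allow(config, tools=("exec", "process")):
    """Add each tool to config["tools"]["alsoAllow"] once, keeping existing entries."""
    section = config.setdefault("tools", {})
    allowed = section.setdefault("alsoAllow", [])
    for tool in tools:
        if tool not in allowed:
            allowed.append(tool)
    return config  # the same dict, modified in place


if __name__ == "__main__":
    # Example: an existing config that already allows "exec".
    print(json.dumps(ensure_also_allow({"tools": {"alsoAllow": ["exec"]}})))
    # prints {"tools": {"alsoAllow": ["exec", "process"]}}
```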

Then restart the gateway: openclaw gateway restart

Step 6: Register the scheduled tasks

# 22:00 analysis task
openclaw cron add \
  --name "22:00 analyze new documents" \
  --cron "0 22 * * *" \
  --tz "Asia/Shanghai" \
  --session isolated \
  --timeout-seconds 300 \
  --message "Run run_analysis.py and generate_catalog.js" \
  --announce --channel feishu --to "user:<Feishu user ID>"

# 08:00 delivery task
openclaw cron add \
  --name "08:00 send document summaries" \
  --cron "0 8 * * *" \
  --tz "Asia/Shanghai" \
  --session isolated \
  --timeout-seconds 120 \
  --message "Read the summaries/ directory, send the summaries to Feishu, then move them to archives/" \
  --announce --channel feishu --to "user:<Feishu user ID>"
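Both schedules use standard five-field cron syntax (minute, hour, day of month, month, day of week), so `0 22 * * *` fires every day at 22:00 and `0 8 * * *` at 08:00 in the zone given by `--tz`. If the Mac's clock runs in another zone, it can help to compute when a job will next fire. A minimal sketch, assuming Asia/Shanghai can be treated as a fixed UTC+8 offset (China has not observed daylight saving time since 1991); `next_daily_run` is a hypothetical helper:

```python
from datetime import datetime, time, timedelta, timezone

# Assumption: Asia/Shanghai is a fixed UTC+8 offset with no DST.
SHANGHAI = timezone(timedelta(hours=8), "Asia/Shanghai")


def next_daily_run(hour, minute, now):
    """Next fire time of cron "minute hour * * *" in Asia/Shanghai after aware `now`."""
    local_now = now.astimezone(SHANGHAI)
    candidate = datetime.combine(local_now.date(), time(hour, minute), tzinfo=SHANGHAI)
    if candidate <= local_now:
        # Today's slot has already passed, so the job fires tomorrow.
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    for label, (h, m) in {"analysis": (22, 0), "delivery": (8, 0)}.items():
        print(label, next_daily_run(h, m, now).isoformat())
```

The 10-hour gap between the two jobs gives the 22:00 analysis time to finish (within its 300-second timeout) before the 08:00 delivery reads the summaries.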

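Migrating to a new computer

  1. Copy the entire directory:
    scp -r ~/.openclaw/workspace/knowledge user@new-mac:~/.openclaw/workspace/
  2. On the new computer, run setup.sh or follow the manual step-by-step installation
  3. Re-register the scheduled tasks (the job IDs change)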
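To confirm the copy arrived intact, you can compare per-file SHA-256 checksums on both machines. A minimal sketch with hypothetical helper names: run `manifest()` on the old Mac and save the result with `json.dump`, copy that JSON file over, then run `manifest()` on the new Mac and pass both dicts to `diff_manifests()`.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large PDFs are not loaded into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def manifest(root):
    """Map each file's path (relative to root, POSIX style) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_manifests(old, new):
    """Return (missing, changed): paths absent on the new side, and paths whose hashes differ."""
    missing = sorted(set(old) - set(new))
    changed = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return missing, changed
```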

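Pitfalls

Problem | Cause | Fix
Garbled text when extracting PDFs | Custom fonts without a ToUnicode map | Use pymupdf + tesseract OCR
Scheduled task times out | The default 120 seconds is too short | Set --timeout-seconds 300
No exec tool when triggered from Feishu | Restricted by the tools policy | Add alsoAllow: [exec, process]
Skill fails to load | Wrong export name | CodeChunker → FileChunker
BGE-M3 stalls | 16 GB of RAM is not enough | Keep using nomic-embed-text
brew install ollama is slow | Network issues | Download and install the dmg directly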
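For the garbled-PDF row, a common pattern is to try the PDF's text layer first and fall back to OCR only for pages whose text looks broken. The stdlib-only heuristic below is a sketch with hypothetical names and an arbitrary 5% threshold, not the logic of run_analysis.py. It counts pdfminer/pdfplumber `(cid:NNN)` placeholders, U+FFFD replacement characters, Private Use Area code points, and stray control characters. It will not catch every case; fonts whose glyphs map to wrong but valid letters still slip through. For flagged pages you would render the page to an image with PyMuPDF (for example `page.get_pixmap(dpi=300)`) and pass it to `pytesseract.image_to_string`; Chinese documents also need the `chi_sim` language data (on Homebrew, usually the tesseract-lang formula).

```python
import re
import unicodedata

# pdfminer/pdfplumber emit "(cid:NNN)" for glyphs they cannot map to Unicode.
CID_PATTERN = re.compile(r"\(cid:\d+\)")
WHITESPACE_CONTROLS = "\n\r\t\f"


def garbled_ratio(text):
    """Fraction of characters that look like a failed glyph-to-Unicode mapping."""
    if not text.strip():
        return 1.0  # no usable text layer at all: treat as fully garbled
    bad = sum(len(m) for m in CID_PATTERN.findall(text))
    for ch in CID_PATTERN.sub("", text):
        if ch == "\ufffd":  # Unicode replacement character
            bad += 1
        elif unicodedata.category(ch) == "Co":  # Private Use Area code point
            bad += 1
        elif unicodedata.category(ch) == "Cc" and ch not in WHITESPACE_CONTROLS:
            bad += 1
    return bad / len(text)


def needs_ocr(text, threshold=0.05):
    """Hypothetical rule: OCR the page when more than `threshold` of it looks garbled."""
    return garbled_ratio(text) > threshold
```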

Key paths

  • Skill directory: ~/.openclaw/workspace/skills/knowledge-base-setup/
  • Knowledge base: ~/.openclaw/workspace/knowledge/
  • Analysis scripts: ~/.openclaw/workspace/knowledge/.analysis/
  • Summary output: ~/.openclaw/workspace/knowledge/.analysis/summaries/
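The summary output directory is where the 08:00 job reads pending summaries and archives them. That job is driven by a natural-language message to the agent, but the same read, send, archive sequence can be sketched as a script. This is a hypothetical helper: `send` stands in for whatever actually delivers the text to Feishu (no Feishu API is shown), and it assumes summaries are Markdown files directly inside summaries/. A file is moved to archives/ only after `send` returns, so a failed delivery leaves it (and any later files) in place for the next run.

```python
import shutil
from pathlib import Path

SUMMARIES_DIR = (
    Path.home() / ".openclaw" / "workspace" / "knowledge" / ".analysis" / "summaries"
)


def deliver_and_archive(send, summaries_dir=SUMMARIES_DIR, pattern="*.md"):
    """Call send(name, text) for each pending summary, then move it into archives/."""
    summaries_dir = Path(summaries_dir)
    archive_dir = summaries_dir / "archives"
    archive_dir.mkdir(parents=True, exist_ok=True)
    delivered = []
    for path in sorted(summaries_dir.glob(pattern)):
        send(path.name, path.read_text(encoding="utf-8"))  # raises on failure
        shutil.move(str(path), str(archive_dir / path.name))
        delivered.append(path.name)
    return delivered


if __name__ == "__main__":
    # Dry run: print instead of sending (note: this still archives the files).
    print(deliver_and_archive(lambda name, text: print(f"--- {name}\n{text}")))
```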

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
