ARC Creator

Create FAIR Digital Objects following the nfdi4plants ARC specification v3.0.0.

Prerequisites

git and git-lfs installed
ARC Commander CLI at ~/bin/arc (optional but recommended)
For DataHUB sync: Personal Access Token for git.nfdi4plants.org or datahub.hhu.de

Interactive ARC Creation Workflow

Guide the user through these phases in order. Ask questions conversationally rather than all at once: batch 2-4 related questions per message.

Phase 1: Investigation Setup

Ask the user:

Investigation identifier (short, lowercase-hyphenated, e.g. cold-stress-arabidopsis)
Title (concise name for the investigation)
Description (textual description of the research goals)
Where to store the ARC locally (suggest /home/uranus/arc-projects/<identifier>/)

Then run scripts/create_arc.sh <path> <identifier> and set investigation metadata via:

arc investigation update -i "<id>" --title "<title>" --description "<desc>"
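The "short, lowercase-hyphenated" rule for identifiers can be checked before anything is created on disk. The following is a minimal sketch, not part of ARC Commander: the function name is_valid_identifier is hypothetical, and the exact pattern (lowercase letters and digits separated by single hyphens) is an assumption that is stricter than the ARC specification may require.

```shell
# Hypothetical helper (not an ARC Commander command).
# Assumption: an identifier is one or more runs of lowercase letters/digits
# joined by single hyphens, e.g. cold-stress-arabidopsis.
is_valid_identifier() {
  case "$1" in
    # Reject: empty, leading hyphen, trailing hyphen, double hyphen,
    # or any character outside the explicitly listed set (locale-independent).
    ''|-*|*-|*--*|*[!abcdefghijklmnopqrstuvwxyz0123456789-]*) return 1 ;;
    *) return 0 ;;
  esac
}
```

A possible use is to re-ask the user when is_valid_identifier "<identifier>" fails, and only then call scripts/create_arc.sh.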

Phase 2: Studies

For each study, ask:

Study identifier (e.g. plant-growth)
Title and description
Organism (for Characteristic [Organism])
Growth conditions (temperature, light, medium, etc.)
Source materials (what goes in — seeds, cell lines, etc.)
Sample materials (what comes out — leaves, roots, extracts, etc.)
Protocols — does the user have protocol documents to include?
Factors — what experimental variables are being tested? (e.g., temperature, genotype, treatment)

Create with:

arc study init --studyidentifier "<id>"
arc study update --studyidentifier "<id>" --title "<title>" --description "<desc>"

Copy protocol files to studies/<id>/protocols/. Copy resource files to studies/<id>/resources/.
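The copy step above can be sketched as a small function. This is only an illustration under stated assumptions: add_study_file is a hypothetical name, and the function simply creates the target folder if it is missing and copies one file into it.

```shell
# Hypothetical helper: copy one file into studies/<id>/protocols/ or
# studies/<id>/resources/ below the ARC root.
add_study_file() {
  local arc_root="$1" study_id="$2" kind="$3" src="$4"
  case "$kind" in
    protocols|resources) ;;
    *) echo "kind must be 'protocols' or 'resources'" >&2; return 1 ;;
  esac
  local dest="$arc_root/studies/$study_id/$kind"
  # Create the folder if the study was initialised without it, then copy.
  mkdir -p "$dest" && cp -- "$src" "$dest/"
}
```

Example call: add_study_file <path> plant-growth protocols <protocol file>.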

Phase 3: Assays

For each assay, ask:

Assay identifier (e.g. proteomics-ms, rnaseq, sugar-measurement)
Measurement type (e.g., protein expression profiling, transcription profiling, metabolite profiling)
Technology type (e.g., mass spectrometry, nucleotide sequencing, plate reader)
Technology platform (e.g., Illumina NovaSeq, Bruker timsTOF)
Data files — where are the raw data files? (will go into assays/<id>/dataset/)
Processed data — any processed output files?
Protocols — assay-specific protocols?
Performers — who performed this assay? (name, affiliation, role)

Create with:

arc assay init -a "<id>" --measurementtype "<type>" --technologytype "<tech>"

Copy data to assays/<id>/dataset/, protocols to assays/<id>/protocols/.
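Because files in assays/<id>/dataset/ must not change after they are placed, it can help to record checksums right after copying. The sketch below is one possible approach, not something the ARC specification prescribes: both function names are hypothetical, the manifest location is chosen by the caller (keep it outside dataset/), and sha256sum from GNU coreutils is assumed to be available.

```shell
# Hypothetical helpers; assume GNU coreutils sha256sum is installed.

# Write one "<sha256>  ./relative/path" line per file in the dataset folder.
write_dataset_manifest() {
  local dataset_dir="$1" manifest="$2"
  ( cd "$dataset_dir" && find . -type f -exec sha256sum {} + | sort -k 2 ) > "$manifest"
}

# Re-check every file against the manifest; non-zero exit if anything changed.
verify_dataset_manifest() {
  local dataset_dir="$1" manifest="$2"
  ( cd "$dataset_dir" && sha256sum --quiet -c - ) < "$manifest"
}
```

Running verify_dataset_manifest before each commit gives an early warning if a raw data file was edited or replaced. An empty dataset folder yields an empty manifest, which sha256sum reports as an error.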

Phase 4: Workflows (optional)

Ask if there are computational analysis steps. For each:

Workflow identifier (e.g. deseq2-analysis, heatmap-generation)
Description of what it does
Code files (scripts, notebooks)
Dependencies (Python packages, R libraries, Docker image)

Place code in workflows/<id>/. Note: the spec requires a workflow.cwl in each workflow folder, but it is often created later. Tell the user when it is still missing.
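To know which workflows still need a workflow.cwl, the folder can be scanned. A minimal sketch, assuming the layout workflows/<id>/workflow.cwl described above; the function name list_workflows_missing_cwl is hypothetical.

```shell
# Hypothetical helper: print the identifier of every workflow folder
# under <arc_root>/workflows/ that has no workflow.cwl yet.
list_workflows_missing_cwl() {
  local arc_root="$1" dir
  for dir in "$arc_root"/workflows/*/; do
    [ -d "$dir" ] || continue              # glob matched nothing
    [ -f "${dir}workflow.cwl" ] || basename "$dir"
  done
}
```

The printed identifiers can be passed on to the user as a reminder list.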

Phase 5: Runs (optional)

Ask if there are computation outputs. For each:

Run identifier
Which workflow produced it
Output files (figures, tables, processed data)

Place outputs in runs/<id>/.

Phase 6: Contacts & Publications

Ask:

Investigation contacts (name, email, affiliation, role — at minimum the PI)
Publications (if any — DOI, PubMed ID, title, authors)

Add via:

arc investigation person register --lastname "<last>" --firstname "<first>" --email "<email>" --affiliation "<aff>"

Phase 7: Git Commit & DataHUB Sync

Configure git user:

git config user.name "<name>"
git config user.email "<email>"

Commit:

git add -A
git commit -m "Initial ARC: <investigation title>"

Ask if the user wants to push to a DataHUB. If yes:
- Ask which host (git.nfdi4plants.org, datahub.hhu.de, etc.)
- Create remote repo (via browser or API)
- Set remote and push
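The "set remote and push" step can be sketched as follows. This is an illustration under assumptions: push_to_datahub is a hypothetical name, the remote is called origin, and the remote repository already exists and accepts the user's credentials (for example the Personal Access Token from the prerequisites).

```shell
# Hypothetical helper: point "origin" at the DataHUB repository and push
# the currently checked-out branch. Run it from inside the ARC folder.
push_to_datahub() {
  local remote_url="$1"
  if git remote get-url origin >/dev/null 2>&1; then
    git remote set-url origin "$remote_url"   # origin exists: update it
  else
    git remote add origin "$remote_url"       # first push: create it
  fi
  # -u records the upstream so later plain "git push" calls work.
  git push -u origin HEAD
}
```

Example call, with a placeholder URL whose exact form depends on the chosen host: push_to_datahub "https://<host>/<user>/<repo>.git".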

ISA Metadata Reference

For detailed ISA-XLSX fields, annotation table columns, and ontology references, read references/arc-spec.md.

Key Reminders

Assay data is immutable — never modify files in assays/<id>/dataset/ after initial placement
Studies describe materials, assays describe measurements
Workflows are code, runs are outputs
Use Git LFS for files larger than 100 MB, tracked by pattern: git lfs track "*.fastq.gz" "*.bam" "*.raw"
Don't store ARCs on OneDrive/Dropbox — Git + cloud sync causes conflicts
ARC Commander CLI reference: arc <subcommand> --help
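To decide which patterns to hand to git lfs track, it helps to list the files above the size limit first. A minimal sketch: list_large_files is a hypothetical name, the default threshold of 100 MiB mirrors the reminder above, and the M size suffix assumes GNU or BSD find.

```shell
# Hypothetical helper: list files larger than <min_mib> MiB (default 100)
# below <root>, skipping Git's own object store.
list_large_files() {
  local root="$1" min_mib="${2:-100}"
  find "$root" -type f -size +"${min_mib}M" ! -path '*/.git/*' | sort
}
```

Run list_large_files <path> before the first commit, then add matching patterns with git lfs track.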
