# Cappella — Quick Start
## Installation

```bash
pip install cappella

# For SQL adapter support:
pip install "cappella[sql]"

# For development:
pip install "cappella[dev]"
```
## Minimal Configuration

Create `cappella.yaml`:

```yaml
hippo:
  url: "http://localhost:8001"
  token: "${HIPPO_TOKEN}"

canon:
  enabled: true
  mode: in_process  # import canon directly (no separate service needed)

adapters:
  donors:
    type: csv
    trust_level: 80
    config:
      source: file
      url: "/data/exports/donors.csv"
      entity_type: Donor
      external_id_field: SUBJECT_ID
      field_map:
        SUBJECT_ID: external_id
        SEX: sex
        AGE_AT_DEATH: age_at_death
        DIAGNOSIS: diagnosis
      vocabulary_map:
        diagnosis:
          "CTE": "chronic traumatic encephalopathy"
          "AD": "Alzheimer disease"

triggers:
  - name: weekly_donor_sync
    type: schedule
    schedule: "0 2 * * 0"  # Sunday 2 AM
    action:
      type: ingest
      adapter: donors
```
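To make the adapter config concrete, here is a rough sketch, independent of Cappella's actual internals, of what `field_map` plus `vocabulary_map` do to a single CSV row (the sample row and the `map_row` helper are invented for illustration):

```python
# field_map renames source columns to canonical fields; vocabulary_map then
# normalizes controlled-vocabulary values. Both mirror the YAML config above.
FIELD_MAP = {
    "SUBJECT_ID": "external_id",
    "SEX": "sex",
    "AGE_AT_DEATH": "age_at_death",
    "DIAGNOSIS": "diagnosis",
}
VOCABULARY_MAP = {
    "diagnosis": {
        "CTE": "chronic traumatic encephalopathy",
        "AD": "Alzheimer disease",
    },
}

def map_row(row: dict) -> dict:
    """Rename mapped columns, then expand vocabulary abbreviations."""
    entity = {FIELD_MAP[col]: val for col, val in row.items() if col in FIELD_MAP}
    for field, vocab in VOCABULARY_MAP.items():
        if entity.get(field) in vocab:
            entity[field] = vocab[entity[field]]
    return entity

row = {"SUBJECT_ID": "D-0042", "SEX": "M", "AGE_AT_DEATH": "67", "DIAGNOSIS": "CTE"}
print(map_row(row))
# {'external_id': 'D-0042', 'sex': 'M', 'age_at_death': '67',
#  'diagnosis': 'chronic traumatic encephalopathy'}
```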
## Start the Service

```bash
cappella serve --config cappella.yaml
```

Or just use the CLI without starting a server.
## Ingest Data

You can ingest from a file (scheduled or manual), upload a CSV directly, or fire a trigger manually.
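The exact invocations are not shown on this page; a plausible shape, where the `ingest`, `upload`, and `trigger` subcommand names and flags are assumptions:

```bash
# Run the configured donors adapter once (assumed subcommand)
cappella ingest --adapter donors --config cappella.yaml

# Upload a CSV directly (assumed subcommand)
cappella upload --adapter donors --file donors_2026.csv --config cappella.yaml

# Fire the weekly sync trigger manually (assumed subcommand)
cappella trigger weekly_donor_sync --config cappella.yaml
```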
## Resolve a Collection

Get all gene counts for CTE DLPFC samples against GRCh38:

```bash
cappella resolve \
  --entity-type GeneCounts \
  --criteria "donor.diagnosis=chronic traumatic encephalopathy" \
  --criteria "sample.tissue=DLPFC" \
  --parameters genome=GRCh38 \
  --config cappella.yaml \
  --output my_collection.json
```
Output (`my_collection.json`):

```json
{
  "request": {
    "entity_type": "GeneCounts",
    "criteria": {
      "donor.diagnosis": "chronic traumatic encephalopathy",
      "sample.tissue": "DLPFC"
    }
  },
  "selection": {"strategy": "most_recent"},
  "resolved": [
    {"sample_id": "S001", "entity": {"id": "uuid-1", "uri": "s3://bucket/s001.counts.tsv"}, "status": "reused"},
    {"sample_id": "S002", "entity": {"id": "uuid-2", "uri": "s3://bucket/s002.counts.tsv"}, "status": "built"}
  ],
  "unresolved": [
    {"sample_id": "S003", "reason": "no_dataset", "detail": "No SequencingDataset found for sample S003"}
  ],
  "provenance": {"cappella_version": "0.1.0", "canon_version": "0.2.0"},
  "resolved_count": 2,
  "unresolved_count": 1
}
```
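Downstream tooling can consume this file with nothing but the standard library. A minimal sketch, using field names taken from the example output above (the JSON is abbreviated inline so the snippet is self-contained):

```python
import json

# Example resolution result, as written by `cappella resolve --output ...`
# (abbreviated from the sample output above).
collection = json.loads("""
{
  "resolved": [
    {"sample_id": "S001", "entity": {"id": "uuid-1", "uri": "s3://bucket/s001.counts.tsv"}, "status": "reused"},
    {"sample_id": "S002", "entity": {"id": "uuid-2", "uri": "s3://bucket/s002.counts.tsv"}, "status": "built"}
  ],
  "unresolved": [
    {"sample_id": "S003", "reason": "no_dataset", "detail": "No SequencingDataset found for sample S003"}
  ],
  "resolved_count": 2,
  "unresolved_count": 1
}
""")

# URIs of every resolved entity, ready to hand to a downstream pipeline.
uris = [item["entity"]["uri"] for item in collection["resolved"]]

# Samples that could not be resolved, keyed by the reason code.
failures = {item["sample_id"]: item["reason"] for item in collection["unresolved"]}

print(uris)      # ['s3://bucket/s001.counts.tsv', 's3://bucket/s002.counts.tsv']
print(failures)  # {'S003': 'no_dataset'}
```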
## Via the REST API

```bash
# Start the server
cappella serve --config cappella.yaml

# Resolve via API
curl -X POST http://localhost:8000/resolve \
  -H "Content-Type: application/json" \
  -d '{
    "entity_type": "GeneCounts",
    "criteria": {"donor.diagnosis": "chronic traumatic encephalopathy"},
    "parameters": {"genome": "GRCh38"}
  }'

# Response: 202 Accepted with run_id
# {"run_id": "uuid-789", "status": "queued", "poll_url": "/resolve/uuid-789"}

# Poll for result
curl http://localhost:8000/resolve/uuid-789
```
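The 202-then-poll pattern generalizes to any client. Below is a sketch of the polling loop with the HTTP call abstracted away; the `"queued"` status comes from the example response above, while `"running"` and `"complete"` are assumed terminal/intermediate states, and the stubbed `fetch` stands in for a real HTTP client:

```python
import time

def poll_until_done(fetch, poll_url, interval=1.0, max_attempts=30):
    """Poll a resolve run until it leaves the queued/running states.

    `fetch` is any callable mapping a URL to a decoded JSON dict, e.g. a
    thin wrapper around urllib or the HTTP client of your choice.
    """
    for _ in range(max_attempts):
        body = fetch(poll_url)
        if body["status"] not in ("queued", "running"):
            return body
        time.sleep(interval)
    raise TimeoutError(f"run at {poll_url} did not finish in time")

# Stub fetch that simulates a run finishing on the third poll.
responses = iter([
    {"status": "queued"},
    {"status": "running"},
    {"status": "complete", "resolved_count": 2},
])
result = poll_until_done(lambda url: next(responses), "/resolve/uuid-789", interval=0)
print(result["status"])  # complete
```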
## Check Status

A healthy service reports its own version, Hippo connectivity, and per-adapter sync state:

```json
{
  "cappella_version": "0.1.0",
  "hippo": {"status": "ok", "version": "0.3.1"},
  "adapters": {
    "donors": {"status": "ok", "last_sync": "2026-03-26T02:00:47Z"}
  }
}
```
## Python API

```python
from cappella.config import load_config
from cappella.adapters.csv_adapter import CSVAdapter
from cappella.ingest.pipeline import IngestPipeline

# Load config
config = load_config("cappella.yaml")

# Ingest from CSV. `hippo_client` is a client for the Hippo service named
# in the config; construct it for your deployment (not shown on this page).
adapter = CSVAdapter(config.adapters["donors"].config)
pipeline = IngestPipeline(hippo_client=hippo_client)
result = pipeline.run(adapter)

print(f"Ingested {result.upserted} entities ({result.created} new, {result.updated} updated)")
print(f"Conflicts detected: {result.conflicts_detected}")
```
## Next Steps
- User Guide — detailed adapter configuration, selection strategies, triggers
- Design Spec — architecture decisions and component boundaries
- API Reference — full REST API and CLI documentation