Hippo Quickstart¶
This guide uses an example research study to walk through Hippo features: tracking brain tissue donors, samples, and RNA-seq data files from a small Alzheimer's disease study.
By the end, you'll understand how to define schemas, run migrations, use the REST API, and work with the Python SDK.
Prerequisites¶
- Hippo installed: `pip install hippo`
- curl or HTTPie for REST API examples
- (Optional) Python 3.11+ for SDK examples
Step 1: Initialize a Project¶
Create a new Hippo project with the default template:
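The command itself is not shown here; based on the output that follows, it is presumably an init invocation along these lines (the exact subcommand name and arguments are assumptions):

```shell
hippo init brain_study
```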
Expected output:
Created brain_study/
Created brain_study/data/
Created brain_study/config.json
Created brain_study/.gitignore
Hippo project initialized at brain_study
Template: Basic
Run 'hippo serve' to start the server
This creates the following structure:
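Reconstructed from the init output above (only the entries that output lists):

```text
brain_study/
├── config.json
├── data/
└── .gitignore
```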
Step 2: Define Schemas¶
Create a LinkML schema file that defines the entity types for your Alzheimer's research study. LinkML schemas use classes to define entity types, with attributes for their fields.
Create schemas/brain_study.yaml:
id: https://example.org/brain-study
name: brain_study
prefixes:
linkml: https://w3id.org/linkml/
brain_study: https://example.org/brain-study/
imports:
- linkml:types
default_range: string
enums:
SexType:
permissible_values:
M:
F:
DiagnosisType:
permissible_values:
Control:
Alzheimers_Disease:
description: "Alzheimer's Disease"
Mild_Cognitive_Impairment:
description: "Mild Cognitive Impairment"
BrainRegion:
permissible_values:
Hippocampus:
Frontal_Cortex:
description: "Frontal Cortex"
Temporal_Cortex:
description: "Temporal Cortex"
Occipital_Cortex:
description: "Occipital Cortex"
Cerebellum:
TissueType:
permissible_values:
Brain_Tissue:
description: "Brain Tissue"
Blood:
CSF:
FileType:
permissible_values:
FASTQ:
BAM:
VCF:
CSV:
JSON:
TSV:
classes:
Donor:
description: "Brain tissue donors with demographics and diagnosis"
attributes:
donor_id:
range: string
required: true
identifier: true
name:
range: string
required: true
age:
range: integer
required: true
sex:
range: SexType
required: true
diagnosis:
range: DiagnosisType
required: true
brain_region:
range: BrainRegion
required: true
Sample:
description: "Brain tissue samples with collection details and RNA quality metrics"
attributes:
sample_id:
range: string
required: true
identifier: true
donor:
range: Donor
required: true
description: "Reference to the donor entity (holds Hippo internal UUID)"
tissue_type:
range: TissueType
required: true
brain_region:
range: BrainRegion
required: true
collection_date:
range: date
required: true
rin_score:
range: float
description: "RNA Integrity Number (RIN), 1-10 scale"
notes:
range: string
DataFile:
description: "Sequencing output files linked to samples"
attributes:
file_id:
range: string
required: true
identifier: true
sample:
range: Sample
required: true
description: "Reference to the sample entity (holds Hippo internal UUID)"
file_path:
range: string
required: true
file_type:
range: FileType
required: true
size_bytes:
range: integer
required: true
checksum:
range: string
pipeline_version:
range: string
description:
range: string
This schema defines three entity types: - Donor: Brain tissue donors with demographics and diagnosis - Sample: Brain tissue samples with collection details and RNA quality metrics - DataFile: Sequencing output files linked to samples
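Before running migrations, it can help to sanity-check the schema file structurally. The sketch below is not part of Hippo; `check_schema` and `REQUIRED_CLASSES` are names invented here, and the checks mirror only what the schema above declares. Parse the YAML first (e.g. `schema = yaml.safe_load(open("schemas/brain_study.yaml"))`) and pass the resulting dict in:

```python
REQUIRED_CLASSES = {"Donor", "Sample", "DataFile"}

def check_schema(schema: dict) -> list[str]:
    """Return a list of problems found in a parsed LinkML schema dict."""
    problems: list[str] = []
    classes = schema.get("classes", {})
    missing = REQUIRED_CLASSES - classes.keys()
    if missing:
        problems.append(f"missing classes: {sorted(missing)}")
    for cls_name, cls in classes.items():
        attrs = cls.get("attributes", {}) or {}
        # Each entity type above declares exactly one identifier attribute
        # (donor_id, sample_id, file_id).
        ids = [a for a, spec in attrs.items() if (spec or {}).get("identifier")]
        if len(ids) != 1:
            problems.append(
                f"{cls_name}: expected 1 identifier attribute, found {len(ids)}"
            )
    return problems
```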
Entity Reference Attributes¶
The donor attribute on Sample and the sample attribute on DataFile have their range set to another class. These are entity reference attributes -- they are fundamentally different from a plain string attribute named donor_id:
- Semantic relationship -- `range: Donor` declares that this attribute points to a `Donor` entity, not an arbitrary string. Hippo records and exposes this as a typed edge in the data model.
- Holds a Hippo internal ID -- the value stored in a reference attribute is the UUID assigned by Hippo when the entity was ingested (e.g. "donor-1"), not a user-facing identifier like "AD-001". User-facing identifiers belong in a plain string attribute (such as `donor_id`) or as an ExternalID.
- Write-time validation -- when reference validation is enabled, Hippo checks at ingest time that the referenced entity UUID exists and is available. Writing a `Sample` with a `donor` value that points to a non-existent or unavailable `Donor` is rejected.
- Graph traversal -- reference attributes connect to the Relationships API. Hippo can traverse from a `DataFile` to its `Sample` to its `Donor` using `client.relationships.traverse()` or the `/relationships` REST endpoint.

Reference value format: reference attribute values encode the target entity type and UUID as `"entity_type:uuid"` -- for example, `"Donor:donor-1"`. The SDK accepts and normalizes both the bare UUID and the prefixed form.
Step 3: Run Migrations¶
Apply the schema to create database tables:
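The migration command is not shown here; a plausible invocation, with the subcommand name and schema argument both assumptions, would be:

```shell
hippo migrate schemas/brain_study.yaml
```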
Expected output:
=== Migration Plan ===
New tables to create (3):
+ donors
+ samples
+ data_files
=== Migration Complete ===
Tables created: 3
Tables modified: 0
FTS tables created: 0
Records backfilled: 0
The database is now ready to store entities.
Step 4: Start the Server¶
Start the Hippo REST API server:
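As suggested by the init output in Step 1:

```shell
hippo serve
```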
The server is now running. In a separate terminal, you can test the endpoints.
Step 5: REST API Walkthrough¶
All requests below use the header Authorization: Bearer dev-token. The dev server accepts any Bearer token.
Create a Donor¶
curl -s -X POST http://127.0.0.1:8000/ingest \
-H "Authorization: Bearer dev-token" \
-H "Content-Type: application/json" \
-d '{
"entity_type": "Donor",
"data": {
"donor_id": "AD-001",
"name": "Subject AD-001",
"age": 78,
"sex": "M",
"diagnosis": "Alzheimers_Disease",
"brain_region": "Hippocampus"
}
}'
Response:
{
"id": "donor-1",
"entity_type": "Donor",
"data": {
"donor_id": "AD-001",
"name": "Subject AD-001",
"age": 78,
"sex": "M",
"diagnosis": "Alzheimers_Disease",
"brain_region": "Hippocampus"
},
"version": 1,
"created_at": "2026-03-17T10:30:00Z",
"updated_at": "2026-03-17T10:30:00Z",
"is_available": true
}
Save the returned entity id (e.g., donor-1) for use in subsequent operations.
Create a Brain Tissue Sample¶
Use the id returned by the Donor ingest call (e.g. donor-1) as the value for the donor reference attribute.
curl -s -X POST http://127.0.0.1:8000/ingest \
-H "Authorization: Bearer dev-token" \
-H "Content-Type: application/json" \
-d '{
"entity_type": "Sample",
"data": {
"sample_id": "SMPL-AD-001-HC",
"donor": "donor-1",
"tissue_type": "Brain_Tissue",
"brain_region": "Hippocampus",
"collection_date": "2025-11-15",
"rin_score": 8.4,
"notes": "Left hemisphere, posterior hippocampus"
}
}'
Create an RNA-seq Data File¶
Use the id returned by the Sample ingest call (e.g. sample-1) as the value for the sample reference attribute.
curl -s -X POST http://127.0.0.1:8000/ingest \
-H "Authorization: Bearer dev-token" \
-H "Content-Type: application/json" \
-d '{
"entity_type": "DataFile",
"data": {
"file_id": "RNASEQ-AD-001-HC-R1",
"sample": "sample-1",
"file_path": "/storage/alzheimer study/2025/AD-001/RNAseq/fastq/sample_R1_001.fastq.gz",
"file_type": "FASTQ",
"size_bytes": 15234000000,
"checksum": "sha256:a3f2b1c9d8e5f...",
"pipeline_version": "rnaseq-v2.1.0",
"description": "RNA-seq FASTQ, forward reads"
}
}'
List Entities¶
curl -s "http://127.0.0.1:8000/entities?entity_type=Donor&limit=10" \
-H "Authorization: Bearer dev-token"
curl -s "http://127.0.0.1:8000/entities?entity_type=Sample&limit=10" \
-H "Authorization: Bearer dev-token"
Update an Entity¶
curl -s -X PUT http://127.0.0.1:8000/entities/donor-1 \
-H "Authorization: Bearer dev-token" \
-H "Content-Type: application/json" \
-d '{
"entity_type": "Donor",
"data": {
"donor_id": "AD-001",
"name": "Subject AD-001",
"age": 78,
"sex": "M",
"diagnosis": "Alzheimers_Disease",
"brain_region": "Temporal_Cortex"
}
}'
Search with Full-Text Search¶
curl -s "http://127.0.0.1:8000/search?entity_type=Donor&q=alzheimer" \
-H "Authorization: Bearer dev-token"
curl -s "http://127.0.0.1:8000/search?entity_type=Sample&q=hippocampus" \
-H "Authorization: Bearer dev-token"
Step 6: Relationships via REST API¶
Link Donor to Sample¶
curl -s -X POST http://127.0.0.1:8000/entities/donor-1/relationships \
-H "Authorization: Bearer dev-token" \
-H "Content-Type: application/json" \
-d '{
"target_entity_id": "sample-1",
"relationship_type": "donated"
}'
Link Sample to Data File¶
curl -s -X POST http://127.0.0.1:8000/entities/sample-1/relationships \
-H "Authorization: Bearer dev-token" \
-H "Content-Type: application/json" \
-d '{
"target_entity_id": "datafile-1",
"relationship_type": "generated_from"
}'
Traverse Relationships¶
curl -s "http://127.0.0.1:8000/entities/donor-1/relationships" \
-H "Authorization: Bearer dev-token"
curl -s "http://127.0.0.1:8000/entities/donor-1/relationships?relationship_type=donated&max_depth=3" \
-H "Authorization: Bearer dev-token"
Step 7: View Provenance/History¶
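The request that produces this history is not shown here; a sketch consistent with the other entity endpoints (the `/history` path is an assumption) would be:

```shell
curl -s "http://127.0.0.1:8000/entities/donor-1/history" \
  -H "Authorization: Bearer dev-token"
```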
The response lists every change made to the entity:
[
{
"version": 1,
"operation": "CREATE",
"timestamp": "2026-03-17T10:30:00Z",
"changed_by": "system",
"changes": {}
},
{
"version": 2,
"operation": "UPDATE",
"timestamp": "2026-03-17T10:45:00Z",
"changed_by": "user@example.com",
"changes": {
"brain_region": {
"old": "Hippocampus",
"new": "Temporal_Cortex"
}
}
}
]
Step 8: Python SDK Usage¶
The following self-contained script demonstrates the full Python SDK capabilities. Save this as example_sdk_usage.py:
#!/usr/bin/env python3
"""
Hippo Python SDK Example: Alzheimer's Study Data Management
This script demonstrates:
- Setting up HippoClient with SQLiteAdapter
- Creating, updating, and querying entities
- Full-text search
- Relationships
- History/provenance tracking
- Custom validation
"""
import yaml
from datetime import date
from hippo.core.client import HippoClient
from hippo.core.storage.adapters.sqlite_adapter import SQLiteAdapter
from hippo.core.pipeline import ValidationPipeline
from hippo.core.validation.validators import WriteValidator, WriteOperation, ValidationResult
from hippo.core.exceptions import ValidationFailure
# Define schemas programmatically (normally loaded from YAML)
SCHEMAS = {
"brain_study": """
id: https://example.org/brain-study
name: brain_study
prefixes:
linkml: https://w3id.org/linkml/
brain_study: https://example.org/brain-study/
imports:
- linkml:types
default_range: string
enums:
SexType:
permissible_values:
M:
F:
DiagnosisType:
permissible_values:
Control:
Alzheimers_Disease:
Mild_Cognitive_Impairment:
BrainRegion:
permissible_values:
Hippocampus:
Frontal_Cortex:
Temporal_Cortex:
Occipital_Cortex:
Cerebellum:
TissueType:
permissible_values:
Brain_Tissue:
Blood:
CSF:
FileType:
permissible_values:
FASTQ:
BAM:
VCF:
CSV:
JSON:
TSV:
classes:
Donor:
attributes:
donor_id:
range: string
required: true
identifier: true
name:
range: string
required: true
age:
range: integer
required: true
sex:
range: SexType
required: true
diagnosis:
range: DiagnosisType
required: true
brain_region:
range: BrainRegion
required: true
Sample:
attributes:
sample_id:
range: string
required: true
identifier: true
donor:
range: Donor
required: true
tissue_type:
range: TissueType
required: true
brain_region:
range: BrainRegion
required: true
collection_date:
range: date
required: true
rin_score:
range: float
DataFile:
attributes:
file_id:
range: string
required: true
identifier: true
sample:
range: Sample
required: true
file_path:
range: string
required: true
file_type:
range: FileType
required: true
size_bytes:
range: integer
required: true
checksum:
range: string
pipeline_version:
range: string
"""
}
class RINScoreValidator(WriteValidator):
"""Custom validator: RIN scores must be between 1.0 and 10.0."""
def validate(self, operation: WriteOperation, data: dict) -> ValidationResult:
if operation == WriteOperation.CREATE or operation == WriteOperation.UPDATE:
if 'rin_score' in data:
rin = data['rin_score']
if not (1.0 <= rin <= 10.0):
return ValidationResult(
valid=False,
errors=[f"RIN score must be between 1.0 and 10.0, got {rin}"]
)
return ValidationResult(valid=True, errors=[])
def main():
# Initialize storage adapter (in-memory SQLite for this example)
adapter = SQLiteAdapter(':memory:')
adapter.initialize()
# Parse schemas
schemas = {}
for name, yaml_content in SCHEMAS.items():
schemas[name] = yaml.safe_load(yaml_content)
# Set up validation pipeline with custom validators
pipeline = ValidationPipeline()
pipeline.add_validator(RINScoreValidator())
# Create HippoClient
client = HippoClient(
storage=adapter,
schemas=schemas,
pipeline=pipeline
)
# Create a donor
donor = client.create("Donor", {
"donor_id": "AD-002",
"name": "Subject AD-002",
"age": 82,
"sex": "F",
"diagnosis": "Alzheimers_Disease",
"brain_region": "Frontal_Cortex"
})
print(f"Created donor: {donor['id']}")
donor_id = donor['id']
# Create a sample -- donor attribute holds the Hippo internal ID returned above
sample = client.create("Sample", {
"sample_id": "SMPL-AD-002-FC",
"donor": donor_id,
"tissue_type": "Brain_Tissue",
"brain_region": "Frontal_Cortex",
"collection_date": "2025-12-01",
"rin_score": 7.9
})
print(f"Created sample: {sample['id']}")
sample_id = sample['id']
# Create a data file -- sample attribute holds the Hippo internal ID returned above
datafile = client.create("DataFile", {
"file_id": "RNASEQ-AD-002-FC-R1",
"sample": sample_id,
"file_path": "/storage/alzheimer study/2025/AD-002/RNAseq/fastq/R1.fastq.gz",
"file_type": "FASTQ",
"size_bytes": 14890000000,
"pipeline_version": "rnaseq-v2.1.0"
})
print(f"Created data file: {datafile['id']}")
datafile_id = datafile['id']
# Query with filters
print("\n--- Query: Donors with Alzheimer's ---")
donors = client.query("Donor", filters=[
{"field": "diagnosis", "operator": "eq", "value": "Alzheimers_Disease"}
])
for d in donors:
print(f" {d['data']['donor_id']}: {d['data']['name']}")
# Full-text search
print("\n--- Search: Samples in hippocampus ---")
results = client.search("Sample", "hippocampus")
for r in results:
print(f" {r['data']['sample_id']}: {r['data']['brain_region']}")
# Create relationships
print("\n--- Relationships ---")
client.relationships.relate(donor_id, sample_id, "donated")
print(f"Linked donor {donor_id} -> sample {sample_id} (donated)")
client.relationships.relate(sample_id, datafile_id, "generated_from")
print(f"Linked sample {sample_id} -> datafile {datafile_id} (generated_from)")
# Traverse relationships
print("\n--- Traverse relationships from donor ---")
graph = client.relationships.traverse(donor_id, max_depth=2)
print(f" Depth 0: donor {donor_id}")
for rel in graph:
print(f" -> {rel['target_id']} ({rel['relationship_type']})")
# View history
print("\n--- History for donor ---")
history = client.history(donor_id)
for record in history:
print(f" v{record['version']}: {record['operation']} at {record['timestamp']}")
# Update with validation
print("\n--- Update sample with RIN score ---")
updated_sample = client.update(sample_id, {
"sample_id": "SMPL-AD-002-FC",
"donor": donor_id,
"tissue_type": "Brain_Tissue",
"brain_region": "Frontal_Cortex",
"collection_date": "2025-12-01",
"rin_score": 8.2
})
print(f"Updated sample to RIN: {updated_sample['data']['rin_score']}")
# Try invalid RIN (should fail validation)
print("\n--- Validation: Invalid RIN score ---")
try:
client.update(sample_id, {
"sample_id": "SMPL-AD-002-FC",
"donor": donor_id,
"tissue_type": "Brain_Tissue",
"brain_region": "Frontal_Cortex",
"collection_date": "2025-12-01",
"rin_score": 15.0 # Invalid: exceeds 10.0
})
except ValidationFailure as e:
print(f" Validation failed (expected): {e.errors}")
# Get single entity
print("\n--- Get single entity ---")
donor = client.get("Donor", donor_id)
print(f" Retrieved: {donor['data']['donor_id']}, diagnosis: {donor['data']['diagnosis']}")
# Soft delete (availability management)
print("\n--- Soft delete (set unavailable) ---")
client.delete(donor_id) # Soft delete
print(f" Donor {donor_id} marked as unavailable")
    # Check availability -- re-fetch the entity; its envelope carries is_available
    refreshed = client.get("Donor", donor_id)
    print(f"  is_available: {refreshed['is_available']}")
if __name__ == "__main__":
main()
Run the script:
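Assuming the file was saved as `example_sdk_usage.py` as described above:

```shell
python example_sdk_usage.py
```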
Expected output:
Created donor: donor-1
Created sample: sample-1
Created data file: datafile-1
--- Query: Donors with Alzheimer's ---
AD-002: Subject AD-002
--- Search: Samples in hippocampus ---
--- Relationships ---
Linked donor donor-1 -> sample sample-1 (donated)
Linked sample sample-1 -> datafile datafile-1 (generated_from)
--- Traverse relationships from donor ---
Depth 0: donor donor-1
-> sample sample-1 (donated)
--- History for donor ---
v1: CREATE at 2026-03-17T10:30:00Z
v2: UPDATE at 2026-03-17T10:35:00Z
--- Update sample with RIN score ---
Updated sample to RIN: 8.2
--- Validation: Invalid RIN score ---
Validation failed (expected): ['RIN score must be between 1.0 and 10.0, got 15.0']
--- Get single entity ---
Retrieved: AD-002, diagnosis: Alzheimers_Disease
--- Soft delete (set unavailable) ---
Donor donor-1 marked as unavailable
is_available: False
Step 9: Availability Management (Soft Delete)¶
Hippo never hard-deletes data. The delete() method sets is_available to false:
# Mark entity as unavailable
curl -s -X PUT http://127.0.0.1:8000/entities/donor-1/availability \
-H "Authorization: Bearer dev-token" \
-H "Content-Type: application/json" \
-d '{"is_available": false}'
# Check availability status
curl -s http://127.0.0.1:8000/entities/donor-1/availability \
-H "Authorization: Bearer dev-token"
Response:
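The response body is not reproduced here; at minimum it reports the flag, along these lines (the exact shape is an assumption):

```json
{
  "id": "donor-1",
  "is_available": false
}
```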
To restore:
curl -s -X PUT http://127.0.0.1:8000/entities/donor-1/availability \
-H "Authorization: Bearer dev-token" \
-H "Content-Type: application/json" \
-d '{"is_available": true}'
Next Steps¶
Now that you've completed the quickstart, explore these resources to deepen your understanding:
- Data Model -- Deep dive into entity types, relationships, and schema design
- CLI Reference -- Complete reference for all `hippo` commands
- API Reference -- Full REST API documentation
- Configuration -- Configure Hippo for different deployment scenarios
- Design Specification -- Internal engineering specification
For larger studies, consider:
- Using PostgreSQL instead of SQLite for production (hippo serve with PostgreSQL backend)
- Setting up external adapters to sync with STARLIMS or HALO
- Configuring the GraphQL layer for complex queries