Hippo — Metadata Tracking Service¶
Hippo is an open-source, configurable metadata tracking service. It provides a unified, queryable registry of entities, their fields, and the relationships between them — so that downstream systems, analysis pipelines, and data portals can reliably locate and filter metadata without manually managing spreadsheets or bespoke file manifests.
Hippo is domain-agnostic: the entity types, fields, and relationships it tracks are defined entirely by a schema config file authored for each deployment. For example, an omics research deployment might define entity types like Subject, Sample, and Datafile, while a manufacturing deployment might define Batch, Component, and Inspection.
Who Is Hippo For?¶
- Pipeline authors who need to resolve file paths and sample metadata at runtime from Nextflow, Snakemake, or custom scripts
- Researchers who want a queryable record of what data exists, where it lives, and what it describes
- Data managers who need to track the provenance, lifecycle, and relationships of samples and files across a project
- Platform developers building downstream tools that need a reliable metadata API
When to Use Hippo¶
Use Hippo when you need a central metadata registry that is:
- Queryable — find entities by type, field values, relationships, or external identifiers
- Auditable — every write is versioned with full provenance; nothing is hard-deleted
- Flexible — define your own entity types and relationships without changing code
- Embeddable — use the Python SDK directly in scripts and notebooks, or run a REST API for shared access
Hippo tracks where data lives and what it describes — not the data itself. Raw data files stay in place on your filesystem or object store.
Key Features¶
- Config-driven data model — Define entity types, fields, and relationships directly in a LinkML schema
- Graph-shaped API — Query entities and traverse relationships through a relational store with graph semantics
- Provenance tracking — Every change is logged with structured context and full audit trail
- SDK-first architecture — Embed Hippo directly in Python scripts or notebooks; REST API is a thin transport wrapper
- Flexible deployment — From a single researcher's laptop (SQLite) to enterprise cloud (PostgreSQL)
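As a sketch of what a config-driven data model can look like, here is a hypothetical LinkML-style fragment for the omics example mentioned above. The class and attribute names are illustrative, and the exact config shape Hippo expects may differ — see the Schema Guide for the authoritative syntax.

```yaml
# Illustrative only: LinkML-flavored schema fragment for an omics deployment.
# Names are invented for this example, not taken from a real Hippo config.
id: https://example.org/hippo/omics-schema
name: omics-schema
classes:
  Subject:
    attributes:
      species:
        range: string
  Sample:
    attributes:
      tissue:
        range: string
      subject:
        range: Subject        # directional relationship: Sample -> Subject
        required: true
  Datafile:
    attributes:
      path:
        range: string
      sample:
        range: Sample         # directional relationship: Datafile -> Sample
```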
Key Concepts¶
| Concept | Description |
|---|---|
| Entity | A top-level object tracked by Hippo. Entity types are defined in your schema config (e.g., Subject, Sample, Datafile). |
| Schema config | A YAML file defining entity types, fields, and relationships for your deployment, authored directly in LinkML format. |
| Relationship | A directional, typed edge connecting two entities with cardinality constraints. |
| External ID | An identifier from an upstream system (e.g., LIMS barcode) mapped to a Hippo entity UUID. |
| Provenance record | An immutable log entry recording what changed, when, and by whom. |
| Availability | Entities are never hard-deleted. Instead, they transition through lifecycle states: active, archived, superseded, deleted. |
Architecture Overview¶
Hippo has three concentric layers — only the Core SDK is required:
┌─────────────────────────────────────────────────┐
│ Transport Layer (optional) │
│ REST (FastAPI) · GraphQL (future) │
├─────────────────────────────────────────────────┤
│ Core Python SDK │
│ HippoClient · QueryEngine · IngestionPipeline │
│ ProvenanceManager · SchemaConfig │
├─────────────────────────────────────────────────┤
│ Infrastructure Layer (adapters) │
│ SQLite (v0.1) · PostgreSQL (future) │
└─────────────────────────────────────────────────┘
All business logic lives in the Core SDK. The REST API is a thin wrapper — you can embed Hippo directly in Python code without running a server.
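The layering above can be sketched in a few lines. All names here are hypothetical — they illustrate the SDK-first pattern (transport delegates, core decides), not Hippo's actual API surface.

```python
# Sketch of the SDK-first layering: business logic lives in the core object,
# and the transport handler only translates requests. Names are invented.
class CoreClient:
    """Stand-in for the Core SDK: owns all business logic and state."""
    def __init__(self):
        self._entities = {}

    def create_entity(self, entity_id, entity_type):
        self._entities[entity_id] = {"type": entity_type,
                                     "availability": "active"}
        return self._entities[entity_id]

    def get_entity(self, entity_id):
        return self._entities[entity_id]

# Transport layer: a REST handler would only deserialize the request and
# delegate to the core -- no business logic lives here.
def handle_get_entity(client, entity_id):
    return {"status": 200, "body": client.get_entity(entity_id)}

client = CoreClient()              # embedded use: no server process needed
client.create_entity("s1", "Sample")
print(handle_get_entity(client, "s1"))
```

Because the wrapper adds nothing but transport concerns, scripts and notebooks can call the core object directly and get identical behavior to the REST API.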
Deployment Options¶
| Tier | How it works | Typical use |
|---|---|---|
| Local / single-user | pip install hippo, point at a SQLite file, query from Python or a notebook. No server required. | Individual researcher, exploratory analysis |
| Small team | Run hippo serve on a shared host backed by SQLite or PostgreSQL. | Lab group, shared project |
| Enterprise / cloud | Deploy with a managed PostgreSQL backend, container orchestration, and Bridge for authentication. | Production platform, multi-team environment |
Getting Started¶
See the Quick Start guide for a complete walkthrough — from schema definition through entity creation, querying, and provenance inspection.
Related Components¶
- Cappella — Harmonizes data from external sources and upserts into Hippo
- Canon — Resolves computational artifacts registered in Hippo
- Aperture — User-facing CLI and web interface for browsing Hippo entities
- Bridge — Adds authentication and unified API gateway for multi-user deployments
User Documentation¶
- Introduction — Detailed overview and use cases
- Installation — Setup instructions
- Quick Start — Get running in minutes
- Schema Guide — Authoring entity schemas in LinkML
- Data Model — Core data model concepts
- Configuration — Configuration reference
- CLI Reference — Command-line interface
- API Reference — REST API endpoints
Design Specification¶
- Overview & Scope
- Architecture
- Data Model
- Relational Storage Mapping
- API Layer
- Ingestion & Integration
- Provenance & Audit
- Non-Functional Requirements
- Auth Integration
- Appendix A: Example Schema (Omics)
- Appendix B: Implementation Guide
- Reference: hippo.yaml Config
- Reference: validators.yaml
- Reference: CEL Context