Hippo — Metadata Tracking Service¶
Hippo is an open-source, configurable metadata tracking service. It gives you a unified, queryable registry of entities, their fields, and the relationships between them — so that downstream systems, analysis pipelines, and data portals can reliably locate and filter metadata without manually managing spreadsheets or bespoke file manifests.
Hippo is domain-agnostic: the entity types, fields, and relationships it tracks are defined entirely by a schema config file authored for each deployment.
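For illustration, a deployment's schema config might declare entity types, their fields, and the relationships between them along these lines. This is a purely hypothetical sketch expressed as a Python dict — the key names (`entity_types`, `relationships`, and so on) are assumptions, not Hippo's actual config format:

```python
# Hypothetical sketch of a per-deployment schema config, shown as a
# Python dict for illustration. All key names here are assumptions,
# not Hippo's real config format.
schema = {
    "entity_types": {
        "subject":   {"fields": ["subject_id", "species"]},
        "sample":    {"fields": ["sample_id", "tissue", "collected_on"]},
        "data_file": {"fields": ["path", "checksum", "file_type"]},
    },
    "relationships": [
        # A sample is collected from a subject; a file is derived from a sample.
        {"name": "collected_from", "from": "sample",    "to": "subject"},
        {"name": "derived_from",   "from": "data_file", "to": "sample"},
    ],
}
```

The same structure would serve a manufacturing deployment — swap in batches, components, and inspections — with no change to Hippo itself.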
Who Is Hippo For?¶
- Pipeline authors who need to resolve file paths and sample metadata at runtime (e.g., from Nextflow or Snakemake)
- Researchers who want a queryable record of what data exists, where it lives, and what it describes
- Data managers who need to track the provenance, lifecycle, and relationships of samples and files across a project
What Hippo Does¶
Hippo tracks where data lives and what it describes — not the data itself. Raw data files remain in place on your filesystem or object store; Hippo stores the metadata and file locations needed to find and interpret them.
Specifically, Hippo tracks:
- Entities of any type defined in your schema config (for example: subjects, samples, and data files in an omics deployment; batches, components, and inspections in a manufacturing deployment)
- Relationships between entities, including derivation chains and supersession history
- Full provenance — every write is versioned, nothing is ever hard-deleted, and complete change history is always available
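The provenance guarantees above can be sketched with an append-only version table: every write inserts a new version row rather than updating in place, and deletion is a soft-delete flag rather than a hard `DELETE`, so complete history stays queryable. The following is an illustrative sketch of that idea using Python's standard `sqlite3` module, not Hippo's actual storage layout:

```python
import sqlite3

# Append-only sketch: each write is a new version row; nothing is ever
# updated or hard-deleted, so full change history remains available.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE entity_versions (
        entity_id TEXT,
        version   INTEGER,
        payload   TEXT,                 -- metadata, simplified to text here
        deleted   INTEGER DEFAULT 0     -- soft-delete flag, never a hard DELETE
    )
""")

def write(entity_id, payload, deleted=0):
    """Record a new version; prior versions remain untouched."""
    (latest,) = con.execute(
        "SELECT COALESCE(MAX(version), 0) FROM entity_versions WHERE entity_id = ?",
        (entity_id,),
    ).fetchone()
    con.execute(
        "INSERT INTO entity_versions VALUES (?, ?, ?, ?)",
        (entity_id, latest + 1, payload, deleted),
    )

write("sample-001", '{"tissue": "liver"}')
write("sample-001", '{"tissue": "liver", "qc": "pass"}')
write("sample-001", "{}", deleted=1)  # soft delete: history is preserved

history = con.execute(
    "SELECT version, payload, deleted FROM entity_versions "
    "WHERE entity_id = ? ORDER BY version",
    ("sample-001",),
).fetchall()
# All three versions survive, including the soft-deleted final one.
```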
What Hippo Does NOT Do¶
Hippo is deliberately scoped. It does not:
- Store, move, copy, or manage raw data files
- Execute or schedule analysis pipelines
- Perform any biological analysis, QC, or data processing
- Provide a data portal, web UI, or visualization layer
- Manage authentication or authorization (this is delegated to the transport layer or a future middleware component)
- Replace upstream source systems such as LIMS, EHR, or clinical databases
How Hippo Fits into the Larger Platform¶
Hippo is designed as the first independently deliverable module of a modular platform. Other platform modules depend on Hippo; Hippo has no dependencies on them.
┌──────────────────────────────────────────────────────┐
│ Platform │
│ │
│ ┌─────────┐ ┌──────────┐ ┌──────────────────┐ │
│ │ Hippo │ │ Module B │ │ Module C │ │
│ │ (MTS) │ │ (future) │ │ (future) │ │
│ │ ◄ HERE │ │ │ │ │ │
│ └────┬────┘ └──────────┘ └──────────────────┘ │
│ │ │
│ ┌────▼──────────────────────────────────────────┐ │
│ │ Data Portal / GraphQL Layer (future) │ │
│ └───────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────┘
Downstream analysis pipelines use Hippo to look up input files and register output files. Data portals query Hippo to present metadata to users. Integration middleware coordinates cross-module operations.
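The look-up-inputs / register-outputs pattern a pipeline would follow might look like the sketch below. The `find_files` and `register_file` helpers, and the in-memory registry behind them, are hypothetical stand-ins for illustration only — not Hippo's actual client API:

```python
# Illustrative stand-in for a pipeline's interaction with a metadata
# registry: resolve input files by metadata, then register outputs with
# a derivation link. The registry and helpers are hypothetical.
registry = [
    {"type": "data_file", "path": "/data/s1.fastq", "sample": "s1", "stage": "raw"},
    {"type": "data_file", "path": "/data/s2.fastq", "sample": "s2", "stage": "raw"},
]

def find_files(**filters):
    """Return registered files whose metadata matches every filter."""
    return [e for e in registry if all(e.get(k) == v for k, v in filters.items())]

def register_file(path, derived_from, **metadata):
    """Register a pipeline output, recording what it was derived from."""
    entry = {"type": "data_file", "path": path,
             "derived_from": derived_from, **metadata}
    registry.append(entry)
    return entry

# A pipeline step resolves its input, runs, then registers its output.
inputs = find_files(stage="raw", sample="s1")
register_file("/results/s1.bam", derived_from=inputs[0]["path"],
              sample="s1", stage="aligned")
```

Note that only metadata and paths move through the registry; the files themselves stay where the pipeline wrote them.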
Deployment Options¶
Hippo is designed to run at any scale using the same codebase. You choose your deployment tier through configuration alone — no code changes required.
| Tier | How it works | Typical use |
|---|---|---|
| Local / single-user | Install via pip, point at a local SQLite file, query from a Python script or notebook. No server required. | Individual researcher, exploratory analysis |
| Small team | Run the REST API service (`hippo serve`) on a shared host, backed by SQLite or PostgreSQL. | Lab group, shared project server |
| Enterprise / cloud | Deploy on AWS with a managed PostgreSQL or DynamoDB backend, container orchestration, and authentication middleware. | Production platform, multi-team environment |
Next Steps¶
- Quickstart — Get Hippo running locally in under 5 minutes
- Installation Guide — Full installation instructions for all deployment tiers
- Data Model — Learn about Hippo's entity types, relationships, and schema system
- CLI Reference — Complete reference for the `hippo` command-line tool
- API Reference — REST API reference
- Design Specification — Internal engineering specification for developers building or extending Hippo