1. Overview & Scope
1.1 What Is Hippo?
Hippo is an open source, configurable metadata tracking service. It provides a unified, queryable registry of entities, their fields, and their relationships — enabling downstream systems, analysis pipelines, and data portals to reliably locate and filter metadata without manually managing spreadsheets or bespoke file manifests.
Hippo is designed as the first module of a larger platform. It tracks where data lives and what it describes — not the data itself. Raw data files remain in place (e.g., S3, local filesystem); Hippo stores the metadata and file locations needed to find and interpret them.
Hippo is domain-agnostic: the entity types, fields, and relationships it tracks are defined entirely by a schema config file authored for each deployment. For example, an omics research deployment might define entity types like Subject, Sample, and Datafile, while a manufacturing deployment might define Batch, Component, and Inspection.
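To make the omics example concrete, a deployment's schema config might look roughly like the following. This is a hypothetical sketch only: the key names (`entity_types`, `fields`, `relationships`, `cardinality`) are illustrative assumptions, not Hippo's actual config format.

```yaml
# Hypothetical schema config sketch -- key names are illustrative only
entity_types:
  Subject:
    fields:
      subject_id: {type: string, required: true}
      species: {type: string}
  Sample:
    fields:
      sample_id: {type: string, required: true}
      collected_at: {type: datetime}
    relationships:
      derived_from: {target: Subject, cardinality: many_to_one}
  Datafile:
    fields:
      uri: {type: string, required: true}   # raw file stays in S3/local FS
    relationships:
      describes: {target: Sample, cardinality: many_to_one}
```

A manufacturing deployment would swap in `Batch`, `Component`, and `Inspection` with its own fields and edges; no Hippo code changes are involved.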
1.2 Deployment Philosophy
Hippo is built to run at any scale, from a single researcher's laptop to an enterprise cloud deployment, using the same codebase throughout:
- Local / single-user: Install via pip, point at a local SQLite database, query from a Python script or notebook in minutes — no server required.
- Small team: Run the optional REST API service on a shared host backed by PostgreSQL.
- Enterprise / cloud: Deploy on AWS with managed database backends, container orchestration, and authentication middleware.
Scale is controlled entirely by configuration and backend adapter selection. No code changes are required to move between deployment tiers.
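Moving between tiers could then amount to changing one storage stanza. A hypothetical sketch (key names are assumed, not Hippo's actual configuration schema):

```yaml
# Local / single-user tier (hypothetical keys)
storage:
  adapter: sqlite
  path: ./hippo.db

# Small-team tier: same schema config, different adapter
# storage:
#   adapter: postgres
#   dsn: postgresql://hippo@shared-host/hippo
```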
1.3 Position in the Larger Platform
Hippo is the first independently deliverable module of a modular platform. It is designed so that other platform modules can be built independently and integrated later via well-defined interfaces.
Hippo itself has no dependencies on those future modules. They will depend on Hippo.
┌──────────────────────────────────────────────────────┐
│ Platform │
│ │
│ ┌─────────┐ ┌──────────┐ ┌──────────────────┐ │
│ │ Hippo │ │ Module B │ │ Module C │ │
│ │ (MTS) │ │ (future) │ │ (future) │ │
│ │ ◄ HERE │ │ │ │ │ │
│ └────┬────┘ └──────────┘ └──────────────────┘ │
│ │ │
│ ┌────▼──────────────────────────────────────────┐ │
│ │ Data Portal / GraphQL Layer (future) │ │
│ └───────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────┘
1.4 Non-Goals
Hippo explicitly does not:
- Store, move, or manage raw data files
- Execute or schedule analysis pipelines
- Perform domain-specific analysis or QC
- Provide a user-facing data portal or visualization layer
- Manage authentication or authorization (delegated to transport layer)
- Serve as the system of record for upstream source systems (those are separate modules with placeholder ingestion interfaces in Hippo)
- Replace or replicate LIMS, EHR, or other upstream source systems
1.5 Delivery Scope (v0.1)
The initial four-week delivery targets a functional development environment containing:
- Core Python SDK with SQLite backend adapter
- Config-driven schema system supporting arbitrary entity types
- Full provenance and audit trail on all writes
- REST API service (FastAPI) wrapping the SDK
- Batch ingestion interface for loading metadata from flat files
- Placeholder ingestion adapters for future external systems
- Unit and integration test suite
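To illustrate the batch ingestion item above, here is a minimal sketch of turning a flat CSV file into entity dicts ready for loading. The `load_entities` helper and its output shape are illustrative assumptions, not part of the Hippo API.

```python
import csv
import io
import uuid


def load_entities(csv_text: str, entity_type: str) -> list[dict]:
    """Parse a flat CSV export into entity dicts for batch ingestion.

    Hypothetical helper: Hippo's real ingestion interface may differ.
    Each row becomes one entity with a generated UUID; the remaining
    columns are carried along as schema-defined field values.
    """
    entities = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        entities.append({
            "id": str(uuid.uuid4()),   # Hippo-assigned identity
            "type": entity_type,       # must match an entity type in schema config
            "fields": dict(row),       # field values from the flat file
        })
    return entities


csv_text = "sample_id,species\nS001,mouse\nS002,human\n"
entities = load_entities(csv_text, "Sample")
```

The real interface would also validate each row against the schema config before writing; that step is omitted here.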
The following are explicitly out of scope for v0.1:
- GraphQL transport layer
- Cloud-managed database adapters (PostgreSQL/RDS, DynamoDB)
- Authentication/authorization middleware
- Data portal or query UI
- Production deployment infrastructure (IaC, CI/CD)
- Bidirectional sync with external systems
1.6 Key Design Principles
| Principle | Description |
|---|---|
| SDK-first | All business logic lives in the Python SDK. REST and GraphQL are thin transport wrappers. |
| Adapter pattern | Storage backends, external system integrations, and transport layers are swappable via config. |
| Config-driven schema | Entity schemas are defined in YAML config, not hardcoded. New fields and entity types can be added without code changes. |
| Provenance by default | Every write is versioned. No data is ever hard-deleted. Full change history is always available. |
| Local-first | Zero infrastructure required for single-user deployment. |
| Openplan-compatible | This specification is structured to feed directly into the openplan Vision → Epic → Feature → OpenSpec pipeline. |
1.7 Intended Consumers
In v0.1, Hippo serves other applications only — there is no human-facing query interface:
- Analysis pipelines (e.g., Nextflow): resolve file paths and entity metadata at runtime
- Future data portal: browse and filter entities via the GraphQL layer
- Future platform modules: look up entities related to other modules' data
1.8 Glossary
| Term | Definition |
|---|---|
| Entity | Any top-level object tracked by Hippo. Entity types are defined in schema config (e.g., Project, Item, Attachment). |
| Schema config | A YAML or JSON file defining the entity types, fields, and relationships for a Hippo deployment, authored in LinkML format. |
| Field | A named, typed attribute on an entity type, declared in schema config. See §3.5 for supported types. |
| Relationship | A directional, typed edge connecting two entities. Declared in schema config with cardinality constraints. |
| External ID | An identifier from an upstream system mapped to a Hippo entity UUID. Enables cross-system lookups. |
| Adapter | A pluggable implementation of a storage or integration interface. |
| Provenance record | An immutable record of a change to any entity, including what changed, when, and by whom. |
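The provenance-record concept might be modeled roughly as below. The field names and the frozen-dataclass representation are illustrative assumptions, not Hippo's actual schema; they only demonstrate the immutability and attribution properties the glossary describes.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen = immutable once written, matching "no hard deletes"
class ProvenanceRecord:
    """Hypothetical shape of an immutable change record (names assumed)."""
    entity_id: str        # UUID of the changed entity
    version: int          # monotonically increasing per entity
    changed_fields: dict  # field name -> new value
    actor: str            # who made the change
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


rec = ProvenanceRecord("a1b2", 1, {"species": "mouse"}, "ingest-pipeline")
```

A delete would then be recorded as just another version (e.g., a tombstone field change), leaving the full history queryable.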