Skip to content

Relational Storage

3b. Relational Storage Mapping

Document status: Draft v0.1 Depends on: sec3_data_model.md Feeds into: sec4_api_layer.md


This section describes how the conceptual data model defined in sec3 maps to relational storage. It is the reference specification for the SQLite (v0.1) and PostgreSQL (future) storage adapters. Other adapter types (graph database, document store, etc.) would have their own mapping documents.


3b.1 Entity Table Schema

Each entity type defined in schema config maps to its own database table. The table name is the lowercase plural of the entity type name (e.g., entity type Item → table items).

Every entity table contains the following physical columns:

Column Type Constraint Source
id UUID Primary key Generated by Hippo on creation
is_available boolean NOT NULL, default true Managed by availability operations
superseded_by UUID Nullable, no default Set atomically by client.supersede_entity(); NULL means not superseded
(user-defined fields) (per schema config) (per schema config) Declared in schema config

superseded_by is a system column applied generically to all entity types by hippo migrate (mirroring how is_available is handled). It is not declared in schema config. hippo migrate adds it via ALTER TABLE {entity_type}s ADD COLUMN superseded_by TEXT when the column is absent. Pre-existing rows receive NULL, correctly representing "not superseded".

User-defined field columns are generated by hippo migrate from the schema config. Field type mappings:

Hippo type SQLite type PostgreSQL type
string TEXT TEXT / VARCHAR(n)
int INTEGER INTEGER
float REAL DOUBLE PRECISION
bool INTEGER (0/1) BOOLEAN
date TEXT (ISO 8601) DATE
datetime TEXT (ISO 8601) TIMESTAMP WITH TIME ZONE
enum TEXT + CHECK constraint TEXT + CHECK constraint
json TEXT (JSON string) JSONB
uri TEXT TEXT
ref TEXT ("type:uuid") TEXT ("type:uuid")

Temporal fields (created_at, updated_at, schema_version) are not stored as columns on entity tables. They are computed at read time from the provenance log (see §3b.6).


3b.2 Partial Index Strategy

hippo migrate creates a partial index on every indexed: true field on every entity table, scoped to available records only:

-- One per indexed field per entity table
CREATE INDEX idx_{entity_type}_{field}_available
ON {entity_type}s ({field})
WHERE is_available = true;

These indexes contain only available records. As entities become unavailable over time the indexes stay small and query performance remains constant regardless of total table size. Queries that explicitly include unavailable records bypass the partial index and perform a full scan, which is appropriate since they are explicitly exceptional operations.


3b.3 External IDs Table

External identifiers are stored in a shared external_ids table:

external_ids:
  id            UUID        primary key
  entity_id     UUID        indexed
  entity_type   string      indexed
  system        string      indexed    (e.g. "starlims", "halo", "nextflow" — see Appendix A)
  external_id   string      indexed
  is_active     boolean     default true
  created_at    datetime               (written by provenance system)

See sec3 §3.4 for the conceptual external ID model (cardinality, immutability, correction via supersession).


3b.4 Entity Relationships Table

Relationships are stored in a single entity_relationships edge table. New relationship types added via schema config require no table migrations.

entity_relationships:
  id              UUID      primary key
  from_id         UUID      indexed
  from_type       string    indexed
  to_id           UUID      indexed
  to_type         string    indexed
  relationship    string    indexed    (e.g. "derived_from", "generated", "superseded_by" — see Appendix A)
  properties      JSON                 (relationship-level attributes)
  status          enum                 (active | removed)
  created_at      datetime             (written by provenance system)

See sec3 §3.7 for the conceptual relationship model (cardinality enforcement, immutability, status-based removal).


3b.5 hippo_meta Table

hippo_meta stores deployment-level state for the Hippo instance. It is managed exclusively by hippo migrate and the SDK internals — not user-writable.

hippo_meta:
  key             string      primary key
  value           JSON
  updated_at      datetime

Standard keys:

Key Value Purpose
schema_version string Currently deployed schema config version
schema_hash string SHA-256 of the compiled schema for drift detection
deprecated_fields JSON object Map of entity type → list of deprecated field names
migration_history JSON array Ordered list of applied migration version strings

3b.6 Migration DDL Mechanics

When hippo migrate detects differences between the schema config and the deployed schema, it generates DDL operations according to the conceptual migration rules defined in sec3 §3.8. The relational DDL for each rule:

Conceptual rule (sec3) Relational DDL
Provision new entity type CREATE TABLE {type}s (id UUID PRIMARY KEY, is_available BOOLEAN DEFAULT true, ...) + partial indexes on all indexed: true fields
Add field ALTER TABLE {type}s ADD COLUMN {field} {sql_type} (nullable, or with DEFAULT if specified)
Add relationship type No DDL — the entity_relationships edge table is generic
Add index CREATE INDEX idx_{type}_{field}_available ON {type}s ({field}) WHERE is_available = true
Extend enum Update CHECK constraint to include new values
Deprecate field No DDL — column retained; hippo_meta.deprecated_fields updated

Hippo never drops columns or tables automatically. Field type changes and entity type removal are rejected by the migration planner (see sec3 §3.8 for the full rule set).


3b.7 Computed Field Derivation

The system fields created_at, updated_at, and schema_version are not stored on entity tables. They are derived from the provenance log at read time.

In a relational context, the SDK resolves these fields via queries against the provenance table:

  • created_at: Timestamp of the first provenance event for the entity (MIN(timestamp) WHERE entity_id = ?)
  • updated_at: Timestamp of the most recent non-SOFT_DELETE provenance event (MAX(timestamp) WHERE entity_id = ? AND operation_type != 'SOFT_DELETE')
  • schema_version: Not stored in v0.1; NULL

The entity_provenance_summary VIEW is required (not optional) for correct operation of client.query() with provenance-derived fields. hippo migrate creates this view before entity table migrations. See sec6_provenance.md §6.6 for the full view definition and column derivation logic.

These derivations are implemented as: - Subqueries on each individual client.get() call (single-entity fast path) - JOIN against entity_provenance_summary on client.query() calls (batch reads)

Both approaches produce identical results visible to callers. Entity table columns created_at and updated_at are retained as a write-through cache — they are updated on every write operation but reads derive authoritative values from the provenance log. See sec6_provenance.md for the full provenance event model.


3b.8 Polymorphic Inheritance Storage

When a schema declares a subtype via base:, Hippo uses a joined table inheritance strategy: each type in the hierarchy has its own table containing only the fields declared at that level. A query for a parent type joins across all subtype tables.

Table structure

samples (parent table — Sample fields only)
  id          UUID  PK
  is_available BOOL
  tissue_type  TEXT
  ...

brain_samples (child table — BrainSample-specific fields only)
  id                         UUID  PK  FK → samples.id
  brain_region               TEXT
  hemisphere                 TEXT
  post_mortem_interval_hours REAL

The child table's id column is both a primary key and a foreign key to the parent table. Every subtype entity has exactly one row in the parent table and one row in its own table.

__type__ discriminator

A __type__ column on the parent table stores the concrete type name:

ALTER TABLE samples ADD COLUMN __type__ TEXT NOT NULL DEFAULT 'Sample';

This allows efficient single-table queries when the concrete type is already known, and drives __type__ exposure on entity response objects.

Query behaviour

Query SQL
client.query("Sample") SELECT * FROM samples LEFT JOIN brain_samples USING (id) LEFT JOIN cell_lines USING (id) WHERE samples.is_available = true
client.query("Sample", exact_type=True) SELECT * FROM samples WHERE __type__ = 'Sample' AND is_available = true
client.query("BrainSample") SELECT * FROM samples JOIN brain_samples USING (id) WHERE samples.is_available = true
client.get("Sample", id) Looks up __type__ in parent table, then fetches the full row via the appropriate join

The storage adapter resolves the join structure at startup from the deployed schema config. No hardcoded table names — all table and column names are derived from the schema.

Migration DDL for subtypes

Operation DDL
New subtype declared CREATE TABLE {subtype}s (id UUID PRIMARY KEY REFERENCES {parent}s(id), ...) + ALTER TABLE {parent}s ADD COLUMN __type__ TEXT if not present
New field on subtype ALTER TABLE {subtype}s ADD COLUMN {field} {sql_type}
New field on parent ALTER TABLE {parent}s ADD COLUMN {field} {sql_type} — inherited by all subtypes automatically

Hippo never drops tables or columns. Removing a subtype from schema config is rejected by the migration planner until all entities of that type are marked unavailable.

Performance note

Joined table inheritance adds a JOIN per subtype level per query. For shallow hierarchies (1–2 levels) this is negligible. Deep hierarchies (3+ levels) should be avoided as a schema design practice. The exact_type=True flag avoids joins entirely when the concrete type is known — Cappella adapters and workflow pipelines should use this when querying for a specific subtype.



3b.8 Polymorphic Inheritance Storage

When a schema declares base: on an entity type, the relational adapter uses a class-table inheritance strategy (also known as "joined table inheritance"):

  • The parent type has its own table containing system fields (id, is_available) and all parent-declared fields
  • Each subtype has its own table containing only the fields specific to that subtype; it shares the same id as the parent row
  • The parent table has a __type__ discriminator column (TEXT, NOT NULL) recording the concrete entity type name (e.g. "BrainSample")

Example (BrainSample extends Sample):

-- Parent table (Sample)
CREATE TABLE samples (
    id          TEXT PRIMARY KEY,
    is_available BOOLEAN NOT NULL DEFAULT 1,
    __type__    TEXT NOT NULL,            -- e.g. "Sample" or "BrainSample"
    external_id TEXT,
    tissue_type TEXT,
    collection_date TEXT
    -- ... other Sample fields
);

-- Subtype table (BrainSample-specific fields only)
CREATE TABLE brain_samples (
    id              TEXT PRIMARY KEY REFERENCES samples(id),
    brain_region    TEXT,
    hemisphere      TEXT CHECK (hemisphere IN ('left','right','bilateral','unknown')),
    post_mortem_interval_hours REAL
);

Query strategy:

  • client.query("Sample")SELECT s.*, bs.* FROM samples s LEFT JOIN brain_samples bs ON s.id = bs.id WHERE s.is_available = 1
  • client.query("Sample", exact_type=True) → adds WHERE s.__type__ = 'Sample'
  • client.query("BrainSample")SELECT s.*, bs.* FROM samples s JOIN brain_samples bs ON s.id = bs.id WHERE s.is_available = 1
  • The LEFT JOIN for polymorphic queries means subtype-specific fields are null on rows that are plain Sample entities — callers can distinguish via __type__

Write strategy:

  • Creating a BrainSample writes one row to samples (with __type__ = "BrainSample") and one row to brain_samples — both in the same transaction
  • Updating a BrainSample updates columns in whichever table(s) contain the changed fields
  • Availability changes only touch the samples table (the parent owns is_available)

Index strategy:

  • Parent table fields follow the existing partial index strategy (§3b.2): partial indexes on all indexed: true parent fields, scoped to WHERE is_available = 1
  • Subtype table fields get their own partial indexes, joining to the parent's availability:
CREATE INDEX idx_brain_samples_brain_region
ON brain_samples (brain_region)
WHERE id IN (SELECT id FROM samples WHERE is_available = 1);

Migration DDL for new subtypes:

Conceptual change Relational DDL
Declare new subtype (e.g. BrainSample: base: Sample) CREATE TABLE brain_samples (id TEXT PRIMARY KEY REFERENCES samples(id), ...) + ALTER TABLE samples ADD COLUMN __type__ TEXT (if first subtype)
Add field to subtype ALTER TABLE brain_samples ADD COLUMN ...
Add field to parent type ALTER TABLE samples ADD COLUMN ... (all subtypes inherit automatically via JOIN)

The __type__ discriminator column is added to the parent table the first time any subtype is declared for that parent. Pre-existing rows receive __type__ = "<ParentTypeName>" as their default value via the migration.

Rationale for class-table over single-table inheritance:

  • Single-table inheritance (all types in one table, nullable columns for subtype fields) would produce very wide sparse tables as the number of subtypes grows — poor for columnar storage patterns and confusing for direct SQL queries
  • Class-table inheritance keeps each table narrow and semantically coherent; SQL queries against a subtype return only relevant columns
  • The JOIN overhead is acceptable for the expected query patterns in Hippo deployments

Opinionated decision: Class-table inheritance is the mandated strategy for the SQLite and PostgreSQL adapters. Alternative adapters (e.g. graph database) may implement polymorphic inheritance differently, but must present identical semantics to the SDK.