Relational Storage

3b. Relational Storage Mapping¶

Document status: Draft v0.1 Depends on: sec3_data_model.md Feeds into: sec4_api_layer.md

This section describes how the conceptual data model defined in sec3 maps to relational storage. It is the reference specification for the SQLite (v0.1) and PostgreSQL (future) storage adapters. Other adapter types (graph database, document store, etc.) would have their own mapping documents.

3b.1 Entity Table Schema¶

Each entity type defined in schema config maps to its own database table. The table name is the lowercase plural of the entity type name (e.g., entity type Item → table items).

Every entity table contains the following physical columns:

Column	Type	Constraint	Source
`id`	UUID	Primary key	Generated by Hippo on creation
`is_available`	boolean	NOT NULL, default `true`	Managed by availability operations
`superseded_by`	UUID	Nullable, no default	Set atomically by `client.supersede_entity()`; `NULL` means not superseded
(user-defined fields)	(per schema config)	(per schema config)	Declared in schema config

superseded_by is a system column applied generically to all entity types by hippo migrate (mirroring how is_available is handled). It is not declared in schema config. hippo migrate adds it via ALTER TABLE {entity_type}s ADD COLUMN superseded_by TEXT when the column is absent. Pre-existing rows receive NULL, correctly representing "not superseded".

User-defined field columns are generated by hippo migrate from the schema config. Field type mappings:

Hippo type	SQLite type	PostgreSQL type
`string`	TEXT	TEXT / VARCHAR(n)
`int`	INTEGER	INTEGER
`float`	REAL	DOUBLE PRECISION
`bool`	INTEGER (0/1)	BOOLEAN
`date`	TEXT (ISO 8601)	DATE
`datetime`	TEXT (ISO 8601)	TIMESTAMP WITH TIME ZONE
`enum`	TEXT + CHECK constraint	TEXT + CHECK constraint
`json`	TEXT (JSON string)	JSONB
`uri`	TEXT	TEXT
`ref`	TEXT (`"type:uuid"`)	TEXT (`"type:uuid"`)

Temporal fields (created_at, updated_at, schema_version) are not stored as columns on entity tables. They are computed at read time from the provenance log (see §3b.6).

3b.2 Partial Index Strategy¶

hippo migrate creates a partial index on every indexed: true field on every entity table, scoped to available records only:

-- One per indexed field per entity table
CREATE INDEX idx_{entity_type}_{field}_available
ON {entity_type}s ({field})
WHERE is_available = true;

These indexes contain only available records. As entities become unavailable over time the indexes stay small and query performance remains constant regardless of total table size. Queries that explicitly include unavailable records bypass the partial index and perform a full scan, which is appropriate since they are explicitly exceptional operations.

3b.3 External IDs Table¶

External identifiers are stored in a shared external_ids table:

external_ids:
  id            UUID        primary key
  entity_id     UUID        indexed
  entity_type   string      indexed
  system        string      indexed    (e.g. "starlims", "halo", "nextflow" — see Appendix A)
  external_id   string      indexed
  is_active     boolean     default true
  created_at    datetime               (written by provenance system)

See sec3 §3.4 for the conceptual external ID model (cardinality, immutability, correction via supersession).

3b.4 Entity Relationships Table¶

Relationships are stored in a single entity_relationships edge table. New relationship types added via schema config require no table migrations.

entity_relationships:
  id              UUID      primary key
  from_id         UUID      indexed
  from_type       string    indexed
  to_id           UUID      indexed
  to_type         string    indexed
  relationship    string    indexed    (e.g. "derived_from", "generated", "superseded_by" — see Appendix A)
  properties      JSON                 (relationship-level attributes)
  status          enum                 (active | removed)
  created_at      datetime             (written by provenance system)

See sec3 §3.7 for the conceptual relationship model (cardinality enforcement, immutability, status-based removal).

3b.5 `hippo_meta` Table¶

hippo_meta stores deployment-level state for the Hippo instance. It is managed exclusively by hippo migrate and the SDK internals — not user-writable.

hippo_meta:
  key             string      primary key
  value           JSON
  updated_at      datetime

Standard keys:

Key	Value	Purpose
`schema_version`	string	Currently deployed schema config version
`schema_hash`	string	SHA-256 of the compiled schema for drift detection
`deprecated_fields`	JSON object	Map of entity type → list of deprecated field names
`migration_history`	JSON array	Ordered list of applied migration version strings

3b.6 Migration DDL Mechanics¶

When hippo migrate detects differences between the schema config and the deployed schema, it generates DDL operations according to the conceptual migration rules defined in sec3 §3.8. The relational DDL for each rule:

Conceptual rule (sec3)	Relational DDL
Provision new entity type	`CREATE TABLE {type}s (id UUID PRIMARY KEY, is_available BOOLEAN DEFAULT true, ...)` + partial indexes on all `indexed: true` fields
Add field	`ALTER TABLE {type}s ADD COLUMN {field} {sql_type}` (nullable, or with `DEFAULT` if specified)
Add relationship type	No DDL — the `entity_relationships` edge table is generic
Add index	`CREATE INDEX idx_{type}_{field}_available ON {type}s ({field}) WHERE is_available = true`
Extend enum	Update `CHECK` constraint to include new values
Deprecate field	No DDL — column retained; `hippo_meta.deprecated_fields` updated

Hippo never drops columns or tables automatically. Field type changes and entity type removal are rejected by the migration planner (see sec3 §3.8 for the full rule set).

3b.7 Computed Field Derivation¶

The system fields created_at, updated_at, and schema_version are not stored on entity tables. They are derived from the provenance log at read time.

In a relational context, the SDK resolves these fields via queries against the provenance table:

created_at: Timestamp of the first provenance event for the entity (MIN(timestamp) WHERE entity_id = ?)
updated_at: Timestamp of the most recent non-SOFT_DELETE provenance event (MAX(timestamp) WHERE entity_id = ? AND operation_type != 'SOFT_DELETE')
schema_version: Not stored in v0.1; NULL

The entity_provenance_summary VIEW is required (not optional) for correct operation of client.query() with provenance-derived fields. hippo migrate creates this view before entity table migrations. See sec6_provenance.md §6.6 for the full view definition and column derivation logic.

These derivations are implemented as: - Subqueries on each individual client.get() call (single-entity fast path) - JOIN against entity_provenance_summary on client.query() calls (batch reads)

Both approaches produce identical results visible to callers. Entity table columns created_at and updated_at are retained as a write-through cache — they are updated on every write operation but reads derive authoritative values from the provenance log. See sec6_provenance.md for the full provenance event model.

3b.8 Polymorphic Inheritance Storage¶

When a schema declares a subtype via base:, Hippo uses a joined table inheritance strategy: each type in the hierarchy has its own table containing only the fields declared at that level. A query for a parent type joins across all subtype tables.

Table structure¶

samples (parent table — Sample fields only)
  id          UUID  PK
  is_available BOOL
  tissue_type  TEXT
  ...

brain_samples (child table — BrainSample-specific fields only)
  id                         UUID  PK  FK → samples.id
  brain_region               TEXT
  hemisphere                 TEXT
  post_mortem_interval_hours REAL

The child table's id column is both a primary key and a foreign key to the parent table. Every subtype entity has exactly one row in the parent table and one row in its own table.

`type` discriminator¶

A __type__ column on the parent table stores the concrete type name:

ALTER TABLE samples ADD COLUMN __type__ TEXT NOT NULL DEFAULT 'Sample';

This allows efficient single-table queries when the concrete type is already known, and drives __type__ exposure on entity response objects.

Query behaviour¶

Query	SQL
`client.query("Sample")`	`SELECT * FROM samples LEFT JOIN brain_samples USING (id) LEFT JOIN cell_lines USING (id) WHERE samples.is_available = true`
`client.query("Sample", exact_type=True)`	`SELECT * FROM samples WHERE __type__ = 'Sample' AND is_available = true`
`client.query("BrainSample")`	`SELECT * FROM samples JOIN brain_samples USING (id) WHERE samples.is_available = true`
`client.get("Sample", id)`	Looks up `__type__` in parent table, then fetches the full row via the appropriate join

The storage adapter resolves the join structure at startup from the deployed schema config. No hardcoded table names — all table and column names are derived from the schema.

Migration DDL for subtypes¶

Operation	DDL
New subtype declared	`CREATE TABLE {subtype}s (id UUID PRIMARY KEY REFERENCES {parent}s(id), ...)` + `ALTER TABLE {parent}s ADD COLUMN __type__ TEXT` if not present
New field on subtype	`ALTER TABLE {subtype}s ADD COLUMN {field} {sql_type}`
New field on parent	`ALTER TABLE {parent}s ADD COLUMN {field} {sql_type}` — inherited by all subtypes automatically

Hippo never drops tables or columns. Removing a subtype from schema config is rejected by the migration planner until all entities of that type are marked unavailable.

Performance note¶

Joined table inheritance adds a JOIN per subtype level per query. For shallow hierarchies (1–2 levels) this is negligible. Deep hierarchies (3+ levels) should be avoided as a schema design practice. The exact_type=True flag avoids joins entirely when the concrete type is known — Cappella adapters and workflow pipelines should use this when querying for a specific subtype.

3b.8 Polymorphic Inheritance Storage¶

When a schema declares base: on an entity type, the relational adapter uses a class-table inheritance strategy (also known as "joined table inheritance"):

The parent type has its own table containing system fields (id, is_available) and all parent-declared fields
Each subtype has its own table containing only the fields specific to that subtype; it shares the same id as the parent row
The parent table has a __type__ discriminator column (TEXT, NOT NULL) recording the concrete entity type name (e.g. "BrainSample")

Example (BrainSample extends Sample):

-- Parent table (Sample)
CREATE TABLE samples (
    id          TEXT PRIMARY KEY,
    is_available BOOLEAN NOT NULL DEFAULT 1,
    __type__    TEXT NOT NULL,            -- e.g. "Sample" or "BrainSample"
    external_id TEXT,
    tissue_type TEXT,
    collection_date TEXT
    -- ... other Sample fields
);

-- Subtype table (BrainSample-specific fields only)
CREATE TABLE brain_samples (
    id              TEXT PRIMARY KEY REFERENCES samples(id),
    brain_region    TEXT,
    hemisphere      TEXT CHECK (hemisphere IN ('left','right','bilateral','unknown')),
    post_mortem_interval_hours REAL
);

Query strategy:

client.query("Sample") → SELECT s.*, bs.* FROM samples s LEFT JOIN brain_samples bs ON s.id = bs.id WHERE s.is_available = 1
client.query("Sample", exact_type=True) → adds WHERE s.__type__ = 'Sample'
client.query("BrainSample") → SELECT s.*, bs.* FROM samples s JOIN brain_samples bs ON s.id = bs.id WHERE s.is_available = 1
The LEFT JOIN for polymorphic queries means subtype-specific fields are null on rows that are plain Sample entities — callers can distinguish via __type__

Write strategy:

Creating a BrainSample writes one row to samples (with __type__ = "BrainSample") and one row to brain_samples — both in the same transaction
Updating a BrainSample updates columns in whichever table(s) contain the changed fields
Availability changes only touch the samples table (the parent owns is_available)

Index strategy:

Parent table fields follow the existing partial index strategy (§3b.2): partial indexes on all indexed: true parent fields, scoped to WHERE is_available = 1
Subtype table fields get their own partial indexes, joining to the parent's availability:

CREATE INDEX idx_brain_samples_brain_region
ON brain_samples (brain_region)
WHERE id IN (SELECT id FROM samples WHERE is_available = 1);

Migration DDL for new subtypes:

Conceptual change	Relational DDL
Declare new subtype (e.g. `BrainSample: base: Sample`)	`CREATE TABLE brain_samples (id TEXT PRIMARY KEY REFERENCES samples(id), ...)` + `ALTER TABLE samples ADD COLUMN __type__ TEXT` (if first subtype)
Add field to subtype	`ALTER TABLE brain_samples ADD COLUMN ...`
Add field to parent type	`ALTER TABLE samples ADD COLUMN ...` (all subtypes inherit automatically via JOIN)

The __type__ discriminator column is added to the parent table the first time any subtype is declared for that parent. Pre-existing rows receive __type__ = "<ParentTypeName>" as their default value via the migration.

Rationale for class-table over single-table inheritance:

Single-table inheritance (all types in one table, nullable columns for subtype fields) would produce very wide sparse tables as the number of subtypes grows — poor for columnar storage patterns and confusing for direct SQL queries
Class-table inheritance keeps each table narrow and semantically coherent; SQL queries against a subtype return only relevant columns
The JOIN overhead is acceptable for the expected query patterns in Hippo deployments

Opinionated decision: Class-table inheritance is the mandated strategy for the SQLite and PostgreSQL adapters. Alternative adapters (e.g. graph database) may implement polymorphic inheritance differently, but must present identical semantics to the SDK.

Relational Storage

3b. Relational Storage Mapping¶

3b.1 Entity Table Schema¶

3b.2 Partial Index Strategy¶

3b.3 External IDs Table¶

3b.4 Entity Relationships Table¶

3b.5 hippo_meta Table¶

3b.6 Migration DDL Mechanics¶

3b.7 Computed Field Derivation¶

3b.8 Polymorphic Inheritance Storage¶

Table structure¶

__type__ discriminator¶

Query behaviour¶

Migration DDL for subtypes¶

Performance note¶

3b.8 Polymorphic Inheritance Storage¶

3b.5 `hippo_meta` Table¶

`type` discriminator¶