Relational Storage
3b. Relational Storage Mapping¶
Document status: Draft v0.1 Depends on: sec3_data_model.md Feeds into: sec4_api_layer.md
This section describes how the conceptual data model defined in sec3 maps to relational storage. It is the reference specification for the SQLite (v0.1) and PostgreSQL (future) storage adapters. Other adapter types (graph database, document store, etc.) would have their own mapping documents.
3b.1 Entity Table Schema¶
Each entity type defined in schema config maps to its own database table. The table name is
the lowercase plural of the entity type name (e.g., entity type Item → table items).
Every entity table contains the following physical columns:
| Column | Type | Constraint | Source |
|---|---|---|---|
id |
UUID | Primary key | Generated by Hippo on creation |
is_available |
boolean | NOT NULL, default true |
Managed by availability operations |
superseded_by |
UUID | Nullable, no default | Set atomically by client.supersede_entity(); NULL means not superseded |
| (user-defined fields) | (per schema config) | (per schema config) | Declared in schema config |
superseded_by is a system column applied generically to all entity types by hippo migrate (mirroring how is_available is handled). It is not declared in schema config. hippo migrate adds it via ALTER TABLE {entity_type}s ADD COLUMN superseded_by TEXT when the column is absent. Pre-existing rows receive NULL, correctly representing "not superseded".
User-defined field columns are generated by hippo migrate from the schema config. Field
type mappings:
| Hippo type | SQLite type | PostgreSQL type |
|---|---|---|
string |
TEXT | TEXT / VARCHAR(n) |
int |
INTEGER | INTEGER |
float |
REAL | DOUBLE PRECISION |
bool |
INTEGER (0/1) | BOOLEAN |
date |
TEXT (ISO 8601) | DATE |
datetime |
TEXT (ISO 8601) | TIMESTAMP WITH TIME ZONE |
enum |
TEXT + CHECK constraint | TEXT + CHECK constraint |
json |
TEXT (JSON string) | JSONB |
uri |
TEXT | TEXT |
ref |
TEXT ("type:uuid") |
TEXT ("type:uuid") |
Temporal fields (created_at, updated_at, schema_version) are not stored as columns
on entity tables. They are computed at read time from the provenance log (see §3b.6).
3b.2 Partial Index Strategy¶
hippo migrate creates a partial index on every indexed: true field on every entity table,
scoped to available records only:
-- One per indexed field per entity table
CREATE INDEX idx_{entity_type}_{field}_available
ON {entity_type}s ({field})
WHERE is_available = true;
These indexes contain only available records. As entities become unavailable over time the indexes stay small and query performance remains constant regardless of total table size. Queries that explicitly include unavailable records bypass the partial index and perform a full scan, which is appropriate since they are explicitly exceptional operations.
3b.3 External IDs Table¶
External identifiers are stored in a shared external_ids table:
external_ids:
id UUID primary key
entity_id UUID indexed
entity_type string indexed
system string indexed (e.g. "starlims", "halo", "nextflow" — see Appendix A)
external_id string indexed
is_active boolean default true
created_at datetime (written by provenance system)
See sec3 §3.4 for the conceptual external ID model (cardinality, immutability, correction via supersession).
3b.4 Entity Relationships Table¶
Relationships are stored in a single entity_relationships edge table. New relationship
types added via schema config require no table migrations.
entity_relationships:
id UUID primary key
from_id UUID indexed
from_type string indexed
to_id UUID indexed
to_type string indexed
relationship string indexed (e.g. "derived_from", "generated", "superseded_by" — see Appendix A)
properties JSON (relationship-level attributes)
status enum (active | removed)
created_at datetime (written by provenance system)
See sec3 §3.7 for the conceptual relationship model (cardinality enforcement, immutability, status-based removal).
3b.5 hippo_meta Table¶
hippo_meta stores deployment-level state for the Hippo instance. It is managed exclusively
by hippo migrate and the SDK internals — not user-writable.
Standard keys:
| Key | Value | Purpose |
|---|---|---|
schema_version |
string | Currently deployed schema config version |
schema_hash |
string | SHA-256 of the compiled schema for drift detection |
deprecated_fields |
JSON object | Map of entity type → list of deprecated field names |
migration_history |
JSON array | Ordered list of applied migration version strings |
3b.6 Migration DDL Mechanics¶
When hippo migrate detects differences between the schema config and the deployed schema,
it generates DDL operations according to the conceptual migration rules defined in sec3 §3.8.
The relational DDL for each rule:
| Conceptual rule (sec3) | Relational DDL |
|---|---|
| Provision new entity type | CREATE TABLE {type}s (id UUID PRIMARY KEY, is_available BOOLEAN DEFAULT true, ...) + partial indexes on all indexed: true fields |
| Add field | ALTER TABLE {type}s ADD COLUMN {field} {sql_type} (nullable, or with DEFAULT if specified) |
| Add relationship type | No DDL — the entity_relationships edge table is generic |
| Add index | CREATE INDEX idx_{type}_{field}_available ON {type}s ({field}) WHERE is_available = true |
| Extend enum | Update CHECK constraint to include new values |
| Deprecate field | No DDL — column retained; hippo_meta.deprecated_fields updated |
Hippo never drops columns or tables automatically. Field type changes and entity type removal are rejected by the migration planner (see sec3 §3.8 for the full rule set).
3b.7 Computed Field Derivation¶
The system fields created_at, updated_at, and schema_version are not stored on entity
tables. They are derived from the provenance log at read time.
In a relational context, the SDK resolves these fields via queries against the provenance table:
created_at: Timestamp of the first provenance event for the entity (MIN(timestamp) WHERE entity_id = ?)updated_at: Timestamp of the most recent non-SOFT_DELETEprovenance event (MAX(timestamp) WHERE entity_id = ? AND operation_type != 'SOFT_DELETE')schema_version: Not stored in v0.1;NULL
The entity_provenance_summary VIEW is required (not optional) for correct operation
of client.query() with provenance-derived fields. hippo migrate creates this view before
entity table migrations. See sec6_provenance.md §6.6 for the full view definition and column
derivation logic.
These derivations are implemented as:
- Subqueries on each individual client.get() call (single-entity fast path)
- JOIN against entity_provenance_summary on client.query() calls (batch reads)
Both approaches produce identical results visible to callers. Entity table columns
created_at and updated_at are retained as a write-through cache — they are updated on
every write operation but reads derive authoritative values from the provenance log. See
sec6_provenance.md for the full provenance event model.
3b.8 Polymorphic Inheritance Storage¶
When a schema declares a subtype via base:, Hippo uses a joined table inheritance
strategy: each type in the hierarchy has its own table containing only the fields declared
at that level. A query for a parent type joins across all subtype tables.
Table structure¶
samples (parent table — Sample fields only)
id UUID PK
is_available BOOL
tissue_type TEXT
...
brain_samples (child table — BrainSample-specific fields only)
id UUID PK FK → samples.id
brain_region TEXT
hemisphere TEXT
post_mortem_interval_hours REAL
The child table's id column is both a primary key and a foreign key to the parent table.
Every subtype entity has exactly one row in the parent table and one row in its own table.
__type__ discriminator¶
A __type__ column on the parent table stores the concrete type name:
This allows efficient single-table queries when the concrete type is already known, and
drives __type__ exposure on entity response objects.
Query behaviour¶
| Query | SQL |
|---|---|
client.query("Sample") |
SELECT * FROM samples LEFT JOIN brain_samples USING (id) LEFT JOIN cell_lines USING (id) WHERE samples.is_available = true |
client.query("Sample", exact_type=True) |
SELECT * FROM samples WHERE __type__ = 'Sample' AND is_available = true |
client.query("BrainSample") |
SELECT * FROM samples JOIN brain_samples USING (id) WHERE samples.is_available = true |
client.get("Sample", id) |
Looks up __type__ in parent table, then fetches the full row via the appropriate join |
The storage adapter resolves the join structure at startup from the deployed schema config. No hardcoded table names — all table and column names are derived from the schema.
Migration DDL for subtypes¶
| Operation | DDL |
|---|---|
| New subtype declared | CREATE TABLE {subtype}s (id UUID PRIMARY KEY REFERENCES {parent}s(id), ...) + ALTER TABLE {parent}s ADD COLUMN __type__ TEXT if not present |
| New field on subtype | ALTER TABLE {subtype}s ADD COLUMN {field} {sql_type} |
| New field on parent | ALTER TABLE {parent}s ADD COLUMN {field} {sql_type} — inherited by all subtypes automatically |
Hippo never drops tables or columns. Removing a subtype from schema config is rejected by the migration planner until all entities of that type are marked unavailable.
Performance note¶
Joined table inheritance adds a JOIN per subtype level per query. For shallow hierarchies
(1–2 levels) this is negligible. Deep hierarchies (3+ levels) should be avoided as a schema
design practice. The exact_type=True flag avoids joins entirely when the concrete type is
known — Cappella adapters and workflow pipelines should use this when querying for a specific
subtype.
3b.8 Polymorphic Inheritance Storage¶
When a schema declares base: on an entity type, the relational adapter uses a
class-table inheritance strategy (also known as "joined table inheritance"):
- The parent type has its own table containing system fields (
id,is_available) and all parent-declared fields - Each subtype has its own table containing only the fields specific to that subtype; it
shares the same
idas the parent row - The parent table has a
__type__discriminator column (TEXT, NOT NULL) recording the concrete entity type name (e.g."BrainSample")
Example (BrainSample extends Sample):
-- Parent table (Sample)
CREATE TABLE samples (
id TEXT PRIMARY KEY,
is_available BOOLEAN NOT NULL DEFAULT 1,
__type__ TEXT NOT NULL, -- e.g. "Sample" or "BrainSample"
external_id TEXT,
tissue_type TEXT,
collection_date TEXT
-- ... other Sample fields
);
-- Subtype table (BrainSample-specific fields only)
CREATE TABLE brain_samples (
id TEXT PRIMARY KEY REFERENCES samples(id),
brain_region TEXT,
hemisphere TEXT CHECK (hemisphere IN ('left','right','bilateral','unknown')),
post_mortem_interval_hours REAL
);
Query strategy:
client.query("Sample")→SELECT s.*, bs.* FROM samples s LEFT JOIN brain_samples bs ON s.id = bs.id WHERE s.is_available = 1client.query("Sample", exact_type=True)→ addsWHERE s.__type__ = 'Sample'client.query("BrainSample")→SELECT s.*, bs.* FROM samples s JOIN brain_samples bs ON s.id = bs.id WHERE s.is_available = 1- The
LEFT JOINfor polymorphic queries means subtype-specific fields arenullon rows that are plainSampleentities — callers can distinguish via__type__
Write strategy:
- Creating a
BrainSamplewrites one row tosamples(with__type__ = "BrainSample") and one row tobrain_samples— both in the same transaction - Updating a
BrainSampleupdates columns in whichever table(s) contain the changed fields - Availability changes only touch the
samplestable (the parent ownsis_available)
Index strategy:
- Parent table fields follow the existing partial index strategy (§3b.2): partial indexes
on all
indexed: trueparent fields, scoped toWHERE is_available = 1 - Subtype table fields get their own partial indexes, joining to the parent's availability:
CREATE INDEX idx_brain_samples_brain_region
ON brain_samples (brain_region)
WHERE id IN (SELECT id FROM samples WHERE is_available = 1);
Migration DDL for new subtypes:
| Conceptual change | Relational DDL |
|---|---|
Declare new subtype (e.g. BrainSample: base: Sample) |
CREATE TABLE brain_samples (id TEXT PRIMARY KEY REFERENCES samples(id), ...) + ALTER TABLE samples ADD COLUMN __type__ TEXT (if first subtype) |
| Add field to subtype | ALTER TABLE brain_samples ADD COLUMN ... |
| Add field to parent type | ALTER TABLE samples ADD COLUMN ... (all subtypes inherit automatically via JOIN) |
The __type__ discriminator column is added to the parent table the first time any subtype
is declared for that parent. Pre-existing rows receive __type__ = "<ParentTypeName>" as
their default value via the migration.
Rationale for class-table over single-table inheritance:
- Single-table inheritance (all types in one table, nullable columns for subtype fields) would produce very wide sparse tables as the number of subtypes grows — poor for columnar storage patterns and confusing for direct SQL queries
- Class-table inheritance keeps each table narrow and semantically coherent; SQL queries against a subtype return only relevant columns
- The JOIN overhead is acceptable for the expected query patterns in Hippo deployments
Opinionated decision: Class-table inheritance is the mandated strategy for the SQLite and PostgreSQL adapters. Alternative adapters (e.g. graph database) may implement polymorphic inheritance differently, but must present identical semantics to the SDK.