Appendix B: Implementation Guide
Appendix B: Implementation Guide for Coding Agents¶
Document status: Draft v0.1 Depends on: All sec1–sec7, reference_*.md documents
This document provides a structured implementation plan for a coding agent (or human engineer) building Hippo from this specification. It covers build order, module responsibilities, key invariants, and test strategy.
B.1 Technology Stack¶
| Layer | Technology | Notes |
|---|---|---|
| Language | Python 3.10+ | Required for match statements, importlib.metadata entry points |
| Package manager | uv |
Preferred; pip acceptable |
| Config/schema validation | pydantic>=2.0 |
All config and schema types are Pydantic models |
| YAML parsing | pyyaml |
Config and schema file parsing |
| CLI | typer |
CLI framework |
| REST framework | fastapi |
REST transport; auto-generates OpenAPI docs |
| ASGI server | uvicorn |
ASGI server for hippo serve |
| CEL evaluation | cel-python |
Config-driven validator conditions |
| SQLite | stdlib sqlite3 |
Core storage; no ORM |
| Testing | pytest |
Test framework; in-memory SQLite for integration tests |
| Packaging | pyproject.toml |
Entry points declared here |
B.2 Package Structure¶
hippo/
├── __init__.py # Public exports: HippoClient, HippoConfig
│
├── core/
│ ├── client.py # HippoClient — all business logic; no I/O except via adapters
│ ├── types.py # Shared types: Filter, PaginatedResult, ScoredMatch,
│ │ # WriteOperation, ValidationResult, ProvenanceRecord, IngestResult
│ ├── errors.py # Full error hierarchy (see §B.3)
│ └── ingestion.py # IngestionPipeline, IngestOptions, IngestResult
│
├── schema/
│ ├── parser.py # Parse LinkML schema.yaml → SchemaConfig (Pydantic model)
│ ├── validator.py # Validate LinkML schema structure and references
│ ├── migrator.py # Schema diff, migration plan generation, migration application
│ └── types.py # SchemaConfig, EntityTypeConfig, FieldConfig, RelationshipConfig
│
├── config/
│ ├── loader.py # Load hippo.yaml → HippoConfig (Pydantic model); env var substitution
│ └── types.py # HippoConfig, AdapterConfig, ServerConfig, LoggingConfig, etc.
│
├── adapters/
│ ├── base.py # EntityStore ABC, ExternalSourceAdapter ABC
│ ├── registry.py # Adapter discovery via hippo.storage_adapters entry points
│ └── sqlite/
│ ├── __init__.py
│ ├── store.py # EntityStore implementation for SQLite
│ ├── schema_ddl.py # DDL generation from SchemaConfig
│ ├── migrations.py # SQLite-specific migration application
│ ├── provenance.py # Provenance table + summary view management
│ └── search.py # FTS5 search implementation
│
├── validators/
│ ├── base.py # WriteValidator ABC, WriteOperation, ValidationResult
│ ├── registry.py # Validator discovery (schema + config + plugins), ordering
│ ├── schema_validator.py # Tier 1: field-level schema validation (priority -1)
│ ├── cel_validator.py # Tier 2: CEL condition evaluation, expand path engine
│ ├── expand.py # Expand path parser, batch fetcher, cycle detector
│ ├── presets.py # Built-in preset expansion (ref_check, count_constraint, etc.)
│ └── cel_context.py # CEL context construction from entity + existing + expansions
│
├── reference/
│ ├── base.py # ReferenceLoader ABC, LoadResult
│ └── registry.py # ReferenceLoader discovery via hippo.reference_loaders entry points
│
├── rest/
│ ├── app.py # FastAPI app factory: create_app(client: HippoClient)
│ ├── auth.py # AuthMiddleware ABC + pass-through stub
│ ├── dependencies.py # FastAPI dependency injection (get_client, get_actor, etc.)
│ └── routers/
│ ├── entities.py # GET/POST /entities/{type}, GET /entities/{type}/{id}
│ ├── history.py # GET /entities/{type}/{id}/history
│ ├── availability.py # POST /entities/{type}/{id}/availability
│ ├── supersede.py # POST /entities/{type}/{id}/supersede
│ ├── relationships.py # POST /relationships, DELETE /relationships/{id}
│ ├── external_ids.py # GET/POST /external-ids/...
│ ├── search.py # GET /search/{type}
│ ├── ingest.py # POST /ingest/{type}
│ └── schema.py # GET /schema/entity-types, etc.
│
├── cli/
│ └── main.py # Typer CLI: init, serve, migrate, ingest, validate,
│ # reference (install/update/list), validate-schema
│
└── graphql/
└── __init__.py # Reserved for future GraphQL transport; empty in v0.1
B.3 Error Hierarchy¶
All Hippo exceptions inherit from HippoError. Callers should catch the most specific
subclass available; catch HippoError only as a last resort.
class HippoError(Exception):
"""Base class for all Hippo errors."""
# ── Configuration errors (startup time) ──────────────────────────────────────
class ConfigError(HippoError):
"""Invalid or missing configuration. Raised at startup."""
class SchemaError(HippoError):
"""Schema file is invalid, references unknown types, or contains cycles."""
class MigrationError(HippoError):
"""Migration cannot be applied (e.g. destructive change attempted)."""
class SearchCapabilityError(HippoError):
"""Schema declares a search mode not supported by the active adapter."""
# ── Runtime errors (request time) ────────────────────────────────────────────
class ValidationError(HippoError):
"""Write was rejected by a validator. HTTP 422."""
validator: str # name of the validator that rejected the write
entity_type: str
entity_id: str | None # None for creates before ID assignment
detail: dict # validator-specific detail
class EntityNotFoundError(HippoError):
"""Entity does not exist or is unavailable. HTTP 404."""
entity_type: str
entity_id: str
class RelationshipNotFoundError(HippoError):
"""Relationship edge does not exist. HTTP 404."""
relationship_id: str
class ExternalIdNotFoundError(HippoError):
"""No entity found for the given (system, external_id) pair. HTTP 404."""
system: str
external_id: str
class ExternalIdConflictError(HippoError):
"""ExternalID is already registered to a different entity. HTTP 409."""
system: str
external_id: str
existing_entity_id: str
class SchemaVersionMismatchError(HippoError):
"""Instance schema version differs from storage schema version (future v0.2). HTTP 503."""
instance_version: str
storage_version: str
# ── Adapter errors (infrastructure) ──────────────────────────────────────────
class AdapterError(HippoError):
"""Storage backend failure. Always wraps the underlying exception. HTTP 500."""
cause: Exception
class AdapterNotFoundError(ConfigError):
"""No adapter registered for the configured adapter type."""
adapter_type: str
Wrapping rule: Adapters must catch all backend-specific exceptions
(e.g. sqlite3.OperationalError) and re-raise as AdapterError(cause=original).
Internal implementation details must never leak through the SDK boundary.
B.4 Build Order and Dependencies¶
Implement in this sequence. Each phase is independently testable before proceeding.
Phase 1 — Pure types and config (no I/O)¶
Modules: core/types.py, core/errors.py, config/loader.py, config/types.py,
schema/parser.py, schema/types.py
Goal: Parse hippo.yaml and schema.yaml into validated Pydantic models. No database,
no network.
Key tests:
- hippo.yaml with all valid fields parses correctly
- Missing required fields raise ConfigError with clear message
- Unknown keys raise ConfigError
- schema.yaml with entity types, fields, relationships, base:, requires:, search:
parses correctly
- Cyclic base: declarations raise SchemaError
- Invalid field types raise SchemaError
Phase 2 — SQLite storage adapter (no business logic)¶
Modules: adapters/base.py, adapters/sqlite/store.py, adapters/sqlite/schema_ddl.py,
adapters/sqlite/provenance.py
Goal: Implement EntityStore ABC against SQLite. CRUD + provenance writing.
No validators, no relationships, no search yet.
Key tests (all use :memory: SQLite):
- put() creates a new entity and provenance EntityCreated event
- put() on existing entity updates fields and writes EntityUpdated event with changed_fields
- get() returns entity by UUID
- get() raises EntityNotFoundError for unknown UUID
- get() raises EntityNotFoundError for unavailable entity (unless include_unavailable=True)
- query() returns filtered results
- history() returns provenance events in chronological order
- Provenance immutability trigger prevents UPDATE/DELETE on provenance_events
- WAL mode is enabled on connection
- Entity tables use partial indexes on is_available
Phase 3 — Schema validation (Tier 1)¶
Modules: validators/base.py, validators/schema_validator.py, validators/registry.py
Goal: Integrate field-level schema validation into the write path. Every put() now
validates against schema before writing.
Key tests:
- Required field missing → ValidationError
- Field type mismatch → ValidationError
- Enum value not in allowed set → ValidationError
- Unknown field in input → warning (not error) + field ignored
- Valid write succeeds
Phase 4 — Full HippoClient (no CEL validators yet)¶
Modules: core/client.py, core/ingestion.py
Goal: Implement the full public HippoClient interface — availability, supersession,
relationships, ExternalID operations, history, state_at, ingestion pipeline.
Key tests:
- set_availability(available=False) writes AvailabilityChanged provenance event
- set_availability(available=False) without reason raises ValidationError
- supersede(old_id, new_id) writes EntitySuperseded event on old, creates superseded_by edge
- relate() creates edge and RelationshipCreated event
- unrelate() sets edge status to removed and writes RelationshipRemoved event
- register_external_id() writes ExternalIdAdded event
- register_external_id() with existing (system, external_id) on different entity raises ExternalIdConflictError
- correct_external_id() invalidates old record and writes ExternalIdSuperseded event
- get_by_external_id() returns entity
- state_at() reconstructs entity state at a given timestamp
- IngestionPipeline creates/updates/skips correctly; IngestResult counts are accurate
- IngestionPipeline with --fail-fast stops on first error
Phase 5 — CEL validators (Tier 2) and expand path engine¶
Modules: validators/cel_validator.py, validators/expand.py, validators/presets.py,
validators/cel_context.py
Goal: Load and execute validators.yaml rules. Implement expand path engine with batch
fetch and cycle detection.
Key tests:
- validators.yaml with unknown keys raises ConfigError
- Validator when pre-condition = false → validator skipped (write succeeds even if condition would fail)
- Validator condition = false → ValidationError with error template rendered
- expand: subject — single ref expanded to full entity map in CEL context
- expand: samples[] — list ref batch-fetched and expanded to list of maps
- expand: subject.diagnosis_group — nested ref expansion
- Cycle in expand graph → expansion stops; no infinite loop
- max_expand_list_size cap — list truncated; warning logged
- Built-in preset ref_check — rejects write when ref points to unavailable entity
- Built-in preset immutable_field — rejects update that changes field value
- existing is null for creates; present for updates
- CEL sandbox has no I/O functions (test that http or open are not available)
- Plugin validators (hippo.write_validators entry points) are discovered and ordered
Phase 6 — REST transport¶
Modules: rest/app.py, rest/routers/*, rest/auth.py, rest/dependencies.py
Goal: Thin FastAPI transport wrapping HippoClient. All routing, serialisation,
request/response envelope, error mapping.
Key tests:
- All entity CRUD endpoints return correct HTTP status codes
- ValidationError maps to HTTP 422 with structured error body
- EntityNotFoundError maps to HTTP 404
- AdapterError maps to HTTP 500
- Response envelope always present: {data, error, meta}
- Pagination: ?limit=10&offset=20 returns correct slice + meta pagination fields
- X-Hippo-Actor header sets actor on writes; defaults to "anonymous" if absent
- X-Hippo-Context header parsed as JSON and passed as provenance context
- OpenAPI docs generated at /docs
Phase 7 — CLI¶
Module: cli/main.py
Goal: Implement all CLI commands using HippoClient and supporting modules.
Commands to implement:
- hippo init — scaffold hippo.yaml and blank schema.yaml in current directory
- hippo serve — start REST server (loads config, validates, starts Uvicorn)
- hippo migrate [--yes] — diff schema, print plan, apply on confirmation
- hippo validate — run validators.yaml against current data (read-only dry run)
- hippo ingest <entity_type> <file> [--dry-run] [--fail-fast] [--actor] [--context]
- hippo reference install <name> [--version]
- hippo reference update <name>
- hippo reference list
- hippo validate-schema <schema_file>
Phase 8 — Search¶
Modules: adapters/sqlite/search.py, core/client.py (search method), REST search router
Goal: FTS5 full-text search on fields declared search: fts. Embedding and synonym
modes are deferred; stubs only.
Key tests:
- EntityStore.search_capabilities() returns {"fts"} for SQLite adapter
- Fields with search: fts have FTS5 virtual table populated on write
- client.search(entity_type, field, query) returns list[ScoredMatch] ordered by score desc
- Startup raises SearchCapabilityError if schema declares search: embedding and adapter
doesn't support it
B.5 Key Invariants¶
These invariants must hold at all times. Tests should verify them directly.
-
Atomic write + provenance: Every successful
put()produces exactly one provenance event in the same transaction. If provenance write fails, entity write rolls back. If entity write fails, no provenance event is written. -
No data loss:
is_available = falseis the only way to "remove" entities. There is no DELETE path in the SDK or REST API for entity data. -
Schema validation always runs: There is no
skip_validationparameter on any write method. Schema validators (Tier 1) cannot be disabled via config. -
Provenance immutability: Once written, provenance records cannot be modified or deleted. Verified at the database level by triggers (SQLite) or equivalent constraints.
-
ExternalID uniqueness:
(system, external_id)is globally unique. Two entities cannot share the same ExternalID in the same system. Enforced by a unique constraint on theexternal_idstable. -
Temporal fields derived, never stored:
created_at,updated_at,schema_versionare computed from the provenance log. They must never appear as columns on entity tables. -
Adapter wraps all exceptions: No raw
sqlite3orpsycopg2exceptions reach the caller. All backend errors are wrapped asAdapterError. -
__type__is immutable: An entity's concrete type cannot be changed after creation. Attempts to update__type__raiseValidationErrorwithimmutable_fieldreason.
B.6 Testing Conventions¶
- All integration tests use
adapter.type: sqlitewithpath: ":memory:" - A
@pytest.fixturenamedclientprovides a freshHippoClientwith an in-memory SQLite adapter for each test. No test isolation issues — each test gets a clean DB. - A
@pytest.fixturenamedclient_with_schema(schema_yaml)loads a schema from a YAML string and returns a configured client. - Test schemas should be minimal — declare only the entity types and fields needed for the test. Avoid re-using the Appendix A omics schema across unrelated tests.
- Provenance events should be asserted explicitly in write operation tests — not just that the write succeeded, but that the correct event type was recorded with the correct fields.
Standard test schema (minimal, used in unit tests):
version: "1.0"
entities:
Thing:
fields:
name: {type: string, required: true, indexed: true}
category: {type: enum, values: [A, B, C]}
count: {type: int}
parent: {type: ref}
relationships:
- name: related_to
from: Thing
to: Thing
cardinality: many-to-many
B.7 OpenSpec Integration¶
This spec feeds directly into the OpenSpec workflow. Each phase in §B.4 maps to one or more OpenSpec features. Recommended mapping:
| OpenSpec Epic | Phases | Key spec sections |
|---|---|---|
hippo-foundation |
Phase 1 | sec2 §2.4, reference_hippo_yaml.md, sec3 §3.6 |
hippo-storage |
Phase 2 | sec3b, sec6 §6.6 |
hippo-validation-tier1 |
Phase 3 | sec2 §2.13 (Tier 1 only) |
hippo-sdk-core |
Phase 4 | sec4 §4.2, sec5 §5.2–5.6, sec6 §6.3 |
hippo-validation-tier2 |
Phase 5 | sec2 §2.13 (Tier 2+3), reference_validators_yaml.md, reference_cel_context.md |
hippo-rest |
Phase 6 | sec4 §4.3–4.5 |
hippo-cli |
Phase 7 | sec2 §2.5 |
hippo-search |
Phase 8 | sec2 §2.3 (search), sec3 §3.5 (search field type) |
Each OpenSpec feature should reference the specific section(s) of this design spec that define its behaviour. OpenSpec acceptance criteria should map directly to the test cases listed in §B.4.