Hippo Configuration Reference

Complete reference for configuring Hippo, the Metadata Tracking Service (MTS) for the BASS platform.

Overview

Hippo uses two primary configuration files:

  1. config.json -- Main application configuration that specifies the schema path and storage settings
  2. Schema files (YAML or JSON) -- Define entity types, attributes, validators, and inheritance in LinkML format

Configuration is loaded via load_hippo_config() from config.json, and schemas are loaded via SchemaParser or load_schema().


HippoConfig

The main configuration model for the Hippo application. Loaded from config.json.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| schema_path | Path | (required) | Path to the schema directory or file containing entity definitions |
| storage_backend | str | None | Storage backend to use (e.g., "sqlite", "postgresql"). If None, uses default backend |
| database_url | str | None | Connection string for the database. For SQLite: hippo.db or absolute path |
| validation_enabled | bool | true | Enable validation of entity data against schema rules |
| validation_fail_fast | bool | true | Stop validation on first error. If false, collects all validation errors |
| write_path_validation_enabled | bool | true | Enable validation of write paths (entity location paths) |
| write_path_validation_timeout | float | None | Timeout in seconds for write path validation. None = no timeout |
| validators_path | Path | None | Path to custom validators file. If None, uses default location |

schema_path is required -- HippoConfig will not instantiate without it.

Example config.json

{
  "schema_path": "./schemas",
  "storage_backend": "sqlite",
  "database_url": "./data/hippo.db",
  "validation_enabled": true,
  "validation_fail_fast": true,
  "write_path_validation_enabled": true,
  "validators_path": "./validators.yaml"
}
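
To make the defaults and the required field concrete, here is a minimal sketch of how a file like the one above could be parsed. It is not Hippo's own loader: ConfigSketch and parse_config are hypothetical names, and only the field names and defaults are taken from the table.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class ConfigSketch:
    """Hypothetical stand-in for HippoConfig; fields and defaults follow the table."""
    schema_path: Path
    storage_backend: Optional[str] = None
    database_url: Optional[str] = None
    validation_enabled: bool = True
    validation_fail_fast: bool = True
    write_path_validation_enabled: bool = True
    write_path_validation_timeout: Optional[float] = None
    validators_path: Optional[Path] = None


def parse_config(text: str) -> ConfigSketch:
    """Parse config.json content, enforcing the one required field."""
    raw = json.loads(text)
    if "schema_path" not in raw:
        raise ValueError("schema_path is required")
    # Convert path-like strings to Path objects; everything else is passed through.
    raw["schema_path"] = Path(raw["schema_path"])
    if raw.get("validators_path") is not None:
        raw["validators_path"] = Path(raw["validators_path"])
    return ConfigSketch(**raw)


example = '{"schema_path": "./schemas", "storage_backend": "sqlite", "validation_fail_fast": false}'
config = parse_config(example)
print(config.schema_path, config.validation_fail_fast, config.validation_enabled)
```

Fields left out of the JSON fall back to the defaults, so only schema_path has to be present.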

Template config.json (from hippo init)

The basic template generated by hippo init produces:

{
  "version": "0.1",
  "storage": {
    "type": "sqlite",
    "path": "hippo.db"
  },
  "schema": {
    "type": "linkml"
  }
}

The full template includes additional sections:

{
  "version": "0.1",
  "name": "hippo-project",
  "description": "Hippo Metadata Tracking Service Project",
  "storage": {
    "type": "sqlite",
    "path": "hippo.db"
  },
  "schema": {
    "type": "linkml",
    "path": "schemas"
  },
  "validation": {
    "enabled": true,
    "strict": false
  },
  "api": {
    "host": "127.0.0.1",
    "port": 8000
  }
}

LinkML Schema Format

Schema files use standard LinkML format. Classes and their attributes are defined in a dictionary structure with a required schema header. Both YAML and JSON are accepted.

Schema Header

Every LinkML schema file requires these top-level keys:

| Key | Required | Description |
| --- | --- | --- |
| id | Yes | Unique URI identifier for the schema |
| name | Yes | Short schema name (alphanumeric, underscores, dashes) |
| prefixes | Yes | Maps prefix names to IRI namespaces (must include linkml) |
| imports | Yes | List of imported schemas (almost always includes linkml:types) |
| default_range | Recommended | Default data type for attributes without explicit range (e.g., string) |
| classes | Core | Dictionary of class (entity type) definitions |
| enums | Core | Dictionary of enumeration definitions |
| slots | Optional | Dictionary of reusable slot definitions shared across classes |
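
The header rules above can be checked before a schema is handed to Hippo. The following is a rough sketch using a JSON schema and only the standard library; header_problems is a hypothetical helper, not part of Hippo or LinkML.

```python
import json

# Keys marked "Yes" in the Required column of the header table.
REQUIRED_HEADER_KEYS = ("id", "name", "prefixes", "imports")


def header_problems(schema: dict) -> list[str]:
    """Return human-readable problems with the schema header (empty list if none)."""
    problems = [
        f"missing required key: {key}"
        for key in REQUIRED_HEADER_KEYS
        if key not in schema
    ]
    prefixes = schema.get("prefixes")
    if prefixes is not None and "linkml" not in prefixes:
        problems.append("prefixes must include linkml")
    return problems


# A deliberately incomplete schema: no imports, and no linkml prefix.
schema_text = """
{
  "id": "https://example.org/my-lab",
  "name": "my_lab",
  "prefixes": {"my_lab": "https://example.org/my-lab/"},
  "default_range": "string",
  "classes": {}
}
"""
problems = header_problems(json.loads(schema_text))
for problem in problems:
    print(problem)
```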

Schema Example

id: https://example.org/my-lab
name: my_lab
prefixes:
  linkml: https://w3id.org/linkml/
  my_lab: https://example.org/my-lab/
imports:
  - linkml:types
default_range: string

enums:
  SexType:
    permissible_values:
      M:
      F:
      Unknown:

  SampleType:
    permissible_values:
      tissue:
      blood:
      csf:

classes:
  Donor:
    description: "Human donor information"
    attributes:
      donor_id:
        range: string
        required: true
        identifier: true
        annotations:
          hippo_index: true
      species:
        range: string
        required: true
        ifabsent: "string(Homo sapiens)"
      sex:
        range: SexType
        required: true
      date_of_birth:
        range: date
      consent_obtained:
        range: boolean
        ifabsent: "false"

  Sample:
    description: "Biological sample from a donor"
    attributes:
      sample_id:
        range: string
        required: true
        identifier: true
        annotations:
          hippo_index: true
      donor:
        range: Donor
        required: true
        annotations:
          hippo_index: true
      sample_type:
        range: SampleType
        required: true
      quantity_ng:
        range: float
      notes:
        range: string
        annotations:
          hippo_search: fts5

Schema Inheritance

Schemas support inheritance via is_a:

classes:
  BiologicalEntity:
    description: "Base entity for all biological samples"
    abstract: true
    attributes:
      external_id:
        range: string
        identifier: true

  Donor:
    is_a: BiologicalEntity
    description: "Human donor information"
    attributes:
      donor_id:
        range: string
        required: true
      species:
        range: string

Mixins are supported for composing shared attribute sets:

classes:
  BrainEntity:
    mixin: true
    attributes:
      brain_region:
        range: string

  BrainSample:
    is_a: Sample
    mixins:
      - BrainEntity
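
To illustrate what is_a and mixins mean for the final attribute set of a class, here is a small sketch that merges attributes over plain dictionaries. The precedence order (parent, then mixins, then the class's own attributes) is an assumption made for this example, resolved_attributes is a hypothetical helper, and the sketch does not guard against cyclic inheritance.

```python
def resolved_attributes(classes: dict, name: str) -> dict:
    """Collect attributes for a class: parent first, then mixins, then its own."""
    cls = classes[name]
    merged: dict = {}
    parent = cls.get("is_a")
    if parent:
        merged.update(resolved_attributes(classes, parent))
    for mixin in cls.get("mixins", []):
        merged.update(resolved_attributes(classes, mixin))
    # Attributes declared directly on the class override inherited ones.
    merged.update(cls.get("attributes", {}))
    return merged


classes = {
    "Sample": {"attributes": {"sample_id": {"range": "string", "identifier": True}}},
    "BrainEntity": {"mixin": True, "attributes": {"brain_region": {"range": "string"}}},
    "BrainSample": {"is_a": "Sample", "mixins": ["BrainEntity"]},
}
attrs = resolved_attributes(classes, "BrainSample")
print(sorted(attrs))  # ['brain_region', 'sample_id']
```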

Attribute Properties

Each attribute within a class supports these LinkML properties:

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| range | str | schema's default_range | Data type: a built-in type, class name, or enum name |
| required | bool | false | Whether the attribute is required (non-null) |
| description | str | None | Human-readable description |
| ifabsent | str | None | Default value expression, e.g., string(pending), int(0), true |
| identifier | bool | false | Whether this attribute is the unique primary key |
| multivalued | bool | false | Whether the value is a list |
| pattern | str | None | Regex pattern for validation |
| minimum_value | number | None | Minimum numeric value (inclusive) |
| maximum_value | number | None | Maximum numeric value (inclusive) |
| inlined | bool | false | Whether class-range values are nested inline |
| inlined_as_list | bool | false | Inlined as a list rather than a dictionary |
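
As a rough illustration of how several of these properties interact, the sketch below applies ifabsent defaults and then checks required, pattern, minimum_value, and maximum_value for a single value. It is not Hippo's validator: parse_ifabsent and check_attribute are hypothetical helpers, only the three ifabsent forms listed in the table are handled, and pattern is treated as an unanchored search, which is an assumption.

```python
import re


def parse_ifabsent(expr: str):
    """Parse the ifabsent forms shown in the table: string(...), int(...), true/false."""
    if expr in ("true", "false"):
        return expr == "true"
    match = re.fullmatch(r"(string|int)\((.*)\)", expr)
    if match is None:
        raise ValueError(f"unsupported ifabsent expression: {expr}")
    kind, inner = match.groups()
    return int(inner) if kind == "int" else inner


def check_attribute(name: str, spec: dict, value) -> list[str]:
    """Return a list of problems for one value; an empty list means it passed."""
    if value is None and "ifabsent" in spec:
        value = parse_ifabsent(spec["ifabsent"])
    if value is None:
        return [f"{name}: value is required"] if spec.get("required") else []
    problems = []
    # Multivalued attributes are checked item by item.
    items = value if spec.get("multivalued") else [value]
    for item in items:
        if "pattern" in spec and not re.search(spec["pattern"], str(item)):
            problems.append(f"{name}: {item!r} does not match {spec['pattern']}")
        if "minimum_value" in spec and item < spec["minimum_value"]:
            problems.append(f"{name}: {item!r} is below {spec['minimum_value']}")
        if "maximum_value" in spec and item > spec["maximum_value"]:
            problems.append(f"{name}: {item!r} is above {spec['maximum_value']}")
    return problems


quality = {"range": "float", "minimum_value": 0, "maximum_value": 1}
donor_id = {"range": "string", "required": True, "pattern": r"^DON-\d{6}$"}
status = {"range": "string", "ifabsent": "string(pending)"}

print(check_attribute("quality_score", quality, 1.5))
print(check_attribute("donor_id", donor_id, None))
print(check_attribute("donor_id", donor_id, "DON-000123"))
print(parse_ifabsent(status["ifabsent"]))
```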

Built-in Range Types

Available when you import linkml:types:

| Range | Description |
| --- | --- |
| string | Text data |
| integer | Integer numbers |
| float | Floating-point numbers |
| boolean | True/false values |
| date | Date (YYYY-MM-DD) |
| datetime | Date and time (ISO 8601) |
| uri | URI/URL string |
| uriorcurie | URI or compact URI (CURIE) |

Hippo-Specific Annotations

Hippo extends standard LinkML with storage annotations. These are expressed under the annotations key on attributes:

| Annotation | Values | Description |
| --- | --- | --- |
| hippo_index | true | Creates a B-tree database index on this attribute |
| hippo_index_partial | true | Creates a partial index (non-null values only) |
| hippo_search | fts, fts5, embedding | Enables full-text search indexing |

For example, an attribute that is both indexed and searchable:

  diagnosis:
    range: string
    annotations:
      hippo_index: true
      hippo_search: fts5
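
This reference does not show the DDL that Hippo emits for these annotations. Purely to illustrate what each annotation describes, the sketch below maps them to plausible SQLite statements. The table name, index names, and the helper index_statements are all hypothetical, fts is treated the same as fts5, and embedding is ignored.

```python
import sqlite3


def index_statements(table: str, attributes: dict) -> list[str]:
    """Translate Hippo annotations into illustrative SQLite statements."""
    statements = []
    for column, spec in attributes.items():
        notes = spec.get("annotations", {})
        if notes.get("hippo_index"):
            statements.append(f"CREATE INDEX idx_{table}_{column} ON {table} ({column})")
        if notes.get("hippo_index_partial"):
            # A partial index covers only rows where the column is not null.
            statements.append(
                f"CREATE INDEX idx_{table}_{column}_nn ON {table} ({column}) "
                f"WHERE {column} IS NOT NULL"
            )
        if notes.get("hippo_search") in ("fts", "fts5"):
            statements.append(f"CREATE VIRTUAL TABLE {table}_{column}_fts USING fts5({column})")
    return statements


attributes = {
    "diagnosis": {"range": "string", "annotations": {"hippo_index": True, "hippo_search": "fts5"}},
    "barcode": {"range": "string", "annotations": {"hippo_index_partial": True}},
}
statements = index_statements("sample", attributes)

# Run only the plain index statements; FTS5 is an optional SQLite extension
# and may not be compiled into every build.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (diagnosis TEXT, barcode TEXT)")
for sql in statements:
    if sql.startswith("CREATE INDEX"):
        conn.execute(sql)
index_names = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
)
print(index_names)
```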

Declare that an attribute points to another class by setting range to a class name:

classes:
  Sample:
    attributes:
      donor:
        range: Donor          # references the Donor class
        required: true

When the range is a class, Hippo treats the attribute as an entity reference. HippoClient.schema_references() and Cappella's collection resolver both rely on these references.
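
The rule "a range that names a class is a reference" can be sketched in a few lines. This is not the implementation of HippoClient.schema_references(); class_references is a hypothetical helper that works on the parsed classes dictionary.

```python
def class_references(classes: dict) -> list[tuple[str, str, str]]:
    """List (source class, attribute, target class) for every class-valued range."""
    references = []
    for class_name, cls in classes.items():
        for attribute, spec in cls.get("attributes", {}).items():
            target = spec.get("range")
            # Built-in types and enums are not keys of `classes`, so they are skipped.
            if target in classes:
                references.append((class_name, attribute, target))
    return references


classes = {
    "Donor": {"attributes": {"donor_id": {"range": "string"}, "sex": {"range": "SexType"}}},
    "Sample": {"attributes": {"sample_id": {"range": "string"}, "donor": {"range": "Donor"}}},
}
print(class_references(classes))  # [('Sample', 'donor', 'Donor')]
```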


Enum Definitions

Enums are defined at the schema top level under enums:. Each enum declares its allowed values under permissible_values:

enums:
  SexType:
    description: "Biological sex"
    permissible_values:
      M:
        description: "Male"
      F:
        description: "Female"
      Unknown:
        description: "Not specified"

Reference an enum from an attribute with range:

  sex:
    range: SexType
    required: true
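
Checking a value against an enum amounts to a membership test on the keys of permissible_values. A minimal sketch, with enum_problem as a hypothetical helper:

```python
from typing import Optional


def enum_problem(enums: dict, enum_name: str, value: str) -> Optional[str]:
    """Return None if the value is permitted, otherwise a short explanation."""
    allowed = enums[enum_name]["permissible_values"]
    if value in allowed:
        return None
    return f"{value!r} is not one of {sorted(allowed)}"


enums = {
    "SexType": {
        "description": "Biological sex",
        "permissible_values": {
            "M": {"description": "Male"},
            "F": {"description": "Female"},
            "Unknown": {"description": "Not specified"},
        },
    }
}
print(enum_problem(enums, "SexType", "F"))
print(enum_problem(enums, "SexType", "X"))
```

The same test works when a value is written without a body (for example `M:` in YAML), because only the keys are consulted.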

ValidatorDefinition

Defines a validator applied to entity data. Validators are loaded from a separate validators file and evaluated with the CEL (Common Expression Language) engine:

validators:
  - name: sample_name_required
    type: cel
    enabled: true
    priority: 0
    config:
      entity_types: [Sample]
      'on': [create, update]   # quoted: a bare on is read as boolean true by YAML 1.1 parsers
      condition: 'entity.name != ""'
      error: "Sample {entity_id}: name is required"

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| name | str | (required) | Unique name for this validator |
| type | str | (required) | Validator type (e.g., "cel" for CEL expressions) |
| enabled | bool | true | Whether this validator is active |
| priority | int | 0 | Execution order (higher runs first). Negative for pre-schema validation |
| config | dict[str, Any] | None | Type-specific configuration |
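
The enabled and priority fields determine which validators run and in what order. The sketch below shows one way that selection could work on already-parsed definitions. It does not evaluate CEL conditions, runnable_validators is a hypothetical helper, and filling {entity_id} with str.format is an assumption about how the error template is expanded.

```python
def runnable_validators(definitions: list[dict], entity_type: str, operation: str) -> list[dict]:
    """Pick enabled validators for an entity type and operation, highest priority first."""
    selected = []
    for definition in definitions:
        config = definition.get("config") or {}
        if not definition.get("enabled", True):
            continue
        if entity_type not in config.get("entity_types", []):
            continue
        if operation not in config.get("on", []):
            continue
        selected.append(definition)
    # sorted() is stable, so validators with equal priority keep their file order.
    return sorted(selected, key=lambda d: d.get("priority", 0), reverse=True)


definitions = [
    {"name": "sample_name_required", "type": "cel",
     "config": {"entity_types": ["Sample"], "on": ["create", "update"],
                "condition": 'entity.name != ""',
                "error": "Sample {entity_id}: name is required"}},
    {"name": "early_check", "type": "cel", "priority": 10,
     "config": {"entity_types": ["Sample"], "on": ["create"]}},
    {"name": "disabled_check", "type": "cel", "enabled": False,
     "config": {"entity_types": ["Sample"], "on": ["create"]}},
]

on_create = [d["name"] for d in runnable_validators(definitions, "Sample", "create")]
on_update = [d["name"] for d in runnable_validators(definitions, "Sample", "update")]
message = definitions[0]["config"]["error"].format(entity_id="S-001")
print(on_create, on_update, message)
```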

Complete Example

config.json

{
  "schema_path": "./schemas",
  "storage_backend": "sqlite",
  "database_url": "./data/hippo.db",
  "validation_enabled": true,
  "validation_fail_fast": false,
  "write_path_validation_enabled": true,
  "validators_path": "./validators.yaml"
}

schemas/base.yaml -- Base Entity

id: https://example.org/base
name: base
prefixes:
  linkml: https://w3id.org/linkml/
  base: https://example.org/base/
imports:
  - linkml:types
default_range: string

classes:
  BiologicalEntity:
    description: "Base entity for all biological samples"
    abstract: true
    attributes:
      external_id:
        # Not marked as identifier: Donor and Sample declare their own,
        # and LinkML allows only one identifier per class.
        range: string
        description: "External system identifier"
      created_by:
        range: string
        description: "User who created this entity"

schemas/donor.yaml -- Donor Entity

id: https://example.org/donor
name: donor
prefixes:
  linkml: https://w3id.org/linkml/
  donor: https://example.org/donor/
imports:
  - linkml:types
  - base
default_range: string

enums:
  SexType:
    permissible_values:
      M:
      F:
      Unknown:

classes:
  Donor:
    is_a: BiologicalEntity
    description: "Human donor information"
    attributes:
      donor_id:
        range: string
        required: true
        identifier: true
        annotations:
          hippo_index: true
        description: "Internal donor identifier"
      species:
        range: string
        required: true
        ifabsent: "string(Homo sapiens)"
        description: "Species of the donor"
      sex:
        range: SexType
        required: true
        description: "Biological sex"
      date_of_birth:
        range: date
        description: "Date of birth"
      ethnicity:
        range: string
        description: "Ethnicity information"
      consent_obtained:
        range: boolean
        ifabsent: "false"
        description: "Whether donor consent is on file"
      consent_date:
        range: date
        description: "Date consent was obtained"
      diagnoses:
        range: string
        multivalued: true
        description: "List of diagnoses"
      medical_record_number:
        range: string
        annotations:
          hippo_index: true
        description: "Hospital MRN"
      notes:
        range: string
        annotations:
          hippo_search: fts5
        description: "Free-text notes for searching"

schemas/sample.yaml -- Sample Entity

id: https://example.org/sample
name: sample
prefixes:
  linkml: https://w3id.org/linkml/
  sample: https://example.org/sample/
imports:
  - linkml:types
  - base
  - donor
default_range: string

enums:
  SampleType:
    permissible_values:
      tissue:
      blood:
      csf:
      cell_line:

classes:
  Sample:
    is_a: BiologicalEntity
    description: "Biological sample from a donor"
    attributes:
      sample_id:
        range: string
        required: true
        identifier: true
        annotations:
          hippo_index: true
        description: "Internal sample identifier"
      donor:
        range: Donor
        required: true
        annotations:
          hippo_index: true
        description: "Reference to donor"
      sample_type:
        range: SampleType
        required: true
        description: "Type of biological sample"
      tissue_type:
        range: string
        description: "Tissue of origin"
      collection_date:
        range: datetime
        description: "Date/time of sample collection"
      quantity_ng:
        range: float
        description: "Sample quantity in nanograms"
      quality_score:
        range: float
        description: "Sample quality metric (0-1)"
      storage_location:
        range: string
        description: "Physical storage location"
      barcode:
        range: string
        annotations:
          hippo_index: true
        description: "Sample barcode"
    unique_keys:
      barcode_key:
        unique_key_slots:
          - barcode

validators.yaml

validators:
  - name: donor_id_format
    type: cel
    config:
      entity_types: [Donor]
      'on': [create, update]
      condition: 'entity.donor_id.matches("^DON-\\d{6}$")'
      error: "Donor {entity_id}: donor_id must match format DON-123456"

  - name: sample_quantity_positive
    type: cel
    config:
      entity_types: [Sample]
      'on': [create, update]
      condition: 'entity.quantity_ng > 0'
      error: "Sample {entity_id}: quantity_ng must be positive"

  - name: sample_with_donor
    type: cel
    config:
      entity_types: [Sample]
      'on': [create]
      condition: 'entity.donor != ""'
      error: "Sample {entity_id}: must have an associated donor"

Loading Configuration

from pathlib import Path

from hippo.config.loader import SchemaParser, load_hippo_config, load_schema

# Load the main configuration
config = load_hippo_config("config.json")

# Load a single schema file
schema = load_schema("schemas/donor.yaml")

# Load every schema in a directory
schema_dir = Path("schemas")
parser = SchemaParser(schema_dir=schema_dir)
schemas = parser.load_schema_dir(schema_dir)