Hippo Configuration Reference
Complete reference for configuring Hippo, the Metadata Tracking Service (MTS) for the BASS platform.
Overview
Hippo uses two primary configuration files:
- config.json -- Main application configuration that specifies the schema path and storage settings
- Schema files (YAML or JSON) -- Define entity types, attributes, validators, and inheritance in LinkML format
Configuration is loaded via load_hippo_config() from config.json, and schemas are loaded via SchemaParser or load_schema().
HippoConfig
The main configuration model for the Hippo application. Loaded from config.json.
| Field | Type | Default | Description |
|---|---|---|---|
| schema_path | Path | (required) | Path to the schema directory or file containing entity definitions |
| storage_backend | str | None | Storage backend to use (e.g., "sqlite", "postgresql"). If None, uses default backend |
| database_url | str | None | Connection string for the database. For SQLite: hippo.db or absolute path |
| validation_enabled | bool | true | Enable validation of entity data against schema rules |
| validation_fail_fast | bool | true | Stop validation on first error. If false, collects all validation errors |
| write_path_validation_enabled | bool | true | Enable validation of write paths (entity location paths) |
| write_path_validation_timeout | float | None | Timeout in seconds for write path validation. None = no timeout |
| validators_path | Path | None | Path to custom validators file. If None, uses default location |
schema_path is required -- HippoConfig will not instantiate without it.
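The split between required and optional fields can be pictured with a small stand-in model. The sketch below is hypothetical: HippoConfigSketch and config_from_dict are illustrative names, not Hippo's actual classes or functions, and the code only mirrors the field names and defaults from the table to show why a config without schema_path is rejected.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class HippoConfigSketch:
    """Hypothetical stand-in mirroring the documented HippoConfig fields."""

    schema_path: Path  # no default, so it must always be supplied
    storage_backend: Optional[str] = None
    database_url: Optional[str] = None
    validation_enabled: bool = True
    validation_fail_fast: bool = True
    write_path_validation_enabled: bool = True
    write_path_validation_timeout: Optional[float] = None
    validators_path: Optional[Path] = None


def config_from_dict(raw: dict) -> HippoConfigSketch:
    """Build the sketch model from already-parsed config.json content."""
    if "schema_path" not in raw:
        raise ValueError("schema_path is required")
    values = dict(raw)
    values["schema_path"] = Path(values["schema_path"])
    if values.get("validators_path") is not None:
        values["validators_path"] = Path(values["validators_path"])
    return HippoConfigSketch(**values)


config = config_from_dict({"schema_path": "./schemas", "storage_backend": "sqlite"})
print(config.validation_fail_fast)  # True
```

Every field other than schema_path falls back to the default listed in the table when it is omitted.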
Example config.json
{
  "schema_path": "./schemas",
  "storage_backend": "sqlite",
  "database_url": "./data/hippo.db",
  "validation_enabled": true,
  "validation_fail_fast": true,
  "write_path_validation_enabled": true,
  "validators_path": "./validators.yaml"
}
Template config.json (from hippo init)
Running hippo init generates this basic template:
{
  "version": "0.1",
  "storage": {
    "type": "sqlite",
    "path": "hippo.db"
  },
  "schema": {
    "type": "linkml"
  }
}
The full template includes additional sections:
{
  "version": "0.1",
  "name": "hippo-project",
  "description": "Hippo Metadata Tracking Service Project",
  "storage": {
    "type": "sqlite",
    "path": "hippo.db"
  },
  "schema": {
    "type": "linkml",
    "path": "schemas"
  },
  "validation": {
    "enabled": true,
    "strict": false
  },
  "api": {
    "host": "127.0.0.1",
    "port": 8000
  }
}
LinkML Schema Format
Schema files use standard LinkML format. Classes and their attributes are defined in a dictionary structure with a required schema header. Both YAML and JSON are accepted.
Schema Header
LinkML schema files use the following top-level keys. The Required column shows which ones are mandatory:
| Key | Required | Description |
|---|---|---|
| id | Yes | Unique URI identifier for the schema |
| name | Yes | Short schema name (alphanumeric, underscores, dashes) |
| prefixes | Yes | Maps prefix names to IRI namespaces (must include linkml) |
| imports | Yes | List of imported schemas (almost always includes linkml:types) |
| default_range | Recommended | Default data type for attributes without explicit range (e.g., string) |
| classes | Core | Dictionary of class (entity type) definitions |
| enums | Core | Dictionary of enumeration definitions |
| slots | Optional | Dictionary of reusable slot definitions shared across classes |
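As a rough illustration of the header rules in the table, the sketch below checks an already-parsed schema dictionary for the four keys marked Yes and for the linkml prefix. The function name check_schema_header is hypothetical; Hippo's own parser may report these problems differently.

```python
REQUIRED_HEADER_KEYS = ("id", "name", "prefixes", "imports")


def check_schema_header(schema: dict) -> list[str]:
    """Return a list of problems found in the schema's top-level keys."""
    problems = [
        f"missing required key: {key}"
        for key in REQUIRED_HEADER_KEYS
        if key not in schema
    ]
    prefixes = schema.get("prefixes") or {}
    if "prefixes" in schema and "linkml" not in prefixes:
        problems.append("prefixes must include linkml")
    return problems


schema = {
    "id": "https://example.org/my-lab",
    "name": "my_lab",
    "prefixes": {"linkml": "https://w3id.org/linkml/"},
    "imports": ["linkml:types"],
    "default_range": "string",
}
print(check_schema_header(schema))  # []
```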
Schema Example
id: https://example.org/my-lab
name: my_lab
prefixes:
  linkml: https://w3id.org/linkml/
  my_lab: https://example.org/my-lab/
imports:
  - linkml:types
default_range: string
enums:
  SexType:
    permissible_values:
      M:
      F:
      Unknown:
  SampleType:
    permissible_values:
      tissue:
      blood:
      csf:
classes:
  Donor:
    description: "Human donor information"
    attributes:
      donor_id:
        range: string
        required: true
        identifier: true
        annotations:
          hippo_index: true
      species:
        range: string
        required: true
        ifabsent: "string(Homo sapiens)"
      sex:
        range: SexType
        required: true
      date_of_birth:
        range: date
      consent_obtained:
        range: boolean
        ifabsent: "false"
  Sample:
    description: "Biological sample from a donor"
    attributes:
      sample_id:
        range: string
        required: true
        identifier: true
        annotations:
          hippo_index: true
      donor:
        range: Donor
        required: true
        annotations:
          hippo_index: true
      sample_type:
        range: SampleType
        required: true
      quantity_ng:
        range: float
      notes:
        range: string
        annotations:
          hippo_search: fts5
Schema Inheritance
Schemas support inheritance via is_a:
classes:
  BiologicalEntity:
    description: "Base entity for all biological samples"
    abstract: true
    attributes:
      external_id:
        range: string
        identifier: true
  Donor:
    is_a: BiologicalEntity
    description: "Human donor information"
    attributes:
      donor_id:
        range: string
        required: true
      species:
        range: string
Mixins are supported for composing shared attribute sets:
classes:
  BrainEntity:
    mixin: true
    attributes:
      brain_region:
        range: string
  BrainSample:
    is_a: Sample
    mixins:
      - BrainEntity
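To make the effect of is_a and mixins concrete, the sketch below computes the full attribute set of a class from a parsed classes dictionary. It is an illustration only: resolved_attributes is a hypothetical helper, and the merge order used here (parent first, then mixins, then the class's own attributes) is an assumption rather than a statement about how Hippo or LinkML resolves conflicts.

```python
def resolved_attributes(classes: dict, class_name: str) -> dict:
    """Collect attributes from the is_a parent, then mixins, then the class itself."""
    definition = classes[class_name]
    merged: dict = {}
    parent = definition.get("is_a")
    if parent:
        merged.update(resolved_attributes(classes, parent))
    for mixin in definition.get("mixins", []):
        merged.update(resolved_attributes(classes, mixin))
    # The class's own attributes override anything inherited (assumed order).
    merged.update(definition.get("attributes") or {})
    return merged


classes = {
    "Sample": {"attributes": {"sample_id": {"range": "string", "identifier": True}}},
    "BrainEntity": {"mixin": True, "attributes": {"brain_region": {"range": "string"}}},
    "BrainSample": {"is_a": "Sample", "mixins": ["BrainEntity"]},
}
print(sorted(resolved_attributes(classes, "BrainSample")))
# ['brain_region', 'sample_id']
```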
Attribute Properties
Each attribute within a class supports these LinkML properties:
| Property | Type | Default | Description |
|---|---|---|---|
| range | str | schema's default_range | Data type: a built-in type, class name, or enum name |
| required | bool | false | Whether the attribute is required (non-null) |
| description | str | None | Human-readable description |
| ifabsent | str | None | Default value expression, e.g., string(pending), int(0), true |
| identifier | bool | false | Whether this attribute is the unique primary key |
| multivalued | bool | false | Whether the value is a list |
| pattern | str | None | Regex pattern for validation |
| minimum_value | number | None | Minimum numeric value (inclusive) |
| maximum_value | number | None | Maximum numeric value (inclusive) |
| inlined | bool | false | Whether class-range values are nested inline |
| inlined_as_list | bool | false | Inlined as a list rather than a dictionary |
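The constraint-style properties (required, multivalued, pattern, minimum_value, maximum_value) can be read as simple checks on a value. The sketch below shows one possible interpretation; check_value is a hypothetical helper, and using re.search for pattern is an assumption (anchor the pattern with ^ and $ if a full match is intended).

```python
import re


def check_value(name: str, spec: dict, value) -> list[str]:
    """Check one attribute value against a subset of the properties above."""
    if value is None:
        return [f"{name}: value is required"] if spec.get("required") else []
    # A multivalued attribute is checked item by item.
    values = value if spec.get("multivalued") else [value]
    errors = []
    for item in values:
        pattern = spec.get("pattern")
        if pattern is not None and not re.search(pattern, str(item)):
            errors.append(f"{name}: {item!r} does not match {pattern}")
        minimum = spec.get("minimum_value")
        if minimum is not None and item < minimum:
            errors.append(f"{name}: {item!r} is below {minimum}")
        maximum = spec.get("maximum_value")
        if maximum is not None and item > maximum:
            errors.append(f"{name}: {item!r} is above {maximum}")
    return errors


spec = {"range": "float", "minimum_value": 0, "maximum_value": 1}
print(check_value("quality_score", spec, 1.5))
# ['quality_score: 1.5 is above 1']
```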
Built-in Range Types
Available when you import linkml:types:
| Range | Description |
|---|---|
| string | Text data |
| integer | Integer numbers |
| float | Floating-point numbers |
| boolean | True/false values |
| date | Date (YYYY-MM-DD) |
| datetime | Date and time (ISO 8601) |
| uri | URI/URL string |
| uriorcurie | URI or compact URI (CURIE) |
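For readers handling raw text input, the sketch below maps a subset of these ranges to Python converters. It is an assumption-laden illustration: CONVERTERS and coerce are hypothetical names, uri and uriorcurie are left out, and Hippo's own type handling is not shown in this reference.

```python
from datetime import date, datetime

# Converters for a subset of the built-in ranges; keys are LinkML type names.
CONVERTERS = {
    "string": str,
    "integer": int,
    "float": float,
    "boolean": lambda text: {"true": True, "false": False}[text.strip().lower()],
    "date": date.fromisoformat,
    "datetime": datetime.fromisoformat,
}


def coerce(range_name: str, text: str):
    """Convert raw text to a Python value for the given built-in range."""
    converter = CONVERTERS.get(range_name)
    if converter is None:
        raise ValueError(f"unsupported range: {range_name}")
    try:
        return converter(text)
    except (KeyError, ValueError) as exc:
        raise ValueError(f"{text!r} is not a valid {range_name}") from exc


print(coerce("date", "2024-01-31"))  # 2024-01-31
print(coerce("boolean", "True"))     # True
```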
Hippo-Specific Annotations
Hippo extends standard LinkML with storage annotations. These are expressed under the annotations key on attributes:
| Annotation | Values | Description |
|---|---|---|
| hippo_index | true | Creates a B-tree database index on this attribute |
| hippo_index_partial | true | Creates a partial index (non-null values only) |
| hippo_search | fts, fts5, embedding | Enables full-text search indexing |
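To give a feel for what the two index annotations could translate to on a SQLite backend, the sketch below builds CREATE INDEX statements from an attributes dictionary and runs them against an in-memory database. Everything about the mapping is an assumption: the function index_statements, the idx_<table>_<column> naming, and the table layout are hypothetical and are not taken from Hippo's storage layer. hippo_search is deliberately left out.

```python
import sqlite3


def index_statements(table: str, attributes: dict) -> list[str]:
    """Build CREATE INDEX statements from hippo_index annotations."""
    statements = []
    for name, spec in attributes.items():
        annotations = spec.get("annotations") or {}
        if annotations.get("hippo_index_partial"):
            # Partial index: only rows with a non-null value are indexed.
            statements.append(
                f'CREATE INDEX "idx_{table}_{name}" ON "{table}" ("{name}") '
                f'WHERE "{name}" IS NOT NULL'
            )
        elif annotations.get("hippo_index"):
            statements.append(
                f'CREATE INDEX "idx_{table}_{name}" ON "{table}" ("{name}")'
            )
    return statements


attributes = {
    "sample_id": {"range": "string", "annotations": {"hippo_index": True}},
    "barcode": {"range": "string", "annotations": {"hippo_index_partial": True}},
    "notes": {"range": "string"},
}
connection = sqlite3.connect(":memory:")
connection.execute(
    'CREATE TABLE "sample" ("sample_id" TEXT, "barcode" TEXT, "notes" TEXT)'
)
for statement in index_statements("sample", attributes):
    connection.execute(statement)
```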
References (Entity Links)
Declare that an attribute points to another class by setting range to a class name, as the donor attribute of Sample does with range: Donor in the Schema Example.
When the range is a class, Hippo treats the attribute as an entity reference. Entity references are used by HippoClient.schema_references() and Cappella's collection resolver.
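The rule "a range that names a class is a reference" can be sketched as a scan over a parsed classes dictionary. The helper find_references below is hypothetical and is not the implementation behind HippoClient.schema_references(); it only illustrates the rule.

```python
def find_references(classes: dict) -> list[tuple[str, str, str]]:
    """List (source class, attribute, target class) for class-valued ranges."""
    references = []
    for class_name, definition in classes.items():
        for attr_name, spec in (definition.get("attributes") or {}).items():
            target = (spec or {}).get("range")
            if target in classes:
                references.append((class_name, attr_name, target))
    return references


classes = {
    "Donor": {"attributes": {"donor_id": {"range": "string", "identifier": True}}},
    "Sample": {
        "attributes": {
            "sample_id": {"range": "string", "identifier": True},
            "donor": {"range": "Donor", "required": True},
        }
    },
}
print(find_references(classes))  # [('Sample', 'donor', 'Donor')]
```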
Enum Definitions
Enums are defined at the schema top level under enums:. Each enum declares its allowed values under permissible_values:
enums:
  SexType:
    description: "Biological sex"
    permissible_values:
      M:
        description: "Male"
      F:
        description: "Female"
      Unknown:
        description: "Not specified"
Reference an enum from an attribute by setting range to the enum name, for example range: SexType on the sex attribute of Donor in the Schema Example.
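Checking a value against an enum then amounts to a membership test on permissible_values. A minimal sketch, with check_enum_value as a hypothetical helper operating on a parsed enums dictionary:

```python
def check_enum_value(enums: dict, enum_name: str, value: str) -> bool:
    """Return True when value is one of the enum's permissible_values."""
    permissible = enums[enum_name].get("permissible_values") or {}
    return value in permissible


enums = {
    "SexType": {
        "description": "Biological sex",
        "permissible_values": {
            "M": {"description": "Male"},
            "F": {"description": "Female"},
            "Unknown": {"description": "Not specified"},
        },
    }
}
print(check_enum_value(enums, "SexType", "F"))      # True
print(check_enum_value(enums, "SexType", "Other"))  # False
```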
ValidatorDefinition
Defines a validator applied to entity data. Validators are loaded from a separate validators file. The example below uses the CEL (Common Expression Language) engine:
validators:
  - name: sample_name_required
    type: cel
    enabled: true
    priority: 0
    config:
      entity_types: [Sample]
      'on': [create, update]
      condition: 'entity.name != ""'
      error: "Sample {entity_id}: name is required"
| Field | Type | Default | Description |
|---|---|---|---|
| name | str | (required) | Unique name for this validator |
| type | str | (required) | Validator type (e.g., "cel" for CEL expressions) |
| enabled | bool | true | Whether this validator is active |
| priority | int | 0 | Execution order (higher runs first). Negative for pre-schema validation |
| config | dict[str, Any] | None | Type-specific configuration |
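The interaction between enabled, priority, and the validation_fail_fast setting can be sketched as follows. This is a hypothetical illustration, not Hippo's validation engine: ValidatorSketch and run_validators are invented names, a plain Python callable stands in for the CEL condition, and filling {entity_id} with str.format is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ValidatorSketch:
    """Hypothetical stand-in for a loaded validator definition."""

    name: str
    type: str
    check: Callable[[dict], bool]  # stands in for the CEL condition
    error: str
    enabled: bool = True
    priority: int = 0
    config: dict[str, Any] = field(default_factory=dict)


def run_validators(validators: list, entity: dict, fail_fast: bool = True) -> list[str]:
    """Run enabled validators, highest priority first, and return error messages."""
    errors = []
    ordered = sorted(
        (v for v in validators if v.enabled),
        key=lambda v: v.priority,
        reverse=True,
    )
    for validator in ordered:
        if not validator.check(entity):
            errors.append(validator.error.format(entity_id=entity.get("id")))
            if fail_fast:
                break  # stop at the first failure
    return errors


validators = [
    ValidatorSketch("name_required", "cel", lambda e: e.get("name", "") != "",
                    "Sample {entity_id}: name is required"),
    ValidatorSketch("quantity_positive", "cel", lambda e: e.get("quantity_ng", 0) > 0,
                    "Sample {entity_id}: quantity_ng must be positive", priority=5),
]
entity = {"id": "S1", "name": "", "quantity_ng": -1}
print(run_validators(validators, entity, fail_fast=False))
# ['Sample S1: quantity_ng must be positive', 'Sample S1: name is required']
```

With fail_fast left at its default of True, only the first message is returned, which matches the documented meaning of validation_fail_fast.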
Complete Example
config.json
{
  "schema_path": "./schemas",
  "storage_backend": "sqlite",
  "database_url": "./data/hippo.db",
  "validation_enabled": true,
  "validation_fail_fast": false,
  "write_path_validation_enabled": true,
  "validators_path": "./validators.yaml"
}
schemas/base.yaml -- Base Entity
id: https://example.org/base
name: base
prefixes:
  linkml: https://w3id.org/linkml/
  base: https://example.org/base/
imports:
  - linkml:types
default_range: string
classes:
  BiologicalEntity:
    description: "Base entity for all biological samples"
    abstract: true
    attributes:
      external_id:
        range: string
        identifier: true
        description: "External system identifier"
      created_by:
        range: string
        description: "User who created this entity"
schemas/donor.yaml -- Donor Entity
id: https://example.org/donor
name: donor
prefixes:
  linkml: https://w3id.org/linkml/
  donor: https://example.org/donor/
imports:
  - linkml:types
  - base
default_range: string
enums:
  SexType:
    permissible_values:
      M:
      F:
      Unknown:
classes:
  Donor:
    is_a: BiologicalEntity
    description: "Human donor information"
    attributes:
      donor_id:
        range: string
        required: true
        identifier: true
        annotations:
          hippo_index: true
        description: "Internal donor identifier"
      species:
        range: string
        required: true
        ifabsent: "string(Homo sapiens)"
        description: "Species of the donor"
      sex:
        range: SexType
        required: true
        description: "Biological sex"
      date_of_birth:
        range: date
        description: "Date of birth"
      ethnicity:
        range: string
        description: "Ethnicity information"
      consent_obtained:
        range: boolean
        ifabsent: "false"
        description: "Whether donor consent is on file"
      consent_date:
        range: date
        description: "Date consent was obtained"
      diagnoses:
        range: string
        multivalued: true
        description: "List of diagnoses"
      medical_record_number:
        range: string
        annotations:
          hippo_index: true
        description: "Hospital MRN"
      notes:
        range: string
        annotations:
          hippo_search: fts5
        description: "Free-text notes for searching"
schemas/sample.yaml -- Sample Entity
id: https://example.org/sample
name: sample
prefixes:
  linkml: https://w3id.org/linkml/
  sample: https://example.org/sample/
imports:
  - linkml:types
  - base
  - donor
default_range: string
enums:
  SampleType:
    permissible_values:
      tissue:
      blood:
      csf:
      cell_line:
classes:
  Sample:
    is_a: BiologicalEntity
    description: "Biological sample from a donor"
    attributes:
      sample_id:
        range: string
        required: true
        identifier: true
        annotations:
          hippo_index: true
        description: "Internal sample identifier"
      donor:
        range: Donor
        required: true
        annotations:
          hippo_index: true
        description: "Reference to donor"
      sample_type:
        range: SampleType
        required: true
        description: "Type of biological sample"
      tissue_type:
        range: string
        description: "Tissue of origin"
      collection_date:
        range: datetime
        description: "Date/time of sample collection"
      quantity_ng:
        range: float
        description: "Sample quantity in nanograms"
      quality_score:
        range: float
        description: "Sample quality metric (0-1)"
      storage_location:
        range: string
        description: "Physical storage location"
      barcode:
        range: string
        annotations:
          hippo_index: true
        description: "Sample barcode"
    unique_keys:
      barcode_key:
        unique_key_slots:
          - barcode
validators.yaml
validators:
  - name: donor_id_format
    type: cel
    config:
      entity_types: [Donor]
      'on': [create, update]
      condition: 'entity.donor_id.matches("^DON-\\d{6}$")'
      error: "Donor {entity_id}: donor_id must match format DON-123456"
  - name: sample_quantity_positive
    type: cel
    config:
      entity_types: [Sample]
      'on': [create, update]
      condition: 'entity.quantity_ng > 0'
      error: "Sample {entity_id}: quantity_ng must be positive"
  - name: sample_with_donor
    type: cel
    config:
      entity_types: [Sample]
      'on': [create]
      condition: 'entity.donor != ""'
      error: "Sample {entity_id}: must have an associated donor"
Loading Configuration
from pathlib import Path

from hippo.config.loader import SchemaParser, load_hippo_config, load_schema

# Load main configuration
config = load_hippo_config("config.json")

# Load a schema file
schema = load_schema("schemas/donor.yaml")

# Load all schemas from a directory
parser = SchemaParser(schema_dir=Path("schemas"))
schemas = parser.load_schema_dir(Path("schemas"))
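Because schema_path may point at either a single file or a directory, a loader has to decide which files to read. The sketch below shows one way that decision could look. It is hypothetical and independent of SchemaParser: discover_schema_files is an invented helper, and the accepted suffixes (including .yml) are an assumption.

```python
from pathlib import Path

# Assumed set of schema file suffixes; .yml is included as a guess.
SCHEMA_SUFFIXES = {".yaml", ".yml", ".json"}


def discover_schema_files(schema_path: Path) -> list[Path]:
    """Return schema files for a path that is either one file or a directory."""
    if schema_path.is_file():
        return [schema_path]
    if schema_path.is_dir():
        return sorted(
            p for p in schema_path.iterdir()
            if p.is_file() and p.suffix.lower() in SCHEMA_SUFFIXES
        )
    raise FileNotFoundError(f"schema path does not exist: {schema_path}")
```

Sorting the result keeps the load order stable between runs, which helps when one schema imports another.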