Reference: validators.yaml

Reference: `validators.yaml` Format¶

Document status: Draft v0.1 Depends on: sec2_architecture.md §2.13, reference_cel_context.md

This document is the authoritative format reference for validators.yaml — the config-driven business rule validator file. See sec2 §2.13 for the validation architecture overview and execution model.

Top-Level Structure¶

validators:          # required top-level key; value is a list of validator entries
  - name: ...        # validator entries, evaluated in priority order
  - name: ...

The file must have exactly one top-level key: validators. Unknown top-level keys produce a ConfigError at startup.

Validator Entry Fields¶

Field	Type	Required	Default	Description
`name`	string	✅ yes	—	Unique identifier; used in error messages and logs. Must be unique across all validators (config + plugin).
`entity_types`	list[string] \| null	no	null	Entity types this validator applies to. `null` means all types. Subtype-aware: `[Sample]` covers `BrainSample`, `CellLine`, etc. Startup warning if both parent and subtype are listed.
`on`	list[string]	no	`[create, update, delete]`	Operations that trigger this validator. Valid values: `create`, `update`, `delete`.
`priority`	int	no	`0`	Execution order among config validators. Lower values run first. Schema validation (Tier 1) always runs at priority `-1` before any config validator. Plugin validators (Tier 3) run after all config validators regardless of their declared priority.
`when`	string (CEL)	no	null	Pre-condition expression. If evaluates to `false`, the validator is skipped entirely for this write. See reference_cel_context.md.
`expand`	list[ExpandEntry]	no	`[]`	Fields to pre-fetch before CEL evaluation. Required for any validator that needs to traverse relationships. See §Expand Entries.
`condition`	string (CEL)	no*	—	The validation condition. Must evaluate to `bool`. `true` = write allowed; `false` = write rejected. Required unless `requires` shorthand is used.
`requires`	list[PresetEntry]	no*	—	Built-in preset validators as ergonomic shortcuts. Expands internally to `condition` expressions. See §Built-in Presets. Either `condition` or `requires` (or both) must be present.
`error`	string	no	`"Validation failed: {name}"`	Error message returned to the caller on failure. Supports template variables: `{name}` (validator name), `{entity_type}`, `{entity_id}`.
`max_expand_list_size`	int	no	global default (200)	Per-validator override for the list expansion cap. Cannot exceed global hard cap of 1000.

* Either condition or requires must be present (or both).

Expand Entries¶

Each entry in the expand list specifies one field path to pre-fetch before CEL evaluation. Paths support dot notation and list traversal.

expand:
  - path: subject                  # simple ref field — fetch the referenced entity
  - path: subject.diagnosis_group  # nested ref — fetch subject, then its diagnosis_group
  - path: samples[]                # list ref — fetch all entities in the list field
  - path: samples[].tissue_region  # list traversal — fetch samples, then each tissue_region ref

Path syntax:

Syntax	Meaning
`field`	Fetch the entity referenced by `field` (must be type `ref`)
`field.child`	Fetch `field`, then fetch `child` from that entity
`field[]`	Fetch all entities referenced in `field` (must be type `json` list of refs or a list field)
`field[].child`	Fetch all entities in `field[]`, then fetch `child` from each

How expanded values appear in CEL context:

Expanded fields are merged into the entity map. After expanding subject: - entity.subject is the full subject entity map (all fields), not just the ref string - entity.subject.diagnosis accesses the subject's diagnosis field directly

After expanding samples[]: - entity.samples is a list of full entity maps (not ref strings) - entity.samples[0].tissue_type accesses the first sample's tissue_type

See reference_cel_context.md for full CEL context variable specification.

Batch fetch guarantee: All field[] expansions are fetched in a single query per list, not N individual lookups. Implementers must batch-fetch list expansions.

Cycle detection: The expand engine maintains a visited set keyed by "type:uuid". If a cycle is detected, expansion stops at that node and the validator receives the entity without further expansion (no error is raised; a debug log entry is written).

Built-in Presets¶

Presets are ergonomic shortcuts that expand to condition expressions internally. They are not separate code paths — they produce standard CEL conditions.

`ref_check`¶

Validates that a ref field points to an available entity of the expected type.

requires:
  - type: ref_check
    field: subject              # required — the ref field to check
    target_type: Subject        # optional — if set, also checks __type__ matches
    allow_unavailable: false    # optional; default: false — reject refs to unavailable entities

Equivalent condition (generated internally):

entity.subject != null && entity.subject.is_available == true

`count_constraint`¶

Validates a count constraint on a list or relationship.

requires:
  - type: count_constraint
    field: samples[]            # required — the list field or expand path
    min: 1                      # optional; default: no minimum
    max: 10                     # optional; default: no maximum

Equivalent condition: entity.samples.size() >= 1 && entity.samples.size() <= 10

`immutable_field`¶

Rejects updates that change a specific field value once set.

requires:
  - type: immutable_field
    field: external_id          # required — field that must not change after create
    allow_null_to_value: true   # optional; default: true — allow setting a null field

Equivalent condition (for updates): existing == null || existing.external_id == null || entity.external_id == existing.external_id

`field_required_if`¶

Makes a field required when a condition is met.

requires:
  - type: field_required_if
    field: post_mortem_interval_hours    # required — field that must be present
    when: "entity.__type__ == 'BrainSample'"  # required — condition under which field is required

Equivalent condition: !(entity.__type__ == 'BrainSample') || entity.post_mortem_interval_hours != null

`no_self_ref`¶

Rejects an entity that references itself (e.g. a derived_from self-reference that would create a trivial cycle).

requires:
  - type: no_self_ref
    field: parent_sample        # required — the ref field to check

Equivalent condition: entity.parent_sample == null || entity.parent_sample.id != entity.id

Full Example¶

validators:

  # Validate that a Sample's subject is present and available
  - name: sample_subject_available
    entity_types: [Sample]
    on: [create, update]
    expand:
      - path: subject
    condition: "entity.subject != null && entity.subject.is_available == true"
    error: "Sample {entity_id}: referenced subject is unavailable or missing"

  # Prevent changing external_id after it has been set
  - name: sample_external_id_immutable
    entity_types: [Sample]
    on: [update]
    requires:
      - type: immutable_field
        field: external_id
    error: "Sample {entity_id}: external_id is immutable once set"

  # BrainSample must have post_mortem_interval_hours
  - name: brain_sample_pmi_required
    entity_types: [BrainSample]
    on: [create]
    condition: "entity.post_mortem_interval_hours != null"
    error: "BrainSample {entity_id}: post_mortem_interval_hours is required"

  # A WorkflowRun must reference a Workflow that is available
  - name: workflow_run_valid_workflow
    entity_types: [WorkflowRun]
    on: [create]
    expand:
      - path: workflow
    requires:
      - type: ref_check
        field: workflow
        target_type: Workflow
    error: "WorkflowRun {entity_id}: referenced workflow is unavailable or missing"

  # A Dataset must contain at least one Datafile (checked on update; creates can be empty)
  - name: dataset_not_empty
    entity_types: [Dataset]
    on: [update]
    when: "entity.is_public == true"        # only enforce for public datasets
    expand:
      - path: datafiles[]
    requires:
      - type: count_constraint
        field: datafiles[]
        min: 1
    error: "Dataset {entity_id}: public datasets must contain at least one datafile"

Execution Semantics¶

All validators for a given entity type and operation are collected (config + plugin)
Schema validators (Tier 1) run first at implicit priority -1
Config validators run in ascending priority order (lower number = earlier)
For config validators with the same priority, order is stable (file order)
Plugin validators (Tier 3) run after all config validators, in their declared priority order
First failure stops execution and rolls back the transaction
when pre-condition is evaluated before expand — if when is false, no expansion occurs

Unknown Keys¶

Unknown keys at any level in validators.yaml produce a ConfigError at startup. This prevents silently-ignored typos in validator config.

Reference: validators.yaml