Canon Rules Reference: `canon_rules.yaml` and `.canon.yaml` Sidecars¶

Document status: Draft v0.1
Depends on: sec3_rules_dsl.md, sec3b_cwl_integration.md, sec2_architecture.md

Overview¶

Canon's knowledge of how to produce artifacts is declared in two YAML formats:

canon_rules.yaml — the rule registry: maps artifact specifications to CWL workflows. One file per Canon project, path configurable via rules_file in canon.yaml.
<workflow>.canon.yaml — per-workflow sidecar: maps CWL outputs to Hippo entities. One file per CWL workflow referenced in the rules.

Both formats are human-authored YAML, validated at Canon startup, and versioned in source control alongside the CWL workflow files.

Part 1: `canon_rules.yaml`¶

Top-Level Structure¶

rules:
  - <rule>
  - <rule>
  ...

rules is the only top-level key, and its value is a list of rule objects. There is no metadata header, version field, or any other top-level key. The list may be empty (Canon starts with no rules and reports no error), but a non-list value raises CanonRuleValidationError.

Rule Object¶

Each rule is a YAML mapping with the following fields:

Field	Type	Required	Default
`name`	`string`	Yes	—
`description`	`string`	No	—
`produces`	`mapping`	Yes	—
`requires`	`list[mapping]`	No	`[]`
`execute`	`mapping`	Yes	—

`name`¶

Type: string
Required: Yes

Unique identifier for this rule within the rules file. Used in error messages, canon rules list output, WorkflowRun.rule_name provenance records, and the canon rules validate --rule <name> command.

Convention: snake_case verb phrase describing the transformation.

name: trim_reads
name: build_star_index
name: align_reads
name: count_genes

Validation: duplicate rule names raise CanonRuleValidationError: duplicate rule name '{name}'.
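The two structural checks described so far (rules must be a list, and rule names must be unique) can be sketched as follows. This is a minimal illustration that assumes the file has already been parsed into a Python dict. The function name `check_rule_names` and the local `CanonRuleValidationError` class are stand-ins, not Canon's actual API, and the wording of the non-list message is assumed.

```python
class CanonRuleValidationError(Exception):
    """Local stand-in for Canon's validation error (illustrative only)."""


def check_rule_names(doc: dict) -> list:
    """Return rule names in file order; reject a non-list `rules` or a repeated name."""
    rules = doc["rules"]
    if not isinstance(rules, list):
        # Message wording is an assumption; the source only names the error class.
        raise CanonRuleValidationError("'rules' must be a list")
    seen = set()
    for rule in rules:
        name = rule["name"]
        if name in seen:
            raise CanonRuleValidationError(f"duplicate rule name '{name}'")
        seen.add(name)
    return [rule["name"] for rule in rules]
```

An empty list passes and yields no names, which matches the behaviour described for an empty rules file.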

`description`¶

Type: string
Required: No

Human-readable description of what this rule does. Shown in canon rules list output and error messages. Has no effect on rule matching.

description: Align trimmed reads to a reference genome using STAR

`produces`¶

Type: mapping
Required: Yes

Declares what artifact this rule produces. Used for rule matching: when Canon needs an entity of produces.entity_type with parameters matching produces.match, this rule is a candidate.

Field	Type	Required	Description
`entity_type`	`string`	Yes	Hippo entity type this rule produces
`match`	`mapping`	Yes	Parameter set that identifies this artifact

produces:
  entity_type: AlignmentFile
  match:
    sample: "{sample}"
    genome_build: "ref:GenomeBuild{name={genome_build}}"
    aligner: "ref:ToolVersion{tool.name=STAR, version={star_version}}"
    trimmer: "ref:ToolVersion{tool.name=cutadapt, version={cutadapt_version}}"
    quality_cutoff: "{quality_cutoff}"
    min_length: "{min_length}"

produces.entity_type

Must be a valid Hippo entity type name (PascalCase). Canon verifies at startup that this type exists in the connected Hippo instance's schema. Domain entity types (AlignmentFile, FastqFile, etc.) are defined in your Hippo schema configuration; Canon's own entity types (Tool, ToolVersion, etc.) are defined in the Canon reference schema.

produces.match

A mapping from parameter names to parameter values. Values may be:
- Scalar: a plain string, integer, float, or boolean
- Entity reference: ref:EntityType{...} (see Entity Reference Syntax)
- Wildcard: {name} (see Wildcards)
- Wildcard inside entity reference: ref:EntityType{field={wildcard}}

The match block is the complete parameter set stored on the produced Hippo entity. All parameters that should be queryable (including upstream provenance parameters) must appear here — they do not propagate automatically from upstream rules.

Validation: duplicate produces specs (same entity_type + same match) across rules raise CanonRuleValidationError: ambiguous produces.
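One way to detect duplicate produces specs is to reduce each spec to a canonical key and look for collisions. The sketch below is an assumption about how such a check could work, not Canon's implementation: it compares the match blocks literally, and `produces_key` and `check_unambiguous` are hypothetical names.

```python
import json


class CanonRuleValidationError(Exception):
    """Local stand-in for Canon's validation error (illustrative only)."""


def produces_key(rule: dict) -> str:
    """Canonical, key-order-independent string for a rule's produces spec."""
    produces = rule["produces"]
    # JSON keeps scalar types distinct: 20 and "20" serialize differently.
    return json.dumps([produces["entity_type"], produces["match"]], sort_keys=True)


def check_unambiguous(rules: list) -> None:
    """Raise if two rules declare the same entity_type and the same match block."""
    seen = set()
    for rule in rules:
        key = produces_key(rule)
        if key in seen:
            entity_type = rule["produces"]["entity_type"]
            raise CanonRuleValidationError(
                f"ambiguous produces: multiple rules can produce {entity_type} "
                "with these parameters"
            )
        seen.add(key)
```

Serializing with sorted keys means two rules that list the same parameters in a different order are still treated as duplicates.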

`requires`¶

Type: list[mapping]
Required: No
Default: []

A list of input artifacts that must be resolved before this rule can execute. Each entry is resolved by a recursive canon get call — it may REUSE an existing artifact or trigger a further BUILD.

Each entry in requires is a mapping with:

Field	Type	Required	Description
`bind`	`string`	Yes	Name for this input — used in `execute.inputs` expressions
`entity_type`	`string`	Yes	Hippo entity type to resolve
`match`	`mapping`	Yes	Parameters identifying the required input

requires:
  - bind: trimmed_fastq
    entity_type: TrimmedFastqFile
    match:
      sample: "{sample}"
      trimmer: "ref:ToolVersion{tool.name=cutadapt, version={cutadapt_version}}"
      quality_cutoff: "{quality_cutoff}"
      min_length: "{min_length}"
  - bind: genome_index
    entity_type: StarIndex
    match:
      genome_build: "ref:GenomeBuild{name={genome_build}}"
      aligner: "ref:ToolVersion{tool.name=STAR, version={star_version}}"

requires[].bind

A snake_case identifier. Bound input values are accessible in execute.inputs expressions as {bind_name.field}. For example, bind: trimmed_fastq → {trimmed_fastq.uri} in execute.inputs.

requires[].entity_type

Same semantics as produces.entity_type.

requires[].match

Same semantics as produces.match — scalars, entity references, and wildcards. Wildcards here must appear in produces.match (see Wildcard Propagation Rule).

Resolution order: required inputs are resolved sequentially in the order they appear in the list. In v0.1, Canon does not parallelize input resolution.
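The recursive REUSE-or-BUILD behaviour can be pictured with a heavily simplified sketch. Everything here is assumed for illustration: the registry is an in-memory dict rather than Hippo, rules are keyed directly by entity type, each requires entry lists the parameter names it needs under a hypothetical `params` key, and no workflow is actually run.

```python
class CanonNoRuleError(Exception):
    """Local stand-in: no rule produces the requested artifact."""


class CanonCycleError(Exception):
    """Local stand-in: a rule depends on itself, directly or transitively."""


def resolve(entity_type, params, rules, registry, log, _stack=()):
    """REUSE the artifact if it is registered, otherwise BUILD it after its inputs."""
    key = (entity_type, tuple(sorted(params.items())))
    if key in registry:
        log.append(("REUSE", entity_type))
        return registry[key]
    if key in _stack:
        raise CanonCycleError(f"circular dependency on {entity_type}")
    rule = rules.get(entity_type)
    if rule is None:
        raise CanonNoRuleError(f"no rule produces {entity_type}")
    # Inputs are resolved one at a time, in the order they are listed.
    for req in rule.get("requires", []):
        req_params = {name: params[name] for name in req["params"]}
        resolve(req["entity_type"], req_params, rules, registry, log, _stack + (key,))
    log.append(("BUILD", entity_type))
    registry[key] = f"{entity_type}:{len(registry)}"  # placeholder artifact id
    return registry[key]
```

Because each successful BUILD is written back to the registry, a second request for the same artifact takes the REUSE branch without touching its inputs.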

`execute`¶

Type: mapping
Required: Yes

Declares how to run the CWL workflow that produces this artifact.

Field	Type	Required	Description
`workflow`	`string`	Yes	Path to the CWL workflow file
`inputs`	`mapping`	Yes	CWL workflow input → value mappings

execute:
  workflow: workflows/star_align.cwl
  inputs:
    fastq: "{trimmed_fastq.uri}"
    genome_index: "{genome_index.uri}"
    genome_build: "{genome_build}"
    aligner: "{aligner}"
    quality_cutoff: "{quality_cutoff}"
    min_length: "{min_length}"
    sample_id: "{sample}"

execute.workflow

Path to the CWL workflow file. Relative paths are resolved relative to canon_rules.yaml. Must point to a CWL Workflow (not a bare CommandLineTool). A companion .canon.yaml sidecar must exist in the same directory.

workflow: workflows/star_align.cwl        # relative to canon_rules.yaml
workflow: /abs/path/to/star_align.cwl     # absolute path

Validation:
- File must exist at startup
- Must be a valid YAML file
- cwlVersion must be v1.2
- class must be Workflow
- A .canon.yaml sidecar must exist alongside it

execute.inputs

Maps CWL workflow input names to values. Keys are the inputs: identifiers declared in the CWL workflow file. Every input declared in the CWL workflow must have a corresponding key here (Canon validates this at startup).

Values are input value expressions.
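The coverage requirement amounts to a set comparison between the workflow's declared inputs and the keys of execute.inputs. A minimal sketch, assuming the CWL input identifiers have already been extracted into a list; the helper name `missing_cwl_inputs` is hypothetical:

```python
def missing_cwl_inputs(cwl_inputs: list, execute_inputs: dict, rule_name: str) -> list:
    """Return one error message per CWL input that has no key in execute.inputs."""
    return [
        f"CWL workflow input '{name}' has no mapping in execute.inputs of rule '{rule_name}'"
        for name in cwl_inputs
        if name not in execute_inputs
    ]
```

Returning a list rather than raising on the first gap fits a validator that reports all problems together.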

Part 2: Parameter Types¶

Scalar Parameters¶

Plain YAML values — strings, integers, floats, or booleans:

quality_cutoff: 20
min_length: 30
strand_specific: true
max_intron_length: 500000
mode: "paired"

Scalars are matched exactly against Hippo entity fields. They are stored as-is on the produced Hippo entity. YAML type is preserved (integer 20 ≠ string "20").

Entity References¶

Pointers to Hippo entities. An entity reference is resolved to a Hippo UUID before any lookup or execution.

Syntax:

ref:<EntityType>{<field>=<value>[, <field>=<value>...]}

genome_build: "ref:GenomeBuild{name=GRCh38}"
aligner: "ref:ToolVersion{tool.name=STAR, version=2.7.11a}"
annotation: "ref:GeneAnnotation{source=GENCODE, version=43}"
sample: "ref:Sample{id=AD001}"

Entity reference strings must be quoted in YAML (the { and } characters require quoting).

See: Entity Reference Syntax for the full reference.

Wildcards¶

Parameters whose values are supplied at resolution time (not hard-coded in the rule). Written as {name} where name is the wildcard identifier.

sample: "{sample}"
quality_cutoff: "{quality_cutoff}"
genome_build: "{genome_build}"    # plain wildcard — scalar value

Wildcard names are snake_case. The same name in produces.match and requires[].match creates a binding — the value is automatically threaded through from the top-level request.

Wildcards may also appear inside entity reference expressions:

aligner: "ref:ToolVersion{tool.name=STAR, version={star_version}}"
genome_build: "ref:GenomeBuild{name={genome_build}}"

See: Wildcard Propagation Rule for the mandatory propagation requirement.

Part 3: Entity Reference Syntax¶

Full Syntax¶

ref:<EntityType>{<field>=<value>[, <field>=<value>...]}

<EntityType> — PascalCase Hippo entity type name
<field>=<value> — comma-separated field constraints
Field names are case-sensitive
Values are literal strings (no quoting needed inside the braces)
Whitespace around = and , is ignored
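The syntax above is simple enough to parse with a regular expression and a split on commas. The following is a sketch under stated assumptions, not Canon's parser: `parse_entity_ref` is a hypothetical name, values containing commas are not supported, and malformed input raises a plain ValueError.

```python
import re

# ref:<EntityType>{...}; the greedy body match keeps a trailing wildcard's closing brace.
REF_PATTERN = re.compile(r"^ref:([A-Z][A-Za-z0-9]*)\{(.*)\}$")


def parse_entity_ref(text: str) -> tuple:
    """Split an entity reference into (entity_type, {field: value})."""
    match = REF_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not an entity reference: {text!r}")
    entity_type, body = match.groups()
    constraints = {}
    for part in body.split(","):
        field, sep, value = part.partition("=")
        if not sep or not field.strip():
            raise ValueError(f"malformed field constraint: {part!r}")
        # Whitespace around '=' and ',' is ignored; field names stay case-sensitive.
        constraints[field.strip()] = value.strip()
    return entity_type, constraints
```

Dotted field names such as tool.name are kept as plain keys here; interpreting them as traversals is a separate step.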

Dot-Notation Field Traversal¶

Dot notation traverses reference fields to access fields on a related entity:

# Follow the 'tool' reference field on ToolVersion, then match 'name' on Tool
aligner: "ref:ToolVersion{tool.name=STAR, version=2.7.11a}"

# Follow annotation → GeneAnnotation, then match source and version
gtf: "ref:GeneAnnotationFile{annotation.source=GENCODE, annotation.version=43}"

tool.name=STAR is implemented as a Hippo JOIN query: find ToolVersion entities whose tool reference field points to a Tool entity with name=STAR. This is not a client-side filter.

Maximum traversal depth: 3 levels. Deeper paths raise CanonResolutionError: dot-notation traversal exceeds maximum depth (3).

Wildcards Inside Entity References¶

Field values inside a ref: expression may be wildcards:

# genome_build wildcard inside an entity ref field
genome_build: "ref:GenomeBuild{name={genome_build}}"

# Multiple wildcards in one ref
aligner: "ref:ToolVersion{tool.name={aligner_name}, version={aligner_version}}"

# Mixed literal and wildcard fields
aligner: "ref:ToolVersion{tool.name=STAR, version={star_version}}"

Wildcards inside refs are substituted from the request spec before the entity lookup. If the wildcard is not bound, Canon raises CanonPlanningError: unbound wildcard '{name}'.
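Substitution before lookup can be sketched as a regular-expression replace over the reference string. This is an illustrative assumption about the mechanics: `substitute_wildcards` is a hypothetical helper, the `CanonPlanningError` class is a local stand-in, and the exact message format (with or without braces around the name) is a guess.

```python
import re


class CanonPlanningError(Exception):
    """Local stand-in for Canon's planning error (illustrative only)."""


# A wildcard is a lone snake_case identifier in braces, e.g. {star_version}.
WILDCARD = re.compile(r"\{([a-z_][a-z0-9_]*)\}")


def substitute_wildcards(value: str, params: dict) -> str:
    """Replace each {name} in value with the bound request parameter."""
    def replace(match):
        name = match.group(1)
        if name not in params:
            raise CanonPlanningError(f"unbound wildcard '{name}'")
        return str(params[name])

    return WILDCARD.sub(replace, value)
```

The outer braces of the ref: expression are left alone because their content (for example tool.name=STAR) is not a bare identifier.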

Resolution Rules¶

All literal field values must match exactly (case-sensitive, exact string)
Wildcard fields are substituted from the request spec before resolution
Dot-notation traversal is implemented as Hippo JOIN queries, not client-side filters
Zero matches → CanonResolutionError: no {EntityType} entity found matching {constraints}
Multiple matches → CanonResolutionError: ambiguous reference — {n} {EntityType} entities match {constraints}. Provide additional fields to disambiguate
The resolved UUID is used for all subsequent Hippo queries and stored on produced entities
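The zero-match and multiple-match rules can be illustrated against an in-memory list of entities. This sketch is an assumption for illustration only: Canon queries Hippo rather than a Python list, dot-notation traversal is not handled, the entity layout (`uuid`, `type`, `fields`) is invented, and the error messages are abbreviated.

```python
class CanonResolutionError(Exception):
    """Local stand-in for Canon's resolution error (illustrative only)."""


def resolve_ref(entity_type: str, constraints: dict, entities: list) -> str:
    """Return the UUID of the single entity matching every constraint exactly."""
    matches = [
        entity for entity in entities
        if entity["type"] == entity_type
        and all(
            field in entity["fields"] and str(entity["fields"][field]) == value
            for field, value in constraints.items()
        )
    ]
    shown = ", ".join(f"{field}={value}" for field, value in constraints.items())
    if not matches:
        raise CanonResolutionError(f"no {entity_type} entity found matching {shown}")
    if len(matches) > 1:
        # Abbreviated message; the full wording is given in the rules above.
        raise CanonResolutionError(f"{len(matches)} {entity_type} entities match {shown}")
    return matches[0]["uuid"]
```

Adding a further field constraint is how a caller narrows an ambiguous reference down to one entity.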

Part 4: Wildcards¶

Wildcard Syntax¶

{name} — a single identifier in curly braces. No spaces inside the braces.

sample: "{sample}"
quality_cutoff: "{quality_cutoff}"
star_version: "{star_version}"

Wildcard names are snake_case. They are bound when Canon evaluates a request — the value comes from the --param arguments to canon get or the parameters passed programmatically.

Wildcard Propagation Rule¶

Any wildcard that appears in a requires[].match block must also appear in produces.match.

This rule enforces complete provenance declaration. If a requires input is parameterized by {quality_cutoff}, the produced artifact must also declare quality_cutoff in its identity — otherwise the produced entity's metadata would silently omit an upstream parameter that distinguishes it from other artifacts.

# CORRECT — quality_cutoff appears in both produces.match and requires[].match
produces:
  entity_type: AlignmentFile
  match:
    sample: "{sample}"
    quality_cutoff: "{quality_cutoff}"   # ← declared here

requires:
  - bind: trimmed_fastq
    entity_type: TrimmedFastqFile
    match:
      sample: "{sample}"
      quality_cutoff: "{quality_cutoff}" # ← used here — valid because declared in produces

# INVALID — quality_cutoff used in requires but missing from produces
produces:
  entity_type: AlignmentFile
  match:
    sample: "{sample}"
    # quality_cutoff is NOT here — startup error

requires:
  - bind: trimmed_fastq
    entity_type: TrimmedFastqFile
    match:
      sample: "{sample}"
      quality_cutoff: "{quality_cutoff}"  # ← unpropagated wildcard

Violation: CanonRuleValidationError: unpropagated wildcard '{quality_cutoff}' in rule '{name}' requires[0].match — must appear in produces.match
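The propagation rule reduces to a set difference between the wildcards used in each requires entry and those declared in produces.match. A minimal sketch, assuming the rule has been parsed into a dict; `wildcards_in` and `unpropagated_wildcards` are hypothetical helper names, and the result is returned as (requires index, wildcard name) pairs rather than formatted messages.

```python
import re

# A wildcard is a lone snake_case identifier in braces, also when nested inside a ref.
WILDCARD = re.compile(r"\{([a-z_][a-z0-9_]*)\}")


def wildcards_in(match: dict) -> set:
    """Collect every wildcard name used in a match block."""
    names = set()
    for value in match.values():
        if isinstance(value, str):
            names.update(WILDCARD.findall(value))
    return names


def unpropagated_wildcards(rule: dict) -> list:
    """Return (requires_index, wildcard_name) for each wildcard missing from produces.match."""
    declared = wildcards_in(rule["produces"]["match"])
    problems = []
    for index, req in enumerate(rule.get("requires", [])):
        for name in sorted(wildcards_in(req["match"]) - declared):
            problems.append((index, name))
    return problems
```

Applied to the two examples above, the CORRECT rule yields an empty list and the INVALID rule yields a single entry for quality_cutoff at requires index 0.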

Part 5: Input Value Expressions¶

The execute.inputs mapping uses expressions to compute the concrete values passed to inputs.json for the CWL workflow. Expressions are snake_case identifiers in curly braces.

Expression Types¶

Bound input field — access a field on a resolved required input entity:

inputs:
  fastq: "{trimmed_fastq.uri}"          # uri field on TrimmedFastqFile entity
  read_count: "{raw_fastq.read_count}"  # scalar field on resolved entity
  genome_index: "{genome_index.uri}"    # uri field on StarIndex entity

The part before . must be a bind name from requires. The part after . is a field name on the resolved Hippo entity (typically uri, but any entity field is valid).

Scalar wildcard — pass a wildcard value directly to CWL:

inputs:
  quality_cutoff: "{quality_cutoff}"
  strand_specific: "{strand_specific}"
  min_length: "{min_length}"

Entity reference field — access a field on a resolved entity reference from produces.match:

inputs:
  genome_name: "{genome_build.name}"    # genome_build was ref:GenomeBuild{name={genome_build}}
  aligner_version: "{aligner.version}"  # aligner was ref:ToolVersion{...}
  sample_id: "{sample.id}"              # sample was ref:Sample{id={sample}}

These expressions follow the reference to the resolved Hippo entity and extract the named field.

Static value — a literal string or number with no {...} expression:

inputs:
  output_format: "BAM"
  sort_order: "coordinate"
  threads: 8

Expression Evaluation Order¶

1. Wildcards are bound from the request spec
2. Entity references in produces.match and requires[].match are resolved to UUIDs
3. Required inputs are resolved (recursive canon get) → bound by bind name
4. execute.inputs expressions are evaluated left-to-right
5. The resulting dict is written to inputs.json and passed to the CWL executor
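The final evaluation step can be sketched as a single pass over the execute.inputs mapping once wildcards and bound entities are available. This is a simplified illustration, not Canon's evaluator: `evaluate_inputs` is a hypothetical name, resolved entities are plain dicts, and resolved entity references from produces.match are assumed to be looked up the same way as bound inputs.

```python
import re


class CanonRuleValidationError(Exception):
    """Local stand-in for Canon's validation error (illustrative only)."""


# {name} or {name.field}, filling the whole value.
EXPR = re.compile(r"^\{([a-z_][a-z0-9_]*)(?:\.([a-z_][a-z0-9_]*))?\}$")


def evaluate_inputs(inputs: dict, wildcards: dict, bound: dict) -> dict:
    """Turn execute.inputs expressions into the concrete inputs.json dict."""
    result = {}
    for key, expr in inputs.items():
        match = EXPR.match(expr) if isinstance(expr, str) else None
        if match is None:
            result[key] = expr  # static value, stored unchanged
            continue
        name, field = match.groups()
        if field is not None and name in bound:
            result[key] = bound[name][field]  # field on a resolved entity
        elif field is None and name in wildcards:
            result[key] = wildcards[name]  # scalar wildcard keeps its type
        else:
            raise CanonRuleValidationError(f"unknown binding '{name}'")
    return result
```

The returned dict could then be serialized with json.dumps to produce inputs.json.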

CWL Type Mapping¶

Canon maps bound values to CWL input types as follows:

Expression result	CWL type	`inputs.json` representation
Hippo entity `uri` (file URI, S3 URI, DRS URI)	`File`	`{"class": "File", "location": "<uri>"}`
Hippo entity `uri` (directory URI)	`Directory`	`{"class": "Directory", "location": "<uri>"}`
Scalar string	`string`	`"value"`
Scalar integer	`int`	`20`
Scalar float	`float`	`0.05`
Scalar boolean	`boolean`	`true`
Hippo entity UUID (passthrough)	`string`	`"uuid:..."`

Canon infers File vs. Directory from the CWL workflow's declared input type for that parameter. The workflow's input declaration is authoritative.
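The table can be read as a small conversion function keyed on the workflow's declared input type. A sketch under that reading; `to_cwl_value` is a hypothetical helper, and the declared type is assumed to be available as a plain string.

```python
import json


def to_cwl_value(value, cwl_type: str):
    """Wrap URIs for File/Directory inputs; pass scalars through unchanged."""
    if cwl_type in ("File", "Directory"):
        # The workflow's declared type decides which class is emitted.
        return {"class": cwl_type, "location": value}
    return value


example = {
    "fastq": to_cwl_value("s3://bucket/AD001.trimmed.fq.gz", "File"),
    "genome_index": to_cwl_value("s3://bucket/star_index/", "Directory"),
    "quality_cutoff": to_cwl_value(20, "int"),
    "strand_specific": to_cwl_value(True, "boolean"),
}
inputs_json = json.dumps(example, indent=2)
```

The bucket paths are made-up placeholders; only the shape of the resulting JSON follows the table.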

Validation: an expression that references a name that is neither a wildcard nor a requires bind name raises CanonRuleValidationError: unknown binding '{name}' in execute.inputs — '{name}' is not a wildcard or requires bind name

Part 6: `.canon.yaml` Sidecar Format¶

Every CWL workflow referenced in canon_rules.yaml must have a companion .canon.yaml sidecar file in the same directory. The sidecar declares which Hippo entities the CWL workflow's outputs map to and how CWL output values become Hippo entity fields.

The sidecar keeps CWL files entirely standard — no Canon-specific extensions to the CWL format itself.

Sidecar File Naming and Location¶

The sidecar must be named by replacing .cwl with .canon.yaml:

workflows/
  star_align.cwl           → star_align.canon.yaml (required)
  cutadapt.cwl             → cutadapt.canon.yaml   (required)
  htseq_count.cwl          → htseq_count.canon.yaml (required)
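The naming rule can be expressed as a suffix swap on the workflow path. A minimal sketch using the standard library; `sidecar_path` is a hypothetical helper name, and PurePosixPath is used only so the result looks the same on every platform.

```python
from pathlib import PurePosixPath


def sidecar_path(workflow: str) -> str:
    """Derive the sidecar path by replacing the .cwl suffix with .canon.yaml."""
    path = PurePosixPath(workflow)
    if path.suffix != ".cwl":
        raise ValueError(f"not a .cwl workflow path: {workflow}")
    return str(path.with_suffix(".canon.yaml"))
```

For example, `sidecar_path("workflows/star_align.cwl")` gives `workflows/star_align.canon.yaml`, in the same directory as the workflow.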

Top-Level Structure¶

outputs:
  <cwl_output_name>:
    entity_type: <string>
    identity_fields:
      - <field>
    hippo_fields:
      <hippo_field>: <expression>
    optional: <bool>

The only top-level key is outputs. It is a mapping from CWL output names (as declared in the CWL workflow's outputs: block) to output descriptor objects.

`outputs.<name>.entity_type`¶

Type: string
Required: Yes

The Hippo entity type that this CWL output maps to. Must be a valid Hippo entity type name.

outputs:
  bam:
    entity_type: AlignmentFile

`outputs.<name>.identity_fields`¶

Type: list[string]
Required: Yes

The subset of hippo_fields that uniquely identify this artifact. These fields are used by Canon for the registry lookup query in Phase 2 of the resolution algorithm. Every field listed here must appear in hippo_fields.

Identity fields should match the parameters in the corresponding rule's produces.match block.

identity_fields:
  - sample
  - genome_build
  - aligner
  - quality_cutoff
  - min_length

Validation: every identity_field must be present in hippo_fields — CanonRuleValidationError: identity_field '{field}' is not declared in hippo_fields.

`outputs.<name>.hippo_fields`¶

Type: mapping
Required: Yes

Maps Hippo entity field names to expressions that compute their values from the CWL execution context. See Sidecar Expression Syntax below.

hippo_fields:
  uri: "{outputs.bam.location}"
  file_size_bytes: "{outputs.bam.size}"
  checksum_sha1: "{outputs.bam.checksum}"
  sample: "{inputs.sample}"
  genome_build: "{inputs.genome_build}"
  aligner: "{inputs.aligner}"
  quality_cutoff: "{inputs.quality_cutoff}"
  min_length: "{inputs.min_length}"

`outputs.<name>.optional`¶

Type: boolean
Required: No
Default: false

If true, a missing output (CWL output is null or absent) is not an error — Canon skips ingestion for this output. Use this for outputs that some workflow runs produce but others do not (e.g. BAM index files that are only generated when the aligner produces a sorted BAM).

bam_index:
  entity_type: AlignmentIndex
  optional: true
  ...

Validation: at least one non-optional output must be declared per sidecar — CanonRuleValidationError: sidecar has no required outputs.
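The two sidecar checks described in this part (identity fields must be declared in hippo_fields, and at least one output must be required) can be sketched together. This assumes the sidecar has been parsed into a dict; `validate_sidecar` is a hypothetical name, and the function collects messages instead of raising so that several problems can be reported at once.

```python
def validate_sidecar(sidecar: dict, path: str) -> list:
    """Return validation messages for one parsed sidecar; an empty list means it passed."""
    errors = []
    outputs = sidecar["outputs"]
    for spec in outputs.values():
        for field in spec["identity_fields"]:
            if field not in spec["hippo_fields"]:
                errors.append(f"identity_field '{field}' is not declared in hippo_fields")
    # `optional` defaults to false, so an output is required unless it says otherwise.
    if all(spec.get("optional", False) for spec in outputs.values()):
        errors.append(f"sidecar has no required outputs: {path}")
    return errors
```

A sidecar with one required output and any number of optional ones passes the second check.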

Sidecar Expression Syntax¶

All expressions in hippo_fields are strings containing {...} placeholders:

Expression	Source	Description
`{outputs.<name>.location}`	CWL output	File URI from the CWL output object, after relocation to `output_storage`
`{outputs.<name>.checksum}`	CWL output	SHA1 checksum from the CWL output object (`sha1:deadbeef...`)
`{outputs.<name>.size}`	CWL output	File size in bytes (integer)
`{outputs.<name>.<key>}`	CWL output	Field from a CWL record output
`{outputs.<name>.entity_id}`	Ingestion	UUID of the Hippo entity created for another output in this sidecar (for cross-output references)
`{inputs.<name>}`	CWL inputs	Value that was passed to the CWL workflow — for entity ref fields, this is the Hippo UUID

Notes on {outputs.<name>.location}: Canon relocates output files from the cwltool work directory to output_storage before evaluating this expression. The value is the final storage URI (s3://... or file://...), not the temporary cwltool path.

Notes on {inputs.<name>}: the value is whatever was passed to inputs.json. For entity reference fields (genome_build, aligner, sample, etc.), this is the Hippo UUID. For scalar fields, this is the scalar value. Storing the UUID — not the name — is correct: UUIDs are stable identifiers that can be queried against.

Literal values — strings without any {...} placeholders are stored as-is.
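Evaluating hippo_fields comes down to looking each placeholder up in either the CWL outputs or the CWL inputs. The sketch below is an assumption for illustration: it handles only values that consist of a single placeholder, `evaluate_hippo_fields` is a hypothetical name, and the outputs dict is assumed to already hold relocated locations (and an entity_id entry where a cross-output reference is used).

```python
import re

# {outputs.<name>.<key>} or {inputs.<name>}, filling the whole value.
PLACEHOLDER = re.compile(r"^\{(outputs|inputs)\.([A-Za-z0-9_.]+)\}$")


def evaluate_hippo_fields(hippo_fields: dict, outputs: dict, inputs: dict) -> dict:
    """Compute the Hippo entity fields for one CWL output."""
    record = {}
    for field, expr in hippo_fields.items():
        match = PLACEHOLDER.match(expr) if isinstance(expr, str) else None
        if match is None:
            record[field] = expr  # literal value, stored as-is
        elif match.group(1) == "inputs":
            record[field] = inputs[match.group(2)]
        else:
            name, _, key = match.group(2).partition(".")
            record[field] = outputs[name][key]
    return record
```

Scalar inputs keep their original type in the resulting record, and entity reference inputs arrive as whatever identifier was written to inputs.json.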

Complete Sidecar Example¶

# star_align.canon.yaml

outputs:
  bam:
    entity_type: AlignmentFile
    identity_fields:
      - sample
      - genome_build
      - aligner
      - quality_cutoff
      - min_length
    hippo_fields:
      uri: "{outputs.bam.location}"
      file_size_bytes: "{outputs.bam.size}"
      checksum_sha1: "{outputs.bam.checksum}"
      sample: "{inputs.sample}"
      genome_build: "{inputs.genome_build}"
      aligner: "{inputs.aligner}"
      quality_cutoff: "{inputs.quality_cutoff}"
      min_length: "{inputs.min_length}"

  bam_index:
    entity_type: AlignmentIndex
    identity_fields:
      - alignment
    hippo_fields:
      uri: "{outputs.bam_index.location}"
      alignment: "{outputs.bam.entity_id}"    # UUID of the AlignmentFile just ingested
    optional: true

Part 7: Complete Validation Table¶

Canon validates all rules and sidecars at startup before accepting any requests. All errors are collected and reported together.

Rule Validation (`CanonRuleValidationError`)¶

Check	Condition	Error message
Duplicate rule name	Two rules have the same `name`	`duplicate rule name '{name}'`
Duplicate produces spec	Two rules have the same `entity_type` + `match` after resolving fixed params	`ambiguous produces: multiple rules can produce {entity_type} with these parameters`
Workflow file not found	`execute.workflow` path does not exist	`workflow not found: {path}`
Workflow not valid YAML	CWL file is not parseable	`workflow is not valid YAML: {path}`
Wrong CWL version	`cwlVersion != v1.2`	`CWL version {v} is not supported — Canon requires v1.2`
Not a Workflow	CWL file `class` is `CommandLineTool` or other	`Canon rules must reference a CWL Workflow, not {class}: {path}`
Sidecar not found	No `.canon.yaml` alongside the CWL file	`sidecar not found: {path}.canon.yaml`
Sidecar output not in CWL	A sidecar output name is not in the CWL `outputs:` block	`unknown CWL output '{name}' in sidecar {sidecar_path}`
Missing identity_field	An `identity_field` is not in `hippo_fields`	`identity_field '{field}' is not declared in hippo_fields`
No required outputs	All sidecar outputs are `optional: true`	`sidecar has no required outputs: {path}`
Unpropagated wildcard	A wildcard in `requires[].match` is absent from `produces.match`	`unpropagated wildcard '{name}' in rule '{rule}' requires[{i}].match — must appear in produces.match`
Tool ref without version	An entity ref to `ToolVersion` lacks a `version` field (or `version` is itself a wildcard on a non-wildcard produces spec)	`tool version required in rule '{rule}' — entity ref to ToolVersion must include a version field`
Unknown binding in inputs	`execute.inputs` expression references a `{bind.field}` where `bind` is not in `requires`	`unknown binding '{bind}' in execute.inputs of rule '{rule}'`
Missing CWL input	A CWL workflow input has no corresponding key in `execute.inputs`	`CWL workflow input '{input}' has no mapping in execute.inputs of rule '{rule}'`

Startup / Configuration (`CanonConfigError`)¶

Check	Error
Canon entity types missing from Hippo	`Canon entity types not found in Hippo schema. Run: hippo reference install canon`
Executor adapter not found	`executor '{name}' not found. Available adapters: {list}`
Executor binary unavailable	`executor '{name}' is configured but cwltool is not installed or not on PATH`

Runtime Rule Errors¶

These raise during resolution, not at startup:

Error class	Condition
`CanonCycleError`	Circular rule dependency: rule A requires an artifact that requires the same rule A (directly or transitively)
`CanonNoRuleError`	No rule can produce the requested `entity_type` with the given parameters
`CanonResolutionError`	`ref:T{...}` expression matches zero or more than one Hippo entity
`CanonPlanningError`	Unbound wildcard — a required parameter was not supplied in the request

Part 8: Naming Conventions Summary¶

Element	Convention	Examples
Rule names	`snake_case` verb phrase	`trim_reads`, `align_reads`, `count_genes`, `build_star_index`
Wildcard names	`snake_case` noun	`{sample}`, `{genome_build}`, `{star_version}`, `{quality_cutoff}`
Bind names	`snake_case` noun (the bound artifact)	`trimmed_fastq`, `genome_index`, `bam`, `gtf`
Entity type names	`PascalCase`	`AlignmentFile`, `TrimmedFastqFile`, `StarIndex`, `GeneCounts`
Entity ref types	`PascalCase`	`ref:ToolVersion{...}`, `ref:GenomeBuild{...}`
Workflow files	`snake_case.cwl` paired with `snake_case.canon.yaml`	`star_align.cwl` / `star_align.canon.yaml`

Part 9: Complete `canon_rules.yaml` Example¶

# canon_rules.yaml — RNA-seq pipeline (trim → align → count)

rules:

  - name: trim_reads
    description: Trim adapters and low-quality bases from raw FASTQ reads using cutadapt
    produces:
      entity_type: TrimmedFastqFile
      match:
        sample: "{sample}"
        trimmer: "ref:ToolVersion{tool.name=cutadapt, version={cutadapt_version}}"
        quality_cutoff: "{quality_cutoff}"
        min_length: "{min_length}"
    requires:
      - bind: raw_fastq
        entity_type: FastqFile
        match:
          sample: "{sample}"
    execute:
      workflow: workflows/cutadapt.cwl
      inputs:
        fastq: "{raw_fastq.uri}"
        quality_cutoff: "{quality_cutoff}"
        min_length: "{min_length}"
        sample_id: "{sample}"

  - name: build_star_index
    description: Build a STAR genome index for a given genome build and STAR version
    produces:
      entity_type: StarIndex
      match:
        genome_build: "ref:GenomeBuild{name={genome_build}}"
        aligner: "ref:ToolVersion{tool.name=STAR, version={star_version}}"
    requires:
      - bind: genome_fasta
        entity_type: GenomeFasta
        match:
          genome_build: "ref:GenomeBuild{name={genome_build}}"
    execute:
      workflow: workflows/star_index.cwl
      inputs:
        fasta: "{genome_fasta.uri}"
        genome_build: "{genome_build}"
        aligner: "{aligner}"

  - name: align_reads
    description: Align trimmed reads to a reference genome using STAR
    produces:
      entity_type: AlignmentFile
      match:
        sample: "{sample}"
        genome_build: "ref:GenomeBuild{name={genome_build}}"
        aligner: "ref:ToolVersion{tool.name=STAR, version={star_version}}"
        trimmer: "ref:ToolVersion{tool.name=cutadapt, version={cutadapt_version}}"
        quality_cutoff: "{quality_cutoff}"
        min_length: "{min_length}"
    requires:
      - bind: trimmed_fastq
        entity_type: TrimmedFastqFile
        match:
          sample: "{sample}"
          trimmer: "ref:ToolVersion{tool.name=cutadapt, version={cutadapt_version}}"
          quality_cutoff: "{quality_cutoff}"
          min_length: "{min_length}"
      - bind: genome_index
        entity_type: StarIndex
        match:
          genome_build: "ref:GenomeBuild{name={genome_build}}"
          aligner: "ref:ToolVersion{tool.name=STAR, version={star_version}}"
    execute:
      workflow: workflows/star_align.cwl
      inputs:
        fastq: "{trimmed_fastq.uri}"
        genome_index: "{genome_index.uri}"
        genome_build: "{genome_build}"
        aligner: "{aligner}"
        sample_id: "{sample}"
        quality_cutoff: "{quality_cutoff}"
        min_length: "{min_length}"

  - name: count_genes
    description: Count reads per gene using HTSeq-count
    produces:
      entity_type: GeneCounts
      match:
        sample: "{sample}"
        genome_build: "ref:GenomeBuild{name={genome_build}}"
        annotation: "ref:GeneAnnotation{source=GENCODE, version={gencode_version}}"
        aligner: "ref:ToolVersion{tool.name=STAR, version={star_version}}"
        trimmer: "ref:ToolVersion{tool.name=cutadapt, version={cutadapt_version}}"
        counter: "ref:ToolVersion{tool.name=HTSeq, version={htseq_version}}"
        strand_specific: "{strand_specific}"
        quality_cutoff: "{quality_cutoff}"
        min_length: "{min_length}"
    requires:
      - bind: bam
        entity_type: AlignmentFile
        match:
          sample: "{sample}"
          genome_build: "ref:GenomeBuild{name={genome_build}}"
          aligner: "ref:ToolVersion{tool.name=STAR, version={star_version}}"
          trimmer: "ref:ToolVersion{tool.name=cutadapt, version={cutadapt_version}}"
          quality_cutoff: "{quality_cutoff}"
          min_length: "{min_length}"
      - bind: gtf
        entity_type: GeneAnnotationFile
        match:
          annotation: "ref:GeneAnnotation{source=GENCODE, version={gencode_version}}"
    execute:
      workflow: workflows/htseq_count.cwl
      inputs:
        bam: "{bam.uri}"
        gtf: "{gtf.uri}"
        strand_specific: "{strand_specific}"
        sample_id: "{sample}"
