7. Non-Functional Requirements
Document status: Draft v0.1
Depends on: sec1_overview.md, sec2_architecture.md
7.1 Reproducibility
Reproducibility is Canon's primary non-functional requirement. Every artifact Canon produces must be reproducible from its provenance record alone.
Requirements:
- Every produced artifact has a `WorkflowRun` entity in Hippo recording: CWL workflow file path + SHA256 hash, CWL runner name + version, execution environment type + image digest/hash, all input entity UUIDs, all parameters
- CWL workflow file hashes are captured at execution time: if the workflow file changes after execution, the hash in `WorkflowRun` still reflects what was actually run
- Container image digests are captured (not just tags): `sha256:abc123` is reproducible; `latest` is not
- Canon raises `CanonRuleValidationError` at startup if any tool reference in a rule lacks a version; "STAR without a version" is not allowed
- All entity reference parameters are stored as Hippo UUIDs in the produced entity's metadata, because UUIDs are stable identifiers while names can change
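The digest requirement lends itself to a mechanical check. The following is a minimal sketch, assuming image references use the common `name@sha256:<64 hex digits>` form; the function name `is_pinned_by_digest` and the example image names are hypothetical and not part of Canon.

```python
import re

# A pinned reference ends in "@sha256:" followed by 64 lowercase hex digits.
_DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_pinned_by_digest(image_ref: str) -> bool:
    """Return True if the image reference is pinned to a content digest.

    Hypothetical helper: tag-only references such as "star:latest" can be
    re-pointed at a different image later, so they are rejected.
    """
    return bool(_DIGEST_RE.search(image_ref))
```

A check like this could run at the point where the execution environment is recorded, so that a tag-only reference never reaches the `WorkflowRun` record.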
Achieving reproducibility from a WorkflowRun record:
Given a WorkflowRun entity, a researcher can reproduce the execution by:
1. Obtaining the CWL file at the recorded hash (from git history or a CWL registry)
2. Pulling the exact container image by digest
3. Resolving all input entity UUIDs to their current URIs
4. Running cwltool (or the recorded runner) with the same inputs
This is possible without Canon itself — the WorkflowRun is a self-contained recipe.
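The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Canon's implementation: the record is modelled as a plain dict, the field names `cwl_sha256`, `image`, and `runner` are hypothetical, and step 3 (resolving UUIDs to URIs) is assumed to have already produced the job file passed in.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_replay_commands(run: dict, cwl_path: Path, job_path: Path) -> list[list[str]]:
    """Verify the CWL file against the recorded hash, then return the commands to run.

    `run` is a hypothetical dict view of a WorkflowRun record.
    """
    actual = sha256_of_file(cwl_path)
    if actual != run["cwl_sha256"]:
        raise ValueError(
            f"CWL hash mismatch: expected {run['cwl_sha256']}, got {actual}"
        )
    return [
        # The image is assumed to be recorded as name@sha256:<digest>.
        ["docker", "pull", run["image"]],
        # The recorded runner (for example cwltool) takes the workflow and a job file.
        [run["runner"], str(cwl_path), str(job_path)],
    ]
```

The function only builds the command lines; executing them (for example with `subprocess.run`) is left to the caller, which keeps the hash check testable without Docker or a CWL runner installed.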
7.2 Idempotency
Canon's canon get operation is idempotent: calling it multiple times with the same
specification produces the same result without duplicating computation or storage.
Requirements:
- A `canon get` call that finds an existing Hippo entity always returns that entity's URI without running any computation
- A `canon get` call that finds an in-progress `WorkflowRun` raises `CanonExecutorError` rather than launching a duplicate execution
- A `canon get` call after a failed `WorkflowRun` re-runs the workflow (failed results are not cached)
- Two concurrent `canon get` calls for the same spec are safe: one will run and the other will either find the completed result (REUSE) or the in-progress run (error)
Idempotency is not guaranteed in concurrent deployments without a distributed lock. In v0.1, Canon relies on Hippo's atomic write semantics for single-instance deployments. Multi-instance Canon deployments (e.g. two Cappella workers calling Canon in parallel for the same spec) may produce duplicate executions. Distributed locking is deferred to v0.2.
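The single-instance decision rules listed above can be summarised as a small function. This is a sketch only: `RunState`, `decide`, and the local `CanonExecutorError` class are stand-ins defined here for illustration, and the real lookup against Hippo is replaced by two arguments.

```python
from enum import Enum
from typing import Optional


class RunState(Enum):
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


class CanonExecutorError(Exception):
    """Local stand-in for Canon's error of the same name."""


def decide(existing_entity_uri: Optional[str], latest_run: Optional[RunState]) -> str:
    """Return "REUSE" or "BUILD" for one spec, or raise if a run is in progress."""
    if existing_entity_uri is not None:
        return "REUSE"  # an existing entity always wins; no computation is started
    if latest_run is RunState.RUNNING:
        raise CanonExecutorError("a WorkflowRun for this spec is already in progress")
    # No entity and no active run: either nothing was ever run, or the last
    # run failed. Failed results are not cached, so the workflow is rebuilt.
    return "BUILD"
```

The sketch does not address the multi-instance case: without a distributed lock, two processes could both reach the final `return "BUILD"` before either records its run.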
7.3 Correctness
Canon must never silently return an artifact that does not match the requested specification.
Requirements:
- All field comparisons in Hippo queries are exact match; Canon never uses fuzzy matching or range queries in v0.1
- Entity reference resolution raises `CanonResolutionError` on zero matches or multiple matches; Canon never silently picks one from many
- Tool version is always required in rules; Canon never resolves "any STAR" to a specific version without explicit declaration
- The `unpropagated wildcard` validation at startup ensures upstream parameters are always preserved in produced artifact metadata; Canon never silently drops provenance
- Sidecar `identity_fields` are validated against the rule's `produces.match` at startup; a mismatch between what Canon queries and what it stores is a startup error
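The first two requirements amount to "exact match, exactly one result". A minimal sketch follows, assuming entities are plain dicts; `exact_match` and `resolve_exactly_one` are hypothetical names, and `CanonResolutionError` is redefined locally as a stand-in.

```python
class CanonResolutionError(Exception):
    """Local stand-in for Canon's error of the same name."""


def exact_match(entities: list[dict], criteria: dict) -> list[dict]:
    """Keep only entities whose fields equal every criterion exactly."""
    return [
        entity
        for entity in entities
        if all(key in entity and entity[key] == value for key, value in criteria.items())
    ]


def resolve_exactly_one(ref: str, matches: list[dict]) -> dict:
    """Return the single match, or raise on zero or multiple matches."""
    if len(matches) != 1:
        raise CanonResolutionError(
            f"{ref!r} matched {len(matches)} entities; expected exactly 1"
        )
    return matches[0]
```

Raising on ambiguity rather than picking the first hit is what prevents a silent mismatch between the requested specification and the returned artifact.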
7.4 Observability
Canon provides sufficient visibility into its operations for debugging and monitoring.
Requirements:
- `canon plan` provides a dry-run view of the full REUSE/BUILD decision tree before any execution
- `canon status` shows recent `WorkflowRun` entities from Hippo (running, completed, failed) with timing information
- All Canon operations are logged at configurable levels (`DEBUG`, `INFO`, `WARNING`, `ERROR`) to stderr
- `DEBUG` logging includes: Hippo queries + response times, rule matching decisions, wildcard bindings, entity ref resolutions
- `INFO` logging includes: REUSE/BUILD decisions, CWL execution start/end, ingestion confirmation
- CWL runner stderr is captured in `WorkflowRun.stderr` (truncated to 64KB) for post-execution debugging
- Failed `WorkflowRun` entities in Hippo are queryable; `canon status --failed` shows all failures
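The logging and stderr-capture requirements map directly onto Python's standard `logging` module. The sketch below makes two assumptions that the text does not settle: "64KB" is read as 64 KiB (65,536 bytes), and truncation keeps the beginning of the stream. The logger name `canon` and both function names are hypothetical.

```python
import logging
import sys

MAX_STDERR_BYTES = 64 * 1024  # assumption: "64KB" means 64 KiB


def configure_logging(level_name: str = "INFO") -> logging.Logger:
    """Send log records to stderr at a configurable level."""
    logger = logging.getLogger("canon")
    logger.setLevel(level_name.upper())  # accepts "DEBUG", "INFO", "WARNING", "ERROR"
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


def truncate_stderr(raw: bytes, limit: int = MAX_STDERR_BYTES) -> str:
    """Keep at most `limit` bytes of runner stderr (assumption: the first bytes)."""
    # errors="replace" guards against cutting a multi-byte character in half.
    return raw[:limit].decode("utf-8", errors="replace")
```

If the most recent output is more useful for debugging than the earliest, slicing with `raw[-limit:]` instead would keep the tail.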
7.5 Performance
Canon is not a high-throughput system. Its performance envelope is appropriate for research workloads, not real-time services.
Targets:
| Operation | Target latency | Notes |
|---|---|---|
| REUSE (Hippo query hit) | < 500ms | Dominated by Hippo network roundtrip |
| Entity ref resolution | < 200ms per ref | One Hippo query per ref:T{...} expression |
| Rule matching | < 10ms | In-memory after startup |
| CWL execution | Minutes to hours | Determined by the workflow, not Canon |
| Canon startup (rule validation) | < 5s for 100 rules | CWL file validation + Hippo schema check |
Canon does not cache Hippo query results within a single canon get call. Each
entity ref and each registry lookup is a fresh Hippo query. This ensures correctness —
the registry may have been updated by another process between queries — at the cost of
additional network roundtrips. Caching may be added in v0.2 with a configurable TTL.
Sequential input resolution in v0.1. Required inputs are resolved one at a time
in the order declared in requires:. Parallel resolution (resolving independent inputs
concurrently) is deferred to v0.2. For most pipelines with 2–4 inputs per rule, the
overhead is negligible compared to CWL execution time.
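The two behaviours described above (no caching within a call, and strictly sequential resolution) can be illustrated with a short loop. This is a sketch with hypothetical names: `resolve_inputs` is not a Canon function, and `query` stands in for a single Hippo lookup.

```python
from typing import Callable


def resolve_inputs(requires: list[str], query: Callable[[str], str]) -> dict[str, str]:
    """Resolve each declared input in order, issuing one fresh query per entry."""
    resolved: dict[str, str] = {}
    for ref in requires:  # declared order, one at a time
        # No memoisation: a repeated ref triggers a second query, so a registry
        # update made by another process between queries is always observed.
        resolved[ref] = query(ref)
    return resolved
```

Under the per-ref target of < 200 ms from the table above, a rule with four entity refs would spend under roughly 800 ms on resolution, which is small next to CWL execution times measured in minutes to hours.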
7.6 Security
Canon inherits Hippo's security model for all data access.
Requirements:
- The `hippo_token` in `canon.yaml` must have read+write access to all entity types used by Canon rules; Canon does not perform partial-permission graceful degradation
- `canon.yaml` should not be committed to version control if it contains credentials; use environment variable substitution: `hippo_token: "${HIPPO_TOKEN}"`
- Canon does not execute arbitrary code from `canon_rules.yaml`: rules are data, not code; wildcard values are never evaluated as expressions
- CWL `ExpressionTool` (JavaScript) is supported but discouraged: JavaScript execution in CWL steps is outside Canon's security perimeter; restrict with `cwltool_options: ["--disable-ext"]` if needed
- Staging directory (`work_dir/staging`) should be on a filesystem inaccessible to other users if input data is sensitive
- Canon does not implement authentication beyond forwarding `hippo_token`; access control for produced artifacts is Hippo's responsibility
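Environment variable substitution of the `${HIPPO_TOKEN}` form can be sketched as follows. How Canon actually performs the substitution is not specified here, so this is an illustration only: `substitute_env` is a hypothetical helper, and failing loudly on an unset variable is an assumed design choice.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${NAME} where NAME is a conventional environment variable name.
_VAR_RE = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace every ${NAME} in `value` with the variable's value."""
    source = os.environ if env is None else env

    def _lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in source:
            # Assumption: an unset variable is an error rather than an empty string,
            # so a missing credential is caught before any request is made.
            raise KeyError(f"environment variable {name} is not set")
        return source[name]

    return _VAR_RE.sub(_lookup, value)
```

The captured name is only used as a dictionary key and is never evaluated, which is consistent with the rule that configuration values are data rather than expressions.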
7.7 Deployment and Operations
Requirements:
- Canon runs on any system with Python 3.11+ and Docker (or another container runtime)
- No server process required for local use; `canon get` is a CLI command that runs and exits
- Single `canon.yaml` per working directory; multiple Canon configurations for different Hippo deployments or rule sets are supported by running from different directories
- Canon is stateless except for ephemeral CWL work directories and the optional `~/.canon/runs.db` SQLite database for `canon status` (rebuildable from Hippo)
- `canon rules validate` should be run in CI/CD alongside CWL file changes to catch rule errors before deployment
- Canon version and the Canon Hippo reference schema version are always identical; upgrading Canon requires re-running `hippo reference install canon` to apply any schema changes
7.8 Extensibility
Canon is designed to be extended without modifying the core package.
Extension points:
| Extension point | Mechanism | Example |
|---|---|---|
| New CWL executor backend | `canon.executor_adapters` entry point | `canon-executor-toil` |
| Community workflow packages | `canon.workflow_packages` entry point | `canon-workflows-rnaseq` |
| Domain entity types | `hippo.reference_loaders` entry point | bundled in workflow packages |
| Custom Canon entity types | `hippo.reference_loaders` entry point | lab-specific entity extensions |
All extension points use standard Python entry points — no Canon-specific plugin API
or registration step beyond pip install.
7.9 Versioning and Compatibility
- Canon follows semantic versioning (semver)
- The Canon Hippo reference schema version always matches the Canon package version
- Breaking changes to `canon_rules.yaml` syntax increment the major version
- CWL v1.2 is the minimum supported version; older CWL files are not supported
- Canon guarantees that existing rules continue to work across minor version upgrades
- The `WorkflowRun` entity schema is stable within a major version: fields may be added (additive) but not renamed or removed within a major release
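One way to read the rule-compatibility guarantee is "same major version, equal or newer release". The sketch below encodes that reading as an assumption; `parse_version` and `rules_still_supported` are hypothetical helpers that handle plain `MAJOR.MINOR.PATCH` strings only, with no pre-release or build suffixes.

```python
def parse_version(text: str) -> tuple[int, int, int]:
    """Split a plain MAJOR.MINOR.PATCH string into integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def rules_still_supported(written_for: str, running: str) -> bool:
    """True if rules written for one version should work on the running version.

    Assumed interpretation: compatible when the major version is unchanged
    and the running version is not older than the one the rules targeted.
    """
    old, new = parse_version(written_for), parse_version(running)
    return new[0] == old[0] and new >= old
```

Comparing the parsed tuples rather than the raw strings avoids the classic ordering mistake where `"1.10.0"` sorts before `"1.9.0"`.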