# 5. Cross-Component Sync
**Document status:** Draft v0.1
**Depends on:** sec2_architecture.md, Hippo sec5 (ingestion), Hippo sec6 (provenance), Cappella sec5 (workflows)
**Feeds into:** Bridge sec6 (NFR — sync latency targets), deployment docs
## 5.1 What Cross-Component Sync Addresses
BASS components operate independently. Each has its own state:
- Hippo stores entity metadata and provenance records.
- Cappella tracks pipeline run state, reconciliation records, and trigger history.
- Canon maintains a file artifact cache and resolution records.
After a pipeline run, a reconciliation scan, or a bulk ingest, these stores must be mutually consistent. Inconsistencies arise in two ways:

- **Write failures under partial completion** — A Cappella pipeline registers outputs to Hippo, but the Hippo write fails mid-batch. Cappella believes the run succeeded; Hippo has partial data.
- **Eventual-consistency lag** — Components are written independently (SDK calls, direct REST) without a distributed transaction. A brief window exists where Cappella's run log shows success but Hippo's entity store has not yet reflected the output.
Bridge's sync subsystem provides detection and repair for both cases. It does not impose a two-phase commit protocol — that would couple components too tightly. Instead, it uses reconciliation events and consistency checks to identify and surface discrepancies.
## 5.2 Sync Model
Bridge uses an event-driven consistency model:
```
┌──────────┐  run_completed  ┌──────────────┐  check_hippo  ┌────────┐
│ Cappella │────────────────▶│ Bridge Sync  │──────────────▶│ Hippo  │
│          │                 │    Engine    │               │        │
│          │◀────────────────│              │◀──────────────│        │
│          │  consistency_ok │              │  entities_ok  └────────┘
└──────────┘  (or mismatch   └──────────────┘
               event)
```
- **Event subscription** — Cappella emits structured lifecycle events (run started, run completed, reconciliation finished). Bridge subscribes to these events via a lightweight in-process message bus (v0.1) or a persistent event queue (v1.1+).
- **Consistency check** — On receipt of a completion event, Bridge queries both Cappella (what did the run produce?) and Hippo (are those entities present?) and compares.
- **Mismatch handling** — If a discrepancy is found, Bridge emits a `bridge.sync.mismatch` event, records it in the sync event log, and optionally triggers a repair workflow.
## 5.3 Event Types
Bridge subscribes to and emits the following sync-relevant events:
### 5.3.1 Inbound Events (from components)
| Event | Source | Payload |
|---|---|---|
| `cappella.run.completed` | Cappella | `run_id`, `pipeline_id`, `outputs: [{entity_type, external_id}]`, `actor` |
| `cappella.run.failed` | Cappella | `run_id`, `pipeline_id`, `error`, `partial_outputs` |
| `cappella.reconciliation.completed` | Cappella | `scan_id`, `discrepancy_count`, `resolved_count` |
| `hippo.ingest.batch_completed` | Hippo | `batch_id`, `entity_type`, `count`, `actor` |
| `canon.cache.evicted` | Canon | `artifact_ids: []`, `reason` |
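As a concrete illustration, a `cappella.run.completed` event might carry a payload shaped like the following. The field names come from the table above; the entity type and all values are invented for the example:

```python
# Illustrative payload for a cappella.run.completed event.
# Field names follow the inbound-events table; values are made up.
event = {
    "run_id": "run-0042",
    "pipeline_id": "pipe-ingest",
    "outputs": [
        {"entity_type": "recording", "external_id": "rec-7781"},
        {"entity_type": "recording", "external_id": "rec-7782"},
    ],
    "actor": "cappella-scheduler",
}
```

The `outputs` list is what the post-run consistency check in section 5.4.1 iterates over.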
### 5.3.2 Outbound Events (from Bridge Sync Engine)
| Event | Consumers | Meaning |
|---|---|---|
| `bridge.sync.mismatch` | Observability, ops team | Cappella says entity exists; Hippo does not |
| `bridge.sync.repaired` | Observability | Mismatch resolved by re-trigger or manual correction |
| `bridge.sync.stale_cache` | Canon | Canon has artifacts for entities that no longer exist in Hippo |
## 5.4 Consistency Check Procedures

### 5.4.1 Cappella → Hippo Post-Run Check
After receiving `cappella.run.completed`, Bridge performs:

1. Fetch the `outputs` list from the completed run record: `GET /cappella/runs/{runId}/outputs`
2. For each output, verify entity existence in Hippo: `GET /hippo/entities/{type}/{externalId}`
3. If any entity is missing, record a `bridge.sync.mismatch` event with `run_id`, `missing_entities`, `actor`, `timestamp`, and `repair_strategy: resubmit_run | manual` (from config).
4. If all entities are present, record `bridge.sync.ok`.
This check runs asynchronously and does not block the pipeline run response to the caller.
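The steps above can be sketched as a single function. `fetch_outputs` and `entity_exists` are hypothetical callables standing in for the two GET requests; this is a sketch of the check's logic, not the actual sync-engine code:

```python
# Sketch of the 5.4.1 post-run check. fetch_outputs and entity_exists
# stand in for GET /cappella/runs/{runId}/outputs and
# GET /hippo/entities/{type}/{externalId}.
def post_run_check(run_id, fetch_outputs, entity_exists):
    outputs = fetch_outputs(run_id)
    missing = [
        o for o in outputs
        if not entity_exists(o["entity_type"], o["external_id"])
    ]
    if missing:
        # Recorded in the sync event log; repair_strategy comes from config.
        return {"event_type": "bridge.sync.mismatch",
                "run_id": run_id,
                "missing_entities": missing}
    return {"event_type": "bridge.sync.ok", "run_id": run_id}
```

Injecting the two lookups as callables keeps the comparison logic testable without live Cappella or Hippo instances.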
### 5.4.2 Canon Cache Staleness Check
After a Hippo bulk deletion or availability change:
- Bridge queries Canon for artifacts associated with the affected entities.
- If Canon has cached artifacts for entities that are now unavailable, Bridge emits `bridge.sync.stale_cache` so Canon can evict those artifacts on its next sweep.
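At its core, this check is a set difference. The data shapes below (a mapping of artifact id to entity id, and a set of still-available entity ids) are assumptions made for illustration:

```python
# Sketch of the Canon staleness check: flag cached artifacts whose
# backing entity is no longer available in Hippo.
def stale_artifacts(cached, live_entity_ids):
    """cached: dict of artifact_id -> entity_id; live_entity_ids: set."""
    return [a for a, e in cached.items() if e not in live_entity_ids]
```

Bridge would then emit `bridge.sync.stale_cache` carrying the returned artifact ids.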
### 5.4.3 Periodic Full Consistency Scan
Bridge can run a scheduled full consistency scan (configurable interval, default: daily):
- Compare Cappella's run output history with Hippo's entity provenance.
- Surface runs that produced outputs not present in Hippo.
- Surface Hippo entities whose provenance references Cappella runs that no longer exist.
Scan results are written to the sync event log and available via `GET /api/v1/bridge/sync/scan/latest`.
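The two-way comparison can be sketched as follows, with both sides reduced to `run_id -> set of entity ids` mappings (an assumed shape for illustration, not the actual provenance format):

```python
# Sketch of the two directions of the full consistency scan.
def full_scan(cappella_runs, hippo_runs):
    """Both arguments map run_id -> set of entity ids."""
    # Runs whose outputs are (partly) missing from Hippo.
    missing_in_hippo = {
        run_id: ids - hippo_runs.get(run_id, set())
        for run_id, ids in cappella_runs.items()
        if ids - hippo_runs.get(run_id, set())
    }
    # Hippo provenance referencing runs Cappella no longer knows about.
    orphaned_run_refs = set(hippo_runs) - set(cappella_runs)
    return missing_in_hippo, orphaned_run_refs
```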
## 5.5 Mismatch Repair
Bridge does not automatically repair all mismatches — some require human judgement. The repair strategy per mismatch type is configurable:
| Mismatch Type | Default Strategy | Options |
|---|---|---|
| Missing Hippo entity after Cappella run | `alert_only` | `alert_only`, `resubmit_run` |
| Stale Canon artifact | `evict_on_next_sweep` | `evict_immediately`, `evict_on_next_sweep`, `alert_only` |
| Partial Cappella output batch | `alert_only` | `alert_only`, `resubmit_run` |
| Cappella run referencing non-existent Hippo entity | `alert_only` | `alert_only` |
`resubmit_run`: Bridge calls `POST /cappella/runs` with the same pipeline and inputs as the failed run. This is safe only for idempotent pipelines (those that use ExternalID upsert semantics in Hippo). Bridge checks idempotency metadata on the pipeline definition before auto-resubmitting.
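The idempotency gate can be sketched as below. The `idempotent` flag on the pipeline definition and the callables are hypothetical shapes this spec does not pin down:

```python
# Sketch of the gate Bridge applies before auto-resubmitting a run.
def handle_mismatch(pipeline_def, run, strategy, submit_run, alert):
    if strategy == "resubmit_run" and pipeline_def.get("idempotent"):
        # POST /cappella/runs with the original pipeline and inputs.
        return submit_run(pipeline_def["id"], run["inputs"])
    # Non-idempotent pipeline or alert_only strategy: surface to ops.
    alert(run["run_id"])
    return None
```

Falling back to alerting when the pipeline is not marked idempotent matches the "some require human judgement" stance above.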
## 5.6 Sync Event Log
All sync events are stored in Bridge's local sync event log (SQLite or PostgreSQL table).
Schema:

```
sync_events
├── id           UUID PRIMARY KEY
├── event_type   TEXT       (e.g. 'bridge.sync.mismatch')
├── source       TEXT       (component emitting the triggering event)
├── source_id    TEXT       (e.g. run_id, batch_id)
├── actor        TEXT
├── details      JSON
├── resolved     BOOLEAN    DEFAULT false
├── resolved_at  TIMESTAMP
└── created_at   TIMESTAMP
```
Query API:

```
GET  /api/v1/bridge/sync/events?status=unresolved&limit=50
GET  /api/v1/bridge/sync/events/{eventId}
POST /api/v1/bridge/sync/events/{eventId}/resolve   (admin only)
GET  /api/v1/bridge/sync/scan/latest
POST /api/v1/bridge/sync/scan                       (trigger on-demand scan; admin only)
```
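For example, a client could build the unresolved-events query like this (the base URL is an assumption; the path and parameters come from the list above):

```python
from urllib.parse import urlencode

def unresolved_events_url(base_url, limit=50):
    # Builds GET /api/v1/bridge/sync/events?status=unresolved&limit=...
    query = urlencode({"status": "unresolved", "limit": limit})
    return f"{base_url}/api/v1/bridge/sync/events?{query}"
```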
## 5.7 Event Transport

### v0.1: In-Process Event Bus
For v0.1 (single-server deployments), events are dispatched in-process using a simple asyncio-based pub/sub bus:
```python
# bridge/sync/event_bus.py
bus = EventBus()
bus.subscribe("cappella.run.completed", sync_engine.on_run_completed)
bus.emit("cappella.run.completed", payload)
```
Events are not persisted in the bus. If Bridge restarts mid-check, the check is lost. This is acceptable for v0.1; missed checks are caught by the next periodic scan.
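A minimal version of such a bus might look like the following. This is a sketch matching the semantics described above (async handlers keyed by event type, nothing persisted), not the actual `bridge/sync/event_bus.py`; here `emit` is a coroutine, so callers would `await` it:

```python
import asyncio
from collections import defaultdict

class EventBus:
    """In-process pub/sub: async handlers keyed by event type, no persistence."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    async def emit(self, event_type, payload):
        # Await each handler in subscription order. If the process dies
        # mid-emit, in-flight checks are lost, as noted above.
        for handler in self._handlers[event_type]:
            await handler(event_type, payload)
```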
### v1.1: Persistent Event Queue
For production deployments where Bridge may restart during a long-running check, v1.1 will introduce an optional persistent event queue backend (Redis Streams or PostgreSQL LISTEN/NOTIFY). This is explicitly out of scope for v0.1.
## 5.8 Open Questions
| Question | Priority | Status |
|---|---|---|
| Should auto-resubmit (`resubmit_run`) require an approval step for non-idempotent pipelines? | High | Open |
| How should Bridge handle sync checks when a component is temporarily unreachable (retry vs. skip)? | High | Open — likely exponential backoff with a max-retry cap |
| Should sync mismatches be surfaced in the Aperture web portal as alerts? | Medium | Deferred to Aperture v0.2 |
| Event schema versioning — should events carry a schema version for forward compatibility? | Low | Open |