ImpactSense Phase 1 Incremental Pipeline - Handoff Summary
Primary intent
- Build an accuracy-first, push-driven incremental pipeline:
GitLab Push -> Webhook API -> RabbitMQ -> Consumer -> Mode Decide -> Delta Resolve -> Targeted Parse -> Incremental Neo4j Update -> Impact Query.
- Branch strategy: new work should branch from erlang_issues (not main).
Repository and branch context
- Parser repo root: /Users/sujal.v/Desktop/impactSenseProject/parser
- Webhook service path: parser/impactsense-webhook
- Current branch lineage: erlang_issues -> incremental-pipeline (both pointed at the same commit when last checked).
- main is on a separate line of history and is behind this work stream.
Jira updates
- Created subtask CRM-3568 under CRM-3482:
"Phase 1: Implement repo-level incremental parsing and Neo4j graph updates"
- Added progress comment to CRM-3568 with completion and pending items.
Implemented in code
1) Webhook ingress service (modularized)
- Endpoints:
  - GET /healthz
  - POST /events/vcs/push
- Token validation: the token sent in the request headers is checked against WEBHOOK_SECRET.
- Payload normalization for GitLab/GitHub/unknown.
- Deterministic job_key generation.
- Branch gating via TRACKED_BRANCH.
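A minimal sketch of branch gating and deterministic job_key generation, for orientation only. The PushEvent struct, its field names, and the key layout are assumptions for illustration and are not taken from model.rs or webhook.rs. It relies on push payloads carrying the ref as refs/heads/<branch>.

```rust
/// Hypothetical normalized push event; field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct PushEvent {
    pub provider: String,   // "gitlab", "github", or "unknown"
    pub repo: String,       // e.g. "group/project"
    pub git_ref: String,    // e.g. "refs/heads/master"
    pub before_sha: String, // commit before the push
    pub after_sha: String,  // commit after the push
}

/// Returns true only when the pushed ref is exactly the tracked branch.
/// Tag pushes ("refs/tags/...") and other branches are rejected.
pub fn is_tracked(event: &PushEvent, tracked_branch: &str) -> bool {
    event.git_ref.strip_prefix("refs/heads/") == Some(tracked_branch)
}

/// Builds a key from the fields that identify a push, so a redelivered
/// webhook for the same push always yields the same key.
/// The exact layout is an assumption.
pub fn job_key(event: &PushEvent) -> String {
    format!(
        "{}:{}:{}:{}..{}",
        event.provider, event.repo, event.git_ref, event.before_sha, event.after_sha
    )
}
```

Because the key depends only on the event contents (no timestamps or random values), it can later serve as the idempotency key for redelivered messages.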
2) RabbitMQ publishing
- Publishes normalized events to:
  - Exchange: impactsense.events
  - Routing key: push
- Uses persistent publish settings and confirm handling.
3) Modular file split (no monolithic main.rs)
- parser/impactsense-webhook/src/main.rs
- parser/impactsense-webhook/src/config.rs
- parser/impactsense-webhook/src/model.rs
- parser/impactsense-webhook/src/amqp.rs
- parser/impactsense-webhook/src/webhook.rs
- parser/impactsense-webhook/src/consumer.rs
- parser/impactsense-webhook/src/delta.rs
4) Consumer contract
- Consumes from AMQP_QUEUE (default impactsense.push).
- Deserializes event payload.
- Mode fallback:
  - before_sha == 0000000000000000000000000000000000000000 -> bootstrap
  - else -> incremental
- Logs contract details and ack/nack handling.
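The mode fallback above could be expressed roughly as follows. The Mode enum and the decide_mode name are hypothetical; only the rule itself (all-zero before_sha means bootstrap, anything else means incremental) comes from the notes.

```rust
/// Processing mode chosen by the consumer for one push event.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Bootstrap,
    Incremental,
}

/// An all-zero, 40-character before_sha means there is no previous commit
/// to diff against, so the consumer falls back to a full (bootstrap) run.
pub fn decide_mode(before_sha: &str) -> Mode {
    let is_zero_sha = before_sha.len() == 40 && before_sha.bytes().all(|b| b == b'0');
    if is_zero_sha {
        Mode::Bootstrap
    } else {
        Mode::Incremental
    }
}
```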
5) Delta resolve implementation
- Runs:
  git -C <repo> diff --name-status --find-renames <before_sha> <after_sha>
- Produces:
  - added_files
  - modified_files
  - deleted_files
  - renamed_files
  - parse_targets = added + modified + renamed (new paths)
  - cleanup_targets = deleted + modified + renamed (old paths)
- Wired into consumer for incremental mode.
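As an illustration of the delta resolve step, the sketch below parses `git diff --name-status` output into the buckets listed above. It is not the code in delta.rs; the Delta struct and parse_name_status are hypothetical names. It assumes the default tab-separated output (not -z), where rename lines look like R<score>, old path, new path, and it ignores statuses other than A, M, D, and R.

```rust
/// Buckets produced by delta resolve; field names follow the notes above.
#[derive(Debug, Default, PartialEq)]
pub struct Delta {
    pub added_files: Vec<String>,
    pub modified_files: Vec<String>,
    pub deleted_files: Vec<String>,
    pub renamed_files: Vec<(String, String)>, // (old path, new path)
    pub parse_targets: Vec<String>,           // A + M + R.new
    pub cleanup_targets: Vec<String>,         // D + M + R.old
}

/// Parses tab-separated `git diff --name-status --find-renames` output.
pub fn parse_name_status(output: &str) -> Delta {
    let mut d = Delta::default();
    for line in output.lines() {
        let mut cols = line.split('\t');
        let status = match cols.next() {
            Some(s) if !s.is_empty() => s,
            _ => continue, // skip blank lines
        };
        // Only the first status letter matters; "R100" carries a similarity score.
        match (status.chars().next(), cols.next(), cols.next()) {
            (Some('A'), Some(path), _) => {
                d.added_files.push(path.to_string());
                d.parse_targets.push(path.to_string());
            }
            (Some('M'), Some(path), _) => {
                d.modified_files.push(path.to_string());
                d.parse_targets.push(path.to_string());
                d.cleanup_targets.push(path.to_string());
            }
            (Some('D'), Some(path), _) => {
                d.deleted_files.push(path.to_string());
                d.cleanup_targets.push(path.to_string());
            }
            (Some('R'), Some(old), Some(new)) => {
                d.renamed_files.push((old.to_string(), new.to_string()));
                d.parse_targets.push(new.to_string());
                d.cleanup_targets.push(old.to_string());
            }
            _ => {} // other statuses (C, T, U, ...) are ignored in this sketch
        }
    }
    d
}
```

Modified files appear in both target lists because their stale graph data has to be removed before the re-parsed version is written. Paths with unusual characters may be quoted by git in this output mode; handling that is left out here.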
6) Build quality
- cargo check passes.
- Lints clean.
- Deprecated tokio-amqp usage removed; the AMQP code was updated accordingly.
Current environment variables used
- BIND_ADDR (default 0.0.0.0:8080)
- WEBHOOK_SECRET
- TRACKED_BRANCH (default master)
- AMQP_ADDR (default amqp://127.0.0.1:5672/%2f)
- AMQP_EXCHANGE (default impactsense.events)
- AMQP_ROUTING_KEY (default push)
- AMQP_QUEUE (default impactsense.push)
- GIT_REPO_PATH (default "..", expected to resolve to the parser repo root for delta resolve)
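One possible way to load these variables with their defaults is sketched below. This is not the contents of config.rs: the Config struct and the from_lookup helper are hypothetical, and treating WEBHOOK_SECRET as optional is an assumption, since no default is listed for it.

```rust
/// Runtime settings; field names mirror the variables listed above.
#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub bind_addr: String,
    pub webhook_secret: Option<String>, // no default is documented
    pub tracked_branch: String,
    pub amqp_addr: String,
    pub amqp_exchange: String,
    pub amqp_routing_key: String,
    pub amqp_queue: String,
    pub git_repo_path: String,
}

impl Config {
    /// Builds a Config from any key lookup, which keeps the default
    /// handling testable without touching the process environment.
    pub fn from_lookup(get: impl Fn(&str) -> Option<String>) -> Self {
        let or = |key: &str, default: &str| get(key).unwrap_or_else(|| default.to_string());
        Config {
            bind_addr: or("BIND_ADDR", "0.0.0.0:8080"),
            webhook_secret: get("WEBHOOK_SECRET"),
            tracked_branch: or("TRACKED_BRANCH", "master"),
            amqp_addr: or("AMQP_ADDR", "amqp://127.0.0.1:5672/%2f"),
            amqp_exchange: or("AMQP_EXCHANGE", "impactsense.events"),
            amqp_routing_key: or("AMQP_ROUTING_KEY", "push"),
            amqp_queue: or("AMQP_QUEUE", "impactsense.push"),
            git_repo_path: or("GIT_REPO_PATH", ".."),
        }
    }

    /// Reads the real process environment.
    pub fn from_env() -> Self {
        Self::from_lookup(|key| std::env::var(key).ok())
    }
}
```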
Infra/test status discussed
- GitLab webhook delivery tested; the service responded with HTTP 202.
- RabbitMQ route test passed:
  - routing key push was routed to impactsense.push
  - an unknown routing key was not routed
- Windows deployment discussed for host 10.166.1.220; the intended port was changed to 8093 during troubleshooting.
Pending implementation (next steps)
1) Targeted parse integration
- Add scanner/parser entrypoint that accepts parse_targets for incremental mode.
- Keep current full scan path for bootstrap.
2) Incremental Neo4j update path
- Remove graph data for cleanup_targets (deleted files, old paths of renamed files, and stale data for modified files).
- Upsert from newly parsed changed files.
3) Retry and DLQ hardening
- Explicit retry policy, max attempts, and dead-letter flow in consumer runtime.
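Since this item is still pending, the following is only a sketch of what the retry decision might look like. The Disposition enum, the 1-based attempt counter, and the function name are all assumptions; how the attempt count travels with the message and how dead-lettering is configured on the broker are left open.

```rust
/// What the consumer should do with a message after one processing attempt.
#[derive(Debug, PartialEq, Eq)]
pub enum Disposition {
    Ack,
    Retry { next_attempt: u32 },
    DeadLetter,
}

/// `attempt` is the 1-based number of the attempt that just finished.
/// Failures are retried until `max_attempts` is reached, then dead-lettered.
pub fn disposition(result: &Result<(), String>, attempt: u32, max_attempts: u32) -> Disposition {
    match result {
        Ok(()) => Disposition::Ack,
        Err(_) if attempt < max_attempts => Disposition::Retry {
            next_attempt: attempt + 1,
        },
        Err(_) => Disposition::DeadLetter,
    }
}
```

Keeping the decision in a pure function makes the policy easy to unit test separately from the AMQP ack/nack calls.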
4) Idempotency persistence
- Persist processed job_key to prevent duplicate processing on redelivery.
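The duplicate check could take a shape like the sketch below. ProcessedJobs is a hypothetical name, and the in-memory HashSet only stands in for whatever durable store is eventually chosen: it does not survive a restart, so it illustrates the interface rather than the persistence.

```rust
use std::collections::HashSet;

/// In-memory stand-in for a durable record of processed job keys.
#[derive(Debug, Default)]
pub struct ProcessedJobs {
    seen: HashSet<String>,
}

impl ProcessedJobs {
    pub fn new() -> Self {
        Self::default()
    }

    /// Records the key and returns true if it was not seen before
    /// (the caller should process the job); returns false for a duplicate.
    pub fn mark_if_new(&mut self, job_key: &str) -> bool {
        self.seen.insert(job_key.to_string())
    }
}
```

Whether the key is recorded before processing or only after a successful run is a design choice: recording it afterwards avoids losing a job that failed midway, at the cost of possible reprocessing.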
5) End-to-end worker pipeline
- Consumer flow:
  mode -> delta -> targeted parse -> extract -> persist -> run summary metrics.
Operational notes and user preferences
- Accuracy is prioritized over aggressive optimization.
- Bootstrap path must remain available.
- Incremental path should be additive (it must not break the full-scan path).
- Parser repo git history is source-of-truth for delta resolve.
- RabbitMQ chosen as queueing backbone.