Payment Webhook Handling
Payment webhooks define the precise ingress boundary between external financial processors and the Registration Ingestion & Payment Reconciliation subsystem. In event operations, this boundary determines whether a successful transaction transitions into a confirmed attendee record or remains in a pending reconciliation state. Unlike synchronous form submissions, webhooks arrive asynchronously and are delivered at least once, so the same event may arrive more than once. Deterministic, idempotent handling is what turns at-least-once delivery into effectively exactly-once processing. This pipeline stage isolates cryptographic verification, payload normalization, idempotency enforcement, and immediate acknowledgment. It explicitly terminates before badge generation, email routing, or financial reporting begin, ensuring clean separation of concerns across the automation stack.
Pipeline Boundary & Scope
The webhook handler acts as a stateless gatekeeper. Its sole responsibility is to validate, secure, and normalize incoming financial events before delegating downstream state mutations. Any logic involving attendee provisioning, seat allocation, or receipt generation violates the boundary contract and must be routed to subsequent pipeline stages. This strict isolation prevents cascading failures: if badge rendering services degrade, the webhook boundary continues to accept and acknowledge payments, preserving financial ledger integrity while downstream queues absorb the load.
Data Contract & Schema Enforcement
Incoming events must conform to a rigid internal schema that standardizes provider-specific variations into a canonical format. The contract requires the following fields:
- provider_event_id (string, UUID or alphanumeric)
- event_type (enum: charge.succeeded, payment.failed, refund.processed)
- payment_status (enum: completed, pending, failed)
- customer_email (string, RFC 5322 compliant)
- transaction_amount (integer, minor units/cents)
- currency (string, ISO 4217)
- metadata (object containing ticket_tier and registration_session_id)
Validation occurs synchronously at the endpoint using a schema enforcement layer. Fields failing type checks, missing required keys, or containing out-of-range values trigger an immediate 400 Bad Request response. The handler logs the rejection with a structured error code (ERR_SCHEMA_VIOLATION) and discards the payload, preserving queue capacity for valid events. This contract acts as the first gatekeeper, ensuring that only semantically complete payment confirmations proceed to state mutation.
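The normalization step that feeds this contract can be sketched as a plain mapping function. This is a minimal illustration, not the subsystem's actual transformer: the function name, the status map, and the nested `data.object` layout (modeled on a Stripe-style charge event) are assumptions made for the example.

```python
from typing import Any

# Hypothetical mapping from a provider's charge status to the canonical
# payment_status enum; real providers may use different status strings.
_STATUS_MAP = {"succeeded": "completed", "pending": "pending", "failed": "failed"}


def normalize_stripe_style_event(event: dict[str, Any]) -> dict[str, Any]:
    """Map a provider event (Stripe-style layout assumed) onto the canonical contract.

    Raises KeyError when a required field is missing, which the caller can
    translate into a 400 response with ERR_SCHEMA_VIOLATION.
    """
    obj = event["data"]["object"]
    meta = obj["metadata"]
    return {
        "provider_event_id": event["id"],
        "event_type": event["type"],
        "payment_status": _STATUS_MAP[obj["status"]],
        "customer_email": obj["receipt_email"],
        "transaction_amount": obj["amount"],  # assumed to already be in minor units
        "currency": obj["currency"].upper(),  # canonical form uses upper-case ISO 4217
        "metadata": {
            "ticket_tier": meta["ticket_tier"],
            "registration_session_id": meta["registration_session_id"],
        },
    }
```

Keeping the mapping free of I/O means it can be unit-tested against recorded provider payloads without a running endpoint.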
Cryptographic Verification & Idempotency Enforcement
Security at the webhook boundary relies on cryptographic signature verification before any business logic executes. The handler must reconstruct the signature using the raw request body, timestamp, and a provider-managed signing secret. Mismatched signatures or expired timestamps (typically >5 minutes) result in a 401 Unauthorized response, preventing spoofed events from corrupting registration ledgers. Implementation follows the HMAC-SHA256 standard and aligns with provider-specific requirements like Stripe’s signature verification protocol.
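For local testing it helps to be able to produce a correctly signed request. The sketch below uses only the standard library and assumes the signing scheme is HMAC-SHA256 over `{timestamp}.{raw_body}` with a `v1=` prefix; real providers differ in header names and formats, so treat `sign_payload` and `is_valid` as hypothetical helpers rather than any provider's API.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign_payload(secret: str, raw_body: bytes, timestamp: int) -> str:
    """Return a 'v1=<hex digest>' signature over '<timestamp>.<raw body>'."""
    signed = f"{timestamp}.".encode() + raw_body
    digest = hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    return f"v1={digest}"


def is_valid(
    secret: str,
    raw_body: bytes,
    signature: str,
    timestamp: int,
    now: Optional[float] = None,
    tolerance_seconds: int = 300,
) -> bool:
    """Reject stale timestamps first, then compare digests in constant time."""
    current = time.time() if now is None else now
    if abs(current - timestamp) > tolerance_seconds:
        return False
    expected = sign_payload(secret, raw_body, timestamp)
    # Compare as bytes so a non-ASCII header value cannot raise TypeError.
    return hmac.compare_digest(signature.encode(), expected.encode())
```

Because the digest covers the exact bytes received, even a whitespace change introduced by re-serializing the JSON invalidates the signature. That is why verification must run against the raw body rather than a parsed object.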
Once verified, the system enforces idempotency using the provider’s unique event identifier. A distributed idempotency store (Redis or PostgreSQL) tracks processed event IDs with a TTL matching the provider’s retry window (usually 72 hours). Duplicate deliveries return a 200 OK immediately without re-executing downstream logic. Provided the existence check and the write happen as a single atomic operation, this pattern eliminates race conditions during provider retries and network partitions, maintaining consistent registration counts even under high-throughput event surges. Provider-specific payload transformations are documented in the Handling Stripe Payment Webhooks for Ticket Purchases reference.
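The idempotency rule reduces to a single "claim" operation: record the event ID only if it is not already present, and report whether the claim succeeded. The class below is an in-memory stand-in written for illustration (its name and method are assumptions, and it is safe only within one process). Against Redis, the equivalent atomic operation is `SET key value NX EX ttl`, which redis-py exposes as `client.set(key, value, nx=True, ex=ttl)`.

```python
import time
from typing import Optional


class InMemoryIdempotencyStore:
    """Single-process stand-in for a distributed idempotency store."""

    def __init__(self) -> None:
        self._expires_at: dict[str, float] = {}

    def claim(self, event_id: str, ttl_seconds: int, now: Optional[float] = None) -> bool:
        """Return True if this call claimed the event, False if it is a duplicate."""
        current = time.time() if now is None else now
        expiry = self._expires_at.get(event_id)
        if expiry is not None and expiry > current:
            return False  # still inside the retry window: treat as duplicate
        self._expires_at[event_id] = current + ttl_seconds
        return True


store = InMemoryIdempotencyStore()
first = store.claim("evt_123", ttl_seconds=259200, now=0.0)    # first delivery
retry = store.claim("evt_123", ttl_seconds=259200, now=60.0)   # provider retry
```

A separate "exists, then set" sequence leaves a window in which two concurrent deliveries both observe a missing key. Folding the check and the write into one call closes that window.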
Production Implementation
The following FastAPI implementation demonstrates a production-ready webhook boundary. It enforces raw-body signature verification, Pydantic schema validation, Redis-backed idempotency, and immediate acknowledgment before delegating to background workers.
```python
import hashlib
import hmac
import logging
import time
from contextlib import asynccontextmanager
from typing import Literal

from fastapi import BackgroundTasks, FastAPI, HTTPException, Request, Response
from pydantic import BaseModel, EmailStr, Field, ValidationError
from redis import Redis

logger = logging.getLogger("payment_webhook")


# --- Data Contract ---
class Metadata(BaseModel):
    ticket_tier: str
    registration_session_id: str


class PaymentWebhookPayload(BaseModel):
    provider_event_id: str = Field(..., min_length=1)
    # Enum values mirror the data contract above.
    event_type: Literal["charge.succeeded", "payment.failed", "refund.processed"]
    payment_status: Literal["completed", "pending", "failed"]
    customer_email: EmailStr
    transaction_amount: int = Field(..., gt=0)  # minor units (cents)
    currency: str = Field(..., min_length=3, max_length=3)  # ISO 4217
    metadata: Metadata


# --- Infrastructure (Swap for production clients) ---
# Note: this synchronous client blocks the event loop for the duration of each call.
redis_client = Redis(host="localhost", port=6379, db=0, decode_responses=True)
SIGNING_SECRET = "whsec_production_secret_here"  # placeholder; never hard-code a real secret
IDEMPOTENCY_TTL = 259200  # 72 hours


@asynccontextmanager
async def lifespan(app: FastAPI):
    logger.info("Webhook boundary initialized. Idempotency TTL: %ds", IDEMPOTENCY_TTL)
    yield


app = FastAPI(lifespan=lifespan)


def verify_signature(raw_body: bytes, signature_header: str, timestamp_header: str) -> bool:
    try:
        ts = int(timestamp_header)
        # Reject events outside the 5-minute replay window.
        if abs(time.time() - ts) > 300:
            return False
        # Sign the raw bytes directly so the digest never depends on text decoding.
        expected = hmac.new(
            SIGNING_SECRET.encode(),
            f"{ts}.".encode() + raw_body,
            hashlib.sha256,
        ).hexdigest()
        # Constant-time comparison; bytes avoid a TypeError on non-ASCII headers.
        return hmac.compare_digest(signature_header.encode(), f"v1={expected}".encode())
    except (ValueError, TypeError):
        return False


@app.post("/webhooks/payment")
async def handle_payment_webhook(request: Request, background_tasks: BackgroundTasks):
    raw_body = await request.body()
    sig_header = request.headers.get("X-Webhook-Signature", "")
    ts_header = request.headers.get("X-Webhook-Timestamp", "")

    # 1. Cryptographic Verification
    if not verify_signature(raw_body, sig_header, ts_header):
        raise HTTPException(status_code=401, detail="ERR_INVALID_SIGNATURE")

    # 2. Schema Enforcement
    try:
        payload = PaymentWebhookPayload.model_validate_json(raw_body)
    except ValidationError as e:
        logger.warning("ERR_SCHEMA_VIOLATION: %s", str(e))
        raise HTTPException(status_code=400, detail="ERR_SCHEMA_VIOLATION")

    # 3. Idempotency Check: SET NX EX claims the key atomically, so two concurrent
    #    deliveries cannot both pass. An unhandled Redis error surfaces as a 500.
    cache_key = f"idempotency:{payload.provider_event_id}"
    claimed = redis_client.set(cache_key, "processed", nx=True, ex=IDEMPOTENCY_TTL)
    if not claimed:
        logger.info("DUPLICATE_EVENT_ACK: %s", payload.provider_event_id)
        return Response(status_code=200)

    # 4. Delegate downstream (Async Batch Processing); runs after the response is sent
    background_tasks.add_task(
        dispatch_to_reconciliation_queue,
        payload.model_dump(mode="json"),
    )

    # 5. ACK immediately
    return Response(status_code=200)


async def dispatch_to_reconciliation_queue(normalized_payload: dict):
    """
    Production deployments must route this to a durable message broker (RabbitMQ, SQS, or Celery).
    This stage explicitly hands off to downstream state machines.
    """
    logger.info("QUEUE_DISPATCH: %s", normalized_payload["provider_event_id"])
    # Implementation routes to [Async Batch Processing](/registration-ingestion-payment-reconciliation/async-batch-processing/)
```
Fallback Chains & Incident Resolution
Production deployments require explicit fallback chains to handle transient infrastructure failures without violating provider retry SLAs. The webhook boundary must never block on downstream service availability.
- Redis/Idempotency Store Failure: If the idempotency check times out, the handler defaults to a 500 Internal Server Error. Providers will retry, and the next attempt will likely succeed. Do not implement local fallback caches; they risk duplicate processing during split-brain scenarios.
- Background Task Dispatch Failure: If the message broker is unreachable, log the payload to a local dead-letter directory with a structured filename ({correlation_id}.json) and trigger a PagerDuty alert. A scheduled reconciliation job will drain the directory once connectivity restores.
- Provider Retry Exhaustion: If a provider stops retrying after a persistent 5xx response, the system relies on Form API Polling Strategies to reconcile missing transactions. Polling acts as a deterministic backstop, querying the provider’s transaction API for unconfirmed session IDs and patching gaps in the ledger.
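The dead-letter fallback for broker outages might look like the following. It is a sketch under stated assumptions: `write_dead_letter` is a hypothetical helper, the directory location is left to the caller, and alerting is omitted. The payload is written to a temporary file and then renamed so the draining job never reads a half-written file.

```python
import json
import os
import tempfile
from pathlib import Path


def write_dead_letter(directory: Path, correlation_id: str, payload: dict) -> Path:
    """Persist a payload as {correlation_id}.json using write-then-rename."""
    directory.mkdir(parents=True, exist_ok=True)
    final_path = directory / f"{correlation_id}.json"
    # Create the temp file in the same directory so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(payload, handle)
        handle.flush()
        os.fsync(handle.fileno())  # push the bytes to disk before the rename
    os.replace(tmp_name, final_path)
    return final_path
```

A draining job can then match only `*.json` files and safely ignore any `.tmp` leftovers from a crashed write.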
Debugging & Observability
Fast incident resolution depends on structured telemetry and explicit correlation identifiers. Every webhook request must emit logs containing:
- correlation_id: UUID generated at ingress
- provider_event_id: Original transaction identifier
- signature_valid: boolean
- schema_errors: array (if applicable)
- idempotency_hit: boolean
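One way to emit these fields consistently is to build every log line through a single helper that serializes to JSON. The helper name and the choice of JSON are assumptions for this sketch; any structured logging library that preserves the same keys would serve equally well.

```python
import json
import uuid
from typing import Optional


def build_webhook_log(
    provider_event_id: Optional[str],
    signature_valid: bool,
    schema_errors: Optional[list] = None,
    idempotency_hit: bool = False,
    correlation_id: Optional[str] = None,
) -> str:
    """Return one JSON log line carrying the required telemetry fields."""
    record = {
        # Generate the correlation ID at ingress unless the caller supplies one.
        "correlation_id": correlation_id or str(uuid.uuid4()),
        "provider_event_id": provider_event_id,
        "signature_valid": signature_valid,
        "schema_errors": schema_errors or [],
        "idempotency_hit": idempotency_hit,
    }
    return json.dumps(record, sort_keys=True)
```

Passing the same `correlation_id` into downstream dispatch lets a single search tie the ingress log line to the queue and reconciliation logs.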
Runbook for Common Incidents:
- High 401 Rate: Verify signing secret rotation. Check clock skew between server and provider. Ensure raw body is not mutated by middleware before verification.
- High 400 Rate: Audit provider payload changes. Validate schema version compatibility. Check for malformed metadata objects from legacy checkout flows.
- Duplicate Processing Detected: Confirm Redis TTL alignment with provider retry windows. Verify hmac.compare_digest is used instead of == to prevent timing attacks.
Metrics should track webhook_ack_latency_p95, idempotency_hit_rate, and schema_rejection_count. Alert thresholds must trigger when rejection rates exceed 2% over a 5-minute window or when acknowledgment latency surpasses 800ms. This observability layer ensures the boundary remains resilient, auditable, and strictly decoupled from downstream event automation pipelines.
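The alert rule can be expressed as a small pure function evaluated once per 5-minute window. This is an illustrative sketch: the function name and its inputs are assumptions, and it reads "rejection rate" as schema rejections divided by total requests in the window.

```python
def should_alert(
    total_requests: int,
    schema_rejections: int,
    ack_latency_p95_ms: float,
    max_rejection_rate: float = 0.02,
    max_latency_ms: float = 800.0,
) -> bool:
    """Return True when either threshold is exceeded for the current window."""
    # An empty window has no rejection rate to speak of; treat it as zero.
    rejection_rate = schema_rejections / total_requests if total_requests else 0.0
    return rejection_rate > max_rejection_rate or ack_latency_p95_ms > max_latency_ms
```

In practice this logic usually lives in the monitoring system's own rule language. A plain function is still useful for unit-testing the thresholds before they are encoded there.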