Building a Task Queue System

Go + Redis + PostgreSQL. A running log of what's shipped, what's next, and why. Last updated: April 2026

What Is This?

A resilient background task processing system built entirely in Go. The idea is simple: producers submit tasks over HTTP, tasks are queued in Redis by priority, and a pool of workers picks them up, executes them, retries on failure, and chains follow-up tasks — all while keeping a durable record of every attempt in PostgreSQL.

It's a real alternative to reaching for SQS + Lambda or Celery. One binary, two data stores (Redis for speed, Postgres for truth), and enough features to handle production workloads: scheduling, chaining, cancellation, idempotency, and observability out of the box.

How a Task Moves Through the System

01. POST /tasks
The HTTP handler validates the request, runs an idempotency check against PostgreSQL (same type + payload + priority = reject with 409), saves the event, then enqueues it. A 202 is returned only after both writes succeed.

02. Redis priority queue
Tasks land in one of three Redis lists: events_queue:high, events_queue:medium, or events_queue:low. Scheduled tasks go into a sorted set (events_queue:scheduled) keyed by execute_at. A separate goroutine moves scheduled tasks to the main queues when their time comes.

03. BRPOP dequeue
Workers use BRPOP across the three priority lists in order (high → medium → low), which means the call blocks until a task is available and always drains high-priority work first. No polling loop, no busy-waiting.

04. Fetch state, check cancellation
The worker fetches the full event from PostgreSQL and checks its status. If a DELETE /tasks/:id was called earlier, the status is cancelled and the task is skipped cleanly, with no partial execution.

05. Execute handler
The task type determines which handler runs. Each attempt is logged to event_delivery_logs in Postgres with its status, duration, and error message, if any. A trace ID from the original request is propagated through all log entries.

06. Success → chain. Failure → retry → DLQ.
On success, if the task has a next field, the chained task is enqueued immediately and inherits the same parent_id. On failure, exponential backoff kicks in (1s → 2s → 4s → 8s → 16s). After 5 attempts the task is moved to events_queue:dlq and marked failed in Postgres.

What's Shipped

✅ HTTP API (POST /tasks, DELETE /tasks/:id): 202 on accept, 409 on duplicate or bad cancel

✅ Priority queues (high / medium / low): Workers always drain high before touching low

✅ Scheduled tasks: execute_at defers a task; sorted set + mover goroutine handles promotion

✅ Task chaining: next field is recursive; chain shares a parent_id back to the root task

✅ Task cancellation: DELETE marks status cancelled; in-flight check skips it before execution

✅ Idempotency: type + payload + priority fingerprint stored in idempotency_keys table

✅ Retry with exponential backoff: 1s → 2s → 4s → 8s → 16s, max 5 attempts

✅ Dead letter queue: events_queue:dlq in Redis; status = failed in Postgres

✅ Delivery logs: Every attempt written to event_delivery_logs with trace ID

✅ Trace IDs: Injected at middleware layer, propagated through all logs and context

✅ Prometheus metrics at /metrics: Queue depth, processing latency, retry and failure rates

✅ Admin dashboard: live queue stats, DLQ inspector, delivery log viewer

What's Next: Real Job Handlers

The worker pool, queues, retry logic, and dashboard all work. The task handlers are currently stubs. Next milestone is replacing them with implementations that actually do something useful.

🔄 Image Processing

resize_image

payload: image_url, width, height

Use the imaging library to fetch the image from the URL, resize it to the requested dimensions, and store the output. The real challenge here is handling remote fetch errors gracefully and not leaving the worker hanging on a slow URL — context with timeout will be the first thing I wire in.

⏳ Email Sending

send_email

payload: to, subject, body

Integrate with SendGrid or Mailgun via their REST API. Emails are a natural fit for the retry system — transient API failures (rate limits, 5xx) should retry with backoff, but hard bounces (invalid address) should go straight to DLQ without retrying.
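
One way to express that split is to classify the provider's HTTP status before deciding whether to retry. The sketch below is illustrative only: the outcome type and classifyEmailResponse are hypothetical names, and it assumes the provider signals permanent failures with a 4xx status other than 429. Real providers may report hard bounces differently, so the mapping would need checking against their documentation.

```go
package main

import "net/http"

// outcome is what the worker should do with the task after a send attempt.
type outcome int

const (
	delivered  outcome = iota // provider accepted the message
	retryLater                // transient failure: retry with backoff
	deadLetter                // permanent failure: skip retries, go to the DLQ
)

// classifyEmailResponse maps an HTTP status from the email API to an outcome.
func classifyEmailResponse(status int) outcome {
	switch {
	case status >= 200 && status < 300:
		return delivered
	case status == http.StatusTooManyRequests, status >= 500:
		return retryLater
	default:
		// Assumption: any other status means the request itself is bad.
		return deadLetter
	}
}
```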

⏳ PDF Report Generation

generate_report

payload: date

Use go-pdf to generate a formatted report for the given date — pull data from Postgres, render it into a PDF, and store or return the result. This one is also a natural chain target: generate_report often follows a scrape_url or resize_image task.

⏳ Webhook Delivery

deliver_webhook

payload: url, event_type, data

POST to an external URL with a JSON payload and a signature header. Webhook delivery needs its own retry behavior layered on top of the queue's retry — a 5xx from the target should retry; a 4xx should not. The delivery log already captures response codes, so distinguishing the two is straightforward.
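
The signature header is not specified above, so the following is only one plausible scheme: an HMAC-SHA256 of the raw JSON body, hex-encoded. The function names and the idea of a per-subscriber shared secret are assumptions, not part of the current design.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
)

// signPayload returns the hex-encoded HMAC-SHA256 of body under secret.
// The sender would place this value in a signature header on the POST.
func signPayload(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// verifySignature is what a receiver would run: it recomputes the HMAC
// over the received body and compares it in constant time.
func verifySignature(secret, body []byte, signature string) bool {
	got, err := hex.DecodeString(signature)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(got, mac.Sum(nil))
}
```

The signature must be computed over the exact bytes that are sent, so the payload should be marshalled once and the same byte slice used for both signing and the request body.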

Tech Stack & Why

Go

Goroutines make the worker pool trivial. BRPOP blocking in a goroutine is essentially free. The standard library handles HTTP, context propagation, and signals, so very few dependencies are needed.

Redis

Lists for priority queues, sorted sets for scheduling. BRPOP is a single atomic blocking pop, which is exactly what a worker pool needs. The trade-off is at-most-once delivery: once BRPOP hands a task to a worker, Redis no longer holds it, and Redis is not durable by default. That is acceptable here because Postgres holds the ground truth.
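
To make the priority behavior concrete, here is a small in-memory model of how BRPOP treats multiple keys: it checks them in the order given and pops from the tail of the first non-empty list. This is not Redis client code and it does not block. popFirstNonEmpty is a hypothetical stand-in that only illustrates why listing the high queue first drains it first.

```go
package main

// priorityKeys is the key order a worker would pass to BRPOP.
var priorityKeys = []string{
	"events_queue:high",
	"events_queue:medium",
	"events_queue:low",
}

// popFirstNonEmpty models BRPOP's key-order rule on plain slices.
// Index 0 is the head of each list; the pop takes the last element (the tail),
// so producers that push at the head get FIFO order.
func popFirstNonEmpty(lists map[string][]string, keys ...string) (key, value string, ok bool) {
	for _, k := range keys {
		l := lists[k]
		if len(l) == 0 {
			continue
		}
		value = l[len(l)-1]
		lists[k] = l[:len(l)-1]
		return k, value, true
	}
	// Real BRPOP would block here until a push arrives or the timeout expires.
	return "", "", false
}
```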

PostgreSQL

Durable state store. events holds task state, event_delivery_logs records every attempt with trace ID and error, idempotency_keys deduplicates requests. Redis moves fast; Postgres keeps the record.

sqlc

Write real SQL, get type-safe Go structs back at compile time. No ORM, no runtime reflection. If a query is wrong it fails before the binary is built.

Goose

File-based migrations, numbered and committed to the repo. Rolling forward or back is a single CLI command. No migration state stored outside the database.

Prometheus

Counters, gauges, and histograms exposed at /metrics. Queue depth, processing latency, and DLQ growth are the three numbers I care most about in production.

Decisions Worth Explaining

Redis for queues, not Postgres FOR UPDATE SKIP LOCKED

The common Go pattern uses Postgres row-level locks for queuing. I chose Redis lists + BRPOP instead because BRPOP blocks without polling, priority ordering across three lists is a single call, and scheduled promotion is a ZRANGEBYSCORE on a sorted set. The trade-off is durability: if a worker crashes or Redis goes down between dequeue and DB update, a task could be lost. Acceptable at this stage; I'd switch to Redis Streams or Kafka to close that gap in production.

parent_id = root task ID for chains

When task A chains to B chains to C, all three share A's ID as parent_id. This means querying the full history of a pipeline is one indexed lookup — no recursive CTE needed. Slightly denormalized, but the query simplicity is worth it.
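
A rough sketch of how the shared parent_id could be carried along a chain follows. The Task struct, newRoot, and chainNext are hypothetical and reduced to the fields needed for the illustration; the real events table certainly holds more. It follows the reading that the root task's parent_id is its own ID.

```go
package main

// Task is a pared-down, hypothetical task record.
type Task struct {
	ID       string
	ParentID string
	Type     string
	Next     *Task // follow-up task spec, nil at the end of a chain
}

// newRoot creates the first task of a chain. Its ParentID is its own ID,
// so every task in the pipeline can be found with one lookup on parent_id.
func newRoot(id, taskType string, next *Task) Task {
	return Task{ID: id, ParentID: id, Type: taskType, Next: next}
}

// chainNext builds the follow-up task once done has succeeded.
// The new task gets a fresh ID but copies the root's ID as ParentID.
// ok is false when done has no next field.
func chainNext(done Task, newID string) (next Task, ok bool) {
	if done.Next == nil {
		return Task{}, false
	}
	next = *done.Next
	next.ID = newID
	next.ParentID = done.ParentID
	return next, true
}
```

With this shape, the pipeline history is a single query filtering on parent_id and ordering by creation time.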

Idempotency keys in Postgres, not Redis

Redis keys expire. If a duplicate request arrives after the TTL, it would be treated as new. Postgres rows persist until explicitly deleted, which makes the dedup guarantee much stronger. Extra DB read on every create is the cost — acceptable given creates are not the hot path.

Fixed worker pool size

A fixed pool means predictable DB connection usage. I size the pool to match the Postgres connection pool limit. Dynamic scaling sounds nice but adds complexity (scale-up triggers, scale-down teardown) that I don't need until I actually hit concurrency limits.