Building a Task Queue System
Go + Redis + PostgreSQL. A running log of what's shipped, what's next, and why. Last updated: April 2026
What Is This?
A resilient background task processing system built entirely in Go. The idea is simple: producers submit tasks over HTTP, tasks are queued in Redis by priority, and a pool of workers picks them up, executes them, retries on failure, and chains follow-up tasks — all while keeping a durable record of every attempt in PostgreSQL.
It's a real alternative to reaching for SQS + Lambda or Celery. One binary, two data stores (Redis for speed, Postgres for truth), and enough features to handle production workloads: scheduling, chaining, cancellation, idempotency, and observability out of the box.
How a Task Moves Through the System
- 01
POST /tasks
The HTTP handler validates the request, runs an idempotency check against PostgreSQL (same type + payload + priority = reject with 409), saves the event, then enqueues it. A 200 is returned only after both writes succeed.
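The write-up says duplicates are detected on type + payload + priority, but it doesn't say how that comparison is stored. One option is to hash a canonical encoding of the three fields. This sketch assumes that approach and uses hypothetical names (`canonicalJSON`, `idempotencyKey`). With it, the same payload sent with different key order or whitespace maps to the same key.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// canonicalJSON re-encodes a payload so semantically equal JSON (different
// key order or whitespace) yields identical bytes. encoding/json sorts map
// keys when marshalling, and UseNumber keeps numbers exactly as written
// instead of round-tripping them through float64.
func canonicalJSON(payload []byte) ([]byte, error) {
	dec := json.NewDecoder(bytes.NewReader(payload))
	dec.UseNumber()
	var v any
	if err := dec.Decode(&v); err != nil {
		return nil, fmt.Errorf("decode payload: %w", err)
	}
	return json.Marshal(v)
}

// idempotencyKey is a hypothetical dedup key over type + priority + payload.
func idempotencyKey(taskType, priority string, payload []byte) (string, error) {
	canon, err := canonicalJSON(payload)
	if err != nil {
		return "", err
	}
	h := sha256.New()
	// Length-prefix each field so ("ab", "c") and ("a", "bc") cannot collide.
	for _, part := range [][]byte{[]byte(taskType), []byte(priority), canon} {
		fmt.Fprintf(h, "%d:", len(part))
		h.Write(part)
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	key, err := idempotencyKey("resize_image", "medium",
		[]byte(`{"image_url":"https://example.com/a.png","width":800,"height":600}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("idempotency key:", key)
}
```

A unique constraint on this key in Postgres would then turn a second identical POST into the 409 path.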
- 02
Redis priority queue
Tasks land in one of three Redis lists: events_queue:high, events_queue:medium, or events_queue:low. Scheduled tasks go into a sorted set (events_queue:scheduled) scored by execute_at. A separate goroutine moves them to the main queues once that time arrives.
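The promotion step can be modelled in memory to show the sorted-set semantics. A member's score is its execute_at, and everything with a score at or below "now" (the range ZRANGEBYSCORE returns for `-inf` to `now`) moves onto its priority list. The names `scheduledTask`, `queueKey` and `promoteDue` are hypothetical, and execute_at is assumed to be stored as Unix seconds. Against real Redis, the remove-and-push should happen atomically (for example in a Lua script) if more than one promoter can run at once.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// scheduledTask mirrors one member of events_queue:scheduled: the member is
// the task ID and the score is execute_at (assumed Unix seconds).
type scheduledTask struct {
	ID       string
	Priority string // "high", "medium" or "low"
	RunAt    int64
}

// queueKey maps a priority to its list name from the write-up.
func queueKey(priority string) string {
	return "events_queue:" + priority
}

// promoteDue moves every task with RunAt <= now onto its priority list and
// returns what is still scheduled. It is the in-memory analogue of
// ZRANGEBYSCORE key -inf now, followed by ZREM + LPUSH for each result.
func promoteDue(scheduled []scheduledTask, queues map[string][]string, now int64) []scheduledTask {
	sort.Slice(scheduled, func(i, j int) bool { return scheduled[i].RunAt < scheduled[j].RunAt })
	i := 0
	for ; i < len(scheduled) && scheduled[i].RunAt <= now; i++ {
		key := queueKey(scheduled[i].Priority)
		// LPUSH prepends; BRPOP pops the tail, so the earliest task runs first.
		queues[key] = append([]string{scheduled[i].ID}, queues[key]...)
	}
	return scheduled[i:]
}

func main() {
	now := time.Now().Unix()
	queues := map[string][]string{}
	scheduled := []scheduledTask{
		{ID: "report-1", Priority: "low", RunAt: now - 5},
		{ID: "email-1", Priority: "high", RunAt: now + 60},
	}
	scheduled = promoteDue(scheduled, queues, now)
	fmt.Println("promoted:", queues)
	fmt.Println("still scheduled:", len(scheduled))
}
```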
- 03
BRPOP dequeue
Workers use BRPOP across the three priority lists in order (high → medium → low), which means the call blocks until a task is available and always drains high-priority work first. No polling loop, no busy-waiting.
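The ordering rule works like this. BRPOP checks its keys left to right and pops from the tail of the first non-empty list. With go-redis v9 the call would be `rdb.BRPop(ctx, 0, priorityKeys...)`, where a timeout of 0 blocks indefinitely and the reply is the `[key, value]` pair. The stdlib-only model below (`popFirst` is a hypothetical name) covers just the case where work is already waiting, so it returns `ok == false` where real BRPOP would block.

```go
package main

import "fmt"

// priorityKeys is the key order workers pass to BRPOP.
var priorityKeys = []string{"events_queue:high", "events_queue:medium", "events_queue:low"}

// popFirst pops the tail element of the first non-empty list in key order,
// which is the element a multi-key BRPOP would return immediately.
func popFirst(queues map[string][]string, keys []string) (key, task string, ok bool) {
	for _, k := range keys {
		q := queues[k]
		if len(q) == 0 {
			continue
		}
		task = q[len(q)-1]
		queues[k] = q[:len(q)-1]
		return k, task, true
	}
	return "", "", false
}

func main() {
	queues := map[string][]string{
		"events_queue:low":  {"l1"},
		"events_queue:high": {"h2", "h1"}, // h1 was LPUSHed first
	}
	for {
		key, task, ok := popFirst(queues, priorityKeys)
		if !ok {
			break
		}
		fmt.Println(key, task)
	}
}
```

Strict ordering has a known cost: under sustained high-priority load, the low list can starve.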
- 04
Fetch state, check cancellation
The worker fetches the full event from PostgreSQL and checks its status. If a DELETE /tasks/:id was called earlier, the status is cancelled and the task is skipped cleanly — no partial execution.
- 05
Execute handler
The task type determines which handler runs. Each attempt is logged to event_delivery_logs in Postgres — status, duration, error message if any. A trace ID from the original request is propagated through all log entries.
- 06
Success → chain. Failure → retry → DLQ.
On success, if the task has a next field, the chained task is enqueued immediately and inherits the same parent_id. On failure, exponential backoff kicks in (1s → 2s → 4s → 8s → 16s). After 5 attempts the task is moved to events_queue:dlq and marked failed in Postgres.
What's Shipped
What's Next: Real Job Handlers
The worker pool, queues, retry logic, and dashboard all work. The task handlers are currently stubs. Next milestone is replacing them with implementations that actually do something useful.
Image Processing
resize_image (payload: image_url, width, height)
Use the imaging library to fetch the image from the URL, resize it to the requested dimensions, and store the output. The real challenge here is handling remote fetch errors gracefully and not leaving the worker hanging on a slow URL — context with timeout will be the first thing I wire in.
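The timeout idea can be sketched with only the standard library. The resize step is left out here, since it would go through the imaging library. The names `resizePayload`, `parseResizePayload` and `fetchImage` are hypothetical, as are the body-size cap and the example timeout. The point is that the context deadline bounds the whole request, so a slow URL fails the attempt instead of pinning a worker.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"time"
)

// resizePayload matches the resize_image payload fields listed above.
type resizePayload struct {
	ImageURL string `json:"image_url"`
	Width    int    `json:"width"`
	Height   int    `json:"height"`
}

// parseResizePayload rejects payloads that could never succeed, so they
// fail fast instead of burning retries.
func parseResizePayload(raw []byte) (resizePayload, error) {
	var p resizePayload
	if err := json.Unmarshal(raw, &p); err != nil {
		return p, fmt.Errorf("decode payload: %w", err)
	}
	u, err := url.Parse(p.ImageURL)
	if err != nil || (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" {
		return p, errors.New("image_url must be an absolute http(s) URL")
	}
	if p.Width <= 0 || p.Height <= 0 {
		return p, errors.New("width and height must be positive")
	}
	return p, nil
}

// fetchImage downloads the source image, giving up after timeout.
// maxBytes is a hypothetical cap; LimitReader truncates silently, so an
// oversized image would then fail to decode rather than exhaust memory.
func fetchImage(ctx context.Context, rawURL string, timeout time.Duration, maxBytes int64) ([]byte, error) {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, rawURL, nil)
	if err != nil {
		return nil, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, fmt.Errorf("fetch %s: %w", rawURL, err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("fetch %s: unexpected status %d", rawURL, resp.StatusCode)
	}
	return io.ReadAll(io.LimitReader(resp.Body, maxBytes))
}

func main() {
	p, err := parseResizePayload([]byte(`{"image_url":"https://example.com/cat.png","width":800,"height":600}`))
	if err != nil {
		fmt.Println("invalid payload:", err)
		return
	}
	fmt.Printf("resize %s to %dx%d\n", p.ImageURL, p.Width, p.Height)
	// In a worker: data, err := fetchImage(ctx, p.ImageURL, 10*time.Second, 20<<20)
}
```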
Email Sending
send_email (payload: to, subject, body)
Integrate with SendGrid or Mailgun via their REST API. Emails are a natural fit for the retry system — transient API failures (rate limits, 5xx) should retry with backoff, but hard bounces (invalid address) should go straight to DLQ without retrying.
PDF Report Generation
generate_report (payload: date)
Use go-pdf to generate a formatted report for the given date — pull data from Postgres, render it into a PDF, and store or return the result. This one is also a natural chain target: generate_report often follows a scrape_url or resize_image task.
Webhook Delivery
deliver_webhook (payload: url, event_type, data)
POST to an external URL with a JSON payload and a signature header. Webhook delivery needs its own retry behavior layered on top of the queue's retry — a 5xx from the target should retry; a 4xx should not. The delivery log already captures response codes, so distinguishing the two is straightforward.
Tech Stack & Why
Goroutines make the worker pool trivial. BRPOP blocking in a goroutine is essentially free. The standard library handles HTTP, context propagation, and signals — very few dependencies needed.
Lists for priority queues, sorted sets for scheduling. BRPOP is a single atomic blocking pop, exactly what a worker pool needs. The trade-off is at-most-once delivery: once a task is popped it exists only in the worker's memory, so a crash mid-task drops it from Redis (and Redis persistence is not durable by default). That is acceptable here because Postgres holds the ground truth.
Durable state store. events holds task state, event_delivery_logs records every attempt with trace ID and error, idempotency_keys deduplicates requests. Redis moves fast; Postgres keeps the record.
Write real SQL, get type-safe Go structs back at compile time. No ORM, no runtime reflection. If a query is wrong it fails before the binary is built.
File-based migrations, numbered and committed to the repo. Rolling forward or back is a single CLI command. No migration state stored outside the database.
Counters and histograms exposed at /metrics. Queue depth, processing latency, and DLQ growth are the three numbers I care most about in production.
Decisions Worth Explaining
Redis for queues, not Postgres FOR UPDATE SKIP LOCKED
A common Go pattern uses Postgres row-level locks (SELECT ... FOR UPDATE SKIP LOCKED) for queuing. I chose Redis lists + BRPOP instead because BRPOP blocks without polling, priority ordering across three lists is a single call, and scheduled promotion is a single ZRANGEBYSCORE on a sorted set. The trade-off is durability: if a worker crashes or Redis goes down between dequeue and the DB update, a task could be lost. That's acceptable at this stage; I'd switch to Redis Streams or Kafka to close that gap in production.
parent_id = root task ID for chains
When task A chains to B, and B chains to C, all three share A's ID as parent_id. Querying the full history of a pipeline is then a single indexed lookup on parent_id, with no recursive CTE needed. It's slightly denormalized, but the query simplicity is worth it.
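The parent_id rule fits in two constructors. A root task points at itself, and every chained task copies its predecessor's parent_id rather than its ID. `Task`, `newRoot` and `chain` are hypothetical shapes, and the query in the comment assumes column names matching the write-up.

```go
package main

import "fmt"

// Task carries only the fields needed to show chaining.
type Task struct {
	ID       string
	Type     string
	ParentID string
}

// newRoot starts a pipeline: the root's parent_id is its own ID.
func newRoot(id, taskType string) Task {
	return Task{ID: id, Type: taskType, ParentID: id}
}

// chain builds the follow-up task. It inherits prev.ParentID (the root ID),
// so one "SELECT ... FROM events WHERE parent_id = $1" returns the whole pipeline.
func chain(prev Task, id, taskType string) Task {
	return Task{ID: id, Type: taskType, ParentID: prev.ParentID}
}

func main() {
	a := newRoot("a", "scrape_url")
	b := chain(a, "b", "resize_image")
	c := chain(b, "c", "generate_report")
	for _, t := range []Task{a, b, c} {
		fmt.Printf("%s (%s) parent_id=%s\n", t.ID, t.Type, t.ParentID)
	}
}
```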
Idempotency keys in Postgres, not Redis
Dedup keys in Redis typically carry a TTL, so a duplicate request that arrives after the key expires would be treated as new. Postgres rows persist until explicitly deleted, which makes the dedup guarantee much stronger. The cost is an extra DB read on every create, which is acceptable given creates are not the hot path.
Fixed worker pool size
A fixed pool means predictable DB connection usage. I size the pool to match the Postgres connection pool limit. Dynamic scaling sounds nice but adds complexity (scale-up triggers, scale-down teardown) that I don't need until I actually hit concurrency limits.