Service Spec: notification-service

What this service is

A queue consumer. It has no HTTP API (except a /health endpoint). It listens to the notifications BullMQ queue and sends emails, stores in-app notifications, and fires outbound webhooks.

It never decides when to notify. core-api decides that and enqueues the job. This service only executes the send.


Tech stack

  • Runtime: Node 20
  • Language: TypeScript (strict mode)
  • Queue consumer: bullmq Worker
  • Email: resend SDK (primary), fallback-ready interface
  • Database: Prisma (to write in-app notifications to DB)
  • Redis: ioredis (BullMQ connection)

Folder structure

apps/notification-service/
├── src/
│   ├── main.ts               # start worker + health server
│   ├── worker.ts             # BullMQ Worker setup
│   ├── processors/
│   │   ├── send-email.ts
│   │   ├── send-in-app.ts
│   │   └── trigger-webhook.ts
│   ├── templates/
│   │   ├── otp.ts
│   │   ├── application-status.ts
│   │   ├── job-posted.ts
│   │   ├── verification-approved.ts
│   │   ├── verification-rejected.ts
│   │   └── announcement.ts
│   ├── plugins/
│   │   ├── redis.ts
│   │   └── prisma.ts
│   └── health.ts             # minimal Fastify server just for /health
├── Dockerfile
├── tsconfig.json
└── package.json

Queue setup

// worker.ts
import { Worker } from 'bullmq';
 
const worker = new Worker(
  'notifications',
  async (job) => {
    switch (job.name) {
      case 'send-email':     return processSendEmail(job.data);
      case 'send-in-app':    return processSendInApp(job.data);
      case 'trigger-webhook': return processTriggerWebhook(job.data);
      default: throw new Error(`Unknown job type: ${job.name}`);
    }
  },
  {
    connection: redisConnection,
    concurrency: 10,    // process up to 10 jobs simultaneously
    limiter: {
      max: 100,
      duration: 1000,   // max 100 jobs/sec (Resend rate limit headroom)
    },
  }
);
 
worker.on('failed', (job, err) => {
  console.error(`Job ${job?.id} failed:`, err.message);
  // dead-letter: after 3 retries, job moves to failed queue (BullMQ built-in)
});

Job types and payloads

send-email

interface SendEmailJob {
  tenantId: string;
  to: string | string[];
  template: EmailTemplate;
  data: Record<string, any>;   // template-specific variables
  replyTo?: string;
  cc?: string[];
}
 
type EmailTemplate =
  | 'otp'
  | 'application-submitted'
  | 'application-status-changed'
  | 'job-posted'
  | 'verification-approved'
  | 'verification-rejected'
  | 'announcement'
  | 'cycle-enrollment'
  | 'event-reminder'
  | 'debarment-notice';

send-in-app

interface SendInAppJob {
  tenantId: string;
  userId: string;
  title: string;
  body: string;
  link?: string;    // deep link to relevant page
  type: 'info' | 'success' | 'warning' | 'action_required';
}
// Writes to `notifications` table in DB. Frontend polls or uses SSE.

trigger-webhook

interface TriggerWebhookJob {
  tenantId: string;
  url: string;
  secret: string;      // HMAC-SHA256 signing key
  event: string;       // e.g. "application.status_changed"
  payload: Record<string, any>;
  retryCount?: number;
}
// Signs payload with HMAC-SHA256, POSTs to url
// Verifies 2xx response, else BullMQ will retry

Email templates

All templates are TypeScript functions returning { subject, html, text }. No external template engine — keep it simple.

// templates/otp.ts
export function otpTemplate(data: { otp: string; expiryMinutes: number; tenantName: string }) {
  return {
    subject: `Your ${data.tenantName} login code`,
    html: `
      <p>Your one-time login code is:</p>
      <h1 style="letter-spacing: 0.3em">${data.otp}</h1>
      <p>Expires in ${data.expiryMinutes} minutes. Do not share this code.</p>
    `,
    text: `Your login code is ${data.otp}. Expires in ${data.expiryMinutes} minutes.`
  };
}

Retry strategy

BullMQ handles retries automatically. Config per job type:

// Defined in packages/queue/src/jobs.ts (shared with core-api)
export const emailJobOptions = {
  attempts: 3,
  backoff: { type: 'exponential', delay: 2000 },  // 2s, 4s, 8s
  removeOnComplete: { age: 86400 },    // keep completed jobs for 24h
  removeOnFail: { age: 604800 },       // keep failed jobs for 7 days
};
 
export const webhookJobOptions = {
  attempts: 5,
  backoff: { type: 'exponential', delay: 5000 },  // 5s, 10s, 20s, 40s, 80s
  removeOnComplete: { age: 3600 },
  removeOnFail: { age: 604800 },
};

Health check

Minimal Fastify server on port 4001:

GET /health → { status: "ok", queueDepth: N, workerActive: true }

queueDepth = number of jobs currently waiting in queue (fetched from BullMQ). Alert if this grows above 1000 (means worker is falling behind).