Service Spec: notification-service
What this service is
A queue consumer. It has no HTTP API (except a /health endpoint). It listens to the notifications BullMQ queue and sends emails, stores in-app notifications, and fires outbound webhooks.
It never decides when to notify. core-api decides that and enqueues the job. This service only executes the send.
Tech stack
- Runtime: Node 20
- Language: TypeScript (strict mode)
- Queue consumer: bullmq Worker
- Email: resend SDK (primary), fallback-ready interface
- Database: Prisma (to write in-app notifications to DB)
- Redis: ioredis (BullMQ connection)
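The "fallback-ready interface" for email could be shaped roughly as below. This is a minimal sketch, not the service's actual code: `EmailProvider`, `EmailMessage`, and `sendWithFallback` are hypothetical names, and a Resend-backed class is assumed to implement the same interface.

```typescript
// Hypothetical message shape; mirrors what a rendered template plus recipients would carry.
interface EmailMessage {
  to: string | string[];
  subject: string;
  html: string;
  text: string;
}

// Hypothetical provider abstraction. A Resend-backed implementation would be
// the primary provider; any other implementation could serve as a fallback.
interface EmailProvider {
  name: string;
  send(message: EmailMessage): Promise<void>;
}

// Tries providers in order and resolves with the name of the first one that succeeds.
// Rejects only if every provider fails, so BullMQ can then retry the job.
export async function sendWithFallback(
  providers: EmailProvider[],
  message: EmailMessage,
): Promise<string> {
  const errors: string[] = [];
  for (const provider of providers) {
    try {
      await provider.send(message);
      return provider.name;
    } catch (err) {
      errors.push(`${provider.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All email providers failed (${errors.join('; ')})`);
}
```

Keeping the processor dependent on the interface rather than on one SDK means a second provider can be added later without touching the job handling code.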
Folder structure
apps/notification-service/
├── src/
│ ├── main.ts # start worker + health server
│ ├── worker.ts # BullMQ Worker setup
│ ├── processors/
│ │ ├── send-email.ts
│ │ ├── send-in-app.ts
│ │ └── trigger-webhook.ts
│ ├── templates/
│ │ ├── otp.ts
│ │ ├── application-status.ts
│ │ ├── job-posted.ts
│ │ ├── verification-approved.ts
│ │ ├── verification-rejected.ts
│ │ └── announcement.ts
│ ├── plugins/
│ │ ├── redis.ts
│ │ └── prisma.ts
│ └── health.ts # minimal Fastify server just for /health
├── Dockerfile
├── tsconfig.json
└── package.json
Queue setup
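// worker.ts
import { Worker } from 'bullmq';
// Import paths follow the folder structure above; the exported names are assumed.
import { redisConnection } from './plugins/redis';
import { processSendEmail } from './processors/send-email';
import { processSendInApp } from './processors/send-in-app';
import { processTriggerWebhook } from './processors/trigger-webhook';
const worker = new Worker(
  'notifications',
  async (job) => {
    // Route each job to its processor by job name
    switch (job.name) {
      case 'send-email': return processSendEmail(job.data);
      case 'send-in-app': return processSendInApp(job.data);
      case 'trigger-webhook': return processTriggerWebhook(job.data);
      default: throw new Error(`Unknown job type: ${job.name}`);
    }
  },
  {
    connection: redisConnection,
    concurrency: 10, // process up to 10 jobs simultaneously
    limiter: {
      max: 100,
      duration: 1000, // max 100 jobs/sec (Resend rate limit headroom)
    },
  }
);
worker.on('failed', (job, err) => {
  console.error(`Job ${job?.id} failed:`, err.message);
  // dead-letter: once a job's configured attempts are exhausted, it stays in BullMQ's failed set (built-in)
});
Job types and payloads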
send-email
interface SendEmailJob {
  tenantId: string;
  to: string | string[];
  template: EmailTemplate;
  data: Record<string, any>; // template-specific variables
  replyTo?: string;
  cc?: string[];
}
type EmailTemplate =
  | 'otp'
  | 'application-submitted'
  | 'application-status-changed'
  | 'job-posted'
  | 'verification-approved'
  | 'verification-rejected'
  | 'announcement'
  | 'cycle-enrollment'
  | 'event-reminder'
  | 'debarment-notice';
send-in-app
interface SendInAppJob {
  tenantId: string;
  userId: string;
  title: string;
  body: string;
  link?: string; // deep link to relevant page
  type: 'info' | 'success' | 'warning' | 'action_required';
}
// Writes to `notifications` table in DB. Frontend polls or uses SSE.
trigger-webhook
interface TriggerWebhookJob {
  tenantId: string;
  url: string;
  secret: string; // HMAC-SHA256 signing key
  event: string; // e.g. "application.status_changed"
  payload: Record<string, any>;
  retryCount?: number;
}
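The signing step for this job type could look like the sketch below. It uses only Node's built-in `crypto` module. The helper names, the request body layout (`{ event, payload }`), and the `x-webhook-signature` header are assumptions made for illustration; the spec does not fix any of them.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Hypothetical helper: hex-encoded HMAC-SHA256 of the exact bytes that will be sent.
export function signWebhookBody(secret: string, body: string): string {
  return createHmac('sha256', secret).update(body, 'utf8').digest('hex');
}

// Hypothetical receiver-side check using a constant-time comparison.
export function verifyWebhookSignature(secret: string, body: string, signature: string): boolean {
  const expected = Buffer.from(signWebhookBody(secret, body), 'hex');
  const received = Buffer.from(signature, 'hex');
  return expected.length === received.length && timingSafeEqual(expected, received);
}

// Builds the POST request the processor would send. The header name is an assumption.
export function buildWebhookRequest(job: {
  url: string;
  secret: string;
  event: string;
  payload: Record<string, unknown>;
}) {
  // Serialize once and sign that exact string, so the signature matches the bytes on the wire.
  const body = JSON.stringify({ event: job.event, payload: job.payload });
  return {
    url: job.url,
    method: 'POST' as const,
    headers: {
      'content-type': 'application/json',
      'x-webhook-signature': signWebhookBody(job.secret, body),
    },
    body,
  };
}
```

The processor would then send this request and throw on any non-2xx response, which is what hands the job back to BullMQ for a retry.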
// Signs payload with HMAC-SHA256 and POSTs it to url
// Expects a 2xx response; any other outcome fails the job so BullMQ retries it
Email templates
All templates are TypeScript functions returning { subject, html, text }. There is no external template engine, to keep things simple.
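One way a processor could resolve a template name to its function is a small registry, sketched here under assumptions: `RenderedEmail`, `TemplateFn`, `templates`, and `renderTemplate` are hypothetical names, only two entries are shown, and the fields of the announcement template are invented for illustration.

```typescript
// Shape returned by every template function.
interface RenderedEmail {
  subject: string;
  html: string;
  text: string;
}

type TemplateFn = (data: Record<string, any>) => RenderedEmail;

// Hypothetical registry keyed by template name. A Map avoids accidental hits
// on inherited object keys such as "constructor".
const templates = new Map<string, TemplateFn>([
  [
    'otp',
    (data) => ({
      subject: `Your ${data.tenantName} login code`,
      html: `<p>Your one-time login code is:</p><h1>${data.otp}</h1>`,
      text: `Your login code is ${data.otp}. Expires in ${data.expiryMinutes} minutes.`,
    }),
  ],
  [
    // Illustrative only: the fields of the announcement template are assumed.
    'announcement',
    (data) => ({
      subject: String(data.title),
      html: `<p>${data.body}</p>`,
      text: String(data.body),
    }),
  ],
]);

// Looks up the template and renders it; an unknown name fails the job loudly.
export function renderTemplate(name: string, data: Record<string, any>): RenderedEmail {
  const render = templates.get(name);
  if (!render) {
    throw new Error(`Unknown email template: ${name}`);
  }
  return render(data);
}
```

Throwing on an unknown name keeps a mistyped template from silently producing an empty email.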
// templates/otp.ts
export function otpTemplate(data: { otp: string; expiryMinutes: number; tenantName: string }) {
  return {
    subject: `Your ${data.tenantName} login code`,
    html: `
      <p>Your one-time login code is:</p>
      <h1 style="letter-spacing: 0.3em">${data.otp}</h1>
      <p>Expires in ${data.expiryMinutes} minutes. Do not share this code.</p>
    `,
    text: `Your login code is ${data.otp}. Expires in ${data.expiryMinutes} minutes.`
  };
}
Retry strategy
BullMQ handles retries automatically. Config per job type:
// Defined in packages/queue/src/jobs.ts (shared with core-api)
export const emailJobOptions = {
  attempts: 3, // 1 initial try + up to 2 retries
  backoff: { type: 'exponential', delay: 2000 }, // retries after 2s, then 4s
  removeOnComplete: { age: 86400 }, // keep completed jobs for 24h
  removeOnFail: { age: 604800 }, // keep failed jobs for 7 days
};
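To make the retry timing concrete, the schedule implied by `attempts` and an exponential `backoff` can be worked out with a few lines. This sketch assumes BullMQ's built-in exponential strategy without jitter, where the n-th retry waits `delay * 2^(n - 1)` milliseconds; `retryDelays` is a hypothetical helper, not part of BullMQ.

```typescript
// Returns the wait (in ms) before each retry. `attempts` counts the first try,
// so a job with N attempts is retried at most N - 1 times.
export function retryDelays(attempts: number, baseDelayMs: number): number[] {
  const retries = Math.max(0, attempts - 1);
  return Array.from({ length: retries }, (_, i) => baseDelayMs * 2 ** i);
}

// Total time spent waiting between attempts before the job ends up failed.
export function totalBackoffMs(attempts: number, baseDelayMs: number): number {
  return retryDelays(attempts, baseDelayMs).reduce((sum, d) => sum + d, 0);
}
```

For example, `retryDelays(3, 2000)` gives `[2000, 4000]` and `retryDelays(5, 5000)` gives `[5000, 10000, 20000, 40000]`, so a webhook job spends about 75 seconds in backoff before it is marked failed.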
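export const webhookJobOptions = {
  attempts: 5, // 1 initial try + up to 4 retries
  backoff: { type: 'exponential', delay: 5000 }, // retries after 5s, 10s, 20s, 40s
  removeOnComplete: { age: 3600 }, // keep completed jobs for 1h
  removeOnFail: { age: 604800 }, // keep failed jobs for 7 days
};
Health check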
Minimal Fastify server on port 4001:
GET /health → { status: "ok", queueDepth: N, workerActive: true }
queueDepth = number of jobs currently waiting in the queue (fetched from BullMQ). Alert if it grows above 1000, which means the worker is falling behind.
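The response body and the alert rule can be sketched as two small pure functions. The names here are hypothetical, and the inputs are plain values so the sketch stands alone; in the service they would presumably come from BullMQ (for instance a waiting-job count from the queue and the worker's running state), with the Fastify route simply returning the payload.

```typescript
// Values the health route needs, however they are obtained.
interface HealthInputs {
  waitingJobs: number;
  workerRunning: boolean;
}

// Mirrors the response shape documented for GET /health.
interface HealthPayload {
  status: 'ok';
  queueDepth: number;
  workerActive: boolean;
}

// Threshold taken from the spec: alert once more than 1000 jobs are waiting.
const QUEUE_DEPTH_ALERT_THRESHOLD = 1000;

export function buildHealthPayload(inputs: HealthInputs): HealthPayload {
  return {
    status: 'ok',
    queueDepth: inputs.waitingJobs,
    workerActive: inputs.workerRunning,
  };
}

// True when the backlog suggests the worker is falling behind.
export function shouldAlertOnQueueDepth(
  queueDepth: number,
  threshold: number = QUEUE_DEPTH_ALERT_THRESHOLD,
): boolean {
  return queueDepth > threshold;
}
```

Keeping the threshold check separate from the route lets the same rule be reused by whatever monitoring polls the endpoint.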