Overview
The sections below, including the architecture notes, are written from the engagement itself and are not tied to a public repository.
/ Scope
- Multi-tenant data model (shared schema, row-level isolation)
- Booking & scheduling engine with 3 staff selection modes (auto-assign, single, per-service)
- Timeline computation engine for sequential multi-staff bookings
- Staff management with 3-layer employment model (profile → employment → service)
- Multi-salon staff onboarding (new user, existing customer, existing staff)
- Subscription billing & payments via Stripe Platform Model with idempotent webhooks
- Loyalty & rewards system with wallet-based accrual and redemption
- Customer wallet with loyalty points, debt tracking, and reconciliation
- Geolocation & maps (Google Maps API + PostGIS proximity search)
- Search & discovery with PostGIS proximity search and ranking
- Scheduled jobs & notifications (reminders, no-show detection, review requests)
- Reviews & ratings system with customer feedback and loyalty integration
- Unified calendar (merges bookings, availability, time-off, and events)
- Time-off management with conflict resolution and compensation offers
- Platform fee system with snapshot at booking creation
- PWA + Capacitor mobile (iOS/Android)
- 7-tier RBAC across platform and tenant hierarchies
- Full audit trail with booking_history event ledger
- AWS deployment & CI/CD
Highlights
01
Solo full-stack: Next.js 16 + NestJS + AWS — shipped to production
02
Concurrency-safe booking engine (DB constraint + Redis lock + pessimistic locking)
03
True multi-tenant isolation enforced at the framework layer
04
Stripe Platform Model with idempotent webhook processing
05
Employment-based staff model supporting multi-salon work with independent roles per salon
06
Live product serving real salons — donebyme.dk
/ Tracks
- Client
The Problem
Salons take bookings over the phone, through WhatsApp, on Instagram, and in person. There's no single source of truth. Double-bookings happen. Customers show up and the staff member who was supposed to serve them isn't there. No-shows cost real money and nobody tracks them. I needed a platform that could host many salons without leaking data between them, while each salon still felt like it had its own system.
Approach
I started with the data model. Every table gets a tenant_id. A request-scoped middleware sets the tenant context and a Drizzle query wrapper injects it into every query automatically. Miss the context and the framework throws an error. Then I split the codebase: services for business logic, repositories for SQL. Redis caches availability lookups and holds temporary slot reservations, all keyed by tenant scope. Scheduled jobs run via cron, gated by PostgreSQL advisory locks so multiple instances don't fire the same job twice.
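The fail-closed tenant filter described above can be sketched roughly as follows. This is an in-memory illustration, not the production Drizzle wrapper: `TenantContext`, `scopedFind`, and the sample rows are hypothetical names, and a real request-scoped context would more likely sit on something like Node's AsyncLocalStorage than on a plain field.

```typescript
// Hypothetical stand-in for a request-scoped tenant context.
class TenantContext {
  private tenantId: string | null = null;

  // Run `fn` with the tenant set, then restore the previous value.
  run<T>(tenantId: string, fn: () => T): T {
    const previous = this.tenantId;
    this.tenantId = tenantId;
    try {
      return fn();
    } finally {
      this.tenantId = previous;
    }
  }

  // Fail closed: a missing context is an error, never "all tenants".
  require(): string {
    if (this.tenantId === null) {
      throw new Error("Tenant context is not set");
    }
    return this.tenantId;
  }
}

interface TenantRow {
  tenantId: string;
}

// Hypothetical query wrapper: every read is filtered by the current tenant.
function scopedFind<R extends TenantRow>(
  ctx: TenantContext,
  rows: readonly R[],
  predicate: (row: R) => boolean = () => true,
): R[] {
  const tenantId = ctx.require();
  return rows.filter((row) => row.tenantId === tenantId && predicate(row));
}

const tenantCtx = new TenantContext();
const sampleBookings = [
  { tenantId: "salon-a", id: 1 },
  { tenantId: "salon-b", id: 2 },
];

// Inside a tenant scope only that tenant's rows are visible.
const visibleToA = tenantCtx.run("salon-a", () => scopedFind(tenantCtx, sampleBookings));
```

Calling `scopedFind` outside of `run` throws instead of returning unfiltered rows. That is the property that matters here, since a forgotten context becomes a loud error rather than a silent leak.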
Key Decisions
- 01
Shared database with row-level tenant_id
I chose shared DB over DB-per-tenant. Cheaper, simpler to operate, and a Drizzle query wrapper auto-injects the tenant filter on every query. Forget to set the tenant context and it throws at the framework layer. No silent leaks.
- 02
Service-Repository separation
Business rules like overlap detection, cancellation windows, and loyalty accrual live in services. Persistence lives in repositories. I can test the booking engine without Postgres and swap storage without touching domain logic.
- 03
Three-layer concurrency for booking slots
Two customers hitting the same 3:00 PM slot at the same time was the hardest problem to solve. A DB unique constraint prevents duplicate active bookings. Redis distributed locks hold temporary reservations with a 5-second TTL. SELECT FOR UPDATE serializes writes inside transactions. I didn't want to find out in production that one layer wasn't enough.
- 04
Timeline computation for per-service staff assignment
When a customer assigns different staff to different services, the system has to find start times where everyone's free sequentially. I built a cursor-based engine that iterates 15-minute intervals, chains services with buffer times, and validates constraints in one pass with zero I/O.
- 05
Employment-based staff model instead of flat staff records
Staff needed to work at multiple salons with different roles and pay at each. I modeled it like LinkedIn: one global profile per person, then per-salon employment records, then per-service competencies. A stylist who works at two salons has two employment records, each with its own schedule, services, pay rate, and visibility. The same person can be a manager at Salon A and regular staff at Salon B. Onboarding detects whether the invitee is new, an existing customer, or already staff elsewhere and adapts accordingly.
- 06
Platform fee snapshotting at booking creation
Platform fees are calculated and stored on the booking record when it's created. They're never recalculated or modified after that. This prevents disputes about what the fee was at the time of booking vs what it is now. The fee rate comes from a two-tier resolution: per-tenant override if set, otherwise global platform default.
- 07
Availability as a three-tier intersection problem
Staff availability isn't just 'works 9-5'. It's recurring weekly schedules, overridden by specific-date changes, further modified by exceptions like holidays or emergencies. Each tier can set different hours, breaks, max bookings, and service restrictions. The repository queries check overrides first, fall back to recurring, and exclude exception dates. Breaks are treated as busy intervals and removed from available slots.
- 08
Single Stripe account instead of Stripe Connect
I chose a single platform Stripe account over Stripe Connect, so all customers and payments live under one account. Every payment carries tenant_id and booking_id in metadata for settlement. This is simpler to implement and avoids Stripe Connect's complexity around onboarding, verification, and liability. The trade-off is that tenant payouts are not automated yet; that is a future feature.
- 09
NestJS cron with advisory lock gating
Background jobs run via @nestjs/schedule Cron decorators. Each job tries to acquire a PostgreSQL advisory lock at the start of every run. If another instance already holds the lock, that run is skipped. No job queue is needed, and multi-instance safety is handled by the database.
- 10
Idempotent webhook handlers with event ledger
Stripe webhooks arrive out of order and retry on failure. Incremental state updates broke after retries. I made every handler idempotent: subscription state is derived from the latest known event, and an event_id ledger prevents double-processing.
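The replay-safe webhook handling from the last decision can be sketched like this. It is a simplified in-memory model under stated assumptions: `WebhookEvent` is a hypothetical, trimmed-down shape (not Stripe's real payload), events are ordered by their `created` timestamp, and the ledger is a `Set` where a real system would write the ledger row and the state change in one database transaction.

```typescript
type SubStatus = "active" | "past_due" | "canceled";

// Simplified, hypothetical event shape (not Stripe's real payload).
interface WebhookEvent {
  id: string; // unique event id, used as the ledger key
  subscriptionId: string;
  created: number; // event creation time, seconds since epoch
  status: SubStatus; // full state carried by the event, not a delta
}

class SubscriptionProjection {
  private processed = new Set<string>(); // event_id ledger
  private latest = new Map<string, WebhookEvent>();

  // Returns true only if the event changed the stored state.
  handle(evt: WebhookEvent): boolean {
    if (this.processed.has(evt.id)) return false; // retry of a handled event
    this.processed.add(evt.id);

    const current = this.latest.get(evt.subscriptionId);
    if (current && current.created >= evt.created) return false; // not newer: ignore
    this.latest.set(evt.subscriptionId, evt);
    return true;
  }

  statusOf(subscriptionId: string): SubStatus | undefined {
    return this.latest.get(subscriptionId)?.status;
  }
}

const projection = new SubscriptionProjection();
const olderEvt: WebhookEvent = { id: "evt_1", subscriptionId: "sub_1", created: 100, status: "active" };
const newerEvt: WebhookEvent = { id: "evt_2", subscriptionId: "sub_1", created: 200, status: "past_due" };

// Deliver out of order, then retry the newer event.
projection.handle(newerEvt);
projection.handle(olderEvt);
const retryChangedState = projection.handle(newerEvt);
```

Because each event carries the full status rather than an increment, replaying or reordering deliveries converges on the same state.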
Architecture
/ System proof
These are not tool badges. They describe the boundaries, consistency controls, async paths, and failure-mode decisions behind the build.
- 01 Shared schema, row-level tenant isolation (tenant_id on every table)
- 02 Service-Repository pattern: domain logic never touches SQL
- 03 Three staff selection modes: auto-assign, single staff, per-service (timeline computation)
- 04 Three-layer concurrency: DB unique constraint + Redis distributed locks + SELECT FOR UPDATE
- 05 NestJS cron jobs with PostgreSQL advisory lock distributed gating
- 06 Employment-based 3-layer staff model (global profile → per-salon employment → per-service competencies)
- 07 3-tier availability resolution (recurring schedule → date override → exception/break)
- 08 Time-off conflict resolution with progressive deadlines, customer alternatives, and auto-resolution cron
- 09 7-tier RBAC across two hierarchies (super_admin → customer, owner → staff)
- 10 Stripe Platform Model with metadata-based tenant settlement and idempotent webhook processing
- 11 Redis cache with tenant-scoped key namespacing
- 12 Unified calendar merging bookings, availability, time-off, and calendar events
- 13 PostGIS proximity search with Haversine fallback
- 14 Platform fee snapshotting at booking creation with two-tier rate resolution
- 15 Walk-in booking atomic transaction (user + profile + booking in one DB transaction)
- 16 PWA + Capacitor for cross-platform mobile delivery
- 17 Full audit trail via booking_history with JSONB snapshots
- 18 Staff scoring algorithm for auto-assignment (experience, rating, proficiency, seniority, volume)
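For the Haversine fallback mentioned in the list above, the standard great-circle formula looks like this in TypeScript. The `nearby` helper and its in-memory filtering are assumptions for illustration; the source does not describe how the fallback is wired in.

```typescript
const EARTH_RADIUS_KM = 6371; // mean Earth radius

interface LatLng {
  lat: number; // degrees
  lng: number; // degrees
}

function toRadians(degrees: number): number {
  return (degrees * Math.PI) / 180;
}

// Great-circle distance between two points, in kilometres.
function haversineKm(a: LatLng, b: LatLng): number {
  const dLat = toRadians(b.lat - a.lat);
  const dLng = toRadians(b.lng - a.lng);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(a.lat)) * Math.cos(toRadians(b.lat)) * Math.sin(dLng / 2) ** 2;
  // Clamp to 1 to guard against floating-point overshoot before asin.
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1, Math.sqrt(h)));
}

// Hypothetical fallback filter: places within `radiusKm`, nearest first.
function nearby<T extends LatLng>(origin: LatLng, places: readonly T[], radiusKm: number): T[] {
  return places
    .map((place) => ({ place, km: haversineKm(origin, place) }))
    .filter((entry) => entry.km <= radiusKm)
    .sort((a, b) => a.km - b.km)
    .map((entry) => entry.place);
}

const copenhagen: LatLng = { lat: 55.6761, lng: 12.5683 };
const samplePlaces = [
  { id: "aarhus", lat: 56.1629, lng: 10.2039 },
  { id: "roskilde", lat: 55.6415, lng: 12.0803 },
  { id: "city-centre", lat: 55.6761, lng: 12.5683 },
];

const within50Km = nearby(copenhagen, samplePlaces, 50).map((p) => p.id);
```

Haversine treats the Earth as a sphere, so it is slightly less accurate than a geodesic calculation in PostGIS, which is usually acceptable for a "salons near me" radius.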
Challenges & How I Solved Them
Concurrent booking race conditions
/ Problem
Two customers hit the same 3:00 PM slot at the same time. Both read 'available' and both try to book. Optimistic locking wasn't enough because both reads saw the same state.
/ Solution
Three layers of defense. A DB unique constraint on (tenant, staff, date, start_time) prevents duplicates at the row level. Redis holds temporary reservations with a distributed lock (SET NX PX). SELECT FOR UPDATE serializes writes inside transactions. Reads stay fast through the Redis cache.
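A rough in-memory model of two of these layers is shown below: the reservation lock with `SET NX PX` semantics and the unique constraint as the final backstop. `SlotLocks`, `BookingTable`, and `bookSlot` are hypothetical names, the clock is passed in as `now` to keep the sketch deterministic, and the `SELECT FOR UPDATE` layer is not modeled because it only has meaning inside a real database transaction.

```typescript
// In-memory stand-in for Redis `SET key value NX PX ttl`.
class SlotLocks {
  private held = new Map<string, { owner: string; expiresAt: number }>();

  // Acquire only if the key is free or its TTL has lapsed.
  tryAcquire(key: string, owner: string, ttlMs: number, now: number): boolean {
    const existing = this.held.get(key);
    if (existing && existing.expiresAt > now) return false;
    this.held.set(key, { owner, expiresAt: now + ttlMs });
    return true;
  }

  // Release only if the caller still owns the lock.
  release(key: string, owner: string): void {
    if (this.held.get(key)?.owner === owner) this.held.delete(key);
  }
}

// In-memory stand-in for a table with a unique constraint on the slot key.
class BookingTable {
  private activeSlots = new Set<string>();

  insert(key: string): void {
    if (this.activeSlots.has(key)) throw new Error(`unique violation: ${key}`);
    this.activeSlots.add(key);
  }
}

interface Slot {
  tenantId: string;
  staffId: string;
  date: string;
  startTime: string;
}

type BookResult = "booked" | "slot_reserved" | "slot_taken";

const RESERVATION_TTL_MS = 5000;

function slotKey(slot: Slot): string {
  return [slot.tenantId, slot.staffId, slot.date, slot.startTime].join(":");
}

function bookSlot(locks: SlotLocks, table: BookingTable, slot: Slot, customerId: string, now: number): BookResult {
  const key = slotKey(slot);
  if (!locks.tryAcquire(key, customerId, RESERVATION_TTL_MS, now)) return "slot_reserved";
  try {
    table.insert(key); // the constraint is the last line of defense
    return "booked";
  } catch (err) {
    if (err instanceof Error && err.message.startsWith("unique violation")) return "slot_taken";
    throw err;
  } finally {
    locks.release(key, customerId);
  }
}

const slotLocks = new SlotLocks();
const bookingTable = new BookingTable();
const threePm: Slot = { tenantId: "salon-a", staffId: "staff-1", date: "2025-01-15", startTime: "15:00" };

const firstAttempt = bookSlot(slotLocks, bookingTable, threePm, "cust-1", 0);
const secondAttempt = bookSlot(slotLocks, bookingTable, threePm, "cust-2", 1);
```

The second attempt gets past the released lock and is still rejected by the constraint, which is the case the extra layer exists for.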
Multi-staff sequential booking (timeline computation)
/ Problem
A customer picks different staff for a haircut and a color treatment. The haircut takes 30 minutes with staff A, then the color takes 60 minutes with staff B. Finding a start time where both are free and the sequence fits within business hours requires checking working hours, existing bookings, and in-flight overlap. A simple slot lookup can't do that.
/ Solution
I wrote a computation engine that scans every 15-minute interval from opening to closing. It chains services sequentially with buffer times, validates constraints for each staff member, and returns all valid timelines sorted chronologically. No database calls during computation.
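The scan can be sketched as a pure function over pre-loaded data. This is an illustrative reconstruction, not the production engine: the types, `computeTimelines`, and the sample schedule are assumptions, and times are minutes since midnight.

```typescript
interface Interval {
  start: number; // minutes since midnight
  end: number; // exclusive
}

interface StaffDay {
  working: Interval;
  busy: Interval[]; // existing bookings, breaks, in-flight reservations
}

interface Step {
  staffId: string;
  durationMin: number;
}

interface TimelineSlot extends Interval {
  staffId: string;
}

function overlaps(a: Interval, b: Interval): boolean {
  return a.start < b.end && b.start < a.end;
}

function isFree(day: StaffDay, slot: Interval): boolean {
  const withinHours = slot.start >= day.working.start && slot.end <= day.working.end;
  return withinHours && !day.busy.some((b) => overlaps(b, slot));
}

// Try every `stepMin` start between opening and closing; keep the starts
// where all services chain back to back (plus buffer) with their staff free.
function computeTimelines(
  steps: readonly Step[],
  staff: Map<string, StaffDay>,
  businessHours: Interval,
  bufferMin = 0,
  stepMin = 15,
): TimelineSlot[][] {
  if (steps.length === 0) return [];
  const results: TimelineSlot[][] = [];
  for (let start = businessHours.start; start < businessHours.end; start += stepMin) {
    let cursor = start;
    const timeline: TimelineSlot[] = [];
    for (const step of steps) {
      const slot: Interval = { start: cursor, end: cursor + step.durationMin };
      const day = staff.get(step.staffId);
      if (!day || slot.end > businessHours.end || !isFree(day, slot)) break;
      timeline.push({ staffId: step.staffId, start: slot.start, end: slot.end });
      cursor = slot.end + bufferMin;
    }
    if (timeline.length === steps.length) results.push(timeline);
  }
  return results; // already chronological because starts are scanned in order
}

// Haircut (30 min, staff A) followed by color (60 min, staff B).
const staffDays = new Map<string, StaffDay>([
  ["A", { working: { start: 540, end: 1020 }, busy: [{ start: 600, end: 660 }] }],
  ["B", { working: { start: 720, end: 1020 }, busy: [] }],
]);
const timelines = computeTimelines(
  [
    { staffId: "A", durationMin: 30 },
    { staffId: "B", durationMin: 60 },
  ],
  staffDays,
  { start: 540, end: 1020 },
);
```

In this sample, staff B only starts at 12:00, so the earliest valid start for the haircut is 11:30 (minute 690). Everything the function needs is passed in, which is what keeps the computation free of I/O.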
The employment model: same person, different roles at different salons
/ Problem
When a salon owner invites someone who already works at another salon, the system can't just create a new staff record. It has to reuse the existing global profile, connect it to the new salon, and set up different permissions, pay, and visibility at each salon. A person who is a manager at one salon and a stylist at another should only see manager-level data at the first salon.
/ Solution
Three-layer data model: one global profile per person, per-salon employment records with independent roles and permissions, and per-service competency records. Onboarding uses a detection service that figures out if the invitee is new, an existing customer, or staff elsewhere, then creates only what's missing. Permission checks always use the employment record scoped to the current tenant.
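The three layers and the tenant-scoped permission check can be sketched with plain types. All names here (`Directory`, `detectInvitee`, `canManageStaff`, `canPerform`) are hypothetical, and the detection rule is a simplification: it treats any known profile without an employment record as an existing customer.

```typescript
type TenantRole = "owner" | "manager" | "staff";
type InviteeKind = "new_user" | "existing_customer" | "existing_staff";

interface Profile { id: string; email: string } // layer 1: one per person
interface Employment { profileId: string; tenantId: string; role: TenantRole } // layer 2: one per salon
interface Competency { profileId: string; tenantId: string; serviceId: string } // layer 3: one per service

interface Directory {
  profiles: Profile[];
  employments: Employment[];
  competencies: Competency[];
}

// Simplified onboarding detection based only on profile and employment records.
function detectInvitee(dir: Directory, email: string): InviteeKind {
  const profile = dir.profiles.find((p) => p.email === email);
  if (!profile) return "new_user";
  const worksSomewhere = dir.employments.some((e) => e.profileId === profile.id);
  return worksSomewhere ? "existing_staff" : "existing_customer";
}

// The role always comes from the employment record of the current tenant.
function roleAt(dir: Directory, profileId: string, tenantId: string): TenantRole | null {
  const employment = dir.employments.find((e) => e.profileId === profileId && e.tenantId === tenantId);
  return employment ? employment.role : null;
}

function canManageStaff(dir: Directory, profileId: string, tenantId: string): boolean {
  const role = roleAt(dir, profileId, tenantId);
  return role === "owner" || role === "manager";
}

function canPerform(dir: Directory, profileId: string, tenantId: string, serviceId: string): boolean {
  return dir.competencies.some(
    (c) => c.profileId === profileId && c.tenantId === tenantId && c.serviceId === serviceId,
  );
}

const directory: Directory = {
  profiles: [
    { id: "p1", email: "anna@example.com" },
    { id: "p2", email: "ben@example.com" },
  ],
  employments: [
    { profileId: "p1", tenantId: "salon-a", role: "manager" },
    { profileId: "p1", tenantId: "salon-b", role: "staff" },
  ],
  competencies: [{ profileId: "p1", tenantId: "salon-a", serviceId: "haircut" }],
};

const managesAtA = canManageStaff(directory, "p1", "salon-a");
const managesAtB = canManageStaff(directory, "p1", "salon-b");
```

The same person resolves to different roles depending on the tenant passed in, so a manager role at one salon never carries over to another.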
Availability intersection: schedules, overrides, exceptions, and breaks
/ Problem
Staff don't have simple 9-5 schedules. They have recurring weekly hours, date-specific changes, vacation days, holidays, and lunch breaks. Each of these is stored differently and they can overlap. The system has to resolve all of them into a single picture of 'is this person available at this time?'
/ Solution
The repository uses a two-step resolution: check for a date-specific override first, fall back to the recurring weekly schedule. Exceptions (holidays, vacations) are stored as availability entries with is_available = false. Break times are parsed from JSON and treated as busy intervals that get subtracted from available slots. Service restrictions filter which staff can perform which services.
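The resolution order can be illustrated with a small pure function. The data shapes below are assumptions made for the sketch (the real storage format is not shown in this write-up): overrides and exceptions are both modeled as date-keyed entries, with exceptions carrying `available: false`, and times are minutes since midnight.

```typescript
interface Span {
  start: number; // minutes since midnight
  end: number; // exclusive
}

interface DayRule {
  available: boolean; // false models an exception such as a holiday
  hours: Span;
  breaks: Span[];
}

interface StaffAvailability {
  weekly: Partial<Record<number, DayRule>>; // keyed by weekday, 0 = Sunday
  byDate: Record<string, DayRule | undefined>; // keyed by YYYY-MM-DD
}

// Step 1: a date-specific entry wins. Step 2: fall back to the weekly rule.
function resolveRule(availability: StaffAvailability, date: string): DayRule | null {
  const dated = availability.byDate[date];
  if (dated) return dated;
  const weekday = new Date(`${date}T00:00:00Z`).getUTCDay();
  return availability.weekly[weekday] ?? null;
}

// Remove each busy span from the working hours.
function subtractSpans(base: Span, busy: readonly Span[]): Span[] {
  let free: Span[] = [base];
  for (const b of busy) {
    const next: Span[] = [];
    for (const f of free) {
      if (b.end <= f.start || b.start >= f.end) {
        next.push(f); // no overlap, keep as is
        continue;
      }
      if (b.start > f.start) next.push({ start: f.start, end: b.start });
      if (b.end < f.end) next.push({ start: b.end, end: f.end });
    }
    free = next;
  }
  return free;
}

function freeSpans(availability: StaffAvailability, date: string): Span[] {
  const rule = resolveRule(availability, date);
  if (!rule || !rule.available) return [];
  return subtractSpans(rule.hours, rule.breaks);
}

const stylist: StaffAvailability = {
  weekly: {
    3: { available: true, hours: { start: 540, end: 1020 }, breaks: [{ start: 720, end: 750 }] }, // Wednesdays 9-17
  },
  byDate: {
    "2025-01-22": { available: true, hours: { start: 600, end: 840 }, breaks: [] }, // override
    "2025-01-29": { available: false, hours: { start: 0, end: 0 }, breaks: [] }, // exception
  },
};

const regularWednesday = freeSpans(stylist, "2025-01-15");
```

Existing bookings would be subtracted the same way as breaks, which is why treating breaks as busy intervals keeps the slot logic uniform.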
Time-off approval affects existing bookings
/ Problem
When a staff member's time-off is approved, any existing bookings with that staff during the time-off period become invalid. Each affected customer needs to be notified, given alternatives, and given a deadline to respond. The deadlines are progressive: one hour for emergency time-off, up to 48 hours for time-off far in advance. If a customer doesn't respond, the booking gets auto-cancelled.
/ Solution
A detection service finds all affected bookings on approval and creates tracking records with progressive deadlines. Customers get a link to view alternative staff at the same time slot or reschedule. Managers can manually reassign, extend deadlines, offer compensation (discounts, credits, free services), or force-cancel with premium refunds. A daily cron auto-resolves bookings past the deadline.
No-show detection across timezones
/ Problem
Detecting no-shows requires knowing the salon's local time and comparing it against the booking's start time plus a grace period. A cron that runs globally needs to handle salons in different timezones correctly.
/ Solution
The no-show cron runs every minute. For each tenant, it converts the booking's local time to UTC, adds a 15-minute grace period, and compares against the current time. It uses a PostgreSQL advisory lock to prevent concurrent execution across instances and an in-process isRunning flag to prevent overlap within the same instance.
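The timezone step is the part that is easy to get wrong, so here is a minimal sketch of it using only the built-in `Intl` API. The function names and the `LocalDateTime` shape are assumptions, the advisory lock and the isRunning flag are left out, and the sketch relies on the runtime shipping IANA timezone data (recent Node.js versions do by default).

```typescript
interface LocalDateTime {
  year: number;
  month: number; // 1-12
  day: number;
  hour: number;
  minute: number;
}

// Offset of `timeZone` from UTC at `instant`, in milliseconds (+3600000 for UTC+1).
function zoneOffsetMs(instant: Date, timeZone: string): number {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    hour12: false,
    year: "numeric",
    month: "numeric",
    day: "numeric",
    hour: "numeric",
    minute: "numeric",
    second: "numeric",
  }).formatToParts(instant);
  const get = (type: string): number => {
    const part = parts.find((p) => p.type === type);
    return part ? Number(part.value) : 0;
  };
  // Some runtimes render midnight as hour 24 when hour12 is false.
  const wallClockAsUtc = Date.UTC(
    get("year"), get("month") - 1, get("day"), get("hour") % 24, get("minute"), get("second"),
  );
  const instantWholeSeconds = Math.floor(instant.getTime() / 1000) * 1000;
  return wallClockAsUtc - instantWholeSeconds;
}

// Convert a salon-local wall-clock time to a UTC instant.
function localToUtc(local: LocalDateTime, timeZone: string): Date {
  const naive = Date.UTC(local.year, local.month - 1, local.day, local.hour, local.minute);
  const firstGuess = naive - zoneOffsetMs(new Date(naive), timeZone);
  // Second pass corrects the offset when the first guess straddles a DST change.
  return new Date(naive - zoneOffsetMs(new Date(firstGuess), timeZone));
}

function isPastGracePeriod(startLocal: LocalDateTime, timeZone: string, now: Date, graceMin = 15): boolean {
  const deadline = localToUtc(startLocal, timeZone).getTime() + graceMin * 60000;
  return now.getTime() > deadline;
}

const winterBooking: LocalDateTime = { year: 2025, month: 1, day: 15, hour: 15, minute: 0 };
const winterStartUtc = localToUtc(winterBooking, "Europe/Copenhagen").toISOString();
```

A real job would also check that the booking has not been checked in or cancelled before marking it as a no-show; this sketch covers only the time comparison.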
Walk-in atomic transaction
/ Problem
When a staff member books a walk-in customer who doesn't have an account, the system needs to create a user, a customer profile, and a booking in one shot. If any step fails, you don't want orphan user records or bookings without an owner.
/ Solution
All three operations run inside a single database transaction. If user creation, profile creation, or booking creation fails for any reason, everything rolls back. The duplicate phone check runs before the transaction starts to fail fast. Walk-in customers without an email get a placeholder based on their phone number.
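The all-or-nothing behavior can be illustrated with an in-memory stand-in for a database transaction. `MemoryDb`, `createWalkIn`, and the placeholder email format are assumptions for this sketch; the source only says the placeholder is based on the phone number.

```typescript
interface User { id: number; phone: string; email: string }
interface CustomerProfile { userId: number; tenantId: string }
interface Booking { id: number; customerId: number; tenantId: string; startsAt: string }

interface Tables {
  users: User[];
  profiles: CustomerProfile[];
  bookings: Booking[];
}

// Hypothetical in-memory store: the callback works on a copy that replaces
// the live tables only if it returns without throwing.
class MemoryDb {
  tables: Tables = { users: [], profiles: [], bookings: [] };

  transaction<T>(work: (tx: Tables) => T): T {
    const draft: Tables = {
      users: [...this.tables.users],
      profiles: [...this.tables.profiles],
      bookings: [...this.tables.bookings],
    };
    const result = work(draft); // a throw here leaves this.tables untouched
    this.tables = draft; // commit
    return result;
  }
}

interface WalkInInput {
  tenantId: string;
  phone: string;
  email?: string;
  startsAt: string;
}

function createWalkIn(db: MemoryDb, input: WalkInInput): Booking {
  // Fail fast before opening the transaction.
  if (db.tables.users.some((u) => u.phone === input.phone)) {
    throw new Error("phone already registered");
  }

  return db.transaction((tx) => {
    const user: User = {
      id: tx.users.length + 1,
      phone: input.phone,
      // Placeholder format is an assumption made for this sketch.
      email: input.email ?? `${input.phone.replace(/\D/g, "")}@walkin.invalid`,
    };
    tx.users.push(user);
    tx.profiles.push({ userId: user.id, tenantId: input.tenantId });

    if (tx.bookings.some((b) => b.tenantId === input.tenantId && b.startsAt === input.startsAt)) {
      throw new Error("slot already booked"); // rolls back the user and profile too
    }
    const booking: Booking = {
      id: tx.bookings.length + 1,
      customerId: user.id,
      tenantId: input.tenantId,
      startsAt: input.startsAt,
    };
    tx.bookings.push(booking);
    return booking;
  });
}

const walkInDb = new MemoryDb();
const firstWalkIn = createWalkIn(walkInDb, {
  tenantId: "salon-a",
  phone: "+45 12 34 56 78",
  startsAt: "2025-01-15T15:00",
});
```

If the booking step fails, the user and profile created earlier in the same callback are discarded with it, so no orphan records survive.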
Outcomes
- Live at donebyme.dk serving real salons.
- Booking engine handles auto-assign, single staff, and per-service selection with sequential timeline computation.
- Zero double-bookings since the three-layer concurrency system shipped. The DB constraint catches anything the locks miss.
- Tenant isolation is enforced by the framework. No developer has to remember WHERE tenant_id.
- Stripe Platform Model with replay-safe webhook processing and metadata-based tenant settlement.
- Staff can work at multiple salons with independent roles, pay, and visibility at each.
- Availability resolves across recurring schedules, date overrides, exceptions, and breaks in real time.
- Time-off approvals trigger customer notification, alternatives, progressive deadlines, and auto-resolution.
- Scheduled jobs for notifications, reminders, no-show detection, and loyalty run via cron with advisory lock gating.
- Staff see a unified calendar that merges bookings, availability, time-off, and personal events in one view.
- PWA and Capacitor deliver native-like iOS and Android apps.
- 7-tier RBAC split across platform roles (super_admin, admin, support, customer) and tenant roles (owner, manager, staff).
What I Learned
- 01
I spent way too long early on trying to make optimistic locking work for booking slots. Advisory locks and DB constraints are simpler and you can trust them. Move on faster next time.
- 02
The event_id ledger for Stripe webhooks saved me more than once. Webhook handlers that mutate state incrementally are fragile. Derive state from the latest event and make every handler safe to replay.
- 03
Service-repository separation felt like overhead at first. But when I needed to add an admin panel and webhook handlers, both reused the same services without changes. That's when it paid off.
- 04
The employment model (global profile + per-salon records) was more work upfront but worth it. Duplicating staff profiles per salon would have been faster but wrong. The onboarding detection service that handles three user scenarios was the trickiest part.
- 05
Not every background job needs a dedicated queue. Cron with advisory lock gating covers most cases. BullMQ or RabbitMQ would have been overkill for what amounts to 'run this every 5 minutes and don't fire twice'.
- 06
Three layers of concurrency defense sounds excessive. But each layer catches a different failure mode: the constraint catches application bugs, the lock catches timing races, and FOR UPDATE serializes edge cases. I'd do it the same way again.
- 07
I chose a single Stripe account over Stripe Connect. It's simpler and gets the job done. But if the platform grows to hundreds of tenants needing automated payouts, I'll have to migrate. The metadata-based settlement model makes that possible without a full rewrite.
Next Steps
- Per-tenant analytics on a read replica with materialized view refresh
- Multi-location support (one tenant operating across multiple physical salons)
- Self-serve onboarding for new salon owners