Scalable Websocket Systems: What to know?

  1. How connections are established.

  2. Why sticky sessions matter.

  3. How room membership is tracked.

  4. How message ordering is handled.

  5. What happens when a WebSocket server dies.

  6. Presence (online/offline users).

  7. Persistence vs Pub/Sub.

  8. Why Redis Pub/Sub may fail at scale.

  9. Why NATS worked better.

  10. Fan-out optimization techniques.

  11. Backpressure handling.

  12. Monitoring and metrics.

  13. Multi-region challenges.

Let's rebuild the entire story from the beginning as if we are designing WhatsApp ourselves.


Chapter 1: User Opens WhatsApp

Suppose User A opens WhatsApp.

Immediately:

User A
   |
   |
Internet
   |
Load Balancer
   |
WebSocket Server

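The client sends an HTTP request asking to upgrade the connection:

GET /socket HTTP/1.1
Host: example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13

Server accepts with:

HTTP/1.1 101 Switching Protocols

Now the same TCP connection becomes: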

User A <=================> Server

and stays alive.
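In the real handshake the client sends a random Sec-WebSocket-Key header, and the server proves it understands WebSocket by answering with a Sec-WebSocket-Accept header derived from it. A minimal Python sketch of that derivation as RFC 6455 describes it: append a fixed GUID, hash with SHA-1, base64-encode. The function names are illustrative; WebSocket libraries do this for you.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 (section 1.3); every server uses the same value.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def accept_key(sec_websocket_key: str) -> str:
    """Compute Sec-WebSocket-Accept for the client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

def handshake_response(sec_websocket_key: str) -> str:
    """Minimal '101 Switching Protocols' reply that upgrades the connection."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {accept_key(sec_websocket_key)}\r\n"
        "\r\n"
    )

# Sample key from RFC 6455
print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

After this reply, both sides stop speaking HTTP and exchange WebSocket frames over the same TCP connection.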


Chapter 2: Why Not HTTP?

With HTTP:

User A
  |
Request
  |
Server

If User B sends a message:

Server cannot push to User A

because User A's request already ended.

Server must wait until User A asks again (polling).

WebSocket solves this:

User A <===============> Server
User B <===============> Server

Server can push instantly.


Chapter 3: What Is Stored For Every Connection?

When a user connects:

socketId
userId
tenantId
rooms
auth data
buffers
heartbeat state

Example:

{
  "socketId": "abc123",
  "userId": "user1",
  "rooms": ["room1","room2"]
}

Server keeps this in memory.

This is why RAM becomes important.
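The per-connection state above can be sketched as a small in-memory registry. This is an illustrative Python model (class and field names are hypothetical, chosen to mirror the list); it also keeps a room → sockets index so a server can find its local members of a room without scanning every connection.

```python
from dataclasses import dataclass, field

@dataclass
class Connection:
    """Per-socket state a WebSocket server keeps in RAM."""
    socket_id: str
    user_id: str
    tenant_id: str
    rooms: set[str] = field(default_factory=set)
    last_pong: float = 0.0                                   # heartbeat state
    send_buffer: list[bytes] = field(default_factory=list)  # frames not yet written

class ConnectionRegistry:
    """socketId -> Connection, plus room -> socketIds on this server."""

    def __init__(self) -> None:
        self.by_socket: dict[str, Connection] = {}
        self.by_room: dict[str, set[str]] = {}

    def add(self, conn: Connection) -> None:
        self.by_socket[conn.socket_id] = conn
        for room in conn.rooms:
            self.by_room.setdefault(room, set()).add(conn.socket_id)

    def remove(self, socket_id: str) -> None:
        conn = self.by_socket.pop(socket_id, None)
        if conn is None:
            return
        for room in conn.rooms:
            members = self.by_room.get(room)
            if members is not None:
                members.discard(socket_id)
                if not members:
                    del self.by_room[room]   # drop empty rooms so their memory is freed

    def sockets_in(self, room: str) -> set[str]:
        return set(self.by_room.get(room, set()))

registry = ConnectionRegistry()
registry.add(Connection("abc123", "user1", "tenant1", {"room1", "room2"}))
print(registry.sockets_in("room1"))  # {'abc123'}
registry.remove("abc123")
print(registry.sockets_in("room1"))  # set()
```

Every field and index entry lives in the server's RAM for as long as the socket is open, which is the point the next chapter builds on.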


Chapter 4: Why RAM Dies First

Imagine:

1 connection = 50 KB

Then:

10000 users

=
500 MB

Now add:

Application memory
Redis client
Database pools
Node runtime
Caches

Suddenly:

2GB
4GB
8GB

gets consumed.

CPU might still be only:

20%

while RAM reaches:

95%

The process runs out of memory and crashes.

This is called:

Memory Bound System
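The arithmetic can be turned into a rough capacity estimate. The sketch uses decimal units (1 MB = 1000 KB) and integers only; the 50 KB per connection is the illustrative figure from above, while the 1.5 GB baseline and 30% headroom are hypothetical inputs to replace with your own measurements.

```python
def connection_memory_mb(connections: int, per_conn_kb: int) -> int:
    """RAM used by sockets alone, in decimal MB."""
    return connections * per_conn_kb // 1000

def connections_per_server(ram_mb: int, baseline_mb: int,
                           per_conn_kb: int, headroom_pct: int = 30) -> int:
    """Sockets that fit after the runtime/app baseline, keeping headroom_pct of RAM free."""
    usable_kb = ram_mb * 1000 * (100 - headroom_pct) // 100 - baseline_mb * 1000
    return max(usable_kb, 0) // per_conn_kb

print(connection_memory_mb(10_000, 50))          # 500 (the example above)
print(connections_per_server(8_000, 1_500, 50))  # 82000 on a hypothetical 8 GB box
```

The headroom matters because connection memory is not flat: buffers grow when clients read slowly, so running at 95% leaves nothing for spikes.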

Chapter 5: First Attempt (Vertical Scaling)

Developer says:

Let's buy a bigger machine
8 GB
↓
16 GB
↓
32 GB
↓
64 GB

Works for some time.

Then:

100k users arrive

and again:

Out of Memory

Another problem:

Single machine dies
=
Entire system dies

Bad design.


Chapter 6: Horizontal Scaling

Instead:

Server 1
Server 2
Server 3
Server 4

behind:

Load Balancer

Diagram:

               Users
                  |
                  |
          Load Balancer
            /   |   \
           /    |    \
         S1    S2    S3

Now users are distributed.


Chapter 7: Why Sticky Sessions Matter

Suppose:

User A connected to Server 1

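The WebSocket runs over one TCP connection, so its frames always reach Server 1.

The trouble is the traffic around that socket: libraries such as Socket.IO may start with HTTP long-polling (many separate requests), and clients reconnect after network blips.

If one of those requests suddenly goes to:

Server 3

problem.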

Server 3 knows nothing about:

User A

Therefore load balancers usually maintain:

Connection Affinity

or

Sticky Sessions

Meaning:

User A's requests
always reach the same server (usually via a cookie or a hash of the client IP)

until disconnect.
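Load balancers usually implement affinity with a cookie or a hash of the client IP. The hashing idea can be sketched with rendezvous (highest-random-weight) hashing; keying it on the user ID is an assumption, and pick_server is a hypothetical helper. A useful property: if a server is removed, only the users that were on it move.

```python
import hashlib

def pick_server(user_id: str, servers: list[str]) -> str:
    """Rendezvous hashing: every (server, user) pair gets a score; highest score wins."""
    def score(server: str) -> int:
        digest = hashlib.sha256(f"{server}:{user_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(servers, key=score)

servers = ["S1", "S2", "S3"]
first = pick_server("userA", servers)
print(first == pick_server("userA", servers))  # True: same user, same server
```

Because each user's scores for the surviving servers do not change, removing S3 leaves every user who was on S1 or S2 exactly where they were.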


Chapter 8: Cross-Server Messaging Problem

Now:

User A → Server1

User B → Server2

User A sends:

Hello

Question:

How does Server1 reach User B?

It cannot.

Servers are isolated.


Chapter 9: Backplane

Need a central communication layer.

Diagram:

           NATS

      /      |      \

    S1      S2      S3

Now:

S1 publishes

"Hello"

NATS delivers it to every server subscribed to that subject. Here, that is:

S2

S2 forwards to User B.

Problem solved.
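The backplane flow can be simulated in a few lines. The Broker class below is a toy in-process stand-in for NATS (not its client API), and the one-subject-per-user scheme (user.<id>) is a hypothetical choice; room subjects work the same way.

```python
from typing import Callable

Handler = Callable[[str], None]

class Broker:
    """Toy stand-in for NATS: subject -> subscriber callbacks."""

    def __init__(self) -> None:
        self.subs: dict[str, list[Handler]] = {}

    def subscribe(self, subject: str, handler: Handler) -> None:
        self.subs.setdefault(subject, []).append(handler)

    def publish(self, subject: str, msg: str) -> None:
        # Fire-and-forget: no subscribers means the message is simply dropped.
        for handler in self.subs.get(subject, []):
            handler(msg)

class WsServer:
    """A WebSocket server; `inbox` stands in for frames written to local sockets."""

    def __init__(self, name: str, broker: Broker) -> None:
        self.name = name
        self.broker = broker
        self.inbox: dict[str, list[str]] = {}

    def connect(self, user_id: str) -> None:
        self.inbox[user_id] = []
        # Hypothetical scheme: user.<id> reaches whichever server holds that user.
        self.broker.subscribe(f"user.{user_id}", lambda msg, u=user_id: self.inbox[u].append(msg))

    def send(self, to_user: str, msg: str) -> None:
        if to_user in self.inbox:          # recipient is on this server: write directly
            self.inbox[to_user].append(msg)
        else:                              # otherwise go through the backplane
            self.broker.publish(f"user.{to_user}", msg)

broker = Broker()
s1, s2 = WsServer("S1", broker), WsServer("S2", broker)
s1.connect("userA")
s2.connect("userB")
s1.send("userB", "Hello")
print(s2.inbox["userB"])  # ['Hello']
```

A real server would also unsubscribe when a socket disconnects; the toy version skips that to stay short.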


Chapter 10: Why Redis Pub/Sub Isn't Enough

Many developers use:

Redis Pub/Sub

Works great initially.

Problem:

Pub/Sub is:

Fire and Forget

If subscriber is offline:

Message lost

No replay.

No persistence.

Example:

Publish
      ↓
Subscriber down
      ↓
Message gone forever

For chat systems this may be unacceptable.
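The difference between fire-and-forget and a replayable log fits in two toy classes. FireAndForget models Redis Pub/Sub's delivery behavior; ReplayLog models the idea behind log-based options such as Redis Streams or NATS JetStream, where each reader tracks its own offset. Neither class is those products' API.

```python
class FireAndForget:
    """Only subscribers connected right now receive a message."""

    def __init__(self) -> None:
        self.online: dict[str, list[str]] = {}   # subscriber -> received messages

    def subscribe(self, name: str) -> None:
        self.online[name] = []

    def unsubscribe(self, name: str) -> None:    # e.g. the server crashed
        self.online.pop(name, None)

    def publish(self, msg: str) -> None:
        for inbox in self.online.values():
            inbox.append(msg)                    # offline subscribers never see it

class ReplayLog:
    """Messages are kept; each reader resumes from its own offset."""

    def __init__(self) -> None:
        self.entries: list[str] = []
        self.offsets: dict[str, int] = {}

    def publish(self, msg: str) -> None:
        self.entries.append(msg)

    def read(self, name: str) -> list[str]:
        """Everything this reader has not seen, including messages sent while it was down."""
        start = self.offsets.get(name, 0)
        self.offsets[name] = len(self.entries)
        return self.entries[start:]

pubsub = FireAndForget()
pubsub.subscribe("S2")
pubsub.unsubscribe("S2")     # S2 goes down...
pubsub.publish("Hello")      # ...while this is published
pubsub.subscribe("S2")       # S2 comes back
print(pubsub.online["S2"])   # []

log = ReplayLog()
log.read("S2")               # S2 is caught up
log.publish("Hello")         # published while S2 is down
print(log.read("S2"))        # ['Hello']
```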


Chapter 11: Why NATS Helped

NATS gives:

High throughput
Low latency
Simple setup

Especially good for:

Cross-server communication

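NATS's published benchmarks show millions of small messages per second on a single server; real numbers depend on message size, hardware, and subscriber count.

One caveat: core NATS is also fire-and-forget (at-most-once), like Redis Pub/Sub. Persistence and replay come from NATS JetStream, or from writing every message to a database first.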


Chapter 12: Room Management

Suppose:

Frontend Room

contains:

5000 users

Question:

Where do we store room members?

Usually:

Redis Set

Example:

room:frontend

contains:

user1
user2
user3

This allows any server to know room membership.
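Redis sets map naturally onto this: SADD room:frontend user1 adds a member, SREM removes one, SMEMBERS lists them. The Python sketch below uses in-memory dicts as a stand-in for those sets (an assumption, so it runs without a Redis server), plus a hypothetical userId → server map so a sender can publish only to servers that actually host members of the room.

```python
from collections import defaultdict

class RoomDirectory:
    """In-memory stand-in for shared Redis state: room sets + user locations."""

    def __init__(self) -> None:
        self.members: dict[str, set[str]] = defaultdict(set)  # like room:<name> sets
        self.location: dict[str, str] = {}                    # hypothetical userId -> server

    def join(self, room: str, user_id: str) -> None:
        self.members[room].add(user_id)          # ~ SADD room:<name> <user>

    def leave(self, room: str, user_id: str) -> None:
        self.members[room].discard(user_id)      # ~ SREM room:<name> <user>

    def connected(self, user_id: str, server: str) -> None:
        self.location[user_id] = server

    def servers_for(self, room: str) -> set[str]:
        """Servers hosting at least one online member: publish only to these."""
        return {self.location[u] for u in self.members.get(room, set()) if u in self.location}

rooms = RoomDirectory()
for user in ("user1", "user2", "user3"):
    rooms.join("frontend", user)
rooms.connected("user1", "S1")
rooms.connected("user3", "S2")
print(sorted(rooms.servers_for("frontend")))  # ['S1', 'S2']
```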


Chapter 13: Fan-Out Problem

User sends:

Hello Everyone

to room:

5000 users

Server must:

Receive 1 message
Generate 5000 sends

Diagram:

1 Message

     |
     v

5000 Outputs

This is Fan-Out.
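One cheap fan-out optimization is to serialize the message once and write the same bytes to every socket, instead of re-encoding it per recipient. A sketch with a hypothetical FakeSocket standing in for real WebSocket connections:

```python
import json

class FakeSocket:
    """Stand-in for a WebSocket; records the frames written to it."""

    def __init__(self) -> None:
        self.frames: list[bytes] = []

    def send(self, frame: bytes) -> None:
        self.frames.append(frame)

def fan_out(message: dict, sockets: list[FakeSocket]) -> int:
    """Encode once, then write the same bytes to every member socket."""
    frame = json.dumps(message).encode()   # 1 serialization...
    for sock in sockets:
        sock.send(frame)                   # ...N writes
    return len(sockets)

room = [FakeSocket() for _ in range(5000)]
print(fan_out({"room": "frontend", "text": "Hello Everyone"}, room))  # 5000
```

The N writes remain; this only removes N-1 redundant encodes, which is often a noticeable share of fan-out CPU.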


Chapter 14: Hot Rooms

Imagine:

Stock Market Room

100k users.

Every second:

50 messages

Server performs:

50 × 100,000
=
5 Million Sends per second

Now CPU explodes.

The system becomes:

CPU Bound

instead of RAM bound.
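One common mitigation (a technique added here, not something this chapter prescribes) is batching: buffer a hot room's messages for a short tick and send one frame per recipient per tick. A hypothetical sketch:

```python
class RoomBatcher:
    """Buffers a hot room's messages; flush() sends one frame per recipient per tick."""

    def __init__(self, recipients: list[str]) -> None:
        self.recipients = recipients
        self.pending: list[str] = []
        self.writes = 0                      # socket writes performed so far

    def add(self, msg: str) -> None:
        self.pending.append(msg)             # no socket write yet

    def flush(self) -> dict[str, list[str]]:
        """Call from a timer, e.g. every 100 ms (the interval is an assumption)."""
        if not self.pending:
            return {}
        batch, self.pending = self.pending, []
        self.writes += len(self.recipients)  # one write per recipient, not per message
        return {user: batch for user in self.recipients}

users = [f"user{i}" for i in range(1000)]
batcher = RoomBatcher(users)
for i in range(50):
    batcher.add(f"price update {i}")
batcher.flush()
print(50 * len(users), "writes unbatched vs", batcher.writes, "batched")  # 50000 writes unbatched vs 1000 batched
```

The trade-off is latency: every message waits up to one tick. With a 100 ms tick (at most 10 flushes per second), the 100,000-user room above needs at most 1,000,000 writes per second instead of 5,000,000.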


Chapter 15: Presence Tracking

Users want:

Online
Offline
Last Seen

How?

Usually:

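SET user:123 online EX 60

in Redis: the key deletes itself after 60 seconds.

Every heartbeat refreshes it (and can record a Last Seen timestamp).

Example:

Ping every 30 sec

If two heartbeats in a row are missing, the key expires: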

Mark Offline
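The presence key can be modeled as a value with an expiry. A sketch mimicking SET user:<id> online EX 60; the 60-second TTL is an assumption (twice the 30-second ping, so one missed heartbeat is tolerated). Time is passed in explicitly so the logic is easy to test, and last_seen provides the Last Seen value.

```python
class Presence:
    """Mimics `SET user:<id> online EX <ttl>`: online until the key expires."""

    def __init__(self, ttl_s: float = 60.0) -> None:
        self.ttl_s = ttl_s
        self.expires_at: dict[str, float] = {}
        self.last_seen: dict[str, float] = {}

    def heartbeat(self, user_id: str, now: float) -> None:
        self.expires_at[user_id] = now + self.ttl_s   # refresh the expiry
        self.last_seen[user_id] = now

    def is_online(self, user_id: str, now: float) -> bool:
        return self.expires_at.get(user_id, float("-inf")) > now

p = Presence()
p.heartbeat("user:123", now=0)
print(p.is_online("user:123", now=59))  # True
print(p.is_online("user:123", now=61))  # False, Last Seen = 0
```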

Chapter 16: Heartbeats

Question:

How do we know connection still exists?

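Network cable may disconnect.

WiFi may die.

In both cases TCP can stay silent for a long time, so the server still believes the socket is open.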

Server sends:

PING

Client responds:

PONG

If no response:

Disconnect user
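A sketch of the server-side check, assuming a timer calls tick() every 30 seconds and that the caller actually sends the PING frames and closes the sockets it is told to (the class and method names are hypothetical). A socket that has not answered the previous PING by the next tick is considered dead.

```python
class HeartbeatMonitor:
    """Tracks which sockets answered the last PING."""

    def __init__(self) -> None:
        self.alive: dict[str, bool] = {}

    def track(self, socket_id: str) -> None:
        self.alive[socket_id] = True           # a fresh connection counts as alive

    def on_pong(self, socket_id: str) -> None:
        if socket_id in self.alive:
            self.alive[socket_id] = True

    def tick(self) -> tuple[list[str], list[str]]:
        """Return (sockets to close, sockets to PING now)."""
        to_close: list[str] = []
        to_ping: list[str] = []
        for socket_id, ok in list(self.alive.items()):
            if not ok:                         # no PONG since the previous tick
                to_close.append(socket_id)
                del self.alive[socket_id]
            else:
                self.alive[socket_id] = False  # expect a PONG before the next tick
                to_ping.append(socket_id)
        return to_close, to_ping

monitor = HeartbeatMonitor()
monitor.track("abc123")
monitor.track("def456")
print(monitor.tick())      # ([], ['abc123', 'def456'])
monitor.on_pong("abc123")  # only abc123 answers
print(monitor.tick())      # (['def456'], ['abc123'])
```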

Chapter 17: What Happens If Server Crashes?

Imagine:

Server2 dies

All users lose connection.

5000 users

immediately reconnect.

This is dangerous.


Chapter 18: Thundering Herd

Diagram:

5000 Users

     |
     |
Reconnect NOW

Every user performs:

TLS handshake
Authentication
Room Join

at same moment.

The server absorbing them is overloaded and can die too.


Chapter 19: Exponential Backoff + Jitter

Bad:

Reconnect after 1 sec

for everyone.

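Good:

A random delay per client,
with the maximum doubling after each failed attempt (up to a cap).

For example: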

1.2 sec
3.5 sec
7.8 sec
5.1 sec

Randomized.

Load becomes:

Spread out

instead of:

Huge spike

Chapter 20: Message Persistence

Pub/Sub only delivers.

But chats need history.

Therefore:

WebSocket
≠ Storage

Messages must also be written to:

  • PostgreSQL

  • MongoDB

  • Cassandra

Example:

Message arrives
        |
Store in DB
        |
Publish to room

Both are needed.


Chapter 21: Message Ordering

Imagine:

Msg1
Msg2
Msg3

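Network delays and routing:

Over one TCP connection, frames arrive in the order they were sent. But messages can take different paths (senders on different servers, broker hops, retries after a reconnect), so:

Msg2 arrives first

Bad.

Usually the server gives each message in a room a:

Sequence Number

when it stores it.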

Example:

1001
1002
1003

Client reorders.
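A hypothetical client-side buffer: hold early arrivals, release messages strictly in sequence order, drop duplicates, and report gaps that could be fetched from stored history. Starting from a known next_seq (for example, the last sequence the client loaded from history, plus one) is an assumption.

```python
class ReorderBuffer:
    """Client side: hand messages to the UI strictly in sequence order."""

    def __init__(self, next_seq: int) -> None:
        self.next_seq = next_seq             # first sequence number we expect
        self.held: dict[int, str] = {}       # early arrivals waiting for a gap to fill

    def receive(self, seq: int, body: str) -> list[str]:
        """Return the messages that can be shown now (possibly none)."""
        if seq < self.next_seq or seq in self.held:
            return []                        # duplicate (e.g. after a retry): ignore
        self.held[seq] = body
        ready: list[str] = []
        while self.next_seq in self.held:
            ready.append(self.held.pop(self.next_seq))
            self.next_seq += 1
        return ready

    def missing(self) -> list[int]:
        """Gaps to fetch from history, e.g. [1001] while 1002 is held."""
        if not self.held:
            return []
        return [s for s in range(self.next_seq, max(self.held)) if s not in self.held]

buf = ReorderBuffer(next_seq=1001)
print(buf.receive(1002, "Msg2"))  # []
print(buf.missing())              # [1001]
print(buf.receive(1001, "Msg1"))  # ['Msg1', 'Msg2']
print(buf.receive(1003, "Msg3"))  # ['Msg3']
```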


Chapter 22: Backpressure

Suppose:

Client receives slower
than server sends

Example:

Server sends:
1000 msg/sec

Client reads:
10 msg/sec

The server's send buffer for that client keeps growing.

Memory fills.

Eventually the server crashes.

Solutions:

Drop messages
Compress messages
Disconnect slow consumers
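A sketch of a bounded per-client send queue that combines two of the solutions above: drop the oldest frame when the queue is full, and disconnect a client that keeps falling behind. max_len and max_drops are hypothetical knobs. Dropping suits ephemeral updates (typing indicators, prices); for chat messages, disconnecting and letting the client resync from stored history is usually safer.

```python
from collections import deque

class ClientQueue:
    """Bounded send queue: drop oldest when full, disconnect after too many drops."""

    def __init__(self, max_len: int = 100, max_drops: int = 500) -> None:
        self.queue: deque[bytes] = deque()
        self.max_len = max_len
        self.max_drops = max_drops
        self.dropped = 0
        self.closed = False

    def enqueue(self, frame: bytes) -> None:
        if self.closed:
            return
        if len(self.queue) >= self.max_len:
            self.queue.popleft()             # drop the oldest frame
            self.dropped += 1
            if self.dropped >= self.max_drops:
                self.closed = True           # slow consumer: disconnect it
                self.queue.clear()           # and release its memory
                return
        self.queue.append(frame)

    def drain(self, n: int) -> list[bytes]:
        """The client read up to n frames."""
        return [self.queue.popleft() for _ in range(min(n, len(self.queue)))]

# The server pushes 1000 frames while this client reads none.
q = ClientQueue(max_len=100, max_drops=500)
for _ in range(1000):
    q.enqueue(b"tick")
print(q.closed, q.dropped)  # True 500
```

Memory per client is now capped at max_len frames no matter how slowly it reads.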

Chapter 23: Monitoring

Without monitoring:

You are blind

Track:

Active connections
RAM usage
CPU usage
Reconnects
Room sizes
Fan-out latency
Broker latency

Common tools:

  • Prometheus (collects and stores metrics)

  • Grafana (dashboards)


Complete Production Architecture

                    Users
                       |
                       |
                Load Balancer
                       |
       --------------------------------
       |              |              |
       |              |              |
      WS1            WS2            WS3
       |              |              |
       --------------------------------
                       |
                    NATS
                       |
                   Redis
                       |
                  Database

Responsibilities

Load Balancer

  • Distributes connections

WebSocket Servers

  • Maintain sockets

  • Authenticate users

  • Broadcast messages

NATS

  • Cross-server communication

Redis

  • Presence

  • Room membership

  • Fast lookups

Database

  • Permanent message storage


This version covers what a senior engineer should understand when discussing a large-scale real-time system like WhatsApp, Discord, Slack, or the collaborative architecture behind the author's DevScribble project.
==============================

Real-Time WebSocket System Design Notes

(Explained in the Simplest Possible Way)


1. Why WebSockets Exist

Normally, in HTTP:

Client ---> Request ---> Server
Client <--- Response --- Server

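The exchange ends with the response. Even if HTTP keep-alive reuses the TCP connection, the server cannot send anything the client did not ask for.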

Problem:

For apps like:

  • WhatsApp

  • Discord

  • Slack

  • Google Docs

  • Trading Apps

  • Multiplayer Games

The server needs to send updates instantly without waiting for a request.

Example:

User A sends message
        ↓
Server
        ↓
User B should receive instantly

This is where WebSockets help.


2. What is a WebSocket?

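A WebSocket is a persistent, two-way connection: after an HTTP Upgrade handshake, client and server keep one TCP connection open and exchange messages over it (RFC 6455).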

Instead of:

Request
Response
Close

it becomes:

Client ================= Server

Connection stays open
Both sides can send messages anytime

Think of it as:

📞 Phone Call

instead of

📧 Email


3. Why Scaling WebSockets is Hard

Many developers think:

More users
→ More CPU

Not quite.

For WebSocket systems:

More users
→ More Memory (RAM)

is usually the first problem.


4. Why Each Connection Uses Memory

Suppose:

1 user = 1 WebSocket connection

Server stores:

  • TCP socket

  • Read buffer

  • Write buffer

  • Connection metadata

  • Authentication info

  • Room information

Example:

User 1
User 2
User 3
...
User 10000

Each connection occupies memory.


Example

If one connection uses:

50 KB

Then:

10000 users

=
10000 × 50 KB

=
500 MB RAM

Only for connections.

Not counting:

  • application memory

  • caches

  • message queues

  • database clients


Important Insight

WebSocket servers often become:

Memory Bound

before becoming:

CPU Bound

5. First Scaling Attempt: Vertical Scaling

Idea:

Buy Bigger Server

Diagram:

        64 GB RAM
             │
             ▼
      ┌───────────┐
      │ WebSocket │
      │  Server   │
      └───────────┘

Keep increasing:

8 GB
16 GB
32 GB
64 GB
128 GB

Problems:

  • Expensive

  • Limited maximum size

  • Single point of failure

Eventually:

Cannot scale anymore

6. Better Solution: Horizontal Scaling

Instead of:

1 huge server

Use:

Many smaller servers

Diagram:

                 Users
                   │
                   ▼

           ┌────────────┐
           │ Load       │
           │ Balancer   │
           └────────────┘

          /      |      \
         /       |       \

   Server1   Server2   Server3

Now:

10000 users

can be distributed.

Example:

Server1 = 3300 users
Server2 = 3300 users
Server3 = 3400 users

Much better.


7. New Problem Appears

Imagine:

User A connected to Server 1
User B connected to Server 2

Diagram:

User A
   │
Server 1

User B
   │
Server 2

User A sends message:

Hello

How does Server 1 know User B exists on Server 2?

It doesn't.

This creates the:

Cross-Server Communication Problem


8. Solution: Message Broker (Backplane)

A central messaging system is added.

The creator of the video these notes are based on used:

NATS

Diagram:

               NATS
                 │
        ┌────────┼────────┐
        │        │        │

    Server1  Server2  Server3

Message Flow

User A sends:

Hello

Flow:

User A
   │
   ▼
Server 1
   │
 Publish
   ▼
 NATS
   │
 Deliver
   ▼
Server 2
   │
   ▼
User B

Now every server can communicate.


9. Why NATS Was Chosen

The video's creator tested multiple options.

Advantages:

✅ Very fast

✅ Lightweight

✅ Easy setup

✅ Low memory usage

✅ Excellent for pub/sub


Pub/Sub Explained

Publisher:

Sends messages

Subscriber:

Receives messages

Example:

Channel = room123
Server2 subscribes to room123
Server1 publishes to room123

Message automatically arrives.
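NATS calls these channels subjects: dot-separated tokens such as room.room123. Subscribers may use wildcards: * matches exactly one token, and > matches one or more trailing tokens. A toy Python matcher illustrating those rules (the room.<id> naming is an assumption, and a real NATS server does this matching internally):

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """'*' matches one token; '>' (last token only) matches one or more tokens."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, tok in enumerate(p_tokens):
        if tok == ">":
            return len(s_tokens) > i         # needs at least one remaining token
        if i >= len(s_tokens):
            return False                     # subject is shorter than the pattern
        if tok != "*" and tok != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)    # no unmatched subject tokens left

print(subject_matches("room.*", "room.room123"))  # True
print(subject_matches("room.*", "room.a.b"))      # False
print(subject_matches("room.>", "room.a.b"))      # True
```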


10. Load Balancer

A load balancer decides:

Which server gets a new connection

Diagram:

User
  │
  ▼
Load Balancer
  │
 ┌┴┐
 │ │
 ▼ ▼
S1 S2

11. Least Connections Strategy

Idea:

Send user to server
with lowest active connections

Example:

S1 = 500 users
S2 = 300 users

New user goes to:

S2

Seems smart.

But problem appears.


12. Why Least Connections Fails

Suppose:

S1 = 500 quiet users

and

S2 = 300 extremely active users

Actual load:

S1 → Low load

S2 → Very high load

Connection count ≠ actual work.

Important lesson:

Lower connections
does not mean
lower server load

13. Round Robin

Very simple.

User1 → Server1
User2 → Server2
User3 → Server3
User4 → Server1
User5 → Server2

Diagram:

S1
S2
S3

S1
S2
S3

repeat...


Why It Worked

Because of:

Law of Large Numbers

When users become very large:

10000+

each server ends up with a similar mix of active and inactive users.

Result:

Average load becomes balanced

without complex calculations.

A surprising discovery from the video.
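Both strategies and the law-of-large-numbers effect can be checked with a small simulation. The 10% active-user share and the seed are assumptions; the point is only that round robin gives each server a similar number of active users once user counts are large.

```python
import random
from itertools import cycle

def round_robin(servers: list[str], users: int) -> dict[str, int]:
    """Assign users in turn: S1, S2, S3, S1, ..."""
    counts = {s: 0 for s in servers}
    picker = cycle(servers)
    for _ in range(users):
        counts[next(picker)] += 1
    return counts

def least_connections(open_sockets: dict[str, int]) -> str:
    """Pick the server with the fewest open sockets (ignores how busy they are)."""
    return min(open_sockets, key=lambda s: open_sockets[s])

def active_users_per_server(servers: list[str], users: int,
                            active_share: float, seed: int = 7) -> dict[str, int]:
    """Round-robin users; each is independently active with probability active_share."""
    rng = random.Random(seed)
    load = {s: 0 for s in servers}
    for i in range(users):
        if rng.random() < active_share:
            load[servers[i % len(servers)]] += 1   # same order as round_robin
    return load

servers = ["S1", "S2", "S3"]
print(round_robin(servers, 10_000))               # {'S1': 3334, 'S2': 3333, 'S3': 3333}
print(least_connections({"S1": 500, "S2": 300}))  # S2
print(active_users_per_server(servers, 10_000, 0.10))
```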


14. Rooms and Group Chats

Suppose:

Room = Frontend Developers

Contains:

5000 users

One user sends:

Hello Everyone

What Server Must Do

Receive message

Find all users

Send to 5000 users

This process is called:

Fan-Out

1 Message
      ↓
5000 Copies

Diagram:

         Message
             │
             ▼

      ┌────────────┐
      │  Server    │
      └────────────┘

      / / / / / / /
     ▼ ▼ ▼ ▼ ▼ ▼ ▼

 5000 WebSocket Sends

15. Hot Rooms

A room becomes:

Hot Room

when:

Huge number of users
+
Heavy activity

Example:

  • Global Chat

  • Stock Market Channel

  • Live Stream Chat


Why Hot Rooms Are Dangerous

Initially system was:

Memory Bound

Now:

CPU Bound

because:

Message
      ↓
Thousands of Sends
      ↓
Serialization
      ↓
Network Writes
      ↓
CPU Usage Explodes

Two Types of Bottlenecks

Memory Bound

Many idle users

Example:

100000 users

but nobody talking.

Problem:

RAM

CPU Bound

Few users

but huge activity.

Example:

1000 users

sending messages every second.

Problem:

CPU

16. Reconnection Problem

Imagine:

Server crashes

All users reconnect.

10000 users

at the same time.

Diagram:

10000 Clients
      │
      ▼

Reconnect NOW

This is called:

Thundering Herd Problem


Why It's Dangerous

Every user:

TCP Handshake
TLS Handshake
Authentication
Room Join

all at once.

Server dies again.


17. Solution: Jitter

Instead of reconnecting instantly:

Random Delay

Example:

Client 1 → 2 sec

Client 2 → 5 sec

Client 3 → 8 sec

Client 4 → 1 sec

Diagram:

Reconnect Wave

█
██
████
██████
████████

instead of

██████████████████

Load spreads over time.

Massive improvement.


Final Architecture

                    Users
                      │
                      ▼

              ┌─────────────┐
              │Load Balancer│
              └─────────────┘

          /        |         \
         /         |          \

   ┌────────┐ ┌────────┐ ┌────────┐
   │Server1 │ │Server2 │ │Server3 │
   └────────┘ └────────┘ └────────┘
         \         |         /
          \        |        /

             ┌─────────┐
             │  NATS   │
             └─────────┘

What a Senior Engineer Should Learn From This

Lesson 1

WebSocket systems are usually constrained by:

RAM first
CPU later

Lesson 2

At scale, horizontal scaling is mandatory.

1 big server
❌

Many servers
✅

Lesson 3

Multiple WebSocket servers require a backplane.

Examples:

  • NATS

  • Apache Kafka

  • Redis Pub/Sub


Lesson 4

Connection count is not equal to load.

100 connections
can be heavier

than

1000 connections

depending on activity.


Lesson 5

Group chats are the real scaling challenge.

1-to-1 messaging
Easy

1-to-10000 room
Hard

Lesson 6

Always design for reconnection storms.

Use:

  • Exponential Backoff

  • Jitter

  • Rate Limiting


One-Line Interview Summary

A scalable WebSocket architecture uses horizontally scaled WebSocket servers behind a load balancer, a messaging backplane such as NATS for cross-server communication, strategies to handle hot-room fan-out, and jittered reconnections to survive thundering-herd events while balancing memory and CPU bottlenecks.
