Scalable Websocket Systems: What to know?
How connections are established.
Why sticky sessions matter.
How room membership is tracked.
How message ordering is handled.
What happens when a WebSocket server dies.
Presence (online/offline users).
Persistence vs Pub/Sub.
Why Redis Pub/Sub may fail at scale.
Why NATS worked better.
Fan-out optimization techniques.
Backpressure handling.
Monitoring and metrics.
Multi-region challenges.
Let's rebuild the entire story from the beginning as if we are designing WhatsApp ourselves.
Chapter 1: User Opens WhatsApp
Suppose User A opens WhatsApp.
Immediately:
User A
|
|
Internet
|
Load Balancer
|
WebSocket Server
The client requests:
GET /socket HTTP/1.1
Connection: Upgrade
Upgrade: websocket
Server accepts with:
HTTP/1.1 101 Switching Protocols
Now connection becomes:
User A <=================> Server
and stays alive.
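To make the upgrade step concrete, here is a minimal sketch of how a server derives the Sec-WebSocket-Accept response header from the client's Sec-WebSocket-Key, as RFC 6455 specifies. The function name accept_key is only illustrative; in practice a WebSocket library does this for you.

```python
import base64
import hashlib

# Fixed GUID that RFC 6455 defines for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(client_key: str) -> str:
    """Derive the Sec-WebSocket-Accept value from Sec-WebSocket-Key."""
    digest = hashlib.sha1((client_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key taken from the RFC 6455 handshake example.
print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

The client checks this value before treating the connection as upgraded, which proves the server actually understood the WebSocket request.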
Chapter 2: Why Not HTTP?
With HTTP:
User A
|
Request
|
Server
If User B sends a message:
Server cannot push it to User A
because User A's HTTP request has already ended.
Server must wait for User A's next request.
WebSocket solves this:
User A <===============> Server
User B <===============> Server
Server can push instantly.
Chapter 3: What Is Stored For Every Connection?
When a user connects:
socketId
userId
tenantId
rooms
auth data
buffers
heartbeat state
Example:
{
"socketId": "abc123",
"userId": "user1",
"rooms": ["room1","room2"]
}
Server keeps this in memory.
This is why RAM becomes important.
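A rough sketch of that per-connection record and the in-memory registry that holds it. The class and field names (ConnectionState, ConnectionRegistry, last_pong) are assumptions for illustration, not taken from any specific framework.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ConnectionState:
    socket_id: str
    user_id: str
    tenant_id: str
    rooms: set[str] = field(default_factory=set)
    # Heartbeat state: when we last heard a PONG from this client.
    last_pong: float = field(default_factory=time.monotonic)


class ConnectionRegistry:
    """Everything here lives in the server's RAM, one entry per socket."""

    def __init__(self) -> None:
        self._by_socket: dict[str, ConnectionState] = {}

    def add(self, conn: ConnectionState) -> None:
        self._by_socket[conn.socket_id] = conn

    def remove(self, socket_id: str) -> None:
        self._by_socket.pop(socket_id, None)

    def sockets_for_user(self, user_id: str) -> list[str]:
        # A user may have several sockets (phone, laptop, browser tab).
        return [c.socket_id for c in self._by_socket.values() if c.user_id == user_id]


registry = ConnectionRegistry()
registry.add(ConnectionState("abc123", "user1", "tenant1", {"room1", "room2"}))
print(registry.sockets_for_user("user1"))  # ['abc123']
```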
Chapter 4: Why RAM Dies First
Imagine:
1 connection = 50 KB
Then:
10000 users
=
500 MB
Now add:
Application memory
Redis client
Database pools
Node runtime
Caches
Suddenly:
2GB
4GB
8GB
gets consumed.
CPU might still be only:
20%
while RAM reaches:
95%
Server crashes.
This is called:
Memory Bound System
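The arithmetic above can be turned into a small capacity estimate. This is a back-of-the-envelope sketch: the 50 KB figure comes from the example above, while the overhead number in the usage line is an assumed value, and the functions use decimal units (1 MB = 1000 KB).

```python
def connection_memory_mb(connections: int, kb_per_connection: float = 50.0) -> float:
    """RAM used by connections alone, in MB."""
    return connections * kb_per_connection / 1000


def max_connections(ram_mb: float, overhead_mb: float, kb_per_connection: float = 50.0) -> int:
    """How many sockets fit once runtime, caches, and pools take their share."""
    usable_kb = max(ram_mb - overhead_mb, 0) * 1000
    return int(usable_kb // kb_per_connection)


print(connection_memory_mb(10_000))   # 500.0
# Assumed: an 8 GB machine where everything else needs about 2 GB.
print(max_connections(8000, 2000))    # 120000
```

Measure the real per-connection cost on your own stack before trusting any such number; it varies a lot with buffer sizes and runtime.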
Chapter 5: First Attempt (Vertical Scaling)
Developer says:
Let's buy a bigger machine
8 GB
↓
16 GB
↓
32 GB
↓
64 GB
Works for some time.
Then:
100k users arrive
and again:
Out of Memory
Another problem:
Single machine dies
=
Entire system dies
Bad design.
Chapter 6: Horizontal Scaling
Instead:
Server 1
Server 2
Server 3
Server 4
behind:
Load Balancer
Diagram:
         Users
           |
           |
     Load Balancer
       /   |   \
      /    |    \
    S1     S2    S3
Now users are distributed.
Chapter 7: Why Sticky Sessions Matter
Suppose:
User A connected to Server 1
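The WebSocket is a single long-lived TCP connection, so its frames stay there.
But if a related request (the upgrade handshake, a long-polling fallback, or a reconnect that expects existing session state) suddenly goes to: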
Server 3
problem.
Server 3 knows nothing about:
User A
Therefore load balancers usually maintain:
Connection Affinity
or
Sticky Sessions
Meaning:
User A
always stays on the same server
until disconnect.
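One simple way to get affinity is to hash a stable client identifier onto the server list. The sketch below assumes such an identifier exists (a user ID or session cookie); the function name pick_server is hypothetical. Real load balancers usually offer this as cookie-based or IP-hash stickiness.

```python
import hashlib


def pick_server(client_id: str, servers: list[str]) -> str:
    """Map the same client to the same server while the server list is unchanged."""
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["S1", "S2", "S3"]
first = pick_server("userA", servers)
second = pick_server("userA", servers)
print(first == second)  # True
```

Plain modulo hashing remaps many clients whenever a server is added or removed; consistent hashing is the usual refinement if that matters.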
Chapter 8: Cross-Server Messaging Problem
Now:
User A → Server1
User B → Server2
User A sends:
Hello
Question:
How does Server1 reach User B?
It cannot.
Servers are isolated.
Chapter 9: Backplane
Need a central communication layer.
Diagram:
        NATS
      /  |  \
    S1   S2   S3
Now:
S1 publishes
"Hello"
NATS delivers to:
S2
S2 forwards to User B.
Problem solved.
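The flow above, sketched with an in-memory stand-in for the broker. InMemoryBackplane and WsServer are made-up names, and this is not the NATS client API; it only shows the publish/subscribe shape, with one subject per user as an assumed naming scheme.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[str, str], None]  # (subject, payload)


class InMemoryBackplane:
    def __init__(self) -> None:
        self._subs: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, subject: str, handler: Handler) -> None:
        self._subs[subject].append(handler)

    def publish(self, subject: str, payload: str) -> int:
        """Deliver to current subscribers only; returns how many received it."""
        handlers = self._subs.get(subject, [])
        for handler in handlers:
            handler(subject, payload)
        return len(handlers)


class WsServer:
    def __init__(self, name: str, bus: InMemoryBackplane) -> None:
        self.name = name
        self.bus = bus
        self.inbox: dict[str, list[str]] = {}  # stands in for local sockets

    def connect(self, user_id: str) -> None:
        self.inbox[user_id] = []
        self.bus.subscribe(f"user.{user_id}", self._deliver)

    def _deliver(self, subject: str, payload: str) -> None:
        user_id = subject.split(".", 1)[1]
        self.inbox[user_id].append(payload)

    def send(self, to_user: str, text: str) -> int:
        return self.bus.publish(f"user.{to_user}", text)


bus = InMemoryBackplane()
s1, s2 = WsServer("S1", bus), WsServer("S2", bus)
s1.connect("A")
s2.connect("B")
s1.send("B", "Hello")
print(s2.inbox["B"])  # ['Hello']
```

Note that publishing to a subject nobody is subscribed to simply returns 0: the message is gone, which is the fire-and-forget behaviour discussed next.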
Chapter 10: Why Redis Pub/Sub Isn't Enough
Many developers use:
Redis Pub/Sub
Works great initially.
Problem:
Pub/Sub is:
Fire and Forget
If subscriber is offline:
Message lost
No replay.
No persistence.
Example:
Publish
↓
Subscriber down
↓
Message gone forever
For chat systems this may be unacceptable.
Chapter 11: Why NATS Helped
NATS gives:
High throughput
Low latency
Simple setup
Especially good for:
Cross-server communication
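Throughput in the range of millions of small messages per second is achievable, depending on hardware and payload size.
Note that core NATS is also fire-and-forget (at-most-once delivery); replay and persistence need JetStream or a separate store.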
Chapter 12: Room Management
Suppose:
Frontend Room
contains:
5000 users
Question:
Where do we store room members?
Usually:
Redis Set
Example:
room:frontend
contains:
user1
user2
user3
This allows any server to know room membership.
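A minimal sketch of the same idea using in-memory sets. RoomDirectory is a hypothetical stand-in; with Redis, join, leave, and members would map onto the SADD, SREM, and SMEMBERS commands against a key such as room:frontend.

```python
from collections import defaultdict


class RoomDirectory:
    """In-memory stand-in for one Redis set per room."""

    def __init__(self) -> None:
        self._members: dict[str, set[str]] = defaultdict(set)

    def join(self, room: str, user_id: str) -> None:
        self._members[room].add(user_id)

    def leave(self, room: str, user_id: str) -> None:
        self._members[room].discard(user_id)
        if not self._members[room]:
            del self._members[room]  # do not keep empty rooms around

    def members(self, room: str) -> set[str]:
        return set(self._members.get(room, ()))


rooms = RoomDirectory()
for user in ("user1", "user2", "user3"):
    rooms.join("room:frontend", user)
rooms.leave("room:frontend", "user2")
print(sorted(rooms.members("room:frontend")))  # ['user1', 'user3']
```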
Chapter 13: Fan-Out Problem
User sends:
Hello Everyone
to room:
5000 users
Server must:
Receive 1 message
Generate 5000 sends
Diagram:
1 Message
|
v
5000 Outputs
This is Fan-Out.
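One common fan-out optimization is to serialize the message once and reuse the same frame for every recipient, instead of encoding it 5000 times. A sketch, with FakeSocket as a made-up stand-in for a real socket object:

```python
import json


class FakeSocket:
    """Test double that records what would have been written to the wire."""

    def __init__(self) -> None:
        self.frames: list[str] = []

    def send(self, frame: str) -> None:
        self.frames.append(frame)


def fan_out(message: dict, sockets: list[FakeSocket]) -> int:
    # Serialize once, outside the loop; every recipient gets the same string.
    frame = json.dumps(message, separators=(",", ":"))
    sent = 0
    for sock in sockets:
        sock.send(frame)
        sent += 1
    return sent


room = [FakeSocket() for _ in range(5000)]
print(fan_out({"room": "frontend", "text": "Hello Everyone"}, room))  # 5000
```

The loop still performs 5000 writes, but the JSON encoding cost is paid only once per message.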
Chapter 14: Hot Rooms
Imagine:
Stock Market Room
100k users.
Every second:
50 messages
Server performs:
50 × 100000
=
5 Million Sends per second
Now CPU explodes.
The system becomes:
CPU Bound
instead of RAM bound.
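A common mitigation for hot rooms is batching: collect messages for a short window and send each client one combined frame per window. The sketch below only redoes the arithmetic; the 100 ms window is an assumed value, and the function name is hypothetical.

```python
import math


def sends_per_second(messages_per_sec: int, members: int, batch_window_ms: int = 0) -> int:
    """Socket writes per second for one room.

    batch_window_ms == 0 means every message is its own frame.
    Otherwise each member gets at most one frame per window.
    """
    if batch_window_ms <= 0:
        frames_per_sec = messages_per_sec
    else:
        windows_per_sec = math.ceil(1000 / batch_window_ms)
        frames_per_sec = min(messages_per_sec, windows_per_sec)
    return frames_per_sec * members


print(sends_per_second(50, 100_000))        # 5000000
print(sends_per_second(50, 100_000, 100))   # 1000000
```

Batching cuts the number of writes, not the number of bytes, and it adds up to one window of latency, so it suits feeds where 100 ms of delay is acceptable.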
Chapter 15: Presence Tracking
Users want:
Online
Offline
Last Seen
How?
Usually:
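SET user:123 online EX 60
in Redis (the 60-second expiry is an example value).
Each heartbeat refreshes the key, so it expires on its own when heartbeats stop.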
Example:
Ping every 30 sec
If heartbeat missing:
Mark Offline
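The heartbeat-based presence check can be sketched as a timestamp plus a time-to-live. PresenceTracker is a hypothetical in-memory version; the 60-second TTL is an assumed value, chosen to be twice the 30-second ping so one missed heartbeat does not flip a user offline.

```python
import time
from typing import Callable, Optional


class PresenceTracker:
    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._last_seen: dict[str, float] = {}

    def heartbeat(self, user_id: str) -> None:
        self._last_seen[user_id] = self._clock()

    def is_online(self, user_id: str) -> bool:
        seen = self._last_seen.get(user_id)
        return seen is not None and self._clock() - seen <= self._ttl

    def last_seen(self, user_id: str) -> Optional[float]:
        return self._last_seen.get(user_id)


now = [0.0]
presence = PresenceTracker(ttl_seconds=60.0, clock=lambda: now[0])
presence.heartbeat("user:123")
now[0] = 30.0
print(presence.is_online("user:123"))  # True
now[0] = 100.0
print(presence.is_online("user:123"))  # False
```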
Chapter 16: Heartbeats
Question:
How do we know connection still exists?
Network cable may disconnect.
WiFi may die.
Server sends:
PING
Client responds:
PONG
If no response:
Disconnect user
Chapter 17: What Happens If Server Crashes?
Imagine:
Server2 dies
All users lose connection.
5000 users
immediately reconnect.
This is dangerous.
Chapter 18: Thundering Herd
Diagram:
5000 Users
|
|
Reconnect NOW
Every user performs:
TLS handshake
Authentication
Room Join
at same moment.
New server dies.
Chapter 19: Exponential Backoff + Jitter
Bad:
Reconnect after 1 sec
for everyone.
Good:
1.2 sec
3.5 sec
7.8 sec
5.1 sec
Randomized.
Load becomes:
Spread out
instead of:
Huge spike
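A sketch of the client-side delay calculation, using the "full jitter" variant: pick a random delay between zero and an exponentially growing ceiling. The base of 1 second and cap of 30 seconds are assumed values, and reconnect_delay is a hypothetical name.

```python
import random
from typing import Optional


def reconnect_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                    rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before reconnect attempt number `attempt` (0-based)."""
    rng = rng if rng is not None else random.Random()
    ceiling = min(cap, base * (2 ** attempt))  # 1, 2, 4, 8, ... capped at 30
    return rng.uniform(0, ceiling)             # the jitter


rng = random.Random(7)
for attempt in range(5):
    print(attempt, round(reconnect_delay(attempt, rng=rng), 2))
```

Because each client draws its own random delay, 5000 reconnects spread across the window instead of landing in the same instant.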
Chapter 20: Message Persistence
Pub/Sub only delivers.
But chats need history.
Therefore:
WebSocket
≠ Storage
Messages must also be written to:
PostgreSQL
MongoDB
Cassandra
Example:
Message arrives
|
Store in DB
|
Publish to room
Both are needed.
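The store-then-publish order can be sketched like this. SQLite from the standard library stands in for PostgreSQL, MongoDB, or Cassandra, and the publish callback stands in for the broker; the table layout and function names are assumptions for illustration.

```python
import sqlite3
from typing import Callable


def open_store() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE messages ("
        "seq INTEGER PRIMARY KEY AUTOINCREMENT, "
        "room TEXT NOT NULL, sender TEXT NOT NULL, body TEXT NOT NULL)"
    )
    return db


def handle_message(db: sqlite3.Connection, publish: Callable[[str, dict], None],
                   room: str, sender: str, body: str) -> int:
    # 1. Persist first, so the message survives even if delivery fails.
    with db:  # commits on success, rolls back on error
        cur = db.execute(
            "INSERT INTO messages (room, sender, body) VALUES (?, ?, ?)",
            (room, sender, body),
        )
    seq = cur.lastrowid
    # 2. Then publish for live delivery.
    publish(room, {"seq": seq, "sender": sender, "body": body})
    return seq


def history(db: sqlite3.Connection, room: str, after_seq: int = 0) -> list:
    """What a client fetches after reconnecting: everything it missed."""
    rows = db.execute(
        "SELECT seq, sender, body FROM messages WHERE room = ? AND seq > ? ORDER BY seq",
        (room, after_seq),
    )
    return rows.fetchall()


db = open_store()
handle_message(db, lambda room, msg: None, "room1", "user1", "Hello")
print(history(db, "room1"))  # [(1, 'user1', 'Hello')]
```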
Chapter 21: Message Ordering
Imagine:
Msg1
Msg2
Msg3
Network delays:
Msg2 arrives first
Bad.
Usually systems use:
Sequence Numbers
Example:
1001
1002
1003
Client reorders.
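A sketch of the client-side reordering: hold back anything that arrives early and release messages only in sequence. ReorderBuffer is a hypothetical name, and the sketch assumes the server assigns gap-free sequence numbers per conversation.

```python
class ReorderBuffer:
    def __init__(self, next_seq: int) -> None:
        self._next = next_seq                 # next sequence number we may show
        self._pending: dict[int, str] = {}    # early arrivals, keyed by seq

    def receive(self, seq: int, payload: str) -> list[str]:
        """Return the messages that are now safe to display, in order."""
        if seq < self._next or seq in self._pending:
            return []  # duplicate or already delivered
        self._pending[seq] = payload
        ready = []
        while self._next in self._pending:
            ready.append(self._pending.pop(self._next))
            self._next += 1
        return ready


buf = ReorderBuffer(next_seq=1001)
print(buf.receive(1002, "Msg2"))  # []
print(buf.receive(1001, "Msg1"))  # ['Msg1', 'Msg2']
print(buf.receive(1003, "Msg3"))  # ['Msg3']
```

A real client would also need a timeout that asks the server for a missing sequence number, otherwise one lost message blocks everything behind it.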
Chapter 22: Backpressure
Suppose:
Client receives slower
than server sends
Example:
Server sends:
1000 msg/sec
Client reads:
10 msg/sec
Memory fills.
Buffers grow.
Eventually crash.
Solutions:
Drop messages
Compress messages
Disconnect slow consumers
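Two of those options, dropping messages and disconnecting slow consumers, can be combined in a bounded per-client queue. The sketch below uses assumed limits and a hypothetical OutboundQueue class; it drops the oldest frame when full and signals a disconnect once too many frames were dropped.

```python
from collections import deque


class OutboundQueue:
    def __init__(self, max_pending: int = 100, max_drops: int = 50) -> None:
        self._queue: deque[str] = deque()
        self._max_pending = max_pending
        self._max_drops = max_drops
        self.dropped = 0

    def enqueue(self, frame: str) -> bool:
        """Queue a frame. Returns False when the client should be disconnected."""
        if len(self._queue) >= self._max_pending:
            self._queue.popleft()  # drop the oldest frame to bound memory
            self.dropped += 1
        self._queue.append(frame)
        return self.dropped <= self._max_drops

    def drain(self, count: int) -> list[str]:
        """Called when the socket is writable again."""
        out = []
        while self._queue and len(out) < count:
            out.append(self._queue.popleft())
        return out


q = OutboundQueue(max_pending=3, max_drops=2)
print([q.enqueue(f"m{i}") for i in range(6)])  # [True, True, True, True, True, False]
print(q.drain(10))                             # ['m3', 'm4', 'm5']
```

Dropping is fine for data where only the latest value matters (prices, cursors); for chat messages a disconnect followed by a history fetch is usually safer.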
Chapter 23: Monitoring
Without monitoring:
You are blind
Track:
Active connections
RAM usage
CPU usage
Reconnects
Room sizes
Fan-out latency
Broker latency
Common tooling:
Prometheus (collects and stores the metrics)
Grafana (dashboards and alerts on top of them)
Complete Production Architecture
                 Users
                   |
                   |
             Load Balancer
                   |
   --------------------------------
   |               |              |
   |               |              |
  WS1             WS2            WS3
   |               |              |
   --------------------------------
                   |
                 NATS
                   |
                 Redis
                   |
                Database
Responsibilities
Load Balancer
Distributes connections
WebSocket Servers
Maintain sockets
Authenticate users
Broadcast messages
NATS
Cross-server communication
Redis
Presence
Room membership
Fast lookups
Database
Permanent message storage
This is the level of detail a senior engineer should be comfortable with when discussing a large-scale real-time system like WhatsApp, Discord, or Slack.
==============================
Real-Time WebSocket System Design Notes
(Explained in the Simplest Possible Way)
1. Why WebSockets Exist
Normally, in HTTP:
Client ---> Request ---> Server
Client <--- Response --- Server
Connection closes after response.
Problem:
For apps like:
WhatsApp
Discord
Slack
Google Docs
Trading Apps
Multiplayer Games
The server needs to send updates instantly without waiting for a request.
Example:
User A sends message
↓
Server
↓
User B should receive instantly
This is where WebSockets help.
2. What is a WebSocket?
A WebSocket is a persistent, two-way (full-duplex) connection that runs over a single TCP connection.
Instead of:
Request
Response
Close
it becomes:
Client ================= Server
Connection stays open
Both sides can send messages anytime
Think of it as:
📞 A phone call that stays connected
instead of
a separate request and response for every exchange.
3. Why Scaling WebSockets is Hard
Many developers think:
More users
→ More CPU
Wrong.
For WebSocket systems:
More users
→ More Memory (RAM)
is usually the first problem.
4. Why Each Connection Uses Memory
Suppose:
1 user = 1 WebSocket connection
Server stores:
TCP socket
Read buffer
Write buffer
Connection metadata
Authentication info
Room information
Example:
User 1
User 2
User 3
...
User 10000
Each connection occupies memory.
Example
If one connection uses:
50 KB
Then:
10000 users
=
10000 × 50 KB
=
500 MB RAM
Only for connections.
Not counting:
application memory
caches
message queues
database clients
Important Insight
WebSocket servers often become:
Memory Bound
before becoming:
CPU Bound
5. First Scaling Attempt: Vertical Scaling
Idea:
Buy Bigger Server
Diagram:
  64 GB RAM
      │
      ▼
┌───────────┐
│ WebSocket │
│  Server   │
└───────────┘
Keep increasing:
8 GB
16 GB
32 GB
64 GB
128 GB
Problems:
Expensive
Limited maximum size
Single point of failure
Eventually:
Cannot scale anymore
6. Better Solution: Horizontal Scaling
Instead of:
1 huge server
Use:
Many smaller servers
Diagram:
            Users
              │
              ▼
       ┌────────────┐
       │    Load    │
       │  Balancer  │
       └────────────┘
         /    |    \
        /     |     \
  Server1  Server2  Server3
Now:
10000 users
can be distributed.
Example:
Server1 = 3300 users
Server2 = 3300 users
Server3 = 3400 users
Much better.
7. New Problem Appears
Imagine:
User A connected to Server 1
User B connected to Server 2
Diagram:
User A
│
Server 1
User B
│
Server 2
User A sends message:
Hello
How does Server 1 know User B exists on Server 2?
It doesn't.
This creates the:
Cross-Server Communication Problem
8. Solution: Message Broker (Backplane)
A central messaging system is added.
The creator used:
NATS
Diagram:
            NATS
             │
    ┌────────┼────────┐
    │        │        │
 Server1  Server2  Server3
Message Flow
User A sends:
Hello
Flow:
User A
│
▼
Server 1
│
Publish
▼
NATS
│
Deliver
▼
Server 2
│
▼
User B
Now every server can communicate.
9. Why NATS Was Chosen
The creator tested multiple options.
Advantages:
✅ Very fast
✅ Lightweight
✅ Easy setup
✅ Low memory usage
✅ Excellent for pub/sub
Pub/Sub Explained
Publisher:
Sends messages
Subscriber:
Receives messages
Example:
Channel = room123
Server1 publishes
Server2 subscribes
Message automatically arrives.
10. Load Balancer
A load balancer decides:
Which server gets a new connection
Diagram:
     User
      │
      ▼
Load Balancer
      │
    ┌─┴─┐
    │   │
    ▼   ▼
   S1   S2
11. Least Connections Strategy
Idea:
Send user to server
with lowest active connections
Example:
S1 = 500 users
S2 = 300 users
New user goes to:
S2
Seems smart.
But problem appears.
12. Why Least Connections Fails
Suppose:
S1 = 500 quiet users
and
S2 = 300 extremely active users
Actual load:
S1 → Low load
S2 → Very high load
Connection count ≠ actual work.
Important lesson:
Lower connections
does not mean
lower server load
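The mismatch is easy to see in numbers. In this sketch the connection counts come from the example above, while the per-user message rates are assumed values chosen only to illustrate quiet versus active users.

```python
def least_connections(conn_counts: dict[str, int]) -> str:
    """What the load balancer does: pick the server with the fewest sockets."""
    return min(conn_counts, key=conn_counts.get)


def message_load(conn_counts: dict[str, int], msgs_per_user: dict[str, float]) -> dict[str, float]:
    """Messages per second each server actually has to process."""
    return {server: conn_counts[server] * msgs_per_user[server] for server in conn_counts}


connections = {"S1": 500, "S2": 300}
rates = {"S1": 0.1, "S2": 5.0}  # assumed: quiet users vs extremely active users

print(least_connections(connections))    # S2
print(message_load(connections, rates))  # S2 carries far more traffic than S1
```

The balancer keeps sending new users to S2, the server that is already doing the most work.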
13. Round Robin
Very simple.
User1 → Server1
User2 → Server2
User3 → Server3
User4 → Server1
User5 → Server2
Diagram:
S1
S2
S3
S1
S2
S3
repeat...
Why It Worked
Because of:
Law of Large Numbers
When the number of users becomes very large:
10000+
active and inactive users naturally spread across servers.
Result:
Average load becomes balanced
without complex calculations.
A surprising discovery from the video.
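A small simulation of that effect. Each user gets a random activity level and is assigned round robin; the function names are hypothetical, and uniformly distributed activity is an assumption (real traffic is often more skewed, so the balance arrives more slowly).

```python
import random
from itertools import cycle


def simulate_round_robin(num_users: int, servers: list[str], seed: int = 42) -> dict[str, float]:
    """Total activity per server when users are assigned in rotation."""
    rng = random.Random(seed)
    load = {server: 0.0 for server in servers}
    for _, server in zip(range(num_users), cycle(servers)):
        load[server] += rng.random()  # this user's activity, between 0 and 1
    return load


def imbalance(load: dict[str, float]) -> float:
    """Gap between busiest and quietest server, relative to the mean."""
    values = list(load.values())
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean


for users in (30, 10_000):
    load = simulate_round_robin(users, ["S1", "S2", "S3"])
    print(users, round(imbalance(load), 3))
```

With a few dozen users the gap between servers can be large; with thousands it typically shrinks to a few percent without the balancer knowing anything about activity.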
14. Rooms and Group Chats
Suppose:
Room = Frontend Developers
Contains:
5000 users
One user sends:
Hello Everyone
What Server Must Do
Receive message
Find all users
Send to 5000 users
This process is called:
Fan-Out
1 Message
↓
5000 Copies
Diagram:
      Message
         │
         ▼
  ┌────────────┐
  │   Server   │
  └────────────┘
   / / / / / / /
  ▼ ▼ ▼ ▼ ▼ ▼ ▼
5000 WebSocket Sends
15. Hot Rooms
A room becomes:
Hot Room
when:
A huge number of users
+
Heavy activity
Example:
Global Chat
Stock Market Channel
Live Stream Chat
Why Hot Rooms Are Dangerous
Initially the system was:
Memory Bound
Now:
CPU Bound
because:
Message
↓
Thousands of Sends
↓
Serialization
↓
Network Writes
↓
CPU Usage Explodes
Two Types of Bottlenecks
Memory Bound
Many idle users
Example:
100000 users
but nobody talking.
Problem:
RAM
CPU Bound
Few users
but huge activity.
Example:
1000 users
sending messages every second.
Problem:
CPU
16. Reconnection Problem
Imagine:
Server crashes
All users reconnect.
10000 users
at the same time.
Diagram:
10000 Clients
│
▼
Reconnect NOW
This is called:
Thundering Herd Problem
Why It's Dangerous
Every user:
TCP Handshake
TLS Handshake
Authentication
Room Join
all at once.
Server dies again.
17. Solution: Jitter
Instead of reconnecting instantly:
Random Delay
Example:
Client 1 → 2 sec
Client 2 → 5 sec
Client 3 → 8 sec
Client 4 → 1 sec
Diagram:
Reconnect Wave
█
██
████
██████
████████
instead of
██████████████████
Load spreads over time.
Massive improvement.
Final Architecture
                Users
                  │
                  ▼
           ┌─────────────┐
           │Load Balancer│
           └─────────────┘
             /    |    \
            /     |     \
┌────────┐  ┌────────┐  ┌────────┐
│Server1 │  │Server2 │  │Server3 │
└────────┘  └────────┘  └────────┘
            \     |     /
             \    |    /
            ┌─────────┐
            │  NATS   │
            └─────────┘
What a Senior Engineer Should Learn From This
Lesson 1
WebSocket systems are usually constrained by:
RAM first
CPU later
Lesson 2
Horizontal scaling is mandatory.
1 big server
❌
Many servers
✅
Lesson 3
Multiple WebSocket servers require a backplane.
Examples:
NATS
Apache Kafka
Redis Pub/Sub
Lesson 4
Connection count is not equal to load.
100 connections
can be heavier
than
1000 connections
depending on activity.
Lesson 5
Group chats are the real scaling challenge.
1-to-1 messaging
Easy
1-to-10000 room
Hard
Lesson 6
Always design for reconnection storms.
Use:
Exponential Backoff
Jitter
Rate Limiting
One-Line Interview Summary
A scalable WebSocket architecture uses horizontally scaled WebSocket servers behind a load balancer, a messaging backplane such as NATS for cross-server communication, strategies to handle hot-room fan-out, and jittered reconnections to survive thundering-herd events while balancing memory and CPU bottlenecks.