Scalability is a buzzword in server architecture. So is fault tolerance. Software complexity and maintenance costs define the other side of the coin. The good news: there is a scalable server architecture with good fault tolerance for a medium-sized MMORPG at reasonable cost and complexity.
The term “server” is prone to misunderstanding. Depending on the context, it can mean:
- a machine
- a rack of machines
- a room full of racks of machines
- a realm
- an application/software
- a provider of something
- a game logic node
Therefore, this article uses the specific terms from the list instead of “server”. As funny as it sounds, we will avoid the term “server” in an article on server architecture.
Table of contents
- Nodes and threads
- The architecture of a realm
- Scalability and bottlenecks
- Game state management
- Resilient Node-to-Node Communication
- Handling node crash
Nodes and threads
A common approach to process work faster is to share the workload between several threads or nodes. A node is an independent running application that can have several independent running threads.
Communication between threads is realized within an application, usually with queues. Communication between nodes is called inter-process communication. It is realized with the TCP/IP protocol. This enables flexible scaling: Should the need arise, you can run different nodes on different hardware.
Processes that run on different threads within each node:
- receiving queue
- sending queue
- main loop
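As a minimal sketch of this thread layout (illustrative Python, not a specific framework; all names are assumptions): a receive thread feeds the receiving queue, the main loop is its only consumer, and processed work lands in the sending queue.

```python
import queue
import threading

class Node:
    def __init__(self):
        self.incoming = queue.Queue()   # filled by the receive thread
        self.outgoing = queue.Queue()   # drained by a send thread in a real node
        self.running = True

    def receive_thread(self, raw_messages):
        # In a real node this would read from a TCP socket.
        for msg in raw_messages:
            self.incoming.put(msg)
        self.incoming.put(None)          # sentinel: no more input

    def main_loop(self):
        # The game logic runs single-threaded; it talks to the other
        # threads only through the two queues.
        while self.running:
            msg = self.incoming.get()
            if msg is None:
                self.running = False
                break
            self.outgoing.put(f"processed:{msg}")

node = Node()
t = threading.Thread(target=node.receive_thread, args=(["hello", "world"],))
t.start()
node.main_loop()
t.join()

results = []
while not node.outgoing.empty():
    results.append(node.outgoing.get())
print(results)  # ['processed:hello', 'processed:world']
```

The queues decouple network timing from the game loop: the main loop never blocks on a socket, only on its own inbox.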
Processes that run on different nodes:
- connect with clients
- database operations
- zones & instances
- manage all nodes of a realm
- player authentication
The architecture of a realm
A realm is a logical unit that consists of several nodes. All nodes can run on the same hardware or on different machines. The number of client nodes depends on the number of players you expect and the hardware of the client nodes.
When you have several realms, you can build a fixed configuration where each realm has a defined number of nodes.
As each client node doesn't need realm-specific data, you can balance the user load among the client nodes of all realms.
This means more efficient use of your hardware resources. Also, unused zone & instance nodes can stay in a common pool until they are utilized by a particular realm.
Zone & instance node
The game logic is processed in zone and instance nodes. Each of these nodes is connected with the same database node. The game state lives in the RAM of the zone & instance node, so database access is limited to when the game state is first read or needs to be permanently stored.
Zone & instance nodes must not share state. That means, data entities like a player must only exist on one node at a given time to prevent inconsistency.
Database and Database backup node
Each realm has a database node. The complete database is in the RAM of the database node for performance reasons. The actual writing of data to the hard disk can lag behind.
Backups are frequently created on the database backup node, which should run on a different machine than the database node for crash resilience.
Why a client node?
Sending and receiving messages is a time-consuming task of its own. Therefore, each player connects to a client node. The client node is the endpoint in the player-realm connection.
When the realm must send data to 200 players, it sends the data only to the client nodes. The latter then send the data to all players. Furthermore, they prevent traffic spikes by spreading the sending over a few milliseconds.
Also, client nodes receive messages from players. You may implement CPU-consuming anti-cheat measures in the client nodes. For instance, they can check whether a player’s movement was valid.
Client nodes work together in a peer-to-peer style when new players connect to the realm. The node with the least number of connected players accepts the new player.
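A tiny sketch of that peer-to-peer admission rule (illustrative names, assuming the client nodes exchange their player counts): the node with the fewest connected players accepts the newcomer.

```python
# Hypothetical player counts per client node; the node id format is
# borrowed from the registry example later in the article.
client_nodes = {
    "FORTRAX-CLIENT-1": 143,
    "FORTRAX-CLIENT-2": 97,
    "FORTRAX-CLIENT-3": 120,
}

def pick_client_node(nodes):
    # Least-connections selection: the emptiest node takes the player.
    return min(nodes, key=nodes.get)

chosen = pick_client_node(client_nodes)
client_nodes[chosen] += 1
print(chosen)  # FORTRAX-CLIENT-2
```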
The first contact between a player and the game is the Log-In Node. This node is backed by a separate database that stores all legitimate players.
Successful authentication results in a session handle given to the player and to the appropriate realm node. The player can use the session handle to connect with the realm's client nodes.
The realm node will assign the player to the correct client node. All further communication is between the player and the client node.
A realm is controlled by a reviver node that connects to all other nodes and can restart them if needed. One reviver node can manage several realms.
The reviver node handles realm restart cycles and node crashes. Therefore, the reviver node has procedures to handle different events accordingly.
Single Point of Failure (SPOF)
The log-in node and the reviver node are SPOFs: they handle new players and restart your realm after a crash, and each is responsible for more than one realm. Therefore, you should run them in duplicate to prevent a single failure from taking down entire realms.
Scalability and bottlenecks
Scalability in an MMORPG setting means how well an architecture can cope with an increasing number of players.
The limiting nodes in the realm architecture are the client node, database node, and zone & instance node. Each of them has a limiting resource, so the hardware it runs on should focus on that resource.
You can add client nodes in unlimited numbers: they scale well. The same goes for zone & instance nodes. However, the limit is the number of players in one of these nodes. So when all players decide to inhabit the same zone, the architecture doesn't scale.
The number of players in a single zone is a bottleneck of scaling.
This is never a problem in instances because they restrict the number of players. The worst thing that can happen with instances is running into an instance limit, where you don't have enough resources to open another instance.
So as a rule of thumb, an instance node is less resource-consuming than a zone node. That means you can have many more instances and/or run more expensive operations (more complex battles, more complex pathfinding) in instances than in zones.
Most game state change is temporary, such as player position or health. However, some states like item distribution need to be stored permanently. That is the task of the database node, and this task is hard to share.
The database server can be a bottleneck of scaling.
Shared memory dilemma
In a typical game loop, a number of actions need to be processed. This can be the actions each player in a 40-man raid does in a time interval of 50 milliseconds. The zone & instance node must process these actions within the time frame.
You might want to process the actions in parallel. After all, players act in parallel. Forget about that.
Game logic that writes data in shared memory should be processed in a single thread.
This is less a rule and more a piece of advice. Writing in shared memory from different threads actually can work. However, it is so hard to implement and so error-prone, that you should forget about it.
The single-thread dilemma is that some time-consuming tasks cannot be meaningfully parallelized. Processing an MMORPG's action queue is one of these tasks.
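A minimal sketch of the single-writer rule (illustrative Python, hypothetical actions and entity names): all actions collected during a tick are drained by one thread, which is the only writer of the shared game state, so no locks are needed and the order is deterministic.

```python
from collections import deque

# Shared game state; only the drain loop below ever writes to it.
game_state = {"creature_2350": {"hp": 1000}}

def apply(action, state):
    kind, target, amount = action
    if kind == "damage":
        state[target]["hp"] -= amount

# Actions gathered from all players during one 50 ms tick.
tick_queue = deque([
    ("damage", "creature_2350", 100),
    ("damage", "creature_2350", 250),
])

# Single-threaded drain: deterministic order, no shared-memory races.
while tick_queue:
    apply(tick_queue.popleft(), game_state)

print(game_state["creature_2350"]["hp"])  # 650
```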
Game state management
The game states are:
- server state
- publishable state
- client state
The server state is computed on the zone & instance nodes by applying the game logic. For each server tick, there is exactly one server state.
The same nodes generate the publishable state from the server state in a process called interest management. In this process, they cut away as much as possible and leave only the information each client needs to depict the game.
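A sketch of such an interest-management pass (illustrative Python; the radius, entity names, and flat 2D positions are assumptions): for each player, keep only the entities within a fixed interest range.

```python
INTEREST_RADIUS = 30.0  # hypothetical interest range

entities = {
    "wolf_1": (10.0, 0.0),
    "wolf_2": (100.0, 0.0),
    "bear_1": (25.0, 5.0),
}

def publishable_state(player_pos, entities, radius=INTEREST_RADIUS):
    # Compare squared distances to avoid a sqrt per entity.
    px, py = player_pos
    visible = {}
    for name, (x, y) in entities.items():
        if (x - px) ** 2 + (y - py) ** 2 <= radius ** 2:
            visible[name] = (x, y)
    return visible

state = publishable_state((0.0, 0.0), entities)
print(sorted(state))  # ['bear_1', 'wolf_1']  -- wolf_2 is too far away
```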
(Figure: players evenly distributed across the zone; each player's publishable state is restricted to a certain range by interest management.)
The publishable state is specific for each player. When players are spread across the zone, each player will have a different publishable state. The zone & instance node needs to send these states to the client node that does nothing more than forwarding these data to the players.
As creatures are usually evenly distributed, the network traffic is stable in these situations. In the above example, the publishable state contains 1, 2, or 3 creatures per player, an average of 2 creatures. The outgoing traffic of the zone & instance node grows linearly with N (N = number of players).
However, when players are close together, the traffic increases as each player needs information not only about the creatures but also about the other players.
In the above example, the publishable state contains 2 creatures and 2 players for each player, meaning an average of 4 entities which is twice as much as before. The outgoing traffic of the zone & instance node could be N² (N = number of players).
The client nodes are in place to prevent exactly these N² traffic situations from hitting one node. So when the zone & instance node has a situation where several players have the same publishable state, the node sends the state only once to each client node.
It is then the task of the client nodes to forward the publishable state to each player. Therefore, the outgoing traffic hitting a single client node is similar to N² / C (N = number of players, C = number of client nodes).
As a result, you cannot prevent an N² traffic situation. Yet, you can make it manageable by distributing the traffic on several nodes.
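The deduplication step can be sketched like this (illustrative Python, hypothetical player and state names): identical publishable states going to the same client node are batched into a single message.

```python
# player -> (client node, publishable state id); values are assumptions.
players = {
    "alice": ("CLIENT-1", "stateA"),
    "bob":   ("CLIENT-1", "stateA"),
    "carol": ("CLIENT-2", "stateA"),
    "dave":  ("CLIENT-2", "stateB"),
}

def outgoing_messages(players):
    # Group recipients by (client node, state): one message per group.
    batches = {}
    for player, (node, state) in players.items():
        batches.setdefault((node, state), []).append(player)
    return batches

batches = outgoing_messages(players)
# 3 messages leave the zone node instead of 4; the client nodes fan
# the states out to their players.
print(len(batches))  # 3
```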
It is the task of the client node to compress data before sending it to the players. As data packets are very small and have high heterogeneity, classical compressions show little efficiency.
Much more effective are clever content considerations:
- round down as much as possible: most values that need to be exact in the server state don't need the same accuracy on the client
- only send relevant changes: when an entity barely moves, don't send that. Store on the client node what you sent, and compare further updates with that. Pack the outgoing data beginning with the strongest differences and cut when differences fall below a threshold or when the traffic limit is reached. This way you can even enforce an upper traffic limit.
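The second point can be sketched as follows (illustrative Python; the threshold, budget, entity names, and one-dimensional positions are all assumptions): compare the new state against what was last sent, order entities by how much they changed, and cut below the threshold or above the traffic budget.

```python
THRESHOLD = 0.5     # ignore movements smaller than this
BUDGET = 2          # max entities per update packet

last_sent = {"wolf": 10.0, "bear": 20.0, "boar": 30.0}
current   = {"wolf": 10.1, "bear": 25.0, "boar": 32.0}

def build_update(last_sent, current, threshold=THRESHOLD, budget=BUDGET):
    # Strongest differences first, then cut.
    deltas = [(abs(current[e] - last_sent[e]), e) for e in current]
    deltas.sort(reverse=True)
    update = {}
    for diff, entity in deltas:
        if diff < threshold or len(update) >= budget:
            break
        update[entity] = round(current[entity], 1)  # reduced precision
        last_sent[entity] = current[entity]         # remember what we sent
    return update

update = build_update(last_sent, current)
print(update)  # {'bear': 25.0, 'boar': 32.0} -- the wolf barely moved
```

Because the loop stops at the budget, the packet size has a hard upper bound, which is exactly the enforced traffic limit mentioned above.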
The task of the client is to build a visible world from the sparse publishable state. You want a high framerate (> 60 fps) and fluent movements. However, compared with these high demands, state updates arrive rarely.
The client cannot wait for a state update before rendering a new frame. Therefore, it has to interpolate between updates. In practice, interpolation means the client lags slightly behind the most recently received publishable state.
When players perform actions, the client needs to give immediate feedback to feel responsive. Therefore, it has to predict actions. This might be something like beginning a cast animation or highlighting a health bar. It is important not to predict irreversible or uninterruptable events like a death animation or a loot window opening.
Also, when you have the position, speed, and movement vector of a unit, you can simply let it continue its movement.
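Continuing a movement from the last known position and velocity is often called dead reckoning; a minimal sketch (illustrative values):

```python
def extrapolate(position, velocity, dt):
    # Advance the last known position along the movement vector.
    x, y = position
    vx, vy = velocity
    return (x + vx * dt, y + vy * dt)

# Last update said: unit at (100, 50), moving 4 units/s along x.
# 250 ms later, before the next update arrives, the client shows:
pos = extrapolate((100.0, 50.0), (4.0, 0.0), dt=0.25)
print(pos)  # (101.0, 50.0)
```

When the next real update arrives, the client blends toward it instead of snapping, so small prediction errors stay invisible.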
Resilient Node-to-Node Communication
Remote Procedure Call (RPC)
Sending a message to a server or client can be considered a remote procedure call (Wikipedia). You want your message to have a particular effect on the target node.
So the message “Subtract 100 hitpoints from the creature with ID 2350” is an RPC that expects something to be done. This RPC doesn't have a return value. That can be limiting. However, creating a return value for RPCs is complicated.
It is far easier to build your entire communication on RPCs without return values. Consider the RPC “Get hitpoints of the creature with ID 2350”. It could result in the target sending another RPC back to the sender that reports the hitpoints.
So your RPCs should be without a return value. Also, they must not block the program flow.
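A sketch of this fire-and-forget pattern (illustrative Python; real transport would be TCP between nodes, here replaced by in-process queues, and all procedure names are assumptions): a "get" is modeled as an RPC that asks the target to send an RPC back.

```python
import queue

class RpcNode:
    def __init__(self, name):
        self.name = name
        self.inbox = queue.Queue()
        self.creatures = {}

    def send(self, target, procedure, **args):
        # Fire and forget: enqueue the call and return immediately.
        target.inbox.put((self.name, procedure, args))

    def process_one(self, nodes):
        sender, procedure, args = self.inbox.get()
        if procedure == "get_hitpoints":
            hp = self.creatures[args["creature_id"]]
            # The "return value" is just another RPC back to the sender.
            self.send(nodes[sender], "report_hitpoints",
                      creature_id=args["creature_id"], hp=hp)
        elif procedure == "report_hitpoints":
            return args["hp"]

zone = RpcNode("zone")
zone.creatures[2350] = 900
client = RpcNode("client")
nodes = {"zone": zone, "client": client}

client.send(zone, "get_hitpoints", creature_id=2350)
zone.process_one(nodes)           # zone answers with a reply RPC
hp = client.process_one(nodes)    # client receives the reported value
print(hp)  # 900
```

Neither node ever blocks waiting for an answer; each just processes its inbox.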
Encapsulate data transfer
Create an abstraction layer for your RPC that handles how game data is packed into network messages and vice versa. This can be done by creating a library that covers low-level conversions and higher-level conversions that build on them.
For instance, you might have a low-level method “PackVector” that receives a vector as a parameter and returns packet-compatible data. Another, higher-level method “PackMOB” might receive creature information as a parameter and use, among others, “PackVector” to store position and direction.
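Such a layer might look like this in Python (the wire layout, three floats in network byte order plus a 32-bit ID, is an assumption for illustration):

```python
import struct

def pack_vector(x, y, z):
    # Low level: three 32-bit floats in network byte order.
    return struct.pack("!fff", x, y, z)

def unpack_vector(data, offset=0):
    return struct.unpack_from("!fff", data, offset)

def pack_mob(mob_id, position, direction):
    # Higher level: a 32-bit ID, then pack_vector twice.
    return struct.pack("!I", mob_id) + pack_vector(*position) + pack_vector(*direction)

packet = pack_mob(2350, (10.0, 0.0, 5.0), (1.0, 0.0, 0.0))
print(len(packet))  # 4 + 12 + 12 = 28 bytes
```

Keeping all conversions in one library means a layout change touches one file, not every call site.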
A message queue system is a sophisticated protocol that ensures that a message is received at least once or exactly once. Besides, you might want to give messages a priority.
This is far from trivial when you want to take node crashes into account. You can correctly send a message, yet when the targeted node crashes shortly after, the message is lost.
You might even want a history behavior so that messages that were sent in the last 10 minutes (e.g. chat or error messages) are resent after a crash.
Message Queuing is on a higher layer of abstraction than the guaranteed packet delivery of TCP. It is a complicated topic of its own that is highly dependent on your implementation.
In contrast to client-to-realm connections, you are in full control over the node-to-node connections. Therefore, you can make them reliable. Usually, all nodes of your realm are in the same data center, maybe even on the same machine.
In this scenario, TCP is optimal: you will have no packet loss 99.9 % of the time, and in the remaining 0.1 % you will have lag, yet no corruption of the game logic and no crash.
When you have several realms in different parts of the world, each works independently, so the same rules apply.
When you have a server cluster that needs to connect different data centers, the complexity to maintain reliable connections exponentially increases and goes far beyond the scope of this article. Usually, this is something stock exchanges do. For the sake of simplicity, I strongly suggest against designing your game to need such structures.
Identify nodes in node-to-node communication via unique strings. Keep a JSON file or simple text file within reach of the reviver node. The file should contain:
- Unique node identifier (e.g. “FORTRAX-CLIENT-2”)
- IP:port (e.g. “192.168.0.12:3344”)
- means of restarting the node (e.g. “ftp://192.168.0.12//server//client2-node-restart.bat”)
So you can use this file to set up the nodes with the correct IP:port while working with fixed node identifiers in your source code.
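Reading such a registry could look like this (illustrative Python; the JSON field names are assumptions, the example values are the ones from the list above):

```python
import json

# In practice this would be loaded from a file near the reviver node.
registry_json = """
[
  {"id": "FORTRAX-CLIENT-2",
   "address": "192.168.0.12:3344",
   "restart": "ftp://192.168.0.12//server//client2-node-restart.bat"}
]
"""

registry = {entry["id"]: entry for entry in json.loads(registry_json)}

def resolve(node_id):
    # Source code uses the fixed identifier; the file supplies the address.
    host, port = registry[node_id]["address"].split(":")
    return host, int(port)

print(resolve("FORTRAX-CLIENT-2"))  # ('192.168.0.12', 3344)
```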
Handling node crash
Due to hardware failure or software exceptions, you may run into node crashes. It's a matter of statistics: the more nodes your realm has, the more likely a node crash becomes.
There are two ways to deal with a node crash: compensate or stop-and-restore.
When the crash occurs in a redundant node and your architecture is built expecting such crashes, you can restore the game state on the fly from backup nodes and continue the game. That is the most player-friendly option.
This is usually done in games with many nodes that keep data redundant. The software architecture is not trivial and exceeds the scope of this article by far.
With stop-and-restore, you stop the game as soon as a crash occurs, because you won't be able to rescue the game state afterward. It is better that a player never loots that artifact than that they loot it in the second after a database crash and lose it.
In the easiest variant, you disconnect all players with an appropriate message. Then you shut down the entire server cluster. Make a database recovery when needed. Restart the entire server cluster and open it up for reconnecting players.
This is far more inconvenient for players. However, it is acceptable when such occurrences are very rare. Besides, make sure that such server crashes are exclusively caused by hardware failure and not by the server software. The latter you can prevent, the former not.
The most critical data is the permanent state of your game world. You can easily get away with porting all your players to their home place. However, they will not so easily forgive the loss of their last loot.
The worst-case scenario is the loss or corruption of the permanent game state. This can happen when the database node crashes during a write operation to the database, creating an invalid entry.
Database recovery strategy
Therefore, you need to check the integrity of the database after a crash. The strategy is an escalation: try each step and fall back to the next one if it fails:
- try to correct the invalid entry
- delete the invalid entry
- replace the entire database with a safe copy
Therefore, it is imperative to create database backups as often as possible. At least daily, better hourly, or even more frequently. The backups should be stored on different hardware than the database server.
Database: best practice
It may sound overdone, yet making database backups frequently in an automated manner is by far the easiest measure against data loss. The backup node should automatically check for integrity. So this is also a sentinel mechanism against unnoticed database corruption.
To prevent possibly undetectable database corruption, it is imperative that each database operation is self-contained and independent of others. So changing the owner of an item must be a single operation:
| correct: one operation | wrong: two operations |
| --- | --- |
| set owner field of the item to “NewOwner” | create the item with owner “NewOwner”, then delete the item with owner “OldOwner” |
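The difference can be shown with sqlite3 from the Python standard library (the schema is illustrative): the single UPDATE is atomic, so a crash can never leave the item duplicated or deleted.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, owner TEXT)")
db.execute("INSERT INTO items VALUES (1, 'OldOwner')")
db.commit()

# One atomic operation: either it happens completely or not at all.
# The create-then-delete variant could crash in between, leaving the
# item existing twice (or, in the reverse order, not at all).
db.execute("UPDATE items SET owner = 'NewOwner' WHERE id = 1")
db.commit()

owner = db.execute("SELECT owner FROM items WHERE id = 1").fetchone()[0]
print(owner)  # NewOwner
```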
So in the worst case, your permanent game state will suffer a reset. The more frequent your backups, the shorter the rollback. Players hate server resets, and the impact of these events drastically increases with the rollback time.
You might think, that the described architecture is like cracking a nut with a sledgehammer. That is true for small games with less than 500 players that might run on a single machine.
However, a scalable server architecture should let your game adapt to increasing player numbers or increasingly complex player interactions without changing your source code. It can also reduce hardware costs: when hardware gets more performant while your application's needs stay fixed, you can consolidate nodes onto fewer machines.
As long as your game doesn't try to break technical records, you should be fine with the architecture described.
Any questions? Feel free to ask!
(Last Updated on May 4, 2021)