Parallel Programming Basics

image by ilovehz - www.freepik.com

Parallel programming is essential for building a performant server cluster and networking functionality. The promise of performance is paid for with complexity. Get an introduction to one of the most interesting and complicated programming topics.

Peculiarities of parallel programming

Why parallel programming?

Basically, every problem that consists of parts that are independent of each other can be solved with a parallel programming approach, at the cost of increased complexity.

Carefully consider such cases. The increased complexity can easily get out of hand and leave you with a failed program fragment. Also, the performance gain might be smaller than expected. Some problems even run slower when solved in parallel.

Parallel programming is not a silver bullet. As a rule of thumb, only use parallel programming when you don't have any other choice.

Use cases

Typical use cases for parallel programming in MMORPGs are:

  • receiving network messages
  • sending network messages
  • processing mob actions (partially)
  • handling large amounts of players
  • the parallel architecture of a server cluster

Threads, Tasks, Coroutines, Jobs

Parallel programming is the most general term for solutions that have concurrently running parts. So a server cluster, even if each machine only ran non-parallel program flows, would be a parallel programming architecture as a whole.

Threads, Tasks, Coroutines, and Jobs are parallel programming implementations provided by the operating system, programming languages, or game engines.

Whatever such implementations are called, you need to know whether they truly run in parallel or not. Threads and Tasks do run truly in parallel, so their code must be thread-safe.

A thread runs on a separate CPU core with little overhead. A typical thread would be an application part that processes incoming network messages.

A task is something that needs to be done with a life span mostly shorter than the application. A typical task would be to process the actions of an NPC.

Coroutines are the Unity game engine's vehicle for parallel programming. However, coroutines are executed sequentially, so the code in coroutines doesn't need to be thread-safe. This is done on purpose and for the convenience of the users.
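
A minimal sketch of a Unity coroutine (the MobSpawner class and SpawnMob method are hypothetical): the coroutine is resumed on the main thread, frame by frame, never in parallel with other game code.

using System.Collections;
using UnityEngine;

public class MobSpawner : MonoBehaviour
{
    void Start()
    {
        // Started on the main thread; Unity resumes it sequentially, frame by frame.
        StartCoroutine(SpawnMobs());
    }

    IEnumerator SpawnMobs()
    {
        for (int i = 0; i < 10; i++)
        {
            SpawnMob();                          // runs on the main thread
            yield return new WaitForSeconds(1f); // pauses without blocking the frame
        }
    }

    void SpawnMob() { /* hypothetical spawn logic */ }
}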

What is non-thread-safe

The thread-safe attribute is given to anything that works correctly in parallel programming, aka a multi-threaded environment. Unless stated otherwise, you can safely assume all code to be non-thread-safe.

A common pitfall is the standard random number generator of most libraries. These generators use global variables. So when several instances of a method access these variables at the same time, the result is something weird.
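
One common workaround in C# is to give each thread its own generator, for example with ThreadLocal (a sketch; the SafeRandom class name is hypothetical):

using System;
using System.Threading;

public static class SafeRandom
{
    // One Random instance per thread, so no internal state is shared.
    private static readonly ThreadLocal<Random> rng =
        new ThreadLocal<Random>(() => new Random(Guid.NewGuid().GetHashCode()));

    public static int Next(int minValue, int maxValue)
    {
        return rng.Value.Next(minValue, maxValue);
    }
}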

In fact, it is cumbersome to write thread-safe code. Also, a single mistake makes your entire code block non-thread-safe. An incomplete list of mistakes that make your code non-thread-safe:

  • writing to globally available or otherwise shared data
  • methods that take reference type data as a parameter and change that data (“reference leak”; see the sketch after this list)
  • data structures that are themselves thread-safe, like an Immutable List, can contain reference type data; changing that data is not thread-safe
  • calling methods that are non-thread-safe
  • direct use of hardware resources (network, hard disk)
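
A sketch of such a reference leak (the DamageCalculator class is hypothetical): the method itself uses no globals, but it mutates a list that the caller, and possibly other threads, still hold a reference to.

using System.Collections.Generic;

public class DamageCalculator
{
    // Non-thread-safe: the list is reference type data shared with the caller.
    // Another thread may read or change it while this method writes to it.
    public void ApplyBuff(List<int> damageValues)
    {
        for (int i = 0; i < damageValues.Count; i++)
            damageValues[i] += 10;   // writes to shared data ("reference leak")
    }
}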

What is thread-safe

Methods that don't access globally available or otherwise shared data are thread-safe. Such methods only work with value type parameters (reference type data as a parameter is regarded as shared data and called a “reference leak”) and local variables.

Thread-safe operations are one of the following (a sketch of the locked category follows the list):

  • atomic
  • purely read (immutable collections)
  • locked (interlocked operations, concurrent collections, blocking collections)
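
A minimal sketch of the locked category in C# (the OnlineCounter class is hypothetical): the Interlocked class handles a shared counter, and a concurrent collection handles its own locking internally.

using System.Collections.Concurrent;
using System.Threading;

public class OnlineCounter
{
    private int onlinePlayers;
    private readonly ConcurrentDictionary<long, string> playerNames =
        new ConcurrentDictionary<long, string>();

    public void PlayerJoined(long uid, string name)
    {
        Interlocked.Increment(ref onlinePlayers);  // locked (interlocked) operation
        playerNames[uid] = name;                   // the concurrent collection locks internally
    }
}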

Parallel logic

The thread-safe attribute doesn't mean that your method is correct content-wise. You can have a perfectly thread-safe method that fails.

Imagine a method that has to add up global variables. You make sure that read access is atomic, so your method is thread-safe. But while instances of your method are adding up the global variables, other threads change them. So the results of your method are often incorrect: you ignored parallel logic.

Just because something is thread-safe doesn't mean that it is suitable for a parallel programming approach. Bugs resulting from wrong parallel logic are very common and very hard to track down. Such bugs, caused by “race conditions”, are usually irreproducible and a developer's nightmare.
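
A sketch of such a race (the GoldLedger class is hypothetical): every single read and write is atomic, yet additions still get lost because another thread can change the field between the read and the write.

using System.Threading.Tasks;

public class GoldLedger
{
    public int gold;   // shared data

    public void AddGold(int amount)
    {
        int current = gold;      // atomic read
        gold = current + amount; // atomic write, but it may overwrite a parallel update
    }
}

// Running many additions in parallel usually loses some of them:
// var ledger = new GoldLedger();
// Parallel.For(0, 1000, _ => ledger.AddGold(1));  // ledger.gold is often < 1000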


Atomic operations

The smallest possible read or write action to a memory address is an atomic operation. That is of high relevance in a programming environment where multiple read and write processes can access data.

Imagine a program that counts each access to a database in an 8-byte data field. Another thread wants to read that number. At the very same moment the number is increased.

The reading thread gets neither the old nor the new number. What it gets is half the memory with the old number and half the memory with the new number: something completely invalid. So instead of 734 or 735, the result will be something like 8450934895465.

What operations are atomic

Reading or writing the following data types are atomic when they are properly aligned:

  • bool
  • char
  • byte
  • short (2-byte integer)
  • integer (4-byte integer)
  • float (4-byte float)
  • long (8-byte integer; only on 64 bit systems)
  • double (8-byte float; only on 64 bit systems)
  • reference types

So what's that part with “properly aligned”? It means the data is laid out correctly in memory. Usually you can assume that it is.

Consider that even two atomic operations in a row are, as a whole, not atomic, because something can happen between the two operations. Even increasing an atomic variable by 1 means reading that variable and then writing it.

//atomic: a single write
int a = 0;

//not atomic: a read followed by a separate write
a = a + 1;
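
If a counter has to be incremented and read atomically as a whole, C# offers the Interlocked class (a sketch; the AccessCounter class is hypothetical, picking up the database-counter example from above):

using System.Threading;

public class AccessCounter
{
    private long accessCount;   // 8-byte field, plain access is not atomic on 32-bit systems

    public void CountAccess()
    {
        // Read, add and write happen as one indivisible operation.
        Interlocked.Increment(ref accessCount);
    }

    public long ReadCount()
    {
        // Atomic 64-bit read, never a torn value.
        return Interlocked.Read(ref accessCount);
    }
}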

Purely read memory access

Memory access

Memory access is reading from or writing to a physical location at a particular point in time. For simplicity let's assume the following: at any given moment there can be only one read or write operation to a specific memory address.

Trying to access memory at the very same time fails.

So when many actors want to read at the very same moment from the very same address, that fails.

Queues

The simplest solution to enable parallel reading is to queue all read requests.

Queueing reads results in delays.

So all requested read operations are executed one after another. Nothing is lost except time. Queues are a reliable and versatile solution to many problems related to parallel programming.

Queues are bridges between linear and parallel program execution.
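
A sketch of such a bridge with C#'s ConcurrentQueue (the MessageInbox class is hypothetical): any number of network threads enqueue in parallel, and a single game-loop thread drains the queue sequentially.

using System.Collections.Concurrent;

public class MessageInbox
{
    private readonly ConcurrentQueue<string> incoming = new ConcurrentQueue<string>();

    // Called from any network thread (the parallel side of the bridge).
    public void Enqueue(string message)
    {
        incoming.Enqueue(message);
    }

    // Called from the single game-loop thread (the linear side of the bridge).
    public void ProcessAll()
    {
        while (incoming.TryDequeue(out string message))
        {
            // handle the message sequentially
        }
    }
}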

Parallel Reading

When speed matters, there is another solution: create multiple copies that readers can access. Maybe you need to queue readers on those copies. However, you can reduce the queue length arbitrarily by creating more copies.

Reading from multiple copies.

As a result, parallel reading is perfectly scalable as long as data doesn't change. How to ensure that each copy is identical to the original is another story.

Parallel writing doesn't work that way

Similar to reading, you need to queue concurrent writing operations to prevent concurrent memory access.

However, parallel writing doesn't work like parallel reading, for obvious reasons.

Parallel writing doesn't work this way.

Parallel writing is not scalable. Rather, you even need to block read access while writing. So writing is way more time-consuming than reading in parallel programming. There is even an entire programming approach that derives from the clarity of read-only data: Immutables.

Immutables

Immutable data is read-only data. An immutable variable is not a variable anymore. Rather it is a constant. Sounds useless at first glance, yet the concept is most helpful in parallel programming.

Operating with immutables

In normal programming, a simple operation like the addition of two variables means that the result is written in the first variable.

int a;
int b;

a = a + b;

The same operation with immutables means that for the result a new immutable variable is created.

readonly int a;
readonly int b;

readonly int c = a + b;

Declaration of immutables

Some programming languages have native immutable support. In other languages, you may need to build something yourself. However, that should be no problem. You simply need a data structure that cannot be changed after initialization.

The simplest way is the readonly modifier.

readonly int a = 10;

However, the readonly modifier restricts assignment to the variable to its declaration or to a constructor of the same class.

//Assignment to the variable in its declaration

readonly int a = 10;    //works

a = a + 1;  //doesn't work and throws a compile error

//Assignment to the variable in a constructor of the same class

public class SampleClass
{
    private readonly int x;
    
    public SampleClass(int initValue)
    {
        x = initValue;     //works
    }
    
    public void SampleMethod(int newValue)
    {
        x = newValue;   //doesn't work and throws a compile error
    }

}

Working with the readonly modifier alone is cumbersome. You gain more flexibility by creating an immutable class for each data type you need.

Let's look at a class definition that encapsulates a basic data type.

public class ImmutableDouble
{
    public readonly double value;
    
    public ImmutableDouble(double newValue)
    {
        value = newValue;
    }
}

Let's work with that class. The difference is in SampleMethod: instead of assigning a new value to the immutable (which doesn't work), we create a new class instance and assign its reference to x.

public class SampleClass
{
    private ImmutableDouble x;
    
    public SampleClass(double initValue)
    {
        x = new ImmutableDouble(initValue);     //works
    }
    
    public void SampleMethod(double newValue)
    {
        x.value = newValue;       //doesn't work
        x = new ImmutableDouble(newValue);   //works
    }
    
    public double GetValue()
    {
        return x.value;
    }

}

What happens to the old class instance? As soon as there are no more references to a class instance, it is collected by the garbage collector.

Now, let's look at a class definition that encapsulates a Unity Vector3 data type. The use of the class is the same as described above.

using UnityEngine;

public class ImmutableVector3
{
    public readonly Vector3 vector3;

    public ImmutableVector3(Vector3 newVector3)
    {
        vector3 = newVector3;
    }

}

You can also create immutable variants of classes. Let's look at a more complex example. The features that make the example more complex are:

  1. Inheritance of two interfaces in the class declaration. That's why we need to implement IHasReadComplexUID (done via the ComplexUID property) and IItemInfo (done in the region at the end of the class).
  2. Provision of a helper method CreateList. This helper doesn't need to be here. In fact, it could be cleaner to put it in a separate processor class.
using System;
using System.Collections.Generic;
using Vortex.Loot;

namespace Vortex.Structure
{
    public class ImmutableItemDistribution : IHasReadComplexUID, IItemInfo
    {
        public ComplexUID ComplexUID { get; }
        public readonly Int64 itemDefinitionUID;
        public readonly Int16 count;
        public readonly bool soulbound;
        public readonly ItemLocation itemLocation;
        public readonly UInt32 additionalLocationSID;
        public readonly string ownerName;
        public readonly Int16 bagSlot;

        public ImmutableItemDistribution(ItemDistribution template)
        {
            if (template == null)
                return;

            this.ComplexUID = template.ComplexUID;
            this.itemDefinitionUID = template.itemDefinitionUID;
            this.count = template.count;
            this.soulbound = template.soulbound;
            this.itemLocation = template.itemLocation;
            this.additionalLocationSID = template.additionalLocationSID;
            this.ownerName = template.ownerName;
            this.bagSlot = template.bagSlot;
        }

        public static List<ImmutableItemDistribution> CreateList(List<ItemDistribution> template)
        {
            if (template == null)
                return null;
            List<ImmutableItemDistribution> result = new List<ImmutableItemDistribution>();
            foreach (ItemDistribution entry in template)
                result.Add(new ImmutableItemDistribution(entry));

            return result;
        }

        #region IItemInfo implementation
        public ComplexUID GetComplexUID()
        {
            return ComplexUID;
        }

        //removed for better readability
        //here were methods similar to the one above
        #endregion
    }
}

Advantages and disadvantages

So what are the advantages of immutables:

  • they will always contain a valid value (though the data may become outdated)
  • they are always readable and you can apply ways to speed up parallel reading like creating copies
  • you can give them away without fearing illicit change (something you cannot do with reference data types)
  • the flow of data is clearer with immutables because you know that immutables are not supposed to be changed

And what are the disadvantages:

  • creating a new variable for each processing step instead of changing an existing variable requires more memory and may be slower
  • you need to think about when you finally release the immutable / garbage collecting

In the end, you trade memory and speed for parallelism and clarity. It is worth it.

Switching

Often you need to change a collection of data in one step, because changing it in parts creates invalid intermediate states that you don't want to be read.

An example is player coordinates. They describe the correct player position only as a whole. So (Xold, Yold) is valid and (Xnew, Ynew) is valid, yet (Xold, Ynew) is never correct.

Another example is a message queue of client messages that arrived at the server in the last 50 milliseconds. While you process that queue you might not want new messages to be added.

Switching is the way to handle these situations. Access to data is via a reference. You prepare the new data, and when it is complete, you make the reference point to the new data.

Writing a new value to a reference is an atomic operation.

Switching the reference from old player data to new player data is an atomic operation and therefore thread-safe.

Switching is relevant both for working with immutable data and for working with data that is only valid as a whole (which most data is).
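
A sketch of switching with an immutable position (the PlayerPosition and Player classes are hypothetical): the new data is prepared completely, then a single atomic reference write makes it visible.

public class PlayerPosition
{
    public readonly float x;
    public readonly float y;

    public PlayerPosition(float newX, float newY)
    {
        x = newX;
        y = newY;
    }
}

public class Player
{
    // Readers always see either the complete old or the complete new position.
    private volatile PlayerPosition position = new PlayerPosition(0f, 0f);

    public void Move(float newX, float newY)
    {
        // Prepare the new data completely, then switch the reference in one atomic write.
        position = new PlayerPosition(newX, newY);
    }

    public PlayerPosition GetPosition()
    {
        return position;
    }
}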


Locked memory access

When writing globally available data that is not atomic, you need to lock access to that data while you write. Programming languages and operating systems offer a variety of lock mechanisms (lock, mutex, semaphore, etc.).

Such mechanisms differ in the way they lock data and the way to handle errors. Locking can seriously impair the performance of your application because threads stop until they acquire the lock. Also, there is the risk of deadlocks.

Deadlock

What is a deadlock?

It's a chicken-and-egg situation, a catch-22, a quagmire. When two threads each try to lock something that the other thread has already locked, both of them are blocked.

Blocked means “frozen forever”. No exception is thrown, no error or warning message. The worst kind of application crash.

Deadlocks are a developer's nightmare. Similar to race conditions, they are irreproducible. Besides, they can remain silent for years and will hit your application at the very worst moment. This is because the higher your application load, the more likely the conditions for a deadlock are met.

How to prevent deadlocks?

Obviously, when you don't lock objects, you can't have deadlocks. As stupid as it sounds, it is really worth considering. As soon as you have locks, you can never be 100 % sure not to have deadlocks.

Also, you may decide to rely on high-quality third-party solutions. The concurrent collections of several programming languages are implemented with fine-grained locks under the hood. As such collections are used by hundreds of thousands of applications, you can reasonably assume them safe.

If you really can't do without locks, consider the following. A deadlock needs three conditions:

  1. the application waits for an unlimited time to get a lock on data
  2. a lock on data cannot be removed
  3. two or more threads need to access each other's data

Usually, only condition three is in your hands. So you should avoid calling locked code from other locked code (nested lock calls). If you can't avoid even that, then the ice is really thin. At least make sure that both locked code parts can do their work independently of each other.
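
If you do lock, you can also attack condition one by never waiting forever. A sketch using C#'s Monitor.TryEnter with a timeout (the GuildBank class is hypothetical):

using System;
using System.Threading;

public class GuildBank
{
    private readonly object bankLock = new object();

    public bool TryDeposit(int amount)
    {
        // Waiting with a timeout breaks condition one: the thread never waits forever.
        if (!Monitor.TryEnter(bankLock, TimeSpan.FromMilliseconds(500)))
            return false;   // could not get the lock in time; the caller decides how to react

        try
        {
            // write 'amount' to the shared bank state here
            return true;
        }
        finally
        {
            Monitor.Exit(bankLock);
        }
    }
}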

Lock objects and reentrant locks

For the purpose of locking a class instance in C#, it is good practice to have a hidden lock object as a member variable of that class.

     private readonly object thisLock = new object();

A simple class with such a lock object and two methods could look like this:

public class ClassWithLock
{
    private readonly object thisLock = new object();
    
    public void MethodA()
    {
        lock (thisLock)
        {
            MethodB();
        }
    }
    
    public void MethodB()
    {
        lock (thisLock)
        {
            //Do something
        }
    }
}

What will happen when MethodA is called? It will lock on thisLock and call MethodB. MethodB will try to lock on thisLock, which is already taken: Deadlock!

Actually, you are theoretically right, but practically not. The presented case is so common that the C# engineers made the lock reentrant for developers' convenience: a thread that already holds a lock can acquire the same lock again without blocking.

Since MethodB is called from MethodA on the same thread, there is no deadlock. This property of C# locks is called reentrancy. In other programming languages, it might be different.

Best practice

Parallel programming is particularly advantageous for tasks that involve purely reading operations on immutable shared data and operations that work completely independently of each other.

The parts of your program which require writing to shared data are a minefield for parallel programming. Locking means that the thread will stop working until it has acquired the lock. So each lock is a risk to performance. Besides, you get the risk of deadlocks.

Better avoid that completely. The best practice is to run independent operations in parallel, put their results in a queue, and then write shared data sequentially.
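
A sketch of that pattern (the DamageProcessor class and its calculation are hypothetical): independent work runs in parallel and only enqueues results; shared state is then written by a single thread.

using System.Collections.Concurrent;
using System.Threading.Tasks;

public class DamageProcessor
{
    private readonly ConcurrentQueue<int> results = new ConcurrentQueue<int>();

    public void ProcessFrame(int[] pendingAttacks)
    {
        // 1. Independent calculations run in parallel; they only read their own input.
        Parallel.ForEach(pendingAttacks, attack =>
        {
            int damage = attack * 2;   // hypothetical calculation
            results.Enqueue(damage);   // results go into a thread-safe queue
        });

        // 2. Shared state is written sequentially on a single thread.
        while (results.TryDequeue(out int damage))
            ApplyToWorldState(damage);
    }

    private void ApplyToWorldState(int damage)
    {
        // single-threaded write to shared data
    }
}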


Checklist

Parallel programming basics include:

  • atomic operations
  • purely read memory access (queues, parallel reading)
  • immutables and switching
  • locked memory access and how to avoid deadlocks

Applications of parallel programming basics are found in articles about Handling large amounts of players and Scalable MMORPG server architecture.


Server Sid

Welcome to the crazy world of parallel programming!

(Last Updated on May 14, 2021)
