Fenced Data Transfer

Fenced Data Transfer#

Problem

Write a message to memory and have another processor read it on hardware that does not have a sequentially consistent memory model.

Context

The problem normally arises only when unsynchronized threads concurrently act on a memory location, or are using reads and writes to create synchronization. High level synchronization constructs normally include mechanisms that prevent unwanted reordering.

Modern hardware and compilers can reorder memory operations in a way that preserves the order of a thread’s operation from its viewpoint, but not as observed by other threads. A serial common idiom is to write a message and mark it as ready to ready as shown in the following code:

bool Ready;
std::string Message;


void Send( const std::string& src ) {. // Executed by thread 1
   Message=src;
   Ready = true;
}


bool Receive( std::string& dst ) {    // Executed by thread 2
   bool result = Ready;
   if( result ) dst=Message;
   return result;              // Return true if message was received.
}

Two key assumptions of the code are:

  1. Ready does not become true until Message is written.

  2. Message is not read until Ready becomes true.

These assumptions are trivially true on uniprocessor hardware. However, they may break on multiprocessor hardware. Reordering by the hardware or compiler can cause the sender’s writes to appear out of order to the receiver (thus breaking condition a) or the receiver’s reads to appear out of order (thus breaking condition b).

Forces

  • Creating synchronization via raw reads and writes.

Solution

Change the flag from bool to std::atomic<bool> for the flag that indicates when the message is ready. Here is the previous example with modifications.

std::atomic<bool> Ready;
std::string Message;


void Send( const std::string& src ) {. // Executed by thread 1
   Message=src;
   Ready.store(true, std::memory_order_release);
}


bool Receive( std::string& dst ) {    // Executed by thread 2
   bool result = Ready.load(std::memory_order_acquire);
   if( result ) dst=Message;
   return result;              // Return true if message was received.
}

A write to a std::atomic value has release semantics, which means that all of its prior writes will be seen before the releasing write. A read from std::atomic value has acquire semantics, which means that all of its subsequent reads will happen after the acquiring read. The implementation of std::atomic ensures that both the compiler and the hardware observe these ordering constraints.

Variations

Higher level synchronization constructs normally include the necessary acquire and release fences. For example, mutexes are normally implemented such that acquisition of a lock has acquire semantics and release of a lock has release semantics. Thus a thread that acquires a lock on a mutex always sees any memory writes done by another thread before it released a lock on that mutex.

Non Solutions

Mistaken solutions are so often proposed that it is worth understanding why they are wrong.

One common mistake is to assume that declaring the flag with the volatile keyword solves the problem. Though the volatile keyword forces a write to happen immediately, it generally has no effect on the visible ordering of that write with respect to other memory operations.

Another mistake is to assume that conditionally executed code cannot happen before the condition is tested. However, the compiler or hardware may speculatively hoist the conditional code above the condition.

Similarly, it is a mistake to assume that a processor cannot read the target of a pointer before reading the pointer. A modern processor does not read individual values from main memory. It reads cache lines. The target of a pointer may be in a cache line that has already been read before the pointer was read, thus giving the appearance that the processor presciently read the pointer target.