Exploiting the commutativity lattice for parallelism

11 minute read

Published:

In the paper “Exploiting the commutativity lattice” the authors present a very interesting approach for managing parallel executions of transactions. They present systematic ways of reasoning about concurrency and how these ways can be implemented over different datastructures.

Background

Serializability through commutativity

When reasoning about concurrent transactions that operate over some datastructures, the goal is to achieve serializability. The aim is to achieve serializability with the abstract state of the datastructures without any implications of the concrete implementations.

Each transaction is made up of several operations or invocations (<mT(v)>σ) that operate on some value and create data as return values. A history is a collection of multiple such invocations.

Two invocations m1(v1) and m2(v2) commute with respect to a state σ iff < m1(v1), m2(v2)>σ ≡ < m2(v2), m1(v1)>σ

C-EQUIVALENT Histories

Using this definition of commuting invocations, we can define commutativity conditions on methods. For this, we need to define C-EQUIVALENT histories. Two histories H and H are C-equivalent iff H can be transformed into a history H by replacing sub-histories of the form < m1(v1), m2(v2)>σ with < m2(v2), m1(v1)>σ

Commutativity condition on invocations

A condition φ on two method invocations in a history H is true only if, for all histories H that are c-equivalent to H, these method invocations commute.

If the invocations commute then everything can proceed as normal but if for any method execution, the commutativity predicate φ fails, the transaction can be aborted and undone.

Commutativity Specifications

A data structure, for example trees, has a specific collection of operations. These operations allow modification of the state and are invoked in a transaction. A set of commutativity conditions on every pair of operations for a datastructure is called it’s commutativity specification. these specifications are written using a grammar on predicate logic and can be extended or simplified.

It is possible to have multiple conditions that are valid when considering whether two operations commute or not. This means, that there is a precise condition that subsumes all these valid conditions. This precise condition is called φ* and φ m1;m2 ⇒ φ* m1;m2

This set of predicates has an infimum false and a supremum φ* and all the valid predicates lie somewhere in between and can be derived using a logical conjunction or disjunction. This can also be represented as a lattice with each path starting from false leading to φ* slowly strengthening the commutativity condition. A strong commutativity condition prevents parallelism.

Implementing commutativity conditions

There are two major ways described, by which we can make use of this lattice in practice.

  • Abstract Locking Schemes
  • Gatekeeping
    • Forward Gatekeeping
    • General Gatekeeping

Abstract locking

An abstract lock has several modes depending on the definition. When a lock is requested in a specific mode, the acquisition succeeds only if no other entity holds the lock in an incompatible mode. This incompatibility is defined using the commutativity conditions on method invocations.

A compatibility matrix can be created that presents the compatibility of different operations. These matrices are well suited in practice for providing conflict checking for collections and other datatypes but programmers need to carefully consider the semantics of the datastructures to ensure a correct implementation.

The algorithm for creating the compatibility nmatrix is presented in the paper.

Gatekeeping

A new paradigm for conflict detection has been introduced in the paper. A gatekeeper is a special object associated with a particular datastructure whose role is to ensure that the methods invoked on that datastructure respect the commutativity rules.

To determine commutativity, a gatekeeper is allowed to evaluate predicates on the arguments and return values of method invocations and might also execute it’s own methods on the datastructure. Since the gatekeeper works on the definition of commutativity on an abstract datatype, the actual implementation of the datatype and the data object itself can be a black box and should not influence the gatekeeper.

Forward Gatekeeping

A forward gatekeeper works by building up information as methods are invoked. In essence, as transactions execute, a log is created containing the method, its arguments and the return value. Any new method invocation is checked against all the preceding entries in the log. If a conflict is detected, the transaction is aborted and undone. This check can and should be able to execute independently. For this, the commutativity conditions for every pair of methods is evaluated online. Since these conditions only depend on the arguments and return values, the log is enough to evaluate them.

There are situations where the condition φ between methods might not be online checkable. This is because of the fact that the argument dependencies are not always linear in time. Some methods use other’s return values which might require a method to be invoked before the commutativity condition between them can be checked. This is not online-checkable.

General Gatekeeping

For method pairs where the commutativity condition is not online-checkable, general gatekeeping provides a better alternative. A general gatekeeper is a forward gatekeeper which is also capable of rolling back transactions and re-executing them. Whenever a dependency requires a method invocation in a particular state, the gatekeeper undoes all the operations until that state is achieved. The required state is then used to invoke the method and the operations ae redone to get the current state.

Comparison of conflict detection schemes

Of the three approached described earlier, the simplest and the one with the lowest overhead is abstract locking since the number of locks required for execution is small. Forward gatekeeping is more expensive since extensive logs need to be maintained. As expected, general gatekeeing is the most expensive since, in addition to the logs, the gatekeeper also undoes and redoes the operations.