
Statistical arbitrage in the high-frequency domain



moving-average-example-2.png

Figure 1: Pairs trading mispricing index

Introduction

Quantitative traders use a statistical and machine-learning approach known as statistical arbitrage to identify relative mispricings. The resulting market-neutral strategy is executed by an automated trade-execution engine built on simple rules. In the high-frequency domain, computer programs monitor and analyse multiple market order books in real time.


Statistical arbitrage concepts

Market neutrality

The notion of market neutrality is key to statistical arbitrage. A market-neutral strategy is one whose returns are decoupled from the wider market. In other words: the strategy earns a profit regardless of the wider market's directional movements, and is therefore said to consistently extract alpha from the market.

One can think of long-only, market-wide investments as sails relying on a breeze: subject to a relatively stable weather forecast and hopefully blowing in the right direction. Market-neutral strategies, by contrast, feed on turbulent eddies and waves: zero-mean disturbances that transfer nothing material, simply wealth changing hands 1.

It is common practice to measure a strategy's risk-adjusted returns using a metric known as the Sharpe Ratio. A Sharpe Ratio greater than 1 indicates good investment performance compared to simply holding the underlying asset 2.
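For reference, the Sharpe Ratio is usually defined as the strategy's expected return in excess of the risk-free rate \(R_f\), divided by the standard deviation of its returns:

\begin{equation} S = \frac {E[R - R_f]} {\sigma_R} \end{equation}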

Tail dependence

The tail dependence of two random variables is a measure of their co-movements in the tails (extremes) of the distributions. Two random variables can be uncorrelated but still exhibit tail dependence. This is especially true in stock returns during moments of "panic".

It is a stylized fact of stock returns that they commonly exhibit tail dependence 3.

Upper tail dependence

Upper tail dependence refers to co-movements in the upper tails of two random variables \(X_1\) and \(X_2\).

\begin{equation} \lambda_u = \lim_{q \to 1^-} P(X_2 > F_2^{-1}(q) \mid X_1 > F_1^{-1}(q)) \end{equation}

Lower tail dependence

Lower tail dependence refers to co-movements in the lower tails of two random variables \(X_1\) and \(X_2\). Share prices are more prone to falling together than rising together, due to the aforementioned panic-selling phenomenon.

\begin{equation} \lambda_l = \lim_{q \to 0^+} P(X_2 \le F_2^{-1}(q) \mid X_1 \le F_1^{-1}(q)) \end{equation}

Pairs trading

Pairs trading is a market-neutral, mean-reversion trading strategy involving two correlated instruments (in this case, currency pairs). The strategy was pioneered by Gerry Bamberger and Nunzio Tartaglia's quantitative group at Morgan Stanley in the 1980s 4.

The basic idea is to model two correlated currency pairs in terms of each other. In other words: predict the expected price of pair A given the price of a correlated pair B. The relative mispricing is then the difference between the expected and actual price. If the mispricing is a mean-reverting process, it can be traded by going long the under-valued pair and short the over-valued pair.

The distance approach

One approach to pairs selection is to use the Euclidean distance function to measure how far apart two currency pairs' price series are. Then, for each currency pair, select the nearest other pair. This can be visualised using a similarity matrix 5.

\begin{equation} D = \sum_{t = 1}^N (p_t^1 - p_t^2) ^ 2 \end{equation}

Then, for each pair \(p_1\) and \(p_2\), the difference between the raw prices is calculated for all timestamps \(t\). Let's call the resulting time series \(diff\):

\begin{equation} diff_t = p_t^1 - p_t^2 \end{equation}
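As a rough sketch of this selection step in Elixir (the function names, and the assumption that each series is an equal-length list of prices aligned by timestamp, are mine and not part of the strategy code shown later):

def distance(prices_a, prices_b) do
  # squared Euclidean distance between two aligned price series.
  prices_a
  |> Enum.zip(prices_b)
  |> Enum.map(fn {p1, p2} -> :math.pow(p1 - p2, 2) end)
  |> Enum.sum()
end

def nearest(market_id, universe) do
  # `universe` is assumed to be a map of market id => price series.
  # select the other market whose series is closest to the target's.
  target = Map.fetch!(universe, market_id)

  universe
  |> Map.delete(market_id)
  |> Enum.min_by(fn {_id, series} -> distance(target, series) end)
end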

Time series cointegration

A more sophisticated approach to pairs selection is to take advantage of cointegration: a statistical property of correlated time series integrated of order \(d\) 6.

For any two currency pairs \(p_i\) and \(p_j\), if a linear combination of \(p_i\) and \(p_j\) is integrated of order less than \(d\), then the pair is said to be cointegrated.

There are multiple methods for testing for cointegration. The most common is the Engle–Granger two-step method. If \(x\) and \(y\) are non-stationary and integrated of order \(d\), they are cointegrated if there exists a coefficient \(\beta\) such that the linear combination \(u\) is stationary.

\begin{equation} u = \beta x - y \end{equation}
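In practice the two steps are: estimate \(\beta\), typically by an ordinary least squares regression of \(y\) on \(x\), and then test the resulting residual series for stationarity, for example with an augmented Dickey–Fuller test.

\begin{equation} \hat{u}_t = \hat{\beta} x_t - y_t \end{equation}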

Risk management

In this context, risk management refers to controlling the outcome of a bet. This is achieved by setting thresholds for limiting gains and losses, as well as consistently following the same algorithm over a long timeframe.

Therefore, effective risk management offers the potential to reduce both the possibility of a risk occurring and its potential impact 7.

  • Position sizing: how much capital to bet on a position
  • Stop loss: threshold for cutting losses
  • Take profit: threshold for taking profit

Risk management is essential because it is impossible to predict every price movement. Therefore, trading is inherently a risk management problem where one repeatedly exposes oneself to volatile price movements and the outcome is unknown.
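As a minimal illustration (the function and the threshold semantics below are my own simplification, not the exact rules of the strategy described later), such thresholds might be checked like this:

def action(unrealised_return, stop_loss, take_profit) do
  # `unrealised_return`, `stop_loss`, and `take_profit` are fractional
  # returns, e.g. -0.01 for a 1% loss and 0.02 for a 2% gain.
  cond do
    unrealised_return <= stop_loss -> :close_and_cut_losses
    unrealised_return >= take_profit -> :close_and_take_profit
    true -> :hold
  end
end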

Meta-learning

Meta-learning involves training a secondary model on top of a primary, exogenous model. In other words, the secondary model maps the output of the primary model to a set of labels. The labelled dataset is created via a meta-labelling process, which is a well-understood and transparent algorithm.

Suppose that you have a model for setting the side of the bet (long or short). You just need to learn the size of that bet, which includes the possibility of no bet at all (zero size). This is a situation that practitioners face regularly 8.

A quantitative trading strategy faces the following three challenges:

  1. Trading signals: when to place a bet and what the side should be
  2. Position sizing: how much to place on each bet
  3. Risk management: when to close a bet to take profit or cut losses

Machine learning techniques can be employed in different ways. One approach is to develop a single model that outputs answers to all three of the above challenges. Another approach is to develop separate models for each challenge. In my opinion, the former is naive because the resulting model is monolithic and inflexible.

We do not want the ML algorithm to learn the side, just to tell us what is the appropriate size.

A machine learning algorithm will be trained to decide whether to take the bet or pass, a purely binary prediction. When the predicted label is 1, we can use the probability of this secondary prediction to derive the size of the bet, where the side (sign) of the position has been set by the primary model 8.
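As a sketch of how this could look in practice (the linear scaling below is my own simplification, not the sizing formula from 8): the primary model supplies the side, the secondary model supplies the probability that the bet pays off, and probabilities at or below 0.5 map to no bet at all.

def bet_size(_side, probability) when probability <= 0.5 do
  # a probability at or below 0.5 carries no conviction: no bet.
  0.0
end

def bet_size(side, probability) do
  # scale the size by how far the probability exceeds 0.5,
  # and apply the sign chosen by the primary model.
  size = (probability - 0.5) * 2

  case side do
    :long -> size
    :short -> -size
  end
end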

In 9 the author introduces meta-labelling using non-financial time series data, which nicely illustrates the stages that make up the meta-labelling pipeline. The article also lists some benefits meta-labelling can bring to quantitative trading:

  1. Better model transparency
  2. Reduced over-fitting in the primary model
  3. Less rigid models, allowing for more complex trading strategies

High frequency trading concepts

The order book

An exchange revolves around the order book: a financial data structure that enables market makers to advertise liquidity to other participants. Those who are interested in buying or selling can take liquidity directly from the order book, without needing to wait in some sort of queue.

  • Buyers: take from the asks side of the book i.e. somebody sells to them at an agreed price. Buyers want to buy at the lowest possible price.
  • Sellers: give to the bids side of the book i.e. somebody buys from them at an agreed price. Sellers want to sell at the highest possible price.

order-book-example.jpg

Figure 2: Order book visualisation

It makes sense to implement an order book as a balanced binary tree mapping price levels to liquidity. This makes inserts, updates, and deletions \(O(\log n)\) operations.

order-book-binary-tree.jpg

Figure 3: Order book binary tree representation

The following snippets show how one might implement an order book structure in the Elixir programming language 10, using Erlang general balanced trees 11 as the underlying data structure. An agent process 12 is used to maintain state.

def apply_snapshot(book, {bids, asks}) do
  Agent.update(book, fn _ ->
    {
        # price levels with liquidity <= 0 should never be added
        # to the book so filter them out before sorting.
        #
        # from_orddict constructs a new tree for each side but
        # expects the list of {key, value} tuples to be sorted by key.
        :gb_trees.from_orddict(
          bids
          |> Enum.reject(fn {_, liquidity} -> liquidity <= 0 end)
          |> Enum.sort_by(fn {price, _} -> price end)
        ),
        :gb_trees.from_orddict(
          asks
          |> Enum.reject(fn {_, liquidity} -> liquidity <= 0 end)
          |> Enum.sort_by(fn {price, _} -> price end)
        )
    }
  end)
end

Both sides of the book are initialised with a snapshot. Then, deltas are applied when they arrive. The next snippet is a function for applying a delta to the bid side.

def apply_delta(book, :bid, {price, liquidity}) do
  if liquidity <= 0 do
    # remove price level from bids side if liquidity <= 0.
    Agent.update(book, fn {bids, asks} ->
        if :gb_trees.is_defined(price, bids) do
          {:gb_trees.delete(price, bids), asks}
        else
          {bids, asks}
        end
    end)
  else
    # otherwise insert/update bid price level's liquidity.
    Agent.update(book, fn {bids, asks} ->
        {:gb_trees.enter(price, liquidity, bids), asks}
    end)
  end
end

The best bid and ask prices can be retrieved without scanning all price levels, because the tree is balanced and ordered by price.

def best_bid(book) do
  bids = Agent.get(book, fn {bids, _} -> bids end)

  # as long as the bids side is not empty,
  # find the highest somebody is willing to pay.
  if :gb_trees.is_empty(bids) do
    :side_empty
  else
    :gb_trees.largest(bids)
  end
end

def best_ask(book) do
  asks = Agent.get(book, fn {_, asks} -> asks end)

  # as long as the asks side is not empty,
  # find the lowest somebody is willing to sell.
  if :gb_trees.is_empty(asks) do
    :side_empty
  else
    :gb_trees.smallest(asks)
  end
end
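For completeness, a hypothetical usage sketch; it assumes the book agent is started with two empty trees, which is not shown in this post:

# start an order book agent holding empty bid and ask trees.
{:ok, book} = Agent.start_link(fn -> {:gb_trees.empty(), :gb_trees.empty()} end)

# initialise both sides from a snapshot of {price, liquidity} tuples.
apply_snapshot(book, {[{100.0, 2.5}, {99.5, 1.0}], [{100.5, 3.0}, {101.0, 0.7}]})

# apply a delta, then read the touch prices.
apply_delta(book, :bid, {100.1, 0.4})
{best_bid_price, _liquidity} = best_bid(book)
{best_ask_price, _liquidity} = best_ask(book)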

Best bid and ask prices

The order book is not static. Market makers frequently update their quotes in order to match incoming supply and demand from sellers and buyers. Indeed, supply and demand shocks result in rapid price fluctuations. The constant tug-of-war between the best bid and best ask price results in a highly efficient market and a so-called "tight spread".

best-prices.jpg

Figure 4: Best bid/ask prices visualisation

Generally speaking, the price of a financial instrument is a function of two things.

  1. Supply and demand: surplus supply and insufficient demand will result in a price decrease. Strong demand and insufficient supply will result in a price increase. This is a basic rule of economics. It can be observed in the micro market structure (the order book) all the way up to the macro market (longer-term price trends).
  2. Market sentiment: this is the "general prevailing attitude of investors as to anticipated price development in a market" 13.
    • bullish: investors expect an upwards price movement = go long
    • bearish: investors expect a downwards price movement = go short

Mid market prices

The mid-market price is the average of the best bid and best ask prices, i.e. halfway between them.

\begin{equation} mid = \frac {best_{bid} + best_{ask}} {2} \end{equation}
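Continuing the earlier Elixir sketch, the mid-market price could be derived from the best bid and ask along these lines, assuming the helper functions defined above:

def mid_price(book) do
  # average of the best bid and best ask prices,
  # or :side_empty if either side of the book is empty.
  with {bid, _liquidity} <- best_bid(book),
       {ask, _liquidity} <- best_ask(book) do
    (bid + ask) / 2
  else
    :side_empty -> :side_empty
  end
end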

The following charts show order book data for a cryptocurrency market. More specifically, they show how the mid-market price changes through time, along with some sliding-window analysis using the following parameters.

  • Window duration: one hour
  • Step size: ten minutes

A sliding window is a fixed-duration time window that may contain zero or more events. Consecutive windows can overlap with one another.
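A rough Elixir sketch of this bucketing, assuming events are {timestamp_ms, price} tuples sorted by timestamp (one-hour windows and ten-minute steps in the charts below):

def sliding_windows(events, window_ms, step_ms) do
  {first_ts, _} = List.first(events)
  {last_ts, _} = List.last(events)

  # generate window start times from the first event to the last,
  # spaced `step_ms` apart, then collect the prices inside each window.
  first_ts
  |> Stream.iterate(&(&1 + step_ms))
  |> Enum.take_while(&(&1 <= last_ts))
  |> Enum.map(fn start ->
    prices =
      events
      |> Enum.filter(fn {ts, _} -> ts >= start and ts < start + window_ms end)
      |> Enum.map(fn {_, price} -> price end)

    {start, prices}
  end)
end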

mid-market-prices-example.png

Figure 5: Mid market prices visualisation

mid-market-price-sliding-window-kurt.png

Figure 6: Mid market prices kurtosis

mid-market-price-sliding-window-skew.png

Figure 7: Mid market prices skew

mid-market-price-sliding-window-std.png

Figure 8: Mid market prices standard deviation


Strategy description

In the following sections I describe my pairs trading algorithm. The approach is designed for high-frequency mid-market price data and is a "multi-order" strategy: a position consists of two orders, a long "leg" and a short "leg". The stop-loss threshold is always less than the take-profit threshold, so that losses on one leg are covered by profits on the other, and vice versa.

Step 1: pairs selection and formation

Before any pairs trading strategy can begin, one must select appropriate pairs of markets to combine and analyse. This is called pairs selection and is an active area of research. There is no single correct approach, and many techniques are described in the literature, including the following. In this project, I do not explore advanced or exotic approaches to pairs selection.

  1. Fundamentals / assumptions / heuristics
  2. Clustering (k-means)
  3. Portfolio optimisation
  4. Principal component analysis

If \(U\) is the set of all markets (the universe), then \(P \subset U \times U\) is the set of possible pairs \(\{(u_1, u_2), (u_1, u_3), ..., (u_i, u_j)\}, i \neq j\). Note: a pair \((u_1, u_2)\) is considered the same as \((u_2, u_1)\), hence we are dealing with combinations and not permutations. Also, a pair \((u_1, u_1)\) does not make sense and is considered invalid.

With these definitions in mind, pairs selection is the process of selecting the most suitable subset \(P_{optimal} \subset P\) of market pairs. For this project, I am arbitraging cryptocurrency markets between exchanges, and I make the assumption that markets with the same base currency/symbol will be correlated with each other. For example, consider Bitcoin, which is quoted against multiple currencies on multiple exchanges:

base | quote | exchange | market identifier tuple (id, base, quote, exchange)
BTC  | GBP   | coinbase | (1, BTC, GBP, coinbase)
BTC  | USD   | coinbase | (2, BTC, USD, coinbase)
BTC  | USDT  | coinbase | (3, BTC, USDT, coinbase)
BTC  | ETH   | coinbase | (4, BTC, ETH, coinbase)
BTC  | USD   | kraken   | (5, BTC, USD, kraken)
BTC  | USDT  | kraken   | (6, BTC, USDT, kraken)
BTC  | ETH   | kraken   | (7, BTC, ETH, kraken)
BTC  | USD   | bitfinex | (8, BTC, USD, bitfinex)
BTC  | USDT  | bitfinex | (9, BTC, USDT, bitfinex)
BTC  | ETH   | bitfinex | (10, BTC, ETH, bitfinex)

Then the following pairs (combinations) of markets will have interesting statistical properties when modelled together:

pair ID | market A                | market B                 | shorthand (pair ID, market A id, market B id)
1       | (1, BTC, GBP, coinbase) | (5, BTC, USD, kraken)    | (1, 1, 5)
2       | (1, BTC, GBP, coinbase) | (6, BTC, USDT, kraken)   | (2, 1, 6)
3       | (1, BTC, GBP, coinbase) | (7, BTC, ETH, kraken)    | (3, 1, 7)
4       | (1, BTC, GBP, coinbase) | (8, BTC, USD, bitfinex)  | (4, 1, 8)
5       | (1, BTC, GBP, coinbase) | (9, BTC, USDT, bitfinex) | (5, 1, 9)
6       | (1, BTC, GBP, coinbase) | (10, BTC, ETH, bitfinex) | (6, 1, 10)
7       | (2, BTC, USD, coinbase) | (5, BTC, USD, kraken)    | (7, 2, 5)
8       | (2, BTC, USD, coinbase) | (6, BTC, USDT, kraken)   | (8, 2, 6)
9       | (2, BTC, USD, coinbase) | (7, BTC, ETH, kraken)    | (9, 2, 7)
10      | (2, BTC, USD, coinbase) | (8, BTC, USD, bitfinex)  | (10, 2, 8)
11      | (2, BTC, USD, coinbase) | (9, BTC, USDT, bitfinex) | (11, 2, 9)
12      | (2, BTC, USD, coinbase) | (10, BTC, ETH, bitfinex) | (12, 2, 10)
…       | …                       | …                        | …
50      | (5, BTC, USD, kraken)   | (8, BTC, USD, bitfinex)  | (50, 5, 8)
51      | (5, BTC, USD, kraken)   | (9, BTC, USDT, bitfinex) | (51, 5, 9)
52      | (5, BTC, USD, kraken)   | (10, BTC, ETH, bitfinex) | (52, 5, 10)
53      | (6, BTC, USDT, kraken)  | (8, BTC, USD, bitfinex)  | (53, 6, 8)
54      | (6, BTC, USDT, kraken)  | (9, BTC, USDT, bitfinex) | (54, 6, 9)
55      | (6, BTC, USDT, kraken)  | (10, BTC, ETH, bitfinex) | (55, 6, 10)

This is repeated for all base symbols (e.g. ETH, LTC, XMR, …) and results in a very large pool of pairs. The live system "combines" the relevant data streams into pairs (according to a table like the one above) and performs analysis in real time. The output for a pair of streams is a relative mispricing index, which can be used to predict trade signals.
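A sketch of how such a pool could be generated, assuming each market is represented as a tuple {id, base, quote, exchange} as in the tables above, and that pairs within a single exchange are not of interest:

def candidate_pairs(markets) do
  # group markets by base symbol, emit every unordered combination
  # within each group, and discard pairs that share an exchange.
  markets
  |> Enum.group_by(fn {_id, base, _quote, _exchange} -> base end)
  |> Enum.flat_map(fn {_base, group} -> combinations(group) end)
  |> Enum.reject(fn {{_, _, _, ex_a}, {_, _, _, ex_b}} -> ex_a == ex_b end)
end

defp combinations([]), do: []

defp combinations([head | tail]) do
  # pair the head with every remaining market, then recurse on the tail.
  Enum.map(tail, fn other -> {head, other} end) ++ combinations(tail)
end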

Step 2: absolute (actual) difference

Given a market pair \((u_1, u_2)\), whenever either market's mid-market price changes, the algorithm re-calculates the absolute difference between them, \(| u_1^{mid} - u_2^{mid} |\). This series is always positive and is not stationary.

Step 3: smoothed difference

The next transformation involves smoothing the output of step 2. This can be done in many different ways; for this project, I experiment with the following techniques:

  1. Simple moving average
  2. Kalman filter

Step 4: smoothed minus actual

The mispricing index is calculated by subtracting the output of step 2 from the output of step 3: \(MPI = smoothed(x) - x\), where \(x = | u_1^{mid} - u_2^{mid} |\). The MPI quantifies the significance of the difference between the two mid-market prices. This time series is stationary and has interesting statistical properties.
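A sketch of steps 2 to 4 using the simple moving average option, assuming `mids_a` and `mids_b` are equal-length lists of mid-market prices aligned by timestamp and `n` is the window length:

def mispricing_index(mids_a, mids_b, n) do
  # step 2: absolute difference between the two mid-market prices.
  actual =
    mids_a
    |> Enum.zip(mids_b)
    |> Enum.map(fn {a, b} -> abs(a - b) end)

  # step 3: simple moving average over a window of n observations.
  smoothed =
    actual
    |> Enum.chunk_every(n, 1, :discard)
    |> Enum.map(fn window -> Enum.sum(window) / n end)

  # step 4: smoothed minus actual, aligning each average with the
  # latest observation in its window.
  actual
  |> Enum.drop(n - 1)
  |> Enum.zip(smoothed)
  |> Enum.map(fn {x, s} -> s - x end)
end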

Visualisation

moving-average-example-1.png

Figure 9: Pairs trading mispricing index

moving-average-example-2.png

Figure 10: Pairs trading mispricing index

moving-average-example-3.png

Figure 11: Pairs trading mispricing index


Footnotes:

Last updated: Tuesday 23 December, 2025