In this vignette we will show how to use triple barrier labelling.
First, let's import packages and sample data from the mlfinance package:
# import packages
library(data.table)
library(mlfinance)
# import sample data
data(spy)
close <- spy[, c('index', 'close')]
The data contains SPY market data at 5-minute frequency for the period from 2018-01-01 to 2020-01-01.
Next, we will calculate daily volatility. There are many ways to calculate daily volatility from intraday data. Probably the best approach would be to calculate realized volatility, but for our simple case we will use the daily_volatility function from the package.
# compute daily volatility
daily_vol <- daily_volatility(close, 50)
head(daily_vol)
## Datetime Value
## 1: 2018-01-03 15:39:00 0.0009618423
## 2: 2018-01-03 15:44:00 0.0010268916
## 3: 2018-01-03 15:49:00 0.0011668355
## 4: 2018-01-03 15:54:00 0.0012822868
## 5: 2018-01-03 15:59:00 0.0013209074
## 6: 2018-01-03 16:04:00 0.0013699927
As you can see, the function returns daily volatility for every intraday timestamp. It does so by looking for the nearest date in the close index vector one day into the past.
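To make that computation concrete, here is a minimal sketch of an AFML-style daily volatility estimate: take each bar's return against the price roughly one day earlier, then apply an exponentially weighted moving standard deviation. The helper names (ewm_sd, daily_vol_sketch) are our own, and the exact formula inside daily_volatility may differ.

```r
# Sketch only: assumed AFML-style daily volatility; the package's
# daily_volatility() may differ in detail.
ewm_sd <- function(x, span) {
  alpha <- 2 / (span + 1)
  m <- x[1]                          # running EWM mean
  v <- 0                             # running EWM variance
  out <- rep(NA_real_, length(x))
  for (i in seq_along(x)[-1]) {
    delta <- x[i] - m
    m <- m + alpha * delta
    v <- (1 - alpha) * (v + alpha * delta^2)
    out[i] <- sqrt(v)
  }
  out
}

daily_vol_sketch <- function(datetime, value, span = 50) {
  # for every bar, index of the last bar at least one day (86400 s) older
  prev <- findInterval(as.numeric(datetime) - 86400, as.numeric(datetime))
  keep <- prev > 0
  ret <- value[keep] / value[prev[keep]] - 1   # one-day returns
  data.frame(Datetime = datetime[keep], Value = ewm_sd(ret, span))
}
```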
Now, usually, we don't want to trade on every time period (bar), but want to filter out the important bars. We want to trade only when 'something' unusual happens on the market, because that is when there are more inefficiencies (irrationalities). One way to filter bars is to use the CUSUM filter. Simply put, the CUSUM filter checks whether cumulative returns deviate by more than some threshold. We can calculate CUSUM events as follows:
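To illustrate the idea, here is a minimal sketch of a symmetric CUSUM filter: positive and negative cumulative sums of log returns are tracked, and whenever one of them crosses the threshold an event is recorded and the sum is reset. The function name cusum_sketch is our own; the package's cusum_filter may differ in detail.

```r
# Sketch only: symmetric CUSUM filter over log returns.
cusum_sketch <- function(datetime, value, threshold) {
  ret <- diff(log(value))            # log returns between consecutive bars
  s_pos <- 0                         # running positive cumulative sum
  s_neg <- 0                         # running negative cumulative sum
  events <- integer(0)
  for (i in seq_along(ret)) {
    s_pos <- max(0, s_pos + ret[i])
    s_neg <- min(0, s_neg + ret[i])
    if (s_pos > threshold) {
      s_pos <- 0
      events <- c(events, i + 1L)    # event at the bar that crossed up
    } else if (s_neg < -threshold) {
      s_neg <- 0
      events <- c(events, i + 1L)    # event at the bar that crossed down
    }
  }
  datetime[events]
}
```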
cusum_events <- cusum_filter(close, mean(daily_vol$Value, na.rm = TRUE))
head(cusum_events)
## Datetime
## 1: 2018-01-02 21:59:00
## 2: 2018-01-03 18:39:00
## 3: 2018-01-04 15:34:00
## 4: 2018-01-05 17:19:00
## 5: 2018-01-08 19:09:00
## 6: 2018-01-09 19:24:00
Next, we want to decide how long into the future we are willing to wait before we close our position. That is, we have to define the vertical barrier:
vartical_barriers <- add_vertical_barrier(cusum_events, close$index, num_days = 1)
head(vartical_barriers)
## Datetime t1
## 1: 2018-01-02 21:59:00 2018-01-03 21:59:00
## 2: 2018-01-03 18:39:00 2018-01-04 18:39:00
## 3: 2018-01-04 15:34:00 2018-01-05 15:34:00
## 4: 2018-01-05 17:19:00 2018-01-08 15:34:00
## 5: 2018-01-08 19:09:00 2018-01-09 19:09:00
## 6: 2018-01-09 19:24:00 2018-01-10 19:24:00
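The vertical barrier logic above can be sketched as follows, with timestamps as numeric seconds for simplicity: for each event, take the first available bar timestamp at least num_days later. The helper vertical_barrier_sketch is our own hypothetical reimplementation, not the package function.

```r
# Sketch only: first bar timestamp at least num_days after each event;
# events with no such bar get NA (the sample ends before the barrier).
vertical_barrier_sketch <- function(t_events, close_index, num_days = 1) {
  sapply(t_events, function(t0) {
    later <- close_index[close_index >= t0 + num_days * 86400]
    if (length(later) > 0) later[1] else NA_real_
  })
}
```

Note that weekends explain rows like event 4 in the table above: the first bar at least one day later can be more than 24 hours away.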
Now we have everything we need to move on to the triple barrier method. We jump straight to the code and explain all arguments:
min_return <- 0.005
pt_sl <- c(1, 2)
events <- get_events(price = close, # close data.table with index and value
                     t_events = cusum_events, # bars to look at for trades
                     pt_sl = pt_sl, # profit-taking and stop-loss multipliers that set the width of the barriers
                     target = daily_vol, # values used to determine the width of the barriers
                     min_ret = min_return, # minimum return between events
                     vertical_barrier_times = vartical_barriers, # vertical barrier timestamps
                     side_prediction = NA) # prediction from the primary model (here, no primary model)
head(events)
## t0 t1 trgt pt sl
## 1: 2018-01-30 16:09:00 2018-01-31 16:09:00 0.006064238 1 2
## 2: 2018-02-02 17:59:00 2018-02-05 15:34:00 0.005542836 1 2
## 3: 2018-02-02 19:44:00 2018-02-05 15:34:00 0.005750978 1 2
## 4: 2018-02-02 20:59:00 2018-02-05 15:34:00 0.005128506 1 2
## 5: 2018-02-05 15:34:00 2018-02-06 15:34:00 0.005760954 1 2
## 6: 2018-02-05 16:14:00 2018-02-06 16:14:00 0.008901707 1 2
The resulting table shows the event start time (t0), the event end time (t1), the volatility target used to set the barrier width (trgt), and the profit-taking (pt) and stop-loss (sl) multipliers.
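To build intuition for how the event end time is determined, here is a sketch of the first-touch logic on a single price path: scan returns relative to the entry price and report which barrier is hit first. The helper first_touch_sketch is our own; get_events applies this idea across all events.

```r
# Sketch only: which of the three barriers does a price path touch first?
# upper/lower are return thresholds, e.g. trgt * pt and -trgt * sl.
first_touch_sketch <- function(path, upper, lower) {
  ret <- path / path[1] - 1            # returns relative to entry price
  hit_up <- which(ret >= upper)[1]     # first profit-taking touch (NA if none)
  hit_dn <- which(ret <= lower)[1]     # first stop-loss touch (NA if none)
  if (is.na(hit_up) && is.na(hit_dn)) return("vertical")
  if (is.na(hit_dn) || (!is.na(hit_up) && hit_up < hit_dn)) "pt" else "sl"
}
```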
Finally, we have to generate bins (labels) based on the events table and close prices:
labels <- get_bins(events, close)
head(labels)
## Datetime ret trgt bin
## 1: 2018-01-30 16:09:00 0.003800121 0.006064238 0
## 2: 2018-02-02 17:59:00 -0.015776612 0.005542836 -1
## 3: 2018-02-02 19:44:00 -0.013933895 0.005750978 -1
## 4: 2018-02-02 20:59:00 -0.009009987 0.005128506 0
## 5: 2018-02-05 15:34:00 -0.043706868 0.005760954 -1
## 6: 2018-02-05 16:14:00 -0.036141512 0.008901707 -1
In the table we can see labels (bins) for the dates extracted with the CUSUM filter (you can choose your own filter). As a bonus, we get the return column (ret), which can be used as weights in ML models.
Labels can be highly unbalanced, which can be a problem for an ML model. There is a drop_labels function in the package that deletes labels below some frequency threshold. Here is an example:
labels_red <- drop_labels(labels, min_pct = 0.2)
## [1] "dropped label: -1 0.140779220779221"
head(labels_red)
## Datetime ret trgt bin
## 1: 2018-01-30 16:09:00 0.0038001208 0.006064238 0
## 2: 2018-02-02 20:59:00 -0.0090099870 0.005128506 0
## 3: 2018-02-05 21:04:00 0.0002242571 0.007565166 0
## 4: 2018-02-05 21:09:00 0.0164386668 0.009108594 1
## 5: 2018-02-05 21:14:00 0.0025474844 0.009527576 0
## 6: 2018-02-05 21:24:00 -0.0054372114 0.009741652 0
The label -1 is deleted because its frequency (14%) is lower than the threshold (20%).
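The dropping logic can be sketched roughly like this: repeatedly remove the rarest label while its frequency is below min_pct, always keeping at least two classes. The helper drop_labels_sketch is our own assumed reimplementation; the package's drop_labels may differ in detail.

```r
# Sketch only: iteratively drop the rarest label while it is below
# min_pct, never going under two remaining classes.
drop_labels_sketch <- function(labels, min_pct = 0.05) {
  repeat {
    freq <- table(labels$bin) / nrow(labels)     # label frequencies
    if (min(freq) > min_pct || length(freq) < 3) break
    rarest <- names(freq)[which.min(freq)]
    labels <- labels[labels$bin != rarest, ]     # drop rows with that label
  }
  labels
}
```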