In this vignette we will show how to use triple barrier labelling.
First, let's import packages and sample data from the mlfinance package:
# import packages
library(data.table)
library(mlfinance)
# import sample data
data(spy)
close <- spy[, c('index', 'close')]
The data contains SPY market data at 5-minute frequency for the period from 2018-01-01 to 2020-01-01.
Next, we will calculate daily volatility. There are many ways to calculate daily volatility from intraday data. Probably the best approach would be to calculate realized volatility, but for our simple case we will use the daily_volatility function from the package.
# compute daily volatility
daily_vol <- daily_volatility(close, 50)
head(daily_vol)
## Datetime Value
## 1: 2018-01-03 15:39:00 0.0009618423
## 2: 2018-01-03 15:44:00 0.0010268916
## 3: 2018-01-03 15:49:00 0.0011668355
## 4: 2018-01-03 15:54:00 0.0012822868
## 5: 2018-01-03 15:59:00 0.0013209074
## 6: 2018-01-03 16:04:00 0.0013699927
As you can see, the function returns daily volatility for every intraday timestamp. It does so by looking for the nearest date in the close index vector one day into the past.
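To make that computation concrete, here is a minimal sketch of an AFML-style daily volatility estimate: take each bar's return against the price roughly one day earlier, then apply an exponentially weighted moving standard deviation. The helper names (ewm_sd, daily_vol_sketch) are our own, and the exact formula inside daily_volatility may differ.

```r
# Sketch only: assumed AFML-style daily volatility; the package's
# daily_volatility() may differ in detail.
ewm_sd <- function(x, span) {
  alpha <- 2 / (span + 1)
  m <- x[1]                          # running EWM mean
  v <- 0                             # running EWM variance
  out <- rep(NA_real_, length(x))
  for (i in seq_along(x)[-1]) {
    delta <- x[i] - m
    m <- m + alpha * delta
    v <- (1 - alpha) * (v + alpha * delta^2)
    out[i] <- sqrt(v)
  }
  out
}

daily_vol_sketch <- function(datetime, value, span = 50) {
  # for every bar, index of the last bar at least one day (86400 s) older
  prev <- findInterval(as.numeric(datetime) - 86400, as.numeric(datetime))
  keep <- prev > 0
  ret <- value[keep] / value[prev[keep]] - 1   # one-day returns
  data.frame(Datetime = datetime[keep], Value = ewm_sd(ret, span))
}
```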
Now, usually, we don't want to trade on every time period (bar), but want to filter out the important bars. We want to trade only when 'something' unusual happens on the market, because that is when there are more inefficiencies (irrationalities). One way to filter bars is to use the CUSUM filter. Simply put, the CUSUM filter checks whether cumulative returns deviate by more than some threshold. We can calculate CUSUM events as follows:
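To illustrate the idea, here is a minimal sketch of a symmetric CUSUM filter: positive and negative cumulative sums of log returns are tracked, and whenever one of them crosses the threshold an event is recorded and the sum is reset. The function name cusum_sketch is our own; the package's cusum_filter may differ in detail.

```r
# Sketch only: symmetric CUSUM filter over log returns.
cusum_sketch <- function(datetime, value, threshold) {
  ret <- diff(log(value))            # log returns between consecutive bars
  s_pos <- 0                         # running positive cumulative sum
  s_neg <- 0                         # running negative cumulative sum
  events <- integer(0)
  for (i in seq_along(ret)) {
    s_pos <- max(0, s_pos + ret[i])
    s_neg <- min(0, s_neg + ret[i])
    if (s_pos > threshold) {
      s_pos <- 0
      events <- c(events, i + 1L)    # event at the bar that crossed up
    } else if (s_neg < -threshold) {
      s_neg <- 0
      events <- c(events, i + 1L)    # event at the bar that crossed down
    }
  }
  datetime[events]
}
```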
cusum_events <- cusum_filter(close, mean(daily_vol$Value, na.rm = TRUE))
head(cusum_events)
## Datetime
## 1: 2018-01-02 21:59:00
## 2: 2018-01-03 18:39:00
## 3: 2018-01-04 15:34:00
## 4: 2018-01-05 17:19:00
## 5: 2018-01-08 19:09:00
## 6: 2018-01-09 19:24:00
Next, we want to decide how long into the future we are willing to wait before we close our position. That is, we have to define the vertical barrier:
vartical_barriers <- add_vertical_barrier(cusum_events, close$index, num_days = 1)
head(vartical_barriers)
## Datetime t1
## 1: 2018-01-02 21:59:00 2018-01-03 21:59:00
## 2: 2018-01-03 18:39:00 2018-01-04 18:39:00
## 3: 2018-01-04 15:34:00 2018-01-05 15:34:00
## 4: 2018-01-05 17:19:00 2018-01-08 15:34:00
## 5: 2018-01-08 19:09:00 2018-01-09 19:09:00
## 6: 2018-01-09 19:24:00 2018-01-10 19:24:00
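The vertical barrier logic above can be sketched as follows, with timestamps as numeric seconds for simplicity: for each event, take the first available bar timestamp at least num_days later. The helper vertical_barrier_sketch is our own hypothetical reimplementation, not the package function.

```r
# Sketch only: first bar timestamp at least num_days after each event;
# events with no such bar get NA (the sample ends before the barrier).
vertical_barrier_sketch <- function(t_events, close_index, num_days = 1) {
  sapply(t_events, function(t0) {
    later <- close_index[close_index >= t0 + num_days * 86400]
    if (length(later) > 0) later[1] else NA_real_
  })
}
```

Note that weekends explain rows like event 4 in the table above: the first bar at least one day later can be more than 24 hours away.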
Now we have everything we need to move on to the triple barrier method. We jump straight to the code and explain all arguments:
min_return <- 0.005
pt_sl <- c(1, 2)
events <- get_events(price = close, # close data.table with index and value
                     t_events = cusum_events, # bars to look at for trades
                     pt_sl = pt_sl, # profit-taking and stop-loss multipliers that set the width of the barriers
                     target = daily_vol, # values used to determine the width of the barriers
                     min_ret = min_return, # minimum return between events
                     vertical_barrier_times = vartical_barriers, # vertical barrier timestamps
                     side_prediction = NA) # prediction from the primary model (here, no primary model)
head(events)
## t0 t1 trgt pt sl
## 1: 2018-01-30 16:09:00 2018-01-31 16:09:00 0.006064238 1 2
## 2: 2018-02-02 17:59:00 2018-02-05 15:34:00 0.005542836 1 2
## 3: 2018-02-02 19:44:00 2018-02-05 15:34:00 0.005750978 1 2
## 4: 2018-02-02 20:59:00 2018-02-05 15:34:00 0.005128506 1 2
## 5: 2018-02-05 15:34:00 2018-02-06 15:34:00 0.005760954 1 2
## 6: 2018-02-05 16:14:00 2018-02-06 16:14:00 0.008901707 1 2
The resulting table shows the event start time (t0), the event end time (t1), the volatility target used to set the barrier width (trgt), and the profit-taking (pt) and stop-loss (sl) multipliers.
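To build intuition for how the event end time is determined, here is a sketch of the first-touch logic on a single price path: scan returns relative to the entry price and report which barrier is hit first. The helper first_touch_sketch is our own; get_events applies this idea across all events.

```r
# Sketch only: which of the three barriers does a price path touch first?
# upper/lower are return thresholds, e.g. trgt * pt and -trgt * sl.
first_touch_sketch <- function(path, upper, lower) {
  ret <- path / path[1] - 1            # returns relative to entry price
  hit_up <- which(ret >= upper)[1]     # first profit-taking touch (NA if none)
  hit_dn <- which(ret <= lower)[1]     # first stop-loss touch (NA if none)
  if (is.na(hit_up) && is.na(hit_dn)) return("vertical")
  if (is.na(hit_dn) || (!is.na(hit_up) && hit_up < hit_dn)) "pt" else "sl"
}
```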
Finally, we have to generate bins (labels) based on the events table and close prices:
labels <- get_bins(events, close)
head(labels)
## Datetime ret trgt bin
## 1: 2018-01-30 16:09:00 0.003800121 0.006064238 0
## 2: 2018-02-02 17:59:00 -0.015776612 0.005542836 -1
## 3: 2018-02-02 19:44:00 -0.013933895 0.005750978 -1
## 4: 2018-02-02 20:59:00 -0.009009987 0.005128506 0
## 5: 2018-02-05 15:34:00 -0.043706868 0.005760954 -1
## 6: 2018-02-05 16:14:00 -0.036141512 0.008901707 -1
In the table we can see labels (bins) for the dates extracted with the CUSUM filter (you can choose your own filter). As a bonus, we get the return column (ret), which can be used as weights in ML models.
Labels can be highly unbalanced, which can be a problem for an ML model. There is a drop_labels function in the package that deletes labels below some frequency threshold. Here is an example:
labels_red <- drop_labels(labels, min_pct = 0.2)
## [1] "dropped label: -1 0.140779220779221"
head(labels_red)
## Datetime ret trgt bin
## 1: 2018-01-30 16:09:00 0.0038001208 0.006064238 0
## 2: 2018-02-02 20:59:00 -0.0090099870 0.005128506 0
## 3: 2018-02-05 21:04:00 0.0002242571 0.007565166 0
## 4: 2018-02-05 21:09:00 0.0164386668 0.009108594 1
## 5: 2018-02-05 21:14:00 0.0025474844 0.009527576 0
## 6: 2018-02-05 21:24:00 -0.0054372114 0.009741652 0
The label -1 is deleted because its frequency (14%) is lower than the threshold (20%).
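The dropping logic can be sketched roughly like this: repeatedly remove the rarest label while its frequency is below min_pct, always keeping at least two classes. The helper drop_labels_sketch is our own assumed reimplementation; the package's drop_labels may differ in detail.

```r
# Sketch only: iteratively drop the rarest label while it is below
# min_pct, never going under two remaining classes.
drop_labels_sketch <- function(labels, min_pct = 0.05) {
  repeat {
    freq <- table(labels$bin) / nrow(labels)     # label frequencies
    if (min(freq) > min_pct || length(freq) < 3) break
    rarest <- names(freq)[which.min(freq)]
    labels <- labels[labels$bin != rarest, ]     # drop rows with that label
  }
  labels
}
```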