LASSO-based high-frequency return predictors for profitable Bitcoin investment

This article explores the Bitcoin return predictability of variables constructed from one-minute high-frequency Bitcoin trading data. During the training period of 2012–2018, LASSO is used to pick out the most powerful predictors. We then use predictors selected by LASSO to predict the Bitcoin returns in the 2018–2019 test sample. An investment strategy based on the return predictions outperforms a simple buy-and-hold strategy and other strategies based on the prediction of Ordinary Least Squares and Neural Networks.

Keywords: Bitcoin; high-frequency; investment strategy; LASSO; neural networks

I. Introduction

Bitcoin, the most successful application of blockchain technology, has shown speculative, complementary-currency and diversification characteristics resembling those of an investable asset class (Bouri et al. [2]; Carrick [4]; Gandal et al. [6]). Given its increasing importance, there is a growing literature on the determinants of Bitcoin returns. The performance of various financial markets (e.g. equities, interest rates, commodities and exchange rates, as discussed in Kristoufek [8]), macroeconomic indicators (e.g. unemployment and policy uncertainty, as explored by Panagiotidis, Stengos, and Vravosinos [10]), and social factors such as online search intensity (Hakim das Neves [7]; Nasir et al. [9]) have all been identified as possible covariates of Bitcoin returns. Many novel forecasting models have also been used to capture the price trend of cryptocurrencies (Sun, Liu, and Sima [13]). However, these factors do not adequately explain the daily returns of Bitcoin observed on a continuous intraday basis, since such high-frequency fluctuations are more likely to be driven by noise and momentum than by fundamentals (Balcilar et al. [1]).

In line with studies emphasizing the importance of high-frequency data (Catania and Sandholdt [5]), and by integrating the LASSO approach, predictive regression, and an investment strategy, this article investigates whether high-frequency Bitcoin trading information from prior days can forecast today's Bitcoin returns. We examine a collection of potential factors constructed from all available high-frequency fields at one-minute intervals. After dividing our whole sample period of 2012–2019 into training and test periods, we use the Least Absolute Shrinkage and Selection Operator (LASSO) framework in the training period to first select the most powerful predictors from these candidate factors. We then confirm that only a subset of these LASSO-selected high-frequency drivers possesses statistically significant predictive power in a linear regression. Finally, we establish an active investment strategy based on the former two steps and find that it performs better than a passive buy-and-hold strategy and than alternatives based on Ordinary Least Squares (OLS) and Neural Network estimations.

II. Data

The data contain intraday high-frequency trading records sourced from the Bitstamp exchange at one-minute intervals. Our sample period spans from 1 January 2012 to 12 August 2019. Raw data fields comprise timestamps expressed in Unix time, minute-by-minute OHLC (open, high, low, close) prices, volumes denominated in both Bitcoin and fiat currency, and a weighted Bitcoin price.[1] For a minute without any activity, the price and volume entries are filled with NaNs. The number of NaNs also serves as a potential predictor, since it captures to some extent the illiquidity inherent in Bitcoin transactions.

We then aggregate the minute-frequency variables to the daily level. Specifically, the daily OHLC prices are taken from their respective minute counterparts within each day. For transaction volume and the NaN count, we sum the minute-level observations over the trading day. For the weighted price quote, we take the average across all minute values. The collection of forecasting variables is further expanded by computing the intraday volatility of each daily statistic (except the number of NaNs). We also calculate cross-day differences for the above variables by subtracting their day t-1 values from their day t values. In total, we obtain 30 high-frequency predictors, which are scaled to a consistent order of magnitude. The predictive target is the continuously compounded (cc) Bitcoin return between t and t + 1, evaluated at daily open prices. Our data treatment and results remain robust to an alternative data source: trades on Coinbase from 1 December 2014 to 9 January 2019.
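The aggregation step can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code. The tuple layout of a bar, the function names `aggregate_daily`, `cc_open_returns` and `cross_day_diff`, the use of UTC day boundaries, and the choice of the first and last traded minute for the daily open and close are all assumptions made for the example. Intraday volatilities, fiat volume and the weighted price are omitted for brevity.

```python
import math
from collections import defaultdict
from datetime import datetime, timezone


def aggregate_daily(minute_bars):
    """Collapse one-minute bars into daily records.

    Each bar is (unix_ts, open, high, low, close, volume_btc); the price and
    volume entries are float('nan') for minutes without any trade.
    """
    by_day = defaultdict(list)
    for bar in minute_bars:
        # Assumption: a "day" is a UTC calendar day.
        day = datetime.fromtimestamp(bar[0], tz=timezone.utc).date()
        by_day[day].append(bar)

    daily = []
    for day in sorted(by_day):
        bars = sorted(by_day[day], key=lambda b: b[0])
        traded = [b for b in bars if not math.isnan(b[1])]
        if not traded:
            continue  # assumption: drop days with no trades at all
        daily.append({
            "date": day,
            "open": traded[0][1],                     # first traded minute
            "high": max(b[2] for b in traded),
            "low": min(b[3] for b in traded),
            "close": traded[-1][4],                   # last traded minute
            "volume_btc": sum(b[5] for b in traded),  # summed over the day
            "nan_count": len(bars) - len(traded),     # illiquidity proxy
        })
    return daily


def cc_open_returns(daily):
    """Continuously compounded return between consecutive daily opens."""
    return [math.log(nxt["open"] / cur["open"])
            for cur, nxt in zip(daily, daily[1:])]


def cross_day_diff(daily, field):
    """Day t value minus day t-1 value for one daily field."""
    return [cur[field] - prev[field] for prev, cur in zip(daily, daily[1:])]
```

Keeping the NaN count as its own field, rather than dropping empty minutes before aggregation, is what allows it to enter the predictor pool later.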

III. Methodology

We consider two periods separately: the training period, covering 1 January 2012 through 17 May 2018, and the test period, from 18 May 2018 to 12 August 2019. Using the training subsample, we estimate LASSO models to pick out powerful Bitcoin-return predictors from the pool of high-frequency candidate variables. In machine learning, LASSO performs both variable selection and regularization to enhance prediction accuracy and interpretability (Bühlmann and Van De Geer [3]). It relies on coefficient shrinkage, in which estimates are shrunk towards zero. Since sparse models (i.e. models with fewer parameters) are encouraged, LASSO suits our sample well: Bitcoin trading activity is sparse at high frequency, and the candidate predictors exhibit a high degree of multicollinearity on an intraday basis. It is also convenient when, as in our case, we want to automate the predictor-elimination part of modelling. Mathematically, we augment the OLS sum of squared errors (SSE) with an L1 penalty:

$SSE = \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2} + \lambda \sum_{j=0}^{k} |b_{j}|$

The variable selection function makes LASSO preferable to Ridge regression in that some coefficients can take the value of zero, indicating that the corresponding variables do not contribute to the model. The larger the value of $\lambda$, the more coefficients will be set to zero. In our study, we apply 10-fold cross-validation to optimize $\lambda$. This procedure yields the following potent Bitcoin return predictors:

Lvol_1: 1 day lagged volatility of the low price

Vb_1: 1 day lagged daily volume of Bitcoin transaction

Vbvol_diff: cross-day difference in the volatility of Bitcoin volume

Open_diff: cross-day difference in the open price

Close_diff: cross-day difference in the close price

Vb_diff: cross-day difference in daily Bitcoin trading volume

Vc_diff: cross-day difference in volume denominated in fiat currency

NaNs_diff: cross-day difference in NaNs count
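How the penalty sets coefficients exactly to zero, and how cross-validation picks $\lambda$, can be illustrated with a small coordinate-descent sketch. It is written under stated assumptions and is not the authors' implementation. The objective is scaled as (1/2n) SSE + lam * sum|b_j|, so `lam` corresponds to the article's $\lambda$ divided by 2n. The intercept is left unpenalized, which is common practice but differs from the sum starting at j = 0 above. The folds are randomly shuffled, which the article does not specify. All function names are hypothetical.

```python
import random


def soft_threshold(z, g):
    """Move z towards zero by g; anything inside [-g, g] becomes exactly 0."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n) * SSE + lam * sum|b_j|.

    X is a list of rows whose columns are assumed to be pre-scaled;
    the intercept b0 is not penalized.
    """
    n, k = len(X), len(X[0])
    b0, b = sum(y) / n, [0.0] * k
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(k)]
    resid = [y[i] - b0 for i in range(n)]
    for _ in range(n_iter):
        for j in range(k):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the residual that excludes b_j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j])
                      for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
        shift = sum(resid) / n  # re-centre the unpenalized intercept
        b0 += shift
        resid = [r - shift for r in resid]
    return b0, b


def predict(X, b0, b):
    return [b0 + sum(x * c for x, c in zip(row, b)) for row in X]


def cv_lambda(X, y, lambdas, folds=10, seed=0):
    """Return the candidate lam with the lowest held-out squared error."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)  # assumption: shuffled folds
    parts = [idx[f::folds] for f in range(folds)]
    best = None
    for lam in lambdas:
        sse = 0.0
        for held in parts:
            held_set = set(held)
            train = [i for i in idx if i not in held_set]
            b0, b = lasso_cd([X[i] for i in train],
                             [y[i] for i in train], lam)
            preds = predict([X[i] for i in held], b0, b)
            sse += sum((y[i] - p) ** 2 for i, p in zip(held, preds))
        if best is None or sse < best[0]:
            best = (sse, lam)
    return best[1]
```

Predictors whose fitted coefficient is exactly `0.0` at the chosen `lam` are the ones the procedure discards, and the remainder form the selected set.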

IV. Results

Table 1 shows the results of regressing Bitcoin returns on the predictors selected by LASSO. As can be seen, Vb_diff and NaNs_diff are significantly negatively associated with returns, whereas Close_diff and Vc_diff have a positive association with returns; all four are significant at the 1% level. The remaining four selected variables are not statistically significant. These results confirm that a subset of the variables selected by LASSO indeed possesses strong predictive power.

Table 1. Regression of Bitcoin returns on predictors selected by LASSO

	Dependent variable:
	Returns
Lvol_1	−0.10
Vb_1	−0.10
Vbvol_diff	−0.05
Open_diff	−0.13
Close_diff	2.39***
Vb_diff	−1.07***
Vc_diff	0.50***
NaNs_diff	−0.40***
Constant	0.003***
Observations	2,259
Log Likelihood	3,809.08
Akaike Inf. Crit.	−7,600.17

Note: *p < 0.1; **p < 0.05; ***p < 0.01.

Having confirmed this predictive power, we go one step further and use it to construct a simple investment strategy for Bitcoin. Unlike the buy-and-hold strategy, we dynamically adjust coin holdings according to the steps below. First, for each day in the test sample period, using the predictors selected by LASSO and the coefficients estimated on the training sample, the predictive regression is run over the period from the first day of the test sample up to the day before the one concerned, which yields an estimated return for the next day. Second, a threshold level of return is set based on outside investment options, say 0.0433, which equals the five-year moving average of long-term corporate bond returns.

Third, when the predicted return exceeds the threshold, we either convert all cash on hand (if any remains from a previous Bitcoin liquidation) into Bitcoin on day t + 1, or do nothing if no cash is left on day t. When the predicted return falls below the negative of the threshold, we sell all Bitcoin on day t + 1 if we held Bitcoin on day t, or keep holding cash if the coin balance is zero on day t. In all other cases, i.e. when the return estimate lies between the negative and positive thresholds, we keep our existing positions. The dash-dotted and solid lines in Figure 1 plot the gross return of the buy-and-hold strategy in comparison to that of our proposed strategy over the same test period. Our strategy, which exploits the predictors selected by LASSO, generates a consistent and sizable premium.
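The trading rule in the three steps above can be sketched as a small state machine. This is a simplified illustration rather than the authors' implementation. It takes the return forecasts as given, starts fully invested in Bitcoin, assumes that cash earns nothing and that there are no transaction costs, and applies each position to the return over the forecast day. The function name `run_threshold_strategy` is hypothetical.

```python
import math


def run_threshold_strategy(predicted, realized, threshold=0.0433):
    """Gross wealth path of the all-in / all-out threshold rule.

    predicted[t] is the forecast of realized[t], the continuously
    compounded Bitcoin return over the coming day.
    """
    wealth, in_bitcoin, path = 1.0, True, []
    for pred, real in zip(predicted, realized):
        if pred > threshold:
            in_bitcoin = True    # turn any cash into Bitcoin
        elif pred < -threshold:
            in_bitcoin = False   # liquidate all Bitcoin into cash
        # Otherwise the forecast is inside the band: keep the position.
        if in_bitcoin:
            wealth *= math.exp(real)
        path.append(wealth)
    return path


def buy_and_hold(realized):
    """Gross wealth path of simply holding Bitcoin throughout."""
    wealth, path = 1.0, []
    for real in realized:
        wealth *= math.exp(real)
        path.append(wealth)
    return path
```

Because the rule only acts when the forecast leaves the band between the negative and positive threshold, a larger threshold produces fewer trades and a path closer to buy-and-hold.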

Figure 1. Comparing gross returns of Bitcoin investment strategies based on alternative prediction methods, May 2018 to August 2019.

V. Benchmarking

We next compare the effectiveness of LASSO-based investing with alternatives relying on OLS or neural network (NNET) estimation. Because neither OLS nor NNET provides a mechanism for eliminating potential predictors, we use all the intraday variables constructed, so that as much information as possible is extracted. The in-sample root mean square errors (RMSE) for LASSO, OLS and NNET are 0.0447, 0.0444, and 0.0476, respectively, whereas the corresponding out-of-sample RMSEs are 0.0153, 0.0168, and 0.0286.[2] That is, out of sample, the RMSE of the LASSO-based predictive regression is lower than that of the two alternatives, while in sample it is marginally above that of OLS. We again observe that LASSO in our setup can indeed pick out the most important indicators from a pool of predictors, hence mitigating disturbances from noisy variables. We then rely on the OLS and NNET predictions to replicate the Bitcoin investment strategy implemented previously with LASSO. Not surprisingly, in terms of gross returns, the LASSO-based strategy outperforms. The dashed and dotted lines in Figure 1 represent the time series of OLS- and NNET-based returns, respectively. As a final check, we replace LASSO with the general-to-specific method to choose the best-fitting predictors, and formally test the in-sample and out-of-sample predictability of the LASSO-selected variables (Rapach, Wohar, and Rangvid [12]; Rapach and Wohar [11]). The corresponding results corroborate our approach to profiting from Bitcoin.
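For reference, the RMSE comparison amounts to the following. The `rmse` helper is a generic sketch, and the dictionary simply restates the figures reported above; no new results are computed.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


# RMSE figures as reported in the text for each estimation method.
reported = {
    "LASSO": {"in_sample": 0.0447, "out_of_sample": 0.0153},
    "OLS": {"in_sample": 0.0444, "out_of_sample": 0.0168},
    "NNET": {"in_sample": 0.0476, "out_of_sample": 0.0286},
}

best_in_sample = min(reported, key=lambda m: reported[m]["in_sample"])
best_out_of_sample = min(reported, key=lambda m: reported[m]["out_of_sample"])
```

On these figures OLS has the lowest in-sample error, as expected for an unpenalized fit on all predictors, while LASSO has the lowest out-of-sample error.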

VI. Conclusion

This paper considers 30 variables constructed from high-frequency Bitcoin trading data that may foreshadow next-day Bitcoin returns. We apply LASSO to find those with the strongest predictive power. In our sample, the cross-day differences in the close price and in trading volume denominated in fiat currency are significantly positively related to Bitcoin cc returns, whereas the cross-day differences in volume denominated in Bitcoin and in the daily NaN count are negatively related to returns. We then establish an investment strategy based on these return predictors. Over the test period from May 2018 to August 2019, our strategy outperforms the benchmark buy-and-hold strategy as well as other strategies derived from OLS and NNET estimators or from a general-to-specific variable-selection procedure.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Acknowledgement

This work is supported by the General Program of National Natural Science Foundation of China (No. 72071211).

References

1 Balcilar, M., E. Bouri, R. Gupta, and D. Roubaud. 2017. "Can Volume Predict Bitcoin Returns and Volatility? A Quantiles-based Approach." Economic Modelling 64: 74–81. doi: 10.1016/j.econmod.2017.03.019.
2 Bouri, E., P. Molnar, G. Azzi, D. Roubaud, and L. I. Hagfors. 2017. "On the Hedge and Safe Haven Properties of Bitcoin: Is It Really More than a Diversifier?" Finance Research Letters 20: 192–198. doi: 10.1016/j.frl.2016.09.025.
3 Bühlmann, P., and S. Van De Geer. 2011. Statistics for High-dimensional Data: Methods, Theory and Applications. Heidelberg, Germany: Springer Science & Business Media.
4 Carrick, J. 2016. "Bitcoin as a Complement to Emerging Market Currencies." Emerging Markets Finance and Trade 52 (10): 2321–2334. doi: 10.1080/1540496X.2016.1193002.
5 Catania, L., and M. Sandholdt. 2019. "Bitcoin at High Frequency." Journal of Risk and Financial Management 12: 36. doi: 10.3390/jrfm12010036.
6 Gandal, N., J. T. Hamrick, T. Moore, and T. Oberman. 2018. "Price Manipulation in the Bitcoin Ecosystem." Journal of Monetary Economics 95: 86–96. doi: 10.1016/j.jmoneco.2017.12.004.
7 Hakim das Neves, R. 2020. "Bitcoin Pricing: Impact of Attractiveness Variables." Financial Innovation 6 (21). doi: 10.1186/s40854-020-00176-3.
8 Kristoufek, L. 2015. "What are the Main Drivers of the Bitcoin Price?: Evidence from Wavelet Coherence Analysis." Plos One 10 (4): e0123923. doi: 10.1371/journal.pone.0123923.
9 Nasir, M. A., T. L. D. Huynh, S. P. Nguyen, and D. Duong. 2019. "Forecasting Cryptocurrency Returns and Volume Using Search Engines." Financial Innovation 5 (2). doi: 10.1186/s40854-018-0119-8.
10 Panagiotidis, T., T. Stengos, and O. Vravosinos. 2018. "On the Determinants of Bitcoin Returns: A LASSO Approach." Finance Research Letters 27: 235–240. doi: 10.1016/j.frl.2018.03.016.
11 Rapach, D. E., and M. E. Wohar. 2006. "In-sample Vs. Out-of-sample Tests of Stock Return Predictability in the Context of Data Mining." Journal of Empirical Finance 13 (2): 231–247. doi: 10.1016/j.jempfin.2005.08.001.
12 Rapach, D. E., M. E. Wohar, and J. Rangvid. 2005. "Macro Variables and International Stock Return Predictability." International Journal of Forecasting 21 (1): 137–166. doi: 10.1016/j.ijforecast.2004.05.004.
13 Sun, X., M. Liu, and Z. Sima. 2020. "A Novel Cryptocurrency Price Trend Forecasting Model Based on LightGBM." Finance Research Letters 32: 101084. doi: 10.1016/j.frl.2018.12.032.

Footnotes

1 There could be cases when timestamps go unrecorded or prices jump around. These happen occasionally within a day for several reasons, e.g. the exchange or its API was down, no trades took place, or there were unforeseen technical errors in data reporting or gathering. We choose not to backfill these blank data points.
2 The standard deviation of the Bitcoin return is 0.0511 in the training sample and 0.0377 in the test sample.

By Weige Huang and Xiang Gao

Title:	LASSO-based high-frequency return predictors for profitable Bitcoin investment
Author / contributor:	Huang, Weige ; Gao, Xiang
Journal:	Applied economics letters, vol. 29 (2022), issue 12, pp. 1079-1083
Published:	2022
Media type:	academicJournal
DOI:	10.1080/13504851.2021.1908512
Other:	Indexed in: ECONIS; Language: English; Publication type: Article in journal; Document type: Electronic resource (remote access); Manifestation: Dependent work [article, review]
