Getting accurate cryptocurrency price data is critical when backtesting discretionary and algorithmic crypto trading strategies. This post covers what I’ve found to be the best free and paid crypto data sources. I also demonstrate how to use Python & Pandas to resample the freely available Kaggle data into one-minute bars, filling in missing data so it is ready for backtesting.
Getting Cryptocurrency Price Data
Multiple free and paid sources of historical price data exist for Bitcoin and other cryptocurrencies. The paid versions are more extensive and accurate than the free ones; however, unlike free equities data, free crypto data is typically good enough for backtesting.
You can also connect directly to an exchange’s API to download historical data, although the available history is typically more limited.
SOURCE | PAID OR FREE |
---|---|
Polygon.io | Paid |
CoinMarketCap.com | Paid |
CryptoDataDownload.com | Free |
Kaggle.com | Free |
Analyzing Alpha (Kaggle Resampled) | Free |
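Exchange APIs are the other route mentioned above. As a rough sketch only, the following pulls one-minute candles from Binance’s public `/api/v3/klines` endpoint using the standard library and pandas. The choice of Binance, the endpoint URL, and the helper names `fetch_klines` and `klines_to_frame` are my assumptions for illustration; other exchanges use different endpoints, field orders, and rate limits, so check the exchange’s own documentation before relying on this.

```python
import json
import urllib.parse
import urllib.request

import pandas as pd

# Assumption: Binance's public candlestick ("klines") endpoint, used as an example.
# Verify the URL, parameters, and response layout against the exchange's docs.
BINANCE_KLINES_URL = 'https://api.binance.com/api/v3/klines'
KLINE_COLUMNS = ['open_time', 'open', 'high', 'low', 'close', 'volume']


def klines_to_frame(raw):
    """Convert a list of kline rows into an OHLCV DataFrame indexed by date."""
    # Each row starts with [open_time_ms, open, high, low, close, volume, ...];
    # prices and volume arrive as strings, so they are cast to float below.
    df = pd.DataFrame([row[:6] for row in raw], columns=KLINE_COLUMNS)
    df['date'] = pd.to_datetime(df['open_time'], unit='ms')
    df = df.drop(columns='open_time').set_index('date')
    return df.astype(float)


def fetch_klines(symbol, start_ms, end_ms, interval='1m', limit=1000):
    """Download one page of klines (hypothetical helper; needs network access)."""
    params = urllib.parse.urlencode({
        'symbol': symbol, 'interval': interval,
        'startTime': start_ms, 'endTime': end_ms, 'limit': limit,
    })
    with urllib.request.urlopen(f'{BINANCE_KLINES_URL}?{params}') as resp:
        return klines_to_frame(json.load(resp))
```

Each request returns at most `limit` rows, so a longer range needs repeated calls that advance `start_ms` past the last bar received, while staying within the exchange’s rate limits.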
Download Free Historical Cryptocurrency Price Data
If you’re just here for the CSV data, use the following files, which are freely available under a Creative Commons license:
- Analyzing Alpha Historical Crypto Price Data CSV (zipped & resampled)
- Kaggle Historical Price Data CSVs (zipped)
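Once downloaded, the resampled file can be loaded straight back into a (date, ticker) index. This is a minimal sketch that assumes the CSV has the same column layout as the export step in this post (`date`, `ticker`, then `open`, `high`, `low`, `close`, `volume`); `load_bars` and the file name are hypothetical. pandas infers zip compression from a `.zip` extension as long as the archive holds a single file.

```python
import pandas as pd


def load_bars(path):
    """Load OHLCV bars into a DataFrame indexed by (date, ticker).

    Assumes columns: date, ticker, open, high, low, close, volume.
    A '.zip' path is decompressed automatically if it holds a single CSV.
    """
    bars = pd.read_csv(path, parse_dates=['date'])
    return bars.set_index(['date', 'ticker']).sort_index()


# Hypothetical file name; point this at wherever you saved the download
# bars = load_bars('crypto-price-data.zip')
```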
Create Your Own Dataset Using Python & Pandas
Learn how to download free crypto data and convert it into minute bars using Python and Pandas for backtesting purposes. Follow along below or download the Get Historical Cryptocurrency Price Data Jupyter Notebook.
1. Get Imports
```python
import datetime as dt
import numpy as np
import pandas as pd
```
2. Import Universe
```python
from zipfile import ZipFile

# Path to the downloaded Kaggle archive; change it to match your machine
zf = ZipFile('archive.zip')
cols = ['time', 'open', 'high', 'low', 'close', 'volume']
# Read each CSV into a DataFrame keyed by its file name (the ticker)
dfs = pd.concat({text_file.filename.split('.')[0]: pd.read_csv(zf.open(text_file.filename),
                                                               usecols=cols)
                 for text_file in zf.infolist()
                 if text_file.filename.endswith('.csv')
                 })
dfs
```
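Passing a dict to `pd.concat` turns each key into the outer level of a MultiIndex, while the inner level keeps each file’s original row numbers. That is why the next step drops level 1 and resets the index. The sketch below reproduces the pattern on a tiny in-memory archive; the file names and their contents are made up for illustration.

```python
import io
from zipfile import ZipFile

import pandas as pd

# Build a tiny in-memory zip with two made-up CSV files and one non-CSV file
buf = io.BytesIO()
with ZipFile(buf, 'w') as zw:
    zw.writestr('btcusd.csv', 'time,open,high,low,close,volume\n0,1,1,1,1,5\n60000,2,2,2,2,3\n')
    zw.writestr('ethusd.csv', 'time,open,high,low,close,volume\n0,9,9,9,9,7\n')
    zw.writestr('README.txt', 'not a CSV')

zf = ZipFile(buf)
cols = ['time', 'open', 'high', 'low', 'close', 'volume']
# Outer index level = dict key (ticker); inner level = row number within each file
dfs = pd.concat({
    info.filename.split('.')[0]: pd.read_csv(zf.open(info.filename), usecols=cols)
    for info in zf.infolist()
    if info.filename.endswith('.csv')
})

# Drop the per-file row numbers, then move the ticker into a regular column
df = dfs.droplevel(1).reset_index().rename(columns={'index': 'ticker'})
print(df[['ticker', 'close']])
```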
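3. Clean and Filter Data
```python
# Move the ticker out of the index and into its own column
df = dfs.droplevel(1).reset_index().rename(columns={'index': 'ticker'})
# Keep tickers containing 'usd' (copy to avoid a SettingWithCopyWarning)
df = df[df['ticker'].str.contains('usd')].copy()
# 'time' is a Unix timestamp in milliseconds
df['date'] = pd.to_datetime(df['time'], unit='ms')
df = df.sort_values(by=['date', 'ticker'])
df = df.drop(columns='time')
# Keep 2020-07-01 through 2020-12-31; the exclusive bound keeps all of Dec 31
df = df[(df['date'] >= '2020-07-01') & (df['date'] < '2021-01-01')]
df = df.set_index(['date', 'ticker'])
df
```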
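The Kaggle `time` column holds Unix timestamps in milliseconds, which is why `unit='ms'` is passed. One subtlety when filtering a minute-level date column: a date string such as `'2020-12-31'` compares as midnight on that day, so an inclusive `<= '2020-12-31'` test keeps only the first minute of December 31. The check below uses two made-up timestamps to show the difference; an exclusive bound on the following day keeps the whole final day.

```python
import pandas as pd

# Made-up millisecond timestamps: the first and last minute of the sample window
dates = pd.Series(pd.to_datetime([1593561600000, 1609459140000], unit='ms'))
# dates: 2020-07-01 00:00:00 and 2020-12-31 23:59:00

# '2020-12-31' is read as midnight, so this inclusive bound drops 23:59
inclusive = dates[(dates >= '2020-07-01') & (dates <= '2020-12-31')]
# An exclusive bound on the following day keeps all of December 31
exclusive = dates[(dates >= '2020-07-01') & (dates < '2021-01-01')]

print(len(inclusive), len(exclusive))  # 1 2
```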
4. Resample Timeframes
An exchange produces no bar for a period in which an asset isn’t traded, so the data will have “missing” bars. We forward-fill the price information and set the volume to zero for these bars. Many backtesting engines require that no bars are missing, but filling them can cause issues with statistical methods that consider only prices and not volume, because every filled bar shows a zero return.
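To make the filling concrete, here is the same idea on a made-up single-ticker frame with a two-minute gap. It is an illustrative sketch rather than the code used to build the dataset: since the input is already one-minute data, `resample('1min').last()` simply places each bar on a regular grid and leaves the empty minutes as NaN, which are then filled.

```python
import pandas as pd

# Made-up 1-minute bars for one ticker, with no trades at 00:01 and 00:02
bars = pd.DataFrame(
    {'open': [10.0, 12.0], 'high': [11.0, 13.0], 'low': [9.0, 11.0],
     'close': [10.5, 12.5], 'volume': [4.0, 2.0]},
    index=pd.to_datetime(['2020-07-01 00:00', '2020-07-01 00:03']),
)

# Put the bars on a regular 1-minute grid; missing minutes appear as NaN rows
filled = bars.resample('1min').last()
prices = ['open', 'high', 'low', 'close']
filled[prices] = filled[prices].ffill()          # carry the last known prices forward
filled['volume'] = filled['volume'].fillna(0.0)  # nothing traded during the gap
print(filled)
```

The two filled minutes repeat the prices from 00:00, so their close-to-close returns are exactly zero, which is the kind of distortion to keep in mind when a statistic looks only at prices.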
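```python
# Put each ticker on a regular 1-minute grid, aggregating OHLCV explicitly
ohlcv = {'open': 'first', 'high': 'max', 'low': 'min', 'close': 'last', 'volume': 'sum'}
bars1m = df.reset_index('ticker').groupby('ticker').resample('1min').agg(ohlcv)
# Forward-fill prices within each ticker; missing bars get zero volume
prices = ['open', 'high', 'low', 'close']
bars1m[prices] = bars1m.groupby(level='ticker')[prices].ffill()
bars1m['volume'] = bars1m['volume'].fillna(0.0)
# Reorder the index to (date, ticker), as in the input
bars1m = bars1m.swaplevel('ticker', 'date').sort_index()
bars1m
```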
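The same one-minute bars can be rolled up into coarser timeframes. This is a sketch under a few assumptions: the input is indexed by (`date`, `ticker`) with `open`, `high`, `low`, `close`, and `volume` columns, it already has no missing minutes, and `resample_bars` is a hypothetical helper name. Each coarser bar takes the first open, highest high, lowest low, last close, and total volume of the minutes it covers.

```python
import pandas as pd

# Standard OHLCV aggregation within each coarser bar
OHLCV_AGG = {'open': 'first', 'high': 'max', 'low': 'min',
             'close': 'last', 'volume': 'sum'}


def resample_bars(bars, rule):
    """Resample (date, ticker)-indexed bars to a coarser timeframe, e.g. '5min'."""
    flat = bars.reset_index()
    # Group by ticker and by fixed-width time bins of the 'date' column
    out = flat.groupby(['ticker', pd.Grouper(key='date', freq=rule)]).agg(OHLCV_AGG)
    return out.swaplevel('ticker', 'date').sort_index()


# Example: bars5m = resample_bars(bars1m, '5min')
```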
5. Export CSV File
```python
bars1m.to_csv('crypto-price-data.csv')
```
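Because many backtesting engines require that no bars are missing, a quick check before exporting (or after loading the CSV back) can confirm that every ticker’s timestamps step by exactly one minute. `has_no_gaps` is a hypothetical helper; it assumes a (`date`, `ticker`) index like the one built above.

```python
import pandas as pd


def has_no_gaps(bars, freq='1min'):
    """Return True if every ticker's dates step by exactly `freq`."""
    step = pd.Timedelta(freq)
    for _, group in bars.groupby(level='ticker'):
        dates = group.index.get_level_values('date').sort_values()
        # Differences between consecutive timestamps for this ticker
        gaps = pd.Series(dates).diff().dropna()
        if not (gaps == step).all():
            return False
    return True


# Example: assert has_no_gaps(bars1m)
```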