A Simple Trading Strategy in Zipline

Now that we’ve installed Zipline, let’s develop a trading strategy using two simple moving averages. This is called a dual moving average strategy.

The best way to explain the dual moving average (DMA) strategy is with an example. A simple moving average is the average price over the last x trading periods. Trading periods can be weekly, daily, hourly, etc. To calculate a 50-day simple moving average (SMA), we add the closing prices of the previous 50 days and divide by 50, the total number of days.
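As a minimal sketch of that calculation, the helper below averages the most recent closing prices in plain Python. The function name simple_moving_average and the sample prices are made up for illustration, and a 3-day window stands in for the 50-day one to keep the numbers small.

```python
def simple_moving_average(closes, window):
    """Average of the most recent `window` closing prices."""
    if len(closes) < window:
        raise ValueError("need at least `window` prices")
    recent = closes[-window:]      # the last `window` closes
    return sum(recent) / window    # add them up and divide by the count

# Illustrative closing prices, oldest first
closes = [10.0, 11.0, 12.0, 13.0, 14.0]
sma_3 = simple_moving_average(closes, window=3)
print(sma_3)  # 13.0
```

A 50-day SMA is the same call with window=50 and at least 50 closing prices.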

Now that we understand what a simple moving average is, let’s discuss the DMA strategy. If we calculate both the 50-day SMA and a 200-day SMA, we can determine the price trend. When the 50-day moving average crosses above the 200-day moving average, the trend is up and the strategy would say to buy. When the 50-day moving average crosses below the 200-day moving average, the trend is considered down and the strategy states we should bet on the price falling further. Does it work in practice? Let’s find out!
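To make the crossover rule concrete, here is a rough sketch in plain Python that scans a price list and reports where the short average crosses the long one. The names sma and crossover_signals and the toy prices are assumptions for illustration, and 2-day and 4-day windows stand in for 50 and 200. The Zipline code later in this post compares the two averages on every bar rather than detecting the crossing itself.

```python
def sma(values, window):
    """Simple moving average of the last `window` values."""
    return sum(values[-window:]) / window

def crossover_signals(closes, short_window, long_window):
    """Return (index, 'buy' | 'sell') each time the short SMA crosses the long SMA."""
    signals = []
    previous_diff = None
    for i in range(long_window, len(closes) + 1):
        history = closes[:i]  # prices known up to and including bar i - 1
        diff = sma(history, short_window) - sma(history, long_window)
        if previous_diff is not None:
            if previous_diff <= 0 < diff:
                signals.append((i - 1, "buy"))    # short crossed above long
            elif previous_diff >= 0 > diff:
                signals.append((i - 1, "sell"))   # short crossed below long
        previous_diff = diff
    return signals

closes = [5, 4, 3, 2, 3, 4, 5, 6, 5, 4, 3, 2]
signals = crossover_signals(closes, short_window=2, long_window=4)
print(signals)  # [(5, 'buy'), (9, 'sell')]
```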

Set Up Jupyter Notebook

Let’s get our workspace set up and run Jupyter Notebook. Create a directory to store your files and activate your Zipline environment using conda, where env_zipline is the name you gave your conda environment.

$ mkdir workspace
$ cd workspace
$ conda activate env_zipline
$ jupyter notebook

Jupyter should open in a browser. Click New and then Python 3 to create a new notebook.

Once you have a new notebook open, we can enter commands into each Jupyter cell. You can follow along with the code below or download my Jupyter notebook if you’re familiar with Jupyter and want to speed things up.

Import Zipline

The first thing we’re going to do is load the Zipline extension using the Jupyter %load_ext magic, and then we’ll import zipline. After the second line, press Shift+Enter, which runs the cell instead of just starting a new line.

%load_ext zipline
import zipline

Now that we’ve imported zipline, let’s add the various libraries and methods that we’ll be using. A full list of the zipline methods can be found in the Zipline API Reference. The datetime and pytz modules are needed to set the datetimes for when our algo starts and ends.

from zipline.api import order_target_percent, record, symbol, set_benchmark, get_open_orders
from datetime import datetime
import pytz

Zipline has two functions that we need to define:

  1. initialize
  2. handle_data

Zipline Initialize

Initialize is run once. The context variable is required. Context is persistent and can be used throughout our algorithm as you’ll soon see. We also pass Apple to set_benchmark. This will add a series to our results so that we can compare the performance of our algorithm with our selected benchmark.

def initialize(context):
        context.i = 0
        context.asset = symbol('AAPL')
        set_benchmark(symbol('AAPL'))

Zipline Handle Data

After our algorithm has been initialized, Zipline will call handle_data. When defining handle_data, we need to pass it the context variable from above and data to work with. handle_data is called once for every event, which we define when calling run_algorithm. We’ll use the handle_data from the previous example, most of which is taken from the Zipline Quickstart.

def handle_data(context, data):
        # Wait until 200 bars are available so both windows are full
        context.i += 1
        if context.i < 200:
                return

        # Compute averages
        # data.history() returns a pandas Series for a single asset and
        # a single field, so .mean() gives us a float.
        short_mavg = data.history(context.asset, 'price', bar_count=50, frequency="1d").mean()
        long_mavg = data.history(context.asset, 'price', bar_count=200, frequency="1d").mean()

        # Trading logic
        open_orders = get_open_orders()

        # Only trade when no order for this asset is still pending
        if context.asset not in open_orders:
                if short_mavg > long_mavg:
                        # order_target_percent orders as many shares as needed
                        # to reach the target percentage of portfolio value.
                        order_target_percent(context.asset, 1.0)
                elif short_mavg < long_mavg:
                        order_target_percent(context.asset, 0.0)

        # Save values for later inspection
        record(AAPL=data.current(context.asset, 'price'),
               short_mavg=short_mavg,
               long_mavg=long_mavg)

In order to calculate the 200-day moving average, we need the previous 200 days. That’s why we skip 200 days before calculating our moving averages and running our trading logic. Also, we need to be on the 201st day in order to calculate the 200-day moving average for trading purposes as we wouldn’t know what today’s close price is. Finally, notice how we’re using context to save the day number and it maintains its state through each handle_data call.
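The way context carries state between calls can be sketched without Zipline at all. The toy driver below is a hypothetical stand-in for Zipline's engine, not its actual implementation: run_toy_backtest, the log list, and the price series are all invented for illustration. It calls initialize once and handle_data once per bar, and the counter stored on context survives across calls.

```python
from types import SimpleNamespace

WARMUP_BARS = 200  # same threshold as the handle_data above

def initialize(context):
    context.i = 0  # state stored on context persists between calls

def handle_data(context, price, log):
    context.i += 1
    if context.i < WARMUP_BARS:
        return  # not enough history yet, so skip this bar
    log.append((context.i, price))  # stands in for the trading logic

def run_toy_backtest(prices):
    """Hypothetical driver loop standing in for Zipline's engine."""
    context = SimpleNamespace()
    log = []
    initialize(context)            # run once
    for price in prices:           # run once per daily bar
        handle_data(context, price, log)
    return context, log

context, log = run_toy_backtest([100.0 + day for day in range(205)])
print(context.i)  # 205
print(log[0])     # (200, 299.0)
```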

Now that we’ve skipped the warm-up period, let’s calculate the simple moving averages. The data.history method returns a pandas series, dataframe, or panel depending on the data we pass to it. In our case, since we’re passing a single asset and a single field, we’ll get a series back, and its mean method will return the simple moving average as a float.

# Compute averages
# data.history() returns a pandas Series for a single asset and
# a single field, so .mean() gives us a float.
short_mavg = data.history(context.asset, 'price', bar_count=50, frequency="1d").mean()
long_mavg = data.history(context.asset, 'price', bar_count=200, frequency="1d").mean()

With our moving averages, we can now create our trading logic. If the 50-day moving average is above the 200-day, we’ll use 100% of our money to buy Apple. If the 50-day moving average falls below the 200-day, we’ll sell all of our shares. We can pass a float between 1.0 and -1.0 where a negative value indicates we wish to short the stock. You’ll notice that before I place an order, I check to see if we already have any trades open. If I don’t do this, we could place an order before our previous order is completed causing us to buy too many shares.

# Trading logic
open_orders = get_open_orders()

# Only trade when no order for this asset is still pending
if context.asset not in open_orders:
        if short_mavg > long_mavg:
                # order_target_percent orders as many shares as needed
                # to reach the target percentage of portfolio value.
                order_target_percent(context.asset, 1.0)
        elif short_mavg < long_mavg:
                order_target_percent(context.asset, 0.0)
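To get a feel for what a target percentage means in shares, here is a back-of-the-envelope sketch. The helper shares_for_target_percent is hypothetical and is not how Zipline sizes orders internally: it ignores commissions and slippage, and it simply truncates to whole shares. It also shows why the open-orders check matters.

```python
def shares_for_target_percent(portfolio_value, price, current_shares, target_percent):
    """Whole shares to buy (+) or sell (-) to reach target_percent of portfolio value."""
    target_value = portfolio_value * target_percent
    target_shares = int(target_value / price)  # truncate: no fractional shares
    return target_shares - current_shares

# Fully invest a $10,000 portfolio at $150 per share, starting from no position.
buy = shares_for_target_percent(10_000, 150.0, current_shares=0, target_percent=1.0)
print(buy)  # 66

# If that order has not filled yet, the position still reads 0,
# so repeating the call would request another 66 shares.
duplicate = shares_for_target_percent(10_000, 150.0, current_shares=0, target_percent=1.0)
print(duplicate)  # 66

# Once the 66 shares are held, a target of 0.0 sells them all.
sell = shares_for_target_percent(10_000, 150.0, current_shares=66, target_percent=0.0)
print(sell)  # -66
```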

We need to tell Zipline what values we want for analysis purposes. As we move to larger datasets, recording every value simply isn’t reasonable. We use the record function to keep track of Apple’s price and our moving averages for each day. If you’re familiar with Python, the syntax may look a little bit odd. AAPL isn’t a variable here. It’s a keyword argument name, which record uses as the label for the recorded values.

# Save values for later inspection
record(AAPL=data.current(context.asset, 'price'),
                short_mavg=short_mavg,
                long_mavg=long_mavg)
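If the keyword syntax looks odd, this toy version shows the mechanism. It is a stand-in written for illustration, not Zipline's record: the function and the recorded_rows list are invented here. Python collects keyword arguments into a dictionary, so the argument names become the labels.

```python
recorded_rows = []

def record(**values):
    """Toy stand-in for a record function: keyword names become labels."""
    recorded_rows.append(values)

short_mavg, long_mavg = 101.5, 99.0
record(AAPL=102.0, short_mavg=short_mavg, long_mavg=long_mavg)
print(recorded_rows[0])  # {'AAPL': 102.0, 'short_mavg': 101.5, 'long_mavg': 99.0}
```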

Analyze Performance

We’ve initialized our algorithm and we’ve defined handle_data. Each time handle_data runs, it places any orders and records the data. Now it’s time to run Zipline and see how our strategy performed. We can run Zipline in a variety of ways. You can add the following magic in Jupyter to run Zipline.

%%zipline --start 2000-1-1 --end 2017-12-31

Alternatively, we can call the run_algorithm method explicitly. The method has a lot of options, so I suggest you read the run_algorithm API Reference. The method returns the performance of our algorithm in a dataframe.

start = datetime(2000, 1, 1, 0, 0, 0, 0, pytz.utc)
end = datetime(2017, 12, 31, 0, 0, 0, 0, pytz.utc)

perf = zipline.run_algorithm(start=start,
                                end=end,
                                initialize=initialize,
                                capital_base=10000,
                                handle_data=handle_data)

Let’s analyze our algo’s performance using Pyfolio. We’ll import pyfolio and numpy so we can use them. We then use pf.utils.extract_rets_pos_txn_from_zipline to extract the returns, positions, and transactions, and read benchmark_period_return from the results to get the benchmark data. Pyfolio requires all of our data to be in period returns, but benchmark_period_return, which is poorly named, is actually a cumulative return. We need to convert it from a cumulative return into a period return. Let’s dig into this a little deeper, as understanding how to calculate returns is important.

You can’t just subtract one cumulative return from the next to get the daily returns, as they’re compounded. For example, imagine a scenario where we invested $1.00 and it grew by 50% on day one, lost 50% on day two, grew by 50% on day three, and lost 50% on day four. How much money would we have remaining?

The answer is not $1.00 as shown here:

$1.00 * (1+0.5) * (1-0.5) * (1+0.5) * (1-0.5) = $0.5625

We would have $0.5625 remaining, so the cumulative return would be 0.5625 - 1 = -0.4375, a loss of 43.75%.

We can deal with this problem and recover the daily returns from the cumulative returns by using either one of the conversion formulas below. In the first formula, which may seem more intuitive to some, we divide one plus the current cumulative return by one plus the previous cumulative return and then subtract one. In the second formula, we convert our returns to logarithmic returns so that we can take the difference between them, and then we undo the conversion using the exponential function. See the following example and make note of how we get the daily_returns from the cumulative_returns.

import numpy as np
import pandas as pd

# We need to be able to calculate the daily returns from the cumulative returns
daily_returns = pd.Series([0.5, -0.5, 0.5, -0.5])
cumulative_returns = pd.Series([0.5, -0.25, 0.125, -0.4375])

# Two different formulas to calculate daily returns
print((1 + cumulative_returns) / (1 + cumulative_returns.shift()) - 1)
print(np.exp(np.log(cumulative_returns + 1).diff()) - 1)

# Recreate daily returns manually for example purposes
print(daily_returns.head(1))
print((1 - 0.25) / (1 + 0.5) - 1)
print((1 + 0.125) / (1 - 0.25) - 1)
print((1 - 0.4375) / (1 + 0.125) - 1)

Both formulas print NaN for the first day, since there is no previous value to compare against, followed by -0.5, 0.5, and -0.5 (up to floating-point rounding in the logarithmic version). These match days two through four of daily_returns. The manual calculations print the first daily return of 0.5 and then -0.5, 0.5, and -0.5.
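As a sanity check on the compounding arithmetic, the sketch below goes in both directions in plain Python: it builds cumulative returns from the four daily returns in the example, then recovers the daily returns using the division formula. The function names to_cumulative and to_daily are made up for illustration. Starting the running growth at 1.0 means the first day is recovered as well.

```python
def to_cumulative(daily_returns):
    """Compound daily returns into cumulative returns."""
    cumulative, growth = [], 1.0
    for r in daily_returns:
        growth *= 1.0 + r              # value of $1.00 after this day
        cumulative.append(growth - 1.0)
    return cumulative

def to_daily(cumulative_returns):
    """Recover daily returns: (1 + c_t) / (1 + c_{t-1}) - 1."""
    daily, previous_growth = [], 1.0
    for c in cumulative_returns:
        growth = 1.0 + c
        daily.append(growth / previous_growth - 1.0)
        previous_growth = growth
    return daily

daily = [0.5, -0.5, 0.5, -0.5]
cumulative = to_cumulative(daily)
print(cumulative)            # [0.5, -0.25, 0.125, -0.4375]
print(to_daily(cumulative))  # [0.5, -0.5, 0.5, -0.5]
```

With pandas, the forward direction is usually written as (1 + daily_returns).cumprod() - 1.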

Once we have the data calculated correctly, we create the tear sheet to analyze our algorithm.

import pyfolio as pf
import numpy as np

# Extract algo returns and benchmark returns
returns, positions, transactions = pf.utils.extract_rets_pos_txn_from_zipline(perf)
benchmark_period_return = perf['benchmark_period_return']

# Convert benchmark returns to daily returns
#daily_returns = (1 + benchmark_period_return) / (1 + benchmark_period_return.shift()) - 1
daily_benchmark_returns = np.exp(np.log(benchmark_period_return + 1.0).diff()) - 1

# Create tear sheet
pf.create_full_tear_sheet(returns, positions=positions, transactions=transactions, benchmark_rets=daily_benchmark_returns)

As you can see, Pyfolio generates a lot of information for us to be able to analyze our algorithm.

Start date      2012-01-03
End date        2016-12-30
Total months    59

                       Backtest
Annual return          9.2%
Cumulative returns     55.1%
Annual volatility      16.9%
Sharpe ratio           0.61
Calmar ratio           0.41
Stability              0.70
Max drawdown           -22.4%
Omega ratio            1.17
Sortino ratio          0.89
Skew                   0.10
Kurtosis               11.86
Tail ratio             1.08
Daily value at risk    -2.1%
Gross leverage         1.00
Daily turnover         0.7%
Alpha                  0.02
Beta                   0.42

Worst drawdown periods   Net drawdown in %   Peak date    Valley date   Recovery date   Duration
0                        22.45               2015-02-23   2015-08-24    NaT             NaN
1                        18.83               2012-10-16   2013-09-16    2013-11-29      294
2                        12.32               2013-12-23   2014-01-30    2014-04-25      90
3                        10.92               2014-11-26   2015-01-16    2015-02-04      51
4                        6.81                2014-09-02   2014-10-16    2014-10-23      38

Stress Events   mean     min      max
EZB IR Event    0.00%    0.00%    0.00%
Apr14           0.46%    -1.57%   8.19%
Oct14           0.31%    -1.56%   2.71%
Fall2015        -0.07%   -6.11%   5.73%
Recovery        -0.06%   -6.20%   6.93%
New Normal      0.07%    -7.98%   8.19%

Top 10 long positions of all time    max
sid
AAPL                                 100.07%

Top 10 short positions of all time   max
sid

Top 10 positions of all time         max
sid
AAPL                                 100.07%
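To demystify a few of the headline numbers, here is a rough sketch of how annual return, Sharpe ratio, and max drawdown are commonly computed from daily returns. These are textbook-style definitions under stated assumptions (252 trading days per year, a zero risk-free rate, sample standard deviation); Pyfolio's exact conventions may differ, and the function names are made up for illustration.

```python
import math

TRADING_DAYS_PER_YEAR = 252  # common convention, assumed here

def annual_return(daily_returns):
    """Compound growth rate per year implied by the daily returns."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    years = len(daily_returns) / TRADING_DAYS_PER_YEAR
    return growth ** (1.0 / years) - 1.0

def sharpe_ratio(daily_returns):
    """Annualized mean over sample standard deviation, zero risk-free rate."""
    n = len(daily_returns)
    mean = sum(daily_returns) / n
    variance = sum((r - mean) ** 2 for r in daily_returns) / (n - 1)
    return mean / math.sqrt(variance) * math.sqrt(TRADING_DAYS_PER_YEAR)

def max_drawdown(daily_returns):
    """Largest peak-to-valley decline, as a negative fraction."""
    growth, peak, worst = 1.0, 1.0, 0.0
    for r in daily_returns:
        growth *= 1.0 + r
        peak = max(peak, growth)                 # highest value seen so far
        worst = min(worst, growth / peak - 1.0)  # decline from that peak
    return worst

sample = [0.5, -0.5, 0.5, -0.5]
print(max_drawdown(sample))  # -0.625
print(sharpe_ratio(sample))  # 0.0
```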
