© Copyright Quantopian Inc.
© Modifications Copyright QuantRocket LLC
Licensed under the Creative Commons Attribution 4.0.
by Rene Zhang and Max Margenot
Risk management is critical for constructing portfolios and building algorithms. Its main function is to improve the quality and consistency of returns by adequately accounting for risk. Returns obtained by taking on unexpected risks, which are always lurking within our portfolio, usually cannot be relied upon to produce profits over the long term. By limiting the impact of or eliminating these unexpected risks, the portfolio should ideally only have exposure to the alpha we are pursuing. In this lecture, we will focus on how to use factor models in risk management.
Other lectures have discussed Factor Models and the calculation of Factor Risk Exposure, as well as how to analyze alpha factors. The notation we generally use when introducing a factor model is as follows:
$$R_i = a_i + b_{i1} F_1 + b_{i2} F_2 + \ldots + b_{ik} F_k + \epsilon_i$$

where:

$$\begin{eqnarray} k &=& \text{the number of factors}, \\ R_i &=& \text{the return for company $i$}, \\ a_i &=& \text{the intercept},\\ F_j &=& \text{the return for factor $j$, $j \in [1,k]$}, \\ b_{ij} &=& \text{the corresponding exposure to factor $j$, $j \in [1,k]$,} \\ \epsilon_i &=& \text{specific fluctuation of company $i$.}\\ \end{eqnarray}$$
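To make this notation concrete before we turn to real data, here is a small illustrative sketch (not part of the original lecture) that simulates a two-factor model for a single hypothetical asset; every number in it is made up for demonstration.

import numpy as np

rng = np.random.default_rng(0)

T, k = 250, 2                        # observations (days) and number of factors
F_sim = rng.normal(0, 0.01, (T, k))  # factor returns F_1, F_2
b_i = np.array([1.2, -0.4])          # exposures b_i1, b_i2 of company i
a_i = 0.0002                         # intercept a_i
eps_i = rng.normal(0, 0.02, T)       # specific fluctuation of company i

# the factor model: R_i = a_i + b_i1*F_1 + b_i2*F_2 + eps_i
R_i = a_i + F_sim @ b_i + eps_i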
To quantify unexpected risks and have acceptable risk levels in a given portfolio, we need to answer 3 questions:

1. What proportion of the variance of my portfolio comes from common risk factors?
2. How do I limit this risk?
3. Where does the return/PnL of my portfolio come from, i.e., from which risk factors?
These risk factors can be:

- Classical fundamental factors, such as those in the CAPM (market risk) or the Fama-French 3-factor model (market, size, and value)
- Sector or industry exposures
- Macroeconomic factors, such as inflation or interest rates
- Statistical factors derived from historical returns, e.g., via principal component analysis
The base universe of assets we use here is the TradableStocksUS universe.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from zipline.pipeline import Pipeline
from zipline.pipeline.data import sharadar, master, EquityPricing
from zipline.pipeline.factors import CustomFactor, Returns, AverageDollarVolume
from zipline.pipeline.filters import AllPresent, All
from zipline.research import run_pipeline
# date range for building risk model
start = "2009-01-01"
end = "2011-01-01"
First we pull the returns of every asset in this universe across our desired time period.
TradableStocksUS = (
    # Market cap over $500M
    (sharadar.Fundamentals.slice(dimension='ARQ', period_offset=0).MARKETCAP.latest >= 500e6)
    # dollar volume over $2.5M over trailing 200 days
    & (AverageDollarVolume(window_length=200) >= 2.5e6)
    # price > $5
    & (EquityPricing.close.latest > 5)
    # no missing data for 200 days (exclude trading halts, IPOs, etc.)
    & AllPresent(inputs=[EquityPricing.close], window_length=200)
    & All([EquityPricing.volume.latest > 0], window_length=200)
    # common stocks only
    & master.SecuritiesMaster.usstock_SecurityType2.latest.eq("Common Stock")
    # primary share only
    & master.SecuritiesMaster.usstock_PrimaryShareSid.latest.isnull()
)
def tus_returns(start_date, end_date):
    pipe = Pipeline(
        columns={'Close': EquityPricing.close.latest},
        screen=TradableStocksUS
    )
    stocks = run_pipeline(pipe, start_date=start_date, end_date=end_date, bundle='usstock-1d-bundle')
    unstacked_results = stocks.unstack()
    # forward- and back-fill prices, then drop assets with any remaining gaps
    prices = (unstacked_results['Close'].fillna(method='ffill').fillna(method='bfill')
              .dropna(axis=1, how='any').shift(periods=-1).dropna())
    returns = prices.pct_change()[1:]
    return returns
R = tus_returns(start, end)
print("The universe we define includes {} assets.".format(R.shape[1]))
print('The number of timestamps is {} from {} to {}.'.format(R.shape[0], start, end))
The universe we define includes 1906 assets.
The number of timestamps is 503 from 2009-01-01 to 2011-01-01.
assets = R.columns
We will start with the classic Fama-French factors. The Fama-French factors are the market, company size, and company price-to-book (PB) ratio. We compute each asset's exposures to these factors, computing the factors themselves using pipeline code borrowed from the Fundamental Factor Models lecture.
Fundamentals = sharadar.Fundamentals.slice(dimension='ARQ', period_offset=0)
def make_pipeline():
    """
    Create and return our pipeline.

    We break this piece of logic out into its own function to make it easier to
    test and modify in isolation.
    """
    # Market Cap
    market_cap = Fundamentals.MARKETCAP.latest
    # Book to Price ratio
    book_to_price = 1 / Fundamentals.PB.latest

    # Build Filters representing the top and bottom 500 stocks by market cap
    # and by book-to-price.
    biggest = market_cap.top(500, mask=TradableStocksUS)
    smallest = market_cap.bottom(500, mask=TradableStocksUS)

    highpb = book_to_price.top(500, mask=TradableStocksUS)
    lowpb = book_to_price.bottom(500, mask=TradableStocksUS)

    universe = biggest | smallest | highpb | lowpb

    pipe = Pipeline(
        columns={
            'returns': Returns(window_length=2),
            'market_cap': market_cap,
            'book_to_price': book_to_price,
            'biggest': biggest,
            'smallest': smallest,
            'highpb': highpb,
            'lowpb': lowpb
        },
        screen=universe
    )
    return pipe
Here we run our pipeline and create the return streams for high-minus-low and small-minus-big.
from quantrocket.master import get_securities
from quantrocket import get_prices
pipe = make_pipeline()
# This takes a few minutes.
results = run_pipeline(pipe, start_date=start, end_date=end, bundle='usstock-1d-bundle')
R_biggest = results[results.biggest]['returns'].groupby(level=0).mean()
R_smallest = results[results.smallest]['returns'].groupby(level=0).mean()
R_highpb = results[results.highpb]['returns'].groupby(level=0).mean()
R_lowpb = results[results.lowpb]['returns'].groupby(level=0).mean()
SMB = R_smallest - R_biggest
HML = R_highpb - R_lowpb
df = pd.DataFrame({
    'SMB': SMB.tz_localize(None),  # company size
    'HML': HML.tz_localize(None)   # company PB ratio
}, columns=["SMB", "HML"]).shift(periods=-1).dropna()
SPY = get_securities(symbols='SPY', vendors='usstock').index[0]
MKT = get_prices(
    'usstock-1d-bundle',
    data_frequency='daily',
    sids=SPY,
    start_date=start,
    end_date=end,
    fields='Close').loc['Close'][SPY].pct_change()[1:]
MKT = pd.DataFrame({'MKT': MKT})
F = pd.concat([MKT, df], axis=1).dropna()
ax = ((F + 1).cumprod() - 1).plot(subplots=True, title='Cumulative Fundamental Factors')
ax[0].set(ylabel="daily returns")
ax[1].set(ylabel="daily returns")
ax[2].set(ylabel="daily returns")
plt.show()
Running a multiple linear regression on the fundamental factors for each asset in our universe, we can obtain the corresponding factor exposure for each asset. Here we express:
$$R_i = \alpha_i + \beta_{i, MKT} R_{MKT} + \beta_{i, HML} R_{HML} + \beta_{i, SMB} R_{SMB} + \epsilon_i$$

for each asset $S_i$. This shows us how much of each individual security's return is made up of these risk factors.
We calculate the risk exposures on an asset-by-asset basis in order to get a more granular view of the risk of our portfolio. This approach requires that we know the holdings of the portfolio itself, on any given day, and is computationally expensive.
# factor exposure
from IPython.display import clear_output

R = R.tz_localize(None)

B = pd.DataFrame(index=assets, dtype=np.float32)
epsilon = pd.DataFrame(index=R.index, dtype=np.float32)

x = sm.add_constant(F)

for i, asset in enumerate(assets):
    print(f"asset {i+1} of {len(assets)}")
    clear_output(wait=True)
    y = R.loc[:, asset]
    # exclude outliers more than 3 standard deviations from the mean
    y_inlier = y[np.abs(y - y.mean()) <= (3 * y.std())]
    x_inlier = x[np.abs(y - y.mean()) <= (3 * y.std())]
    result = sm.OLS(y_inlier, x_inlier).fit()

    B.loc[asset, "MKT_beta"] = result.params[1]
    B.loc[asset, "SMB_beta"] = result.params[2]
    B.loc[asset, "HML_beta"] = result.params[3]
    # residual return not explained by the factors
    epsilon[asset] = y - (x.iloc[:, 0] * result.params[0] +
                          x.iloc[:, 1] * result.params[1] +
                          x.iloc[:, 2] * result.params[2] +
                          x.iloc[:, 3] * result.params[3])
asset 1906 of 1906
The factor exposures for the first ten assets in our universe are plotted below. Each individual asset in our universe will have a different exposure to the three included risk factors.
fig, axes = plt.subplots(3, 1)
ax1, ax2, ax3 = axes
B.iloc[0:10, 0].plot.barh(ax=ax1, figsize=[15, 15], title=B.columns[0])
B.iloc[0:10, 1].plot.barh(ax=ax2, figsize=[15, 15], title=B.columns[1])
B.iloc[0:10, 2].plot.barh(ax=ax3, figsize=[15, 15], title=B.columns[2])
ax1.set(xlabel='beta')
ax2.set(xlabel='beta')
ax3.set(xlabel='beta')
plt.show()
from zipline.research import sid
aapl = sid("FIBBG000B9XRY4", bundle='usstock-1d-bundle')
B.loc[aapl,:]
MKT_beta    1.042848
SMB_beta    0.139400
HML_beta   -0.389628
Name: Equity(FIBBG000B9XRY4 [AAPL]), dtype: float64
We now have three DataFrames to work with: the asset returns R, the factor returns F, and the factor exposures B. Currently, the F DataFrame contains the return streams for MKT, SMB, and HML, by date.
F.head(3)
|  | MKT | SMB | HML |
| --- | --- | --- | --- |
| 2009-01-05 | -0.001183 | -0.003141 | 0.009074 |
| 2009-01-06 | 0.006677 | 0.010061 | 0.019250 |
| 2009-01-07 | -0.029956 | -0.000856 | -0.012452 |
Meanwhile, the B DataFrame contains point estimates of the beta exposures to MKT, SMB, and HML for every asset in our universe.
B.head(3)
|  | MKT_beta | SMB_beta | HML_beta |
| --- | --- | --- | --- |
| Equity(FIBBG000C2V3D6 [A]) | 1.203883 | 0.467215 | -0.395332 |
| Equity(FIBBG000F7RCJ1 [AAP]) | 0.696238 | 0.349129 | -0.437114 |
| Equity(FIBBG000B9XRY4 [AAPL]) | 1.042848 | 0.139400 | -0.389628 |
Now that we have these values, we can start to crack open the variance of any portfolio that contains these assets.
The portfolio variance can be represented as:
$$\sigma^2 = \omega BVB^{\top}\omega^{\top} + \omega D\omega^{\top}$$
where:
$$\begin{eqnarray} B &=& \text{the matrix of factor exposures of $n$ assets to the factors} \\ V &=& \text{the covariance matrix of factors} \\ D &=& \text{the diagonal matrix of specific variances} \\ \omega &=& \text{the vector of portfolio weights for $n$ assets}\\ \omega BVB^{\top}\omega^{\top} &=& \text{common factor variance} \\ \omega D\omega^{\top} &=& \text{specific variance} \\ \end{eqnarray}$$

Here we build functions to break out the risk in our portfolio. Suppose that our portfolio consists of all stocks in the TradableStocksUS universe, equally-weighted. Let's have a look at how much of the variance of the returns in this universe is due to common factor risk.
w = np.ones([1, R.shape[1]]) / R.shape[1]

def compute_common_factor_variance(factors, factor_exposures, w):
    B = np.asarray(factor_exposures)
    V = np.asarray(factors.cov())
    # common factor variance: w B V B' w'
    return w.dot(B.dot(V).dot(B.T)).dot(w.T)
common_factor_variance = compute_common_factor_variance(F, B, w)[0][0]
print("Common Factor Variance: {0}".format(common_factor_variance))
Common Factor Variance: 0.0001356727668756826
def compute_specific_variance(epsilon, w):
    # diagonal matrix of each asset's residual variance
    D = np.diag(np.asarray(epsilon.var())) * epsilon.shape[0] / (epsilon.shape[0] - 1)
    return w.dot(D).dot(w.T)
specific_variance = compute_specific_variance(epsilon, w)[0][0]
print("Specific Variance: {0}".format(specific_variance))
Specific Variance: 2.764834762212209e-07
In order to actually calculate the percentage of our portfolio variance that is made up of common factor risk, we do the following:
$$\frac{\text{common factor variance}}{\text{common factor variance} + \text{specific variance}}$$

common_factor_pct = common_factor_variance / (common_factor_variance + specific_variance) * 100.0
print("Percentage of Portfolio Variance Due to Common Factor Risk: {0:.2f}%".format(common_factor_pct))
Percentage of Portfolio Variance Due to Common Factor Risk: 99.80%
So we see that if we just take every single security in the TradableStocksUS universe and equally-weight them, we will end up holding a portfolio that effectively only contains common risk.
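As a quick aside (not in the original lecture), we can translate these daily variance figures into a more intuitive annualized volatility, assuming roughly 252 trading days per year.

total_variance = common_factor_variance + specific_variance
# annualize the daily variance, then convert to volatility
annualized_vol = np.sqrt(total_variance * 252)
print("Annualized portfolio volatility: {:.2%}".format(annualized_vol))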
Currently we are operating with an equal-weighted portfolio. However, we can reapportion those weights in such a way that we minimize the common factor risk illustrated by our common factor exposures. This is a portfolio optimization problem to find the optimal weights.
We define this problem as:
$$\begin{array}{ll} \mbox{minimize/maximize}_{\omega} & \text{objective function}\\ \mbox{subject to} & {\bf 1}^T \omega = 1, \quad f = B^T\omega\\ & \omega \in {\cal W}, \quad f \in {\cal F}, \end{array}$$

where the variable $\omega$ is the vector of allocations, the variable $f$ is the vector of weighted factor exposures, and the set ${\cal F}$ provides our constraints for $f$. We set ${\cal F}$ as a vector of bounds on the weighted factor exposures of the portfolio. These constraints allow us to reject weightings that do not fit our criteria. For example, we can set the maximum factor exposures that our portfolios can have by changing the value of ${\cal F}$. A value of $[1,1,1]$ would indicate that we want the maximum absolute factor exposure of the portfolio to each factor to be less than $1$, rejecting any portfolios that do not meet that condition.
We define the objective function as whichever business goal we value highest. This can be something such as maximizing the Sharpe ratio or minimizing the volatility. Ultimately, what we want to solve for in this optimization problem is the weights, $\omega$.
Let's quickly generate some random weights to see how the weighted factor exposures of the portfolio change.
w_0 = np.random.rand(R.shape[1])
w_0 = w_0/np.sum(w_0)
The variable $f$ contains the weighted factor exposures of our portfolio, with size equal to the number of factors we have. As we change $\omega$, our weights, our weighted exposures, $f$, also change.
f = B.T.dot(w_0)
f
MKT_beta    0.788674
SMB_beta    0.455320
HML_beta   -0.173078
dtype: float64
A concrete example of this kind of problem can be found in the docs for CVXPY.
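The lecture leaves the solver to CVXPY, but as a rough sketch (not part of the original lecture), the formulation above might look like the following, reusing the F, B, and epsilon DataFrames computed earlier. The long-only constraint and the exposure bound of $[1,1,1]$ are illustrative choices, and cp.psd_wrap merely guards against numerical warnings when the sample factor covariance is borderline positive semidefinite.

import cvxpy as cp

n, k = B.shape                        # number of assets, number of factors

V = np.asarray(F.cov())               # factor covariance matrix
spec_var = np.asarray(epsilon.var())  # specific (residual) variance per asset

w_opt = cp.Variable(n)                # portfolio weights: what we solve for
f_exp = cp.Variable(k)                # weighted factor exposures f = B^T w

# total variance = common factor variance + specific variance
risk = cp.quad_form(f_exp, cp.psd_wrap(V)) + spec_var @ cp.square(w_opt)

prob = cp.Problem(
    cp.Minimize(risk),                     # objective: minimize portfolio variance
    [f_exp == np.asarray(B).T @ w_opt,     # f = B^T w
     cp.sum(w_opt) == 1,                   # fully invested
     w_opt >= 0,                           # long-only (illustrative choice)
     cp.abs(f_exp) <= np.ones(k)]          # bound weighted factor exposures by [1,1,1]
)
prob.solve()
print(prob.status)
print(pd.Series(f_exp.value, index=B.columns))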
This presentation is for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation for any security; nor does it constitute an offer to provide investment advisory or other services by QuantRocket LLC ("QuantRocket"). Nothing contained herein constitutes investment advice or offers any opinion with respect to the suitability of any security, and any views expressed herein should not be taken as advice to buy, sell, or hold any security or as an endorsement of any security or company. In preparing the information contained herein, the authors have not taken into account the investment needs, objectives, and financial circumstances of any particular investor. Any views expressed and data illustrated herein were prepared based upon information believed to be reliable at the time of publication. QuantRocket makes no guarantees as to their accuracy or completeness. All information is subject to change and may quickly become unreliable for various reasons, including changes in market conditions or economic circumstances.