Neural Networks for Hedging I
The purpose of this page is track my progress implementing the 2018 (published in 2019) paper by Beuhler et al., “Deep Hedging”. In that paper they use a semi-recurrent deep neural network to calculate the appropriate hedging positions when hedging vanilla options.
They begin in a simulation setting using the Heston model and their network is implemented using TensorFlow. Owing to the current state of machine learning frameworks, as well as to the fact that one of my colleagues already has TensorFlow experience, I have decided to use PyTorch.
A First Network
Beuhler et al. use a neural network with two hidden layers to compute the necessary position (delta) in the underlying for that trading day. The network is semi-recurrent because the delta for day $i$ is used as an input for the new network at day $i+1$. It is also very deep as you essentially have two hidden layers per trading day.
To start, I construct the simplest neural network based on their hyperparameters. They use $d+15$ nodes per hidden layer, where $d$ is the number of inputs (the underlying assets to be traded). Using PyTorch, the network is defined as follows:
# Imports and Seeds import numpy as np import scipy.stats as stats import matplotlib.pyplot as plt import torch import torch.nn as nn import torch.optim as optim np.random.seed(0) torch.manual_seed(0) # Construct Neural Net class Net(nn.Module): def __init__(self): super(Net, self).__init__() self.lin1 = nn.Linear(1, 16) self.lin2 = nn.Linear(16, 1) self.sigmoid1 = nn.Sigmoid(); def forward(self, S0): out = self.lin1(S0) out = self.lin2(out) out = self.sigmoid1(out) return out net = Net()
Note that I'm using a sigmoid activation function on the second hidden layer, whereas Beuler et al. use ReLU. I'll get back to this. This simple network can be represented graphically in its entirety:
The input (single node on the left) will be the normalized price of the underlying asset, whereas the output (single node on the right) would be optimal trading position in that asset.
The mathematical operations being performed by each layer can be easily visualized by saving the Pytorch model and then importing it into Netron. (Note: In theory, TensorBoard is really the way to do this, but I was having compatibility issues.)
Learning The Black-Scholes Delta
To check that the network is implemented correctly, I test whether it can approximate a known non-linear function. For this, I generate a random sample of initial stock price values and use as a target the corresponding Black-Scholes deltas for a call option struck at $100$, one week from maturity.
# Functions def d1(S0, K, T, r, sigma): return (np.log(S0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * np.sqrt(T)) def d2(S0, K, T, r, sigma): return d1(S0, K, T, r, sigma) - sigma * np.sqrt(T) def price_put_BS(S0, K, T, r, sigma): return (stats.norm.cdf(-d2(S0, K, T, r, sigma)) * K * np.exp(-r * T) - stats.norm.cdf(-d1(S0, K, T, r, sigma)) * S0) def price_call_BS(S0, K, T, r, sigma): return (stats.norm.cdf(d1(S0, K, T, r, sigma)) * S0 - stats.norm.cdf(d2(S0, K, T, r, sigma)) * K * np.exp(-r * T)) def delta_put_BS(S0, K, T, r, sigma): return -stats.norm.cdf(-d1(S0, K, T, r, sigma)) def delta_call_BS(S0, K, T, r, sigma): return stats.norm.cdf(d1(S0, K, T, r, sigma)); # Parameters filename = 'bs_delta_1' S0 = 100 K = 100 T = 1/50 r = 0.05 sigma = 0.2 num_samples = 1000; num_epochs = 15; batch_size = 4; S0_lower_bound = 90; S0_upper_bound = 110; uniform_samples = np.random.rand(num_samples, 1) S0_values = (S0_upper_bound - S0_lower_bound) * uniform_samples + S0_lower_bound delta_values = delta_call_BS(S0_values, K, T, r, sigma)
I setup up easy-to-use iterables using
As a loss function, I use the standard mean-squared error and for an optimizer I use stochastic gradient descent.
# Create Data Loaders training_set = torch.utils.data.TensorDataset(torch.Tensor(uniform_samples), torch.Tensor(delta_values)) training_loader = torch.utils.data.DataLoader(training_set, batch_size=batch_size, shuffle=True) # Define Loss Function and Optimizer criterion = nn.MSELoss() optimizer = optim.SGD(net.parameters(), lr=0.1)
With all of this in place, we can finally train the network to approximate the analytical delta.
for epoch in range(num_epochs): running_loss = 0.0 for i, data in enumerate(training_loader, 0): inputs, targets = data; # Zero the parameter gradients optimizer.zero_grad() # Forward + Backward + Optimize outputs = net(inputs) loss = criterion(outputs, targets) loss.backward() optimizer.step() # Print statistics running_loss += loss.item() print('[%d] loss: %.6f' % (epoch + 1, running_loss))
As can be seen below, our simple network manages to approximate the analytical Black-Scholes delta quite quickly.
Learning A One-step Hedge
The next step is to attempt to learn the best hedge position without any knowledge of the analytical delta, but rather by trying to minimize the profit and loss. The simplest case is a one-period model.
At $t_0$ we sell a call option for $C_0$ and buy $\delta_0$ units of the underlying stock, $S_0$. Then at $T_1$ we have to pay out the payoff of the option (if positive) and close out our position. Thus, the function we want to minimize looks like
$$ \delta_0(S_1 - S_0) + C_0 - (S_1 - K)^+ $$
This will be close to the Black-Scholes delta, with a difference accounting for the discrete-time nature of the hedge. We need to price the call options at $t_0$ as well as simulate realizations for the underlying asset at $t_1$. I also construct new data loaders.
uniform_samples = np.random.rand(num_samples, 1) normal_samples = np.random.randn(num_samples, 1) S0_values = (S0_upper_bound - S0_lower_bound) * uniform_samples + S0_lower_bound S1_values = S0_values * np.exp((r - 0.5 * sigma **2) * T + sigma * np.sqrt(T) * normal_samples) call_values = price_call_BS(S0_values, K, T, r, sigma) training_set = torch.utils.data.TensorDataset(torch.Tensor(uniform_samples), torch.Tensor(S0_values), torch.Tensor(S1_values), torch.Tensor(call_values)) training_loader = torch.utils.data.DataLoader(training_set, batch_size=batch_size, shuffle=True)
There's no need to define a custom loss function, as you can cast the problem in terms of MSE. Then we can train the network.
for epoch in range(num_epochs): running_loss = 0.0 for i, data in enumerate(training_loader, 0): inputs, S0, S1, C0 = data; # Zero the parameter gradients optimizer.zero_grad() # Forward + Backward + Optimize outputs = net(inputs) loss = criterion(outputs * (S1 - S0) + C0, torch.max(S1 - K, torch.zeros(batch_size, 1))) loss.backward() optimizer.step() # Print statistics running_loss += loss.item() print('[%d] loss: %.6f' % (epoch + 1, running_loss))
The optimization here is not as smooth as for the previous case, which is to be expected considering the additional randomness (the simulations of $S_1$) and non-linearity (we're further removed from the target function). The convergence is illustrated below.
- Implement a two-period hedge. This requires constructing a semi-recurrent network, for which I'll need additional PyTorch API knowledge.
- Change the underlying model to Heston. For this, we'll need two-dimensional inputs, as we'll require a derivative trading instrument to hedge the volatility.
The simple neural network was visualized using
Both have browser-based implementations available. I edit the resulting SVGs using Inkscape.