How to optimize Julia SDEs?

11 minute read

Published:

Hi all!. In this post I’ll show how to optimize the simulation of a Stochastic Differential Equations (SDEs) System written in Julia Language. To that end I’ll DifferentialEquations.jl and StaticArrays.jl libraries. Firstly, stack and heap memory types will be defined. Then, the Longstaff-Schwartz SDEs system will be defined, and then translated into Julia code. It’ll be written in two flavours: the in-place (iip) and out-of-place (oop) versions will be shown. For the oop system the StaticArrays.jl package will be used in order to provide stack allocated arrays. Finally, the two implementations will be benchmarked in both heap allocations and elapsed time for adaptive and non-adaptive discretization schemes.

Stack and Heap memory allocations

image Stack vs Heap ordering representation. Image taken from this stack overflow post.

Stack and heap both live in RAM memory, but they have some notorious differences. The stack is ordered memory with simple allocation/deallocation patterns used for static allocation at compile time of constants, or even relatively small arrays. It is faster than the heap memory which is used for dynamic allocations that occur at runtime, allowing big arrays to be allocated when the compiling process of the code has been done. Unlike the stack, the heap allows for allocations in any moment at the cost of lower speed. For example, in Python the memory is heap allocated due to fact that the type of the variables is only known at runtime. Furthermore, in Julia one can chose, in some situations, whether the memory is stack or heap allocated. In order to optimize Julia code we must ensure the most allocations are done in the stack. That is what we are going to do with the Longstaff-Schwartz SDEs system.

Longstaff-Schwartz SDEs system

The Longstaff-Schwartz model is defined as a system of two mean reverting CIR-like SDEs. It can be written as follows:

image

image

where are the mean reversion speed, the mean reversion level and tbe volatility of each SDE.

In julia there are some tricks to speed up calculations in the context of ODEs or SDEs. One way is to define in-place (iip) functions which don’t return anything, but that act mutating the values of a previously defined vector (du in this case). Another option is to define out-of-place (oop) functions that return static vectors which are stored in stack memory.

Before defining the L-S system, we import the Julia libraries we’ll use:

using DifferentialEquations
using BenchmarkTools
using Parameters
using StaticArrays

The iip drift and diffusion funcions for the L-S system can be written as:

 function f!(du, u, p, t)
     @unpack κ₁, κ₂, θ₁, θ₂ = p

     du[1] = κ₁ * (θ₁ - u[1])
     du[2] = κ₂ * (θ₂ - u[2])
 end

 function g!(du, u, p, t)
     @unpack σ₁, σ₂ = p

     du[1] = σ₁ * sqrt(u[1])
     du[2] = σ₂ * sqrt(u[2])
 end

Moreover, by using StaticArrays.jl we can define static vectors which are stored in the stack and are very rapid in many common operations. In this scenario our drift and diffusion functions become:

function f(u, p, t)
     @unpack κ₁, κ₂, θ₁, θ₂ = p

     du1 = κ₁ * (θ₁ - u[1])
     du2 = κ₂ * (θ₂ - u[2])

     @SVector [du1, du2]
 end

 function g(u, p, t)
     @unpack σ₁, σ₂ = p

     du1 = σ₁ * sqrt(u[1])
     du2 = σ₂ * sqrt(u[2])

     @SVector [du1, du2]
 end 

Finally, we need to define the initial condition for each SDE, the simulation time and the parameters. Note that for the oop system (i.e the one defined using the static vectors) we need to pass the initial condition as a static vector. We accomplish that by using the @SVector macro.

u0 = [0.1, 0.1]
u01 = @SVector [0.1, 0.1]

tspan = (0.0, 1.0)
p = (κ₁ = 0.1, κ₂ = 0.1, θ₁ = 0.1, θ₂ = 0.1, σ₁ = 0.1, σ₂ = 0.1)

ls_iip = SDEProblem{true}(f!, g!, u0, tspan, p)
ls_oop = SDEProblem{false}(f, g, u01, tspan, p)

Benchmarking

I’ll benchmark both the iip and oop systems in different situations. At first I’ll compare the heap allocations and elapsed time by using an adaptive discretization algorithm. Then I’ll use the same algorithm but turning off the adaptive step feature. Finally, I’ll use the Euler-Maruyama scheme.

By executing:

@btime solve(ls_iip)
@btime solve(ls_oop)

we get 13.958 μs (148 allocations: 12.89 KiB) for the iip L-S system and 150.750 μs (3091 allocations: 97.52 KiB) for the oop system.

If we turn off the adaptive stepping:

@btime solve(ls_iip, dt=1/252, adaptive=false)
@btime solve(ls_oop, dt=1/252, adaptive=false)

we get 45.792 μs (361 allocations: 32.45 KiB) for the iip system and 20.250 μs (69 allocations: 11.36 KiB) for the oop system.

Finally, by using the Euler-Maruyama scheme:

@btime solve(ls_iip, EM(), dt = 1 / 252)
@btime solve(ls_oop, EM(), dt = 1 / 252)

we get 19.959 μs (315 allocations: 27.80 KiB) for the iip system and 9.916 μs (46 allocations: 9.08 KiB) for the oop system.

Conclusions

We defined heap and stack memory types and applied that concepts to the simulation of the Longstaff-Schwartz SDEs system. We defined both iip and oop drift and diffusion functions and benchmarked the simulation of the L-S system in different situations. If we use an adaptive stepping the iip system produced less heap allocations than the oop one. On the contrary, if we turn off the adaptive stepping the oop system showed to be more efficient both in terms of memory allocation and in elapsed time. This could be due to the adaptive step algorithm, which requires a brownian bridge.