LQR Optimal Execution¶
This example starts from a simple liquidation problem: a trader has a position to sell and wants to work out of it over a fixed horizon without waiting too long or trading too aggressively.
In a quadratic execution model, those two pressures become the usual LQR tradeoff:
- penalize remaining inventory because holding it is risky
- penalize large trades because fast execution is expensive
The result is a one-state discrete control problem with a very clean state-space interpretation.
Runnable script: examples/lqr_optimal_execution.py
Problem Setup¶
Let x_k denote remaining inventory at step k, normalized so x_0 = 1
means the full order is still unsold. Let u_k denote the signed inventory
change over one step. The dynamics are

x_{k+1} = x_k + u_k

If the controller sells, inventory goes down, so u_k < 0. The finance-facing
sell quantity is therefore

s_k = -u_k = x_k - x_{k+1}
The design objective is the standard infinite-horizon LQR cost

J = sum_{k=0}^inf ( x_k^T Q x_k + u_k^T R u_k )

where Q represents inventory risk and R represents trading cost. Increasing
Q pushes the controller to liquidate faster; increasing R makes it trade
more slowly.
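Because the model is scalar with A = B = 1, the discrete algebraic Riccati equation collapses to a quadratic and has a closed-form solution. This is the standard LQR derivation, not anything Contrax-specific:

```latex
\begin{aligned}
P &= Q + A P A - \frac{(A P B)^2}{R + B P B}
   \;=\; Q + P - \frac{P^2}{R + P} \\
0 &= P^2 - Q P - Q R
\quad\Longrightarrow\quad
P = \frac{Q + \sqrt{Q^2 + 4 Q R}}{2},
\qquad
K = \frac{P}{R + P}.
\end{aligned}
```

The limits match the intuition above: as Q grows, K approaches 1 (sell everything at once); as R grows, K approaches 0 (trade very slowly).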
Build The Execution Model¶
import numpy as np
import jax
import jax.numpy as jnp

jax.config.update("jax_enable_x64", True)

import contrax as cx
DT = 1.0
HORIZON = 20
X0 = jnp.array([1.0])
def build_execution_system(dt: float = DT) -> cx.DiscLTI:
"""Inventory dynamics with signed inventory-change control.
State:
x_k = remaining inventory, normalized so x_0 = 1 means 100%.
Control:
u_k = signed inventory change. Selling corresponds to u_k < 0.
Dynamics:
x_{k+1} = x_k + u_k
"""
A = jnp.array([[1.0]])
B = jnp.array([[1.0]])
C = jnp.array([[1.0]])
D = jnp.zeros((1, 1))
return cx.dss(A, B, C, D, dt=dt)
SYS = build_execution_system()
This is a tiny model, but it already shows the useful part of the Contrax API:
the execution problem is just a DiscLTI system plus an LQR solve.
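Because the system is scalar, you can cross-check the gain an LQR solve should return against the closed-form DARE root. The sketch below is plain NumPy and independent of Contrax; the weights Q = 2.5, R = 0.4 are the baseline values used later in the example:

```python
import numpy as np

def scalar_dare_gain(q: float, r: float) -> float:
    """Closed-form LQR gain for x_{k+1} = x_k + u_k (A = B = 1).

    The DARE reduces to P**2 - q*P - q*r = 0; the positive root is the
    stabilizing solution, and the gain is K = P / (r + P).
    """
    p = 0.5 * (q + np.sqrt(q * q + 4.0 * q * r))
    return p / (r + p)

# Baseline weights used later in the example.
print(scalar_dare_gain(2.5, 0.4))  # ≈ 0.8770
```

This matches the baseline gain printed by the script, which is a useful sanity check whenever a model is small enough to solve by hand.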
Solve The Baseline Schedule¶
def execution_schedule(
inventory_risk: jax.Array,
trading_cost: jax.Array,
*,
x0: jax.Array = X0,
horizon: int = HORIZON,
):
"""Solve the execution problem and return the resulting liquidation path."""
Q = jnp.array([[inventory_risk]])
R = jnp.array([[trading_cost]])
result = cx.lqr(SYS, Q, R)
def controller(t, x):
return -result.K @ x
ts, xs, _ = cx.simulate(SYS, x0, controller, num_steps=horizon)
inventory = xs[:, 0]
# With x[k+1] = x[k] + u[k], a sell quantity is -u[k] = x[k] - x[k+1].
sell_quantity = inventory[:-1] - inventory[1:]
return result, ts, inventory, sell_quantity
For the baseline choice Q = 2.5 and R = 0.4, the controller is strongly
inventory-averse, so it sells most of the position immediately and then cleans
up the remainder very quickly.
The resulting inventory plot is the center of the example. Inventory is the state, liquidation is the control effect, and the design question is the familiar balance between state penalty and control penalty.
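With a constant gain the closed loop is x_{k+1} = (1 - K) x_k, so the schedule is a pure geometric decay. A minimal NumPy sketch reproduces the path; the value of K is the baseline gain the script prints, hard-coded here for illustration:

```python
import numpy as np

K = 0.87695257  # baseline gain for Q = 2.5, R = 0.4 (printed by the script)
HORIZON = 20

# x_{k+1} = (1 - K) x_k  =>  x_k = (1 - K)**k with x_0 = 1.
inventory = (1.0 - K) ** np.arange(HORIZON + 1)
sells = inventory[:-1] - inventory[1:]  # sell_k = K * (1 - K)**k

print(sells[:3])  # front-loaded: each trade is ~12% of the previous one
```

The geometric form makes the "sells most of the position immediately" behavior exact: the first trade is K, and every subsequent trade shrinks by the factor 1 - K.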
Tune The Execution Urgency With Gradients¶
The same script then places the Riccati solve inside a JAX objective:
def target_inventory_curve(horizon: int = HORIZON) -> jax.Array:
"""Reference curve: liquidate most of the position over the horizon."""
steps = jnp.arange(horizon + 1, dtype=jnp.float64)
return jnp.exp(-0.22 * steps)
def execution_tracking_loss(log_inventory_risk, log_trading_cost):
"""Tune LQR weights so the inventory path matches a desired urgency."""
inventory_risk = jnp.exp(log_inventory_risk)
trading_cost = jnp.exp(log_trading_cost)
_, _, inventory, sell_quantity = execution_schedule(
inventory_risk,
trading_cost,
)
target = target_inventory_curve()
inventory_error = jnp.mean((inventory - target) ** 2)
turnover_penalty = 1e-2 * jnp.mean(sell_quantity**2)
terminal_penalty = 10.0 * inventory[-1] ** 2
return inventory_error + turnover_penalty + terminal_penalty
def tune_execution_weights(num_steps: int = 50, learning_rate: float = 0.15):
params = (jnp.array(-2.0), jnp.array(-2.0))
objective_and_grad = jax.jit(
jax.value_and_grad(execution_tracking_loss, argnums=(0, 1))
)
initial_loss, _ = objective_and_grad(*params)
history = [float(initial_loss)]
for _ in range(num_steps):
loss, grads = objective_and_grad(*params)
dq, dr = grads
params = (
params[0] - learning_rate * dq,
params[1] - learning_rate * dr,
)
history.append(float(loss))
final_loss = float(execution_tracking_loss(*params))
return {
"initial_loss": history[0],
"final_loss": final_loss,
"inventory_risk": float(jnp.exp(params[0])),
"trading_cost": float(jnp.exp(params[1])),
"loss_history": np.asarray(history),
}
Here the goal is to tune Q and R so the resulting inventory path tracks a
chosen urgency curve while still keeping turnover and terminal inventory under
control.
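The same loop can be sketched without Contrax by exploiting the scalar closed form: the gain, the simulated path, and the tracking loss are all ordinary JAX operations, so jax.value_and_grad differentiates straight through the Riccati solution. This is an illustrative analogue of the script's tuning loop, not its actual code:

```python
import jax
import jax.numpy as jnp

jax.config.update("jax_enable_x64", True)

def gain(q, r):
    # Closed-form scalar DARE: P**2 - q*P - q*r = 0, K = P / (r + P).
    p = 0.5 * (q + jnp.sqrt(q * q + 4.0 * q * r))
    return p / (r + p)

def tracking_loss(log_q, log_r, horizon=20):
    k = gain(jnp.exp(log_q), jnp.exp(log_r))
    steps = jnp.arange(horizon + 1, dtype=jnp.float64)
    inventory = (1.0 - k) ** steps   # geometric closed-loop path
    target = jnp.exp(-0.22 * steps)  # desired urgency curve
    return jnp.mean((inventory - target) ** 2)

step = jax.jit(jax.value_and_grad(tracking_loss, argnums=(0, 1)))
params = (jnp.array(-2.0), jnp.array(-2.0))
initial, _ = step(*params)
for _ in range(50):
    _, (gq, gr) = step(*params)
    params = (params[0] - 0.15 * gq, params[1] - 0.15 * gr)
print(float(initial), float(tracking_loss(*params)))  # loss drops
```

Optimizing in log-space keeps both weights positive without constraints, which is the same trick the script uses.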
That gives the workflow: solve the Riccati equation, simulate the closed loop, score the resulting path, and differentiate the whole pipeline with respect to the log-weights.
This is the part that feels especially native to Contrax: the controller design step is not a separate offline calculation. It lives inside the same differentiable JAX program as the rest of the objective.
Batch The Same Design Across Many Assets¶
Once the execution problem is written as an ordinary fixed-shape control
workflow, batching becomes just another vmap:
@jax.jit
def batched_first_trade(inventory_risks, trading_costs):
def solve_one(q, r):
_, _, _, sells = execution_schedule(q, r, horizon=HORIZON)
return sells[0]
return jax.vmap(solve_one)(inventory_risks, trading_costs)
That is a natural extension of the same story. Instead of solving one execution schedule, solve many independent schedules with different risk and impact weights in one compiled pass.
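A self-contained version of the same idea, using the scalar closed-form gain in place of the Contrax solve. The per-asset weight grid here is an assumption chosen for illustration (same trading cost, rising urgency):

```python
import jax
import jax.numpy as jnp

jax.config.update("jax_enable_x64", True)

def gain(q, r):
    # Closed-form scalar DARE gain for x_{k+1} = x_k + u_k.
    p = 0.5 * (q + jnp.sqrt(q * q + 4.0 * q * r))
    return p / (r + p)

def first_sell(q, r):
    return gain(q, r) * 1.0  # first trade is K * x_0 with x_0 = 1

# Hypothetical per-asset weights: same trading cost, rising inventory risk.
inventory_risks = jnp.array([0.5, 2.5, 10.0])
trading_costs = jnp.array([0.4, 0.4, 0.4])

batched = jax.jit(jax.vmap(first_sell))(inventory_risks, trading_costs)
print(batched)  # more urgent assets sell more up front
```

Each asset gets its own Riccati solution, but the whole batch is one compiled call, which is the point of writing the workflow with fixed shapes.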
What The Script Prints¶
Running examples/lqr_optimal_execution.py prints a compact summary of the
baseline controller and the tuned design:
LQR optimal execution
baseline gain = [[0.87695257]]
initial tuning loss = 0.052402
final tuning loss = 0.046743
tuned inventory risk = 0.116998
tuned trading cost = 0.156547
first sell quantities = [8.76952567e-01 1.07906744e-01 1.32776339e-02 ...]
batched first sells = [0.65586909 0.87695257 0.96291202]
The useful checks are simple:
- the baseline controller produces a monotone liquidation path
- the tuning loop lowers its objective
- the batched version returns different schedules for different weight choices
Read This Example For What It Is¶
This page is intentionally about the control mapping, not about full execution microstructure realism. The model is linear, quadratic, single-asset, and deliberately small.
That is also what makes it useful here. You can see the state, the control, the cost, and the feedback law immediately, and then see how Contrax extends that classical setup with differentiation and batching.