LQR Optimal Execution¶
This example starts from a simple liquidation problem: a trader has a position to sell and wants to work out of it over a fixed horizon without waiting too long or trading too aggressively.
In a quadratic execution model, those two pressures become the usual LQR tradeoff:
- penalize remaining inventory because holding it is risky
- penalize large trades because fast execution is expensive
The result is a one-state discrete control problem with a very clean state-space interpretation.
Runnable script: examples/lqr_optimal_execution.py
Problem Setup¶
Let x_k denote remaining inventory at step k, normalized so x_0 = 1
means the full order is still unsold. Let u_k denote the signed inventory
change over one step. The dynamics are

x_{k+1} = x_k + u_k

If the controller sells, inventory goes down, so u_k < 0. The finance-facing
sell quantity is therefore

s_k = -u_k = x_k - x_{k+1}
The design objective is the standard infinite-horizon LQR cost

J = sum_{k=0}^inf ( x_k^T Q x_k + u_k^T R u_k )

where Q represents inventory risk and R represents trading cost. Increasing
Q pushes the controller to liquidate faster; increasing R makes it trade
more slowly.
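Because the model is scalar with A = B = 1, the discrete algebraic Riccati equation collapses to a quadratic and has a closed-form solution. This is the standard LQR derivation, not anything Contrax-specific:

```latex
\begin{aligned}
P &= Q + A P A - \frac{(A P B)^2}{R + B P B}
   \;=\; Q + P - \frac{P^2}{R + P} \\
0 &= P^2 - Q P - Q R
\quad\Longrightarrow\quad
P = \frac{Q + \sqrt{Q^2 + 4 Q R}}{2},
\qquad
K = \frac{P}{R + P}.
\end{aligned}
```

The limits match the intuition above: as Q grows, K approaches 1 (sell everything at once); as R grows, K approaches 0 (trade very slowly).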
Build The Execution Model¶
import numpy as np
import jax
import jax.numpy as jnp

jax.config.update("jax_enable_x64", True)

import contrax as cx
DT = 1.0
HORIZON = 20
X0 = jnp.array([1.0])
def build_execution_system(dt: float = DT) -> cx.DiscLTI:
"""Inventory dynamics with signed inventory-change control.
State:
x_k = remaining inventory, normalized so x_0 = 1 means 100%.
Control:
u_k = signed inventory change. Selling corresponds to u_k < 0.
Dynamics:
x_{k+1} = x_k + u_k
"""
A = jnp.array([[1.0]])
B = jnp.array([[1.0]])
C = jnp.array([[1.0]])
D = jnp.zeros((1, 1))
return cx.dss(A, B, C, D, dt=dt)
SYS = build_execution_system()
This is a tiny model, but it already shows the useful part of the Contrax API:
the execution problem is just a DiscLTI system plus an LQR solve.
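Because the system is scalar, you can cross-check the gain an LQR solve should return against the closed-form DARE root. The sketch below is plain NumPy and independent of Contrax; the weights Q = 2.5, R = 0.4 are the baseline values used later in the example:

```python
import numpy as np

def scalar_dare_gain(q: float, r: float) -> float:
    """Closed-form LQR gain for x_{k+1} = x_k + u_k (A = B = 1).

    The DARE reduces to P**2 - q*P - q*r = 0; the positive root is the
    stabilizing solution, and the gain is K = P / (r + P).
    """
    p = 0.5 * (q + np.sqrt(q * q + 4.0 * q * r))
    return p / (r + p)

# Baseline weights used later in the example.
print(scalar_dare_gain(2.5, 0.4))  # ≈ 0.8770
```

This matches the baseline gain printed by the script, which is a useful sanity check whenever a model is small enough to solve by hand.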
Solve The Baseline Schedule¶
def execution_schedule(
inventory_risk: jax.Array,
trading_cost: jax.Array,
*,
x0: jax.Array = X0,
horizon: int = HORIZON,
):
"""Solve the execution problem and return the resulting liquidation path."""
Q = jnp.array([[inventory_risk]])
R = jnp.array([[trading_cost]])
result = cx.lqr(SYS, Q, R)
def controller(t, x):
return -result.K @ x
ts, xs, _ = cx.simulate(SYS, x0, controller, num_steps=horizon)
inventory = xs[:, 0]
# With x[k+1] = x[k] + u[k], a sell quantity is -u[k] = x[k] - x[k+1].
sell_quantity = inventory[:-1] - inventory[1:]
return result, ts, inventory, sell_quantity
For the baseline choice Q = 2.5 and R = 0.4, the controller is strongly
inventory-averse, so it sells most of the position immediately and then cleans
up the remainder very quickly.
The resulting inventory plot is the center of the example. Inventory is the state, liquidation is the control effect, and the design question is the familiar balance between state penalty and control penalty.
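With a constant gain the closed loop is x_{k+1} = (1 - K) x_k, so the schedule is a pure geometric decay. A minimal NumPy sketch reproduces the path; the value of K is the baseline gain the script prints, hard-coded here for illustration:

```python
import numpy as np

K = 0.87695257  # baseline gain for Q = 2.5, R = 0.4 (printed by the script)
HORIZON = 20

# x_{k+1} = (1 - K) x_k  =>  x_k = (1 - K)**k with x_0 = 1.
inventory = (1.0 - K) ** np.arange(HORIZON + 1)
sells = inventory[:-1] - inventory[1:]  # sell_k = K * (1 - K)**k

print(sells[:3])  # front-loaded: each trade is ~12% of the previous one
```

The geometric form makes the "sells most of the position immediately" behavior exact: the first trade is K, and every subsequent trade shrinks by the factor 1 - K.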
Tune The Execution Urgency With Gradients¶
The same script then places the Riccati solve inside a JAX objective:
def target_inventory_curve(horizon: int = HORIZON) -> jax.Array:
"""Reference curve: liquidate most of the position over the horizon."""
steps = jnp.arange(horizon + 1, dtype=jnp.float64)
return jnp.exp(-0.22 * steps)
def execution_tracking_loss(log_inventory_risk, log_trading_cost):
"""Tune LQR weights so the inventory path matches a desired urgency."""
inventory_risk = jnp.exp(log_inventory_risk)
trading_cost = jnp.exp(log_trading_cost)
_, _, inventory, sell_quantity = execution_schedule(
inventory_risk,
trading_cost,
)
target = target_inventory_curve()
inventory_error = jnp.mean((inventory - target) ** 2)
turnover_penalty = 1e-2 * jnp.mean(sell_quantity**2)
terminal_penalty = 10.0 * inventory[-1] ** 2
return inventory_error + turnover_penalty + terminal_penalty
def tune_execution_weights(num_steps: int = 50, learning_rate: float = 0.15):
params = (jnp.array(-2.0), jnp.array(-2.0))
objective_and_grad = jax.jit(
jax.value_and_grad(execution_tracking_loss, argnums=(0, 1))
)
initial_loss, _ = objective_and_grad(*params)
history = [float(initial_loss)]
for _ in range(num_steps):
loss, grads = objective_and_grad(*params)
dq, dr = grads
params = (
params[0] - learning_rate * dq,
params[1] - learning_rate * dr,
)
history.append(float(loss))
final_loss = float(execution_tracking_loss(*params))
return {
"initial_loss": history[0],
"final_loss": final_loss,
"inventory_risk": float(jnp.exp(params[0])),
"trading_cost": float(jnp.exp(params[1])),
"loss_history": np.asarray(history),
}
Here the goal is to tune Q and R so the resulting inventory path tracks a
chosen urgency curve while still keeping turnover and terminal inventory under
control.
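The same loop can be sketched without Contrax by exploiting the scalar closed form: the gain, the simulated path, and the tracking loss are all ordinary JAX operations, so jax.value_and_grad differentiates straight through the Riccati solution. This is an illustrative analogue of the script's tuning loop, not its actual code:

```python
import jax
import jax.numpy as jnp

jax.config.update("jax_enable_x64", True)

def gain(q, r):
    # Closed-form scalar DARE: P**2 - q*P - q*r = 0, K = P / (r + P).
    p = 0.5 * (q + jnp.sqrt(q * q + 4.0 * q * r))
    return p / (r + p)

def tracking_loss(log_q, log_r, horizon=20):
    k = gain(jnp.exp(log_q), jnp.exp(log_r))
    steps = jnp.arange(horizon + 1, dtype=jnp.float64)
    inventory = (1.0 - k) ** steps   # geometric closed-loop path
    target = jnp.exp(-0.22 * steps)  # desired urgency curve
    return jnp.mean((inventory - target) ** 2)

step = jax.jit(jax.value_and_grad(tracking_loss, argnums=(0, 1)))
params = (jnp.array(-2.0), jnp.array(-2.0))
initial, _ = step(*params)
for _ in range(50):
    _, (gq, gr) = step(*params)
    params = (params[0] - 0.15 * gq, params[1] - 0.15 * gr)
print(float(initial), float(tracking_loss(*params)))  # loss drops
```

Optimizing in log-space keeps both weights positive without constraints, which is the same trick the script uses.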
That gives the workflow: solve the Riccati equation, simulate the closed loop, score the resulting path, and differentiate the whole pipeline with respect to the log-weights.
This is the part that feels especially native to Contrax: the controller design step is not a separate offline calculation. It lives inside the same differentiable JAX program as the rest of the objective.
Batch The Same Design Across Many Assets¶
Once the execution problem is written as an ordinary fixed-shape control
workflow, batching becomes just another vmap:
@jax.jit
def batched_first_trade(inventory_risks, trading_costs):
def solve_one(q, r):
_, _, _, sells = execution_schedule(q, r, horizon=HORIZON)
return sells[0]
return jax.vmap(solve_one)(inventory_risks, trading_costs)
That is a natural extension of the same story. Instead of solving one execution schedule, solve many independent schedules with different risk and impact weights in one compiled pass.
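A self-contained version of the same idea, using the scalar closed-form gain in place of the Contrax solve. The per-asset weight grid here is an assumption chosen for illustration (same trading cost, rising urgency):

```python
import jax
import jax.numpy as jnp

jax.config.update("jax_enable_x64", True)

def gain(q, r):
    # Closed-form scalar DARE gain for x_{k+1} = x_k + u_k.
    p = 0.5 * (q + jnp.sqrt(q * q + 4.0 * q * r))
    return p / (r + p)

def first_sell(q, r):
    return gain(q, r) * 1.0  # first trade is K * x_0 with x_0 = 1

# Hypothetical per-asset weights: same trading cost, rising inventory risk.
inventory_risks = jnp.array([0.5, 2.5, 10.0])
trading_costs = jnp.array([0.4, 0.4, 0.4])

batched = jax.jit(jax.vmap(first_sell))(inventory_risks, trading_costs)
print(batched)  # more urgent assets sell more up front
```

Each asset gets its own Riccati solution, but the whole batch is one compiled call, which is the point of writing the workflow with fixed shapes.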
What The Script Prints¶
Running examples/lqr_optimal_execution.py prints a compact summary of the
baseline controller and the tuned design:
LQR optimal execution
baseline gain = [[0.87695257]]
initial tuning loss = 0.052402
final tuning loss = 0.046743
tuned inventory risk = 0.116998
tuned trading cost = 0.156547
first sell quantities = [8.76952567e-01 1.07906744e-01 1.32776339e-02 ...]
batched first sells = [0.65586909 0.87695257 0.96291202]
The useful checks are simple:
- the baseline controller produces a monotone liquidation path
- the tuning loop lowers its objective
- the batched version returns different schedules for different weight choices
Read This Example For What It Is¶
This page is intentionally about the control mapping, not about full execution microstructure realism. The model is linear, quadratic, single-asset, and deliberately small.
That is also what makes it useful here. You can see the state, the control, the cost, and the feedback law immediately, and then see how Contrax extends that classical setup with differentiation and batching.