# IMU Orientation Estimation with Error-State Kalman Filters

## Table of Contents

## Overview

An inertial measurement unit (IMU), which consists of an accelerometer and a gyroscope, is a
powerful sensor for navigation. The relative orientation (of the device) can be computed by
integrating angular velocity reading \(\boldsymbol{\omega}_k\) from the gyroscope. However, error
accumulates over time due to noise. This long-term drifting has to be corrected with some kinds of global
measurements of the device orientation. In the context of inertial-based navigation, we use the
estimated gravity direction from accelerometers to do this correction. The algorithm that
fuses the angular velocity readings and the global orientation measurements (e.g., from gravity
direction) to compute the device orientation is called the **3D altitude filter**.

In this post, I will introduce 3D altitude filters in the form of error-state Kalman filters, one of the most efficient altitude filter implementation. I found that the materials online are fragmented regarding the error-state Kalman filter. It is hard to figure out the motivation behind it and the difference between the error-state Kalman filter the normal extended Kalman filters. This document hopefully make it crystal clear.

I also note that there exist similarly effective but simpler approach, such as complementary filters, for IMU-base orientation estimation [1, 2]. I would probably prefer complementary filters in a real-world system due to their simplicity. However, the methodology regarding the rotation can generalize to any system that optimizes some kinds of rotation representation, e.g., VIO, SLAM, etc, so it is worthwhile to learn it.

## Introduction to Kalman Filters

### Linear Kalman Filters

Let \(\boldsymbol{x}_k \in \mathbb{R}^N\) be the internal state of the process that we are interested in (e.g., the orientation of the device) and
\(\boldsymbol{z}_k \in \mathbb{R}^M\) be our measurement of that internal state. We assume that the
state-transition model of \(\boldsymbol{x}_k\) and the measurement model for \(\boldsymbol{z}_k\) is **known** and linear:

\[ \begin{align} \boldsymbol{x}_k &= \boldsymbol{F}_k \boldsymbol{x}_{k-1} + \boldsymbol{n}_k \tag{1} \\ \boldsymbol{z}_k &= \boldsymbol{H}_k \boldsymbol{x}_k + \boldsymbol{m}_k \tag{2}, \end{align} \]

where \(\boldsymbol{F}_k \in \mathbb{R}^{N \times N}\) is the state-transition matrix, \(\boldsymbol{H} _k \in \mathbb{R}^{M \times N}\) is the observation matrix, the transition noise \(\boldsymbol{n}_k\) and the observation noise \(\boldsymbol{m}_k\)

\[ \begin{align} \boldsymbol{n}_k &\sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{Q}_k) \\ \boldsymbol{m}_k &\sim \mathcal{N}(\boldsymbol{0},\boldsymbol{R}_k) \end{align} \]

are drawn from a zero-mean multivariate normal distribution with covariance matrices \(\boldsymbol{Q}_k\) and \(\boldsymbol{R}_k\) respectively.

The linear Kalman filter provides the **optimal** analytical solution for the distribution of
$\boldsymbol{x}_k$. Assuming that the initial \(\boldsymbol{x}_0\) follows a known normal distribution:
\(\boldsymbol{x} _0 \sim \mathcal{N} (\hat{\boldsymbol{x}}^+_0, \boldsymbol{P}_0^+)\), the Kalman filters
conclude that the posterior distributions of \(\boldsymbol{x}_k\), either before or after the $t$th
measurement, are also normal distributions, i.e.,

\[ \begin{align} \boldsymbol{x}_k \mid \boldsymbol{z}_{1},\boldsymbol{z}_2,\dots,\boldsymbol{z}_{k-1} &\sim \mathcal{N}(\hat{\boldsymbol{x}}_k^-, \boldsymbol{P}_k^-) \\ \boldsymbol{x}_k \mid \boldsymbol{z}_{1},\boldsymbol{z}_2,\dots,\boldsymbol{z}_k &\sim\mathcal{N}(\hat{\boldsymbol{x}}_k^+, \boldsymbol{P}_k^+) \\ \end{align}. \]

The derivation of Kalman filters is long and full of math tricks. You can find it in any related textbook. The key results are the followings:

\[ \begin{align} \hat{\boldsymbol{x}}_k^- &\gets \boldsymbol{F}_k \hat{\boldsymbol{x}}_k^+ \tag{3} \\ \boldsymbol{P}_k^- &\gets \boldsymbol{F}_k \boldsymbol{P}_k^- \boldsymbol{F}_k^T + \boldsymbol{Q}_k \tag{4} \\ \boldsymbol{K}_k \,&\gets \boldsymbol{P}_k^- \boldsymbol{H}_k^T (\boldsymbol{H}_k \boldsymbol{P}_k^- \boldsymbol{H}_k^T + \boldsymbol{R}_k)^{-1} \tag{5} \\ \hat{\boldsymbol{x}}_k^+ &\gets (\boldsymbol{I} - \boldsymbol{K}_k \boldsymbol{H}_k) \hat{\boldsymbol{x}}_k^- + \boldsymbol{K}_k \boldsymbol{z}_k \tag{6} \\ \boldsymbol{P}_k^+ &\gets (\boldsymbol{I} - \boldsymbol{K}_k \boldsymbol{H}_k) \boldsymbol{P}_k^- \tag{7}. \end{align} \]

Here, Equations (3-4) are from the propagation of state-transition equation (1). Equation (5) computes what is so called the "Kalman gain" \(\boldsymbol{K}_k\) and it is heavily used in Equations (6-7). I understand Equation (6) as that the posterior mean \(\hat{\boldsymbol{x}}_k^+\) is a weighted average of prior mean \(\hat{\boldsymbol{x}}_k^-\) and the measurement $\boldsymbol{z}_k$, as the formula is similar to \(x'=(1-\lambda)x + \lambda z\). The "larger" the Kalman gain is, the smaller the uncertainty will be (Equation 7), and the more posterior mean \(\hat{\boldsymbol{x}}_k^+\) will be affected by the measurement \(\boldsymbol{z}_k\).

### Extended Kalman Filters

Extended Kalman filters (EKF) is a trivial extension of Kalman filters to non-linear state-transition and measurement functions:

\[ \begin{align} \boldsymbol{x}_k &= \boldsymbol{f}_k (\boldsymbol{x}_{k-1}) + \boldsymbol{n}_k \tag{8} \\ \boldsymbol{z}_k &= \boldsymbol{h}_k (\boldsymbol{x}_k) + \boldsymbol{m}_k. \tag{9} \end{align} \]

To solve the problem, you do first-order Tayler expansion at \(\hat{\boldsymbol{x}}_{k-1}^+\) and \(\hat{\boldsymbol{x}}^-_k\) to approximate the problem with a linear one so results for the Equation (3-7) can be reused:

\[ \begin{align} \boldsymbol{x}_k &= \underbrace{\boldsymbol{f}_k (\hat{\boldsymbol{x}}_{k-1}^+) - \boldsymbol{F}_k \hat{\boldsymbol{x}}_{k-1}^+}_{\text{constant}} + \boldsymbol{F}_k \boldsymbol{x}_{k-1} + \boldsymbol{n}_k \\ \boldsymbol{z}_k &= \underbrace{\boldsymbol{h}_k (\hat{\boldsymbol{x}}_k^-) - \boldsymbol{H}_k \hat{\boldsymbol{x}}_k^-}_{\text{constant}} + \boldsymbol{H}_k \boldsymbol{x}_k + \boldsymbol{m}_k, \end{align} \tag {10} \]

where

\[ \boldsymbol{F}_k = \left . \frac{\partial \boldsymbol f_k}{\partial \boldsymbol{x}_k} \right |_{\boldsymbol{x}_k=\hat{\boldsymbol{x}}_{k-1}^+}, \quad \boldsymbol{H}_k = \left . \frac{\partial \boldsymbol h_k}{\partial \boldsymbol{x}_k } \right |_{\boldsymbol{x}_k=\hat{\boldsymbol{x}}_k^-} \tag{11} \]

are the Jacobian matrices. The difference between approximated Equations (10) and Equations (1-2) is only the marked constant terms, so almost all the results from linear Kalman filters can be reused:

\[ \begin{align} \hat{\boldsymbol{x}}_k^- &\gets \color{red}\boldsymbol{f}_k(\hat{\boldsymbol{x}}_{k-1}^+) \tag{12} \\ \boldsymbol{P}_k^- &\gets \boldsymbol{F}_k \boldsymbol{P}_k^- \boldsymbol{F}_k^T + \boldsymbol{Q}_k \tag{13} \\ \boldsymbol{K}_k \,&\gets \boldsymbol{P}_k^- \boldsymbol{H}_k^T (\boldsymbol{H}_k \boldsymbol{P}_k^- \boldsymbol{H}_k^T + \boldsymbol{R}_k)^{-1}\tag{14} \\ \hat{\boldsymbol{x}}_k^+ &\gets \hat{\boldsymbol{x}}_k^- +{\color{red} \boldsymbol{K}_k \left(\boldsymbol{z}_k - \boldsymbol{h}_k (\hat{\boldsymbol{x}}_k^-)\right)} \tag{15} \\ \boldsymbol{P}_k^+ &\gets (\boldsymbol{I} - \boldsymbol{K}_k \boldsymbol{H}_k) \boldsymbol{P}_k^- . \tag{16} \end{align} \]

The difference from linear Kalman filters, i.e., Equation (3-7), is highlighted in red.

You may have noted that Extended Kalman filters are not theoretically optimal. For example, the estimation of the mean, \(\hat{\boldsymbol{x}}\) in Equation (12), is way off when \(\hat{x} _{k-1}=0\) and \(\boldsymbol{f}_k(x_{k-1}) = x_{k-1}^2\). In general, the optimal estimation of $\boldsymbol{x}$ may not be Gaussian for non-linear problems. However, Extended Kalman filters are still preferred to global optimization methods such as bundle adjustment when efficiency and simplicity take priority.

### Continuous Time Kalman Filters

Real-world physical systems are modeled continuously, so the state-transition model for most physical-based systems are also continuous but the measurements happen at discretized time points:

\[ \begin{align} \dot{\boldsymbol{x}}(t) &= \boldsymbol{f}(t, \boldsymbol{x}(t)) + \boldsymbol{n}(t) \tag{18} \\ \boldsymbol{z}_k &= \boldsymbol{h}_k(\boldsymbol{x}(t_k)) + \boldsymbol{m}_k, \tag{19} \end{align} \]

where $t$ is the current time, $t_k$ is a set of discretization points, and \(\dot{\boldsymbol x}:=\frac{\partial \boldsymbol x}{\partial t}\) is the Newton notation of time derivative. You can find its solution in your favorite textbook as

\[ \begin{align} \hat{\boldsymbol{x}}_k^- &\gets \color{red}\mathrm{ODESolve}\,[\dot{\boldsymbol{x}}= \boldsymbol{f}; \boldsymbol{x}(t_{k-1})=\hat{\boldsymbol{x}}_{k-1}^+ ](t_k) \tag{20} \\ \boldsymbol{P}_k^- &\gets \color{red} \mathrm{ODESolve}\,[\dot{\boldsymbol{P}} = \boldsymbol{F} \boldsymbol{P} + \boldsymbol{P}\boldsymbol{F}^T + \boldsymbol{Q}; \boldsymbol{P}(t_{k-1})={\boldsymbol{P}}_{k-1}^+] \tag{21}(t_k) \\ \boldsymbol{K}_k \,&\gets \boldsymbol{P}_k^- \boldsymbol{H}_k^T (\boldsymbol{H}_k \boldsymbol{P}_k^- \boldsymbol{H}_k^T + \boldsymbol{R}_k)^{-1}\tag{22} \\ \hat{\boldsymbol{x}}_k^+ &\gets \hat{\boldsymbol{x}}_k^- +{\boldsymbol{K}_k \left(\boldsymbol{z}_k - \boldsymbol{h}_k (\hat{\boldsymbol{x}}_k^-)\right)} \tag{23} \\ \boldsymbol{P}_k^+ &\gets (\boldsymbol{I} - \boldsymbol{K}_k \boldsymbol{H}_k) \boldsymbol{P}_k^- . \tag{24} \end{align} \]

where

\[ \boldsymbol{F}_k(t) = \frac{\partial \boldsymbol f(t, \boldsymbol{x})}{\partial \boldsymbol{x}} , \quad \boldsymbol{H}_k = \left . \frac{\partial \boldsymbol h_k(\boldsymbol{x})}{\partial \boldsymbol{x}} \right |_{\boldsymbol{x}_k=\hat{\boldsymbol{x}}_k^-}, \tag{25} \]

similar to Equation (11). The \(\mathrm{ODESolve}\,[\mathrm{eq};\mathrm{cond}]\) represents the solution of the ordinary equation in the bracket. One can try to either derive the analytical solution of the ODE or use a numerical ODE solver such as RK4(5).

## 3D Altitude Filters

If you are familiar with error-state orientation filters and just need a refresher, skip to the summary section to see the algorithm and math.

### Transition Equations

The gyroscope in the IMU provides measurements of current angular velocity. The most widely used gyroscope model is

\[ \boldsymbol{\omega}_m := \boldsymbol{\omega} + \boldsymbol{b} + \boldsymbol{n_\omega}, \tag{26} \]

where \(\boldsymbol{\omega}_m\) is the current reading of the gyroscope,
\(\boldsymbol{\omega}\) is the actual angular velocity of the IMU, \(\boldsymbol{b}\) is the device
bias that drifts slowly due to the environmental factors (e.g., thermal, external pressure, etc), and
\(\boldsymbol{n_\omega}\) is a Gaussian noise. **All the variables are time-dependent.**

Let the state vector be \(\boldsymbol{x} = (\boldsymbol{R}, \boldsymbol{b})\), where \(\boldsymbol{R}\) is the
rotation matrix **from the global frame to the IMU frame**, and \(\boldsymbol{b}\) is the gyro
bias. The continuous-time state-transition equations are written as

\[ \begin{align} \dot{\boldsymbol{R}} &= {\lfloor \boldsymbol{\omega} \rfloor}_{\times} \boldsymbol{R} \tag{27} \\ \dot{\boldsymbol{b}} &= \boldsymbol{n}_{\boldsymbol{b}}, \tag{28} \end{align} \]

where we use the Newton notation \(\dot{\boldsymbol{R}}\) and \(\dot{\boldsymbol{b}}\) to represents variables' time derivative. Equation (27) can be derived from the definition of angular velocity, where \( \lfloor \boldsymbol{\omega} \rfloor_\times \) represents the \(3 \times 3\) skew-symmetric matrix so that \(\lfloor \boldsymbol{\boldsymbol{a}}\rfloor_{\times} \boldsymbol{b} = \boldsymbol{a} \times \boldsymbol{b}\) is the 3D cross product between vectors \(\boldsymbol{a}\) and \(\boldsymbol{b}\).

Note that literature like Trawny05b uses quaternions instead of rotation matrices as they are more efficient to implement as it is much trickier to keep a rotation matrix orthogonal. However, I think computers are faster and more people are familiar with rotation matrices than with quaternions nowadays. Besides, the results can be easily converted between the form of quaternions and rotation matrices.

### Measurement Equations

The accelerometer provides the reading of the device's acceleration plus gravity. The magnitude of device acceleration is often much smaller than the magnitude of gravity. Therefore, with some tricks, we can use the direction of the accelerometer readings as the measurement updates. For example, one can only do such updates when the accelerometer reading is close to \(9.8 m/s^2\).

Now, let's say we have measured that the **unit vector** of the gravity direction is \(\boldsymbol{g}_I\)
in the IMU frame with some known uncertainty. A common way to write the measurement equation is

\[ \mathbf{0}_{2\times 1} = \underbrace{ \begin{bmatrix} \boldsymbol{g}^{\perp T}_1\\ \boldsymbol{g}^{\perp T}_2\\ \end{bmatrix}}_{\boldsymbol G} \boldsymbol R \boldsymbol{g}_G + \underbrace{\begin{bmatrix} m_1 \\ m_2 \end{bmatrix}}_{\boldsymbol m} \tag{29} \]

where \(\boldsymbol{g}^{\perp}_1\) and \(\boldsymbol{g}^{\perp}_2\) are arbitrary unit \(\mathbb{R}^3\) vectors such that \(\boldsymbol{g}^{\perp}_1 \perp \boldsymbol{g}_I \perp \boldsymbol{g}^{\perp}_2\) and \(\boldsymbol{g}^{\perp}_1 \perp \boldsymbol{g}^{\perp}_2\), \(\boldsymbol g_G\) is the gravity direction in the global frame with a typical value equal to \([0\; 1\; 0]^T\), and \(m_1,m_2 \sim \mathcal{N}(0, R)\) is the measurement Gaussian noise whose characteristics are known. The measurement result \(\boldsymbol z_k\) is always \(\mathbf{0}_{2\times 1}\).

The intuition of Equations (29-30) is that gravity direction in the IMU frame should be perpendicular to \(\boldsymbol{g}^{\perp}_1\) and \(\boldsymbol{g}^{\perp}_2\). One may wonder why not use a simpler equation like

\[ \xcancel{\boldsymbol{g}_I = \boldsymbol{R} \boldsymbol g_G + \boldsymbol{m}} \]

instead? The reason is that the noise \(\boldsymbol{m}\) in this case is very non-Gaussian, making it tricky to set the uncertainty of \(\boldsymbol{R}\boldsymbol g_G\) in the direction of \(\boldsymbol{g}_I\): the covariance (uncertainty) should be zero if assuming the error is small, but if we set it to zero, it would put a hard and potentially problematic constraint on the system.

### Angular Velocity Integration

To solve Equation (20), we try to solve the following ODE for the "mean" estimation of \(\boldsymbol R\):

\[ \dot{\hat{\boldsymbol R}} = \lfloor \hat{\boldsymbol \omega}\rfloor_{\times} \hat{\boldsymbol R} \tag{30} \]

where

\[ \boldsymbol{\hat{\omega}}:=\boldsymbol{\omega}_m - \boldsymbol{\hat b} \tag{38} \]

is our estimated mean of the angular velocity.

I put a quote on *"mean"* because it is only an approximation for a non-linear system. This is angular velocity integration. We can solve it with the zero-order integrator, which assumes
the constant average angular velocity between
\(t_{k-1}\) and \(t_k\):

\[ \bar{\boldsymbol{\omega}}_k = \frac{\boldsymbol{\omega}_m(t_{k-1}) + \boldsymbol{\omega}_m(t_{k})}{2} - \hat{\boldsymbol b}_{k-1}^+. \]

For forward integration, we set the noise \(\boldsymbol n_{\boldsymbol \omega}\) and \(\boldsymbol n_{\boldsymbol b}\) to be \(\boldsymbol 0\) according to Equations (12, 20). Then the ODE solution to Equation (27, 28) is

\[ \begin{align} \hat{\boldsymbol{R}}^-_k &\gets \boldsymbol{R}(\Delta t\boldsymbol{\bar\omega}_k)\hat{\boldsymbol{R}}^+_{k-1} \\ \hat{\boldsymbol{b}}^-_k &\gets \hat{\boldsymbol{b}}^+_{k-1} \end{align}, \tag{31} \]

where \(\Delta t := t_{k} - t_{k-1}\) and \(\boldsymbol R(\boldsymbol \theta)\) is the Rodrigues' rotation formula that computes the rotation matrix from an angle-axis representation. Note that the bias \(\hat{\boldsymbol b}\) does not change during the propagation stage and only be updated during the measurement stage.

One thing to note is that when you multiply many matrices together as in Equation (31), the rotation matrix may lose orthogonality gradually. You need to re-normalize it periodically, e.g., by doing a 3x3 SVD and re-setting the singular value to 1

If you want to look smarter, you can use the first-order quaternion integrator [1, 2], which assumes a linear change in \(\boldsymbol \omega\) and the solution is summarized below:

\[ \hat{\boldsymbol{q}}^-_k = \hat{\boldsymbol{q}}^+_{k-1} \otimes \left( \boldsymbol{q}(\Delta t\boldsymbol{\bar\omega}_k) + \frac{\Delta t^2}{24} \begin{bmatrix} 0 \\ \boldsymbol{\hat{\omega}_{k-1}} \times \boldsymbol{\hat{\omega}_{k}}\end{bmatrix} \right). \]

Here \(\boldsymbol q\) is the quaternion form of \(\boldsymbol R\) and \(\boldsymbol{\hat{\omega}}_{k}:=\boldsymbol{\omega}_m(t_k) - \boldsymbol{\hat b}_k\). Note that the first term in the parentheses are the same as the zero-order integrator, and the second term applies a correction that vanishes when \(\boldsymbol{\hat \omega}_{k-1}\) collinear with \(\boldsymbol{\hat \omega}_{k}\).

### Lie Group Calculus

In this section, we try to derive \(\boldsymbol{F}_k\) and \(\boldsymbol{H}_k\) as needed by Equations (21-25). One way to do this is to flatten \( \boldsymbol{x} = (\boldsymbol{R}, \boldsymbol{b})\) as an \( \mathbb R^{12}\) vector and then apply vector calculus to Equations (27-30). However, this method does not consider the fact that \(\boldsymbol R_k\) must lie in the space of rotation matrices. Thus, there are two major issues:

- The distribution of \(\boldsymbol R\) is not Gaussian in the \(\mathbb R^9\) space, i.e., \(\boldsymbol R_k \not \sim \mathcal{N}(\hat{\boldsymbol R}_k^+, \hat{\boldsymbol{P}}^+_k)\);
- When applying the post-measurement update as in Equation (23), the new \(\boldsymbol{R}_k^+\) may not be a rotation matrix anymore.

Therefore, instead of directly following EFK and linearizing with \(\boldsymbol R_k=\hat{\boldsymbol R}_{k-1}^+ + \boldsymbol \delta \boldsymbol R_k\), we decompose \(\boldsymbol R\) and \(\boldsymbol b\) as

\[ \begin{align} \boldsymbol{R} &= \boldsymbol R(\boldsymbol{\theta}_k) \hat{\boldsymbol{R}}_{k-1}^+ \\ \boldsymbol b &= \boldsymbol \beta_k + \hat{\boldsymbol{b}}_{k-1}^+ \end{align}, \tag{32} \]

where the new model assumes

\[ \begin{bmatrix} \boldsymbol\theta_k \\ \boldsymbol \beta_k \end{bmatrix} \sim \mathcal{N}(\boldsymbol{0}_{3\times 1}, \hat{\boldsymbol{Q}}^+_{k-1}). \]

Here, \((\boldsymbol \theta_k, \boldsymbol \beta_k)\) are so-called *error states*,
representing the uncertainty of the state in the \(k\)th update.
\(\boldsymbol{\theta}_k \in \mathbb R^3\) is the
angle-axis representation of the uncertainty of \(\boldsymbol R\) and
rotation matrix representation of \(\boldsymbol \theta_k\) from the Rodrigues' rotation formula that computes the rotation matrix from the
angle-axis representation.
Unlike direct linearization in Euclidean space, Equation (32) guarantees that \(\boldsymbol R\)
is always a rotation matrix. This is called *Lie Group Calculus* as it mimics regular derivative
but with the rotation Lie group.

Next, we plug Equation (32) into Equation (27-28) to find the linearized relationship between \(\dot{\boldsymbol{\theta}}\) and \(\boldsymbol{\theta}\):

\[ \begin{bmatrix} \dot{\boldsymbol{\theta}}\\ \dot{\boldsymbol \beta} \end{bmatrix} \approx \begin{bmatrix} \lfloor \hat{\boldsymbol \omega} \rfloor_{\times} & -\boldsymbol{I}_{3 \times 3} \\ \boldsymbol{0}_{3 \times 3} & \boldsymbol{0}_{3 \times 3} \end{bmatrix} \begin{bmatrix} \boldsymbol{\theta} \\ \boldsymbol{\beta} \end{bmatrix} + \begin{bmatrix} \boldsymbol{n}_{\boldsymbol \theta} \\ \boldsymbol{n}_{\boldsymbol \beta} \\ \end{bmatrix} \tag{33} \]

where *we omit the subscript $_k$ and superscript $^+$* on \(\boldsymbol\theta\) for clarity.

## Click to Show the Derivation of Equation (33)

To derive Equation (33), we start from Equation (32), take the time derivative, and try to find the
$\dot{\boldsymbol\theta}$ in the term of of $\boldsymbol\theta$:
$$
\begin{align}
\boldsymbol{R} &\overset{\mathrm{0}}{=} \boldsymbol R(\boldsymbol{\theta}) \hat{\boldsymbol{R}} \\

\frac{\partial\boldsymbol{R}}{\partial t} &\overset{\mathrm{1}}{=} \frac{\partial \boldsymbol R(\boldsymbol{\theta})}{\partial t} \hat{\boldsymbol{R}} + \boldsymbol R(\boldsymbol{\theta}) \frac{\partial\hat{\boldsymbol{R}}}{\partial t}\\

{\lfloor \boldsymbol{\omega} \rfloor}_{\times} \boldsymbol{R} &\overset{\mathrm{2}}{=} \frac{\partial \boldsymbol R(\boldsymbol{\theta})}{\partial t} \hat{\boldsymbol{R}} + \boldsymbol R(\boldsymbol{\theta}) {\lfloor \hat{\boldsymbol{\omega}} \rfloor}_{\times} \hat{\boldsymbol{R}} \\

{\lfloor \boldsymbol{\omega} \rfloor}_{\times} \boldsymbol{R} \hat{\boldsymbol{R}}^{-1} &\overset{\mathrm{3}}{=} \frac{\partial \boldsymbol R(\boldsymbol{\theta})}{\partial t} + \boldsymbol R(\boldsymbol{\theta}) {\lfloor \hat{\boldsymbol{\omega}} \rfloor}_{\times} \\

{\lfloor \boldsymbol{\omega} \rfloor}_{\times} \boldsymbol{R}(\boldsymbol \theta) &\overset{\mathrm{4}}{=} \frac{\partial \boldsymbol R(\boldsymbol{\theta})}{\partial t} + \boldsymbol R(\boldsymbol{\theta}) {\lfloor \hat{\boldsymbol{\omega}} \rfloor}_{\times}
\end{align}
$$
where ${=}^1$ is applying the time derivative with the product rule,
${=}^2$ is from the definition of $\boldsymbol R$ and $\hat{\boldsymbol R}$ in Equations (27, 30).
$=^3$ is multiplying $\hat{\boldsymbol R}^{-1}$ on the both sides, and $=^4$ is from $=^0$.

Assuming $\boldsymbol \theta$ is small, the Rodrigues’ rotation formula becomes
$$
\boldsymbol{R}(\boldsymbol \theta) \approx \mathbf{I}_{3\times 3} + \lfloor \boldsymbol{\theta} \rfloor_{\times}. \tag{40}
$$
We can use it to simplify the previous equation to
$$
\begin{align}
\frac{\partial \boldsymbol R(\boldsymbol{\theta})}{\partial t} = \frac{\partial \lfloor \boldsymbol{\theta} \rfloor_{\times}}{\partial t}
&=
{\lfloor \boldsymbol{\omega} \rfloor}_{\times} - {\lfloor \hat{\boldsymbol{\omega}} \rfloor}_{\times} + {\lfloor \boldsymbol{\omega} \rfloor}_{\times} {\lfloor \boldsymbol{\theta} \rfloor}_{\times}- {\lfloor \boldsymbol{\theta} \rfloor}_{\times} {\lfloor \hat{\boldsymbol{\omega}} \rfloor}_{\times}.
\end{align}
$$
Use the property that $\lfloor \cdot \rfloor_{\times}$ is a linear function and plug in the
$\hat{\boldsymbol{\omega}} = \hat{\boldsymbol{\omega}} - \boldsymbol\beta - n_{\boldsymbol \omega}$, we have
$$
\begin{align}
\frac{\partial \lfloor \boldsymbol{\theta} \rfloor_{\times}}{\partial t}
&=
{\lfloor \boldsymbol{\omega} - \hat{\boldsymbol{\omega}} \rfloor}_{\times} + {\lfloor \boldsymbol{\omega} \rfloor}_{\times} {\lfloor \boldsymbol{\theta} \rfloor}_{\times}- {\lfloor \boldsymbol{\theta} \rfloor}_{\times} {\lfloor \hat{\boldsymbol{\omega}} \rfloor}_{\times} \\

&=
{\lfloor - \boldsymbol{\beta} - \boldsymbol n_{\boldsymbol \omega} \rfloor}_{\times} + {\lfloor \hat{\boldsymbol{\omega}} - \boldsymbol \beta - \boldsymbol n_{\boldsymbol \omega}\rfloor}_{\times} {\lfloor \boldsymbol{\theta} \rfloor}_{\times}- {\lfloor \boldsymbol{\theta} \rfloor}_{\times} {\lfloor \hat{\boldsymbol{\omega}} \rfloor}_{\times}
\end{align}
$$
Removing all the term that is on the magnitude of $o(|\boldsymbol \theta||\boldsymbol \beta|)$ or $o(|\boldsymbol \theta||\boldsymbol n_{\boldsymbol \omega}|)$ , we have
$$
\begin{align}
\frac{\partial \lfloor \boldsymbol{\theta} \rfloor_{\times}}{\partial t} =
{\lfloor -\boldsymbol{\beta} - \boldsymbol n_{\boldsymbol \omega} \rfloor}_{\times} + {\lfloor \hat{\boldsymbol{\omega}}\rfloor}_{\times} {\lfloor \boldsymbol{\theta} \rfloor}_{\times}- {\lfloor \boldsymbol{\theta} \rfloor}_{\times} {\lfloor \hat{\boldsymbol{\omega}} \rfloor}_{\times} \\

\end{align}
$$
Next, using the vector triple product identity from Lagrange’s formula

$$\mathbf a \times (\mathbf b \times \mathbf c) -
\mathbf b \times (\mathbf a \times \mathbf c) = (\mathbf a \times \mathbf b) \times \mathbf c,$$
we can further simplify it to
$$
\begin{align}
\frac{\partial \lfloor \boldsymbol{\theta} \rfloor_{\times}}{\partial t}
&= {\lfloor -\boldsymbol{\beta} - \boldsymbol n_{\boldsymbol \omega} \rfloor}_{\times} + {\lfloor \hat{\boldsymbol{\omega}}\times \boldsymbol{\theta} \rfloor}_{\times} \\

&= {\lfloor \hat{\boldsymbol{\omega}}\times \boldsymbol{\theta} - \boldsymbol{\beta} - \boldsymbol n_{\boldsymbol \omega} \rfloor}_{\times},
\end{align}
$$
which gives
$$
\frac{\partial \boldsymbol{\theta} }{\partial t} = \dot{\boldsymbol \theta} =
\hat{\boldsymbol{\omega}}\times \boldsymbol{\theta} - \boldsymbol{b} - \boldsymbol n_{\boldsymbol \omega}.
$$

Similarly, we can expand \(\boldsymbol R\) around \(\hat{\boldsymbol{R}}_{k}^-\) into Equations (29) to compute the linearized error-state measurement equation in term:

\[ \underbrace{\boldsymbol G \hat{\boldsymbol{R}}_{k}^- \boldsymbol g_G}_{\boldsymbol z_k} = \underbrace{\boldsymbol G ( ( \hat{\boldsymbol{R}}_{k}^- \boldsymbol{g}_G ) \times \boldsymbol \theta )}_{\boldsymbol h_k(\boldsymbol \theta)} + \boldsymbol m. \tag{34} \]

## Click to Show the Derivation of Equation (34)

Let $$ \boldsymbol{R} = \boldsymbol R(\boldsymbol{\theta}) \hat{\boldsymbol{R}}_{k}^- \$$ and applying the small angle Rodrigues’ rotation formula in Equation (40), we have $$ \begin{align} \mathbf{0}_{2\times 1} &= \boldsymbol G \boldsymbol R \boldsymbol{g}_G + {\boldsymbol m} \\

\mathbf{0}_{2\times 1} &= \boldsymbol G \boldsymbol R(\boldsymbol{\theta}) \hat{\boldsymbol{R}}_{k}^- \boldsymbol{g}_G + {\boldsymbol m} \\

\mathbf{0}_{2\times 1} &= \boldsymbol G ( \mathbf{I}_{3\times 3} + \lfloor \boldsymbol{\theta} \rfloor_{\times}) \hat{\boldsymbol{R}}_{k}^- \boldsymbol{g}_G + {\boldsymbol m} \\

\boldsymbol G \hat{\boldsymbol{R}}_{k}^- \boldsymbol{g}_G &= \boldsymbol G (( \hat{\boldsymbol{R}}_{k}^- \boldsymbol{g}_G ) \times \boldsymbol \theta) - {\boldsymbol m} \end{align}. $$ Note that the sign before $\boldsymbol m$ does not matter since the noise is assumed to be zero-mean.

### Error State Kalman Filters

Now, instead of using \((\boldsymbol R, \boldsymbol b)\), we can treat \(\boldsymbol x'= [\begin{smallmatrix}\boldsymbol \theta \\ \boldsymbol \beta\end{smallmatrix}]\) as the state vector and use Equations (33-34) as the state-transition model and the measurement model to propagate the mean and covariance of \(\boldsymbol x'\). This is often called *the error-state Kalman filter* in literatures. The error-state Kalman filter only differs from normal Extended Kalman Filters when a specialized "linearization", e.g., Equation (32), is used. Otherwise, error-state Kalman filters are equivalent to extended Kalman filters mathematically.

With results from the continuous-time Kalman filter in Equation (20-24), we can propagate the mean and the covariance of \(\boldsymbol x'\) with

\[ \boldsymbol{F}_k \gets \begin{bmatrix} \lfloor \hat{\boldsymbol \omega} \rfloor_{\times} & -\boldsymbol{I}_{3 \times 3} \\ \boldsymbol{0}_{3 \times 3} & \boldsymbol{0}_{3 \times 3} \end{bmatrix}, \quad \boldsymbol{H}_k \gets \left[ \boldsymbol G \lfloor \hat{\boldsymbol{R}}_{k}^- \boldsymbol g_G \rfloor_{\times} \:\: \boldsymbol 0_{2\times 3}\right] \tag{35} \]

After the measurement update, we need to apply the estimated mean of \(\hat{\boldsymbol \theta}_k\) back to \(\boldsymbol R\), i.e.,

\[ \hat{\boldsymbol R}_{k}^+ \gets \boldsymbol R(\hat{\boldsymbol{\theta}}_k^+) \boldsymbol{R}_{k}^- \]

so that we can do a "Lie group linearization" at \(\boldsymbol R_{k+1} = \boldsymbol R(\boldsymbol \theta_{k+1})\hat{\boldsymbol R}_{k}^+\) again (Equations 33-35) for the step \(k+1\) while keeping the prior of \(\boldsymbol \theta_{k+1}\) be zero-mean.

### Covariance Integration

The last remaining piece is to solve the ODE for propagating the covariance matrix as in Equation (21). Trawny05b gives the closed-form solution. However, the math is very complex and hard to follow. A mathematically simpler solution is just to use a numerical ODE solver like RK4(5) to find the solution for

\[{\boldsymbol P}^-_k \gets \boldsymbol{P}(t_{k})\]

where

\[ \begin{align} \dot{\boldsymbol{P}}(t) &= \boldsymbol{F}_k \boldsymbol{P}(t) + \boldsymbol{P}(t)\boldsymbol{F}_k^T + \boldsymbol{Q}(t) \\ \boldsymbol{P}(t_{k-1})&={\boldsymbol{P}}_{k-1}^+ \\ \end{align}. \]

It probably will not be much slower than the closed-form solution.

### Summary

Put everything together, the algorithm looks like this:

- Initialize $\hat{\boldsymbol R}^+_0$, $\hat{\boldsymbol b}^+_0$, and the covariance matrices ${\boldsymbol P}^+_0$ of the error state $[\begin{smallmatrix}\boldsymbol \theta \\ \boldsymbol \beta\end{smallmatrix}]$ according to the prior of your system;
- For each $k$, use Equation (31) to propagate the mean estimation of state by integrating angular velocity:

\[ \begin{align} \hat{\boldsymbol{R}}^-_k &\gets \boldsymbol{R}(\Delta t\boldsymbol{\bar\omega}_k)\hat{\boldsymbol{R}}^+_{k-1} \\ \hat{\boldsymbol{b}}^-_k &\gets \hat{\boldsymbol{b}}^+_{k-1} \end{align}; \]

- Re-normalize $\hat{\boldsymbol{R}}^-_k$ with SVD to keep its orthogonality if necessary;
- Compute the Jacobian of the error state state-transition model and the measurement model as in Equation (35): \[ \boldsymbol{F}_k \gets \begin{bmatrix} \lfloor \hat{\boldsymbol \omega} \rfloor_{\times} & -\boldsymbol{I}_{3 \times 3} \\ \boldsymbol{0}_{3 \times 3} & \boldsymbol{0}_{3 \times 3} \end{bmatrix}, \quad \boldsymbol{H}_k \gets \left[ \boldsymbol G \lfloor \hat{\boldsymbol{R}}_{k}^- \boldsymbol g_G \rfloor_{\times} \:\: \boldsymbol 0_{2\times 3}\right]; \]
- Propagate the covariance matrix \[{\boldsymbol P}^-_k \gets \boldsymbol{P}(t_{k})\] (see Section Covariance Integration) by solving the following ODE with a numerical ODE solver:

\[ \begin{align} \dot{\boldsymbol{P}}(t) &= \boldsymbol{F}_k \boldsymbol{P}(t) + \boldsymbol{P}(t)\boldsymbol{F}_k^T + \boldsymbol{Q}(t) \\ \boldsymbol{P}(t_{k-1})&={\boldsymbol{P}}_{k-1}^+ \end{align}; \]

- Compute the Kalman gain with Equation (22):\[ \boldsymbol{K}_k \gets \boldsymbol{P}_k^- \boldsymbol{H}_k^T (\boldsymbol{H}_k \boldsymbol{P}_k^- \boldsymbol{H}_k^T + \boldsymbol{R}_k)^{-1};\]
- Update the post-measurement error-state covariance: \[\boldsymbol{P}_k^+ \gets (\boldsymbol{I} - \boldsymbol{K}_k \boldsymbol{H}_k) \boldsymbol{P}_k^- ;\]
- Compute the mean error state according to Section Lie Group Calculus using $\hat{\boldsymbol \theta}^-_k =\boldsymbol 0$ for Equation (23, 34): \[\begin{bmatrix}\hat{\boldsymbol \theta}_k^+ \\ \hat{\boldsymbol \beta}_k^+\end{bmatrix} \gets \boldsymbol{K}_k\boldsymbol G \boldsymbol g_G;\]
- Apply the measurement update to the estimated mean state using the estimated mean error state:

\[ \begin{align} \hat{\boldsymbol R}^+_k &\gets \boldsymbol R(\hat{\boldsymbol{\theta}}_k^+) \hat{\boldsymbol{R}}_{k}^-\\ \hat{\boldsymbol b}^+_k &\gets \hat{\boldsymbol \beta}^+_k + \hat{\boldsymbol{b}}_{k}^- \end{align}; \]

- Repeat 2-9 for $k \gets k + 1$.

## Acknowledgement

I would like to thank Stergios Roumeliotis for providing the reference and tutorial of the topic. I would like to thank Yulin Yang for the in-depth discussion on the error-state EFK.