BOLD, non-BOLD, and TE-dependence with tedana

Important

This chapter should differentiate itself from Signal_Decay by focusing on the application of Equation (2) to decompositions, rather than on the raw signal compared between active and inactive states.

We may want to describe adaptive masking, data whitening, the model fit metrics, and post-processing methods (e.g., MIR) on this page as well.

Important

The general flow of this chapter should be:

1. Explain the monoexponential decay equation, but primarily reference back to Signal_Decay.
2. Walk through optimal combination and adaptive masking somewhere around here.
3. Describe why multi-echo denoising can’t be done directly on the raw signal and why ICA is necessary.
   1. This means talking about noise, really, and why the FIT method (volume-wise T2*/S0 estimation) is generally considered too noisy for practical application.
   2. Walk through TEDPCA as well.
4. The TE-(in)dependence models.
5. Apply the models to a simulated component, as well as multiple real components.
   1. Show model fit for different components.
6. Compare optimally combined, denoised, and high-kappa data.
7. Describe post-processing methods, like minimum image regression and tedana’s version of global signal regression.

This notebook uses simulated T2*/S0 manipulations to show how TE-dependence is leveraged to denoise multi-echo data.

The equation describing how the signal depends on changes in S0 and T2*:

\[S(t, TE_k) = \bar{S}(TE_k) * (1 + \frac{{\Delta}{S_0}(t)}{\bar{S}_0} - {\Delta}{R_2^*}(t)*TE_k) \tag{2}\]
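Equation (2) is a first-order (linear) approximation of monoexponential decay, \(S(TE) = S_0 e^{-R_2^* TE}\), around the mean \(S_0\) and \(R_2^*\). As a minimal sketch, the code below compares the exact signal with the Equation (2) prediction for a 1% change in both S0 and R2*. The echo times, parameter values, and the helper names `exact_signal` and `linearized_signal` are illustrative assumptions, and \(\bar{S}(TE_k)\) is approximated here by the signal evaluated at the mean parameters.

```python
import numpy as np


def exact_signal(te, s0, r2s):
    """Monoexponential decay: S(TE) = S0 * exp(-R2* * TE)."""
    return s0 * np.exp(-r2s * te)


def linearized_signal(te, mean_s0, mean_r2s, delta_s0, delta_r2s):
    """First-order prediction of the signal, following Equation (2)."""
    mean_signal = exact_signal(te, mean_s0, mean_r2s)  # stands in for S-bar(TE_k)
    return mean_signal * (1 + delta_s0 / mean_s0 - delta_r2s * te)


echo_times = np.array([15.0, 39.0, 63.0])  # ms; illustrative values only
mean_s0, mean_t2s = 16000.0, 30.0
mean_r2s = 1 / mean_t2s  # R2* = 1 / T2*, in 1/ms

# A small (1%) perturbation of both S0 and R2*
delta_s0 = 0.01 * mean_s0
delta_r2s = 0.01 * mean_r2s

exact = exact_signal(echo_times, mean_s0 + delta_s0, mean_r2s + delta_r2s)
approx = linearized_signal(echo_times, mean_s0, mean_r2s, delta_s0, delta_r2s)
rel_error = np.abs(exact - approx) / exact
print(rel_error)
```

The neglected terms are second order in \(\Delta S_0/\bar{S}_0\) and \(\Delta R_2^* TE_k\), so the approximation is tight for small fluctuations and degrades as \(\Delta R_2^* TE_k\) grows.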


import os

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from book_utils import compute_te_dependence_statistics, predict_bold_signal
from myst_nb import glue
from nilearn.glm import first_level
from repo2data.repo2data import Repo2Data
from scipy import signal, stats

sns.set_style("whitegrid")

# Install the data if running locally, or point to cached data if running on neurolibre
DATA_REQ_FILE = "../binder/data_requirement.json"

# Download data
repo2data = Repo2Data(DATA_REQ_FILE)
data_path = repo2data.install()
data_path = os.path.abspath(data_path[0])

out_dir = os.path.join(data_path, "te-dependence")
os.makedirs(out_dir, exist_ok=True)

Plot simulations of BOLD and non-BOLD signals as a function of echo time

Make design matrices

For TEDPCA and TEDICA, we regress each component time series against the echo-specific data to obtain parameter estimates (PEs; not beta values), and substitute those PEs for \({\Delta}S(TE_k)\) in the models below. At some point, I would like to dig into why those parameter estimates are equivalent to \({\Delta}S(TE_k)\) for our purposes.
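A minimal sketch of the PE step for a single voxel, assuming mean-centered echo-wise data and a z-scored component time series. The simulated data, echo times, and noise level are made up, and this is not presented as tedana's code: each echo's time series is regressed on the component with `np.linalg.lstsq`, giving one PE per echo.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trs = 300
echo_times = np.array([15.0, 39.0, 63.0])  # ms; illustrative values only
mean_signal = 16000.0 * np.exp(-echo_times / 30.0)  # S-bar(TE_k) for T2* = 30 ms

# Hypothetical z-scored component time series
component = rng.standard_normal(n_trs)
component = (component - component.mean()) / component.std()

# Simulate echo-wise data whose fluctuations follow the component with an
# R2*-like (TE-dependent) amplitude, plus a little Gaussian noise
true_pe = mean_signal * echo_times * 1e-4
data = mean_signal[:, None] + true_pe[:, None] * component[None, :]
data += rng.normal(scale=5.0, size=data.shape)

# Regress each echo's mean-centered time series on the component
design = component[:, None]  # shape (n_trs, 1)
centered = data - data.mean(axis=1, keepdims=True)
pes, *_ = np.linalg.lstsq(design, centered.T, rcond=None)  # shape (1, n_echoes)
pes = pes[0]
print(pes, true_pe)
```

The recovered PEs track the TE-dependent amplitudes used to simulate the data; this echo-wise pattern is what the TE-(in)dependence models are fit to.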

TE-independence model

\[\frac{{\Delta}S(TE_k)}{\bar{S}(TE_k)} = \frac{{\Delta}S_0}{S_0} \tag{3}\]

\[{\Delta}S(TE_k) = {\bar{S}(TE_k)}\frac{{\Delta}S_0}{S_0} \tag{4}\]

Because \(\frac{{\Delta}S_0}{S_0}\) is a scalar (i.e., it does not change with TE), it can be absorbed into the fitted coefficient \(X\), so the only regressor we need is \({\bar{S}(TE_k)}\) (the mean echo-wise signal).

Thus,

\[{\Delta}S(TE_k) = {\bar{S}(TE_k)} * X \tag{5}\]

and for TEDPCA/TEDICA,

\[PE(TE_k) = {\bar{S}(TE_k)} * X \tag{6}\]

Lastly, we fit X to the data and evaluate model fit.
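As a sketch of this last step, assuming a no-intercept least-squares fit and an F-like ratio of explained to residual sum of squares for a one-parameter model (a common choice, but not necessarily the exact statistic tedana computes), the TE-independence model can be fit to a vector of PEs as follows. The helper name `fit_no_intercept` and the PE values are hypothetical.

```python
import numpy as np


def fit_no_intercept(regressor, pes):
    """Least-squares fit of pes = regressor * X (no intercept).

    Returns the fitted X, the prediction, and an F-like statistic for a
    one-parameter model: (explained SS) / (residual SS / (n - 1)).
    """
    x_hat = regressor @ pes / (regressor @ regressor)
    predicted = regressor * x_hat
    sse = np.sum((pes - predicted) ** 2)
    explained = np.sum(pes**2) - sse  # valid because the fit has no intercept
    f_stat = explained * (len(pes) - 1) / sse
    return x_hat, predicted, f_stat


echo_times = np.array([15.0, 39.0, 63.0])  # ms; illustrative values only
mean_signal = 16000.0 * np.exp(-echo_times / 30.0)  # S-bar(TE_k)

# Hypothetical PEs from an S0-driven component: proportional to S-bar(TE_k),
# plus a small perturbation so the fit is not exact
pes = mean_signal * 0.02 + np.array([1.0, -1.0, 0.5])

s0_regressor = mean_signal  # Equation (6)
x_s0, pred_s0, f_s0 = fit_no_intercept(s0_regressor, pes)
print(x_s0, f_s0)
```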

TE-dependence model

\[\frac{{\Delta}S(TE_k)}{\bar{S}(TE_k)} = -{\Delta}{R_2^*}*TE_k \tag{7}\]

\[{\Delta}S(TE_k) = {\bar{S}(TE_k)} * (-{\Delta}{R_2^*}) * TE_k \tag{8}\]

Because \(-{\Delta}{R_2^*}\) is a scalar (i.e., it does not change with TE), it can be absorbed into the fitted coefficient \(X\), so the regressor depends only on \({\bar{S}(TE_k)}\) (the mean echo-wise signal) and \(TE_k\) (the echo time in milliseconds).

Thus,

\[{\Delta}S(TE_k) = {\bar{S}(TE_k)}*TE_k * X \tag{9}\]

and for TEDPCA/TEDICA,

\[PE(TE_k) = {\bar{S}(TE_k)}*TE_k * X \tag{10}\]

Lastly, we fit X to the data and evaluate model fit.
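The same fitting approach, sketched under the same assumptions (no-intercept least squares, an F-like ratio, a hypothetical helper, and made-up PEs), can compare both models on PEs that scale with \(\bar{S}(TE_k) \cdot TE_k\), as an R2*-driven component's PEs would. For such PEs, the TE-dependence model fits far better than the TE-independence model.

```python
import numpy as np


def fit_no_intercept(regressor, pes):
    """Return X and an F-like statistic for pes = regressor * X (no intercept)."""
    x_hat = regressor @ pes / (regressor @ regressor)
    sse = np.sum((pes - regressor * x_hat) ** 2)
    f_stat = (np.sum(pes**2) - sse) * (len(pes) - 1) / sse
    return x_hat, f_stat


echo_times = np.array([15.0, 39.0, 63.0])  # ms; illustrative values only
mean_signal = 16000.0 * np.exp(-echo_times / 30.0)  # S-bar(TE_k)

s0_regressor = mean_signal  # TE-independence model, Equation (6)
r2_regressor = mean_signal * echo_times  # TE-dependence model, Equation (10)

# Hypothetical PEs from an R2*-driven component, plus a small perturbation
pes = r2_regressor * 1e-4 + np.array([0.5, -0.5, 0.2])

x_s0, f_s0 = fit_no_intercept(s0_regressor, pes)
x_r2, f_r2 = fit_no_intercept(r2_regressor, pes)
print(f"S0 model: F = {f_s0:.1f}; R2* model: F = {f_r2:.1f}")
```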

Fitted curves for S0-perturbed signal

The predicted curve for the S0 model matches the real curve perfectly!

Fitted curves for R2*-perturbed signal

For some reason, the predicted curve for the R2* model doesn’t match the simulated signal curve. What’s with this mismatch?

It seems like the mismatch increases as the difference between the fluctuating volume’s R2* and the mean R2* increases. The fitted curve seems to actually match the mean signal, not the perturbed signal!
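One candidate contributor to the mismatch, offered as a hypothesis rather than a diagnosis: the R2* model comes from the linearized Equation (2), so it is exact only to first order in \(\Delta R_2^* \cdot TE_k\), whereas a signal simulated with the full exponential is not. The sketch below uses made-up echo times and parameters and does not reproduce the notebook's own fitting code. It fits Equation (9) to the exact signal change for increasingly large R2* perturbations and reports the relative misfit, which grows with the size of the perturbation.

```python
import numpy as np

echo_times = np.array([15.0, 39.0, 63.0])  # ms; illustrative values only
mean_s0, mean_r2s = 16000.0, 1 / 30.0  # R2* = 1 / T2*, in 1/ms
mean_signal = mean_s0 * np.exp(-mean_r2s * echo_times)  # S-bar(TE_k)
r2_regressor = mean_signal * echo_times  # Equation (9), up to the scalar X

fractions = [0.01, 0.1, 0.3]  # relative size of the R2* perturbation
x_hats, misfits = [], []
for frac in fractions:
    delta_r2s = frac * mean_r2s
    # Exact signal change under monoexponential decay when only R2* changes
    delta_s = mean_s0 * np.exp(-(mean_r2s + delta_r2s) * echo_times) - mean_signal
    # No-intercept least-squares fit of delta_s = r2_regressor * X
    x_hat = r2_regressor @ delta_s / (r2_regressor @ r2_regressor)
    residual = delta_s - r2_regressor * x_hat
    x_hats.append(x_hat)
    misfits.append(np.linalg.norm(residual) / np.linalg.norm(delta_s))
    print(f"dR2* = {frac:.0%} of mean: X = {x_hat:.3e} (-dR2* = {-delta_r2s:.3e}), "
          f"relative misfit = {misfits[-1]:.3f}")
```

For small perturbations the fitted \(X\) is close to \(-\Delta R_2^*\), as Equation (9) intends; for larger ones the curvature of the exponential is no longer captured by the linear model.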

Now let’s apply this approach to components


# Simulate data
# We'll convolve with HRF just for smoothness
hrf = first_level.spm_hrf(1, oversampling=1)

n_trs = 300

frac = 0.05  # 5% PSC
mean_t2s = 30
t2s_std = mean_t2s * frac
mean_s0 = 16000
s0_std = mean_s0 * frac

# simulate the T2*/S0 time series
n_chunks = 10
scales = np.random.random(n_chunks) * 3
t2s_ts = []
for section in range(n_chunks):
    ts = np.hstack((np.zeros(10), np.ones(20), np.zeros(10)))
    ts *= scales[section]
    t2s_ts.append(ts)

t2s_ts = np.hstack(t2s_ts)[:n_trs + 20]
t2s_ts = signal.convolve(t2s_ts, hrf)[:n_trs]
t2s_ts *= t2s_std / np.std(t2s_ts)
t2s_ts += mean_t2s - np.mean(t2s_ts)

s0_ts = np.random.randint(0, 2, n_trs).astype(float)
s0_ts -= 0.5
s0_ts *= np.random.normal(loc=1, scale=0.25, size=n_trs)
s0_ts = np.sort(s0_ts)
first_half = s0_ts[:n_trs // 2]
second_half = s0_ts[n_trs // 2:]
s0_ts = np.zeros(n_trs)
np.random.shuffle(first_half)
np.random.shuffle(second_half)
s0_ts[::2] = first_half
s0_ts[1::2] = second_half
# s0_ts = signal.convolve(s0_ts, hrf)[20 : n_trs + 20]
s0_ts *= s0_std / np.std(s0_ts)
s0_ts += mean_s0 - np.mean(s0_ts)

# Constant T2*/S0 time series
mean_s0_ts = np.ones(n_trs) * mean_s0
mean_t2s_ts = np.ones(n_trs) * mean_t2s

# Simulate signal for each echo time
# Note: `echo_times` (in ms) is not defined in this cell and must already exist in the session
t2s_signal = predict_bold_signal(echo_times, mean_s0_ts, t2s_ts)
s0_signal = predict_bold_signal(echo_times, s0_ts, mean_t2s_ts)
multiecho_signal = predict_bold_signal(echo_times, s0_ts, t2s_ts)

# Normalize to get component time series
t2s_ts_z = stats.zscore(t2s_ts)
s0_ts_z = stats.zscore(s0_ts)
p = 0.5  # proportion for combination
component = (p * t2s_ts_z) + ((1 - p) * s0_ts_z)

fig, ax = plt.subplots(figsize=(16, 4))
ax.plot(t2s_ts_z, label="T2* fluctuations", color="blue")
ax.plot(s0_ts_z, label="S0 fluctuations", color="red")
ax.plot(component, label="Component", color="black", alpha=0.5, linewidth=5)
ax.set_xlim(0, n_trs - 1)
ax.set_xlabel("Time (TR)")
leg = ax.legend(fontsize=14, ncol=3)
glue("fig_component_curves", fig, display=False)

Algorithm 1 (Minimum image regression)

Inputs

\(\mathbf{O}\) is the matrix of optimally combined (OC) data, of shape \(v \times t\), where \(v\) is the number of voxels in the brain mask and \(t\) is the number of timepoints in the scan.
\(\mathbf{M}\) is the mixing matrix from the ICA decomposition, of shape \(c \times t\), where \(c\) is the number of components.
\(W\) is the set of indices of all components in \(\mathbf{M}\): \(W = \{1, 2, 3, ..., c\}\)
\(N\) is the set of indices of all non-ignored components (i.e., all accepted or BOLD-like, and rejected or non-BOLD components) in \(\mathbf{M}\): \(N \subseteq W\), with \(|N| = k\) and \(1 \leq k \leq c\).
\(A\) is the set of indices of all accepted (i.e., BOLD-like) components in \(\mathbf{M}\): \(A \subseteq N\), with \(|A| = l\) and \(1 \leq l \leq k\).

Outputs

Multi-echo denoised data without the T1-like effect, referred to as \(\mathbf{D}\) or MEDN+MIR.
Multi-echo BOLD-like data without the T1-like effect, referred to as \(\mathbf{H}\) or MEHK+MIR.
ICA mixing matrix with the T1-like effect removed from component time series (\(\mathbf{K}\)).
Map of the T1-like effect (\(\mathbf{m}\))

Algorithm

The voxel-wise means (\(\mathbf{\overline{O}} \in \mathbb{R}^{v}\)) and standard deviations (\(\mathbf{\sigma_{O}} \in \mathbb{R}^{v}\)) of the optimally combined data are computed over time.
The optimally combined data are z-normalized over time (\(\mathbf{O_z} \in \mathbb{R}^{v \times t}\)).
The normalized optimally combined data matrix (\(\mathbf{O_z}\)) is regressed on the ICA mixing matrix (\(\mathbf{M} \in \mathbb{R}^{c \times t}\)) to construct component-wise parameter estimate maps (\(\mathbf{B} \in \mathbb{R}^{v \times c}\)).

\[ \mathbf{O_{z}} = \mathbf{B} \mathbf{M} + \mathbf{\epsilon}, \enspace \mathbf{\epsilon} \in \mathbb{R}^{v \times t} \]
\(N\) is used to select rows from the mixing matrix \(\mathbf{M}\) and columns from the parameter estimate matrix \(\mathbf{B}\) that correspond to non-ignored (i.e., accepted and rejected) components, forming reduced matrices \(\mathbf{M}_N\) and \(\mathbf{B}_N\). The normalized time series matrix for the combined ignored components and the variance left unexplained by the ICA decomposition is then computed by subtracting the matrix product of the non-ignored parameter estimate and mixing matrices (\(\mathbf{B}_N \mathbf{M}_N\)) from the normalized OC data time series (\(\mathbf{O_{z}}\)). The result is referred to as the normalized residuals time series matrix (\(\mathbf{R} \in \mathbb{R}^{v \times t}\)).

\[ \mathbf{R} = \mathbf{O_{z}} - \mathbf{B}_N \mathbf{M}_N, \enspace \mathbf{B}_N \in \mathbb{R}^{v \times |N|}, \enspace \mathbf{M}_N \in \mathbb{R}^{|N| \times t} \]
We can likewise construct the normalized time series of BOLD-like components (\(\mathbf{P} \in \mathbb{R}^{v \times t}\)) by multiplying similarly reduced parameter estimate and mixing matrices composed of only the columns and rows, respectively, that are associated with the accepted components indexed in \(A\). The resulting time series matrix is similar to the time series matrix referred to elsewhere as multi-echo high-kappa (MEHK), except that the component time series have been normalized prior to reconstruction.

\[ \mathbf{P} = \mathbf{B}_A \mathbf{M}_A, \enspace \mathbf{B}_A \in \mathbb{R}^{v \times |A|}, \enspace \mathbf{M}_A \in \mathbb{R}^{|A| \times t} \]
The map of the T1-like effect (\(\mathbf{m} \in \mathbb{R}^{v}\)) is constructed by taking the minimum across timepoints from the normalized MEHK time series (\(\mathbf{P}\)) and then mean-centering across brain voxels. Let \(J = \{1, ..., t\}\) denote the indices of the columns of matrix \(\mathbf{P}\), and let \(p_{ij}\) denote the value of the element \(\mathbf{P}[i,j]\).

\[ \mathbf{q_{i}} = \min_{j\in{J}}p_{ij} \quad \forall i = 1,...,v \]

\[ \mathbf{m} = \mathbf{q} - \mathbf{\overline{q}}, \enspace \mathbf{q} \in \mathbb{R}^{v} \]
The standardized optimally combined time series matrix (\(\mathbf{O_z}\)) is regressed on the T1-like effect map (\(\mathbf{m}\)) to estimate the volume-wise global signal time series (\(\mathbf{g} \in \mathbb{R}^t\)).

\[ \mathbf{O_{z}} = \mathbf{m} \otimes \mathbf{g} + \mathbf{\epsilon}, \enspace \mathbf{\epsilon} \in \mathbb{R}^{v \times t} \]

Where \(\otimes\) is the outer product.
The normalized BOLD time series matrix (\(\mathbf{P}\)) is then regressed on this global signal time series (\(\mathbf{g}\)) in order to estimate a global signal map (\(\mathbf{s} \in \mathbb{R}^v\)) and the normalized BOLD time series matrix without the T1-like effect (\(\mathbf{E} \in \mathbb{R}^{v \times t}\)).

\[ \mathbf{P} = \mathbf{s} \otimes \mathbf{g} + \mathbf{E} \]
The time series matrix of BOLD-like components without the T1-like effect (MEHK+MIR, \(\mathbf{H} \in \mathbb{R}^{v \times t}\)), scaled to match the original OC time series matrix, is constructed by multiplying each column of \(\mathbf{E}\) by the vector \(\mathbf{\sigma_{O}}\).

\[\begin{split} \mathbf{H} = \mathbf{E} \circ \underbrace{ \pmatrix{ \mathbf{{\sigma_{O}}_1} & \cdots & \mathbf{{\sigma_{O}}_1}\\ \vdots & \vdots & \vdots \\ \mathbf{{\sigma_{O}}_v} & \cdots & \mathbf{{\sigma_{O}}_v}\\ } }_{t} \end{split}\]

Where \(\circ\) is the Hadamard product for element-wise multiplication of two matrices.
The ICA-denoised time series without the T1-like effect (MEDN+MIR, \(\mathbf{D} \in \mathbb{R}^{v \times t}\)) is constructed by adding the residuals time series (\(\mathbf{R}\)) to the normalized BOLD time series (\(\mathbf{E}\)), multiplying each column of the result by the vector \(\sigma_{O}\), and adding back in the voxel-wise mean of the OC time series (\(\mathbf{\overline{O}}\)).

\[\begin{split} \mathbf{D} = \mathbf{\overline{O}} + (\mathbf{E} + \mathbf{R}) \circ \underbrace{ \pmatrix{ \mathbf{{\sigma_{O}}_1} & \cdots & \mathbf{{\sigma_{O}}_1}\\ \vdots & \vdots & \vdots \\ \mathbf{{\sigma_{O}}_v} & \cdots & \mathbf{{\sigma_{O}}_v}\\ } }_{t} \end{split}\]
The ICA mixing matrix with the T1-like effect removed (\(\mathbf{K} \in \mathbb{R}^{c \times t}\)) is then derived by regressing the global signal time series \(\mathbf{g}\) out of each component’s time series. Let \(\mathbf{Q} \in \mathbb{R}^{c}\) be the associated vector of parameter estimates, with one entry per component.

\[ \mathbf{M} = \mathbf{Q} \otimes \mathbf{g} + \mathbf{K} \]

\[ \mathbf{K} = \mathbf{M} - \mathbf{Q} \otimes \mathbf{g} \]
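The steps of Algorithm 1 can be sketched in NumPy as a single function. This is a minimal illustration under stated assumptions, not tedana's implementation: the function name `minimum_image_regression`, the random stand-in data, and the 0-based index arrays for \(N\) and \(A\) (the text uses 1-based indices) are all hypothetical, and every regression is solved with ordinary least squares.

```python
import numpy as np


def minimum_image_regression(optcom, mixing, not_ignored, accepted):
    """Hypothetical sketch of MIR.

    optcom: (v, t) optimally combined data O
    mixing: (c, t) ICA mixing matrix M
    not_ignored, accepted: 0-based index arrays for N and A
    Returns D (MEDN+MIR), H (MEHK+MIR), K, and m.
    """
    # Voxel-wise mean and SD over time, then z-normalize
    optcom_mean = optcom.mean(axis=1)  # O-bar, (v,)
    optcom_std = optcom.std(axis=1)  # sigma_O, (v,)
    optcom_z = (optcom - optcom_mean[:, None]) / optcom_std[:, None]  # O_z

    # O_z = B M + eps, solved by least squares
    betas = np.linalg.lstsq(mixing.T, optcom_z.T, rcond=None)[0].T  # B, (v, c)

    # R = O_z - B_N M_N and P = B_A M_A
    resid = optcom_z - betas[:, not_ignored] @ mixing[not_ignored, :]
    bold_z = betas[:, accepted] @ mixing[accepted, :]

    # m: voxel-wise minimum of P over time, mean-centered across voxels
    q = bold_z.min(axis=1)
    t1_map = q - q.mean()

    # O_z = m (outer) g + eps: one least-squares coefficient per timepoint
    glob_sig = t1_map @ optcom_z / (t1_map @ t1_map)  # g, (t,)

    # P = s (outer) g + E: one least-squares coefficient per voxel
    gs_map = bold_z @ glob_sig / (glob_sig @ glob_sig)  # s, (v,)
    bold_no_t1 = bold_z - np.outer(gs_map, glob_sig)  # E, (v, t)

    # Broadcasting sigma_O across columns is the Hadamard product in the text
    hik_no_t1 = bold_no_t1 * optcom_std[:, None]  # H
    denoised_no_t1 = optcom_mean[:, None] + (bold_no_t1 + resid) * optcom_std[:, None]  # D

    # K = M - Q (outer) g: one least-squares coefficient per component
    coefs = mixing @ glob_sig / (glob_sig @ glob_sig)  # Q, (c,)
    mixing_no_t1 = mixing - np.outer(coefs, glob_sig)  # K, (c, t)
    return denoised_no_t1, hik_no_t1, mixing_no_t1, t1_map


# Usage with random stand-in data
rng = np.random.default_rng(42)
n_voxels, n_trs, n_comps = 50, 100, 5
mixing = rng.standard_normal((n_comps, n_trs))
optcom = 1000 + rng.standard_normal((n_voxels, n_comps)) @ mixing
optcom += 0.1 * rng.standard_normal((n_voxels, n_trs))
not_ignored = np.array([0, 1, 2, 3])  # component 4 is ignored
accepted = np.array([0, 2])  # a subset of not_ignored

D, H, K, m = minimum_image_regression(optcom, mixing, not_ignored, accepted)
```

Closed-form projections are used for the single-regressor fits (\(\mathbf{g}\), \(\mathbf{s}\), \(\mathbf{Q}\)) because each is a one-column least-squares problem; `np.linalg.lstsq` is used only for the multi-component fit of \(\mathbf{B}\).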