pylemur.pp.shifted_log_transform

pylemur.pp.shifted_log_transform(counts, overdispersion=0.05, pseudo_count=None, minimum_overdispersion=0.001)

Apply a shifted log transformation to count data.

The transformation is proportional to \(\log(y/s + c)\), where \(y\) are the counts, \(s\) is the size factor, and \(c\) is the pseudo-count.

The actual transformation is \(a \, \log(y/(sc) + 1)\). Using \(+1\) maps \(y=0\) to exactly zero, so the output remains sparse, and the scaling with \(a=\sqrt{4c}\) ensures that the transformation approximates the \(\operatorname{acosh}\) transformation. Using \(\log(y/(sc) + 1)\) instead of \(\log(y/s+c)\) only changes the results by a constant offset, as \(\log(y/s + c) = \log(y/(sc) + 1) + \log(c)\). Importantly, neither scaling nor offsetting by a constant changes the variance-stabilizing quality of the transformation.
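The formulas above can be checked numerically. The following is a minimal NumPy sketch, not the library's implementation: the helper name `shifted_log_sketch` and the example values are made up for illustration. It evaluates \(a \, \log(y/(sc) + 1)\) with \(a=\sqrt{4c}\) and compares it with the scaled \(\log(y/s + c)\) form.

```python
import numpy as np


def shifted_log_sketch(y, s, c):
    """Illustrative helper (hypothetical name): a * log(y / (s * c) + 1), a = sqrt(4 * c)."""
    a = np.sqrt(4.0 * c)
    return a * np.log1p(y / (s * c))


y = np.array([0.0, 1.0, 5.0, 20.0, 100.0])  # made-up counts
s = 1.0  # size factor
c = 5.0  # pseudo-count; corresponds to overdispersion = 1 / (4 * c) = 0.05

shifted = shifted_log_sketch(y, s, c)
plain = np.sqrt(4.0 * c) * np.log(y / s + c)  # same scaling, but log(y/s + c)

# A zero count maps to exactly zero, so sparse input stays sparse.
zero_stays_zero = shifted[0] == 0.0

# Both variants differ by the same constant, a * log(c), for every count.
offset = plain - shifted
```

In this sketch every entry of `offset` equals \(\sqrt{4c}\,\log(c)\), which is why the choice between the two forms does not affect variance stabilization.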

The size factors are calculated as the sum of counts per cell, normalized by the geometric mean of these sums across cells:

import numpy as np

size_factors = counts.sum(axis=1)  # total count per cell (row)
size_factors = size_factors / np.exp(np.mean(np.log(size_factors)))  # divide by the geometric mean

If counts (\(y\)) is not a matrix, the size factors are fixed to 1.
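As a small worked example of the size-factor calculation, consider a dense toy matrix with cells as rows. The matrix values are invented for illustration, and dividing the counts by the size factors at the end is shown only to illustrate the \(y/s\) term.

```python
import numpy as np

# Toy count matrix: 3 cells (rows) x 3 genes (columns), made-up values.
counts = np.array(
    [
        [1.0, 0.0, 3.0],
        [2.0, 2.0, 4.0],
        [0.0, 8.0, 8.0],
    ]
)

size_factors = counts.sum(axis=1)  # per-cell totals: 4, 8, 16
geometric_mean = np.exp(np.mean(np.log(size_factors)))  # (4 * 8 * 16) ** (1/3) = 8
size_factors = size_factors / geometric_mean  # 0.5, 1, 2

# Each row is divided by its own size factor (the y / s term).
normalized = counts / size_factors[:, None]
```

After normalization the geometric mean of the size factors is 1, so the transformed values stay on the scale of a typical cell. This sketch assumes every cell has at least one count; a cell total of zero would make `np.log` return `-inf`.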

Parameters:
  • counts – The count matrix which can be sparse.

  • overdispersion (default: 0.05) – Specifies how much variation is expected for a homogeneous sample. The overdispersion and pseudo_count are related by overdispersion = 1 / (4 * pseudo_count).

  • pseudo_count (default: None) – Alternative way to specify how much variation is expected for a homogeneous sample. The overdispersion and pseudo_count are related by overdispersion = 1 / (4 * pseudo_count).

  • minimum_overdispersion (default: 0.001) – Lower bound on the overdispersion; avoids overdispersion values smaller than minimum_overdispersion.
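The relation between the three parameters can be sketched as follows. This is an illustration only: the helper `resolve_overdispersion` is hypothetical and not part of pylemur, and it rests on two assumptions, namely that a given pseudo_count takes precedence over overdispersion, and that the lower bound is applied by taking the maximum.

```python
def resolve_overdispersion(overdispersion=0.05, pseudo_count=None, minimum_overdispersion=0.001):
    """Hypothetical helper: combine the three parameters into one overdispersion value."""
    if pseudo_count is not None:
        # Documented relation: overdispersion = 1 / (4 * pseudo_count).
        # Assumption: pseudo_count overrides overdispersion when both are set.
        overdispersion = 1.0 / (4.0 * pseudo_count)
    # Assumption: the lower bound is enforced with a simple maximum.
    return max(overdispersion, minimum_overdispersion)


default_value = resolve_overdispersion()                   # default overdispersion of 0.05
from_pseudo = resolve_overdispersion(pseudo_count=5)       # 1 / (4 * 5) = 0.05
clamped = resolve_overdispersion(pseudo_count=1000)        # 1 / 4000 is below the bound
```

Under these assumptions, the default overdispersion of 0.05 corresponds to a pseudo-count of 5, and a very large pseudo-count is clamped to minimum_overdispersion.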

Returns:

A matrix of variance-stabilized values.