pylemur.pp.shifted_log_transform
- pylemur.pp.shifted_log_transform(counts, overdispersion=0.05, pseudo_count=None, minimum_overdispersion=0.001)
Apply a log transformation to count data.
The transformation is proportional to \(\log(y/s + c)\), where \(y\) are the counts, \(s\) is the size factor, and \(c\) is the pseudo-count.
The actual transformation is \(a \, \log(y/(sc) + 1)\). Using \(+1\) ensures that the output remains sparse (exactly zero) for \(y=0\), and the scaling with \(a=\sqrt{4c}\) ensures that the transformation approximates the \(\operatorname{acosh}\) transformation. Using \(\log(y/(sc) + 1)\) instead of \(\log(y/s+c)\) only changes the results by a constant offset, as \(\log(y/s + c) = \log(y/(sc) + 1) + \log(c)\). Importantly, neither scaling nor offsetting by a constant changes the variance-stabilizing quality of the transformation.
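The identities above can be checked numerically. The following is a minimal NumPy sketch, not taken from pylemur. It assumes the default overdispersion \(\alpha = 0.05\) (so \(c = 1/(4\alpha) = 5\) and \(a = \sqrt{4c} = 1/\sqrt{\alpha}\)), an arbitrary size factor, and the usual Gamma-Poisson form \(\frac{1}{\sqrt{\alpha}}\operatorname{acosh}(2\alpha y/s + 1)\) for the acosh transformation.

```python
import numpy as np

# Assumed example values: default overdispersion, hence pseudo-count c = 1 / (4 * 0.05) = 5.
overdispersion = 0.05
c = 1 / (4 * overdispersion)
a = np.sqrt(4 * c)              # equals 1 / np.sqrt(overdispersion)
s = 1.3                         # arbitrary size factor for one cell

y = np.array([0.0, 1.0, 10.0, 1000.0])

shifted = a * np.log1p(y / (s * c))      # a * log(y/(sc) + 1), exactly 0 where y == 0
plain = a * np.log(y / s + c)            # a * log(y/s + c), equals a * log(c) (not 0) where y == 0
# Assumed Gamma-Poisson acosh transformation: acosh(2*alpha*y/s + 1) / sqrt(alpha).
acosh_vst = np.arccosh(2 * overdispersion * y / s + 1) / np.sqrt(overdispersion)

offset = plain - shifted                 # the same constant a * log(c) for every y
rel_gap = np.abs(shifted[1:] - acosh_vst[1:]) / acosh_vst[1:]

print(np.allclose(offset, a * np.log(c)))   # True
print(rel_gap)                               # relative gap to acosh for y = 1, 10, 1000
```

Zero counts map exactly to zero, the two log forms differ only by the constant \(a\log(c)\), and the relative gap to the acosh transformation shrinks as the counts grow (it is still large for counts near 1).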
The size factors are calculated as the normalized sum per cell, that is, each cell's total count divided by the geometric mean of all cells' totals:
    size_factors = counts.sum(axis=1)
    size_factors = size_factors / np.exp(np.mean(np.log(size_factors)))
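A toy illustration of the size-factor formula, assuming a small dense NumPy array with cells as rows (the counts are made up for illustration):

```python
import numpy as np

# Toy dense count matrix: rows are cells, columns are genes.
counts = np.array([
    [0, 2, 5],
    [1, 0, 0],
    [10, 4, 6],
])

size_factors = counts.sum(axis=1)        # total counts per cell: 7, 1, 20
size_factors = size_factors / np.exp(np.mean(np.log(size_factors)))

# Dividing by the geometric mean makes the geometric mean of the size factors equal to 1.
geo_mean = np.exp(np.mean(np.log(size_factors)))
print(size_factors, geo_mean)
```

A cell with a total count of zero would make `np.log` return `-inf` and break this normalization, so such cells need to be removed beforehand.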
In case \(y\) is not a matrix, the size_factors are fixed to 1.
- Parameters:
counts – The count matrix, which can be sparse. Rows correspond to cells (the size factors are summed along axis=1).
overdispersion (default: 0.05) – Specifies how much variation is expected for a homogeneous sample. The overdispersion and pseudo_count are related by overdispersion = 1 / (4 * pseudo_count).
pseudo_count (default: None) – Alternative way to specify how much variation is expected for a homogeneous sample. The overdispersion and pseudo_count are related by overdispersion = 1 / (4 * pseudo_count).
minimum_overdispersion (default: 0.001) – Avoids overdispersion values smaller than minimum_overdispersion.
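The relation between overdispersion and pseudo_count can be sketched with two hypothetical helpers (they are not part of pylemur). Treating minimum_overdispersion as a lower clamp is an assumption about what "avoids" means here.

```python
def overdispersion_from_pseudo_count(pseudo_count, minimum_overdispersion=0.001):
    """Hypothetical helper: overdispersion = 1 / (4 * pseudo_count).

    Assumption: values below minimum_overdispersion are clamped up to it.
    """
    overdispersion = 1 / (4 * pseudo_count)
    if minimum_overdispersion is not None:
        overdispersion = max(overdispersion, minimum_overdispersion)
    return overdispersion


def pseudo_count_from_overdispersion(overdispersion):
    """Hypothetical helper: invert overdispersion = 1 / (4 * pseudo_count)."""
    return 1 / (4 * overdispersion)


print(pseudo_count_from_overdispersion(0.05))   # default overdispersion -> pseudo-count of about 5
print(overdispersion_from_pseudo_count(1000))   # 0.00025 is below the minimum, so 0.001
```

Under this assumption, a very large pseudo-count (little expected variation) cannot push the overdispersion below the default floor of 0.001.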
- Returns:
A matrix of variance-stabilized values.
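Putting the pieces together, a minimal dense NumPy sketch of the documented behaviour might look like the following. The function name shifted_log_sketch is hypothetical. Sparse input is not handled, and both the precedence of pseudo_count over overdispersion and the clamping to minimum_overdispersion are assumptions, not confirmed details of pylemur.pp.shifted_log_transform.

```python
import numpy as np


def shifted_log_sketch(counts, overdispersion=0.05, pseudo_count=None,
                       minimum_overdispersion=0.001):
    """Hypothetical dense-NumPy sketch of the documented transformation.

    Assumptions: pseudo_count (if given) overrides overdispersion, and
    minimum_overdispersion acts as a lower clamp. Sparse input is not handled.
    """
    counts = np.asarray(counts, dtype=float)
    if pseudo_count is not None:
        overdispersion = 1 / (4 * pseudo_count)
    if minimum_overdispersion is not None:
        overdispersion = max(overdispersion, minimum_overdispersion)
    c = 1 / (4 * overdispersion)       # effective pseudo-count
    a = np.sqrt(4 * c)                 # scaling that mimics the acosh transformation

    if counts.ndim == 2:
        # Normalized sum per cell (rows): the size factors have geometric mean 1.
        sf = counts.sum(axis=1)
        sf = sf / np.exp(np.mean(np.log(sf)))
        sf = sf[:, np.newaxis]         # one factor per row, broadcast across genes
    else:
        sf = 1.0                       # non-matrix input: size factors fixed to 1

    # a * log(y / (s * c) + 1); log1p keeps zeros exactly zero.
    return a * np.log1p(counts / (sf * c))


counts = np.array([[0, 2, 5], [1, 0, 0], [10, 4, 6]])
transformed = shifted_log_sketch(counts)
print(transformed)
```

Zero entries in counts stay exactly zero in the output, which is what allows a sparse input to keep its sparsity pattern.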