pylemur.tl.LEMUR

class pylemur.tl.LEMUR(adata, design='~ 1', obs_data=None, n_embedding=15, linear_coefficient_estimator='linear', layer=None, copy=True)#

Fit the LEMUR model

A Python implementation of the LEMUR algorithm. For more details, see Ahlmann-Eltze (2024).

Parameters:

adata – The AnnData object (or a different matrix container) with the variance-stabilized data and the cell-wise annotations in adata.obs.
design (str | list[str] | ndarray (default: '~ 1')) – A specification of the experimental design. This can be a string, which is then parsed using formulaic. Alternatively, it can be a list of strings, which are assumed to refer to the columns in adata.obs. Finally, it can be a numpy array representing a design matrix of size n_cells x n_covariates. If not provided, a constant design is used.
obs_data (Union[DataFrame, Mapping[str, Iterable[Any]], None] (default: None)) – A pandas DataFrame or a dictionary of iterables containing the cell-wise annotations. It is used in combination with the information in adata.obs.
n_embedding (int (default: 15)) – The number of dimensions to use for the shared embedding space.
linear_coefficient_estimator (Literal['linear', 'zero'] (default: 'linear')) – The method to use for estimating the linear coefficients. If "linear", the linear coefficients are estimated using ridge regression. If "zero", the linear coefficients are set to zero.
layer (Optional[str] (default: None)) – The name of the layer to use in adata. If None, the X slot is used.
copy (bool (default: True)) – Whether to make a copy of adata.
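
To make the design argument more concrete, the sketch below builds a small design matrix by hand for a single categorical covariate, using an intercept column plus one indicator column per non-reference level. The helper treatment_design is hypothetical and is not part of pylemur or formulaic; the column names only imitate the treatment-coding convention, and taking the alphabetically first level as the reference is an assumption of this sketch.

```python
def treatment_design(labels):
    """Hypothetical helper: intercept-plus-indicator coding for one covariate."""
    # Assumption: the alphabetically first level acts as the reference level.
    levels = sorted(set(labels))
    columns = ["Intercept"] + [f"label[T.{lvl}]" for lvl in levels[1:]]
    rows = [
        [1.0] + [1.0 if lab == lvl else 0.0 for lvl in levels[1:]]
        for lab in labels
    ]
    return columns, rows


# One annotation per cell (toy data, four cells).
labels = ["ctrl", "treated", "ctrl", "treated"]
columns, design = treatment_design(labels)

for name in columns:
    print(name)
for row in design:
    print(row)
```

A matrix like design (n_cells x n_covariates) corresponds to the third form the design argument accepts, once converted to a numpy array.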

Variables:

embedding (ndarray (\(C \times P\))) – Low-dimensional representation of each cell
adata (AnnData) – A reference to (potentially a copy of) the input data.
data_matrix (ndarray (\(C \times G\))) – A reference to the data matrix from the adata object.
n_embedding (int) – The number of latent dimensions
design_matrix (ModelMatrix (\(C \times K\))) – The design matrix that is used for the fit.
formula (str) – The design formula specification.
coefficients (ndarray (\(P \times G \times K\))) – The 3D array of coefficients for the Grassmann regression.
alignment_coefficients (ndarray (\(P \times (P+1) \times K\))) – The 3D array of coefficients for the affine alignment.
linear_coefficients (ndarray (\(K\times G\))) – The 2D array of coefficients for the linear offset per condition.
linear_coefficient_estimator (str) – The linear coefficient estimation specification.
base_point (ndarray (\(P \times G\))) – The 2D array representing the reference subspace.
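
The shapes listed above hint at how the pieces fit together. As an illustration of the shape bookkeeping only (an assumption based on the documented dimensions, not pylemur's own code), multiplying a C x K design matrix by the K x G linear coefficients gives one linear offset row per cell. The matmul helper and all values are made up for this sketch.

```python
def matmul(a, b):
    """Plain-Python matrix product of two nested lists (toy sizes only)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Toy sizes: C = 3 cells, K = 2 design columns, G = 2 genes.
design_matrix = [[1.0, 0.0], [1.0, 1.0], [1.0, 0.0]]   # C x K
linear_coefficients = [[2.0, 5.0], [0.5, -1.0]]        # K x G

# Assumed relation: per-cell linear offset, shape C x G.
offset = matmul(design_matrix, linear_coefficients)
for row in offset:
    print(row)
```

Cells that share a design row (here the first and third) receive the same offset, which matches the description of a linear offset per condition.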

Examples

>>> model = pylemur.tl.LEMUR(adata, design="~ label + batch_cov", n_embedding=15)
>>> model.fit()
>>> model.align_with_harmony()
>>> pred_expr = model.predict(new_condition=model.cond(label="treated"))
>>> emb_proj = model.transform(adata)

Methods table#

`align_with_grouping`(grouping[, ...])	Fine-tune the embedding using annotated groups of cells.
`align_with_harmony`([ridge_penalty, ...])	Fine-tune the embedding with a parametric version of Harmony.
`cond`(**kwargs)	Define a condition for the `predict` function.
`copy`([copy_adata])
`fit`([verbose])	Fit the LEMUR model
`predict`([embedding, new_design, ...])	Predict the expression of cells in a specific condition
`transform`(adata[, layer, obs_data, return_type])	Transform data using the fitted LEMUR model

Methods#

LEMUR.align_with_grouping(grouping, ridge_penalty=0.01, preserve_position_of_NAs=False, verbose=True)#

Fine-tune the embedding using annotated groups of cells.

Parameters:

grouping (list | ndarray | Series) – A list, ndarray, or pandas.Series specifying the group of each cell. The groups span different conditions and can be, for example, cell types.
ridge_penalty (float | list[float] | ndarray (default: 0.01)) – The penalty controlling the flexibility of the alignment.
preserve_position_of_NAs (bool (default: False)) – If True, NAs in the grouping indicate that these cells should stay where they are (if possible). If False, they are free to move around.
verbose (bool (default: True)) – Whether to print progress to the console.

Returns:

self The fitted LEMUR model with the updated embedding space stored in the model.embedding attribute and the updated alignment coefficients stored in model.alignment_coefficients.

LEMUR.align_with_harmony(ridge_penalty=0.01, max_iter=10, verbose=True)#

Fine-tune the embedding with a parametric version of Harmony.

Parameters:

ridge_penalty (float | list[float] | ndarray (default: 0.01)) – The penalty controlling the flexibility of the alignment. Smaller values mean more flexible alignments.
max_iter (int (default: 10)) – The maximum number of iterations to perform.
verbose (bool (default: True)) – Whether to print progress to the console.

Returns:

self The fitted LEMUR model with the updated embedding space stored in the model.embedding attribute and the updated alignment coefficients stored in model.alignment_coefficients.
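
The effect of a ridge penalty can be seen in a one-parameter toy problem. The sketch below is generic ridge regression for a single slope, not the alignment procedure used by pylemur; the function ridge_slope and the data are made up for illustration. It shows why a smaller penalty leaves the fit freer to follow the data, while a larger one shrinks the coefficient towards zero.

```python
def ridge_slope(x, y, penalty):
    """Closed-form ridge estimate of a single slope (no intercept)."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + penalty)


x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]  # generated as exactly y = 2 * x

penalties = [0.0, 0.01, 1.0, 14.0, 100.0]
slopes = [ridge_slope(x, y, p) for p in penalties]
for p, s in zip(penalties, slopes):
    print(f"penalty={p:>6}: slope={s:.4f}")
```

With no penalty the estimate recovers the true slope; as the penalty grows, the estimate moves steadily towards zero.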

LEMUR.cond(**kwargs)#

Define a condition for the predict function.

Parameters:

kwargs – Named arguments specifying the levels of the covariates from the design formula. If a covariate is not specified, the first level is used.

Returns:

pd.Series A contrast vector that aligns to the columns of the design matrix.

Notes

Subtracting two cond(...) calls produces a contrast vector; such vectors are commonly used in R to test for differences in a regression model. This pattern is inspired by the R package glmGamPoi.
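
The idea of subtracting two conditions can be sketched without pylemur. In the example below, cond_vector is a hypothetical stand-in for cond(...) that works on a fixed, made-up list of design columns; unspecified covariates fall back to their reference level, as described above. The real method returns a pd.Series, whereas this sketch uses plain lists.

```python
# Made-up design columns for a formula like "~ label + batch".
COLUMNS = ["Intercept", "label[T.treated]", "batch[T.b2]"]


def cond_vector(**levels):
    """Hypothetical stand-in for cond(...): one entry per design column."""
    vec = []
    for col in COLUMNS:
        if col == "Intercept":
            vec.append(1.0)
        else:
            # "label[T.treated]" -> covariate "label", level "treated"
            name, level = col.rstrip("]").split("[T.")
            vec.append(1.0 if levels.get(name) == level else 0.0)
    return vec


treated = cond_vector(label="treated")  # batch not given -> reference level
control = cond_vector(label="ctrl")
contrast = [t - c for t, c in zip(treated, control)]
print(contrast)
```

The intercept cancels in the difference, so the contrast isolates the column that distinguishes the two conditions.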

LEMUR.copy(copy_adata=True)#

LEMUR.fit(verbose=True)#

Fit the LEMUR model

Parameters:

verbose (bool (default: True)) – Whether to print progress to the console.

Returns:

self The fitted LEMUR model.

LEMUR.predict(embedding=None, new_design=None, new_condition=None, obs_data=None, new_adata_layer=None)#

Predict the expression of cells in a specific condition

Parameters:

embedding (Optional[ndarray] (default: None)) – The coordinates of the cells in the shared embedding space. If None, the coordinates stored in model.embedding are used.
new_design (Union[str, list[str], ndarray, None] (default: None)) – Either a design formula parsed using model.adata.obs and obs_data or a design matrix defining the condition for each cell. If both new_design and new_condition are None, the original design matrix (model.design_matrix) is used.
new_condition (Union[ndarray, DataFrame, None] (default: None)) – A specification of the new condition that is applied to all cells. Typically, this is generated by cond(...).
obs_data (Union[DataFrame, Mapping[str, Iterable[Any]], None] (default: None)) – A DataFrame-like object containing cell-wise annotations. It is only used if new_design contains a formulaic formula string.
new_adata_layer (Optional[str] (default: None)) – If not None, the function returns self and stores the prediction in model.adata.layers[new_adata_layer].

Returns:

array-like, shape (n_cells, n_genes) The predicted expression of the cells in the new condition.
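
The interplay of new_design and new_condition can be summarized in a few lines. The function resolve_design below is a hypothetical sketch of the documented behaviour, not pylemur's code: with neither argument it falls back to the original design matrix, and a single condition vector is repeated for every cell. The documentation above does not say what happens when both arguments are given, so raising an error in that case is purely an assumption of this sketch.

```python
def resolve_design(original_design, new_design=None, new_condition=None):
    """Hypothetical sketch: pick the design matrix used for a prediction."""
    if new_design is None and new_condition is None:
        # Documented fallback: use the design matrix from the fit.
        return original_design
    if new_condition is not None:
        if new_design is not None:
            # Assumption of this sketch; the actual behaviour is not documented here.
            raise ValueError("Specify only one of new_design and new_condition.")
        # The same condition is applied to all cells.
        return [list(new_condition) for _ in original_design]
    return new_design


original = [[1.0, 0.0], [1.0, 1.0], [1.0, 0.0]]  # 3 cells x 2 design columns
all_treated = resolve_design(original, new_condition=[1.0, 1.0])
for row in all_treated:
    print(row)
```

Every cell keeps its own embedding coordinates; only the condition it is predicted under changes.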

LEMUR.transform(adata, layer=None, obs_data=None, return_type='embedding')#

Transform data using the fitted LEMUR model

Parameters:

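adata (AnnData) – The AnnData object to transform.
layer (Optional[str] (default: None)) – The name of the layer to use in adata. If None, the X slot is used.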
obs_data (Union[DataFrame, Mapping[str, Iterable[Any]], None] (default: None)) – Optional set of annotations for each cell (same as obs_data in the constructor).
return_type (Literal['embedding', 'LEMUR'] (default: 'embedding')) – Flag that decides if the function returns a full LEMUR object or only the embedding.

Returns:

LEMUR: (if return_type = "LEMUR") A new LEMUR object with the embedding calculated for the input adata.
ndarray: (if return_type = "embedding") A 2D numpy array of the embedding matrix calculated for the input adata (with cells in the rows and latent dimensions in the columns).
