pylemur.tl.LEMUR

class pylemur.tl.LEMUR(adata, design='~ 1', obs_data=None, n_embedding=15, linear_coefficient_estimator='linear', layer=None, copy=True)

Fit the LEMUR model

A Python implementation of the LEMUR algorithm. For more details, see Ahlmann-Eltze (2024).

Parameters:
  • adata – The AnnData object (or a different matrix container) with the variance-stabilized data and the cell-wise annotations in adata.obs.

  • design (str | list[str] | ndarray (default: '~ 1')) – A specification of the experimental design. This can be a string, which is then parsed using formulaic. Alternatively, it can be a list of strings, which are assumed to refer to the columns in adata.obs. Finally, it can be a numpy array representing a design matrix of size n_cells x n_covariates. If not provided, a constant design is used.

  • obs_data (Union[DataFrame, Mapping[str, Iterable[Any]], None] (default: None)) – A pandas DataFrame or a dictionary of iterables containing the cell-wise annotations. It is used in combination with the information in adata.obs.

  • n_embedding (int (default: 15)) – The number of dimensions to use for the shared embedding space.

  • linear_coefficient_estimator (Literal['linear', 'zero'] (default: 'linear')) – The method to use for estimating the linear coefficients. If "linear", the linear coefficients are estimated using ridge regression. If "zero", the linear coefficients are set to zero.

  • layer (Optional[str] (default: None)) – The name of the layer to use in adata. If None, the X slot is used.

  • copy (bool (default: True)) – Whether to make a copy of adata.
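The array form of design is a matrix of size n_cells x n_covariates. As a rough sketch of what such a matrix might contain for a single categorical covariate, the pure-Python helper below builds an intercept column plus one indicator column per non-reference level (treatment coding). treatment_design is a hypothetical name and not part of pylemur, the column names are illustrative only, and the nested list would still need to be converted (for example with numpy.asarray) before being passed as design.

```python
def treatment_design(labels):
    """Build an intercept + treatment-coded design matrix for one categorical column.

    Hypothetical helper for illustration; not part of pylemur.
    """
    # The alphabetically first level acts as the reference level.
    levels = sorted(set(labels))
    columns = ["Intercept"] + [f"label[T.{lv}]" for lv in levels[1:]]
    # One row per cell: a leading 1 for the intercept, then one indicator per
    # non-reference level.
    rows = [
        [1.0] + [1.0 if lab == lv else 0.0 for lv in levels[1:]]
        for lab in labels
    ]
    return columns, rows


labels = ["ctrl", "treated", "ctrl", "treated", "treated"]
columns, design = treatment_design(labels)

for name, col in zip(columns, zip(*design)):
    print(name, col)
```

Each row corresponds to one cell and each column to one covariate, which is the n_cells x n_covariates layout described for the array form of design.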

Variables:
  • embedding (ndarray (\(C \times P\))) – Low-dimensional representation of each cell

  • adata (AnnData) – A reference to (potentially a copy of) the input data.

  • data_matrix (ndarray (\(C \times G\))) – A reference to the data matrix from the adata object.

  • n_embedding (int) – The number of latent dimensions

  • design_matrix (ModelMatrix (\(C \times K\))) – The design matrix that is used for the fit.

  • formula (str) – The design formula specification.

  • coefficients (ndarray (\(P \times G \times K\))) – The 3D array of coefficients for the Grassmann regression.

  • alignment_coefficients (ndarray (\(P \times (P+1) \times K\))) – The 3D array of coefficients for the affine alignment.

  • linear_coefficients (ndarray (\(K\times G\))) – The 2D array of coefficients for the linear offset per condition.

  • linear_coefficient_estimator (str) – The linear coefficient estimation specification.

  • base_point (ndarray (\(P \times G\))) – The 2D array representing the reference subspace.
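The dimension letters in the shapes above are not spelled out. Reading them as C = number of cells, G = number of genes, P = n_embedding and K = number of design-matrix columns (an interpretation inferred from the attribute descriptions, not stated explicitly), the expected shapes for a given fit can be tabulated as in the sketch below. expected_shapes is a hypothetical helper, not part of pylemur.

```python
def expected_shapes(n_cells, n_genes, n_embedding, n_covariates):
    """Return the documented array shapes for given problem dimensions.

    Hypothetical helper; it only restates the shapes listed under Variables.
    """
    C, G, P, K = n_cells, n_genes, n_embedding, n_covariates
    return {
        "embedding": (C, P),
        "data_matrix": (C, G),
        "design_matrix": (C, K),
        "coefficients": (P, G, K),
        "alignment_coefficients": (P, P + 1, K),
        "linear_coefficients": (K, G),
        "base_point": (P, G),
    }


# Example: 1000 cells, 2000 genes, 15 latent dimensions, 3 design columns.
shapes = expected_shapes(1000, 2000, 15, 3)
for name, shape in shapes.items():
    print(f"{name}: {shape}")
```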

Examples

>>> model = pylemur.tl.LEMUR(adata, design="~ label + batch_cov", n_embedding=15)
>>> model.fit()
>>> model.align_with_harmony()
>>> pred_expr = model.predict(new_condition=model.cond(label="treated"))
>>> emb_proj = model.transform(adata)

Methods table

align_with_grouping(grouping[, ...])

Fine-tune the embedding using annotated groups of cells.

align_with_harmony([ridge_penalty, ...])

Fine-tune the embedding with a parametric version of Harmony.

cond(**kwargs)

Define a condition for the predict function.

copy([copy_adata])

fit([verbose])

Fit the LEMUR model

predict([embedding, new_design, ...])

Predict the expression of cells in a specific condition

transform(adata[, layer, obs_data, return_type])

Transform data using the fitted LEMUR model

Methods

LEMUR.align_with_grouping(grouping, ridge_penalty=0.01, preserve_position_of_NAs=False, verbose=True)

Fine-tune the embedding using annotated groups of cells.

Parameters:
  • grouping (list | ndarray | Series) – A list, ndarray, or pandas.Series specifying the group of each cell. The groups span different conditions and can, for example, be cell types.

  • ridge_penalty (float | list[float] | ndarray (default: 0.01)) – The penalty controlling the flexibility of the alignment.

  • preserve_position_of_NAs (bool (default: False)) – If True, cells whose grouping is NA stay where they are (if possible). If False, they are free to move.

  • verbose (bool (default: True)) – Whether to print progress to the console.

Returns:

self – The fitted LEMUR model, with the updated embedding stored in model.embedding and the updated alignment coefficients stored in model.alignment_coefficients.

LEMUR.align_with_harmony(ridge_penalty=0.01, max_iter=10, verbose=True)

Fine-tune the embedding with a parametric version of Harmony.

Parameters:
  • ridge_penalty (float | list[float] | ndarray (default: 0.01)) – The penalty controlling the flexibility of the alignment. Smaller values mean more flexible alignments.

  • max_iter (int (default: 10)) – The maximum number of iterations to perform.

  • verbose (bool (default: True)) – Whether to print progress to the console.

Returns:

self – The fitted LEMUR model, with the updated embedding stored in model.embedding and the updated alignment coefficients stored in model.alignment_coefficients.
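To give some intuition for why a smaller ridge_penalty means a more flexible fit, the sketch below uses a generic one-parameter ridge regression without intercept. It is not pylemur's alignment routine, and how pylemur scales or applies the penalty internally is not stated here; ridge_slope is a hypothetical name.

```python
def ridge_slope(x, y, ridge_penalty):
    """Closed-form ridge estimate for y ~ slope * x (no intercept).

    Generic illustration of ridge shrinkage; not pylemur's implementation.
    """
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    # The penalty is added to the denominator, which pulls the slope towards 0.
    return sxy / (sxx + ridge_penalty)


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]  # exactly y = 2 * x

slopes = {p: ridge_slope(x, y, p) for p in (0.0, 0.01, 1.0, 100.0)}
for penalty, slope in slopes.items():
    print(f"ridge_penalty={penalty}: slope={slope:.4f}")
```

With no penalty the slope matches the data exactly; as the penalty grows, the estimate shrinks towards zero, so the fit has less freedom to follow the data.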

LEMUR.cond(**kwargs)

Define a condition for the predict function.

Parameters:

kwargs – Named arguments specifying the levels of the covariates from the design formula. If a covariate is not specified, the first level is used.

Returns:

pd.Series – A contrast vector that aligns to the columns of the design matrix.

Notes

Subtracting the results of two cond(...) calls produces a contrast vector; such vectors are commonly used in R to test for differences in a regression model. This pattern is inspired by the R package glmGamPoi.
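The subtraction pattern can be illustrated with a toy re-implementation in pure Python. The sketch assumes a treatment-coded design with an intercept; toy_cond, subtract, and the column names are hypothetical and do not reproduce pylemur's internals. Covariates that are not specified fall back to their first level, as described above.

```python
def toy_cond(design_levels, **kwargs):
    """Return a mapping from design-matrix column to value for one condition.

    Hypothetical stand-in for LEMUR.cond, assuming treatment coding.
    """
    vec = {"Intercept": 1.0}
    for covariate, levels in design_levels.items():
        # Unspecified covariates default to their first level.
        chosen = kwargs.get(covariate, levels[0])
        if chosen not in levels:
            raise ValueError(f"unknown level {chosen!r} for {covariate!r}")
        for lv in levels[1:]:
            vec[f"{covariate}[T.{lv}]"] = 1.0 if chosen == lv else 0.0
    return vec


def subtract(a, b):
    """Element-wise difference of two condition vectors with the same columns."""
    return {key: a[key] - b[key] for key in a}


design_levels = {"label": ["ctrl", "treated"], "batch": ["b1", "b2"]}
treated = toy_cond(design_levels, label="treated")
control = toy_cond(design_levels, label="ctrl")
contrast = subtract(treated, control)
print(contrast)
```

The intercept and the batch column cancel, so the contrast isolates the column that distinguishes the treated from the control condition.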

LEMUR.copy(copy_adata=True)

LEMUR.fit(verbose=True)

Fit the LEMUR model

Parameters:

verbose (bool (default: True)) – Whether to print progress to the console.

Returns:

self – The fitted LEMUR model.

LEMUR.predict(embedding=None, new_design=None, new_condition=None, obs_data=None, new_adata_layer=None)

Predict the expression of cells in a specific condition

Parameters:
  • embedding (Optional[ndarray] (default: None)) – The coordinates of the cells in the shared embedding space. If None, the coordinates stored in model.embedding are used.

  • new_design (Union[str, list[str], ndarray, None] (default: None)) – Either a design formula parsed using model.adata.obs and obs_data or a design matrix defining the condition for each cell. If both new_design and new_condition are None, the original design matrix (model.design_matrix) is used.

  • new_condition (Union[ndarray, DataFrame, None] (default: None)) – A specification of the new condition that is applied to all cells. Typically, this is generated by cond(...).

  • obs_data (Union[DataFrame, Mapping[str, Iterable[Any]], None] (default: None)) – A DataFrame-like object containing cell-wise annotations. It is only used if new_design contains a formulaic formula string.

  • new_adata_layer (Optional[str] (default: None)) – If not None, the function returns self and stores the prediction in model.adata.layers[new_adata_layer].

Returns:

array-like, shape (n_cells, n_genes) – The predicted expression of the cells in the new condition.
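The interplay of new_design and new_condition can be sketched as a small resolution step. This is a simplified illustration in pure Python, not pylemur's code: resolve_design is a hypothetical name, only the matrix form of new_design is handled (formula strings are not), and the sketch simply rejects the case where both arguments are given, because the text above does not say how that case is treated.

```python
def resolve_design(design_matrix, new_design=None, new_condition=None):
    """Pick the per-cell design rows that a prediction would be based on.

    Hypothetical sketch of the documented behaviour; not pylemur's implementation.
    """
    n_cells = len(design_matrix)
    if new_design is not None and new_condition is not None:
        # Assumption of this sketch: treat the ambiguous case as an error.
        raise ValueError("pass only one of new_design and new_condition")
    if new_condition is not None:
        # A single condition vector is applied to every cell.
        return [list(new_condition) for _ in range(n_cells)]
    if new_design is not None:
        if len(new_design) != n_cells:
            raise ValueError("new_design needs one row per cell")
        return [list(row) for row in new_design]
    # Neither argument given: fall back to the design used for the fit.
    return [list(row) for row in design_matrix]


fit_design = [[1.0, 0.0], [1.0, 1.0], [1.0, 0.0]]

same_as_fit = resolve_design(fit_design)
all_treated = resolve_design(fit_design, new_condition=[1.0, 1.0])
print(same_as_fit)
print(all_treated)
```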

LEMUR.transform(adata, layer=None, obs_data=None, return_type='embedding')

Transform data using the fitted LEMUR model

Parameters:
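  • adata (AnnData) – The AnnData object to transform.

  • layer (Optional[str] (default: None)) – The name of the layer to use in adata. If None, the X slot is used (same as layer in the constructor).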

  • obs_data (Union[DataFrame, Mapping[str, Iterable[Any]], None] (default: None)) – Optional set of annotations for each cell (same as obs_data in the constructor).

  • return_type (Literal['embedding', 'LEMUR'] (default: 'embedding')) – Flag that decides if the function returns a full LEMUR object or only the embedding.

Returns:

LEMUR

(if return_type = "LEMUR") A new LEMUR object with the embedding calculated for the input adata.

ndarray

(if return_type = "embedding") A 2D numpy array of the embedding matrix calculated for the input adata (with cells in the rows and latent dimensions in the columns).