pylemur.tl.LEMUR
- class pylemur.tl.LEMUR(adata, design='~ 1', obs_data=None, n_embedding=15, linear_coefficient_estimator='linear', layer=None, copy=True)
Fit the LEMUR model.
A Python implementation of the LEMUR algorithm. For more details, please refer to Ahlmann-Eltze (2024).
- Parameters:
adata – The AnnData object (or a different matrix container) with the variance-stabilized data and the cell-wise annotations in adata.obs.
design (str | list[str] | ndarray (default: '~ 1')) – A specification of the experimental design. This can be a string, which is then parsed using formulaic. Alternatively, it can be a list of strings, which are assumed to refer to the columns in adata.obs. Finally, it can be a numpy array representing a design matrix of size n_cells × n_covariates. If not provided, a constant design is used.
obs_data (Union[DataFrame, Mapping[str, Iterable[Any]], None] (default: None)) – A pandas DataFrame or a dictionary of iterables containing the cell-wise annotations. It is used in combination with the information in adata.obs.
n_embedding (int (default: 15)) – The number of dimensions to use for the shared embedding space.
linear_coefficient_estimator (Literal['linear', 'zero'] (default: 'linear')) – The method to use for estimating the linear coefficients. If "linear", the linear coefficients are estimated using ridge regression. If "zero", the linear coefficients are set to zero.
layer (Optional[str] (default: None)) – The name of the layer to use in adata. If None, the X slot is used.
copy (bool (default: True)) – Whether to make a copy of adata.
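The two options of linear_coefficient_estimator can be illustrated with a small numpy sketch. It assumes a treatment-coded design (an intercept plus one indicator column per non-reference level), which is only a stand-in for what formulaic produces from a formula; the helper names treatment_design and estimate_linear_coefficients and the small default ridge_penalty are hypothetical, not part of pylemur.

```python
import numpy as np


def treatment_design(labels):
    """Hypothetical helper: intercept plus one indicator column per non-reference level."""
    labels = np.asarray(labels)
    levels = sorted(set(labels.tolist()))  # first level in sorted order acts as the reference
    columns = [np.ones(len(labels))]
    columns += [(labels == level).astype(float) for level in levels[1:]]
    return np.column_stack(columns)  # shape: n_cells x n_covariates (C x K)


def estimate_linear_coefficients(X, Y, estimator="linear", ridge_penalty=1e-6):
    """Hypothetical helper returning a K x G matrix of per-condition offsets."""
    K = X.shape[1]
    if estimator == "zero":
        # "zero": no linear offset at all
        return np.zeros((K, Y.shape[1]))
    # "linear": ridge regression, (X^T X + lambda * I)^{-1} X^T Y
    return np.linalg.solve(X.T @ X + ridge_penalty * np.eye(K), X.T @ Y)


X = treatment_design(["ctrl", "ctrl", "treated", "treated"])  # 4 x 2
Y = np.array([[1.0, 2.0], [1.0, 2.0], [3.0, 2.5], [3.0, 2.5]])  # 4 cells x 2 genes
B = estimate_linear_coefficients(X, Y)  # approximately [[1.0, 2.0], [2.0, 0.5]]
```

The first row of B is the reference-level mean and the second row is the shift for "treated"; with a tiny penalty the ridge estimate is essentially ordinary least squares.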
- Variables:
embedding (ndarray (\(C \times P\))) – Low-dimensional representation of each cell.
adata (AnnData) – A reference to (potentially a copy of) the input data.
data_matrix (ndarray (\(C \times G\))) – A reference to the data matrix from the adata object.
n_embedding (int) – The number of latent dimensions.
design_matrix (ModelMatrix (\(C \times K\))) – The design matrix that is used for the fit.
formula (str) – The design formula specification.
coefficients (ndarray (\(P \times G \times K\))) – The 3D array of coefficients for the Grassmann regression.
alignment_coefficients (ndarray (\(P \times (P+1) \times K\))) – The 3D array of coefficients for the affine alignment.
linear_coefficients (ndarray (\(K \times G\))) – The 2D array of coefficients for the linear offset per condition.
linear_coefficient_estimator (str) – The linear coefficient estimation specification.
base_point (ndarray (\(P \times G\))) – The 2D array representing the reference subspace.
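To make the shapes of base_point (P × G) and coefficients (P × G × K) more concrete, here is a numpy sketch of the standard Grassmann exponential map, written in the row convention used by these attributes. We assume, without having checked pylemur's internals, that the subspace for a design vector x of length K is obtained by contracting coefficients with x and mapping the resulting tangent vector from base_point; the function names grassmann_exp and condition_subspace are hypothetical.

```python
import numpy as np


def grassmann_exp(base_point, tangent):
    """Standard Grassmann exponential map, row convention.

    base_point: P x G with orthonormal rows.
    tangent:    P x G whose rows are orthogonal to the rows of base_point.
    Returns a P x G matrix with orthonormal rows.
    """
    U = base_point.T  # G x P, orthonormal columns
    H = tangent.T     # G x P, with U.T @ H == 0
    Q, sigma, Wt = np.linalg.svd(H, full_matrices=False)  # H = Q diag(sigma) Wt
    W = Wt.T
    moved = U @ W @ np.diag(np.cos(sigma)) @ Wt + Q @ np.diag(np.sin(sigma)) @ Wt
    return moved.T  # back to P x G


def condition_subspace(base_point, coefficients, x):
    """Assumed construction: tangent = sum_k coefficients[:, :, k] * x[k]."""
    tangent = np.einsum("pgk,k->pg", coefficients, np.asarray(x, dtype=float))
    return grassmann_exp(base_point, tangent)
```

A zero design vector yields a zero tangent, so the condition subspace falls back to base_point; any other x moves the subspace while keeping its rows orthonormal.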
Examples
>>> model = pylemur.tl.LEMUR(adata, design="~ label + batch_cov", n_embedding=15)
>>> model.fit()
>>> model.align_with_harmony()
>>> pred_expr = model.predict(new_condition=model.cond(label="treated"))
>>> emb_proj = model.transform(adata)
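Methods table
| Method | Description |
|---|---|
| align_with_grouping(grouping[, ridge_penalty, ...]) | Fine-tune the embedding using annotated groups of cells. |
| align_with_harmony([ridge_penalty, max_iter, ...]) | Fine-tune the embedding with a parametric version of Harmony. |
| cond(**kwargs) | Define a condition for the predict function. |
| copy([copy_adata]) | |
| fit([verbose]) | Fit the LEMUR model. |
| predict([embedding, new_design, ...]) | Predict the expression of cells in a specific condition. |
| transform(adata[, layer, obs_data, ...]) | Transform data using the fitted LEMUR model. |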
Methods
- LEMUR.align_with_grouping(grouping, ridge_penalty=0.01, preserve_position_of_NAs=False, verbose=True)
Fine-tune the embedding using annotated groups of cells.
- Parameters:
grouping (list | ndarray | Series) – A list, ndarray, or pandas.Series specifying the group of each cell. The groups span different conditions and can, for example, be cell types.
ridge_penalty (float | list[float] | ndarray (default: 0.01)) – The penalty controlling the flexibility of the alignment.
preserve_position_of_NAs (bool (default: False)) – True means that NAs in the grouping indicate that these cells should stay where they are (if possible). False means that they are free to move around.
verbose (bool (default: True)) – Whether to print progress to the console.
- Returns:
self – The fitted LEMUR model, with the updated embedding stored in the model.embedding attribute and the updated alignment coefficients stored in model.alignment_coefficients.
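The role of ridge_penalty can be illustrated with a generic ridge-regularized affine fit. This is a sketch under stated assumptions, not pylemur's algorithm: we assume each cell has a target position in the embedding (for example, a centroid of its group shared across conditions), that the penalty shrinks the affine map towards the identity, and that the intercept is the first of the P + 1 columns. The real alignment_coefficients additionally vary with the K design columns; the sketch fits a single map, and the function names are hypothetical.

```python
import numpy as np


def fit_affine_alignment(Z, targets, ridge_penalty=0.01):
    """Hypothetical: affine map A (P x (P+1)) moving Z towards targets, shrunk to identity.

    Minimizes ||targets - X A^T||^2 + ridge_penalty * ||A - A0||^2,
    where X = [1, Z] and A0 = [0, I] (the map that leaves cells in place).
    """
    n_cells, P = Z.shape
    X = np.hstack([np.ones((n_cells, 1)), Z])       # C x (P+1), intercept first (assumed)
    A0 = np.hstack([np.zeros((P, 1)), np.eye(P)])  # identity map: no movement
    lhs = X.T @ X + ridge_penalty * np.eye(P + 1)
    rhs = X.T @ targets + ridge_penalty * A0.T
    return np.linalg.solve(lhs, rhs).T              # P x (P+1)


def apply_alignment(Z, A):
    """Apply z -> A[:, 1:] @ z + A[:, 0] to every row of Z."""
    return A[:, 0] + Z @ A[:, 1:].T
```

With a large penalty the fitted map stays close to the identity and cells barely move; with a small penalty it reproduces the targets closely. This mirrors the statement for align_with_harmony that smaller values mean more flexible alignments.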
- LEMUR.align_with_harmony(ridge_penalty=0.01, max_iter=10, verbose=True)
Fine-tune the embedding with a parametric version of Harmony.
- Parameters:
ridge_penalty (float | list[float] | ndarray (default: 0.01)) – The penalty controlling the flexibility of the alignment. Smaller values mean more flexible alignments.
max_iter (int (default: 10)) – The maximum number of iterations to perform.
verbose (bool (default: True)) – Whether to print progress to the console.
- Returns:
self – The fitted LEMUR model, with the updated embedding stored in the model.embedding attribute and the updated alignment coefficients stored in model.alignment_coefficients.
- LEMUR.cond(**kwargs)
Define a condition for the predict function.
- Parameters:
kwargs – Named arguments specifying the levels of the covariates from the design formula. If a covariate is not specified, the first level is used.
- Returns:
pd.Series – A contrast vector that aligns to the columns of the design matrix.
Notes
Subtracting two cond(...) calls produces a contrast vector; such contrast vectors are commonly used in R to test for differences in a regression model. This pattern is inspired by the R package glmGamPoi.
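The subtraction pattern from the Notes can be shown with a toy example. It assumes a "~ label" design with reference level "ctrl" and formulaic-style column names; the cond function below is a hypothetical stand-in written for illustration, not LEMUR.cond itself, and the coefficient values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical design columns for "~ label" with levels "ctrl" (reference) and "treated".
DESIGN_COLUMNS = ["Intercept", "label[T.treated]"]


def cond(label="ctrl"):
    """Hypothetical stand-in for LEMUR.cond: a condition vector aligned to DESIGN_COLUMNS."""
    vec = pd.Series(0.0, index=DESIGN_COLUMNS)
    vec["Intercept"] = 1.0
    if label != "ctrl":  # the reference level only switches on the intercept
        column = f"label[T.{label}]"
        if column not in vec.index:
            raise ValueError(f"unknown level: {label}")
        vec[column] = 1.0
    return vec


# The shared intercept cancels, leaving only the treatment indicator.
contrast = cond(label="treated") - cond(label="ctrl")

linear_coefficients = np.array([[1.0, 2.0], [2.0, 0.5]])  # K x G, made-up values
diff = contrast.to_numpy() @ linear_coefficients  # -> [2.0, 0.5]
```

Applied to the K × G linear coefficients, the contrast picks out the per-gene difference between "treated" and "ctrl".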
- LEMUR.copy(copy_adata=True)
- LEMUR.fit(verbose=True)
Fit the LEMUR model.
- Parameters:
verbose (bool (default: True)) – Whether to print progress to the console.
- Returns:
self – The fitted LEMUR model.
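The relationship between embedding (C × P) and base_point (P × G) can be sketched with a PCA of the residuals that remain after removing the linear offset. This is an assumption made for illustration only; the sketch does not claim that fit initializes the base point this way, and the function name pca_base_point is hypothetical.

```python
import numpy as np


def pca_base_point(Y, X, linear_coefficients, n_embedding):
    """Hypothetical: top principal directions of the residuals as a P x G base point.

    Y: C x G data, X: C x K design, linear_coefficients: K x G.
    Returns (base_point, embedding) with shapes (P x G, C x P).
    """
    residuals = Y - X @ linear_coefficients  # C x G; centered when X has an intercept
    # Right singular vectors give the dominant directions in gene space.
    _, _, Vt = np.linalg.svd(residuals, full_matrices=False)
    base_point = Vt[:n_embedding]            # P x G, orthonormal rows
    embedding = residuals @ base_point.T     # C x P coordinates of each cell
    return base_point, embedding
```

Because the rows of base_point are orthonormal, embedding @ base_point is the projection of the residuals onto the reference subspace; with n_embedding equal to the number of genes it reconstructs them exactly.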
- LEMUR.predict(embedding=None, new_design=None, new_condition=None, obs_data=None, new_adata_layer=None)
Predict the expression of cells in a specific condition.
- Parameters:
embedding (Optional[ndarray] (default: None)) – The coordinates of the cells in the shared embedding space. If None, the coordinates stored in model.embedding are used.
new_design (Union[str, list[str], ndarray, None] (default: None)) – Either a design formula parsed using model.adata.obs and obs_data, or a design matrix defining the condition for each cell. If both new_design and new_condition are None, the original design matrix (model.design_matrix) is used.
new_condition (Union[ndarray, DataFrame, None] (default: None)) – A specification of the new condition that is applied to all cells. Typically, this is generated by cond(...).
obs_data (Union[DataFrame, Mapping[str, Iterable[Any]], None] (default: None)) – A DataFrame-like object containing cell-wise annotations. It is only used if new_design contains a formulaic formula string.
new_adata_layer (Optional[str] (default: None)) – If not None, the function returns self and stores the prediction in model.adata.layers[new_adata_layer].
- Returns:
array-like, shape (n_cells, n_genes) – The predicted expression of the cells in the new condition.
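The difference between new_condition and new_design can be sketched as follows: a single condition vector applied to all cells behaves like an n_cells × K design matrix whose rows are identical. The sketch covers only the linear-offset part of a prediction, using the shapes from the Variables list. The embedding-dependent part is omitted, and the helper names and coefficient values are hypothetical.

```python
import numpy as np


def design_from_condition(condition, n_cells):
    """Hypothetical: repeat one condition vector (length K) for every cell -> n_cells x K."""
    return np.tile(np.asarray(condition, dtype=float), (n_cells, 1))


def linear_offset(design, linear_coefficients):
    """Hypothetical: per-cell linear offset (n_cells x G) from a design and K x G coefficients."""
    return design @ linear_coefficients


design = design_from_condition([1.0, 1.0], n_cells=3)  # intercept + "treated" indicator
offset = linear_offset(design, np.array([[1.0, 2.0], [2.0, 0.5]]))
# every row of offset is [3.0, 2.5]
```

A per-cell design matrix passed as new_design could instead give each cell its own row, for example to predict every cell in its originally observed condition.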
- LEMUR.transform(adata, layer=None, obs_data=None, return_type='embedding')
Transform data using the fitted LEMUR model.
- Parameters:
adata (AnnData) – The AnnData object to transform.
obs_data (Union[DataFrame, Mapping[str, Iterable[Any]], None] (default: None)) – Optional set of annotations for each cell (same as obs_data in the constructor).
return_type (Literal['embedding', 'LEMUR'] (default: 'embedding')) – Flag that decides whether the function returns a full LEMUR object or only the embedding.
- Returns: