lamindb.core.AnnDataCurator
- class lamindb.core.AnnDataCurator(data, var_index, categoricals=None, obs_columns=FieldAttr(Feature.name), using_key=None, verbosity='hint', organism=None, sources=None, exclude=None)
- Bases: DataFrameCurator
- Curation flow for AnnData.
- See also Curator.
- Note that if genes are removed from the AnnData object, the object should be recreated using from_anndata().
- See "Curate AnnData based on the CELLxGENE schema" for instructions on how to curate against a specific cellxgene schema version.
- Parameters:
- data (ad.AnnData | UPathStr) – The AnnData object or an AnnData-like path. 
- var_index (FieldAttr) – The registry field for mapping the .var index.
- categoricals (dict[str, FieldAttr] | None, default: None) – A dictionary mapping .obs.columns to a registry field.
- obs_columns (FieldAttr, default: FieldAttr(Feature.name)) – The registry field for mapping the .obs.columns.
- using_key (str | None, default: None) – A reference LaminDB instance.
- verbosity (str, default: 'hint') – The verbosity level.
- organism (str | None, default: None) – The organism name.
- sources (dict[str, Record] | None, default: None) – A dictionary mapping .obs.columns to Source records.
- exclude (dict | None, default: None) – A dictionary mapping column names to values to exclude from validation. When specific Source instances are pinned, they may lack default values (e.g., “unknown” or “na”); listing such values in exclude ensures that they are skipped during validation.
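To make the exclude parameter concrete, here is a minimal conceptual sketch in plain Python. It is not lamindb's implementation: the helper name values_to_validate is hypothetical, and the sketch assumes that each entry of exclude maps a column name to either a single value or a list of values.

```python
def values_to_validate(column_values, exclude=None):
    """Return, per column, the unique values that remain after exclusion.

    Conceptual model only: `values_to_validate` is a hypothetical helper,
    not part of lamindb.
    """
    exclude = exclude or {}
    remaining = {}
    for column, values in column_values.items():
        skipped = exclude.get(column, [])
        # Assumption: a bare string means a single excluded value.
        if isinstance(skipped, str):
            skipped = [skipped]
        skipped = set(skipped)
        # Keep first-seen order while dropping duplicates and exclusions.
        remaining[column] = [
            v for v in dict.fromkeys(values) if v not in skipped
        ]
    return remaining


# Hypothetical obs columns containing placeholder values.
obs_values = {
    "cell_type": ["T cell", "unknown", "B cell", "T cell"],
    "donor_id": ["D1", "D2", "na"],
}
to_check = values_to_validate(
    obs_values, exclude={"cell_type": "unknown", "donor_id": ["na"]}
)
print(to_check)
```

In this model, the placeholder values "unknown" and "na" never reach validation, so they cannot be reported as non-validated.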
 
- Examples
  >>> import lamindb as ln
  >>> import bionty as bt
  >>> curator = ln.Curator.from_anndata(
  ...     adata,
  ...     var_index=bt.Gene.ensembl_gene_id,
  ...     categoricals={
  ...         "cell_type_ontology_id": bt.CellType.ontology_id,
  ...         "donor_id": ln.ULabel.name,
  ...     },
  ...     organism="human",
  ... )
- Attributes
- property categoricals: dict
- Return the obs fields to validate against. 
- property fields: dict
- Return the column fields to validate against.
- property non_validated: dict[str, list[str]]
- Return the non-validated features and labels.
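Because non_validated is documented as dict[str, list[str]], it can be inspected with ordinary dictionary operations. The sketch below is an illustration only: summarize_non_validated is a hypothetical helper, and the example dictionary is made-up data in the documented shape, not output from a real curator.

```python
def summarize_non_validated(non_validated):
    """Build one report line per key that still has non-validated terms."""
    lines = []
    for key, terms in non_validated.items():
        if not terms:
            continue  # nothing left to fix for this key
        shown = ", ".join(sorted(terms))
        lines.append(f"{key}: {len(terms)} non-validated ({shown})")
    return lines


# Hypothetical example data in the documented shape dict[str, list[str]].
example = {
    "cell_type_ontology_id": ["CL:9999999"],
    "donor_id": ["donor_24", "donor_07"],
}
report = summarize_non_validated(example)
for line in report:
    print(line)
```

With a real curator, the same helper would presumably be called as summarize_non_validated(curator.non_validated) after validate() has run.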
- property var_index: DeferredAttribute
- Return the registry field to validate variables index against. 
- Methods
- add_new_from(key, organism=None, **kwargs)
- Add validated & new categories.
- Parameters:
- key (str) – The key referencing the slot in the DataFrame from which to draw terms.
- organism (str | None, default: None) – The organism name.
- **kwargs – Additional keyword arguments to pass to create new records.
 
 
- add_new_from_var_index(organism=None, **kwargs)
- Update variable records.
- Parameters:
- organism (str | None, default: None) – The organism name.
- **kwargs – Additional keyword arguments to pass to create new records.
 
 
- clean_up_failed_runs()
- Clean up previous failed runs that didn’t save any outputs.
- lookup(using_key=None, public=False)
- Look up categories.
- Parameters:
- using_key (str | None, default: None) – The instance where the lookup is performed. If “public”, the lookup is performed on the public reference.
- Return type:
 
- save_artifact(description=None, key=None, revises=None, run=None)
- Save the validated AnnData and metadata.
- Parameters:
- description (str | None, default: None) – A description of the AnnData object.
- key (str | None, default: None) – A path-like key to reference the artifact in default storage, e.g., "myfolder/myfile.fcs". Artifacts with the same key form a revision family.
- revises (Artifact | None, default: None) – Previous version of the artifact. Triggers a revision.
- run (Run | None, default: None) – The run that creates the artifact.
 
- Return type:
- Returns:
- A saved artifact record. 
 
- standardize(key)
- Replace synonyms with standardized values.
- Parameters:
- key (str) – The key referencing the slot in adata.obs from which to draw terms. Same as the key in categoricals.
  - If “var_index”, standardize the var.index.
  - If “all”, standardize all obs columns and var.index.
- Modifies the dataset in place.
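The idea of replacing synonyms in place can be pictured with a small plain-Python model. This is a conceptual sketch, not lamindb's implementation: standardize_in_place and the synonym table are hypothetical, whereas the real method works on the AnnData object and the registry fields given in categoricals and var_index.

```python
def standardize_in_place(values, synonym_to_standard):
    """Replace each known synonym in `values` with its standardized name.

    Mutates `values` and returns the list of (old, new) replacements made.
    """
    replaced = []
    for i, value in enumerate(values):
        standard = synonym_to_standard.get(value)
        if standard is not None and standard != value:
            values[i] = standard
            replaced.append((value, standard))
    return replaced


# Hypothetical synonym table, used only for this illustration.
synonyms = {"T-cell": "T cell", "T lymphocyte": "T cell"}
cell_types = ["T-cell", "B cell", "T lymphocyte"]
changes = standardize_in_place(cell_types, synonyms)
print(cell_types)
```

As with the documented method, the caller's data is changed directly, so anything that needs the original values should copy them first.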
- validate(organism=None)
- Validate categories.
- This method also registers the validated records in the current instance.
- Parameters:
- organism (str | None, default: None) – The organism name.
- Return type:
- bool
- Returns:
- Whether the AnnData object is validated.
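The methods above are typically combined into a validate, repair, re-validate, save sequence. The sketch below shows one possible ordering using only the documented method names. It is an assumption-laden illustration: the curate function and the FakeCurator stand-in are hypothetical, the stub exists only so the example runs without a LaminDB instance, and treating "var_index" as a key of non_validated is an assumption.

```python
def curate(curator, description, add_new=False):
    """Validate, try to repair, and save only if validation passes.

    `curator` is any object exposing the documented AnnDataCurator methods.
    Returns the saved artifact, or None if validation still fails.
    """
    if not curator.validate():
        # Snapshot the keys first, since repairs may change `non_validated`.
        for key in list(curator.non_validated):
            curator.standardize(key)
            if add_new:
                # Assumption: the var index is reported under "var_index".
                if key == "var_index":
                    curator.add_new_from_var_index()
                else:
                    curator.add_new_from(key)
        if not curator.validate():
            return None
    return curator.save_artifact(description=description)


class FakeCurator:
    """Hypothetical stand-in so the sketch runs without a LaminDB instance."""

    def __init__(self):
        self.non_validated = {"cell_type": ["T-cell"]}
        self.calls = []

    def validate(self):
        self.calls.append("validate")
        return not self.non_validated

    def standardize(self, key):
        self.calls.append(f"standardize:{key}")
        self.non_validated.pop(key, None)

    def save_artifact(self, description=None):
        self.calls.append("save_artifact")
        return {"description": description}


fake = FakeCurator()
artifact = curate(fake, "demo dataset")
print(fake.calls)
```

Saving only after validate() returns True keeps non-validated terms out of the saved artifact; whether to register remaining terms with add_new_from is a judgment call, which is why it is opt-in here.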