lamindb.core.SOMACurator
- class lamindb.core.SOMACurator(experiment_uri, var_index, categoricals=None, obs_columns=FieldAttr(Feature.name), organism=None, sources=None, exclude=None, using_key=None)
- Bases: BaseCurator
- Curation flow for tiledbsoma.
- See also Curator.
- Parameters:
- experiment_uri (UPathStr | Artifact) – A local or cloud path to a tiledbsoma.Experiment.
- var_index (dict[str, tuple[str, FieldAttr]]) – The registry fields for mapping the .var indices for measurements. Should be in the form {"measurement name": ("var column", field)}. These keys should be used in the flattened form ('{measurement name}__{column name in .var}') in .standardize or .add_new_from; see the output of .var_index.
- categoricals (dict[str, FieldAttr] | None, default: None) – A dictionary mapping categorical .obs columns to a registry field.
- obs_columns (FieldAttr, default: FieldAttr(Feature.name)) – The registry field for mapping the names of the .obs columns.
- organism (str | None, default: None) – The organism name.
- sources (dict[str, Record] | None, default: None) – A dictionary mapping .obs columns to Source records.
- exclude (dict[str, str | list[str]] | None, default: None) – A dictionary mapping column names to values to exclude from validation. When specific Source instances are pinned and may lack default values (e.g., “unknown” or “na”), using the exclude parameter ensures they are not validated.
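The effect of the exclude parameter can be illustrated with a small standalone sketch. This is not lamindb's implementation; the helper name `values_to_validate` and the sample column values are hypothetical and only show how a `dict[str, str | list[str]]` mapping could be applied to drop placeholder values before validation.

```python
def values_to_validate(values, column, exclude=None):
    """Return the values of `column` that remain after applying `exclude`.

    Hypothetical helper for illustration only, not part of lamindb.
    """
    excluded = (exclude or {}).get(column, [])
    # The mapping allows a single string or a list of strings per column.
    if isinstance(excluded, str):
        excluded = [excluded]
    skip = set(excluded)
    return [value for value in values if value not in skip]


# Hypothetical exclude mapping: placeholder values that a pinned source may lack.
exclude = {"cell_type": ["unknown", "na"], "donor_id": "na"}
remaining = values_to_validate(
    ["T cell", "unknown", "B cell", "na"], "cell_type", exclude
)
```

Here `remaining` keeps only the values that would still be checked against the registry for the `cell_type` column.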
 
 - Examples
>>> import lamindb as ln
>>> import bionty as bt
>>> curator = ln.Curator.from_tiledbsoma(
...     "./my_array_store.tiledbsoma",
...     var_index={"RNA": ("var_id", bt.Gene.symbol)},
...     categoricals={
...         "cell_type_ontology_id": bt.CellType.ontology_id,
...         "donor_id": ln.ULabel.name
...     },
...     organism="human",
... )
- Attributes
- property categoricals: dict[str, DeferredAttribute]
- Return the obs fields to validate against. 
 - property non_validated: dict[str, list]
- Return the non-validated features and labels. 
 - property var_index: dict[str, DeferredAttribute]
- Return the registry fields with flattened keys to validate variable indices against. 
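The flattened key form `'{measurement name}__{column name in .var}'` can be sketched in plain Python. The function `flatten_var_index` is a hypothetical illustration rather than lamindb code, and the string `"Gene.symbol"` stands in for a real registry field such as `bt.Gene.symbol`.

```python
def flatten_var_index(var_index):
    """Build '{measurement name}__{column name in .var}' keys.

    Hypothetical helper for illustration only, not part of lamindb.
    """
    return {
        f"{measurement}__{column}": field
        for measurement, (column, field) in var_index.items()
    }


# A string placeholder is used instead of a registry field so the sketch
# runs without a lamindb instance.
var_index = {"RNA": ("var_id", "Gene.symbol")}
flat = flatten_var_index(var_index)
```

With this input, the single flattened key is `"RNA__var_id"`, which is the form of key that `.standardize` and `.add_new_from` expect for columns in `.var`.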
 - Methods
- add_new_from(key)
- Add validated & new categories.
- Parameters:
- key (str) – The key referencing the slot in the tiledbsoma store. It should be '{measurement name}__{column name in .var}' for columns in .var or a column name in .obs.
- Return type:
- None
 
 - lookup(using_key=None, public=False)
- Lookup categories.
- Parameters:
- using_key (str | None, default: None) – The instance where the lookup is performed. If “public”, the lookup is performed on the public reference.
- Return type:
 
 - save_artifact(description=None, key=None, revises=None, run=None)
- Save the validated tiledbsoma store and metadata.
- Parameters:
- description (str | None, default: None) – A description of the tiledbsoma store.
- key (str | None, default: None) – A path-like key to reference the artifact in default storage, e.g., "myfolder/mystore.tiledbsoma". Artifacts with the same key form a revision family.
- revises (Artifact | None, default: None) – Previous version of the artifact. Triggers a revision.
- run (Run | None, default: None) – The run that creates the artifact.
 
- Return type:
- Returns:
- A saved artifact record. 
 
 - standardize(key)
- Replace synonyms with standardized values.
- Modifies the dataset inplace.
- Parameters:
- key (str) – The key referencing the slot in the tiledbsoma store. It should be '{measurement name}__{column name in .var}' for columns in .var or a column name in .obs.
 
 - validate()
- Validate categories.
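One way the methods above might be combined is sketched below. It relies only on the names documented on this page (`validate`, `non_validated`, `standardize`, `add_new_from`, `save_artifact`). The wrapper function `curate_and_save` is hypothetical, and it assumes that every key of `non_validated` is accepted as the `key` argument of `standardize` and `add_new_from`, which this page does not confirm.

```python
def curate_and_save(curator, description=None, key=None):
    """Validate, resolve non-validated values, then save the store.

    Hypothetical wrapper for illustration; `curator` is expected to
    behave like a SOMACurator.
    """
    curator.validate()
    # Copy the keys first: the curator may update non_validated as
    # values are standardized or added.
    for slot in list(curator.non_validated):
        curator.standardize(slot)   # replace synonyms with standardized values
        curator.add_new_from(slot)  # add the remaining values as new categories
    # Re-run validation so the store is validated before saving.
    curator.validate()
    return curator.save_artifact(description=description, key=key)
```

Adding every remaining value as a new category is a deliberate simplification. In practice it is usually safer to inspect `curator.non_validated` first and call `add_new_from` only for the slots where new categories are really intended.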