lamindb.core.CanCurate
- class lamindb.core.CanCurate
- Bases: object
- Base class providing Record-based validation.
- Class methods
- classmethod from_values(values, field=None, create=False, organism=None, source=None, mute=False)
- Bulk create validated records by parsing values for an identifier (such as a name or an id).
- Parameters:
- values (list[str] | Series | array) – A list of values for an identifier, e.g. ["name1", "name2"].
- field (str | DeferredAttribute | None, default: None) – A Record field to look up, e.g., bt.CellMarker.name.
- create (bool, default: False) – Whether to create records if they don’t exist.
- organism (Record | str | None, default: None) – A bionty.Organism name or record.
- source (Record | None, default: None) – A bionty.Source record to validate against and to create records for.
- mute (bool, default: False) – Whether to mute logging.
 
- Return type:
- Returns:
- A list of validated records. For bionty registries, also returns knowledge-coupled records.
- Notes
- For more info, see tutorial: Manage biological registries.
- Examples
- Bulk creating from non-validated values logs warnings and returns an empty list:
  >>> import lamindb as ln
  >>> ulabels = ln.ULabel.from_values(["benchmark", "prediction", "test"], field="name")
  >>> assert len(ulabels) == 0
- Bulk creating from validated values returns the corresponding existing records:
  >>> ln.save([ln.ULabel(name=name) for name in ["benchmark", "prediction", "test"]])
  >>> ulabels = ln.ULabel.from_values(["benchmark", "prediction", "test"], field="name")
  >>> assert len(ulabels) == 3
- Bulk creating records from a public reference:
  >>> import bionty as bt
  >>> records = bt.CellType.from_values(["T cell", "B cell"], field="name")
  >>> records
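The lookup-or-create behavior described for from_values can be pictured with a small in-memory stand-in. This is only a sketch under stated assumptions: Label, the registry dict, and this from_values function are hypothetical and are not lamindb's implementation. It ignores field, organism, source, and logging entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    """Hypothetical stand-in for a registry record."""
    name: str


def from_values(values, registry, *, create=False):
    """Return records for `values`, looked up by exact name in `registry`.

    Values without a match are skipped unless `create` is True, in which
    case a new (unsaved) record is built for them.
    """
    records = []
    for value in values:
        if value in registry:
            records.append(registry[value])
        elif create:
            records.append(Label(name=value))
    return records


names = ["benchmark", "prediction", "test"]
empty_registry = {}
registry = {name: Label(name) for name in names}

no_match = from_values(names, empty_registry)            # nothing validated
existing = from_values(names, registry)                  # all validated
created = from_values(names, empty_registry, create=True)
```

With an empty registry the call returns an empty list, which mirrors the first example above; once the three labels exist, the same call returns three records.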
- classmethod inspect(values, field=None, *, mute=False, organism=None, source=None)
- Inspect if values are mappable to a field.
- Being mappable means that an exact match exists.
- Parameters:
- values (list[str] | Series | array) – Values that will be checked against the field.
- field (str | DeferredAttribute | None, default: None) – The field of values. Examples are 'ontology_id' to map against the source ID or 'name' to map against the ontology's name field.
- mute (bool, default: False) – Whether to mute logging.
- organism (str | Record | None, default: None) – An Organism name or record.
- source (Record | None, default: None) – A bionty.Source record that specifies the version to inspect against.
 
- Return type:
- Examples
  >>> import lamindb as ln
  >>> import bionty as bt
  >>> bt.settings.organism = "human"
  >>> ln.save(bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol"))
  >>> gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
  >>> result = bt.Gene.inspect(gene_symbols, field=bt.Gene.symbol)
  >>> result.validated
  ['A1CF', 'A1BG']
  >>> result.non_validated
  ['FANCD1', 'FANCD20']
- classmethod standardize(values, field=None, *, return_field=None, return_mapper=False, case_sensitive=False, mute=False, public_aware=True, keep='first', synonyms_field='synonyms', organism=None, source=None)
- Maps input synonyms to standardized names.
- Parameters:
- values (list[str] | Series | array) – Identifiers that will be standardized.
- field (str | DeferredAttribute | None, default: None) – The field representing the standardized names.
- return_field (str, default: None) – The field to return. Defaults to field.
- return_mapper (bool, default: False) – If True, returns {input_value: standardized_name}.
- case_sensitive (bool, default: False) – Whether the mapping is case sensitive.
- mute (bool, default: False) – Whether to mute logging.
- public_aware (bool, default: True) – Whether to standardize from Bionty reference. Defaults to True for Bionty registries.
- keep (Literal['first', 'last', False], default: 'first') – When a synonym maps to multiple names, determines which of the duplicates to keep, following the keep semantics of pd.DataFrame.duplicated:
  - "first": returns the first mapped standardized name
  - "last": returns the last mapped standardized name
  - False: returns all mapped standardized names
  When keep is False, the returned list of standardized names will contain nested lists in case of duplicates.
  When a field is converted into return_field, keep marks which matches to keep when multiple return_field values map to the same field value.
- synonyms_field (str, default: 'synonyms') – A field containing the concatenated synonyms.
- organism (str | Record | None, default: None) – An Organism name or record.
- source (Record | None, default: None) – A bionty.Source record that specifies the version to validate against.
 
- Return type:
- list[str] | dict[str, str]
- Returns:
- If return_mapper is False, a list of standardized names. Otherwise, a dictionary of mapped values with mappable synonyms as keys and standardized names as values.
 - See also - add_synonym()
- Add synonyms. 
- remove_synonym()
- Remove synonyms. 
- Examples
  >>> import lamindb as ln
  >>> import bionty as bt
  >>> bt.settings.organism = "human"
  >>> ln.save(bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol"))
  >>> gene_synonyms = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
  >>> standardized_names = bt.Gene.standardize(gene_synonyms)
  >>> standardized_names
  ['A1CF', 'A1BG', 'BRCA2', 'FANCD20']
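To make the roles of return_mapper, case_sensitive, and keep more concrete, here is a minimal pure-Python sketch of synonym-based standardization. It is an illustration under assumptions, not lamindb's implementation: the records dict (standardized name mapped to a pipe-delimited synonyms string), the toy synonym data, and both helper functions are hypothetical.

```python
def build_synonym_index(records, case_sensitive=False):
    """Map every name and synonym to the standardized names it resolves to."""
    index = {}
    for name, synonyms in records.items():
        aliases = [name] + [s for s in synonyms.split("|") if s]
        for alias in aliases:
            key = alias if case_sensitive else alias.lower()
            names = index.setdefault(key, [])
            if name not in names:
                names.append(name)
    return index


def standardize(values, records, *, return_mapper=False,
                case_sensitive=False, keep="first"):
    """Replace synonyms by standardized names; unmappable values pass through."""
    index = build_synonym_index(records, case_sensitive)
    result, mapper = [], {}
    for value in values:
        names = index.get(value if case_sensitive else value.lower())
        if names is None:
            result.append(value)
            continue
        if keep == "first":
            chosen = names[0]
        elif keep == "last":
            chosen = names[-1]
        else:  # keep is False: return every candidate when ambiguous
            chosen = list(names) if len(names) > 1 else names[0]
        result.append(chosen)
        if chosen != value:
            mapper[value] = chosen  # only values that were actually mapped
    return mapper if return_mapper else result


# Toy data for illustration only.
genes = {"A1CF": "ACF|ASP", "A1BG": "A1B|ABG", "BRCA2": "FANCD1|FAD"}
queries = ["A1CF", "A1BG", "FANCD1", "FANCD20"]

names = standardize(queries, genes)
mapper = standardize(queries, genes, return_mapper=True)

# A synonym shared by two records shows the effect of `keep`.
ambiguous = {"GeneA": "X1", "GeneB": "X1"}
nested = standardize(["X1"], ambiguous, keep=False)
```

In this sketch, names reproduces the list from the example above, mapper contains only the one synonym that was rewritten, and nested holds a nested list because "X1" resolves to two names.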
- classmethod validate(values, field=None, *, mute=False, organism=None, source=None)
- Validate values against existing values of a string field.
- Note that this is strict validation: only exact matches count as validated.
- Parameters:
- values (list[str] | Series | array) – Values that will be validated against the field.
- field (str | DeferredAttribute | None, default: None) – The field of values. Examples are 'ontology_id' to map against the source ID or 'name' to map against the ontology's name field.
- mute (bool, default: False) – Whether to mute logging.
- organism (str | Record | None, default: None) – An Organism name or record.
- source (Record | None, default: None) – A bionty.Source record that specifies the version to validate against.
 
- Return type:
- ndarray
- Returns:
- A vector of booleans indicating if an element is validated. 
- Examples
  >>> import lamindb as ln
  >>> import bionty as bt
  >>> bt.settings.organism = "human"
  >>> ln.save(bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol"))
  >>> gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
  >>> bt.Gene.validate(gene_symbols, field=bt.Gene.symbol)
  array([ True, True, False, False])
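The strict exact-match rule shared by validate and inspect can be sketched in a few lines. The functions below are hypothetical stand-ins, not lamindb's code: existing is assumed to be the list of values already stored in the field, and a plain list of booleans is used where validate itself returns an ndarray, so that the sketch has no dependencies.

```python
def validate(values, existing):
    """Return one boolean per value: True only if it matches exactly."""
    known = set(existing)
    return [value in known for value in values]


def inspect(values, existing):
    """Split values into validated and non-validated, keeping input order."""
    mask = validate(values, existing)
    return {
        "validated": [v for v, ok in zip(values, mask) if ok],
        "non_validated": [v for v, ok in zip(values, mask) if not ok],
    }


stored_symbols = ["A1CF", "A1BG", "BRCA2"]
gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]

mask = validate(gene_symbols, stored_symbols)
report = inspect(gene_symbols, stored_symbols)
```

Note that "FANCD1" fails here even though it is a synonym of a stored gene: exact matching does not consult synonyms, which is the gap standardize is meant to close.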
- Methods
- add_synonym(synonym, force=False, save=None)
- Add synonyms to a record.
- Parameters:
- synonym (str | list[str] | Series | array) – The synonyms to add to the record.
- force (bool, default: False) – Whether to add synonyms even if they are already synonyms of other records.
- save (bool | None, default: None) – Whether to save the record to the database.
 
 - See also - remove_synonym()
- Remove synonyms. 
- Examples
  >>> import bionty as bt
  >>> bt.CellType.from_source(name="T cell").save()
  >>> lookup = bt.CellType.lookup()
  >>> record = lookup.t_cell
  >>> record.synonyms
  'T-cell|T lymphocyte|T-lymphocyte'
  >>> record.add_synonym("T cells")
  >>> record.synonyms
  'T cells|T-cell|T-lymphocyte|T lymphocyte'
- remove_synonym(synonym)
- Remove synonyms from a record.
- Parameters:
- synonym (str | list[str] | Series | array) – The synonym values to remove.
- See also
- add_synonym()
- Add synonyms.
- Examples
  >>> import bionty as bt
  >>> bt.CellType.from_source(name="T cell").save()
  >>> lookup = bt.CellType.lookup()
  >>> record = lookup.t_cell
  >>> record.synonyms
  'T-cell|T lymphocyte|T-lymphocyte'
  >>> record.remove_synonym("T-cell")
  >>> record.synonyms
  'T lymphocyte|T-lymphocyte'
- set_abbr(value)
- Set value for abbr field and add to synonyms.
- Parameters:
- value (str) – A value for an abbreviation.
- Examples
  >>> import bionty as bt
  >>> bt.ExperimentalFactor.from_source(name="single-cell RNA sequencing").save()
  >>> scrna = bt.ExperimentalFactor.get(name="single-cell RNA sequencing")
  >>> scrna.abbr
  None
  >>> scrna.synonyms
  'single-cell RNA-seq|single-cell transcriptome sequencing|scRNA-seq|single cell RNA sequencing'
  >>> scrna.set_abbr("scRNA")
  >>> scrna.abbr
  'scRNA'
  >>> scrna.synonyms
  'scRNA|single-cell RNA-seq|single cell RNA sequencing|single-cell transcriptome sequencing|scRNA-seq'
  >>> scrna.save()
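As the example outputs show, synonyms are stored as a single pipe-delimited string. The following sketch shows how add_synonym, remove_synonym, and set_abbr could manipulate such a string. All helpers here are hypothetical and operate on plain strings and dicts rather than records. They append new synonyms at the end, whereas the ordering produced by lamindb may differ (the add_synonym example above shows a different order), and the force and save parameters are not modeled.

```python
def split_synonyms(synonyms):
    """Split a pipe-delimited synonyms string, tolerating None or ''."""
    return [s for s in (synonyms or "").split("|") if s]


def add_synonym(synonyms, new):
    """Return the synonyms string with `new` (str or iterable) appended."""
    new_items = [new] if isinstance(new, str) else list(new)
    merged = split_synonyms(synonyms)
    for item in new_items:
        if item not in merged:  # skip synonyms that are already present
            merged.append(item)
    return "|".join(merged)


def remove_synonym(synonyms, old):
    """Return the synonyms string without `old` (str or iterable)."""
    to_remove = {old} if isinstance(old, str) else set(old)
    return "|".join(s for s in split_synonyms(synonyms) if s not in to_remove)


def set_abbr(record, value):
    """Set the abbreviation and register it as a synonym as well."""
    record["abbr"] = value
    record["synonyms"] = add_synonym(record.get("synonyms"), value)
    return record


t_cell = "T-cell|T lymphocyte|T-lymphocyte"
added = add_synonym(t_cell, "T cells")
removed = remove_synonym(t_cell, "T-cell")
scrna = set_abbr({"abbr": None, "synonyms": "scRNA-seq"}, "scRNA")
```

Treating the string as an ordered list with duplicates skipped keeps repeated calls idempotent: adding "T cells" twice leaves the string unchanged after the first call.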