Multi-modal
Here, we show how to curate and register ECCITE-seq data from Papalexi21, stored as a MuData object.
ECCITE-seq profiles single-cell transcriptomes together with surface protein markers in the context of CRISPR screens.
MuData builds on AnnData to store multimodal data: it holds one AnnData object per modality, plus annotations shared across all modalities.
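As a rough mental model, and not the mudata API itself, a MuData object can be pictured as a mapping from modality names to AnnData-like tables that cover the same cells. The sketch below uses a hypothetical `Modality` dataclass and the per-modality sizes printed for this dataset further down (173, 4, 12, and 111 variables for rna, adt, hto, and gdo) to show why the combined object is 200 × 300: observations are shared, while the variables of all modalities are concatenated.

```python
from dataclasses import dataclass


@dataclass
class Modality:
    """Hypothetical stand-in for one AnnData object inside a MuData container."""

    n_obs: int  # number of cells (rows)
    n_vars: int  # number of features (columns), e.g. genes or antibodies


def mudata_shape(modalities: dict[str, Modality]) -> tuple[int, int]:
    """Return (n_obs, n_vars) of the combined container.

    Assumes every modality profiles the same cells, as in this dataset;
    real MuData objects can also hold partially overlapping observations.
    """
    cell_counts = {mod.n_obs for mod in modalities.values()}
    if len(cell_counts) != 1:
        raise ValueError("this sketch assumes all modalities share the same cells")
    n_obs = cell_counts.pop()
    # variables are concatenated across modalities
    n_vars = sum(mod.n_vars for mod in modalities.values())
    return n_obs, n_vars


# per-modality sizes as printed for the Papalexi21 subset
papalexi21 = {
    "rna": Modality(n_obs=200, n_vars=173),  # gene expression
    "adt": Modality(n_obs=200, n_vars=4),  # surface proteins
    "hto": Modality(n_obs=200, n_vars=12),  # cell hashtags
    "gdo": Modality(n_obs=200, n_vars=111),  # guide RNAs
}
print(mudata_shape(papalexi21))  # (200, 300)
```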
%load_ext autoreload
%autoreload 2
# !pip install 'lamindb[jupyter,bionty]'
!lamin init --storage ./test-multimodal --schema bionty
→ initialized lamindb: testuser1/test-multimodal
import lamindb as ln
import bionty as bt
→ connected lamindb: testuser1/test-multimodal
mdata = ln.core.datasets.mudata_papalexi21_subset()
mdata
MuData object with n_obs × n_vars = 200 × 300
  obs:	'perturbation', 'replicate'
  var:	'name'
  4 modalities
    rna:	200 x 173
      obs:	'nCount_RNA', 'nFeature_RNA', 'percent.mito'
      var:	'name'
    adt:	200 x 4
      obs:	'nCount_ADT', 'nFeature_ADT'
      var:	'name'
    hto:	200 x 12
      obs:	'nCount_HTO', 'nFeature_HTO', 'technique'
      var:	'name'
    gdo:	200 x 111
      obs:	'nCount_GDO'
      var:	'name'
Validate annotations
curate = ln.Curator.from_mudata(
    mdata,
    var_index={
        "rna": bt.Gene.symbol,  # gene expression
        "adt": bt.CellMarker.name,  # antibody-derived tags (ADT) measuring surface proteins
        "hto": ln.Feature.name,  # hashtag oligos (HTO) used for cell hashing
        "gdo": ln.Feature.name,  # guide RNAs (GDO)
    },
    categoricals={
        "perturbation": ln.ULabel.name,  # shared categorical in the global obs
        "replicate": ln.ULabel.name,  # shared categorical in the global obs
        "hto:technique": bt.ExperimentalFactor.name,  # modality-specific categorical: "technique" in the hto obs
    },
    organism="human",
)
✓ added 2 records with Feature.name for "columns": 'perturbation', 'replicate'
! indexing datasets with gene symbols can be problematic: https://docs.lamin.ai/faq/symbol-mapping
✓ added 1 record with Feature.name for "columns": 'technique'
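The `categoricals` keys above follow a simple convention: a bare column name such as `"perturbation"` refers to the shared obs of the MuData object, while `"hto:technique"` scopes the column to a single modality. A hypothetical helper that splits keys this way is sketched below; it illustrates the naming convention only and is not lamindb's internal code.

```python
from typing import Optional


def split_categorical_key(key: str) -> tuple[Optional[str], str]:
    """Split a 'modality:column' key into (modality, column).

    Keys without a colon are treated as columns of the shared obs,
    signalled by modality None. Only the first colon is used as separator.
    """
    modality, sep, column = key.partition(":")
    if not sep:
        return None, key
    return modality, column


for key in ["perturbation", "replicate", "hto:technique"]:
    print(key, "->", split_categorical_key(key))
# perturbation -> (None, 'perturbation')
# replicate -> (None, 'replicate')
# hto:technique -> ('hto', 'technique')
```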
# optional and deprecated: these columns are now registered automatically when the curator is initialized
curate.add_new_from_columns(modality="rna")
curate.add_new_from_columns(modality="adt")
curate.add_new_from_columns(modality="hto")
curate.add_new_from_columns(modality="gdo")
/tmp/ipykernel_3829/1003816735.py:2: DeprecationWarning: `.add_new_from_columns()` is deprecated and will be removed in a future version. It's run by default during initialization.
  curate.add_new_from_columns(modality="rna")
/tmp/ipykernel_3829/1003816735.py:3: DeprecationWarning: `.add_new_from_columns()` is deprecated and will be removed in a future version. It's run by default during initialization.
  curate.add_new_from_columns(modality="adt")
/tmp/ipykernel_3829/1003816735.py:4: DeprecationWarning: `.add_new_from_columns()` is deprecated and will be removed in a future version. It's run by default during initialization.
  curate.add_new_from_columns(modality="hto")
/tmp/ipykernel_3829/1003816735.py:5: DeprecationWarning: `.add_new_from_columns()` is deprecated and will be removed in a future version. It's run by default during initialization.
  curate.add_new_from_columns(modality="gdo")
curate.validate()
• saving validated records of 'var_index'
• saving validated records of 'var_index'
• saving validated records of 'technique'
• validating categoricals in "obs"...
• mapping "perturbation" on ULabel.name
!   2 terms are not validated: 'Perturbed', 'NT'
    → fix typos, remove non-existent values, or save terms via .add_new_from("perturbation")
• mapping "replicate" on ULabel.name
!   3 terms are not validated: 'rep3', 'rep1', 'rep2'
    → fix typos, remove non-existent values, or save terms via .add_new_from("replicate")
• validating categoricals in modality "rna"...
• mapping "var_index" on Gene.symbol
!   96 terms are not validated: 'RP5-827C21.6', 'XX-CR54.1', 'RP11-379B18.5', 'RP11-778D9.12', 'RP11-703G6.1', 'AC005150.1', 'RP11-717H13.1', 'CTC-498J12.1', 'CTC-467M3.1', 'HIST1H4K', 'RP11-524H19.2', 'AC006042.7', 'AC002066.1', 'AC073934.6', 'RP11-268G12.1', 'U52111.14', 'RP11-235C23.5', 'RP11-12J10.3', 'CASC1', 'RP11-324E6.9', ...
    12 synonyms found: "CTC-467M3.1" → "MEF2C-AS2", "HIST1H4K" → "H4C12", "CASC1" → "DNAI7", "LARGE" → "LARGE1", "NBPF16" → "NBPF15", "C1orf65" → "CCDC185", "IBA57-AS1" → "IBA57-DT", "KIAA1239" → "NWD2", "TMEM75" → "LINC02912", "AP003419.16" → "RPS6KB2-AS1", "FAM65C" → "RIPOR3", "C14orf177" → "LINC02914"
    → curate synonyms via .standardize("var_index")
    for remaining terms:
    → fix typos, remove non-existent values, or save terms via .add_new_from_var_index()
• validating categoricals in modality "adt"...
✓ "var_index" is validated against CellMarker.name
• validating categoricals in modality "gdo"...
• mapping "var_index" on Feature.name
!   111 terms are not validated: 'eGFPg1', 'CUL3g1', 'CUL3g2', 'CUL3g3', 'CMTM6g1', 'CMTM6g2', 'CMTM6g3', 'NTg1', 'NTg2', 'NTg3', 'NTg4', 'NTg5', 'NTg7', 'PDL1g1', 'PDL1g2', 'PDL1g3', 'ATF2g1', 'ATF2g2', 'ATF2g3', 'ATF2g4', ...
    → fix typos, remove non-existent values, or save terms via .add_new_from_var_index()
• validating categoricals in modality "hto"...
• mapping "var_index" on Feature.name
!   12 terms are not validated: 'rep1-tx', 'rep1-ctrl', 'rep2-tx', 'rep2-ctrl', 'PDL1g1-tx', 'PDL1g1-ctrl', 'PDL1g2-tx', 'PDL1g2-ctrl', 'rep3-tx', 'rep3-ctrl', 'rep4-tx', 'rep4-ctrl'
    → fix typos, remove non-existent values, or save terms via .add_new_from_var_index()
✓ "technique" is validated against ExperimentalFactor.name
False
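Each per-column validation step above does the same bookkeeping: compare the column's values with the values stored in a registry field, list the ones it cannot find, and point out those that have a known synonym. A minimal sketch of that classification with plain sets and a dict, assuming a toy registry (the `validate_terms` helper and the registry contents are hypothetical; the two synonym pairs are taken from the output above):

```python
def validate_terms(
    terms: list[str], registry: set[str], synonyms: dict[str, str]
) -> dict[str, object]:
    """Classify terms as validated, not validated, or having a known synonym.

    `registry` stands in for the values of a registry field such as Gene.symbol,
    and `synonyms` maps legacy names to their current name. This is a sketch of
    the bookkeeping only, not lamindb's implementation.
    """
    not_validated = [t for t in terms if t not in registry]
    # non-validated terms for which a standardized name is known
    with_synonym = {t: synonyms[t] for t in not_validated if t in synonyms}
    return {
        "is_valid": not not_validated,
        "not_validated": not_validated,
        "synonyms": with_synonym,
    }


registry = {"SH2D6", "GABRA1", "H4C12", "DNAI7"}  # toy subset of Gene.symbol
synonyms = {"HIST1H4K": "H4C12", "CASC1": "DNAI7"}  # pairs reported above
terms = ["SH2D6", "GABRA1", "HIST1H4K", "CASC1", "RP11-379B18.5"]

result = validate_terms(terms, registry, synonyms)
print(result["is_valid"])  # False
print(result["not_validated"])  # ['HIST1H4K', 'CASC1', 'RP11-379B18.5']
print(result["synonyms"])  # {'HIST1H4K': 'H4C12', 'CASC1': 'DNAI7'}
```

Standardizing the synonym hits would leave only `RP11-379B18.5` to be fixed or registered as a new record.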
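# register the remaining var index values as new records
# (for "rna", this includes the 12 terms listed with synonyms above)
curate.add_new_from_var_index("rna")
curate.add_new_from_var_index("hto")
curate.add_new_from_var_index("gdo")
# register the new labels of the shared obs categoricals
curate.add_new_from("perturbation")
curate.add_new_from("replicate")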
✓ added 96 records with Gene.symbol for "var_index": 'RP5-827C21.6', 'XX-CR54.1', 'RP11-379B18.5', 'RP11-778D9.12', 'RP11-703G6.1', 'AC005150.1', 'RP11-717H13.1', 'CTC-498J12.1', 'CTC-467M3.1', 'HIST1H4K', 'RP11-524H19.2', 'AC006042.7', 'AC002066.1', 'AC073934.6', 'RP11-268G12.1', 'U52111.14', 'RP11-235C23.5', 'RP11-12J10.3', 'CASC1', 'RP11-324E6.9', ...
✓ added 12 records with Feature.name for "var_index": 'rep1-tx', 'rep1-ctrl', 'rep2-tx', 'rep2-ctrl', 'PDL1g1-tx', 'PDL1g1-ctrl', 'PDL1g2-tx', 'PDL1g2-ctrl', 'rep3-tx', 'rep3-ctrl', 'rep4-tx', 'rep4-ctrl'
✓ added 111 records with Feature.name for "var_index": 'eGFPg1', 'CUL3g1', 'CUL3g2', 'CUL3g3', 'CMTM6g1', 'CMTM6g2', 'CMTM6g3', 'NTg1', 'NTg2', 'NTg3', 'NTg4', 'NTg5', 'NTg7', 'PDL1g1', 'PDL1g2', 'PDL1g3', 'ATF2g1', 'ATF2g2', 'ATF2g3', 'ATF2g4', ...
✓ added 2 records with ULabel.name for "perturbation": 'Perturbed', 'NT'
✓ added 3 records with ULabel.name for "replicate": 'rep3', 'rep2', 'rep1'
curate.validate()
• validating categoricals in "obs"...
✓ "perturbation" is validated against ULabel.name
✓ "replicate" is validated against ULabel.name
• validating categoricals in modality "rna"...
✓ "var_index" is validated against Gene.symbol
• validating categoricals in modality "adt"...
✓ "var_index" is validated against CellMarker.name
• validating categoricals in modality "gdo"...
✓ "var_index" is validated against Feature.name
• validating categoricals in modality "hto"...
✓ "var_index" is validated against Feature.name
✓ "technique" is validated against ExperimentalFactor.name
True
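The two `validate()` calls differ only in what the registries contain: the first returns False because some slots hold unregistered values, and once those values are registered every slot validates, so the overall result is True. A hypothetical dict-and-set sketch of that loop follows; the slot names, field labels, and function names are illustrative and are not lamindb API.

```python
# slot -> (registry field it is validated against, values found in that slot)
Slots = dict[str, tuple[str, list[str]]]
# registry field -> values already registered for it
Registry = dict[str, set[str]]


def validate_all(slots: Slots, registry: Registry) -> bool:
    """True only if every value in every slot is registered for that slot's field."""
    return all(
        set(values) <= registry.get(field, set()) for field, values in slots.values()
    )


def register_missing(slot: str, slots: Slots, registry: Registry) -> list[str]:
    """Register the not-yet-known values of one slot and return them sorted."""
    field, values = slots[slot]
    known = registry.setdefault(field, set())
    missing = sorted(set(values) - known)
    known.update(missing)
    return missing


slots: Slots = {
    "obs:perturbation": ("ULabel.name", ["Perturbed", "NT", "Perturbed"]),
    "obs:replicate": ("ULabel.name", ["rep1", "rep2", "rep3", "rep1"]),
    "hto:technique": ("ExperimentalFactor.name", ["cell hashing"]),
}
registry: Registry = {"ExperimentalFactor.name": {"cell hashing"}}

print(validate_all(slots, registry))  # False: no ULabel values registered yet
print(register_missing("obs:perturbation", slots, registry))  # ['NT', 'Perturbed']
print(register_missing("obs:replicate", slots, registry))  # ['rep1', 'rep2', 'rep3']
print(validate_all(slots, registry))  # True
```

Both shared categoricals point at the same `ULabel.name` field here, mirroring the curator above, so registering them fills one shared registry rather than two separate ones.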
Register curated artifact
artifact = curate.save_artifact(description="Sub-sampled MuData from Papalexi21")
! no run & transform got linked, call `ln.track()` & re-run
! run input wasn't tracked, call `ln.track()` and re-run
! did not create Feature records for 37 non-validated names: 'adt:G2M.Score', 'adt:HTO_classification', 'adt:MULTI_ID', 'adt:NT', 'adt:Phase', 'adt:S.Score', 'adt:gene_target', 'adt:guide_ID', 'adt:orig.ident', 'adt:percent.mito', 'adt:perturbation', 'adt:replicate', 'gdo:G2M.Score', 'gdo:HTO_classification', 'gdo:MULTI_ID', 'gdo:NT', 'gdo:Phase', 'gdo:S.Score', 'gdo:gene_target', 'gdo:guide_ID', ...
!    3 unique terms (100.00%) are not validated for name: 'nCount_RNA', 'nFeature_RNA', 'percent.mito'
! skip linking features to artifact in slot 'obs'
!    2 unique terms (100.00%) are not validated for name: 'nCount_ADT', 'nFeature_ADT'
! skip linking features to artifact in slot 'obs'
!    2 unique terms (66.70%) are not validated for name: 'nCount_HTO', 'nFeature_HTO'
!    did not create Feature records for 2 non-validated names: 'nCount_HTO', 'nFeature_HTO'
!    1 unique term (100.00%) is not validated for name: 'nCount_GDO'
! skip linking features to artifact in slot 'obs'
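Each summary line above relates the unregistered obs column names of a modality to all of its columns: the `hto` obs has three columns and only `technique` was curated, so 2 of 3 (about 66.7%) are not validated, whereas every obs column of `rna`, `adt`, and `gdo` is unregistered (100%). A minimal sketch of that calculation, assuming the percentage is simply the unregistered share of unique column names (the helper is hypothetical):

```python
def summarize_unvalidated(
    columns: list[str], registered: set[str]
) -> tuple[list[str], float]:
    """Return the unregistered column names and their share of all columns in percent.

    `registered` stands in for the names already stored as Feature records.
    """
    unique = list(dict.fromkeys(columns))  # de-duplicate while keeping order
    missing = [c for c in unique if c not in registered]
    share = 100 * len(missing) / len(unique) if unique else 0.0
    return missing, share


hto_missing, hto_share = summarize_unvalidated(
    ["nCount_HTO", "nFeature_HTO", "technique"], registered={"technique"}
)
print(hto_missing, f"{hto_share:.1f}%")  # ['nCount_HTO', 'nFeature_HTO'] 66.7%

rna_missing, rna_share = summarize_unvalidated(
    ["nCount_RNA", "nFeature_RNA", "percent.mito"], registered={"technique"}
)
print(f"{rna_share:.1f}%")  # 100.0%
```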
artifact.describe()
Artifact .h5mu/MuData
├── General
│   ├── .uid = 'v4hUvQbLhW9aRuu70000'
│   ├── .size = 549984
│   ├── .hash = 'aFIJ7G9AIcxoEib8kecChw'
│   ├── .n_observations = 200
│   ├── .path =
│   │   /home/runner/work/lamin-usecases/lamin-usecases/docs/test-multimodal/.lamindb/v4hUvQbLhW9aRuu70000.h5mu
│   ├── .created_by = testuser1 (Test User1)
│   └── .created_at = 2025-01-24 14:05:05
├── Dataset features/schema
│   ├── obs • 2 [Feature]
│   │   perturbation    cat[ULabel]    NT, Perturbed
│   │   replicate       cat[ULabel]    rep1, rep2, rep3
│   ├── ['rna'].var • 184 [bionty.Gene]
│   │   SH2D6           float
│   │   ARHGAP26-AS1    float
│   │   GABRA1          float
│   │   HLA-DQB1-AS1    float
│   │   HLA-DQB1-AS1    float
│   │   HLA-DQB1-AS1    float
│   │   HLA-DQB1-AS1    float
│   │   HLA-DQB1-AS1    float
│   │   HLA-DQB1-AS1    float
│   │   HLA-DQB1-AS1    float
│   │   SPACA1          float
│   │   VNN1            float
│   │   CTAGE15         float
│   │   CTAGE15         float
│   │   PFKFB1          float
│   │   TRPC5           float
│   │   RBPMS-AS1       float
│   │   CA8             float
│   │   CSMD3           float
│   │   ZNF483          float
│   ├── ['adt'].var • 4 [bionty.CellMarker]
│   │   CD86            float
│   │   PDL1            float
│   │   PDL2            float
│   │   CD366           float
│   ├── ['hto'].var • 12 [Feature]
│   │   rep1-tx         cat
│   │   rep1-ctrl       cat
│   │   rep2-tx         cat
│   │   rep2-ctrl       cat
│   │   PDL1g1-tx       cat
│   │   PDL1g1-ctrl     cat
│   │   PDL1g2-tx       cat
│   │   PDL1g2-ctrl     cat
│   │   rep3-tx         cat
│   │   rep3-ctrl       cat
│   │   rep4-tx         cat
│   │   rep4-ctrl       cat
│   ├── ['hto'].obs • 1 [Feature]
│   │   technique       cat[bionty.ExperimentalF…    cell hashing
│   └── ['gdo'].var • 111 [Feature]
│       eGFPg1          cat
│       CUL3g1          cat
│       CUL3g2          cat
│       CUL3g3          cat
│       CMTM6g1         cat
│       CMTM6g2         cat
│       CMTM6g3         cat
│       NTg1            cat
│       NTg2            cat
│       NTg3            cat
│       NTg4            cat
│       NTg5            cat
│       NTg7            cat
│       PDL1g1          cat
│       PDL1g2          cat
│       PDL1g3          cat
│       ATF2g1          cat
│       ATF2g2          cat
│       ATF2g3          cat
│       ATF2g4          cat
└── Labels
    └── .experimental_factors    bionty.ExperimentalFactor    cell hashing
        .ulabels                 ULabel                       Perturbed, NT, rep3, rep2, rep1
# clean up test instance
!rm -r test-multimodal
!lamin delete --force test-multimodal
• deleting instance testuser1/test-multimodal