lamindb.Artifact
- class lamindb.Artifact(data: UPathStr, type: ArtifactKind | None = None, key: str | None = None, description: str | None = None, revises: Artifact | None = None, run: Run | None = None)
- Bases: Record, IsVersioned, TracksRun, TracksUpdates
- Datasets & models stored as files, folders, or arrays.
- Artifacts manage data in local or remote storage. Some artifacts are array-like, e.g., when stored as .parquet, .h5ad, .zarr, or .tiledb.
- Parameters:
- data – UPathStr. A path to a local or remote folder or file.
- type – Literal["dataset", "model"] | None = None. The artifact type.
- key – str | None = None. A path-like key to reference an artifact in default storage, e.g., "myfolder/myfile.fcs". Artifacts with the same key form a revision family.
- description – str | None = None. A description.
- revises – Artifact | None = None. Previous version of the artifact. Triggers a revision.
- run – Run | None = None. The run that creates the artifact.
 
- Typical storage formats & their API accessors
  - Arrays:
    - Table: .csv, .tsv, .parquet, .ipc ⟷ DataFrame, pyarrow.Table
    - Annotated matrix: .h5ad, .h5mu, .zarr ⟷ AnnData, MuData
    - Generic array: HDF5 group, zarr group, TileDB store ⟷ HDF5, zarr, TileDB loaders
  - Non-arrays:
    - Image: .jpg, .png ⟷ np.ndarray, …
    - Fastq: .fastq ⟷ /
    - VCF: .vcf ⟷ /
    - QC: .html ⟷ /
You’ll find these values in the suffix & otype fields.
LaminDB makes some default choices (e.g., it serializes a DataFrame as a .parquet file).
- See also
- Storage
- Storage locations for artifacts. 
- Collection
- Collections of artifacts. 
- from_df()
- Create an artifact from a DataFrame.
- from_anndata()
- Create an artifact from an AnnData.
- Examples
Create an artifact from a file path and pass description:
    >>> artifact = ln.Artifact("s3://my_bucket/my_folder/my_file.csv", description="My file")
    >>> artifact = ln.Artifact("./my_local_file.jpg", description="My image")
You can also pass key to create a virtual filepath hierarchy:
    >>> artifact = ln.Artifact("./my_local_file.jpg", key="example_datasets/dataset1.jpg")
What works for files also works for folders:
    >>> artifact = ln.Artifact("s3://my_bucket/my_folder", description="My folder")
    >>> artifact = ln.Artifact("./my_local_folder", description="My local folder")
    >>> artifact = ln.Artifact("./my_local_folder", key="project1/my_target_folder")
- Why does the API look this way?
It’s inspired by APIs building on AWS S3. Both boto3 and quilt select a bucket (akin to default storage in LaminDB) and define a target path through a key argument.
In boto3:
    # signature: S3.Bucket.upload_file(filepath, key)
    import boto3
    s3 = boto3.resource('s3')
    bucket = s3.Bucket('mybucket')
    bucket.upload_file('/tmp/hello.txt', 'hello.txt')
In quilt3:
    # signature: quilt3.Bucket.put_file(key, filepath)
    import quilt3
    bucket = quilt3.Bucket('mybucket')
    bucket.put_file('hello.txt', '/tmp/hello.txt')
Make a new version of an artifact:
    >>> artifact = ln.Artifact.from_df(df, key="example_datasets/dataset1.parquet").save()
    >>> artifact_v2 = ln.Artifact(df_updated, key="example_datasets/dataset1.parquet").save()
Alternatively, if you don’t want to provide a value for key, you can use revises:
    >>> artifact = ln.Artifact.from_df(df, description="My dataframe").save()
    >>> artifact_v2 = ln.Artifact(df_updated, revises=artifact).save()
Attributes
features: FeatureManager
- Feature manager.
Features denote dataset dimensions, i.e., the variables that measure labels & numbers.
Annotate with features & values:
    artifact.features.add_values({
        "species": organism,  # here, organism is an Organism record
        "scientist": ['Barbara McClintock', 'Edgar Anderson'],
        "temperature": 27.6,
        "study": "Candidate marker study"
    })
Query for features & values:
    ln.Artifact.features.filter(scientist="Barbara McClintock")
Features may or may not be part of the artifact content in storage. For instance, the Curator flow validates the columns of a DataFrame-like artifact and annotates it with features corresponding to these columns. artifact.features.add_values, by contrast, does not validate the content of the artifact.
- property labels: LabelManager
- Label manager.
To annotate with labels, you typically use the registry-specific accessors, for instance ulabels:
    candidate_marker_study = ln.ULabel(name="Candidate marker study").save()
    artifact.ulabels.add(candidate_marker_study)
Similarly, you query based on these accessors:
    ln.Artifact.filter(ulabels__name="Candidate marker study").all()
Unlike the registry-specific accessors, the .labels accessor provides a way of associating labels with features:
    study = ln.Feature(name="study", dtype="cat").save()
    artifact.labels.add(candidate_marker_study, feature=study)
Note that the above is equivalent to:
    artifact.features.add_values({"study": candidate_marker_study})
- property n_objects: int
params: ParamManager
- Param manager.
Example:
    artifact.params.add_values({
        "hidden_size": 32,
        "bottleneck_size": 16,
        "batch_size": 32,
        "preprocess_params": {
            "normalization_type": "cool",
            "subset_highlyvariable": True,
        },
    })
- property path: Path | UPath
- Path.
File in cloud storage, here AWS S3:
    >>> artifact = ln.Artifact("s3://my-bucket/my-file.csv").save()
    >>> artifact.path
    S3Path('s3://my-bucket/my-file.csv')
File in local storage:
    >>> ln.Artifact("./myfile.csv", key="myfile").save()
    >>> artifact = ln.Artifact.get(key="myfile")
    >>> artifact.path
    PosixPath('/home/runner/work/lamindb/lamindb/docs/guide/mydata/myfile.csv')
- property stem_uid: str
- Universal id characterizing the version family.
The full uid of a record is obtained by concatenating the stem uid and version information:
    stem_uid = random_base62(n_char)  # a random base62 sequence of length 12 (transform) or 16 (artifact, collection)
    version_uid = "0000"  # an auto-incrementing 4-digit base62 number
    uid = f"{stem_uid}{version_uid}"  # concatenate the stem_uid & version_uid
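The uid scheme above can be made concrete with a short, runnable sketch. It assumes a base62 alphabet ordered as digits, uppercase, then lowercase letters, and treats the 4-character version suffix as a fixed-width base62 counter. random_base62 mirrors the pseudo-code name above and increment_version_uid is a hypothetical helper, so neither should be read as LaminDB's actual implementation.

```python
import secrets
import string

# Assumed ordering of the base62 alphabet: digits, uppercase, lowercase.
BASE62 = string.digits + string.ascii_uppercase + string.ascii_lowercase


def random_base62(n_char: int) -> str:
    """Return a random base62 string of length n_char."""
    return "".join(secrets.choice(BASE62) for _ in range(n_char))


def increment_version_uid(version_uid: str) -> str:
    """Increment a fixed-width base62 counter, e.g. "0000" -> "0001"."""
    digits = [BASE62.index(char) for char in version_uid]
    position = len(digits) - 1
    while position >= 0:
        digits[position] += 1
        if digits[position] < len(BASE62):
            break
        digits[position] = 0  # carry into the next position to the left
        position -= 1
    else:
        raise OverflowError("version counter exhausted")
    return "".join(BASE62[d] for d in digits)


stem_uid = random_base62(16)  # 16 characters for artifacts, per the text above
uid_v1 = f"{stem_uid}0000"
uid_v2 = f"{stem_uid}{increment_version_uid('0000')}"
```

Because every version shares the same 16-character prefix, slicing uid[:16] recovers the stem uid of the family.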
- property type: str
Simple fields
uid: str
- A universal random id. 
key: str | None
- A (virtual) relative file path within the artifact’s storage location.
Setting a key is useful to automatically group artifacts into a version family.
LaminDB defaults to a virtual file path to make renaming of data in object storage easy.
If you register existing files in a storage location, the key equals the actual filepath on the underlying filesystem or object store.
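To illustrate how a shared key groups artifacts into a version family, here is a toy in-memory model. ArtifactRecord, register, and family are hypothetical stand-ins rather than the LaminDB API; the sketch only shows the bookkeeping implied above: saving under an existing key adds a new version and moves the is_latest flag to it.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List

_next_id = count(1)


@dataclass
class ArtifactRecord:
    """Hypothetical stand-in for an artifact row, not the LaminDB model."""

    key: str
    content: bytes
    is_latest: bool = True
    id: int = field(default_factory=lambda: next(_next_id))


def register(registry: List[ArtifactRecord], key: str, content: bytes) -> ArtifactRecord:
    """Add a new record; earlier records with the same key stop being the latest."""
    for record in registry:
        if record.key == key:
            record.is_latest = False  # same key -> same version family
    new_record = ArtifactRecord(key=key, content=content)
    registry.append(new_record)
    return new_record


def family(registry: List[ArtifactRecord], key: str) -> List[ArtifactRecord]:
    """All versions that share a key, oldest first."""
    return [record for record in registry if record.key == key]


registry: List[ArtifactRecord] = []
v1 = register(registry, "example_datasets/dataset1.parquet", b"first")
v2 = register(registry, "example_datasets/dataset1.parquet", b"second")
other = register(registry, "example_datasets/dataset2.parquet", b"unrelated")
```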
description: str | None
- A description. 
suffix: str
- Path suffix or an empty string if no canonical suffix exists.
This is either a file suffix (".csv", ".h5ad", etc.) or the empty string "".
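A minimal sketch of deriving the suffix from a path and mapping it to a default Python object type, in the spirit of the suffix and otype fields. DEFAULT_OTYPES is a hypothetical excerpt assembled from the format overview at the top of this page, not LaminDB's internal table.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical excerpt of suffix -> default Python object type,
# taken from the format overview above (not LaminDB's internal table).
DEFAULT_OTYPES = {
    ".csv": "DataFrame",
    ".tsv": "DataFrame",
    ".parquet": "DataFrame",
    ".h5ad": "AnnData",
    ".h5mu": "MuData",
}


def suffix_of(path: str) -> str:
    """Return the last suffix of path, e.g. ".csv", or "" if there is none."""
    return PurePosixPath(path).suffix


def default_otype(path: str) -> Optional[str]:
    """Look up a default object type by (case-insensitive) suffix; None if unknown."""
    return DEFAULT_OTYPES.get(suffix_of(path).lower())
```

PurePath.suffix returns only the last component, so names with compound suffixes would need PurePath.suffixes instead.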
kind: Literal['dataset', 'model'] | None
- ArtifactKind (default None).
otype: str | None
- Default Python object type, e.g., DataFrame, AnnData. 
size: int | None
- Size in bytes.
Examples: 1KB is 1e3 bytes, 1MB is 1e6, 1GB is 1e9, 1TB is 1e12, etc.
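Because the examples above use decimal (SI) units, a human-readable size divides by 1000 at each step rather than 1024. A small helper sketch (format_size is hypothetical, not part of LaminDB):

```python
UNITS = ["B", "KB", "MB", "GB", "TB"]


def format_size(n_bytes: int) -> str:
    """Format a byte count with decimal units, matching 1KB = 1e3 bytes."""
    size = float(n_bytes)
    unit_index = 0
    while size >= 1000 and unit_index < len(UNITS) - 1:
        size /= 1000  # decimal step, not 1024
        unit_index += 1
    if unit_index == 0:
        return f"{n_bytes} B"
    return f"{size:.1f} {UNITS[unit_index]}"
```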
hash: str | None
- Hash or pseudo-hash of artifact content.
Useful to ascertain integrity and avoid duplication.
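As an illustration of how a content hash helps ascertain integrity and avoid duplication, the sketch below hashes bytes in chunks and checks whether a digest was already seen. SHA-256 and the 1 MiB chunk size are assumptions for this example; LaminDB's own hashing scheme (including when it falls back to a pseudo-hash) is not reproduced here.

```python
import hashlib
import io
from typing import BinaryIO


def content_hash(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """SHA-256 hex digest of a binary stream, read in chunks to bound memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def file_hash(path: str) -> str:
    """Hash a file on disk with the same chunked reader."""
    with open(path, "rb") as handle:
        return content_hash(handle)


# Deduplication check: a digest that is already known means identical content.
known_hashes = {content_hash(io.BytesIO(b"x,y\n1,2\n"))}
is_duplicate = content_hash(io.BytesIO(b"x,y\n1,2\n")) in known_hashes
is_new = content_hash(io.BytesIO(b"x,y\n3,4\n")) not in known_hashes
```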
n_files: int | None
- Number of files for folder-like artifacts; None for file-like artifacts.
Note that some arrays are also stored as folders, e.g., .zarr or .tiledbsoma.
Changed in version 1.0: Renamed from n_objects to n_files.
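A sketch of the n_files convention, a file count for folders and None for single files, using pathlib. Counting regular files recursively with rglob is an assumption about how nested subfolders are treated; count_files is a hypothetical helper.

```python
import tempfile
from pathlib import Path
from typing import Optional


def count_files(path: Path) -> Optional[int]:
    """Number of files under a folder (recursively), or None for a single file."""
    if path.is_file():
        return None
    return sum(1 for p in path.rglob("*") if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "my_local_folder"
    (folder / "sub").mkdir(parents=True)
    for name in ["a.csv", "b.csv", "sub/c.csv"]:
        (folder / name).write_text("x\n")
    folder_count = count_files(folder)          # includes the nested file
    file_count = count_files(folder / "a.csv")  # file-like artifact
```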
n_observations: int | None
- Number of observations.
Typically, this denotes the first array dimension.
version: str | None
- Version (default None).
Defines the version of a family of records characterized by the same stem_uid.
Consider using semantic versioning compatible with Python's version scheme.
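One reason to stick to a consistent scheme such as semantic versioning is that version strings compare incorrectly as plain text. The sketch below orders simple MAJOR.MINOR.PATCH strings numerically; it deliberately ignores pre-release and local version tags, which a full implementation such as packaging.version.Version handles.

```python
def version_key(version: str) -> tuple:
    """Split "1.10.2" into (1, 10, 2) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


versions = ["1.9", "1.10", "1.2"]
latest_as_text = max(versions)                    # lexicographic: picks "1.9"
latest_semantic = max(versions, key=version_key)  # numeric: picks "1.10"
```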
is_latest: bool
- Boolean flag that indicates whether a record is the latest in its version family. 
created_at: datetime
- Time of creation of the record.
updated_at: datetime
- Time of last update to the record.
Relational fields
space: Space
- The space in which the record lives. 
collections: Collection
- The collections that this artifact is part of. 
Class methods
- classmethod df(include=None, features=False, limit=100)
- Convert to pd.DataFrame.
By default, shows all direct fields, except updated_at.
Use the include or features arguments to include other data.
- Parameters:
- include (str | list[str] | None, default: None) – Related fields to include as columns. Takes strings of the form "ulabels__name", "cell_types__name", etc., or a list of such strings.
- features (bool | list[str], default: False) – If True, map all features of the Feature registry onto the resulting DataFrame. Only available for Artifact.
- limit (int, default: 100) – Maximum number of rows to display from a pandas DataFrame. Defaults to 100 to reduce database load.
 
- Return type:
- DataFrame
- Examples
Include the name of the creator in the DataFrame:
    >>> ln.ULabel.df(include="created_by__name")
Include display of features for Artifact:
    >>> df = ln.Artifact.df(features=True)
    >>> ln.view(df)  # visualize with type annotations
Only include select features:
    >>> df = ln.Artifact.df(features=["cell_type_by_expert", "cell_type_by_model"])
- classmethod filter(*queries, **expressions)
- Query records.
- Parameters:
- queries – One or multiple Q objects.
- expressions – Fields and values passed as Django query expressions. 
 
- Return type:
- QuerySet
- Returns:
- A QuerySet.
- See also
- Guide: Query & search registries
- Django documentation: Queries
- Examples
    >>> ln.ULabel(name="my label").save()
    >>> ln.ULabel.filter(name__startswith="my").df()
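Keyword expressions such as name__startswith follow Django's field__lookup convention. To show how such an expression is read, here is a toy in-memory filter over dictionaries. It supports only a few lookups (exact, startswith, contains, in) and does not traverse related fields such as ulabels__name; Django and LaminDB instead translate expressions into SQL, so treat this purely as an illustration.

```python
from typing import Any, Callable, Dict, List

# A few Django-style lookups; each compares a field value with the argument.
LOOKUPS: Dict[str, Callable[[Any, Any], bool]] = {
    "exact": lambda value, arg: value == arg,
    "startswith": lambda value, arg: str(value).startswith(arg),
    "contains": lambda value, arg: arg in str(value),
    "in": lambda value, arg: value in arg,
}


def matches(record: Dict[str, Any], **expressions: Any) -> bool:
    """True if the record satisfies every field__lookup=value expression."""
    for expression, arg in expressions.items():
        field, _, lookup = expression.partition("__")
        check = LOOKUPS[lookup or "exact"]  # a bare field name means an exact match
        if not check(record.get(field), arg):
            return False
    return True


def filter_records(records: List[Dict[str, Any]], **expressions: Any) -> List[Dict[str, Any]]:
    return [record for record in records if matches(record, **expressions)]


ulabels = [{"name": "my label"}, {"name": "other label"}, {"name": "my second label"}]
hits = filter_records(ulabels, name__startswith="my")
```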
- classmethod from_anndata(adata, key=None, description=None, run=None, revises=None, **kwargs)
- Create from AnnData, validate & link features.
- Parameters:
- adata (AnnData | UPathStr) – An AnnData object or a path to AnnData-like data.
- key (str | None, default: None) – A relative path within default storage, e.g., "myfolder/myfile.h5ad".
- description (str | None, default: None) – A description.
- revises (Artifact | None, default: None) – An old version of the artifact.
- run (Run | None, default: None) – The run that creates the artifact.
 
- Return type:
- Artifact 
- See also
- Collection()
- Track collections.
- Feature
- Track features.
- Examples
    >>> import bionty as bt
    >>> bt.settings.organism = "human"
    >>> adata = ln.core.datasets.anndata_with_obs()
    >>> artifact = ln.Artifact.from_anndata(adata, description="mini anndata with obs")
    >>> artifact.save()
- classmethod from_df(df, key=None, description=None, run=None, revises=None, **kwargs)
- Create from DataFrame, validate & link features.
- Parameters:
- df (DataFrame) – A DataFrame object.
- key (str | None, default: None) – A relative path within default storage, e.g., "myfolder/myfile.parquet".
- description (str | None, default: None) – A description.
- revises (Artifact | None, default: None) – An old version of the artifact.
- run (Run | None, default: None) – The run that creates the artifact.
 
- Return type:
- Artifact
- See also
- Collection()
- Track collections. 
- Feature
- Track features. 
- Examples
    >>> df = ln.core.datasets.df_iris_in_meter_batch1()
    >>> df.head()
       sepal_length  sepal_width  petal_length  petal_width  iris_organism_code
    0         0.051        0.035         0.014        0.002                   0
    1         0.049        0.030         0.014        0.002                   0
    2         0.047        0.032         0.013        0.002                   0
    3         0.046        0.031         0.015        0.002                   0
    4         0.050        0.036         0.014        0.002                   0
    >>> artifact = ln.Artifact.from_df(df, description="Iris flower collection batch1")
    >>> artifact.save()
- classmethod from_dir(path, key=None, *, run=None)
- Create a list of artifact objects from a directory.
Hint: If you have a large number of files (several 100k) and don’t want to track them individually, create a single Artifact via Artifact(path) for them. See, e.g., RxRx: cell imaging.
- Parameters:
- path (lamindb.core.types.UPathStr) – Source path of folder. 
- key (str | None, default: None) – Key for storage destination. If None and the directory is in a registered location, the inferred key will reflect the relative position. If None and the directory is outside of a registered storage location, the inferred key defaults to path.name.
- run (Run | None, default: None) – A Run object.
 
- Return type:
- list[Artifact]
- Examples
    >>> dir_path = ln.core.datasets.generate_cell_ranger_files("sample_001", ln.settings.storage)
    >>> artifacts = ln.Artifact.from_dir(dir_path)
    >>> ln.save(artifacts)
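The key inference described for the key parameter can be sketched with pathlib. The function infer_key and its storage_roots argument are hypothetical; only the rule itself (relative position inside a registered location, otherwise path.name) comes from the parameter description above.

```python
from pathlib import PurePosixPath
from typing import Iterable, Optional


def infer_key(path: str, storage_roots: Iterable[str], key: Optional[str] = None) -> str:
    """Infer the destination key for a directory, following the rule described above."""
    if key is not None:
        return key  # an explicitly passed key is used as-is
    folder = PurePosixPath(path)
    for root in storage_roots:
        try:
            # Inside a registered location: the key is the relative position.
            return folder.relative_to(PurePosixPath(root)).as_posix()
        except ValueError:
            continue  # not below this root; try the next one
    # Outside every registered location: fall back to the folder name.
    return folder.name
```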
- classmethod from_mudata(mdata, key=None, description=None, run=None, revises=None, **kwargs)
- Create from MuData, validate & link features.
- Parameters:
- mdata (MuData) – A MuData object.
- key (str | None, default: None) – A relative path within default storage, e.g., "myfolder/myfile.h5mu".
- description (str | None, default: None) – A description.
- revises (Artifact | None, default: None) – An old version of the artifact.
- run (Run | None, default: None) – The run that creates the artifact.
 
- Return type:
- Artifact
- See also
- Collection()
- Track collections. 
- Feature
- Track features. 
- Examples
    >>> import bionty as bt
    >>> bt.settings.organism = "human"
    >>> mdata = ln.core.datasets.mudata_papalexi21_subset()
    >>> artifact = ln.Artifact.from_mudata(mdata, description="a mudata object")
    >>> artifact.save()
- classmethod get(idlike=None, **expressions)
- Get a single record.
- Parameters:
- idlike (int | str | None, default: None) – Either a uid stub, a uid, or an integer id.
- expressions – Fields and values passed as Django query expressions. 
 
- Return type:
- Returns:
- A record. 
- Raises:
- lamindb.core.exceptions.DoesNotExist – In case no matching record is found. 
- See also
- Guide: Query & search registries
- Django documentation: Queries 
- Examples
    >>> ulabel = ln.ULabel.get("FvtpPJLJ")
    >>> ulabel = ln.ULabel.get(name="my-label")
- classmethod lookup(field=None, return_field=None)
- Return an auto-complete object for a field.
- Parameters:
- field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to the first string field.
- return_field (str | DeferredAttribute | None, default: None) – The field to return. If None, returns the whole record.
 
- Return type:
- NamedTuple
- Returns:
- A NamedTuple of lookup information of the field values with a dictionary converter.
- Examples
    >>> import bionty as bt
    >>> bt.settings.organism = "human"
    >>> bt.Gene.from_source(symbol="ADGB-DT").save()
    >>> lookup = bt.Gene.lookup()
    >>> lookup.adgb_dt
    >>> lookup_dict = lookup.dict()
    >>> lookup_dict['ADGB-DT']
    >>> lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
    >>> lookup_by_ensembl_id.ensg00000002745
    >>> lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")
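The lookup object exposes field values as attributes, which requires turning values such as "ADGB-DT" into valid Python identifiers such as adgb_dt. The sketch below does this with collections.namedtuple. The normalization rule (lowercase, runs of non-word characters to underscores, a v_ prefix for leading digits, a trailing underscore for keywords) is an assumption inferred from the example, not the actual bionty or LaminDB implementation, and the sketch returns plain strings instead of records.

```python
import keyword
import re
from collections import namedtuple


def to_identifier(value: str) -> str:
    """Normalize a field value to an attribute name, e.g. "ADGB-DT" -> "adgb_dt"."""
    name = re.sub(r"\W+", "_", value.strip().lower()).strip("_")
    if not name or name[0].isdigit():
        name = f"v_{name}"  # attribute names cannot start with a digit
    if keyword.iskeyword(name):
        name = f"{name}_"  # e.g. "class" -> "class_"
    return name


def make_lookup(values):
    """Return a NamedTuple with one attribute per value and a .dict() converter."""
    fields = {to_identifier(value): value for value in values}
    base = namedtuple("Lookup", list(fields))

    class Lookup(base):
        def dict(self):
            # Keyed by the original values, e.g. lookup.dict()["ADGB-DT"].
            return {value: value for value in self}

    return Lookup(**fields)


lookup = make_lookup(["ADGB-DT", "TP53", "1-Oct"])
```

namedtuple rejects duplicate field names, so values that normalize to the same identifier would need de-duplication in a real implementation.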
- classmethod search(string, *, field=None, limit=20, case_sensitive=False)
- Search.
- Parameters:
- string (str) – The input string to match against the field ontology values.
- field (str | DeferredAttribute | None, default: None) – The field or fields to search. Searches all string fields by default.
- limit (int | None, default: 20) – Maximum number of top results to return.
- case_sensitive (bool, default: False) – Whether the match is case sensitive.
 
- Return type:
- QuerySet
- Returns:
- A sorted DataFrame of search results with a score in column score, or a QuerySet if return_queryset is True.
- Examples
    >>> ulabels = ln.ULabel.from_values(["ULabel1", "ULabel2", "ULabel3"], field="name")
    >>> ln.save(ulabels)
    >>> ln.ULabel.search("ULabel2")
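For intuition about a scored search, here is a sketch that ranks string values with difflib.SequenceMatcher, scales the similarity to a score between 0 and 100, and keeps the top limit results. The scoring is a stand-in chosen for illustration; the scores LaminDB reports come from its own matching logic and will generally differ.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def search(values: List[str], string: str, limit: Optional[int] = 20,
           case_sensitive: bool = False) -> List[Tuple[str, float]]:
    """Rank values by similarity to string; return (value, score) pairs, best first."""
    def normalize(text: str) -> str:
        return text if case_sensitive else text.lower()

    query = normalize(string)
    scored = [
        (value, round(100 * SequenceMatcher(None, query, normalize(value)).ratio(), 1))
        for value in values
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # stable: ties keep input order
    return scored if limit is None else scored[:limit]


results = search(["ULabel1", "ULabel2", "ULabel3", "Other"], "ULabel2")
```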
- classmethod using(instance)
- Use a non-default LaminDB instance.
- Parameters:
- instance (str | None) – An instance identifier of the form "account_handle/instance_name".
- Return type:
- QuerySet
- Examples
    >>> ln.ULabel.using("account_handle/instance_name").search("ULabel7", field="name")
                  uid  score
    name
    ULabel7  g7Hk9b2v  100.0
    ULabel5  t4Jm6s0q   75.0
    ULabel6  r2Xw8p1z   75.0
Methods
- cache(is_run_input=None)
- Download a cloud artifact to the local cache.
Follows syncing logic: only caches an artifact if it’s outdated in the local cache.
Returns a path to a locally cached on-disk object (say, a .jpg file).
- Return type:
- Path
- Examples
Sync a file from the cloud and return the local path of the cache:
    >>> artifact.cache()
    PosixPath('/home/runner/work/Caches/lamindb/lamindb-ci/lndb-storage/pbmc68k.h5ad')
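The syncing rule (copy only if the cached copy is missing or outdated) can be sketched for local files with pathlib and shutil.copy2. Treating a copy as up to date when its size matches and its modification time is not older than the source is an assumption for this illustration; how LaminDB detects outdated cache entries and downloads from cloud storage is not shown. cache_file is a hypothetical helper.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Tuple


def cache_file(source: Path, cache_dir: Path) -> Tuple[Path, bool]:
    """Copy source into cache_dir unless an up-to-date copy exists; report whether it copied."""
    target = cache_dir / source.name
    src = source.stat()
    if target.exists():
        cached = target.stat()
        if cached.st_size == src.st_size and cached.st_mtime_ns >= src.st_mtime_ns:
            return target, False  # already in sync, skip the copy
    cache_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)  # copy2 also preserves the modification time
    return target, True


with tempfile.TemporaryDirectory() as tmp:
    remote = Path(tmp) / "remote" / "pbmc68k.h5ad"  # stands in for a cloud object
    remote.parent.mkdir()
    remote.write_bytes(b"v1")
    cache_dir = Path(tmp) / "cache"
    _, copied_first = cache_file(remote, cache_dir)  # not cached yet: copy
    _, copied_again = cache_file(remote, cache_dir)  # in sync: no copy
    remote.write_bytes(b"version 2")                  # the source changes
    path, copied_after_change = cache_file(remote, cache_dir)
    cached_bytes = path.read_bytes()
```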
- delete(permanent=None, storage=None, using_key=None)
- Trash or permanently delete.
A first call to .delete() puts an artifact into the trash (sets _branch_code to -1). A second call permanently deletes the artifact. For a folder artifact with multiple versions, deleting a non-latest version does not delete the underlying storage unless storage=True is specified. Deleting the latest version of a folder artifact deletes all of its versions.
FAQ: Storage FAQ
- Parameters:
- permanent (bool | None, default: None) – Permanently delete the artifact (skip trash).
- storage (bool | None, default: None) – Indicate whether you want to delete the artifact in storage.
 
- Return type:
- None
- Examples
For an Artifact object artifact, call:
    >>> artifact = ln.Artifact.filter(key="some.csv").one()
    >>> artifact.delete()  # delete a single file artifact
    >>> artifact = ln.Artifact.filter(key="some.tiledbsoma", is_latest=False).first()
    >>> artifact.delete()  # delete an old version; the data will not be deleted
    >>> artifact = ln.Artifact.filter(key="some.tiledbsoma", is_latest=True).one()
    >>> artifact.delete()  # delete all versions; the data will be deleted or prompted for deletion
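The two-step delete semantics above (the first call trashes, a second call deletes permanently, and restore() undoes the first step) behave like a small state machine. RecordState is a hypothetical model of those transitions only: it uses -1 for trashed records as stated above, assumes 1 for the default branch, and never touches storage.

```python
from typing import Optional

TRASH = -1   # branch code of trashed records, as described above
DEFAULT = 1  # assumed branch code of records that are not trashed


class RecordState:
    """Hypothetical model of the trash / restore / permanent-delete transitions."""

    def __init__(self) -> None:
        self.branch_code: Optional[int] = DEFAULT  # None once permanently deleted

    def delete(self, permanent: Optional[bool] = None) -> None:
        if self.branch_code is None:
            raise RuntimeError("record was already permanently deleted")
        if permanent or self.branch_code == TRASH:
            self.branch_code = None  # second call, or permanent=True: gone for good
        else:
            self.branch_code = TRASH  # first call: move to the trash

    def restore(self) -> None:
        if self.branch_code == TRASH:
            self.branch_code = DEFAULT


state = RecordState()
state.delete()   # first call: trashed
trashed = state.branch_code
state.restore()  # back on the default branch
restored = state.branch_code
state.delete()
state.delete()   # second call: permanently deleted
final = state.branch_code
```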
- describe(print_types=False)
- Describe relations of the record.
- Examples
    >>> artifact.describe()
- load(is_run_input=None, **kwargs)
- Cache and load into memory.
See all loaders.
- Return type:
- Any
- Examples
Load a DataFrame-like artifact:
    >>> artifact.load().head()
       sepal_length  sepal_width  petal_length  petal_width  iris_organism_code
    0         0.051        0.035         0.014        0.002                   0
    1         0.049        0.030         0.014        0.002                   0
    2         0.047        0.032         0.013        0.002                   0
    3         0.046        0.031         0.015        0.002                   0
    4         0.050        0.036         0.014        0.002                   0
Load an AnnData-like artifact:
    >>> artifact.load()
    AnnData object with n_obs × n_vars = 70 × 765
Fall back to cache() if no in-memory representation is configured:
    >>> artifact.load()
    PosixPath('/home/runner/work/lamindb/lamindb/docs/guide/mydata/.lamindb/jb7BY5UJoQVGMUOKiLcn.jpg')
- open(mode='r', is_run_input=None)
- Return a cloud-backed data object.
Works for AnnData (.h5ad and .zarr), generic hdf5 and zarr, tiledbsoma objects (.tiledbsoma), and pyarrow-compatible formats.
- Parameters:
- mode (str, default: 'r') – Can be "w" (write mode) only for tiledbsoma stores; otherwise, it must be "r" (read-only mode).
- Return type:
- AnnDataAccessor | BackedAccessor | SOMACollection | SOMAExperiment | PyArrowDataset 
- Notes
For more info, see the tutorial: Slice arrays.
- Examples
Read AnnData in backed mode from the cloud:
    >>> artifact = ln.Artifact.get(key="lndb-storage/pbmc68k.h5ad")
    >>> artifact.open()
    AnnDataAccessor object with n_obs × n_vars = 70 × 765
    constructed for the AnnData object pbmc68k.h5ad
    ...
- replace(data, run=None, format=None)
- Replace artifact content.
- Parameters:
- data (UPathStr | pd.DataFrame | AnnData | MuData) – A file path or an in-memory DataFrame, AnnData, or MuData object.
- run (Run | None, default: - None) – The run that created the artifact gets auto-linked if- ln.track()was called.
 
- Return type:
- None 
- Examples
Say we made a change to the content of an artifact, e.g., edited the image paradisi05_laminopathic_nuclei.jpg.
This is how we replace the old file in storage with the new file:
    >>> artifact.replace("paradisi05_laminopathic_nuclei.jpg")
    >>> artifact.save()
Note that this changes neither the storage key nor the filename. However, it will update the suffix if it changes.
- restore()
- Restore from trash.
- Return type:
- None
- Examples
For any Artifact object artifact, call:
    >>> artifact.restore()
- save(upload=None, **kwargs)
- Save to database & storage.
- Parameters:
- upload ( - bool|- None, default:- None) – Trigger upload to cloud storage in instances with hybrid storage mode.
- Return type:
- Artifact
- Examples
    >>> artifact = ln.Artifact("./myfile.csv", description="myfile")
    >>> artifact.save()
- view_lineage(with_children=True)
- Graph of data flow.
- Return type:
- None
- Notes
For more info, see use cases: Data lineage.
- Examples
    >>> collection.view_lineage()
    >>> artifact.view_lineage()