cellink.io.read_pgen_zarr
- cellink.io.read_pgen_zarr(store)
Lazily read an AnnData Zarr v3 store written by stream_pgen_to_zarr.
This function reconstructs an anndata.AnnData object from a Zarr store while keeping the primary data matrix (X) backed by Dask arrays. It is designed for large genotype matrices that cannot be loaded fully into memory.
The reader preserves:
Dense X stored as a Zarr array (returned as a Dask-backed array)
Sparse matrices (CSR/CSC)
DataFrames (obs, var)
Awkward arrays
Standard AnnData container structure
- Parameters:
store (str or pathlib.Path) – Path to a Zarr directory created by stream_pgen_to_zarr or a compatible AnnData Zarr v3 store.
- Return type:
anndata.AnnData
- Returns:
AnnData object with:
X as a Dask-backed array (for dense storage)
obs and var as pandas DataFrames
empty container groups (uns, obsm, varm, layers, etc.) if present in the store
Notes
The returned object is lazy when X is dense. Computation is triggered only when .compute() is called or in-memory materialization is requested.
For sparse X written via stream_pgen_to_zarr(..., sparse=True), the matrix is loaded as a SciPy sparse matrix.
This function relies on AnnData’s experimental dispatched I/O API.
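
The lazy behaviour described for dense X can be illustrated with a minimal sketch. It does not call cellink; instead it assumes a small synthetic Dask array as a stand-in for adata.X, with rows as samples and columns as variants. The array shape, chunk size, and variable names are hypothetical.

```python
import numpy as np
import dask.array as da

# Synthetic stand-in for adata.X (assumption: samples x variants, dosages 0/1/2).
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(1000, 200)).astype(np.int8)
X = da.from_array(genotypes, chunks=(250, 200))

# Lazy: this only builds a task graph; no reduction is performed yet.
mean_dosage = X.mean(axis=0)

# Eager: .compute() runs the graph and returns an in-memory NumPy array.
result = mean_dosage.compute()
```

Reductions such as the per-variant mean above return a small result, so they can be computed without ever holding the full matrix as a single NumPy array.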
Examples
>>> import cellink
>>> adata = cellink.io.read_pgen_zarr("genotypes.zarr")
>>> adata
AnnData object with n_obs x n_vars = ...
>>> # Trigger computation
>>> X = adata.X.compute()
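
Calling .compute() on the whole of X materializes the full matrix, which may not fit in memory for large genotype data. One possible alternative, sketched below, is to materialize one row block at a time. The helper iter_row_blocks is hypothetical and not part of cellink, and a small synthetic Dask array again stands in for adata.X.

```python
import numpy as np
import dask.array as da


def iter_row_blocks(X, block_rows):
    """Yield (start, stop, block) where block is an in-memory NumPy row slice."""
    n_rows = X.shape[0]
    for start in range(0, n_rows, block_rows):
        stop = min(start + block_rows, n_rows)
        # Only this slice is materialized; the rest of X stays lazy.
        yield start, stop, X[start:stop].compute()


# Synthetic stand-in for adata.X (assumption: 10 samples x 4 variants).
rng = np.random.default_rng(1)
dense = rng.integers(0, 3, size=(10, 4)).astype(np.int8)
X = da.from_array(dense, chunks=(4, 4))

# Accumulate per-variant dosage totals block by block.
alt_counts = np.zeros(X.shape[1], dtype=np.int64)
n_blocks = 0
for start, stop, block in iter_row_blocks(X, block_rows=4):
    alt_counts += block.sum(axis=0)
    n_blocks += 1
```

Matching block_rows to the row chunk size of the underlying array avoids reading the same chunk more than once.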