cellink.io.from_sgkit_dataset
- cellink.io.from_sgkit_dataset(sgkit_dataset, *, var_rename=None, obs_rename=None, X_field='GT', hard_call=True, keep_multiallelic=False, load_call_fields=None)
Convert an sgkit xarray.Dataset to AnnData.
- Parameters:
sgkit_dataset (Dataset) – xarray.Dataset from sgkit (lazy/dask-backed).
var_rename (dict | None (default: None)) – Mapping from sgkit variant keys (e.g., 'variant_position') to var column names.
obs_rename (dict | None (default: None)) – Mapping from sgkit's sample annotation keys to the desired gdata.obs column names.
X_field (str (default: 'GT')) – Field used to populate X. One of "GT", "DS", "GP", "MASK", "AC", "NONE":
  - "GT": collapsed allele count (sum of non-zero allele indices) -> X (samples, variants)
  - "DS": scalar dosage (call_DS collapsed across alts) -> X
  - "GP": argmax genotype state, mapped to alt-count when a mapping exists -> X
  - "MASK": fraction of masked allele copies per (variant, sample)
  - "AC": alias for "GT"
  - "NONE": X is not populated; it is set to an empty 2D dask array
hard_call (bool (default: True)) – If True, returns hard calls (0, 1, 2); if False, returns a dosage/additive encoding.
keep_multiallelic (bool (default: False)) – If True, stores extra alternate alleles beyond ALT1.
load_call_fields (Iterable[str] | None (default: None)) – Iterable of call_* keys to load as layers. None loads all call_* fields present.
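The "GT" option above describes collapsing per-allele calls into one value per (sample, variant). The following is a minimal NumPy sketch of that idea, not the library's implementation. It assumes an input shaped like sgkit's call_genotype array, (variants, samples, ploidy), with -1 marking a missing allele. The helper name collapse_gt and the choice to turn any call with a missing allele into NaN are hypothetical and only serve the illustration.

```python
import numpy as np

# Hypothetical toy genotypes with shape (variants, samples, ploidy).
# -1 marks a missing allele, following sgkit's convention.
call_genotype = np.array(
    [
        [[0, 0], [0, 1], [1, 1]],
        [[0, 2], [-1, -1], [1, 0]],
    ]
)


def collapse_gt(gt):
    """Sum allele indices over the ploidy axis; return (samples, variants).

    Assumption: a call with any missing allele becomes NaN.
    """
    missing = (gt < 0).any(axis=-1)
    # Reference alleles have index 0, so they add nothing to the sum.
    counts = gt.clip(min=0).sum(axis=-1).astype(float)
    counts[missing] = np.nan
    # Transpose from (variants, samples) to (samples, variants).
    return counts.T


X = collapse_gt(call_genotype)
print(X.shape)
print(X)
```

Under this literal reading of "sum of non-zero allele indices", a genotype of (0, 2) collapses to 2, so for multiallelic sites the value is not strictly a count of alternate alleles. The actual function may treat such sites and missing calls differently, for example through keep_multiallelic.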
- Return type: