cellink.tl.external.format_gsmap_sumstats
- cellink.tl.external.format_gsmap_sumstats(sumstats, out_prefix, snp=None, a1=None, a2=None, beta=None, se=None, p=None, z=None, n=None, chr_col=None, pos=None, info=None, frq=None, info_min=0.9, maf_min=0.01, keep_chr_pos=False, dbsnp=None, tmp_dir=None, cleanup_tmp=False)
Convert GWAS summary statistics into gsMap-compatible format.
Thin wrapper around gsmap format_sumstats that accepts a pandas DataFrame as input in addition to a file path, handling the temporary file creation and column remapping automatically. The output is a gzip-compressed file with columns SNP, A1, A2, Z, N, ready to pass as --sumstats_file to any gsmap subcommand.
- Parameters:
sumstats (str, Path, or pd.DataFrame) – Input GWAS summary statistics. If a DataFrame, it is written to a temporary file in tmp_dir before being passed to the CLI.
out_prefix (str or Path) – Output prefix. The formatted file is written as {out_prefix}.sumstats.gz.
snp (str, optional) – Column name for SNP rs-identifiers.
a1 (str, optional) – Column name for effect allele.
a2 (str, optional) – Column name for non-effect allele.
beta (str, optional) – Column name for GWAS beta coefficient.
se (str, optional) – Column name for standard error of beta.
p (str, optional) – Column name for p-value.
z (str, optional) – Column name for Z-statistic.
n (str, optional) – Column name for sample size.
chr_col (str, optional) – Column name for chromosome.
pos (str, optional) – Column name for base-pair position.
info (str, optional) – Column name for INFO imputation quality score.
frq (str, optional) – Column name for allele frequency.
info_min (float, default=0.9) – Minimum INFO score threshold.
maf_min (float, default=0.01) – Minimum minor allele frequency threshold.
keep_chr_pos (bool, default=False) – Retain chromosome and position columns in the output.
dbsnp (str or Path, optional) – Path to a dbSNP reference file for rs-ID matching.
tmp_dir (str or Path, optional) – Directory for the temporary text file written when sumstats is a DataFrame. Defaults to the current working directory.
cleanup_tmp (bool, default=False) – Delete the temporary file after gsmap format_sumstats finishes.
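The info_min and maf_min thresholds can be read as per-variant row filters. The sketch below is illustrative only and does not call gsmap: it assumes that the minor allele frequency is taken as min(frq, 1 - frq) and that rows at or above each threshold are kept. The exact boundary handling inside gsmap format_sumstats is not stated here, and the helper name passes_qc and the column names INFO and FRQ are hypothetical.

```python
def passes_qc(row, info_min=0.9, maf_min=0.01):
    """Return True if a variant row survives the INFO and MAF thresholds.

    Assumption: both comparisons are inclusive; the real CLI may differ.
    """
    # Fold the allele frequency onto [0, 0.5] to get the minor allele frequency.
    maf = min(row["FRQ"], 1.0 - row["FRQ"])
    return row["INFO"] >= info_min and maf >= maf_min


# Made-up rows for illustration only.
rows = [
    {"SNP": "rs1", "INFO": 0.98, "FRQ": 0.30},   # passes both filters
    {"SNP": "rs2", "INFO": 0.85, "FRQ": 0.30},   # INFO below info_min
    {"SNP": "rs3", "INFO": 0.99, "FRQ": 0.995},  # MAF below maf_min
]
kept = [r["SNP"] for r in rows if passes_qc(r)]
```

Under these assumptions, only variants that are both well imputed and sufficiently common remain in the output.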
- Return type:
Path
- Returns:
Path to the formatted .sumstats.gz output file.
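A returned path can be sanity-checked with the standard library before it is handed to another gsmap subcommand. The sketch below writes a small stand-in file instead of calling gsmap, and it assumes the output is tab-delimited with a header row; the file name, the helper read_header, and the values are made up for illustration.

```python
import csv
import gzip
import tempfile
from pathlib import Path

# Columns the formatted file is documented to contain.
EXPECTED = ["SNP", "A1", "A2", "Z", "N"]


def read_header(path):
    """Return the column names from the first line of a gzip-compressed table."""
    with gzip.open(path, "rt", newline="") as handle:
        return next(csv.reader(handle, delimiter="\t"))


# Stand-in for a file produced by format_gsmap_sumstats (made-up values).
tmp = Path(tempfile.mkdtemp()) / "demo.sumstats.gz"
with gzip.open(tmp, "wt", newline="") as handle:
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(EXPECTED)
    writer.writerow(["rs1", "A", "G", "1.96", "10000"])

header = read_header(tmp)
missing = [col for col in EXPECTED if col not in header]
```

With a real output file, a non-empty missing list would suggest the column mapping passed to the wrapper needs another look.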
Examples
>>> # From a file path
>>> sumstats_path = format_gsmap_sumstats(
...     "GIANT_HEIGHT.txt", out_prefix="height", beta="BETA", se="SE", n="N"
... )
>>> # From a GWAS Catalog DataFrame
>>> gwas_df['hm_beta'] = pd.to_numeric(gwas_df['hm_beta'], errors='coerce')
>>> sumstats_path = format_gsmap_sumstats(
...     sumstats=gwas_df,
...     out_prefix="IQ",
...     snp="hm_rsid",
...     a1="hm_effect_allele",
...     a2="hm_other_allele",
...     beta="hm_beta",
...     p="p_value",
...     n="n",
...     tmp_dir="./tmp",
...     cleanup_tmp=True,
... )
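The two examples supply different statistics: the first gives beta and se, the second gives beta and p. Both are enough to obtain the Z column under the usual conventions, sketched below. This is a minimal illustration, not a description of what gsmap format_sumstats does internally: it assumes a two-sided p-value and a standard normal test statistic, and the helper names are hypothetical.

```python
import math
from statistics import NormalDist


def z_from_beta_se(beta, se):
    """Z-statistic as the effect size divided by its standard error."""
    return beta / se


def z_from_beta_p(beta, p):
    """Z-statistic from a two-sided p-value, signed by the direction of beta.

    Assumes 0 < p < 1. The lower tail is used because inv_cdf(p / 2) stays
    accurate for small p, where 1 - p / 2 would round to 1.0.
    """
    magnitude = -NormalDist().inv_cdf(p / 2.0)
    return math.copysign(magnitude, beta)


z1 = z_from_beta_se(0.05, 0.01)   # beta and se available
z2 = z_from_beta_p(-0.02, 0.05)   # only beta and p available
```

Note that the p-value route only uses the sign of beta, so it does not require a standard error column.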