cellink.tl.external.run_magma_pipeline
- cellink.tl.external.run_magma_pipeline(gwas_sumstats, output_prefix='magma_results', genome_build='GRCh38', gene_id_type='ensembl', window_size=(35, 10), n_samples=None, ld_source=None, dd=None, reference_panel=None, external_ld_prefix=None, col_mapping=None, config_file='configs/magma.yaml', magma_bin='magma')
Complete MAGMA pipeline: prepare inputs, annotate SNPs, and run gene analysis.
- Parameters:
gwas_sumstats (pd.DataFrame) – GWAS summary statistics.
output_prefix (str) – Prefix for all output files.
genome_build (str) – ‘GRCh37’ or ‘GRCh38’.
gene_id_type (str) – Gene ID type: ‘entrez’, ‘ensembl’, or ‘gene_name’.
window_size (tuple of int) – Upstream/downstream window in kilobases for SNP-to-gene annotation.
n_samples (int, optional) – GWAS sample size.
ld_source (str or None) – LD reference strategy; see prepare_magma_inputs for full documentation.
  - ‘dd_genotypes’: use genotypes from DonorData (pass dd).
  - ‘reference_panel’: download a 1000G panel (pass reference_panel).
  - ‘external’: use existing PLINK files (pass external_ld_prefix).
  - None: raises an error, because an LD reference is required for gene analysis.
dd (DonorData, optional) – Required when ld_source=‘dd_genotypes’.
reference_panel (str, optional) – Required when ld_source=‘reference_panel’. Options: ‘EUR’, ‘EAS’, ‘AFR’.
external_ld_prefix (str or Path, optional) – Required when ld_source=‘external’. Path prefix of PLINK files.
col_mapping (dict, optional) – Column rename mapping for GWAS sumstats.
config_file (str) – Path to YAML config.
magma_bin (str) – Path to MAGMA binary.
- Return type:
Path
- Returns:
Path to the gene-level results file (.genes.out).
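The pairing between ld_source and its companion argument can be checked before any files are written. The sketch below is a hypothetical pre-flight helper (check_ld_args and its lookup table are illustrative names, not part of cellink); it only encodes the rules stated in the parameter list above and makes no claim about how the pipeline validates its inputs internally.

```python
# Hypothetical helper, NOT part of cellink: maps each documented ld_source
# value to the keyword argument that must accompany it.
REQUIRED_BY_LD_SOURCE = {
    "dd_genotypes": "dd",
    "reference_panel": "reference_panel",
    "external": "external_ld_prefix",
}
# Panel codes listed in the parameter documentation.
KNOWN_PANELS = {"EUR", "EAS", "AFR"}


def check_ld_args(ld_source, dd=None, reference_panel=None, external_ld_prefix=None):
    """Return the name of the required companion argument, or raise ValueError."""
    if ld_source not in REQUIRED_BY_LD_SOURCE:
        # Covers ld_source=None: an LD reference is required for gene analysis.
        raise ValueError(
            f"ld_source must be one of {sorted(REQUIRED_BY_LD_SOURCE)}, got {ld_source!r}"
        )
    supplied = {
        "dd": dd,
        "reference_panel": reference_panel,
        "external_ld_prefix": external_ld_prefix,
    }
    needed = REQUIRED_BY_LD_SOURCE[ld_source]
    if supplied[needed] is None:
        raise ValueError(f"ld_source={ld_source!r} requires the {needed!r} argument")
    if ld_source == "reference_panel" and reference_panel not in KNOWN_PANELS:
        raise ValueError(f"reference_panel must be one of {sorted(KNOWN_PANELS)}")
    return needed
```

Calling check_ld_args with the same keyword arguments you intend to pass to run_magma_pipeline surfaces a missing dd, reference_panel, or external_ld_prefix early, with a message that names the missing argument.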
Examples
>>> # Using your own cohort's genotypes
>>> results = run_magma_pipeline(
...     gwas_df, output_prefix="t2d", n_samples=100000,
...     ld_source="dd_genotypes", dd=my_donor_data,
...     magma_bin="./magma/magma",
... )

>>> # Using a downloaded 1000G panel
>>> results = run_magma_pipeline(
...     gwas_df, output_prefix="t2d", n_samples=100000,
...     ld_source="reference_panel", reference_panel="EUR",
... )

>>> # Using your own pre-built PLINK reference
>>> results = run_magma_pipeline(
...     gwas_df, output_prefix="t2d", n_samples=100000,
...     ld_source="external", external_ld_prefix="/data/my_ref_panel",
... )
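The returned Path points at a .genes.out file, which can then be ranked by p-value. The following is a minimal sketch that assumes the usual MAGMA layout for that file: whitespace-delimited, with a header row containing at least GENE and P columns. The helper names (parse_genes_out, top_genes) are hypothetical, and the sample rows are made-up placeholder values used only to show the shape of the data.

```python
def parse_genes_out(text):
    """Parse whitespace-delimited gene results into a list of dicts.

    Assumes the first non-comment line is a header that includes 'GENE' and 'P'.
    """
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and any comment lines
        fields = line.split()
        if header is None:
            header = fields
            continue
        row = dict(zip(header, fields))
        row["P"] = float(row["P"])  # convert the p-value for numeric sorting
        rows.append(row)
    return rows


def top_genes(rows, n=10):
    """Return the n rows with the smallest p-values."""
    return sorted(rows, key=lambda r: r["P"])[:n]


# Placeholder content for illustration only; not real results.
SAMPLE = """\
GENE CHR START STOP NSNPS NPARAM N ZSTAT P
GENE_A 1 1000 2000 12 5 100000 0.52 0.30
GENE_B 2 5000 9000 40 9 100000 4.10 2.1e-05
"""

best = top_genes(parse_genes_out(SAMPLE), n=1)
print(best[0]["GENE"])
```

With a real run, the text would come from the returned path, for example parse_genes_out(results.read_text()).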