Tutorial: eQTL Analysis with JaxQTL and TensorQTL using cellink#
This tutorial demonstrates how to perform eQTL analysis using external tools JaxQTL and TensorQTL through the cellink package. The cellink package provides a unified interface to these powerful QTL mapping tools, making it easier to perform comprehensive genetic analyses on single-cell datasets. These tools provide powerful statistical methods for detecting quantitative trait loci (QTLs) in genomic data, with JaxQTL offering fast GPU-accelerated analysis and TensorQTL providing comprehensive cis- and trans-QTL mapping capabilities.
This notebook assumes familiarity with single-cell data processing and basic statistical genetics concepts. The cellink package provides convenient wrapper functions that handle data preparation and formatting for these external tools. JaxQTL currently is not available via pip or conda. Please follow the instructions here. To use TensorQTL you can install it via pip install 'cellink[tensorqtl]'. TensorQTL also requires plink2. For visualization of QTL calling results, please consider checking out the Tutorial: Pseudobulk eQTL Analysis with cellink.
Environment Setup#
We begin by importing necessary libraries and defining key parameters for our analysis. The cellink package provides wrapper functions for JaxQTL and TensorQTL that automatically handle data formatting and preparation.
import numpy as np
from cellink.resources import get_dummy_onek1k
from cellink.tl.external import run_jaxqtl, run_tensorqtl
# Analysis parameters
n_gpcs = 20
n_epcs = 15
batch_e_pcs_n_top_genes = 2000
chrom = 22
cis_window = 500_000
cell_type = "CD8 Naive"
celltype_key = "predicted.celltype.l2"
original_donor_col = "donor_id"
Load and Prepare Data#
We load the OneK1K dataset, which contains both genotype and single-cell expression data. We also add gene annotations from Ensembl, which are essential for defining genomic positions and cis-windows. (This is a subset of the full OneK1K dataset, which can be downloaded, and prepared using get_onek1k())
# Load the dataset
dd = get_dummy_onek1k(config_path="../../src/cellink/resources/config/dummy_onek1k.yaml", verify_checksum=False)
print(f"Dataset shape: {dd.shape}")
dd.G.obsm["gPCs"] = dd.G.obsm["gPCs"][dd.G.obsm["gPCs"].columns[:n_gpcs]]
[2025-12-29 03:06:35,629] INFO:root: /Users/larnoldt/cellink_data/dummy_onek1k/dummy_onek1k.dd.h5 already exists
[2025-12-29 03:06:35,629] WARNING:root: No checksum provided, skipping verification
[2025-12-29 03:06:36,994] INFO:root: Loaded dummy OneK1K dataset: (100, 146939, 125366, 34073)
Dataset shape: (100, 146939, 125366, 34073)
Data Preprocessing#
We filter the dataset to focus on a specific cell type and prepare the data for eQTL analysis.
dd.aggregate(obs=["donor_id", "sex", "age"], func="first", add_to_obs=True)
# Filter to specific cell type
dd = dd[..., dd.C.obs[celltype_key] == cell_type, :].copy()
print(f"After cell type filtering: {dd.shape}")
# Add donor-level metadata
dd.G.obs["donor_sex"] = dd.G.obs["sex"]
dd.G.obs["donor_age"] = dd.G.obs["age"]
# Generate random labels for demonstration (replace with real phenotypes)
dd.G.obs["donor_labels"] = np.random.randint(2, size=len(dd.G.obs))
# Filter to specific chromosome for faster analysis
dd = dd.sel(G_var=dd.G.var.chrom == str(chrom), C_var=dd.C.var.chrom == str(chrom)).copy()
print(f"After chromosome {chrom} filtering: {dd.shape}")
After cell type filtering: (100, 146939, 4756, 34073)
After chromosome 22 filtering: (100, 136776, 4756, 871)
To speed up the computation we also filter for the number of SNPs.
dd = dd[:, dd.G.var["pos"] < 17584955, :, :].copy()
JaxQTL Analysis#
JaxQTL is a fast, GPU-accelerated tool for QTL mapping. It supports various statistical models and can handle large-scale genomic data efficiently. JaxQTL can be run in various modes: ["nominal", "cis", "cis_acat", "fitnull", "covar", "trans", "estimate_ld_only"].
Running JaxQTL with Default Parameters#
The basic cis mode performs permutation-based cis-QTL mapping with empirical FDR estimation. This is the standard approach for identifying cis-eQTLs with appropriate multiple testing correction.
# Basic JaxQTL analysis
results_jaxqtl_basic = run_jaxqtl(
dd,
prefix="jaxqtl_basic",
mode="cis",
model="NB", # Negative binomial model
window=cis_window,
additional_covariates=["gPCs"],
run=True,
)
results_jaxqtl_basic
[2025-12-29 03:02:45,533] INFO:cellink._core.donordata: Aggregated X to PB
[2025-12-29 03:02:45,533] INFO:cellink._core.donordata: Observation found for 100 donors.
Writing FAM... done.
Writing BIM... done.
| phenotype_id | chrom | num_var | variant_id | pos | tss_distance | ma_count | af | beta_shape1 | beta_shape2 | beta_converged | opt_status | true_nc | pval_nominal | slope | slope_se | pval_beta | alpha_cov | model_converged | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ENSG00000177663 | 22 | 4161 | 22_17025474_T_C | 17025474 | -59481.0 | 35.0 | 0.175 | 52.109502 | 54.426431 | 1.0 | True | 1.942890e-16 | 0.482593 | -0.744504 | 1.098446 | 0.447015 | 1.871933e-07 | 1.0 |
| 1 | ENSG00000069998 | 22 | 4114 | 22_17212956_A_ATTTT | 17212956 | 75444.0 | 5.0 | 0.975 | 21.068593 | 33.666325 | 1.0 | True | 1.942890e-16 | 0.422720 | -1.353229 | 1.771991 | 0.721527 | 5.184935e-07 | 1.0 |
| 2 | ENSG00000185837 | 22 | 4096 | 22_17035794_T_G | 17035794 | -123545.0 | 14.0 | 0.930 | NaN | NaN | 0.0 | True | 3.267188e+00 | 0.999750 | -23.416329 | 348.036692 | NaN | 1.780708e-06 | 1.0 |
| 3 | ENSG00000093072 | 22 | 4081 | 22_16933780_G_A | 16933780 | -245011.0 | 1.0 | 0.995 | 21.659233 | 31.581654 | 1.0 | True | 1.942890e-16 | 0.391825 | -2.492299 | 3.536479 | 0.418729 | 4.305565e-08 | 1.0 |
| 4 | ENSG00000182902 | 22 | 2647 | 22_17577481_A_C | 17577481 | 14030.0 | 1.0 | 0.995 | 2179.394106 | 3.329034 | 1.0 | True | 1.942890e-16 | 0.998652 | -7.683877 | 333.268516 | 0.514215 | 2.774556e-03 | 1.0 |
| 5 | ENSG00000236754 | 22 | 2608 | 22_17239813_C_T | 17239813 | -339574.0 | 7.0 | 0.965 | 1512.785525 | 2.894406 | 1.0 | True | 1.942890e-16 | 0.998596 | -3.354731 | 201.541181 | 0.616794 | 1.314982e-05 | 1.0 |
| 6 | ENSG00000131100 | 22 | 2563 | 22_17563390_G_C | 17563390 | -28747.0 | 69.0 | 0.345 | 20.036668 | 37.267971 | 1.0 | True | 1.942890e-16 | 0.282472 | 0.517867 | 0.489794 | 0.141640 | 9.334170e-05 | 1.0 |
| 7 | ENSG00000099968 | 22 | 2407 | 22_17219920_C_T | 17219920 | -408936.0 | 59.0 | 0.295 | 13.350913 | 24.180304 | 1.0 | True | 1.942890e-16 | 0.334455 | 0.638422 | 0.675893 | 0.405157 | 7.940555e-05 | 1.0 |
| 8 | ENSG00000015475 | 22 | 1724 | 22_17407251_T_C | 17407251 | -326888.0 | 3.0 | 0.985 | 36.027044 | 41.509726 | 1.0 | True | 1.942890e-16 | 0.537514 | -1.237648 | 2.082027 | 0.900735 | 8.893963e-05 | 1.0 |
| 9 | ENSG00000269220 | 22 | 1525 | 22_17351364_TTC_T | 17351364 | -425954.0 | 2.0 | 0.990 | 17.517949 | 19.103963 | 1.0 | True | 1.942890e-16 | 0.408392 | -2.148595 | 2.861632 | 0.199174 | 1.000000e-08 | 1.0 |
| 10 | ENSG00000243156 | 22 | 1494 | 22_17577481_A_C | 17577481 | -210169.0 | 1.0 | 0.995 | 9.233605 | 10.043320 | 1.0 | True | 1.942890e-16 | 0.569340 | -3.940565 | 10.120073 | 0.786017 | 1.042847e-03 | 1.0 |
| 11 | ENSG00000286990 | 22 | 1477 | 22_17550982_C_T | 17550982 | -249741.0 | 26.0 | 0.870 | 0.515312 | 0.087452 | 1.0 | True | 7.197109e+00 | 0.998998 | 4.655247 | 302.113200 | 0.640512 | 9.309502e-07 | 1.0 |
| 12 | ENSG00000235445 | 22 | 490 | 22_17562393_A_T | 17562393 | -420813.0 | 35.0 | 0.175 | 1711.765504 | 2.414490 | 1.0 | True | 1.942890e-16 | 0.998003 | 2.431304 | 115.453803 | 0.216040 | 1.168994e-04 | 1.0 |
| 13 | ENSG00000234913 | 22 | 393 | 22_17524692_CTTTTTTTTTTTTTT_C | 17524692 | -479579.0 | 1.0 | 0.995 | 4594.708261 | 4.565273 | 1.0 | True | 1.942890e-16 | 0.999388 | 8.537019 | 7763.970539 | 0.786336 | 4.152501e-06 | 1.0 |
| 14 | ENSG00000225335 | 22 | 33 | 22_17582867_C_T | 17582867 | -492272.0 | 3.0 | 0.985 | NaN | NaN | 0.0 | True | 1.058281e+00 | 0.999336 | 7.377999 | 726.025083 | NaN | 1.550050e-06 | 1.0 |
| 15 | ENSG00000215193 | 22 | 17 | 22_17578563_A_G | 17578563 | -499361.0 | 65.0 | 0.675 | 12.825293 | 7.390283 | 1.0 | True | 1.942890e-16 | 0.634838 | 0.510581 | 1.085020 | 0.484888 | 5.840863e-07 | 1.0 |
Advanced JaxQTL Analysis with Custom Parameters#
The cis_acat mode uses the Aggregated Cauchy Association Test to combine p-values across multiple variants, providing a powerful approach for detecting associations when there are multiple causal variants in a region.
# Advanced JaxQTL analysis with more parameters
results_jaxqtl_advanced = run_jaxqtl(
dd,
prefix="jaxqtl_advanced",
mode="cis_acat", # ACAT-combined p-values
model="gaussian", # Gaussian model
window=1000000, # 1Mb window
nperm=10000, # Number of permutations
test_method="wald", # Wald test
additional_covariates=["gPCs"],
addpc=5, # Number of genotype PCs to add
standardize=True,
verbose=True,
run=True,
)
results_jaxqtl_advanced
[2025-12-29 03:03:39,828] INFO:cellink._core.donordata: Aggregated X to PB
[2025-12-29 03:03:39,829] INFO:cellink._core.donordata: Observation found for 100 donors.
Writing FAM... done.
Writing BIM... done.
| chrom | snp | pos | a0 | a1 | i | phenotype_id | tss | tss_distance | af | ma_count | pval_nominal | slope | slope_se | converged | alpha | pval_acat | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 22 | 22_17009000_T_C | 17009000 | T | C | 1345 | ENSG00000015475 | 17734139.0 | -725139.0 | 0.920 | 16.0 | 2.877574e-03 | -0.060722 | 0.019666 | True | 0.0 | 5.803468e-01 |
| 1 | 22 | 22_17524093_G_A | 17524093 | G | A | 3994 | ENSG00000069998 | 17137512.0 | 386581.0 | 0.985 | 3.0 | 1.970923e-04 | -0.144284 | 0.036742 | True | 0.0 | 5.543843e-01 |
| 2 | 22 | 22_17545374_C_CAAA | 17545374 | C | CAAA | 4109 | ENSG00000093072 | 17178791.0 | 366583.0 | 0.960 | 8.0 | 1.307348e-05 | -0.075339 | 0.016078 | True | 0.0 | 5.631898e-02 |
| 3 | 22 | 22_16849850_C_CA | 16849850 | C | CA | 734 | ENSG00000099968 | 17628856.0 | -779006.0 | 0.980 | 4.0 | 1.302300e-04 | -0.146788 | 0.036267 | True | 0.0 | 3.125808e-02 |
| 4 | 22 | 22_17019092_T_A | 17019092 | T | A | 1388 | ENSG00000131100 | 17592137.0 | -573045.0 | 0.995 | 1.0 | 7.546834e-05 | -0.397131 | 0.094481 | True | 0.0 | 1.132195e-01 |
| 5 | 22 | 22_17349688_C_T | 17349688 | C | T | 3007 | ENSG00000177663 | 17084955.0 | 264733.0 | 0.955 | 9.0 | 9.499886e-04 | 0.060594 | 0.017568 | True | 0.0 | 5.397122e-01 |
| 6 | 22 | 22_16702145_A_G | 16702145 | A | G | 236 | ENSG00000182902 | 17563451.0 | -861306.0 | 0.980 | 4.0 | 4.909324e-09 | -0.004212 | 0.000633 | True | 0.0 | 8.408108e-06 |
| 7 | 22 | 22_17547967_G_A | 17547967 | G | A | 4129 | ENSG00000183785 | 18110101.0 | -562134.0 | 0.985 | 3.0 | 1.428733e-09 | -0.036549 | 0.005256 | True | 0.0 | 3.574742e-06 |
| 8 | 22 | 22_17225888_G_A | 17225888 | G | A | 2512 | ENSG00000184979 | 18149844.0 | -923956.0 | 0.995 | 1.0 | 1.143822e-14 | -0.187680 | 0.019322 | True | 0.0 | 2.643713e-11 |
| 9 | 22 | 22_17031458_ATT_A | 17031458 | ATT | A | 1450 | ENSG00000185837 | 17159339.0 | -127881.0 | 0.995 | 1.0 | 8.238284e-16 | -0.062261 | 0.006021 | True | 0.0 | 3.682221e-12 |
| 10 | 22 | 22_17165668_G_A | 17165668 | G | A | 2054 | ENSG00000215193 | 18077924.0 | -912256.0 | 0.980 | 4.0 | 1.082902e-04 | -0.114595 | 0.027947 | True | 0.0 | 3.421433e-02 |
| 11 | 22 | 22_17393483_G_A | 17393483 | G | A | 3181 | ENSG00000225335 | 18075139.0 | -681656.0 | 0.990 | 2.0 | 2.106266e-71 | -0.248278 | 0.003095 | True | 0.0 | 5.514205e-68 |
| 12 | 22 | 22_17413663_G_GATT | 17413663 | G | GATT | 3273 | ENSG00000234913 | 18004271.0 | -590608.0 | 0.995 | 1.0 | 1.423345e-20 | -0.030281 | 0.002317 | True | 0.0 | 4.159015e-17 |
| 13 | 22 | 22_17335813_T_TA | 17335813 | T | TA | 2979 | ENSG00000235445 | 17983206.0 | -647393.0 | 0.975 | 5.0 | 2.441747e-07 | -0.002361 | 0.000413 | True | 0.0 | 4.283879e-04 |
| 14 | 22 | 22_17238021_G_C | 17238021 | G | C | 2557 | ENSG00000236754 | 17579387.0 | -341366.0 | 0.965 | 7.0 | 7.724801e-08 | -0.001254 | 0.000209 | True | 0.0 | 1.892409e-04 |
| 15 | 22 | 22_17031458_ATT_A | 17031458 | ATT | A | 1450 | ENSG00000243156 | 17787650.0 | -756192.0 | 0.995 | 1.0 | 1.859829e-07 | -0.112233 | 0.019426 | True | 0.0 | 6.952086e-04 |
| 16 | 22 | 22_17351364_TTC_T | 17351364 | TTC | T | 3020 | ENSG00000269220 | 17777318.0 | -425954.0 | 0.990 | 2.0 | 4.115031e-08 | -0.136962 | 0.022279 | True | 0.0 | 3.119761e-05 |
| 17 | 22 | 22_17552774_G_GTTTTTTT | 17552774 | G | GTTTTTTT | 4149 | ENSG00000278558 | 18527803.0 | -975029.0 | 0.660 | 68.0 | 8.157338e-03 | 0.013569 | 0.004985 | True | 0.0 | 5.006843e-01 |
| 18 | 22 | 22_17421100_T_A | 17421100 | T | A | 3324 | ENSG00000280007 | 18110760.0 | -689660.0 | 0.995 | 1.0 | 9.481115e-11 | -0.094931 | 0.012502 | True | 0.0 | 7.866920e-08 |
| 19 | 22 | 22_17435684_T_TAA | 17435684 | T | TAA | 3466 | ENSG00000286990 | 17800723.0 | -365039.0 | 0.980 | 4.0 | 9.338400e-15 | -0.025140 | 0.002575 | True | 0.0 | 3.486283e-11 |
JaxQTL Trans-QTL Analysis#
Trans-QTL analysis identifies associations between variants and genes on different chromosomes. These typically have smaller effect sizes than cis-QTLs and require larger sample sizes to detect.
# Trans-QTL analysis with JaxQTL
results_jaxqtl_trans = run_jaxqtl(
dd, prefix="jaxqtl_trans", mode="trans", model="gaussian", additional_covariates=["gPCs"], run=True
)
results_jaxqtl_trans
[2025-12-29 03:04:00,048] INFO:cellink._core.donordata: Aggregated X to PB
[2025-12-29 03:04:00,048] INFO:cellink._core.donordata: Observation found for 100 donors.
Writing FAM... done.
Writing BIM... done.
| chrom | snp | pos | a0 | a1 | i | phenotype_id | tss | tss_distance | af | ma_count | pval_nominal | slope | slope_se | converged | alpha | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 22 | 22_16388891_G_A | 16388891 | G | A | 0 | ENSG00000177663 | 17084955.0 | -696064.0 | 0.945 | 11.0 | 0.296205 | -0.021325 | 0.020414 | 1.0 | 0.0 |
| 1 | 22 | 22_16388968_C_T | 16388968 | C | T | 1 | ENSG00000177663 | 17084955.0 | -695987.0 | 0.945 | 11.0 | 0.296205 | -0.021325 | 0.020414 | 1.0 | 0.0 |
| 2 | 22 | 22_16389525_A_G | 16389525 | A | G | 2 | ENSG00000177663 | 17084955.0 | -695430.0 | 0.945 | 11.0 | 0.296205 | -0.021325 | 0.020414 | 1.0 | 0.0 |
| 3 | 22 | 22_16390411_G_A | 16390411 | G | A | 3 | ENSG00000177663 | 17084955.0 | -694544.0 | 0.945 | 11.0 | 0.296205 | -0.021325 | 0.020414 | 1.0 | 0.0 |
| 4 | 22 | 22_16391555_G_C | 16391555 | G | C | 4 | ENSG00000177663 | 17084955.0 | -693400.0 | 0.945 | 11.0 | 0.296205 | -0.021325 | 0.020414 | 1.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2105019 | 22 | 22_17583056_C_CA | 17583056 | C | CA | 4272 | ENSG00000079974 | 50767502.0 | -33184446.0 | 0.830 | 34.0 | 0.504663 | 0.009504 | 0.014245 | 1.0 | 0.0 |
| 2105020 | 22 | 22_17583078_A_G | 17583078 | A | G | 4273 | ENSG00000079974 | 50767502.0 | -33184424.0 | 0.830 | 34.0 | 0.504663 | 0.009504 | 0.014245 | 1.0 | 0.0 |
| 2105021 | 22 | 22_17584465_GAGAGAGAA_G | 17584465 | GAGAGAGAA | G | 4274 | ENSG00000079974 | 50767502.0 | -33183037.0 | 0.995 | 1.0 | 0.761113 | -0.022471 | 0.073914 | 1.0 | 0.0 |
| 2105022 | 22 | 22_17584467_GAGAGAA_G | 17584467 | GAGAGAA | G | 4275 | ENSG00000079974 | 50767502.0 | -33183035.0 | 0.970 | 6.0 | 0.649968 | -0.014370 | 0.031666 | 1.0 | 0.0 |
| 2105023 | 22 | 22_17584471_GAA_G | 17584471 | GAA | G | 4276 | ENSG00000079974 | 50767502.0 | -33183031.0 | 0.980 | 4.0 | 0.336810 | -0.033426 | 0.034802 | 1.0 | 0.0 |
2105024 rows × 16 columns
TensorQTL Analysis#
TensorQTL provides comprehensive QTL mapping capabilities with support for various analysis modes and statistical approaches. TensorQTL can be run in various modes: ["cis_nominal", "cis_independent", "cis", "trans", "cis_susie", "trans_susie"].
Basic Cis-QTL Analysis with TensorQTL#
The basic cis mode in TensorQTL performs permutation-based cis-QTL mapping, similar to JaxQTL but with TensorFlow-based GPU acceleration.
# Basic cis-QTL mapping
results_tensorqtl_cis = run_tensorqtl(
dd,
prefix="tensorqtl_cis",
mode="cis",
window=cis_window,
additional_covariates=["gPCs"],
permutations=10000,
run=True,
)
results_tensorqtl_cis
[2025-12-29 03:06:59,460] INFO:cellink._core.donordata: Aggregated X to PB
[2025-12-29 03:06:59,461] INFO:cellink._core.donordata: Observation found for 100 donors.
[2025-12-29 03:06:59,715] INFO:cellink.tl.external._tensorqtl: Performing z-normalization of age.
Writing FAM... done.
Writing BIM... done.
PLINK v2.0.0-a.6.9 64-bit (29 Jan 2025) cog-genomics.org/plink/2.0/
(C) 2005-2025 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to tensorqtl_cis.log.
Options in effect:
--bfile tensorqtl_cis
--make-pgen
--out tensorqtl_cis
Start time: Mon Dec 29 03:06:59 2025
24576 MiB RAM detected; reserving 12288 MiB for main workspace.
Using up to 8 compute threads.
100 samples (0 females, 40 males, 60 ambiguous; 100 founders) loaded from
tensorqtl_cis.fam.
4277 variants loaded from tensorqtl_cis.bim.
Note: No phenotype data present.
Writing tensorqtl_cis.psam ... done.
Writing tensorqtl_cis.pvar ... 1010111112121313141415151616171718181919202021212222232324242525262627272828292930303131323233333434353536363737383839394040414142424343444445454646474748484949505051515252535354545555565657575858595960606161626263636464656566666767686869697070717172727373747475757676777778787979808081818282838384848585868687878888898990909191929293939494959596969797989899done.
Writing tensorqtl_cis.pgen ... done.
End time: Mon Dec 29 03:06:59 2025
[Dec 29 03:07:06] Running TensorQTL v1.0.10: cis-QTL mapping
* WARNING: using CPU!
* reading phenotypes (tensorqtl_cis_phenotype.bed.gz)
* cis-window detected as [start - 500,000, end + 500,000]
* reading covariates (tensorqtl_cis_donor_features.tsv)
* loading genotypes
cis-QTL mapping: empirical p-values for phenotypes
* 100 samples
* 871 phenotypes
* 23 covariates
* 4277 variants
* cis-window: ±500,000
** dropping 375 constant phenotypes
* checking phenotypes: 496/496
** dropping 480 phenotypes without variants in cis-window
* computing permutations
processing phenotype 8/16 * WARNING: excluding 33 monomorphic variants
processing phenotype 9/16 * WARNING: excluding 33 monomorphic variants
processing phenotype 10/16 * WARNING: excluding 32 monomorphic variants
processing phenotype 11/16 * WARNING: excluding 31 monomorphic variants
processing phenotype 12/16 * WARNING: excluding 30 monomorphic variants
WARNING: scipy.optimize.newton failed to converge (running scipy.optimize.minimize)
processing phenotype 13/16 * WARNING: excluding 30 monomorphic variants
WARNING: scipy.optimize.newton failed to converge (running scipy.optimize.minimize)
processing phenotype 14/16 * WARNING: excluding 30 monomorphic variants
processing phenotype 15/16 * WARNING: excluding 30 monomorphic variants
processing phenotype 16/16
* WARNING: excluding 10 monomorphic variants
* WARNING: excluding 7 monomorphic variants
* WARNING: excluding 7 monomorphic variants
* WARNING: excluding 7 monomorphic variants
* WARNING: excluding 2 monomorphic variants
* WARNING: excluding 2 monomorphic variants
Time elapsed: 0.04 min
done.
* writing output
Computing q-values
* Number of phenotypes tested: 16
* Correlation between Beta-approximated and empirical p-values: nan
* Proportion of significant phenotypes (1-pi0): 0.00
* QTL phenotypes @ FDR 0.05: 0
[Dec 29 03:07:09] Finished mapping
| phenotype_id | num_var | beta_shape1 | beta_shape2 | true_df | pval_true_df | variant_id | start_distance | end_distance | ma_samples | ma_count | af | pval_nominal | slope | slope_se | pval_perm | pval_beta | qval | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ENSG00000177663 | 4161 | 0.997280 | 407.17500 | 61.42510 | 7.592840e-04 | 22_17404016_TTGGGAGATG_T | 319061 | 288323 | 71 | 87 | 0.565 | 1.965230e-04 | 0.031859 | 0.008135 | 0.273173 | 0.267237 | 0.728300 |
| 1 | ENSG00000069998 | 4114 | 1.015660 | 138.48000 | 38.93290 | 3.588760e-03 | 22_17524093_G_A | 386581 | 358806 | 3 | 3 | 0.985 | 5.029320e-05 | -0.174374 | 0.040528 | 0.382562 | 0.384575 | 0.769149 |
| 2 | ENSG00000185837 | 4096 | 0.797147 | 12.47490 | 9.12607 | 4.709830e-03 | 22_17031458_ATT_A | -127881 | -133993 | 1 | 1 | 0.995 | 1.200610e-16 | -0.061286 | 0.005758 | 0.104890 | 0.108845 | 0.728300 |
| 3 | ENSG00000093072 | 4081 | 0.926989 | 47.02260 | 25.15310 | 4.476540e-03 | 22_17560607_G_GCTCTCC | 381816 | 302372 | 3 | 3 | 0.985 | 7.816000e-07 | -0.199937 | 0.037089 | 0.259374 | 0.220129 | 0.728300 |
| 4 | ENSG00000182902 | 2647 | NaN | NaN | 75.00000 | NaN | 22_17577481_A_C | 14030 | -13514 | 1 | 1 | 0.995 | NaN | -0.012346 | NaN | 0.052395 | NaN | NaN |
| 5 | ENSG00000236754 | 2608 | NaN | NaN | 75.00000 | 4.741170e-08 | 22_17239813_C_T | -339574 | -349379 | 7 | 7 | 0.965 | 4.741170e-08 | -0.001256 | 0.000207 | 0.787621 | NaN | NaN |
| 6 | ENSG00000131100 | 2563 | 1.003070 | 119.93700 | 43.16060 | 8.749770e-03 | 22_17360965_T_C | -231172 | -267784 | 30 | 30 | 0.850 | 5.315510e-04 | -0.088390 | 0.024414 | 0.652235 | 0.650188 | 0.962637 |
| 7 | ENSG00000099968 | 2407 | 0.763298 | 18.40660 | 19.73950 | 9.908240e-02 | 22_17219920_C_T | -408936 | -510935 | 49 | 59 | 0.295 | 1.175950e-03 | 0.039427 | 0.011686 | 0.909409 | 0.903363 | 0.962637 |
| 8 | ENSG00000015475 | 1724 | 1.038670 | 146.92700 | 55.16530 | 2.263510e-02 | 22_17345140_C_T | -388999 | -429630 | 2 | 2 | 0.990 | 7.788650e-03 | -0.137268 | 0.050196 | 0.961504 | 0.962637 | 0.962637 |
| 9 | ENSG00000269220 | 1525 | 0.664014 | 9.55225 | 16.92340 | 7.565680e-03 | 22_17351364_TTC_T | -425954 | -429155 | 2 | 2 | 0.990 | 1.309410e-08 | -0.144193 | 0.022597 | 0.244576 | 0.186541 | 0.728300 |
| 10 | ENSG00000243156 | 1494 | 0.898013 | 14.83950 | 18.30200 | 1.113210e-01 | 22_17566826_C_G | -220824 | -457735 | 3 | 3 | 0.985 | 1.128620e-03 | -0.041466 | 0.012243 | 0.830717 | 0.851997 | 0.962637 |
| 11 | ENSG00000286990 | 1477 | 0.528184 | 4.81984 | 8.79398 | 2.015620e-02 | 22_17435684_T_TAA | -365039 | -366351 | 3 | 4 | 0.980 | 3.720230e-12 | -0.020749 | 0.002510 | 0.239476 | 0.312129 | 0.728300 |
| 12 | ENSG00000235445 | 490 | 1.000000 | 3.85027 | 6.99940 | 3.334120e-01 | 22_17523014_A_AGGGG | -460192 | -460605 | 30 | 32 | 0.840 | 1.080330e-03 | -0.000655 | 0.000193 | 0.823418 | 0.790201 | 0.962637 |
| 13 | ENSG00000234913 | 393 | 0.459678 | 4.05765 | 14.42780 | 9.079320e-03 | 22_17560607_G_GCTCTCC | -443664 | -446701 | 3 | 3 | 0.985 | 1.652040e-09 | -0.015233 | 0.002218 | 0.133187 | 0.238033 | 0.728300 |
| 14 | ENSG00000225335 | 33 | 0.206226 | 1.35911 | 41.19860 | 5.304390e-01 | 22_17575315_G_A | -499824 | -503569 | 4 | 4 | 0.980 | 3.960240e-01 | -0.012890 | 0.015100 | 0.975702 | 0.922256 | 0.962637 |
| 15 | ENSG00000215193 | 17 | 0.575820 | 4.20176 | 58.76220 | 1.249960e-01 | 22_17578563_A_G | -499361 | -526833 | 54 | 65 | 0.675 | 8.277940e-02 | 0.016903 | 0.009613 | 0.572643 | 0.651453 | 0.962637 |
Nominal Cis-QTL Analysis#
Nominal mode tests all variant-gene pairs within the specified window and returns comprehensive association statistics. This mode can also output top associations per gene and significant pairs based on a p-value threshold.
# Nominal cis-QTL analysis (all variant-gene pairs)
results_tensorqtl_nominal = run_tensorqtl(
dd,
prefix="tensorqtl_nominal",
mode="cis_nominal",
window=cis_window,
pval_threshold=1e-5,
additional_covariates=["gPCs"],
batch_size=20000,
run=True,
)
# Results contain multiple outputs for nominal mode
cis_qtl_pairs, cis_qtl_signif_pairs, cis_qtl_top_assoc = results_tensorqtl_nominal
print(f"Total variant-gene pairs tested: {len(cis_qtl_pairs)}")
print(f"Significant pairs (p < {1e-5}): {len(cis_qtl_signif_pairs) if cis_qtl_signif_pairs is not None else 0}")
cis_qtl_pairs
[2025-12-29 03:07:09,832] INFO:cellink._core.donordata: Aggregated X to PB
[2025-12-29 03:07:09,834] INFO:cellink._core.donordata: Observation found for 100 donors.
[2025-12-29 03:07:10,153] INFO:cellink.tl.external._tensorqtl: Performing z-normalization of age.
Writing FAM... done.
Writing BIM... done.
PLINK v2.0.0-a.6.9 64-bit (29 Jan 2025) cog-genomics.org/plink/2.0/
(C) 2005-2025 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to tensorqtl_nominal.log.
Options in effect:
--bfile tensorqtl_nominal
--make-pgen
--out tensorqtl_nominal
Start time: Mon Dec 29 03:07:10 2025
24576 MiB RAM detected; reserving 12288 MiB for main workspace.
Using up to 8 compute threads.
100 samples (0 females, 40 males, 60 ambiguous; 100 founders) loaded from
tensorqtl_nominal.fam.
4277 variants loaded from tensorqtl_nominal.bim.
Note: No phenotype data present.
Writing tensorqtl_nominal.psam ... done.
Writing tensorqtl_nominal.pvar ... 1010111112121313141415151616171718181919202021212222232324242525262627272828292930303131323233333434353536363737383839394040414142424343444445454646474748484949505051515252535354545555565657575858595960606161626263636464656566666767686869697070717172727373747475757676777778787979808081818282838384848585868687878888898990909191929293939494959596969797989899done.
Writing tensorqtl_nominal.pgen ... done.
End time: Mon Dec 29 03:07:10 2025
[Dec 29 03:07:14] Running TensorQTL v1.0.10: cis-QTL mapping
* WARNING: using CPU!
* reading phenotypes (tensorqtl_nominal_phenotype.bed.gz)
* cis-window detected as [start - 500,000, end + 500,000]
* reading covariates (tensorqtl_nominal_donor_features.tsv)
* loading genotypes
cis-QTL mapping: nominal associations for all variant-phenotype pairs
* 100 samples
* 871 phenotypes
* 23 covariates
* 4277 variants
* cis-window: ±500,000
** dropping 375 constant phenotypes
* checking phenotypes: 496/496
** dropping 480 phenotypes without variants in cis-window
* Computing associations
Mapping chromosome 22
processing phenotype 16/16
time elapsed: 0.00 min
* writing output
done.
[Dec 29 03:07:14] Finished mapping
Total variant-gene pairs tested: 34114
Significant pairs (p < 1e-05): 0
| phenotype_id | variant_id | start_distance | end_distance | af | ma_samples | ma_count | pval_nominal | slope | slope_se | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ENSG00000177663 | 22_16585130_T_C | -499825 | -530563 | 0.980 | 4 | 4 | 0.339680 | 0.032829 | 0.034164 |
| 1 | ENSG00000177663 | 22_16585144_G_A | -499811 | -530549 | 0.905 | 17 | 19 | 0.402192 | 0.011152 | 0.013236 |
| 2 | ENSG00000177663 | 22_16585510_C_G | -499445 | -530183 | 0.500 | 71 | 100 | 0.109400 | -0.012162 | 0.007507 |
| 3 | ENSG00000177663 | 22_16585603_G_A | -499352 | -530090 | 0.740 | 44 | 52 | 0.395744 | 0.007709 | 0.009025 |
| 4 | ENSG00000177663 | 22_16585810_T_C | -499145 | -529883 | 0.730 | 43 | 54 | 0.478267 | 0.006274 | 0.008803 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 34109 | ENSG00000215193 | 22_17583056_C_CA | -494868 | -522340 | 0.830 | 30 | 34 | 0.274086 | 0.013370 | 0.012135 |
| 34110 | ENSG00000215193 | 22_17583078_A_G | -494846 | -522318 | 0.830 | 30 | 34 | 0.274086 | 0.013370 | 0.012135 |
| 34111 | ENSG00000215193 | 22_17584465_GAGAGAGAA_G | -493459 | -520931 | 0.995 | 1 | 1 | 0.649537 | -0.028478 | 0.062419 |
| 34112 | ENSG00000215193 | 22_17584467_GAGAGAA_G | -493457 | -520929 | 0.970 | 6 | 6 | 0.833054 | 0.005817 | 0.027500 |
| 34113 | ENSG00000215193 | 22_17584471_GAA_G | -493453 | -520925 | 0.980 | 4 | 4 | 0.093625 | 0.049907 | 0.029389 |
34114 rows × 10 columns
Cis-Independent QTL Analysis#
The cis_independent mode identifies multiple independent association signals for each gene by performing conditional analysis. This is crucial for understanding the genetic architecture when multiple causal variants affect the same gene.
# First run basic cis mode to get top associations
results_tensorqtl_cis = run_tensorqtl(
dd,
prefix="tensorqtl_cis",
mode="cis",
window=cis_window,
additional_covariates=["gPCs"],
permutations=10000,
fdr=0.9, # Very permissive for demo
run=True,
)
# Now identify independent signals conditioning on the top hits
results_tensorqtl_independent = run_tensorqtl(
dd,
prefix="tensorqtl_independent",
mode="cis_independent",
window=cis_window,
cis_output="tensorqtl_cis.cis_qtl.txt.gz", # Use output from cis mode
additional_covariates=["gPCs"],
fdr=0.9, # Very permissive for demo
pval_threshold=0.1, # Very permissive for demo
run=True,
)
print(f"Independent signals identified: {len(results_tensorqtl_independent)}")
results_tensorqtl_independent
[2025-12-29 03:07:14,748] INFO:cellink._core.donordata: Aggregated X to PB
[2025-12-29 03:07:14,749] INFO:cellink._core.donordata: Observation found for 100 donors.
[2025-12-29 03:07:14,916] INFO:cellink.tl.external._tensorqtl: Performing z-normalization of age.
Writing FAM... done.
Writing BIM... done.
PLINK v2.0.0-a.6.9 64-bit (29 Jan 2025) cog-genomics.org/plink/2.0/
(C) 2005-2025 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to tensorqtl_cis.log.
Options in effect:
--bfile tensorqtl_cis
--make-pgen
--out tensorqtl_cis
Start time: Mon Dec 29 03:07:15 2025
24576 MiB RAM detected; reserving 12288 MiB for main workspace.
Using up to 8 compute threads.
100 samples (0 females, 40 males, 60 ambiguous; 100 founders) loaded from
tensorqtl_cis.fam.
4277 variants loaded from tensorqtl_cis.bim.
Note: No phenotype data present.
Writing tensorqtl_cis.psam ... done.
Writing tensorqtl_cis.pvar ... 1010111112121313141415151616171718181919202021212222232324242525262627272828292930303131323233333434353536363737383839394040414142424343444445454646474748484949505051515252535354545555565657575858595960606161626263636464656566666767686869697070717172727373747475757676777778787979808081818282838384848585868687878888898990909191929293939494959596969797989899done.
Writing tensorqtl_cis.pgen ... done.
End time: Mon Dec 29 03:07:15 2025
[Dec 29 03:07:19] Running TensorQTL v1.0.10: cis-QTL mapping
* WARNING: using CPU!
* reading phenotypes (tensorqtl_cis_phenotype.bed.gz)
* cis-window detected as [start - 500,000, end + 500,000]
* reading covariates (tensorqtl_cis_donor_features.tsv)
* loading genotypes
cis-QTL mapping: empirical p-values for phenotypes
* 100 samples
* 871 phenotypes
* 23 covariates
* 4277 variants
* cis-window: ±500,000
** dropping 375 constant phenotypes
* checking phenotypes: 496/496
** dropping 480 phenotypes without variants in cis-window
* computing permutations
processing phenotype 8/16 * WARNING: excluding 33 monomorphic variants
processing phenotype 9/16 * WARNING: excluding 33 monomorphic variants
processing phenotype 10/16 * WARNING: excluding 32 monomorphic variants
processing phenotype 11/16 * WARNING: excluding 31 monomorphic variants
processing phenotype 12/16 * WARNING: excluding 30 monomorphic variants
WARNING: scipy.optimize.newton failed to converge (running scipy.optimize.minimize)
processing phenotype 13/16 * WARNING: excluding 30 monomorphic variants
WARNING: scipy.optimize.newton failed to converge (running scipy.optimize.minimize)
processing phenotype 14/16 * WARNING: excluding 30 monomorphic variants
processing phenotype 15/16 * WARNING: excluding 30 monomorphic variants
processing phenotype 16/16
* WARNING: excluding 10 monomorphic variants
* WARNING: excluding 7 monomorphic variants
* WARNING: excluding 7 monomorphic variants
* WARNING: excluding 7 monomorphic variants
* WARNING: excluding 2 monomorphic variants
* WARNING: excluding 2 monomorphic variants
Time elapsed: 0.04 min
done.
* writing output
Computing q-values
* Number of phenotypes tested: 16
* Correlation between Beta-approximated and empirical p-values: nan
* Proportion of significant phenotypes (1-pi0): 0.00
* QTL phenotypes @ FDR 0.90: 7
* min p-value threshold @ FDR 0.9: 0.518721
[Dec 29 03:07:22] Finished mapping
[2025-12-29 03:07:22,502] INFO:cellink._core.donordata: Aggregated X to PB
[2025-12-29 03:07:22,502] INFO:cellink._core.donordata: Observation found for 100 donors.
[2025-12-29 03:07:22,668] INFO:cellink.tl.external._tensorqtl: Performing z-normalization of age.
Writing FAM... done.
Writing BIM... done.
PLINK v2.0.0-a.6.9 64-bit (29 Jan 2025) cog-genomics.org/plink/2.0/
(C) 2005-2025 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to tensorqtl_independent.log.
Options in effect:
--bfile tensorqtl_independent
--make-pgen
--out tensorqtl_independent
Start time: Mon Dec 29 03:07:22 2025
24576 MiB RAM detected; reserving 12288 MiB for main workspace.
Using up to 8 compute threads.
100 samples (0 females, 40 males, 60 ambiguous; 100 founders) loaded from
tensorqtl_independent.fam.
4277 variants loaded from tensorqtl_independent.bim.
Note: No phenotype data present.
Writing tensorqtl_independent.psam ... done.
Writing tensorqtl_independent.pvar ... 1010111112121313141415151616171718181919202021212222232324242525262627272828292930303131323233333434353536363737383839394040414142424343444445454646474748484949505051515252535354545555565657575858595960606161626263636464656566666767686869697070717172727373747475757676777778787979808081818282838384848585868687878888898990909191929293939494959596969797989899done.
Writing tensorqtl_independent.pgen ... done.
End time: Mon Dec 29 03:07:22 2025
[Dec 29 03:07:26] Running TensorQTL v1.0.10: cis-QTL mapping
* WARNING: using CPU!
* reading phenotypes (tensorqtl_independent_phenotype.bed.gz)
* cis-window detected as [start - 500,000, end + 500,000]
* reading covariates (tensorqtl_independent_donor_features.tsv)
* loading genotypes
cis-QTL mapping: conditionally independent variants
* 100 samples
* 7/16 significant phenotypes
* 23 covariates
* 4277 variants
* cis-window: ±500,000
* checking phenotypes: 7/7
* computing independent QTLs
processing phenotype 7/7
Time elapsed: 0.04 min
done.
* writing output
[Dec 29 03:07:29] Finished mapping
Independent signals identified: 9
| phenotype_id | num_var | beta_shape1 | beta_shape2 | true_df | pval_true_df | variant_id | start_distance | end_distance | ma_samples | ma_count | af | pval_nominal | slope | slope_se | pval_perm | pval_beta | rank | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ENSG00000177663 | 4194 | 1.015110 | 423.35200 | 60.11670 | 0.000004 | 22_17404016_TTGGGAGATG_T | 319061 | 288323 | 71 | 87 | 0.565 | 3.452950e-07 | 0.037838 | 0.006747 | 0.001800 | 0.001449 | 1 |
| 1 | ENSG00000177663 | 4194 | 1.003600 | 440.10500 | 60.74600 | 0.000062 | 22_17546485_C_CA | 461530 | 430792 | 2 | 2 | 0.990 | 1.124440e-05 | -0.161322 | 0.034204 | 0.023898 | 0.026704 | 2 |
| 2 | ENSG00000177663 | 4194 | 1.008040 | 429.60400 | 60.42130 | 0.000533 | 22_17214059_A_G | 129104 | 98366 | 27 | 28 | 0.860 | 1.389950e-04 | 0.039808 | 0.009897 | 0.200780 | 0.201398 | 3 |
| 3 | ENSG00000069998 | 4114 | 1.039220 | 127.93900 | 37.89910 | 0.004069 | 22_17524093_G_A | 386581 | 358806 | 3 | 3 | 0.985 | 5.029320e-05 | -0.174374 | 0.040528 | 0.390561 | 0.387522 | 1 |
| 4 | ENSG00000185837 | 4096 | 0.773050 | 13.02420 | 9.41844 | 0.004061 | 22_17031458_ATT_A | -127881 | -133993 | 1 | 1 | 0.995 | 1.200610e-16 | -0.061286 | 0.005758 | 0.104590 | 0.108415 | 1 |
| 5 | ENSG00000093072 | 4081 | 0.938737 | 45.70470 | 24.66020 | 0.004894 | 22_17560607_G_GCTCTCC | 381816 | 302372 | 3 | 3 | 0.985 | 7.816000e-07 | -0.199937 | 0.037089 | 0.265473 | 0.226310 | 1 |
| 6 | ENSG00000269220 | 1525 | 0.655935 | 9.81804 | 17.30170 | 0.006905 | 22_17351364_TTC_T | -425954 | -429155 | 2 | 2 | 0.990 | 1.309410e-08 | -0.144193 | 0.022597 | 0.237676 | 0.183322 | 1 |
| 7 | ENSG00000286990 | 1477 | 0.522711 | 4.71656 | 8.67813 | 0.021041 | 22_17435684_T_TAA | -365039 | -366351 | 3 | 4 | 0.980 | 3.720230e-12 | -0.020749 | 0.002510 | 0.247375 | 0.319513 | 1 |
| 8 | ENSG00000234913 | 393 | 0.473676 | 4.08297 | 14.34160 | 0.009296 | 22_17560607_G_GCTCTCC | -443664 | -446701 | 3 | 3 | 0.985 | 1.652040e-09 | -0.015233 | 0.002218 | 0.133487 | 0.230412 | 1 |
Trans-QTL Analysis with TensorQTL#
Trans-QTL mapping with TensorQTL tests associations between variants and distant genes. Due to the large number of tests, stricter p-value thresholds are typically required.
# Trans-QTL mapping
results_tensorqtl_trans = run_tensorqtl(
dd,
prefix="tensorqtl_trans",
mode="trans",
pval_threshold=1e-6, # Stricter threshold for trans
additional_covariates=["gPCs"],
batch_size=10000, # Smaller batches for trans analysis
return_r2=True, # Include R² statistics
run=True,
)
results_tensorqtl_trans
[2025-12-29 03:07:29,508] INFO:cellink._core.donordata: Aggregated X to PB
[2025-12-29 03:07:29,508] INFO:cellink._core.donordata: Observation found for 100 donors.
[2025-12-29 03:07:29,677] INFO:cellink.tl.external._tensorqtl: Performing z-normalization of age.
Writing FAM... done.
Writing BIM... done.
PLINK v2.0.0-a.6.9 64-bit (29 Jan 2025) cog-genomics.org/plink/2.0/
(C) 2005-2025 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to tensorqtl_trans.log.
Options in effect:
--bfile tensorqtl_trans
--make-pgen
--out tensorqtl_trans
Start time: Mon Dec 29 03:07:29 2025
24576 MiB RAM detected; reserving 12288 MiB for main workspace.
Using up to 8 compute threads.
100 samples (0 females, 40 males, 60 ambiguous; 100 founders) loaded from
tensorqtl_trans.fam.
4277 variants loaded from tensorqtl_trans.bim.
Note: No phenotype data present.
Writing tensorqtl_trans.psam ... done.
Writing tensorqtl_trans.pvar ... 1010111112121313141415151616171718181919202021212222232324242525262627272828292930303131323233333434353536363737383839394040414142424343444445454646474748484949505051515252535354545555565657575858595960606161626263636464656566666767686869697070717172727373747475757676777778787979808081818282838384848585868687878888898990909191929293939494959596969797989899done.
Writing tensorqtl_trans.pgen ... done.
End time: Mon Dec 29 03:07:29 2025
[Dec 29 03:07:34] Running TensorQTL v1.0.10: trans-QTL mapping
* WARNING: using CPU!
* reading phenotypes (tensorqtl_trans_phenotype.bed.gz)
* reading covariates (tensorqtl_trans_donor_features.tsv)
* loading genotypes
* p-value threshold: 1e-06
trans-QTL mapping
* 100 samples
* 871 phenotypes
* 23 covariates
* 4277 variants
processing batch 1/1
elapsed time: 0.00 min
done.
* filtering out cis-QTLs (within +/-5Mb)
* writing sparse output
[Dec 29 03:07:34] Finished mapping
| variant_id | phenotype_id | pval | b | b_se | r2 | af | |
|---|---|---|---|---|---|---|---|
| 0 | 22_16393930_G_T | ENSG00000273350 | 2.124748e-07 | -0.002581 | 0.000452 | 0.303167 | 0.980 |
| 1 | 22_16393939_G_C | ENSG00000273350 | 2.124748e-07 | -0.002581 | 0.000452 | 0.303167 | 0.980 |
| 3 | 22_16414733_G_A | ENSG00000133460 | 6.571755e-08 | -0.264812 | 0.044160 | 0.324078 | 0.985 |
| 4 | 22_16414733_G_A | ENSG00000099999 | 1.474357e-07 | -0.062907 | 0.010844 | 0.309742 | 0.985 |
| 5 | 22_16414733_G_A | ENSG00000273076 | 1.771229e-07 | -0.062787 | 0.010907 | 0.306448 | 0.985 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2475 | 22_17582333_G_A | ENSG00000128165 | 7.672292e-07 | -0.053877 | 0.009986 | 0.279612 | 0.995 |
| 2477 | 22_17582867_C_T | ENSG00000100156 | 1.227405e-09 | -0.002649 | 0.000382 | 0.390730 | 0.985 |
| 2478 | 22_17582867_C_T | ENSG00000286381 | 2.763245e-07 | -0.054595 | 0.009666 | 0.298404 | 0.985 |
| 2480 | 22_17584465_GAGAGAGAA_G | ENSG00000128165 | 7.672292e-07 | -0.053877 | 0.009986 | 0.279612 | 0.995 |
| 2481 | 22_17584471_GAA_G | ENSG00000213888 | 8.686729e-08 | -0.039012 | 0.006579 | 0.319159 | 0.980 |
2139 rows × 7 columns
SuSiE Finemapping with TensorQTL#
SuSiE (Sum of Single Effects) is a powerful finemapping method that identifies credible sets of causal variants for each independent association signal. It accounts for LD structure and provides posterior probabilities for each variant being causal.
Cis-SuSiE Finemapping#
Cis-SuSiE performs finemapping on the top cis-QTL signals to identify credible sets of potentially causal variants. This requires the output from a previous cis-QTL run.
# First, run standard cis-QTL analysis if not already done
results_tensorqtl_cis = run_tensorqtl(
dd,
prefix="tensorqtl_cis_for_susie",
mode="cis",
window=cis_window,
additional_covariates=["gPCs"],
permutations=10000,
run=True,
)
# Perform SuSiE finemapping on cis-QTL signals
results_tensorqtl_cis_susie = run_tensorqtl(
dd,
prefix="tensorqtl_cis_susie",
mode="cis_susie",
window=cis_window,
cis_output="tensorqtl_cis_for_susie.cis_qtl.txt.gz", # Input from cis mode
additional_covariates=["gPCs"],
max_effects=10, # Maximum number of causal effects per gene
fdr=0.9, # Very permissive for demo, FDR threshold for selecting genes to finemapping
run=True,
)
# Results contain SuSiE model objects and summary statistics
susie_models, susie_summary = results_tensorqtl_cis_susie
print(f"Genes with finemapping results: {len(susie_models)}")
print("\nSummary statistics columns:")
print(susie_summary.columns.tolist())
print(f"\nTotal credible sets identified: {len(susie_summary)}")
# Examine credible sets for a specific gene
if len(susie_summary) > 0:
example_gene = susie_summary["phenotype_id"].iloc[0]
gene_credible_sets = susie_summary[susie_summary["phenotype_id"] == example_gene]
print(f"\nCredible sets for {example_gene}:")
print(gene_credible_sets)
[2025-12-29 03:07:34,596] INFO:cellink._core.donordata: Aggregated X to PB
[2025-12-29 03:07:34,596] INFO:cellink._core.donordata: Observation found for 100 donors.
[2025-12-29 03:07:34,861] INFO:cellink.tl.external._tensorqtl: Performing z-normalization of age.
Writing FAM... done.
Writing BIM... done.
PLINK v2.0.0-a.6.9 64-bit (29 Jan 2025) cog-genomics.org/plink/2.0/
(C) 2005-2025 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to tensorqtl_cis_for_susie.log.
Options in effect:
--bfile tensorqtl_cis_for_susie
--make-pgen
--out tensorqtl_cis_for_susie
Start time: Mon Dec 29 03:07:34 2025
24576 MiB RAM detected; reserving 12288 MiB for main workspace.
Using up to 8 compute threads.
100 samples (0 females, 40 males, 60 ambiguous; 100 founders) loaded from
tensorqtl_cis_for_susie.fam.
4277 variants loaded from tensorqtl_cis_for_susie.bim.
Note: No phenotype data present.
Writing tensorqtl_cis_for_susie.psam ... done.
Writing tensorqtl_cis_for_susie.pvar ... 1010111112121313141415151616171718181919202021212222232324242525262627272828292930303131323233333434353536363737383839394040414142424343444445454646474748484949505051515252535354545555565657575858595960606161626263636464656566666767686869697070717172727373747475757676777778787979808081818282838384848585868687878888898990909191929293939494959596969797989899done.
Writing tensorqtl_cis_for_susie.pgen ... done.
End time: Mon Dec 29 03:07:34 2025
[Dec 29 03:07:38] Running TensorQTL v1.0.10: cis-QTL mapping
* WARNING: using CPU!
* reading phenotypes (tensorqtl_cis_for_susie_phenotype.bed.gz)
* cis-window detected as [start - 500,000, end + 500,000]
* reading covariates (tensorqtl_cis_for_susie_donor_features.tsv)
* loading genotypes
cis-QTL mapping: empirical p-values for phenotypes
* 100 samples
* 871 phenotypes
* 23 covariates
* 4277 variants
* cis-window: ±500,000
** dropping 375 constant phenotypes
* checking phenotypes: 496/496
** dropping 480 phenotypes without variants in cis-window
* computing permutations
processing phenotype 8/16 * WARNING: excluding 33 monomorphic variants
processing phenotype 9/16 * WARNING: excluding 33 monomorphic variants
processing phenotype 10/16 * WARNING: excluding 32 monomorphic variants
processing phenotype 11/16 * WARNING: excluding 31 monomorphic variants
processing phenotype 12/16 * WARNING: excluding 30 monomorphic variants
WARNING: scipy.optimize.newton failed to converge (running scipy.optimize.minimize)
processing phenotype 13/16 * WARNING: excluding 30 monomorphic variants
WARNING: scipy.optimize.newton failed to converge (running scipy.optimize.minimize)
processing phenotype 14/16 * WARNING: excluding 30 monomorphic variants
processing phenotype 15/16 * WARNING: excluding 30 monomorphic variants
processing phenotype 16/16
* WARNING: excluding 10 monomorphic variants
* WARNING: excluding 7 monomorphic variants
* WARNING: excluding 7 monomorphic variants
* WARNING: excluding 7 monomorphic variants
* WARNING: excluding 2 monomorphic variants
* WARNING: excluding 2 monomorphic variants
Time elapsed: 0.04 min
done.
* writing output
Computing q-values
* Number of phenotypes tested: 16
* Correlation between Beta-approximated and empirical p-values: nan
* Proportion of significant phenotypes (1-pi0): 0.00
* QTL phenotypes @ FDR 0.05: 0
[Dec 29 03:07:41] Finished mapping
[2025-12-29 03:07:42,336] INFO:cellink._core.donordata: Aggregated X to PB
[2025-12-29 03:07:42,336] INFO:cellink._core.donordata: Observation found for 100 donors.
[2025-12-29 03:07:42,499] INFO:cellink.tl.external._tensorqtl: Performing z-normalization of age.
Writing FAM... done.
Writing BIM... done.
PLINK v2.0.0-a.6.9 64-bit (29 Jan 2025) cog-genomics.org/plink/2.0/
(C) 2005-2025 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to tensorqtl_cis_susie.log.
Options in effect:
--bfile tensorqtl_cis_susie
--make-pgen
--out tensorqtl_cis_susie
Start time: Mon Dec 29 03:07:42 2025
24576 MiB RAM detected; reserving 12288 MiB for main workspace.
Using up to 8 compute threads.
100 samples (0 females, 40 males, 60 ambiguous; 100 founders) loaded from
tensorqtl_cis_susie.fam.
4277 variants loaded from tensorqtl_cis_susie.bim.
Note: No phenotype data present.
Writing tensorqtl_cis_susie.psam ... done.
Writing tensorqtl_cis_susie.pvar ... 1010111112121313141415151616171718181919202021212222232324242525262627272828292930303131323233333434353536363737383839394040414142424343444445454646474748484949505051515252535354545555565657575858595960606161626263636464656566666767686869697070717172727373747475757676777778787979808081818282838384848585868687878888898990909191929293939494959596969797989899done.
Writing tensorqtl_cis_susie.pgen ... done.
End time: Mon Dec 29 03:07:42 2025
[Dec 29 03:07:46] Running TensorQTL v1.0.10: cis-QTL mapping
* WARNING: using CPU!
* reading phenotypes (tensorqtl_cis_susie_phenotype.bed.gz)
* cis-window detected as [start - 500,000, end + 500,000]
* reading covariates (tensorqtl_cis_susie_donor_features.tsv)
* loading genotypes
SuSiE fine-mapping
* 100 samples
* 7 phenotypes
* 23 covariates
* 4277 variants
* cis-window: ±500,000
* checking phenotypes: 7/7
* fine-mapping
processing phenotype 7/7
Time elapsed: 0.01 min
done.
[Dec 29 03:07:47] Finished mapping
Genes with finemapping results: 5
Summary statistics columns:
['phenotype_id', 'variant_id', 'pip', 'af', 'cs_id']
Total credible sets identified: 12
Credible sets for ENSG00000185837:
phenotype_id variant_id pip af cs_id
0 ENSG00000185837 22_17031458_ATT_A 1.0 0.995 1
Advanced Usage: Dry Run and Command Generation#
Both JaxQTL and TensorQTL support generating commands without execution, which is useful for debugging or running on compute clusters. This is controlled by the run argument.
JaxQTL Command Generation#
# Generate JaxQTL command without running
jaxqtl_command = run_jaxqtl(
dd,
prefix="jaxqtl_cluster",
mode="cis",
additional_covariates=["gPCs"],
run=False, # Don't execute, just return command
)
print("JaxQTL command:")
print(jaxqtl_command)
[2025-12-29 03:05:56,865] INFO:cellink._core.donordata: Aggregated X to PB
[2025-12-29 03:05:56,866] INFO:cellink._core.donordata: Observation found for 100 donors.
Writing FAM... done.
Writing BIM... done.
JaxQTL command:
jaxqtl --geno jaxqtl_cluster --covar jaxqtl_cluster_donor_features.tsv --pheno jaxqtl_cluster_phenotype.bed.gz --model NB --mode cis --test-method score --window 500000 --nperm 1000 --addpc 2 --out jaxqtl_cluster --standardize
TensorQTL Command Generation#
# Generate TensorQTL command without running
tensorqtl_command = run_tensorqtl(
dd,
prefix="tensorqtl_cluster",
mode="cis",
additional_covariates=["gPCs"],
run=False, # Don't execute, just return command
)
print("\nTensorQTL command:")
print(tensorqtl_command)
[2025-12-29 03:08:06,239] INFO:cellink._core.donordata: Aggregated X to PB
[2025-12-29 03:08:06,240] INFO:cellink._core.donordata: Observation found for 100 donors.
[2025-12-29 03:08:06,429] INFO:cellink.tl.external._tensorqtl: Performing z-normalization of age.
Writing FAM... done.
Writing BIM... done.
PLINK v2.0.0-a.6.9 64-bit (29 Jan 2025) cog-genomics.org/plink/2.0/
(C) 2005-2025 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to tensorqtl_cluster.log.
Options in effect:
--bfile tensorqtl_cluster
--make-pgen
--out tensorqtl_cluster
Start time: Mon Dec 29 03:08:06 2025
24576 MiB RAM detected; reserving 12288 MiB for main workspace.
Using up to 8 compute threads.
100 samples (0 females, 40 males, 60 ambiguous; 100 founders) loaded from
tensorqtl_cluster.fam.
4277 variants loaded from tensorqtl_cluster.bim.
Note: No phenotype data present.
Writing tensorqtl_cluster.psam ... done.
Writing tensorqtl_cluster.pvar ... 1010111112121313141415151616171718181919202021212222232324242525262627272828292930303131323233333434353536363737383839394040414142424343444445454646474748484949505051515252535354545555565657575858595960606161626263636464656566666767686869697070717172727373747475757676777778787979808081818282838384848585868687878888898990909191929293939494959596969797989899done.
Writing tensorqtl_cluster.pgen ... done.
End time: Mon Dec 29 03:08:06 2025
TensorQTL command:
python -m tensorqtl tensorqtl_cluster tensorqtl_cluster_phenotype.bed.gz tensorqtl_cluster --covariates tensorqtl_cluster_donor_features.tsv --mode cis --permutations 10000 --window 1000000 --pval_threshold 1e-05 --maf_threshold 0 --maf_threshold_interaction 0.05 --batch_size 20000 --warn_monomorphic --max_effects 10 --fdr 0.05
Saving Commands to Files#
You can also save the generated commands to files for batch submission to compute clusters:
# Save JaxQTL command to file
jaxqtl_command = run_jaxqtl(
dd,
prefix="jaxqtl_cluster",
mode="cis_acat",
model="NB",
window=cis_window,
additional_covariates=["gPCs"],
run=False,
save_cmd_file="jaxqtl_job.sh",
)
[2025-12-29 03:06:08,165] INFO:cellink._core.donordata: Aggregated X to PB
[2025-12-29 03:06:08,166] INFO:cellink._core.donordata: Observation found for 100 donors.
Writing FAM... done.
Writing BIM... done.
# Save TensorQTL command to file
tensorqtl_command = run_tensorqtl(
dd,
prefix="tensorqtl_cluster",
mode="cis_susie",
cis_output="previous_cis_results.txt.gz",
additional_covariates=["gPCs"],
run=False,
save_cmd_file="tensorqtl_susie_job.sh",
)
[2025-12-29 03:08:11,429] INFO:cellink._core.donordata: Aggregated X to PB
[2025-12-29 03:08:11,431] INFO:cellink._core.donordata: Observation found for 100 donors.
[2025-12-29 03:08:11,735] INFO:cellink.tl.external._tensorqtl: Performing z-normalization of age.
Writing FAM... done.
Writing BIM... done.
PLINK v2.0.0-a.6.9 64-bit (29 Jan 2025) cog-genomics.org/plink/2.0/
(C) 2005-2025 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to tensorqtl_cluster.log.
Options in effect:
--bfile tensorqtl_cluster
--make-pgen
--out tensorqtl_cluster
Start time: Mon Dec 29 03:08:11 2025
24576 MiB RAM detected; reserving 12288 MiB for main workspace.
Using up to 8 compute threads.
100 samples (0 females, 40 males, 60 ambiguous; 100 founders) loaded from
tensorqtl_cluster.fam.
4277 variants loaded from tensorqtl_cluster.bim.
Note: No phenotype data present.
Writing tensorqtl_cluster.psam ... done.
Writing tensorqtl_cluster.pvar ... 1010111112121313141415151616171718181919202021212222232324242525262627272828292930303131323233333434353536363737383839394040414142424343444445454646474748484949505051515252535354545555565657575858595960606161626263636464656566666767686869697070717172727373747475757676777778787979808081818282838384848585868687878888898990909191929293939494959596969797989899done.
Writing tensorqtl_cluster.pgen ... done.
End time: Mon Dec 29 03:08:11 2025