cellink.resources.get_1000genomes_ld_scores
- cellink.resources.get_1000genomes_ld_scores(config_path='./cellink/resources/config/1000genomes.yaml', population='EUR', data_home=None, return_path=False, refresh=False)
Download, extract, and load precomputed 1000 Genomes linkage disequilibrium (LD) scores.
This function downloads population-specific LD scores from the 1000 Genomes project, extracts them to a local directory, and concatenates chromosome-wise annotation and LD score files into pandas DataFrames.
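The concatenation step can be sketched with pandas. This is a minimal illustration, not cellink's implementation. The helper name `concat_by_chromosome`, the `<prefix><chrom><suffix>` file-naming pattern (modeled loosely on LDSC-style names such as `baselineLD.1.l2.ldscore.gz`), and the tab-separated layout are all assumptions.

```python
from pathlib import Path

import pandas as pd


def concat_by_chromosome(data_dir, prefix, suffix, chromosomes=range(1, 23)):
    """Read one file per chromosome and stack them into a single DataFrame.

    Hypothetical helper: assumes files are named f"{prefix}{chrom}{suffix}"
    and are tab-separated with a header row.
    """
    frames = []
    for chrom in chromosomes:
        path = Path(data_dir) / f"{prefix}{chrom}{suffix}"
        # read_csv infers gzip compression from a ".gz" extension by default.
        frames.append(pd.read_csv(path, sep="\t"))
    # ignore_index=True yields a clean 0..N-1 index across all chromosomes.
    return pd.concat(frames, ignore_index=True)
```

Iterating over an explicit chromosome list, rather than globbing, keeps the rows in chromosome order (1, 2, ..., 22) instead of lexicographic order (1, 10, 11, ...).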
- Parameters:
config_path (str or pathlib.Path, default='./cellink/resources/config/1000genomes.yaml') – Path to YAML configuration file specifying URLs and file names for LD scores.
population (str, default='EUR') – Population code for LD scores. Must be one of {‘EUR’, ‘EAS’}.
data_home (str or pathlib.Path, optional) – Root directory where data will be stored. Defaults to a user-specific cache directory.
return_path (bool, default=False) – If True, returns the path to the directory of extracted files and the file name prefix instead of loading DataFrames.
refresh (bool, default=False) – If True, re-downloads and re-extracts files even if they already exist locally.
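The interplay of `data_home` and `refresh` can be sketched as a small caching rule: resolve a storage directory, then download only if forced or if nothing has been extracted yet. The helper names, the `~/.cache/cellink` default, and the `1000genomes/<population>` subdirectory layout are hypothetical; cellink's actual cache location and layout may differ.

```python
from pathlib import Path


def resolve_data_dir(data_home=None, population="EUR"):
    """Return the directory where LD scores for `population` would be stored."""
    if data_home is None:
        # Hypothetical default cache root; not necessarily cellink's choice.
        data_home = Path.home() / ".cache" / "cellink"
    return Path(data_home).expanduser() / "1000genomes" / population


def needs_download(target_dir, refresh=False):
    """Download when refresh=True, otherwise only if the directory is missing or empty."""
    target_dir = Path(target_dir)
    if refresh:
        return True
    return not (target_dir.is_dir() and any(target_dir.iterdir()))
```

Checking for a non-empty directory, rather than mere existence, treats an interrupted extraction that left an empty folder as "not yet downloaded".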
- Return type:
tuple
- Returns:
If return_path=False, returns (annot, ldscores, prefix):
- annot : pd.DataFrame
Concatenated annotation files for all chromosomes.
- ldscores : pd.DataFrame
Concatenated LD score files for all chromosomes.
- prefix : str
File name prefix used in the extracted data.
If return_path=True, returns (DATA, prefix):
- DATA : pathlib.Path
Path to the directory containing the extracted files.
- prefix : str
File name prefix used in the extracted data.
- Raises:
ValueError – If population is not one of the populations listed in the configuration.
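The population check that triggers this `ValueError` can be sketched against an already-parsed configuration mapping. The helper name `check_population` and the assumed layout, where `config["ld_scores"]` maps population codes to URL and file name entries, are hypothetical; the real YAML schema is not shown here.

```python
def check_population(population, config):
    """Return the config entry for `population`, or raise ValueError if it is not listed."""
    # Assumed layout: config["ld_scores"] maps population codes to download entries.
    available = config.get("ld_scores", {})
    if population not in available:
        raise ValueError(
            f"population must be one of {sorted(available)}, got {population!r}"
        )
    return available[population]
```

Deriving the allowed codes from the configuration, rather than hard-coding them, means adding a population only requires editing the YAML file.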