cellink.tl.aggregate_annotations_for_varm

cellink.tl.aggregate_annotations_for_varm(gdata, annotation_key, agg_type='unique_list_max', return_data=False)

Aggregates a DataFrame of variant annotations according to the specified aggregation type so that there is only one row per variant ID. Annotations are therefore combined across the different gene/transcript contexts in which a variant appears.

Parameters:

gdata (object) – The genomic data object containing annotations stored in uns under specific keys.
annotation_key (str) – Key to access the annotations within gdata.uns. The annotations are expected to be stored as a pandas DataFrame.
agg_type (str) –
Aggregation type to determine how annotation values are combined. Options are:
- "unique_list_max": Unique string values are aggregated into a comma-separated string,
  and numeric columns are aggregated by their maximum value.
- "list": Aggregates all values into a list, preserving duplicates.
- "str": Aggregates all values into a single comma-separated string.
- "first": Drops duplicates and keeps only the first occurrence for each variant-context pair.
Default is "unique_list_max".
return_data (bool) – If True, the aggregated DataFrame is returned in addition to modifying the gdata object. Default is False.

Returns:

pd.DataFrame – The aggregated annotations are written to gdata.varm["variant_annotation"]. If return_data is True, the aggregated DataFrame is also returned.

Examples

>>> aggregate_annotations_for_varm(gdata, "variant_annotation_vep",
...                                agg_type="unique_list_max")
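
The aggregation modes listed under agg_type can be illustrated with plain pandas. The following is only a minimal sketch of the described behaviour, not the cellink implementation: the helper names (aggregate_sketch, unique_join), the column names (snp_id, gene, consequence, score) and the toy data are all assumptions made for illustration, and "first" is interpreted here as keeping the first row per variant ID.

```python
import pandas as pd

# Hypothetical long-format annotations: one row per variant/gene context.
annotations = pd.DataFrame(
    {
        "snp_id": ["1_100_A_G", "1_100_A_G", "1_200_C_T"],
        "gene": ["GENE_A", "GENE_B", "GENE_A"],
        "consequence": ["missense", "intron", "missense"],
        "score": [0.2, 0.9, 0.5],
    }
)


def unique_join(values: pd.Series) -> str:
    """Join the distinct values of a column, keeping first-seen order."""
    return ",".join(dict.fromkeys(values.astype(str)))


def aggregate_sketch(df, id_col="snp_id", agg_type="unique_list_max"):
    """Collapse df to one row per id_col (illustrative helper, not cellink code)."""
    grouped = df.groupby(id_col, sort=False)
    if agg_type == "unique_list_max":
        # Numeric columns take their maximum; all others become a
        # comma-separated string of unique values.
        agg_map = {
            col: "max" if pd.api.types.is_numeric_dtype(df[col]) else unique_join
            for col in df.columns
            if col != id_col
        }
        return grouped.agg(agg_map)
    if agg_type == "list":
        # Every value is kept, duplicates included.
        return grouped.agg(list)
    if agg_type == "str":
        # Every value is kept and joined into one comma-separated string.
        return grouped.agg(lambda s: ",".join(s.astype(str)))
    if agg_type == "first":
        # Assumption: keep the first row seen for each variant ID.
        return df.drop_duplicates(subset=id_col, keep="first").set_index(id_col)
    raise ValueError(f"Unknown agg_type: {agg_type!r}")


result = aggregate_sketch(annotations, agg_type="unique_list_max")
print(result)
```

In this toy input the variant 1_100_A_G appears in two gene contexts, so each mode collapses its two rows into one while 1_200_C_T is left essentially unchanged.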
