cellink.tl.combine_annotations
- cellink.tl.combine_annotations(gdata, keys=None, unique_identifier_cols=None)
Combine multiple annotation datasets into a single unified dataset.
- Parameters:
gdata (object) – The genomic data object containing annotations stored in uns under specific keys.
keys (list) – List of annotation keys to combine, by default ["vep"]. These keys correspond to the annotations stored in gdata.uns, with prefixes like variant_annotation_{key}.
unique_identifier_cols (list) – List of columns that uniquely identify a variant-context pair, by default [AAnn.index, AAnn.gene_id, AAnn.feature_id].
- Returns:
None. Modifies the gdata object in place by adding the combined annotations under the key variant_annotation.
Notes
The function ensures that all unique identifier columns are present in each annotation set.
Performs an outer join across all specified annotations based on unique identifiers.
Verifies that no annotation columns are duplicated in the resulting dataset.
Verifies that the number of unique variant-context combinations remains consistent after merging.
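The merge and the consistency checks described in these notes can be sketched with plain pandas. This is a minimal illustration under stated assumptions, not the cellink implementation: the helper `merge_annotations`, the identifier column names in `ID_COLS` (stand-ins for AAnn.index, AAnn.gene_id, AAnn.feature_id), and the toy tables are all hypothetical. The duplicate-column check is done before merging here, because pandas would otherwise rename clashing columns with suffixes.

```python
from functools import reduce

import pandas as pd

# Hypothetical stand-ins for AAnn.index, AAnn.gene_id, AAnn.feature_id.
ID_COLS = ["snp_id", "gene_id", "feature_id"]


def merge_annotations(frames, id_cols=ID_COLS):
    """Outer-join annotation tables on id_cols and sanity-check the result."""
    # Each annotation column may come from only one table.
    value_cols = [c for df in frames for c in df.columns if c not in id_cols]
    assert len(value_cols) == len(set(value_cols)), "duplicate annotation columns"

    # Number of distinct variant-context pairs across all inputs.
    expected = pd.concat([df[id_cols] for df in frames]).drop_duplicates().shape[0]

    # Outer join keeps every variant-context pair seen in any table.
    merged = reduce(lambda left, right: left.merge(right, on=id_cols, how="outer"), frames)

    # The merge must neither drop nor invent variant-context pairs.
    observed = merged[id_cols].drop_duplicates().shape[0]
    assert observed == expected, "variant-context count changed after merge"
    return merged


# Toy annotation tables (made-up values, for illustration only).
vep = pd.DataFrame({
    "snp_id": ["v1", "v2"],
    "gene_id": ["g1", "g1"],
    "feature_id": ["t1", "t1"],
    "consequence": ["missense", "synonymous"],
})
other = pd.DataFrame({
    "snp_id": ["v2", "v3"],
    "gene_id": ["g1", "g2"],
    "feature_id": ["t1", "t2"],
    "score": [0.1, 0.9],
})

combined = merge_annotations([vep, other])
print(combined)
```

Rows present in only one table are kept, with missing values in the columns contributed by the other table.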
- Raises:
AssertionError – If any of the following checks fail:
  - The provided keys are a subset of the allowed annotation sources (currently only "vep").
  - Unique identifier columns are present in each annotation dataset.
  - No duplicate annotation columns exist in the combined dataset.
  - The number of unique variant-context combinations is consistent post-merge.
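The two input checks listed above (allowed keys, identifier columns present) might look roughly like the following. This is a sketch only: `validate_inputs`, `ALLOWED_SOURCES`, the `annotations` dictionary, and the column names are hypothetical and are not taken from the cellink source.

```python
import pandas as pd

# Per the documentation, "vep" is currently the only allowed source.
ALLOWED_SOURCES = {"vep"}


def validate_inputs(annotations, keys, id_cols):
    """Raise AssertionError if keys or identifier columns are invalid.

    `annotations` is a hypothetical dict mapping an annotation key to a DataFrame.
    """
    unknown = sorted(set(keys) - ALLOWED_SOURCES)
    assert not unknown, f"unsupported annotation keys: {unknown}"

    for key in keys:
        missing = [c for c in id_cols if c not in annotations[key].columns]
        assert not missing, f"annotation '{key}' lacks identifier columns: {missing}"


# Hypothetical identifier columns and a toy annotation table.
id_cols = ["snp_id", "gene_id", "feature_id"]
annotations = {
    "vep": pd.DataFrame(
        {"snp_id": ["v1"], "gene_id": ["g1"], "feature_id": ["t1"], "consequence": ["missense"]}
    )
}

validate_inputs(annotations, ["vep"], id_cols)  # passes silently
```

Calling `validate_inputs(annotations, ["cadd"], id_cols)` in this sketch would raise an AssertionError, since "cadd" is not in the allowed set.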
Examples
>>> combine_annotations(gdata, keys=["vep"])
>>> print(gdata["variant_annotation"])
# Outputs the combined annotations stored in gdata under the `variant_annotation` key.