{ "cells": [ { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "# Differential Splicing\n", "Based on the alternative splicing events, IsoTools facilitates comparisons of samples and groups of samples. \n", "\n", "\n", "In this tutorial, we will apply the statistical test to find differential splicing between K562 and GM12878 (on chromosome 8), and show how to interpret and depict the results. \n", "\n", "To run this tutorial, download the transcriptome object file 'PacBio_isotools_substantial_isotools.pkl' from [here](https://oc-molgen.gnz.mpg.de/owncloud/s/gjG9EPiQwpRAyg3) into a subfolder 'demonstration_dataset'." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "This is isotools version 0.3.2rc6, but data has been pickled with version 0.3.2rc2, which may be incompatible\n" ] } ], "source": [ "from isotools import Transcriptome\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "\n", "path='demonstration_dataset'\n", "isoseq=Transcriptome.load(f'{path}/PacBio_isotools_substantial_isotools.pkl')\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Statistical test for differential splicing\n", "\n", "\n", "To run the test, we need to define the groups to compare, which types of splicing events are of interest, and the coverage threshold (over all samples) required to test an event. 
\n", "The resulting table contains the test statistics, including a description of the tested region, the p-value, the transcript IDs supporting outcome A or B of the event, group-wise PSI and overdispersion values, and sample-wise coverage information.\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|██████████| 10803/10803 [00:10<00:00, 1076.70genes/s]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "18 differential splice sites in 16 genes for GM12878 vs K562\n" ] } ], "source": [ "# perform the test:\n", "types_of_interest=['ES','ME','5AS','3AS','IR'] # ignore alternative TSS/PAS for now\n", "\n", "diff_splice=isoseq.altsplice_test(isoseq.groups(), types=types_of_interest, min_total=200).sort_values('pvalue').reset_index(drop=True)\n", "\n", "sig=diff_splice.padj<.1\n", "print(f'{sum(sig)} differential splice sites in {len(diff_splice.loc[sig,\"gene\"].unique())} genes for {\" vs \".join(isoseq.groups())}')\n", "pd.set_option('display.max_columns', None)\n", "diff_splice.head(18)\n", "diff_splice.to_csv(f'{path}/demonstration_dataset_differential_events.csv',index = False)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | gene | gene_id | chrom | strand | start | end | splice_type | novel | padj | pvalue | trA | trB | nmdA | nmdB | GM12878_PSI | GM12878_disp | K562_PSI | K562_disp | total_PSI | total_disp | GM12878_a_GM12878_in_cov | GM12878_a_GM12878_total_cov | GM12878_b_GM12878_in_cov | GM12878_b_GM12878_total_cov | GM12878_c_GM12878_in_cov | GM12878_c_GM12878_total_cov | K562_a_K562_in_cov | K562_a_K562_total_cov | K562_b_K562_in_cov | K562_b_K562_total_cov | K562_c_K562_in_cov | K562_c_K562_total_cov |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | RIPK2 | ENSG00000104312.8 | chr8 | + | 89780160 | 89786592 | ES | True | 0.000177 | 0.000001 | [16, 8, 26, 34, 41, 46, 32, 36, 39, 40, 42, 47... | [2, 0, 12, 9, 13, 4, 5, 10, 11, 14, 15, 17, 18... | 0.000000 | 0.001595 | 0.994244 | 8.728420e-08 | 0.296339 | 1.059189e-05 | 0.719659 | 0.103622 | 76 | 76 | 218 | 220 | 224 | 225 | 34 | 104 | 41 | 153 | 34 | 111 |
| 1 | ASAH1 | ENSG00000104763.20 | chr8 | - | 18067104 | 18067133 | IR | True | 0.007126 | 0.000135 | [11, 13, 80, 99, 69, 70, 101, 79, 81, 83, 88, ... | [12, 14, 68, 42, 44] | 0.003788 | 0.000000 | 0.512441 | 1.078330e-03 | 0.010335 | 3.306517e-07 | 0.231512 | 0.069018 | 13 | 31 | 23 | 35 | 39 | 81 | 0 | 51 | 1 | 81 | 1 | 62 |
| 2 | RECQL4 | ENSG00000160957.15 | chr8 | - | 144511789 | 144511910 | 5AS | False | 0.007126 | 0.000145 | [22, 31, 6, 20, 11, 72, 13, 33, 8, 79, 2, 105,... | [114, 147, 124, 38, 143, 205, 123, 152, 197, 1... | 0.216585 | 0.191781 | 0.012575 | 9.366640e-07 | 0.401486 | 1.211899e-04 | 0.240155 | 0.041071 | 1 | 150 | 1 | 9 | 0 | 1 | 278 | 666 | 127 | 301 | 177 | 482 |
| 3 | SNHG6 | ENSG00000245910.9 | chr8 | - | 66922392 | 66922613 | IR | False | 0.011708 | 0.000409 | [1, 11, 17, 29, 16, 15, 7, 14, 19, 23, 24, 25,... | [4, 10] | 0.994152 | 0.022727 | 0.255022 | 1.506108e-05 | 0.021488 | 1.779160e-06 | 0.128672 | 0.014606 | 5 | 26 | 13 | 45 | 21 | 82 | 1 | 70 | 3 | 72 | 1 | 91 |
| 4 | SMIM19 | ENSG00000176209.12 | chr8 | + | 42541705 | 42546468 | 5AS | False | 0.011708 | 0.000471 | [3, 4, 21] | [2, 5, 9, 26, 7, 17, 22, 24, 27, 28] | 0.000000 | 0.009901 | 0.097171 | 1.705951e-02 | 0.593322 | 2.732809e-05 | 0.304097 | 0.074889 | 10 | 37 | 2 | 48 | 0 | 68 | 26 | 43 | 38 | 63 | 25 | 44 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 142 | RNF170 | ENSG00000120925.16 | chr8 | - | 42861855 | 42870003 | ES | False | 1.000000 | 0.986423 | [9, 23, 21, 34, 37, 18, 29] | [3, 4, 20, 16, 15, 10, 13, 8, 27, 33, 39, 6, 1... | 0.000000 | 0.034682 | 0.852105 | 5.859088e-05 | 0.861528 | 8.070359e-06 | 0.856429 | 0.000020 | 10 | 10 | 31 | 36 | 51 | 62 | 22 | 26 | 35 | 41 | 24 | 27 |
| 143 | POLB | ENSG00000070501.12 | chr8 | + | 42361365 | 42369270 | ES | False | 1.000000 | 0.994807 | [7, 11, 15, 20, 36, 40, 43] | [1, 0, 2, 12, 3, 10, 23, 24, 27, 35, 5, 9, 13,... | 0.000000 | 0.015810 | 0.894198 | 1.207652e-05 | 0.894023 | 2.830737e-04 | 0.893994 | 0.000007 | 10 | 10 | 55 | 61 | 87 | 99 | 33 | 36 | 35 | 37 | 33 | 40 |
| 144 | SCRIB | ENSG00000180900.20 | chr8 | - | 143804825 | 143804954 | 5AS | False | 1.000000 | 0.996899 | [31, 11, 94, 79, 56, 103, 142, 17, 155, 165, 4... | [5, 0, 14, 27, 36, 4, 18, 58, 6, 67, 12, 29, 3... | 0.023810 | 0.085366 | 0.852941 | NaN | 0.854052 | 3.572960e-05 | 0.854019 | 0.000025 | 29 | 34 | 0 | 0 | 0 | 0 | 97 | 110 | 40 | 52 | 80 | 92 |
| 145 | CHCHD7 | ENSG00000170791.18 | chr8 | + | 56214667 | 56216432 | 3AS | False | 1.000000 | 1.000000 | [0, 7, 3, 6, 18, 34, 39, 2, 26, 9, 37, 50, 56,... | [1, 4, 5, 19, 27, 42, 48, 54, 33, 35, 38, 41, ... | 0.026565 | 0.050633 | 0.229832 | 5.468918e-06 | 0.231743 | 4.678234e-06 | 0.230432 | 0.000002 | 22 | 95 | 22 | 110 | 56 | 230 | 19 | 93 | 15 | 53 | 24 | 104 |
| 146 | PCM1 | ENSG00000078674.20 | chr8 | + | 17993619 | 18006262 | 3AS | True | 1.000000 | 1.000000 | [68, 15, 1, 6, 160, 31, 35, 11, 154, 253, 32, ... | [43, 237, 271, 411, 335, 242, 258, 387, 266, 2... | 0.153310 | 0.156250 | 0.102550 | 1.556612e-05 | 0.098836 | 4.151420e-05 | 0.100326 | 0.000007 | 3 | 40 | 2 | 26 | 7 | 51 | 11 | 88 | 2 | 53 | 7 | 61 |

147 rows × 32 columns
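
The group-wise PSI columns of the result table can be combined into a simple effect size per event. The following is a minimal sketch, not part of the original tutorial: it rebuilds three rows of the table by hand (indices 0, 1 and 142, restricted to a few columns) so that it runs without the transcriptome file. The `delta_PSI` column and the `min_abs_delta` threshold are hypothetical names introduced here for illustration; only the `padj < 0.1` cut-off is taken from the tutorial code.

```python
import pandas as pd

# Three rows copied from the result table above (indices 0, 1 and 142);
# only the columns needed for this sketch are kept.
events = pd.DataFrame(
    {
        "gene": ["RIPK2", "ASAH1", "RNF170"],
        "splice_type": ["ES", "IR", "ES"],
        "padj": [0.000177, 0.007126, 1.000000],
        "GM12878_PSI": [0.994244, 0.512441, 0.852105],
        "K562_PSI": [0.296339, 0.010335, 0.861528],
    }
)

# Hypothetical helper column: PSI difference between the two groups.
events["delta_PSI"] = events["K562_PSI"] - events["GM12878_PSI"]

# Same significance cut-off as in the tutorial (padj < 0.1), plus an
# illustrative effect-size filter that is an assumption of this sketch.
min_abs_delta = 0.1
significant = events[
    (events["padj"] < 0.1) & (events["delta_PSI"].abs() >= min_abs_delta)
]

print(significant[["gene", "splice_type", "padj", "delta_PSI"]])
```

On the full table, the same two lines of filtering could be applied directly to `diff_splice`, assuming its columns are named as in the output shown above.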

| | gene_id | gene_name | strand | eventA_type | eventB_type | eventA_start | evemtA_end | eventB_start | eventB_end | pvalue | padj | stat | log2OR | dcPSI_AB | dcPSI_BA | priA_priB | priA_altB | altA_priB | altA_altB | priA_priB_trID | priA_altB_trID | altA_priB_trID | altA_altB_trID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 52 | ENSG00000147813.16 | NAPRT | - | IR | IR | 143575093 | 143575190 | 143574900 | 143574985 | 4.436456e-82 | 5.501206e-80 | 98.895238 | 6.627829 | 0.525552 | 0.637141 | 472 | 45 | 14 | 132 | [0, 5, 6, 7, 54, 62, 73, 17, 100, 24, 30, 46, ... | [25, 48, 129, 68, 128, 33, 133, 101, 135, 141,... | [41, 96, 110, 112, 115] | [21, 42, 1, 12, 31, 71, 44, 51, 63, 78, 92, 13... |
| 107 | ENSG00000182325.11 | FBXL6 | - | IR | IR | 144357786 | 144358031 | 144357121 | 144357438 | 1.074426e-74 | 6.661440e-73 | 1715.833333 | 10.744694 | 0.472927 | 0.469753 | 142 | 3 | 4 | 145 | [1, 21, 34, 7, 32, 36, 38, 72, 45, 81, 51, 85,... | [89, 87, 71] | [4, 40] | [2, 9, 19, 41, 14, 24, 44, 48, 56, 11, 54, 17,... |
| 55 | ENSG00000167700.9 | MFSD3 | + | IR | IR | 144510015 | 144510359 | 144510507 | 144510597 | 4.299909e-32 | 1.777296e-30 | inf | 38.459595 | 0.503866 | 0.743802 | 180 | 20 | 0 | 42 | [0, 3, 23, 35, 5, 38, 39] | [2, 21, 19, 20, 29] | [] | [11, 26, 14, 36, 18] |
| 45 | ENSG00000160957.15 | RECQL4 | - | IR | IR | 144512771 | 144512846 | 144512324 | 144512391 | 1.341822e-31 | 4.159649e-30 | 11.529161 | 3.527216 | 0.331031 | 0.344543 | 1231 | 87 | 81 | 66 | [22, 31, 6, 20, 114, 147, 124, 11, 38, 72, 13,... | [401, 191, 221, 94, 364, 68, 204, 127, 519, 17... | [117, 136, 279, 259, 271, 532, 292, 49, 184, 2... | [408, 110, 112, 28, 29, 158, 190, 215, 495, 51... |
| 108 | ENSG00000182325.11 | FBXL6 | - | IR | IR | 144357786 | 144358031 | 144356915 | 144356989 | 6.335233e-27 | 1.571138e-25 | 51.918519 | 5.698177 | 0.228402 | 0.451672 | 163 | 90 | 3 | 86 | [1, 12, 21, 34, 4, 7, 75, 86, 15, 23, 25, 26, ... | [2, 3, 41, 24, 44, 48, 8, 17, 37, 6, 18, 33, 5... | [32, 53] | [9, 19, 14, 56, 11, 13, 54, 42, 63, 64, 5, 16,... |

| | gene_id | gene_name | chrom | strand | start | end | padj | pvalue | deltaPI | transcript_ids |
|---|---|---|---|---|---|---|---|---|---|---|
| 5982 | ENSG00000070756.17 | PABPC1 | chr8 | - | 100685815 | 100722809 | 0.000000e+00 | 0.000000e+00 | -0.289084 | [2, 1] |
| 5094 | ENSG00000156482.11 | RPL30 | chr8 | - | 98024850 | 98046469 | 0.000000e+00 | 0.000000e+00 | 0.515042 | [3, 2] |
| 4805 | ENSG00000161016.18 | RPL8 | chr8 | - | 144789764 | 144792587 | 0.000000e+00 | 0.000000e+00 | -0.477042 | [2, 6] |
| 4632 | ENSG00000164924.18 | YWHAZ | chr8 | - | 100916522 | 100953388 | 0.000000e+00 | 0.000000e+00 | 0.303116 | [31, 11] |
| 4172 | ENSG00000147604.14 | RPL7 | chr8 | - | 73290241 | 73295789 | 1.157393e-170 | 1.885004e-172 | 0.092221 | [2, 31] |
| 3235 | ENSG00000104408.11 | EIF3E | chr8 | - | 108162786 | 108443496 | 2.092139e-137 | 4.088872e-139 | 0.166874 | [0, 20] |
| 9988 | ENSG00000104312.8 | RIPK2 | chr8 | + | 89757805 | 89791064 | 6.088181e-101 | 1.388185e-102 | 0.677697 | [2, 0] |
| 4239 | ENSG00000129696.13 | TTI2 | chr8 | - | 33473385 | 33513185 | 9.366700e-100 | 2.440834e-101 | -0.617118 | [3, 22] |
| 1369 | ENSG00000147684.10 | NDUFB9 | chr8 | + | 124539100 | 124580648 | 4.019587e-89 | 1.178381e-90 | 0.399575 | [0, -1] |
| 1525 | ENSG00000104320.15 | NBN | chr8 | - | 89924514 | 90003228 | 5.103980e-88 | 1.662534e-89 | -0.409024 | [2, 1] |