Differential Splicing¶
Based on the alternative splicing events, Isotools facilitates comparisons of samples and groups of samples.
In this tutorial, we will apply the statistical test to find differential splicing between K562 and GM12878 (on chromosome 8), and how to interpret and depict the results.
To run this tutorial, download the transcriptome object file ‘PacBio_isotools_substantial_isotools.pkl’ from here to a subfolder ‘demonstration_dataset’.
[1]:
from isotools import Transcriptome
import matplotlib.pyplot as plt
import pandas as pd
path='demonstration_dataset'
isoseq=Transcriptome.load(f'{path}/PacBio_isotools_substantial_isotools.pkl')
This is isotools version 0.3.2rc6, but data has been pickled with version 0.3.2rc2, which may be incompatible
Statistical test for differential splicing¶
To run the test, we need to define the groups to compare, what types of splicing events are of interest, and the coverage threshold (over all samples) required to test an event. The resulting table contains the test statistics, including a description of the tested region, p-value, the transcript ids supporting outcome A or B of the event, group wise PSI and overdispersion values as well as sample wise coverage information.
[2]:
#perform the test:
types_of_interest=['ES','ME','5AS','3AS','IR'] #ignore alternative TSS/PAS for now
diff_splice=isoseq.altsplice_test(isoseq.groups(), types=types_of_interest, min_total=200).sort_values('pvalue').reset_index(drop=True)
sig=diff_splice.padj<.1
print(f'{sum(sig)} differential splice sites in {len(diff_splice.loc[sig,"gene"].unique())} genes for {" vs ".join(isoseq.groups())}')
pd.set_option('display.max_columns', None)
diff_splice.head(18)
diff_splice.to_csv(f'{path}/demonstration_dataset_differential_events.csv',index = False)
100%|██████████| 10803/10803 [00:10<00:00, 1076.70genes/s]
18 differential splice sites in 16 genes for GM12878 vs K562
[3]:
diff_splice
[3]:
gene | gene_id | chrom | strand | start | end | splice_type | novel | padj | pvalue | trA | trB | nmdA | nmdB | GM12878_PSI | GM12878_disp | K562_PSI | K562_disp | total_PSI | total_disp | GM12878_a_GM12878_in_cov | GM12878_a_GM12878_total_cov | GM12878_b_GM12878_in_cov | GM12878_b_GM12878_total_cov | GM12878_c_GM12878_in_cov | GM12878_c_GM12878_total_cov | K562_a_K562_in_cov | K562_a_K562_total_cov | K562_b_K562_in_cov | K562_b_K562_total_cov | K562_c_K562_in_cov | K562_c_K562_total_cov | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | RIPK2 | ENSG00000104312.8 | chr8 | + | 89780160 | 89786592 | ES | True | 0.000177 | 0.000001 | [16, 8, 26, 34, 41, 46, 32, 36, 39, 40, 42, 47... | [2, 0, 12, 9, 13, 4, 5, 10, 11, 14, 15, 17, 18... | 0.000000 | 0.001595 | 0.994244 | 8.728420e-08 | 0.296339 | 1.059189e-05 | 0.719659 | 0.103622 | 76 | 76 | 218 | 220 | 224 | 225 | 34 | 104 | 41 | 153 | 34 | 111 |
1 | ASAH1 | ENSG00000104763.20 | chr8 | - | 18067104 | 18067133 | IR | True | 0.007126 | 0.000135 | [11, 13, 80, 99, 69, 70, 101, 79, 81, 83, 88, ... | [12, 14, 68, 42, 44] | 0.003788 | 0.000000 | 0.512441 | 1.078330e-03 | 0.010335 | 3.306517e-07 | 0.231512 | 0.069018 | 13 | 31 | 23 | 35 | 39 | 81 | 0 | 51 | 1 | 81 | 1 | 62 |
2 | RECQL4 | ENSG00000160957.15 | chr8 | - | 144511789 | 144511910 | 5AS | False | 0.007126 | 0.000145 | [22, 31, 6, 20, 11, 72, 13, 33, 8, 79, 2, 105,... | [114, 147, 124, 38, 143, 205, 123, 152, 197, 1... | 0.216585 | 0.191781 | 0.012575 | 9.366640e-07 | 0.401486 | 1.211899e-04 | 0.240155 | 0.041071 | 1 | 150 | 1 | 9 | 0 | 1 | 278 | 666 | 127 | 301 | 177 | 482 |
3 | SNHG6 | ENSG00000245910.9 | chr8 | - | 66922392 | 66922613 | IR | False | 0.011708 | 0.000409 | [1, 11, 17, 29, 16, 15, 7, 14, 19, 23, 24, 25,... | [4, 10] | 0.994152 | 0.022727 | 0.255022 | 1.506108e-05 | 0.021488 | 1.779160e-06 | 0.128672 | 0.014606 | 5 | 26 | 13 | 45 | 21 | 82 | 1 | 70 | 3 | 72 | 1 | 91 |
4 | SMIM19 | ENSG00000176209.12 | chr8 | + | 42541705 | 42546468 | 5AS | False | 0.011708 | 0.000471 | [3, 4, 21] | [2, 5, 9, 26, 7, 17, 22, 24, 27, 28] | 0.000000 | 0.009901 | 0.097171 | 1.705951e-02 | 0.593322 | 2.732809e-05 | 0.304097 | 0.074889 | 10 | 37 | 2 | 48 | 0 | 68 | 26 | 43 | 38 | 63 | 25 | 44 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
142 | RNF170 | ENSG00000120925.16 | chr8 | - | 42861855 | 42870003 | ES | False | 1.000000 | 0.986423 | [9, 23, 21, 34, 37, 18, 29] | [3, 4, 20, 16, 15, 10, 13, 8, 27, 33, 39, 6, 1... | 0.000000 | 0.034682 | 0.852105 | 5.859088e-05 | 0.861528 | 8.070359e-06 | 0.856429 | 0.000020 | 10 | 10 | 31 | 36 | 51 | 62 | 22 | 26 | 35 | 41 | 24 | 27 |
143 | POLB | ENSG00000070501.12 | chr8 | + | 42361365 | 42369270 | ES | False | 1.000000 | 0.994807 | [7, 11, 15, 20, 36, 40, 43] | [1, 0, 2, 12, 3, 10, 23, 24, 27, 35, 5, 9, 13,... | 0.000000 | 0.015810 | 0.894198 | 1.207652e-05 | 0.894023 | 2.830737e-04 | 0.893994 | 0.000007 | 10 | 10 | 55 | 61 | 87 | 99 | 33 | 36 | 35 | 37 | 33 | 40 |
144 | SCRIB | ENSG00000180900.20 | chr8 | - | 143804825 | 143804954 | 5AS | False | 1.000000 | 0.996899 | [31, 11, 94, 79, 56, 103, 142, 17, 155, 165, 4... | [5, 0, 14, 27, 36, 4, 18, 58, 6, 67, 12, 29, 3... | 0.023810 | 0.085366 | 0.852941 | NaN | 0.854052 | 3.572960e-05 | 0.854019 | 0.000025 | 29 | 34 | 0 | 0 | 0 | 0 | 97 | 110 | 40 | 52 | 80 | 92 |
145 | CHCHD7 | ENSG00000170791.18 | chr8 | + | 56214667 | 56216432 | 3AS | False | 1.000000 | 1.000000 | [0, 7, 3, 6, 18, 34, 39, 2, 26, 9, 37, 50, 56,... | [1, 4, 5, 19, 27, 42, 48, 54, 33, 35, 38, 41, ... | 0.026565 | 0.050633 | 0.229832 | 5.468918e-06 | 0.231743 | 4.678234e-06 | 0.230432 | 0.000002 | 22 | 95 | 22 | 110 | 56 | 230 | 19 | 93 | 15 | 53 | 24 | 104 |
146 | PCM1 | ENSG00000078674.20 | chr8 | + | 17993619 | 18006262 | 3AS | True | 1.000000 | 1.000000 | [68, 15, 1, 6, 160, 31, 35, 11, 154, 253, 32, ... | [43, 237, 271, 411, 335, 242, 258, 387, 266, 2... | 0.153310 | 0.156250 | 0.102550 | 1.556612e-05 | 0.098836 | 4.151420e-05 | 0.100326 | 0.000007 | 3 | 40 | 2 | 26 | 7 | 51 | 11 | 88 | 2 | 53 | 7 | 61 |
147 rows × 32 columns
Depicting differential splicing results¶
To visualize deferentially spliced events, isotools implements a specific plot, summarizing the PSI values in the two plots. For each depicted event, two distributions represent the models posterior probability of PSI values for the two groups, as well as the observed PSI values for the individual samples as points. In addition, the read coverage over the event can be depicted as a sashimi-plot.
[4]:
plt.rcParams["figure.figsize"] = (10,7)
from isotools.plots import plot_diff_results
f,axs,plotted=plot_diff_results(diff_splice, min_diff=.1, min_support=2,grid_shape=(2,2), group_colors=['C0','C1'])
/pkg/python-3.10.8-0/lib/python3.10/site-packages/scipy/stats/_continuous_distns.py:668: RuntimeWarning: divide by zero encountered in _beta_pdf
return _boost._beta_pdf(x, a, b)
/home/lienhard/.local/lib/python3.10/site-packages/isotools/plots.py:92: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
plotted = plotted.append(row)
/home/lienhard/.local/lib/python3.10/site-packages/isotools/plots.py:92: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
plotted = plotted.append(row)
/home/lienhard/.local/lib/python3.10/site-packages/isotools/plots.py:92: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
plotted = plotted.append(row)
/home/lienhard/.local/lib/python3.10/site-packages/isotools/plots.py:92: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
plotted = plotted.append(row)
/home/lienhard/.local/lib/python3.10/site-packages/isotools/plots.py:92: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
plotted = plotted.append(row)
/home/lienhard/.local/lib/python3.10/site-packages/isotools/plots.py:92: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
plotted = plotted.append(row)
/home/lienhard/.local/lib/python3.10/site-packages/isotools/plots.py:92: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
plotted = plotted.append(row)
[5]:
plt.rcParams["figure.figsize"] = (10,10)
pos=[89780160-500,89786592+500]
fig,axs=plt.subplots(3)
g=isoseq['RIPK2']
g.gene_track(x_range=pos, ax=axs[0], reference=False, select_transcripts=g.filter_transcripts('SUBSTANTIAL'))
g.sashimi_plot(samples=isoseq.groups()['K562'],x_range=pos, ax=axs[1], title='K562', log_y=False)
g.sashimi_plot(samples=isoseq.groups()['GM12878'],x_range=pos, ax=axs[2], title='GM12878', log_y=False)
fig.tight_layout()
Splicing coordination¶
A particular strength of long read sequencing is the fact that distal splicing events on the same gene are covered by the same read, facilitating analysis of splicing coordination. Isotools implements a pairwise independence test for all pairs of Alternative Splicing Events (ASEs) of a gene.
[6]:
coordination_results=isoseq.coordination_test().sort_values('padj')
coordination_results.head()
100%|██████████| 10803/10803 [00:03<00:00, 2798.91genes/s]
[6]:
gene_id | gene_name | strand | eventA_type | eventB_type | eventA_start | evemtA_end | eventB_start | eventB_end | pvalue | padj | stat | log2OR | dcPSI_AB | dcPSI_BA | priA_priB | priA_altB | altA_priB | altA_altB | priA_priB_trID | priA_altB_trID | altA_priB_trID | altA_altB_trID | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
52 | ENSG00000147813.16 | NAPRT | - | IR | IR | 143575093 | 143575190 | 143574900 | 143574985 | 4.436456e-82 | 5.501206e-80 | 98.895238 | 6.627829 | 0.525552 | 0.637141 | 472 | 45 | 14 | 132 | [0, 5, 6, 7, 54, 62, 73, 17, 100, 24, 30, 46, ... | [25, 48, 129, 68, 128, 33, 133, 101, 135, 141,... | [41, 96, 110, 112, 115] | [21, 42, 1, 12, 31, 71, 44, 51, 63, 78, 92, 13... |
107 | ENSG00000182325.11 | FBXL6 | - | IR | IR | 144357786 | 144358031 | 144357121 | 144357438 | 1.074426e-74 | 6.661440e-73 | 1715.833333 | 10.744694 | 0.472927 | 0.469753 | 142 | 3 | 4 | 145 | [1, 21, 34, 7, 32, 36, 38, 72, 45, 81, 51, 85,... | [89, 87, 71] | [4, 40] | [2, 9, 19, 41, 14, 24, 44, 48, 56, 11, 54, 17,... |
55 | ENSG00000167700.9 | MFSD3 | + | IR | IR | 144510015 | 144510359 | 144510507 | 144510597 | 4.299909e-32 | 1.777296e-30 | inf | 38.459595 | 0.503866 | 0.743802 | 180 | 20 | 0 | 42 | [0, 3, 23, 35, 5, 38, 39] | [2, 21, 19, 20, 29] | [] | [11, 26, 14, 36, 18] |
45 | ENSG00000160957.15 | RECQL4 | - | IR | IR | 144512771 | 144512846 | 144512324 | 144512391 | 1.341822e-31 | 4.159649e-30 | 11.529161 | 3.527216 | 0.331031 | 0.344543 | 1231 | 87 | 81 | 66 | [22, 31, 6, 20, 114, 147, 124, 11, 38, 72, 13,... | [401, 191, 221, 94, 364, 68, 204, 127, 519, 17... | [117, 136, 279, 259, 271, 532, 292, 49, 184, 2... | [408, 110, 112, 28, 29, 158, 190, 215, 495, 51... |
108 | ENSG00000182325.11 | FBXL6 | - | IR | IR | 144357786 | 144358031 | 144356915 | 144356989 | 6.335233e-27 | 1.571138e-25 | 51.918519 | 5.698177 | 0.228402 | 0.451672 | 163 | 90 | 3 | 86 | [1, 12, 21, 34, 4, 7, 75, 86, 15, 23, 25, 26, ... | [2, 3, 41, 24, 44, 48, 8, 17, 37, 6, 18, 33, 5... | [32, 53] | [9, 19, 14, 56, 11, 13, 54, 42, 63, 64, 5, 16,... |
Differential Isoform Expression¶
In addition to the differential splicing test, isotools also implements the differential isoform expression (DIE) test, as proposed by Joglekar *et al.*, 2021
[7]:
isoseq.die_test(groups=isoseq.groups()).sort_values('pvalue').head(10)
100%|██████████| 10803/10803 [00:00<00:00, 17452.81genes/s]
[7]:
gene_id | gene_name | chrom | strand | start | end | padj | pvalue | deltaPI | transcript_ids | |
---|---|---|---|---|---|---|---|---|---|---|
5982 | ENSG00000070756.17 | PABPC1 | chr8 | - | 100685815 | 100722809 | 0.000000e+00 | 0.000000e+00 | -0.289084 | [2, 1] |
5094 | ENSG00000156482.11 | RPL30 | chr8 | - | 98024850 | 98046469 | 0.000000e+00 | 0.000000e+00 | 0.515042 | [3, 2] |
4805 | ENSG00000161016.18 | RPL8 | chr8 | - | 144789764 | 144792587 | 0.000000e+00 | 0.000000e+00 | -0.477042 | [2, 6] |
4632 | ENSG00000164924.18 | YWHAZ | chr8 | - | 100916522 | 100953388 | 0.000000e+00 | 0.000000e+00 | 0.303116 | [31, 11] |
4172 | ENSG00000147604.14 | RPL7 | chr8 | - | 73290241 | 73295789 | 1.157393e-170 | 1.885004e-172 | 0.092221 | [2, 31] |
3235 | ENSG00000104408.11 | EIF3E | chr8 | - | 108162786 | 108443496 | 2.092139e-137 | 4.088872e-139 | 0.166874 | [0, 20] |
9988 | ENSG00000104312.8 | RIPK2 | chr8 | + | 89757805 | 89791064 | 6.088181e-101 | 1.388185e-102 | 0.677697 | [2, 0] |
4239 | ENSG00000129696.13 | TTI2 | chr8 | - | 33473385 | 33513185 | 9.366700e-100 | 2.440834e-101 | -0.617118 | [3, 22] |
1369 | ENSG00000147684.10 | NDUFB9 | chr8 | + | 124539100 | 124580648 | 4.019587e-89 | 1.178381e-90 | 0.399575 | [0, -1] |
1525 | ENSG00000104320.15 | NBN | chr8 | - | 89924514 | 90003228 | 5.103980e-88 | 1.662534e-89 | -0.409024 | [2, 1] |
[ ]: