Saturation analysis

IsoTools provides two approaches to assess the required depth of sequencing:

  • Saturation analysis:

    • Models the propability of observing a transcript with at least n reads.

    • Depends on the expression level on the transcript

    • E.g.: “With the given seq depth, the probability of observing a 1 TPM transcript is > 80%”

  • Rarefaction analysis

    • Subsamples the reads and counts the number of observed transcripts.

    • Helps estimating, how much more transcripts would be observed with deeper sequencing.

    • If the curve is relativly flat towards the right, deeper sequencing would yield little additional transcripts.

    • Can be restricted to a subset of transcripts (e.g. only matching reference annotation or only novel genes)

The following steps assume you have run the previous tutorial, or downloaded the prepared transcriptome pkl file PacBio_isotools_substantial_isotools.pkl from the demonstration dataset.

[1]:
from isotools import Transcriptome

path='demonstration_dataset'
isoseq=Transcriptome.load(f'{path}/PacBio_isotools_substantial_isotools.pkl')
This is isotools version 0.3.2rc3, but data has been pickled with version 0.3.2rc2, which may be incompatible
[2]:
from isotools.plots import plot_saturation, plot_rarefaction
import matplotlib.pyplot as plt


plt.rcParams["figure.figsize"] = (10,5)
fig, axs=plt.subplots(1,2)

plot_saturation(isoseq,cov_th=5, expr_th=[100, 200, 400, 1000, 5000],x_range=(1e2,6e4,1e2), ax=axs[0])

rarefraction, total=isoseq.rarefaction(min_coverage=2, tr_filter={'query':'FSM'})
plot_rarefaction(rarefraction, total=total, ax=axs[1])

fig.tight_layout()


../_images/notebooks_04_saturation_analysis_2_0.png
[ ]: