File Export

The identified transcripts can be exported in GTF format or as a table with additional information. Both functions accept all filter arguments of iter_transcripts, allowing flexible and fine-grained selection of the relevant transcripts.

This tutorial demonstrates the export functionality with the prepared transcriptome .pkl file from here.

[1]:

from isotools import Transcriptome
import matplotlib.pyplot as plt

path='demonstration_dataset'
isoseq=Transcriptome.load(f'{path}/PacBio_isotools_substantial_isotools.pkl')

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/scratch/local/ipykernel_49741/2985863408.py in <module>
      3
      4 path='demonstration_dataset'
----> 5 isoseq=Transcriptome.load(f'{path}/PacBio_isotools_substantial_isotools.pkl')

~/.local/lib/python3.9/site-packages/isotools/transcriptome.py in load(cls, pickle_file)
     95
     96         logger.info('loading transcriptome from %s', pickle_file)
---> 97         tr = pickle.load(open(pickle_file, 'rb'))
     98         pickled_version = tr.infos.get('isotools_version', '<0.2.6')
     99         if pickled_version != __version__:

AttributeError: Can't get attribute '_unpickle_block' on <module 'pandas._libs.internals' from '/home/lienhard/.local/lib/python3.9/site-packages/pandas/_libs/internals.cpython-39-x86_64-linux-gnu.so'>
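
An AttributeError raised inside pickle.load, as in the traceback above, generally means that the pickle refers to an object the installed library version does not provide. Here the missing attribute belongs to pandas, so comparing the installed pandas version with the one used to create the file is a reasonable first check. The following is a minimal sketch using only the standard library; the helper name installed_version is hypothetical and not part of isotools.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# report the versions that matter when loading the pickled transcriptome
for name in ('isotools', 'pandas'):
    print(name, installed_version(name))
```

If the reported pandas version differs from the one used to write the .pkl file, loading in an environment with a matching version is likely the simplest remedy.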

[ ]:

# export the filtered transcripts as a gzip-compressed GTF file:
query = 'SUBSTANTIAL and not (NOVEL_TRANSCRIPT and UNSPLICED)'
gtf_file = f'{path}/demonstration_dataset_substantial_transcripts.gtf.gz'
isoseq.write_gtf(gtf_file, source='isoseq', min_coverage=5, gzip=True, query=query)
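
To check an exported file, it can help to count the feature types it contains. The sketch below relies only on the general GTF layout (nine tab-separated columns, with the feature type in column 3) and runs on a small made-up file, so it does not need the tutorial data. The function name count_gtf_features and the toy records are illustrative assumptions, not isotools output.

```python
import gzip
import os
import tempfile
from collections import Counter


def count_gtf_features(gtf_gz):
    """Count the feature types (column 3) in a gzip-compressed GTF file."""
    counts = Counter()
    with gzip.open(gtf_gz, 'rt') as handle:
        for line in handle:
            if line.startswith('#') or not line.strip():
                continue  # skip comment and empty lines
            fields = line.rstrip('\n').split('\t')
            counts[fields[2]] += 1
    return counts


# toy GTF with made-up records, standing in for the exported file
toy_lines = [
    'chr1\tisoseq\ttranscript\t100\t500\t.\t+\t.\tgene_id "g1"; transcript_id "g1_0";',
    'chr1\tisoseq\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "g1_0";',
    'chr1\tisoseq\texon\t400\t500\t.\t+\t.\tgene_id "g1"; transcript_id "g1_0";',
]
toy_file = os.path.join(tempfile.mkdtemp(), 'toy.gtf.gz')
with gzip.open(toy_file, 'wt') as handle:
    handle.write('\n'.join(toy_lines) + '\n')

feature_counts = count_gtf_features(toy_file)
print(feature_counts)
```

Pointing count_gtf_features at the file written by write_gtf should give a quick impression of how many records passed the filter.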

[ ]:

# export transcript table with the same filter criteria:
query = 'SUBSTANTIAL and not (NOVEL_TRANSCRIPT and UNSPLICED)'
transcript_tab = isoseq.transcript_table(
    groups=isoseq.groups(), tpm=True, coverage=True,
    min_coverage=5, progress_bar=True, query=query)
# write a tab-separated file (despite the .csv name); pandas infers gzip compression from '.gz'
transcript_tab.to_csv(f'{path}/demonstration_dataset_substantial_transcripts.csv.gz', index=False, sep='\t')
# show the first lines
transcript_tab.head()
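
Because the table is written with sep='\t', it has to be read back as tab-separated text even though the file name ends in .csv.gz. The sketch below shows this with the standard library on a small made-up table; the helper read_tsv_gz and the column names are hypothetical and do not reflect the actual columns of the isotools transcript table.

```python
import csv
import gzip
import os
import tempfile


def read_tsv_gz(path):
    """Read a gzip-compressed, tab-separated table into a list of dicts."""
    with gzip.open(path, 'rt', newline='') as handle:
        return list(csv.DictReader(handle, delimiter='\t'))


# toy table with made-up columns, standing in for the exported file
toy_table = os.path.join(tempfile.mkdtemp(), 'toy_transcripts.csv.gz')
with gzip.open(toy_table, 'wt', newline='') as handle:
    writer = csv.writer(handle, delimiter='\t')
    writer.writerow(['gene_name', 'transcript_nr', 'coverage'])
    writer.writerow(['GENE_A', '0', '12'])
    writer.writerow(['GENE_A', '1', '7'])

rows = read_tsv_gz(toy_table)
print(rows[0])
```

With pandas, pd.read_csv(file_name, sep='\t') reads such a file in the same way and returns a DataFrame.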
