Command Line Interface (CLI)

IsoTools is a python library for the analysis of long read data. It starts from aligned reads (e.g. bamfiles) and features transcriptome reconstruction, quantification, visualization, explorative analysis and statistical tests. IsoTools is designed to be run interactivly (e.g. in a python notebook, or in the repl), as it features rich functionality to explore the data. However, it can also be run as a command line tool, with predefined parameters. This is convenient to process different experiments in a standardized pipeline. All command line parameters are descried in the CLI section:

To run the commands, need the following files from the prepared demonstration dataset:

  • sample description file ‘encode_samples.tsv’

  • six .bam alignment files for the six samples

  • six corresponding .bam.bai indices

  • referernce annotation file gencode.v42.chr_patch_hapl_scaff.annotation_sorted_chr8.gff3.gz

  • corresponding .gff3.gz.tbi index file

  • genomic reference file GRCh38.p13.genome_chr8.fa

  • cooresponding .fai index file

The sample description file is a tab-separated text file with at least the following columns:

  • sample_name: A unique sample label

  • file_name: The path to the alignment .bam file

  • group: a group assignment of the sample

The following command runs isotools and performs the following analysis steps:

  • transcriptome reconstruction from the alignments

  • filtering transcripts according to filter query (–filter_query)

  • exporting transcripts as gtf (–gtf_out)

  • exporting a transcript table, which includes the number of reads per transcripts (–transcript_table)

  • finally, by default, a pkl file is produced, which contains the transcriptome object, reducing runtime of the next call of the CLI.

    • The resulting transcriptome can also be explored using the API.

    • creation of the pkl file can be skipped with –no-pkl

    • import of the previous pkl file can be bypassed with the –force-recreate command

    • if a pkl file is present, and a sample table containing novel samples is provided, only the novel samples are added

  • the two parameters “–log” INFO and “–progress_bar” enable readout of the progress.

cd demonstration_data
samples='encode_samples.tsv'
anno='gencode.v42.chr_patch_hapl_scaff.annotation_sorted_chr8.gff3.gz'
genome='GRCh38.p13.genome_chr8.fa'

run_isotools \
    --anno $anno \
    --log INFO \
    --progress_bar \
    --genome $genome \
    --samples $samples \
    --file_prefix ./PacBio_isotools_substantial \
    --custom_filter_tag "COVERED=any(g.coverage[:,trid] > 2)"  "HIGH_COVERED=any(g.coverage[:,trid] > 5)" \
    --filter_query "(COVERED and FSM) or (HIGH_COVERED and SUBSTANTIAL and not INTERNAL_PRIMING)" \
    --gtf_out --transcript_table
2022-11-04 14:01:56 INFO: This is isotools version 0.3.2rc2
2022-11-04 14:01:56 INFO: loading transcriptome from ./PacBio_isotools_substantial_isotools.pkl
2022-11-04 14:01:56 INFO: importing reference from gff3 file gencode.v42.chr_patch_hapl_scaff.annotation_sorted_chr8.gff3.gz
100%|████████████████████████████████████████████████████████████████████████████████████| 2.82M/2.82M [00:02<00:00, 1.37MB/s]
2022-11-04 14:01:58 INFO: skipped the following categories: {'CDS', 'five_prime_UTR', 'three_prime_UTR'}
2022-11-04 14:01:58 WARNING: Missing genes! Found gene information in categories ['gene'] for 2540/5080 genes
2022-11-04 14:01:58 INFO: collapsed 0 immunoglobulin loci and 0 T-cell receptor loci
2022-11-04 14:01:58 INFO: adding sample GM12878_a from file ENCFF417VHJ_aligned_mm2_chr8.bam
100%|████████████████████████████████████████████████████████████████████████████████████| 53.0k/53.0k [00:11<00:00, 4.58kreads/s, chr=KI270757.1]
2022-11-04 14:02:09 INFO: skipped 110 reads aligned fraction of less than 0.75.
2022-11-04 14:02:09 INFO: skipped 10972 secondary alignments (0x100), alignment that failed quality check (0x200) or PCR duplicates (0x400)
2022-11-04 14:02:09 WARNING: ignored 533 chimeric alignments with only one part aligned to specified chromosomes.
2022-11-04 14:02:09 INFO: ignoring 2231 chimeric alignments with less than 2 reads
2022-11-04 14:02:09 INFO: imported 40182 nonchimeric reads (including  14 chained chimeric alignments) and 73 chimeric reads with coverage of at least 2.
2022-11-04 14:02:09 INFO: adding sample GM12878_b from file ENCFF450VAU_aligned_mm2_chr8.bam
100%|████████████████████████████████████████████████████████████████████████████████████| 68.4k/68.4k [00:11<00:00, 5.74kreads/s, chr=KI270757.1]
2022-11-04 14:02:21 INFO: skipped 71 reads aligned fraction of less than 0.75.
2022-11-04 14:02:21 INFO: skipped 12700 secondary alignments (0x100), alignment that failed quality check (0x200) or PCR duplicates (0x400)
2022-11-04 14:02:21 WARNING: ignored 484 chimeric alignments with only one part aligned to specified chromosomes.
2022-11-04 14:02:21 INFO: ignoring 1273 chimeric alignments with less than 2 reads
2022-11-04 14:02:21 INFO: imported 54853 nonchimeric reads (including  12 chained chimeric alignments) and 7 chimeric reads with coverage of at least 2.
2022-11-04 14:02:21 INFO: adding sample GM12878_c from file ENCFF694DIE_aligned_mm2_chr8.bam
100%|████████████████████████████████████████████████████████████████████████████████████| 90.7k/90.7k [00:14<00:00, 6.30kreads/s, chr=KI270757.1]
2022-11-04 14:02:36 INFO: skipped 85 reads aligned fraction of less than 0.75.
2022-11-04 14:02:36 INFO: skipped 17261 secondary alignments (0x100), alignment that failed quality check (0x200) or PCR duplicates (0x400)
2022-11-04 14:02:36 WARNING: ignored 455 chimeric alignments with only one part aligned to specified chromosomes.
2022-11-04 14:02:36 INFO: ignoring 1410 chimeric alignments with less than 2 reads
2022-11-04 14:02:36 INFO: imported 72451 nonchimeric reads (including  38 chained chimeric alignments) and 12 chimeric reads with coverage of at least 2.
2022-11-04 14:02:36 INFO: adding sample K562_a from file ENCFF429VVB_aligned_mm2_chr8.bam
100%|████████████████████████████████████████████████████████████████████████████████████| 107k/107k [00:19<00:00, 5.35kreads/s, chr=KI270757.1]
2022-11-04 14:02:56 INFO: skipped 297 reads aligned fraction of less than 0.75.
2022-11-04 14:02:56 INFO: skipped 23990 secondary alignments (0x100), alignment that failed quality check (0x200) or PCR duplicates (0x400)
2022-11-04 14:02:56 WARNING: ignored 2160 chimeric alignments with only one part aligned to specified chromosomes.
2022-11-04 14:02:56 INFO: ignoring 7445 chimeric alignments with less than 2 reads
2022-11-04 14:02:56 INFO: imported 76692 nonchimeric reads (including  57 chained chimeric alignments) and 415 chimeric reads with coverage of at least 2.
2022-11-04 14:02:56 INFO: adding sample K562_b from file ENCFF696GDL_aligned_mm2_chr8.bam
100%|████████████████████████████████████████████████████████████████████████████████████| 78.0k/78.0k [00:15<00:00, 5.03kreads/s, chr=KI270757.1]
2022-11-04 14:03:12 INFO: skipped 165 reads aligned fraction of less than 0.75.
2022-11-04 14:03:12 INFO: skipped 15026 secondary alignments (0x100), alignment that failed quality check (0x200) or PCR duplicates (0x400)
2022-11-04 14:03:12 WARNING: ignored 1142 chimeric alignments with only one part aligned to specified chromosomes.
2022-11-04 14:03:12 INFO: ignoring 4530 chimeric alignments with less than 2 reads
2022-11-04 14:03:12 INFO: imported 59118 nonchimeric reads (including  43 chained chimeric alignments) and 284 chimeric reads with coverage of at least 2.
2022-11-04 14:03:12 INFO: adding sample K562_c from file ENCFF634YSN_aligned_mm2_chr8.bam
100%|████████████████████████████████████████████████████████████████████████████████████| 117k/117k [00:19<00:00, 5.95kreads/s, chr=KI270757.1]
2022-11-04 14:03:32 INFO: skipped 294 reads aligned fraction of less than 0.75.
2022-11-04 14:03:32 INFO: skipped 30231 secondary alignments (0x100), alignment that failed quality check (0x200) or PCR duplicates (0x400)
2022-11-04 14:03:32 WARNING: ignored 2528 chimeric alignments with only one part aligned to specified chromosomes.
2022-11-04 14:03:32 INFO: ignoring 8019 chimeric alignments with less than 2 reads
2022-11-04 14:03:32 INFO: imported 80343 nonchimeric reads (including  46 chained chimeric alignments) and 371 chimeric reads with coverage of at least 2.
100%|████████████████████████████████████████████████████████████████████████████████████| 10803/10803 [01:17<00:00, 140.02genes/s]
2022-11-04 14:04:49 INFO: adding new filter rule COVERED in transcript context
2022-11-04 14:04:49 INFO: adding new filter rule HIGH_COVERED in transcript context
2022-11-04 14:04:49 INFO: writing transcript table to ./PacBio_isotools_substantial_transcripts.csv
100%|████████████████████████████████████████████████████████████████████████████████████| 10803/10803 [00:02<00:00, 3860.22genes/s]
2022-11-04 14:04:52 INFO: writing gtf file to ./PacBio_isotools_substantial_transcripts.gtf
100%|████████████████████████████████████████████████████████████████████████████████████| 10803/10803 [00:02<00:00, 4371.45genes/s]
2022-11-04 14:04:55 INFO: saving transcripts as pickle file
2022-11-04 14:04:55 INFO: saving transcriptome to ./PacBio_isotools_substantial_isotools.pkl

Running the command for the next time will automatically load the stored ‘.pkl’ file. Here, we use the command line interface to perform differential splicing analysis. The resulting table with test statistics for all alternative splicing events is written to the file PacBio_isotools_substantial_diff_K562_GM12878.csv. In addidtion, the command creates sashimi coverage plots for the top 5 differentially spliced genes

run_isotools \
    --log INFO \
    --progress_bar \
    --file_prefix ./PacBio_isotools_substantial \
    --group_by group \
    --diff_plots 5 \
    --diff K562/GM12878
2022-12-12 15:19:04 INFO: This is isotools version 0.3.2rc2
2022-12-12 15:19:04 INFO: loading transcriptome from ./PacBio_isotools_substantial_isotools.pkl
2022-12-12 15:19:07 INFO: testing differential splicing for K562 (3) vs GM12878 (3) using betabinom_lr test
100%|████████████████████████████████████████████████████████████████████████████████████| 10803/10803 [00:22<00:00, 470.97genes/s]
2022-12-12 15:19:30 INFO: 179 differential splice sites in 112 genes for K562 vs GM12878
2022-12-12 15:19:30 INFO: sashimi plot for differentially spliced gene RIPK2
2022-12-12 15:19:31 INFO: sashimi plot for differentially spliced gene PVT1
2022-12-12 15:19:36 INFO: sashimi plot for differentially spliced gene TUSC3
2022-12-12 15:19:38 INFO: sashimi plot for differentially spliced gene NCALD
2022-12-12 15:19:41 INFO: sashimi plot for differentially spliced gene RBIS
2022-12-12 15:19:43 INFO: saving transcripts as pickle file
2022-12-12 15:19:43 INFO: saving transcriptome to ./PacBio_isotools_substantial_isotools.pkl
[ ]: