{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Command Line Interface (CLI)\n",
    "IsoTools is a python library for the analysis of long read data. \n",
    "It starts from aligned reads (e.g. bamfiles) and features transcriptome reconstruction, quantification, visualization, explorative analysis and statistical tests.\n",
    "IsoTools is designed to be run interactivly (e.g. in a python notebook, or in the repl), as it features rich functionality to explore the data. \n",
    "However, it can also be run as a command line tool, with predefined parameters. This is convenient to process different experiments in a standardized pipeline.\n",
    "All command line parameters are descried in the [CLI section](../isotoolsCLI.html):\n",
    "\n",
    "\n",
    "To run the commands, need the following files from the [prepared demonstration dataset](https://oc-molgen.gnz.mpg.de/owncloud/s/gjG9EPiQwpRAyg3):\n",
    "\n",
    "* sample description file 'encode_samples.tsv'\n",
    "* six .bam alignment files for the six samples\n",
    "* six corresponding .bam.bai indices\n",
    "* referernce annotation file gencode.v42.chr_patch_hapl_scaff.annotation_sorted_chr8.gff3.gz\n",
    "* corresponding .gff3.gz.tbi index file\n",
    "* genomic reference file GRCh38.p13.genome_chr8.fa\n",
    "* cooresponding .fai index file\n",
    "\n",
    "The sample description file is a tab-separated text file with at least the following columns:\n",
    "\n",
    "* **sample_name**: A unique sample label\n",
    "* **file_name**: The path to the alignment .bam file\n",
    "* **group**: a group assignment of the sample\n",
    "\n",
    "\n",
    "The following command runs isotools and performs the following analysis steps:\n",
    "\n",
    "* transcriptome reconstruction from the alignments\n",
    "* filtering transcripts according to filter query (--filter_query)\n",
    "    * see [filter tutorial](06_filtering.html#Filtering-tags-and-queries) for a description of tags and queries.\n",
    "* exporting transcripts as gtf (--gtf_out)\n",
    "* exporting a transcript table, which includes the number of reads per transcripts (--transcript_table)\n",
    "* finally, by default, a pkl file is produced, which contains the transcriptome object, reducing runtime of the next call of the CLI.\n",
    "    * The resulting transcriptome can also be explored using the API.\n",
    "    * creation of the pkl file can be skipped with --no-pkl\n",
    "    * import of the previous pkl file can be bypassed with the --force-recreate command\n",
    "    * if a pkl file is present, and a sample table containing novel samples is provided, only the novel samples are added\n",
    "* the two parameters \"--log\" INFO and \"--progress_bar\" enable readout of the progress. \n",
    "\n",
    "``` bash\n",
    "cd demonstration_data \n",
    "samples='encode_samples.tsv'\n",
    "anno='gencode.v42.chr_patch_hapl_scaff.annotation_sorted_chr8.gff3.gz'\n",
    "genome='GRCh38.p13.genome_chr8.fa'\n",
    "\n",
    "run_isotools \\\n",
    "    --anno $anno \\\n",
    "    --log INFO \\\n",
    "    --progress_bar \\\n",
    "    --genome $genome \\\n",
    "    --samples $samples \\\n",
    "    --file_prefix ./PacBio_isotools_substantial \\\n",
    "    --custom_filter_tag \"COVERED=any(g.coverage[:,trid] > 2)\"  \"HIGH_COVERED=any(g.coverage[:,trid] > 5)\" \\\n",
    "    --filter_query \"(COVERED and FSM) or (HIGH_COVERED and SUBSTANTIAL and not INTERNAL_PRIMING)\" \\\n",
    "    --gtf_out --transcript_table\n",
    "    \n",
    "```\n",
    "\n",
    "    \n",
    "```\n",
    "2022-11-04 14:01:56 INFO: This is isotools version 0.3.2rc2\n",
    "2022-11-04 14:01:56 INFO: loading transcriptome from ./PacBio_isotools_substantial_isotools.pkl\n",
    "2022-11-04 14:01:56 INFO: importing reference from gff3 file gencode.v42.chr_patch_hapl_scaff.annotation_sorted_chr8.gff3.gz\n",
    "100%|████████████████████████████████████████████████████████████████████████████████████| 2.82M/2.82M [00:02<00:00, 1.37MB/s]\n",
    "2022-11-04 14:01:58 INFO: skipped the following categories: {'CDS', 'five_prime_UTR', 'three_prime_UTR'}\n",
    "2022-11-04 14:01:58 WARNING: Missing genes! Found gene information in categories ['gene'] for 2540/5080 genes\n",
    "2022-11-04 14:01:58 INFO: collapsed 0 immunoglobulin loci and 0 T-cell receptor loci\n",
    "2022-11-04 14:01:58 INFO: adding sample GM12878_a from file ENCFF417VHJ_aligned_mm2_chr8.bam\n",
    "100%|████████████████████████████████████████████████████████████████████████████████████| 53.0k/53.0k [00:11<00:00, 4.58kreads/s, chr=KI270757.1]\n",
    "2022-11-04 14:02:09 INFO: skipped 110 reads aligned fraction of less than 0.75.\n",
    "2022-11-04 14:02:09 INFO: skipped 10972 secondary alignments (0x100), alignment that failed quality check (0x200) or PCR duplicates (0x400)\n",
    "2022-11-04 14:02:09 WARNING: ignored 533 chimeric alignments with only one part aligned to specified chromosomes.\n",
    "2022-11-04 14:02:09 INFO: ignoring 2231 chimeric alignments with less than 2 reads\n",
    "2022-11-04 14:02:09 INFO: imported 40182 nonchimeric reads (including  14 chained chimeric alignments) and 73 chimeric reads with coverage of at least 2.\n",
    "2022-11-04 14:02:09 INFO: adding sample GM12878_b from file ENCFF450VAU_aligned_mm2_chr8.bam\n",
    "100%|████████████████████████████████████████████████████████████████████████████████████| 68.4k/68.4k [00:11<00:00, 5.74kreads/s, chr=KI270757.1]\n",
    "2022-11-04 14:02:21 INFO: skipped 71 reads aligned fraction of less than 0.75.\n",
    "2022-11-04 14:02:21 INFO: skipped 12700 secondary alignments (0x100), alignment that failed quality check (0x200) or PCR duplicates (0x400)\n",
    "2022-11-04 14:02:21 WARNING: ignored 484 chimeric alignments with only one part aligned to specified chromosomes.\n",
    "2022-11-04 14:02:21 INFO: ignoring 1273 chimeric alignments with less than 2 reads\n",
    "2022-11-04 14:02:21 INFO: imported 54853 nonchimeric reads (including  12 chained chimeric alignments) and 7 chimeric reads with coverage of at least 2.\n",
    "2022-11-04 14:02:21 INFO: adding sample GM12878_c from file ENCFF694DIE_aligned_mm2_chr8.bam\n",
    "100%|████████████████████████████████████████████████████████████████████████████████████| 90.7k/90.7k [00:14<00:00, 6.30kreads/s, chr=KI270757.1]\n",
    "2022-11-04 14:02:36 INFO: skipped 85 reads aligned fraction of less than 0.75.\n",
    "2022-11-04 14:02:36 INFO: skipped 17261 secondary alignments (0x100), alignment that failed quality check (0x200) or PCR duplicates (0x400)\n",
    "2022-11-04 14:02:36 WARNING: ignored 455 chimeric alignments with only one part aligned to specified chromosomes.\n",
    "2022-11-04 14:02:36 INFO: ignoring 1410 chimeric alignments with less than 2 reads\n",
    "2022-11-04 14:02:36 INFO: imported 72451 nonchimeric reads (including  38 chained chimeric alignments) and 12 chimeric reads with coverage of at least 2.\n",
    "2022-11-04 14:02:36 INFO: adding sample K562_a from file ENCFF429VVB_aligned_mm2_chr8.bam\n",
    "100%|████████████████████████████████████████████████████████████████████████████████████| 107k/107k [00:19<00:00, 5.35kreads/s, chr=KI270757.1]\n",
    "2022-11-04 14:02:56 INFO: skipped 297 reads aligned fraction of less than 0.75.\n",
    "2022-11-04 14:02:56 INFO: skipped 23990 secondary alignments (0x100), alignment that failed quality check (0x200) or PCR duplicates (0x400)\n",
    "2022-11-04 14:02:56 WARNING: ignored 2160 chimeric alignments with only one part aligned to specified chromosomes.\n",
    "2022-11-04 14:02:56 INFO: ignoring 7445 chimeric alignments with less than 2 reads\n",
    "2022-11-04 14:02:56 INFO: imported 76692 nonchimeric reads (including  57 chained chimeric alignments) and 415 chimeric reads with coverage of at least 2.\n",
    "2022-11-04 14:02:56 INFO: adding sample K562_b from file ENCFF696GDL_aligned_mm2_chr8.bam\n",
    "100%|████████████████████████████████████████████████████████████████████████████████████| 78.0k/78.0k [00:15<00:00, 5.03kreads/s, chr=KI270757.1]\n",
    "2022-11-04 14:03:12 INFO: skipped 165 reads aligned fraction of less than 0.75.\n",
    "2022-11-04 14:03:12 INFO: skipped 15026 secondary alignments (0x100), alignment that failed quality check (0x200) or PCR duplicates (0x400)\n",
    "2022-11-04 14:03:12 WARNING: ignored 1142 chimeric alignments with only one part aligned to specified chromosomes.\n",
    "2022-11-04 14:03:12 INFO: ignoring 4530 chimeric alignments with less than 2 reads\n",
    "2022-11-04 14:03:12 INFO: imported 59118 nonchimeric reads (including  43 chained chimeric alignments) and 284 chimeric reads with coverage of at least 2.\n",
    "2022-11-04 14:03:12 INFO: adding sample K562_c from file ENCFF634YSN_aligned_mm2_chr8.bam\n",
    "100%|████████████████████████████████████████████████████████████████████████████████████| 117k/117k [00:19<00:00, 5.95kreads/s, chr=KI270757.1]\n",
    "2022-11-04 14:03:32 INFO: skipped 294 reads aligned fraction of less than 0.75.\n",
    "2022-11-04 14:03:32 INFO: skipped 30231 secondary alignments (0x100), alignment that failed quality check (0x200) or PCR duplicates (0x400)\n",
    "2022-11-04 14:03:32 WARNING: ignored 2528 chimeric alignments with only one part aligned to specified chromosomes.\n",
    "2022-11-04 14:03:32 INFO: ignoring 8019 chimeric alignments with less than 2 reads\n",
    "2022-11-04 14:03:32 INFO: imported 80343 nonchimeric reads (including  46 chained chimeric alignments) and 371 chimeric reads with coverage of at least 2.\n",
    "100%|████████████████████████████████████████████████████████████████████████████████████| 10803/10803 [01:17<00:00, 140.02genes/s]\n",
    "2022-11-04 14:04:49 INFO: adding new filter rule COVERED in transcript context\n",
    "2022-11-04 14:04:49 INFO: adding new filter rule HIGH_COVERED in transcript context\n",
    "2022-11-04 14:04:49 INFO: writing transcript table to ./PacBio_isotools_substantial_transcripts.csv\n",
    "100%|████████████████████████████████████████████████████████████████████████████████████| 10803/10803 [00:02<00:00, 3860.22genes/s]\n",
    "2022-11-04 14:04:52 INFO: writing gtf file to ./PacBio_isotools_substantial_transcripts.gtf\n",
    "100%|████████████████████████████████████████████████████████████████████████████████████| 10803/10803 [00:02<00:00, 4371.45genes/s]\n",
    "2022-11-04 14:04:55 INFO: saving transcripts as pickle file\n",
    "2022-11-04 14:04:55 INFO: saving transcriptome to ./PacBio_isotools_substantial_isotools.pkl\n",
    "    \n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Running the command for the next time will automatically load the stored '.pkl' file. \n",
    "Here, we use the command line interface to perform differential splicing analysis.\n",
    "The resulting table with test statistics for all alternative splicing events is written to the file \n",
    "PacBio_isotools_substantial_diff_K562_GM12878.csv.\n",
    "In addidtion, the command creates sashimi coverage plots for the top 5 differentially spliced genes\n",
    "\n",
    "``` bash\n",
    "\n",
    "run_isotools \\\n",
    "    --log INFO \\\n",
    "    --progress_bar \\\n",
    "    --file_prefix ./PacBio_isotools_substantial \\\n",
    "    --group_by group \\\n",
    "    --diff_plots 5 \\\n",
    "    --diff K562/GM12878\n",
    "    \n",
    "```\n",
    "\n",
    "```\n",
    "\n",
    "2022-12-12 15:19:04 INFO: This is isotools version 0.3.2rc2\n",
    "2022-12-12 15:19:04 INFO: loading transcriptome from ./PacBio_isotools_substantial_isotools.pkl\n",
    "2022-12-12 15:19:07 INFO: testing differential splicing for K562 (3) vs GM12878 (3) using betabinom_lr test\n",
    "100%|████████████████████████████████████████████████████████████████████████████████████| 10803/10803 [00:22<00:00, 470.97genes/s]\n",
    "2022-12-12 15:19:30 INFO: 179 differential splice sites in 112 genes for K562 vs GM12878\n",
    "2022-12-12 15:19:30 INFO: sashimi plot for differentially spliced gene RIPK2\n",
    "2022-12-12 15:19:31 INFO: sashimi plot for differentially spliced gene PVT1\n",
    "2022-12-12 15:19:36 INFO: sashimi plot for differentially spliced gene TUSC3\n",
    "2022-12-12 15:19:38 INFO: sashimi plot for differentially spliced gene NCALD\n",
    "2022-12-12 15:19:41 INFO: sashimi plot for differentially spliced gene RBIS\n",
    "2022-12-12 15:19:43 INFO: saving transcripts as pickle file\n",
    "2022-12-12 15:19:43 INFO: saving transcriptome to ./PacBio_isotools_substantial_isotools.pkl\n",
    "\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "python3.9",
   "language": "python",
   "name": "python3.9"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}