1. Download the Dockerfile and Build the Image


$ mkdir prapi && cd prapi
$ wget http://www.bioinfor.org/tool/PRAPI/download/v1.6/Dockerfile	
$ docker build -t prapi:v1 ./
$ docker images
			

2. Usage

Please make sure Docker is installed. For DE_APA analysis, the BAM file must be generated from PAS_Seq reads. For Circle analysis, the BAM file must be generated from RNase R-treated reads. Before running the script, run gmap_build to build a genome reference index. Finally, check that the information in each section you want to use is correct.
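
The index-building step can be sketched as follows. This is a minimal example, not the authors' procedure: the helper name gmap_index_cmd and all paths are placeholders, and it only prints the command so it can be checked before running. The -D (destination directory) and -d (genome name) options are standard gmap_build options. Whether gmap_build is available inside the prapi:v1 image, and whether GMAP_IndexesDir should point at the -D directory or at the genome directory inside it, is not stated here.

```shell
# Sketch only: prints a gmap_build command line instead of running it.
# gmap_index_cmd is a hypothetical helper; all paths are placeholders.
gmap_index_cmd() {
    index_dir=$1    # directory that will hold the index (gmap_build -D)
    genome_name=$2  # name of the index (gmap_build -d)
    fasta=$3        # genome FASTA file
    printf 'gmap_build -D %s -d %s %s\n' "$index_dir" "$genome_name" "$fasta"
}

# Example with placeholder paths:
gmap_index_cmd /data/gmap_index my_genome /data/genome.fa
```

Once the printed command looks right, it can be run directly in an environment where gmap_build is installed.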

$ path=`pwd`
$ sudo docker run -it --rm -v ${path}:/data prapi:v1 Pacbio_v16.py -c /data/conf.txt
All parameters are stored in conf.txt.
Basic arguments
Output_dir          Name of the output directory
Pacbio_reads        Full path and name of the PacBio sequence file
GMAP_IndexesDir     Directory of genomic index files built by the gmap_build program
GMAP_Process        Number of worker threads for GMAP
Genome_Annotion     Reference annotation in gpd format
MaxIntron           Maximum length of one internal intron (default 200000)
Multile_processing  Use the parallel version
Other arguments
MinDist             Minimum distance between any two poly(A) sites or transcript start sites
MinSupport          Minimum number of trusted reads supporting a poly(A) site or transcript start site
Width_of_peaks      Peak width used when searching for poly(A) sites
Graph arguments
Group               Groups of different library types, used to set ylim for each facet separately
anchorLength        Minimum anchor length for the TopHat/Bowtie aligner
DElib               Libraries used for differential analysis
P value             P-value threshold (default 0.01)
FDR                 FDR threshold (default 0.01)
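
The docker run command above mounts the current directory at /data and reads /data/conf.txt, so conf.txt has to sit in that directory, and the container can only see files below it. A minimal pre-flight sketch under those assumptions follows. The helper names preflight and to_container_path are hypothetical and not part of PRAPI. That paths in conf.txt should be written in their /data/... form follows from how Docker bind mounts work, not from a statement in this manual.

```shell
# Hypothetical helpers, not part of PRAPI.

# preflight DIR: succeed only if DIR contains conf.txt.
preflight() {
    dir=$1
    if [ ! -f "$dir/conf.txt" ]; then
        echo "conf.txt not found in $dir" >&2
        return 1
    fi
    return 0
}

# to_container_path HOST_ROOT HOST_PATH: translate a host path below the
# mounted directory into the path the container sees under /data.
to_container_path() {
    host_root=$1
    host_path=$2
    case "$host_path" in
        "$host_root"/*)
            printf '/data/%s\n' "${host_path#"$host_root"/}"
            ;;
        *)
            echo "not under $host_root: $host_path" >&2
            return 1
            ;;
    esac
}

# Example with placeholder paths; prints /data/reads.fa
to_container_path /home/user/prapi /home/user/prapi/reads.fa
```

Running preflight "$PWD" before docker run catches a missing conf.txt early, and to_container_path helps when filling in values such as Pacbio_reads or GMAP_IndexesDir.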
			

3. Download and Run the Example Data

$ wget http://www.bioinfor.org/tool/PRAPI/download/v1.6/test_v1.6.tar.gz -O - | tar xvzf - 
$ cd test_v1.6 
$ sh run.sh
			

4. Interpreting PRAPI Output

$ tree test_result
test_v2/*/output
|-- Annotation_miss
|   |-- Miss_annotation.csv
|-- APA
|   |-- apa.txt_gene.csv
|   |-- de_apa.txt.gene.fdr.csv
|-- AS
|   |-- as.A3SS.txt_gene.csv
|   |-- as.A5SS.txt_gene.csv
|   |-- as.RI.txt_gene.csv
|   |-- as.SE.txt_gene.csv
|-- AS_cufflinks
|   |-- as.A3SS.txt_gene.csv
|   |-- as.A5SS.txt_gene.csv
|   |-- as.RI.txt_gene.csv
|   |-- as.SE.txt_gene.csv
|-- JC
|   |-- A3SS.MATS.ReadsOnTargetAndJunctionCount.txt_gene.csv
|   |-- A5SS.MATS.ReadsOnTargetAndJunctionCount.txt_gene.csv
|   |-- RI.MATS.ReadsOnTargetAndJunctionCount.txt_gene.csv
|   |-- SE.MATS.ReadsOnTargetAndJunctionCount.txt_gene.csv
|-- JC_cuff
|   |-- A3SS.MATS.ReadsOnTargetAndJunctionCount.txt_gene.csv
|   |-- A5SS.MATS.ReadsOnTargetAndJunctionCount.txt_gene.csv
|   |-- RI.MATS.ReadsOnTargetAndJunctionCount.txt_gene.csv
|   |-- SE.MATS.ReadsOnTargetAndJunctionCount.txt_gene.csv
|-- AS_IDP
|   |-- as.A3SS.txt_gene.csv
|   |-- as.A5SS.txt_gene.csv
|   |-- as.RI.txt_gene.csv
|   |-- as.SE.txt_gene.csv
|-- JC_IDP
|   |-- A3SS.MATS.ReadsOnTargetAndJunctionCount.txt_gene.csv
|   |-- A5SS.MATS.ReadsOnTargetAndJunctionCount.txt_gene.csv
|   |-- RI.MATS.ReadsOnTargetAndJunctionCount.txt_gene.csv
|   |-- SE.MATS.ReadsOnTargetAndJunctionCount.txt_gene.csv
|-- ATI
|   |-- ati.txt_gene.csv
|-- NAT
|   |-- de_nat.txt_gene.csv
|-- circle
|   |-- Lib*_circ.txt
|   |-- Lib*_VS_Lib*.txt
|   |-- Lib*_degseq_Lib*
|       |-- output_score.txt
|-- Graph
|   |-- Annotation_Gene
|       |-- Gene1.pdf
|       |-- ...
|       |-- GeneN.pdf
|   |-- Novel_Gene
|       |-- Gene1.pdf
|       |-- ...
|       |-- GeneN.pdf
|-- Novel_Gene
|   |-- Novel_Gene.fa

  • Annotation_miss - contains the genes whose annotation was corrected.
  • Miss_annotation.csv format is below.
    Gene_name          Name of the annotation-corrected gene.
    Strand             Strand of the annotation-corrected gene.
    Support_Reads_Num  Number of PacBio reads that support this annotation correction.
    Support_Reads      PacBio reads that support this annotation correction.
  • APA - contains predicted poly(A) sites and differential APA results.
  • apa.txt_gene.csv format is below.
    Gene_name   Name of a gene that has alternative cleavage and polyadenylation (APA) sites.
    Reads       PacBio reads that support this APA site.
    Pos         Genome coordinate of this APA site.
  • de_apa.txt.gene.fdr.csv format is below.
    Gene_name   Name of a gene that has a differential APA site.
    Chromosome  Chromosome of this gene.
    Lib         Library pair between which this gene has a differential APA site.
    Pos         Genome coordinate of the differential APA site.
    Pvalue      P-value of this differential APA site.
    FDR         FDR of this differential APA site.
  • AS - contains one file for each of the four major types of AS.
  • as.*.txt_gene.csv format is below.
    Gene_name                Name of a gene that has AS events.
    GeneID                   Default values.
    geneSymbol               Default values.
    chr                      Chromosome of this gene.
    strand                   Strand of this gene.
    The remaining 6 columns  Upstream and downstream information for this AS event.
  • JC - contains one file for each of the four major types of differential AS in PacBio reads.
  • *.MATS.ReadsOnTargetAndJunctionCount.txt_gene.csv format is below.
    Gene_name                Name of a gene that has AS events.
    GeneID                   Default values.
    geneSymbol               Default values.
    chr                      Chromosome of this gene.
    strand                   Strand of this gene.
    The remaining 6 columns  Upstream and downstream information for this AS event.
    IC_SAMPLE_1              Inclusion counts for SAMPLE_1; replicates are separated by commas.
    SC_SAMPLE_1              Skipping counts for SAMPLE_1; replicates are separated by commas.
    IC_SAMPLE_2              Inclusion counts for SAMPLE_2; replicates are separated by commas.
    SC_SAMPLE_2              Skipping counts for SAMPLE_2; replicates are separated by commas.
    IncFormLen               Length of the inclusion form, used for normalization.
    SkipFormLen              Length of the skipping form, used for normalization.
    IncLevel1                Inclusion level for SAMPLE_1 replicates (comma separated), calculated from normalized counts.
    IncLevel2                Inclusion level for SAMPLE_2 replicates (comma separated), calculated from normalized counts.
    IncLevelDifference       average(IncLevel1) - average(IncLevel2).
    FDR                      FDR of this differential AS event.
    PValue                   P-value of this differential AS event.
  • AS_cuff - contains the four major types of AS files for PacBio reads in the NGS assembly.
  • as.*.txt_gene.csv format is the same as in AS.
  • JC_cuff - contains the four major types of differential AS files in the NGS assembly.
  • *.MATS.ReadsOnTargetAndJunctionCount.txt_gene.csv format is the same as in JC.
  • AS_IDP - contains the four major types of AS files in NGS, with isoforms predicted and identified by IDP.
  • as.*.txt_gene.csv format is the same as in AS.
  • JC_IDP - contains the four major types of differential AS files, with isoforms predicted and identified by IDP.
  • *.MATS.ReadsOnTargetAndJunctionCount.txt_gene.csv format is the same as in JC.
  • ATI - contains files of transcript start sites.
  • ati.txt_gene.csv format is below.
    Gene_name  Name of a gene that has ATI sites.
    Reads      PacBio reads that support this ATI site.
    Pos        Genome coordinate of this ATI site.
  • NAT - contains files of NATs and differential NATs.
  • nat.txt_gene.csv format is below.
    Gene_name    Name of a gene that has NATs.
    Plus_Reads   Plus-strand reads in the NATs.
    Minus_Reads  Minus-strand reads in the NATs.
  • de_nat.txt_gene.csv format is below.
    Gene_name    Name of a gene that has NATs.
    logFC        Log fold change between each pair of NATs.
    logCPM       Log2 counts per million of the NATs.
    PValue       P-value of this differential NAT.
    FDR          FDR of this differential NAT.
  • circle - contains per-library circle files and differential circle files.
  • Lib*_circ.txt format is below.
    start                  Start of junction.
    end                    End of junction.
    name                   Circular RNA/Junction reads.
    score                  Flag to indicate realignment of fusion junctions.
    strand                 + or - for strand.
    thickStart             No meaning.
    thickEnd               No meaning.
    itemRgb                0,0,0.
    exonCount              Number of exons.
    exonSizes              Exon sizes.
    exonOffsets            Exon offsets.
    readNumber             Number of junction reads.
    circType               'circRNA' or 'ciRNA'.
    gene_Name              Name of gene.
    isoformName            Name of isoform.
    exonIndex/intronIndex  Index (starting from 1) of the exon (for circRNA) or intron (for ciRNA) in the given isoform.
    flankIntron            Left intron/Right intron.
  • Lib*_VS_Lib*.txt format is below.
    Circle information    Gene name of this circle|chromosome of this gene:circle start pos-circle end pos.
    Count number in lib1  Count of this circle in lib1.
    Count number in lib2  Count of this circle in lib2.
  • output_score.txt format is below.
    GeneNames                       Gene name of this circle|chromosome of this gene:circle start pos-circle end pos.
    value1                          Count of this circle in lib1.
    value2                          Count of this circle in lib2.
    log2(Fold_change)               log2 fold change of the counts in lib1 and lib2.
    log2(Fold_change) normalized    Normalized log2 fold change of the counts in lib1 and lib2.
    z-score                         z-score of this differential circle.
    p-value                         p-value of this differential circle.
    q-value(Benjamini et al. 1995)  q-value (Benjamini et al. 1995) of this differential circle.
    q-value(Storey et al. 2003)     q-value (Storey et al. 2003) of this differential circle.
    Signature(p-value < 0.01)       Signature.
  • Graph - contains PDF files for annotated genes and novel genes.
  • Novel_Gene - contains the novel gene file.
  • Novel_Gene.fa format is below.
    Chrom      Chromosome of this novel gene.
    Start      Start site of this novel gene.
    End        End site of this novel gene.
    Reads_Num  Number of PacBio reads that support this novel gene.
    Reads_Id   IDs of the PacBio reads that support this novel gene.
    Seq        Predicted sequence of this novel gene.
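
Several of the tables above carry an FDR column (for example de_apa.txt.gene.fdr.csv and de_nat.txt_gene.csv), so a common first step is to keep only the rows below a cutoff. The sketch below assumes the files are comma-separated with a header row whose column names match the ones listed above; that layout is inferred from the .csv extension and should be checked against real output. The function name filter_by_fdr is hypothetical.

```shell
# filter_by_fdr FILE [THRESHOLD]
# Hypothetical helper: prints the header and every row whose FDR value is
# numeric and below THRESHOLD (default 0.01). Assumes a comma-separated
# file with a header row that contains a column named exactly "FDR".
filter_by_fdr() {
    awk -F',' -v max="${2:-0.01}" '
        NR == 1 {
            # Locate the FDR column by name rather than by position.
            for (i = 1; i <= NF; i++) if ($i == "FDR") col = i
            if (!col) exit 1
            print
            next
        }
        # Skip non-numeric values such as NA, then compare numerically.
        $col ~ /^[0-9.eE+-]+$/ && ($col + 0) < (max + 0)
    ' "$1"
}
```

For example, filter_by_fdr output/APA/de_apa.txt.gene.fdr.csv 0.05 would list the differential APA sites with FDR below 0.05; the path here follows the directory tree shown earlier and may differ in a real run.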