BroCOLI Tutorials¶
Welcome to the BroCOLI Tutorials. In this section, we provide detailed examples and guides to help you understand and use BroCOLI effectively.
%%{init: {'themeVariables': { 'fontSize': '20px' }}}%%
graph LR
classDef input fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,color:black;
classDef process fill:#fff9c4,stroke:#fbc02d,stroke-width:2px,color:black;
classDef result fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px,color:black;
A[raw fastq.gz]:::input
C[new fastq]:::process
B{sorted.sam}:::process
E[gene & transcript counts]:::result
A --> |"Bulk: minimap2"| B
A --> |"SC: pre-BroCOLI"| C
C --> |"SC: minimap2"| B
B --> |"BroCOLI_bulk"| E
B --> |"BroCOLI_sc"| E
linkStyle 0,3 stroke:#ff9800,stroke-width:4px;
linkStyle 1,2,4 stroke:#9c27b0,stroke-width:4px;
BroCOLI input files¶
To run BroCOLI, you should provide:
- Sorted SAM files. Reads in FASTQ (or FASTQ.gz) format must first be aligned with minimap2 and the resulting alignments sorted.
- Reference sequence in FASTA format.
- Optionally, a reference gene annotation in GTF format (recommended).
BroCOLI General Usage¶
Bulk¶
- Step1 Mapping of the fastq files with minimap2
minimap2 -ax splice -ub --secondary=no -t 20 ref.fasta raw.fastq.gz > raw.sam
samtools sort -@ 20 -o raw_sorted.sam raw.sam
Note
The SAM files produced by mapping must be sorted with samtools before running BroCOLI.
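As a quick sanity check before running BroCOLI, the header of a SAM file can be inspected for its declared sort order. This is a minimal sketch, assuming the file was written by `samtools sort`, which records `SO:coordinate` on the `@HD` line; the helper name `is_coordinate_sorted` is hypothetical and not part of BroCOLI.

```shell
# Succeed if the SAM header declares coordinate sorting.
# Assumption: the file came from `samtools sort`, which writes SO:coordinate.
is_coordinate_sorted() {
  # When present, @HD is always the first line, so one line is enough.
  head -n 1 "$1" | grep -q '^@HD.*SO:coordinate'
}

# Usage (hypothetical file name):
#   is_coordinate_sorted raw_sorted.sam && echo "sorted" || echo "not sorted"
```

The check only reads the header, so it does not prove that the records are actually in order.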
Recommended parameters for noisy cDNA data
For noisy 1D cDNA Nanopore data, the developer of minimap2 suggests adding -k 14 and -w 4.
A BED file of splice junctions can also be provided to assist the mapping (minimap2 option --junc-bed).
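The two suggestions above can be combined with the bulk mapping command from Step1. The sketch below is illustrative only: the file names are placeholders, the `run` and `map_noisy` helpers are hypothetical, and `-o` is used in place of shell redirection so that the command can be printed before it is executed.

```shell
# Print the command (DRY_RUN=1, the default here) or execute it (DRY_RUN=0).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# map_noisy REF READS JUNC_BED OUT_SAM
#   -k 14 -w 4   settings suggested for noisy 1D cDNA reads
#   --junc-bed   BED file of known junctions used to guide spliced alignment
map_noisy() {
  run minimap2 -ax splice -ub -k 14 -w 4 --junc-bed "$3" \
    --secondary=no -t 20 -o "$4" "$1" "$2"
}

# Placeholder file names; replace with your own.
map_noisy ref.fasta raw.fastq.gz junctions.bed raw.sam
```

Set `DRY_RUN=0` to run the mapping, then sort the output with `samtools sort` as in Step1.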
- Step2 Transcript identification and quantification
- For a single SAM file, use the -s parameter to specify its absolute path (i).
- For multiple files, set the -s parameter to the directory containing the sorted SAM files (ii). Alternatively, you can provide a TXT/TSV file listing the absolute path to each input SAM file on a separate line. The output order will correspond to the order listed in the file (iii).
(i)                   (ii)                        (iii)
input_reads.sam  ───  input_directory        ───  input.txt(.tsv)
                      ├── sample0.sam             ├── sample0.sam
                      └── sample1.sam             ├── sample1.sam
                                                  └── sample2.sam
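For input mode (iii), the list file can be generated rather than written by hand. The following is a minimal sketch, assuming all sorted SAM files sit in one directory and end in `.sam`; the helper name `make_sam_list` and the directory name in the usage comment are hypothetical.

```shell
# make_sam_list DIR OUT_TXT
# Write the absolute path of every *.sam file in DIR to OUT_TXT, one per line.
# Glob expansion is alphabetical, which fixes the sample order in the output.
make_sam_list() {
  abs_dir=$(cd "$1" && pwd) || return 1
  for f in "$abs_dir"/*.sam; do
    # Skip the unexpanded pattern when the directory holds no SAM files.
    if [ -e "$f" ]; then
      printf '%s\n' "$f"
    fi
  done > "$2"
}

# Usage (hypothetical names):
#   make_sam_list sorted_sams input.txt
#   then pass the absolute path of input.txt to -s
```

Because the output order follows the order of lines in the file, edit `input.txt` afterwards if a different sample order is wanted.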
Single cell and spatial¶
- Step1 Processing fastq files with BroCOLI
- -q specifies the data type, for example visium, 10x3v3, or magicseq.
- -p represents the number of threads.
- -w is the whitelist of cell barcodes. (i) Provide a TSV file containing only barcodes, such as the filtered whitelist generated by CellRanger. Alternatively, (ii) use the provided ext_bc_and_umi.py to obtain a whitelist that includes both barcodes and UMIs. In this mode, UMI correction will be performed automatically.
ext_bc_and_umi.py
You can also use Flexiplex for preprocessing
You can visit its GitHub page to learn more about its detailed usage.
First, assign reads (short reads or single-cell long reads) to cellular barcodes.
Second, mapping.
You can also use Sicelore-2.1 for preprocessing
You can visit its GitHub page to learn more about its detailed usage. Before you run Sicelore, you need to set up the Java environment it requires.
First, scan the Nanopore reads and assign cell barcodes.
java -jar -Xmx80g <path>/NanoporeBC_UMI_finder-2.1.jar scanfastq -d <directory to start recursive search for fastq files> -o outPutDirectory --bcEditDistance 1 --cellRangerBCs cellRangerbarcodes.tsv
Second, mapping.
minimap2 -ax splice -ub -k14 -w 4 --junc-bed junctions.bed --sam-hit-only --secondary=no -t 20 ref.fasta <fastq.gz path> > raw.sam
samtools view -bS -@ 20 raw.sam > raw.bam
samtools sort -@ 20 -o raw_sorted.bam raw.bam
samtools index -@ 20 raw_sorted.bam raw_sorted_index
Third, UMI assignment.
The BAM file produced by the cell barcode and UMI assignment must then be converted to a SAM file for use as BroCOLI input.
- Step2 Mapping of the fastq files with minimap2
The processing is identical to Step 1 in the bulk workflow.
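For the Sicelore route in Step1, the final BAM-to-SAM conversion is not spelled out. One possible way to do it is with `samtools view -h`, which keeps the header lines in the output. This is a sketch only: the file names are placeholders, and the `run` and `bam_to_sam` helpers are hypothetical.

```shell
# Print the command (DRY_RUN=1, the default here) or execute it (DRY_RUN=0).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# bam_to_sam IN_BAM OUT_SAM
#   -h keeps the header lines (@HD, @SQ, ...) in the SAM output.
bam_to_sam() {
  run samtools view -h -o "$2" "$1"
}

# Placeholder file names; replace with the BAM produced by UMI assignment.
bam_to_sam umi_assigned.bam umi_assigned.sam
```

If the BAM is not already sorted, sort it with `samtools sort` first, as in the bulk workflow.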
- Step3 Transcript identification and quantification
The input is specified in a similar way to the bulk workflow.
(i)                   (ii)                        (iii)
input_reads.sam  ───  input_directory        ───  input.txt(.tsv)
                      ├── sample0.sam             ├── sample0.sam
                      └── sample1.sam             ├── sample1.sam
                                                  └── sample2.sam
Examples¶
Simple test¶
- Bulk: SIRV4 dataset
./BroCOLI_bulk -t 1 -s example/example_SIRV.sam -g example/example_SIRV.gtf -f example/example_SIRV.fasta -o TestResult
- Single cell
All Arguments¶
Advanced testing of BroCOLI can be performed using the following parameters:
Arguments:
-s, --sam
SAM file path. We recommend using absolute paths. For a single file, provide its absolute path directly. For multiple files, provide either the path to a folder containing all the sorted SAM files to process, or a TXT/TSV file listing the absolute path of each SAM file on a separate line. (required)
-f, --fasta
FASTA file path. FASTA file requires the chromosome names to match the GTF file. (required)
-o, --output
Output folder path. (required)
-g, --gtf
Input annotation file in GTF format. (optional, recommended)
-n, --support
Minimum number of reads that must perfectly support all splice junctions of a novel isoform. (optional, default:2)
-j, --SJDistance
Minimum distance for a gap to be treated as an intron. (optional, default:18)
-e, --single_exon_boundary
Distance within which a read is considered to belong to a single-exon isoform. (optional, default:60)
-d, --graph_distance
Distance threshold for constructing the isoform candidate distance graph. (optional, default:60)
-t, --thread
Number of threads. (optional, default:8)
-h, --help
show this help information.
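To show how the arguments above fit together, here is a sketch of a bulk run over a directory of sorted SAM files, with every optional parameter written out. The numeric values shown are the documented defaults except for -t; the paths are placeholders, and the `run` and `run_brocoli_bulk` helpers are hypothetical.

```shell
# Print the command (DRY_RUN=1, the default here) or execute it (DRY_RUN=0).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# run_brocoli_bulk SAM_INPUT FASTA GTF OUT_DIR
#   -n 2   reads supporting all junctions of a novel isoform
#   -j 18  SJDistance
#   -e 60  single_exon_boundary
#   -d 60  graph_distance
#   -t 8   threads
run_brocoli_bulk() {
  run ./BroCOLI_bulk -s "$1" -f "$2" -g "$3" -o "$4" \
    -n 2 -j 18 -e 60 -d 60 -t 8
}

# Placeholder absolute paths; replace with your own.
run_brocoli_bulk /data/sorted_sams /data/ref.fasta /data/annotation.gtf /data/BroCOLI_out
```

Printing the command first makes it easy to confirm the paths before committing to a long run.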