Introduction to BroCOLI Output Files¶

Bulk output files¶

1. file_explain.txt

This file establishes a mapping between samples and a numerical index, ranging from 0 to (number of samples − 1).

- Column 1 contains the index assigned to each sample by BroCOLI.
- Column 2 contains the absolute path to the corresponding SAM file for that sample.

2. counts_transcript.txt

This file contains the quantitative read counts for all transcripts across all samples.

- Column 1: Ensembl transcript ID.
- Column 2: Ensembl gene ID of the corresponding gene. For novel transcripts with unclear or unmapped gene associations, BroCOLI outputs NA in this column.
- Columns 3 to the end: Read counts of the transcript in each sample (one column per sample; the total number of columns equals the number of samples).

3. counts_gene.txt

This file contains the quantitative read counts for all genes across all samples.

- Column 1: Ensembl gene ID.
- Columns 2 to the end: Read counts of the gene in each sample (one column per sample; the total number of columns equals the number of samples).

4. updated_annotitions.gtf

This is an updated GTF annotation file that incorporates both known (annotated) and novel isoforms for the detected transcripts.

- The source column indicates the origin of each isoform (novel for newly discovered isoforms or annotated for known isoforms).
- Each isoform is described on a single line containing its feature information, followed by one or more subsequent lines detailing its exon coordinates.

5. compatible_isoform.tsv

This file reports the assignment of each read to a specific isoform across all sample files.

- Column 1 (read_id): The read identifier as it appears in the original SAM file.
- Column 2 (category): Classification of the read–isoform match. BroCOLI categorizes reads into four types:
  * FSM: Full splice match (complete match to a known isoform).
  * ISM: Incomplete splice match (partial match to a known isoform).
  * SE: The isoform consists of a single exon.

Column 3 (isoform_id): Ensembl transcript ID of the assigned isoform.
Column 4 (gene_id): Ensembl gene ID associated with the assigned isoform.
Column 5 (file): Numerical index of the sample from which the read originates. The mapping between indices and actual sample files is provided in file_explain.txt.

single cell and spatial output files¶

1. file_explain.txt

This file establishes a mapping between samples and a numerical index, ranging from 0 to (number of samples − 1).

- Column 1 contains the index assigned to each sample by BroCOLI.
- Column 2 contains the absolute path to the corresponding SAM file for that sample.