Description
Hi, thank you for developing this tool.
I would like to ask about the interpretation of the read assignments file.
I'm trying IsoQuant version 3.6.1 on single-cell data with the --no_model_construction option.
The gene-expression output file (OUT.gene_grouped_counts_linear.tsv) contained several cells with low read counts. For example, taking the cell with barcode CAGCAGCGTTGGGACA:
grep -w CAGCAGCGTTGGGACA OUT.transcript_grouped_counts_linear.tsv
reveals only one line with a count of 1.00:
ENST00000549920.6 CAGCAGCGTTGGGACA 1.00
Meanwhile, when I looked into the read assignment file (OUT.read_assignments.tsv.gz), this cell had a total of 696 assignments, including 251 unique ones:
zgrep -w "CB=CAGCAGCGTTGGGACA" OUT.read_assignments.tsv.gz | cut -f6 | sort | uniq -c
54 ambiguous
52 inconsistent
94 inconsistent_ambiguous
177 inconsistent_non_intronic
13 noninformative
251 unique
55 unique_minor_difference
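For anyone who wants to cross-check this tally without shell tools, here is a minimal Python sketch. It assumes the column layout from the file's header line (tab-separated, with assignment_type in the sixth column and the barcode stored as a `CB=` tag in the last, `; `-separated additional_info column). The function names `tally_assignment_types` and `tally_file` are my own and are not part of IsoQuant.

```python
import gzip
from collections import Counter


def tally_assignment_types(lines, barcode):
    """Count assignment_type values for reads carrying the given CB= barcode tag.

    Assumes tab-separated lines with at least 9 columns, where column 6 is
    assignment_type and column 9 is a ';'-separated list of key=value tags.
    """
    wanted_tag = f"CB={barcode}"
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # skip the header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip lines that do not match the assumed layout
        tags = {tag.strip() for tag in fields[8].split(";")}
        if wanted_tag in tags:
            counts[fields[5]] += 1
    return counts


def tally_file(path, barcode):
    """Run the tally over a gzip-compressed read assignments file."""
    with gzip.open(path, "rt") as handle:
        return tally_assignment_types(handle, barcode)
```

Calling `tally_file("OUT.read_assignments.tsv.gz", "CAGCAGCGTTGGGACA")` returns a `Counter` keyed by assignment type, which can be compared against the `uniq -c` output above.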
Representative lines:
#read_id chr strand isoform_id gene_id assignment_type assignment_events exons additional_info
molecule/2117770 chr1 + ENST00000374550.8 ENSG00000142676.14 unique fsm,tes_match:8,correct_polya_site_right:23696423 23691806-23691829,23692609-23692759,23693807-23693913,23694660-23694791,23695798-23695908,23696344-23696425 gene_assignment=unique; PolyA=True; Canonical=True; Classification=full_splice_match; CB=CAGCAGCGTTGGGACA;
molecule/2117771 chr1 + ENST00000374550.8 ENSG00000142676.14 unique fsm,tes_match:8,correct_polya_site_right:23696423 23691806-23691829,23692609-23692759,23693807-23693913,23694660-23694791,23695798-23695908,23696344-23696425 gene_assignment=unique; PolyA=True; Canonical=True; Classification=full_splice_match; CB=CAGCAGCGTTGGGACA;
molecule/2117773 chr1 + ENST00000374550.8 ENSG00000142676.14 unique fsm,tes_match_precise:1,correct_polya_site_right:23696416 23691806-23691829,23692609-23692759,23693807-23693913,23694660-23694791,23695798-23695908,23696344-23696418 gene_assignment=unique; PolyA=True; Canonical=True; Classification=full_splice_match; CB=CAGCAGCGTTGGGACA;
molecule/2117774 chr1 + ENST00000374550.8 ENSG00000142676.14 unique fsm,tes_match_precise:1,correct_polya_site_right:23696416 23691806-23691829,23692609-23692759,23693807-23693913,23694660-23694791,23695798-23695908,23696344-23696418 gene_assignment=unique; PolyA=True; Canonical=True; Classification=full_splice_match; CB=CAGCAGCGTTGGGACA;
molecule/2117778 chr1 + ENST00000458455.2 ENSG00000142676.14 unique fsm,correct_polya_site_right:23696414 23692608-23692759,23693807-23693913,23694660-23694791,23695798-23695908,23696344-23696412 gene_assignment=unique; PolyA=True; Canonical=True; Classification=full_splice_match; CB=CAGCAGCGTTGGGACA;
molecule/2117341 chr1 + ENST00000373812.8 ENSG00000198492.16 unique fsm,correct_polya_site_right:28769774 28736992-28737147,28737658-28737682,28738259-28738338,28742403-28743986,28768929-28769773 gene_assignment=unique; PolyA=True; Canonical=True; Classification=full_splice_match; CB=CAGCAGCGTTGGGACA;
molecule/2117960 chr1 - ENST00000531243.2 ENSG00000090621.15 unique mono_exonic,tss_match_precise:2 39576661-39576789 gene_assignment=unique; PolyA=False; Canonical=Unspliced; Classification=incomplete_splice_match; CB=CAGCAGCGTTGGGACA;
molecule/2117690 chr1 + ENST00000304979.8 ENSG00000171960.12 unique fsm,tes_match_precise:0,correct_polya_site_right:42676757 42658424-42658512,42658844-42658908,42659228-42659251,42659522-42659566,42660862-42660904,42664863-42664955,42665980-42666067,42666547-42666587,42667351-42667440,42676584-42676758 gene_assignment=unique; PolyA=True; Canonical=True; Classification=full_splice_match; CB=CAGCAGCGTTGGGACA;
molecule/2117928 chr1 + ENST00000436427.1 ENSG00000065978.19 unique mono_exonic,tes_match_precise:0,correct_polya_site_right:42702349 42702036-42702349 gene_assignment=unique; PolyA=True; Canonical=Unspliced; Classification=incomplete_splice_match; CB=CAGCAGCGTTGGGACA;
molecule/2117801 chr1 + ENST00000311672.10 ENSG00000173660.12 unique fsm,tss_match_precise:0,tes_match_precise:-3,correct_polya_site_right:46316774 46303698-46303820,46309101-46309127,46310155-46310316,46316552-46316773 gene_assignment=unique; PolyA=True; Canonical=True; Classification=full_splice_match; CB=CAGCAGCGTTGGGACA;
How should I interpret this discrepancy between the read assignments and the grouped counts for this cell?
Thanks!