Skip to content

Inconsistency between gene_grouped_counts_linear.tsv and read_assignments.tsv.gz? #258

@qsonehara

Description

@qsonehara

Hi, thank you for developing this tool.

I would like to ask about the interpretation of the read assignments file.
I'm trying IsoQuant version 3.6.1 on single-cell data with the --no_model_construction option.
The gene-expression output file (OUT.gene_grouped_counts_linear.tsv) had several cells with a low read count. For example, picking up a cell with a barcode CAGCAGCGTTGGGACA:

grep -w CAGCAGCGTTGGGACA OUT.transcript_grouped_counts_linear.tsv

reveals only one line with a count of 1.00:
ENST00000549920.6 CAGCAGCGTTGGGACA 1.00

Meanwhile, when I looked into the read assignment file (OUT.read_assignments.tsv.gz), this cell had a total of 696 assignments, including 251 unique ones:

zgrep -w "CB=CAGCAGCGTTGGGACA" OUT.read_assignments.tsv.gz | cut -f6 | sort | uniq -c

54 ambiguous
52 inconsistent
94 inconsistent_ambiguous
177 inconsistent_non_intronic
13 noninformative
251 unique
55 unique_minor_difference

Representative lines:

#read_id        chr     strand  isoform_id      gene_id assignment_type assignment_events       exons   additional_info
molecule/2117770        chr1    +       ENST00000374550.8       ENSG00000142676.14      unique  fsm,tes_match:8,correct_polya_site_right:23696423       23691806-23691829,23692609-23692759,23693807-23693913,23694660-23694791,23695798-23695908,23696344-23696425     gene_assignment=unique; PolyA=True; Canonical=True; Classification=full_splice_match; CB=CAGCAGCGTTGGGACA;
molecule/2117771        chr1    +       ENST00000374550.8       ENSG00000142676.14      unique  fsm,tes_match:8,correct_polya_site_right:23696423       23691806-23691829,23692609-23692759,23693807-23693913,23694660-23694791,23695798-23695908,23696344-23696425     gene_assignment=unique; PolyA=True; Canonical=True; Classification=full_splice_match; CB=CAGCAGCGTTGGGACA;
molecule/2117773        chr1    +       ENST00000374550.8       ENSG00000142676.14      unique  fsm,tes_match_precise:1,correct_polya_site_right:23696416       23691806-23691829,23692609-23692759,23693807-23693913,23694660-23694791,23695798-23695908,23696344-23696418      gene_assignment=unique; PolyA=True; Canonical=True; Classification=full_splice_match; CB=CAGCAGCGTTGGGACA;
molecule/2117774        chr1    +       ENST00000374550.8       ENSG00000142676.14      unique  fsm,tes_match_precise:1,correct_polya_site_right:23696416       23691806-23691829,23692609-23692759,23693807-23693913,23694660-23694791,23695798-23695908,23696344-23696418      gene_assignment=unique; PolyA=True; Canonical=True; Classification=full_splice_match; CB=CAGCAGCGTTGGGACA;
molecule/2117778        chr1    +       ENST00000458455.2       ENSG00000142676.14      unique  fsm,correct_polya_site_right:23696414   23692608-23692759,23693807-23693913,23694660-23694791,23695798-23695908,23696344-23696412       gene_assignment=unique; PolyA=True; Canonical=True; Classification=full_splice_match; CB=CAGCAGCGTTGGGACA;
molecule/2117341        chr1    +       ENST00000373812.8       ENSG00000198492.16      unique  fsm,correct_polya_site_right:28769774   28736992-28737147,28737658-28737682,28738259-28738338,28742403-28743986,28768929-28769773       gene_assignment=unique; PolyA=True; Canonical=True; Classification=full_splice_match; CB=CAGCAGCGTTGGGACA;
molecule/2117960        chr1    -       ENST00000531243.2       ENSG00000090621.15      unique  mono_exonic,tss_match_precise:2 39576661-39576789       gene_assignment=unique; PolyA=False; Canonical=Unspliced; Classification=incomplete_splice_match; CB=CAGCAGCGTTGGGACA;
molecule/2117690        chr1    +       ENST00000304979.8       ENSG00000171960.12      unique  fsm,tes_match_precise:0,correct_polya_site_right:42676757       42658424-42658512,42658844-42658908,42659228-42659251,42659522-42659566,42660862-42660904,42664863-42664955,42665980-42666067,42666547-42666587,42667351-42667440,42676584-42676758      gene_assignment=unique; PolyA=True; Canonical=True; Classification=full_splice_match; CB=CAGCAGCGTTGGGACA;
molecule/2117928        chr1    +       ENST00000436427.1       ENSG00000065978.19      unique  mono_exonic,tes_match_precise:0,correct_polya_site_right:42702349       42702036-42702349       gene_assignment=unique; PolyA=True; Canonical=Unspliced; Classification=incomplete_splice_match; CB=CAGCAGCGTTGGGACA;
molecule/2117801        chr1    +       ENST00000311672.10      ENSG00000173660.12      unique  fsm,tss_match_precise:0,tes_match_precise:-3,correct_polya_site_right:46316774  46303698-46303820,46309101-46309127,46310155-46310316,46316552-46316773 gene_assignment=unique; PolyA=True; Canonical=True; Classification=full_splice_match; CB=CAGCAGCGTTGGGACA;

How do I interpret this result?

Thanks!

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingfixed in devIssue resolved but not released yet

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions