Inconsistency between gene_grouped_counts_linear.tsv and read_assignments.tsv.gz?

Hi, thank you for developing this tool.

I would like to ask about the interpretation of the read assignments file.
I'm trying IsoQuant version 3.6.1 on single-cell data with the --no_model_construction option.
The gene-expression output file (`OUT.gene_grouped_counts_linear.tsv`) had several cells with a low read count. For example, picking up a cell with a barcode CAGCAGCGTTGGGACA:
```
grep -w CAGCAGCGTTGGGACA OUT.transcript_grouped_counts_linear.tsv
```
reveals only one line with a count of 1.00:
ENST00000549920.6	CAGCAGCGTTGGGACA	1.00

Meanwhile, when I looked into the read assignment file (`OUT.read_assignments.tsv.gz`), this cell had a total of 696 assignments, including 251 unique ones:
```
zgrep -w "CB=CAGCAGCGTTGGGACA" OUT.read_assignments.tsv.gz | cut -f6 | sort | uniq -c
```

54 ambiguous
52 inconsistent
94 inconsistent_ambiguous
177 inconsistent_non_intronic
13 noninformative
251 unique
55 unique_minor_difference

Representative lines:
```
#read_id        chr     strand  isoform_id      gene_id assignment_type assignment_events       exons   additional_info
molecule/2117770        chr1    +       ENST00000374550.8       ENSG00000142676.14      unique  fsm,tes_match:8,correct_polya_site_right:23696423       23691806-23691829,23692609-23692759,23693807-23693913,23694660-23694791,23695798-23695908,23696344-23696425     gene_assignment=unique; PolyA=True; Canonical=True; Classification=full_splice_match; CB=CAGCAGCGTTGGGACA;
molecule/2117771        chr1    +       ENST00000374550.8       ENSG00000142676.14      unique  fsm,tes_match:8,correct_polya_site_right:23696423       23691806-23691829,23692609-23692759,23693807-23693913,23694660-23694791,23695798-23695908,23696344-23696425     gene_assignment=unique; PolyA=True; Canonical=True; Classification=full_splice_match; CB=CAGCAGCGTTGGGACA;
molecule/2117773        chr1    +       ENST00000374550.8       ENSG00000142676.14      unique  fsm,tes_match_precise:1,correct_polya_site_right:23696416       23691806-23691829,23692609-23692759,23693807-23693913,23694660-23694791,23695798-23695908,23696344-23696418      gene_assignment=unique; PolyA=True; Canonical=True; Classification=full_splice_match; CB=CAGCAGCGTTGGGACA;
molecule/2117774        chr1    +       ENST00000374550.8       ENSG00000142676.14      unique  fsm,tes_match_precise:1,correct_polya_site_right:23696416       23691806-23691829,23692609-23692759,23693807-23693913,23694660-23694791,23695798-23695908,23696344-23696418      gene_assignment=unique; PolyA=True; Canonical=True; Classification=full_splice_match; CB=CAGCAGCGTTGGGACA;
molecule/2117778        chr1    +       ENST00000458455.2       ENSG00000142676.14      unique  fsm,correct_polya_site_right:23696414   23692608-23692759,23693807-23693913,23694660-23694791,23695798-23695908,23696344-23696412       gene_assignment=unique; PolyA=True; Canonical=True; Classification=full_splice_match; CB=CAGCAGCGTTGGGACA;
molecule/2117341        chr1    +       ENST00000373812.8       ENSG00000198492.16      unique  fsm,correct_polya_site_right:28769774   28736992-28737147,28737658-28737682,28738259-28738338,28742403-28743986,28768929-28769773       gene_assignment=unique; PolyA=True; Canonical=True; Classification=full_splice_match; CB=CAGCAGCGTTGGGACA;
molecule/2117960        chr1    -       ENST00000531243.2       ENSG00000090621.15      unique  mono_exonic,tss_match_precise:2 39576661-39576789       gene_assignment=unique; PolyA=False; Canonical=Unspliced; Classification=incomplete_splice_match; CB=CAGCAGCGTTGGGACA;
molecule/2117690        chr1    +       ENST00000304979.8       ENSG00000171960.12      unique  fsm,tes_match_precise:0,correct_polya_site_right:42676757       42658424-42658512,42658844-42658908,42659228-42659251,42659522-42659566,42660862-42660904,42664863-42664955,42665980-42666067,42666547-42666587,42667351-42667440,42676584-42676758      gene_assignment=unique; PolyA=True; Canonical=True; Classification=full_splice_match; CB=CAGCAGCGTTGGGACA;
molecule/2117928        chr1    +       ENST00000436427.1       ENSG00000065978.19      unique  mono_exonic,tes_match_precise:0,correct_polya_site_right:42702349       42702036-42702349       gene_assignment=unique; PolyA=True; Canonical=Unspliced; Classification=incomplete_splice_match; CB=CAGCAGCGTTGGGACA;
molecule/2117801        chr1    +       ENST00000311672.10      ENSG00000173660.12      unique  fsm,tss_match_precise:0,tes_match_precise:-3,correct_polya_site_right:46316774  46303698-46303820,46309101-46309127,46310155-46310316,46316552-46316773 gene_assignment=unique; PolyA=True; Canonical=True; Classification=full_splice_match; CB=CAGCAGCGTTGGGACA;
```

How do I interpret this result?

Thanks!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inconsistency between gene_grouped_counts_linear.tsv and read_assignments.tsv.gz? #258

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Inconsistency between gene_grouped_counts_linear.tsv and read_assignments.tsv.gz? #258

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions