Mangled new lines in coverage gtf file #348

@OXB-DC

Hi,
I've noticed that StringTie 2.1.4 seems to mangle the coverage GTF output (written with the -C <cov_refs.gtf> option): some lines are missing their trailing newline character.

I expect 9 tab-separated fields per line in the output file, but sometimes I get more. This is easy to verify with:
awk -F '\t' '{print NF}' <cov_refs.gtf> | sort -nu | tail -n 1
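
To see which lines are affected, and not just the maximum field count, a small variant of the same awk check could be used. This is only a sketch: the function name check_gtf_fields is hypothetical, and it assumes any comment lines start with #.

```shell
# check_gtf_fields GTF  (hypothetical helper)
# Prints the line number and field count of every non-comment line
# whose tab-separated field count is not the 9 columns GTF expects.
check_gtf_fields() {
  awk -F '\t' '!/^#/ && NF != 9 { printf "line %d: %d fields\n", NR, NF }' "$1"
}
```

For example, `check_gtf_fields cov_refs.gtf` lists each line where two records have been glued together.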

I have written a script (with some judicious use of tr and sed) that corrects this, so I do have a workaround: it looks up the name of each chromosome and scaffold in the FASTA file and uses those names as line delimiters to repair the coverage GTFs. However, it only works with GENCODE- or UCSC-style chromosome names (prefixed with chr).
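
For anyone wanting a similar workaround, here is a rough awk sketch of the same idea (not my actual tr/sed script). It assumes that each missing newline leaves the attribute column ending in ';' immediately followed by the next record's sequence name, and that sequence names are the first word of each FASTA header. The function name repair_gtf is hypothetical. Because it checks candidate names against the FASTA headers rather than looking for a chr prefix, it should not depend on the naming convention, but I have only reasoned about it on toy input.

```shell
# repair_gtf FASTA GTF  (hypothetical helper, a sketch only)
# Assumes a glued line looks like:
#   ...<attributes ending in ';'><next seqname>\t<source>\t...
repair_gtf() {
  # Only the FASTA header lines are needed, so pre-filter them with grep.
  grep '^>' "$1" | awk -F '\t' -v OFS='\t' '
    # First input (stdin): sequence name = first word of each header.
    FNR == NR { name = substr($0, 2); sub(/[ \t].*/, "", name); seq[name] = 1; next }
    # Second input (the GTF): comments and 9-field lines pass through unchanged.
    /^#/ || NF == 9 { print; next }
    {
      n = split($0, f, "\t")
      out = f[1]; k = 1            # k = number of fields in the record being rebuilt
      for (i = 2; i <= n; i++) {
        # When f[i] is the 9th (attribute) column, look for a glued-on seqname
        # after its last semicolon.
        if (k == 8 && i < n && match(f[i], /;[^;]*$/)) {
          tail = substr(f[i], RSTART + 1)
          if (tail in seq) {
            print out OFS substr(f[i], 1, RSTART)   # finish the current record
            out = tail; k = 1                       # start the next record
            continue
          }
        }
        out = out OFS f[i]; k++
      }
      print out
    }
  ' - "$2"
}
```

Usage would be something like `repair_gtf genome.fa cov_refs.gtf > cov_refs.fixed.gtf` (file names are placeholders). If a suffix after the last ';' is not a known sequence name, the field is left as-is rather than split.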

Has anyone else noticed this? I'm only using the coverage GTF to be conservative about which transcripts I report (only those covered end-to-end by reads).