jaleezyy/covid-19-signal 28

Files and methodology pertaining to the sequencing and analysis of SARS-CoV-2, causative agent of COVID-19.

jts/assembly_accuracy 26

tools for assessing the accuracy of genome assemblies

jts/dbgfm 26

FM-index representation of a de Bruijn graph

jts/bri 20

Bam Read Index - Extract alignments from a bam file by readname

jts/bam2fastq 17

Simple convertor from bam to FASTQ

jts/csc2417 4

Course webpage for CSC2417

jts/DALIGNER 4

Find all significant local alignments between reads

jts/gfademo 4

Small demonstration of the GFA format

jts/bwt-benchmark 2

Framework to benchmark BWT construction algorithms

issue closed jts/nanopolish

High proportion of "couldn't calibrate" for short RNA reads

Hi,

I would like to ask a question similar to issue #540.

I am having a very similar problem with my short RNA samples (they are 90 bp with polyA: so I am left with ~70 nt)

[post-run summary] total reads: 140905, unparseable: 0, qc fail: 0, could not calibrate: 135904, no alignment: 4986, bad fast5: 0

After nanopolish eventalign --scale-events, I am left with 18 reads. I see that nanopolish has problems with calibration and needs longer reads, but then how can I work with my short input? Do I have to ligate them? Should I use different parameters? Or is there a different tool you would suggest?

I would appreciate your comments! Thank you.

closed time in 4 days

kaltinel

issue comment jts/ncov-tools

Question about ambiguity threshold

Just to chime in with a few thoughts. Ambiguous positions can happen for quite a few reasons:

- RT/amplification artifacts for low quality/quantity samples. These should be relatively sporadic, so not consistent across samples.
- Contamination. Depending on how bad the contamination is, this can lead to a few samples with the same artifact (or many if the contamination is particularly bad).
- Incorrect primer trimming.
- Alignment artifacts.
- True intra-host variation/co-infections (rare but not unheard of).

So, as @rdeborja said, the interpretation is situational, and this is a flag for follow-up/inspection. In general, though, if the sample has high Ct and/or low completeness, the ambiguous bases are probably caused by RT/amplification issues.

ChadFibke

comment created time in 6 days

issue comment jts/nanopolish

Nanopolish event align output has infinity values in standardized level

Are the k-mers NNNNNN by chance?

kaltinel

comment created time in 6 days

issue comment jts/nanopolish

Mechanism of nanopolish event alignment

Hi,

You are correct that the hidden states are the k-mers of the reference sequence. The input to the HMM is a sequence of events (e_1, e_2, ..., e_n). We use dynamic programming (the Viterbi or forward algorithm) to determine the best alignment of these events to k-mers. This would be a good place in the code to start:

https://github.com/jts/nanopolish/blob/master/src/nanopolish_raw_loader.cpp#L381

(nanopolish actually uses two HMMs; this is the simpler one, which is used when loading the reads. There is a more complicated HMM that is used during consensus/variant calling: https://github.com/jts/nanopolish/blob/master/src/hmm/nanopolish_profile_hmm_r9.inl#L266).
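
To make the idea concrete, here is a minimal Viterbi sketch in Python. It is not nanopolish's implementation: the toy pore model, the Gaussian emission, the stay/step/skip transition probabilities, and the function names are all assumptions chosen for illustration. The hidden states are the reference k-mers in order, and each event is assigned to one k-mer.

```python
import math

def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution (assumed emission model)."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)

def viterbi_align(events, ref_kmers, model,
                  p_stay=0.4, p_step=0.5, p_skip=0.1):
    """Return, for each event, the index of the reference k-mer it aligns to.

    The transition probabilities are hypothetical. The alignment is global:
    the first event is tied to the first k-mer, the last event to the last.
    """
    n, m = len(events), len(ref_kmers)
    neg_inf = float("-inf")
    score = [[neg_inf] * m for _ in range(n)]
    back = [[-1] * m for _ in range(n)]

    def emit(i, j):
        mean, sd = model[ref_kmers[j]]
        return gaussian_logpdf(events[i], mean, sd)

    score[0][0] = emit(0, 0)
    moves = ((0, math.log(p_stay)), (1, math.log(p_step)), (2, math.log(p_skip)))
    for i in range(1, n):
        for j in range(m):
            best, arg = neg_inf, -1
            for offset, log_t in moves:
                prev = j - offset
                if prev >= 0 and score[i - 1][prev] + log_t > best:
                    best, arg = score[i - 1][prev] + log_t, prev
            if arg >= 0:
                score[i][j] = best + emit(i, j)
                back[i][j] = arg

    if score[n - 1][m - 1] == neg_inf:
        raise ValueError("no valid alignment of events to k-mers")

    # Trace back from the last k-mer to recover the best path.
    j = m - 1
    path = [j]
    for i in range(n - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    path.reverse()
    return path

# Toy 3-mer model: k-mer -> (mean current, standard deviation). Made-up values.
toy_model = {"ACG": (80.0, 2.0), "CGT": (100.0, 2.0),
             "GTA": (90.0, 2.0), "TAC": (110.0, 2.0)}
toy_events = [80.5, 79.8, 100.3, 90.1, 89.7, 110.2]
print(viterbi_align(toy_events, ["ACG", "CGT", "GTA", "TAC"], toy_model))
# -> [0, 0, 1, 2, 2, 3]
```

In this toy run, two events are assigned to the first k-mer and two to the third, which is how "stays" in the pore show up in the alignment.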

I hope that helps, Jared

WANGEOGEO

comment created time in 7 days

issue comment jts/nanopolish

Segment base assignment

Hi,

I'm afraid that it is not possible to assign signals to individual bases, at least not with the way nanopolish is currently designed. Sorry I couldn't help more.

Jared

JannesSP

comment created time in 7 days

issue comment jts/nanopolish

Nanopolish event align output has infinity values in standardized level

Hi,

This is unexpected. Could you paste a few examples?

Jared

kaltinel

comment created time in 7 days

issue comment jts/nanopolish

Running methylation_frequency.py within a swarm job

Hi,

This is fine, as long as each input .tsv file is from a distinct genomic region. Once you have the frequency files for each chromosome/region, then you can merge them together if you'd like.
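
A minimal sketch of the merge step in Python, assuming each per-region frequency file is a tab-separated table with an identical header row and with the chromosome and start coordinate in the first two columns. The function name and file layout are illustrative assumptions, not part of nanopolish.

```python
import csv

def merge_frequency_files(paths, out_path):
    """Concatenate per-region frequency TSVs into one file with a single header.

    Assumes the regions are distinct, so no rows need to be combined.
    Returns the number of data rows written.
    """
    header = None
    rows = []
    for path in paths:
        with open(path, newline="") as handle:
            reader = csv.reader(handle, delimiter="\t")
            file_header = next(reader)
            if header is None:
                header = file_header
            elif file_header != header:
                raise ValueError(f"header mismatch in {path}")
            rows.extend(reader)

    # Assumed layout: column 0 is the chromosome, column 1 the start position.
    rows.sort(key=lambda row: (row[0], int(row[1])))

    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)
```

Because the inputs cover distinct regions, a plain concatenation is enough. If two files could contain the same site, the counts would have to be summed instead, which this sketch does not do.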

Jared

coeyct

comment created time in 7 days

issue comment jts/nanopolish

Somehow Nanopolish doesn't find my fast5 files. Error: no fast5 files found

Hi,

You should provide the same path you've given guppy, f5files.

Jared

ArjenB85

comment created time in 14 days

issue comment jts/nanopolish

missing function declaration for C99

Thanks for the quick reply. It looks like the error is in libhdf5. I am off on holidays now but will look into this in January (hopefully).

YuliyaLab

comment created time in a month

issue comment jts/nanopolish

missing function declaration for C99

Hi,

cache.c is not a nanopolish source file so I’m not sure where this error came from. Can you post the entire build log and the OS/compiler you’re using?

Jared

On Dec 19, 2021, at 9:33 AM, Yuliya ***@***.***> wrote:

Error when running: make

cache.c:25754:23: error: implicit declaration of function 'resize_configs_are_equal' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
} else if ( ! resize_configs_are_equal(&test_auto_size_ctl, \

It seems that the function prototype declaration is missing.


YuliyaLab

comment created time in a month

issue comment jts/nanopolish

eventalign for dRNA sequencing

Hi,

It depends on what you want to do with the data. Neither solution is perfect as the transcriptome is redundant (the same genomic k-mer may be present in multiple transcripts) and the alignment quality may be poor around splice sites when aligning to the genome.

Jared

xieyy46

comment created time in a month

issue comment jts/nanopolish

Update nanopolish v0.13.3 in anaconda

Thank you for the detailed report, and for pointing out that the path in the comment I linked to is incorrect; I have fixed that. One of the differences between 0.13.2 and 0.13.3 is that 0.13.3 will warn when the plugin is missing, whereas 0.13.2 will silently skip the data.

Perhaps in the Nanopolish README.md, in the "Installing the latest code from github (recommended)" section, it might be worth adding a note that fast5 files have been VBZ-compressed since the recent MinION software update, so users need to install the hdf5plugin and set 'HDF5_PLUGIN_PATH' to enable Nanopolish to read these files.

I'll make a note along these lines when I release 0.14 (likely in January). This is a common issue so I'll try to devise a way to automatically install the plugin, if possible.
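
Until then, a quick pre-flight check can catch a missing plugin before a long run. The sketch below is an assumption-laden helper, not part of nanopolish: it only looks in the directories listed in `HDF5_PLUGIN_PATH` for a file whose name contains "vbz" (the actual plugin file name varies by platform, so the name pattern is a guess).

```python
import os
from pathlib import Path

def find_vbz_plugin(env=None):
    """Return the first file containing 'vbz' in its name under HDF5_PLUGIN_PATH.

    Returns None when the variable is unset or no matching file is found.
    The '*vbz*' pattern is an assumption about how the plugin file is named.
    """
    env = os.environ if env is None else env
    plugin_path = env.get("HDF5_PLUGIN_PATH", "")
    for directory in filter(None, plugin_path.split(os.pathsep)):
        folder = Path(directory)
        if not folder.is_dir():
            continue
        matches = sorted(folder.glob("*vbz*"))
        if matches:
            return matches[0]
    return None

plugin = find_vbz_plugin()
if plugin is None:
    print("No VBZ plugin found via HDF5_PLUGIN_PATH; compressed fast5 files may be unreadable.")
else:
    print(f"Found VBZ plugin candidate: {plugin}")
```

Finding a file this way does not prove that HDF5 can load it, only that the path is set and points at something plausible.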

egirard1

comment created time in a month

issue comment jts/nanopolish

Update nanopolish v0.13.3 in anaconda

Nanopolish always reads the fast5 files; in this case it uses the index files for the fastq to determine which ones to load.

Jared

On Dec 14, 2021, at 5:12 PM, Stephen Bridgett ***@***.***> wrote:

Thank you for replying so quickly. It might be that the updated Nanopore software writes VBZ-compressed fast5 files now. I'll look into the VBZ decompression plugin, although the nanopolish command used in the step of the Nextflow-ARTIC pipeline that writes the vcf file only reads from .fastq and .bam files, not a .fast5 file, so I'm not sure why it needs VBZ decompression at this step:

nanopolish variants --verbose --min-flanking-sequence 10 -x 1000000 --progress -t 1 --reads barcode01.fastq -o barcode01.nCoV-2019_1.vcf -b barcode01.trimmed.rg.sorted.bam -g primer-schemes/nCoV-2019/V3/nCoV-2019.reference.fasta -w "MN908947.3:1-29904" --ploidy 1 -m 0.15 --read-group nCoV-2019_1

The input "barcode01.fastq" file contains 324,882 reads.

And the "barcode01.trimmed.rg.sorted.bam" file has 29215 of the 29903 reference bases covered (97.7% coverage) at a mean depth of about 1023 reads:

$ samtools coverage barcode01.trimmed.rg.sorted.bam
#rname startpos endpos numreads covbases coverage meandepth meanbaseq meanmapq
MN908947.3 1 29903 91751 29215 97.6992 1023.17 19.2 60

egirard1

comment created time in a month

issue comment jts/nanopolish

Update nanopolish v0.13.3 in anaconda

Hi @sbridgett,

There is no difference between 0.13.2 and 0.13.3 with respect to variant calling, so I suspect it isn't the cause of this issue. My first guess is that the FAST5 files are VBZ-compressed, but you don't have the VBZ decompression plugin loaded. Is this possible? You can read more about VBZ compression here: https://github.com/jts/nanopolish/issues/932#issuecomment-914303734 and here: https://github.com/nanoporetech/vbz_compression#vbz-compression

Jared

egirard1

comment created time in a month

issue comment jts/nanopolish

Increasing speed of polyA tail length determination

That is a good suggestion, thanks Hasindu!

maximus-sci

comment created time in a month

issue comment jts/nanopolish

Increasing speed of polyA tail length determination

Hi @maximus-sci,

I'm currently trying to run nanopolish polya and it's quite slow (taking nearly a day for a file with ~1 million reads). First, I'm wondering if this is normal and, if so, what the bottleneck likely is. I am running this on a computing cluster, so I can increase the memory and number of threads to whatever it needs to be.

Since you have a computing cluster, I suggest splitting the input into chunks (say 50,000 reads each) and processing each chunk in parallel as a separate job. This is better than increasing the number of threads within a single job, as nanopolish is limited by HDF5.
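
One way to do the splitting, sketched in Python. It assumes a plain four-lines-per-record FASTQ (no wrapped sequence lines); the function name and output naming scheme are made up for illustration. How each chunk is then indexed and submitted depends on the cluster and is not shown.

```python
from itertools import islice

def split_fastq(fastq_path, out_prefix, reads_per_chunk=50000):
    """Write consecutive chunks of a FASTQ file and return the chunk paths.

    Assumes every record occupies exactly four lines.
    """
    chunk_paths = []
    with open(fastq_path) as handle:
        while True:
            # Pull the next block of records without loading the whole file.
            lines = list(islice(handle, 4 * reads_per_chunk))
            if not lines:
                break
            chunk_path = f"{out_prefix}.{len(chunk_paths):04d}.fastq"
            with open(chunk_path, "w") as out:
                out.writelines(lines)
            chunk_paths.append(chunk_path)
    return chunk_paths
```

With roughly 1 million reads and the suggested chunk size, this would produce about 20 files, one per job.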

I'm also wondering how the qc_tag field interacts with the MinKNOW software's adaptive sampling capabilities. Presumably a few hundred bases of a nucleic acid will already have been read by the sequencer before it's able to reject the read and spit it back out. I'm guessing this would result in a similar signature as when there's a mux change to unclog a blocked pore. Would the QC tag for these reads still be a "PASS" or something else?

I don't know how this will interact with adaptive sampling, sorry. It will depend on exactly how the rejection decision is made, and how much of the molecule is sequenced.

maximus-sci

comment created time in a month

issue comment jts/nanopolish

r10 branch make error

Hi,

The --model option tells nanopolish to load the given model, but it will still try to detect the pore type from the fast5 files. If it cannot detect the pore type, it will default to R9.4 and use a built-in model. This may be the problem.

If you're comfortable changing the code and recompiling nanopolish you can try to hard-code the pore type to R10 here to see if it fixes the problem:

https://github.com/jts/nanopolish/blob/r10/src/nanopolish_squiggle_read.cpp#L104

Note that I haven't tested nanopolish on R10.4 yet.

Jared

rekham1077

comment created time in a month

issue comment jts/nanopolish

GpC methylation training fails

Hi,

We have a trained GpC model in the nanonome branch that you may want to use rather than training your own:

https://github.com/jts/nanopolish/tree/nanonome

Right now, training your own model requires writing a bit of C++ code. If you'd like to train your own GpC model, you'll need to set the structs in nanopolish_alphabet to define what GpC methylation looks like. It will be easiest to just copy all the GpC structs from here:

https://github.com/jts/nanopolish/blob/nanonome/src/common/nanopolish_alphabet.h#L392

cvermeulen88

comment created time in a month

issue comment jts/nanopolish

Too many errors when trying to compile Nanopolish

I see you've got this working through conda, so I'll close this issue.

arturomarin

comment created time in a month

issue closed jts/nanopolish

Too many errors when trying to compile Nanopolish

I am trying to compile Nanopolish v0.13.3 on macOS 11.6.1. I have used the commands:

git clone --recursive https://github.com/jts/nanopolish.git
cd nanopolish 
make

but the terminal gives me the error:

fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
make[2]: *** [cache.o] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [lib/libhdf5.a] Error 2

closed time in a month

arturomarin

issue comment jts/nanopolish

Error with fast5 files when using nanopolish v0.13.2

Hi,

Please see here: https://github.com/jts/nanopolish/issues/945#issuecomment-939092891

arturomarin

comment created time in a month

issue comment jts/nanopolish

Minimum proximity of modifications on input molecule to be calibrated

Hi @kaltinel,

I don't think I can provide a definitive number here, since it depends on the sequence context and the type of modification. Generally we found that 5-mC in CpG is OK, even if the modifications are fairly dense. I hope that helps to some extent.

Jared

kaltinel

comment created time in 2 months

issue comment jts/nanopolish

methylation call-threshold on non-singleton events

Ah my apologies, I didn't link the username between those two issues.

SkabbiVML

comment created time in 2 months

issue comment jts/nanopolish

potential false negative in SC2 sequencing

I was involved in developing these thresholds. The reason we divide by TotalReads is to make a metric that is not influenced by depth (I interpret this metric as how strongly the model supports the variant over the reference base, with higher numbers being better). It's up to you whether to change the threshold; dropping it to 2.75 is probably fairly safe.

BCArg

comment created time in 2 months

issue comment jts/nanopolish

potential false negative in SC2 sequencing

Thanks. Nanopolish has called variants here but they have narrowly failed an ARTIC pipeline QC check (QUAL / TotalReads > 3 - these variants have values of 2.95 and 2.75, respectively). These positions will be masked with N in the consensus sequence to avoid false negatives/false positives causing issues in downstream analysis. Here's the relevant code from the artic pipeline:

https://github.com/artic-network/fieldbioinformatics/blob/master/artic/vcf_filter.py#L30
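
For reference, the check as described above can be reproduced with a few lines of Python. This is a sketch of the stated rule (QUAL / TotalReads > 3), not a copy of the artic code, and the function names are made up. The example record is a shortened, tab-separated version of the passing VCF entry quoted elsewhere in this thread.

```python
def qual_per_read(vcf_line):
    """Return QUAL divided by the TotalReads value from the INFO column."""
    fields = vcf_line.rstrip("\n").split("\t")
    qual = float(fields[5])
    info = dict(item.split("=", 1) for item in fields[7].split(";") if "=" in item)
    return qual / float(info["TotalReads"])

def passes_qual_check(vcf_line, threshold=3.0):
    """Apply the depth-normalised quality rule as stated in the text."""
    return qual_per_read(vcf_line) > threshold

record = "MN908947.3\t21638\t.\tC\tT\t663.8\tPASS\tTotalReads=197;SupportFraction=0.8005"
print(round(qual_per_read(record), 2))  # -> 3.37
print(passes_qual_check(record))        # -> True
```

A variant with a ratio of 2.95 or 2.75 fails the default threshold of 3 in this sketch, which matches the behaviour described for the masked positions.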

BCArg

comment created time in 2 months

issue comment jts/nanopolish

potential false negative in SC2 sequencing

Hi,

Is this mutation in the *.fail.vcf file for the bottom sample?

Jared

On Nov 29, 2021, at 5:35 AM, Bruno Hinckel ***@***.***> wrote:

I am using the artic pipeline to call variants on coronavirus sequencing data. I am afraid nanopolish missed something that should have been called, as shown below:

The two tracks represent sequencing of the same sample. Even more puzzling is the fact that the SNP was called on the top track i.e. that of lower coverage in that region.

This does not sound too intuitive to me. If this is the expected behaviour, though, what would be the explanation?

Please let me know if I should upload any other file for a better assessment.

Below is the vcf entry, where the SNP was called

MN908947.3 21638 . C T 663.8 PASS TotalReads=197;SupportFraction=0.8005;SupportFractionByStrand=0.990136,0.744358;BaseCalledReadsWithVariant=187;BaseCalledFraction=0.903382;AlleleCount=1;StrandSupport=0,39,45,113;StrandFisherTest=47;SOR=0.00820477;RefContext=TACCCCCTGCA;Pool=nCoV-2019_2 GT 1

BCArg

comment created time in 2 months

issue comment jts/nanopolish

methylation call-threshold on non-singleton events

This is still an open question, but I think it is fair to try relaxing the threshold. You may also try reducing the minimum distance between CpGs (--min-separation) to break up the groups, as the default value of 10 is fairly conservative.

Jared

On Nov 26, 2021, at 9:02 AM, SkabbiVML ***@***.***> wrote:

Hi @jts

I'm running nanopolish on fast5 files to detect promoter methylation on a pretty dense CpG island. After nanopolish, I summarize the results with calculate_methylation_frequency.py

As far as I understand, the default call-threshold in calculate_methylation_frequency is set to log_lik_ratio > 2.0 for a singleton event. This threshold is multiplied by the number of CpGs in a non-singleton event, so a non-singleton event with 10 CpGs will have a threshold of 20 for being called or not called.
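
The scaling described here can be written out as a small Python sketch. It follows the description in this message rather than the script's source: the function name is hypothetical, and treating the threshold symmetrically (using the absolute log-likelihood ratio, with the sign deciding the call) is an assumption.

```python
def classify_group(log_lik_ratio, num_cpgs, call_threshold=2.0):
    """Classify a CpG group as methylated, unmethylated, or ambiguous.

    The per-site threshold is multiplied by the number of CpGs in the group,
    as described above. Symmetric handling of the sign is an assumption.
    """
    required = call_threshold * num_cpgs
    if abs(log_lik_ratio) < required:
        return "ambiguous"
    return "methylated" if log_lik_ratio > 0 else "unmethylated"

# A group of 10 CpGs needs |log_lik_ratio| >= 20 at the default threshold.
print(classify_group(12.0, 10))                      # -> ambiguous
# Relaxing the per-site threshold to 1.0 lowers the requirement to 10.
print(classify_group(12.0, 10, call_threshold=1.0))  # -> methylated
```

The second call shows the effect being asked about: the same evidence that is skipped at the default threshold becomes a call once the per-site threshold is relaxed.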

Does it make sense to relax the threshold for non-singleton events?

Cheers

S


SkabbiVML

comment created time in 2 months

issue comment jts/nanopolish

Is it possible to get all event index in event align

Based on the k-mer size, I guess this is RNA? If so, the signal analysis coming out of nanopolish can be a little complicated, as the sequencing is in the 3' -> 5' direction but nanopolish flips the events to be in the 5' -> 3' direction. I note that the sample indices (start_idx, end_idx) are decreasing, so perhaps the aligned signal goes from the highlighted region towards the lowest coordinate, rather than across the long homopolymer region? Would this make more sense?

manburst

comment created time in 2 months

issue comment jts/nanopolish

Is it possible to get all event index in event align

Hi,

  1. Unfortunately this is not possible. nanopolish eventalign is based on signal-to-basecall and basecall-to-reference alignments. Hence if some bases of the read are not aligned to the reference they will be skipped.
  2. You can try the --signal-index option which will print out the index of the raw sample for each aligned event.
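
A minimal Python sketch of how the output of option 2 might be consumed. The column names `event_index`, `start_idx` and `end_idx`, the half-open interpretation of the sample range, and the toy rows are assumptions for illustration; check them against the header of your own eventalign output.

```python
import csv
import io

def event_sample_ranges(eventalign_tsv):
    """Yield (event_index, start_idx, end_idx) for each aligned event.

    `eventalign_tsv` is any iterable of lines with a header row.
    """
    reader = csv.DictReader(eventalign_tsv, delimiter="\t")
    for row in reader:
        yield int(row["event_index"]), int(row["start_idx"]), int(row["end_idx"])

def slice_signal(raw_signal, start_idx, end_idx):
    """Return the raw samples for one event (half-open range assumed)."""
    return raw_signal[start_idx:end_idx]

# Toy table with decreasing sample indices, as can happen for RNA reads.
toy_table = io.StringIO(
    "event_index\tstart_idx\tend_idx\n"
    "7\t30\t40\n"
    "8\t20\t30\n"
)
raw = list(range(100))  # stand-in for the raw signal of one read
for event_index, start, end in event_sample_ranges(toy_table):
    samples = slice_signal(raw, start, end)
    print(event_index, len(samples))
```

Events that were skipped because their bases did not align to the reference simply never appear in the table, so gaps in `event_index` are expected.
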
manburst

comment created time in 2 months
