Leighton Pritchard (widdowquinn)
University of Strathclyde, Glasgow, Scotland
https://www.strath.ac.uk/staff/pritchardleightondr/

I'm a computational biologist, working at the University of Strathclyde, in Glasgow.

nickp60/riboSeed 6

pipeline for using ribosomal flanking regions to improve bacterial genome assembly

peterjc/thapbi-pict 6

Tree Health and Plant Biosecurity Initiative - Phytophthora ITS1 Classifier Tool

nickp60/EzClermont 4

Phylotype your strains using Clermont's 2013 method: ezclermont.org

HobnobMancer/cazy_webscraper 2

Web scraper to retrieve all protein data catalogued by the CAZy website/database.

HobnobMancer/pyrewton 2

Python package for the identification of comprehensive CAZyomes of candidate species

StAResComp/2018-11-27-standrews 2

Software Carpentry workshop at University of St Andrews, 27-28 November 2018

HobnobMancer/saintBioutils 1

Repository of utility and miscellaneous functions for use in bioinformatics pipelines, primarily in Python

HuttonICS/pyani 1

Python module for average nucleotide identity analyses

issue comment widdowquinn/pyani

ANIm should not be symmetric

Thanks @dparks1134 - apologies for the bug, and we'll deal with this as a priority.

Quick notes

  • although it stems from the same misunderstanding of/assumptions about mummer's operation, this is not the same issue as #340 and will probably not be fixed by the proposed changes to resolve that issue.
  • the resolution currently appears to be to run mummer for the reciprocal pairwise comparisons.

L.

dparks1134

comment created time in 2 days
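
As a rough illustration of the second note in the comment above: running reciprocal comparisons means treating each pair of genomes as two ordered comparisons rather than one. The sketch below uses hypothetical genome labels and only enumerates the pairs. It is not pyani's scheduling code and it does not call mummer.

```python
from itertools import combinations, permutations

# Hypothetical genome labels, used only for illustration.
genomes = ["genome_A", "genome_B", "genome_C"]

# One comparison per unordered pair: (A, B) stands in for (B, A) as well.
one_way = list(combinations(genomes, 2))

# Reciprocal comparisons: every ordered pair, so (A, B) and (B, A) both run.
reciprocal = list(permutations(genomes, 2))

print(len(one_way), "one-way comparisons")
print(len(reciprocal), "reciprocal comparisons")
```

For n genomes this doubles the number of pairwise runs, from n(n-1)/2 to n(n-1).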

issue comment widdowquinn/pyani

ValueError: zero-size array to reduction operation minimum which has no identity

Hi @neelam19051

Thank you for your interest in pyani - that's an error I've not seen before. It looks to me as though there has been an issue with the analysis. Would you be able to attach a minimal input dataset to this issue, so we can investigate?

Many thanks,

L.

neelam19051

comment created time in 6 days

issue comment widdowquinn/pyani

PyANI not on path after installation

No need to apologise - our documentation could (should) be much clearer!

dparks1134

comment created time in 6 days

started PyBites-Open-Source/pybites-carbon

started time in 9 days

issue comment widdowquinn/pyani

why the clustering results are different in the two directions

Hi @magksi

Thank you for your interest in pyani. The results are asymmetrical because the algorithm described in the original manuscript introducing ANIb is inherently asymmetric: comparing genome A to genome B gives a different result from comparing genome B to genome A.

In addition to this, coverage results are asymmetric because ANIb and ANIm only report similarities between detectably homologous regions in a pairwise analysis. The lengths of non-homologous regions (that don't participate in the ANI calculation) may differ between the genomes.

This question comes up occasionally, so you may find similar answers in the closed issues.

I hope this is useful and helps explain your observations.

L.

magksi

comment created time in 9 days
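
To illustrate the coverage point in the comment above, here is a minimal sketch with made-up genome and alignment lengths. It assumes coverage is the aligned length divided by the genome's total length and, as a simplification, that the same number of bases aligns in both genomes. The `coverage` helper is hypothetical and not part of pyani.

```python
def coverage(aligned_length: int, genome_length: int) -> float:
    """Return the fraction of a genome that falls in aligned regions."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return aligned_length / genome_length


# Hypothetical values: the two genomes share the same aligned regions,
# but genome A carries more non-homologous sequence than genome B.
genome_a_length = 5_000_000
genome_b_length = 4_000_000
aligned_length = 3_000_000

cov_a = coverage(aligned_length, genome_a_length)
cov_b = coverage(aligned_length, genome_b_length)

print(f"coverage of A: {cov_a:.2f}")
print(f"coverage of B: {cov_b:.2f}")
```

Because the denominators differ, the two coverage values differ even though the aligned regions are the same.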

push event widdowquinn/pyani

Leighton Pritchard

commit sha ee796f5c88055910c6c02425fe3acf1d3533fb1e

add citations for early January 2022

push time in 9 days

push event widdowquinn/pyani

Leighton Pritchard

commit sha 7215938f1b43e92889624f1cde0d2af169023498

add citations for December 2021

push time in 9 days

push event widdowquinn/pyani

Leighton Pritchard

commit sha 4a99950601ceb7a1f5d9369de8c82bfa32cfbb07

add citations for November 2021

push time in 9 days

push event widdowquinn/aoc2021

Leighton Pritchard

commit sha 730cc0f7450fe1df9256cbd28ef093063e7109df

add day 22 to readme

push time in 11 days

push event widdowquinn/aoc2021

Leighton Pritchard

commit sha 9ddcb96952c5fd7c13dfd50a02b9db4b6a328698

add solution for day 22

push time in 11 days

push event widdowquinn/aoc2021

Leighton Pritchard

commit sha da9470c44ddceb10c4a72a72d7b1037c4f5f1de9

add solution for day 21

push time in 14 days

push event widdowquinn/aoc2021

Leighton Pritchard

commit sha 86efe5cc040863562399b0bb89de53568d040675

add solution for day 20

push time in 14 days

push event widdowquinn/aoc2021

Leighton Pritchard

commit sha 717627e471f7f1df226788e07fa92b63681b7980

update requirements

push time in 15 days

push event widdowquinn/aoc2021

Leighton Pritchard

commit sha 9a8577644cc189db053076b3c5a0945950f6db26

add solution for day 19

push time in 15 days

push event widdowquinn/aoc2021

Leighton Pritchard

commit sha 81ba23699bda6cc8f31914c790e489e1c68c2b68

add solution for day 18

push time in 16 days

push event widdowquinn/aoc2021

Leighton Pritchard

commit sha 875de021a425c5432396b7080a64564a409c7a85

minor change to day 16

push time in a month

push event widdowquinn/aoc2021

Leighton Pritchard

commit sha c7d7b659fba70f6a176787bfe4fd078c2877d2fa

add day 17 solution

push time in a month

push event widdowquinn/aoc2021

Leighton Pritchard

commit sha f3c2ba5e110d81747cb9b0db2e983491c120cc89

add day 16 solution

push time in a month

create branch widdowquinn/ncfp

branch : issue_20

created branch time in a month

issue comment widdowquinn/ncfp

`ncfp` not recovering all coding sequences from NCBI

It may be relevant that, locally, the tests fail with warnings like:

[...]
[WARNING] [ncbi_cds_from_protein.scripts.ncfp]: No record found for sequence input XP_004520832.1
[WARNING] [ncbi_cds_from_protein.scripts.ncfp]: No record found for sequence input XP_004520832.1
[WARNING] [ncbi_cds_from_protein.scripts.ncfp]: No record found for sequence input XP_004520832.1
[WARNING] [ncbi_cds_from_protein.scripts.ncfp]: No record found for sequence input XP_004520832.1
[WARNING] [ncbi_cds_from_protein.scripts.ncfp]: No record found for sequence input XP_004520832.1
[...]

widdowquinn

comment created time in a month

issue opened widdowquinn/ncfp

`ncfp` not recovering all coding sequences from NCBI

Summary:

ncfp does not recover all coding sequences from NCBI, even if a coding sequence is available

Description:

The UniProt sequence below

>tr|F5NV06|F5NV06_SHIFL MliC domain-containing protein OS=Shigella flexneri K-227 OX=766147 GN=SFK227_1958 PE=4 SV=1
MKKLLIIILPVLLSGCSAFNQLVERMQTDTLEYQCDEKPLTVKLNNPCQEVSFVYDNQLL
HLKQGLSASGARYSDGIYVFWSKGEEATVYKRDRIVLNNCQLQNPQR

corresponds to the NCBI record

https://www.ncbi.nlm.nih.gov/protein/333018885

whose coding sequence is in the nucleotide accession

https://www.ncbi.nlm.nih.gov/nuccore/AFGY01000021.1

but in debug mode ncfp reports:

[DEBUG] [ncbi_cds_from_protein.sequences]: Guessing sequence type for tr|F5NV06|F5NV06_SHIFL...
[DEBUG] [ncbi_cds_from_protein.sequences]: ...guessed UniProt
[DEBUG] [ncbi_cds_from_protein.sequences]: Uniprot record has GN field: SFK227_1958
[DEBUG] [ncbi_cds_from_protein.sequences]: Recovered EMBL database record: AFGY01000021
[DEBUG] [ncbi_cds_from_protein.sequences]: Adding record tr|F5NV06|F5NV06_SHIFL to cache with query AFGY01000021
Process input sequences: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  5.12it/s]
[INFO] [ncbi_cds_from_protein.scripts.ncfp]: 1 sequences taken forward with query
[INFO] [ncbi_cds_from_protein.scripts.ncfp]: Identifying nucleotide accessions...
Search NT IDs:   0%|                                                                                                                    | 0/1 [00:00<?, ?it/s][DEBUG] [ncbi_cds_from_protein.entrez]: Entry has nt query, using direct ESearch
[DEBUG] [ncbi_cds_from_protein.entrez]: ESearch query: ('AFGY01000021',)
Search NT IDs: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.81it/s]
[INFO] [ncbi_cds_from_protein.scripts.ncfp]: Added 1 new UIDs to cache
[INFO] [ncbi_cds_from_protein.scripts.ncfp]: Collecting GenBank accessions...
Fetch UID accessions: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.24s/it]
[INFO] [ncbi_cds_from_protein.scripts.ncfp]: Updated GenBank accessions for 1 UIDs
[INFO] [ncbi_cds_from_protein.scripts.ncfp]: Fetching GenBank headers...
[DEBUG] [ncbi_cds_from_protein.entrez]: Found 1 UIDs with no GenBank headers
[DEBUG] [ncbi_cds_from_protein.entrez]: Checking EPost histories, batch size is 1
[DEBUG] [ncbi_cds_from_protein.entrez]: Found 1 EPost histories, fetching headers
[...]
DEBUG:ncbi_cds_from_protein.entrez:Parsed 1 records
Fetching GenBank headers: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:07<00:00,  7.22s/it]
[INFO] [ncbi_cds_from_protein.scripts.ncfp]: Fetched GenBank headers for 0 UIDs
INFO:ncbi_cds_from_protein.scripts.ncfp:Fetched GenBank headers for 0 UIDs
[WARNING] [ncbi_cds_from_protein.scripts.ncfp]: No GenBank header downloads were required! (in cache?)
WARNING:ncbi_cds_from_protein.scripts.ncfp:No GenBank header downloads were required! (in cache?)
[...]
[WARNING] [ncbi_cds_from_protein.scripts.ncfp]: No record found for sequence input tr|F5NV06|F5NV06_SHIFL
WARNING:ncbi_cds_from_protein.scripts.ncfp:No record found for sequence input tr|F5NV06|F5NV06_SHIFL
[INFO] [ncbi_cds_from_protein.scripts.ncfp]: Matched 0/1 records
INFO:ncbi_cds_from_protein.scripts.ncfp:Matched 0/1 records

and the ncfp*.fasta output files are empty.

Reproducible Steps:

  1. Create an input file containing only the sequence above.
  2. Call ncfp on that input file, e.g. with `ncfp --debug -l test.log -b 1 --keepcache test.fasta test_ncfp me@my.email`
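
The two steps above could be scripted roughly as follows. This sketch is not part of the original report: it writes the sequence from the description to `test.fasta` in a temporary directory and builds the argument list for the command quoted in step 2, without running it. The `prepare_reproduction` helper is a hypothetical name, and `me@my.email` is the placeholder address from the report.

```python
import shlex
import tempfile
from pathlib import Path
from typing import List

# Sequence copied verbatim from the issue description.
UNIPROT_FASTA = (
    ">tr|F5NV06|F5NV06_SHIFL MliC domain-containing protein "
    "OS=Shigella flexneri K-227 OX=766147 GN=SFK227_1958 PE=4 SV=1\n"
    "MKKLLIIILPVLLSGCSAFNQLVERMQTDTLEYQCDEKPLTVKLNNPCQEVSFVYDNQLL\n"
    "HLKQGLSASGARYSDGIYVFWSKGEEATVYKRDRIVLNNCQLQNPQR\n"
)

# Command copied from step 2 of the report.
NCFP_COMMAND = (
    "ncfp --debug -l test.log -b 1 --keepcache test.fasta test_ncfp me@my.email"
)


def prepare_reproduction(workdir: Path) -> List[str]:
    """Write test.fasta into workdir and return the ncfp argument list."""
    (workdir / "test.fasta").write_text(UNIPROT_FASTA)
    return shlex.split(NCFP_COMMAND)


workdir = Path(tempfile.mkdtemp(prefix="ncfp_issue_"))
argv = prepare_reproduction(workdir)

# The command is only printed here; run it from workdir to reproduce the issue.
print(f"cd {workdir} && {shlex.join(argv)}")
```

Running the printed command requires ncfp to be installed and network access to NCBI.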

ncfp Version:

Commit 0f70697

Python Version:

Python 3.8

Operating System:

macOS

created time in a month

push event widdowquinn/aoc2021

Leighton Pritchard

commit sha c18daa7f0252a924ea8b1a96c3ae117e0c855264

add solution to day 14

push time in a month

push event widdowquinn/aoc2021

Leighton Pritchard

commit sha c94f3cff082fae6a1855a5aad1da5abfb2c502c6

minor day 12 change

push time in a month

push event widdowquinn/aoc2021

Leighton Pritchard

commit sha 70646efeae5c9bc758b362c9c84ab2c780790b91

add timing for alternative solution to day 13

push time in a month

push event widdowquinn/aoc2021

Leighton Pritchard

commit sha 34ad6eadd46c4b579ce24b4b12e3217b08b84a25

add solution to day 13

push time in a month

push event widdowquinn/aoc2021

Leighton Pritchard

commit sha de2df5e4e5215f015fefd562c79f2b65f843c2c6

add day 12 solution

push time in a month

push event widdowquinn/aoc2021

Leighton Pritchard

commit sha 5c2029d935b7ad6e056db83466b19776de1fbd7e

add day 11 solution

push time in a month

push event widdowquinn/aoc2021

Leighton Pritchard

commit sha 74beff3d21f85cde7edcb57275e8004f378e5dbe

fix typo in day 10

push time in a month

push event widdowquinn/aoc2021

Leighton Pritchard

commit sha 9fd9dd7e7090a4a62e912c113700cd7a9f30a32c

fix day 8 typos

Leighton Pritchard

commit sha 0c6203f4587f824d4c737877363e53f19c714fba

add day 10 solution

push time in a month

push event widdowquinn/aoc2021

Leighton Pritchard

commit sha 60a33077cefffa1d77e6b959736b9904909426c6

add day 9 solution

push time in a month
