Greg Caporaso gregcaporaso @qiime2 Flagstaff, Arizona, USA www.caporasolab.us

applied-bioinformatics/An-Introduction-To-Applied-Bioinformatics 657

Interactive lessons in bioinformatics.

caporaso-lab/mockrobiota 65

A public resource for microbiome bioinformatics benchmarking using artificially constructed (i.e., mock) communities.

caporaso-lab/tax-credit 13

A repository for storing code and data related to a systematic comparison of short read taxonomy assignment tools

caporaso-lab/student-microbiome-project 8

Central repository for data and analysis tools for the StudentMicrobiomeProject.

caporaso-lab/cloaked-octo-ninja 4

Comparative analysis of high-level OTU picking protocols

caporaso-lab-graveyard/built-iab 2

The latest Jupyter notebooks for An Introduction to Applied Bioinformatics.

caporaso-lab-graveyard/q2d3 2

Early prototype interface illustrating how to use the QIIME 2 SDK. No longer supported.

push event gregcaporaso/2022.1-faes-tutorial

Greg Caporaso

commit sha cfef3b127f6a8e71704085312e256df61b652c87

removed some unnecessary content from this file

Greg Caporaso

commit sha 7f0e58b9ba98b8850b55856c36c4c9c4659230c7

add numbers to file names in intro for consistency with other sections

push time in 4 days

issue comment gregcaporaso/2022.1-faes-tutorial

add custom `question` admonition

Confirming that substitutions worked for the long-line issue - thanks @thermokarst!

gregcaporaso

comment created time in 4 days

PR opened gregcaporaso/2022.1-faes-tutorial

remove TODO notes throughout text

Some of these were addressed and some were migrated to new issues.

+68 -184

0 comments

10 changed files

pr created time in 4 days

push event gregcaporaso/2022.1-faes-tutorial

Greg Caporaso

commit sha 82d9b313b2a0b99967d77659522a14281c10b7e4

fixed broken image reference

push time in 5 days

issue opened gregcaporaso/2022.1-faes-tutorial

importing.md chapter is incomplete

Either remove content to focus on the "why" part exclusively, or expand to include discussion of importing different types. (At the moment, importing different types is introduced but is incomplete.)

created time in 5 days

create branch gregcaporaso/2022.1-faes-tutorial

branch : todos

created branch time in 5 days

issue closed gregcaporaso/2022.1-faes-tutorial

Modify artifact numbering in feature table filtering tutorial

The tutorial contains an extraneous filtering step after filtering for autoFMT study samples:

[Image: Screenshot 2022-01-06 at 14 47 39]

This step was removed in the video tutorial. Consequently, downstream artifact names are now off by 1, e.g. filtered-table-1.qza in the video corresponds to filtered-table-2.qza in the written tutorial. All references to these file names should be updated in the written tutorial.
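
One way to apply this renumbering consistently is sketched below. This is only an illustration, not the change that was actually made: the function name, the `first_affected` parameter, and the example command fragment are hypothetical, and it assumes every affected reference follows the `filtered-table-N.qza` pattern.

```python
import re

# Matches artifact names such as "filtered-table-2.qza" and captures the number.
ARTIFACT_PATTERN = re.compile(r"filtered-table-(\d+)\.qza")


def shift_artifact_numbers(text: str, first_affected: int, offset: int = -1) -> str:
    """Renumber filtered-table-N.qza references where N >= first_affected.

    first_affected is a hypothetical parameter: the first artifact number
    that sits downstream of the removed filtering step.
    """

    def renumber(match: re.Match) -> str:
        number = int(match.group(1))
        if number >= first_affected:
            number += offset
        return f"filtered-table-{number}.qza"

    # A single pass over the text, so a renamed reference is never shifted twice.
    return ARTIFACT_PATTERN.sub(renumber, text)


# Hypothetical fragment of a tutorial command, for illustration only.
example = "--i-table filtered-table-2.qza --o-filtered-table filtered-table-3.qza"
print(shift_artifact_numbers(example, first_affected=2))
```

Because the substitution runs in one pass, references below `first_affected` are left untouched and later ones move down by exactly one.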

closed time in 5 days

lina-kim

issue closed gregcaporaso/2022.1-faes-tutorial

Modify taxonomy filtering command in phylogeny tutorial

Given the filtering step as outlined here, I'd recommend using the following command, or a variant of it, which I pulled from this post:

qiime taxa filter-table \
    --i-table table.qza \
    --i-taxonomy taxonomy.qza \
    --p-mode 'contains'  \
    --p-include 'p__' \
    --p-exclude 'p__;,Eukaryota,Chloroplast,Mitochondria,Unassigned,Unclassified' \
    --o-filtered-table ./table-no-ecmu.qza

Note that I set --p-exclude 'p__;,...'. This is more explicit about removing taxa that have only the p__ rank, i.e. no accompanying taxonomic label. That is, --p-include 'p__' will keep k__Bacteria; p__Proteobacteria; as well as any data that has an empty phylum rank such as k__Bacteria; p__;, which technically has no phylum classification.

Yes, the command above --p-include 'p__' is quite redundant and not needed with the given exclude command. I only place it there for the sake of completeness and explicitness for teaching the difference between p__ and p__;. :-)
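
The difference between matching `p__` and `p__;` can be illustrated with a small stand-alone sketch. This is not QIIME 2's implementation: `keep_feature` is a hypothetical helper that approximates `--p-mode 'contains'` as plain substring matching over comma-delimited terms, and the case-insensitive comparison is an assumption of the sketch. The taxonomy strings are made-up examples in Greengenes-style notation.

```python
def keep_feature(taxonomy: str, include: str = "", exclude: str = "",
                 delimiter: str = ",") -> bool:
    """Approximate 'contains'-mode filtering on one taxonomy annotation.

    Hypothetical helper: a feature is kept if it contains at least one
    include term (when any are given) and none of the exclude terms.
    Matching is case-insensitive here, which is an assumption of this sketch.
    """
    text = taxonomy.lower()
    include_terms = [t.lower() for t in include.split(delimiter) if t]
    exclude_terms = [t.lower() for t in exclude.split(delimiter) if t]
    if include_terms and not any(term in text for term in include_terms):
        return False
    return not any(term in text for term in exclude_terms)


INCLUDE = "p__"
EXCLUDE = "p__;,Eukaryota,Chloroplast,Mitochondria,Unassigned,Unclassified"

examples = [
    "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria",  # labeled phylum
    "k__Bacteria; p__; c__",                                   # empty phylum rank
    "k__Bacteria; p__Cyanobacteria; c__Chloroplast",           # plastid
    "Unassigned",                                              # no ranks at all
]

for taxonomy in examples:
    status = "keep" if keep_feature(taxonomy, INCLUDE, EXCLUDE) else "drop"
    print(f"{status}: {taxonomy}")
```

Under these assumptions only the first annotation is kept. The empty-phylum annotation passes the `p__` include term but is caught by the `p__;` exclude term, and an annotation with no `p__` substring at all fails the include term on its own.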

Alternatively, simply mention that it is recommended to remove plastid/organellar, and perhaps even host, sequences. This matters because mitochondria are a "family" within the class Alphaproteobacteria, and chloroplasts are a "class" within the phylum Cyanobacteria. So, if the user does not look at the family or class level, they may inadvertently retain these sequences.

NOTE: This is presented out of order in reference to the workshop schedule. That is, the material for taxonomic classification occurs after the phylogeny bit. So, perhaps this should be mentioned as something to consider later on to avoid user confusion? That is something like "If you already have taxonomy information you can also perform additional filtering like so..."

closed time in 5 days

mikerobeson

issue opened gregcaporaso/2022.1-faes-tutorial

add workshop-relevant links to book/index.md

This will include the workshop server, the workshop schedule, Zulip, etc.

created time in 5 days

issue closed gregcaporaso/2022.1-faes-tutorial

add classification with pre-trained, weighted classifier from ready-to-wear

To promote the idea of weighted classifiers and get improved taxonomic assignments, train a human stool GTDB classifier from the weights in ready-to-wear for use in this tutorial.

closed time in 5 days

gregcaporaso

issue comment gregcaporaso/2022.1-faes-tutorial

add classification with pre-trained, weighted classifier from ready-to-wear

Given that we're already recording videos I think this makes sense to save for a future iteration of this tutorial.

gregcaporaso

comment created time in 5 days

push event gregcaporaso/2022.1-faes-tutorial

Greg Caporaso

commit sha 4ef46ebd0479c0377e61fa3fb7a06c2166f5da12

lint

push time in 5 days

push event gregcaporaso/2022.1-faes-tutorial

Greg Caporaso

commit sha a5835956f0b5f593e61cae9b7f681817d94a9620

lint

push time in 5 days

issue comment gregcaporaso/2022.1-faes-tutorial

Modify taxonomy filtering command in phylogeny tutorial

Oh, and I just realized that --p-include 'p__' would filter the features that are exclusively labeled "Unassigned" (as in the Moving Pictures tutorial dataset).

mikerobeson

comment created time in 6 days

issue comment gregcaporaso/2022.1-faes-tutorial

Modify taxonomy filtering command in phylogeny tutorial

Thanks @mikerobeson. I just made some edits in #38. I checked this data set and didn't notice the "Unclassified" issue that you mentioned, but you're right - I've definitely seen that before. I just checked and it actually pops up in the Moving Pictures tutorial. Since it's not an issue with this data set, I opted instead to use it as an opportunity to plug the forum. I added a note admonition that talks about filtering on these terms, and linked the forum post you reference in this issue to let readers know they can find useful suggestions by reading the forum.

mikerobeson

comment created time in 6 days

create branch gregcaporaso/2022.1-faes-tutorial

branch : various-updates

created branch time in 6 days

issue comment gregcaporaso/2022.1-faes-tutorial

Modify taxonomy filtering command in phylogeny tutorial

@mikerobeson, I'm thinking of adapting the --p-exclude value to --p-exclude 'p__;,Chloroplast,Mitochondria'. Does that work for you?

Since we're using Greengenes here, the other terms you have in there shouldn't hit anything (please correct me if I'm wrong about that). I think keeping 'Eukaryota' might be confusing since GG only annotates bacteria and archaea. Also, would your filter toss sequences that had valid (say) family assignments but were labeled 'Unassigned' at the genus level?

mikerobeson

comment created time in 6 days

issue comment gregcaporaso/2022.1-faes-tutorial

Modify artifact numbering in feature table filtering tutorial

Thanks @lina-kim! I'm making this change now.

lina-kim

comment created time in 6 days

issue comment gregcaporaso/2022.1-faes-tutorial

Modify taxonomy filtering command in phylogeny tutorial

Thanks @mikerobeson - I agree with these suggestions and I'll make this change.

> NOTE: This is presented out of order in reference to the workshop schedule.

We're planning to adjust the workshop schedule so that taxonomy is assigned before the phylogenetic reconstruction step so we can apply this step. Sorry for the confusion!

mikerobeson

comment created time in 6 days

issue closed gregcaporaso/2022.1-faes-tutorial

add notebook (or section) on importing data through DADA2

This should focus on a small subset of the data - @ebolyen can help to identify which.

closed time in 6 days

gregcaporaso

issue comment gregcaporaso/2022.1-faes-tutorial

add notebook (or section) on importing data through DADA2

This was addressed in #34.

gregcaporaso

comment created time in 6 days

push event gregcaporaso/2022.1-faes-tutorial

Greg Caporaso

commit sha 6d9a8e9325819d9be909ceb716ad31a2a505ce7e

add exploring metadata content

Greg Caporaso

commit sha 2d6575db43ee4f38aff041795fcc3ee60549daea

added commands for upstream steps

Greg Caporaso

commit sha 4e794c09e1fddc672048bd1788c4148be001bfcb

reduced trunc_len params to retain sequences during denoise

Greg Caporaso

commit sha 925c712e99c4d2796140d9e71fcaab820eb70332

updated parameter values following review of q-score plots

Greg Caporaso

commit sha 6521a15f6ec2dfe4655eead7bbb0824268ae49de

adds basic text to upstream tutorial sections

Matthew Dillon

commit sha 18bb1fb4b7f53976bb1e53b9f0f97e6435f343be

squash: include makefile

Matthew Dillon

commit sha bdd6faaebb2d269e9bab0540514db541a78481ef

squash: revert debugging

Matthew Dillon

commit sha 25a0ac107983161a1aff02946e172a56bd512193

squash: removing extra whitespace

Greg Caporaso

commit sha 03d988174f98da7105fe9086c7bdaaab44b10466

Merge pull request #34 from gregcaporaso/upstream-steps

- adds commands and text for upstream steps
- renames directories under `book` to execute in desired order
- adds makefile to standardize build procedure

Thanks for the help @thermokarst and @ebolyen!

push time in 9 days

PR merged gregcaporaso/2022.1-faes-tutorial

add commands for upstream steps

This contains all of the commands, but a few things need to be addressed before merge:

  • [x] DADA2 is filtering most sequences, not sure why yet (update: after additional review of the quality score plots, I noticed that reads vary slightly in length, so my previous parameter settings were based on positions that were informed by fewer reads. This is addressed in the tutorial content now and the commands are good to go, but we should figure out what's going on so we can answer questions about it)
  • [x] add some basic text
  • ~sequence data is currently downloaded from Dropbox, need to get that to AWS~ now separate issue (#35)
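
The effect described in the first item can be sketched numerically. DADA2 discards reads shorter than the truncation length, so when reads vary slightly in length, a truncation position near the maximum read length is reached by only a fraction of the reads. The read lengths and candidate values below are invented for illustration and are not taken from the tutorial data; `fraction_retained` is a hypothetical helper, not part of DADA2 or QIIME 2.

```python
def fraction_retained(read_lengths, trunc_len):
    """Fraction of reads long enough to survive truncation at trunc_len.

    Hypothetical helper: models only the length-based discard (reads
    shorter than trunc_len are dropped), not quality or chimera filtering.
    """
    if not read_lengths:
        return 0.0
    kept = sum(1 for length in read_lengths if length >= trunc_len)
    return kept / len(read_lengths)


# Invented length distribution: a few reads reach 150 bases, most are shorter.
read_lengths = [150] * 2 + [148] * 5 + [145] * 3

for trunc_len in (150, 148, 145):
    retained = fraction_retained(read_lengths, trunc_len)
    print(f"trunc_len={trunc_len}: {retained:.0%} of reads retained")
```

In this toy distribution, lowering the truncation length from 150 to 145 moves retention from a small minority of reads to all of them, which is the kind of behavior that reducing the `trunc_len` parameters is meant to address.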

This PR also adds numbers to the subdirectories under book so they execute in the intended order (the downstream chapters were previously executing before the upstream chapters).
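
A minimal sketch of why numeric prefixes help, assuming the build processes subdirectories in lexicographic order (an assumption, not something confirmed here). The directory names are hypothetical placeholders, not the actual names under `book`.

```python
def execution_order(directory_names):
    """Return names in plain lexicographic order.

    Assumption of this sketch: the build visits subdirectories in the
    same order that sorted() produces.
    """
    return sorted(directory_names)


# Hypothetical names: without prefixes, "downstream" sorts before "upstream".
print(execution_order(["upstream", "downstream"]))

# Zero-padded numeric prefixes make the intended order explicit.
print(execution_order(["020-downstream", "010-upstream"]))
```

Zero-padding matters in this scheme, since an unpadded `10-...` would sort before `2-...` as a string.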

+463 -209

0 comments

37 changed files

gregcaporaso

pr closed time in 9 days

issue opened gregcaporaso/2022.1-faes-tutorial

transfer upstream tutorial data from Dropbox to AWS

The data in the upstream tutorial (currently in PR #34) is stored in Dropbox - this should be transferred to the same location as the downstream data on AWS, and the link should be updated in the upstream tutorial. There is only one relevant file this time (fastq-casava.zip).

created time in 9 days

push event gregcaporaso/2022.1-faes-tutorial

Greg Caporaso

commit sha 6521a15f6ec2dfe4655eead7bbb0824268ae49de

adds basic text to upstream tutorial sections

push time in 9 days

push event gregcaporaso/2022.1-faes-tutorial

Greg Caporaso

commit sha 925c712e99c4d2796140d9e71fcaab820eb70332

updated parameter values following review of q-score plots

push time in 11 days
