10XGenomics/rust-shardio 35

Out-of-memory sorting of large datasets for map / reduce style processing

10XGenomics/rust-bwa 10

Rust wrapper of the BWA C API

nlhepler/BioExt 4

A few handy bioinformatics tools not already within BioPython

dalexander/bauhaus 3

minimal tertiary analysis for PacBio

nlhepler/cmap-stack 3

Contact Map Stacker

armintoepfer/uhu 2

Sandbox for PacBio Tools

nlhepler/fakemp 2

Fake multiprocessing objects

10XGenomics/cellranger-dna 1

Single Cell DNA Copy Number Profiling

dalexander/PRmm 1

PulseRecognizer minus minus

create branch nlhepler/rust-intel-mkl

branch : lh/fix-glob-imports

created branch time in 2 months

PullRequestReviewEvent
PullRequestReviewEvent

Pull request review comment martian-lang/martian-rust

Various fixes and cleanups (including breaking changes)

 impl RustAdapterInfo {
     }
 }

-pub fn make_timestamp(datetime: DateTime<Local>) -> String {
-    datetime.format("%Y-%m-%d %H:%M:%S").to_string()
+pub fn make_timestamp(datetime: impl Into<OffsetDateTime>) -> String {
+    _make_timestamp(datetime.into())
+}
+
+fn _make_timestamp(datetime: OffsetDateTime) -> String {
+    // Convert to local time (if necessary)
+    let datetime = datetime.to_offset(UtcOffset::local_offset_at(datetime).unwrap());
+    datetime.format(DATE_FORMAT).unwrap()
 }

 pub fn make_timestamp_now() -> String {
-    make_timestamp(Local::now())
+    make_timestamp(SystemTime::now())
 }

 impl Metadata {
-    pub fn new(args: Vec<String>) -> Metadata {
+    pub fn new(mut args: Vec<String>) -> Metadata {
         // # Take options from command line.
         // shell_cmd, stagecode_path, metadata_path, files_path, run_file = argv
+        args.truncate(5);

Does this intentionally ignore any additional arguments, so that they are silently dropped instead of hitting the new assert below?

adam-azarchs

comment created time in 2 months
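
To make the question concrete, here is a minimal sketch of how `Vec::truncate` behaves: elements past the limit are dropped without any error, so a later length check never sees them. The helper names `take_adapter_args` and `take_adapter_args_strict` are hypothetical and are not taken from martian-rust.

```rust
/// Keep at most the first five arguments (hypothetical helper).
/// `Vec::truncate` does nothing when the vector is already short enough
/// and never reports how many elements it discarded.
fn take_adapter_args(mut args: Vec<String>) -> Vec<String> {
    args.truncate(5);
    args
}

/// Stricter alternative (also hypothetical): reject unexpected extra
/// arguments instead of silently dropping them.
fn take_adapter_args_strict(args: Vec<String>) -> Result<Vec<String>, String> {
    if args.len() > 5 {
        return Err(format!("expected at most 5 arguments, got {}", args.len()));
    }
    Ok(args)
}
```

With seven arguments, the first helper returns five of them and the second returns an error; which behavior is wanted is exactly what the review comment asks.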

PR opened 10XGenomics/hdf5-rust

restore base package and zlib

required for headers and linking

I didn't catch this previously because I didn't run cargo clean; let this stand as a reminder for the future!

+10 -0

0 comments

1 changed file

pr created time in 2 months

push event 10XGenomics/hdf5-rust

Lance Hepler

commit sha 3e53344a3b3042f8a294b6e49fa42aa92120c92b

restore base package and zlib required for headers and linking

push time in 2 months

PR opened 10XGenomics/hdf5-rust

osx/arm64: use hdf5-static package
+3 -8

0 comments

1 changed file

pr created time in 2 months

create branch 10XGenomics/hdf5-rust

branch : lh/conda-static

created branch time in 2 months

pull request comment fastqc-rs/fastqc-rs

implement some simple speedups

Okay, nothing major changes wrt findings.

nlhepler

comment created time in 2 months

push event nlhepler/fastqc-rs

Felix Wiegand

commit sha 2903a69e5dba9625293aa51ea3423f41684a7806

Minor optimization

Lance Hepler

commit sha 079a51f6e43a8cb2e1f322369901591442dd9a42

simple speedups
- don't do silly things to get GC content
- use AHash variant, much faster than SipHash

Lance Hepler

commit sha b573b59cc4e2dd0f3ec28cc86855151d4f1a022f

port to use rustc_hash; also convert a couple of hashmaps to vector histograms. At this point we are runtime-equivalent with Java fastqc and, per unit of CPU time spent, faster.

push time in 2 months

pull request comment fastqc-rs/fastqc-rs

implement some simple speedups

Ahh, I was working from a previous version. Let me update to the latest main, rebenchmark, and rebase.

nlhepler

comment created time in 2 months

PR opened fastqc-rs/fastqc-rs

implement some simple speedups

A few simple things:

  • use a HashMap with a more performant hash function (this one is used by rustc, we don't need the cryptographic guarantees provided by the stdlib's HashMap)
  • convert to histograms where possible (HashMaps are slow by comparison)
  • avoid some obvious unnecessary allocations (allocations are slow, especially in inner loops)
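
The first two points can be sketched roughly as follows. This is an illustration only, not the code from the PR: the PR uses `rustc_hash`, whereas the `Fnv1a` hasher below is a std-only stand-in for "any fast non-cryptographic hasher", and the function names are hypothetical.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// FNV-1a, a simple non-cryptographic hasher. Used here only as a
/// std-only stand-in for a crate such as rustc_hash.
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf29ce484222325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x100000001b3); // FNV 64-bit prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// A HashMap that swaps the default SipHash for the hasher above.
type FastMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;

/// Count bases with a hash map keyed by byte.
fn count_with_map(seq: &[u8]) -> FastMap<u8, u64> {
    let mut counts = FastMap::default();
    for &b in seq {
        *counts.entry(b).or_insert(0) += 1;
    }
    counts
}

/// Count bases with a fixed-size histogram indexed by byte value:
/// no hashing and no heap allocation at all.
fn count_with_histogram(seq: &[u8]) -> [u64; 256] {
    let mut hist = [0u64; 256];
    for &b in seq {
        hist[usize::from(b)] += 1;
    }
    hist
}

/// GC fraction of a read, computed from the histogram.
fn gc_fraction(seq: &[u8]) -> f64 {
    if seq.is_empty() {
        return 0.0;
    }
    let hist = count_with_histogram(seq);
    let gc = hist[usize::from(b'G')] + hist[usize::from(b'C')];
    gc as f64 / seq.len() as f64
}
```

When the key space is small and dense, as with byte values or quality scores, the histogram form replaces every hash lookup with a plain array index.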

This probably won't get us as fast as the C++ version (I am presuming that version is highly optimized and I have not benchmarked it), but these changes do speed things up a fair amount.

Absolutely unscientifically benchmarked on a random 808MB FASTQ file we have lying around (output of /usr/bin/time -v):

Before:

        User time (seconds): 225.84
        System time (seconds): 2.07
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 3:49.23
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 1357720
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 10
        Minor (reclaiming a frame) page faults: 613361
        Voluntary context switches: 2442
        Involuntary context switches: 490
        Swaps: 0
        File system inputs: 28672
        File system outputs: 946648
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

After:

        User time (seconds): 106.30
        System time (seconds): 0.94
        Percent of CPU this job got: 98%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 1:48.67
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 245360
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 9
        Minor (reclaiming a frame) page faults: 9967
        Voluntary context switches: 1982
        Involuntary context switches: 351
        Swaps: 0
        File system inputs: 70632
        File system outputs: 449744
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
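
For reference, the ratios implied by the two listings can be worked out with a small sketch like the one below. The helper names are hypothetical; the input values are the ones printed above.

```rust
/// Parse an elapsed time of the form "m:ss.ss" (the form shown in the
/// listings above for runs under an hour) into seconds.
fn parse_elapsed(s: &str) -> Option<f64> {
    let (minutes, seconds) = s.split_once(':')?;
    let minutes: f64 = minutes.parse().ok()?;
    let seconds: f64 = seconds.parse().ok()?;
    Some(minutes * 60.0 + seconds)
}

/// Ratio of a "before" measurement to an "after" measurement.
fn speedup(before: f64, after: f64) -> f64 {
    before / after
}
```

Applied to the numbers above, `speedup` gives roughly 2.1x for both wall-clock time (3:49.23 vs 1:48.67) and user CPU time (225.84 s vs 106.30 s), and the maximum resident set size drops by a factor of about 5.5 (1357720 kB vs 245360 kB).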

PS: I did have to convert the base quality bits to a custom quartiles computation, which did have the advantage of allowing us to drop plotters (which brings in a huge number of complex dependencies).
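
A quartiles computation over a quality histogram could look something like the sketch below. This is an assumption about the general approach (nearest-rank quantiles over counts), not the code in the PR, and the function names are hypothetical.

```rust
/// Nearest-rank quantile from a histogram, where `hist[v]` is the number
/// of observations with value `v`. Returns None for an empty histogram.
fn quantile_from_histogram(hist: &[u64], q: f64) -> Option<usize> {
    let total: u64 = hist.iter().sum();
    if total == 0 {
        return None;
    }
    // 1-based rank of the observation we are looking for.
    let rank = ((q * total as f64).ceil() as u64).clamp(1, total);
    let mut seen = 0u64;
    for (value, &count) in hist.iter().enumerate() {
        seen += count;
        if seen >= rank {
            return Some(value);
        }
    }
    None
}

/// Lower quartile, median and upper quartile in one call.
fn quartiles(hist: &[u64]) -> Option<(usize, usize, usize)> {
    Some((
        quantile_from_histogram(hist, 0.25)?,
        quantile_from_histogram(hist, 0.50)?,
        quantile_from_histogram(hist, 0.75)?,
    ))
}
```

Because the histogram is a single pass over a small array, this needs no sorting and no per-read allocation.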

+144 -53

0 comments

2 changed files

pr created time in 2 months

push eventnlhepler/fastqc-rs

Lance Hepler

commit sha e4332bccbb0d9bf9348bcd52441a5d75f7525efc

port to use rustc_hash; also convert a couple of hashmaps to vector histograms. At this point we are runtime-equivalent with Java fastqc and, per unit of CPU time spent, faster.

push time in 2 months

started bencbartlett/3D-printed-mirror-array

started time in 2 months

create branch nlhepler/fastqc-rs

branch : lh/simple-speedups

created branch time in 2 months

fork nlhepler/fastqc-rs

A quality control tool for FASTQ files written in rust

fork in 2 months

PullRequestReviewEvent

Pull request review comment 10XGenomics/rust-shardio

Fix clippy lints and enable clippy in CI.

 where
     /// What key is next among the active set
     fn peek_active_next(&self) -> Option<&<S as SortKey<T>>::Key> {
         let n = self.active_queue.peek_min();
-        n.map(|v| v.current_key())
+        n.map(ShardIter::current_key)

I mean, does this really? I would be shocked if the compiler doesn't inline this single method call. And this is arguably more readable.

adam-azarchs

comment created time in 2 months
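
The two spellings in that diff behave identically; a minimal sketch of the pattern, with `ShardCursor` as a hypothetical stand-in for the real `ShardIter` type:

```rust
/// Hypothetical stand-in for the iterator type in the diff above.
struct ShardCursor {
    key: u32,
}

impl ShardCursor {
    fn current_key(&self) -> &u32 {
        &self.key
    }
}

/// Closure form, as on the removed line.
fn peek_with_closure(next: Option<&ShardCursor>) -> Option<&u32> {
    next.map(|v| v.current_key())
}

/// Method-path form, as on the added line.
fn peek_with_method_path(next: Option<&ShardCursor>) -> Option<&u32> {
    next.map(ShardCursor::current_key)
}
```

Both functions return the same value for the same input, so choosing between them is a matter of readability.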

PullRequestReviewEvent
PullRequestReviewEvent
PullRequestReviewEvent

delete branch 10XGenomics/hdf5-rust

delete branch : azarchs/build-rerun

delete time in 3 months

pull request comment 10XGenomics/hdf5-rust

Add cargo:rerun-if- directives for conda feature.

Yes! This should save a lot of time.

adam-azarchs

comment created time in 3 months

push event 10XGenomics/hdf5-rust

Adam Azarchs

commit sha 851d8737243bdb06499cca29cff08c95d16cb0d2

Add cargo:rerun-if- directives for conda feature. Currently it isn't emitting them, leading to unnecessary reruns of the build script, which are especially expensive due to the download.

Lance Hepler

commit sha 1a6d49ff5cd9fff3fb3d53550ad9f5c1fe1a3fe3

Merge pull request #4 from 10XGenomics/azarchs/build-rerun

push time in 3 months

PR merged 10XGenomics/hdf5-rust

Add cargo:rerun-if- directives for conda feature.

Currently it isn't emitting them, leading to unnecessary reruns of the build script, which are especially expensive due to the download.

+2 -0

0 comments

1 changed file

adam-azarchs

pr closed time in 3 months
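
For context, a build script tells Cargo when to rerun by printing `cargo:rerun-if-changed=PATH` and `cargo:rerun-if-env-changed=VAR` lines to stdout. The sketch below only assembles such lines; the function name and the example path and variable are hypothetical, and the actual directives added in this PR are not shown in the feed.

```rust
/// Build the directive lines a build script would print to stdout.
/// Once a script emits any rerun-if directive, Cargo reruns it only
/// when one of the listed files or environment variables changes.
fn rerun_directives(paths: &[&str], env_vars: &[&str]) -> Vec<String> {
    let mut lines = Vec::new();
    for path in paths {
        lines.push(format!("cargo:rerun-if-changed={}", path));
    }
    for var in env_vars {
        lines.push(format!("cargo:rerun-if-env-changed={}", var));
    }
    lines
}
```

In a `build.rs`, each returned line would be passed to `println!`, for example for `rerun_directives(&["build.rs"], &["EXAMPLE_PREFIX"])`, where `EXAMPLE_PREFIX` is a placeholder name.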