profile
Nicholas Nethercote (nnethercote) · Mozilla · Melbourne, Australia · he/him

nnethercote/counts 31

A tool for ad hoc profiling

mozilla/fix-stacks 1

This program post-processes the stack frames produced by `MozFormatCodeAddress()`.

nnethercote/pdf.js 1

PDF Reader in JavaScript

nnethercote/angle 0

Clone of https://chromium.googlesource.com/angle/angle with Gecko-specific patches. Talk to vlad, jgilbert, or kamidphish for more info.

nnethercote/B2G 0

Boot to Gecko aims to create a complete, standalone operating system for the open web.

nnethercote/chalk 0

A PROLOG-ish interpreter written in Rust, intended eventually for use in the compiler

nnethercote/cranelift 0

Cranelift code generator (formerly, Cretonne)

nnethercote/ena 0

An implementation of union-find / congruence-closure in Rust. Extracted from rustc for independent experimentation.

nnethercote/euclid 0

Geometry primitives (basic linear algebra) for Rust

pull request comment rust-lang/rust

Miscellaneous inlining improvements

@bors try @rust-timer queue

nnethercote

comment created time in 8 hours

PR opened rust-lang/rust

Miscellaneous inlining improvements

These commits inline some hot functions that aren't currently inlined, for some speed wins.
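(A minimal sketch of the general technique, not this PR's actual diff; the type and function below are made up.) Marking a small, non-generic function `#[inline]` makes its body available for inlining across crate boundaries, which otherwise generally only happens with LTO:

    // Hypothetical example: a small, frequently-called accessor.
    pub struct SymbolTable {
        names: Vec<String>,
    }

    impl SymbolTable {
        // Without #[inline], callers in other crates pay a call per lookup;
        // with it, the body can be inlined at the call site.
        #[inline]
        pub fn name(&self, idx: usize) -> &str {
            &self.names[idx]
        }
    }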

r? @Centril

+22 -1

0 comments

3 changed files

pr created time in 8 hours

create branch nnethercote/rust

branch : misc-inlining

created branch time in 8 hours

push event nnethercote/rust

Steven Degutis

commit sha ac19dffd1eaea34c3861324c2588f9cb1f1489f5

Updating str.chars docs to mention crates.io. This might spare someone else a little time searching the stdlib for unicode/grapheme support.

view details

Mazdak Farrokhzad

commit sha cec2a9fad057f71fc640392ba3fa47602aea12f6

macro_legacy_warnings -> error

view details

Dylan MacKenzie

commit sha 5c473a059e26614b65414cfb8cf75c283cda5a87

Don't print block exit state if unchanged

view details

Vadim Petrochenkov

commit sha dcad07af8aa831344fd3be353c71379854637c21

parser: `macro_rules` is a weak keyword

view details

Amos Onn

commit sha 302b9e4b540cc352e75d3de6f803a99147107a50

Improve #Safety in various methods in core::ptr s/for reads and writes/for both ...

view details

Amos Onn

commit sha 351782d30aaa6e15204e17ecdd51ac1e712685cf

Improve #Safety of core::ptr::replace Added missing condition: `dst` must be readable

view details

Amos Onn

commit sha 40ca16794456e9b1520bba6d887a176395f127f0

Improve #Safety in various methods in core::ptr For all methods which read a value of type T, `read`, `read_unaligned`, `read_volatile` and `replace`, added missing constraint: The value they point to must be properly initialized

view details

Guillaume Gomez

commit sha cadf9efad123a62472cad45f22569747cc599256

Clean up E0309 explanation

view details

Camille GILLOT

commit sha d5691209b6d5fe5e47560b1db7b822dbeb0880fd

Move librustc/{traits,infer} to librustc_infer.

view details

Camille GILLOT

commit sha 187a9741d3cd63dd78571e2a0e08344aef05f51b

Make librustc compile.

view details

Camille GILLOT

commit sha f07e8891458259bb4373bb6aa59d158304f637b1

Make librustc_infer compile.

view details

Camille GILLOT

commit sha 4b57cb3cbed8674aa480bff450affa62ac6b75bf

Make librustc_typeck compile.

view details

Camille GILLOT

commit sha 1637aab15e175b5e0dc14947ffa946804420d414

Make librustc_mir compile.

view details

Camille GILLOT

commit sha 2519f4a0a336993fc2e494a194807c56060256b3

Make librustc_traits compile.

view details

Camille GILLOT

commit sha bee6a5ac1274201e7da2081a5aff6b3b1f407185

Other crates.

view details

Camille GILLOT

commit sha 795673ae2060198cdb09c6ded8d303c244dac6fd

Remove librustc_infer crate re-exports.

view details

Camille GILLOT

commit sha 0b93cfc1ee3d61987e9f3229370d79acd51544a1

Prune features.

view details

Camille GILLOT

commit sha 5d5720835329230c60cb7b4f56e2a9b234dae6cf

Gate macro use.

view details

Camille GILLOT

commit sha e88500b5e18bbbad2323944d3c23f8a4465eb147

Prune rustc dependencies.

view details

Amos Onn

commit sha 943e65396d7bc7b91bcc30407d323d06f4b20a22

Improve #Safety of core::ptr::drop_in_place Added missing conditions: - Valid for writes - Valid for destructing

view details

push time in 17 hours

pull request comment getsentry/symbolic

Improve `.o` handling on Mac

@jan-auer: Any new thoughts here? The project I'm working on currently uses a fork of Symbolic that contains this PR, because it needs to work directly with .o files on Mac. It's not an ideal situation, and it would be great if it could be resolved. Thanks.

nnethercote

comment created time in 17 hours

PR closed rust-lang/rust

Tweak LEB128 reading some more. [S-waiting-on-review]

PR #69050 changed LEB128 reading and writing. After it landed I did some double-checking and found that the writing changes were universally a speed-up, but the reading changes were not. I'm not exactly sure why; perhaps there was a quirk of inlining in the particular revision I was originally working from.

This commit reverts some of the reading changes, while still avoiding unsafe code. I have checked it on multiple revisions and the speed-ups seem to be robust.
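For context, the reading loop this PR keeps looks roughly like the following (a sketch for one integer width; the real code is a macro instantiated for several types). The indexed access is bounds-checked, so no unsafe is needed:

    // Sketch of a safe LEB128 decoder for u64.
    pub fn read_u64_leb128(slice: &[u8]) -> (u64, usize) {
        let mut result = 0;
        let mut shift = 0;
        let mut position = 0;
        loop {
            let byte = slice[position]; // bounds-checked, no unsafe
            position += 1;
            if (byte & 0x80) == 0 {
                // Final byte: the continuation bit is clear, so no mask is needed.
                result |= (byte as u64) << shift;
                return (result, position);
            } else {
                result |= ((byte & 0x7F) as u64) << shift;
            }
            shift += 7;
        }
    }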

r? @michaelwoerister

+29 -5

12 comments

1 changed file

nnethercote

pr closed time in 18 hours

pull request comment rust-lang/rust

Tweak LEB128 reading some more.

So, we have two quite different sets of results when the same change is measured on top of different revisions. I'm going to abandon this PR for the following reasons.

  • The instruction count regression from the first run is a bit worse than the instruction count improvement from the second run.
  • Both runs look like regressions if you look at the cycle counts.
  • This PR makes the code uglier.
nnethercote

comment created time in 18 hours

issue comment rust-lang/rust

Re-evaluate `Hash{Set,Map}` vs `FxHash{Set,Map}` once #69152 lands

Note to self: `x.py build --warnings=warn` allows the warning to be ignored. Thanks to @Mark-Simulacrum for the tip.

nnethercote

comment created time in 2 days

pull request comment rust-lang/rust

Tweak LEB128 reading some more.

The CI results show a clear regression, in contrast to my local results, hmm. I have rebased against a more recent revision. Let's try doing another perf CI run, just for interest's sake.

@bors try @rust-timer queue

nnethercote

comment created time in 2 days

push event nnethercote/rust

Tobias Thiel

commit sha 51021b1d421a7d055ff44f9b6afe11377b825c5c

rustc_session: allow overriding lint level of individual lints from a group

view details

Tobias Thiel

commit sha 34417792deed6f0e570e9c7b01a24f1d05b70519

tools/compiletest: fix argument ordering for allowing unused in ui & compile-fail tests

view details

Tobias Thiel

commit sha 21edd2ae2cc4c06d8ea98051c47d24dc3c4e2238

convert match statement to if let

view details

Tobias Thiel

commit sha 3fc9253a5a27771c72429a738d5379c34e1cd924

rustc: add lint level cli ordering into the documentation

view details

Aaron Hill

commit sha 90afc0765e5e536af6307b63e1655a38df06e235

Use a `ParamEnvAnd<Predicate>` for caching in `ObligationForest` Previously, we used a plain `Predicate` to cache results (e.g. successes and failures) in ObligationForest. However, fulfillment depends on the precise `ParamEnv` used, so this is unsound in general. This commit changes the impl of `ForestObligation` for `PendingPredicateObligation` to use `ParamEnvAnd<Predicate>` instead of `Predicate` for the associated type. The associated type and method are renamed from 'predicate' to 'cache_key' to reflect the fact that type is no longer just a predicate.

view details

Tom Jakubowski

commit sha b60f08bd6d3fbe784eb47a57e0c41954454af3dd

rustdoc: NodeId is now DefId

view details

Tom Jakubowski

commit sha 05c6f329e785c9b53a50217de0f21df906ae7ba0

rustdoc: emit JS paths for struct-like variants On the backend, rustdoc now emits `paths` entries to a crate's search index for struct-like enum variants, and index items of type structfield which belong to such variants point to their variant parents in the `paths` table, rather than their enum grandparents. The path entry for a variant is the fully qualified module path plus the enum name. On the frontend, the search code recognizes structfields belonging to structlike variants in the `paths` table and re-constructs the URL to the field's anchor on the enum documentation page. closes #16017

view details

Guillaume Gomez

commit sha 8ee30dbc1b07ad7fc842ceee6d6729a1377f7a36

Add tests for struct variant field in search

view details

Daniel Henry-Mantilla

commit sha 60274a95fef57a18113f7c48be68be31ece860eb

Added From<Vec<NonZeroU8>> for CString Updated tracking issue number Added safeguards for transmute_vec potentially being factored out elsewhere Clarified comment about avoiding mem::forget Removed unneeded unstable guard Added back a stability annotation for CI Minor documentation improvements Thanks to @Centril's code review Co-Authored-By: Mazdak Farrokhzad <twingoow@gmail.com> Improved layout checks, type annotations and removed unaccurate comment Removed unnecessary check on array layout Adapt the stability annotation to the new 1.41 milestone Co-Authored-By: Mazdak Farrokhzad <twingoow@gmail.com> Simplify the implementation. Use `Vec::into_raw_parts` instead of a manual implementation of `Vec::transmute`. If `Vec::into_raw_parts` uses `NonNull` instead, then the code here will need to be adjusted to take it into account (issue #65816) Reduce the whitespace of safety comments

view details

Mazdak Farrokhzad

commit sha 6509db844315882db7ec0b624ca1e7b04d72568d

or_patterns: harden bindings test

view details

Mazdak Farrokhzad

commit sha 29437e55a56c1c1251ae5f7276f3e95dac4b609a

or_patterns: rename previous test

view details

Mazdak Farrokhzad

commit sha 17e632d382dfae46e9dfa684db9bddec3e8951a7

or_patterns: test default binding modes

view details

Mazdak Farrokhzad

commit sha b5aca3128d5c0ee2441ec9ca9a9c3ae4f391ef16

typeck: refactor default binding mode logic & improve docs

view details

Mikhail Babenko

commit sha 953f6ecb6adc37b4f8e52102c1e7ca86cc5bc92c

fix lifetime shadowing check in GATs

view details

varkor

commit sha 38060567e89bb142e8a060d91bf53f7e82eaaae6

Correct inference of primitive operand type behind binary operation

view details

varkor

commit sha 0276d7a32e1c83abc3106f7b36b711faf1f74dff

Add more tests

view details

Mazdak Farrokhzad

commit sha ebbaf4611a9605412d2aa31c8ebaf0745557fff0

simplify_try: address some of eddyb's comments

view details

John Kåre Alsaker

commit sha 77ab0d091e32ac9ec4154bba1727fc3937975d64

Construct query job latches on-demand

view details

John Kåre Alsaker

commit sha 19c1012483fe8fb3c793a6ba83f97d896c6a6c98

Use a counter instead of pointers to the stack

view details

John Kåre Alsaker

commit sha 5de82b926486edc54d7183971fde901be9445c6b

Drop the lock guard

view details

push time in 2 days

push event nnethercote/rustc-perf

Nicholas Nethercote

commit sha e140dfe210a0a8365ec4eaa2f3b1446093436016

Use `--show-percs` with Cachegrind and Callgrind.

view details

Nicholas Nethercote

commit sha 44250725b3629bec0b7a096af8e64d15f3e8eca1

Remove exp-DHAT support, DHAT is much better.

view details

push time in 2 days

push event rust-lang-nursery/rustc-perf

Nicholas Nethercote

commit sha e140dfe210a0a8365ec4eaa2f3b1446093436016

Use `--show-percs` with Cachegrind and Callgrind.

view details

Nicholas Nethercote

commit sha 44250725b3629bec0b7a096af8e64d15f3e8eca1

Remove exp-DHAT support, DHAT is much better.

view details

push time in 2 days

push event nnethercote/rustc-perf

Mark Rousskov

commit sha fb4c7363c98a50830d47aee16d4bbb25891f3bd9

Update noise run link

view details

push time in 2 days

delete branch nnethercote/rust

delete branch : 68848-follow-up

delete time in 2 days

pull request comment rust-lang/rust

Tweak LEB128 reading some more.

@bors try @rust-timer queue

nnethercote

comment created time in 4 days

pull request comment rust-lang/rust

Tweak LEB128 reading some more.

Local check results:

clap-rs-check
        avg: -0.9%      min: -1.9%      max: 0.0%
packed-simd-check
        avg: -0.3%      min: -0.8%      max: 0.0%
issue-46449-check
        avg: 0.4%       min: 0.3%       max: 0.6%
tuple-stress-check
        avg: -0.1%      min: -0.5%      max: 0.0%
wg-grammar-check
        avg: -0.2%      min: -0.5%      max: 0.0%
helloworld-check
        avg: 0.2%       min: -0.2%      max: 0.5%
keccak-check
        avg: -0.1%      min: -0.4%      max: 0.0%
webrender-check
        avg: -0.2%      min: -0.4%      max: 0.0%
regex-check
        avg: -0.2%      min: -0.4%      max: 0.1%
ripgrep-check
        avg: -0.1%      min: -0.4%      max: 0.1%
piston-image-check
        avg: -0.2%      min: -0.4%      max: 0.1%
unify-linearly-check
        avg: 0.1%       min: -0.3%      max: 0.4%
serde-check
        avg: -0.1%      min: -0.4%      max: 0.0%
cranelift-codegen-check
        avg: -0.1%      min: -0.4%      max: 0.0%
script-servo-check
        avg: -0.2%      min: -0.4%      max: 0.0%
style-servo-check
        avg: -0.1%      min: -0.3%      max: 0.0%
wf-projection-stress-65510-che...
        avg: -0.1%      min: -0.3%      max: 0.0%
coercions-check
        avg: 0.1%?      min: 0.0%?      max: 0.3%?
cargo-check
        avg: -0.1%      min: -0.3%      max: 0.0%
trait-stress-check
        avg: 0.1%       min: -0.0%      max: 0.3%
unused-warnings-check
        avg: -0.1%      min: -0.3%      max: 0.0%
deeply-nested-check
        avg: 0.1%       min: -0.3%      max: 0.3%
futures-check
        avg: -0.1%      min: -0.3%      max: 0.1%
tokio-webpush-simple-check
        avg: 0.1%       min: -0.3%      max: 0.3%
syn-check
        avg: -0.1%      min: -0.3%      max: 0.1%
hyper-2-check
        avg: -0.1%      min: -0.3%      max: 0.1%
webrender-wrench-check
        avg: -0.1%      min: -0.3%      max: 0.2%
await-call-tree-check
        avg: 0.1%       min: -0.3%      max: 0.2%
serde-serde_derive-check
        avg: -0.1%      min: -0.2%      max: 0.0%
encoding-check
        avg: -0.1%      min: -0.2%      max: 0.1%
unicode_normalization-check
        avg: -0.1%      min: -0.2%      max: 0.0%
html5ever-check
        avg: -0.1%      min: -0.2%      max: 0.1%
ucd-check
        avg: -0.1%      min: -0.2%      max: 0.0%
inflate-check
        avg: -0.0%      min: -0.2%      max: 0.0%
regression-31157-check
        avg: 0.0%       min: -0.1%      max: 0.2%
deep-vector-check
        avg: -0.0%      min: -0.1%      max: 0.0%
ctfe-stress-4-check
        avg: -0.0%?     min: -0.1%?     max: 0.1%?
token-stream-stress-check
        avg: -0.0%      min: -0.1%      max: 0.0%
nnethercote

comment created time in 4 days

PR opened rust-lang/rust

Tweak LEB128 reading some more.

PR #69050 changed LEB128 reading and writing. After it landed I did some double-checking and found that the writing changes were universally a speed-up, but the reading changes were not. I'm not exactly sure why; perhaps there was a quirk of inlining in the particular revision I was originally working from.

This commit reverts some of the reading changes, while still avoiding unsafe code. I have checked it on multiple revisions and the speed-ups seem to be robust.

r? @michaelwoerister

+29 -5

0 comments

1 changed file

pr created time in 4 days

create branch nnethercote/rust

branch : tweak-LEB128-reading

created branch time in 4 days

push event nnethercote/rustc-perf

dependabot-preview[bot]

commit sha 7ecf32f5a948355297ab1388b6e41d549a82c614

Bump ring from 0.16.9 to 0.16.10 Bumps [ring](https://github.com/briansmith/ring) from 0.16.9 to 0.16.10. - [Release notes](https://github.com/briansmith/ring/releases) - [Commits](https://github.com/briansmith/ring/commits) Signed-off-by: dependabot-preview[bot] <support@dependabot.com>

view details

dependabot-preview[bot]

commit sha 9e13f3b952970845b72a537b0185b06c81d80a73

Bump serde_json from 1.0.45 to 1.0.46 Bumps [serde_json](https://github.com/serde-rs/json) from 1.0.45 to 1.0.46. - [Release notes](https://github.com/serde-rs/json/releases) - [Commits](https://github.com/serde-rs/json/compare/v1.0.45...v1.0.46) Signed-off-by: dependabot-preview[bot] <support@dependabot.com>

view details

dependabot-preview[bot]

commit sha 21e4cc36feed7a42615ca6982e67643fb2d62cd1

Bump regex from 1.3.3 to 1.3.4 Bumps [regex](https://github.com/rust-lang/regex) from 1.3.3 to 1.3.4. - [Release notes](https://github.com/rust-lang/regex/releases) - [Changelog](https://github.com/rust-lang/regex/blob/master/CHANGELOG.md) - [Commits](https://github.com/rust-lang/regex/compare/1.3.3...1.3.4) Signed-off-by: dependabot-preview[bot] <support@dependabot.com>

view details

dependabot-preview[bot]

commit sha bba45c2525c9d55db18f1e18cf60d9e564c05d51

Bump thiserror from 1.0.9 to 1.0.10 Bumps [thiserror](https://github.com/dtolnay/thiserror) from 1.0.9 to 1.0.10. - [Release notes](https://github.com/dtolnay/thiserror/releases) - [Commits](https://github.com/dtolnay/thiserror/compare/1.0.9...1.0.10) Signed-off-by: dependabot-preview[bot] <support@dependabot.com>

view details

dependabot-preview[bot]

commit sha f97cef59acd49d415579e604f030b46a771b5b7b

Bump rmp-serde from 0.14.0 to 0.14.2 Bumps [rmp-serde](https://github.com/3Hren/msgpack-rust) from 0.14.0 to 0.14.2. - [Release notes](https://github.com/3Hren/msgpack-rust/releases) - [Commits](https://github.com/3Hren/msgpack-rust/commits) Signed-off-by: dependabot-preview[bot] <support@dependabot.com>

view details

dependabot-preview[bot]

commit sha 88467f1ab3a2b2ad83bbf463d8650dc653d39db8

Bump rust_team_data from `72484fc` to `58ca503` Bumps [rust_team_data](https://github.com/rust-lang/team) from `72484fc` to `58ca503`. - [Release notes](https://github.com/rust-lang/team/releases) - [Commits](https://github.com/rust-lang/team/compare/72484fc50e2bdef3847c31f6f863861f97310ebc...58ca503bcaa08abfb0573499ae55b0091bd1e844) Signed-off-by: dependabot-preview[bot] <support@dependabot.com>

view details

dependabot-preview[bot]

commit sha 0e892c94d17f2f4954a6380fc7cd3b296fc7d868

Bump hashbrown from 0.6.3 to 0.7.0 Bumps [hashbrown](https://github.com/rust-lang/hashbrown) from 0.6.3 to 0.7.0. - [Release notes](https://github.com/rust-lang/hashbrown/releases) - [Changelog](https://github.com/rust-lang/hashbrown/blob/master/CHANGELOG.md) - [Commits](https://github.com/rust-lang/hashbrown/compare/v0.6.3...v0.7.0) Signed-off-by: dependabot-preview[bot] <support@dependabot.com>

view details

Mark Rousskov

commit sha 3302af56c5b38906d34f5fb139d961d1d4f8b6e6

Go to non-preview futures

view details

Mark Rousskov

commit sha fe0abcb89f9a1574a962c8650c889bd9318e77e0

Update reqwest to 0.10

view details

Mark Rousskov

commit sha 7176d0e852c4d40ceb6717c20b252bbd1b4b2bbc

Update hyper to 0.13

view details

Mark Rousskov

commit sha c07c3da166e397e64d946226bda3a8a349647d6b

Fix detailed query diff pages

view details

Mark Rousskov

commit sha 373d503fd35d17acefb0cae86dc54a57d75673f5

Properly authenticate commit-expansion requests It is unclear how this ever worked (I would have expected us to fail these requests always). Possibly caused by an upstream GitHub change; unclear.

view details

Mark Rousskov

commit sha bf12b2afe87c59b434be85ff9b92404f333ad26d

Correct index.html for serialization changes Some recent update changed the serialization format for objects to be non-array based; this accounts for that change in the primary graph generation.

view details

Mark Rousskov

commit sha 855647683f73885b9667ad760f9b07ecbb829dce

Do not require local rust.git checkout for site Follow-up work will include dropping the checkout for the collector as well, likely moving to a format where the raw data does not store commit dates (instead only storing hashes), and the dates are associated by loading code in the site.

view details

Mark Rousskov

commit sha f10cde5986a2b03cb36354ff1daf51b3d6608f95

Join the file loading and file deserialization steps In practice this seems to have minimal or no effect on timing, and is a little cleaner.

view details

Mark Rousskov

commit sha 07dde1a9cbeba9b2e35e28579f945a8913275bf0

Support recoding commit data into new formats

view details

Mark Rousskov

commit sha 99fc43ea5d0341f5d359a7b9d1630318358c7ccf

Prevent refreshing data too often This should allow us to more eagerly ping the data refresher from the collector, avoiding the current stale frontend (and as such delays when the collector is not collecting due to not having a next commit to send).

view details

Mark Rousskov

commit sha f5d24dea3670d38fadacf936e14391fa00f1b074

Hit the onpush endpoint after benchmarking a commit

view details

Mark Rousskov

commit sha 45b1645d155b1ea7bc00ea07fcb3fd16b1e8b639

Switch to non-nursery rustup

view details

push time in 4 days

pull request comment rust-lang/rust

Speed up `DefaultHasher`, `SipHasher`, and `SipHasher13`.

Rust's Sip hashing is notorious for being slow, hence the existence of FnvHash{Map,Set} and FxHash{Map,Set}. This change is worth mentioning in the release notes. Is there a way to mark it for release note consideration?

nnethercote

comment created time in 5 days

issue comment rust-lang/rust

Investigate FxHash low-order bit quality when hashing aligned addresses.

See also #69153

eddyb

comment created time in 5 days

issue opened rust-lang/rust

Re-evaluate `Hash{Set,Map}` vs `FxHash{Set,Map}` once #69152 lands

rustc uses FxHash{Set,Map} everywhere rather than Hash{Set,Map}, because the DefaultHasher used by Hash{Set,Map} is slow.

But once #69152 lands, DefaultHasher will be a lot faster when hashing integers, which is a common case; in one microbenchmark I saw a ~2.5x speed-up. Combine that with the fact that FxHasher is a lower-quality hasher and so tends to result in more collisions, and the default hash tables might be faster. (On a different microbenchmark I saw that HashSet<u32> was a little bit faster than FxHashSet<u32>.)

We should evaluate this, probably by replacing every FxHash{Set,Map} with Hash{Set,Map}. (It keeps things simpler if we exclusively used one or the other, rather than a mix.)
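A quick way to get a feel for the comparison outside rustc (a rough sketch, not a rigorous benchmark; it assumes the rustc_hash crate for FxHashSet) is something like:

    // Sketch only: compare std's default hasher against FxHashSet from the
    // rustc_hash crate on an integer-keyed workload.
    use std::collections::HashSet;
    use std::time::Instant;
    use rustc_hash::FxHashSet;

    fn main() {
        let n = 10_000_000u32;

        let t = Instant::now();
        let std_set: HashSet<u32> = (0..n).collect();
        println!("HashSet<u32>:   {:?} ({} elems)", t.elapsed(), std_set.len());

        let t = Instant::now();
        let fx_set: FxHashSet<u32> = (0..n).collect();
        println!("FxHashSet<u32>: {:?} ({} elems)", t.elapsed(), fx_set.len());
    }

Real conclusions would still need rustc-perf runs, since table sizes and key distributions in the compiler differ a lot from a toy workload like this.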

I briefly tried to do this, but we have a lint that produces this message if you try to use Hash{Set,Map}: "error: Prefer FxHashSet over HashSet, it has better performance". I couldn't work out how to disable it.

cc @rust-lang/wg-compiler-performance cc @cbreeden cc @Amanieu

created time in 5 days

pull request comment rust-lang/rust

Hasten macro parsing

Something weird happened here. Somehow an old version of this branch got merged. I will have to put the orphaned changes in a separate PR.

#69150 is the PR.

nnethercote

comment created time in 5 days

create branch nnethercote/rustc-hash

branch : sip-hash

created branch time in 5 days

fork nnethercote/rustc-hash

Custom hash algorithm used by rustc (plus hashmap/set aliases): fast, deterministic, not secure

fork in 5 days

pull request comment rust-lang/rust

Speed up `DefaultHasher`, `SipHasher`, and `SipHasher13`.

This should have negligible effect on rustc's own performance, because rustc uses fxhash everywhere. But let's check, just to make sure.

@bors try @rust-timer queue

nnethercote

comment created time in 5 days

PR opened rust-lang/rust

Speed up `DefaultHasher`, `SipHasher`, and `SipHasher13`.

This PR applies the SipHasher128 speedups from #68914 to the Sip hashers in libcore, and also adds the missing write_* methods required so that they can benefit from the speedups. Default hashing of integers is now something like 2.5x faster, and default hash tables should be more competitive with hash tables from the fxhash crate.

It also undoes a part of #68914's changes to SipHasher128 because I found they were a pessimisation.
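For background on the "missing write_* methods" part: std's Hasher trait provides default implementations of the integer-specific methods that fall back to the byte-slice write path, so a hasher only gets an integer fast path if it overrides them. A toy hasher (purely illustrative, not the libcore code) shows the shape:

    use std::hash::Hasher;

    // Hypothetical hasher that would otherwise hash integers byte-by-byte
    // through `write`.
    struct ToyHasher {
        state: u64,
    }

    impl Hasher for ToyHasher {
        fn write(&mut self, bytes: &[u8]) {
            // Generic slow path: fold in one byte at a time.
            for &b in bytes {
                self.state = self.state.rotate_left(8) ^ (b as u64);
            }
        }

        // Overriding the provided method gives integers a direct fast path
        // instead of routing them through `write` as an 8-byte slice.
        fn write_u64(&mut self, i: u64) {
            self.state = self.state.rotate_left(1) ^ i;
        }

        fn finish(&self) -> u64 {
            self.state
        }
    }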

r? @michaelwoerister

+339 -42

0 comments

3 changed files

pr created time in 5 days

create branch nnethercote/rust

branch : speed-up-DefaultHasher-SipHasher-SipHasher13

created branch time in 5 days

pull request comment rust-lang/rust

Follow-up to #68848

@bors rollup=always

nnethercote

comment created time in 5 days

PR opened rust-lang/rust

Follow-up to #68848

This PR contains some late changes to #68848 that somehow didn't get included when that PR was merged in a roll-up.

r? @petrochenkov

+15 -17

0 comments

1 changed file

pr created time in 5 days

create branch nnethercote/rust

branch : 68848-follow-up

created branch time in 5 days

delete branch nnethercote/rust

delete branch : hasten-macro-parsing

delete time in 5 days

PR closed rust-lang/rust

Hasten macro parsing [S-waiting-on-author]

r? @eddyb

+73 -67

17 comments

5 changed files

nnethercote

pr closed time in 5 days

pull request comment rust-lang/rust

Hasten macro parsing

Something weird happened here. Somehow an old version of this branch got merged. I will have to put the orphaned changes in a separate PR.

nnethercote

comment created time in 5 days

issue comment rust-lang/rust

Exponential trait selection when compiling a crate using combine 4

I increased the size of the benchmark from 16 tokens to 19 and did some profiling.

`register_obligation_at` is called 3.4M times. It accounts for most of the runtime.

The length of `self.nodes` gets up to 1.4M.

Worst of all, the length of at least one `node.dependents` gets up to 131k, which is extraordinary. This results in quadratic behaviour because of the following code, in which `contains` does a linear search:

    if !node.dependents.contains(&parent_index) {
        node.dependents.push(parent_index);
    }

I.e. the length of this node.dependents grows from 0 to 131k one at a time, with a failing linear search on every push.
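One way to avoid that quadratic pattern (a sketch only, not a proposed rustc patch; it assumes an auxiliary set per node is acceptable) is to make the membership test constant-time:

    use std::collections::HashSet;

    // Hypothetical stand-in for the ObligationForest node: keep a set
    // alongside the Vec so the "already a dependent?" check is a hash lookup
    // rather than a linear scan of a 131k-element vector.
    struct Node {
        dependents: Vec<usize>,
        dependent_set: HashSet<usize>,
    }

    impl Node {
        fn add_dependent(&mut self, parent_index: usize) {
            // insert() returns false if the value was already present.
            if self.dependent_set.insert(parent_index) {
                self.dependents.push(parent_index);
            }
        }
    }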

I printed out the predicates being added; there are a lot of these:

Binder(TraitPredicate(<impl Parser<_> as std::marker::Sized>))

I don't really understand the predicate stuff, but maybe @nikomatsakis has some thoughts about this?

Marwes

comment created time in 5 days

pull request comment rust-lang/rust

Micro-optimize the heck out of LEB128 reading and writing.

@fitzgen tried using PEXT a while back in a different project. For the common case (small integers that fit in 1 byte) it was a slight slowdown: https://twitter.com/fitzgen/status/1138784734417432576

nnethercote

comment created time in 5 days

push event nnethercote/rust

Mazdak Farrokhzad

commit sha e839b2ec849246ec5efe5069c8d874dbef289462

Constness -> enum Const { Yes(Span), No } Same idea for `Unsafety` & use new span for better diagnostics.

view details

Mazdak Farrokhzad

commit sha c30f068dc8b2ef58678b9846ba834dd6dea3fe44

IsAsync -> enum Async { Yes { span: Span, .. }, No } use new span for better diagnostics.

view details

Mazdak Farrokhzad

commit sha 36a17e4067d2e67223cd9a172476ee5503d6b44b

parser_fn_front_matter: allow `const .. extern`

view details

Mazdak Farrokhzad

commit sha a833be21626890de406e12f2561d2ffbda4aadb4

parser: fuse free `fn` parsing together.

view details

Mazdak Farrokhzad

commit sha b05e9d2b4d8692a9f0932e9098727762bfad6efe

parser: solidify `fn` parsing with `parse_fn`.

view details

Mazdak Farrokhzad

commit sha 04253791952d85a4da5d19d228cbac92e37ee2b9

parser: inline `parse_assoc_fn` and friends.

view details

Mazdak Farrokhzad

commit sha cdbbc25cc31080e189a98c010a0b39a9074d0c50

parser: move `ban_async_in_2015` to `fn` parsing & improve it.

view details

Mazdak Farrokhzad

commit sha 05e5530577bf43749186fd56195cffb686f0311e

parser: address review comments

view details

Mazdak Farrokhzad

commit sha 79d139ac7056d0102db605715f354689b0214705

parser: simplify ParamCfg -> ReqName

view details

Mazdak Farrokhzad

commit sha 3341c940068d60ade54ea343301a810c0bb51153

ast_validation: tweak diagnostic output

view details

Mazdak Farrokhzad

commit sha 27f60906aa400aa42bca5346701b7a02fbc1a872

rustc_bulltin_macros: tweak span_labels

view details

Mazdak Farrokhzad

commit sha 4ca3bbf0b21c82d50d14aca9b74a4dd919d9087f

parser: add test for 'extern crate async'

view details

Mazdak Farrokhzad

commit sha 9828559aad8672bb320517bd0fa1992ce144b848

parser: is_fn_front_matter -> check_fn_front_matter

view details

bors

commit sha be493fe8cc40c3d3f6030a1313c1ff747fce770d

Auto merge of #69023 - Centril:parse_fn, r=petrochenkov parse: unify function front matter parsing Part of https://github.com/rust-lang/rust/pull/68728. - `const extern fn` feature gating is now done post-expansion such that we do not have conditional compatibilities of function qualifiers *in parsing*. - The `FnFrontMatter` grammar becomes: ```rust Extern = "extern" StringLit ; FnQual = "const"? "async"? "unsafe"? Extern? ; FnFrontMatter = FnQual "fn" ; ``` That is, all item contexts now *syntactically* allow `const async unsafe extern "C" fn` and use semantic restrictions to rule out combinations previously prevented syntactically. The semantic restrictions include in particular: - `fn`s in `extern { ... }` can have no qualifiers. - `const` and `async` cannot be combined. - We change `ast::{Unsafety, Spanned<Constness>}>` into `enum ast::{Unsafe, Const} { Yes(Span), No }` respectively. This change in formulation allow us to exclude `Span` in the case of `No`, which facilitates parsing. Moreover, we also add a `Span` to `IsAsync` which is renamed to `Async`. The new `Span`s in `Unsafety` and `Async` are then taken advantage of for better diagnostics. A reason this change was made is to have a more uniform and clear naming scheme. The HIR keeps the structures in AST (with those definitions moved into HIR) for now to avoid regressing perf. r? @petrochenkov

view details

Ralf Jung

commit sha 0633a0e3801e4efc9ab07bf811e442bd379ce93a

remove Panic variant from InterpError

view details

Ralf Jung

commit sha c5709ff6b779d88c0d432f6ed8731fde6e55c090

const-prop: handle overflow_check consistently for all operators

view details

Ralf Jung

commit sha f3ff02bdd85255ad75bae40aad53e520e37a8e4a

remove PanicInfo::Panic variant that MIR does not use or need

view details

Ralf Jung

commit sha 17a8cfd605fb8d43dc61496a522cf3b84988d69d

no need for hook_panic_fn to return a bool

view details

Ralf Jung

commit sha 6457b29104028bbb3af5efeefed7343d85576320

move PanicInfo to mir module

view details

Ralf Jung

commit sha 55339f2eb7c186334216c35203f98540e8c8cb37

small cleanup in ConstEvalErr::struct_generic

view details

push time in 5 days

push event nnethercote/rust

Mark Rousskov

commit sha 3b92689f3d0c3b90fa01d9873cdf01543d51c000

Relax bounds on HashMap to match hashbrown No functional changes are made, and all APIs are moved to strictly less restrictive bounds. These APIs changed from the old bound listed to no trait bounds: K: Hash + Eq * new * with_capacity K: Eq + Hash, S: BuildHasher * with_hasher * with_capacity_and_hasher * hasher K: Eq + Hash + Debug -> K: Debug S: BuildHasher -> S <HashMap as Debug> K: Eq + Hash -> K S: BuildHasher + Default -> S: Default <HashMap as Default>

view details

Mark Rousskov

commit sha 48859db151b839518bdd9d44a2387c0f6b65d141

Relax bounds on HashSet to match hashbrown No functional changes are made, and all APIs are moved to strictly less restrictive bounds. These APIs changed from the old bound listed to the new bound: T: Hash + Eq -> T * new * with_capacity T: Eq + Hash, S: BuildHasher -> T * with_hasher * with_capacity_and_hasher * hasher T: Eq + Hash + Debug -> T: Debug S: BuildHasher -> S <HashSet as Debug> T: Eq + Hash -> T S: BuildHasher + Default -> S: Default <HashSet as Default>

view details

Nicholas Nethercote

commit sha 6bf2cc2229768faa8e86e0e8a9f5bd8ebfc817a2

Avoid instantiating many `Parser` structs in `generic_extension`. Currently, every iteration of the main loop in `generic_extension` instantiates a `Parser`, which is expensive because `Parser` is a large type. Many of those instantiations are only used immutably, particularly for simple-but-repetitive macros of the sort seen in `html5ever` and PR 68836. This commit initializes a single "base" parser outside the loop, and then uses `Cow` to avoid cloning it except for the mutating iterations. This speeds up `html5ever` runs by up to 15%.

view details

Nicholas Nethercote

commit sha f840a955bd449810e75d8320b4c46482d6dbdec1

Remove the `Cow` from `Directory`. The previous commit wrapped `Parser` within a `Cow` for the hot macro parsing path. As a result, there's no need for the `Cow` within `Directory`, because it lies within `Parser`.

view details

Nicholas Nethercote

commit sha 2a13b24d369b8619f0197993cd5dc60f7217ed72

Change condition ordering in `parse_tt`. This is a small win, because `Failure` is much more common than `Success`.

view details

Aaron Hill

commit sha a60669d95cdad0e28cf28790b717bbcf235153f8

Properly use parent generics for opaque types Fixes #67844 Previously, opaque types would only get parent generics if they a return-position-impl-trait (e.g. `fn foo<A>() -> impl MyTrait<A>`). However, it's possible for opaque types to be nested inside one another: ```rust trait WithAssoc { type AssocType; } trait WithParam<A> {} type Return<A> = impl WithAssoc<AssocType = impl WithParam<A>>; ``` When this occurs, we need to ensure that the nested opaque types properly inherit generic parameters from their parent opaque type. This commit fixes the `generics_of` query to take the parent item into account when determining the generics for an opaque type.

view details

ImgBotApp

commit sha c18476e231058b8dd8528ca98b0b51ff14b729be

[ImgBot] Optimize images *Total -- 10.65kb -> 8.44kb (20.82%) /src/etc/installer/gfx/rust-logo.png -- 5.71kb -> 3.82kb (33.11%) /src/librustdoc/html/static/down-arrow.svg -- 0.63kb -> 0.50kb (20.44%) /src/librustdoc/html/static/wheel.svg -- 3.86kb -> 3.68kb (4.66%) /src/librustdoc/html/static/brush.svg -- 0.47kb -> 0.44kb (4.61%) Signed-off-by: ImgBotApp <ImgBotHelp@gmail.com>

view details

Aaron Hill

commit sha 34cf0b32674da79403746716e5a7ed2072dfabe2

Only use the parent if it's an opaque type

view details

Nicholas Nethercote

commit sha ad7802f9d45b884dad58931c7a8bec91d196ad0e

Micro-optimize the heck out of LEB128 reading and writing. This commit makes the following writing improvements: - Removes the unnecessary `write_to_vec` function. - Reduces the number of conditions per loop from 2 to 1. - Avoids a mask and a shift on the final byte. And the following reading improvements: - Removes an unnecessary type annotation. - Fixes a dangerous unchecked slice access. Imagine a slice `[0x80]` -- the current code will read past the end of the slice some number of bytes. The bounds check at the end will subsequently trigger, unless something bad (like a crash) happens first. The cost of doing bounds check in the loop body is negligible. - Avoids a mask on the final byte. And the following improvements for both reading and writing: - Changes `for` to `loop` for the loops, avoiding an unnecessary condition on each iteration. This also removes the need for `leb128_size`. All of these changes give significant perf wins, up to 5%.

view details

Dario Gonzalez

commit sha 1f6fb338a5f775745595d32b61c1862887c948f9

make the sgx arg cleanup implementation a no op

view details

Esteban Küber

commit sha 24be307b53931a9824829f63aa65fa5c6042ed21

Suggestion when encountering assoc types from hrtb When encountering E0212, detect whether this is a representable case or not, i.e. if it's happening on an `fn` or on an ADT. If the former, provide a structured suggestion, otherwise note that this can't be represented in Rust.

view details

Esteban Küber

commit sha bde96776a199064dec3c825ca5ada8f90e1e12d4

Suggest named lifetime in ADT with hrtb

view details

Igor Matuszewski

commit sha 8fc4bba2c457796b28da604160da90750f3695da

Update RLS and Rustfmt Bumps rustc-ap-* packages to v642.

view details

Tomasz Miąsko

commit sha 33e2c1d863f53f5224db5abd40c6a84879051ef2

bootstrap: Configure cmake when building sanitizer runtimes

view details

Esteban Küber

commit sha c39b04ea851b821359534b540c0babb97de24122

When expecting `BoxFuture` and using `async {}`, suggest `Box::pin`

view details

Esteban Küber

commit sha a852fb74131af7473bafb03d0f3994a0e9f597d5

Remove std lib `Span` from expected boxed future test

view details

Esteban Küber

commit sha 80cdb0af7dd27471e1e4a4362e2473a9331a5fdd

Account for `Box::new(impl Future)` and emit help `use Box::pin`

view details

Esteban Küber

commit sha c376fc001772200e2de8d7a610a5b67dcf642432

Account for `Pin::new(_)` and `Pin::new(Box::new(_))` when `Box::pin(_)` would be applicable

view details

Esteban Küber

commit sha 248f5a4046ab5a90189f37c305c759b7cde8acb3

Add trait `Self` filtering to `rustc_on_unimplemented`

view details

bors

commit sha ba18875557aabffe386a2534a1aa6118efb6ab88

Auto merge of #69097 - Xanewok:update-rls-rustfmt, r=Dylan-DPC Update RLS and Rustfmt Bumps `rustc-ap-*` packages to v642. Closes #68916. Closes #68917. cc @topecongiro

view details

push time in 5 days

delete branch nnethercote/rust

delete branch : micro-optimize-leb128

delete time in 5 days

push event nnethercote/rust

Esteban Küber

commit sha 97d47a5e7c41274eacbec55a4c08112407c78ff5

Account for type params on method without parens

view details

Ralf Jung

commit sha 202d401c2504f17133c50505b82fe4278ab2c842

miri: simplify singed operator overflow detection

view details

Ralf Jung

commit sha 28f85c6ffad77554150e7cab4ccac38b26621bdb

bring back extra check for int_min%-1

view details

Ralf Jung

commit sha 7d2f6ae00149e4fdfeb9eedc9cb7433f6e67cf42

miri: equip unary_op with overflow detection

view details

Ralf Jung

commit sha ae23f7020a5cb9a201e83f20f151282368b1f494

const-prop: use overflowing_unary_op for overflowing checking of unary ops

view details

Ralf Jung

commit sha b434d7ef8ae19c145dd9348b70bb955147dfab70

add test that checks overflows on arithmetic operators

view details

Ralf Jung

commit sha 1ddb0503ff1e203de40f5bbc1e0b00d1b4e99d12

div/rem overflow tests: also test i128

view details

Ralf Jung

commit sha d6c5a04eff9643b634cb2c98411f973b8f7aa1e2

some more tests for i128 oveflow behavior

view details

Raoul Strackx

commit sha aeedc9dea9e0460488e0b6ce7fe3aaf50395774c

Corrected ac_mitigation patch. That patch used the untrusted stack to clear rflags during enclave (re-)entry

view details

Raoul Strackx

commit sha 236ab6e6d631f073a8c3c7439af6b2ec58ce1f25

sanitize MXCSR/FPU control registers

view details

Jethro Beekman

commit sha 71b9ed4a36748be01826063951310a2da2717a9b

Avoid jumping to Rust code with user %rsp (reentry_panic)

view details

Ralf Jung

commit sha c561d23a6105122a517c14394a46c3faab8e01b6

remove outdated comment

view details

Esteban Küber

commit sha 9d91489526121ef3408e1efa2a98bcaefdedd9bc

review comment: wording

view details

Jane Lusby

commit sha b637c0e84a9dbb5883130e0ea1e5ee9e8acf3bc1

Add initial debug fmt for Backtrace

view details

Jane Lusby

commit sha 49204563e13f57917cc22ac8f8b608927a432038

Get vaguely working with a test for checking output

view details

Jane Lusby

commit sha c0ba79eefd82d0a5614e295659b18f7b31e542a3

less noisy format

view details

Jane Lusby

commit sha 0d5444ffa6ac0447849627406d15d16630a6364b

remove unnecessary derives

view details

Jane Lusby

commit sha 76e6d6fe114944c88bea77baf700aa5ead2aa9e3

remove unnecessary Debug impl for BacktraceFrame

view details

Jane Lusby

commit sha 583dd2c3eebafb72ec89fd4497c3cb751e2343ba

make it compile

view details

Jane Lusby

commit sha 87117783fb59a580d0a90200ac62ecf219142e49

final format cleanups

view details

push time in 6 days

pull request comment rust-lang/rust

Micro-optimize the heck out of LEB128 reading and writing.

BTW, in case anyone is curious, here's how I approached this bug. From profiling with Callgrind I saw that clap-rs-Check-CleanIncr was the benchmark+run+build combination most affected by LEB128 encoding. Its text output has entries like this:

265,344,872 ( 2.97%)  /home/njn/moz/rust0/src/libserialize/leb128.rs:rustc::ty::query::on_disk_cache::__ty_decoder_impl::<impl serialize::serialize::Decoder for rustc::ty::query::on_disk_cache::CacheDecoder>::read_usize
236,097,015 ( 2.64%)  /home/njn/moz/rust0/src/libserialize/leb128.rs:<rustc::ty::query::on_disk_cache::CacheEncoder<E> as serialize::serialize::Encoder>::emit_u32
213,551,888 ( 2.39%)  /home/njn/moz/rust0/src/libserialize/leb128.rs:rustc::ty::codec::encode_with_shorthand
165,042,682 ( 1.85%)  /home/njn/moz/rust0/src/libserialize/leb128.rs:<rustc_target::abi::VariantIdx as serialize::serialize::Decodable>::decode
 40,540,500 ( 0.45%)  /home/njn/moz/rust0/src/libserialize/leb128.rs:<u32 as serialize::serialize::Encodable>::encode
 24,026,292 ( 0.27%)  /home/njn/moz/rust0/src/libserialize/leb128.rs:serialize::serialize::Encoder::emit_seq
 20,160,540 ( 0.23%)  /home/njn/moz/rust0/src/libserialize/leb128.rs:<rustc::dep_graph::serialized::SerializedDepNodeIndex as serialize::serialize::Decodable>::decode
  9,661,323 ( 0.11%)  /home/njn/moz/rust0/src/libserialize/leb128.rs:serialize::serialize::Decoder::read_tuple
  4,898,927 ( 0.05%)  /home/njn/moz/rust0/src/libserialize/leb128.rs:<rustc::ty::query::on_disk_cache::CacheEncoder<E> as serialize::serialize::Encoder>::emit_usize
  3,384,018 ( 0.04%)  /home/njn/moz/rust0/src/libserialize/leb128.rs:<rustc_metadata::rmeta::encoder::EncodeContext as serialize::serialize::Encoder>::emit_u32
  2,296,440 ( 0.03%)  /home/njn/moz/rust0/src/libserialize/leb128.rs:<rustc::ty::UniverseIndex as serialize::serialize::Decodable>::decode

These are instruction counts, and the percentages sum to about 11%. Lots of different functions are involved because the LEB128 functions are inlined, but the file is leb128.rs in all of them, so I could tell where the relevant code lives. And the annotated code in that file looks like this:

          .           macro_rules! impl_write_unsigned_leb128 {
          .               ($fn_name:ident, $int_ty:ident) => {
          .                   #[inline]
          .                   pub fn $fn_name(out: &mut Vec<u8>, mut value: $int_ty) {
          .                       for _ in 0..leb128_size!($int_ty) {
143,877,210 ( 1.61%)                  let mut byte = (value & 0x7F) as u8;
 48,003,612 ( 0.54%)                  value >>= 7;
239,884,434 ( 2.69%)                  if value != 0 {
 47,959,070 ( 0.54%)                      byte |= 0x80;
          .                           }
          .
          .                           write_to_vec(out, byte);
          .
 47,959,070 ( 0.54%)                  if value == 0 {
          .                               break;
          .                           }
          .                       }
          .                   }
          .               };
          .           }
          .
          .           impl_write_unsigned_leb128!(write_u16_leb128, u16);
-- line 50 ----------------------------------------
-- line 57 ----------------------------------------
          .               ($fn_name:ident, $int_ty:ident) => {
          .                   #[inline]
          .                   pub fn $fn_name(slice: &[u8]) -> ($int_ty, usize) {
          .                       let mut result: $int_ty = 0;
          .                       let mut shift = 0;
          .                       let mut position = 0;
          .
          .                       for _ in 0..leb128_size!($int_ty) {
 59,507,824 ( 0.67%)                  let byte = unsafe { *slice.get_unchecked(position) };
          .                           position += 1;
204,126,888 ( 2.29%)                  result |= ((byte & 0x7F) as $int_ty) << shift;
119,023,350 ( 1.33%)                  if (byte & 0x80) == 0 {
          .                               break;
          .                           }
          .                           shift += 7;
          .                       }
          .
          .                       // Do a single bounds check at the end instead of for every byte.
 67,805,748 ( 0.76%)              assert!(position <= slice.len());
          .
          .                       (result, position)
          .                   }
          .               };
          .           }

Those percentages also add up to about 11%. Plus I poked around a bit at call sites and found this in a different file (libserialize/opaque.rs):

         .           macro_rules! read_uleb128 {
          .               ($dec:expr, $fun:ident) => {{
100,680,777 ( 1.13%)          let (value, bytes_read) = leb128::$fun(&$dec.data[$dec.position..]);
 67,858,196 ( 0.76%)          $dec.position += bytes_read;
 43,378,625 ( 0.49%)          Ok(value)
          .               }};
          .           }

which is another 2.38%. So it was clear that LEB128 reading/writing was hot.

I then tried gradually improving the code. I ended up measuring 18 different changes to the code. 10 of them were improvements (which I kept), and 8 were regressions (which I discarded). The following table shows the notes I took. The descriptions of the changes are a bit cryptic, but the basic technique should be clear.

IMPROVEMENTS
            clap-rs-Check-CleanIncr
feb10/Leb0  8,992M        $RUSTC0
feb10/Leb1  8,927M/99.3%  First attempt
feb11/Leb4  8,996M        $RUSTC0 but with bounds checking
feb11/Leb5  8,983M        `loop` for reading
feb11/Leb6  8,928M/99.3%  `loop` for writing, `write_to_vec` removed
feb11/Leb8  8,829M/98.1%  avoid mask on final byte in read loop
feb11/Leb9  8,529M/94.8%  in write loop, avoid a condition
feb11/Leb10 8,488M/94.4%  in write loop, mask/shift on final byte
feb13/Leb13 8,488M/94.4%  in write loop, push `(value | 0x80) as u8`
feb13/Leb15 8,488M/94.4%  in read loop, do `as` before `&`
feb13/Leb18 8,492M/94.4%  Landed (not sure about the extra 4M, oh well)

REGRESSIONS
feb11/Leb2  8,927M/99.3%  add slice0, slice1, slice2 vars
feb11/Leb3  9,127M        move the slow loop into a separate no-inline function
feb11/Leb7  8,930M        `< 128` in read loop
feb11/Leb11 8,492M        use `byte < 0x80` in read loop
feb12/Leb12 8,721M        unsafe pushing in write
feb13/Leb14 8,494M/94.4%  in write loop, push `(value as u8) | 0x80`
feb13/Leb16 8,831M        eddyb's write loop
feb13/Leb17 8,578M        eddyb's read loop
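For reference, the write loop that the "improvements" rows converge on has roughly this shape (a sketch for one integer width, based on the landed diff; the real code is a macro over several types):

    // Sketch of the final write loop for u64: one branch per iteration, no
    // mask or shift on the final byte, and no separate write_to_vec helper.
    pub fn write_u64_leb128(out: &mut Vec<u8>, mut value: u64) {
        loop {
            if value < 0x80 {
                out.push(value as u8);
                break;
            } else {
                out.push(((value & 0x7F) | 0x80) as u8);
                value >>= 7;
            }
        }
    }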

Every iteration took about 6.5 minutes to recompile, and about 2 minutes to measure with Cachegrind. I interleaved these steps with other work, so in practice each iteration took anywhere from 10-30 minutes, depending on context-switching delays.

The measurements in the notes are close to those from the CI run, which indicate the following for clap-rs-Check-CleanIncr:

  • instructions: -5.3%
  • cycles: -4.4%
  • wall-time: -3.9%

Instruction counts are almost deterministic and highly reliable. Cycle counts are more variable but still reasonable. Wall-time is highly variable and barely trustworthy. But they're all pointing in the same direction, which is encouraging.

Looking at the instruction counts, we saw that LEB128 operations were about 11-13% of instructions originally, and instruction counts went down by about 5%, which suggests that the LEB128 operations are a bit less than twice as fast as they were. Pretty good.

nnethercote

comment created time in 6 days

pull request comment rust-lang/rust

Micro-optimize the heck out of LEB128 reading and writing.

@bors r=michaelwoerister

nnethercote

comment created time in 6 days

pull request comment rust-lang/rust

Micro-optimize the heck out of LEB128 reading and writing.

> also, has anyone considered using SIMD here
>
> See also Masked VByte [arXiv].

Thanks for the link, I will take a look... but not in this PR :)

nnethercote

comment created time in 6 days

pull request comment rust-lang/rust

Micro-optimize the heck out of LEB128 reading and writing.

> @nnethercote If you're bored, I wonder how this implementation compares to the pre-#59820 one in libproc_macro (which I implemented from scratch in safe code).

I tried the read and write implementations from libproc_macro individually, they both were slower than the code in this PR.

nnethercote

comment created time in 6 days

push event nnethercote/rust

John Kåre Alsaker

commit sha bfba6ef328bbba327cae8918e795c11b89217672

Add an option to use LLD to link the compiler on Windows platforms

view details

John Kåre Alsaker

commit sha 95318f8d859dc55cc5e06722c96f6e492529d6ca

Link a linking issue

view details

John Kåre Alsaker

commit sha 2124a85260fdf0851bb4de369d311bcfc05b205f

Don't use a whitelist for use_lld

view details

kennytm

commit sha 847d5b4d1387a30f1798a5c3c59c3e0c31e00319

Derive Clone + PartialEq + Eq for std::string::FromUtf8Error

view details

Trevor Spiteri

commit sha fd2282388140ea0f370ee25c82f00be81c2f822c

implement AsMut<str> for String

view details

Mazdak Farrokhzad

commit sha 7af9ff3e699207da7a5220b98ba9831d66697c80

introduce `#![feature(move_ref_pattern)]`

view details

Mazdak Farrokhzad

commit sha d984f127f662f7a1fcf0472230a1b64fcc3325d5

move_ref_patterns: introduce tests bindings_after_at: harden tests

view details

Mazdak Farrokhzad

commit sha 0253f868cab2c5be84d354589b4b833aedbc9987

move_ref_pattern: adjust error index

view details

Mazdak Farrokhzad

commit sha 8d4973f5871fd36b5946b9a06bd1157d4a87bbe0

move_ref_pattern: don't ICE on unreachable 2xby-move conflicts

view details

Mazdak Farrokhzad

commit sha bd318be05dab2e1149595aacbf3d808559fa42dc

move_ref_pattern: change pov in diagnostics & add binding names

view details

Mazdak Farrokhzad

commit sha d2b88b7050b0e21b136022c4cfe8d352c1425588

move_ref_pattern: test captures inside closure

view details

Matthew Jasper

commit sha 91cf0e741186a9fa3bf31b07a65dc89324c10296

Don't requery the param_env of a union Union fields have the ParamEnv of the union.

view details

Matthew Jasper

commit sha 570c1613c1225d5777af5603dcf526da9cf57e19

Remove unnecessary features in rustc_ty

view details

Matthew Jasper

commit sha 39733223fc817efba52a4204dd697192bf5da185

Add IS_MANUALLY_DROP to AdtFlags

view details

Matthew Jasper

commit sha d1965216a34dc2831cf44d2e15ad9d78403d10cc

Improve needs_drop query * Handle cycles in `needs_drop` correctly * Normalize types when computing `needs_drop` * Move queries from rustc to rustc_ty

view details

Matthew Jasper

commit sha d20646b2d8033f31423b5bda3e56776df115e144

Address review comments * Handle arrays with const-generic lengths * Use closure for repeated code.

view details

Matthew Jasper

commit sha 465b86253ce828e215d564fde53adf8742f0e3f6

Use correct `ParamEnv` in `Instance::resolve`

view details

Esteban Küber

commit sha 109d5c189f4b5c3405a7d6cfb312e04d866c0c31

Tweak borrow error on `FnMut` when `Fn` is expected

view details

bjorn3

commit sha bdacdf49e532ce869d1eb96e967fd77991566a7f

Remove unused core_intrinsics feature gate from bootstrap

view details

bjorn3

commit sha 095963f91d525951cb0183648c47c427fb69f16d

Remove unused feature gates from librustc

view details

push time in 6 days

Pull request review comment rust-lang/rust

Micro-optimize the heck out of LEB128 reading and writing.

 macro_rules! impl_read_unsigned_leb128 {     ($fn_name:ident, $int_ty:ident) => {         #[inline]         pub fn $fn_name(slice: &[u8]) -> ($int_ty, usize) {-            let mut result: $int_ty = 0;+            let mut result = 0;             let mut shift = 0;             let mut position = 0;--            for _ in 0..leb128_size!($int_ty) {-                let byte = unsafe { *slice.get_unchecked(position) };+            loop {+                let byte = slice[position];                 position += 1;-                result |= ((byte & 0x7F) as $int_ty) << shift;                 if (byte & 0x80) == 0 {-                    break;+                    result |= (byte as $int_ty) << shift;+                    return (result, position);+                } else {+                    result |= ((byte & 0x7F) as $int_ty) << shift;

Fixed.

nnethercote

comment created time in 6 days

Pull request review comment rust-lang/rust

Micro-optimize the heck out of LEB128 reading and writing.

-#[inline]-pub fn write_to_vec(vec: &mut Vec<u8>, byte: u8) {-    vec.push(byte);-}--#[cfg(target_pointer_width = "32")]-const USIZE_LEB128_SIZE: usize = 5;-#[cfg(target_pointer_width = "64")]-const USIZE_LEB128_SIZE: usize = 10;--macro_rules! leb128_size {-    (u16) => {-        3-    };-    (u32) => {-        5-    };-    (u64) => {-        10-    };-    (u128) => {-        19-    };-    (usize) => {-        USIZE_LEB128_SIZE-    };-}- macro_rules! impl_write_unsigned_leb128 {     ($fn_name:ident, $int_ty:ident) => {         #[inline]         pub fn $fn_name(out: &mut Vec<u8>, mut value: $int_ty) {-            for _ in 0..leb128_size!($int_ty) {-                let mut byte = (value & 0x7F) as u8;-                value >>= 7;-                if value != 0 {-                    byte |= 0x80;-                }--                write_to_vec(out, byte);--                if value == 0 {+            loop {+                if value < 0x80 {+                    out.push(value as u8);                     break;+                } else {+                    out.push(((value & 0x7f) | 0x80) as u8);

I measured and the suggested change makes no difference to performance. But I will use it anyway, to avoid the possibility of other people asking the same question in the future. Thanks for the suggestion!

nnethercote

comment created time in 6 days

pull request commentrust-lang/rust

Speed up `SipHasher128`.

I would actually prefer implementing u8to64_le in terms of from_le_bytes

from_le_bytes takes a [u8; 8] argument, so I'm having trouble seeing how you would write u8to64_le with it. I might be overlooking something.
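For context, here is one way such a bridge could look -- a minimal sketch of my own (not necessarily what was being suggested), which copies the partial input into a zeroed 8-byte buffer and then hands that buffer to from_le_bytes:

// Hedged sketch: read `len` (<= 8) bytes starting at `start` as a
// little-endian u64, padded with zeroes, without any unsafe code.
fn u8to64_le_sketch(buf: &[u8], start: usize, len: usize) -> u64 {
    assert!(len <= 8);
    let mut bytes = [0u8; 8];
    bytes[..len].copy_from_slice(&buf[start..start + len]);
    u64::from_le_bytes(bytes)
}

fn main() {
    // The bytes AA BB CC DD read as a little-endian u64, zero-padded.
    assert_eq!(u8to64_le_sketch(&[0xAA, 0xBB, 0xCC, 0xDD], 0, 4), 0xDDCC_BBAA);
}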

nnethercote

comment created time in 6 days

delete branch nnethercote/rust

delete branch : speed-up-SipHasher128

delete time in 6 days

pull request commentrust-lang/rust

Micro-optimize the heck out of LEB128 reading and writing.

@bors r- until I have tried out @ranma42's suggestion.

nnethercote

comment created time in 6 days

Pull request review commentrust-lang/rust

Micro-optimize the heck out of LEB128 reading and writing.

-#[inline]
-pub fn write_to_vec(vec: &mut Vec<u8>, byte: u8) {
-    vec.push(byte);
-}
-
-#[cfg(target_pointer_width = "32")]
-const USIZE_LEB128_SIZE: usize = 5;
-#[cfg(target_pointer_width = "64")]
-const USIZE_LEB128_SIZE: usize = 10;
-
-macro_rules! leb128_size {
-    (u16) => {
-        3
-    };
-    (u32) => {
-        5
-    };
-    (u64) => {
-        10
-    };
-    (u128) => {
-        19
-    };
-    (usize) => {
-        USIZE_LEB128_SIZE
-    };
-}
-
 macro_rules! impl_write_unsigned_leb128 {
     ($fn_name:ident, $int_ty:ident) => {
         #[inline]
         pub fn $fn_name(out: &mut Vec<u8>, mut value: $int_ty) {
-            for _ in 0..leb128_size!($int_ty) {
-                let mut byte = (value & 0x7F) as u8;
-                value >>= 7;
-                if value != 0 {
-                    byte |= 0x80;
-                }
-
-                write_to_vec(out, byte);
-
-                if value == 0 {
+            loop {
+                if value < 0x80 {
+                    out.push(value as u8);
                     break;
+                } else {
+                    out.push(((value & 0x7f) | 0x80) as u8);

True! Because the as u8 truncates. Tomorrow I will measure and update the code if it doesn't make things slower.
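For the record, here is a tiny self-contained check of that point (my own example, not code from the PR):

fn main() {
    let value: u64 = 0x1234;                       // any multi-byte value
    let with_mask = ((value & 0x7f) | 0x80) as u8; // what the PR currently does
    let without_mask = (value | 0x80) as u8;       // the suggested simplification
    assert_eq!(with_mask, without_mask);           // both are 0xb4: `as u8` already drops the high bits
}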

nnethercote

comment created time in 6 days

Pull request review commentrust-lang/rust

Micro-optimize the heck out of LEB128 reading and writing.

-#[inline]
-pub fn write_to_vec(vec: &mut Vec<u8>, byte: u8) {
-    vec.push(byte);
-}
-
-#[cfg(target_pointer_width = "32")]
-const USIZE_LEB128_SIZE: usize = 5;
-#[cfg(target_pointer_width = "64")]
-const USIZE_LEB128_SIZE: usize = 10;
-
-macro_rules! leb128_size {
-    (u16) => {
-        3
-    };
-    (u32) => {
-        5
-    };
-    (u64) => {
-        10
-    };
-    (u128) => {
-        19
-    };
-    (usize) => {
-        USIZE_LEB128_SIZE
-    };
-}
-
 macro_rules! impl_write_unsigned_leb128 {
     ($fn_name:ident, $int_ty:ident) => {

$int_ty is a pre-existing name and one that follows the normal style. I see no good reason to change it.

nnethercote

comment created time in 7 days

Pull request review commentrust-lang/rust

Micro-optimize the heck out of LEB128 reading and writing.

-#[inline]
-pub fn write_to_vec(vec: &mut Vec<u8>, byte: u8) {
-    vec.push(byte);
-}
-
-#[cfg(target_pointer_width = "32")]
-const USIZE_LEB128_SIZE: usize = 5;
-#[cfg(target_pointer_width = "64")]
-const USIZE_LEB128_SIZE: usize = 10;
-
-macro_rules! leb128_size {
-    (u16) => {
-        3
-    };
-    (u32) => {
-        5
-    };
-    (u64) => {
-        10
-    };
-    (u128) => {
-        19
-    };
-    (usize) => {
-        USIZE_LEB128_SIZE
-    };
-}
-
 macro_rules! impl_write_unsigned_leb128 {
     ($fn_name:ident, $int_ty:ident) => {
         #[inline]
         pub fn $fn_name(out: &mut Vec<u8>, mut value: $int_ty) {
-            for _ in 0..leb128_size!($int_ty) {
-                let mut byte = (value & 0x7F) as u8;
-                value >>= 7;
-                if value != 0 {
-                    byte |= 0x80;
-                }
-
-                write_to_vec(out, byte);
-
-                if value == 0 {
+            loop {

I tried this:

        pub fn $fn_name(out: &mut Vec<u8>, mut value: $int_ty) {
            let mut len = out.len();
            out.reserve(19); // Maximum possible length = ceiling(128/7)
            loop {
                if value < 0x80 {
                    unsafe { *out.get_unchecked_mut(len) = value as u8 };
                    len += 1;
                    break;
                } else {
                    unsafe { *out.get_unchecked_mut(len) = ((value & 0x7f) | 0x80) as u8 };
                    len += 1;
                    value >>= 7;
                }
            }
            unsafe {
                out.set_len(len);
            }
        }

It was a measurable slowdown. In practice, single-byte writes are easily the most common (more than 50%) and the average length is less than 2, which might partly explain the outcome.

nnethercote

comment created time in 7 days

pull request commentrust-lang/rust

Speed up `SipHasher128`.

@michaelwoerister: I have left the previous changes in place, because I think option (3) is the best. I have also added another commit that makes the u8to64_le changes you suggest. I think this is in a good enough state to land, though if you are able to do a full test run on a big-endian machine that would be welcome.

nnethercote

comment created time in 7 days

push eventnnethercote/rust

Nicholas Nethercote

commit sha 9aea154e7893b498b98a3d9c8e4c385c96fbe454

Improve `u8to64_le`. This makes it faster and also changes it to a safe function. (Thanks to Michael Woerister for the suggestion.) `load_int_le!` is also no longer necessary.

view details

push time in 7 days

pull request commentrust-lang/rust

Speed up `SipHasher128`.

One more tidbit: the unit tests have a test of short_write that checks that it has the libstd behaviour, i.e. different results on big-endian vs little-endian. Line 412 is the endian-dependent one: https://github.com/rust-lang/rust/blob/3f32e3001e3a64c1baa509d3d1734dff53f14d81/src/librustc_data_structures/sip128/tests.rs#L401-L418

So the test confirms that the libstd behaviour is intended.

nnethercote

comment created time in 7 days

pull request commentrust-lang/rust

Speed up `SipHasher128`.

So it looks the same as on little-endian. This is what I expected because the code in question operates on integers, not on byte sequences, i.e. the number 0xaabbccdd might have a different memory layout on big-endian, but it still has the same numeric value and print!("{:x}", n) prints the same on all architectures.

The new code operates on integers, so I agree that the new short_write is endian-independent; that's why I added the to_le calls in the write_xyz methods, so that the results on big-endian would be different to little-endian, as is done for libstd.

But the old code involves byte sequences in short_write_gen/short_write, and so should not be endian-independent. And the fact that StableHasher has the to_le calls makes sense with this theory -- because SipHasher128 is not endian-independent, StableHasher::write_xyz has to byte-swap on big-endian to be endian-independent.

I just looked more closely at your sip-endian code and I now understand why it gives the same results for all four cases: le-old-code, le-new-code, be-old-code, be-new-code. You didn't copy the SipHasher128 implementations exactly -- you added to_le calls to the old write_xyz functions, and removed them from the new write_xyz functions! So you effectively emulated StableHasher, which is supposed to get the same result on big-endian and little-endian. This shows that the PR as written is correct, yay! (If you undo those changes and re-run, you should get the big-endian outputs I predicted above, starting with old early : 0xaabbccdd, 4.)

I see, that is interesting. I didn't know that the libstd implementation worked this way. It's clear that that must give different results depending on endianess. At the same time StableHasher must give the same result on all platforms for that sequence of calls.

Yes.

I think it's fine for SipHasher128 to handle this differently than libstd, as long as we document it.

I'd prefer it to handle it the same way...

So I think our options for SipHasher128 are:

  1. Don't do any endianess conversions on short_write arguments and rely on short_write to be implemented in an endian independent way (which it is as long as it only does bitwise and arithmetic operations).

  2. Make short_write take a byte slice again and then make sure that StableHasher makes things endian independent by always converting to little endian. (~= the current implementation)

  3. Try to make SipHasher128 behave exactly the same way as std::hash::Hasher (i.e. giving different results depending on endianess) while still using integer arguments for short_write and then let StableHasher pre-process the integers in a way that leads to endian independent hash values. (~= the current version of this PR?)

I prefer option (1) as it is just simpler.

I prefer option (3). My desires are:

  • I don't want to change SipHasher's current behaviour, which is the libstd behaviour (i.e. different results on big-endian and little-endian), because it's exposed to every Rust program and so changing it seems like a very bad idea.
  • I want SipHasher and SipHasher128 to be as similar as possible.
    • Because the latter was clearly derived from the former and I want that derivation to be obvious.
    • Because subtle (i.e. big-endian-only) differences between the two could be confusing.
    • Because I want SipHasher to get the same speed-ups that SipHasher128 is getting.

The only way to satisfy all of these is via (3), which the current PR code implements. The downside is extra to_le calls in both StableHasher::write_xyz and SipHasher128::write_xyz, but I think that's reasonable to satisfy the desires above.

Does that sound reasonable?

I just find to_le() confusing in most contexts. E.g. why does x.to_le().to_le() give me big-endian encoding on a big endian system? I personally prefer to call swap_bytes() which is just more explicit.

That's interesting. I find lots of things about little-endian/big-endian confusing, but I don't have trouble with to_le. It's just a no-op on little-endian and a byte-swap on big-endian. So I do prefer the to_le form. I definitely agree that either version is a clear improvement to u8to64_le and that eliminating load_int_le! is a good thing.
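For anyone following along, here is a small stand-alone illustration of the to_le semantics being discussed (my own example; the byte values are arbitrary):

fn main() {
    let x: u32 = 0xAABB_CCDD;

    if cfg!(target_endian = "little") {
        assert_eq!(x.to_le(), x);               // no-op on little-endian
    } else {
        assert_eq!(x.to_le(), x.swap_bytes());  // byte-swap on big-endian
    }

    // Either way, the little-endian byte encoding of `x` is the same everywhere.
    assert_eq!(x.to_le_bytes(), [0xDD, 0xCC, 0xBB, 0xAA]);
}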

nnethercote

comment created time in 7 days

pull request commentrust-lang/rust

Micro-optimize the heck out of LEB128 reading and writing.

Local check results:

clap-rs-check
        avg: -2.7%      min: -5.6%      max: -0.0%
ucd-check
        avg: -1.3%      min: -2.8%      max: -0.4%
coercions-check
        avg: -1.0%?     min: -2.2%?     max: -0.0%?
tuple-stress-check
        avg: -0.7%      min: -1.6%      max: -0.0%
wg-grammar-check
        avg: -0.6%      min: -1.6%      max: -0.0%
html5ever-check
        avg: -0.9%      min: -1.4%      max: -0.2%
script-servo-check
        avg: -0.8%      min: -1.1%      max: -0.1%
cranelift-codegen-check
        avg: -0.5%      min: -1.0%      max: -0.1%
unused-warnings-check
        avg: -0.4%      min: -1.0%      max: -0.0%
webrender-check
        avg: -0.6%      min: -1.0%      max: -0.1%
regression-31157-check
        avg: -0.6%      min: -1.0%      max: -0.2%
regex-check
        avg: -0.7%      min: -1.0%      max: -0.1%
piston-image-check
        avg: -0.6%      min: -0.9%      max: -0.1%
cargo-check
        avg: -0.5%      min: -0.9%      max: -0.0%
webrender-wrench-check
        avg: -0.6%      min: -0.8%      max: -0.1%
hyper-2-check
        avg: -0.4%      min: -0.8%      max: -0.1%
keccak-check
        avg: -0.3%      min: -0.8%      max: -0.0%
futures-check
        avg: -0.5%      min: -0.8%      max: -0.1%
syn-check
        avg: -0.5%      min: -0.8%      max: -0.1%
packed-simd-check
        avg: -0.4%      min: -0.8%      max: -0.0%
ripgrep-check
        avg: -0.5%      min: -0.8%      max: -0.1%
serde-check
        avg: -0.3%      min: -0.8%      max: -0.0%
encoding-check
        avg: -0.5%      min: -0.8%      max: -0.1%
serde-serde_derive-check
        avg: -0.4%      min: -0.7%      max: -0.0%
style-servo-check
        avg: -0.4%      min: -0.7%      max: -0.0%
tokio-webpush-simple-check
        avg: -0.5%      min: -0.7%      max: -0.2%
inflate-check
        avg: -0.2%      min: -0.7%      max: -0.0%
await-call-tree-check
        avg: -0.6%      min: -0.7%      max: -0.4%
issue-46449-check
        avg: -0.5%      min: -0.7%      max: -0.4%
wf-projection-stress-65510-che...
        avg: -0.2%      min: -0.6%      max: 0.0%
unicode_normalization-check
        avg: -0.2%      min: -0.6%      max: -0.0%
helloworld-check
        avg: -0.3%      min: -0.5%      max: -0.1%
ctfe-stress-4-check
        avg: -0.2%?     min: -0.5%?     max: 0.2%?
unify-linearly-check
        avg: -0.3%      min: -0.4%      max: -0.2%
deeply-nested-check
        avg: -0.3%      min: -0.4%      max: -0.2%
deep-vector-check
        avg: -0.1%      min: -0.3%      max: -0.0%
token-stream-stress-check
        avg: -0.1%      min: -0.1%      max: -0.0%

The biggest improvements are on "clean incremental" runs, followed by "patched incremental".

nnethercote

comment created time in 7 days

pull request commentrust-lang/rust

Micro-optimize the heck out of LEB128 reading and writing.

@bors try @rust-timer queue

nnethercote

comment created time in 7 days

PR opened rust-lang/rust

Micro-optimize the heck out of LEB128 reading and writing.

This commit makes the following writing improvements:

  • Removes the unnecessary write_to_vec function.
  • Reduces the number of conditions per loop from 2 to 1.
  • Avoids a mask and a shift on the final byte.

And the following reading improvements:

  • Removes an unnecessary type annotation.
  • Fixes a dangerous unchecked slice access. Imagine a slice [0x80] -- the current code will read some number of bytes past the end of the slice. The bounds check at the end will subsequently trigger, unless something bad (like a crash) happens first. The cost of doing a bounds check in the loop body is negligible.
  • Avoids a mask on the final byte.

And the following improvements for both reading and writing:

  • Changes the loops from `for` to `loop`, avoiding an unnecessary condition on each iteration. This also removes the need for leb128_size.

All of these changes give significant perf wins, up to 5%.
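For readers unfamiliar with the encoding, here is a minimal stand-alone sketch of unsigned LEB128 writing with the loop shape described above (my own illustration, not the PR's exact macro-generated code):

// Each output byte carries 7 bits of payload; the high bit marks "more bytes follow".
fn write_leb128(out: &mut Vec<u8>, mut value: u64) {
    loop {
        if value < 0x80 {
            out.push(value as u8);  // final byte: no mask or continuation bit needed
            break;
        } else {
            out.push(((value & 0x7f) | 0x80) as u8);
            value >>= 7;
        }
    }
}

fn main() {
    let mut out = Vec::new();
    write_leb128(&mut out, 300);
    assert_eq!(out, [0xAC, 0x02]); // 300 = 0b10_0101100 -> 0x2C | 0x80, then 0x02
}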

r? @michaelwoerister

+14 -50

0 comment

1 changed file

pr created time in 7 days

create branch nnethercote/rust

branch : micro-optimize-leb128

created branch time in 7 days

pull request commentrust-lang/rust

Speed up `SipHasher128`.

@michaelwoerister: I have added some debugging eprintln statements to your code: https://github.com/nnethercote/sip-endian/tree/add-some-printlns

This is the output I get on little-endian:

old early  : 0xddccbbaa, 4
old early  : 0xeeddccbbaa, 5
old process: 0x345678eeddccbbaa, 5
old spill  : 0x12, 1
old: 20f554e44fa4ca9 d68f01a898684a41
new early  : 0xddccbbaa, 4
new early  : 0xeeddccbbaa, 5
new process: 0x345678eeddccbbaa, 5
new spill  : 0x12, 1
new: 20f554e44fa4ca9 d68f01a898684a41

This is the output I expect on big-endian:

old early  : 0xaabbccdd, 4
old early  : 0xeeaabbccdd, 5
old process: 0x563412eeaabbccdd, 5
old spill  : 0x78, 1
old: 20f554e44fa4ca9 d68f01a898684a41
new early  : 0xaabbccdd, 4
new early  : 0xeeaabbccdd, 5
new process: 0x563412eeaabbccdd, 5
new spill  : 0x78, 1
new: <something> <something>

Can you check the big-endian results?

nnethercote

comment created time in 8 days

create branch nnethercote/sip-endian

branch : add-some-printlns

created branch time in 8 days

pull request commentrust-lang/rust

Speed up `SipHasher128`.

>         #[cfg(target_endian = "big")]
>         {
>             // If this is a big endian system we swap bytes, so that the first
>             // byte ends up in the lowest order byte, like SipHash expects.
>             out = out.swap_bytes();
>         }
>
>         out

This whole snippet can be simplified to out.to_le().

nnethercote

comment created time in 8 days

pull request commentrust-lang/rust

Speed up `SipHasher128`.

In case it helps, here is what I think should happen in the four SipHasher128 cases for the above example.

-----------------------------------------------------------------------------
little-endian
-----------------------------------------------------------------------------
SipHasher128, old code
- write_u32(0xDDCCBBAA)
  - short_write([AA, BB, CC, DD])
  - needed = 8, fill = 4
  - self.tail |= u8to64_le(msg, 0, 4) << 0 --> 0xDDCCBBAA
- write_u8(0xEE)
  - short_write([EE])
  - needed = 4, fill = 1
  - self.tail |= u8to64_le(msg, 0, 1) << 4*8 --> 0xEE_DDCCBBAA
- write_u32(0xIIHHGGFF)
  - short_write([FF, GG, HH, II])
  - needed = 3, fill = 3
  - self.tail |= u8to64_le(msg, 0, 3) << 5*8 --> 0xHHGGFF_EE_DDCCBBAA
  - process
  - self.tail = u8to64_le(msg, 3, 1) --> 0xII

SipHasher128, new code
- write_u32(0xDDCCBBAA)
  - short_write(0x00000000_DDCCBBAA)
  - needed = 8, fill = 4
  - self.tail |= x << 0 --> 0xDDCCBBAA
- write_u8(0xEE)
  - short_write(0x00000000_000000EE)
  - needed = 4, fill = 1
  - self.tail |= x << 4*8 --> 0xEE_DDCCBBAA
- write_u32(0xIIHHGGFF)
  - short_write(0x00000000_IIHHGGFF)
  - needed = 3, fill = 3
  - self.tail |= x << 5*8 --> 0xHHGGFF_EE_DDCCBBAA
  - process
  - self.tail = x >> 3*8 --> 0xII

-----------------------------------------------------------------------------
big-endian
-----------------------------------------------------------------------------
SipHasher128, old code
- write_u32(0xDDCCBBAA)
  - short_write([DD, CC, BB, AA])
  - needed = 8, fill = 4
  - self.tail |= u8to64_le(msg, 0, 4) << 0 --> 0xAABBCCDD
- write_u8(0xEE)
  - short_write([EE])
  - needed = 4, fill = 1
  - self.tail |= u8to64_le(msg, 0, 1) << 4*8 --> 0xEE_AABBCCDD
- write_u32(0xIIHHGGFF)
  - short_write([II, HH, GG, FF])
  - needed = 3, fill = 3
  - self.tail |= u8to64_le(msg, 0, 3) << 5*8 --> 0xGGHHII_EE_AABBCCDD
  - process
  - self.tail = u8to64_le(msg, 3, 1) --> 0xFF

SipHasher128, new code
- write_u32(0xDDCCBBAA)
  - short_write(0x00000000_AABBCCDD)    // was byte-swapped, then zero-extended
  - needed = 8, fill = 4
  - self.tail |= x << 0 --> 0xAABBCCDD
- write_u8(0xEE)
  - short_write(0x00000000_000000EE)    // was zero-extended
  - needed = 4, fill = 1
  - self.tail |= x << 4*8 --> 0xEE_AABBCCDD
- write_u32(0xIIHHGGFF)
  - short_write(0x00000000_FFGGHHII)    // was byte-swapped, then zero-extended
  - needed = 3, fill = 3
  - self.tail |= x << 5*8 --> 0xGGHHII_EE_AABBCCDD
  - process
  - self.tail = x >> 3*8 --> 0xFF

I have confirmed that the two little-endian cases are correct, I haven't been able to confirm the big-endian cases.

nnethercote

comment created time in 8 days

pull request commentrust-lang/rust

Speed up `SipHasher128`.

I requested access to the GCC farm on Saturday, but I am still waiting for a response.

The requirement here is that the same sequence of write_xyz() calls with the same numeric values must produce the same final hash value, independent of endianess.

Hmm. I was using this code as the basis for my reasoning: https://github.com/rust-lang/rust/blob/e6ec0d125eba4074122b187032474b4174fb9d31/src/libcore/hash/mod.rs#L297-L326

The use of to_ne_bytes shows that, by default, for a given sequence of write_xyz calls, any hasher will give different results on little-endian vs. big-endian. Going back to my example:

  • write_u32(0xDDCCBBAA)
  • write_u8(0xEE)
  • write_u32(0xIIHHGGFF)

On little-endian it is equivalent to write([AA,BB,CC,DD, EE, FF,GG,HH,II]). On big-endian it is equivalent to write([DD,CC,BB,AA, EE, II,HH,GG,FF]). Clearly the results will be different.

I was taking this equivalence to be axiomatic (i.e. required). But it makes sense that StableHasher requires the same results on little-endian and big-endian, therefore it must violate this equivalence. I guess that's ok, so long as it's consistent?

But should SipHasher128 violate this equivalence? Likewise, what about SipHasher in core? I'm not sure. My instinct is that SipHasher128/SipHasher should not violate the equivalence, in which case the endian-independence should be provided by StableHasher -- and it currently does this by using to_le in its write_xyz methods.
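To make that concrete, here is a rough sketch (simplified; not the exact libcore or rustc code) of the two behaviours being contrasted -- the default Hasher forwarding with native-endian bytes versus a stable wrapper that normalises to little-endian first:

use std::hash::Hasher;

// Default-style behaviour: native-endian bytes, so the byte stream fed to the
// hasher -- and hence the result -- differs between little- and big-endian targets.
fn default_style_write_u32<H: Hasher>(h: &mut H, i: u32) {
    h.write(&i.to_ne_bytes());
}

// StableHasher-style behaviour: normalise to little-endian first, so the byte
// stream (and therefore the hash) is identical on every target.
fn stable_style_write_u32<H: Hasher>(h: &mut H, i: u32) {
    h.write(&i.to_le_bytes());
}

fn main() {
    use std::collections::hash_map::DefaultHasher;
    let mut h = DefaultHasher::new();
    stable_style_write_u32(&mut h, 0xDDCC_BBAA);
    println!("{:x}", h.finish());
}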

On a little endian machine everything works as expected. However, when I tried it on a big endian machine (gcc110 from cfarm.tetaneutral.net), I got different values until I removed the to_le() calls from the PR's implementation. Once I did that the values matched those on the little endian machine (and those of the current implementation).

Thank you for doing this checking. Here's what I was expecting from SipHasher128:

  • le-old-code == le-new-code
  • be-old-code == be-new-code
  • le-old-code != be-old-code
  • le-new-code != be-new-code

Can you write down the values you got for the four combinations?

For StableHasher, I would expect:

  • le-old-code == le-new-code == be-old-code == be-new-code

because of the extra to_le in StableHasher::write_xyz.

nnethercote

comment created time in 8 days

pull request commentrust-lang/rust

Reduce the number of `RefCell`s in `InferCtxt`.

@nnethercote Could we use a CellVec, CellHashMap to avoid the RefCell all together or do these generally need to borrow their stored values?

I don't know anything about CellVec or CellHashMap. I tried searching with Google and on crates.io without luck. Can you provide a link to some information about them?

nnethercote

comment created time in 8 days

pull request commentrust-lang/rust

Hasten macro parsing

I addressed the comments, and added a trivial commit fixing a typo in a variable name.

@bors r=petrochenkov

nnethercote

comment created time in 8 days

push eventnnethercote/rust

Erin Power

commit sha 49d78fcd901700c5a14e19a6679db1646b5ca901

Add GitHub issue templates

view details

Tomasz Miąsko

commit sha 7e3c51d085d9eaa2204cc18763bc7d98b66435fd

Instrument C / C++ in MemorySanitizer example Modify the example to instrument C / C++ in addition to Rust, since it will be generally required (e.g., when using libbacktrace for symbolication). Additionally use rustc specific flag to track the origins of unitialized memory rather than LLVM one.

view details

Trevor Spiteri

commit sha aa046da61f8722dfe46204cb303dbc9d2b4cb32e

rustdoc: attempt full build for compile_fail test Some code fails when doing a full build but does not fail when only emitting metadata. This commit makes sure compile_fail tests for such code behave as expected, that is, the test succeeds because the compilation fails.

view details

Trevor Spiteri

commit sha 6d768ddecc13c4acf45730952c0af401a990383a

error code examples: replace some ignore with compile_fail

view details

Matthew Jasper

commit sha a81c59f9b84b6519785a4e0ae9234107d149f454

Remove some unsound specializations

view details

David Ross

commit sha 276734d6a4997088b6d2e7416f5d4c07b4c8acf5

Fix 59191 This adds an explicit error for when macros replace the crate root with a non-module item.

view details

David Ross

commit sha 410114b9d243020482689a94f7b254600f4d819e

Add tests for issue 59191

view details

Tyler Lanphear

commit sha 9fa54e594b371bda6e8a2bb570e645d5aa61820b

stdarch: update submodule.

view details

Friedrich von Never

commit sha b0a9e949e7afc1a77b6f73a0d3fa6b6081763a57

Strip unnecessary subexpression It became unnecessary since a06baa56b95674fc626b3c3fd680d6a65357fe60 reformatted the file.

view details

Jonas Schievink

commit sha 044fe0f558aa62926e6de9a76b95e4a74c0b1f99

Add a resume type parameter to `Generator`

view details

Jonas Schievink

commit sha 0117033c721d35ade8d815e1fbf83f10d73f15e4

Add a resume type param to the generator substs ...and unify it with `()` for now

view details

Jonas Schievink

commit sha 25af2f66cec1366f845e1de1bfec8b64d4f5cfff

Use real resume type as second argument

view details

Jonas Schievink

commit sha 8a1227a67bd5df8a8f27c02b7032bd8092d44a92

Infer type of `yield` to be resume type

view details

Jonas Schievink

commit sha 32005fe1957fc163036fbe0da8b12d39a9fb54cb

Allow 0 or 1 explicit generator parameters

view details

Jonas Schievink

commit sha 2101a1fec0e53677e32d1389b44f70a987a97c8d

Adjust tests to type inference changes This makes some error messages ungreat, but those seem to be preexisting bugs that also apply to closures / return position `impl Trait` in general.

view details

Jonas Schievink

commit sha f2c1468965a7af5887d353adf77427344327be0d

Add resume arg place to `Yield` MIR terminator

view details

Jonas Schievink

commit sha 3c069a066e50598ef230ba71ed5c5bcf596beb90

Change MIR building to fill in the resume place This changes `Yield` from `as_rvalue` to `into` lowering, which could have a possible performance impact. I could imagine special-casing some resume types here to use a simpler lowering for them, but it's unclear if that makes sense at this stage.

view details

Jonas Schievink

commit sha 3c22e51e7f6debd96af76f36aa8b090c40b8acb6

Make generator transform move resume arg around The resume arg is passed as argument `_2` and needs to be moved to the `Yield`s target `Place`

view details

Jonas Schievink

commit sha 5b2059b2572cff9974e6820791c8ab57b6c50234

Fix error message on type mismatch in generator Instead of "closure is expected to take 0 arguments" we now get the expected type mismatch error.

view details

Jonas Schievink

commit sha fca614eb578092fd869df57d6654ba0dcf92c6ef

Add tests for generator resume arguments

view details

push time in 8 days

delete branch nnethercote/rust

delete branch : reduce-RefCells-in-InferCtxt

delete time in 8 days

PR closed rust-lang/rust

Fix how the `RUSTC_CTFE_BACKTRACE` env var is gotten. S-waiting-on-review

This environment variable is currently obtained very frequently in CTFE-heavy code; using lazy_static avoids repeating the work.

For the ctfe-stress-4 benchmark this eliminates 67% of allocations done, and for coercions it eliminates 17% of allocations done.
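For illustration, the caching pattern being described looks roughly like this (a sketch only -- it assumes the lazy_static crate and simplifies the variable to a set/unset check, which is not necessarily how the patch treats it):

use lazy_static::lazy_static;

lazy_static! {
    // The `env::var` lookup now happens at most once per process.
    static ref CTFE_BACKTRACE: Option<String> = std::env::var("RUSTC_CTFE_BACKTRACE").ok();
}

fn backtrace_requested() -> bool {
    CTFE_BACKTRACE.is_some()
}

fn main() {
    println!("backtrace requested: {}", backtrace_requested());
}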

r? @RalfJung

+7 -1

21 comments

3 changed files

nnethercote

pr closed time in 8 days

pull request commentrust-lang/rust

Fix how the `RUSTC_CTFE_BACKTRACE` env var is gotten.

I will close this PR because it's not the right approach. I look forward to seeing @wesleywiser's alternative :)

nnethercote

comment created time in 8 days

push eventnnethercote/rust

Nicholas Nethercote

commit sha f8a02864afa2faecc3cb9cb8f81905a61a638ade

Speed up `SipHasher128`. The current code in `SipHasher128::short_write` is inefficient. It uses `u8to64_le` (which is complex and slow) to extract just the right number of bytes of the input into a u64 and pad the result with zeroes. It then left-shifts that value in order to bitwise-OR it with `self.tail`. For example, imagine we have a u32 input 0xIIHH_GGFF and only need three bytes to fill up `self.tail`. The current code uses `u8to64_le` to construct 0x0000_0000_00HH_GGFF, which is just 0xIIHH_GGFF with the 0xII removed and zero-extended to a u64. The code then left-shifts that value by five bytes -- discarding the 0x00 byte that replaced the 0xII byte! -- to give 0xHHGG_FF00_0000_0000. It then then ORs that value with self.tail. There's a much simpler way to do it: zero-extend to u64 first, then left shift. E.g. 0xIIHH_GGFF is zero-extended to 0x0000_0000_IIHH_GGFF, and then left-shifted to 0xHHGG_FF00_0000_0000. We don't have to take time to exclude the unneeded 0xII byte, because it just gets shifted out anyway! It also avoids multiple occurrences of `unsafe`. There's a similar story with the setting of `self.tail` at the method's end. The current code uses `u8to64_le` to extract the remaining part of the input, but the same effect can be achieved more quickly with a right shift on the zero-extended input. All that works on little-endian. It doesn't work for big-endian, but we can just do a `to_le` before calling `short_write` and then it works. This commit changes `SipHasher128` to use the simpler shift-based approach. The code is also smaller, which means that `short_write` is now inlined where previously it wasn't, which makes things faster again. This gives big speed-ups for all incremental builds, especially "baseline" incremental builds.

view details

push time in 8 days

pull request commentrust-lang/rust

Hasten macro parsing

@petrochenkov are you happy to r+ this instead of @eddyb?

nnethercote

comment created time in 8 days

push eventnnethercote/rust

Nicholas Nethercote

commit sha 7edcdc852e4b73bab3fe0eb399c21eeadc69cf10

Speed up `SipHasher128`. The current code in `SipHasher128::short_write` is inefficient. It uses `u8to64_le` (which is complex and slow) to extract just the right number of bytes of the input into a u64 and pad the result with zeroes. It then left-shifts that value in order to bitwise-OR it with `self.tail`. For example, imagine we have a u32 input 0xIIHH_GGFF and only need three bytes to fill up `self.tail`. The current code uses `u8to64_le` to construct 0x0000_0000_00HH_GGFF, which is just 0xIIHH_GGFF with the 0xII removed and zero-extended to a u64. The code then left-shifts that value by five bytes -- discarding the 0x00 byte that replaced the 0xII byte! -- to give 0xHHGG_FF00_0000_0000. It then then ORs that value with self.tail. There's a much simpler way to do it: zero-extend to u64 first, then left shift. E.g. 0xIIHH_GGFF is zero-extended to 0x0000_0000_IIHH_GGFF, and then left-shifted to 0xHHGG_FF00_0000_0000. We don't have to take time to exclude the unneeded 0xII byte, because it just gets shifted out anyway! It also avoids multiple occurrences of `unsafe`. There's a similar story with the setting of `self.tail` at the method's end. The current code uses `u8to64_le` to extract the remaining part of the input, but the same effect can be achieved more quickly with a right shift on the zero-extended input. All that works on little-endian. It doesn't work for big-endian, but we can just do a `to_le` before calling `short_write` and then it works. This commit changes `SipHasher128` to use the simpler shift-based approach. The code is also smaller, which means that `short_write` is now inlined where previously it wasn't, which makes things faster again. This gives big speed-ups for all incremental builds, especially "baseline" incremental builds.

view details

push time in 8 days

pull request commentrust-lang/rust

Speed up `SipHasher128`.

@michaelwoerister: After some thought I see that my original code was not correct for big-endian. Fortunately there is a cheap and easy fix.

Consider this 9 byte stream:

  • write_u32(0xDDCCBBAA)
  • write_u8(0xEE)
  • write_u32(0xIIHHGGFF)

On little-endian it is equivalent to write([AA,BB,CC,DD, EE, FF,GG,HH,II]). SipHash parses the input stream as 8-byte little-endian words, so it must process the first 8 bytes of this stream as 0xHHGGFF_EE_DDCCBBAA, and the second 8 bytes would be 0x??????????????_II.

On big-endian it is equivalent to write([DD,CC,BB,AA, EE, II,HH,GG,FF]). SipHash parses the input stream as 8-byte little-endian words, so it must process the first 8 bytes of this stream as 0xGGHHII_EE_AABBCCDD, and the second 8 bytes would be 0x??????????????_FF.

The new short_write works correctly for little-endian, i.e. given the above write_u32/write_u8/write_u32 sequence, the first 8 byte value produced from the stream is 0xHHGGFF_EE_DDCCBBAA, with 0xII leftover.

To make it work for big-endian, we just need to call to_le to do a byte-swap on the integer inputs in write_* before doing anything else, and then things work out. E.g. it's easy to see that the 8 byte value 0xHHGGFF_EE_DDCCBBAA becomes 0xGGHHII_EE_AABBCCDD (with 0xFF left over) when the individual integers are byte-swapped, and that's the value we want on big-endian.

I have updated the PR to do this. I haven't tested it on a big-endian machine but I'm fairly confident it's correct. But this stuff is tricky to think about so, again, I'm happy to hear second opinions.

nnethercote

comment created time in 8 days

push eventnnethercote/rust

Erin Power

commit sha 49d78fcd901700c5a14e19a6679db1646b5ca901

Add GitHub issue templates

view details

Tomasz Miąsko

commit sha 7e3c51d085d9eaa2204cc18763bc7d98b66435fd

Instrument C / C++ in MemorySanitizer example Modify the example to instrument C / C++ in addition to Rust, since it will be generally required (e.g., when using libbacktrace for symbolication). Additionally use rustc specific flag to track the origins of unitialized memory rather than LLVM one.

view details

Trevor Spiteri

commit sha aa046da61f8722dfe46204cb303dbc9d2b4cb32e

rustdoc: attempt full build for compile_fail test Some code fails when doing a full build but does not fail when only emitting metadata. This commit makes sure compile_fail tests for such code behave as expected, that is, the test succeeds because the compilation fails.

view details

Trevor Spiteri

commit sha 6d768ddecc13c4acf45730952c0af401a990383a

error code examples: replace some ignore with compile_fail

view details

Matthew Jasper

commit sha a81c59f9b84b6519785a4e0ae9234107d149f454

Remove some unsound specializations

view details

David Ross

commit sha 276734d6a4997088b6d2e7416f5d4c07b4c8acf5

Fix 59191 This adds an explicit error for when macros replace the crate root with a non-module item.

view details

David Ross

commit sha 410114b9d243020482689a94f7b254600f4d819e

Add tests for issue 59191

view details

Tyler Lanphear

commit sha 9fa54e594b371bda6e8a2bb570e645d5aa61820b

stdarch: update submodule.

view details

Friedrich von Never

commit sha b0a9e949e7afc1a77b6f73a0d3fa6b6081763a57

Strip unnecessary subexpression It became unnecessary since a06baa56b95674fc626b3c3fd680d6a65357fe60 reformatted the file.

view details

Jonas Schievink

commit sha 044fe0f558aa62926e6de9a76b95e4a74c0b1f99

Add a resume type parameter to `Generator`

view details

Jonas Schievink

commit sha 0117033c721d35ade8d815e1fbf83f10d73f15e4

Add a resume type param to the generator substs ...and unify it with `()` for now

view details

Jonas Schievink

commit sha 25af2f66cec1366f845e1de1bfec8b64d4f5cfff

Use real resume type as second argument

view details

Jonas Schievink

commit sha 8a1227a67bd5df8a8f27c02b7032bd8092d44a92

Infer type of `yield` to be resume type

view details

Jonas Schievink

commit sha 32005fe1957fc163036fbe0da8b12d39a9fb54cb

Allow 0 or 1 explicit generator parameters

view details

Jonas Schievink

commit sha 2101a1fec0e53677e32d1389b44f70a987a97c8d

Adjust tests to type inference changes This makes some error messages ungreat, but those seem to be preexisting bugs that also apply to closures / return position `impl Trait` in general.

view details

Jonas Schievink

commit sha f2c1468965a7af5887d353adf77427344327be0d

Add resume arg place to `Yield` MIR terminator

view details

Jonas Schievink

commit sha 3c069a066e50598ef230ba71ed5c5bcf596beb90

Change MIR building to fill in the resume place This changes `Yield` from `as_rvalue` to `into` lowering, which could have a possible performance impact. I could imagine special-casing some resume types here to use a simpler lowering for them, but it's unclear if that makes sense at this stage.

view details

Jonas Schievink

commit sha 3c22e51e7f6debd96af76f36aa8b090c40b8acb6

Make generator transform move resume arg around The resume arg is passed as argument `_2` and needs to be moved to the `Yield`s target `Place`

view details

Jonas Schievink

commit sha 5b2059b2572cff9974e6820791c8ab57b6c50234

Fix error message on type mismatch in generator Instead of "closure is expected to take 0 arguments" we now get the expected type mismatch error.

view details

Jonas Schievink

commit sha fca614eb578092fd869df57d6654ba0dcf92c6ef

Add tests for generator resume arguments

view details

push time in 8 days

pull request commentrust-lang/rust

Reduce the number of `RefCell`s in `InferCtxt`.

Bah, I fell for the old git-push-with-uncommitted-changes trick. Let's try again.

@bors r=varkor

nnethercote

comment created time in 9 days

push eventnnethercote/rust

Nicholas Nethercote

commit sha 7426853ba255940b880f2e7f8026d60b94b42404

Reduce the number of `RefCell`s in `InferCtxt`. `InferCtxt` contains six structures within `RefCell`s. Every time we create and dispose of (commit or rollback) a snapshot we have to `borrow_mut` each one of them. This commit moves the six structures under a single `RefCell`, which gives significant speed-ups by reducing the number of `borrow_mut` calls. To avoid runtime errors I had to reduce the lifetimes of dynamic borrows in a couple of places.

view details

push time in 9 days

pull request commentrust-lang/rust

Reduce the number of `RefCell`s in `InferCtxt`.

I rebased.

@bors r=varkor

nnethercote

comment created time in 10 days

push eventnnethercote/rust

Charles Gleason

commit sha 293cdf7ac5d14811debdec3408afde104935caef

Make RangeMut::next_unchecked() output a mutable key reference

view details

Charles Gleason

commit sha f547978392872684085c96a3d5c1d00bad24b724

Implement clone_from for BTree collections

view details

Charles Gleason

commit sha 8651aa066fdbbcfaa082531969469c3fa289de9e

Add test for BTreeMap::clone_from

view details

Erin Power

commit sha 49d78fcd901700c5a14e19a6679db1646b5ca901

Add GitHub issue templates

view details

Linus Färnstrand

commit sha b5ff8064a4fe5b2bc70ee209b19d129b8ffc3ebc

Add MIN/MAX associated constants to the integer types

view details

Linus Färnstrand

commit sha 22dcfa1d8d18687d7ca0b91974bce4202d3383e9

Add relevant associated constants to the float types

view details

Linus Färnstrand

commit sha 9d257579fcaa44f69c8f7d5f668b05ae89e4507b

Fix some float operations to work together with the assoc consts

view details

Linus Färnstrand

commit sha 6ce16cfa42dcb1acd2c823a1d45269132d372409

Remove no longer valid test

view details

Linus Färnstrand

commit sha 9fcbaa4158948324f395ff2eb8061abdf6dbc21f

Fix broken show-const-contents test

view details

Linus Färnstrand

commit sha 4d9e90d2a5146e3f8639b53f29e210be94b30933

Unlock assoc_int_consts in core+std

view details

Linus Färnstrand

commit sha 002c7897a6c92397f6682bf7e9e86c9b4efd5c51

Unlock assoc_int_consts in documentation examples using it

view details

Linus Färnstrand

commit sha 61fecfb82fe088af6d3a7832b72f298064398aff

Add test accessing the module level int/float consts

view details

varkor

commit sha f4f96e294335de13bc7341c626837affdb2e4a45

Normalise diagnostics with respect to "the X is declared/defined here"

view details

varkor

commit sha 24a2929ed1d3e1760bf89c878352448fb5ee2087

Normalise notes with the/is

view details

varkor

commit sha 45832839087da140eeb2a85a8b98927ec27ba21c

Update new tests

view details

Tomasz Miąsko

commit sha 7e3c51d085d9eaa2204cc18763bc7d98b66435fd

Instrument C / C++ in MemorySanitizer example Modify the example to instrument C / C++ in addition to Rust, since it will be generally required (e.g., when using libbacktrace for symbolication). Additionally use rustc specific flag to track the origins of unitialized memory rather than LLVM one.

view details

Andreas Molzer

commit sha 47ae565ed4f1b2a7cc754d4cf0af520b5e6841b9

Add a method to query the capacity of a BufWriter

view details

Mazdak Farrokhzad

commit sha dc17f38e041e6bde95c6f6c5c6170dbb3917d51e

check_unsafety: more code reuse

view details

Andreas Molzer

commit sha aebd0d733940d62566c66a923c7b9f7078209e98

Add capacity to BufReader with same unstable gate

view details

Andrew Paverd

commit sha c0744e1e0c35b1083733fd5c74fc3fb5a6cd04f7

Add support for Control Flow Guard on Windows. This patch enables rustc to emit the required LLVM module flags to enable Control Flow Guard metadata (cfguard=1) or metadata and checks (cfguard=2). The LLVM module flags are ignored on unsupported targets and operating systems.

view details

push time in 10 days

push eventnnethercote/rust

Erin Power

commit sha 49d78fcd901700c5a14e19a6679db1646b5ca901

Add GitHub issue templates

view details

Tomasz Miąsko

commit sha 7e3c51d085d9eaa2204cc18763bc7d98b66435fd

Instrument C / C++ in MemorySanitizer example Modify the example to instrument C / C++ in addition to Rust, since it will be generally required (e.g., when using libbacktrace for symbolication). Additionally use rustc specific flag to track the origins of unitialized memory rather than LLVM one.

view details

Trevor Spiteri

commit sha aa046da61f8722dfe46204cb303dbc9d2b4cb32e

rustdoc: attempt full build for compile_fail test Some code fails when doing a full build but does not fail when only emitting metadata. This commit makes sure compile_fail tests for such code behave as expected, that is, the test succeeds because the compilation fails.

view details

Trevor Spiteri

commit sha 6d768ddecc13c4acf45730952c0af401a990383a

error code examples: replace some ignore with compile_fail

view details

Matthew Jasper

commit sha a81c59f9b84b6519785a4e0ae9234107d149f454

Remove some unsound specializations

view details

David Ross

commit sha 276734d6a4997088b6d2e7416f5d4c07b4c8acf5

Fix 59191 This adds an explicit error for when macros replace the crate root with a non-module item.

view details

David Ross

commit sha 410114b9d243020482689a94f7b254600f4d819e

Add tests for issue 59191

view details

Tyler Lanphear

commit sha 9fa54e594b371bda6e8a2bb570e645d5aa61820b

stdarch: update submodule.

view details

Friedrich von Never

commit sha b0a9e949e7afc1a77b6f73a0d3fa6b6081763a57

Strip unnecessary subexpression It became unnecessary since a06baa56b95674fc626b3c3fd680d6a65357fe60 reformatted the file.

view details

Jonas Schievink

commit sha 044fe0f558aa62926e6de9a76b95e4a74c0b1f99

Add a resume type parameter to `Generator`

view details

Jonas Schievink

commit sha 0117033c721d35ade8d815e1fbf83f10d73f15e4

Add a resume type param to the generator substs ...and unify it with `()` for now

view details

Jonas Schievink

commit sha 25af2f66cec1366f845e1de1bfec8b64d4f5cfff

Use real resume type as second argument

view details

Jonas Schievink

commit sha 8a1227a67bd5df8a8f27c02b7032bd8092d44a92

Infer type of `yield` to be resume type

view details

Jonas Schievink

commit sha 32005fe1957fc163036fbe0da8b12d39a9fb54cb

Allow 0 or 1 explicit generator parameters

view details

Jonas Schievink

commit sha 2101a1fec0e53677e32d1389b44f70a987a97c8d

Adjust tests to type inference changes This makes some error messages ungreat, but those seem to be preexisting bugs that also apply to closures / return position `impl Trait` in general.

view details

Jonas Schievink

commit sha f2c1468965a7af5887d353adf77427344327be0d

Add resume arg place to `Yield` MIR terminator

view details

Jonas Schievink

commit sha 3c069a066e50598ef230ba71ed5c5bcf596beb90

Change MIR building to fill in the resume place This changes `Yield` from `as_rvalue` to `into` lowering, which could have a possible performance impact. I could imagine special-casing some resume types here to use a simpler lowering for them, but it's unclear if that makes sense at this stage.

view details

Jonas Schievink

commit sha 3c22e51e7f6debd96af76f36aa8b090c40b8acb6

Make generator transform move resume arg around The resume arg is passed as argument `_2` and needs to be moved to the `Yield`s target `Place`

view details

Jonas Schievink

commit sha 5b2059b2572cff9974e6820791c8ab57b6c50234

Fix error message on type mismatch in generator Instead of "closure is expected to take 0 arguments" we now get the expected type mismatch error.

view details

Jonas Schievink

commit sha fca614eb578092fd869df57d6654ba0dcf92c6ef

Add tests for generator resume arguments

view details

push time in 10 days

pull request commentrust-lang/rust

Hasten macro parsing

Parser cloning is very rare on this code path, so much so that there is no measurable benefit to making it cheaper.

I previously made Directory a Cow to avoid allocations on this hot path. This new Parser-is-Cow optimization subsumes the old Directory-is-Cow optimization.

Does that help? I'm not sure how else to explain it.

nnethercote

comment created time in 10 days

pull request commentrust-lang/rust

Reduce the number of `RefCell`s in `InferCtxt`.

Oops, there are conflicts.

nnethercote

comment created time in 10 days

pull request commentrust-lang/rust

Reduce the number of `RefCell`s in `InferCtxt`.

AFAICT no rebasing is required.

@bors r=varkor

nnethercote

comment created time in 10 days

pull request commentrust-lang/rust

Speed up `SipHasher128`.

I did think about big-endian. AFAIK, the new code will work fine there. Things are simpler because the code operates mostly on integers, with fewer conversions. But I'm happy to hear a second opinion.

Is there a way to know for sure if I'm right about this? Do you know if the tests cover BE, or if we have access to any BE machines for testing?

nnethercote

comment created time in 11 days

push eventnnethercote/unicode-xid

Nicholas Nethercote

commit sha bffbff67a89de59ebb16df39aa19798779a8f6f7

Speed up `bsearch_range_table`. The comparison function used for the binary search is sub-optimal, because it looks for an `Equal` result first, even though that is the least common result (happening 0 or 1 times per search). This commit changes the comparison function to look for `Equals` last. As a result, the number of tests executed within the comparison function is reduced: - On a `Greater` result: 2 tests --> 1 test - On a `Less` result: 3 tests --> 2 tests - On an `Equals` result (next most common): 2 tests --> 2 tests

view details

Manish Goregaokar

commit sha bb14270a5fed9865ac324e5587ed7f10a9ac7395

Merge pull request #16 from nnethercote/speed-up-bsearch_range_table Speed up `bsearch_range_table`.

view details

push time in 11 days

delete branch nnethercote/unicode-xid

delete branch : speed-up-bsearch_range_table

delete time in 11 days

pull request commentunicode-rs/unicode-xid

Speed up `bsearch_range_table`.

Note that bsearch_range_table showed up in one profile for rustc, because proc-macro2-0.4.3 has this function which is reasonably hot:

fn xid_ok(string: &str) -> bool {
    let mut chars = string.chars(); 
    let first = chars.next().unwrap();
    if !(UnicodeXID::is_xid_start(first) || first == '_') {
        return false;
    }
    for ch in chars {
        if !UnicodeXID::is_xid_continue(ch) {
            return false;
        }
    }
    true
}

proc-macro2 is now up to version 1.0.8 and the current code has ASCII checks before calling out to unicode-xid functions, like so: https://github.com/alexcrichton/proc-macro2/blob/8ce7c670a55fa2c8a312d9c107143b3bdb6e93ec/src/fallback.rs#L578-L591

Still, it's an easy change and a clear win, even if it doesn't end up helping rustc directly.

nnethercote

comment created time in 11 days

PR opened unicode-rs/unicode-xid

Speed up `bsearch_range_table`.

The comparison function used for the binary search is sub-optimal, because it looks for an Equal result first, even though that is the least common result (happening 0 or 1 times per search).

This commit changes the comparison function to look for Equals last. As a result, the number of tests executed within the comparison function is reduced:

  • On a Greater result: 2 tests --> 1 test
  • On a Less result: 3 tests --> 2 tests
  • On an Equals result (next most common): 2 tests --> 2 tests
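A rough sketch of the reordering over a generic range table (my own illustration; the crate's actual tables and function differ in detail):

use std::cmp::Ordering::{Equal, Greater, Less};

// Before: `Equal` is tested first, so the common `Greater`/`Less` outcomes
// need 2-3 comparisons per probe.
fn bsearch_old(c: char, ranges: &[(char, char)]) -> bool {
    ranges
        .binary_search_by(|&(lo, hi)| {
            if lo <= c && c <= hi { Equal } else if hi < c { Less } else { Greater }
        })
        .is_ok()
}

// After: `Equal` is tested last, shaving a comparison off the common outcomes.
fn bsearch_new(c: char, ranges: &[(char, char)]) -> bool {
    ranges
        .binary_search_by(|&(lo, hi)| {
            if lo > c { Greater } else if hi < c { Less } else { Equal }
        })
        .is_ok()
}

fn main() {
    let ranges = [('a', 'f'), ('p', 't')];
    for &c in &['b', 'h', 'q', 'z'] {
        assert_eq!(bsearch_old(c, &ranges), bsearch_new(c, &ranges));
    }
}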
+7 -3

0 comment

1 changed file

pr created time in 11 days

create branch nnethercote/unicode-xid

branch : speed-up-bsearch_range_table

created branch time in 11 days

push eventnnethercote/unicode-xid

Zachary Pierce

commit sha 2dcb1b242a809719b8c14f461ef5ec7e8683936f

Prove the API does not panic for all chars

view details

Zachary Pierce

commit sha 87a151133dedb6459c36d6260fa1c84290a7fe1f

Fix build with 'bench' feature Also correct all cargo and clippy warnings

view details

Manish Goregaokar

commit sha 6a681a73972575183dec348e135c98b6c604c86b

Merge pull request #10 from ZackPierce/fix_bench_and_cleanup Fix CI build by updating feature associated with 'bench' feature

view details

Aleksey Kladov

commit sha bfc663326ba9d34a45711bcf8e85696d22fb0a17

update to unicode 12.1.0

view details

Manish Goregaokar

commit sha 31a29c17bb54f541bb39d12ecb13efa73ce287e4

Merge pull request #11 from matklad/new-unicode update to unicode 12.1.0

view details

Aleksey Kladov

commit sha 0c7f5b68e1d61ce415a490e70a4ff7a67672a882

Cleanup Cargo.toml - use new syntax for license - use default exclude rules

view details

Aleksey Kladov

commit sha 5d30c6d921987573b900a48a7365e7303039cfb6

exclude irrelevant files from publishing

view details

Aleksey Kladov

commit sha 7ee668074ad98ff3d20cc21cd55fbd8f540e7909

add CI badges

view details

Aleksey Kladov

commit sha 406f09e8d34b7279aea76e588a5f861dec9dc315

publish 0.2.0

view details

Aleksey Kladov

commit sha 1bed61b91d8b1bf3740b2dd36b37727dcb6f298c

use the correct feature name for the new nightly

view details

Manish Goregaokar

commit sha 4baae9fffb156ba229665b972a9cd5991787ceb7

Merge pull request #12 from matklad/v0.1.1 V0.2.0

view details

Zachary Pierce

commit sha b8bf4daa34b1a48c8373abf6c52ca21c7e878c4f

Remove exhaustive tests for intentionally invalid chars

view details

Manish Goregaokar

commit sha 63c1458bfeeff61bd225a0d76a1338b0d85c17fe

Merge pull request #9 from ZackPierce/exhaustive_tests Prove the API does not panic for all chars

view details

Craig Hills

commit sha 73dd2644ef17a17a54c60f89bf60dec8a140a61e

Upgrades the unsafe_code lint from deny to forbid This simply avoids the possibility of overriding the lint rule later in the process. It also helps for tool usage, such as cargo geiger, where it will now show the library at a higher safety level.

view details

Craig Hills

commit sha f11d669e323dadb16e1c2f4fc0ab8419f4e2f7bf

Adds a fix for bench tests

view details

Manish Goregaokar

commit sha 5e0a152e4d456e378c01f5edf19022540c57e318

Merge pull request #15 from chills42/master Upgrades the unsafe_code lint from deny to forbid

view details

push time in 11 days

delete branch nnethercote/rust

delete branch : rm-RefCell-from-ObligationForest

delete time in 12 days

Pull request review commentrust-lang/rust

Speed up `SipHasher128`.

 impl SipHasher128 {
         self.state.v1 ^= 0xee;
     }
 
-    // Specialized write function that is only valid for buffers with len <= 8.
-    // It's used to force inlining of write_u8 and write_usize, those would normally be inlined
-    // except for composite types (that includes slices and str hashing because of delimiter).
-    // Without this extra push the compiler is very reluctant to inline delimiter writes,
-    // degrading performance substantially for the most common use cases.
+    // Specialized write function for values with size <= 8.
     #[inline]
-    fn short_write(&mut self, msg: &[u8]) {
-        debug_assert!(msg.len() <= 8);
-        let length = msg.len();
-        self.length += length;
+    fn short_write<T>(&mut self, _x: T, x: u64) {

I'm not sure what you mean by "more aggressive inlining". These functions are inlined (Cachegrind's results confirm that for me), and given that, there is no _x to actually pass, right? And presumably the mem::size_of::<T>() will end up being constants.

nnethercote

comment created time in 12 days

pull request commentrust-lang/rust

Speed up `SipHasher128`.

Local results are excellent:

ctfe-stress-4-check
        avg: -4.7%?     min: -13.1%?    max: -0.1%?
clap-rs-check
        avg: -3.0%      min: -9.7%      max: -0.4%
coercions-check
        avg: -3.4%?     min: -5.7%?     max: -0.6%?
tuple-stress-check
        avg: -3.4%      min: -4.9%      max: -1.0%
ucd-check
        avg: -2.6%      min: -4.4%      max: -0.7%
html5ever-check
        avg: -2.0%      min: -3.9%      max: -0.6%
serde-check
        avg: -1.8%      min: -3.7%      max: -0.4%
unicode_normalization-check
        avg: -1.8%      min: -3.5%      max: -0.4%
keccak-check
        avg: -1.5%      min: -3.5%      max: -0.2%
issue-46449-check
        avg: -1.1%      min: -3.3%      max: -0.4%
piston-image-check
        avg: -2.3%      min: -3.1%      max: -0.7%
serde-serde_derive-check
        avg: -1.6%      min: -3.0%      max: -0.6%
await-call-tree-check
        avg: -1.6%      min: -3.0%      max: -0.6%
cranelift-codegen-check
        avg: -2.1%      min: -3.0%      max: -0.7%
deep-vector-check
        avg: -2.4%      min: -3.0%      max: -1.1%
script-servo-check
        avg: -2.3%      min: -2.9%      max: -0.8%
regex-check
        avg: -2.5%      min: -2.9%      max: -0.7%
ripgrep-check
        avg: -2.0%      min: -2.9%      max: -0.6%
webrender-check
        avg: -2.2%      min: -2.9%      max: -0.7%
encoding-check
        avg: -2.2%      min: -2.9%      max: -0.8%
syn-check
        avg: -2.2%      min: -2.8%      max: -0.8%
inflate-check
        avg: -1.2%      min: -2.8%      max: -0.2%
webrender-wrench-check
        avg: -1.6%      min: -2.8%      max: -0.5%
cargo-check
        avg: -1.7%      min: -2.7%      max: -0.5%
unused-warnings-check
        avg: -2.2%      min: -2.6%      max: -1.4%
futures-check
        avg: -1.8%      min: -2.6%      max: -0.4%
helloworld-check
        avg: -1.3%      min: -2.5%      max: -0.6%
style-servo-check
        avg: -2.0%      min: -2.5%      max: -0.7%
tokio-webpush-simple-check
        avg: -1.4%      min: -2.4%      max: -0.4%
regression-31157-check
        avg: -1.5%      min: -2.4%      max: -0.4%
hyper-2-check
        avg: -1.6%      min: -2.2%      max: -0.5%
deeply-nested-check
        avg: -1.1%      min: -2.0%      max: -0.2%
unify-linearly-check
        avg: -1.1%      min: -2.0%      max: -0.3%
packed-simd-check
        avg: -1.2%      min: -1.9%      max: -0.5%
wg-grammar-check
        avg: -1.0%      min: -1.9%      max: -0.1%
wf-projection-stress-65510-che...
        avg: -0.6%      min: -1.6%      max: -0.0%
token-stream-stress-check
        avg: -0.2%      min: -0.3%      max: -0.1%

It's notable that every benchmark except for token-stream-stress-check got at least a 1.6% speedup for one of the runs. It's rare for any speed improvement to have such a wide effect.

nnethercote

comment created time in 12 days

pull request commentrust-lang/rust

Speed up `SipHasher128`.

BTW, I'm planning to make the equivalent change to SipHasher in core, but I will do that as a separate PR.

@bors try @rust-timer queue

nnethercote

comment created time in 12 days

PR opened rust-lang/rust

Speed up `SipHasher128`.

The current code in SipHasher128::short_write is inefficient. It uses u8to64_le (which is complex and slow) to extract just the right number of bytes of the input into a u64 and pad the result with zeroes. It then left-shifts that value in order to bitwise-OR it with self.tail.

For example, imagine we have a u32 input 0xIIHH_GGFF and only need three bytes to fill up self.tail. The current code uses u8to64_le to construct 0x0000_0000_00HH_GGFF, which is just 0xIIHH_GGFF with the 0xII removed and zero-extended to a u64. The code then left-shifts that value by five bytes -- discarding the 0x00 byte that replaced the 0xII byte! -- to give 0xHHGG_FF00_0000_0000. It then ORs that value with self.tail.

There's a much simpler way to do it: zero-extend to u64 first, then left shift. E.g. 0xIIHH_GGFF is zero-extended to 0x0000_0000_IIHH_GGFF, and then left-shifted to 0xHHGG_FF00_0000_0000. We don't have to take time to exclude the unneeded 0xII byte, because it just gets shifted out anyway! It also avoids multiple occurrences of unsafe.

There's a similar story with the setting of self.tail at the method's end. The current code uses u8to64_le to extract the remaining part of the input, but the same effect can be achieved more quickly with a right shift on the zero-extended input.

This commit changes SipHasher128 to use the simpler shift-based approach. The code is also smaller, which means that short_write is now inlined where previously it wasn't, which makes things faster again. This gives big speed-ups for all incremental builds, especially "baseline" incremental builds.
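As a concrete stand-alone illustration of the zero-extend-then-shift idea described above (my own toy example with made-up byte values, not the PR's code):

fn main() {
    let input: u32 = 0x9988_7766; // the next write_u32 value
    let needed = 3;               // bytes still needed to fill `self.tail`
    let ntail = 8 - needed;       // bytes already buffered in `self.tail`

    let x = input as u64;              // zero-extend: 0x0000_0000_9988_7766
    let into_tail = x << (8 * ntail);  // 0x8877_6600_0000_0000 -- the 0x99 byte just falls off the top
    let leftover = x >> (8 * needed);  // 0x0000_0000_0000_0099 -- carried over for the next word

    println!("{:#018x} {:#018x}", into_tail, leftover);
}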

r? @michaelwoerister

+51 -39

0 comment

1 changed file

pr created time in 12 days

create branch nnethercote/rust

branch : speed-up-SipHasher128

created branch time in 12 days
