Michael Woerister (michaelwoerister), Berlin, Germany, michaelwoerister.github.io

michaelwoerister/hamt-rs 145

A Persistent Map Implementation based on Hash Array Mapped Tries

aardvark-platform/aardvark.base 86

Aardvark is an open-source platform for visual computing, real-time graphics and visualization. This repository is the basis for most platform libraries and provides basic functionality such as data-structures, math and much more.

aardvark-platform/aardvark.docs 80

Simple examples combining multiple packages provided by the aardvark platform. Each platform repository comes with separate examples; here we collect overarching examples that use, for example, aardvark.rendering and aardvark.media.

aardvark-platform/aardvark.rendering 51

The dependency-aware, high-performance aardvark rendering engine. This repo is part of aardvark - an open-source platform for visual computing, real-time graphics and visualization.

aardvark-platform/aardvark.media 22

Server-side, functional (Elm-style) front-end and UI for aardvark, an open-source platform for visual computing, real-time graphics and visualization.

aardvark-platform/aardvark.algodat 9

Advanced data-structures (e.g. spatial acceleration data-structures such as octree, kdTree), part of aardvark, an open-source platform for visual computing, real-time graphics and visualization.

aardvark-platform/fablish 4

Elm style applications in .NET for composable user interfaces

aardvark-platform/template 4

Project template for aardvark projects, with a build script for bootstrapping new aardvark projects (including all necessary dependencies).

aardvark-community/aardvark.semantictextonforests 3

An implementation of semantic texton forests (including a .NET wrapper for libsvm written in C++/CLI).

aardvark-platform/aardvark.fake 1

Script extensions for FAKE build scripts such as native dependency injection and cabal style add-source functionality

Pull request comment rust-lang/rust

Construct query job latches on-demand

Thanks, @Zoxc!

@bors r+ rollup=never

(This is doing some complicated changes, let's keep it out of rollups in case it breaks anything)

Zoxc

comment created 6 days ago

Pull request review comment wesleywiser/blog.rust-lang.org

[Inside Rust] Self-Profile tutorial

+---
+layout: post
+title: "Introduction to profiling rustc using the self profiler"
+author: Wesley Wiser
+description: "Learn how to use the -Zself-profile rustc flag"
+team: the self-profile working group <https://rust-lang.github.io/compiler-team/working-groups/self-profile/>
+---
+
+Over the last year, the [Self-Profile Working Group] has been building tools to profile `rustc`.
+This is part of the Compiler Team's ongoing efforts to improve `rustc`'s performance.
+In this post, we'll look at the tooling currently available and use it to profile an example crate's compile time.
+
+First, we'll download and build the `measureme` repository which provides tools to analyze self-profile trace data.
+
+```sh
+$ git clone https://github.com/rust-lang/measureme.git
+$ cd measureme
+$ cargo build --release --all
+```
+
+Now that we have our tools, let's download an example crate to profile its build.
+
+```sh
+$ cd ..
+$ git clone https://github.com/rust-lang/regex.git
+$ cd regex
+```
+
+We'll need to use a recent nightly compiler to get access to unstable `-Z` flags.
+
+```sh
+$ rustup override set nightly
+```
+
+If you haven't installed a nightly compiler before, this will download the nightly compiler for you.
+If you have, then update it to make sure you're on a recent version.
+
+```sh
+$ rustup update nightly
+```
+
+Now we can build it and tell `rustc` to profile the build of the `regex` crate.
+This will cause three new files to be created in the working directory which contain the profiling data.
+
+```sh
+$ cargo rustc -- -Zself-profile
+$ ls
+CHANGELOG.md        LICENSE-APACHE       UNICODE.md              regex-17088.string_data       regex-syntax         target
+Cargo.lock          LICENSE-MIT          bench                   regex-17088.string_index      rustfmt.toml         test
+Cargo.toml          PERFORMANCE.md       examples                regex-capi                    scripts              tests
+HACKING.md          README.md            regex-17088.events      regex-debug                   src
+```
+
+The new files follow the format `{crate name}-{rustc process id}.{events,string_data,string_index}`.
+
+We'll use each of the three main tools to analyze the profiling data:
+
+## `summarize`
+
+As its name suggests, this tool summarizes the data found in the trace files.
+Additionally, `summarize` can also show a "diff" between two trace files but we won't be using this mode.
+
+Let's run the tool, passing just the file name (but not the extension) for the trace:
+
+```sh
+$ ../measureme/target/release/summarize summarize regex-17088
++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------+
+| Item                                          | Self time | % of total time | Time     | Item count | Cache hits | Blocked time | Incremental load time |
++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------+
+| LLVM_module_codegen_emit_obj                  | 4.89s     | 42.752          | 4.89s    | 159        | 0          | 0.00ns       | 0.00ns                |
++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------+
+| codegen_module                                | 1.25s     | 10.967          | 1.37s    | 159        | 0          | 0.00ns       | 0.00ns                |
++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------+
+| LLVM_module_optimize_module_passes            | 1.15s     | 10.022          | 1.15s    | 159        | 0          | 0.00ns       | 0.00ns                |
++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------+
+| LLVM_module_codegen_make_bitcode              | 786.56ms  | 6.875           | 960.73ms | 159        | 0          | 0.00ns       | 0.00ns                |
++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------+
+| typeck_tables_of                              | 565.18ms  | 4.940           | 689.39ms | 848        | 0          | 0.00ns       | 0.00ns                |
++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------+
+| LLVM_module_codegen                           | 408.01ms  | 3.566           | 6.26s    | 159        | 0          | 0.00ns       | 0.00ns                |
++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------+
+| mir_borrowck                                  | 224.03ms  | 1.958           | 543.33ms | 848        | 0          | 0.00ns       | 0.00ns                |
++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------+
+| LLVM_module_codegen_emit_compressed_bitcode   | 174.17ms  | 1.522           | 174.17ms | 159        | 0          | 0.00ns       | 0.00ns                |
++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------+
+| optimized_mir                                 | 157.91ms  | 1.380           | 205.29ms | 1996       | 0          | 0.00ns       | 0.00ns                |
++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------+
+| evaluate_obligation                           | 146.50ms  | 1.281           | 184.17ms | 8304       | 0          | 0.00ns       | 0.00ns                |
++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------+
+| codegen_crate                                 | 139.48ms  | 1.219           | 1.58s    | 1          | 0          | 0.00ns       | 0.00ns                |
++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------+
+| mir_built                                     | 123.88ms  | 1.083           | 168.01ms | 848        | 0          | 0.00ns       | 0.00ns                |
++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------+
+| metadata_decode_entry                         | 88.36ms   | 0.772           | 117.77ms | 55642      | 0          | 0.00ns       | 0.00ns                |
++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------+
+| incr_comp_copy_cgu_workproducts               | 64.21ms   | 0.561           | 64.21ms  | 1          | 0          | 0.00ns       | 0.00ns                |
++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------+
+| monomorphization_collector_graph_walk         | 54.11ms   | 0.473           | 344.00ms | 1          | 0          | 0.00ns       | 0.00ns                |
++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------+
+| link_rlib                                     | 43.21ms   | 0.378           | 43.21ms  | 1          | 0          | 0.00ns       | 0.00ns                |
++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------+
+| check_impl_item_well_formed                   | 41.36ms   | 0.362           | 77.14ms  | 736        | 0          | 0.00ns       | 0.00ns                |
++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------+
+| codegen_fulfill_obligation                    | 40.36ms   | 0.353           | 51.56ms  | 1759       | 0          | 0.00ns       | 0.00ns                |
++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------+
+| expand_crate                                  | 37.24ms   | 0.326           | 48.52ms  | 1          | 0          | 0.00ns       | 0.00ns                |
++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------+
+| symbol_name                                   | 36.31ms   | 0.317           | 39.06ms  | 5513       | 0          | 0.00ns       | 0.00ns                |
++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------+
+| free_global_ctxt                              | 34.34ms   | 0.300           | 34.34ms  | 1          | 0          | 0.00ns       | 0.00ns                |
++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------+
+| type_op_prove_predicate                       | 29.99ms   | 0.262           | 31.24ms  | 1903       | 0          | 0.00ns       | 0.00ns                |
++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------+
+| encode_query_results                          | 28.59ms   | 0.250           | 28.59ms  | 1          | 0          | 0.00ns       | 0.00ns                |
++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------+
+| layout_raw                                    | 28.09ms   | 0.245           | 76.62ms  | 9023       | 0          | 0.00ns       | 0.00ns                |
++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------+
+| parse_crate                                   | 27.60ms   | 0.241           | 27.60ms  | 1          | 0          | 0.00ns       | 0.00ns                |
++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------+
+| build_hir_map                                 | 26.37ms   | 0.230           | 31.47ms  | 1          | 0          | 0.00ns       | 0.00ns                |
++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------+
+| link_binary_remove_temps                      | 26.23ms   | 0.229           | 26.23ms  | 1          | 0          | 0.00ns       | 0.00ns                |
++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------+
+| resolve_crate                                 | 25.38ms   | 0.222           | 25.38ms  | 1          | 0          | 0.00ns       | 0.00ns                |
++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------+
+| check_item_well_formed                        | 25.37ms   | 0.222           | 47.45ms  | 836        | 0          | 0.00ns       | 0.00ns                |
++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------+
+| param_env                                     | 22.64ms   | 0.198           | 31.63ms  | 2519       | 0          | 0.00ns       | 0.00ns                |
++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------+
+| region_scope_tree                             | 21.71ms   | 0.190           | 21.71ms  | 1366       | 0          | 0.00ns       | 0.00ns                |
++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------+
+| specialization_graph_of                       | 19.92ms   | 0.174           | 75.86ms  | 65         | 0          | 0.00ns       | 0.00ns                |
++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------+
+| check_match                                   | 18.78ms   | 0.164           | 30.05ms  | 848        | 0          | 0.00ns       | 0.00ns                |
++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------+
+| is_freeze_raw                                 | 17.58ms   | 0.154           | 62.67ms  | 3214       | 0          | 0.00ns       | 0.00ns                |
++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------+
+| hir_lowering                                  | 17.58ms   | 0.154           | 17.58ms  | 1          | 0          | 0.00ns       | 0.00ns                |
++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------+
+| check_mod_privacy                             | 16.21ms   | 0.142           | 22.75ms  | 31         | 0          | 0.00ns       | 0.00ns                |
++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------+
+...
+Total cpu time: 11.440758871s
+```
+
+The output is sorted by the self time (time spent in the query or activity but not other queries or activities called by itself).
+As you can see, most of the compilation time is spent in LLVM generating the binary code for the executable.
+
+## `flamegraph`
+
+As you may have guessed, `flamegraph` will produce a [flame graph] of the profiling data.
+To run the tool, we'll pass just the filename without a file extension like we did for `summarize`:
+
+```sh
+$ ../measureme/target/release/flamegraph regex-17088
+```
+
+This will create a file called `rustc.svg` in the working directory:
+
+![Image of flamegraph output][flame graph img]
+
+[Click here] to try the interactive svg.
+
+## `crox`
+
+This tool processes self-profiling data into the JSON format that the Chromium profiler understands.
+You can use it to create a graphical timeline showing exactly when various traced events occurred.
+
+In this section, we'll cover a few different modes `crox` can run in such as profiling an entire crate compilation including dependencies and filtering out small events.
+Let's get started with the basics!
+
+### Basic usage
+
+To run the tool, we'll just pass the filename without a file extension like we've done before:
+
+```sh
+$ ../measureme/target/release/crox regex-17088
+```
+
+This creates a file called `chrome_profiler.json` in the working directory.
+To open it, we'll use the regular Chromium performance tools you might already be familiar with:
+
+1. Open Chrome
+2. Open the Developer Tools console by pressing `Ctrl` + `Shift` + `i` (Windows/Linux) or `Cmd` + `Option` + `i` (macOS)
+3. Click the Performance tab at the top of the console.
+4. Click the "Load profile" button which looks like an arrow pointing up.
+5. Select the `chrome_profiler.json` file we created.
+
+You should now see something similar to this:
+
+![Image of chrome profiler][chrome profiler img1]
+
+You can use the scroll wheel on a mouse or the appropriate gesture on a touchpad to zoom in or out of the timeline.
+
+### Filtering short events
+
+If the `chrome_profiler.json` file gets too large, the normal Chromium performance tools have issues opening the file.
+One easy way to deal with this is to tell `crox` to remove events shorter than a chosen duration:
+
+```sh
+$ ../measureme/target/release/crox --minimum-duration 2 regex-17088
+```
+
+Filtering out events less than 2 microseconds shrinks our `chrome_profiler.json` file from 27mb to 11mb.
+
+### Capturing event arguments
+
+The self-profiler can be configured to record event arguments during compilation.
+For example, queries will include their query key.
+This functionality is turned off by default because it significantly increases the self-profiler overhead.
+
+To turn this feature on, we'll need to record a new compilation, passing an additional argument to `rustc`:
+
+```sh
+$ cargo clean
+$ cargo rustc -- -Zself-profile -Zself-profile-events=default,args
+```
+
+And then process the new output files:
+
+```sh
+$ ../measureme/target/release/crox regex-23649
+```
+
+Now in the Chromium profiler, if you click on a node, you can see additional data about many of the events at the bottom of the screen:
+
+![Image of Chrome profiler details][chrome profiler img2]
+
+Which shows this `optimized_mir` query was processing the `regex::compile::{{impl}}::new` function body.
+
+### Profiling an entire crate graph
+
+By using the `RUSTFLAGS` environment variable, we can profile every `rustc` invocation, not just the final crate's.
+`crox` can then combine all of the profiles together into one output file.
+
+```sh
+$ rm regex-17088.* regex-23649.*
+$ cargo clean
+$ RUSTFLAGS="-Zself-profile=$(pwd) -Zself-profile-events=default,args" cargo build
+```
+
+This creates quite a few trace files in the working directory.
+Now, we'll tell `crox` to combine all of the trace files in the current directory together:
+
+```sh
+$ ../measureme/target/release/crox --dir .
+```
+
+Opening this file shows all of the crates compiled:
+
+![Image of Chrome profiler with all crates][chrome profiler img3]
+
+Clicking on a crate will expand it to show the threads and event data inside it:
+
+![Image of Chrome profiler with a crate expanded][chrome profiler img4]
+
+Thanks for reading!
+If you have questions or would like to get involved with the Self-Profile Working Group, please check out the [measureme repository] or stop by our [Zulip stream].

Maybe add some kind of summary statement like "We've been using these tools extensively ourselves over the last few months and they've helped us tremendously in understanding where the compiler spends its time. In the future we'll be adding more features and we'll work on making the tooling easier to use."

Something like that.

wesleywiser

comment created 6 days ago

Pull request review comment wesleywiser/blog.rust-lang.org

[Inside Rust] Self-Profile tutorial

+### Basic usage++To run the tool, we'll just pass the filename without a file extension 
like we've done before:++```sh+$ ../measureme/target/release/crox regex-17088+```++This creates a file called `chrome_profiler.json` in the working directory.+To open it, we'll use the regular Chromium performance tools you might already be familar with:++1. Open Chrome+2. Open the Developer Tools console by pressing `Ctrl` + `Shift` + `i` (Windows/Linux) or `Cmd` + `Option` + `i` (macOS)+3. Click the Performance tab at the top of the console.+4. Click the "Load profile" button which looks like an arrow pointing up.+5. Select the `chrome_profiler.json` file we crated.++You should now see something similar to this:++![Image of chrome profiler][chrome profiler img1]++You can use the scroll wheel on a mouse or the appropriate gesture on a touchpad to zoom in or out of the timeline.++### Filtering short events++If the `chrome_profiler.json` file gets too large, the normal Chromium performance tools have issues opening the file.+One easy way to deal with this is to tell `crox` to remove events shorter than a chosen duration:++```sh+$ ../measureme/target/release/crox --minumum-duration 2 regex-17088+```++Filtering out events less than 2 microseconds shrinks our `chrome_profiler.js` file from 27mb to 11mb.++### Capturing event arguments++The self-profiler can be configured to record event arguments during compilation.+For example, queries will include their query key.+This fuctionality is turned off by default because it significantly increases the self-profiler overhead.++To turn this feature on, we'll need to record a new compilation, passing an additional argument to `rustc`:++```sh+$ cargo clean+$ cargo rustc -- -Zself-profile -Zself-profile-events=default,args+```++And then process the new output files:++```sh+$ ../measureme/target/release/crox regex-23649+```++Now in the Chromium profiler, if you click on a node, you can see additional data about many of the events at the bottom of the screen:++![Image of Chrome profiler details][chrome profiler img2]++Which shows 
this `optimized_mir` query was processing the `regex::compile::{{impl}}::new` function body.
+
+### Profiling an entire crate graph
+
+By using the `RUSTFLAGS` environment variable, we can profile every `rustc` invocation, not just the final crate's.
+`crox` can then combine all of the profiles together into one output file.
+
+```sh
+$ rm regex-17088.* regex-23649.*
+$ cargo clean
+$ RUSTFLAGS="-Zself-profile=$(pwd) -Zself-profile-events=default,args" cargo build
+```
+
+This creates quite a few trace files in the working directory.
+Now, we'll tell `crox` to combine all of the trace files in the current directory together:
+
+```sh
+$ ../measureme/target/release/crox --dir .
+```
+
+Opening this file shows all of the crates compiled:
+
+![Image of Chrome profiler with all crates][chrome profiler img3]
+
+Clicing on a crate will expand it to show the threads and event data inside it:
+
+![Image of Chrome profiler with a crate expanded][chrome profiler img4]

Maybe show a picture with a different crate expanded? `lazy_static` doesn't show much because it compiles so quickly.

wesleywiser


Pull request review comment wesleywiser/blog.rust-lang.org

[Inside Rust] Self-Profile tutorial

+---+layout: post+title: "Introduction to profiling rustc using the self profiler"+author: Wesley Wiser+description: "Learn how to use the -Zself-profile rustc flag"+team: the self-profile working group <https://rust-lang.github.io/compiler-team/working-groups/self-profile/>+---++Over the last year, the [Self-Profile Working Group] has been building tools to profile `rustc`.+This is part of the Compiler Team's ongoing efforts to improve `rustc`'s performance.+In this post, we'll look at the tooling currently available and use them to profile an example crate's compile time.++First, we'll download and build the `measureme` repository which provides tools to analyze self-profile trace data.++```sh+$ git clone https://github.com/rust-lang/measureme.git+$ cd measureme+$ cargo build --release --all+```++Now that we have our tools, let's download an example crate to profile its build.++```sh+$ cd ..+$ git clone https://github.com/rust-lang/regex.git+$ cd regex+```++We'll need to use a recent nightly compiler to get access to unsable `-Z` flags.++```sh+$ rustup override set nightly+```++If you haven't installed a nightly compiler before, this will download the nightly compiler for you.+If you have, then update it to make sure you're on a recent version.++```sh+$ rustup update nightly+```++Now we can build it and tell `rustc` to profile the build of the `regex` crate.+This will cause three new files to be created in the working directory which contain the profling data.++```sh+$ cargo rustc -- -Zself-profile+$ ls+CHANGELOG.md        LICENSE-APACHE       UNICODE.md              regex-17088.string_data       regex-syntax         target+Cargo.lock          LICENSE-MIT          bench                   regex-17088.string_index      rustfmt.toml         test+Cargo.toml          PERFORMANCE.md       examples                regex-capi                    scripts              tests+HACKING.md          README.md            regex-17088.events      regex-debug                   
src+```++The new files follow the format `{crate name}-{rustc process id}.{events,string_data,string_index}`.++We'll use each of the three main tools to analyze the profling data:++## `summarize`++As its name suggests, this tool summarizes the data found in the trace files.+Additionally, `summarize` can also show a "diff" between two trace files but we won't be using this mode.++Let's run the tool, passing just the file name (but not the extension) for the trace:++```sh+$ ../measureme/target/release/summarize summarize regex-17088++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| Item                                          | Self time | % of total time | Time     | Item count | Cache hits | Blocked time | Incremental load time |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| LLVM_module_codegen_emit_obj                  | 4.89s     | 42.752          | 4.89s    | 159        | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| codegen_module                                | 1.25s     | 10.967          | 1.37s    | 159        | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| LLVM_module_optimize_module_passes            | 1.15s     | 10.022          | 1.15s    | 159        | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| LLVM_module_codegen_make_bitcode              | 786.56ms 
 | 6.875           | 960.73ms | 159        | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| typeck_tables_of                              | 565.18ms  | 4.940           | 689.39ms | 848        | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| LLVM_module_codegen                           | 408.01ms  | 3.566           | 6.26s    | 159        | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| mir_borrowck                                  | 224.03ms  | 1.958           | 543.33ms | 848        | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| LLVM_module_codegen_emit_compressed_bitcode   | 174.17ms  | 1.522           | 174.17ms | 159        | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| optimized_mir                                 | 157.91ms  | 1.380           | 205.29ms | 1996       | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| evaluate_obligation                           | 146.50ms  | 1.281           | 184.17ms | 8304       | 0          | 0.00ns       | 0.00ns                
|++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| codegen_crate                                 | 139.48ms  | 1.219           | 1.58s    | 1          | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| mir_built                                     | 123.88ms  | 1.083           | 168.01ms | 848        | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| metadata_decode_entry                         | 88.36ms   | 0.772           | 117.77ms | 55642      | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| incr_comp_copy_cgu_workproducts               | 64.21ms   | 0.561           | 64.21ms  | 1          | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| monomorphization_collector_graph_walk         | 54.11ms   | 0.473           | 344.00ms | 1          | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| link_rlib                                     | 43.21ms   | 0.378           | 43.21ms  | 1          | 0          | 0.00ns       | 0.00ns                
|++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| check_impl_item_well_formed                   | 41.36ms   | 0.362           | 77.14ms  | 736        | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| codegen_fulfill_obligation                    | 40.36ms   | 0.353           | 51.56ms  | 1759       | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| expand_crate                                  | 37.24ms   | 0.326           | 48.52ms  | 1          | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| symbol_name                                   | 36.31ms   | 0.317           | 39.06ms  | 5513       | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| free_global_ctxt                              | 34.34ms   | 0.300           | 34.34ms  | 1          | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| type_op_prove_predicate                       | 29.99ms   | 0.262           | 31.24ms  | 1903       | 0          | 0.00ns       | 0.00ns                
|++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| encode_query_results                          | 28.59ms   | 0.250           | 28.59ms  | 1          | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| layout_raw                                    | 28.09ms   | 0.245           | 76.62ms  | 9023       | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| parse_crate                                   | 27.60ms   | 0.241           | 27.60ms  | 1          | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| build_hir_map                                 | 26.37ms   | 0.230           | 31.47ms  | 1          | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| link_binary_remove_temps                      | 26.23ms   | 0.229           | 26.23ms  | 1          | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| resolve_crate                                 | 25.38ms   | 0.222           | 25.38ms  | 1          | 0          | 0.00ns       | 0.00ns                
|++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| check_item_well_formed                        | 25.37ms   | 0.222           | 47.45ms  | 836        | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| param_env                                     | 22.64ms   | 0.198           | 31.63ms  | 2519       | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| region_scope_tree                             | 21.71ms   | 0.190           | 21.71ms  | 1366       | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| specialization_graph_of                       | 19.92ms   | 0.174           | 75.86ms  | 65         | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| check_match                                   | 18.78ms   | 0.164           | 30.05ms  | 848        | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| is_freeze_raw                                 | 17.58ms   | 0.154           | 62.67ms  | 3214       | 0          | 0.00ns       | 0.00ns                
|++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| hir_lowering                                  | 17.58ms   | 0.154           | 17.58ms  | 1          | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| check_mod_privacy                             | 16.21ms   | 0.142           | 22.75ms  | 31         | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++...+Total cpu time: 11.440758871s+```++The output is sorted by the self time (time spent in the query or activity but not other queries or activites called by itself).+As you can see, most of the compilation time is spent in LLVM generating the binary code for the executable.++## `flamegraph`++As you may have guessed, `flamegraph` will produce a [flame graph] of the profiling data.+To run the tool, we'll pass just the filename without a file extension like we did for `summarize`:++```sh+$ ../measureme/target/release/flamegraph regex-17088+```++This will create a file called `rustc.svg` in the working directory:++![Image of flamegraph output][flame graph img]++[Click here] to try the interactive svg.++## `crox`++This tool processes self-profiling data into the JSON format that the Chromium profiler understands.+You can use it to create a graphical timeline showing exactly when various traced events occurred.++In this section, we'll cover a few different modes `crox` can run in such as profiling an entire crate compilation including dependencies and filtering out small events.+Let's get started with the basics!++### Basic usage++To run the tool, we'll just pass the filename without a file extension 
like we've done before:++```sh+$ ../measureme/target/release/crox regex-17088+```++This creates a file called `chrome_profiler.json` in the working directory.+To open it, we'll use the regular Chromium performance tools you might already be familar with:++1. Open Chrome+2. Open the Developer Tools console by pressing `Ctrl` + `Shift` + `i` (Windows/Linux) or `Cmd` + `Option` + `i` (macOS)+3. Click the Performance tab at the top of the console.+4. Click the "Load profile" button which looks like an arrow pointing up.+5. Select the `chrome_profiler.json` file we crated.++You should now see something similar to this:++![Image of chrome profiler][chrome profiler img1]++You can use the scroll wheel on a mouse or the appropriate gesture on a touchpad to zoom in or out of the timeline.++### Filtering short events++If the `chrome_profiler.json` file gets too large, the normal Chromium performance tools have issues opening the file.+One easy way to deal with this is to tell `crox` to remove events shorter than a chosen duration:++```sh+$ ../measureme/target/release/crox --minumum-duration 2 regex-17088+```++Filtering out events less than 2 microseconds shrinks our `chrome_profiler.js` file from 27mb to 11mb.++### Capturing event arguments++The self-profiler can be configured to record event arguments during compilation.+For example, queries will include their query key.+This fuctionality is turned off by default because it significantly increases the self-profiler overhead.++To turn this feature on, we'll need to record a new compilation, passing an additional argument to `rustc`:++```sh+$ cargo clean+$ cargo rustc -- -Zself-profile -Zself-profile-events=default,args+```++And then process the new output files:++```sh+$ ../measureme/target/release/crox regex-23649+```++Now in the Chromium profiler, if you click on a node, you can see additional data about many of the events at the bottom of the screen:++![Image of Chrome profiler details][chrome profiler img2]++Which shows 
this `optimized_mir` query was processing the `regex::compile::{{impl}}::new` function body.
+
+### Profiling an entire crate graph
+
+By using the `RUSTFLAGS` environment variable, we can profile every `rustc` invocation, not just the final crate's.
+`crox` can then combine all of the profiles together into one output file.
+
+```sh
+$ rm regex-17088.* regex-23649.*
+$ cargo clean
+$ RUSTFLAGS="-Zself-profile=$(pwd) -Zself-profile-events=default,args" cargo build
+```
+
+This creates quite a few trace files in the working directory.
+Now, we'll tell `crox` to combine all of the trace files in the current directory together:
+
+```sh
+$ ../measureme/target/release/crox --dir .
+```
+
+Opening this file shows all of the crates compiled:
+
+![Image of Chrome profiler with all crates][chrome profiler img3]
+
+Clicing on a crate will expand it to show the threads and event data inside it:

Clicing -> Clicking

wesleywiser


Pull request review comment wesleywiser/blog.rust-lang.org

[Inside Rust] Self-Profile tutorial

+---
+layout: post
+title: "Introduction to profiling rustc using the self profiler"
+author: Wesley Wiser
+description: "Learn how to use the -Zself-profile rustc flag"
+team: the self-profile working group <https://rust-lang.github.io/compiler-team/working-groups/self-profile/>
+---
+
+Over the last year, the [Self-Profile Working Group] has been building tools to profile `rustc`.
+This is part of the Compiler Team's ongoing efforts to improve `rustc`'s performance.
+In this post, we'll look at the tooling currently available and use them to profile an example crate's compile time.
+
+First, we'll download and build the `measureme` repository which provides tools to analyze self-profile trace data.
+
+```sh
+$ git clone https://github.com/rust-lang/measureme.git
+$ cd measureme
+$ cargo build --release --all
+```
+
+Now that we have our tools, let's download an example crate to profile its build.
+
+```sh
+$ cd ..
+$ git clone https://github.com/rust-lang/regex.git
+$ cd regex
+```
+
+We'll need to use a recent nightly compiler to get access to unsable `-Z` flags.
+
+```sh
+$ rustup override set nightly
+```
+
+If you haven't installed a nightly compiler before, this will download the nightly compiler for you.
+If you have, then update it to make sure you're on a recent version.
+
+```sh
+$ rustup update nightly
+```
+
+Now we can build it and tell `rustc` to profile the build of the `regex` crate.
+This will cause three new files to be created in the working directory which contain the profling data.

I wonder if it would be better to encourage people to give a directory for emitting the data, e.g. `-Zself-profile=./profdata`.
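
Roughly what I have in mind, as an untested sketch. It assumes `-Zself-profile` accepts a relative directory the same way the post later passes `$(pwd)` through `RUSTFLAGS`. `list_traces` is a hypothetical helper for illustration, not something `measureme` ships.

```sh
# Keep the trace files out of the crate root.
mkdir -p ./profdata

# On a nightly toolchain, inside the crate checkout, one would then run:
#   cargo rustc -- -Zself-profile=./profdata

# Hypothetical helper: print each trace prefix (path without extension)
# found in a directory, ready to hand to summarize, flamegraph or crox.
list_traces() {
    dir=${1:-./profdata}
    for f in "$dir"/*.events; do
        [ -e "$f" ] || continue   # no matches: the glob stays literal, skip it
        printf '%s\n' "${f%.events}"
    done
}

list_traces ./profdata
```

The tools take the file name without an extension, so a printed prefix such as `./profdata/regex-17088` could be passed to them unchanged, and the crate root stays free of `.events`, `.string_data` and `.string_index` files.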

wesleywiser


Pull request review comment wesleywiser/blog.rust-lang.org

[Inside Rust] Self-Profile tutorial

+---+layout: post+title: "Introduction to profiling rustc using the self profiler"+author: Wesley Wiser+description: "Learn how to use the -Zself-profile rustc flag"+team: the self-profile working group <https://rust-lang.github.io/compiler-team/working-groups/self-profile/>+---++Over the last year, the [Self-Profile Working Group] has been building tools to profile `rustc`.+This is part of the Compiler Team's ongoing efforts to improve `rustc`'s performance.+In this post, we'll look at the tooling currently available and use them to profile an example crate's compile time.++First, we'll download and build the `measureme` repository which provides tools to analyze self-profile trace data.++```sh+$ git clone https://github.com/rust-lang/measureme.git+$ cd measureme+$ cargo build --release --all+```++Now that we have our tools, let's download an example crate to profile its build.++```sh+$ cd ..+$ git clone https://github.com/rust-lang/regex.git+$ cd regex+```++We'll need to use a recent nightly compiler to get access to unsable `-Z` flags.++```sh+$ rustup override set nightly+```++If you haven't installed a nightly compiler before, this will download the nightly compiler for you.+If you have, then update it to make sure you're on a recent version.++```sh+$ rustup update nightly+```++Now we can build it and tell `rustc` to profile the build of the `regex` crate.+This will cause three new files to be created in the working directory which contain the profling data.++```sh+$ cargo rustc -- -Zself-profile+$ ls+CHANGELOG.md        LICENSE-APACHE       UNICODE.md              regex-17088.string_data       regex-syntax         target+Cargo.lock          LICENSE-MIT          bench                   regex-17088.string_index      rustfmt.toml         test+Cargo.toml          PERFORMANCE.md       examples                regex-capi                    scripts              tests+HACKING.md          README.md            regex-17088.events      regex-debug                   
src+```++The new files follow the format `{crate name}-{rustc process id}.{events,string_data,string_index}`.++We'll use each of the three main tools to analyze the profling data:++## `summarize`++As its name suggests, this tool summarizes the data found in the trace files.+Additionally, `summarize` can also show a "diff" between two trace files but we won't be using this mode.++Let's run the tool, passing just the file name (but not the extension) for the trace:++```sh+$ ../measureme/target/release/summarize summarize regex-17088++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| Item                                          | Self time | % of total time | Time     | Item count | Cache hits | Blocked time | Incremental load time |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| LLVM_module_codegen_emit_obj                  | 4.89s     | 42.752          | 4.89s    | 159        | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| codegen_module                                | 1.25s     | 10.967          | 1.37s    | 159        | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| LLVM_module_optimize_module_passes            | 1.15s     | 10.022          | 1.15s    | 159        | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| LLVM_module_codegen_make_bitcode              | 786.56ms 
 | 6.875           | 960.73ms | 159        | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| typeck_tables_of                              | 565.18ms  | 4.940           | 689.39ms | 848        | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| LLVM_module_codegen                           | 408.01ms  | 3.566           | 6.26s    | 159        | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| mir_borrowck                                  | 224.03ms  | 1.958           | 543.33ms | 848        | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| LLVM_module_codegen_emit_compressed_bitcode   | 174.17ms  | 1.522           | 174.17ms | 159        | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| optimized_mir                                 | 157.91ms  | 1.380           | 205.29ms | 1996       | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| evaluate_obligation                           | 146.50ms  | 1.281           | 184.17ms | 8304       | 0          | 0.00ns       | 0.00ns                
|++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| codegen_crate                                 | 139.48ms  | 1.219           | 1.58s    | 1          | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| mir_built                                     | 123.88ms  | 1.083           | 168.01ms | 848        | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| metadata_decode_entry                         | 88.36ms   | 0.772           | 117.77ms | 55642      | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| incr_comp_copy_cgu_workproducts               | 64.21ms   | 0.561           | 64.21ms  | 1          | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| monomorphization_collector_graph_walk         | 54.11ms   | 0.473           | 344.00ms | 1          | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| link_rlib                                     | 43.21ms   | 0.378           | 43.21ms  | 1          | 0          | 0.00ns       | 0.00ns                
|++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| check_impl_item_well_formed                   | 41.36ms   | 0.362           | 77.14ms  | 736        | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| codegen_fulfill_obligation                    | 40.36ms   | 0.353           | 51.56ms  | 1759       | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| expand_crate                                  | 37.24ms   | 0.326           | 48.52ms  | 1          | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| symbol_name                                   | 36.31ms   | 0.317           | 39.06ms  | 5513       | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| free_global_ctxt                              | 34.34ms   | 0.300           | 34.34ms  | 1          | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| type_op_prove_predicate                       | 29.99ms   | 0.262           | 31.24ms  | 1903       | 0          | 0.00ns       | 0.00ns                
|++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| encode_query_results                          | 28.59ms   | 0.250           | 28.59ms  | 1          | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| layout_raw                                    | 28.09ms   | 0.245           | 76.62ms  | 9023       | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| parse_crate                                   | 27.60ms   | 0.241           | 27.60ms  | 1          | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| build_hir_map                                 | 26.37ms   | 0.230           | 31.47ms  | 1          | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| link_binary_remove_temps                      | 26.23ms   | 0.229           | 26.23ms  | 1          | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| resolve_crate                                 | 25.38ms   | 0.222           | 25.38ms  | 1          | 0          | 0.00ns       | 0.00ns                
|++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| check_item_well_formed                        | 25.37ms   | 0.222           | 47.45ms  | 836        | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| param_env                                     | 22.64ms   | 0.198           | 31.63ms  | 2519       | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| region_scope_tree                             | 21.71ms   | 0.190           | 21.71ms  | 1366       | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| specialization_graph_of                       | 19.92ms   | 0.174           | 75.86ms  | 65         | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| check_match                                   | 18.78ms   | 0.164           | 30.05ms  | 848        | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| is_freeze_raw                                 | 17.58ms   | 0.154           | 62.67ms  | 3214       | 0          | 0.00ns       | 0.00ns                
|++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| hir_lowering                                  | 17.58ms   | 0.154           | 17.58ms  | 1          | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++| check_mod_privacy                             | 16.21ms   | 0.142           | 22.75ms  | 31         | 0          | 0.00ns       | 0.00ns                |++-----------------------------------------------+-----------+-----------------+----------+------------+------------+--------------+-----------------------++...+Total cpu time: 11.440758871s+```++The output is sorted by the self time (time spent in the query or activity but not other queries or activites called by itself).+As you can see, most of the compilation time is spent in LLVM generating the binary code for the executable.++## `flamegraph`++As you may have guessed, `flamegraph` will produce a [flame graph] of the profiling data.+To run the tool, we'll pass just the filename without a file extension like we did for `summarize`:++```sh+$ ../measureme/target/release/flamegraph regex-17088+```++This will create a file called `rustc.svg` in the working directory:++![Image of flamegraph output][flame graph img]++[Click here] to try the interactive svg.++## `crox`++This tool processes self-profiling data into the JSON format that the Chromium profiler understands.+You can use it to create a graphical timeline showing exactly when various traced events occurred.++In this section, we'll cover a few different modes `crox` can run in such as profiling an entire crate compilation including dependencies and filtering out small events.+Let's get started with the basics!++### Basic usage++To run the tool, we'll just pass the filename without a file extension 
like we've done before:++```sh+$ ../measureme/target/release/crox regex-17088+```++This creates a file called `chrome_profiler.json` in the working directory.+To open it, we'll use the regular Chromium performance tools you might already be familar with:++1. Open Chrome+2. Open the Developer Tools console by pressing `Ctrl` + `Shift` + `i` (Windows/Linux) or `Cmd` + `Option` + `i` (macOS)+3. Click the Performance tab at the top of the console.+4. Click the "Load profile" button which looks like an arrow pointing up.+5. Select the `chrome_profiler.json` file we crated.++You should now see something similar to this:++![Image of chrome profiler][chrome profiler img1]++You can use the scroll wheel on a mouse or the appropriate gesture on a touchpad to zoom in or out of the timeline.++### Filtering short events++If the `chrome_profiler.json` file gets too large, the normal Chromium performance tools have issues opening the file.+One easy way to deal with this is to tell `crox` to remove events shorter than a chosen duration:++```sh+$ ../measureme/target/release/crox --minumum-duration 2 regex-17088+```++Filtering out events less than 2 microseconds shrinks our `chrome_profiler.js` file from 27mb to 11mb.++### Capturing event arguments++The self-profiler can be configured to record event arguments during compilation.+For example, queries will include their query key.+This fuctionality is turned off by default because it significantly increases the self-profiler overhead.

I'm wondering what the overhead is. File sizes are 2-3x, but runtime is not affected as much, I think.

wesleywiser

comment created time in 6 days

Pull request review commentwesleywiser/blog.rust-lang.org

[Inside Rust] Self-Profile tutorial

+---+layout: post+title: "Introduction to profiling rustc using the self profiler"+author: Wesley Wiser+description: "Learn how to use the -Zself-profile rustc flag"+team: the self-profile working group <https://rust-lang.github.io/compiler-team/working-groups/self-profile/>+---++Over the last year, the [Self-Profile Working Group] has been building tools to profile `rustc`.+This is part of the Compiler Team's ongoing efforts to improve `rustc`'s performance.+In this post, we'll look at the tooling currently available and use them to profile an example crate's compile time.++First, we'll download and build the `measureme` repository which provides tools to analyze self-profile trace data.++```sh+$ git clone https://github.com/rust-lang/measureme.git+$ cd measureme+$ cargo build --release --all+```++Now that we have our tools, let's download an example crate to profile its build.++```sh+$ cd ..+$ git clone https://github.com/rust-lang/regex.git+$ cd regex+```++We'll need to use a recent nightly compiler to get access to unsable `-Z` flags.

unsable -> unstable

wesleywiser

comment created time in 6 days

Pull request review commentrust-lang/rust

Fix incremental bugs in the HIR map

 impl<'hir> Map<'hir> {         if self.dep_graph.is_fully_enabled() {             let hir_id_owner = hir_id.owner;             let def_path_hash = self.definitions.def_path_hash(hir_id_owner);-            self.dep_graph.read(def_path_hash.to_dep_node(DepKind::HirBody));+            let kind = if hir_id.local_id == ItemLocalId::from_u32_const(0) {+                DepKind::Hir+            } else {+                DepKind::HirBody+            };+            self.dep_graph.read(def_path_hash.to_dep_node(kind));

I don't understand what this logic is supposed to do exactly (before or after the change)...

Zoxc

comment created time in 7 days

pull request commentrust-lang/rust

Fix incremental bugs in the HIR map

Are these perf results still current?

Zoxc

comment created time in 7 days

pull request commentrust-lang/rust

rustc_codegen_llvm: don't generate any type debuginfo for -Cdebuginfo=1.

Thanks for the PR, @eddyb!

I want to be careful with merging this so we don't run into a problem like https://github.com/rust-lang/rust/issues/60020.

eddyb

comment created time in 7 days

Pull request review commentrust-lang/rust

Construct query job latches on-demand

 pub struct QueryInfo<'tcx> {     pub query: Query<'tcx>, } -/// Representss an object representing an active query job.-pub struct QueryJob<'tcx> {+type QueryMap<'tcx> = FxHashMap<QueryToken, QueryJobInfo<'tcx>>;++/// A value uniquely identifiying an active query job.+/// This value is created from a stack pointer in `get_query` and `force_query`+/// which is alive while the query executes.+#[derive(Copy, Clone, Eq, PartialEq, Hash)]+pub struct QueryToken(NonZeroUsize);

Yes, I like how that turned out.

Zoxc

comment created time in 7 days

Pull request review commentrust-lang/rust

Construct query job latches on-demand

 impl<'a, 'tcx, Q: QueryDescription<'tcx>> JobOwner<'a, 'tcx, Q> {     /// This function is inlined because that results in a noticeable speed-up     /// for some compile-time benchmarks.     #[inline(always)]-    pub(super) fn try_get(-        tcx: TyCtxt<'tcx>,-        span: Span,-        key: &Q::Key,-        token: QueryToken,-    ) -> TryGetJob<'a, 'tcx, Q> {+    pub(super) fn try_get(tcx: TyCtxt<'tcx>, span: Span, key: &Q::Key) -> TryGetJob<'a, 'tcx, Q> {

I like how token disappears from the function signature again.

Zoxc

comment created time in 7 days

Pull request review commentrust-lang/rust

Construct query job latches on-demand

 impl<'a, 'tcx, Q: QueryDescription<'tcx>> JobOwner<'a, 'tcx, Q> {                                 query_blocked_prof_timer = Some(tcx.prof.query_blocked());                             } -                            job.clone()+                            // Create the id of the job we're waiting for+                            let id = QueryJobId {+                                job: job.id,+                                shard: u16::try_from(shard).unwrap(),+                                kind: Q::dep_kind(),+                            };++                            job.latch(id)                         }                         QueryResult::Poisoned => FatalError.raise(),                     }                 }                 Entry::Vacant(entry) => {+                    let jobs = &mut lock.jobs;+                     // No job entry for this query. Return a new one to be started later.                     return tls::with_related_context(tcx, |icx| {-                        // Create the `parent` variable before `info`. This allows LLVM-                        // to elide the move of `info`-                        let parent = icx.query.clone();-                        let info = QueryInfo { span, query: Q::query(key.clone()) };-                        let job = Lrc::new(QueryJob::new(info, parent));-                        let owner = JobOwner { cache, job: job.clone(), key: (*key).clone() };+                        // Generate an id unique within this shard.+                        let id = jobs.checked_add(1).unwrap();+                        *jobs = id;+                        let id = QueryShardJobId(NonZeroU32::new(id).unwrap());++                        let global_id = QueryJobId {+                            job: id,+                            shard: u16::try_from(shard).unwrap(),+                            kind: Q::dep_kind(),+                        };

Is there a reason for not creating the global_id outside the closure? That seems simpler than capturing the jobs reference.

Zoxc

comment created time in 7 days

Pull request review commentrust-lang/rust

Construct query job latches on-demand

 impl<'a, 'tcx, Q: QueryDescription<'tcx>> JobOwner<'a, 'tcx, Q> {                                 query_blocked_prof_timer = Some(tcx.prof.query_blocked());                             } -                            job.clone()+                            // Create the id of the job we're waiting for+                            let id = QueryJobId {

Would you mind creating a helper method for this? This function is already rather big.

Zoxc

comment created time in 7 days

pull request commentrust-lang/rust

Speed up `SipHasher128`.

The only way to satisfy all of these is via (3), which the current PR code implements. The downside is extra to_le calls in both StableHasher::write_xyz and SipHasher128::write_xyz, but I think that's reasonable to satisfy the desires above.

Does that sound reasonable?

Yes, I'm OK with that.

That's interesting. I find lots of things about little-endian/big-endian confusing, but I don't have trouble with to_le. It's just a no-op on little-endian and a byte-swap on big-endian.

I think what I find even more confusing about to_le is when it is used as a substitute for converting from little endian to native, i.e.:

// We have some bytes that encode the number 1 as a 32 bit integer in LE format,
let le_bytes: [u8; 4] = [1, 0, 0, 0];

// Load the bytes into the `u32` verbatim (i.e. in native byte order)
let x: u32 = unsafe { std::mem::transmute(le_bytes) };

// On a big endian machine x is now the number 16777216, so we convert
// to big endian by, obviously, calling `to_le()`
x.to_le()

It does the right thing but it spells the opposite of what it does. What I usually really want, and I'm glad that Rust has recently gained it, is from_le_bytes():

// We have some bytes that encode the number 1 as a 32 bit integer in LE format,
let le_bytes = [1, 0, 0, 0];

// So much nicer!
let x = u32::from_le_bytes(le_bytes);

I would actually prefer implementing u8to64_le in terms of from_le_bytes if that optimizes as well as the current version. But I don't want to block this PR on it. Thanks for looking into things so thoroughly!

@bors r+

nnethercote

comment created time in 7 days

Pull request review commentrust-lang/rust

Construct query job latches on-demand

 pub struct QueryInfo<'tcx> {     pub query: Query<'tcx>, } -/// Representss an object representing an active query job.-pub struct QueryJob<'tcx> {+type QueryMap<'tcx> = FxHashMap<QueryToken, QueryJobInfo<'tcx>>;++/// A value uniquely identifiying an active query job.+/// This value is created from a stack pointer in `get_query` and `force_query`+/// which is alive while the query executes.+#[derive(Copy, Clone, Eq, PartialEq, Hash)]+pub struct QueryToken(NonZeroUsize);

I was thinking just having a u64 counter per shard and initializing each counter in a way that the upper N bits globally identify the shard. E.g. allocate a 20 bit "(query-kind, shard) id" from a static atomic counter when the shard is created and then initialize the shard-local counters with shard_id << 44. But I don't actually care about the implementation details as long as QueryJobId does what its name suggests.
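The counter scheme described above could look roughly like the following. This is a minimal sketch, not the code in the PR: `ShardJobCounter`, `NEXT_SHARD_ID` and `COUNTER_BITS` are hypothetical names, and the 20/44 bit split is taken from the example numbers in the comment.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Number of low bits used by the shard-local counter (assumed split:
/// 20 bits of shard id, 44 bits of counter).
const COUNTER_BITS: u32 = 44;

/// Hypothetical global allocator for "(query-kind, shard)" ids.
static NEXT_SHARD_ID: AtomicU64 = AtomicU64::new(0);

#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
pub struct QueryJobId(u64);

/// Hypothetical per-shard state. In the compiler this would live behind
/// the shard's lock, so no atomics are needed for `next`.
pub struct ShardJobCounter {
    shard_id: u64,
    next: u64,
}

impl ShardJobCounter {
    /// Allocates a fresh shard id and starts the local counter at
    /// `shard_id << COUNTER_BITS`.
    pub fn new() -> Self {
        let shard_id = NEXT_SHARD_ID.fetch_add(1, Ordering::Relaxed);
        assert!(shard_id < (1 << 20), "ran out of shard ids");
        ShardJobCounter { shard_id, next: shard_id << COUNTER_BITS }
    }

    /// Returns an id that is unique across all shards created via `new`.
    pub fn next_id(&mut self) -> QueryJobId {
        let id = self.next;
        // If the counter ever spills into the shard bits, ids could collide.
        assert_eq!(id >> COUNTER_BITS, self.shard_id, "shard-local counter overflowed");
        self.next = id.wrapping_add(1);
        QueryJobId(id)
    }
}
```

Because the upper bits differ per shard, two shards can hand out ids independently without ever producing the same value.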

Zoxc

comment created time in 7 days

pull request commentrust-lang/rust

Micro-optimize the heck out of LEB128 reading and writing.

@bors r+

Thanks, @nnethercote!

nnethercote

comment created time in 7 days

issue commentrust-lang/rust

-Cdebuginfo=1 wastefully produces full type descriptions.

I think this is a recent regression. Do we know when it was introduced?

A few months ago when I filed https://github.com/rust-lang/rust/issues/64405, this wasn't the case yet.

eddyb

comment created time in 7 days

Pull request review commentrust-lang/rust

Construct query job latches on-demand

 pub struct QueryInfo<'tcx> {     pub query: Query<'tcx>, } -/// Representss an object representing an active query job.-pub struct QueryJob<'tcx> {+type QueryMap<'tcx> = FxHashMap<QueryToken, QueryJobInfo<'tcx>>;++/// A value uniquely identifiying an active query job.+/// This value is created from a stack pointer in `get_query` and `force_query`+/// which is alive while the query executes.+#[derive(Copy, Clone, Eq, PartialEq, Hash)]+pub struct QueryToken(NonZeroUsize);

Using a pointer into the stack as a unique identifier is clever, but I'm a bit worried that we can't make the compiler guarantee that the value won't be reused. The core logic within the query engine is very complicated and somebody might do some kind of optimization that silently breaks the invariant of the key stack value being alive throughout the query job's lifetime.

If I understand things correctly, the requirements for QueryToken are:

  • it needs to uniquely identify a QueryJob
  • it needs to be cheap to construct.

At construction time we already have a lock on the query cache. Could we put a u64 counter into that?

Also, if the main purpose of QueryToken is to identify a QueryJob then I'd prefer if it was called QueryJobId.

Zoxc

comment created time in 7 days

issue commentrust-lang/rust

Choose a naming scheme for codegen debuginfo emission.

My main reasoning is that debuginfo in the Rust compiler conceptually does not necessarily mean DWARF, so I would prefer a more generic naming scheme (even though I recognize that in practice things are basically modeled after LLVM which in turn uses DWARF-like concepts).

I don't have strong feelings on this. I just think that there is little that can go wrong when using a more generic naming scheme, while using a dwarf-oriented naming scheme could become unnecessarily confusing if our model is "mostly DWARF but in some cases not quite", etc.

eddyb

comment created time in 8 days

pull request commentrust-lang/rust

Speed up `SipHasher128`.

So it looks the same as on little-endian. This is what I expected because the code in question operates on integers, not on byte sequences, i.e. the number 0xaabbccdd might have a different memory layout on big-endian, but it still has the same numeric value and print!("{:x}", n) prints the same on all architectures.

The use of to_ne_bytes shows that, by default, for a given sequence of write_xyz calls, any hasher will give different results on little-endian vs. big-endian. Going back to my example:

write_u32(0xDDCCBBAA) write_u8(0xEE) write_u32(0xIIHHGGFF)

On little-endian it is equivalent to write([AA,BB,CC,DD, EE, FF,GG,HH,II]) On big-endian it is equivalent to write([DD,CC,BB,AA, EE, II,HH,GG,FF]). Clearly the results will be different.

I see, that is interesting. I didn't know that the libstd implementation worked this way. It's clear that that must give different results depending on endianness. At the same time StableHasher must give the same result on all platforms for that sequence of calls. I think it's fine for SipHasher128 to handle this differently than libstd, as long as we document it. I don't think there's an actual requirement that write_u32() corresponds to hashing any specific sequence of bytes. The only real requirement I see for the generic Hasher is that any sequence of calls deterministically results in the same hash value on the same platform (i.e. the minimum requirements for making it usable with a hashtable). I think the main reason the standard library hashes things in native byte order is performance, not because it's a strict requirement.

So I think our options for SipHasher128 are:

  1. Don't do any endianness conversions on short_write arguments and rely on short_write to be implemented in an endian-independent way (which it is as long as it only does bitwise and arithmetic operations).
  2. Make short_write take a byte slice again and then make sure that StableHasher makes things endian-independent by always converting to little endian. (~= the current implementation)
  3. Try to make SipHasher128 behave exactly the same way as std::hash::Hasher (i.e. giving different results depending on endianness) while still using integer arguments for short_write and then let StableHasher pre-process the integers in a way that leads to endian-independent hash values. (~= the current version of this PR?)

I prefer option (1) as it is just simpler.
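To illustrate why option (1) is endian-independent, here is a minimal sketch that buffers small integers into a u64 using only shifts and ORs, next to a byte-slice variant in the spirit of option (2). `append_bitwise` and `append_bytes` are hypothetical helpers, not the actual short_write; both assume the buffered data fits into 8 bytes.

```rust
/// Appends the low `size` bytes of `x` to `tail`, which already holds
/// `ntail` bytes. Only shifts and ORs on integer values are used, so the
/// result does not depend on the memory layout of the platform.
/// Assumes `x` has no bits set above its low `size` bytes.
pub fn append_bitwise(tail: u64, ntail: usize, x: u64, size: usize) -> u64 {
    assert!(size >= 1 && ntail + size <= 8);
    tail | (x << (8 * ntail))
}

/// Byte-slice variant: the caller must pass the bytes in little-endian
/// order to get a platform-independent result.
pub fn append_bytes(tail: u64, ntail: usize, bytes: &[u8]) -> u64 {
    assert!(ntail + bytes.len() <= 8);
    let mut buf = tail.to_le_bytes();
    buf[ntail..ntail + bytes.len()].copy_from_slice(bytes);
    u64::from_le_bytes(buf)
}
```

Feeding the equivalent of write_u32(0xDDCCBBAA) followed by write_u8(0xEE) through either helper gives 0xEEDDCCBBAA, provided the byte-slice variant receives `to_le_bytes()` output; with `to_ne_bytes()` it would differ on big-endian machines.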

This whole snippet can be simplified to out.to_le().

Yeah, I know. I just find to_le() confusing in most contexts. E.g. why does x.to_le().to_le() give me big-endian encoding on a big endian system? I personally prefer to call swap_bytes() which is just more explicit. What I usually really want is to_le_bytes(), that makes a lot more sense to me. Anyway, if you strongly prefer to_le() to my more verbose version, I won't fight you on it. I mostly want to get rid of the weird sequence of if statements in u8to64_le.
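A small sketch of the relationships between to_le(), swap_bytes() and to_le_bytes() mentioned above. The function names are made up for illustration; every function returns true on both little-endian and big-endian targets.

```rust
/// `to_le()` is a no-op on little-endian targets and a byte swap on
/// big-endian targets.
pub fn to_le_matches_target(x: u32) -> bool {
    if cfg!(target_endian = "big") {
        x.to_le() == x.swap_bytes()
    } else {
        x.to_le() == x
    }
}

/// Applying `to_le()` twice gives back the original value everywhere:
/// either two no-ops or two byte swaps.
pub fn double_to_le_is_identity(x: u32) -> bool {
    x.to_le().to_le() == x
}

/// The in-memory bytes of `x.to_le()` are the little-endian encoding of
/// `x`, which is what `to_le_bytes()` returns directly.
pub fn to_le_bytes_is_explicit(x: u32) -> bool {
    x.to_le().to_ne_bytes() == x.to_le_bytes()
        && x.to_le_bytes()[0] == (x & 0xff) as u8
        && u32::from_le_bytes(x.to_le_bytes()) == x
}
```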

nnethercote

comment created time in 8 days

pull request commentrust-lang/rust

Speed up `SipHasher128`.

Here is what I get on the big endian machine:

old early  : 0xddccbbaa, 4
old early  : 0xeeddccbbaa, 5
old process: 0x345678eeddccbbaa, 5
old spill  : 0x12, 1
old: 20f554e44fa4ca9 d68f01a898684a41
new early  : 0xddccbbaa, 4
new early  : 0xeeddccbbaa, 5
new process: 0x345678eeddccbbaa, 5
new spill  : 0x12, 1
new: 20f554e44fa4ca9 d68f01a898684a41
nnethercote

comment created time in 8 days

pull request commentrust-lang/rust

Micro-optimize the heck out of LEB128 reading and writing.

That's interesting. I remember that switching the code from loop to for sped up the code considerably a couple of years ago. My theory now is that that past speedup came from duplicating the machine code for each integer type, allowing the branch predictor to do a better job, and that that speedup was so big that it was faster even though the for loop introduced more overhead.

Anyway, I'm happy to get any kind of improvement here. And it's even safer than before :tada:

(In case someone is interested in the past of this implementation: https://github.com/michaelwoerister/encoding-bench contains a number of different versions that I tried out. It's rather messy as it's essentially a private repo but an interesting aspect is the test data files that are generated from actual rustc invocations)
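For readers unfamiliar with the encoding being optimized, here is a minimal sketch of unsigned LEB128 for u64 values, written with a plain loop for encoding. It is not the rustc implementation and makes no claims about performance; the function names are chosen for illustration.

```rust
/// Writes `value` as unsigned LEB128: 7 bits per byte, least significant
/// group first, with the high bit set on every byte except the last.
pub fn write_unsigned_leb128(out: &mut Vec<u8>, mut value: u64) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Reads an unsigned LEB128 value from the start of `data`. Returns the
/// value and the number of bytes consumed, or `None` if the input ends
/// early or is longer than a u64 can hold.
pub fn read_unsigned_leb128(data: &[u8]) -> Option<(u64, usize)> {
    let mut result = 0u64;
    let mut shift = 0u32;
    for (i, &byte) in data.iter().enumerate() {
        if shift >= 64 {
            return None;
        }
        result |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some((result, i + 1));
        }
        shift += 7;
    }
    None
}
```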

nnethercote

comment created time in 8 days

pull request commentrust-lang/rust

Speed up `SipHasher128`.

Also, we should be able to replace the u8to64_le() with something more straightforward that does just a single memcpy (and lets us get rid of this weird load_int_le macro):

    /// Loads up to 7 bytes from a byte-slice into a u64.
    #[inline]
    fn u8to64_le(buf: &[u8], start: usize, len: usize) -> u64 {
        assert!(len < 8 && start + len <= buf.len());
        let mut out = 0u64;

        unsafe {
            let out_ptr = &mut out as *mut _ as *mut u8;
            ptr::copy_nonoverlapping(buf.as_ptr().offset(start as isize), out_ptr, len);
        }

        #[cfg(target_endian = "big")]
        {
            // If this is a big endian system we swap bytes, so that the first
            // byte ends up in the lowest order byte, like SipHash expects.
            out = out.swap_bytes();
        }

        out
    }
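The same function can also be sketched without unsafe code by going through from_le_bytes(). This is an illustrative variant under a hypothetical name, `u8to64_le_safe`; whether it optimizes as well as the pointer-based version is not something this sketch establishes.

```rust
/// Loads up to 7 bytes from a byte-slice into a u64, treating the bytes
/// as little-endian regardless of the target's endianness.
#[inline]
pub fn u8to64_le_safe(buf: &[u8], start: usize, len: usize) -> u64 {
    assert!(len < 8 && start + len <= buf.len());
    // Zero-padded scratch buffer; unused high bytes stay zero.
    let mut bytes = [0u8; 8];
    bytes[..len].copy_from_slice(&buf[start..start + len]);
    // The first copied byte ends up in the lowest-order byte of the result.
    u64::from_le_bytes(bytes)
}
```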
nnethercote

comment created time in 8 days

push eventmichaelwoerister/sip-endian

Michael Woerister

commit sha 1c8ebb500016ed5a5e9082859d8a7e52347390f8

test

push time in 8 days

push eventmichaelwoerister/sip-endian

Michael Woerister

commit sha b85326bbf9045a2030a24868a01b18561882e56d

test

push time in 8 days

create branchmichaelwoerister/sip-endian

branch : simpler-extraction

created branch time in 8 days

pull request commentrust-lang/rust

Speed up `SipHasher128`.

OK, I wrote the following test program that compares hash values before and after this PR:

https://github.com/michaelwoerister/sip-endian/blob/master/main.rs

On a little endian machine everything works as expected. However, when I tried it on a big endian machine (gcc110 from cfarm.tetaneutral.net), I got different values until I removed the to_le() calls from the PR's implementation. Once I did that the values matched those on the little endian machine (and those of the current implementation).

The requirement here is that the same sequence of write_xyz() calls with the same numeric values must produce the same final hash value, independent of endianness. For the current implementation this is achieved by treating everything as byte slices and making sure that all such slices are brought into a platform independent order (by calling to_le() in StableHasher).

However, the implementation in this PR does not operate on byte slices anymore, so there is no need to do the whole byte-swapping dance. The new short_write only uses bit operations and those are endian independent.

So the correct fix, in my opinion, is to remove the to_le() calls from both the short_write() invocations in SipHasher128 and from the write_xyz() calls in StableHasher. (Funnily enough, the current version of this PR probably works too because it swaps the bytes once in StableHasher and then swaps them back again in SipHasher).

nnethercote

comment created time in 8 days

push eventmichaelwoerister/sip-endian

Michael Woerister

commit sha 146119e5ab35fdb6904eb9fbf556f4a13598d214

wip

push time in 8 days

push eventmichaelwoerister/sip-endian

Michael Woerister

commit sha 91e261184b2f7dbf562d4d65656e27bc631b8ee4

wip

push time in 8 days

push eventmichaelwoerister/sip-endian

Michael Woerister

commit sha 92ca17c2075f70fe034a60358c1a79955bfc0787

wip

push time in 8 days

push eventmichaelwoerister/sip-endian

Michael Woerister

commit sha 6ddd2516c3b53698a300d6511c9d2ad384646215

wip

push time in 8 days

create branchmichaelwoerister/sip-endian

branch : master

created branch time in 9 days

created repositorymichaelwoerister/sip-endian

created time in 9 days

pull request commentrust-lang/rust

self-profile: Support arguments for generic_activities.

perf.rlo still seems to work fine after this change so I think this PR is ready for an actual review.

michaelwoerister

comment created time in 9 days

Pull request review commentrust-lang/rust

self-profile: Support arguments for generic_activities.

 impl SelfProfilerRef {     /// VerboseTimingGuard returned from this call is dropped. In addition to recording     /// a measureme event, "verbose" generic activities also print a timing entry to     /// stdout if the compiler is invoked with -Ztime or -Ztime-passes.-    #[inline(always)]     pub fn verbose_generic_activity<'a>(         &'a self,         event_id: &'static str,     ) -> VerboseTimingGuard<'a> {-        VerboseTimingGuard::start(-            event_id,-            self.print_verbose_generic_activities,-            self.generic_activity(event_id),-        )+        let message =+            if self.print_verbose_generic_activities { Some(event_id.to_owned()) } else { None };

I updated the names.

michaelwoerister

comment created time in 9 days

push eventmichaelwoerister/rust

Michael Woerister

commit sha 81dccb1a5c7b67c61cb7eb421150c671d6e1a7de

self-profile: Support arguments for generic_activities.

view details

push time in 9 days

pull request commentrust-lang/rust

Speed up `SipHasher128`.

Note that we already call to_le() on the layer above in StableHasher. That's probably sufficient then? It's annoying that we don't have tests running on big endian platforms :/
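The point about calling to_le() before hashing can be illustrated with a minimal sketch. This is not the StableHasher or SipHasher128 code; the two function names and the Vec<u8> sink are assumptions made up for the example. It only shows why converting to a fixed byte order makes the bytes fed into a hasher independent of the host platform.

```rust
/// Appends `value` to `sink` in little-endian byte order.
/// The resulting bytes are identical on little-endian and big-endian hosts.
/// (Hypothetical helper, not part of rustc.)
fn write_u64_stable(sink: &mut Vec<u8>, value: u64) {
    sink.extend_from_slice(&value.to_le_bytes());
}

/// Appends `value` to `sink` in the host's native byte order.
/// The resulting bytes differ between little-endian and big-endian hosts,
/// so a hash computed over them would differ too.
/// (Hypothetical helper, not part of rustc.)
fn write_u64_native(sink: &mut Vec<u8>, value: u64) {
    sink.extend_from_slice(&value.to_ne_bytes());
}
```

Any hasher that consumes the output of the first function sees the same input regardless of where the compiler runs, which is the property needed for cross-compilation.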

nnethercote

comment created time in 9 days

pull request commentrust-lang/rust

Add an option to use LLD to link the compiler on Windows platforms

Fwiw, official Windows builds of Firefox are all done with LLD.

Zoxc

comment created time in 9 days

issue commentrust-lang/rust

Choose a naming scheme for codegen debuginfo emission.

I'm fine with making the naming consistent. I think I prefer using generic dbg/Dbg prefixes instead of Dwarf-specific ones. I guess a non-LLVM backend on Windows would not use DWARF?

eddyb

comment created time in 9 days

Pull request review commentrust-lang/rust

self-profile: Support arguments for generic_activities.

 impl SelfProfilerRef {
     /// VerboseTimingGuard returned from this call is dropped. In addition to recording
     /// a measureme event, "verbose" generic activities also print a timing entry to
     /// stdout if the compiler is invoked with -Ztime or -Ztime-passes.
-    #[inline(always)]
     pub fn verbose_generic_activity<'a>(
         &'a self,
         event_id: &'static str,
     ) -> VerboseTimingGuard<'a> {
-        VerboseTimingGuard::start(
-            event_id,
-            self.print_verbose_generic_activities,
-            self.generic_activity(event_id),
-        )
+        let message =
+            if self.print_verbose_generic_activities { Some(event_id.to_owned()) } else { None };

It could be turned into a Cow, yes. But I left it as 'static because that's a good safeguard against accidentally encoding some kind of parameter in the event label (for which purpose one should use generic_activity_with_arg or extra_verbose_generic_activity). Let me rename event_id to event_label in the cases where we don't support parameters...
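A rough sketch of the safeguard described above, assuming a simplified stand-in type: `Profiler`, its `events` field, and the `label(arg)` text format are invented for illustration and are not rustc's SelfProfilerRef API. The label parameter is `&'static str`, so a string built at runtime (for example with format!) cannot be passed as a label and has to go through the separate argument parameter.

```rust
/// Hypothetical, simplified profiler used only to illustrate the API shape.
struct Profiler {
    events: Vec<String>,
}

impl Profiler {
    fn new() -> Self {
        Profiler { events: Vec::new() }
    }

    /// The label must be a string literal (or other `'static` string),
    /// which prevents encoding runtime parameters into it by accident.
    fn generic_activity(&mut self, event_label: &'static str) {
        self.events.push(event_label.to_owned());
    }

    /// Runtime data goes into `event_arg`, which may borrow from any string.
    /// The `label(arg)` rendering is an assumption for this sketch.
    fn generic_activity_with_arg(&mut self, event_label: &'static str, event_arg: &str) {
        self.events.push(format!("{}({})", event_label, event_arg));
    }
}
```

With this shape, `profiler.generic_activity(&format!("opt_{}", name))` fails to compile because the temporary String does not live for `'static`, while `profiler.generic_activity_with_arg("opt", &name)` is accepted.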

michaelwoerister

comment created time in 9 days

Pull request review commentrust-lang/rust

self-profile: Support arguments for generic_activities.

 pub(crate) fn run_pass_manager(
             llvm::LLVMRustAddPass(pm, pass.unwrap());
         }

-        cgcx.prof
-            .extra_verbose_generic_activity("LTO_passes")

It's replaced by LLVM_lto_optimize above.

michaelwoerister

comment created time in 9 days

pull request commentrust-lang/rust

Use queries for the HIR map

Thanks for the PR, @Zoxc. Here are some thoughts:

  • I think this looks very promising and is along the lines of what was discussed in the end-to-end queries design meeting. I can't quite tell yet. It would be great to have a bit more documentation on what the new types introduced here are exactly, e.g.:
    • What is IndexedHir? I'd guess the entire HIR after lowering + lookup tables?
    • What is a HirOwner? An "item-like"? The thing corresponding to a DepNode::Hir?
    • What is a HirItem? A child of an owner, like an expression, params, etc?
    • What is HirOwnerItems? A local lookup table for items of a HirOwner?
  • Performance looks quite bad at the moment. We'll need to find a way to fix this before going forward. Some superficial browsing of the detailed perf.rlo tables (e.g. https://perf.rust-lang.org/detailed-query.html?commit=c6c89b70fa50afc9968591c5b409fd70b2ca1f2c&base_commit=64ea639c12df0594dd891b1ba0b439c8c5eacd83&benchmark=piston-image-check&run_name=patched%20incremental:%20println) seems to suggest that there is more invalidation going on (i.e. that the number of calls to things like typeck_tables_of increases). Not sure if that is due to bug fixes or just a different configuration of red-green boundaries. Do we decrease dep-node granularity for HIR-related things with this setup?
  • Does this get rid of all custom dep-tracking we do for the HIR map? (That would be awesome!)
  • Does this hash the HIR twice in order to compute the SVH? (I have some ideas around a low-tech solution for memoizing Fingerprints that is mainly aimed at avoiding re-hashing upstream MIR, but which also could help with the SVH problem).
Zoxc

comment created time in 9 days

pull request commentrust-lang/rust

Speed up `SipHasher128`.

https://cfarm.tetaneutral.net/ provides access to some big endian systems.

nnethercote

comment created time in 12 days

pull request commentrust-lang/rust

Speed up `SipHasher128`.

We do run tests on some big endian platforms, I think (right, @rust-lang/infra?)

In the past we had problems with symbol name mismatches when compiling some things on little-endian and the rest on big-endian, because the symbol hashes didn't match up. But we now know which kinds of bug reports to look out for after a change like this and testing should be better too now.

nnethercote

comment created time in 12 days

pull request commentrust-lang/rust

self-profile: Support arguments for generic_activities.

Let's do a perf run in order to see if this breaks anything.

@bors try @rust-timer queue

michaelwoerister

comment created time in 12 days

PR opened rust-lang/rust

self-profile: Support arguments for generic_activities.

This PR adds support for recording arguments of "generic activities". The most notable use case is LLVM module names, which should be very interesting for crox profiles. In the future it might be interesting to add more fine-grained events for pre-query passes like macro expansion.

I tried to judiciously de-duplicate existing self-profile events with extra_verbose_generic_activity, now that the latter also generates self-profile events.

r? @wesleywiser

+165 -114

0 comment

8 changed files

pr created time in 12 days

create branchmichaelwoerister/rust

branch : self-profile-generic-activity-args

created branch time in 12 days

pull request commentrust-lang/rust

[self-profiler] add selfprofiling to llvm

Yes, a warning makes sense to me too.

andjo403

comment created time in 12 days

pull request commentrust-lang/rust

Speed up `SipHasher128`.

Thanks for the PR, @nnethercote. Looks like a great find! We are doing lots of hashing :)

Did you think about which implications these changes might have on big endian systems? Hashing needs to be stable across platforms for cross-compilation. The changes are probably fine with respect to this but it's something to look out for.

I'll review in detail soon.

nnethercote

comment created time in 12 days

issue commentrust-lang/rust

25% compile time increase on beta when building async-std

Is this for release or debug builds?

jonas-schievink

comment created time in 13 days

Pull request review commentrust-lang/rust

Move the `hir().krate()` method to a query and remove the `Krate` dep node

 pub struct GlobalCtxt<'tcx> {
     /// Export map produced by name resolution.
     export_map: FxHashMap<DefId, Vec<Export<hir::HirId>>>,

-    hir_map: hir_map::Map<'tcx>,
+    pub(crate) hir_map: hir_map::Map<'tcx>,

Maybe add a comment that one usually wants tcx.hir() instead of tcx.hir_map?

Zoxc

comment created time in 13 days

Pull request review commentrust-lang/rust

Move the `hir().krate()` method to a query and remove the `Krate` dep node

 rustc_queries! {
     }

     Other {
+        // Represents crate as a whole (as distinct from the to-level crate module).

to-level might be a typo?

Zoxc

comment created time in 13 days

issue commentrust-lang/compiler-team

Make incr. comp. respect the -Ccodegen-units setting

Just to be absolutely clear: Your intention, with the change proposed here, is that -Ccodegen-units will denote the (upper-bound on the) number of codegen-units across the entire crate currently being compiled, correct?

This is correct.

michaelwoerister

comment created time in 14 days

issue commentrust-lang/compiler-team

Make incr. comp. respect the -Ccodegen-units setting

Yes, -Ccodegen-units today already only sets an upper limit. The same would be true after this change.

michaelwoerister

comment created time in 14 days

pull request commentrust-lang/rust

Merge item id stable hashing functions

Thanks, this looks good to me now.

@bors r+ rollup

ljedrz

comment created time in 14 days

issue openedrust-lang/compiler-team

Make incr. comp. respect the -Ccodegen-units setting

TL;DR

Incremental compilation currently will always create 1-2 codegen units per source-level module, regardless of the -Ccodegen-units setting passed to the compiler. This is fine in the majority of cases but there is no way to control this behavior in cases where it produces too much overhead (see below for examples).

I propose to

  • make the compiler honor the -Ccodegen-units setting, even when compiling with -Cincremental, and
  • make the compiler default to a higher number of codegen units in case incr. comp. is enabled (256 instead of 16).

The -Ccodegen-units flag would retain exactly the same semantics it has in non-incremental mode, i.e. setting an upper bound for the number of codegen units.

Why do I consider this a (possibly) major change? Because there is one case where the compiler changes behavior: If someone has explicitly set the number of codegen units. After this change, that setting will start to have an effect, leading to potentially higher compile times. Only one crate in the perf.rlo benchmark suite has such an explicit setting (clap-rs). I expect the fallout to be minor and harmless.

Also note that this opens up a whole new use case for incremental compilation: By setting -Ccodegen-units=1 (or -Ccodegen-units=16 as is the default right now), the compiler can make use of the incremental cache for all of the middle end while producing a binary that exhibits the same runtime performance as a non-incrementally built one.

Links and Details

I ran experiments for this in https://github.com/rust-lang/rust/pull/67834 and the results look good:

  • up to 30% compile time reduction for script-servo-debug
  • up to 15% compile time reduction for style-servo-debug
  • up to 8% compile time reduction for style-servo-opt

However, there are also cases that regress:

  • patched incremental: debugging println in dependency in script-servo-opt regresses by 8% due to the lower cache granularity.
  • clap-rs regresses by up to 34.7% because it has an explicit (and previously ignored) codegen-units setting in its Cargo.toml. That is easily fixable by clap-rs itself.

These regressions are acceptable, I think, especially because the user can easily regain the previous behavior by setting -Ccodegen-units=9999 (or some other number that is greater than the number of source-level modules times 2). Most crates, however, are well below the default setting of 256 codegen units and won't see any kind of changed behavior.
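The upper-bound semantics can be sketched with simple arithmetic. The function below is a hypothetical illustration, not the compiler's partitioning code: it assumes the worst case of 2 codegen units per source-level module (the text says 1-2) and then applies the -Ccodegen-units value as a cap.

```rust
/// Estimates how many codegen units an incremental build would end up with
/// under the proposed behavior. `source_modules` and `codegen_units_limit`
/// are illustrative parameter names, not rustc identifiers.
fn effective_codegen_units(source_modules: usize, codegen_units_limit: usize) -> usize {
    // Assumed worst case: two codegen units per source-level module.
    let uncapped = source_modules * 2;
    // -Ccodegen-units only ever lowers the count; it never raises it.
    uncapped.min(codegen_units_limit)
}
```

Under these assumptions a crate with 100 modules stays at 200 units with the proposed default of 256, a crate with 300 modules is capped at 256, and passing 9999 leaves the same 300-module crate at 600, which corresponds to the previous behavior.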

Mentors or Reviewers

The implementation should be straightforward so a reviewer would mostly need to sign off on making the -Ccodegen-units flag suddenly take effect in incremental mode. @nikomatsakis & @pnkfelix as compiler team leads would be good candidates for that.

created time in 14 days

Pull request review commentrust-lang/rust

Merge item id stable hashing functions

 impl<'ctx> rustc_hir::HashStableContext for StableHashingContext<'ctx> {
     // want to pick up on a reference changing its target, so we hash the NodeIds
     // in "DefPath Mode".

I think it makes sense to move this comment back to the HashStable impls for the ItemId types.

ljedrz

comment created time in 14 days

pull request commentrust-lang/rust

[self-profiler] add selfprofiling to llvm

Looks good to me, thanks @andjo403! Now just have to wait for #67954 to land.

andjo403

comment created time in 15 days

pull request commentrust-lang/rust

Replace HIR's ItemId structs with aliases

I actually don't think this change is an improvement. New-typing things in order to avoid accidental mixups is a common technique for avoiding bugs. I personally would leave the types as they are unless there is a strong reason to change them.

You could collapse the hash_item_id(), hash_impl_item_id(), and hash_trait_item_id() methods into a single hash_reference_to_item() method if you want to reduce code duplication.
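As a rough illustration of both suggestions, the sketch below keeps distinct new-types while sharing one hashing routine. Everything in it is an assumption for the example: the `ItemReference` trait, the `u32` payload, and the use of the standard library's DefaultHasher are stand-ins and do not reflect how rustc's HashStable machinery or "DefPath Mode" actually work.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Distinct new-types: passing a `TraitItemId` where an `ItemId` is expected
/// is a compile error, which a plain type alias would not catch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ItemId(u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct TraitItemId(u32);

/// Hypothetical trait that exposes what the shared hashing code needs.
trait ItemReference {
    fn raw_id(&self) -> u32;
}

impl ItemReference for ItemId {
    fn raw_id(&self) -> u32 {
        self.0
    }
}

impl ItemReference for TraitItemId {
    fn raw_id(&self) -> u32 {
        self.0
    }
}

/// One hashing routine instead of one method per id type.
fn hash_reference_to_item<R: ItemReference>(reference: &R) -> u64 {
    let mut hasher = DefaultHasher::new();
    reference.raw_id().hash(&mut hasher);
    hasher.finish()
}

/// Only accepts `ItemId`; other id types are rejected at compile time.
fn lookup_item(id: ItemId) -> u32 {
    id.0
}
```

The duplication is removed in the hashing code only; call sites such as `lookup_item` still benefit from the type distinction.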

ljedrz

comment created time in 15 days

Pull request review commentrust-lang/rust

Replace HIR's ItemId structs with aliases

 impl<'ctx> rustc_hir::HashStableContext for StableHashingContext<'ctx> {
         }
     }

-    // The following implementations of HashStable for `ItemId`, `TraitItemId`, and
-    // `ImplItemId` deserve special attention. Normally we do not hash `NodeId`s within
-    // the HIR, since they just signify a HIR nodes own path. But `ItemId` et al
-    // are used when another item in the HIR is *referenced* and we certainly
-    // want to pick up on a reference changing its target, so we hash the NodeIds
-    // in "DefPath Mode".

Merging the 3 types into 1 should be fine. Using an alias would be a change in semantics, leading to things being ignored during hashing.

ljedrz

comment created time in 15 days

pull request commentrust-lang/rust

Construct query job latches on-demand

The performance numbers look good. I'll try to make time for reviewing this some time this or next week.

Zoxc

comment created time in 15 days

pull request commentrust-lang/rust

Only assign dep node indices in non-incremental mode if self profiling is active

I'm not convinced that this is worth the additional complexity. An atomic fetch_add should be rather cheap.
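For context, assigning indices with an atomic counter can be as small as the sketch below. `IndexAllocator` is a made-up name for illustration and is not the dep-graph's actual data structure; the example only shows the fetch_add pattern the comment refers to.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical allocator that hands out consecutive indices to any thread.
struct IndexAllocator {
    next: AtomicU32,
}

impl IndexAllocator {
    fn new() -> Self {
        IndexAllocator { next: AtomicU32::new(0) }
    }

    /// Returns the current counter value and increments it in one atomic step.
    /// Relaxed ordering is enough here because only uniqueness of the
    /// returned values matters, not ordering relative to other memory.
    fn next_index(&self) -> u32 {
        self.next.fetch_add(1, Ordering::Relaxed)
    }
}
```

Each call is a single atomic read-modify-write instruction on common platforms, with no lock involved.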

Zoxc

comment created time in 15 days

issue commentrust-lang/rust

Make codegen treat inline fns the same as regular fns in non-opt builds

We generate an internal copy per crate (but not per object file). We could probably make use of the -Zshare-generics infrastructure for making inline functions available downstream.

Wow, is this still an issue today?

I think the tone of this comment is inappropriate. Please keep it civil.

michaelwoerister

comment created time in 15 days

pull request commentrust-lang/rust

Ensure all iterations in Rayon iterators run in the presence of panics

I don't really have a lot of time to look into this. Is this something that would warrant a design meeting? That would certainly make it easier for me personally to schedule time for it.

Zoxc

comment created time in 16 days

pull request commentrust-lang/rust

[Experiment] Export generic instances from libstd.

Yes, it looks like this change causes quite a bit of overhead for loading the list of generic symbols available in the standard library. Small programs that then do not use any of these newly available symbols don't get a chance for amortizing that overhead.

ripgrep-opt clean incremental regresses by 1.9% (linker and some other things). cranelift-codegen-opt baseline incremental regresses by 1.2% (mostly LLVM). Both are "real" programs/libraries.

I wouldn't rely too much on wall-time numbers when it comes to changes below 3%. They are rather noisy.

I'm not quite sure how to proceed here. Those performance numbers clearly aren't a mandate to merge this :)

michaelwoerister

comment created time in 16 days

pull request commentrust-lang/rust

[Experiment] Export generic instances from libstd.

@bors try @rust-timer queue

michaelwoerister

comment created time in 19 days

PR opened rust-lang/rust

[Experiment] Export generic instances from libstd.

This should resolve issue #64140. However it is unclear if there are detrimental effects. Let's test if there are performance improvements to be had.

r? @ghost

+6 -1

0 comment

1 changed file

pr created time in 19 days

create branchmichaelwoerister/rust

branch : export-generics-from-std

created branch time in 20 days

issue commentrust-lang/rust

Drop glue is always inlined into caller crate

It seems like #68414 might help fixing this but is not sufficient on its own because libstd is compiled with -Zshare-generics=no. I'll do experiments with changing that.

alexcrichton

comment created time in 20 days

issue commentrust-lang/measureme

summarize could show min/max times and other useful stats.

@theotherphil Each event is self-contained. For min/max you can just compare Event::timestamps. I would not make a distinction between the different event kinds.
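A minimal sketch of that idea, assuming a simplified event record: the `Event` struct below, with `start` and `end` offsets, is hypothetical and is not measureme's actual Event type. Since each event carries its own timestamps, min and max can be computed in one pass without distinguishing event kinds.

```rust
use std::time::Duration;

/// Hypothetical, simplified event: offsets from the start of the profile.
struct Event {
    start: Duration,
    end: Duration,
}

/// Returns the shortest and longest event duration, or `None` if there
/// are no events.
fn min_max_duration(events: &[Event]) -> Option<(Duration, Duration)> {
    // saturating_sub avoids a panic if an event has end < start.
    let mut durations = events.iter().map(|e| e.end.saturating_sub(e.start));
    let first = durations.next()?;
    Some(durations.fold((first, first), |(min, max), d| (min.min(d), max.max(d))))
}
```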

eddyb

comment created time in 21 days

pull request commentrust-lang/rust

Don't ICE on path-collision in dep-graph

@bors r+ rollup

Thanks, @pnkfelix! Let's merge it like this. If other incremental compilation errors keep cropping up, we can still think about printing a suggestion about clearing the cache to the user.

pnkfelix

comment created time in 22 days

pull request commentrust-lang/rust

[self-profiler] Two small cleanups

@bors r+ rollup

wesleywiser

comment created time in 23 days

pull request commentrust-lang/rust

[self-profiler] Two small cleanups

Nice! I didn't know ThreadId::as_u64() was a thing.

wesleywiser

comment created time in 23 days

Pull request review commentrust-lang/compiler-team

Create major_change issue template

+---
+name: Major change announcement
+about: Propose a major change.
+title: "(My major change proposal)"
+labels: major-change
+assignees: ''
+
+---
+
+# Directions
+
+If you'd like to propose a major change to do to rustc, you've come to
+the right place! To do so, please write-up your proposal in this issue.
+
+Describe your major change in a sentence or two under the TL;DR section.
+
+If available, add links and more descriptions in the "Links and Details" section.
+Note: it is not expected that you write a fully fleshed out proposal there.
+Just add any information you think is important.
+
+List people already mentoring you on this change or reviewing this change in the "Mentors or Reviewers" section
+If there are none, you can also add people that you think would be a good fit.
+You can look at [the experts map](https://github.com/rust-lang/compiler-team/blob/master/content/experts/map.toml) for ideas.
+
+Oh, and please delete this section before you open the issue -- but
+keep the others!
+
+# TL;DR
+
+# Links and Details
+
+# Mentors or Reviewers
+
+* Chuck Norris
+* Obi Wan Kenobi

While this list makes me chuckle personally, it might be a good idea to set a different tone here and steer clear of what could be perceived as brogrammer humor. It's a template text that people will regularly have on their screens; it should try to be mostly neutral, I think.

oli-obk

comment created time in a month

Pull request review commentrust-lang/compiler-team

Create major_change issue template

+---
+name: Major change announcement
+about: Propose a major change.
+title: "(My major change proposal)"
+labels: major-change
+assignees: ''
+
+---
+
+# Directions
+
+If you'd like to propose a major change to do to rustc, you've come to

If you'd like to propose a major change to do to rustc

oli-obk

comment created time in a month

Pull request review commentrust-lang/rust

[self-profiler] add selfprofiling to llvm

 fn get_resident() -> Option<usize> {
         }
     }
 }
+
+fn pass_name_to_string_id(
+    profiler: &Arc<SelfProfiler>,
+    string_cache: &mut FxHashMap<String, StringId>,
+    llvm_name: &str,
+) -> StringId {
+    *string_cache.entry(llvm_name.to_string()).or_insert_with(|| profiler.alloc_string(llvm_name))
+}
+
+fn ir_name_to_string_id(
+    profiler: &Arc<SelfProfiler>,
+    string_cache: &mut FxHashMap<String, StringId>,
+    llvm_name: &str,
+) -> StringId {
+    *string_cache.entry(llvm_name.to_string()).or_insert_with(|| {
+        let demangled_ir_name = rustc_demangle::demangle(llvm_name).to_string();
+        profiler.alloc_string(demangled_ir_name.as_str())
+    })
+}
+
+fn llvm_args_to_string_id(
+    profiler: &Arc<SelfProfiler>,
+    string_cache: &mut FxHashMap<String, StringId>,
+    pass_name: &str,
+    ir_name: &str,
+) -> EventId {
+    let pass_name = pass_name_to_string_id(profiler, string_cache, pass_name);
+    let mut components = vec![StringComponent::Ref(pass_name)];
+    // handle that LazyCallGraph::SCC is a comma separated list within parentheses
+    let parentheses: &[_] = &['(', ')'];
+    let trimed = ir_name.trim_matches(parentheses);
+    for part in trimed.split(',') {
+        let ir_name = ir_name_to_string_id(profiler, string_cache, part);
+        components.push(StringComponent::Value(SEPARATOR_BYTE));
+        components.push(StringComponent::Ref(ir_name));
+    }

Yeah we'll want to add a EventIdBuilder::from_label_and_args(label: _, args: &[_]) method at some point anyway. For now encoding things directly is fine.
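What such a helper might do can be sketched on plain strings. This is a hypothetical illustration only: the function names, the String return type, and the separator value `'\u{1E}'` are assumptions for the example, and the real code works with StringId and StringComponent values rather than text.

```rust
/// Assumed separator between the label and each argument.
const SEPARATOR: char = '\u{1E}';

/// Joins a label and any number of arguments into one event id string.
/// (Hypothetical stand-in for a `from_label_and_args`-style helper.)
fn event_id_from_label_and_args(label: &str, args: &[&str]) -> String {
    let mut id = String::from(label);
    for arg in args {
        id.push(SEPARATOR);
        id.push_str(arg);
    }
    id
}

/// Splits an SCC name of the form `(a,b,c)` into its parts, mirroring the
/// trim-then-split approach in the reviewed code.
fn split_scc_names(ir_name: &str) -> Vec<&str> {
    let parentheses: &[char] = &['(', ')'];
    ir_name.trim_matches(parentheses).split(',').collect()
}
```

A caller would combine the two, for example `event_id_from_label_and_args(pass_name, &split_scc_names(ir_name))`, instead of pushing separator and reference components by hand.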

andjo403

comment created time in a month

Pull request review commentrust-lang/rust

[self-profiler] add selfprofiling to llvm

 fn get_resident() -> Option<usize> {
         }
     }
 }
+
+fn pass_name_to_string_id(
+    profiler: &Arc<SelfProfiler>,
+    string_cache: &mut FxHashMap<String, StringId>,
+    llvm_name: &str,
+) -> StringId {
+    *string_cache.entry(llvm_name.to_string()).or_insert_with(|| profiler.alloc_string(llvm_name))
+}
+
+fn ir_name_to_string_id(
+    profiler: &Arc<SelfProfiler>,
+    string_cache: &mut FxHashMap<String, StringId>,
+    llvm_name: &str,
+) -> StringId {
+    *string_cache.entry(llvm_name.to_string()).or_insert_with(|| {
+        let demangled_ir_name = rustc_demangle::demangle(llvm_name).to_string();
+        profiler.alloc_string(demangled_ir_name.as_str())
+    })
+}
+
+fn llvm_args_to_string_id(
+    profiler: &Arc<SelfProfiler>,
+    string_cache: &mut FxHashMap<String, StringId>,
+    pass_name: &str,
+    ir_name: &str,
+) -> EventId {
+    let pass_name = pass_name_to_string_id(profiler, string_cache, pass_name);
+    let mut components = vec![StringComponent::Ref(pass_name)];
+    // handle that LazyCallGraph::SCC is a comma separated list within parentheses
+    let parentheses: &[_] = &['(', ')'];
+    let trimed = ir_name.trim_matches(parentheses);
+    for part in trimed.split(',') {
+        let ir_name = ir_name_to_string_id(profiler, string_cache, part);
+        components.push(StringComponent::Value(SEPARATOR_BYTE));
+        components.push(StringComponent::Ref(ir_name));
+    }
+    EventId::from_label(profiler.alloc_string(components.as_slice()))
+}
+
+pub struct LlvmSelfProfiler<'a> {

No rush

andjo403

comment created time in a month

pull request commentrust-lang/rust

Also share drop-glue when compiling with -Zshare-generics (i.e. at opt-level=0)

@bors r=alexcrichton

Thanks for the review!

michaelwoerister

comment created time in a month

Pull request review commentrust-lang/rust

Also share drop-glue when compiling with -Zshare-generics (i.e. at opt-level=0)

 impl<'tcx> Instance<'tcx> {
         let ty = tcx.type_of(self.def.def_id());
         tcx.subst_and_normalize_erasing_regions(self.substs, param_env, &ty)
     }
+
+    /// Finds a crate that contains a monomorphization of this instance that
+    /// can be linked to from the local crate. A return value of `None` means
+    /// no upstream crate provides such an exported monomorphization.
+    ///
+    /// This method already takes into account the global `-Zshare-generics`
+    /// setting, always returning `None` if `share-generics` is off.
+    pub fn upstream_monomorphization(&self, tcx: TyCtxt<'tcx>) -> Option<CrateNum> {
+        // If we are not in share generics mode, we don't link to upstream
+        // monomorphizations but always instantiate our own internal versions
+        // instead.
+        if !tcx.sess.opts.share_generics() {
+            return None;
+        }
+
+        // If this instance has non-erasable parameters, it cannot be a shared
+        // monomorphization. Non-generic instances are already handled above
+        // by `is_reachable_non_generic()`.
+        if self.substs.non_erasable_generics().next().is_none() {

Should be clearer now.

michaelwoerister

comment created time in a month

push eventmichaelwoerister/rust

Michael Woerister

commit sha 197cc1e43afba388a0266a08d2b946a187b766bb

Add projection query for upstream drop-glue instances. This reduces the amount of invalidated data when new types are added to upstream crates.

view details

push time in a month

Pull request review commentrust-lang/rust

Also share drop-glue when compiling with -Zshare-generics (i.e. at opt-level=0)

 impl<'tcx> Instance<'tcx> {
         let ty = tcx.type_of(self.def.def_id());
         tcx.subst_and_normalize_erasing_regions(self.substs, param_env, &ty)
     }
+
+    /// Finds a crate that contains a monomorphization of this instance that
+    /// can be linked to from the local crate. A return value of `None` means
+    /// no upstream crate provides such an exported monomorphization.
+    ///
+    /// This method already takes into account the global `-Zshare-generics`
+    /// setting, always returning `None` if `share-generics` is off.
+    pub fn upstream_monomorphization(&self, tcx: TyCtxt<'tcx>) -> Option<CrateNum> {
+        // If we are not in share generics mode, we don't link to upstream
+        // monomorphizations but always instantiate our own internal versions
+        // instead.
+        if !tcx.sess.opts.share_generics() {
+            return None;
+        }
+
+        // If this instance has non-erasable parameters, it cannot be a shared
+        // monomorphization. Non-generic instances are already handled above
+        // by `is_reachable_non_generic()`.
+        if self.substs.non_erasable_generics().next().is_none() {

The comment is probably wrong and definitely confusing. I'll try to make this clearer.

michaelwoerister

comment created time in a month

Pull request review commentrust-lang/rust

Also share drop-glue when compiling with -Zshare-generics (i.e. at opt-level=0)

 fn upstream_monomorphizations_provider(
         cnum_stable_ids
     };

+    let drop_in_place_fn_def_id = tcx.lang_items().drop_in_place_fn();
+
     for &cnum in cnums.iter() {
         for (exported_symbol, _) in tcx.exported_symbols(cnum).iter() {
-            if let &ExportedSymbol::Generic(def_id, substs) = exported_symbol {
-                let substs_map = instances.entry(def_id).or_default();
-
-                match substs_map.entry(substs) {
-                    Occupied(mut e) => {
-                        // If there are multiple monomorphizations available,
-                        // we select one deterministically.
-                        let other_cnum = *e.get();
-                        if cnum_stable_ids[other_cnum] > cnum_stable_ids[cnum] {
-                            e.insert(cnum);
-                        }
+            let (def_id, substs) = match *exported_symbol {
+                ExportedSymbol::Generic(def_id, substs) => {
+                    (def_id, substs)
+                }
+                ExportedSymbol::DropGlue(ty) => {
+                    if let Some(drop_in_place_fn_def_id) = drop_in_place_fn_def_id {
+                        (drop_in_place_fn_def_id, tcx.intern_substs(&[ty.into()]))
+                    } else {
+                        // `drop_in_place` in place does not exist, don't try
+                        // to use it.
+                        continue

So for #[no_core] crates? In any case, the implementation here should be able to handle that gracefully.

michaelwoerister

comment created time in a month

push eventmichaelwoerister/rust

Michael Woerister

commit sha 7bbdeb6bcc32f77186ab87fca946f4461dee15a1

Add projection query for upstream drop-glue instances. This reduces the amount of invalidated data when new types are added to upstream crates.

view details

push time in a month

pull request commentrust-lang/rust

Don't ICE on path-collision in dep-graph

Here's an improved, less brittle option:

fn def_id_corresponds_to_hir_dep_node(tcx: TyCtxt<'_>, def_id: DefId) -> bool {
    let hir_id = tcx.hir.as_local_hir_id(def_id).unwrap();
    def_id.index == hir_id.owner
}
pnkfelix

comment created time in a month

pull request commentrust-lang/rust

Don't ICE on path-collision in dep-graph

Here is an option for implementing the above check:

fn def_id_corresponds_to_hir_dep_node(tcx: TyCtxt<'_>, def_id: DefId) -> bool {
    let hir_id = tcx.hir.as_local_hir_id(def_id).unwrap();
    match tcx.hir.get(hir_id) {
        Node::Item(_) |
        Node::TraitItem(_) |
        Node::ImplItem(_) |
        Node::Crate => true,

        Node::Param(_) |
        Node::ForeignItem(_) |
        Node::Variant(_) |
        Node::Field(_) |
        Node::AnonConst(_) |
        Node::Expr(_) |
        Node::Stmt(_) |
        Node::PathSegment(_) |
        Node::Ty(_) |
        Node::TraitRef(_) |
        Node::Binding(_) |
        Node::Pat(_) |
        Node::Arm(_) |
        Node::Block(_) |
        Node::Local(_) |
        Node::MacroDef(_) |
        Node::Ctor(_) |
        Node::Lifetime(_) |
        Node::GenericParam(_) |
        Node::Visibility(_) => false
    }
}

Hopefully the whole notion of pre-allocating HIR dep-nodes will go away at some point.

pnkfelix

comment created time in a month

Pull request review commentrust-lang/rust

Also share drop-glue when compiling with -Zshare-generics (i.e. at opt-level=0)

 fn upstream_monomorphizations_provider(
         cnum_stable_ids
     };

+    let drop_in_place_fn_def_id = tcx.lang_items().drop_in_place_fn();
+
     for &cnum in cnums.iter() {
         for (exported_symbol, _) in tcx.exported_symbols(cnum).iter() {
-            if let &ExportedSymbol::Generic(def_id, substs) = exported_symbol {
-                let substs_map = instances.entry(def_id).or_default();
-
-                match substs_map.entry(substs) {
-                    Occupied(mut e) => {
-                        // If there are multiple monomorphizations available,
-                        // we select one deterministically.
-                        let other_cnum = *e.get();
-                        if cnum_stable_ids[other_cnum] > cnum_stable_ids[cnum] {
-                            e.insert(cnum);
-                        }
+            let (def_id, substs) = match *exported_symbol {
+                ExportedSymbol::Generic(def_id, substs) => {
+                    (def_id, substs)
+                }
+                ExportedSymbol::DropGlue(ty) => {
+                    if let Some(drop_in_place_fn_def_id) = drop_in_place_fn_def_id {
+                        (drop_in_place_fn_def_id, tcx.intern_substs(&[ty.into()]))
+                    } else {
+                        // `drop_in_place` in place does not exist, don't try
+                        // to use it.
+                        continue

Is there ever a case where tcx.lang_items().drop_in_place_fn() can validly return None?

michaelwoerister

comment created time in a month

push eventmichaelwoerister/rust

Michael Woerister

commit sha 168c6a1abcc27acec7180ac83d8720555e0b8d3d

Add projection query for upstream drop-glue instances. This reduces the amount of invalidated data when new types are added to upstream crates.

view details

push time in a month

pull request commentrust-lang/rust

Don't ICE on path-collision in dep-graph

Hm, now I understand what's going on: In revision 1 ::Something::foo is a method, i.e. something that has its own DepNode. In revision 2 ::Something::foo is a field, i.e. something that does not have its own DepNode, but does have a valid DefId/DefPath. The check assumes that if something has a DefId it should also have a corresponding Hir or HirBody dep-node, which, as demonstrated here, is not actually the case.

So a more targeted fix would be something like:

if let Some(def_id) = dep_dep_node.extract_def_id(tcx) {
    if is_something_that_corresponds_to_a_dep_node(def_id) {
        // If the node does exist, it should have
        // been pre-allocated.
        bug!(
            "DepNode {:?} should have been \
              pre-allocated but wasn't.",
            dep_dep_node
        )
    } else {
        // This is something that has a valid DefPath 
        // but does not have a corresponding `DepNode`,
        // e.g. a struct field. This branch is hit if
        // a proper item with the given DefPath existed
        // in the previous compilation session.
    }
} else {
    // If the node does not exist anymore, we
    // just fail to mark green.
    return None;
}

The tricky part is coming up with a solid implementation for is_something_that_corresponds_to_a_dep_node(). I'm not sure if that is worth the trouble...

pnkfelix

comment created time in a month

pull request comment rust-lang/rust

Don't ICE on path-collision in dep-graph

@pnkfelix Do you have a small reproduction for the issue somewhere?

D'oh! ... regression test ...

pnkfelix

comment created time in a month

Pull request review comment rust-lang/rust

[self-profiler] add selfprofiling to llvm

 fn get_resident() -> Option<usize> {
         }
     }
 }
+
+fn pass_name_to_string_id(
+    profiler: &Arc<SelfProfiler>,
+    string_cache: &mut FxHashMap<String, StringId>,
+    llvm_name: &str,
+) -> StringId {
+    *string_cache.entry(llvm_name.to_string()).or_insert_with(|| profiler.alloc_string(llvm_name))
+}
+
+fn ir_name_to_string_id(
+    profiler: &Arc<SelfProfiler>,
+    string_cache: &mut FxHashMap<String, StringId>,
+    llvm_name: &str,
+) -> StringId {
+    *string_cache.entry(llvm_name.to_string()).or_insert_with(|| {
+        let demangled_ir_name = rustc_demangle::demangle(llvm_name).to_string();
+        profiler.alloc_string(demangled_ir_name.as_str())
+    })
+}
+
+fn llvm_args_to_string_id(
+    profiler: &Arc<SelfProfiler>,
+    string_cache: &mut FxHashMap<String, StringId>,
+    pass_name: &str,
+    ir_name: &str,
+) -> EventId {
+    let pass_name = pass_name_to_string_id(profiler, string_cache, pass_name);
+    let mut components = vec![StringComponent::Ref(pass_name)];
+    // handle that LazyCallGraph::SCC is a comma separated list within parentheses
+    let parentheses: &[_] = &['(', ')'];
+    let trimed = ir_name.trim_matches(parentheses);
+    for part in trimed.split(',') {
+        let ir_name = ir_name_to_string_id(profiler, string_cache, part);
+        components.push(StringComponent::Value(SEPARATOR_BYTE));
+        components.push(StringComponent::Ref(ir_name));
+    }

Is there ever more than one argument? If not you can use the EventIdBuilder::from_label_and_arg.
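For reference on the "more than one argument" question, a rough standalone sketch of the splitting step from the diff, using only standard-library string methods. `split_ir_name` is a made-up helper name that does not appear in the PR, and the inputs mentioned afterwards are invented examples.

```rust
/// Mirrors the trimming and splitting in `llvm_args_to_string_id`:
/// strip surrounding parentheses, then split on commas.
/// (Hypothetical helper, for illustration only.)
fn split_ir_name(ir_name: &str) -> Vec<&str> {
    let parentheses: &[char] = &['(', ')'];
    ir_name.trim_matches(parentheses).split(',').collect()
}
```

A plain name such as `foo` yields a single part, while a parenthesized list such as `(foo,bar)` yields two. Whitespace after a comma is kept as-is, which matches what the loop in the diff does.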

andjo403

comment created time in a month

Pull request review comment rust-lang/rust

[self-profiler] add selfprofiling to llvm

 fn get_resident() -> Option<usize> {
         }
     }
 }
+
+fn pass_name_to_string_id(
+    profiler: &Arc<SelfProfiler>,
+    string_cache: &mut FxHashMap<String, StringId>,
+    llvm_name: &str,
+) -> StringId {
+    *string_cache.entry(llvm_name.to_string()).or_insert_with(|| profiler.alloc_string(llvm_name))
+}
+
+fn ir_name_to_string_id(
+    profiler: &Arc<SelfProfiler>,
+    string_cache: &mut FxHashMap<String, StringId>,
+    llvm_name: &str,
+) -> StringId {
+    *string_cache.entry(llvm_name.to_string()).or_insert_with(|| {
+        let demangled_ir_name = rustc_demangle::demangle(llvm_name).to_string();
+        profiler.alloc_string(demangled_ir_name.as_str())
+    })
+}
+
+fn llvm_args_to_string_id(
+    profiler: &Arc<SelfProfiler>,
+    string_cache: &mut FxHashMap<String, StringId>,
+    pass_name: &str,
+    ir_name: &str,
+) -> EventId {
+    let pass_name = pass_name_to_string_id(profiler, string_cache, pass_name);
+    let mut components = vec![StringComponent::Ref(pass_name)];
+    // handle that LazyCallGraph::SCC is a comma separated list within parentheses
+    let parentheses: &[_] = &['(', ')'];
+    let trimed = ir_name.trim_matches(parentheses);
+    for part in trimed.split(',') {
+        let ir_name = ir_name_to_string_id(profiler, string_cache, part);
+        components.push(StringComponent::Value(SEPARATOR_BYTE));
+        components.push(StringComponent::Ref(ir_name));
+    }
+    EventId::from_label(profiler.alloc_string(components.as_slice()))
+}
+
+pub struct LlvmSelfProfiler<'a> {
+    profiler: Arc<SelfProfiler>,
+    stack: Vec<TimingGuard<'a>>,
+    string_cache: FxHashMap<String, StringId>,

I have a PR in the works that allows for caching non-'static strings in the SelfProfiler so this separate cache won't be needed soon.
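As a reminder of what the separate cache in the diff does, here is a minimal self-contained sketch of the same entry-API pattern. `StringTable` and `intern` are hypothetical stand-ins written for this example; they are not the `SelfProfiler` API, and the sketch uses `std::collections::HashMap` rather than `FxHashMap` so that it has no dependencies.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a profiler string table.
struct StringTable {
    strings: Vec<String>,
}

impl StringTable {
    /// Stores the string and returns its index as an id.
    fn alloc_string(&mut self, s: &str) -> usize {
        self.strings.push(s.to_owned());
        self.strings.len() - 1
    }
}

/// Allocates `name` at most once; later calls return the cached id.
fn intern(table: &mut StringTable, cache: &mut HashMap<String, usize>, name: &str) -> usize {
    *cache
        .entry(name.to_owned())
        .or_insert_with(|| table.alloc_string(name))
}
```

Note that, as in the diff, the key is an owned `String`, so every lookup allocates a copy of the name even on a cache hit.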

andjo403

comment created time in a month

Pull request review comment rust-lang/rust

[self-profiler] add selfprofiling to llvm

 fn get_resident() -> Option<usize> {
         }
     }
 }
+
+fn pass_name_to_string_id(
+    profiler: &Arc<SelfProfiler>,
+    string_cache: &mut FxHashMap<String, StringId>,
+    llvm_name: &str,
+) -> StringId {
+    *string_cache.entry(llvm_name.to_string()).or_insert_with(|| profiler.alloc_string(llvm_name))
+}
+
+fn ir_name_to_string_id(
+    profiler: &Arc<SelfProfiler>,
+    string_cache: &mut FxHashMap<String, StringId>,
+    llvm_name: &str,
+) -> StringId {
+    *string_cache.entry(llvm_name.to_string()).or_insert_with(|| {
+        let demangled_ir_name = rustc_demangle::demangle(llvm_name).to_string();
+        profiler.alloc_string(demangled_ir_name.as_str())
+    })
+}
+
+fn llvm_args_to_string_id(
+    profiler: &Arc<SelfProfiler>,
+    string_cache: &mut FxHashMap<String, StringId>,
+    pass_name: &str,
+    ir_name: &str,
+) -> EventId {
+    let pass_name = pass_name_to_string_id(profiler, string_cache, pass_name);
+    let mut components = vec![StringComponent::Ref(pass_name)];
+    // handle that LazyCallGraph::SCC is a comma separated list within parentheses
+    let parentheses: &[_] = &['(', ')'];
+    let trimed = ir_name.trim_matches(parentheses);
+    for part in trimed.split(',') {
+        let ir_name = ir_name_to_string_id(profiler, string_cache, part);
+        components.push(StringComponent::Value(SEPARATOR_BYTE));
+        components.push(StringComponent::Ref(ir_name));
+    }
+    EventId::from_label(profiler.alloc_string(components.as_slice()))
+}
+
+pub struct LlvmSelfProfiler<'a> {

I think LlvmSelfProfiler and all of the related code should be moved to librustc_codegen_llvm, e.g. into a file librustc_codegen_llvm/back/profiling.rs.

andjo403

comment created time in a month

pull request comment rust-lang/rust

Don't ICE on path-collision in dep-graph

@pnkfelix Do you have a small reproduction for the issue somewhere?

pnkfelix

comment created time in a month

pull request comment rust-lang/rust

Don't ICE on path-collision in dep-graph

I'm not sure this is a valid fix. IIRC, it's an invariant of the system that we mark all HIR nodes with colors before the query system kicks in. A def-path being re-used should not really change that. I'll try to dig a little deeper.

pnkfelix

comment created time in a month

pull request comment rust-lang/rust

Also share drop-glue when compiling with -Zshare-generics (i.e. at opt-level=0)

r? @alexcrichton (https://github.com/rust-lang/rust/pull/68414/commits/d3ca81c96b758dd7ec89990f00b26282427117cd was added since you last looked at it)

michaelwoerister

comment created time in a month

pull request comment rust-lang/rust

Also share drop-glue when compiling with -Zshare-generics (i.e. at opt-level=0)

This is ready for an actual review.

michaelwoerister

comment created time in a month

push event michaelwoerister/rust

Stein Somers

commit sha 9e90840a6ae4a6f61781bd80adea825d156ddffa

Simplify NodeHeader by avoiding slices in BTreeMaps with shared roots

Dylan MacKenzie

commit sha 2fb4c4472e4563a52e2dc544e47a01f564b9219e

Improve graphviz visualization for new framework

Dylan MacKenzie

commit sha 07c51f605a021f1416a23e0cd76afa2156d5526c

Implement new dataflow framework and cursor

Dylan MacKenzie

commit sha 7d5885727d4a42518f811933de20676f8be61818

Remove old "generic" framework

Dylan MacKenzie

commit sha 355cfcdf433c47bfb2365752d33f2a24dfc6e78f

Use unified dataflow framework in `check_consts`

Dylan MacKenzie

commit sha 47dce1be81b17512db9e730fdd04adf01f7cf10f

Add test for `ResultsCursor` This is a unit test that ensures the `seek` functions work correctly.

Dylan MacKenzie

commit sha 2727f10b216fc6ac7d3a22f216f55e46e9c99506

Improve docs for new framework

Dylan MacKenzie

commit sha b70898d5eeb9f7fc994035eb90b609c49f53f745

Improve docs for `GenKill` and `GenKillSet`

Dylan MacKenzie

commit sha 1006ad036a447f23b2f9f88a300bf56d7bcf8b64

Fix test

Jorge Aparicio

commit sha 470cdf54ac9acee20ab8da46ef7899bae9f58f29

add bare metal ARM Cortex-A targets to rustc -> `rustc --target armv7-none-eabi` will work also build rust-std (rustup) components for them -> `rustup target add armv7-none-eabi` will work

Dylan MacKenzie

commit sha be730e16de1c2590d20ff76c9dfa9a7536fd418a

Use trailing underscore for helper methods

Vita Batrla

commit sha 34878d7b05813e090b370f48b8d437e4bd875094

Options IP_MULTICAST_TTL and IP_MULTICAST_LOOP are 1 byte on BSD and Solaris. See ip(4P) man page: IP_MULTICAST_TTL: Time to live for multicast datagrams. This option takes an unsigned character as an argument. Its value is the TTL that IP uses on outgoing multicast datagrams. The default is 1. IP_MULTICAST_LOOP: Loopback for multicast datagrams. Normally multicast datagrams are delivered to members on the sending host (or sending zone). Setting the unsigned character argument to 0 causes the opposite behavior, meaning that when multiple zones are present, the datagrams are delivered to all zones except the sending zone. https://docs.oracle.com/cd/E88353_01/html/E37851/ip-4p.html https://man.openbsd.org/ip.4

Vita Batrla

commit sha dda32e4e535fb3fb9e728b8c96386db7d231b247

refactor fix using cfg_if!

Vita Batrla

commit sha 239a7d9124ee486e9d0096429136d719437b83b2

refactor fix using cfg_if! (fix build)

Jethro Beekman

commit sha 766f6c5d0ad5c71f42ab3a305572bf1e7b5edafa

Actually pass target LLVM args to LLVM

Tobias Kortkamp

commit sha de388032555b697d1b0ef197241886ab90ac39b2

Add -Wl,-znotext to default linker flags to link with lld 9 on FreeBSD 13.0-CURRENT i386
rust-nightly has been failing to link since 2019-12-10 with variations of
```
= note: ld: error: relocation R_386_PC32 cannot be used against symbol __rust_probestack; recompile with -fPIC
>>> defined in /wrkdirs/usr/ports/lang/rust-nightly/work/rustc-nightly-src/build/i686-unknown-freebsd/stage1/lib/rustlib/i686-unknown-freebsd/lib/libcompiler_builtins-6570a75fe85f0e1a.rlib(compiler_builtins-6570a75fe85f0e1a.compiler_builtins.2i519eqi-cgu.15.rcgu.o)
>>> referenced by std.4xivr03c-cgu.14
>>> std-9bd70afd58e204b7.std.4xivr03c-cgu.14.rcgu.o:(_$LT$alloc..boxed..Box$LT$F$GT$$u20$as$u20$core..ops..function..FnOnce$LT$A$GT$$GT$::call_once::h1c78ed6e734a2bfc (.llvm.10122419023709863394)) in archive /wrkdirs/usr/ports/lang/rust-nightly/work/rustc-nightly-src/build/i686-unknown-freebsd/stage1/lib/rustlib/i686-unknown-freebsd/lib/libstd-9bd70afd58e204b7.rlib
ld: error: relocation R_386_PC32 cannot be used against symbol __rust_probestack; recompile with -fPIC
>>> defined in /wrkdirs/usr/ports/lang/rust-nightly/work/rustc-nightly-src/build/i686-unknown-freebsd/stage1/lib/rustlib/i686-unknown-freebsd/lib/libcompiler_builtins-6570a75fe85f0e1a.rlib(compiler_builtins-6570a75fe85f0e1a.compiler_builtins.2i519eqi-cgu.15.rcgu.o)
>>> referenced by std.4xivr03c-cgu.14
>>> std-9bd70afd58e204b7.std.4xivr03c-cgu.14.rcgu.o:(std::io::util::copy::h9115f048f2203467) in archive /wrkdirs/usr/ports/lang/rust-nightly/work/rustc-nightly-src/build/i686-unknown-freebsd/stage1/lib/rustlib/i686-unknown-freebsd/lib/libstd-9bd70afd58e204b7.rlib
clang-cpp: error: linker command failed with exit code 1 (use -v to see invocation)
error: aborting due to previous error
error: could not compile `rustc_macros`.
```
Full log: http://beefy17.nyi.freebsd.org/data/head-i386-default/p523508_s356869/logs/rust-nightly-1.42.0.20200118.log
AFAICT it stopped building after bumping compiler_builtins to 0.1.22 in https://github.com/rust-lang/rust/pull/67110.

Yuki Okushi

commit sha 2ecc48ffa17d55ec02f3beb5bb17c718cb439202

Fix ICE #68025

Yuki Okushi

commit sha 0017f495783324b036ffcaafedf7881725ba1e02

Replace `walk_callee` with `consume_expr`

Esteban Küber

commit sha 6ba08755dfd9ddbb55248a0263a4e81d3602b410

When encountering an undefined named lifetime, point to where it can be This doesn't mention that using an existing lifetime is possible, but that would hopefully be clear as always being an option. The intention of this is to teach newcomers what the lifetime syntax is.

Esteban Küber

commit sha 78d3ea5484c3ebcc49bddba39f5b5be5f99b8c65

When encountering an expected named lifetime and none are present, suggest adding one

push time in a month

create branch michaelwoerister/rust

branch : share-generics-export-blocker-poc

created branch time in a month
