Ankush Agarwal ankushagarwal San Francisco

delete branch ankushagarwal/libra

delete branch : refactor-metrics

delete time in 9 hours

PR opened libra/libra

[cluster-test] Fix metrics for multi-region-simulation

Summary

Consensus metrics were recently refactored: https://github.com/libra/libra/commit/4c33307bdebc91e97d72154b6f0d54e6952438e3

This PR updates the multi-region simulation to use the new metrics.

Test Plan

Tested these metrics on Grafana.

+4 -4

0 comments

1 changed file

pr created time in 13 hours

create branch ankushagarwal/libra

branch : refactor-metrics

created branch time in 13 hours

delete branch ankushagarwal/libra

delete branch : no-delay

delete time in 14 hours

push event ankushagarwal/libra

Philip Hayes

commit sha 0672d5af6e1fd301d9f1d234c0c5480ccfff226e

[network] Add Dial/Disconnect Peer request types to network interface See #1516 Closes: #1554 Approved by: bothra90

Philip Hayes

commit sha eeadab0a0cfe4fd845520ba7b929c0e9499fa315

[network] Wire-through Dial/Disconnect requests to PeerManager actor See #1516 Closes: #1554 Approved by: bothra90

Abhay Bothra

commit sha a46f375ff7c1de648d96b1fcc1108db2c6275619

[network] Test multiplexing of yamux substreams within yamux substreams Closes: #1563 Approved by: phlip9

Zekun Li

commit sha be9270387c2edeb3ed21abb9c87911a043005623

suppress dead code warning Closes: #1564 Approved by: bschwab

Young Yang Liauw

commit sha 194713b84f5ec9e34b85c1065f1062701fd963bc

[coverage][easy] update code coverage runner
Cargo build was updated recently. We are updating the runner to use `cargo xtest` to stay in sync.
Closes: #1553 Approved by: bmwill

Sherry Xiao

commit sha 008323c818c4d84ea0ffafef4020b428e72e5e04

[monitoring] Expose non-numeric metrics put git revision into environment variable Closes: #1532 Approved by: bmwill

Sherry Xiao

commit sha 12546244f1d9970a79eaf10bd2058632703e1689

Update common/metrics/build.rs Co-Authored-By: Brandon Williams <bwilliams.eng@gmail.com> Closes: #1532 Approved by: bmwill

Ankush Agarwal

commit sha ca6a906de79dd4963d345160070c51bb091c1b91

[cluster-test] Refactor execute_jobs function into a separate module
Summary
We want to use the execute_jobs function directly from various experiments and not just ClusterTestRunner
Test Plan
Compiles successfully
Closes: #1560 Approved by: andll

Brandon Williams

commit sha e02f72318a6170a78b37f570d28cec30f42ea499

[x] create an abstraction around invoking cargo commands Closes: #1568 Approved by: metajack

Brandon Williams

commit sha 53658705cc7f98131899da82d86b7c87f0faf2e6

[x] refactor common cargo logic for reuse Closes: #1568 Approved by: metajack

Brandon Williams

commit sha 75658222537bcf7e45ccd135335b8f791a862a00

[x] use --workspace instead of the deprecated --all Closes: #1568 Approved by: metajack

Brandon Williams

commit sha ca894eac9d1ef2e52d61cc555af90bd5e2a101ca

[x] run unit tests before system tests
Currently package exceptions have tests run on them first but since the testsuite is one of those exceptions we end up running all the end to end tests first. Debugging these e2e tests are a little more difficult than debugging normal unit tests so if there happen to be any errors it would be easier to first debug the unit tests before taking a look at the e2e tests.
To fix this, reorder the calls to `cargo test` to first run them on the packages without exceptions.
Closes: #1568 Approved by: metajack

Brandon Williams

commit sha 85df5c7963d32a98bc6341ba2f1aae61ead04e07

[x] refactor how common arguments are passed to CargoCommand methods Closes: #1568 Approved by: metajack

Brandon Williams

commit sha e818d9ca4fd652dee42cd41f02aa48c353dcdc3f

[x] add check command Closes: #1568 Approved by: metajack

Brandon Williams

commit sha cb06036d5a7b2e651e2ae74c760c189462d62c82

[x] add clippy commnad Closes: #1568 Approved by: metajack

Brandon Williams

commit sha acaabd5320286e0be54e3d3b9566cee874e1ab15

[ci] use cargo xclippy and remove clippy.sh Closes: #1568 Approved by: metajack

Zekun Li

commit sha 9fe70ee22d6f7acbbeecfdfae0c7d054032e846e

[consensus][reconfig] complete support of epoch change upon receiving ledger info Closes: #1562 Approved by: dmitri-perelman

Zekun Li

commit sha ed83cfe1183c9f4256a7aa0a416810291b62d599

[consensus][reconfig] add unit test for the flow Closes: #1562 Approved by: dmitri-perelman

Ankush Agarwal

commit sha f88feddaebd74d7c85442ed5b4b15a2107d309f5

[enhancement] Create an experiment to simulate multi-region
Summary
- Create a NetworkDelay action which adds network delay to a single instance using tc.
- Create a MultiRegionSimulation experiment which simulates a two region split among the instances
  - Creates a virtual region1 and region2
  - Adds delay to all packets which go from region1 to region2
  - We dont need to do the vice versa because region1 will be delaying all its responses to region2
Depends on PR #1560
Test Plan
Ran ~/libra/target/cluster_test_docker_builder/cluster-test --workplace=kush --multi-region-simulation --multi-region-split=1 --multi-region-exp-duration-secs=30 on my test cluster.
Closes: #1561 Approved by: ankushagarwal
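
The two-region split described in this commit can be sketched roughly as follows. This is only an illustration, not the cluster-test code: `split_regions` and `delayed_routes` are hypothetical names, instances are represented as plain address strings, and the assumption is that `--multi-region-split=N` puts the first N instances into region1 and the rest into region2. Only region1 to region2 routes are listed, since the commit notes that delaying one direction is enough.

```rust
/// Split the instance list into two virtual regions.
/// ASSUMPTION: the first `split` instances form region1, the rest region2.
pub fn split_regions(instances: &[String], split: usize) -> (Vec<String>, Vec<String>) {
    // Clamp so an oversized split puts everything into region1 instead of panicking.
    let split = split.min(instances.len());
    let (region1, region2) = instances.split_at(split);
    (region1.to_vec(), region2.to_vec())
}

/// List every (source, destination) pair whose packets should be delayed:
/// each region1 instance delays traffic it sends to each region2 instance.
pub fn delayed_routes(region1: &[String], region2: &[String]) -> Vec<(String, String)> {
    let mut routes = Vec::new();
    for src in region1 {
        for dst in region2 {
            routes.push((src.clone(), dst.clone()));
        }
    }
    routes
}
```

With four instances and a split of 1, region1 holds one instance and three routes are delayed.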

Andrey Chursin

commit sha 570a5509fac1604de4c11870639949c4acbdc24a

[cluster-test] Show changelog
This diff adds changelog with list of commits between previous tested image and current tested image.
Closes: #1570 Approved by: ankushagarwal

push time in 14 hours

pull request comment libra/libra

[consensus] push RPC processing into event processor to make network thread non-blocking

@zekun000: This is still happening with this change included:

I1112 15:27:50.877842 140339551663872 executor/src/block_processor.rs:298] Skipping the first 1000 transactions.
E1112 15:27:50.879766 140339551663872 state-synchronizer/src/coordinator.rs:130] [state sync] failed to process chunk response from e490398d3ce364a2cd4c668b3cf75c4017ee3c3570d76aba6112f07e888b1086: [state sync] non sequential chunk. Known version: 4813543, received: 4814544
I1112 15:27:50.970095 140339551663872 executor/src/block_processor.rs:285] Local version: 4814543. First transaction version in request: Some(4813544). Number of transactions in request: 1000.
I1112 15:27:50.975854 140339551663872 executor/src/block_processor.rs:298] Skipping the first 1000 transactions.
E1112 15:27:50.977730 140339551663872 state-synchronizer/src/coordinator.rs:130] [state sync] failed to process chunk response from de73eb3544ba2f06956151488df834a05472762705a4319a7540c098c64912b5: [state sync] non sequential chunk. Known version: 4813543, received: 4814544
zekun000

comment created time in 17 hours
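
The arithmetic behind the log lines in this comment can be sketched as below. This is a reading of the log, not the Libra implementation: `num_txns_to_skip` and `check_sequential` are hypothetical names, and the assumption is that a chunk counts as sequential only when its version is exactly the known version plus one.

```rust
/// Number of leading transactions in a request that are already applied locally.
/// ASSUMPTION: versions up to and including `local_version` are already applied.
pub fn num_txns_to_skip(local_version: u64, first_version: u64, num_txns: u64) -> u64 {
    let already_applied = (local_version + 1).saturating_sub(first_version);
    already_applied.min(num_txns)
}

/// Reject a chunk that does not directly follow the known version.
/// ASSUMPTION: "received" in the log is compared against known version + 1.
pub fn check_sequential(known_version: u64, received_version: u64) -> Result<(), String> {
    if received_version == known_version + 1 {
        Ok(())
    } else {
        Err(format!(
            "non sequential chunk. Known version: {}, received: {}",
            known_version, received_version
        ))
    }
}
```

Under these assumptions, a local version of 4814543 with a request starting at 4813544 skips all 1000 transactions, while a known version of 4813543 rejects 4814544, which matches the pattern in the log.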

pull request comment libra/libra

[network] Set TCP_NODELAY for tcp connections

@bors-libra r+

ankushagarwal

comment created time in a day

delete branch ankushagarwal/libra

delete branch : metrics

delete time in a day

push event ankushagarwal/libra

Ankush Agarwal

commit sha d1af3bdef912fd04343dcc302803c8ea9a1ec71f

[network] Set TCP_NODELAY for tcp connections

push time in a day

pull request comment libra/libra

[network] Set TCP_NODELAY for tcp connections

@bors-libra r+

ankushagarwal

comment created time in a day

Pull request review comment libra/libra

[network] Set TCP_NODELAY for tcp connections

 pub struct TcpTransport {
 }

 impl TcpTransport {
+    pub fn new() -> Self {
+        Self {
+            nodelay: Some(true),
+            ..Self::default()
+        }
+    }
+

Updated

ankushagarwal

comment created time in a day
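
As a rough illustration of the constructor in this diff, the sketch below shows how an optional `nodelay` setting could be applied to a socket with the standard library. The struct here is a hypothetical stand-in: only the `nodelay` field appears in the diff, while the `ttl` field and the `apply_config` method are assumptions added to make the example self-contained.

```rust
use std::io;
use std::net::TcpStream;

/// Hypothetical stand-in for the transport configuration.
/// Only `nodelay` comes from the diff; `ttl` is an assumed extra option.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct TcpTransport {
    pub nodelay: Option<bool>,
    pub ttl: Option<u32>,
}

impl TcpTransport {
    /// Enable TCP_NODELAY by default and leave every other option unset.
    pub fn new() -> Self {
        Self {
            nodelay: Some(true),
            ..Self::default()
        }
    }

    /// Apply only the options that were explicitly configured.
    pub fn apply_config(&self, stream: &TcpStream) -> io::Result<()> {
        if let Some(nodelay) = self.nodelay {
            // Disables Nagle's algorithm when `nodelay` is true.
            stream.set_nodelay(nodelay)?;
        }
        if let Some(ttl) = self.ttl {
            stream.set_ttl(ttl)?;
        }
        Ok(())
    }
}
```

Keeping each option as an `Option` lets `Default` mean "leave the OS setting alone", so `new()` only overrides the one value it cares about.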

push event ankushagarwal/libra

Ankush Agarwal

commit sha 05fac4310365015d900551cd90c76209d8556ef2

[cluster-test] Emit transactions when running network delay simulations
Summary
This PR updates the multi_region_network_simulation experiment such that transactions are generated in the background. Without that, we would be simply processing empty blocks in the experiment.
Test Plan
Ran against my cluster.
Closes: #1708 Approved by: andll

Ankush Agarwal

commit sha 47e6e49be6abf2cd529ca19880f5a13533b54c6a

[network] Set TCP_NODELAY for tcp connections

push time in a day

push event ankushagarwal/libra

David Wong

commit sha c30acc27d8c55857deea66f7e2ba0421321c0e20

[fuzzing] adding merkle tree proto fuzzing
MOTIVATION: Adding two fuzzers. When receiving sparse and non-sparse merkle tree proofs, there is some involved proto decoding code that we can fuzz.
Closes: #1232 Approved by: metajack

Aaron Gao

commit sha 980343cd19a302ba3ac7aa7404ae82dcf2a3a93e

[storage] fix typo Closes: #1683 Approved by: wqfish

Herman Venter

commit sha 9883c387865d3c615652ce508b1bff62e91e2e27

Make MIRAI happy Closes: #1678 Approved by: lightmark

aldenhu

commit sha 0f7f29a94d91860e1ace8a5df6f0415896ed9c27

[storage] EpochByVersionSchema Closes: #1635 Approved by: msmouse

aldenhu

commit sha b1cbb9a5b8b7abdf8a8db751b52a3b31e24008a8

[storage] `LedgerStore::get_epoch()` Closes: #1635 Approved by: msmouse

Clark Barrett

commit sha cdfb300e5e4611aee012ab0cf53c6ebaa1e4209b

Add initial debugging instrumentation to tree_heap Closes: #1684 Approved by: cbarrettfb

Abhay Bothra

commit sha 78569198e01455723a758645bfb8fb3649be7973

[network] Add benchmark for transport with TCP_NODELAY set Closes: #1653 Approved by: ankushagarwal

Bob Wilson

commit sha fbd6040d6c6e724171a4307820b8c13fec67396b

[language] Remove unused ParseError::UnrecognizedToken
The unused UnrecognizedToken error adds a type parameter for Token that propagates all over the place. Since this is not even used, remove it along with all those type parameters.
We should definitely add more detailed error messages (probably only in the new move-lang compiler, not in the IR compiler) but whatever we do should not require the internal token types, since the diagnostics should describe issues in terms that are directly visible to end users.
Closes: #1685 Approved by: tnowacki

Bob Wilson

commit sha 1f5c808edbf801082b18e5a834ee89e65fae3a38

[language] Clean up some remaining references to lalrpop for the IR compiler Closes: #1685 Approved by: tnowacki

Todd Nowacki

commit sha b185766be55859e975aa639e6aba26e30e53cddf

[language][Move] Added expansion tests
- Added tests for expansion pass.
- Tried to cover +/- cases for each check
Closes: #1660 Approved by: tzakian

Herman Venter

commit sha 85a11b5f169964681869f84186852f127a7ca6cb

Make MIRAI happy Closes: #1681 Approved by: huitseeker

Weiliang Li

commit sha de7b695be03229234aca9fe008831f60f8bfcb9c

make clippy happy

David Wong

commit sha 3e40b5b54f467f73e0f07ee48bfe2bea2464b585

[consensus] removing the `terminate` in chained_bft_smr loop
The consensus loop/select has a terminate that should not be reachable as the loop should run forever. This commit removes it.
Closes: #1675 Approved by: zekun000

Ankush Agarwal

commit sha 760fa412aab89c82c7979af0add2a601aa0b9305

[cluster-test] Update timeouts, print intermediate results Closes: #1689 Approved by: andll

Bob Wilson

commit sha 0aa5ca1489eae348550c07238f2ba2b3bfda45c3

[language] Refactor lexer to add a lookahead API
There are some cases where the parser really needs to look ahead at the next token before deciding how to parse the current token. It was hard to support that with the original lexer that I hacked together from the lalrpop output, but now that the lexer is sane, it is not so hard. Add a new lookahead API and use it in the parser, replacing the current workarounds.
Closes: #1688 Approved by: tnowacki

Todd Nowacki

commit sha e95ae06ea81898b16361b8da3ea28e3e35408432

[easy][move] Sort errors by first Loc when displaying
- Sort errors by the initial Loc, makes reading errors a lot easier in big lists
Closes: #1693 Approved by: vgao1996

Andrey Chursin

commit sha b7f77a0e854d6563f62fad50fc48543117f20b17

[cluster-test] Test if file exists before log rotate
When cluster test fails to setup cluster(for example, failed to cp genesis.blob), then there is no log file on host - previous was log rotate, and new one was not created. In this case attempt to log rotate it on next run produces a lot of noise.
This checks if file exists before attempting to log rotate it
Closes: #1692 Approved by: ankushagarwal

Runtian Zhou

commit sha 346a2700cc6dc6b9c6849d0bfca4934de71b5fc7

[vm] Clean up the code cache api. Closes: #1668 Approved by: dariorussi

Andrey Chursin

commit sha 14e6806915ad5b2e257e5e49f7b802b0311fe928

[cluster-test] Bump liveness health check timeout 1m->2m
We have intermittent failures with 1m timeout, this diff will double it
Closes: #1697 Approved by: ankushagarwal

Young Yang Liauw

commit sha 3743988d16364960329af1342297432ca5355e5e

[CI] upgrade stretch to buster
In CircleCI setup, bump rust:stretch to rust:buster for builders. This is to keep it consistent with Docker build.
Closes: #1691 Approved by: bmwill

push time in a day

Pull request review comment libra/libra

[consensus] Update all consensus counters to new format

 pub static ref EPOCH_CHANGE_DEQUEUED_MSGS: IntCounter = OP_COUNTERS.counter("epo

 /// Count of the block proposals sent by this validator since last restart
 /// (both primary and secondary)
-pub static ref PROPOSALS_COUNT: IntCounter = OP_COUNTERS.counter("proposals_count");
+pub static ref PROPOSALS_COUNT: IntCounter = register_int_counter!("libra_consensus_proposals_count", "Count of the block proposals sent by this validator since last restart (both primary and secondary)").unwrap();

 /// Count the number of times a validator voted for secondary proposals (upon timeout) since
 /// last restart.
-pub static ref VOTE_SECONDARY_PROPOSAL_COUNT: IntCounter = OP_COUNTERS.counter("vote_secondary_proposal_count");
+pub static ref VOTE_SECONDARY_PROPOSAL_COUNT: IntCounter = register_int_counter!("libra_consensus_vote_secondary_proposal_count", "Count the number of times a validator voted for secondary proposals (upon timeout) since last restart.").unwrap();

 /// Count the number of times a validator voted for a nil block since last restart.
-pub static ref VOTE_NIL_COUNT: IntCounter = OP_COUNTERS.counter("vote_nil_count");
+pub static ref VOTE_NIL_COUNT: IntCounter = register_int_counter!("libra_consensus_vote_nil_count", "Count the number of times a validator voted for a nil block since last restart.").unwrap();

 //////////////////////
 // PACEMAKER COUNTERS
 //////////////////////
 /// Count of the rounds that gathered QC since last restart.
-pub static ref QC_ROUNDS_COUNT: IntCounter = OP_COUNTERS.counter("qc_rounds_count");
+pub static ref QC_ROUNDS_COUNT: IntCounter = register_int_counter!("libra_consensus_qc_rounds_count", "Count of the rounds that gathered QC since last restart.").unwrap();

 /// Count of the timeout rounds since last restart (close to 0 in happy path).
-pub static ref TIMEOUT_ROUNDS_COUNT: IntCounter = OP_COUNTERS.counter("timeout_rounds_count");
+pub static ref TIMEOUT_ROUNDS_COUNT: IntCounter = register_int_counter!("libra_consensus_timeout_rounds_count", "Count of the timeout rounds since last restart (close to 0 in happy path).").unwrap();

 /// Count the number of timeouts a node experienced since last restart (close to 0 in happy path).
 /// This count is different from `TIMEOUT_ROUNDS_COUNT`, because not every time a node has
 /// a timeout there is an ultimate decision to move to the next round (it might take multiple
 /// timeouts to get the timeout certificate).
-pub static ref TIMEOUT_COUNT: IntCounter = OP_COUNTERS.counter("timeout_count");
+pub static ref TIMEOUT_COUNT: IntCounter = register_int_counter!("libra_consensus_timeout_count", "Count the number of timeouts a node experienced since last restart (close to 0 in happy path).").unwrap();

 /// The timeout of the current round.
-pub static ref ROUND_TIMEOUT_MS: IntGauge = OP_COUNTERS.gauge("round_timeout_ms");
+pub static ref ROUND_TIMEOUT_MS: IntGauge = register_int_gauge!("libra_consensus_round_timeout_s", "The timeout of the current round.").unwrap();

 ////////////////////////
 // SYNCMANAGER COUNTERS
 ////////////////////////
 /// Count the number of times we invoked state synchronization since last restart.
-pub static ref STATE_SYNC_COUNT: IntCounter = OP_COUNTERS.counter("state_sync_count");
+pub static ref STATE_SYNC_COUNT: IntCounter = register_int_counter!("libra_consensus_state_sync_count", "Count the number of times we invoked state synchronization since last restart.").unwrap();

 /// Count the number of block retrieval requests issued since last restart.
-pub static ref BLOCK_RETRIEVAL_COUNT: IntCounter = OP_COUNTERS.counter("block_retrieval_count");
+pub static ref BLOCK_RETRIEVAL_COUNT: IntCounter = register_int_counter!("libra_consensus_block_retrieval_count", "Count the number of block retrieval requests issued since last restart.").unwrap();

 /// Histogram of block retrieval duration.
-pub static ref BLOCK_RETRIEVAL_DURATION_S: DurationHistogram = OP_COUNTERS.duration_histogram("block_retrieval_duration_s");
+pub static ref BLOCK_RETRIEVAL_DURATION_S: DurationHistogram = DurationHistogram::new(register_histogram!("libra_consensus_block_retrieval_duration_s", "Histogram of block retrieval duration.").unwrap());

 /// Histogram of state sync duration.
-pub static ref STATE_SYNC_DURATION_S: DurationHistogram = OP_COUNTERS.duration_histogram("state_sync_duration_s");
+pub static ref STATE_SYNC_DURATION_S: DurationHistogram =DurationHistogram::new(register_histogram!("libra_consensus_state_sync_duration_s", "Histogram of state sync duration.").unwrap());

 /// Counts the number of times the sync info message has been set since last restart.
-pub static ref SYNC_INFO_MSGS_SENT_COUNT: IntCounter = OP_COUNTERS.counter("sync_info_msg_sent_count");
+pub static ref SYNC_INFO_MSGS_SENT_COUNT: IntCounter = register_int_counter!("libra_consensus_sync_info_msg_sent_count", "Counts the number of times the sync info message has been set since last restart.").unwrap();

 /// Counts the number of times the sync info message has been received since last restart.
-pub static ref SYNC_INFO_MSGS_RECEIVED_COUNT: IntCounter = OP_COUNTERS.counter("sync_info_msg_received_count");
+pub static ref SYNC_INFO_MSGS_RECEIVED_COUNT: IntCounter = register_int_counter!("libra_consensus_sync_info_msg_received_count", "Counts the number of times the sync info message has been received since last restart.").unwrap();

 //////////////////////
 // RECONFIGURATION COUNTERS
 //////////////////////
 /// Current epoch num
-pub static ref EPOCH: IntGauge = OP_COUNTERS.gauge("epoch");
+pub static ref EPOCH: IntGauge = register_int_gauge!("libra_consensus_epoch", "Current epoch num").unwrap();
 /// The number of validators in the current epoch
-pub static ref CURRENT_EPOCH_VALIDATORS: IntGauge = OP_COUNTERS.gauge("current_epoch_validators");
+pub static ref CURRENT_EPOCH_VALIDATORS: IntGauge = register_int_gauge!("libra_consensus_current_epoch_validators", "The number of validators in the current epoch").unwrap();
 /// Quorum size in the current epoch
-pub static ref CURRENT_EPOCH_QUORUM_SIZE: IntGauge = OP_COUNTERS.gauge("current_epoch_quorum_size");
+pub static ref CURRENT_EPOCH_QUORUM_SIZE: IntGauge = register_int_gauge!("libra_consensus_current_epoch_quorum_size", "Quorum size in the current epoch").unwrap();


 //////////////////////
 // BLOCK STORE COUNTERS
 //////////////////////
 /// Counter for the number of blocks in the block tree (including the root).
 /// In a "happy path" with no collisions and timeouts, should be equal to 3 or 4.
-pub static ref NUM_BLOCKS_IN_TREE: IntGauge = OP_COUNTERS.gauge("num_blocks_in_tree");
+pub static ref NUM_BLOCKS_IN_TREE: IntGauge = register_int_gauge!("libra_consensus_num_blocks_in_tree", "Counter for the number of blocks in the block tree (including the root).").unwrap();

 //////////////////////
 // PERFORMANCE COUNTERS
 //////////////////////
-/// Histogram of execution time (ms) of non-empty blocks.
-pub static ref BLOCK_EXECUTION_DURATION_S: DurationHistogram = OP_COUNTERS.duration_histogram("block_execution_duration_s");
+/// Histogram of execution time of non-empty blocks.
+pub static ref BLOCK_EXECUTION_DURATION_S: DurationHistogram = DurationHistogram::new(register_histogram!("libra_consensus_block_execution_duration_s", "Histogram of execution time of non-empty blocks.").unwrap());

 /// Histogram of duration of a commit procedure (the time it takes for the execution / storage to
 /// commit a block once we decide to do so).
-pub static ref BLOCK_COMMIT_DURATION_S: DurationHistogram = OP_COUNTERS.duration_histogram("block_commit_duration_s");
+pub static ref BLOCK_COMMIT_DURATION_S: DurationHistogram = DurationHistogram::new(register_histogram!("libra_consensus_block_commit_duration_s", "Histogram of duration of a commit procedure (the time it takes for the execution / storage to commit a block once we decide to do so).").unwrap());

 /// Histogram for the number of txns per (committed) blocks.
-pub static ref NUM_TXNS_PER_BLOCK: Histogram = OP_COUNTERS.histogram("num_txns_per_block");
+pub static ref NUM_TXNS_PER_BLOCK: Histogram = register_histogram!("libra_consensus_num_txns_per_block", "Histogram for the number of txns per (committed) blocks.").unwrap();

-/// Histogram of per-transaction execution time (ms) of non-empty blocks
+/// Histogram of per-transaction execution time of non-empty blocks
 /// (calculated as the overall execution time of a block divided by the number of transactions).
-pub static ref TXN_EXECUTION_DURATION_S: DurationHistogram = OP_COUNTERS.duration_histogram("txn_execution_duration_s");
+pub static ref TXN_EXECUTION_DURATION_S: DurationHistogram = DurationHistogram::new(register_histogram!("libra_consensus_txn_execution_duration_s", "Histogram of per-transaction execution time of non-empty blocks (calculated as the overall execution time of a block divided by the number of transactions).").unwrap());

-/// Histogram of execution time (ms) of empty blocks.
-pub static ref EMPTY_BLOCK_EXECUTION_DURATION_S: DurationHistogram = OP_COUNTERS.duration_histogram("empty_block_execution_duration_s");
+/// Histogram of execution time of empty blocks.
+pub static ref EMPTY_BLOCK_EXECUTION_DURATION_S: DurationHistogram = DurationHistogram::new(register_histogram!("libra_consensus_empty_block_execution_duration_s", "Histogram of execution time of empty blocks.").unwrap());

 /// Histogram of the time it takes for a block to get committed.
 /// Measured as the commit time minus block's timestamp.
-pub static ref CREATION_TO_COMMIT_S: DurationHistogram = OP_COUNTERS.duration_histogram("creation_to_commit_s");
+pub static ref CREATION_TO_COMMIT_S: DurationHistogram =DurationHistogram::new(register_histogram!("libra_consensus_creation_to_commit_s", "Histogram of the time it takes for a block to get committed. Measured as the commit time minus block's timestamp.").unwrap());

 /// Duration between block generation time until the moment it gathers full QC
-pub static ref CREATION_TO_QC_S: DurationHistogram = OP_COUNTERS.duration_histogram("creation_to_qc_s");
+pub static ref CREATION_TO_QC_S: DurationHistogram = DurationHistogram::new(register_histogram!("libra_consensus_creation_to_qc_s", "Duration between block generation time until the moment it gathers full QC").unwrap());

 /// Duration between block generation time until the moment it is received and ready for execution.
-pub static ref CREATION_TO_RECEIVAL_S: DurationHistogram = OP_COUNTERS.duration_histogram("creation_to_receival_s");
+pub static ref CREATION_TO_RECEIVAL_S: DurationHistogram = DurationHistogram::new(register_histogram!("libra_consensus_creation_to_receival_s", "Duration between block generation time until the moment it is received and ready for execution.").unwrap());

 ////////////////////////////////////
 // PROPSOSAL/VOTE TIMESTAMP COUNTERS
 ////////////////////////////////////
 /// Count of the proposals that passed the timestamp rules and did not have to wait
-pub static ref PROPOSAL_NO_WAIT_REQUIRED_COUNT: IntCounter = OP_COUNTERS.counter("proposal_no_wait_required_count");
+pub static ref PROPOSAL_NO_WAIT_REQUIRED_COUNT: IntCounter = register_int_counter!("libra_consensus_proposal_no_wait_required_count", "Count of the proposals that passed the timestamp rules and did not have to wait").unwrap();

 /// Count of the proposals where passing the timestamp rules required waiting
-pub static ref PROPOSAL_WAIT_WAS_REQUIRED_COUNT: IntCounter = OP_COUNTERS.counter("proposal_wait_was_required_count");
+pub static ref PROPOSAL_WAIT_WAS_REQUIRED_COUNT: IntCounter = register_int_counter!("libra_consensus_proposal_wait_was_required_count", "Count of the proposals where passing the timestamp rules required waiting").unwrap();

 /// Count of the proposals that were not made due to the waiting period exceeding the maximum allowed duration, breaking timestamp rules
-pub static ref PROPOSAL_MAX_WAIT_EXCEEDED_COUNT: IntCounter = OP_COUNTERS.counter("proposal_max_wait_exceeded_count");
+pub static ref PROPOSAL_MAX_WAIT_EXCEEDED_COUNT: IntCounter = register_int_counter!("libra_consensus_proposal_max_wait_exceeded_count", "Count of the proposals that were not made due to the waiting period exceeding the maximum allowed duration, breaking timestamp rules").unwrap();

 /// Count of the proposals that were not made due to waiting to ensure the current time exceeds min_duration_since_epoch failed, breaking timestamp rules
-pub static ref PROPOSAL_WAIT_FAILED_COUNT: IntCounter = OP_COUNTERS.counter("proposal_wait_failed_count");
+pub static ref PROPOSAL_WAIT_FAILED_COUNT: IntCounter = register_int_counter!("libra_consensus_proposal_wait_failed_count", "Count of the proposals that were not made due to waiting to ensure the current time exceeds min_duration_since_epoch failed, breaking timestamp rules").unwrap();

 /// Histogram of time waited for successfully proposing a proposal (both those that waited and didn't wait) after following timestamp rules
-pub static ref PROPOSAL_SUCCESS_WAIT_S: DurationHistogram = OP_COUNTERS.duration_histogram("proposal_success_wait_s");
+pub static ref PROPOSAL_SUCCESS_WAIT_S: DurationHistogram = DurationHistogram::new(register_histogram!("libra_consensus_proposal_success_wait_s", "Histogram of time waited for successfully proposing a proposal (both those that waited and didn't wait) after following timestamp rules").unwrap());

Skipping this for now, as DurationHistogram is a wrapper over Histogram and it is not as trivial to refactor this metric.

ankushagarwal

comment created time in a day
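
To illustrate why a wrapper type complicates the refactor mentioned in this comment, here is a minimal sketch of a `DurationHistogram` layered over a plain histogram. It is not the Libra or Prometheus implementation: the `Histogram` below is a hypothetical in-memory stand-in, and recording durations as fractional seconds is an assumption based on the `_s` suffix of the metric names.

```rust
use std::time::Duration;

/// Hypothetical in-memory stand-in for a metrics histogram.
#[derive(Debug, Default)]
pub struct Histogram {
    samples: Vec<f64>,
}

impl Histogram {
    pub fn observe(&mut self, value: f64) {
        self.samples.push(value);
    }

    pub fn count(&self) -> usize {
        self.samples.len()
    }

    pub fn sum(&self) -> f64 {
        self.samples.iter().sum()
    }
}

/// Wrapper that accepts `Duration` values and forwards them as seconds.
/// ASSUMPTION: seconds is the unit, inferred from the `_s` metric suffix.
pub struct DurationHistogram {
    histogram: Histogram,
}

impl DurationHistogram {
    pub fn new(histogram: Histogram) -> Self {
        Self { histogram }
    }

    pub fn observe_duration(&mut self, duration: Duration) {
        self.histogram.observe(duration.as_secs_f64());
    }

    pub fn inner(&self) -> &Histogram {
        &self.histogram
    }
}
```

Because callers go through `observe_duration`, swapping the underlying histogram type means touching the wrapper as well as every registration site.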

Pull request review comment libra/libra

[consensus] Update all consensus counters to new format

 pub static ref EPOCH_CHANGE_DEQUEUED_MSGS: IntCounter = OP_COUNTERS.counter("epo  /// Count of the block proposals sent by this validator since last restart /// (both primary and secondary)-pub static ref PROPOSALS_COUNT: IntCounter = OP_COUNTERS.counter("proposals_count");+pub static ref PROPOSALS_COUNT: IntCounter = register_int_counter!("libra_consensus_proposals_count", "Count of the block proposals sent by this validator since last restart (both primary and secondary)").unwrap();  /// Count the number of times a validator voted for secondary proposals (upon timeout) since /// last restart.-pub static ref VOTE_SECONDARY_PROPOSAL_COUNT: IntCounter = OP_COUNTERS.counter("vote_secondary_proposal_count");+pub static ref VOTE_SECONDARY_PROPOSAL_COUNT: IntCounter = register_int_counter!("libra_consensus_vote_secondary_proposal_count", "Count the number of times a validator voted for secondary proposals (upon timeout) since last restart.").unwrap();  /// Count the number of times a validator voted for a nil block since last restart.-pub static ref VOTE_NIL_COUNT: IntCounter = OP_COUNTERS.counter("vote_nil_count");+pub static ref VOTE_NIL_COUNT: IntCounter = register_int_counter!("libra_consensus_vote_nil_count", "Count the number of times a validator voted for a nil block since last restart.").unwrap();  ////////////////////// // PACEMAKER COUNTERS ////////////////////// /// Count of the rounds that gathered QC since last restart.-pub static ref QC_ROUNDS_COUNT: IntCounter = OP_COUNTERS.counter("qc_rounds_count");+pub static ref QC_ROUNDS_COUNT: IntCounter = register_int_counter!("libra_consensus_qc_rounds_count", "Count of the rounds that gathered QC since last restart.").unwrap();  /// Count of the timeout rounds since last restart (close to 0 in happy path).-pub static ref TIMEOUT_ROUNDS_COUNT: IntCounter = OP_COUNTERS.counter("timeout_rounds_count");+pub static ref TIMEOUT_ROUNDS_COUNT: IntCounter = register_int_counter!("libra_consensus_timeout_rounds_count", 
"Count of the timeout rounds since last restart (close to 0 in happy path).").unwrap();  /// Count the number of timeouts a node experienced since last restart (close to 0 in happy path). /// This count is different from `TIMEOUT_ROUNDS_COUNT`, because not every time a node has /// a timeout there is an ultimate decision to move to the next round (it might take multiple /// timeouts to get the timeout certificate).-pub static ref TIMEOUT_COUNT: IntCounter = OP_COUNTERS.counter("timeout_count");+pub static ref TIMEOUT_COUNT: IntCounter = register_int_counter!("libra_consensus_timeout_count", "Count the number of timeouts a node experienced since last restart (close to 0 in happy path).").unwrap();  /// The timeout of the current round.-pub static ref ROUND_TIMEOUT_MS: IntGauge = OP_COUNTERS.gauge("round_timeout_ms");+pub static ref ROUND_TIMEOUT_MS: IntGauge = register_int_gauge!("libra_consensus_round_timeout_s", "The timeout of the current round.").unwrap();  //////////////////////// // SYNCMANAGER COUNTERS //////////////////////// /// Count the number of times we invoked state synchronization since last restart.-pub static ref STATE_SYNC_COUNT: IntCounter = OP_COUNTERS.counter("state_sync_count");+pub static ref STATE_SYNC_COUNT: IntCounter = register_int_counter!("libra_consensus_state_sync_count", "Count the number of times we invoked state synchronization since last restart.").unwrap();  /// Count the number of block retrieval requests issued since last restart.-pub static ref BLOCK_RETRIEVAL_COUNT: IntCounter = OP_COUNTERS.counter("block_retrieval_count");+pub static ref BLOCK_RETRIEVAL_COUNT: IntCounter = register_int_counter!("libra_consensus_block_retrieval_count", "Count the number of block retrieval requests issued since last restart.").unwrap();  /// Histogram of block retrieval duration.-pub static ref BLOCK_RETRIEVAL_DURATION_S: DurationHistogram = OP_COUNTERS.duration_histogram("block_retrieval_duration_s");+pub static ref BLOCK_RETRIEVAL_DURATION_S: 
DurationHistogram = DurationHistogram::new(register_histogram!("libra_consensus_block_retrieval_duration_s", "Histogram of block retrieval duration.").unwrap());
 /// Histogram of state sync duration.
-pub static ref STATE_SYNC_DURATION_S: DurationHistogram = OP_COUNTERS.duration_histogram("state_sync_duration_s");
+pub static ref STATE_SYNC_DURATION_S: DurationHistogram =DurationHistogram::new(register_histogram!("libra_consensus_state_sync_duration_s", "Histogram of state sync duration.").unwrap());
 /// Counts the number of times the sync info message has been set since last restart.
-pub static ref SYNC_INFO_MSGS_SENT_COUNT: IntCounter = OP_COUNTERS.counter("sync_info_msg_sent_count");
+pub static ref SYNC_INFO_MSGS_SENT_COUNT: IntCounter = register_int_counter!("libra_consensus_sync_info_msg_sent_count", "Counts the number of times the sync info message has been set since last restart.").unwrap();
 /// Counts the number of times the sync info message has been received since last restart.
-pub static ref SYNC_INFO_MSGS_RECEIVED_COUNT: IntCounter = OP_COUNTERS.counter("sync_info_msg_received_count");
+pub static ref SYNC_INFO_MSGS_RECEIVED_COUNT: IntCounter = register_int_counter!("libra_consensus_sync_info_msg_received_count", "Counts the number of times the sync info message has been received since last restart.").unwrap();
 //////////////////////
 // RECONFIGURATION COUNTERS
 //////////////////////
 /// Current epoch num
-pub static ref EPOCH: IntGauge = OP_COUNTERS.gauge("epoch");
+pub static ref EPOCH: IntGauge = register_int_gauge!("libra_consensus_epoch", "Current epoch num").unwrap();
 /// The number of validators in the current epoch
-pub static ref CURRENT_EPOCH_VALIDATORS: IntGauge = OP_COUNTERS.gauge("current_epoch_validators");
+pub static ref CURRENT_EPOCH_VALIDATORS: IntGauge = register_int_gauge!("libra_consensus_current_epoch_validators", "The number of validators in the current epoch").unwrap();
 /// Quorum size in the current epoch
-pub static ref CURRENT_EPOCH_QUORUM_SIZE: IntGauge = OP_COUNTERS.gauge("current_epoch_quorum_size");
+pub static ref CURRENT_EPOCH_QUORUM_SIZE: IntGauge = register_int_gauge!("libra_consensus_current_epoch_quorum_size", "Quorum size in the current epoch").unwrap();
 //////////////////////
 // BLOCK STORE COUNTERS
 //////////////////////
 /// Counter for the number of blocks in the block tree (including the root).
 /// In a "happy path" with no collisions and timeouts, should be equal to 3 or 4.
-pub static ref NUM_BLOCKS_IN_TREE: IntGauge = OP_COUNTERS.gauge("num_blocks_in_tree");
+pub static ref NUM_BLOCKS_IN_TREE: IntGauge = register_int_gauge!("libra_consensus_num_blocks_in_tree", "Counter for the number of blocks in the block tree (including the root).").unwrap();
 //////////////////////
 // PERFORMANCE COUNTERS
 //////////////////////
-/// Histogram of execution time (ms) of non-empty blocks.
-pub static ref BLOCK_EXECUTION_DURATION_S: DurationHistogram = OP_COUNTERS.duration_histogram("block_execution_duration_s");
+/// Histogram of execution time of non-empty blocks.
+pub static ref BLOCK_EXECUTION_DURATION_S: DurationHistogram = DurationHistogram::new(register_histogram!("libra_consensus_block_execution_duration_s", "Histogram of execution time of non-empty blocks.").unwrap());
 /// Histogram of duration of a commit procedure (the time it takes for the execution / storage to
 /// commit a block once we decide to do so).
-pub static ref BLOCK_COMMIT_DURATION_S: DurationHistogram = OP_COUNTERS.duration_histogram("block_commit_duration_s");
+pub static ref BLOCK_COMMIT_DURATION_S: DurationHistogram = DurationHistogram::new(register_histogram!("libra_consensus_block_commit_duration_s", "Histogram of duration of a commit procedure (the time it takes for the execution / storage to commit a block once we decide to do so).").unwrap());
 /// Histogram for the number of txns per (committed) blocks.
-pub static ref NUM_TXNS_PER_BLOCK: Histogram = OP_COUNTERS.histogram("num_txns_per_block");
+pub static ref NUM_TXNS_PER_BLOCK: Histogram = register_histogram!("libra_consensus_num_txns_per_block", "Histogram for the number of txns per (committed) blocks.").unwrap();
-/// Histogram of per-transaction execution time (ms) of non-empty blocks
+/// Histogram of per-transaction execution time of non-empty blocks
 /// (calculated as the overall execution time of a block divided by the number of transactions).
-pub static ref TXN_EXECUTION_DURATION_S: DurationHistogram = OP_COUNTERS.duration_histogram("txn_execution_duration_s");
+pub static ref TXN_EXECUTION_DURATION_S: DurationHistogram = DurationHistogram::new(register_histogram!("libra_consensus_txn_execution_duration_s", "Histogram of per-transaction execution time of non-empty blocks (calculated as the overall execution time of a block divided by the number of transactions).").unwrap());
-/// Histogram of execution time (ms) of empty blocks.
-pub static ref EMPTY_BLOCK_EXECUTION_DURATION_S: DurationHistogram = OP_COUNTERS.duration_histogram("empty_block_execution_duration_s");
+/// Histogram of execution time of empty blocks.
+pub static ref EMPTY_BLOCK_EXECUTION_DURATION_S: DurationHistogram = DurationHistogram::new(register_histogram!("libra_consensus_empty_block_execution_duration_s", "Histogram of execution time of empty blocks.").unwrap());
 /// Histogram of the time it takes for a block to get committed.
 /// Measured as the commit time minus block's timestamp.
-pub static ref CREATION_TO_COMMIT_S: DurationHistogram = OP_COUNTERS.duration_histogram("creation_to_commit_s");
+pub static ref CREATION_TO_COMMIT_S: DurationHistogram =DurationHistogram::new(register_histogram!("libra_consensus_creation_to_commit_s", "Histogram of the time it takes for a block to get committed. Measured as the commit time minus block's timestamp.").unwrap());
 /// Duration between block generation time until the moment it gathers full QC
-pub static ref CREATION_TO_QC_S: DurationHistogram = OP_COUNTERS.duration_histogram("creation_to_qc_s");
+pub static ref CREATION_TO_QC_S: DurationHistogram = DurationHistogram::new(register_histogram!("libra_consensus_creation_to_qc_s", "Duration between block generation time until the moment it gathers full QC").unwrap());
 /// Duration between block generation time until the moment it is received and ready for execution.
-pub static ref CREATION_TO_RECEIVAL_S: DurationHistogram = OP_COUNTERS.duration_histogram("creation_to_receival_s");
+pub static ref CREATION_TO_RECEIVAL_S: DurationHistogram = DurationHistogram::new(register_histogram!("libra_consensus_creation_to_receival_s", "Duration between block generation time until the moment it is received and ready for execution.").unwrap());
 ////////////////////////////////////
 // PROPSOSAL/VOTE TIMESTAMP COUNTERS
 ////////////////////////////////////
 /// Count of the proposals that passed the timestamp rules and did not have to wait
-pub static ref PROPOSAL_NO_WAIT_REQUIRED_COUNT: IntCounter = OP_COUNTERS.counter("proposal_no_wait_required_count");
+pub static ref PROPOSAL_NO_WAIT_REQUIRED_COUNT: IntCounter = register_int_counter!("libra_consensus_proposal_no_wait_required_count", "Count of the proposals that passed the timestamp rules and did not have to wait").unwrap();
 /// Count of the proposals where passing the timestamp rules required waiting
-pub static ref PROPOSAL_WAIT_WAS_REQUIRED_COUNT: IntCounter = OP_COUNTERS.counter("proposal_wait_was_required_count");
+pub static ref PROPOSAL_WAIT_WAS_REQUIRED_COUNT: IntCounter = register_int_counter!("libra_consensus_proposal_wait_was_required_count", "Count of the proposals where passing the timestamp rules required waiting").unwrap();
 /// Count of the proposals that were not made due to the waiting period exceeding the maximum allowed duration, breaking timestamp rules
-pub static ref PROPOSAL_MAX_WAIT_EXCEEDED_COUNT: IntCounter = OP_COUNTERS.counter("proposal_max_wait_exceeded_count");
+pub static ref PROPOSAL_MAX_WAIT_EXCEEDED_COUNT: IntCounter = register_int_counter!("libra_consensus_proposal_max_wait_exceeded_count", "Count of the proposals that were not made due to the waiting period exceeding the maximum allowed duration, breaking timestamp rules").unwrap();
 /// Count of the proposals that were not made due to waiting to ensure the current time exceeds min_duration_since_epoch failed, breaking timestamp rules
-pub static ref PROPOSAL_WAIT_FAILED_COUNT: IntCounter = OP_COUNTERS.counter("proposal_wait_failed_count");
+pub static ref PROPOSAL_WAIT_FAILED_COUNT: IntCounter = register_int_counter!("libra_consensus_proposal_wait_failed_count", "Count of the proposals that were not made due to waiting to ensure the current time exceeds min_duration_since_epoch failed, breaking timestamp rules").unwrap();
 /// Histogram of time waited for successfully proposing a proposal (both those that waited and didn't wait) after following timestamp rules
-pub static ref PROPOSAL_SUCCESS_WAIT_S: DurationHistogram = OP_COUNTERS.duration_histogram("proposal_success_wait_s");
+pub static ref PROPOSAL_SUCCESS_WAIT_S: DurationHistogram = DurationHistogram::new(register_histogram!("libra_consensus_proposal_success_wait_s", "Histogram of time waited for successfully proposing a proposal (both those that waited and didn't wait) after following timestamp rules").unwrap());
 /// Histogram of time waited for failing to propose a proposal (both those that waited and didn't wait) while trying to follow timestamp rules
-pub static ref PROPOSAL_FAILURE_WAIT_S: DurationHistogram = OP_COUNTERS.duration_histogram("proposal_failure_wait_s");
+pub static ref PROPOSAL_FAILURE_WAIT_S: DurationHistogram =DurationHistogram::new(register_histogram!("libra_consensus_proposal_failure_wait_s", "Histogram of time waited for failing to propose a proposal (both those that waited and didn't wait) while trying to follow timestamp rules").unwrap());
 /// Count of the votes that passed the timestamp rules and did not have to wait
-pub static ref VOTE_NO_WAIT_REQUIRED_COUNT: IntCounter = OP_COUNTERS.counter("vote_no_wait_required_count");
+pub static ref VOTE_NO_WAIT_REQUIRED_COUNT: IntCounter = register_int_counter!("libra_consensus_vote_no_wait_required_count", "Count of the votes that passed the timestamp rules and did not have to wait").unwrap();
 /// Count of the votes where passing the timestamp rules required waiting
-pub static ref VOTE_WAIT_WAS_REQUIRED_COUNT: IntCounter = OP_COUNTERS.counter("vote_wait_was_required_count");
+pub static ref VOTE_WAIT_WAS_REQUIRED_COUNT: IntCounter = register_int_counter!("libra_consensus_vote_wait_was_required_count", "Count of the votes where passing the timestamp rules required waiting").unwrap();

Done

ankushagarwal

comment created time in a day

Pull request review comment libra/libra

[consensus] Update all consensus counters to new format

 pub static ref EPOCH_CHANGE_DEQUEUED_MSGS: IntCounter = OP_COUNTERS.counter("epo
 /// Count of the block proposals sent by this validator since last restart
 /// (both primary and secondary)
-pub static ref PROPOSALS_COUNT: IntCounter = OP_COUNTERS.counter("proposals_count");
+pub static ref PROPOSALS_COUNT: IntCounter = register_int_counter!("libra_consensus_proposals_count", "Count of the block proposals sent by this validator since last restart (both primary and secondary)").unwrap();
 /// Count the number of times a validator voted for secondary proposals (upon timeout) since
 /// last restart.
-pub static ref VOTE_SECONDARY_PROPOSAL_COUNT: IntCounter = OP_COUNTERS.counter("vote_secondary_proposal_count");
+pub static ref VOTE_SECONDARY_PROPOSAL_COUNT: IntCounter = register_int_counter!("libra_consensus_vote_secondary_proposal_count", "Count the number of times a validator voted for secondary proposals (upon timeout) since last restart.").unwrap();
 /// Count the number of times a validator voted for a nil block since last restart.
-pub static ref VOTE_NIL_COUNT: IntCounter = OP_COUNTERS.counter("vote_nil_count");
+pub static ref VOTE_NIL_COUNT: IntCounter = register_int_counter!("libra_consensus_vote_nil_count", "Count the number of times a validator voted for a nil block since last restart.").unwrap();
 //////////////////////
 // PACEMAKER COUNTERS
 //////////////////////
 /// Count of the rounds that gathered QC since last restart.
-pub static ref QC_ROUNDS_COUNT: IntCounter = OP_COUNTERS.counter("qc_rounds_count");
+pub static ref QC_ROUNDS_COUNT: IntCounter = register_int_counter!("libra_consensus_qc_rounds_count", "Count of the rounds that gathered QC since last restart.").unwrap();
 /// Count of the timeout rounds since last restart (close to 0 in happy path).
-pub static ref TIMEOUT_ROUNDS_COUNT: IntCounter = OP_COUNTERS.counter("timeout_rounds_count");
+pub static ref TIMEOUT_ROUNDS_COUNT: IntCounter = register_int_counter!("libra_consensus_timeout_rounds_count", "Count of the timeout rounds since last restart (close to 0 in happy path).").unwrap();
 /// Count the number of timeouts a node experienced since last restart (close to 0 in happy path).
 /// This count is different from `TIMEOUT_ROUNDS_COUNT`, because not every time a node has
 /// a timeout there is an ultimate decision to move to the next round (it might take multiple
 /// timeouts to get the timeout certificate).
-pub static ref TIMEOUT_COUNT: IntCounter = OP_COUNTERS.counter("timeout_count");
+pub static ref TIMEOUT_COUNT: IntCounter = register_int_counter!("libra_consensus_timeout_count", "Count the number of timeouts a node experienced since last restart (close to 0 in happy path).").unwrap();
 /// The timeout of the current round.
-pub static ref ROUND_TIMEOUT_MS: IntGauge = OP_COUNTERS.gauge("round_timeout_ms");
+pub static ref ROUND_TIMEOUT_MS: IntGauge = register_int_gauge!("libra_consensus_round_timeout_s", "The timeout of the current round.").unwrap();
 ////////////////////////
 // SYNCMANAGER COUNTERS
 ////////////////////////
 /// Count the number of times we invoked state synchronization since last restart.
-pub static ref STATE_SYNC_COUNT: IntCounter = OP_COUNTERS.counter("state_sync_count");
+pub static ref STATE_SYNC_COUNT: IntCounter = register_int_counter!("libra_consensus_state_sync_count", "Count the number of times we invoked state synchronization since last restart.").unwrap();
 /// Count the number of block retrieval requests issued since last restart.
-pub static ref BLOCK_RETRIEVAL_COUNT: IntCounter = OP_COUNTERS.counter("block_retrieval_count");
+pub static ref BLOCK_RETRIEVAL_COUNT: IntCounter = register_int_counter!("libra_consensus_block_retrieval_count", "Count the number of block retrieval requests issued since last restart.").unwrap();
 /// Histogram of block retrieval duration.
-pub static ref BLOCK_RETRIEVAL_DURATION_S: DurationHistogram = OP_COUNTERS.duration_histogram("block_retrieval_duration_s");
+pub static ref BLOCK_RETRIEVAL_DURATION_S: DurationHistogram = DurationHistogram::new(register_histogram!("libra_consensus_block_retrieval_duration_s", "Histogram of block retrieval duration.").unwrap());
 /// Histogram of state sync duration.
-pub static ref STATE_SYNC_DURATION_S: DurationHistogram = OP_COUNTERS.duration_histogram("state_sync_duration_s");
+pub static ref STATE_SYNC_DURATION_S: DurationHistogram =DurationHistogram::new(register_histogram!("libra_consensus_state_sync_duration_s", "Histogram of state sync duration.").unwrap());
 /// Counts the number of times the sync info message has been set since last restart.
-pub static ref SYNC_INFO_MSGS_SENT_COUNT: IntCounter = OP_COUNTERS.counter("sync_info_msg_sent_count");
+pub static ref SYNC_INFO_MSGS_SENT_COUNT: IntCounter = register_int_counter!("libra_consensus_sync_info_msg_sent_count", "Counts the number of times the sync info message has been set since last restart.").unwrap();
 /// Counts the number of times the sync info message has been received since last restart.
-pub static ref SYNC_INFO_MSGS_RECEIVED_COUNT: IntCounter = OP_COUNTERS.counter("sync_info_msg_received_count");
+pub static ref SYNC_INFO_MSGS_RECEIVED_COUNT: IntCounter = register_int_counter!("libra_consensus_sync_info_msg_received_count", "Counts the number of times the sync info message has been received since last restart.").unwrap();
 //////////////////////
 // RECONFIGURATION COUNTERS
 //////////////////////
 /// Current epoch num
-pub static ref EPOCH: IntGauge = OP_COUNTERS.gauge("epoch");
+pub static ref EPOCH: IntGauge = register_int_gauge!("libra_consensus_epoch", "Current epoch num").unwrap();
 /// The number of validators in the current epoch
-pub static ref CURRENT_EPOCH_VALIDATORS: IntGauge = OP_COUNTERS.gauge("current_epoch_validators");
+pub static ref CURRENT_EPOCH_VALIDATORS: IntGauge = register_int_gauge!("libra_consensus_current_epoch_validators", "The number of validators in the current epoch").unwrap();
 /// Quorum size in the current epoch
-pub static ref CURRENT_EPOCH_QUORUM_SIZE: IntGauge = OP_COUNTERS.gauge("current_epoch_quorum_size");
+pub static ref CURRENT_EPOCH_QUORUM_SIZE: IntGauge = register_int_gauge!("libra_consensus_current_epoch_quorum_size", "Quorum size in the current epoch").unwrap();
 //////////////////////
 // BLOCK STORE COUNTERS
 //////////////////////
 /// Counter for the number of blocks in the block tree (including the root).
 /// In a "happy path" with no collisions and timeouts, should be equal to 3 or 4.
-pub static ref NUM_BLOCKS_IN_TREE: IntGauge = OP_COUNTERS.gauge("num_blocks_in_tree");
+pub static ref NUM_BLOCKS_IN_TREE: IntGauge = register_int_gauge!("libra_consensus_num_blocks_in_tree", "Counter for the number of blocks in the block tree (including the root).").unwrap();
 //////////////////////
 // PERFORMANCE COUNTERS
 //////////////////////
-/// Histogram of execution time (ms) of non-empty blocks.
-pub static ref BLOCK_EXECUTION_DURATION_S: DurationHistogram = OP_COUNTERS.duration_histogram("block_execution_duration_s");
+/// Histogram of execution time of non-empty blocks.
+pub static ref BLOCK_EXECUTION_DURATION_S: DurationHistogram = DurationHistogram::new(register_histogram!("libra_consensus_block_execution_duration_s", "Histogram of execution time of non-empty blocks.").unwrap());
 /// Histogram of duration of a commit procedure (the time it takes for the execution / storage to
 /// commit a block once we decide to do so).
-pub static ref BLOCK_COMMIT_DURATION_S: DurationHistogram = OP_COUNTERS.duration_histogram("block_commit_duration_s");
+pub static ref BLOCK_COMMIT_DURATION_S: DurationHistogram = DurationHistogram::new(register_histogram!("libra_consensus_block_commit_duration_s", "Histogram of duration of a commit procedure (the time it takes for the execution / storage to commit a block once we decide to do so).").unwrap());
 /// Histogram for the number of txns per (committed) blocks.
-pub static ref NUM_TXNS_PER_BLOCK: Histogram = OP_COUNTERS.histogram("num_txns_per_block");
+pub static ref NUM_TXNS_PER_BLOCK: Histogram = register_histogram!("libra_consensus_num_txns_per_block", "Histogram for the number of txns per (committed) blocks.").unwrap();
-/// Histogram of per-transaction execution time (ms) of non-empty blocks
+/// Histogram of per-transaction execution time of non-empty blocks
 /// (calculated as the overall execution time of a block divided by the number of transactions).
-pub static ref TXN_EXECUTION_DURATION_S: DurationHistogram = OP_COUNTERS.duration_histogram("txn_execution_duration_s");
+pub static ref TXN_EXECUTION_DURATION_S: DurationHistogram = DurationHistogram::new(register_histogram!("libra_consensus_txn_execution_duration_s", "Histogram of per-transaction execution time of non-empty blocks (calculated as the overall execution time of a block divided by the number of transactions).").unwrap());
-/// Histogram of execution time (ms) of empty blocks.
-pub static ref EMPTY_BLOCK_EXECUTION_DURATION_S: DurationHistogram = OP_COUNTERS.duration_histogram("empty_block_execution_duration_s");
+/// Histogram of execution time of empty blocks.
+pub static ref EMPTY_BLOCK_EXECUTION_DURATION_S: DurationHistogram = DurationHistogram::new(register_histogram!("libra_consensus_empty_block_execution_duration_s", "Histogram of execution time of empty blocks.").unwrap());
 /// Histogram of the time it takes for a block to get committed.
 /// Measured as the commit time minus block's timestamp.
-pub static ref CREATION_TO_COMMIT_S: DurationHistogram = OP_COUNTERS.duration_histogram("creation_to_commit_s");
+pub static ref CREATION_TO_COMMIT_S: DurationHistogram =DurationHistogram::new(register_histogram!("libra_consensus_creation_to_commit_s", "Histogram of the time it takes for a block to get committed. Measured as the commit time minus block's timestamp.").unwrap());
 /// Duration between block generation time until the moment it gathers full QC
-pub static ref CREATION_TO_QC_S: DurationHistogram = OP_COUNTERS.duration_histogram("creation_to_qc_s");
+pub static ref CREATION_TO_QC_S: DurationHistogram = DurationHistogram::new(register_histogram!("libra_consensus_creation_to_qc_s", "Duration between block generation time until the moment it gathers full QC").unwrap());
 /// Duration between block generation time until the moment it is received and ready for execution.
-pub static ref CREATION_TO_RECEIVAL_S: DurationHistogram = OP_COUNTERS.duration_histogram("creation_to_receival_s");
+pub static ref CREATION_TO_RECEIVAL_S: DurationHistogram = DurationHistogram::new(register_histogram!("libra_consensus_creation_to_receival_s", "Duration between block generation time until the moment it is received and ready for execution.").unwrap());
 ////////////////////////////////////
 // PROPSOSAL/VOTE TIMESTAMP COUNTERS
 ////////////////////////////////////
 /// Count of the proposals that passed the timestamp rules and did not have to wait
-pub static ref PROPOSAL_NO_WAIT_REQUIRED_COUNT: IntCounter = OP_COUNTERS.counter("proposal_no_wait_required_count");
+pub static ref PROPOSAL_NO_WAIT_REQUIRED_COUNT: IntCounter = register_int_counter!("libra_consensus_proposal_no_wait_required_count", "Count of the proposals that passed the timestamp rules and did not have to wait").unwrap();
 /// Count of the proposals where passing the timestamp rules required waiting
-pub static ref PROPOSAL_WAIT_WAS_REQUIRED_COUNT: IntCounter = OP_COUNTERS.counter("proposal_wait_was_required_count");
+pub static ref PROPOSAL_WAIT_WAS_REQUIRED_COUNT: IntCounter = register_int_counter!("libra_consensus_proposal_wait_was_required_count", "Count of the proposals where passing the timestamp rules required waiting").unwrap();
 /// Count of the proposals that were not made due to the waiting period exceeding the maximum allowed duration, breaking timestamp rules
-pub static ref PROPOSAL_MAX_WAIT_EXCEEDED_COUNT: IntCounter = OP_COUNTERS.counter("proposal_max_wait_exceeded_count");
+pub static ref PROPOSAL_MAX_WAIT_EXCEEDED_COUNT: IntCounter = register_int_counter!("libra_consensus_proposal_max_wait_exceeded_count", "Count of the proposals that were not made due to the waiting period exceeding the maximum allowed duration, breaking timestamp rules").unwrap();
 /// Count of the proposals that were not made due to waiting to ensure the current time exceeds min_duration_since_epoch failed, breaking timestamp rules
-pub static ref PROPOSAL_WAIT_FAILED_COUNT: IntCounter = OP_COUNTERS.counter("proposal_wait_failed_count");
+pub static ref PROPOSAL_WAIT_FAILED_COUNT: IntCounter = register_int_counter!("libra_consensus_proposal_wait_failed_count", "Count of the proposals that were not made due to waiting to ensure the current time exceeds min_duration_since_epoch failed, breaking timestamp rules").unwrap();
 /// Histogram of time waited for successfully proposing a proposal (both those that waited and didn't wait) after following timestamp rules
-pub static ref PROPOSAL_SUCCESS_WAIT_S: DurationHistogram = OP_COUNTERS.duration_histogram("proposal_success_wait_s");
+pub static ref PROPOSAL_SUCCESS_WAIT_S: DurationHistogram = DurationHistogram::new(register_histogram!("libra_consensus_proposal_success_wait_s", "Histogram of time waited for successfully proposing a proposal (both those that waited and didn't wait) after following timestamp rules").unwrap());
 /// Histogram of time waited for failing to propose a proposal (both those that waited and didn't wait) while trying to follow timestamp rules
-pub static ref PROPOSAL_FAILURE_WAIT_S: DurationHistogram = OP_COUNTERS.duration_histogram("proposal_failure_wait_s");
+pub static ref PROPOSAL_FAILURE_WAIT_S: DurationHistogram =DurationHistogram::new(register_histogram!("libra_consensus_proposal_failure_wait_s", "Histogram of time waited for failing to propose a proposal (both those that waited and didn't wait) while trying to follow timestamp rules").unwrap());
 /// Count of the votes that passed the timestamp rules and did not have to wait
-pub static ref VOTE_NO_WAIT_REQUIRED_COUNT: IntCounter = OP_COUNTERS.counter("vote_no_wait_required_count");
+pub static ref VOTE_NO_WAIT_REQUIRED_COUNT: IntCounter = register_int_counter!("libra_consensus_vote_no_wait_required_count", "Count of the votes that passed the timestamp rules and did not have to wait").unwrap();

Done

ankushagarwal

comment created time in a day

Pull request review comment libra/libra

[consensus] Update all consensus counters to new format

 pub static ref EPOCH_CHANGE_DEQUEUED_MSGS: IntCounter = OP_COUNTERS.counter("epo
 /// Count of the block proposals sent by this validator since last restart
 /// (both primary and secondary)
-pub static ref PROPOSALS_COUNT: IntCounter = OP_COUNTERS.counter("proposals_count");
+pub static ref PROPOSALS_COUNT: IntCounter = register_int_counter!("libra_consensus_proposals_count", "Count of the block proposals sent by this validator since last restart (both primary and secondary)").unwrap();
 /// Count the number of times a validator voted for secondary proposals (upon timeout) since
 /// last restart.
-pub static ref VOTE_SECONDARY_PROPOSAL_COUNT: IntCounter = OP_COUNTERS.counter("vote_secondary_proposal_count");
+pub static ref VOTE_SECONDARY_PROPOSAL_COUNT: IntCounter = register_int_counter!("libra_consensus_vote_secondary_proposal_count", "Count the number of times a validator voted for secondary proposals (upon timeout) since last restart.").unwrap();
 /// Count the number of times a validator voted for a nil block since last restart.
-pub static ref VOTE_NIL_COUNT: IntCounter = OP_COUNTERS.counter("vote_nil_count");
+pub static ref VOTE_NIL_COUNT: IntCounter = register_int_counter!("libra_consensus_vote_nil_count", "Count the number of times a validator voted for a nil block since last restart.").unwrap();
 //////////////////////
 // PACEMAKER COUNTERS
 //////////////////////
 /// Count of the rounds that gathered QC since last restart.
-pub static ref QC_ROUNDS_COUNT: IntCounter = OP_COUNTERS.counter("qc_rounds_count");
+pub static ref QC_ROUNDS_COUNT: IntCounter = register_int_counter!("libra_consensus_qc_rounds_count", "Count of the rounds that gathered QC since last restart.").unwrap();
 /// Count of the timeout rounds since last restart (close to 0 in happy path).
-pub static ref TIMEOUT_ROUNDS_COUNT: IntCounter = OP_COUNTERS.counter("timeout_rounds_count");
+pub static ref TIMEOUT_ROUNDS_COUNT: IntCounter = register_int_counter!("libra_consensus_timeout_rounds_count", "Count of the timeout rounds since last restart (close to 0 in happy path).").unwrap();
 /// Count the number of timeouts a node experienced since last restart (close to 0 in happy path).
 /// This count is different from `TIMEOUT_ROUNDS_COUNT`, because not every time a node has
 /// a timeout there is an ultimate decision to move to the next round (it might take multiple
 /// timeouts to get the timeout certificate).
-pub static ref TIMEOUT_COUNT: IntCounter = OP_COUNTERS.counter("timeout_count");
+pub static ref TIMEOUT_COUNT: IntCounter = register_int_counter!("libra_consensus_timeout_count", "Count the number of timeouts a node experienced since last restart (close to 0 in happy path).").unwrap();
 /// The timeout of the current round.
-pub static ref ROUND_TIMEOUT_MS: IntGauge = OP_COUNTERS.gauge("round_timeout_ms");
+pub static ref ROUND_TIMEOUT_MS: IntGauge = register_int_gauge!("libra_consensus_round_timeout_s", "The timeout of the current round.").unwrap();
 ////////////////////////
 // SYNCMANAGER COUNTERS
 ////////////////////////
 /// Count the number of times we invoked state synchronization since last restart.
-pub static ref STATE_SYNC_COUNT: IntCounter = OP_COUNTERS.counter("state_sync_count");
+pub static ref STATE_SYNC_COUNT: IntCounter = register_int_counter!("libra_consensus_state_sync_count", "Count the number of times we invoked state synchronization since last restart.").unwrap();
 /// Count the number of block retrieval requests issued since last restart.
-pub static ref BLOCK_RETRIEVAL_COUNT: IntCounter = OP_COUNTERS.counter("block_retrieval_count");
+pub static ref BLOCK_RETRIEVAL_COUNT: IntCounter = register_int_counter!("libra_consensus_block_retrieval_count", "Count the number of block retrieval requests issued since last restart.").unwrap();
 /// Histogram of block retrieval duration.
-pub static ref BLOCK_RETRIEVAL_DURATION_S: DurationHistogram = OP_COUNTERS.duration_histogram("block_retrieval_duration_s");
+pub static ref BLOCK_RETRIEVAL_DURATION_S: DurationHistogram = DurationHistogram::new(register_histogram!("libra_consensus_block_retrieval_duration_s", "Histogram of block retrieval duration.").unwrap());
 /// Histogram of state sync duration.
-pub static ref STATE_SYNC_DURATION_S: DurationHistogram = OP_COUNTERS.duration_histogram("state_sync_duration_s");
+pub static ref STATE_SYNC_DURATION_S: DurationHistogram =DurationHistogram::new(register_histogram!("libra_consensus_state_sync_duration_s", "Histogram of state sync duration.").unwrap());
 /// Counts the number of times the sync info message has been set since last restart.
-pub static ref SYNC_INFO_MSGS_SENT_COUNT: IntCounter = OP_COUNTERS.counter("sync_info_msg_sent_count");
+pub static ref SYNC_INFO_MSGS_SENT_COUNT: IntCounter = register_int_counter!("libra_consensus_sync_info_msg_sent_count", "Counts the number of times the sync info message has been set since last restart.").unwrap();
 /// Counts the number of times the sync info message has been received since last restart.
-pub static ref SYNC_INFO_MSGS_RECEIVED_COUNT: IntCounter = OP_COUNTERS.counter("sync_info_msg_received_count");
+pub static ref SYNC_INFO_MSGS_RECEIVED_COUNT: IntCounter = register_int_counter!("libra_consensus_sync_info_msg_received_count", "Counts the number of times the sync info message has been received since last restart.").unwrap();
 //////////////////////
 // RECONFIGURATION COUNTERS
 //////////////////////
 /// Current epoch num
-pub static ref EPOCH: IntGauge = OP_COUNTERS.gauge("epoch");
+pub static ref EPOCH: IntGauge = register_int_gauge!("libra_consensus_epoch", "Current epoch num").unwrap();
 /// The number of validators in the current epoch
-pub static ref CURRENT_EPOCH_VALIDATORS: IntGauge = OP_COUNTERS.gauge("current_epoch_validators");
+pub static ref CURRENT_EPOCH_VALIDATORS: IntGauge = register_int_gauge!("libra_consensus_current_epoch_validators", "The number of validators in the current epoch").unwrap();
 /// Quorum size in the current epoch
-pub static ref CURRENT_EPOCH_QUORUM_SIZE: IntGauge = OP_COUNTERS.gauge("current_epoch_quorum_size");
+pub static ref CURRENT_EPOCH_QUORUM_SIZE: IntGauge = register_int_gauge!("libra_consensus_current_epoch_quorum_size", "Quorum size in the current epoch").unwrap();
 //////////////////////
 // BLOCK STORE COUNTERS
 //////////////////////
 /// Counter for the number of blocks in the block tree (including the root).
 /// In a "happy path" with no collisions and timeouts, should be equal to 3 or 4.
-pub static ref NUM_BLOCKS_IN_TREE: IntGauge = OP_COUNTERS.gauge("num_blocks_in_tree");
+pub static ref NUM_BLOCKS_IN_TREE: IntGauge = register_int_gauge!("libra_consensus_num_blocks_in_tree", "Counter for the number of blocks in the block tree (including the root).").unwrap();
 //////////////////////
 // PERFORMANCE COUNTERS
 //////////////////////
-/// Histogram of execution time (ms) of non-empty blocks.
-pub static ref BLOCK_EXECUTION_DURATION_S: DurationHistogram = OP_COUNTERS.duration_histogram("block_execution_duration_s");
+/// Histogram of execution time of non-empty blocks.
+pub static ref BLOCK_EXECUTION_DURATION_S: DurationHistogram = DurationHistogram::new(register_histogram!("libra_consensus_block_execution_duration_s", "Histogram of execution time of non-empty blocks.").unwrap());
 /// Histogram of duration of a commit procedure (the time it takes for the execution / storage to
 /// commit a block once we decide to do so).
-pub static ref BLOCK_COMMIT_DURATION_S: DurationHistogram = OP_COUNTERS.duration_histogram("block_commit_duration_s");
+pub static ref BLOCK_COMMIT_DURATION_S: DurationHistogram = DurationHistogram::new(register_histogram!("libra_consensus_block_commit_duration_s", "Histogram of duration of a commit procedure (the time it takes for the execution / storage to commit a block once we decide to do so).").unwrap());
 /// Histogram for the number of txns per (committed) blocks.
-pub static ref NUM_TXNS_PER_BLOCK: Histogram = OP_COUNTERS.histogram("num_txns_per_block");
+pub static ref NUM_TXNS_PER_BLOCK: Histogram = register_histogram!("libra_consensus_num_txns_per_block", "Histogram for the number of txns per (committed) blocks.").unwrap();
-/// Histogram of per-transaction execution time (ms) of non-empty blocks
+/// Histogram of per-transaction execution time of non-empty blocks
 /// (calculated as the overall execution time of a block divided by the number of transactions).
-pub static ref TXN_EXECUTION_DURATION_S: DurationHistogram = OP_COUNTERS.duration_histogram("txn_execution_duration_s");
+pub static ref TXN_EXECUTION_DURATION_S: DurationHistogram = DurationHistogram::new(register_histogram!("libra_consensus_txn_execution_duration_s", "Histogram of per-transaction execution time of non-empty blocks (calculated as the overall execution time of a block divided by the number of transactions).").unwrap());
-/// Histogram of execution time (ms) of empty blocks.
-pub static ref EMPTY_BLOCK_EXECUTION_DURATION_S: DurationHistogram = OP_COUNTERS.duration_histogram("empty_block_execution_duration_s");
+/// Histogram of execution time of empty blocks.
+pub static ref EMPTY_BLOCK_EXECUTION_DURATION_S: DurationHistogram = DurationHistogram::new(register_histogram!("libra_consensus_empty_block_execution_duration_s", "Histogram of execution time of empty blocks.").unwrap());
 /// Histogram of the time it takes for a block to get committed.
 /// Measured as the commit time minus block's timestamp.
-pub static ref CREATION_TO_COMMIT_S: DurationHistogram = OP_COUNTERS.duration_histogram("creation_to_commit_s");
+pub static ref CREATION_TO_COMMIT_S: DurationHistogram =DurationHistogram::new(register_histogram!("libra_consensus_creation_to_commit_s", "Histogram of the time it takes for a block to get committed. Measured as the commit time minus block's timestamp.").unwrap());
 /// Duration between block generation time until the moment it gathers full QC
-pub static ref CREATION_TO_QC_S: DurationHistogram = OP_COUNTERS.duration_histogram("creation_to_qc_s");
+pub static ref CREATION_TO_QC_S: DurationHistogram = DurationHistogram::new(register_histogram!("libra_consensus_creation_to_qc_s", "Duration between block generation time until the moment it gathers full QC").unwrap());
 /// Duration between block generation time until the moment it is received and ready for execution.
-pub static ref CREATION_TO_RECEIVAL_S: DurationHistogram = OP_COUNTERS.duration_histogram("creation_to_receival_s");
+pub static ref CREATION_TO_RECEIVAL_S: DurationHistogram = DurationHistogram::new(register_histogram!("libra_consensus_creation_to_receival_s", "Duration between block generation time until the moment it is received and ready for execution.").unwrap());
 ////////////////////////////////////
 // PROPSOSAL/VOTE TIMESTAMP COUNTERS
 ////////////////////////////////////
 /// Count of the proposals that passed the timestamp rules and did not have to wait
-pub static ref PROPOSAL_NO_WAIT_REQUIRED_COUNT: IntCounter = OP_COUNTERS.counter("proposal_no_wait_required_count");
+pub static ref PROPOSAL_NO_WAIT_REQUIRED_COUNT: IntCounter = register_int_counter!("libra_consensus_proposal_no_wait_required_count", "Count of the proposals that passed the timestamp rules and did not have to wait").unwrap();

Done

ankushagarwal

comment created time in a day

delete branch ankushagarwal/libra

delete branch : multi-region-simulation

delete time in a day

pull request comment libra/libra

[network] Set TCP_NODELAY for tcp connections

@bors-libra r+

Updated constructor to set nodelay to true by default.

ankushagarwal

comment created time in a day

push event ankushagarwal/libra

Ankush Agarwal

commit sha 66f60608299f8c7e4c75d1ff1867cdf062a07eb4

[network] Set TCP_NODELAY for tcp connections

view details

push time in a day

pull request comment libra/libra

[network] Set TCP_NODELAY for tcp connections

@bors-libra r+

ankushagarwal

comment created time in a day

push event ankushagarwal/libra

Ankush Agarwal

commit sha 12627c8ced77620eafb0f4e4dccd0c2e490790ca

[network] Set TCP_NODELAY for tcp connections

view details

push time in a day

Pull request review comment libra/libra

[network] Set TCP_NODELAY for tcp connections

 pub fn build_tcp_noise_transport(
     identity_keypair: (X25519StaticPrivateKey, X25519StaticPublicKey),
     trusted_peers: Arc<RwLock<HashMap<PeerId, NetworkPublicKeys>>>,
 ) -> boxed::BoxedTransport<(Identity, impl StreamMultiplexer), impl ::std::error::Error> {
-    let tcp_transport = tcp::TcpTransport::default();
+    let tcp_transport = tcp::TcpTransport {
+        nodelay: Some(true),
+        ..tcp::TcpTransport::default()
+    };

Done

ankushagarwal

comment created time in a day

pull request comment libra/libra

[network] Set TCP_NODELAY for tcp connections

@bors-libra r+

ankushagarwal

comment created time in a day

push event ankushagarwal/libra

Bob Wilson

commit sha fbd6040d6c6e724171a4307820b8c13fec67396b

[language] Remove unused ParseError::UnrecognizedToken The unused UnrecognizedToken error adds a type parameter for Token that propagates all over the place. Since this is not even used, remove it along with all those type parameters. We should definitely add more detailed error messages (probably only in the new move-lang compiler, not in the IR compiler) but whatever we do should not require the internal token types, since the diagnostics should describe issues in terms that are directly visible to end users. Closes: #1685 Approved by: tnowacki

view details

Bob Wilson

commit sha 1f5c808edbf801082b18e5a834ee89e65fae3a38

[language] Clean up some remaining references to lalrpop for the IR compiler Closes: #1685 Approved by: tnowacki

view details

Todd Nowacki

commit sha b185766be55859e975aa639e6aba26e30e53cddf

[language][Move] Added expansion tests - Added tests for expansion pass. - Tried to cover +/- cases for each check Closes: #1660 Approved by: tzakian

view details

Herman Venter

commit sha 85a11b5f169964681869f84186852f127a7ca6cb

Make MIRAI happy Closes: #1681 Approved by: huitseeker

view details

Weiliang Li

commit sha de7b695be03229234aca9fe008831f60f8bfcb9c

make clippy happy

view details

David Wong

commit sha 3e40b5b54f467f73e0f07ee48bfe2bea2464b585

[consensus] removing the `terminate` in chained_bft_smr loop The consensus loop/select has a terminate that should not be reachable as the loop should run forever. This commit removes it. Closes: #1675 Approved by: zekun000

view details

Ankush Agarwal

commit sha 760fa412aab89c82c7979af0add2a601aa0b9305

[cluster-test] Update timeouts, print intermediate results Closes: #1689 Approved by: andll

view details

Bob Wilson

commit sha 0aa5ca1489eae348550c07238f2ba2b3bfda45c3

[language] Refactor lexer to add a lookahead API There are some cases where the parser really needs to look ahead at the next token before deciding how to parse the current token. It was hard to support that with the original lexer that I hacked together from the lalrpop output, but now that the lexer is sane, it is not so hard. Add a new lookahead API and use it in the parser, replacing the current workarounds. Closes: #1688 Approved by: tnowacki

view details

Todd Nowacki

commit sha e95ae06ea81898b16361b8da3ea28e3e35408432

[easy][move] Sort errors by first Loc when displaying - Sort errors by the initial Loc, makes reading errors a lot easier in big lists Closes: #1693 Approved by: vgao1996

view details

Andrey Chursin

commit sha b7f77a0e854d6563f62fad50fc48543117f20b17

[cluster-test] Test if file exists before log rotate When cluster test fails to set up the cluster (for example, failing to cp genesis.blob), there is no log file on the host: the previous one was log-rotated and a new one was not created. In this case, attempting to log rotate it on the next run produces a lot of noise. This checks whether the file exists before attempting to log rotate it. Closes: #1692 Approved by: ankushagarwal

view details

Runtian Zhou

commit sha 346a2700cc6dc6b9c6849d0bfca4934de71b5fc7

[vm] Clean up the code cache api. Closes: #1668 Approved by: dariorussi

view details

Andrey Chursin

commit sha 14e6806915ad5b2e257e5e49f7b802b0311fe928

[cluster-test] Bump liveness health check timeout 1m->2m We have intermittent failures with 1m timeout, this diff will double it Closes: #1697 Approved by: ankushagarwal

view details

Young Yang Liauw

commit sha 3743988d16364960329af1342297432ca5355e5e

[CI] upgrade stretch to buster In CircleCI setup, bump rust:stretch to rust:buster for builders. This is to keep it consistent with Docker build. Closes: #1691 Approved by: bmwill

view details

Zekun Li

commit sha 7a30963bb0054833b97803df3b05a33cd7008c97

[consensus] enforce epoch consistency of messages We enforce that every message contains information about the same epoch. (Fixes a few missing verifications too.) Closes: #1694 Approved by: aching

view details

David Wolinsky

commit sha b0c05c6f24f18afb0a6be0e55f8608b55239ea92

[consensus] Add persistent storage for safety rules This introduces a persistent storage interface and two implementations for SafetyRules: - InMemory for (integration) testing purposes - OnDisk for ("production") testing purposes Eventually this same API should be able to be used by various Secrets Managers. Closes: #1615 Approved by: davidiw

view details

David Wolinsky

commit sha 906cf490aaf0025669c0d1f78868ca0f2862504a

[consensus] Move toward PersistentStorage interface for SafetyRules ConsensusState isn't really a store, SafetyRules needs a store. So this replaces the code to leverage a config backed storage unit and all the code mods that go with it. Note: because of the fact that consensus is multithreaded, PersistentStore must be both Send + Sync This commit also refactors the code within safety rules to take on a more library style approach as there are more features now within the code. Closes: #1615 Approved by: davidiw

view details

David Wolinsky

commit sha 7f88d8f3ffe8895f4cabfaaf2e3c9ac60a20a780

[config] add a new config for safety rules Add a new config because this needs to be entirely managed (owned by Safety Rules) so that it can mutate it during run time. It is worth noting that currently ConsensusConfig is aware of this config, in the future, this would only be the case for testing. Safety Rules binary should load this file directly, whereas Safety Rules library would receive this as part of validator starting. Closes: #1615 Approved by: davidiw

view details

David Wolinsky

commit sha 49101a008465a3a9a4a1747c8033744f6a044d54

[consensus] eliminate consensus state from consensus db SafetyRules has its own persistent storage, let's leverage it. Most of this code is just deleting the consensus state from consensus db The rest is setting up the appropriate tests so that the code leverages the new means for starting safety rules with a persistent backend This diff also solves some other issues that somehow overlapped with this work: - Epoch changes are somewhat handled in SafetyRules::update - Epochs begin at 1 Closes: #1615 Approved by: davidiw

view details

Phoenix Orlov

commit sha 5cc9ebbc9eb82891b45b24ff08242dc3a78022ee

[state sync] reduce error verbosity Currently state sync prints a full stack trace whenever a routine error occurs, which pollutes the log. This diff reduces the verbosity. Closes: #1707 Approved by: andll

view details

Ankush Agarwal

commit sha a31bfa5b81af470cbe4d5b3f6d80b5a1118de647

[network] Set TCP_NODELAY for tcp connections

view details

push time in a day

push event ankushagarwal/libra

Ankush Agarwal

commit sha 3cc4d34b7c456b7324b8ff46faf8ccbab8a81408

[cluster-test] Emit transactions when running network delay simulations Summary This PR updates the multi_region_network_simulation experiment such that transactions are generated in the background. Without that, we would be simply processing empty blocks in the experiment. Test Plan Ran against my cluster.

view details

push time in a day

PR opened libra/libra

[cluster-test] Emit transactions when running network delay simulations cluster_test

Summary

This PR updates the multi_region_network_simulation experiment such that transactions are generated in the background. Without that, we would be simply processing empty blocks in the experiment.
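As a rough illustration of generating transactions in the background while an experiment runs, here is a minimal standard-library sketch. `BackgroundEmitter`, its methods, and the counter that stands in for real transaction submission are hypothetical names chosen for this example; they are not the cluster-test API.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical background load generator (not the cluster-test API).
struct BackgroundEmitter {
    stop_flag: Arc<AtomicBool>,
    submitted: Arc<AtomicU64>,
    handle: JoinHandle<()>,
}

impl BackgroundEmitter {
    /// Spawns a thread that "submits" one transaction per `interval`.
    fn start(interval: Duration) -> Self {
        let stop_flag = Arc::new(AtomicBool::new(false));
        let submitted = Arc::new(AtomicU64::new(0));
        let (stop, count) = (Arc::clone(&stop_flag), Arc::clone(&submitted));
        let handle = thread::spawn(move || loop {
            // Stand-in for submitting a real transaction to a validator.
            count.fetch_add(1, Ordering::SeqCst);
            // The stop flag is checked after each submission.
            if stop.load(Ordering::SeqCst) {
                break;
            }
            thread::sleep(interval);
        });
        Self { stop_flag, submitted, handle }
    }

    /// Signals the thread to stop, waits for it, and returns the total submitted.
    fn stop(self) -> u64 {
        self.stop_flag.store(true, Ordering::SeqCst);
        self.handle.join().expect("emitter thread panicked");
        self.submitted.load(Ordering::SeqCst)
    }
}

fn main() {
    let emitter = BackgroundEmitter::start(Duration::from_millis(5));
    // Stand-in for running the network delay simulation.
    thread::sleep(Duration::from_millis(50));
    println!("submitted {} transactions", emitter.stop());
}
```

With load running alongside the simulation, the blocks being measured are non-empty, which is the point made in the summary above.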

Test Plan

Ran against my cluster.

+13 -0

0 comment

1 changed file

pr created time in a day

push event ankushagarwal/libra

Todd Nowacki

commit sha b185766be55859e975aa639e6aba26e30e53cddf

[language][Move] Added expansion tests - Added tests for expansion pass. - Tried to cover +/- cases for each check Closes: #1660 Approved by: tzakian

view details

Herman Venter

commit sha 85a11b5f169964681869f84186852f127a7ca6cb

Make MIRAI happy Closes: #1681 Approved by: huitseeker

view details

Weiliang Li

commit sha de7b695be03229234aca9fe008831f60f8bfcb9c

make clippy happy

view details

David Wong

commit sha 3e40b5b54f467f73e0f07ee48bfe2bea2464b585

[consensus] removing the `terminate` in chained_bft_smr loop The consensus loop/select has a terminate that should not be reachable as the loop should run forever. This commit removes it. Closes: #1675 Approved by: zekun000

view details

Ankush Agarwal

commit sha 760fa412aab89c82c7979af0add2a601aa0b9305

[cluster-test] Update timeouts, print intermediate results Closes: #1689 Approved by: andll

view details

Bob Wilson

commit sha 0aa5ca1489eae348550c07238f2ba2b3bfda45c3

[language] Refactor lexer to add a lookahead API There are some cases where the parser really needs to look ahead at the next token before deciding how to parse the current token. It was hard to support that with the original lexer that I hacked together from the lalrpop output, but now that the lexer is sane, it is not so hard. Add a new lookahead API and use it in the parser, replacing the current workarounds. Closes: #1688 Approved by: tnowacki

view details

Todd Nowacki

commit sha e95ae06ea81898b16361b8da3ea28e3e35408432

[easy][move] Sort errors by first Loc when displaying - Sort errors by the initial Loc, makes reading errors a lot easier in big lists Closes: #1693 Approved by: vgao1996

view details

Andrey Chursin

commit sha b7f77a0e854d6563f62fad50fc48543117f20b17

[cluster-test] Test if file exists before log rotate When cluster test fails to set up the cluster (for example, failing to cp genesis.blob), there is no log file on the host: the previous one was log-rotated and a new one was not created. In this case, attempting to log rotate it on the next run produces a lot of noise. This checks whether the file exists before attempting to log rotate it. Closes: #1692 Approved by: ankushagarwal

view details

Runtian Zhou

commit sha 346a2700cc6dc6b9c6849d0bfca4934de71b5fc7

[vm] Clean up the code cache api. Closes: #1668 Approved by: dariorussi

view details

Andrey Chursin

commit sha 14e6806915ad5b2e257e5e49f7b802b0311fe928

[cluster-test] Bump liveness health check timeout 1m->2m We have intermittent failures with 1m timeout, this diff will double it Closes: #1697 Approved by: ankushagarwal

view details

Young Yang Liauw

commit sha 3743988d16364960329af1342297432ca5355e5e

[CI] upgrade stretch to buster In CircleCI setup, bump rust:stretch to rust:buster for builders. This is to keep it consistent with Docker build. Closes: #1691 Approved by: bmwill

view details

Zekun Li

commit sha 7a30963bb0054833b97803df3b05a33cd7008c97

[consensus] enforce epoch consistency of messages We enforce that every message contains information about the same epoch. (Fixes a few missing verifications too.) Closes: #1694 Approved by: aching

view details

David Wolinsky

commit sha b0c05c6f24f18afb0a6be0e55f8608b55239ea92

[consensus] Add persistent storage for safety rules This introduces a persistent storage interface and two implementations for SafetyRules: - InMemory for (integration) testing purposes - OnDisk for ("production") testing purposes Eventually this same API should be able to be used by various Secrets Managers. Closes: #1615 Approved by: davidiw

view details

David Wolinsky

commit sha 906cf490aaf0025669c0d1f78868ca0f2862504a

[consensus] Move toward PersistentStorage interface for SafetyRules ConsensusState isn't really a store, SafetyRules needs a store. So this replaces the code to leverage a config backed storage unit and all the code mods that go with it. Note: because of the fact that consensus is multithreaded, PersistentStore must be both Send + Sync This commit also refactors the code within safety rules to take on a more library style approach as there are more features now within the code. Closes: #1615 Approved by: davidiw

view details

David Wolinsky

commit sha 7f88d8f3ffe8895f4cabfaaf2e3c9ac60a20a780

[config] add a new config for safety rules Add a new config because this needs to be entirely managed (owned by Safety Rules) so that it can mutate it during run time. It is worth noting that currently ConsensusConfig is aware of this config, in the future, this would only be the case for testing. Safety Rules binary should load this file directly, whereas Safety Rules library would receive this as part of validator starting. Closes: #1615 Approved by: davidiw

view details

David Wolinsky

commit sha 49101a008465a3a9a4a1747c8033744f6a044d54

[consensus] eliminate consensus state from consensus db SafetyRules has its own persistent storage, let's leverage it. Most of this code is just deleting the consensus state from consensus db The rest is setting up the appropriate tests so that the code leverages the new means for starting safety rules with a persistent backend This diff also solves some other issues that somehow overlapped with this work: - Epoch changes are somewhat handled in SafetyRules::update - Epochs begin at 1 Closes: #1615 Approved by: davidiw

view details

Ankush Agarwal

commit sha e6fd26dc3348eb44113c6fb133e67c584f0d3a7a

[cluster-test] Emit transactions when running network delay simulations

view details

push time in a day

pull request comment libra/libra

[consensus] Update all consensus counters to new format

@sherry-x : Gentle ping

ankushagarwal

comment created time in 2 days

pull request comment libra/libra

[cluster-test] Bump liveness health check timeout 1m->2m

@bors-libra r+

andll

comment created time in 4 days

pull request comment libra/libra

[cluster-test] Test if file exists before log rotate

👍 Ran into this a couple of times myself.

@bors-libra r+

andll

comment created time in 4 days

push event ankushagarwal/libra

Ankush Agarwal

commit sha 68dff2c63dc008f43719b2ff49f9f03ed3e8aeb8

[cluster-test] Update timeouts, print intermediate results

view details

push time in 4 days

PR opened libra/libra

[cluster-test] Update timeouts, print intermediate results cluster_test

Summary

  • Define deadline for each experiment separately and enforce it using trait.
  • Update the timeout for ssh command because during network simulation events, it times out frequently
  • Set 24 hr deadline for multi-region-simulation experiment as it goes over a number of different topologies
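One way to read "define a deadline per experiment and enforce it using a trait" is sketched below. The `Experiment` trait shape, `run_with_deadline`, and `SleepExperiment` are assumptions made for illustration; the actual cluster-test trait may look different.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Hypothetical experiment interface: each experiment declares its own deadline.
trait Experiment: Send + 'static {
    fn name(&self) -> String;
    fn deadline(&self) -> Duration;
    fn run(&mut self);
}

/// Runs the experiment on a worker thread and stops waiting after its deadline.
/// Note: the worker is not killed on timeout; it is simply abandoned.
fn run_with_deadline<E: Experiment>(mut experiment: E) -> Result<(), String> {
    let name = experiment.name();
    let deadline = experiment.deadline();
    let (done_tx, done_rx) = mpsc::channel();
    thread::spawn(move || {
        experiment.run();
        // The receiver may already be gone if the deadline passed.
        let _ = done_tx.send(());
    });
    match done_rx.recv_timeout(deadline) {
        Ok(()) => Ok(()),
        Err(RecvTimeoutError::Timeout) => {
            Err(format!("{} exceeded its deadline of {:?}", name, deadline))
        }
        Err(RecvTimeoutError::Disconnected) => {
            Err(format!("{} exited without finishing", name))
        }
    }
}

/// Toy experiment that just sleeps for `work`.
struct SleepExperiment {
    work: Duration,
    limit: Duration,
}

impl Experiment for SleepExperiment {
    fn name(&self) -> String {
        "sleep_experiment".to_string()
    }
    fn deadline(&self) -> Duration {
        self.limit
    }
    fn run(&mut self) {
        thread::sleep(self.work);
    }
}

fn main() {
    let quick = SleepExperiment { work: Duration::from_millis(10), limit: Duration::from_secs(1) };
    let slow = SleepExperiment { work: Duration::from_secs(1), limit: Duration::from_millis(10) };
    println!("{:?}", run_with_deadline(quick));
    println!("{:?}", run_with_deadline(slow));
}
```

Under this shape, a long-running experiment such as multi-region-simulation would return a 24 hour `Duration` from `deadline()`, while shorter experiments keep tighter limits.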
+20 -5

0 comment

7 changed files

pr created time in 4 days

create branch ankushagarwal/libra

branch : multi-region-simulation

created branch time in 4 days

PR opened libra/libra

[network] Set TCP_NODELAY for tcp connections

Summary

With TCP_NODELAY, we are seeing improved throughput in our system when this is run on a 100-node validator cluster.

Test Plan

  • Deployed this to a 100-node validator cluster and immediately saw blocks processed per second jump from 50 to 60. (image)

  • Also noticed improvements in Block creation to qc time graph.

  • Ran an experiment that split the 100-node cluster 80/20 and injected a 200ms delay between the two groups. Overall, the max(block creation to qc time) was much lower with TCP_NODELAY set to true.

image

The right graph has TCP_NODELAY set to true and left one has it set to false
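For readers unfamiliar with the option: TCP_NODELAY disables Nagle's algorithm, so small writes are sent immediately instead of being coalesced. The sketch below shows the same setting on a plain `std::net::TcpStream`; it is a standalone illustration, and `connect_nodelay` is a hypothetical helper, not part of the Libra network transport.

```rust
use std::io;
use std::net::{SocketAddr, TcpListener, TcpStream};

/// Connects to `addr` and disables Nagle's algorithm on the socket.
fn connect_nodelay(addr: SocketAddr) -> io::Result<TcpStream> {
    let stream = TcpStream::connect(addr)?;
    stream.set_nodelay(true)?;
    Ok(stream)
}

fn main() -> io::Result<()> {
    // Bind to an ephemeral local port so the example is self-contained.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let stream = connect_nodelay(listener.local_addr()?)?;
    println!("nodelay = {}", stream.nodelay()?);
    Ok(())
}
```

The trade-off is more, smaller packets on the wire in exchange for lower latency per message, which matters for the small, latency-sensitive messages measured in the test plan above.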

+5 -1

0 comment

1 changed file

pr created time in 4 days

create branch ankushagarwal/libra

branch : no-delay

created branch time in 4 days

pull request comment libra/libra

[network] Add benchmark for transport with TCP_NODELAY set

@bors-libra r+

bothra90

comment created time in 5 days

delete branch ankushagarwal/libra

delete branch : network-delays-experiment

delete time in 5 days

Pull request review comment libra/libra

[consensus] Update all consensus counters to new format

 // SPDX-License-Identifier: Apache-2.0  use lazy_static;-use libra_metrics::{DurationHistogram, OpMetrics};-use prometheus::{Histogram, IntCounter, IntGauge};--lazy_static::lazy_static! {-    pub static ref OP_COUNTERS: OpMetrics = OpMetrics::new_and_registered("consensus");-}+use libra_metrics::DurationHistogram;+use prometheus::{Histogram, IntCounter, IntCounterVec, IntGauge};  lazy_static::lazy_static! { ////////////////////// // HEALTH COUNTERS ////////////////////// /// This counter is set to the round of the highest committed block.-pub static ref LAST_COMMITTED_ROUND: IntGauge = OP_COUNTERS.gauge("last_committed_round");+pub static ref LAST_COMMITTED_ROUND: IntGauge = register_int_gauge!("libra_consensus_last_committed_round","This counter is set to the round of the highest committed block.").unwrap();  /// The counter corresponds to the version of the last committed ledger info.-pub static ref LAST_COMMITTED_VERSION: IntGauge = OP_COUNTERS.gauge("last_committed_version");+pub static ref LAST_COMMITTED_VERSION: IntGauge = register_int_gauge!("libra_consensus_last_committed_version", "The counter corresponds to the version of the last committed ledger info.").unwrap();  /// This counter is set to the round of the highest voted block.-pub static ref LAST_VOTE_ROUND: IntGauge = OP_COUNTERS.gauge("last_vote_round");+pub static ref LAST_VOTE_ROUND: IntGauge = register_int_gauge!("libra_consensus_last_vote_round", "This counter is set to the round of the highest voted block.").unwrap();  /// This counter is set to the round of the preferred block (highest 2-chain head).-pub static ref PREFERRED_BLOCK_ROUND: IntGauge = OP_COUNTERS.gauge("preferred_block_round");+pub static ref PREFERRED_BLOCK_ROUND: IntGauge = register_int_gauge!("libra_consensus_preferred_block_round", "This counter is set to the round of the preferred block (highest 2-chain head).").unwrap();  /// This counter is set to the last round reported by the local pacemaker.-pub static ref 
CURRENT_ROUND: IntGauge = OP_COUNTERS.gauge("current_round");+pub static ref CURRENT_ROUND: IntGauge = register_int_gauge!("libra_consensus_current_round", "This counter is set to the last round reported by the local pacemaker.").unwrap();  /// Count of the committed blocks since last restart.-pub static ref COMMITTED_BLOCKS_COUNT: IntCounter = OP_COUNTERS.counter("committed_blocks_count");+pub static ref COMMITTED_BLOCKS_COUNT: IntCounter = register_int_counter!("libra_consensus_committed_blocks_count", "Count of the committed blocks since last restart.").unwrap();  /// Count of the committed transactions since last restart.-pub static ref COMMITTED_TXNS_COUNT: IntCounter = OP_COUNTERS.counter("committed_txns_count");+pub static ref COMMITTED_TXNS_COUNT: IntCounterVec = register_int_counter_vec!("libra_consensus_committed_txns_count", "Count of the transactions since last restart. state is success or failed", &["state"]).unwrap(); -/// Count of success txns in the blocks committed by this validator since last restart.-pub static ref SUCCESS_TXNS_COUNT: IntCounter = OP_COUNTERS.counter("success_txns_count");+/// Histogram of idle time of spent in event processing loop+pub static ref EVENT_PROCESSING_LOOP_IDLE_DURATION_S: DurationHistogram = DurationHistogram::new(register_histogram!("libra_consensus_event_processing_loop_idle_duration_s", "Histogram of idle time of spent in event processing loop").unwrap()); -/// Count of failed txns in the committed blocks since last restart.-/// FAILED_TXNS_COUNT + SUCCESS_TXN_COUNT == COMMITTED_TXNS_COUNT-pub static ref FAILED_TXNS_COUNT: IntCounter = OP_COUNTERS.counter("failed_txns_count");--/// Histogram of idle time (ms) of spent in event processing loop-pub static ref EVENT_PROCESSING_LOOP_IDLE_DURATION_S: DurationHistogram = OP_COUNTERS.duration_histogram("event_processing_loop_idle_duration_s");-/// Histogram of busy time (ms) of spent in event processing loop-pub static ref EVENT_PROCESSING_LOOP_BUSY_DURATION_S: 
DurationHistogram = OP_COUNTERS.duration_histogram("event_processing_loop_busy_duration_s");+/// Histogram of busy time of spent in event processing loop+pub static ref EVENT_PROCESSING_LOOP_BUSY_DURATION_S: DurationHistogram = DurationHistogram::new(register_histogram!("libra_consensus_event_processing_loop_busy_duration_s", "Histogram of busy time of spent in event processing loop").unwrap());  /// Counters(queued,dequeued,dropped) related to proposals channel-pub static ref PROPOSAL_DROPPED_MSGS: IntCounter = OP_COUNTERS.counter("proposal_dropped_msgs_count");-pub static ref PROPOSAL_ENQUEUED_MSGS: IntCounter = OP_COUNTERS.counter("proposal_enqueued_msgs_count");-pub static ref PROPOSAL_DEQUEUED_MSGS: IntCounter = OP_COUNTERS.counter("proposal_dequeued_msgs_count");+pub static ref PROPOSAL_DROPPED_MSGS: IntCounter = register_int_counter!("libra_consensus_proposal_dropped_msgs_count", "Counters(queued,dequeued,dropped) related to proposals channel").unwrap();

Updated

ankushagarwal

comment created time in 5 days

push event ankushagarwal/libra

Zekun Li

commit sha 0eeebbe12b932651cecfbdc057daf4f0139fc7b3

[consensus][reconfig] help peers who sent old epoch messages As part of reconfiguration, honest nodes in the new epoch need to help others still in the old epoch. Otherwise we risk that one honest node joins the new epoch and stops consensus, and the remaining 2f are not able to make any progress. Closes: #1622 Approved by: dmitri-perelman

view details

Young Yang Liauw

commit sha 09f7e4192d71ecd8a15c66ce2b51f5f2e005654d

[CI] add conditional docker build to commit verify When we make changes to these docker files, we want to verify them in the commit workflow, or we end up breaking nightly. This is a follow-up to 243f535. Here we add a conditional docker build job to the commit verify workflow. The docker build job kicks off iff the PR contains a change in any file matching `*.Dockerfile`. Once triggered, the workflow builds each of the updated docker files. Closes: #1542 Approved by: huitseeker

view details

SunMi Lee

commit sha 73355c46c2c2b5822d507d17bebf66b9c57f4628

[admission-control] Add service test into lib Closes: #1640 Approved by: phlip9

view details

Rain

commit sha ca648e539534319e2c74ae8fe5f24b651d9c8f89

add a rust-toolchain file back in This has many advantages: 1. It pushes our builds towards hermeticity, which having worked on devtools for many years I've come to believe is *a priori* good. 2. It would be possible to make CI hermetic while making dev workflows not so much, but I believe that making dev builds as close to CI as possible is *a priori* good. 3. It completely obviates the need for manually requesting that developers upgrade Rust versions for new features -- that can simply be managed through tooling, as it should be. Closes: #1662 Approved by: bmwill

view details

Andrey Chursin

commit sha a5041b7fff1aac30d7ca24a3bfea5bdfdba47122

[cluster-test] Log command used to fetch genesis.blob Closes: #1665 Approved by: dmitri-perelman

view details

François Garillot

commit sha 3e19c7d92b843368c47b0230a9c55c3d7c082557

[easy] More idiomatic Option/Result patterns Closes: #1639 Approved by: zekun000

view details

Sherry Xiao

commit sha 424f67658d31b08d8444810ae96637c38457ec70

[terraform] Increase monitoring instance disk volume rename the variable Closes: #1666 Approved by: ankushagarwal

view details

Qinfan Wu

commit sha 3f795ac8096460b7639bf862c5a7fa6167e24728

[Crypto] Implement from_bit_iter for HashValue So we can easily transform a vector of boolean to a HashValue. Closes: #1577 Approved by: msmouse

view details

Qinfan Wu

commit sha 73f14ce0c94e9b50582e50fc49d37e08ba6a6756

[Proof] Introduce SparseMerkleRangeProof This proof intends to prove that a range of things exist in a sparse Merkle tree. Because we always go from left to right when restoring the state tree, at any point in time a list of siblings on the right is sufficient to prove everything on the left. The verification is a bit complex... The basic idea is that when we have the full list of leaves for a sparse Merkle tree, we can compute the common prefix length of each adjacent key pair and find out which pair consists of the left child and right child of the same parent. Then we can compute their parent and reduce the problem. Note that we are just doing this in unit tests to test the `get_range_proof` method; the real verification will be a little bit different. Closes: #1577 Approved by: msmouse

view details

Yucong Sun

commit sha 5b3833b2e4f3a74b190a88b56854fe3598504c2d

Remove warning to cut types dependency on slog Closes: #1614 Approved by: bmwill

view details

Shaz Qadeer

commit sha 1daa09943047b1d826569eeff53a09949ef35567

First version of the new borrow checker based on the abstraction of an acyclic labeled borrow graph. Closes: #1598 Approved by: tnowacki

view details

Ankush Agarwal

commit sha 3a0b3a5bb2a164fa113d8d8d27b7a95909402bd7

[cluster-test] Add experiment to simulate multi-region environment and report result Summary This is the code which is used to run experiments with introducing network delays between nodes and simulation of multi-region With this we can specify a list of split sizes and delays and we will run the simulation for every combination of these two parameters Update NetworkDelay to be an Effect instead of Action Update flags for multi-region simulation because now we can run simulations with a variety of split sizes and delays instead of just one Complete rewrite of multi_region_network_simulation.rs to handle running simulations with a variety of configs Print a list of all metrics in a csv format at the end of the experiment Test Plan Tested this on my cluster Closes: #1652 Approved by: andll

view details

Aaron Gao

commit sha 33f1023ac03f0d049ca57a511bfa7fb4c5d76116

[executor] refactor executor with synced_trees Closes: #1627 Approved by: dmitri-perelman

view details

Aaron Gao

commit sha 19f051796e97ca6cd22b9563b0a1d2932c3ad585

[executor] idempotent commits Closes: #1627 Approved by: dmitri-perelman

view details

Herman Venter

commit sha 5107b309fe2a38b2bc2ca6d51b0b82d3a1b73560

Make MIRAI happy Closes: #1664 Approved by: huitseeker

view details

Zekun Li

commit sha 7e3777cf511ea57d47995aeb7b1131386584c93e

[consensus][restart] simplify the recovery flow given idempotent commits support We're able to greatly simplify the recovery process during restart thanks to the idempotent commit support. We can directly rely on the latest ledger info that storage returns to us, and it's now guaranteed to exist in consensusdb due to the state sync failure handling, #1590. Also, we don't need to continue sync upon restart. A side effect of this PR is that we now generate genesis virtually and never persist it into consensusdb. Closes: #1616 Approved by: dmitri-perelman

view details

Zekun Li

commit sha 02596878adbd61f4d221a0e1c9bc48f9669c69b2

[consensus][reconfig] extend reconfiguration test for a few epochs Closes: #1616 Approved by: dmitri-perelman

view details

Ankush Agarwal

commit sha b560944ffbd11f9e6e072c548540ae15ba5307f2

[consensus] Update all consensus counters to new format Summary: Stop using OP_COUNTERS. Create a separate metric for each counter, prefixed by "libra_consensus_". Use prometheus macros directly for creating counters. Update all usages of consensus counters to the new names.
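The rename also affects dashboard queries: the only old-style expression visible in this feed is `consensus_gauge{op='round_timeout_ms'}`, which becomes `libra_consensus_round_timeout_ms`. A minimal sketch of that mapping follows. `migrate_expr` is a hypothetical helper, and it assumes every old expression has the form `consensus_<kind>{op='<name>'}`, which this feed confirms only for the gauge case.

```rust
/// Hypothetical helper: rewrites an old OP_COUNTERS-style selector such as
/// `consensus_gauge{op='round_timeout_ms'}` into the new flat metric name.
/// Returns `None` when the input does not match the assumed old form.
fn migrate_expr(old: &str) -> Option<String> {
    let rest = old.strip_prefix("consensus_")?;
    // Split off the metric kind (e.g. "gauge") from the label selector.
    let (_kind, selector) = rest.split_once('{')?;
    let op = selector.strip_prefix("op='")?.strip_suffix("'}")?;
    Some(format!("libra_consensus_{}", op))
}

fn main() {
    let old = "consensus_gauge{op='round_timeout_ms'}";
    match migrate_expr(old) {
        Some(new) => println!("{} -> {}", old, new),
        None => println!("{} is not in the old format", old),
    }
}
```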

view details

Ankush Agarwal

commit sha 6101e82dc4b7fa5df932f9e92016aaa77986631e

fixup! [consensus] Update all consensus counters to new format

view details

push time in 5 days

Pull request review comment libra/libra

[consensus] Update all consensus counters to new format

 // SPDX-License-Identifier: Apache-2.0

 use lazy_static;
-use libra_metrics::{DurationHistogram, OpMetrics};
+use libra_metrics::DurationHistogram;
 use prometheus::{Histogram, IntCounter, IntGauge};

-lazy_static::lazy_static! {
-    pub static ref OP_COUNTERS: OpMetrics = OpMetrics::new_and_registered("consensus");
-}
-
 lazy_static::lazy_static! {
 //////////////////////
 // HEALTH COUNTERS
 //////////////////////
 /// This counter is set to the round of the highest committed block.
-pub static ref LAST_COMMITTED_ROUND: IntGauge = OP_COUNTERS.gauge("last_committed_round");
+pub static ref LAST_COMMITTED_ROUND: IntGauge = register_int_gauge!("libra_consensus_last_committed_round","This counter is set to the round of the highest committed block.").unwrap();

 /// The counter corresponds to the version of the last committed ledger info.
-pub static ref LAST_COMMITTED_VERSION: IntGauge = OP_COUNTERS.gauge("last_committed_version");
+pub static ref LAST_COMMITTED_VERSION: IntGauge = register_int_gauge!("libra_consensus_last_committed_version", "The counter corresponds to the version of the last committed ledger info.").unwrap();

 /// This counter is set to the round of the highest voted block.
-pub static ref LAST_VOTE_ROUND: IntGauge = OP_COUNTERS.gauge("last_vote_round");
+pub static ref LAST_VOTE_ROUND: IntGauge = register_int_gauge!("libra_consensus_last_vote_round", "This counter is set to the round of the highest voted block.").unwrap();

 /// This counter is set to the round of the preferred block (highest 2-chain head).
-pub static ref PREFERRED_BLOCK_ROUND: IntGauge = OP_COUNTERS.gauge("preferred_block_round");
+pub static ref PREFERRED_BLOCK_ROUND: IntGauge = register_int_gauge!("libra_consensus_preferred_block_round", "This counter is set to the round of the preferred block (highest 2-chain head).").unwrap();

 /// This counter is set to the last round reported by the local pacemaker.
-pub static ref CURRENT_ROUND: IntGauge = OP_COUNTERS.gauge("current_round");
+pub static ref CURRENT_ROUND: IntGauge = register_int_gauge!("libra_consensus_current_round", "This counter is set to the last round reported by the local pacemaker.").unwrap();

 /// Count of the committed blocks since last restart.
-pub static ref COMMITTED_BLOCKS_COUNT: IntCounter = OP_COUNTERS.counter("committed_blocks_count");
+pub static ref COMMITTED_BLOCKS_COUNT: IntCounter = register_int_counter!("libra_consensus_committed_blocks_count", "Count of the committed blocks since last restart.").unwrap();

The blocks count has no states, but the committed transaction count has two: failed and success. I have updated COMMITTED_TXNS_COUNT to use a label that distinguishes success from failure.
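To illustrate the difference between a plain counter and a labeled one, here is a minimal std-only Rust sketch. It does not use the `prometheus` crate; the type `LabeledCounter`, the label name `state`, and the metric name `libra_consensus_committed_txns_count` are assumptions chosen for illustration and are not taken from the PR.

```rust
use std::collections::BTreeMap;

/// Std-only model of a counter with one label dimension.
/// The label name (for example `state`) is a hypothetical choice.
struct LabeledCounter {
    name: &'static str,
    label: &'static str,
    // One independent count per label value, kept sorted for stable output.
    values: BTreeMap<&'static str, u64>,
}

impl LabeledCounter {
    fn new(name: &'static str, label: &'static str) -> Self {
        Self { name, label, values: BTreeMap::new() }
    }

    /// Increments the series identified by `label_value`.
    fn inc(&mut self, label_value: &'static str) {
        *self.values.entry(label_value).or_insert(0) += 1;
    }

    /// Renders one line per label value, in the style of the Prometheus text format.
    fn render(&self) -> Vec<String> {
        self.values
            .iter()
            .map(|(v, n)| format!("{}{{{}=\"{}\"}} {}", self.name, self.label, v, n))
            .collect()
    }
}
```

With this shape, a blocks counter needs no label at all, while the transaction counter keeps a single metric name and separates outcomes by label value, so a dashboard can sum or filter on the label.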

ankushagarwal

comment created time in 5 days

push event ankushagarwal/libra

Ankush Agarwal

commit sha ae09ee0e147a08b5517bc68a4ae2eab019a8fccc

fixup! [consensus] Update all consensus counters to new format

view details

push time in 5 days

push event ankushagarwal/libra

Ankush Agarwal

commit sha ce826990390234f890255e9c6e129cdde779ec7f

fixup! [consensus] Update all consensus counters to new format

view details

push time in 5 days

pull request comment libra/libra

[network] Add benchmark for transport with TCP_NODELAY set

@bors-libra r+

bothra90

comment created time in 5 days

pull request comment libra/libra

[terraform] Increase monitoring instance disk volume

@bors-libra r+

sherry-x

comment created time in 5 days

pull request comment libra/libra

[terraform] Increase monitoring instance disk volume

Can you make this a var with a default value of 100?

In my 100-node cluster, I have to edit it manually to 1000G.

sherry-x

comment created time in 5 days

push event ankushagarwal/libra

Ankush Agarwal

commit sha 7cb780e2214bb35f047c0dfbfedfaec48093712c

fixup! [consensus] Update all consensus counters to new format

view details

push time in 5 days

Pull request review comment libra/libra

[consensus] Update all consensus counters to new format

       "steppedLine": false,       "targets": [         {-          "expr": "consensus_gauge{op='round_timeout_ms'}",+          "expr": "libra_consensus_round_timeout_ms",

Updated
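The dashboard rename above follows the PR's `libra_consensus_` prefix convention. A small sketch of that mapping, assuming the old gauge queries all have the exact shape `consensus_gauge{op='<name>'}`; the helper `migrate_expr` is hypothetical and not part of the repository:

```rust
/// Rewrites `consensus_gauge{op='<name>'}` to `libra_consensus_<name>`.
/// Returns `None` for any expression that does not have exactly that shape.
fn migrate_expr(old: &str) -> Option<String> {
    let name = old
        .strip_prefix("consensus_gauge{op='")?
        .strip_suffix("'}")?;
    // Only accept plain metric-style names (letters, digits, underscores).
    if name.is_empty() || !name.chars().all(|c| c.is_ascii_alphanumeric() || c == '_') {
        return None;
    }
    Some(format!("libra_consensus_{}", name))
}
```

For example, `migrate_expr("consensus_gauge{op='round_timeout_ms'}")` yields the new expression shown in the diff, and an expression that is already in the new format is left unmatched.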

ankushagarwal

comment created time in 5 days

push event ankushagarwal/libra

Ankush Agarwal

commit sha fe0ce08ccd0944e8140192fcbc6ea212195474ce

[consensus] Update all consensus counters to new format Summary Stop using OP_COUNTERS Create a separate metric for each counter prefixed by "libra_consensus_" Use Prometheus macros directly for creating counters Update all usages of consensus counters to new names

view details

push time in 6 days

push event ankushagarwal/libra

Ankush Agarwal

commit sha 6a2a10c424d4281e96e37a5fc4053c4520614bc6

[consensus] Update all consensus counters to new format Summary Stop using OP_COUNTERS Create a separate metric for each counter prefixed by "libra_consensus_" Use Prometheus macros directly for creating counters Update all usages of consensus counters to new names

view details

push time in 6 days

PR opened libra/libra

[consensus] Update all consensus counters to new format consensus

Summary

  • Stop using OP_COUNTERS
  • Create a separate metric for each counter prefixed by "libra_consensus_"
  • Use Prometheus macros directly for creating counters
  • Update all usages of consensus counters to new names

Testing

  • Yet to test
+125 -124

0 comment

9 changed files

pr created time in 6 days

push event ankushagarwal/libra

Ankush Agarwal

commit sha f4ad582bfde06d3595503423bf113853e3c875ba

WIP

view details

push time in 6 days

push event ankushagarwal/libra

Ankush Agarwal

commit sha e900732196808134325d90cf0cd95f1fe0019f55

WIP

view details

push time in 6 days

push event ankushagarwal/libra

Ankush Agarwal

commit sha c5679b6e524753699508d12e1577f41f2af914d8

WIP

view details

push time in 6 days

push event ankushagarwal/libra

Ankush Agarwal

commit sha 49f7241c335c9f96b78669ea025b19b428564f38

WIP

view details

push time in 6 days

create branch ankushagarwal/libra

branch : metrics

created branch time in 6 days

push event ankushagarwal/libra

Philip Hayes

commit sha 8b3234ff2c25012e519c7f8f0b01d77f7a192a92

[network][mempool] `MempoolNetworkSender` now wraps `NetworkSender` Closes: #1644 Approved by: phlip9

view details

Philip Hayes

commit sha 06d1c8bfe5b799dc77a781a7a66cec1326d24d05

[network][state-sync] `StateSynchronizerSender` now wraps `NetworkSender` + Rename `STATE_SYNCHRONIZER_MSG_PROTOCOL` to `STATE_SYNCHRONIZER_DIRECT_SEND_PROTOCOL` so it's consistent with other network application modules. Closes: #1644 Approved by: phlip9

view details

Philip Hayes

commit sha 7f5214e7cc5b4188b992ce1c222f9f8a872e9317

[network][ac] `AdmissionControlNetworkSender` now wraps `NetworkSender` Closes: #1644 Approved by: phlip9

view details

Philip Hayes

commit sha e9bab5e777e4d6cd14e57999258a405f74005d5c

[network] `HealthCheckerNetworkSender` now wraps `NetworkSender` Closes: #1644 Approved by: phlip9

view details

Andrey Chursin

commit sha ed2d3b44f60d7a5fc7091b63a872473dc69ac8be

[cluster-test] Use sudo when updating genesis.blob on deploy Closes: #1654 Approved by: ankushagarwal

view details

Andrey Chursin

commit sha 35901d2f913f9e124fd0d7e3f45dc73564232dfd

[cluster-test] Reload faucet account on every tx emit job (1) Faucet account data can get stale and needs to be reloaded between jobs (2) Cluster might not be healthy on cluster test startup and it might not be possible to load faucet account at that time Closes: #1655 Approved by: ankushagarwal

view details

Ankush Agarwal

commit sha e5fe4f0ba76689d5cc78a492fad616e19cc95175

[cluster-test] Add experiment to simulate multi-region environment and report result Summary This is the code which is used to run experiments with introducing network delays between nodes and simulation of multi-region With this we can specify a list of split sizes and delays and we will run the simulation for every combination of these two parameters Update NetworkDelay to be an Effect instead of Action Update flags for multi-region simulation because now we can run simulations with a variety of split sizes and delays instead of just one Complete rewrite of multi_region_network_simulation.rs to handle running simulations with a variety of configs Print a list of all metrics in a csv format at the end of the experiment Test Plan Tested this on my cluster

view details

push time in 6 days

pull request comment libra/libra

[cluster-test] Reload faucet account on every tx emit job

@bors-libra r+

andll

comment created time in 6 days

push event ankushagarwal/libra

Ankush Agarwal

commit sha 36d6e6416129f40856d23efbacdd6bdfb0125210

[cluster-test] Add experiment to simulate multi-region environment and report result Summary This is the code which is used to run experiments with introducing network delays between nodes and simulation of multi-region With this we can specify a list of split sizes and delays and we will run the simulation for every combination of these two parameters Update NetworkDelay to be an Effect instead of Action Update flags for multi-region simulation because now we can run simulations with a variety of split sizes and delays instead of just one Complete rewrite of multi_region_network_simulation.rs to handle running simulations with a variety of configs Print a list of all metrics in a csv format at the end of the experiment Test Plan Tested this on my cluster

view details

push time in 6 days

pull request comment libra/libra

[cluster-test] Use sudo when updating genesis.blob on deploy

@bors-libra r+

andll

comment created time in 6 days

push event ankushagarwal/libra

Andrey Chursin

commit sha cd317b6945e4810e9c1b4c863bcb29447607ae5f

[cluster-test] Introduce some convenience utils ``` ./cluster-test --discovery <print list of nodes> ./cluster-test --pssh -- echo Hello world <execute commands on all nodes> ``` Closes: #1647 Approved by: ankushagarwal

view details

Ankush Agarwal

commit sha 7c53a692df5a630569b6a7a08c010f726e1dfea9

[cluster-test] Add experiment to simulate multi-region environment and report result Summary This is the code which is used to run experiments with introducing network delays between nodes and simulation of multi-region With this we can specify a list of split sizes and delays and we will run the simulation for every combination of these two parameters Update NetworkDelay to be an Effect instead of Action Update flags for multi-region simulation because now we can run simulations with a variety of split sizes and delays instead of just one Complete rewrite of multi_region_network_simulation.rs to handle running simulations with a variety of configs Print a list of all metrics in a csv format at the end of the experiment Test Plan Tested this on my cluster

view details

push time in 6 days

PR opened libra/libra

[cluster-test] Add experiment to simulate multi-region environment and report result cluster_test

Summary

  • This code runs experiments that introduce network delays between nodes to simulate a multi-region environment
  • Update NetworkDelay to be an Effect instead of an Action
  • Update the flags for multi-region simulation, because we can now run simulations with a variety of split sizes and delays instead of just one
  • Complete rewrite of multi_region_network_simulation.rs to handle running simulations with a variety of configs
  • Print a list of all metrics in CSV format at the end of the experiment
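The commit message for this change says the simulation runs for every combination of split size and delay and then prints the metrics as CSV. A minimal sketch of that loop structure follows. The names `SimConfig`, `all_configs`, and `to_csv`, the reading of "split size" as a node count, and the single `metric` column are all assumptions for illustration; the real experiment code is not shown in this feed.

```rust
use std::time::Duration;

/// One simulation configuration (hypothetical shape): a split size and the
/// network delay applied for that run.
#[derive(Debug, Clone, PartialEq)]
struct SimConfig {
    split_size: usize,
    delay: Duration,
}

/// Builds every (split size, delay) combination, split sizes in the outer loop.
fn all_configs(split_sizes: &[usize], delays_ms: &[u64]) -> Vec<SimConfig> {
    let mut configs = Vec::with_capacity(split_sizes.len() * delays_ms.len());
    for &split_size in split_sizes {
        for &ms in delays_ms {
            configs.push(SimConfig { split_size, delay: Duration::from_millis(ms) });
        }
    }
    configs
}

/// Renders one CSV row per finished run; `metric` stands in for whatever the
/// experiment actually measures.
fn to_csv(rows: &[(SimConfig, u64)]) -> String {
    let mut out = String::from("split_size,delay_ms,metric\n");
    for (cfg, metric) in rows {
        out.push_str(&format!("{},{},{}\n", cfg.split_size, cfg.delay.as_millis(), metric));
    }
    out
}
```

Passing two split sizes and two delays to `all_configs` gives four configurations; each run would append one `(SimConfig, metric)` pair, and `to_csv` is called once at the end of the experiment.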

Test Plan

Tested this on my cluster

+207 -87

0 comment

4 changed files

pr created time in 6 days

push event ankushagarwal/libra

Sherry Xiao

commit sha 2a8bc02a2001a893a1caa1343e06adecee214311

[monitoring] change public metric whitelist to be a constant Closes: #1571 Approved by: bmwill

view details

Andrey Chursin

commit sha 104a6fd85308563d6320304283aa866c67d73467

[cluster-test] Increase deploy startup timeout Closes: #1636 Approved by: ankushagarwal

view details

Brandon Williams

commit sha b6556279fba0411994561566841b3d097223cd14

[x] skip whitespace lints for .exp files In some future patch a number of .exp files are going to be added to the repository. These files are used for testing the move source language and include the expected output from the move compiler. Since these files are the product of a third-party library, it's difficult to completely control their format and ensure that there are no whitespace violations. Due to this, let's just skip whitespace lints for all .exp files. Closes: #1634 Approved by: tnowacki

view details

Brandon Williams

commit sha d4d3c1b1cd5fcfe827a420adda34eb1c4c110ddd

[x] simplify extension checking in license lint Closes: #1634 Approved by: tnowacki

view details

Runtian Zhou

commit sha 2a16729cd54876888f99eec7b458d58be2e87ec5

[language] Implement block metadata transaction logic Closes: #1611 Approved by: dariorussi

view details

Runtian Zhou

commit sha d7e990914ba4d46ffcdd572edc3ec739dc6c2281

[language] Deprecate TransactionPayload::Program Closes: #1626 Approved by: dariorussi

view details

Andrey Chursin

commit sha 03c009f69a53911fa9b6a8d4cf15bea3bcd63eb9

[cluster-test] Remove kinesis log tail This was replaced with debug_interface_log_tail Closes: #1641 Approved by: ankushagarwal

view details

Andrey Chursin

commit sha 181a8bb3298a46895c5d9f84bac4e7ddab4bc7d3

[cluster-test] Remove log prune Log prune was previously used to clean up CloudWatch logs. Since we no longer use CloudWatch, this is not needed Closes: #1641 Approved by: ankushagarwal

view details

Philip Hayes

commit sha 7db2742ee88140fe32d9d56d06c8793b7d1c6440

[network][consensus] ConsensusNetworkSender now wraps `NetworkSender` + Refactored `chained_bft::NetworkSender` so it uses `ConsensusNetworkSender::send_to_many` instead of the previous `send_bytes` method, which no longer needs to exist. + Added `ValidatorVerifier::get_account_addresses_iter()`. We should be able to remove `get_ordered_account_addresses()` in a subsequent commit, since it does an unnecessary sort (`BTreeMap` already sorts on insertion). Closes: #1643 Approved by: bothra90

view details

Qinfan Wu

commit sha f03e55177c27eca14729cd53d2e4da71d4710175

[Storage][JellyfishMerkle] Extract some util functions in tests So we don't repeat the code. Closes: #1648 Approved by: lightmark

view details

Qinfan Wu

commit sha b6e06b236300550e3654c24b4f76d837f61592e3

[Storage][JellyfishMerkle] Fix get_with_proof for edge cases If there exist two keys that only differ from the last nibble, the code would have a problem. Closes: #1648 Approved by: lightmark

view details

Andrey Chursin

commit sha 82ebc9ac8e4d1ac4d760327859d890a17e1962b2

[cluster-test] Use sync log Instead of using async drain, using sync. Mainly two reasons: - Some parts of cluster test use println! for better UX, but when async drain is used output of println! and log! macro is mixed in a bad way - if program uses log! macro and terminates quickly, part of output can disappear because async thread did not process log There is no intense log output in cluster test so sync log is not an issue Closes: #1649 Approved by: ankushagarwal

view details

Andrey Chursin

commit sha 4167843ea2043f1b7e8f79ab5c89c6e872b22b67

[cluster-test] Update genesis.blob on deploy This will fetch genesis.blob generated by circle during deploy We need this because genesis.blob is updated relatively frequently and it breaks cluster test every time Fixes #1224 Closes: #1651 Approved by: ankushagarwal

view details

Dmitri Perelman

commit sha 9d510ae3717935dbd3f89cf6f880a7dca9ca80c4

LedgerInfo commit information aggregated in a BlockInfo struct Summary: The fields of LedgerInfo that describe the committed status of the Ledger are in fact identical to the fields of the block metadata that Consensus is carrying around (the only field of BlockInfo that is currently not present in LedgerInfo is a round). Any update to the LedgerInfo would have to be mirrored in BlockInfo because TCB needs to verify & sign it. Hence, this change is aggregating the LedgerInfo fields in the BlockInfo. We had to move BlockInfo from consensus types to libra types as a result of that. Testing: this is supposed to be a noop, existing unit test coverage Ref #1604 Closes: #1629 Approved by: zekun000

view details

Todd Nowacki

commit sha 82a70112f9b54607159c339aedc684546e291cd7

[language][Move] Added test framework - Added test framework for Move lang expected output tests - Added tests to check all of the stdlib files Closes: #1624 Approved by: vgao1996

view details

Ankush Agarwal

commit sha 252c93b1bbb52d97aecf34fa74306af8eafcf713

[cluster-test] Add experiment to simulate multi-region environment and report result

view details

push time in 6 days

pull request comment libra/libra

[cluster-test] Update genesis.blob on deploy

@bors-libra r+

andll

comment created time in 6 days

pull request comment libra/libra

[cluster-test] Use sync log

@bors-libra r+

andll

comment created time in 6 days

issue opened libra/libra

[cluster-test] Generate perf report from a validator during cluster-test

We want to get a perf report from validators using the Linux perf tool. The goal is to understand how much of each resource (CPU, memory, disk, etc.) the various modules (consensus, network, mempool, etc.) use during our experiments.

created time in 6 days

create branch ankushagarwal/libra

branch : network-delays-experiment

created branch time in 6 days

pull request comment libra/libra

[cluster-test] Remove unused code

@bors-libra r+

andll

comment created time in 6 days

pull request comment libra/libra

[cluster-test] Increase deploy startup timeout

@bors-libra r+

andll

comment created time in 7 days

started junegunn/fzf

started time in 8 days

delete branch ankushagarwal/libra

delete branch : refactor-libra-channel

delete time in 12 days

push event ankushagarwal/libra

Ankush Agarwal

commit sha c68c4daf6b278034be5a98b185e5640f0a6f88f3

WIP

view details

push time in 12 days

push event ankushagarwal/libra

Andrey Chursin

commit sha bf3e71207439a9fcd6a068289abbd7d2c4a0a04d

[cluster-test] Do not setup faucet and lb attachment for cluster test Setting up those resources take time during tf apply and they are not needed Closes: #1534 Approved by: sausagee

view details

Andrey Chursin

commit sha 6c763bf0d855771444fcca8e161ba0a09ec298fc

[cluster-test] Measure and report performance on every run This change measures TPS in two conditions (all nodes up and 10% down) and reports the numbers. This allows us to check that our transaction processing is not broken and that TPS did not regress. Closes: #1531 Approved by: ankushagarwal

view details

David Wolinsky

commit sha 21433960202a8370b40bc285a4686de234f67ee6

[consensus] Eliminate the field enforce_increasing_timestamps This is true everywhere (except fuzzing). Discussed many times on how this should be deleted... this is an easy cleanup as I ponder ways to make the code slightly easier to navigate and test. Closes: #1539 Approved by: dmitri-perelman

view details

Zekun Li

commit sha f90b9b46a3c44c7b527d6dd6744e65546ddc947c

[consensus][reconfig] add epoch to ConsensusState Closes: #1541 Approved by: davidiw

view details

Bob Wilson

commit sha bb025c779f8f0e0b8390fd9e691d2c23265bf00a

[language] new lexer for the ir-to-bytecode compiler This replaces the ugly code that I had previously hacked together from the output of lalrpop. There is still a lot more that could be done here, but at least this is a reasonable implementation for the existing interface and set of tokens. Closes: #1493 Approved by: tnowacki

view details

Aaron Gao

commit sha 1061ed0e56d4344d72cd568c35df73ce33c6a717

[proof] reverse the order of sibling in merkle proofs Closes: #1527 Approved by: lightmark

view details

Jack Moffitt

commit sha 9c462ba1788def2668f8b0ad0814b1c108df72f4

[build] Use correct features even when building individually vm-validator and mempool were both getting built incorrectly when built by themselves due to feature unification. After searching all the Cargo.tomls for similar problems, only these two cases were found. Closes: #1549 Approved by: bmwill

view details

Dmitri Perelman

commit sha 6b7b568b6f2d6c0c2f04ed43410b48e7ad1c3a1a

[Consensus] BlockRetrieval types Summary: BlockRetrieval types used to be scattered across several network-related files. This change is a noop from the business logic perspective: it just organizes the BlockRetrieval related types logic in consensus types. Testing: unit tests Closes: #1548 Approved by: zekun000

view details

Zekun Li

commit sha ba9cf3326924243d6168bee3653242b02c57564f

[consensus][reconfig] setup channels to propagate epoch changes Closes: #1538 Approved by: dmitri-perelman

view details

Ankush Agarwal

commit sha ffefed5812713fef6505f3f0e840ff35e8f7f950

[cluster-test] Add a cleanup method for performing cleanups Summary The cleanup method is intended to cleanup effects of experiments which might have crashed mid way. We run this before we run experiments. There is also a standalone flag for running cleanup. Test Plan [ec2-user:validator@ip-10-0-0-212 ~]$ ~/libra/target/cluster_test_docker_builder/cluster-test --workplace=kush --cleanup Oct 29 00:15:27.334 INFO Discovered 4 peers Oct 29 00:15:27.555 INFO Log tail thread started in 220 ms Oct 29 00:15:27.556 INFO RemoveNetworkEffects for 1e5d5a74(10.0.6.55) Oct 29 00:15:27.556 INFO RemoveNetworkEffects for 8deeeaed(10.0.1.66) Oct 29 00:15:27.556 INFO RemoveNetworkEffects for ab0d6a54(10.0.10.92) Oct 29 00:15:27.557 INFO RemoveNetworkEffects for 57ff8374(10.0.3.130) Closes: #1540 Approved by: andll

view details

Andrey Chursin

commit sha d593b65bedd0b9a4fd332f55cd011282fb072189

[cluster-test] Retry mint Currently mint can fail. We'll work with core infra to fix it, but for now this just adds retry Closes: #1552 Approved by: ankushagarwal

view details

Andrey Chursin

commit sha 0bc5fb03e4a10647ba288b3c0d80093a6e103120

[cluster-test] Set timeout for running ssh This sets 15 seconds for total ssh execution to prevent spurious failures like 'experiment did not complete on time' Closes: #1557 Approved by: ankushagarwal

view details

Ankush Agarwal

commit sha 51dd2b4261d6b80ee8207c02bfd6beb8f84b1ae5

[cluster-test] Use split_n_random for PacketLoss experiment Summary fn split_n_random provides the functionality of splitting a cluster's instances randomly. Re-using that logic in PacketLoss experiment Use debug level instead of info level in RemoveNetworkEffects Test Plan Ran it on my test cluster [ec2-user:validator@ip-10-0-0-212 ~]$ ~/libra/target/cluster_test_docker_builder/cluster-test --workplace=kush --packet-loss-experiment --packet-loss-percent-instances=50 --packet-loss-percent=20 --packet-loss-duration-secs=10 Oct 29 19:16:29.709 INFO Discovered 4 peers Oct 29 19:16:29.930 INFO Log tail thread started in 220 ms Oct 29 19:16:30.193 INFO Starting experiment Packet Loss 20.00% [1e5d5a74(10.0.6.55), ab0d6a54(10.0.10.92), ] Oct 29 19:16:30.193 INFO PacketLoss 20.00% for 1e5d5a74(10.0.6.55) Oct 29 19:16:30.585 INFO PacketLoss 20.00% for ab0d6a54(10.0.10.92) Oct 29 19:17:05.207 INFO Experiment finished, waiting until all affected validators recover Oct 29 19:17:10.211 INFO Experiment completed Closes: #1555 Approved by: andll

view details

Andrey Chursin

commit sha fd964e88d138ec26844f75675d7e59ccf40c8092

[cluster-test] Remove cleanup() from run_single_experiment() run_single_experiment runs in loop when running test suite, there is no point in cleaning up every time inside suite and it takes time. This diff introduces new fn cleanup_and_run for cleanup+run_single_experiment for cmdline utils Closes: #1558 Approved by: ankushagarwal

view details

Philip Hayes

commit sha 0672d5af6e1fd301d9f1d234c0c5480ccfff226e

[network] Add Dial/Disconnect Peer request types to network interface See #1516 Closes: #1554 Approved by: bothra90

view details

Philip Hayes

commit sha eeadab0a0cfe4fd845520ba7b929c0e9499fa315

[network] Wire-through Dial/Disconnect requests to PeerManager actor See #1516 Closes: #1554 Approved by: bothra90

view details

Abhay Bothra

commit sha a46f375ff7c1de648d96b1fcc1108db2c6275619

[network] Test multiplexing of yamux substreams within yamux substreams Closes: #1563 Approved by: phlip9

view details

Zekun Li

commit sha be9270387c2edeb3ed21abb9c87911a043005623

suppress dead code warning Closes: #1564 Approved by: bschwab

view details

Young Yang Liauw

commit sha 194713b84f5ec9e34b85c1065f1062701fd963bc

[coverage][easy] update code coverage runner Cargo build was updated recently. We are updating the runner to use `cargo xtest` to stay in sync. Closes: #1553 Approved by: bmwill

view details

Sherry Xiao

commit sha 008323c818c4d84ea0ffafef4020b428e72e5e04

[monitoring] Expose non-numeric metrics put git revision into environment variable Closes: #1532 Approved by: bmwill

view details

push time in 12 days

pull request comment libra/libra

[enhancement] Refactor libra_channel and remove MessageQueue trait

@bors-libra r+

ankushagarwal

comment created time in 12 days

push event ankushagarwal/libra

Andrey Chursin

commit sha 570a5509fac1604de4c11870639949c4acbdc24a

[cluster-test] Show changelog This diff adds changelog with list of commits between previous tested image and current tested image. Closes: #1570 Approved by: ankushagarwal

view details

Bob Wilson

commit sha 9f751e190f9de9891a9743c1e51fa7d7b5e1f9e1

[language] Change binary operator precedence to match Rust Swap the priority of binary XOR and binary OR operators, so that XOR has a higher precedence. Change the precedence of comparison operators to be higher than logical AND/OR. Closes: #1569 Approved by: tnowacki

view details

Sherry Xiao

commit sha 8328eba6f695b1c20eadd98e25a387898f76a768

[monitoring] Remove old counters and update dashboard Closes: #1546 Approved by: bothra90

view details

Gerardo Di Giacomo

commit sha 7cbaa10375e8e0643f295b3ae792dc6d70cd232c

remove rust_crypto as it's abandoned Closes: #1575 Approved by: kchalkias

view details

Sherry Xiao

commit sha cf7a1784ec729d0ad01126c864d670ed2241ae29

Use GIT_REVISION env if already exist setup env for cluster test Closes: #1572 Approved by: andll

view details

Andrey Chursin

commit sha 7326b7daf2cbd8e6c222de63d9ed3a8eb763e536

[cluster-test] Bump experiment deadline Looks like in rare cases reboot can take longer than 10min. This is rare, but in order to not fail the experiment it makes sense to bump this deadline Closes: #1579 Approved by: dmitri-perelman

view details

Brandon Williams

commit sha 0d82ee3f2688a9772f47665e4f400a5e0ccd2271

[x] skip running cargo if we have no packages to run Currently if an empty iterator is passed to `run_on_packages_together`, we'll happily run the cargo command with no `--package` args. This patch fixes this behavior and instead checks to see if we have any package args and does an early return if the provided iterator is empty. Closes: #1578 Approved by: metajack

view details

Andrey Chursin

commit sha 1f918aa09ab634cbbb1daa64b52c63f290e624b7

[docker] Move docker CMD into docker-run.sh Instead of having complicated command in docker CMD, this diff moves setting up environment into `docker-run.sh` file and sets CMD to this shell file. Closes: #1580 Approved by: opsguy

view details

Ankush Agarwal

commit sha 7bdb4bf9d7e118124e3da028c3c81ef0d36a21cc

[enhancement] Follow-up changes in libra_channel Summary Implement Drop trait for Sender and Receiver Keep track of when the receiver is dropped. The Sender will log crit! whenever it tries to send to a Receiver which has been dropped. When a Sender gets dropped, we will log a crit! message as well Related to #1483 Closes: #1490 Approved by: ankushagarwal

view details

Ankush Agarwal

commit sha 1cf91b1888805bd236592239092311b0fed3557d

[enhancement] Refactor libra_channel and remove MessageQueue trait

view details

push time in 12 days

delete branch ankushagarwal/libra

delete branch : libra-channel-impl

delete time in 12 days

Pull request review comment libra/libra

[cluster-test] Log rotate libra.log

 impl ClusterTestRunner {
             thread::sleep(Duration::from_secs(10));
             info!("Starting...");
         }
+        let now = Utc::now();
+        let suffix = format!(
+            ".{:04}{:02}{:02}-{:02}{:02}{:02}.gz",
+            now.year(),
+            now.month(),
+            now.day(),
+            now.hour(),
+            now.minute(),
+            now.second()
+        );
+        let suffix = &suffix;
+        info!("Fill use suffix {} for log rotation", suffix);
         let jobs = self
             .cluster
             .instances()
             .iter()
             .map(|instance| {
                 let instance = instance.clone();
                 move || {
-                    if let Err(e) =
-                        instance.run_cmd_tee_err(vec!["sudo", "rm", "-rf", "/data/libra/"])
-                    {
-                        info!("Failed to wipe {}: {:?}", instance, e);
-                    }
+                    instance
+                        .run_cmd_tee_err(vec!["sudo", "rm", "-rf", "/data/libra/*db"])
+                        .map_err(|e| info!("Failed to wipe {}: {:?}", instance, e))
+                        .ok();

nit: Correct me if I'm wrong, .ok() here and below is a no-op right? If so, can we remove it?
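On the `.ok()` question: `Result` is a `#[must_use]` type, so ending a statement with a bare `Result` normally triggers the `unused_must_use` lint, and `.ok()` converts it to an `Option`, which is a common way to discard it explicitly. Whether that is the reason it was used in this PR is a guess. The sketch below shows the pattern in isolation; `run_cmd` and `wipe` are hypothetical stand-ins, and a `Vec<String>` replaces the `info!` logger.

```rust
/// Hypothetical stand-in for `instance.run_cmd_tee_err(...)`.
fn run_cmd(succeed: bool) -> Result<(), String> {
    if succeed {
        Ok(())
    } else {
        Err("exit code 1".to_string())
    }
}

/// Runs the command, records a message on failure, and discards the error value.
/// Returns `Some(())` on success and `None` on failure.
fn wipe(succeed: bool, log: &mut Vec<String>) -> Option<()> {
    run_cmd(succeed)
        // The closure runs only on `Err`; it logs and maps the error to `()`.
        .map_err(|e| log.push(format!("Failed to wipe: {:?}", e)))
        // `Result<(), ()>` becomes `Option<()>`, so nothing is left unused.
        .ok()
}
```

At runtime `.ok()` only changes the type; the logging side effect comes entirely from `map_err`.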

andll

comment created time in 12 days

pull request comment libra/libra

[cluster-test] Log rotate libra.log

@bors-libra delegate+

andll

comment created time in 12 days

Pull request review comment libra/libra

[cluster-test] Log rotate libra.log

 impl ClusterTestRunner {
             thread::sleep(Duration::from_secs(10));
             info!("Starting...");
         }
+        let now = Utc::now();
+        let suffix = format!(
+            ".{:04}{:02}{:02}-{:02}{:02}{:02}.gz",
+            now.year(),
+            now.month(),
+            now.day(),
+            now.hour(),
+            now.minute(),
+            now.second()
+        );
+        let suffix = &suffix;
+        info!("Fill use suffix {} for log rotation", suffix);

nit: typo in Fill
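For reference, the format string in the diff zero-pads each field, so the suffix always has the same width and sorts chronologically. A std-only sketch that takes the date parts as plain integers instead of calling `Utc::now()`; the function name `rotation_suffix` is hypothetical:

```rust
/// Builds a log-rotation suffix of the form ".YYYYMMDD-HHMMSS.gz",
/// using the same format string as the diff above.
fn rotation_suffix(year: i32, month: u32, day: u32, hour: u32, minute: u32, second: u32) -> String {
    format!(
        ".{:04}{:02}{:02}-{:02}{:02}{:02}.gz",
        year, month, day, hour, minute, second
    )
}
```

A call such as `rotation_suffix(2019, 10, 29, 0, 15, 27)` produces `.20191029-001527.gz`.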

andll

comment created time in 12 days

pull request comment libra/libra

[enhancement] Improvements to libra_channel

@bors-libra r+

ankushagarwal

comment created time in 12 days

pull request comment libra/libra

Use GIT_REVISION env if already exist

Even with this PR, I am having trouble building cluster-test binary in docker

./docker/cluster_test/build.sh

error: failed to run custom build command for `libra-metrics v0.1.0 (/libra/common/metrics)`

Caused by:
  process didn't exit successfully: `/target/debug/build/libra-metrics-1e57c5216023b500/build-script-build` (exit code: 101)
--- stderr
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/libcore/result.rs:1165:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

warning: build failed, waiting for other jobs to finish...
error: build failed
sherry-x

comment created time in 13 days

PR opened libra/libra

[enhancement] Refactor libra_channel and remove MessageQueue trait enhancement

Summary

  • Follow up PR for @phlip9's suggestions in PR #1490
  • Rename PerValidatorQueue to PerKeyQueue because the Key is now a generic type
  • Remove MessageQueue trait and make PerKeyQueue a part of libra_channel
  • Reduce the visibility of PerKeyQueue to crate

This depends on PR #1490, so it contains that PR's commit as well.
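As a rough picture of what a crate-private queue keyed by a generic type could look like, here is a std-only sketch. Only the name `PerKeyQueue`, the generic key, and the `pub(crate)` visibility come from the summary above; the per-key capacity, the drop-when-full policy, and the round-robin pop order are assumptions, and the real implementation may differ.

```rust
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;

/// Sketch: one bounded FIFO per key, served in round-robin order across keys.
pub(crate) struct PerKeyQueue<K: Eq + Hash + Clone, T> {
    max_per_key: usize,
    queues: HashMap<K, VecDeque<T>>,
    // Keys that currently hold at least one message, in service order.
    ready: VecDeque<K>,
}

impl<K: Eq + Hash + Clone, T> PerKeyQueue<K, T> {
    pub(crate) fn new(max_per_key: usize) -> Self {
        Self { max_per_key, queues: HashMap::new(), ready: VecDeque::new() }
    }

    /// Enqueues `msg` under `key`. Returns false and drops the message
    /// if that key's queue is already full.
    pub(crate) fn push(&mut self, key: K, msg: T) -> bool {
        let q = self.queues.entry(key.clone()).or_insert_with(VecDeque::new);
        if q.len() >= self.max_per_key {
            return false;
        }
        if q.is_empty() {
            // The key becomes ready the moment it gets its first message.
            self.ready.push_back(key);
        }
        q.push_back(msg);
        true
    }

    /// Pops the oldest message of the next ready key, then moves that key
    /// to the back of the rotation if it still has messages.
    pub(crate) fn pop(&mut self) -> Option<T> {
        let key = self.ready.pop_front()?;
        let q = self.queues.get_mut(&key)?;
        let msg = q.pop_front();
        if !q.is_empty() {
            self.ready.push_back(key);
        }
        msg
    }
}
```

Keeping the key generic means the same queue works for validator identifiers or any other hashable key, which matches the rename from PerValidatorQueue described in the summary.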

Test Plan

  • Updated existing unit tests
+190 -189

0 comment

7 changed files

pr created time in 13 days

push event ankushagarwal/libra

Ankush Agarwal

commit sha 4c25a3c92a3f7e403df8e21e01be2b725b51c275

[enhancement] Refactor libra_channel and remove MessageQueue trait

view details

push time in 13 days

create branch ankushagarwal/libra

branch : refactor-libra-channel

created branch time in 13 days

delete branch ankushagarwal/libra

delete branch : multi-region-simulation

delete time in 13 days

push event ankushagarwal/libra

Ankush Agarwal

commit sha ffefed5812713fef6505f3f0e840ff35e8f7f950

[cluster-test] Add a cleanup method for performing cleanups Summary The cleanup method is intended to cleanup effects of experiments which might have crashed mid way. We run this before we run experiments. There is also a standalone flag for running cleanup. Test Plan [ec2-user:validator@ip-10-0-0-212 ~]$ ~/libra/target/cluster_test_docker_builder/cluster-test --workplace=kush --cleanup Oct 29 00:15:27.334 INFO Discovered 4 peers Oct 29 00:15:27.555 INFO Log tail thread started in 220 ms Oct 29 00:15:27.556 INFO RemoveNetworkEffects for 1e5d5a74(10.0.6.55) Oct 29 00:15:27.556 INFO RemoveNetworkEffects for 8deeeaed(10.0.1.66) Oct 29 00:15:27.556 INFO RemoveNetworkEffects for ab0d6a54(10.0.10.92) Oct 29 00:15:27.557 INFO RemoveNetworkEffects for 57ff8374(10.0.3.130) Closes: #1540 Approved by: andll

view details

Andrey Chursin

commit sha d593b65bedd0b9a4fd332f55cd011282fb072189

[cluster-test] Retry mint Currently mint can fail. We'll work with core infra to fix it, but for now this just adds retry Closes: #1552 Approved by: ankushagarwal

view details

Andrey Chursin

commit sha 0bc5fb03e4a10647ba288b3c0d80093a6e103120

[cluster-test] Set timeout for running ssh This sets 15 seconds for total ssh execution to prevent spurious failures like 'experiment did not complete on time' Closes: #1557 Approved by: ankushagarwal

view details

Ankush Agarwal

commit sha 51dd2b4261d6b80ee8207c02bfd6beb8f84b1ae5

[cluster-test] Use split_n_random for PacketLoss experiment Summary fn split_n_random provides the functionality of splitting a cluster's instances randomly. Re-using that logic in PacketLoss experiment Use debug level instead of info level in RemoveNetworkEffects Test Plan Ran it on my test cluster [ec2-user:validator@ip-10-0-0-212 ~]$ ~/libra/target/cluster_test_docker_builder/cluster-test --workplace=kush --packet-loss-experiment --packet-loss-percent-instances=50 --packet-loss-percent=20 --packet-loss-duration-secs=10 Oct 29 19:16:29.709 INFO Discovered 4 peers Oct 29 19:16:29.930 INFO Log tail thread started in 220 ms Oct 29 19:16:30.193 INFO Starting experiment Packet Loss 20.00% [1e5d5a74(10.0.6.55), ab0d6a54(10.0.10.92), ] Oct 29 19:16:30.193 INFO PacketLoss 20.00% for 1e5d5a74(10.0.6.55) Oct 29 19:16:30.585 INFO PacketLoss 20.00% for ab0d6a54(10.0.10.92) Oct 29 19:17:05.207 INFO Experiment finished, waiting until all affected validators recover Oct 29 19:17:10.211 INFO Experiment completed Closes: #1555 Approved by: andll

Andrey Chursin

commit sha fd964e88d138ec26844f75675d7e59ccf40c8092

[cluster-test] Remove cleanup() from run_single_experiment() run_single_experiment runs in a loop when running the test suite, so there is no point in cleaning up every time inside the suite, and cleanup takes time. This diff introduces a new fn cleanup_and_run that performs cleanup + run_single_experiment for the command-line utilities. Closes: #1558 Approved by: ankushagarwal
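The split described in this commit can be sketched as follows. The `Runner` type is a hypothetical stand-in that only counts calls; the method names `cleanup`, `run_single_experiment`, and `cleanup_and_run` come from the commit message, while `run_suite` and the choice to clean up once before the suite are assumptions made for illustration.

```rust
/// Hypothetical stand-in for the cluster-test runner; it only counts calls.
#[derive(Default)]
struct Runner {
    cleanups: u32,
    experiments_run: u32,
}

impl Runner {
    fn cleanup(&mut self) {
        self.cleanups += 1;
    }

    /// No longer cleans up on its own, so it is cheap to call in a loop.
    fn run_single_experiment(&mut self, _name: &str) {
        self.experiments_run += 1;
    }

    /// Entry point for one-off command-line runs: clean up first, then run.
    fn cleanup_and_run(&mut self, name: &str) {
        self.cleanup();
        self.run_single_experiment(name);
    }

    /// Assumed suite loop: one cleanup up front, then every experiment.
    fn run_suite(&mut self, names: &[&str]) {
        self.cleanup();
        for name in names {
            self.run_single_experiment(name);
        }
    }
}

fn main() {
    let mut runner = Runner::default();
    runner.run_suite(&["packet-loss", "reboot", "multi-region"]);
    runner.cleanup_and_run("packet-loss");
    println!(
        "cleanups: {}, experiments: {}",
        runner.cleanups, runner.experiments_run
    );
}
```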

Philip Hayes

commit sha 0672d5af6e1fd301d9f1d234c0c5480ccfff226e

[network] Add Dial/Disconnect Peer request types to network interface See #1516 Closes: #1554 Approved by: bothra90

Philip Hayes

commit sha eeadab0a0cfe4fd845520ba7b929c0e9499fa315

[network] Wire-through Dial/Disconnect requests to PeerManager actor See #1516 Closes: #1554 Approved by: bothra90

Abhay Bothra

commit sha a46f375ff7c1de648d96b1fcc1108db2c6275619

[network] Test multiplexing of yamux substreams within yamux substreams Closes: #1563 Approved by: phlip9

Zekun Li

commit sha be9270387c2edeb3ed21abb9c87911a043005623

suppress dead code warning Closes: #1564 Approved by: bschwab

Young Yang Liauw

commit sha 194713b84f5ec9e34b85c1065f1062701fd963bc

[coverage][easy] update code coverage runner Cargo build was updated recently. We are updating the runner to use `cargo xtest` to stay in sync. Closes: #1553 Approved by: bmwill

Sherry Xiao

commit sha 008323c818c4d84ea0ffafef4020b428e72e5e04

[monitoring] Expose non-numeric metrics put git revision into environment variable Closes: #1532 Approved by: bmwill

Sherry Xiao

commit sha 12546244f1d9970a79eaf10bd2058632703e1689

Update common/metrics/build.rs Co-Authored-By: Brandon Williams <bwilliams.eng@gmail.com> Closes: #1532 Approved by: bmwill

Ankush Agarwal

commit sha ca6a906de79dd4963d345160070c51bb091c1b91

[cluster-test] Refactor execute_jobs function into a separate module Summary We want to use the execute_jobs function directly from various experiments and not just ClusterTestRunner Test Plan Compiles successfully Closes: #1560 Approved by: andll

Brandon Williams

commit sha e02f72318a6170a78b37f570d28cec30f42ea499

[x] create an abstraction around invoking cargo commands Closes: #1568 Approved by: metajack

Brandon Williams

commit sha 53658705cc7f98131899da82d86b7c87f0faf2e6

[x] refactor common cargo logic for reuse Closes: #1568 Approved by: metajack

Brandon Williams

commit sha 75658222537bcf7e45ccd135335b8f791a862a00

[x] use --workspace instead of the deprecated --all Closes: #1568 Approved by: metajack

Brandon Williams

commit sha ca894eac9d1ef2e52d61cc555af90bd5e2a101ca

[x] run unit tests before system tests Currently package exceptions have tests run on them first, but since the testsuite is one of those exceptions we end up running all the end-to-end tests first. Debugging these e2e tests is a little more difficult than debugging normal unit tests, so if there happen to be any errors it would be easier to first debug the unit tests before taking a look at the e2e tests. To fix this, reorder the calls to `cargo test` to first run them on the packages without exceptions. Closes: #1568 Approved by: metajack

Brandon Williams

commit sha 85df5c7963d32a98bc6341ba2f1aae61ead04e07

[x] refactor how common arguments are passed to CargoCommand methods Closes: #1568 Approved by: metajack

Brandon Williams

commit sha e818d9ca4fd652dee42cd41f02aa48c353dcdc3f

[x] add check command Closes: #1568 Approved by: metajack

Brandon Williams

commit sha cb06036d5a7b2e651e2ae74c760c189462d62c82

[x] add clippy command Closes: #1568 Approved by: metajack

push time in 13 days

pull request comment libra/libra

[cluster-test] Multi region simulation

Should be generalizable, but it is something that I wouldn't want to do now.

With a 2-region simulation, we can measure TPS while varying the latency between the two regions, and also while varying the split ratio of the two regions. These two plots of TPS vs. latency and TPS vs. split ratio should be interesting.

With more than two regions, I think there will be a lot of moving parts and it will be hard to make valuable inferences.

@bors-libra r+

ankushagarwal

comment created time in 13 days

pull request comment libra/libra

[cluster-test] Show changelog

So the slack and github modules here are serving only cluster-test and I think they are really "cluster_test"_slack and "cluster_test"_github modules instead of general purpose modules.

I think it's just a matter of personal style, so I won't be pushy.

@bors-libra r+

andll

comment created time in 13 days

Pull request review comment libra/libra

[cluster-test] Show changelog

 struct Args {
     perf_run: bool,
     #[structopt(long, group = "action")]
     cleanup: bool,
+    #[structopt(long, group = "action")]
+    changelog: Option<String>,

Makes sense. I was confused as to what would happen if we just provide --changelog without any String; it looks like the structopt library won't allow this.

andll

comment created time in 13 days

delete branch ankushagarwal/libra

delete branch : network-delays

delete time in 13 days

Pull request review comment libra/libra

[Consensus] Block retrieval of descendants of a committed id

 use failure::prelude::*;
 use libra_crypto::hash::HashValue;
 use libra_types::crypto_proxies::ValidatorVerifier;
 use serde::{Deserialize, Serialize};
+use std::collections::HashSet;
 use std::convert::TryFrom;
 use std::fmt;
 
-/// RPC to get a chain of block of the given length starting from the given block id.
+#[derive(Serialize, Deserialize, Clone, Debug, PartialEq, Eq)]
+pub enum BlockRetrievalMode {

nit: This is a small enum, so we can make it Copy and remove retrieval_mode.clone() below

dmitri-perelman

comment created time in 14 days
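To illustrate the nit about making a small enum `Copy`, here is a minimal sketch. The diff shows only the enum's name and derives, so the variants `Ancestors` and `Descendants` and the `describe` function are hypothetical, and the serde derives are left out to keep the example free of external crates.

```rust
/// Hypothetical variants: the diff only shows the enum's name and derives.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum BlockRetrievalMode {
    Ancestors,
    Descendants,
}

/// Takes the mode by value; with `Copy` the caller keeps its own copy.
fn describe(mode: BlockRetrievalMode) -> &'static str {
    match mode {
        BlockRetrievalMode::Ancestors => "walk towards the root",
        BlockRetrievalMode::Descendants => "walk away from the committed block",
    }
}

fn main() {
    let retrieval_mode = BlockRetrievalMode::Descendants;
    // No `.clone()` needed: passing by value copies the enum.
    let first = describe(retrieval_mode);
    // `retrieval_mode` is still usable after being passed above.
    let second = describe(retrieval_mode);
    println!("{} / {}", first, second);
}
```

`Copy` can only be derived when every variant's payload is itself `Copy`, so this applies only if the real enum carries no heap-allocated data.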

push event ankushagarwal/libra

Philip Hayes

commit sha 0672d5af6e1fd301d9f1d234c0c5480ccfff226e

[network] Add Dial/Disconnect Peer request types to network interface See #1516 Closes: #1554 Approved by: bothra90

Philip Hayes

commit sha eeadab0a0cfe4fd845520ba7b929c0e9499fa315

[network] Wire-through Dial/Disconnect requests to PeerManager actor See #1516 Closes: #1554 Approved by: bothra90

Abhay Bothra

commit sha a46f375ff7c1de648d96b1fcc1108db2c6275619

[network] Test multiplexing of yamux substreams within yamux substreams Closes: #1563 Approved by: phlip9

Zekun Li

commit sha be9270387c2edeb3ed21abb9c87911a043005623

suppress dead code warning Closes: #1564 Approved by: bschwab

Young Yang Liauw

commit sha 194713b84f5ec9e34b85c1065f1062701fd963bc

[coverage][easy] update code coverage runner Cargo build was updated recently. We are updating the runner to use `cargo xtest` to stay in sync. Closes: #1553 Approved by: bmwill

Sherry Xiao

commit sha 008323c818c4d84ea0ffafef4020b428e72e5e04

[monitoring] Expose non-numeric metrics put git revision into environment variable Closes: #1532 Approved by: bmwill

Sherry Xiao

commit sha 12546244f1d9970a79eaf10bd2058632703e1689

Update common/metrics/build.rs Co-Authored-By: Brandon Williams <bwilliams.eng@gmail.com> Closes: #1532 Approved by: bmwill

Ankush Agarwal

commit sha ca6a906de79dd4963d345160070c51bb091c1b91

[cluster-test] Refactor execute_jobs function into a separate module Summary We want to use the execute_jobs function directly from various experiments and not just ClusterTestRunner Test Plan Compiles successfully Closes: #1560 Approved by: andll

Ankush Agarwal

commit sha b94585e80dd78343a9dcb83cb64dd86aa9bb9909

[enhancement] Create an experiment to simulate multi-region

Summary

Create a NetworkDelay action which adds network delay to a single instance using tc.
Create a MultiRegionSimulation experiment which simulates a two-region split among the instances:

  • Creates a virtual region1 and region2
  • Adds delay to all packets which go from region1 to region2
  • We don't need to do the reverse because region1 will be delaying all its responses to region2

Depends on PR #1560

Test Plan

Ran ~/libra/target/cluster_test_docker_builder/cluster-test --workplace=kush --multi-region-simulation --multi-region-split=1 --multi-region-exp-duration-secs=30 on my test cluster.
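The commit message says the delay is added with tc but does not show the commands. A common tc/netem recipe for delaying only traffic to specific destinations is a `prio` root qdisc, a `netem` child on one band, and `u32` filters that steer matching packets into that band. The sketch below builds those command strings for one region1 instance; the function name, the `eth0` device, the 50 ms delay, and the assumption that this matches the layout used by NetworkDelay are all illustrative.

```rust
/// Builds the shell commands a region1 instance could run to delay traffic
/// destined for region2 addresses. (Hypothetical helper; the actual
/// NetworkDelay action may use a different tc layout.)
fn network_delay_commands(device: &str, delay_ms: u64, region2_ips: &[&str]) -> Vec<String> {
    let mut cmds = vec![
        // Root prio qdisc with the default three bands (1:1, 1:2, 1:3).
        format!("tc qdisc add dev {} root handle 1: prio", device),
        // Attach netem to band 3 so only traffic steered there is delayed.
        format!(
            "tc qdisc add dev {} parent 1:3 handle 30: netem delay {}ms",
            device, delay_ms
        ),
    ];
    for ip in region2_ips {
        // Steer packets for each region2 address into the delayed band.
        cmds.push(format!(
            "tc filter add dev {} protocol ip parent 1:0 prio 3 u32 match ip dst {}/32 flowid 1:3",
            device, ip
        ));
    }
    cmds
}

fn main() {
    // Addresses borrowed from the log output earlier in the feed.
    let region2 = ["10.0.10.92", "10.0.3.130"];
    for cmd in network_delay_commands("eth0", 50, &region2) {
        println!("{}", cmd);
    }
}
```

Because only the region1 side delays outgoing packets, each round trip between the regions pays the delay once, which is consistent with the commit's note that the reverse direction needs no separate rule.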

push time in 14 days

delete branch ankushagarwal/libra

delete branch : refactor-thread-pool

delete time in 14 days

Pull request review comment libra/libra

[cluster-test] Show changelog

 struct Args {
     perf_run: bool,
     #[structopt(long, group = "action")]
     cleanup: bool,
+    #[structopt(long, group = "action")]
+    changelog: Option<String>,

Why do we need an Option<String> here instead of a String? We are not doing anything when args.changelog is None

andll

comment created time in 14 days

Pull request review comment libra/libra

[cluster-test] Show changelog

 impl ClusterTestRunner {
             }
             if let Some(hash_to_tag) = hash_to_tag.take() {
                 info!("Test suite succeed first time for `{}`", hash_to_tag);
-                if let Err(e) = self
+                let prev_commit = self
+                    .deployment_manager
+                    .get_tested_upstream_commit()
+                    .map_err(|e| warn!("Failed to get prev_commit: {:?}", e))
+                    .ok();
+                let upstream_commit = match self
                     .deployment_manager
                     .tag_tested_image(hash_to_tag.clone())
                 {
-                    self.report_failure(format!("Failed to tag tested image: {}", e));
-                    return;
-                }
+                    Err(e) => {
+                        self.report_failure(format!("Failed to tag tested image: {}", e));
+                        return;
+                    }
+                    Ok(upstream_commit) => upstream_commit,
+                };
                 let perf_msg = match self.measure_performance() {
                     Ok(report) => format!(
                         "Performance report:\n```\n{}\n```",
                         report.to_slack_message()
                     ),
-                    Err(err) => format!("No performance data:\n```\n{}\n```", err),
+                    Err(err) => {
+                        warn!("No performance data: {}", err);
+                        "No performance data".to_string()
+                    }
                 };
-                self.slack_message(format!(
-                    "Test suite passed. Tagged `{}` as `{}`\n{}",
-                    hash_to_tag, TESTED_TAG, perf_msg
-                ));
+                info!(
+                    "prev_commit: {:?}, upstream_commit: {}",
+                    prev_commit, upstream_commit
+                );
+                let changelog = self.get_changelog(prev_commit.as_ref(), &upstream_commit);
+                self.slack_changelog_message(format!("{}\n\n{}", changelog, perf_msg));
             }
             thread::sleep(self.experiment_interval);
         }
     }
 
+    fn get_changelog(&self, prev_commit: Option<&String>, upstream_commit: &str) -> String {

Similar to the Slack comment, can we move this function to the github module?

andll

comment created time in 14 days

Pull request review comment libra/libra

[cluster-test] Show changelog

 struct ClusterTestRunner {
     health_check_runner: HealthCheckRunner,
     deployment_manager: DeploymentManager,
     experiment_interval: Duration,
-    slack: Option<SlackClient>,
+    slack: SlackClient,
+    slack_log_url: Option<Url>,

Can we move these URLs into SlackClient? This struct is getting very big.

slack_message and slack_changelog_message can also be made methods of SlackClient

andll

comment created time in 14 days
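The refactor suggested in this review comment could take roughly the following shape. This is a sketch under stated assumptions: plain strings stand in for the `Url` type from the diff, the `changelog_url` field and the `sent` log are hypothetical, and messages are recorded in memory instead of being posted over HTTP. Only the names `SlackClient`, `slack_message`, and `slack_changelog_message` come from the review thread.

```rust
/// Hypothetical shape of a client that owns its webhook URLs.
struct SlackClient {
    log_url: Option<String>,
    changelog_url: Option<String>,
    /// (url, message) pairs recorded in place of real HTTP posts.
    sent: Vec<(String, String)>,
}

impl SlackClient {
    fn new(log_url: Option<String>, changelog_url: Option<String>) -> Self {
        SlackClient { log_url, changelog_url, sent: Vec::new() }
    }

    /// Records the message if a URL is configured; otherwise does nothing.
    fn post(&mut self, url: Option<String>, msg: String) {
        if let Some(url) = url {
            self.sent.push((url, msg));
        }
    }

    fn slack_message(&mut self, msg: String) {
        let url = self.log_url.clone();
        self.post(url, msg);
    }

    fn slack_changelog_message(&mut self, msg: String) {
        let url = self.changelog_url.clone();
        self.post(url, msg);
    }
}

/// The runner now holds a single field instead of a client plus loose URLs.
struct ClusterTestRunner {
    slack: SlackClient,
}

fn main() {
    let slack = SlackClient::new(Some("https://example.invalid/log".to_string()), None);
    let mut runner = ClusterTestRunner { slack };
    runner.slack.slack_message("Test suite passed".to_string());
    // No changelog URL is configured, so this message is dropped.
    runner.slack.slack_changelog_message("changelog".to_string());
    println!("{} message(s) recorded", runner.slack.sent.len());
}
```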

push event ankushagarwal/react-crash-todo

dependabot[bot]

commit sha f7353e9710d6ea424670acb65e5e82af9ce8d610

Bump eslint-utils from 1.4.0 to 1.4.3 Bumps [eslint-utils](https://github.com/mysticatea/eslint-utils) from 1.4.0 to 1.4.3. - [Release notes](https://github.com/mysticatea/eslint-utils/releases) - [Commits](https://github.com/mysticatea/eslint-utils/compare/v1.4.0...v1.4.3) Signed-off-by: dependabot[bot] <support@github.com>

Ankush Agarwal

commit sha ed340029e1f497927d0f575c07b86adbc2aad30d

Merge pull request #1 from ankushagarwal/dependabot/npm_and_yarn/eslint-utils-1.4.3 Bump eslint-utils from 1.4.0 to 1.4.3

push time in 14 days

PR merged ankushagarwal/react-crash-todo

Bump eslint-utils from 1.4.0 to 1.4.3 dependencies

Bumps eslint-utils from 1.4.0 to 1.4.3.

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


<details> <summary>Dependabot commands and options</summary> <br />

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot ignore this [patch|minor|major] version will close this PR and stop Dependabot creating any more for this minor/major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
  • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
  • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
  • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
  • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

</details>

+156 -23

0 comment

2 changed files

dependabot[bot]

pr closed time in 14 days

push event ankushagarwal/libra

Ankush Agarwal

commit sha 2d3d7a56118c86b87d607c788991a50f19e69691

[enhancement] Create an experiment to simulate multi-region

Summary

Create a NetworkDelay action which adds network delay to a single instance using tc.
Create a MultiRegionSimulation experiment which simulates a two-region split among the instances:

  • Creates a virtual region1 and region2
  • Adds delay to all packets which go from region1 to region2
  • We don't need to do the reverse because region1 will be delaying all its responses to region2

Depends on PR #1560

Test Plan

Ran ~/libra/target/cluster_test_docker_builder/cluster-test --workplace=kush --multi-region-simulation --multi-region-split=1 --multi-region-exp-duration-secs=30 on my test cluster.

push time in 14 days
