Nikhil Benesch (benesch) · @MaterializeInc · New York, NY · Systems engineer

ballista-compute/sqlparser-rs 488

Extensible SQL Lexer and Parser for Rust

benesch/autouseradd 2

👨‍🚒 put out fires started by `docker run --user`

benesch/backport 2

automatically backport pull requests

benesch/adspygoogle.dfp 1

setuptools fork of Google DoubleClick for Publishers API Python Client

benesch/backboard 1

have you backported your PRs today?

benesch/abomonation 0

A mortifying serialization library for Rust

benesch/abomonation_derive 0

A macros 1.1 #[derive(Abomonation)] implementation for the abomonation crate

benesch/Amethyst 0

Tiling window manager for OS X à la xmonad.

PR opened MaterializeInc/materialize

sql: correctly rearrange columns during inserts

Fixes #4890

@benesch I started writing sqllogictests for this but then I read in the developer documentation that all INSERTS are routed through Postgres. How do you suggest I test this bug fix?

+10 -12

0 comments

1 changed file

pr created time in 8 hours

issue opened MaterializeInc/materialize

Incorrect column assignment during INSERT queries

What version of Materialize are you using?

$ materialized -v
materialized v0.5.3-dev (2739f4b5f4ed84c10cd41ad8f4b05588c1d8c5dc)

How did you install Materialize?

  • [ ] Docker image
  • [ ] Linux release tarball
  • [ ] APT package
  • [ ] macOS release tarball
  • [ ] Homebrew tap
  • [x] Built from source

What was the issue?

If an INSERT statement specifies columns in a non-trivial order, Materialize rearranges the values incorrectly, causing either a query failure or, if the rearranged data types happen to match, silently corrupted data.

Is the issue reproducible? If so, please provide reproduction instructions.

To reproduce, run the following queries:

CREATE TABLE t (a int, b int, c int, d int, e int);
INSERT INTO t (d, a, e, b, c) VALUES (4, 1, 5, 2, 3);
SELECT * FROM t;

The expected output is:

 a | b | c | d | e
---+---+---+---+---
 1 | 2 | 3 | 4 | 5

but the actual output is:

 a | b | c | d | e
---+---+---+---+---
 2 | 4 | 3 | 1 | 5
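A correct implementation must permute each row of values from the statement's column order into the table's declared column order. A minimal, stdlib-only Rust sketch of that permutation logic (the function name and types are hypothetical, not Materialize's actual code):

```rust
/// Reorder a row of VALUES from the order given in the INSERT statement
/// into the table's declared column order. Returns None if a declared
/// column is missing from the statement's column list (a real
/// implementation would fill in column defaults instead).
fn reorder_row(
    table_cols: &[&str], // declared order: a, b, c, d, e
    stmt_cols: &[&str],  // order written in the INSERT: d, a, e, b, c
    values: &[i32],      // values as written: 4, 1, 5, 2, 3
) -> Option<Vec<i32>> {
    table_cols
        .iter()
        .map(|col| {
            // Find where this table column appears in the statement...
            let idx = stmt_cols.iter().position(|c| c == col)?;
            // ...and pull the corresponding value.
            Some(values[idx])
        })
        .collect()
}

fn main() {
    // Mirrors: INSERT INTO t (d, a, e, b, c) VALUES (4, 1, 5, 2, 3)
    let row = reorder_row(
        &["a", "b", "c", "d", "e"],
        &["d", "a", "e", "b", "c"],
        &[4, 1, 5, 2, 3],
    );
    assert_eq!(row, Some(vec![1, 2, 3, 4, 5])); // the expected output above
}
```

A real implementation would compute the permutation once per statement and apply it to every row, rather than searching per value.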

Please attach any applicable log files.

created time in 9 hours

PR opened MaterializeInc/materialize

repr: add support for all compacted date formats

Adds support for dates like 700203, 0010203 and 200102031

cc @quodlibetor

+81 -52

0 comments

2 changed files

pr created time in 14 hours

PR opened MaterializeInc/materialize

Introduce each DeltaQuery arrangement at most once

This PR streamlines the dataflow build for delta queries by deduplicating the import of arrangements into the inner region. Previously we imported k^2 arrangements; not at great cost, because they are all shared, but it still gummed up the presented dataflow. Now we import each arrangement at most twice (once each in its Alt and Neu variations), with a near cross-bar between the imported arrangements and the individual delta path pipelines (one per input relation).

+117 -62

0 comments

1 changed file

pr created time in 17 hours

issue opened MaterializeInc/materialize

Restrict arrangement imports to those we use

The dataflow layer currently imports all arrangements for a source, because we don't have great communication about which arrangements we will actually use. We roughly know the points where arrangements are used, as we perform analyses, but this information is not communicated back up.

While not a major problem, it creates cluttered dataflow graphs. (See attached screenshot: Screen Shot 2020-11-26 at 7.20.25 AM.)

created time in 17 hours

create branch MaterializeInc/materialize

branch: chris/tail-psycopg3

created branch time in a day

PR opened MaterializeInc/materialize

Create ingest benchmark using chbench

This benchmark starts chbench, generating data that populates some very simple views in Materialize (literally just row counts). After a predetermined amount of time, the benchmark stops chbench, records the state of each view, restarts Materialize, and measures the time each view needs to catch up.

Example output at the end of the test:

  3.6s: count_warehouse
  4.0s: count_nation
  4.4s: count_region
  4.7s: count_supplier
  5.9s: count_district
  6.3s: count_item
 17.9s: count_customer
 25.4s: count_neworder
 26.1s: count_history
 26.4s: count_order
 35.9s: count_stock
 85.6s: count_orderline

+4477 -2

0 comments

7 changed files

pr created time in a day

create branch MaterializeInc/materialize

branch: chris/ingest-perf-test

created branch time in a day

issue opened MaterializeInc/materialize

Cannot create custom types with non-zero numeric scale

In Postgres:

CREATE TYPE numeric_range AS RANGE (subtype = numeric(38,5));

In MZ (note that the cast shown below is taken from a branch on my fork that implements this feature):

materialize=> CREATE TYPE numeric_map AS MAP (key_type=text, value_type=numeric(38,2));

ERROR:  Expected right parenthesis, found left parenthesis
LINE 1: ...umeric_map AS MAP (key_type=text, value_type=numeric(38,2));

materialize=> CREATE TYPE numeric_map AS MAP (key_type=text, value_type=numeric);

CREATE TYPE

materialize=> SELECT '{a=>1.23}'::numeric_map;
 ?column?
----------
 {a=>1}
(1 row)

The underlying issue is that the value_type's GlobalId refers to the generic numeric type, which defaults to a scale of 0 and is not configurable anyhow.

created time in a day

issue comment readablesystems/sto

Core dump occurs when YCSB is executed.

Sorry, just getting around to responding here. Yes, the system can run with 8 GB of RAM. Are you still encountering the same core dump as before?

yqekzb123

comment created time in a day

issue opened MaterializeInc/materialize

Negative value for count rows in chbench neworder view

I've created a view with the following source:

materialize=> SHOW CREATE VIEW count_neworder;
               View                |                                                            Create View                                                             
-----------------------------------+------------------------------------------------------------------------------------------------------------------------------------
 materialize.public.count_neworder | CREATE VIEW "materialize"."public"."count_neworder" AS SELECT "count"(*) AS "num_neworders" FROM "materialize"."public"."neworder"
(1 row)

If I query its value, I get an apparently negative number of rows:

materialize=> select * from count_neworder;
 num_neworders 
---------------
       -549801
(1 row)

If I then create a materialized view on top of neworder, I get the following error when trying to select from it:

materialize=> create materialized view neworders as select * from neworder;
CREATE VIEW
materialize=> select * from neworders;
ERROR:  Negative multiplicity: -1 for [Int32(2304), Int32(1), Int32(1)]

My best guess is that something funky with Debezium is going on here.

created time in a day

push event readablesystems/sto-scripts

William Qian

commit sha 64804099ea9dd0f7f314ea845e9807fca7910b88

Experimental setup for vldbj20

view details

push time in a day

pull request comment MaterializeInc/materialize

wip: start pushing down projections

Ok, here's an updated version! It needs a rebase, but I think it preserves all the delta joins we have right now, though it's a bit weird. It needs a bit more cleanup and comments, but I wanted to get this out. I also tried removing the demand analysis and it didn't seem to make things much worse, though I need to take a closer look.

justinj

comment created time in a day

delete branch MaterializeInc/materialize

delete branch : chris/topic-replay-test

delete time in a day

create branch MaterializeInc/materialize

branch: chris/topic-replay-test

created branch time in a day

pull request comment MaterializeInc/materialize

repr: handle invalid dates using chrono's checks

Thanks again!

petrosagg

comment created time in 2 days

push event MaterializeInc/materialize

Petros Angelatos

commit sha 60b5ef432567cf21c66d9ac29521fa2bc542d6f5

repr: handle invalid dates using chrono's checks

The current code does some rudimentary checks before passing the data to chrono's `from_ymd` function, but invalid combinations can still be constructed, causing a panic. Switch to using `from_ymd_opt()` to handle all leap year and per-month limit edge cases.

Fixes #4880

Signed-off-by: Petros Angelatos <petrosagg@gmail.com>

view details

Brandon W Maister

commit sha fc3a6851610bed1869370e6f1614af997d60d821

Merge pull request #4881 from petrosagg/date-panics

repr: handle invalid dates using chrono's checks

view details

push time in 2 days

PR merged MaterializeInc/materialize

repr: handle invalid dates using chrono's checks

The current code does some rudimentary checks before passing the data to chrono's from_ymd function, but invalid combinations can still be constructed, causing a panic.

Switch to using from_ymd_opt() to handle all leap year and per-month limit edge cases.

Fixes #4880
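For illustration, the behavioral difference: chrono's `from_ymd` panics on an invalid year/month/day combination, while `from_ymd_opt` returns `None`. A dependency-free sketch of the kind of validity check involved (simplified; not chrono's actual implementation):

```rust
/// Return Some((y, m, d)) for a valid calendar date, None otherwise,
/// in the spirit of chrono's NaiveDate::from_ymd_opt.
fn ymd_opt(year: i32, month: u32, day: u32) -> Option<(i32, u32, u32)> {
    // Gregorian leap-year rule.
    let leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    let days_in_month = match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 => {
            if leap {
                29
            } else {
                28
            }
        }
        _ => return None, // month out of range
    };
    if day == 0 || day > days_in_month {
        return None; // day out of range for this month
    }
    Some((year, month, day))
}

fn main() {
    assert_eq!(ymd_opt(2019, 2, 30), None); // the input that caused the panic
    assert!(ymd_opt(2020, 2, 29).is_some()); // valid leap day
}
```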


+11 -2

7 comments

2 changed files

petrosagg

pr closed time in 2 days

issue closed MaterializeInc/materialize

Invalid input can cause panic during date parsing

What version of Materialize are you using?

$ materialized -v
materialized v0.5.2-dev (9a66d82a2c85246af2d04465cac03770f73593c4)

How did you install Materialize?

  • [ ] Docker image
  • [ ] Linux release tarball
  • [ ] APT package
  • [ ] macOS release tarball
  • [ ] Homebrew tap
  • [x] Built from source

What was the issue?

Server crashes if presented with an out of range date like 2020-02-30.

Is the issue reproducible? If so, please provide reproduction instructions.

Simply run SELECT '2019-02-30'::timestamp;

Please attach any applicable log files.

Relevant part of crash log:

materialized v0.5.2-dev (9a66d82a2) listening on 0.0.0.0:6875...
Nov 25 13:30:59.584 ERROR panic: <unnamed>: invalid or out-of-range date
   0: materialized::handle_panic
             at materialized/src/bin/materialized/main.rs:540:65
   1: core::ops::function::Fn::call
             at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/core/src/ops/function.rs:70:5
   2: std::panicking::rust_panic_with_hook
             at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/std/src/panicking.rs:573:17
   3: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/std/src/panicking.rs:476:9
   4: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/std/src/sys_common/backtrace.rs:153:18
   5: rust_begin_unwind
             at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/std/src/panicking.rs:475:5
   6: core::panicking::panic_fmt
             at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/core/src/panicking.rs:85:14
   7: core::option::expect_failed
             at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/core/src/option.rs:1213:5
   8: core::option::Option<T>::expect
             at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/core/src/option.rs:333:21
   9: chrono::naive::date::NaiveDate::from_ymd
             at /home/petrosagg/.cargo/registry/src/github.com-1ecc6299db9ec823/chrono-0.4.19/src/naive/date.rs:173:9
  10: repr::adt::datetime::ParsedDateTime::compute_date
             at repr/src/adt/datetime.rs:465:20
  11: repr::strconv::parse_timestamp_string
             at repr/src/strconv.rs:220:24
  12: repr::strconv::parse_timestamp
             at repr/src/strconv.rs:268:11
  13: expr::scalar::func::cast_string_to_timestamp
             at expr/src/scalar/func.rs:382:5
  14: expr::scalar::func::UnaryFunc::eval
             at expr/src/scalar/func.rs:2538:49
  15: expr::scalar::ScalarExpr::eval
             at expr/src/scalar/mod.rs:554:53
  16: expr::scalar::ScalarExpr::reduce::{{closure}}
             at expr/src/scalar/mod.rs:279:50
  17: expr::scalar::ScalarExpr::reduce::{{closure}}
             at expr/src/scalar/mod.rs:284:26
  18: expr::scalar::ScalarExpr::visit_mut
             at expr/src/scalar/mod.rs:180:9

closed time in 2 days

petrosagg

push event readablesystems/sto

William Qian

commit sha 1156fac4faf22a723655ab91d1069256393f8a3b

Default to 1ms gc instead of 100ms

view details

push time in 2 days

issue comment MaterializeInc/materialize

Support User-defined Functions

It would be really great to be able to write transform/reduce functions using WASM.

Just an idea, and I'm not sure whether it's practical: there could be an opportunity for Materialize to provide some kind of composition-based SDK, which would allow for further optimization in some cases.

awang

comment created time in 2 days

pull request comment MaterializeInc/materialize

repr: handle invalid dates using chrono's checks

will make sure to run those checks locally

Don't feel bad about this! It's standard for CI to just work, and there are a ton of checks that are annoying to run.

The most likely thing to fail that won't match what your IDE gives you is bin/check, because we have a bunch of customizations for clippy that are only tracked there.

petrosagg

comment created time in 2 days

pull request comment MaterializeInc/materialize

repr: handle invalid dates using chrono's checks

if you request review from me that will help me see your PRs faster

Thanks, I'll do that for my next ones

Can you verify that you can see the failure in buildkite? https://buildkite.com/materialize/tests/builds/12258#_

Yep, I can see it. I just sent a fix and will make sure to run those checks locally :facepalm:

petrosagg

comment created time in 2 days

pull request comment MaterializeInc/materialize

repr: handle invalid dates using chrono's checks

It's my first PR here :)

And it's a good one! Thanks for finding this bug!

The issue is unfortunately that because we run the tests on our private infrastructure, we need to code-review PRs from outside contributors before the tests execute, and then manually kick them off.

So, since this will continue to be an issue, it's worth an FYI: if the tests fail in mkpipeline, that means we need to manually start a build (requesting review from me will help me see your PRs faster); if they fail in any other step, that is something for you to look at.

The current failure is a valid lint failure, 4 different test executions are all pointing to the same thing. Can you verify that you can see the failure in buildkite? https://buildkite.com/materialize/tests/builds/12258#_

In case you can't, this is the issue:

error: use of `ok_or` followed by a function call
   --> src/repr/src/adt/datetime.rs:465:59
    |
465 |                 NaiveDate::from_ymd_opt(year, month, day).ok_or("invalid or out-of-range date".into())
    |                                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: try this: `ok_or_else(|| "invalid or out-of-range date".into())`
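The distinction clippy is drawing: `ok_or` evaluates its argument eagerly, so the error value is constructed even when the `Option` is `Some`, while `ok_or_else` takes a closure that runs only in the `None` case. A small self-contained illustration (the values here are hypothetical):

```rust
fn main() {
    let parsed: Option<i32> = Some(5);

    // Eager: the error String is allocated even though it is discarded here.
    let eager: Result<i32, String> =
        parsed.ok_or(String::from("invalid or out-of-range date"));

    // Lazy: the closure only runs if `parsed` is None.
    let lazy: Result<i32, String> =
        parsed.ok_or_else(|| String::from("invalid or out-of-range date"));

    assert_eq!(eager, Ok(5));
    assert_eq!(lazy, Ok(5));
}
```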
petrosagg

comment created time in 2 days

pull request comment MaterializeInc/materialize

repr: handle invalid dates using chrono's checks

Maybe a race condition between the build starting and signing the CLA? It's my first PR here :)

petrosagg

comment created time in 2 days

pull request comment MaterializeInc/materialize

repr: handle invalid dates using chrono's checks

I'm looking into the build failure, it's an authorization failure not something wrong with your PR.

petrosagg

comment created time in 2 days

pull request comment MaterializeInc/materialize

repr: handle invalid dates using chrono's checks

Thanks!

petrosagg

comment created time in 2 days

pull request comment MaterializeInc/materialize

repr: handle invalid dates using chrono's checks

CLA assistant check: Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. (You have signed the CLA already but the status is still pending? Let us recheck it.)

petrosagg

comment created time in 2 days
