Christian Legnitto LegNeato Robinhood San Francisco Ex-@facebook, ex-@mozilla, ex-@apple. Engineering manager who pretends he can still code and does it as much as possible.

graphql-rust/graphql-client 568

Typed, correct GraphQL requests and responses in Rust

LegNeato/asciinema-rs 55

Asciinema client written in Rust

LegNeato/aws-lambda-events 32

Rust event types for AWS Lambda

LegNeato/bztools 32

Models and scripts to access the Bugzilla REST API.

LegNeato/bugzilla-push 12

A Bugzilla extension that enables integration with a message broker via AMQP or STOMP

jbalogh/bztools 2

Models and scripts to access the Bugzilla REST API.

LegNeato/asciicast-rs 2

Rust library for the Asciicast file format used by Asciinema

LegNeato/atom-in-orbit 2

Putting Atom in the browser

LegNeato/carrot 2

AMQP Messaging Framework for Python

LegNeato/adbkit-apkreader 0

Extracts information from APK files.

issue comment spacejam/sled

Question: Is Sled multi-process safe?

I think the answer is: "sled cannot run in a multi-process situation, but if the OS / FS supports advisory locks and implements them correctly, the data file will not be corrupted." So it's safe if multiple processes are unintentionally run at the same time. If I'm wrong, please correct me.

Check here:

https://github.com/spacejam/sled/blob/v0.34.6/src/config.rs#L580-L608
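For readers unfamiliar with advisory locks, here is a minimal, hypothetical sketch of the general mechanism using the fs2 crate (an illustration only, not sled's actual code): an exclusive advisory lock on a lock file keeps a second process from opening the same data directory.

// Illustration of an advisory file lock guarding a database directory.
// Hypothetical sketch: not sled's implementation, just the general mechanism.
use fs2::FileExt;
use std::fs::{create_dir_all, OpenOptions};

fn main() -> std::io::Result<()> {
    create_dir_all("my_db")?;
    let lock_file = OpenOptions::new()
        .create(true)
        .write(true)
        .open("my_db/DB_LOCK")?;

    // A second process would fail here instead of scribbling over the data files.
    lock_file.try_lock_exclusive()?;

    // ... safe to open and use the data files while the lock is held ...

    lock_file.unlock()?;
    Ok(())
}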

ckaran

comment created time in an hour

issue opened graphql-rust/juniper

Trying to return a `Ref<_>` gives error about unimplemented trait `IntoResolvable`

Hi -- I have a memoised collection on a GraphQL object in my schema:

pub struct Section {
    pub id: i32,
    pub slug: String,
    pub name: String,
    pub memoised_categories: RefCell<(bool, Vec<Category>)>,
}

#[juniper::object(
    Context = Context,
)]
impl Section {
    
    pub fn id(&self) -> i32 {
        self.id
    }

    pub fn slug(&self) -> &str {
        &self.slug
    }

    pub fn name(&self) -> &str {
        &self.name
    }

    pub fn categories(&self, context: &Context) -> FieldResult<Vec<Category>> {
        self.all_categories(context)
    }

    /// This causes the problems
    ///
    pub fn categories2(&self, context: &Context) -> FieldResult<Ref<Vec<Category>>> {
        unimplemented!()
    }
    
}

impl Section {

    pub fn new(id: i32, slug: String, name: String) -> Section {
        Section {
            id,
            slug,
            name,
            memoised_categories: RefCell::new((false, vec![])),
        }
    }
    pub fn all_categories(&self, context: &Context) -> FieldResult<Vec<Category>> {
        // Set memoised categories if not previously set
        if !self.memoised_categories.borrow().0 {
            let mut mc = self.memoised_categories.borrow_mut();
            mc.1 = context.repos.categories.all_by_section_id(self.id)?;
            mc.0 = true;
        }

        // TODO: Return a ref (or Ref or something made readable to Juniper) instead of cloning all
        // the categories stored.
        let v = self.memoised_categories.borrow().1.iter().map(|cat| cat.clone()).collect();

        Ok(v)
    }

}

The method categories calls all_categories which:

  • Checks the value exists
  • Populates it if it doesn't
  • Clones the value (a 2D Vec, so not cheap)
  • Returns the value

What I really want to be able to do is return a reference to the memoised value, but that doesn't appear to be possible. I've tried using Ref::map(...), which returns the memoised value wrapped in a Ref, and I think that's probably the correct approach to take. However, Juniper does not appear to like dealing with Refs. When I try to return one to Juniper, as in the categories2 method above, I get this compilation error:

error[E0277]: the trait bound `std::result::Result<Ref<'_, Vec<schema::categories::Category>>, FieldError>: IntoResolvable<'_, DefaultScalarValue, _, context::Context>` is not satisfied
  --> src/schema/sections.rs:17:1
   |
17 | / #[juniper::object(
18 | |     Context = Context,
19 | |     description = "\
20 | |       A Section for Items.\
...  |
23 | |     ",
24 | | )]
   | |__^ the trait `IntoResolvable<'_, DefaultScalarValue, _, context::Context>` is not implemented for `std::result::Result<Ref<'_, Vec<schema::categories::Category>>, FieldError>`
   |
   = help: the following implementations were found:
             <std::result::Result<(&'a <T as GraphQLType<S>>::Context, T), FieldError<S>> as IntoResolvable<'a, S, T, C>>
             <std::result::Result<T, E> as IntoResolvable<'a, S, T, C>>
             <std::result::Result<std::option::Option<(&'a <T as GraphQLType<S>>::Context, T)>, FieldError<S>> as IntoResolvable<'a, S, std::option::Option<T>, C>>

IntoResolvable is hidden from the documentation and lives in a private module, so I'm guessing that trying to implement it for Ref<...> isn't the answer I need. Can anybody offer any advice on how to memoise the value (and hide the mutability, of course!) and return a reference to it? Thanks, Doug.
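For context, here is a minimal sketch of the Ref::map pattern referred to above, independent of Juniper (the Cache type and its contents are made up for illustration). It narrows a RefCell borrow down to the memoised Vec without cloning it:

use std::cell::{Ref, RefCell};

struct Cache {
    // (is_populated, values) pair, mirroring the layout in the issue.
    memoised: RefCell<(bool, Vec<String>)>,
}

impl Cache {
    fn values(&self) -> Ref<'_, Vec<String>> {
        // Populate the cache on first access.
        if !self.memoised.borrow().0 {
            let mut mc = self.memoised.borrow_mut();
            mc.1 = vec!["a".to_string(), "b".to_string()];
            mc.0 = true;
        }
        // Narrow the borrow to just the Vec, without cloning it.
        Ref::map(self.memoised.borrow(), |(_, v)| v)
    }
}

fn main() {
    let cache = Cache { memoised: RefCell::new((false, vec![])) };
    assert_eq!(cache.values().len(), 2);
}

Whether Juniper can resolve a field of type Ref<'_, Vec<T>> is exactly the open question in this issue; the sketch only shows the borrowing side.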

created time in an hour

fork arlyon/cargo-chef

A cargo-subcommand to speed up Rust Docker builds using Docker layer caching.

fork in 2 hours

fork spacejam/ed25519-dalek

Fast and efficient ed25519 signing and verification in Rust.

fork in 3 hours

issue opened kyren/webrtc-unreliable

Is graceful shutdown possible?

Is there a way to gracefully shut down the rtc_server.recv() loop without waiting for the next packet?
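Not specific to webrtc-unreliable, but one common pattern is to race the receive call against a shutdown signal with tokio::select!. Below is a generic, hypothetical sketch where a plain tokio channel stands in for rtc_server.recv():

use tokio::sync::{mpsc, oneshot};

// A receive loop that can be stopped without waiting for the next packet.
// `packets` stands in for an async source such as rtc_server.recv().
async fn recv_loop(mut packets: mpsc::Receiver<Vec<u8>>, mut shutdown: oneshot::Receiver<()>) {
    loop {
        tokio::select! {
            // Fires as soon as the shutdown sender is used or dropped,
            // even if no packet ever arrives.
            _ = &mut shutdown => break,
            maybe_packet = packets.recv() => match maybe_packet {
                Some(packet) => println!("received {} bytes", packet.len()),
                None => break, // all senders dropped
            },
        }
    }
}

#[tokio::main]
async fn main() {
    let (_packet_tx, packet_rx) = mpsc::channel::<Vec<u8>>(16);
    let (shutdown_tx, shutdown_rx) = oneshot::channel();

    let handle = tokio::spawn(recv_loop(packet_rx, shutdown_rx));

    // Ask the loop to stop even though no packet will ever arrive.
    let _ = shutdown_tx.send(());
    handle.await.unwrap();
}

Whether webrtc-unreliable's Server lends itself to being polled inside select! like this is part of the question; the sketch only shows the select-plus-signal shape.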

created time in 5 hours

pull request comment graphql-rust/juniper

Allow raw identifier for field arguments

@zaksabeast

it looks like arguments can't currently be used with methods of a trait using the graphql_interface proc macro.

Hmmm, right. I hadn't figured that out. So #[graphql_interface] doesn't have this problem at all.

Would adding a test to integration_tests/juniper_tests/src/codegen/derive_object_with_raw_idents.rs make sense, or would a new file be preferred?

It would be better to add a mod raw_arguments to integration_tests/juniper_tests/src/codegen/impl_object.rs, like the fallible module does.

Thanks!

zaksabeast

comment created time in 6 hours

issue opened dagster-io/dagster

[Docs] Introduce solid-level tags when introducing pipeline-level tags

Summary

It's not obvious to users that they can tag both solids and pipelines. We should at least update this example to mention solid-level tags or include a solid tag example.

Reproduction

Dagit UI/UX Issue Screenshots

Additional Info about Your Environment

created time in 6 hours

issue opened dagster-io/dagster

define_dagstermill_solid should allow users to pass in solid tags and description

Summary

https://docs.dagster.io/_modules/dagstermill/solids#define_dagstermill_solid

For example, this is important for running dagstermill solids in Kubernetes with resource constraints.

Reproduction

Dagit UI/UX Issue Screenshots

Additional Info about Your Environment

created time in 6 hours

issue comment graphql-rust/graphql-client

Extend keyword?

It's indeed not implemented yet. There is an old issue about this.

lthiery

comment created time in 7 hours

push event dagster-io/dagster

prha

commit sha f9e2bf39ba9629b7445fe70102b6ab3aeca9d80d

[sensors-9] add graphql queries for jobs / sensors

Summary: Include both sensor-based queries off of the repository, and job-based queries off of the root.

Test Plan: bk
Reviewers: dish, dgibson
Reviewed By: dgibson
Differential Revision: https://dagster.phacility.com/D5262

view details

push time in 7 hours

issue opened dagster-io/dagster

[Dagit docs] Asset + Schedule + Partitions graphs are mostly empty

Summary

Not sure if this is a bug, but the Dagit screenshots in the docs (especially those that contain graphs) are mostly empty.

[Three screenshots attached: Screen Shot 2020-11-24 at 9 19 22 PM, 9 19 32 PM, and 9 19 41 PM]

Reproduction

Dagit UI/UX Issue Screenshots

Additional Info about Your Environment

created time in 9 hours

issue opened spacejam/sled

API for checkpoint / snapshot / backup system

Use Case:

Rocks equivalent:

  • https://github.com/facebook/rocksdb/wiki/Checkpoints
  • https://docs.rs/rocksdb/0.15.0/rocksdb/checkpoint/struct.Checkpoint.html

The primary use case here is exactly as described for RocksDB: full & incremental backups. In my specific case, I am building a distributed data storage system that uses async-raft for consensus, which includes a protocol where snapshots are sent to new nodes to bring them up to speed. Good stuff.

Proposed Change:

The API I am envisioning is as follows.

  • sled::Db gets a new method checkpoint(path: impl AsRef<Path>), which generates a new DB checkpoint written to the given path (a hypothetical usage sketch follows this list).
  • Given the nature of checkpoints, it seems logical that this method only be exposed on the DB type and not on Trees (maintainers would definitely know better on this one).
  • Some discussion is merited around incremental checkpoints / backups and what patterns would be best for this.
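As a rough illustration of the proposed (not yet existing) API, usage might look like the following hypothetical sketch; checkpoint here is exactly the method proposed above and does not exist in sled today:

use std::path::Path;

// Hypothetical: `checkpoint` is the proposed API, not a current sled method.
fn snapshot_for_raft(db: &sled::Db, snapshot_dir: &Path) -> sled::Result<()> {
    // Write a consistent, point-in-time copy of the whole database (all trees)
    // to `snapshot_dir`; that directory can then be shipped to a new node.
    db.checkpoint(snapshot_dir)?;
    Ok(())
}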

Who Benefits From The Change(s)?

Anyone and everyone looking for full / incremental backups, and the various use cases which emerge from that capability.

Alternative Approaches

Instead of calling this new API method checkpoint, we could call it backup.

created time in 9 hours

issue opened dagster-io/dagster

[Docs] Explain pluggability of the Dagster system

Use Case

It does not seem clear to users how easily extensible Dagster is. I'd like to propose adding a section to the docs that details which components users can swap out with in-house implementations, the responsibilities of those components, and what interfaces need to be satisfied (plus info about API stability).

One example that comes to mind is conveying the flexibility of the execution stack / how Dagster is not tied to a single execution engine. It would be nice to have docs that answer questions like "If you want to launch the run on some other compute substrate, implement the RunLauncher class -- the important methods here are launch_run, can_terminate, terminate, etc. Here are some example implementations: A, B, C. Reach out to us on Slack, etc." (There is an argument that this should wait until the potential Run Launcher + Executor consolidation.)

Ideas of Implementation

Additional Info

created time in 10 hours

issue opened dagster-io/dagster

Celery-less K8s deployment that offers step-level isolation

Use Case

Investigate whether this might be a reasonable deployment option.

Ideas of Implementation

Additional Info

created time in 10 hours

issue opened dagster-io/dagster

[K8s run launchers x grpc user code] Unable to launch run with specific image

Use Case

Currently, if a user uses the K8sRunLauncher or CeleryK8sRunLauncher with gRPC user code deployments, we do not allow them to specify a particular job image; we force them to use the current image in the gRPC user code deployment.

This was originally intended to prevent users from accidentally passing in a job_image (via instance config / executor config) while expecting that the latest image within the user code server would be used. However, it seems that this is too strict as it completely blocks the ability to execute a run using a prior image tag.

The easy fix is to remove the added validation code. We may want to add visual cues in the UX so that users know whether a user-specified image or the gRPC deployment's image will be used.

Ideas of Implementation

Additional Info

created time in 11 hours

create branch dagster-io/dagster

branch: prha/test_graphql

created branch time in 11 hours

issue opened dagster-io/dagster

[K8s run launchers x grpc user code images] Docker image is not stored in runs DB

Use Case

When users kick off a pipeline run with the K8sRunLauncher / CeleryK8sRunLauncher and user code deployments configured, the run launcher fetches the Docker image + tag from the user code deployment, but this image + tag is not saved anywhere in the events DB or runs DB. Recording it is important for understanding which version of the code was running and for reproducing the results.

The easiest approach is perhaps to add this to the launch functions:

# Record the resolved image as a run tag so it is visible on the run.
self._instance.add_run_tags(
    run.run_id,
    {"job_image": job_image},
)

However, this is not super ergonomic for re-execution, as (1) the tag is only visible on the Runs page but not when viewing a specific historical pipeline run, and (2) lots of copy + pasting is required to actually re-execute the pipeline with that image.

Ideas of Implementation

Additional Info

created time in 11 hours

issue opened dagster-io/dagster

Explain Celery when mentioned in docs

Summary

The docs assume that readers know what Celery is and what it is used for, but that's not a reasonable assumption. Many users have asked what Celery is, what it is for, how to know whether they need it, what happens if they don't use it, etc.

Reproduction

Dagit UI/UX Issue Screenshots

Additional Info about Your Environment

created time in 12 hours

issue opened google/starlark-rust

Working with eval'd types for unit tests

I'm currently defining my own Starlark types and want to write some unit tests for the Starlark code. This is effectively what I'm trying to do:

#[test]
pub fn simple_test() {
  {
    let simple = run_starlark_and_get_type::<PortDefinition>("pyra_port('8080')");
    assert_eq!(simple.port, 8080);
    assert_eq!(simple.tcp, true);
  }
}

fn run_starlark_and_get_type<T: starlark::values::TypedValue>(starlark_code: &str) -> starlark::values::cell::ObjectRef<T> {
  let res = run_starlark(starlark_code);
  if let Err(err) = res {
    panic!(
      "Running Starlark Code Error'd:\n-----CODE-------\n{}\n------------\n\n------ERROR-----\n{:?}\n------------\n",
      starlark_code,
      err,
    );
  }
  let value = res.unwrap();

  let downcast_attempt = value.downcast_ref::<T>();
  if downcast_attempt.is_none() {
    panic!(
      "Running Starlark Code Returned Incorrect type. Actual type is: {:?}\n-----CODE-------\n{}\n------------\n",
      value.get_type(),
      starlark_code,
    );
  }

  downcast_attempt.unwrap()
}

fn run_starlark(starlark_code: &str) -> Result<Value> {
  let (mut global_env, mut type_values) = global_environment();
  pyra_module(&mut global_env, &mut type_values);
  global_env.freeze();
  let mut env = global_env.child("simple-unit-test");
  let map = Arc::new(Mutex::new(codemap::CodeMap::new()));

  let result = eval(
      &map,
      "stdin",
      &starlark_code,
      Dialect::Bzl,
      &mut env,
      &type_values,
      global_env.clone(),
  );

  // Same behavior as the if/else-unwrap dance, just more direct.
  result.map_err(|err| color_eyre::eyre::eyre!(format!("{:?}", err)))
}

Unfortunately this does not work because starlark::values::cell is pub(crate) rather than pub. Is there any particular reason for this module to be pub(crate) instead of fully public? Is there an easier way to do this "run this string and give me this TypedValue" workflow?

created time in 12 hours

issue closed dagster-io/dagster

Rethink system storage "is_persistent" flag

Right now the docstring is:

        is_persistent (bool): Whether the storage is persistent in a way that can cross process/node
            boundaries. Execution with, for example, the multiprocess executor, or with
            dagster-airflow, requires a persistent storage mode.

However, our filesystem storage is marked as persistent, which may or may not support crossing node boundaries (e.g., locally-mounted NFS yes, local disk no).

For our storage, we should differentiate between "this will outlast the Dagster process" vs. "this will be accessible across machine boundaries"

closed time in 13 hours

natekupp

issue comment dagster-io/dagster

Rethink system storage "is_persistent" flag

I'd like to argue that, aside from catching particular errors that we've observed come up frequently, we should back off from trying to model the interaction between executors and storages.

I think this can get infinitely complicated:

  • Executors could place steps in sandboxed environments that don't have access to the outside world.
  • Executors could place steps in environments that don't have access to a particular storage system, e.g. if that storage system is an HDFS that can only be accessed from inside the cluster.
  • Executors could place steps in environments that are missing the required configuration to access a particular storage system, e.g. AWS credentials for S3.

The approach we've taken with the asset store says "if someone is using a non-single-process executor with the default mem_asset_store, raise an informative error; otherwise, let the user face the error at execution time." I think this covers the most common case of a user trying to do out-of-process execution without being aware that there's a storage abstraction that interferes with it. If we observe confusion in other settings, we could widen this.

So I'm going to close this, but feel free to reopen if you feel otherwise.

natekupp

comment created time in 13 hours

issue comment dagster-io/dagster

Is asset_store_key mandatory?

Here's a fix: https://dagster.phacility.com/D5285

amarrella

comment created time in 13 hours

push event dagster-io/dagster

Sandy Ryza

commit sha 0deaa09f3864e1f80a72d7b871893e19ad0cc8fc

asset catalog overview

Summary: This transforms the "Assets & Materializations" overview into an "Asset Catalog" overview. The goal is to distinguish between:
* Using Dagster to create / mutate assets, which is covered in the Asset Stores section.
* Recording that assets were created / mutated, which is covered in the Asset Catalog overview.
This will benefit heavily from screenshots, which I'm working on in a followup.

Test Plan: manual inspection
Reviewers: schrockn, yuhan, prha
Reviewed By: prha
Differential Revision: https://dagster.phacility.com/D5221

view details

push time in 13 hours

push event dagster-io/dagster

Sandy Ryza

commit sha 99df49f29a8798e23e9b48ed164c5b3f13abcc3f

Enable asset stores to work with multiprocess executors

Test Plan: added test
Reviewers: cdecarolis, alangenfeld, yuhan
Reviewed By: alangenfeld
Subscribers: catherinewu
Differential Revision: https://dagster.phacility.com/D5207

view details

push time in 13 hours

issue opened aws/aws-lambda-go

Custom error response payload

Is your feature request related to a problem? Please describe.

I am frustrated (your terminology) that I cannot customize the error response payload in a lambda.

I'm using a transport-agnostic API (GraphQL) inside of my Lambda function. Per the GraphQL specification, the response contains {"data": {...stuff}} when everything went well, or {"data": {...stuff}, "errors": [... stuff]} when there are errors. return graphqlPayload, nil works great in most scenarios: I can easily grant permissions to those who should have access, and they can easily run the function, parse the response, check for errors, etc.

return graphqlPayload, nil does not work well when running in a Step Function. Specifically, I need to implement my own retry loop to check for $.output.Payload.errors -- yuck.

Describe the solution you'd like

I would like to be able to do something like this:

if returnError {
    return nil, aws-lambda-go/lambda/messages.WrapError(graphqlPayload)
} else {
    return graphqlPayload, nil
}

Describe alternatives you've considered

  • I've considered using API gateway, but that adds a lot of complexity when compared to simply invoking a Lambda where the permissions are very easily controlled/managed through CFT/CDK. Also, I don't think I was able to use API Gateway within a step function. Plus, I think I'd still need to figure out HTTP error codes for something that is supposed to be transport agnostic.
  • I've considered return payload, someError, but the payload is discarded when there is an error.
  • I've considered return nil, someError, but the error I provide is discarded
  • I've considered creating a second Lambda function to invoke in Step Functions. This is what I'll have to do; it makes me sad.

Additional context

Could something like this MWE goplay work? Is there a requirement elsewhere that the payload of an error contain errorMessage and errorType?

created time in 13 hours

push event dagster-io/dagster

Nicholas Schrock

commit sha 1ebdd065282cf738b07fd528414b1de641fa1872

Refactor environment_configs.py to share config construction code amongst configurable definitions

Summary: I am prototyping a cleanup of the configured/config mapping stack and whilst I'm doing that I'm picking off refactors that will make that easier/more scoped and that are code quality wins. In this case, I want to consolidate the construction of config fields associated with anything that implements `ConfigurableMixin`.

Test Plan: BK
Reviewers: sandyryza, sashank, cdecarolis
Reviewed By: sandyryza
Differential Revision: https://dagster.phacility.com/D5272

view details

push time in 14 hours

push event dagster-io/dagster

Isaac Hellendag

commit sha 889a61969a9a900d5cf80c6c919f5d26fc230ecd

[dagit] Allow Run execution for any pipeline in the workspace

Summary: Resolves #3254
Allow run re-execution from any location in Dagit, regardless of what the left nav repo switcher says. We do this by finding the repository that matches the specified `PipelineRun`, either by using its `RepositoryOrigin` or, in the case of older runs, by looking for a pipeline name match.
This fixes the issue in which a run cannot be kicked off if the "active" repo in the left nav has not been switched to the Run's pipeline's repo.

Test Plan: View various places in Dagit where I can kick off runs: Runs page, a previous Run page, Playground. Verify that I can execute runs for any pipeline that exists in the workspace repositories, regardless of what the left nav says.

Reviewers: dgibson, prha, johann, alangenfeld
Reviewed By: dgibson
Differential Revision: https://dagster.phacility.com/D5251

view details

push time in 14 hours

issue closed dagster-io/dagster

Multiple repositories playground pipeline trigger reset the selected repository and pipeline

Summary

In a setup with multiple repositories, running any pipeline from the Playground, except one from the first repository, resets the Dagit selected repository to the first repository in the list. I expected the repository that triggered the pipeline to remain selected instead.

Furthermore, re-executing a partial pipeline is no longer possible without switching back to the original repository and choosing the previous run.

This happened in 0.9.19 and 0.9.20 but not earlier.

Reproduction

repo_one/repo.py

from dagster import pipeline, repository, solid

@solid
def get_one(_):
    return 1

@solid(config_schema=int)
def multiply(context, number):
    result = context.solid_config * number

@pipeline
def one_pipeline():
    multiply(get_one())

@repository
def one_repo():
    return [one_pipeline]

repo_two/repo.py

from dagster import pipeline, repository, solid

@solid
def get_two(_):
    return 2

@solid(config_schema=int)
def multiply(context, number):
    result = context.solid_config * number

@pipeline
def two_pipeline():
    multiply(get_two())

@repository
def two_repo():
    return [two_pipeline]

workspace.yaml

load_from:
  - python_file:
      location_name: repo_one
      relative_path: repo_one/repo.py
  - python_file:
      location_name: repo_two
      relative_path: repo_two/repo.py

Dagit UI/UX Issue Screenshots

Setting up to run two_pipeline from two_repo: [screenshot]

Immediately after the pipeline is triggered, the Dagit selected repo has been reset to one_repo, and re-execution is disabled. [screenshot]

Additional Info about Your Environment


closed time in 14 hours

szeleeteo

push event dagster-io/dagster

prha

commit sha da1919436131e3a825d854da62698fe3db737d72

[sensors-8] file toy sensor

Summary: add a toy sensor that fires a pipeline execution every time a file in a given directory changes

Test Plan: none
Reviewers: dgibson, johann, schrockn
Reviewed By: dgibson
Subscribers: sandyryza
Differential Revision: https://dagster.phacility.com/D5215

view details

push time in 15 hours

push event dagster-io/dagster

prha

commit sha a88b71f9eba90991e087af64c52494dbb240d1b8

[sensors-7.5] rename SensorRunParams/SensorSkipData => RunRequest/SkipReason

Test Plan: bk
Reviewers: dgibson, schrockn, sandyryza, alangenfeld
Reviewed By: dgibson
Differential Revision: https://dagster.phacility.com/D5228

view details

push time in 15 hours
