Robert Raposa robrap edX Cambridge, MA

edx/ecommerce 124

Service for managing edX's product catalog and handling orders for those products

edx/edx-django-utils 10

EdX utilities for Django Application development.

edx/edx-developer-docs 6

Words that developers want to read

edx/xss-utils 3

This repo contains utility functions for Django and Mako templates to remove potential XSS attacks in templates.

feanil/tutor 0

The docker-based Open edX distribution designed for peace of mind

robrap/django-masquerade 0

Mirror of https://bitbucket.org/technivore/django-masquerade

robrap/django-oauth-toolkit 0

OAuth2 goodies for the Djangonauts!

PR opened edx/edx-platform

Revert "refactor: Misc. clean up from dashboard investigation"

Reverts edx/edx-platform#28669

+70 -8

0 comment

6 changed files

pr created time in a day

create branch edx/edx-platform

branch : revert-28669-ddumesnil/misc-cleanup

created branch time in a day

pull request comment edx/open-edx-proposals

OEP-52 (DRAFT): docs: add OEP for event bus

Thanks @tuchfarber (and many others).

  1. Although we were starting with Kafka, I was also trying to list some of the concerns we need to investigate, regardless of technology.
  2. We may be switching to Pulsar to try some of its unique features. I’m more concerned with completing a POC to see how it all hangs together.
  3. As we proceed with our POC, I think we’ll have a better idea of how and if an abstraction layer might work.

This early draft of an OEP was less about making a quick decision and more about eliciting feedback, so that concerns like multi-tenancy and a potential abstraction layer can ultimately be addressed.

feanil

comment created time in 2 days

pull request comment edx/edx-platform

[FAL-2030] Updates kombu package to support multi-tenant redis authentication

@pomegranited: In place of this OSPR, we would prefer to just complete the required celery upgrade. In the past, our team had checked and the celery upgrade seemed to pass tests, but early on in the celery 5.x release people were reporting issues that made it seem unstable. Now that some time has passed since its initial release, next steps for the actual celery upgrade would be:

  1. Unpin and ensure that tests continue to pass.
  2. Run whatever manual testing is readily available.
  3. Double-check the celery 5.x release notes and issues to see whether it seems more stable.
  4. Deploy the upgrade to edx.org; either it works, or we roll back with more info about missing tests and required fixes.

Are there any parts of 1-3 your team wants to take on to move this forward faster? We can clearly take care of 4. If all goes well, we are done. If not, we can discuss next steps.

Let me know what you think.

pomegranited

comment created time in 3 days

issue comment gocd/gocd

Email notifications not sent after N days from last login

Oops. I created this issue with the service account by mistake. I am the actual author of this issue.

arch-bom-gocd-alerts

comment created time in 3 days

Pull request review comment edx/open-edx-proposals

OEP-52 (DRAFT): docs: add OEP for event bus

+==============================
+OEP-0052: Event Bus Technology
+==============================
+
+.. This OEP template is based on Nygard's Architecture Decision Records.
+
+.. list-table::
+   :widths: 25 75
+
+   * - OEP-52
+     - :doc:`OEP-0052 <oep-0052-arch-event-bus-technology>`
+   * - Title
+     - Event Bus Technology
+   * - Last Modified
+     - 2021-08-26
+   * - Authors
+     - Feanil Patel <feanil@edx.org>, Robert Raposa <rraposa@edx.org>
+   * - Arbiter
+     - TBD
+   * - Status
+     - Draft
+   * - Type
+     - Architecture
+   * - Created
+     - 2021-08-16
+   * - Review Period
+     - TBD
+
+Overview
+--------
+
+* Adding an event bus to the Open edX platform allows asynchronous event messaging across services, which enables a number of improvements aligned with our architectural goals.
+
+* We are trialing Kafka to implement an Open edX event bus. This decision will be updated as we commit or change direction.
+
+* *Note:* Although this infrastructure could serve multiple purposes in the future, this OEP is focused on `publish-subscribe messaging pattern`_ (pub/sub) capabilities.
+
+Context
+-------
+
+The already accepted :doc:`oep-0041-arch-async-server-event-messaging` details the general format and conventions the Open edX platform should use for asynchronous event messaging across services. It also provides background on a set of :ref:`Event Messaging Architectural Goals` for the Open edX platform, including:
+
+* Align with the `Architecture Manifesto`_ themes of decentralization and asynchronous communication.
+* Eliminate blocking, synchronous requests.
+* Eliminate expensive, batch synchronization.
+* Reduce the need for plugins.
+* Flexibly integrate with event producers.
+* Simplify integration to external systems.
+
+However, this earlier OEP explicitly leaves out of scope the specific transport and libraries used for this messaging. At this time, the Open edX platform still lacks a reliable way to send events to multiple consumers across services, following the publish-subscribe (pub/sub) messaging pattern.
+
+In other words, we have documented what we wish to do and why, but we do not yet fully have the capability to do it. We are missing a reliable event messaging infrastructure with `publish-subscribe messaging pattern`_ (pub/sub) capabilities.
+
+Note: The `Architecture Manifesto`_ mentions being inspired by the `Reactive Manifesto`_. Although there is a lot of overlap, this might also cause confusion, because the Reactive Manifesto discusses `Message Driven (in contrast to Event-Driven)`_, which they define as messages sent to a specific destination. However, this decision is not concerned with Message Driven by this definition, but instead about Event-Driven capabilities using pub/sub.
+
+.. _Architecture Manifesto: https://openedx.atlassian.net/wiki/spaces/AC/pages/1074397222/Architecture+Manifesto+WIP
+.. _Reactive Manifesto: https://www.reactivemanifesto.org/
+.. _Message Driven (in contrast to Event-Driven): https://www.reactivemanifesto.org/glossary#Message-Driven
+
+Decision
+--------
+
+The Open edX platform will benefit from having a message bus. We believe `Apache Kafka`_ is a good choice for the event bus and will begin trialing it to solve some specific platform problems addressed by the `publish-subscribe messaging pattern`_.
+
+.. _Apache Kafka: https://kafka.apache.org/
+.. _publish-subscribe messaging pattern: https://en.wikipedia.org/wiki/Publish%E2%80%93subscribe_pattern
+
+Why (trial) Kafka?
+~~~~~~~~~~~~~~~~~~
+
+Kafka has been around for a long time. See `Thoughtworks's technology radar introduced Kafka`_ as "Assess" in 2015, and "Trial" in 2016. It never moved up to "Adopt", and also never moved down to "Hold". Read `Thoughtwork's Kafka decoder page`_ to lear more about its benefits and trade-offs, and how it is used.
+
+More recently, the `Thoughtworks's technology radar introduced Apache Pulsar`_ as "assess" in 2020, and the `technology radar introduced Kafka API without Kafka`_ in 2021. This both demonstrates near standard of the Kafka API, but also Thoughtwork's hope to find a less complex alternative.
+
+This history closely parallels our own internal research and comparisons. We believe Apache Kafka is still the right option due to its maturity, documentation, support and community. However, Kafka can end up being a very complex collection of tools, some of which Apache Pulsar was designed to simplify. Therefore, we will begin a trial of Kafka, keeping in mind the potential benefits of Pulsar, and ultimately commit or start a trial of Pulsar.
+
+This OEP will be adjusted as we learn more and make a final decision.
+
+.. _Thoughtworks's technology radar introduced Kafka: https://www.thoughtworks.com/radar/tools/apache-kafka
+.. _Thoughtwork's Kafka decoder page: https://www.thoughtworks.com/decoder/kafka
+
+.. _Thoughtworks's technology radar introduced Apache Pulsar: https://www.thoughtworks.com/radar/platforms/apache-pulsar
+.. _technology radar introduced Kafka API without Kafka: https://www.thoughtworks.com/radar/platforms/kafka-api-without-kafka
+
+Messaging Features
+~~~~~~~~~~~~~~~~~~
+
+Kafka is a distributed streaming platform. Kafka's implementation maps nicely to the pub/sub pattern. However, some native features of a message broker like RabbitMQ are not built-in.
+
+There is a useful `blog article comparing Kafka and RabbitMQ`_ by Eran Stiller. The article compares the technologies as pub/sub implementations across the following dimensions (winner in parentheses):
+
+* Message Ordering (Kafka)
+* Message Routing/Filtering (RabbitMQ)

Great questions that I'm not going to address quite yet. I think we will have lots of ongoing best practices to discover. The main question now is whether we think the technology choice will make or break our ability to do what we need to here. If you are trying to make a point like that, can you clarify the feature you think we require? If not, maybe we should continue this particular thread in some parallel location where we work on discovery of more of our best practices. (Note: sometimes it's just not clear yet whether or not this will affect the technology choice, and maybe you feel this needs to be part of the POC to discover that.)

feanil

comment created time in 8 days
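
For readers less familiar with the pub/sub and message-retention ideas in the quoted OEP, here is a minimal in-memory sketch, assuming a retained, ordered log per topic and an independent read offset per consumer group. It is only an illustration: the class `InMemoryEventLog`, the topic name `grade-changed`, and the group names are hypothetical, and nothing here uses Kafka or any real client library.

```python
from collections import defaultdict


class InMemoryEventLog:
    """Toy append-only log (hypothetical, for illustration only).

    Events are retained per topic, and each consumer group tracks its own
    offset, so several groups can read the same events independently.
    """

    def __init__(self):
        self._topics = defaultdict(list)   # topic -> ordered list of events
        self._offsets = defaultdict(int)   # (topic, group) -> next index to read

    def publish(self, topic, event):
        # Appending preserves the order in which events were published.
        self._topics[topic].append(event)

    def poll(self, topic, group):
        """Return events this group has not yet seen, then advance its offset."""
        start = self._offsets[(topic, group)]
        events = self._topics[topic][start:]
        self._offsets[(topic, group)] = len(self._topics[topic])
        return events


log = InMemoryEventLog()
log.publish("grade-changed", {"user": "alice", "grade": 0.9})
log.publish("grade-changed", {"user": "bob", "grade": 0.7})

# Two independent consumer groups each receive every retained event.
credentials_events = log.poll("grade-changed", group="credentials")
analytics_events = log.poll("grade-changed", group="analytics")
```

Because events stay in the log after being read, a new consumer group added later would still see both events, which is the retention property the comparison list attributes to Kafka.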

Pull request review comment edx/open-edx-proposals

OEP-52 (DRAFT): docs: add OEP for event bus

+Messaging Features
+~~~~~~~~~~~~~~~~~~
+
+Kafka is a distributed streaming platform. Kafka's implementation maps nicely to the pub/sub pattern. However, some native features of a message broker like RabbitMQ are not built-in.
+
+There is a useful `blog article comparing Kafka and RabbitMQ`_ by Eran Stiller. The article compares the technologies as pub/sub implementations across the following dimensions (winner in parentheses):
+
+* Message Ordering (Kafka)
+* Message Routing/Filtering (RabbitMQ)
+* Message Timing (RabbitMQ)
+* Message Retention (Kafka)
+* Fault Handling (RabbitMQ)
+* Scale (Kafka)
+* Consumer Complexity (RabbitMQ)
+
+Above dimensions which we ultimately require, but were won by RabbitMQ, will likely require additional development and/or supplementary technologies, as partially detailed in the next section.
+
+Note: Some of these missing features are natively supported by `Apache Pulsar`_, at least according to its documentation.
+
+.. _blog article comparing Kafka and RabbitMQ: https://stiller.blog/2020/02/rabbitmq-vs-kafka-an-architects-dilemma-part-2/
+
+Kafka Add-ons, Distributions, and Providers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+As part of the trial, there will be many additional related technologies to explore. Each technology affects the developer experience, the operator experience, or both. Some choices may ultimately affect the Open edX plaform for the entire community, and some choices may be unique to each organization (like edx.org). This document will ultimately contain details for both these cases, since they may help other organizations even when multiple options are still available.
+
+The following is a list of just some of the potential technologies that may need to be deployed and managed:
+
+* `Apache Kafka`_
+* `Kafka Streams <https://kafka.apache.org/documentation/streams/>`__
+* `Kafka Connect <https://kafka.apache.org/documentation/#connect>`__
+* `Cruise Control <https://github.com/linkedin/cruise-control>`__
+* `Faust <https://faust.readthedocs.io/en/latest/userguide/kafka.html>`__ (Python version similar to Kafka Streams)
+* `Various Python clients <https://cwiki.apache.org/confluence/display/KAFKA/Clients#Clients-Python>`__
+
+Note: `Amazon MSK`_ is an AWS managed service that supplies the Apache Kafka core platform only.
+
+or
+
+* `Confluent Platform`_ - Enterprise Kafka Distribution (Open Source, Community, or Commercial)
+
+  * `Schema Registry <https://www.confluent.io/product/confluent-platform/data-compatibility/>`__
+  * Monitoring and alerting capabilities (Commercial)
+  * Self-balancing clusters (Commercial)
+  * Tiered storage (Commercial) (future feature of Apache Kafka)
+  * Infinite retention (Cloud only?)
+
+Additional Notes:
+
+* `Apache Pulsar`_ has similar features as part of its platform, which is why it makes a good potential alternative. However, the features are less battle-tested and the deployment story *may* be more complicated.
+* Confluent also offers Confluent Cloud, a fully managed solution that offers much simpler operations, but is unlikely to be used by edX.org.
+
+Also see a useful and biased `comparison of Apache Kafka vs Vendors`_ by Kai Waehner (of Confluent), comparing various providers and distributions of Kafka and related or competitive services.
+
+.. _Amazon MSK: https://aws.amazon.com/msk/
+.. _Confluent Platform: https://www.confluent.io/product/confluent-platform
+.. _comparison of Apache Kafka vs Vendors: https://www.kai-waehner.de/blog/2021/04/20/comparison-open-source-apache-kafka-vs-confluent-cloudera-red-hat-amazon-msk-cloud/
+
+Kafka Highlights
+~~~~~~~~~~~~~~~~
+
+Pros
+^^^^
+
+* Battle-tested, widely adopted, big community, lots of documentation and answers.
+* Amazon MSK (AWS service) provides hosted path of least resistance.
+* `New Relic integration with Amazon MSK`_ (useful to edX.org).
+
+Cons
+^^^^
+
+* Many open questions about add-ons required for developers and operators.
+* Complex to manage, including likely manual scaling.
+
+.. _New Relic integration with Amazon MSK: https://docs.newrelic.com/docs/integrations/amazon-integrations/aws-integrations-list/aws-managed-kafka-msk-integration/
+
+Consequences
+------------
+
+* Operators will need to deploy and manage the selected infrastructure, which is likely to be complex. If Apache Kafka is selected, there are likely to be a set of auxiliary parts to provide all required functionality for our message bus.
+* Education will be required for both developers and operators regarding best practices for each role.
+* Code to interact with Kafka and its libraries will be added to core services.
+* At least one initial use case must be completed. One potential candidate is the grade change event in the LMS, and its use by the Credentials service.
+* Once we have a message bus, we can investigate other potential use cases:
+
+  * Course/program update propagation.
+  * Feed into xAPI/Caliper capabilities.
+  * New services and features can be built fully de-coupled from the core application.
+
+Rejected Alternatives
+---------------------
+
+Apache Pulsar
+~~~~~~~~~~~~~
+
+Although rejected to start, `Apache Pulsar`_ remains an option if solving with Kafka turns out to be overly burdensome for developers or operators.
+
+Pros
+^^^^
+
+* Ease of scalability (built-in, according to docs).
+* Ease of data retention capabilities.
+* Additional built-in pub/sub features (built-in, according to docs).
+
+Cons
+^^^^
+
+* Requires 3rd party hosting or larger upfront investment in self-hosted (kubernetes).
+* Less mature (but growing) community, little documentation, and few answers.
+
+Note: Read an interesting (Kafka/Confluent) biased article exploring `comparisons and myths of Kafka vs Pulsar`_.
+
+.. _Apache Pulsar: https://pulsar.apache.org/
+.. _comparisons and myths of Kafka vs Pulsar: https://dzone.com/articles/pulsar-vs-kafka-comparison-and-myths-explored
+
+Redis
+~~~~~
+
+Pros
+^^^^
+
+* Already part of Open edX platform
+
+Cons
+^^^^
+
+* Can lose acked data, even if RAM backed up with an append-only file (AOF).
+* Requires homegrown schema management.
+
+Abstract Message Bus Class
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+@feanil - What is this???

Agreed. Not much more to write on this topic at this time. We need to make some headway on some other fronts, and then we'll have a little more idea about what would need to be abstracted and how and if that could work. However, in addition to earlier mentioned possible downsides, this would also take additional resources to prove out that the abstraction really could work with Redis Streams. Maybe we could line up wider community resources for that type of effort?

feanil

comment created time in 8 days
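
To make the abstraction-layer idea in this thread concrete, here is a rough sketch of what an abstract message bus class might look like, assuming a very small send/receive interface. Everything in it is hypothetical: `EventBus`, `InMemoryEventBus`, `emit_course_update`, and the `course-updated` topic are invented for illustration and do not correspond to any existing Open edX, Kafka, or Redis API. Whether such an interface could really cover both Kafka and Redis Streams is exactly the open question raised in the comment.

```python
from abc import ABC, abstractmethod


class EventBus(ABC):
    """Hypothetical transport-agnostic interface (not an existing Open edX API)."""

    @abstractmethod
    def send(self, topic: str, payload: dict) -> None:
        """Publish one event to a topic."""

    @abstractmethod
    def receive(self, topic: str) -> list:
        """Return and clear the pending events for a topic."""


class InMemoryEventBus(EventBus):
    """Stand-in backend for tests; a real backend would wrap a broker client."""

    def __init__(self):
        self._queues = {}  # topic -> list of pending payloads

    def send(self, topic, payload):
        self._queues.setdefault(topic, []).append(payload)

    def receive(self, topic):
        # pop() removes the topic's queue, so each event is returned once.
        return self._queues.pop(topic, [])


def emit_course_update(bus: EventBus, course_id: str) -> None:
    # Application code depends only on the abstract interface.
    bus.send("course-updated", {"course_id": course_id})


bus = InMemoryEventBus()
emit_course_update(bus, "course-v1:demo")
received = bus.receive("course-updated")
```

The trade-off is that an interface this small hides broker-specific features such as ordering guarantees, retention, and consumer groups, which is one reason the abstraction would need to be proven out against each candidate technology.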

Pull request review comment edx/open-edx-proposals

OEP-52 (DRAFT): docs: add OEP for event bus

+Overview
+--------
+
+* Adding an event bus to the Open edX platform allows asynchronous event messaging across services, which enables a number of improvements aligned with our architectural goals.
+
+* We are trialing Kafka to implement an Open edX event bus. This decision will be updated as we commit or change direction.

Agreed. We need to keep it in mind, including for our own needs. However, it is not the whole story either. I tried to address Redis specifics in a response to this comment: https://github.com/edx/open-edx-proposals/pull/233/files#r702475213

feanil

comment created time in 9 days

Pull request review comment edx/open-edx-proposals

OEP-52 (DRAFT): docs: add OEP for event bus

+Redis
+~~~~~
+
+Pros
+^^^^
+
+* Already part of Open edX platform
+
+Cons
+^^^^
+
+* Can lose acked data, even if RAM backed up with an append-only file (AOF).

@regisb: I’m going to use this thread to discuss Redis more deeply.

  1. The fact that Redis is already a part of our stack is a huge plus. I would love for it to be the right fit, but I don’t think it will make the cut. However, the cons need more clarity and attention here.
  2. In general, the hosting and deployment story for each Open edX deployment makes this OEP that much more complex. In the case of Amazon’s ElastiCache for Redis (which we use), AOF is not an option for multi-AZ, and they state that multi-AZ has better resiliency. However, that is not a great story for disaster recovery: it would mean building some homegrown solution for backing up and restoring all events. Instead, I think we are going to need a solution with persistence.
  3. Another con that is missing here is that, because Redis is in-memory, it is limited by RAM. I also don’t think it has a good compaction story. This seems like a hard limit that, again, would be very costly to work around, for example by managing and balancing across multiple Redis clusters, or with some other custom solution we might come up with.
  4. I’ll leave the abstraction layer conversation to this other comment: https://github.com/edx/open-edx-proposals/pull/233/files#r698547746 Let’s keep this thread to: Why not Redis as the single solution?
feanil

comment created time in 9 days

Pull request review commentedx/open-edx-proposals

OEP-52 (DRAFT): docs: add OEP for event bus

  1. For routing/filtering (and several other concepts), this is why we are taking a POC approach. We need to get our hands dirty both to see what we actually need, and how it will work. For filtering, maybe it is as simple as consumers reading and ignoring any subset of messages that don’t apply. For routing, again I’m not certain about our needs, and how it will play out is tool dependent.
  2. I think a topic basically covers a single message type, but maybe you are already assuming that as the coarsest granularity and wondering when and if we should have additionally filtered topics? Or maybe you are asking something entirely different?
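
To make the consumer-side filtering idea above concrete, here is a minimal sketch in plain Python. It assumes one topic that carries more than one event type, and a consumer that reads everything and ignores what does not apply. The `Event` envelope, its field names, the event type strings, and `filter_events` are all hypothetical, chosen only for illustration; nothing here uses a real Kafka client or reflects a decided Open edX event format.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class Event:
    """Hypothetical event envelope; field names are illustrative only."""
    topic: str
    event_type: str
    payload: dict


def filter_events(
    events: Iterable[Event], wanted: Callable[[Event], bool]
) -> Iterator[Event]:
    """Yield only the events this consumer cares about and skip the rest."""
    for event in events:
        if wanted(event):
            yield event


# Stand-in for messages read from a single topic (assumed data).
stream = [
    Event("grades", "grade.changed", {"user_id": 1, "passed": True}),
    Event("grades", "grade.changed", {"user_id": 2, "passed": False}),
    Event("grades", "grade.reset", {"user_id": 3}),
]

# A consumer that only acts on passing grade changes and ignores the rest.
passing = list(
    filter_events(
        stream,
        lambda e: e.event_type == "grade.changed" and e.payload.get("passed", False),
    )
)
```

The trade-off in this sketch is that every consumer still reads every message on the topic, so filtering costs bandwidth and CPU on the consumer side. Whether that is acceptable, or whether additionally filtered topics are needed, is the kind of question the POC is meant to answer.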
feanil

comment created time in 9 days

Pull request review commentedx/open-edx-proposals

OEP-52 (DRAFT): docs: add OEP for event bus

+==============================+OEP-0052: Event Bus Technology+==============================++.. This OEP template is based on Nygard's Architecture Decision Records.++.. list-table::+   :widths: 25 75++   * - OEP-52+     - :doc:`OEP-0052 <oep-0052-arch-event-bus-technology>`+   * - Title+     - Event Bus Technology+   * - Last Modified+     - 2021-08-26+   * - Authors+     - Feanil Patel <feanil@edx.org>, Robert Raposa <rraposa@edx.org>+   * - Arbiter+     - TBD+   * - Status+     - Draft+   * - Type+     - Architecture+   * - Created+     - 2021-08-16+   * - Review Period+     - TBD++Overview+--------++* Adding an event bus to the Open edX platform allows asynchronous event messaging across services, which enables a number of improvements aligned with our architectural goals.++* We are trialing Kafka to implement an Open edX event bus. This decision will be updated as we commit or change direction.++* *Note:* Although this infrastructure could serve multiple purposes in the future, this OEP is focused on `publish-subscribe messaging pattern`_ (pub/sub) capabilities.++Context+-------++The already accepted :doc:`oep-0041-arch-async-server-event-messaging` details the general format and conventions the Open edX platform should use for asynchronous event messaging across services. It also provides background on a set of :ref:`Event Messaging Architectural Goals` for the Open edX platform, including:++* Align with the `Architecture Manifesto`_ themes of decentralization and asynchronous communication.+* Eliminate blocking, synchronous requests.+* Eliminate expensive, batch synchronization.+* Reduce the need for plugins.+* Flexibly integrate with event producers.+* Simplify integration to external systems.++However, this earlier OEP explicitly leaves out of scope the specific transport and libraries used for this messaging. 
At this time, the Open edX platform still lacks a reliable way to send events to multiple consumers across services, following the publish-subscribe (pub/sub) messaging pattern.++In other words, we have documented what we wish to do and why, but we do not yet fully have the capability to do it. We are missing a reliable event messaging infrastructure with `publish-subscribe messaging pattern`_ (pub/sub) capabilities.++Note: The `Architecture Manifesto`_ mentions being inspired by the `Reactive Manifesto`_. Although there is a lot of overlap, this might also cause confusion, because the Reactive Manifesto discusses `Message Driven (in contrast to Event-Driven)`_, which they define as messages sent to a specific destination. However, this decision is not concerned with Message Driven by this definition, but instead about Event-Driven capabilities using pub/sub.++.. _Architecture Manifesto: https://openedx.atlassian.net/wiki/spaces/AC/pages/1074397222/Architecture+Manifesto+WIP+.. _Reactive Manifesto: https://www.reactivemanifesto.org/+.. _Message Driven (in contrast to Event-Driven): https://www.reactivemanifesto.org/glossary#Message-Driven++Decision+--------++The Open edX platform will benefit from having a message bus. We believe `Apache Kafka`_ is a good choice for the event bus and will begin trialing it to solve some specific platform problems addressed by the `publish-subscribe messaging pattern`_.++.. _Apache Kafka: https://kafka.apache.org/+.. _publish-subscribe messaging pattern: https://en.wikipedia.org/wiki/Publish%E2%80%93subscribe_pattern++Why (trial) Kafka?+~~~~~~~~~~~~~~~~~~++Kafka has been around for a long time. See `Thoughtworks's technology radar introduced Kafka`_ as "Assess" in 2015, and "Trial" in 2016. It never moved up to "Adopt", and also never moved down to "Hold". 
Read `Thoughtwork's Kafka decoder page`_ to lear more about its benefits and trade-offs, and how it is used.++More recently, the `Thoughtworks's technology radar introduced Apache Pulsar`_ as "assess" in 2020, and the `technology radar introduced Kafka API without Kafka`_ in 2021. This both demonstrates near standard of the Kafka API, but also Thoughtwork's hope to find a less complex alternative.++This history closely parallels our own internal research and comparisons. We believe Apache Kafka is still the right option due to its maturity, documentation, support and community. However, Kafka can end up being a very complex collection of tools, some of which Apache Pulsar was designed to simplify. Therefore, we will begin a trial of Kafka, keeping in mind the potential benefits of Pulsar, and ultimately commit or start a trial of Pulsar.++This OEP will be adjusted as we learn more and make a final decision.++.. _Thoughtworks's technology radar introduced Kafka: https://www.thoughtworks.com/radar/tools/apache-kafka+.. _Thoughtwork's Kafka decoder page: https://www.thoughtworks.com/decoder/kafka++.. _Thoughtworks's technology radar introduced Apache Pulsar: https://www.thoughtworks.com/radar/platforms/apache-pulsar+.. _technology radar introduced Kafka API without Kafka: https://www.thoughtworks.com/radar/platforms/kafka-api-without-kafka++Messaging Features+~~~~~~~~~~~~~~~~~~++Kafka is a distributed streaming platform. Kafka's implementation maps nicely to the pub/sub pattern. However, some native features of a message broker like RabbitMQ are not built-in.++There is a useful `blog article comparing Kafka and RabbitMQ`_ by Eran Stiller. 
The article compares the technologies as pub/sub implementations across the following dimensions (winner in parentheses):++* Message Ordering (Kafka)+* Message Routing/Filtering (RabbitMQ)+* Message Timing (RabbitMQ)+* Message Retention (Kafka)+* Fault Handling (RabbitMQ)+* Scale (Kafka)+* Consumer Complexity (RabbitMQ)++Above dimensions which we ultimately require, but were won by RabbitMQ, will likely require additional development and/or supplementary technologies, as partially detailed in the next section.++Note: Some of these missing features are natively supported by `Apache Pulsar`_, at least according to its documentation.++.. _blog article comparing Kafka and RabbitMQ: https://stiller.blog/2020/02/rabbitmq-vs-kafka-an-architects-dilemma-part-2/++Kafka Add-ons, Distributions, and Providers+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~++As part of the trial, there will be many additional related technologies to explore. Each technology affects the developer experience, the operator experience, or both. Some choices may ultimately affect the Open edX plaform for the entire community, and some choices may be unique to each organization (like edx.org). 
This document will ultimately contain details for both these cases, since they may help other organizations even when multiple options are still available.++The following is a list of just some of the potential technologies that may need to be deployed and managed:++* `Apache Kafka`_+* `Kafka Streams <https://kafka.apache.org/documentation/streams/>`__+* `Kafka Connect <https://kafka.apache.org/documentation/#connect>`__+* `Cruise Control <https://github.com/linkedin/cruise-control>`__+* `Faust <https://faust.readthedocs.io/en/latest/userguide/kafka.html>`__ (Python version similar to Kafka Streams)+* `Various Python clients <https://cwiki.apache.org/confluence/display/KAFKA/Clients#Clients-Python>`__++Note: `Amazon MSK`_ is an AWS managed service that supplies the Apache Kafka core platform only.++or++* `Confluent Platform`_ - Enterprise Kafka Distribution (Open Source, Community, or Commercial)++  * `Schema Registry <https://www.confluent.io/product/confluent-platform/data-compatibility/>`__+  * Monitoring and alerting capabilities (Commercial)+  * Self-balancing clusters (Commercial)+  * Tiered storage (Commercial) (future feature of Apache Kafka)+  * Infinite retention (Cloud only?)++Additional Notes:++* `Apache Pulsar`_ has similar features as part of its platform, which is why it makes a good potential alternative. However, the features are less battle-tested and the deployment story *may* be more complicated.+* Confluent also offers Confluent Cloud, a fully managed solution that offers much simpler operations, but is unlikely to be used by edX.org.++Also see a useful and biased `comparison of Apache Kafka vs Vendors`_ by Kai Waehner (of Confluent), comparing various providers and distributions of Kafka and related or competitive services.++.. _Amazon MSK: https://aws.amazon.com/msk/+.. _Confluent Platform: https://www.confluent.io/product/confluent-platform+.. 
_comparison of Apache Kafka vs Vendors: https://www.kai-waehner.de/blog/2021/04/20/comparison-open-source-apache-kafka-vs-confluent-cloudera-red-hat-amazon-msk-cloud/++Kafka Highlights+~~~~~~~~~~~~~~~~++Pros+^^^^++* Battle-tested, widely adopted, big community, lots of documentation and answers.+* Amazon MSK (AWS service) provides hosted path of least resistance.+* `New Relic integration with Amazon MSK`_ (useful to edX.org).++Cons+^^^^++* Many open questions about add-ons required for developers and operators.+* Complex to manage, including likely manual scaling.++.. _New Relic integration with Amazon MSK: https://docs.newrelic.com/docs/integrations/amazon-integrations/aws-integrations-list/aws-managed-kafka-msk-integration/++Consequences+------------++* Operators will need to deploy and manage the selected infrastructure, which is likely to be complex. If Apache Kafka is selected, there are likely to be a set of auxiliary parts to provide all required functionality for our message bus.+* Education will be required for both developers and operators regarding best practices for each role.+* Code to interact with Kafka and its libraries will be added to core services.+* At least one initial use case must be completed. 
One potential candidate is the grade change event in the LMS, and its use by the Credentials service.+* Once we have a message bus, we can investigate other potential use cases:++  * Course/program update propagation.+  * Feed into xAPI/Caliper capabilities.+  * New services and features can be built fully de-coupled from the core application.++Rejected Alternatives+---------------------++Apache Pulsar+~~~~~~~~~~~~~++Although rejected to start, `Apache Pulsar`_ remains an option if solving with Kafka turns out to be overly burdensome for developers or operators.++Pros+^^^^++* Ease of scalability (built-in, according to docs).+* Ease of data retention capabilities.+* Additional built-in pub/sub features (built-in, according to docs).++Cons+^^^^++* Requires 3rd party hosting or larger upfront investment in self-hosted (kubernetes).+* Less mature (but growing) community, little documentation, and few answers.++Note: Read an interesting (Kafka/Confluent) biased article exploring `comparisons and myths of Kafka vs Pulsar`_.++.. _Apache Pulsar: https://pulsar.apache.org/+.. _comparisons and myths of Kafka vs Pulsar: https://dzone.com/articles/pulsar-vs-kafka-comparison-and-myths-explored++Redis+~~~~~++Pros+^^^^++* Already part of Open edX platform++Cons+^^^^++* Can lose acked data, even if RAM backed up with an append-only file (AOF).+* Requires homegrown schema management.++Abstract Message Bus Class+~~~~~~~~~~~~~~~~~~~~~~~~~~++@feanil - What is this???

Ah, thank you. Yes, we can discuss that topic here going forward. @regisb wanted to discuss Redis streams, and we’ll need to determine whether an abstraction layer would cripple capabilities we need or want, now or in the future.

I’ll note here that haystack was dedicated to solving this problem for search tools, and we ultimately decided it was crippling our ES capabilities and we had to work to tear it out. It is not meant to be a perfect analogy, but just a warning from recent history.
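
As a rough illustration of the concern, here is a minimal sketch of what a lowest-common-denominator abstraction layer could look like. The names `MessageBus` and `InMemoryBus` and the `publish`/`subscribe` signatures are hypothetical and are not taken from the OEP or from any existing library. The in-memory backend exists only so the sketch runs on its own.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class MessageBus(ABC):
    """Hypothetical backend-neutral interface (illustrative only)."""

    @abstractmethod
    def publish(self, topic: str, message: dict) -> None:
        """Send a message to every subscriber of the topic."""

    @abstractmethod
    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a handler to be called for each message on the topic."""


class InMemoryBus(MessageBus):
    """Toy backend that delivers messages synchronously, in process."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def publish(self, topic: str, message: dict) -> None:
        for handler in self._handlers[topic]:
            handler(message)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)


bus = InMemoryBus()
received: List[dict] = []
bus.subscribe("grades", received.append)
bus.publish("grades", {"user_id": 1, "passed": True})
bus.publish("courses", {"course_id": "demo"})  # no subscriber, so it is dropped
```

Note what the interface leaves out: there is no place to pass a partition key, choose a consumer group, or replay from an earlier offset. Exposing those would tie the interface to one backend, and hiding them gives up the capabilities, which is the same kind of tension described for haystack above.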

feanil

comment created time in 9 days

Pull request review commentedx/open-edx-proposals

OEP-52 (DRAFT): docs: add OEP for event bus


@blarghmatey: Thanks for your comment: https://github.com/edx/open-edx-proposals/pull/233#issuecomment-913605682. I’m going to move discussion of Pulsar to this section, so we can have a threaded discussion.

  1. I think some of what you had written about is already discussed here, but I can adjust accordingly.
  2. The one detail I almost missed from your post was about multi-tenancy. I appreciate you bringing it up. This may require more research. Typically, there is a way to solve most issues with Kafka, but I’m not sure how ugly or elegant this would be.
feanil

comment created time in 9 days

Pull request review commentedx/open-edx-proposals

OEP-52 (DRAFT): docs: add OEP for event bus

+==============================
+OEP-0052: Event Bus Technology
+==============================
+
+.. This OEP template is based on Nygard's Architecture Decision Records.
+
+.. list-table::
+   :widths: 25 75
+
+   * - OEP-52
+     - :doc:`OEP-0052 <oep-0052-arch-event-bus-technology>`
+   * - Title
+     - Event Bus Technology
+   * - Last Modified
+     - 2021-08-26
+   * - Authors
+     - Feanil Patel <feanil@edx.org>, Robert Raposa <rraposa@edx.org>
+   * - Arbiter
+     - TBD
+   * - Status
+     - Draft
+   * - Type
+     - Architecture
+   * - Created
+     - 2021-08-16
+   * - Review Period
+     - TBD
+
+Overview
+--------
+
+* Adding an event bus to the Open edX platform allows asynchronous event messaging across services, which enables a number of improvements aligned with our architectural goals.
+
+* We are trialing Kafka to implement an Open edX event bus. This decision will be updated as we commit or change direction.
+
+* *Note:* Although this infrastructure could serve multiple purposes in the future, this OEP is focused on `publish-subscribe messaging pattern`_ (pub/sub) capabilities.
+
+Context
+-------
+
+The already accepted :doc:`oep-0041-arch-async-server-event-messaging` details the general format and conventions the Open edX platform should use for asynchronous event messaging across services. It also provides background on a set of :ref:`Event Messaging Architectural Goals` for the Open edX platform, including:
+
+* Align with the `Architecture Manifesto`_ themes of decentralization and asynchronous communication.
+* Eliminate blocking, synchronous requests.
+* Eliminate expensive, batch synchronization.
+* Reduce the need for plugins.
+* Flexibly integrate with event producers.
+* Simplify integration to external systems.
+
+However, this earlier OEP explicitly leaves out of scope the specific transport and libraries used for this messaging. At this time, the Open edX platform still lacks a reliable way to send events to multiple consumers across services, following the publish-subscribe (pub/sub) messaging pattern.
+
+In other words, we have documented what we wish to do and why, but we do not yet fully have the capability to do it. We are missing a reliable event messaging infrastructure with `publish-subscribe messaging pattern`_ (pub/sub) capabilities.
+
+Note: The `Architecture Manifesto`_ mentions being inspired by the `Reactive Manifesto`_. Although there is a lot of overlap, this might also cause confusion, because the Reactive Manifesto discusses `Message Driven (in contrast to Event-Driven)`_, which they define as messages sent to a specific destination. However, this decision is not concerned with Message Driven by this definition, but instead about Event-Driven capabilities using pub/sub.
+
+.. _Architecture Manifesto: https://openedx.atlassian.net/wiki/spaces/AC/pages/1074397222/Architecture+Manifesto+WIP
+.. _Reactive Manifesto: https://www.reactivemanifesto.org/
+.. _Message Driven (in contrast to Event-Driven): https://www.reactivemanifesto.org/glossary#Message-Driven
+
+Decision
+--------
+
+The Open edX platform will benefit from having a message bus. We believe `Apache Kafka`_ is a good choice for the event bus and will begin trialing it to solve some specific platform problems addressed by the `publish-subscribe messaging pattern`_.
+
+.. _Apache Kafka: https://kafka.apache.org/
+.. _publish-subscribe messaging pattern: https://en.wikipedia.org/wiki/Publish%E2%80%93subscribe_pattern
+
+Why (trial) Kafka?
+~~~~~~~~~~~~~~~~~~
+
+Kafka has been around for a long time. See `Thoughtworks's technology radar introduced Kafka`_ as "Assess" in 2015, and "Trial" in 2016. It never moved up to "Adopt", and also never moved down to "Hold". Read `Thoughtwork's Kafka decoder page`_ to learn more about its benefits and trade-offs, and how it is used.
+
+More recently, the `Thoughtworks's technology radar introduced Apache Pulsar`_ as "assess" in 2020, and the `technology radar introduced Kafka API without Kafka`_ in 2021. This both demonstrates near standard of the Kafka API, but also Thoughtwork's hope to find a less complex alternative.
+
+This history closely parallels our own internal research and comparisons. We believe Apache Kafka is still the right option due to its maturity, documentation, support and community. However, Kafka can end up being a very complex collection of tools, some of which Apache Pulsar was designed to simplify. Therefore, we will begin a trial of Kafka, keeping in mind the potential benefits of Pulsar, and ultimately commit or start a trial of Pulsar.
+
+This OEP will be adjusted as we learn more and make a final decision.
+
+.. _Thoughtworks's technology radar introduced Kafka: https://www.thoughtworks.com/radar/tools/apache-kafka
+.. _Thoughtwork's Kafka decoder page: https://www.thoughtworks.com/decoder/kafka
+
+.. _Thoughtworks's technology radar introduced Apache Pulsar: https://www.thoughtworks.com/radar/platforms/apache-pulsar
+.. _technology radar introduced Kafka API without Kafka: https://www.thoughtworks.com/radar/platforms/kafka-api-without-kafka
+
+Messaging Features
+~~~~~~~~~~~~~~~~~~
+
+Kafka is a distributed streaming platform. Kafka's implementation maps nicely to the pub/sub pattern. However, some native features of a message broker like RabbitMQ are not built-in.
+
+There is a useful `blog article comparing Kafka and RabbitMQ`_ by Eran Stiller. The article compares the technologies as pub/sub implementations across the following dimensions (winner in parentheses):
+
+* Message Ordering (Kafka)
+* Message Routing/Filtering (RabbitMQ)
+* Message Timing (RabbitMQ)
+* Message Retention (Kafka)
+* Fault Handling (RabbitMQ)
+* Scale (Kafka)
+* Consumer Complexity (RabbitMQ)
+
+Above dimensions which we ultimately require, but were won by RabbitMQ, will likely require additional development and/or supplementary technologies, as partially detailed in the next section.
+
+Note: Some of these missing features are natively supported by `Apache Pulsar`_, at least according to its documentation.
+
+.. _blog article comparing Kafka and RabbitMQ: https://stiller.blog/2020/02/rabbitmq-vs-kafka-an-architects-dilemma-part-2/
+
+Kafka Add-ons, Distributions, and Providers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+As part of the trial, there will be many additional related technologies to explore. Each technology affects the developer experience, the operator experience, or both. Some choices may ultimately affect the Open edX platform for the entire community, and some choices may be unique to each organization (like edx.org). This document will ultimately contain details for both these cases, since they may help other organizations even when multiple options are still available.
+
+The following is a list of just some of the potential technologies that may need to be deployed and managed:
+
+* `Apache Kafka`_
+* `Kafka Streams <https://kafka.apache.org/documentation/streams/>`__
+* `Kafka Connect <https://kafka.apache.org/documentation/#connect>`__
+* `Cruise Control <https://github.com/linkedin/cruise-control>`__
+* `Faust <https://faust.readthedocs.io/en/latest/userguide/kafka.html>`__ (Python version similar to Kafka Streams)
+* `Various Python clients <https://cwiki.apache.org/confluence/display/KAFKA/Clients#Clients-Python>`__
+
+Note: `Amazon MSK`_ is an AWS managed service that supplies the Apache Kafka core platform only.
+
+or
+
+* `Confluent Platform`_ - Enterprise Kafka Distribution (Open Source, Community, or Commercial)
+
+  * `Schema Registry <https://www.confluent.io/product/confluent-platform/data-compatibility/>`__
+  * Monitoring and alerting capabilities (Commercial)
+  * Self-balancing clusters (Commercial)
+  * Tiered storage (Commercial) (future feature of Apache Kafka)
+  * Infinite retention (Cloud only?)
+
+Additional Notes:
+
+* `Apache Pulsar`_ has similar features as part of its platform, which is why it makes a good potential alternative. However, the features are less battle-tested and the deployment story *may* be more complicated.
+* Confluent also offers Confluent Cloud, a fully managed solution that offers much simpler operations, but is unlikely to be used by edX.org.
+
+Also see a useful and biased `comparison of Apache Kafka vs Vendors`_ by Kai Waehner (of Confluent), comparing various providers and distributions of Kafka and related or competitive services.
+
+.. _Amazon MSK: https://aws.amazon.com/msk/
+.. _Confluent Platform: https://www.confluent.io/product/confluent-platform
+.. _comparison of Apache Kafka vs Vendors: https://www.kai-waehner.de/blog/2021/04/20/comparison-open-source-apache-kafka-vs-confluent-cloudera-red-hat-amazon-msk-cloud/
+
+Kafka Highlights
+~~~~~~~~~~~~~~~~
+
+Pros
+^^^^
+
+* Battle-tested, widely adopted, big community, lots of documentation and answers.
+* Amazon MSK (AWS service) provides hosted path of least resistance.
+* `New Relic integration with Amazon MSK`_ (useful to edX.org).
+
+Cons
+^^^^
+
+* Many open questions about add-ons required for developers and operators.
+* Complex to manage, including likely manual scaling.
+
+.. _New Relic integration with Amazon MSK: https://docs.newrelic.com/docs/integrations/amazon-integrations/aws-integrations-list/aws-managed-kafka-msk-integration/
+
+Consequences
+------------
+
+* Operators will need to deploy and manage the selected infrastructure, which is likely to be complex. If Apache Kafka is selected, there are likely to be a set of auxiliary parts to provide all required functionality for our message bus.
+* Education will be required for both developers and operators regarding best practices for each role.
+* Code to interact with Kafka and its libraries will be added to core services.
+* At least one initial use case must be completed. One potential candidate is the grade change event in the LMS, and its use by the Credentials service.
+* Once we have a message bus, we can investigate other potential use cases:
+
+  * Course/program update propagation.
+  * Feed into xAPI/Caliper capabilities.
+  * New services and features can be built fully de-coupled from the core application.
+
+Rejected Alternatives
+---------------------
+
+Apache Pulsar
+~~~~~~~~~~~~~
+
+Although rejected to start, `Apache Pulsar`_ remains an option if solving with Kafka turns out to be overly burdensome for developers or operators.
+
+Pros
+^^^^
+
+* Ease of scalability (built-in, according to docs).
+* Ease of data retention capabilities.
+* Additional built-in pub/sub features (built-in, according to docs).
+
+Cons
+^^^^
+
+* Requires 3rd party hosting or larger upfront investment in self-hosted (kubernetes).
+* Less mature (but growing) community, little documentation, and few answers.
+
+Note: Read an interesting (Kafka/Confluent) biased article exploring `comparisons and myths of Kafka vs Pulsar`_.
+
+.. _Apache Pulsar: https://pulsar.apache.org/
+.. _comparisons and myths of Kafka vs Pulsar: https://dzone.com/articles/pulsar-vs-kafka-comparison-and-myths-explored
+
+Redis
+~~~~~
+
+Pros
+^^^^
+
+* Already part of Open edX platform
+
+Cons
+^^^^
+
+* Can lose acked data, even if RAM backed up with an append-only file (AOF).
+* Requires homegrown schema management.

Part of our POC is meant to include schemas in action, so maybe we can add more details once we get to that.
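
For readers less familiar with the pub/sub pattern the draft OEP centers on, the following is a minimal in-memory sketch of one producer delivering the same event to multiple independent consumers. It is not Kafka and not part of the OEP: the class `InMemoryEventBus`, the topic name `grade-changed`, the event fields, and the analytics consumer are all hypothetical, loosely based on the grade change use case the draft mentions.

```python
from collections import defaultdict


class InMemoryEventBus:
    """Toy pub/sub bus (hypothetical): every subscriber to a topic gets every event."""

    def __init__(self):
        # Maps a topic name to the list of handler callables subscribed to it.
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        """Register a callable to be invoked for each event published to the topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        """Deliver the event to each subscriber and return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


bus = InMemoryEventBus()

# Two independent consumers; the producer does not need to know about either.
credentials_inbox = []
analytics_inbox = []
bus.subscribe("grade-changed", credentials_inbox.append)
bus.subscribe("grade-changed", analytics_inbox.append)

# Hypothetical event payload; a real bus would enforce a schema here.
event = {"user_id": 42, "course": "demo-course", "grade": 0.9}
delivered = bus.publish("grade-changed", event)
```

Unlike this sketch, a real event bus delivers asynchronously and persists events, which is where concerns such as retention, ordering, and schema management come in.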

feanil

comment created time in 9 days

PullRequestReviewEvent

pull request comment edx/edx-platform

fix: loading of profile images in devstack

Thanks for the fix @xitij2000.

xitij2000

comment created time in 10 days

push event edx/edx-platform

Kshitij Sobti

commit sha 982c98d1bd213bf09ad50d934e54071774133c1b

fix: loading of profile images in devstack (#28555) Profile images don't load in the devstack since the path for media files is broader than the path for profile images; reordering them fixes this.


push time in 10 days

PR merged edx/edx-platform

fix: loading of profile images in devstack open-source-contribution waiting on author

Description

Profile images don't load in the devstack since the path for media files is broader than the path for profile images; reordering them fixes this.
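
The ordering problem can be illustrated with a small, self-contained sketch of first-match URL resolution. This is not Django's resolver or the edx-platform code; the `resolve` helper and the two URL prefixes are assumptions chosen only to show how a broader pattern listed first shadows a narrower one.

```python
import re


def resolve(patterns, path):
    """Return the name of the first pattern that matches the path, or None."""
    for regex, name in patterns:
        if re.match(regex, path):
            return name
    return None


# Hypothetical prefixes: the media prefix is broader than the profile-image prefix.
MEDIA = (r"^/media/", "media")
PROFILE_IMAGES = (r"^/media/profile-images/", "profile_images")

path = "/media/profile-images/abc.jpg"

# Broad pattern first: the profile-image pattern is never reached.
broken_order = [MEDIA, PROFILE_IMAGES]
# Narrow pattern first: profile images resolve correctly, other media still works.
fixed_order = [PROFILE_IMAGES, MEDIA]

broken_result = resolve(broken_order, path)
fixed_result = resolve(fixed_order, path)
```

With the broad pattern first, the request is handed to the media handler, which looks in the wrong document root; with the narrow pattern first, each request reaches the intended handler.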

Testing instructions

  1. Upload a profile image for a user in the lms running in the devstack
  2. The image will throw a 404 on the devstack
  3. Switch to this branch and restart
  4. The image should now start showing up

Deadline

"None"

+2 -1

5 comments

1 changed file

xitij2000

pr closed time in 10 days

PullRequestReviewEvent

Pull request review comment edx/edx-platform

fix: loading of profile images in devstack

 if settings.DEBUG:
     urlpatterns += static(settings.STATIC_URL, document_root=settings.STATIC_ROOT)
-    urlpatterns += static(settings.MEDIA_URL, document_root=settings.MEDIA_ROOT)
     urlpatterns += static(

Maybe add something like the following to keep this from regressing?

    # profile image urls must come before the media url to work
    urlpatterns += static(
xitij2000

comment created time in 14 days

PullRequestReviewEvent

pull request comment edx/edx-django-utils

BOM-2745: Add DeploymentMonitoringMiddleware

Nice job! Looking forward to seeing this in New Relic after IDA upgrades.

UsamaSadiq

comment created time in 15 days

PullRequestReviewEvent

issue comment eduNEXT/openedx-events

Marking events experimental

There is a related discussion in https://github.com/eduNEXT/openedx-events/pull/24 about a status of provisional/not-implemented vs. active/implemented, and whether and how to document or discover this.

Regarding the term "experimental": it doesn't sound like you are saying we are experimenting with whether or not the event is needed, but rather that you want to try it out in at least one use case and evolve it as necessary before anyone else starts using it. Is that right? Would it be fair to say that nearly all the unimplemented events would be "experimental", and initial implementations would be "experimental" until we decide they are not? Or did you have a different definition or meaning of "experimental"?

Note: we are also having separate discussions about schema management and evolution as they relate to the use of an event bus. It may be that we'd like the processes to converge, and having a permanent or semi-permanent record of events may force us to improve initial designs, or have other consequences. This remains to be seen.

ormsbee

comment created time in 15 days

PullRequestReviewEvent

Pull request review comment edx/edx-django-utils

BOM-2745: Add DeploymentMonitoringMiddleware

 _REQUEST_CACHE_NAMESPACE = f'{_DEFAULT_NAMESPACE}.custom_attributes'


+class DeploymentMonitoringMiddleware:
+    """
+    Middleware to record environment values at the time of deployment for each service.
+    """
+    def __init__(self, get_response):
+        self.get_response = get_response
+
+    def __call__(self, request):
+        self.record_python_version()
+        self.record_django_version()
+        response = self.get_response(request)
+        return response
+
+    @staticmethod
+    def record_django_version():
+        """
+        Record the installed Django version as custom attribute
+
+        .. custom_attribute_name: django_version
+        .. custom_attribute_description: The django version in use (e.g. '2.2.24').
+           Set by DeploymentMonitoringMiddleware.
+        """
+        if not newrelic:  # pragma: no cover
+            return
+        try:
+            import django  # pylint: disable=import-outside-toplevel
+            django_version = django.__version__
+        except ModuleNotFoundError:
+            return
+        _set_custom_attribute('django_version', django_version)
+
+    @staticmethod
+    def record_python_version():
+        """
+        Record the Python version as custom attribute
+
+        .. custom_attribute_name: python_version
+        .. custom_attribute_description: The Python version in use (e.g. '3.8.10').
+           Set by DeploymentMonitoringMiddleware.
+        """
+        if not newrelic:  # pragma: no cover
+            return
+        _set_custom_attribute('python_version', platform.python_version())
+
+    def process_exception(self, exception):  # pylint: disable=unused-argument

@UsamaSadiq: Reminder to kill this, or at least respond to this comment. Thanks.

UsamaSadiq

comment created time in 16 days