Component Diagram

The Program User Info service is built on Apache Flink, Apache Kafka, and Apache Cassandra, a combination suited to processing and storing data at scale in real time.

Apache Kafka is a distributed event streaming platform designed to handle high volumes of real-time data. In this service, user information is published to a Kafka topic as JSON events.
Apache Flink is a stream processing framework that provides high-throughput, low-latency, exactly-once processing of streaming data. Flink consumes each event from Kafka and flattens its JSON payload: nested structures are transformed into a single-level structure in which every field sits at the top level.
Apache Cassandra is a distributed NoSQL database designed to handle large amounts of data across many commodity servers while providing high availability and fault tolerance. Once the data is flattened, it is stored in Cassandra.
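
The flattening step described above can be illustrated with a minimal Python sketch. This is not the service's actual Flink job: the function name `flatten`, the underscore separator, the handling of lists by index, and the fields in the sample event are all assumptions made for illustration.

```python
import json
from typing import Any, Dict


def flatten(obj: Any, parent_key: str = "", sep: str = "_") -> Dict[str, Any]:
    """Flatten nested dicts and lists into a single-level dict.

    Nested keys are joined with `sep` (an assumed convention); list items
    are keyed by their index. Empty dicts and lists produce no output keys.
    """
    items: Dict[str, Any] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
            items.update(flatten(value, new_key, sep))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            new_key = f"{parent_key}{sep}{index}" if parent_key else str(index)
            items.update(flatten(value, new_key, sep))
    else:
        # Scalar leaf: store it under the accumulated key.
        items[parent_key] = obj
    return items


# Hypothetical event payload; the real event schema is not shown here.
sample_event = json.loads(
    '{"userId": "u1", "program": {"id": "p1", "name": "Demo"}, '
    '"roles": ["teacher", "mentor"]}'
)
flat_event = flatten(sample_event)
```

With this sample, `flat_event` contains only top-level keys such as `program_id` and `roles_0`, which is the shape a single Cassandra row could be built from.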

Configuration variables:

| Variable | Default Value | Purpose |
| --- | --- | --- |
| kafka.input.topic | {{env}}.programuser.info | Kafka topic from which messages/events are read to be processed |
| kafka.groupId | {{env}}-programuser-group | Kafka input topic group ID |
| ml-cassandra.keyspace | sunbird_programs | Cassandra keyspace name |
| ml-cassandra.table | program_enrollment | Cassandra table used to store user data |
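
As a rough illustration of how the defaults above might resolve for a given deployment, the sketch below substitutes the `{{env}}` placeholder with an environment name. It assumes `{{env}}` is a deployment-time placeholder; the actual substitution mechanism is not described here, and `resolve_defaults` and the environment name `dev` are hypothetical.

```python
from typing import Dict

# Default values copied from the configuration variables listed above.
DEFAULTS: Dict[str, str] = {
    "kafka.input.topic": "{{env}}.programuser.info",
    "kafka.groupId": "{{env}}-programuser-group",
    "ml-cassandra.keyspace": "sunbird_programs",
    "ml-cassandra.table": "program_enrollment",
}


def resolve_defaults(env: str, defaults: Dict[str, str] = DEFAULTS) -> Dict[str, str]:
    """Return a copy of `defaults` with every {{env}} placeholder replaced.

    Hypothetical helper: it only mimics what a templating step would do.
    """
    return {key: value.replace("{{env}}", env) for key, value in defaults.items()}


# Example: resolve the defaults for an assumed environment called "dev".
dev_config = resolve_defaults("dev")
```

Under these assumptions, `dev_config["kafka.input.topic"]` is `dev.programuser.info`, while the Cassandra keyspace and table names are unchanged because they contain no placeholder.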
