Component Diagram
The Program User Info service is built on Apache Kafka, Apache Flink, and Apache Cassandra, a combination suited to processing and storing data scalably and in near real time.
Apache Kafka is a distributed event streaming platform designed to handle high volumes of real-time data. In this service, user information is published to a Kafka topic as JSON events.
Apache Flink is a stream processing framework that offers high throughput, low latency, and support for exactly-once processing of streaming data. Flink consumes each event from Kafka and flattens its JSON payload: nested JSON structures are transformed into a simpler, one-level structure in which every field sits at the top level.
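
The flattening step can be illustrated with a minimal Python sketch. It is not the service's actual Flink code: the `flatten` helper, the underscore separator used to join nested keys, the choice to leave lists untouched, and the sample event are all assumptions made for illustration.

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], parent_key: str = "", sep: str = "_") -> Dict[str, Any]:
    """Return a one-level dict; nested keys are joined with `sep` (assumed separator)."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects and lift their fields to the top level.
            flat.update(flatten(value, full_key, sep))
        else:
            # Scalars and lists are kept as-is (an assumption of this sketch).
            flat[full_key] = value
    return flat


# Hypothetical event; the real payload fields are not specified in this document.
event = {
    "userId": "u-1",
    "program": {"id": "p-9", "name": "Demo"},
    "profile": {"location": {"state": "KA"}},
}
flat_event = flatten(event)
```

With this sample input, `flat_event` holds the keys `userId`, `program_id`, `program_name`, and `profile_location_state`, all at the top level.
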
Apache Cassandra is a distributed NoSQL database designed to handle massive amounts of data across many commodity servers while ensuring high availability and fault tolerance. Once the data is flattened, it is stored in the Cassandra database.
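
Because a flattened record has only top-level fields, each field can map to one table column. The sketch below shows one way to turn such a record into a parameterised CQL `INSERT` statement. The `build_insert` helper and the column names `user_id` and `program_id` are hypothetical; the actual table schema and write path are not described in this document.

```python
from typing import Any, Dict, List, Tuple


def build_insert(keyspace: str, table: str, row: Dict[str, Any]) -> Tuple[str, List[Any]]:
    """Build a CQL INSERT with `?` bind markers and the matching list of values."""
    columns = sorted(row)  # fixed column order so the statement text is stable
    placeholders = ", ".join("?" for _ in columns)
    statement = (
        f"INSERT INTO {keyspace}.{table} "
        f"({', '.join(columns)}) VALUES ({placeholders})"
    )
    return statement, [row[column] for column in columns]


# Hypothetical flattened row; the column names are illustrative only.
statement, values = build_insert(
    "sunbird_programs",
    "program_enrollment",
    {"user_id": "u-1", "program_id": "p-9"},
)
```

The statement text and the value list would then be handed to whichever Cassandra client the job uses, for example as a prepared statement.
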
Configuration variables:
Variable | Default Value | Purpose |
---|---|---|
kafka.input.topic | {{env}}.programuser.info | Kafka topic from which messages/events are read to be processed. |
kafka.groupId | {{env}}-programuser-group | Kafka consumer group ID used when reading from the input topic. |
ml-cassandra.keyspace | sunbird_programs | Cassandra keyspace name |
ml-cassandra.table | program_enrollment | Cassandra table used to store user data |
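
As a rough illustration of how the `{{env}}` placeholder in the defaults above might be resolved, the following sketch substitutes an environment name into each value. The `resolve` helper and the `overrides` parameter are assumptions; the service's real configuration loading mechanism is not documented here.

```python
from typing import Dict, Optional

# Default values copied from the table above.
DEFAULTS: Dict[str, str] = {
    "kafka.input.topic": "{{env}}.programuser.info",
    "kafka.groupId": "{{env}}-programuser-group",
    "ml-cassandra.keyspace": "sunbird_programs",
    "ml-cassandra.table": "program_enrollment",
}


def resolve(env: str, overrides: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Apply optional overrides, then replace the {{env}} placeholder in every value."""
    merged = {**DEFAULTS, **(overrides or {})}
    return {key: value.replace("{{env}}", env) for key, value in merged.items()}


# "dev" is an example environment name, not one taken from this document.
config = resolve("dev")
```

Here `config["kafka.input.topic"]` resolves to `dev.programuser.info`, and the Cassandra settings pass through unchanged because they contain no placeholder.
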