This article is written by Elinext’s software developer Ivan Polyakov. It covers AWS Kinesis Streams, which, described in a few words, is a scalable real-time streaming service for collecting and processing extremely large amounts of data. The idea is simple, yet promising for many tasks that involve transmitting and processing large data volumes.
This is a review of the service that highlights what it is like to use it on projects with heavy data loads.
The service landing page gives a general impression of what it is all about, so let’s take a closer look.
Kinesis is actually split into four similar but specialized services:
Video Streams makes it easy to securely stream video from connected devices to AWS for analytics, machine learning (ML), and other processing.
Data Streams is a scalable and durable real-time data streaming service that can continuously capture gigabytes of data per second from hundreds of thousands of sources.
Data Firehose is the easiest way to capture, transform, and load data streams into AWS data stores for near real-time analytics with existing business intelligence tools.
Data Analytics is the easiest way to process data streams in real time with SQL or Java without having to learn new programming languages or processing frameworks.
In this article, we’ll focus on the Data Streams (DS) and Data Firehose (DF) services. Together they cover everything needed for the cases reviewed below.
If the spec for a project does not require the architecture to handle high load, this aspect often gets overlooked. This is especially true when the budget and time frame for development are limited.
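To make the Data Streams idea more concrete, here is a minimal sketch of a producer that writes one event into a stream. It assumes the boto3 Kinesis client, whose `put_record` call takes `StreamName`, `Data` and `PartitionKey`. The stream name, the event fields and the `InMemoryKinesisClient` stand-in are hypothetical and exist only so the sketch runs without AWS credentials.

```python
import json


def build_put_record_args(stream_name, event, partition_key):
    """Build keyword arguments shaped like the Kinesis PutRecord request."""
    return {
        "StreamName": stream_name,
        # Kinesis stores opaque bytes, so the event is serialized to JSON here.
        "Data": json.dumps(event).encode("utf-8"),
        # Records with the same partition key are routed to the same shard.
        "PartitionKey": partition_key,
    }


def send_event(client, stream_name, event, partition_key):
    """Send one event through any client object that exposes put_record()."""
    return client.put_record(
        **build_put_record_args(stream_name, event, partition_key)
    )


class InMemoryKinesisClient:
    """Hypothetical stand-in for boto3.client("kinesis"), for local runs only."""

    def __init__(self):
        self.records = []

    def put_record(self, **kwargs):
        self.records.append(kwargs)
        # Made-up response values; the real service returns its own identifiers.
        return {
            "ShardId": "shardId-000000000000",
            "SequenceNumber": str(len(self.records)),
        }


if __name__ == "__main__":
    client = InMemoryKinesisClient()
    # With real AWS access this would be: client = boto3.client("kinesis")
    response = send_event(
        client,
        "demo-stream",  # hypothetical stream name
        {"user_id": 42, "action": "click"},
        partition_key="42",
    )
    print(response)
```

The point of the design is that the producer only appends a record and returns, while the heavy processing happens later on the consumer side.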
In most cases, incoming data goes directly to the app, which processes it, performs certain actions and stores the data in the database. Everything works fine until “sudden popularity” hits the project and the amount of collected data becomes too large to process.
Then the app begins working way too slowly and consumes a lot of memory. Parts of the incoming data usually get lost.
Often the first countermeasures are upgrading the hardware and doing some code refactoring. That helps for a certain period of time, usually for a day or two 🙂 Then the incoming data grows 100x and turns into millions of requests a day and gigabytes of data.
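The effect described above can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions: a single app instance, a fixed number of requests handled per tick, and a bounded in-memory buffer where anything that does not fit is lost. All numbers are made up for illustration.

```python
def simulate(arrivals_per_tick, capacity_per_tick, buffer_size, ticks):
    """Return (processed, dropped) for an app with a bounded request buffer."""
    queued = processed = dropped = 0
    for _ in range(ticks):
        # New requests arrive; whatever does not fit into the buffer is lost.
        accepted = min(arrivals_per_tick, buffer_size - queued)
        dropped += arrivals_per_tick - accepted
        queued += accepted
        # The app handles at most capacity_per_tick requests per tick.
        handled = min(capacity_per_tick, queued)
        queued -= handled
        processed += handled
    return processed, dropped


if __name__ == "__main__":
    # Normal load: the app keeps up and nothing is lost.
    print(simulate(arrivals_per_tick=10, capacity_per_tick=20,
                   buffer_size=100, ticks=10))
    # Load grows 100x while a hardware upgrade only doubles the capacity.
    print(simulate(arrivals_per_tick=1000, capacity_per_tick=40,
                   buffer_size=100, ticks=10))
```

In the second run doubling the capacity barely matters, because almost all of the incoming requests are dropped before the app can get to them.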
What can help at this point is upgrading to an architecture that can handle high load. The Kinesis Streams service is considered here as one of the possible ways to achieve that. We’ll give you the pros and cons of using the service below.
Architecture without Kinesis Streams
First of all, we have to mention that only some common approaches to improving different processes will be reviewed. Let’s begin by analyzing the main issue in detail.