
Gently down the stream with Amazon Kinesis

Helen Anderson ・ Originally published at helenanderson.co.nz ・ 4 min read

Gone are the days of batch processing and simply loading new tables into the database every 12 hours. With more and more platforms offering event streams of data, the infrastructure that stores these events and draws meaning from them needs to change too.

This post is an overview of how AWS Kinesis can be built into new or existing architecture to solve this problem. There are several options that allow you to run analytics on the fly, shard the data streams for scalability or simply stream the data into an S3 bucket for later processing.


Stream vs Batch
Kinesis Data Streams
Kinesis Firehose
Kinesis Analytics
Is it secure?
How do I pay for all this?
Getting started



Stream vs Batch

Kinesis allows data to be streamed in real time from a Producer to a Processor or storage option. More on these concepts in a bit.

This is a huge change from batch processing, which has traditionally been the way to move data from one location to another.

Batch Processing - Data, usually stored in a database, is landed in chunks and analysed when the transfer is complete.

Stream Processing - Streams of data pour in continuously, in real time, and don't have an end... unless you create one. This allows us to act on the data and make decisions faster.



Kinesis Data Streams


Back to the concepts, using Kinesis Data Streams as an example.

Input/Producer - the application that generates the events we want to capture. This can be log files, media, website clicks or transactional data.

Data Stream - this is a shard, or group of shards, that ingests records at up to 1,000 records per second, per shard. Data is then available to consumers for 24 hours by default.

Consumer/Processor - this is the AWS service, which can be another Kinesis service, that retrieves the events from the shards, in most cases in real time. AWS Lambda can be triggered to transform the event data into something more usable, or to push it into a database like DynamoDB or Aurora (a minimal handler sketch follows the use cases below).
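To make the Producer and Data Stream concepts concrete, here is a minimal producer sketch using boto3, the AWS SDK for Python. The stream name, region, and event fields are hypothetical, and it assumes the stream already exists and credentials are configured:

```python
import json

import boto3

# Hypothetical stream and region; the stream must already exist
# and AWS credentials must be configured in the environment.
kinesis = boto3.client("kinesis", region_name="us-east-1")

event = {"user_id": "u-123", "page": "/pricing", "action": "click"}

response = kinesis.put_record(
    StreamName="clickstream-events",          # hypothetical name
    Data=json.dumps(event).encode("utf-8"),   # payload must be bytes
    PartitionKey=event["user_id"],            # routes the record to a shard
)
print(response["ShardId"], response["SequenceNumber"])
```

Records with the same partition key always land on the same shard, which is why a high-cardinality key like a user ID spreads load evenly.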


Use cases for Kinesis Data Streams:

  • Streaming data like website clicks and transactional data
  • Migrating data from databases
  • Applications with specialised data pipelines
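And the Consumer side: a minimal sketch of the Lambda handler mentioned above, wired to a Kinesis trigger. Kinesis hands Lambda batches of base64-encoded records; the field names match the hypothetical producer event, and where you persist the result (DynamoDB, Aurora, etc.) is up to you:

```python
import base64
import json

# Minimal Lambda handler sketch for a Kinesis trigger. Kinesis
# delivers batches of records with base64-encoded payloads.
def handler(event, context):
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        click = json.loads(payload)  # matches the hypothetical producer event
        # Transform and/or persist, e.g. to DynamoDB or Aurora.
        print(click["user_id"], click["page"], click["action"])
```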


Kinesis Firehose


Kinesis Firehose differs from Kinesis Data Streams in that it takes the data, then batches, encrypts, and compresses it before persisting it to a destination such as Amazon S3, Amazon Redshift, or Amazon Elasticsearch Service.
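Pushing an event into Firehose looks much like the Data Streams producer. A minimal boto3 sketch, with a hypothetical delivery stream that must already exist with a destination configured:

```python
import json

import boto3

# Hypothetical delivery stream; Firehose handles the batching,
# compression, and delivery to the configured destination.
firehose = boto3.client("firehose", region_name="us-east-1")

event = {"device_id": "sensor-42", "temperature": 21.7}

firehose.put_record(
    DeliveryStreamName="iot-events-to-s3",  # hypothetical name
    Record={"Data": (json.dumps(event) + "\n").encode("utf-8")},
)
```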


Use cases for Kinesis Firehose:

  • IoT events
  • Security monitoring, with tools such as Splunk configured as a destination
  • Auto-archiving


Kinesis Analytics


Kinesis Data Analytics allows us to both process events and analyse them with SQL queries on the fly. The service recognises formats like JSON and CSV, then sends the output on to an analytics tool for visualisation or action.
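A Kinesis Data Analytics application is defined by the SQL it runs. Below is a minimal, hypothetical sketch that creates an application with a tumbling-window query counting clicks per page every ten seconds; input and output streams would still need to be attached before it could run:

```python
import boto3

# Hypothetical application; SOURCE_SQL_STREAM_001 is the default name
# Kinesis Data Analytics gives an attached input stream.
analytics = boto3.client("kinesisanalytics", region_name="us-east-1")

sql = """
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" ("page" VARCHAR(64), "click_count" INTEGER);
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
  INSERT INTO "DESTINATION_SQL_STREAM"
  SELECT STREAM "page", COUNT(*) AS "click_count"
  FROM "SOURCE_SQL_STREAM_001"
  GROUP BY "page", STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '10' SECOND);
"""

analytics.create_application(
    ApplicationName="clickstream-aggregator",  # hypothetical name
    ApplicationCode=sql,
)
```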


Use cases for Kinesis Analytics:

  • Real-time dashboards and metrics
  • Time-series analytics on streaming data
  • Alerts and anomaly detection on the fly

Is it secure?

Kinesis supports server-side encryption of events using AWS KMS, and access can be managed using IAM policies.
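For example, server-side encryption can be switched on for an existing stream with a KMS key. A minimal boto3 sketch, with a hypothetical stream name and the AWS-managed key for Kinesis:

```python
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# Enable server-side encryption on an existing (hypothetical) stream
# using the AWS-managed KMS key for Kinesis.
kinesis.start_stream_encryption(
    StreamName="clickstream-events",
    EncryptionType="KMS",
    KeyId="alias/aws/kinesis",
)
```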



How do I pay for all this?

Kinesis is pay-as-you-go, with no upfront cost. Broadly: Kinesis Data Streams is billed per shard hour plus per PUT payload unit, Kinesis Firehose by the volume of data ingested, and Kinesis Analytics for the hours the application runs. Check the AWS pricing pages for current rates.

Getting started

Kinesis is not included in the AWS Free Tier, but many of the other core services are.
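Creating a stream to experiment with is a single call (or a few clicks in the console). A minimal boto3 sketch with a hypothetical stream name:

```python
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# One shard is enough to experiment with; remember to delete the
# stream afterwards, since shard hours are billed.
kinesis.create_stream(StreamName="my-first-stream", ShardCount=1)
```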



Discussion (3)

lee

This is great. Do you know of any simple case studies of AWS Kinesis customers? Like they are using it to solve 'x'?

Helen Anderson (Author)

I really enjoyed a talk from re:Invent 2019 by the Sony PlayStation team, who are using streaming technology to evaluate every purchase on their network and prevent fraudulent logins.

The final architecture combined the best of batch and stream processing to reduce time to resolution.

Batch Processing:

  • The batch layer processes all the data
  • A REST API sends events to Kinesis Firehose, which converts them to Parquet
  • Glue and Spark aggregate the data and persist it in DynamoDB

Stream Processing:

  • Adds a speed layer for temporary, real-time, current-state decisions
  • Uses Kinesis Analytics and persists in DynamoDB

lee

Thanks so much for this. Really good stuff 🙏🏽