Consume data stream from Kinesis Data Streams using Kinesis Firehose
Services Covered
Kinesis Data Streams
Kinesis Firehose
Lab description
An application running on the EC2 instance is generating continuous logs, those loge will be pushed into the Kinesis Data Streams. From Kinesis Data Streams, it will be consumed through the Kinesis Firehose and then saved into a S3 bucket.
Learning Objectives
- Host a sample website on EC2 and configure Kinesis Agent on it
- Create Kinesis data stream
- Create Kinesis Data Firehose
Lab date
12-12-2021
Prerequisites
- AWS account
Lab steps
- Create an IAM Role for EC2. Attach the AmazonS3FullAccess and AmazonKinesisFullAccess permissions.
- Launch an EC2 instance. Provide the user data script:
#!/bin/bash sudo -s yum update -y sudo amazon-linux-extras install -y lamp-mariadb10.2-php7.2 php7.2 sudo yum install -y httpd mariadb-server sudo systemctl start httpd sudo systemctl enable httpd
It will start a simple web server on that instance.
For the security group allow SSH and HTTp ingress traffic.
- SSH into EC2 instance. On the instance go to
cd /var/www/html
and download a sample site
sudo wget https://www.free-css.com/assets/files/free-css-templates/download/page270/marvel.zip
then unzip it
sudo unzip marvel.zip
Navigate to
http://<Instances-public-IP>/2115_marvel
The website logs will be in the path “/var/log/httpd/access_log”. For each click and use of the website, the related logs will be collected and stored here.
- Create Kinesis Data Stream. Choose Provisioned with 1 data shard. When created, enable server-side encryption.
- Create a S3 bucket to store data from Firehose. Enable server-side encryption with S3 key.
- Create Kinesis Data Firehose. As source use the Kinesis Data Stream, destination is the S3 bucket, change the buffer interval to 60 seconds.,
- Create and configure Kinesis Agent on the EC2 instance.
sudo yum install –y https://s3.amazonaws.com/streaming-data-agent/aws-kinesis-agent-latest.amzn2.noarch.rpm
After installing the Kinesis agent, let us update the json file available in the path /etc/aws-kinesis/agent.json. Edit the agent.json,
sudo nano /etc/aws-kinesis/agent.json
Replace the content with following and your access id and key:
{ "cloudwatch.emitMetrics": true, "kinesis.endpoint": "", "firehose.endpoint": "", "awsAccessKeyId": "AKIA5***************", "awsSecretAccessKey": "iPpJ***************", "flows": [ { "filePattern": "/var/log/httpd/access_log", "kinesisStream": "whiz-data-stream", "partitionKeyOption": "RANDOM" } ] }
Restart the agent
sudo service aws-kinesis-agent stop sudo service aws-kinesis-agent start
You can check if the service is started properly by going through the log.
head -10 aws-kinesis-agent.log
- Test the real-time streaming of data by interacting with the
http://<Instances-public-IP>/2115_marvel
. Then navigate to the S3 bucket and wait for the data logs to be created.