amazon-kinesis – A Passionate Techie

I’ve been racking my brain on this one. The solution is probably simple, but I can’t figure it out, and I’ve searched everywhere without finding an answer to this specific problem.

I’ve got two tables – one contains an IP connections list, for example:

Connections_Table

src            dst
192.168.1.1    1.2.3.4
192.168.1.1    2.2.2.2
192.168.1.1    3.3.3.3
192.168.1.1    4.4.4.4

The other table contains a list of IP addresses, for example:

Bad_Ip_Addresses_Table

ip
7.8.9.4
3.2.1.4
77.8.99.4
2.2.2.2
18.7.9.8

Here’s where I can’t work out how to build the query: I’m trying to build a table that shows src and dst from the first table, plus a flag indicating whether the second table contains that dst. In other words:

Results_Table

src            dst        match
192.168.1.1    1.2.3.4    0
192.168.1.1    2.2.2.2    1
192.168.1.1    3.3.3.3    0
192.168.1.1    4.4.4.4    0

Here’s probably the biggest catch: I’ve seen posts on SO where the solution involves creating a table and triggers. I can’t do that, because this would be an AWS Kinesis Analytics SQL statement.

Connections_Table is ingested live, and Bad_Ip_Addresses_Table is a CSV loaded from AWS S3. For each ingested row, I need to run a SQL statement against the CSV to find out whether the dst IP is in it.

Any suggestions?

Solution:

In MySQL, you can LEFT JOIN from Connections_Table to Bad_Ip_Addresses_Table, so that every dst value from Connections_Table is kept whether or not a matching row exists.
You can then GROUP BY src and dst and use COUNT() to count the matches. COUNT() ignores NULL values, so non-matching rows return 0 (the joined ip column is NULL for them after the LEFT JOIN). If the same ip appears more than once in Bad_Ip_Addresses_Table, or the same src/dst pair appears more than once in Connections_Table, the count can be greater than 1.

MATCH is a reserved word in MySQL, so the alias is quoted with backticks. Try the following query:

SELECT
  ct.src,
  ct.dst,
  COUNT(biat.ip) AS `match`
FROM
  Connections_Table AS ct
LEFT JOIN Bad_Ip_Addresses_Table AS biat ON biat.ip = ct.dst
GROUP BY ct.src, ct.dst;
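
As a rough way to check the LEFT JOIN plus COUNT() approach without a MySQL server, the sketch below loads the sample rows from the question into an in-memory SQLite database and runs an equivalent query. SQLite stands in for MySQL here, and the function name `match_connections`, the alias `match_count` (chosen to avoid keyword clashes), and the added ORDER BY are illustrative assumptions, not part of the original answer. It does not cover the Kinesis Analytics case.

```python
import sqlite3

# Sample data copied from the question.
CONNECTIONS = [
    ("192.168.1.1", "1.2.3.4"),
    ("192.168.1.1", "2.2.2.2"),
    ("192.168.1.1", "3.3.3.3"),
    ("192.168.1.1", "4.4.4.4"),
]
BAD_IPS = ["7.8.9.4", "3.2.1.4", "77.8.99.4", "2.2.2.2", "18.7.9.8"]


def match_connections(connections, bad_ips):
    """Return (src, dst, match_count) rows, ordered by dst."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Connections_Table (src TEXT, dst TEXT)")
        conn.execute("CREATE TABLE Bad_Ip_Addresses_Table (ip TEXT)")
        conn.executemany(
            "INSERT INTO Connections_Table VALUES (?, ?)", connections
        )
        conn.executemany(
            "INSERT INTO Bad_Ip_Addresses_Table VALUES (?)",
            [(ip,) for ip in bad_ips],
        )
        # COUNT(biat.ip) skips the NULLs produced by unmatched LEFT JOIN rows.
        return conn.execute(
            """
            SELECT ct.src, ct.dst, COUNT(biat.ip) AS match_count
            FROM Connections_Table AS ct
            LEFT JOIN Bad_Ip_Addresses_Table AS biat ON biat.ip = ct.dst
            GROUP BY ct.src, ct.dst
            ORDER BY ct.dst
            """
        ).fetchall()
    finally:
        conn.close()


RESULTS = match_connections(CONNECTIONS, BAD_IPS)
for row in RESULTS:
    print(row)
```

Only the row whose dst is 2.2.2.2 should come back with a non-zero count, which matches the Results_Table in the question.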

I am new to AWS. I have implemented some functionality in AWS using Java. My requirement is to insert a 50 MB CSV file into an RDS PostgreSQL instance in one go.

I first tried the AWS Lambda service, but a Lambda function is stopped after 5 minutes (a limitation of Lambda), so I dropped that approach.

As a second approach, I wrote a Java Lambda function, triggered by an S3 event, that reads the CSV file landing in S3 and writes it to a Kinesis stream using the PutRecord command. As I understand it, Kinesis can handle the CSV file record by record. The Kinesis stream invokes a second Lambda function, which saves the data to PostgreSQL.

Everything seemed fine, but only 32,000 records are being inserted, and my CSV has 50,000 records. Since the Kinesis stream treats each row as a record, it should invoke the Lambda separately for each one, right? So why is the data not saved completely?

One more question: my Kinesis stream is configured as shown below.

In my Lambda, I also configured Kinesis as follows:

Is this the correct configuration for my requirement? If I set the batch size to 1, will my function insert every record? Please share what you know about this. It would be a great help. Thanks in advance!

Solution:

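You are exceeding the limits for a single shard. Each shard accepts writes of up to 1,000 records per second or 1 MB per second, and requests beyond that are throttled.

Review the following document:
Amazon Kinesis Data Streams Limits

Make sure that your code checks for errors on each AWS call, so that throttled or failed writes are not silently dropped.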
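
One way to act on that advice is to inspect the per-record results of a batched write and resend only the entries that failed. The sketch below is in Python rather than the Java used in the question, and it assumes a client object that behaves like `boto3.client("kinesis")`, whose `put_records` response contains a `Records` list in which failed entries carry an `ErrorCode`. The function name `put_all_records`, the retry count, and the backoff values are illustrative assumptions.

```python
import time


def put_all_records(client, stream_name, records, max_attempts=5, base_delay=0.1):
    """Send records with PutRecords, retrying entries reported as failed.

    `client` is assumed to behave like boto3.client("kinesis").
    `records` is a list of {"Data": bytes, "PartitionKey": str} dicts.
    Returns the number of accepted records; raises if some keep failing.
    """
    pending = list(records)
    sent = 0
    for attempt in range(max_attempts):
        still_failing = []
        # PutRecords accepts at most 500 records per request.
        for start in range(0, len(pending), 500):
            batch = pending[start:start + 500]
            response = client.put_records(StreamName=stream_name, Records=batch)
            # Results come back in the same order as the request entries.
            for record, result in zip(batch, response["Records"]):
                if "ErrorCode" in result:
                    still_failing.append(record)  # e.g. throttled by the shard
                else:
                    sent += 1
        pending = still_failing
        if not pending:
            return sent
        # Exponential backoff before resending the failed entries.
        time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError(
        f"{len(pending)} records still failing after {max_attempts} attempts"
    )
```

Comparing the returned count with the number of rows in the CSV gives a quick check that nothing was lost before the data reaches the second Lambda function.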


Tag: amazon-kinesis

MySQL – How to combine data from two tables, using the value of one field as the filter to count values in another?

AWS Kinesis Stream In Detail Review
