Inspiration

Earlier, I built a Discord bot that automatically detects sensitive data shared in chat and tokenizes the values, while still allowing de-tokenization by users with the proper authority. Sensitive data includes passwords, secrets, certificates, keys, URLs in chat, and user-uploaded attachments. I used Skyflow Vault to store the information needed to reconstruct a message and to tokenize its sensitive parts, such as the attachment CDN URL or the message text. The bot also integrates with the VirusTotal APIs to detect and report malicious uploads.

I wanted to add a feature to this bot that lets a user search for specific text in the documents shared by the community on the server. The default search in collaboration and communication platforms is very limited, and Discord is no exception. Having used Discord many times to work on course projects with my friends, I know how many project reports, articles, research papers, and similar documents get shared. The volume would be much larger in proprietary business communication, so a robust search feature that covers the contents of documents would be very useful to the community.

What it does

The bot uses Discord slash commands to communicate with a serverless AWS backend built on several AWS services. The slash commands are:

  1. /sync [history (int)] - Scan the last 'history' messages for attachments, upload each one, then run text analysis and entity detection on it. Store and index the keyword-to-file data in a DB.

  2. /search [keyword (string)] - Search for 'keyword' or similar words in the DB, and then retrieve the documents and their related information.
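
As a rough illustration of how the two commands might arrive at the backend, the sketch below pulls the command name and its options out of a Discord interaction payload. The helper name `parse_slash_command` is hypothetical and is not taken from the project; the payload shape (`type` 2 for a slash command, with `data.name` and `data.options`) follows Discord's interaction format.

```python
from typing import Any

APPLICATION_COMMAND = 2  # Discord interaction type used for slash commands


def parse_slash_command(interaction: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Return (command_name, {option_name: value}) for a slash-command payload.

    Hypothetical helper for illustration; the project's own parsing may differ.
    """
    if interaction.get("type") != APPLICATION_COMMAND:
        raise ValueError("not a slash-command interaction")
    data = interaction["data"]
    # Options are optional in the payload, so default to an empty list.
    options = {opt["name"]: opt["value"] for opt in data.get("options", [])}
    return data["name"], options


if __name__ == "__main__":
    payload = {
        "type": 2,
        "data": {
            "name": "sync",
            "options": [{"name": "history", "type": 4, "value": 50}],
        },
    }
    print(parse_slash_command(payload))
```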

How we built it

The bot interacts with the main Lambda function (Lumigo Solution) through slash commands and interaction events. While responding to the command, the main function asynchronously invokes a helper function (one each for the sync and search commands) that interacts with the other AWS services and performs the heavier work. Decoupling the tasks this way lets the bot respond quickly to commands and avoid timeouts. The helper Lambda functions have the following tasks:
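
The respond-first, work-later pattern described above could look roughly like the following sketch. It is not the project's code: `handle_interaction` and the `dispatch` callable are hypothetical names, and `dispatch` stands in for whatever hand-off the deployment uses (for example a thin wrapper around an SNS publish or an asynchronous Lambda invocation). Request signature verification, which Discord requires for interaction endpoints, is left out for brevity.

```python
import json
from typing import Any, Callable

# Incoming Discord interaction types
PING, APPLICATION_COMMAND = 1, 2
# Discord interaction response types
PONG, CHANNEL_MESSAGE, DEFERRED_CHANNEL_MESSAGE = 1, 4, 5

KNOWN_COMMANDS = ("sync", "search")


def handle_interaction(body: str, dispatch: Callable[[str, str], None]) -> dict[str, Any]:
    """Acknowledge the interaction quickly and hand the heavy work to a helper.

    `dispatch(command, payload_json)` is an assumed hook; in a real deployment
    it would forward the payload to the matching helper function.
    """
    interaction = json.loads(body)
    if interaction["type"] == PING:
        return {"type": PONG}
    if interaction["type"] == APPLICATION_COMMAND:
        command = interaction["data"]["name"]
        if command not in KNOWN_COMMANDS:
            return {"type": CHANNEL_MESSAGE, "data": {"content": f"Unknown command: {command}"}}
        # Hand off first, then send a deferred response so the user sees a
        # loading state while the helper works and posts the result later.
        dispatch(command, json.dumps(interaction))
        return {"type": DEFERRED_CHANNEL_MESSAGE}
    raise ValueError("unsupported interaction type")
```

Passing the hand-off in as a callable keeps the handler easy to test locally with a stub in place of the AWS call.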

  1. DataHandler: handles the /sync command workflow.

    • Takes its input from an SQS queue subscribed to an SNS topic that carries the output of the main Lambda.
    • Retrieves the attachment from its Discord CDN URL and uploads the file to an S3 bucket.
    • Detects text in the document (stored in the S3 bucket) using Amazon Textract.
    • Performs entity detection on this text to extract key phrases and entity information.
    • Stores the key phrases and the corresponding file-related data in DynamoDB, so that all files in which a keyword occurs can be queried.
  2. SearchHandler: handles the /search command workflow.

    • Takes a search string and channel-related information from the output of the main function.
    • Queries the DynamoDB table to fetch related keywords.
    • Extracts all file-related info and sends it as a message embed using the Discord APIs. The embed includes links to the original message and to the attachment.
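
The keyword-to-file mapping that the two handlers share can be pictured with a small in-memory stand-in for the DynamoDB table. This is a sketch under assumptions, not the project's schema: `FileRecord`, `KeywordIndex`, and their fields are hypothetical, and the case-insensitive substring match is only a placeholder for however "similar words" are matched in the real bot.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    """File-related data stored next to each key phrase (assumed fields)."""
    filename: str
    attachment_url: str
    message_url: str


class KeywordIndex:
    """In-memory stand-in for the keyword table, for illustration only."""

    def __init__(self) -> None:
        self._items: dict[str, list[FileRecord]] = defaultdict(list)

    def put(self, key_phrases: list[str], record: FileRecord) -> None:
        # DataHandler side: one entry per key phrase, all pointing at the file.
        for phrase in key_phrases:
            key = phrase.strip().lower()
            if key and record not in self._items[key]:
                self._items[key].append(record)

    def search(self, keyword: str) -> dict[str, list[FileRecord]]:
        # SearchHandler side: return every stored phrase containing the keyword.
        needle = keyword.strip().lower()
        if not needle:
            return {}
        return {k: list(v) for k, v in self._items.items() if needle in k}


if __name__ == "__main__":
    index = KeywordIndex()
    report = FileRecord("report.pdf", "https://cdn.example/report.pdf", "https://chat.example/msg/1")
    index.put(["Neural Networks", "Gradient Descent"], report)
    for phrase, files in index.search("neural").items():
        print(phrase, [f.filename for f in files])
```

Keying the store by phrase means a search never has to re-read the documents themselves; the cost is one write per key phrase during sync.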

Lumigo Integration

Although the project uses a number of AWS services and Lambda functions, Lumigo's consolidated logs and issue tracking across all of the connected services helped me trace issues quickly and easily. Lumigo removed the hassle of setting up logging for each service and of hunting for the relevant logs in CloudWatch. It also tracked connections to external APIs and services, so each transaction came with the relevant logs and debugging information. Predefined alerts meant that I didn't need to configure anything on AWS to monitor resources and issues.

What's next for AWS powered Search

Possible improvements include:

  1. An option that lets an admin auto-sync attachments/documents without running any command. Scheduling the Lambda to execute every 'x' minutes could do the trick.
  2. Implement an improved and efficient search system.
  3. Replicate the bot for use in other popular collaboration platforms like Slack or Teams.
