Serverless Web Scraper Bot using AWS

Introduction

How much time do you spend looking for something you want to buy, from researching the product to finding the best deal from a seller that has the item in stock? With online shopping now so common, almost everyone has experienced this struggle. Even when you know exactly what you want, finding the product in stock can be an entirely different challenge. Availability usually depends on demand, and products with high demand can be extremely difficult to find. This article goes over how you can solve this common problem using AWS serverless services and an event-driven architecture.

Background

As of January 2021, there were 4.66 billion active internet users worldwide. This means that 59.5% of the global population can buy almost anything online, from products to services. Purchasing goods and services online has become the normal way to shop for just about everybody. With this massive demand for online commerce, competition between e-commerce platforms has also grown tremendously, especially for high-demand products.

To purchase these high-demand products online, buyers need to constantly check a variety of e-commerce platforms in the hope of finding one in stock. For example, the Keychron Q1 mechanical keyboard is one of the best mechanical keyboards on the market. However, it is rarely available due to high demand and limited production. Demand skyrocketed as it became the keyboard of choice for developers around the world thanks to its comfort and ergonomic fit. The keyboard also works with both Windows and macOS, and every single key can be customized with a different skin. Between the high demand and limited production, it is constantly sold out, and any new inventory normally disappears in less than an hour.

This leaves buyers with two options to get one:

  • Every single hour, open your browser and search for one on different e-commerce platforms, refreshing constantly and hoping you get lucky enough to see one in stock.
  • Automate the task entirely so that availability is checked every hour on all the e-commerce platforms you choose.

What is serverless?

Serverless has gained popularity in recent years, and to understand the automated solution discussed above, you first need a clear understanding of it. Essentially, serverless refers to an architectural approach where all server configuration is owned by the cloud provider. One of the most common misconceptions about serverless is that there are no servers at all. This is false. There are still servers, but they are managed and maintained by the cloud provider.

While serverless offers plenty of features, one of its most powerful uses is building event-driven architectures. An event-driven architecture reacts to events generated by other AWS services. Since these services are provided and built by AWS, the integrations are fairly straightforward for most use cases. A major benefit of serverless for event-driven architecture is its payment model. Instead of a flat monthly fee, serverless computing typically uses a pay-per-use model, so you only pay for your usage of the service and nothing more. Most serverless computing providers also offer a generous free tier for users to experiment with before committing to a paid plan.

Event-driven architectures

Serverless computing allows users to build event-driven applications with relative ease. In a nutshell, event-driven architectures react to events generated by other AWS services or even external actors such as HTTP requests or webhooks.

In this model, an event producer, which could be an AWS resource or an external source, emits an event. The event is stored in an event ingestion mechanism and then forwarded to the corresponding event consumers.
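
The producer, ingestion, and consumer roles can be illustrated with a small in-memory sketch. This is only a toy model and not how EventBridge is implemented; the `EventBus` class, the `stock.checked` event type, and the item name are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Event = Dict[str, object]
Consumer = Callable[[Event], None]


class EventBus:
    """Toy stand-in for an event ingestion mechanism (hypothetical)."""

    def __init__(self) -> None:
        # Maps an event type to the consumers subscribed to it.
        self._consumers: Dict[str, List[Consumer]] = defaultdict(list)

    def subscribe(self, event_type: str, consumer: Consumer) -> None:
        self._consumers[event_type].append(consumer)

    def publish(self, event: Event) -> int:
        """Forward the event to matching consumers; return how many ran."""
        consumers = self._consumers.get(str(event.get("type")), [])
        for consumer in consumers:
            consumer(event)
        return len(consumers)


# The consumer here simply records every event it receives.
received: List[Event] = []
bus = EventBus()
bus.subscribe("stock.checked", received.append)

# The producer publishes one matching event and one that nobody consumes.
delivered = bus.publish({"type": "stock.checked", "item": "example-keycap-set"})
ignored = bus.publish({"type": "unrelated.event"})
```

The producer never calls the consumer directly. It only knows about the bus, which is what keeps the two sides decoupled.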

SAM

SAM stands for the AWS Serverless Application Model. It is an open-source framework from AWS, with its own Command Line Interface (CLI), that helps users build serverless applications using infrastructure as code. SAM templates are an extension of CloudFormation with particular directives that make development easier. You can use SAM to build Lambda functions, DynamoDB tables, API Gateway endpoints, roles, and more based on a template.

Building the solution

In this architecture, there are three types of event producers. The first is an EventBridge rule, which triggers a Lambda to do web scraping every eight minutes. Then, another event is produced by the DynamoDB table. This triggers another Lambda to check if a particular keycap set has become available. Finally, an SQS queue stores the notification information about a particular keycap set to be consumed by the notification builder Lambda.

  • EventBridge: a serverless event bus that makes it easier to build event-driven applications by delivering a stream of real-time data from event sources.
  • Lambda: provides an environment to run the code.
  • DynamoDB: a NoSQL database service that stores data as key-value pairs and documents.
  • SQS (Simple Queue Service): stores messages to help design decoupled architectures.

EventBridge dispatches an event every eight minutes. The event invokes a Lambda that scrapes Keychron's Q1 keycap set page and saves the results in two DynamoDB tables, one for the available keycap sets and another for the sold-out ones.
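
A minimal sketch of such a scraping Lambda could look like the following. The page markup is an assumption: the real product page structure is not described here, so the parser looks for hypothetical `<li class="product">` elements with `data-name` and `data-sold-out` attributes. The environment variables `PAGE_URL`, `AVAILABLE_TABLE`, and `SOLDOUT_TABLE` and the `name` key are likewise illustrative.

```python
import os
import urllib.request
from html.parser import HTMLParser
from typing import Dict, List, Tuple


class ProductParser(HTMLParser):
    """Collects products from hypothetical <li class="product"> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.products: List[Dict[str, object]] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "li" and attributes.get("class") == "product":
            self.products.append({
                "name": attributes.get("data-name", ""),
                "sold_out": attributes.get("data-sold-out") == "true",
            })


def split_by_availability(html: str) -> Tuple[List[str], List[str]]:
    """Return (available names, sold-out names) found in the page."""
    parser = ProductParser()
    parser.feed(html)
    available = [p["name"] for p in parser.products if not p["sold_out"]]
    sold_out = [p["name"] for p in parser.products if p["sold_out"]]
    return available, sold_out


def handler(event, context):
    """Lambda entry point invoked by the scheduled EventBridge rule."""
    import boto3  # imported lazily so the parsing code runs without AWS

    with urllib.request.urlopen(os.environ["PAGE_URL"]) as response:
        html = response.read().decode("utf-8")
    available, sold_out = split_by_availability(html)

    dynamodb = boto3.resource("dynamodb")
    for table_name, names in (
        (os.environ["AVAILABLE_TABLE"], available),
        (os.environ["SOLDOUT_TABLE"], sold_out),
    ):
        table = dynamodb.Table(table_name)
        for name in names:
            # Assumes "name" is the partition key of both tables.
            table.put_item(Item={"name": name})
    return {"available": len(available), "sold_out": len(sold_out)}
```

Keeping the parsing in `split_by_availability` separate from the AWS calls means the scraping logic can be checked locally against a saved copy of the page.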

When a particular keycap set is saved in the available DynamoDB table, the table emits an event that triggers a Lambda function to read the newly available keycap set. The function also checks whether the keycap set is already saved in the soldout DynamoDB table. If it is, the keycap set was previously sold out and has just become available again.
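
The check described above might be sketched as follows. The record layout (`Records`, `eventName`, `dynamodb.NewImage`) is the standard DynamoDB Streams event shape, but several details are assumptions: the stream is assumed to include new images, `name` is assumed to be a string key in both tables, and `SOLDOUT_TABLE` and `QUEUE_URL` are hypothetical environment variables.

```python
import json
import os
from typing import Callable, Dict, List


def newly_available(event: Dict, was_sold_out: Callable[[str], bool]) -> List[str]:
    """Return inserted keycap sets that also appear in the sold-out table."""
    names = []
    for record in event.get("Records", []):
        # Only brand-new rows in the available table are of interest.
        if record.get("eventName") != "INSERT":
            continue
        name = record["dynamodb"]["NewImage"]["name"]["S"]
        if was_sold_out(name):
            names.append(name)
    return names


def handler(event, context):
    """Lambda entry point triggered by the available table's stream."""
    import boto3  # imported lazily so the filtering logic runs without AWS

    soldout_table = boto3.resource("dynamodb").Table(os.environ["SOLDOUT_TABLE"])
    sqs = boto3.client("sqs")

    def was_sold_out(name: str) -> bool:
        # get_item only includes "Item" in the response when the key exists.
        return "Item" in soldout_table.get_item(Key={"name": name})

    names = newly_available(event, was_sold_out)
    for name in names:
        sqs.send_message(
            QueueUrl=os.environ["QUEUE_URL"],
            MessageBody=json.dumps({"name": name}),
        )
    return {"notified": names}
```

Passing `was_sold_out` in as a function keeps the filtering rule independent of DynamoDB, so it can be exercised with a plain lambda in a unit test.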

If a particular keycap set becomes available, the tool sends a message that is stored in an SQS queue. The notification builder Lambda then consumes the messages from the queue and creates notifications on a Telegram channel. For now, the bot only publishes messages to a Telegram channel, but the same Lambda could be extended to send an email, publish a tweet, or create a push notification.
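
One way the notification builder could turn queued messages into Telegram posts is sketched below. It relies on the Telegram Bot API `sendMessage` method and on the `Records[].body` layout of SQS events delivered to Lambda. The message wording, the `name` field in the message body, and the function names are assumptions, and the token is passed in as a plain argument rather than looked up.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, List


def texts_from_sqs(event: Dict) -> List[str]:
    """Build one notification text per SQS record in the Lambda event."""
    texts = []
    for record in event.get("Records", []):
        body = json.loads(record["body"])
        texts.append(f"{body['name']} is back in stock")
    return texts


def build_request(token: str, chat_id: str, text: str) -> urllib.request.Request:
    """Create the POST request for the Bot API sendMessage method."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    data = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(url, data=data, method="POST")


def notify(
    event: Dict,
    token: str,
    chat_id: str,
    send: Callable = urllib.request.urlopen,
) -> int:
    """Send every queued notification; return how many were sent."""
    texts = texts_from_sqs(event)
    for text in texts:
        send(build_request(token, chat_id, text))
    return len(texts)
```

The `send` parameter defaults to a real HTTP call but can be swapped for a stub, which also marks the place where an email or push notification sender could be plugged in.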

Telegram Setup

To publish messages to a Telegram channel, the tool needs a Telegram bot token, which is stored as a parameter in AWS Systems Manager (SSM) Parameter Store. To keep the token secure, only the notification builder Lambda is allowed to read it.
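
Reading the token inside the Lambda could look roughly like this. The sketch assumes the token is stored as an encrypted parameter and that the function's role is allowed to read it. The parameter name in the usage comment is hypothetical.

```python
from typing import Dict

# Cache so that warm Lambda invocations do not call SSM again.
_token_cache: Dict[str, str] = {}


def load_bot_token(ssm_client, parameter_name: str) -> str:
    """Fetch the bot token from Parameter Store, decrypting it if needed."""
    if parameter_name not in _token_cache:
        response = ssm_client.get_parameter(
            Name=parameter_name,
            WithDecryption=True,
        )
        _token_cache[parameter_name] = response["Parameter"]["Value"]
    return _token_cache[parameter_name]


# Usage inside the Lambda (parameter name is illustrative):
#   import boto3
#   token = load_bot_token(boto3.client("ssm"), "/scraper/telegram-bot-token")
```

Because the client is passed in, the lookup can be tested with a stub, and the token never needs to appear in the template or in environment variables.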

Conclusion

This solution provides a systematic approach that developers or other interested users can apply to any website they need to monitor. Many manual operations in web browsers are repetitive but necessary. Any of these operations can be automated to save significant time, provide better results, and give users the opportunity to learn a new way of completing tasks.

CBQA Solutions is here to help fulfill your business needs whether you are new to AWS or already have an AWS Environment. Hire us!

