TrueBlue: Scaling Job Matching with a Serverless Delta Lake
When TrueBlue wanted to accurately match blue-collar jobs to North American workers, it set out to build a serverless data processing solution that scaled well and was easy to manage.
In this video, Adrian De Luca, Solutions Architect Director at AWS, and Carlos Lara, Director of Data Science/Machine Learning at TrueBlue, explore how TrueBlue built a cloud-native solution for job matching.
It showcases how they use AWS Database Migration Service to extract data, Amazon EventBridge and Amazon SQS to process it in time-sequenced batches, and AWS Step Functions to orchestrate a workflow of AWS Glue jobs that perform continuous inserts into table snapshots for data scientists to consume.
At 0:20, Carlos introduces the staffing company TrueBlue and briefly outlines its vision: connecting blue-collar workers across North America with the right jobs.
Moving forward, the speaker outlines the ongoing challenge of matching the right people to the right jobs. To address it, the database must be kept current so that incoming job requests and worker availability stay in sync.
The ultimate aim is to keep the data transparent and up to date so that the right individuals are matched to the right jobs with the utmost accuracy.
At 0:55, Carlos explains that they use AWS Database Migration Service (DMS) to migrate their transactional databases to AWS. DMS writes Parquet files, including updates and deletes, into an S3 landing bucket. Once the files land, S3 event notifications are enabled on the bucket, and those events feed SQS queues, where they accumulate for downstream consumption.
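The S3-to-SQS wiring described above can be sketched as a bucket notification configuration. This is a hedged illustration, not TrueBlue's actual setup: the queue ARN, prefix, and suffix filter are assumptions.

```python
# Hypothetical sketch of the S3 event notification configuration that routes
# new Parquet objects (landed by DMS) to an SQS queue. All names are
# illustrative assumptions, not TrueBlue's real resources.

def build_s3_notification_config(queue_arn: str, prefix: str) -> dict:
    """Build an S3 bucket notification config that sends ObjectCreated
    events for Parquet files under `prefix` to the given SQS queue."""
    return {
        "QueueConfigurations": [
            {
                "QueueArn": queue_arn,
                # Fire on every new object DMS lands in the bucket.
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {
                        "FilterRules": [
                            {"Name": "prefix", "Value": prefix},
                            {"Name": "suffix", "Value": ".parquet"},
                        ]
                    }
                },
            }
        ]
    }

config = build_s3_notification_config(
    queue_arn="arn:aws:sqs:us-east-1:123456789012:landing-events",  # hypothetical
    prefix="dms/landing/",  # hypothetical
)
# With boto3, this dict would be applied via
# s3.put_bucket_notification_configuration(Bucket=..., NotificationConfiguration=config)
```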
At 1:41, Carlos explains the latency requirements the different tables have to meet. There are three SQS queues, one for each latency category. He also mentions three EventBridge rules scheduled on the corresponding time frames: a 10-minute rule, a 60-minute rule, and a 3-hour rule.
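The three schedule-based rules can be sketched as EventBridge rule definitions. Rule names and the target state machine ARN are assumptions made for illustration.

```python
# Sketch of the three schedule-based EventBridge rules mentioned in the talk,
# one per latency tier. Names and ARNs are hypothetical.

LATENCY_TIERS = {
    "ten-minute-tables": "rate(10 minutes)",
    "sixty-minute-tables": "rate(60 minutes)",
    "three-hour-tables": "rate(3 hours)",
}

def build_rules(state_machine_arn: str) -> list:
    """Each rule fires on its schedule and targets the Step Functions
    state machine that drains the matching SQS queue."""
    return [
        {
            "Name": name,
            "ScheduleExpression": schedule,
            "Targets": [{"Arn": state_machine_arn, "Id": f"{name}-target"}],
        }
        for name, schedule in LATENCY_TIERS.items()
    ]

rules = build_rules(
    "arn:aws:states:us-east-1:123456789012:stateMachine:tier-workflow"  # hypothetical
)
```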
At 2:02, he walks through the 10-minute EventBridge rule as an example: it fires every 10 minutes and directly invokes a Step Functions state machine, which orchestrates a workflow of Glue jobs. The first Glue job reads from the SQS queue, whose messages carry locators, that is, the S3 locations of new files in the landing bucket. He then explains that EMR had previously let them apply inserts, updates, and deletes into consistent views very quickly. However, TrueBlue is serverless-first, because that is what its engineers and developers prioritize.
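The orchestration step above can be sketched as a Step Functions definition in Amazon States Language that runs Glue jobs in sequence. The Glue job names are illustrative assumptions, not TrueBlue's actual jobs.

```python
import json

# Hedged sketch of a Step Functions state machine (Amazon States Language)
# that runs two Glue jobs in sequence: one reading S3 locators from SQS, one
# merging the changes into the Delta Lake. Job names are hypothetical.
definition = {
    "Comment": "Orchestrate Glue jobs for one latency tier",
    "StartAt": "ReadSqsLocators",
    "States": {
        "ReadSqsLocators": {
            "Type": "Task",
            # The .sync integration waits for the Glue job run to finish.
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "read-sqs-locators"},  # hypothetical job
            "Next": "MergeIntoDeltaLake",
        },
        "MergeIntoDeltaLake": {
            "Type": "Task",
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "merge-into-delta"},  # hypothetical job
            "End": True,
        },
    },
}

# Serialized form, as it would be passed to CreateStateMachine.
asl_json = json.dumps(definition, indent=2)
```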
At 2:58, the speaker talks about the challenges of managing EMR: scaling, operational complexity, and debugging when something goes wrong, especially with long-running clusters. So the team started looking for a serverless alternative.
At 3:10, Carlos recounts the team discussion that led to going serverless. One of the engineers suggested trying AWS Glue, and the idea worked perfectly: with Glue, the team got the same outcome as with EMR, but in a serverless fashion. The events Glue reads from SQS are then written to their Delta Lake in S3.
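The merge semantics behind those writes can be illustrated in miniature: change events carrying inserts, updates, and deletes are folded into a consistent snapshot keyed by primary key. This is a pure-Python sketch of the idea only; the real Glue job would use Spark with Delta Lake's MERGE operation.

```python
# Pure-Python sketch (not TrueBlue's code) of the upsert/delete semantics a
# Glue job applies when merging CDC events into a Delta table snapshot.

def apply_cdc_events(snapshot: dict, events: list) -> dict:
    """Fold insert/update/delete events into a snapshot keyed by row id.
    Events must be applied in time order, which is why the queues are
    drained sequentially."""
    table = dict(snapshot)  # leave the input snapshot untouched
    for ev in events:
        key = ev["id"]
        if ev["op"] == "delete":
            table.pop(key, None)
        else:
            # "insert" and "update" both upsert the latest row image.
            table[key] = ev["row"]
    return table

snapshot = {1: {"job": "warehouse picker"}}
events = [
    {"op": "insert", "id": 2, "row": {"job": "forklift operator"}},
    {"op": "update", "id": 1, "row": {"job": "warehouse lead"}},
    {"op": "delete", "id": 2},
]
result = apply_cdc_events(snapshot, events)
# result == {1: {"job": "warehouse lead"}}
```

The key property this models is that consumers downstream always see a consistent view: every event is either fully applied or not yet applied.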
At 3:45, the discussion turns to how data scientists use PySpark SQL to read the tables from the Delta Lake. The Delta Lake has three tiers: Bronze, Silver, and Gold, comprising the raw tables and views created by joining them. Queries against any of the three tiers are reliable because Glue keeps the tables up to date, so accurate matches can be performed effortlessly.
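The idea of a curated view built by joining raw tables can be shown with a toy example. This is illustrative only: in practice the join would be a Spark SQL view over Delta tables, and the column names here are invented.

```python
# Toy sketch of a curated ("Gold") view built by joining raw ("Bronze")
# tables, mimicked with plain Python dicts instead of Spark SQL.
# Table shapes and fields are invented for illustration.

workers = {101: {"name": "Ana", "skill": "forklift"}}
jobs = {7: {"title": "forklift operator", "skill": "forklift"}}

def match_view(workers: dict, jobs: dict) -> list:
    """Join workers to jobs on the required skill, the kind of view a data
    scientist would query to produce matches."""
    return [
        {"worker": w["name"], "job": j["title"]}
        for w in workers.values()
        for j in jobs.values()
        if w["skill"] == j["skill"]
    ]

view = match_view(workers, jobs)
```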
The TrueBlue event-driven architecture effectively and efficiently matches job-seeking individuals with the right job profiles, making it a win-win for both employers and employees.