At the beginning of our nine-week internship, we were given a real-world problem to solve. Many companies have to go through hundreds or sometimes thousands of resumes every year, and SmartHire was developed to cut down the time this takes. We created a web application that takes in multiple resumes at a time and ranks them, giving each a score from 0 to 100. The score is designed to help the user decide which applicants will be a good fit for the company.

As resumes are uploaded to the application, they are parsed using Apache Spark, Apache Tika, Apache OpenNLP, and Tesseract. The parsed data is then passed to a machine learning algorithm that computes the score. The algorithm learns the preferences of each user, so the scores it returns become more accurate each time they are calculated. The resulting score helps the user determine whether an applicant is of interest to the company. All of the data that is collected and scored is also uploaded to both Amazon S3 and Elasticsearch.
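
As a rough illustration of how a score from 0 to 100 can adapt to a user's preferences, the sketch below keeps one weight per skill and nudges the weights whenever the user marks an applicant as a good or poor fit. This is not the SmartHire algorithm, which is not described here; the class name `PreferenceScorer`, the skill-set input, the learning rate, and the logistic update rule are all assumptions made for the example.

```python
import math


def sigmoid(z: float) -> float:
    # Clamp z so math.exp cannot overflow on extreme inputs.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


class PreferenceScorer:
    """Hypothetical per-user scorer: one learned weight per skill."""

    def __init__(self, learning_rate: float = 0.5):
        self.learning_rate = learning_rate  # assumed value, not from SmartHire
        self.weights: dict[str, float] = {}
        self.bias = 0.0

    def _raw(self, skills: set[str]) -> float:
        # Sum the weights of the skills found on the resume.
        return self.bias + sum(self.weights.get(s, 0.0) for s in skills)

    def score(self, skills: set[str]) -> int:
        # Map the raw value into the 0 to 100 range.
        return round(100 * sigmoid(self._raw(skills)))

    def update(self, skills: set[str], liked: bool) -> None:
        # One step of online logistic regression on the user's feedback.
        error = (1.0 if liked else 0.0) - sigmoid(self._raw(skills))
        step = self.learning_rate * error
        self.bias += step
        for s in skills:
            self.weights[s] = self.weights.get(s, 0.0) + step


scorer = PreferenceScorer()
before = scorer.score({"java"})                   # no feedback yet
scorer.update({"java", "spark"}, liked=True)      # user liked this applicant
scorer.update({"cobol"}, liked=False)             # user rejected this one
after_java = scorer.score({"java"})
after_cobol = scorer.score({"cobol"})
```

With no feedback, every applicant starts at the midpoint of the range; after the two updates, resumes that share skills with the liked applicant score higher and those that share skills with the rejected one score lower.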

Because the data is stored in Elasticsearch, the front end of the application can easily pull it into the webpage and display it in a user-friendly layout. The webpage itself was built using Bootstrap, AngularJS, JavaScript, and Node.js, along with several other open-source libraries. We integrated a number of intuitive features into the site, including multiple applicant views and options, drop-downs and tables, a search bar, and charts that help the user see the top 5 skills in multiple categories. The website is mobile-friendly, so it can be used on the go just as easily as at a desk.
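
The "top 5 skills" charts boil down to a counting step over the stored applicant records. The sketch below shows one way that aggregation could be done; the record layout (a `skills` field mapping a category name to a list of skills) and the function name `top_skills_by_category` are assumptions for illustration, not the actual SmartHire document schema.

```python
from collections import Counter, defaultdict


def top_skills_by_category(applicants: list[dict], n: int = 5) -> dict:
    """Return the n most common skills in each category.

    Each applicant is assumed to look like:
        {"name": "...", "skills": {"languages": [...], "tools": [...]}}
    """
    counts: dict[str, Counter] = defaultdict(Counter)
    for applicant in applicants:
        for category, skills in applicant.get("skills", {}).items():
            # Count each skill at most once per applicant.
            counts[category].update(set(skills))
    # Sort by count (descending), then by name, so ties are deterministic.
    return {
        category: sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
        for category, counter in counts.items()
    }


applicants = [
    {"name": "A", "skills": {"languages": ["java", "python"], "tools": ["git"]}},
    {"name": "B", "skills": {"languages": ["java", "scala"], "tools": ["git", "docker"]}},
    {"name": "C", "skills": {"languages": ["java", "python", "java"]}},
]
top = top_skills_by_category(applicants)
```

Each entry in `top` is a list of `(skill, count)` pairs, which maps directly onto the labels and values of a bar or pie chart.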

This internship with Data Works has been an amazing experience. Most of us had little to no knowledge of the technologies that we used to create our application. Through the twists and turns of the project, we gained a strong understanding of the technology stack. This program has put us in a good position to build on our knowledge and experience when we join the workforce after graduation.

The code is all open source and can be found at: