At the beginning of our nine-week internship, we were given a real-world problem to solve. Many companies have to review hundreds or sometimes thousands of resumes every year, and SmartHire was developed to cut down the time that takes. We created a web application that accepts multiple resumes at a time and ranks them, giving each one a score from 0 to 100. The ranking is designed to help the user decide which applicants will be a good fit for the company.
As resumes were uploaded to the application, they were parsed using Apache Spark, Apache Tika, Apache OpenNLP, and Tesseract. The parsed data was then passed to a machine learning algorithm that computed each score. The algorithm learned the preferences of each user, so its scores became more accurate each time they were recalculated. An applicant's score helped the user determine whether the applicant was of interest to the company. All of the collected and scored data was also stored in both Amazon S3 and Elasticsearch.
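The score-and-rank step described above can be illustrated with a minimal Python sketch. This is not the SmartHire implementation: the `Resume` class, the feature names, the weighted-average scoring, and the `update_weights` feedback rule are all hypothetical stand-ins, chosen only to show how a 0 to 100 score could shift as a user's preferences are learned.

```python
from dataclasses import dataclass


@dataclass
class Resume:
    name: str
    # Hypothetical numeric features in [0, 1], assumed to come from the parsing stage.
    features: dict


def score(resume, weights):
    """Weighted average of the resume's features, scaled to 0-100."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    raw = sum(w * resume.features.get(k, 0.0) for k, w in weights.items())
    return round(100 * raw / total, 1)


def rank(resumes, weights):
    """Return resumes ordered from highest to lowest score."""
    return sorted(resumes, key=lambda r: score(r, weights), reverse=True)


def update_weights(weights, resume, liked, rate=0.1):
    """Assumed feedback rule: nudge each weight toward the features of a
    resume the user liked, or away from one they rejected. Weights are
    clamped so they never go negative."""
    sign = 1.0 if liked else -1.0
    return {
        k: max(0.0, w + sign * rate * resume.features.get(k, 0.0))
        for k, w in weights.items()
    }


if __name__ == "__main__":
    weights = {"skills": 1.0, "experience": 1.0}
    a = Resume("A", {"skills": 0.8, "experience": 0.2})
    b = Resume("B", {"skills": 0.4, "experience": 0.9})
    for r in rank([a, b], weights):
        print(r.name, score(r, weights))
    # The user marks A as a good fit, so skills-heavy resumes gain weight.
    weights = update_weights(weights, a, liked=True, rate=0.5)
    for r in rank([a, b], weights):
        print(r.name, score(r, weights))
```

In this sketch, marking resume A as a good fit raises the weight on the features A is strong in, so A's score rises on the next round of scoring while B's falls.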
This internship with Data Works has been an amazing experience. Most of us had little to no knowledge of the technologies we used to build the application. Through the twists and turns of the project, we gained a solid understanding of the technology stack. The program has put us in a good position to build on that knowledge and experience when we join the workforce after graduation.
All of the code is open source and can be found at: https://github.com/dataworks/internship-2016