Title: | Occupation Analyzer |
Authors: | Wu, Yuchao |
Keywords: | Occupation Analyzer, Text Analytics, R |
Issue Date: | 17-Jun-2021 |
Abstract: | The purpose of this project is to develop a text analytics tool in R that helps student job seekers match their resumes against jobs through mutual similarity scoring to standard occupations, as defined by the BLS O*NET database. For job seekers, part of evaluating a job is discovering how well their educational preparation fits a particular desirable job. The tool scores the resume against a group of jobs and presents the user with the top-scoring jobs, from which to select a target. The resulting table of top target jobs is then scored against the occupations, and a cosine similarity score is computed between the top jobs' occupation scores and the resume's occupation scores. The ranked jobs are then presented to the user as the best matches for their resume. Users interact with the tool through a website built with R Shiny. The website contains an occupation database and requires users to upload their resume and a list of jobs as inputs. The outputs include detailed information about the top 15 jobs and the bottom five jobs. The project expands on existing preliminary work done in Python. Data preprocessing converts all text to lowercase; removes punctuation, special characters, numbers, common English stop words, and extra whitespace; and applies stemming. The tool uses term frequency-inverse document frequency (TF-IDF) weighting and cosine similarity to measure the similarity among the resume, jobs, and occupations. The final results meet the sponsors’ expectations and differ significantly from those of the previous Python work. N-grams, sliding windows, and named-entity extraction are possible methods for improving the tool’s performance. I will conduct further research this summer to improve its accuracy. All project files are stored in a GitHub repository. |
URI: | http://hdl.handle.net/2451/62805 |
Rights: | All Rights Reserved by Author |
Appears in Collections: | MASY Student Research Showcase 2021 |
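The abstract describes a mutual scoring scheme. The resume and each job are first scored against every occupation. Jobs are then ranked by the cosine similarity between their occupation-score profile and the resume's. The following is a minimal, stdlib-only Python sketch of that comparison step. It is not the author's R implementation, and it omits the initial resume-versus-jobs filtering and the Shiny interface.

Everything in the sketch is an assumption for illustration:

- the function names (`preprocess`, `stem`, `tfidf_vectors`, `cosine`, `rank_jobs`);
- the abbreviated stop-word list;
- the crude suffix-stripping `stem`, a stand-in for a real stemmer such as Porter's;
- the TF-IDF variant (raw term count times log(N/df), with IDF fitted over the resume, jobs, and occupations together);
- the toy texts, which are invented rather than taken from O*NET.

```python
import math
import re
from collections import Counter

# Illustrative sketch only: names, stop words, stemmer, and weighting are assumptions.
# Abbreviated stop-word list (hypothetical); a real tool would use a full English list.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "with", "on", "is", "are"}


def stem(word):
    """Crude suffix stripping, standing in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase; drop punctuation, special characters, digits, stop words; stem."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())  # keep only letters and whitespace
    tokens = text.split()  # split() also discards extra whitespace
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def tfidf_vectors(token_lists):
    """One sparse {term: weight} dict per document, weight = count * log(N / df)."""
    n_docs = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    return [
        {t: c * math.log(n_docs / df[t]) for t, c in Counter(tokens).items()}
        for tokens in token_lists
    ]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_jobs(resume, jobs, occupations):
    """Rank jobs by how closely their occupation-score profile matches the resume's."""
    docs = [resume, *jobs.values(), *occupations.values()]
    vecs = tfidf_vectors([preprocess(d) for d in docs])
    resume_vec, job_vecs, occ_vecs = vecs[0], vecs[1 : 1 + len(jobs)], vecs[1 + len(jobs) :]

    def profile(vec):
        # Stage 1: similarity of one document to every occupation.
        return {i: cosine(vec, occ) for i, occ in enumerate(occ_vecs)}

    resume_profile = profile(resume_vec)
    # Stage 2: compare the resume's occupation profile with each job's.
    scores = {title: cosine(resume_profile, profile(jv)) for title, jv in zip(jobs, job_vecs)}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy inputs invented for illustration (not O*NET data).
occupations = {
    "Data Scientists": "analyze data using statistics machine learning and python",
    "Accountants": "prepare financial statements audit accounts and tax records",
}
jobs = {
    "Junior Data Analyst": "analyze data with python and statistics",
    "Staff Accountant": "prepare tax records and audit financial statements",
}
resume = "Coursework in statistics, machine learning, and Python data analysis."

ranking = rank_jobs(resume, jobs, occupations)
for title, score in ranking:
    print(f"{title}: {score:.3f}")
```

With real data, slicing the returned list as `ranking[:15]` and `ranking[-5:]` would give the top and bottom jobs that the abstract says the tool reports. Fitting IDF on the combined corpus keeps every vector in one term space. A production tool might instead fit it on the occupation database alone.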
Files in This Item:
File | Size | Format
---|---|---
Yuchao Wu - Final Project Report - Occupation Analyzer.pdf | 3.21 MB | Adobe PDF
Items in FDA are protected by copyright, with all rights reserved, unless otherwise indicated.