Occupation Analyzer

Wu, Yuchao

Full metadata record

DC Field	Value	Language
dc.contributor.author	Wu, Yuchao	-
dc.date.accessioned	2021-07-13T13:25:13Z	-
dc.date.available	2021-07-13T13:25:13Z	-
dc.date.issued	2021-06-17	-
dc.identifier.uri	http://hdl.handle.net/2451/62805	-
dc.description.abstract	The purpose of this project is to develop a text analytics tool in R that helps student job seekers match their resumes against jobs by scoring both for similarity to standard occupations as defined by the BLS O*NET database. Part of evaluating a job is discovering how well a job seeker's educational preparation fits a particular desirable job. The tool scores the resume against a group of jobs and presents the user with the top-scoring jobs, from which the user selects targets. The resulting table of top target jobs is then scored against the occupations. A cosine similarity score is computed between each top job's occupation scores and the resume's occupation scores. The ranked jobs are then presented to the user as the best matches for their resume. Users interact with the tool through a website built with R Shiny. The website contains an occupation database and requires users to upload their resume and a list of jobs as inputs. The outputs include detailed information about the top 15 jobs and the bottom five jobs. The project expands on existing preliminary work done in Python. Data preprocessing consists of converting all text to lowercase; removing punctuation, special characters, numbers, common English stop words, and extra white space; and stemming. The tool uses the term frequency-inverse document frequency (TF-IDF) algorithm and cosine similarity to measure the similarity among the resume, jobs, and occupations. The final results meet the sponsors’ expectations and differ significantly from the previous work done in Python. N-grams, sliding windows, and named-entity extraction are possible methods for improving the tool’s performance. I will conduct further research this summer to enhance the accuracy. All project files are stored in a GitHub repository.	en
dc.description.sponsorship	NYU School of Professional Studies, MASY Program	en
dc.language.iso	en_US	en
dc.rights	All Rights Reserved by Author	en
dc.subject	Occupation Analyzer, Text Analytic, R	en
dc.title	Occupation Analyzer	en
dc.type	Working Paper	en
Appears in Collections:	MASY Student Research Showcase 2021
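
The pipeline described in the abstract (preprocessing, TF-IDF weighting, and cosine similarity against occupations) can be sketched roughly as follows. This is a minimal illustration in Python using only the standard library, not the author's R implementation. The function names, the tiny stop-word list, the smoothed IDF formula, and the sample texts are all assumptions made for illustration, and stemming is omitted to keep the sketch dependency-free.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; the report's list is not shown here).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "with", "on"}


def preprocess(text):
    """Lowercase, drop punctuation/digits/special characters, remove stop words."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def tfidf_vectors(token_lists):
    """Return one sparse TF-IDF dict per document (smoothed IDF, an assumed variant)."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_jobs(resume, jobs, occupations):
    """Rank jobs by how closely their occupation scores match the resume's."""
    docs = [resume] + list(jobs.values()) + list(occupations.values())
    vecs = tfidf_vectors([preprocess(d) for d in docs])
    resume_vec = vecs[0]
    job_vecs = vecs[1:1 + len(jobs)]
    occ_vecs = vecs[1 + len(jobs):]

    def profile(vec):
        # Score one document against every occupation description.
        return {i: cosine(vec, occ) for i, occ in enumerate(occ_vecs)}

    resume_profile = profile(resume_vec)
    scores = {
        name: cosine(profile(vec), resume_profile)
        for name, vec in zip(jobs, job_vecs)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample inputs, invented for this sketch.
occupations = {
    "Data Scientist": "analyze data build statistical models machine learning",
    "Chef": "cook food prepare meals kitchen",
}
jobs = {
    "ML Engineer": "build machine learning models with data",
    "Line Cook": "prepare food and cook meals in kitchen",
}
resume = "statistical analysis, machine learning, data models"

ranked = rank_jobs(resume, jobs, occupations)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Comparing occupation-score vectors rather than raw text means two documents can match even when they share few words, as long as both resemble the same occupations. The actual tool may weight or normalize terms differently.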

Files in This Item:

File	Description	Size	Format
Yuchao Wu - Final Project Report - Occupation Analyzer.pdf	-	3.21 MB	Adobe PDF
