Tuesday, May 5, 2020

Project Proposal for a US-Based Organization

Question: Describe the project proposal for a US-based organization.

Answer:

Introduction

Yelp is a well-known US-based organization that develops and hosts the website Yelp.com and a smartphone application of the same name. Yelp.com primarily hosts customer reviews of local businesses and lets users make online reservations through its Eat24 and Yelp Reservations services (Yelp-support.com, 2016). This report discusses an academic project that uses data mining techniques to reveal the relationship between the local business services used by Yelp users and the comments those users make.

Aim of the project

The primary aim of the project described in this report is to use the available Yelp data set to identify the relationship between the reviews and ratings made by users and their patronage of local businesses.

Objectives of the project

To find the relationship between local businesses and their loyal customers by comparing the reviews, comments and ratings made by users.
To analyze the Yelp academic data set and identify the demographics of the target customers for each type of local business registered with Yelp.com.

Research questions

The project aims to answer the following questions:

What is the relationship between local businesses and their loyal customers, as revealed by comparing the reviews, comments and ratings made by users?
What are the demographics of the target customers for each type of local business registered with Yelp.com?

Literature review

Li and Ngai (2016) note that a study conducted by Harvard Business School revealed that the star ratings posted by customers on Yelp.com have a significant effect on the total revenue of local businesses.
In fact, each additional star on Yelp.com was found to increase business revenue by 5 to 9 percent. A separate study by economists at Berkeley found that medium to high ratings (specifically 3.2 to 4 on a scale of 5) increase the chances of medium-sized restaurants being fully booked during the peak tourist seasons by 17 percent (Dai et al. 2012). The same study also indicated that as many as 84 percent of local business owners are concerned about the ratings and reviews of their organizations on online forums such as Yelp.com.

McAuley and Leskovec (2013) comment that a majority of local business owners today encourage their satisfied customers to post ratings on online forums, so as to showcase their services on such platforms. In fact, the target customers of businesses are often identified on the basis of the information available from such online platforms.

Data mining is the domain of computer science concerned with identifying patterns in large data sets using computational techniques drawn from machine learning, artificial intelligence and statistics. According to Li and Ngai (2016), data mining methods have been found to be widely effective for data classification and data association. Aggarwal and Zhai (2012) point out that the following data mining techniques are generally used to identify underlying patterns in large data sets:

Anomaly detection: Anomaly detection techniques are used to detect abnormalities that exist in data sets.
Association rule learning: Association rule learning methods are used to identify the relationships between the variables present in a data set.
Clustering: Clustering techniques are used to group the items in a data set based on the similarity of their values. However, Fan and Bifet (2013) mention that clustering techniques are considered effective only when the structure of the data is unknown in advance.
Classification: Classification is the task of assigning new data to known groups, based on certain characteristics of the data.
Regression: Regression is the task of identifying the function that models the data with the minimum error (Larose, 2014).

Methodology

As already mentioned, data mining is widely used to identify the relationships that exist between the variables in a data set. Yelp.com has made its customer review data sets available online to facilitate academic projects. This data set will be analyzed with data mining techniques to answer the questions listed under Research questions.

Freitas (2013) is of the opinion that the following algorithms can be used effectively to classify the information present in large data sets:

Decision tree: A decision tree is a decision support tool that produces a graph-like structure depicting decisions and their possible consequences.
K-means clustering: Aggarwal and Zhai (2012) describe K-means clustering as one of the simplest and most effective unsupervised learning algorithms. The authors also comment that the algorithm generates its best results when the data set contains groups of values that are clearly distinct from each other.
Apriori algorithm: According to Larose (2014), the Apriori algorithm is most suitable for frequent item-set mining and for finding association rules.
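As a rough illustration of frequent item-set mining in the Apriori style, the sketch below counts how often sets of items occur together and keeps only those that reach a minimum support count. The transactions (sets of business categories that a user has reviewed), the function name apriori and the threshold of 3 are made-up assumptions for illustration; they are not taken from the Yelp data set or from the cited authors.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset that appears in
    at least `min_support` transactions (level-wise Apriori search)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: combine frequent (k-1)-item sets into k-item candidates.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-item subset must itself be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Count step: support of each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1

    return frequent


# Hypothetical data: categories of businesses reviewed by five users.
transactions = [
    {"restaurant", "bar", "cafe"},
    {"restaurant", "bar"},
    {"restaurant", "cafe"},
    {"bar", "gym"},
    {"restaurant", "bar", "gym"},
]

frequent = apriori(transactions, min_support=3)
for itemset, support in sorted(frequent.items(),
                               key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

In a real analysis the transactions would have to be derived from the Yelp academic data set, and the support threshold chosen to suit its size.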
The Apriori algorithm operates by first identifying the individual items that occur frequently in a data set and then extending them to progressively larger item sets that also occur frequently.

In light of the discussion in this report, the decision tree and K-means clustering algorithms can be used to find the relationship between local businesses and their loyal customers by comparing the reviews, comments and ratings made by users. The Apriori algorithm, in turn, would help identify the demographics of the target customers for each type of local business registered with Yelp.com.

References

Aggarwal, C. C., & Zhai, C. (2012). Mining text data. Springer Science & Business Media.
Dai, W., Jin, G. Z., Lee, J., & Luca, M. (2012). Optimal aggregation of consumer ratings: an application to Yelp.com (No. w18567). National Bureau of Economic Research.
Fan, W., & Bifet, A. (2013). Mining big data: current status, and forecast to the future. ACM SIGKDD Explorations Newsletter, 14(2), 1-5.
Freitas, A. A. (2013). Data mining and knowledge discovery with evolutionary algorithms. Springer Science & Business Media.
Larose, D. T. (2014). Discovering knowledge in data: an introduction to data mining. John Wiley & Sons.
Li, J., & Ngai, E. W. T. (2016). An Examination of the Joint Impacts of Review Content and Reviewer Characteristics on Review Usefulness: the Case of Yelp.com.
McAuley, J., & Leskovec, J. (2013, October). Hidden factors and hidden topics: understanding rating dimensions with review text. In Proceedings of the 7th ACM conference on Recommender systems (pp. 165-172). ACM.
Yelp-support.com. (2016). Updating Business Information | Support Center | Yelp. Retrieved 1 October 2016, from https://www.yelpupport.com/Updating_Business_Information?l=en_US
