Article Critique
The article titled “The CRISP-DM Model: The New Blueprint for Data Mining,” written by Colin Shearer, is a clear and useful account of a process model for data mining. It remains relevant today because it explains how a data mining model can guide the processing and handling of data. This paper appraises the article to assess its effectiveness and usefulness in today’s world. The critique draws on three related recent articles so that the findings of the original article can be related to current needs in the data mining domain. It covers both the strengths and the weaknesses of the article so that a reader can make effective use of it.
Strong points:
One of the strongest points of the article is that the model it presents is the outcome of work by four leaders in the data mining market: Daimler-Benz, Integral Solutions Ltd, NCR, and OHRA. These leaders had a clear purpose in developing the model: to create a data mining model that is non-proprietary. The model is freely available and has six phases. The presentation of the six phases is easy to understand, which makes the model practical to apply. The first two phases are business understanding and data understanding, and they form the foundation of the data mining process. Understanding the business is necessary because every data mining project should respond to business needs. Business understanding leads to data understanding because data differs from business to business. In this manner, the model starts with steps that are easy to understand and to cover.
The third phase is data preparation, which builds on the data understanding phase. It is followed by modeling, evaluation, and deployment. In this manner, the article adopts a practical and easy approach to the data mining process, which is why recent research still draws on it. To show this, this appraisal reviews three related articles that have used the model presented in the article.
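The six phases named above can be written down as an ordered sequence. The following is only a minimal sketch of that ordering; the names `PHASES` and `next_phase` are hypothetical and do not come from Shearer's article.

```python
# The six CRISP-DM phases in the order the text lists them.
PHASES = [
    "business understanding",
    "data understanding",
    "data preparation",
    "modeling",
    "evaluation",
    "deployment",
]


def next_phase(current):
    """Return the phase that follows `current`, or None after the last phase."""
    index = PHASES.index(current)
    if index + 1 < len(PHASES):
        return PHASES[index + 1]
    return None


if __name__ == "__main__":
    # Print the phases as a numbered checklist.
    for number, phase in enumerate(PHASES, start=1):
        print(f"{number}. {phase}")
```

A project team could use such a list as a simple checklist of what comes next after each phase.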
Weaknesses:
“Weakness” may not seem a suitable term for a piece of research that is this useful. However, the article has limitations, and it is useful to address them. Its approach is generic and simple, which suits beginners and the time in which it was published. The article appeared in 2000, when data mining was still in its early days, so its level of detail was suitable for that time. In its conclusion, the article also admits that the model is not a magic book that solves all problems with data instantly, and it advises taking the assistance of experts in order to use the model effectively. Stating a limitation in the form of advice may count as a strength, but the limitation itself remains a weakness. The article also reports that companies have applied the model and benefited from it. This is good to know, but the outcome again depends on who applies the model. The article is a good beginning, and readers have to take care over their own application of it.
The reviews of three articles that follow complete the appraisal. The purpose of the review is to establish how relevant the article is to the data mining field. If the reviewed articles contain the elements presented in the article, that counts as a strength of the article. The chosen articles all refer to the model in their research, but they vary in how they apply it.
Research by Munro and Madan:
In their article published in 2016, the authors use the phases of the CRISP-DM model presented in the article under review. Their article is a case study that applies data mining to manufacturing data. It follows the CRISP-DM methodology and says so explicitly. It uses the six phases and explains the case study under a heading for each phase, so it is a true application of the article under review. For instance, under the heading of business understanding, the authors came to understand the project and its goals through discussions with the firm. The second phase, data understanding, is also used: the authors consider the data and its quality, and they treat data quality as essential. For this purpose, a quality audit is part of the article, which means that data understanding is applied in the form of a quality audit (Munro & Madan, 2016).
Secondly, the article details the step of data preparation, which suggests that the authors regard it as the most important phase of the model. The authors describe their work as research in progress that is currently at the data preparation stage. The article then covers the steps of modeling, evaluation, and deployment. The use of the phases under expert supervision supports the belief that the primary article under review is effective: it has been guiding people through its simple process. The primary article itself noted that application should involve experts, and this case study has used experts for its application. So, there is no obstacle to following those easy steps (Munro & Madan, 2016).
Research by Berardi, Cameron, and Crawley:
This research also uses the CRISP-DM model and is likewise an application of it, which raises the importance of the primary article on data mining. In the methodology section, the authors clearly state that they have used this model and that they follow its six steps. The article is more recent than the previous one, as it was published in 2017. This also counts in favor of the primary article, since it shows that the model has applications in recent times as well (Berardi, Cameron, & Crawley, 2017).
A notable aspect of the article is that its headings exactly match the phases of the model. Under the business and data understanding phases, critical issues are brought to the surface. The article is a real expert piece because it takes a serious approach to understanding the business case. It has therefore followed the advice of the primary article that expert opinion is useful when applying the model (Berardi, Cameron, & Crawley, 2017).
The data preparation phase is large and includes the sub-steps of data selection, data cleaning, and data construction. It shows how data can be shaped for use. However, the most crucial step is modeling, and this article covers that phase in detail. In this manner, the CRISP-DM model leads to the development of another model, which is a good and effective approach. It is one reason why the primary article is still relevant: it helps in the formation of other models and approaches in data mining. The last two phases are not clearly mentioned in the article, but the important phases are part of it. Overall, the article uses these steps well, and it is good to know that it has adopted an effective approach to getting to the point of the research (Berardi, Cameron, & Crawley, 2017).
Research by Mansingh, Rao, Osei-Bryson, and Mills:
In this article, the authors use the model presented in the primary article and include all of its phases (Mansingh, Rao, Osei-Bryson, & Mills, 2015). The article is important because it is more practical than the other two discussed above. Following the process leads the authors to an analysis of variables in attitudinal, behavioral, and demographic data, from which the article builds a profile of internet banking users. It is an example of big data, and mining such large data requires a serious approach. Mining data from behavioral, attitudinal, and demographic perspectives can lead to the discovery of new information. In this way, the article discovers important information about internet banking users. It shows that the model in the primary article has practical applications and that the findings of that article are interesting and useful (Mansingh, Rao, Osei-Bryson, & Mills, 2015).
Final Words:
The article by Shearer is an excellent starting point for big data mining, and its simple model is still in use by many researchers in the field. Industry and business can use the model to make better and more effective use of knowledge. The model is the collective effort of large corporations, which have applied it from different perspectives and in different contexts. One may say with confidence that the article is still valuable and relevant to data mining and big data research. The author notes in the article that experts may use the model for better application, which leaves considerable room for application. The three reviewed articles add expert opinion and practice to the CRISP-DM model, and it has proved useful for big data.
Predictive Analytics: Big Data Mining Problem Domain
Big Data Mining Problem Domain:
As far as the big data mining problem domain is concerned, one thing should be kept in mind: data mining deals with big data, which exists in large quantities and is complex in nature. For this reason, predictive analytics has been chosen as the domain, because it is a good example of a data mining problem. Evidence has confirmed that big data and business analytics have grown steadily because both are necessary for business success. Predictive analytics is the part of business analytics in which data is used to make predictions about the future, and it works for the benefit of the business (Rashid, Shah, & Irtaza, 2019). The data mining model presented by Shearer may be used in this scenario, but other data mining tools and techniques may also be appropriate. It is worth explaining predictive analytics in the context of big data and data mining.
Predictive Analytics:
Predictive analytics is one of the forms of business intelligence, and it uses data to predict the future behavior of individuals, machines, and entities. As the name shows, it predicts the future, and it is highly necessary in today’s world. Many organizations deal with data and process it. As a result, they are able to interpret the data to draw useful insights about customers. However, prediction is meaningless and impossible without big data. The data may relate to the company’s own rich resources, or it may relate to competitors. In either case, a business has to dive into big data, and predictive analytics is the way it works with that data. This introductory description makes it clear that predictive analytics has a direct relationship with data mining and big data (Iovan, 2017).
Appropriate Data Mining Tools and Techniques:
As far as appropriate data mining tools and techniques are concerned, simple techniques would be appropriate and effective. Business analytics has four types: prescriptive, predictive, diagnostic, and descriptive. The type chosen for this report is predictive, or foresight, business analytics. It is forward-looking because it considers what is likely to happen in the future. The data mining tools and techniques suitable for it are text mining, media mining, predictive modeling, and artificial neural networks. Together, these tools and techniques make data mining possible. Along with them, it is useful to have a model or framework for data mining so that the business can follow a step-by-step approach in dealing with data.
Selection of Model:
A business should first select a model that supports data mining appropriately. It may be any model that suits the organization, because its fundamental purpose is to deal with the data. To this end, the business should first understand its business and its data. By doing so, it can ascertain which type of data is suitable and workable, and it can identify the specifications that the business needs. In light of this understanding, it is appropriate to move on to data preparation so that the data is in a usable format. The steps of modeling, evaluation, and deployment follow. Any model that has these steps for organizing big data may serve the purpose initially. The report proposes that the selection of the model should come first.
Text Mining Techniques:
Predictive analytics relies heavily on this technique because it has to deal with text. Mining text is not easy because business documents and reports are large in number. If an organization wants to access text related to a specific field, it has to go through plenty of reports and books. Nevertheless, this report proposes that a business do so, because much valuable insight is hidden within that text. A business may also ask customers or visitors to provide information or feedback. Even if the organization manages only that information or feedback, it can support predictive analytics. The proposal goes further by noting that there are many variations within text mining and that a business should be flexible in choosing the appropriate ones (Rashid, Shah, & Irtaza, 2019).
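One of the simplest variations of text mining is counting which terms appear most often in customer feedback. The sketch below illustrates that idea only; it is not the fuzzy topic modeling approach of the cited study, and the function name `term_frequencies`, the stop-word list, and the sample feedback are all invented for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list, kept short for illustration.
STOP_WORDS = {"the", "a", "is", "and", "to", "was", "it", "i", "of", "but"}


def term_frequencies(documents):
    """Count how often each non-stop word appears across all documents."""
    counts = Counter()
    for text in documents:
        # Lowercase the text and split it into alphabetic words.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(word for word in words if word not in STOP_WORDS)
    return counts


# Invented sample feedback; not data from any cited study.
feedback = [
    "The delivery was slow and the packaging was damaged.",
    "Slow delivery, but the support team was helpful.",
    "Helpful support and fast delivery.",
]

if __name__ == "__main__":
    for word, count in term_frequencies(feedback).most_common(3):
        print(word, count)
```

Even this small count points to the topics that customers mention most, which a business could then examine more closely.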
Media Mining Techniques:
Media in its different forms has given rise to dynamic material in audio, visual, and video formats. YouTube is a useful example: anyone can share information or opinions on the platform in video format. Social networking sites and other platforms have similarly presented many opportunities in this regard. This report proposes media mining so that the business can bring its future into view. Predictive analytics takes a forward-looking perspective, and media mining can help a great deal in this regard (Yen, Nguyen, & Park, 2015).
Predictive Modeling Technique:
Modeling is very important, and businesses should use a standard model for prediction. Various predictive analytics models share some common themes and aspects, and a business has to make use of them. It can also build a customized model that meets its own needs. Predictive modeling should reflect the nature of the data, because the appropriate model differs with the nature of the data. This technique requires a serious approach from the business because the model must be comprehensive. A model acts as a standard, and it shapes the actions of the organization (Chaudhuri, Ghosh, & Eram, 2016).
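As an illustration of what a very simple predictive model looks like, the sketch below fits a straight-line trend by ordinary least squares and uses it to forecast the next period. This is an assumed example, not the method of the cited study; the function names and the sales figures are invented.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


# Invented quarterly sales figures, for illustration only.
quarters = [1, 2, 3, 4]
sales = [10.0, 12.0, 14.0, 16.0]

slope, intercept = fit_line(quarters, sales)
forecast = predict(5, slope, intercept)  # forecast for the fifth quarter
```

A straight line suits data with a steady trend; data of a different nature would call for a different model, which is the point made above.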
Artificial Neural Networks:
An artificial neural network is a technique that lets a system learn to perform a task from examples. The ability to learn from examples, without being explicitly programmed with task-specific rules, makes these networks suitable for data mining. Predictive analytics can use these networks because they are computing systems. Their salient features are automation and the ability to produce results quickly once trained. An organization should customize the network to its own needs. It may then bring favorable results in little time, making predictive analytics possible for the business. The network consists of connected units, and the connections between them produce the picture of the future. This predictive ability is useful because a machine may be faster at producing outcomes. Therefore, this technique is useful and can be modified to bring better results (He, Zhang, Guo, & Zhao, 2014).
As a result of these techniques, the business can solve its problems. The problem of business intelligence can also be addressed through them. The basic purpose of predictive analytics is to support business intelligence. If there is no predictive analytics and big data is not taken into consideration, it is very difficult to make business intelligence happen.
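Learning from examples can be shown with the smallest possible case: a single unit (a perceptron) that adjusts its connection weights whenever it gets an example wrong. This is a sketch under that assumption only; real networks, including those in the cited study, connect many such units, and the names and the choice of the logical AND task are hypothetical.

```python
def predict(weights, bias, inputs):
    """Output 1 if the weighted sum of inputs plus bias is positive, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train_perceptron(examples, epochs=20, learning_rate=1):
    """Learn weights from (inputs, target) examples with the perceptron rule."""
    weights = [0] * len(examples[0][0])
    bias = 0
    for _ in range(epochs):
        for inputs, target in examples:
            # The error is 0 when the unit is right, +1 or -1 when it is wrong.
            error = target - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Examples of the logical AND function; chosen only because the task is tiny.
AND_EXAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(AND_EXAMPLES)
predictions = [predict(weights, bias, inputs) for inputs, _ in AND_EXAMPLES]
```

No rule for AND is written anywhere in the code; the unit arrives at suitable weights purely from the four examples.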
Benefits to the Business and Measurable Implementation Success Criteria:
Predictive analytics, and the business intelligence that results from it, has many benefits for a business. A business or an organization may gain plenty of benefits; some of them are described below.
It brings cost-efficiency because a business does not need to wait for events to arrive before responding to them. It remains well prepared and ready to face the future as it takes shape. As a result, another benefit, productivity optimization, comes along. These two benefits are strategic, and a business cannot sustain itself without them. They come from predictive analytics, which is the result of smart and effective use of big data.
Customers are the backbone of an organization because they guarantee a stable path. If a business can predict the future, that prediction must relate to customers. The business would be in a better position to serve their requirements and needs. As a result, it would prosper and sustain itself in the long run. This is all due to predictive analytics and the mining of big data.
The last benefit included here is the reduction in risk. Predictability reduces risk for the business because events are likely to unfold in a more certain manner. Reduction in risk is essential in this dynamic world, where markets have become hyper-competitive. However, there should be measurable success criteria.
The success criteria for customers are their satisfaction level and the ability of the business to represent their future trends effectively. Cost efficiency and an increase in profitability would be reflected in the financial figures of the business. Productivity optimization would be reflected in the productivity of the organization. Reduction in risk may be reflected in a lower level of risk and a more certain and predictable journey ahead for the business. These success criteria would show that things are on track and that the business has implemented predictive analytics effectively.
Conclusion:
The report concludes that business intelligence requires predictability, which is one of the dimensions of the big data mining problem domain. The report has chosen predictive analytics, which gives a business insight into the future. Big data supports predictive analytics because an in-depth analysis of data can provide better insight into future circumstances. However, a business should have the tools and techniques necessary for implementation, and this report has covered them. The report recommends using predictive analytics to give a business a more certain future.
References
Berardi, C. U., Cameron, B., & Crawley, E. (2017). Informing policy through quantification of the intellectual property lock-in associated with DoD acquisition. Defense AR Journal, 24(3), 432-466.
Chaudhuri, T. D., Ghosh, I., & Eram, S. (2016). Application of unsupervised feature selection, machine learning and evolutionary algorithm in predicting stock returns: A study of Indian firms. IUP Journal of Financial Risk Management, 13(3), 1-27.
He, Z., Zhang, Y., Guo, Q., & Zhao, X. (2014). Comparative study of artificial neural networks and wavelet artificial neural networks for groundwater depth data forecasting with various curve fractal dimensions. Water Resources Management, 28(15), 5297-5317.
Iovan, S. (2017). Predictive analytics for transportation industry. Journal of Information Systems & Operations Management, 58-71.
Mansingh, G., Rao, L., Osei-Bryson, K.-M., & Mills, A. (2015). Profiling internet banking users: A knowledge discovery in data mining process model-based approach. Information Systems Frontiers, 17(1), 193-215.
Munro, D. L., & Madan, M. S. (2016). Is data mining of manufacturing data beyond first order analysis of value? A case study. Journal of Decision Systems, 25, 572-577.
Rashid, J., Shah, S. M., & Irtaza, A. (2019). Fuzzy topic modeling approach for text mining over short text. Information Processing & Management, 56(6), 102060.
Yen, N. Y., Nguyen, U. T., & Park, J. H. (2015). Mining social media for knowledge discovery. The Computer Journal, 58(9), 1859.