Post category - Data mining
The Origins of Data Mining
Summary: Data mining draws upon ideas such as: 1. sampling, estimation, and hypothesis testing from statistics; 2. search algorithms, modeling techniques, and learning theories from artificial intelligence, pattern recognition, and machine learning.
Motivating Challenges in Data Mining
Summary: 1. Scalability: If data mining algorithms are to handle these massive data sets, then they must be scalable. 2. High Dimensionality: For some data analysis algorithms, the computational complexity increases rapidly as the dimensionality increases. 3. Heterogeneous and Complex Data: Dealing with data with no
Process of knowledge discovery in databases
Summary: Data mining is an integral part of knowledge discovery in databases (KDD), which is the overall process of converting raw data into useful information. The process of knowledge discovery in databases: Input Data -> Data Preprocessing (Feature Selection, Dimensionality Reduction, Normalization, Data Su
What is Data Mining
Summary: Data mining is the process of automatically discovering useful information in large data repositories. Data mining techniques are deployed to scour large databases in order to find novel and useful patterns that might otherwise remain unknown.
Data Mining Applications in Business
Summary: Data mining techniques can be used to support a wide range of business intelligence applications such as customer profiling, targeted marketing, workflow management, store layout, and fraud detection. It can also help retailers answer important business questions such as "Who are the most profitabl
Top 14 Business Intelligence predictions for 2012
Summary: Don’t confuse continuity for laziness. On face value, rehashing events already transpired as ‘predictions’ may create the appearance of lethargy. But, it seems that many emergent themes from the 2010/11 Business Intelligence (BI) scene will dominate 2012, having now developed into significant market
Three typical types of Data Mining applications
Summary: Three typical types of Data Mining applications: Classification, Regression, Clustering. Classification: In a classification type problem, we have a variable of interest which is categorical in nature. For example, this could be: classification of credit risk, either good or bad; classifying patients as hi
Data conversion – the first step towards data processing
Summary: Data conversion – the first step towards data processing. Convert all strings to integers ranging from 0 to n. Age: continuous. Workclass: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked. Fnlwgt: continuous. Education: Bachelors, Some-college, 11th, HS-grad,
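The string-to-integer conversion described in this summary can be sketched as follows. This is a minimal illustration, not the post's own code: the helper name `encode_column` and the sample `workclass` values are assumptions, and codes are assigned in order of first appearance.

```python
def encode_column(values):
    """Map each distinct string to an integer code, starting at 0.

    Codes are assigned in order of first appearance (an assumption;
    the summary only says the integers range from 0 to n).
    """
    mapping = {}
    codes = []
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)
        codes.append(mapping[value])
    return codes, mapping


# Hypothetical sample of the Workclass attribute.
workclass = ["Private", "State-gov", "Private", "Self-emp-inc", "State-gov"]
codes, mapping = encode_column(workclass)
print(codes)    # [0, 1, 0, 2, 1]
print(mapping)  # {'Private': 0, 'State-gov': 1, 'Self-emp-inc': 2}
```

Continuous attributes such as Age and Fnlwgt would be left as numbers; only the categorical columns need a mapping like this.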
Apriori algorithm
Summary: Description: In computer science and data mining, Apriori[1] is a classic algorithm for learning association rules. Apriori is designed to operate on databases containing transactions (for example, collections of items bought by customers, or details of a website frequentation). Other algorithms are d
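Since the summary is cut off before any procedure is shown, here is a rough sketch of the textbook level-wise Apriori search for frequent itemsets, not necessarily the version in the post. The function name `apriori`, the `min_count` parameter, and the sample baskets are all assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: count} for itemsets found in at least min_count transactions."""
    transactions = [frozenset(t) for t in transactions]

    def count_frequent(candidates):
        # Count how many transactions contain each candidate, keep the frequent ones.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_count}

    items = {item for t in transactions for item in t}
    current = count_frequent({frozenset([i]) for i in items})
    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = count_frequent(candidates)
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical market-basket transactions.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
result = apriori(baskets, min_count=3)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

With these baskets, the four single items and the pair {bread, milk} meet the threshold of 3; no larger itemset does, so the search stops at k = 3.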
Association Rule Definitions
Summary: Definitions: • Set of items: I = {I1, I2, …, Im} • Transactions: D = {t1, t2, …, tn}, tj ⊆ I • Itemset: {Ii1, Ii2, …, Iik} ⊆ I • Support of an itemset: percentage of transactions which contain that itemset. • Large (frequent) itemset: itemset whose number of occurrences is above a threshold. • Association Rule (AR): implic
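The support definition above can be illustrated with a short sketch. The function name `support`, the sample database `D`, and the 0.5 threshold are assumptions for the example; support is returned as a fraction rather than a percentage.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


# Hypothetical transaction database D over the items I1, I2, I3.
D = [{"I1", "I2"}, {"I1", "I3"}, {"I1", "I2", "I3"}, {"I2"}]

s = support({"I1", "I2"}, D)
print(s)  # 0.5, since 2 of the 4 transactions contain both I1 and I2

# An itemset counts as large (frequent) when its support reaches a chosen threshold.
min_support = 0.5  # assumed threshold
print(s >= min_support)  # True
```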
Methods for outlier detection
Summary: Distribution-based methods; distance-based methods; density-based methods; clustering-based methods.
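The summary only names the four families, so as one possible illustration of the distance-based family, the sketch below scores each point by the distance to its k-th nearest neighbor and treats the highest score as the most outlying. The function name `knn_distance_scores`, the choice of k, and the sample points are assumptions, and other distance-based definitions exist.

```python
import math


def knn_distance_scores(points, k):
    """Score each point by the Euclidean distance to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Hypothetical 2-D data: four points in a tight square and one far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
scores = knn_distance_scores(points, k=2)

# The point with the largest score is the strongest outlier candidate.
outlier_index = max(range(len(points)), key=scores.__getitem__)
print(points[outlier_index])  # (10.0, 10.0)
```

This brute-force version compares every pair of points, so it is only practical for small data sets.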
Some key terms of Data Mining
Summary: Outlier mining - A data mining task aiming to find a specific number of objects that are considerably dissimilar, exceptional and inconsistent with respect to the majority of records in the input databases. Subspace - A combination of features of attributes of a database. Outlying subspaces - An outlying