Building customer models from business data: an automatic approach based on fuzzy clustering maching

2013-09-30 09:07
Data mining (DM) is an emerging discipline that aims to extract knowledge from data using a variety of techniques. DM has proved useful in business, where the data describing customers and their transactions is on the order of terabytes. In this paper, we propose an approach for building customer models (also called profiles in the literature) from business data. Our approach has three steps. In the first step, we use fuzzy clustering to categorize customers, i.e., to determine groups of customers. A key feature is that the number of groups (or clusters) is computed automatically from the data, using the partition entropy as a validity criterion. In the second step, we perform a dimensionality reduction that keeps, for each group of customers, only the most informative attributes. For this, we define the information loss to quantify the degree of information carried by an attribute. As a result of this second step, we obtain groups of customers, each described by a distinct set of attributes. In the third and final step, we use backpropagation neural networks to extract useful knowledge from these groups. Experimental results on real-world data sets show that our approach performs well and should stimulate future research.
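
As a rough illustration of the first step, the sketch below runs standard fuzzy c-means for several candidate cluster counts. It keeps the count with the lowest partition entropy, PE = -(1/n) Σ_i Σ_k u_ik log u_ik, which is 0 for a crisp partition and log c when every customer is shared equally by all c clusters. It is a minimal pure-Python sketch under stated assumptions, not the authors' implementation. The fuzzifier m = 2, the search range `c_max`, the convergence settings, the toy data, and all function names are hypothetical.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=200, tol=1e-6, seed=0):
    """Standard fuzzy c-means. Returns (centers, u) where u[i][k] is the
    membership of point i in cluster k; every row of u sums to 1."""
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(max_iter):
        # Centers are means of the points weighted by membership ** m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            sw = sum(w)
            centers.append([sum(w[i] * points[i][j] for i in range(n)) / sw
                            for j in range(d)])
        # Memberships are recomputed from relative distances to the centers.
        power = 2.0 / (m - 1.0)
        new_u = []
        for i in range(n):
            dist = [math.dist(points[i], ctr) for ctr in centers]
            if min(dist) == 0.0:
                # Point sits exactly on a center: give it crisp membership.
                hits = [1.0 if dk == 0.0 else 0.0 for dk in dist]
                new_u.append([h / sum(hits) for h in hits])
            else:
                new_u.append([1.0 / sum((dist[k] / dist[j]) ** power for j in range(c))
                              for k in range(c)])
        shift = max(abs(new_u[i][k] - u[i][k]) for i in range(n) for k in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u


def partition_entropy(u):
    """PE = -(1/n) * sum_i sum_k u_ik * log(u_ik); 0 for a crisp partition,
    log(c) when every point is equally shared by all c clusters."""
    return sum(-v * math.log(v) for row in u for v in row if v > 0) / len(u)


def choose_num_clusters(points, c_max=6, **fcm_kwargs):
    """Run fuzzy c-means for c = 2..c_max and keep the c with the lowest PE."""
    scores = {}
    for c in range(2, c_max + 1):
        _, u = fuzzy_c_means(points, c, **fcm_kwargs)
        scores[c] = partition_entropy(u)
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Hypothetical 2-D "customers" (e.g., normalised spend vs. visit frequency).
    data = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
            (5.0, 5.1), (5.2, 4.9), (4.9, 5.0),
            (9.8, 0.1), (10.0, 0.0), (10.1, 0.2)]
    best_c, scores = choose_num_clusters(data, c_max=5)
    print(best_c, {c: round(s, 3) for c, s in scores.items()})
```

Partition entropy is known to favour small numbers of clusters. For this reason, the search range is bounded from below at c = 2 rather than starting from a single cluster.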

Marketing managers can develop long-term, pleasant relationships with customers if they can detect and predict changes in customer behavior. In the past, researchers generally applied statistical surveys to study customer behavior; more recently, data mining techniques have been adopted. These techniques search a database to obtain implicit, previously unknown, and potentially useful information, including knowledge rules, constraints, and regularities. Data mining, a key step in Knowledge Discovery in Databases (KDD), involves applying specific algorithms to extract patterns. Successful applications have been reported in areas such as the web, marketing, finance, and banking. Today, businesses face a constantly evolving market in which customer needs change all the time. Hence, instead of targeting all customers equally, enterprises can select only those customers who meet certain profitability criteria based on their individual needs or purchasing behavior.

As a result, the discovered information can be used to support better decision-making in marketing. Consequently, data mining for customer profiling can be defined simply as the technology for building customer profiles, each describing the specific habits, attitudes, and behavior of a group of customers. Difficulties faced by data mining techniques for customer profiling include the amount of data available to create user models, the adequacy of that data, the noise within it, and the need to capture the imprecise nature of human behavior. Data mining and machine learning techniques can handle large amounts of data and process uncertainty. These characteristics make them suitable for automatically generating customer models that simulate human decision-making. Several artificial intelligence techniques have been proposed in the literature to address this problem. In fact, models based on Bayesian networks, decision trees, support vector machines, artificial neural networks, and association rules have been used in many industrial applications to develop customer profiles. Below, we outline some research on customer profiling to give the novice reader some background in the field. For an exhaustive review of existing approaches, we refer the interested reader to the specialized literature.

The model in Ref. 17 is an integrated data mining and behavioral scoring model for managing existing credit card customers in a bank. A self-organizing map was used to identify groups of customers based on repayment behavior and on recency, frequency, and monetary behavioral scoring predictors. The model classified the bank's customers into three major profitable groups. These groups were then profiled by customer feature attributes determined with an Apriori association rule inducer.
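
The Apriori step mentioned above can be sketched as follows. This is a minimal pure-Python version of the frequent-itemset stage of the classic Apriori algorithm. Candidate itemsets grow by one item per level, and any candidate with an infrequent subset is pruned. The attribute=value items in the demo (e.g. `repays_on_time`), the support threshold, and the function name are hypothetical. Ref. 17's actual attributes and thresholds are not given here.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Frequent-itemset stage of Apriori.

    Returns {frozenset(itemset): support}, where support is the fraction of
    transactions that contain every item of the itemset."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    # Level-1 candidates: every individual item seen in the data.
    candidates = {frozenset([item]) for b in baskets for item in b}
    frequent = {}
    k = 1
    while candidates:
        # Count how many baskets contain each candidate itemset.
        counts = {cand: sum(1 for b in baskets if cand <= b) for cand in candidates}
        level = {cand: cnt / n for cand, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into size-k candidates, then prune any
        # candidate that has an infrequent (k-1)-subset (the Apriori property).
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


if __name__ == "__main__":
    # Hypothetical customers of one cluster, encoded as attribute=value items.
    cluster = [
        {"repays_on_time", "recent_purchase", "high_spend"},
        {"repays_on_time", "recent_purchase"},
        {"repays_on_time", "high_spend"},
        {"recent_purchase", "high_spend"},
        {"repays_on_time", "recent_purchase", "high_spend"},
    ]
    for itemset, support in sorted(apriori(cluster, 0.5).items(),
                                   key=lambda kv: (-kv[1], sorted(kv[0]))):
        print(sorted(itemset), round(support, 2))
```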

Other work has been carried out in retail marketing, because understanding changes in customer behavior in the dynamic retail market can help managers establish effective promotion campaigns. The model in Ref. 5 integrates customer behavioral variables, demographic variables, and a transaction database to mine changes in customer behavior. To mine change patterns, two extended measures, similarity and unexpectedness, are designed to analyze the degree of resemblance between patterns from different time periods. Customer behavior patterns are first identified using association rule mining. Once the association rules are discovered, changes in customer behavior are identified by comparing two sets of association rules generated from two data sets covering different periods. Based on previous studies, changes in customer behavior include emerging patterns, added patterns, perished patterns, and unexpected patterns.
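
The comparison of rule sets from two periods can be illustrated with a deliberately simplified sketch. It matches rules exactly. It therefore does not implement the similarity and unexpectedness measures of Ref. 5, and it does not detect unexpected patterns. The `growth` threshold used to flag emerging patterns, the rule encoding, the toy supports, and the function name are all hypothetical.

```python
def compare_rule_sets(rules_t1, rules_t2, growth=1.5):
    """Compare association rules mined from two periods.

    Each argument maps a rule key (antecedent, consequent), both frozensets,
    to the rule's support in that period. Rules are matched exactly."""
    keys_t1, keys_t2 = set(rules_t1), set(rules_t2)
    added = keys_t2 - keys_t1      # appear only in the later period
    perished = keys_t1 - keys_t2   # disappear in the later period
    # Hypothetical "emerging" test: support grows by at least `growth` times.
    emerging = {r for r in keys_t1 & keys_t2
                if rules_t1[r] > 0 and rules_t2[r] / rules_t1[r] >= growth}
    return {"added": added, "perished": perished, "emerging": emerging}


if __name__ == "__main__":
    def rule(lhs, rhs):
        return (frozenset(lhs), frozenset(rhs))

    period_1 = {rule({"bread"}, {"milk"}): 0.10, rule({"beer"}, {"chips"}): 0.05}
    period_2 = {rule({"bread"}, {"milk"}): 0.20, rule({"tea"}, {"sugar"}): 0.08}
    for kind, rules in compare_rule_sets(period_1, period_2).items():
        print(kind, [(sorted(a), sorted(c)) for a, c in rules])
```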

Another work worthy of note is Ref. 40, which presents usage patterns for the additional services currently provided to mobile telecommunication subscribers. Factor analysis, clustering, and quantitative association rules are used to find the service adoption patterns of segmented groups. The analysis identifies three categories of users. The first group consists of a new generation of customers who use chargeable additional services through the “direct button” for leisure and entertainment. The younger generation uses mobile phones more frequently than the older generation and tends to show higher usage of a variety of additional services. The second group uses practical additional services that are low-priced or free, such as “data service” and “phone-to-phone service” via “Caller ID request service”. The customers in the final group show no general usage characteristics. The study uses the association rules found in each cluster to provide strategic guidance for enhancing the mobile service market of the corresponding group.

The model in Ref. 29 mines customer behavior to help managers develop better promotion and other relevant policies for a firm. Association rules derived from a relational database design are implemented in a mining system that supports electronic catalog design and promotional policies. These rules are used to mine consumer behavior and generate cross-selling proposals for the electronic catalog design and marketing of a retail mall.
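
Cross-selling proposals of the kind described in Ref. 29 are typically derived from rule measures such as support, confidence, and lift. The following sketch shows how those measures are computed. It assumes single-item rules mined directly from baskets rather than Ref. 29's relational database design. The thresholds, basket contents, and function name are hypothetical.

```python
from collections import Counter
from itertools import combinations, permutations


def cross_sell_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Single-item rules A -> B mined from baskets.

    support    = fraction of baskets containing both A and B
    confidence = support(A and B) / support(A)
    lift       = confidence / support(B)
    Returns (A, B, support, confidence, lift) tuples, highest confidence first."""
    baskets = [set(t) for t in transactions]
    n = len(baskets)
    item_count = Counter(item for b in baskets for item in b)
    pair_count = Counter(frozenset(p) for b in baskets
                         for p in combinations(sorted(b), 2))
    rules = []
    for pair, count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields two candidate rules: A -> B and B -> A.
        for a, b in permutations(pair):
            confidence = count / item_count[a]
            if confidence >= min_confidence:
                lift = confidence / (item_count[b] / n)
                rules.append((a, b, support, confidence, lift))
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


if __name__ == "__main__":
    # Hypothetical mall baskets.
    baskets = [["tv", "hdmi"], ["tv", "hdmi", "soundbar"], ["tv", "soundbar"],
               ["laptop", "mouse"], ["tv", "hdmi"]]
    for a, b, s, c, l in cross_sell_rules(baskets):
        print(f"{a} -> {b}: support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

A lift above 1 means that buying the first item makes buying the second more likely than its baseline rate. This is the kind of signal a catalog designer could use to place items together.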

In this paper, we propose an approach for automatically building customer profiles (also called models) from business data. It involves three steps. In the first step, we use fuzzy clustering to categorize customers. A key feature of this fuzzy clustering model is that the number of groups is determined automatically from the data, using the partition entropy as a validity measure. In the second step, the dimension (number of attributes) of each cluster (group of customers) is reduced by selecting only the most informative attributes. Selection is based on the information loss of an attribute, a quantity computed from the entropy of the attribute and that of the whole group (cluster). As a result of this second step, we obtain a set of customer groups, each described by a distinct set of attributes judged to be the most informative. In the third and final step, a feedforward backpropagation network is trained on each reduced cluster to extract useful knowledge. We thus obtain a set of backpropagation networks, each encoding a customer profile (model) in its connections. These networks can subsequently be used to classify new, previously unknown customers. The rest of this paper is structured as follows. Section 2 details our model, Sec. 3 presents the experimental analysis, and the last section offers concluding remarks and sheds light on future research.
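
The second step can be sketched as follows, with the caveat that the paper's exact formula for information loss is not reproduced in this excerpt. As an assumption, the sketch defines loss(a) = H(a) / H(G), where H(a) is the Shannon entropy of attribute a within a cluster and H(G) is the entropy of the cluster's complete records. Because a marginal entropy never exceeds the joint entropy, this loss lies between 0 and 1. The sketch also assumes two things: each customer has already been assigned to the cluster with the highest fuzzy membership, and attribute values are discretized. The `threshold` parameter, the toy records, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a list of hashable values."""
    n = len(values)
    return sum((c / n) * math.log2(n / c) for c in Counter(values).values())


def information_loss(records, attributes):
    """HYPOTHETICAL definition (not the paper's exact formula):
    loss(a) = H(a) / H(group), where H(group) is the entropy of the cluster's
    complete records, i.e. tuples of all attribute values."""
    h_group = entropy([tuple(r[a] for a in attributes) for r in records])
    if h_group == 0.0:
        # All records identical: no attribute carries information.
        return {a: 0.0 for a in attributes}
    return {a: entropy([r[a] for r in records]) / h_group for a in attributes}


def select_attributes(records, attributes, threshold=0.5):
    """Keep the attributes whose removal would lose at least `threshold`."""
    loss = information_loss(records, attributes)
    return [a for a in attributes if loss[a] >= threshold], loss


if __name__ == "__main__":
    # Hypothetical customers already assigned to one cluster.
    cluster = [
        {"age": "young", "region": "north", "plan": "basic"},
        {"age": "young", "region": "south", "plan": "basic"},
        {"age": "old", "region": "north", "plan": "basic"},
        {"age": "old", "region": "south", "plan": "basic"},
    ]
    kept, loss = select_attributes(cluster, ["age", "region", "plan"])
    print(kept, loss)
```

In the toy cluster, `plan` has the same value for every customer, so dropping it loses nothing. By contrast, `age` and `region` each carry half of the group's entropy and are kept.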
