
無為 (Wu Wei)

Through non-action, all can be done; non-action runs deepest!


    Clustering is an important application area for many fields including data mining [FPSU96], statistical data analysis [KR89,BR93,FHS96], compression [ZRL97], vector quantization, and other business applications [B*96]. Clustering has been formulated in various ways in the machine learning [F87], pattern recognition [DH73,F90], optimization [BMS97,SI84], and statistics literature [KR89,BR93,B95,S92,S86]. The fundamental clustering problem is that of grouping together (clustering) similar data items.

The most general approach is to view clustering as a density estimation problem [S86, S92, BR93]. We assume that in addition to the observed variables for each data item, there is a hidden, unobserved variable indicating the “cluster membership”. The data is assumed to arrive from a mixture model with hidden cluster identifiers. In general, a mixture model M having K clusters Ci, i=1,…,K, assigns a probability to a data point x: Pr(x) = Σ_{i=1..K} Wi · Pr(x | Ci), where the Wi are the mixture weights. The problem is estimating the parameters of the individual Ci, assuming that the number of clusters K is known. The clustering optimization problem is that of finding parameters of the individual Ci which maximize the likelihood of the database given the mixture model. For general assumptions on the distributions for each of the K clusters, the EM algorithm [DLR77, CS96] is a popular technique for estimating the parameters. The assumptions addressed by the classic K-Means algorithm are: 1) each cluster can be effectively modeled by a spherical Gaussian distribution, 2) each data item is assigned to exactly one cluster, 3) the mixture weights (Wi) are assumed equal. Note that K-Means [DH73, F90] is only defined over numeric (continuous-valued) data since the ability to compute the mean is required. A discrete version of K-Means exists and is sometimes referred to as hard EM. The K-Means algorithm finds a locally optimal solution to the problem of minimizing the sum of the L2 distances between each data point and its nearest cluster center (the “distortion”) [SI84], which is equivalent to maximizing the likelihood given the assumptions listed.
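The mixture probability above can be sketched numerically. This is a minimal illustration, not the paper's implementation: it assumes spherical Gaussian components with a shared variance (the K-Means setting), and all names are hypothetical.

```python
import numpy as np

def mixture_prob(x, means, weights, var=1.0):
    """Pr(x) = sum_i Wi * N(x; mu_i, var * I) for a mixture of
    K spherical Gaussians with mixture weights Wi."""
    x = np.asarray(x, dtype=float)
    d = x.shape[0]
    total = 0.0
    for mu, w in zip(means, weights):
        diff = x - np.asarray(mu, dtype=float)
        # Density of a spherical Gaussian with covariance var * I
        dens = np.exp(-0.5 * (diff @ diff) / var) / ((2 * np.pi * var) ** (d / 2))
        total += w * dens
    return total

# Example: two clusters in 2-D with equal mixture weights (assumption 3)
p = mixture_prob([0.0, 0.0], means=[[0, 0], [5, 5]], weights=[0.5, 0.5])
```

Evaluated at the first cluster's mean, the second, distant component contributes almost nothing, so p is essentially half the peak density of a unit 2-D Gaussian.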

There are various approaches to solving the optimization problem. The iterative refinement approaches, which include EM and K-Means, are the most effective. The basic algorithm is as follows: 1) Initialize the model parameters, producing a current model, 2) Decide memberships of the data items to clusters, assuming that the current model is correct, 3) Re-estimate the parameters of the current model assuming that the data memberships obtained in 2) are correct, producing a new model, 4) If the current model and the new model are sufficiently close to each other, terminate; else go to 2).

    K-Means parameterizes cluster Ci by the mean of all points in that cluster, hence the model update step 3) consists of computing the mean of the points assigned to a given cluster. The membership step 2) consists of assigning data points to the cluster with the nearest mean measured in the L2 metric.
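Specialized to K-Means, the four refinement steps above can be sketched as follows. This is a minimal sketch under simple assumptions (centers initialized from the first k points, convergence declared when the centers stop moving), not the paper's scalable variant; all names are illustrative.

```python
import numpy as np

def kmeans(data, k, max_iter=100, tol=1e-6):
    """Iterative refinement K-Means: alternate membership and mean-update steps."""
    data = np.asarray(data, dtype=float)
    # Step 1: initialize centers (here simply the first k points;
    # Forgy's original method draws them at random)
    centers = data[:k].copy()
    for _ in range(max_iter):
        # Step 2 (membership): assign each point to the nearest center (L2 metric)
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3 (model update): each center becomes the mean of its points
        new_centers = np.array([
            data[labels == i].mean(axis=0) if np.any(labels == i) else centers[i]
            for i in range(k)
        ])
        # Step 4: terminate when the model has stopped changing
        if np.linalg.norm(new_centers - centers) < tol:
            centers = new_centers
            break
        centers = new_centers
    return centers, labels

# Example: six points forming two well-separated blobs
pts = np.array([[0.0, 0.0], [10.0, 10.0], [0.1, 0.0],
                [9.9, 10.0], [0.0, 0.2], [10.0, 9.8]])
centers, labels = kmeans(pts, k=2)
```

Note that each iteration reads every data point once in step 2; for databases too large for RAM (the setting discussed below), that full scan is exactly the expensive operation.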

We focus on the problem of clustering very large databases, those too large to be “loaded” into RAM. Hence the data scan at each iteration is extremely costly. We focus on the K-Means algorithm, although the method can be extended to accommodate other algorithms [BFR98]. K-Means is a well-known algorithm, originally known as Forgy’s method [F65, M67], and has been used extensively in pattern recognition [DH73, F90]. It is a standard technique used in a wide array of applications, even as a way to initialize more expensive EM clustering [B95, CS96, MH98, FRB98, BF98].



Articles bearing this mark are original works by the blog author Caoer (草兒); when indexing, bookmarking, or reposting, please credit the source and the original author. Many thanks.

posted on 2006-06-24 13:54 by Caoer (草兒) · 198 views · 0 comments · Category: BI and DM