
    無為

    In non-action there is capability; non-action runs deepest!


    Clustering is an important application area for many fields including data mining [FPSU96], statistical data analysis [KR89,BR93,FHS96], compression [ZRL97], vector quantization, and other business applications [B*96]. Clustering has been formulated in various ways in the machine learning [F87], pattern recognition [DH73,F90], optimization [BMS97,SI84], and statistics literature [KR89,BR93,B95,S92,S86]. The fundamental clustering problem is that of grouping together (clustering) similar data items.

    The most general approach is to view clustering as a density estimation problem [S86, S92, BR93]. We assume that in addition to the observed variables for each data item, there is a hidden, unobserved variable indicating the "cluster membership". The data is assumed to arrive from a mixture model with hidden cluster identifiers. In general, a mixture model M having K clusters Ci, i=1,…,K, assigns a probability to a data point x: Pr(x | M) = Σ_{i=1..K} Wi · Pr(x | Ci), where the Wi are the mixture weights. The problem is to estimate the parameters of the individual Ci, assuming that the number of clusters K is known. The clustering optimization problem is that of finding parameters of the individual Ci which maximize the likelihood of the database given the mixture model. For general assumptions on the distributions of each of the K clusters, the EM algorithm [DLR77, CS96] is a popular technique for estimating the parameters. The assumptions made by the classic K-Means algorithm are: 1) each cluster can be effectively modeled by a spherical Gaussian distribution, 2) each data item is assigned to exactly one cluster, 3) the mixture weights (Wi) are assumed equal. Note that K-Means [DH73, F90] is only defined over numeric (continuous-valued) data, since the ability to compute the mean is required. A discrete version of K-Means exists and is sometimes referred to as hard EM. The K-Means algorithm finds a locally optimal solution to the problem of minimizing the sum of the squared L2 distances between each data point and its nearest cluster center ("distortion") [SI84], which is equivalent to maximizing the likelihood under the assumptions listed.
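    As a concrete illustration of the mixture density above, the sketch below evaluates Pr(x | M) = Σ_i Wi · Pr(x | Ci) for spherical Gaussian clusters Ci (the K-Means assumption). The function and parameter names are illustrative choices, not from the paper.

```python
import math

def mixture_density(x, means, sigmas, weights):
    """Pr(x | M) = sum_i W_i * Pr(x | C_i) with spherical Gaussian C_i.

    x: data point (list of floats); means: per-cluster mean vectors;
    sigmas: per-cluster standard deviations; weights: mixture weights W_i.
    (Illustrative names, not from the paper.)
    """
    d = len(x)
    total = 0.0
    for mu, s, w in zip(means, sigmas, weights):
        # Squared L2 distance from x to the cluster mean.
        sq = sum((xj - mj) ** 2 for xj, mj in zip(x, mu))
        # Normalizing constant of a d-dimensional spherical Gaussian.
        norm = (2 * math.pi * s * s) ** (d / 2)
        total += w * math.exp(-sq / (2 * s * s)) / norm
    return total
```

    With equal weights and unit variances this reduces to an average of standard Gaussian densities centered at each cluster mean.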

    There are various approaches to solving this optimization problem. Iterative refinement approaches, which include EM and K-Means, are the most widely used. The basic algorithm is as follows: 1) initialize the model parameters, producing a current model; 2) decide memberships of the data items to clusters, assuming that the current model is correct; 3) re-estimate the parameters of the current model assuming that the memberships obtained in 2) are correct, producing a new model; 4) if the current and new models are sufficiently close to each other, terminate; otherwise go to 2).

    K-Means parameterizes cluster Ci by the mean of all points in that cluster, hence the model update step 3) consists of computing the mean of the points assigned to a given cluster. The membership step 2) consists of assigning data points to the cluster with the nearest mean measured in the L2 metric.
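    The four refinement steps specialize to K-Means as sketched below, assuming data points represented as coordinate tuples. The initialization strategy, tolerance, and names are illustrative choices, not prescribed by the text.

```python
import random

def kmeans(points, k, max_iters=100, tol=1e-6):
    # 1) Initialize: pick k distinct data points as the initial means.
    means = random.sample(points, k)
    for _ in range(max_iters):
        # 2) Membership: assign each point to the nearest mean (L2 metric).
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(p, means[j])))
            clusters[i].append(p)
        # 3) Update: recompute each mean from the points assigned to it.
        new_means = []
        for i, cl in enumerate(clusters):
            if cl:
                new_means.append(tuple(sum(c) / len(cl) for c in zip(*cl)))
            else:
                new_means.append(means[i])  # keep an empty cluster's old mean
        # 4) Terminate when the means stop moving.
        shift = max(sum((a - b) ** 2 for a, b in zip(m, n))
                    for m, n in zip(means, new_means))
        means = new_means
        if shift < tol:
            break
    return means, clusters
```

    Each pass over `points` corresponds to one data scan, which is the costly operation when the database does not fit in RAM.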

    We focus on the problem of clustering very large databases: those too large to be loaded into RAM, so that the data scan at each iteration is extremely costly. We focus on the K-Means algorithm, although the method can be extended to accommodate other algorithms [BFR98]. K-Means is a well-known algorithm, originally known as Forgy's method [F65, M67], and has been used extensively in pattern recognition [DH73, F90]. It is a standard technique in a wide array of applications, even as a way to initialize more expensive EM clustering [B95, CS96, MH98, FRB98, BF98].



    Articles bearing this mark are original works of the blog author Caoer (草兒); when indexing, archiving, or reposting, please credit the source and the original author. Thank you very much.

    posted on 2006-06-24 13:54 by Caoer (草兒) · Reads (198) · Comments (0) · Category: BI and DM