Oracle Basics Study Summary: Databases and Instances
2013-12-20 15:20
1. Problem Definition of Clustering
Informal goal: Given n "points" (Web pages, images, genome fragments, etc.), classify them into "coherent groups", i.e., clusters.
Assumptions:
(1) As input, we are given a (dis)similarity measure: a distance d(p, q) between each pair of points.
(2) The distance is symmetric, i.e., d(p, q) = d(q, p). (Examples: Euclidean distance, genome similarity, etc.)
Points in the same cluster should be "nearby".
2. Max-Spacing k-Clusterings
k-clustering: a clustering in which the desired number of clusters is k.
Separated pair: Call points p and q separated if they are assigned to different clusters.
Spacing: The spacing of a k-clustering is the minimum of d(p, q) over all separated pairs p, q. (The bigger, the better.)
Max-Spacing k-Clustering problem: Given a distance measure d and k, compute the k-clustering with maximum spacing.
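The definition of spacing can be made concrete with a small sketch. The function name `spacing`, the label-list representation of a clustering, and the 1-D sample points are illustrative assumptions, not part of the original notes.

```python
from itertools import combinations

def spacing(points, labels, dist):
    """Minimum distance over all separated pairs (points with different labels).

    labels[i] is the cluster id of points[i]; this representation is an
    assumption made for the example.
    """
    best = float("inf")  # stays infinite if no pair is separated
    for i, j in combinations(range(len(points)), 2):
        if labels[i] != labels[j]:
            best = min(best, dist(points[i], points[j]))
    return best

# Hypothetical 1-D data with absolute difference as the distance.
points = [0.0, 1.0, 5.0, 6.0, 20.0]
dist = lambda p, q: abs(p - q)

print(spacing(points, [0, 0, 1, 1, 2], dist))  # 4.0 (closest separated pair: 1.0 and 5.0)
```

This brute-force check examines every pair, so it takes O(n^2) distance evaluations; it is meant for checking small examples rather than for large inputs.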
3. A Greedy Algorithm
-- Initially, each point is in its own cluster.
-- Repeat until only k clusters remain:
   -- Let p, q = the closest pair of separated points (this pair determines the current spacing).
   -- Merge the clusters containing p and q into a single cluster.
Note: This is just like Kruskal's MST algorithm, but stopped early, when k connected components remain.
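The greedy procedure above can be sketched as Kruskal's algorithm with a union-find structure, stopped once k clusters remain. This is a minimal sketch under stated assumptions: the function name `max_spacing_clustering`, the all-pairs edge list, and the sample data are hypothetical, and it assumes 1 <= k <= n.

```python
from itertools import combinations

def max_spacing_clustering(points, k, dist):
    """Greedy max-spacing k-clustering (Kruskal stopped early).

    Returns (labels, spacing), where labels[i] is a cluster id for points[i].
    Assumes 1 <= k <= len(points).
    """
    n = len(points)
    parent = list(range(n))  # union-find forest: each point starts alone

    def find(x):
        # Find the cluster representative, with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All point pairs, sorted by increasing distance.
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    clusters = n
    spacing = float("inf")  # remains infinite when k == 1
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # already in the same cluster, not a separated pair
        if clusters == k:
            spacing = d  # closest still-separated pair gives the spacing
            break
        parent[ri] = rj  # merge the two clusters
        clusters -= 1

    labels = [find(i) for i in range(n)]
    return labels, spacing

# Hypothetical 1-D data with absolute difference as the distance.
points = [0.0, 1.0, 5.0, 6.0, 20.0]
labels, s = max_spacing_clustering(points, 3, lambda p, q: abs(p - q))
print(s)  # 4.0
```

Sorting all pairs up front replaces the repeated "find the closest separated pair" step: scanning the sorted list and skipping pairs already in the same cluster visits separated pairs in the same order.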
4. Correctness of Greedy Clustering
-- Let C1, ... , Ck = greedy clustering with spacing S. Let C1', ... , Ck' = arbitrary other clustering.
Need to show : spacing of C1', ... , Ck' <= S
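-- Case 1: The Ci' are the same as the Ci (maybe after renaming) ==> they have the same spacing S.
-- Case 2: Otherwise, we can find a point pair p, q such that:
(A) p, q are in the same greedy cluster Ci
(B) p, q are in different clusters Ci'
(If no such pair existed, every greedy cluster would sit inside a single Ci'; with k nonempty clusters on each side, the two clusterings would then coincide.)
-- Easy case: If p, q were directly merged at some point while building Ci, then S >= d(p, q). (Greedy merges pairs in nondecreasing order of distance, and S is the distance of the closest pair still separated at the end, so every directly merged pair is at distance at most S.) ==> S >= spacing of C1', ... , Ck' (since p, q are separated in that clustering).
-- Tricky case: p, q were "indirectly merged" through multiple direct merges. Let p, a1, ... , al, q be the path of direct greedy merges connecting p and q. Since p is in some cluster Ci' and q is not in that Ci' ==> there exists a consecutive pair aj, aj+1 on the path with aj in Ci' and aj+1 not in Ci'. This pair was directly merged, so by the easy case S >= d(aj, aj+1), and it is separated in C1', ... , Ck', so d(aj, aj+1) >= spacing of C1', ... , Ck'.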
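The key step of the tricky case, locating a consecutive pair on the merge path that the other clustering separates, can be illustrated with a short sketch. The helper name `crossing_pair`, the dictionary representation of the other clustering, and the sample path are all hypothetical.

```python
def crossing_pair(path, other_labels):
    """Return the first consecutive pair on the path that lies in
    different clusters of the other clustering, or None if there is none.

    path: points p, a1, ..., al, q in the order of direct greedy merges.
    other_labels: maps each point to its cluster id in C1', ..., Ck'.
    """
    for a, b in zip(path, path[1:]):
        if other_labels[a] != other_labels[b]:
            return a, b
    return None

# Hypothetical path of direct merges from p to q, where the other
# clustering puts p and q in different clusters.
path = ["p", "a1", "a2", "q"]
other_labels = {"p": 0, "a1": 0, "a2": 1, "q": 1}

print(crossing_pair(path, other_labels))  # ('a1', 'a2')
```

Whenever the endpoints of the path carry different labels, the scan must find such a pair: the label changes somewhere between the first and last point, and it can only change across a consecutive pair.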