
Spark Machine Learning (2): Local Vectors (Data Types - MLlib)

2016-04-23 10:01
Local vector

Labeled point

Local matrix

Distributed matrix

RowMatrix

IndexedRowMatrix

CoordinateMatrix

BlockMatrix

MLlib supports local vectors and matrices stored on a single machine, as well as distributed matrices backed by one or more RDDs. Local vectors and local matrices are simple data models that serve as public interfaces. The underlying linear algebra operations are provided by Breeze and jblas. A training example used in supervised learning is called a "labeled point" in MLlib.

Local vector

A local vector has integer-typed, 0-based indices and double-typed values, and is stored on a single machine. MLlib supports two types of local vectors: dense and sparse. A dense vector is backed by a double array representing its entry values, while a sparse vector is backed by two parallel arrays: indices and values. For example, a vector (1.0, 0.0, 3.0) can be represented in dense format as [1.0, 0.0, 3.0] or in sparse format as (3, [0, 2], [1.0, 3.0]), where 3 is the size of the vector.
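The relationship between the two formats can be sketched in plain Scala, without Spark. This is only an illustration of the storage layout described above, not MLlib's implementation; the object `LocalVectorSketch` and its methods `toDense` and `toSparse` are hypothetical names introduced here.

```scala
object LocalVectorSketch {
  // Expand a sparse triple (size, indices, values) into a dense array.
  def toDense(size: Int, indices: Array[Int], values: Array[Double]): Array[Double] = {
    require(indices.length == values.length, "indices and values must have the same length")
    val dense = new Array[Double](size) // every entry starts at 0.0
    for (k <- indices.indices) dense(indices(k)) = values(k)
    dense
  }

  // Collapse a dense array into (size, indices, values), keeping only nonzero entries.
  def toSparse(dense: Array[Double]): (Int, Array[Int], Array[Double]) = {
    val nonzero = dense.zipWithIndex.filter { case (v, _) => v != 0.0 }
    (dense.length, nonzero.map(_._2), nonzero.map(_._1))
  }

  def main(args: Array[String]): Unit = {
    val dense = toDense(3, Array(0, 2), Array(1.0, 3.0))
    println(dense.mkString("[", ",", "]")) // prints [1.0,0.0,3.0]
  }
}
```

The sparse form only pays off when most entries are zero, since it stores one index per nonzero value in addition to the value itself.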
Scala
The base class of local vectors is Vector, and we provide two implementations: DenseVector and SparseVector. We recommend using the factory methods implemented in Vectors to create local vectors.
Refer to the Vector Scala docs and Vectors Scala docs for details on the API.
import org.apache.spark.mllib.linalg.{Vector, Vectors}
// Create a dense vector (1.0, 0.0, 3.0).
val dv: Vector = Vectors.dense(1.0, 0.0, 3.0)
// Create a sparse vector (1.0, 0.0, 3.0) by specifying its indices and values corresponding to nonzero entries.
val sv1: Vector = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))
// Create a sparse vector (1.0, 0.0, 3.0) by specifying its nonzero entries.
val sv2: Vector = Vectors.sparse(3, Seq((0, 1.0), (2, 3.0)))

// The same vectors again, written for spark-shell (the REPL allows redefining dv, sv1 and sv2).
// Create a dense local vector, either from an array or from individual values.
val dv = Vectors.dense(Array(1.0, 0.0, 3.0))
val densityVector = Vectors.dense(1.0, 0.0, 3.0)

// Create a sparse local vector, in one of two ways.
// 1. Parallel arrays: (size, indices: Array[Int], values: Array[Double])
val sv1 = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))

// 2. A Seq of (index, value) pairs: (size, Seq((index, value), ...))
val sv2 = Vectors.sparse(3, Seq((0, 1.0), (2, 3.0)))

println(dv)
println(densityVector)
println(sv1)
println(sv2)

Result:
[1.0,0.0,3.0]
[1.0,0.0,3.0]
(3,[0,2],[1.0,3.0])
(3,[0,2],[1.0,3.0])
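Since Vector has exactly two implementations, a vector's storage can be inspected by matching on its concrete class. The following is a minimal sketch meant for spark-shell, assuming the spark-mllib library is on the classpath; the helper `describe` is a hypothetical name introduced here for illustration.

```scala
import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vector, Vectors}

// Describe how a local vector is stored by matching on its concrete class.
def describe(v: Vector): String = v match {
  case d: DenseVector =>
    s"dense, size=${d.size}, values=${d.values.mkString(",")}"
  case s: SparseVector =>
    s"sparse, size=${s.size}, indices=${s.indices.mkString(",")}, values=${s.values.mkString(",")}"
}

val dense: Vector = Vectors.dense(1.0, 0.0, 3.0)
val sparse: Vector = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))

println(describe(dense))
println(describe(sparse))

// Both representations expand to the same entries.
println(dense.toArray.sameElements(sparse.toArray))
```

Code that only needs the entries can stay on the Vector interface (size, apply, toArray) and ignore which implementation it was given.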


Note: Scala imports scala.collection.immutable.Vector by default, so you have to import org.apache.spark.mllib.linalg.Vector explicitly to use MLlib's Vector.
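If a file needs both Scala's Vector and MLlib's Vector, one option is a renaming import, so that neither name shadows the other. This is a sketch assuming spark-mllib is on the classpath; the alias `MLVector` is an arbitrary name chosen here, not something the library defines.

```scala
import org.apache.spark.mllib.linalg.{Vector => MLVector, Vectors}

// Scala's own immutable Vector keeps its usual name.
val scalaVec: Vector[Double] = Vector(1.0, 0.0, 3.0)

// MLlib's Vector is referred to through the alias.
val mlVec: MLVector = Vectors.dense(scalaVec.toArray)

println(mlVec)
```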

Tags: Spark, MLlib, big data, Scala