
MongoDB Study Notes, Part 3: Indexes

2014-03-26 22:11


Introduction to Indexes

Indexes exist to speed up queries.
Suppose we want to query by a particular key:
>db.students.find({"name":"李明"});

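When a query uses only one key, you can create an index on that key to make the query faster.
In this example we index name. Indexes are created with the ensureIndex method (newer MongoDB releases name this method createIndex):
>db.students.ensureIndex({"name":1})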

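For a given collection, the same index only needs to be created once.
An index on a key speeds up queries on that key, but it may not help other queries, even ones that include the indexed key.
As a rule of thumb, an index should contain all of the keys the query uses, typically as one compound index rather than one index per key.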


Index Direction

With more than one key, the direction of each key in the index starts to matter.
Take the students collection from before:
{"_id":...,"name":"smith","age":48,"uid":1}
{"_id":...,"name":"smith","age":32,"uid":2}
{"_id":...,"name":"joe","age":36,"uid":3}
{"_id":...,"name":"joe","age":35,"uid":4}
{"_id":...,"name":"john","age":33,"uid":5}

If an index is created with {name:1,age:-1}, MongoDB organizes the entries like this:
{"_id":...,"name":"joe","age":36,"uid":3}
{"_id":...,"name":"joe","age":35,"uid":4}
{"_id":...,"name":"john","age":33,"uid":5}
{"_id":...,"name":"smith","age":48,"uid":1}
{"_id":...,"name":"smith","age":32,"uid":2}

Names are sorted in ascending alphabetical order, and within the same name, ages are sorted in descending order.
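The ordering above can be reproduced outside the database. The following is a minimal sketch in plain JavaScript, not MongoDB code; the helper name compareBySpec is made up for illustration, and it only models how a {name:1,age:-1} spec orders the five sample documents.

```javascript
// Sample documents from the students collection above (_id omitted).
const students = [
  { name: "smith", age: 48, uid: 1 },
  { name: "smith", age: 32, uid: 2 },
  { name: "joe", age: 36, uid: 3 },
  { name: "joe", age: 35, uid: 4 },
  { name: "john", age: 33, uid: 5 },
];

// Hypothetical helper: build a comparator from an index spec such as
// { name: 1, age: -1 }. Keys are compared left to right;
// 1 means ascending and -1 means descending.
function compareBySpec(spec) {
  const keys = Object.keys(spec);
  return (a, b) => {
    for (const key of keys) {
      if (a[key] < b[key]) return -spec[key];
      if (a[key] > b[key]) return spec[key];
    }
    return 0;
  };
}

// Sort a copy so the original array keeps its insertion order.
const ordered = [...students].sort(compareBySpec({ name: 1, age: -1 }));
const orderedUids = ordered.map((doc) => doc.uid);
console.log(orderedUids); // uids in index order
```

Flipping a direction in the spec (for example {name:1,age:1}) changes only the order within each name, which is why direction matters once an index has several keys.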
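In general, an index with n keys helps queries on any leading subset (prefix) of those keys.
For example, an index on {"a":1,"b":1,"c":1,...,"z":1} effectively gives you indexes on {"a":1}, {"a":1,"b":1}, {"a":1,"b":1,"c":1}, and so on.
Queries on {"b":1} or {"a":1,"c":1}, however, are not fully optimized: the first skips the leading key, and the second can use only the a part of the index.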
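The prefix rule can be written down as a tiny function. This is a simplified sketch in plain JavaScript, assuming a query is described only by the set of keys it filters on; usablePrefix is a hypothetical helper, not part of MongoDB, and the real query planner considers more than this.

```javascript
// Hypothetical helper: given the ordered keys of a compound index and the
// keys a query filters on, return the leading index keys the query can use.
// The walk stops at the first index key the query does not mention.
function usablePrefix(indexKeys, queryKeys) {
  const wanted = new Set(queryKeys);
  const prefix = [];
  for (const key of indexKeys) {
    if (!wanted.has(key)) break;
    prefix.push(key);
  }
  return prefix;
}

const indexKeys = ["a", "b", "c"];
const prefixA = usablePrefix(indexKeys, ["a"]);       // uses a
const prefixAB = usablePrefix(indexKeys, ["a", "b"]); // uses a and b
const prefixB = usablePrefix(indexKeys, ["b"]);       // uses nothing
const prefixAC = usablePrefix(indexKeys, ["a", "c"]); // uses only a
console.log(prefixA, prefixAB, prefixB, prefixAC);
```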
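The downside of indexes is extra overhead on every insert, update, and delete, because the database must not only perform the operation but also record it in each of the collection's indexes. So create as few indexes as you can.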


Indexing Keys in Embedded Documents

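Creating an index on a key in an embedded document is no different from indexing an ordinary key. For example, to search blog comments by date, create an index on the date key inside the array of embedded comments documents: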
>db.blog.ensureIndex({"comments.date":1})

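To make the dotted path "comments.date" concrete, here is a small sketch in plain JavaScript of how such a path reaches into an array of embedded documents and yields one value per comment. The helper valuesAtPath and the sample post are invented for illustration; this models the idea and is not how MongoDB is implemented.

```javascript
// Hypothetical helper: collect every value reachable through a dotted path,
// descending into arrays of embedded documents along the way.
function valuesAtPath(doc, path) {
  let current = [doc];
  for (const part of path.split(".")) {
    const next = [];
    for (const value of current) {
      // Treat a single embedded document like a one-element array.
      const items = Array.isArray(value) ? value : [value];
      for (const item of items) {
        if (item !== null && typeof item === "object" && part in item) {
          next.push(item[part]);
        }
      }
    }
    current = next;
  }
  return current;
}

// Made-up blog post with two embedded comments.
const post = {
  title: "hypothetical post",
  comments: [
    { author: "a", date: "2014-03-01" },
    { author: "b", date: "2014-03-20" },
  ],
};

const dates = valuesAtPath(post, "comments.date");
console.log(dates); // one date per embedded comment
```

Each extracted date corresponds to one index entry pointing back at the same post, which is what lets a query on "comments.date" find the post through any of its comments.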

Indexing for Sorts

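When the data set is large and the sort key is not indexed, MongoDB has to pull all of the data into memory to sort it. Creating an index on the key you sort by avoids this.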


Unique Indexes

A unique index guarantees that, for the specified key, every document in the collection has a distinct value.
>db.students.ensureIndex({"name":1},{"unique":true})

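This gives the name key a unique index, so no two documents can have the same name.
When a document is inserted into a collection with a unique index, any indexed field missing from the document is treated as null in the index.
That sentence is a bit confusing, so let's test it with an example.
Create a new person collection containing the following document: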
{
"_id" : ObjectId("5332777c74d09b6a7018fbf4"),
"name" : "张明明",
"sex" : "男",
"age" : 23,
"books" : [
"诗人的世界",
"Java编程思想",
"HTML5实战"
]
}

The document has five keys (including _id). Now create a unique index on name and age:
>db.person.ensureIndex({"name":1,"age":-1},{"unique":true})

Afterwards, use getIndexes to look at the index that was just created:
{
"v" : 1,
"key" : {
"name" : 1,
"age" : -1
},
"unique" : true,
"ns" : "test.person",
"name" : "name_1_age_-1"
}

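Now let's insert some data to test it. The first document, shown above, is already in the collection.
First, insert a document with name "张明" and age 23. It succeeds:
db.person.insert({name:"张明",age:23,sex:"男",books:["诗人的世界","Java编程思想","HTML5实战"]})
Result: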
{"name" : "张明明", "sex" : "男", "age" : 23, "books" : [  "诗人的世界","Java编程思想",  "HTML5实战" ] }
{"name" : "张明", "age" : 23, "sex" : "男", "books" : [  "诗人的世界",  "Java编程思想",  "HTML5实战" ] }

Next, insert another document with the same name but a different age:
db.person.insert({name:"张明",age:22,sex:"男",books:["诗人的世界","Java编程思想","HTML5实战"]})
Result:
{ "name" : "张明明", "sex" : "男", "age" : 23, "books" : [  "诗人的世界",  "Java编程思想",  "HTML5实战" ] }
{ "name" : "张明", "age" : 23, "sex" : "男", "books" : [  "诗人的世界",  "Java编程思想",  "HTML5实战" ] }
{ "name" : "张明", "age" : 22, "sex" : "男", "books" : [  "诗人的世界",  "Java编程思想",  "HTML5实战" ] }

Now try inserting a document whose name and age both match an existing one:
db.person.insert({name:"张明",age:22,sex:"男",books:["诗人的世界","Java编程思想","HTML5实战"]})
Result:
E11000 duplicate key error index: test.person.$name_1_age_-1  dup key: { : "张明", : 22.0 }

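These examples show that once a unique index exists, the combination of indexed keys must be unique and cannot repeat.
What happens if a document is inserted without one of the indexed keys?
db.person.insert({age:22,sex:"男",books:["诗人的世界","Java编程思想","HTML5实战"]})
Result: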
{ "name" : "张明明", "sex" : "男", "age" : 23, "books" : [  "诗人的世界",  "Java编程思想",  "HTML5实战" ] }
{ "name" : "张明", "age" : 23, "sex" : "男", "books" : [  "诗人的世界",  "Java编程思想",  "HTML5实战" ] }
{ "name" : "张明", "age" : 22, "sex" : "男", "books" : [  "诗人的世界",  "Java编程思想",  "HTML5实战" ] }
{ "age" : 22, "sex" : "男", "books" : [  "诗人的世界",  "Java编程思想",  "HTML5实战" ] }

db.person.insert({name:null,age:null,sex:"男",books:["诗人的世界","Java编程思想","HTML5实战"]})
Result:
{ "name" : "张明明", "sex" : "男", "age" : 23, "books" : [  "诗人的世界",  "Java编程思想",  "HTML5实战" ] }
{ "name" : "张明", "age" : 23, "sex" : "男", "books" : [  "诗人的世界",  "Java编程思想",  "HTML5实战" ] }
{ "name" : "张明", "age" : 22, "sex" : "男", "books" : [  "诗人的世界",  "Java编程思想",  "HTML5实战" ] }
{ "age" : 22, "sex" : "男", "books" : [  "诗人的世界",  "Java编程思想",  "HTML5实战" ] }
{ "name" : null, "age" : null, "sex" : "男", "books" : [  "诗人的世界",  "Java编程思想",  "HTML5实战" ] }

Now insert one more document, with neither a name nor an age:
db.person.insert({sex:"男",books:["诗人的世界","Java编程思想","HTML5实战"]})
Result:
E11000 duplicate key error index: test.person.$name_1_age_-1  dup key: { : null, : null }

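This makes the behavior clear: as far as the unique index is concerned, a key whose value is null is the same as a key that is absent. In other words, any indexed key missing from an inserted document is indexed as null. The document with only age 22 was accepted because its index entry (null, 22) did not exist yet; the last insert failed because (null, null) was already taken.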
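The sequence of inserts above can be replayed against a toy model. The sketch below is plain JavaScript, not MongoDB code; the class UniqueIndexModel and the helper tryInsert are hypothetical names, and the only rule modeled is "a missing indexed field counts as null".

```javascript
// Toy model of a unique compound index. Not MongoDB's implementation.
class UniqueIndexModel {
  constructor(keys) {
    this.keys = keys;      // indexed field names, in order
    this.seen = new Set(); // index entries already stored
  }

  // Build the index entry for a document: a missing field counts as null.
  entryFor(doc) {
    return JSON.stringify(this.keys.map((k) => (k in doc ? doc[k] : null)));
  }

  insert(doc) {
    const entry = this.entryFor(doc);
    if (this.seen.has(entry)) {
      throw new Error("duplicate key: " + entry);
    }
    this.seen.add(entry);
  }
}

// Hypothetical helper that reports the outcome instead of throwing.
function tryInsert(index, doc) {
  try {
    index.insert(doc);
    return "ok";
  } catch (err) {
    return "duplicate";
  }
}

const index = new UniqueIndexModel(["name", "age"]);
const outcomes = [
  tryInsert(index, { name: "张明", age: 22 }),    // new entry
  tryInsert(index, { name: "张明", age: 22 }),    // same name and age
  tryInsert(index, { age: 22 }),                  // name missing: (null, 22)
  tryInsert(index, { name: null, age: null }),    // explicit (null, null)
  tryInsert(index, {}),                           // both missing: (null, null)
];
console.log(outcomes);
```

The last two inserts collide because an explicit null and a missing field produce the same index entry, which matches the E11000 error seen above.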


Removing Duplicates

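Sometimes duplicate key values already exist in the collection before the index is built, in which case creating the unique index fails. The dropDups option removes the duplicates: it keeps the first document found, deletes the other duplicates, and then creates the index. (dropDups was removed in MongoDB 3.0, so this applies to older versions only.)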
db.person.ensureIndex({"name":1,"age":-1},{"unique":true,"dropDups":true})
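What "keep the first document found and delete the rest" means can be sketched in plain JavaScript. The helper keepFirstPerKey and the sample documents are made up for illustration; this is a model of the described behavior, not the server's code, and in a real collection the deleted documents are gone for good.

```javascript
// Hypothetical helper: keep only the first document seen for each
// combination of indexed keys. A missing field is treated as null,
// as in the unique index examples above.
function keepFirstPerKey(docs, keys) {
  const seen = new Set();
  const kept = [];
  for (const doc of docs) {
    const entry = JSON.stringify(keys.map((k) => (k in doc ? doc[k] : null)));
    if (seen.has(entry)) continue; // a later duplicate is dropped
    seen.add(entry);
    kept.push(doc);
  }
  return kept;
}

// Made-up data: the second document duplicates the first on (name, age).
const people = [
  { name: "张明", age: 22, sex: "男" },
  { name: "张明", age: 22, sex: "女" },
  { name: "张明", age: 23, sex: "男" },
];

const kept = keepFirstPerKey(people, ["name", "age"]);
console.log(kept);
```

Note that the dropped document differed in sex; only the indexed keys decide what counts as a duplicate.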



Geospatial Indexes (2d Indexes)



[Figure: map screenshot taken from Baidu Maps]
Below are the coordinate points I defined on it myself (x and y both range from 0 to 100):
var maps = [
{"name":"公厕1","gis":{"x":40,"y":8}},
{"name":"金水桥","gis":{"x":50,"y":10}},
{"name":"故宫博物院南门","gis":{"x":50,"y":0}},
{"name":"公厕2","gis":{"x":65,"y":10}},
{"name":"西华门","gis":{"x":5,"y":13}},
{"name":"公厕3","gis":{"x":25,"y":15}},
{"name":"故宫博物院","gis":{"x":50,"y":18}},
{"name":"主敬殿","gis":{"x":74,"y":15}},
{"name":"东华门","gis":{"x":95,"y":13}},
{"name":"浴德堂","gis":{"x":20,"y":22}},
{"name":"体仁阁","gis":{"x":64,"y":23}},
{"name":"右翼门","gis":{"x":41,"y":35}},
{"name":"左翼门","gis":{"x":65,"y":35}},
{"name":"中国第一历史档案馆","gis":{"x":8,"y":37}},
{"name":"中和殿","gis":{"x":50,"y":50}},
{"name":"箭亭","gis":{"x":68,"y":53}},
{"name":"公厕4","gis":{"x":39,"y":57}},
{"name":"公厕5","gis":{"x":66,"y":60}},
{"name":"皇极右门","gis":{"x":82,"y":62}},
{"name":"故宫礼品店","gis":{"x":43,"y":62}},
{"name":"春花门","gis":{"x":24,"y":73}},
{"name":"凤彩门","gis":{"x":46,"y":75}},
{"name":"景耀门","gis":{"x":64,"y":73}},
{"name":"右鼓馆","gis":{"x":92,"y":70}},
{"name":"宁寿宫","gis":{"x":85,"y":78}},
{"name":"履和门","gis":{"x":65,"y":80}},
{"name":"翊坤宫","gis":{"x":42,"y":83}},
{"name":"咸福宫","gis":{"x":39,"y":85}},
{"name":"故宫商店1","gis":{"x":55,"y":90}},
{"name":"颐和轩","gis":{"x":86,"y":93}},
{"name":"故宫商店2","gis":{"x":46,"y":94}},
{"name":"倦勤斋","gis":{"x":80,"y":94}},
{"name":"珍妃灵堂","gis":{"x":90,"y":90}},
{"name":"故宫博物院北门","gis":{"x":50,"y":100}}
]

for(var i=0;i<maps.length;i++){
db.maps.insert(maps[i]);
}


First, build a 2d index on the coordinates:
db.maps.ensureIndex({"gis":"2d"},{min:-1,max:101});


Suppose I am at 中和殿, right in the middle, and suddenly need a restroom (公厕 means public restroom). How do I find the nearest one?
db.maps.find({"name":{"$regex":"公厕.*"},"gis":{"$near":[50,50]}});
Query result:
{ "_id" : ObjectId("53329d6c99589bfa25a5300f"), "name" : "公厕4", "gis" : { "x" : 39, "y" : 57 } }
{ "_id" : ObjectId("53329d6c99589bfa25a53010"), "name" : "公厕5", "gis" : { "x" : 66, "y" : 60 } }
{ "_id" : ObjectId("53329d6c99589bfa25a53002"), "name" : "公厕2", "gis" : { "x" : 65, "y" : 10 } }
{ "_id" : ObjectId("53329d6c99589bfa25a53004"), "name" : "公厕3", "gis" : { "x" : 25, "y" : 15 } }
{ "_id" : ObjectId("53329d6c99589bfa25a52fff"), "name" : "公厕1", "gis" : { "x" : 40, "y" : 8 } }


So the nearest restroom is 公厕4, and I can head straight there.
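The order of that result can be checked by hand. The following is a minimal sketch in plain JavaScript that filters by the same name pattern and sorts by straight-line distance from (50, 50); it uses only a few of the points above, the helper names are invented, and it is a model of the result, not of how the 2d index works internally.

```javascript
// A subset of the points inserted above.
const places = [
  { name: "公厕1", gis: { x: 40, y: 8 } },
  { name: "公厕2", gis: { x: 65, y: 10 } },
  { name: "公厕3", gis: { x: 25, y: 15 } },
  { name: "中和殿", gis: { x: 50, y: 50 } },
  { name: "公厕4", gis: { x: 39, y: 57 } },
  { name: "公厕5", gis: { x: 66, y: 60 } },
];

// Straight-line distance between a stored point and an [x, y] pair.
function distance(point, [x, y]) {
  return Math.hypot(point.x - x, point.y - y);
}

// Hypothetical helper: keep documents whose name matches the pattern,
// then order them from nearest to farthest.
function nearest(docs, center, namePattern) {
  return docs
    .filter((doc) => namePattern.test(doc.name))
    .sort((a, b) => distance(a.gis, center) - distance(b.gis, center));
}

const nearestNames = nearest(places, [50, 50], /公厕.*/).map((d) => d.name);
console.log(nearestNames);
```

公厕4 is about 13 units away and 公厕5 about 19, while the other three are all more than 42 units away, which agrees with the order the query returned.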

Afterwards, I want to see what sights lie within 20 meters of it (treating one coordinate unit as one meter). How do I find them?
db.maps.find({"gis":{"$within":{"$center":[[39,57],20]}}},{"name":1,"gis":1,"_id":0})
Query result:
{ "name" : "公厕4", "gis" : { "x" : 39, "y" : 57 } }
{ "name" : "中和殿", "gis" : { "x" : 50, "y" : 50 } }
{ "name" : "凤彩门", "gis" : { "x" : 46, "y" : 75 } }
{ "name" : "故宫礼品店", "gis" : { "x" : 43, "y" : 62 } }
Note: these results are not sorted by distance; they are simply all of the points within the 20-meter radius.
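The circle test itself is simple geometry. Here is a sketch in plain JavaScript over a handful of the points above; withinCircle is a hypothetical helper, and treating the boundary as inclusive is an assumption that none of these points actually exercises.

```javascript
// A subset of the points inserted above, in their original order.
const spots = [
  { name: "右翼门", gis: { x: 41, y: 35 } },
  { name: "中和殿", gis: { x: 50, y: 50 } },
  { name: "箭亭", gis: { x: 68, y: 53 } },
  { name: "公厕4", gis: { x: 39, y: 57 } },
  { name: "故宫礼品店", gis: { x: 43, y: 62 } },
  { name: "春花门", gis: { x: 24, y: 73 } },
  { name: "凤彩门", gis: { x: 46, y: 75 } },
];

// Hypothetical helper: keep documents no farther than `radius` from the
// center. No sorting is done; input order is preserved.
function withinCircle(docs, [cx, cy], radius) {
  return docs.filter(
    (doc) => Math.hypot(doc.gis.x - cx, doc.gis.y - cy) <= radius
  );
}

const inCircle = withinCircle(spots, [39, 57], 20).map((d) => d.name);
console.log(inCircle);
```

右翼门 and 春花门 are both roughly 22 units from 公厕4, just outside the circle, which is why they do not appear in the query result.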


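I have just been to 中和殿, so I'll skip it and visit 故宫礼品店 (the gift shop) to pick up a few small gifts for friends.

From 故宫礼品店 it is still some way to 故宫博物院北门 (the north gate). Let's see which sights fall inside the rectangle that has these two points as opposite corners: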
db.maps.find({"gis":{"$within":{"$box":[[43,62],[50,100]]}}},{"name":1,"gis":1,"_id":0})
Query result:
{ "name" : "故宫商店2", "gis" : { "x" : 46, "y" : 94 } }
{ "name" : "故宫博物院北门", "gis" : { "x" : 50, "y" : 100 } }
{ "name" : "凤彩门", "gis" : { "x" : 46, "y" : 75 } }
{ "name" : "故宫礼品店", "gis" : { "x" : 43, "y" : 62 } }
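The rectangle check can be sketched the same way in plain JavaScript. withinBox is a hypothetical helper, only a few of the points above are included, and counting points on the edges as inside is an assumption made so that the two corner points themselves are returned, as they are in the query result.

```javascript
// A subset of the points inserted above.
const sights = [
  { name: "故宫礼品店", gis: { x: 43, y: 62 } },
  { name: "凤彩门", gis: { x: 46, y: 75 } },
  { name: "翊坤宫", gis: { x: 42, y: 83 } },
  { name: "故宫商店1", gis: { x: 55, y: 90 } },
  { name: "故宫商店2", gis: { x: 46, y: 94 } },
  { name: "故宫博物院北门", gis: { x: 50, y: 100 } },
];

// Hypothetical helper: keep documents inside the axis-aligned rectangle
// spanned by two opposite corners. Edges count as inside (an assumption).
function withinBox(docs, [[x1, y1], [x2, y2]]) {
  const minX = Math.min(x1, x2);
  const maxX = Math.max(x1, x2);
  const minY = Math.min(y1, y2);
  const maxY = Math.max(y1, y2);
  return docs.filter(
    ({ gis }) =>
      gis.x >= minX && gis.x <= maxX && gis.y >= minY && gis.y <= maxY
  );
}

const inBox = withinBox(sights, [[43, 62], [50, 100]]).map((d) => d.name);
console.log(inBox);
```

翊坤宫 (x = 42) and 故宫商店1 (x = 55) fall just outside the x range of 43 to 50, so they are excluded even though their y values are within range.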


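I wandered around a bit more along the way, went out the north gate, and left.

Summary

Geospatial indexes are also created with ensureIndex, except that the value is "2d" rather than 1. If no range is specified at creation time, the default is -180 to 180. A range can also be set explicitly with {min:min,max:max}.

Geospatial queries use $near and $within: $near finds points by proximity to a location, and $within finds points inside a shape. Two $within shapes are used here, a circle and a rectangle.

A circle is specified with $center, giving its center point and radius; a rectangle is specified with $box, giving two diagonally opposite corners.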


                                            