
An In-Depth Look at the MongoDB oplog

2015-07-10 17:18

MongoDB replication works by recording every write operation in a log. This log is called the oplog.

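By default, the oplog is allocated 5% of the free disk space, which is usually a reasonable setting. The size can be changed with mongod --oplogSize (the value is given in megabytes).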

The oplog is a capped collection. It has to be fixed in size so that it can never grow until it fills the disk, and MongoDB introduced capped collections to meet exactly this need (the oplog is actually the reason capped collections were invented).
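To make the fixed-size behaviour concrete, here is a minimal sketch of a capped log as a ring buffer. It is a toy model, not MongoDB's implementation: the class name CappedLog is made up for illustration, and it caps the number of entries, whereas a real capped collection is capped by size in bytes.

```javascript
// Toy model of a capped collection: fixed capacity, insertion order kept,
// oldest entries overwritten once the buffer is full.
// Assumption: capacity counts entries; real capped collections count bytes.
class CappedLog {
  constructor(capacity) {
    this.capacity = capacity;
    this.slots = new Array(capacity);
    this.next = 0;  // index of the slot the next append writes to
    this.count = 0; // number of live entries, never more than capacity
  }

  append(entry) {
    this.slots[this.next] = entry;
    this.next = (this.next + 1) % this.capacity;
    this.count = Math.min(this.count + 1, this.capacity);
  }

  // Returns the live entries from oldest to newest.
  toArray() {
    const start = (this.next - this.count + this.capacity) % this.capacity;
    const out = [];
    for (let k = 0; k < this.count; k++) {
      out.push(this.slots[(start + k) % this.capacity]);
    }
    return out;
  }
}

const log = new CappedLog(3);
for (const op of ["op1", "op2", "op3", "op4", "op5"]) log.append(op);
console.log(log.toArray()); // [ 'op3', 'op4', 'op5' ]
```

Once the log is full, every new write pushes out the oldest one. This is why a secondary that falls too far behind can no longer catch up from the oplog alone.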
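Where the oplog lives

The oplog is stored in the local database.

In a master/slave deployment:

local.oplog.$main

In a replica set:

local.oplog.rs

In a sharded cluster, the oplog cannot be viewed through mongos; connect to each shard to inspect it.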

mongos> use local
switched to db local
mongos> show collections
Thu Mar 28 11:37:11 uncaught exception: error: { "$err" : "can't use 'local' database through mongos", "code" : 13644 }
oplog format

MongoDB 2.0

PRIMARY> db.oplog.rs.findOne()
{
    "ts" : {
        "t" : 1354919611000,
        "i" : 196
    },
    "h" : NumberLong("-8946637877024029255"),
    "op" : "i",
    "ns" : "msg.msgToSend",
    "o" : {
        "_id" : ObjectId("50c26ecae7d64ae0b5f36cfe"),
        ...
    }
}
MongoDB 2.2

PRIMARY> db.oplog.rs.findOne()
{
    "ts" : Timestamp(1364362801000, 8247),
    "h" : NumberLong("8229173295225699173"),
    "v" : 2,
    "op" : "i",
    "ns" : "goods.Simigoods",
    "fromMigrate" : true,
    "o" : {
        "_id" : ObjectId("50b534310eba2018b88ba3b2"),
        ...
    }
}
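Note the field "fromMigrate" : true. I first assumed it meant the data had come over from the 2.0 upgrade, but the source code shows otherwise: fromMigrate marks an operation produced by a chunk migration, that is, a chunk being moved between shards (see src/mongo/s/d_migrate.cpp). The v field is OPLOG_VERSION, the oplog version.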
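A program that reads the oplog of every shard may want to skip migration entries, since the migrated documents were already logged as ordinary writes on the shard they came from. The sketch below shows one way to do that. It assumes entries are plain objects, and the helper name isClientWrite is hypothetical.

```javascript
// Hypothetical helper: true for entries that stem from a client write,
// false for entries written while a chunk was migrated between shards.
function isClientWrite(entry) {
  return entry.fromMigrate !== true;
}

// Sample entries shaped like the 2.2 output above (documents shortened).
const sampleEntries = [
  { op: "i", ns: "goods.Simigoods", fromMigrate: true, o: { _id: 1 } },
  { op: "i", ns: "goods.Simigoods", o: { _id: 2 } },
];

console.log(sampleEntries.filter(isClientWrite).map((e) => e.o._id)); // [ 2 ]
```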

On a freshly built deployment, an entry looks like this:

PRIMARY> db.version()
2.2.2
PRIMARY> db.oplog.rs.findOne()
{
    "ts" : Timestamp(1364186197000, 58),
    "h" : NumberLong("-7878220425718087654"),
    "v" : 2,
    "op" : "u",
    "ns" : "exaitem_gmsbatchtask.jdgmsbatchtask",
    "o2" : {
        "_id" : "83f09a98-6a41-497b-a988-99ba5399d296"
    },
    "o" : {
        "_id" : "83f09a98-6a41-497b-a988-99ba5399d296",
        "status" : 2,
        "content" : "",
        "type" : 17,
        "business" : "832722",
        "optype" : 2,
        "addDate" : ISODate("2013-03-25T04:36:38.511Z"),
        "modifyDate" : ISODate("2013-03-25T04:36:39.131Z"),
        "source" : 5
    }
}
MongoDB 2.4

{
    "ts" : {
        "t" : 1361948104000,
        "i" : 325
    },
    "h" : NumberLong("-8795977166222676062"),
    "v" : 2,
    "op" : "i",
    "ns" : "test.log",
    "o" : {
        "_id" : ObjectId("51031ca0c86617a8811be893"),
        ...
    }
}
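The formats are largely the same, and 2.4 switched back to the earlier notation. In 2.2 the ts field is printed as Timestamp(1364186197000, 58), while 2.0 and 2.4 print it as {"t" : 1361948104000, "i" : 325 }. Also, if you use the 2.4 client (mongo) to look at a 2.2 server, you see the 2.4 notation: how ts is displayed depends only on the version of the mongo shell.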
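Since the two notations are only different renderings of the same value, a small sketch can print one ts both ways. The function names are made up for illustration, and the code assumes ts is a plain object { t, i } with t in milliseconds, as the sample values above suggest.

```javascript
// Render one ts value in the two notations seen in the shell output.
// Assumption: ts is a plain object { t, i }, with t in milliseconds
// as in the samples above.
function asDocument(ts) {
  return `{ "t" : ${ts.t}, "i" : ${ts.i} }`;
}

function asTimestamp(ts) {
  return `Timestamp(${ts.t}, ${ts.i})`;
}

// Wall-clock time at which the operation was logged.
function toDate(ts) {
  return new Date(ts.t);
}

const ts = { t: 1364186197000, i: 58 };
console.log(asDocument(ts));           // { "t" : 1364186197000, "i" : 58 }
console.log(asTimestamp(ts));          // Timestamp(1364186197000, 58)
console.log(toDate(ts).toISOString()); // 2013-03-25T04:36:37.000Z
```

The decoded time falls just before the addDate and modifyDate values of the 2.2.2 update entry shown earlier, which matches what one would expect for that write.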

Meaning of the oplog fields

ts: the time this operation occurred.

h: a unique ID for this operation. Each operation will have a different value in this field.

op: the write operation that should be applied to the slave. n indicates a no-op, which is just an informational message.

ns: the database and collection affected by this operation. For a no-op, this field is left blank.

o: the actual document representing the op. For a no-op, this field is pretty useless.

For other operations, the o field contains the document to insert or the criteria to update and remove. Notice that, for the update, there are two o fields (o and o2). o2 gives the update criteria and o gives the modifications (equivalent to update()'s second argument).

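ts: an 8-byte timestamp, made up of a 4-byte Unix timestamp plus a 4-byte incrementing counter.

This value is important: when a new primary is elected (for example, when the master goes down), the secondary with the largest ts is chosen as the new primary.

op: a 1-byte operation type, for example i for insert and d for delete.

ns: the namespace the operation applies to.

o: the document for the operation, that is, its content (for an update, the fields and values to change).

o2: the criteria of an update. This field is present only for update operations.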
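The ts layout described above can be sketched by packing the two 4-byte halves into one 64-bit number, with the seconds in the high 32 bits so that numeric order equals chronological order. The function names and the member list are hypothetical. The last function models only the "largest ts" comparison stated above, not a full election.

```javascript
// Pack a 4-byte Unix timestamp (seconds) and a 4-byte counter into one
// 64-bit value. Seconds go in the high 32 bits, the counter in the low 32.
function packTs(seconds, counter) {
  return (BigInt(seconds) << 32n) | BigInt(counter);
}

function unpackTs(packed) {
  return {
    seconds: Number(packed >> 32n),
    counter: Number(packed & 0xffffffffn),
  };
}

// Hypothetical member list: each member carries the ts of its last oplog
// entry. Returns the member whose ts is largest.
function mostUpToDate(members) {
  return members.reduce((best, m) =>
    packTs(m.seconds, m.counter) > packTs(best.seconds, best.counter) ? m : best
  );
}

const members = [
  { name: "A", seconds: 1364186197, counter: 58 },
  { name: "B", seconds: 1364186197, counter: 60 },
  { name: "C", seconds: 1364186190, counter: 900 },
];
console.log(mostUpToDate(members).name); // B
```

Member C has the largest counter but an older second, so it loses to both A and B. The counter only breaks ties within the same second.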

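The op field takes one of the following values:

"i": insert

"u": update

"d": delete

"c": db cmd

"db": declares the presence of a database (ns is set to the database name + '.')

"n": no-op, an empty operation that is written periodically to keep the oplog current.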
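As a quick illustration of the op codes listed above, the sketch below turns an entry into a one-line description and counts entries by type. The names OP_NAMES, describe and countByOp are made up for this example, and entries are assumed to be plain objects.

```javascript
// The op codes from the list above, mapped to readable names.
const OP_NAMES = {
  i: "insert",
  u: "update",
  d: "delete",
  c: "db cmd",
  db: "declare database",
  n: "no-op",
};

// One-line description of an entry; ns is omitted when it is empty,
// as it is for no-op entries.
function describe(entry) {
  const name = OP_NAMES[entry.op];
  if (name === undefined) throw new Error(`unknown op: ${entry.op}`);
  return entry.ns ? `${name} on ${entry.ns}` : name;
}

// Count how many entries of each op type appear in a batch.
function countByOp(entries) {
  const counts = {};
  for (const entry of entries) {
    counts[entry.op] = (counts[entry.op] || 0) + 1;
  }
  return counts;
}

const batch = [
  { op: "i", ns: "test.log" },
  { op: "u", ns: "msg.device" },
  { op: "i", ns: "msg.msgToSend" },
  { op: "n", ns: "" },
];
console.log(batch.map(describe));
console.log(countByOp(batch)); // { i: 2, u: 1, n: 1 }
```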

Update 2013-07-19: today I found that changing the replica set configuration produces "n" operations:

{ "ts" : Timestamp(1372320938000, 1), "h" : NumberLong("2050563086860406946"), "v" : 2, "op" : "n", "ns" : "", "o" : { "msg" : "Reconfig set", "version" : 6 } }
{ "ts" : Timestamp(1372319914000, 1), "h" : NumberLong("5828735007195954091"), "v" : 2, "op" : "n", "ns" : "", "o" : { "msg" : "Reconfig set", "version" : 5 } }
{ "ts" : Timestamp(1372318223000, 1), "h" : NumberLong("512600544405470974"), "v" : 2, "op" : "n", "ns" : "", "o" : { "msg" : "Reconfig set", "version" : 4 } }
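Besides the fields above, there are two boolean fields. One is fromMigrate, mentioned earlier; the other is b. A close look at the oplog turns up documents with "b" : true. The flag appears on delete and update entries; according to the comments in the oplog source, it carries the justOne flag of a delete or the upsert flag of an update.

An example update entry: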

{
    "ts" : {
        "t" : 1354923335000,
        "i" : 2
    },
    "h" : NumberLong("563747339476084113"),
    "op" : "u",
    "ns" : "msg.device",
    "o2" : {
        "_id" : ObjectId("509fa1207386d978864c7833")
    },
    "o" : {
        "$set" : {
            "flag" : "1",
            "pin" : "5126d5b23c303",
            "device" : "ceb27de6b9dd8f045130f046a7662630",
            "modified" : ISODate("2012-12-07T23:36:24.628Z")
        }
    }
}
With the detailed structure of the oplog understood, we can write a program on the same principle to synchronize data. See mongosync for details.
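The idea of replaying the oplog can be sketched against an in-memory store. This is not how mongosync works internally; it is a minimal illustration under stated assumptions: entries are plain objects applied in ts order, _id values can be serialized with JSON.stringify, only $set is supported as an update modifier, and the names keyOf and applyEntry are hypothetical.

```javascript
// In-memory replay of oplog entries.
// `store` maps a namespace to a Map of (serialized _id -> document).
function keyOf(id) {
  return JSON.stringify(id);
}

function applyEntry(store, entry) {
  if (entry.op === "n") return; // no-op: informational only
  if (!store.has(entry.ns)) store.set(entry.ns, new Map());
  const coll = store.get(entry.ns);

  switch (entry.op) {
    case "i": // o is the document to insert
      coll.set(keyOf(entry.o._id), { ...entry.o });
      break;
    case "u": { // o2 is the criteria, o the modification
      const key = keyOf(entry.o2._id);
      const modifiers = Object.keys(entry.o).filter((k) => k.startsWith("$"));
      if (modifiers.length === 0) {
        coll.set(key, { ...entry.o }); // full-document replacement
      } else if (modifiers.length === 1 && modifiers[0] === "$set") {
        const current = coll.get(key);
        if (current) Object.assign(current, entry.o.$set); // skipped if the document is missing
      } else {
        throw new Error(`unsupported modifiers: ${modifiers.join(", ")}`);
      }
      break;
    }
    case "d": // o is the criteria of the delete
      coll.delete(keyOf(entry.o._id));
      break;
    default:
      throw new Error(`unsupported op: ${entry.op}`);
  }
}

const store = new Map();
const entries = [
  { op: "i", ns: "msg.device", o: { _id: 1, flag: "0" } },
  { op: "u", ns: "msg.device", o2: { _id: 1 }, o: { $set: { flag: "1" } } },
  { op: "i", ns: "msg.device", o: { _id: 2, flag: "0" } },
  { op: "d", ns: "msg.device", o: { _id: 2 } },
  { op: "n", ns: "", o: { msg: "Reconfig set", version: 6 } },
];
for (const entry of entries) applyEntry(store, entry);
console.log([...store.get("msg.device").values()]); // [ { _id: 1, flag: '1' } ]
```

A real synchronizer would read the entries from local.oplog.rs on the source and issue the corresponding writes against the target. It would also have to handle the "c" and "db" entries, which this sketch rejects.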