
A MongoDB Schema Analyser Built on pyMongo and wxPython

2019-01-26 22:09

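MongoDB is a document-oriented NoSQL database, so the structure of its collections is usually not as fixed and uniform as tables in a relational database. Documents in the same collection can differ widely in their fields, especially in databases whose data model was never properly planned or standardized.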

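When you take over a new project that stores and computes on MongoDB, and there is no ORM or similar mapping abstraction, understanding the structure of its databases and the Schema of its collections is very important. MongoBooster (named NoSQLBooster for MongoDB since MongoDB 4.0), a visual database client, is a convenient and efficient tool. It integrates the mongo Shell and offers all kinds of database operations, including CRUD and database/collection status queries. It is very powerful and naturally includes Schema analysis. Unfortunately, that feature is open only to registered users, and unregistered users can only try it out on the test database...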

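The following takes Schema analysis of the test collection in the test database on the local replica set mongodb://localhost:27017,localhost:27019,localhost:27020 as an example. The figure below shows MongoBooster's Schema analysis result.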

Here, out of respect for freedom and openness, I found two alternative tools that provide the MongoDB Schema Analyser functionality.

Variety.js

> https://github.com/variety/variety

A command-line tool for Schema analysis.
Command-line invocation:

mongo [mongoURI]  --eval " var collection = 'test'" variety.js

It is based on JavaScript and supports many parameters, but do not expect it to run fast, and on large collections it often crashes.

pyMonSchema

> https://github.com/HanseyLee/pyMonSchema

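pyMonSchema is a MongoDB Schema Analyser GUI tool built on pyMongo and wxPython. You connect to and switch between databases and collections from the interface. It supports a custom query document, sort order, and result limit, a list of key names to ignore plus regular expressions for ignored key names, and analysis of nested fields. The Schema analysis runs on MongoDB's MapReduce and is much faster and more stable than Variety.js.
Usage notes for the custom fields: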

- Query -> MongoDB query document used to filter the input to analyse, e.g. {"keyName": {"$in": ["key1", "key2"]}} or {"keyName": {"$exists": True}}. (Note: pyMonSchema uses "eval()" to deserialize the query string, so write bool values as 'True'/'False'.)
- Order -> Positive/Negative, used in the sort document: order=Positive is equivalent to sort("_id", 1), order=Negative to sort("_id", -1).
- Limit -> Int, limit on the number of query results. Empty defaults to 0, which means no limit.
- Omit_keys -> Field names to omit, separated by commas, such as: keyName1, keyName2 .
- Omit_patterns -> Fields matching these regular expression patterns are omitted, separated by commas, such as: ^keyNameHead, keyNameTail$ .
- Embed-keys -> Whether or not to analyse embedded keys (e.g. keyNameParent.keyNameChild1.keyNameChild2).
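To make the Omit_keys, Omit_patterns and Embed-keys options more concrete, here is a minimal sketch of how such filtering could work in plain Python. It is not taken from the pyMonSchema source: the helper names `parse_csv`, `flatten_keys` and `is_omitted` are hypothetical, and matching patterns with `re.search` against the full dotted key is an assumption.

```python
import re


def parse_csv(text):
    """Split a comma-separated option string into trimmed, non-empty items."""
    return [item.strip() for item in text.split(",") if item.strip()]


def flatten_keys(document, embed_keys=True, prefix=""):
    """Yield (dotted_key, value) pairs.

    With embed_keys=True, sub-documents are descended into, so a nested field
    shows up as e.g. "parent.child.leaf" in addition to its ancestors.
    """
    for key, value in document.items():
        full_key = prefix + key
        yield full_key, value
        if embed_keys and isinstance(value, dict):
            yield from flatten_keys(value, embed_keys, full_key + ".")


def is_omitted(key, omit_keys, omit_patterns):
    """Return True if key is listed in omit_keys or matches any pattern."""
    if key in omit_keys:
        return True
    # Assumption: patterns are applied with re.search to the dotted key.
    return any(re.search(pattern, key) for pattern in omit_patterns)


omit_keys = parse_csv("keyName1, keyName2")
omit_patterns = parse_csv("^keyNameHead, keyNameTail$")

doc = {"keyName1": 1, "keyNameHeadX": 2, "parent": {"child": {"leaf": 3}}}
kept = [
    key
    for key, _ in flatten_keys(doc, embed_keys=True)
    if not is_omitted(key, omit_keys, omit_patterns)
]
kept_flat = [
    key
    for key, _ in flatten_keys(doc, embed_keys=False)
    if not is_omitted(key, omit_keys, omit_patterns)
]
print(kept)       # nested keys are reported with dotted names
print(kept_flat)  # only top-level keys when embed_keys is off
```

One consequence of splitting on commas is that a pattern containing a comma (such as `a{1,3}`) cannot be written in this form.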

Note that the Query Document is entered as a string, and the program uses Python's eval function to convert it into a Python object, for example: {"keyName": {"$in": ["key1", "key2"]}}, {"keyName": {"$exists": True}}.
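Since the query is typed in as a Python-literal string, the sketch below shows what that conversion looks like. As stated above, pyMonSchema itself uses `eval()`. This example deliberately swaps in `ast.literal_eval` from the standard library, which accepts only literals and therefore refuses arbitrary expressions. The function name `parse_query` is hypothetical.

```python
import ast


def parse_query(text):
    """Turn a query string written as a Python dict literal into a dict.

    An empty string is treated as "no filter". Anything that is not a plain
    literal (function calls, names such as JSON-style `true`) raises
    ValueError, and malformed input raises SyntaxError.
    """
    text = text.strip()
    if not text:
        return {}
    query = ast.literal_eval(text)
    if not isinstance(query, dict):
        raise ValueError("query must be a dict literal")
    return query


query_in = parse_query('{"keyName": {"$in": ["key1", "key2"]}}')
query_exists = parse_query('{"keyName": {"$exists": True}}')
print(query_in)
print(query_exists)

# JSON-style lowercase booleans are not Python literals and are rejected.
try:
    parse_query('{"keyName": {"$exists": true}}')
except ValueError as exc:
    print("rejected:", exc)
```

The resulting dict has the shape that pyMongo's `find()` expects as its filter argument. Query values that are not Python literals, such as `ObjectId(...)`, would not pass `literal_eval`, which is one practical reason a tool might choose `eval()` instead.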

For fields of Number type, pyMonSchema further infers whether the value is Int32 or Double (by default, MongoDB also treats integers beyond the Int32 range as Double).
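The Int32/Double rule described above can be sketched as a small function. This is an illustration of the rule as stated, not the tool's actual code (pyMonSchema does its analysis inside MapReduce). Treating every whole-number value within range as Int32, including a float such as `7.0`, is an assumption, and `infer_number_type` is a hypothetical name.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def infer_number_type(value):
    """Classify a number as "Int32" or "Double".

    Whole numbers inside the signed 32-bit range count as Int32; everything
    else (fractions, out-of-range integers, NaN, infinity) counts as Double.
    """
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError("expected an int or float")
    is_whole = isinstance(value, int) or value.is_integer()
    if is_whole and INT32_MIN <= value <= INT32_MAX:
        return "Int32"
    return "Double"


for sample in (42, 2**31 - 1, 2**31, 3.14):
    print(sample, infer_number_type(sample))
```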

In addition, the analysis result can be saved to a JSON file in the following format:

[
  {
    "key": "_id",
    "total_occurrence": 15.0,
    "statics": [
      {
        "type": "ObjectId",
        "occurrence": 15.0,
        "percent": 100.0
      }
    ]
  },
  {
    "key": "hello",
    "total_occurrence": 9.0,
    "statics": [
      {
        "type": "Int32",
        "occurrence": 1.0,
        "percent": 6.666666666666667
      },
      {
        "type": "String",
        "occurrence": 8.0,
        "percent": 53.333333333333336
      }
    ]
  },
  ...
]
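To show how the numbers in this format relate to each other, here is a pure-Python sketch that builds the same structure from a list of documents. It is not the MapReduce job pyMonSchema runs. The `analyse` function and the `TYPE_NAMES` mapping are hypothetical, only top-level keys are counted, and the sample documents are invented to reproduce the "hello" entry above (1 Int32 and 8 String values across 15 documents).

```python
import json
from collections import Counter, defaultdict

# Assumed mapping from Python types to the type labels used in the report.
TYPE_NAMES = {
    str: "String",
    bool: "Boolean",
    int: "Int32",
    float: "Double",
    dict: "Object",
    list: "Array",
    type(None): "Null",
}


def analyse(documents):
    """Count, per top-level key, how often each value type occurs."""
    documents = list(documents)
    total_docs = len(documents)
    counts = defaultdict(Counter)
    for doc in documents:
        for key, value in doc.items():
            label = TYPE_NAMES.get(type(value), type(value).__name__)
            counts[key][label] += 1

    report = []
    for key, by_type in counts.items():
        report.append({
            "key": key,
            "total_occurrence": float(sum(by_type.values())),
            "statics": [
                {
                    "type": label,
                    "occurrence": float(n),
                    # percent is relative to all documents, not to the key
                    "percent": 100.0 * n / total_docs,
                }
                for label, n in by_type.items()
            ],
        })
    return report


# 15 documents: one integer "hello", eight string "hello", six without it.
docs = [{"hello": 1}] + [{"hello": "world"}] * 8 + [{} for _ in range(6)]
report = analyse(docs)
print(json.dumps(report, indent=2))
```

The percentages are computed against the total number of documents, which is why the two "hello" entries in the sample output add up to 60% (9 of 15) rather than 100%.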

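For batch Schema analysis across multiple databases/collections, the mongoDBM.DBManager class in pyMonSchema provides full support, and the work can be done with multiple processes or threads. See https://blog.csdn.net/fzlulee/article/details/85944967 , or the source on GitHub at https://github.com/HanseyLee/pyMonSchema .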
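As a rough illustration of the batch idea, the sketch below fans a list of (database, collection) pairs out to a thread pool using the standard library. It does not use mongoDBM.DBManager, whose interface is not shown in this post. `analyse_many` is a hypothetical helper, and `fake_analyse` is a stand-in for whatever function performs the real analysis of one collection.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def analyse_many(targets, analyse_one, max_workers=4):
    """Run analyse_one(db_name, coll_name) for every target in parallel.

    Returns a dict keyed by (db_name, coll_name). A failed analysis stores
    the exception instead of a result, so one bad collection does not abort
    the whole batch.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(analyse_one, db_name, coll_name): (db_name, coll_name)
            for db_name, coll_name in targets
        }
        for future in as_completed(futures):
            target = futures[future]
            try:
                results[target] = future.result()
            except Exception as exc:
                results[target] = exc
    return results


def fake_analyse(db_name, coll_name):
    """Placeholder for a real per-collection Schema analysis."""
    if coll_name == "broken":
        raise RuntimeError("analysis failed")
    return [{"key": "_id", "db": db_name, "collection": coll_name}]


targets = [("test", "test"), ("test", "users"), ("logs", "broken")]
results = analyse_many(targets, fake_analyse)
for target in targets:
    print(target, results[target])
```

Threads suit this case when most of the time is spent waiting on the database server. If the per-collection work were CPU-heavy on the client side, `ProcessPoolExecutor` from the same module offers the same `submit` interface.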

Note: the content above is synchronized from the author's blog of the same name, https://blog.csdn.net/fzlulee/article/details/86651664 .
