Improve MongoDB Performance Using Projection

2020-08-21 14:03

This article documents my findings and analysis on how much projection improves performance in MongoDB. By the end of it, you will know whether leveraging projection improves MongoDB query performance.

Without further ado, let’s start.

Problem Statement

This article was inspired by my day job, where I used projection when retrieving data from MongoDB. According to MongoDB’s official documentation, a projection is “a document given to a query that specifies which fields MongoDB returns in the result set.”

It’s like ordering a Big Mac at McDonald’s: we can order it à la carte instead of the full set that comes with a drink and fries.

That made me wonder: how much does query performance improve when projection is used? Here are the primary objectives I wanted to achieve in this research:

Primary objectives

  • Discover whether performance improves when projection is used in a MongoDB query.
  • Discover the best scenario for using projection in a MongoDB query.

Solution Analysis

I always start by working out what I need in order to carry out the research. These are the items I needed:

  • A collection with more than 500K documents, so that I can measure the difference in query time with and without projection.
  • A sub-document schema. I suspected that documents containing sub-documents would significantly increase query time, so let’s prepare this for the experiment as well.

Refer to the screenshot below for the outcome of the data preparation. Check out this article on how I generate millions of dummy records for performance optimization.

From this screenshot, we can see that we have generated 500K documents with the following fields:

  • booking_no - Booking number for the flight
  • origin - Departure city
  • destination - Arrival city
  • persons - An array of people, each consisting of first_name, last_name, and dob fields
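
For illustration, a single document with the fields listed above might look like the sketch below. All values are invented, and storing dob as an ISO date string is an assumption; the actual generated data may use different formats.

```javascript
// Hypothetical sample booking; every value here is made up for illustration.
const sampleBooking = {
  booking_no: "BK-000001",      // booking number for the flight (format assumed)
  origin: "Port Example",       // departure city (invented)
  destination: "Gerlachmouth",  // arrival city (the one queried in the experiments)
  persons: [                    // array of sub-documents, one per passenger
    { first_name: "Jane", last_name: "Doe", dob: "1990-01-15" },
    { first_name: "John", last_name: "Doe", dob: "1988-07-02" },
  ],
};

// Print the top-level field names of the document.
console.log(Object.keys(sampleBooking));
```

The persons array is the sub-document part of the schema mentioned earlier.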

Performance Experiment

Before we start any experiment, let’s make sure the setup is correct. No indexes have been created on the collection yet, except the default index on the _id field.

The experiments I would like to perform here are:

  • Experiment 1: Will query performance increase if I project fewer fields?
  • Experiment 2: If the answer to Experiment 1 is no, in which other scenarios does projection improve query performance?

Experiment 1: Will Query Performance Increase If I Project Fewer Fields?

Unfortunately, the answer is no. However, performance will improve if all the returned fields are indexed, and we will talk about this in the next section.

In this experiment, we’re going to retrieve all the flight bookings in which the destination is “Gerlachmouth”. Out of 500K bookings, there are 93 bookings where the destination is “Gerlachmouth”. Let’s examine how long it took to return these 93 documents.

I ran the performance analysis using the mongo shell’s explain() method, which reports the time spent on the query and the query strategy that was used.

The above screenshot shows the result when retrieving without projection: the query took 461ms to complete. The screenshot below shows the result when we leverage projection: the query took 505ms to complete.
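
A comparison like this could be run in the shell roughly as sketched below. This is a minimal sketch, not the exact commands behind the screenshots: the collection name bookings and the projected field list are assumptions, and the helper names are hypothetical. It relies only on find() and explain("executionStats"), and reads the standard executionStats fields.

```javascript
// Pull the headline numbers out of an explain("executionStats") result.
function summarizeExplain(explainResult) {
  const stats = explainResult.executionStats;
  return {
    nReturned: stats.nReturned,
    executionTimeMillis: stats.executionTimeMillis,
    totalKeysExamined: stats.totalKeysExamined,
    totalDocsExamined: stats.totalDocsExamined,
  };
}

// Run a find with or without a projection and summarize its execution stats.
// `collection` is expected to behave like a shell collection such as db.bookings.
function explainFind(collection, filter, projection) {
  const cursor = projection
    ? collection.find(filter, projection)
    : collection.find(filter);
  return summarizeExplain(cursor.explain("executionStats"));
}

// In the shell (collection name `bookings` and the projection are assumptions):
//   explainFind(db.bookings, { destination: "Gerlachmouth" });
//   explainFind(db.bookings, { destination: "Gerlachmouth" },
//               { booking_no: 1, origin: 1, destination: 1 });
```

Without an index, both calls should report a full collection scan, so totalDocsExamined would equal the number of documents in the collection regardless of the projection.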

Thus, performance did not improve. Instead, the query took slightly longer (about 10% more time) when we used projection.

The conclusion for Experiment 1: performance did not improve when projection was added to the query. 👎👎

Experiment 2: If the Experiment 1 Result Is No, Find Other Scenarios Where Projection Improves Query Performance

Since my first hypothesis was wrong, I did some research and revisited the performance course offered by MongoDB University. The course is free, so check it out if you are interested in learning about MongoDB performance.

That is where I discovered the covered query. According to MongoDB’s official documentation, a covered query is a “query that can be satisfied entirely using an index and does not have to examine any documents”.

We can use a cooking metaphor to understand the covered query. Imagine that you’re cooking a meal and all the ingredients are ready inside your fridge. Everything is covered, and you just have to cook it.

Before we create any indexes for the database, let’s start by asking: which fields do we want to return to the application? Consider the following scenario:

  • Admin would like to know all the flight bookings to a specific destination. The information Admin needs for each booking is its booking_no, origin, and destination.

Given the scenario above, let’s start by creating indexes. We can create two indexes.

  • Destination: Create an index on the destination field only.
  • Destination, Origin, and Booking No.: Create a compound index on the destination, origin, and booking_no fields, in that order.

Refer to the command below for how to create the indexes.
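
The two indexes described above could be created along the following lines. This is a sketch under assumptions: the collection name bookings, the ascending sort order (1), and the helper name createBookingIndexes are not taken from the article. The only shell call it uses is createIndex().

```javascript
// Key specifications for the two indexes described in the list above.
// Ascending order (1) is an assumption; the article does not state the direction.
const destinationIndex = { destination: 1 };
const compoundIndex = { destination: 1, origin: 1, booking_no: 1 };

// Create both indexes on a shell-style collection (for example db.bookings).
// Returns whatever createIndex() returns for each specification.
function createBookingIndexes(collection) {
  return [destinationIndex, compoundIndex].map((keys) =>
    collection.createIndex(keys)
  );
}

// In the shell (collection name `bookings` is an assumption):
//   createBookingIndexes(db.bookings);
```

The key order in the compound index matters: destination comes first because it is the field used in the query filter.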

Query without projection

First, let’s query the bookings where the destination is “Gerlachmouth”. The screenshot below shows the execution time for the query. As you can see, the total execution time dropped to 5ms, roughly 90 times faster than the 461ms query without indexes.

You might be satisfied with this performance, but this is not the end of the optimization. With a covered query, we can make the query roughly 230 times faster than the version without indexes.

Query with Projection (Covered Query)

Using a covered query means that every field we filter on and return is part of the index.

Using the above command, we were able to bring the query down to 2ms, about 60% less time than the 5ms indexed query without projection.
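
The ratios quoted in this article can be rechecked from the reported timings (461ms and 505ms without indexes, 5ms with an index, 2ms with a covered query). The helper names below are hypothetical; the numbers are the ones stated in the text.

```javascript
// Execution times reported in the article, in milliseconds.
const timingsMs = {
  noIndex: 461,            // no index, no projection
  noIndexProjected: 505,   // no index, with projection
  indexed: 5,              // index, no projection
  covered: 2,              // index plus projection (covered query)
};

// How many times faster `after` is compared with `before`.
function speedup(before, after) {
  return before / after;
}

// Percentage change in elapsed time going from `before` to `after`
// (negative means less time, positive means more time).
function percentChange(before, after) {
  return ((after - before) * 100) / before;
}

console.log(speedup(timingsMs.noIndex, timingsMs.indexed).toFixed(1));  // 92.2
console.log(speedup(timingsMs.noIndex, timingsMs.covered).toFixed(1));  // 230.5
console.log(percentChange(timingsMs.indexed, timingsMs.covered));       // -60
console.log(percentChange(timingsMs.noIndex, timingsMs.noIndexProjected).toFixed(1)); // 9.5
```

These are single measurements, so small differences such as the 461ms versus 505ms gap may partly reflect run-to-run variation.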

Aside from improving the execution time, we also improved the query strategy. From the screenshot, we can see that no documents were examined, meaning the index itself is already enough to satisfy the query. This improves overall query performance, as we don’t have to fetch the documents.
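
The covered query described above can be sketched as follows. One detail is easy to miss: MongoDB returns _id by default, so the projection has to exclude it explicitly (_id: 0) unless _id is part of the index, otherwise documents must still be fetched. The checker function is a hypothetical, simplified illustration of that rule and not how the query planner works; it ignores cases such as array (multikey) fields and query operators. The collection name bookings is an assumption.

```javascript
// Compound index from the scenario above (ascending order is an assumption).
const compoundIndex = { destination: 1, origin: 1, booking_no: 1 };

// Filter and projection for the covered query. _id is excluded on purpose.
const filter = { destination: "Gerlachmouth" };
const projection = { _id: 0, booking_no: 1, origin: 1, destination: 1 };

// Simplified check: can the index alone answer this filter and projection?
function isCoveredBy(indexKeys, queryFilter, queryProjection) {
  const indexed = new Set(Object.keys(indexKeys));
  // Every filtered field must be in the index.
  const filterCovered = Object.keys(queryFilter).every((f) => indexed.has(f));
  // Every explicitly included field must be in the index.
  const included = Object.keys(queryProjection).filter(
    (f) => queryProjection[f] === 1
  );
  const projectionCovered =
    included.length > 0 && included.every((f) => indexed.has(f));
  // _id comes back by default, so it must be excluded unless it is indexed.
  const idHandled = queryProjection._id === 0 || indexed.has("_id");
  return filterCovered && projectionCovered && idHandled;
}

console.log(isCoveredBy(compoundIndex, filter, projection)); // true

// In the shell (collection name `bookings` is an assumption):
//   db.bookings.find(filter, projection).explain("executionStats");
```

In the explain output for such a query, a totalDocsExamined value of 0 is the sign that the index alone satisfied it, which matches what the article reports.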

Conclusion

Here are the key points of this article.

  • Projecting fewer fields will not improve query performance unless all the returned fields can be satisfied using an index.
  • An index improves performance, and a covered query can improve it further.
  • The covered query took about 60% less time than the regular indexed query (2ms versus 5ms).

Thank you for reading. See you in the next article.

Original article: https://medium.com/better-programming/improve-mongodb-performance-using-projection-c08c38334269
