Clustered and Overlapped Bar Charts

2020-08-06 10:20

Why & How

1.- Clustered Bar Charts

AKA: grouped, side-by-side, multiset [bar charts, bar graphs, column charts]

Why: Clustered Bar Charts (CBC) display numerical information about the relative proportions between a main category and its subgroups, which belong to a second categorical variable. Like Stacked Bar Graphs, they should be used for comparisons and proportions, but with the emphasis on composition. Unlike Stacked Bar Graphs, the elements that make up the subcategories may be only loosely related. CBC are particularly effective when a whole is divided into multiple parts. They support comparisons across subcategories, whereas Stacked Bar Graphs support comparisons within subcategories.

They can show how subgroups change over time, but the chart becomes hard to read as the time span lengthens and the number of subcategories grows. They should not be used for Relationship or Distribution analysis.

How: as usual with bar charts, CBC are two-dimensional with two axes: one axis shows categories, the other shows numerical values. The category axis has no scale, which highlights that it refers to discrete (mutually exclusive) groups. The numerical axis must have a scale with its corresponding units of measurement.

CBC are represented by means of sets of rectangular bars that can be oriented horizontally or vertically. Each principal category is divided into a cluster of bars representing subcategories of the second categorical variable. The quantity of each subcategory is shown by the length or height of those rectangular bars that are located side by side forming a cluster, with gaps between clusters slightly wider than a single standard bar.

Fig. 1: schematic diagram of a clustered bar chart. The figure was developed with Matplotlib.
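
The cluster geometry described above can be sketched in Python, the language Matplotlib is used from. This is a minimal illustration and not the code behind Fig. 1: the helper names `cluster_positions` and `plot_clustered`, the default `bar_width` and the `gap_factor` of 1.2 are all assumptions chosen for the example.

```python
def cluster_positions(n_clusters, n_series, bar_width=0.25, gap_factor=1.2):
    """Return the x-centre of every bar as positions[series][cluster].

    gap_factor is an illustrative assumption: the empty space between two
    clusters is gap_factor * bar_width, i.e. slightly wider than one bar.
    """
    cluster_span = n_series * bar_width            # width taken by one cluster
    step = cluster_span + gap_factor * bar_width   # distance between cluster starts
    return [
        [c * step + (s + 0.5) * bar_width for c in range(n_clusters)]
        for s in range(n_series)
    ]


def plot_clustered(ax, categories, series, bar_width=0.25):
    """Draw one bar per (category, subcategory) pair on a Matplotlib Axes.

    series maps each subcategory name to its values, one value per category.
    """
    positions = cluster_positions(len(categories), len(series), bar_width)
    for xs, (label, values) in zip(positions, series.items()):
        # One ax.bar call per subcategory gives it one colour in every cluster.
        ax.bar(xs, values, width=bar_width, label=label)
    # Put each category label under the middle of its cluster.
    centres = [sum(col) / len(col) for col in zip(*positions)]
    ax.set_xticks(centres)
    ax.set_xticklabels(categories)
    ax.legend()
```

To draw the chart, create an axes with `fig, ax = plt.subplots()` (after `import matplotlib.pyplot as plt`), call `plot_clustered(ax, ["A", "B", "C"], {"x": [3, 5, 2], "y": [4, 1, 6]})` with made-up values such as these, and finish with `plt.show()`.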

Subcategories can be ordinal or nominal, but equivalent subgroups must have the same color in every cluster so as not to confuse the audience. It is essential to use an appropriate color palette, balanced spacing and a layout that facilitates comparison. As bars are heavy visual markers, use gridlines sparingly, and only where they improve the storytelling.

The following figure shows a company's performance in terms of sales, expenses and profits for the 2016–2019 period. It is a vertically oriented clustered bar chart with years as the main category. Sales, expenses and profit for each year are represented as a cluster. The visualization clearly highlights that in 2018, even with the increase in expenses and reduction in sales, profit remained relatively constant.

Fig. 2: economic performance of a fictitious company during the 2016–2019 period. The figure was developed with Matplotlib.

It is interesting to compare the same data represented as a stacked bar chart. As previously indicated, CBC are appropriate when you want to compare across subcategories: sales in 2016 vs. 2017 vs. 2018 vs. 2019; expenses in 2016 vs. 2017 vs. 2018 vs. 2019; profit in 2016 vs. 2017 vs. 2018 vs. 2019. By contrast, the stacked bar chart only allows a good comparison for the segments resting on the baseline (sales), because the expenses and profit segments start from a different baseline in each bar. Also, the height of each full bar (sales + expenses + profit for a given year) has no meaningful interpretation.

Fig. 3: stacked bar graph with the same data as Fig. 2.
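
The baseline problem can be made concrete with a few lines of Python. The sketch below computes where each segment of a stacked bar starts. The numbers are invented for illustration (the article does not publish the data behind Fig. 2), and `stacked_baselines` is a hypothetical helper name.

```python
# Illustrative figures only; these are NOT the values plotted in Fig. 2.
years = [2016, 2017, 2018, 2019]
series = {
    "sales":    [100, 120, 110, 130],
    "expenses": [60, 70, 75, 80],
    "profit":   [40, 50, 35, 50],
}


def stacked_baselines(series):
    """Return, per series, the height at which each of its segments starts."""
    baselines = {}
    running = [0] * len(next(iter(series.values())))
    for name, values in series.items():
        baselines[name] = list(running)
        # The next series is stacked on top of everything drawn so far.
        running = [r + v for r, v in zip(running, values)]
    return baselines


for name, starts in stacked_baselines(series).items():
    print(name, starts)
# sales [0, 0, 0, 0]
# expenses [100, 120, 110, 130]
# profit [160, 190, 185, 210]
```

Only the first series starts at zero in every year; the others start at a different height in each bar, which is why their lengths are hard to compare by eye. In Matplotlib these start heights are what would be passed as the `bottom` argument of `ax.bar`.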

The next figure concerns tertiary education statistics for the European Union (EU-28) in 2017. There were 19.8 million tertiary students that year; women accounted for 54% of that number, although the majority of doctoral students were men. In addition, a quarter of all students were enrolled in business, administration and law studies. The following clustered bar chart shows that women outnumber men in Education, Social Sciences, Arts and Humanities, Health and Welfare, and also in Business, Administration and Law. Men, on the other hand, outnumber women in IT and in Engineering, Manufacturing and Construction (Eurostat, 2020). The chart clearly displays numerical information about the participation of men and women in tertiary education across broad fields of education. It is a horizontally oriented CBC in which the educational fields make up the principal category and gender is the second categorical variable.

Fig. 3: distribution of tertiary education students by field and gender for the European Union during 2017. Source (#1)

The main problem with clustered bar graphs is that they don't clearly show the ratio of the individual parts to the whole. As a result, proportions are not easy to evaluate. Their strength lies in direct comparisons between equivalent subcategories of the second categorical variable.

2.- Overlapped Bar Charts

AKA: Overlay, Overlapping, Superimposed [bar charts, bar graphs, column charts]

Why: Overlapped Bar Charts (OVC) are used to make comparisons between different items or categories. OVC compare only two numerical variables per item or category in a single diagram. The numerical variables must be closely related for the comparison to be meaningful. They are also used to show trends over time on the same premise. They should not be used for Relationship or Distribution analysis.

The idea behind OVC is to contrast the numerical values of two variables by overlapping one on the other, which conveys the message (storytelling) with greater expository power. In that sense, they are better than Clustered Bar Graphs because the comparison is more intuitive. This kind of chart shows surpluses and shortages with remarkable precision, particularly when appropriate gridlines are added. They are frequently used to show the level of progress against an objective or a benchmark.

Fig. 4: schematic diagram of an overlapped bar chart. The figure was developed with Matplotlib.

How: it is a two-dimensional graph with two axes, like every standard bar chart, with rectangular bars that can be oriented horizontally or vertically. One axis shows categories; the other shows the numerical values of two variables. Bars representing the same category share the same baseline and the same position on the category axis. Both numerical variables must be closely related and share the same numerical scale. The bars have a different width for each numerical variable, with the narrower one drawn in front for legibility. The drawback is that a given bar is the shorter one in some categories and the longer one in others.

Fig. 5: actual versus budgeted expenses for a fictitious company during the 2012–2019 period. The figure was developed with Matplotlib.
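
A chart of this kind might be drawn along the following lines. This is a hedged sketch rather than the code that produced Fig. 4 or Fig. 5: the function names `plot_overlapped` and `gaps`, the two bar widths and the default labels are assumptions made for the example.

```python
def plot_overlapped(ax, categories, reference, measured,
                    wide=0.6, narrow=0.35, labels=("Budgeted", "Actual")):
    """Draw two bars per category at the same position and on the same baseline.

    The wide bar (reference) sits behind and the narrow bar (measured) is
    drawn in front, so both stay visible whichever of the two is taller.
    """
    x = list(range(len(categories)))
    ax.bar(x, reference, width=wide, label=labels[0], zorder=2)
    ax.bar(x, measured, width=narrow, label=labels[1], zorder=3)
    ax.set_xticks(x)
    ax.set_xticklabels(categories)
    ax.set_axisbelow(True)   # keep gridlines behind the bars
    ax.yaxis.grid(True)      # horizontal gridlines help read the gaps
    ax.legend()


def gaps(reference, measured):
    """Per-category difference: positive means measured exceeds reference."""
    return [m - r for r, m in zip(reference, measured)]


# Made-up numbers, purely to show the sign convention.
print(gaps([10, 12, 15], [9, 12, 18]))   # [-1, 0, 3]
```

With Matplotlib installed, `fig, ax = plt.subplots()` followed by `plot_overlapped(ax, ["2012", "2013", "2014"], [10, 12, 15], [9, 12, 18])` and `plt.show()` would render the chart. Because both bars share the same x positions, the only things separating them visually are width, color and drawing order.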

Some visualization tools allow several numerical variables (multiple data series) to overlap partially, so that the rectangles of each successive variable are partly hidden by the rectangles in front of them. Conceptually, these charts are clustered (grouped) bar charts in which the rectangles of the different data sets begin to overlap instead of sitting side by side. OVC are the extreme case, in which one rectangle overlaps another completely (100%). Audiences will undoubtedly find it very difficult to compare three or more partially overlapping bars. Their use may be justified when data for multiple subcategories must be compared over very long periods of time in a single diagram.

Fig. 6: partially overlapped bar charts. Source: Peltier Tech Blog (#2)
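
The continuum from clustered to fully overlapped bars can be expressed with a single overlap fraction. The helper below is a sketch under that assumption; `series_offsets` and its `overlap` parameter are names invented here and do not come from any charting library.

```python
def series_offsets(n_series, bar_width, overlap):
    """Offset of each series' bar centre from its category position.

    overlap is the fraction of a bar covered by its neighbour:
    0.0 gives side-by-side (clustered) bars, 1.0 gives fully overlapped bars.
    """
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must be between 0 and 1")
    step = bar_width * (1.0 - overlap)   # distance between neighbouring centres
    middle = (n_series - 1) / 2          # centre the group on the category
    return [(i - middle) * step for i in range(n_series)]


print(series_offsets(2, 0.4, 0.0))   # [-0.2, 0.2]
print(series_offsets(2, 0.4, 1.0))   # [-0.0, 0.0]
```

In a plotting routine these offsets would be added to the category positions before drawing each series, for example with one `ax.bar` call per series in Matplotlib.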

To sum up, use a clustered bar graph when you want to make direct comparisons across parts of a whole. Overlapped bar graphs, on the other hand, allow excellent comparisons between two closely related numerical variables.

As usual with standard bar graphs, I recommend the following tips and warnings for both types of charts:

Start the baseline at 0: if the bars are truncated, the actual value is not properly reflected;

Vertical orientation (column charts) is recommended when chronological data (time series, temporal data) or negative numerical values are present (Fig. 2 & Fig. 5). Horizontal orientation, on the other hand, is preferable when graphing numerous categories, particularly with very long labels (Fig. 3);

Partially overlapped bar charts only read well if the longer bars are always behind the shorter ones;

Avoid all 3D effects. Although they are aesthetically pleasing, they go against the rules of sound data visualization.
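
Two of these tips lend themselves to a quick numerical check. The snippet below is only an illustration with made-up values; `drawn_ratio` and `longer_always_behind` are hypothetical helper names.

```python
def drawn_ratio(a, b, baseline=0.0):
    """Ratio between the drawn lengths of two bars when the axis starts at baseline."""
    return (a - baseline) / (b - baseline)


def longer_always_behind(back, front):
    """True if no front bar is longer than the bar drawn behind it."""
    return all(b >= f for b, f in zip(back, front))


# Values of 100 and 90 differ by about 11%.
print(round(drawn_ratio(100, 90), 2))               # 1.11
# With the axis truncated at 80, the first bar is drawn twice as long.
print(drawn_ratio(100, 90, baseline=80))            # 2.0
print(longer_always_behind([5, 6, 7], [3, 6, 2]))   # True
print(longer_always_behind([5, 2, 7], [3, 6, 2]))   # False
```

The first pair of results shows how a truncated baseline exaggerates a modest difference; the second pair is a simple pre-check for the partial-overlap tip.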

If you find this article of interest, please read my previous article:

Stacked Bar Graphs, Why & How, Storytelling & Warnings

#1: https://ec.europa.eu/eurostat/statistics-explained/index.php/Tertiary_education_statistics#Fields_of_education

#2: Peltier Tech Blog, https://peltiertech.com/stacked-vs-clustered/

Translated from: https://towardsdatascience.com/clustered-overlapped-bar-charts-94f1db93778e
