Python Data Visualization

2019-08-18 21:30

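Introduction
1 Drawing a Simple Line Chart
1.1 Changing the Label Text and Line Thickness
1.2 Drawing and Styling a Scatter Plot with scatter()
1.3 Drawing a Series of Points with scatter()
1.4 Calculating Data Automatically
1.5 Using a Colormap

2 Random Walks

2.1 Redrawing the Start and End Points
2.2 Hiding the Axes

3 Simulating Dice Rolls with Pygal

3.1 Creating the Die Class
3.2 Rolling the Die
3.3 Analyzing the Results
3.4 Drawing a Histogram
3.5 Rolling Two Dice Together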

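Introduction

Data visualization means exploring data through visual representations. It is closely related to data mining, which uses code to explore the patterns and connections in a data set.

In many fields, such as genetics research, weather research, and political and economic analysis, Python is widely used for data-intensive work. Data scientists have written a range of visualization and analysis tools in Python. One of the most popular is matplotlib, a mathematical plotting library that can draw many kinds of charts.

1 Drawing a Simple Line Chart

Create a file named test.py and add the following code:

Listing 2-1: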

import matplotlib.pyplot as plt

def test():
    x_value = [1, 2, 3, 4, 5]
    squares = [1, 4, 9, 16, 25]    # test data
    plt.plot(x_value, squares)     # draw the line
    # Show the figure. To save it to a file instead, use:
    #     plt.savefig('test.png', bbox_inches='tight')
    # where bbox_inches='tight' trims the extra whitespace around the plot.
    plt.show()

if __name__ == '__main__':    # entry point
    test()

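The result is shown below:

Figure 2-1 A simple line chart drawn with matplotlib

Note: make sure the matplotlib library is installed and that you are running Python 3.x or later.

1.1 Changing the Label Text and Line Thickness

Figure 2-1 shows that the numbers are increasing, but can we change the size of the tick labels, add a title, or make the line thicker?
Overwrite test.py with the following code:

Listing 2-2: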

import matplotlib.pyplot as plt

def test():
    x_value = [1, 2, 3, 4, 5]
    squares = [1, 4, 9, 16, 25]
    plt.plot(x_value, squares, linewidth=5, color='red')    # set the line width and color

    plt.title("Square Numbers", fontsize=24)    # chart title and its font size
    plt.xlabel("Value", fontsize=14)    # x-axis label and its font size
    plt.ylabel("Square of Value", fontsize=14)    # y-axis label and its font size

    plt.tick_params(axis='both', labelsize=14)    # font size of the tick labels on both axes
    # plt.legend(['X'])    # uncomment to add a legend entry for the line
    plt.show()

if __name__ == '__main__':
    test()

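The result is shown below:

Figure 2-2 A chart that is easier to read

1.2 Drawing and Styling a Scatter Plot with scatter()

Sometimes you need to draw a scatter plot and style individual data points. For example, you might show small values in one color and large values in another. When plotting a large data set, you can also give every point the same style and then redraw selected points with different style options to make them stand out.
To draw a single point, use scatter(). Overwrite test.py with the following code:

Listing 2-3: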

import matplotlib.pyplot as plt

def test():
    # s sets the point size and color the point color;
    # an optional marker argument sets the point shape
    plt.scatter(2, 4, s=400, color='yellow')

    plt.title("Square Numbers", fontsize=24)    # chart title and its font size
    plt.xlabel("Value", fontsize=14)    # x-axis label and its font size
    plt.ylabel("Square of Value", fontsize=14)    # y-axis label and its font size

    plt.tick_params(axis='both', labelsize=14)    # font size of the tick labels on both axes
    plt.show()

if __name__ == '__main__':
    test()

The result is shown below:

Figure 2-3 Drawing a single point

1.3 Drawing a Series of Points with scatter()

Overwrite test.py with the following code:

Listing 2-4:

import matplotlib.pyplot as plt

def test():
    x_values = [1, 2, 3, 4, 5]
    squares = [1, 4, 9, 16, 25]
    plt.scatter(x_values, squares, s=50, color='green')    # set the point size and color

    plt.title("Square Numbers", fontsize=24)    # chart title and its font size
    plt.xlabel("Value", fontsize=14)    # x-axis label and its font size
    plt.ylabel("Square of Value", fontsize=14)    # y-axis label and its font size

    plt.tick_params(axis='both', labelsize=14)    # font size of the tick labels on both axes
    plt.show()

if __name__ == '__main__':
    test()

The result is shown below:

Figure 2-4 Drawing a series of points
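Section 1.2 mentioned showing small values in one color and large values in another. One way to approach that is to split the data into two groups before plotting and give each group its own scatter() call. The sketch below shows only the splitting step in plain Python. The helper name split_by_threshold and the cut-off of 10 are assumptions made for this illustration, not part of the original listings.

```python
def split_by_threshold(xs, ys, threshold):
    """Split (x, y) pairs into two groups by comparing y with threshold."""
    small = [(x, y) for x, y in zip(xs, ys) if y <= threshold]
    large = [(x, y) for x, y in zip(xs, ys) if y > threshold]
    return small, large

x_values = [1, 2, 3, 4, 5]
squares = [x ** 2 for x in x_values]

# threshold=10 is an arbitrary cut-off chosen for this illustration.
small, large = split_by_threshold(x_values, squares, threshold=10)
print(small)  # [(1, 1), (2, 4), (3, 9)]
print(large)  # [(4, 16), (5, 25)]

# Each group could then be drawn with its own style, for example:
# plt.scatter(*zip(*small), s=50, color='green')
# plt.scatter(*zip(*large), s=50, color='red')
```

The commented-out scatter() calls unpack each group back into separate x and y sequences, which is the form scatter() expects.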

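1.4 Calculating Data Automatically

Typing lists by hand is inefficient, especially when there are many points to plot, so the data is usually computed in code. Overwrite test.py with the following code:

Listing 2-5: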

import matplotlib.pyplot as plt

def test():
    x_values = list(range(1, 1001))    # the integers 1 to 1000; a step is optional, e.g. range(1, 1001, 10)
    squares = [x**2 for x in x_values]    # square every value in x_values

    # s sets the point size and c the color.
    # Adding edgecolor='none' removes the black outline around each point;
    # from matplotlib 2.0.0 on, points are drawn without that outline by default.
    # A color can also be given as an RGB tuple whose components range
    # from 0 to 1, for example color=(0, 0.5, 1).
    plt.scatter(x_values, squares, s=10, c='purple')

    plt.title("Square Numbers", fontsize=14)    # chart title and its font size
    plt.xlabel("Value", fontsize=10)    # x-axis label and its font size
    plt.ylabel("Square of Value", fontsize=10)    # y-axis label and its font size

    plt.tick_params(axis='both', labelsize=10)    # font size of the tick labels on both axes
    plt.axis([0, 1100, 0, 1100000])    # axis ranges: [x_min, x_max, y_min, y_max]
    plt.show()

if __name__ == '__main__':
    test()

The result is shown below:

Figure 2-5 A plot of automatically generated data
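To make the data generation in Listing 2-5 concrete, the short sketch below builds the same lists without plotting anything and also shows the effect of the optional step argument mentioned in the listing's comment. The names sparse_x and sparse_squares are introduced here for illustration only.

```python
x_values = list(range(1, 1001))          # 1, 2, ..., 1000
squares = [x ** 2 for x in x_values]     # 1, 4, ..., 1000000

sparse_x = list(range(1, 1001, 10))      # 1, 11, 21, ..., 991
sparse_squares = [x ** 2 for x in sparse_x]

print(len(x_values), squares[-1])        # 1000 1000000
print(len(sparse_x), sparse_x[-1])       # 100 991
```

The largest square is 1,000,000, which is why the axis range in Listing 2-5 extends slightly beyond it, to 1,100,000.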

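1.5 Using a Colormap

A colormap is a series of colors that grades from a starting color to an ending color. In visualization, colormaps are used to emphasize patterns in the data. For example, you might show small values in a light color and large values in a darker one.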
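As a rough illustration of what grading from a starting color to an ending color means, the sketch below normalizes each value to the range 0 to 1 and blends linearly between two RGB colors. This is a simplified model written for this article, not how matplotlib implements its colormaps (built-in maps generally pass through several intermediate colors). The names lerp_color and map_values_to_colors and the two colors are assumptions chosen for the example.

```python
def lerp_color(start, end, t):
    """Linearly blend two RGB colors; t=0 gives start, t=1 gives end."""
    return tuple(s + (e - s) * t for s, e in zip(start, end))

def map_values_to_colors(values, start, end):
    """Normalize values to 0..1, then blend between start and end."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1    # avoid dividing by zero when all values are equal
    return [lerp_color(start, end, (v - lo) / span) for v in values]

WHITE = (1.0, 1.0, 1.0)    # light color for the smallest value (illustrative)
RED = (1.0, 0.0, 0.0)      # strong color for the largest value (illustrative)

colors = map_values_to_colors([1, 4, 9, 16, 25], WHITE, RED)
print(colors[0])     # (1.0, 1.0, 1.0)
print(colors[-1])    # (1.0, 0.0, 0.0)
```

The smallest value receives the starting color, the largest receives the ending color, and everything in between is a proportional mix.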

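The pyplot module includes a set of built-in colormaps. To use one, you need to tell pyplot how to color each point in the data set. Overwrite test.py with the following code:

Listing 2-6: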

import matplotlib.pyplot as plt

def test():
    x_values = list(range(1, 1001))    # the integers 1 to 1000; a step is optional, e.g. range(1, 1001, 10)
    squares = [x**2 for x in x_values]    # square every value in x_values

    # c=squares tells pyplot which data determines each point's color,
    # and cmap tells it which colormap to use
    plt.scatter(x_values, squares, s=10, c=squares, edgecolors='none', cmap=plt.cm.Reds)

    plt.title("Square Numbers", fontsize=14)    # chart title and its font size
    plt.xlabel("Value", fontsize=10)    # x-axis label and its font size
    plt.ylabel("Square of Value", fontsize=10)    # y-axis label and its font size

    plt.tick_params(axis='both', labelsize=10)    # font size of the tick labels on both axes
    plt.axis([0, 1100, 0, 1100000])    # axis ranges: [x_min, x_max, y_min, y_max]
    plt.show()

if __name__ == '__main__':
    test()

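The result is shown below:

Figure 2-6 Using a colormap

2 Random Walks

A random walk is a path produced by steps that are each completely random: there is no clear direction, and the outcome is determined by a series of random decisions. You can picture it as the path a disoriented ant would follow if it headed in a random direction at every step.

Random walks have practical applications in nature, physics, biology, chemistry, and economics. For example, a pollen grain floating on a drop of water moves across the surface because water molecules constantly push against it. The motion of the molecules in the drop is random, so the path the pollen traces on the surface resembles a random walk. Overwrite test.py with the following code:

Listing 2-7: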

from random import choice
import matplotlib.pyplot as plt

def fill_walk():    # compute every point in the random walk
    x_values = [0]    # every walk starts at (0, 0)
    y_values = [0]
    num_points = 5000    # number of points in the walk

    while len(x_values) < num_points:
        x_step, y_step = get_step()

        if x_step == 0 and y_step == 0:    # reject moves that go nowhere
            continue

        next_x = x_values[-1] + x_step    # compute the next position
        next_y = y_values[-1] + y_step
        x_values.append(next_x)
        y_values.append(next_y)
    return x_values, y_values

def get_step():    # pick a random step along each axis
    x_direction = choice([1, -1])    # direction of travel
    x_distance = choice(list(range(0, 5)))    # distance travelled: 0 to 4
    x_step = x_direction * x_distance

    y_direction = choice([1, -1])    # direction of travel
    y_distance = choice(list(range(0, 5)))    # distance travelled: 0 to 4
    y_step = y_direction * y_distance

    return x_step, y_step

def test():    # wrap the body in a while loop to generate several walks
    x_values, y_values = fill_walk()
    plt.scatter(x_values, y_values, s=15)
    # To color the points by their order in the walk (Figure 2-8), use instead:
    # point_numbers = list(range(len(x_values)))
    # plt.scatter(x_values, y_values, s=15, c=point_numbers, cmap=plt.cm.Reds, edgecolors='none')
    plt.show()

if __name__ == '__main__':
    test()

The result is shown below:

Figure 2-7 A random walk with 5000 points

Figure 2-8 A random walk with 5000 points, colored with a colormap
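Listing 2-7 keeps the walk in two module-level functions. When several walks are needed, as the comment on test() suggests, it can be convenient to bundle the state into a class. The sketch below is one possible arrangement. The class name RandomWalk, its attributes, and the choice of three walks of 1000 points are assumptions for this illustration. Plotting is left out so that the sketch needs only the standard library.

```python
from random import choice

class RandomWalk:
    """A class-based variant of fill_walk() and get_step() from Listing 2-7."""

    def __init__(self, num_points=5000):
        self.num_points = num_points
        self.x_values = [0]    # every walk starts at (0, 0)
        self.y_values = [0]

    def get_step(self):
        """Return a signed step of 0 to 4 units along one axis."""
        direction = choice([1, -1])
        distance = choice([0, 1, 2, 3, 4])
        return direction * distance

    def fill_walk(self):
        """Add points until the walk holds num_points positions."""
        while len(self.x_values) < self.num_points:
            x_step = self.get_step()
            y_step = self.get_step()
            if x_step == 0 and y_step == 0:    # reject moves that go nowhere
                continue
            self.x_values.append(self.x_values[-1] + x_step)
            self.y_values.append(self.y_values[-1] + y_step)

# Generate three independent walks and report where each one ends.
walks = []
for _ in range(3):
    walk = RandomWalk(num_points=1000)
    walk.fill_walk()
    walks.append(walk)
    print(len(walk.x_values), (walk.x_values[-1], walk.y_values[-1]))
```

Each walk's x_values and y_values lists can be passed to plt.scatter() exactly as in Listing 2-7.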

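2.1 Redrawing the Start and End Points

Draw the start and end points separately so that they stand out. Overwrite test.py with the following code:

Listing 2-8: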

from random import choice
import matplotlib.pyplot as plt

def fill_walk():    # compute every point in the random walk
    x_values = [0]    # every walk starts at (0, 0)
    y_values = [0]
    num_points = 5000    # number of points in the walk

    while len(x_values) < num_points:
        x_step, y_step = get_step()

        if x_step == 0 and y_step == 0:    # reject moves that go nowhere
            continue

        next_x = x_values[-1] + x_step    # compute the next position
        next_y = y_values[-1] + y_step
        x_values.append(next_x)
        y_values.append(next_y)
    return x_values, y_values

def get_step():    # pick a random step along each axis
    x_direction = choice([1, -1])    # direction of travel
    x_distance = choice(list(range(0, 5)))    # distance travelled: 0 to 4
    x_step = x_direction * x_distance

    y_direction = choice([1, -1])    # direction of travel
    y_distance = choice(list(range(0, 5)))    # distance travelled: 0 to 4
    y_step = y_direction * y_distance

    return x_step, y_step

def test():
    x_values, y_values = fill_walk()
    point_numbers = list(range(len(x_values)))    # order in which the points were generated
    plt.scatter(x_values, y_values, s=5, c=point_numbers, cmap=plt.cm.Blues, edgecolors='none')

    # New: redraw the first and last points so that they stand out
    plt.scatter(0, 0, s=15, c='green', edgecolors='none')    # start point
    plt.scatter(x_values[-1], y_values[-1], s=15, c='red', edgecolors='none')    # end point
    plt.show()

if __name__ == '__main__':
    test()

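The result is shown below:

Figure 2-9 Highlighting the start and end points

2.2 Hiding the Axes

Hide the axes so that attention goes to the path of the random walk rather than to the axes. Overwrite test.py with the following code:

Listing 2-9: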

from random import choice
import matplotlib.pyplot as plt

def fill_walk():    # compute every point in the random walk
    x_values = [0]    # every walk starts at (0, 0)
    y_values = [0]
    num_points = 5000    # number of points in the walk

    while len(x_values) < num_points:
        x_step, y_step = get_step()

        if x_step == 0 and y_step == 0:    # reject moves that go nowhere
            continue

        next_x = x_values[-1] + x_step    # compute the next position
        next_y = y_values[-1] + y_step
        x_values.append(next_x)
        y_values.append(next_y)
    return x_values, y_values

def get_step():    # pick a random step along each axis
    x_direction = choice([1, -1])    # direction of travel
    x_distance = choice(list(range(0, 5)))    # distance travelled: 0 to 4
    x_step = x_direction * x_distance

    y_direction = choice([1, -1])    # direction of travel
    y_distance = choice(list(range(0, 5)))    # distance travelled: 0 to 4
    y_step = y_direction * y_distance

    return x_step, y_step

def test():
    x_values, y_values = fill_walk()
    # plt.figure(dpi=128, figsize=(10, 6))    # optionally set the figure size and resolution (dpi)
    point_numbers = list(range(len(x_values)))    # order in which the points were generated
    plt.scatter(x_values, y_values, s=5, c=point_numbers, cmap=plt.cm.Reds, edgecolors='none')

    plt.scatter(0, 0, s=40, c='green', edgecolors='none')    # start point
    plt.scatter(x_values[-1], y_values[-1], s=40, c='blue', edgecolors='none')    # end point

    # New: hide the axes. Alternatively, hide each axis on the current axes object:
    plt.axis('off')
    # plt.gca().get_xaxis().set_visible(False)
    # plt.gca().get_yaxis().set_visible(False)

    plt.show()

if __name__ == '__main__':
    test()

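The result is shown below:

Figure 2-10 The random walk with the axes hidden

3 Simulating Dice Rolls with Pygal

Python's Pygal package [1] produces scalable vector graphics files. This is useful for charts that will be shown on screens of different sizes, because they scale automatically to fit the viewer's screen.

This section analyzes the results of rolling dice. Rolling one regular six-sided die produces each result from 1 to 6 with equal probability. When two dice are rolled together, however, some totals are more likely than others. To find out which totals are most likely, we generate a data set representing the rolls and chart it.
Note: make sure the Pygal library is installed (install command: python -m pip install --user pygal==1.7).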
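The claim that some totals are more likely than others can be checked by counting before running any simulation: two six-sided dice have 36 equally likely combinations, and the totals are not spread evenly across them. The following sketch enumerates the combinations using only the standard library. The variable names are chosen for this illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

sides = range(1, 7)    # faces of one six-sided die
sum_counts = Counter(a + b for a, b in product(sides, repeat=2))
total = sum(sum_counts.values())    # 36 equally likely combinations

for value in sorted(sum_counts):
    print(value, sum_counts[value], Fraction(sum_counts[value], total))
```

A total of 7 can be formed in 6 of the 36 ways (probability 1/6), while 2 and 12 can each be formed in only one way (probability 1/36), so a simulation with many rolls should produce a chart that peaks at 7.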

3.1 Creating the Die Class

The class below simulates rolling a single die. Create a new file named die.py and add the following code:

Listing 3-1:

from random import randint

class Die():    # a class representing a single die

    def __init__(self, num_sides=6):
        self.num_sides = num_sides    # six sides by default

    def roll(self):
        return randint(1, self.num_sides)    # random value from 1 to the number of sides

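3.2 Rolling the Die

Before charting anything with the Die class, roll the die first. Create a new file named die_test.py and add the following code:

Listing 3-2: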

from die import Die

def test():
    die = Die()

    results = []
    for roll_num in range(100):
        result = die.roll()
        results.append(result)

    return results

if __name__ == '__main__':
    results = test()
    print("The results is:", results)

The output is shown below:

The results is: [1, 3, 2, 2, 1, 1, 3, 5, 4, 2, 6, 6, 1, 5, 6, 6, 2, 5, 2, 2, 2, 5, 4, 5, 4, 1, 4, 1, 3, 1, 6, 4, 6, 1, 6, 1, 3, 2, 5, 4, 6, 3, 3, 4, 1, 1, 6, 5, 5, 5, 6, 2, 5, 2, 4, 5, 5, 3, 1, 6, 2, 6, 5, 2, 6, 6, 2, 5, 6, 4, 4, 3, 5, 2, 5, 6, 5, 4, 5, 6, 5, 4, 6, 5, 5, 1, 4, 4, 4, 6, 4, 4, 1, 3, 1, 1, 4, 4, 1, 1]

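3.3 Analyzing the Results

To analyze the results of rolling one die, count how many times each value appears. Overwrite die_test.py with the following code:

Listing 3-3: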

from die import Die

def test():
    die = Die()

    results = []
    for roll_num in range(10000):
        result = die.roll()
        results.append(result)

    frequencies = []    # analyze the results
    for value in range(1, die.num_sides + 1):
        frequency = results.count(value)    # count how often this value was rolled
        frequencies.append(frequency)
    print(frequencies)

    return results

if __name__ == '__main__':
    results = test()

The output is shown below:

[1668, 1651, 1678, 1628, 1722, 1653]
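The counting loop in Listing 3-3 calls results.count() once per face, so the list of rolls is scanned six times. An alternative, shown here as a sketch rather than as the article's method, is collections.Counter from the standard library, which tallies every value in a single pass. To keep the example self-contained it calls randint directly instead of importing the Die class.

```python
from collections import Counter
from random import randint

num_sides = 6
results = [randint(1, num_sides) for _ in range(10000)]

counts = Counter(results)    # tally every roll in one pass
frequencies = [counts[value] for value in range(1, num_sides + 1)]
print(frequencies)
```

Indexing a Counter with a value that never occurred returns 0, so the list always has one entry per face even when a face was not rolled.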

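3.4 Drawing a Histogram

With the list of frequencies in hand, we can draw a histogram of the results. A histogram is a bar chart that shows how often each result occurs. Overwrite die_test.py with the following code:

Listing 3-4: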

from die import Die
import pygal

def test():
    die = Die()

    results = []
    for roll_num in range(10000):
        result = die.roll()
        results.append(result)

    frequencies = []    # analyze the results
    for value in range(1, die.num_sides + 1):
        frequency = results.count(value)    # count how often this value was rolled
        frequencies.append(frequency)

    hist = pygal.Bar()    # visualize the results
    hist.title = 'Results of rolling one D6 10000 times.'    # D6 denotes a six-sided die
    hist.x_labels = [str(value) for value in range(1, die.num_sides + 1)]    # one label per face: '1' to '6'
    hist.x_title = 'Result'
    hist.y_title = 'Frequency of Result'

    hist.add('D6', frequencies)
    hist.render_to_file('die-test.svg')

if __name__ == '__main__':
    test()

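The result is shown below (the image was processed in Photoshop):

Figure 3-1 A simple bar chart created with Pygal

3.5 Rolling Two Dice Together

Using D6 to denote a six-sided die, create two D6 dice to simulate rolling both at once. Overwrite die_test.py with the following code:

Listing 3-5: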

from die import Die
import pygal

def test():
    die_1 = Die()    # create two dice
    die_2 = Die()

    results = []
    for roll_num in range(10000):
        result = die_1.roll() + die_2.roll()    # add the two values
        results.append(result)

    frequencies = []    # analyze the results
    max_results = die_1.num_sides + die_2.num_sides    # largest possible total
    for value in range(2, max_results + 1):
        frequency = results.count(value)    # count how often this total was rolled
        frequencies.append(frequency)

    hist = pygal.Bar()    # visualize the results
    hist.title = 'Results of rolling two D6 10000 times.'    # D6 denotes a six-sided die

    hist.x_labels = [str(i) for i in range(2, max_results + 1)]    # one label per total: '2' to '12'
    hist.x_title = 'Result'
    hist.y_title = 'Frequency of Result'

    hist.add('D6 + D6', frequencies)
    hist.render_to_file('die-test.svg')

if __name__ == '__main__':
    test()

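The result is shown below:

Figure 3-2 Simulated results of rolling two six-sided dice

Exercise: how would you simulate dice with a different number of sides, or a larger number of dice?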
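One possible starting point for this exercise is to generalize the counting code so that it accepts any list of dice. The sketch below is an assumption-laden illustration, not the article's solution. The helper roll_frequencies is a hypothetical name, the Die class is repeated from Listing 3-1 so that the block runs on its own, and the combination of one D6 and two D10 is an arbitrary example.

```python
from random import randint

class Die:
    """Same Die as in Listing 3-1, repeated so this sketch runs on its own."""

    def __init__(self, num_sides=6):
        self.num_sides = num_sides

    def roll(self):
        return randint(1, self.num_sides)

def roll_frequencies(dice, num_rolls=10000):
    """Roll all dice together num_rolls times; return (labels, frequencies)."""
    min_total = len(dice)                          # every die shows 1
    max_total = sum(d.num_sides for d in dice)     # every die shows its maximum
    counts = [0] * (max_total - min_total + 1)
    for _ in range(num_rolls):
        total = sum(d.roll() for d in dice)
        counts[total - min_total] += 1
    labels = [str(t) for t in range(min_total, max_total + 1)]
    return labels, counts

labels, frequencies = roll_frequencies([Die(6), Die(10), Die(10)])
print(labels[0], labels[-1])    # 3 26
print(sum(frequencies))         # 10000
```

The returned labels can be assigned to hist.x_labels and the frequencies passed to hist.add(), in the same way as in Listing 3-5.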

[1] To see what kinds of charts Pygal can create, visit http://www.pygal.org/, click Documentation, and then click Chart types.
