
Tips for Using Python Data Structures

2017-04-27 14:28

I. Filtering data in lists, dictionaries, and sets by a condition

The data in the experiments below is generated randomly with the random module, so the results differ on every run.

1. Filtering out the negative numbers in a list

# -*- coding:utf-8 -*-
from random import randint

data = [randint(-10,10) for _ in xrange(10)]

# Method 1: use the filter function
print filter(lambda x:x>=0, data)
# Method 2: use a list comprehension
print [x for x in data if x>=0]

Output:
[8, 5, 0, 3]
[8, 5, 0, 3]
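In Python 3, filter() returns a lazy iterator rather than a list, so it has to be wrapped in list() before printing. A minimal Python 3 sketch of both methods, assuming a fixed, illustrative list instead of random data so the result is reproducible:

```python
data = [8, -3, 5, 0, -10, 3, -1]

# Method 1: filter() returns an iterator in Python 3, so materialize it with list()
positives_filter = list(filter(lambda x: x >= 0, data))

# Method 2: a list comprehension builds the list directly
positives_comp = [x for x in data if x >= 0]

print(positives_filter)  # [8, 5, 0, 3]
print(positives_comp)    # [8, 5, 0, 3]
```

Both produce the same list; the comprehension avoids a function call per element and is usually considered the more idiomatic form.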

2. Selecting the dictionary items whose values are above 90

# -*- coding:utf-8 -*-
from random import randint

d = {x: randint(60,100) for x in xrange(1,21)}

print { k:v for k,v in d.iteritems() if v >90}

Output:

{10: 97}
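The same dictionary comprehension in Python 3, where dict.iteritems() was removed and items() returns a view. This sketch uses fixed, illustrative scores; note that the comparison is strict, so a score of exactly 90 is excluded:

```python
scores = {1: 85, 2: 97, 3: 90, 4: 92, 5: 61}

# items() yields (key, value) pairs; keep only values strictly above 90
high = {k: v for k, v in scores.items() if v > 90}

print(high)  # {2: 97, 4: 92}
```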

3. Selecting the set elements that are divisible by 3

# -*- coding:utf-8 -*-

from random import randint

data = [randint(-10,10) for _ in xrange(10)]

s = set(data)

print {x for x in s if x % 3 ==0}

Output:
set([0, 9])
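Because the random data contains negative numbers, it helps to know that Python's % operator returns a result with the sign of the divisor, so x % 3 == 0 also selects negative multiples of 3 such as -9. A small Python 3 sketch with fixed, illustrative data:

```python
data = [-10, -9, -7, 0, 3, 3, 4, 9]
s = set(data)  # duplicates such as 3 collapse into a single element

# In Python, -9 % 3 == 0 and -7 % 3 == 2 (the result takes the divisor's sign)
divisible = {x for x in s if x % 3 == 0}

print(sorted(divisible))  # [-9, 0, 3, 9]
```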

II. Named tuples, frequency counts, and sorting dictionaries

1. How to name each element of a tuple to make the code more readable

Option 1:

Define a series of numeric constants to use as indices, similar to the enumeration types found in other languages.

# -*- coding:utf-8 -*-
NAME = 0
AGE = 1
SEX = 2
EMAIL = 3

student = ('jim', 16, 'male', 'jim@gmail.com')

if student[AGE] > 18:
    pass

if student[SEX] == 'male':
    pass
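A common shorthand for option 1 is to unpack range() into the constants instead of numbering each one by hand. A short Python 3 sketch; the helper is_adult is hypothetical and only illustrates reading a field through its named constant:

```python
# Assign consecutive indices 0..3 in one statement
NAME, AGE, SEX, EMAIL = range(4)

student = ('jim', 16, 'male', 'jim@gmail.com')

def is_adult(record):
    """Hypothetical helper: read the age field through the named constant."""
    return record[AGE] > 18

print(student[NAME], is_adult(student))  # jim False
```

Adding a field then only requires extending the left-hand side and the range() argument.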

Option 2:

Use collections.namedtuple from the standard library in place of the built-in tuple.

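namedtuple is essentially a class factory: it creates a subclass of tuple. The resulting s is therefore still a tuple, and its fields can be accessed both by index and by attribute name.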

# -*- coding:utf-8 -*-
from collections import namedtuple

Student = namedtuple('Student', ['name', 'age', 'sex', 'email'])

s = Student('jim', 16, 'male', 'jim@gmail.com')

print s
print s.name

Output:

Student(name='jim', age=16, sex='male', email='jim@gmail.com')
jim
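Because a namedtuple instance is still a tuple, it supports indexing and unpacking, and its fields are read-only. A minimal Python 3 sketch of these properties; _replace() and _asdict() are documented namedtuple methods:

```python
from collections import namedtuple

Student = namedtuple('Student', ['name', 'age', 'sex', 'email'])
s = Student('jim', 16, 'male', 'jim@gmail.com')

# Index access and attribute access refer to the same field
assert s[1] == s.age == 16

# Tuple unpacking works exactly as with a plain tuple
name, age, sex, email = s

# Fields cannot be reassigned...
try:
    s.age = 17
except AttributeError:
    print('namedtuple fields are read-only')

# ...so _replace() returns a new instance with the changed field instead
older = s._replace(age=17)
print(older.age, s.age)  # 17 16

# _asdict() converts the record into a field-name -> value mapping
print(dict(s._asdict()))
```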

2. How to count how often each element occurs in a sequence

Example 1: in a random sequence, find the 3 elements that occur most often. How many times does each occur?

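Method 1: create a dictionary with every distinct element as a key and a count of 0, walk the sequence to increment the counts, then sort the items by count.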

# -*- coding:utf-8 -*-
from random import randint

data = [randint(0,20) for _ in xrange(30)]

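# Start every distinct element at a count of 0
c = dict.fromkeys(data, 0)
for x in data:
    c[x] += 1
print c.items()
# Sort the (element, count) pairs by count and keep the last three
print sorted(c.items(), key=lambda d: d[1])[-3:]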

Output:
[(0, 1), (1, 2), (2, 3), (3, 2), (4, 1), (5, 2), (6, 1), (7, 2), (8, 3), (9, 3), (11, 2), (12, 2), (15, 4), (19, 1), (20, 1)]
[(8, 3), (9, 3), (15, 4)]
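Method 1 can be written more compactly with collections.defaultdict(int), which supplies 0 for missing keys, and heapq.nlargest can pick the top 3 without sorting the whole list. These are alternatives to the article's code, not what it uses; a Python 3 sketch with fixed, illustrative data:

```python
import heapq
from collections import defaultdict

data = [15, 8, 2, 15, 9, 8, 15, 2, 15, 0, 2]

# defaultdict(int) starts every new key at 0
counts = defaultdict(int)
for x in data:
    counts[x] += 1

# nlargest returns the 3 (element, count) pairs with the highest count, largest first
top3 = heapq.nlargest(3, counts.items(), key=lambda item: item[1])
print(top3)  # [(15, 4), (2, 3), (8, 2)]
```

Unlike the slice of an ascending sort, nlargest lists the most frequent element first.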

Method 2: use a collections.Counter object

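Pass the sequence to the Counter constructor. The resulting Counter is a dictionary that maps each element to its frequency, and Counter.most_common(n) returns a list of the n most frequent elements together with their counts.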

# -*- coding:utf-8 -*-
from random import randint
from collections import Counter

data = [randint(0,20) for _ in xrange(30)]
c2 = Counter(data)

print c2.most_common(3)

Output:
[(18, 4), (5, 3), (14, 3)]
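Two more Counter behaviors are handy in practice: looking up an element that never occurred returns 0 instead of raising KeyError, and update() adds counts from another iterable rather than replacing them. A Python 3 sketch with fixed, illustrative data:

```python
from collections import Counter

data = [5, 18, 14, 18, 5, 18, 14, 5, 18, 7]
c2 = Counter(data)

print(c2.most_common(3))  # [(18, 4), (5, 3), (14, 2)]

# Missing elements count as zero and are not added to the Counter
print(c2[99])             # 0

# update() increments the existing counts
c2.update([7, 7])
print(c2[7])              # 3
```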

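Example 2: count the word frequencies in an English text and find the 10 most frequent words. How many times does each occur?
The file content is split on runs of non-word characters (the regular expression \W+). The code prints only the top 3 for brevity; pass 10 to most_common() to get the top 10.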

# -*- coding:utf-8 -*-
from collections import Counter
import re

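with open('test.txt') as f:
    txt = f.read()
# Split on runs of non-word characters; drop the empty strings that
# appear when the text starts or ends with a non-word character
c3 = Counter(w for w in re.split(r'\W+', txt) if w)
print c3.most_common(3)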

Output:
[('openhpc', 26), ('resource', 17), ('queue', 16)]
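re.split(r'\W+', ...) leaves an empty string when the text begins or ends with a non-word character, and it counts "The" and "the" as different words. A Python 3 sketch that lowercases the text and uses re.findall(r'\w+', ...) to collect only the words; it assumes a short illustrative string instead of test.txt so that it runs on its own:

```python
import re
from collections import Counter

text = "The queue holds jobs. The resource manager reads the queue.\n"

# findall returns only the runs of word characters, so no empty strings appear
words = re.findall(r'\w+', text.lower())
c3 = Counter(words)

print(c3.most_common(2))  # [('the', 3), ('queue', 2)]

# For comparison, split() leaves a trailing '' because the text ends with ".\n"
print(re.split(r'\W+', text)[-1] == '')  # True
```

Whether to lowercase is a choice that depends on the text; it is an assumption of this sketch, not of the original example.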

3. How to sort the items of a dictionary by value

Solutions:

1. Use zip to turn the dictionary into (value, key) tuples, which sort by value first.

2. Pass a key function to sorted() that selects the value of each (key, value) item.

# -*- coding:utf-8 -*-
from random import randint

d = {x:randint(60,100) for x in 'xyzabc' }
print sorted(zip(d.itervalues(),d.iterkeys()))

Output:
[(80, 'y'), (89, 'x'), (91, 'b'), (94, 'a'), (94, 'z'), (99, 'c')]

# -*- coding:utf-8 -*-
from random import randint

d = {x:randint(60,100) for x in 'xyzabc' }
print sorted(d.items(), key=lambda x: x[1])

Output:

[('x', 67), ('y', 71), ('c', 72), ('a', 75), ('z', 88), ('b', 89)]
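In Python 3, itervalues() and iterkeys() are gone, but both approaches carry over with values()/keys() and items(). A sketch with fixed, illustrative scores that also sorts in descending order with operator.itemgetter and, since plain dicts keep insertion order from Python 3.7 on, rebuilds a dict in ranked order:

```python
from operator import itemgetter

d = {'x': 67, 'y': 71, 'z': 88, 'a': 75, 'b': 89, 'c': 72}

# Approach 1: pair values with keys so the tuples compare by value first
by_zip = sorted(zip(d.values(), d.keys()))

# Approach 2: sort the (key, value) items by value, highest first
by_value = sorted(d.items(), key=itemgetter(1), reverse=True)

# A dict built from the sorted items iterates in that order (Python 3.7+)
ranked = dict(by_value)

print(by_zip[0])     # (67, 'x')
print(list(ranked))  # ['b', 'z', 'a', 'c', 'y', 'x']
```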

III. Common keys

1. How to quickly find the keys shared by several dictionaries

# -*- coding:utf-8 -*-
from random import randint, sample

s1 = {x: randint(1,4) for x in sample('abcdefg', randint(3,6))}
s2 = {x: randint(1,4) for x in sample('abcdefg', randint(3,6))}
s3 = {x: randint(1,4) for x in sample('abcdefg', randint(3,6))}
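# With only a few dictionaries, intersecting their key views directly is enough
print s1.viewkeys() & s2.viewkeys() & s3.viewkeys()

# step 1: dict.viewkeys() returns a set-like view of a dictionary's keys;
# step 2: map applies it to every dictionary;
# step 3: reduce intersects the key views of all the dictionaries.
# With many dictionaries, use this approach instead
print reduce(lambda a,b:a&b, map(dict.viewkeys, [s1,s2,s3]))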

Output:

set(['c', 'd'])
set(['c', 'd'])
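Python 3 removed viewkeys(): keys() itself now returns a set-like view, and reduce moved to the functools module. A sketch with fixed, illustrative dictionaries; the last approach, set.intersection, is an alternative that accepts any number of iterables at once:

```python
from functools import reduce

s1 = {'a': 1, 'c': 3, 'd': 2, 'f': 4}
s2 = {'b': 2, 'c': 1, 'd': 4}
s3 = {'c': 2, 'd': 3, 'e': 1, 'g': 4}

# keys() views support & directly
common_direct = s1.keys() & s2.keys() & s3.keys()

# reduce folds & over the key views of any number of dictionaries
dicts = [s1, s2, s3]
common_reduce = reduce(lambda a, b: a & b, map(dict.keys, dicts))

# Alternative: iterating a dict yields its keys, so intersection() can take the dicts
common_set = set(s1).intersection(*dicts[1:])

print(sorted(common_direct))  # ['c', 'd']
```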

IV. How to keep a dictionary ordered

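1. Use collections.OrderedDict

The example simulates a race: each time Enter is pressed, a randomly chosen remaining player finishes, and that player's rank and elapsed time are stored in an OrderedDict.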

from time import time
from random import randint
from collections import OrderedDict

d = OrderedDict()
players = list('ABCDEFGH')
start = time()

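for i in xrange(8):
    raw_input()  # press Enter to let the next player finish
    # Remove a random player from those still racing
    p = players.pop(randint(0, 7 - i))
    end = time()
    print i + 1, p, end - start
    d[p] = (i + 1, end - start)

print '*' * 20
# Iteration follows insertion order, i.e. the finishing order
for k in d:
    print k, d[k]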

Output:
The final for loop visits the items in the order in which they were inserted into the dictionary.

1 C 0.934000015259

2 D 1.40899991989

3 F 1.67999982834

4 A 1.95599985123

5 E 2.16599988937

6 H 2.37599992752

7 B 2.60699987411

8 G 2.99799990654
********************
C (1, 0.9340000152587891)
D (2, 1.4089999198913574)
F (3, 1.679999828338623)
A (4, 1.9559998512268066)
E (5, 2.1659998893737793)
H (6, 2.375999927520752)
B (7, 2.6069998741149902)
G (8, 2.997999906539917)
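Since Python 3.7, a plain dict also preserves insertion order, so OrderedDict is now mainly useful for its extra methods. A non-interactive Python 3 sketch, assuming a fixed finishing order in place of the raw_input() timing loop, that shows move_to_end() and popitem(last=False):

```python
from collections import OrderedDict

d = OrderedDict()
for rank, player in enumerate(['C', 'D', 'F', 'A'], start=1):
    d[player] = rank  # insertion order == finishing order

print(list(d))  # ['C', 'D', 'F', 'A']

# Move an existing key to the end without changing its value
d.move_to_end('C')
print(list(d))  # ['D', 'F', 'A', 'C']

# popitem(last=False) removes the oldest entry (FIFO order)
first = d.popitem(last=False)
print(first)    # ('D', 2)
```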

V. History records

1. Implementing a user history feature (at most n entries)

Store the history in a queue with a capacity of n.

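Use deque from the standard library's collections module. It is a double-ended queue; when it is created with a maximum length, appending to a full deque discards the element at the opposite end, so it behaves like a circular buffer holding the n most recent entries. Before the program exits, the queue object can be saved to a file with pickle and loaded again the next time the program runs.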

from random import randint
from collections import deque
N = randint(0, 100)
history = deque([], 5)  # maxlen 5: keep only the 5 most recent guesses

def guess(k):
    if k == N:
        print 'right'
        return True

    if k < N:
        print '%s is less than N' % k
    else:
        print '%s is greater than N' % k
    return False

while True:
    line = raw_input("please input a number: ")
    if line.isdigit():
        k = int(line)
        history.append(k)
        if guess(k):
            break
    elif line == 'history' or line == 'h?':
        print list(history)

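The IPython session below shows that a deque with a maximum length of 5 keeps only its last five items, and that it can be saved and reloaded with pickle:

In [1]: import pickle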

In [2]: from collections import deque

In [3]: q = deque([],5)

In [4]: q.append(1)

In [5]: q.append(2)

In [6]: q.append(3)

In [7]: q.append(4)

In [8]: q.append(5)

In [9]: q.append(6)

In [10]: q
Out[10]: deque([2, 3, 4, 5, 6])

In [11]: pickle.dump(q,open('history','w'))

In [12]: pickle.load(open('history'))
Out[12]: deque([2, 3, 4, 5, 6])
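In Python 3, pickle writes bytes, so the file must be opened in binary mode ('wb' and 'rb'), and with statements ensure the file is closed. A sketch that saves and reloads the history deque; the file name history.pkl inside a temporary directory is an illustrative assumption:

```python
import os
import pickle
import tempfile
from collections import deque

history = deque(maxlen=5)
for k in range(1, 7):
    history.append(k)  # the 6th append drops the oldest value, 1

path = os.path.join(tempfile.mkdtemp(), 'history.pkl')

# Save before the program exits...
with open(path, 'wb') as f:
    pickle.dump(history, f)

# ...and load it again on the next run
with open(path, 'rb') as f:
    restored = pickle.load(f)

print(restored)         # deque([2, 3, 4, 5, 6], maxlen=5)
print(restored.maxlen)  # 5
```

Pickling preserves maxlen, so the reloaded history keeps enforcing the same capacity.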
