Python iterators: itertools
2016-06-27 11:36
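Introduction
itertools is the Python standard-library module of iterator building blocks. It is very powerful, and knowing it well can save a lot of code, so it is worth a blog post as a bookmark. Let's begin.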
chain
def chain(*iterables):
    # chain('ABC', 'DEF') --> A B C D E F
    for it in iterables:
        for element in it:
            yield element
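As the definition above shows, chain walks through each iterable it is given, in order, and yields that iterable's elements one by one. It therefore flattens exactly one level of nesting: with chain(*a), the elements of a become the arguments, and what comes out are the second-level elements. An example: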
>>> a=[['abc','def']]
>>> for item in itertools.chain(*a):
...     print item
...
abc
def
>>> b=[[['abc','abc']]]
>>> for item in itertools.chain(*b):
...     print item
...
['abc', 'abc']
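The output above makes the behaviour clear: only one level is flattened each time. itertools also offers chain.from_iterable(iterable), which does the same job but takes a single iterable of iterables instead of unpacked arguments.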
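To make the comparison concrete, here is a small sketch that flattens the same nested list both ways. The variable names (nested, deeper and so on) are made up for illustration, and print is called with a single argument so the snippet should behave the same on Python 2 and Python 3.

```python
from itertools import chain

nested = [['abc', 'def'], ['ghi']]

# chain(*nested) unpacks the outer list into separate arguments up front.
flat_unpacked = list(chain(*nested))

# chain.from_iterable(nested) walks the outer iterable lazily instead.
flat_lazy = list(chain.from_iterable(nested))

# Only one level is flattened: the inner list survives as a single element.
deeper = [[['abc', 'abc']]]
one_level = list(chain.from_iterable(deeper))

print(flat_unpacked)  # ['abc', 'def', 'ghi']
print(flat_lazy)      # ['abc', 'def', 'ghi']
print(one_level)      # [['abc', 'abc']]
```

The from_iterable form is usually the better fit when the outer iterable is itself a generator, because nothing has to be unpacked before iteration starts.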
combinations
Function definition:
def combinations(iterable, r):
    # combinations('ABCD', 2) --> AB AC AD BC BD CD
    # combinations(range(4), 3) --> 012 013 023 123
    pool = tuple(iterable)
    n = len(pool)
    if r > n:
        return
    indices = range(r)
    yield tuple(pool[i] for i in indices)
    while True:
        for i in reversed(range(r)):
            if indices[i] != i + n - r:
                break
        else:
            return
        indices[i] += 1
        for j in range(i+1, r):
            indices[j] = indices[j-1] + 1
        yield tuple(pool[i] for i in indices)
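The definition shows that combinations yields every length-r subsequence of iterable. Elements are picked by position and always kept in input order, so order does not matter: AB and BA are the same combination and only AB is produced. This is the C(n, r) of combinatorics.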
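Using it is simple; what caught my attention is the implementation, which I find quite elegant. The idea is that each slot holds an index that advances from its starting position to the largest position it can reach, and the slots are advanced starting from the last one and working back to the first.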
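The index movement is easier to follow when only the index tuples are printed. The sketch below copies the loop from the definition into a hypothetical helper, combination_indices (not part of itertools), and uses list(range(r)) so that it also runs on Python 3.

```python
from itertools import combinations


def combination_indices(n, r):
    """Yield the index tuples visited when choosing r of n positions."""
    if r > n:
        return
    indices = list(range(r))
    yield tuple(indices)
    while True:
        # Scan from the last slot backwards for one that can still advance.
        for i in reversed(range(r)):
            if indices[i] != i + n - r:
                break
        else:
            return  # every slot already sits at its maximum position
        indices[i] += 1
        # Reset the slots to the right so each follows its left neighbour.
        for j in range(i + 1, r):
            indices[j] = indices[j - 1] + 1
        yield tuple(indices)


trace = list(combination_indices(4, 2))
print(trace)
# [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```

Mapping each index tuple back onto 'ABCD' gives AB AC AD BC BD CD, the same sequence that itertools.combinations('ABCD', 2) produces.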
combinations_with_replacement
def combinations_with_replacement(iterable, r):
    # combinations_with_replacement('ABC', 2) --> AA AB AC BB BC CC
    pool = tuple(iterable)
    n = len(pool)
    if not n and r:
        return
    indices = [0] * r
    yield tuple(pool[i] for i in indices)
    while True:
        for i in reversed(range(r)):
            if indices[i] != n - 1:
                break
        else:
            return
        indices[i:] = [indices[i] + 1] * (r - i)
        yield tuple(pool[i] for i in indices)
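The difference from the previous function is that an element may be selected more than once. The implementation changes accordingly: every slot can move all the way to the last element of the pool (index n - 1), and once the backward scan has found the slot i to advance, slot i and every slot after it are set to indices[i] + 1.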
>>> for item in itertools.combinations_with_replacement('ABCD',3):
...     print item
...
('A', 'A', 'A')
('A', 'A', 'B')
('A', 'A', 'C')
('A', 'A', 'D')
('A', 'B', 'B')
('A', 'B', 'C')
('A', 'B', 'D')
('A', 'C', 'C')
('A', 'C', 'D')
('A', 'D', 'D')
('B', 'B', 'B')
('B', 'B', 'C')
('B', 'B', 'D')
('B', 'C', 'C')
('B', 'C', 'D')
('B', 'D', 'D')
('C', 'C', 'C')
('C', 'C', 'D')
('C', 'D', 'D')
('D', 'D', 'D')
compress
def compress(data, selectors):
    # compress('ABCDEF', [1,0,1,0,1,1]) --> A C E F
    return (d for d, s in izip(data, selectors) if s)
compress keeps the elements of data whose corresponding entry in selectors is true, and stops as soon as either input is exhausted.
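A short sketch of compress in use. The first call repeats the example from the comment in the definition; the second computes the selectors from the data itself. The word list is invented for illustration.

```python
from itertools import compress

data = 'ABCDEF'
selectors = [1, 0, 1, 0, 1, 1]
picked = list(compress(data, selectors))
print(picked)  # ['A', 'C', 'E', 'F']

# The selectors can be any iterable of truthy/falsy values,
# for example a generator computed from the data.
words = ['chain', 'izip', 'count', 'cycle']
long_words = list(compress(words, (len(w) > 4 for w in words)))
print(long_words)  # ['chain', 'count', 'cycle']
```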
count
def count(start=0, step=1):
    # count(10) --> 10 11 12 13 14 ...
    # count(2.5, 0.5) -> 2.5 3.0 3.5 ...
    n = start
    while True:
        yield n
        n += step
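count returns an infinite iterator, not a list: it starts at start and each call to next yields a value step larger than the previous one. Usage: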
>>> a=itertools.count(0,1)
>>> a.next()
0
>>> a.next()
1
>>> a.next()
2
>>>
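Because count never ends, it is normally combined with something that stops the iteration. The sketch below shows two common ways of doing that, islice and zip; the variable names are only illustrative.

```python
from itertools import count, islice

# Take a finite slice out of the infinite counter.
first_five = list(islice(count(10, 5), 5))
print(first_five)  # [10, 15, 20, 25, 30]

# zip stops at the shortest input, so count can number another sequence.
numbered = list(zip(count(1), 'abc'))
print(numbered)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```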
cycle
Function definition:
def cycle(iterable):
    # cycle('ABCD') --> A B C D A B C D A B C D ...
    saved = []
    for element in iterable:
        yield element
        saved.append(element)
    while saved:
        for element in saved:
            yield element
Usage: yields the elements of iterable over and over again:
>>> a=itertools.cycle('ABCD')
>>> a.next()
'A'
>>> a.next()
'B'
>>> a.next()
'C'
>>> a.next()
'D'
>>> a.next()
'A'
>>> a.next()
'B'
>>> a.next()
'C'
>>> a.next()
'D'
dropwhile
Function definition:
def dropwhile(predicate, iterable):
    # dropwhile(lambda x: x<5, [1,4,6,4,1]) --> 6 4 1
    iterable = iter(iterable)
    for x in iterable:
        if not predicate(x):
            yield x
            break
    for x in iterable:
        yield x
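As the definition shows, it skips elements for as long as predicate is true, then yields the first element that fails predicate and everything after it: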
>>> a=itertools.dropwhile(lambda x:x>1,[3,2,1,1])
>>> a.next()
1
>>> a.next()
1
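One detail worth seeing in code: once the first failing element has been found, predicate is never consulted again, so later elements that would satisfy it are still yielded. A minimal sketch, with an invented list of lines as the second input:

```python
from itertools import dropwhile

# 4 and 1 are smaller than 5 but come after the first failure (6),
# so they are kept.
tail = list(dropwhile(lambda x: x < 5, [1, 4, 6, 4, 1]))
print(tail)  # [6, 4, 1]

# Typical use: skip a leading header block and keep the rest untouched.
lines = ['# header', '# more header', 'data 1', '# not a header', 'data 2']
body = list(dropwhile(lambda s: s.startswith('#'), lines))
print(body)  # ['data 1', '# not a header', 'data 2']
```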
ifilter
Function definition:
def ifilter(predicate, iterable):
    # ifilter(lambda x: x%2, range(10)) --> 1 3 5 7 9
    if predicate is None:
        predicate = bool
    for x in iterable:
        if predicate(x):
            yield x
Yields the elements for which predicate is true.
ifilterfalse
The opposite of ifilter: yields the elements for which predicate is false.
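The two functions split an input into complementary halves. The sketch below assumes Python 3, where ifilter is the built-in filter and ifilterfalse is named itertools.filterfalse; on Python 2 the calls would be itertools.ifilter and itertools.ifilterfalse.

```python
from itertools import filterfalse  # assumption: Python 3 name of ifilterfalse

numbers = range(10)

odd = list(filter(lambda x: x % 2, numbers))        # predicate is true
even = list(filterfalse(lambda x: x % 2, numbers))  # predicate is false
print(odd)   # [1, 3, 5, 7, 9]
print(even)  # [0, 2, 4, 6, 8]

# With predicate None, the elements' own truthiness is used.
mixed = [0, 1, '', 'a', None]
truthy = list(filter(None, mixed))
falsy = list(filterfalse(None, mixed))
print(truthy)  # [1, 'a']
print(falsy)   # [0, '', None]
```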
imap
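Function definition:
def imap(function, *iterables):
    # imap(pow, (2,3,10), (5,2,3)) --> 32 9 1000
    iterables = map(iter, iterables)
    while True:
        args = [next(it) for it in iterables]
        if function is None:
            yield tuple(args)
        else:
            yield function(*args)
What it does: calls function with one element taken from each of the iterables in parallel and yields the results, stopping when the shortest iterable runs out.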
izip
Function definition:
def izip(*iterables):
    # izip('ABCD', 'xy') --> Ax By
    iterators = map(iter, iterables)
    while iterators:
        yield tuple(map(next, iterators))
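What it does: groups the elements at the same position in each iterable into tuples, stopping at the shortest input.
Demo: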
>>> a='ABC'
>>> b='123'
>>> c='xy'
>>> d=itertools.izip(a,b,c)
>>> d.next()
('A', '1', 'x')
>>> d.next()
('B', '2', 'y')
izip_longest
Function definition:
class ZipExhausted(Exception):
    pass

def izip_longest(*args, **kwds):
    # izip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
    fillvalue = kwds.get('fillvalue')
    counter = [len(args) - 1]
    def sentinel():
        if not counter[0]:
            raise ZipExhausted
        counter[0] -= 1
        yield fillvalue
    fillers = repeat(fillvalue)
    iterators = [chain(it, sentinel(), fillers) for it in args]
    try:
        while iterators:
            yield tuple(map(next, iterators))
    except ZipExhausted:
        pass
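Much like izip, but it keeps going until the longest input is exhausted; positions where a shorter input has already run out are filled with the given fillvalue.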
>>> c=itertools.izip_longest('ABCD','xy','1',fillvalue='*')
>>> c.next()
('A', 'x', '1')
>>> c.next()
('B', 'y', '*')
>>> c.next()
('C', '*', '*')
>>> c.next()
('D', '*', '*')
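For a side-by-side view of the two zipping behaviours, here is a small sketch. It assumes Python 3, where izip is the built-in zip and izip_longest is named itertools.zip_longest.

```python
from itertools import zip_longest  # assumption: Python 3 name of izip_longest

# zip stops at the shortest input.
short = list(zip('ABCD', 'xy'))
print(short)  # [('A', 'x'), ('B', 'y')]

# zip_longest runs to the longest input and pads with fillvalue.
padded = list(zip_longest('ABCD', 'xy', fillvalue='-'))
print(padded)  # [('A', 'x'), ('B', 'y'), ('C', '-'), ('D', '-')]

# Without fillvalue, the padding is None.
default_pad = list(zip_longest('AB', 'x'))
print(default_pad)  # [('A', 'x'), ('B', None)]
```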
permutations
Function definition:
def permutations(iterable, r=None):
    # permutations('ABCD', 2) --> AB AC AD BA BC BD CA CB CD DA DB DC
    # permutations(range(3)) --> 012 021 102 120 201 210
    pool = tuple(iterable)
    n = len(pool)
    r = n if r is None else r
    if r > n:
        return
    indices = range(n)
    cycles = range(n, n-r, -1)
    yield tuple(pool[i] for i in indices[:r])
    while n:
        for i in reversed(range(r)):
            cycles[i] -= 1
            if cycles[i] == 0:
                indices[i:] = indices[i+1:] + indices[i:i+1]
                cycles[i] = n - i
            else:
                j = cycles[i]
                indices[i], indices[-j] = indices[-j], indices[i]
                yield tuple(pool[i] for i in indices[:r])
                break
        else:
            return
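This function is similar to combinations. The difference is that permutations takes order into account, so AB and BA are different results. It corresponds to the A(n, r) (also written P(n, r)) of combinatorics.
Demo: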
>>> it=itertools.permutations('ABCD',3)
>>> for item in it:
...     print item
...
('A', 'B', 'C')
('A', 'B', 'D')
('A', 'C', 'B')
('A', 'C', 'D')
('A', 'D', 'B')
('A', 'D', 'C')
('B', 'A', 'C')
('B', 'A', 'D')
('B', 'C', 'A')
('B', 'C', 'D')
('B', 'D', 'A')
('B', 'D', 'C')
('C', 'A', 'B')
('C', 'A', 'D')
('C', 'B', 'A')
('C', 'B', 'D')
('C', 'D', 'A')
('C', 'D', 'B')
('D', 'A', 'B')
('D', 'A', 'C')
('D', 'B', 'A')
('D', 'B', 'C')
('D', 'C', 'A')
('D', 'C', 'B')
I did not fully understand the official definition, so I wrote a recursive permutation in C++ instead:
#include<iostream>
using namespace std;

char str[4]={'A','B','C','D'};
char result[4];
bool flag[4]={false,false,false,false};
int fix=3;

void print(){
    for(int i=1;i<=fix;i++){
        cout<<result[i];
    }
    cout<<endl;
}

void f(int position){
    for(int i=0;i<4;i++){
        int t=-1;
        if(flag[i]==false){
            flag[i]=true;
            t=i;
            result[position]=str[i];
            if(position==fix){
                print();
                flag[i]=false;
                continue;
            }else{
                f(position+1);
            }
        }
        if(t!=-1){
            flag[t]=false;
        }
    }
}

int main()
{
    f(1);
    return 0;
}
The test output is as follows:
ABC
ABD
ACB
ACD
ADB
ADC
BAC
BAD
BCA
BCD
BDA
BDC
CAB
CAD
CBA
CBD
CDA
CDB
DAB
DAC
DBA
DBC
DCA
DCB
Both produce the same result.
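The same recursive idea can be written in Python, which makes it possible to check it directly against itertools.permutations. This is only a sketch of the flag-per-element approach, not how the library implements it; recursive_permutations, used and current are hypothetical names chosen here.

```python
from itertools import permutations


def recursive_permutations(pool, r):
    """Return a generator of r-length permutations, using one flag per element."""
    pool = tuple(pool)
    used = [False] * len(pool)   # which elements are already in `current`
    current = []                 # the permutation being built

    def fill(position):
        if position == r:
            yield tuple(current)
            return
        for i, element in enumerate(pool):
            if used[i]:
                continue
            used[i] = True
            current.append(element)
            for result in fill(position + 1):
                yield result
            # Undo the choice before trying the next element in this slot.
            current.pop()
            used[i] = False

    return fill(0)


mine = list(recursive_permutations('ABCD', 3))
print(len(mine))                                # 24
print(mine == list(permutations('ABCD', 3)))    # True
```

Both versions try the elements in input order at every slot, which is why the two sequences match element for element and not just as sets.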
product
Function definition:
def product(*args, **kwds):
    # product('ABCD', 'xy') --> Ax Ay Bx By Cx Cy Dx Dy
    # product(range(2), repeat=3) --> 000 001 010 011 100 101 110 111
    pools = map(tuple, args) * kwds.get('repeat', 1)
    result = [[]]
    for pool in pools:
        result = [x+[y] for x in result for y in pool]
    for prod in result:
        yield tuple(prod)
product is simply the Cartesian product of the input iterables. Demo:
>>> a=itertools.product('ABCD','xyz')
>>> for item in a:
...     print item
...
('A', 'x')
('A', 'y')
('A', 'z')
('B', 'x')
('B', 'y')
('B', 'z')
('C', 'x')
('C', 'y')
('C', 'z')
('D', 'x')
('D', 'y')
('D', 'z')
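Another way to read product is as a replacement for nested for loops, with the rightmost input varying fastest. The sketch below checks that reading on a small input and also shows the repeat keyword from the definition; the variable names are illustrative.

```python
from itertools import product

pairs = list(product('AB', 'xy'))

# The equivalent nested loops: the rightmost iterable varies fastest.
nested = [(a, b) for a in 'AB' for b in 'xy']
print(pairs == nested)  # True

# repeat=2 multiplies the same pool by itself.
bits = list(product(range(2), repeat=2))
print(bits)  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```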
repeat
Function definition:
def repeat(object, times=None):
    # repeat(10, 3) --> 10 10 10
    if times is None:
        while True:
            yield object
    else:
        for i in xrange(times):
            yield object
What it does: yields object times times, or forever when times is None. An example of its use:
>>> list(itertools.imap(pow,xrange(10),itertools.repeat(2)))
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> list(itertools.imap(pow,xrange(10),[2,2,2,2,2,2,2,2,2,2]))
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
starmap
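Function definition:
def starmap(function, iterable):
    # starmap(pow, [(2,5), (3,2), (10,3)]) --> 32 9 1000
    for args in iterable:
        yield function(*args)
The difference from imap: imap builds the arguments for each call by taking the elements at the same position from several iterables, whereas starmap takes a single iterable whose items are already argument tuples and unpacks each one into function.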
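The two call shapes can be put next to each other. In this sketch the built-in map stands in for imap (they take the same arguments; on Python 3 map is lazy just like imap), and the tuples bases and exponents are example data.

```python
from itertools import starmap

bases = (2, 3, 10)
exponents = (5, 2, 3)

# map/imap: arguments arrive as separate, parallel iterables.
via_map = list(map(pow, bases, exponents))

# starmap: arguments arrive already grouped into tuples.
argument_tuples = [(2, 5), (3, 2), (10, 3)]
via_starmap = list(starmap(pow, argument_tuples))

print(via_map)      # [32, 9, 1000]
print(via_starmap)  # [32, 9, 1000]

# zip converts the first shape into the second.
print(list(starmap(pow, zip(bases, exponents))) == via_map)  # True
```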
takewhile
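This is the opposite of dropwhile: dropwhile yields the first element that fails predicate and everything after it, while takewhile yields elements for as long as predicate holds and stops at the first one that fails.
Function definition:
def takewhile(predicate, iterable):
    # takewhile(lambda x: x<5, [1,4,6,4,1]) --> 1 4
    for x in iterable:
        if predicate(x):
            yield x
        else:
            break
Demo: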
>>> for item in itertools.takewhile(lambda x: x<5, [1,4,6,4,1]):
...     print item
...
1
4
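A practical example
Scenario: given a piece of text, count how often each word occurs. Approach: split the text into sentences and then into words with nltk, then count the frequencies with FreqDist.
Code: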
import nltk
import itertools

def process(str):
    sentences=itertools.chain(*[nltk.sent_tokenize(str.decode('utf-8').lower())])
    words_dict=itertools.chain(*[nltk.word_tokenize(sen) for sen in sentences])
    fdist=nltk.FreqDist(words_dict)
    print fdist.most_common(10)
    fdist.plot(50,cumulative=True)

if __name__=='__main__':
    text="My interpretation: Gom Jabbar is an ancient non-Latin incantation in HPMORverse which Draco just happened to have researched in old dark tomes, and far predates the writings we're familiar with. Frank Herbert somehow picked up on the phrase, either overhearing it in passing or perhaps by having some actual knowledge of the magical world (e.g. magical relative or somesuch)."
    process(text)
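The code prints the 10 most frequent tokens in the text (punctuation included) and plots the cumulative frequency of the top 50. The result is: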
[(u'in', 3), (u'.', 3), (u'the', 3), (u',', 2), (u'magical', 2), (u'or', 2), (u'jabbar', 1), (u'and', 1), (u'writings', 1), (u'interpretation', 1)]
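For readers without nltk installed, the same pipeline shape (sentences, then words, then counts) can be sketched with the standard library alone. This is a rough stand-in, not a replacement for nltk's tokenizers: the regular expressions are a simplifying assumption, punctuation is dropped instead of counted, and word_frequencies and sample are names invented for this example.

```python
import re
from collections import Counter
from itertools import chain


def word_frequencies(text, top=10):
    # Assumption: sentences end at '.', '!' or '?'.
    sentences = re.split(r'[.!?]+', text.lower())
    # Assumption: a word is a run of letters or apostrophes.
    # chain.from_iterable flattens the per-sentence word lists lazily.
    words = chain.from_iterable(re.findall(r"[a-z']+", s) for s in sentences)
    return Counter(words).most_common(top)


sample = "The cat sat. The cat ran! The dog barked?"
print(word_frequencies(sample, top=2))  # [('the', 3), ('cat', 2)]
```

Counter plays the role of FreqDist here; both offer a most_common method.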