
python note

2015-11-25 16:39, 597 views

1. Use a PyPI mirror to speed up online installation of third-party libraries

python

1. Split a string and strip leading and trailing whitespace:

In[32]: val='a,b, c'
In[33]: pieces=[x.strip() for x in val.split(',')]
In[34]: pieces
Out[34]: ['a', 'b', 'c']

2. Create a set

numSet = set(nums) # nums is a list
numSet = set() # empty set

3. Convert between a string and a list

lst=list(s)  # s is a string; the name str would shadow the built-in type

s=''.join(lst)

4. In Python 3, "/" is float division and "//" is floor division

5/2=2.5
5//2=2

5. List initialization

Basic form

lst=[1,2,3,4]

Consecutive numbers

lst=[n for n in range(5)]

n copies of the same value

lst=[0 for n in range(5)]
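
One caveat with repeated values, shown here as a small sketch: for a flat list of numbers the comprehension and the shorter `[0] * 5` are interchangeable, but for nested lists `*` repeats a reference to the same inner list. The names `rows_shared` and `rows_independent` are illustrative only.

```python
# Flat list of five zeros: both forms give the same result.
flat_a = [0 for _ in range(5)]
flat_b = [0] * 5

# Nested lists: `*` repeats a reference to the SAME inner list...
rows_shared = [[0] * 3] * 2
rows_shared[0][0] = 1          # shows up in both rows

# ...while a comprehension builds a fresh inner list each time.
rows_independent = [[0] * 3 for _ in range(2)]
rows_independent[0][0] = 1     # changes only the first row

print(rows_shared)        # [[1, 0, 0], [1, 0, 0]]
print(rows_independent)   # [[1, 0, 0], [0, 0, 0]]
```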

append vs extend

In[2]: a=[1,2,3]
In[3]: b=[4,5,6]
In[4]: a.append(b)
In[5]: a
Out[5]: [1, 2, 3, [4, 5, 6]]
# starting again from a=[1,2,3]:
In[7]: a.extend(b)
In[8]: a
Out[8]: [1, 2, 3, 4, 5, 6]

6. Convert between characters and ASCII codes

Character to ASCII code

In[2]: ord('a')
Out[2]: 97

ASCII code to character

In[3]: chr(97)
Out[3]: 'a'

7. print without a newline; formatted output

print(x,end="")
# '%%' prints a literal '%'
print('total error rate is %%%f' % (errorCount/float(length)*100))
print('total length is %d, total error is %d' % (length,errorCount))
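
For comparison, the same kind of output can be produced with `str.format` or an f-string. This is a sketch with made-up sample values; `length` and `error_count` stand in for the variables used above.

```python
length = 200          # sample values, chosen only for illustration
error_count = 25
rate = error_count / length * 100   # 12.5

old_style = 'total error rate is %f%%' % rate
with_format = 'total error rate is {:.2f}%'.format(rate)
with_fstring = f'total length is {length}, total error is {error_count}'

print(old_style)     # total error rate is 12.500000%
print(with_format)   # total error rate is 12.50%
print(with_fstring)  # total length is 200, total error is 25

print('no newline', end='')
print(' ...continues on the same line')
```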

8. Convert a number to a string

In[2]: a=100
In[3]: str(a)
Out[3]: '100'

9. Add an import path at runtime

Extend the module search path (sys.path, which is initialized from PYTHONPATH) at runtime:

>>> import sys
>>> sys.path.append('c:\\path')

10. zip(a, b): pair up the elements of two lists:

In[9]: a=[1,3,5]
In[10]: b=[2,4,6]
In[11]: list(zip(a,b))
Out[11]: [(1, 2), (3, 4), (5, 6)]
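
A few related idioms, sketched on the same two lists. In Python 3 `zip` returns a lazy iterator, hence the `list(...)` calls; the names `pairs`, `lookup` and `short` are illustrative.

```python
a = [1, 3, 5]
b = [2, 4, 6]

pairs = list(zip(a, b))          # [(1, 2), (3, 4), (5, 6)]
first, second = zip(*pairs)      # "unzip": (1, 3, 5) and (2, 4, 6)
lookup = dict(zip(a, b))         # {1: 2, 3: 4, 5: 6}
short = list(zip(a, [10, 20]))   # stops at the shorter input: [(1, 10), (3, 20)]

print(pairs, first, second, lookup, short)
```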

10. dict operations

Initialization

d={'a':1,'b':2}

Iteration

for key,value in d.items():
    print(key,":",value)

Sort by key

for key in sorted(d.keys()):
    print(d[key],end="")

#or
sorted(d.items(),key=lambda x:x[0])

Sort by value

sorted(d.items(),key=lambda x:x[1])
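
The sorting calls above can also be written with `operator.itemgetter`, and a sorted list of pairs can be turned back into a dict. A minimal sketch; the dict `scores` and its contents are sample data.

```python
from operator import itemgetter

scores = {'a': 1, 't': 3, 'b': 2}   # sample data, chosen for illustration

by_key = sorted(scores.items(), key=itemgetter(0))
by_value_desc = sorted(scores.items(), key=itemgetter(1), reverse=True)
top_key = max(scores, key=scores.get)   # key with the largest value
rebuilt = dict(by_value_desc)           # dicts keep insertion order (Python 3.7+)

print(by_key)          # [('a', 1), ('b', 2), ('t', 3)]
print(by_value_desc)   # [('t', 3), ('b', 2), ('a', 1)]
print(top_key)         # t
```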

11. Python's standard logging module

Tutorial
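
Since the note gives no code here, the following is a minimal sketch of typical `logging` usage. The logger name `'demo'` and the format string are arbitrary choices, not taken from the tutorial mentioned above.

```python
import logging

# Attach a handler with a readable format to the root logger (done once, at startup).
logging.basicConfig(format='%(asctime)s %(levelname)s %(name)s: %(message)s')

logger = logging.getLogger('demo')   # 'demo' is an arbitrary, illustrative name
logger.setLevel(logging.INFO)        # messages below INFO are dropped

logger.debug('hidden: below the INFO threshold')
logger.info('processing %d items', 3)     # arguments are formatted lazily
logger.warning('something looks off')
```

Passing the arguments separately, instead of formatting the string yourself, means the message is only built if it is actually emitted.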

12. File operations

Write and append

output = open('data.txt', 'w')
# to write a binary file: output = open('data.txt', 'wb')
# to append to a file: output = open('data.txt', 'a')
output.write("")
output.close()
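
The same write and append operations are commonly wrapped in a `with` block, which closes the file even if an error occurs. A sketch using a temporary directory so that no real file is touched; the file contents are made up.

```python
import os
import tempfile

# A temporary directory keeps the example from touching real files.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'data.txt')

    with open(path, 'w', encoding='utf-8') as output:   # 'w' creates or truncates
        output.write('first line\n')

    with open(path, 'a', encoding='utf-8') as output:   # 'a' appends
        output.write('second line\n')

    with open(path, encoding='utf-8') as source:        # default mode is 'r'
        lines = source.read().splitlines()

print(lines)   # ['first line', 'second line']
```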

13. Group the rows of a matrix by one column

from itertools import groupby
from operator import itemgetter

things = [('2009-09-02', 11),
          ('2009-09-02', 3),
          ('2009-09-03', 10),
          ('2009-09-03', 4),
          ('2009-09-03', 22),
          ('2009-09-06', 33)]

# groupby only merges adjacent rows, so things must already be sorted by the key
sss = groupby(things, itemgetter(0))
for key, items in sss:
    print(key)
    for subitem in items:
        print(subitem)
    print('-' * 20)
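
`groupby` only groups consecutive items with equal keys, so unsorted input has to be sorted on the same key first. A small sketch with deliberately shuffled rows; `records` is sample data in the same shape as `things`.

```python
from itertools import groupby
from operator import itemgetter

# Same shape as `things`, but deliberately out of order.
records = [('2009-09-03', 10), ('2009-09-02', 11),
           ('2009-09-03', 4), ('2009-09-02', 3)]

# Without sorting, each key shows up once per run of adjacent rows.
unsorted_keys = [key for key, _ in groupby(records, itemgetter(0))]

records.sort(key=itemgetter(0))   # stable sort: order within a date is kept
grouped = {key: [value for _, value in items]
           for key, items in groupby(records, itemgetter(0))}

print(unsorted_keys)  # ['2009-09-03', '2009-09-02', '2009-09-03', '2009-09-02']
print(grouped)        # {'2009-09-02': [11, 3], '2009-09-03': [10, 4]}
```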

14. Built-in counting: Counter

from collections import Counter

c=Counter('hello you')
In[5]: c
Out[5]: Counter({' ': 1, 'e': 1, 'h': 1, 'l': 2, 'o': 2, 'u': 1, 'y': 1})
In[6]: c.most_common(3)
Out[6]: [('l', 2), ('o', 2), (' ', 1)]
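
Counter accepts any iterable of hashable items, not only a string. A short sketch counting words; the sentence is made up.

```python
from collections import Counter

words = 'the cat and the hat and the bat'.split()
counts = Counter(words)

print(counts['the'])           # 3
print(counts['dog'])           # 0  (missing keys count as zero, no KeyError)
print(counts.most_common(2))   # [('the', 3), ('and', 2)]

counts.update(['dog', 'dog'])  # add more observations in place
print(counts['dog'])           # 2
```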

15. Regex: print how a pattern is parsed (re.DEBUG)

import re
re.compile(r"^\[font(?:=(?P<size>[-+][0-9]{1,2}))?\](.*?)[/font]",
           re.DEBUG)
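
Once a pattern is compiled, its groups can be read from the match object. In the pattern above, `[/font]` is a character class rather than a literal closing tag; the sketch below assumes the closing tag was meant literally and escapes it, which is a guess about the intent. The test strings are made up.

```python
import re

# Assumption: the closing tag is meant literally, so it is escaped here.
pattern = re.compile(r"^\[font(?:=(?P<size>[-+][0-9]{1,2}))?\](.*?)\[/font\]")

match = pattern.match("[font=+2]hello[/font]")
print(match.group('size'))    # +2
print(match.group(2))         # hello

no_size = pattern.match("[font]plain[/font]")
print(no_size.group('size'))  # None  (the optional group did not take part)
print(no_size.group(2))       # plain
```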

16. enumerate: iterate over index and item together:

In[22]: a=['a','b','c']
In[23]: for (i,item) in enumerate(a):
...     print(i,item)
...
0 a
1 b
2 c
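
enumerate also takes a `start` argument, and it combines well with comprehensions. A brief sketch; `names`, `numbered` and `positions` are illustrative names.

```python
names = ['a', 'b', 'c']

# Number items from 1 instead of 0.
numbered = [f'{i}. {item}' for i, item in enumerate(names, start=1)]

# Build a reverse lookup from item to its index.
positions = {item: i for i, item in enumerate(names)}

print(numbered)    # ['1. a', '2. b', '3. c']
print(positions)   # {'a': 0, 'b': 1, 'c': 2}
```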

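17. __file__ and sys.argv[0]

sys.argv[0] gives the path of the script being executed

__file__ works much the same way (it is the path of the current source file)

Get the absolute path: os.path.abspath(sys.argv[0])

or: os.path.abspath(__file__)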
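
A typical use of these paths is locating a file that sits next to the script. The following is a sketch only; `config.ini` is a hypothetical file name, and the fallback is there because `__file__` is not defined in an interactive session.

```python
import os
import sys

script_path = os.path.abspath(sys.argv[0])   # script given on the command line
script_dir = os.path.dirname(script_path)

# __file__ may be undefined (e.g. in an interactive session), so fall back.
module_path = os.path.abspath(globals().get('__file__', sys.argv[0]))

# Locate a data file relative to the script; 'config.ini' is hypothetical.
config_path = os.path.join(script_dir, 'config.ini')
print(script_dir)
print(module_path)
print(config_path)
```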

numpy

1. Basic array operations

Create a 2-D array

In[21]: b=array([[1,2,3],[4,5,6]])
In[22]: b
Out[22]:
array([[1, 2, 3],
       [4, 5, 6]])

Number of dimensions (number of axes)

In[23]: b.ndim
Out[23]: 2

Number of rows and columns

In[24]: b.shape
Out[24]: (2, 3)

Total number of elements

In[26]: b.size
Out[26]: 6

Evenly spaced sequence

In[29]: c=arange(0,4,0.5)
In[30]: c
Out[30]: array([ 0. ,  0.5,  1. ,  1.5,  2. ,  2.5,  3. ,  3.5])

Reshape an array

In[31]: c=arange(8).reshape(2,4)
In[32]: c
Out[32]:
array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

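Summing a matrix

# sum down each column (axis=0)
b=array([[0, 1, 2],
         [3, 4, 5]])
In[62]: b.sum(axis=0)
Out[62]: array([3, 5, 7])

# sum across each row (axis=1)
b=array([[0, 1, 2],
         [3, 4, 5]])
In[59]: b.sum(axis=1)
Out[59]: array([ 3, 12])

argsort: returns the indices that would sort the array in ascending order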

In[66]: d=array([2,1,4,3])
In[67]: d.argsort()
Out[67]: array([1, 0, 3, 2], dtype=int64)

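Assign to part of a row

# set elements 0 and 1 of row 0 of e to 2 and 3 (e starts as a 3x2 array of zeros)
In[76]: e[0,0:2]=[2,3]
In[77]: e
Out[77]:
array([[ 2.,  3.],
       [ 0.,  0.],
       [ 0.,  0.]])

Maximum and minimum

# axis 0 gives the maximum of each column; axis 1 gives the maximum of each row
In[84]: e
Out[84]:
array([[1, 3],
       [4, 2],
       [5, 7]])
In[85]: e.max(0)
Out[85]: array([5, 7])
In[86]: e.max(1)
Out[86]: array([3, 4, 7])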

2. Generate an array of random integers:

In[24]: sample=np.random.randint(0,100,size=10)
In[25]: sample
Out[25]: array([99, 16,  7, 52, 33, 93,  0, 86, 69, 63])

3. The tile function

Repeat twice along the row

In[12]: numpy.tile([0,1],2)
Out[12]: array([0, 1, 0, 1])

Repeat twice along both rows and columns

In[15]: numpy.tile([0,1],(2,2))
Out[15]:
array([[0, 1, 0, 1],
       [0, 1, 0, 1]])

4.nonzero

In[36]: a
Out[36]:
matrix([[1, 0, 0],
        [0, 1, 0],
        [0, 0, 1]])
In[37]: nonzero(a)
Out[37]: (matrix([[0, 1, 2]], dtype=int64), matrix([[0, 1, 2]], dtype=int64))

This means the elements at positions [0,0], [1,1], and [2,2] are nonzero

5.mean

Compute the mean

In[4]: dataSet
Out[4]:
matrix([[1, 2],
        [3, 4]])
In[5]: mean(dataSet,axis=0)  # mean of each column
Out[5]: matrix([[ 2.,  3.]])
In[8]: mean(dataSet,axis=1)  # mean of each row
Out[8]:
matrix([[ 1.5],
        [ 3.5]])

6. Group by a column: groupby (pandas)

In[32]: df
Out[32]:
      data1 key1
0  0.280233    a
1  0.059657    a
2 -0.741920    b
3  0.479287    b
In[33]: df['data1'].groupby(df['key1']).mean()
Out[33]:
key1
a    0.169945
b   -0.131316

Group by key1 and compute the mean of the data1 column

7. Sort by a column

tips.sort_values(by='total_bill')[-5:]

Sort by total_bill and take the 5 rows with the largest values

8. Bin values into intervals: cut

bins=np.array([0,100,1000,10000,100000,1000000])
labels=pd.cut(fec_mb.contb_receipt_amt,bins)

Assign each contb_receipt_amt value to one of the intervals defined by bins

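9. random.seed(0): with a fixed seed, the same random numbers are generated on every run

np.random.seed(0)
X=np.random.normal(0,1,(2,3))

10. argwhere: indices of the elements that satisfy a condition

# find the indices of all elements equal to the maximum
scores=np.array([1,2,3,3,2,1])
np.argwhere(scores == scores.max()).flatten()
out: array([2, 3], dtype=int64)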

Common Python operations

1. Check whether all values in a column are equal:

if len(set(dataSet[:,-1].T.tolist()[0]))==1:

2. Split the data into two parts on whether the feature column is greater than value:

def binSplitDataSet(dataSet, feature, value):
    mat0 = dataSet[nonzero(dataSet[:, feature] > value)[0], :]
    mat1 = dataSet[nonzero(dataSet[:, feature] <= value)[0], :]
    return mat0, mat1

Example:

In[57]: a
Out[57]:
matrix([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
In[58]: mat0,mat1=binSplitDataSet(a,1,3)  # split on whether column 1 (the second column) is > 3
In[59]: mat0
Out[59]:
matrix([[4, 5, 6],
        [7, 8, 9]])
In[60]: mat1
Out[60]: matrix([[1, 2, 3]])

3. Extract one column of a matrix as a list

x=dataMat[:,1].T.tolist()[0]

4. Call a web API with urllib

from urllib import request
from urllib import parse
import json

def geoGrab(stAddress, city):
    apiStem = 'http://where.yahooapis.com/geocode?'
    params = {}
    params['flags'] = 'J'
    params['appid'] = 'aaa0VN6k'
    params['location'] = '%s %s' % (stAddress, city)
    url_params = parse.urlencode(params)  # Python 3 usage
    yahooApi = apiStem + url_params
    print(yahooApi)
    c = request.urlopen(yahooApi)  # Python 3 usage
    return json.loads(c.read())
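
The URL-building step can be tried on its own, without any network access. A sketch with made-up parameter values; `example.com` is a placeholder host, not the API used above.

```python
from urllib import parse

params = {'flags': 'J', 'location': '1 Main St Springfield'}  # made-up values
query = parse.urlencode(params)     # spaces become '+', special characters are escaped
print(query)    # flags=J&location=1+Main+St+Springfield

# The reverse direction: parse a query string back into a dict of lists.
decoded = parse.parse_qs(query)
print(decoded)  # {'flags': ['J'], 'location': ['1 Main St Springfield']}

url = 'http://example.com/geocode?' + query   # example.com is a placeholder host
print(parse.urlsplit(url).path)               # /geocode
```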

5. Filter a dict

In[3]: d
Out[3]: {'a': 1, 't': 3, 'b': 2}
In[4]: newMap={k:v for k,v in d.items() if v>2}
In[5]: newMap
Out[5]: {'t': 3}

Plotting

1. Scatter plot of x, y

import matplotlib.pyplot as plt
plt.plot(x,y,'o')  # 'o' draws circles; for points use plt.plot(x,y,'.')
plt.show()

2. Draw multiple plots

fig=plt.figure()

rect=[0.05,0.05,0.9,0.9]

ax1=fig.add_axes(rect,label='ax1',frameon=False)

x=[1,2,3,4]
y=[2,3,4,5]

ax1.scatter(x,y,marker='*',s=20)

plt.show()

3. Dates on the x-axis

import datetime
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

xArr.append(datetime.datetime(int(dateArr[0]), int(dateArr[1]), int(dateArr[2])))

fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(xArr, yArr)

days = mdates.DayLocator()  # every day
daysFmt = mdates.DateFormatter('%m-%d')
ax.xaxis.set_major_locator(days)
ax.xaxis.set_major_formatter(daysFmt)
fig.autofmt_xdate()

plt.show()

4. Histogram

lst=[1,1,1,2]
plt.hist(lst,2)  # 2 is the number of bins
plt.show()

5. Correlation matrix heatmap

import seaborn as sns
import matplotlib.pyplot as plt

colormap = plt.cm.viridis
plt.figure(figsize=(12,12))
plt.title('Pearson Correlation of Features', y=1.05, size=15)
sns.heatmap(train.astype(float).corr(),linewidths=0.1,vmax=1.0, square=True, cmap=colormap, linecolor='white', annot=True)

Fundamentals

yield, generators
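
The note gives no example here, so the following is a minimal sketch of a generator function. The name `countdown` is illustrative.

```python
def countdown(n):
    """Yield n, n-1, ..., 1, producing one value per request."""
    while n > 0:
        yield n      # execution pauses here until the next value is asked for
        n -= 1

gen = countdown(3)
print(next(gen))     # 3
print(list(gen))     # [2, 1]  (the remaining values)
print(list(gen))     # []      (the generator is now exhausted)

# A generator expression is the inline form and is just as lazy.
squares = (x * x for x in range(4))
print(sum(squares))  # 14
```

Because values are produced on demand, a generator can describe a long or even unbounded sequence without holding it in memory.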

sklearn

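cross_val_score (sklearn.cross_validation; sklearn.model_selection in current releases)

score = cross_val_score(rand_forest, X[:, i:i + 1], Y, scoring='r2', cv=5)
# rand_forest: the estimator
# cv: 5 means 5-fold cross-validation
# scoring: the evaluation metric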

Detailed documentation

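ShuffleSplit (sklearn.cross_validation; sklearn.model_selection in current releases)

ShuffleSplit(n, 3, .3)
# n: number of samples
# 3: number of iterations
# 0.3: proportion of the data used as the test set
# in sklearn.model_selection the same split is ShuffleSplit(n_splits=3, test_size=.3)
score = cross_val_score(rand_forest, X[:, i:i + 1], Y, scoring='r2', cv= ShuffleSplit(n, 3, .3))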
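
To make the three parameters concrete, here is a pure-Python sketch of what a shuffle split does. `shuffle_split` is a hypothetical helper written for illustration, not sklearn's implementation, and it rounds the test-set size in a simplified way.

```python
import random

def shuffle_split(n, n_iter, test_size, seed=0):
    """Yield (train_indices, test_indices) pairs from fresh random permutations."""
    rng = random.Random(seed)                # seeded for repeatable output
    n_test = int(round(n * test_size))       # simplified rounding (assumption)
    for _ in range(n_iter):
        indices = list(range(n))
        rng.shuffle(indices)                 # a new permutation each round
        yield indices[n_test:], indices[:n_test]

for train, test in shuffle_split(n=10, n_iter=3, test_size=0.3):
    print(len(train), len(test))             # 7 3 on every round
```

Unlike k-fold splitting, the test sets of different rounds may overlap, since each round shuffles independently.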

Detailed documentation

Standardization: StandardScaler

from sklearn.preprocessing import StandardScaler

data = StandardScaler().fit_transform(iris.data)

Range scaling: MinMaxScaler

from sklearn.preprocessing import MinMaxScaler

data=MinMaxScaler().fit_transform(iris.data)
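
Both scalers work column by column. The arithmetic behind them can be sketched in plain Python for a single column; the values are made up, and the population standard deviation (dividing by n) is used, which is what StandardScaler uses.

```python
from statistics import mean, pstdev

column = [1.0, 2.0, 3.0, 4.0, 5.0]   # one feature column, made-up values

# Standardization: subtract the mean, divide by the population std.
mu, sigma = mean(column), pstdev(column)
standardized = [(x - mu) / sigma for x in column]

# Range scaling: map the minimum to 0 and the maximum to 1.
lo, hi = min(column), max(column)
min_max = [(x - lo) / (hi - lo) for x in column]

print(min_max)   # [0.0, 0.25, 0.5, 0.75, 1.0]
print(round(mean(standardized), 12), round(pstdev(standardized), 12))   # 0.0 1.0
```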

Normalization: Normalizer
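
The note gives no code for Normalizer. By analogy with the two scalers above, the library call would presumably be `Normalizer().fit_transform(iris.data)`. With its default L2 norm, Normalizer rescales each sample (row) to unit length rather than working per column. The helper below, `l2_normalize_rows`, is a hypothetical pure-Python sketch of that arithmetic, not sklearn's code.

```python
import math

def l2_normalize_rows(rows):
    """Scale each row to unit Euclidean length; all-zero rows are left unchanged."""
    result = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row))
        result.append([x / norm for x in row] if norm else list(row))
    return result

samples = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]   # made-up data
print(l2_normalize_rows(samples))   # [[0.6, 0.8], [0.0, 0.0], [1.0, 0.0]]
```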
