A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_in
One day while tinkering with pandas, I ran into this warning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
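In other words, a value is being assigned to a copy of a slice taken from a DataFrame, and pandas suggests assigning through .loc instead.
So I looked into it, and the behavior seemed odd.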
[code]
In [233]: import pandas as pd

In [234]: A = pd.DataFrame([[1,2,3], [7,8,9],[14,15,16]], columns = ['a', 'b', 'c'])

In [235]: A
Out[235]:
    a   b   c
0   1   2   3
1   7   8   9
2  14  15  16

Assign the first column of A to B. B is a Series, and modifying one element of B also modifies A.
[code]
In [236]: B = A['a']

In [237]: B
Out[237]:
0     1
1     7
2    14
Name: a, dtype: int64

In [238]: type(B)
Out[238]: pandas.core.series.Series

In [239]: B[0] = 3

In [240]: B
Out[240]:
0     3
1     7
2    14
Name: a, dtype: int64

In [241]: A
Out[241]:
    a   b   c
0   3   2   3
1   7   8   9
2  14  15  16
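Whether a Series obtained with A['a'] aliases A, as in the session above, is not something to rely on; newer pandas versions with Copy-on-Write enabled behave differently. A minimal sketch of stating the intent explicitly (the value 100 is made up for illustration):

```python
import pandas as pd

A = pd.DataFrame([[1, 2, 3], [7, 8, 9], [14, 15, 16]], columns=['a', 'b', 'c'])

# Intent 1: change A itself. A single .loc call on A is unambiguous.
A.loc[0, 'a'] = 3

# Intent 2: work on an independent Series. An explicit copy decouples B from A.
B = A['a'].copy()
B.loc[1] = 100  # hypothetical value; this write only touches B

print(A['a'].tolist())  # row 1 of A is still 7
print(B.tolist())       # row 1 of B is now 100
```

With both intents spelled out, the result no longer depends on whether A['a'] happens to be a view or a copy.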
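Next, assign the first and second columns of A to C. Is C a copy of a slice of A? C is a DataFrame. Modifying the first row of its first column triggers the warning; C is modified, but A is not.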
[code]
In [243]: C = A[['a', 'b']]

In [244]: C
Out[244]:
    a   b
0   3   2
1   7   8
2  14  15

In [245]: type(C)
Out[245]: pandas.core.frame.DataFrame

In [246]: C['a'][0]
Out[246]: 3

In [247]: C['a'][0] =5
c:\program files\python36\lib\site-packages\IPython\core\interactiveshell.py:2910: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  exec(code_obj, self.user_global_ns, self.user_ns)

In [248]: C
Out[248]:
    a   b
0   5   2
1   7   8
2  14  15

In [249]: A
Out[249]:
    a   b   c
0   3   2   3
1   7   8   9
2  14  15  16

Build D with A's loc method. D looks the same as C, and repeating the same operation on it raises no warning.
[code]
In [250]: D = A.loc[:, ['a','b']]

In [251]: D
Out[251]:
    a   b
0   3   2
1   7   8
2  14  15

In [252]: type(D)
Out[252]: pandas.core.frame.DataFrame

In [253]: D['a'][0]
Out[253]: 3

In [254]: D['a'][0] = 5

In [255]: D
Out[255]:
    a   b
0   5   2
1   7   8
2  14  15

In [256]: A
Out[256]:
    a   b   c
0   3   2   3
1   7   8   9
2  14  15  16

What is the difference between C and D? I tried one way of avoiding the warning for C:
[code]C = C.copy()
After that, modifying values no longer triggers the warning.
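The workaround can be checked with a short script. This is only a sketch: it records warnings with the standard warnings module and matches the warning class by name, which is an assumption made so the check does not depend on where a given pandas version exposes that class.

```python
import warnings
import pandas as pd

A = pd.DataFrame([[1, 2, 3], [7, 8, 9], [14, 15, 16]], columns=['a', 'b', 'c'])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning instead of printing it
    C = A[['a', 'b']].copy()         # explicit copy: C is now independent of A
    C.loc[0, 'a'] = 5                # single .loc call instead of C['a'][0] = 5

# Keep only warnings whose class is named SettingWithCopyWarning.
copy_warnings = [w for w in caught if w.category.__name__ == "SettingWithCopyWarning"]

print(len(copy_warnings))            # number of SettingWithCopyWarning records
print(C.loc[0, 'a'], A.loc[0, 'a'])  # C changed, A untouched
```

Combining .copy() with a single .loc assignment makes it explicit that only C is meant to change.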
I then read the official documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
The gist is as follows:
[code]
In [339]: dfmi = pd.DataFrame([list('abcd'),
   .....:                      list('efgh'),
   .....:                      list('ijkl'),
   .....:                      list('mnop')],
   .....:                     columns=pd.MultiIndex.from_product([['one','two'],
   .....:                                                         ['first','second']]))
   .....:

In [340]: dfmi
Out[340]:
    one          two
  first second first second
0     a      b     c      d
1     e      f     g      h
2     i      j     k      l
3     m      n     o      p

If you use the loc method,
[code]dfmi.loc[:,('one','second')] = value
pandas treats it as (it is equivalent to) a call to loc's __setitem__ method:
[code]
# becomes
dfmi.loc.__setitem__((slice(None), ('one', 'second')), value)

If you assign like this instead,
[code]dfmi['one']['second'] = value
pandas internally treats it as:
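[code]
# becomes
dfmi.__getitem__('one').__setitem__('second', value)

That is, __getitem__ is called first and returns a DataFrame, and then __setitem__ is called on that returned object. Two calls are made in sequence, which is known as chained indexing, and it is slower than a single loc call.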
But pandas does not normally warn you just because something takes a little longer. The warning exists because pandas cannot guarantee whether the object returned by the first call is a view or a copy; that depends on the memory layout of the array. If it is a view, all is well. If it is a copy, assigning to that copy leaves the original variable unchanged. pandas cannot guarantee whether __setitem__ will modify dfmi or a temporary object that is thrown away immediately afterward, so it is best to use loc. The documentation puts it this way:
(What’s up with the SettingWithCopy warning? We don’t usually throw warnings around when you do something that might cost a few extra milliseconds! But it turns out that assigning to the product of chained indexing has inherently unpredictable results. Outside of simple cases, it’s very hard to predict whether it will return a view or a copy (it depends on the memory layout of the array, about which pandas makes no guarantees), and therefore whether the __setitem__ will modify dfmi or a temporary object that gets thrown out immediately afterward. That’s what SettingWithCopy is warning you about!)
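The contrast described above can be tried on the dfmi frame from the documentation. This is a rough sketch, not a definitive test: the fill values 'z' and 'y' are made up, and the outcome of the chained form is deliberately only reported, since it may differ between pandas versions and settings.

```python
import warnings
import pandas as pd

dfmi = pd.DataFrame(
    [list('abcd'), list('efgh'), list('ijkl'), list('mnop')],
    columns=pd.MultiIndex.from_product([['one', 'two'], ['first', 'second']]),
)

# One call: loc.__setitem__ operates on dfmi directly.
dfmi.loc[:, ('one', 'second')] = 'z'
loc_result = dfmi[('one', 'second')].tolist()

# Two calls: __getitem__('one') returns an intermediate DataFrame, then
# __setitem__('second', ...) runs on that intermediate object.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record warnings instead of printing them
    dfmi['one']['second'] = 'y'

# Report, rather than assume, whether the chained write reached dfmi.
chained_took_effect = dfmi[('one', 'second')].tolist() == ['y'] * 4
warning_names = sorted({w.category.__name__ for w in caught})

print(loc_result)           # every row of ('one', 'second') is 'z'
print(chained_took_effect)  # not guaranteed either way
print(warning_names)        # names of any warnings pandas raised
```

Only the loc form has a result that can be stated in advance, which is the point the documentation makes.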
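Back to my own problem. Judging from the runs above, C is a copy of a slice of A, since changing C has no effect on A. So why is there still a warning? My guess is that pandas internally regards C as the "temporary object that gets thrown out immediately" mentioned above, whereas B is a view of a slice of A, so no warning was raised for it.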