thresh in dropna for DataFrame in pandas in python



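import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(15).reshape(5, 3))
df1.iloc[:4, 1] = np.nan   # column 1: NaN in rows 0-3
df1.iloc[:2, 2] = np.nan   # column 2: NaN in rows 0-1
print(df1.dropna(thresh=1, axis=1))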



It seems that no NaN values have been dropped:


    0     1     2
0   0   NaN   NaN
1   3   NaN   NaN
2   6   NaN   8.0
3   9   NaN  11.0
4  12  13.0  14.0



If I run


df1.dropna(thresh=2,axis=1)



why does it give the following?


    0     2
0   0   NaN
1   3   NaN
2   6   8.0
3   9  11.0
4  12  14.0



I just don't understand what thresh is doing here. If a column has more than one NaN value, should the column be deleted?





"If a column has more than one nan value, should the column be deleted?". No. If the column has N non-null values or more, it should not be deleted. Convince yourself that this isn't the same thing.
– coldspeed
2 days ago
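A minimal sketch of the rule stated in this comment, assuming a recent pandas: with axis=1, thresh=N keeps exactly the columns whose non-null count is at least N. The helper keep_if_at_least is a hypothetical name used only for this comparison, and building the frame with a float dtype up front is an assumption so that assigning NaN never needs an integer-to-float upcast.

```python
import numpy as np
import pandas as pd

# Same NaN pattern as the question; the float dtype is an assumption
# (it avoids assigning NaN into an integer column).
df1 = pd.DataFrame(np.arange(15, dtype=float).reshape(5, 3))
df1.iloc[:4, 1] = np.nan
df1.iloc[:2, 2] = np.nan


def keep_if_at_least(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Hypothetical helper: keep the columns with at least n non-null values."""
    return df.loc[:, df.notna().sum() >= n]


# For every threshold from 1 to one past the row count, compare the columns
# kept by dropna(thresh=n, axis=1) with the explicit count-based filter.
same = all(
    df1.dropna(thresh=n, axis=1).columns.tolist()
    == keep_if_at_least(df1, n).columns.tolist()
    for n in range(1, len(df1) + 2)
)
print(same)
```

Column 2 shows the difference the comment points at: it has two NaNs (more than one) but three non-null values, so thresh=2 keeps it.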




1 Answer



thresh=N requires that a column have at least N non-NaN values to survive. In the first example, both columns that contain NaNs (1 and 2) have at least one non-NaN value, so both survive. In the second example, only column 2 has at least two non-NaN values, so it survives, while column 1 (a single non-NaN value) is dropped. Column 0 has no NaNs at all, so it survives in both cases.
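The counts behind this explanation can be checked directly. A small sketch, assuming a recent pandas, that rebuilds the question's frame (with a float dtype, an assumption for clean NaN assignment) and prints the non-NaN count of each column next to the columns that thresh=1 and thresh=2 keep:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(15, dtype=float).reshape(5, 3))
df1.iloc[:4, 1] = np.nan   # column 1: only row 4 is non-NaN
df1.iloc[:2, 2] = np.nan   # column 2: rows 2-4 are non-NaN

# Non-NaN count per column: this is the number thresh is compared against.
counts = df1.notna().sum()
for col, n_valid in counts.items():
    print(f"column {col}: {n_valid} non-NaN")

for n in (1, 2):
    kept = df1.dropna(thresh=n, axis=1).columns.tolist()
    print(f"thresh={n}: keeps columns {kept}")
```

The counts come out as 5, 1 and 3, so thresh=1 keeps [0, 1, 2] and thresh=2 keeps [0, 2].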





Try setting thresh to 4 to get a better sense of what's happening.
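A sketch of that suggestion, assuming a recent pandas and the same frame built with a float dtype (an assumption, as above). With thresh=4 only column 0 survives, and sweeping the threshold shows the columns dropping out one at a time:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(15, dtype=float).reshape(5, 3))
df1.iloc[:4, 1] = np.nan
df1.iloc[:2, 2] = np.nan

# Column 0 has 5 non-NaNs, column 1 has 1, column 2 has 3,
# so only column 0 reaches a threshold of 4.
print(df1.dropna(thresh=4, axis=1))

# Sweep thresh from 1 to one past the number of rows.
for n in range(1, len(df1) + 2):
    print(n, df1.dropna(thresh=n, axis=1).columns.tolist())
```

thresh=3 still keeps column 2, whose three non-NaN values exactly meet the threshold, and thresh=6 exceeds the five rows, so every column is dropped.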







Thank you, now I understand. I thought the threshold controlled the number of NaN values. I was wrong: thresh refers to non-NaN values.
– AAA
2 days ago
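Because thresh counts non-NaN values, the rule originally expected (drop a column when it has more than k NaNs) can still be expressed with it: a column with at most k NaNs out of len(df) rows has at least len(df) - k non-NaN values. A sketch under that reasoning, assuming a recent pandas; drop_cols_with_more_than is a hypothetical helper name, and the float-dtype frame is an assumption as before:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(15, dtype=float).reshape(5, 3))
df1.iloc[:4, 1] = np.nan   # 4 NaNs in column 1
df1.iloc[:2, 2] = np.nan   # 2 NaNs in column 2


def drop_cols_with_more_than(df: pd.DataFrame, k: int) -> pd.DataFrame:
    """Hypothetical helper: drop the columns that contain more than k NaNs."""
    # With axis=1, thresh is compared against each column's non-NaN count,
    # and at most k NaNs out of len(df) rows means at least len(df) - k non-NaNs.
    return df.dropna(thresh=len(df) - k, axis=1)


for k in (0, 1, 2, 4):
    kept = drop_cols_with_more_than(df1, k).columns.tolist()
    # Cross-check by counting NaNs per column directly.
    direct = df1.loc[:, df1.isna().sum() <= k].columns.tolist()
    print(f"k={k}: thresh={len(df1) - k}, keeps {kept}, direct check {direct}")
```

With k=1 (the "more than one NaN" rule from the question) only column 0 is kept, which is why thresh=2 and that rule give different results.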





