thresh in dropna for DataFrame in pandas in python


import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(15).reshape(5, 3))
df1.iloc[:4, 1] = np.nan
df1.iloc[:2, 2] = np.nan
df1.dropna(thresh=1, axis=1)
It seems that no NaN values have been dropped:
    0     1     2
0   0   NaN   NaN
1   3   NaN   NaN
2   6   NaN   8.0
3   9   NaN  11.0
4  12  13.0  14.0
If I run
df1.dropna(thresh=2, axis=1)
why does it give the following?
    0     2
0   0   NaN
1   3   NaN
2   6   8.0
3   9  11.0
4  12  14.0
I just don't understand what thresh is doing here. If a column has more than one NaN value, should the column be deleted?
1 Answer
thresh=N requires that a column has at least N non-NaNs to survive. In the first example, both columns that contain NaNs still have at least one non-NaN, so both survive. In the second example, of those two columns only the last one has at least two non-NaNs, so it survives, but the previous column is dropped.
Try setting thresh to 4 to get a better sense of what's happening.
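To check this against the counts, here is a minimal sketch that rebuilds the frame from the question and compares the number of non-NaN values per column with what dropna keeps. Two things here are my own assumptions rather than part of the question: the frame is created as float so NaN can be stored without changing the dtype, and the helper name surviving_columns is hypothetical.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question (float dtype, so NaN fits without an upcast).
df1 = pd.DataFrame(np.arange(15, dtype=float).reshape(5, 3))
df1.iloc[:4, 1] = np.nan
df1.iloc[:2, 2] = np.nan

# Number of non-NaN values in each column: 5, 1 and 3.
non_nan_counts = df1.count()
print(non_nan_counts.tolist())  # [5, 1, 3]


def surviving_columns(df, n):
    """Hypothetical helper: labels of the columns kept by dropna(thresh=n, axis=1)."""
    return df.dropna(thresh=n, axis=1).columns.tolist()


# A column survives only if its non-NaN count is at least thresh.
for n in (1, 2, 4):
    print(n, surviving_columns(df1, n))
# 1 [0, 1, 2]
# 2 [0, 2]
# 4 [0]
```

With thresh=4, column 2 is dropped as well, because it has only three non-NaN values.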
Thank you, now I understand. I thought thresh set a limit on the number of NaN values, but I was wrong: thresh refers to non-NaN values.
– AAA
2 days ago
"If a column has more than one nan value, should the column be deleted?". No. If the column has N non-null values or more, it should not be deleted. Convince yourself that this isn't the same thing.
– coldspeed
2 days ago
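If the goal really is to drop columns by how many NaNs they contain, one possible approach (a sketch, not something proposed in the answer or comments) is to convert the NaN limit into a thresh: a column with at most k NaNs out of len(df1) rows has at least len(df1) - k non-NaNs. The name max_nans is hypothetical, and the frame is again built as float by assumption.

```python
import numpy as np
import pandas as pd

# Same frame as in the question, built as float so NaN can be assigned directly.
df1 = pd.DataFrame(np.arange(15, dtype=float).reshape(5, 3))
df1.iloc[:4, 1] = np.nan
df1.iloc[:2, 2] = np.nan

max_nans = 2  # hypothetical limit: keep columns with at most 2 NaNs

# "At most max_nans NaNs" is the same as "at least len(df1) - max_nans non-NaNs".
by_thresh = df1.dropna(thresh=len(df1) - max_nans, axis=1)

# Equivalent version that counts the NaNs per column explicitly.
by_mask = df1.loc[:, df1.isna().sum() <= max_nans]

print(by_thresh.columns.tolist())  # [0, 2]
print(by_mask.columns.tolist())    # [0, 2]
```

Column 1 has four NaNs, so it is the only one dropped. Column 2 has exactly two and is kept.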