rdd.countApprox taking as long as count()
My code looks like
foo.rdd.countApprox(1000, 0.9) => takes 7.1 minutes
foo.count() => takes 7.1 minutes
Is there anything I am missing? foo is a df, and I am trying to reduce the time it takes to count() the number of records in foo.
As you can see below, in some cases it takes more time and in other cases it takes less time. I am confused :(
Seems like you should be comparing the first to foo.rdd.count()
– pault
8 mins ago
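To make the comparison suggested in the comment concrete, here is a minimal PySpark-style sketch that times all three calls side by side. It assumes foo is a pyspark.sql.DataFrame; the helper names timed and compare_counts are hypothetical and not part of Spark. The reason for including foo.rdd.count() is that both it and foo.rdd.countApprox(...) go through the DataFrame-to-RDD conversion, so comparing those two is likely the fairer test than comparing against foo.count().

```python
import time


def timed(label, action):
    """Run a zero-argument callable and return (label, result, elapsed_seconds)."""
    start = time.perf_counter()
    result = action()
    return label, result, time.perf_counter() - start


def compare_counts(df, timeout_ms=1000, confidence=0.9):
    """Time the three counting strategies on the same DataFrame.

    df is assumed to be a pyspark.sql.DataFrame (hypothetical helper,
    not a Spark API). timeout_ms and confidence are passed straight to
    RDD.countApprox, matching the values used in the question.
    """
    rdd = df.rdd  # conversion shared by both RDD-based calls
    return [
        timed("rdd.countApprox", lambda: rdd.countApprox(timeout_ms, confidence)),
        timed("rdd.count", lambda: rdd.count()),
        timed("df.count", lambda: df.count()),
    ]
```

Usage would be something like `for label, n, secs in compare_counts(foo): print(label, n, secs)`. Run order and caching can skew the numbers, so it may be worth repeating the measurement a few times before drawing conclusions.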