The apply function vs. Multiprocessing.map

%time df.some_col.apply(lambda x: clean_transform_kthx(x))
Wall time: HAH! RIP BUDDY
# WHY YOU NO RUN IN PARALLEL!?

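"plyr is a set of tools for a common set of problems: you need to split a big data structure into homogeneous pieces, apply a function to each piece, and then combine all the results back together."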
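The split-apply-combine pattern in that quote can be sketched in plain Python. This is a minimal illustration under stated assumptions, not plyr's or pandas' implementation: `split_apply_combine`, its `mapper` parameter, and `total_value` are hypothetical names. With the built-in `map` the groups run one after another. Swapping in a `multiprocessing.Pool`'s `map` would run them in parallel, provided `func` is a picklable top-level function.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def split_apply_combine(
    records: Iterable[T],
    key: Callable[[T], Hashable],
    func: Callable[[List[T]], R],
    mapper=map,
) -> Dict[Hashable, R]:
    """Hypothetical helper: split records by key, apply func per group, combine."""
    # Split: bucket every record under its key.
    groups: Dict[Hashable, List[T]] = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)
    # Apply: run func on each group (sequential with the built-in map,
    # parallel if mapper is, e.g., a Pool's map).
    keys = list(groups)
    results = mapper(func, [groups[k] for k in keys])
    # Combine: pair each result back up with its group's key.
    return dict(zip(keys, results))


def total_value(rows):
    """Example per-group function: sum the second field of each row."""
    return sum(value for _, value in rows)


sales = [("a", 1), ("b", 2), ("a", 3)]
print(split_apply_combine(sales, key=lambda r: r[0], func=total_value))
# {'a': 4, 'b': 2}
```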

my_df.apply(lambda x: nearest_street(x.lat, x.lon), axis=1)

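dd.from_pandas(my_df, npartitions=nCores).\
    map_partitions(
        lambda df: df.apply(
            lambda x: nearest_street(x.lat, x.lon), axis=1)).\
    compute(scheduler="processes")  # run the partitions in separate processes
# imports at the end

import dask.dataframe as dd
from multiprocessing import cpu_count
nCores = cpu_count()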
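The dask call above splits `my_df` into `nCores` partitions and runs the row-wise `apply` on each one. Roughly the same idea can be sketched with only the standard library, which is what a hand-rolled `multiprocessing` map looks like. This is an assumption-laden sketch, not dask's implementation: `partition`, `map_in_partitions`, and `square_rows` are hypothetical names, and `square_rows` stands in for the per-partition `df.apply(...)` call.

```python
from multiprocessing import cpu_count
from typing import Callable, List, Sequence


def partition(seq: Sequence, n: int) -> List[Sequence]:
    """Split seq into n contiguous, nearly equal parts (like npartitions=n)."""
    k, r = divmod(len(seq), n)
    parts, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < r else 0)  # the first r parts get one extra item
        parts.append(seq[start:end])
        start = end
    return parts


def map_in_partitions(func: Callable, seq, n_parts: int, mapper=map) -> list:
    """Apply func to each partition via mapper, then flatten the results."""
    parts = partition(list(seq), n_parts)
    results = mapper(func, parts)  # built-in map: sequential; Pool.map: parallel
    return [item for part in results for item in part]


def square_rows(rows):
    """Hypothetical per-partition worker standing in for df.apply(...)."""
    return [x * x for x in rows]


n_cores = cpu_count()
print(map_in_partitions(square_rows, range(10), n_cores))
# [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

To actually use several cores, create a `multiprocessing.Pool(n_cores)` under an `if __name__ == "__main__":` guard and pass `mapper=pool.map`. The worker has to be a top-level function so `multiprocessing` can pickle it.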

for i in intersections:
    l3 = np.sqrt((i[0] - i[1])**2 + (i[2] - i[3])**2)
    # ... Some more of these
    dist = l1 + l2
    if dist < (l3 * 1.2):
        matches.append(dist)
    # ... More stuff
### you get the idea, there's a for-loop checking to see if
### my points are close to my streets and then returning the
### closest
### I even used numpy, that means fast right?
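The loop above can be made concrete with a runnable sketch under stated assumptions. It assumes each row of `intersections` is `(x1, x2, y1, y2)`, meaning a block from `(x1, y1)` to `(x2, y2)`. That matches how `l3` is computed from columns 0/1 and 2/3. It also assumes `l1` and `l2` are the distances from the point to the block's two ends, as a later comment describes. A point counts as close when `l1 + l2 < 1.2 * l3`, the same test as the original. The function names are hypothetical, and `math.hypot` stands in for `np.sqrt` of a sum of squares.

```python
import math


def near_blocks_loop(px, py, intersections, tol=1.2):
    """Hypothetical loop version: (index, l1 + l2) for every block the point is close to."""
    matches = []
    for idx, (x1, x2, y1, y2) in enumerate(intersections):  # assumed column layout
        l1 = math.hypot(px - x1, py - y1)  # point to first end of the block
        l2 = math.hypot(px - x2, py - y2)  # point to second end of the block
        l3 = math.hypot(x1 - x2, y1 - y2)  # block length, as in the original l3
        dist = l1 + l2
        if dist < l3 * tol:
            matches.append((idx, dist))
    return matches


def closest_block_loop(px, py, intersections, tol=1.2):
    """Index of the matching block with the smallest l1 + l2, or None."""
    matches = near_blocks_loop(px, py, intersections, tol)
    if not matches:
        return None
    return min(matches, key=lambda m: m[1])[0]


blocks = [(0, 10, 0, 0), (0, 0, 0, 10), (20, 30, 20, 20)]
print(closest_block_loop(5, 1, blocks))  # 0
```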

# over one array
for i in range(len(array)):
    array[i] = array[i] * CONSTANT - CONSTANT2
# over two arrays
for i in range(len(array)):
    array[i] = array[i] + array2[i]

# over one array
array = (array * CONSTANT) - CONSTANT2
# over two arrays of same length
array = array + array2
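Here is a small check that the loop forms and the vectorized forms compute the same thing, with a rough timing comparison. `CONSTANT` and `CONSTANT2` are placeholder values chosen for illustration. The loop versions work on a copy so the inputs are left untouched. Timings depend on the machine, so none are claimed here.

```python
import timeit

import numpy as np

CONSTANT, CONSTANT2 = 3.0, 1.0  # placeholder values for illustration


def scale_loop(array):
    out = array.copy()  # work on a copy so the input is untouched
    for i in range(len(out)):
        out[i] = out[i] * CONSTANT - CONSTANT2
    return out


def add_loop(array, array2):
    out = array.copy()
    for i in range(len(out)):
        out[i] = out[i] + array2[i]
    return out


def scale_vec(array):
    return (array * CONSTANT) - CONSTANT2  # one NumPy call over the whole array


def add_vec(array, array2):
    return array + array2  # element-wise; arrays must be the same length


a = np.arange(5, dtype=float)
b = np.ones(5)
assert np.array_equal(scale_loop(a), scale_vec(a))
assert np.array_equal(add_loop(a, b), add_vec(a, b))

# Rough timing on a larger array; exact numbers depend on the machine.
big = np.random.rand(100_000)
print("loop:      ", timeit.timeit(lambda: scale_loop(big), number=3))
print("vectorized:", timeit.timeit(lambda: scale_vec(big), number=3))
```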

import numpy as np
from numba import jit

@jit  # numba magic
def some_func(intersections, l1_arr, l2_arr):
    l3_arr = np.sqrt((intersections[:, 0] - intersections[:, 1])**2 +
                     (intersections[:, 2] - intersections[:, 3])**2)
    # now l3 is an array containing all of my block lengths
    # likewise, l1 and l2 are now equal sized arrays
    # containing distance of point to all intersections
    dist = l1_arr + l2_arr
    match_arr = dist < (l3_arr * 1.2)
    # ... of my
    # point-to-street distances at once and have a handy
    # boolean index
    return match_arr
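The vectorized version can be sketched end to end in plain NumPy, without numba. It uses the same assumptions as the loop: rows of `intersections` are `(x1, x2, y1, y2)`, and `l1`/`l2` are the distances from the point to the block's two ends. `np.hypot` computes `sqrt(dx**2 + dy**2)` element-wise. The boolean `match_arr` then serves as an index to pick the closest matching block. The function names and the `tol` parameter are hypothetical.

```python
import numpy as np


def match_blocks(px, py, intersections, tol=1.2):
    """Hypothetical helper: distances and boolean match index for every block at once."""
    inter = np.asarray(intersections, dtype=float)
    # Assumed column layout: x1, x2, y1, y2.
    x1, x2, y1, y2 = inter[:, 0], inter[:, 1], inter[:, 2], inter[:, 3]
    l1_arr = np.hypot(px - x1, py - y1)  # point to first end of each block
    l2_arr = np.hypot(px - x2, py - y2)  # point to second end of each block
    l3_arr = np.hypot(x1 - x2, y1 - y2)  # length of each block
    dist = l1_arr + l2_arr
    match_arr = dist < (l3_arr * tol)  # boolean index of "close" blocks
    return dist, match_arr


def closest_block(px, py, intersections, tol=1.2):
    """Index of the matching block with the smallest l1 + l2, or None."""
    dist, match_arr = match_blocks(px, py, intersections, tol)
    if not match_arr.any():
        return None
    candidates = np.flatnonzero(match_arr)  # indices where match_arr is True
    return int(candidates[np.argmin(dist[match_arr])])


blocks = [(0, 10, 0, 0), (0, 0, 0, 10), (20, 30, 20, 20)]
print(match_blocks(5, 1, blocks)[1])  # [ True False False]
print(closest_block(5, 1, blocks))    # 0
```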

Ernest Kim is a master's student at the University of San Francisco, focusing on machine learning and data science.