This is a repost of a blog post I wrote back in 2008 on my old, old blog. At the time, I was reading Toby Segaran’s book Programming Collective Intelligence.

His book shows how to use algorithms like the Tanimoto coefficient to mine sites like Facebook, MovieLens, Delicious, blogs and more.

Tanimoto similarity and distance

A method of classification based on a similarity ratio, and a derived distance function, […] “Tanimoto similarity” and “Tanimoto Distance”. The similarity ratio is equivalent to Jaccard similarity, but the distance function is not the same as Jaccard distance. Here is the PowerShell version:

    function Get-TanimotoCoefficient ($q, $t) {
        # Items that appear in both lists; @() keeps zero or one match as an array
        $c = @($q | where { $t -contains $_ })
        # Shared items divided by the total number of distinct items
        $c.Count / ($q.Count + $t.Count - $c.Count)
    }

    # Results
    PS C:\> $list = echo shirts socks pants shoes
    PS C:\> Get-TanimotoCoefficient $list (echo skirts socks shoes)
    0.4
    PS C:\> Get-TanimotoCoefficient $list (echo shirts socks)
    0.5
    PS C:\> Get-TanimotoCoefficient $list (echo socks pants hat gloves)
    0.333333333333333
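
For reference, here is a sketch of the ratio the function computes, written in set notation. The symbols Q and T are my own labels for the two lists ($q and $t in the function), and the formula assumes neither list contains duplicates. The distance formula on the last line is the one usually quoted alongside the similarity ratio; I am including it as an assumption, since the quote above elides that part.

```latex
% Tanimoto similarity of two item sets Q and T
T_s(Q, T) = \frac{|Q \cap T|}{|Q| + |T| - |Q \cap T|}

% Worked example: Q = {shirts, socks, pants, shoes}, T = {skirts, socks, shoes}
% Shared items: {socks, shoes}, so |Q \cap T| = 2
T_s(Q, T) = \frac{2}{4 + 3 - 2} = \frac{2}{5} = 0.4

% Derived distance as usually stated (assumption, not taken from the quote above)
T_d(Q, T) = -\log_2 T_s(Q, T)
```

The worked value matches the first result printed by the function. Identical lists give a similarity of 1, and lists with nothing in common give 0.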

