Darren DeRidder / @73rhodes

machine learning

naive bayesian classifiers

node.js

@73rhodes • github/darrenderidder • 51elliot.blogspot.com

Computer Systems Engineer

Real-time • AAA • Network Security • Mobile

Tech lead on Kindsight Mobile Security @ Alcatel

Mobile World Congress • Blackhat 2013

@ottawa_js organizer

"I Am Not A Data Scientist"

(IANADS)

and that's ok!

There are lots of tools available for us mortals.

simple, yet surprisingly effective

- filter out spam
- figure out if a page is about apples (fruit) or computers
- guess gender given height, weight and shoe size
- etc!

` P(A|B) = (P(B|A)P(A)) / (P(B)) = ...`

`= (P(B|A)P(A)) / ( P(B|A) P(A) + P(B|not A) P(not A) )`
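As a rough illustration of the rule above, here is a minimal Node.js sketch that computes `P(A|B)` from a prior and two conditional probabilities, expanding `P(B)` with the law of total probability. The function name `bayes`, its parameter names, and the sample numbers are assumptions made for this example; they do not come from the talk.

```javascript
// Bayes' theorem with the evidence term P(B) expanded as
// P(B|A)P(A) + P(B|not A)P(not A).
//   pA          : prior P(A)
//   pBGivenA    : P(B|A)
//   pBGivenNotA : P(B|not A)
function bayes(pA, pBGivenA, pBGivenNotA) {
  const numerator = pBGivenA * pA;
  const evidence = numerator + pBGivenNotA * (1 - pA); // P(B)
  return numerator / evidence;
}

// Hypothetical numbers, for illustration only:
// half of all mail is spam, a given word appears in 80% of spam
// and in 10% of non-spam.
const pSpamGivenWord = bayes(0.5, 0.8, 0.1);
console.log(pSpamGivenWord.toFixed(3));
```

Note that `P(B|not A)` is passed in as its own input, because it generally cannot be derived from `P(B|A)`.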

`P(A|W_1,...,W_n) = ( prod_(i=1)^n P(A|W_i) ) / ( (prod_(i=1)^n P(A|W_i)) + (prod_(i=1)^n (1 - P(A|W_i))) )`

Or, in Plain English

a box of chocolates.

You never know what you're gonna get.

(But you can make a pretty good guess!)

| | Nuts | No Nuts |
|---|---|---|
| Round | 25% | 75% |
| Square | 75% | 25% |
| Dark | 10% | 90% |
| Light | 90% | 10% |

What if we pick a round, light chocolate?

A round, light chocolate...

| | Nuts | No Nuts | P(Nuts) | P(NoNuts) |
|---|---|---|---|---|
| Round | .25 | .75 | .25 | .75 |
| Square | .75 | .25 | - | - |
| Dark | .10 | .90 | - | - |
| Light | .90 | .10 | .90 | .10 |
| `prod_(i=1)^n P_i` | | | .225 | .075 |

`x = 0.225 / 0.075 = 3`

A round, light chocolate is 3 times as likely to have nuts as not.

(This is a likelihood ratio.)

Classify as "Nuts" or "No Nuts", with some level of certainty.

`P(N) = 0.225 / (0.225 + 0.075) = 0.75 = 75%`

(We're 75% sure this chocolate has nuts.)
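The chocolate example can be reproduced in a few lines of Node.js. This is only a sketch: the per-trait probabilities are the ones from the table, while the names `pNutsGivenTrait` and `classifyChocolate` are made up for illustration. Like the combining formula, it treats the traits as independent and the two classes as equally likely beforehand.

```javascript
// P(Nuts | trait), taken from the chocolate table.
const pNutsGivenTrait = { round: 0.25, square: 0.75, dark: 0.10, light: 0.90 };

// Multiply the per-trait probabilities for each class, then normalize.
function classifyChocolate(traits) {
  let nuts = 1;
  let noNuts = 1;
  for (const trait of traits) {
    const p = pNutsGivenTrait[trait];
    if (p === undefined) throw new Error(`unknown trait: ${trait}`);
    nuts *= p;       // product of P(Nuts | trait)
    noNuts *= 1 - p; // product of P(NoNuts | trait)
  }
  return {
    ratio: nuts / noNuts,          // how many times as likely "Nuts" is
    pNuts: nuts / (nuts + noNuts), // normalized confidence in "Nuts"
  };
}

const result = classifyChocolate(["round", "light"]);
console.log(result.ratio.toFixed(2)); // 3.00
console.log(result.pNuts.toFixed(2)); // 0.75
```

The values are rounded with `toFixed` because floating-point products such as `0.75 * 0.1` are not exactly `0.075`.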

Optimized binary classifier for limited vocabularies.

Leverages "missing" traits to improve accuracy by ~10%.
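The slides do not say how "missing" traits are used. One common way to let absent traits count as evidence is a Bernoulli-style naive Bayes model over a fixed vocabulary, where a trait that is present contributes `P(trait|class)` and a trait that is absent contributes `1 - P(trait|class)`. The sketch below assumes that approach; it is not the classifier described in the talk, and the vocabulary, the probabilities, and the names `model` and `classifyBernoulli` are all hypothetical.

```javascript
// Hypothetical model: P(trait | class) for a three-word vocabulary.
// These numbers are invented for illustration, not taken from the talk.
const model = {
  prior: { spam: 0.5, ham: 0.5 },
  pTrait: {
    spam: { free: 0.6, winner: 0.4, meeting: 0.05 },
    ham: { free: 0.1, winner: 0.02, meeting: 0.5 },
  },
};

// Score every class over the WHOLE vocabulary, so that traits missing
// from the input also contribute evidence via (1 - P(trait | class)).
function classifyBernoulli(presentTraits) {
  const present = new Set(presentTraits);
  const logScore = {};
  for (const cls of Object.keys(model.prior)) {
    let score = Math.log(model.prior[cls]);
    for (const [trait, p] of Object.entries(model.pTrait[cls])) {
      score += Math.log(present.has(trait) ? p : 1 - p);
    }
    logScore[cls] = score;
  }
  // Convert log scores to probabilities that sum to 1.
  const max = Math.max(...Object.values(logScore));
  const probs = {};
  let total = 0;
  for (const [cls, score] of Object.entries(logScore)) {
    probs[cls] = Math.exp(score - max);
    total += probs[cls];
  }
  for (const cls of Object.keys(probs)) probs[cls] /= total;
  return probs;
}

console.log(classifyBernoulli(["free"]));
console.log(classifyBernoulli(["meeting"]));
```

Summing logarithms instead of multiplying raw probabilities keeps the scores from underflowing as the vocabulary grows. Iterating over the full vocabulary is only practical when that vocabulary is small, which fits the "limited vocabularies" framing.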

Used in production...