Machine Learning with Node.js

Machine Learning in JavaScript

Darren DeRidder / @73rhodes

Preview

machine learning

naive bayesian classifiers

node.js

About Me

@73rhodes • github/darrenderidder • 51elliot.blogspot.com

Computer Systems Engineer

Real-time • AAA • Network Security • Mobile

Tech lead on Kindsight Mobile Security @ Alcatel

Mobile World Congress • Blackhat 2013

@ottawa_js organizer

Full Disclosure...

"I Am Not A Data Scientist"

(IANADS)

and that's ok!

There are lots of tools available for us mortals.

Naive Bayesian Classification

simple, yet surprisingly effective

Bayesian Filters Can...

filter out spam
figure out if a page is about apples (fruit) or computers
guess gender given height, weight and shoe size
etc!

Bayes' Theorem

`P(A|B) = (P(B|A)P(A)) / (P(B)) = ...`

`= (P(B|A)P(A)) / ( P(B|A) P(A) + P(B|not A)(1-P(A)))`
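
As a rough illustration of the theorem, here is a minimal JavaScript sketch. The function name `bayes` and the spam numbers are made up for this example and are not from the talk. The denominator expands P(B) using P(B|not A), the probability of seeing the evidence when A is false.

```javascript
// Bayes' theorem for a binary hypothesis A given evidence B.
//   pBGivenA:    P(B|A)
//   pA:          P(A), the prior
//   pBGivenNotA: P(B|not A)
function bayes(pBGivenA, pA, pBGivenNotA) {
  const numerator = pBGivenA * pA;
  // P(B), expanded with the law of total probability.
  const evidence = numerator + pBGivenNotA * (1 - pA);
  return numerator / evidence;
}

// Hypothetical numbers: 40% of mail is spam, and the word "free"
// appears in 30% of spam and in 2% of legitimate mail.
const pSpamGivenFree = bayes(0.30, 0.40, 0.02);
console.log(pSpamGivenFree.toFixed(3)); // "0.909"
```

With these assumed inputs, a message containing "free" comes out as roughly 91% likely to be spam.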

Binary Bayesian Classifier

`P(A|W_1,...,W_n) = ( prod_(i=1)^n P(A|W_i) ) / ( (prod_(i=1)^n P(A|W_i)) + (prod_(i=1)^n (1 - P(A|W_i))) )`

Or, in Plain English

"WTF?!"

Life is like...

a box of chocolates.

You never know what you're gonna get.

(But you can make a pretty good guess!)

A Simple Example

	Nuts	No Nuts
Round	25%	75%
Square	75%	25%
Dark	10%	90%
Light	90%	10%

What if we pick a round, light chocolate?

A Simple Example

A round, light chocolate...

	Nuts	No Nuts	P(Nuts)	P(NoNuts)
Round	.25	.75	.25	.75
Square	.75	.25	-	-
Dark	.10	.90	-	-
Light	.90	.10	.90	.10
		`prod_(i=1)^n P_i`	.225	.075

The Results

`x = 0.225 / 0.075 = 3`

A round, light chocolate is 3 times as likely to have nuts as not.

(This is a likelihood ratio.)
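
The arithmetic for the round, light chocolate can be sketched in a few lines of JavaScript. The object `pNuts` and the helper `products` are names invented for this sketch; the probabilities are the ones from the table.

```javascript
// P(Nuts | trait) for each trait, copied from the table.
const pNuts = { round: 0.25, square: 0.75, dark: 0.10, light: 0.90 };

// Multiply the per-trait probabilities for the observed traits,
// once for "Nuts" and once for "No Nuts".
function products(traits) {
  let nuts = 1;
  let noNuts = 1;
  for (const trait of traits) {
    nuts *= pNuts[trait];
    noNuts *= 1 - pNuts[trait];
  }
  return { nuts, noNuts };
}

const { nuts, noNuts } = products(['round', 'light']);
console.log(nuts.toFixed(3), noNuts.toFixed(3)); // 0.225 0.075
console.log((nuts / noNuts).toFixed(2));         // 3.00
```

The values are rounded for display because floating-point multiplication does not give exactly 0.075.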

Binary Classification

Classify as "Nuts" or "No Nuts", with some level of certainty.

`P(N) = 0.225 / (0.225 + 0.075) = 0.75 = 75%`

(We're 75% sure this chocolate has nuts.)
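
A minimal sketch of that classification step, assuming equal priors for the two classes. The names `combine` and `classify` and the 0.5 threshold are choices made for this illustration, not part of any library mentioned in the talk.

```javascript
// Combine per-trait probabilities P(A|W_i) into one probability,
// following the binary classifier formula.
function combine(probabilities) {
  const yes = probabilities.reduce((acc, p) => acc * p, 1);
  const no = probabilities.reduce((acc, p) => acc * (1 - p), 1);
  return yes / (yes + no);
}

// Label an item "Nuts" or "No Nuts" and report how certain we are.
// The threshold is an assumption; 0.5 simply picks the likelier class.
function classify(probabilities, threshold = 0.5) {
  const pNuts = combine(probabilities);
  return pNuts >= threshold
    ? { label: 'Nuts', certainty: pNuts }
    : { label: 'No Nuts', certainty: 1 - pNuts };
}

// Round (0.25) and light (0.90), as in the example.
const result = classify([0.25, 0.90]);
console.log(result.label, result.certainty.toFixed(2)); // Nuts 0.75
```

Raising the threshold would make the classifier more conservative about saying "Nuts", which can matter when one kind of mistake is more costly than the other.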

Machine Learning in Node.js

classify (by Heather Arthur)
brain (same author)
dclassify (by me)

dclassify

Optimized binary classifier for limited vocabularies.

Leverages "missing" traits to improve accuracy by ~10%.
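
One common way to make use of "missing" traits is the Bernoulli naive Bayes model, where a trait that is absent from an item still contributes evidence. The sketch below illustrates that general idea only; it is not taken from dclassify and may differ from what that library does internally. The function names, the add-one smoothing, the equal class priors and the toy data are all assumptions.

```javascript
// Estimate P(trait | category) from labeled examples, with add-one
// smoothing so that no probability is exactly 0 or 1.
function train(examples, vocabulary) {
  const model = {};
  for (const [category, items] of Object.entries(examples)) {
    model[category] = {};
    for (const trait of vocabulary) {
      const count = items.filter((item) => item.includes(trait)).length;
      model[category][trait] = (count + 1) / (items.length + 2);
    }
  }
  return model;
}

// Score an item against one category. Present traits contribute
// P(trait|category); absent ("missing") traits contribute
// 1 - P(trait|category). Equal class priors are assumed.
function score(model, category, item, vocabulary) {
  return vocabulary.reduce((acc, trait) => {
    const p = model[category][trait];
    return acc * (item.includes(trait) ? p : 1 - p);
  }, 1);
}

// Hypothetical toy data over a small, fixed vocabulary.
const vocabulary = ['round', 'square', 'dark', 'light'];
const examples = {
  nuts: [['square', 'light'], ['square', 'light'], ['round', 'light']],
  noNuts: [['round', 'dark'], ['round', 'dark'], ['square', 'dark']],
};

const model = train(examples, vocabulary);
const item = ['round', 'light'];
const sNuts = score(model, 'nuts', item, vocabulary);
const sNoNuts = score(model, 'noNuts', item, vocabulary);
console.log(sNuts > sNoNuts ? 'nuts' : 'noNuts'); // nuts
```

This only works well when the vocabulary is small enough that the absence of a trait is informative, which fits the "limited vocabularies" use case.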

Used in production...

Thanks

http://darrenderidder.github.io/talks/MachineLearning