Spotting malicious JavaScript
There’s been much discussion about how to spot malicious JavaScript. One simple approach that catches a reasonable amount of malware is the ratio of the number of JavaScript keywords in the code to the total length of the code. This nicely spots things like uuencoded code, although it will miss some other types of obfuscation.
Expressed as a formula (*):
m’ = Sum over k of count(T, k)/len(T)
m = 1/m’ if m’ != 0
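where T is the text, k ranges over the set of all JavaScript keywords plus a few common browser extensions, and count(T, k) is the number of occurrences of k in T. Encoded payloads tend to be long but contain few keywords, so m’ comes out small and m large. Higher numbers = more badness.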
It’s not infallible: it’s easy to write bad JavaScript that this doesn’t spot, but anything that does score highly is likely to be bad.
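For anyone who wants to try the measure without downloading anything, here is a minimal sketch of the formula. It is not the code from the archive linked in this post, and it makes a few assumptions of its own: the keyword list is a small illustrative subset (the exact set isn't listed here), keywords are counted as whole identifier tokens rather than raw substrings, and the name js_measure is just a label chosen for this example. When m’ is zero the formula leaves m undefined, so the sketch returns None and lets the caller decide what to do.

```python
import re

# Assumed keyword set: a subset of JavaScript reserved words plus a few
# browser globals. This list is illustrative, not the post's exact set.
KEYWORDS = frozenset({
    "break", "case", "catch", "continue", "default", "delete", "do", "else",
    "finally", "for", "function", "if", "in", "instanceof", "new", "return",
    "switch", "this", "throw", "try", "typeof", "var", "void", "while", "with",
    "document", "window", "alert",
})

# Identifier-like tokens. Counting whole tokens (not substrings) is an
# assumption made so that "in" does not match inside "window".
_TOKEN = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def js_measure(text, keywords=KEYWORDS):
    """Return m = 1/m', where m' = sum over k of count(T, k) / len(T).

    Returns None when m' is zero (empty text or no keywords found),
    because the formula does not define m in that case.
    """
    if not text:
        return None
    hits = sum(1 for tok in _TOKEN.findall(text) if tok in keywords)
    if hits == 0:
        return None
    # 1 / (hits / len(text)) simplifies to len(text) / hits.
    return len(text) / hits


if __name__ == "__main__":
    plain = "function f() { return this; }"
    packed = "var s = unescape('%66%75%6e%63%74%69%6f%6e');"
    print("plain :", js_measure(plain))
    print("packed:", js_measure(packed))
```

The packed string has a single keyword spread over a much longer text, so it scores higher than the plain function. What counts as "high" depends on the keyword list and the kind of scripts you feed it, so any threshold would need tuning against real samples.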
Sample python code (right-click and use ‘save as’): js_measure.tgz
(*) Oh for latex!