Entropy, a more scientific term for what we call “randomness” is the measure of uncertainty or disorder.
But the question is, how do we decide if something is random and more importantly, how much? Let’s find out!
Illusion of uncertainty
Is the direction of a ball bouncing off a wall is uncertain? No. It depends on the angle on which it made contact with the wall. It’s not random.
What if you close your eyes and type a long string really fast? Will it be random? I made this little webpage so you can find it yourself.
You will notice that the string you typed isn’t even as random as a text written in English, let alone encrypted data.
Wait a second, English text isn’t random? Yes it isn’t. Some alphabets are used more than others, some words are used more than others, there are more English words with 5 alphabets than words with alphabets with 2 or 11 characters. All these properties add some certainty to the text which lowers the entropy.
And why isn’t your blindly typed “random” text random? Because while doing so, your fingers hit some keys more often than the others. Don’t believe me? Count occurrence of each character, some will have 20 while other have 1 or 2.
Now, how did this webpage measure the entropy? What’s the formula?
Before you read any further, I suggest you to read this article for a good explanation of various concepts related to entropy. I could cover them in this article but why waste resources when someone has already done it in a much better way?
How do computers generate random strings or numbers?
They can’t. Computers are machines, they do what input tells them to do. It cannot “think” of what to type.
Here’s how a random number generator might generate “random” numbers
(previously_generated_number + current_time) * fixed_secret_number
The output will be different every time but is it really random? We all know it isn’t. This formula consists of 3 unknown quantities but we can easily know the value of two of them and the third one is a fixed number so it doesn’t contribute much to the amount of entropy. Such generators that generate pseudo-random values are called pseudo-random number generators.
PHP too had an issue with it’s random generator on some platforms which was fixed in PHP 7.0. A researcher wrote a PHP program to change each pixel of an image to white or black randomly and this was the result
See? Some things aren’t as random as they seem to be.
Now, how we do we know if a CSRF token is random enough to keep off attackers? Can we predict a lottery number? Are you more likely to find a shiny Pokemon in certain conditions than others?
Ah yes, Pokemon! Pokemon emerald’s RNG used the following formula
1103515245 * seed + 24691
The seed was supposed to be a random number between 0 and 4294967295 but developers mistakenly made the seed equal to 0 which resulted in players cheating the game because it always generates 24691 and hence actions performed at a certain time will always have the same outcome.
Enough theory, is it practically possible to predict pseudo-random data?
Yes, by carefully analyzing the algorithm being used to generate the data and replicating the process locally. Obviously, you must have access to the source code of the program.
Here are my favorite articles demonstrating exploitation of PRNGs :)
What if you don’t have access to the source code?
It is quiet common that you will encounter a source of random strings without access to the underlying code. CSRF tokens are a great example of this and Bolt, a script that I wrote a while back for scanning a website for CSRF vulnerabilities demonstrates how to test such strings for randomness.
Note: Bolt is still in beta and hence prone to bugs but you can help by using it and reporting bugs.
Bolt performs bit level entropy checks similar to but more than Burp Suite. It crawls the website and stores all the CSRF tokens it finds. Then it concatenates them and converts that result string to binary. The following tests (as recommended by NIST) are then performed on the binary data
- Monobit frequency test
- Block frequency test
- Runs test
- Spectral test
- Non-overlapping template matching test
- Overlapping template matching test
- Serial test
- Cumultative sums test
- Aproximate entropy test
- Random excursions variant test
- Linear complexity test
- Longest runs test
- Maurers universal statistic test
- Random excursions test
You can use the same approach to test data of your choice. I suggest using the following Python implementation of the tests mentioned above
NIST random binary testsuite as written by Ilja Gerhardt - highfestiva/randomtests
We now have RNGs that are nearly impossible to predict because they use naturally random seeds such as atmospheric noise. However, the problem arises when developers write their own RNG or use the one offered by the programming language as an API. Such APIs are meant to be used in tasks with no risk such as using a RNG to generate a painting but they shouldn’t be used in a high risk context such a generating a coupon code.
That’s all for now. Have a nice day, stay hydrated :)