The Android SecureRandom incident

When the patch for Android’s SecureRandom was released last week, I was a little surprised. In addition to fixing the Apache Harmony implementation of SecureRandom, the patch also contained this:

Applies the fix for OpenSSL PRNG having low entropy. Does nothing if the fix is not needed.

What was that all about? I understood that the fix to the Harmony implementation was a reaction to this paper published earlier this year and was more or less an implementation issue, but what about OpenSSL? Newer versions of Android use a version of OpenSSL internally, and there is a Java bridge that exposes, among other things, access to OpenSSL’s Pseudo-Random Number Generator (PRNG). What caught my attention is that the Android patch explicitly seeds the OpenSSL PRNG with more or less predictable data and then mixes in additional entropy from /dev/urandom. This seemed weird at first, since the OpenSSL PRNG seeds itself from /dev/urandom, on systems where it is available, before it will return any random data. Why would there be any need to do that manually, then?
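
For illustration, here is a minimal sketch (my own, not the actual Android patch code) of what such manual seeding looks like against OpenSSL’s C API: RAND_add mixes caller-provided data into the PRNG state, and RAND_load_file reads bytes from a file and does the same.

Sketch: manually reseeding the OpenSSL PRNG
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

#include <openssl/rand.h>

/* Sketch only - roughly what 'seed with predictable data, then mix in
 * /dev/urandom' amounts to; this is not the Android patch itself. */
static void reseed_openssl_prng(void)
{
    pid_t pid = getpid();
    time_t now = time(NULL);

    /* Predictable values: claim 0.0 bits of entropy so the PRNG's
     * internal entropy estimate is not inflated. */
    RAND_add(&pid, sizeof(pid), 0.0);
    RAND_add(&now, sizeof(now), 0.0);

    /* Real entropy from /dev/urandom on top. */
    RAND_load_file("/dev/urandom", 32);
}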

Zygote, Ruby and forking OpenSSL

UPDATE: Eric has asked me to additionally credit Alexander Dymo for bringing up this issue!

Then I found out how heavily Android relies on forking processes from Zygote, and I now suspect that this is the reason for patching OpenSSL. I was immediately reminded of a problem that was brought up on the Ruby mailing list a while ago. Eric Wong had discovered that Ruby’s SecureRandom implementation would serve exactly the same ‘random’ byte sequence in a forked child process as soon as the PIDs (process identifiers) start to wrap around. Typically, PIDs wrap around after reaching 32768, but this may vary; to be sure, you can look up the exact value using

Retrieving the maximum possible PID
cat /proc/sys/kernel/pid_max

While this was fixed for SecureRandom (more on that later), you can still observe the same behavior with OpenSSL::Random today: since Ruby’s OpenSSL extension is meant to be a mere wrapper around OpenSSL itself, we try to avoid adding custom behavior whenever possible. To try it out, we simply have to replace ‘SecureRandom’ with ‘OpenSSL::Random’ in Eric’s script:

Predictable OpenSSL::Random
require "openssl"

OpenSSL::Random.random_bytes(4) # initialize the PRNG state in the parent
pid = fork do
  p [ $$, OpenSSL::Random.random_bytes(4) ]
end
Process.waitpid2(pid)

loop do
  xpid = fork do
    # print again once the PID wraps around to the first child's PID
    p [ $$, OpenSSL::Random.random_bytes(4) ] if $$ == pid
  end
  Process.waitpid2(xpid)
  break if xpid == pid
end

The script takes a while to run, but it is well worth the surprise. At the time, we discussed whether this is an OpenSSL-internal issue or not, and Eric also posted to the OpenSSL mailing list. The OpenSSL devs did not want to solve this internally, as they would like to stay as agnostic to OS specifics as possible. The argument that ‘fork’ may not be the only way to perform process bifurcation is certainly correct. The recommendation was to mix in something unique every time a child process is forked, so that the initially identical state is sufficiently altered - a sketch of how that might look follows below. This is also what we did for SecureRandom eventually.
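
To make the recommendation concrete, here is a minimal sketch (my own; the function names are made up, and fork is assumed to be the bifurcation mechanism) that uses pthread_atfork to perturb the PRNG state in every forked child:

Sketch: perturbing the PRNG state in every forked child
#include <pthread.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

#include <openssl/rand.h>

/* Runs in the child after every fork() and alters the PRNG state. */
static void prng_atfork_child(void)
{
    pid_t pid = getpid();
    struct timeval tv;

    gettimeofday(&tv, NULL);
    /* Both values are predictable, hence 0.0 entropy - the point is
     * only that the child's state now differs from its siblings'. */
    RAND_add(&pid, sizeof(pid), 0.0);
    RAND_add(&tv, sizeof(tv), 0.0);
}

/* Call once at startup, before any fork(). */
static int install_prng_fork_hook(void)
{
    return pthread_atfork(NULL, NULL, prng_atfork_child);
}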

The key to why Eric’s script works is that the OpenSSL PRNG must be initialized in the parent before we fork the child processes. That way, every child process starts out with exactly the same internal PRNG state, and the PID is the only process-specific value that is fed to the PRNG algorithm when requesting random bytes. Let’s have a look at ssleay_rand_bytes, the function responsible for generating pseudo-random bytes:

md_rand.c: ssleay_rand_bytes
static int ssleay_rand_bytes(unsigned char *buf, int num, int pseudo)
    {
    /* Leaving out some details and adding comments of my own */

    ...

    /* Since the PRNG was initialized in the parent process, initialized
     * will be set to 1 already and this is skipped */
    if (!initialized)
        {
        RAND_poll();
        initialized = 1;
        }

    /* This is also 1 and skipped */
    if (!stirred_pool)
        do_stir_pool = 1;

    /* We have enough entropy because the PRNG is already initialized */
    ok = (entropy >= ENTROPY_NEEDED);
    if (!ok)
        {
        /* Not executed */
        ...
        }

    if (do_stir_pool)
        {
        /* Not executed */
        ...
        }

    /* These are identical for every forked child */
    st_idx=state_index;
    st_num=state_num;
    md_c[0] = md_count[0];
    md_c[1] = md_count[1];
    memcpy(local_md, md, sizeof md);

    state_index+=num_ceil;
    if (state_index > state_num)
        state_index %= state_num;

    md_count[0] += 1;

    while (num > 0)
        {
        j=(num >= MD_DIGEST_LENGTH/2)?MD_DIGEST_LENGTH/2:num;
        num-=j;
        if (!MD_Init(&m))
            goto err;

        /* This is actually the only child process-specific value */
        if (curr_pid)
            {
            if (!MD_Update(&m,(unsigned char*)&curr_pid,sizeof curr_pid))
                goto err;
            curr_pid = 0;
            }

        if (!MD_Update(&m,local_md,MD_DIGEST_LENGTH))
            goto err;
        if (!MD_Update(&m,(unsigned char *)&(md_c[0]),sizeof(md_c)))
            goto err;

#ifndef PURIFY
        /* VERY interesting, more on that in a minute. For now, assume
         * it's the same in each child process */
        if (!MD_Update(&m,buf,j))
            goto err;
#endif

        k=(st_idx+MD_DIGEST_LENGTH/2)-st_num;
        if (k > 0)
            {
            if (!MD_Update(&m,&(state[st_idx]),MD_DIGEST_LENGTH/2-k))
                goto err;
            if (!MD_Update(&m,&(state[0]),k))
                goto err;
            }
        else
            if (!MD_Update(&m,&(state[st_idx]),MD_DIGEST_LENGTH/2))
                goto err;
        if (!MD_Final(&m,local_md))
            goto err;

        for (i=0; i<MD_DIGEST_LENGTH/2; i++)
            {
            state[st_idx++]^=local_md[i];
            if (st_idx >= st_num)
                st_idx=0;
            if (i < j)
                *(buf++)=local_md[i+MD_DIGEST_LENGTH/2];
            /* This is where the output is generated. Everything that
             * was fed to local_md is exactly the same for each child
             * except for the PID. */
            }
        }

    /* The rest is not relevant for our observation */
    ...

    }

Because the PID is the only process-specific input, we will eventually see exactly the same random bytes once PID values start to recycle. This means that while adding the PID to the message digest provides some fork safety, it is not enough once we consider that PIDs are not incremented indefinitely but wrap around eventually. I could very well imagine that this is related to the Android OpenSSL patch and Android’s extensive use of forking…

It works in Ruby, so how about C?

Eric Wong included a C example with his post to the OpenSSL mailing list, essentially a C version of the Ruby code we already saw:

C version of Eric Wong’s proof of concept
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>
#include <openssl/rand.h>

static void dump_random(void)
{
    int i;
    unsigned char buf[4];

    RAND_bytes(buf, sizeof(buf));
    printf("pid=%d ", getpid());
    for (i = 0; i < sizeof(buf); i++) {
        printf("\\x%02x", buf[i]);
    }
    puts("");
}

int main(void)
{
    pid_t pid, xpid;

    /* PRNG needs to be initialized in original process to reproduce */
    RAND_bytes((unsigned char *)&pid, sizeof(pid));

    pid = fork();
    if (pid == 0) {
        dump_random();
        return 0;
    } else if (pid > 0) {
        wait(NULL);

        do {
            xpid = fork();
            if (xpid == 0) {
                if (getpid() == pid)
                    dump_random();
                return 0;
            } else if (xpid > 0) {
                wait(NULL);
            } else {
                perror("fork");
            }
        } while (pid != xpid);
    } else {
        perror("fork");
        return 1;
    }

    return 0;
}

I ran this to see if maybe the problem had been fixed in OpenSSL in the meantime. Much to my surprise, I could reproduce the repeated ‘random’ numbers on my laptop (running Linux Mint), but I could not on my desktop (running Fedora). What the? I downloaded the OpenSSL sources for both distributions and compared them to the official OpenSSL 1.0.1e. Fedora was using the same code as 1.0.1e, and consequently I couldn’t reproduce the behavior with vanilla 1.0.1e either.

It’s me again, the Debian OpenSSL bug

So had it been fixed there, but somehow not in the Mint version of OpenSSL?! I couldn’t spot any difference until I finally just used the diff command. There. Just one single difference between the Fedora/official OpenSSL versions and the Mint version:

The difference in ssleay_rand_bytes
#ifndef PURIFY /* purify complains */
#if 0
      /* The following line uses the supplied buffer as a small
       * source of entropy: since this buffer is often uninitialised
       * it may cause programs such as purify or valgrind to
       * complain. So for those builds it is not used: the removal
       * of such a small source of entropy has negligible impact on
       * security.
       */
      MD_Update(&m,buf,j);
#endif
#endif

There’s an ‘#if 0’ around the line that I marked as ‘interesting’ before. And then it clicked. Mint uses Ubuntu packages, which in turn use Debian packages. This is actually the Debian version of OpenSSL! The ‘#if 0’ is a remnant of the famous Debian OpenSSL ‘patch’ that severely crippled the PRNG’s entropy a couple of years ago! If you haven’t heard about it, here is a mean but precise article describing what went wrong there. While one part of the patch had to be removed, this gem was sanctioned to stay in there. But how can this have any impact when the OpenSSL devs clearly state that it should have little to no impact on overall security? The reason is uninitialized memory. The official version (without the ‘#if 0’) does something dubious: it adds the user-provided buffer to the hash calculation, in the hope that the user has not initialized the buffer at this point. If uninitialized, the buffer could(!) contain unpredictable values and would contribute a little entropy.

This is why Eric Wong’s example worked for me on Mint but not on Fedora: his code does not initialize the buffer, so on 1.0.1e/Fedora the uninitialized memory adds just enough variation to mask the repetition. But this can easily be ‘fixed’ by initializing the buffer - and voilà, the random bytes repeat again, regardless of the OpenSSL version in use:

‘Fixed’ version for predictable output regardless of OpenSSL version
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

#include <openssl/rand.h>

static void dump_random(void)
{
    int i;
    unsigned char buf[4];

    /* does the 'trick' */
    memset(buf, 0, sizeof(buf));

    RAND_bytes(buf, sizeof(buf));
    printf("pid=%d ", getpid());
    for (i = 0; i < sizeof(buf); i++) {
        printf("\\x%02x", buf[i]);
    }
    puts("");
}

int main(void)
{
   /* as before */
}

So is it fixed or not or what?!

Now you may also realize why the Ruby code sample worked regardless of the OpenSSL version, while in the C example we needed to explicitly initialize the memory first: the C code behind Ruby’s OpenSSL::Random does initialize the buffer before handing it to OpenSSL. Treating a user-provided value as your safety net is never a good idea, but it’s even worse in this case: not only does the dubious practice of relying on uninitialized values cause problems with tools such as valgrind (which, ironically, triggered the Debian disaster in the first place), it also masks the problem at hand while contributing little to nothing to security. This cannot be regarded as the solution to the problem; I would be very much in favor of removing this confusing line completely.

Anyhow, the real question seems to be whether we may consider this ‘PID wrapping issue’ a problem of OpenSSL itself, or whether it must be filed under ‘the developer has to take care of this’. Now, those who know me also know that for me there can be only one definitive answer to this question. Sure, developers could take care of this and handle it properly on their own, but we now have striking examples of how often this does not happen in reality. The first time I saw this was when Eric brought it up on bugs.ruby-lang.org, but it has now most likely bitten the Android developers, and a little research showed that it has also affected Postgres developers. Even if you don’t trust us Ruby developers, I’m pretty sure we can all agree that Android and Postgres developers are fairly good at what they’re doing. Still, they overlooked this. And honestly - who of us would have thought about this when forking a process? Funny anecdote: a long time ago, this was a valid concern for the OpenSSL devs as well, as this ancient commit clearly shows.

How to fix this for good?

We cannot keep adding ‘best practice’ after ‘best practice’ to the list of things developers need to keep in the back of their heads when writing everyday code. This approach must fail eventually, and the only reliable solution is to fix problems like this at the root - which is clearly in the library providing the functionality itself. A quick fix, as already proposed on the OpenSSL mailing list, might be to mix in the current time in addition to the PID. Even though such a value is predictable, an attacker hopefully cannot use it to learn anything about the internal PRNG state. To exploit the repeated random numbers, we had to rely on the internal PRNG state being exactly the same in different child processes, while nothing was actually revealed about the state itself. Simply changing the internal state, even with predictable values, seems to be enough at this point.
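
In terms of ssleay_rand_bytes, the idea would look roughly like this - a sketch of the concept only, not the actual patch proposed on the mailing list:

Sketch: mixing the current time into the digest alongside the PID
/* inside ssleay_rand_bytes, instead of feeding only curr_pid */
if (curr_pid)
    {
    struct timeval tv;

    gettimeofday(&tv, NULL);
    if (!MD_Update(&m,(unsigned char*)&curr_pid,sizeof curr_pid))
        goto err;
    /* even with a wrapped PID, the time makes the state differ */
    if (!MD_Update(&m,(unsigned char*)&tv,sizeof tv))
        goto err;
    curr_pid = 0;
    }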

This is the fix that was applied to Ruby’s SecureRandom, but there are several reasons why I’m still not too happy with OpenSSL’s PRNG. The biggest issue for me is that it is more or less an ad-hoc design that doesn’t follow an official standard or recommendation. There is, for example, Fortuna, or NIST SP 800-90A, B and C - and interestingly enough, OpenSSL uses one of the NIST designs for its FIPS version. People analyzing the OpenSSL PRNG have complained many times that the source code is complicated and hard to grasp (see the Debian issue), which makes it so much harder to assess its general security. I would appreciate a more standardized design. Markku-Juhani O. Saarinen discovered a flaw in the PRNG more than a decade ago. Given the ad-hoc design, who is to say that this has been fixed for good?

The easiest way out is probably to rely completely on /dev/urandom where available. Sure, it comes with its own baggage, and there is the problem of availability and differing behavior across platforms, but after all, it has a fairly good reputation. Most of all, it is not affected by any forking issues.
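
For completeness, here is a minimal sketch of that approach (with deliberately simple error handling; a real implementation would retry on short reads and keep the descriptor open):

Sketch: reading random bytes directly from /dev/urandom
#include <stdio.h>

static int urandom_bytes(unsigned char *buf, size_t num)
{
    /* No in-process PRNG state involved, hence no forking issues. */
    FILE *f = fopen("/dev/urandom", "rb");
    size_t n;

    if (f == NULL)
        return 0;
    n = fread(buf, 1, num, f);
    fclose(f);
    return n == num;
}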

What’s the impact of all of this?

To be fair, the forking issue demonstrated here should have little impact on everyday software (other than, cough, Android). Even if you are developing a Rails app using a forking web server like Unicorn, with Resque background jobs and OpenSSL::Random instead of SecureRandom, you should not be affected. Most software that I can think of allocates a pool of ‘worker processes’ up front, so it is unlikely that you would ever run into the problem of wrapping PIDs. Still, we consider a cipher broken even if the attack is impractical in reality, provided the attack involves less work than brute force. In this sense, the OpenSSL PRNG is broken in its current state, because we have found a way to predict its output other than by brute force. I hope the OpenSSL team will fix this internally and make our lives easier by not having to think about these issues when working on our everyday projects.

If you are on Ruby and think that you might be affected, you can try adding additional entropy to OpenSSL::Random after a fork:

Adding entropy to OpenSSL::Random after a fork
require 'openssl'

module OpenSSL::Random
  class << self

    old_rand = instance_method(:random_bytes)

    define_method :random_bytes do |n=nil|
      n = n ? n.to_int : 16

      @pid = 0 unless defined?(@pid)
      pid = $$
      unless @pid == pid # detect a fork and modify PRNG state
        add_predictable(pid)
        add_urandom
        @pid = pid
      end
      old_rand.bind(self).call(n)
    end

    private

    def add_predictable(pid)
      now = Time.now
      ary = [now.to_i, now.nsec, @pid, pid]
      OpenSSL::Random.random_add(ary.join('').to_s, 0.0)
    end

    def add_urandom
      begin
        OpenSSL::Random.load_random_file('/dev/urandom')
      rescue OpenSSL::Random::RandomError
        # probably not available, reraise if mandatory for you
      end
    end

  end
end

‘add_predictable’ is basically the same as the SecureRandom fix. It is enough to make the repeated random numbers go away, but the truly paranoid will also want to use ‘add_urandom’, as it refreshes the OpenSSL PRNG’s internal state with random data gathered from /dev/urandom. OpenSSL::Random.load_random_file causes the OpenSSL PRNG to gather 2048 bytes of entropy that are then mixed into its internal state. Because 2048 is quite a bit larger than the internal state, this ensures that every single state byte is updated by the data read from /dev/urandom. This reflects the fix proposed in the Android SecureRandom patch. The same principle can of course be applied in C code as well:
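
Here is a rough sketch of what that could look like; the wrapper name rand_bytes_forksafe is hypothetical, not an OpenSSL API:

Sketch: a fork-aware RAND_bytes wrapper
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

#include <openssl/rand.h>

static pid_t seen_pid = 0;

static int rand_bytes_forksafe(unsigned char *buf, int num)
{
    pid_t pid = getpid();

    /* A changed PID means we are in a freshly forked child. */
    if (pid != seen_pid) {
        time_t now = time(NULL);

        /* the predictable part, equivalent to add_predictable */
        RAND_add(&pid, sizeof(pid), 0.0);
        RAND_add(&now, sizeof(now), 0.0);
        /* the urandom part: 2048 bytes cover the entire internal state */
        RAND_load_file("/dev/urandom", 2048);
        seen_pid = pid;
    }
    return RAND_bytes(buf, num);
}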

Lessons learned

Random numbers are absolutely essential for a crypto library; if they suck, we don’t even have to get started with encryption or anything else, because it all collapses into something trivially deterministic and therefore predictable. I very much like the idea of using /dev/urandom as the sole random number generator for cryptographic purposes, mostly because this would give us a single construct for research to focus on, and it provides more flexibility for mixing in data generated by hardware RNGs.

Also, this is again a perfect example of how confusing code can cause a lot of trouble. There is so much code shared between ssleay_rand_bytes and ssleay_rand_add that it is really hard to spot the differences, which was probably a major reason for the Debian disaster. A little bit of refactoring would do wonders here: move shared parts into separate functions and give them expressive, meaningful names. Template Method Pattern for the win. To be fair, the kernel code implementing /dev/urandom is no beauty either. But if clarity, expressiveness and the DRY principle in general are good practice for everyday coding, their value is doubled for security-critical code.

Thanks to Eric Wong again for all his work, not just on this issue!

Source code accompanying this post can be found here.
