Detecting overflows, undefined behaviour and other nasties

You will remember that a previous post discussed what happens when you add one to an integer, and that the answer isn’t always obvious. Indeed, the answer isn’t always defined.

As it happens, there are plenty of weird cases that crop up when working with C and languages like it. You can’t expect a boolean to be YES or NO. You can’t assume that an enum variable only holds values from the enumeration. You can’t assume that you know how long an array is, even if the caller told you. Just as adding one is subtle, so is dividing by minus one.

In each of these cases[*]—and others—what you should actually do is to check that the input to an operation won’t cause a problem before doing the operation:

int safe_add(int n1, int n2) {
  if(n2 > 0) assert(INT_MAX - n2 < n1); //or throw a floating point exception or otherwise signal not to use the result
  if(n2 < 0) assert(INT_MIN - n2 > n1);
  return n1 + n2;
}

But who does that? Thankfully, the compiler writers do. Coming up in a future release of clang is a collection of sanitisers that insert runtime checks for the things described above. If you’re the kind of person who writes assertions like the above in your code, you can swap all that for sanitisers enabled in your debug builds. If you’re not the kind of person who writes those assertions, you probably should enable these sanitisers, then go and find out where else you should be adding assertions.

In code that deals with input from other processes, machines or the outside world, you could consider enabling sanitisers even in release builds. They’ll cause your app to report where it encounters overflows, underflows, and other potential security problems. If you don’t think it’s a good enough better option, you should be writing explicit checks for bad data and application-specific failure behaviour.

So, how does this work? Compiling with the sanitiser options inserts checks of the sort shown above into the compiled code. These checks are evaluated at runtime (sort of; for array bounds checking, the size of the array must be known when compiling but the check is still done at runtime) and the process prints a helpful message if the checked condition fails. Let’s look at an example!

#include <stdio.h>
#include <limits.h>

int main(int argc, char *argv[]) {
	printf("%d + %d = %d\n", INT_MAX, 1, INT_MAX + 1);
	return 0;	
}

Compiling that with default settings “works”, but results in undefined behaviour:

clang -o mathfail mathfail.c
./mathfail
2147483647 + 1 = -2147483648

So let’s try to insert some sanity!

clang -fsanitize=integer -o mathfail mathfail.c
./mathfail
./mathfail.c:5:47: runtime error: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'
2147483647 + 1 = -2147483648

Another example: dividing by zero.

#include <stdio.h>

int main(int argc, char *argv[]) {
	int x=2, y=0;
	printf("x/y = %d\n", x/y);
	return 0;
}

./mathfail.c:5:24: runtime error: division by zero

I wonder how many of the programs I’ve written in the past would trigger sanity failures with these new tools. I wonder how many of those are still in use.

[*] With the exception of booleans. As Mark explains in the linked post, you can always compare a boolean to 0, which is the only value that means false.

About Graham

I make it faster and easier for you to create high-quality code.
This entry was posted in Uncategorized. Bookmark the permalink.