C / C++ are not as cross platform as you think

Many people say that C and C++ are wonderful languages because they are cross platform.  While it is true that they can be compiled on many platforms, calling them cross platform is at best misleading.  You will run into all kinds of wacky problems when moving between platforms.  Here is an example.

I’m working on developing a ForestDB based storage engine for Couchbase Lite .NET.  ForestDB is written in C, and for the native iOS version there was a C++ library written to use it called CBForest.  P/Invoke calls into C++ are almost always a terrible idea, if not outright impossible, so CBForest includes a C interface to expose its main functionality to managed languages like C# and Java.  Since CBForest was developed with OS X and iOS in mind, it had a lot of issues running on Windows.  For one thing, there were a lot of GNU extensions used that are simply not available in the Visual Studio compiler, but let’s focus on a problem that had me chasing my tail for almost the entire workday today.

Here are the symptoms.  I was running a unit test from C#, but it would fail by throwing an AccessViolationException.  This is basically the equivalent of a bad memory access in an unmanaged language.  I traced the bad call to here.  So I started looking in the obvious places, namely each of the parameters getting passed in.  Since this calls into an unmanaged library I couldn’t simply step through with the debugger, so I started putting in logging statements.  The logging statements showed the data as completely random, as if uninitialized.  What?  How can that be?  Is P/Invoke failing somehow?

So as a sanity check I decided to log something I knew to be constant.  I logged sizeof(bool) expecting to see “1” in the console.  Nope, no luck.  The logged number was completely random.  Our logging makes use of a function called vasprintf, which is like printf except that instead of printing to the console it stores the result in a newly allocated char * and takes a va_list directly instead of creating one from the ... C syntax.  This function is actually not available in the Visual Studio compiler, so I had taken an implementation from the Internet.  Could this be the culprit of the weird logging?
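For anyone who hasn't seen one, a vasprintf replacement is usually built on top of vsnprintf. The following is only a minimal sketch, not the implementation used in the project: the names my_vasprintf and my_asprintf are made up for illustration, and it assumes a C99-conforming vsnprintf that returns the required length when given a NULL buffer (older Visual Studio runtimes did not behave that way).

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for vasprintf: formats into a freshly malloc'd
 * buffer stored in *out and returns the string length, or -1 on failure.
 * The caller is responsible for calling free() on *out. */
int my_vasprintf(char **out, const char *fmt, va_list args)
{
    va_list copy;
    int len;

    /* First pass: measure the output. The list is copied because a
     * va_list must not be reused after vsnprintf has consumed it. */
    va_copy(copy, args);
    len = vsnprintf(NULL, 0, fmt, copy);
    va_end(copy);
    if (len < 0) {
        *out = NULL;
        return -1;
    }

    *out = malloc((size_t)len + 1);
    if (*out == NULL) {
        return -1;
    }

    /* Second pass: format into the buffer, including the terminator. */
    return vsnprintf(*out, (size_t)len + 1, fmt, args);
}

/* Convenience wrapper that builds the va_list from the "..." syntax. */
int my_asprintf(char **out, const char *fmt, ...)
{
    va_list args;
    int len;

    va_start(args, fmt);
    len = my_vasprintf(out, fmt, args);
    va_end(args);
    return len;
}
```

Note that nothing in here can tell whether the caller really passed the arguments the format string asks for. It just trusts the va_list it is handed.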

I tried a few other implementations with similar results.  Frustrated, I switched to using sprintf instead and simply passing an already formatted string to the logging function. Ah ha! This was logging correctly! vasprintf must be messed up somehow!

Nope, it wasn’t. The implementation was fine. The problem actually lay in the way it was being called through a macro. You can see here how there is a compiler check to deal with the slight difference in vararg macro syntax between Visual Studio and pretty much EVERYTHING ELSE. However, what I didn’t realize is that Visual Studio won’t use the varargs by default. You have to explicitly tell it to! Meet the solution.  Would this finally let me log correctly so I could start debugging?
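The general shape of this pitfall can be sketched as follows. This is not CBForest's actual macro; log_message, last_log, and LOG_MSG are hypothetical names, and the sketch only uses the standard C99 __VA_ARGS__ form (the GNU-only alternative names the argument list, as in ARGS...).

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical logging backend: formats into a static buffer so the
 * result is easy to inspect. */
static char last_log[256];

static void log_message(const char *fmt, ...)
{
    va_list args;

    va_start(args, fmt);
    vsnprintf(last_log, sizeof last_log, fmt, args);
    va_end(args);
}

/* Broken: this macro accepts variadic arguments but never forwards
 * them, so log_message() would go looking for arguments that were
 * never passed:
 *
 *     #define LOG_MSG(FMT, ...) log_message(FMT)
 *
 * Working: __VA_ARGS__ expands to whatever was written in place of
 * "...", so it has to appear explicitly in the expansion. */
#define LOG_MSG(...) log_message(__VA_ARGS__)
```

With the working definition, LOG_MSG("sizeof(bool) = %d", 1) expands to log_message("sizeof(bool) = %d", 1), and the format string and its arguments finally agree.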

YES!  Success!  It was logging correctly!  Time to start figuring out what to do about the exception from earlier…wait…what?  The unit test passed.  The exception was gone.  I see!

To explain why this was happening, a bit of background on how varargs work is needed.  Varargs are a way to have a variable-length list of arbitrary arguments in your function signature.  You can even have none.  Conceptually, the arguments are passed to the called function in one block of memory, right after one another (the exact layout depends on the platform's calling convention).  The va_list machinery hides the details, so we can simply keep moving to the next argument until we are satisfied.  The way to start iterating through varargs is the va_start macro.  It takes the name of the final parameter **just before** the vararg list:

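#include <stdarg.h>

void someFunction(const char *message, ...)
{
    va_list args;
    va_start(args, message);  /* start right after the "message" parameter */
    /* ... read the arguments with va_arg ... */
    va_end(args);             /* every va_start needs a matching va_end */
}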

So in the above example, the vararg list starts after the “message” parameter.  It is then up to the function to use the arg list as it sees fit via the va_arg macro.  Each call reads the appropriate number of bytes and interprets them according to the specified type, so for the above example double num = va_arg(args, double) will read 8 bytes (the size of a double on typical platforms) and interpret the result as a double.  In CBForest’s case it leaves this logic up to the vasprintf function, which reads from the list according to the format string, so the string “A number %d another number %d” will read two ints.
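To make the va_arg part concrete, here is a minimal sketch of a function that walks its own argument list. sum_doubles is a hypothetical helper, not something from CBForest, and its count parameter plays the same role as the format string does for vasprintf: it is the only thing telling the function how many arguments to read.

```c
#include <stdarg.h>

/* Hypothetical helper: adds up `count` doubles passed as variadic
 * arguments. The function has no way to verify that `count` matches
 * what the caller actually passed. */
double sum_doubles(int count, ...)
{
    va_list args;
    double total = 0.0;
    int i;

    va_start(args, count);
    for (i = 0; i < count; i++) {
        /* Each va_arg call reads the next argument as a double. */
        total += va_arg(args, double);
    }
    va_end(args);
    return total;
}
```

Calling sum_doubles(3, 1.0, 2.0, 3.5) is fine. Calling sum_doubles(3, 1.0, 2.0), or passing the int literal 1 where a double is read, compiles just as happily but is undefined behavior.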

However, the language gives the compiler no way to check at compile time whether this is correct (some compilers can warn about mismatched printf-style format strings, but that is a courtesy, not a guarantee), so the compiler has no choice but to accept that you know what you are doing. In my case I was telling it to read all kinds of things, and then not passing those things in. What is the result of that? Well, let’s think about that for a minute.

As I noted before, the varargs come in sequential order after the final parameter preceding the varargs list, but what if there are no arguments there? The program will attempt to read past the end of the list anyway and end up either spouting out random nonsense (there is my logging problem) or crashing (hey, there is my AccessViolationException!)

So the takeaway from all of this is that multiplatform support is hard, and C / C++ is not as big of a help as you might think ;).
