WordPress
Wordρress
If the first version has a capital “P”, then my blog host has decided to usurp my editorial control by default. Wonderful.
html { display: awesome; }
How many times have you ever legitimately used the comma operator in live C or C++ code? I’ve seen a Boost project use it as convenience notation for small compile-time datasets, but that’s about it. So, here’s an example of an absolutely terrible way to use it (yes… this is how I blow off steam at work):
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    return argc > 1
        ? !strcmp(argv[1], "foo") ? (puts("foo"), 0) : (puts("not foo"), 1)
        : 1;
}
This program prints “foo”, “not foo”, or nothing based on the first argument; then it returns true (UNIX-style) only if the first argument was “foo”. While most language experts will not bat an eye at this, it’s definitely on the Perl side of ugly.
Why was I playing with the comma operator? I was looking at the definition of assert() provided on my system (either part of libc or gcc, I'm not sure). If you disable assertions (-DNDEBUG) you'll see a null statement like this:
#include <assert.h>

void test(int x, int y)
{
    assert(x < y);
}

which, after preprocessing with -DNDEBUG, becomes:

void test(int x, int y)
{
    ((void)(0));
}
If you leave assertions enabled, however, you’ll still see a null statement due to the somewhat-creepy comma-operator magic I’ve demonstrated above:
void test(int x, int y)
{
    ((void)((x < y) ? 0 : (__assert_fail("x < y", "test.c", 4, "test"), 0)));
}
So, assert is defined as an expression rather than a statement. That means that you can combine it with other chunks of code in some surprising fashions:
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

char *short_strdup(const char *str, size_t len)
{
    // don't allow anyone to pass in a null or over-long string
    return assert(str != NULL), assert(strlen(str) < len), strdup(str);
}

long infallible_atoi(const char *number)
{
    char *end;
    long value;
    errno = 0; // strtol only sets errno on failure, so clear it first
    value = strtol(number, &end, 10);
    // validate that there was no conversion error
    // and that we consumed all input
    return assert(errno == 0), assert(*end == '\0'), value;
}
This is not recommended, however, because assert is not a general-purpose error-handling mechanism. I could imagine this being used as the basis for a hand-rolled assertion mechanism in a large codebase (you could throw an exception, log an error with a stack trace, or cause monkeys to fly out of the original developer's nose, for example).
I've seen a few people assert that precompiled headers are a pain in the butt, or not workable for large-scale projects. In fact, it's incredibly easy to add precompiled headers to a GCC-based project, and it can be quite beneficial.
Since I'm mostly familiar with Makefiles, I'll present an example here that uses Make. It trivially extends to other similar build systems such as SCons, NMake, or Ant, but I'm not so sure about Visual Studio projects. This example builds a single static library and several test applications. I've stripped out most of my compiler flags for brevity.
# boilerplate settings...
SHELL = /bin/bash
CXX = g++ -c
CXXFLAGS += -std=c++98 -pedantic -MMD -g -Wall -Wextra
LD = g++
LDFLAGS += -rdynamic -fno-stack-protector
AR = ar

# generic build rules
# $@ is the target, $< is the first source, $^ is all sources
define compile
$(CXX) -o $@ $< $(CXXFLAGS)
endef
define link
$(LD) -o $@ $(filter %.o,$^) $(filter %.a,$^) $(LDFLAGS)
endef
define ar
$(AR) qsc $@ $(filter %.o,$^)
endef

# all library code is in src/
# all test applications are single-source
# e.g. testfoo is produced from testfoo.cxx and libsseray.a
TEST_SRC = $(wildcard test*.cxx)
LIB_SRC = $(wildcard src/*.cxx)
TESTS = $(basename $(TEST_SRC))
LIB = libsseray.a

all : $(TESTS)
$(TESTS) : $(LIB)
$(TESTS) : % : %.o
	$(link)
%.o : %.cxx
	$(compile)
$(LIB) : $(LIB_SRC:cxx=o)
	$(ar)

# gcc-provided #include dependencies
-include $(TEST_SRC:cxx=d) $(LIB_SRC:cxx=d)

clean :
	rm -f $(LIB) $(TESTS) $$(find . -name '*.o' -o -name '*.d')
In order to use a precompiled header, this is what needs to be added to the Makefile. There are no source code modifications at all. I created a file pre.h that includes all of the system C and C++ headers that I use (in particular <iostream> is a big expense for the compiler).
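For reference, pre.h itself is nothing special. The exact header list below is an assumption for illustration; yours should mirror whatever heavy, stable system headers your project actually uses.

```cpp
// pre.h -- everything here is expensive to parse, rarely changes,
// and is used project-wide, which is what makes it a good PCH candidate
#ifndef PRE_H
#define PRE_H

#include <cstddef>
#include <cstdio>
#include <cstring>
#include <algorithm>
#include <iostream> // the big one
#include <map>
#include <string>
#include <vector>

#endif
```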
# PCH is built just like all other source files
# CXXFLAGS must match everything else
pre.h.gch : pre.h
	$(compile)

# all object files depend on the PCH for build ordering
$(TEST_SRC:cxx=o) $(LIB_SRC:cxx=o) : pre.h.gch

# this is equivalent to adding '#include <pre.h>' to the top of every source file
$(TEST_SRC:cxx=o) $(LIB_SRC:cxx=o) : CXXFLAGS += -include pre.h

# pre.h.gch should be cleaned up along with everything else
clean :
	rm -f (...) pre.h.gch
The project itself is fairly small—16 source files totaling 1800 SLOC—but this small change just decreased my total build time from 12 to 8 seconds. This is entirely non-intrusive to the code, so adding it is a one-time cost (as opposed to the Microsoft stdafx.h approach, which is O(N) cost in the number of source files). It is also easy to disable for production builds, if you only want to use the PCH machinery for your day-to-day edit-compile-test cycle.
I like to sneak bitshifts into interviews—not because they’re used commonly in modern C++ code, but because they used to be common, as a way of getting good performance out of poor compilers. It’s very useful to know the tricks of the past, if you ever find yourself maintaining code written by an earlier generation of programmer. Take this example:
unsigned imul(unsigned x)
{
    return x * 10;
}

unsigned bitshift(unsigned x)
{
    return (x << 3) + (x << 1);
}
Below is the assembler produced by gcc -c -O3 -march=core2, using objdump -d --no-show-raw-insn to get the assembler from the compiled output. There are two interesting things to note:
00000000 <imul>:
   0:   push   %ebp
   1:   mov    %esp,%ebp
   3:   mov    0x8(%ebp),%eax
   6:   pop    %ebp
   7:   lea    (%eax,%eax,4),%eax    # n + 4n
   a:   add    %eax,%eax             # (n + 4n) + (n + 4n)
   c:   ret

00000010 <bitshift>:
  10:   push   %ebp
  11:   mov    %esp,%ebp
  13:   mov    0x8(%ebp),%eax
  16:   pop    %ebp
  17:   lea    0x0(,%eax,8),%edx     # 8n
  1e:   lea    (%edx,%eax,2),%eax    # 8n + 2n
  21:   ret
For comparison, here is the same code compiled with gcc -c -O0 -march=i386. Note that shifts are used in both cases. If you try a few other values of -O and -march, you’ll see some other interesting results, but I’m not going to bother to paste them all here.
00000000 <imul>:
   0:   push   %ebp
   1:   mov    %esp,%ebp
   3:   mov    0x8(%ebp),%edx
   6:   mov    %edx,%eax
   8:   shl    $0x2,%eax             # n << 2
   b:   add    %edx,%eax             # (n << 2) + n
   d:   shl    %eax                  # ((n << 2) + n) << 1
   f:   leave
  10:   ret

00000011 <bitshift>:
  11:   push   %ebp
  12:   mov    %esp,%ebp
  14:   mov    0x8(%ebp),%eax
  17:   lea    0x0(,%eax,8),%edx     # 8n
  1e:   mov    0x8(%ebp),%eax
  21:   shl    %eax                  # n << 1
  23:   lea    (%edx,%eax,1),%eax    # 8n + (n << 1)
  26:   leave
  27:   ret
If you go through some of the major Intel processor models, you will see that the actual assembler output varies quite a bit. What does this mean? Mostly that micro-optimizations designed to produce ever-so-slightly better assembler are usually the wrong approach for long-lived software. Yet it's a fact of life that you will run into code like this on any sufficiently large system, and it must be understood, and fixed where possible.
Developers working on embedded systems with a restricted set of compilers… YMMV. Sorry.
I recently stumbled across an article referencing macros defined by gcc. That list is pretty daunting! If you compare it to GCC Common Predefined Macros (the official source), you realize quite fast that a lot of those macros exist for the benefit of libc and libstdc++ library authors—not compiler end users such as myself.
For whatever reason, it takes quite a bit of Google-Fu (or luck) to get GCC's official page to show up on the first page of search results. Whenever I search for it, if I don't remember the page title exactly, I have to dig through about 20 other sites before finding it. In any case, both of those lists are frigging huge, so here's a bit of a smaller list that I usually have up my sleeve:
- If you want to identify any GCC-family compiler (including the Intel C/C++ compilers), simply look for __GNUC__'s existence.
- std::vector<T*>::push_back(NULL) confuses the compiler, and I have to instead write T *nullptr = NULL; std::vector<T*>::push_back(nullptr). Once that breaks, I'll just delete the first line.
- (...with std::vector<MyCrappyType>, and not with just any vector type).

Since I just noticed the original article I referred to has a snide comment about development environments, I'll include this just for kicks: Predefined Macros in VC++ 2005.
First, a word of warning: this is not portable. Second, being able to produce stack traces (outside of the debugger) is something that's usually reserved for languages like Python or Java, but it's quite nice to have them in C++. There are several hurdles to overcome, however.
This part is pretty easy, but unless you're nosy with the header files in /usr/include, you're not likely to stumble upon it by chance.
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

void print_trace(FILE *out, const char *file, int line)
{
    const size_t max_depth = 100;
    size_t stack_depth;
    void *stack_addrs[max_depth];
    char **stack_strings;

    stack_depth = backtrace(stack_addrs, max_depth);
    stack_strings = backtrace_symbols(stack_addrs, stack_depth);

    fprintf(out, "Call stack from %s:%d:\n", file, line);
    for (size_t i = 1; i < stack_depth; i++) {
        fprintf(out, "    %s\n", stack_strings[i]);
    }
    free(stack_strings); // malloc()ed by backtrace_symbols
    fflush(out);
}
GCC also provides access to the C++ name (de)mangler. There are some pretty hairy details to learn about memory ownership, and interfacing with the stack trace output requires a bit of string parsing, but it boils down to replacing the above inner loop with this:
#include <cxxabi.h>
#include <cstring>
...
    for (size_t i = 1; i < stack_depth; i++) {
        size_t sz = 200; // just a guess, template names will go much wider
        char *function = static_cast<char*>(malloc(sz));
        char *begin = 0, *end = 0;
        // find the parentheses and address offset surrounding the mangled name
        for (char *j = stack_strings[i]; *j; ++j) {
            if (*j == '(') {
                begin = j;
            }
            else if (*j == '+') {
                end = j;
            }
        }
        if (begin && end) {
            *begin++ = '\0';
            *end = '\0';
            // found our mangled name, now in [begin, end)
            int status;
            char *ret = abi::__cxa_demangle(begin, function, &sz, &status);
            if (ret) {
                // return value may be a realloc() of the input
                function = ret;
            }
            else {
                // demangling failed, just pretend it's a C function with no args
                std::strncpy(function, begin, sz);
                std::strncat(function, "()", sz);
                function[sz-1] = '\0';
            }
            fprintf(out, "    %s:%s\n", stack_strings[i], function);
        }
        else {
            // didn't find the mangled name, just print the whole line
            fprintf(out, "    %s\n", stack_strings[i]);
        }
        free(function);
    }
There. You could do a bit more optimization, but I'll leave that as an exercise to the reader. The important thing is to obey exactly what the ABI requires regarding dynamic memory:

- You hand __cxa_demangle a buffer allocated with malloc, along with the current size of the buffer.
- It may realloc your buffer to make space for the whole name, and it returns the result, which may be a different pointer. Or it fails and returns NULL, because you didn't pass in a mangled name it could understand.
- Either way, you own the memory afterwards: free the returned buffer (or your original buffer, if it returned NULL).
Otherwise, you’ll get a segmentation fault somewhere along the line, and a stack trace that blows up the program isn’t useful except for post-mortem analysis in gdb.
Even if you go through all of the other steps, if you don’t account for this in your build phase, you will get a pretty useless stack trace. Here’s what I got from my experiment initially:
Call stack from backtrace.cxx:105:
debug/backtrace:__gxx_personality_v0()
debug/backtrace:__gxx_personality_v0()
debug/backtrace:__gxx_personality_v0()
debug/backtrace:__gxx_personality_v0()
/lib/tls/i686/cmov/libc.so.6:__libc_start_main()
debug/backtrace:__gxx_personality_v0()
After poking around, I realized I had to add -rdynamic to my linker flags so that all symbols would be exported into the executable. I haven’t experimented, but I would guess this applies to shared-object building as well.
With -rdynamic, my stack trace looks a lot nicer:
Call stack from backtrace.cxx:105:
debug/backtrace:hot_potato::pass(double, double)
debug/backtrace:hot_potato::pass(int)
debug/backtrace:hot_potato::pass()
debug/backtrace:main()
/lib/tls/i686/cmov/libc.so.6:__libc_start_main()
debug/backtrace:__gxx_personality_v0()
Bingo! I only get a source file and line number from the call site, but it's not too difficult to trace back through the callers from this point. In this case, my executable is named debug/backtrace, and hot_potato is just an example class that calls its own functions to give me a pretty chain to look at.
Being able to create a stack trace is only half of it. Now you actually have to use it to get any value out of it. Here's the rub: this is most useful when an exception gets caught, but by that time the data is already gone, because the exception's been thrown. Logically, then, it makes sense to encapsulate this functionality into an exception class (e.g. class app_exception : public std::exception) that gets thrown. Simply replace all of the fprintf calls with something that generates a string, execute it in the exception class's constructor, and print it out in the catch block.
Another option would be to allocate thread-local storage for stack traces (similar to the current thread-local errno), and then have all of your important functions call set_thread_stacktrace, which populates that thread-local storage. Then the exception handlers can just pull that data regardless of the type of exception thrown. I think this is better from the flexibility aspect, but I haven't actually investigated its feasibility, nor the performance impact of recalculating the trace all the time.
I admit, I’ve been experimenting more with awk lately. Generally, my opinion has always been, “If it’s not simple enough for #!/bin/bash, I’d rather use python/perl/ruby.” Figured I’d simplify my life by having one less flavor of syntax/regexps to worry about.
What a silly idea! While Python may be great for “enterprise-class”[1] log analysis, nothing beats awk for one-liners. Take a few examples off the top of my head…
$ zcat /var/log/auth.log.*.gz | awk '$6 == "Invalid" { print $8 }' \
    | sort | uniq -c | sort -n -r | head -n 30
     79 admin
     71 test
     52 user
     43 michael
     40 alex
     39 guest
     32 oracle
     30 www
     30 dave
     28 info
     26 sales
     25 web
     25 ben
     23 victoria
     23 paul
     23 httpd
     23 adam
     22 john
     21 shop
     21 mike
     21 ftp
     21 david
     21 caroline
     21 amanda
     20 toor
     20 server
     20 samba
     20 linux
     20 danny
     20 claire
Most interesting... Nobody bothers to try root, but apparently someone's used toor before. Also, I see a mix of common first names as well as well-known Linux service names (httpd, ftp, etc.). My question is... are there that many sysadmins named caroline?
$ zcat /var/log/auth.log.*.gz | awk '$6 == "Invalid" { print $10 }' \
    | sort | uniq -c | sort -n -r
   5170 80.237.205.72
   1243 212.112.227.139
   1040 216.190.237.68
    336 193.137.179.181
    220 200.168.28.21
    132 222.128.249.253
     94 196.200.90.99
     64 200.105.16.242
     60 211.104.85.236
     44 61.192.163.188
     13 200.11.76.170
      6 210.100.157.9
      6 124.135.192.2
      5 222.69.93.27
      5 222.189.238.179
      3 70.97.158.195
Wow, 80.237.205.72 is a really persistent little bugger. Upon looking closer, I see all of the attempts were on a single day. Let’s see the latency between attempts:
$ zcat /var/log/auth.log.*.gz | awk '
    $6 == "Invalid" && $10 == "80.237.205.72" {
        oldsec = sec;
        split($3, time, ":");
        sec = time[3] + 60 * (time[2] + 60 * time[1]);
        if (oldsec > 0) {
            print sec - oldsec;
        }
    }' | sort -n | uniq -c
    267 2
   3366 3
   1457 4
     23 5
     19 6
     15 7
      1 8
      4 9
      1 10
      5 11
      1 13
      3 14
      2 15
      2 16
      2 24
      1 44
So basically, someone was trying to log in every 3 seconds, all day long.
Anyways, I thought I would have something more interesting from awk, but this’ll have to suffice.
Footnote [1] Whatever that means
I spent about two hours today trying to debug a race condition in a multi-threaded C++ app... definitely not a fun thing to do. The worst part? The runtime diagnostics weren't giving me anything useful to work with! Sometimes things just worked; sometimes I got segmentation faults inside old, well-tested parts of the application. At one point, I saw this error pop up:
pure virtual method called terminate called without an active exception Aborted
What? I know I can't instantiate a class that has any pure-virtual methods, so how did this error show up? To debug it, I decided to replace all of the potentially-erroneous pure virtuals with stub functions that printed warnings to stderr. Lo and behold, I confirmed that polymorphism wasn't working in my application. I had a bunch of Deriveds sitting in memory, and yet the Base methods were being called.
Why was this happening? Because I was deleting objects while they were still in use. I don't know if this is GCC-specific or not, but something very curious happens inside of destructors. Because the object hierarchy's destructors get called from most-derived to least-derived, the object's vtable switches up through the parent classes. As a result, at some point in time (nondeterministic from a separate thread), my Derived objects were all really Bases. Calling a virtual member function on them in this mid-destruction state is what caused this situation.
Here’s about the simplest example I can think of that reproduces this situation:
#include <pthread.h>
#include <unistd.h>

struct base
{
    virtual ~base() { sleep(1); }
    virtual void func() = 0;
};

struct derived : public base
{
    virtual ~derived() { }
    virtual void func() { return; }
};

static void *thread_func(void *v)
{
    base *b = reinterpret_cast<base*>(v);
    while (true) b->func();
    return 0;
}

int main()
{
    pthread_t t;
    base *b = new derived();
    pthread_create(&t, 0, thread_func, b);
    delete b;
    return 0;
}
So what’s the moral of the story? If you ever see the error message pure virtual method called / terminate called without an active exception, check your object lifetimes! You may be trying to call members on a destructing (and thus incomplete) object. Don’t waste as much time as I did.
I’ve had several conversations with people who seemed overly proud about 100% code-coverage in their unit tests. Obviously, that’s a good thing: the more test cases, the less likelihood of a latent fault existing in the software. But code coverage has its dark side, too. Take a look at this (extremely contrived) C example:
unsigned int noop(unsigned int x)
{
    unsigned int y = x << 4;
    return y >> 4;
}
There are no branches in the code, so the cyclomatic complexity is great! In fact, I can get 100% test coverage with a single successful test case:
int main()
{
    return (noop(5) == 5) ? 0 : 1;
}
However, there can be branches in the behavior of that function; in this case, they come from the overflow rules for unsigned integers. Any argument that happens to use the top 4 bits will get those bits truncated, and would fail my unit test. For this simple function, it is possible to rigorously prove its behavior, so everyone can see that a second test case is required:
#include <limits.h>

int main()
{
    return (noop(5) == 5) && (noop(UINT_MAX) == UINT_MAX) ? 0 : 1;
}
Now the unit test demonstrates a failure, without changing the test coverage at all. For real software systems, the reality is that there are two problems: code that is never exercised at all, and code whose behavior branches on its data in ways a coverage tool cannot see.
Obviously, a piece of code that is never exercised has a 0% assurance rating. Thus, reasonable code coverage (near-100%) is a necessity for assurance, but in itself, does not always provide a high level of assurance. I suppose this is where some “test-case generation” tools come into play. Being able to generate sets of input that cover the “data-dependent behavioral branching” or being able to measure coverage based on parameters passed can be hugely powerful to deal with this sort of problem.
GCC has flags. A lot of them. I've spent a fair amount of time going through the man page trying to figure out the best "general purpose" set of flags for my own personal development. Here's what I use as the baseline for my home C++ projects (GCC 4.3.0, Linux, old Intel Pentium 4). YMMV, especially with third-party tools, since a lot of these settings are on the strict side.
Enjoy.
- The stray semicolon in namespace foo { };.
- long long as a data type for sequence ids.
- <stdint.h> or <inttypes.h> in C++ code; these are required to enable all of the C99 features.
- std::string used operator const char* instead of c_str(), as in printf("the answer-->[%s]\n", (const char*)answer); This looks like an eyesore, and in the case of obscure types, it's not obvious to a maintenance programmer whether it means static_cast (yes, in this case), reinterpret_cast (segfault due to garbage data), or dynamic_cast (segfault due to a NULL string).
- -fstrict-aliasing.
- printf will core or (worse) print weird data at run-time if the format arguments don't match the varargs. I've never been too fearful of varargs in C++ (unlike most of the rest of the community), mostly because GCC protects me from my own carelessness in this way.
- static.
- The override keyword that C# has, to specify "this method only exists to implement the behavior of a virtual method from a parent." Any way to detect mismatched virtual method overrides is good.
- Forgetting to add public to a class declaration. It's more obvious than weird errors later complaining that the object can't be instantiated.
- Allowing T* to downgrade to char*. Implicit is generally "bad" because it's hidden, but this tends to only work in obvious places such as memcpy(&dst, &src, sizeof(src)).
- static_cast<T> for something like the C socket API that distinguishes between struct sockaddr and struct sockaddr_in (and the relevant structures are all local stack objects).
- reinterpret_cast<T> for C-style APIs that pass around void*, and union everywhere else. GCC seems to deal with unions better than arbitrary casting.
- std::vector::push_back() could be written such that the expected case (size() < capacity()) incurs no branch misprediction and exhibits maximal instruction cache locality. In personal experiments, I've seen this make a difference of fivefold or more for very lightweight template containers.
- The x87 registers hold values at higher precision than double, which is lost when they round-trip to memory. The net effect is that in certain edge cases, double x = 0.1; double y = x; assert(x == y) can result in an assertion failure due to lost significant figures. You can force all floating-point calculations to round-trip through memory with -ffloat-store, but that incurs a significant performance penalty (and if you weren't concerned with performance you wouldn't be using C++, would you?). However, from what I have read, using SSE instructions mitigates this issue entirely.
- Replacing str{cpy,len,...} and mem{cpy,move,set} library calls with GCC builtins, which generally turn into multibyte assignments or machine-specific string instructions. I've seen it turn a strcpy into several movl instructions, with the string data interpreted as an array of unsigned integers. Neat. Usually this is faster (due to the removal of a function call), but it doesn't always speed up code: the extra instructions may increase instruction cache misses, which definitely affects aggressively inlined blocks.
%.o : %.cxx
$(CXX) -c -o $@ $< -MMD -MF $(basename $@).dep $(CXXFLAGS)
include $(wildcard *.dep)
With that, foo.cxx produces object file foo.o and Makefile dependency-rule file foo.dep. I always find it best to use GCC for the dependency generation rather than a separate step (such as the makedepend program, or some batshit-insane sed scripts I've seen littering Makefiles, probably a relic from before the compiler generated this information). GCC itself produces a 100% accurate result, and the generated rule has all pathing information set correctly as well, which other tools may not manage. Add to that the fact that tools like makedepend modify the Makefile itself by default, which adds a lot of unnecessary churn in the revision control software.
Here’s a snippet out of one of my Makefiles that includes most or all of these settings:
CXX = g++ -Wa,-a -pipe
CC = gcc -Wa,-a -pipe
LD = g++ -pipe

WARN = error all extra write-strings init-self cast-align cast-qual \
       pointer-arith strict-aliasing format=2 uninitialized \
       missing-declarations no-long-long no-unused-parameter
CXXWARN = overloaded-virtual non-virtual-dtor ctor-dtor-privacy
OPTIM = \
    $(addprefix -f,strict-aliasing reorder-blocks) \
    $(addprefix -m,arch=native sse2 fpmath=sse inline-all-stringops) \
    -O3

CXXFLAGS = -ansi -pedantic -std=c++0x -ggdb \
    $(addprefix -W,$(WARN) $(CXXWARN)) $(OPTIM)
CFLAGS = -ansi -pedantic -std=c99 -ggdb \
    $(addprefix -W,$(WARN)) $(OPTIM)
LDFLAGS = -lrt

define DO_LINK
$(LD) -o $@ $^ $(LDFLAGS)
endef
define DO_COMPILE_CXX
$(CXX) -c -o $@ $< $(basename $@).s
endef
define DO_COMPILE_C
$(CC) -c -o $@ $< $(basename $@).s
endef