Debugging dlopen with rpath and ldd
Sebastian Good

This post is really just a little reminder to myself on how to debug some library dependency issues on a Linux machine. I built a nightly version of Julia (a language for technical computing that we’re pretty excited about here), on Linux, deployed it to a different machine, but then it failed to launch, complaining about

dlerror loading 'libopenblas': libopenblas: cannot open shared object file: No such file or directory

Oy. dlerror means it failed to open a dynamic library. That’s odd, since libopenblas was sitting right there in the lib directory. Perhaps it couldn’t find it? How do Linux shared libraries get loaded anyway? That’s a long and relatively complicated story (just as it is on Windows), but the first thing to do is just run ldd and see how it was linked.

$ ldd `which julia`
     =>  (0x00007ffff5988000)
     => /usr/local/bin/../lib/julia/ (0x00007f396642d000)
     => /lib/x86_64-linux-gnu/ (0x00007f3966206000)
     => /lib/x86_64-linux-gnu/ (0x00007f3965e3f000)
     => /lib/x86_64-linux-gnu/ (0x00007f3965c3b000)
     => /lib/x86_64-linux-gnu/ (0x00007f3965935000)
     => /lib/x86_64-linux-gnu/ (0x00007f396572c000)
     => /usr/lib/x86_64-linux-gnu/ (0x00007f3965428000)
     => /lib/x86_64-linux-gnu/ (0x00007f3965212000)
     /lib64/ (0x00007f39674b8000)

Hm. I don’t even see libopenblas in there. That means it must be loaded explicitly via a call to dlopen at run time. Might it not have the right path to search? I tried setting LD_LIBRARY_PATH to include the relevant lib folder, but still got the same error. If that didn’t work, it must be because Julia was using rpath linking. Perhaps the implied path was wrong? So how do we see which paths the executable might be searching as it loads libraries? Not so hard:

$ objdump -x `which julia` | grep RPATH
  RPATH                $ORIGIN/../lib/julia:$ORIGIN/../lib

That looks good. $ORIGIN is special for rpath: it stands for the directory containing the executable, so libraries are searched for relative to wherever the binary lives. In this case, the path was fine. libopenblas was indeed found at $ORIGIN/../lib/julia. So my theory was bad. libopenblas was most certainly getting found, even if the error message made it sound like it wasn’t.
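To make the $ORIGIN substitution concrete, here is a minimal sketch of the expansion applied to an RPATH string. The helper name expand_rpath is made up for illustration, it only handles the plain $ORIGIN spelling, and it is a rough approximation rather than a faithful model of the dynamic loader.

```shell
# expand_rpath EXECUTABLE RPATH
# Hypothetical helper: prints one search directory per line, with the
# literal token $ORIGIN replaced by the directory holding EXECUTABLE.
# Assumes that directory name contains no '|' or '&' characters.
expand_rpath() {
    origin=$(dirname "$1")
    # Split the colon-separated list into lines, then substitute $ORIGIN.
    printf '%s\n' "$2" | tr ':' '\n' | sed 's|\$ORIGIN|'"$origin"'|g'
}
```

Assuming julia lives in /usr/local/bin (as the ldd output suggests), running expand_rpath /usr/local/bin/julia '$ORIGIN/../lib/julia:$ORIGIN/../lib' prints /usr/local/bin/../lib/julia and /usr/local/bin/../lib, which are the two directories worth checking by hand with ls.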

Well shoot, every time this sort of thing happens to me I forget about transitive library failures until I step away from the computer. Yes, the error says “libopenblas: cannot open shared object file”, but what if it was a problem further downstream?

$ ldd /usr/local/lib/julia/
     =>  (0x00007fff7f506000)
     => /lib/x86_64-linux-gnu/ (0x00007f23e6674000)
     => /lib/x86_64-linux-gnu/ (0x00007f23e6456000)
     => not found
     => /lib/x86_64-linux-gnu/ (0x00007f23e608f000)
     /lib64/ (0x00007f23e8ce4000)

AHA! The real problem is libgfortran, easily addressed by a

$ sudo apt-get install -y libgfortran3

And I was back to solving interesting problems.
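For next time, the transitive check can be scripted so that I don’t have to remember it. The sketch below is only that: the function names missing_libs and scan_dir are made up, and it assumes ldd prints unresolved dependencies in the "name => not found" form seen above.

```shell
# missing_libs: read ldd output on stdin and print the name of every
# dependency that ldd reports as "not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# scan_dir DIR: run ldd on each shared object directly inside DIR and
# print "file: missing-library" for every unresolved dependency.
scan_dir() {
    for f in "$1"/*.so*; do
        [ -f "$f" ] || continue   # skip the unexpanded glob and non-files
        ldd "$f" 2>/dev/null | missing_libs | while read -r lib; do
            printf '%s: %s\n' "$f" "$lib"
        done
    done
}
```

Something like scan_dir /usr/local/lib/julia would then list every library in that folder with a broken dependency in one pass. On glibc systems, setting LD_DEBUG=libs before launching the program is another way in: it makes the loader print each library search to stderr as it happens.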