Compile commands

Interpreting source code requires a certain amount of context.

#include <stdio.h> // which file is this, exactly?

char data[sizeof(int)]; // how big is this array?

@class Foo; // Objective-C, or just a syntax error?

C++ compilers expect this context to be passed as command-line flags (and provide some defaults). A command might look like: clang -x objective-c++ -I/path/headers --target=x86_64-pc-linux-gnu -DNDEBUG foo.mm. Build systems are responsible for collecting the right flags for each file.

Clangd also needs this bundle of hints for the files it operates on. It reuses the compiler’s approach: first we determine a virtual compile command for each opened file (ideally by consulting the build system). Then we use this command to configure the parser.

Compile command vs clangd flags

These are easy to conflate, and important to distinguish!

clangd flags are passed when the editor starts clangd. They appear in the clangd log at the very start, like

I[...] argv[1]: --log=verbose

The compile command (or compile flags) is a virtual command that is constructed and interpreted within clangd. It is logged when a file is opened, e.g.

I[...] ASTWorker building file /path/test.cc version 1 with command [/path]
/usr/bin/clang /path/test.cc -DNDEBUG -fsyntax-only -resource-dir=/usr/lib/clang/lib/12/

What can be in a compile command?

Because clangd embeds both clang’s driver (used to interpret compile commands) and clang’s parser (which is controlled by flags), most flags that can be passed to clang will also work with clangd. The defaults are also similar to clang running on the same system.

The most critical flags in practice are:

Without these flags, clangd will often spectacularly fail to parse source code, generating many spurious errors (e.g. #included files not being found).

Many other flags are interesting though:

As well as flags, the named program (argv[0]) affects parser behavior. clang++ will parse as C++ by default, while clang will assume C. /opt/llvm/bin/clang will search for the standard library under /opt/llvm as well as the usual places.

(The working directory is also considered part of the compile command; it mostly affects how relative paths within the compile command are interpreted.)
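To illustrate the role of the working directory, here is a minimal sketch that resolves the -I directories of a command against it. The helper name absolute_include_dirs is hypothetical, and only the -Idir and -I dir spellings are handled; clang's driver understands many more path-carrying flags than this.

```python
import posixpath
from typing import List, Sequence


def absolute_include_dirs(args: Sequence[str], directory: str) -> List[str]:
    """Return the -I directories in `args`, resolved against `directory`."""
    dirs = []
    pending = False  # True when the previous argument was a bare "-I"
    for arg in args:
        if pending:
            dirs.append(arg)
            pending = False
        elif arg == "-I":
            pending = True
        elif arg.startswith("-I"):
            dirs.append(arg[2:])
    # join() leaves already-absolute paths alone and anchors relative ones.
    return [posixpath.normpath(posixpath.join(directory, d)) for d in dirs]


command = ["clang", "-Iinclude", "-I", "../third_party", "-I/usr/include", "foo.cc"]
print(absolute_include_dirs(command, "/home/me/proj/build"))
```

The same command run from a different directory would name different include directories, which is why the directory has to travel with the command.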

Where do compile commands come from?

Generally, clangd first obtains a basic compile command from a compilation database and then applies some tweaks to it.

Compilation databases

A compilation database describes compile commands for a codebase. It can be:

We first check for a compilation database in the directory containing the source file, and then walk up into parent directories until we find one.
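The upward search can be sketched roughly as follows. This is an illustration only, not clangd's implementation: the function name find_compilation_database is hypothetical, a single database filename is assumed, and none of the caching or refreshing done by GlobalCompilationDatabase is modelled.

```python
from pathlib import Path
from typing import Optional


def find_compilation_database(source_file: str,
                              db_name: str = "compile_commands.json") -> Optional[Path]:
    """Return the nearest `db_name` at or above the source file's directory."""
    start = Path(source_file).resolve().parent
    # Check the file's own directory first, then each ancestor up to the root.
    for directory in (start, *start.parents):
        candidate = directory / db_name
        if candidate.is_file():
            return candidate
    return None
```

For a file at proj/src/lib/foo.cc, this checks proj/src/lib, then proj/src, then proj, and so on, returning the first database it finds.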

GlobalCompilationDatabase is responsible for discovering, caching and refreshing compilation databases.

Commands for header files

Clangd parses headers like any other source file (which is why it only supports self-contained headers). However, most build systems don’t compile headers directly, and therefore don’t record compile commands for them.

If no command is available for a header, but a file has been opened that transitively includes it, then that file’s command will be used. To enable this, TUScheduler keeps a HeaderIncluderCache to look up the relevant file.

Otherwise, when compile_commands.json is used, we’ll pick some entry whose filename most closely matches that of the header. The idea is that bar/foo.cc is likely to include bar/foo.h and therefore have a compatible command. This is implemented in InterpolatingCompilationDatabase. As it’s purely a filename-based heuristic, it occasionally goes wrong. It can also provide a decent default command for newly created files the build system doesn’t know about yet.
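A toy version of such a filename heuristic might look like the sketch below. The scoring used here (matching basename first, then shared leading directories) is an assumption made for illustration; it is not the scoring InterpolatingCompilationDatabase actually uses, and pick_similar_entry is a hypothetical name.

```python
from pathlib import PurePosixPath
from typing import Iterable, Optional


def pick_similar_entry(header: str, db_files: Iterable[str]) -> Optional[str]:
    """Pick the database file whose path looks most like `header`."""
    target = PurePosixPath(header)

    def score(candidate: str) -> tuple:
        path = PurePosixPath(candidate)
        same_stem = path.stem == target.stem  # e.g. foo.cc vs foo.h
        # Count leading directory components shared with the header.
        shared_dirs = 0
        for a, b in zip(path.parent.parts, target.parent.parts):
            if a != b:
                break
            shared_dirs += 1
        return (same_stem, shared_dirs)

    # Highest score wins; an empty database yields None.
    return max(db_files, key=score, default=None)


print(pick_similar_entry("bar/foo.h", ["baz/foo.cc", "bar/foo.cc", "bar/other.cc"]))
```

Because nothing here looks at file contents, a header can easily be matched to a file that never includes it, which is the failure mode described above.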

If a command has been “borrowed” from another file, this is noted when the compile command is logged (... with command inferred from foo/bar.cc).

Fallback commands

If no compilation database is found, a very simple command like clang foo.cc is used. For a real project this will often fail to find #includes, but it allows clangd to work on toy examples without configuration.

Tweaks always applied

Query-driver

If the compile command names a compiler that is present, then clangd can query it to understand its default configuration (header search paths and target), and then adjust the compile command to configure the clang parser to match.

The compiler (e.g. custom-gcc) must be compatible enough with gcc to dump its configuration in response to this command:

$ custom-gcc -E -v -x c++ /dev/null
Target: arm-linux-gnueabihf
...
#include <...> search starts here:
 /opt/custom-gcc/include/c++/10
 ...
End of search list.

This would cause clangd to add --target=arm-linux-gnueabihf -isystem /opt/custom-gcc/include/c++/10 to the compile command, to better simulate this toolchain. This is often easier than configuring the correct search paths by hand when code is designed to build with a customized toolchain.
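As a rough illustration of this step, the sketch below parses driver output of the shape shown above and derives the extra flags. It is a simplified stand-in rather than clangd's code: the function names are hypothetical, and only the Target line and the angle-bracket search list are recognized.

```python
from typing import List, Optional, Tuple


def parse_driver_output(text: str) -> Tuple[Optional[str], List[str]]:
    """Extract the target triple and system include dirs from `-E -v` output."""
    target = None
    include_dirs = []
    in_search_list = False
    for line in text.splitlines():
        if line.startswith("Target: "):
            target = line[len("Target: "):].strip()
        elif line.startswith("#include <...> search starts here:"):
            in_search_list = True
        elif line.startswith("End of search list."):
            in_search_list = False
        elif in_search_list and line.strip():
            include_dirs.append(line.strip())
    return target, include_dirs


def extra_flags(target: Optional[str], include_dirs: List[str]) -> List[str]:
    """Turn the parsed configuration into flags for the clang parser."""
    flags = [f"--target={target}"] if target else []
    for directory in include_dirs:
        flags += ["-isystem", directory]
    return flags


sample = "\n".join([
    "Target: arm-linux-gnueabihf",
    "#include <...> search starts here:",
    " /opt/custom-gcc/include/c++/10",
    "End of search list.",
])
print(extra_flags(*parse_driver_output(sample)))
```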

The compiler queried is always the argv[0] of the compile command, but the clangd flag --query-driver=/path/to/custom-gcc must be passed to enable this. (This must be explicitly enabled as we’re executing otherwise unknown binaries!)

Customizing compile commands

Users may want to modify the compile command used for various reasons. This is the preferred way to choose which warnings to show, and can sometimes be used to work around clangd bugs/limitations.

Editing compile_commands.json is usually not a feasible option, as it is a generated file and changes will quickly be overwritten (though writing tools to automate customizations is possible).

The configuration file is a simpler alternative, allowing compile flags to be added or removed, e.g.

CompileFlags:
  Add: '-Wswitch'
  Remove: '-Werror'
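Conceptually, the effect of such a fragment on a command can be sketched as below. This is a simplification under stated assumptions: apply_compile_flags is a hypothetical helper, removal is by exact string match only, removal happens before addition, and added flags are simply appended to a non-empty command. clangd's real matching of removed flags is richer than this.

```python
from typing import List, Sequence


def apply_compile_flags(command: Sequence[str],
                        add: Sequence[str] = (),
                        remove: Sequence[str] = ()) -> List[str]:
    """Drop exact matches of `remove`, then append `add` to the command."""
    program, *args = command  # argv[0] is never removed
    removed = set(remove)
    kept = [arg for arg in args if arg not in removed]
    return [program, *kept, *add]


original = ["clang", "-Werror", "-DNDEBUG", "foo.cc"]
print(apply_compile_flags(original, add=["-Wswitch"], remove=["-Werror"]))
```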

Compile command problems

Incorrect compile commands can cause problems of different severity. In all cases, the log contains the compile command that was used, and it can be easiest to feed variations of that command to clang to try to understand the problem.

Unusable command

If a command is completely malformed, we won’t run the parser. This produces some diagnostics, which are attached to line 1, but other features on the file will not be available (they will fail with “invalid AST”).

(This isn’t great, maybe we should recover with a fallback command instead?)

This can be recognized by the diagnostics (often “expected exactly one compiler job”), by subsequent “invalid AST” errors, and by the “Could not build CompilerInvocation” log message.

Severe parsing problems

If the command was unsuitable for the file, then the parser may run but fail to understand much of the file. Typical causes:

Usually diagnostics reported near the top of the file will make these problems obvious.

Subtle parsing problems

Sometimes only a few constructs do not get parsed correctly. For example, the code may use some C++20 constructs while the compile command doesn’t specify the language version, or some expected preprocessor symbols may be missing.

This may happen when compile commands from another compiler are used, and the defaults are different (e.g. GCC currently defaults to C++17, vs C++14 for clang).

This will usually result in diagnostics that pinpoint the problem.

Headers outside the project directory

When opening a header outside the project directory (for example, a header from an external library that’s included by a file in the project), clangd will typically fail to find a compilation database (which is usually located in the project directory), and fall back to a default command that may not include flags that are important for parsing the header’s code (for example, include paths to locate the headers that it includes).

See this FAQ question for a way to work around this.
