Compile commands
Interpreting source code requires a certain amount of context.
#include <stdio.h> // which file is this, exactly?
char data[sizeof(int)]; // how big is this array?
@class Foo; // objective-C, or just a syntax error?
C++ compilers expect this context to be passed as command-line flags (and
provide some defaults). A command might look like:
clang -x objective-c++ -I/path/headers --target=x86_64-pc-linux-gnu -DNDEBUG foo.mm
Build systems are responsible for collecting the right flags for each file.
Clangd also needs this bundle of hints for the files it operates on. It reuses the compiler’s approach: first we determine a virtual compile command for each opened file (ideally by consulting the build system). Then we use this command to configure the parser.
Compile command vs clangd flags
These are easy to conflate, and important to distinguish!
clangd flags are passed when the editor starts clangd.
They appear in the clangd log at the very start, like
I[...] argv[1]: --log=verbose
The compile command (or compile flags) is a virtual command constructed and interpreted within clangd. It is logged when a file is opened, e.g.
I[...] ASTWorker building file /path/test.cc version 1 with command [/path]
/usr/bin/clang /path/test.cc -DNDEBUG -fsyntax-only -resource-dir=/usr/lib/clang/lib/12/
What can be in a compile command?
Because clangd embeds both clang’s driver (used to interpret compile commands)
and clang’s parser (which is controlled by flags), most flags that can be passed
to clang will also work with clangd. The defaults are also similar to clang
running on the same system.
The most critical flags in practice are:
- Setting the #include search path: -I, -isystem, and others.
- Controlling the language variant used: -x, -std etc.
- Predefining preprocessor macros: -D and friends.
Without these flags, clangd will often spectacularly fail to parse source code, generating many spurious errors (e.g. #included files not being found).
Many other flags are interesting though:
- Warnings are controlled with compile flags (e.g. -Wall, -Wswitch, -Wno-switch, -Werror …)
- Changing the target platform (e.g. --target) affects some builtins, like the size of long.
As well as flags, the named program (argv[0]) affects parser behavior.
clang++ will parse as C++ by default, while clang will assume C.
/opt/llvm/bin/clang will search for the standard library under /opt/llvm
as well as the usual places.
(The working directory is also considered part of the compile command; it mostly affects how relative paths within the compile command are interpreted.)
Where do compile commands come from?
Generally, clangd first obtains a basic compile command from a compilation database and then applies some tweaks to it.
Compilation databases
A compilation database describes compile commands for a codebase. It can be:
- a file named compile_commands.json listing commands for each file. Usually generated by a build system like CMake.
- a file named compile_flags.txt listing flags to be used for all files. Typically hand-authored for simple projects.
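As a rough illustration of the first form, the sketch below reads a compile_commands.json document and maps each file to its argument list. It assumes the usual JSON compilation database layout, where each entry has a "file" key and either an "arguments" list or a single "command" string. load_commands is a hypothetical helper, not part of clangd, and splitting "command" with POSIX shell rules is a simplification.

```python
import json
import shlex
from typing import Dict, List


def load_commands(database_text: str) -> Dict[str, List[str]]:
    """Map each file named in a compile_commands.json document to its argv."""
    commands: Dict[str, List[str]] = {}
    for entry in json.loads(database_text):
        if "arguments" in entry:
            # The command is already split into individual arguments.
            argv = list(entry["arguments"])
        else:
            # Assumption: the "command" string follows POSIX shell quoting.
            argv = shlex.split(entry["command"])
        # The file name is kept exactly as written in the database.
        commands[entry["file"]] = argv
    return commands
```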
We first check for a compilation database in the directory containing the source file, and then walk up into parent directories until we find one.
GlobalCompilationDatabase is responsible for discovering, caching and refreshing compilation databases.
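The upward directory walk can be sketched as follows. This is a minimal illustration rather than clangd’s implementation: find_compilation_database is a hypothetical name, caching and refreshing are left out, and preferring compile_commands.json over compile_flags.txt within the same directory is an assumption of this sketch.

```python
from pathlib import Path
from typing import Optional

# File names taken from the list of database kinds above.
DATABASE_NAMES = ("compile_commands.json", "compile_flags.txt")


def find_compilation_database(source_file: Path) -> Optional[Path]:
    """Return the nearest database at or above the source file's directory."""
    start = source_file.resolve().parent
    # Check the file's own directory first, then each parent in turn.
    for directory in (start, *start.parents):
        for name in DATABASE_NAMES:
            candidate = directory / name
            if candidate.is_file():
                return candidate
    # Nothing found: the caller would use a fallback command.
    return None
```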
Commands for header files
Clangd parses headers like any other source file (which is why it only supports self-contained headers). However, most build systems don’t compile headers directly and therefore don’t record compile commands for them.
If no command is available for a header, but a file has been opened that
transitively includes it, then that file’s command will be used. To enable this,
TUScheduler keeps a HeaderIncluderCache to look up the relevant file.
Otherwise, when compile_commands.json is used, we’ll pick some entry whose
filename most closely matches that of the header. The idea is that bar/foo.cc
is likely to include bar/foo.h and therefore have a compatible command.
This is implemented in InterpolatingCompilationDatabase. As it’s purely a
filename-based heuristic it occasionally goes wrong.
It can also provide a decent default command for newly created files the build
system doesn’t know about yet.
If a command has been “borrowed” from another file, this is noted when the
compile command is logged (... with command inferred from foo/bar.cc).
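To make the filename heuristic concrete, here is a deliberately simplified sketch. It is not the scoring used by InterpolatingCompilationDatabase: pick_similar_entry and its two-part score (same base name first, then number of shared leading directories) are assumptions chosen only to show the idea.

```python
from pathlib import PurePosixPath
from typing import Iterable, Optional, Sequence


def _shared_prefix_len(a: Sequence[str], b: Sequence[str]) -> int:
    """Count how many leading path components two paths have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def pick_similar_entry(header: str, candidates: Iterable[str]) -> Optional[str]:
    """Pick the candidate whose path looks most like the header's path."""
    target = PurePosixPath(header)

    def score(path: str):
        p = PurePosixPath(path)
        same_stem = p.stem == target.stem  # foo.cc vs foo.h
        shared_dirs = _shared_prefix_len(p.parent.parts, target.parent.parts)
        return (same_stem, shared_dirs)

    # Returns None when there are no candidates at all.
    return max(candidates, key=score, default=None)
```

Because only names are compared, a file such as qux/foo.cc could be chosen for bar/foo.h if nothing in bar/ matches, which is one way the heuristic can go wrong.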
Fallback commands
If no compilation database is found, a very simple command like clang foo.cc
is used. For a real project this will often fail to find #includes, but it
allows clangd to work on toy examples without configuration.
Tweaks always applied
- bare command names like clang are made absolute (by looking them up on the $PATH, querying xcrun on mac, or failing that guessing). This increases the chances of being able to find the standard library.
- on mac, -isysroot is set to the default SDK (by querying xcrun). This more closely matches the behavior of Apple’s /usr/bin/clang (which is really a script that invokes the real clang with extra flags). Without this, the standard library again will not be found.
- -fsyntax-only is added because we’re just parsing, not compiling.
- -resource-dir=... is added, because built-in headers like <stddef.h> must be the ones installed with clangd.
- certain unsupported flags like -plugin are dropped.
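A few of these tweaks can be sketched in a handful of lines. The sketch below covers only three of them (making argv[0] absolute via $PATH, adding -fsyntax-only, and adding -resource-dir); the xcrun queries and the dropping of unsupported flags are omitted. apply_basic_tweaks is a hypothetical name, the resource directory is supplied by the caller, and the which parameter exists only so the lookup can be replaced in tests.

```python
import os
import shutil
from typing import Callable, List, Optional


def apply_basic_tweaks(
    argv: List[str],
    resource_dir: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a copy of argv with a subset of the standard tweaks applied."""
    program, *rest = argv  # raises ValueError if argv is empty
    # A bare name such as "clang" has no directory part: look it up on $PATH.
    if os.path.basename(program) == program:
        program = which(program) or program
    tweaked = [program, *rest]
    # We only parse, never compile. Skipping an existing flag is a choice of
    # this sketch, made to avoid duplicates.
    if "-fsyntax-only" not in tweaked:
        tweaked.append("-fsyntax-only")
    # Point at the built-in headers chosen by the caller.
    tweaked.append(f"-resource-dir={resource_dir}")
    return tweaked
```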
Query-driver
If the compile command names a compiler that is present, then clangd can query it to understand its default configuration (header search paths and target), and then adjust the compile command to configure the clang parser to match.
The compiler (e.g. custom-gcc) must be compatible enough with gcc to dump its
configuration in response to this command:
$ custom-gcc -E -v -x c++ /dev/null
Target: arm-linux-gnueabihf
...
#include <...> search starts here:
/opt/custom-gcc/include/c++/10
...
End of search list.
This would cause clangd to add
--target=arm-linux-gnueabihf -isystem /opt/custom-gcc/include/c++/10
to the compile command, to better simulate this toolchain. This is often easier
than configuring the correct search paths by hand when code is designed to
build with a customized toolchain.
The compiler queried is always the argv[0] of the compile command, but the
clangd flag --query-driver=/path/to/custom-gcc must be passed to enable this.
(This must be explicitly enabled as we’re executing otherwise unknown binaries!)
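The extraction step can be illustrated with a small parser over the driver’s verbose output. This is a sketch under assumptions, not clangd’s code: parse_driver_output and extra_flags are hypothetical names, the output is passed in as a string rather than captured from a process, and only the Target: line and the #include <...> search list shown above are recognized.

```python
from typing import List, Optional, Tuple

SEARCH_START = "#include <...> search starts here:"
SEARCH_END = "End of search list."


def parse_driver_output(output: str) -> Tuple[Optional[str], List[str]]:
    """Extract the target triple and system include directories."""
    target: Optional[str] = None
    includes: List[str] = []
    in_search_list = False
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Target:"):
            target = line[len("Target:"):].strip()
        elif line == SEARCH_START:
            in_search_list = True
        elif line == SEARCH_END:
            in_search_list = False
        elif in_search_list and line:
            # Every non-empty line inside the list is taken as a directory.
            includes.append(line)
    return target, includes


def extra_flags(target: Optional[str], includes: List[str]) -> List[str]:
    """Build the flags that would be added to the compile command."""
    flags: List[str] = []
    if target:
        flags.append(f"--target={target}")
    for directory in includes:
        flags += ["-isystem", directory]
    return flags
```

For the output shown above (with the elided lines removed), this yields the --target and -isystem flags described in the text.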
Customizing compile commands
Users may want to modify the compile command used for various reasons. This is the preferred way to choose which warnings to show, and can sometimes be used to work around clangd bugs/limitations.
Editing compile_commands.json is usually not a feasible option as it is
a generated file and changes will be quickly overwritten (though writing tools
to automate customizations is possible).
The configuration file is a simpler alternative, allowing compile flags to be added or removed. e.g.
CompileFlags:
  Add: '-Wswitch'
  Remove: '-Werror'
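Conceptually, this configuration edits the argument list before it reaches the parser. The sketch below shows that idea in its simplest form and is only an approximation: customize is a hypothetical helper, removal here is an exact match on single tokens, and additions are simply appended, whereas clangd’s actual matching and placement rules are more involved.

```python
from typing import Iterable, List


def customize(
    argv: List[str],
    add: Iterable[str] = (),
    remove: Iterable[str] = (),
) -> List[str]:
    """Return argv with some flags removed and others appended."""
    program, *flags = argv  # argv[0] is never removed
    unwanted = set(remove)
    kept = [flag for flag in flags if flag not in unwanted]
    return [program, *kept, *add]
```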
Compile command problems
Incorrect compile commands can cause problems of different severity.
In all cases, the log contains the compile command that was used, and it can be
easiest to feed variations of that command to clang to try to understand
the problem.
Unusable command
If a command is completely malformed, we won’t run the parser. This produces some diagnostics, which are attached to line 1, but other features on the file will not be available (they will fail with “invalid AST”).
(This isn’t great, maybe we should recover with a fallback command instead?)
This can be recognized by the diagnostics (often “expected exactly one compiler job”), by subsequent “invalid AST” errors, and by the “Could not build CompilerInvocation” log message.
Severe parsing problems
If the command was unsuitable for the file, then the parser may run but fail to understand much of the file. Typical causes:
- couldn’t find included headers because of missing -I flags
- command is for parsing C but the code is C++
Usually diagnostics reported near the top of the file will make these problems obvious.
Subtle parsing problems
Sometimes only a few constructs do not get parsed correctly. For example, the code may use some C++20 constructs while the compile command doesn’t specify the language version, or some expected preprocessor symbols may be missing.
This may happen when compile commands from another compiler are used, and the defaults are different (e.g. GCC 11 and later default to C++17, while clang versions before 16 defaulted to C++14).
This will usually result in diagnostics that pinpoint the problem.
Headers outside the project directory
When opening a header outside the project directory (for example, a header from an external library that’s included by a file in the project), clangd will typically fail to find a compilation database (which is usually located in the project directory), and fall back to a default command that may not include flags that are important for parsing the header’s code (for example, include paths to locate the headers that it includes).
See this FAQ question for a way to work around this.