Interpreting source code requires a certain amount of context.
#include <stdio.h> // which file is this, exactly?
char data[sizeof(int)]; // how big is this array?
@class Foo; // objective-C, or just a syntax error?
C++ compilers expect this context to be passed as command-line flags (and
provide some defaults). A command might look like:
clang -x objective-c++ -I/path/headers --target=x86_64-pc-linux-gnu -DNDEBUG foo.mm.
Build systems are responsible for collecting the right flags for each file.
Clangd also needs this bundle of hints for the files it operates on. It reuses the compiler’s approach: first we determine a virtual compile command for each opened file (ideally by consulting the build system). Then we use this command to configure the parser.
Compile command vs clangd flags
These are easy to conflate, and important to distinguish!
clangd flags are passed when the editor starts clangd.
They appear in the clangd log at the very start, like
I[...] argv: --log=verbose
The compile command (or compile flags) is a virtual command constructed and interpreted within clangd. It is logged when a file is opened, e.g.
I[...] ASTWorker building file /path/test.cc version 1 with command [/path] /usr/bin/clang /path/test.cc -DNDEBUG -fsyntax-only -resource-dir=/usr/lib/clang/lib/12/
What can be in a compile command?
Because clangd embeds both clang’s driver (used to interpret compile commands) and clang’s parser (which is controlled by flags), most flags that can be passed to clang will also work with clangd. The defaults are also similar to those of clang running on the same system.
The most critical flags in practice are:
- Setting the
-isystem, and others.
- Controlling the language variant used:
- Predefining preprocessor macros,
Without these flags, clangd will often spectacularly fail to parse source code, generating many spurious errors (e.g. #included files not being found).
Many other flags are interesting though:
- Warnings are controlled with compile flags (e.g.
- Changing the target platform (e.g.
--target) affects some builtins, like the size of
As well as flags, the named program (argv[0]) affects parser behavior.
- clang++ will parse as C++ by default, while clang will assume C.
- /opt/llvm/bin/clang will search for the standard library under
as well as the usual places.
(The working directory is also considered part of the compile command. It mostly affects how relative paths within the compile command are interpreted.)
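To illustrate how the working directory changes the meaning of a command, here is a minimal sketch that resolves -I search paths against it. The function name resolve_include_dirs is hypothetical, only the -I flag is handled, and POSIX-style paths are assumed. Clang's driver does considerably more than this.

```python
import posixpath


def resolve_include_dirs(directory, argv):
    """Return the -I search paths in argv as absolute paths.

    Relative paths are resolved against `directory`, the working
    directory recorded with the compile command.
    """
    resolved = []
    i = 1  # argv[0] names the compiler, it is not a flag
    while i < len(argv):
        arg = argv[i]
        if arg == "-I" and i + 1 < len(argv):
            # Separate form: -I path
            path = argv[i + 1]
            i += 1
        elif arg.startswith("-I") and len(arg) > 2:
            # Joined form: -Ipath
            path = arg[2:]
        else:
            i += 1
            continue
        # join() leaves absolute paths alone and anchors relative ones.
        resolved.append(posixpath.normpath(posixpath.join(directory, path)))
        i += 1
    return resolved
```

The same argv therefore yields different search paths depending on the directory it is paired with, which is why the two are treated as one unit.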
Where do compile commands come from?
Generally, clangd first obtains a basic compile command from a compilation database and then applies some tweaks to it.
A compilation database describes compile commands for a codebase. It can be:
- a file named compile_commands.json listing commands for each file. Usually generated by a build system like CMake.
- a file named compile_flags.txt listing flags to be used for all files. Typically hand-authored for simple projects.
We first check for a compilation database in the directory containing the source file, and then walk up into parent directories until we find one.
GlobalCompilationDatabase is responsible for discovering, caching and refreshing compilation databases.
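The directory walk described above can be sketched as follows. This is an illustration only, not clangd's implementation: the function name find_compilation_database is hypothetical, and preferring compile_commands.json over compile_flags.txt within one directory is an assumption made here.

```python
from pathlib import Path

# Assumption: checked in this order within each directory.
DATABASE_NAMES = ("compile_commands.json", "compile_flags.txt")


def find_compilation_database(source_file):
    """Return the nearest database file at or above source_file's directory.

    Returns None if no parent directory contains one.
    """
    start = Path(source_file).resolve().parent
    # Check the file's own directory first, then each ancestor in turn.
    for directory in (start, *start.parents):
        for name in DATABASE_NAMES:
            candidate = directory / name
            if candidate.is_file():
                return candidate
    return None
```

A nearer database shadows one further up the tree, so a subdirectory can carry its own compile_flags.txt without affecting the rest of the project.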
Commands for header files
Clangd parses headers like any other source file (which is why it only supports self-contained headers). However, most build systems don’t compile headers directly and therefore don’t record compile commands for them.
If no command is available for a header, but a file has been opened that transitively includes it, then that file’s command will be used. To enable this, TUScheduler keeps a HeaderIncluderCache to look up the relevant file.
Otherwise, if compile_commands.json is used, we’ll pick some entry whose
filename most closely matches that of the header. The idea is that
is likely to include
bar/foo.h and therefore have a compatible command.
This is implemented in InterpolatingCompilationDatabase. As it’s purely a
filename-based heuristic it occasionally goes wrong.
It can also provide a decent default command for newly created files the build
system doesn’t know about yet.
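A filename-based heuristic of this kind might look like the sketch below. The scoring is invented for illustration (a matching file stem counts most, then each matching trailing directory component) and is not the scoring used by InterpolatingCompilationDatabase. The function name pick_similar_entry and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def pick_similar_entry(header, database_files):
    """Pick the database file whose name most resembles `header`.

    Returns None when the database is empty.
    """
    target = PurePosixPath(header)

    def score(candidate):
        path = PurePosixPath(candidate)
        points = 0
        # Same name apart from the extension, e.g. x.h and x.cc.
        if path.stem == target.stem:
            points += 10
        # Compare parent directories from the innermost outwards.
        for ours, theirs in zip(reversed(target.parent.parts),
                                reversed(path.parent.parts)):
            if ours != theirs:
                break
            points += 1
        return points

    return max(database_files, key=score) if database_files else None
```

Because only names are compared, a header with no similarly named source file simply borrows from whichever neighbor scores highest, which is where the occasional wrong guess comes from.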
If a command has been “borrowed” from another file, this is noted when the
compile command is logged. (
... with command inferred from foo/bar.cc).
If no compilation database is found, a very simple command like
is used. For a real project this will often fail to find
#includes, but it
allows clangd to work on toy examples without configuration.
Tweaks always applied
- bare command names like clang are made absolute (by looking them up on the
xcrun on mac, or failing that guessing). This increases the chances of being able to find the standard library.
- on mac, -isysroot is set to the default SDK (by querying xcrun). This more closely matches the behavior of Apple’s /usr/bin/clang (which is really a script that invokes the real clang with extra flags). Without this, the standard library again will not be found.
- -fsyntax-only is added because we’re just parsing, not compiling.
- -resource-dir=... is added, because built-in headers like <stddef.h> must be the ones installed with clangd.
- certain unsupported flags like
If the compile command names a compiler that is present, then clangd can query it to understand its default configuration (header search paths and target), and then adjust the compile command to configure the clang parser to match.
The compiler (e.g.
custom-gcc) must be compatible enough with gcc to dump its
configuration in response to this command:
$ custom-gcc -E -v -x c++ /dev/null
Target: arm-linux-gnueabihf
...
#include <...> search starts here:
 /opt/custom-gcc/include/c++/10
...
End of search list.
This would cause clangd to add
/opt/custom-gcc/include/c++/10 to the compile command, to better simulate this
toolchain. This is often easier than configuring the correct search paths by
hand when code is designed to build with a customized toolchain.
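Extracting the configuration from that output can be sketched as below. This is a simplified illustration, not clangd's parser: the function names are hypothetical, and the choice of --target= and -isystem as the flags to append is an assumption about how the extracted values would be applied.

```python
def parse_driver_output(text):
    """Extract the target and system include dirs from `-E -v` output."""
    target = None
    include_dirs = []
    in_search_list = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Target:"):
            target = stripped[len("Target:"):].strip()
        elif stripped == "#include <...> search starts here:":
            in_search_list = True
        elif stripped == "End of search list.":
            in_search_list = False
        elif in_search_list and stripped:
            # Each line between the two markers is one search directory.
            include_dirs.append(stripped)
    return target, include_dirs


def extra_flags(target, include_dirs):
    """Build flags that would make clang's parser mimic the toolchain."""
    flags = []
    if target:
        flags.append("--target=" + target)
    for directory in include_dirs:
        flags += ["-isystem", directory]
    return flags
```

For the sample output above, this would yield the target arm-linux-gnueabihf and the search directory /opt/custom-gcc/include/c++/10.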
The compiler queried is always the argv[0] of the compile command, but the clangd flag --query-driver=/path/to/custom-gcc must be passed to enable this.
(This must be explicitly enabled as we’re executing otherwise unknown binaries!)
Customizing compile commands
Users may want to modify the compile command used for various reasons. This is the preferred way to choose which warnings to show, and can sometimes be used to work around clangd bugs/limitations.
Editing compile_commands.json is usually not a feasible option, as it is a generated file and changes will be quickly overwritten (though writing tools to automate customizations is possible).
The configuration file is a simpler alternative, allowing compile flags to be added or removed. e.g.
CompileFlags:
  Add: '-Wswitch'
  Remove: '-Werror'
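The effect of such a configuration on a command can be pictured with the sketch below. It is a deliberately naive model: the function name apply_compile_flags is hypothetical, removal is by exact string match only, and appending the added flags at the end is an assumption. clangd's real handling of Add and Remove is richer than this.

```python
def apply_compile_flags(argv, add=(), remove=()):
    """Return a new argv with `remove` flags dropped and `add` flags appended.

    The input list is left unmodified. argv[0] (the compiler) is never
    removed, even if it happens to match an entry in `remove`.
    """
    removed = set(remove)
    kept = argv[:1] + [arg for arg in argv[1:] if arg not in removed]
    return kept + list(add)
```

With the configuration above, a command such as clang -Werror -DNDEBUG foo.cc would lose -Werror and gain -Wswitch.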
Compile command problems
Incorrect compile commands can cause problems of different severity.
In all cases, the log contains the compile command that was used, and it can be easiest to feed variations of that command to clang to try to understand what is going wrong.
If a command is completely malformed, we won’t run the parser. This produces some diagnostics, which are attached to line 1, but other features on the file will not be available (they will fail with “invalid AST”).
(This isn’t great, maybe we should recover with a fallback command instead?)
This can be recognized by the diagnostics (often “expected exactly one compiler job”), by subsequent “invalid AST” errors, and by the “Could not build CompilerInvocation” log message.
Severe parsing problems
If the command was unsuitable for the file, then the parser may run but fail to understand much of the file. Typical causes:
- couldn’t find included headers because of missing
- command is for parsing C but the code is C++
Usually diagnostics reported near the top of the file will make these problems obvious.
Subtle parsing problems
Sometimes only a few constructs do not get parsed correctly. For example if the code uses some C++20 constructs but the compile command doesn’t specify the language version, or if some expected preprocessor symbols are missing.
This may happen when compile commands from another compiler are used, and the defaults are different (e.g. GCC currently defaults to C++17, vs C++14 for clang).
This will usually result in diagnostics that pinpoint the problem.