clangd code walkthrough

This describes the critical parts of clangd and roughly what they do. It may get out of date from time to time; if you notice, please file a bug! It mostly starts at the outside and works its way inward.

The clangd code lives in the llvm-project repository under clang-tools-extra/clangd. We’ll also mention some dependencies in other parts of llvm-project (such as clang).

Links below point to the woboq code browser. It has nice cross-referenced navigation, but may show a slightly outdated version of the code.


Starting up and managing files

Entry point and JSON-RPC

The clangd binary itself has its main() entry point in tool/ClangdMain.cpp. This mostly parses flags and hooks ClangdLSPServer up to its dependencies.

One vital dependency is JSONTransport, which speaks the JSON-RPC protocol over stdin/stdout. LSP is layered on top of this Transport abstraction. (There’s also an Apple XPC transport in the xpc/ directory.)
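To make the framing concrete: LSP sends each JSON-RPC message as a Content-Length header giving the body size in bytes, a blank line (\r\n\r\n), and then the JSON body. The sketch below shows only that framing. frameMessage and takeMessage are hypothetical helpers, not JSONTransport’s code, and they assume a well-formed header.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Wraps a JSON payload in the LSP base-protocol header.
// Content-Length counts bytes; std::string::size() is a byte count.
std::string frameMessage(const std::string &json) {
  return "Content-Length: " + std::to_string(json.size()) + "\r\n\r\n" + json;
}

// Removes one complete message from the front of `buffer` and returns its
// JSON body, or returns std::nullopt if the buffer doesn't hold a full
// message yet. Assumes a well-formed Content-Length header.
std::optional<std::string> takeMessage(std::string &buffer) {
  const std::string separator = "\r\n\r\n";
  const std::string key = "Content-Length: ";
  std::size_t headerEnd = buffer.find(separator);
  if (headerEnd == std::string::npos)
    return std::nullopt; // header not complete yet
  std::size_t keyPos = buffer.find(key);
  if (keyPos == std::string::npos || keyPos > headerEnd)
    return std::nullopt; // no length header: a real transport would report an error
  // std::stoul stops at the first non-digit, i.e. the trailing "\r\n".
  std::size_t length = std::stoul(buffer.substr(keyPos + key.size()));
  std::size_t bodyStart = headerEnd + separator.size();
  if (buffer.size() < bodyStart + length)
    return std::nullopt; // body not fully received yet
  std::string body = buffer.substr(bodyStart, length);
  buffer.erase(0, bodyStart + length);
  return body;
}
```

A transport loop would append bytes read from stdin to the buffer and call takeMessage() until it returns std::nullopt.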

We call ClangdLSPServer.run() to start the loop, and it synchronously processes messages until the client disconnects. Calls to the large non-threadsafe singletons (ClangdLSPServer, ClangdServer, TUScheduler) all happen on the main thread. See threads and request handling.

Language Server Protocol

ClangdLSPServer handles the LSP protocol details. Incoming requests are routed to some method on this class using a lookup table, and then implemented by dispatching them to the contained ClangdServer.
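A lookup table of this kind can be sketched as a map from method name to handler. Dispatcher, Params, and the error string are hypothetical stand-ins. clangd’s real table works with parsed JSON and replies with a proper JSON-RPC error for unknown methods.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Params stand in for a parsed JSON object in this sketch.
using Params = std::string;
using Handler = std::function<std::string(const Params &)>;

class Dispatcher {
public:
  // Registers the handler for one LSP method, e.g. "textDocument/hover".
  void bind(const std::string &method, Handler handler) {
    handlers[method] = std::move(handler);
  }

  // Routes a request by method name. Unknown methods produce an error string
  // here; a real server would send a JSON-RPC MethodNotFound error instead.
  std::string call(const std::string &method, const Params &params) const {
    auto it = handlers.find(method);
    if (it == handlers.end())
      return "error: method not found: " + method;
    return it->second(params);
  }

private:
  std::map<std::string, Handler> handlers;
};
```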

The incoming JSON requests are mapped onto structs defined in Protocol.h. In the simplest cases these are just forwarded to the appropriate method on ClangdServer; we use the LSP structs as vocabulary types for most things.

In other cases there’s some gap between LSP and what seems to be a sensible C++ API, so ClangdLSPServer has some real work to do.

ClangdServer and TUScheduler

The ClangdServer class is best thought of as the C++ API to clangd. Features tend to be implemented as stateless, synchronous functions (“give me hover information from this AST at offset 25”). ClangdServer exposes them as stateful, asynchronous functions (“compute hover information for the latest version of Foo.cpp at offset 25, call back when done”) which is the LSP model.
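The difference between the two styles can be sketched with toy types. hoverAt() plays the role of a stateless feature (it looks at text rather than an AST), and ToyServer is a hypothetical, much-simplified wrapper that remembers the latest contents and reports results through a callback from a background task.

```cpp
#include <cassert>
#include <cctype>
#include <functional>
#include <future>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Stateless, synchronous "feature": returns the identifier-like word
// surrounding `offset`. It stands in for a function that inspects an AST.
std::string hoverAt(const std::string &contents, std::size_t offset) {
  auto isWordChar = [](char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
  };
  if (offset >= contents.size() || !isWordChar(contents[offset]))
    return "";
  std::size_t begin = offset, end = offset;
  while (begin > 0 && isWordChar(contents[begin - 1]))
    --begin;
  while (end < contents.size() && isWordChar(contents[end]))
    ++end;
  return contents.substr(begin, end - begin);
}

// Stateful, asynchronous wrapper (hypothetical): remembers the latest contents
// of each file and runs the feature on another thread, reporting the result
// through a callback.
class ToyServer {
public:
  void addDocument(const std::string &file, std::string contents) {
    latest[file] = std::move(contents);
  }

  void hover(const std::string &file, std::size_t offset,
             std::function<void(std::string)> callback) {
    auto it = latest.find(file);
    // Copy the current version so the background task sees a stable snapshot.
    std::string snapshot = it == latest.end() ? std::string() : it->second;
    pending.push_back(std::async(
        std::launch::async,
        [snapshot = std::move(snapshot), offset, callback = std::move(callback)] {
          callback(hoverAt(snapshot, offset));
        }));
  }

  // Blocks until every outstanding request has run its callback.
  void waitForIdle() {
    for (auto &f : pending)
      f.wait();
    pending.clear();
  }

private:
  std::map<std::string, std::string> latest;
  std::vector<std::future<void>> pending;
};
```

Taking a snapshot before going async matters: a later addDocument() call must not change the input of a request that is already running.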

TUScheduler is responsible for keeping track of the latest version of each file, building and caching the ASTs and preambles that features take as inputs, and running requests on worker threads in an appropriate sequence. (More details in threads and request handling.) It also pushes certain events to ClangdServer via ParsingCallbacks, to allow emitting diagnostics and indexing ASTs.
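Caching builds per file can be sketched as follows, assuming a hypothetical ToyAST whose build counter stands in for an expensive parse. Threads, preambles, and ParsingCallbacks are left out; the point is only that a request for unchanged contents reuses the previous result.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical placeholder for an expensive build result (a ParsedAST in clangd).
struct ToyAST {
  std::string source;
  int buildNumber;
};

// Keeps the most recent AST per file and rebuilds only when contents change.
class ASTCache {
public:
  std::shared_ptr<const ToyAST> get(const std::string &file,
                                    const std::string &contents) {
    auto it = cache.find(file);
    if (it != cache.end() && it->second->source == contents)
      return it->second; // same inputs as the last build: reuse it
    std::shared_ptr<const ToyAST> ast =
        std::make_shared<ToyAST>(ToyAST{contents, ++builds});
    cache[file] = ast;
    return ast;
  }

  int buildCount() const { return builds; }

private:
  std::map<std::string, std::shared_ptr<const ToyAST>> cache;
  int builds = 0;
};
```

Handing out shared_ptr lets a request keep using an older AST even after a newer one replaces it in the cache.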

ClangdServer is fairly mechanical for the most part. The features are implemented in various other files, and the scheduling and AST building is done by TUScheduler, so largely it just binds these together. TUScheduler doesn’t know about particular features (except diagnostics).

Compile commands

Like other clang-based tools, clangd uses clang’s command-line syntax as its interface to configure parse options (like include directories). The arguments are obtained from a tooling::CompilationDatabase, typically built by reading compile_commands.json from a nearby directory. GlobalCompilationDatabase is responsible for finding and caching such databases, and for providing “fallback” commands when none are found.
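A minimal sketch of the discovery step, under the simplest assumed rule: look for compile_commands.json in the source file’s directory and then in each parent directory. findCompilationDatabase and fallbackCommand are hypothetical; GlobalCompilationDatabase’s real search, caching, and fallback flags are more involved.

```cpp
#include <cassert>
#include <filesystem>
#include <optional>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Looks for compile_commands.json in the file's directory, then in each parent.
std::optional<fs::path> findCompilationDatabase(const fs::path &sourceFile) {
  for (fs::path dir = sourceFile.parent_path(); !dir.empty();
       dir = dir.parent_path()) {
    fs::path candidate = dir / "compile_commands.json";
    if (fs::exists(candidate))
      return candidate;
    if (dir == dir.root_path())
      break; // parent_path() of the root is the root itself
  }
  return std::nullopt;
}

// Hypothetical fallback when no database is found: the driver name and the
// file, so parsing can still proceed with default options.
std::vector<std::string> fallbackCommand(const fs::path &sourceFile) {
  return {"clang", sourceFile.string()};
}
```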

Various heuristic tweaks are applied to these commands to make them more likely to work, particularly on Mac. These live in CommandMangler.
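As an illustration of the kind of adjustment involved, here is a hypothetical adjustCommand. Its rules (drop “-o <file>” since no output is written, add -fsyntax-only, and add a -resource-dir so clang can find its builtin headers) are assumptions chosen for the example, not a transcription of CommandMangler.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative adjustments in the spirit of CommandMangler. The specific
// rules are assumptions made for this sketch, not clangd's actual list.
std::vector<std::string> adjustCommand(const std::vector<std::string> &args,
                                       const std::string &resourceDir) {
  std::vector<std::string> out;
  bool haveSyntaxOnly = false, haveResourceDir = false;
  for (std::size_t i = 0; i < args.size(); ++i) {
    const std::string &arg = args[i];
    if (arg == "-o") {
      ++i; // also skip the output file name that follows
      continue;
    }
    if (arg == "-fsyntax-only")
      haveSyntaxOnly = true;
    if (arg.rfind("-resource-dir", 0) == 0)
      haveResourceDir = true;
    out.push_back(arg);
  }
  if (!haveSyntaxOnly)
    out.push_back("-fsyntax-only");
  if (!haveResourceDir)
    out.push_back("-resource-dir=" + resourceDir);
  return out;
}
```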

Features

Diagnostics

During parsing, clang emits diagnostics through a DiagnosticConsumer callback. clangd’s StoreDiags implementation converts them into Diag objects, which capture the relationships between diagnostics, fixes, and notes. These are exposed in ParsedAST. (clang-tidy diagnostics are generated separately in buildAST(), but also end up captured by StoreDiags and exposed in the same way.)
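The grouping step can be sketched with simplified types. Clang emits notes immediately after the diagnostic they belong to, so a consumer can attach each note to the most recent non-note diagnostic. Level, Fix, Note, Diag, and ToyStoreDiags here are hypothetical and far smaller than clangd’s real types (no ranges, files, or note fixes).

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

enum class Level { Error, Warning, Note };

// Hypothetical simplified shapes for this sketch.
struct Fix {
  std::string message;
  std::string replacement;
};
struct Note {
  std::string message;
};
struct Diag {
  Level level;
  std::string message;
  std::vector<Note> notes;
  std::vector<Fix> fixes;
};

// Receives diagnostics in emission order and groups notes with their parent.
class ToyStoreDiags {
public:
  void handle(Level level, std::string message, std::vector<Fix> fixes = {}) {
    if (level == Level::Note) {
      // Attach to the most recent main diagnostic; an orphan note is dropped.
      if (!diags.empty())
        diags.back().notes.push_back({std::move(message)});
      return;
    }
    diags.push_back({level, std::move(message), {}, std::move(fixes)});
  }

  const std::vector<Diag> &diagnostics() const { return diags; }

private:
  std::vector<Diag> diags;
};
```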

IncludeFixer attempts to add automatic fixes to certain diagnostics by using the index to find headers that should be included.
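The idea can be sketched with a toy index mapping symbol names to headers. suggestInclude and HeaderIndex are hypothetical, and matching on the diagnostic’s message text is a simplification made for this example; a real implementation would work from the diagnostic kind and the unresolved name, and query the actual index.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Toy "index": maps a symbol name to the header that declares it.
using HeaderIndex = std::map<std::string, std::string>;

// For an "unknown type name 'X'" diagnostic, proposes an #include line for
// X's header, or std::nullopt if the message or symbol isn't recognized.
std::optional<std::string> suggestInclude(const std::string &diagMessage,
                                          const HeaderIndex &index) {
  const std::string prefix = "unknown type name '";
  if (diagMessage.rfind(prefix, 0) != 0 || diagMessage.back() != '\'')
    return std::nullopt;
  std::string name = diagMessage.substr(
      prefix.size(), diagMessage.size() - prefix.size() - 1);
  auto it = index.find(name);
  if (it == index.end())
    return std::nullopt;
  return "#include \"" + it->second + "\"\n";
}
```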

TUScheduler has logic to determine when a ParsedAST’s diagnostics are “correct enough” to emit, and when a new build is needed for this purpose. It then triggers ClangdServer::onMainAST, which calls ClangdLSPServer::onDiagnosticsReady, which sends them to the client.

AST-based features

Most clangd requests are handled by inspecting a ParsedAST, and maybe the index. Examples are locateSymbolAt() (go-to-definition) and getHover().
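Many AST-based features begin by mapping the cursor offset to the innermost AST node under it, then inspecting that node. The toy Node tree, nodeAt(), and makeNode() are hypothetical and ignore the subtleties of real clang source ranges (macros, implicit nodes, token boundaries).

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy AST node covering the half-open offset range [begin, end).
struct Node {
  std::string kind;
  unsigned begin = 0, end = 0;
  std::vector<std::unique_ptr<Node>> children;
};

// Returns the innermost node whose range contains `offset`, or nullptr.
// Children are assumed not to overlap, so the first matching child wins.
const Node *nodeAt(const Node &node, unsigned offset) {
  if (offset < node.begin || offset >= node.end)
    return nullptr;
  for (const auto &child : node.children)
    if (const Node *hit = nodeAt(*child, offset))
      return hit;
  return &node;
}

// Convenience for building example trees.
std::unique_ptr<Node> makeNode(std::string kind, unsigned begin, unsigned end) {
  auto node = std::make_unique<Node>();
  node->kind = std::move(kind);
  node->begin = begin;
  node->end = end;
  return node;
}
```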

These features are spread across various files, but are easy to find from their callsites in ClangdServer.

Code completion (and signature help)

Code completion does not follow the usual pattern for AST-based features. Instead there’s a dedicated parse of the current file with a callback when the completion point is reached. The core completion logic is implemented in clang’s SemaCodeComplete.cpp and has access to information not present in the AST, such as name-lookup structures and parser state.

CodeComplete.h is mostly concerned with running clang in this mode, combining clang’s results with index-based results, applying ranking, and converting to LSP’s data model.
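Combining the two result sources can be sketched as a merge keyed by name followed by a sort. Candidate and mergeResults are hypothetical, and taking the higher of the two scores is a placeholder policy, not the ranking in Quality.h.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical completion candidate.
struct Candidate {
  std::string name;
  double score;          // higher is better
  bool fromSema = false; // produced by clang's parser-driven completion
  bool fromIndex = false;
};

// Merges parser ("sema") results with index results, deduplicating by name,
// then sorts by descending score.
std::vector<Candidate> mergeResults(const std::vector<Candidate> &sema,
                                    const std::vector<Candidate> &index) {
  std::map<std::string, Candidate> byName;
  for (Candidate c : sema) {
    c.fromSema = true;
    byName[c.name] = c;
  }
  for (Candidate c : index) {
    auto it = byName.find(c.name);
    if (it == byName.end()) {
      c.fromIndex = true;
      byName.emplace(c.name, c);
    } else {
      // Same symbol from both sources: keep one entry, remember both origins.
      it->second.fromIndex = true;
      it->second.score = std::max(it->second.score, c.score);
    }
  }
  std::vector<Candidate> out;
  for (const auto &entry : byName)
    out.push_back(entry.second);
  std::stable_sort(out.begin(), out.end(),
                   [](const Candidate &a, const Candidate &b) {
                     return a.score > b.score;
                   });
  return out;
}
```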

The ranking is mostly implemented in Quality.h, and the name-matching is done by FuzzyMatcher.
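For intuition about fuzzy matching, here is a deliberately simple matcher: every pattern character must appear in order in the candidate (case-insensitively), and matches at segment starts or right after the previous match score higher. fuzzyScore is hypothetical and much cruder than FuzzyMatcher.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>

// Returns a score if `pattern` is a case-insensitive subsequence of `word`,
// or std::nullopt otherwise. Each matched character earns 1 point, +2 if it
// starts a segment (word start, after '_', or an uppercase letter), and +1
// if it directly follows the previous match.
std::optional<int> fuzzyScore(const std::string &pattern, const std::string &word) {
  auto lower = [](char c) { return std::tolower(static_cast<unsigned char>(c)); };
  int score = 0;
  std::size_t w = 0;
  std::optional<std::size_t> lastMatch;
  for (char p : pattern) {
    while (w < word.size() && lower(p) != lower(word[w]))
      ++w;
    if (w == word.size())
      return std::nullopt; // pattern character not found: no match
    bool segmentStart = w == 0 || word[w - 1] == '_' ||
                        std::isupper(static_cast<unsigned char>(word[w]));
    bool consecutive = lastMatch && *lastMatch + 1 == w;
    score += 1 + (segmentStart ? 2 : 0) + (consecutive ? 1 : 0);
    lastMatch = w;
    ++w;
  }
  return score;
}
```

With this scoring, the pattern pb prefers push_back (both letters start a segment) over zipbar.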

Code actions

Most code actions are provided by Tweaks. These are small plugins that implement the Tweak interface. They live in the refactor/tweaks/ directory and are registered through the linker. Given a selection, they can (quickly) determine whether they apply there and (maybe slowly) generate the actual edits. The LSP code-actions flow is built out of these primitives.
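The prepare/apply split and linker-based registration can be sketched as follows. ToyTweak, RegisterTweak, FlipBool, and availableActions are hypothetical; the real Tweak interface and registration mechanism differ in detail. Registration happens in a static object’s constructor, so linking a tweak’s source file in is enough to make it available.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical selection: just the selected text.
struct ToySelection {
  std::string text;
};

// Simplified two-phase shape: a cheap applicability check, then a possibly
// expensive edit computation.
class ToyTweak {
public:
  virtual ~ToyTweak() = default;
  virtual std::string title() const = 0;
  virtual bool prepare(const ToySelection &sel) const = 0;      // should be fast
  virtual std::string apply(const ToySelection &sel) const = 0; // may be slow
};

using TweakFactory = std::function<std::unique_ptr<ToyTweak>()>;

// Registry populated by static initializers; a function-local static avoids
// initialization-order problems across translation units.
std::vector<TweakFactory> &tweakRegistry() {
  static std::vector<TweakFactory> registry;
  return registry;
}

template <typename T> struct RegisterTweak {
  RegisterTweak() {
    tweakRegistry().push_back([] { return std::make_unique<T>(); });
  }
};

// Example tweak: replace a true/false literal with its opposite.
class FlipBool : public ToyTweak {
public:
  std::string title() const override { return "Flip boolean literal"; }
  bool prepare(const ToySelection &sel) const override {
    return sel.text == "true" || sel.text == "false";
  }
  std::string apply(const ToySelection &sel) const override {
    return sel.text == "true" ? "false" : "true";
  }
};
static RegisterTweak<FlipBool> registerFlipBool;

// Code-actions flow: offer the title of every tweak whose prepare() succeeds.
std::vector<std::string> availableActions(const ToySelection &sel) {
  std::vector<std::string> titles;
  for (const TweakFactory &makeTweak : tweakRegistry()) {
    std::unique_ptr<ToyTweak> tweak = makeTweak();
    if (tweak->prepare(sel))
      titles.push_back(tweak->title());
  }
  return titles;
}
```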

Feature infrastructure

Parsing and ASTs

The representation of a parsed file in clangd is ParsedAST. As the name suggests, this is mostly used to access Clang’s AST (clang::ASTContext), but extends it by:

ParsedAST::build() is where we run the clang parser. Some low-level bits (creating CompilerInstance) are in Compiler.h instead, and are reused when we run clang without retaining an AST (code completion, indexing, preambles).

The PreambleData structure similarly extends Clang’s PrecompiledPreamble class with extra recorded information. It contains the AST of included headers and is only rebuilt when those headers change. Because the preamble is large, it’s kept on disk by default, and parts of it are deserialized on demand.

Abstractions over clang AST

Several tasks come up in various features and we have reusable solutions:

Index

Operations that need information outside the current file/AST make use of the clangd index, which is in the index/ directory.

SymbolIndex is the index interface exposed to consuming features; it describes the data (Symbol, Ref, etc.) and the queries that implementations must provide. It has several implementations used as building blocks:

SymbolCollector extracts indexable data from a translation unit. index/Serialization.h defines a binary format to store/load index data.
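The split between an index interface and composable implementations can be sketched with toy types. ToyIndex, ToyMemIndex, and ToyMergedIndex are hypothetical and use prefix matching instead of fuzzy search; the merged index shows how one source (for example, fresh data from open files) can take priority over another.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified symbol record.
struct Symbol {
  std::string id;       // stable unique identifier
  std::string name;
  std::string declFile; // where the symbol is declared
};

using SymbolCallback = std::function<void(const Symbol &)>;

// Toy index interface: features query it without knowing the implementation.
class ToyIndex {
public:
  virtual ~ToyIndex() = default;
  // Reports every symbol whose name starts with `prefix`.
  virtual void findByPrefix(const std::string &prefix,
                            const SymbolCallback &cb) const = 0;
};

// In-memory index over a fixed snapshot of symbols.
class ToyMemIndex : public ToyIndex {
public:
  explicit ToyMemIndex(std::vector<Symbol> symbols) : symbols(std::move(symbols)) {}
  void findByPrefix(const std::string &prefix,
                    const SymbolCallback &cb) const override {
    for (const Symbol &s : symbols)
      if (s.name.rfind(prefix, 0) == 0)
        cb(s);
  }

private:
  std::vector<Symbol> symbols;
};

// Layers two indexes; `primary` wins when both contain the same symbol id.
class ToyMergedIndex : public ToyIndex {
public:
  ToyMergedIndex(const ToyIndex &primary, const ToyIndex &secondary)
      : primary(primary), secondary(secondary) {}
  void findByPrefix(const std::string &prefix,
                    const SymbolCallback &cb) const override {
    std::set<std::string> seen;
    primary.findByPrefix(prefix, [&](const Symbol &s) {
      seen.insert(s.id);
      cb(s);
    });
    secondary.findByPrefix(prefix, [&](const Symbol &s) {
      if (!seen.count(s.id))
        cb(s);
    });
  }

private:
  const ToyIndex &primary;
  const ToyIndex &secondary;
};
```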

These building blocks are used to provide clangd’s index data:

Dependencies

Clang libraries

Clang code structure is a huge topic, but the most important pieces for clangd:

Clang-tools libraries

clangd shares code with other tools derived from the Clang compiler; these shared libraries live outside clangd.

General support libraries

Like most LLVM code, clangd heavily uses llvm/ADT/ and llvm/Support/ to supplement the standard library. We try to avoid other LLVM dependencies.

clangd has its own support/ library, conceptually similar to llvm/Support. It contains libraries that are general-purpose but not a good fit for LLVM as a whole (too opinionated, or focused on multithreading). The most prominent:

Testing

Most of the tests are in the unittests/ directory (despite being a mix of unit and integration tests). Test files are mostly named after the file they’re testing, and use the googletest framework.

Some helpers are widely shared between tests:

clangd has a small number of black-box tests in test/. These use LLVM lit and FileCheck to drive the clangd binary and verify output. They smoke-test clangd as an LSP server, and test a few hard-to-isolate features.
