The clangd index

The index stores information about the whole codebase. It’s used to provide LSP features where the AST of the current file doesn’t have the information we need.

Exposed data

Implementations

SymbolIndex is an interface, and clangd maintains several instances. These are stitched together using MergedIndex, which layers one index on top of another. Code implementing features sees only a single combined index.
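The layering idea can be illustrated with a small, self-contained sketch. Everything here is hypothetical and heavily simplified: `ToySymbol`, `ToyIndex`, `ToyMemIndex`, `ToyMergedIndex`, and `collect` are made-up names rather than clangd's real classes, and the rule "the upper layer wins when both layers know a name" is an assumption chosen for illustration, not a description of how MergedIndex actually reconciles results.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified symbol record (not clangd's Symbol).
struct ToySymbol {
  std::string Name;
  std::string File; // file that supplied this record
};

using ToyCallback = std::function<void(const ToySymbol &)>;

// Hypothetical stand-in for an index interface.
class ToyIndex {
public:
  virtual ~ToyIndex() = default;
  // Invokes CB for every symbol whose name starts with Prefix.
  virtual void lookupPrefix(const std::string &Prefix,
                            const ToyCallback &CB) const = 0;
};

// A single in-memory layer backed by a sorted map.
class ToyMemIndex : public ToyIndex {
public:
  void add(const ToySymbol &S) { Symbols[S.Name] = S; }

  void lookupPrefix(const std::string &Prefix,
                    const ToyCallback &CB) const override {
    // Keys sharing a prefix are contiguous in a sorted map.
    for (auto It = Symbols.lower_bound(Prefix);
         It != Symbols.end() &&
         It->first.compare(0, Prefix.size(), Prefix) == 0;
         ++It)
      CB(It->second);
  }

private:
  std::map<std::string, ToySymbol> Symbols;
};

// Layers Upper on top of Lower. Assumption for this sketch: when both
// layers know a name, only the upper layer's record is reported.
class ToyMergedIndex : public ToyIndex {
public:
  ToyMergedIndex(const ToyIndex *Upper, const ToyIndex *Lower)
      : Upper(Upper), Lower(Lower) {
    assert(Upper && Lower && "both layers are required");
  }

  void lookupPrefix(const std::string &Prefix,
                    const ToyCallback &CB) const override {
    std::set<std::string> Seen;
    Upper->lookupPrefix(Prefix, [&](const ToySymbol &S) {
      Seen.insert(S.Name);
      CB(S);
    });
    Lower->lookupPrefix(Prefix, [&](const ToySymbol &S) {
      if (!Seen.count(S.Name))
        CB(S);
    });
  }

private:
  const ToyIndex *Upper;
  const ToyIndex *Lower;
};

// Helper that gathers results as "name@file" strings.
std::vector<std::string> collect(const ToyIndex &Index,
                                 const std::string &Prefix) {
  std::vector<std::string> Out;
  Index.lookupPrefix(Prefix, [&](const ToySymbol &S) {
    Out.push_back(S.Name + "@" + S.File);
  });
  return Out;
}
```

Because `ToyMergedIndex` is itself a `ToyIndex`, a merged pair can serve as the lower layer of another merge, so any number of layers can be stacked while callers still see a single object.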

FileIndex (“dynamic index”)

This is the top layer, and includes symbols from the files that have been opened and the headers they include. This is used:

The FileIndex class stores data from each file separately. When a file is parsed, the TUScheduler invokes a callback which adds the AST to the index. (In fact, there is separate storage and a separate callback for expensive-and-rare preamble rebuilds vs cheap-and-frequent main-file rebuilds.)
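A minimal sketch of per-file storage with separate preamble and main-file stores might look as follows. The names `ToyFileSymbols`, `ToyFileIndex`, `onPreambleBuilt`, and `onMainFileParsed` are hypothetical, and symbols are reduced to plain strings; the real callbacks receive ASTs rather than name lists.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-file store: each file owns its own list of symbol
// names, so updating one file never touches data from another.
class ToyFileSymbols {
public:
  // Replaces whatever was previously stored for Path.
  void update(const std::string &Path, std::vector<std::string> Names) {
    PerFile[Path] = std::move(Names);
  }

  // Copies every stored name into Out.
  void collectInto(std::set<std::string> &Out) const {
    for (const auto &Entry : PerFile)
      Out.insert(Entry.second.begin(), Entry.second.end());
  }

private:
  std::map<std::string, std::vector<std::string>> PerFile;
};

// Hypothetical dynamic index with one store per kind of rebuild.
class ToyFileIndex {
public:
  // Would be called after a (rare, expensive) preamble rebuild.
  void onPreambleBuilt(const std::string &Path,
                       std::vector<std::string> HeaderSymbols) {
    Preamble.update(Path, std::move(HeaderSymbols));
  }

  // Would be called after a (frequent, cheap) main-file rebuild.
  void onMainFileParsed(const std::string &Path,
                        std::vector<std::string> MainSymbols) {
    MainFile.update(Path, std::move(MainSymbols));
  }

  // Union of everything currently known from both stores.
  std::set<std::string> allSymbols() const {
    std::set<std::string> Out;
    Preamble.collectInto(Out);
    MainFile.collectInto(Out);
    return Out;
  }

private:
  ToyFileSymbols Preamble;
  ToyFileSymbols MainFile;
};
```

Keeping the two stores apart means a main-file rebuild can discard and replace that file's main-file symbols without recomputing anything derived from its headers.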

BackgroundIndex

As the name suggests, this parses all files in the project in the background to build a complete index. This is used:

The BackgroundIndex maintains a thread-pool, and when a compilation database is found, the compile command for each source file is placed on a queue.

Before indexing each file, the index checks for a cached *.idx file on disk. After indexing, it writes this file. This avoids reindexing on startup if nothing changed since last time. These files are located in .cache/clangd/index/ next to compile_commands.json. For headers with no compilation database (CDB), such as the standard library, they are in clangd/index under the user’s cache directory ($XDG_CACHE_HOME, DARWIN_USER_CACHE_DIR, or %LocalAppData%).
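The check-then-write pattern can be sketched in a few lines. This is only an illustration under stated assumptions: `ToyShardCache` and `indexAll` are hypothetical names, the cache lives in memory instead of in *.idx files, and freshness is decided by comparing a `std::hash` digest of the file contents, which is an assumption of this sketch rather than a description of clangd's actual staleness check.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>

// Hypothetical in-memory stand-in for the on-disk shard cache.
class ToyShardCache {
public:
  // True if there is no cached entry for File, or if the cached entry
  // was produced from different contents.
  bool needsIndexing(const std::string &File,
                     const std::string &Contents) const {
    auto It = Digests.find(File);
    return It == Digests.end() || It->second != digest(Contents);
  }

  // Records that File was indexed with these contents.
  void store(const std::string &File, const std::string &Contents) {
    Digests[File] = digest(Contents);
  }

private:
  // Assumption: a content digest is enough to detect changes.
  static std::size_t digest(const std::string &Contents) {
    return std::hash<std::string>{}(Contents);
  }

  std::map<std::string, std::size_t> Digests;
};

// Walks all files (path -> contents), skips those with a fresh cache
// entry, and returns how many files actually had to be indexed.
std::size_t indexAll(const std::map<std::string, std::string> &Files,
                     ToyShardCache &Cache) {
  std::size_t Indexed = 0;
  for (const auto &Entry : Files) {
    if (!Cache.needsIndexing(Entry.first, Entry.second))
      continue; // cached result is still valid
    // ... the real (expensive) indexing work would happen here ...
    Cache.store(Entry.first, Entry.second);
    ++Indexed;
  }
  return Indexed;
}
```

In this sketch, running `indexAll` twice over unchanged inputs does work only the first time; after editing one file, only that file is processed again.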

Static index

The (optional) static index is built outside clangd. It would typically cover the whole codebase. This is used:

With the -index-file option, clangd will load an index produced by the clangd-indexer tool.

Remote index

For large codebases (e.g. LLVM and Chromium), a global index can take a long time to build (multiple hours even on very powerful machines for Chrome-sized projects) and incurs a large memory overhead (multiple GB of RAM) when served from within clangd.

The remote index allows the index to be served from a separate machine, with clangd on your device connecting to it. This means you no longer have to build the index yourself, and clangd uses significantly less memory. Developers can therefore work from less powerful machines while still using clangd to its fullest. For more details, see remote index.
