Deciphering Xcode’s index

At work we’re having to wait an inordinate amount of time for Xcode to finish indexing our rather large Swift project. I’ve consequently spent a lot of time over the past few weeks digging into the internals of indexing. This is more or less a brain dump of what I’ve discovered thus far.

 File structure

Xcode’s index is broken into a number of files, located in {project derived data}/Index/{build config}/{platform}/{project}.xcindex/:

db.xcindexdb
db.xcindexdb-shm
db.xcindexdb-wal
db.xcindexdb.strings-cmp
db.xcindexdb.strings-dir
db.xcindexdb.strings-file
db.xcindexdb.strings-moduleurl
db.xcindexdb.strings-res
db.xcindexdb.strings-sym

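 SQLite files

db.xcindexdb is the SQLite database itself. The -shm and -wal files are the shared-memory and write-ahead log files that SQLite keeps next to a database running in WAL mode.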

 Strings files

These files are collections of strings separated by 0x00 (with a leading and trailing 0x00). These strings are referenced by rows in the SQLite database. Why they aren’t in the SQLite database is anyone’s guess.

The references in the database are integer offsets to the start of the string in these files (presumably for performance reasons).
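
To make the offset scheme concrete, here is a minimal Python sketch of how I’d read one string back out of a strings file. The function name read_string_at and the sample bytes are my own, and I’m assuming the strings are UTF-8 encoded, which I haven’t verified.

```python
def read_string_at(data: bytes, offset: int) -> str:
    """Return the NUL-terminated string that starts at `offset` in `data`."""
    # The entry runs from `offset` up to (not including) the next 0x00 byte.
    end = data.index(b"\x00", offset)
    # Assumption: entries are UTF-8 encoded.
    return data[offset:end].decode("utf-8")


# Hypothetical file contents: leading 0x00, two entries, trailing 0x00.
sample = b"\x00logger\x00Logger\x00"

print(read_string_at(sample, 1))  # logger
print(read_string_at(sample, 8))  # Logger
```

For a real index you would pass in the bytes of, say, db.xcindexdb.strings-sym read in binary mode.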

 SQLite database schema

[Figure: XcodeIndex.png]

Open up the db.xcindexdb file in the SQLite command line shell and get the list of tables:

$ sqlite3 db.xcindexdb
sqlite> .tables
context    group_     language   reference  unit     
file       kind       provider   symbol

Let’s dig into each of these. Keep in mind that the files, directories, and other strings referenced in the database are stored externally, in the aforementioned strings files. (Again, why they aren’t in the db is anyone’s guess.)

Note: Your output may differ from mine. To keep things simple, I’m using the index files for a small Mac app project with a couple of source files.

Also, the comments in the SQL table definitions are mine.

 kind

The kind table appears to hold the various possible token types.

sqlite> .schema kind
CREATE TABLE kind(
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    identifier TEXT NOT NULL
);
sqlite> SELECT * FROM kind;
id|identifier
1|Xcode.SourceCodeSymbolKind.IBOutlet
2|Xcode.SourceCodeSymbolKind.GlobalVariable
3|Xcode.SourceCodeSymbolKind.Global
4|Xcode.SourceCodeSymbolKind.ToDo
5|Xcode.SourceCodeSymbolKind.Callable
6|Xcode.SourceCodeSymbolKind.StaticProperty
7|Xcode.SourceCodeSymbolKind.BuiltinType
8|Xcode.SourceCodeSymbolKind.FunctionTemplate
9|Xcode.SourceCodeSymbolKind.StaticMethod
10|Xcode.SourceCodeSymbolKind.Member
...

 provider

The provider table appears to hold the various sources of index data (e.g. Clang, SourceKit, etc).

sqlite> .schema provider
CREATE TABLE provider (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    identifier TEXT NOT NULL,
    version TEXT NOT NULL
);
sqlite> SELECT * FROM provider;
id|identifier|version
1|Xcode.IDEFoundation.Index.DataSource.Unknown|1.0
2|Xcode.Swift.Index.DataSource|3
3|Xcode.IDEFoundation.Index.DataSource.auxiliaryFiles|1.1
4|Xcode.IDEFoundation.Index.DataSource.clang-module|1
5|Xcode.IDEFoundation.Index.DataSource.clang|1

 language

language is the list of source languages that Xcode knows about.

sqlite> .schema language
CREATE TABLE language (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    identifier TEXT NOT NULL
);
sqlite> SELECT * FROM language;
id|identifier
1|Xcode.SourceCodeLanguage.C
2|Xcode.SourceCodeLanguage.CSS
3|Xcode.SourceCodeLanguage.JSON
4|Xcode.SourceCodeLanguage.Metal
5|Xcode.SourceCodeLanguage.BourneShellScript
6|Xcode.SourceCodeLanguage.XML
7|Xcode.SourceCodeLanguage.OpenCL
8|Xcode.SourceCodeLanguage.XcodeStrings
9|Xcode.SourceCodeLanguage.C-Plus-Plus
10|Xcode.SourceCodeLanguage.Objective-C
...

 file

file, unsurprisingly, represents a file. Again, remember that the filenames and directories are offsets into the strings files.

sqlite> .schema file
CREATE TABLE file (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    -- offset in db.xcindexdb.strings-file
    lowercaseFilename INTEGER NOT NULL,
    -- offset in db.xcindexdb.strings-file
    filename INTEGER NOT NULL,
    -- offset in db.xcindexdb.strings-dir
    directory INTEGER NOT NULL,
    -- offset in db.xcindexdb.strings-moduleurl
    --    0 = project's module?
    moduleurl INTEGER NOT NULL,
    inProject INTEGER NOT NULL
);
CREATE INDEX file_lowercaseFilename_index ON file (lowercaseFilename);
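
As a rough sketch of how a file row turns back into a path, the Python below looks up the two offsets and resolves them against the strings files. The function file_path, the sample row, and the directory /Users/me/Demo are all made up for illustration. I’m also assuming UTF-8 strings and that joining directory and filename gives the full path.

```python
import posixpath
import sqlite3


def read_string_at(data, offset):
    """Return the NUL-terminated string starting at `offset` (UTF-8 assumed)."""
    end = data.index(b"\x00", offset)
    return data[offset:end].decode("utf-8")


def file_path(conn, file_id, file_strings, dir_strings):
    """Resolve a row of the `file` table to a directory/filename path."""
    row = conn.execute(
        "SELECT filename, directory FROM file WHERE id = ?", (file_id,)
    ).fetchone()
    if row is None:
        raise KeyError(file_id)
    name_offset, dir_offset = row
    return posixpath.join(
        read_string_at(dir_strings, dir_offset),
        read_string_at(file_strings, name_offset),
    )


# In-memory stand-in using the `file` schema shown above.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE file (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        lowercaseFilename INTEGER NOT NULL,
        filename INTEGER NOT NULL,
        directory INTEGER NOT NULL,
        moduleurl INTEGER NOT NULL,
        inProject INTEGER NOT NULL
    )"""
)
conn.execute("INSERT INTO file VALUES (1, 1, 1, 1, 0, 1)")

# Hypothetical contents of the .strings-file and .strings-dir files.
file_strings = b"\x00main.swift\x00"
dir_strings = b"\x00/Users/me/Demo\x00"

print(file_path(conn, 1, file_strings, dir_strings))  # /Users/me/Demo/main.swift
```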

 group_

sqlite> .schema group_
CREATE TABLE group_ (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    file INTEGER NOT NULL,     -- foreign key to `file` table
    signature TEXT NOT NULL,
    signature_inBody TEXT NOT NULL,
    provider INTEGER NOT NULL  -- foreign key to `provider` table
);
CREATE INDEX group_index ON group_ (file, signature);

 unit

unit appears to represent a compilation unit, though I’m not certain.

sqlite> .schema unit
CREATE TABLE unit (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    file INTEGER NOT NULL,  -- foreign key to `file` table
    target TEXT NOT NULL,   -- path to Xcode target
    provider INTEGER NOT NULL,  -- foreign key to `provider` table
    pchFile INTEGER
);
CREATE UNIQUE INDEX unit_index ON unit (file, target);
CREATE INDEX unit_target_index ON unit (target);
CREATE INDEX unit_provider_index ON unit (provider);

The target column has (at least) three possible values:

 context

sqlite> .schema context
CREATE TABLE context (
    unit INTEGER NOT NULL,    -- foreign key to `unit` table
    group_ INTEGER NOT NULL,  -- foreign key to `group_` table
    includer INTEGER,
    -- modified time relative to 00:00:00 UTC on 1 January 2001
    --     (NSDate's reference date)
    modified REAL,
    spliced INTEGER DEFAULT 0
);
CREATE UNIQUE INDEX context_index ON context (unit, group_);
CREATE INDEX context_group_index ON context (group_);
CREATE INDEX context_includer_index ON context (includer);
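
The modified column is easier to read once converted to a normal timestamp. Here is a small Python sketch, assuming the value is a count of seconds since the reference date (that is how NSDate’s reference-date intervals work, but I haven’t confirmed it for this column). The function name is my own.

```python
from datetime import datetime, timedelta, timezone

# NSDate's reference date: 00:00:00 UTC on 1 January 2001.
REFERENCE_DATE = datetime(2001, 1, 1, tzinfo=timezone.utc)


def modified_to_datetime(seconds: float) -> datetime:
    """Convert a `context.modified` value to a timezone-aware UTC datetime."""
    # Assumption: the stored REAL is in seconds.
    return REFERENCE_DATE + timedelta(seconds=seconds)


print(modified_to_datetime(0.0).isoformat())      # 2001-01-01T00:00:00+00:00
print(modified_to_datetime(86400.0).isoformat())  # 2001-01-02T00:00:00+00:00
```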

 symbol

symbol contains the symbols in your code.

sqlite> .schema symbol
CREATE TABLE symbol (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    -- offset in db.xcindexdb.strings-sym
    spelling INTEGER NOT NULL,
    -- offset in db.xcindexdb.strings-sym
    lowercaseSpelling INTEGER NOT NULL,
    kind INTEGER,  -- foreign key to `kind` table
    role INTEGER NOT NULL,
    language INTEGER,  -- foreign key to `language` table
    resolution INTEGER,
    group_ INTEGER NOT NULL,  -- foreign key to `group_` table
    lineNumber INTEGER,  -- starts at 1
    column INTEGER,      -- starts at 1
    locator TEXT,
    container INTEGER,
    completionString INTEGER
);
CREATE INDEX symbol_lowercaseSpelling_index ON symbol (lowercaseSpelling);
CREATE INDEX symbol_resolution_index ON symbol (resolution);
CREATE INDEX symbol_kind_index ON symbol (kind);
CREATE INDEX symbol_group_index ON symbol (group_);
CREATE INDEX symbol_container_index ON symbol (container);
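
Since kind and language are foreign keys, a join turns the raw integers into readable identifiers. The Python sketch below uses trimmed-down versions of the three tables and made-up symbol rows. I use LEFT JOIN because both columns are nullable in the schema above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed stand-ins for the real tables; only the columns used here.
conn.executescript(
    """
    CREATE TABLE kind (id INTEGER PRIMARY KEY, identifier TEXT NOT NULL);
    CREATE TABLE language (id INTEGER PRIMARY KEY, identifier TEXT NOT NULL);
    CREATE TABLE symbol (
        id INTEGER PRIMARY KEY,
        spelling INTEGER NOT NULL,
        kind INTEGER,
        language INTEGER
    );
    INSERT INTO kind VALUES (2, 'Xcode.SourceCodeSymbolKind.GlobalVariable');
    INSERT INTO language VALUES (1, 'Xcode.SourceCodeLanguage.C');
    -- Hypothetical rows: one fully tagged symbol, one with no kind/language.
    INSERT INTO symbol VALUES (1, 8, 2, 1);
    INSERT INTO symbol VALUES (2, 20, NULL, NULL);
    """
)


def describe_symbols(conn):
    """Return (id, spelling offset, kind name, language name) per symbol."""
    return conn.execute(
        """SELECT s.id, s.spelling, k.identifier, l.identifier
           FROM symbol s
           LEFT JOIN kind k ON k.id = s.kind
           LEFT JOIN language l ON l.id = s.language
           ORDER BY s.id"""
    ).fetchall()


for row in describe_symbols(conn):
    print(row)
```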

 reference

reference contains the references in your code to symbols.

sqlite> .schema reference
CREATE TABLE reference (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    -- offset in db.xcindexdb.strings-sym
    spelling INTEGER NOT NULL,
    -- offset in db.xcindexdb.strings-sym
    lowercaseSpelling INTEGER NOT NULL,
    kind INTEGER,  -- foreign key to `kind` table
    role INTEGER NOT NULL,
    language INTEGER,  -- foreign key to `language` table
    resolution INTEGER,
    group_ INTEGER NOT NULL,  -- foreign key to `group_` table
    lineNumber INTEGER,  -- starts at 1
    column INTEGER,      -- starts at 1
    locator TEXT,
    container INTEGER,
    receiver INTEGER
);
CREATE INDEX reference_lowercaseSpelling_index ON reference (lowercaseSpelling);
CREATE INDEX reference_resolution_index ON reference (resolution);
CREATE INDEX reference_group_index ON reference (group_);
CREATE INDEX reference_container_index ON reference (container);

 Examples

OK, now that we’ve got the basic structure down, let’s do some lookups. We’re going to be working from a basic macOS project with the following code:

// Logger.swift
final class Logger {
    func log(message: Message) {
        print(message)
    }

    func dump() -> String {
        return "Hello, world!"
    }
}

// Message.swift
struct Message: CustomStringConvertible {
    let content: String
    let file: String
    let line: Int

    var description: String {
        return "\(file):\(line) -- \(content)"
    }
}

// main.swift
let logger = Logger()
logger.log(message: Message(content: "Hello, world!", file: #file, line: #line))

 Challenge

Find all usages of the Logger class in our code.

 Solution

 Step 1

Find the location of the string “Logger” in the .strings-sym file.

$ strings -a -t d db.xcindexdb.strings-sym | grep "^[0-9]* Logger$"
8 Logger
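
The same lookup can be done without strings and grep. This Python sketch relies on the layout described earlier (every entry sits between two 0x00 bytes) to find an exact match. The name find_offset and the sample bytes are mine, and UTF-8 is an assumption.

```python
def find_offset(data: bytes, needle: str):
    """Return the offset of the entry exactly equal to `needle`, or None."""
    # Wrapping the needle in NUL bytes avoids matching substrings of
    # longer entries such as "LoggerFactory".
    pattern = b"\x00" + needle.encode("utf-8") + b"\x00"
    position = data.find(pattern)
    # The entry itself starts one byte after the leading NUL.
    return None if position < 0 else position + 1


# Hypothetical contents of a .strings-sym file.
sample = b"\x00logger\x00Logger\x00"

print(find_offset(sample, "Logger"))  # 8
print(find_offset(sample, "logger"))  # 1
print(find_offset(sample, "Log"))     # None
```

One practical difference: strings skips runs shorter than four characters by default, so a direct byte search like this also finds very short symbol names.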

 Step 2

Find its record in the reference table.

$ sqlite3 db.xcindexdb
sqlite> SELECT * FROM reference
   ...> WHERE spelling = 8;
id|spelling|lowercaseSpelling|kind|role|language|resolution|group_|lineNumber|column|locator|container|receiver
11719|8|1|5|4|35|93|1355|9|14|||

This gives us all the places Logger is referenced (only one, currently).

 Step 3

To find the files corresponding to these references, join across group_ to file:

sqlite> SELECT f.id,f.filename,f.directory
   ...> FROM file f
   ...> INNER JOIN group_ g ON (g.file = f.id)
   ...> INNER JOIN reference r ON (r.group_ = g.id)
   ...> WHERE r.spelling = 8;
id|filename|directory
1|1|1

We can now jump to offset 1 in db.xcindexdb.strings-file to get the filename where Logger is referenced (main.swift) and jump to offset 1 in db.xcindexdb.strings-dir to get the directory path.

If you wanted to find where Logger is defined instead, you can replace the reference table with symbol in steps 2 and 3.
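
Putting the three steps together, here is a Python sketch that goes from a name to file, line, and column. It runs against an in-memory database with trimmed-down tables. The reference row reuses the values from step 2, while the symbol row, the second file, the directory, and the strings contents are made up. The helper names are mine and UTF-8 is assumed.

```python
import sqlite3


def read_string_at(data, offset):
    end = data.index(b"\x00", offset)
    return data[offset:end].decode("utf-8")


def find_offset(data, needle):
    position = data.find(b"\x00" + needle.encode("utf-8") + b"\x00")
    return None if position < 0 else position + 1


def find_usages(conn, table, name, sym_strings, file_strings, dir_strings):
    """Return (directory, filename, line, column) for each matching row.

    `table` is "reference" for usages or "symbol" for definitions.
    """
    if table not in ("reference", "symbol"):
        raise ValueError(table)
    offset = find_offset(sym_strings, name)  # step 1
    if offset is None:
        return []
    rows = conn.execute(  # steps 2 and 3 in one query
        f'''SELECT f.directory, f.filename, t.lineNumber, t."column"
            FROM file f
            INNER JOIN group_ g ON (g.file = f.id)
            INNER JOIN {table} t ON (t.group_ = g.id)
            WHERE t.spelling = ?
            ORDER BY t.id''',
        (offset,),
    ).fetchall()
    return [
        (read_string_at(dir_strings, d), read_string_at(file_strings, n), line, col)
        for d, n, line, col in rows
    ]


conn = sqlite3.connect(":memory:")
# Trimmed stand-ins for the real tables; only the columns used here.
conn.executescript(
    '''
    CREATE TABLE file (id INTEGER PRIMARY KEY, filename INTEGER NOT NULL,
                       directory INTEGER NOT NULL);
    CREATE TABLE group_ (id INTEGER PRIMARY KEY, file INTEGER NOT NULL);
    CREATE TABLE reference (id INTEGER PRIMARY KEY, spelling INTEGER NOT NULL,
                            group_ INTEGER NOT NULL, lineNumber INTEGER,
                            "column" INTEGER);
    CREATE TABLE symbol (id INTEGER PRIMARY KEY, spelling INTEGER NOT NULL,
                         group_ INTEGER NOT NULL, lineNumber INTEGER,
                         "column" INTEGER);
    INSERT INTO file VALUES (1, 1, 1), (2, 12, 1);
    INSERT INTO group_ VALUES (1355, 1), (1356, 2);
    INSERT INTO reference VALUES (11719, 8, 1355, 9, 14);
    INSERT INTO symbol VALUES (1, 8, 1356, 2, 13);  -- hypothetical row
    '''
)

# Hypothetical strings-file contents.
sym_strings = b"\x00logger\x00Logger\x00"
file_strings = b"\x00main.swift\x00Logger.swift\x00"
dir_strings = b"\x00/Users/me/Demo\x00"

args = (sym_strings, file_strings, dir_strings)
print(find_usages(conn, "reference", "Logger", *args))
print(find_usages(conn, "symbol", "Logger", *args))
```

The table name is checked against a fixed pair before being formatted into the query, and the offset itself is passed as a bound parameter.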

 Next steps

I ultimately want to be able to modify the index outside of Xcode (e.g. move the project to a different location on disk or share it across multiple machines without triggering a reindex). Stay tuned!

 