Note [Incremental Parsing] #

In the language server, we want to minimize the work we do after each edit by reusing previous state where possible. This is true for both parsing, i.e. reusing syntax trees without running the parser, and elaboration. For both, we currently assume that we have to reprocess at least everything from the point of change downwards. This note is about how to find the correct starting point for incremental parsing; for elaboration, we then start with the first changed syntax tree.

One initial thought about incremental parsing could be that it's not necessary as parsing is very fast compared to elaboration; on mathlib we average 41ms parsing per 1000 LoC. But there are quite a few files >= 1kloc (up to 4.5kloc) in there, so near the end of such files lag from always reparsing from the beginning may very well be noticeable.

So if we do want incremental parsing, another thought might be that a user edit can only invalidate commands at or after the location of the change. Unfortunately, that's not true; take the (partial) input def a := b private def c. If we remove the space after private, the two commands syntactically become one with an application of privatedef to b even though the edit was strictly after the end of the first command.

So obviously we must include at least the extent of the token that made the parser stop parsing a command as well such that invalidating the private token invalidates the preceding command. Unfortunately this is not sufficient either, given the following input:

structure a where /-- b -/ @[c] private axiom d : Nat

This is a syntactically valid sequence of a field-less structure and a declaration. If we again delete the space after private, it becomes a syntactically correct structure with a single field privateaxiom! So clearly, because of uses of atomic in the grammar, an edit can affect a command syntax tree even across multiple tokens.

Now, what we do today, and have done since Lean 3, is to always reparse the last command completely preceding the edit location. If its syntax tree is unchanged, we preserve its data and reprocess all following commands only, otherwise we reprocess it fully as well. This seems to have worked well so far but it does seem a bit arbitrary given that even if it works for our current grammar, it can certainly be extended in ways that break the assumption.

Finally, a more actually principled and generic solution would be to invalidate a syntax tree when the parser has reached the edit location during parsing. If it did not, surely the edit cannot have an effect on the syntax tree in question. Sadly such a "high-water mark" parser position does not exist currently and likely it could at best be approximated by e.g. "furthest tokenFn parse". Thus we remain at "go two commands up" at this point.

Note [Incremental Command Elaboration] #

Because of Lean's use of persistent data structures, incremental reuse of fully elaborated commands is easy because we can simply snapshot the entire state after each command and then restart elaboration using the stored state at the point of change. However, incrementality within elaboration of a single command such as between tactic steps is much harder because we cannot simply return from those points to the language processor in a way that we can later resume from there. Instead, we exchange the need for continuations with some limited mutability: by allocating an IO.Promise "cell" in the language processor, we can both pass it to the elaborator to eventually fill it using Promise.resolve as well as convert it to a Task that will wait on that resolution using Promise.result and return it as part of the command snapshot created by the language processor. The elaborator can then create new promises itself and store their result when resolving an outer promise to create an arbitrary tree of promise-backed snapshot tasks. Thus, we can enable incremental reporting and reuse inside the elaborator using the same snapshot tree data structures as outside without having to change the elaborator's control flow.

While ideally we would decide what can be reused during command elaboration using strong hashes over the state and inputs, currently we rely on simpler syntactic checks: if all the syntax inspected up to a certain point is unchanged, we can assume that the old state can be reused. The central SnapshotBundle type passed inwards through the elaborator for this purpose combines the following data:

the IO.Promise to be resolved to an elaborator snapshot (whose type depends on the specific elaborator part we're in, e.g. `)
if there was a previous run:
- a SnapshotTask holding the corresponding snapshot of the run
- the relevant Syntax of the previous run to be compared before any reuse

Note that as we do not wait for the previous run to finish before starting to elaborate the next one, the SnapshotTask task may not be finished yet. Indeed, if we do find that we can reuse the contained state, we will want to explicitly wait for it instead of redoing the work. On the other hand, the Syntax is not surrounded by a task so that we can immediately access it for comparisons, even if the snapshot task may, eventually, give access to the same syntax tree.

TODO: tactic examples

While it is generally true that we can provide incremental reporting even without reuse, we generally want to avoid that when it would be confusing/annoying, e.g. when a tactic block is run multiple times because otherwise the progress bar would snap back and forth multiple times. For this purpose, we can disable both incremental modes using Term.withoutTacticIncrementality, assuming we opted into incrementality because of other parts of the combinator. induction is an example of this because there are some induction alternatives that are run multiple times, so we disable all of incrementality for them.