Note [Incremental Parsing] #
In the language server, we want to minimize the work we do after each edit by reusing previous state where possible. This is true for both parsing, i.e. reusing syntax trees without running the parser, and elaboration. For both, we currently assume that we have to reprocess at least everything from the point of change downwards. This note is about how to find the correct starting point for incremental parsing; for elaboration, we then start with the first changed syntax tree.
One initial thought about incremental parsing could be that it's not necessary as parsing is very fast compared to elaboration; on mathlib we average 41ms parsing per 1000 LoC. But there are quite a few files >= 1kloc (up to 4.5kloc) in there, so near the end of such files lag from always reparsing from the beginning may very well be noticeable.
So if we do want incremental parsing, another thought might be that a user edit can only invalidate
commands at or after the location of the change. Unfortunately, that's not true; take the (partial)
input def a := b private def c
. If we remove the space after private
, the two commands
syntactically become one with an application of privatedef
to b
even though the edit was
strictly after the end of the first command.
So obviously we must include at least the extent of the token that made the parser stop parsing a command as well such that invalidating the private token invalidates the preceding command. Unfortunately this is not sufficient either, given the following input:
structure a where /-- b -/ @[c] private axiom d : Nat
This is a syntactically valid sequence of a field-less structure and a declaration. If we again delete the space after private, it becomes a syntactically correct structure with a single field privateaxiom! So clearly, because of uses of atomic in the grammar, an edit can affect a command syntax tree even across multiple tokens.
Now, what we do today, and have done since Lean 3, is to always reparse the last command completely preceding the edit location. If its syntax tree is unchanged, we preserve its data and reprocess all following commands only, otherwise we reprocess it fully as well. This seems to have worked well so far but it does seem a bit arbitrary given that even if it works for our current grammar, it can certainly be extended in ways that break the assumption.
Finally, a more actually principled and generic solution would be to invalidate a syntax tree when
the parser has reached the edit location during parsing. If it did not, surely the edit cannot have
an effect on the syntax tree in question. Sadly such a "high-water mark" parser position does not
exist currently and likely it could at best be approximated by e.g. "furthest tokenFn
parse". Thus
we remain at "go two commands up" at this point.
Note [Incremental Command Elaboration] #
Because of Lean's use of persistent data structures, incremental reuse of fully elaborated commands
is easy because we can simply snapshot the entire state after each command and then restart
elaboration using the stored state at the point of change. However, incrementality within
elaboration of a single command such as between tactic steps is much harder because we cannot simply
return from those points to the language processor in a way that we can later resume from there.
Instead, we exchange the need for continuations with some limited mutability: by allocating an
IO.Promise
"cell" in the language processor, we can both pass it to the elaborator to eventually
fill it using Promise.resolve
as well as convert it to a Task
that will wait on that resolution
using Promise.result
and return it as part of the command snapshot created by the language
processor. The elaborator can then create new promises itself and store their result
when
resolving an outer promise to create an arbitrary tree of promise-backed snapshot tasks. Thus, we
can enable incremental reporting and reuse inside the elaborator using the same snapshot tree data
structures as outside without having to change the elaborator's control flow.
While ideally we would decide what can be reused during command elaboration using strong hashes over
the state and inputs, currently we rely on simpler syntactic checks: if all the syntax inspected up
to a certain point is unchanged, we can assume that the old state can be reused. The central
SnapshotBundle
type passed inwards through the elaborator for this purpose combines the following
data:
- the
IO.Promise
to be resolved to an elaborator snapshot (whose type depends on the specific elaborator part we're in, e.g. `) - if there was a previous run:
- a
SnapshotTask
holding the corresponding snapshot of the run - the relevant
Syntax
of the previous run to be compared before any reuse
- a
Note that as we do not wait for the previous run to finish before starting to elaborate the next
one, the SnapshotTask
task may not be finished yet. Indeed, if we do find that we can reuse the
contained state, we will want to explicitly wait for it instead of redoing the work. On the other
hand, the Syntax
is not surrounded by a task so that we can immediately access it for comparisons,
even if the snapshot task may, eventually, give access to the same syntax tree.
TODO: tactic examples
While it is generally true that we can provide incremental reporting even without reuse, we
generally want to avoid that when it would be confusing/annoying, e.g. when a tactic block is run
multiple times because otherwise the progress bar would snap back and forth multiple times. For this
purpose, we can disable both incremental modes using Term.withoutTacticIncrementality
, assuming we
opted into incrementality because of other parts of the combinator. induction
is an example of
this because there are some induction alternatives that are run multiple times, so we disable all of
incrementality for them.
Option for capturing output to stderr during elaboration.
Option for showing elaboration errors from partial syntax errors.
The hierarchy of Lean snapshot types
Snapshot after elaboration of the entire command.
- diagnostics : Lean.Language.Snapshot.Diagnostics
- infoTree? : Option Lean.Elab.InfoTree
- isFatal : Bool
- cmdState : Lean.Elab.Command.State
Resulting elaboration state.
Instances For
Equations
- One or more equations did not get rendered due to their size.
State after a command has been parsed.
- diagnostics : Lean.Language.Snapshot.Diagnostics
- infoTree? : Option Lean.Elab.InfoTree
- isFatal : Bool
- stx : Lean.Syntax
Syntax tree of the command.
- parserState : Lean.Parser.ModuleParserState
Resulting parser state.
Snapshot for incremental reporting and reuse during elaboration, type dependent on specific elaborator.
State after processing is finished.
- tacticCache : IO.Ref Lean.Elab.Tactic.Cache
Cache for
save
; to be replaced with incrementality.
Instances For
State after a command has been parsed.
- mk: Lean.Language.Lean.CommandParsedSnapshotData →
Option (Lean.Language.SnapshotTask Lean.Language.Lean.CommandParsedSnapshot) →
Lean.Language.Lean.CommandParsedSnapshot
Creates a command parsed snapshot.
Instances For
The snapshot data.
Equations
- x.data = match x with | Lean.Language.Lean.CommandParsedSnapshot.mk data nextCmdSnap? => data
Instances For
Next command, unless this is a terminal command.
Equations
- x.next? = match x with | Lean.Language.Lean.CommandParsedSnapshot.mk data next? => next?
Instances For
Cancels all significant computations from this snapshot onwards.
State after successful importing.
- cmdState : Lean.Elab.Command.State
The resulting initial elaboration state.
First command task (there is always at least a terminal command).
Instances For
State after the module header has been processed including imports.
- diagnostics : Lean.Language.Snapshot.Diagnostics
- infoTree? : Option Lean.Elab.InfoTree
- isFatal : Bool
State after successful importing.
Instances For
Equations
- One or more equations did not get rendered due to their size.
State after successfully parsing the module header.
- parserState : Lean.Parser.ModuleParserState
Resulting parser state.
- processedSnap : Lean.Language.SnapshotTask Lean.Language.Lean.HeaderProcessedSnapshot
Header processing task.
Instances For
State after the module header has been parsed.
- diagnostics : Lean.Language.Snapshot.Diagnostics
- infoTree? : Option Lean.Elab.InfoTree
- isFatal : Bool
- ictx : Lean.Parser.InputContext
Parser input context supplied by the driver, stored here for incremental parsing.
- stx : Lean.Syntax
Resulting syntax tree.
- result? : Option Lean.Language.Lean.HeaderParsedState
State after successful parsing.
Instances For
Equations
- One or more equations did not get rendered due to their size.
Shortcut accessor to the final header state, if successful.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Initial snapshot of the Lean language processor: a "header parsed" snapshot.
Instances For
Lean-specific processing context.
- mainModuleName : Lake.Name
- opts : Lean.Options
- trustLevel : UInt32
- input : String
- fileName : String
- fileMap : Lean.FileMap
- firstDiffPos? : Option String.Pos
Position of the first file difference if there was a previous invocation.
Instances For
Monad transformer holding all relevant data for Lean processing.
Instances For
Monad holding all relevant data for Lean processing.
Instances For
Equations
- One or more equations did not get rendered due to their size.
Equations
- One or more equations did not get rendered due to their size.
Returns true if there was a previous run and the given position is before any textual change compared to it.
Equations
- Lean.Language.Lean.isBeforeEditPos pos = do let __do_lift ← read pure (Option.any (fun (x : String.Pos) => decide (pos < x)) __do_lift.firstDiffPos?)
Instances For
Entry point of the Lean language processor.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Equations
- One or more equations did not get rendered due to their size.
Instances For
Equations
- One or more equations did not get rendered due to their size.
Instances For
Equations
- One or more equations did not get rendered due to their size.
Instances For
Waits for and returns final environment, if importing was successful.
Equations
- Lean.Language.Lean.waitForFinalEnv? snap = do let snap ← snap.result? let snap ← snap.processedSnap.get.result? Lean.Language.Lean.waitForFinalEnv?.goCmd snap.firstCmdSnap.get