Programming language and environment

Things to support from the start

These aspects should be implemented from the start for a programming language. For some of them, we have some insight on how to implement that feature more easily than usual.

Key insight: take program manipulation concepts that are usually external to the language, and integrate them into the language, using first-class environments, first-class continuations, laziness (for macros) and syntax quotations (for macros).

Feature	Insight
evaluator
serialization of all run-time values, and of the entire program state	language design (entire state representable by a term)
diff tool (e.g. sexp-diff for sexps), with JS or server-side implementation to highlight diffs in online versioned repositories	generic row map in metalanguage
debugger (with domain-specific extensions http://scg.unibe.ch/research/moldabledebugger)	defunctionalize(cps(evaluator)) + explicit env + continuations to make the stack explicit (+ lazy on program to pause before evaluating a term?)
instrumentation for performance / profiler	use debugger's step-by-step or sampling via GC API or instrumentation by injecting code (via macro)
code coverage, with an online web interface to display the code coverage results + doc-coverage	done via instrumentation
communicate with native libraries	in-place serialization / expressive type system
multicore (can be simple "parallel_for" / "parallel_map" or explicit thread management)	implementation via native library
garbage collection	~25 lines of code / reference counting
lexer, parser, syntax highlighting, projectional editor	Christian for 1 & 2. Have a good GUI library for 3.
type inference / propagation	! unidirectional propagation, domain-specific type systems
typechecking	! interactively explore type errors: better than msgs
simplification passes	Nanopass
compilation to JS	don't rely on single-architecture native libraries
compilation to native executables	no need to optimize at the beginning
module system	! row polymorphism
package system	purely functional package management (Nix)
package repository with browser (sort & filter by criteria)	web & social stuff, use compilation to JS
* e-mail you when a package you maintain is broken on the "dev" branch
* e-mail you when a dependency of one of your packages has a security vulnerability, or has a breaking update. Show status to reward quality.
centralized online documentation repository with clickable identifiers in code fragments and cross-references	good resolution of ids (see refactoring)
* plus search engine to find functions by their type, description, module that provides them etc. (hoogle)	use compilation to JS + unification algo
* search engine for error messages, translatable error messages	error codes (Error FOO1234: …) + use translation library from the start + interactive error messages (expand info, go to documentation, …) instead of plain strings
browse source code with hyperlinks and diff between versions	good resolution of ids (see refactoring)
literate programming	explicit env, macro-like facilities
macros	laziness + quote syntax + inlining
API for IDE	explicit env, refactoring = expansion of a single macro
* Go to definition
* rename across files
* rename files (modules) themselves
* other transformations for occurrences of an identifier (eta-expansion etc.)
* other refactoring operations
* display compiler warnings, lint info, type info, static analysis info	annotations produced by macro-like things
indenter (indents but does not change newlines), possibly (likely?) customizable	! find a good framework to express indentation rules
Policy for package API evolution & security vulnerability patching	! Run tests of previous version to see that they all work but may introduce warnings; IDE support for deprecating features and bumping the version number; provide a rewriting: if the rewriting validates a proof that the behaviour did not change it can be applied automatically, else manually or warning because the actual use follows an undesired pattern
pretty-printer (indents and inserts newlines to try to fit within a certain width), possibly (likely?) customizable	! same as above
* graphical / annotated pretty-printer	good GUI framework
queries on the source code (find functions with more than X lines, show dependency graph, …)	code inspection via macro-like things
every IDE, runtime and language feature available in a web interface (trylanguagefoo.io demo for new users and playground)	use compilation to JS + good GUI framework
compilation should produce a single-file output	enables purely functional package management & pure build system
extraction of the dependencies of a file	first-class environment helps
package management: should be easy to export the state to a single file, to make it easy to create packages in a third-party package manager.	single-file output + dependencies extraction
static analysis	just make sure there's an API for that from the start
linter	just make sure there's an API for that from the start
"modes": build, doc, test, runtime (for dependencies, tree-shaking etc.)	! think about it, relates to packages and builds

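The debugger and serialization rows above can be illustrated together. The following is a minimal sketch, assuming a hypothetical toy language (literals, variables, lambdas, application, addition); the term encoding and the names inject, step, is_done and run are illustrative and not part of any existing implementation. It is a small machine of the kind obtained by defunctionalizing a CPS evaluator: the environment and the stack of continuation frames are plain data, so the whole program state can be paused, written out as a term (JSON here), and resumed.

```python
import json

def inject(term):
    """Initial machine state: (mode, focus, env, stack), all plain data."""
    return ("eval", term, {}, [])

def is_done(state):
    """Finished when a value is being returned and no frames remain."""
    return state[0] == "ret" and not state[3]

def step(state):
    """One small step; each stack frame is a defunctionalized continuation."""
    mode, focus, env, stack = state
    if mode == "eval":
        tag = focus[0]
        if tag == "lit":
            return ("ret", focus[1], env, stack)
        if tag == "var":
            return ("ret", env[focus[1]], env, stack)
        if tag == "lam":  # a closure captures the current environment
            return ("ret", ["clo", focus[1], focus[2], env], env, stack)
        if tag == "app":  # evaluate the function first, remember the argument
            return ("eval", focus[1], env, stack + [["arg", focus[2], env]])
        if tag == "add":  # evaluate the left operand first
            return ("eval", focus[1], env, stack + [["add_r", focus[2], env]])
        raise ValueError(f"unknown term: {focus!r}")
    frame, rest = stack[-1], stack[:-1]  # mode == "ret": resume the top frame
    kind = frame[0]
    if kind == "arg":    # function value known, now evaluate the argument
        return ("eval", frame[1], frame[2], rest + [["call", focus]])
    if kind == "call":   # argument value known, enter the closure body
        _, param, body, closure_env = frame[1]
        return ("eval", body, {**closure_env, param: focus}, rest)
    if kind == "add_r":  # left operand known, now evaluate the right one
        return ("eval", frame[1], frame[2], rest + [["add_v", focus]])
    if kind == "add_v":  # both operands known
        return ("ret", frame[1] + focus, env, rest)
    raise ValueError(f"unknown frame: {frame!r}")

def run(state):
    """Step until the machine is done and return the final value."""
    while not is_done(state):
        state = step(state)
    return state[1]

# (lambda x. x + 1) 41
program = ["app", ["lam", "x", ["add", ["var", "x"], ["lit", 1]]], ["lit", 41]]

state = inject(program)
for _ in range(4):                   # a debugger would pause at this point
    state = step(state)
snapshot = json.dumps(state)         # the entire program state as a string
result = run(json.loads(snapshot))   # resume from the deserialized state
```

Because step only looks at the state it is given, a profiler or coverage tool could be layered on the same loop by counting or logging steps, which is one possible reading of "use debugger's step-by-step" in the table.
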
Parser

Notation "p => e" := (mk_pattern p e) (p in scope pattern, e in scope expression)
Notation "p | p'" := (combine_patterns p p') (in scope match_cases)
Notation "match x with cs end" := (mk_match x cs) (cs in scope match_cases)

Notation "p => e" := (mk_pattern p e) (p in scope pattern, e in scope expression)
Notation "p | .. | p''" := (combine_patterns p .. (combine_patterns p' p'') ..) (in scope match_cases)
Notation "match x with cs end" := (mk_match x cs) (cs in scope match_cases)

A way to express the notation

match x
  case p₁ => e₁
  …
  case pn => en
end

and the notation:

(case p₁ => e₁
 …
 case pn => en)

If:

Notation ""if" a "then" b "else" c" (c bounded-by-start-and-end-kws)
Notation ""if" a "then" b"          (b bounded-by-start-and-end-kws)

allowed uses:

if true then f 1 2 3 else (g 4 5)
if true then (f 1 2 3)

allowed if the precedence of function calls and arithmetic is comparable in some direction (less than? or should it be greater than?) with that of the if:

(if true then f 1 2 3 else g 4 5)       (* delimiter at the start of the if indicates that it's ± safe to parse till the closing paren, might be a bad idea *)

forbidden uses:

if true then f 1 2 3 else g 4 5         (* could be followed with other things *)
if true then f 1 2 3                    (* could be followed with other things *)
(3 + if true then f 1 2 3 else g 4 5)   (* could parse as (3 + (if true then f 1 2 3 else (g 4)) 5) or (3 + (if true then f 1 2 3 else (g 4 5))) *)

Nested pairs on the left:

Notation "( x , y .. , z )" => (pair .. (pair x y) .. z)
Notation "( x , ... , z )" => (pair .. x .. z)
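
The recursive notations in this section differ only in the direction of nesting: the pattern alternatives nest to the right, the tuples nest to the left. As a minimal sketch of what the expansion has to produce, assuming terms are encoded as Python tuples (the helper names nest_left and nest_right are hypothetical):

```python
from functools import reduce

def nest_left(ctor, items):
    """Expand to (ctor .. (ctor x y) .. z): the first two items are innermost."""
    return reduce(lambda acc, item: (ctor, acc, item), items)

def nest_right(ctor, items):
    """Expand to (ctor p .. (ctor p' p'') ..): the last two items are innermost."""
    return reduce(lambda acc, item: (ctor, item, acc), reversed(items))

pairs = nest_left("pair", ["x", "y", "z"])
# ("pair", ("pair", "x", "y"), "z")

cases = nest_right("combine_patterns", ["p", "p'", "p''"])
# ("combine_patterns", "p", ("combine_patterns", "p'", "p''"))
```

Both helpers return a single item unchanged and raise a TypeError on an empty list, so a real notation mechanism would still have to decide what the one-element and zero-element cases mean.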

Two successive keywords:

Notation ""let" "rec" a ":=" b "in" c" => (letrec a b c)

Custom notations for strings (escaping, $var substitution, @{code} substitution, …)

Relative precedences

Custom top-level parser: e.g. for literate programming, custom syntax (à la Prolog, à la Pascal, à la Scheme, à la C).

Loading a library can modify the parser. What about let open … in …?

Unparsing (pretty-printing)

Is it possible to rename the lexer tokens used by a notation? E.g. if a module provides the notation match x with cs end, it could be imported while renaming match to switch, with to in and end to done. If this throws an error at import-time because it conflicts with existing uses of in that's fine.
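
A minimal sketch of such an import-time renaming, assuming a notation is stored as a list of keyword tokens and holes; the encoding, the NotationConflict exception and the rename_tokens helper are all hypothetical:

```python
class NotationConflict(Exception):
    """Raised at import time when a renamed keyword is already in use."""

def rename_tokens(notation, renaming, reserved):
    """Return a copy of the notation with its keywords renamed.

    notation: list of ("kw", text) and ("hole", name) items
    renaming: dict from old keyword to new keyword
    reserved: keywords already used in the importing module
    """
    renamed = []
    for kind, text in notation:
        if kind == "kw":
            text = renaming.get(text, text)
            if text in reserved:
                raise NotationConflict(f"keyword {text!r} is already in use")
        renamed.append((kind, text))
    return renamed

match_notation = [("kw", "match"), ("hole", "x"), ("kw", "with"),
                  ("hole", "cs"), ("kw", "end")]
renaming = {"match": "switch", "with": "in", "end": "done"}

# No conflict: the importing module only uses "let" as a keyword.
switch_notation = rename_tokens(match_notation, renaming, reserved={"let"})

# Conflict: "in" is already taken, so the import fails loudly.
try:
    rename_tokens(match_notation, renaming, reserved={"let", "in"})
    conflict = None
except NotationConflict as error:
    conflict = str(error)
```

This only checks keyword clashes; whether the renamed notation is still unambiguous with respect to the rest of the grammar is a separate question.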

Preferably, there would be a conservative way to know for sure that the notations Foo and Bar from modules A and B won't interact with each other, as long as some (possibly stringent) conditions are met, e.g.

the notations are not defined in the same scope (i.e. won't be visible in the same positions in the grammar)
the notations are bounded with a starting and ending token, which must be different
the notations are not at the same level of precedence

The goal is to know, when modifying a notation in a library, if this is going to be a significant breaking change (another alternative would be to have IDE support to do the alpha-renaming in a non-ambiguous way).
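
The three conditions listed above could be checked mechanically. The sketch below is only an illustration: the Notation record, its fields and the precedence numbers are assumptions, and "bounded by different tokens" is read here as "both notations have a start and an end token, and the two notations share none of them", which is one possible interpretation. The function reports which conditions hold and deliberately leaves open whether one of them, or all of them, should be required.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Notation:
    name: str
    scope: str                   # grammar scope in which the notation is visible
    level: int                   # precedence level (hypothetical numbering)
    start: Optional[str] = None  # opening token, if the notation is bounded
    end: Optional[str] = None    # closing token, if the notation is bounded

def independence_evidence(a, b):
    """List which of the three conservative conditions hold for a and b."""
    evidence = []
    if a.scope != b.scope:
        evidence.append("different scopes")
    bounded = None not in (a.start, a.end, b.start, b.end)
    if bounded and {a.start, a.end}.isdisjoint({b.start, b.end}):
        evidence.append("bounded by different tokens")
    if a.level != b.level:
        evidence.append("different precedence levels")
    return evidence

match_n = Notation("match", "expression", 200, start="match", end="end")
tuple_n = Notation("tuple", "expression", 0, start="(", end=")")
if_n = Notation("if", "expression", 200)  # not bounded by an end token

safe = independence_evidence(match_n, tuple_n)
# ["bounded by different tokens", "different precedence levels"]
risky = independence_evidence(match_n, if_n)
# []: nothing rules out an interaction
```

An empty result does not prove that the notations conflict; it only means the conservative check cannot vouch for them, which is when a library change should be flagged as potentially breaking.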

Review of Coq as a language

issues with coq

tactic language
- not a full-blown language (should be gallina)
- unstructured presentation of the goal
- list of hypotheses with match goal with … end is the way to grab a random hypothesis, but:
- should offer a graph to see dependencies
- way to access a hypothesis by name without using two nested match
- right now, to manipulate hypotheses one often needs to set a MARKER as a dummy hypothesis and then use the fact that the list of hypotheses is more or less treated like a stack by most operations.
- library should provide meta-props that group together several props about a given variable.
- SQL on the hypotheses!!!
- temporary focus on a subpart of the goal (can probably be done with context?)
- gallina needs reflection (quoting / unquoting / eval of terms)
- access to the proof term constructed so far (with holes)
- replace that term with an equivalent one, including a tactic that is automatically run by Qed. to simplify the proof term, e.g. discarding administrative shuffling due to intros & reverts.
- trace of the tactics (info)
- for display / didactic / review / understanding
- to make execution faster next time (try the recorded path, otherwise complain (and possibly update the path))
- possibly stored if one wants to know when a proof's structure needs to be changed (preferably not for large software development, but practical when studying proofs themselves)
- matching on complex terms, e.g. matching (Ltac) on a gallina match … end term is awkward (cannot put a pattern in place of the constructor name (can be done via case_eq hackery), cannot put a pattern in place of the in ?t clause)
- call Search from the tactic language (possibly restricted to a given module, it can then be done with row polymorphism & row introspection)
- tactic notations are not the same as gallina notations (both have features missing in the other)
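
The "try the recorded path, otherwise complain (and possibly update the path)" idea from the trace item can be sketched outside of Coq. This is a toy model under loud assumptions: goals are strings, tactics are Python predicates that return True when they solve a goal, and the solve function and tactic names are hypothetical stand-ins, not Ltac.

```python
def solve(goal, tactics, trace):
    """Solve a goal, preferring the tactic recorded in the trace.

    tactics: dict from tactic name to a function goal -> bool
    trace:   dict from goal to the name of the tactic that solved it last time
    Returns (tactic_name, status), status being "replayed", "recorded" or "updated".
    """
    recorded = trace.get(goal)
    if recorded in tactics and tactics[recorded](goal):
        return recorded, "replayed"        # fast path: no search needed
    for name, tactic in tactics.items():   # slow path: search all tactics
        if tactic(goal):
            status = "recorded" if recorded is None else "updated"
            trace[goal] = name             # remember the path for next time
            return name, status
    raise LookupError(f"no tactic solves {goal!r}")

tactics = {
    "by_reflexivity": lambda goal: goal == "x = x",
    "by_comparison": lambda goal: "<" in goal,
}
trace = {}

first = solve("1 < 2", tactics, trace)    # ("by_comparison", "recorded")
second = solve("1 < 2", tactics, trace)   # ("by_comparison", "replayed")

trace["x = x"] = "by_comparison"          # a stale entry, e.g. after a refactoring
third = solve("x = x", tactics, trace)    # ("by_reflexivity", "updated")
```

The "updated" status is the place where a real system would complain to the user, since it signals that the structure of the proof has changed.
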
type system:
- row polymorphism: optional fields, write partial matches and combine them, first-class environment, effects
- variants are special and shouldn't be (should be able to iterate on arbitrary variants, create a fresh constructor reflectively at run-time (soundness?))
- names are erased too easily e.g. forall (a : nat), bool will often show as forall (_ : nat), bool or nat -> bool
- There should be a way to know if a variable name is user-provided (and where it was provided: current proof, definition of function, definition of inductive, in current module or in another module, …) or whether it was generated by Coq. This could be used to prevent intros ??? H. where the ? do not correspond to already-named variables (i.e. to prevent accidental automatic name generation).
- implicit vs. non-implicit should appear in the type (with polymorphism / subtyping to make it irrelevant in most contexts), so that (forall (implicit a), list a) can be passed where a list int is expected. In Coq, Definition x := fun {t} => @nil t. Compute cons 1 x. works, but Compute cons 1 (fun {t} => @nil t). does not. Also, the implicitness is lost in Definition x := fun {t} => @nil t.
- matching on types via nominal typing: the type nat should actually be fix n . (Nat, (O | S n)) where the type tag Nat can be used to match on the type.
- most/all types from the library should be nominal in that way
- type-level match should also be able to look into structural types (if the type is not fully reduced, the result of the match will likely not be fully reduced either).
- ability to run programs that don't typecheck (during development / debugging, as is possible in haskell)
- {struct x} should accept an expression instead of x (can be done with Program, but writing the comparison function is annoying when one only wants structural induction on a subpart of the term)
- customizable type inference algorithm by writing a tactic. Same for unification, convertibility. E.g. in Coq you can't pass a nat where an if b then nat else nat is expected, but if you put a tactic instead of the nat value you can prove that the type expression always reduces to nat and then pass one as an argument (it generates some boilerplate in the final term, but that's okay).
- input is nonexistent, output is a pain (idtac only works when returning a tactic).
- with an explicit environment, type constants could be always captured by a := and then used, e.g. forall (t := nat) (x : t), t instead of forall x : nat, nat. This means that there would be only two constructions: forall var := global_constant_or_arbitrary_expression and forall var : locally_bound_type_variable. Not sure if that actually means fewer constructions or if it doesn't matter?
- thread extra info (to build static analyses), i.e. ornamentation. This would mean that custom type information can be computed.
code inference / metaprogramming
- need a way to pass arguments to the tactics solving holes
- meta-generation of vernacular (or, rather, with first-class envs and variants/inductive types that aren't special and privileged constructs, a lot of the vernacular becomes useless).
- notations are incomprehensible (at level X where X is a random number is just user-hostile)
- recursive notations e.g. for ( a , b , .. , c ) are brittle, there are a bunch of formulations that don't work and it's hard to understand why.
- notations: need for tools to understand the relative precedence levels, etc. (show a graph, show valid successors / predecessors, show a BNF grammar, generate example terms (especially terms which would be ambiguous modulo the choice of precedence)).
- notations: need for tools to specify relative precedence levels.
- easier way to apply a tactic to an expression some_expression. For now, one has to do let X := MARKER (some_expression) in ltac:(some_tactic) : _ and let some_tactic grab the MARKER (some_expression) from the environment, transform it, detect the type of the result via let t := type of transformed_expression in …, unify the goal with that type and solve the goal using the transformed expression. That's jumping through a lot of hoops. Probably can be solved with a good notation + tactic that do this once and for all.
performance
- cost model, assert big-o complexity of function in a certain evaluation order
- ltac is just bad (iterating over the list of hypotheses easily costs O(n²) or worse)
extraction
- should generate quickcheck-style and fuzzer tests to check that the implementation seems to match the expected type.
- should be possible to write extractors to other languages in gallina (via quoting & reflection)
- is it possible to give an extraction for a type + some operations on that type, but still allow deconstructing values of that type (via a backward translation from the native representation to (a native encoding of) the coq representation of the constructor arguments)?
Queries on the code
- Search (forall b a, S a < S b -> a < b). returns no result, but Search (forall a b, S a < S b -> a < b). finds a lemma. MLF's unordered quantifiers and unification that is allowed to lift quantifiers should solve this.
- More interactive queries: which lemmas are used by a proof, which notations exist that talk about a certain type / function, …
- Print a trace of the tactics used to build a proof
- coverage: detect which parts of a function definition are inspected by a proof. When there are parts that are inspected by no proof, it means that these parts of the code are relevant to no property. This in turn means that one should probably define a theorem / property that specifies how these parts of the code should work (in other words, these parts might be specified only by the code, not by a formal specification).
- There's no version of Print that takes an arbitrary expression (to show what has been understood after some ltac:() and _ (possibly in parse-only notations) have been solved). The closest is to Declare Reduction print := unfold dummy. and then do Eval print in arbitrary_expression..
- Need to be able to apply a transformation / custom query (defined as a tactic) on an arbitrary expression, and print the result. For now the closest is Declare Reduction print := unfold dummy. and then Eval print in some_transformation (arbitrary_expression)., where some_transformation jumps through hoops with a Notation to grab the arbitrary_expression and feed it to an ltac:(…).
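
The Search item about quantifier order can be made concrete with a deliberately naive sketch. It is not MLF and not how Coq's Search works: it simply tries every permutation of the bound variables, which is factorial in their number. Statements are encoded as a pair (bound variables, body), with bodies as nested tuples; this encoding and the function names are assumptions made for the example.

```python
from itertools import permutations

def rename(term, mapping):
    """Rename variables in a nested-tuple term, all at once."""
    if isinstance(term, tuple):
        return tuple(rename(sub, mapping) for sub in term)
    return mapping.get(term, term)

def same_in_binder_order(stmt_a, stmt_b):
    """Strict comparison: the i-th binder of one must be the i-th of the other."""
    (vars_a, body_a), (vars_b, body_b) = stmt_a, stmt_b
    if len(vars_a) != len(vars_b):
        return False
    return rename(body_b, dict(zip(vars_b, vars_a))) == body_a

def same_up_to_binder_order(stmt_a, stmt_b):
    """Relaxed comparison: the leading quantifiers may be reordered."""
    (vars_a, body_a), (vars_b, body_b) = stmt_a, stmt_b
    if len(vars_a) != len(vars_b):
        return False
    return any(
        rename(body_b, dict(zip(vars_b, perm))) == body_a
        for perm in permutations(vars_a)
    )

body = ("->", ("<", ("S", "a"), ("S", "b")), ("<", "a", "b"))
lemma = (("a", "b"), body)   # forall a b, S a < S b -> a < b
query = (("b", "a"), body)   # forall b a, S a < S b -> a < b

strict = same_in_binder_order(lemma, query)      # False
relaxed = same_up_to_binder_order(lemma, query)  # True
```

The strict comparison misses the lemma in the same way the first Search query does, while the relaxed one finds it; lifting quantifiers out of subterms, as mentioned for MLF, is not attempted here.
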
IDE
- The IDE should display some extended information on hover for the printed terms (e.g. the name of the function for definitions).
- Interactivity: to make an interactive session (e.g. to interactively navigate a proof), one needs to enter proof mode first. Should be possible directly in the vernacular or to define one's own beginning-of-interactive-session commands (e.g. to define a Replay proof some_thm. …. End replay. command, currently one has to write Goal replay proof some_thm. init_replay_proof. …. Qed.).
- better display of hypotheses: show a dependency DAG, highlight on hover the dependencies of a hypothesis / of a sub-part of the goal, etc.

cool stuff in coq:

section parameters
dependent types
ltac:() to do code generation
notation
Locate, Search etc.
extraction (picking a replacement for a type & for operations on that type)

Review of Agda as a language

TODO.

Review of Idris as a language

TODO.

Review of Epigram as a language

TODO.

Review of other_language as a language

TODO.