My Notes for UCLA CS231: Types and Programming Languages
Winter 2021, taught by professor Todd Millstein
2021-08-29 • 107 min read
These are my notes for CS231 Types and Programming Languages during Winter 2021 at UCLA. I found this class extremely interesting and want to share my notes as a reference for people who are or will be taking the course, and so that those considering taking it can see what the content is like. If you are a UCLA student considering taking this course, I am happy to answer any questions and talk about the class with you.
Quick FAQ:
- What is the course about?
- This course is about the theory of type checking. The core goal of the class is to show you how type checking guarantees runtime “correctness”. In the course, you will get to construct an OCaml-like language from the ground up by adding features (variables, functions, tuples, unions) one at a time. While adding each feature, we use math to prove that our type system guarantees the feature will run correctly at runtime. Towards the end of the course, we get to use a proof assistant called Coq.
- How are the assignments and exams?
- Since the course is about proving that type checking works, you will be writing some proofs in the homework. You will also write some OCaml code. (Don’t be scared, the OCaml coding assignments are not like 131.) The exam is very similar in style to the homework. If you can handle the homework, you can handle the exam.
- If I am an undergrad, can I take this course?
- Yes! I took this course as an undergrad. Check with the department for the process to enroll in grad classes as an undergrad.
- What background knowledge helped in 231?
- The course enforces CS 131 as a prerequisite. You need to know some basic OCaml for the coding homework, and having a basic understanding of OCaml type constructs like tuples, unions, pattern matching, and functions definitely helps. I also took CS 132 (compiler construction) before 231, which gave me some experience writing code for type checking. But I would say 132 is nice to have, not necessary. Knowing how to write LaTeX helped with the homework.
Table of Contents for Notes:
- Operational Semantics
- Static Type System
- Lambda Calculus
- Logic and typechecking
- Adding features
- Type Inference
- Proof assistant
- Mutable References
- Subtyping
- Type and Effect Systems
Operational Semantics
Big-step operational semantics
The simple language that we are working with today, defined as a BNF grammar:
t ::= true | false | if t then t else t
Evaluating a term produces a result. We call these results values, and they can be defined with a grammar as well.
v ::= true | false
P.S. for this class: values are usually a subset of the term grammar.
We first define an interpreter for this language.
(* the types to represent the tree *)
type t = True | False | If of t * t * t
(*
* How to initialize If:
* If (True, False, True)
*)
(* the values/outputs of the program *)
type v = TrueV | FalseV
An interpreter at this point is a function that takes a term of type t and returns a value of type v.
let rec eval (t:t):v =
match t with
True -> TrueV
| False -> FalseV
| If(cond, thn, els) ->
match eval cond with
TrueV -> eval thn
| FalseV -> eval els
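As a quick sanity check, here is the interpreter applied to a small term (the types and eval are repeated so the snippet is self-contained):

```ocaml
(* the types to represent the tree, repeated from above *)
type t = True | False | If of t * t * t
type v = TrueV | FalseV

let rec eval (t : t) : v =
  match t with
  | True -> TrueV
  | False -> FalseV
  | If (cond, thn, els) ->
      (match eval cond with
       | TrueV -> eval thn
       | FalseV -> eval els)

(* if true then false else true evaluates to false *)
let result = eval (If (True, False, True))   (* FalseV *)
```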
So, big-step operational semantics really means we define a relation that takes a term to the value it produces. This relation is the big-step operational semantics. Notation: t ⇓ v (read as “t big-steps to v”).
In the world of logic, this is a binary relation, a predicate, a logical judgement.
Big-step operational semantics gives rules to define the relation. Notation of a rule: premises above the line, conclusion below.
We attempt to write the rules for the simple language defined above.
This rule has no premises; it is called an axiom. The logical equivalent of this rule is:
Now we consider the rule where the If condition evaluates to true.
Logical equivalence
We can define the rule for false.
Applying inference rule
Claim: if true then false else true ⇓ false
We pick a rule to apply to the claim. Then we recursively prove each precondition.
This is a derivation tree. To check that the tree is correct, we can simply apply the rules and see if the correct predicates are derived. A simple process.
If we choose the wrong rule initially, our proof cannot proceed. A derivation attempt that fails:
Goal: if true then (if false then true else false) else true ⇓ false
Now we make our language more interesting by adding numbers.
t ::= true | false | if t then t else t
| n | t + t | t > t
n ::= <integer constants>
We extend the interpreter accordingly.
type t = True | False | If of t * t * t
| Int of int | Plus of t * t | Greater of t * t
type v = TrueV | FalseV | IntV of int
let rec eval (t:t):v =
  match t with
    True -> TrueV
  | False -> FalseV
  | If(cond, thn, els) ->
      (match eval cond with
         TrueV -> eval thn
       | FalseV -> eval els)
  | Int i -> IntV i
  | Plus (t1, t2) ->
      (match (eval t1, eval t2) with
         (IntV i, IntV j) -> IntV (i+j))
  | Greater (t1, t2) ->
      (match (eval t1, eval t2) with
         (IntV i, IntV j) ->
           if i > j then TrueV else FalseV)
When the operands of Plus do not both evaluate to ints, OCaml throws a runtime match failure. To be nicer, we add a catch-all clause that raises our own exception:
| _ -> raise TypeError
This is a dynamically typed language: if we don’t encounter a type error during execution, we don’t do anything about it. For instance, an if-expression:
if true then true else false + 1
Since the else-branch is never evaluated, we don’t care about the type mismatch there. We continue from last time. Let’s write out the rules for Plus and Greater.
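To see the dynamic-typing behavior concretely, here is the extended interpreter with the TypeError clauses folded in (a self-contained sketch; the exception name comes from the notes). The if-expression above evaluates fine because the bad else-branch is never reached:

```ocaml
exception TypeError

type t = True | False | If of t * t * t
       | Int of int | Plus of t * t | Greater of t * t
type v = TrueV | FalseV | IntV of int

let rec eval (t : t) : v =
  match t with
  | True -> TrueV
  | False -> FalseV
  | If (cond, thn, els) ->
      (match eval cond with
       | TrueV -> eval thn
       | FalseV -> eval els
       | IntV _ -> raise TypeError)       (* int used as a condition *)
  | Int i -> IntV i
  | Plus (t1, t2) ->
      (match (eval t1, eval t2) with
       | (IntV i, IntV j) -> IntV (i + j)
       | _ -> raise TypeError)            (* non-int operand *)
  | Greater (t1, t2) ->
      (match (eval t1, eval t2) with
       | (IntV i, IntV j) -> if i > j then TrueV else FalseV
       | _ -> raise TypeError)

(* if true then true else false + 1: no TypeError is raised, because
   the else-branch is never evaluated *)
let ok = eval (If (True, True, Plus (False, Int 1)))
```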
Plus
- Note that the + used to compute the result is the mathematical plus, not the syntactic one.
Greater
We introduce another metavariable, b, for boolean values.
t ::= b | if t then t else t
| n | t + t | t > t
b ::= true | false
n ::= integer constants
v ::= b | n
Proving properties given rules
Let’s go back to the simple language with only booleans, without the integer values.
Theorem: For all terms t, there exists a value v such that t ⇓ v.
Note this theorem would not be true for Turing-complete languages: a term might get stuck in an infinite loop.
There are infinitely many terms, so we use induction to prove it.
Structural induction
We want to prove a property P(t) for all terms t.
- Prove P(t) separately for each kind of term
- we can assume that P holds for each subterm of t
Proof: by structural induction on t.
- Induction hypothesis: for all terms t' such that t' is a subterm of t, there exists a value v' such that t' ⇓ v'.
- Case analysis on the root node of t.
- Case t = true: by the axiom for true, true ⇓ true.
- Case t = false: by the axiom for false, false ⇓ false.
- Case t = if t1 then t2 else t3:
- By the induction hypothesis on t1, there exists v1 such that t1 ⇓ v1.
- Case v1 = true: by IH on t2, there exists v2 such that t2 ⇓ v2. Then by the if-true rule, if t1 then t2 else t3 ⇓ v2.
- Case v1 = false: by IH on t3, there exists v3 such that t3 ⇓ v3. Then by the if-false rule, if t1 then t2 else t3 ⇓ v3.
- QED
You can only do a case analysis on something that you know exists/is true.
Example: for the same theorem, we cannot do a case analysis on v. Since the goal is to prove that v exists, doing a case analysis on it would assume the very thing we are trying to prove.
Small-step operational semantics
Judgement of the form t → t', meaning t evaluates in one step to t'. It is a single step of evaluation. What counts as a single step?
Example:
if (if true then false else true) then false else true
(* single-step to --> *)
if false then false else true
(* single-step to --> *)
true
Two types of small-step rules:
- “computation” rules that simply do one small bit of evaluation
- “congruence” rules that simply tell you where to look recursively for the next bit of work
Examples of computation rule:
Well, not all if expressions simply use true or false as the condition. We need a congruence rule to tell us what to do.
- from this congruence rule, we end up with a new piece of program.
- a derivation tree here is simply a derivation list.
An example of derivation:
Plus rules
- Red stands for reduction
- Notice that we have taken explicit control over the order of evaluation: always left term first
- notice that we didn’t enforce the types of the operands up front; we only enforce the requirement at the moment of addition. This is nice since the operands can be of different types, and it is useful to put constraints only where absolutely needed.
Some terminology related to small-step semantics
A term t is called a normal form if there does not exist a t' such that t → t'.
- example:
true
- in general, we will arrange it such that values are normal forms
There are normal forms that are not values.
- example:
true + 5
A normal form that is not a value is called a stuck expression
- run-time type error
Stuck expressions can be arbitrarily large.
- That means you cannot take a step to get a smaller program
- (true + 5) is stuck
- if (true + 5) then 0 else 1 is stuck
- if (if (true + 5) then 0 else 1) then 0 else 1 is stuck
Define a second judgement that is simply the reflexive, transitive closure of the single-step relation: the “multi-step relation”, notated t →* t'.
Now we can derive the execution of the entire program using this relation in a single tree just as in big-step semantics.
Definition: a term t is eventually stuck if t →* t' and t' is stuck.
- (true + 5) is eventually stuck
- in general, all stuck expressions are also eventually stuck.
Example of eventually stuck: if true then (5 + true) else 0
- note that this expression is not stuck. We can still take a step to reduce it: if true then (5 + true) else 0 → 5 + true
- note that if (5 + true) then 1 else 0 is stuck though, since we cannot take a step to reduce it
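The single-step relation can be sketched as an OCaml function (this encoding is our own); step returns None exactly on normal forms, both values and stuck terms:

```ocaml
type t = True | False | If of t * t * t | Int of int | Plus of t * t

let is_value = function True | False | Int _ -> true | _ -> false

(* one small step of evaluation; None means t is a normal form *)
let rec step t =
  match t with
  | If (True, thn, _) -> Some thn                 (* computation rules *)
  | If (False, _, els) -> Some els
  | If (cond, thn, els) ->                        (* congruence: work on cond *)
      (match step cond with
       | Some cond' -> Some (If (cond', thn, els))
       | None -> None)                            (* cond is stuck or a bad value *)
  | Plus (Int i, Int j) -> Some (Int (i + j))
  | Plus (t1, t2) when not (is_value t1) ->       (* left operand first *)
      (match step t1 with Some t1' -> Some (Plus (t1', t2)) | None -> None)
  | Plus (v1, t2) when not (is_value t2) ->
      (match step t2 with Some t2' -> Some (Plus (v1, t2')) | None -> None)
  | _ -> None      (* values, and stuck terms like Plus (True, Int 5) *)
```

For example, if true then (5 + true) else 0 steps to 5 + true, which is then stuck: step returns None on it even though it is not a value.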
What is this in a real programming language?
- a bad program
- a program that will eventually crash
Practicing Proof
Define the of a term as follows:
We are given a lemma : for all , .
Theorem: If , then .
Proof: we perform a structural induction on .
- induction hypothesis: For all subterm of , if ,
We can do a case analysis on the root node of term t. An alternative, when we are given a derivation, is to do a case analysis on the rules.
Case analysis on the root rule in the derivation of :
- Case : Then and t' = t2.
- By definition of .
- By , and .
- Therefore,
- Case , same idea as above
- Case , Then and and .
- By IH,
- Therefore, .
- Then, the result follows from the definition of .
If we were to perform a case analysis on the term instead, we would also have to cover cases where the term is true, and we would need to show that no derivation is possible by deriving a contradiction, which is very annoying.
Note: this theorem does not hold once we add integers into the language, since we can derive false from 0 > 1.
Exercise: proving lemma :
- By structural induction on .
- IH: for all which is a subterm of ,
- Case ,
- Case ,
- Case
- By IH, ,,
- By def of ,
Now that we have the proof, we name it lemma . Next, we derive the theorem for multi-step.
Theorem: if , then .
Proof: By structural induction on ,
- IH: If , where is a subterm of , then .
- Case analysis on the root rule in the derivation of
- Case : Then , then .
- Case , Then , By , .
- Case , then and .
We cannot use our induction hypothesis here. There is no valid subterm relationship. There is no subterm we can grab for the transitive relationship.
Solution: we don’t do structural induction on the term. Instead, we do an induction on the derivation tree.
We restart our proof. First, rewrite the theorem such that our derivation is given a name.
Theorem: If , then .
Proof: By structural induction on , which is the derivation of
- IH: If , where is a subderivation of ,
then .
- Case analysis on the root rule in the derivation of :
- Case : Then , then .
- Case : Then , By , .
- Case : Then and .
- By IH on ,
- By IH on ,
- Therefore, result follows by transitivity.
We can do this since a derivation tree is a tree: it is recursively defined, so we can perform structural induction on it as well.
Take home: we can apply this type of induction on the derivation of any judgement.
Theorem: If and , then there exists a such that and .
Proof: by structural induction on .
- IH: If and and is a subderivation of , then there exists a
such that and .
- Case analysis on the root rule in the derivation of :
- Case : Then . This is a contradiction. This case is not part of the theorem.
- Case : Then
- by ,
- Therefore, and
- Case : Then and .
- Case analysis on the relationship between and :
-
This case analysis is to satisfy the IH condition, so that we mathematically satisfy the IH.
- Case :
- By IH on , and .
- Therefore, and and
- By , and .
- result follows .
- Case :
- Case analysis on the relationship between and :
- Case : By IH, there exists such that and . Since , the result follows with .
- Case . Then . Contradiction.
- Case analysis on the relationship between and :
- Case :
Wow, a deeply nested structural induction. We have 3 levels of case analysis nested inside each other.
Static Type System
textbook: Chapter 8
Goal: Get rid of type error at compile time.
Previously, we had the big-step, small-step and multi-step judgement forms. Now we need a new judgement form.
To build a type system, we assign each expression a type.
T ::= Bool | Int
The new typing judgement is of the form t : T, which means “term t has type T”.
Note that we don’t have value evaluation here. But we have type assignment.
We can build derivation tree from these rules again.
We can also have derivations that don’t work.
Rule for Greater
Rule for If
- we are forcing the true branch and false branch to have the same type
- figuring out whether an expression is eventually stuck is undecidable. A sound type system must be conservative and rule out some programs that would technically run fine.
This will evaluate to 0, but OCaml will prevent it from even running:
if true then 0 else false
What if we want if true then 0 else false to pass the type check? We can add an If-True rule. However, this rule is very limited since it requires the if to syntactically have true as the condition. Too bad. We also cannot statically know what the condition will be, even if we introduce a true or false type. So we keep forcing both branches of an if to have the same type.
Example: build a type derivation tree for if 3 > 2 then 0 else 1+2.
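The typing rules so far can be sketched as a typeof function (this encoding and the TBool/TInt names are our own); it returns None when a term does not typecheck, so it rejects if true then 0 else false even though that term would run fine:

```ocaml
type t = True | False | If of t * t * t | Int of int
       | Plus of t * t | Greater of t * t
type ty = TBool | TInt

(* typeof t = Some T when t : T, None when t does not typecheck *)
let rec typeof t =
  match t with
  | True | False -> Some TBool
  | Int _ -> Some TInt
  | If (c, thn, els) ->
      (match (typeof c, typeof thn, typeof els) with
       (* condition must be Bool; both branches must share one type *)
       | (Some TBool, Some ty1, Some ty2) when ty1 = ty2 -> Some ty1
       | _ -> None)
  | Plus (t1, t2) ->
      (match (typeof t1, typeof t2) with
       | (Some TInt, Some TInt) -> Some TInt
       | _ -> None)
  | Greater (t1, t2) ->
      (match (typeof t1, typeof t2) with
       | (Some TInt, Some TInt) -> Some TBool
       | _ -> None)

let rejected = typeof (If (True, Int 0, False))   (* None: branch types differ *)
let accepted = typeof (If (Greater (Int 3, Int 2), Int 0, Plus (Int 1, Int 2)))
```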
Type Soundness
How do we prove that our type system has eliminated type errors before runtime?
“Well-typed programs don’t go wrong” (Milner)
Our goal: Well-typed programs are not eventually stuck.
Theorem (Type Soundness)
- attempt 1: If t : T, then either t is a value or there exists t' s.t. t →* t'.
- this is not what we want: under the multi-step semantics, any stuck expression multi-steps to itself (reflexivity).
- attempt 2: If t : T, then either t is a value or there exists t' s.t. t → t'.
- Not good enough. We can take a step, but that doesn’t mean we are not eventually stuck.
- attempt 3: If t : T, then there exists a value v such that t →* v.
- what if the program contains an infinite loop? Then we would need to solve the halting problem.
- attempt 4:
- t is eventually stuck if t →* t' and t' is stuck.
- t is stuck means there does not exist t' s.t. t → t'.
- t is not eventually stuck if for all t' s.t. t →* t', t' is not stuck:
- Either t' is a value or there exists t'' s.t. t' → t''.
Theorem (type soundness): if t : T and t →* t', then either t' is a value or there exists t'' s.t. t' → t''.
- we allow infinite execution in this form.
- big-step semantics would not allow us to state this for infinite executions.
How do we prove this?
Progress + Preservation
Progress and Preservation implies Type Soundness.
Lemma (Progress): “well-typed terms are not stuck”.
- If t : T, then either t is a value or there exists t' s.t. t → t'.
Lemma (Type Preservation): “stepping preserves well-typedness”.
- If t : T and t → t', then t' : T.
This is actually stronger than the slogan suggests. It states that
- the expression that we step to is well-typed
- the expression that we step to has the same type as before
while the slogan alone does not. Let’s prove them!
Lemma (Progress): If t : T, then either t is a value or there exists t' s.t. t → t'.
Proof: by structural induction on t.
- IH: If t' : T' and t' is a subterm of t, then either t' is a value or there exists t'' s.t. t' → t''.
- Case analysis on the root rule in the derivation :
- Case : Then and . So is a value.
- Case : Then , and .
- By IH, either is a value or exists s.t. .
- By IH, either is a value or exists s.t. .
- Case analysis on the possibilities of and
- Case is a value and is a value:
-
we need a lemma here that says a value of type Int is a number.
- Since and , by Canonical Forms for Int, and . Therefore by , , where .
- Case and :
- Then by
- Case and is a value:
- Then by
- Case is a value and :
- Then by
- …
Lemma (Canonical Forms for Int): If v : Int, then v is an integer constant n.
- we can prove this by case analysis on the typing rules, without induction.
Alternatively, we can also do structural induction on .
- IH: If and is a subderivation of , then either is a value or exists s.t. .
In a Turing-complete language, we cannot rely on structural induction on subterms. We want to focus on structural induction on derivations.
We are going to redo our Progress proof from last time.
Lemma (Progress): If t : T, then either t is a value or there exists t' s.t. t → t'.
We have 2 canonical forms that we can use:
- Lemma (Canonical Forms, Bool): If v : Bool, then v is a boolean value (true or false).
- Lemma (Canonical Forms, Int): If v : Int, then v is an integer constant n.
Proof: By induction on derivation .
- IH: If , where is a subderivation of , then either is a value or exists such that .
- Case analysis on the root rule of :
- Case : Then and . Then is value.
- Case : Then and and
- (I proposed to use a case analysis on the term, but it goes into infinite regress)
- By IH, is a value or exists such that .
- Case analysis on these cases.
- Case is a value:
- By Canonical Forms, Bool, either or
- If former, steps by otherwise by
- Case :
- By , steps to
- Case is a value:
- Case : Then and . Then is a value.
- Case : Then and .
- By IH, is either a value or (we can apply IH since is a subderivation of ).
- By IH, is either a value or .
- Case analysis on these cases:
- Case is a value and is a value.
- By Canonical form, Int, is an integer value and is an integer value . By , steps.
- Case is a value and .
- By , steps.
- Case and is a value.
- By , steps.
- Case and .
- By , steps.
- Case is a value and is a value.
- (Other cases left as an exercise to the reader)
Note: a type system that rejects all programs also satisfies type soundness: the hypothesis is never true, so the statement is vacuously true.
Now we focus on Preservation.
Lemma (Type Preservation): If t : T and t → t', then t' : T.
Proof: By induction on derivation .
- IH: If , where is a subderivation of , and , then .
- Case : Then . Since is a value, it cannot step.
- Case : Then , ,
, .
- case analysis on derivation :
- Case : .
- By IH,
- By ,
- Case : :
- Case : :
- Case : .
- case analysis on derivation :
Lambda Calculus
An informal semantics for a function call:
(function x -> x + x) 4
4+4
8
(function x -> (function y -> x + y)) 4
(function y -> 4 + y)
((function x -> (function y -> x + y)) 4) 3
(function y -> 4 + y) 3
4 + 3
7
untyped lambda calculus (chapter 5)
t ::= x | (function x -> t) | t t
x ::= variable names
- t t is a function call.
- this small language is Turing-complete.
- λx. t is the original syntax in lambda calculus for (function x -> t).
Some terminology:
- t is the body of (function x -> t)
- x is bound within the body of (function x -> t)
- a variable that occurs in t but is not bound in t is called a free variable
small-step operational semantics for the lambda calculus
v ::= (function x -> t)
Computation rule:
- [x ↦ v]t is the substitution operation that substitutes v for x in t.
- this rule is also referred to as beta reduction
- it substitutes v for all free occurrences of x in t
- e.g. (function x -> (function x -> x)) 5 → (function x -> x), since the inner x is bound
- aka variable scoping
This language is so powerful that it can express infinite loops.
Omega term
(function x -> x x) (function x -> x x)
Notice that this expression steps to itself.
Congruence rule
Where do we look for the next piece of work to do?
- Describe the order of evaluation.
- we go with call by value since that’s the most common
- call-by-value semantics: call the leftmost function whose argument is a value
Let id be shorthand for the identity function: (function x -> x).
((id (id id)) (id id))
- which application is evaluated first? (the first (id id), by call-by-value)
Call-by-value
Call the leftmost function whose argument is a value.
According to these rules, we derive the tree for the example:
Now we have fully defined the language. It is Turing-complete!
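The call-by-value semantics can be sketched as a small OCaml interpreter. This is our own minimal encoding, and it assumes substituted arguments are closed (no free variables), as in the notes, so no alpha-renaming is needed:

```ocaml
(* untyped lambda calculus terms *)
type t = Var of string | Fun of string * t | App of t * t

(* [x |-> v]t, assuming v is closed *)
let rec subst x v t =
  match t with
  | Var y -> if y = x then v else t
  | Fun (y, body) -> if y = x then t else Fun (y, subst x v body)  (* y = x: inner binding shadows *)
  | App (t1, t2) -> App (subst x v t1, subst x v t2)

let is_value = function Fun _ -> true | _ -> false

(* one call-by-value step; None on normal forms (values and stuck terms) *)
let rec step t =
  match t with
  | App (Fun (x, body), v) when is_value v -> Some (subst x v body)  (* beta *)
  | App (t1, t2) when not (is_value t1) ->                           (* left first *)
      (match step t1 with Some t1' -> Some (App (t1', t2)) | None -> None)
  | App (v1, t2) ->
      (match step t2 with Some t2' -> Some (App (v1, t2')) | None -> None)
  | _ -> None

let id = Fun ("x", Var "x")
```

As a check, the omega term steps to itself under this sketch: with w = function x -> x x, step (App (w, w)) returns Some (App (w, w)).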
Note: some evaluation strategies will terminate while others don’t. Let’s consider this program:
(function x -> id) omega
where omega is the infinite-loop expression we defined above.
- In call-by-value semantics, we are required to evaluate omega into a value first, meaning it will run forever.
- In lazy evaluation, we don’t evaluate arguments until we need them. So this program evaluates to just id.
Note: for any two strategies that terminate, both will end up with the same value.
Examples of things we can do with the lambda calculus. Church came up with all of these, so we call them Church booleans, Church numerals, etc. We try doing them in OCaml.
Church Booleans
But first, we take a detour to datatype
(* detour: booleans as a datatype *)
type mybool = True | False
let myand b1 b2 =
match b1 with
True -> b2
| False -> False
How do we define boolean in lambda calculus?
let ctrue = function t -> function f -> t
let cfalse = function t -> function f -> f
let cand b1 b2 = b1 b2 cfalse;;
Yes, that’s it! That’s how cand runs. If b1 is true, calling b1 will return b2, whatever value it is. If b1 is false, calling b1 will return cfalse, regardless of b2! Genius!
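We can check this behavior in OCaml by applying a Church boolean to real booleans to read it back (the to_bool helper is our own; the definitions are repeated for self-containment):

```ocaml
let ctrue  = function t -> function f -> t
let cfalse = function t -> function f -> f
let cand b1 b2 = b1 b2 cfalse

(* read a Church boolean back as an OCaml bool by applying it *)
let to_bool b = b true false

let r1 = to_bool (cand ctrue ctrue)    (* true *)
let r2 = to_bool (cand ctrue cfalse)   (* false *)
let r3 = to_bool (cand cfalse ctrue)   (* false: b1 short-circuits to cfalse *)
```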
Church numerals
Note that nat stands for natural numbers. Yes, no negatives here.
Again, first we start with data type.
type mynat =
Zero
| Succ of mynat
let rec myplus n1 n2 =
match n1 with
Zero -> n2
| Succ n1' -> Succ (myplus n1' n2)
We defined a recursive plus, not efficient but it works.
let czero = function z -> function s -> z
let cOne = function z -> function s -> (s z)
let cTwo = function z -> function s -> s (s z)
Here z is the zero, and s is the successor. Now we see how a numeral knows what number it is.
let cplus n1 n2 =
(function z -> function s -> n1 (n2 z s) s)
How do we think of this? Well, we can treat the first argument z of any numeral as the number to start counting from, and s as how we count up at each step. Calling cplus creates a new number that starts counting from n2 but uses the same step-up s. To make it truly compute n1 + n2, we have to force n2 to start from zero, therefore we pass it the z and the fixed step-up s.
Another way to think about it is from a type perspective. Assume we have some magic type Z (the annotations below are not valid OCaml, btw).
let czero = function z:Z -> function s:Z->Z -> z:Z
Therefore, we have to call n1 with a value of type Z and a function of type Z->Z.
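We can test cplus the same way, reading a Church numeral back as an int by starting at 0 and counting up with (+1) (the to_int helper is our own; the definitions are repeated for self-containment):

```ocaml
let czero = function z -> function s -> z
let cone  = function z -> function s -> s z
let ctwo  = function z -> function s -> s (s z)
let cplus n1 n2 = function z -> function s -> n1 (n2 z s) s

(* read a Church numeral back as an int: start at 0, step is (+1) *)
let to_int n = n 0 (fun x -> x + 1)

let three = to_int (cplus cone ctwo)   (* 3 *)
```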
Simply-Typed Lambda Calculus
What is the error we are trying to catch? What are the stuck expressions?
- We get stuck when trying to evaluate a variable that was never defined (i.e., a free variable).
- x is stuck
- (x x) is stuck
- (function x -> x) x is stuck (the rightmost x is free)
Again we start with the grammar, this time with a type.
t ::= x | (function x -> t) | t t
v ::= (function x -> t)
T ::= T -> T | Unit
(* an input type and an output type. *)
(* Unit is a base type, otherwise it is infinite recursion *)
Typecheck variable
How do we typecheck a variable? Where do we look up its type? In real programming languages, the types of arguments or variables are declared by the user.
We define a type environment:
a type environment is a finite mapping from variable names in scope to their types
We use the symbol Γ.
The type judgment now has the form Γ ⊢ t : T.
- “term t has type T in the context of type environment Γ”
To typecheck a variable, we do a lookup in the type environment.
Typecheck function
- Γ, x:T means add/overwrite the type of x in Γ with T.
This means we guess the type of x. If the body typechecks with that guess, it is fine. In reality, we would have to try all possible types to find one that works. However, in real programming languages, we make the user declare the type of x.
To make our life easier, we add type declaration to our grammar.
t ::= x | (function x:T -> t) | t t
v ::= (function x:T -> t)
T ::= T -> T | Unit
The rule becomes
Typecheck function call
Let’s typecheck czero.
- abbreviate as
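The typing rules for the simply-typed lambda calculus can be sketched as a typechecker; this is our own encoding (an association list as the type environment, and a UnitV term constructor so the base type is inhabited):

```ocaml
type ty = TUnit | Arrow of ty * ty
type t = Var of string | Fun of string * ty * t | App of t * t | UnitV

(* typeof env t = Some T when env |- t : T, None otherwise *)
let rec typeof env t =
  match t with
  | UnitV -> Some TUnit
  | Var x -> List.assoc_opt x env          (* lookup in the type env *)
  | Fun (x, ty1, body) ->
      (* extend the env; consing shadows any previous binding of x *)
      (match typeof ((x, ty1) :: env) body with
       | Some ty2 -> Some (Arrow (ty1, ty2))
       | None -> None)
  | App (t1, t2) ->
      (match (typeof env t1, typeof env t2) with
       | (Some (Arrow (a, b)), Some a') when a = a' -> Some b
       | _ -> None)

let ok  = typeof [] (Fun ("x", TUnit, Var "x"))   (* Some (Arrow (TUnit, TUnit)) *)
let bad = typeof [] (Var "x")                     (* None: free variable *)
```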
Modular typechecking: typecheck a function once and safely call it multiple times.
- formalize this with a type substitution lemma: typing is preserved by substitution.
Theorem (Type Substitution): If Γ, x:S ⊢ t : T and Γ ⊢ v : S, then Γ ⊢ [x ↦ v]t : T.
We forgot to define substitution ([x ↦ v]t). Here we define it formally. We assume that v has no free variables.
- [x ↦ v]x = v
- [x ↦ v]x' = x', where x' ≠ x
- [x ↦ v](function x -> t) = (function x -> t) (x is bound)
- [x ↦ v](function x' -> t) = (function x' -> [x ↦ v]t), where x' ≠ x
- [x ↦ v](t1 t2) = ([x ↦ v]t1 [x ↦ v]t2)
Note: once we add this type system to the language, it is no longer Turing-complete: every well-typed program terminates. We need to introduce a primitive for recursion.
Logic and typechecking
Notice this is a simple logic that has been known for a long time, since Aristotle.
If we treat our function type T1 -> T2 as an implication T1 ⟹ T2, then the function-call rule means:
If we know P implies Q and we know P, then we know Q (modus ponens).
Curry-Howard Isomorphism
Types are also theorems. Programs are also proofs of theorems (proof objects or proof witnesses).
“natural deduction”
For each logical connective, you have an “introduction rule”, which says how to prove facts of that form, and an “elimination rule”, which says how to use facts of that form.
- Introduction rule
- elimination rule
- assumption
The simply-typed lambda calculus is a constructive propositional logic.
- every proof comes with an explicit proof object that witnesses the proof
A proposition T is valid if and only if there exists a term t such that ⊢ t : T.
Let’s suppose we have an infinite number of uninterpreted type constants (P, Q, R, …).
Example:
Note: if we introduce base types like booleans or integers into the lambda calculus, the proof system breaks down and no longer follows strict logic.
- Can you write a program in simply-typed lambda calculus that has type P -> P?
- yes: (function x:P -> x)
- What about type P?
- no
- What about type P -> (Q -> P)? (equiv. to P -> Q -> P)
- Yes: (function x:P -> function y:Q -> x)
- What about type (P -> Q -> R) -> (Q -> P -> R)?
- Yes: function f:(P->Q->R) -> function q:Q -> function p:P -> f p q
Logical Conjunction in type system
We add an “and” into our type system. This is really just adding a tuple to our system.
T ::= ... | T ∧ T
t ::= ... | (t, t) | fst t | snd t
That means we can derive data structure from logic.
Example: commutativity of logical conjunction, (P ∧ Q) -> (Q ∧ P).
Find a program that has that type:
- ans:
(function x: (P ∧ Q) -> (snd x, fst x))
- in OCaml, with the help of type inference:
function x -> (snd x, fst x)
Now we define the operational semantics. First, we have to notice new values. Now we can have tuples as value as well.
v ::= (function x:T -> t) | (v, v)
Exercise: write a small-step operational semantics for tuple.
Notice that we do not have a computation rule for constructing a tuple (unlike function application). This is because a tuple of values is already a value.
- we need both elements to be values because, if the second element has a side effect, we want to carry it out; so we evaluate the second element until it is a value.
Logical Disjunction
In ocaml, we know that type unions are disjunction.
type intorbool = I of int | B of bool;;
We need the tags I and B for a type-safe way to use unions.
T ::= ... | T ∨ T
The logical rules are
- notice that the other type in the union is unknown
- the rule is not algorithmic; we have to guess what it is
- in real programming languages, we know ahead of time what type left carries
- in OCaml, we have to declare it upfront:
type ('a, 'b) leftOrRight = Left of 'a | Right of 'b;;
This says that if we can do something (deriving the result type) assuming the left case, and we can do something (deriving the same result type) assuming the right case, then we can conclude that we can for sure get something of that type back.
To simplify this, we limit unions to size 2, and we only use left and right as our constructor names.
t ::= ... | left t | right t
| (match t with left x -> t '|' right x -> t)
Example: commutativity of disjunction. Write a program whose type is (P ∨ Q) -> (Q ∨ P):
function u: (P ∨ Q) ->
match u with
left x -> right x
| right y -> left y
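The same program can be written in plain OCaml using a generic two-constructor union type (the type and constructor names here are our own choice):

```ocaml
(* a two-constructor union, like the left/right of the toy language *)
type ('a, 'b) either = Left of 'a | Right of 'b

(* commutativity of disjunction: ('a, 'b) either -> ('b, 'a) either *)
let flip u =
  match u with
  | Left x -> Right x
  | Right y -> Left y

let r = flip (Left 3)   (* Right 3 *)
```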
The set of rules defines a relation and it can be many-to-many. We are allowed to have
So 0 can have both types. In the preservation theorem, we only need to prove that the type is preserved as long as one of the possible types is the matching one.
In semantics, what if we have two paths to take? Then our language is non-deterministic, which is completely okay! Since multi-threaded languages are non-deterministic so to model that we would need many-to-many relation even in semantics.
Back to pairs:
With pairs as a primitive, we get multiple arguments and multiple return values for free.
function x:(Int, Bool) -> x
(* returns a (Int, Bool) *)
Back to disjunction
Derivation for flipping disjunction from last time:
function u: (P ∨ Q) ->
match u with
left x -> right x
| right y -> left y
The tree is not complete since we are missing the subtree for right y, but try completing it yourself.
Semantics for union type:
Type True
What if we introduce a type True?
T ::= ... | True
What’s the equivalent in real programming languages?
In OCaml, it is Unit. Why is it useful?
We want functions that take no argument or return no meaningful value. For uniformity, we use Unit.
- note that () is the syntactic value of the unit type
- an example of a function with type (P->True):
(function x:P -> ()) : P -> True
What about False?
- from False, we get everything
- we use ~T as the shorthand for (T -> False)
Example: (~P ∧ (P ∨ Q)) -> Q
(function x: (P -> False) ∧ (P ∨ Q) ->
match snd x with
left p -> (fst x) p
| right q -> q
)
In the left p case, we extract ~P with fst, then pass p to it to produce False; and since from False we can derive everything, the required Q follows.
In a real programming language, a non-terminating program can be typed False, since we can assign any return type to a non-terminating function.
# let rec infLoop() = infLoop();;
val infLoop : unit -> 'a = <fun>
OCaml types the return as 'a, meaning we can choose any type; it will pass type checking but will run forever.
Adding features
Local Variables
Currently, we introduce variables only through function parameters or match cases.
In OCaml, we can use let
t ::= ... | let x=t in t
Local variables:
Is this syntactic sugar?
Is let x=t1 in t2 equivalent to (function x:T -> t2) t1?
- not quite, since we do not know the type of x until we have typechecked t1
Sequencing (side-effect)
t ::= ... | t;t
we don’t have side effects yet, but sequencing will be useful later.
- We force the first statement to evaluate to (), the True type (our representation of type Unit). So we indicate clearly that we don’t care about its value.
t1;t2 is equivalent to (function x:True -> t2) t1.
Therefore, this is syntactic sugar.
In OCaml, t1;t2 is syntactic sugar for let _ = t1 in t2.
Recursion
t ::= ... | let rec x=v in t
- the difference between let x=t1 in t2 and let rec x=v in t2 is that x is not in scope in t1 in the former, while x is in scope in v for rec.
- by making the right-hand side a value v, we make sure that it is a function rather than some other term, so we don’t get something like let rec x = x + 1 in ...
- but why v and not just (function x:T -> t)? It is more general: we can define a pair of functions, and thereby get mutual recursion.
This rule says we simply replace each occurrence of the variable name in the body with an inline declaration of itself.
Example
let rec fact =
function n -> if n=0 then 1 else n * fact(n-1) in fact 5
(* ---> *)
let fact =
function n -> if n=0 then 1 else
n * (let rec fact = function n -> ... fact)(n-1) in fact 5
(* ---> *)
(* reducing the let in *)
(function n -> ...) 5
(* ---> *)
if 5=0 then 1 else
5 * (let rec fact = function n -> ... in fact)(5-1)
(* --->* *)
5 * (let rec fact = function n -> ... in fact)(5-1)
(* --->* *)
5 * (function n -> if n=0 then 1 else n * (let rec fact = function n -> ... in fact)) (4)
Notice at the end, on the right hand side, the expression is back to calling a fact function with a number. Hey, recursion!
Does this typecheck rule work?
It does not! To typecheck v1, we need to add x into the scope of v1; any instance of x in v1 would otherwise fail the typecheck. Notice how this matches our comparison between let in and let rec, since x is also in the scope of v1.
We guess the type of x and hope it works.
In real programming languages, we declare both the return and argument types:
int fact(int n) {
return n == 0 ? 1 : fact(n-1) * n;
}
OCaml figures this out by using type inference.
The book instead defines a fixed point. Go read the book.
At this point, our language is Turing complete.
Example: infinite loop
let rec infiniteLoop =
(function x:True -> infiniteLoop x) in infiniteLoop ()
How do we typecheck this?
We guess the type of `infiniteLoop` to be `True -> Bool`, and the resulting derivation tree is actually valid; you can derive the whole tree. As we saw before, an infinite loop that never terminates can be given any return type, and `Bool` is one of them. Therefore, we can derive infinitely many types for `infiniteLoop`.
However, the logic system becomes unsound as soon as we have infinite loops, since an infinite loop lets us get from anything to anything.
Type Inference
Consider the simply-typed lambda calculus plus integers:
t ::= n | t + t | x | function x:T -> t | t t
T ::= Int | T -> T
The typechecking judgement is a ternary relation between a type environment Γ, a term t, and a type T.
Typechecking problem: Given Γ and t, does there exist a type T such that Γ ⊢ t : T?
It was easy to typecheck functions since the argument is annotated with a type `T`. However, in OCaml, we don’t need to declare types for functions, since the language infers them for us. How do we do typechecking in those cases?
We define the problem of type inference.
Define an “erased” version of our language:
s ::= n | s + s | x | function x -> s | s s
Define an “erase” function from t
to s
:
erase(n) = n
erase(t1 + t2) = erase(t1) + erase(t2)
erase(x) = x
erase(function x:T1 -> t) = function x -> erase(t)
erase(t1 t2) = erase(t1) erase(t2)
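The erase function can be transcribed directly into OCaml; the constructor names below (`Num`, `Add`, `Var`, `Fun`, `App`) are my own, not from the notes:

```ocaml
type ty = TInt | TArrow of ty * ty

(* annotated terms t *)
type term =
  | Num of int
  | Add of term * term
  | Var of string
  | Fun of string * ty * term   (* function x:T -> t *)
  | App of term * term

(* erased terms s *)
type eterm =
  | ENum of int
  | EAdd of eterm * eterm
  | EVar of string
  | EFun of string * eterm      (* function x -> s *)
  | EApp of eterm * eterm

(* structurally copy the term, dropping the annotation on functions *)
let rec erase (t : term) : eterm =
  match t with
  | Num n -> ENum n
  | Add (t1, t2) -> EAdd (erase t1, erase t2)
  | Var x -> EVar x
  | Fun (x, _ty, body) -> EFun (x, erase body)
  | App (t1, t2) -> EApp (erase t1, erase t2)
```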
Type inference problem: Given Γ and s, do there exist a term t and a type T such that Γ ⊢ t : T and erase(t) = s?
Example: Given the empty environment and s = `function x -> x + 1`, what is t? What is T?
t = `function x:Int -> x + 1`, with T = Int -> Int
Type inference is broken into 2 phases:
- constraint generation
- constraint solving
We introduce a new meta-variable for “types”.
S ::= Int | S -> S | X
X ::= type variable
Constraint Generation
How do we turn typechecking rules into type inference rules?
- whenever a typechecking rule would have to “guess” a type, we instead create a fresh type variable
- our type inference rules will never fail
- `1 + true` fails to typecheck, but there is still a valid type inference derivation
- instead, we produce a set of constraints that must be satisfied
- later, if the constraints cannot be solved, we reject the program
Type inference judgement: Γ ⊢ s : S | C
- in the book, this is written Γ ⊢ t : T |_X C
- C is a set of type equality constraints of the form S1 = S2
Type inference for Int
- the rule grabs a fresh type variable X. Then we impose the constraint X = Int.
Why does the rule return a type variable X instead of Int directly?
- if we eagerly forced terms to concrete types, we wouldn’t be doing type inference
- example: `x+5`. We cannot resolve `x` to a concrete type yet!
Example of a type inference derivation: 1+2
- we implicitly assume the empty type environment
One that has no solution: 1 + true
- the generated constraints force the type of `true` to equal Int, which is unsolvable
Example of type inference on a function:
- in the function rule, we guess the type for `x` by assigning it a fresh type variable
- then we put that binding into the type environment
- notice the benefit of not eagerly requiring `x` to have a concrete type in the check!
Example: function f -> f 0
A substitution σ is a mapping from type variables to types.
- A substitution σ satisfies a constraint set C if σ(S1) is syntactically equal to σ(S2) for every constraint S1 = S2 in C.
What is the substitution solution for `function f -> f 0`?
- notice that we can choose any type for the result type variable; here we just choose Int
Correctness of Constraint Generation
Intuitively we want to prove that the constraints we generate capture only and all of the possible typings for a program.
Soundness of Constraint Generation
- If Γ ⊢ s : S | C and σ satisfies C, then there exists a term t such that erase(t) = s and Γ ⊢ t : σ(S).
Completeness of Constraint Generation
- If Γ ⊢ t : T and s = erase(t), then Γ ⊢ s : S | C and there exists a σ such that σ satisfies C and σ(S) = T.
We will not be proving them.
Constraints Solving
How do we solve constraints?
- We can plug them into Prolog. Consider
Z=arrow(F, Y), F=arrow(X, Y), X=int.
which gives the solution
Z=arrow(arrow(int, Y), Y), X=int, F=arrow(int, Y)
- The OCaml type inference system does something similar.
We will explore constraint solving next time.
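As a preview, constraint solving is usually done by unification. Here is a minimal OCaml sketch of the standard algorithm over the grammar `S ::= Int | S -> S | X`; it is my own sketch, not the course's exact presentation:

```ocaml
type s = SInt | SArrow of s * s | SVar of string

(* apply a substitution (an association list) to a type *)
let rec apply subst = function
  | SInt -> SInt
  | SArrow (a, b) -> SArrow (apply subst a, apply subst b)
  | SVar x -> (try apply subst (List.assoc x subst) with Not_found -> SVar x)

(* does variable x occur in a type? (prevents cyclic solutions) *)
let rec occurs x = function
  | SInt -> false
  | SArrow (a, b) -> occurs x a || occurs x b
  | SVar y -> x = y

(* solve a list of equality constraints, producing a substitution *)
let rec unify subst = function
  | [] -> subst
  | (a, b) :: rest ->
    (match apply subst a, apply subst b with
     | SInt, SInt -> unify subst rest
     | SArrow (a1, a2), SArrow (b1, b2) ->
       unify subst ((a1, b1) :: (a2, b2) :: rest)
     | SVar x, t | t, SVar x ->
       if t = SVar x then unify subst rest
       else if occurs x t then failwith "occurs check"
       else unify ((x, t) :: subst) rest
     | _ -> failwith "cannot unify")
```

Running it on the Prolog example `Z = F -> Y, F = X -> Y, X = Int` yields `Z = (Int -> Y) -> Y`, matching the solution above.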
Polymorphic Lambda Calculus (System F, Second-Order Lambda Calculus)
The grammars
t ::= ... | function x:T -> t | function X -> t | t [T]
v ::= ... | function X -> t
T ::= Unit | Int | Bool | T -> T | X
| ...(all the tuples and stuff)
| forall X.T
New type: forall X.T
Example:
- the identity function should have the type `forall X.X->X`
We also have a function that takes a type! Identity function in polymorphic lambda calculus:
(function X -> function x:X -> x)
It has type forall X. X -> X
. To call it
let id = (function X -> function x:X -> x) in
id [Bool] true
Example: swap
In OCaml, we can write
let swap = function p -> (snd p, fst p) in
swap (34, true)
Write this program in polymorphic lambda calculus:
let swap = function X ->
function Y ->
function p:(X * Y) -> (snd p, fst p) in
swap [Int] [Bool] (34, true)
The type of `swap` is `forall X. forall Y. (X * Y) -> (Y * X)`.
Now we go back to the bad example of `function id -> (id 34, id true)`. Now we can make it valid.
let notSoBad = function id:forall X.X->X ->
(id [Int] 34, id [Bool] true)
This works because the caller doesn’t get to decide the concrete type; the implementation decides it!
Now we can typecheck self-application.
function x:forall X.X->X -> (x [forall X.X->X] x)
This function has type `(forall X.X->X) -> (forall X.X->X)`.
The same reason explains why `id [forall X. X->X] id` typechecks.
The type inference problem for this language is undecidable. That’s why OCaml sticks to let polymorphism, which keeps type inference decidable.
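Let polymorphism in action, as a quick OCaml sketch: a `let`-bound name is generalized and can be used at several types, while a lambda-bound name is not.

```ocaml
(* let-bound names are generalized (let polymorphism),
   so `id` can be used at type int -> int AND bool -> bool: *)
let id = fun x -> x
let pair = (id 34, id true)

(* By contrast, a lambda-bound name is not generalized, so the
   analogous program
     (fun id -> (id 34, id true)) (fun x -> x)
   is rejected by OCaml's typechecker. *)
```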
Operational semantics for polymorphic lambda
Example: (function X -> function x:X -> x) [Int] 34
- steps to
(function x:Int -> x) 34
- steps to 34
Typechecking for polymorphic lambda
We put X into the type environment Γ to keep track of the type variables in scope, which we will use later.
Example: typechecking id
Example: function id:forall X.X->X -> id [Int] 34
Why is this “second-order”?
- First-order logic quantifies over individuals: forall x.P(x)
- Second-order logic quantifies over predicates: forall X.P(X)
We skipped over first-order quantification; second-order quantification is more natural in programming languages.
Example: the identity function is a witness of the theorem forall X. X -> X.
Existential quantification
Existential typing is at the core of information hiding.
Consider a counter class in Java:
class Counter {
private int c = 0;
public void inc() { c++; }
public void reset() { c = 0; }
public int get() { return c; }
}
We hide the implementation details using a private field.
We implement something like this in OCaml as a tuple of functions.
let counter = (
(function () -> 0),
(function x -> x + 1), (* takes a counter and increments it *)
(function x -> 0),
(function x -> x)
)
These are typed as
(Unit -> int)
(int -> int)
('a -> int)
('b -> 'b)
These types do not hide the representation, which is `int`.
To hide the representation, we give this tuple of functions the following types:
exists X. (Unit -> X, X -> X, X -> X, X -> Int)
Now the client does not know what X
is.
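In OCaml, the module system plays the role of the existential: an abstract type in a signature hides the representation, just like the `exists X` above. This `Counter` module is my own sketch, not from the course:

```ocaml
module Counter : sig
  type t                 (* the "exists X": representation hidden *)
  val make : unit -> t
  val inc : t -> t
  val reset : t -> t
  val get : t -> int
end = struct
  type t = int           (* the hidden witness type *)
  let make () = 0
  let inc c = c + 1
  let reset _ = 0
  let get c = c
end
```

Clients can only build and consume `Counter.t` through these four functions; they never learn that `t` is `int`.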
Proof assistant
- A proof assistant is a programming environment where you can also prove properties about your code.
- Coq’s functional programming language is called Gallina.
- We can define functions and define datatypes.
- Both are powerful, much more expressive than what we can do in OCaml.
Defining boolean
In OCaml,
type bool = True | False
In Gallina,
Inductive bool : Type :=
true : bool
| false : bool.
Defining negate
In OCaml,
let negate b = match b with
true -> false
| false -> true
In Gallina,
Definition negate (b:bool) :=
match b with
true => false
| false => true
end.
Running a computation:
Compute (negate (negate true)).
Writing a lemma
Lemma doubleNegation:
forall b, negate(negate b) = b.
Proving the lemma
Lemma doubleNegation:
forall b, negate(negate b) = b.
Proof.
intros. destruct b.
- unfold negate. reflexivity.
- unfold negate. reflexivity.
Qed.
- `intros` introduces the assumptions that we make. The types of the variables will be inferred by the type system.
- `destruct` is the proof tactic for case analysis.
- `unfold` means we unfold the definition of `negate` in the proof.
- `reflexivity`: when we unfold `negate`, we get `true` equals `true`. We explicitly use reflexivity so that `true = true` is accepted as a true theorem.
- We have 2 subcases, `true` and `false`, so we repeat `unfold` twice. In proof mode, we can see what the subcases are. They appear in the order they show up in the type definition.
Defining lists
In OCaml,
type boollist = Nil | Cons of bool * boollist
In Gallina,
Inductive boollist : Type :=
nil : boollist
| cons : bool -> boollist -> boollist.
We write the definition of `cons` as a constructor function with 2 arguments, a boolean and a list of booleans, followed by the result type.
Defining Recursion
Fixpoint append (l1 l2 : boollist) :=
match l1 with
nil => l2
| cons b l1' => cons b (append l1' l2)
end.
Now we need induction since we have recursion.
Proof by Induction
Lemma appendToNil: forall l, (append l nil) = l.
Proof.
intros. induction l as [|b l'].
- simpl. reflexivity.
- simpl. rewrite IHl'. reflexivity.
Qed.
- `induction l` means we proceed by induction on `l`.
- `as [|b l']` names the variables in each subcase, kind of like `match`; subcases are separated by a bar `|`.
- `simpl.` means execute and simplify. The first case is where `l` is `nil`, so `append l nil` reduces to `nil` and reflexivity can be applied.
- The second subcase (`cons b l'`) is where induction happens: `simpl` applies the definition of `append` and simplifies.
- Then we have an induction hypothesis on `l'`, namely `append l' nil = l'`. We apply this IH with `rewrite IHl'`, which finds all occurrences of the LHS in the goal and rewrites them with the RHS.
- Lastly we apply reflexivity.
We can also extract the definition as OCaml code.
Require Coq.extraction.Extraction.
Extraction Language OCaml.
Extraction append.
Parametric polymorphism
Definition id (T: Type) (x:T) := x.
This is the identity function. Its type is
forall T: Type, T -> T
Type functions are also supported: functions that take in types and output types. This gives us polymorphic types.
Module PolyList.
Inductive list (T:Type) : Type :=
nil : list T
| cons : T -> list T -> list T.
End PolyList.
This is a polymorphic list. We can do integer list, boolean list, whatever list.
We have functions that map
- values -> values
- types -> values (polymorphic functions)
- types -> types (polymorphic types)
We are missing values -> types. These are called dependent types.
For our convenience, we define the natural number as a type.
Inductive nat : Type :=
zero : nat
| succ : nat -> nat.
Fixpoint plus (n1 n2 : nat) : nat :=
match n1 with
zero => n2
| succ n1' => succ (plus n1' n2)
end.
Let’s say we want a boolean list that keeps track of its size statically.
Module ListsWithLength.
Inductive boollist : nat -> Type :=
nil : boollist zero
| cons : forall n, bool -> boollist n -> boollist (succ n).
End ListsWithLength.
To use it:
Check (cons zero false nil).
See how we need to pass the length of the tail (`zero`, since the tail is `nil`) to build the one-element list `[false]`.
Note that this is first-order quantification, since `cons` is `forall n : nat, bool -> ...`; it quantifies over a value, not a type.
Defining Pairs
Module Pairs.
Inductive prod (A B : Type): Type :=
pair : A -> B -> prod A B.
End Pairs.
To build a pair: pair nat bool zero true
.
This is built into Gallina. Check with Print prod.
There is also something similar for propositions:
Print and.
`and` takes in `Prop` instead of `Type`. Now we have propositions.
Lemma andIsCommutative : forall P Q, and P Q -> and Q P.
Proof.
intros. inversion H. split.
- assumption.
- assumption.
Qed.
- `intros` gets us `H : and P Q`.
- `inversion H` creates two hypotheses, in inverse order.
- `split` separates the goal into the subcases `Q` and `P`, in this order because we applied inversion.
- Then for each subcase, we check that `Q` / `P` is true. By checking our `assumption`, we know that they are true.
Another way of writing this
Definition andIsCommutative2 (P Q : Prop) (H : and P Q) :=
match H with
conj HP HQ => conj HQ HP
end.
This is building a program that is a proof witness.
Dependent Proposition
We had dependent types. Do we have dependent proposition? Yes, they are inference rules.
Let’s have inference rule for even numbers.
To express this,
Inductive isEven : nat -> Prop :=
IsEvenZero : isEven zero
| IsEvenSuccSucc : forall n, isEven n -> isEven (succ (succ n)).
We prove by constructing a derivation tree:
Definition fourIsEven :=
IsEvenSuccSucc (succ (succ zero)) (IsEvenSuccSucc zero IsEvenZero).
We can also use proof tactics.
Lemma fourIsEven2 : isEven (succ (succ (succ (succ zero)))).
Proof.
apply IsEvenSuccSucc. apply IsEvenSuccSucc. apply IsEvenZero.
Qed.
Proofs of 231 in Coq
We express the language of boolean from Chapter 3
Inductive term : Type :=
t_unit : term (* the Unit type *)
| t_true : term
| t_false : term
| t_if : term -> term -> term -> term.
We can define a simple program `if true then false else true` as such:
Definition simpleProgram := t_if t_true t_false t_true.
Now we define the data types.
Inductive ty : Type :=
Unit : ty
| Bool : ty.
Typechecking
Now we define some typechecking rules. We define them as dependent proposition.
Inductive typeof : term -> ty -> Prop :=
| T_True : typeof t_true Bool
| T_False : typeof t_false Bool
| T_Unit : typeof t_unit Unit
| T_If : forall t1 t2 t3 T, typeof t1 Bool -> typeof t2 T -> typeof t3 T -> typeof (t_if t1 t2 t3) T.
We can define simplifying notation in Coq.
Notation "t : T" := (typeof t T) (at level 40).
- level 40 means the precedence of our new notation.
To derive a derivation tree for simpleProgram
:
Definition typeOfSimpleProgram : (typeof simpleProgram Bool) :=
T_If t_true t_false t_true Bool T_True T_False T_True.
- `t_true t_false t_true Bool` instantiates `forall t1 t2 t3 T`
- then we give it a derivation tree for `t1` with `T_True`
- then we give it a derivation tree for `t2` with `T_False`
- then we give it a derivation tree for `t3` with `T_True`
We can enforce in the grammar that an expression has to be well typed.
Module WellTypedExprs.
Inductive term : ty -> Type :=
t_unit : term Unit
| t_true : term Bool
| t_false : term Bool
| t_if : forall T, term Bool -> term T -> term T -> term T.
End WellTypedExprs.
In the original definition of `term`, we could define a bad program such as `if () then false else true`:
Definition badProgram := t_if t_unit t_false t_true.
But under the new definition it is no longer valid.
Small-step operational semantics
First we define what is a value.
Inductive isValue : term -> Prop :=
UnitVal : isValue t_unit
| TrueVal : isValue t_true
| FalseVal : isValue t_false
Define a step relation as a relation between a term and a term:
Inductive step : term -> term -> Prop :=
E_IfTrue : forall t2 t3, step (t_if t_true t2 t3) t2
| E_IfFalse : forall t2 t3, step (t_if t_false t2 t3) t3
| E_If : forall t1 t2 t3 t1', step t1 t1' -> step (t_if t1 t2 t3) (t_if t1' t2 t3)
Exercise: extending `and` to the language
Given a new term for and
Inductive term : Type :=
(* ... *)
| t_and : term -> term -> term
Give a typing rule under typeof
.
The solution
Inductive typeof : term -> ty -> Prop :=
(* ... *)
T_And : forall t1 t2, typeof t1 Bool -> typeof t2 Bool -> typeof (t_and t1 t2) Bool.
Give a small-step semantics under step
.
The solution
Inductive step : term -> term -> Prop :=
(* ... *)
| E_And1 : forall t1 t2 t1', step t1 t1' -> step (t_and t1 t2) (t_and t1' t2)
(* the rest left as exercise to the reader *)
Proving Progress and Preservation
Notation for stepping
Notation "t1 '-->' t2" := (step t1 t2) (at level 40).
Theorem progress: forall t T, t : T -> isValue t \/ exists t', t --> t'.
Proof.
intros. induction H.
- left. apply TrueVal.
- left. apply FalseVal.
- left. apply UnitVal.
- right. inversion IHtypeof1.
* apply (canonicalFormsBool t1 H2) in H. inversion H.
+ exists t2. rewrite H3. apply E_IfTrue.
+ exists t3. rewrite H3. apply E_IfFalse.
* inversion H2. exists (t_if x t2 t3). apply E_If. assumption.
Qed.
- `t : T` is from our defined notation.
- `intros` gives us: `t` is a term, `T` is a `ty` (aka a type), and `H : (t : T)` is our hypothesis; it is our derivation tree.
- By `induction H`, we break `H` into all cases.
- The 1st case is for `t_true`, so we appeal to `TrueVal` to prove the `left` disjunct, which is `isValue t`.
- The 2nd and 3rd cases are similar.
- Because we did induction rather than plain case analysis, we get induction hypotheses as extra premises.
- In the 4th case, the `if` should step, so `right`.
- `IHtypeof1` is the induction hypothesis saying `t1` can type.
- If `t1` is a value of type Bool, then it has to be `t_true` or `t_false`, but we need the canonical forms lemma for booleans. We pause the proof and go write a lemma before this.
Lemma canonicalFormsBool: forall t, isValue t -> t: Bool -> t = t_true \/ t = t_false.
Proof.
intros. inversion H.
- rewrite <- H1 in H0. inversion H0.
- left. reflexivity.
- right. reflexivity.
Qed.
- The first case is `t_unit`: we have `H1: t_unit = t` and `H0: t: Bool`. We rewrite `H1` in `H0`, so it becomes `H0: t_unit : Bool`, which is a contradiction. We call `inversion H0`, and the case is discharged.
Proof for preservation
Theorem preservation: forall t t' T, t : T -> t --> t' -> t' : T.
Proof.
intros. generalize dependent t'. induction H.
- (* T-True *) intros. inversion H0.
- (* T-False *) intros. inversion H0.
- (* T-Unit *) intros. inversion H0.
- (* T-If *) intros. inversion H2.
* (* E-IfTrue *) rewrite <- H3. assumption.
* (* E-IfFalse *) rewrite <- H3. assumption.
* (* E-If *) apply T_If.
+ (* t1' is Bool *) apply IHtypeof1. assumption.
+ assumption.
+ assumption.
Qed.
- We leave `t'` universally quantified using `generalize dependent t'`.
- The first 3 cases are contradictions (a value does not step); inversion on the step hypothesis discharges them.
Note about `rewrite`:
- In `rewrite H`, `H` has to be an equality (`A=B`), and it rewrites `A` in the goal with `B`.
- In `rewrite <- H`, `H` has to be an equality (`A=B`), and it rewrites `B` in the goal with `A`.
- `rewrite <- H1 in H2` is like `rewrite <- H1`, but instead of rewriting in the goal it rewrites in the given hypothesis `H2`.
Mutable References
- book chapter 13
In OCaml,
- local variables are always immutable
- all mutable memory is on the heap
In C, both `int` and `int*` variables are mutable. In OCaml, `int*` is represented by the type `int ref`; an `int ref` is mutable but an `int` is not. This means mutability is explicitly encoded in the type.
(* create a reference *)
ref 0;;
ref (2, "hello");;
ref (function x -> x + 1);;
ref (ref 0);;
- this creates a reference on the heap with some initial value
(* dereference *)
let r = ref 0;;
!r;;
`!r` is the equivalent of `*r` in C.
(* updating value *)
r := 53;;
(* aliasing *)
let x = ref 0;;
let y = x;;
Once we have references, we don’t need built-in recursion anymore.
let factorial = ref (function x -> x + 1);;
factorial :=
function x -> if x = 0 then 1 else x * (!factorial)(x-1);;
let fact = !factorial;;
fact(5);;
References are really powerful. We can define a cyclic list as well.
type clist = Nil | Cons of int * clist ref;;
let lst = ref Nil;;
lst := Cons(4, lst);;
Objects via references
(* an integer counter object *)
let (get, inc, reset) =
let c = ref 0 in (
(function() -> !c),
(function() -> c := !c + 1),
(function() -> c := 0)
);;
(* make a constructor *)
let makeCounter () =
let c = ref 0 in (
(function() -> !c),
(function() -> c := !c + 1),
(function() -> c := 0)
);;
let counter1 = makeCounter();;
let (get1, inc1, reset1) = counter1;;
There is no null reference, since every reference has to contain some value. The way to hack up null is a union type.
type optionalInt = Null | I of int;;
This is available in the standard library; it is called `option`.
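A small sketch of using the standard library's `option` where other languages would use null (`head_opt` and `describe` are my own illustrative names):

```ocaml
(* None plays the role of null; Some v is a present value *)
let head_opt lst =
  match lst with
  | [] -> None
  | x :: _ -> Some x

(* consumers are forced by the type to handle the None case *)
let describe r =
  match r with
  | None -> "empty"
  | Some n -> string_of_int n
```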
Typing rules
t := ... | ref t | !t | t := t
T := ... | Ref T
Small-step semantics
What are the new value forms?
Can we use `ref v` as a value form? No, because of aliasing:
let r1 = ref 0 in
let r2 = r1 in
(r1 := 4); !r2
If `r2` copied `r1` and got the value `ref 0`, we couldn’t propagate the update `r1 := 4` to `r2` under our semantics.
Instead, we introduce a new meta-variable
l ::= infinite set of memory locations
and `ref` expressions evaluate to locations `l`.
A heap/store μ is a mapping from memory locations to values.
The new judgement has the form t | μ --> t' | μ'.
Type Soundness with reference
Original type soundness: If ⊢ t : T and t -->* t', then either t' is a value or there exists t'' such that t' --> t''.
New type soundness: If ⊢ t : T and t | μ -->* t' | μ', then either t' is a value or there exist t'' and μ'' such that t' | μ' --> t'' | μ''.
New Progress:
If ⊢ t : T, then either t is a value or there exist t' and μ' such that t | μ --> t' | μ'.
- something is wrong about this: where is μ coming from?
- how do we guarantee that we don’t do an invalid dereference?
New Preservation:
If ⊢ t : T and t | μ --> t' | μ', then ⊢ t' : T.
Typing rule of location
Problem: In our small-step semantics, we introduce intermediate terms containing memory locations `l`. To prove progress/preservation, we need typing rules for `l`.
Example: preservation does not currently hold.
- ref 0 | ∅ --> l | (l ↦ 0)
- but l is not well-typed.
We don’t know the type of the thing stored at a memory location. We introduce a new mapping that keeps track of the types of the values stored in memory.
Introduce a store typing Σ mapping locations to types. The typing judgement now has the form Γ; Σ ⊢ t : T. The typing rule for locations: if Σ(l) = T, then Γ; Σ ⊢ l : Ref T.
Let’s update Progress/Preservation to account for Σ:
- Progress: If ∅; Σ ⊢ t : T, then either t is a value or there exist t' and μ' such that t | μ --> t' | μ'.
- Preservation: If ∅; Σ ⊢ t : T and t | μ --> t' | μ', then ∅; Σ ⊢ t' : T.
But this is still incorrect; consider the following example.
!l
- With Σ = (l : Ref Int), we have ∅; Σ ⊢ !l : Int, but !l | ∅ doesn’t step since l is not in the heap. Contradiction to Progress.
- With μ = (l ↦ true), we have !l | μ --> true | μ, but true does not have type Int. Contradiction to Preservation.
Problem: there is no connection between Σ and μ.
We introduce a new relationship.
Definition (Heap typing): We say μ and Σ are consistent (denoted Σ ⊢ μ) if dom(Σ) = dom(μ) and ∅; Σ ⊢ μ(l) : Σ(l) for every location l.
Now update Progress/Preservation:
- Progress: If ∅; Σ ⊢ t : T and Σ ⊢ μ, then either t is a value or there exist t' and μ' such that t | μ --> t' | μ'.
- Preservation: If ∅; Σ ⊢ t : T and Σ ⊢ μ and t | μ --> t' | μ', then there exists Σ' ⊇ Σ such that ∅; Σ' ⊢ t' : T and Σ' ⊢ μ'.
Subtyping
Introduce a subtyping relation, denoted S <: T:
- “S is more specific than T”
- “values of type S can be passed where values of type T are expected”
- this is the principle of subtype substitutability
We would like to do this in our language
let f = function x -> if (fst x) then (snd x) + 1 else (snd x) - 1
in
f (true, 1, "some other info")
- OCaml doesn’t let us do this
- but it should be allowed: a triple with extra info supports `fst` and `snd` just as well, so it should be a subtype of the pair
- conceptually a tuple is a struct/record, and to extend a struct/record we add extra fields
Formalize the principle of subtype substitutability through a typing rule called “subsumption”:
- if t has type S and S is a subtype of T, then we can also treat t as having type T.
Next we define the subtyping relation S <: T.
Consider the simply-typed lambda calculus plus base types (ints and bools) plus (arbitrary-length) tuples.
T ::= Int | Bool | T -> T | ... | T * T * ... * T | Top
- we introduce a `Top` type, with T <: Top for every T
- this is the equivalent of the root type `Object` in Java
- depth subtyping, e.g. `Int * Bool` <: `Top * Bool`
- width subtyping, e.g. `Bool * Int * Bool` <: `Bool * Int`
Combining depth and width subtyping, a longer tuple whose components are subtypes is a subtype of a shorter tuple.
Now, how do we define subtyping for functions?
Consider a formal parameter f
of type T1 -> Int * Int * Int
. Can we pass in for f
a value of type:
T1 -> Int * Int
? (No)T1 -> Int * Int * Int * Int
? (Yes)
This means `T1 -> Int * Int * Int * Int` <: `T1 -> Int * Int * Int`: function subtyping is covariant in the return type.
Consider a formal parameter f
of type Int * Int * Int -> T2
. Can we pass in for f
a value of type:
Int * Int -> T2
? (Yes)Int * Int * Int * Int -> T2
? (No)
This means `Int * Int -> T2` <: `Int * Int * Int -> T2`: function subtyping is contravariant in the argument type.
Example in Java:
class C {
Object m() {
return new Integer(4);
}
}
class D extends C {
String m() {
return "hello";
}
}
- see how the overriding method `m` is valid since `String` <: `Object` (a covariant return type)
Suppose we add mutable references `Ref T` to our language. Can we use our tuple subtyping through a `Ref`?
- `Ref (Int * Int)` <: `Ref (Int * Int * Int)`? NOT OK.
- `Ref (Int * Int * Int)` <: `Ref (Int * Int)`? Also no! What if we write
let (r: Ref Int*Int*Int) = ref (0, 1, 2) in let (s: Ref Int*Int) = r in (* ... *)
This is dangerous since `s` is an alias of `r`. Then we could do `s := (0, 1)` followed by `(fst !r) + (trd !r)` (where `trd` takes the third element), which would go wrong at runtime.
We cannot allow subtyping for references!
Suppose that S <: T.
- Can we consider Ref T <: Ref S?
- e.g. can Ref Top <: Ref Int? No:
let (r: Ref Top) = ref true in let (s: Ref Int) = r in !s + 5
- Can we consider Ref S <: Ref T?
- e.g. can Ref Int <: Ref Top? No:
let (r: Ref Int) = ref 0 in let (s: Ref Top) = r in s := true; !s + 1
In Java, the problem comes up in the following form
class C {
Object f;
void m() {
f = new Integer(34);
}
}
class D extends C {
override String f; // not legal Java
int n() {
return f.length;
}
}
Since someone can assign an `Integer` to `f` through the `C` interface, the `override String f` has to be illegal; otherwise it would be unsafe.
Now consider `List`. `List<String>` is not a subtype of `List<Object>` because `List`s are mutable.
class Test {
void m(List<Object> l) {
// we can append an integer to a list of string
l.add(new Integer(34));
}
}
class Main {
public static void main(String[] args) {
Test t = new Test();
List<String> strs = new LinkedList<String>();
strs.add("hi");
strs.add("there");
t.m(strs); // this will upset the compiler
}
}
We could append an integer to a list of strings, so Java doesn’t allow the call.
But what if we want
void printAll(List<Object> l) {
for (Object o:l) {
System.out.println(o);
}
}
In this case, it is completely safe, but the type system is stopping us. We need parametric polymorphism:
<T> void printAll(List<T> l) {
for (T o:l) {
System.out.println(o);
}
}
Consider Array
. The same problem happens, however…
class Test {
void m2(Object[] l) {
l[0] = 1;
}
}
class Main {
public static void main(String[] args) {
Test t = new Test();
t.m2(args);
}
}
This typechecks! But it can crash at runtime. This is historical baggage, a kind of tech debt: in the early days of Java there were no generics, so to keep the language pragmatic they allowed this at compile time, trading away runtime performance and runtime safety (every array store needs a runtime check).
Consider this example. We have some Shape
that we can draw.
interface Shape {
void draw();
}
class Rectangle implements Shape {
void draw() { /* ... */ }
}
class Test {
void drawAll(List<Shape> l) {
for (Shape s : l) {
s.draw();
}
}
}
class Main {
public static void main(String []args) {
Test t = new Test();
List<Rectangle> rects = new LinkedList<Rectangle>();
t.drawAll(rects); // compiler complains
}
}
This doesn’t compile for the same reason. Can we use parametric polymorphism?
<T> void drawAll(List<T> l) {
for (T s : l) {
s.draw();
}
}
This also doesn’t compile, since `T` doesn’t have a `draw` method. However, we can use bounded parametric polymorphism.
<T extends Shape> void drawAll(List<T> l) {
for (T s : l) {
s.draw();
}
}
Now this type checks.
Type and Effect Systems
Side effects are not visible in the type of the functions.
let r = ref 42;;
let inc = function x -> x + 1;;
let inc2 = function x -> r := 1000; x + 1;;
Both `inc` and `inc2` have type `int -> int`. However, `inc2` has side effects, and we can’t tell from its type.
What does a function type not tell us?
- does the function read/write mutable memory?
- does the function read/write to a file?
- does the function send/receive packets?
- does the function acquire/release lock?
There are a lot of things that the types don’t tell us about.
A type and effect system then is a system that also tracks side effects.
Simple idea: augment the type system to track a set of “effects”
ϕ ::= {} | { read } | { write } | { alloc } | ϕ U ϕ
New typing judgment: Γ ⊢ t : T; ϕ
“t has type T and t’s evaluation may cause the effects ϕ”
Example: Γ ⊢ t : T; {} implies the term is pure.
- this means we can cache calls to it
Example: Γ ⊢ t : T; {read} implies the term does not change the heap.
- this means there is no write side effect, but calls cannot be cached since the result may depend on the heap
Now consider the rules for references.
Now, we can tell the difference between `0` and `!(ref 0)`: the former is pure, while the latter has effects.
Let’s augment the rest of the typing rules.
From the case of `if`, we see that our system is an over-approximation and conservative: since we don’t know which branch will be taken, we take the union of the effects of both branches.
Final
- no polymorphic lambda calculus, no existential types
- but include let polymorphism
Continuing on the effect system
Consider `let x = ...`
Consider variable lookup
Consider declaring a function
- declaring a function doesn’t itself have any effect
- but then we are dropping the effects of the function body on the floor…
Consider calling a function
- but now, the side effects of the function body are not accounted for!
- our type system is unsound
Solution to the unsoundness: augment the function type to also keep track of the effects of the body of the function.
The effect is now part of the type language.
Java has an effect system: checked exceptions.
class Exn1 extends Exception {}
class Exn2 extends Exception {}
class Test {
void m1(int x) throws Exn1 {
if (x < 0) throw new Exn1();
// ...
}
void m2(int y) throws Exn1, Exn2 {
m1(y);
if (y > 100) throw new Exn2();
}
void m3(int z) throws Exn2 {
try {
m2(z);
} catch (Exn1 e) {
// ...
}
}
}
- `throws` documents the exception side effect.
- we “consume” the effect by using `try-catch`.
Monads
There are 2 distinct aspects to monads:
- the ability to track effects of various kinds in the type system
- a cool programming pattern for separating out the code that manipulates effects from the “normal” code that does the regular computation
let h x y = x/y
let g x = x + 1
let f x = x * 2
let myprogram x y =
let v = h x y in
let u = g v in
f u
`h` can throw a division-by-zero exception.
We can wrap the result in an option type:
let h' x y = if y = 0 then None else Some (x / y)
However, we cannot just replace `h` in `myprogram` with `h'`, since it now returns an option. The option then bubbles up through the whole program, and there is so much pattern matching to do!
Problem: so much boilerplate code that bubbles up everywhere.
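To see the boilerplate concretely, here is a sketch of `myprogram` with the option threaded through by hand (the inner match is deliberately redundant, to show the shape every step would take):

```ocaml
let h' x y = if y = 0 then None else Some (x / y)
let g x = x + 1
let f x = x * 2

(* every step has to pattern match on the option and re-wrap *)
let myprogram_manual x y =
  match h' x y with
  | None -> None
  | Some v ->
      match Some (g v) with
      | None -> None
      | Some u -> Some (f u)
```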
A monad is an API that allows all the boilerplate handling of the effect to be modularized. Two functions:
- `return`: a function that lifts a value into the monad (`'a -> 'a option`)
- `bind`: threads an effect through a computation (`'a option -> ('a -> 'b option) -> 'b option`)
- intuitively: give me a result with an effect and a function that takes a value and produces a result with an effect, and I give you back the combined effectful result.
let return x = Some x;;
(* defining an infix operator for bind *)
let (>>=) x f =
match x with
None -> None
| Some x' -> (f x')
Now re-consider myprogram
let myprogram' x y =
(h' x y) >>= (function v -> return (g v)) >>= (function u -> return (f u))
Consider another example. This time, we want to log all function execution. To capture the side effect, we have a type that capture the side effect.
type 'a logged = 'a * string
Then we define return
and bind
let return x = (x, "")
let (>>=) x f =
let (v, log) = x in
let (v', log') = f v in
(v', log ^ " " ^ log')
Now modify my program:
let h x y = (x/y, "h")
let g x = (x + 1, "g")
let f x = (x * 2, "f")
let myprogram' x y =
(h x y) >>= g >>= f
Each function only cares about its own log; the concatenation is abstracted away from them.
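For reference, the logging monad collected into one runnable block (the pipeline must end with `f`, since each function feeds the next):

```ocaml
type 'a logged = 'a * string

(* lift a pure value into the monad with an empty log *)
let return (x : 'a) : 'a logged = (x, "")

(* run f on the value, concatenating the logs *)
let ( >>= ) (x : 'a logged) (f : 'a -> 'b logged) : 'b logged =
  let (v, log) = x in
  let (v', log') = f v in
  (v', log ^ " " ^ log')

let h x y = (x / y, "h")
let g x = (x + 1, "g")
let f x = (x * 2, "f")

let myprogram' x y = h x y >>= g >>= f
```

For example, `myprogram' 4 2` produces the value `6` along with the accumulated log `"h g f"`.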