Last updated on
Week 4 Debrief: huffman
lab, dealing with stack overflows, and extra proofs
Congrats on completing your fourth week of CS-214! Here is a round-up of interesting questions, tips for exercises and labs, and general notes about the course.
Administrivia
-
The
huffman
lab is due on Friday as usual: make sure to submit early to catch possible stack overflows in the last question! -
Please fill EPFL’s “retour indicatif” before Sunday, 13 October at midnight!
-
Help us beat the flu 🤧: to protect your classmates and the course staff, please consider following the lecture livestream from home if you’re feeling unwell, and/or wearing a mask during lectures and help sessions!
Interesting questions on Ed
- Command-line quoting, and how it affects
testOnly
- Are types erased during compilation of Scala code?
(Note: We have started closing questions that do not follow the debugging and Ed guides. If this happens to you, don’t be discouraged! Bug reporting is a critical skill, and we’re training you to be better at it. Just improve your question, and reopen it.)
Monitoring lecture mistakes
A typo made its way into the sort
monitoring example that we discussed last Wednesday. The exercise was about incomplete postconditions, and the one for sort
in the slides was:
} ensuring (res =>
(0 to l.length - 2).forall(idx => res(idx) <= res(idx + 1)))
sort.worksheet.sc
This postcondition is indeed incomplete, but worse: it can even crash. Can you see when?
Solution
If the imput has ≥ 2 elements and the result does not have the same length as the input, then res(idx)
will crash, because the forall
iterates between 0
and l.length
(not res.length
).
A crash in a postcondition is undesirable. The slides have been updated to refer to res.length
instead, though an alternative would be to check the length as well in the postcondition: a better monitor could be to first require res.length == l.length
, as follows:
} ensuring (res =>
l.length == res.length &&
(0 to l.length - 2).forall(idx => res(idx) <= res(idx + 1)))
sort.worksheet.sc
There’s still an issue with this… can you see it?
Solution
This postcondition ensures that the output has the right length and is sorted, but it does not guarantee that the values are the same: an incorrect implementation returning List(1, 2, 3)
for input List(6, 5, 4)
would pass the postcondition.
Scala List API
We’ve already seen (or proved!) many List
functions. For a more comprehensive look at the API, check out the Scala 3 List
API Documentation.
Beating stack overflows
The last question in the huffman
lab is designed to trigger a stack overflow if your implementation is insufficiently optimized.
A stack overflow means that your code recursed too deeply. How deep? The exact threshold depends on lots of factors, such as your platform, OS, Java version, or configuration. In fact, because the threshold isn’t the same on all machines, it may even happen that your code works on your machine, but overflows the CI’s stack.
If you run into overflows on CI but not on your machine, your first step, as per the debugging guide, should be to attempt to reproduce the problem locally.
The best way to do this is to explicitly reduce the stack size. Conveniently, Java has a flag that lets you set the stack size: -Xss
. You can set it through SBT’s -J
flag: for example, to run SBT with a 512kB stack, use sbt -J-Xss512k
(the number following -Xss
can be either a number of bytes, or a number of kilobytes ending with k
, or megabytes ending with m
, etc.).
Once you can reproduce the overflow, you’ll also have easy access to the stack trace: typically, this you will find that you either have an infinite loop (then the fix is to make it stop!), or a non-tail recursive function going too deep (then the fix is to make it tail-recursive).
An extra proof exercise: verifying constfold
🔥
With our experience implementing calculator
and doing proofs from the equational reasoning exercise set,
let’s try to prove our implementation of an expression transformation, constfold
.
For the purpose of the proof, let us use the following definition of constfold
, which delegates the recursion to a helper function, mapExpr
.
def mapExpr(e: Expr, f: Expr => Expr): Expr =
val mapped = e match
case Num(_) => e
case Var(_) => e
case Add(e1, e2) => Add(mapExpr(e1, f), mapExpr(e2, f))
case Sub(e1, e2) => Sub(mapExpr(e1, f), mapExpr(e2, f))
case Mul(e1, e2) => Mul(mapExpr(e1, f), mapExpr(e2, f))
f(mapped)
constfold.worksheet.sc
def constfold1(expr: Expr) = expr match
case Add(Num(n1), Num(n2)) => Num(n1 + n2)
case Sub(Num(n1), Num(n2)) => Num(n1 - n2)
case Mul(Num(n1), Num(n2)) => Num(n1 * n2)
case e => e
constfold.worksheet.sc
def constfold(e: Expr): Expr =
mapExpr(e, constfold1)
constfold.worksheet.sc
We can use the same induction techniques on trees that we have learned to reason about the correctness of constfold
.
This means that given a function f
such that ∀ e ctx. evaluate(ctx, f(e)) === evaluate(ctx, e)
, we want to:
-
derive rewriting rules for
mapExpr
andconstfold1
; -
prove the soundness of
constfold1
. This means that given an arbitrary expressione
, we want to showevaluate(ctx, constfold1(e)) === evaluate(ctx, e)
; -
prove the soundness of
mapExpr
assuming thatf
is sound; -
combine the previous two results to prove the soundness of
constfold
.
Solution
-
By the definition of
mapExpr
andconstfold1
, the following lemmas hold for allvalue
,name
,e1
,e2
, andf
:// (MapExprNum) mapExpr(Num(value), f) === f(Num(value)) // (MapExprVar) mapExpr(Var(name), f) === f(Var(name)) // (MapExprAdd) mapExpr(Add(e1, e2), f) === f(Add(mapExpr(e1, f), mapExpr(e2, f))) // (MapExprSub) mapExpr(Sub(e1, e2), f) === f(Sub(mapExpr(e1, f), mapExpr(e2, f))) // (MapExprMul) mapExpr(Mul(e1, e2), f) === f(Mul(mapExpr(e1, f), mapExpr(e2, f)))
// (Constfold1Add) constfold1(Add(Num(n1), Num(n2))) === Num(n1 + n2) // (Constfold1Sub) constfold1(Sub(Num(n1), Num(n2))) === Num(n1 - n2) // (Constfold1Mul) constfold1(Mul(Num(n1), Num(n2))) === Num(n1 * n2) // (Constfold1Other) // if e is not Add(Num(n1), Num(n2)), // Sub(Num(n1), Num(n2)) or Mul(Num(n1), Num(n2)) constfold1(e) === e
-
We prove the required property by case analysis on
e
(just like induction, but we don’t need inductive hypotheses):-
Base case 1:
e = Num(value)
.evaluate(ctx, constfold1(Num(value))) // to show `=== evaluate(ctx, Num(value))` === evaluate(ctx, Num(value)) // by Constfold1Other
-
Base case 2:
e = Var(name)
.evaluate(ctx, constfold1(Var(name))) // to show `=== evaluate(ctx, Var(name))` === evaluate(ctx, Var(name)) // by Constfold1Other
-
Inductive case 1:
e = Add(e1, e2)
. We need to do further case analysis one1
ande2
:- Case 1:
e1 = Num(n1)
ande2 = Num(n2)
.evaluate(ctx, constfold1(Add(Num(n1), Num(n2)))) // to show `=== evaluate(ctx, Add(Num(n1), Num(n2)))` === evaluate(ctx, Num(n1 + n2)) // by Constfold1Add === n1 + n2 // by EvalNum === evaluate(ctx, Add(Num(n1), Num(n2))) // by EvalAdd
- Case 2:
e1
is notNum(n1)
ore2
is notNum(n2)
.evaluate(ctx, constfold1(Add(e1, e2))) // to show `=== evaluate(ctx, Add(e1, e2))` === evaluate(ctx, Add(e1, e2)) // by Constfold1Other
- Case 1:
-
Other inductive cases
e = Sub(e1, e2)
ande = Mul(e1, e2)
are analogous to the last case wheree = Add(e1, e2)
.
Hence, we can conclude that
evaluate(ctx, constfold1(e)) === evaluate(ctx, e)
for all possible values ofe1
ande2
. -
-
We first assume the soundness of
f
:AssumSoundF
:
∀ ctx e. evaluate(ctx, f(e)) === evaluate(ctx, e)
Given an arbitrary
ctx
ande
, we want to showevaluate(ctx, mapExpr(e, f)) === evaluate(ctx, e)
. We prove the required property by induction one
:- Base case 1:
e = Num(value)
.evaluate(ctx, mapExpr(Num(value), f)) // to show `=== evaluate(ctx, Num(value))` === evaluate(ctx, f(Num(value))) // by MapExprNum === evaluate(ctx, Num(value)) // by AssumSoundF
- Base case 2:
e = Var(name)
.evaluate(ctx, mapExpr(Var(name), f)) // to show `=== evaluate(ctx, Var(name))` === evaluate(ctx, f(Var(name))) // by MapExprVar === evaluate(ctx, Var(name)) // by AssumSoundF
- Inductive case 1:
e = Add(e1, e2)
. The induction hypothesis states:// (IH1) evaluate(ctx, mapExpr(e1, f)) === evaluate(ctx, e1) // (IH2) evaluate(ctx, mapExpr(e2, f)) === evaluate(ctx, e2)
We prove the inductive case:
evaluate(ctx, mapExpr(Add(e1, e2), f)) // to show `=== evaluate(ctx, Add(e1, e2))` === evaluate(ctx, f(Add(mapExpr(e1, f), mapExpr(e2, f)))) // by MapExprAdd === evaluate(ctx, Add(mapExpr(e1, f), mapExpr(e2, f))) // by AssumSoundF === evaluate(ctx, mapExpr(e1, f)) + evaluate(ctx, mapExpr(e2, f)) // by EvalAdd === evaluate(ctx, e1) + evaluate(ctx, e2) // by IH1 and IH2 === evaluate(ctx, Add(e1, e2)) // by EvalAdd
- Other inductive cases
e = Sub(e1, e2)
ande = Mul(e1, e2)
are analogous to the last case wheree = Add(e1, e2)
.
QED.
-
By combining the soundness proof of
mapExpr
andconstfold1
, we immediately get soundness ofconstfold
:∀ ctx e. evaluate(ctx, constfold(e)) === evaluate(ctx, e)
What is particularly satisfying here is that the proof for
mapExpr
andconstfold1
are independent, which means that if we can prove the soundness of a functionf
, we get the soundness ofmapExpr(e, f)
easily.
As an additional exercise, you can try to prove the soundness of algebraic1
and andThen
in the same way.
Then, you will get the soundness of simplify
for free!