Last updated on
Week 9: Webapps and more Version Control
Welcome to week 9 of CS-214 — Software Construction!
As usual, ⭐️ indicates the most important exercises and questions and 🔥 indicates the most challenging ones. Exercises or questions marked 🧪 are intended to build up to concepts used in this week’s lab.
You do not need to complete all exercises to succeed in this class, and you do not need to do all exercises in the order they are written.
Webapps
This exercise set is intended to reinforce some concepts used in the webapp lab: serialization, and state machines.
We strongly encourage you to solve the exercises on paper first, in groups. After completing a first draft on paper, you may want to check your solutions on your computer. To do so, you can pull from the course exercises repository.
Serialization / deserialization 🧪
The traditional way to handle exceptional results or errors in functional programming is the Option type, which we studied in the calculator lab. At times, this style can be a bit heavy. We’ll see why it may still be worth using in the monads week, but for this week’s exercises (and lab!) let’s use exceptions instead.
We’ll illustrate the issue with a serialization problem.
Serialization is the process of transforming data within a program into a form that can be transmitted over a network, stored in files, etc. Deserialization is the reverse operation. Once your program has state, it’s pretty natural to want to save it, and for that you need serialization.
A common serialization format is JSON. In Scala, a JSON value can be represented like this:
package ujson
enum Value:
case Null
case True
case False
case Str(s: String)
case Num(d: Double)
case Arr(items: ArrayBuffer[Value])
case Obj(items: LinkedHashMap[String, Value])
A serialization function for type T is a function of type T => ujson.Value. A deserialization function is a function of type ujson.Value => Try[T]. The type Try[T] models either a Success(t: T) or a Failure, and comes with a very convenient constructor Try(…) which runs the computation …, and returns a Failure if it raises an exception, and a Success otherwise. A wire is a pair of a serialization and a deserialization function; it can be represented thus:
trait Wire[T]:
def serialize(t: T): ujson.Value
def deserialize(js: ujson.Value): Try[T]
webapps/src/main/scala/serialization/Wire.scala
As an example, here are a serializer and a deserializer for a pair of an integer and a string:
object IntStringWire extends Wire[(Int, String)]:
def serialize(t: (Int, String)): ujson.Value =
ujson.Arr(
ujson.Num(t._1),
ujson.Str(t._2)
)
def deserialize(js: ujson.Value): Try[(Int, String)] =
Try {
val arr = js.arr // .arr throws an exception if the input isn't an array
val (fst, snd) = (arr(0).num, arr(1).str)
if !fst.isValidInt then
throw IllegalArgumentException(f"Not an int: $fst")
(fst.toInt, snd)
}
webapps/src/main/scala/serialization/Wire.scala
And if we are given a serializer and deserializer for some type T, we can use it to create a serializer and deserializer for Option[T]:
case class OptionWire[T](w: Wire[T]) extends Wire[Option[T]]:
def serialize(t: Option[T]): ujson.Value =
t match
case None =>
ujson.Arr(ujson.Str("none"))
case Some(t) =>
ujson.Arr(ujson.Str("some"), w.serialize(t))
def deserialize(js: ujson.Value): Try[Option[T]] =
Try {
val arr = js.arr
val tag = arr(0).str
if arr.size == 1 && arr(0).str == "none" then
None
else if arr.size == 2 && arr(0).str == "some" then
Some(w.deserialize(arr(1)).get)
else
throw IllegalArgumentException(f"Unexpected: ${arr.toList}")
}
webapps/src/main/scala/serialization/Wire.scala
-
Read the documentation of the Try type.
-
Skim through the examples provided in the documentation of the
ujsonlibrary (you can skip the last section onConverting To/From Scala Data Types). -
Consider the following type:
enum Formula: case Lit(b: Boolean) case Var(name: String) case FnCall(fn: String, args: List[Formula])webapps/src/main/scala/serialization/Wire.scala
How would you serialize a value of this type? Think about representation choices on paper first. Try a few different examples by hand. Then, write the serializer and deserializer.
object FormulaWire extends Wire[Formula]: import Formula.* def serialize(e: Formula): ujson.Value = ??? def deserialize(js: ujson.Value): Try[Formula] = ???webapps/src/main/scala/serialization/Wire.scala
-
All encoder/decoder pairs should verify
deserialize(serialize(v)) === Success(v)for allv. Use this equation to test your serializer and deserializer together.(This is necessary but not sufficient for correctness; do you see why?)
-
A previous version of this exercise had the following assertion in a test, to check for “correct handling of invalid inputs”, but students complained:
assert(FormulaWire.deserialize("notaformula").isFailure)Can you see what was wrong with this test?
Isolating state: state machines 🧪
Most stateful applications are thoroughly stateful: there’s state everywhere, and anything can change under your feet at any time. A better model, when you must have state, is to contain it in one place. This is the state machine model, where you have a single piece of state, and a function to update it.
You have already seen many examples of this pattern, without knowing it:
-
Hardware circuits, e.g. the CPUs in your computer and mobile phone, are designed as combinational circuits (i.e., purely functional) circuits wiring together state elements (latches, flip-flops/registers).
-
All uses that we’ve made of
foldLeftandfoldRighttake a starting statezand update it once per item in their input list.
Another way to put this is that we have a state and instructions that are interpreted to update the state. In that sense, foldLeft and foldRight are interpreters for a language whose instructions are the elements of the list being processed.
This week’s lab will demonstrate how powerful this model is for building reliable single- and multi-user applications for desktop, mobile, and web environments. For now, let’s get used to thinking in terms of state machines. We’ll use the following model:
trait StateMachine[State, Event, Result]:
def init: State
def update(s: State, e: Event): State // Sometimes called *transition*
def finish(s: State): Result
webapps/src/main/scala/rec/StateMachines.scala
def processEvents[S, E, R](sm: StateMachine[S, E, R])(events: List[E]): R =
sm.finish(events.foldLeft(sm.init)(sm.update))
webapps/src/main/scala/rec/StateMachines.scala
Why do we bother with all this if it all boils down to a foldLeft? Because in general we are interested in modeling streaming processes, where events are user inputs, messages on a network, interrupts, etc. In that case, instead of a foldLeft, we would store the state in a var and update it every time we receive an event from the user, the network, a sensor, etc. (and there would be no finish stage)
Here is a simple example: counting how many times the word "CS" is followed by "214" in a list of words. The state has two parts:
case class State(lastWord: Option[String], occurencesCount: Int)
webapps/src/main/scala/rec/StateMachines.scala
lastWord remembers which word we saw last, and occurences stores the ongoing count.
The events and results are simpler: events are simply words, and the final result is a number:
type Event = String
webapps/src/main/scala/rec/StateMachines.scala
type Result = Int
webapps/src/main/scala/rec/StateMachines.scala
We can now define the state machine itself:
object PatternMatchingSM extends StateMachine[State, Event, Result]:
webapps/src/main/scala/rec/StateMachines.scala
… it starts with no previous word and a count of 0:
def init = State(None, 0)
webapps/src/main/scala/rec/StateMachines.scala
… every time we receive a new word, it updates its state to increment the count if needed, and replaces the last word:
def update(s: State, e: Event) =
State(
Some(e),
s.occurencesCount
+ (if s.lastWord == Some("CS") && e == "214" then 1 else 0)
)
webapps/src/main/scala/rec/StateMachines.scala
… and finally, when the stream is exhausted, it just returns the final count:
def finish(s: State) = s.occurencesCount
webapps/src/main/scala/rec/StateMachines.scala
Now, how do we use this state machine? With processEvents!
val countMatchesSM: List[String] => Int =
processEvents(PatternMatching.PatternMatchingSM)
webapps/src/main/scala/rec/StateMachines.scala
Notice the currying: we could have also written things this way:
def countMatchesSM2(l: List[String]): Int =
processEvents(PatternMatching.PatternMatchingSM)(l)
webapps/src/main/scala/rec/StateMachines.scala
… and then called the resulting function as usual:
scala> countMatchesSM(List("A", "special", "CS", "214", "exercise"))
1
If you find yourself unsure of how this function really works, try tracing processEvents — it helps a lot to see exactly which states are being traversed.
Counting words ⭐️
This isn’t in fact the first time that we see state machines — we’ve seen them once before, when we wrote wordCount using foldLeft:
case class WordCountState(count: Int, lastWasWS: Boolean)
def wordCount(l: List[Char]): Int =
l.foldLeft[WordCountState](WordCountState(0, true))((state: WordCountState, c: Char) =>
val cIsWS = c.isWhitespace
val count = state.count + (if state.lastWasWS && !cIsWS then 1 else 0)
WordCountState(count, cIsWS)
).count
webapps/src/main/scala/rec/StateMachines.scala
Rewrite this function as a state machine.
Counting parentheses
When studying parallelism, we solved the parentheses-balancing problem in the following way:
def isBalanced(str: List[Char]): Boolean =
def loop(str: List[Char], numOpen: Int): Int =
if numOpen < 0 then numOpen
else
str match
case Nil => numOpen
case '(' :: next => loop(next, numOpen + 1)
case ')' :: next => loop(next, numOpen - 1)
case other :: next => loop(next, numOpen)
loop(str, 0) == 0
webapps/src/main/scala/rec/StateMachines.scala
Rewrite this function as a state machine.
Git II: Distributed version control
Here are a few questions, exercises, and resources to guide your exploration of version control systems (part 2). Please do ask on Ed or in exercise sessions if anything is unclear!
If you ask a complex Git question on Ed, it will often be much easier for us to help you debug if you include a copy of your repo. For this, post your repo as a git bundle in a private Ed post (look up the documentation of git bundle to know what it does).
Make sure you have good backups before experimenting with your computer on the command line!
These exercises are written with just enough information to specify what to do, but they don’t specify how to do it. We expect you to look things up in the manual pages, online, or to discuss it among yourselves. But whichever source you use, make sure that you understand what you are doing! Don’t just blindly copy-paste commands.
Using a Git UI
Some advanced Git users prefer the command line, but most have a favorite UI. Decide whether you prefer to work on the command line or graphically. If you decide to go the graphical route, make sure that UI supports the following features:
diff3conflict indicators (equivalent togit config --global merge.conflictStyle diff3)- Line-range history, which shows you the evolution of a range of lines (equivalent to
git log -L) - Partial and interactive staging (equivalent to
git add -i) - Word-level diffs and space-agnostic diffs (equivalent to the
--word-diff=colorand--ignore-all-spaceflags ofgit diff)
The UI that I demonstrated in class was Magit, a package for Emacs. Staff members use a mix of Emacs, Neovim, VS Code, plain command line, and a few other editors.
Configure Git text editor
If you plan to use the command line, configure which text editor Git will use for editing commit messages and other interactions. See Git I for tips on how to do this.
Creating branches ⭐️
-
Download the scaffold for this week’s FP exercises. Unzip it, and initialize a local Git repository in the root directory of the exercises.
-
Commit the scaffold on the
mainbranch. -
Create a branch for each exercise. Commit your solutions regularly as you make progress on each exercise. Make sure that you know how to:
- Create a branch
- Switch between branches
- Compare two branches with
git diff
Resolving divergences: Patching
-
Read up on
git resetandgit reset --hard. What do these commands do? Experiment with them until you can confidently predict their results. -
Collect all changes from your various branches onto the
mainbranch using each of the following techniques:- ⭐️
format-patch,switch,am cherry-pick
- ⭐️
-
What happens if you apply the same patch multiple times, or cherry-pick the same commits multiple times?
Resolving divergences: Merging
-
Reset
mainto its initial state. -
Collect all changes from your various branches onto the
mainbranch using each of the following techniques:- ⭐️ Individual merges.
- A single octopus merge with all branches.
- A series of
--ff-onlymerges, usingrebaseon each branch before merging it.
-
What happens if you try to merge the same branch multiple times? What happens if you merge a branch, add commits to it, and merge it again?
History editing 🔥
-
Find an exercise branch on which you had a suboptimal history: either a bad commit message, or multiple partial commits that would have been better as one, or a single commit that solves multiple parts of an exercise, etc.
-
Use interactive rebasing (
git rebase -i) to perform the following:- Split a single commit into multiple
- Combine multiple consecutive commits into a single one
- Reword a commit message
- Reorder commits
Conflicts
-
Make sure that you have configured Git to use the
diff3style (git config --global merge.conflictStyle diff3). -
⭐️ Pick a branch containing a complex exercise, create two branches from it, and make different changes to the same piece of code on both branches. Merge one branch into
main, then the second. What happens during the second merge? Resolve the conflict and commit the merge. -
Reset
main, then use cherry-picks to move both commits to it. Resolve the conflict again and finish the cherry-pick. -
Reset
main, then rebase one of the two branches onto the other. Resolve the conflict again and finish the rebase.