Last updated on

Week 9: Webapps and more Version Control

Welcome to week 9 of CS-214 — Software Construction!

As usual, ⭐️ indicates the most important exercises and questions and 🔥 indicates the most challenging ones. Exercises or questions marked 🧪 are intended to build up to concepts used in this week’s lab.

You do not need to complete all exercises to succeed in this class, and you do not need to do all exercises in the order they are written.

Webapps

This exercise set is intended to reinforce some concepts used in the webapp lab: serialization, and state machines.

We strongly encourage you to solve the exercises on paper first, in groups. After completing a first draft on paper, you may want to check your solutions on your computer. To do so, you can pull from the course exercises repository.

Serialization / deserialization 🧪

The traditional way to handle exceptional results or errors in functional programming is the Option type, which we studied in the calculator lab. At times, this style can be a bit heavy. We’ll see why it may still be worth using in the monads week, but for this week’s exercises (and lab!) let’s use exceptions instead.

We’ll illustrate the issue with a serialization problem.

Serialization is the process of transforming data within a program into a form that can be transmitted over a network, stored in files, etc. Deserialization is the reverse operation. Once your program has state, it’s pretty natural to want to save it, and for that you need serialization.

A common serialization format is JSON. In Scala, a JSON value can be represented like this:

package ujson

enum Value:
  case Null
  case True
  case False
  case Str(s: String)
  case Num(d: Double)
  case Arr(items: ArrayBuffer[Value])
  case Obj(items: LinkedHashMap[String, Value])

A serialization function for type T is a function of type T => ujson.Value. A deserialization function is a function of type ujson.Value => Try[T]. The type Try[T] models either a Success(t: T) or a Failure, and comes with a very convenient constructor Try(…) which runs the computation , and returns a Failure if it raises an exception, and a Success otherwise. A wire is a pair of a serialization and a deserialization function; it can be represented thus:

trait Wire[T]:
  def serialize(t: T): ujson.Value
  def deserialize(js: ujson.Value): Try[T]

webapps/src/main/scala/serialization/Wire.scala

As an example, here are a serializer and a deserializer for a pair of an integer and a string:

object IntStringWire extends Wire[(Int, String)]:
  def serialize(t: (Int, String)): ujson.Value =
    ujson.Arr(
      ujson.Num(t._1),
      ujson.Str(t._2)
    )

  def deserialize(js: ujson.Value): Try[(Int, String)] =
    Try {
      val arr = js.arr // .arr throws an exception if the input isn't an array
      val (fst, snd) = (arr(0).num, arr(1).str)
      if !fst.isValidInt then
        throw IllegalArgumentException(f"Not an int: $fst")
      (fst.toInt, snd)
    }

webapps/src/main/scala/serialization/Wire.scala

And if we are given a serializer and deserializer for some type T, we can use it to create a serializer and deserializer for Option[T]:

case class OptionWire[T](w: Wire[T]) extends Wire[Option[T]]:
  def serialize(t: Option[T]): ujson.Value =
    t match
      case None =>
        ujson.Arr(ujson.Str("none"))
      case Some(t) =>
        ujson.Arr(ujson.Str("some"), w.serialize(t))

  def deserialize(js: ujson.Value): Try[Option[T]] =
    Try {
      val arr = js.arr
      val tag = arr(0).str
      if arr.size == 1 && arr(0).str == "none" then
        None
      else if arr.size == 2 && arr(0).str == "some" then
        Some(w.deserialize(arr(1)).get)
      else
        throw IllegalArgumentException(f"Unexpected: ${arr.toList}")
    }

webapps/src/main/scala/serialization/Wire.scala

  1. Read the documentation of the Try type.

  2. Skim through the examples provided in the documentation of the ujson library (you can skip the last section on Converting To/From Scala Data Types).

  3. Consider the following type:

    enum Formula:
      case Lit(b: Boolean)
      case Var(name: String)
      case FnCall(fn: String, args: List[Formula])
    

    webapps/src/main/scala/serialization/Wire.scala

    How would you serialize a value of this type? Think about representation choices on paper first. Try a few different examples by hand. Then, write the serializer and deserializer.

    object FormulaWire extends Wire[Formula]:
      import Formula.*
    
      def serialize(e: Formula): ujson.Value =
        ???
    
      def deserialize(js: ujson.Value): Try[Formula] =
        ???
    

    webapps/src/main/scala/serialization/Wire.scala

  4. All encoder/decoder pairs should verify deserialize(serialize(v)) === Success(v) for all v. Use this equation to test your serializer and deserializer together.

    (This is necessary but not sufficient for correctness; do you see why?)

  5. A previous version of this exercise had the following assertion in a test, to check for “correct handling of invalid inputs”, but students complained:

    assert(FormulaWire.deserialize("notaformula").isFailure)
    

    Can you see what was wrong with this test?

Isolating state: state machines 🧪

Most stateful applications are thoroughly stateful: there’s state everywhere, and anything can change under your feet at any time. A better model, when you must have state, is to contain it in one place. This is the state machine model, where you have a single piece of state, and a function to update it.

You have already seen many examples of this pattern, without knowing it:

Another way to put this is that we have a state and instructions that are interpreted to update the state. In that sense, foldLeft and foldRight are interpreters for a language whose instructions are the elements of the list being processed.

This week’s lab will demonstrate how powerful this model is for building reliable single- and multi-user applications for desktop, mobile, and web environments. For now, let’s get used to thinking in terms of state machines. We’ll use the following model:

trait StateMachine[State, Event, Result]:
  def init: State
  def update(s: State, e: Event): State // Sometimes called *transition*
  def finish(s: State): Result

webapps/src/main/scala/rec/StateMachines.scala

def processEvents[S, E, R](sm: StateMachine[S, E, R])(events: List[E]): R =
  sm.finish(events.foldLeft(sm.init)(sm.update))

webapps/src/main/scala/rec/StateMachines.scala

Why do we bother with all this if it all boils down to a foldLeft? Because in general we are interested in modeling streaming processes, where events are user inputs, messages on a network, interrupts, etc. In that case, instead of a foldLeft, we would store the state in a var and update it every time we receive an event from the user, the network, a sensor, etc. (and there would be no finish stage)

Here is a simple example: counting how many times the word "CS" is followed by "214" in a list of words. The state has two parts:

case class State(lastWord: Option[String], occurencesCount: Int)

webapps/src/main/scala/rec/StateMachines.scala

lastWord remembers which word we saw last, and occurences stores the ongoing count.

The events and results are simpler: events are simply words, and the final result is a number:

type Event = String

webapps/src/main/scala/rec/StateMachines.scala

type Result = Int

webapps/src/main/scala/rec/StateMachines.scala

We can now define the state machine itself:

object PatternMatchingSM extends StateMachine[State, Event, Result]:

webapps/src/main/scala/rec/StateMachines.scala

… it starts with no previous word and a count of 0:

def init = State(None, 0)

webapps/src/main/scala/rec/StateMachines.scala

… every time we receive a new word, it updates its state to increment the count if needed, and replaces the last word:

def update(s: State, e: Event) =
  State(
    Some(e),
    s.occurencesCount
      + (if s.lastWord == Some("CS") && e == "214" then 1 else 0)
  )

webapps/src/main/scala/rec/StateMachines.scala

… and finally, when the stream is exhausted, it just returns the final count:

def finish(s: State) = s.occurencesCount

webapps/src/main/scala/rec/StateMachines.scala

Now, how do we use this state machine? With processEvents!

val countMatchesSM: List[String] => Int =
  processEvents(PatternMatching.PatternMatchingSM)

webapps/src/main/scala/rec/StateMachines.scala

Notice the currying: we could have also written things this way:

def countMatchesSM2(l: List[String]): Int =
  processEvents(PatternMatching.PatternMatchingSM)(l)

webapps/src/main/scala/rec/StateMachines.scala

… and then called the resulting function as usual:

scala> countMatchesSM(List("A", "special", "CS", "214", "exercise"))
1

If you find yourself unsure of how this function really works, try tracing processEvents — it helps a lot to see exactly which states are being traversed.

Counting words ⭐️

This isn’t in fact the first time that we see state machines — we’ve seen them once before, when we wrote wordCount using foldLeft:

case class WordCountState(count: Int, lastWasWS: Boolean)

def wordCount(l: List[Char]): Int =
  l.foldLeft[WordCountState](WordCountState(0, true))((state: WordCountState, c: Char) =>
    val cIsWS = c.isWhitespace
    val count = state.count + (if state.lastWasWS && !cIsWS then 1 else 0)
    WordCountState(count, cIsWS)
  ).count

webapps/src/main/scala/rec/StateMachines.scala

Rewrite this function as a state machine.

Counting parentheses

When studying parallelism, we solved the parentheses-balancing problem in the following way:

def isBalanced(str: List[Char]): Boolean =
  def loop(str: List[Char], numOpen: Int): Int =
    if numOpen < 0 then numOpen
    else
      str match
        case Nil           => numOpen
        case '(' :: next   => loop(next, numOpen + 1)
        case ')' :: next   => loop(next, numOpen - 1)
        case other :: next => loop(next, numOpen)
  loop(str, 0) == 0

webapps/src/main/scala/rec/StateMachines.scala

Rewrite this function as a state machine.

Git II: Distributed version control

Here are a few questions, exercises, and resources to guide your exploration of version control systems (part 2). Please do ask on Ed or in exercise sessions if anything is unclear!

If you ask a complex Git question on Ed, it will often be much easier for us to help you debug if you include a copy of your repo. For this, post your repo as a git bundle in a private Ed post (look up the documentation of git bundle to know what it does).

Make sure you have good backups before experimenting with your computer on the command line!

These exercises are written with just enough information to specify what to do, but they don’t specify how to do it. We expect you to look things up in the manual pages, online, or to discuss it among yourselves. But whichever source you use, make sure that you understand what you are doing! Don’t just blindly copy-paste commands.

Using a Git UI

Some advanced Git users prefer the command line, but most have a favorite UI. Decide whether you prefer to work on the command line or graphically. If you decide to go the graphical route, make sure that UI supports the following features:

The UI that I demonstrated in class was Magit, a package for Emacs. Staff members use a mix of Emacs, Neovim, VS Code, plain command line, and a few other editors.

Configure Git text editor

If you plan to use the command line, configure which text editor Git will use for editing commit messages and other interactions. See Git I for tips on how to do this.

Creating branches ⭐️

  1. Download the scaffold for this week’s FP exercises. Unzip it, and initialize a local Git repository in the root directory of the exercises.

  2. Commit the scaffold on the main branch.

  3. Create a branch for each exercise. Commit your solutions regularly as you make progress on each exercise. Make sure that you know how to:

    1. Create a branch
    2. Switch between branches
    3. Compare two branches with git diff

Resolving divergences: Patching

  1. Read up on git reset and git reset --hard. What do these commands do? Experiment with them until you can confidently predict their results.

  2. Collect all changes from your various branches onto the main branch using each of the following techniques:

    1. ⭐️ format-patch, switch, am
    2. cherry-pick
  3. What happens if you apply the same patch multiple times, or cherry-pick the same commits multiple times?

Resolving divergences: Merging

  1. Reset main to its initial state.

  2. Collect all changes from your various branches onto the main branch using each of the following techniques:

    1. ⭐️ Individual merges.
    2. A single octopus merge with all branches.
    3. A series of --ff-only merges, using rebase on each branch before merging it.
  3. What happens if you try to merge the same branch multiple times? What happens if you merge a branch, add commits to it, and merge it again?

History editing 🔥

  1. Find an exercise branch on which you had a suboptimal history: either a bad commit message, or multiple partial commits that would have been better as one, or a single commit that solves multiple parts of an exercise, etc.

  2. Use interactive rebasing (git rebase -i) to perform the following:

    • Split a single commit into multiple
    • Combine multiple consecutive commits into a single one
    • Reword a commit message
    • Reorder commits

Conflicts

  1. Make sure that you have configured Git to use the diff3 style (git config --global merge.conflictStyle diff3).

  2. ⭐️ Pick a branch containing a complex exercise, create two branches from it, and make different changes to the same piece of code on both branches. Merge one branch into main, then the second. What happens during the second merge? Resolve the conflict and commit the merge.

  3. Reset main, then use cherry-picks to move both commits to it. Resolve the conflict again and finish the cherry-pick.

  4. Reset main, then rebase one of the two branches onto the other. Resolve the conflict again and finish the rebase.