Last updated on

Find

In this lab, you will apply your knowledge of structural recursion to a practical scenario: navigating the entries of a file system. This hands-on application will show you the power of recursion and how it can be used to solve real-world problems.

Obtaining the lab files

Start by ensuring you’ve completed the tools setup, the example degrees-converter lab and started the exercises on structural recursion: you will need the notions covered in these resources to complete this lab.

Then, run the following commands to obtain a copy of the lab:

$ git clone https://gitlab.epfl.ch/lamp/cs-214/find.git
$ cd find

The first command will create a directory called find/ containing our starter code. The second one will enter the find/ directory.

For now, don’t worry about what the git clone command does: we will cover this in lecture next week.

Once you’ve downloaded the project files by running git clone, you can open the project in your preferred code editor. For this course, we recommend using Visual Studio Code with the Metals extension for Scala. Please refer to our tools setup guide guide for installation and configuration instructions.

To open Visual Studio Code, you have two options:

Graphical Method:

  • Launch Visual Studio Code.
  • Use the “Open Folder” option to navigate to and select the folder that git clone created (this folder will contain a build.sbt file).

Terminal Method:

  • Navigate to the folder that git clone created in your terminal (typically by typing cd find after running the git clone […]find.git command).
  • Run the command code ..

The code command will start Visual Studio Code, and the . indicates that it should open the current directory.

Introduction

The find command is a powerful utility in Unix-like operating systems, allowing users to search for files and directories based on various criteria. It searches by name, size, type (file or directory), modification date, and many other properties. Additionally, find can perform actions on the files and directories it locates, such as displaying their names, deleting them, or running other commands on them.

Consider the following directory structure inside the find/src directory, as displayed by the macOS Finder:

Using a terminal, we can use the find command to search for files and directories within this directory structure. The following command, for example, searches for files ending with .scala with a size bigger than 1200 bytes in the src/main/ directory:

$ find src/main/ -name "*.scala" -size +1200c
src/main/scala/find/cs214/FindOutputParser.scala
src/main/scala/find/cs214/Entry.scala
src/main/scala/find/cli/main.scala

In this lab, you’ll implement a simplified version of the find command in Scala.

We refer to the find command in Unix-like operating systems here. There is also a command named find in Windows, but it is not the same as the one we are implementing in this lab. If you want to try the Unix find command in Windows, you might want to either:

  • use the “Git BASH” terminal that comes with Git on Windows, or
  • download a Windows build of GNU FindUtils, or
  • use WSL.

Structure

A call to find usually begins with the path where the search should start, followed by one or more filters, and ends with an action:

find  src/main/  -name "*.scala" -size +1200c  -print
      ---------  ----------------------------  ------
        path                filter             action

For example, in find src/main/ -name "*.scala" -size +1200c -print:

In this lab:

You will implement five functions, one for each filter. Each function will receive a cs214.Entry object as its argument and return a boolean indicating whether results were found. This Entry object represents the file or directory where the search should start. cs214.Entry is an abstraction of a file or directory in the file system that we wrote for you, explained in the next section.

For each function you implement in this lab, please adhere to the following guidelines:

The entry point of the program (usually called the main function) is the cs214find function in src/main/scala/find/cli/main.scala. It is already implemented for you. It parses the command-line arguments, creates a cs214.Entry object, and invokes the relevant function based on the filter specified in the command-line arguments. You do not need to read nor to understand this file.

The cs214.Entry class

The cs214.Entry class represents a file or directory in the file system. Each entry has a reference to its first child and to its next sibling. For example, the following directory structure:

food/
food/fruits/
food/fruits/strawberry.txt
food/fruits/tomato.txt
food/vegetables/
food/vegetables/tomato.txt

would be represented as 4 entries with the following relationships:

food/ fruits/ strawberry.txt tomato.txt vegetables/ tomato.txt firstChild firstChild nextSibling nextSibling firstChild

Each cs214.Entry has 8 methods: .name(), .size(), .path(), .hasNextSibling(), .nextSibling(), .isDirectory(), .hasChildren() and .firstChild().

Additionally, the function cs214.open(path) is provided to create an Entry object from a path string.

In the following REPL session, we demonstrate the usage of cs214.Entry using the food example directory depicted above:

scala> val food = find.cs214.open("example-dir/food")
val food: find.cs214.Entry = Entry("example-dir/food")

scala> food.name()
val res0: String = food

scala> food.path()
val res1: String = example-dir/food

scala> food.isDirectory()
val res2: Boolean = true

scala> food.hasChildren()
val res3: Boolean = true

scala> val fruits = food.firstChild()
val fruits: find.cs214.Entry = Entry("example-dir/food/fruits")

scala> fruits.isDirectory()
val res4: Boolean = true

scala> fruits.hasChildren()
val res5: Boolean = true

scala> val strawberry = fruits.firstChild()
val strawberry: find.cs214.Entry = Entry("example-dir/food/fruits/strawberry.txt")

scala> strawberry.isDirectory()
val res6: Boolean = false

scala> strawberry.hasNextSibling()
val res7: Boolean = true

scala> strawberry.nextSibling()
val res8: find.cs214.Entry = Entry("example-dir/food/fruits/tomato.txt")

scala> fruits.hasNextSibling()
val res9: Boolean = true

scala> val vegetables = fruits.nextSibling()
val vegetables: find.cs214.Entry = Entry("example-dir/food/vegetables")

scala> vegetables.isDirectory()
val res10: Boolean = true

scala> vegetables.hasChildren()
val res11: Boolean = true

scala> vegetables.firstChild()
val res12: find.cs214.Entry = Entry("example-dir/food/vegetables/tomato.txt")

scala> vegetables.hasNextSibling()
val res13: Boolean = false

Note that some of these methods may throw exceptions:

Testing

Some tests are provided for you in src/test/scala/find/FindTest.scala. You can run them by executing the following command from the root of the project (the find/ directory that you cloned):

$ sbt test

You don’t need to understand the code in FindTest.scala: you’ll learn more about testing in a later lesson. However, if you want to understand why a specific test fails, it may be useful to read through the comments in the file.

Our tests depend on your code using the cs214.Entry object, so using built-in Scala or Java functions or external libraries to traverse the file system will not work.

Running

You can execute your command from the SBT shell using the run SBT command. Here’s an example of running your find command with a path and no filter. After implementing findAllAndPrint, you should see the following output:

$ sbt
sbt:find> run src/
[info] running find.cli.cs214find src
src
src/main
src/main/scala
src/main/scala/HowManyHoursISpentOnThisLab.scala
src/main/scala/find
src/main/scala/find/cli
src/main/scala/find/cli/main.scala
src/main/scala/find/cs214
src/main/scala/find/cs214/Entry.scala
src/main/scala/find/cs214/FindOutputParser.scala
src/main/scala/find/cs214/MockEntry.scala
src/main/scala/find/cs214/OSLibEntry.scala
src/main/scala/find/cs214/open.scala
src/main/scala/find/find.scala
src/main/scala/playground.worksheet.sc
src/test
src/test/scala
src/test/scala/find
src/test/scala/find/FindTest.scala
[success] Total time: 5 s, completed Sep 9, 2023, 9:57:06 PM
sbt:find> run src/test/
[info] running find.cli.cs214find src/test
src/test
src/test/scala
src/test/scala/find
src/test/scala/find/FindTest.scala
[success] Total time: 0 s, completed Sep 9, 2023, 9:59:22 PM

On Windows the backslashes (\) should be escaped when running a command from the SBT shell. I.e. you should double every backslash:

sbt:find> run C:\\Users\\Ada

The same is true in worksheets or other Scala code:

val entry = cs214.open("C:\\Users\\Ada")

Generating an executable

You can also create a standalone executable that can be run from the command line without SBT. To achieve this, use the pack command inside the SBT shell:

$ sbt
sbt:find> pack
...
[success] Total time: 0 s, completed Sep 9, 2023, 9:58:07 PM
sbt:find> exit

This command produces the executable cs214find inside the target/pack/bin directory, which you can run like any other regular command line program. After implementing findByNameAndPrint, you should observe the following output:

$ target/pack/bin/cs214find src -name Entry.scala
src/main/scala/find/cs214/Entry.scala

Debugging

To debug your code, remember the tools from the example degrees converter lab:

There is an existing worksheet at src/main/scala/playground.worksheet.sc. If your setup is working correctly, this is what you should see when opening it, after some time:

Implementation

You have five functions to implement: findAllAndPrint, findByNameAndPrint, findBySizeEqAndPrint, findBySizeGeAndPrint, findEmptyAndPrint. Each of them implements a different find filter (no filter, -name, -size, -size +, and -empty) by printing matching files and directories and returning a Boolean indicating whether any results were found. More precisely:

  1. Each function takes one entry (a starting point for the search).
  2. Some functions additionally take a filter (name, size, or minSize).
  3. Each function must recursively check all files that are reachable from the initial entry parameter.
  4. Each function should print the path (.path()) of any entry that matches the search criterion, and return a boolean indicating whether any matching entries were found.

The initial entry parameters comes either from the command line or from the tests. To pass the tests, you may assume the following two things:

  1. The initial entry will be a directory (.isDirectory() will be true)
  2. The initial entry will have no siblings (.hasNextSibling() will be false)

That said, the most natural implementation does not need these facts, and works just as well when the initial input has siblings or is a file. In that case, the correct behavior would be to print any matching entries among the siblings of the initial entry. Our test suite (and hence our grading system) does not test this case.

1. findAllAndPrint

Implement the findAllAndPrint(entry) function. It should traverse the filesystem recursively by invoking the relevant methods on the cs214.Entry object, print the paths of all files and directories it finds, and return a Boolean indicating whether any files or directories were printed.

To print the paths, you can use the println function, which displays a string followed by a newline on the terminal’s standard output.

println can be placed on the line directly preceding any Scala expression; for example, the following function prints "Hello!" and returns 3:

def hello(): Int =
    println("Hello!")
    3
Hint

Can findAllAndPrint ever return false?

2. findByNameAndPrint

Implement the findByNameAndPrint(entry,name) function. It should search for entries (files or directories) that have the given name, print their paths, and return a Boolean indicating whether any entries were found.

Unlike the real find command, wildcards are not supported in findByNameAndPrint: if the user searches for Doc*.txt, it will look for a file or directory named exactly Doc*.txt and not files starting with Doc and ending with .txt.

3. findBySizeEqAndPrint

Implement the findBySizeEqAndPrint(entry,size) function. It should search for files that have exactly the given size, print their paths, and return a Boolean indicating whether any files were found.

findBySizeEqAndPrint should disregard directories.

4. findBySizeGeAndPrint

Implement the findBySizeGeAndPrint(entry,minSize) function. Unlike the previous function that searches for files of a specific size, findBySizeGeAndPrint locates files larger than or equal to the given size.

5. findEmptyAndPrint

Lastly, implement the findEmptyAndPrint(entry) function. It should search for entries (files or directories) that are empty, print their paths, and return a Boolean indicating whether any results were found.

A file is considered empty if it has size 0, and a directory is empty if it contains no children.

6. Reporting time you spent on the lab

Now that you’ve implemented everything, please help us improve this lab for future generations (and the following labs for this semester) by taking a moment to record how much time you spent working on this lab.

To do so, update the value labeled howManyHoursISpentOnThisLab in HowManyHoursISpentOnThisLab.scala:

val howManyHoursISpentOnThisLab: Double =
  0.0 // in hours, so put 3.5 here if you spent 3 hours and a half on the lab

src/main/scala/HowManyHoursISpentOnThisLab.scala

For example, if you spent three hours and a half to complete the lab, change the function to the following:

val howManyHoursISpentOnThisLab: Double =
  3.5

src/main/scala/HowManyHoursISpentOnThisLab.scala

Submitting your work

If all the tests run successfully, then you are done: congratulations on completing your first CS-214 lab! And next time you run out of disk space on your laptop, you’ll have the perfect tool at hand to scan your drive for excessively large files:

$ target/pack/bin/cs214find /home/ada/ -size +52428800c
/home/ada/.config/discord/0.0.28/modules/discord_voice/discord_voice.node
/home/ada/.local/share/Steam/steamapps/common/Overcooked! 2/Overcooked2_Data/resources.assets
/home/ada/.local/share/Steam/steamapps/common/Proton 7.0/proton_dist.tar
/home/ada/.local/share/Steam/ubuntu12_32/steam-runtime.tar.xz
/home/ada/.cache/coursier/v1/https/github.com/adoptium/temurin17-binaries/releases/download/jdk-17%252B35/OpenJDK17-jdk_x64_linux_hotspot_17_35.tar.gz
…

To turn in your lab, you must submit two files to Moodle:

On the assignment page:

  1. Click “Add submission”.

  2. Drop the find.scala file and the HowManyHoursISpentOnThisLab.scala file in the “File submissions” box.

  3. Click “Save changes”.

  4. Click “Submit assignment”.

  5. Confirm that you respected the rules by checking the box next to “This assignment is my own work”.

  6. Click “Continue”.

Double-check that:

Conclusion

Take a moment to look at the functions you’ve implemented in this lab. Do you notice any similarities or patterns in your code? Any repetitive structures or blocks?

Recognizing patterns and redundancy is a key step towards writing more efficient and modular code. In future lessons, you’ll learn about new concepts and how they can be used to abstract common patterns in code, making your programs more concise, readable, and maintainable.

We will return to this find lab at a later stage and see how we can refactor and modularize our find functions to reduce redundancy and enhance flexibility.

Bonus: findFirstByNameAndPrint 🔥

This part is optional and not graded.

Sometimes you don’t care to see all results — just the first one. How would you need to change your code to return just the first result? To answer this question, implement the findFirstByNameAndPrint( entry ) function. Similarly to findByNameAndPrint, it should list the files with the given name, but instead of printing them all, it should only print the first one. This mirrors the behavior of find path -name "filename" -print -quit.

There is an elegant solution to this question that uses only concepts from the first lecture of CS-214, and its implementation looks very similar to findByNameAndPrint. We’ll see later how you can write a single function to cover both cases.

There is an extra commented test in FindTest.scala that you can use to test your implementation.