Write code, not compilers

Table of Contents

1. Explanation

To convert a function into some other type (executable code, or a pretty-printed string, or something else), the easiest way is to call the function, passing certain arguments, such that the desired type is returned from the function.

A function does not need to be parsed and statically analyzed to figure out what it does. The function itself will tell you: Just call it and see.

The function will speak to you in a language determined by the arguments you pass it.

  • If you pass it objects and methods which perform IO and return unit, it will speak to you in the language of side-effects.
  • If you pass it objects and methods which pretty-print their arguments and return a string, it will speak to you in the language of pretty-printed programs.
  • If you pass it objects and methods which allocate registers and return instructions, it will speak to you in the language of executable code.

The fewer global variables and types that a function refers to, the more the function can speak in the language you desire, rather than a hard-coded predetermined langauge.

In this way, one can replace compilers, which inspect syntax trees and operate on the meta-level, with regular code operating on the object-level. If you want to be able to analyze your data and programs without compilers, you don't need to use "less powerful" languages; just avoid globals and implicit authority.

2. Examples

That koan alone may not bring you to enlightment, so here's some examples in working, type-annotated Python 3.

2.1. The interface and program

First, an interface:

class Data:
    "Some opaque piece of data"
    pass

class Directory:
    "A directory, in which we can create files"
    def create_file(self, name: str) -> File: ...
    def for_each_file(self, f: t.Callable[[File], None]) -> None: ...

class File:
    "A file, to which we can append several types of values"
    def append_str (self, data: str ) -> None: ...
    def append_data(self, data: Data) -> None: ...
    def append_path(self, data: File) -> None: ...
    def close(self) -> None: ...

And a short program that uses that interface:

def create_file_with_contents(dir: Directory, name: str, arg: Data) -> File:
    "A helper function to create a file containing some data"
    file = dir.create_file(name)
    file.append_data(arg)
    return file

def prog(dir: Directory, arg1: Data, arg2: Data) -> None:
    "A simple program manipulating files and directories"
    paths = dir.create_file("paths")

    arg1_file = create_file_with_contents(dir, "arg1", arg1)
    paths.append_str("arg1 file path:")
    paths.append_path(arg1_file)
    arg1_file.close()

    arg2_file = create_file_with_contents(dir, "arg2", arg2)
    paths.append_str("arg2 file path:")
    paths.append_path(arg2_file)
    arg2_file.close()

    paths.append_str("all paths:")
    def f(file: File) -> None:
        paths.append_path(file)
        file.close()
    dir.for_each_file(f)

    paths.close()

By passing different implementations of the interfaces to this program, we will reinterpret the program as doing different things and producing different results.

2.2. IO

First, the obvious implementation: One which just writes to the filesystem.

@dataclass
class StrData(Data):
    "Some data, specifically a string"
    content: str

@dataclass
class IODirectory(Directory):
    "A directory in the filesystem, in which we'll create files"
    path: str

    def create_file(self, name: str, size: int=None) -> IOFile:
        """Open and create a file in this directory in the filesystem

        An optional argument, size, will preallocate space in the file
        for future writes. We'll use this later.

        """
        path = self.path + "/" + name
        f = open(path, 'w')
        if size:
            f.truncate(size)
        return IOFile(path, f)

    def for_each_file(self, f: t.Callable[[File], None]) -> None:
        "Call `f` with each file in this directory"
        for name in os.listdir(self.path):
            path = self.path + "/" + name
            f(IOFile(path, open(path, 'r')))

@dataclass
class IOFile(File):
    "A file in the filesystem, to which we'll write"
    path: str
    file: t.TextIO

    def append_str(self, data: str) -> None:
        self.file.write(data)

    def append_data(self, data: StrData) -> None:
        self.file.write(data.content)

    def append_path(self, data: IOFile) -> None:
        self.file.write(data.path)

    def close(self) -> None:
        self.file.close()

We can then run prog with this implementation, to get the expected behavior of making some files and writing into them.

def main():
    dir = IODirectory("/tmp/somedir")
    arg1 = StrData("my very cool and neat data")
    arg2 = StrData("some other kind of cool and neat data")
    prog(dir, arg1, arg2)

So far, this is all completely conventional.

2.3. Testing

We can pass an implementation which transforms our program into a test. At each point, instead of performing an operation, the program asserts that the operation has been performed correctly.

That's what the Test implementation does:

  • Instead of creating a new file, we assert that the file is there.
  • Instead of writing to the file, we read the file and assert its contents match our expectation.

This isn't a mock; it really does do IO in the filesystem, just different IO.

@dataclass
class TestDirectory(Directory):
    "A directory in the filesystem, in which we'll open files"
    path: str

    def create_file(self, name: str) -> TestFile:
        """Open a file in this directory in the filesystem

        If the file doesn't exist, we'll throw an exception.

        """
        path = self.path + "/" + name
        # throws if the file doesn't exist
        f = open(path, 'r')
        return TestFile(path, f)

    def for_each_file(self, f: t.Callable[[File], None]) -> None:
        """Call `f` with each file in this directory.

        Same as for IODirectory.
        """
        for name in os.listdir(self.path):
            path = self.path + "/" + name
            f(TestFile(path, open(path, 'r')))

@dataclass
class TestFile(File):
    "A file in the filesystem, which we'll read from"
    path: str
    file: t.TextIO

    def append_str(self, data: str) -> None:
        """Assert this string matches the data in this file.

        As we read more data from the file, our position in the file
        moves forward and we read new data.

        """
        read_data = self.file.read(len(data))
        if data != read_data:
            raise Exception("the next data in the file should be", data, "not", read_data)

    def append_data(self, data: StrData) -> None:
        self.append_str(data.content)

    def append_path(self, data: TestFile) -> None:
        self.append_str(data.path)

    def close(self) -> None:
        self.file.close()

Now we can test the results of the IO implementation by running the Test implementation.

def testmain():
    dir = IODirectory("/tmp/somedir")
    arg1 = StrData("my very cool and neat data")
    arg2 = StrData("some other kind of cool and neat data")
    # run with IO
    prog(dir, arg1, arg2)
    # run with Test
    prog(TestDirectory(dir.path), arg1, arg2)

First we run prog once with IODirectory to create the files. Then we run prog with TestDirectory to check that the files are there, and have the correct contents.

2.4. Pretty printing

This implementation of Data, Directory, and File pretty-prints the program that they are passed to.

Whenever a method is called, this implementation writes a line of code which calls that method. Variable names are generated to store any returned values, and used when later method calls are made with those values.

class Program:
    statements: t.List[str]
    name: str

    def var(self, name: str) -> str:
        "Generate a fresh, unused variable name from this name"
        return self.name + "_" + name + str(len(self.statements))

    @contextlib.contextmanager
    def def_function(self, name: str, args: t.List[str]) -> str:
        """Helper for pretty-printing function definitions

        Statements performed inside this context manager are part of the
        function definition.

        """
        parent_statements, parent_name = self.statements, self.name
        self.statements, self.name = [], self.var(name)
        yield self.name
        parent_statements.append(
            f"def {self.name}(" + ", ".join(args) + "):\n" +
            textwrap.indent("\n".join(self.statements), "    "))
        self.statements, self.name = parent_statements, parent_name

@dataclass
class PPDirectory(Directory):
    "A staged directory, which writes lines of code to perform requested operations"
    program: Program
    variable_name: str

    def create_file(self, name: str) -> PPFile:
        "Write a line of code to call .create_file and store the result in an arbitrarily named variable"
        file = PPFile(self.program, self.program.var("file"))
        self.program.statements.append(f"{file.variable_name} = {self.variable_name}.create_file('{name}')")
        return file

    def for_each_file(self, f: t.Callable[[File], None]) -> None:
        "Write a function definition for `f`, then a line of code to calling .for_each_file passing that function."
        file = PPFile(self.program, self.program.var("file"))
        with self.program.def_function('f', [file.variable_name]) as func_name:
            f(file)
        self.program.statements.append(f"{self.variable_name}.for_each_file({func_name})")

@dataclass
class PPFile(File):
    "A staged file, which writes lines of code to perform requested operations"
    program: Program
    variable_name: str

    def append_str(self, data: str) -> None:
        "Write a line of code to call .append_str with this string constant"
        self.program.statements.append(f"{self.variable_name}.append_str('{data}')")

    def append_data(self, data: PPData) -> None:
        "Convert data to a variable name, and write a line of code to call .append_str with it"
        self.program.statements.append(f"{self.variable_name}.append_data({data.variable_name})")

    def append_path(self, data: PPFile) -> None:
        "Convert data to a variable name, and write a line of code to call .append_path with it"
        self.program.statements.append(f"{self.variable_name}.append_path({data.variable_name})")

    def close(self) -> None:
        "Write a line of code to call .close"
        self.program.statements.append(f"{self.variable_name}.close()")

@dataclass
class PPData(Data):
    "A staged piece of data, which exists only as a variable name"
    variable_name: str

We can run prog with this implementation, picking arbitrary initial variable names:

def ppmain():
    program = Program([], "")
    dir = PPDirectory(program, "mydir")
    arg1 = PPData("somearg")
    arg2 = PPData("otherarg")
    with program.def_function('main', [dir.variable_name, arg1.variable_name, arg2.variable_name]):
        prog(dir, arg1, arg2)
    print(program.statements[0])

This outputs a pretty-printed program to stdout:

def _main0(mydir, somearg, otherarg):
    _main0_file0 = mydir.create_file('paths')
    _main0_file1 = mydir.create_file('arg1')
    _main0_file1.append_data(somearg)
    _main0_file0.append_str('arg1 file path:')
    _main0_file0.append_path(_main0_file1)
    _main0_file1.close()
    _main0_file6 = mydir.create_file('arg2')
    _main0_file6.append_data(otherarg)
    _main0_file0.append_str('arg2 file path:')
    _main0_file0.append_path(_main0_file6)
    _main0_file6.close()
    _main0_file0.append_str('all paths:')
    def _main0_f12(_main0_file12):
        _main0_file0.append_path(_main0_file12)
        _main0_file12.close()
    mydir.for_each_file(_main0_f12)
    _main0_file0.close()

Not the most beautiful pretty-printing, but still pretty good considering that this works without access to the source code.

2.5. Optimization

First, some background knowledge: When writing to a filesystem, space must be allocated for data as it is written. Writing data in many small chunks causes the space allocation to be broken up into many small chunks. It is substantially more efficient to allocate space in one big chunk, rather than in many small chunks.

Knowing that, we'd like to optimize our program to allocate all the space it needs for a file up front, at the time it creates the file.

To do that, this implementation of Data, Directory, and File profiles the program it's passed to, storing information about the space allocation implicitly performed by the program. After the program is finished running with the profiling implementation, the optimized_dir method returns a new Directory object which uses that profiling information to perform space allocations in one big chunk at file creation, instead of in smaller chunks.

@dataclass
class ProfilingDirectory(Directory):
    path: str
    files: t.Dict[str, ProfilingFile]

    def create_file(self, name: str) -> File:
        "Make a file which profiles the space usage of operations performed on it"
        path = self.path + "/" + name
        file = ProfilingFile(path)
        self.files[name] = file
        return file

    def for_each_file(self, f: t.Callable[[File], None]) -> None:
        "Does nothing; this depends on data in the filesystem, so we can't statically profile this"
        pass

    def optimized_dir(self, path: str) -> OptimizedDirectory:
        "Return an optimized directory which performs profiled space allocations all at once"
        return OptimizedDirectory(path, self.files)

@dataclass
class ProfilingFile(File):
    path: str
    size: int = 0

    def append_str(self, data: str) -> None:
        "Record how much file space writing this string would consume"
        self.size += len(data)

    def append_data(self, data: StrData) -> None:
        "Record how much file space writing this data would consume"
        self.append_str(data.content)

    def append_path(self, data: ProfilingFile) -> None:
        "Record how much file space writing this path would consume"
        self.append_str(data.path)

    def close(self) -> None:
        pass

@dataclass
class OptimizedDirectory(IODirectory):
    profiler_results: t.Dict[str, ProfilingFile]

    def create_file(self, name: str) -> IOFile:
        "Create this file, allocating space in it based on data from profiling"
        profiler_result = self.profiler_results.get(name)
        if profiler_result:
            return super().create_file(name, size=profiler_result.size)
        else:
            return super().create_file(name)

We can use this profiler implementation to profile our program once, and then run it many times.

def optimized_main():
    arg1 = StrData("somearg")
    arg2 = StrData("otherarg")
    profile_dir = ProfilingDirectory("somedir", {})
    prog(profile_dir, arg1, arg2)
    prog(profile_dir.optimized_dir("adir"), arg1, arg2)
    prog(profile_dir.optimized_dir("bdir"), arg1, arg2)

2.6. Linear type inference and checking

This implementation infers linear types for expressions and functions in our program, and checks their consistency. Specifically, we statically track the open/closed state of files to ensure that files cannot be used after they are closed.

Note that the inferred linear types are not actually made explicit as data in the type-checker; they are left implicit. This allows a very simple typechecker implementation.

@dataclass
class TypecheckingDirectory(Directory):
    def create_file(self, name: str) -> TypecheckingFile:
        "Make a file which profiles the space usage of operations performed on it"
        return TypecheckingFile(open=True)

    def for_each_file(self, f: t.Callable[[File], None]) -> None:
        # run f to type check it against the input typestate...
        try:
            f(TypecheckingFile(open=True))
        except AssertionError:
            e.args = ("function passed to for_each_file uses closed file on the first run",)
            raise
        # ...and then run f again to check it against its own output typestate.
        try:
            f(TypecheckingFile(open=True))
        except AssertionError as e:
            e.args = ("function passed to for_each_file uses closed files on second and future runs",)
            raise

@dataclass
class TypecheckingFile(File):
    open: bool

    def append_str(self, data: str) -> None:
        assert self.open

    def append_data(self, data: Data) -> None:
        assert self.open

    def append_path(self, data: File) -> None:
        assert self.open

    def close(self) -> None:
        assert self.open
        self.open = False

Now we can run our program with TypecheckingDirectory to typecheck it. To illustrate that this works, we'll also run with badprog, which fails typechecking:

def badprog(dir: Directory) -> File:
    paths = dir.create_file("paths")
    def f(file: File) -> None:
        paths.append_path(file)
        # oops, we meant to close "file", not "paths"!
        paths.close()
    dir.for_each_file(f)
    return paths

def typechecking_main():
    prog(TypecheckingDirectory(), Data(), Data())
    try:
        badprog(TypecheckingDirectory())
    except AssertionError:
        pass

The failure in badprog is indicated with a regular Python exception, thrown at type-checking time:

Traceback (most recent call last):
  File "tfs.py", line 416, in <module>
    typechecking_main()
  File "tfs.py", line 414, in typechecking_main
    badprog(TypecheckingDirectory())
  File "tfs.py", line 409, in badprog
    dir.for_each_file(f)
  File "tfs.py", line 381, in for_each_file
    f(TypecheckingFile(open=True))
  File "tfs.py", line 406, in f
    paths.append_path(file)
  File "tfs.py", line 397, in append_path
    assert self.open
AssertionError: function passed to for_each_file uses closed files on second and future runs

The stack trace tells us the concrete reason why type-checking fails for badprog: f uses a closed file after its first run, specifically paths in the paths.append_path(file) statement. paths is closed because we called paths.close() at the end of f.

3. Conclusion

Passing arguments to functions is fun and powerful.

Other constructs not shown in these examples, such as control flow and lambdas, can also be handled, in general by ensuring that control flow or lambda creation is done through an interface. For example, an if-check on an error code can be done with a Result.or_else interface, which makes both branches visible to the implementation.

4. Further reading

5. Addendum: Type-correct interfaces

The type declarations for the Data, Directory, and File interfaces at the start are simple and correct, but need to be made a little more generic to support our implementations; otherwise we get some type errors.

The below declarations of the interfaces are fully correct and allows us to typecheck properly. But they're slightly more complicated, so we're doing it here to avoid confusion up front.

class Data:
    pass

T_Data = t.TypeVar('T_Data', bound=Data)
T_File = t.TypeVar('T_File', bound=File)
class File(t.Generic[T_Data]):
    def append_str (self,         data: str  ) -> None: ...
    def append_data(self,         data: T_Data) -> None: ...
    def append_path(self: T_File, data: T_File) -> None: ...

class Directory:
    def create_file(self, name: str) -> File: ...

Exercise for the reader: Understand why these changes to the append_data and append_path methods are needed.

Created: 2022-06-02 Thu 10:03

Validate