Program modeling

It's generally not useful to think of a program in terms of the individual lines of code it's composed of. Lines of code relate to each other: each one fulfills some purpose so that the next one can fulfill its own, and so on. Sometimes it takes many lines to accomplish a single task, so viewing any one line in isolation, rather than as part of a larger whole, loses meaning and makes it harder to understand what is actually going on. In order to model a program, it's important to use the right representation.

Using the right representation

First, some important definitions. A specification (or design document) describes the design and expected behavior of a component or system. More than one system can be created that follows the same specification, and those systems may achieve the same result in different ways. Not every program has a written specification, and not every specification is written before work on the program begins, but every piece of code exists for a reason, and every program is written to accomplish some task. The definition of this task, and the way the program will accomplish it, are usually determined when the program is designed. The implementation of a system is the precise manner in which it follows a specification. The term "implementation details" generally refers to details that are not part of the specification, but are instead unique to a particular implementation of a system.
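To make the distinction concrete, here is a minimal sketch in Python (an illustrative choice; the article itself isn't tied to any language). The docstring plays the role of a tiny specification, and the two functions are different implementations of it. The task and both function names are hypothetical.

```python
def sum_to_n_loop(n: int) -> int:
    """Specification: return 1 + 2 + ... + n for any n >= 0 (0 when n == 0)."""
    # Implementation detail: accumulate the total one term at a time.
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_to_n_formula(n: int) -> int:
    """Specification: return 1 + 2 + ... + n for any n >= 0 (0 when n == 0)."""
    # Implementation detail: use the closed-form formula n(n + 1) / 2.
    return n * (n + 1) // 2


if __name__ == "__main__":
    # Both implementations satisfy the same specification.
    for n in range(100):
        assert sum_to_n_loop(n) == sum_to_n_formula(n)
    print(sum_to_n_formula(10))  # prints 55
```

Whether the sum is computed with a loop or a formula is exactly what "implementation details" refers to: code that relies only on the specification can't tell the two apart.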

If you are trying to model a program, you can't really describe it in the language the program is written in. The purpose of modeling a program is to understand how it works; if you describe it in the same language it's written in, then you're really just re-implementing the program. You also can't really describe it in a different programming language, because, again, you're just implementing it in a different language. In essence, if you want to model a program without re-implementing it, you have to lose some (not all) of the details, and wave them away as being "implementation details".

There are, generally speaking, three ways to describe a system (and of course, combinations of these ways can be used): in a flowchart, in a textual form, or in pseudocode form. Each has certain advantages and disadvantages, but all are useful.

Using a flowchart

Sometimes the best way to represent discrete steps or sequences is a flowchart. A flowchart (also called a process flow in the context of a program description) shows visually how different components interact with each other, or how key conditions change the course of a program's execution. The key to a successful and useful flowchart is that it shows the reader what is going on, rather than telling them. For example, here is a flowchart that describes what a program will do when a user tries to delete a file:

       +-------+
       | Start |
       +---+---+
           |
           V
   +-------+--------+
   | User tries to  |
   | delete a file  |
   +-------+--------+
           |
           V
   +-------+--------+          +------------+
   | User is        |          | Delete the |
   | administrator? |---Yes--->| file       |
   +-------+--------+          +-----+------+
           |                         |
           No                        |
           |                         |
           V                         |
+----------+-----------+             V
| Show "Access Denied" |          +-----+
| popup message        |--------->| End |
+----------------------+          +-----+

Good flowcharts have a "start" and an "end" bubble, and lines with arrows to show the direction in which the flowchart should be read. Larger flowcharts may have more than one "end" bubble, but never more than one "start" bubble. A check for a condition should usually be stated as a question, and each line leading away from the condition bubble should be labeled with the answer it corresponds to (such as "Yes" or "No"). Bubbles should be lined up with each other, and there is rarely a good reason to have more than four lines going to or from any given bubble. Lines should be straight vertical, straight horizontal, or L-shaped, and should practically never be diagonal. There should be adequate spacing so that the flowchart is easy on the eyes.
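One way to check some of these conventions mechanically is to treat a flowchart as a graph: bubbles are nodes, and arrows are edges that may carry a label for the answer they represent. The Python sketch below encodes the delete-file flowchart this way and checks a few of the rules above (exactly one start bubble, at least one end bubble, labeled lines out of every decision, and a warning for bubbles with more than four lines). The node names and the `check_flowchart` helper are hypothetical, for illustration only.

```python
from collections import Counter

# Each (hypothetical) node name maps to its kind; "decision" nodes are the questions.
NODES = {
    "start": "start",
    "request": "step",
    "is_admin": "decision",
    "delete": "step",
    "deny": "step",
    "end": "end",
}

# Edges are (from, to, label); only lines leaving a decision need a label.
EDGES = [
    ("start", "request", None),
    ("request", "is_admin", None),
    ("is_admin", "delete", "Yes"),
    ("is_admin", "deny", "No"),
    ("delete", "end", None),
    ("deny", "end", None),
]


def check_flowchart(nodes, edges):
    """Return a list of rule violations (an empty list means the chart passes)."""
    problems = []
    kinds = Counter(nodes.values())
    if kinds["start"] != 1:
        problems.append("there must be exactly one start bubble")
    if kinds["end"] < 1:
        problems.append("there must be at least one end bubble")
    # Count every line touching each bubble, in either direction.
    degree = Counter()
    for src, dst, label in edges:
        degree[src] += 1
        degree[dst] += 1
        if nodes[src] == "decision" and not label:
            problems.append(f"line from decision '{src}' to '{dst}' is unlabeled")
    for node, count in degree.items():
        if count > 4:
            problems.append(f"'{node}' has {count} lines; consider splitting it")
    return problems


print(check_flowchart(NODES, EDGES))  # prints []
```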

Full sentences are not required within each bubble. Try to use only the bare minimum number of words needed, without excessive detail; a flowchart usually won't stand alone, but will be accompanied by some other reference document that provides background information. Error checking and handling steps are typically left out of a high-level flowchart.

Flowcharts are most useful when describing a system characterized by discrete steps, where the conditions for moving between steps are as important as (if not more important than) the steps themselves. Flowcharts also help document the interactions between various components: the larger a system is, and the more components and systems it interacts with, the more likely it is to benefit from a flowchart.

Flowcharts are not terribly useful in straightforward programs, where there aren't any conditions that dramatically alter the outcome of the program.

Describing a program using human language

It's pretty much always useful to include a written description of a program or system. While a program should include adequate comments within its source code, it's good to also have a document that describes the overarching design of the program. Unless it's an actual specification, such a document doesn't need much detail on how the task is accomplished (implementation details should usually be left out), but it should describe in plenty of detail what task is being accomplished. This is the key distinction between a written document on one hand and pseudocode or a flowchart on the other: the former says what is being done, while the latter two say how it is being done. Of course, the lower-level the document (that is, the more technical and formal it is), the more implementation details it should provide.

Here is a description of the flowchart from the previous section:

When a user attempts to delete a file, their permissions will be checked. Only administrators can delete files; non-administrators are shown a popup informing them that they do not have sufficient privileges.

Note that this description is fairly low-level, and that it conveys the same information as the flowchart. A higher-level description could be as simple as: "Only administrators can delete files." Note the absence of implementation details such as how or when the program determines whether someone is an administrator, or what is done when a non-administrator attempts to delete a file.

There aren't really any disadvantages to having a written document that describes a program. The hardest part is actually writing the document and keeping it up to date.

Using pseudocode

Sometimes it's necessary to describe a particular function or part of a program in depth, in a form that is higher-level than real code but still conveys the core logic and flow of that piece of code. This is where pseudocode comes in. Pseudocode is text that resembles real code, but is written loosely and does not adhere to the syntax of any particular programming language (though any given sample of pseudocode should be internally consistent).

Here's one possible version of the above flowchart in pseudocode form:

if(user is administrator)
    delete the file
else
    show popup "Access denied"

Notice how it's extremely free-form, yet still vaguely resembles real code. Here's a different version that looks more like real code:

if(IsAdministrator(user))
    DeleteFile(file);
else
    ShowPopup("Access denied");

There is no requirement that any of those three functions actually exist in the codebase, and there may be additional lines of code in the actual code file. The pseudocode doesn't even have to follow the same order as the real code. The point is simply to represent the flow of the program in a more precise and concise manner than words alone will allow.
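For comparison, here is one way the same logic might look as actual, runnable Python. The `User` class and the `delete_file` and `show_popup` functions are hypothetical stand-ins (a real program would check permissions through its own user model and show a real dialog); the point is how much more detail real code needs than the pseudocode does.

```python
import os
from dataclasses import dataclass


@dataclass
class User:
    # Hypothetical user model: just a name and an administrator flag.
    name: str
    is_administrator: bool


def show_popup(message: str) -> None:
    # Hypothetical stand-in for a real GUI dialog.
    print(message)


def delete_file(user: User, path: str) -> bool:
    """Delete `path` if `user` is an administrator; return True if it was deleted."""
    if user.is_administrator:
        os.remove(path)
        return True
    show_popup("Access denied")
    return False


if __name__ == "__main__":
    import tempfile

    # Create a throwaway file so the example is safe to run.
    fd, path = tempfile.mkstemp()
    os.close(fd)

    print(delete_file(User("guest", False), path))  # prints "Access denied", then False
    print(delete_file(User("root", True), path))    # prints True
    print(os.path.exists(path))                     # prints False
```

Error handling (for example, `os.remove` raising `FileNotFoundError` if the file is already gone) is deliberately left out, just as it was in the flowchart.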

Pseudocode is mainly used to show the mechanics of a piece of code without getting bogged down in implementation details. It is especially useful for describing the inner workings of an algorithm in a language-independent manner. It's also great for the initial design of a program: it maps out a plan for what the program will do and how it will do it, without having to write real code yet. Because it's free-form rather than actual code, there's no need to worry about whether it's valid. It is not generally useful to write a pseudocode version of an entire program; its main use is to model critical pieces of code in detail.

One other thing to note is that pseudocode, if written before a program is created (and even if it's written after), can be embedded in comments within the program, to help the reader understand what's going on.
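As an illustration, here is a binary search in Python where the comments are the pseudocode a programmer might have written first, with the real code filled in beneath each step. Binary search is just a familiar example chosen for this sketch, and this comment style is one option among many.

```python
def binary_search(items: list[int], target: int) -> int:
    """Return the index of `target` in the sorted list `items`, or -1 if absent."""
    # set low to the first index and high to the last index
    low, high = 0, len(items) - 1
    # while there is still a range left to search
    while low <= high:
        # look at the middle element
        mid = (low + high) // 2
        # if it's the target, we're done
        if items[mid] == target:
            return mid
        # if it's too small, search the upper half
        if items[mid] < target:
            low = mid + 1
        # otherwise, search the lower half
        else:
            high = mid - 1
    # the range is empty, so the target isn't in the list
    return -1


print(binary_search([1, 3, 5, 7, 9], 7))  # prints 3
print(binary_search([1, 3, 5, 7, 9], 4))  # prints -1
```

Reading only the comments gives the pseudocode version of the algorithm; reading only the code gives the implementation.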

Creating a model of a program

Using these (and other) methods, it's possible to build up a model of a program, either for the purposes of understanding an existing program, or to design a new one. Different techniques will work for different people: some people need things fully written out, some need to draw out a program entirely from start to finish, others visualize and skim through code, and some use combinations of these methods.

It's also possible to work bottom-up or top-down (starting from the lowest-level components and building up, or starting from the high-level structure of the program and working down towards the implementation details), or even to start from both ends and meet in the middle. As such, I can't prescribe one methodology over another. While I'm going to go over these different approaches to modeling programs, ultimately it's up to you to determine what works best for you.

It is important to note that, regardless of what approach you take, you should try to have at least a vague idea of what the program does, so that you can keep it in mind when building your model of it.

Baseline requirements

At a high level, there are two kinds of data structures and functions that any given program will employ: those that already exist, and those that are purpose-built for the program. General-purpose (non-application-specific) concepts include things like:

  • Storing and accessing files
  • Interacting with a user
  • Interacting with other programs
  • Detecting and recovering from errors
  • Data structures for effectively storing and manipulating different kinds of data

Not every program is going to use all of those concepts, but most do. Understanding some of the more common components, as well as common design patterns, makes it much easier to understand existing programs and create new ones. The topics listed above, and others, will be written about in detail in later posts.
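Many of the concepts in that list are covered by a language's standard library rather than written from scratch. In the rough Python sketch below, existing pieces do most of the work (`pathlib` for files, `json` for the data format, exceptions for error recovery), and only the small `load_settings`/`save_settings` wrappers are purpose-built. Those two function names and the default settings are hypothetical.

```python
import json
from pathlib import Path

# Hypothetical defaults for a hypothetical application.
DEFAULT_SETTINGS = {"theme": "light", "autosave": True}


def load_settings(path: Path) -> dict:
    """Load settings from a JSON file, falling back to defaults on error."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        # Recover from a missing or corrupt file instead of crashing.
        return dict(DEFAULT_SETTINGS)


def save_settings(path: Path, settings: dict) -> None:
    """Write settings to a JSON file."""
    path.write_text(json.dumps(settings, indent=2), encoding="utf-8")
```

The purpose-built part is small; recognizing the general-purpose parts quickly is what makes the application-specific part stand out.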

It's also important to have a decent grasp of the language the program is (or will be) written in. It's not necessary to fully understand the language in order to get a good idea of what a program does, but knowing nothing about its syntax means that certain nuances might go unnoticed, and certain language constructs might be confusing. The good news is that most languages have broadly similar syntax.

A note on finding (or creating) the starting point of a program: depending on the language, most programs start either in a function called main or at the first line of a code file. If the program starts at the top of a code file, hopefully the documentation is sufficient to identify which file that is. Good files to look in are ones named main or index, or a file with the same name as the program itself.
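Python, for example, runs a file from top to bottom when it's executed, and a common convention is to gather the starting logic into a `main` function that runs only when the file is executed directly rather than imported. The file name and the body of `main` in this sketch are hypothetical.

```python
# main.py: a hypothetical entry-point file
import sys


def main(argv: list[str]) -> int:
    """Program entry point; returns an exit status."""
    print(f"Started with arguments: {argv[1:]}")
    return 0


# This block runs only when the file is executed directly (python main.py),
# not when it is imported as a module by another file.
if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

When looking for where a Python program starts, searching for `if __name__ == "__main__":` is often a quick way to find it.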

Bottom-up modeling

The goal of bottom-up modeling is to start with the most basic utilities and operations, work up to application-specific components, and finally create the core of the application itself. Pure bottom-up modeling means that by the time the application logic is developed, every other piece of the system has been created.

Put another way, in bottom-up modeling, if a function or object is about to be created or read, every function and object that it depends on will already have been created or read.
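That ordering rule is a topological sort of the program's dependency graph. As a sketch, Python's standard `graphlib` module (available since Python 3.9) can produce a bottom-up reading or building order from a map of what depends on what; the component names here are hypothetical.

```python
from graphlib import TopologicalSorter

# Each (hypothetical) component maps to the set of components it depends on.
DEPENDENCIES = {
    "app": {"file_browser", "permissions"},
    "file_browser": {"file_io", "ui"},
    "permissions": {"user_store"},
    "user_store": {"file_io"},
    "ui": set(),
    "file_io": set(),
}

# static_order() yields every component after all of the components it depends on.
order = list(TopologicalSorter(DEPENDENCIES).static_order())
print(order)
```

Several orders can be valid, but `app` always comes last here, since everything else is one of its dependencies. If the map contains a cycle, `static_order()` raises `graphlib.CycleError`, which is a useful hint that pure bottom-up modeling isn't possible for that part of the program.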

The chief advantage of this approach is that every last detail is planned and understood, and any fundamental issues with the particular implementation of the design will likely become apparent very quickly. Building up a model of the codebase from the ground up has another advantage: every piece can be tested as it is built. Since no function is created before the components it depends on exist, the codebase can be compiled and tested at every point.

One disadvantage of this approach, however, is that it takes longer for the final shape of the program to emerge, because application-specific logic and functionality are the last things to be implemented. This is mainly a problem in the design phase: during development, the program isn't complete until every last function is written, regardless of the approach taken to build it. Another issue is that, when trying to model an existing program, it can be overwhelming (and is usually unnecessary) to understand every component the program is composed of.

Top-down modeling

Top-down modeling is the opposite of bottom-up modeling: starting with the high-level application logic and functionality, work down towards the implementation details. The goal is to start with an idea of what the application will do and develop functionality as it is needed.

This approach to modeling is well suited to the initial phase of design because the shape of the application takes form quickly, as application logic is built before any real functionality is implemented. High-level components and interactions are built first, making the approach generally more appropriate than bottom-up if there are many complex pieces that need to fit together. Fundamental issues with the design itself tend to stand out quickly. Another nice thing about this approach is that, when modeling an existing program, it's easier to stop once a sufficient representation has been built, instead of getting bogged down in the implementation details.
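A common way to work top-down in code is to write the high-level logic first, against stub functions whose bodies get filled in later. In this minimal Python sketch, `generate_report` is the application logic and the three helpers are placeholders; all of the names are hypothetical, and raising `NotImplementedError` is just one convention for marking unfinished pieces.

```python
def load_records(path: str) -> list[dict]:
    # Stub: reading and parsing the (hypothetical) data file comes later.
    raise NotImplementedError("load_records is not written yet")


def summarize(records: list[dict]) -> dict:
    # Stub: the actual calculations come later.
    raise NotImplementedError("summarize is not written yet")


def render(summary: dict) -> str:
    # Stub: output formatting comes later.
    raise NotImplementedError("render is not written yet")


def generate_report(path: str) -> str:
    """High-level application logic, written before any of its helpers exist."""
    records = load_records(path)
    summary = summarize(records)
    return render(summary)
```

The stubs can then be replaced with real implementations one at a time, and any call that reaches an unfinished piece fails loudly instead of silently returning wrong data.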

At the same time, with top-down modeling, it's entirely possible that assumptions about what can actually be implemented turn out to be false, rendering the entire design flawed. Implementation-level issues are likely to be discovered only when it's too late to adjust the design without overhauling or scrapping the entire thing.

Meet in the middle

Instead of starting at one end or the other, another possibility is to begin work at both ends at the same time. As high-level functionality is needed, work begins on the low-level components that are required, and development is done vertically to ensure that each high-level feature is fully supported, implemented, and tested before moving on to the next.

This approach has the advantage that the program is modeled in chunks, and each chunk is completed before moving on. Issues with any given feature will be identified reasonably quickly, though fundamental flaws may still go unnoticed until it's too late, when everything is combined with the final application logic. Because components are developed more or less in isolation, though, they are more likely to be reusable, or salvageable if a catastrophic flaw is uncovered. This approach is thus well suited to applications that have distinct and independent components that are brought together only in the later stages of development.

However, "meeting in the middle" is not as immediately straightforward as the other two approaches. While the other two approaches have clear start and end points, and work can begin instantly regardless of how fully-formed the overall design is, this one essentially requires that the entire design and general approach be known ahead of time; how can you build components in isolation, if you don't know what components there will be?
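One way to pin components down ahead of time is to agree on their interfaces before building them, so the high-level code and the low-level pieces can be developed separately and joined later. This sketch uses Python's `typing.Protocol` for the agreed interface; the `Storage` interface, the `InMemoryStorage` component, and the `archive` function are all hypothetical.

```python
from typing import Protocol


class Storage(Protocol):
    """Hypothetical agreed-upon interface between the application and its storage layer."""

    def save(self, key: str, data: bytes) -> None: ...

    def load(self, key: str) -> bytes: ...


class InMemoryStorage:
    """One low-level component, built and tested on its own."""

    def __init__(self) -> None:
        self._items: dict[str, bytes] = {}

    def save(self, key: str, data: bytes) -> None:
        self._items[key] = data

    def load(self, key: str) -> bytes:
        return self._items[key]


def archive(storage: Storage, name: str, text: str) -> bytes:
    """High-level logic that relies only on the Storage interface."""
    storage.save(name, text.encode("utf-8"))
    return storage.load(name)


print(archive(InMemoryStorage(), "notes.txt", "hello"))  # prints b'hello'
```

`InMemoryStorage` never inherits from `Storage`; it matches structurally, which static type checkers such as mypy verify. Protocols aren't checked at runtime unless they're marked `@runtime_checkable`.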

Creating your model

The important thing about all of this is to figure out what works best for you. If you find that you work best when approaching problems a certain way (and by "work best" I also mean "make what you set out to make"), don't let anyone convince you that your way is wrong. At least hear them out, though; they may have advice on easier ways to approach certain problems. But also keep in mind that you have to stay flexible and adapt to the task at hand.

Ultimately, there are two models you should make for every project you create: the internal model you use to navigate the project, and the external representation included with the codebase to give others a head start on building their own model of it. As a general note, the representation you create for others (i.e., the documentation) should usually start with a high-level, top-down overview in written form, so that others can quickly see the broad picture and decide where they want to start.