Data structures and variables

Data structures

Data structures are rooted in the notion that we can define consistent formats for representing and accessing data. Data structures do not have to be, and in fact most of the time are not, true-to-life representations of an object. For example, in a program that keeps track of all of the books in a library, the following might be a sufficient data structure to describe books:

    Title (string)
    Author (string)
    Length (number)
    ISBN (number)
    Available (boolean)

When a programmer defines a data structure like this in a high-level language, they are telling the computer how it should store and retrieve the properties of an object. I will not go into detail in this post about how computer memory works, but suffice it to say that values stored in memory only have meaning (to a person or to a program) when accessed and used correctly; thus, the programmer has to tell the computer how they want to reference values in memory so that the program functions correctly.

The terms "property" and "field" are generally used to refer to the elements of a data structure. A data structure defines the shape of an object; an instance of that structure must be created in order to store and manipulate data.
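As a minimal sketch, the book structure above could be written in Python as a dataclass. The field types are one reading of the list above (Python has no single "number" type, so whole numbers are assumed for Length and ISBN), and the sample values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Book:
    """The shape of a book record: field names and their types."""
    title: str
    author: str
    length: int      # assumed to be a whole number (e.g. a page or word count)
    isbn: int        # assumed numeric, as in the list above
    available: bool


# The class only describes the shape; an instance holds actual data.
example_book = Book(
    title="An Example Title",   # made-up sample values
    author="Some Author",
    length=320,
    isbn=1234567890,
    available=True,
)

print(example_book.title)      # reads one field of the instance
print(example_book.available)
```

Defining Book stores nothing by itself; data only exists once an instance such as example_book has been created.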


Variables

A variable is simply a name for a particular piece of data. For example:

variable MyName = "ninjattic"
variable ScreenWidthInPixels = 1920
variable ScreenHeightInPixels = 1080
variable WearsGlasses = true

These four lines declare four variables and assign each a value. In most languages, variables must be declared before being used; declaring a variable simply tells the computer that you want a new name to associate a value with. Some languages have special keywords, like let, var, or my, that must be used when declaring a variable, similar to the variable keyword in the example above. Statically typed languages also need to know a variable's type when it is declared, although many can infer it from the assigned value; dynamically typed languages do not. Some languages put the type before the name (and generally have no declaration keyword), like this: string MyString; others put the type after the name, usually with a colon separating the name and the type and a declaration keyword before the name, like this: var MyString: string.

Once declared, a variable may be used freely within its scope. Variable names must be unique within the scope they are declared in; scope is the region of code in which an identifier (such as a variable name) is valid.
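To make declaration and scope concrete, here is a small Python sketch. Python is one of the languages that needs no declaration keyword and does not require a type; the annotations below are optional hints, written after the name with a colon. The function name describe_screen and the variable label are hypothetical, chosen only for this illustration.

```python
# Declared at the top level of the file: visible throughout the module.
screen_width_in_pixels: int = 1920
my_name: str = "ninjattic"


def describe_screen() -> str:
    # 'label' is declared inside the function, so its scope is the
    # function body; the name does not exist outside of it.
    label: str = "width"
    return f"{label}: {screen_width_in_pixels}"


print(describe_screen())  # prints "width: 1920"

try:
    print(label)  # 'label' is out of scope here
except NameError:
    print("label is not defined outside describe_screen")
```

The function can read screen_width_in_pixels because the module-level scope encloses it, while label disappears as soon as the function returns.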

Before continuing, I should introduce syntax highlighting. If you've ever looked at a programmer's screen while they are coding and noticed that it's nice and colorful, that's because of the joys of syntax highlighting. Basically, the editor that the programmer is using understands the syntax of the language being used, and displays different components in different colors. This helps the programmer easily tell parts of their code apart, and also spot certain mistakes very easily. Rewriting the above with syntax highlighting, you get this (note, I manually write up the HTML that gives code samples color on my site; sometimes it's a little tedious, but it's not too bad):

variable MyName = "ninjattic"
variable ScreenWidthInPixels = 1920
variable ScreenHeightInPixels = 1080
variable WearsGlasses = true

So much easier to read, isn't it?

Anyway, back to variables. Variables allow you to reference the same value or piece of data multiple times without having to re-type it. Proper use of variables results in very readable and maintainable code. Consider the following two samples of code, which compute the same total, but with one being much easier to make sense of than the other:

int TotalPixelCount = 1920 * 1080


int ScreenWidthInPixels = 1920
int ScreenHeightInPixels = 1080
int TotalPixelCount = ScreenWidthInPixels * ScreenHeightInPixels

It may seem like a trivial example (which it is), and like it's more work to declare the two extra variables, but it's generally much better to have named variables instead of raw numbers. The computer will typically not be doing any more work because of the extra variables; a variable, as I said, is simply a name for a value, so in simple cases like the one above, a compiler will usually just replace the variable name with the value and perform the calculation.

Kinds of variables

Generally speaking, there are three kinds of variables: scalars, arrays, and objects.

A scalar is a variable that has a single value. The variables in the above examples are scalars.

An array is an ordered list of values that are stored next to each other in memory. Arrays usually have a fixed length. Each element of an array can be accessed using the index of that element. In most programming languages, array indexing starts at 0; this is something that frequently trips people up. The reason for this is beyond the scope of this post, but I will cover it eventually. In most programming languages, an element is accessed by writing the name of the array variable followed by square brackets [ ] with the index between them. For example, to access the 3rd element of the array Messages, one would write Messages[2] (keeping in mind that the first element is at index 0, the second at index 1, and the third at index 2).
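A short Python sketch of zero-based indexing. Python's built-in list is used here as a stand-in for an array (unlike the fixed-length arrays described above, it can grow), and the contents of messages are made up.

```python
# A Python list standing in for the Messages array; contents are made up.
messages = ["hello", "how are you?", "goodbye"]

first_message = messages[0]   # index 0 is the 1st element
third_message = messages[2]   # index 2 is the 3rd element

print(first_message)   # prints "hello"
print(third_message)   # prints "goodbye"
```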

The last kind of variable is an object. An object is an instance of a data structure. The dot operator . is generally used to access its fields. For example, with a variable my_book that is an instance of the Book data structure defined above, one could set the ISBN field like so: my_book.ISBN = 1234567890. It is also common for the keyword new to be used when creating an instance of a data structure. A complete example, setting all of the fields on an instance of Book:

Book my_book = new Book
my_book.ISBN = 1234567890
my_book.Title = "What is a program?"
my_book.Author = "ninjattic"
my_book.Length = 6500
my_book.Available = true

Going back to arrays for a moment: in many languages, the length of an array can be accessed using the property length; to get the total number of messages in the Messages array from earlier, one would write int number_of_messages = Messages.length. This is because in those languages, arrays are actually objects.
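Not every language spells this as a length property. In Python, used here only as an example, the length comes from the built-in len function instead. Because indexing starts at 0, the last valid index is always one less than the length. The list contents are made up.

```python
# A Python list standing in for the Messages array; contents are made up.
messages = ["hello", "how are you?", "goodbye"]

number_of_messages = len(messages)       # 3 elements
last_index = number_of_messages - 1      # indexes run from 0 to 2
last_message = messages[last_index]

print(number_of_messages)  # prints 3
print(last_message)        # prints "goodbye"
```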

Suggested next reading: Conditional statements