thinkingincode.ninja

Version control

The core idea of version control is simple: let someone take snapshots of content at particular points in time; this is the "version" in "version control". The core motivation is equally simple: if you mess up, you want to be able to restore your content to a state it was in previously; this is the "control" in "version control". These descriptions are slightly vague because I'm going to start simple and work my way towards modern version control systems. I'll begin with the simplest and least obvious form of version control there is, one which I can guarantee you've used, and which I would argue is one of the most critical.

The undo button

Yes, the undo button. Present in practically every application that involves editing or creating content (as well as many other kinds of applications), the undo button is something that's generally taken for granted. The undo button allows the user to revert their most recent change; many apps keep a history of changes, allowing you to undo multiple changes in a row. A lot of apps also have a redo button, which lets you revert one or more undo operations (but generally only if you don't make any changes between the undo and redo operations).
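To make the undo/redo behaviour concrete, here is a minimal sketch in Python. It assumes the simplest possible design, two stacks of whole-document snapshots; the `History` class and its method names are made up for illustration, and real applications often store individual changes rather than full copies.

```python
class History:
    """Toy undo/redo history over whole-document snapshots (illustrative only)."""

    def __init__(self, initial=""):
        self.current = initial
        self._undo = []  # earlier states we can step back to
        self._redo = []  # undone states we can step forward to

    def edit(self, new_content):
        """Record a change; any pending redo steps are discarded."""
        self._undo.append(self.current)
        self.current = new_content
        self._redo.clear()

    def undo(self):
        """Step back one change, if there is one."""
        if self._undo:
            self._redo.append(self.current)
            self.current = self._undo.pop()
        return self.current

    def redo(self):
        """Re-apply the most recently undone change, if there is one."""
        if self._redo:
            self._undo.append(self.current)
            self.current = self._redo.pop()
        return self.current
```

Starting from `History("a")`, after `edit("ab")` and `edit("abc")`, calling `undo()` returns `"ab"` and `redo()` returns `"abc"` again. If you undo and then make a new edit, `redo()` does nothing, which matches the "only if you don't make any changes in between" rule.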

I would argue that the undo button (and yes, I know it's usually triggered by a keyboard shortcut rather than an actual button) is one of the most critical forms of version control for two reasons: without it, simple mistakes couldn't be easily corrected, and it's practically universally available. The ability to immediately undo simple mistakes without even leaving the application you're in is a pretty big deal. Without an undo button, the only real option is to make one or two changes, make a copy of the content, and repeat, loading previous copies of the file as needed to "undo" changes (more on this in a minute).

From this point forward, I will be talking about version control in the context of files, because that's what people generally mean when they refer to version control. Further, even if I mention "files" plural or even a "project", all of the following forms of version control work for individual files as well.

Duplicating files

The simplest way to ensure you have a backup of something is to take a full copy of it. Usually, this involves naming the copy with a timestamp or an incrementing version number at the end. The chief advantage of creating a copy is that it is the fastest way to get a snapshot of a project at a particular point in time. However, every copy takes up as much space as the original, and it's very hard to determine what changed between one copy and the next. Creating a copy is fine for one-off cases and single-document use cases, but should be avoided when working on projects.

One of the key issues with using duplication as a means of version control, aside from the above issues, is that the history of the file or project is not actually tied to the object itself. This means that just having the project itself is not enough to see its history. Further, there's no guarantee that the contents of the copy have not been changed, which means you could be looking at an incorrect historical version.

Modern version control systems

Specialized programs have been created to let programmers manage their code via version control. The most well-known are Git, Mercurial, Subversion, and Microsoft's Team Foundation Server (TFS). Though they take different approaches to the task of version control, they all have the notion of different authors, and they all support the notion of "remote repositories". A repository is the collection of files or objects managed by a version control system; a local repository is a repository on your computer, while a remote repository is a repository on another computer that your local repository can be synced with. Your code repository can thus be duplicated to a remote system. The difference between this duplication and copying the project folder is that syncing to a remote is a way for the version control system to back up the repository and let multiple people work on the project at once, whereas copying the folder is itself the entire version control system.

Version control systems (VCS for short) have several key characteristics and features. The most critical is that changes to multiple files can be "committed" at once; a "commit" takes a set of changes (along with a message summarizing them) and adds it to the permanent history of the repository. Any changes committed to a repository can be recovered, while uncommitted changes cannot. Different VCSs have different ways of handling multiple people changing the same file at the same time; suffice it to say that it can be done. Another key feature is that a VCS lets you view an entire code repository as it existed at any point in its commit history. If desired, you can also revert files, or even the entire repository, to a previous state. This is useful after a catastrophic failure, where something has gone so wrong that the only way to get the project working again is to blow it all away and step back in time. You don't need to do this if you just want to see what's changed; VCSs also let you compare versions of a file to see what changed, who changed it, and when. You can also compare arbitrary commits to see what changed between various points in history.
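The ideas of committing, viewing history, and comparing commits can be sketched with a toy in-memory model. This is only an illustration under big simplifying assumptions: the `ToyRepo` class is hypothetical, it stores a full copy of every file per commit, and it is not how Git or any other real VCS stores data.

```python
import difflib


class ToyRepo:
    """In-memory model of a commit history (illustrative, not a real VCS)."""

    def __init__(self):
        # Each commit is a tuple: (author, message, {filename: text}).
        self.commits = []

    def commit(self, author, message, files):
        """Record the state of several files at once; returns the commit id."""
        # Copy the dict so later edits to `files` cannot alter history.
        self.commits.append((author, message, dict(files)))
        return len(self.commits) - 1  # id is the position in history

    def checkout(self, commit_id):
        """Return every file as it existed at the given commit."""
        return dict(self.commits[commit_id][2])

    def log(self):
        """List (id, author, message) for every commit, oldest first."""
        return [(i, a, m) for i, (a, m, _) in enumerate(self.commits)]

    def diff(self, old_id, new_id, filename):
        """Line-by-line comparison of one file between two commits."""
        old = self.commits[old_id][2].get(filename, "").splitlines()
        new = self.commits[new_id][2].get(filename, "").splitlines()
        return list(difflib.unified_diff(old, new, lineterm=""))
```

With two commits to a file, `checkout(0)` gives back the earlier contents, `log()` shows who committed what, and `diff(0, 1, "notes.txt")` lists the added and removed lines, prefixed with `+` and `-`.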

Now, there are two things that should be kept in mind when dealing with a VCS. One, never, ever, ever put your passwords or other secret information in a repository if there is any chance that the repository will be publicly available (such as a public repository on GitHub). Once a change has been committed to a repository, it's hard to expunge the data; it's usually possible, but it involves rewriting the project's history. Once the change has been uploaded to a remote repository, rewriting history like that causes problems for anyone who has pulled down a copy, and there's no guarantee that everyone who pulled down the repository will pull it down again after you've expunged the data. In other words, once a password is on the Internet where someone else may have pulled it down, you should consider it compromised.
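One common way to keep a secret out of a repository is to read it from the environment at run time instead of writing it into a tracked file. Here is a minimal sketch; the `load_password` function and the `APP_PASSWORD` variable name are placeholders I've picked for illustration.

```python
import os


def load_password(var_name="APP_PASSWORD"):
    """Fetch a secret from the environment rather than from tracked code."""
    password = os.environ.get(var_name)
    if password is None:
        # Fail loudly instead of falling back to a hard-coded default.
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return password
```

The code that gets committed only ever contains the name of the variable, never its value.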

The other thing to keep in mind is that you should only commit changes that are complete, even if the project itself is not. Just as you wouldn't generally stop working on a paper and save and close it in the middle of a sentence, it doesn't generally make sense to commit code that's half-written. This isn't to say that you can't commit anything until the whole project is done. What I mean is that if you have a half-finished line of code, or there's a syntax error, or you know that the code won't even compile, you probably shouldn't commit that code. It clutters up the repository history, but the bigger issue is that, ideally, someone could pull down the latest version of the repository at any point in time and have a functioning system, even if that system doesn't yet have all of the desired functionality.
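A very rough way to catch the "won't even compile" case before committing is to check that the code at least parses. The sketch below does this for Python source using the standard `ast` module; the `is_committable` name is hypothetical, and passing this check says nothing about whether the code actually works.

```python
import ast


def is_committable(source):
    """Return True if `source` at least parses as Python (a bare-minimum check)."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True
```

A finished line such as `x = 1` passes, while a half-written one such as `x = (1 +` does not.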

Proper use of version control when working on projects is vital, because it allows programmers to document the history of a project and leverage that history in various ways. It isn't necessary to set up a remote repository and replicate changes to another computer; even just using a VCS locally is better than nothing. Using version control is a good habit to get into, especially if you plan on working on larger projects with multiple people involved.