07 Oct 2021 - tsp
Last update 07 Oct 2021
First off - what is everything in the context of this blog entry, and what is a version control system? And who is this article targeted at? It’s not targeted at the experienced software developer who already manages their code using git or SVN; for them it will be boring and sound somewhat strange. It’s targeted at people who currently don’t use SCM (source code management) for any task. By everything I mean stuff like:
What I don’t mean are large binary files such as your media or photo collection, temporary files that can easily be regenerated later, and large databases or scraped and extracted data that can be regenerated.
What is version control? Version control systems (sometimes also called revision
control, source control or source code management systems) allow one to manage
collections of files, each in different versions, either centrally or decentrally. Imagine
you change something in your computer program’s source code or in your thesis
and want to look into the old version later on. Often one sees people calling
and so on. And then shifting around the files on external storage devices such as
external hard disks or USB flash drives, often with colliding names, and
later on overwriting much of their new work, not finding the most current version,
not being able to locate comments, etc. Version control systems solve that problem, including
the moving around on USB sticks. They usually provide a blame feature that
can show who changed what and when in case one is working in a team. And they usually
allow for seamless collaboration by including merge tools - if many people modify
the same file at different positions, these tools are usually able to merge the
differences automatically (if proper file formats are used) or at least highlight merge
conflicts. And you never lose any old content - so think about what you put inside
a repository: usually, if everything goes right, nothing will ever be deleted, and
most systems do not even support that without major hacking around in their internals.
As already mentioned, they have mainly been developed for software development, but the problem of revision management is as old as writing itself - and these systems can be applied very efficiently to all textual content. In fact this web page is built out of a source control system.
There exist two main models for source control (and only two really popular software packages, though there is a huge number of different tools out there).
First there are centralized version control systems. These are built around a
central repository that’s usually hosted on a server that’s reachable on the
network or via the internet. A typical representative is Subversion (SVN).
One creates a repository on the server (one should do automated backups there) and
then checks out (copies) the version or branch one requires from the server
using the `svn` tool. Changes are stored locally and then committed (pushed
back onto the server) into the central storage. One only stores the working copy
in one fixed version locally. The main advantages of a centralized version control
system are that one only checks out a given version or a given subset of the
project and that one is able to perform centralized rule checking and centralized linting of
the commits. To use SVN one usually only needs to know 3 different commands:
Checkout creates a new copy of a centralized repository, or a subset of it, in a given revision. This is usually the first operation one ever performs after creating a repository on the server side.
Update pulls the most current version of the content from the server into the local working copy.
Commit pushes local changes into the remote repository - if there is a conflict that cannot be resolved automatically, the commit fails and one can perform a local merge of the changes before trying again.
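The three commands above can be sketched with a throwaway repository on the local disk. This is only a minimal sketch: it assumes the `svn` and `svnadmin` tools are installed, the paths and the file name `notes.txt` are hypothetical, and a `file://` URL stands in for a real server. Note that new files additionally have to be registered once with `svn add` before they can be committed.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch directory; a real repository would live on a server.
work=$(mktemp -d)
svnadmin create "$work/repo"
url="file://$work/repo"

# Checkout: create two independent working copies of the same repository.
svn checkout --quiet "$url" "$work/wc-a"
svn checkout --quiet "$url" "$work/wc-b"

# Commit: register a new file in the first working copy and send it to the repository.
cd "$work/wc-a"
printf 'Lab notes\n' > notes.txt
svn add --quiet notes.txt
svn commit --quiet -m "Add lab notes"

# Update: the second working copy fetches the newest revision.
svn update --quiet "$work/wc-b"
cat "$work/wc-b/notes.txt"
```

With a real server only the URL changes; the sequence of checkout, commit and update stays the same.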
In addition SVN also supports locking and unlocking resources so one can negotiate
who modifies which resources, but usually this is not needed. Another operation
that one might need is Revert, which resets a file to the revision
previously stored, discarding any newer local changes. The
blame utility helps to find out who changed what and when.
Then there are distributed version control systems such as the really popular Git (note that this is not directly related to the well-known GitHub hosting service, though that’s a really easy starting point for newcomers) or the less well-known, older darcs. Git provides the ability to run in distributed mode by keeping its own complete local repository including all versions - but it also allows one to synchronize with remote ones like in the centralized case. This makes using git a little bit more cumbersome and harder to think about than using SVN - but for source code in the open source environment it’s currently more popular than SVN due to its distributed nature. You can simply take the whole repository with you offline, and you have a complete copy (this solves the backup problem if you simply clone / pull the repositories on different machines and keep them in sync).
To use git one requires at least the following commands:
`clone` is similar to checkout in SVN. It copies a remote repository - but in contrast to SVN it copies everything, including all old revisions and branches. Later on, when one uses nested repositories, one will see that it only clones them recursively when instructed to, but this is nothing a beginner will usually have to worry about. It also adds the remote to the new local repository.
`pull` fetches the latest version from the registered remote and integrates the latest changes into the local repository. Note that any changes to local files should be committed first, or the pull will fail in case there is a conflict, to prevent data loss. In case the commit chains differ, the system will try to merge them automatically.
`push` uploads all local commits to the remote repository.
`add` adds files to the local staging area. Staged data will be included in the next local commit.
`mv` moves or renames a file; this should also be done through the git utility so that the move is added to the staging area.
`checkout` can be used to revert local changes that have not yet been committed.
`commit` creates a new commit / revision in the commit hierarchy from all staged changes. A commit can also be signed using OpenPGP to prove the identity of the author even when using some untrusted repository storage.
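To see how these commands fit together, here is a minimal sketch of one round trip between two clones. Everything named in it is an assumption for illustration: the directories `laptop` and `desktop`, the file `thesis.md`, the author identity, and a bare repository on the local disk that stands in for a remote such as GitHub.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch locations; in real use the remote would live on a server.
work=$(mktemp -d)
git init --quiet --bare "$work/remote.git"

# clone: copy the (still empty) remote and register it as "origin".
git clone --quiet "$work/remote.git" "$work/laptop"
cd "$work/laptop"
git config user.name "Example Author"
git config user.email "author@example.org"

# add + commit: stage a file, then record it as a new revision.
printf 'First draft\n' > thesis.md
git add thesis.md
git commit --quiet -m "Add first draft of thesis"

# mv: rename through git so that the rename is staged as well.
git mv thesis.md thesis-draft.md
git commit --quiet -m "Rename thesis file"

# checkout: throw away a local change that has not been committed yet.
printf 'Accidental edit\n' >> thesis-draft.md
git checkout -- thesis-draft.md

# push: upload all local commits to the remote.
git push --quiet origin HEAD

# A second machine clones once ...
git clone --quiet "$work/remote.git" "$work/desktop"

# ... and later uses pull to catch up with new work pushed from the laptop.
printf 'Second paragraph\n' >> thesis-draft.md
git commit --quiet -am "Add second paragraph"
git push --quiet origin HEAD
git -C "$work/desktop" pull --quiet --ff-only
```

After the last command both clones and the remote hold the same three commits, so each of them is a full backup of the history.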
The previously mentioned GitHub service is a nice external storage solution for your git repositories if they are either public or should be shared only with a small number of collaborators or a small group.
Previously I’ve written a short git cheat-sheet
that should provide a nice summary on how to do common stuff using git.
It is really worth it, and unlike centralized systems it does not require one to
perform proper server administration for the central repository.
git when using a central remote such as
GitLab instance). No more guessing which USB stick now has your current version. Just always push your changes to your remotes. And you can use multiple remotes to increase reliability.
GitHub it even formats your Markdown documents nicely, which is useful for documenting stuff - one can of course also build a full-blown wiki solution on top of version control if this is really required, but if you build your lab book around Markdown that’s pretty efficient.
tag. For example if you have a pre-print for your paper or something that you handed in you can simply tag it to identify it later on. This is also done for software when a version is released into testing or to the general public. For software one can also decide which commits form a given release to include only partial features, etc.
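Tagging a handed-in version and pushing to more than one remote could look roughly like the following. It is a sketch under assumptions: the tag name `preprint-v1`, the file `paper.md`, the author identity and the remote names `backup-a` and `backup-b` are all hypothetical, and two bare repositories on the local disk stand in for real remotes.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch repository for a paper.
work=$(mktemp -d)
git init --quiet "$work/paper"
cd "$work/paper"
git config user.name "Example Author"
git config user.email "author@example.org"

# Commit the version that was handed in and mark it with an annotated tag.
printf 'Preprint text\n' > paper.md
git add paper.md
git commit --quiet -m "Version handed in as pre-print"
git tag -a preprint-v1 -m "Pre-print as submitted"

# Work continues after the submission.
printf 'Revised text\n' > paper.md
git commit --quiet -am "Revise after feedback"

# The tagged version can still be looked up by name at any time.
git show preprint-v1:paper.md

# Multiple remotes: push the current branch and all tags to each of them.
git init --quiet --bare "$work/backup-a.git"
git init --quiet --bare "$work/backup-b.git"
git remote add backup-a "$work/backup-a.git"
git remote add backup-b "$work/backup-b.git"
for remote in backup-a backup-b; do
    git push --quiet --tags "$remote" HEAD
done
```

The `git show` line prints the file as it was at the tag, even though the working copy has moved on since.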