Introduction to Git

Git is version control software: it tracks the state of files (typically code) over time, version by version. GitLab is the web frontend and remote Git host that we use.

Basic structure

Git tracks the state of files within a repository (repo). Each repo has a .git/ folder at its root, which stores all of the history and metadata for the repo.
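As a small illustration, the sketch below creates an empty repo in a throwaway directory and shows the .git/ folder at its root. The temporary location is an arbitrary choice for the example.

```shell
# Create an empty repo in a throwaway directory (the location is arbitrary).
repo=$(mktemp -d)
cd "$repo"
git init -q

# The only entry in the new folder is .git/, which holds the repo's data.
ls -A

# Ask git where the metadata folder is, relative to the current directory.
git rev-parse --git-dir
```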

Commits

Git ultimately stores everything as commits. Each commit records the state of the repo's files at one point in time, plus a reference to the commit(s) that came before it. In practice a commit is usually viewed as a set of changes to one or more files relative to the previous commit: for text files (including code), Git shows these as lists of lines removed and added; for binary files (such as images), there is no line-by-line view, only the fact that the file changed.

The commit history of a repo contains the full history of every tracked file. Every file addition, change, and deletion is present in the history, complete with commit message, author, timestamp, and other metadata. This makes it possible to trace exactly how each file changed over time.
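To make this concrete, here is a minimal sketch that adds, changes, and then deletes a file, and afterwards reads its history back. The file name notes.txt, the commit messages, and the user identity are placeholders chosen for the example.

```shell
# Throwaway repo with a placeholder identity for the commits.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Addition
echo "first line" > notes.txt
git add notes.txt
git commit -q -m "Add notes file"

# Change
echo "second line" >> notes.txt
git add notes.txt
git commit -q -m "Extend notes"

# Deletion
git rm -q notes.txt
git commit -q -m "Remove notes"

# All three commits appear in the file's history, even though the file is gone.
# Columns: short hash | author | date | message
git log --format='%h | %an | %ad | %s' -- notes.txt

# Show the lines added by the second commit.
git show HEAD~1
```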

Branches

Work in Git is organized into branches, which isolate lines of work so that each can be developed independently. Each branch consists of any number of commits; the current state of the branch is the cumulative result of all commits in the branch. Each commit can be (and generally is) present in multiple branches, as branches get merged over time. The most recent commit on a branch at a given time is referred to as the branch's "head".
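The sketch below shows one commit being shared by two branches while each branch keeps its own head. The branch name feature and the file names are made up for the example; the name of the initial branch depends on the local Git configuration, so it is read back rather than assumed.

```shell
# Throwaway repo with a placeholder identity for the commits.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "shared" > shared.txt
git add shared.txt
git commit -q -m "Add shared file"
base=$(git symbolic-ref --short HEAD)   # whatever git named the initial branch
first=$(git rev-parse HEAD)             # hash of the first commit

# Create a second branch and commit to it; the initial branch is unaffected.
git checkout -q -b feature
echo "isolated work" > feature.txt
git add feature.txt
git commit -q -m "Add feature file"

# The first commit is present in both branches.
git branch --contains "$first"

# Each branch name resolves to its own head commit (two different hashes).
git rev-parse "$base" feature
```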

Local vs remote

Each Git repo is simply a set of files on disk, but the repo can exist on multiple computers, and Git can sync commits and branches back and forth between them. The most typical setup is a single central server maintaining the state (such as GitLab), with which all individual machines sync their changes as needed. Git distinguishes between "local" (the state of the repo on your own machine) and "remote" (the instance(s) of the repo on another machine, accessed over the network).

Other setups are possible, but in most situations it is fine to think simply in terms of "the local" repo on your personal machine and "the remote" repo stored on the GitLab instance. By default, Git refers to the remote repo as origin; that can be reconfigured, but there's rarely much reason to do so.
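A quick way to see the local/remote relationship is to list the remotes a repo knows about. In this sketch a bare repo in a temporary folder stands in for the GitLab-hosted remote, which is an assumption made so the example runs without a network connection.

```shell
# A bare repo in a temporary folder stands in for the remote on GitLab.
remote=$(mktemp -d)
git init -q --bare "$remote"

# Cloning creates the local repo and records where it came from.
# (git warns that the cloned repo is empty, which is expected here.)
work=$(mktemp -d)
git clone -q "$remote" "$work"
cd "$work"

# List configured remotes with their URLs; the clone source is named origin.
git remote -v
```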

Common commands

Git performs actions through commands. On the command line these generally take the form git <command> <arguments>; GUI clients generally provide the same functionality through buttons and menus.

pull

pull fetches new commits from the remote into the local repo and merges any remote changes into the currently active branch.

push

push uploads any local commits to the remote repo.

add

add prepares individual files for inclusion in the next commit. Files that have been added are considered "staged": at the moment of adding, each file's current state is compared to its state in the previous commit, and the difference is added to the staged changes. Edits made after that point are not staged until the file is added again.

commit

commit bundles all of the currently staged changes into a new commit in the local repo.
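The add and commit steps can be followed with git status, as in the sketch below. The file name report.txt and the user identity are placeholders for the example.

```shell
# Throwaway repo with a placeholder identity for the commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "draft" > report.txt
git status --short        # "?? report.txt": untracked, not yet staged

git add report.txt
git status --short        # "A  report.txt": staged for the next commit
git diff --cached         # the staged changes themselves

git commit -q -m "Add report"
git status --short        # no output: nothing left to commit
```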

checkout

checkout is used to switch from one branch to another. Any files that are tracked as part of the repo will be changed to their state at the head of the other branch (any untracked files that git doesn't interact with will be untouched).
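The following sketch switches between two branches and shows that a tracked file follows the branch while an untracked file is left alone. The branch name experiment and the file names are invented for the example, and the -b flag (create the branch and switch to it in one step) is used only to keep the setup short.

```shell
# Throwaway repo with a placeholder identity for the commits.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "stable" > config.txt
git add config.txt
git commit -q -m "Add config"
base=$(git symbolic-ref --short HEAD)   # whatever git named the initial branch

# Create a branch, switch to it, and change the tracked file there.
git checkout -q -b experiment
echo "experimental" > config.txt
git add config.txt
git commit -q -m "Try experimental config"

# This file is never added, so git does not track it.
echo "scratch" > notes.tmp

# Switch back: the tracked file reverts, the untracked file stays.
git checkout -q "$base"
cat config.txt            # prints: stable
ls notes.tmp              # still present
```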

merge

merge applies the commits from another branch to the currently active branch. If the same file has been edited by different commits in a way that can't be resolved automatically, a "merge conflict" occurs: a human needs to decide what the final state should be and commit that to the repo to resolve the conflict. A very common command is git merge origin/sandbox, which merges the sandbox branch from the remote (origin) repo into the current local branch. Note that origin/sandbox is the local copy of that remote branch as of the last time commits were fetched from the remote (for example with git fetch), so fetch first to merge the remote's latest state.
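Here is a minimal sketch of a merge that resolves automatically because the two branches touch different files. A local branch named sandbox stands in for origin/sandbox so that the example runs without a network connection; the file names and identity are placeholders.

```shell
# Throwaway repo with a placeholder identity for the commits.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "start" > base.txt
git add base.txt
git commit -q -m "Initial commit"
base=$(git symbolic-ref --short HEAD)   # whatever git named the initial branch

# Work on the sandbox branch.
git checkout -q -b sandbox
echo "sandbox work" > sandbox.txt
git add sandbox.txt
git commit -q -m "Add sandbox file"

# Meanwhile, unrelated work lands on the original branch.
git checkout -q "$base"
echo "other work" > other.txt
git add other.txt
git commit -q -m "Add other file"

# Merge sandbox into the current branch; --no-edit accepts the default message.
git merge -q --no-edit sandbox

ls                          # now includes sandbox.txt alongside the other files
git log --oneline --graph   # the two lines of work joined by a merge commit
```

Had both branches edited the same lines of the same file, the merge command would instead stop and report a conflict for a human to resolve.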

clone

clone creates the local folder for a repo in the first place, by copying the entire repo down from the remote.
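The sketch below ties clone, push, and pull together in one round trip. Two temporary folders play the role of two separate machines and a bare repo stands in for the remote on GitLab; these stand-ins, the file names, and the identities are all assumptions made so the example runs locally.

```shell
# A bare repo in a temporary folder stands in for the remote on GitLab.
remote=$(mktemp -d)
git init -q --bare "$remote"

# First machine: clone the (still empty) repo, commit, and push.
first=$(mktemp -d)
git clone -q "$remote" "$first"
cd "$first"
git config user.name "First User"
git config user.email "first@example.com"
echo "hello" > first.txt
git add first.txt
git commit -q -m "Add first file"
branch=$(git symbolic-ref --short HEAD)   # whatever git named the initial branch
git push -q origin HEAD                   # HEAD means the currently active branch

# Second machine: the clone already contains first.txt; commit and push more.
second=$(mktemp -d)
git clone -q "$remote" "$second"
cd "$second"
git config user.name "Second User"
git config user.email "second@example.com"
echo "world" > second.txt
git add second.txt
git commit -q -m "Add second file"
git push -q origin HEAD

# Back on the first machine: pull brings in the second machine's commit.
cd "$first"
git pull -q origin "$branch"
ls                                        # first.txt and second.txt
```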