We were introduced to version control systems and Git in the third week. Below is a short description of VCS, Git, and the basic terminology used in Git.
A version control system (VCS) allows us to track the history of a collection of files. It supports creating different versions of this collection. Each version captures a snapshot of the files at a certain point in time and the VCS allows you to switch between these versions. These versions are stored in a specific place, typically called a repository.
VCSs are typically used to track changes in text files. These text files can be, for example, source code for a programming language, HTML, or configuration files. Version control systems are not limited to text files, however; they can also handle other types of files. For example, you may use a VCS to track the different versions of a PNG file.
Distributed Version Control Systems
In a distributed version control system, each user has a complete local copy of a repository on their own computer. The user can copy an existing repository. This copying process is typically called cloning, and the resulting repository can be referred to as a clone.
Every clone contains the full history of the collection of files and a cloned repository has the same functionality as the original repository.
Git is currently the most popular implementation of a distributed version control system.
Git originates from Linux kernel development and was created in 2005 by Linus Torvalds. Nowadays it is used by many popular open source projects, e.g., the Android and Eclipse developer teams, as well as by many commercial organizations.
The core of Git was originally written in the programming language C, but Git has also been re-implemented in other languages, e.g., Java, Ruby and Python.
A Git repository contains the history of a collection of files starting from a certain directory. The process of copying an existing Git repository via the Git tooling is called cloning. After cloning a repository, the user has the complete repository with its history on their local machine.
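To make cloning concrete, here is a minimal sketch, assuming Git is already installed. It builds a tiny repository in a throwaway directory and clones it locally. The names origin_repo, clone_repo, and readme.txt, and the author details, are made up for illustration.

```shell
# Create a small source repository with one commit (all names are illustrative).
work=$(mktemp -d)
git init -q "$work/origin_repo"
cd "$work/origin_repo"
git config user.name "Example Author"
git config user.email "author@example.com"
echo "hello" > readme.txt
git add readme.txt
git commit -q -m "Add readme"

# Clone it: the clone receives the files and the full history.
cd "$work"
git clone -q origin_repo clone_repo

# Both repositories report the same latest commit ID.
origin_head=$(git -C origin_repo rev-parse HEAD)
clone_head=$(git -C clone_repo rev-parse HEAD)
echo "origin: $origin_head"
echo "clone:  $clone_head"
```

Because the clone carries the whole history, running git log inside clone_repo shows the same commit as in origin_repo.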
A local repository provides at least one collection of files which originate from a certain version of the repository. This collection of files is called the working tree.
A file in the working tree of a Git repository can have different states. These states are the following:
untracked: the file is not tracked by the Git repository. This means that the file was never staged or committed.
tracked: committed and not staged
staged: staged to be included in the next commit
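These states can be observed with the short form of git status. The following is a small sketch, assuming Git is installed and using a throwaway directory. In the short format, "??" marks an untracked file and "A" in the first column marks a file newly added to the staging area.

```shell
# Work in a fresh, throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "first" > x.txt
echo "second" > y.txt

# Both files are untracked at this point: each is listed with "??".
git status --short

# Stage only x.txt: it is now listed with "A", while y.txt stays "??".
git add x.txt
status_after_add=$(git status --short)
echo "$status_after_add"
```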
Adding to a Git repository through staging and committing
After modifying your working tree you need to perform the following two steps to persist these changes in your local repository:
add the selected changes to the staging area (also known as the index) via the git add command
commit the staged changes into the Git repository via the git commit command
The git add command stores a snapshot of the specified files in the staging area. It allows you to incrementally modify files, stage them, modify and stage them again until you are satisfied with your changes.
After adding the selected files to the staging area, you can commit these files to add them permanently to the Git repository. Committing creates a new persistent snapshot of the staging area in the Git repository.
The staging area keeps track of the snapshots of the files until the staged changes are committed.
To commit the staged changes, use the git commit command.
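The stage, modify, re-stage, and commit cycle described above can be sketched as follows. This assumes Git is installed. The author details and the file contents are placeholders, and the repository lives in a throwaway directory.

```shell
# Work in a fresh, throwaway repository with placeholder author details.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo "draft" > x.txt
git add x.txt                 # stage a first snapshot of x.txt
echo "final" > x.txt
git add x.txt                 # stage again: the index now holds the newer snapshot

git commit -q -m "Add x.txt"  # persist the staged snapshot in the repository

# The committed version is the snapshot that was staged last.
committed=$(git show HEAD:x.txt)
echo "$committed"
```

Note that a commit records what is in the staging area, not what is in the working tree: a change made after the last git add would not be part of the commit.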
Installation of Git and its usage
Since we are using Ubuntu, install the Git command line tool via the following command:
sudo apt-get install git
Now let's start using Git.
1. Create a directory for the repository and move into it.
2. Initialize Git using the command
git init
Here we have initialized Git inside the directory git-blog, and you can see the .git directory that was created as a result. From now on, Git can track the files created inside git-blog.
3. Let's create a file named x.txt. Once the file is created, use the command
git hash-object x.txt
The value returned by the git hash-object command is the ID of the blob object for the corresponding file. The command takes some data and gives back the unique key that refers to that data object. When run with the -w option, it also stores the data in our .git/objects directory (the object database).
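The difference between computing a blob ID and storing the blob can be checked with a short sketch like the one below. It assumes Git is installed and uses a throwaway repository. git cat-file -e exits successfully only if the given object exists in the object database.

```shell
# Work in a fresh, throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo "some content" > x.txt

# Without -w, git hash-object only computes the ID.
blob_id=$(git hash-object x.txt)
if git cat-file -e "$blob_id" 2>/dev/null; then stored_before=yes; else stored_before=no; fi

# With -w, the blob is also written to .git/objects.
git hash-object -w x.txt > /dev/null
if git cat-file -e "$blob_id" 2>/dev/null; then stored_after=yes; else stored_after=no; fi

echo "blob id: $blob_id"
echo "stored before -w: $stored_before, after -w: $stored_after"
```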
4. Since Git is content-addressable, files with the same contents share the same blob ID. See the example given below.
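As a quick check of content addressing, the sketch below creates two files with identical contents and a third with different contents. It assumes Git is installed. The file z.txt and all file contents are made up for illustration.

```shell
# Work in a fresh, throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "same content" > x.txt
echo "same content" > y.txt
echo "different content" > z.txt

# The ID depends only on the contents, not on the file name.
id_x=$(git hash-object x.txt)
id_y=$(git hash-object y.txt)
id_z=$(git hash-object z.txt)

echo "x.txt: $id_x"
echo "y.txt: $id_y"
echo "z.txt: $id_z"
```

The first two IDs are identical, while the third differs.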
5. Now let's check the status of the two files created above with git status.
6. The two files are in the untracked state. In order to have Git track these files, use the command
git add file-name
Here the files x.txt and y.txt are moved to the staged state.
7. Let's commit these files with git commit in order to store them permanently in the repository.
Once the files are committed, Git has recorded each file's mode (permissions) along with its contents. Check git status again: you can see the message that the working tree is clean and there is nothing to commit, which means that all the files inside the directory git-blog are tracked by Git and have no uncommitted changes.
8. In order to see the commits in the current repository, use the command
git log
The above command lists all the commits. Here we have only one commit to list. Each entry shows, among other things: a) the commit ID, b) the author (who made the commit), and c) the commit message.
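A small sketch of inspecting the log, assuming Git is installed: it makes one commit in a throwaway repository and then prints the log twice, once in the default layout and once with a custom format. The author details and commit message are placeholders.

```shell
# Work in a fresh, throwaway repository with placeholder author details.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo "one" > x.txt
git add x.txt
git commit -q -m "Add x.txt"

# Default layout: commit ID, author, date, and message.
git --no-pager log

# Custom layout: %H = full commit ID, %an = author name, %s = subject line.
summary=$(git log -1 --format='%H | %an | %s')
echo "$summary"
```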
9. Let's see what is kept inside the directory .git/objects. Here I use the find command.
Here three types of objects are listed. Their types and contents can be shown separately with the commands
git cat-file -t object-id    # -t returns the type of the object
git cat-file -p object-id    # -p stands for pretty-print and returns the contents of the object
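The following sketch looks up one object of each type after a single commit and asks git cat-file for its type. It assumes Git is installed. HEAD^{tree} is Git's notation for the tree object that a commit points to, and the author details are placeholders.

```shell
# Work in a fresh, throwaway repository with placeholder author details.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo "hello" > x.txt
git add x.txt
git commit -q -m "Add x.txt"

# Collect the ID of one object of each type.
blob_id=$(git hash-object x.txt)
tree_id=$(git rev-parse 'HEAD^{tree}')
commit_id=$(git rev-parse HEAD)

# -t prints the type of each object.
for id in "$blob_id" "$tree_id" "$commit_id"; do
    echo "$id is a $(git cat-file -t "$id")"
done

# -p pretty-prints the contents; for a blob this is the file content.
git cat-file -p "$blob_id"
```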
Three important Git objects
git hash-object takes some data and gives back the unique key (a 40-character checksum hash) that refers to that data object, known as a blob. With the -w option, it also stores the data in our .git/objects directory (the object database).
The tree solves the problem of storing the filename and also allows you to store a group of files together. Git stores content in a manner similar to a UNIX filesystem, but a bit simplified. All the content is stored as tree and blob objects, with trees corresponding to UNIX directory entries and blobs corresponding more or less to inodes or file contents. A single tree object contains one or more entries, each of which is the SHA-1 hash of a blob or subtree with its associated mode, type, and filename. The screenshot given above illustrates some examples.
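To see the entries of a tree object directly, here is a sketch, assuming Git is installed, that commits one file and one subdirectory and then pretty-prints the top-level tree. The docs directory, the file names, and the author details are made up for illustration.

```shell
# Work in a fresh, throwaway repository with placeholder author details.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

mkdir docs
echo "hello" > x.txt
echo "notes" > docs/notes.txt
git add .
git commit -q -m "Add files"

# Each entry shows: mode, type, object ID, and name.
# The file appears as a blob, the subdirectory as a nested tree.
tree_listing=$(git cat-file -p 'HEAD^{tree}')
echo "$tree_listing"
```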
A commit object (short: commit) represents a version of all files tracked in the repository at the time the commit was created. Commits know their parent(s) and in this way capture the version history of the repository.
This commit object is addressable via a hash (SHA-1 checksum). This hash is calculated from the contents of the commit object: the tree it points to (and therefore the content of the files and directories), its parent commit(s) (and therefore the complete history up to the new commit), the author, the committer, and the commit message.
The commit object points to the individual files in this commit via a tree object. The files are stored in the Git repository as blob objects and might be packed by Git for better performance and more compact storage.
A Git commit object is identified by its hash (SHA-1 checksum). SHA-1 produces a 160-bit (20-byte) hash value. A SHA-1 hash value is typically rendered as a hexadecimal number, 40 digits long. Objects that did not change between commits are reused by multiple commits.
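The points about commit IDs, parents, and object reuse can be checked with a sketch like this one. It assumes Git is installed and that the repository uses the default SHA-1 object format. File names, contents, and author details are placeholders.

```shell
# Work in a fresh, throwaway repository with placeholder author details.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

# First commit: two files.
echo "stable" > x.txt
echo "v1" > y.txt
git add .
git commit -q -m "First"
first_commit=$(git rev-parse HEAD)
x_before=$(git rev-parse HEAD:x.txt)

# Second commit: only y.txt changes.
echo "v2" > y.txt
git add y.txt
git commit -q -m "Second"
commit_id=$(git rev-parse HEAD)
parent_id=$(git rev-parse 'HEAD^')
x_after=$(git rev-parse HEAD:x.txt)

echo "commit id: $commit_id (${#commit_id} hex digits)"
echo "parent id: $parent_id"
echo "x.txt blob in first and second commit: $x_before / $x_after"
```

The second commit records the first one as its parent, and the unchanged x.txt is represented by the same blob in both commits.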
A simple pictorial representation of the blobs, trees, and commits for two files, test.txt and new.txt.
All the examples discussed above cover only Git as used on our local system. Please refer to the Pro Git book on git-scm.com to learn more about Git.
I hope you now have a basic grounding in Git.
I will be back with another topic, so stay tuned.