Guide to the Git Underground


    In the land of before-times, there was SVN, and it was good, but we were children then, and Linux was something we couldn’t be bothered to install on our Intel 486 skateboards. Nowadays, we have Git, and we rejoice at the numerous new ways in which we can mess things up and destroy entire repositories, or build billion-dollar empires in foreign countries using dirty, ugly, sexy code. Git is like surfing in an ocean with sharks rather than skating in the streets with preppies high on SVN, so be warned before you go crashing into the coral of “commit all” or getting snagged in the breakers of rebase hell.

    This guide basically follows the “Successful Git Branching Model” by Vincent Driessen, with detailed usage notes. While you can surely learn a lot by studying the Git Usage Docs, and from the “Successful Branching Model”, there will be times when it’s still unclear what action to take or how to resolve conflicts. This article will walk us through the normal use cases for a collaborative Github team, from the perspective of developers working on feature branches and from that of project managers who are gatekeeping, testing, and merging stuff that will go live on a production server. We will also examine some of the common pitfalls and traps you can get stuck in from not using Git properly, and what to do to get out of them gracefully.

Act I: Feature Branches and the Lost Art of Good Code

    Scene 1: We need to change all the code “widgets” into code “doodads”, ASAP

    You arrive to work on your first day at companyXYZ and your new boss describes some amazing feature he would like you to implement, as soon as possible. The first thing you do is go to the company repository on Github.com, and create your own FORK of the repository you will be making updates to. Most of the time as a developer, you will not even have access to update the company repo, so you will make changes on your own fork and make PULL REQUESTS to the company repo when you have updates that are ready to be pulled in. Even when you do have access to the company repo, most of the time you should be working in your own fork anyhow, for various reasons.

    Ok, so you have created your fork of the companyXYZ repo at github. Your username is myUsername, and let’s say for example that the name of the repo is exampleRepo. So you will now clone your own fork of the company repo:

 > git clone git@github.com:myUsername/exampleRepo proj_user_branch

    If you don’t have an SSH key linked to your github account, you can follow these instructions from Github on how to generate an SSH keypair, and how to upload your public key to github. Basically, just create an SSH keypair (with ssh-keygen) and copy/paste the public key into the little box under settings, “ssh keys”, for your account at github. Then keep your private key in your ~/.ssh folder and chmod 400 it, so that the private key file is read-only and readable only by your user account on your local machine. This will make it so you don’t have to re-type your username and password all the time, and it will make you look like a git professional who knows how to use cryptography properly.
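As a rough sketch of the keypair step, assuming OpenSSH's ssh-keygen is installed: the temporary folder, the e-mail comment, and the empty passphrase below are only for illustration. For a real key you would write into ~/.ssh and pick a passphrase.

```shell
#!/bin/sh
set -e

# Hypothetical scratch folder so the sketch does not touch your real ~/.ssh.
keydir=$(mktemp -d)

# Generate an Ed25519 SSH keypair; -N "" means "no passphrase" (demo only).
ssh-keygen -q -t ed25519 -C "myUsername@example.com" -N "" -f "$keydir/id_ed25519"

# Private key: read-only, and only for your own user account.
chmod 400 "$keydir/id_ed25519"

# The .pub file is the part you paste into the "ssh keys" box at github.
pubkey=$(cat "$keydir/id_ed25519.pub")
key_type=${pubkey%% *}
echo "public key type: $key_type"
```

The private key file (without the .pub ending) never leaves your machine.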

    Note the name of the folder we will be cloning into, repoName_userName_branchName/. I like to keep separate copies cloned into separate folders for each active branch I’m working on. The repoName_userName_branchName/ format that I use for my folders helps keep the folder listing nice and tidy. I typically keep a separate project open for each feature branch with my IDE, so that when I “git pull” or “git merge” from the command line on any branch, the IDE automatically updates to the newest code. So, to start a feature branch, clone your own fork first:

> git clone git@github.com:myUsername/exampleRepo exampleRepo_myUsername_featureName

so it will clone into a folder called exampleRepo_myUsername_featureName. Now we go into that folder:

> cd exampleRepo_myUsername_featureName

First, we want to set the upstream for this project, to the company / main user account’s fork.

> git remote add upstream git@github.com:companyXYZ/exampleRepo

With git, you have to tell it to fetch the newest code from the online repos all the time. We start out with:

> git fetch --all

If you messed up the URL for the remote upstream, it will tell you when you fetch, and you can set it right with:

> git remote set-url upstream git@github.com:companyXYZ/exampleRepo

Now we can see all the forks and branches with the commands:

> git branch -v

> git remote -v

    You want to set the default branch for your repositories to develop branch, which is a setting on the main github page for your fork, in the branch listings area. You may have to create a develop branch in the little branch listing box, on the main repo page for your fork, if it doesn’t already exist. If you are not on develop branch by default, you will be on master branch by default. You want to make sure you are on develop branch, to start your feature branch. Feature branches are always branched off from develop. If you are not already on develop, then do:

> git checkout develop

    The account you use in the “git clone” line will always be your “origin”. Notice we set a remote called “upstream” on one of the commands above, to the company repo, so we can pull down code from their version of the project. You could also set a remote to “Ryan” with the URL of git@github.com:ryan/repoName, for example, and pull or merge stuff in from Ryan’s branches, too. You can submit Pull Requests through the github website to any branch of any public or privately shared repository that you have access to. A Pull Request is a request you make for some user / company to “pull” your code updates into one of their branches.
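To see the origin / upstream split without touching github at all, here is a minimal offline sketch. The two bare repositories in a temp folder are hypothetical stand-ins for the company repo and your fork; only the git commands themselves are the ones used above.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)

# Local stand-ins (assumption): one bare repo plays companyXYZ, one plays your fork.
git init -q --bare "$work/companyXYZ.git"
git init -q --bare "$work/myUsername.git"

# Cloning your fork makes it "origin" automatically.
git clone -q "$work/myUsername.git" "$work/exampleRepo_myUsername_featureName" 2>/dev/null
cd "$work/exampleRepo_myUsername_featureName"

# The company repo has to be added by hand, as "upstream".
git remote add upstream "$work/companyXYZ.git"

git remote -v
remote_names=$(git remote | sort | xargs)
origin_url=$(git remote get-url origin)
upstream_url=$(git remote get-url upstream)
```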

    Scene 2: Branches in the Tree of Code

   Ok, blarg. We have killed our first laptop, drank some coffee with tea in it, and stolen a necessary 2nd-monitor cable from someone who was working at home. Let’s get this forking show on the road. So, “git branch -v” showed us that we are on our local develop branch, which tracks origin/develop. Let’s make sure our local develop is still up-to-date with upstream/develop, and then create a new feature branch of our own using the “checkout -b” command.

> git fetch --all
> git pull upstream develop
> git checkout -b featureBranch develop

Fig 1. Basic Feature Branch Model

Whenever you make a feature branch, it is branched from your own develop branch, which needs to be updated to match the newest version of the upstream develop branch. That is exactly what we just did to create our new FEATURE BRANCH. Let’s look at what is going on here, in Figure 1. Since we are branching from origin/develop, updated to upstream/develop, and we want to make a PULL REQUEST to upstream/develop, we will focus on just our own feature branch and upstream/develop.

Ok, so it’s your first week, and it takes you a while to get up and running. On day one, you make some changes to a couple files. You use:

> git status

to see all your changed files and you notice you are now 1 commit behind upstream/develop. Don’t worry about it. You also notice that you had to change one config file for your local system testing, but you don’t want to push that one upstream, or you will fork everything up for the dev and live servers. So, basically don’t ever do “git commit -a” or “git commit *”, because it will probably add stuff you didn’t want it to add. Just do:

> git add path/to/fileName.ext

for any changed file you want to be included in the pull request for your feature, or maybe:

> git add path/to/*.jpg

for a bunch of jpeg images, for example. Now, you commit the changes with some comments, and push your work back to your own github account repo before you leave work the first day:

> git commit -m "changed all the code widgets to code doodads."
> git push origin featureBranch
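Here is a small throwaway sketch of why adding files one by one matters. The repo lives in a temp folder, and the file names feature.txt and local.conf are made up for the example; local.conf plays the part of the config file you changed for local testing only.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
cd "$work"
git init -q .
git config user.name "Example Dev"        # identity for this demo repo only
git config user.email "dev@example.com"

echo "widgets" > feature.txt
echo "host=prod" > local.conf
git add feature.txt local.conf
git commit -q -m "initial commit"

# Day one: a real change, plus a local-only tweak we do NOT want upstream.
echo "doodads" > feature.txt
echo "host=localhost" > local.conf

# Stage only the file that belongs to the feature.
git add feature.txt
git commit -q -m "changed all the code widgets to code doodads."

committed=$(git diff-tree --no-commit-id --name-only -r HEAD)
still_modified=$(git diff --name-only)
echo "in the commit: $committed"
echo "left alone:    $still_modified"
```

With “git commit -a”, local.conf would have gone into the commit too.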

    Scene 3: We Have Decided to Merge Your Branch, Even Though It’s Not 100% Ready

    Ok, two days go by. Each day, you commit any files you changed, then push up to your own fork, at origin featureBranch. Each day, you notice your feature branch is getting not only further ahead of, but also further behind, upstream/develop. No worries! As you go, you hear the other team members talking about the newest releases that are being pushed, and also a hotfix that came down “into develop”. Your co-worker Tom was all happy because his feature branch PR got merged into develop also. Now, according to Figure 1, if we try to make a pull request from your origin/featureBranch to upstream/develop branch right now without some preparation, we could run into some issues.

    Our featureBranch doesn’t include any of the new code from Tom, or from the new releases or the hotfix that were merged into the upstream/develop branch, so if git tries to compare our current origin/featureBranch to upstream/develop, it will see a bunch of changes related to files we didn’t even touch, that are not even relevant to our feature! Also, there might even be CONFLICTS between what we touched and the new stuff from other people, and if you submit a pull request with conflicts, it will suck for the person trying to merge it upstream, and they most likely won’t even do it!

    So, we need to REBASE our featureBranch, because the upstream/develop branch has changed out from under us, and we need to pull in all the new stuff and check for conflicts before we submit a pull request to upstream/develop. Fetch, and then rebase to upstream/develop.

Figure 2. The effect of rebasing your branch

> git fetch --all
> git remote update
> git rebase upstream/develop

Let’s see another chart, Figure 2. The rebase will basically unroll all your commits and then bring your branch from develop all the way up to the latest HEAD of upstream/develop. Then, it will try to apply your commits from there, so that only YOUR changes get included from that point. It is possible that the code you touched will conflict with other people’s code at this point, and you will have to use a MERGETOOL in order to complete the rebase. Please see my other page on “Guide to Underground Git Merging”, if you have any questions on how to do that. You should be able to do:

> git rebase upstream/develop

with no conflicts reported, before you proceed from here on.  Ok, great. You have a “cleanly rebased” featureBranch, ready to be submitted for a pull request, right? Well, almost.

    If you submit your pull request now, it will contain three commits, each with your daily comments. Really, what you want is to submit a pull request with a single commit, which contains all three squished into one, and with only a single comment, and you want to update that comment. We are going to do an “interactive rebase” to patch that all up and get ready for our big pull request to upstream/develop.

> git rebase upstream/develop -i

Figure 3. The effect of the “interactive” rebase

Now you will be presented with a text file containing lines like this:

        pick “my comment from the first day commit”

        pick “my comment from the second day commit”

        pick “my comment from the third day commit”

followed by some instructions, which you should read. Basically, you want to change all but the first “pick” to “f” (short for “fixup”), which will squash all the commits together, discarding all the comments but the first one. If you also want to update the first comment, change its “pick” to “r” (short for “reword”), and git will ask you for the new comment after you close the file; just editing the comment text in this file has no effect. Now, SAVE THE FILE BEFORE YOU CLOSE IT! Otherwise, you’ll have to do the “interactive rebase” again.
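If you want to watch the squash happen without risking a real branch, here is a throwaway sketch. Everything in it is an assumption made for the demo: the temp repo, the local develop branch standing in for upstream/develop, and the GIT_SEQUENCE_EDITOR trick, which lets sed do the “pick” to “fixup” edit that you would normally do by hand in your editor.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
cd "$work"
git init -q .
git config user.name "Example Dev"        # identity for this demo repo only
git config user.email "dev@example.com"

# Name the first branch "develop" (stand-in for upstream/develop).
git symbolic-ref HEAD refs/heads/develop
echo "base" > base.txt
git add base.txt
git commit -q -m "base commit on develop"

# Three days of work on the feature branch, one commit per day.
git checkout -q -b featureBranch
for day in first second third; do
  echo "$day" > "$day.txt"
  git add "$day.txt"
  git commit -q -m "my comment from the $day day commit"
done

# Same edit as in the text: keep the first "pick", turn the rest into "fixup".
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2,\$s/^pick /fixup /'" git rebase -q -i develop

commit_count=$(git rev-list --count develop..featureBranch)
subject=$(git log -1 --format=%s)
echo "$commit_count commit left: $subject"
```

All three files are still there; only the history got squished into one commit carrying the first day’s comment.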

    You should get a line starting with “successfully rebased” as a response from git. Finally, you have to do a forced push back to your origin/featureBranch branch, after a rebase (note the “plus” sign to indicate the push is forced):

> git push origin +featureBranch
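Here is a sketch of why the push has to be forced. It uses “--force-with-lease”, a more careful relative of the “plus” sign that is not part of the workflow above: it refuses to overwrite the remote branch if somebody else pushed to it since your last fetch. The bare repo in a temp folder is a hypothetical stand-in for your fork at github.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
git init -q --bare "$work/origin.git"      # stand-in for your fork (assumption)
git clone -q "$work/origin.git" "$work/clone" 2>/dev/null
cd "$work/clone"
git config user.name "Example Dev"
git config user.email "dev@example.com"

git symbolic-ref HEAD refs/heads/featureBranch
echo "one" > file.txt
git add file.txt
git commit -q -m "first version"
git push -q origin featureBranch

# Rewrite history locally, the way a rebase or squash does.
echo "two" > file.txt
git add file.txt
git commit -q --amend -m "rewritten version"

# A plain push is refused, because the remote branch has a commit we threw away.
if git push -q origin featureBranch 2>/dev/null; then
  plain_push=accepted
else
  plain_push=rejected
fi

# The forced push goes through, but only if origin is still where we last saw it.
git push -q --force-with-lease origin featureBranch

local_tip=$(git rev-parse featureBranch)
remote_tip=$(git rev-parse origin/featureBranch)
echo "plain push was $plain_push"
```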

    You are now ready to do your pull request! You go to github.com, to your own fork, and make sure you are looking at the featureBranch branch. Now click the green button with the two circling arrows for “compare, review, or create a pull request”. You need to make sure that the two branches being compared are the ones you want. Make sure the branch on the left is companyXYZ/develop and the branch on the right is your origin/featureBranch. Make sure the changes are what you expected them to be, and that the branch can be merged automatically. Github will tell you with a big red button if it can’t be merged automatically, and that means you need to rebase again, probably, and fix any conflicts until you have a “clean” rebase with no conflicts. So, if everything is green, click “submit pull request”, and you are good to go.

Word to the wise here:  As you can see by the graphs there, the longer you wait before you rebase, the more commits from other people it will try to merge into your own. After a couple weeks, this can become a lot of conflicts or even whole files renamed / moved / deleted in architectural updates. What does this mean? Every time you notice that upstream develop has changed, you probably want to just quickly do:

> git fetch --all
> git rebase upstream/develop

and deal with any conflicts. This is a standard task that you should do every few days if you have an open feature branch, and if you do that, you shouldn’t get caught in “rebase hell”.

Act II: Congratulations, We are Going to Release your Code Instead of Firing You!

Scene 1: You Better Check Your Code Before You Wreck Your Code

    Ahhhhh, well. So you have your local tests and your IDE set up and it runs like a charm on your personal laptop. Maybe we can get a branch to be made on the upstream, company fork of the repository, and then we could log in to our develop server and git checkout that branch, and test it there, for a test-run before your PR to develop branch. Otherwise, it will go for review to develop and everyone will look at it and criticize you for all the errors that you inevitably made. You can also set up your git repo to auto-build on “continuous integration” (CI) systems like Travis CI or Jenkins CI, and if you are serious about computing power, a cloud provider like AWS, all through the github interface and little .config files and setup scripts. There are many, many CI systems to choose from, depending on which language / environment you are working under.

    Different devs have different programming styles, and this can lead to whitespace and formatting conflicts or inconsistencies in the code that can lead to confusion in a big project. The cleanest, most professional way to handle this is to actually auto-format your code with a formatter/checker (like clang-format for C++ projects) before you submit your PR for review, and your company might already have an auto-formatter ruleset or config for that. Sometimes the continuous integration will even include a script that runs cppcheck or a similar tool to check for auto-format correctness / other errors and “kick” the auto-build to return an error if it’s not right, and this can also cause your PR to develop to have a big red button saying it’s not ready. This is just a way for devs to avoid having to deal with extra little code tangles and conflicts caused by personal formatting styles, although there can also be heated arguments over the rulesets there too.

    I specifically like anything related to project charting in the style of Kanban. Don’t forget to update your task at the project management sites, like Jira or Leankit, or whatever tool your company uses, so your managers will be happy and look good to their bosses. You can check out how to auto-build your project under all sorts of languages and environments, and how to set up hooks into project management / team-integration tools, by looking at the settings page for your github repo, under “Webhooks and Services”, at:

https://github.com/myUsername/exampleRepo/settings/hooks

      So you have a feature branch, cleanly rebased, and a PR for a bunch of code that you wrote, that is known to at least build correctly, and you hope it is fully functional, but may need a little testing before it goes live in the “production” code environment. Now some other devs may look at your code, at github in your PR, and ask you to remove some old test files or change some weird log statements (or whatever) that they discover upon looking at or testing your branch. You can:

> git rm --cached path/to/filename.ext

to tell git to forget about a file (but keep it on your local disk using --cached), and then you just clean up the code on your local branch, on your local machine, then you can just:

> git add path/to/any_file_you_changed.ext
> git commit -m "removed a file, changed a couple files, bc of XYZ reason."
> git push origin featureBranch
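The “forget it, but keep it on disk” behaviour of --cached is easy to check in a throwaway repo. This is only a sketch: the temp folder and the file names app.txt and old_test.log are made up for the example.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
cd "$work"
git init -q .
git config user.name "Example Dev"        # identity for this demo repo only
git config user.email "dev@example.com"

echo "code" > app.txt
echo "debug junk" > old_test.log
git add app.txt old_test.log
git commit -q -m "feature work, plus a test file that snuck in"

# Tell git to stop tracking the file, without deleting it from disk.
git rm -q --cached old_test.log
git commit -q -m "removed a file, changed a couple files, bc of XYZ reason."

tracked=$(git ls-files)
if [ -f old_test.log ]; then on_disk=yes; else on_disk=no; fi
echo "tracked: $tracked / old_test.log still on disk: $on_disk"
```

The file will now show up as untracked in “git status”, so you may also want to list it in .gitignore.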

    You can now finish up with the interactive rebase again to clean up that commit. When you do:

> git fetch --all
> git rebase upstream/develop -i

you will see two commits, your earlier one for the PR and your new one, two lines with “pick” as the first word. You can just change the second (and any more) lines to say “f” there instead of “pick”, to squash those commits together again. Now you will do the forced push after rebase again:

> git push origin +featureBranch

This will not only push your local changes up to your branch at github, but it will also update the PR that you made from your branch to upstream develop, so that you don’t have to make a new PR every time you change some files on a branch with outstanding PRs. Github is nifty like that.

    If you fail to make regular pull requests, implementing new features or fixing bugs, on a weekly basis, it will show up in your main github account page, in the calendar log, and you might get fired! One really good way to attract hiring attention, or just a good personal goal for any project you want to accomplish, is to try and make at least one little touch per day to a public repository, to get those green marks in that github calendar and to keep that streak going!

Scene 2: Wait, let’s do a hotfix first!

Figure 4. Git Project Model

     Now your pull request has been accepted! This means it will be merged into the develop branch by a project administrator. When you are working by yourself, it’s still good to at least use a develop and master branch to keep bugs out of a working version of your code. When working on small teams, the best practice is to at least have someone besides yourself look at each PR before it gets merged, by whoever. Let’s look at the standard order of events as they might play out according to the Successful Git Branching Model, shown in a simplified form in Figure 4. We will track how your PR will move up the ladder to the right, making its way through a release branch and finally into the master branch.

    At the top left of Figure 4, we see our familiar develop and feature branches, and we see your feature branch PR that was merged into develop there. If an urgent fix to the master branch is needed, we will generally just pull a little copy of master called a “hotfix” branch, do some changes, and then send those changes up to master and into production, live on a server somewhere. If we are good git citizens, we will also send those changes back down to develop like we are supposed to, so that master and develop don’t start diverging.

    So on Monday morning, the manager tells us to get the new features up, live. First, they have their core devs or webmaster release a hotfix because of some critical bugfix or maybe a little GUI touch-up. This is how they would do that:

> git clone git@github.com:companyXYZ/exampleRepo exampleRepo_hotfix
> cd exampleRepo_hotfix
> git checkout hotfix
> git pull origin master

Notice that we just check out our existing hotfix branch here and just “pull” down the latest code from master, so that our hotfix branch is up-to-date to the master branch before we start making our little urgent / GUI tweak updates. We could also just do:

> git checkout -b newHotfix master

This would just create a newHotfix branch directly from master. If your hotfix branch has weird merge errors or unexpected file changes during the “git pull origin master”, then it may be best to just start a new hotfix branch directly from master. Now some little changes go into a few files, and we add them individually, as such:

> git add path/to/any_files_you_changed.ext
> git commit -m "hotfix updates - urgently changed such and such files"
> git push origin hotfix

    At this point, a PR is made (through the github website) from the hotfix branch to the master branch, so that someone can “sanity check” the changes in the code, and then merge those changes into master. Even when working alone, you can cleanly maintain your master this way with hotfix branches, and get extra green checkmarks on your github profile page. Once master is updated, we will log in to the production server and “git pull origin master” to update the production environment, then maybe restart some processes. Maybe this is webserver code and the webserver needs restarting after file updates, or maybe this is public software where a new package file or OS-targeted executable needs to be built. It’s even possible to automate those things to happen every time that master is updated, using git hooks and scripts.

    Every hotfix update that goes to master should also go back down to develop, to keep develop up-to-date with master. So, after the hotfix PR is merged into master, the hotfix maintainer should also make a PR from the hotfix branch to develop branch. Notice this is the only time that master gets synced “downwards” to develop, so if hotfix PRs never make it to develop, those changes will gradually accumulate here, and eventually someone will get a hotfix PR down to develop with a bunch of unexpected changes or merge conflicts that they have to deal with, and that’s not cool!

Scene 3: Faster, Faster, Release a Branch to Master!

   OK so now you have done some testing of develop branch, and everyone agrees it’s time to release the develop branch up to master. I like to just make a new folder for each release branch. Common convention is to number by Version.Release.Revision, so a major update might be a new version, each release will likely get its own release number, and any hotfixes would get their own revision number. So to create a release branch from develop branch:

> git clone git@github.com:companyXYZ/exampleRepo exampleRepo_Release-1.1
> cd exampleRepo_Release-1.1
> git checkout -b Release-1.1 develop

    You might log in to a dev/test server here and “git checkout Release-1.1” from your project folder, to give it a look before the big merge to master. You might even find some little bugs or do some final touchups to your Release-1.1 branch before it is ready to go live into master, and up onto the production server. If you do touchups to the release branch here, they are called bugfix commits. So, what will likely happen with a bugfix is that a developer makes a local copy of the release branch on their own fork of the repo:

> git clone git@github.com:myUsername/exampleRepo exampleRepo_Release-1.1
> cd exampleRepo_Release-1.1
> git remote add upstream git@github.com:companyXYZ/exampleRepo
> git checkout -b Release-1.1 develop
> git pull upstream Release-1.1

to get an up-to-date copy of the Release-1.1 branch, then they make their update / changes to some files, then:

Figure 4. Git Project Model

> git add path/to/any_files_changed.ext
> git commit -m "bugfix update comments - getting release ready to go up to master"
> git push origin Release-1.1

then they will make a PR from the Release-1.1 branch on their own fork to the company Release-1.1 branch on github.com. This is shown again here in Figure 4, to remind you of the flow; the bugfixes are shown in the release branch there.

    When all the bugfixes are done, it’s important to send a PR back down from the release branch to the develop branch and merge those bugfixes back down into develop. If those bugfixes don’t get back down to develop, you could see weird errors you already bugfixed in your develop environment, and they will also accumulate and get into a hotfix merge from master down to develop later.

    When you are satisfied with the release branch, you might want to make a PR from the release branch to the master branch in the github.com web interface. This will show you all the changes, so you can make sure nothing unexpected is going into master here, let you know about any merge conflicts to expect, and maybe trigger your CI system for a test build. If there are no unexpected changes, and no conflicts, a project admin may choose to just merge this PR directly through the github.com web interface.

    It is possible that you might have a PR from a release branch to master that will have merge conflicts that git can’t automatically resolve. This does not necessarily indicate that anything is broken, it just means you will have to merge the release branch into master by hand. As always, I like to keep a folder just for my master branch. So to merge a release branch into master:

> git clone git@github.com:companyXYZ/exampleRepo exampleRepo_master
> cd exampleRepo_master
> git checkout master

or just go into your already existing exampleRepo_master folder, and ‘git fetch --all && git pull origin master’ to make sure it is up-to-date. Then merge the release branch as it exists on origin:

> git merge --no-ff origin/Release-1.1

    This can lead to git merge conflicts which you will have to resolve by hand. Check out my “Guide to Underground Git Merging” for details on how to handle that; this is a standard merge use case. (If there are no conflicts, git creates the merge commit on its own and there is nothing left to commit by hand.) So, once you have finished merging the files:

> git status

will show you all the files that have been changed by this merge and which are ready to go into a commit, in green. Now:

> git commit -m "merged Release-1.1 into master"
> git push origin master
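To see what “--no-ff” actually buys you, here is a conflict-free sketch in a throwaway repo. The temp folder and file contents are assumptions for the demo, and the release branch is a local one here rather than one fetched from github. Even though master could simply fast-forward, git records a real merge commit with two parents, so the release stays visible in the history.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
cd "$work"
git init -q .
git config user.name "Example Dev"        # identity for this demo repo only
git config user.email "dev@example.com"

git symbolic-ref HEAD refs/heads/master
echo "v1.0" > app.txt
git add app.txt
git commit -q -m "release 1.0"

# The release branch picks up one bugfix commit.
git checkout -q -b Release-1.1
echo "v1.1" > app.txt
git add app.txt
git commit -q -m "bugfix update"

# Merge into master, forcing a merge commit even though a fast-forward is possible.
git checkout -q master
git merge -q --no-ff -m "merged Release-1.1 into master" Release-1.1

parent_count=$(git log -1 --format=%P | wc -w | tr -d ' ')
subject=$(git log -1 --format=%s)
content=$(cat app.txt)
echo "$subject ($parent_count parents)"
```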

    This will bypass the github.com PR from Release-1.1 to master, and it will close automatically. We have now updated master with the new release branch code! This marks a successful release of code upwards from dev to master.

One final thing, let’s go ahead and tag this release.

> git tag Release-1.1
> git push --tags
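A small variation you might prefer, sketched here against a local bare repo standing in for github (an assumption for the demo): an annotated tag made with “-a” carries its own message, author, and date, and pushing the tag by name sends up just that one tag instead of every tag you have lying around locally.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
git init -q --bare "$work/origin.git"      # stand-in for the company repo (assumption)
git clone -q "$work/origin.git" "$work/exampleRepo_master" 2>/dev/null
cd "$work/exampleRepo_master"
git config user.name "Example Dev"
git config user.email "dev@example.com"

git symbolic-ref HEAD refs/heads/master
echo "v1.1" > app.txt
git add app.txt
git commit -q -m "merged Release-1.1 into master"
git push -q origin master

# Annotated tag: stored as its own object, with a message.
git tag -a Release-1.1 -m "Release 1.1"

# Push only this tag.
git push -q origin Release-1.1

tag_type=$(git cat-file -t Release-1.1)
remote_tags=$(git ls-remote --tags origin)
echo "$remote_tags"
```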