Clean Sensitive Files from a Git Repo

Tuesday, May 7, 2019

By: Chris Dunn

Have you ever committed sensitive, or at least unwanted, files to a Git repo?  You thought the file was ignored or unstaged, and then you pushed your changes.  That's one problem with the nice GUI Git desktop clients and IDE integrations: it's easy to just push the button without thinking.  I find this happens most often at the beginning of a project, before you're fully thinking about the details of source code management.

So we accidentally commit a password.txt file or a configuration file config.json with password information inside.  Deleting the file in a follow-up commit doesn't help, because the old version stays in the history.  You could use git filter-branch to rewrite the history of your project, but that can be very slow on a large repo.  Luckily, this is a common enough problem that Roberto Tyley came up with a good solution.  It's called BFG Repo-Cleaner, and its documentation describes it as 10 to 720x faster than filter-branch.
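To see why a history rewrite is needed at all, here is a minimal sketch using only plain git in a throwaway directory. The file name config.json and the value FIRSTPWD are just the example names from this post, and the demo identity is made up. It commits a secret, "fixes" it by deleting the file, and then shows that the secret can still be recovered.

```shell
#!/bin/sh
set -e

# Throwaway demo repo in a temp directory (nothing here touches a real project).
demo=$(mktemp -d)
cd "$demo"
git init -q .
# Demo identity, set locally so the commits work on any machine.
git config user.email "demo@example.com"
git config user.name "Demo"

# Commit a file containing a secret, then "fix" it by deleting the file.
echo '{"password": "FIRSTPWD"}' > config.json
git add config.json
git commit -q -m "Add config"
git rm -q config.json
git commit -q -m "Remove config"

# The file is gone from the working tree...
if [ ! -e config.json ]; then
    echo "config.json is gone from the working tree"
fi

# ...but every commit that touched it is still in the history.
commits_touching_file=$(git log --all --oneline -- config.json | wc -l | tr -d ' ')
echo "commits touching config.json: $commits_touching_file"

# And the old content can still be read straight out of the first commit.
first_commit=$(git rev-list --max-parents=0 HEAD)
recovered=$(git show "$first_commit:config.json")
echo "recovered: $recovered"
```

Anyone with a clone can run that last git show, which is why the file has to be removed from the history itself.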

This utility has lots of options so I will cover only two that I have used personally.  I encourage you to explore the documentation for the full list (https://github.com/rtyley/bfg-repo-cleaner).

1. Clone Mirror of Repo

First, grab a mirror clone of the repo you want to change. A mirror clone is a bare repository: it holds the complete Git history but no checked-out working files, which is all these commands need.

 
$ git clone --mirror https://github.com/user/cool-repo.git
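If you want to see what a mirror clone looks like before trying it on a real project, the sketch below runs the same command against a small local stand-in repo instead of GitHub. The names source-repo and README are made up for the demo.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
cd "$work"

# Stand-in for the remote: a small local repo with one commit.
git init -q source-repo
git -C source-repo config user.email "demo@example.com"
git -C source-repo config user.name "Demo"
echo "hello" > source-repo/README
git -C source-repo add README
git -C source-repo commit -q -m "Initial commit"

# Same command as above, pointed at the local stand-in.
git clone -q --mirror source-repo cool-repo.git

# A mirror clone is a bare repository...
is_bare=$(git -C cool-repo.git rev-parse --is-bare-repository)
echo "bare: $is_bare"

# ...so the directory holds git's own data (HEAD, objects, refs) and no README.
ls cool-repo.git
```

The README never appears in cool-repo.git as a file, but its content is still in the object store, which is the part the later steps rewrite.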

2. Execute BFG command(s)

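The commands are pretty straightforward. The first option, --delete-files, removes every file with the given name (in this case config.json) from the repo's history. By default BFG leaves the contents of your latest commit untouched, so make sure the file is already deleted there with a normal commit before you run it.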

 
$ bfg --delete-files config.json  cool-repo.git

The --replace-text option searches the contents of all files in the history and replaces certain text with other text. For this command, the passwords.txt file contains the list of text to replace.

 
$ bfg --replace-text passwords.txt  cool-repo.git

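The passwords.txt file lists, one entry per line, all the text (in this case passwords) to replace and, optionally, what to replace it with. A line with only the text gets the default replacement, ***REMOVED***, as in the first example below; a line of the form text==>replacement specifies the replacement directly, as in the last two. You can also prefix a line with regex: to match a regular expression, which gets a bit fancier and saves you some typing if you have a lot of changes. The # lines in the example are annotations for the reader; leave them out of your real file unless you've confirmed that your BFG version ignores them.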

#Change text to ***REMOVED***
FIRSTPWD
#Change text to NEWPWD
SECONDPWD==>NEWPWD
#Change text to empty text
THIRDPWD==>
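One way to create that file from the command line is a quoted heredoc, which stops the shell from expanding anything inside it. This is only a sketch: the passwords are the placeholder values from this post, the annotation lines are left out, and the last line is the regex: form as I understand it from the BFG documentation, so check it against the version you are running.

```shell
#!/bin/sh
set -e

# Work in a temp directory so the secrets list never lands inside a repo.
cd "$(mktemp -d)"

# 'EOF' is quoted, so the shell writes the lines exactly as typed.
cat > passwords.txt <<'EOF'
FIRSTPWD
SECONDPWD==>NEWPWD
THIRDPWD==>
regex:password=\w+==>password=
EOF

# Quick sanity check before handing the file to bfg --replace-text.
line_count=$(wc -l < passwords.txt | tr -d ' ')
explicit_replacements=$(grep -c '==>' passwords.txt)
echo "$line_count entries, $explicit_replacements with an explicit replacement"
```

Keep this file outside the repo you are cleaning and delete it when you are done, since it is itself a list of the secrets.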

3. Repo Housecleaning

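After you've made the changes and removed the files, the unwanted data is no longer part of the rewritten history, but it is still stored in the repo.  Git's built-in housekeeping command, git gc, will "Cleanup unnecessary files and optimize the local repository."  Expiring the reflog first drops the last references to the old commits so that git gc can actually prune them.  Here is some more info if you're unfamiliar with the command: https://git-scm.com/docs/git-gc.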

 
$ cd cool-repo.git
$ git reflog expire --expire=now --all && git gc --prune=now --aggressive
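Before pushing, it's worth checking that the secret really is gone. The sketch below defines a small helper, secret_in_history (a name made up for this post), around git's -S "pickaxe" search, and tries it on a throwaway repo where a password was only scrubbed in the latest commit. It's a quick check under those assumptions, not a full audit.

```shell
#!/bin/sh
set -e

# secret_in_history REPO TEXT
# Succeeds if any commit reachable from any ref changed the number of
# occurrences of TEXT (git's "pickaxe" search, -S).
secret_in_history() {
    [ -n "$(git -C "$1" log --all --oneline -S"$2")" ]
}

# Throwaway repo: commit a password, then overwrite it in a later commit.
demo=$(mktemp -d)
git init -q "$demo"
git -C "$demo" config user.email "demo@example.com"
git -C "$demo" config user.name "Demo"
echo "password=FIRSTPWD" > "$demo/settings.txt"
git -C "$demo" add settings.txt
git -C "$demo" commit -q -m "Add settings"
echo "password=changeme" > "$demo/settings.txt"
git -C "$demo" commit -q -a -m "Scrub password in latest commit only"

# The old password is still findable because the history was never rewritten.
if secret_in_history "$demo" "FIRSTPWD"; then
    found_old="yes"
else
    found_old="no"
fi
echo "FIRSTPWD still in history: $found_old"

# A string that was never committed is not reported.
if secret_in_history "$demo" "NOSUCHPWD"; then
    found_other="yes"
else
    found_other="no"
fi
echo "NOSUCHPWD in history: $found_other"
```

On the cleaned mirror you would run secret_in_history cool-repo.git FIRSTPWD and expect it to fail, meaning nothing was found.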

4. Push Repo

When you're all done, go ahead and push your changes. Because the history has been rewritten, anyone on your team should throw away their old clone and clone the repo again; pushing from an old clone can put the removed data right back.

 
$ git push
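A plain git push with no arguments is enough here because a mirror clone is configured to push every ref. The sketch below shows that on local stand-in repos (seed, origin.git and the topic branch are made-up names): it deletes a branch in the mirror as a stand-in for a history change, pushes, and confirms the "remote" followed along.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
cd "$work"

# Build a local bare repo to play the role of the GitHub remote.
git init -q seed
git -C seed config user.email "demo@example.com"
git -C seed config user.name "Demo"
echo "hello" > seed/README
git -C seed add README
git -C seed commit -q -m "Initial commit"
git -C seed branch topic
git clone -q --bare seed origin.git

# Mirror-clone it, as in step 1.
git clone -q --mirror origin.git cool-repo.git

# --mirror records that pushes should mirror all refs.
mirror_flag=$(git -C cool-repo.git config --get remote.origin.mirror)
echo "remote.origin.mirror = $mirror_flag"

# Stand-in for a history change: remove a branch in the mirror, then push.
git -C cool-repo.git branch -q -D topic
git -C cool-repo.git push -q

# The remote now matches the mirror, including the deleted branch.
if git -C origin.git show-ref --verify --quiet refs/heads/topic; then
    topic_on_origin="present"
else
    topic_on_origin="absent"
fi
echo "topic branch on origin: $topic_on_origin"
```

The same mechanism is what replaces the old commits on the real remote, so double check the mirror before you push.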

There you have it, a nice addition to your developer toolbox.  I hope this post not only gives you a fallback in case you commit files you should have kept private, but also makes you double-check your commits from the start.

Tags: git utility

Copyright 2019 Cidean, LLC. All rights reserved.
