Completely removing files from a Git repository

The commands are as follows:

## Note: on Windows, use double quotes instead of single quotes
git filter-branch --index-filter 'git rm -r --cached --ignore-unmatch path/to/your/file' HEAD
git push origin master --force
rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --prune=now
git gc --aggressive --prune=now

For details, see this article: http://help.github.com/remove-sensitive-data/

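Also note that when several people work on the repository, you need to keep others from committing the file again (for example, a file that was not previously listed in .gitignore and has since been modified). Everyone must either run the commands above, except for the push, or make a fresh clone.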

 

Permanently deleting files from Git to save space


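When using a version control tool, we sometimes run into this problem: a file that should never have been versioned gets added by accident. Sometimes the file is very large. The whole repository might be only a few hundred KB, yet this useless file adds several hundred MB. I don't want the repository to balloon because of such junk, which makes later maintenance and backups inconvenient. At other times a sensitive file is added by accident, such as a text file containing a credit card password.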

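In that case we want to delete the file from the repository permanently and without a trace: it should not only disappear from the version history, but the space it occupies should also be released.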

In svn, the approach is to dump the entire repository, filter it, and load it back. In Git, the following method does the job:

First, create a test repository, commit a 10 MB file to it, and then delete the file:

  $ mkdir t
  $ cd t
  $ git init
  Initialized empty Git repository in /Users/apple/t/.git/
  $ dd if=/dev/urandom of=testme.txt bs=10240 count=1024
  1024+0 records in
  1024+0 records out
  10485760 bytes transferred in 1.684808 secs (6223712 bytes/sec)
  $ git add testme.txt
  $ git commit -m "a"
  [master (root-commit)]: created 6fbb432: "a"
   1 files changed, 0 insertions(+), 0 deletions(-)
   create mode 100644 testme.txt
  $ git rm testme.txt
  rm 'testme.txt'
  $ git commit -m r
  [master]: created bb38396: "r"
   1 files changed, 0 insertions(+), 0 deletions(-)
   delete mode 100644 testme.txt
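Now let's check the size of the repository:
  $ du -hs
   10M    .
Clearly, although testme.txt has been deleted, Git still stores it in the repository because it once existed in the version history, so it can be restored from that history later.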
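
One way to confirm that a deleted file's contents are still stored is to list every blob reachable from any ref along with its size. The following is a minimal sketch, assuming a throwaway repository in a temporary directory; the file name big.bin and the identity settings are made up for the illustration.

```shell
set -e
# Hypothetical scratch repository, created only for this demonstration.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# Commit a 1 MiB file, then delete it in a second commit.
dd if=/dev/urandom of=big.bin bs=1024 count=1024 2>/dev/null
git add big.bin
git commit -q -m "add big.bin"
git rm -q big.bin
git commit -q -m "remove big.bin"

# List all objects reachable from any ref with their type, size and path,
# keep only the blobs, and show the largest one.
largest=$(git rev-list --objects --all \
  | git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' \
  | awk '$1 == "blob"' \
  | sort -k2 -n \
  | tail -n 1)
echo "$largest"
```

The file is gone from the working tree, yet the listing still reports a 1048576-byte blob for big.bin, because the first commit keeps it reachable.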

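But I really don't want an empty repository to take up 10 MB of my precious disk space, so I want to delete the file entirely. This calls for Git's filter-branch command. See the documentation for its full usage; here is how it is used in this example: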

  $ git filter-branch --tree-filter 'rm -f testme.txt' HEAD
  Rewrite bb383961a2d13e12d92be5f5e5d37491a90dee66 (2/2)
  Ref 'refs/heads/master' was rewritten
  $ git ls-remote .
  230b8d53e2a6d5669165eed55579b64dccd78d11        HEAD
  230b8d53e2a6d5669165eed55579b64dccd78d11        refs/heads/master
  bb383961a2d13e12d92be5f5e5d37491a90dee66        refs/original/refs/heads/master
  $ git update-ref -d refs/original/refs/heads/master bb383961a2d13e12d92be5f5e5d37491a90dee66
  $ git ls-remote .
  230b8d53e2a6d5669165eed55579b64dccd78d11        HEAD
  230b8d53e2a6d5669165eed55579b64dccd78d11        refs/heads/master
  $ rm -rf .git/logs
  $ git reflog --all
  $ git prune
  $ git gc
  $ du -hs
   84K    .

 

OK, the file is now completely gone, and the repository no longer takes up the extra space.
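
Besides comparing du output, one can check directly whether the file's blob still exists in the object database. The sketch below repeats the steps on a smaller scale in a temporary repository. The identity settings and the 1 MiB size are assumptions for the demonstration, and FILTER_BRANCH_SQUELCH_WARNING only silences the advisory that newer Git versions print before running filter-branch.

```shell
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the advisory pause in newer Git
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

dd if=/dev/urandom of=testme.txt bs=1024 count=1024 2>/dev/null
git add testme.txt
git commit -q -m "a"
blob=$(git rev-parse HEAD:testme.txt)    # object ID of the file's contents
git rm -q testme.txt
git commit -q -m "r"
branch=$(git symbolic-ref --short HEAD)  # master or main, depending on configuration

# Rewrite history without the file.
git filter-branch --force --tree-filter 'rm -f testme.txt' HEAD >/dev/null 2>&1

# Drop the backup ref and the reflog entries that still reference the old
# commits, then garbage-collect unreachable objects immediately.
git update-ref -d "refs/original/refs/heads/$branch"
git reflog expire --expire=now --all
git gc -q --prune=now

# cat-file -e exits non-zero when the object no longer exists.
if git cat-file -e "$blob" 2>/dev/null; then status=present; else status=gone; fi
echo "$status"
```

If the backup ref or the reflog entries were left in place, the blob would remain reachable and the final check would report it as present.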

Remove sensitive data

From time to time users accidentally commit data like passwords or keys into a git repo. While you can use git rm to remove the file, it will still be in the repo's history. Fortunately, git makes it fairly simple to remove the file from the entire repo history.

Change your password

This step should be blatantly obvious, but some users still skip it. If you committed a password, change it! If you committed a key, generate a new one.

Once the commit has been pushed you should consider the data to be compromised.

Purge the file from your repo

Now that the password is changed, you want to remove the file from history and add it to the .gitignore to ensure it is not accidentally re-committed. For our examples, we're going to remove Rakefile from the GitHub gem repo.

$ git clone https://github.com/defunkt/github-gem.git
# Initialized empty Git repository in /Users/tekkub/tmp/github-gem/.git/
# remote: Counting objects: 1301, done.
# remote: Compressing objects: 100% (769/769), done.
# remote: Total 1301 (delta 724), reused 910 (delta 522)
# Receiving objects: 100% (1301/1301), 164.39 KiB, done.
# Resolving deltas: 100% (724/724), done.

$ cd github-gem

$ git filter-branch --force --index-filter 'git rm --cached --ignore-unmatch Rakefile' \
  --prune-empty --tag-name-filter cat -- --all
# Rewrite 48dc599c80e20527ed902928085e7861e6b3cbe6 (266/266)
# Ref 'refs/heads/master' was rewritten

This command will run through the entire history of every branch and tag, changing any commit that involved the file Rakefile, and any commits afterwards. Commits that are empty afterwards (because they only changed the Rakefile) are removed entirely. Now that we've erased the file from history, let's ensure that we don't accidentally commit it again.

Please note that this will overwrite your existing tags.

$ echo "Rakefile" >> .gitignore

$ git add .gitignore

$ git commit -m "Add Rakefile to .gitignore"
# [master 051452f] Add Rakefile to .gitignore
#  1 files changed, 1 insertions(+), 0 deletions(-)

This would be a good time to double-check that you've removed everything that you wanted to from the history. If you're happy with the state of the repo, you need to force-push the changes to overwrite the remote repo.
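
A quick way to double-check is to ask Git whether any commit on a branch or tag still touches the file. The sketch below uses a throwaway repository with a hypothetical secret.txt. It limits the search to --branches and --tags on purpose: filter-branch keeps backups under refs/original/, and --all would still find the old commits through them.

```shell
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the advisory pause in newer Git
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# Hypothetical history: a secret committed alongside a normal file.
echo "hunter2" > secret.txt
echo "hello" > README
git add secret.txt README
git commit -q -m "initial commit"
echo "more" >> README
git commit -q -a -m "update README"

git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch secret.txt' \
  --prune-empty --tag-name-filter cat -- --all >/dev/null 2>&1

# Commits on any branch or tag that touch the file (should be empty).
remaining_commits=$(git log --oneline --branches --tags -- secret.txt)
# Number of reachable objects recorded under that path (should be 0).
remaining_paths=$(git rev-list --objects --branches --tags | grep -c ' secret.txt$' || true)
echo "commits: [$remaining_commits] paths: $remaining_paths"
```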

$ git push origin master --force
# Counting objects: 1074, done.
# Delta compression using 2 threads.
# Compressing objects: 100% (677/677), done.
# Writing objects: 100% (1058/1058), 148.85 KiB, done.
# Total 1058 (delta 590), reused 602 (delta 378)
# To https://github.com/defunkt/github-gem.git
#  + 48dc599...051452f master -> master (forced update)

You will need to run this for every branch and tag that was changed. The --all and --tags flags may help make that easier.
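
As a rough illustration of those flags, the sketch below force-pushes every rewritten branch and tag in two commands. It runs against a local bare repository standing in for the real remote; the paths, the tag name v1, and the file names are assumptions made for the demonstration. The two flags are passed in separate invocations because git push does not accept --all together with --tags in every version.

```shell
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the advisory pause in newer Git
work=$(mktemp -d)
git init -q --bare "$work/remote.git"    # stand-in for the real remote
git init -q "$work/local"
cd "$work/local"
git config user.email "demo@example.com"
git config user.name "Demo"
git remote add origin "$work/remote.git"

echo "hunter2" > secret.txt
echo "hello" > README
git add secret.txt README
git commit -q -m "initial commit"
git tag v1
branch=$(git symbolic-ref --short HEAD)
git push -q origin --all
git push -q origin --tags

git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch secret.txt' \
  --prune-empty --tag-name-filter cat -- --all >/dev/null 2>&1

# Overwrite every branch, then every tag, on the remote.
git push -q origin --force --all
git push -q origin --force --tags

local_tag=$(git rev-parse v1)
remote_tag=$(git ls-remote origin refs/tags/v1 | cut -f1)
local_head=$(git rev-parse HEAD)
remote_head=$(git ls-remote origin "refs/heads/$branch" | cut -f1)
```

After both pushes, the remote branch and tag point at the rewritten commits rather than the original ones.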

Cleanup and reclaiming space

While git filter-branch rewrites the history for you, the objects will remain in your local repo until they've been dereferenced and garbage collected. If you are working in your main repo you might want to force these objects to be purged.

$ rm -rf .git/refs/original/

$ git reflog expire --expire=now --all

$ git gc --prune=now
# Counting objects: 2437, done.
# Delta compression using up to 4 threads.
# Compressing objects: 100% (1378/1378), done.
# Writing objects: 100% (2437/2437), done.
# Total 2437 (delta 1461), reused 1802 (delta 1048)

$ git gc --aggressive --prune=now
# Counting objects: 2437, done.
# Delta compression using up to 4 threads.
# Compressing objects: 100% (2426/2426), done.
# Writing objects: 100% (2437/2437), done.
# Total 2437 (delta 1483), reused 0 (delta 0)

Note that pushing the branch to a new or empty GitHub repo and then making a fresh clone from GitHub will have the same effect.

Dealing with collaborators

You may have collaborators that pulled your tainted branch and created their own branches off of it. After they fetch your new branch, they will need to use git rebase on their own branches to rebase them on top of the new one. The collaborator should also ensure that their branch doesn't reintroduce the file, as this will override the .gitignore file. Make sure your collaborator uses rebase and not merge; otherwise they will just reintroduce the file and the entire tainted history, and likely encounter some merge conflicts.
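
The collaborator's side might look roughly like the sketch below. It simulates an author and a collaborator with two clones of a local bare repository; all paths, branch names, and file names are hypothetical. The form git rebase --onto is one way to do the rebase: it replays only the collaborator's own commits (those after the old upstream tip) on top of the rewritten branch.

```shell
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the advisory pause in newer Git
work=$(mktemp -d)
git init -q --bare "$work/remote.git"

# Author: commit a secret and publish it.
git init -q "$work/author"
cd "$work/author"
git config user.email "author@example.com"
git config user.name "Author"
echo "hunter2" > secret.txt
echo "hello" > README
git add secret.txt README
git commit -q -m "initial commit"
branch=$(git symbolic-ref --short HEAD)
git push -q "$work/remote.git" "$branch"

# Collaborator: branch off the tainted history and add a commit.
git clone -q "$work/remote.git" "$work/collab"
cd "$work/collab"
git config user.email "collab@example.com"
git config user.name "Collab"
git checkout -q -b feature "origin/$branch"
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "add feature"
old_base=$(git rev-parse "origin/$branch")   # tainted upstream tip, noted before fetching

# Author: rewrite history and force-push.
cd "$work/author"
git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch secret.txt' \
  --prune-empty --tag-name-filter cat -- --all >/dev/null 2>&1
git push -q --force "$work/remote.git" "$branch"

# Collaborator: fetch the rewritten branch and move only their own commits onto it.
cd "$work/collab"
git fetch -q origin
git rebase -q --onto "origin/$branch" "$old_base" feature

tainted=$(git log --oneline feature -- secret.txt)   # empty if the secret is gone
```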

Cached data on GitHub

Be warned that force-pushing does not erase commits on the remote repo; it simply introduces new ones and moves the branch pointer to point to them. If you are worried about users accessing the bad commits directly via SHA1, you will have to delete the repo and recreate it. If the commits were viewed online, the pages may also be cached. Check for cached pages after you recreate the repo. If you find any, open a ticket on GitHub Support and provide links so staff can purge them from the cache.

Avoiding accidental commits in the future

There are a few simple tricks to avoid committing things you don't want committed. The first, and simplest, is to use a visual program like GitHub for Mac or gitx to make your commits. This lets you see exactly what you're committing, and ensure that only the files you want are added to the repo. If you're working from the command line, avoid the catch-all commands git add . and git commit -a. Instead, use git add filename and git rm filename to stage files individually. You can also use git add --interactive to review each changed file and stage it, or part of it, for commit. Finally, you can use git diff --cached to see what changes you have staged for commit. This is the exact diff that your commit will have as long as you commit without the -a flag.
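
As a small illustration of the command-line habits above, the sketch below stages one file by name and then reviews what is staged before committing. The repository and the file names app.py and passwords.txt are hypothetical.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
echo "hello" > README
git add README
git commit -q -m "initial commit"

# Two new files appear in the working tree; only one belongs in the repo.
echo "print('hi')" > app.py
echo "hunter2" > passwords.txt

# Stage by name instead of using "git add ." so passwords.txt stays out.
git add app.py

# Review what the next commit will contain.
staged=$(git diff --cached --name-only)
echo "$staged"
git diff --cached
```

Only app.py shows up in the staged list; passwords.txt remains untracked until it is either ignored or removed.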

posted on 2013-03-01 11:45  Richard.FreeBSD
