Wednesday, March 15, 2017

Digging in the Git history: cheatsheet

I've recently had to rewrite git history of some internal projects going opensource, and wanted to eliminate issues that should not be seen in the wild - like hardcoded testing passwords.

For a task like this it does not suffice to just add a commit that replaces those values with expansible variables populated from elsewhere (e.g. a testbed-local config file). One should also dig through all commits to make sure the string does not pop up anywhere in the history over the years (and git rebase / edit / continue rebase the offending commits as if we were smart enough to do this initially years ago).

There is quite a bit of mixed documentation on this, both in official docs and good suggestions in some blogs and stackexchange forums, but it took considerable time to end up with a few one-liners to do the job, so lest I forget -- I'd just post them here. After all, they can be useful to just find needles in a haystack too, such as finding which commits had to do with a certain keyword.

Finally note, that this script is not optimized for performance etc., but rather for readability and debugging of the procedure ;)
#! /bin/sh

### Copyright (C) 2017 by Jim Klimov

usage() {
    cat << EOF
You can call this script as
    $0 find_commits_contains [PATTERN]
    $0 find_commits_intro [PATTERN]
    $0 find_commits_drop [PATTERN]
    $0 fix_history [PATTERN] [REPLACEMENT]

Note that PATTERN and REPLACEMENT are fixed strings (not regexes) in this context
For the SED usage there is also a PATTERN_SED that you can optionally export
EOF
}

[ -n "${PATTERN-}" ] || PATTERN="needle"
[ -n "${REPLACEMENT-}" ] || REPLACEMENT="@CUSTOM_NEEDLE@"
### This script runs from the basedir of a git repo (or a few)
### Logs and other work files are commonly stored in the parent dir by default
[ -n "${LOGDIR-}" ] || LOGDIR="`pwd`/.."
[ -n "${LOGTAG-}" ] || LOGTAG="$(basename `pwd`)"

find_commits_contains() {
    ### This lists which "$COMMITID:$PATHNAME:$LINETEXT" contain the PATTERN in the LINETEXT
    ### Note that this lists all commits whose checked-out workspace would have the pattern
    [ -n "${WORKSPACE_MATCHES_FILE-}" ] || WORKSPACE_MATCHES_FILE="${LOGDIR}/gitdig_commits-workspace-contains__${LOGTAG}.txt"
    git grep "${PATTERN}" $(git rev-list --all --remotes) \
    | tee "${WORKSPACE_MATCHES_FILE}"
}

show_commits_color() {
    ### This finds which commits dealt the pattern (whose diff adds or removes a line with it, or has it in context)
    ### Note that this starts with a colorful depiction of first-pass diffs, for esthetic viewing pleasure
    ### and is parsed including color markup by other tools below
    git rev-list --all --remotes | \
        while read CMT ; do ( \
            git show --color --pretty='format:%b' "$CMT" | \
            egrep "${PATTERN}" && echo "^^^ $CMT" \
        ) ; done
}

find_commits_intro() {
    ### This finds which commits INTRODUCE the pattern (whose diff adds a line with it)
    ### Note that this starts with a colorful depiction of first-pass diffs, for esthetic viewing pleasure
    [ -n "${COMMIT_INTRODUCES_FILE-}" ] || COMMIT_INTRODUCES_FILE="${LOGDIR}/gitdig_commits-intro__${LOGTAG}.txt"
    show_commits_color | tee "${COMMIT_INTRODUCES_FILE}".tmp

    ### ...and this picks out the lines for commits which actually add the PATTERN
    ### (because there are also not interesting context and removal lines as well,
    ### which should disappear after the rebase)
    echo "LIST OF COMMITS THAT INTRODUCE PATTERN (cached in ${COMMIT_INTRODUCES_FILE}.tmp) :"
    cat "${COMMIT_INTRODUCES_FILE}".tmp | \
    egrep '^([\^]|.*32m\+)' | \
    ggrep -A1 '32m\+' | \
    grep '^\^' \
    | tee "${COMMIT_INTRODUCES_FILE}"
}

find_commits_drop() {
    ### This finds which commits REMOVE the pattern (whose diff drops a line with it)
    ### Note that this starts with a colorful depiction of first-pass diffs, for esthetic viewing pleasure
    [ -n "${COMMIT_DROPS_FILE-}" ] || COMMIT_DROPS_FILE="${LOGDIR}/gitdig_commits-drop__${LOGTAG}.txt"
    show_commits_color | tee "${COMMIT_DROPS_FILE}".tmp

    ### ...and this picks out the lines for commits which actually add the PATTERN
    ### (because there are also not interesting context and removal lines as well,
    ### which should disappear after the rebase)
    echo "LIST OF COMMITS THAT DROP THE PATTERN (cached in ${COMMIT_DROPS_FILE}.tmp) :"
    cat "${COMMIT_DROPS_FILE}".tmp | \
    egrep '^([\^]|.*31m\-)' | \
    ggrep -A1 '31m\-' | \
    grep '^\^' \
    | tee "${COMMIT_DROPS_FILE}"
}

fix_history() {
    ### When the inspections are done with, we want to clean up the history
    ### Note that as part of this, we would drop "unreachable" commits that
    ### are not part of any branch's history (because these would still contain
    ### the original offending pattern.
    ###
    ### WARNING: This is the one destructive operation in this suite.
    ###
    ### You are advised to run it in a scratch full copy of your git repo.
    ###
    ### After it is successfully done, you are advised to destroy the published
    ### copy of the repo completely (e.g. on github) and force-push the cleaned
    ### one into a newly created instance of the repo, and have your team members
    ### destroy and re-fork their clones both on cloud platform and their local
    ### workspaces, so that the destroyed offending commits do not resurface.

    ### Note that you can stack more sed '-e ...' blocks below, e.g. to rewrite
    ### more patterns in one shot. Also note that the pattern and replacement
    ### representation for simple grep and regex in sed may vary... you may want
    ### to automate escaping of special chars.
    [ -n "${PATTERN_SED-}" ] || PATTERN_SED="${PATTERN}"
    git filter-branch --tree-filter "git grep '${PATTERN}' | sed 's,:.*\$,,' | sort | uniq | while read F ; do sed -e 's,${PATTERN_SED},${REPLACEMENT},g' -i '\$F'; done"
    git reflog expire --expire-unreachable=now --all
    git gc --prune=now
}

[ -n "$2" ] && PATTERN="$2"
[ -n "$3" ] && REPLACEMENT="$3"

ACTION=""
case "$1" in
    -h|--help) usage; exit 0 ;;
    find|grep) ACTION="find_commits_intro"; shift ;;
    show|diff) ACTION="find_commits_show"; shift ;;
    drop|remove|del|delete) ACTION="find_commits_drop"; shift ;;
    fix_history|find_commits_contains|find_commits_intro|find_commits_drop) ACTION="$1"; shift ;;
    *) usage; exit 1 ;;
esac

echo "Running routine: '$ACTION'"
"$ACTION"

I'll also post a copy of this to my github collection of git scripts, to track further possible evolution.
 
A similar operation to extract just a history of certain file(s) into a new repository can be done through a series of mail-patches, e.g.:

oldrepo$ rm -rf /tmp/ggg ; mkdir -p /tmp/ggg

oldrepo$ FILES="some-script.sh"

### NOTE: Maybe factor $FILES (hash of?) into PRJ to allow different exports
### from same project to a new repo. Or just tag this manually :)

oldrepo$ PRJ="$(basename "`pwd`" | sed 's,[^A-Za-z\-_0-9\.],_,g')"

### The following results in a patch-file series ordered by UTC Unix epoch
###
timestamp of the original commit, allowing to mix similar export series
###
from different files or projects, and then importing as one monotonous
### history in a new repository. Note that git log tends to not end the last
### line well so we help it:

oldrepo$ ( git log --pretty='format:%H %ad' --date=unix $FILES ; echo "" ) | \
   ( A=0; while read HASH TIME ; do \
     [ -n "$HASH" ] && git format-patch -1 --no-numbered --stdout "$HASH"
$FILES \
     > /tmp/ggg/"$TIME-$PRJ-`printf '%04d' "$A"`".patch ; A=$(($A+1)); \
     done )

### Can filter the patch files further, e.g. update commit message formatting...

newrepo$ git am /tmp/ggg/*.patch

4 comments:

  1. I never knew something like this exist, thanks for sharing. Keep up the good work also let me know if this is good or not! Download Gbwhatsapp

    ReplyDelete
  2. Thanks for the nice article you can also visit Whatsapp Aero Apk

    ReplyDelete
  3. Casino Royale - Live Dealer Games - Virgin Games
    Casino Royale https://vannienailor4166blog.blogspot.com/ is a live casino https://deccasino.com/review/merit-casino/ with a large, eclectic portfolio https://septcasino.com/review/merit-casino/ of 1xbet app casino games. Players can play jancasino.com this game with live dealers,

    ReplyDelete