Friday, June 23, 2017

rsync from PC to Android to pull data as root

During my backup adventures, I came across many nice tools, such as SSHDroid and SimpleSSHD, that allow me to SSH from my computer to the Android device. They even include niceties like rsync and ssh binaries that are quite useful. Alas, this pair failed me on a particular older phone that I wanted to slurp to the PC "as is": all accessible files (including system ones), keeping their original permissions and owners, which of course needs root on both sides. That should have been easy with the rooting support in both Android toolkits. On this phone, however, SimpleSSHD did let me connect (with public-key auth available for free, unlike SSHDroid), but only as a "user", and SSHDroid could not start at all due to some execution errors.

Well, I thought, if I can log in by SSH and do su easily, I can just run the utilities from the command line to start the rsync session from the phone to the PC's SSH server. Alas, this idea broke down on several counts:
  • First of all, even though I could run both /data/data/ and /data/data/ individually, running the common rsync -avPHK /data/ root@mypcIPaddr:/androidbackup/data/ failed again due to a permissions error:
    rsync: Failed to exec ssh: Permission denied (13)
    rsync error: error in IPC code (code 14) at jni/pipe.c(84) [sender=3.0.8]
    Segmentation fault
    I tried to wiggle around this by e.g. passing a -e '/data/data/' argument, or adding the directory to the beginning of PATH - but got the same results.
    I finally got these to work using the system path (and note these must be copies with root ownership - not symlinks to original files). Also mind that the Android/su shell syntax is quirky, to say the least:
    # mount -o remount,rw /system
    # cp /data/data/ /system/xbin/ssh
    # cp /data/data/ /system/xbin/rsync
    # chmod 755 /system/xbin/rsync /system/xbin/ssh
    # chown 0:0 /system/xbin/rsync /system/xbin/ssh
    # mount -o remount,ro /system
    This got me pretty far: running just rsync (without prefixing the long path to the app instance) became possible, and it could call ssh, at least as:
    # rsync -e /system/xbin/ssh -avPHK \
        /data/ root@mypcIPaddr:/androidbackup/files/data/
    /system/xbin/ssh: Exited: Error connecting: Connection timed out
    Segmentation fault
    So this was better, but not enough.
  • Trying to connect using ssh, or even netcat, to any port of my PC seemed impossible :\ even after I disabled firewalls as best I could.
  • Trying to circumvent the firewall issue, I made an SSH session from the PC to Android with a TCP tunnel, so I could rsync back from the phone through it:
    root@pc# ssh -R 22222:localhost:22
    root@phone# rsync -e '/system/xbin/ssh -p 22222' -avPHK \
        /data/ root@localhost:/androidbackup/files/data/
    Login for root@localhost
    Segmentation fault 
    So this was close - the SSH session from phone to PC got established this way, but rsync still crashed before doing anything meaningful. A similar experiment with netcat tunnels also failed.
Going back to the idea of initiating SSH connections from the PC, which at least works, I tried to make the rsync binary setuid as root:
# chmod 106755 /system/xbin/rsync
But connections from the PC to the Android device, using rsync --rsync-path=/system/xbin/rsync ... failed with a permissions error when accessing system dirs.

Finally, I made a wrapper script that calls su and this panned out:
su -c /system/xbin/rsync "$@"
I saved it as /system/xbin/rsync-su (and also chmodded it like the original programs above), and it worked!
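
The wrapper above relies on su handing the extra arguments through to rsync, which worked on this phone. Some su builds instead treat only the single string after -c as the command, so a more defensive variant could fold all arguments into one quoted string first. This is only a sketch under that assumption; quote_args is a hypothetical helper, not part of any of the tools mentioned here:

```shell
#!/bin/sh
# Hypothetical helper: print all arguments as one shell-safe string,
# each wrapped in single quotes (an embedded ' becomes '\'').
quote_args() {
    for arg in "$@"; do
        printf "'%s' " "$(printf '%s' "$arg" | sed "s/'/'\\\\''/g")"
    done
}

# Intended use inside the wrapper (an assumption, not executed here):
#   su -c "/system/xbin/rsync $(quote_args "$@")"
```

Arguments that end in a newline would lose it in the command substitution, which should not matter for rsync's own option strings.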

Unfortunately, it seems that the Android-side rsync still crashes after some time or amount of transfers. This is not quite repeatable (it has long stretches of running well, too), so I wrapped it in a loop and excluded the files it had most problems with (the few excluded files can be copied in another pass, without exclusions, by ssh+tar):
root@pc# while ! rsync --timeout=5 \
    --exclude='*.db' --exclude='*.db-*' \
    --partial-dir=.partial -e 'ssh -p 2222' \
    --rsync-path=/system/xbin/rsync-su \
    -avPHK root@phoneIPaddr:/data/ ./files/data/ \
    ; do echo "`date` : RETRY"; sleep 1; done

The sleep 1 is needed to reliably abort the loop by Ctrl+C if I want to stop it and tweak something.
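
The same retry idea can be kept as a small reusable function with an upper bound on attempts, so an unattended run cannot spin forever. A minimal sketch; the name retry and its argument order are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: run a command until it succeeds or the attempt
# budget is exhausted.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS command [args...]
retry() {
    max="$1"; delay="$2"; shift 2
    attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "$(date): giving up after $attempt attempts" >&2
            return 1
        fi
        echo "$(date): RETRY ($attempt of $max)" >&2
        attempt=$((attempt + 1))
        # The pause also leaves a window for Ctrl+C to land
        sleep "$delay"
    done
}
```

It would be called as, for example, retry 100 1 rsync --timeout=5 ... with the same rsync options as in the loop above.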

UPDATE: In hindsight, maybe I should have looked for luckier builds of the tools as well. The Apps2SD project includes a formidable collection of programs, delivering busybox and rsync in particular. There are also binary builds for various platforms. But since my immediate pain has been resolved by the stone hammer described in this post, I did not look into other, possibly finer tools, at least not on this phone (this is recovery after all - even adding SW there is problematic).

For completeness, the target directory on the PC contains a few scripts I conjured up in haste, and a files/ subdirectory into which Android content lands.
One is dubbed tarcp, which does the copying with tar and ssh:

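# Run with bash (or another shell that supports pipefail)
set -o pipefail

# PHONE_IP and PHONE_SSHPORT must be set in the environment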
# A files/ should be under CWD
ssh -p "$PHONE_SSHPORT" "root@$PHONE_IP" \
  'su -c "cd / && tar cvf - \"'"$@"'\" "' \
  | (cd files/ && tar xvf - )
This is sort of wasteful, because when retries are needed (wifi flakiness, etc.) it re-copies the whole set of arguments.
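
The tar-over-a-pipe mechanism itself can be tried locally, without a phone or ssh in the picture. A minimal sketch, with tar_pipe_copy as a hypothetical helper name; the p flag asks tar to keep the stored permission bits, while restoring the original owners additionally needs the extracting side to run as root:

```shell
#!/bin/sh
# Hypothetical helper: copy the listed paths (relative to SRC) into DST
# through a tar pipe, the same mechanism tarcp uses over ssh.
# Usage: tar_pipe_copy SRC DST path [path...]
tar_pipe_copy() {
    src="$1"; dst="$2"; shift 2
    # Producer packs to stdout; consumer unpacks from stdin in DST
    ( cd "$src" && tar cf - "$@" ) | ( cd "$dst" && tar xpf - )
}
```

The return code is that of the extracting tar only, which is why the script above turns on pipefail.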
Another is loopcp which calls the first one in a loop - to do those retries if needed:
for D in "$@" ; do while ! time ./tarcp "$D" ; do \
  echo "`date`: RETRY" ; sleep 1; done; done
Finally, there is a tool to help determine directory sizes on Android from PC, so I could copy-paste a lot of relatively small targets for loopcp to run over, and retries are relatively cheap (not like re-pulling several gigabytes over and over, after getting a last-minute hiccup):


#echo \
ssh -p $PHONE_SSHPORT root@$PHONE_IP 'su -c "cd / && \
   du -ks '"$1"'/* | while read S D \
      ; do [ \"\${S}\" -lt 100000 ] && \
        printf \"\\\"\$D\\\" \" ; done ; \
   echo ; echo ; \
   du -ks '"$1"'/* | sort -n | while read S D \
      ; do [ \"\${S}\" -ge 100000 ] && \
        printf \"\$S\t\$D\n\" ; done ; echo"'
This outputs a long string of double-quoted names of subdirectories (or files) that are present in the argument directory and are under 100 MB in size, and then prints a list of the larger objects sorted by size. The long string can be copy-pasted as the argument list for loopcp; the list of larger objects is for drilling into them and breaking their contents into another string of small chunks of work.
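
The splitting logic can also live on the PC side, working on plain du -ks output, which avoids most of the nested quoting. A sketch under that assumption; split_by_size is a hypothetical name and the default threshold mirrors the 100000 KB used above:

```shell
#!/bin/sh
# Hypothetical helper: read "SIZE_KB<whitespace>PATH" lines (du -ks output)
# on stdin. Prints the small entries double-quoted on one line, then an
# empty line, then the large entries sorted by size, one per line.
# Usage: du -ks dir/* | split_by_size [THRESHOLD_KB]
split_by_size() {
    limit="${1:-100000}"
    sort -n | awk -v limit="$limit" '
        {
            size = $1 + 0
            path = $0
            sub(/^[0-9]+[ \t]+/, "", path)   # keep spaces inside the path
            if (size < limit + 0) small = small "\"" path "\" "
            else large = large size "\t" path "\n"
        }
        END { print small; print ""; printf "%s", large }
    '
}
```

Paths containing double quotes or newlines are not handled by this sketch.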

Actually, this set of scripts is what I started with after the initial failures with rsync. In the end, the tedious breaking down of the argument list for "loopcp", and the uncertainty about whether I got all the bits and FS rights correct in the copy, pushed me to find a way to run rsync after all - though most of the file content had been copied by that time, using the scripts.

Wednesday, June 21, 2017

Tarballing Android files and partitions over adb

Some time back I was recovering a phone, and before I made something irreversible, I opted for a low-level backup. This copy of old data helped somewhat in migration of program configs to my next phone, too -- although changes between Android 4 and Android 6 did not make this an easy task, clobbering everything with security features ;)

Of course, this adventure required a lot of googling, and posting back as I learned, so I want to save as a cheat-sheet a solution I used:

Note that this was done using a Windows PC and corresponding tools and drivers, and the Linux side might have different caveats. And of course Android side was rooted, USB debugging allowed, busybox installed, etc. - thanks to a handy habit to do this early, while deploying a new phone.

So, a slightly edited copy of my post from SO follows:

The first thing to note is that adb shell generally sets up a text terminal (so it can convert single end-of-line characters to CRLF, mangling binary data like partition images or TAR archives). While you can work around this in Unix/Linux versions of adb (e.g. add stty raw to your shelled command) or use some newer adb with the exec-out option, on Windows it still writes CRLF to its output. The neat trick is to pass data through base64 encoding and decoding (binaries are available for Windows en masse, google them). Also note that errors or verbose messages printed to stderr in the shell end up on stdout of the adb shell program on the host system - so you want to discard those after the inevitable initial experimentation.
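
The effect is easy to reproduce without a phone. In this sketch simulate_crlf_channel is merely a stand-in for the text-mode channel (it is not what adb literally does), and decode_from_channel strips the carriage returns before decoding, on the assumption that a strict decoder such as GNU base64 may otherwise reject them:

```shell
#!/bin/sh
# Stand-in for a text-mode channel that rewrites every LF as CRLF
# (a simulation for this sketch only).
simulate_crlf_channel() {
    awk '{ printf "%s\r\n", $0 }'
}

# Decode base64 that passed through such a channel: drop the CRs first,
# then decode. CR bytes inside the payload are safe because they travel
# in encoded form.
decode_from_channel() {
    tr -d '\r' | base64 -d
}
```

Comparing cksum of some file with cksum of the same file piped through base64, simulate_crlf_channel and decode_from_channel should give identical results, while piping the raw file through simulate_crlf_channel alone should not.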

Here goes (note that in command examples below, long command lines are wrapped for readability - you will need to unwrap them back while copy-pasting... not that my examples would match your phones exactly anyway):

adb shell "su -c 'cat /dev/block/mmcblk0p25 | base64' 2>/dev/null" \
  | base64 -d > s3-mmcblk0p25.img

Can be easily scripted by Windows cmd.exe shell to cover all the partitions (one can look the list up by ls -la /dev/block/ and/or by cat /proc/diskstats or by cat /proc/partitions), e.g.:

for /L %P in (1,1,25) do ..\platform-tools\adb shell \
  "su -c 'cat /dev/block/mmcblk0p%P | base64' 2>/dev/null" \
  | base64 -d > s3-mmcblk0p%P.img

(Note: use %%P in saved CMD batch files, or %P in an interactive shell.)

Don't forget that there are also mmcblk0boot[01] partitions, and that the mmcblk0 overall contains all those partitions in a GPT wrapping, just like any other harddisk or impersonator of one :)

To estimate individual partition sizes, you can look at the output of:

fdisk -u -l /dev/block/mmcblk0*
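
Where fdisk is not available, the sizes can also be derived from /proc/partitions, whose third column counts 1 KiB blocks. A minimal sketch; partition_sizes_mib is a hypothetical helper that reads such text on stdin:

```shell
#!/bin/sh
# Hypothetical helper: read /proc/partitions-style text on stdin
# (columns: major minor #blocks name, with #blocks in 1 KiB units)
# and print "NAME SIZE_MiB" for each mmcblk entry.
partition_sizes_mib() {
    awk '$4 ~ /^mmcblk/ { printf "%s %.1f\n", $4, $3 / 1024 }'
}
```

When the text comes through adb shell, stripping carriage returns first (for example with tr -d '\r') keeps the device names clean.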

Unfortunately, I did not quickly and easily manage to tar cf - mmcblk0p* and get the partition contents, which I could then pipe to e.g. 7z x -si to get the data out as multiple files in a portable one-liner.

To tar some files you can:

adb shell "su -c 'cd /mnt/data && tar czf - ./ | base64' \
  2>/dev/null" | base64 -d > s3-mmcblk0p25-userdata.tar.gz

Of course this solution is not perfect, since base64 increases the transfer time by about one third, but for recovery purposes grasping at any straw that helps is good enough... well... :) In any case, if I export a whole partition like this, I do not really care whether it takes an hour or an hour and a half, as long as it reliably does the job. Also if, as some others have documented, the Windows variant of adb.exe effectively always outputs CRLF-separated text, there seems little we can do but abuse it with ASCII-friendly encapsulation. Alternately, adb exec-out might be the solution - but for some reason it did not work well for me.
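
The one-third figure follows from base64 mapping every 3 input bytes to 4 output characters. A tiny sketch of that arithmetic, with base64_size as a made-up helper name; line breaks in wrapped output (GNU base64 wraps at 76 characters by default) and any CR added in transit come on top of it:

```shell
#!/bin/sh
# Size in bytes of the base64 encoding of N input bytes, ignoring line
# breaks: every 3 input bytes (rounded up) become 4 output characters.
base64_size() {
    echo $(( ( ($1 + 2) / 3 ) * 4 ))
}
```

For instance, base64_size 3000000 prints 4000000, so a 3 MB image travels as roughly 4 MB of text.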

Hope this helps someone else, Jim Klimov

Wednesday, March 15, 2017

Digging in the Git history: cheatsheet

I've recently had to rewrite git history of some internal projects going opensource, and wanted to eliminate issues that should not be seen in the wild - like hardcoded testing passwords.

For a task like this it does not suffice to just add a commit that replaces those values with expansible variables populated from elsewhere (e.g. a testbed-local config file). One should also dig through all commits to make sure the string does not pop up anywhere in the history over the years (and git rebase / edit / continue rebase the offending commits as if we were smart enough to do this initially years ago).

There is quite a bit of mixed documentation on this, both in official docs and good suggestions in some blogs and stackexchange forums, but it took considerable time to end up with a few one-liners to do the job, so lest I forget -- I'd just post them here. After all, they can be useful to just find needles in a haystack too, such as finding which commits had to do with a certain keyword.

Finally, note that this script is not optimized for performance etc., but rather for readability and debugging of the procedure ;)
#! /bin/sh

### Copyright (C) 2017 by Jim Klimov

usage() {
    cat << EOF
You can call this script as
    $0 find_commits_contains [PATTERN]
    $0 find_commits_intro [PATTERN]
    $0 find_commits_drop [PATTERN]
    $0 fix_history [PATTERN] [REPLACEMENT]

Note that PATTERN and REPLACEMENT are fixed strings (not regexes) in this context
For the SED usage there is also a PATTERN_SED that you can optionally export
EOF
}

[ -n "${PATTERN-}" ] || PATTERN="needle"
### This script runs from the basedir of a git repo (or a few)
### Logs and other work files are commonly stored in the parent dir by default
[ -n "${LOGDIR-}" ] || LOGDIR="`pwd`/.."
[ -n "${LOGTAG-}" ] || LOGTAG="$(basename `pwd`)"

find_commits_contains() {
    ### This lists which "$COMMITID:$PATHNAME:$LINETEXT" contain the PATTERN in the LINETEXT
    ### Note that this lists all commits whose checked-out workspace would have the pattern
    [ -n "${WORKSPACE_MATCHES_FILE-}" ] || WORKSPACE_MATCHES_FILE="${LOGDIR}/gitdig_commits-workspace-contains__${LOGTAG}.txt"
    git grep "${PATTERN}" $(git rev-list --all --remotes) \
    | tee "${WORKSPACE_MATCHES_FILE}"
}

show_commits_color() {
    ### This finds which commits dealt with the pattern (whose diff adds or removes a line with it, or has it in context)
    ### Note that this starts with a colorful depiction of first-pass diffs, for esthetic viewing pleasure
    ### and is parsed including color markup by other tools below
    git rev-list --all --remotes | \
        while read CMT ; do ( \
            git show --color --pretty='format:%b' "$CMT" | \
            egrep "${PATTERN}" && echo "^^^ $CMT" \
        ) ; done
}

find_commits_intro() {
    ### This finds which commits INTRODUCE the pattern (whose diff adds a line with it)
    ### Note that this starts with a colorful depiction of first-pass diffs, for esthetic viewing pleasure
    [ -n "${COMMIT_INTRODUCES_FILE-}" ] || COMMIT_INTRODUCES_FILE="${LOGDIR}/gitdig_commits-intro__${LOGTAG}.txt"
    show_commits_color | tee "${COMMIT_INTRODUCES_FILE}".tmp

    ### ...and this picks out the lines for commits which actually add the PATTERN
    ### (because there are also not interesting context and removal lines as well,
    ### which should disappear after the rebase)
    cat "${COMMIT_INTRODUCES_FILE}".tmp | \
    egrep '^([\^]|.*32m\+)' | \
    ggrep -A1 '32m\+' | \
    grep '^\^' \
    | tee "${COMMIT_INTRODUCES_FILE}"
}

find_commits_drop() {
    ### This finds which commits REMOVE the pattern (whose diff drops a line with it)
    ### Note that this starts with a colorful depiction of first-pass diffs, for esthetic viewing pleasure
    [ -n "${COMMIT_DROPS_FILE-}" ] || COMMIT_DROPS_FILE="${LOGDIR}/gitdig_commits-drop__${LOGTAG}.txt"
    show_commits_color | tee "${COMMIT_DROPS_FILE}".tmp

    ### ...and this picks out the lines for commits which actually remove the PATTERN
    ### (because there are also not interesting context and addition lines as well,
    ### which should disappear after the rebase)
    cat "${COMMIT_DROPS_FILE}".tmp | \
    egrep '^([\^]|.*31m\-)' | \
    ggrep -A1 '31m\-' | \
    grep '^\^' \
    | tee "${COMMIT_DROPS_FILE}"
}

fix_history() {
    ### When the inspections are done with, we want to clean up the history
    ### Note that as part of this, we would drop "unreachable" commits that
    ### are not part of any branch's history (because these would still contain
    ### the original offending pattern).
    ### WARNING: This is the one destructive operation in this suite.
    ### You are advised to run it in a scratch full copy of your git repo.
    ### After it is successfully done, you are advised to destroy the published
    ### copy of the repo completely (e.g. on github) and force-push the cleaned
    ### one into a newly created instance of the repo, and have your team members
    ### destroy and re-fork their clones both on cloud platform and their local
    ### workspaces, so that the destroyed offending commits do not resurface.

    ### Note that you can stack more sed '-e ...' blocks below, e.g. to rewrite
    ### more patterns in one shot. Also note that the pattern and replacement
    ### representation for simple grep and regex in sed may vary... you may want
    ### to automate escaping of special chars.
    [ -n "${PATTERN_SED-}" ] || PATTERN_SED="${PATTERN}"
    git filter-branch --tree-filter "git grep '${PATTERN}' | sed 's,:.*\$,,' | sort | uniq | while read F ; do sed -e 's,${PATTERN_SED},${REPLACEMENT},g' -i \"\$F\"; done"
    git reflog expire --expire-unreachable=now --all
    git gc --prune=now
}

[ -n "$2" ] && PATTERN="$2"
[ -n "$3" ] && REPLACEMENT="$3"

case "$1" in
    -h|--help) usage; exit 0 ;;
    find|grep) ACTION="find_commits_intro"; shift ;;
    show|diff) ACTION="show_commits_color"; shift ;;
    drop|remove|del|delete) ACTION="find_commits_drop"; shift ;;
    fix_history|find_commits_contains|find_commits_intro|find_commits_drop) ACTION="$1"; shift ;;
    *) usage; exit 1 ;;
esac

echo "Running routine: '$ACTION'"
"$ACTION"

I'll also post a copy of this to my github collection of git scripts, to track further possible evolution.
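
As a cross-check that does not depend on parsing colored diffs, git's own "pickaxe" search (git log -S) lists the commits whose changes alter the number of occurrences of a fixed string. This is a different technique from the script above and only a sketch; pickaxe_commits is a hypothetical wrapper name:

```shell
#!/bin/sh
# Hypothetical helper: list commits reachable from any ref whose changes
# alter the number of occurrences of the fixed string $1.
pickaxe_commits() {
    git log --all --format='%H' -S"$1"
}
```

It reports both the commits that introduce the string and those that drop it, without telling them apart, so it complements rather than replaces find_commits_intro and find_commits_drop.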
A similar operation to extract just a history of certain file(s) into a new repository can be done through a series of mail-patches, e.g.:

oldrepo$ rm -rf /tmp/ggg ; mkdir -p /tmp/ggg

oldrepo$ FILES=""

### NOTE: Maybe factor $FILES (hash of?) into PRJ to allow different exports
### from same project to a new repo. Or just tag this manually :)

oldrepo$ PRJ="$(basename "`pwd`" | sed 's,[^A-Za-z\-_0-9\.],_,g')"

### The following results in a patch-file series ordered by UTC Unix epoch
### timestamp of the original commit, which allows mixing similar export series
### from different files or projects, and then importing them as one monotonic
### history in a new repository. Note that git log tends to not end the last
### line well so we help it:

oldrepo$ ( git log --pretty='format:%H %ad' --date=unix $FILES ; echo "" ) | \
   ( A=0; while read HASH TIME ; do \
     [ -n "$HASH" ] && git format-patch -1 --no-numbered --stdout "$HASH" \
     > /tmp/ggg/"$TIME-$PRJ-`printf '%04d' "$A"`".patch ; A=$(($A+1)); \
     done )

### Can filter the patch files further, e.g. update commit message formatting...

newrepo$ git am /tmp/ggg/*.patch
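
The export step can also be wrapped into a function. This is a sketch rather than a drop-in replacement: export_history is a hypothetical name, it uses the %at placeholder (author timestamp) instead of --date=unix, and it skips merge commits on the assumption that they carry no patch of their own:

```shell
#!/bin/sh
# Hypothetical helper: write one patch per commit touching the given
# paths into OUTDIR, named by author timestamp so that a shell glob
# sorts them chronologically (true while all timestamps have the same
# number of digits). Run it inside the old repository.
# Usage: export_history OUTDIR path [path...]
export_history() {
    outdir="$1"; shift
    seq=0
    git log --no-merges --format='%H %at' -- "$@" |
    while read -r hash when; do
        git format-patch -1 --stdout "$hash" \
            > "$outdir/$when-$(printf '%04d' "$seq").patch"
        seq=$((seq + 1))
    done
}
```

As in the one-liner above, each patch carries the whole commit, including changes to files outside the listed paths, so the series may need filtering before git am.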