Sunday, January 10, 2016

Robocopy'ing a Windows system partition...

OVERVIEW

I'm having some fun migrating a laptop away from a dying HDD (as confirmed by SMART checks, e.g. using GSmartControl, a nice Windows GUI wrapper for smartmontools) to a new one, temporarily connected via a USB carrier.

I've hit and cleared a number of roadblocks, so I decided to sum up what I did here for posterity ;)
  • Windows Activation is evil ;) If you clone to a new hard disk, try to minimize system changes (e.g. changes to partition numbering, BIOS versions or hardware bits like memory chips had better wait until the cloned Windows boots without losing its Genuine status, if you're lucky and that happens at all).
  • If NTFS compression is enabled, and especially if you use a non-Windows NTFS implementation for file-based copying, make sure NOT to compress the copies of the "C:\bootmgr" and "C:\bootwin" files, and (most likely) of "C:\Boot", "C:\Recovery", "C:\System Restore", "C:\Windows\System32", "C:\Windows\winsxs" and "C:\Windows\SysWOW64" (whichever of these exist), since the boot manager and low-level drivers can be picky about reading and using the on-disk bits "as is". It may be possible to apply compression after the clone boots up and locks the sensitive files.
  • CygWin-based rsync is, alas, not suitable for the task.
  • Neither is XCOPY, at least not for just refreshing an existing replica.
  • NTFS-3g did not cut it for me either, at least not from OpenIndiana.
  • Using VSS (Shadow Volumes) sounded like a good idea, e.g. to copy the registry files and other always-open objects, but actually creating a snapshot is a bit of an adventure in Windows 7 (and, I gather, in other non-server versions).
  • Native Microsoft RoboCopy has an issue copying ACLs, and Microsoft hotfix 979808 was not applicable to the version installed on the laptop, so files (like those in the "C:\Boot" directory) have to become owned by the current user, or a hack can be used to copy them as "TrustedInstaller" (who owns most of the offending objects).
  • DO HAVE the installation disk (or at least a recovery disk or ISO image) and a means to boot it (CD-ROM drive, a dual-bootable hypervisor installed in a neighbor partition, etc.) for the cases when things do go wrong and you need to e.g. rewrite mount paths (registry HKLM\System\MountedDevices), reconfigure booting (bcdedit, bootrec), decompress some driver files, etc.
    It also seems that "robocopy" run from within the Recovery session does not suffer from the permission issues, and of course there is no issue with opened and thus inaccessible system files. And if you fiddle with mountpoints or drive letters, you can also avoid complications due to copying of directory junctions (links). In fact, I'd now suggest starting with this option if possible.
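If Python happens to be available on the Windows side, checking that the sensitive copies listed above did not end up compressed can be scripted. This is a minimal sketch, not a tested tool: the function name is made up, and it relies on the st_file_attributes field that Python fills in only on Windows (elsewhere nothing is ever reported as compressed).

```python
import os
import stat


def compressed_paths(paths):
    """Return those paths that carry the NTFS "compressed" attribute.

    st_file_attributes is only filled in on Windows; on other platforms
    nothing is ever reported as compressed.
    """
    found = []
    for p in paths:
        try:
            st = os.stat(p)
        except OSError:
            continue  # absent on this system, e.g. no SysWOW64 on 32-bit Windows
        attrs = getattr(st, "st_file_attributes", 0)
        if attrs & stat.FILE_ATTRIBUTE_COMPRESSED:
            found.append(p)
    return found
```

Feed it the paths from the bullet above, rebased onto the new partition (e.g. compressed_paths(["W:\\bootmgr", "W:\\Boot"]) with "W:" as an example letter); the built-in "compact /U" command can then decompress whatever it reports.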

FORMATTING THE DISKS


I just used the GUI under the Computer Management MMC to partition the new disk for the OSes it will hold, and to mark the Windows partition as "active". This made the new disk bootable from the NTFS partition. (Note that ultimately another partition with GRUB would chainload the Windows one, if needed.) Further fine-tuning is possible with GNU "parted" (many OSes) or Linux "fdisk".

It seems that this laptop's new WD SSHD drive has 512-byte sectors on the hard disk part, but announces larger sectors due to the Flash layer. After both the initial copy and a byte-by-byte "dd" copy refused to boot, we reformatted the disk, guessing a 4KB NTFS block (cluster) size, and that seemed to do the trick.

Native Windows tools including "diskpart", "bcdedit", "bootrec" and "bootsect" also had their moment of glory during my experiments (getting that copy to boot up is an adventure of its own, possibly because the new SSHD disk reports weird sector sizes).

TRACKING THE FAILURES


RSYNC


Initially I used a CygWin build of rsync (from the cwRsync project, maybe defunct now) to migrate the bulk of data with the usual "rsync -avPHK ..." mantra. Maybe I shouldn't have done so...

While cwrsync serves me well for networked media backups (photos etc.) between NTFS and Unix machines, including support for sym/hardlinks vs. NTFS directory and file "junctions" (which in turn are easily manipulated with the free FAR Manager), it turned out to be a bad idea to use it on the Windows system itself. After the rsync run, all "junctions" were replicated as text files with "/cygdrive/c/..." contents, which had to be cleaned up manually. Again I used FAR: one instance to try and copy over the changes, skipping all conflicts, and another FAR to manually redo the junctions on which the first one hiccuped. Mostly these are standardized structural links under C:\Users, though a few others were of my own making.

According to docs and forums, non-POSIX NTFS features like alternate data streams are not seen by CygWin, and thus not by cwrsync (or other similar builds). I'm also not convinced that ownerships and ACLs were properly transferred. And, rather predictably, "rsync" failed to copy files held open by the system, such as the registry and some log/db files.
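Hunting down those "/cygdrive/c/..." stub files can be partly automated before cleaning them up in FAR. Below is a minimal Python sketch, assuming the stubs are small regular files (under 1 KB here, an arbitrary cut-off) whose content contains "/cygdrive/"; since I'm not sure which encoding was used for them, both plain bytes and UTF-16LE are checked. The function name is hypothetical.

```python
import os

# "/cygdrive/" as plain bytes and as UTF-16LE, since the stub encoding is uncertain.
MARKERS = (b"/cygdrive/", "/cygdrive/".encode("utf-16-le"))


def find_cygdrive_stubs(root, max_size=1024):
    """Yield paths of small regular files whose content mentions /cygdrive/.

    Assumption: rsync-replicated junctions ended up as tiny files holding a
    Cygwin-style target path.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.islink(path) or os.path.getsize(path) > max_size:
                    continue
                with open(path, "rb") as fh:
                    data = fh.read(max_size)
            except OSError:
                continue  # locked or unreadable file: skip it, this is only a scan
            if any(marker in data for marker in MARKERS):
                yield path
```

Running it over the replica's root (e.g. find_cygdrive_stubs("W:\\") for a replica mounted as "W:") lists the junctions that still need to be redone by hand.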

XCOPY


I tried to reprocess the existing files with some other tools, rather than copy everything over again (dying disk... quite a few retries, though nearly no actual IO errors yet... it took a couple of days to get this far).

Alas, Microsoft's XCOPY seems to copy the same files over again, which kinda defeated my purpose. Just for kicks, this carefully crafted command line (where "/L" only lists what would be copied) did not help me:
xcopy /O /X /E /H /K /B /Y /R /C /L "C:\Boot" "W:\Boot"
It might still be a good choice for the initial (and only) copy, however.

NTFS-3g


My next idea was to reboot into another OS, mount the two Windows partitions, and rsync the data there. But whatever options I used to mount the NTFS volumes, directory symlinks became full paths (e.g. "/mnt/win/Users" for "./Documents and Settings"), so the idea failed.

Also, again, I'm not certain that ACLs and owner/groups are properly replicated in this manner - which ordinarily matters little for me when I manipulate user-data, but can matter for the OS when its guts are being migrated.

SHADOWS, SNAPSHOTS and SYSTEM RESTORE


Windows 7 has a reduced feature set of the "vssadmin" command: it can manipulate the Shadow storage area and list what is available, but it cannot take snapshots. According to Internet lore, another program from the Windows SDK (vshadow.exe) fills the gap, but there is also the System Restore GUI, which uses (and creates) snapshots under the hood. IF it is enabled at all. Oh boy...

  • To enable Shadowing on the original drive, set aside some space (min 300mb):
vssadmin resize shadowstorage /for=c: /on=c: /maxsize=320mb
  • On a more capable (Windows Server) system you would just use the CLI to create the snapshot, e.g.:
vssadmin create shadow /For=C:
  • To create a snapshot on desktop Windows 7, go to System Properties (Win+Pause) and into "System protection". Verify that protection is "on" for your original drive ("C:") and, if you can, press "Create..." to make a System Restore point, which is a VSS snapshot. Note that in my experiment, creating another point removed the first one. Maybe too little space was set aside, or the feature is constrained in desktop Windows versions.
  • To enable usage of System Restore (if it is initially "Disabled by your system administrator"), go to Registry (thanks to several blogs, like this one: http://forum.thewindowsclub.com/windows-tips-tutorials-articles/34531-re-enable-disabled-system-restore-system-administrator.html):
    • Start (Win+R) the regedit program
    • Browse into HKLM\Software\Policies\Microsoft\Windows NT\SystemRestore
    • Delete the values "DisableConfig" and "DisableSR"
    • No reboot should be needed, just re-open the System protection tab in the step above
  • Look up the snapshot name with:
    vssadmin List Shadows
    ...
    Shadow Copy Volume: \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy2
    ...
  • Mount that snapshot with "mklink /d" so it becomes a read-only directory (until the snapshot gets killed by the system):
    mklink /d c:\snap \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy2\
  • Note that for some specific purposes you can link to a sub-object, e.g.:
    mklink /d c:\snapwork \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy2\Users\myname\Documents\Work\
  • To unmount a snapshot later, it suffices to just remove the FS object of the link (destroying the snapshot is a different matter; it usually happens "by itself", at the OS's behest, as it clears out room for itself or for new snapshots):
    rmdir c:\snap
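The lookup-and-link steps above can be glued together with a little script. Below is a minimal Python sketch, assuming the "Shadow Copy Volume:" line format quoted above; the helper names are hypothetical, and the script only builds the "mklink" line (to be run from an elevated CMD yourself) rather than calling anything.

```python
import re

# Matches the device path on "Shadow Copy Volume:" lines of "vssadmin list shadows".
SHADOW_RE = re.compile(r"Shadow Copy Volume:\s*(\\\\\?\\GLOBALROOT\\Device\\\S+)")


def shadow_volumes(vssadmin_output):
    """Return the snapshot device paths in the order vssadmin listed them."""
    return SHADOW_RE.findall(vssadmin_output)


def mklink_command(link_dir, shadow_device, subpath=""):
    """Build a "mklink /d" line exposing a snapshot (or a sub-object) as a directory.

    The target keeps the trailing backslash used in the examples above and,
    like them, assumes no spaces in either path (otherwise quote by hand).
    """
    target = shadow_device.rstrip("\\") + "\\"
    if subpath:
        target += subpath.strip("\\") + "\\"
    return f"mklink /d {link_dir} {target}"
```

With several restore points listed, the last path returned is usually the newest snapshot, but check the creation times vssadmin prints before linking.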

ROBOCOPY


My final spell for replicating ownership, ACLs, and the changes that had piled up on the original system since I started the replication was MS RoboCopy, run as shown below, manually for each directory I was interested in (not all of the existing ones). There is an issue with replicating permissions of FS objects that you (or your Administrators group) do not own. The hotfix did not work for me, but the workaround of grabbing ownership (on the target filesystem, which sufficed) did work for some files. For "C:\Windows", however, there are far more concerns about such fiddling, so running cmd.exe as TrustedInstaller sounds like a good idea for most of those files. Boosted robocopy logging also helps to track which files yielded "Error 5" when robocopy tried to set NTFS permissions.

Another problem is that the shadows used for System Restore points are not quite general-purpose snapshots: they seem to omit certain files so that an OS rollback does not change user documents, so copying only from the snapshot is not enough.
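Sifting a robocopy log for those "Error 5" entries is easy to script. Below is a minimal Python sketch, assuming log lines shaped like "2016/01/10 14:22:31 ERROR 5 (0x00000005) Copying NTFS Security to Destination File W:\Boot\BCD" (that sample line, its timestamp and path are made up for illustration); the helper names are hypothetical too.

```python
import re

# Assumed log line shape (the timestamp prefix is not needed for the match):
#   2016/01/10 14:22:31 ERROR 5 (0x00000005) Copying NTFS Security to Destination File W:\Boot\BCD
ERR5_RE = re.compile(
    r"ERROR 5 \(0x00000005\) (?P<action>.*?) (?P<path>[A-Za-z]:\\.*)$")


def access_denied(lines):
    """Return (action, path) pairs for every "Error 5" (access denied) entry."""
    hits = []
    for line in lines:
        m = ERR5_RE.search(line.rstrip("\r\n"))
        if m:
            hits.append((m.group("action"), m.group("path")))
    return hits


def access_denied_in_file(log_path):
    # The log's code page may vary, so undecodable bytes are replaced
    # instead of aborting the scan.
    with open(log_path, errors="replace") as fh:
        return access_denied(fh)
```

The resulting paths are the candidates for the ownership grab or the TrustedInstaller treatment.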

First I copied from the mounted VSS snapshot under "C:\snap" into the mounted new partition under "W:", then overlaid it with files from the live filesystem (if different), e.g.:
set DD=Program Files
robocopy "C:\snap\%DD%" "W:\%DD%" /COPYALL /DCOPY:T /SECFIX /TIMFIX /SL /FFT /DST /R:1 /W:1 /E /MIR /XJD /LOG+:w:\robocopy.log
robocopy "C:\%DD%" "W:\%DD%" /COPYALL /DCOPY:T /SECFIX /TIMFIX /SL /FFT /DST /R:1 /W:1 /E /XJD /LOG+:w:\robocopy.log

Names with spaces can also be used, as seen above: the "set" line takes them without quotation marks, while the expanded paths are quoted in the robocopy arguments.
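Since the two robocopy passes were repeated manually for each directory, they lend themselves to a small wrapper. Below is a minimal Python sketch, assuming the same flags, the "C:\snap" snapshot link, the "W:" target and the log path from the commands above; the function names are illustrative only. It also relies on robocopy's documented exit-code convention, where values of 8 and above signal failures.

```python
import ntpath
import subprocess

# Flags taken from the two robocopy lines above; the snapshot pass adds /MIR.
COMMON_FLAGS = ["/COPYALL", "/DCOPY:T", "/SECFIX", "/TIMFIX", "/SL", "/FFT",
                "/DST", "/R:1", "/W:1", "/E", "/XJD"]


def two_pass(dirname, snap="C:\\snap", live="C:\\", dest="W:\\",
             log="w:\\robocopy.log"):
    """Return the snapshot pass and the live overlay pass for one directory."""
    target = ntpath.join(dest, dirname)
    tail = ["/LOG+:" + log]
    from_snap = ["robocopy", ntpath.join(snap, dirname), target] + COMMON_FLAGS + ["/MIR"] + tail
    from_live = ["robocopy", ntpath.join(live, dirname), target] + COMMON_FLAGS + tail
    return from_snap, from_live


def robocopy_ok(returncode):
    # robocopy's exit code is a bit mask: 1 = files copied, 2 = extra files,
    # 4 = mismatches, 8 = some copies failed, 16 = fatal error.
    return returncode < 8


def run_all(dirnames):
    """Run both passes per directory; return (source, exit code) of failed passes."""
    failures = []
    for d in dirnames:
        for cmd in two_pass(d):
            rc = subprocess.run(cmd).returncode
            if not robocopy_ok(rc):
                failures.append((cmd[1], rc))
    return failures
```

For example, run_all(["Program Files", "ProgramData"]) would run both passes for each directory and return the sources whose passes failed, to be looked up in the log.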

If you are in the Recovery Console and still see errors due to files (likely bits of the Registry) being locked by some process, try to force a dismount of the original and new volumes, using "chkdsk /x" or "diskpart" ("select volume X", then "assign letter=Y").

For objects with potentially conflicting "8.3" names ("dir /x" is your friend for discovering these), like "PROGRA~1" for anything starting with "Progra...", be sure to copy objects in the same order as they are numbered (or "move" them into a temporary subdirectory and back). For me, some shortcuts and registry paths happened to be saved with the short, space-less names in mind, so this nuance did matter for the bootability of the copied system.
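Working out that order from "dir /x" output can be scripted. Below is a minimal Python sketch, assuming the English-locale "dir /x" layout (date, time, size or <DIR>, short name, long name). It groups entries by short-name stem and extension and lists them by their "~N" number, which is the order to copy (or create) them in. Whether the target volume then hands out the same suffixes is not guaranteed; the sketch only computes the order. The function name is hypothetical.

```python
import re
from collections import defaultdict

# Assumption: English-locale "dir /x" output, where an entry that has a
# generated short name looks like
#   01/10/2016  10:00 AM    <DIR>          PROGRA~1     Program Files
ENTRY_RE = re.compile(
    r"(?:<DIR>|[\d,]+)\s+(?P<short>[^\s~]{1,6}~(?P<num>\d+)(?:\.\S{1,3})?)\s+(?P<long>.+?)\s*$")


def copy_order(dir_x_output):
    """Group entries by short-name stem (+ extension), ordered by their ~N number."""
    groups = defaultdict(list)
    for line in dir_x_output.splitlines():
        m = ENTRY_RE.search(line)
        if not m:
            continue  # header, summary, or an entry without a generated short name
        short = m.group("short")
        stem, _, rest = short.partition("~")
        ext = rest.partition(".")[2]
        key = stem + ("." + ext if ext else "")
        groups[key].append((int(m.group("num")), short, m.group("long")))
    return {key: [(s, l) for _, s, l in sorted(items)] for key, items in groups.items()}
```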

Note that the "/MIR" flag (mirroring) implies "/PURGE /E"; the former removes files on the destination that the origin no longer has (reported as "Extra" files in the final statistics). This may or may not be desirable depending on the directory you're processing.

The "/XJ" (or at least "/XJD") flag is critical to avoid infinite loops while copying the "C:\Users" area. With this flag, robocopy does not recurse into (directory) junctions. Alas, there is no flag to copy over definitions of the junctions - they are ignored completely. I filled this niche with FAR, as described earlier.

Alternatively, you can find junctions with a bit of CMD and some patience, e.g.:
set DD=ProgramData
dir "C:\%DD%" /q /r /a:h 2>NUL | find "JUNCTION"
or just search for links:
dir "C:\%DD%" /q /r /a:l 2>NUL

You can re-create junctions using "mklink /j" (and directory symlinks using "mklink /d").
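The discovery and re-creation steps can be chained. Below is a minimal Python sketch, assuming the output of the "dir ... /r /a:l" command above run WITHOUT "/q" (which adds an owner column this parser does not handle), optionally with "/s". It emits "mklink /j" for JUNCTION entries and "mklink /d" for SYMLINKD ones, rebases link locations from "C:" to a target drive (the "N:" default is only an example) and keeps targets verbatim. The function name is hypothetical, and the commands are only returned, not executed.

```python
import ntpath
import re

# Assumed "dir /r /a:l" (optionally /s) output, English locale, WITHOUT /q:
#    Directory of C:\Users
#   07/14/2009  07:08 AM    <JUNCTION>     Default User [C:\Users\Default]
HEADER_RE = re.compile(r"^\s*Directory of (?P<dir>.+?)\s*$")
LINK_RE = re.compile(r"<(?P<kind>JUNCTION|SYMLINKD)>\s+(?P<name>.+?) \[(?P<target>.+)\]\s*$")


def mklink_commands(dir_output, src_drive="C:", dst_drive="N:"):
    """Turn listed junctions/directory symlinks into mklink commands.

    Link locations are rebased from src_drive to dst_drive; targets are kept
    verbatim, so they resolve once the copy boots with its usual letters.
    """
    commands, current_dir = [], None
    for line in dir_output.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current_dir = header.group("dir")
            continue
        m = LINK_RE.search(line)
        if not m or current_dir is None:
            continue
        link = ntpath.join(current_dir, m.group("name"))
        if link[:2].upper() == src_drive.upper():
            link = dst_drive + link[2:]
        # /j creates a directory junction, /d a directory symbolic link.
        flag = "/j" if m.group("kind") == "JUNCTION" else "/d"
        target = m.group("target")
        commands.append(f'mklink {flag} "{link}" "{target}"')
    return commands
```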

RECOVERY SESSION FROM INSTALL DISC


Ok, maybe I should have started with this - but it was a bit complicated on a laptop with no CD drive ;)

It may help avoid surprises to use "diskpart" to mark the source disk (or volume) as read-only, so that recursions into it, if they ever happen, cannot damage the original files. Also avoid the drive letters these volumes normally hold when booted as part of an installed OS. For example, to mark the disks visibly as "O"ld and "N"ew, assign drive letters "O:" and "N:" respectively:
x:> diskpart
list volume
(find which one is your source)
select volume 3
assign letter=o
attributes volume set readonly
list volume
(find which one is your target)
select volume 1
assign letter=n

To verify, you can try creating a file in a temporary location on the original partition:
x:> echo test > o:\temp\test.txt
The media is write-protected.
Now (hopefully) you are safe to (re-)run robocopy so it recreates the links as needed.

Anyhow, in this context Robocopy rocks quite well, e.g. it copies ownerships and ACLs with no problems. If I mount the source and destination drives so that neither uses the drive letters normally occupied in the installed OS, the directory junctions become non-recursive and so cause no problems (the /XJ /XJD /XJF flags become unnecessary), and SYMLINKD entries even get copied over.

However, copying JUNCTION entries is still a problem. These are better remade by hand, with "dir" to discover them and "mklink /j" to create them (now, this *may* require that a volume is mounted under its original drive letter, so that links resolve well at creation time... or maybe not...). Robocopy just recurses into them and creates paths of real directories too long to remove easily (I had to rename "Application Data" to "x" dozens of times before I could kill the mis-made tree).
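For such over-deep trees, the Windows extended-length path prefix ("\\?\") lets the Unicode file APIs ignore the classic 260-character MAX_PATH limit, so a script can delete the tree in one go instead of renaming levels by hand. Below is a minimal Python sketch, assuming a local drive path (UNC paths would need a different prefix) and a tree of real directories like the mis-made one described above. The function name is hypothetical, and it deletes everything under the given path, so double-check that path first.

```python
import os
import shutil


def rmtree_long(path):
    r"""Delete the directory tree at path, even if nested paths exceed MAX_PATH.

    On Windows the absolute path gets the extended-length prefix \\?\ so the
    file APIs used by shutil are not limited to 260 characters; on other
    platforms the path is used unchanged.
    """
    full = os.path.abspath(path)
    if os.name == "nt" and not full.startswith("\\\\?\\"):
        full = "\\\\?\\" + full
    shutil.rmtree(full)
```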

FINAL NOTES


The directory junctions seem to store the full path, e.g. pointing to disk "C:", whatever that is in the currently booted OS. Likewise, such paths are entered in very many locations in the registry and other configuration files.

So make sure to mount the disks correctly, so that the drive letters of the old and new logical volumes resolve the same way when you boot from them... I have yet to boot from this one, and maybe go back into Safe Mode to redefine which partitions are mounted to which drive letters once the new disk finally gets bolted into the laptop... I keep my fingers crossed now ;)