Friday, July 22, 2016

Setting up an OpenIndiana Hipster on OmniOS Bloody for distcc compilation... and vice-versa!

I participate in and try out a number of illumos-based community projects. Among these, I have a bleeding-edge OpenIndiana Hipster distribution on my laptop, and an OmniOS Bloody installation on a storage server at work. Recently I was playing with a recipe for distributed compilation with distcc, and wanted to try it out. Same operating environments are very much preferred, so the remote compilation head should also be (or seem) an OI Hipster. Luckily, we can fool IPS into installing whatever we want, as long as it is sane... (And note that part of the success story below may be due to both distros using the bleeding-edge code from illumos-gate, so underlying kernel and system calls expected by userland code are the same).

UPDATE below: the inverse setup, making an OmniOS zone hosted on OpenIndiana Hipster, also seems possible - though with a bit more of a workaround dance.

It all starts with a LAN connection, so here goes a bit of preparation: a dedicated VNIC where the buildhost will live:

root@omnios-host:/# dladm show-link
bge0        phys      1500   up       --         --
vboxnet0    phys      1500   up       --         --

root@omnios-host:/# dladm create-vnic -l bge0 vnic199

Note about zone nuances: OmniOS native (ipkg) zones are not-linked (do not require tight coupling of global-zone and local-zone software versions). This is inverted vs. OpenIndiana, where "ipkg" is linked like in latest OpenSolaris builds, and a new "nlipkg" is not-linked.

Create a zone:
root@omnios-host:/# zonecfg -z oibuild
zonecfg:oibuild> create -t SUNWipkg
zonecfg:oibuild> set zonepath=/zones/oibuild
zonecfg:oibuild> add net
zonecfg:oibuild:net> set physical=vnic199
zonecfg:oibuild:net> end
zonecfg:oibuild> set ip-type=exclusive
zonecfg:oibuild> set autoboot=true
zonecfg:oibuild> verify
zonecfg:oibuild> commit
zonecfg:oibuild> ^D
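
The same configuration can also be fed to zonecfg non-interactively through its "-f" command-file option, which helps when a zone gets torn down and re-created often. Below is a minimal sketch, assuming a hypothetical helper named emit_zone_config (not part of the original setup); it only prints the commands typed in the session above:

```shell
#!/bin/sh
# Hypothetical helper: print the zonecfg commands from the session above
# for a given zone name, brand template, and VNIC.
emit_zone_config() {
    zone="$1"; template="$2"; vnic="$3"
    cat <<EOF
create -t ${template}
set zonepath=/zones/${zone}
add net
set physical=${vnic}
end
set ip-type=exclusive
set autoboot=true
verify
commit
EOF
}

# Print the command list for the zone used in this article.
emit_zone_config oibuild SUNWipkg vnic199

# On the OmniOS host one would save that output to a file (any path works)
# and then run, for example:
#   zonecfg -z oibuild -f /path/to/oibuild.zonecfg
```

Keeping the command list in a function makes it easy to reuse for a second zone by changing only the three arguments.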

I might go on with delegated datasets, etc. - but this setup will NFS- or LOFS-mount whatever I need from the GZ later. So far I need the basics running.

And for these basics, I need the zone created without looking at incompatibilities. Namely, the "entire" incorporation does not concern me here, but the standard zone-branding script (/usr/lib/brand/ipkg/pkgcreatezone) blindly wants "entire" to be the same as in the GZ regardless of ipkg/nlipkg details (if there is an "entire" in the GZ - otherwise it is happy without). So we extend it a bit:

root@omnios-host:/# cp -pf /usr/lib/brand/ipkg/pkgcreatezone /usr/lib/brand/ipkg/pkgcreatezone.orig

...and apply (or type in) this patch:
--- /usr/lib/brand/ipkg/pkgcreatezone.orig      2016-01-29 15:57:40.006965818 +0100
+++ /usr/lib/brand/ipkg/pkgcreatezone   2016-07-22 18:09:49.734017982 +0200
@@ -169,6 +169,7 @@
 # It's ok to not find entire in the current image, since this means the user
 # can install pre-release development bits for testing purposes.
+[[ -n "$NO_ENTIRE_FMRI" ]] && entire_fmri="" || \

Now, I can bind it to my will using environment variables (so it defaults to doing standard incantations otherwise) :)
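
The one-line patch relies on a common shell idiom: test an environment variable, and fall through to the default computation only when it is unset or empty. Here is a minimal standalone sketch of that idiom. The function lookup_entire_fmri and the FMRI string it returns are hypothetical stand-ins, since the real lookup is not part of the patch shown, and the portable "[ ]" test is used instead of the script's "[[ ]]" so the sketch runs under any POSIX shell:

```shell
#!/bin/sh
# Hypothetical stand-in for the script's real lookup of the "entire" FMRI
# in the global zone (the actual command is not part of the patch shown).
lookup_entire_fmri() {
    echo "pkg://example/entire@0.5.11"
}

resolve_entire_fmri() {
    # Same shape as the patched line: a non-empty NO_ENTIRE_FMRI leaves the
    # value empty, otherwise the default lookup runs.
    [ -n "${NO_ENTIRE_FMRI:-}" ] && entire_fmri="" || \
        entire_fmri=$(lookup_entire_fmri)
    echo "entire_fmri='${entire_fmri}'"
}

resolve_entire_fmri                              # default behaviour
( NO_ENTIRE_FMRI=yes; resolve_entire_fmri )      # override, in a subshell
```

Note that "a && b || c" also runs "c" whenever "b" fails; that is harmless here because "b" is a plain assignment, which always succeeds.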

Also note that the original standard script only allows one package publisher to be used during zone installation. It could be expanded to pre-set and use more, but that change was not needed for this experiment and so is left out of the article's scope; additional publishers are added after the initial installation.

And it is simple to use:

root@omnios-host:/# zoneadm -z oibuild uninstall -F ; \
    NO_ENTIRE_FMRI=yes zoneadm -z oibuild install -v -P

A ZFS file system has been created for this zone.
       Image: Preparing at /zones/oibuild/root.
   Publisher: Using (
       Cache: Using /var/pkg/publisher.
  Installing: Packages (output follows)

Packages to install: 120
Mediators to change:   6
 Services to change:   4

DOWNLOAD                                PKGS         FILES    XFER (MB)   SPEED

Completed                            120/120   28192/28192  200.5/200.5  1.1M/s

PHASE                                          ITEMS

Installing new actions                   41903/41903
Updating package state database                 Done
Updating package cache                           0/0
Updating image state                            Done
Creating fast lookup database                   Done

        Note: Man pages can be obtained by installing pkg:/system/manual
 Postinstall: Copying SMF seed repository ... done.
        Done: Installation completed in 325.405 seconds.
  Next Steps: Boot the zone, then log into the zone console (zlogin -C)
              to complete the configuration process.

Now a few more repos to add...

root@omnios-host:/# pkg -R /zones/oibuild/root set-publisher -g

root@omnios-host:/# pkg -R /zones/oibuild/root set-publisher -g hipster-encumbered

And a few packages I need here...
root@omnios-host:/# pkg -R /zones/oibuild/root install build-essential ccache rsync sudo mc

Ultimately the zone can be booted, basic networking set up, home directory attached, "sudo gmake component-environment-prep" executed to get some common build deps for the recipe I'm at - and an OI buildhost running under OmniOS is ready to roll. Why should that hardware stay dormant? (once I get that distcc package done well) :)

So the distcc server in the oibuild zone has been compiled and started (not without hiccups so far - but hey, that's what I'm tinkering on):

root@oibuild:/# distccd --user jim -j 30 --stats --stats-port 12345 --log-stderr --no-detach --verbose --daemon -a &

And the client build from the laptop goes like:

jim@laptop$ gmake clean; echo =======; \
    DISTCC_HOSTS=",lzo,cpp" CCACHE_PREFIX="distcc" \
    COMPONENT_BUILD_GMAKE_ARGS=-j20 pump gmake publish
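
For reference, each DISTCC_HOSTS entry has the form HOST[/LIMIT][,OPTIONS], and pump mode expects the "lzo" and "cpp" options on every remote entry. A minimal sketch of composing such a list follows; the helper distcc_hosts and the host names buildhost-a and buildhost-b are hypothetical placeholders, not the hosts used in this setup:

```shell
#!/bin/sh
# Hypothetical helper: build a pump-mode DISTCC_HOSTS value from
# "host/limit" arguments by appending the ",lzo,cpp" options to each.
distcc_hosts() {
    list=""
    for entry in "$@"; do
        # ${list:+ } adds a separating space only after the first entry.
        list="${list}${list:+ }${entry},lzo,cpp"
    done
    echo "$list"
}

# Host names and job limits below are placeholders for this sketch.
DISTCC_HOSTS="$(distcc_hosts buildhost-a/30 buildhost-b/8)"
export DISTCC_HOSTS
echo "$DISTCC_HOSTS"
```

The per-host limit after the slash caps how many jobs the client sends to that server at once, so it would normally be kept in line with what the server's "-j" setting allows.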


UPDATE: The mirror-twin setup is also possible. Slightly more changes are needed to the "pkgcreatezone" script on the OI Hipster host, because originally it requires the new "sysding" package (successor to the old "sysidcfg" scripts) that is not provided by other distros.

root@hipster-host:/# diff -bu /usr/lib/brand/ipkg/pkgcreatezone{.orig,}
--- /usr/lib/brand/ipkg/pkgcreatezone.orig 2016-08-25 11:20:57.249926456 +0200
+++ /usr/lib/brand/ipkg/pkgcreatezone 2016-09-23 10:24:54.690076605 +0200
@@ -169,6 +169,7 @@
 # It's ok to not find entire in the current image, since this means the user
 # can install pre-release development bits for testing purposes.
+[[ -n "$NO_ENTIRE_FMRI" ]] && entire_fmri="" || \
@@ -263,7 +264,6 @@
- pkg:/service/management/sysding
@@ -272,6 +272,10 @@
+[[ -n "$NO_SYSDING" ]] || \
+ pkg:/service/management/sysding"
 # Get some diagnostic tools, truss, dtrace, etc.

Then you set up the VNIC and zone configuration, similar to the example above (note the OI "nlipkg" brand has the "OI", not "SUNW", prefix):

root@hipster-host:/# dladm create-vnic -l e1000g1 omnibld0

root@hipster-host:/# zonecfg -z omnibld
zonecfg:omnibld> create -t OInlipkg
zonecfg:omnibld> set zonepath=/zones/omnibld
zonecfg:omnibld> add net
zonecfg:omnibld:net> set physical=omnibld0
zonecfg:omnibld:net> end
zonecfg:omnibld> set ip-type=exclusive
zonecfg:omnibld> set autoboot=true
zonecfg:omnibld> verify
zonecfg:omnibld> commit
zonecfg:omnibld> ^D

And finally install the zone (I prepend destruction of old attempts... because... well... experiments are like that :) ):

root@hipster-host:/# zoneadm -z omnibld uninstall -F ; \
   zfs destroy -r rpool/zones/omnibld ; \
   NO_SYSDING=yes  NO_ENTIRE_FMRI=yes zoneadm -z omnibld install -v \
     -P omnios=

cannot open 'rpool/zones/omnibld': dataset does not exist
A ZFS file system has been created for this zone.
       Image: Preparing at /zones/omnibld/root.

   Publisher: Using omnios (
       Cache: Using /var/pkg/publisher.
  Installing: Packages (output follows)
Packages to install: 90
Mediators to change:  1
 Services to change:  4

DOWNLOAD                                PKGS         FILES    XFER (MB)   SPEED
Completed                              90/90   24570/24570  161.7/161.7  406k/s

PHASE                                          ITEMS
Installing new actions                   37912/37912
Updating package state database                 Done 
Updating package cache                           0/0 
Updating image state                            Done 
Creating fast lookup database                   Done 

        Note: Man pages can be obtained by installing pkg:/system/manual
 Postinstall: Copying SMF seed repository ... done.
        Done: Installation completed in 526.523 seconds.

  Next Steps: Boot the zone, then log into the zone console (zlogin -C)
              to complete the configuration process.

root@hipster-host:/# pkg -R /zones/omnibld/root set-publisher -g \

root@hipster-host:/# pkg -R /zones/omnibld/root set-publisher -g \

And then, probably, to get something useful in that zone, I'd build it or proceed to install "pkgsrc" framework from - but that's another story.

Finally, note that for pedantic production use you'd create a new zone brand referring to the customized "pkgcreatezone" script (leaving the default packaged one untouched) and to the standard copies of other files, similar to how the "OInlipkg" definition does. As a lesser evil, you only need the tweaks when you create a new zone like this - so you can revert to the saved "pkgcreatezone.orig" copy after installations are completed, if you're concerned about this.

Sunday, January 10, 2016

Robocopy'ing a windows system partition...


I'm having some fun time migrating a laptop away from a dying HDD (as confirmed by SMART checkups, e.g. using a nice Windows GUI wrapper for smartmontools called GSmartControl) to a new one, temporarily connected via a USB carrier.

I've hit and cleared a number of roadblocks, so decided to sum up for posterity what I did here ;)
  • Windows Activation is evil ;) If you clone to a new harddisk, try to minimize the system changes: things like partition renumbering, BIOS updates or hardware swaps (e.g. memory chips) should better wait until you get the cloned Windows to boot without losing its Genuine status (if you're lucky and that happens at all).
  • If NTFS compression is enabled, and especially if you use a non-Windows NTFS implementation for file-based copying, make sure NOT to compress the copies of the "C:\bootmgr" and "C:\bootwin" files, nor (most likely) of "C:\Boot", "C:\Recovery", "C:\System Restore", "C:\Windows\System32", "C:\Windows\winsxs" and "C:\Windows\SysWOW64" (whichever of these are available), since the boot manager and low-level drivers can be picky about being readable and usable "as is" from on-disk bits. It may be possible to apply compression after the clone boots up and locks the sensitive files.
  • CygWin-based rsync is, alas, not suitable for the task
  • Neither is XCOPY, at least not for just refreshing an existing replica
  • NTFS-3g also did not cut it for me, at least not from OpenIndiana
  • Using VSS (Shadow Volumes) sounded like a good idea, e.g. to copy the registry files and other always-open objects, but it is a bit of an adventure to actually create the volume in Windows 7 (and other non-server versions, I gather)
  • Native Microsoft RoboCopy has an issue copying ACLs, and a Microsoft hotfix 979808 was not applicable to the version installed on the laptop, so files (like the "C:\Boot" directory) have to become owned by the current user - or a hack can be used to copy as "TrustedInstaller" (who owns most of such offending objects)
  • DO HAVE the installation or at least recovery disk or ISO image and a means to boot it (CDROM drive, Dual-bootable hypervisor installed in a neighbor partition, etc.) for the cases when things do go wrong and you need to e.g. rewrite mount paths (registry HKLM\System\MountedDevices), reconfigure boot configuration (bcdedit, bootrec), decompress some driver files, etc.
    It also seems that "robocopy" used from the context of the Recovery session does not suffer from the permission issues, and of course there is no issue with opened and thus inaccessible system files. And if you fiddle with mountpoints or drive letters, you can also avoid complications due to copying of directory junctions (links). In fact, I'd now suggest starting with this option if possible.


I just used the GUI under Computer Management MMC to partition the new disk for the OSes it will have, and to "activate" the partition for Windows. This enabled bootability of the new disk from the NTFS. (Note that ultimately another partition with GRUB would chainload the Windows one, if needed). Further fine-tuning is possible with GNU "parted" (many OSes) or a Linux "fdisk".

It seems that this laptop's new WD SSHD drive has 512-byte sectors on the harddisk part, but announces larger sectors due to the Flash layer. After the initial copy refused to boot, and a "dd" byte-by-byte copy as well, we reformatted the disk guessing a 4KB NTFS block size and it seemed to do the trick.

Native Windows tools including "diskpart", "bcdedit", "bootrec" and "bootsect" also had their moment of glory during my experiments (getting that copy to boot up is an adventure of its own, possibly because the new SSHD disk has weird announcements about its sector sizes).



Initially I used a CygWin build of rsync (from the cwRsync project, maybe defunct now) to migrate the bulk of data with the usual "rsync -avPHK ..." mantra. Maybe I shouldn't have done so...

While cwrsync serves me well for networked media backups (photos etc.) between NTFS and Unix machines, including sym/hardlink support vs. NTFS directory- and file-"junctions" (which in turn are easily manipulated with the free FAR Manager), it happened to be a bad idea to use it within the Windows system. After the rsync run, all "junctions" were replicated as text files with "/cygdrive/c/..." contents which had to be cleaned up manually (again using FAR: one instance to try and copy over the changes, skipping all conflicts, and another FAR to manually redo the junctions on which the first one hiccuped -- mostly these are standardized structural links under C:\Users, though a few others were of my own making).

According to docs and forums, the non-POSIX NTFS features like alternate data streams are not seen by CygWin, and thus not by cwrsync (or other similar builds) either. Also I'm not convinced that ownerships and ACLs were properly transferred. And, rather predictably, "rsync" failed to copy files opened by the system, such as the registry and some log/db files.


I tried to reprocess the existing files with some other tools, rather than copy stuff over again (dying disk... quite a few retries though nearly no actual IO errors yet... took a couple of days to get here).

Alas, Microsoft's XCOPY seems to copy the same files over again - which kinda defeated my purpose. Just for kicks, this carefully crafted command-line did not help me:
xcopy /O /X /E /H /K /B /Y /R /C /L "C:\Boot" "W:\Boot"
It might still be a good choice for the initial (and only) copy, however.


My next idea was to reboot into another OS, mount the two Windows partitions, and rsync the data there. But with directory symlinks becoming full paths (e.g. "/mnt/win/Users" for the "./Documents and Settings") with whatever options I used to mount the NTFS volumes, the idea failed.

Also, again, I'm not certain that ACLs and owner/groups are properly replicated in this manner - which ordinarily matters little for me when I manipulate user-data, but can matter for the OS when its guts are being migrated.


Windows 7 has a reduced feature set of the "vssadmin" command: it can manipulate the Shadow area and list what is available, but it cannot take snapshots. According to Internet lore, there is another program from the Windows SDK which fills the gap - but there is also the System Restore GUI which uses (and creates) the snapshots under the hood. IF it is enabled at all. Oh boy...

  • To enable Shadowing on the original drive, set aside some space (at least 300 MB):
vssadmin resize shadowstorage /for=c: /on=c: /maxsize=320mb
  • On a more capable (Windows Server) system you would just use CLI to create the snapshot, e.g.:
vssadmin create shadow /For=C:
  • To create a snapshot on a desktop Windows 7, go to System Properties (Win+Pause) and into "System protection". Verify that it is "on" for your original drive ("C:") and if you can, press "Create..." to make a System Restore point, which is a VSS snapshot. Note that in my experiment, creating another point removed the first one. Maybe there was too little space set aside, or the feature is constrained in desktop Windows versions.
  • To enable usage of System Restore (if it is initially "Disabled by your system administrator"), go to the Registry (thanks to several blogs for this hint):
    • Start (Win+R) the regedit program
    • Browse into HKLM\Software\Policies\Microsoft\Windows NT\SystemRestore
    • Delete the values "DisableConfig" and "DisableSR"
    • No reboot should be needed, just re-open the System protection tab in the step above
  • Look up the snapshot name with:
    vssadmin List Shadows
    Shadow Copy Volume: \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy2
  • Mount that snapshot with "mklink /d" so it becomes a read-only directory (until the snapshot gets killed by the system):
    mklink /d c:\snap \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy2\
  • Note that for some specific purposes you can link to a sub-object, e.g.:
    mklink /d c:\snapwork \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy2\Users\myname\Documents\Work\
  • To unmount a snapshot later, it would suffice to just remove the FS object of the link (note that destruction of the snapshot is a different matter; it usually happens "by itself" at behest of the OS as it clears out some living space for itself or new snapshots):
    rmdir c:\snap


My final spell for replicating ownership, ACLs and the changes that piled up on the original system since the beginning of replication was to use MS RoboCopy, run manually for each directory I was interested in (not all of them existing ones). There is an issue with replicating permissions for FS objects that you (or your Administrators group) do not own. The hotfix did not work for me, but the workaround of grabbing the ownership (on the target filesystem - this sufficed) did work for some files. However, for "C:\Windows" there are way more concerns about such fiddling... so running cmd.exe as TrustedInstaller also sounds like a good idea for at least most of those files, as does more verbose robocopy logging to track which files yielded "Error 5" when trying to set NTFS permissions. Another problem is that the Shadows used in System Restore points are not quite versatile snapshots: they seem to omit certain files so that an OS rollback would not change user documents - so copying bits only from the snapshot is not enough.

First I copied from the mounted VSS snapshot under "C:\snap" into the mounted new partition under "W:", then overlaid it with files from the live filesystem (if different), e.g.:
set DD=Program Files
robocopy "C:\snap\%DD%" "W:\%DD%" /COPYALL /DCOPY:T /SECFIX /TIMFIX /SL /FFT /DST /R:1 /W:1 /E /MIR /XJD /LOG+:w:\robocopy.log
robocopy "C:\%DD%" "W:\%DD%" /COPYALL /DCOPY:T /SECFIX /TIMFIX /SL /FFT /DST /R:1 /W:1 /E /XJD /LOG+:w:\robocopy.log

Names with spaces can also be assigned to the variable without quotation marks, as seen above; the quotes go around the complete paths in the robocopy commands instead.

If you are in the Recovery Console and still see any errors due to files (likely bits of the Registry) being locked by some process, try a "Force Dismount" of the original and new volumes using "chkdsk /x" or "diskpart"/"select volume X"/"assign letter=Y".

For objects with potentially conflicting "8.3" names ("dir /x" is your friend to discover these), like "PROGRA~1" for anything starting with "Progra...", be sure to copy objects in the same order as they are numbered (or "move" into a temporary subdirectory and back) - for me, some Shortcuts and registry paths happened to be saved with the short space-less names in mind, so this nuance did matter for bootability of the copied system.

Note that the "/MIR" flag (mirroring) enables "/PURGE /E", the former of which removes files on the destination that the origin no longer has (named "Extra" files in the finally printed stats). This may or may not be desirable depending on the directory you're passing over.

The "/XJ" (or at least "/XJD") flag is critical to avoid infinite loops while copying the "C:\Users" area. With this flag, robocopy does not recurse into (directory) junctions. Alas, there is no flag to copy over definitions of the junctions - they are ignored completely. I filled this niche with FAR, as described earlier.

Alternately, you can find junctions with a bit of CMD and some patience, e.g.:
set DD=ProgramData
dir "C:\%DD%" /q /r /a:h 2>NUL | find "JUNCTION"
or just search for links:
dir "C:\%DD%" /q /r /a:l 2>NUL

You can re-create junctions using "mklink /j" (or directory symlinks using "mklink /d").


Ok, maybe I should have started with this - but it was a bit complicated on a laptop with no CD drive ;)

It may help avoid surprises to use "diskpart" and mark the source disk (or volume) as read-only, to be sure that recursions into it, if they ever happen, do not damage original files. Also do not use drive-letters that these volumes normally hold when booted as part of an installed OS. For example, to mark the disks visibly as "O"ld and "N"ew, assign drive-letters "O:" and "N:" respectively:
x:> diskpart
list volume
(find which one is your source)
select volume 3
assign letter=o
attributes volume set readonly
list volume
(find which one is your target)
select volume 1
assign letter=n

To verify, you can try creating a file in a temporary location on the original partition:
x:> echo test > o:\temp\test.txt
The media is write-protected.
Now (hopefully) you are safe to (re-)run robocopy so it recreates the links as needed.

Anyhow, in this context Robocopy rocks quite well - e.g. it copies the ownerships and ACLs with no problems. If I mount the source and destination drives so that neither of them uses the drive letters normally occupied in the installed OS, the directory junctions become non-recursive and so do not cause problems (the /XJ /XJD /XJF flags become unnecessary), and SYMLINKD entries even get copied over.

However, copying JUNCTION entries is still a problem. These would better be remade with "dir" to discover and "mklink /j" to create (or "mklink /d" for directory symlinks; now, this *may* require that a volume is mounted into the original drive-letter, so links resolve well at time of creation... or maybe not...). Robocopy just recurses and creates paths of real directories too long to easily remove (I had to rename "Application Data" into "x" dozens of times before I could kill the mis-made tree).


The directory junctions seemingly name the full path, e.g. linking to disk "C:", whichever way it is defined in the currently booted OS. Likewise, the paths are entered in very many locations of the registry and other configuration files.

So make sure to mount the disks correctly, so the drive letters of old and new logical volumes resolve the same when you boot from them... I have yet to boot from this one, and maybe go back into Safe mode to redefine which partitions are mounted to which drive-letters, when the new disk gets finally bolted into the laptop... I keep my fingers crossed now ;)