Planet Linux Australia
Celebrating Australians & Kiwis in the Linux and Free/Open-Source community...

August 08, 2020

Setting the default web browser on Debian and Ubuntu

If you are wondering what your default web browser is set to on a Debian-based system, there are several things to look at:

$ xdg-settings get default-web-browser
brave-browser.desktop

$ xdg-mime query default x-scheme-handler/http
brave-browser.desktop

$ xdg-mime query default x-scheme-handler/https
brave-browser.desktop

$ ls -l /etc/alternatives/x-www-browser
lrwxrwxrwx 1 root root 29 Jul  5  2019 /etc/alternatives/x-www-browser -> /usr/bin/brave-browser-stable*

$ ls -l /etc/alternatives/gnome-www-browser
lrwxrwxrwx 1 root root 29 Jul  5  2019 /etc/alternatives/gnome-www-browser -> /usr/bin/brave-browser-stable*

Debian-specific tools

The contents of /etc/alternatives/ are system-wide defaults and must therefore be set as root:

sudo update-alternatives --config x-www-browser
sudo update-alternatives --config gnome-www-browser

The sensible-browser tool (from the sensible-utils package) will use these to automatically launch the most appropriate web browser depending on the desktop environment.

Standard MIME tools

The others can be changed as a normal user. Using xdg-settings:

xdg-settings set default-web-browser brave-browser-beta.desktop

will also change what the two xdg-mime commands return:

$ xdg-mime query default x-scheme-handler/http
brave-browser-beta.desktop

$ xdg-mime query default x-scheme-handler/https
brave-browser-beta.desktop

since it puts the following in ~/.config/mimeapps.list:

[Default Applications]
text/html=brave-browser-beta.desktop
x-scheme-handler/http=brave-browser-beta.desktop
x-scheme-handler/https=brave-browser-beta.desktop
x-scheme-handler/about=brave-browser-beta.desktop
x-scheme-handler/unknown=brave-browser-beta.desktop

Note that if you delete these entries, then the system-wide defaults, defined in /etc/mailcap, will be used, as provided by the mime-support package.
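To see what a particular mimeapps.list holds for one type or scheme without going through xdg-mime, a small helper along these lines can be used. This is only a sketch: the function name mime_default is made up for illustration, and it assumes the simple key=value layout shown above.

```shell
# Print the handler registered for a MIME type or URL scheme in a
# mimeapps.list file, looking only at the [Default Applications] section.
# Usage: mime_default <type> [file]
# (mime_default is a hypothetical helper name, not part of xdg-utils.)
mime_default() {
    awk -F= -v key="$1" '
        # Track which section we are in; only one section is of interest.
        /^\[/ { in_defaults = ($0 == "[Default Applications]"); next }
        in_defaults && $1 == key { print $2; exit }
    ' "${2:-$HOME/.config/mimeapps.list}"
}
```

For example, mime_default x-scheme-handler/https prints the desktop file registered for HTTPS links in the per-user file, or nothing if there is no entry.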

Changing the x-scheme-handler/http (or x-scheme-handler/https) association directly using:

xdg-mime default brave-browser-nightly.desktop x-scheme-handler/http

will only change that particular one. I suppose this means you could have one browser for insecure HTTP sites (hopefully with HTTPS Everywhere installed) and one for HTTPS sites, though I'm not sure why anybody would want that.

Summary

In short, if you want to set your default browser everywhere (using Brave in this example), do the following:

sudo update-alternatives --config x-www-browser
sudo update-alternatives --config gnome-www-browser
xdg-settings set default-web-browser brave-browser.desktop
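The --config calls above are interactive. For scripting, update-alternatives also has a non-interactive --set mode, and the three steps could be wrapped roughly as follows. This is a sketch under assumptions: the function names and the DRY_RUN variable are invented for illustration, and the binary path passed in must be one already registered as an alternative.

```shell
# Print the command when DRY_RUN is non-empty, otherwise execute it.
# (run and DRY_RUN are illustrative names, not an existing tool.)
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# set_default_browser <binary> <desktop-file>
set_default_browser() {
    binary=$1
    desktop=$2
    # System-wide alternatives (needs root); --set is the non-interactive
    # counterpart of --config.
    run sudo update-alternatives --set x-www-browser "$binary"
    run sudo update-alternatives --set gnome-www-browser "$binary"
    # Per-user default (no root needed).
    run xdg-settings set default-web-browser "$desktop"
}
```

Running DRY_RUN=1 first, e.g. with /usr/bin/brave-browser-stable and brave-browser.desktop as arguments, prints the commands without changing anything.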

Small 1/4 inch socket set into a nicer walnut tray

I was recently thinking about how I could make a selection of 1/4 inch drive bits easier to use. It seems I am not alone in the crowd of people who leave the bits in the case they came in. Some folks do that for many decades. Apart from being trapped into what "was in the set", this also creates an issue when you have some 1/4 inch parts in a case that includes many more 3/8 inch drive bits. I originally marked the smaller drive parts and thought about leaving them in the blow molded case, as is the common case.

The CNC fiend in me eventually got the better of me and the below is the result. I cut a prototype in pine first, knowing that getting it all as I wanted on the first try was not impossible, but not probable either. Version 1 is shown below.

 

The advantage is that now I have the design in Fusion 360 I can cut this design in about an hour. So if I want to add a bunch of deep sockets to the set I can do that for the time cost mostly of gluing up a panel, fixturing it and a little sanding and shellac. Not a trivial endeavour, but the result I think justifies the means.

Below is the board still fixtured in the CNC machine. I think I will make a jig with some sliding toggle clamps so I can fix panels to the jig and then bolt the jig into the CNC instead of directly using hold down clamps.

I have planned to use a bandsaw to cut a profile around the tools and may end up with some handle(s) on the tray. That part is something I have to think more about. The thinking about how I want the tools to be stored and accessed is an interesting side project.



 

 

August 04, 2020

Willsmere and Cricket

Whether Australia's first notable cricketer, Tom Wills, was at Kew Asylum is apparently subject to debate.

The following Kew Asylum related cricket stories, however, are not. Note the inclusion of one of the greats of early Australian cricket, Hugh Trumble.

William Evans Midwinter

The plaque on the grave commemorates William Evans Midwinter (1851-1890), the only cricketer to play for Australia versus England (8 tests) and England versus Australia (4 tests).

William ("Billy") Evans Midwinter (19 June 1851 – 3 December 1890) was an English-born cricketer who played four Test matches for England, sandwiched in between eight Tests that he played for Australia. Midwinter holds a unique place in cricket history as the only cricketer to have played for Australia and England in Test matches against each other.

By 1889, Midwinter's wife and two of his children had died, and his businesses were failed or failing. He became "hopelessly insane" and was confined to Bendigo Hospital in 1890. He was then transferred to the Kew Asylum, where he died later that year.

http://monumentaustralia.org.au/display/30687-william-evans-midwinter

KEW ASYLUM CRICKET CLUB.

From: The Australasian (Melbourne, Vic. : 1864 - 1946) Sat 2 Oct 1875
Page 11

A meeting of the members of the staff of the Kew Asylum was held on Saturday afternoon last, when it was resolved to form a cricket club at the establishment. Dr. Robertson was elected president, and Dr. Watkins and Mr. William Davis vice-presidents, Dr. Molloy hon. secretary and treasurer, and Messrs. Trumble, Johnston, Swift, and Flynn as committee.

The club starts with a large number of members, and with such players as Swift, Niall and Flynn, it is likely to prove rather formidable. The club has not had an opportunity of making any matches yet, but would be glad to receive a few challenges for the ensuing season.

KEW ASYLUM CRICKET CLUB.

From: Boyle & Scott's Australian cricketers' guide., no.1882/83, 1882-01-01 p114

The club has had a fairly successful season, although they had tough opponents in Bohemia, Kew, Fitzroy, Brighton, &c. Among the players, H. Trumble, T. Foley, and G. Roberts have shown improved form with the bat, whilst M'Michael, W. Trumble, and Swift are as effective as of old. In bowling, H. Trumble, Arnold, and Swift have been most destructive.

Batting Averages.

Player         Inns.  Not out  Runs  Most in inns.  Most in match  Aver.
J. M'Michael      19        9   528            64*            64*   52.8
J. W. Trumble      6        1   229           105*           105*   45.4
J. S. Swift       15        4   497            79             79    45.2
C. Ross            4        0   114            75             75    28.2
H. Trumble        18        7   234            52*            52*   21.3
G. Roberts        13        1   167            39             39   13.11
T. Foley          17        3   163            44             44    11.9
G. Arnold         14        2   103            31             31     8.7

Bowling Averages.

Player         Balls.  Mdns.  Runs.  Wkts.  Aver.
J. W. Trumble     268     16     50     13   3.11
J. S. Swift       394     16    148     25   5.23
G. Arnold         650     26    320     43   7.19
T. Foley          258     11     88      8     11
H. Trumble        834     33    349     28  12.13
G. M'Garvin       120      3     51      4   12.3
J. M'Michael      177      5    126     10   12.6

Swift 2 no balls; Trumble 1.

CRICKET. KEW ASYLUM v. N. MELBOURNE.

From: The Reporter (Box Hill, Vic. : 1889 - 1925) Fri 12 Mar 1909 Page 7

The above match, played on the asylum ground on Saturday, was won by the home team by 7 wickets and 127 runs. North Melbourne scored 47 (Howlett 20 not out), while the Asylum lost 3 wickets for 174 (R. Morrison 110 not out, including 17 fourers, A. Walsh 38, R. Walsh 25 not out). Howlett, 1 for 34, and Buncle, 1 for 33, took the wickets for North Melbourne, and for Kew, Kenny 4 for 19, Crouch 4 for 21.

August 01, 2020

Audiobooks – July 2020

The Address Book: What Street Addresses Reveal About Identity, Race, Wealth, and Power by Deirdre Mask

Covered the subtitle well. I would have liked some of the more technical stuff that the author mentioned reading. 3/5

The Fated Sky: Lady Astronaut #2 by Mary Robinette Kowal

Set mostly in the lead-up to the first Mars expedition and the journey to Mars. Lots of interpersonal/interracial problems & fixing toilets. 3/5

Enola Gay: Mission to Hiroshima by Gordon Thomas

Mainly following Paul Tibbets and the 509th Composite Group plus some on the ground in Hiroshima. Has “minute-by-minute coverage of the critical periods”. 3/5

Pandemic by John Dryden

A 3 part radio play set before/after and during a global pandemic. Total length just 2 hours. Parts 2 & 3 felt a little cliched. 3/5

An Economist Walks into a Brothel: And Other Unexpected Places to Understand Risk by Allison Schrager

Examples of how people in unusual situations handle risk and how you can apply it to your life. Interesting and useful. 4/5

100 Things The Simpsons Fans Should Know & Do Before They Die by Allie Goertz & Julia Prescott

Over 4 minutes per fact so plenty of depth. A great collection of stuff for casual and serious fans. 4/5

Chernobyl 01:23:40 : The Incredible True Story of the World’s Worst Nuclear Disaster by Andrew Leatherbarrow

Chapters on the disaster and aftermath alternate with the author’s trip to the Chernobyl Exclusion Zone. A good intro to the disaster. 3/5

Turn the Ship Around! : A True Story of Turning Followers into Leaders by L. David Marquet

The management advice is lost on me but the stories about turning around an under-performing sub crew in weeks are interesting. 3/5

My Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average, in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all


July 31, 2020

Extending GPG key expiry

Extending the expiry on a GPG key is not very hard, but it's easy to forget a step. Here's how I did my last expiry bump.

Update the expiry on the main key and the subkey:

gpg --edit-key KEYID
> expire
> key 1
> expire
> save

Upload the updated key to the keyservers:

gpg --export KEYID | curl -T - https://keys.openpgp.org
gpg --keyserver keyring.debian.org --send-keys KEYID
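Since the easy step to forget is the subkey, it can help to confirm the new expiry dates before uploading. The following is a rough sketch, not part of the procedure above: key_expiries is a made-up helper name, it assumes GnuPG 2.x colon output (key ID in field 5, expiry in field 7 as seconds since the epoch) and GNU date for the conversion.

```shell
# Read `gpg --list-keys --with-colons` output on stdin and print one line per
# primary key (pub) and subkey (sub): record type, key ID and expiry date.
# Assumes field 7 holds the expiry as epoch seconds (empty = never expires)
# and that `date -d @N` is available (GNU date).
key_expiries() {
    while IFS=: read -r type f2 f3 f4 keyid created expiry rest; do
        case "$type" in
            pub|sub)
                if [ -n "$expiry" ]; then
                    expiry=$(date -u -d "@$expiry" +%Y-%m-%d)
                else
                    expiry="never"
                fi
                echo "$type $keyid $expiry"
                ;;
        esac
    done
}
```

Typical use would be gpg --list-keys --with-colons KEYID | key_expiries, checking that both the pub line and the sub line show the new date.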

Links July 2020

iMore has an insightful article about Apple’s transition to the ARM instruction set for new Mac desktops and laptops [1]. I’d still like to see them do something for the server side.

Umair Haque wrote an insightful article about How the American Idiot Made America Unlivable [2]. We are witnessing the destruction of a once great nation.

Chris Lamb wrote an interesting blog post about comedy shows with the laugh tracks edited out [3]. He then compares that to social media with the like count hidden, which is an interesting perspective. I’m not going to watch TV shows edited in that way (I’ve enjoyed BBT in spite of all the bad things about it) and I’m not going to try and hide like counts on social media. But it’s interesting to consider these things.

Cory Doctorow wrote an interesting Locus article suggesting that we could have full employment by a transition to renewable energy and methods for cleaning up the climate problems we are too late to prevent [4]. That seems plausible, but I think we should still get a Universal Basic Income.

The Thinking Shop has posters and decks of cards with logical fallacies and cognitive biases [5]. Every company should put some of these in meeting rooms. Also they have free PDFs to download and print your own posters.

gayhomophobe.com [6] is a site that lists powerful homophobic people who hurt GLBT people but then turned out to be gay. It’s presented in an amusing manner; people who hurt others deserve to be mocked.

Wired has an insightful article about the shutdown of Backpage [7]. The owners of Backpage weren’t nice people and they did some stupid things which seem bad (like editing posts to remove terms like “lolita”). But they also worked well with police to find criminals. The opposition to what Backpage were doing conflates sex trafficking, child prostitution, and legal consenting adult sex work. Taking down Backpage seems to be a bad thing for the victims of sex trafficking, for consenting adult sex workers, and for society in general.

Cloudflare has an interesting blog post about short lived certificates for ssh access [8]. Instead of having users’ ssh keys stored on servers, each user has to connect to an SSO server to obtain a temporary key before connecting, so revoking an account is easy.

July 27, 2020

How to create Linux bridges and Open vSwitch bridges with NetworkManager

My virtual infrastructure Ansible role supports connecting VMs to both Linux and Open vSwitch bridges, but they must already exist on the KVM host.

Here is how to convert an existing Ethernet device into a bridge. Be careful if doing this on a remote machine with only one connection! Make sure you have some other way to log in (e.g. console), or maybe add additional interfaces instead.

Export interfaces and existing connections

First, export the device you want to convert so we can easily reference it later (e.g. eth1).

export NET_DEV="eth1"

Now list the current NetworkManager connections for your device exported above, so we know what to disable later.

sudo nmcli con | grep -w "${NET_DEV}"

This might be something like System eth1 or Wired connection 1; let’s export it too for later reference.

export NM_NAME="Wired connection 1"

Create a Linux bridge

Here is an example of creating a persistent Linux bridge with NetworkManager. It will take a device such as eth1 (substitute as appropriate) and convert it into a bridge. Note that we will be specifically giving it the device name of br0 as that’s the standard convention and what things like libvirt will look for.

Make sure you have exported your device as NET_DEV and its existing NetworkManager connection name as NM_NAME from above; you will use them below.

sudo nmcli con add ifname br0 type bridge con-name br0
sudo nmcli con add type bridge-slave ifname "${NET_DEV}" master br0 con-name br0-slave-"${NET_DEV}"

Note that br0 probably has a different MAC address to your physical interface. If so, make sure you update any DHCP reservations (or be able to find the new IP once the bridge is brought up).

sudo ip link show dev br0
sudo ip link show dev "${NET_DEV}"

Configure the bridge

By default the Linux bridge will get an address via DHCP. If you don’t want it to be on the network (you might have another dedicated interface) then disable DHCP on it.

sudo nmcli con modify br0 ipv4.method disabled ipv6.method disabled

Or, if you need to set a static IP you can do that too.

sudo nmcli con modify br0 ipv4.method static ipv4.address 192.168.123.100/24

If you need to set a specific MTU like 9000 (defaults to 1500), you can do that.

sudo nmcli con modify br0-slave-"${NET_DEV}" 802-3-ethernet.mtu 9000

Finally, spanning tree protocol is on by default, so disable it if you need to.

sudo nmcli con modify br0 bridge.stp no

Bring up the bridge

Now you can either simply reboot, or stop the current interface and bring up the bridge (do it in one command in case you’re using the one interface, else you’ll get disconnected). Note that your IP might change once the bridge comes up, if you didn’t check the MAC address and update any static DHCP leases.

sudo nmcli con down "${NM_NAME}" ; \
sudo nmcli con up br0
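The Linux bridge steps above can be collected into one function, with a dry-run switch so the commands can be reviewed before touching a remote machine. This is a sketch only: run, DRY_RUN and make_linux_bridge are names invented here, and the nmcli invocations are simply the ones from this post.

```shell
# Print the command when DRY_RUN is non-empty, otherwise execute it.
# (run and DRY_RUN are illustrative names, not an existing tool.)
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# make_linux_bridge <device> <existing-connection-name>
# Creates bridge br0, enslaves the device to it, then swaps the profiles.
make_linux_bridge() {
    dev=$1
    old_con=$2
    run sudo nmcli con add ifname br0 type bridge con-name br0 || return 1
    run sudo nmcli con add type bridge-slave ifname "$dev" master br0 \
        con-name "br0-slave-$dev" || return 1
    # Take the old profile down and bring the bridge up back to back, so a
    # session over this device is interrupted only briefly.
    run sudo nmcli con down "$old_con"
    run sudo nmcli con up br0
}
```

With DRY_RUN=1 set, make_linux_bridge "${NET_DEV}" "${NM_NAME}" only prints the four commands (the printed form drops the quoting around names containing spaces).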

Create an Open vSwitch (OVS) bridge

OVS bridges are often used for plumbing into libvirt for use with VLANs.

We can create an OVS bridge which will consist of the bridge itself and multiple ports and interfaces which connect everything together, including the physical device itself (so we can talk on the network) and virtual ports for VLANs and VMs. By default the physical port on the bridge will use untagged (native) VLAN, but if all your traffic needs to be tagged then we can add a tagged interface.

Here is an example of creating a persistent OVS bridge with NetworkManager. It will take a device such as eth1 (substitute as appropriate) and convert it into an ovs-bridge.

Install dependencies

You will need openvswitch installed as well as the OVS NetworkManager plugin.

sudo dnf install -y NetworkManager-ovs openvswitch
sudo systemctl enable --now openvswitch
sudo systemctl restart NetworkManager

Create the bridge

Let’s create the bridge, its port and interface with these three commands.

sudo nmcli con add type ovs-bridge conn.interface ovs-bridge con-name ovs-bridge
sudo nmcli con add type ovs-port conn.interface port-ovs-bridge master ovs-bridge con-name ovs-bridge-port
sudo nmcli con add type ovs-interface slave-type ovs-port conn.interface ovs-bridge master ovs-bridge-port con-name ovs-bridge-int

Patch in our physical interface

Next, create another port on the bridge and patch in our physical device as an Ethernet interface so that real traffic can flow across the network. Make sure you have exported your device as NET_DEV and its existing NetworkManager connection name as NM_NAME from above; you will use them below.

sudo nmcli con add type ovs-port conn.interface ovs-port-eth master ovs-bridge con-name ovs-port-eth
sudo nmcli con add type ethernet conn.interface "${NET_DEV}" master ovs-port-eth con-name ovs-port-eth-int

OK, now you should have an OVS bridge configured and patched to your local network via your Ethernet device, but not yet active.

Configure the bridge

By default the OVS bridge will be sending untagged traffic and requesting an IP address for ovs-bridge via DHCP. If you don’t want it to be on the network (you might have another dedicated interface) then disable DHCP on the interface.

sudo nmcli con modify ovs-bridge-int ipv4.method disabled ipv6.method disabled

Or if you need to set a static IP you can do that too.

sudo nmcli con modify ovs-bridge-int ipv4.method static ipv4.address 192.168.123.100/24

If you need to set a specific MTU like 9000 (defaults to 1500), you can do that.

sudo nmcli con modify ovs-bridge-int 802-3-ethernet.mtu 9000
sudo nmcli con modify ovs-port-eth-int 802-3-ethernet.mtu 9000

Bring up the bridge

Before you bring up the bridge, note that ovs-bridge will probably have a MAC address which is different to your physical interface. Keep that in mind if you manage DHCP static leases, and make sure you can find the new IP so that you can log back in once the bridge is brought up.

Now you can either simply reboot, or stop the current interface and bring up the bridge and its interfaces (in theory we just need to bring up ovs-port-eth-int, but let’s make sure and do it in one command in case you’re using the one interface, else you’ll get disconnected and not be able to log back in). Note that your MAC address may change here, so if you’re using DHCP you’ll get a new IP and your session will freeze; be sure you can find the new IP so you can log back in.

sudo nmcli con down "${NM_NAME}" ; \
sudo nmcli con up ovs-port-eth-int ; \
sudo nmcli con up ovs-bridge-int

Now you have a working Open vSwitch implementation!

Create OVS VLAN ports

From there you might want to create some port groups for specific VLANs. For example, if your network does not have a native VLAN, you will need to create a VLAN interface on the OVS bridge to get onto the network.

Let’s create a new port and interface for VLAN 123 which will use DHCP by default to get an address and bring it up.

sudo nmcli con add type ovs-port conn.interface vlan123 master ovs-bridge ovs-port.tag 123 con-name ovs-port-vlan123
sudo nmcli con add type ovs-interface slave-type ovs-port conn.interface vlan123 master ovs-port-vlan123 con-name ovs-int-vlan123
sudo nmcli con up ovs-int-vlan123

If you need to set a static address on the VLAN interface instead, you can do so by modifying the interface.

sudo nmcli con modify ovs-int-vlan123 ipv4.method static ipv4.address 192.168.123.100/24
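If several VLANs are needed, the three commands for VLAN 123 generalise to a function that takes the VLAN ID. The sketch below is illustrative rather than definitive: run, DRY_RUN and add_ovs_vlan are invented names, the naming scheme just mirrors the vlan123 example, and the ID check reflects the usable 802.1Q range of 1 to 4094.

```shell
# Print the command when DRY_RUN is non-empty, otherwise execute it.
# (run and DRY_RUN are illustrative names, not an existing tool.)
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# add_ovs_vlan <vlan-id> [bridge-connection]
add_ovs_vlan() {
    vlan=$1
    bridge=${2:-ovs-bridge}
    # Reject anything that is not a plain number in the range 1-4094.
    case "$vlan" in
        ''|*[!0-9]*) echo "VLAN ID must be a number" >&2; return 1 ;;
    esac
    if [ "$vlan" -lt 1 ] || [ "$vlan" -gt 4094 ]; then
        echo "VLAN ID must be between 1 and 4094" >&2
        return 1
    fi
    # Tagged port on the bridge, then an internal interface on that port.
    run sudo nmcli con add type ovs-port conn.interface "vlan$vlan" \
        master "$bridge" ovs-port.tag "$vlan" \
        con-name "ovs-port-vlan$vlan" || return 1
    run sudo nmcli con add type ovs-interface slave-type ovs-port \
        conn.interface "vlan$vlan" master "ovs-port-vlan$vlan" \
        con-name "ovs-int-vlan$vlan" || return 1
    run sudo nmcli con up "ovs-int-vlan$vlan"
}
```

Calling add_ovs_vlan 123 with DRY_RUN=1 prints the same three commands as above, which is a quick way to check the generated names before applying them.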

View the OVS configuration

Show the switch config and bridge with OVS tools.

sudo ovs-vsctl show

Clean up old interface profile

It’s not really necessary, but you can disable the current NetworkManager config for the device so that it doesn’t conflict with the bridge, if you want to.

sudo nmcli con modify "${NM_NAME}" ipv4.method disabled ipv6.method disabled

Or you can even delete the old interface’s NetworkManager configuration if you want to (but it’s not necessary).

sudo nmcli con delete "${NM_NAME}"

That’s it!

July 25, 2020

Batch Image Processing

It may initially seem counter-intuitive, but sometimes one needs to process an image file without actually viewing the image file. This is particularly the case if one has a very large number of image files and a uniform change is required. The slow process is to open the image files individually in whatever application one is using and make the changes required, save and open the next file and make the changes required, and so forth. This is time-consuming, boring, and prone to error.

Avoiding such activities is why computers were invented; computers are extremely good at accurate and fast automation of computational tasks (and what can be automated should be automated); leaving humans to carry out the tasks of innovation, invention, discovery, and aesthetics. Automation of regular activities can be easily carried out with shell script loops which are used here.

The following touches the surface of certain automation tasks that I have encountered in the past. I am not a photographer, a graphic designer, or anything of the sort. I consider my skills in such endeavours as sorely lacking. However, I do have a working knowledge of how to get a GNU/Linux based system to do things with a minimal amount of work, and I strongly believe in reducing the amount of work that others have to do.

Install Necessary Software

I will start by assuming that the gentle reader of this document is using Ubuntu 18.04 LTS. It is not necessarily my Linux distribution of choice, but it is usually the one that people are initially exposed to. People who are using other distributions will find that the installation process is slightly different (e.g., use of yum install for RedHat/CentOS), or, if they are very keen and performance-sensitive, may install the software from source rather than packages.

Four software applications are suggested here: UFRaw (Unidentified Flying Raw), which converts camera RAW images to standard image files; ghostscript, a PostScript and PDF language interpreter and previewer; ImageMagick for image file manipulations; and poppler-utils for modifying PDFs (see a previous post about how awesome they are).

It is always worth getting a system's package information up-to-date first:
sudo apt-get update

Install the applications:
sudo apt-get install ufraw-batch
sudo apt-get install ghostscript
sudo apt-get install imagemagick
sudo apt-get install poppler-utils

Converting Raw Images to Standard Image Files

Photographers will like this one. The best tool for converting raw images to standard image files is ufraw-batch. The following example loops over all the files in a directory with the suffix .CR2, a raw camera image format created by Canon digital cameras. The website imaging-resources provides a good collection of some example files; they are quite large!

It is unfortunately common for non-printing characters to make their way into filenames these days. When working with the command-line, life is a lot easier if spaces are removed from filenames, because many core applications read the space as a delimiter; 'My File' will be read as two files, "My" and "File"! The following loop command can be used to get rid of these unnecessary spaces.

for item in ./*; do mv "$item" "$(echo "$item" | tr -d " ")"; done
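One thing the loop above does not guard against is two names collapsing to the same target (for example 'a b.txt' and 'ab.txt'), in which case mv would overwrite a file. A more cautious variant might look like the following sketch; strip_spaces is a hypothetical helper name, and it only touches entries whose own name contains a space.

```shell
# strip_spaces [dir]: remove spaces from file names in dir (default: .),
# skipping any rename that would overwrite an existing file.
strip_spaces() {
    dir=${1:-.}
    for item in "$dir"/*" "*; do
        [ -e "$item" ] || continue      # the glob matched nothing
        base=$(basename "$item")
        target="$dir/$(printf '%s' "$base" | tr -d ' ')"
        if [ -e "$target" ]; then
            echo "skipping '$item': '$target' already exists" >&2
        else
            mv -- "$item" "$target"
        fi
    done
}
```

Running strip_spaces with no argument works on the current directory, like the one-line loop.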

The following loop will, for the current working directory, loop over each and every file with a .CR2 suffix, and run the ufraw-batch command, outputting a new file with the jpg format. This could also be ppm, tiff, jpeg, or fits.

for item in ./*.CR2; do ufraw-batch --out-type jpg "$item" ; done

There is an excellent range of manipulation options with ufraw-batch; check the manual page (man ufraw-batch) for some examples.

As an aside, the output format that one uses should depend on what the file is being used for. In brief, a jpg is a compressed lossy format that is handy for web-published photographs due to their size. In comparison, PNG is a lossless compression format, which produces larger files. It is more typically recommended for drawings. TIFF is a lossless format that is typically uncompressed and used for print publications.

Or, rather than typing everything in detail just use the flowchart by Allen Hsu.

Modifying Existing Image Files

As the blockquote in the previous section suggests, sometimes one might not have the right image format for the job that one wants to do.

The following simple loops allow for mass conversion of files from one format to another or force particular characteristics into a file. Each of them makes use of the convert utility that is part of imagemagick.

for item in ./*.jpg ; do convert "$item" "${item%.*}.png" ; done
for item in ./*.jpg ; do convert "$item" -monochrome "$item"; done
for item in ./*.jpg; do convert "$item" -define jpeg:extent=512kb "${item%.*}.jpg" ; done
for item in ./*.jpg; do convert "$item" ../logo.jpg -gravity southeast -geometry +10+10 -composite "${item%.*}logo".jpg ; done

The first loop converts all jpg files in a directory to png files.

The second loop replaces all jpg files in a directory with a monochrome copy. The '-monochrome' dither is clearer, but other options that could be used for a similar effect include '-threshold xx%' or '-remap pattern:gray50', for less contrast but retaining more information.

The third loop limits all jpg files to a maximum file size (512kb).

The fourth loop adds a logo (logo.jpg) to all jpg files in a directory. Note that the logo file is a directory level above the image folders; otherwise, the loop would engage in the sort of horrible recursion where the logo is placed in the logo, which would be weird.
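The loops above lean on the shell expansion ${item%.*} to build the output name. As a minimal illustration (swap_ext is a made-up helper name), the expansion strips the shortest trailing ".suffix" and leaves any earlier dots alone:

```shell
# swap_ext <path> <new-extension>: replace the last ".suffix" of a path.
swap_ext() {
    # ${1%.*} removes the shortest trailing match of ".*", i.e. the final
    # dot and everything after it, leaving earlier dots untouched.
    printf '%s\n' "${1%.*}.$2"
}
```

So swap_ext ./holiday.2020.jpg png gives ./holiday.2020.png. One caveat: a name without any extension, written as ./name, is mangled, because the only dot the pattern can match is the leading one.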

The following is the sort of insane request that one gets from managers; "could you please put the following jpg files into a PDF? In order?"

Such a request involves two steps: converting the jpg files to PDFs, and then combining the PDF files. An assumption here is that the files are each prefixed with an ordered value (and that there are no special characters in the file names, since ls is being used to parse them).

for item in $(ls -v *.jpg); do convert "$item" "${item%.*}.pdf"; done
pdfunite $(ls -v *.pdf) output.pdf
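The ordering is the part that matters here: ls -v sorts naturally, so 2.jpg comes before 10.jpg. If file names may contain spaces, one way to get the same ordering without word-splitting the output of ls is a glob piped through sort -V. This is a sketch assuming GNU sort; ordered_jpgs is an invented name.

```shell
# ordered_jpgs [dir]: list the jpg files in dir (default: .) in natural
# (version) order, one per line.  Relies on `sort -V` from GNU coreutils.
ordered_jpgs() {
    for f in "${1:-.}"/*.jpg; do
        [ -e "$f" ] && printf '%s\n' "$f"
    done | sort -V
}
```

The list can then be consumed line by line, for example ordered_jpgs . | while IFS= read -r f; do convert "$f" "${f%.*}.pdf"; done, which keeps names with spaces intact (names containing newlines would still break it).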

There certainly is a great deal more that one can do with the now-installed applications and with scripts. Imagemagick, for example, provides excellent documentation on its command-line tools. The scripting methods used here were primarily simple for-loops. There are far more sophisticated means of engaging in such automation (conditional tests, reads from a file descriptor, continue/break statements, etc). But for now, these examples should serve as a useful short introduction on how to modify dozens or hundreds of images in batch mode without even looking at a single image.

July 19, 2020

4FSK on 25 Microwatts

Bill, VK5DSP, and I have been testing a new 4FSK modem waveform that uses LDPC Forward Error Correction (FEC). Bill designed the LDPC codes we are using, and worked out how to perform the soft decision to Log Likelihood Ratio (LLR) calculations we need to use LDPC codes with 4FSK. This is surprisingly tricky, and took a few weeks of careful simulation work.

Once working in simulation, we wanted to test the system Over the Air (OTA).

Experiment

We set up and adjusted antennas on the 2m Ham band, our FT-817 radios, and some USB rig interface boxes. This always takes longer than you think! Bill has a Yagi with 10dB gain fed by coax with 2dB loss. I have a Flower Pot vertical dipole with (I estimate) 2dB gain, and a crummy coax run with 5dB of loss. Here is Bill’s Yagi (it was mounted higher for the actual experiments):

Bill and I are 10km apart, with a non-line-of-sight path across suburban Adelaide between us.

The GNU Octave modem simulation has been split into Tx and Rx scripts. We generate a file of Tx samples using one script. The Tx station then plays that file OTA, while the Rx station records a file of received samples, which are then run through the Rx script to obtain Bit Error Rate (BER) and Packet Error Rate (PER) results. Here is a sample run using a local file test.raw:

octave:46> fsk_lib_ldpc_tx("test.raw")
octave:48> fsk_lib_ldpc_rx("test.raw")
Fs: 8000 Rs: 100 frames received:   8
  Uncoded: nbits:   3584 nerrs:      0 ber: 0.000
  Coded..: nbits:   2048 nerrs:      0 ber: 0.000
  Coded..: npckt:      8 perrs:      0 per: 0.000

The modem can be configured for any bit rate. We started with 100 symbols/s (200 bits/s for 4FSK) and a half rate code, resulting in 100 bits/s payload data rate. As an initial sanity test, we tried receiving our Tx signal using a RTLSDR/Airspy and gqrx located a few metres from the Tx. This confirmed the OTA link was working over short distances. A lot can go wrong with OTA tests, so it’s really important to build up to them slowly, testing at every stage.
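The relationship between symbol rate, bits per symbol and code rate is simple enough to check on the command line. A small sketch (payload_bps is a name made up here; 4FSK carries 2 bits per symbol):

```shell
# payload_bps <symbols/s> <bits per symbol> <code rate num> <code rate den>
# Payload bit rate = symbol rate * bits per symbol * code rate.
payload_bps() {
    echo $(( $1 * $2 * $3 / $4 ))
}
```

payload_bps 100 2 1 2 gives the 100 bits/s payload rate quoted above for 100 symbols/s with a half rate code.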

Results – Day 1

On our first attempt we hit a few USB/LSB and tuning bugs. Turns out the core, uncoded demod works quite well with LSB and USB swapped – it’s just that all the tone frequencies are reversed so the decoded bits are scrambled!

After we sorted those issues out Bill managed to record and decode my 4FSK signal. We started with 600mW Tx power. The SNR was very high, and we had zero errors. Hmm, too strong! So we started inserting attenuation. Eventually, with about 1mW Tx power, we started to get some errors and found the “knee” in the curve – the point where the FEC falls over and we start getting packet errors.

With powerful codes like LDPC this transition is very sharp: change your SNR by just 1dB and you go from 0 to 100% packet errors.

We measured the path loss using a spectrum analyser. With Bill transmitting a 2W (33dBm) FSK signal, I measured -102dBm at the spec-an input, so an end-end path loss of 135dB. I estimated the free-space loss, for a line-of-sight path, including feed losses and antenna gains, at about 89dB. So the non-line-of-sight path is costing us 135 - 89 = 46dB extra attenuation.

Now our expected MDS can be estimated as:

MDS = Eb/No + 10*log10(bitRate) + NoiseFigure - 174
    = 7 + 10*log10(100) + 10 - 174
    = -137 dBm

I’m guessing the FT-817 noise figure at 5dB, and I have a coax loss of 5dB ahead of that so 10dB total. Now we reached our MDS with a Tx power of 0dBm, and our path loss is 135dB, this gives us a measured MDS of 0 – 135 = -135dBm. Not bad – theory and measurement within a few dB!
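The arithmetic above is easy to re-run for other bit rates or noise figures. The following is a rough sketch using awk for the floating point; the function names are invented for illustration and the inputs are the figures quoted in this post.

```shell
# path_loss_db <tx power dBm> <rx power dBm>: end to end path loss in dB.
path_loss_db() {
    echo $(( $1 - ($2) ))
}

# mds_dbm <Eb/No dB> <bit rate bit/s> <noise figure dB>
#   MDS = Eb/No + 10*log10(bitRate) + NoiseFigure - 174
mds_dbm() {
    awk -v ebno="$1" -v rate="$2" -v nf="$3" \
        'BEGIN { printf "%.1f\n", ebno + 10 * log(rate) / log(10) + nf - 174 }'
}
```

With the numbers above, path_loss_db 33 -102 gives 135 and mds_dbm 7 100 10 gives -137.0, against a measured MDS of 0 - 135 = -135dBm.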

We performed further tests with a rate 3/4 LDPC code at 400 symbols/s (600 bit/s throughput). This worked well at 10mW Tx power. Here are some plots showing the modem operation:

The top subplot above shows how the demodulator estimates the frequency of each of the 4FSK tones. They are nice and smooth and spaced at 400Hz, just as we would hope. The bottom plot is the time estimate, which shows a typical saw tooth pattern as the sample clocks of Bill’s DAC and my ADC are a little off (a few 100ppm). The demodulator automatically adjusts for this.

The top subplot above shows the SNR estimate (in the noise bandwidth of the modem, not 3000 Hz). Our modem falls over when this hits around 7dB, so at 10mW we have 3dB margin. The bottom subplot is the number of bit errors per frame. Our LDPC codeword is about 2000 bits long, so that’s a raw BER of around 50/2000 or 2.5%.

You can decode this signal yourself using the off air sample:

octave:48> fsk_lib_ldpc_rx("20200717_4fsk_rs400_10mw.wav")

Now 600 bit/s is pretty close to the bit rate we need for Digital Voice which has us thinking ….. here is Bill trying SSB at the same power.

Results – Day 2

On our second day of testing Bill positioned himself at a site in the Adelaide Hills which overlooks the city. The idea was to get a better path (closer to line of sight) so we could try lower powers. For these tests Bill used a J-pole mounted on his car.

We managed to decode the Rs=100 symbols/s rate 1/2 waveform at an EIRP of -16dBm (25 microwatts). That’s the transmit power, not the received power.
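For readers not used to thinking in dBm, here is a short Python sketch of the conversion between dBm and watts. The helper names are invented for this example; the values are the powers mentioned in this post.

```python
import math


def dbm_to_watts(dbm):
    """dBm is decibels relative to 1 milliwatt."""
    return 10 ** (dbm / 10) / 1000


def watts_to_dbm(watts):
    return 10 * math.log10(watts * 1000)


print(f"-16dBm = {dbm_to_watts(-16) * 1e6:.0f} microwatts")
print(f"2W     = {watts_to_dbm(2):.0f}dBm")
print(f"10mW   = {watts_to_dbm(0.010):.0f}dBm")
```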

Unfortunately we ran out of attenuators at that stage so couldn’t go any lower. It even worked really well with the FT-817 driving just a 50 ohm attenuator when the antenna cable was disconnected!

Discussion

We had a great time testing and are pleased that our modem performs just as expected over real world channels. We were both surprised that it worked – these tests usually take many attempts to get right!

I found OTA testing at VHF a lot easier than HF. You click the attenuator up and down a few dB and the SNR follows. The channel really is just additive noise, unlike HF that has all that time varying frequency selective fading and impulse noise.

What we have here is a carefully engineered 4FSK modem with powerful FEC that performs right on the theoretical limits. It can send packets of data between two points very efficiently. The payload bit rate and MDS can be scaled up and down to suit different applications. It’s all open source, and backed by a series of simulations documenting its development and testing.

Reading Further

Pull Request for the 4FSK to LLR mapping simulation
Pull Request for the 4FSK LDPC OTA tests
Open IP over VHF/UHF Using a RPi and RTLSDR at higher bit rates.
Bill’s LowSNR blog
Codec 2 FSK modem README
The design for Bill’s 6 element Yagi

The KSM and I

I spent much of yesterday playing with KSM (Kernel Shared Memory, or Kernel Samepage Merging depending on which universe you come from). Unix kernels store memory in “pages” which are moved in and out of memory as a single block. On most Linux architectures pages are 4,096 bytes long.

KSM is a Linux Kernel feature which scans memory looking for identical pages, and then de-duplicating them. So instead of having two pages, we just have one and have two processes point at that same page. This has obvious advantages if you’re storing lots of repeating data. Why would you be doing such a thing? Well the traditional answer is virtual machines.

Take my employer’s systems for example. We manage virtual learning environments for students, where every student gets a set of virtual machines to do their learning thing on. So, if we have 50 students in a class, we have 50 sets of the same virtual machine. That’s a lot of duplicated memory. The promise of KSM is that instead of storing the same thing 50 times, we can store it once and therefore fit more virtual machines onto a single physical machine.

For my experiments I used libvirt / KVM on Ubuntu 18.04. To ensure KSM was turned on, I needed to:

  • Ensure KSM is turned on. /sys/kernel/mm/ksm/run should contain a “1” if it is enabled. If it is not, just write “1” to that file to enable it.
  • Ensure libvirt is enabling KSM. The KSM value in /etc/default/qemu-kvm should be set to “AUTO”.
  • Check KSM metrics:
# grep . /sys/kernel/mm/ksm/*
/sys/kernel/mm/ksm/full_scans:891
/sys/kernel/mm/ksm/max_page_sharing:256
/sys/kernel/mm/ksm/merge_across_nodes:1
/sys/kernel/mm/ksm/pages_shared:0
/sys/kernel/mm/ksm/pages_sharing:0
/sys/kernel/mm/ksm/pages_to_scan:100
/sys/kernel/mm/ksm/pages_unshared:0
/sys/kernel/mm/ksm/pages_volatile:0
/sys/kernel/mm/ksm/run:1
/sys/kernel/mm/ksm/sleep_millisecs:200
/sys/kernel/mm/ksm/stable_node_chains:49
/sys/kernel/mm/ksm/stable_node_chains_prune_millisecs:2000
/sys/kernel/mm/ksm/stable_node_dups:1055
/sys/kernel/mm/ksm/use_zero_pages:0
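If you want to watch these counters over time rather than eyeball the grep output, something like the following Python sketch would do. The function names are my own, and the saving estimate assumes that each page counted in pages_sharing is one duplicate that no longer needs its own 4,096 byte page.

```python
import os
from pathlib import Path


def read_ksm_metrics(ksm_dir="/sys/kernel/mm/ksm"):
    """Return the integer-valued KSM tunables and counters as a dict."""
    metrics = {}
    for entry in Path(ksm_dir).iterdir():
        try:
            metrics[entry.name] = int(entry.read_text().strip())
        except (ValueError, OSError):
            continue  # skip anything that is not a readable integer
    return metrics


def saved_bytes(metrics, page_size=4096):
    """Rough saving: pages_sharing counts the duplicates that were merged away."""
    return metrics.get("pages_sharing", 0) * page_size


if __name__ == "__main__" and os.path.isdir("/sys/kernel/mm/ksm"):
    ksm = read_ksm_metrics()
    print(f"KSM running: {ksm.get('run') == 1}")
    print(f"Approximate saving: {saved_bytes(ksm) / 2**20:.1f} MiB")
```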

My lab machines are currently set up with Shaken Fist, so I just quickly launched a few hundred identical VMs. This first graph is that experiment. It’s a little hard to see here, but on three machines I consumed about 40GB of RAM with identical VMs and then waited. After three or so hours I had saved about 2,500 pages of memory.

To be honest, that’s a pretty disappointing result. 2,500 4KB pages is only about 10MB of RAM, which isn’t very much at all. Also, three hours is a really long time for our workload, where students often fire up their labs for a couple of hours at a time before shutting them down again. If this was as good as KSM gets, it wasn’t for us.

After some pondering, I realised that KSM is configured by default to not work very well. The default value for pages_to_scan is 100, which means each scan run only inspects about half a megabyte of RAM. It would take a very very long time to scan a modern machine that way. So I tried setting pages_to_scan to 1,000,000,000 instead. One billion is an unreasonably large number for the real world, but hey. You update this number by writing a new value to /sys/kernel/mm/ksm/pages_to_scan.
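To put a number on "a very very long time", here is a back-of-the-envelope Python sketch. It assumes ksmd scans pages_to_scan pages, sleeps for sleep_millisecs, and repeats, and it ignores the time the scanning itself takes, so treat the result as a best case rather than a measurement.

```python
def ksm_scan_seconds(ram_bytes, pages_to_scan, sleep_millisecs, page_size=4096):
    """Best-case time for ksmd to walk ram_bytes of memory once.

    Only the sleep between batches is counted; the work of scanning and
    comparing pages is assumed to be free, which it is not.
    """
    bytes_per_wakeup = pages_to_scan * page_size
    wakeups = ram_bytes / bytes_per_wakeup
    return wakeups * sleep_millisecs / 1000


# The defaults shown above: 100 pages per batch, 200ms sleep between batches.
default_hours = ksm_scan_seconds(40 * 2**30, 100, 200) / 3600
print(f"{default_hours:.1f} hours to walk 40GB once at the defaults")
```

With pages_to_scan set to a billion the sleep stops mattering and the scan rate is limited by how fast a single core can compare pages, which this model does not capture.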

This time we get a much better result: I launched as many VMs as would fit on each machine, and then sat back and waited (well, went to bed actually). Again the graph is a bit hard to read, but what it is saying is that after 90 minutes KSM had saved me over 300GB of RAM across the three machines. It’s still a little too slow for our workload, but for workloads where the VMs are relatively static that’s a real saving.

Now it should be noted that setting pages_to_scan to 1,000,000,000 comes at a cost — each of these machines now has one of its 48 cores dedicated to scanning memory and deduplicating. For my workload that’s something I am ok with because my workload is not CPU bound, but it might not work for you.

Shaken Fist 0.2.0

The other day we released Shaken Fist version 0.2, and I never got around to announcing it here. In fact, we’ve done a minor release since then and have another minor release in the wings ready to go out in the next day or so.

So what’s changed in Shaken Fist between version 0.1 and 0.2? Well, actually kind of a lot…

  • We moved from MySQL to etcd for storage of persistent state. This was partially done because we wanted distributed locking, but it was also because MySQL was a pain to work with.
  • We rearranged our repositories: the main repository is now in its own GitHub organisation, and the golang REST client, terraform provider, and deployment tooling have moved into their own repositories in that organisation. There is also a prototype javascript client now as well.
  • Some work has gone into making the API service more production grade, although there is still some work to be done there, probably in the 0.3 release. Specifically, there is a timeout if a response takes more than 300 seconds, which can be the case when launching large VMs where the disk images are not in cache.

There were also some important features added:

  • Authentication of API requests.
  • Resource ownership.
  • Namespaces (a bit like Kubernetes namespaces or OpenStack projects).
  • Resource tagging, called metadata.
  • Support for local mirroring of common disk images.
  • …and a large number of bug fixes.

Shaken Fist is also now packaged on PyPI, and the deployment tooling knows how to install from packages as well as source if that’s a thing you’re interested in. You can read more at shakenfist.com, but that site is a bit of a work in progress at the moment. The new GitHub organisation is at github.com/shakenfist.

July 17, 2020

If You’re not Using YAML for CloudFormation Templates, You’re Doing it Wrong

In my last blog post, I promised a rant about using YAML for CloudFormation templates. Here it is. If you persevere to the end I’ll also show you how to convert your existing JSON based templates to YAML.

Many of the points I raise below don’t just apply to CloudFormation. They are general comments about why you should use YAML over JSON for configuration when you have a choice.

One criticism of YAML is its reliance on indentation. A lot of the code I write these days is Python, so indentation being significant is normal. Use a decent editor or IDE and this isn’t a problem. It doesn’t matter if you’re using JSON or YAML, you will want to validate and lint your files anyway. How else will you find that trailing comma in your JSON object?

Now we’ve got that out of the way, let me try to convince you to use YAML.

As developers we are regularly told that we need to document our code. CloudFormation is Infrastructure as Code. If it is code, then we need to document it. That starts with the Description property at the top of the file. If you use JSON for your templates, that’s it, you have no other opportunity to document your templates. On the other hand, if you use YAML you can add inline comments. Anywhere you need a comment, drop in a hash # and your comment. Your team mates will thank you.

JSON templates don’t support multiline strings. These days many developers have 4K or ultra wide monitors, but we don’t want a string that spans the full width of our 34” screen. Text becomes harder to read once you exceed that “90ish” character limit. With JSON your multiline string becomes "[90ish-characters]\n[another-90ish-characters]\n[and-so-on]". If you opt for YAML, you can use the greater than symbol (>) and then start your multiline string like so:

Description: >
  This is the first line of my Description
  and it continues on my second line
  and I'll finish it on my third line.

As you can see, it is much easier to work with multiline strings in YAML than in JSON.

“Folded blocks” like the one above are created using the > symbol, which replaces new lines with spaces. This allows you to format your text in a more readable way, while still allowing a machine to use it as intended. If you want to preserve the new lines, use the pipe (|) to create a “literal block”. This is great for inline Lambda functions, where the code remains readable and maintainable.

  APIFunction:
    Type: AWS::Lambda::Function
    Properties:
      Code:
        ZipFile: |
          import json
          import random


          def lambda_handler(event, context):
              return {"statusCode": 200, "body": json.dumps({"value": random.random()})}
      FunctionName: "GetRandom"
      Handler: "index.lambda_handler"
      MemorySize: 128
      Role: !GetAtt LambdaServiceRole.Arn
      Runtime: "python3.7"
      Timeout: 5
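To make the difference between the two block styles concrete, here is a tiny Python model of what a YAML loader hands back in the simple case. This is not a YAML parser: it only covers equally indented lines with no blank lines between them, and it assumes the default behaviour of keeping a single trailing new line.

```python
def folded(lines):
    """Model of a folded block (>): line breaks between lines become spaces."""
    return " ".join(lines) + "\n"


def literal(lines):
    """Model of a literal block (|): line breaks are kept as written."""
    return "\n".join(lines) + "\n"


description = [
    "This is the first line of my Description",
    "and it continues on my second line",
    "and I'll finish it on my third line.",
]

print(repr(folded(description)))   # one long line
print(repr(literal(description)))  # three separate lines
```

This is why folded blocks suit descriptions, while literal blocks suit code, where the line breaks are part of the syntax.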

Both JSON and YAML require you to escape multibyte characters. That’s less of an issue with CloudFormation templates as generally you’re only using the ASCII character set.

In a YAML file generally you don’t need to quote your strings, but in JSON double quotes are used everywhere: keys, string values and so on. If your string contains a quote you need to escape it. The same goes for tabs, new lines, backslashes and so on. JSON based CloudFormation templates can be hard to read because of all the escaping. It also makes it harder to handcraft your JSON when your code is a long escaped string on a single line.
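You can see the escaping problem for yourself with nothing more than Python's standard json module. This sketch takes the inline Lambda code from the earlier example and encodes it the way a JSON template would have to store it; the "ZipFile" key mirrors the property name used above.

```python
import json

code = '''import json
import random


def lambda_handler(event, context):
    return {"statusCode": 200, "body": json.dumps({"value": random.random()})}
'''

# A JSON template has to carry the function body as one escaped string.
encoded = json.dumps({"ZipFile": code})
print(encoded)

# Nothing is lost in the round trip, it is just much harder to read and edit.
assert json.loads(encoded)["ZipFile"] == code
```

Every new line becomes \n and every double quote becomes \", all on a single line.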

Some configuration in CloudFormation can only be expressed as JSON. Step Functions and some of the AppSync objects in CloudFormation only allow inline JSON configuration. You can still use a YAML template and it is easier if you do when working with these objects.

The JSON only configuration needs to be inlined in your template. If you’re using JSON you have to supply this as an escaped string, rather than nested objects. If you’re using YAML you can inline it as a literal block. Both YAML and JSON templates support functions such as Sub being applied to these strings, but it is so much more readable with YAML. See this Step Function example lifted from the AWS documentation:

MyStateMachine:
  Type: "AWS::StepFunctions::StateMachine"
  Properties:
    DefinitionString:
      !Sub |
        {
          "Comment": "A simple AWS Step Functions state machine that automates a call center support session.",
          "StartAt": "Open Case",
          "States": {
            "Open Case": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:open_case",
              "Next": "Assign Case"
            }, 
            "Assign Case": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:assign_case",
              "Next": "Work on Case"
            },
            "Work on Case": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:work_on_case",
              "Next": "Is Case Resolved"
            },
            "Is Case Resolved": {
                "Type" : "Choice",
                "Choices": [ 
                  {
                    "Variable": "$.Status",
                    "NumericEquals": 1,
                    "Next": "Close Case"
                  },
                  {
                    "Variable": "$.Status",
                    "NumericEquals": 0,
                    "Next": "Escalate Case"
                  }
              ]
            },
             "Close Case": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:close_case",
              "End": true
            },
            "Escalate Case": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:escalate_case",
              "Next": "Fail"
            },
            "Fail": {
              "Type": "Fail",
              "Cause": "Engage Tier 2 Support."
            }
          }
        }
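Because the definition is just a string as far as CloudFormation is concerned, a typo in a state name will not be caught by a YAML linter. A simple check like the following Python sketch can help. The function name is my own, the cut-down definition is only for illustration, and the check covers just the Next, Default and top-level Choices transitions rather than the whole Amazon States Language.

```python
import json


def undefined_targets(definition_text):
    """Return (state, target) pairs where a transition names a missing state."""
    definition = json.loads(definition_text)  # raises ValueError on invalid JSON
    states = definition["States"]
    targets = [("StartAt", definition["StartAt"])]
    for name, state in states.items():
        if "Next" in state:
            targets.append((name, state["Next"]))
        if "Default" in state:
            targets.append((name, state["Default"]))
        for choice in state.get("Choices", []):
            targets.append((name, choice["Next"]))
    return [(src, dst) for src, dst in targets if dst not in states]


# The ${...} placeholders sit inside JSON strings, so the text parses
# even before Sub has substituted them.
definition = """
{
  "StartAt": "Open Case",
  "States": {
    "Open Case": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:open_case",
      "Next": "Assign Case"
    },
    "Is Case Resolved": {
      "Type": "Choice",
      "Choices": [
        {"Variable": "$.Status", "NumericEquals": 1, "Next": "Close Case"}
      ]
    },
    "Close Case": {"Type": "Succeed"}
  }
}
"""

print(undefined_targets(definition))
```

Here "Assign Case" has been left out of the cut-down definition on purpose, so the check reports the transition from "Open Case" as dangling.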

If you’re feeling lazy you can use inline JSON for IAM policies that you’ve copied from elsewhere. It’s quicker than converting them to YAML.

YAML templates are smaller and more compact than the same configuration stored in a JSON based template. Smaller yet more readable is winning all round in my book.

If you’re still not convinced that you should use YAML for your CloudFormation templates, go read Amazon’s blog post from 2017 advocating the use of YAML based templates.

Amazon makes it easy to convert your existing templates from JSON to YAML. cfn-flip is a Python-based AWS Labs tool for converting CloudFormation templates between JSON and YAML. I will assume you’ve already installed cfn-flip. Once you’ve done that, converting your templates with some automated cleanups is just a command away:

cfn-flip --clean template.json template.yaml

git rm the old json file, git add the new one and git commit and git push your changes. Now you’re all set for your new life using YAML based CloudFormation templates.

If you want to learn more about YAML files in general, I recommend you check out Learn X in Y Minutes’ Guide to YAML. If you want to learn more about YAML based CloudFormation templates, check Amazon’s Guide to CloudFormation Templates.

July 16, 2020

Windows 10 on Debian under KVM

Here are some things that you need to do to get Windows 10 running on a Debian host under KVM.

UEFI Booting

UEFI is big and complex, but most of what it does isn’t needed at all. If all you want to do is boot from an image of a disk with a GPT partition table then you just install the package ovmf and add something like the following to your KVM start script:

UEFI="-drive if=pflash,format=raw,readonly,file=/usr/share/OVMF/OVMF_CODE.fd -drive if=pflash,format=raw,readonly,file=/usr/share/OVMF/OVMF_VARS.fd"

Note that some of the documentation on this doesn’t have the OVMF_VARS.fd file set to readonly. Allowing writes to that file means that the VM boot process (and maybe later) can change EFI variables that affect later boots and other VMs if they all share the same file. For a basic boot you don’t need to change variables so you want it read-only. Also having it read-only is necessary if you want to run KVM as non-root.

As an experiment I tried booting without the OVMF_VARS.fd file. It didn’t boot, and even after configuring it to use the OVMF_VARS.fd file again Windows gave a boot error about the “boot configuration data file” that required booting from recovery media. Apparently configuration mistakes with EFI can mess up the Windows installation, so be careful and back up the Windows installation regularly!

Linux can boot from EFI but you generally don’t want to unless the boot device is larger than 2TB. It’s relatively easy to convert a Linux installation on a GPT disk to a virtual image on a DOS partition table disk or on block devices without partition tables and that gives a faster boot. If the same person runs the host hardware and the VMs then the best choice for Linux is to have no partition tables just one filesystem per block device (which makes resizing much easier) and have the kernel passed as a parameter to kvm. So booting a VM from EFI is probably only useful for booting Windows VMs and for Linux boot loader development and testing.

As an aside, the Debian Wiki page about Secure Boot on a VM [4] was useful for this. It’s unfortunate that it and so much of the documentation about UEFI is about secure boot which isn’t so useful if you just want to boot a system without regard to the secure boot features.

Emulated IDE Disks

Debian kernels (and probably kernels from many other distributions) are compiled with the paravirtualised storage device drivers. Windows by default doesn’t support such devices so you need to emulate an IDE/SATA disk so you can boot Windows and install the paravirtualised storage driver. The following configuration snippet has a commented line for paravirtualised IO (which is fast) and an uncommented line for a virtual IDE/SATA disk that will allow an unmodified Windows 10 installation to boot.

#DRIVE="-drive format=raw,file=/home/kvm/windows10,if=virtio"
DRIVE="-drive id=disk,format=raw,file=/home/kvm/windows10,if=none -device ahci,id=ahci -device ide-drive,drive=disk,bus=ahci.0"

Spice Video

Spice is an alternative to VNC. Here is the main web site for Spice [1]. Spice has many features that could be really useful for some people, like audio, sharing USB devices from the client, and streaming video support. I don’t have a need for those features right now but it’s handy to have options. My main reason for choosing Spice over VNC is that the mouse cursor in ssvnc doesn’t follow the actual mouse and it can be difficult or impossible to click on items near edges of the screen.

The following configuration will make the QEMU code listen with SSL on port 1234 on all IPv4 addresses. Note that this exposes the Spice password to anyone who can run ps on the KVM server, I’ve filed Debian bug #965061 requesting the option of a password file to address this. Also note that the “qxl” virtual video hardware is VGA compatible and can be expected to work with OS images that haven’t been modified for virtualisation, but that they work better with special video drivers.

KEYDIR=/etc/letsencrypt/live/kvm.example.com-0001
-spice password=xxxxxxxx,x509-cacert-file=$KEYDIR/chain.pem,x509-key-file=$KEYDIR/privkey.pem,x509-cert-file=$KEYDIR/cert.pem,tls-port=1234,tls-channel=main -vga qxl

To connect to the Spice server I installed the spice-client-gtk package in Debian and ran the following command:

spicy -h kvm.example.com -s 1234 -w xxxxxxxx

Note that this exposes the Spice password to anyone who can run ps on the system used as a client for Spice, I’ve filed Debian bug #965060 requesting the option of a password file to address this.

This configuration with an unmodified Windows 10 image only supported 800*600 resolution VGA display.

Networking

To set up bridged networking as non-root you need to do something like the following as root:

chgrp kvm /usr/lib/qemu/qemu-bridge-helper
setcap cap_net_admin+ep /usr/lib/qemu/qemu-bridge-helper
mkdir -p /etc/qemu
echo "allow all" > /etc/qemu/bridge.conf
chgrp kvm /etc/qemu/bridge.conf
chmod 640 /etc/qemu/bridge.conf

Windows 10 supports the emulated Intel E1000 network card. Configuration like the following configures networking on a bridge named br0 with an emulated E1000 card. MAC addresses that have a 1 in the second least significant bit of the first octet are “locally administered” (like IPv4 addresses starting with “10.”), see the Wikipedia page about MAC Address for details.

The following is an example of network configuration where $ID is an ID number for the virtual machine. So far I haven’t come close to 256 VMs on one network so I’ve only needed one octet.

NET="-device e1000,netdev=net0,mac=02:00:00:00:01:$ID -netdev tap,id=net0,helper=/usr/lib/qemu/qemu-bridge-helper,br=br0"
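If you script the creation of VMs, the MAC address can be generated from the ID. The following Python sketch uses hypothetical helper names and assumes the ID is written as two hex digits in the last octet. It also checks the “locally administered” bit described above.

```python
def mac_for_vm(vm_id, prefix="02:00:00:00:01"):
    """Build a MAC address with the VM ID as the final octet."""
    if not 0 <= vm_id <= 0xFF:
        raise ValueError("vm_id must fit in one octet (0-255)")
    return f"{prefix}:{vm_id:02x}"


def is_locally_administered(mac):
    """True if the second least significant bit of the first octet is set."""
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0x02)


def is_unicast(mac):
    """True if the least significant bit of the first octet is clear."""
    first_octet = int(mac.split(":")[0], 16)
    return not (first_octet & 0x01)


mac = mac_for_vm(4)
print(mac, is_locally_administered(mac), is_unicast(mac))
```

A first octet of 02 is the simplest choice that is both locally administered and unicast.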

Final KVM Settings

KEYDIR=/etc/letsencrypt/live/kvm.example.com-0001
SPICE="-spice password=xxxxxxxx,x509-cacert-file=$KEYDIR/chain.pem,x509-key-file=$KEYDIR/privkey.pem,x509-cert-file=$KEYDIR/cert.pem,tls-port=1234,tls-channel=main -vga qxl"

UEFI="-drive if=pflash,format=raw,readonly,file=/usr/share/OVMF/OVMF_CODE.fd -drive if=pflash,format=raw,readonly,file=/usr/share/OVMF/OVMF_VARS.fd"

DRIVE="-drive format=raw,file=/home/kvm/windows10,if=virtio"

NET="-device e1000,netdev=net0,mac=02:00:00:00:01:$ID -netdev tap,id=net0,helper=/usr/lib/qemu/qemu-bridge-helper,br=br0"

kvm -m 4000 -smp 2 $SPICE $UEFI $DRIVE $NET

Windows Settings

The Spice Download page has a link for “spice-guest-tools” that has the QXL video driver among other things [2]. This seems to be needed for resolutions greater than 800*600.

The Virt-Manager Download page has a link for “virt-viewer” which is the Spice client for Windows systems [3]. They have MSI files for both i386 and AMD64 Windows.

It’s probably a good idea to set the display and system sleep timers to “never” (I haven’t tested what happens if you don’t do that, but there’s no benefit in sleeping). Before uploading an image I disabled the pagefile and set the partition to the minimum size so I had less data to upload.

Problems

Here are some things I haven’t solved yet.

The aSpice Android client for the Spice protocol fails to connect with the QEMU code at the server giving the following message on stderr: “error:14094418:SSL routines:ssl3_read_bytes:tlsv1 alert unknown ca:../ssl/record/rec_layer_s3.c:1544:SSL alert number 48“.

Spice is supposed to support dynamic changes to screen resolution on the VM to match the window size at the client. This doesn’t work for me, not even with the Red Hat QXL drivers installed.

The Windows Spice client doesn’t seem to support TLS, I guess running some sort of proxy for TLS would work but I haven’t tried that yet.

July 14, 2020

OpenHMD and the Oculus Rift

For some time now, I’ve been involved in the OpenHMD project, working on building an open driver for the Oculus Rift CV1, and more recently the newer Rift S VR headsets.

This post is a bit of an overview of how the 2 devices work from a high level for people who might have used them or seen them, but not know much about the implementation. I also want to talk about OpenHMD and how it fits into the evolving Linux VR/AR API stack.

OpenHMD

http://www.openhmd.net/

In short, OpenHMD is a project providing open drivers for various VR headsets through a single simple API. I don’t know of any other project that provides support for as many different headsets as OpenHMD, so it’s the logical place to contribute for largest effect.

OpenHMD is supported as a backend in Monado, and in SteamVR via the SteamVR-OpenHMD plugin. Working drivers in OpenHMD open up a range of VR games – as well as non-gaming applications like Blender. I think it’s important that Linux and friends not get left behind – in what is basically a Windows-only activity right now.

One downside is that it does come with the usual disadvantages of an abstraction API, in that it doesn’t fully expose the varied capabilities of each device, but instead the common denominator. I hope we can fix that in time by extending the OpenHMD API, without losing its simplicity.

Oculus Rift S

I bought an Oculus Rift S in April, to supplement my original consumer Oculus Rift (the CV1) from 2017. At that point, the only way to use it was in Windows via the official Oculus driver as there was no open source driver yet. Since then, I’ve largely reverse engineered the USB protocol for it, and have implemented a basic driver that’s upstream in OpenHMD now.

I find the Rift S a somewhat interesting device. It’s not entirely an upgrade over the older CV1. The build quality, and some of the specifications are actually worse than the original device – but one area where it is a clear improvement is the tracking system.

CV1 Tracking

The Rift CV1 uses what is called an outside-in tracking system, which has 2 major components. The first is input from Inertial Measurement Units (IMU) on each device – the headset and the 2 hand controllers. The 2nd component is infrared cameras (Rift Sensors) that you space around the room and then run a calibration procedure that lets the driver software calculate their positions relative to the play area.

IMUs provide readings of linear acceleration and angular velocity, which can be used to determine the orientation of a device, but don’t provide absolute position information. You can derive relative motion from a starting point using an IMU, but only over a short time frame as the integration of the readings is quite noisy.
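To get a feel for why IMU-only position tracking falls apart so quickly, consider a headset that is sitting perfectly still but whose accelerometer reads slightly off. The Python sketch below integrates that error twice to get position. The bias and sample rate are made-up illustrative values, not measurements from a Rift, and a constant bias stands in for the real mix of noise and bias.

```python
def integrate_position(accel_samples, dt):
    """Integrate acceleration twice (simple Euler steps) to get position."""
    velocity = 0.0
    position = 0.0
    for accel in accel_samples:
        velocity += accel * dt
        position += velocity * dt
    return position


BIAS = 0.05            # m/s^2, hypothetical constant accelerometer error
SAMPLE_RATE_HZ = 1000  # assumed IMU sample rate
DT = 1 / SAMPLE_RATE_HZ

drift_1s = integrate_position([BIAS] * (1 * SAMPLE_RATE_HZ), DT)
drift_10s = integrate_position([BIAS] * (10 * SAMPLE_RATE_HZ), DT)

print(f"apparent movement after 1s:  {drift_1s * 100:.1f}cm")
print(f"apparent movement after 10s: {drift_10s * 100:.1f}cm")
```

The error grows with the square of time, so a few centimetres after one second becomes metres after ten. That is why something external has to correct the position regularly.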

This is where the Rift Sensors get involved. The cameras observe constellations of infrared LEDs on the headset and hand controllers, and use those in concert with the IMU readings to position the devices within the playing space – so that as you move, the virtual world accurately reflects your movements. The cameras and LEDs synchronise to a radio pulse from the headset, and the camera exposure time is kept very short. That means the picture from the camera is completely black, except for very bright IR sources. Hopefully that means only the LEDs are visible, although light bulbs and open windows can inject noise and make the tracking harder.

Rift Sensor view of the CV1 headset and 2 controllers.

If you have both IMU and camera data, you can build what we call a 6 Degree of Freedom (6DOF) driver. With only IMUs, a driver is limited to providing 3 DOF – allowing you to stand in one place and look around, but not to move.

OpenHMD provides a 3DOF driver for the CV1 at this point, with experimental 6DOF work in a branch in my fork. Getting to a working 6DOF driver is a real challenge. The official drivers from Oculus still receive regular updates to tweak the tracking algorithms.

I have given several presentations about the progress on implementing positional tracking for the CV1, most recently at Linux.conf.au 2020 in January. There’s a recording at https://www.youtube.com/watch?v=PTHE-cdWN_s if you’re interested, and I plan to talk more about that in a future post.

Rift S Tracking

The Rift S uses Inside Out tracking, which inverts the tracking process by putting the cameras on the headset instead of around the room. With the cameras in fixed positions on the headset, the cameras and their view of the world moves as the user’s head moves. For the Rift S, there are 5 individual cameras pointing outward in different directions to provide (overall) a very wide-angle view of the surroundings.

The role of the tracking algorithm in the driver in this scenario is to use the cameras to look for visual landmarks in the play area, and to combine that information with the IMU readings to find the position of the headset. This is called Visual Inertial Odometry.

There is then a 2nd part to the tracking – finding the position of the hand controllers. This part works the same as on the CV1 – looking for constellations of LED lights on the controllers and matching what you see to a model of the controllers.

This is where I think the tracking gets particularly interesting. The requirements for finding where the headset is in the room, and the goal of finding the controllers require 2 different types of camera view!

To find the landmarks in the room, the vision algorithm needs to be able to see everything clearly and you want a balanced exposure from the cameras. To identify the controllers, you want a very fast exposure synchronised with the bright flashes from the hand controller LEDs – the same as when doing CV1 tracking.

The Rift S satisfies both requirements by capturing alternating video frames with fast and normal exposures. Each time, it captures the 5 cameras simultaneously and stitches them together into 1 video frame to deliver over USB to the host computer. The driver then needs to split each frame according to whether it is a normal or fast exposure and dispatch it to the appropriate part of the tracking algorithm.

Rift S – normal room exposure for Visual Inertial Odometry.
Rift S – fast exposure with IR LEDs for controller tracking.

There are a bunch of interesting things to notice in these camera captures:

  • Each camera view is inserted into the frame in some native orientation, and requires external information to make use of the information in it.
  • The cameras have a lot of fisheye distortion that will need correcting.
  • In the fast exposure frame, the light bulbs on my ceiling are hard to tell apart from the hand controller LEDs – another challenge for the computer vision algorithm.
  • The cameras are Infrared only, which is why the Rift S passthrough view (if you’ve ever seen it) is in grey-scale.
  • The top 16-pixels of each frame contain some binary data to help with frame identification. I don’t know how to interpret the contents of that data yet.

Status

This blog post is already too long, so I’ll stop here. In part 2, I’ll talk more about deciphering the Rift S protocol.

Thanks for reading! If you have any questions, hit me up at thaytan@noraisin.net or @thaytan on Twitter.

Debian PPC64EL Emulation

In my post on Debian S390X Emulation [1] I mentioned having problems booting a Debian PPC64EL kernel under QEMU. Giovanni commented that they had PPC64EL working and gave a link to their site with Debian QEMU images for various architectures [2]. I tried their image which worked then tried mine again which also worked – it seemed that a recent update in Debian/Unstable fixed the bug that made QEMU not work with the PPC64EL kernel.

Here are the instructions on how to do it.

First you need to create a filesystem in an image file with commands like the following:

truncate -s 4g /vmstore/ppc
mkfs.ext4 /vmstore/ppc
mount -o loop /vmstore/ppc /mnt/tmp

Then visit the Debian Netinst page [3] to download the PPC64EL net install ISO. Then loopback mount it somewhere convenient like /mnt/tmp2.

The package qemu-system-ppc has the program for emulating a PPC64LE system. The qemu-user-static package has the program for emulating PPC64LE for a single program (i.e. a statically linked program or a chroot environment); you need this to run debootstrap. The following commands should be most of what you need.

apt install qemu-system-ppc qemu-user-static

update-binfmts --display

# qemu ppc64 needs exec stack to solve "Could not allocate dynamic translator buffer"
# so enable that on SE Linux systems
setsebool -P allow_execstack 1

debootstrap --foreign --arch=ppc64el --no-check-gpg buster /mnt/tmp file:///mnt/tmp2
chroot /mnt/tmp /debootstrap/debootstrap --second-stage

cat << END > /mnt/tmp/etc/apt/sources.list
deb http://mirror.internode.on.net/pub/debian/ buster main
deb http://security.debian.org/ buster/updates main
END
echo "APT::Install-Recommends False;" > /mnt/tmp/etc/apt/apt.conf

echo ppc64 > /mnt/tmp/etc/hostname

# /usr/bin/awk: error while loading shared libraries: cannot restore segment prot after reloc: Permission denied
# only needed for chroot
setsebool allow_execmod 1

chroot /mnt/tmp apt update
# why aren't they in the default install?
chroot /mnt/tmp apt install perl dialog
chroot /mnt/tmp apt dist-upgrade
chroot /mnt/tmp apt install bash-completion locales man-db openssh-server build-essential systemd-sysv ifupdown vim ca-certificates gnupg
# install kernel last because systemd install rebuilds initrd
chroot /mnt/tmp apt install linux-image-ppc64el
chroot /mnt/tmp dpkg-reconfigure locales
chroot /mnt/tmp passwd

cat << END > /mnt/tmp/etc/fstab
/dev/vda / ext4 noatime 0 0
#/dev/vdb none swap defaults 0 0
END

mkdir /mnt/tmp/root/.ssh
chmod 700 /mnt/tmp/root/.ssh
cp ~/.ssh/id_rsa.pub /mnt/tmp/root/.ssh/authorized_keys
chmod 600 /mnt/tmp/root/.ssh/authorized_keys

rm /mnt/tmp/vmlinux* /mnt/tmp/initrd*
mkdir -p /boot/ppc64
cp /mnt/tmp/boot/[vi]* /boot/ppc64

# clean up
umount /mnt/tmp
umount /mnt/tmp2

# setcap binary for starting bridged networking
setcap cap_net_admin+ep /usr/lib/qemu/qemu-bridge-helper

# afterwards set the access on /etc/qemu/bridge.conf so it can only
# be read by the user/group permitted to start qemu/kvm
echo "allow all" > /etc/qemu/bridge.conf
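
Before running the commands above it can save time to check that the tools are installed. The following is a minimal sketch; the binary names passed to the function are my assumptions about what the Debian packages install, so adjust them for your system.

```shell
#!/bin/bash
# Print each named command that is NOT found in $PATH.
# Returns non-zero if at least one is missing.
missing_tools() {
    local tool rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return $rc
}

# Tool names are assumptions, run this as root so the sbin directories are in $PATH.
if ! missing_tools debootstrap qemu-ppc64le-static qemu-system-ppc64 setcap >/dev/null; then
    echo "some tools are missing, run missing_tools with the same names to list them" >&2
fi
```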

Here is an example script for starting the virtual machine. It can be run by any user that can read /etc/qemu/bridge.conf.

#!/bin/bash
set -e

KERN="-kernel /boot/ppc64/vmlinux-4.19.0-9-powerpc64le -initrd /boot/ppc64/initrd.img-4.19.0-9-powerpc64le"

# single network device, can have multiple
NET="-device e1000,netdev=net0,mac=02:02:00:00:01:04 -netdev tap,id=net0,helper=/usr/lib/qemu/qemu-bridge-helper"

# random number generator for fast start of sshd etc
RNG="-object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0"

# I have lockdown because it does no harm now and is good for future kernels
# I enable SE Linux everywhere
KERNCMD="net.ifnames=0 noresume security=selinux root=/dev/vda ro lockdown=confidentiality"

qemu-system-ppc64 -drive format=raw,file=/vmstore/ppc64,if=virtio $RNG -nographic -m 1024 -smp 2 $KERN -curses -append "$KERNCMD" $NET
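
The script above keeps groups of options in plain strings and relies on the shell splitting them into words. A variation that some people may prefer is to build the command as a bash array so every argument stays exactly one word. This is only a sketch under assumptions: it takes qemu-system-ppc64 to be the emulator binary and passes the kernel with -kernel, and it prints the command instead of running it.

```shell
#!/bin/bash
# Build the QEMU command line as an array so each argument stays one word.
# Paths and options are copied from the script above; qemu-system-ppc64 is
# assumed to be the emulator binary.
build_qemu_cmd() {
    local ver="4.19.0-9-powerpc64le"
    QEMU_CMD=(
        qemu-system-ppc64
        -drive "format=raw,file=/vmstore/ppc64,if=virtio"
        -object "rng-random,filename=/dev/urandom,id=rng0"
        -device "virtio-rng-pci,rng=rng0"
        -nographic -m 1024 -smp 2
        -kernel "/boot/ppc64/vmlinux-$ver"
        -initrd "/boot/ppc64/initrd.img-$ver"
        -curses
        -append "net.ifnames=0 noresume security=selinux root=/dev/vda ro lockdown=confidentiality"
        -device "e1000,netdev=net0,mac=02:02:00:00:01:04"
        -netdev "tap,id=net0,helper=/usr/lib/qemu/qemu-bridge-helper"
    )
}

build_qemu_cmd
# Dry run: print the command with shell quoting so the word boundaries are visible.
printf '%q ' "${QEMU_CMD[@]}"; echo
```

To actually start the VM, replace the printf line with exec "${QEMU_CMD[@]}".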

July 09, 2020

Logging Step Functions to CloudWatch

Many AWS Services log to CloudWatch. Some do it out of the box, others need to be configured to log properly. When Amazon released Step Functions, they didn’t include support for logging to CloudWatch. In February 2020, Amazon announced Step Functions could now log to CloudWatch. Step Functions still support CloudTrail logs, but CloudWatch logging is more useful for many teams.

Users need to configure Step Functions to log to CloudWatch. This is done on a per State Machine basis. Of course you could click around the console to enable it, but that doesn’t scale. If you use CloudFormation to manage your Step Functions, it is only a few extra lines of configuration to add the logging support.

In my example I will assume you are using YAML for your CloudFormation templates. I’ll save my “if you’re using JSON for CloudFormation you’re doing it wrong” rant for another day. This is a cut down example from one of my services:

---
AWSTemplateFormatVersion: '2010-09-09'
Description: StepFunction with Logging Example.
# Parameters removed from this cut down example
Resources:
  StepFunctionExecRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Effect: Allow
          Principal:
            Service: !Sub "states.${AWS::Region}.amazonaws.com"
          Action:
          - sts:AssumeRole
      Path: "/"
      Policies:
      - PolicyName: StepFunctionExecRole
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
          - Effect: Allow
            Action:
            - lambda:InvokeFunction
            - lambda:ListFunctions
            Resource: !Sub "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:my-lambdas-namespace-*"
          - Effect: Allow
            Action:
            - logs:CreateLogDelivery
            - logs:GetLogDelivery
            - logs:UpdateLogDelivery
            - logs:DeleteLogDelivery
            - logs:ListLogDeliveries
            - logs:PutResourcePolicy
            - logs:DescribeResourcePolicies
            - logs:DescribeLogGroups
            Resource: "*"
  MyStateMachineLogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      LogGroupName: /aws/stepfunction/my-step-function
      RetentionInDays: 14
  DashboardImportStateMachine:
    Type: AWS::StepFunctions::StateMachine
    Properties:
      StateMachineName: my-step-function
      StateMachineType: STANDARD
      LoggingConfiguration:
        Destinations:
          - CloudWatchLogsLogGroup:
              LogGroupArn: !GetAtt MyStateMachineLogGroup.Arn
        IncludeExecutionData: True
        Level: ALL
      DefinitionString:
        !Sub |
        {
          ... JSON Step Function definition goes here
        }
      RoleArn: !GetAtt StepFunctionExecRole.Arn

The key pieces in this example are the second statement in the IAM Role with all the logging permissions, the LogGroup defined by MyStateMachineLogGroup and the LoggingConfiguration section of the Step Function definition.

The IAM role permissions are copied from the example policy in the AWS documentation for using CloudWatch Logging with Step Functions. The CloudWatch IAM permissions model is pretty weak, so we need to grant these broad permissions.

The LogGroup definition creates the log group in CloudWatch. You can use whatever value you want for the LogGroupName. I followed the Amazon convention of prefixing everything with /aws/[service-name]/ and then appended the Step Function name. I recommend using the RetentionInDays configuration. It stops old logs sticking around forever. In my case I send all my logs to ELK, so I don’t need to retain them in CloudWatch long term.

Finally we use the LoggingConfiguration to tell AWS where we want to send our logs. You can only specify a single destination in Destinations. IncludeExecutionData determines whether the inputs and outputs of each function call are logged. You should not enable this if you are passing sensitive information between your steps. The verbosity of logging is controlled by Level. Amazon has a page on Step Function log levels. For dev you probably want to use ALL to help with debugging but in production you probably only need ERROR level logging.

I removed the Parameters and Outputs sections from the template. Use them as you need to.
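
If you want to try a template like this from the command line, the following is a rough sketch using the AWS CLI. The stack name and template file name are hypothetical, and the script defaults to a dry run that only prints the commands.

```shell
#!/bin/bash
# Deploy the template with the AWS CLI.
# STACK and TEMPLATE are hypothetical names, change them to suit.
# DRY_RUN=1 (the default here) prints the commands instead of running them.
STACK="stepfunction-logging-example"
TEMPLATE="stepfunction.yaml"
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

deploy_stack() {
    # Catch template syntax errors before creating any resources.
    run aws cloudformation validate-template --template-body "file://$TEMPLATE"
    # CAPABILITY_IAM is needed because the template creates an IAM role.
    run aws cloudformation deploy --template-file "$TEMPLATE" \
        --stack-name "$STACK" --capabilities CAPABILITY_IAM
}

deploy_stack
```

Run it with DRY_RUN=0 once the printed commands look right for your account and region.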

Drupal/MySQL Authentication Error

Yesterday I noticed that a number of Drupal websites I look after were down, and it was a pretty interesting error message:

MYSQL ERROR 2049 (HY000): Connection using old (pre-4.1.1) authentication protocol refused (client option 'secure_auth' enabled)

Like most error messages it does give a tantalising hint about the problem, and unsurprisingly other people have had similar issues as StackOverflow explains. But it all looks somewhat annoying, especially when running on a hosted service (honestly, I have other things to do these days).

I was about to rue the day, and not for the first time, when I decided to move from plain HTML/CSS/JavaScript sites to something with a database-driven system (Drupal, WordPress, etc). But fortunately, there is a stunningly easy fix: simply change the password. Specifically, change the password of the MySQL user that accesses the database that is being used by Drupal, and the authentication problem will solve itself. Actually this is probably timely because it's a good indication that (ahem) the password hadn't been changed "for quite a while".

The following are command-line options, but it's equally simple if one is using CPanel or similar.

1. Change password for database user. Log in to MySQL in the usual manner (e.g., mysql -u root -h localhost -p) and run
ALTER USER '$username'@'localhost' IDENTIFIED BY '$newpassword';

2. Change permissions on the $drupalroot/sites/default folder. The default permissions are set to 0555, set them to 0755 for editing.
chmod 0755 $drupalroot/sites/default

3. Change permissions for settings.php to 0640 for editing
cd $drupalroot/sites/default; chmod 0640 settings.php

4. Change password for database user in settings.php; this can be found in one of two forms, depending on the version of Drupal being used:

e.g.,
$db_url = 'mysql://username:password@localhost/databasename';

e.g.,

$databases = array (
..
'database' => '$database_name',
'username' => '$username',
'password' => '$newpassword',
),

5. Change permissions for settings.php and sites/default to read only
chmod 0440 settings.php; cd ..; chmod 0555 $drupalroot/sites/default

6. Reload website. Everything will be back to working order.
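
The permission changes in steps 2, 3 and 5 can be wrapped in two small shell functions so they are harder to forget. This is just a sketch; the function names are mine and the only argument is the Drupal root directory.

```shell
#!/bin/bash
# Toggle write access on Drupal's settings.php and its directory.
# Both functions take the Drupal root as their only argument.

# Steps 2 and 3: make the directory and settings.php editable.
unlock_settings() {
    chmod 0755 "$1/sites/default" &&
    chmod 0640 "$1/sites/default/settings.php"
}

# Step 5: return both to read only.
lock_settings() {
    chmod 0440 "$1/sites/default/settings.php" &&
    chmod 0555 "$1/sites/default"
}
```

For example run unlock_settings "$drupalroot", edit the password in settings.php, then run lock_settings "$drupalroot".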

July 08, 2020

Resistance is Useless

Sorry about the Vogon quote.

I don’t write about my EV much any more, as nothing much happens to it. I’m planning to buy a real factory EV when the prices drop beneath AUD$30k. In the meantime it’s still my daily drive, like it has been since 2008.

Yesterday it felt a bit sluggish, like it wasn’t charged properly. I charged it, but the charger cut out prematurely. This is never a good sign as you don’t want a partially charged pack.

When under charge, I measured the voltage across all 36 cells. They were very close to each other except for one which was 300mV high. I traced this 300mV to the terminal post between the cell and its cable. I pulled apart the terminal and the washers were all black and burnt. If it was dropping 300mV at 20A charge, it must have been getting pretty hot at 200A during acceleration!


I replaced the washers with new ones and measured 11mV drop during charge, similar to other terminal posts. In addition to sluggish performance and carbon generation the high voltage drop must have been causing the battery management system to drop the charger out early.
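
To put rough numbers on the heating, here is a small sketch of the arithmetic. It assumes the bad joint behaves as a fixed resistance (Ohm's law), which is a simplification as a burnt joint will not be that well behaved. With 300mV at 20A the joint works out to about 15 milliohms, dissipating about 6W on charge and about 600W at 200A.

```shell
#!/bin/bash
# Estimate joint resistance and heating from a measured voltage drop.
# Usage: joint_heating DROP_MILLIVOLTS MEASURED_AMPS LOAD_AMPS
# Assumes the joint is a fixed resistance, which is a simplification.
joint_heating() {
    awk -v mv="$1" -v amps="$2" -v load="$3" 'BEGIN {
        r = (mv / 1000) / amps                 # resistance in ohms
        printf "resistance_mohm=%.2f\n", r * 1000
        printf "watts_at_%dA=%.1f\n", amps, amps * amps * r
        printf "watts_at_%dA=%.1f\n", load, load * load * r
    }'
}

joint_heating 300 20 200   # the burnt terminal
joint_heating 11 20 200    # after replacing the washers
```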

It’s now happily soaking up some more charge, and I’ll take it for a test drive soon.

Reading Further
My EV Page
EV Blog Archive

July 05, 2020

Debian S390X Emulation

I decided to set up some virtual machines for different architectures. One that I decided to try was S390X – the latest 64bit version of the IBM mainframe. Here’s how to do it. I tested on a host running Debian/Unstable but Buster should work in the same way.

First you need to create a filesystem in an image file with commands like the following:

truncate -s 4g /vmstore/s390x
mkfs.ext4 /vmstore/s390x
mount -o loop /vmstore/s390x /mnt/tmp

Then visit the Debian Netinst page [1] to download the S390X net install ISO. Then loopback mount it somewhere convenient like /mnt/tmp2.

The package qemu-system-misc has the program for emulating a S390X system (among many others). The qemu-user-static package has the program for emulating S390X for a single program (IE a statically linked program or a chroot environment); you need this to run debootstrap. The following commands should be most of what you need.

# Install the basic packages you need
apt install qemu-system-misc qemu-user-static debootstrap

# List the support for different binary formats
update-binfmts --display

# qemu s390x needs exec stack to solve "Could not allocate dynamic translator buffer"
# so you probably need this on SE Linux systems
setsebool allow_execstack 1

# commands to do the main install
debootstrap --foreign --arch=s390x --no-check-gpg buster /mnt/tmp file:///mnt/tmp2
chroot /mnt/tmp /debootstrap/debootstrap --second-stage

# set the apt sources
cat << END > /mnt/tmp/etc/apt/sources.list
deb http://YOURLOCALMIRROR/pub/debian/ buster main
deb http://security.debian.org/ buster/updates main
END
# for minimal install do not want recommended packages
echo "APT::Install-Recommends False;" > /mnt/tmp/etc/apt/apt.conf

# update to latest packages
chroot /mnt/tmp apt update
chroot /mnt/tmp apt dist-upgrade

# install kernel, ssh, and build-essential
chroot /mnt/tmp apt install bash-completion locales linux-image-s390x man-db openssh-server build-essential
chroot /mnt/tmp dpkg-reconfigure locales
echo s390x > /mnt/tmp/etc/hostname
chroot /mnt/tmp passwd

# copy kernel and initrd
mkdir -p /boot/s390x
cp /mnt/tmp/boot/vmlinuz* /mnt/tmp/boot/initrd* /boot/s390x

# setup /etc/fstab
cat << END > /mnt/tmp/etc/fstab
/dev/vda / ext4 noatime 0 0
#/dev/vdb none swap defaults 0 0
END

# clean up
umount /mnt/tmp
umount /mnt/tmp2

# setcap binary for starting bridged networking
setcap cap_net_admin+ep /usr/lib/qemu/qemu-bridge-helper

# afterwards set the access on /etc/qemu/bridge.conf so it can only
# be read by the user/group permitted to start qemu/kvm
echo "allow all" > /etc/qemu/bridge.conf

Some of the above can be considered more as pseudo-code in shell script rather than an exact way of doing things. While you can copy and paste all the above into a command line and have a reasonable chance of having it work, I think it would be better to look at each command and decide whether it’s right for you and whether you need to alter it slightly for your system.

To run qemu as non-root you need to have a helper program with extra capabilities to set up bridged networking. I’ve included that in the explanation because I think it’s important to have all security options enabled.
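
The comment in the commands above says to restrict access to /etc/qemu/bridge.conf afterwards. A minimal sketch of one way to do that follows. The function name is mine, and which group to use is an assumption that depends on who is allowed to start VMs on your system.

```shell
#!/bin/bash
# Restrict a bridge.conf file to its owner plus one group.
# Usage: restrict_bridge_conf FILE GROUP
# The group is an assumption, use whichever group may start qemu on your system.
restrict_bridge_conf() {
    local file="$1" group="$2"
    chgrp "$group" "$file" &&
    chmod 0640 "$file"
}
```

For example, as root, restrict_bridge_conf /etc/qemu/bridge.conf kvm would limit it to members of the kvm group. You can check the capability on the helper with getcap /usr/lib/qemu/qemu-bridge-helper.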

The “-object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-ccw,rng=rng0” part is to give entropy to the VM from the host, otherwise it will take ages to start sshd. Note that this is slightly but significantly different from the command used for other architectures (the “ccw” is the difference).

I’m not sure if “noresume” on the kernel command line is required, but it doesn’t do any harm. The “net.ifnames=0” stops systemd from renaming Ethernet devices. For the virtual networking the “ccw” again is a difference from other architectures.
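
If you script VM startup for more than one architecture, the “ccw” difference can be isolated in one place. The sketch below only covers what I describe in these posts: S390X uses “ccw” and everything else is assumed to use “pci”, which will not be true of every QEMU machine type.

```shell
#!/bin/bash
# Pick the virtio device suffix for a QEMU target architecture.
# s390x uses channel I/O ("ccw"). Everything else is ASSUMED to use PCI,
# which is not true for every QEMU machine type.
virtio_suffix() {
    case "$1" in
        s390x) echo ccw ;;
        *)     echo pci ;;
    esac
}

arch=s390x
echo "-device virtio-rng-$(virtio_suffix "$arch"),rng=rng0"
echo "-device virtio-net-$(virtio_suffix "$arch"),netdev=net0"
```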

Here is a basic command to run a QEMU virtual S390X system. If all goes well it should give you a login: prompt on a curses based text display. You can then log in as root and should be able to run “dhclient eth0” and other similar commands to set up networking and allow ssh logins.

qemu-system-s390x -drive format=raw,file=/vmstore/s390x,if=virtio -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-ccw,rng=rng0 -nographic -m 1500 -smp 2 -kernel /boot/s390x/vmlinuz-4.19.0-9-s390x -initrd /boot/s390x/initrd.img-4.19.0-9-s390x -curses -append "net.ifnames=0 noresume root=/dev/vda ro" -device virtio-net-ccw,netdev=net0,mac=02:02:00:00:01:02 -netdev tap,id=net0,helper=/usr/lib/qemu/qemu-bridge-helper

Here is a slightly more complete QEMU command. It has 2 block devices, for root and swap. It has SE Linux enabled for the VM (SE Linux works nicely on S390X). I added the “lockdown=confidentiality” kernel security option even though it’s not supported in 4.19 kernels, it doesn’t do any harm and when I upgrade systems to newer kernels I won’t have to remember to add it.

qemu-system-s390x -drive format=raw,file=/vmstore/s390x,if=virtio -drive format=raw,file=/vmswap/s390x,if=virtio -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-ccw,rng=rng0 -nographic -m 1500 -smp 2 -kernel /boot/s390x/vmlinuz-4.19.0-9-s390x -initrd /boot/s390x/initrd.img-4.19.0-9-s390x -curses -append "net.ifnames=0 noresume security=selinux root=/dev/vda ro lockdown=confidentiality" -device virtio-net-ccw,netdev=net0,mac=02:02:00:00:01:02 -netdev tap,id=net0,helper=/usr/lib/qemu/qemu-bridge-helper

Try It Out

I’ve got a S390X system online for a while, “ssh root@s390x.coker.com.au” with password “SELINUX” to try it out.

PPC64

I’ve tried running a PPC64 virtual machine, I did the same things to set it up and then tried launching it with the following result:

qemu-system-ppc64 -drive format=raw,file=/vmstore/ppc64,if=virtio -nographic -m 1024 -kernel /boot/ppc64/vmlinux-4.19.0-9-powerpc64le -initrd /boot/ppc64/initrd.img-4.19.0-9-powerpc64le -curses -append "root=/dev/vda ro"

Above is the minimal qemu command that I’m using. Below is the result, it stops after the “4.” from “4.19.0-9”. Note that I had originally tried with a more complete and usable set of options, but I trimmed it to the minimal needed to demonstrate the problem.

  Copyright (c) 2004, 2017 IBM Corporation All rights reserved.
  This program and the accompanying materials are made available
  under the terms of the BSD License available at
  http://www.opensource.org/licenses/bsd-license.php

Booting from memory...
Linux ppc64le
#1 SMP Debian 4.

The kernel is from the package linux-image-4.19.0-9-powerpc64le which is a dependency of the package linux-image-ppc64el in Debian/Buster. The program qemu-system-ppc64 is from version 5.0-5 of the qemu-system-ppc package.

Any suggestions on what I should try next would be appreciated.

July 03, 2020

Desklab Portable USB-C Monitor

I just got a 15.6″ 4K resolution Desklab portable touchscreen monitor [1]. It takes power via USB-C and video input via USB-C or mini HDMI, has touch screen input, and has speakers built in for USB or HDMI sound.

PC Use

I bought a mini-DisplayPort to HDMI adapter and for my first test ran it from my laptop, it was seen as a 1920*1080 DisplayPort monitor. The adaptor is specified as supporting 4K so I don’t know why I didn’t get 4K to work, my laptop has done 4K with other monitors.

The next thing I plan to get is a VGA to HDMI converter so I can use this on servers, it can be a real pain getting a monitor and power cable to a rack mounted server and this portable monitor can be powered by one of the USB ports in the server. A quick search indicates that such devices start at about $12US.

The Desklab monitor has no markings to indicate what resolution it supports, no part number, and no serial number. The only documentation I could find about how to recognise the difference between the FullHD and 4K versions is that the FullHD version supposedly draws 2A and the 4K version draws 4A. I connected my USB Ammeter and it reported that between 0.6 and 1.0A were drawn. If they meant to say 2W and 4W instead of 2A and 4A (I’ve seen worse errors in manuals) then the current drawn would indicate the 4K version. Otherwise the stated current requirements don’t come close to matching what I’ve measured.

Power

The promise of USB-C was power from anywhere to anywhere. I think that such power can theoretically be done with USB 3 and maybe USB 2, but asymmetric cables make it more challenging.

I can power my Desklab monitor from a USB battery, from my Thinkpad’s USB port (even when the Thinkpad isn’t on mains power), and from my phone (although the phone battery runs down fast as expected). When I have a mains powered USB charger (for a laptop and rated at 60W) connected to one USB-C port and my phone on the other the phone can be charged while giving a video signal to the display. This is how it’s supposed to work, but in my experience it’s rare to have new technology live up to its potential at the start!

One thing to note is that it doesn’t have a battery. I had imagined that it would have a battery (in spite of there being nothing on their web site to imply this) because I just couldn’t think of a touch screen device not having a battery. It would be nice if there was a version of this device with a big battery built in that could avoid needing separate cables for power and signal.

Phone Use

The first thing to note is that the Desklab monitor won’t work with all phones. Whether a phone will take the option of an external display depends on its configuration, and some phones may support an external display but not touchscreen. The Huawei Mate devices are specifically listed in the printed documentation as being supported for touchscreen as well as display. Surprisingly the Desklab web site has no mention of this unless you download the PDF of the manual, they really should have a list of confirmed supported devices and a forum for users to report on how it works.

My phone is a Huawei Mate 10 Pro so I guess I got lucky here. My phone has a “desktop mode” that can be enabled when I connect it to a USB-C device (not sure what criteria it uses to determine if the device is suitable). The desktop mode has something like a regular desktop layout and you can move windows around etc. There is also the option of having a copy of the phone’s screen, but it displays the image of the phone screen vertically in the middle of the landscape layout monitor which is ridiculous.

When desktop mode is enabled it’s independent of the phone interface so I had to find the icons for the programs I wanted to run in an unsorted list with no search usable (the search interface of the app list brings up the keyboard which obscures the list of matching apps). The keyboard takes up more than half the screen and there doesn’t seem to be a way to make it smaller. I’d like to try a portrait layout which would make the keyboard take something like 25% of the screen but that’s not supported.

It’s quite easy to type on a keyboard that’s slightly larger than a regular PC keyboard (a 15″ display with no numeric keypad or cursor control keys). The hackers keyboard app might work well with this as it has cursor control keys. The GUI has an option for full screen mode for an app which is really annoying to get out of (you have to use a drop down from the top of the screen), full screen doesn’t make sense for a display this large. Overall the GUI is a bit clunky, imagine Windows 3.1 with a start button and task bar. One interesting thing to note is that the desktop and phone GUIs can be run separately, so you can type on the Desklab (or any similar device) and look things up on the phone. Multiple monitors never really interested me for desktop PCs because switching between windows is fast and easy and it’s easy to resize windows to fit several on the desktop. Resizing windows on the Huawei GUI doesn’t seem easy (although I might be missing some things) and the keyboard takes up enough of the screen that having multiple windows open while typing isn’t viable.

I wrote the first draft of this post on my phone using the Desklab display. It’s not nearly as easy as writing on a laptop but much easier than writing on the phone screen.

Currently Desklab is offering 2 models for sale, 4K resolution for $399US and FullHD for $299US. I got the 4K version which is very expensive at the moment when converted to Australian dollars. There are significantly cheaper USB-C monitors available (such as this ASUS one from Kogan for $369AU), but I don’t think they have touch screens and therefore can’t be used with a phone unless you enable the phone screen as touch pad mode and have a mouse cursor on screen. I don’t know if all Android devices support that, it could be that a large part of the desktop experience I get is specific to Huawei devices.

One annoying feature is that if I use the phone power button to turn the screen off it shuts down the connection to the Desklab display, but the phone screen will turn off if I leave it alone for the screen timeout (which I have set to 10 minutes).

Caveats

When I ordered this I wanted the biggest screen possible. But now that I have it the fact that it doesn’t fit in the pocket of my Scott e Vest jacket [2] will limit what I can do with it. Maybe I’ll be buying a 13″ monitor in the near future, I expect that Desklab will do well and start selling them in a wide range of sizes. A 15.6″ portable device is inconvenient even if it is in the laptop format, a thin portable screen is inconvenient in many ways.

Netflix doesn’t display video on the Desklab screen, I suspect that Netflix is doing this deliberately as some misguided attempt at stopping piracy. It is really good for watching video as it has the speakers in good locations for stereo sound, it’s a pity that Netflix is difficult.

The functionality on phones from companies other than Huawei is unknown. It is likely to work on most Android phones, but if a particular phone is important to you then you want to Google for how it worked for others.

July 02, 2020

Isolating PHP Web Sites

If you have multiple PHP web sites on a server in a default configuration they will all be able to read each other’s files. If you have multiple PHP web sites that have stored data or passwords for databases in configuration files then there are significant problems if they aren’t all trusted. Even if the sites are all trusted (IE the same person configures them all) if there is a security problem in one site it’s ideal to prevent that being used to immediately attack all sites.

mpm_itk

The first thing I tried was mpm_itk [1]. This is a version of the traditional “prefork” module for Apache that has one process for each HTTP connection. When it’s installed you just put the directive “AssignUserID USER GROUP” in your VirtualHost section and that virtual host runs as the user:group in question. It will work with any Apache module that works with mpm_prefork. In my experiment with mpm_itk I first tried running with a different UID for each site, but that conflicted with the pagespeed module [2]. The pagespeed module optimises HTML and CSS files to improve performance and it has a directory tree where it stores cached versions of some of the files. It doesn’t like working with copies of itself under different UIDs writing to that tree. This isn’t a real problem, setting up the different PHP files with database passwords to be read by the desired group is easy enough. So I just ran each site with a different GID but used the same UID for all of them.

The first problem with mpm_itk is that the mpm_prefork code that it’s based on is the slowest mpm available and is also incompatible with HTTP/2. A minor issue of mpm_itk is that it makes Apache take ages to stop or restart, I don’t know why and can’t be certain it’s not a configuration error on my part. As an aside here is a site for testing your server’s support for HTTP/2 [3]. To enable HTTP/2 you have to be running mpm_event and enable the “http2” module. Then for every virtual host that is to support it (generally all https virtual hosts) put the line “Protocols h2 h2c http/1.1” in the virtual host configuration.
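
On Debian the module switch could look something like the following sketch. It assumes the Apache PHP module has already been disabled (otherwise mpm_prefork can’t be disabled) and it defaults to a dry run that only prints the commands. The http_version function is an optional check using curl and is not called here because it needs network access.

```shell
#!/bin/bash
# Switch Apache on Debian from mpm_prefork to mpm_event and enable HTTP/2.
# Assumes the Apache PHP module is already disabled.
# DRY_RUN=1 (the default here) prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

enable_http2() {
    run a2dismod mpm_prefork
    run a2enmod mpm_event http2
    run systemctl restart apache2
}

# Report the HTTP version curl negotiated with a site.
# Usage: http_version https://your.site/
http_version() {
    curl -so /dev/null --http2 -w '%{http_version}\n' "$1"
}

enable_http2
```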

A good feature of mpm_itk is that it has everything for the site running under the same UID, all Apache modules and Apache itself. So there’s no issue of one thing getting access to a file and another not getting access.

After a trial I decided not to keep using mpm_itk because I want HTTP/2 support.

php-fpm Pools

The Apache PHP module depends on mpm_prefork so it also has the issues of not working with HTTP/2 and of causing the web server to be slow. The solution is php-fpm, a separate server for running PHP code that uses the fastcgi protocol to talk to Apache. Here’s a link to the upstream documentation for php-fpm [4]. In Debian this is in the php7.3-fpm package.

In Debian the directory /etc/php/7.3/fpm/pool.d has the configuration for “pools”. Below is an example of a configuration file for a pool:

# cat /etc/php/7.3/fpm/pool.d/example.com.conf
[example.com]
user = example.com
group = example.com
listen = /run/php/php7.3-example.com.sock
listen.owner = www-data
listen.group = www-data
pm = dynamic
pm.max_children = 5
pm.start_servers = 2
pm.min_spare_servers = 1
pm.max_spare_servers = 3

Here is the upstream documentation for fpm configuration [5].
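
With more than a couple of sites it’s less error prone to generate the pool files than to copy and edit them. Here is a minimal sketch that prints a pool definition matching the example above. It assumes a user and group named after the domain already exist, and the function name is just something I made up.

```shell
#!/bin/bash
# Print a php-fpm pool definition for one site, following the example above.
# Usage: make_pool DOMAIN > /etc/php/7.3/fpm/pool.d/DOMAIN.conf
# Assumes a system user and group named after the domain already exist.
make_pool() {
    local domain="$1"
    cat <<EOF
[$domain]
user = $domain
group = $domain
listen = /run/php/php7.3-$domain.sock
listen.owner = www-data
listen.group = www-data
pm = dynamic
pm.max_children = 5
pm.start_servers = 2
pm.min_spare_servers = 1
pm.max_spare_servers = 3
EOF
}

make_pool example.com
```

After writing new pool files the php-fpm service needs to be reloaded for them to take effect.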

Then for the Apache configuration for the site in question you could have something like the following:

ProxyPassMatch "^/(.*\.php(/.*)?)$" "unix:/run/php/php7.3-example.com.sock|fcgi://localhost/usr/share/wordpress/"

The “|fcgi://localhost” part is just part of the way of specifying a Unix domain socket. From the Apache Wiki it appears that the method for configuring the TCP connections is more obvious [6]. I chose Unix domain sockets because it allows putting the domain name in the socket address. Matching domains for the web server to port numbers is something that’s likely to be error prone while matching based on domain names is easier to check and also easier to put in Apache configuration macros.

There was some additional hassle with getting Apache to read the files created by PHP processes (the options include running PHP scripts with the www-data group, having SETGID directories for storing files, and having world-readable files). But this got things basically working.
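
As an illustration of the SETGID directory option, here is a sketch of creating a directory where new files inherit the directory’s group. The function name and the example path below are hypothetical, and whether Apache can read the resulting files still depends on the umask of the PHP processes.

```shell
#!/bin/bash
# Create a directory where new files inherit the directory's group (setgid bit).
# Usage: make_shared_dir DIR OWNER GROUP
make_shared_dir() {
    local dir="$1" owner="$2" group="$3"
    mkdir -p "$dir" &&
    chown "$owner:$group" "$dir" &&
    # 2750: setgid, owner rwx, group r-x, no access for others
    chmod 2750 "$dir"
}
```

For example, as root, make_shared_dir /var/www/example.com/uploads example.com www-data gives a directory that the site’s PHP user can write to and that Apache can read via the www-data group.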

Nginx

My Google searches for running multiple PHP sites under different UIDs didn’t turn up any good hits. It was only after I found the DigitalOcean page on doing this with Nginx [7] that I knew what to search for to find the way of doing it in Apache.

AudioBooks – June 2020

The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power by Shoshana Zuboff

A good warning of the dangerous designs and goals of firms like Facebook and Google. Sometimes a bit wordy. 3/5

The Calculating Stars: Lady Astronaut Volume 1 by Mary Robinette Kowal

Alternate timeline SF. A meteorite hits the US. The Space program accelerates so humans can escape earth. Our hero faces lots of sexism & other barriers to becoming an astronaut. 3/5

By the Shores of Silver Lake: Little House Series, Book 5 by Laura Ingalls Wilder

The family move to De Smet, South Dakota. The railroad and then a town is built about them over a year. A good entry in the series, some gripping passages. 3/5

The Restaurant: A History of Eating Out by William Sitwell

A non-exhaustive history. Bouncing through ancient times before focusing on Britain since 1945. But plenty of fun and interesting bits. 3/5

Broadway: A History of New York City in Thirteen Miles by Fran Leadon

A mile by mile coverage from South to North. How each section was added to the street and developed. A range of interesting stories and history. 4/5

My Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average, in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all


June 30, 2020

Fuck Grey Text

fuck grey text on white backgrounds
fuck grey text on black backgrounds
fuck thin, spindly fonts
fuck 10px text
fuck any size of anything in px
fuck font-weight 300
fuck unreadable web pages
fuck themes that implement this unreadable idiocy
fuck sites that don’t work without javascript
fuck reactjs and everything like it

thank fuck for Stylus. and uBlock Origin. and uMatrix.

Fuck Grey Text is a post from: Errata

June 27, 2020

Links June 2020

Bruce Schneier wrote an informative post about Zoom security problems [1]. He recommends Jitsi which has a Debian package of their software and it’s free software.

Axel Beckert wrote an interesting post about keyboards with small numbers of keys, as few as 28 [2]. It’s not something I’d ever want to use, but interesting to read from a computer science and design perspective.

The Guardian has a disturbing article explaining why we might never get a good Covid19 vaccine [3]. If that happens it will change our society for years if not decades to come.

Matt Palmer wrote an informative blog post about private key redaction [4]. I learned a lot from that. Probably the simplest summary is that you should never publish sensitive data unless you are certain that all that you are publishing is suitable, if you don’t understand it then you don’t know if it’s suitable to be published!

This article by Umair Haque on eand.co has some interesting points about how Freedom is interpreted in the US [5].

This article by Umair Haque on eand.co has some good points about how messed up the US is economically [6]. I think that his analysis is seriously let down by omitting the savings that could be made by amending the US healthcare system without serious changes (EG by controlling drug prices) and by reducing the scale of the US military (there will never be another war like WW2 because any large scale war will be nuclear). If the US government could significantly cut spending in a couple of major areas they could then put the money towards fixing some of the structural problems and bootstrapping a first-world economic system.

The American Conservative has an insightful article “Seven Reasons Police Brutality is Systemic Not Anecdotal” [7].

Scientific American has an informative article about how genetic engineering could be used to make a Covid-19 vaccine [8].

Rike wrote an insightful post about How Language Changes Our Concepts [9]. They cover the differences between the French, German, and English languages based on gender and on how the language limits thoughts. They then conclude with the need to remove terms like master/slave and blacklist/whitelist from our software, with a focus on Debian but it’s applicable to all software.

Gunnar Wolf also wrote an insightful post On Masters and Slaves, Whitelists and Blacklists [10]. They started with why some people might not understand the importance of the issue and then explained some ways of addressing it. The list of suggested terms includes Primary-secondary, Leader-follower, and some other terms which have slightly different meanings and allow more precision in describing the computer science concepts used. We can be more precise when describing computer science while also not using terms that marginalise some groups of people. It’s a win-win!

Both Rike and Gunnar were responding to a LWN article about the plans to move away from Master/Slave and Blacklist/Whitelist in the Linux kernel [11]. One of the noteworthy points in the LWN article is that there are about 70,000 instances of words that need to be changed in the Linux kernel so this isn’t going to happen immediately. But it will happen eventually which is a good thing.

Vale Marcus de Rijk

Vale Marcus de Rijk kattekrab Sat, 27/06/2020 - 10:16

June 25, 2020

How Will the Pandemic Change Things?

The Bulwark has an interesting article on why they can’t “Reopen America” [1]. I wonder how many changes will be long term. According to the Wikipedia List of Epidemics [2] Covid-19 so far hasn’t had a high death toll when compared to other pandemics of the last 100 years. People’s reactions to this vary from doing nothing to significant isolation, the question is what changes in attitudes will be significant enough to change society.

Transport

One thing that has been happening recently is a transition in transport. It’s obvious that we need to reduce CO2 and while electric cars will address the transport part of the problem in the long term, changing to electric public transport is the cheaper and faster way to do it in the short term. Before Covid-19 the peak hour public transport in my city was ridiculously overcrowded; having people unable to board trams due to overcrowding was really common. If the economy returns to its previous state then I predict fewer people on public transport, more traffic jams, and many more cars idling and polluting the atmosphere.

Can we have mass public transport that doesn’t give a significant disease risk? Maybe if we had significantly more trains and trams and better ventilation with more airflow designed to suck contaminated air out. But that would require significant engineering work to design new trams, trains, and buses as well as expense in refitting or replacing old ones.

Uber and similar companies have been taking over from taxi companies. One major feature of those companies is that the vehicles are not dedicated as taxis. Dedicated taxis could easily be designed to reduce the spread of disease; the famed Black Cab AKA Hackney Carriage [3] design in the UK has a separate compartment for passengers with little air flow to/from the driver compartment. It would be easy to design such taxis to have entirely separate airflow and if set up to only take EFTPOS and credit card payment they could avoid all contact between the driver and passengers. I would prefer to have a Hackney Carriage design of vehicle instead of a regular taxi or Uber.

Autonomous cars have been shown to basically work. There are some concerns about safety issues as there are currently corner cases that car computers don’t handle as well as people, but of course there are also things computers do better than people. Having an autonomous taxi would be a benefit for anyone who wants to avoid other people. Maybe approval could be rushed through for autonomous cars that are limited to 40km/h (the maximum collision speed at which a pedestrian is unlikely to die). In central city areas and inner suburbs you aren’t likely to drive much faster than that anyway.

Car share services have been becoming popular, for many people they are significantly cheaper than owning a car due to the costs of regular maintenance, insurance, and depreciation. As the full costs of car ownership aren’t obvious people may focus on the disease risk and keep buying cars.

Passenger jets are ridiculously cheap. But this relies on the airline companies being able to consistently fill the planes. If they were to add measures to reduce cross contamination between passengers which slightly reduces the capacity of planes then they need to increase ticket prices accordingly which then reduces demand. If passengers are just scared of flying in close proximity and they can’t fill planes then they will have to increase prices which again reduces demand and could lead to a death spiral. If in the long term there aren’t enough passengers to sustain the current number of planes in service then airline companies will have significant financial problems, planes are expensive assets that are expected to last for a long time, if they can’t use them all and can’t sell them then airline companies will go bankrupt.

It’s not reasonable to expect that the same number of people will be travelling internationally for years (if ever). Due to relying on economies of scale to provide low prices I don’t think it’s possible to keep prices the same no matter what they do. A new economic balance of flights costing 2-3 times more than we are used to while having significantly fewer passengers seems likely. Governments need to spend significant amounts of money to improve trains to take over from flights that are cancelled or too expensive.

Entertainment

The article on The Bulwark mentions Las Vegas as a city that will be hurt a lot by reductions in travel and crowds, the same thing will happen to tourist regions all around the world. Australia has a significant tourist industry that will be hurt a lot. But the mention of Las Vegas makes me wonder what will happen to the gambling in general. Will people avoid casinos and play poker with friends and relatives at home? It seems that small stakes poker games among friends will be much less socially damaging than casinos, will this be good for society?

The article also mentions cinemas which have been on the way out since the video rental stores all closed down. There’s lots of prime real estate used for cinemas and little potential for them to make enough money to cover the rent. Should we just assume that most uses of cinemas will be replaced by Netflix and other streaming services? What about teenage dates, will kissing in the back rows of cinemas be replaced by “Netflix and chill”? What will happen to all the prime real estate used by cinemas?

Professional sporting matches have been played for a TV-only audience during the pandemic. There’s no reason that they couldn’t make a return to live stadium audiences when there is a vaccine for the disease or the disease has been extinguished by social distancing. But I wonder if some fans will start to appreciate the merits of small groups watching large TVs and not want to go back to stadiums, can this change the typical behaviour of groups?

Restaurants and cafes are going to do really badly. I previously wrote about my experience running an Internet Cafe and why reopening businesses soon is a bad idea [4]. The question is how long this will go for and whether social norms about personal space will change things. If in the long term people expect 25% more space in a cafe or restaurant that’s enough to make a significant impact on profitability for many small businesses.

When I was young the standard thing was for people to have dinner at friends’ homes. Meeting friends for dinner at a restaurant was uncommon. Recently it seemed to be the most common practice for people to meet friends at a restaurant. There are real benefits to meeting at a restaurant in terms of effort and location. Maybe meeting friends at their home for a delivered dinner will become a common compromise, avoiding the effort of cooking while avoiding the extra expense and disease risk of eating out. Food delivery services will do well in the long term; it’s one of the few industry segments which might do better after the pandemic than before.

Work

Many companies are discovering the benefits of teleworking, getting it going effectively has required investing in faster Internet connections and hardware for employees. When we have a vaccine the equipment needed for teleworking will still be there and we will have a discussion about whether it should be used on a more routine basis. When employees spend more than 2 hours per day travelling to and from work (which is very common for people who work in major cities) that will obviously limit the amount of time per day that they can spend working. For the more enthusiastic permanent employees there seems to be a benefit to the employer to allow working from home. It’s obvious that some portion of the companies that were forced to try teleworking will find it effective enough to continue in some degree.

One company that I work for has quit their coworking space in part because they were concerned that the coworking company might go bankrupt due to the pandemic. They seem to have become a 100% work from home company for the office part of the work (only on site installation and stock management is done at corporate locations). Companies running coworking spaces and other shared offices will suffer first as their clients have short term leases. But all companies renting out office space in major cities will suffer due to teleworking. I wonder how this will affect the companies providing services to the office workers, the cafes and restaurants etc. Will there end up being so much unused space in central city areas that it’s not worth converting the city cinemas into useful space?

There’s been a lot of news about Zoom and similar technologies. Lots of other companies are trying to get into that business. One thing that isn’t getting much notice is remote access technologies for desktop support. If the IT people can’t visit your desk because you are working from home then they need to be able to remotely access it to fix things. When people make working from home a large part of their work time the issue of who owns peripherals and how they are tracked will get interesting. In a previous blog post I suggested that keyboards and mice not be treated as assets [5]. But what about monitors, 4G/Wifi access points, etc?

Some people have suggested that there will be business sectors benefiting from the pandemic, such as telecoms and e-commerce. If you have a bunch of people forced to stay home who aren’t broke (IE a large portion of the middle class in Australia) they will probably order delivery of stuff for entertainment. But in the long term e-commerce seems unlikely to change much, people will spend less due to economic uncertainty so while they may shift some purchasing to e-commerce apart from home delivery of groceries e-commerce probably won’t go up overall. Generally telecoms won’t gain anything from teleworking, the Internet access you need for good Netflix viewing is generally greater than that needed for good video-conferencing.

Money

I previously wrote about a Basic Income for Australia [6]. One of the most cited reasons for a Basic Income is to deal with robots replacing people. Now we are at the start of what could be a long term economic contraction caused by the pandemic which could reduce the scale of the economy by a similar degree while also improving the economic case for a robotic workforce. We should implement a Universal Basic Income now.

I previously wrote about the make-work jobs and how we could optimise society to achieve the worthwhile things with less work [7]. My ideas about optimising public transport and using more car share services may not work so well after the pandemic, but the rest should work well.

Business

There are a number of big companies that are not aiming for profitability in the short term. WeWork and Uber are well documented examples. Some of those companies will hopefully go bankrupt and make room for more responsible companies.

The co-working thing was always a precarious business. The companies renting out office space usually did so on a monthly basis as flexibility was one of their selling points, but they presumably rented buildings on an annual basis. As the profit margins weren’t particularly high having to pay rent on mostly empty buildings for a few months will hurt them badly. The long term trend in co-working spaces might be some sort of collaborative arrangement between the people who run them and the landlords similar to the way some of the hotel chains have profit sharing agreements with land owners to avoid both the capital outlay for buying land and the risk involved in renting. Also city hotels are very well equipped to run office space, they have the staff and the procedures for running such a business, most hotels also make significant profits from conventions and conferences.

The way the economy has been working in first world countries has been about being as competitive as possible. Just in time delivery to avoid using storage space and machines to package things in exactly the way that customers need and no more machines than needed for regular capacity. This means that there’s no spare capacity when things go wrong. A few years ago a company making bolts for the car industry went bankrupt because the car companies forced the prices down, then car manufacture stopped due to lack of bolts – this could have been a wake up call but was ignored. Now we have had problems with toilet paper shortages due to it being packaged in wholesale quantities for offices and schools not retail quantities for home use. Food was destroyed because it was created for restaurant packaging and couldn’t be packaged for home use in a reasonable amount of time.

Farmer’s markets alleviate some of the problems with packaging food etc. But they aren’t a good option when there’s a pandemic as disease risk makes them less appealing to customers and therefore less profitable for vendors.

Religion

Many religious groups have supported social distancing. Could this be the start of more decentralised religion? Maybe have people read the holy book of their religion and pray at home instead of being programmed at church? We can always hope.

June 24, 2020

Automated MythTV-related maintenance tasks

Here is the daily/weekly cronjob I put together over the years to perform MythTV-related maintenance tasks on my backend server.

The first part performs a database backup:

5 1 * * *  mythtv  /usr/share/mythtv/mythconverg_backup.pl

which I previously configured by putting the following in /home/mythtv/.mythtv/backuprc:

DBBackupDirectory=/var/backups/mythtv

and creating a new directory for it:

mkdir /var/backups/mythtv
chown mythtv:mythtv /var/backups/mythtv

The second part of /etc/cron.d/mythtv-maintenance runs a contrib script to optimize the database tables:

10 1 * * *  mythtv  /usr/bin/chronic /usr/share/doc/mythtv-backend/contrib/maintenance/optimize_mythdb.pl

once a day. It requires the libmythtv-perl and libxml-simple-perl packages to be installed on Debian-based systems.

It is quickly followed by a check of the recordings and automatic repair of the seektable (when possible):

20 1 * * *  mythtv  /usr/bin/chronic /usr/bin/mythutil --checkrecordings --fixseektable

Next, I force a scan of the music and video databases to pick up anything new that may have been added externally via NFS mounts:

30 1 * * *  mythtv  /usr/bin/mythutil --quiet --scanvideos
31 1 * * *  mythtv  /usr/bin/mythutil --quiet --scanmusic

Finally, I defragment the XFS partition for two hours every day except Friday:

45 1 * * 1-4,6-7  root  /usr/sbin/xfs_fsr

and resync the RAID-1 arrays once a week to ensure that they stay consistent and error-free:

15 3 * * 2  root  /usr/local/sbin/raid_parity_check md0
15 3 * * 4  root  /usr/local/sbin/raid_parity_check md2

using a trivial script.
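The raid_parity_check script itself is not shown in the post. As a rough sketch of what such a script might do, the following assumes the standard Linux md sysfs interface, where writing "check" to an array's sync_action file starts a scrub. The function name and the sysfs_root parameter are hypothetical and exist only to make the sketch easy to try against a scratch directory:

```python
from pathlib import Path


def request_check(array, sysfs_root="/sys/block"):
    """Ask the md driver to scrub one array, e.g. "md0".

    Hypothetical stand-in for the post's raid_parity_check script.
    Returns the path that was written so callers can log it.
    """
    action = Path(sysfs_root) / array / "md" / "sync_action"
    if not action.exists():
        raise FileNotFoundError(f"{array} does not look like an md array: {action}")
    # Writing "check" starts a pass that reads every block of the array
    # and counts any mismatches between the mirrors.
    action.write_text("check\n")
    return action
```

Run as root, request_check("md0") would start the scrub. Once it finishes, the mismatch count can be read from /sys/block/md0/md/mismatch_cnt.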

In addition to that cronjob, I also have smartmontools run daily short and weekly long SMART tests via this blurb in /etc/smartd.conf:

/dev/sda -a -d ata -o on -S on -s (S/../.././04|L/../../6/05)
/dev/sdb -a -d ata -o on -S on -s (S/../.././04|L/../../6/05)

If there are any other automated maintenance tasks you do on your MythTV server, please leave a comment!

June 23, 2020

Squirrelmail vs Roundcube

For some years I’ve had SquirrelMail running on one of my servers for the people who like such things. It seems that the upstream support for SquirrelMail has ended (according to the SquirrelMail Wikipedia page there will be no new releases, just Subversion updates to fix bugs). One problem with SquirrelMail that seems unlikely to get fixed is the lack of support for base64 encoded From and Subject fields, which are becoming increasingly popular nowadays as people whose names don’t fit US-ASCII are encoding them in their preferred manner.

I’ve recently installed Roundcube to provide an alternative. Of course one of the few important users of webmail didn’t like it (apparently it doesn’t display well on a recent Samsung Galaxy Note), so now I have to support two webmail systems.

Below is a little Perl script to convert a SquirrelMail abook file into the csv format used for importing a RoundCube contact list.

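#!/usr/bin/perl
# Reads a SquirrelMail abook file on STDIN and writes CSV for RoundCube to STDOUT.
use strict;

# Header row for the RoundCube contact import.
print "First Name,Last Name,Display Name,E-mail Address\n";
while(<STDIN>)
{
  chomp;
  # Fields in each abook line are separated by "|".
  my @fields = split(/\|/, $_);
  # The display name is built from fields 0 and 4; no CSV quoting is done.
  printf("%s,%s,%s %s,%s\n", $fields[1], $fields[2], $fields[0], $fields[4], $fields[3]);
}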
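The Perl script does no CSV quoting, so a name containing a comma would shift the columns. Below is a sketch of the same conversion in Python using the csv module, which quotes such fields. It keeps the field mapping of the Perl script; the function name abook_to_csv is hypothetical, and unlike the Perl version it skips blank lines and trims the display name when field 0 or 4 is empty:

```python
import csv
import io


def abook_to_csv(lines):
    """Convert SquirrelMail abook lines to CSV text for a RoundCube import."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["First Name", "Last Name", "Display Name", "E-mail Address"])
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("|")
        # Pad short records so missing trailing fields become empty strings.
        fields += [""] * (5 - len(fields))
        # Same mapping as the Perl script: display name from fields 0 and 4.
        display = f"{fields[0]} {fields[4]}".strip()
        writer.writerow([fields[1], fields[2], display, fields[3]])
    return out.getvalue()
```

It could be driven with something like print(abook_to_csv(open("user.abook")), end=""), where user.abook is a placeholder file name.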

June 21, 2020

Open IP over VHF/UHF

I’ve recently done a lot of work on the Codec 2 FSK modem, here is the new README_fsk. It now works at lower SNRs, has been refactored, and is supported by a suite of automated tests.

There is some exciting work going on with Codec 2 modems and VHF/UHF IP links using TAP/TUN (thanks Tomas and Jeroen) – a Linux technology for building IP links from user space “data pumps” – like the Codec 2 modems.

My initial goal for this work is a “100 kbit/s IP link” for VHF/UHF using Codec 2 modems and SDR. One application is moving Ham Radio data past the 1995-era “9600 bits/s data port” paradigm to real time IP.

I’m also interested in IP over TV Whitespace (spare VHF/UHF spectrum) for emergency and developing world applications. I’m a judge for developing world IT grants and the “last 100km” problem comes up again and again. This solution requires just a Raspberry Pi and RTLSDR. At these frequencies antennas could be simply fabricated from wire (cut for the frequency of operation), and soldered directly to the Pi.

Results and Link Budget

As a first step, I’ve taken another look at using RpiTx for FSK, this time at VHF and starting at a modest 10 kbits/s. Over the weekend I performed some Minimum Detectable Signal (MDS) tests and confirmed the 2FSK modem built with RpiTx, a RTL-SDR, and the Codec 2 modem is right on theory at 10 kbits/s, with a MDS of -120dBm.
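For readers wondering where a figure like -120dBm comes from, a common textbook estimate is MDS = -174 + 10log10(Rb) + Eb/N0 + NF, where -174dBm/Hz is the thermal noise density at room temperature. The sketch below evaluates it. The 9dB Eb/N0 and 5dB noise figure are illustrative assumptions chosen for the example, not numbers taken from the post:

```python
import math

THERMAL_NOISE_DBM_PER_HZ = -174.0  # thermal noise density at room temperature


def mds_dbm(bit_rate, ebn0_db, noise_figure_db):
    """Estimate the minimum detectable signal in dBm.

    bit_rate is in bits/s; ebn0_db is the Eb/N0 the modem needs for the
    target bit error rate; noise_figure_db is the receiver noise figure.
    """
    return (THERMAL_NOISE_DBM_PER_HZ
            + 10 * math.log10(bit_rate)
            + ebn0_db
            + noise_figure_db)


# Assumed values: 9 dB Eb/N0 and a 5 dB noise figure at 10 kbit/s.
print(mds_dbm(10_000, ebn0_db=9.0, noise_figure_db=5.0))
```

Each tenfold increase in bit rate costs 10dB of sensitivity under this estimate, which is why a 100 kbit/s link needs a stronger signal than the 10 kbit/s starting point.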

Putting this in context, a UHF signal has a path loss of 125dB over 100km. So if you have a line of sight path, a 10mW (10dBm) signal will be 10-125 = -115dBm at your receiver (assuming basic 0dBi antennas). As -115dBm is greater than the -120dBm MDS, this means your data will be received error free (especially when we add forward error correction). We have sufficient “link margin” and the “link” is closed.
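The link budget above can be checked with the standard free-space path loss formula. This is a small sketch rather than anything from the project's code: the 435MHz frequency is an assumption picked as a typical UHF amateur frequency, and the function names are made up for illustration:

```python
import math


def free_space_path_loss_db(distance_km, freq_mhz):
    """Free-space path loss in dB for a distance in km and a frequency in MHz."""
    return 20 * math.log10(distance_km) + 20 * math.log10(freq_mhz) + 32.44


def link_margin_db(tx_dbm, path_loss_db, mds_dbm, tx_gain_dbi=0.0, rx_gain_dbi=0.0):
    """Received power minus the minimum detectable signal, in dB."""
    rx_dbm = tx_dbm + tx_gain_dbi + rx_gain_dbi - path_loss_db
    return rx_dbm - mds_dbm


# Assumed frequency of 435 MHz over the 100 km path from the text.
print(round(free_space_path_loss_db(100, 435), 1))
# 10 dBm transmitter, 125 dB path loss, -120 dBm MDS, 0 dBi antennas.
print(link_margin_db(10, 125, -120))
```

A positive margin means the link closes. Antenna gain at either end adds directly to the margin.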

While our 10 kbits/s starting point doesn’t sound like much – even at that rate we get to send 10000*3600*24/8/140 = 771,000 140 byte text messages each day to another station on your horizon. That’s a lot of connectivity in an emergency or when the alternative where you live is nothing.
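The same arithmetic as a small function, so other bit rates or message sizes can be tried. It ignores protocol overhead and forward error correction, which would reduce the real figure; the function name is hypothetical:

```python
def messages_per_day(bit_rate, message_bytes=140):
    """Whole messages that fit in one day of continuous transmission."""
    bytes_per_day = bit_rate * 3600 * 24 // 8
    return bytes_per_day // message_bytes


print(messages_per_day(10_000))   # the 10 kbit/s starting point
print(messages_per_day(100_000))  # the 100 kbit/s goal
```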

Method

I’m using the GitHub PR as a logbook for the work, I quite like GitHub and Markdown. This weekend’s MDS experiments start here.

I had the usual fun and games with attenuating the Rx signal from the Pi down to -120dBm. The transmit signal tries hard to leak around the attenuators via a RF path. I moved the unshielded Pi into another room, and built a “plastic bag and aluminium foil” Faraday cage which worked really well:


These are complex systems and many things can go wrong. Are your Tx/Rx sample clocks close enough? Is your rx signal clipping? Is the gain of your radio sufficient to reduce quantisation noise? Bug in your modem code? DC line in your RTLSDR signal? Loose SMA connector?

I’ve learnt the hard way to test very carefully at each step. First, I run off air samples through a non-real time modem Octave simulation to visualise what’s going on inside the modem. A software oscilloscope.

An Over the Cable (OTC) test is essential before trying Over the Air (OTA) as it gives you a controlled environment to spot issues. MDS tests that measure the Bit Error Rate (BER) are also excellent, they effectively absorb every factor in the system and give you an overall score (the Bit Error Rate) you can compare to theory.

Spectral Purity

Here is the spectrum of the FSK signal for a …01010… sequence at 100 kbit/s, at two resolution bandwidths:

The Tx power is about 10dBm, this plot is after some attenuation. I haven’t carefully checked the spurious levels, but the above looks like around -40dBc (off a low 10mW EIRP) over this 1MHz span. If I am reading the Australian regulations correctly (Section 7A of the Amateur LCD) the requirement is 43+10log(P) = 43+10log10(0.01) = 23dBc, so we appear to pass.
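The spurious emission check can be written out as below. The formula is the one quoted in the text and has not been checked against the regulations here. The function name is made up, and the 40dB figure is the rough reading of the plot given above:

```python
import math


def required_attenuation_db(mean_power_watts):
    """Required spurious attenuation below the carrier: 43 + 10*log10(P)."""
    return 43 + 10 * math.log10(mean_power_watts)


required = required_attenuation_db(0.01)  # 10 mW transmitter
measured = 40.0                           # approximate level read off the plot
print(round(required, 1), round(measured - required, 1))
```

With these numbers the spurious signals are about 17dB below the limit, which is the sense in which the transmitter appears to pass.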

Conclusion

This is “extreme open source”. The transmitter is software, the modem is software. All open source and free as in beer and speech. No chipsets or application specific radio hardware – just some CPU cycles and a down converter supplied by the Pi and RTLSDR. The only limits are those of physics – which we have reached with the MDS tests.

Reading Further

Pi Radio IP – Current GitHub Repo for this work
FSK modem support for TAP/TUN – Early GitHub PR for this work
Testing a RTL-SDR with FSK on HF
High Speed Balloon Data Links
Codec 2 FSK modem README – includes lots of links and sample applications.

June 19, 2020

Storage Trends

In considering storage trends for the consumer side I’m looking at the current prices from MSY (where I usually buy computer parts). I know that other stores will have slightly different prices but they should be very similar as they all have low margins and wholesale prices are the main factor.

Small Hard Drives Aren’t Viable

The cheapest hard drive that MSY sells is $68 for 500G of storage. The cheapest SSD is $49 for 120G and the second cheapest is $59 for 240G. SSD is cheaper at the low end and significantly faster. If someone needed about 500G of storage there’s a 480G SSD for $97 which costs $29 more than a hard drive. With a modern PC if you have no hard drives you will notice that it’s quieter. For anyone who’s buying a new PC spending an extra $29 is definitely worthwhile for the performance, low power use, and silence.

The cheapest 1TB disk is $69 and the cheapest 1TB SSD is $159. Saving $90 on the cost of a new PC probably isn’t worthwhile.

For 2TB of storage the cheapest options are Samsung NVMe for $339, Crucial SSD for $335, or a hard drive for $95. Some people would choose to save $244 by getting a hard drive instead of NVMe, but if you are getting a whole system then allocating $244 to NVMe instead of a faster CPU would probably give more benefits overall.

Computer stores typically have small margins and computer parts tend to quickly either become cheaper or be obsoleted by better parts. So stores don’t want to stock parts unless they will sell quickly. Disks smaller than 2TB probably aren’t going to be profitable for stores for very long. The trend of SSD and NVMe becoming cheaper is going to make 2TB disks non-viable in the near future.

NVMe vs SSD

M.2 NVMe devices are at comparable prices to SATA SSDs. For some combinations of quality and capacity NVMe is about 50% more expensive and for some it’s slightly cheaper (EG Intel 1TB NVMe being cheaper than Samsung EVO 1TB SSD). Last time I checked about half the motherboards on sale had a single M.2 socket so for a new workstation that doesn’t need more than 2TB of storage (the largest NVMe that MSY sells) it wouldn’t make sense to use anything other than NVMe.

The benefit of NVMe is NOT throughput (even though NVMe devices can often sustain over 4GB/s), it’s low latency. Workstations can’t properly take advantage of this because RAM is so cheap ($198 for 32G of DDR4) that compiles etc mostly come from cache and because most filesystem writes on workstations aren’t synchronous. For servers a large portion of writes are synchronous, for example a mail server can’t acknowledge receiving mail until it knows that it’s really on disk, so there’s a lot of small writes that block server processes and the low latency of NVMe really improves performance. If you are doing a big compile on a workstation (the most common workstation task that uses a lot of disk IO) then the writes aren’t synchronised to disk and if the system crashes you will just do all the compilation again. While NVMe doesn’t give a lot of benefit over SSD for workstation use (I’ve used laptops with SSD and NVMe and not noticed a great difference) of course I still want better performance. ;)

Last time I checked I couldn’t easily buy a PCIe card that supported 2*NVMe cards, I’m sure they are available somewhere but it would take longer to get and probably cost significantly more than twice as much. That means a RAID-1 of NVMe takes 2 PCIe slots if you don’t have an M.2 socket on the motherboard. This was OK when I installed 2*NVMe devices on a server that had 18 disks and lots of spare PCIe slots. But for some systems PCIe slots are an issue.

My home server has all PCIe slots used by a video card and Ethernet cards and the BIOS probably won’t support booting from NVMe. It’s a Dell server so I can’t just replace the motherboard with one that has more PCIe slots and M.2 on the motherboard. As it’s running nicely and doesn’t need replacing any time soon I won’t be using NVMe for home server stuff.

Small Servers

Most servers that I am responsible for have less than 2TB of storage. For my clients I now only recommend SSD storage for small servers and am recommending SSD for replacing any failed disks.

My home server has 2*500G SSDs in a BTRFS RAID-1 for the root filesystem, and 3*4TB disks in a BTRFS RAID-1 for storing big files. I bought the SSDs when 500G SSDs were about $250 each and bought 2*4TB disks when they were about $350 each. Currently that server has about 3.3TB of space used and I could probably get it down to about 2.5TB if I deleted things I don’t really need. If I was getting storage for that server now I’d use 2*2TB SSDs and 3*1TB hard drives for the stuff that doesn’t fit on SSDs (I have some spare 1TB disks that came with servers). If I didn’t have spare hard drives I’d get 3*2TB SSDs for that sort of server which would give 3TB of BTRFS RAID-1 storage.
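The capacity figures above follow from how a two-copy RAID-1 places data: every block is stored on two different devices, so usable space is half the total unless one device is larger than all the others combined. A rough sketch of that rule (the function name is made up, and filesystem metadata overhead is ignored):

```python
def raid1_usable_tb(disk_sizes_tb):
    """Approximate usable capacity of a two-copy RAID-1 across mixed disks."""
    total = sum(disk_sizes_tb)
    largest = max(disk_sizes_tb)
    # Each block needs a second copy on a different device, so the largest
    # device can never hold more than the rest can mirror.
    return min(total / 2, total - largest)


print(raid1_usable_tb([2, 2, 2]))      # 3*2TB SSDs
print(raid1_usable_tb([4, 4, 4]))      # 3*4TB disks
print(raid1_usable_tb([0.5, 0.5]))     # 2*500G SSDs
```

An unbalanced set such as one 4TB and two 1TB disks gives only 2TB usable, because the 4TB disk has nowhere to mirror its second half.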

Last time I checked Dell servers had a card for supporting M.2 as an optional extra so Dells probably won’t boot from NVMe without extra expense.

Ars Technica has an informative article about WD selling SMR disks as “NAS” disks [1]. The Shingled Magnetic Recording technology allows greater storage density on a platter which leads to either larger capacity or cheaper disks but at the cost of lower write performance and apparently extremely bad latency in some situations. NAS disks are supposed to be low latency as the expectation is that they will be used in a RAID array and kicked out of the array if they have problems. There are reports of ZFS kicking SMR disks from RAID sets. I think this will end the use of hard drives for small servers. For a server you don’t want to deal with this sort of thing, by definition when a server goes down multiple people will stop work (small server implies no clustering). Spending extra to get SSDs just to avoid the risk of unexpected SMR would be a good plan.

Medium Servers

The largest SSD and NVMe devices that are readily available are 2TB but 10TB disks are commodity items, there are reports of 20TB hard drives being available but I can’t find anyone in Australia selling them.

If you need to store dozens or hundreds of terabytes then hard drives have to be part of the mix at this time. There’s no technical reason why SSDs larger than 10TB can’t be made (the 2.5″ SATA form factor has more than 5* the volume of a 2TB M.2 card) and it’s likely that someone sells them outside the channels I buy from, but probably at a price higher than what my clients are willing to pay. If you want 100TB of affordable storage then a mid range server like the Dell PowerEdge T640 which can have up to 18*3.5″ disks is good. One of my clients has a PowerEdge T630 with 18*3.5″ disks in the 8TB-10TB range (we replace failed disks with the largest new commodity disks available, it used to have 6TB disks). ZFS version 0.8 introduced a “Special VDEV Class” which stores metadata and possibly small data blocks on faster media. So you could have some RAID-Z groups on hard drives for large storage and the metadata on a RAID-1 on NVMe for fast performance. For medium size arrays on hard drives having a “find /” operation take hours is not uncommon, for large arrays having it take days isn’t that uncommon. So far it seems that ZFS is the only filesystem to have taken the obvious step of storing metadata on SSD/NVMe while bulk data is on cheap large disks.

One problem with large arrays is that the vibration of disks can affect the performance and reliability of nearby disks. The ZFS server I run with 18 disks was originally setup with disks from smaller servers that never had ZFS checksum errors, but when disks from 2 small servers were put in one medium size server they started getting checksum errors presumably due to vibration. This alone is a sufficient reason for paying a premium for SSD storage.

Currently the cost of 2TB of SSD or NVMe is between the prices of 6TB and 8TB hard drives, and the ratio of price/capacity for SSD and NVMe is improving dramatically while the increase in hard drive capacity is slow. 4TB SSDs are available for $895 compared to a 10TB hard drive for $549, so it’s 4* more expensive on a price per TB. This is probably good for Windows systems, but for Linux systems where ZFS and “special VDEVs” is an option it’s probably not worth considering. Most Linux use cases where 4TB SSDs would work well would be better served by smaller NVMe and 10TB disks running ZFS. I don’t think that 4TB SSDs are at all popular at the moment (MSY doesn’t stock them), but prices will come down and they will become common soon enough. Probably by the end of the year SSDs will halve in price and no hard drives less than 4TB will be viable.
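The price per TB comparison can be worked through as follows, using only the two prices quoted above. The helper name is hypothetical:

```python
def price_per_tb(price_dollars, capacity_tb):
    """Price in dollars for each terabyte of capacity."""
    return price_dollars / capacity_tb


ssd = price_per_tb(895, 4)    # 4TB SSD
hdd = price_per_tb(549, 10)   # 10TB hard drive
ratio = ssd / hdd
print(f"SSD ${ssd:.2f}/TB, HDD ${hdd:.2f}/TB, ratio {ratio:.2f}")
```

That gives about $224 per TB for the SSD against about $55 per TB for the hard drive, which is where the roughly 4* figure comes from.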

For rack mounted servers 2.5″ disks have been popular for a long time. It’s common for vendors to offer 2 versions of a rack mount server for 2.5″ and 3.5″ disks where the 2.5″ version takes twice as many disks. If the issue is total storage in a server 4TB SSDs can give the same capacity as 8TB HDDs.

SMR vs Regular Hard Drives

Rumour has it that you can buy 20TB SMR disks, I haven’t been able to find a reference to anyone who’s selling them in Australia (please comment if you know who sells them and especially if you know the price). I expect that the ZFS developers will soon develop a work-around to solve the problems with SMR disks. Then arrays of 20TB SMR disks with NVMe for “special VDEVs” will be an interesting possibility for storage. I expect that SMR disks will be the majority of the hard drive market by 2023 – if hard drives are still on the market. SSDs will be large enough and cheap enough that only SMR disks will offer enough capacity to be worth using.

I think that it is a possibility that hard drives won’t be manufactured in a few years. The volume of a 3.5″ disk is significantly greater than that of 10 M.2 devices so current technology obviously allows 20TB of NVMe or SSD storage in the space of a 3.5″ disk. If the price of 16TB NVMe and SSD devices comes down enough (to perhaps 3* the price of a 20TB hard drive) almost no-one would want the hard drive and it wouldn’t be viable to manufacture them.

It’s not impossible that in a few years time 3D XPoint and similar fast NVM technologies occupy the first level of storage (the ZFS “special VDEV”, OS swap device, log device for database servers, etc) and NVMe occupies the level for bulk storage with no space left in the market for spinning media.

Computer Cases

For servers I expect that models supporting 3.5″ storage devices will disappear. A 1RU server with 8*2.5″ storage devices or a 2RU server with 16*2.5″ storage devices will probably be of use to more people than a 1RU server with 4*3.5″ or a 2RU server with 8*3.5″.

My first IBM PC compatible system, in 1988, had a 5.25″ hard drive, a 5.25″ floppy drive, and a 3.5″ floppy drive. My current PC is almost the same size and has a DVD drive (that I almost never use), 5 other 5.25″ drive bays that have never been used, and 5*3.5″ drive bays that I have never used (I have only used 2.5″ SSDs). It would make more sense to have PC cases designed around 2.5″ and maybe 3.5″ drives with no more than one 5.25″ drive bay.

The Intel NUC SFF PCs are going in the right direction. Many of them only have a single storage device but some of them have 2*M.2 sockets allowing RAID-1 of NVMe and some of them support ECC RAM so they could be used as small servers.

A USB DVD drive costs $36. It doesn’t make sense to have every PC designed around the size of an internal DVD drive that will probably only be used to install the OS when a $36 USB DVD drive can be used for every PC you own.

The only reason I don’t have a NUC for my personal workstation is that I get my workstations from e-waste. If I was going to pay for a PC then a NUC is the sort of thing I’d pay to have on my desk.

LUV June 2020 Workshop: Emergency Security Discussion

Jun 20 2020 12:30
Jun 20 2020 14:30
Location: 
Online event (TBA)

On Friday morning, our prime minister held an unprecedented press conference to warn Australia (Governments, Industry & Individuals) about a sophisticated cyber attack that is currently underway.

Linux Users of Victoria is a subcommittee of Linux Australia.

June 20, 2020 - 12:30

June 16, 2020

Linux Security Summit North America 2020: Online Schedule

Just a quick update on the Linux Security Summit North America (LSS-NA) for 2020.

The event will take place over two days as an online event, due to COVID-19.  The dates are now July 1-2, and the full schedule details may be found here.

The main talks are:

There are also short (30 minute) topics:

This year we will also have a Q&A panel at the end of each day, moderated by Elena Reshetova. The panel speakers are:

  • Nayna Jain
  • Andrew Lutomirski
  • Dmitry Vyukov
  • Emily Ratliff
  • Alexander Popov
  • Christian Brauner
  • Allison Marie Naaktgeboren
  • Kees Cook
  • Mimi Zohar

LSS-NA this year is included with OSS+ELC registration, which is USD $50 all up.  Register here.

Hope to see you soon!

June 14, 2020

Codec 2 HF Data Modes 1

Since “attending” MHDC last month I’ve taken an interest in open source HF data modems. So I’ve been busy refactoring the Codec 2 OFDM modem for use with HF data.

The major change is from streaming small (28 bit) voice frames to longer (few hundred byte) packets of data. In some ways data is easier than PTT voice: latency is no longer an issue, and I can use nice long FEC codewords that ride over fades. On the flip side we really care about bit errors with data; for voice it’s acceptable to pass frames with errors to the speech decoder, and let the human ear work it out.

As a first step I’ve been working with GNU Octave simulations, and have developed 3 candidate data modes that I have been testing against simulated HF channels. In simulation they work well with up to 4ms of delay and 2.5Hz of Doppler.

Here are the simulation results for 10% Packet Error Rate (PER). The multipath channel has 2ms delay spread and 2Hz Doppler (CCITT Multipath Poor channel).

Mode   | Est Bytes/min | AWGN SNR (dB) | Multipath Poor SNR (dB)
datac1 | 6000          | 3             | 12
datac2 | 3000          | 1             | 7
datac3 | 1200          | -3            | 0

The bytes/minute metric is commonly used by Winlink (divide by 7.5 for bits/s). I’ve assumed a 20% allowance for ARQ and other overheads. HF data isn’t fast – it’s a tough, narrow channel to push data through. But for certain applications (e.g. if you’re off the grid, or when the lights go out) it may be all you have. Even these low rates can be quite useful: 1200 bytes/minute is 8.5 tweets or SMS texts/minute.
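
For anyone who wants to check the unit conversion, here is a small sketch. The mode rates come from the table above; the 140 byte message size is my assumption for a "tweet or SMS", and the function names are just for illustration.

```python
def bytes_per_min_to_bits_per_sec(bytes_per_min: float) -> float:
    """8 bits per byte and 60 seconds per minute, i.e. divide by 7.5."""
    return bytes_per_min * 8 / 60


def messages_per_min(bytes_per_min: float, message_bytes: int = 140) -> float:
    """How many fixed-size messages fit in one minute of payload."""
    return bytes_per_min / message_bytes


# Estimated payload rates (bytes/minute) from the simulation table.
MODES = {"datac1": 6000, "datac2": 3000, "datac3": 1200}

for name, rate in MODES.items():
    bits = bytes_per_min_to_bits_per_sec(rate)
    msgs = messages_per_min(rate)
    print(f"{name}: {bits:.0f} bit/s, {msgs:.1f} messages/min")
```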

The modem waveforms are pilot assisted coherent PSK using LDPC FEC codes. Coherent PSK can have gains of up to 6dB over differential PSK (DPSK) modems commonly used on HF.

Before I get too far along I wanted to try them over real HF channels, to make sure I was on the right track. So much can go wrong with DSP in the real world!

So today I sent the new data waveforms over the air for the first time, on the 40m band over an 800km path from my home in Adelaide, South Australia to a KiwiSDR in Melbourne, Victoria.

Mode   | Est Bytes/min | Power (Wrms) | Est SNR (dB) | Packets Tx/Rx
datac1 | 6000          | 10           | 10-15        | 15/15
datac2 | 3000          | 10           | 5-15         | 8/8
datac3 | 1200          | 0.5          | -2           | 20/25

The Tx power is the RMS measured on my spec-an; for the 10W RMS samples it was 75W PEP. The SNR is measured in a 3000Hz noise bandwidth. I have a simple dipole at my end, and I’m not sure what the KiwiSDR was using.

I’m quite happy with these results. To give the c3 waveform a decent work out I dropped the power down to just 0.5W (listen), and I could still get 30% of the packets through at 100mW. A few of the tests had significant fading, however it was not very fast. My simulations are far tougher. Maybe I’ll try an NVIS path to give the modem a decent test on fast fading channels.

Here is the spectrogram (think waterfall on its side) for the -2dB datac3 sample:

Here are the uncoded (raw) errors, and the errors after FEC. Most of the frames made it. This mode employs a rate 1/3 LDPC code that was developed by Bill, VK5DSP. It can work at up to 16% raw BER! The errors at the end are due to the Tx signal ending, at this stage of development I just have a simple state machine with no “squelch”.

We have also been busy developing an API for the Codec 2 modems, see README_data.md. The idea is to allow developers of HF data protocols and applications to use the Codec 2 modems. As well as the “raw” HF data API, there is a very nice Ethernet style framer for VHF packet developed by Jeroen Vreeken.

If anyone would like to try running the modem Octave code take a look at the GitHub PR.

Reading Further

QAM and Packet Data for OFDM Pull Request for this work. Includes lots of notes. The waveform designs are described in this spreadsheet.
README for the Codec 2 OFDM modem, includes examples and more links.
Test Report of Various Winlink Modems
Modems for HF Digital Voice Part 1
Modems for HF Digital Voice Part 2

June 06, 2020

Comparing Compression

I just did a quick test of different compression options in Debian. The source file is a 1.1G MySQL dump file. The time is user CPU time on an i7-930 running under KVM; the compression programs may have different levels of optimisation for other CPU families.

Facebook people designed the zstd compression system (here’s a page giving an overview of it [1]). It has some interesting new features that can provide real differences at scale (like unusually large windows and pre-defined dictionaries), but I just tested the default mode and the -9 option for more compression. For the SQL file “zstd -9” provides significantly better compression than gzip while taking slightly less CPU time than “gzip -9”. Meanwhile zstd with the default option (equivalent to “zstd -3”) gives much faster compression than “gzip -9” while also producing slightly smaller output. For this use case bzip2 is too slow for inline compression of a MySQL dump as the dump process locks tables and can hang clients. The lzma and xz compression algorithms provide significant benefits in size but the time taken is grossly disproportionate.

In a quick check of my collection of files compressed with gzip I was only able to find 1 file that got less compression with zstd with default options, and that file got better compression with “zstd -9”. So zstd seems to beat gzip everywhere by every measure.

The bzip2 compression seems to be obsolete, “zstd -9” is much faster and has slightly smaller output.

Both xz and lzma seem to offer a combination of compression and time taken that zstd can’t beat (for this file type at least). The ultra compression mode 22 gives 2% smaller output files but almost 28 minutes of CPU time for compression is a bit ridiculous. There is a threaded mode for zstd that could potentially allow a shorter wall clock time for “zstd --ultra -22” than lzma/xz while also giving better compression.

Compression      | Time   | Size
zstd             | 5.2s   | 130m
zstd -9          | 28.4s  | 114m
gzip -9          | 33.4s  | 141m
bzip2 -9         | 3m51   | 119m
lzma             | 6m20   | 97m
xz               | 6m36   | 97m
zstd -19         | 9m57   | 99m
zstd --ultra -22 | 27m46  | 95m
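
If you want to run a similar comparison on your own data, here is a rough sketch using only the Python standard library. It is not how the numbers above were produced (those came from the command line tools), it leaves out zstd because that needs a third-party module on most Python versions, and it measures CPU time for the whole process rather than user time alone. The sample data is made up.

```python
import bz2
import gzip
import lzma
import time


def benchmark(data: bytes):
    """Compress data with several codecs; return (name, cpu_seconds, size) tuples."""
    codecs = {
        "gzip -9": lambda d: gzip.compress(d, compresslevel=9),
        "bzip2 -9": lambda d: bz2.compress(d, compresslevel=9),
        "xz": lambda d: lzma.compress(d, format=lzma.FORMAT_XZ),
        "lzma": lambda d: lzma.compress(d, format=lzma.FORMAT_ALONE),
    }
    results = []
    for name, compress in codecs.items():
        start = time.process_time()          # CPU time, not wall clock
        compressed = compress(data)
        elapsed = time.process_time() - start
        results.append((name, elapsed, len(compressed)))
    return results


# Made-up, highly repetitive stand-in for a SQL dump.
sample = b"INSERT INTO t VALUES (1, 'example row');\n" * 20_000

for name, seconds, size in benchmark(sample):
    print(f"{name:10s} {seconds:8.3f}s {size:10d} bytes")
```

Real dump files are far less repetitive than this sample, so read your own file in place of `sample` before drawing any conclusions.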

Conclusion

For distributions like Debian which have large archives of files that are compressed once and transferred a lot the “zstd --ultra -22” compression might be useful with multi-threaded compression. But given that Debian already has xz in use it might not be worth changing until faster CPUs with lots of cores become more commonly available. One could argue that for Debian it doesn’t make sense to change from xz as hard drives seem to be getting larger capacity (and also smaller physical size) faster than the Debian archive is growing. One possible reason for adopting zstd in a distribution like Debian is that there are more tuning options for things like memory use. It would be possible to have packages for an architecture like ARM that tends to have less RAM compressed in a way that decreases memory use on decompression.

For general compression such as compressing log files and making backups it seems that zstd is the clear winner. Even bzip2 is far too slow and in my tests zstd clearly beats gzip for every combination of compression and time taken. There may be some corner cases where gzip can compete on compression time due to CPU features, optimisation for CPUs, etc but I expect that in almost all cases zstd will win for compression size and time. As an aside I once noticed the 32bit version of gzip compressing faster than the 64bit version on an Opteron system; the 32bit version had assembly optimisation and the 64bit version didn’t at that time.

To create a tar archive you can run “tar czf” or “tar cJf” to create an archive with gzip or xz compression. To create an archive with zstd compression you have to use “tar --zstd -cf”, that’s 7 extra characters to type. It’s likely that for most casual archive creation (EG for copying files around on a LAN or USB stick) saving 7 characters of typing is more of a benefit than saving a small amount of CPU time and storage space. It would be really good if tar got a single character option for zstd compression.

The external dictionary support in zstd would work really well with rsync for backups. Currently rsync only supports zlib, adding zstd support would be a good project for someone (unfortunately I don’t have enough spare time).

Now I will change my database backup scripts to use zstd.

Update:

The command “tar acvf a.zst filenames” will create a zstd compressed tar archive, the “a” option to GNU tar makes it autodetect the compression type from the file name. Thanks Enrico!

May 31, 2020

Effective Altruism

Long term readers of the blog may recall my daughter Amy. Well, she has moved on from teenage partying and is now e-volunteering at Effective Altruism Australia. She recently pointed me at the free e-book The Life You Can Save by Peter Singer.

I was already familiar with the work of Peter Singer, having read “The Most Good You Can Do”. Peter puts numbers on altruistic behaviour to evaluate them. This appeals to me – as an engineer I use numbers to evaluate artefacts I build like modems, or other processes going on in the world like COVID-19.

Using technology to help people is a powerful motivator for Geeks. I’ve been involved in a few of these initiatives myself (OLPC and The Village Telco). It’s really tough to create something that helps people long term. A wider set of skills and capabilities are required than just “the technology”.

On my brief forays into the developing world I’ve seen ecologies of people (from the first and developing worlds) living off development dollars. In some cases there is no incentive to report the true outcomes, for example how many government bureaucrats want to report failure? How many consultants want the gig to end?

So I really get the need for scientific evaluation of any development endeavours. Go Peter and the Effective Altruism movement!

I spend around 1000 hours a year writing open source code, a strong argument that I am “doing enough” in the community space. However I have no idea how effective that code is. Is it helping anyone? My inclination to help is also mixed with “itch scratching” – geeky stuff I want to work on because I find it interesting.

So after reading the book and having a think – I’m sold. I have committed 5% of my income to Effective Altruism Australia, selecting Give Directly as a target for my funds as it appealed to me personally.

I asked Amy to proofread this post – and she suggested that instead of $, you can donate time – that’s what she does. She also said:

Effective Altruism opens your eyes to alternative ways to interact with charities. It combines the broad field of social science to explore how many aspects intersect; by applying the scientific method to that of economics, psychology, international development, and anthropology.

Reading Further

Busting Teenage Partying with a Fluksometer
Effective Altruism Australia

May 30, 2020

AudioBooks – May 2020

Fewer books this month. At home on lockdown and weather a bit worse so less time to go on walks and listen.

Save the Cat! Writes a Novel: The Last Book On Novel Writing You’ll Ever Need by Jessica Brody

A fairly straight adaption of the screenplay-writing manual. Lots of examples from well-known books including full breakdowns of beats. 3/5

Happy Singlehood: The Rising Acceptance and Celebration of Solo Living by Elyakim Kislev

Based on 142 interviews. A lot of summaries of findings with quotes from interviewees and people’s blogs. Last chapter has some policy push but is a little light. 3/5

Scandinavia: A History by Ewan Butler

Just a 6 hour long quick spin through history. First half suffers a bit with lists of Kings although there is a bit more colour later on. Okay prep for something meatier. 3/5

One Giant Leap: The Impossible Mission That Flew Us to the Moon by Charles Fishman

A bit of a mix. It covers the legacy of Apollo but the best bits are chapters on the Computers, Politics and other behind the scenes things. A complement to astronaut and mission orientated books. 4/5

My Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average, in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all

May 29, 2020

Using Live Linux to Save and Recover Your Data

There are two types of people in the world; those who have lost data and those who are about to. Given that entropy will bite eventually, the objective should be to minimise data loss. Some key rules for this: back up, back up often, and back up with redundancy. Whilst an article on that subject will be produced, at this stage discussion is directed to the very specific task of using Linux to recover data from old machines which may not be accessible anymore. The number of times I've done this in past years is somewhat more than the number of fingers I have - however, like all good things it deserves to be documented in the hope that other people might find it useful.

To do this one will need a Linux live distribution of some sort as an ISO, written to a bootable USB drive. A typical choice would be a Ubuntu Live or Fedora Live. If one is dealing with damaged hardware the old Slackware-derived minimalist distribution Recovery is Possible (RIP) is certainly worth using; it has saved me in the past. If you need help in creating a bootable USB, the good people at HowToGeek provide some simple instructions.

With a Linux bootable disk of some description inserted in one's system, the recovery process can begin. Firstly, boot the machine and change the boot order (in BIOS/UEFI) so that the drive in question becomes the first in the boot order. Once the live distribution boots up, usually in a GUI environment, one needs to open the terminal application (e.g., GNOME in Fedora uses Applications, System Tools, Terminal) and change to the root user with the su command (there's no password on a live CD to be root!).

At this point one needs to create a mount point directory, where the data is going to be accessed; mkdir /mnt/recovery. After this one needs to identify the disk which one is trying to access. The fdisk -l command will list all disks along with their partition tables. Some educated guesswork from the results is required here; the Type column shows what each partition is, and the one you want almost certainly isn't an EFI System or Linux swap, for example. Typically one is trying to access something like /dev/sdaX.

Then one must mount the device to the directory that was just created, for example: mount /dev/sda2 /mnt/recovery. Sometimes a recalcitrant device will need to have the filesystem explicitly stated; the most common being ext3, ext4, fat, xfs, vfat, and ntfs-3g. To give a recent example I needed to run mount -t ext3 /dev/sda3 /mnt/recovery. From there one can copy the data from the mount point to a new destination; a USB drive is probably the quickest, although one may take the opportunity to copy it to an external system (e.g., Google Drive) - and that's it! You've recovered your data!
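
When copying from a disk that may be failing, it can be worth checking that the copies actually match the originals. The following is a minimal sketch of that idea, assuming Python 3.8 or later is available on the live system. The function names and the /media/usb path in the usage comment are hypothetical, and a file that cannot be read at all will make the copy raise an error instead of being reported.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1MB chunks so large files do not need to fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: Path, destination: Path) -> list:
    """Copy the source tree, then return source files whose copy is missing or differs."""
    shutil.copytree(source, destination, dirs_exist_ok=True)
    mismatched = []
    for src_file in source.rglob("*"):
        if src_file.is_file() and not src_file.is_symlink():
            dst_file = destination / src_file.relative_to(source)
            if not dst_file.is_file() or sha256_of(src_file) != sha256_of(dst_file):
                mismatched.append(src_file)
    return mismatched


# Example usage (hypothetical paths):
#   bad = copy_and_verify(Path("/mnt/recovery/home"), Path("/media/usb/recovered"))
#   print("files needing another look:", bad)
```

An empty list means every regular file was copied and hashed identically; anything listed is worth copying again by hand.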

May 28, 2020

Fixing locale problem in MythTV 30

After upgrading to MythTV 30, I noticed that the interface of mythfrontend switched from the French language to English, despite having the following in my ~/.xsession for the mythtv user:

export LANG=fr_CA.UTF-8
exec ~/bin/start_mythtv

I noticed a few related error messages in /var/log/syslog:

mythbackend[6606]: I CoreContext mythcorecontext.cpp:272 (Init) Assumed character encoding: fr_CA.UTF-8
mythbackend[6606]: N CoreContext mythcorecontext.cpp:1780 (InitLocale) Setting QT default locale to FR_US
mythbackend[6606]: I CoreContext mythcorecontext.cpp:1813 (SaveLocaleDefaults) Current locale FR_US
mythbackend[6606]: E CoreContext mythlocale.cpp:110 (LoadDefaultsFromXML) No locale defaults file for FR_US, skipping
mythpreviewgen[9371]: N CoreContext mythcorecontext.cpp:1780 (InitLocale) Setting QT default locale to FR_US
mythpreviewgen[9371]: I CoreContext mythcorecontext.cpp:1813 (SaveLocaleDefaults) Current locale FR_US
mythpreviewgen[9371]: E CoreContext mythlocale.cpp:110 (LoadDefaultsFromXML) No locale defaults file for FR_US, skipping

Searching for that non-existent fr_US locale, I found that others have this in their logs and that it's apparently set by Qt as a combination of the language and country codes.

I therefore looked in the database and found the following:

MariaDB [mythconverg]> SELECT value, data FROM settings WHERE value = 'Language';
+----------+------+
| value    | data |
+----------+------+
| Language | FR   |
+----------+------+
1 row in set (0.000 sec)

MariaDB [mythconverg]> SELECT value, data FROM settings WHERE value = 'Country';
+---------+------+
| value   | data |
+---------+------+
| Country | US   |
+---------+------+
1 row in set (0.000 sec)

which explains the nonsensical FR_US locale.

I fixed the country setting like this:

MariaDB [mythconverg]> UPDATE settings SET data = 'CA' WHERE value = 'Country';
Query OK, 1 row affected (0.093 sec)
Rows matched: 1  Changed: 1  Warnings: 0

After logging out and logging back in, the user interface of the frontend is now using the fr_CA locale again and the database setting looks good:

MariaDB [mythconverg]> SELECT value, data FROM settings WHERE value = 'Country';
+---------+------+
| value   | data |
+---------+------+
| Country | CA   |
+---------+------+
1 row in set (0.000 sec)

Introducing Shaken Fist

The first public commit to what would become OpenStack Nova was made ten years ago today — at Thu May 27 23:05:26 2010 PDT to be exact. So first off, happy tenth birthday to Nova!

A lot has happened in that time — OpenStack has gone from being two separate Open Source projects to a whole ecosystem, developers have come and gone (and passed away), and OpenStack has weathered the cloud wars of the last decade. OpenStack survived its early growth phase by deliberately offering a “big tent” to the community and associated vendors, with an expansive definition of what should be included. This has resulted in most developers being associated with a corporate sponsor, and hence the decrease in the number of developers today as corporate interest wanes — OpenStack has never been great at attracting or retaining hobbyist contributors.

My personal involvement with OpenStack started in November 2011, so while I missed the very early days I was around for a lot and made many of the mistakes that I now see in OpenStack.

What do I see as mistakes in OpenStack in hindsight? Well, embracing vendors who later lose interest has been painful, and has increased the complexity of the code base significantly. Nova itself is now nearly 400,000 lines of code, and that’s after splitting off many of the original features of Nova such as block storage and networking. Additionally, a lot of our initial assumptions are no longer true — for example in many cases we had to write code to implement things, where there are now good libraries available from third parties.

That’s not to say that OpenStack is without value — I am a daily user of OpenStack to this day, and use at least three OpenStack public clouds at the moment. That said, OpenStack is a complicated beast with a lot of legacy that makes it hard to maintain and slow to change.

For at least six months I’ve felt the desire for a simpler cloud orchestration layer — both for my own personal uses, and also as a test bed for ideas for what a smaller, simpler cloud might look like. My personal use case involves a relatively small environment which echoes what we now think of as edge compute — less than 10 RU of machines with a minimum of orchestration and management overhead.

At the time that I was thinking about these things, the Australian bushfires and COVID-19 came along, and presented me with a lot more spare time than I had expected to have. While I’m still blessed to be employed, all of my social activities have been cancelled, so I find myself at home at a loose end on weekends and evenings a lot more than before.

Thus Shaken Fist was born — named for a Simpsons meme, Shaken Fist is a deliberately small and highly opinionated cloud implementation aimed at working well in small deployments such as homes, labs, edge compute locations, deployed systems, and so forth.

I’ve taken a bit of trouble with each feature in Shaken Fist to think through what the simplest and highest value way of doing something is. For example, instances always get a config drive and there is no metadata server. There is also only one supported type of virtual networking, and one supported hypervisor. As a result, Shaken Fist is less than 5,000 lines of code, and small enough that new things can be implemented very quickly by a single middle aged developer.

Shaken Fist definitely has feature gaps — API authentication and scheduling are the most obvious at the moment — but I have plans to fill those when the time comes.

I’m not sure if Shaken Fist is useful to others, but you never know. It’s Apache2 licensed, and available on GitHub if you’re interested.

May 27, 2020

57 Varieties of Pyrite: Exchanges Are Now The Enemy of Bitcoin

TL;DR: exchanges are casinos and don’t want to onboard anyone into bitcoin. Avoid.

There’s a classic scam in the “crypto” space: advertise Bitcoin to get people in, then sell suckers something else entirely. Over the last few years, this bait-and-switch has become the core competency of “bitcoin” exchanges.

I recently visited the homepage of Australian exchange btcmarkets.net: what a mess. There was a list of dozens of identical-looking “cryptos”, with bitcoin second after something called “XRP”; seems like it was sorted by volume?

Incentives have driven exchanges to become casinos, and they’re doing exactly what you’d expect unregulated casinos to do. This is no place you ever want to send anyone.

Incentives For Exchanges

Exchanges make money on trading, not on buying and holding. Despite the fact that bitcoin is the only real attempt to create an open source money, scams with no future are given false equivalence, because more assets means more trading. Worse than that, they are paid directly to list new scams (the crappier, the more money they can charge!) and have recently taken the logical step of introducing and promoting their own crapcoins directly.

It’s like a gold dealer who also sells 57 varieties of pyrite, which give more margin than selling actual gold.

For a long time, I thought exchanges were merely incompetent. Most can’t even give out fresh addresses for deposits, batch their outgoing transactions, pay competent fee rates, perform RBF or use segwit.

But I misunderstood: they don’t want to sell bitcoin. They use bitcoin to get you in the door, but they want you to gamble. This matters: you’ll find subtle and not-so-subtle blockers to simply buying bitcoin on an exchange. If you send a friend off to buy their first bitcoin, they’re likely to come back with something else. That’s no accident.

Looking Deeper, It Gets Worse.

Regrettably, looking harder at specific exchanges makes the picture even bleaker.

Consider Binance: this mainland China backed exchange pretending to be a Hong Kong exchange appeared out of nowhere with fake volume and demonstrated the gullibility of the entire industry by being treated as if it were a respected member. They lost at least 40,000 bitcoin in a known hack, and they also lost all the personal information people sent them to KYC. They aggressively market their own coin. But basically, they’re just MtGox without Mark Karpelès’ PHP skills or moral scruples, and with much better marketing.

Coinbase is more interesting: an MBA-run “bitcoin” company which really dislikes bitcoin. They got where they are by spending big on regulatory compliance in the US so they could operate in (almost?) every US state. (They don’t do much to dispel the wide belief that this regulation protects their users, when in practice it seems only USD deposits have any guarantee). Their natural interest is in increasing regulation to maintain that moat, and their biggest problem is Bitcoin.

They have much more affinity for the centralized coins (Ethereum) where they can have influence and control. The anarchic nature of a genuine open source community (not to mention the developers’ oft-stated aim to improve privacy over time) is not culturally compatible with a top-down company run by the Big Dog. It’s a running joke that their CEO can’t say the word “Bitcoin”, but their recent “what will happen to cryptocurrencies in the 2020s” article is breathtaking in its boldness: innovation is mainly happening on altcoins, and they’re going to overtake bitcoin any day now. Those scaling problems which the Bitcoin developers say they don’t know how to solve? This non-technical CEO knows better.

So, don’t send anyone to an exchange, especially not a “market leading” one. Find some service that actually wants to sell them bitcoin, like CashApp or Swan Bitcoin.

May 26, 2020

Cruises and Covid19

Problems With Cruises

GQ has an insightful and detailed article about Covid19 and the Diamond Princess [1] that I recommend reading.

FastCompany has a brief article about bookings for cruises in August [2]. There have been many negative comments about this online.

The first thing to note is that the cancellation policies on those cruises are more lenient than usual and the prices are lower. So it’s not unreasonable for someone to put down a deposit on a half price holiday in the hope that Covid19 goes away (as so many prominent people have been saying it will) in the knowledge that they will get it refunded if things don’t work out. Of course if the cruise line goes bankrupt then no-one will get a refund, but I think people are expecting that won’t happen.

The GQ article highlights some serious problems with the way cruise ships operate. They have staff crammed into small cabins and the working areas allow transmission of disease. These problems can be alleviated: they could allocate more space to staff quarters and have more capable air conditioning systems to put in more fresh air. During the life of a cruise ship significant changes are often made: replacing engines with newer more efficient models, changing the size of various rooms for entertainment, installing new waterslides, and many other changes are routine. Changing the staff-only areas to have better ventilation and more separate space (maybe capsule-hotel style cabins with fresh air piped in) would not be a difficult change. It would take some money and some dry-dock time which would be a significant expense for cruise companies.

Cruises Are Great

People like social environments, they want to have situations where there are as many people as possible without it becoming impossible to move. Cruise ships are carefully designed for the flow of passengers. Both the layout of the ship and the schedule of events are carefully planned to avoid excessive crowds. In terms of meeting the requirement of having as many people as possible in a small area without being unable to move cruise ships are probably ideal.

Because there is a large number of people in a restricted space there are economies of scale on a cruise ship that aren’t available anywhere else. For example the main items on the menu are made in a production line process, this can only be done when you have hundreds of people sitting down to order at the same time.

The same applies to all forms of entertainment on board, they plan the events based on statistical knowledge of what people want to attend. This makes it more economical to run than land based entertainment where people can decide to go elsewhere. On a ship a certain portion of the passengers will see whatever show is presented each night, regardless of whether it’s singing, dancing, or magic.

One major advantage of cruises is that they are all inclusive. If you are on a regular holiday would you pay to see a singing or dancing show? Probably not, but if it’s included then you might as well do it – and it will be pretty good. This benefit is really appreciated by people taking kids on holidays, if kids do things like refuse to attend a performance that you were going to see or reject food once it’s served then it won’t cost any extra.

People Who Criticise Cruises

For the people who sneer at cruises, do you like going to bars? Do you like going to restaurants? Live music shows? Visiting foreign beaches? A cruise gets you all that and more for a discount price.

If Groupon had a deal that gave you a cheap hotel stay with all meals included, free non-alcoholic drinks at bars, day long entertainment for kids at the kids clubs, and two live performances every evening how many of the people who reject cruises would buy it? A typical cruise is just like a Groupon deal for non-stop entertainment from 8AM to 11PM.

Will Cruises Restart?

The entertainment options that cruises offer are greatly desired by many people. Most cruises are aimed at budget travellers, the price is cheaper than a hotel in a major city. Such cruises greatly depend on economies of scale, if they can’t get the ships filled then they would need to raise prices (thus decreasing demand) to try to make a profit. I think that some older cruise ships will be scrapped in the near future and some of the newer ships will be sold to cruise lines that cater to cheap travel (IE P&O may scrap some ships and some of the older Princess ships may be transferred to them). Overall I predict a decrease in the number of middle-class cruise ships.

For the expensive cruises (where the cheapest cabins cost over $1000US per person per night) I don’t expect any real changes, maybe they will have fewer passengers and higher prices to allow more social distancing or something.

I am certain that cruises will start again, but it’s too early to predict when. Going on a cruise is about as safe as going to a concert or a major sporting event. No-one is predicting that sporting stadiums will be closed forever or live concerts will be cancelled forever, so really no-one should expect that cruises will be cancelled forever. Whether companies that own ships or stadiums go bankrupt in the meantime is yet to be determined.

One thing that’s been happening for years is themed cruises. A group can book out an entire ship or part of a ship for a themed cruise. I expect this to become much more popular when cruises start again as it will make it easier to fill ships. In the past it seems that cruise lines let companies book their ships for events but didn’t take much of an active role in the process. I think that the management of cruise lines will look to aggressively market themed cruises to anyone who might help, for starters they could reach out to every 80s and 90s pop group – those fans are all old enough to be interested in themed cruises and the musicians won’t be asking for too much money.

Conclusion

Humans are social creatures. People want to attend events with many other people. Covid 19 won’t be the last pandemic, and it may not even be eradicated in the near future. The possibility of having a society where no-one leaves home unless they are in a hazmat suit has been explored in science fiction, but I don’t think that’s a plausible scenario for the near future and I don’t think that it’s something that will be caused by Covid 19.

May 25, 2020

op-build v2.5 firmware for the Raptor Blackbird

Well, following on from my post where I excitedly pointed out that Raptor Blackbird support: all upstream in op-build v2.5, that means I can do another in my series of (close to) upstream Blackbird firmware builds.

This time, the only difference from straight upstream op-build v2.5 is my fixes for buildroot so that I can actually build it on Fedora 32.

So, head over to https://www.flamingspork.com/blackbird/op-build-v2.5-blackbird-images/ and grab blackbird.pnor to flash it on your blackbird, let me know how it goes!

GNS3 FRR Appliance

In my spare time, what little I have, I’ve been wanting to play with some OSS networking projects. For those playing along at home, during the last Suse hackweek I played with wireguard, and to test the environment I wanted to set up some routing, for which I used FRR.

FRR is a pretty cool project: it brings the network routing stack to Linux, or rather gives us a full opensource routing stack, as most routers are actually Linux anyway.

Many years ago I happened to work at Fujitsu in a gateway environment, and started playing around with networking. That was my first experience with GNS3, an opensource network simulator. Back then I needed to have a copy of Cisco IOS images to really play with routing protocols, which made things harder: a great open source product, but it needed access to proprietary router OSes.

FRR provides a CLI _very_ similar to Cisco’s, which made me think: hey, I wonder if there is an FRR appliance we can use in GNS3?
And there was!!!

When I downloaded it and decompressed the qcow2 image it was 1.5GB!!! For a single router image. It works great, but what if I wanted a bunch of routers to play with things like OSPF or BGP etc. Surely we can make a smaller one.

Kiwi

At Suse we use kiwi-ng to build machine images and release media. And to make things even easier for me we already have a kiwi config for small OpenSuse Leap JEOS images, jeos is “just enough OS”. So I hacked one to include FRR. All extra tweaks needed to the image are also easily done by bash hook scripts.

I won’t go into too much detail about how, because I created a git repo where I have it all, including a detailed README: https://github.com/matthewoliver/frr_gns3

So feel free to check that out and build and use the image.

But today, I went one step further. OpenSuse’s Open Build Service (OBS), which is used to build all RPMs for OpenSuse but can also build debs and whatever else you need, also supports building docker containers and system images using kiwi!

So I have now got the OBS to build the image for me. The image can be downloaded from: https://download.opensuse.org/repositories/home:/mattoliverau/images/

And if you want to send any OBS requests to change it the project/package is: https://build.opensuse.org/package/show/home:mattoliverau/FRR-OpenSuse-Appliance

To import it into GNS3 you need the gns3a file, which you can find in my git repo or in the OBS project page.

The best part is this image is only 300MB, which is much better than 1.5GB!
I did have it a little smaller, 200-250MB, but unfortunately the JEOS cut down kernel doesn’t contain the MPLS modules, so I had to pull in the full default SUSE kernel. If this became a real thing and not a pet project, I could go and build a cut-down kernel for FRR to get the size down, but 300MB is already a lot better than where it was at.

Hostname Hack

When you place a router in GNS3, you want to be able to name the router, and when you access the console it’s _really_ nice to see the router name you specified in GNS3 as the hostname. Why? Because if you have a bunch, you don’t want a bunch of tabs all with the localhost hostname on the commandline… this doesn’t really help.

The FRR image is using qemu, and there wasn’t a nice way to access the name of the VM from inside the container, nor an easy way to insert the name from outside. But I found one approach that seems to be working: enter my dodgy hostname hack!

I also wanted to do it without hacking the gns3server code. I couldn’t easily pass the hostname in directly, but I could pass it in via a null device with the router name as its id:

/dev/virtio-ports/frr.router.hostname.%vm-name%

So I simply wrote a script that sets the hostname based on the existence of this device. Made the script a systemd oneshot service to start at boot and it worked!
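A minimal sketch of what such a script could look like (this is not the script from the git repo). It assumes a POSIX shell, that GNS3 has already substituted %vm-name% in the device name, and that hostnamectl is available in the image; the function name and its optional directory argument are made up for illustration.

```shell
#!/bin/sh
# Sketch only: derive a hostname from the virtio port name described above.
# PORT_PREFIX matches the device name given in the post; the function name
# and the optional directory argument are illustrative assumptions.
PORT_PREFIX="frr.router.hostname."

# Print the router name encoded in the first matching port, or fail if
# no such port exists in the given directory.
router_name_from_ports() {
    dir="${1:-/dev/virtio-ports}"
    for port in "$dir"/"$PORT_PREFIX"*; do
        # If the glob did not match, the pattern is left as-is; skip it.
        [ -e "$port" ] || continue
        name=${port##*/}              # strip the directory part
        name=${name#"$PORT_PREFIX"}   # strip the fixed prefix
        printf '%s\n' "$name"
        return 0
    done
    return 1
}

# Intended use from a systemd oneshot unit (needs root):
#   name="$(router_name_from_ports)" && hostnamectl set-hostname "$name"
```

Keeping the lookup in a function that only prints the name means it can be tried against a scratch directory before it is allowed to change the real hostname.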

This means that after changing the name of the FRR router in the GNS3 interface, all you need to do is restart the router (stop and start the device) and it’ll apply the name to the router. This saves you having to log in as root and run hostname yourself.

Or better, if you name all your FRR routers before turning them on, then it’ll just work.

In conclusion…

Hopefully now we can have a fully opensource, GNS3 + FRR appliance solution for network training, testing, and inspiring network engineers.

May 24, 2020

Printing hard-to-print PDFs on Linux

I recently found a few PDFs which I was unable to print because they caused insufficient printer memory errors.

I found a detailed explanation of what might be causing this which pointed the finger at transparent images, a PDF 1.4 feature which apparently requires a more recent version of PostScript than what my printer supports.

Using Okular's Force rasterization option (accessible via the print dialog) does work by essentially rendering everything ahead of time and outputting a big image to be sent to the printer. The quality is not very good however.

Converting a PDF to DjVu

The best solution I found makes use of a different file format: .djvu

Such files are not PDFs, but can still be opened in Evince and Okular, as well as in the dedicated DjVuLibre application.

As an example, I was unable to print page 11 of this paper. Using pdfinfo, I found that it is in PDF 1.5 format and so the transparency effects could be the cause of the out-of-memory printer error.

Here's how I converted it to a high-quality DjVu file I could print without problems using Evince:

pdf2djvu -d 1200 2002.04049.pdf > 2002.04049-1200dpi.djvu

Converting a PDF to PDF 1.3

I also tried the DjVu trick on a different unprintable PDF, but it failed to print, even after lowering the resolution to 600dpi:

pdf2djvu -d 600 dow-faq_v1.1.pdf > dow-faq_v1.1-600dpi.djvu

In this case, I used a different technique and simply converted the PDF to version 1.3 (from version 1.6 according to pdfinfo):

ps2pdf13 -r1200x1200 dow-faq_v1.1.pdf dow-faq_v1.1-1200dpi.pdf

This eliminates the problematic transparency and rasterizes the elements that version 1.3 doesn't support.
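The version check that drove both conversions can be scripted. The following is only a sketch: the function names are made up, and it assumes pdfinfo prints a "PDF version:" line, as it does in the poppler-utils builds I have seen.

```shell
#!/bin/sh
# Sketch only: report the PDF version of a file and whether it is newer
# than 1.3, the version the post converts to. Function names are illustrative.

# Extract the version from pdfinfo output read on stdin, e.g. "1.5".
pdf_version_from_info() {
    awk -F': *' '/^PDF version/ { print $2; exit }'
}

# Succeed if the given version is newer than 1.3 (transparency is a 1.4 feature).
is_newer_than_pdf13() {
    awk -v v="$1" 'BEGIN {
        split(v, p, "[.]")
        exit !(p[1] > 1 || (p[1] == 1 && p[2] > 3))
    }'
}

# Example use with the file from the post:
#   v="$(pdfinfo dow-faq_v1.1.pdf | pdf_version_from_info)"
#   is_newer_than_pdf13 "$v" && echo "PDF $v: consider pdf2djvu or ps2pdf13"
```

A version newer than 1.3 does not prove that a file uses transparency; it only says the file is allowed to, so this is a hint about which files are worth converting, not a diagnosis.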

May 23, 2020

A totally cheating sour dough starter

This is the third in a series of posts documenting my adventures in making bread during the COVID-19 shutdown. I’d like to imagine I was running science experiments in making bread on my kids, but really all I was trying to do was eat some toast.

I’m not sure what it was like in other parts of the world, but during the COVID-19 pandemic Australia suffered a bunch of shortages: toilet paper, flour, and yeast were among those things stores simply didn’t have any stock of. Luckily we’d only just done a Costco shop so were ok for toilet paper and flour, but we were definitely getting low on yeast. The obvious answer is a sour dough starter, but I’d never done that thing before.

In the end my answer was to cheat and use this recipe. However, I found the instructions unclear, so here’s what I ended up doing:

Starting off

  • 2 cups of warm water
  • 2 teaspoons of dry yeast
  • 2 cups of bakers flour

Mix these three items together in a plastic container with enough space for the mix to double in size. Place in a warm place (on the bench on top of the dish washer was our answer), and cover with cloth secured with a rubber band.

Feeding

Once a day you should feed your starter with 1 cup of flour and 1 cup of warm water. Stir thoroughly.

Reducing size

The recipe online says to feed for five days, but the size of my starter was getting out of hand by a couple of days, so I started baking at that point. I’ll describe the baking process in a later post. The early loaves definitely weren’t as good as the more recent ones, but they were still edible.

Hibernation

Once the starter is going, you feed daily and probably need to bake daily to keep the starter’s size under control. That obviously doesn’t work so great if you can’t eat an entire loaf of bread a day. You can hibernate the starter by putting it in the fridge, which means you only need to feed it once a week.

To wake a hibernated starter up, take it out of the fridge and feed it. I do this at 8am. That means I can then start the loaf for baking at about noon, and the starter can either go back in the fridge until next time or stay on the bench being fed daily.

I have noticed that sometimes the starter comes out of the fridge with a layer of dark water on top. It’s worked out ok for us to just ignore that and stir it into the mix as part of the feeding process. Hopefully we won’t die.

Refurbishing my Macintosh Plus

Somewhere in the mid to late 1990s I picked myself up a Macintosh Plus for the sum of $60AUD. At that time there were still computer Swap Meets where old and interesting equipment was around, so I headed over to one at some point (at the St Kilda Town Hall if memory serves) and picked myself up four 1MB SIMMs to boost the RAM of it from the standard 1MB to the insane amount of 4MB. Why? Umm… because I could? The RAM was pretty cheap, and somewhere in the house to this day, I sometimes stumble over the 256KB SIMMs as I just can’t bring myself to get rid of them.

This upgrade probably would have cost close to $2,000 at the system’s release. If the Macintosh system software were better at disk caching you could have easily held the whole 800k of the floppy disk in memory and still run useful software!

One of the annoying things that started with the Macintosh was odd screws and Apple gear being hard to get into. Compare to say, the Apple ][ which had handy clips to jump inside whenever. In fitting my massive FOUR MEGABYTES of RAM back in the day, I recall using a couple of allen keys sticky-taped together to be able to reach in and get the recessed Torx screws. These days, I can just order a torx bit off Amazon and have it arrive pretty quickly. Well, two torx bits, one of which is just too short for the job.

My (dusty) Macintosh Plus

One thing had always struck me about it, it never really looked like the photos of the Macintosh Plus I saw in books. In what is an embarrassing number of years later, I learned that a lot can be gotten from the serial number printed on the underside of the front of the case.

So heading over to the My Old Mac Serial Number Decoder I can find out:

Manufactured in: F => Fremont, California, USA
Year of production: 1985
Week of production: 14
Production number: 3V3 => 4457
Model ID: M0001WP => Macintosh 512K (European Macintosh ED)

Your Macintosh 512K (European Macintosh ED) was the 4457th Mac manufactured during the 14th week of 1985 in Fremont, California, USA.

Pretty cool! So it is certainly a Plus as the logic board says that, but it’s actually an upgraded 512k! If you think it was madness to have a GUI with only 128k of RAM in the original Macintosh, you’d be right. I do not envy anybody who had one of those.

A decent number of years ago (but not too many, less than 10), I turned on the Mac Plus to see if it still worked. It did! But then… some magic smoke started to come out (which isn’t so good), but the computer kept working! There’s something utterly bizarre about looking at a computer with smoke coming out of it that continues to function perfectly fine.

Anyway, as the smoke was coming out, I decided that it would be an opportune time to turn it off, open doors and windows, and put it away until I was ready to deal with it.

One Global Pandemic Later, and now was the time.

I suspected it was going to be a capacitor somewhere that blew, and figured that I should replace it, and probably preemptively replace all the other electrolytic capacitors that could likely leak and cause problems.

First things first though: dismantle it and clean everything. First, taking the case off. Apple is not new to the game of annoying screws to get into things. I ended up spending $12 on this set on Amazon, as the T10 bit can actually reach the screws holding the case on.

Cathode Ray Tubes are not to be messed with. We’re talking lethal voltages here. It had been many years since electricity went into this thing, so all was good. If this all doesn’t work first time when reassembling it, I’m not exactly looking forward to discharging a CRT and working on it.

The inside of my Macintosh Plus, with lots of grime.

You can see there’s grime everywhere. It’s not the worst in the world, but it’s not great (and kinda sticky). Obviously, this needs to be cleaned! The best way to do that is take a lot of photos, dismantle everything, and clean it a bit at a time.

There’s four main electronic components inside a Macintosh Plus:

  1. The CRT itself
  2. The floppy disk drive
  3. The Logic Board (what Mac people call what PC people call the motherboard)
  4. The Analog Board

There’s also some metal structure that keeps some things in place. There’s only a few connectors between things, which are pretty easy to remove. If you don’t know how to discharge a CRT and what the dangers of them are you should immediately go and find out through reading rather than finding out by dying. I would much prefer it if you dyed (because creative fun) rather than died.

Once the floppy connector and the power connector are unplugged, the logic board slides out pretty easily. You can see from the photo below that I have the 4MB of RAM installed and the resistor you need to snip is, well, snipped (but look really closely for that). Also, grime.

Macintosh Plus Logic Board

Cleaning things? Well, there’s two ways that I have used (and considering I haven’t yet written the post with “hurray, it all works”, currently take it with a grain of salt until I write that post). One: contact cleaner. Two: detergent.

Macintosh Plus Logic Board (being washed in my sink)

I took the route of cleaning things first, and then doing recapping adventures. So it was some contact cleaner on the boards, and then some soaking with detergent. This actually all worked pretty well.

Logic Board Capacitors:

  • C5, C6, C7, C12, C13 = 33uF 16V 85C (measured at 39uF, 38uF, 38uF, 39uF)
  • C14 = 1uF 50V (measured at 1.2uF and then it fluctuated down to around 1.15uF)

Analog Board Capacitors

  • C1 = 35V 3.9uF (M) measured at 4.37uF
  • C2 = 16V 4700uF SM measured at 4446uF
  • C3 = 16V 220uF +105C measured at 234uF
  • C5 = 10V 47uF 85C measured at 45.6uF
  • C6 = 50V 22uF 85C measured at 23.3uF
  • C10 = 16V 33uF 85C measured at 37uF
  • C11 = 160V 10uF 85C measured at 11.4uF
  • C12 = 50V 22uF 85C measured at 23.2uF
  • C18 = 16V 33uF 85C measured at 36.7uF
  • C24 = 16V 2200uF 105C measured at 2469uF
  • C27 = 16V 2200uF 105C measured at 2171uF (although started at 2190 and then went down slowly)
  • C28 = 16V 1000uF 105C measured at 638uF, then 1037uF, then 1000uF, then 987uF
  • C30 = 16V 2200uF 105C measured at 2203uF
  • C31 = 16V 220uF 105C measured at 236uF
  • C32 = 16V 2200uF 105C measured at 2227uF
  • C34 = 200V 100uF 85C measured at 101.8uF
  • C35 = 200V 100uF 85C measured at 103.3uF
  • C37 = 250V 0.47uF measured at <exploded>. wheee!
  • C38 = 200V 100uF 85C measured at 103.3uF
  • C39 = 200V 100uF 85C measured at 99.6uF (with scorch marks from next door)
  • C42 = 10V 470uF 85C measured at 556uF
  • C45 = 10V 470uF 85C measured at 227uF, then 637uF then 600uF
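To put a few of those readings in perspective, here is a small sketch that works out how far a measurement sits from the rated value. The function name is made up; the numbers are taken from the lists above. Whether a given deviation matters depends on the part's tolerance, which is not recorded here (±20% is a common figure for electrolytics, but treat that as an assumption about these particular parts).

```shell
#!/bin/sh
# Sketch only: percentage deviation of a measured capacitance from its
# rated value, using readings from the lists above. The function name is
# illustrative.
cap_deviation() {
    # $1 = rated uF, $2 = measured uF
    awk -v rated="$1" -v measured="$2" \
        'BEGIN { printf "%+.1f%%\n", (measured - rated) / rated * 100 }'
}

cap_deviation 470 227    # C45, first reading: prints -51.7%
cap_deviation 470 556    # C42:                prints +18.3%
cap_deviation 1000 638   # C28, first reading: prints -36.2%
```

Readings that drift while measuring, as several above did, make any single number approximate.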

I’ve ordered an analog board kit from https://console5.com/store/macintosh-128k-512k-plus-analog-pcb-cap-kit-630-0102-661-0462.html and when trying to put them in, I learned that the US Analog board is different to the International Analog board!!! Gah. Dammit.

Note that C30, C32, C38, C39, and C37 were missing from the kit I received (probably due to differences in the US and International boards). I did have an X2 cap (for C37) but it was 0.1uF not 0.47uF. I also had two extra 1000uF 16V caps.

Macintosh Repair and Upgrade Secrets (up to the Mac SE no less!) holds an Appendix with the parts listing for both the US and International Analog boards, and this led me to conclude that they are in fact different boards rather than just a few wires that are different. I am not sure what the “For 120V operation, W12 must be in place” and “for 240V operation, W12 must be removed” writing is about on the International Analog board, but I’m not quite up to messing with that at the moment.

So, I ordered the parts (linked above) and waited (again) to be able to finish re-capping the board.

I found this video (https://youtu.be/H9dxJ7uNXOA) to be a good one for learning a bunch about the insides of compact Macs; I recommend it and several others on his YouTube channel. One interesting thing I learned is that the X2 cap (C37 on the International one) is before the power switch, so could blow just by having the system plugged in and not turned on! Okay, so I’m kind of assuming that it also applies to the International board, and mine exploded while it was plugged in and switched on, so YMMV.

Additionally, there’s an interesting list of commonly failing parts. Unfortunately, this is also for the US logic board, so the tables in Macintosh Repair and Upgrade Secrets are useful. I’m hoping that I don’t have to replace anything more there, but we’ll see.

But, after the Nth round of parts being delivered….

Note the lack of an exploded capacitor

Yep, that’s where the exploded cap was before. Cleaned up all pretty nicely actually. Annoyingly, I had to run it all through a step-up transformer as the board is all set for Australian 240V rather than US 120V. This isn’t going to be an everyday computer though, so it’s fine.

Macintosh Plus booting up (note how long the memory check of 4MB of RAM takes). I’m being very careful as the cover is off. High, and possibly lethal, voltages exposed.

Woohoo! It works. While I haven’t found my supply of floppy disks that (at least used to) work, the floppy mechanism also seems to work okay.

Macintosh Plus with a seemingly working floppy drive mechanism. I haven’t found a boot floppy yet though.

Next up: waiting for my Floppy Emu to arrive as it’ll certainly let it boot. Also, it’s now time to rip the house apart to find a floppy disk that certainly should have made its way across the ocean with the move…. Oh, and also to clean up the mouse and keyboard.

May 18, 2020

Displaying client IP address using Apache Server-Side Includes

If you use a Dynamic DNS setup to reach machines which are not behind a stable IP address, you will likely have a need to probe these machines' public IP addresses. One option is to use an insecure service like Oracle's http://checkip.dyndns.com/ which echoes back your client IP, but you can also do this on your own server if you have one.

There are multiple options to do this, like writing a CGI or PHP script, but those are fairly heavyweight if that's all you need mod_cgi or PHP for. Instead, I decided to use Apache's built-in Server-Side Includes.

Apache configuration

Start by turning on the include filter by adding the following in /etc/apache2/conf-available/ssi.conf:

AddType text/html .shtml
AddOutputFilter INCLUDES .shtml

and making that configuration file active:

a2enconf ssi

Then, find the vhost file where you want to enable SSI and add the following options to a Location or Directory section:

<Location /ssi_files>
    Options +IncludesNOEXEC
    SSLRequireSSL
    Header set Content-Security-Policy: "default-src 'none'"
    Header set X-Content-Type-Options: "nosniff"
    Header set Cache-Control "max-age=0, no-cache, no-store, must-revalidate"
</Location>

before adding the necessary modules:

a2enmod headers
a2enmod include

and restarting Apache:

apache2ctl configtest && systemctl restart apache2.service

Create an shtml page

With the web server ready to process SSI instructions, the following HTML blurb can be used to display the client IP address:

<!--#echo var="REMOTE_ADDR" -->

or any other built-in variable.

Note that you don't need to write valid HTML for the variable to be substituted and so the above one-liner is all I use on my server.
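On the client side, the page can then be fetched and sanity-checked before the address is used anywhere. This is only a sketch: the URL is a placeholder, the function name is made up, and the check only covers IPv4 (REMOTE_ADDR will be an IPv6 address if the client connects over IPv6).

```shell
#!/bin/sh
# Sketch only: fetch the address echoed by the .shtml page and sanity-check it
# before using it (e.g. in a Dynamic DNS update). The URL is a placeholder.
WHATSMYIP_URL="https://example.com/ssi_files/whatsmyip.shtml"

# Succeed if $1 looks like a dotted-quad IPv4 address (0-255 per octet).
looks_like_ipv4() {
    printf '%s\n' "$1" | awk -F. '
        NF != 4 { exit 1 }
        { for (i = 1; i <= 4; i++) if ($i !~ /^[0-9]+$/ || $i > 255) exit 1 }
    '
}

# Example use (requires curl and network access):
#   ip="$(curl --silent --fail "$WHATSMYIP_URL" | tr -d '[:space:]')"
#   looks_like_ipv4 "$ip" && echo "public IP: $ip"
```

Validating the response guards against feeding an error page or an empty string into whatever consumes the address.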

Security concerns

The first thing to note is that the configuration section uses the IncludesNOEXEC option in order to disable arbitrary command execution via SSI. In addition, you can also make sure that the cgi module is disabled since that's a dependency of the more dangerous side of SSI:

a2dismod cgi

Of course, if you rely on this IP address to be accurate, for example because you'll be putting it in your DNS, then you should make sure that you only serve this page over HTTPS, which can be enforced via the SSLRequireSSL directive.

I included two other headers in the above vhost config (Content-Security-Policy and X-Content-Type-Options) in order to limit the damage that could be done in case a malicious file was accidentally dropped in that directory.

Finally, I suggest making sure that only the root user has writable access to the directory which has server-side includes enabled:

$ ls -la /var/www/ssi_includes/
total 12
drwxr-xr-x  2 root     root     4096 May 18 15:58 .
drwxr-xr-x 16 root     root     4096 May 18 15:40 ..
-rw-r--r--  1 root     root        0 May 18 15:46 index.html
-rw-r--r--  1 root     root       32 May 18 15:58 whatsmyip.shtml

A Good Time to Upgrade PCs

PC hardware just keeps getting cheaper and faster. Now that so many people have been working from home the deficiencies of home PCs are becoming apparent. I’ll give Australian prices and URLs in this post, but I think that similar prices will be available everywhere that people read my blog.

From MSY (parts list PDF) [1] 120G SATA SSDs are under $50 each. 120G is more than enough for a basic workstation, so you are looking at $42 or so for fast quiet storage or $84 or so for the same with RAID-1. Being quiet is a significant luxury feature and it’s also useful if you are going to be in video conferences.

For more serious storage NVMe starts at around $100 per unit, I think that $124 for a 500G Crucial NVMe is the best low end option (paying $95 for a 250G Kingston device doesn’t seem like enough savings to be worth it). So that’s $248 for 500G of very fast RAID-1 storage. There’s a Samsung 2TB NVMe device for $349 which is good if you need more storage, it’s interesting to note that this is significantly cheaper than the Samsung 2TB SSD which costs $455. I wonder if SATA SSD devices will go away in the future, it might end up being SATA for slow/cheap spinning media and M.2 NVMe for solid state storage. The SATA SSD devices are only good for use in older systems that don’t have M.2 sockets on the motherboard.

It seems that most new motherboards have one M.2 socket on the motherboard with NVMe support, and presumably support for booting from NVMe. But dual M.2 sockets are rare and the price difference is significantly greater than the cost of a PCIe M.2 card to support NVMe, which is $14. So for NVMe RAID-1 it seems that the best option is a motherboard with a single NVMe socket (starting at $89 for an AM4 socket motherboard – the current standard for AMD CPUs) and a PCIe M.2 card.

One thing to note about NVMe is that different drivers are required. On Linux this means building a new initrd before the migration (or afterwards when booted from a recovery image) and on Windows probably means a fresh install from special installation media with NVMe drivers.
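On Debian and Ubuntu systems using initramfs-tools, one way to check for the driver before migrating is to look for the nvme module in the initrd listing. This is a sketch under that assumption; other distributions use different tools (dracut, for example), and the function name is made up.

```shell
#!/bin/sh
# Sketch only, for Debian/Ubuntu systems using initramfs-tools.

# Succeed if an initramfs file listing read on stdin contains the nvme driver
# (also matches compressed modules such as nvme.ko.xz).
listing_has_nvme() {
    grep -q '/nvme\.ko'
}

# Example use before moving the root filesystem to an NVMe device:
#   lsinitramfs "/boot/initrd.img-$(uname -r)" | listing_has_nvme \
#       || { echo nvme | sudo tee -a /etc/initramfs-tools/modules
#            sudo update-initramfs -u; }
```

If the module is already present, which is likely with the default module selection, nothing needs rebuilding.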

All the AM4 motherboards seem to have RADEON Vega graphics built in which is capable of 4K resolution at a stated refresh of around 24Hz. The ones that give detail about the interfaces say that they have HDMI 1.4 which means a maximum of 30Hz at 4K resolution if you have the color encoding that suits text (i.e. for use other than just video). I covered this issue in detail in my blog post about DisplayPort and 4K resolution [2]. So a basic AM4 motherboard won’t give great 4K display support, but it will probably be good for a cheap start.

$89 for motherboard, $124 for 500G NVMe, $344 for a Ryzen 5 3600 CPU (not the cheapest AM4 but in the middle range and good value for money), and $99 for 16G of RAM (DDR4 RAM is cheaper than DDR3 RAM) gives the core of a very decent system for $656 (assuming you have a working system to upgrade and peripherals to go with it).

Currently Kogan has 4K resolution monitors starting at $329 [3]. They probably won’t be the greatest monitors but my experience of a past cheap 4K monitor from Kogan was that it is quite OK. Samsung 4K monitors started at about $400 last time I could check (Kogan currently has no stock of them and doesn’t display the price), I’d pay an extra $70 for Samsung, but the Kogan branded product is probably good enough for most people. So you are looking at under $1000 for a new system with fast CPU, DDR4 RAM, NVMe storage, and a 4K monitor if you already have the case, PSU, keyboard, mouse, etc.
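The totals quoted above can be checked with a few lines of shell arithmetic. The prices are the ones from this post; the variable names are just for illustration.

```shell
#!/bin/sh
# Prices in AUD as quoted in the post.
motherboard=89 nvme=124 cpu=344 ram=99 monitor=329

core=$((motherboard + nvme + cpu + ram))
with_monitor=$((core + monitor))

echo "core system:  \$$core"           # prints: core system:  $656
echo "with monitor: \$$with_monitor"   # prints: with monitor: $985
```

That leaves $15 of headroom under $1000, enough for the $14 PCIe M.2 card but not for a second NVMe device.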

It seems quite likely that the 4K video hardware on a cheap AM4 motherboard won’t be that great for games and it will definitely be lacking for watching TV documentaries. Whether such deficiencies are worth spending money on a PCIe video card (starting at $50 for a low end card but costing significantly more for 3D gaming at 4K resolution) is a matter of opinion. I probably wouldn’t have spent extra for a PCIe video card if I had 4K video on the motherboard. Not only does using built in video save money it means one less fan running (less background noise) and probably less electricity use too.

My Plans

I currently have a workstation with 2*500G SATA SSDs in a RAID-1 array, 16G of RAM, and an i5-2500 CPU (just under 1/4 the speed of the Ryzen 5 3600). If I had hard drives then I would definitely buy a new system right now. But as I have SSDs that work nicely (quiet and fast enough for most things) and almost all machines I personally use have SSDs (so I can’t get a benefit from moving my current SSDs to another system) I would just get CPU, motherboard, and RAM. So the question is whether to spend $532 for more than 4* the CPU performance. At the moment I’ll wait because I’ll probably get a free system with DDR4 RAM in the near future; while it probably won’t be as fast as a Ryzen 5 3600, it should be at least twice as fast as what I currently have.

May 17, 2020

Notes on Installing Ubuntu 20 VM on an MS-Windows 10 Host

Some thirteen years ago I worked with Xen virtual machines as part of my day job, and gave a presentation at Linux Users of Victoria on the subject (with additional lecture notes). A few years after that I gave another presentation on the Unified Extensible Firmware Interface (UEFI), which itself (indirectly) led to a post on Linux and MS-Windows 8 dual-booting. All of this now leads to some notes on using MS-Windows as a host for Ubuntu Linux guest machines.

Why Would You Want to do This?

Most people these days have at least heard of Linux. They might even know that every single supercomputer in the world uses Linux. They may know that the overwhelming majority of embedded devices, such as home routers, use Linux. Or maybe even that the Android mobile 'phone uses a Linux kernel. Or that MacOS is built on the same broad family of UNIX-like operating systems. Whilst they might be familiar with their MS-Windows environment, because that's what they've been brought up on and what their favourite applications are designed for, they might also be "Linux curious", especially if they are hoping to either scale-up the complexity and volume of the datasets they're working with (i.e., towards high performance computing) or scale-down their applications (i.e., towards embedded devices). If this is the case, then introducing Linux via a virtual machine (VM) is a relatively safe and easy path to experiment with.

About VMs

Virtual machines work by emulating a computer system, including hardware, in a software environment, a technology that has been around for a very long time (e.g., CP/CMS, 1967). The VMs in a host system are managed by a hypervisor, or Virtual Machine Monitor (VMM), that manages one or more guest systems. The example that follows uses VirtualBox, a free and open-source hypervisor. Because the guest system relies on the host it cannot have the same performance as a host system, unlike a dual-boot system. It will share memory, it will share processing power, it must take up some disk space, and will also have the overhead of the hypervisor itself (although this has improved a great deal in recent years). In a production environment, VMs are usually used to optimise resource allocation for very powerful systems, such as web-server farms and bodies like the Nectar Research Cloud, or even some partitions on systems like the University of Melbourne's supercomputer, Spartan. In a development environment, VMs are an excellent tool for testing and debugging.

Install VirtualBox and Enable Virtualization

For most environments VirtualBox is an easy path for creating a virtual machine, ARM systems excluded (QEMU suggested for Raspberry Pi or Android, or QEMU's fork, KVM). For the example given here, simply download VirtualBox for MS-Windows and click one's way through the installation process, noting that VirtualBox will make changes to your system and that products from Oracle can be trusted (*blink*). Downloads for other operating environments are worth looking at as well.

It is essential to enable virtualisation on your MS-Windows host through the BIOS/UEFI, which is not as easy as it used to be. A handy page from some smart people in the Czech Republic provides quick instructions for a variety of hardware environments. The good people at laptopmag provide the path from within the MS-Windows environment. In summary: select Settings (gear icon), select Update & Security, select Recovery (this sounds wrong), Advanced Startup, Restart Now (which is also wrong, you don't restart now), Troubleshoot, Advanced Options, UEFI Firmware Settings, then Restart.

Install Linux and Create a Shared Folder

Download a Ubuntu 20.04 LTS (long-term support) ISO and save to the MS-Windows host. There are some clever alternatives, such as the Ubuntu Linux terminal environment for MS-Windows (which is possibly even a better choice these days, but that will be for another post), or Multipass which allows one to create their own mini-cloud environment. But this is a discussion for a VM, so I'll resist the temptation to go off on a tangent.

Creating a VM in VirtualBox is pretty straightforward: open the application, select "New", give the VM a name, and allocate resources (virtual hard disk, virtual memory). It's worthwhile tending towards the generous in resource allocation. After that it is a case of selecting the ISO in settings and storage; remember a VM does not have a real disk drive, so it has a virtual (software) one. After this one can start the VM, and it will boot from the ISO and begin the installation process for Ubuntu Linux desktop edition, which is also pretty straightforward. One amusing caveat: when the installation says it's going to wipe the disk it doesn't mean the host machine's disk, just the virtual disk that has been built for it. When the installation is complete go to "Devices" on the VM menu, remove the boot disk, and restart the guest system; you now have a Ubuntu VM installed on your MS-Windows system.

By default, VMs do not have access to the host computer. To provide that access one will want to set up a shared folder in the VM and on the host. The first step in this environment would be to give the Linux user (created during installation) membership of the vboxsf group, e.g., on the terminal: sudo usermod -a -G vboxsf username. In VirtualBox, select Settings, and add a share under Machine Folders, which makes it a permanent folder. Under Folder Path set the name and location on the host operating system (e.g., UbuntuShared on the Desktop); leave automount blank (we can fix that soon enough). Put a test file in the shared folder.

Ubuntu now needs additional software installed to work with VirtualBox's Guest Additions, including kernel modules. Also, mount VirtualBox's Guest Additions to the guest VM, under Devices as a virtual CD; you can download this from the VirtualBox website.

Run the following commands, entering the default user's password as needed:


sudo apt-get install -y build-essential linux-headers-`uname -r`
sudo /media/cdrom/./VBoxLinuxAdditions.run
sudo shutdown -r now # Reboot the system
mkdir ~/UbuntuShared
sudo mount -t vboxsf shared ~/UbuntuShared
cd ~/UbuntuShared

The file that was put in the UbuntuShared folder in MS-Windows should now be visible in ~/UbuntuShared. Add a file (e.g., touch testfile.txt) from Linux and check that it can be seen in MS-Windows. If this all succeeds, make the folder persistent.


sudo nano /etc/fstab # nano is just fine for short configuration files
# Add the following, separated by tabs, and save (replace username with your Linux user)
shared /home/username/UbuntuShared vboxsf defaults 0 0
# Edit modules
sudo nano /etc/modules
# Add the following
vboxsf
# Exit and reboot
sudo shutdown -r now

You're done! You now have a Ubuntu desktop system running as a VM guest using VirtualBox on an MS-Windows 10 host system. Ideal for learning, testing, and debugging.

A super simple non-breadmaker loaf

This is the second in a series of posts documenting my adventures in making bread during the COVID-19 shutdown. Yes I know all the cool kids made bread for themselves during the shutdown, but I did it too!

A loaf of bread

So here we were, in the middle of a pandemic which closed bakeries and cancelled almost all of my non-work activities. I found this animated GIF on Reddit for a super simple no-knead bread and decided to give it a go. It turns out that a few things are true:

  • animated GIFs are a super terrible way to store recipes
  • that animated GIF was an export of this YouTube video which originally accompanied this blog post
  • and that I only learned these things while trying to work out who to credit for this recipe

The basic recipe is really easy: chuck the following into a big bowl, stir, and then cover with a plate. Leave resting in a warm place for a long time (three or four hours), then turn out onto a floured bench. Fold into a ball with flour, and then bake. You can see a more detailed version in the YouTube video above.

  • 3 cups of bakers flour (not plain white flour)
  • 2 teaspoons of yeast
  • 2 teaspoons of salt
  • 1.5 cups of warm water (again, I use 42 degrees from my gas hot water system)

The dough will seem really dry when you first mix it, but gets wetter as it rises. Don’t panic if it seems tacky and dry.

I think the key here is the baking process, which is how the oven loaf in my previous post about bread maker white loaves was baked. I use a cast iron camp oven (sometimes called a Dutch oven), because thermal mass is key. If I had a fancy enamelled cast iron camp oven I’d use that, but I don’t and I wasn’t going shopping during the shutdown to get one. Oh, and they can be crazy expensive at up to $500 AUD.

Another loaf of bread

Warm the oven with the camp oven inside for at least 30 minutes at 230 degrees Celsius. Then place the dough inside the camp oven on some baking paper. I tend to use a trivet as well, but I think you could skip that if you didn’t have one. Bake for 30 minutes with the lid on, which helps steam the bread a little and forms a nice crust. Then bake for another 12 minutes with the camp oven lid off, which darkens the crust up nicely.

A final loaf of bread

Oh, and I’ve noticed a bit of variation in how wet the dough seems to be when I turn it out and form it in flour, but it doesn’t really seem to change the outcome once baked, so that’s nice.

The original blogger for this recipe also recommends chilling the dough overnight in the fridge before baking, but I haven’t tried that yet.

Private Key Redaction: UR DOIN IT RONG

Because posting private keys on the Internet is a bad idea, some people like to “redact” their private keys, so that it looks kinda-sorta like a private key, but it isn’t actually giving away anything secret. Unfortunately, due to the way that private keys are represented, it is easy to “redact” a key in such a way that it doesn’t actually redact anything at all. RSA private keys are particularly bad at this, but the problem can (potentially) apply to other keys as well.

I’ll show you a bit of “Inside Baseball” with key formats, and then demonstrate the practical implications. Finally, we’ll go through a practical worked example from an actual not-really-redacted key I recently stumbled across in my travels.

The Private Lives of Private Keys

Here is what a typical private key looks like, when you come across it:

-----BEGIN RSA PRIVATE KEY-----
MGICAQACEQCxjdTmecltJEz2PLMpS4BXAgMBAAECEDKtuwD17gpagnASq1zQTYEC
CQDVTYVsjjF7IQIJANUYZsIjRsR3AgkAkahDUXL0RSECCB78r2SnsJC9AghaOK3F
sKoELg==
-----END RSA PRIVATE KEY-----

Obviously, there’s some hidden meaning in there – computers don’t encrypt things by shouting “BEGIN RSA PRIVATE KEY!”, after all. What is between the BEGIN/END lines above is, in fact, a base64-encoded DER format ASN.1 structure representing a PKCS#1 private key.

In simple terms, it’s a list of numbers – very important numbers. The list of numbers is, in order:

  • A version number (0);
  • The “public modulus”, commonly referred to as “n”;
  • The “public exponent”, or “e” (which is almost always 65,537, for various unimportant reasons);
  • The “private exponent”, or “d”;
  • The two “private primes”, or “p” and “q”;
  • Two exponents, which are known as “dmp1” and “dmq1”; and
  • A coefficient, known as “iqmp”.
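To make the "list of numbers" idea concrete, here is a minimal sketch that walks the toy key above by hand, without any gem. The helper names read_header and der_integers are my own, and the parser only handles what a PKCS#1 key needs (one SEQUENCE of non-negative INTEGERs), so treat it as an illustration rather than a general DER parser.

```ruby
# Base64 body of the toy key shown above (the text between BEGIN/END).
toy_key_b64 = <<~B64
  MGICAQACEQCxjdTmecltJEz2PLMpS4BXAgMBAAECEDKtuwD17gpagnASq1zQTYEC
  CQDVTYVsjjF7IQIJANUYZsIjRsR3AgkAkahDUXL0RSECCB78r2SnsJC9AghaOK3F
  sKoELg==
B64

# Read one DER header at pos; returns [tag, content_length, content_start].
def read_header(bytes, pos)
  tag = bytes[pos]
  len = bytes[pos + 1]
  pos += 2
  if len >= 0x80 # long form: low 7 bits say how many length bytes follow
    count = len & 0x7f
    len = bytes[pos, count].inject(0) { |acc, b| (acc << 8) | b }
    pos += count
  end
  [tag, len, pos]
end

# Return every INTEGER inside the outer SEQUENCE, in order.
# Values are read as unsigned, which is fine for RSA key material.
def der_integers(der)
  bytes = der.bytes
  tag, len, pos = read_header(bytes, 0)
  raise "expected a SEQUENCE" unless tag == 0x30
  stop = pos + len
  ints = []
  while pos < stop
    tag, len, pos = read_header(bytes, pos)
    raise "expected an INTEGER" unless tag == 0x02
    ints << bytes[pos, len].inject(0) { |acc, b| (acc << 8) | b }
    pos += len
  end
  ints
end

names = %w[version n e d p q dmp1 dmq1 iqmp]
values = der_integers(toy_key_b64.unpack1("m"))
names.zip(values).each { |name, value| puts "#{name} = #{value}" }
```

The nine values come out in exactly the order of the list above, which is why position in the blob matters so much when someone starts deleting lines.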

Why Is This a Problem?

The thing is, only three of those numbers are actually required in a private key. The rest, whilst useful to allow the RSA encryption and decryption to be more efficient, aren’t necessary. The three absolutely required values are e, p, and q.
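As a rough sketch of why e, p, and q are enough, the following rebuilds every other field from them. The names mod_inverse and rsa_from_factors are hypothetical, the primes 61 and 53 are textbook stand-ins rather than anything from a real key, and I am assuming d is computed modulo (p-1)(q-1); some implementations use lcm(p-1, q-1) instead, which gives a different but equally valid d.

```ruby
# Extended Euclid: returns x such that (a * x) % m == 1.
def mod_inverse(a, m)
  old_r, r = a % m, m
  old_s, s = 1, 0
  until r.zero?
    quotient = old_r / r
    old_r, r = r, old_r - quotient * r
    old_s, s = s, old_s - quotient * s
  end
  raise ArgumentError, "#{a} is not invertible modulo #{m}" unless old_r == 1
  old_s % m
end

# Rebuild all the PKCS#1 fields from the two primes and the public exponent.
def rsa_from_factors(p, q, e)
  d = mod_inverse(e, (p - 1) * (q - 1))
  {
    n: p * q, e: e, d: d, p: p, q: q,
    dmp1: d % (p - 1),        # d reduced modulo p - 1
    dmq1: d % (q - 1),        # d reduced modulo q - 1
    iqmp: mod_inverse(q, p)   # inverse of q modulo p
  }
end

key = rsa_from_factors(61, 53, 65_537)
puts key.inspect

# Round trip: "encrypt" with the public half, "decrypt" with the derived d.
message = 42
cipher = message.pow(key[:e], key[:n])
plain = cipher.pow(key[:d], key[:n])
puts plain
```

Nothing here needs the original d, dmp1, dmq1, or iqmp, which is the whole problem: hiding those fields hides nothing.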

Of the other numbers, most of them are at least about the same size as each of p and q. So of the total data in an RSA key, less than a quarter of the data is required. Let me show you with the above “toy” key, by breaking it down piece by piece [1]:

  • MGI – DER for “this is a sequence”
  • CAQ – version (0)
  • CxjdTmecltJEz2PLMpS4BX – n
  • AgMBAA – e
  • ECEDKtuwD17gpagnASq1zQTY – d
  • ECCQDVTYVsjjF7IQ – p
  • IJANUYZsIjRsR3 – q
  • AgkAkahDUXL0RS – dmp1
  • ECCB78r2SnsJC9 – dmq1
  • AghaOK3FsKoELg== – iqmp

Remember that in order to reconstruct all of these values, all I need are e, p, and q – and e is pretty much always 65,537. So I could “redact” almost all of this key, and still give all the important, private bits of this key. Let me show you:

-----BEGIN RSA PRIVATE KEY-----
..............................................................EC
CQDVTYVsjjF7IQIJANUYZsIjRsR3....................................
........
-----END RSA PRIVATE KEY-----

Now, I doubt that anyone is going to redact a key precisely like this… but then again, this isn’t a “typical” RSA key. They usually look a lot more like this:

-----BEGIN RSA PRIVATE KEY-----
MIIEogIBAAKCAQEAu6Inch7+mWtKn+leB9uCG3MaJIxRyvC/5KTz2fR+h+GOhqj4
SZJobiVB4FrE5FgC7AnlH6qeRi9MI0s6dt5UWZ5oNIeWSaOOeNO+EJDUkSVf67wj
SNGXlSjGAkPZ0nRJiDjhuPvQmdW53hOaBLk5udxPEQbenpXAzbLJ7wH5ouLQ3nQw
HwpwDNQhF6zRO8WoscpDVThOAM+s4PS7EiK8ZR4hu2toon8Ynadlm95V45wR0VlW
zywgbkZCKa1IMrDCscB6CglQ10M3Xzya3iTzDtQxYMVqhDrA7uBYRxA0y1sER+Rb
yhEh03xz3AWemJVLCQuU06r+FABXJuY/QuAVvQIDAQABAoIBAFqwWVhzWqNUlFEO
PoCVvCEAVRZtK+tmyZj9kU87ORz8DCNR8A+/T/JM17ZUqO2lDGSBs9jGYpGRsr8s
USm69BIM2ljpX95fyzDjRu5C0jsFUYNi/7rmctmJR4s4uENcKV5J/++k5oI0Jw4L
c1ntHNWUgjK8m0UTJIlHbQq0bbAoFEcfdZxd3W+SzRG3jND3gifqKxBG04YDwloy
tu+bPV2jEih6p8tykew5OJwtJ3XsSZnqJMwcvDciVbwYNiJ6pUvGq6Z9kumOavm9
XU26m4cWipuK0URWbHWQA7SjbktqEpxsFrn5bYhJ9qXgLUh/I1+WhB2GEf3hQF5A
pDTN4oECgYEA7Kp6lE7ugFBDC09sKAhoQWrVSiFpZG4Z1gsL9z5YmZU/vZf0Su0n
9J2/k5B1GghvSwkTqpDZLXgNz8eIX0WCsS1xpzOuORSNvS1DWuzyATIG2cExuRiB
jYWIJUeCpa5p2PdlZmBrnD/hJ4oNk4oAVpf+HisfDSN7HBpN+TJfcAUCgYEAyvY7
Y4hQfHIdcfF3A9eeCGazIYbwVyfoGu70S/BZb2NoNEPymqsz7NOfwZQkL4O7R3Wl
Rm0vrWT8T5ykEUgT+2ruZVXYSQCKUOl18acbAy0eZ81wGBljZc9VWBrP1rHviVWd
OVDRZNjz6nd6ZMrJvxRa24TvxZbJMmO1cgSW1FkCgYAoWBd1WM9HiGclcnCZknVT
UYbykCeLO0mkN1Xe2/32kH7BLzox26PIC2wxF5seyPlP7Ugw92hOW/zewsD4nLze
v0R0oFa+3EYdTa4BvgqzMXgBfvGfABJ1saG32SzoWYcpuWLLxPwTMsCLIPmXgRr1
qAtl0SwF7Vp7O/C23mNukQKBgB89DOEB7xloWv3Zo27U9f7nB7UmVsGjY8cZdkJl
6O4LB9PbjXCe3ywZWmJqEbO6e83A3sJbNdZjT65VNq9uP50X1T+FmfeKfL99X2jl
RnQTsrVZWmJrLfBSnBkmb0zlMDAcHEnhFYmHFuvEnfL7f1fIoz9cU6c+0RLPY/L7
n9dpAoGAXih17mcmtnV+Ce+lBWzGWw9P4kVDSIxzGxd8gprrGKLa3Q9VuOrLdt58
++UzNUaBN6VYAe4jgxGfZfh+IaSlMouwOjDgE/qzgY8QsjBubzmABR/KWCYiRqkj
qpWCgo1FC1Gn94gh/+dW2Q8+NjYtXWNqQcjRP4AKTBnPktEvdMA=
-----END RSA PRIVATE KEY-----

People typically redact keys by deleting whole lines, and usually replacing them with [...] and the like. But only about 345 of those 1588 characters (excluding the header and footer) are required to construct the entire key. You can redact about 4/5ths of that giant blob of stuff, and your private parts (or at least, those of your key) are still left uncomfortably exposed.

But Wait! There’s More!

Remember how I said that everything in the key other than e, p, and q could be derived from those three numbers? Let’s talk about one of those numbers: n.

This is known as the “public modulus” (because, along with e, it is also present in the public key). It is very easy to calculate: n = p * q. It is also very early in the key (the second number, in fact).

Since n = p * q, it follows that q = n / p. Thus, as long as the key is intact up to p, you can derive q by simple division.
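The division really is that simple. Here is a small sketch using the same textbook stand-in numbers (n = 3233, p = 61); recover_q is a hypothetical helper name, and the remainder check is my own addition as a cheap way to notice a damaged n or p.

```ruby
# Recover q from the public modulus and one prime.
# A non-zero remainder means n or p was damaged somewhere along the way.
def recover_q(n, p)
  q, remainder = n.divmod(p)
  raise ArgumentError, "p does not divide n" unless remainder.zero?
  q
end

recovered_q = recover_q(3233, 61)
puts recovered_q
```

Ruby integers are arbitrary precision, so the same call works unchanged on a 4096-bit modulus.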

Real World Redaction

At this point, I’d like to introduce an acquaintance of mine: Mr. Johan Finn. He is the proud owner of the GitHub repo johanfinn/scripts. For a while, his repo contained a script that contained a poorly-redacted private key. He has since deleted it, by making a new commit, but of course because git never really deletes anything, it’s still available.

Of course, Mr. Finn may delete the repo, or force-push a new history without that commit, so here is the redacted private key, with a bit of the surrounding shell script, for our illustrative pleasure:

#Add private key to .ssh folder
cd /home/johan/.ssh/
echo  "-----BEGIN RSA PRIVATE KEY-----
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
ÄÄÄÄÄÄÄÄÄÄÄÄÄÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅ
MIIJKgIBAAKCAgEAxEVih1JGb8gu/Fm4AZh+ZwJw/pjzzliWrg4mICFt1g7SmIE2
TCQMKABdwd11wOFKCPc/UzRH/fHuQcvWrpbOSdqev/zKff9iedKw/YygkMeIRaXB
fYELqvUAOJ8PPfDm70st9GJRhjGgo5+L3cJB2gfgeiDNHzaFvapRSU0oMGQX+kI9
ezsjDAn+0Pp+r3h/u1QpLSH4moRFGF4omNydI+3iTGB98/EzuNhRBHRNq4oBV5SG
Pq/A1bem2ninnoEaQ+OPESxYzDz3Jy9jV0W/6LvtJ844m+XX69H5fqq5dy55z6DW
sGKn78ULPVZPsYH5Y7C+CM6GAn4nYCpau0t52sqsY5epXdeYx4Dc+Wm0CjXrUDEe
Egl4loPKDxJkQqQ/MQiz6Le/UK9vEmnWn1TRXK3ekzNV4NgDfJANBQobOpwt8WVB
rbsC0ON7n680RQnl7PltK9P1AQW5vHsahkoixk/BhcwhkrkZGyDIl9g8Q/Euyoq3
eivKPLz7/rhDE7C1BzFy7v8AjC3w7i9QeHcWOZFAXo5hiDasIAkljDOsdfD4tP5/
wSO6E6pjL3kJ+RH2FCHd7ciQb+IcuXbku64ln8gab4p8jLa/mcMI+V3eWYnZ82Yu
axsa85hAe4wb60cp/rCJo7ihhDTTvGooqtTisOv2nSvCYpcW9qbL6cGjAXECAwEA
AQKCAgEAjz6wnWDP5Y9ts2FrqUZ5ooamnzpUXlpLhrbu3m5ncl4ZF5LfH+QDN0Kl
KvONmHsUhJynC/vROybSJBU4Fu4bms1DJY3C39h/L7g00qhLG7901pgWMpn3QQtU
4P49qpBii20MGhuTsmQQALtV4kB/vTgYfinoawpo67cdYmk8lqzGzzB/HKxZdNTq
s+zOfxRr7PWMo9LyVRuKLjGyYXZJ/coFaobWBi8Y96Rw5NZZRYQQXLIalC/Dhndm
AHckpstEtx2i8f6yxEUOgPvV/gD7Akn92RpqOGW0g/kYpXjGqZQy9PVHGy61sInY
HSkcOspIkJiS6WyJY9JcvJPM6ns4b84GE9qoUlWVF3RWJk1dqYCw5hz4U8LFyxsF
R6WhYiImvjxBLpab55rSqbGkzjI2z+ucDZyl1gqIv9U6qceVsgRyuqdfVN4deU22
LzO5IEDhnGdFqg9KQY7u8zm686Ejs64T1sh0y4GOmGsSg+P6nsqkdlXH8C+Cf03F
lqPFg8WQC7ojl/S8dPmkT5tcJh3BPwIWuvbtVjFOGQc8x0lb+NwK8h2Nsn6LNazS
0H90adh/IyYX4sBMokrpxAi+gMAWiyJHIHLeH2itNKtAQd3qQowbrWNswJSgJzsT
JuJ7uqRKAFkE6nCeAkuj/6KHHMPsfCAffVdyGaWqhoxmPOrnVgECggEBAOrCCwiC
XxwUgjOfOKx68siFJLfHf4vPo42LZOkAQq5aUmcWHbJVXmoxLYSczyAROopY0wd6
Dx8rqnpO7OtZsdJMeBSHbMVKoBZ77hiCQlrljcj12moFaEAButLCdZFsZW4zF/sx
kWIAaPH9vc4MvHHyvyNoB3yQRdevu57X7xGf9UxWuPil/jvdbt9toaraUT6rUBWU
GYPNKaLFsQzKsFWAzp5RGpASkhuiBJ0Qx3cfLyirjrKqTipe3o3gh/5RSHQ6VAhz
gdUG7WszNWk8FDCL6RTWzPOrbUyJo/wz1kblsL3vhV7ldEKFHeEjsDGroW2VUFlS
asAHNvM4/uYcOSECggEBANYH0427qZtLVuL97htXW9kCAT75xbMwgRskAH4nJDlZ
IggDErmzBhtrHgR+9X09iL47jr7dUcrVNPHzK/WXALFSKzXhkG/yAgmt3r14WgJ6
5y7010LlPFrzaNEyO/S4ISuBLt4cinjJsrFpoo0WI8jXeM5ddG6ncxdurKXMymY7
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::.::
:::::::::::::::::::::::::::.::::::::::::::::::::::::::::::::::::
LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLlL
ÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖ
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
ÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅ
YYYYYYYYYYYYYYYYYYYYYyYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY
gff0GJCOMZ65pMSy3A3cSAtjlKnb4fWzuHD5CFbusN4WhCT/tNxGNSpzvxd8GIDs
nY7exs9L230oCCpedVgcbayHCbkChEfoPzL1e1jXjgCwCTgt8GjeEFqc1gXNEaUn
O8AJ4VlR8fRszHm6yR0ZUBdY7UJddxQiYOzt0S1RLlECggEAbdcs4mZdqf3OjejJ
06oTPs9NRtAJVZlppSi7pmmAyaNpOuKWMoLPElDAQ3Q7VX26LlExLCZoPOVpdqDH
KbdmBEfTR4e11Pn9vYdu9/i6o10U4hpmf4TYKlqk10g1Sj21l8JATj/7Diey8scO
sAI1iftSg3aBSj8W7rxCxSezrENzuqw5D95a/he1cMUTB6XuravqZK5O4eR0vrxR
AvMzXk5OXrUEALUvt84u6m6XZZ0pq5XZxq74s8p/x1JvTwcpJ3jDKNEixlHfdHEZ
ZIu/xpcwD5gRfVGQamdcWvzGHZYLBFO1y5kAtL8kI9tW7WaouWVLmv99AyxdAaCB
Y5mBAQKCAQEAzU7AnorPzYndlOzkxRFtp6MGsvRBsvvqPLCyUFEXrHNV872O7tdO
GmsMZl+q+TJXw7O54FjJJvqSSS1sk68AGRirHop7VQce8U36BmI2ZX6j2SVAgIkI
9m3btCCt5rfiCatn2+Qg6HECmrCsHw6H0RbwaXS4RZUXD/k4X+sslBitOb7K+Y+N
Bacq6QxxjlIqQdKKPs4P2PNHEAey+kEJJGEQ7bTkNxCZ21kgi1Sc5L8U/IGy0BMC
PvJxssLdaWILyp3Ws8Q4RAoC5c0ZP0W2j+5NSbi3jsDFi0Y6/2GRdY1HAZX4twem
Q0NCedq1JNatP1gsb6bcnVHFDEGsj/35oQKCAQEAgmWMuSrojR/fjJzvke6Wvbox
FRnPk+6YRzuYhAP/YPxSRYyB5at++5Q1qr7QWn7NFozFIVFFT8CBU36ktWQ39MGm
cJ5SGyN9nAbbuWA6e+/u059R7QL+6f64xHRAGyLT3gOb1G0N6h7VqFT25q5Tq0rc
Lf/CvLKoudjv+sQ5GKBPT18+zxmwJ8YUWAsXUyrqoFWY/Tvo5yLxaC0W2gh3+Ppi
EDqe4RRJ3VKuKfZxHn5VLxgtBFN96Gy0+Htm5tiMKOZMYAkHiL+vrVZAX0hIEuRZ
JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
-----END RSA PRIVATE KEY-----" >> id_rsa

Now, if you try to reconstruct this key by removing the “obvious” garbage lines (the ones that are all repeated characters, some of which aren’t even valid base64 characters), it still isn’t a key – at least, openssl pkey doesn’t want anything to do with it. The key is very much still in there, though, as we shall soon see.
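Removing the "obvious" garbage can be done by eye, but a rough filter is easy to sketch. The helper names and the threshold of 16 distinct characters are assumptions of mine, chosen only because genuine base64 lines use many different characters while the filler lines use one or two; the garbage lines in the sample are shortened versions of the ones above.

```ruby
# A line is kept only if it is pure base64 and is not dominated by a
# single repeated character.
def plausible_key_line?(line, min_distinct: 16)
  line.match?(%r{\A[A-Za-z0-9+/]+={0,2}\z}) && line.chars.uniq.size >= min_distinct
end

def extract_base64_body(text)
  text.lines(chomp: true).select { |line| plausible_key_line?(line) }.join("\n")
end

sample = <<~TEXT
  -----BEGIN RSA PRIVATE KEY-----
  KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
  MIIJKgIBAAKCAgEAxEVih1JGb8gu/Fm4AZh+ZwJw/pjzzliWrg4mICFt1g7SmIE2
  ::::::::::::::::.:::::::::::::::
  LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLlL
  gff0GJCOMZ65pMSy3A3cSAtjlKnb4fWzuHD5CFbusN4WhCT/tNxGNSpzvxd8GIDs
  -----END RSA PRIVATE KEY-----
TEXT

puts extract_base64_body(sample)
```

What survives is still not a loadable key, because whole lines of real data are missing, but it is exactly the material the next step works from.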

Using a gem I wrote and a quick bit of Ruby, we can extract a complete private key. The irb session looks something like this:

>> require "derparse"
>> b64 = <<EOF
MIIJKgIBAAKCAgEAxEVih1JGb8gu/Fm4AZh+ZwJw/pjzzliWrg4mICFt1g7SmIE2
TCQMKABdwd11wOFKCPc/UzRH/fHuQcvWrpbOSdqev/zKff9iedKw/YygkMeIRaXB
fYELqvUAOJ8PPfDm70st9GJRhjGgo5+L3cJB2gfgeiDNHzaFvapRSU0oMGQX+kI9
ezsjDAn+0Pp+r3h/u1QpLSH4moRFGF4omNydI+3iTGB98/EzuNhRBHRNq4oBV5SG
Pq/A1bem2ninnoEaQ+OPESxYzDz3Jy9jV0W/6LvtJ844m+XX69H5fqq5dy55z6DW
sGKn78ULPVZPsYH5Y7C+CM6GAn4nYCpau0t52sqsY5epXdeYx4Dc+Wm0CjXrUDEe
Egl4loPKDxJkQqQ/MQiz6Le/UK9vEmnWn1TRXK3ekzNV4NgDfJANBQobOpwt8WVB
rbsC0ON7n680RQnl7PltK9P1AQW5vHsahkoixk/BhcwhkrkZGyDIl9g8Q/Euyoq3
eivKPLz7/rhDE7C1BzFy7v8AjC3w7i9QeHcWOZFAXo5hiDasIAkljDOsdfD4tP5/
wSO6E6pjL3kJ+RH2FCHd7ciQb+IcuXbku64ln8gab4p8jLa/mcMI+V3eWYnZ82Yu
axsa85hAe4wb60cp/rCJo7ihhDTTvGooqtTisOv2nSvCYpcW9qbL6cGjAXECAwEA
AQKCAgEAjz6wnWDP5Y9ts2FrqUZ5ooamnzpUXlpLhrbu3m5ncl4ZF5LfH+QDN0Kl
KvONmHsUhJynC/vROybSJBU4Fu4bms1DJY3C39h/L7g00qhLG7901pgWMpn3QQtU
4P49qpBii20MGhuTsmQQALtV4kB/vTgYfinoawpo67cdYmk8lqzGzzB/HKxZdNTq
s+zOfxRr7PWMo9LyVRuKLjGyYXZJ/coFaobWBi8Y96Rw5NZZRYQQXLIalC/Dhndm
AHckpstEtx2i8f6yxEUOgPvV/gD7Akn92RpqOGW0g/kYpXjGqZQy9PVHGy61sInY
HSkcOspIkJiS6WyJY9JcvJPM6ns4b84GE9qoUlWVF3RWJk1dqYCw5hz4U8LFyxsF
R6WhYiImvjxBLpab55rSqbGkzjI2z+ucDZyl1gqIv9U6qceVsgRyuqdfVN4deU22
LzO5IEDhnGdFqg9KQY7u8zm686Ejs64T1sh0y4GOmGsSg+P6nsqkdlXH8C+Cf03F
lqPFg8WQC7ojl/S8dPmkT5tcJh3BPwIWuvbtVjFOGQc8x0lb+NwK8h2Nsn6LNazS
0H90adh/IyYX4sBMokrpxAi+gMAWiyJHIHLeH2itNKtAQd3qQowbrWNswJSgJzsT
JuJ7uqRKAFkE6nCeAkuj/6KHHMPsfCAffVdyGaWqhoxmPOrnVgECggEBAOrCCwiC
XxwUgjOfOKx68siFJLfHf4vPo42LZOkAQq5aUmcWHbJVXmoxLYSczyAROopY0wd6
Dx8rqnpO7OtZsdJMeBSHbMVKoBZ77hiCQlrljcj12moFaEAButLCdZFsZW4zF/sx
kWIAaPH9vc4MvHHyvyNoB3yQRdevu57X7xGf9UxWuPil/jvdbt9toaraUT6rUBWU
GYPNKaLFsQzKsFWAzp5RGpASkhuiBJ0Qx3cfLyirjrKqTipe3o3gh/5RSHQ6VAhz
gdUG7WszNWk8FDCL6RTWzPOrbUyJo/wz1kblsL3vhV7ldEKFHeEjsDGroW2VUFlS
asAHNvM4/uYcOSECggEBANYH0427qZtLVuL97htXW9kCAT75xbMwgRskAH4nJDlZ
IggDErmzBhtrHgR+9X09iL47jr7dUcrVNPHzK/WXALFSKzXhkG/yAgmt3r14WgJ6
5y7010LlPFrzaNEyO/S4ISuBLt4cinjJsrFpoo0WI8jXeM5ddG6ncxdurKXMymY7
EOF
>> b64 += <<EOF
gff0GJCOMZ65pMSy3A3cSAtjlKnb4fWzuHD5CFbusN4WhCT/tNxGNSpzvxd8GIDs
nY7exs9L230oCCpedVgcbayHCbkChEfoPzL1e1jXjgCwCTgt8GjeEFqc1gXNEaUn
O8AJ4VlR8fRszHm6yR0ZUBdY7UJddxQiYOzt0S1RLlECggEAbdcs4mZdqf3OjejJ
06oTPs9NRtAJVZlppSi7pmmAyaNpOuKWMoLPElDAQ3Q7VX26LlExLCZoPOVpdqDH
KbdmBEfTR4e11Pn9vYdu9/i6o10U4hpmf4TYKlqk10g1Sj21l8JATj/7Diey8scO
sAI1iftSg3aBSj8W7rxCxSezrENzuqw5D95a/he1cMUTB6XuravqZK5O4eR0vrxR
AvMzXk5OXrUEALUvt84u6m6XZZ0pq5XZxq74s8p/x1JvTwcpJ3jDKNEixlHfdHEZ
ZIu/xpcwD5gRfVGQamdcWvzGHZYLBFO1y5kAtL8kI9tW7WaouWVLmv99AyxdAaCB
Y5mBAQKCAQEAzU7AnorPzYndlOzkxRFtp6MGsvRBsvvqPLCyUFEXrHNV872O7tdO
GmsMZl+q+TJXw7O54FjJJvqSSS1sk68AGRirHop7VQce8U36BmI2ZX6j2SVAgIkI
9m3btCCt5rfiCatn2+Qg6HECmrCsHw6H0RbwaXS4RZUXD/k4X+sslBitOb7K+Y+N
Bacq6QxxjlIqQdKKPs4P2PNHEAey+kEJJGEQ7bTkNxCZ21kgi1Sc5L8U/IGy0BMC
PvJxssLdaWILyp3Ws8Q4RAoC5c0ZP0W2j+5NSbi3jsDFi0Y6/2GRdY1HAZX4twem
Q0NCedq1JNatP1gsb6bcnVHFDEGsj/35oQKCAQEAgmWMuSrojR/fjJzvke6Wvbox
FRnPk+6YRzuYhAP/YPxSRYyB5at++5Q1qr7QWn7NFozFIVFFT8CBU36ktWQ39MGm
cJ5SGyN9nAbbuWA6e+/u059R7QL+6f64xHRAGyLT3gOb1G0N6h7VqFT25q5Tq0rc
Lf/CvLKoudjv+sQ5GKBPT18+zxmwJ8YUWAsXUyrqoFWY/Tvo5yLxaC0W2gh3+Ppi
EDqe4RRJ3VKuKfZxHn5VLxgtBFN96Gy0+Htm5tiMKOZMYAkHiL+vrVZAX0hIEuRZ
EOF
>> der = b64.unpack("m").first
>> c = DerParse.new(der).first_node.first_child
>> version = c.value
=> 0
>> c = c.next_node
>> n = c.value
=> 80071596234464993385068908004931... # (etc)
>> c = c.next_node
>> e = c.value
=> 65537
>> c = c.next_node
>> d = c.value
=> 58438813486895877116761996105770... # (etc)
>> c = c.next_node
>> p = c.value
=> 29635449580247160226960937109864... # (etc)
>> c = c.next_node
>> q = c.value
=> 27018856595256414771163410576410... # (etc)

What I’ve done, in case you don’t speak Ruby, is take the two “chunks” of plausible-looking base64 data, chuck them together into a variable named b64, unbase64 it into a variable named der, pass that into a new DerParse instance, and then walk the DER value tree until I got all the values I need.

Interestingly, the q value actually traverses the “split” in the two chunks, which means that there’s always the possibility that there are lines missing from the key. However, since p and q are supposed to be prime, we can “sanity check” them to see if corruption is likely to have occurred:

>> require "openssl"
>> OpenSSL::BN.new(p).prime?
=> true
>> OpenSSL::BN.new(q).prime?
=> true

Excellent! The chance of a corrupted file producing valid-but-incorrect prime numbers isn’t huge, so we can be fairly confident that we’ve got the “real” p and q. Now, with the help of another one of my creations we can use e, p, and q to create a fully-operational battle key:

>> require "openssl/pkey/rsa"
>> k = OpenSSL::PKey::RSA.from_factors(p, q, e)
=> #<OpenSSL::PKey::RSA:0x0000559d5903cd38>
>> k.valid?
=> true
>> k.verify(OpenSSL::Digest::SHA256.new, k.sign(OpenSSL::Digest::SHA256.new, "bob"), "bob")
=> true

… and there you have it. One fairly redacted-looking private key brought back to life by maths and far too much free time.

Sorry Mr. Finn, I hope you’re not still using that key on anything Internet-facing.

What About Other Key Types?

EC keys are very different beasts, but they have much the same problems as RSA keys. A typical EC key contains both private and public data, and the public portion is twice the size – so only about 1/3 of the data in the key is private material. It is quite plausible that you can “redact” an EC key and leave all the actually private bits exposed.
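The size claim is easy to check for yourself. This sketch generates a throwaway key with Ruby's standard openssl library and compares the two parts; the curve prime256v1 is just my choice for the example, and the public point is measured in its default uncompressed encoding.

```ruby
require "openssl"

# Generate a throwaway P-256 key and measure its two halves.
ec_key = OpenSSL::PKey::EC.generate("prime256v1")

private_bytes = ec_key.private_key.num_bytes       # the secret scalar
public_bytes = ec_key.public_key.to_bn.num_bytes   # 0x04 prefix plus X and Y

puts "private scalar: #{private_bytes} bytes"
puts "public point:   #{public_bytes} bytes"
```

On this curve the secret scalar is at most 32 bytes while the public point takes 65, so roughly two thirds of the interesting data in the file is not secret at all.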

What Do We Do About It?

In short: don’t ever try and redact real private keys. For documentation purposes, just put “KEY GOES HERE” in the appropriate spot, or something like that. Store your secrets somewhere that isn’t a public (or even private!) git repo.

Generating a “dummy” private key and sticking it in there isn’t a great idea, for different reasons: people have this odd habit of reusing “demo” keys in real life. There’s no need to encourage that sort of thing.


  1. Technically the pieces aren’t 100% aligned with the underlying DER, because of how base64 works. I felt it was easier to understand if I stuck to chopping up the base64, rather than decoding into DER and then chopping up the DER. 

MicroHams Digital Conference (MHDC) 2020

On May 9 2020 (PST) I had the pleasure of speaking at the MicroHams Digital Conference (MHDC) 2020. Due to COVID-19 presenters attended via Zoom, and the conference was live streamed over YouTube.

Thanks to the hard work of the organisers, this worked really well!

Looking at the conference program, I noticed the standard of the presenters was very high. The organisers I worked with (Scott N7SS, and Grant KB7WSD) explained that a side effect of making the conference virtual was casting a much wider net on presenters – making the conference even better than IRL (In Real Life)! The YouTube streaming stats showed 300-500 people “attending” – also very high.

My door-to-door travel time to the West Coast of the USA is about 20 hours, so a remote presentation makes life much easier for me. An in-person talk takes me a week to prepare, means 1-2 weeks away from home, and costs a week to recover from the jetlag. As a single parent I would also need to find a carer for my 14 year old.

Vickie, KD7LAW, ran a break out room for after talk chat which worked well. It was nice to “meet” several people that I usually just have email contact with. All from the comfort of my home on a Sunday morning in Adelaide (Saturday afternoon PST).

The MHDC 2020 talks have now been published on YouTube. Here is my talk, which is a good update (May 2020) of Codec 2 and FreeDV, including:

  • The new FreeDV 2020 mode using the LPCNet neural net vocoder
  • Embedded FreeDV 700D running on the SM1000
  • FreeDV over the QO-100 geosynchronous satellite and KiwiSDRs
  • Introducing some of the good people contributing to FreeDV

The conference has me interested in applying the open source modems we have developed for digital voice to Amateur Radio packet and HF data. So I’m reading up on Winlink, Pat, Direwolf and friends.

Thanks Scott, Grant, and Vickie and the MicroHams club!

May 16, 2020

Raptor Blackbird support: all upstream in op-build

Thanks to my most recent PR being merged, op-build v2.5 will have full support for the Raptor Blackbird! This includes support for the “IPL Monitor” that’s required to get fan control going.

Note that if you’re running Fedora 32 then you need some patches to buildroot to have it build, but if you’re building on something a little older, then upstream should build and work straight out of the box (err… git tree).

I also note that the work to get Secure Boot for an OS Kernel going is starting to make its way out for code reviews, so that’s something to look forward to (although without a TPM we’re going to need extra code).

May 13, 2020

A op-build v2.5-rc1 based Raptor Blackbird Build

I have done a few builds of firmware for the Raptor Blackbird since I got mine, each of them based on upstream op-build plus a few patches. The previous one was Yet another near-upstream Raptor Blackbird firmware build that I built a couple of months ago. This new build is based off the release candidate of op-build v2.5. Here’s what’s changed:

Package         Old Version          New Version
hcode           hw030220a.opmst      hw050520a.opmst
hostboot        acdff8a390a2654dd52fed67bdebe2b5
kexec-lite      18ec88310c4134e6b0130b3c1ea489e
libflash        v6.5-228-g82aed17a   v6.6
linux           v5.4.22              v5.4.33
linux-headers   v5.4.22              v5.4.33
machine-xml     17e9e84d504582c88e782e30829e0d6be
occ             3ab29212518e65740ab4dc96fd6cf584c42
openpower-pnor  6fb8d914134d544a84175f00d9c6dc395faf3
sbe             c318ab00116d92f08c78fb7838495ad0aab7
skiboot         v6.5-228-g82aed17a   v6.6
(For the packages versioned by commit hash, the old and new hashes are shown run together.)
Changes in my latest Blackbird build

Go grab blackbird.pnor from https://www.flamingspork.com/blackbird/stewart-blackbird-6-images/, and give it a go! Just scp it to your BMC, and flash it:

pflash -E -p /tmp/blackbird.pnor

There are two differences from upstream op-build: my pull request to op-build, and the fixing of the (old) buildroot so that it’ll build on Fedora 32. From discussions on the openpower-firmware mailing list, it seems that one hopeful thing is to have all the Blackbird support merged in before the final op-build v2.5 is tagged. The previous op-build release (v2.4) was tagged in July 2019, so we’re about 10 months into what was a 2 month release cycle, so speculating on when that final release will be is somewhat difficult.

May 12, 2020

f32, u32, and const

Some time ago, I wrote “floats, bits, and constant expressions” about converting a floating point number into its representative ones and zeros as a C++ constant expression – constructing the IEEE 754 representation without being able to examine the bits directly.
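For anyone unfamiliar with the idea, here is a small sketch in Ruby of what building the bits "without examining them" means: the sign, exponent, and significand are found with comparisons and arithmetic only. This is not the branch-free, compile-time method of the C++ or Rust code, just the underlying arithmetic. The function names are hypothetical, and it assumes the input is a zero, an infinity, a NaN, or a value exactly representable as a normal single-precision float.

```ruby
# Build the IEEE 754 single-precision bit pattern arithmetically.
# Assumes x is 0, +/-Infinity, NaN, or exactly representable as a normal f32.
def f32_bits_by_arithmetic(x)
  return 0x7fc0_0000 if x.nan?
  return 0x7f80_0000 if x == Float::INFINITY
  return 0xff80_0000 if x == -Float::INFINITY
  return 0 if x == 0.0 # -0.0 also lands here, as with a plain f == 0 test

  sign = x < 0 ? 1 : 0
  magnitude = x.abs
  exponent = 0
  # Scale into [1, 2), counting the powers of two taken out or put in.
  while magnitude >= 2.0
    magnitude /= 2.0
    exponent += 1
  end
  while magnitude < 1.0
    magnitude *= 2.0
    exponent -= 1
  end
  fraction = ((magnitude - 1.0) * (1 << 23)).to_i # the 23 stored bits
  (sign << 31) | ((exponent + 127) << 23) | fraction
end

# Reference answer: let the runtime reinterpret the bytes directly.
def f32_bits_by_reinterpret(x)
  [x].pack("g").unpack1("N")
end

printf("%08x\n", f32_bits_by_arithmetic(0.15625)) # prints 3e200000
printf("%08x\n", f32_bits_by_reinterpret(0.15625)) # prints 3e200000
```

The second function plays the same role as a transmute-based version: short, obviously right, and useful for checking the long way round.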

I’ve been playing around with Rust recently, and rewrote that conversion code as a bit of a learning exercise for myself, with a thoroughly contrived set of constraints: using integer and single-precision floating point math, at compile time, without unsafe blocks, while using as few unstable features as possible.

I’ve included the listing below, for your bemusement and/or head-shaking, and you can play with the code in the Rust Playground and rust.godbolt.org.

// Jonathan Adamczewski 2020-05-12
//
// Constructing the bit-representation of an IEEE 754 single precision floating 
// point number, using integer and single-precision floating point math, at 
// compile time, in rust, without unsafe blocks, while using as few unstable 
// features as I can.
//
// or "What if this silly C++ thing http://brnz.org/hbr/?p=1518 but in Rust?"


// Q. Why? What is this good for?
// A. To the best of my knowledge, this code serves no useful purpose. 
//    But I did learn a thing or two while writing it :)


// This is needed to be able to perform floating point operations in a const 
// function:
#![feature(const_fn)]


// bits_transmute(): Returns the bits representing a floating point value, by
//                   way of std::mem::transmute()
//
// For completeness (and validation), and to make it clear the fundamentally 
// unnecessary nature of the exercise :D - here's a short, straightforward, 
// library-based version. But it needs the const_transmute flag and an unsafe 
// block.
#![feature(const_transmute)]
const fn bits_transmute(f: f32) -> u32 {
  unsafe { std::mem::transmute::<f32, u32>(f) }
}



// get_if_u32(predicate:bool, if_true: u32, if_false: u32):
//   Returns if_true if predicate is true, else if_false
//
// If and match are not able to be used in const functions (at least, not 
// without #![feature(const_if_match)] - so here's a branch-free select function
// for u32s
const fn get_if_u32(predicate: bool, if_true: u32, if_false: u32) -> u32 {
  let pred_mask = (-1 * (predicate as i32)) as u32;
  let true_val = if_true & pred_mask;
  let false_val = if_false & !pred_mask;
  true_val | false_val
}

// get_if_f32(predicate, if_true, if_false):
//   Returns if_true if predicate is true, else if_false
//
// A branch-free select function for f32s.
// 
// If either if_true or if_false is NaN or an infinity, the result will be NaN,
// which is not ideal. I don't know of a better way to implement this function
// within the arbitrary limitations of this silly little side quest.
const fn get_if_f32(predicate: bool, if_true: f32, if_false: f32) -> f32 {
  // can't convert bool to f32 - but can convert bool to i32 to f32
  let pred_sel = (predicate as i32) as f32;
  let pred_not_sel = ((!predicate) as i32) as f32;
  let true_val = if_true * pred_sel;
  let false_val = if_false * pred_not_sel;
  true_val + false_val
}


// bits(): Returns the bits representing a floating point value.
const fn bits(f: f32) -> u32 {
  // the result value, initialized to a NaN value that will otherwise not be
  // produced by this function.
  let mut r = 0xffff_ffff;

  // These floating point operations (and others) cause the following error:
  //     only int, `bool` and `char` operations are stable in const fn
  // hence #![feature(const_fn)] at the top of the file
  
  // Identify special cases
  let is_zero    = f == 0_f32;
  let is_inf     = f == f32::INFINITY;
  let is_neg_inf = f == f32::NEG_INFINITY;
  let is_nan     = f != f;

  // Writing this as !(is_zero || is_inf || ...) causes the following error:
  //     Loops and conditional expressions are not stable in const fn
  // so instead write this as type conversions, and bitwise operations
  //
  // "normalish" here means that f is a normal or subnormal value
  let is_normalish = 0 == ((is_zero as u32) | (is_inf as u32) | 
                        (is_neg_inf as u32) | (is_nan as u32));

  // set the result value for each of the special cases
  r = get_if_u32(is_zero,    0,           r); // if (is_zero)    { r = 0; }
  r = get_if_u32(is_inf,     0x7f80_0000, r); // if (is_inf)     { r = 0x7f80_0000; }
  r = get_if_u32(is_neg_inf, 0xff80_0000, r); // if (is_neg_inf) { r = 0xff80_0000; }
  r = get_if_u32(is_nan,     0x7fc0_0000, r); // if (is_nan)     { r = 0x7fc0_0000; }
 
  // It was tempting at this point to try setting f to a "normalish" placeholder 
  // value so that special cases do not have to be handled in the code that 
  // follows, like so:
  // f = get_if_f32(is_normalish, f, 1_f32);
  //
  // Unfortunately, get_if_f32() returns NaN if either input is NaN or infinite.
  // Instead of switching the value, we work around the non-normalish cases 
  // later.
  //
  // (This whole function is branch-free, so all of it is executed regardless of 
  // the input value)

  // extract the sign bit
  let sign_bit  = get_if_u32(f < 0_f32,  1, 0);

  // compute the absolute value of f
  let mut abs_f = get_if_f32(f < 0_f32, -f, f);

  
  // This part is a little complicated. The algorithm is functionally the same 
  // as the C++ version linked from the top of the file.
  // 
  // Because of the various contrived constraints on this problem, we compute 
  // the exponent and significand, rather than extract the bits directly.
  //
  // The idea is this:
  // Every finite single precision floating point number can be represented as a
  // series of (at most) 24 significant digits as a 128.149 fixed point number 
  // (128: 126 exponent values >= 0, plus one for the implicit leading 1, plus 
  // one more so that the decimal point falls on a power-of-two boundary :)
  // 149: 126 negative exponent values, plus 23 for the bits of precision in the 
  // significand.)
  //
  // If we are able to scale the number such that all of the precision bits fall 
  // in the upper-most 64 bits of that fixed-point representation (while 
  // tracking our effective manipulation of the exponent), we can then 
  // predictably and simply scale that computed value back to a range that can 
  // be converted safely to a u64, count the leading zeros to determine the 
  // exact exponent, and then shift the result into position for the final u32 
  // representation.
  
  // Start with the largest possible exponent - subsequent steps will reduce 
  // this number as appropriate
  let mut exponent: u32 = 254;
  {
    // Hex float literals are really nice. I miss them.

    // The threshold is 2^87 (think: 64+23 bits) to ensure that the number will 
    // be large enough that, when scaled down by 2^64, all the precision will 
    // fit nicely in a u64
    const THRESHOLD: f32 = 154742504910672534362390528_f32; // 0x1p87f == 2^87

    // The scaling factor is 2^41 (think: 64-23 bits) to ensure that a number 
    // between 2^87 and 2^64 will not overflow in a single scaling step.
    const SCALE_UP: f32 = 2199023255552_f32; // 0x1p41f == 2^41

    // Because loops are not available (no #![feature(const_loops)]), and 'if' is
    // not available (no #![feature(const_if_match)]), perform repeated branch-
    // free conditional multiplication of abs_f.

    // use a macro, because why not :D It's the most compact, simplest option I 
    // could find.
    macro_rules! maybe_scale {
      () => {{
        // care is needed: if abs_f is above the threshold, multiplying by 2^41 
        // will cause it to overflow (INFINITY) which will cause get_if_f32() to
        // return NaN, which will destroy the value in abs_f. So compute a safe 
        // scaling factor for each iteration.
        //
        // Roughly equivalent to :
        // if (abs_f < THRESHOLD) {
        //   exponent -= 41;
        //   abs_f *= SCALE_UP;
        // }
        let scale = get_if_f32(abs_f < THRESHOLD, SCALE_UP,      1_f32);    
        exponent  = get_if_u32(abs_f < THRESHOLD, exponent - 41, exponent); 
        abs_f     = get_if_f32(abs_f < THRESHOLD, abs_f * scale, abs_f);
      }}
    }
    // 41 bits per iteration means up to 246 bits shifted.
    // Even the smallest subnormal value will end up in the desired range.
    maybe_scale!();  maybe_scale!();  maybe_scale!();
    maybe_scale!();  maybe_scale!();  maybe_scale!();
  }

  // Now that we know that abs_f is in the desired range (2^87 <= abs_f < 2^128)
  // scale it down to be in the range (2^23 <= _ < 2^64), and convert without 
  // loss of precision to u64.
  const INV_2_64: f32 = 5.42101086242752217003726400434970855712890625e-20_f32; // 0x1p-64f == 2^-64
  let a = (abs_f * INV_2_64) as u64;

  // Count the leading zeros.
  // (C++ doesn't provide a compile-time constant function for this. It's nice 
  // that rust does :)
  let mut lz = a.leading_zeros();

  // if the number isn't normalish, lz is meaningless: we stomp it with 
  // something that will not cause problems in the computation that follows - 
  // the result of which is meaningless, and will be ignored in the end for 
  // non-normalish values.
  lz = get_if_u32(!is_normalish, 0, lz); // if (!is_normalish) { lz = 0; }

  {
    // This step accounts for subnormal numbers, where there are more leading 
    // zeros than can be accounted for in a valid exponent value, and leading 
    // zeros that must remain in the final significand.
    //
    // If lz < exponent, reduce exponent to its final correct value - lz will be
    // used to remove all of the leading zeros.
    //
    // Otherwise, clamp exponent to zero, and adjust lz to ensure that the 
    // correct number of bits will remain (after multiplying by 2^41 six times - 
    // 2^246 - there are 7 leading zeros ahead of the original subnormal's
    // computed significand of 0.sss...)
    // 
    // The following is roughly equivalent to:
    // if (lz < exponent) {
    //   exponent = exponent - lz;
    // } else {
    //   exponent = 0;
    //   lz = 7;
    // }

    // we're about to mess with lz and exponent - compute and store the relative 
    // value of the two
    let lz_is_less_than_exponent = lz < exponent;

    lz       = get_if_u32(!lz_is_less_than_exponent, 7,             lz);
    exponent = get_if_u32( lz_is_less_than_exponent, exponent - lz, 0);
  }

  // compute the final significand.
  // + 1 shifts away a leading 1-bit for normal, and 0-bit for subnormal values
  // Shifts are done in u64 (that leading bit is shifted into the void), then
  // the resulting bits are shifted back to their final resting place.
  let significand = ((a << (lz + 1)) >> (64 - 23)) as u32;

  // combine the bits
  let computed_bits = (sign_bit << 31) | (exponent << 23) | significand;

  // return the normalish result, or the non-normalish result, as appropriate
  get_if_u32(is_normalish, computed_bits, r)
}


// Compile-time validation - able to be examined in rust.godbolt.org output
pub static BITS_BIGNUM: u32 = bits(std::f32::MAX);
pub static TBITS_BIGNUM: u32 = bits_transmute(std::f32::MAX);
pub static BITS_LOWER_THAN_MIN: u32 = bits(7.0064923217e-46_f32);
pub static TBITS_LOWER_THAN_MIN: u32 = bits_transmute(7.0064923217e-46_f32);
pub static BITS_ZERO: u32 = bits(0.0f32);
pub static TBITS_ZERO: u32 = bits_transmute(0.0f32);
pub static BITS_ONE: u32 = bits(1.0f32);
pub static TBITS_ONE: u32 = bits_transmute(1.0f32);
pub static BITS_NEG_ONE: u32 = bits(-1.0f32);
pub static TBITS_NEG_ONE: u32 = bits_transmute(-1.0f32);
pub static BITS_INF: u32 = bits(std::f32::INFINITY);
pub static TBITS_INF: u32 = bits_transmute(std::f32::INFINITY);
pub static BITS_NEG_INF: u32 = bits(std::f32::NEG_INFINITY);
pub static TBITS_NEG_INF: u32 = bits_transmute(std::f32::NEG_INFINITY);
pub static BITS_NAN: u32 = bits(std::f32::NAN);
pub static TBITS_NAN: u32 = bits_transmute(std::f32::NAN);
pub static BITS_COMPUTED_NAN: u32 = bits(std::f32::INFINITY/std::f32::INFINITY);
pub static TBITS_COMPUTED_NAN: u32 = bits_transmute(std::f32::INFINITY/std::f32::INFINITY);


// Run-time validation of many more values
fn main() {
  let end: usize = 0xffff_ffff;
  let count = 9_876_543; // number of values to test
  let step = end / count;
  for u in (0..=end).step_by(step) {
      let v = u as u32;
      
      // reference
      let f = unsafe { std::mem::transmute::<u32, f32>(v) };
      
      // compute
      let c = bits(f);

      // validation
      if c != v && 
         !(f.is_nan() && c == 0x7fc0_0000) && // nans
         !(v == 0x8000_0000 && c == 0) { // negative 0
          println!("{:x?} {:x?}", v, c); 
      }
  }
}
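
As a runtime illustration of the layout that bits() assembles (sign in bit 31, exponent in bits 23 to 30, significand in bits 0 to 22), here is a minimal sketch that splits and rejoins the value returned by the standard library's f32::to_bits. The helper names split_bits and join_bits are invented for this example and are not part of the code above; unlike bits(), this sketch makes no attempt to be a const fn.

```rust
// Hypothetical helpers (not from the original code): split an f32 into its
// three IEEE 754 single-precision fields and put them back together.
fn split_bits(f: f32) -> (u32, u32, u32) {
    let b = f.to_bits();
    let sign = b >> 31;                 // 1 bit
    let exponent = (b >> 23) & 0xff;    // 8 bits, biased by 127
    let significand = b & 0x007f_ffff;  // 23 bits, implicit leading 1 not stored
    (sign, exponent, significand)
}

// Same combination step as the end of bits().
fn join_bits(sign: u32, exponent: u32, significand: u32) -> u32 {
    (sign << 31) | (exponent << 23) | significand
}

fn main() {
    let samples = [1.0_f32, -1.0, f32::MAX, f32::MIN_POSITIVE, f32::INFINITY];
    for &f in samples.iter() {
        let (s, e, m) = split_bits(f);
        // Rejoining the fields must give back the original bit pattern.
        assert_eq!(join_bits(s, e, m), f.to_bits());
        println!("{:e}: sign={} exponent={} significand={:#x}", f, s, e, m);
    }
}
```

The round trip through split_bits and join_bits is the same check that main() above performs between bits() and the transmute-based reference, only restricted to a handful of values.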

May 10, 2020

IT Asset Management

In my last full-time position I managed the asset tracking database for my employer. It was one of those things that “someone” needed to do, and it seemed that the only way that “someone” wouldn’t equate to “no-one” was for me to do it – which was ok. We used Snipe IT [1] to track the assets. I don’t have enough experience with asset tracking to say that Snipe is better or worse than average, but it basically did the job. Asset serial numbers are stored, you can have asset types that allow you to just add one more of the particular item, purchase dates are stored which makes warranty tracking easier, and every asset is associated with a person or listed as available. While I can’t say that Snipe IT is better than other products I can say that it will do the job reasonably well.

One problem that I didn’t discover until way too late was the fact that the finance people weren’t tracking serial numbers and that some assets in the database had the same asset IDs as the finance department and some had different ones. The best advice I can give to anyone who gets involved with asset tracking is to immediately chat to finance about how they track things: you need to know whether the same asset IDs are used and whether serial numbers are tracked by finance. I was pleased to discover that my colleagues were all honourable people as there was no apparent evaporation of valuable assets even though there was little ability to discover who might have been the last person to use some of the assets.

One problem that I’ve seen at many places is treating small items like keyboards and mice as “assets”. I think that anything that is worth less than 1 hour’s pay at the minimum wage (the price of a typical PC keyboard or mouse) isn’t worth tracking, treat it as a disposable item. If you hire a programmer who requests an unusually expensive keyboard or mouse (as some do) it still won’t be a lot of money when compared to their salary. Some of the older keyboards and mice that companies have are nasty, months of people eating lunch over them leaves them greasy and sticky. I think that the best thing to do with the keyboards and mice is to give them away when people leave and when new people join the company buy new hardware for them. If a company can’t spend $25 on a new keyboard and mouse for each new employee then they either have a massive problem of staff turnover or a lack of priority on morale.

A breadmaker loaf my kids will actually eat


My dad asked me to document some of my baking experiments from the recent natural disasters, which I wanted to do anyway so that I could remember the recipes. It’s taken me a while to get around to it though, because animated GIFs on reddit are a terrible medium for recipe storage, and because I’ve been distracted by other shiny objects. That said, let’s start with the basics: a breadmaker bread that my kids will actually eat.

A loaf of bread baked in the oven

This recipe took a bunch of iterations to get right over the last year or so, but I’ll spare you the long boring details. However, I suspect part of the problem is that the recipe varies by bread maker. Oh, and the salt is really important: don’t skip the salt!

Wet ingredients (add first)

  • 1.5 cups of warm water (we have an instantaneous gas hot water system, so I pick 42 degrees)
  • 0.25 cups of oil (I use bran oil)

Dry ingredients (add second)

I just kind of chuck these in, although I tend to put the non-flour ingredients in a corner together for reasons that I can’t explain.

  • 3.5 cups of bakers flour (must be bakers flour, not plain flour)
  • 2 teaspoons of instant yeast (we keep it in the freezer in a big packet, not the sachets)
  • 4 teaspoons of white sugar
  • 1 teaspoon of salt
  • 2 teaspoons of bread improver

I then just let my bread maker do its thing, which takes about three hours including baking. If I am going to bake the bread in the oven, then the dough takes about two hours, but I let the dough rise for another 30 to 60 minutes before baking.

A loaf of bread from the bread maker

I think to be honest that the result is better from the oven, but a little more work. The bread maker loaves are a bit prone to collapsing (you can see it starting on the example above), and there is a big kneading hook indent in the middle of the bottom of the loaf.

The oven baking technique took a while to develop, but I’ll cover that in a later post.


May 06, 2020

About Reopening Businesses

Currently there is political debate about when businesses should be reopened after the Covid19 quarantine.

Small Businesses

One argument for reopening things is for the benefit of small businesses. The first thing to note is that the protests in the US say “I need a haircut” not “I need to cut people’s hair”. Small businesses won’t benefit from reopening sooner.

For every business there is a certain minimum number of customers needed to be profitable. There are many comments from small business owners that want it to remain shutdown. When the government has declared a shutdown and paused rent payments and provided social security to employees who aren’t working the small business can avoid bankruptcy. If they suddenly have to pay salaries or make redundancy payouts and have to pay rent while they can’t make a profit due to customers staying home they will go bankrupt.

Many restaurants and cafes make little or no profit at most times of the week (I used to be 1/3 owner of an Internet cafe and know this well). For such a company to be viable you have to be open most of the time so customers can expect you to be open. Generally you don’t keep a cafe open at 3PM to make money at 3PM, you keep it open so people can rely on there being a cafe open there, someone who buys a can of soda at 3PM one day might come back for lunch at 1:30PM the next day because they know you are open. A large portion of the opening hours of most retail companies can be considered as either advertising for trade at the profitable hours or as loss making times that you can’t close because you can’t send an employee home for an hour.

If you have seating for 28 people (as my cafe did) then for about half the opening hours you will probably have 2 or fewer customers in there at any time, for about a quarter the opening hours you probably won’t cover the salary of the one person on duty. The weekend is when you make the real money, especially Friday and Saturday nights when you sometimes get all the seats full and people coming in for takeaway coffee and snacks. On Friday and Saturday nights the 60 seat restaurant next door to my cafe used to tell customers that my cafe made better coffee. It wasn’t economical for them to have a table full for an hour while they sell a few cups of coffee, they wanted customers to leave after dessert and free the table for someone who wants a meal with wine (alcohol is the real profit for many restaurants).

The plans for reopening with social distancing mean that a 28 seat cafe can only have 14 chairs or fewer (some plans have 25% capacity which would mean 7 people maximum). That means decreasing the revenue of the most profitable times by 50% to 75% while also not decreasing the operating costs much. A small cafe has 2-3 staff when it’s crowded so there’s no possibility of reducing staff by 75% when reducing the revenue by 75%.

My Internet cafe would have closed immediately if forced to operate in the proposed social distancing model. It would have been 1/4 of the trade and about 1/8 of the profit at the most profitable times, even if enough customers are prepared to visit – and social distancing would kill the atmosphere. Most small businesses are barely profitable anyway, most small businesses don’t last 4 years in normal economic circumstances.

This reopen movement is about cutting unemployment benefits not about helping small business owners. Destroying small businesses is also good for big corporations, kill the small cafes and restaurants and McDonald’s and Starbucks will win. I think this is part of the motivation behind the astroturf campaign for reopening businesses.

Forbes has an article about this [1].

Psychological Issues

Some people claim that we should reopen businesses to help people who have psychological problems from isolation, to help victims of domestic violence who are trapped at home, to stop older people being unemployed for the rest of their lives, etc.

Here is one article with advice for policy makers from domestic violence experts [2]. One thing it mentions is that the primary US federal government program to deal with family violence had a budget of $130M in 2013. The main thing that should be done about family violence is to make it a priority at all times (not just when it can be a reason for avoiding other issues) and allocate some serious budget to it. An agency that deals with problems that affect families and only has a budget of $1 per family per year isn’t going to be able to do much.

There are ongoing issues of people stuck at home for various reasons. We could work on better public transport to help people who can’t drive. We could work on better healthcare to help some of the people who can’t leave home due to health problems. We could have more budget for carers to help people who can’t leave home without assistance. Wanting to reopen restaurants because some people feel isolated is ignoring the fact that social isolation is a long term ongoing issue for many people, and that many of the people who are affected can’t even afford to eat at a restaurant!

Employment discrimination against people in the 50+ age range is an ongoing thing, many people in that age range know that if they lose their job and can’t immediately find another they will be unemployed for the rest of their lives. Reopening small businesses won’t help that, businesses running at low capacity will have to lay people off and it will probably be the older people. Also the unemployment system doesn’t deal well with part time work. The Australian system (which I think is similar to most systems in this regard) reduces the unemployment benefits by $0.50 for every dollar that is earned in part time work, that effectively puts people who are doing part time work because they can’t get a full-time job in the highest tax bracket! If someone is going to pay for transport to get to work, work a few hours, then get half the money they earned deducted from unemployment benefits it hardly makes it worthwhile to work. While the exact health impacts of Covid19 aren’t well known at this stage it seems very clear that older people are disproportionately affected, so forcing older people to go back to work before there is a vaccine isn’t going to help them.

When it comes to these discussions I think we should be very suspicious of people who raise issues they haven’t previously shown interest in. If the discussion of reopening businesses seems to be someone’s first interest in the issues of mental health, social security, etc then they probably aren’t that concerned about such issues.

I believe that we should have a Universal Basic Income [3]. I believe that we need to provide better mental health care and challenge the gender ideas that hurt men and cause men to hurt women [4]. I believe that we have significant ongoing problems with inequality not small short term issues [5]. I don’t think that any of these issues require specific changes to our approach to preventing the transmission of disease. I also think that we can address multiple issues at the same time, so it is possible for the government to devote more resources to addressing unemployment, family violence, etc while also dealing with a pandemic.

May 03, 2020

Backing up to a GnuBee PC 2

After installing Debian buster on my GnuBee, I set it up for receiving backups from my other computers.

Software setup

I started by configuring it like a typical server but without a few packages that either take a lot of memory or CPU:

I changed the default hostname:

  • /etc/hostname: foobar
  • /etc/mailname: foobar.example.com
  • /etc/hosts: 127.0.0.1 foobar.example.com foobar localhost

and then installed the avahi-daemon package to be able to reach this box using foobar.local.

I noticed the presence of a world-writable directory and so I tightened the security of some of the default mount points by putting the following in /etc/rc.local:

chmod 755 /etc/network
exit 0

Hardware setup

My OS drive (/dev/sda) is a small SSD so that the GnuBee can run silently when the spinning disks aren't needed. To hold the backup data on the other hand, I got three 4-TB drives which I set up in a RAID-5 array. If the data were valuable, I'd use RAID-6 instead since it can survive two drives failing at the same time, but in this case since it's only holding backups, I'd have to lose the original machine at the same time as two of the 3 drives, a very unlikely scenario.
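
To make the RAID-5 versus RAID-6 trade-off concrete, here is a rough sketch of the usable capacity of an array of identical drives: RAID-5 spends one drive's worth of space on parity and RAID-6 spends two. The usable_capacity helper is hypothetical, and the minimum drive counts it enforces (3 for RAID-5, 4 for RAID-6) are the conventional ones, which is an assumption rather than something taken from the setup described here.

```rust
// Hypothetical helper for illustration only.
// Returns the usable capacity in the same unit as `drive_size`.
// `parity_drives` is 1 for RAID-5 and 2 for RAID-6.
fn usable_capacity(drives: u32, drive_size: u32, parity_drives: u32) -> Option<u32> {
    // Assumed conventional minimums: 3 drives for RAID-5, 4 for RAID-6.
    if parity_drives == 0 || drives < parity_drives + 2 {
        return None;
    }
    Some((drives - parity_drives) * drive_size)
}

fn main() {
    // Three 4-TB drives in RAID-5, as in the setup above.
    println!("RAID-5, 3 x 4 TB: {:?} TB", usable_capacity(3, 4, 1));
    // RAID-6 would need a fourth drive to offer the same usable space.
    println!("RAID-6, 4 x 4 TB: {:?} TB", usable_capacity(4, 4, 2));
}
```

Under these assumptions, surviving a second drive failure costs one extra drive for the same usable space.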

I created new gpt partition tables on /dev/sdb, /dev/sdc, /dev/sdd and used fdisk to create a single partition of type 29 (Linux RAID) on each of them.

Then I created the RAID array:

mdadm /dev/md127 --create -n 3 --level=raid5 -a /dev/sdb1 /dev/sdc1 /dev/sdd1

and waited more than 24 hours for that operation to finish. Next, I formatted the array:

mkfs.ext4 -m 0 /dev/md127

and added the following to /etc/fstab:

/dev/md127 /mnt/data/ ext4 noatime,nodiratime 0 2

To reduce unnecessary noise and reduce power consumption, I also installed hdparm:

apt install hdparm

and configured all spinning drives to spin down after being idle for 10 minutes by putting the following in /etc/hdparm.conf:

/dev/sdb {
       spindown_time = 120
}

/dev/sdc {
       spindown_time = 120
}

/dev/sdd {
       spindown_time = 120
}

and then reloaded the configuration:

 /usr/lib/pm-utils/power.d/95hdparm-apm resume
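
The spindown_time value of 120 above corresponds to 10 minutes because hdparm counts values from 1 to 240 in units of 5 seconds, and values from 241 to 251 in units of 30 minutes. A minimal sketch of that conversion follows; the spindown_seconds helper is hypothetical, and the special values 0 and 252 to 255 are deliberately left unmodelled.

```rust
// Hypothetical helper: convert an hdparm spindown_time (-S) value to seconds.
fn spindown_seconds(value: u8) -> Option<u32> {
    match value {
        // 1..=240: multiples of 5 seconds (5 seconds to 20 minutes)
        1..=240 => Some(u32::from(value) * 5),
        // 241..=251: 1 to 11 units of 30 minutes
        241..=251 => Some(u32::from(value - 240) * 30 * 60),
        // 0 disables the timeout; 252..=255 are special cases not modelled here
        _ => None,
    }
}

fn main() {
    match spindown_seconds(120) {
        Some(secs) => println!("spindown_time = 120 is {} minutes", secs / 60),
        None => println!("value not handled by this sketch"),
    }
}
```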

Finally I set up smartmontools by putting the following in /etc/smartd.conf:

/dev/sda -a -o on -S on -s (S/../.././02|L/../../6/03)
/dev/sdb -a -o on -S on -s (S/../.././02|L/../../6/03)
/dev/sdc -a -o on -S on -s (S/../.././02|L/../../6/03)
/dev/sdd -a -o on -S on -s (S/../.././02|L/../../6/03)

and restarted the daemon:

systemctl restart smartd.service

Backup setup

I started by using duplicity since I have been using that tool for many years, but a 190GB backup took around 15 hours on the GnuBee with gigabit ethernet.

After a friend suggested it, I took a look at restic and I have to say that I am impressed. The same backup finished in about half the time.
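
To put those durations in perspective, here is a back-of-the-envelope throughput calculation. It assumes decimal units (1 GB = 1000 MB) and reads "about half the time" as 7.5 hours; both are assumptions, and throughput_mb_per_s is a hypothetical helper.

```rust
// Hypothetical helper: average throughput in MB/s for a transfer of
// `gigabytes` (decimal GB, an assumption) that takes `hours`.
fn throughput_mb_per_s(gigabytes: f64, hours: f64) -> f64 {
    (gigabytes * 1000.0) / (hours * 3600.0)
}

fn main() {
    let duplicity = throughput_mb_per_s(190.0, 15.0);
    // "about half the time" is taken here as 7.5 hours, which is an assumption
    let restic = throughput_mb_per_s(190.0, 7.5);
    println!("duplicity: {:.1} MB/s, restic: {:.1} MB/s", duplicity, restic);
}
```

Either figure is far below the 125 MB/s that a gigabit link can carry in principle, which suggests the network itself is not the limiting factor.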

User and ssh setup

After hardening the ssh setup as I usually do, I created a user account for each machine needing to back up onto the GnuBee:

adduser machine1
adduser machine1 sshuser
adduser machine1 sftponly
chsh machine1 -s /bin/false

and then matching directories under /mnt/data/home/:

mkdir /mnt/data/home/machine1
chown machine1:machine1 /mnt/data/home/machine1
chmod 700 /mnt/data/home/machine1

Then I created a custom ssh key for each machine:

ssh-keygen -f /root/.ssh/foobar_backups -t ed25519

and placed the public key (the .pub file that ssh-keygen generates alongside the private key) in /home/machine1/.ssh/authorized_keys on the GnuBee.

On each machine, I added the following to /root/.ssh/config:

Host foobar.local
    User machine1
    Compression no
    Ciphers aes128-ctr
    IdentityFile /root/backup/foobar_backups
    IdentitiesOnly yes
    ServerAliveInterval 60
    ServerAliveCountMax 240

The reason for setting the ssh cipher and disabling compression is to speed up the ssh connection as much as possible given that the GnuBee has a very small RAM bandwidth.
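
The config above also sets ServerAliveInterval and ServerAliveCountMax. The client sends a keepalive after each interval without traffic and gives up once the maximum number of them go unanswered, so the product of the two is roughly how long an unresponsive GnuBee is tolerated before ssh disconnects. A small sketch of that arithmetic, using a hypothetical unresponsive_timeout helper:

```rust
use std::time::Duration;

// Hypothetical helper: approximate time before ssh disconnects from an
// unresponsive server, given ServerAliveInterval (in seconds) and
// ServerAliveCountMax.
fn unresponsive_timeout(interval_secs: u64, count_max: u64) -> Duration {
    Duration::from_secs(interval_secs * count_max)
}

fn main() {
    let timeout = unresponsive_timeout(60, 240);
    println!("ssh gives up after about {} hours", timeout.as_secs() / 3600);
}
```

With the values used here, a slow or busy GnuBee has about four hours to answer before a backup run is aborted.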

Another performance-related change I made on the GnuBee was switching to the internal sftp server by putting the following in /etc/ssh/sshd_config:

Subsystem      sftp    internal-sftp

Restic script

After reading through the excellent restic documentation, I wrote the following backup script, based on my old duplicity script, to reuse on all of my computers:

# Configure for each host
PASSWORD="XXXX"  # use `pwgen -s 64` to generate a good random password
BACKUP_HOME="/root/backup"
REMOTE_URL="sftp:foobar.local:"
RETENTION_POLICY="--keep-daily 7 --keep-weekly 4 --keep-monthly 12 --keep-yearly 2"

# Internal variables
SSH_IDENTITY="IdentityFile=$BACKUP_HOME/foobar_backups"
EXCLUDE_FILE="$BACKUP_HOME/exclude"
PKG_FILE="$BACKUP_HOME/dpkg-selections"
PARTITION_FILE="$BACKUP_HOME/partitions"

# If the list of files has been requested, only do that
if [ "$1" = "--list-current-files" ]; then
    RESTIC_PASSWORD=$PASSWORD restic --quiet -r $REMOTE_URL ls latest
    exit 0

# Show list of available snapshots
elif [ "$1" = "--list-snapshots" ]; then
    RESTIC_PASSWORD=$PASSWORD restic --quiet -r $REMOTE_URL snapshots
    exit 0

# Restore the given file
elif [ "$1" = "--file-to-restore" ]; then
    if [ "$2" = "" ]; then
        echo "You must specify a file to restore"
        exit 2
    fi
    RESTORE_DIR="$(mktemp -d ./restored_XXXXXXXX)"
    RESTIC_PASSWORD=$PASSWORD restic --quiet -r $REMOTE_URL restore latest --target "$RESTORE_DIR" --include "$2" || exit 1
    echo "$2 was restored to $RESTORE_DIR"
    exit 0

# Delete old backups
elif [ "$1" = "--prune" ]; then
    # Expire old backups
    RESTIC_PASSWORD=$PASSWORD restic --quiet -r $REMOTE_URL forget $RETENTION_POLICY

    # Delete files which are no longer necessary (slow)
    RESTIC_PASSWORD=$PASSWORD restic --quiet -r $REMOTE_URL prune
    exit 0

# Catch invalid arguments
elif [ "$1" != "" ]; then
    echo "Invalid argument: $1"
    exit 1
fi

# Check the integrity of existing backups
RESTIC_PASSWORD=$PASSWORD restic --quiet -r $REMOTE_URL check || exit 1

# Dump list of Debian packages
dpkg --get-selections > $PKG_FILE

# Dump partition tables from harddrives
/sbin/fdisk -l /dev/sda > $PARTITION_FILE
/sbin/fdisk -l /dev/sdb >> $PARTITION_FILE

# Do the actual backup
RESTIC_PASSWORD=$PASSWORD restic --quiet --cleanup-cache -r $REMOTE_URL backup / --exclude-file $EXCLUDE_FILE

I run it with the following cronjob in /etc/cron.d/backups:

30 8 * * *    root  ionice nice nocache /root/backup/backup-machine1-to-foobar
30 2 * * Sun  root  ionice nice nocache /root/backup/backup-machine1-to-foobar --prune

in a way that doesn't impact the rest of the system too much.

Finally, I printed a copy of each of my backup scripts, using enscript, to stash in a safe place:

enscript --highlight=bash --style=emacs --output=- backup-machine1-to-foobar | ps2pdf - > foobar.pdf

This is actually a pretty important step since without the password, you won't be able to decrypt and restore what's on the GnuBee.

May 02, 2020

Audiobooks – April 2020

Cockpit Confidential: Everything You Need to Know About Air Travel: Questions, Answers, and Reflections by Patrick Smith

Lots of “you always wanted to know” & “this is how it really is” bits about commercial flying. Good fun 4/5

The Day of the Jackal by Frederick Forsyth

A very tightly written thriller about a fictional 1963 plot to assassinate French President Charles de Gaulle. Fast moving, detailed and captivating 5/5

Topgun: An American Story by Dan Pedersen

Memoir from the first officer in charge of the US Navy’s Top Gun school. A mix of his life & career, the school and US Navy air history (especially during Vietnam). Excellent 4/5

Radicalized: Four Tales of Our Present Moment by Cory Doctorow

4 short stories set in more-or-less the present day. They all work fairly well. Worth a read. Spoilers in the link. 3/5

On the Banks of Plum Creek: Little House Series, Book 4 by Laura Ingalls Wilder

The family settle in Minnesota and build a new farm. Various major and minor adventures. I’m struck how few possessions people had back then. 3/5

My Father’s Business: The Small-Town Values That Built Dollar General into a Billion-Dollar Company by Cal Turner Jr.

A mix of personal and company history. I found the early story of the company and personal stuff the most interesting. 3/5

You Can’t Fall Off the Floor: And Other Lessons from a Life in Hollywood by Harris and Nick Katleman

Memoir by a former studio exec and head. Lots of funny and interesting stories from his career, featuring plenty of famous names. 4/5

The Wave: In Pursuit of the Rogues, Freaks and Giants of the Ocean by Susan Casey

75% about Big-wave Tow-Surfers with chapters on Scientists and Shipping industry people mixed in. Competent but author’s heart seemed mostly in the surfing. 3/5


April 27, 2020

Install the COVIDSafe app

I can’t think of a more unequivocal title than that. 🙂

The Australian government doesn’t have a good track record of either launching publicly visible software projects, or respecting privacy, so I’ve naturally been sceptical of the contact tracing app since it was announced. The good news is, while it has some relatively minor problems, it appears to be a solid first version.

Privacy

While the source code is yet to be released, the Android version has already been decompiled, and public analysis is showing that it only collects necessary information, and only uploads contact information to the government servers when you press the button to upload (you should only press that button if you actually get COVID-19, and are asked to upload it by your doctor).

The legislation around the app is also clear that the data you upload can only be accessed by state health officials. Commonwealth departments have no access, neither do non-health departments (eg, law enforcement, intelligence).

Technical

It does what it’s supposed to do, and hasn’t been found to open you up to risks by installing it. There are a lot of people digging into it, so I would expect any significant issues to be found, reported, and fixed quite quickly.

Some parts of it are a bit rushed, and the way it scans for contacts could be more battery efficient (that should hopefully be fixed in the coming weeks when Google and Apple release updates that these contact tracing apps can use).

If it produces useful data, however, I’m willing to put up with some quirks. 🙂

Usefulness

I’m obviously not an epidemiologist, but those I’ve seen talk about it say that yes, the data this app produces will be useful for augmenting the existing contact tracing efforts. There were some concerns that it could produce a lot of junk data that wastes time, but I trust the expert contact tracing teams to filter and prioritise the data they get from it.

Install it!

The COVIDSafe site has links to the app in Apple’s App Store, as well as Google’s Play Store. Setting it up takes a few minutes, and then you’re done!

April 26, 2020

YouTube Channels I subscribe to in April 2020

I did a big twitter thread of the YouTube channels I am following. Below is a copy of the tweets. They are a quick description of the channel and a link to a sample video.

Lots of pop-Science and TV/Movie analysis channels plus a few on other topics.

I should mention that I watch the majority of YouTube videos at speed 1.5x since they usually speak quite slowly. To speed up videos click on the settings “cog” and then select “Playback Speed”. YouTube lets you go up to 2x.


Chris Stuckmann reviews movies. During normal times he does a couple per week. Mostly current releases with some old ones. His reviews are low-spoiler although sometimes he’ll do an extra “Spoiler Review”. Usually around 6 minutes long.
Star Wars: The Rise of Skywalker – Movie Review

Wendover Productions does explainer videos. Air & Sea travel are quite common topics. Usually a bit better researched than some of the other channels and a little longer at around 12 minutes. Around 1 video per week.
The Logistics of the US Census

City Beautiful is a channel about cities and City planning. 1-2 videos per month. Usually around 10 minutes. Pitched for the amateur city and planning enthusiast
Where did the rules of the road come from?

PBS Eons does videos about the history of life on Earth. Lots of Dinosaurs, early humans and the like. Run and advised by experts so info is great quality. Links to refs! Accessible but dives into the detail. Around 1 video/week. About 10 minutes each.
How the Egg Came First

Pitch Meetings are a writer pitching a real (usually recent) movie or show to a studio exec. Both are played by Ryan George. Very funny. Part of the Screen Rant channel but I don’t watch their other stuff
Playlist
Netflix’s Tiger King Pitch Meeting

MrMobile [Michael Fisher] reviews Phones, Laptops, Smart Watches & other tech gadgets. Usually about one video/week. I like the descriptive style and good production values. Not too much spec flooding.
A Stunning Smartwatch With A Familiar Failing – New Moto 360 Review

Verge Science does professional level stories about a range of Science topics. They usually are out in the field with Engineers and scientists.
Why urban coyote sightings are on the rise

Alt Shift X do detailed explainer videos about Books & TV Shows like Game of Thrones, Watchmen & Westworld. Huge amounts of detail and a great style with a wall of pictures. Weekly videos when shows are on plus subscriber extras.
Watchmen Explained (original comic)

The B1M talks about building and construction projects. Many videos are done with cooperation of the architects or building companies so a bit fluffy at times. But good production values and interesting topics.
The World’s Tallest Modular Hotel

CineFix does a variety of Movie-related videos. Over the last year they have only been putting out about one or two per month, mostly high quality. A few years ago they were at higher volume and had more throw-aways.
Jojo Rabbit – What’s the Difference?

Marques Brownlee (MKBHD) does tech reviews. Mainly phones but also other gear and the odd special. His videos are extremely high quality and well researched. Averaging 2 videos per week.
Samsung Galaxy S20 Ultra Review: Attack of the Numbers!

How it Should have Ended does cartoons of funny alternative endings for movies. Plus some other long running series. Usually only a few minutes long.
Avengers Endgame Alternate HISHE

Power Play Chess is a Chess channel from Daniel King. He usually covers 1 round/day from major tournaments as well as reviewing older games and other videos.
World Champion tastes the bullet | Firouzja vs Carlsen | Lichess Bullet match 2020

Tom Scott makes explainer videos mostly about science, technology and geography. Often filmed on site rather than being talks over pictures like other channels.
Inside The Billion-Euro Nuclear Reactor That Was Never Switched On

Screen Junkies does stuff about movies. I mostly watch their “Honest Trailers” but they sometimes do “Serious Questions” which are good too.
Honest Trailers | Terminator: Dark Fate

Half as Interesting is an offshoot of Wendover Productions (see above). It does shorter weekly videos of 3-5 minutes on a quick amusing fact or happening (that doesn’t justify a longer video).
United Airlines’ Men-Only Flights

Red Team Review is another movie and TV review channel. I was mostly watching them when Game of Thrones was on and since then they have had a bit less content. They are making some Game of Thrones videos narrated by the TV actors though
Game of Thrones Histories & Lore – The Rains of Castamere

Signum University do online classes about Fantasy (especially Tolkien) and related literature. Their channel features their classes and related videos. I mainly follow “Exploring The Lord of the Rings”. Often sounds better at 2x or 3x speed.
A Wizard of Earthsea: Session 01 – Mageborn

The Nerdwriter does approx monthly videos. Usually about a specific type of art, a painting or film making technique. Very high quality
How Walter Murch Worldized Film Sound

Real Life Lore does infotainment videos. “Answers to questions that you’ve never asked. Mostly over topics like history, geography, economics and science”.
This Was the World’s Most Dangerous Amusement Park

Janice Fung is a Sydney based youtuber who makes videos mostly about food and travel. She puts out 2 videos most weeks.
I Made the Viral Tik Tok Frothy DALGONA COFFEE! (Whipped Coffee Without Mixer!!)

Real Engineering is a bit more technical than the average popsci channel. They especially like doing videos covering flight dynamics, but they cover lots of other topics.
How The Ford Model T Took Over The World

Just Write by Sage Hyden puts out a video roughly once a month. They are essays usually about writing and usually tied into a recent movie or show.
A Disney Monopoly Is A Problem (According To Disney’s Recess)

CGP Grey makes high quality explainer videos, usually with lots of animation. Around one every month.
The Trouble With Tumbleweed

Lessons from the Screenplay are “videos that analyze movie scripts to examine exactly how and why they are so good at telling their stories”
Casino Royale — How Action Reveals Character

HaxDogma is another TV Show review/analysis channel. I started watching him for his Watchmen Series videos and now watch his Westworld ones.
Official Westworld Trailer Breakdown + 3 Hidden Trailers

Lindsay Ellis does videos mostly about pop culture, usually movies. These days she only does a few a year but they are usually 20+ minutes.
The Hobbit: A Long-Expected Autopsy (Part 1/2)

A bonus couple of recommended courses on Crash Course:
Crash Course Astronomy with Phil Plait
Crash Course Computer Science by Carrie Anne Philbin

April 24, 2020

Disabling mail sending from your domain

I noticed that I was receiving some bounced email notifications for a domain I own (cloud.geek.nz) and use to host my blog. These notifications were all for spam messages spoofing the From address, since I do not use that domain for email.

I decided to try setting a strict DMARC policy to see if DMARC-using mail servers (e.g. GMail) would then drop these spoofed emails without notifying me about it.

I started by setting this initial DMARC policy in DNS in order to monitor the change:

@ TXT v=spf1 -all
_dmarc TXT v=DMARC1; p=none; ruf=mailto:dmarc@fmarier.org; sp=none; aspf=s; fo=0:1:d:s;

Then I waited three weeks without receiving anything before updating the relevant DNS records to this final DMARC policy:

@ TXT v=spf1 -all
_dmarc TXT v=DMARC1; p=reject; sp=reject; aspf=s;

This policy states that nobody is allowed to send emails for this domain and that any incoming email claiming to be from this domain should be silently rejected.
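
To confirm what a domain actually publishes, the individual tags can be pulled out of the TXT record. The following is only a sketch: dmarc_tag is a hypothetical helper (not part of any DMARC tooling), and the commented-out lookup assumes dig is installed and uses a placeholder domain.

```shell
# Hypothetical helper: print the value of one tag from a DMARC record.
# Usage: dmarc_tag "<record text>" <tag>
dmarc_tag() {
    printf '%s\n' "$1" | tr -d '"' | tr ';' '\n' |
        sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' |
        awk -F= -v tag="$2" '$1 == tag { print substr($0, length(tag) + 2); exit }'
}

# Example against the final policy shown above.
record='v=DMARC1; p=reject; sp=reject; aspf=s;'
dmarc_tag "$record" p
dmarc_tag "$record" aspf

# Against live DNS (needs dig and network access), something like:
#   dmarc_tag "$(dig +short TXT _dmarc.example.org)" p
```

A tag that is absent from the record simply produces no output, which makes it easy to spot a policy that was not published as intended.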

I haven't noticed any bounce notifications for messages spoofing this domain in a while, so maybe it's working?

FreeDV Beacon Maintenance

There’s been some recent interest in the FreeDV Beacon project, originally developed back in 2015. A FreeDV beacon was operating in Sunbury, VK3, for several years and was very useful for testing FreeDV.

After being approached by John (VK3IC) and Bob (VK4YA), I decided to dust off the software and bring it across to a GitHub repo. It’s now running on my laptop happily and I hope John and Bob will soon have some beacons running on the air.

I’ve added support for FreeDV 700C and 700D modes, finding a tricky bug in the process. I really should read the instructions for my own API!

Thanks also to Richard (KF5OIM) for help with the CMake build system.


April 18, 2020

Accessing USB serial devices in Fedora Silverblue

One of the things I do a lot on my Fedora machines is talk to devices via USB serial. While a device is correctly detected at /dev/ttyUSB0 and owned by the dialout group, adding myself to that group doesn’t work as it can’t be found. This is because under Silverblue, there are two different group files (/usr/lib/group and /etc/group) with different content.

There are some easy ways to solve this, for example we can create the matching dialout group or write a udev rule. Let’s take a look!

On the host with groups

If you try to add yourself to the dialout group it will fail.

sudo gpasswd -a ${USER} dialout
gpasswd: group 'dialout' does not exist in /etc/group

Trying to re-create the group will also fail as it’s already in use.

sudo groupadd dialout -r -g 18
groupadd: GID '18' already exists

So instead, we can simply grab the entry from the OS group file and add it to /etc/group ourselves.

grep ^dialout: /usr/lib/group |sudo tee -a /etc/group
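
Running that append twice would leave a duplicate line in /etc/group. A guarded variant might look like the sketch below; copy_group_entry is a hypothetical helper, and the file paths are parameters so it can be tried on scratch copies first.

```shell
# Hypothetical helper: append group NAME from SRC to DST unless DST
# already has an entry for it.
# Usage: copy_group_entry NAME SRC DST
copy_group_entry() {
    name=$1 src=$2 dst=$3
    if grep -q "^${name}:" "$dst"; then
        echo "${name} already present in ${dst}"
        return 0
    fi
    entry=$(grep "^${name}:" "$src") || {
        echo "${name} not found in ${src}" >&2
        return 1
    }
    printf '%s\n' "$entry" >> "$dst"
}

# On Silverblue this would need to run as root, for example:
#   copy_group_entry dialout /usr/lib/group /etc/group
```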

Now we are able to add ourselves to the dialout group!

sudo gpasswd -a ${USER} dialout

Activate that group in our current shell.

newgrp dialout

And now we can use a tool like screen to talk to the device (note that you will need to have installed screen with rpm-ostree and rebooted first).

screen /dev/ttyUSB0 115200

And that’s it. We can now talk to USB serial devices on the host.

Inside a container with udev

Inside a container is a little more tricky as the dialout group is not passed into it. Thus, inside the container the device is owned by nobody and the user will have no permissions to read or write to it.

One way to deal with this and still use the regular toolbox command is to create a udev rule and make yourself the owner of the device on the host, instead of root.

To do this, we create a generic udev rule for all usb-serial devices.

cat << EOF | sudo tee /etc/udev/rules.d/50-usb-serial.rules
SUBSYSTEM=="tty", SUBSYSTEMS=="usb-serial", OWNER="${USER}"
EOF

If you need to create a more specific rule, you can find other bits to match by (like kernel driver, etc) with the udevadm command.

udevadm info -a -n /dev/ttyUSB0
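
As an illustration of a more specific rule, the sketch below prints one that matches a single adapter by its USB vendor and product IDs. usb_serial_rule is a hypothetical helper, and the 0403:6001 pair is only an example; the real values should come from the udevadm output for your own device.

```shell
# Hypothetical helper: print a udev rule that makes OWNER the owner of
# USB serial ttys from one specific adapter, matched by its USB IDs.
# Usage: usb_serial_rule OWNER VENDOR_ID PRODUCT_ID
usb_serial_rule() {
    owner=$1 vendor=$2 product=$3
    printf 'SUBSYSTEM=="tty", ATTRS{idVendor}=="%s", ATTRS{idProduct}=="%s", OWNER="%s"\n' \
        "$vendor" "$product" "$owner"
}

# Example IDs only; substitute the ones reported for your device.
usb_serial_rule "$(id -un)" 0403 6001
```

The printed line could then be written to a file under /etc/udev/rules.d/ in the same way as the generic rule.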

Once you have your rule, reload udev.

sudo udevadm control --reload-rules
sudo udevadm trigger

Now, unplug your serial device and plug it back in. You should notice that it is now owned by your user.

ls -l /dev/ttyUSB0
crw-rw----. 1 csmart dialout 188, 0 Apr 18 20:53 /dev/ttyUSB0

It should also be the same inside the toolbox container now.

[21:03 csmart ~]$ toolbox enter
⬢[csmart@toolbox ~]$ ls -l /dev/ttyUSB0 
crw-rw----. 1 csmart nobody 188, 0 Apr 18 20:53 /dev/ttyUSB0

And of course, as this is inside a container, you can just dnf install screen or whatever other program you need.

Of course, if you’re happy to create the udev rule then you don’t need to worry about the groups solution on the host.

Making dnf on Fedora Silverblue a little easier with bash aliases

Fedora Silverblue doesn’t come with dnf because it’s an immutable operating system and uses a special tool called rpm-ostree to layer packages on top instead.

Most terminal work is designed to be done in containers with toolbox, but I still do a bunch of work outside of a container. Searching for packages to install with rpm-ostree still requires dnf inside a container, as rpm-ostree has no search function of its own.

I add these two aliases to my ~/.bashrc file so that using dnf to search or install into the default container is possible from a regular terminal. This just makes Silverblue a little bit more like what I’m used to with regular Fedora.

cat >> ~/.bashrc << EOF
alias sudo="sudo "
alias dnf="bash -c '#skip_sudo'; toolbox -y create 2>/dev/null; toolbox run sudo dnf"
EOF

If the default container doesn’t exist, toolbox creates it. Note that the alias for sudo has a space at the end. This tells bash to also check the next command word for alias expansion, which is what makes sudo work with aliases. Thus, we can make sure that both dnf and sudo dnf will work. The first part of the dnf alias is used to skip the sudo command so the rest is run as the regular user, which makes them both work the same.

We need to source that file or run a new bash session to pick up the aliases.

bash

Now we can just use the dnf command like normal. Search can be used to find packages to install with rpm-ostree, while installed packages will go into the default toolbox container (it behaves the same with and without sudo).

sudo dnf search vim
dnf install -y vim
The container is automatically created with dnf

To run vim from the example, enter the container and it will be there.

Vim in a container

You can do whatever you normally do with dnf, such as installing RPMs like RPMFusion and listing repos.

Installing RPMFusion RPMs into container
Listing repositories in the container

Anyway, just a little thing but it’s kind of helpful to me.

April 16, 2020

Crisis Proofing the Australian Economy

An Open Letter to Prime Minister Scott Morrison

To The Hon Scott Morrison MP, Prime Minister,

No doubt how to re-invigorate our economy is high on your mind, among other priorities in this time of crisis.

As you're acutely aware, the pandemic we're experiencing has accelerated a long-term high unemployment trajectory we were already on due to industry retraction, automation, off-shoring jobs etc.

Now is the right time to enact changes that will bring long-term crisis resilience, economic stability and prosperity to this nation.

  1. Introduce a 1% tax on all financial / stock / commodity market transactions.
  2. Use 100% of that to fund a Universal Basic Income for all adult Australian citizens.

Funding a Universal Basic Income will bring:

  • Economic resilience in times of emergency (bushfire, drought, pandemic)
  • Removal of the need for government financial aid in those emergencies
  • Removal of all forms of pension and unemployment benefits
  • A more predictable, reduced and balanced government budget
  • Dignity and autonomy to those impacted by an economic event or crisis
  • Space and security for the innovative amongst us to take entrepreneurial risks
  • A growth in social, artistic and economic activity that could not happen otherwise

This is both simple to collect and simple to distribute to all tax payers. It can be done both swiftly and sensibly, enabling you to remove the Job Keeper band aid and its related budgetary problems.

This is an opportunity to be seized, Mr Morrison.

There is also a second opportunity.

Post World War II, we had the Snowy River scheme. Today we have the housing affordability crisis, and many Australians will never own their own home. A public building programme to provide 25% of housing will create a permanent employment and building boom and, over time, resolve the housing affordability crisis.

If you cap repayments for those in public housing to 25% of their income, there will also be more disposable income circulating through the economy, creating prosperous times for all Australians.

Carpe diem, Mr Morrison.

Recognise the opportunity. Seize it.


Dear Readers,

If you support either or both of these ideas, please contact the Prime Minister directly and add your voice.

April 14, 2020

Exporting volumes from Cinder and re-creating COW layers

Today I wandered into a bit of a rat hole discovering how to export data from OpenStack Cinder volumes when you don’t have admin permissions, and I thought it was worth documenting here so I remember it for next time.

Let’s assume that you have a Cinder volume named “child1”, which is a 64gb volume originally cloned from “parent1”. parent1 is a 7.9gb VMDK, but the only way I can find to extract child1 is to convert it to a glance image and then download the entire volume as a raw. Something like this:

$ cinder upload-to-image $child1 "extract:$child1"

Where $child1 is the UUID of the Cinder volume. You then need to find the UUID of the image in Glance, which the Cinder upload-to-image command will have told you, but you can also find by searching Glance for your image named “extract:$child1”:

$ glance image-list | grep "extract:$child1"

You now need to watch that Glance image until the status of the image is “active”. It will go through a series of steps with names like “queued”, and “uploading” first.
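
The waiting step can be scripted as a simple polling loop. This is a sketch only: wait_for_active is a hypothetical helper that takes the name of any command printing the current status, and the glance parsing in the trailing comment assumes that "glance image-show" prints a table with a "status" row.

```shell
# Hypothetical helper: run STATUS_CMD repeatedly until it prints "active".
# Usage: wait_for_active STATUS_CMD [TRIES] [DELAY_SECONDS]
wait_for_active() {
    status_cmd=$1 tries=${2:-60} delay=${3:-10}
    attempt=1
    while [ "$attempt" -le "$tries" ]; do
        status=$($status_cmd) || status="error"
        echo "attempt ${attempt}: ${status}"
        if [ "$status" = "active" ]; then
            return 0
        fi
        sleep "$delay"
        attempt=$((attempt + 1))
    done
    echo "gave up after ${tries} attempts" >&2
    return 1
}

# Assumed status command (output format not verified here):
#   image_status() { glance image-show "$glance_uuid" | awk '$2 == "status" { print $4 }'; }
#   wait_for_active image_status
```

Keeping the status command separate from the loop means the same helper can be tested with a stub before pointing it at a real cloud.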

Now you can download the image from Glance:

$ glance image-download --file images/$child1.raw --progress $glance_uuid

And then delete the intermediate glance image:

$ glance image-delete $glance_uuid

I have a bad sample script which does this in my junk code repository if that is helpful.

What you have at the end of this is a 64gb raw disk file in my example. You can convert that file to qcow2 like this:

$ qemu-img convert $child1.raw $child1.qcow2

But you’re left with a 64gb qcow2 file for your troubles. I experimented with virt-sparsify to reduce the size of this image, but it doesn’t work in my case (no space is saved), I suspect because the disk image has multiple partitions because it originally came from a VMWare environment.

Luckily qemu-img can also re-create the COW layer that exists on the admin-only side of the public cloud barrier. You do this by rebasing the converted qcow2 file onto the original VMDK file like this:

$ qemu-img create -f qcow2 -b $child1.qcow2 $child1.delta.qcow2
$ qemu-img rebase -b $parent1.vmdk $child1.delta.qcow2

In my case I ended up with a 289mb $child1.delta.qcow2 file, which isn’t too shabby. It took about five minutes to produce that delta on my Google Cloud instance from a 7.9gb backing file and a 64gb upper layer.

April 11, 2020

Using Gogo WiFi on Linux

Gogo, the WiFi provider for airlines like Air Canada, is not available to Linux users even though it advertises "access using any Wi-Fi enabled laptop, tablet or smartphone". It is however possible to work around this restriction by faking your browser user agent.

I tried the User-Agent Switcher for Chrome extension on Chrome and Brave but it didn't work for some reason.

What did work was using Firefox and adding the following prefs in about:config to spoof its user agent to Chrome for Windows:

general.useragent.override=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36
general.useragent.updates.enabled=false
privacy.resistFingerprinting=false

The last two prefs are necessary in order for the hidden general.useragent.override pref to not be ignored.
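
If you would rather not type these into about:config by hand, the same three prefs can be placed in a user.js file in the Firefox profile directory. The sketch below is one way to do that; write_ua_prefs is a hypothetical helper and the profile directory is whatever path you pass in. Keep in mind that prefs in user.js are re-applied on every start until the lines are removed again.

```shell
# Hypothetical helper: append the three prefs to user.js in the given
# Firefox profile directory (supplied by the caller).
# Usage: write_ua_prefs PROFILE_DIR
write_ua_prefs() {
    profile_dir=$1
    ua='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36'
    {
        printf 'user_pref("general.useragent.override", "%s");\n' "$ua"
        printf 'user_pref("general.useragent.updates.enabled", false);\n'
        printf 'user_pref("privacy.resistFingerprinting", false);\n'
    } >> "$profile_dir/user.js"
}

# Example (the profile path is a placeholder):
#   write_ua_prefs "$HOME/.mozilla/firefox/<your profile directory>"
```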

Opt out of mandatory arbitration

As an aside, the Gogo terms of service automatically enroll you into mandatory arbitration unless you opt out by sending an email to customercare@gogoair.com within 30 days of using their service.

You may want to create an email template for this so that you can fire off a quick email to them as soon as you connect. I will probably write a script for it next time I use this service.

Fedora Silverblue is an amazing immutable desktop

I recently switched my regular Fedora 31 workstation over to the 31 Silverblue release. I’ve played with Project Atomic before and have been meaning to try it out more seriously for a while, but never had the time. Silverblue provided the catalyst to do that.

What this brings to the table is quite amazing and seriously impressive. The base OS is immutable and everyone’s install is identical. This means quality can be improved as there are fewer combinations and it’s easier to test. Upgrades to the next major version of Fedora are fast and secure. Instead of updating thousands of RPMs in-place, the new image is downloaded and the system reboots into it. As the underlying images don’t change, it also offers full rollback support.

This is similar to how platforms like Chrome OS and Android work, but thanks to ostree it’s now available for Linux desktops! That is pretty neat.

It doesn’t come with a standard package manager like dnf. Instead, any packages or changes you need to perform on the base OS are done using the rpm-ostree command, which actually layers them on top.

And while technically you can install anything using rpm-ostree, ideally this should be avoided as much as possible (some low level apps like shells and libvirt may require it, though). Flatpak apps and containers are the standard way to consume packages. As these are kept separate from the base OS, it also helps improve stability and reliability.

Installing Silverblue

I copied the Silverblue installer to a USB stick and booted it to do the install. As my Dell XPS has an NVIDIA card, I modified the installer’s kernel args and disabled the nouveau driver with the usual nouveau.modeset=0 to get the install GUI to show up.

I’m also running in UEFI mode and due to a bug you have to use a separate, dedicated /boot/efi partition for Silverblue (personally, I think that’s a good thing to do anyway). Otherwise, the install looks pretty much the same as regular Fedora and went smoothly.

Once installed, I blacklisted the nouveau driver and rebooted. To make these kernel arguments permanent, we don’t use grub2, we set kernel args with rpm-ostree.

rpm-ostree kargs --append=modprobe.blacklist=nouveau --append=rd.driver.blacklist=nouveau

The NVIDIA drivers from RPMFusion are supported, so following this I had to add the repositories and drivers as RPMs on the base image.

rpm-ostree install \
https://download1.rpmfusion.org/free/fedora/rpmfusion-free-release-31.noarch.rpm \
https://download1.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-31.noarch.rpm
systemctl reboot

Once rebooted I then installed the necessary packages and rebooted again to activate them.

rpm-ostree install akmod-nvidia xorg-x11-drv-nvidia-cuda libva-utils libva-vdpau-driver gstreamer1-libav
rpm-ostree kargs --append=nvidia-drm.modeset=1
systemctl reboot

That was the base setup complete, which all went pretty smoothly. What you’re left with is the base OS with GNOME and a few core apps.

GNOME in Silverblue

Working with Silverblue

Using Silverblue is a different way of working than I have been used to. As mentioned above, there is no dnf command and packages are layered on top of the base OS with the rpm-ostree command. Because this is a layer, installing a new RPM requires a reboot to activate it, which is quite painful when you’re in the middle of some work and realise you need a program.

The answer though, is to use more containers instead of RPMs as I’m used to.

Containers

As I wrote about in an earlier blog post, toolbox is a wrapper for setting up containers and complements Silverblue wonderfully. If you need to install any terminal apps, give this a shot. Creating and running a container is as simple as this.

toolbox create
toolbox enter
Container on Fedora Silverblue

Once inside your container use it like a normal Fedora machine (dnf is available!).

As rpm-ostree has no search function, using a container is the expected way to do this. Having created the container above, you can now use it (without entering it first) to perform package searches.

toolbox run dnf search vim

Apps

Graphical apps are managed with Flatpak, the new way to deliver secure, isolated programs on Linux. Silverblue is configured to use Fedora apps out of the box, and you can also add Flathub as a third party repo.

I experienced some small glitches with the Software GUI program when applying updates, but I don’t normally use it so I’m not sure if it’s just beta issues or not. As the default install is more sparse than usual, you’ll find yourself needing to install the apps you use. I really like this approach, it keeps the base system smaller and cleaner.

While Fedora provides their own Firefox package in Flatpak format (which is great), Mozilla also just recently started publishing their official package to Flathub. So, to install that, we simply add Flathub as a repository and install away!

flatpak remote-add flathub https://flathub.org/repo/flathub.flatpakrepo
flatpak update
flatpak install org.mozilla.firefox

After install, Firefox should appear as a regular app inside GNOME.

Official Firefox from Mozilla via Flatpak

If you need to revert to an earlier version of a Flatpak (which I did when I was testing out Firefox beta), you can fetch the remote log for the app, then update to a specific commit.

flatpak remote-info --log flathub-beta org.mozilla.firefox//beta
flatpak update \
--commit 908489d0a77aaa8f03ca8699b489975b4b75d4470ce9bac92e56c7d089a4a869 \
org.mozilla.firefox//beta

Replacing system packages

If you have installed a Flatpak, like Firefox, and no longer want to use the RPM version included in the base OS, you can use rpm-ostree to override it.

rpm-ostree override remove firefox

After a reboot, you will only see your Flatpak version.

Upgrades

I upgraded from 31 to the 32 beta, which was very fast by comparison to regular Fedora (because it just needs to download the new base image) and pretty seamless.

The only hiccup I had was needing to remove RPMFusion 31 release RPMs first, upgrade the base to 32, then install the RPMFusion 32 release RPMs. After that, I did an update for good measure.

rpm-ostree uninstall rpmfusion-nonfree-release rpmfusion-free-release
rpm-ostree rebase fedora:fedora/32/x86_64/silverblue
rpm-ostree install \
https://download1.rpmfusion.org/free/fedora/rpmfusion-free-release-32.noarch.rpm \
https://download1.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-32.noarch.rpm
systemctl reboot
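
Since the two release RPM URLs differ only by repository name and Fedora version, they can be generated rather than typed. A small sketch, where rpmfusion_release_urls is a hypothetical helper that follows the URL pattern used in the commands above:

```shell
# Hypothetical helper: print the RPMFusion release RPM URLs for a given
# Fedora version, using the same URL pattern as the commands above.
# Usage: rpmfusion_release_urls FEDORA_VERSION
rpmfusion_release_urls() {
    version=$1
    for repo in free nonfree; do
        printf 'https://download1.rpmfusion.org/%s/fedora/rpmfusion-%s-release-%s.noarch.rpm\n' \
            "$repo" "$repo" "$version"
    done
}

rpmfusion_release_urls 32

# On Silverblue the result could be passed straight to rpm-ostree:
#   rpm-ostree install $(rpmfusion_release_urls 32)
```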

Then post reboot, I did a manual update of the system.

rpm-ostree upgrade

You can see the current status of your system with the rpm-ostree command.

rpm-ostree status 

On my system you can see the ostree I’m using, the commit as well as both layered and local packages.

State: idle
AutomaticUpdates: disabled
Deployments:
● ostree://fedora:fedora/32/x86_64/silverblue
                   Version: 32.20200410.n.0 (2020-04-10T08:35:30Z)
                BaseCommit: d809af7c4f170a2175ffa1374827dd55e923209aec4a7fb4dfc7b87cd6c110c9
              GPGSignature: Valid signature by 97A1AE57C3A2372CCA3A4ABA6C13026D12C944D0
           LayeredPackages: akmod-nvidia git gstreamer1-libav ipmitool libva-utils libva-vdpau-driver libvirt
                            pass powertop screen tcpdump tmux vim virt-manager xorg-x11-drv-nvidia-cuda
             LocalPackages: rpmfusion-free-release-32-0.3.noarch rpmfusion-nonfree-release-32-0.4.noarch

  ostree://fedora:fedora/32/x86_64/silverblue
                   Version: 32.20200410.n.0 (2020-04-10T08:35:30Z)
                BaseCommit: d809af7c4f170a2175ffa1374827dd55e923209aec4a7fb4dfc7b87cd6c110c9
              GPGSignature: Valid signature by 97A1AE57C3A2372CCA3A4ABA6C13026D12C944D0
           LayeredPackages: akmod-nvidia git gstreamer1-libav ipmitool libva-utils libva-vdpau-driver libvirt
                            pass powertop screen tcpdump tmux vim virt-manager xorg-x11-drv-nvidia-cuda
             LocalPackages: rpmfusion-free-release-32-0.3.noarch rpmfusion-nonfree-release-32-0.4.noarch
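
If you only want the version you are currently booted into, the status text can be filtered. The sketch below relies on the layout shown above, where the booted deployment is marked with ●; booted_version is a hypothetical helper and not part of rpm-ostree.

```shell
# Hypothetical helper: read "rpm-ostree status" output on stdin and print
# the version of the booted deployment (the entry marked with ●).
booted_version() {
    awk '
        /^●/ { booted = 1; next }            # start of the booted deployment
        /ostree:\/\// { booted = 0 }          # start of any other deployment
        booted && $1 == "Version:" { print $2; exit }
    '
}

# Usage on a Silverblue host:
#   rpm-ostree status | booted_version
```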

To revert to the previous version temporarily, simply select it from the grub boot menu and you’ll go back in time. If you want to make this permanent, you can rollback to the previous state instead and then just reboot.

rpm-ostree rollback

Silverblue is really impressive and works well. I will continue to use it as my daily driver and see how it goes over time.

Tips

I have run into a couple of issues, mostly around using the Software GUI (which I don’t normally use). Mostly these were things like it listing updates for Flatpaks which were not actually there for update, and when you tried to update it didn’t do anything.

If you hit issues, you can try clearing out the Software data and loading the program again.

pkill gnome-software
rm -rf ~/.cache/gnome-software

If you need to, you can also clean out and refresh the rpm-ostree cache and do an update.

rpm-ostree cleanup -m
rpm-ostree update

You can also repair and update Flatpaks, if you need to.

flatpak repair
flatpak update

Also see

Making dnf on the host terminal a little easier with aliases.

Accessing USB serial devices on the host and in a toolbox container.

A temporary return to Australia due to COVID-19

The last few months have been a rollercoaster, and we’ve just had to make another big decision that we thought we’d share.

TL;DR: we returned to Australia last night, hopeful to get back to Canada when we can. Currently in Sydney quarantine and doing fine.

UPDATE: please note that this isn’t at all a poor reflection on Canada. To the contrary, we have loved even the brief time we’ve had there, the wonderful hospitality and kindness shown by everyone, and the excellent public services there.

We moved to Ottawa, Canada at the end of February, for an incredible job opportunity with Service Canada which also presented a great life opportunity for the family. We enjoyed 2 “normal” weeks of settling in, with the first week dedicated to getting set up, and the second week spent establishing a work / school routine – me in the office, little A in school and T looking at work opportunities and running the household.

Then, almost overnight, everything went into COVID lock down. Businesses and schools closed. Community groups stopped meeting. People everywhere are being affected by this every day, so we have been very lucky to be largely fine and in good health, and we thought we could ride it out safely staying in Ottawa, even if we hadn’t quite had the opportunity to establish ourselves.

But then a few things happened which changed our minds – at least for now.

Firstly, with the schools shut down before A had really had a chance to make friends (she only attended for 5 days before the school shut down), she was left feeling very isolated. The school is trying to stay connected with its students by providing a half hour video class each day, with a half hour activity in the afternoons, but it’s no way to help her to make new friends. A has only gotten to know the kids of one family in Ottawa, who are also in isolation but have been amazingly supportive (thanks Julie and family!), so we had to rely heavily on video playdates with cousins and friends in Australia, for which the timezone difference only allows a very narrow window of opportunity each day. With every passing day, the estimated school closures have gone from weeks, to months, to very likely the rest of the school year (with the new school year commencing in September). If she’d had just another week or two, she would have likely found a friend, so that was a pity. It’s also affected the availability of summer camps for kids, which we were relying on to help us with A through the 2 month summer holiday period (July & August).

Secondly, we checked our health cover and luckily the travel insurance we bought covered COVID conditions, but we were keen to get full public health cover. Usually for new arrivals there is a 3 month waiting period before this can be applied for. However, in response to the COVID threat the Ontario Government recently waived that waiting period for public health insurance, so we rushed to register. Unfortunately, the one service office that is able to process applications from non-Canadian citizens had closed by that stage due to COVID, with no re-opening being contemplated. We were informed that there is currently no alternative ability for non-citizens to apply online or over the phone.

Thirdly, the Australian Government has strongly encouraged all Australian citizens to return home, warning of the closing window for international travel. We became concerned we wouldn’t have full consulate support if something went wrong overseas. A good travel agent friend of ours told us the industry is preparing for a minimum of 6 months of international travel restrictions, which raised the very real issue that if anything went wrong for us, then neither could we get home, nor family come to us. And, as we can now all appreciate, it’s probable that international travel disruptions and prohibitions will endure for much longer than 6 months.

Finally, we had a real scare. For context, we signed a lease for an apartment in a lovely part of central Ottawa, but we weren’t able to move in until early April, so we had to spend 5 weeks living in a hotel room. We did move into our new place just last Sunday and it was glorious to finally have a place, and for little A to finally have her own room, which she adored. Huge thanks to those who generously helped us make that move! The apartment is only 2 blocks away from A’s new school, which is incredibly convenient for us – it will be particularly good during the worst of Ottawa’s winter. But little A, who is now a very active and adventurous 4 year old, managed to face plant off her scooter (trying to bunnyhop down a stair!) and she knocked out a front tooth, on only the second day in the new place! She is ok, but we were all very, very lucky that it was a clean accident with the tooth coming out whole and no other significant damage. But we struggled to get any non-emergency medical support.

The Ottawa emergency dental service was directing us to a number that didn’t work. The phone health service was so busy that we were told we couldn’t even speak to a nurse for 24 hours. We could have called emergency services and gone to a hospital, which was comforting, but several Ottawa hospitals reported COVID outbreaks just that day, so we were nervous to do so. We ended up getting medical support from the dentist friend of a friend over text, but that was purely by chance. It was quite a wake up call as to the questions of what we would have done if it had been a really serious injury. We just don’t know the Ontario health system well enough, can’t get on the public system, and the pressure of escalating COVID cases clearly makes it all more complicated than usual.

If we’d had another month or two to establish ourselves, we think we might have been fine, and we know several ex-pats who are fine. But for us, with everything above, we felt too vulnerable to stay in Canada right now. If it was just Thomas and I it’d be a different matter.

So, we have left Ottawa and returned to Australia, with full intent to return to Canada when we can. As I write this, we are on day 2 of the 14 day mandatory isolation in Sydney. We were apprehensive about arriving in Sydney, knowing that we’d be put into mandatory quarantine, but the processing and screening of arrivals was done really well, professionally and with compassion. A special thank you to all the Sydney airport and Qatar Airways staff, immigration and medical officers, NSW Police, army soldiers and hotel staff who were all involved in the process. Each one acted with incredible professionalism and is a credit to their respective agencies. They’re also exposing themselves to the risk of COVID in order to help others. Amazing and brave people. A special thank you to Emma Rowan-Kelly who managed to find us these flights back amidst everything shutting down globally.

I will continue working remotely for Service Canada, on the redesign and implementation of a modern digital channel for government services. Every one of my team is working remotely now anyway, so this won’t be a significant issue apart from the timezone. I’ll essentially be a shift worker for this period. Our families are all self isolating, to protect the grandparents and great-grandparents, so the Andrews family will be self-isolating in a location still to be confirmed. We will be traveling directly there once we are released from quarantine, but we’ll be contactable via email, fb, whatsapp, video, etc.

We are still committed to spending a few years in Canada, working, exploring and experiencing Canadian cultures, and will keep the place in Ottawa with the hope we can return there in the coming 6 months or so. We are very, very thankful for all the support we have had from work, colleagues, little A’s school, new friends there, as well as that of friends and family back in Australia.

Thank you all – and stay safe. This is a difficult time for everyone, and we all need to do our part and look after each other best we can.

Easy containers on Fedora with toolbox

The toolbox program is a wrapper for setting up containers on Fedora. It’s not doing anything you can’t do yourself with podman, but it does make using and managing containers simpler and easier. It comes by default on Silverblue, where it’s aimed at terminal apps and dev work, but you can try it on a regular Fedora workstation.

sudo dnf install toolbox

Creating containers

You can create just one container if you want, which will be called something like fedora-toolbox-32, or you can create separate containers for different things. Up to you. As an example, let’s create a container called testing-f32.

toolbox create --container testing-f32

By default toolbox uses the Fedora registry and creates a container which is the same version as your host. However you can specify a different version if you need to, for example if you needed a Fedora 30 container.

toolbox create --release f30 --container testing-f30

These containers are not yet running, they’ve just been created for you.

View your containers

You can see your containers with the list option.

toolbox list

This will show you both the images in your cache and the containers in a nice format.

IMAGE ID      IMAGE NAME                                        CREATED
c49513deb616  registry.fedoraproject.org/f30/fedora-toolbox:30  5 weeks ago
f7cf4b593fc1  registry.fedoraproject.org/f32/fedora-toolbox:32  4 weeks ago

CONTAINER ID  CONTAINER NAME  CREATED        STATUS   IMAGE NAME
b468de87277b  testing-f30     5 minutes ago  Created  registry.fedoraproject.org/f30/fedora-toolbox:30
1597ab1a00a5  testing-f32     5 minutes ago  Created  registry.fedoraproject.org/f32/fedora-toolbox:32

As toolbox is a wrapper, you can also see this information with podman, but with two commands; one for images and one for containers. Notice that with podman you can also see that these containers are not actually running (that’s the next step).

podman images ; podman ps -a
registry.fedoraproject.org/f32/fedora-toolbox   32       f7cf4b593fc1   4 weeks ago    360 MB
registry.fedoraproject.org/f30/fedora-toolbox   30       c49513deb616   5 weeks ago    404 MB

CONTAINER ID  IMAGE                                             COMMAND               CREATED             STATUS   PORTS  NAMES
b468de87277b  registry.fedoraproject.org/f30/fedora-toolbox:30  toolbox --verbose...  About a minute ago  Created         testing-f30
1597ab1a00a5  registry.fedoraproject.org/f32/fedora-toolbox:32  toolbox --verbose...  About a minute ago  Created         testing-f32

You can also use podman to inspect the containers and appreciate all the extra things toolbox is doing for you.

podman inspect testing-f32

Entering a container

Once you have a container created, to use it you just enter it with toolbox.

toolbox enter --container testing-f32

Now you are inside your container which is separate from your host, but it generally looks the same. A number of bind mounts were created automatically for you and you’re still in your home directory. It is important to note that all containers you run with toolbox will share your home directory! Thus it won’t isolate different versions of the same software, for example, you would still need to create separate virtual environments for Python.

Any new shells or tabs you create in your terminal app will also be inside that container. Note the PS1 variable has changed to have a pink shape at the front (from /etc/profile.d/toolbox.sh).

Inside a container with toolbox

Note that you could also start and enter the container with podman.

podman start testing-f30
podman exec -it -u ${EUID} -w ${HOME} testing-f30 /usr/bin/bash

Hopefully you can see how toolbox makes using containers easier!

Exiting a container

To get out of the container, just exit the shell and you’ll be back to your previous session on the host. The container will still exist and can be entered again, it is not deleted unless you delete it.

Removing a container

To remove a container, simply run toolbox with the rm option. Note that this still keeps the images around, it just deletes the instance of that image that’s running as that container.

toolbox rm -f testing-f32

Again, you can also delete this using podman.
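
As a rough sketch of the podman equivalent (the helper name below is made up for illustration), podman rm --force removes the named containers and, like toolbox rm, leaves the cached images in place:

```shell
# Hypothetical helper: remove one or more toolbox containers using podman
# directly. The images the containers were created from are not touched.
remove_containers() {
    for name in "$@"; do
        podman rm --force "$name"
    done
}
```

For example, remove_containers testing-f30 testing-f32 would remove both example containers. The images can be removed separately with podman rmi once no container uses them.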

Using containers

Once inside a container you can basically (mostly) treat your container system as a regular Fedora host. You can install any apps you want, such as terminal apps like screenfetch and even graphical programs like gedit (which work from inside the container).

sudo dnf install screenfetch gedit
screenfetch is always a favourite

For any programs that require RPMFusion, like ffmpeg, you first need to set up the repos as you would on a regular Fedora system.

sudo dnf install \
https://download1.rpmfusion.org/free/fedora/rpmfusion-free-release-$(rpm -E %fedora).noarch.rpm \
https://download1.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-$(rpm -E %fedora).noarch.rpm
sudo dnf install ffmpeg

These programs like screenfetch and ffmpeg are available inside your container, but not outside your container. They are isolated. To run them in the future you would enter the container and run the program.

Instead of entering and then running the program, you can also just use the run command. Here you can see screenfetch is not on my host, but I can run it in the container.
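
A minimal sketch of that idea, assuming the testing-f32 container from earlier (the helper name is hypothetical): it notes when the program is missing on the host and then hands the command to toolbox run.

```shell
# Hypothetical helper: run a command inside a named toolbox container,
# first noting whether that command is missing on the host.
run_in_toolbox() {
    container="$1"
    shift
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "$1 is not installed on the host, running it in $container"
    fi
    toolbox run --container "$container" "$@"
}
```

For example, run_in_toolbox testing-f32 screenfetch. Without the helper, the plain form is toolbox run --container testing-f32 screenfetch.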

Those are pretty simple (silly?) examples, but hopefully they demonstrate the value of toolbox. It’s probably more useful for dev work where you can separate and manage different versions of various platforms, but it does make it really easy to quickly spin something up outside of your host system.

April 06, 2020

The Calculating Stars

Winner of a Hugo, a Locus and a Nebula, this book is about a mathematical prodigy battling her way into a career as an astronaut in a post-apocalyptic 1950s America. Along the way she has to take on the embedded sexism of America in the 50s, as well as her own mild racism. Worse, she suffers from an anxiety condition.

The book is engaging and well written, with an alternative history plot line which is believable and interesting. In fact, it’s quite topical for our current time.

I really enjoyed this book and I will definitely be reading the sequel.

The Calculating Stars
Mary Robinette Kowal
May 16, 2019
432

The Right Stuff meets Hidden Figures by way of The Martian. A world in crisis, the birth of space flight and a heroine for her time and ours; the acclaimed first novel in the Lady Astronaut series has something for everyone. On a cold spring night in 1952, a huge meteorite fell to earth and obliterated much of the east coast of the United States, including Washington D.C. The ensuing climate cataclysm will soon render the earth inhospitable for humanity, as the last such meteorite did for the dinosaurs. This looming threat calls for a radically accelerated effort to colonize space, and requires a much larger share of humanity to take part in the process. Elma York's experience as a WASP pilot and mathematician earns her a place in the International Aerospace Coalition's attempts to put man on the moon, as a calculator. But with so many skilled and experienced women pilots and scientists involved with the program, it doesn't take long before Elma begins to wonder why they can't go into space, too. Elma's drive to become the first Lady Astronaut is so strong that even the most dearly held conventions of society may not stand a chance against her.

April 05, 2020

Custom WiFi enabled nightlight with ESPHome and Home Assistant

I built this custom night light for my kids as a fun little project. It’s pretty easy so thought someone else might be inspired to do something similar.

Custom WiFi connected nightlight

Hardware

The core hardware is just an ESP8266 module and an Adafruit NeoPixel Ring. I also bought a 240V bunker light and took the guts out to use as the housing, as it looked nice and had a diffuser (you could pick anything that you like).

Removing existing components from bunker light

While the data pin of the NeoPixel Ring can pretty much connect to any GPIO pin on the ESP, bitbanging can cause flickering. It’s better to use pins 1, 2 or 3 on an ESP8266 where we can use other methods to talk to the device.

These methods are exposed in ESPHome’s support for NeoPixel.

  • ESP8266_DMA (default for ESP8266, only on pin GPIO3)
  • ESP8266_UART0 (only on pin GPIO1)
  • ESP8266_UART1 (only on pin GPIO2)
  • ESP8266_ASYNC_UART0 (only on pin GPIO1)
  • ESP8266_ASYNC_UART1 (only on pin GPIO2)
  • ESP32_I2S_0 (ESP32 only)
  • ESP32_I2S_1 (default for ESP32)
  • BIT_BANG (can flicker a bit)

I chose GPIO2 and use ESP8266_UART1 method in the code below.
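
The pin and method pairing in the list above can be summarised as a small lookup. This is only an illustrative shell sketch (the function name is hypothetical and it is not part of ESPHome):

```shell
# Print the NeoPixelBus methods listed above for a given ESP8266 GPIO pin.
# Any pin other than 1, 2 or 3 falls back to BIT_BANG, which can flicker.
esp8266_neopixel_methods() {
    case "$1" in
        1) echo "ESP8266_UART0 ESP8266_ASYNC_UART0" ;;
        2) echo "ESP8266_UART1 ESP8266_ASYNC_UART1" ;;
        3) echo "ESP8266_DMA" ;;
        *) echo "BIT_BANG" ;;
    esac
}
```

For example, esp8266_neopixel_methods 2 prints the two UART1 methods, which matches the GPIO2 and ESP8266_UART1 choice used here.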

So, first things first, solder up some wires to 5V, GND and GPIO pin 2 on the ESP module. These connect to the 5V, GND and data pins on the NeoPixel Ring respectively.

It’s not very neat, but I used a hot glue gun to stick the ESP module into the bottom part of the bunker light, and fed the USB cable through for power and data.

I hot-glued the NeoPixel Ring in-place on the inside of the bunker light, in the centre, shining outwards towards the diffuser.

The bottom can then go back on and screws hold it in place. I used a hacksaw to create a little slot for the USB cable to sit in and then added hot-glue blobs for feet. All closed up, it looks like this underneath.

Looks a bit more professional from the top.

Code using ESPHome

I flashed the ESP8266 using ESPHome (see my earlier blog post) with this simple YAML config.

esphome:
  name: nightlight
  build_path: ./builds/nightlight
  platform: ESP8266
  board: huzzah
  esp8266_restore_from_flash: true

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

# Enable logging
logger:

# Enable Home Assistant API
api:
  password: !secret api_password

# Enable over the air updates
ota:
  password: !secret ota_password

mqtt:
  broker: !secret mqtt_broker
  username: !secret mqtt_username
  password: !secret mqtt_password
  port: !secret mqtt_port

light:
  - platform: neopixelbus
    pin: GPIO2
    method: ESP8266_UART1
    num_leds: 16
    type: GRBW
    name: "Nightlight"
    effects:
      # Customize parameters
      - random:
          name: "Slow Random"
          transition_length: 30s
          update_interval: 30s
      - random:
          name: "Fast Random"
          transition_length: 4s
          update_interval: 5s
      - addressable_rainbow:
          name: Rainbow
          speed: 10
          width: 50
      - addressable_twinkle:
          name: Twinkle Effect
          twinkle_probability: 5%
          progress_interval: 4ms
      - addressable_random_twinkle:
          name: Random Twinkle
          twinkle_probability: 5%
          progress_interval: 32ms
      - addressable_fireworks:
          name: Fireworks
          update_interval: 32ms
          spark_probability: 10%
          use_random_color: false
          fade_out_rate: 120
      - addressable_flicker:
          name: Flicker

The esp8266_restore_from_flash option is useful because if the light is on and someone accidentally turns it off, it will go back to the same state when it is turned back on. It does wear the flash out more quickly, however.

The important settings are the light component with the neopixelbus platform, which is where all the magic happens. We specify which GPIO on the ESP the data line on the NeoPixel Ring is connected to (pin 2 in my case). The method we use needs to match the pin (as discussed above) and in this example is ESP8266_UART1.

The number of LEDs must match the actual number on the NeoPixel Ring, in my case 16. This is used when talking to the on-chip LED driver and calculating effects, etc.

Similarly, the LED type is important as it determines which order the colours are in (swap around if colours don’t match). This must match the actual type of NeoPixel Ring, in my case I’m using an RGBW model which has a separate white LED and is in the order GRBW.

Finally, you get all sorts of effects for free, you just need to list the ones you want and any options for them. These show up in Home Assistant under the advanced view of the light (screenshot below).

Now it’s a matter of plugging the ESP module in and flashing it with esphome.

esphome nightlight.yaml run

Home Assistant

After a reboot, the device should automatically show up in Home Assistant under Configuration -> Devices. From here you can add it to the Lovelace dashboard and make Automations or Scripts for the device.

Nightlight in Home Assistant with automations

Adding it to Lovelace dashboard looks something like this, which lets you easily turn the light on and off and set the brightness.

You can also get advanced settings for the light, where you can change brightness, colours and apply effects.

Nightlight options

Effects

One of the great things about using ESPHome is all the effects which are defined in the YAML file. To apply an effect, choose it from the advanced device view in Home Assistant (as per screenshot above).

This is what rainbow looks like.

Nightlight running Rainbow effect

The kids love to select the colours and effects they want!

Automation

So, once you have the nightlight showing up in Home Assistant, we can create a simple automation to turn it on at sunset and off at sunrise.

Go to Configuration -> Automation and add a new one. You can fill in any name you like and there’s an Execute button there when you want to test it.

The trigger uses the Sun module and runs 10 minutes before sunset.

I don’t use Conditions, but you could. For example, only do this when someone’s at home.

The Actions are set to call the homeassistant.turn_on function and specify the device(s). Note this takes a comma separated list, so if you have more than one nightlight you can do it with the one automation rule.

That’s it! You can create another one for sunrise, but instead of calling homeassistant.turn_on just call homeassistant.turn_off and use Sunrise instead of Sunset.
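
Automations can also be written in YAML rather than through the UI. The following is only a sketch of how the two rules might look in the YAML syntax of the time; the entity ID light.nightlight and the output file name are assumptions, so check the real entity ID in Home Assistant before using anything like it.

```shell
# Write two example automations to a file for review.
# Assumptions: light.nightlight and nightlight-automations.yaml are
# illustrative names, not taken from the setup described in this post.
cat > nightlight-automations.yaml << 'EOF'
- alias: Nightlight on before sunset
  trigger:
    - platform: sun
      event: sunset
      offset: "-00:10:00"
  action:
    - service: homeassistant.turn_on
      entity_id: light.nightlight

- alias: Nightlight off at sunrise
  trigger:
    - platform: sun
      event: sunrise
  action:
    - service: homeassistant.turn_off
      entity_id: light.nightlight
EOF
```

The negative offset is what makes the first rule fire 10 minutes before sunset.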

Infinite complacency

Attachment Size
First paragraph of War of the Worlds 1.84 MB
kattekrab Sun, 05/04/2020 - 10:56

April 04, 2020

COVID-19 and Appreciation

So I’m going near people just once a week to shop. Once a day I go outside on my bike (but nowhere near people) to maintain my mental and physical health. It helps that we live in sparsely populated suburbs.

I shop at my local Woolworths (Findon, South Australia), and was very impressed with what I saw today. Crosses on the floor positioning us 2m apart and a bouncer regulating the flow and keeping store numbers low. Same thing on the checkout, and in front of the Deli counter. While I was queuing, a young lady wiped down the trolley handle and offered me hand sanitiser.

The EFTPOS limit has been raised, so no need to use my fingers to enter a PIN number at the checkout. That’s good – I now regard that keypad as an efficient means to distribute a viral payload. Just wave my card 20mm above the machine and I have groceries to sustain my son and I for a week. It costs what I earn with just 1 hour of my labour.

In the middle of the biggest crisis to hit the World since WW2, I can buy just about anything I want. I could gain weight if I wanted to.

Our power went off in a storm last night, and with it the Broadband Internet. However I still had my phone, a hotspot, and a laptop connected to the Internet, friends and loved ones. I immediately received a text from the power company telling me the power would be restored in 2 hours. They did it in 1. In the middle of COVID-19. At night, in the rain. While waiting my son and I cooked a nice BBQ outside in the twilight using gas.

My part time day job is secure, my pay keeps coming, and we have transitioned to WFH and are working well. My shares have been smashed but I can live with that – they are still good companies and I am a long term investor. My son is being home schooled and his teachers at Findon High are working hard on online content and remote teaching.

The Australian COVID-19 new case numbers are dropping and recoveries picking up. Many people are not going to die. The Australian population are working together to beat this.

We are well informed by our public broadcaster the ABC, our media is uncensored, and I can choose to do my own analysis using open source data sets.

What a fantastic world we live in, that can supply a surplus of food, and keep all our institutions running at a time like this. Well done to the Australian government and people.

I feel very grateful.

April 03, 2020

Building Daedalus Flight on NixOS

NixOS Daedalus Gears by Craige McWhirter

Daedalus Flight was recently released and this is how you can build and run this version of Daedalus on NixOS.

If you want to speed the build process up, you can add the IOHK Nix cache to your own NixOS configuration:

iohk.nix:

nix.binaryCaches = [
  "https://cache.nixos.org"
  "https://hydra.iohk.io"
];
nix.binaryCachePublicKeys = [
  "hydra.iohk.io:f/Ea+s+dFdN+3Y/G+FDgSq+a5NEWhJGzdjvKNGv0/EQ="
];

If you haven't already, you can clone the Daedalus repo and specifically the 1.0.0 tagged commit:

$ git clone --branch 1.0.0 https://github.com/input-output-hk/daedalus.git
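
One way to check that a checkout sits exactly on a tag is git describe with --exact-match. A small sketch (the helper name is hypothetical):

```shell
# Hypothetical helper: succeed only if the repository at the given path
# has exactly the given tag checked out.
on_tag() {
    [ "$(git -C "$1" describe --tags --exact-match 2>/dev/null)" = "$2" ]
}
```

For example, on_tag daedalus 1.0.0 && echo "on the 1.0.0 tag" when run from the directory the repo was cloned into.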

Once you've cloned the repo and checked you're on the 1.0.0 tagged commit, you can build Daedalus Flight with the following command:

$ nix build -f . daedalus --argstr cluster mainnet_flight

Once the build completes, you're ready to launch Daedalus Flight:

$ ./result/bin/daedalus

To verify that you have in fact built Daedalus Flight, first head to the Daedalus menu then About Daedalus. You should see a title such as "DAEDALUS 1.0.0". The second check is to press [Ctrl]+d to access Daedalus Diagnostics and your Daedalus state directory should have mainnet_flight at the end of the path.

If you've got these, give yourself a pat on the back and grab yourself a refreshing bevvy while you wait for blocks to sync.

Daedalus FC1 screenshot

Bebo, Betty, and Jaco

Wait, wasn’t WordPress 5.4 just released?

It absolutely was, and congratulations to everyone involved! Inspired by the fine work done to get another release out, I finally completed the last step of co-leading WordPress 5.0, 5.1, and 5.2 (Bebo, Betty, and Jaco, respectively).

My study now has a bit more jazz in it. 🙂

April 02, 2020

Audiobooks – March 2020

My rating for books I read. Note that I’m perfectly happy with anything scoring 3 or better.

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average, in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all

The World as It Is: Inside the Obama White House by Ben Rhodes

A memoir of a senior White House staffer, Speechwriter & Presidential adviser. Lots of interesting accounts and behind-the-scenes information. 4/5

Redshirts by John Scalzi

A Star Trek parody from the POV of five ensigns who realise something is very strange on their ship. Plot moves steadily and the humour and action mostly work. 3/5

Little House on the Prairie by Laura Ingalls Wilder

The book covers less than a year as the Ingalls family build a cabin in Indian territory on the Kansas Prairie. Dangerous incidents and adventures throughout. 3/5

Wheels Stop: The Tragedies and Triumphs of the Space Shuttle Program, 1986-2011 by Rick Houston

A book about the post-Challenger Shuttle missions. An overview of most of the missions and the astronauts on them. Lots of quotes mainly from the astronauts. Good for Spaceflight fans. 3/5

The Optimist’s Telescope: Thinking Ahead in a Reckless Age by Bina Venkataraman

Ways that people, organisations and governments can start looking ahead at the long term rather than just the short and why they don’t already. Some good stuff. 4/5

April 01, 2020

Zoom's Make or Break Moment

Zoom is experiencing massive growth as large sections of the workforce transition to working from home. At the same time many problems with Zoom are coming to light. This is their make or break moment. If they fix the problems they end up with a killer video conferencing app. The alternative is that they join Cisco's Webex in the dumpster fire of awful enterprise software.

In the interest of transparency I am a paying Zoom customer and I use it for hours every day. I also use Webex (under protest) as it is a client's video conferencing platform of choice.

In the middle of last year Jonathan Leitschuh disclosed two bugs in Zoom with security and privacy implications. There was a string of failures that led to these bugs. To Zoom’s credit they published a long blog post about why these “features” were there in the first place.

Over the last couple of weeks other issues with Zoom have surfaced. “Zoom bombing” or using random 9 digit numbers to find meetings has become a thing. This is caused by Zoom’s meeting rooms having a 9 digit code to join. That’s really handy when you have to dial in and enter the number on your telephone keypad. The down side is that you have a 1 in 999 999 999 chance of joining a meeting when using a random number. Zoom does offer the option of requiring a password or PIN for each call. Unfortunately it isn’t the default. Publishing a blog post on how to secure your meetings isn’t enough; the app needs to be more secure by default. The app should default to enabling a 6 digit PIN when creating a meeting.

The Intercept is reporting Zoom’s marketing department got a little carried away when describing the encryption used in the product. This is an area where words matter. Encryption in transit is a base line requirement in communication tools these days. Zoom has this, but their claims about end to end encryption appear to be false. End to end encryption is very important for some use cases. I await the blog post explaining this one.

I don’t know why Proton Mail’s privacy issues blog post got so much attention. This appears to be based on someone skimming the documentation rather than any real testing. Regardless the post got a lot of traction. Some of the same issues were flagged by the EFF.

Until recently Zoom’s FAQ read “Does Zoom sell Personal Data? […] Depends what you mean by ‘sell’”. I’m sure that sounded great in a meeting but it is worrying when you read it as a customer. Once called out on social media it was quickly updated and a blog post published. In the post, Zoom assures users it isn’t selling their data.

Joseph Cox reported late last week that Zoom was sending data to Facebook every time someone used their iOS app. It is unclear if Joe gave Zoom an opportunity to fix the issue before publishing the article. The company pushed out a fix after the story broke.

The most recent issue broke yesterday about the Zoom macOS installer behaving like malware. This seems pretty shady behaviour, like their automatic reinstaller that was fixed last year. To his credit, Zoom Founder and CEO, Eric Yuan engaged with the issue on Twitter. This will be one to watch over the coming days.

Over the last year I have seen a consistent pattern when Zoom is called out on security and valid privacy issues with their platform. They respond publicly with “oops my bad” blog posts. Many of the issues appear to be a result of them trying to deliver a great user experience. Unfortunately they sometimes lean too far toward the UX and ignore the security and privacy implications of their choices. I hope that over the coming months we see Zoom correct this balance as problems are called out. If they do they will end up with an amazing platform in terms of UX while keeping their users safe.

Update Since publishing this post additional issues with Zoom were reported. Zoom's CEO announced the company was committed to fixing their product.

March 31, 2020

Defining home automation devices in YAML with ESPHome and Home Assistant, no programming required!

Having built the core of my own “dumb” smart home system, I have been working on making it smart these past few years. As I’ve written about previously, the smart side of my home automation is managed by Home Assistant, which is an amazing, privacy focused open source platform. I’ve previously posted about running Home Assistant in Docker and in Podman.

Home Assistant, the privacy focused, open source home automation platform

I do have a couple of proprietary home automation products, including LIFX globes and Google Home. However, the vast majority of my home automation devices are ESP modules running open source firmware which connect to MQTT as the central protocol. I’ve built a number of sensors and lights and been working on making my light switches smart (more on that in a later blog post).

I already had experience with Arduino, so I started experimenting with this and it worked quite well. I then had a play with Micropython and really enjoyed it, but then I came across ESPHome and it blew me away. I have since migrated most of my devices to ESPHome.

ESPHome provides simple management of ESP devices

ESPHome is smart in making use of PlatformIO underneath, but its beauty lies in the way it abstracts away the complexities of programming for embedded devices. In fact, no programming is necessary! You simply have to define your devices in YAML and run a single command to compile the firmware blob and flash a device. Loops, initialising and managing multiple inputs and outputs, reading and writing to I/O, PWM, functions and callbacks, connecting to WiFi and MQTT, hosting an AP, logging and more is taken care of for you. Once up, the devices support mDNS and unencrypted over the air updates (which is fine for my local network). It supports both Home Assistant API and MQTT (over TLS for ESP8266) as well as lots of common components. There is even an addon for Home Assistant if you prefer using a graphical interface, but I like to do things on the command line.

When combined with Home Assistant, new devices are automatically discovered and appear in the web interface. When using MQTT, the channels are set with retain flag, so that the devices themselves and their last known states are not lost on reboots (you can disable this for testing).

That’s a lot of things you get for just a little bit of YAML!

Getting started

Getting started is pretty easy, just install esphome using pip.

pip3 install --user esphome

Of course, you will need a real physical ESP device of some description. Thanks to PlatformIO, lots of ESP8266 and ESP32 devices are supported. Although built on similar SoCs, different devices break out different pins and can have different flashing requirements. Therefore, specifying the exact device is good and can be helpful, but it’s not strictly necessary.

It’s not just ESP modules that are supported. These days a number of commercial products are being built using ESP8266 chips which we can flash, like Sonoff power modules, Xiaomi temperature sensors, Brilliant Smart power outlets and Mirabella Genio light bulbs (I use one of these under my stairs).

For this post though, I will use one of my MH-ET Live ESP32Minikit devices as an example, which has the device name of mhetesp32minikit.

MH-ET Live ESP32Minikit

Managing configs with Git

Everything with your device revolves around your device’s YAML config file, including configuration, flashing, accessing logs, clearing out MQTT messages and more.

ESPHome has a wizard which will prompt you to enter your device details and WiFi credentials. It’s a good way to get started, however it only creates a skeleton file and you have to continue configuring the device manually to actually do anything anyway. So, I think ultimately it’s easier to just create and manage your own files, which we’ll do below. (If you want to give it a try, you can run the command esphome example.yaml wizard which will create an example.yaml file.)

I have two Git repositories to manage my ESPHome devices. The first one is for my WiFi and MQTT credentials, which are stored as variables in a file called secrets.yaml (store them in an Ansible vault, if you like). ESPHome automatically looks for this file when compiling firmware for a device and will use those variables.

Let’s create the Git repo and secrets file, replacing the details below with your own. Note that I am including the settings for an MQTT server, which is unencrypted in the example. If you’re using an MQTT server online you may want to use an ESP8266 device instead and enable TLS fingerprints for a more secure connection. I should also mention that MQTT is not required, devices can also use the Home Assistant API and if you don’t use MQTT those variables can be ignored (or you can leave them out).

mkdir ~/esphome-secrets
cd ~/esphome-secrets
cat > secrets.yaml << EOF
wifi_ssid: "ssid"
wifi_password: "wifi-password"
api_password: "api-password"
ota_password: "ota-password"
mqtt_broker: "mqtt-ip"
mqtt_port: 1883
mqtt_username: "mqtt-username"
mqtt_password: "mqtt-password"
EOF
git init
git add .
git commit -m "esphome secrets: add secrets"

The second Git repo has all of my device configs and references the secrets file from the other repo. I name each device’s config file the same as its name (e.g. study.yaml for the device that controls my study). Let’s create the Git repo and link to the secrets file and ignore things like the builds directory (where builds will go!).

mkdir ~/esphome-configs
cd ~/esphome-configs
ln -s ../esphome-secrets/secrets.yaml .
cat > .gitignore << EOF
/.esphome
/builds
/.*.swp
EOF
git init
git add .
git commit -m "esphome configs: link to secrets"

Creating a config

The config file contains different sections with core settings. You can leave some of these settings out, such as api, which will disable that feature on the device (esphome is required).

  • esphome – device details and build options
  • wifi – wifi credentials
  • logger – enable logging of device to see what’s happening
  • ota – enables over the air updates
  • api – enables the Home Assistant API to control the device
  • mqtt – enables MQTT to control the device

Now that we have our base secrets file, we can create our first device config! Note that settings with !secret are referencing the variables in our secrets.yaml file, thus keeping the values out of our device config. Here’s our new base config for an ESP32 device called example in a file called example.yaml which will connect to WiFi and MQTT.

cat > example.yaml << EOF
esphome:
  name: example
  build_path: ./builds/example
  platform: ESP32
  board: mhetesp32minikit

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

logger:

api:
  password: !secret api_password

ota:
  password: !secret ota_password

mqtt:
  broker: !secret mqtt_broker
  username: !secret mqtt_username
  password: !secret mqtt_password
  port: !secret mqtt_port
  # Set to true when finished testing to set MQTT retain flag
  discovery_retain: false
EOF
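
Since every !secret reference has to exist in secrets.yaml, a quick pre-flight check can catch typos before a compile. The following is only a sketch in Python using the standard library: the function name missing_secrets and the regex-based parsing are assumptions made for illustration (ESPHome performs its own validation, and a real YAML parser would be more robust).

```python
import re

# Matches "!secret some_name" anywhere in a device config.
SECRET_REF = re.compile(r"!secret\s+([A-Za-z0-9_]+)")
# Matches top-level "key:" lines in a flat secrets file.
TOP_LEVEL_KEY = re.compile(r"^([A-Za-z0-9_]+)\s*:", re.MULTILINE)


def missing_secrets(config_text, secrets_text):
    """Return sorted names referenced with !secret but not defined in secrets."""
    referenced = set(SECRET_REF.findall(config_text))
    defined = set(TOP_LEVEL_KEY.findall(secrets_text))
    return sorted(referenced - defined)


# Cut-down samples; in practice you would read example.yaml and secrets.yaml.
CONFIG = """\
wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
mqtt:
  broker: !secret mqtt_broker
"""

SECRETS = """\
wifi_ssid: "ssid"
wifi_password: "wifi-password"
"""

print(missing_secrets(CONFIG, SECRETS))
```

With the samples above, the only name reported is mqtt_broker, because the sample secrets text deliberately leaves it out.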

Compiling and flashing the firmware

First, plug your ESP device into your computer which should bring up a new TTY, such as /dev/ttyUSB0 (check dmesg). Now that you have the config file, we can compile it and flash the device (you might need to be in the dialout group). The run command actually does a number of things, including a sanity check, compiling, flashing and tailing the log.

esphome example.yaml run

This will compile the firmware in the specified build dir (./builds/example) and prompt you to flash the device. As this is a new device, an over the air update will not work yet, so you’ll need to select the TTY device. Once the device is running and connected to WiFi you can use OTA.

INFO Successfully compiled program.
Found multiple options, please choose one:
  [1] /dev/ttyUSB0 (CP2104 USB to UART Bridge Controller)
  [2] Over The Air (example.local)
(number): 

Once it is flashed, the device is automatically rebooted. The terminal should now be automatically tailing the log of the device (we enabled logger in the config). If not, you can tell esphome to tail the log by running esphome example.yaml logs.

INFO Successfully uploaded program.
INFO Starting log output from /dev/ttyUSB0 with baud rate 115200
[21:30:17][I][logger:156]: Log initialized
[21:30:17][C][ota:364]: There have been 0 suspected unsuccessful boot attempts.
[21:30:17][I][app:028]: Running through setup()...
[21:30:17][C][wifi:033]: Setting up WiFi...
[21:30:17][D][wifi:304]: Starting scan...
[21:30:19][D][wifi:319]: Found networks:
[21:30:19][I][wifi:365]: - 'ssid' (02:18:E6:22:E2:1A) ▂▄▆█
[21:30:19][D][wifi:366]:     Channel: 1
[21:30:19][D][wifi:367]:     RSSI: -54 dB
[21:30:19][I][wifi:193]: WiFi Connecting to 'ssid'...
[21:30:23][I][wifi:423]: WiFi Connected!
[21:30:23][C][wifi:287]:   Hostname: 'example'
[21:30:23][C][wifi:291]:   Signal strength: -50 dB ▂▄▆█
[21:30:23][C][wifi:295]:   Channel: 1
[21:30:23][C][wifi:296]:   Subnet: 255.255.255.0
[21:30:23][C][wifi:297]:   Gateway: 10.0.0.123
[21:30:23][C][wifi:298]:   DNS1: 10.0.0.1
[21:30:23][C][ota:029]: Over-The-Air Updates:
[21:30:23][C][ota:030]:   Address: example.local:3232
[21:30:23][C][ota:032]:   Using Password.
[21:30:23][C][api:022]: Setting up Home Assistant API server...
[21:30:23][C][mqtt:025]: Setting up MQTT...
[21:30:23][I][mqtt:162]: Connecting to MQTT...
[21:30:23][I][mqtt:202]: MQTT Connected!
[21:30:24][I][app:058]: setup() finished successfully!
[21:30:24][I][app:100]: ESPHome version 1.14.3 compiled on Mar 30 2020, 21:29:41

You should see the device boot up and connect to your WiFi and MQTT server successfully.

Adding components

Great! Now we have a basic YAML file, let’s add some components to make it do something more useful. Components are high level groups, like sensors, lights, switches, fans, etc. Each component is divided into platforms which is where different devices of that type are supported. For example, two of the different platforms under the light component are rgbw and neopixelbus.

One thing that’s useful to know is that platform devices with the name property set in the config will appear in Home Assistant. Those without will be only local to the device and just have an id. This is how you can link multiple components together on the device, then present a single device to Home Assistant (like garage remote below).

Software reset switch

First thing we can do is add a software switch which will let us reboot the device from Home Assistant (or by publishing manually to MQTT or API). To do this, we add the restart platform from the switch component. It’s as simple as adding this to the bottom of your YAML file.

switch:
  - platform: restart
    name: "Example Device Restart"

That’s it! Now we can re-run the compile and flash. This time you can use OTA to flash the device via mDNS (but if it’s still connected via TTY then you can still use that instead).

esphome example.yaml run

This is what OTA updates look like.

INFO Successfully compiled program.
Found multiple options, please choose one:
  [1] /dev/ttyUSB0 (CP2104 USB to UART Bridge Controller)
  [2] Over The Air (example.local)
(number): 2
INFO Resolving IP address of example.local
INFO  -> 10.0.0.123
INFO Uploading ./builds/example/.pioenvs/example/firmware.bin (856368 bytes)
Uploading: [=====================================                       ] 62% 

After the device reboots, the new reset button should automatically show up in Home Assistant as a device, under Configuration -> Devices under the name example.

Home Assistant with auto-detected example device and reboot switch

Because we set a name for the reset switch, the reboot switch is visible and called Example Device Restart. If you want to make this visible on the main Overview dashboard, you can do so by selecting ADD TO LOVELACE.

Go ahead and toggle the switch while still tailing the log of the device and you should see it restart. If you’ve already disconnected your ESP device from your computer, you can tail the log using MQTT.

LED light switch

OK, so rebooting the device is cute. Now what if we want to add something more useful for home automation? Well that requires some soldering or breadboard action, but what we can do easily is use the built-in LED on the device as a light and control it through Home Assistant.

On the ESP32 module, the built-in LED is connected to GPIO pin 2. We will first define that pin as an output component using the ESP32 LEDC platform (supports PWM). We then attach a light component using the monochromatic platform to that output component. Let’s add those two things to our config!

output:
  # Built-in LED on the ESP32
  - platform: ledc
    pin: 2
    id: output_ledpin2

light:
  # Light created from built-in LED output
  - platform: monochromatic
    name: "Example LED"
    output: output_ledpin2

Build and flash the new firmware again.

esphome example.yaml run

After the device reboots, you should now be able to see the new Example LED automatically in Home Assistant.

Example device page in Home Assistant showing new LED light

If we toggle this light a few times, we can see the built-in LED on the ESP device fading in and out at the same time.

Other components

As mentioned previously, there are many devices we can easily add to a single board like relays, PIR sensors, temperature and humidity sensors, reed switches and more.

Reed switch, relay, PIR, temperature and humidity sensor (from top to bottom, left to right)

All we need to do is connect them up to appropriate GPIO pins and define them in the YAML.

PIR sensor

A PIR sensor connects to ground and 3-5V, with data connecting to a GPIO pin (let’s use 34 in the example). We read the GPIO pin and can tell when motion is detected because the control pin voltage is set to high. Under ESPHome we can use the binary_sensor component with the gpio platform. If needed, pulling the pin down is easy, just set the pin mode (although note that GPIO 34 to 39 on the ESP32 are input-only and have no internal pull resistors, so on those pins an external pull-down resistor is needed). Finally, we set the class of the device to motion which will set the appropriate icon in Home Assistant. It’s as simple as adding this to the bottom of your YAML file.

binary_sensor:
  - platform: gpio
    pin:
      number: 34
      mode: INPUT_PULLDOWN
    name: "Example PIR"
    device_class: motion

Again, compile and flash the firmware with esphome.

esphome example.yaml run

As before, after the device reboots again we should see the new PIR device appear in Home Assistant.

Example device page in Home Assistant showing new PIR input

Temperature and humidity sensor

Let’s do another example, a DHT22 temperature sensor connected to GPIO pin 16. Simply add this to the bottom of your YAML file.

sensor:
  - platform: dht
    pin: 16
    model: DHT22
    temperature:
      name: "Example Temperature"
    humidity:
      name: "Example Humidity"
    update_interval: 10s

Compile and flash.

esphome example.yaml run

After it reboots, you should see the new temperature and humidity inputs under devices in Home Assistant. Magic!

Example device page in Home Assistant showing new temperature and humidity inputs

Garage opener using templates and logic on the device

Hopefully you can see just how easy it is to add things to your ESP device and have them show up in Home Assistant. Sometimes though, you need to make things a little more tricky. Take opening a garage door for example, which only has one button to start and stop the motor in turn. To emulate pressing the garage opener, you need to apply voltage to the opener’s push button input for a short while and then turn it off again. We can do all of this easily on the device with ESPHome and present a single button to Home Assistant.

Let’s assume we have a relay connected up to a garage door opener’s push button (PB) input. The relay control pin is connected to our ESP32 on GPIO pin 22.

ESP32 device with relay module, connected to garage opener inputs

We need to add a couple of devices to the ESP module and then expose only the button out to Home Assistant. Note that the relay only has an id, so it is local only and not presented to Home Assistant. However, the template switch which uses the relay has a name, and it has an action which causes the relay to be turned on and off, emulating a button press.

Remember we already added a switch component for the restart platform? Now we need to add the new platform devices to that same section (don’t create a second switch entry).

switch:
  - platform: restart
    name: "Example Device Restart"

  # The relay control pin (local only)
  - platform: gpio
    pin: GPIO22
    id: switch_relay

  # The button to emulate a button press, uses the relay
  - platform: template
    name: "Example Garage Door Remote"
    icon: "mdi:garage"
    turn_on_action:
    - switch.turn_on: switch_relay
    - delay: 500ms
    - switch.turn_off: switch_relay

Compile and flash again.

esphome example.yaml run

After the device reboots, we should now see the new Garage Door Remote in the UI.

Example device page in Home Assistant showing new garage remote inputs

If you actually cabled this up and toggled the button in Home Assistant, the UI button would turn on and you would hear the relay click on, then off, then the UI button would go back to the off state. Pretty neat!

There are many other things you can do with ESPHome, but this is just a taste.

Commit your config to Git

Once you have a device to your liking, commit it to Git. This way you can track the changes you’ve made and can always go back to a working config.

git add example.yaml
git commit -m "adding my first example config"

Of course it’s probably a good idea to push your Git repo somewhere remote, perhaps even share your configs with others!

Creating automation in Home Assistant

Of course once you have all these devices it’s great to be able to use them in Home Assistant, but ultimately the point of it all is to automate the home. Thus, you can use Home Assistant to set up scripts and react to things that happen. That’s beyond the scope of this particular post though, as I really wanted to introduce ESPHome and show how you can easily manage devices and integrate them with Home Assistant. There is pretty good documentation online though. Enjoy!

Overriding PlatformIO

As a final note, if you need to override something from PlatformIO, for example specifying a specific version of a dependency, you can do that by creating a modified platformio.ini file in your configs dir (copy from one of your build dirs and modify as needed). This way esphome will pick it up and apply that for you automatically.

Links March 2020

Rolling Stone has an insightful article about why the Christian Right supports Trump and won’t stop supporting him no matter what he does [1].

Interesting article about Data Oriented Architecture [2].

Quarantine Will normalise WFH and Recession will Denormalise Jobs [3]. I guess we can always hope that after a disaster we can learn to do things better than before.

Tyre wear is worse than exhaust for small particulate matter [4]. We need better tyres and legal controls over such things.

Scott Santens wrote an insightful article about the need for democracy and unconditional basic income [5]. “In ancient Greece, work was regarded as a curse” is an extreme position but strongly supported by evidence. ‘In his essay “In Praise of Idleness,” Bertrand Russell wrote “Modern methods of production have given us the possibility of ease and security for all; we have chosen, instead, to have overwork for some and starvation for others. Hitherto we have continued to be as energetic as we were before there were machines; in this we have been foolish, but there is no reason to go on being foolish forever.”‘

Cory Doctorow wrote an insightful article for Locus titled A Lever Without a Fulcrum Is Just a Stick about expansions to copyright laws [6]. One of his analogies is that giving a bullied kid more lunch money just allows the bullies to steal more money, with artists being bullied kids and lunch money being the rights that are granted under copyright law. The proposed solution includes changes to labor and contract law, presumably Cory will write other articles in future giving the details of his ideas in this regard.

The Register has an amusing article about the trial of a former CIA employee accused of being the “vault 7 leaker” [7]. Both the prosecution and the defence are building their cases around the defendant being a jerk. The article exposes poor security and poor hiring practices in the CIA.

CNN has an informative article about Finland’s war on fake news [8]. As Finland has long standing disputes with Russia they have had more practice at dealing with fake news than most countries.

The Times of Israel has an interesting article about how the UK used German Jews to spy on German prisoners of war [9].

Cory Doctorow wrote an insightful article “Data is the New Toxic Waste” about how collecting personal data isn’t an asset, it’s a liability [10].

Ulrike Uhlig wrote an insightful article about “Control Freaks”, analysing the different meanings of control, both positive and negative [11].

538 has an informative article about the value of statistical life [12]. It’s about $9M per person in the US, which means a mind-boggling amount of money should be spent to save the millions of lives that will be potentially lost in a natural disaster (like Coronavirus).

NPR has an interesting interview about Crypto AG, the Swiss crypto company owned by the CIA [13]. I first learned of this years ago, it’s not new, but I still learned a lot from this interview.

March 30, 2020

Resolving mDNS across VLANs with Avahi on OpenWRT

mDNS, or multicast DNS, is a way to discover devices on your network under the .local domain without any central DNS configuration (also known as ZeroConf and Bonjour, etc). Fedora Magazine has a good article on setting it up in Fedora, which I won’t repeat here.

If you’re like me, you’re using OpenWRT with multiple VLANs to separate networks. In my case this includes separating my home automation (HA) network (VLAN 2) from my regular trusted LAN (VLAN 1). Various untrusted home automation products, as well as my own devices, go into the HA network (more on that in a later post).

In my setup, my OpenWRT router acts as my central router, connecting each of my networks and controlling access. My LAN can access everything in my HA network, but generally only established and related TCP traffic is allowed back from HA to LAN. There are some exceptions though, for example my Pi-hole DNS servers which are accessible from all networks, but otherwise that’s the general setup.

With IPv4, mDNS communicates by sending IP multicast UDP packets to 224.0.0.251 with source and destination ports both using 5353. In order to receive requests and responses, your devices need to be running an mDNS service and also allow incoming UDP traffic on port 5353.
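
To make the packet format a little more concrete, here is a rough Python sketch (standard library only) that builds a single-question DNS query for an A record and multicasts it to 224.0.0.251:5353. The function names are made up for this illustration. Note one deliberate difference from the description above: the socket sends from an ephemeral source port rather than 5353, which RFC 6762 treats as a simple one-shot query, so this is not how a full mDNS responder such as Avahi behaves.

```python
import socket
import struct

MDNS_GROUP = "224.0.0.251"
MDNS_PORT = 5353


def build_query(hostname):
    """Build a one-question DNS query (type A, class IN) for hostname."""
    # Header: ID 0, flags 0, 1 question, 0 answer/authority/additional records.
    header = struct.pack("!6H", 0, 0, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!2H", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def send_query(hostname, timeout=2.0):
    """Multicast the query; return (sender, raw reply) or None on timeout."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        # Sent from an ephemeral port, not 5353 (one-shot query).
        sock.sendto(build_query(hostname), (MDNS_GROUP, MDNS_PORT))
        data, sender = sock.recvfrom(1500)
        return sender, data
    except socket.timeout:
        return None
    finally:
        sock.close()
```

Calling send_query("study.local") on a network with a responder for that name should return the responder's address and the raw reply bytes. Parsing the answer section is left out to keep the sketch short.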

As multicast is local only, mDNS doesn’t work natively across routed networks. Therefore, this prevents me from easily talking to my various HA devices from my LAN. In order to support mDNS across routed networks, you need a proxy in the middle to transparently send requests and responses back and forth. There are a few different options for a proxy, such as igmpproxy, but I prefer to use the standard Avahi server on my OpenWRT router.

Keep in mind that doing this will also mean that any device in your untrusted networks will be able to send mDNS requests into your trusted networks. We could stop the mDNS requests with an application layer firewall (which iptables is not), or perhaps with connection tracking, but we’ll leave that for another day. Even if untrusted devices discover addresses in LAN, the firewall is stopping them from actually communicating (at least on my setup).

Set up Avahi

Log onto your OpenWRT router and install Avahi.

opkg update
opkg install avahi-daemon

There is really only one thing that must be set in the config, and that is to enable reflector (proxy) support. This goes under the [reflector] section and looks like this.

[reflector]
enable-reflector=yes

While technically not required, you can also set which interfaces to listen on. By default it will listen on all networks, which includes WAN and other VLANs, so I prefer to limit this just to the two networks I need.

On my router, my LAN is the br-lan device and my home automation network on VLAN 2 is the eth1.2 device. Your LAN is probably the same, but your other networks will most likely be different. You can find these in your router’s LuCI web interface under Network -> Interfaces. The allow-interfaces option goes under the [server] section and looks like this.

[server]
allow-interfaces=br-lan,eth1.2
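
If you want to double check the two settings after editing, the config is INI-style, so something like the following Python sketch can read it back. This is only an illustration: it assumes the file parses cleanly with the standard library's configparser, and the function name reflector_summary and the inline sample text are made up here (in practice you would read your actual Avahi daemon config file).

```python
import configparser

# Sample text containing only the two settings discussed above.
SAMPLE = """\
[server]
allow-interfaces=br-lan,eth1.2

[reflector]
enable-reflector=yes
"""


def reflector_summary(conf_text):
    """Return (reflector_enabled, allowed_interfaces) from INI-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # Missing section or option is treated as "not enabled" in this sketch.
    enabled = parser.getboolean("reflector", "enable-reflector", fallback=False)
    raw = parser.get("server", "allow-interfaces", fallback="")
    interfaces = [name.strip() for name in raw.split(",") if name.strip()]
    return enabled, interfaces


print(reflector_summary(SAMPLE))
```

An empty interface list in the result means no allow-interfaces restriction was found, which matches the default of listening on all networks described above.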

Now we can start and enable the service!

/etc/init.d/avahi-daemon start
/etc/init.d/avahi-daemon enable

OK that’s all we need to do for Avahi. It is now configured to listen on both LAN and HA interfaces and act as a proxy back and forth.

Firewall rules

As mentioned above, devices need to have incoming UDP port 5353 open. In order for our router to act as a proxy, we must enable this on both LAN and HA network interfaces (we’ll just configure for all interfaces). As mDNS multicasts to a specific address with source and destination ports both using 5353, we can lock this rule down a bit more.

Log onto your router’s LuCI web interface and go to Network -> Firewall -> Traffic Rules tab. Under Open ports on router add a new rule for mDNS. This will be for UDP on port 5353.

Find the new rule in the list and edit it so we can customise it further. We can set the source to be any zone, source port to be 5353, where destination zone is the Device (input) and the destination address and port are 224.0.0.251 and 5353. Finally, the action should be set to accept. If you prefer to not allow all interfaces, then create two rules instead and restrict source zone for one to LAN and to your untrusted network for the other. Hit Save & Apply to make the rule!

We should now be able to resolve mDNS from LAN into the untrusted network.

Testing

To test it, ensure your Fedora computer is configured for mDNS and can resolve itself. Now, try and ping a device in your untrusted network. For me, this will be study.local which is one of my home automation devices in my study (funnily enough).

ping study.local

When my computer in LAN tries to discover the device running in the study, the communication flow looks like this.

  • My computer (192.168.0.125) on LAN tries to ping study.local but needs to resolve it.
  • My computer sends out the mDNS UDP multicast to 224.0.0.251:5353 on the LAN, requesting address of study.local.
  • My router (192.168.0.1) picks up the request on LAN and sends same multicast request out on HA network (10.0.0.1).
  • The study device on HA network picks up the request and multicasts the reply of 10.0.0.202 back to 224.0.0.251:5353 on the HA network.
  • My router picks up the reply on HA network and re-casts it on LAN.
  • My computer picks up the reply on LAN and thus learns the address of the study device on HA network.
  • My computer successfully pings study.local at 10.0.0.202 from LAN by routing through my router to HA network.

This is what a packet capture looks like.

16:38:12.489582 IP 192.168.0.125.5353 > 224.0.0.251.5353: 0 A (QM)? study.local. (35)
16:38:12.489820 IP 10.0.0.1.5353 > 224.0.0.251.5353: 0 A (QM)? study.local. (35)
16:38:12.696894 IP 10.0.0.202.5353 > 224.0.0.251.5353: 0*- [0q] 1/0/0 (Cache flush) A 10.0.0.202 (45)
16:38:12.697037 IP 192.168.0.1.5353 > 224.0.0.251.5353: 0*- [0q] 1/0/0 (Cache flush) A 10.0.0.202 (45)

And that’s it! Now we can use mDNS to resolve devices in an untrusted network from a trusted network with zeroconf.

March 28, 2020

How to get a direct WebRTC connection between two computers

WebRTC is a standard real-time communication protocol built directly into modern web browsers. It enables the creation of video conferencing services which do not require participants to download additional software. Many services make use of it and it almost always works out of the box.

The reason it just works is that it uses a protocol called ICE to establish a connection regardless of the network environment. What that means however is that in some cases, your video/audio connection will need to be relayed (using end-to-end encryption) to the other person via a third-party TURN server. In addition to adding extra network latency to your call, that relay server might be overloaded at some point and drop or delay packets coming through.

Here's how to tell whether or not your WebRTC calls are being relayed, and how to ensure you get a direct connection to the other host.

Testing basic WebRTC functionality

Before you place a real call, I suggest using the official test page which will test your camera, microphone and network connectivity.

Note that this test page makes use of a Google TURN server which is locked to particular HTTP referrers and so you'll need to disable privacy features that might interfere with this:

  • Brave: Disable Shields entirely for that page (Simple view) or allow all cookies for that page (Advanced view).

  • Firefox: Ensure that network.http.referer.spoofSource is set to false in about:config, which it is by default.

  • uMatrix: The "Spoof Referer header" option needs to be turned off for that site.

Checking the type of peer connection you have

Once you know that WebRTC is working in your browser, it's time to establish a connection and look at the network configuration that the two peers agreed on.

My favorite service at the moment is Whereby (formerly Appear.in), so I'm going to use that to connect from two different computers:

  • canada is a laptop behind a regular home router without any port forwarding.
  • siberia is a desktop computer in a remote location that is also behind a home router, but in this case its internal IP address (192.168.1.2) is set as the DMZ host.

Chromium

For all Chromium-based browsers, such as Brave, Chrome, Edge, Opera and Vivaldi, the debugging page you'll need to open is called chrome://webrtc-internals.

Look for RTCIceCandidatePair lines and expand them one at a time until you find the one which says:

  • state: succeeded (or state: in-progress)
  • nominated: true
  • writable: true

Then from the name of that pair (N6cxxnrr_OEpeash in the above example) find the two matching RTCIceCandidate lines (one local-candidate and one remote-candidate) and expand them.

In the case of a direct connection, I saw the following on the remote-candidate:

  • ip shows the external IP address of siberia
  • port shows a random number between 1024 and 65535
  • candidateType: srflx

and the following on local-candidate:

  • ip shows the external IP address of canada
  • port shows a random number between 1024 and 65535
  • candidateType: prflx

These candidate types indicate that a STUN server was used to determine the public-facing IP address and port for each computer, but the actual connection between the peers is direct.

On the other hand, for a relayed/proxied connection, I saw the following on the remote-candidate side:

  • ip shows an IP address belonging to the TURN server
  • candidateType: relay

and the same information as before on the local-candidate.
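
The decision being made by eye here can be summed up in a few lines of Python. This is just a sketch of the reasoning, with a made-up function name: ICE defines four candidate types (host, srflx, prflx and relay), and the selected pair is relayed whenever either side is a relay candidate.

```python
# ICE candidate types that do not involve a TURN relay.
DIRECT_TYPES = {"host", "srflx", "prflx"}


def connection_kind(local_type, remote_type):
    """Classify the selected candidate pair as direct, relayed or unknown."""
    if "relay" in (local_type, remote_type):
        return "relayed"
    if local_type in DIRECT_TYPES and remote_type in DIRECT_TYPES:
        return "direct"
    return "unknown"


print(connection_kind("prflx", "srflx"))  # the direct case described above
print(connection_kind("prflx", "relay"))  # the relayed case described above
```

The two calls correspond to the two situations described above and report direct and relayed respectively.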

Firefox

If you are using Firefox, the debugging page you want to look at is about:webrtc.

Expand the top entry under "Session Statistics" and look for the line (should be the first one) which says the following in green:

  • ICE State: succeeded
  • Nominated: true
  • Selected: true

then look in the "Local Candidate" and "Remote Candidate" sections to find the candidate type in brackets.

Firewall ports to open to avoid using a relay

In order to get a direct connection to the other WebRTC peer, one of the two computers (in my case, siberia) needs to open all inbound UDP ports since there doesn't appear to be a way to restrict Chromium or Firefox to a smaller port range for incoming WebRTC connections.

This isn't great and so I decided to tighten that up in two ways by:

  • restricting incoming UDP traffic to the IP range of siberia's ISP, and
  • explicitly denying incoming traffic to the UDP ports I know are open on siberia.

To get the IP range, start with the external IP address of the machine (I'll use the IP address of my blog in this example: 66.228.46.55) and pass it to the whois command:

$ whois 66.228.46.55 | grep CIDR
CIDR:           66.228.32.0/19
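
As a quick sanity check that the address really falls inside the range whois reported, and to see how wide a /19 is, Python's standard ipaddress module can do the arithmetic. The helper name in_range is only for illustration.

```python
import ipaddress


def in_range(ip, cidr):
    """Return True if the IP address falls inside the CIDR block."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)


network = ipaddress.ip_network("66.228.32.0/19")
print(in_range("66.228.46.55", "66.228.32.0/19"))
print(network[0], network[-1], network.num_addresses)
```

A /19 covers 8192 addresses, here 66.228.32.0 through 66.228.63.255, so this still admits a fair number of hosts, just far fewer than the whole Internet.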

To get the list of open UDP ports on siberia, I sshed into it and ran nmap:

$ sudo nmap -sU localhost

Starting Nmap 7.60 ( https://nmap.org ) at 2020-03-28 15:55 PDT
Nmap scan report for localhost (127.0.0.1)
Host is up (0.000015s latency).
Not shown: 994 closed ports
PORT      STATE         SERVICE
631/udp   open|filtered ipp
5060/udp  open|filtered sip
5353/udp  open          zeroconf

Nmap done: 1 IP address (1 host up) scanned in 190.25 seconds

I ended up with the following in my /etc/network/iptables.up.rules (ports below 1024 are denied by the default rule and don't need to be included here):

# Deny all known-open high UDP ports before enabling WebRTC for canada
-A INPUT -p udp --dport 5060 -j DROP
-A INPUT -p udp --dport 5353 -j DROP
-A INPUT -s 66.228.32.0/19 -p udp --dport 1024:65535 -j ACCEPT
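
The order of these rules matters, because iptables walks a chain from top to bottom and stops at the first rule with a terminating target. The following Python sketch simulates that first-match behaviour for the three rules above. It is only a model for reasoning about the ordering: the names RULES and verdict are made up, and the default of DROP is an assumption standing in for the default rule mentioned above.

```python
import ipaddress

ISP_RANGE = ipaddress.ip_network("66.228.32.0/19")

# (match function, target) pairs in the same order as the rules above.
RULES = [
    (lambda src, dport: dport == 5060, "DROP"),
    (lambda src, dport: dport == 5353, "DROP"),
    (lambda src, dport: src in ISP_RANGE and 1024 <= dport <= 65535, "ACCEPT"),
]


def verdict(src, dport, default="DROP"):
    """Return the target of the first matching rule for an inbound UDP packet."""
    src = ipaddress.ip_address(src)
    for matches, target in RULES:
        if matches(src, dport):
            return target
    return default  # assumed default policy, not part of the rules above


print(verdict("66.228.46.55", 40000))  # high port from the ISP range
print(verdict("66.228.46.55", 5353))   # known-open port, dropped first
print(verdict("8.8.8.8", 40000))       # outside the ISP range
```

If the ACCEPT rule came first, packets from the ISP range to ports 5060 and 5353 would be accepted before the DROP rules were ever consulted.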

March 25, 2020

Updating OpenStack TripleO Ceph nodes safely one at a time

Part of the process when updating Red Hat’s TripleO based OpenStack is to apply the package and container updates, via the update run step, to the nodes in each Role (like Controller, CephStorage and Compute, etc). This is done in-place, before the ceph-upgrade (ceph-ansible) step, converge step and reboots.

openstack overcloud update run --nodes CephStorage

Rather than do an entire Role straight up however, I always update one node of that type first. This lets me make sure there were no problems (and fix them if there were), before moving onto the whole Role.

I noticed recently when performing the update step on CephStorage role nodes that OSDs and OSD nodes were going down in the cluster. This was then causing my Ceph cluster to go into backfilling and recovering (norebalance was set).

We want all of these nodes to be done one at a time, as taking more than one node out at a time can potentially make the Ceph cluster stop serving data (all VMs will freeze) until it finishes and gets the minimum number of copies in the cluster. If all three copies of data go offline at the same time, it’s not going to be able to recover.

My concern was that the update step does not check the status of the cluster, it just goes ahead and updates each node one by one (the separate ceph update run step does check the state). If the Ceph nodes are updated faster than the cluster can fix itself, we might end up with multiple nodes going offline and hitting the issues mentioned above.

So to work around this I just ran this simple bash loop. It gets a list of all the Ceph Storage nodes and before updating each one in turn, checks that the status of the cluster is HEALTH_OK before proceeding. This would not be possible if we updated by Role instead.

source ~/stackrc
for node in $(openstack server list -f value -c Name |grep ceph-storage |sort -V); do
  # Wait until the cluster reports HEALTH_OK before touching the next node
  while [[ ! "$(ssh -q controller-0 'sudo ceph -s |grep health:')" =~ "HEALTH_OK" ]] ; do
    # Double quotes so that ${node} is expanded in the messages
    echo "cluster not healthy, sleeping before updating ${node}"
    sleep 5
  done
  echo "cluster healthy, updating ${node}"
  openstack overcloud update run --nodes "${node}" || { echo "failed to update ${node}, exiting"; exit 1 ;}
  echo "updated ${node} successfully"
done
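
The same gating logic, sketched in Python with the cluster check and the update step passed in as plain callables so that it can be exercised without a real cluster. Everything named here (wait_until_healthy, update_nodes, get_health, update_node) is hypothetical. In real use the two callables would wrap the ceph -s check and the openstack overcloud update run command from the loop above, for example via subprocess.

```python
import time


def wait_until_healthy(get_health, poll_seconds=5, sleep=time.sleep):
    """Block until get_health() reports HEALTH_OK; return the number of polls."""
    polls = 1
    while "HEALTH_OK" not in get_health():
        sleep(poll_seconds)
        polls += 1
    return polls


def update_nodes(nodes, get_health, update_node, sleep=time.sleep):
    """Update nodes one at a time, gating each update on cluster health.

    Returns the list of nodes updated; stops at the first failed update.
    """
    done = []
    for node in nodes:
        wait_until_healthy(get_health, sleep=sleep)
        if not update_node(node):
            break
        done.append(node)
    return done
```

Passing the sleep function in as a parameter is just a convenience so a dry run with canned health statuses does not actually wait between polls.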

I’m not sure if the cluster going down like this is expected behaviour, but I opened a bugzilla for it.

March 22, 2020

My POWER9 CPU Core Layout

So, following on from my post on Sensors on the Blackbird (and thus Power9), I mentioned that when you look at the temperature sensors for each CPU core in my 8-core POWER9 chip, they’re not linear numbers. Let’s look at what that means….

stewart@blackbird9$ sudo ipmitool sensor | grep core
 p0_core0_temp            | na                                                                                                               
 p0_core1_temp            | na                                                                                                               
 p0_core2_temp            | na                                                                                                               
 p0_core3_temp            | 38.000                                                                                                           
 p0_core4_temp            | na          
 p0_core5_temp            | 38.000      
 p0_core6_temp            | na          
 p0_core7_temp            | 38.000      
 p0_core8_temp            | na          
 p0_core9_temp            | na          
 p0_core10_temp           | na          
 p0_core11_temp           | 37.000      
 p0_core12_temp           | na          
 p0_core13_temp           | na          
 p0_core14_temp           | na          
 p0_core15_temp           | 37.000      
 p0_core16_temp           | na          
 p0_core17_temp           | 37.000      
 p0_core18_temp           | na          
 p0_core19_temp           | 39.000      
 p0_core20_temp           | na          
 p0_core21_temp           | 39.000      
 p0_core22_temp           | na          
 p0_core23_temp           | na        

You can see I have eight CPU cores in my Blackbird system. The reason the 8 CPU cores are core 3, 5, 7, 11, 15, 17, 19, and 21 rather than 0-7 or something is that these represent the core numbers on the physical die, and the die is a 24 core die. When you’re making a chip as big and as complex as modern high performance CPUs, not all of the chips coming out of your fab are going to be perfect, so this is how you get different models in the line with only one production line.

Weirdly, the output from the hwmon sensors includes a “Core 24” and a “Core 28”. That’s just… wrong. It is right, however, if you think of 8*4=32: each of the 8 cores has 4 hardware threads, and Linux thinks that Thread=Core in some ways. So, yeah, this numbering is the first thread of each logical core.

[stewart@blackbird9 ~]$ sensors|grep -i core
 Chip 0 Core 0:            +39.0°C  (lowest = +25.0°C, highest = +71.0°C)
 Chip 0 Core 4:            +39.0°C  (lowest = +26.0°C, highest = +66.0°C)
 Chip 0 Core 8:            +39.0°C  (lowest = +27.0°C, highest = +67.0°C)
 Chip 0 Core 12:           +39.0°C  (lowest = +26.0°C, highest = +67.0°C)
 Chip 0 Core 16:           +39.0°C  (lowest = +25.0°C, highest = +67.0°C)
 Chip 0 Core 20:           +39.0°C  (lowest = +26.0°C, highest = +69.0°C)
 Chip 0 Core 24:           +39.0°C  (lowest = +27.0°C, highest = +67.0°C)
 Chip 0 Core 28:           +39.0°C  (lowest = +27.0°C, highest = +64.0°C)
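
The numbering above can be sketched in a few lines of Python. This is only an illustration: it assumes four hardware threads per core, and it assumes the hwmon entries appear in the same order as the populated cores in the IPMI output, which the sensors output itself does not confirm.

```python
# Sketch: relate hwmon "Core N" labels to die core numbers.
# Assumption: 4 hardware threads per core, so N is the first thread of a core.
# Assumption (not confirmed by the source): hwmon order matches IPMI order.
THREADS_PER_CORE = 4

hwmon_labels = [0, 4, 8, 12, 16, 20, 24, 28]      # from `sensors`
physical_cores = [3, 5, 7, 11, 15, 17, 19, 21]    # from `ipmitool sensor`


def logical_core(label, threads_per_core=THREADS_PER_CORE):
    """Return the logical core index for a hwmon 'Core N' label."""
    return label // threads_per_core


# Map each hwmon label to the die core it would correspond to.
mapping = {label: physical_cores[logical_core(label)] for label in hwmon_labels}

for label, phys in mapping.items():
    print(f"hwmon Core {label:2d} -> logical core {logical_core(label)} -> die core {phys}")
```

Under those assumptions, “Core 28” is simply the eighth working core, not a 29th one.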

But let’s ignore that and go from the IPMI sensors, which also match what the OCC shows with “occtoolp9 -SL” (see below).

$ ./occtoolp9 -SL
Sensor Details: (found 86 sensors, details only for Status of 0x00)                                           
     GUID Name             Sample     Min    Max U    Stat   Accum     UpdFreq   ScaleFactr   Loc   Type 
....
   0x00ED TEMPC03………     47      29     47 C    0x00 0x00037CF2 0x00007D00 0x00000100 0x0040 0x0008
   0x00EF TEMPC05………     37      26     39 C    0x00 0x00014E53 0x00007D00 0x00000100 0x0040 0x0008
   0x00F1 TEMPC07………     46      28     46 C    0x00 0x0001A777 0x00007D00 0x00000100 0x0040 0x0008
   0x00F5 TEMPC11………     44      27     45 C    0x00 0x00018402 0x00007D00 0x00000100 0x0040 0x0008
   0x00F9 TEMPC15………     36      25     43 C    0x00 0x000183BC 0x00007D00 0x00000100 0x0040 0x0008
   0x00FB TEMPC17………     38      28     41 C    0x00 0x00015474 0x00007D00 0x00000100 0x0040 0x0008
   0x00FD TEMPC19………     43      27     44 C    0x00 0x00016589 0x00007D00 0x00000100 0x0040 0x0008
   0x00FF TEMPC21………     36      30     40 C    0x00 0x00015CA9 0x00007D00 0x00000100 0x0040 0x0008

So what does that mean for physical layout? Well, like all modern high performance chips, the POWER9 is modular, with a bunch of logic being replicated all over the die. The most notable duplicated parts are the core (replicated 24 times!) and cache structures. Less so are memory controllers and PCI hardware.

P9 chip layout from page 31 of the POWER9 Register Specification

See that each core (e.g. EC00 and EC01) is paired with the cache block (EC00 and EC01 with EP00). That’s two POWER9 cores with one 512KB L2 cache and one 10MB L3 cache.

You can see the cache layout (including L1 Instruction and Data caches) by looking in sysfs:

$ for i in /sys/devices/system/cpu/cpu0/cache/index*/; \
  do echo -n $(cat $i/level) $(cat $i/size) $(cat $i/type); \
  echo; done
 1 32K Data
 1 32K Instruction
 2 512K Unified
 3 10240K Unified

So, what does the layout of my POWER9 chip look like? Well, thanks to the power of graphics software, we can cross some cores out and look at the topology:

My 8-core POWER9 CPU in my Raptor Blackbird

If I run some memory bandwidth benchmarks, I can see the L3 cache capacity you’d assume from the above diagram: 80MB (10MB/core). Let’s see:

[stewart@blackbird9 lmbench3]$ for i in 5M 10M 20M 30M 40M 50M 60M 70M 80M 500M; \
  do echo -n "$i   "; \
  ./bin/bw_mem -N 100  $i rd; \
done
  5M    5.24 63971.98
 10M   10.49 31940.14
 20M   20.97 17620.16
 30M   31.46 18540.64
 40M   41.94 18831.06
 50M   52.43 17372.03
 60M   62.91 16072.18
 70M   73.40 14873.42
 80M   83.89 14150.82
 500M 524.29 14421.35

If all the cores were packed together, I’d expect that cliff to be a lot sooner.
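
One crude way to put a number on where that cliff ends is to find the first working-set size whose bandwidth is close to the main-memory figure. The sketch below uses the POWER9 numbers from the run above; the function name and the 5% tolerance are arbitrary choices for illustration, and the largest run (500M) is assumed to be served entirely from main memory.

```python
# (working-set size in MB, read bandwidth in MB/s) from the POWER9 run above
POWER9 = [
    (5.24, 63971.98), (10.49, 31940.14), (20.97, 17620.16),
    (31.46, 18540.64), (41.94, 18831.06), (52.43, 17372.03),
    (62.91, 16072.18), (73.40, 14873.42), (83.89, 14150.82),
    (524.29, 14421.35),
]


def first_size_near_dram(samples, tolerance=0.05):
    """Return the smallest size whose bandwidth is within `tolerance`
    of the last (largest) run, assumed to be main-memory bandwidth."""
    dram_bw = samples[-1][1]
    for size_mb, bw in samples:
        if bw <= dram_bw * (1 + tolerance):
            return size_mb
    return None


print(first_size_near_dram(POWER9))
```

The same function can be pointed at any other bw_mem run by swapping in a different list of (size, bandwidth) pairs.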

So how does this compare to other machines I have around? Well, let’s look at my Ryzen 7. Specifically, an “AMD Ryzen 7 1700 Eight-Core Processor”. The cache layout is:

$ for i in /sys/devices/system/cpu/cpu0/cache/index*/; \
  do echo -n $(cat $i/level) $(cat $i/size) $(cat $i/type); \
  echo; \
done
 1 32K Data
 1 64K Instruction
 2 512K Unified
 3 8192K Unified

And then the same kind of benchmark I ran above on the POWER9 (using smaller sizes at the low end, as 8MB is less than 10MB):

$ for i in 4M 8M 16M 24M 32M 40M 48M 56M 64M 72M 80M 500M; \
  do echo -n "$i   "; ./bin/x86_64-linux-gnu/bw_mem -N 10  $i rd;\
done
  4M    4.19 61111.04
  8M    8.39 28596.55
 16M   16.78 21415.12
 24M   25.17 20153.57
 32M   33.55 20448.20
 40M   41.94 20940.11
 48M   50.33 20281.39
 56M   58.72 21600.24
 64M   67.11 21284.13
 72M   75.50 20596.18
 80M   83.89 20802.40
 500M 524.29 21489.27

And my laptop? It’s a four core part, specifically an “Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz” with a cache layout like:

$ for i in /sys/devices/system/cpu/cpu0/cache/index*/; \
   do echo -n $(cat $i/level) $(cat $i/size) $(cat $i/type); \
     echo; \
   done
   1 32K Data
   1 32K Instruction
   2 256K Unified
   3 6144K Unified 
$ for i in 3M 6M 12M 18M 24M 30M 36M 42M 500M; \
  do echo -n "$i   "; ./bin/x86_64-linux-gnu/bw_mem -N 10  $i rd;\
done
  3M    3.15 48500.24
  6M    6.29 27144.16
 12M   12.58 18731.80
 18M   18.87 17757.74
 24M   25.17 17154.12
 30M   31.46 17135.87
 36M   37.75 16899.75
 42M   44.04 16865.44
 500M 524.29 16817.10

I’m not sure what performance conclusions we can realistically draw from these curves, apart from “keeping workload to L3 cache is cool”, and “different chips have different cache hardware”, and “I should probably go and read and remember more about the microarchitectural characteristics of the cache hardware in Ryzen 7 hardware and 10th gen Intel Core hardware”.

Online Teaching

The OpenSTEM® materials are ideally suited to online teaching. In these times of new challenges and requirements, there are a lot of technological possibilities. Schools and teachers are increasingly being asked to deliver material online to students. Our materials can assist with that process, especially for Humanities and Science subjects from Prep/Kindy/Foundation to Year 6. […]

Covid 19 Numbers – lag

Recording some thoughts about Covid 19 numbers.

Today’s figures

The Government says:

“As at 6.30am on 22 March 2020, there have been 1,098 confirmed cases of COVID-19 in Australia”.

The reference is https://www.health.gov.au/news/health-alerts/novel-coronavirus-2019-ncov-health-alert/coronavirus-covid-19-current-situation-and-case-numbers. However, that page is updated daily (ish), so don’t expect it to be the same if you check the reference.

Estimating Lag

If a person tests positive to the virus today, that means they were infected at some time in the past. So, what is the lag between infection and a positive test result?

Incubation Lag – about 5 days

When you are infected you don’t show symptoms immediately. Rather, there’s an incubation period before symptoms become apparent. The time between being infected and developing symptoms varies from person to person, but most of the time a person shows symptoms after about 5 days (I recall seeing somewhere that 1 in 1000 cases will develop symptoms after 14 days).

Presentation Lag – about 2 days

I think it’s fair to also assume that people are not presenting at testing immediately they become ill. It is probably taking them a couple of days from developing symptoms to actually get to the doctor – I read a story somewhere (have since lost the reference) about a young man who went to a party, then felt bad for days but didn’t go for a test until someone else from the party had returned a positive test.  Let’s assume there’s a mix of worried well and stoic types and call it 2 days from becoming symptomatic to seeking a test.

Referral Lag – about a day

Assuming that a GP is available straight away and recommends a test immediately, logistically there will still be most of a day taken up between deciding to see a doctor and having a test carried out.

Testing lag – about 2 days

The graph of infections (the “epi graph”) today looks like this:

Graph: new and cumulative COVID-19 cases in Australia by notification date, as at 22 March 2020

One thing you notice about the graph is that the new cases bars seem to increase for a couple of days, then decrease – so about 100 new cases in the last 24 hours, but almost 200 in the 24 hours before that. From the graph, the last 3 “dips” have been today (Sunday), last Thursday and last Sunday.  This seems to be happening every 3 to 4 days. I initially thought that the dips might mean fewer (or more) people presenting over weekends, but the period is inconsistent with that. I suspect, instead, that this actually means that testing is being batched.

That would mean that neither the peaks nor troughs is representative of infection surges/retreats, but is simply reflecting when tests are being processed. This seems to be a 4 day cycle, so, on average it seems that it would be about 2 days between having the test conducted and receiving a result. So a confirmed case count published today is actually showing confirmed cases as at about 2 days earlier.

Total lag

From the date someone is infected to the time that they receive a positive confirmation is about:

lag = time for symptoms to show + time to seek a test + referral time + time for the test to return a result

So, the published figures on confirmed infections are probably lagging actual infections in the community by about 10 days (5+2+1+2).

If there’s about a 10 day lag between infection and confirmation, then what a figure published today says is that about a week and a half ago there were about this many cases in the community.  So, the 22 March figure of 1098 infections is actually really a 12 March figure.
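
The arithmetic above can be sketched as follows. The lag values are the estimates from this post; the variable names are just for illustration.

```python
from datetime import date, timedelta

# Estimated lags in days, as discussed in the sections above.
lags_days = {
    "incubation": 5,
    "presentation": 2,
    "referral": 1,
    "testing": 2,
}

total_lag = sum(lags_days.values())

# Date the confirmed-case figure was published.
published = date(2020, 3, 22)

# Approximate date the published figure actually describes.
infection_date = published - timedelta(days=total_lag)

print(f"total lag: {total_lag} days")
print(f"figure published {published} reflects infections as at {infection_date}")
```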

What the lag means for Physical (ie Social) Distancing

The main thing that the lag means is that if we were able to wave a magic wand today and stop all further infections, we would continue to record new infections for about 10 days (and the tail for longer). In practical terms, implementing physical distancing measures will not show any effect on new cases for about a week and a half. That’s because today there are infected people who are yet to be tested.

The silver lining to that is that the physical distancing measures that have been gaining prominence since 15 March should start to show up in the daily case numbers from the middle of the coming week, possibly offset by overseas entrants rushing to make the 20 March entry deadline.

Estimating Actual Infections as at Today

How many people are infected, but unconfirmed as at today? To estimate actual infections you’d need to have some idea of the rate at which infections are increasing. For example, if infections increased by 10% per day for 10 days, then you’d multiply the most recent figure by 1.1 raised to the power of 10 (ie about 2.6). Unfortunately, the daily rate of increase (see table on the wiki page) has varied a fair bit (from 20% to 27%) over the most recent 10 days of data (that is, over the 10 days prior to 12 March, since the 22 March figures roughly correspond to 12 March infections) and there’s no guarantee that since that time the daily increase in infections will have remained stable, particularly in light of the implementation of physical distancing measures. At 23.5% per day, the factor is about 8.
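
The multiplication factors mentioned above come from simple compounding. A minimal sketch, assuming a constant daily rate over the whole lag period (which, as noted, is not guaranteed):

```python
def growth_factor(daily_rate, days):
    """Factor by which a count grows at a constant daily rate.

    daily_rate is a fraction, e.g. 0.10 for 10% per day.
    """
    return (1 + daily_rate) ** days


# 10% per day for 10 days
print(round(growth_factor(0.10, 10), 2))

# 23.5% per day for 10 days
print(round(growth_factor(0.235, 10), 2))
```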

There aren’t any reliable figures we can use to estimate the rate of infection during the current lag period (ie from 12 March to 22 March). This is because the vast majority of cases have not been from unexplained community transmission. Most of the cases are from people who have been overseas in the previous fortnight and they’re the cohort that has been most significantly impacted by recent physical distancing measures. From 15 March, they have been required to self isolate and from 20 March most of their entry into the country has stopped.  So I’d expect a surge in numbers up to about 30 March – ie reflecting infections in the cohort of people rushing to get into the country before the borders closed followed by a flattening. With the lag factor above, you’ll need to wait until 1 April or thereabouts to know for sure.

Note:

This post is just about accounting for the time lag between becoming infected and receiving a positive test result. It assumes, for example, that everyone who is infected seeks a test, and that everyone who is infected and seeks a test is, in fact, tested. As at today, neither of these things is true.

OCC and Sensors on the Raptor Blackbird (and other POWER9 systems)

In this post we’re going to look at three different ways to look at various sensors in the Raptor Blackbird system. The Blackbird is a single socket uATX board for the POWER9 processor. One advantage of the system is its completely open source firmware, so you can (like I have) build your own firmware. So, this is my Blackbird running my most recent firmware build (the BMC is running the 2.00 release from Raptor).

Sensors over IPMI

One way to get the sensors is over IPMI. This can be done either in-band (as in, from the OS running on the blackbird), or over the network.

stewart@blackbird9$ sudo ipmitool sensor |head
occ                      | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
 occ0                     | 0x0        | discrete   | 0x0200| na        | na        | na        | na        | na        | na        
 occ1                     | 0x0        | discrete   | 0x0100| na        | na        | na        | na        | na        | na        
 p0_core0_temp            | na         |            | na    | na        | na        | na        | na        | na        | na        
 p0_core1_temp            | na         |            | na    | na        | na        | na        | na        | na        | na        
 p0_core2_temp            | na         |            | na    | na        | na        | na        | na        | na        | na        
 p0_core3_temp            | 38.000     | degrees C  | ok    | na        | -40.000   | na        | 78.000    | 90.000    | na        
 p0_core4_temp            | na         |            | na    | na        | na        | na        | na        | na        | na        
 p0_core5_temp            | 38.000     | degrees C  | ok    | na        | -40.000   | na        | 78.000    | 90.000    | na        
 p0_core6_temp            | na         |            | na    | na        | na        | na        | na        | na        | na    

It’s kind of annoying to read there, so standard unix tools to the rescue!

stewart@blackbird9$ sudo ipmitool sensor | cut -d '|' -f 1,2
 occ                      | na                                                                                                               
 occ0                     | 0x0                                                                                                              
 occ1                     | 0x0                                                                                                              
 p0_core0_temp            | na                                                                                                               
 p0_core1_temp            | na                                                                                                               
 p0_core2_temp            | na                                                                                                               
 p0_core3_temp            | 38.000                                                                                                           
 p0_core4_temp            | na          
 p0_core5_temp            | 38.000      
 p0_core6_temp            | na          
 p0_core7_temp            | 38.000      
 p0_core8_temp            | na          
 p0_core9_temp            | na          
 p0_core10_temp           | na          
 p0_core11_temp           | 37.000      
 p0_core12_temp           | na          
 p0_core13_temp           | na          
 p0_core14_temp           | na          
 p0_core15_temp           | 37.000      
 p0_core16_temp           | na          
 p0_core17_temp           | 37.000      
 p0_core18_temp           | na          
 p0_core19_temp           | 39.000      
 p0_core20_temp           | na          
 p0_core21_temp           | 39.000      
 p0_core22_temp           | na          
 p0_core23_temp           | na          
 p0_vdd_temp              | 40.000 
 dimm0_temp               | 35.000      
 dimm1_temp               | na          
 dimm2_temp               | na          
 dimm3_temp               | na          
 dimm4_temp               | 38.000      
 dimm5_temp               | na          
 dimm6_temp               | na          
 dimm7_temp               | na          
 dimm8_temp               | na          
 dimm9_temp               | na          
 dimm10_temp              | na          
 dimm11_temp              | na          
 dimm12_temp              | na          
 dimm13_temp              | na          
 dimm14_temp              | na          
 dimm15_temp              | na          
 fan0                     | 1200.000    
 fan1                     | 1100.000    
 fan2                     | 1000.000    
 p0_power                 | 33.000      
 p0_vdd_power             | 5.000       
 p0_vdn_power             | 9.000       
 cpu_1_ambient            | 30.600      
 pcie                     | 27.000      
 ambient                  | 26.000  

You can see that I have 3 fans, two DIMMs (although why it lists 16 possible DIMMs for a two DIMM slot board is a good question!), and eight CPU cores. More on why the layout of the CPU cores is the way it is in a future post.
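
If you would rather pick out the populated cores programmatically than by eye, something like the following would do. It is a sketch that parses the two-column text shown above; the sample string is a shortened copy of that output, and the function name is made up for illustration. It only looks at chip 0 (the `p0_` prefix).

```python
import re

# Shortened copy of the `ipmitool sensor | cut -d '|' -f 1,2` output above.
SAMPLE = """\
 p0_core2_temp            | na
 p0_core3_temp            | 38.000
 p0_core4_temp            | na
 p0_core5_temp            | 38.000
 dimm0_temp               | 35.000
"""

# Matches core temperature sensor names for chip 0 only.
CORE_RE = re.compile(r"^p0_core(\d+)_temp$")


def active_cores(text):
    """Return {core_number: temperature} for cores reporting a reading."""
    cores = {}
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 2:
            continue
        match = CORE_RE.match(fields[0])
        # "na" means there is no reading, i.e. the core is not present.
        if match and fields[1] != "na":
            cores[int(match.group(1))] = float(fields[1])
    return cores


print(active_cores(SAMPLE))
```

To use it on a live system you would feed it the real command output instead of the sample string.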

The code path for reading these sensors is interesting: it’s all from the BMC, so we’re having the OCC inside the P9 read things, which the BMC then reads, and then passes back to the P9. On the P9 itself, each sensor is a call all the way to firmware and back! In fact, we can look at it in perf:

$ sudo perf record -g ipmitool sensor
$ sudo perf report --no-children
“ipmitool sensors” perf report

What are the 0x300xxxxx addresses? They’re the OPAL firmware (i.e. skiboot). We can look up the symbols easily, as the firmware exposes them to the kernel, which then plonks it in sysfs:

[stewart@blackbird9 ~]$ sudo head /sys/firmware/opal/symbol_map 
[sudo] password for stewart: 
0000000000000000 R __builtin_kernel_end
0000000000000000 R __builtin_kernel_start
0000000000000000 T __head
0000000000000000 T _start
0000000000000010 T fdt_entry
00000000000000f0 t boot_sem
00000000000000f4 t boot_flag
00000000000000f8 T attn_trigger
00000000000000fc T hir_trigger
0000000000000100 t sreset_vector

So we can easily look up exactly where this is:

[stewart@blackbird9 ~]$ sudo grep '18e.. ' /sys/firmware/opal/symbol_map 
 0000000000018e20 t .__try_lock.isra.0
 0000000000018e68 t .add_lock_request

So we’re managing to spend a whole 12% of execution time spinning on a spinlock in firmware! The call stack of what’s going on in firmware isn’t so easy, but we can find the bt_add_ipmi_msg call there which is probably how everything starts:

[stewart@blackbird9 ~]$ sudo grep '516.. ' /sys/firmware/opal/symbol_map
 0000000000051614 t .bt_add_ipmi_msg_head
 0000000000051688 t .bt_add_ipmi_msg
 00000000000516fc t .bt_poll
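
Rather than grepping by address prefix, the lookup can be done properly by finding the last symbol at or below a given offset. The sketch below works on a few lines copied from the symbol map output above. The offsets passed to `lookup` are made-up examples, and converting a 0x300xxxxx address from perf into an offset by subtracting a base of 0x30000000 is an assumption on my part.

```python
import bisect

# A few entries copied from /sys/firmware/opal/symbol_map as shown above.
SAMPLE_MAP = """\
0000000000018e20 t .__try_lock.isra.0
0000000000018e68 t .add_lock_request
0000000000051614 t .bt_add_ipmi_msg_head
0000000000051688 t .bt_add_ipmi_msg
00000000000516fc t .bt_poll
"""


def parse_symbol_map(text):
    """Parse 'address type name' lines into a sorted list of (address, name)."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def lookup(symbols, offset):
    """Return the name of the last symbol starting at or below offset."""
    addrs = [addr for addr, _ in symbols]
    i = bisect.bisect_right(addrs, offset) - 1
    return symbols[i][1] if i >= 0 else None


symbols = parse_symbol_map(SAMPLE_MAP)
print(lookup(symbols, 0x18e40))   # hypothetical offset inside the try-lock code
print(lookup(symbols, 0x51690))   # hypothetical offset inside bt_add_ipmi_msg
```

Note that this does not know where a symbol ends, so an offset past the last entry is attributed to the last symbol.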

OCCTOOL

This is the most not-what-you’re-meant-to-use method of getting access to sensors! It’s using a debug tool for the OCC firmware! There’s a variety of tools in the OCC source repository, and one of them (occtoolp9) can be used for a variety of things, one of which is getting sensor data out of the OCC.

$ sudo ./occtoolp9 -SL
     Sensor Type: 0xFFFF
 Sensor Location: 0xFFFF
     (only displaying non-zero sensors)
 Sending 0x53 command to OCC0 (via opal-prd)…
   MFG Sub Cmd: 0x05  (List Sensors)
   Num Sensors: 50
     [ 1] GUID: 0x0000 / AMEintdur…….  Sample:     20  (0x0014)
     [ 2] GUID: 0x0001 / AMESSdur0…….  Sample:      7  (0x0007)
     [ 3] GUID: 0x0002 / AMESSdur1…….  Sample:      3  (0x0003)
     [ 4] GUID: 0x0003 / AMESSdur2…….  Sample:     23  (0x0017)

The odd thing you’ll see is “via opal-prd” – and this is because it’s doing raw calls to the opal-prd binary to talk to the OCC firmware, running things like “opal-prd --expert-mode htmgt-passthru”. Yeah, this isn’t an in-production thing :)

Amazingly (and interestingly), this doesn’t go through host firmware in the way that an IPMI call will. There’s a full OCC/Host firmware interface spec to read. But it’s an insanely inefficient way to monitor sensors: a long bash script shelling out to a whole bunch of other processes… Think ~14.4 billion cycles versus ~367 million cycles for the ipmitool option above.

But there are some interesting sensors at the end of the list:

Sensor Details: (found 86 sensors, details only for Status of 0x00)                                                  
     GUID Name             Sample     Min    Max U    Stat   Accum     UpdFreq   ScaleFactr   Loc   Type   
....
   0x014A MRDM0………..    688       3  15015 GBs  0x00 0x0144AE6C 0x00001901 0x000080FB 0x0008 0x0200
   0x014E MRDM4………..    480       3  14739 GBs  0x00 0x01190930 0x00001901 0x000080FB 0x0008 0x0200
   0x0156 MWRM0………..    560       4  16605 GBs  0x00 0x014C61FD 0x00001901 0x000080FB 0x0008 0x0200
   0x015A MWRM4………..    360       4  16597 GBs  0x00 0x014AE231 0x00001901 0x000080FB 0x0008 0x0200

Is that memory bandwidth? Well, if I run the STREAM benchmark in a loop and look again:

   0x014A MRDM0………..  15165       3  17994 GBs  0x00 0x0C133D6C 0x00001901 0x000080FB 0x0008 0x0200
   0x014E MRDM4………..  17145       3  18016 GBs  0x00 0x0BF501D6 0x00001901 0x000080FB 0x0008 0x0200
   0x0156 MWRM0………..   8063       4  24280 GBs  0x00 0x07C98B88 0x00001901 0x000080FB 0x0008 0x0200
   0x015A MWRM4………..   1138       4  24215 GBs  0x00 0x07CE82AF 0x00001901 0x000080FB 0x0008 0x0200

It looks like it! Are these exposed elsewhere? Well, another blog post at some point in the future is where I should look at that.

lm-sensors

$ rpm -qf /usr/bin/sensors
 lm_sensors-3.5.0-6.fc31.ppc64le

Ahhh, old faithful lm-sensors! Yep, a whole bunch of sensors are just exposed over the standard interface that we’ve been using since ISA was a thing.

[stewart@blackbird9 ~]$ sensors                                                                  
 ibmpowernv-isa-0000                                       
 Adapter: ISA adapter                                      
 Chip 0 Vdd Remote Sense:  +1.02 V  (lowest =  +0.72 V, highest =  +1.02 V)
 Chip 0 Vdn Remote Sense:  +0.67 V  (lowest =  +0.67 V, highest =  +0.67 V)
 Chip 0 Vdd:               +1.02 V  (lowest =  +0.73 V, highest =  +1.02 V)
 Chip 0 Vdn:               +0.68 V  (lowest =  +0.68 V, highest =  +0.68 V)
 Chip 0 Core 0:            +47.0°C  (lowest = +25.0°C, highest = +71.0°C)            
 Chip 0 Core 4:            +47.0°C  (lowest = +26.0°C, highest = +66.0°C)            
 Chip 0 Core 8:            +48.0°C  (lowest = +27.0°C, highest = +67.0°C)            
 Chip 0 Core 12:           +48.0°C  (lowest = +26.0°C, highest = +67.0°C)            
 Chip 0 Core 16:           +47.0°C  (lowest = +25.0°C, highest = +67.0°C)                      
 Chip 0 Core 20:           +47.0°C  (lowest = +26.0°C, highest = +69.0°C)            
 Chip 0 Core 24:           +48.0°C  (lowest = +27.0°C, highest = +67.0°C)                     
 Chip 0 Core 28:           +51.0°C  (lowest = +27.0°C, highest = +64.0°C)                     
 Chip 0 DIMM 0 :           +40.0°C  (lowest = +34.0°C, highest = +44.0°C)                     
 Chip 0 DIMM 1 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)                     
 Chip 0 DIMM 2 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 3 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 4 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 5 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 6 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 7 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 8 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 9 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 10 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 11 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 12 :          +43.0°C  (lowest = +36.0°C, highest = +47.0°C)
 Chip 0 DIMM 13 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 14 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 15 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 Nest:              +48.0°C  (lowest = +27.0°C, highest = +64.0°C)
 Chip 0 VRM VDD:           +47.0°C  (lowest = +39.0°C, highest = +66.0°C)
 Chip 0 :                  44.00 W  (lowest =  31.00 W, highest = 132.00 W)
 Chip 0 Vdd:               15.00 W  (lowest =   4.00 W, highest = 104.00 W)
 Chip 0 Vdn:               10.00 W  (lowest =   8.00 W, highest =  12.00 W)
 Chip 0 :                 227.11 kJ
 Chip 0 Vdd:               44.80 kJ
 Chip 0 Vdn:               58.80 kJ
 Chip 0 Vdd:              +21.50 A  (lowest =  +6.50 A, highest = +104.75 A)
 Chip 0 Vdn:              +14.88 A  (lowest = +12.63 A, highest = +18.88 A)

The best thing? It’s really quick! The hwmon interface is fast and efficient.

March 21, 2020

Using Ansible and dynamic inventory to manage OpenStack TripleO nodes

TripleO based OpenStack deployments use an OpenStack all-in-one node (undercloud) to automate the build and management of the actual cloud (overcloud) using native services such as Heat and Ironic. Roles are used to define services and configuration, which are then applied to specific nodes, for example, Service, Compute and CephStorage, etc.

Although the install is automated, sometimes you need to run adhoc tasks outside of the official update process. For example, you might want to make sure that all hosts are contactable, have a valid subscription (for Red Hat OpenStack Platform), restart containers, or maybe even apply custom changes or patches before an update. Also, during the update process when nodes are being rebooted, it can be useful to use an Ansible script to know when they’ve all come back, services are all running, all containers are healthy, before re-enabling them.

Inventory script

To make this easy, we can use the TripleO Ansible inventory script, which queries the undercloud to get a dynamic inventory of the overcloud nodes. When using the script as an inventory source with the ansible command however, you cannot pass arguments to it. If you’re managing a single cluster and using the standard stack name of overcloud, then this is not a problem; you can just call the script directly.

However, as I manage multiple clouds and each has a different Heat stack name, I create a little executable wrapper script to pass the stack name to the inventory script. Then I just call the relevant shell script instead. If you use the undercloud host to manage multiple stacks, then create multiple scripts and modify as required.

cat > inventory-overcloud.sh << 'EOF'
#!/usr/bin/env bash
source ~/stackrc
exec /usr/bin/tripleo-ansible-inventory --stack stack-name --list
EOF

Make it executable and run it. It should return JSON with your overcloud node details.

chmod u+x inventory-overcloud.sh
./inventory-overcloud.sh
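
To get a feel for what that JSON looks like, here is a small sketch that resolves a group to its hosts. It assumes the standard Ansible dynamic inventory layout (groups with `hosts` and `children` keys, plus a `_meta` section). The sample data is hypothetical and much smaller than what a real overcloud returns, and `hosts_in` is a made-up helper name.

```python
import json

# Hypothetical, heavily trimmed example of dynamic inventory JSON.
SAMPLE = json.loads("""
{
  "Compute": {"hosts": ["compute-0", "compute-1"]},
  "Telemetry": {"hosts": ["telemetry-0"]},
  "overcloud": {"children": ["Compute", "Telemetry"]},
  "_meta": {"hostvars": {}}
}
""")


def hosts_in(inventory, group, seen=None):
    """Return a sorted list of all hosts in a group, following child groups."""
    seen = set() if seen is None else seen
    if group in seen or group == "_meta":
        return []
    seen.add(group)  # guard against groups that reference each other
    entry = inventory.get(group, {})
    hosts = list(entry.get("hosts", []))
    for child in entry.get("children", []):
        hosts.extend(hosts_in(inventory, child, seen))
    return sorted(set(hosts))


print(hosts_in(SAMPLE, "overcloud"))
print(hosts_in(SAMPLE, "Compute"))
```

With real data you would load the output of the wrapper script with `json.load` instead of using the sample.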

Run simple tasks

The purpose of using the dynamic inventory is to run some Ansible! We can now use it to do simple things easily, like ping nodes to make sure they are online.

ansible \
--inventory inventory-overcloud.sh \
all \
--module-name ping

And of course one of the great things with Ansible is the ability to limit which hosts you’re running against. So for example, to make sure all nodes of role type Compute are back, simply replace all with Compute.

ansible \
--inventory inventory-overcloud.sh \
Compute \
--module-name ping

You can also specify nodes individually.

ansible \
--inventory inventory-overcloud.sh \
service-0,telemetry-2,compute-0,compute-1 \
--module-name ping

You can use the shell module to do simple adhoc things, like restart containers or maybe check their health.

ansible \
--inventory inventory-overcloud.sh \
all \
--module-name shell \
--become \
--args 'docker ps | egrep "CONTAINER|unhealthy"'

And the same command using short arguments.

ansible \
-i inventory-overcloud.sh \
all \
-m shell \
-ba 'docker ps | egrep "CONTAINER|unhealthy"'

Create some Ansible plays

As you can see, simple tasks are easy; for more complicated tasks you might want to write some plays.

Pre-fetch downloads before update

Your needs will probably vary, but here is a simple example to pre-download updates on my RHEL hosts to save time (updates are actually installed separately via overcloud update process). Note that the download_only option was added in Ansible 2.7 and thus I don’t use the yum module as RHEL uses Ansible 2.6.

cat > fetch-updates.yaml << 'EOF'
---
- hosts: all
  tasks:
    - name: Fetch package updates
      command: yum update --downloadonly
      register: result_fetch_updates
      retries: 30
      delay: 10
      until: result_fetch_updates is succeeded
      changed_when: '"Total size:" not in result_fetch_updates.stdout'
      args:
        warn: no
EOF

Now we can run this playbook against the next set of nodes we’re going to update, Compute and Telemetry in this example.

ansible-playbook \
--inventory inventory-overcloud.sh \
--limit Compute,Telemetry \
fetch-updates.yaml

And again, you could specify nodes individually.

ansible-playbook \
--inventory inventory-overcloud.sh \
--limit telemetry-0,service-0,compute-2,compute-3 \
fetch-updates.yaml

There you go. Using dynamic inventory can be really useful for running adhoc commands against your OpenStack nodes.

COVID-19 (of course)

We thought it timely to review a few facts and observations, relying on published medical papers (or those submitted for peer review) and reliable sources.

March 17, 2020

COVID-19 Time Series Analysis

On Friday 13 March I started looking at the COVID-19 time series case data. The first step was to fit a simple exponential model. The model lets us work out the number of cases t days in the future N(t), given N(0) cases today, and a doubling time of Td days:

N(t) = N(0)*2^(t/Td)

To work out how many days (t) to a number of cases N(t), you can re-arrange to get:

t = Td*log2(N(t)/N(0))

At the time I had some US travel planned for late March. So I plugged in some numbers to see how long it would take the US to get to 70,000 cases (China’s cases at the time):

t = 3*log2(70,000/1600) = 16.4 days

Wow. It slowly dawned on me that international travel was going to be a problem. The human mind just struggles to cope with the power of exponential growth. Five days seems a long time ago now….

I immediately grounded my parents – they are an at risk demographic and in a few weeks the hospitals will not be able to help them if they get sick. I estimate my home city of Adelaide (30 cases on March 18) will struggle with 1000 cases (a proportion of which will need Intensive Care):

t = 4*log2(1000/30) = 20 days

The low number of cases today is not important; the exponential growth is the critical factor.
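
The two formulas and the worked examples above can be checked with a few lines of Python. This is a minimal sketch of the simple exponential model only; the function names are made up for illustration and are not taken from the covid19.py script.

```python
import math


def cases_after(n0, days, doubling_time):
    """N(t) = N(0) * 2^(t/Td)"""
    return n0 * 2 ** (days / doubling_time)


def days_until(n_target, n0, doubling_time):
    """t = Td * log2(N(t)/N(0))"""
    return doubling_time * math.log2(n_target / n0)


# US: 1600 cases, 3 day doubling time, days until 70,000 cases
print(round(days_until(70_000, 1600, 3), 1))

# Adelaide: 30 cases, 4 day doubling time, days until 1000 cases
print(round(days_until(1000, 30, 4), 1))
```

Plugging the result of `days_until` back into `cases_after` returns the target number of cases, which is a handy sanity check on the rearrangement.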

Since then I’ve been messing with a customised covid19.py Python script to generate some plots useful to me. It is based on some code I found from Mohammad Ashhad. You might find it useful too, it’s easy to customise to other countries. I’d also appreciate a review of the script and math in this post.

I find that analysing the data gives me a small sense of control over the situation. And a useful crystal ball in this science fiction life we have suddenly started living.

Here are some plots from the last 14 days:


I find the second, log plot much more helpful. A constant positive slope on a log plot indicates exponential growth, which is bad. We want the log plot to flatten out to a horizontal line.

Doubling time is the key metric. Here is a smoothed (3 day window) estimate. A low doubling time (e.g. a few days) is bad; our target is a high doubling time:

It’s a bit noisy at the moment. I’m interested in Spain and Italy as they have locked down. There will be a time lag as infections prior to lock down flow through to cases, but I expect (and sincerely hope) to see the doubling time of those countries improve, and new cases tapering off.
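
One plausible way to estimate doubling time from a series of cumulative case counts is to compare each day with the count a few days earlier. The sketch below does that over a 3 day window. It is an assumption about how such an estimate could be made, not necessarily what covid19.py does, and the case numbers are invented for the example.

```python
import math


def doubling_times(cumulative, window=3):
    """Estimate doubling time (in days) at each day from cumulative counts.

    Rearranging N(t) = N(0) * 2^(t/Td) over a window of `window` days gives
    Td = window / log2(N[i] / N[i - window]).
    """
    estimates = []
    for i in range(window, len(cumulative)):
        earlier, now = cumulative[i - window], cumulative[i]
        if earlier > 0 and now > earlier:
            estimates.append(window / math.log2(now / earlier))
        else:
            # No growth over the window: doubling time is effectively infinite
            estimates.append(math.inf)
    return estimates


# Invented daily cumulative counts that double every 3 days
cases = [10, 12, 16, 20, 24, 32, 40]
print(doubling_times(cases))  # [3.0, 3.0, 3.0, 3.0]
```

A longer window gives a smoother but slower-to-react estimate, which is the usual trade-off with noisy daily data.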

I’m working from home and hoping Australia will lock down soon. I will update the plots above daily.

All the best to everyone.

Update – March 26 2020

It’s been one week since I first published this post and I have been updating the graphs every day. My models are simplistic and I am not an epidemiologist. However, I am sorry to say that exponential growth for Australia and the US has proceeded at the same rate as, or faster than, the simple models above predicted.

Italy is showing a clear trend to an improved doubling time. The top plot shows almost linear growth. This is welcome news and will hopefully soon lead to a decreased load on their hospitals. This is encouraging to me as it shows lock down can work!

A small positive trend for Spain, who have also locked down; however, Australia and the US are still doubling every 3-4 days. It’s clear from the second, log plot that US cases will soon be the highest in the world.

Any changes we make in behaviour today will take 1-2 weeks to flow through. So this is a window into behavioural changes 1-2 weeks ago, and an estimate of the doubling rate for the next 1-2 weeks.

A daily case increase of 10% is a doubling time of 7.3 days (about 1 week). This intuitively feels like a good first milestone, and something expanding health systems have some chance of dealing with. It’s also easy to calculate in your head when looking at day by day statistics. A daily increase of 20% is a 3.8 day doubling time and very bad news.
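
The conversion from a daily percentage increase to a doubling time follows from the exponential model: if cases grow by a fraction r each day, then (1 + r)^Td = 2, so Td = log(2)/log(1 + r). A minimal sketch (the function name is just for illustration):

```python
import math


def doubling_time_from_daily_growth(rate):
    """Doubling time in days for a daily growth rate (0.10 means 10% per day)."""
    return math.log(2) / math.log(1 + rate)


print(round(doubling_time_from_daily_growth(0.10), 1))  # 7.3
print(round(doubling_time_from_daily_growth(0.20), 1))  # 3.8
```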

Australia still doesn’t have a strong lock down, and many people are not staying at home. I hope our government acts decisively soon.

Update – April 3 2020

Another week has passed since my last update – a long time in the Coronavirus saga. A few days after my last update, I noticed the Australian new cases were constant at around 350 for a few days, then started to drop. The doubling time has shot up too, and the top graph looks almost linear now. Australia is now at about 5% new cases/day (300 new/5000 existing cases). We can handle that.

This means our hospitals are not going to break. Good news indeed. My theory on this reduction is the time delayed effect of the Australian population starting to take Corona seriously, and good management by our state and federal governments. Several states have a lock down but the effect hasn’t flowed through to cases yet.

I think we are now entering a “whack a mole” stage, like China, Japan, and South Korea. We’ll have to remain vigilant, stay at home, and smash small outbreaks as they spring up in the community. Recoveries will eventually start to pick up and the number of active cases decline. The current numbers are a 0.5% fatality rate and 2% ICU admission rate.

Despite the appalling number of deaths in Italy and Spain, they clearly have new cases under control through lock down. The log curves are flat, and doubling times steadily increasing. The situation is very bad in the US, and many other countries. I am particularly concerned for the developing world.

I note the doubling rate curve for Spain and Australia is the same; Australia is just much further down the curve. Even my septuagenarian parents are behaving – mostly “staying home”.

Doing my own analysis has been really useful – I basically ignore the headlines (anyone sick of the word “surge”?) as I can look at the data and drill down to what matters. I’m picking trends a few days before they are reported. Still a few things to ponder, like a model for how ICU cases track reported cases.

Best wishes to everyone.

Links

Johns Hopkins CSSE COVID-19 Dashboard
Source Data
Our World in Data Coronavirus Statistics and Research

March 15, 2020

Using network namespaces with veth to NAT guests with overlapping IPs

Sets of virtual machines are connected to virtual bridges (e.g. virbr0 and virbr1) and, as they are isolated, can use the same subnet range and set of IPs. However, NATing becomes a problem because the host won’t know which VM to return the traffic to.

To solve this problem, we can use network namespaces and some veth (virtual Ethernet) devices to connect up each private network we want to NAT.

Each veth device acts like a patch cable and is actually made up of two network devices, one for each end (e.g. peer1-a and peer1-b). By adding those interfaces between bridges and/or namespaces, you create a link between them.

The network namespace is only used for NAT and is where the veth IPs are set; the other end of each pair acts like a patch cable without an IP. The VMs are only connected into their respective bridge (e.g. virbr0) and can talk to the network namespace over the veth patch.

We will use two pairs for each network namespace.

  • One (e.g. represented by veth1 below) which connects the virtual machine’s private network (e.g. virbr0 on 10.0.0.0/24) into the network namespace (e.g. net-ns1) where it sets an IP and will be the private network router (e.g. 10.0.0.1).
  • Another (e.g. represented by veth2 below) which connects the upstream provider network (e.g. br0 on 192.168.0.0/24) into the same network namespace where it sets an IP (e.g. 192.168.0.100).
  • Repeat the process for other namespaces (e.g. represented by veth3 and veth4 below).

Figure: Configuration for multiple namespace NAT

By providing each private network with its own unique upstream routable IP and applying NAT rules inside each namespace separately, we can avoid any conflict.

Create a provider bridge

You’ll need a bridge to a physical network, which will act as your upstream route (like a “provider” network).

ip link add name br0 type bridge
ip link set br0 up
ip link set eth0 up
ip link set eth0 master br0

Create namespace

We create our namespace to patch in the veth devices and hold the router and isolated NAT rules. As this is for the purpose of NATing multiple private networks, I’m making it sequential and calling this nat1 (for our first one, then I’ll call the next one nat2).

ip netns add nat1

First veth pair

Our first veth pair will be used to connect the namespace to the upstream bridge (br0). Give the two interfaces names that make sense to you; here I’m making them sequential again and specifying the purpose. Thus, peer1-br0 will connect to the upstream br0 and peer1-gw1 will hold our routable IP in the namespace.

ip link add peer1-br0 type veth peer name peer1-gw1

Adding the veth to provider bridge

Now we need to add the peer1-br0 interface to the upstream provider bridge and bring it up. Note that we do not set an IP on this; it’s a patch lead. The IP will be on the other end in the namespace.

brctl addif br0 peer1-br0
ip link set peer1-br0 up

First gateway interface in namespace

Next we want to add the peer1-gw1 device to the namespace, give it an IP on the routable network, set the default gateway and bring the device up. You could use DHCP here instead; I’m just setting a static IP of 192.168.0.100 and a gateway of 192.168.0.1.

ip link set peer1-gw1 netns nat1
ip netns exec nat1 ip addr add 192.168.0.100/24 dev peer1-gw1
ip netns exec nat1 ip link set peer1-gw1 up
ip netns exec nat1 ip route add default via 192.168.0.1

Second veth pair

Now we create the second veth pair to connect the namespace into the private network. For this example we’ll be connecting to the virbr0 network, where our first set of VMs are running. Again, give them useful names.

ip link add peer1-virbr0 type veth peer name peer1-gw2

Adding the veth to private bridge

Now we need to add the peer1-virbr0 interface to the virbr0 private network bridge. Note that we do not set an IP on this; it’s a patch lead. The IP will be on the other end in the namespace.

brctl addif virbr0 peer1-virbr0
ip link set peer1-virbr0 up

Second gateway interface in namespace

Next we want to add the peer1-gw2 device to the namespace, give it an IP on the private network and bring the device up. I’m going to set this to the default gateway of the VMs in the private network, which is 10.0.0.1.

ip link set peer1-gw2 netns nat1
ip netns exec nat1 ip addr add 10.0.0.1/24 dev peer1-gw2
ip netns exec nat1 ip link set up dev peer1-gw2

Enable NAT in the namespace

So now we have our namespace with patches into each bridge and IPs on each network. The final step is to enable network address translation.

ip netns exec nat1 iptables -t nat -A POSTROUTING -o peer1-gw1 -j MASQUERADE
ip netns exec nat1 iptables -A FORWARD -i peer1-gw1 -o peer1-gw2 -m state --state RELATED,ESTABLISHED -j ACCEPT
ip netns exec nat1 iptables -A FORWARD -i peer1-gw2 -o peer1-gw1 -j ACCEPT

You can see the rules with standard iptables in the namespace.

ip netns exec nat1 iptables -t nat -L -n

Test it

OK, so logging onto the VMs, they should have a local IP (e.g. 10.0.0.100), a default route to 10.0.0.1 and upstream DNS set. Test that they can ping the gateway, that they can ping the DNS server and that they can ping a DNS name on the Internet.

Rinse and repeat

This can be applied to other virtual machine networks as required. There is no longer any need for the VMs there to have unique IPs; they can overlap each other.

What you do need to do is create a new network namespace, create two new veth pairs (with useful names) and pick another IP on the routable network. The virtual machine gateway IP will be the same in each namespace, that is 10.0.0.1.
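
Since the steps are the same for every namespace, the command list can be generated rather than retyped. The sketch below only builds and prints the commands used in this post with the names substituted; it does not run anything. The function name, its parameters and the second upstream IP (192.168.0.101) are assumptions for illustration, and the naming scheme (natN, peerN-...) simply follows the pattern used above.

```python
IFNAME_MAX = 15  # Linux interface names are limited to 15 characters


def nat_namespace_commands(index, private_bridge, upstream_ip,
                           upstream_bridge="br0",
                           upstream_gateway="192.168.0.1",
                           private_gateway="10.0.0.1",
                           prefix_len=24):
    """Return the shell commands that set up NAT namespace number `index`."""
    ns = f"nat{index}"
    up_patch = f"peer{index}-{upstream_bridge}"   # bridge end, no IP
    up_gw = f"peer{index}-gw1"                    # namespace end, routable IP
    priv_patch = f"peer{index}-{private_bridge}"  # bridge end, no IP
    priv_gw = f"peer{index}-gw2"                  # namespace end, VM gateway IP

    for name in (up_patch, up_gw, priv_patch, priv_gw):
        if len(name) > IFNAME_MAX:
            raise ValueError(f"interface name too long: {name}")

    in_ns = f"ip netns exec {ns}"
    return [
        f"ip netns add {ns}",
        # Upstream (provider) side
        f"ip link add {up_patch} type veth peer name {up_gw}",
        f"brctl addif {upstream_bridge} {up_patch}",
        f"ip link set {up_patch} up",
        f"ip link set {up_gw} netns {ns}",
        f"{in_ns} ip addr add {upstream_ip}/{prefix_len} dev {up_gw}",
        f"{in_ns} ip link set {up_gw} up",
        f"{in_ns} ip route add default via {upstream_gateway}",
        # Private (VM) side
        f"ip link add {priv_patch} type veth peer name {priv_gw}",
        f"brctl addif {private_bridge} {priv_patch}",
        f"ip link set {priv_patch} up",
        f"ip link set {priv_gw} netns {ns}",
        f"{in_ns} ip addr add {private_gateway}/{prefix_len} dev {priv_gw}",
        f"{in_ns} ip link set {priv_gw} up",
        # NAT rules inside the namespace
        f"{in_ns} iptables -t nat -A POSTROUTING -o {up_gw} -j MASQUERADE",
        f"{in_ns} iptables -A FORWARD -i {up_gw} -o {priv_gw} "
        "-m state --state RELATED,ESTABLISHED -j ACCEPT",
        f"{in_ns} iptables -A FORWARD -i {priv_gw} -o {up_gw} -j ACCEPT",
    ]


# Second private network on virbr1, with a hypothetical upstream IP
for command in nat_namespace_commands(2, "virbr1", "192.168.0.101"):
    print(command)
```

Printing the commands first makes it easy to review them before pasting them into a root shell.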

To be, or not to be decisive.

kattekrab Sun, 15/03/2020 - 11:26

March 13, 2020

From 2020 to 2121: How will we get there?

kattekrab Thu, 13/02/2020 - 19:35

6 reasons I love working from home (The COVID19 edition)

kattekrab Fri, 13/03/2020 - 13:34

March 12, 2020

Coronavirus and Work

Currently the big news issue is all about how to respond to Coronavirus. The summary of the medical situation is that it’s going to spread exponentially (as diseases do) and that it has a period of up to 6 days of someone being infectious without having symptoms. So you can get a lot of infected people in an area without anyone knowing about it. Therefore preventative action needs to be taken before there’s widespread known infection.

Governments seem uninterested in doing anything about the disease before they have proof of widespread infection. They won’t do anything until it’s too late.

I finished my last 9-5 job late last year and haven’t got a new one since then. Now I’m thinking of just not taking any work that requires much time spent outside home. If you don’t go to a workplace there isn’t a lot you have to do that involves leaving home.

Shopping is one requirement for leaving home, but the two major supermarket chains in my area (Coles and Woolworths) both offer home delivery for a small price so that covers most shopping. Getting groceries delivered means that they will usually come from the store room not the shop floor, so they wouldn’t have had anyone coughing or sneezing on them. If you are really paranoid (which I’m not at the moment) then you could wear rubber gloves to bring the delivery in and then wash everything before using it. It seems that many people have similar ideas to me: normally Woolworths allows booking next-day delivery, but now you have to book at least 5 days (3 business days) in advance.

If anyone needs some Linux work done remotely then let me know. Otherwise I’ll probably spend the next couple of months at home doing Debian coding and watching documentaries on Netflix.

March 10, 2020

The Net Promoter Score: A Meaningless Flashing Light

Almost two years ago I made a short blog post about how the Net Promoter Score (NPS), commonly used in business settings, is The Most Useless Metric of All. My reasons at the time were that it doesn't capture the reasons for a low score, it doesn't differentiate between subjective values in its scores, and it is mathematically incoherent (a three-value grade from an 11-point range of 0-10). Further, actual studies rank it last in effectiveness.

Recently, the author of the NPS, Fred Reichheld, has come around. Apparently now It's Not About the Score, but rather the score represents a "signal". This is a very far cry from the initial claims in the Harvard Business Review that it is "The One Number You Need to Grow". Of course, it is very difficult for anyone to admit they've made an error, and Reichheld is no exception to this. Instead of addressing what are real problems of the NPS method, he now tries to argue that people have gamified the scores, and that's the real problem. It would be great if, as a general principle in business reasoning, people could just admit that their pet idea is flawed and build something better. That would be appreciated. Defending something that is clearly broken, even if it's your own idea, lacks intellectual humility, and is actually a bit embarrassing to watch.

Even as a signal, the NPS doesn't send a useful signal because the people being surveyed don't know what the signal means. In a scale where a 0 is equal to a 6, the scale is meaningless. There are, in fact, only three values in NPS (promoter, passive, detractor) and only one metric that it can possibly be testing: "How likely is it that you would recommend [company X] to a friend or colleague?"
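
To make the "0 is equal to a 6" point concrete, here is a minimal sketch of the standard NPS calculation (9-10 are promoters, 7-8 are passives, 0-6 are detractors, and the score is the percentage of promoters minus the percentage of detractors). The function names and the survey responses are invented for illustration.

```python
def nps_category(score):
    """Map a 0-10 answer onto one of the three NPS values."""
    if not 0 <= score <= 10:
        raise ValueError(f"score out of range: {score}")
    if score >= 9:
        return "promoter"
    if score >= 7:
        return "passive"
    return "detractor"


def net_promoter_score(scores):
    """Percentage of promoters minus percentage of detractors."""
    categories = [nps_category(s) for s in scores]
    promoters = categories.count("promoter")
    detractors = categories.count("detractor")
    return 100 * (promoters - detractors) / len(categories)


# Three furious customers versus three mildly positive ones: same result
print(net_promoter_score([0, 0, 0, 9, 10]))  # -20.0
print(net_promoter_score([6, 6, 6, 9, 10]))  # -20.0
```

Both surveys produce the same number even though the underlying responses are very different, which is the information loss described above.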

Is that a useful question? Maybe for a generic good. It is far less useful for specialist goods. Do I recommend a three-day course in learning about job submission with Slurm for high-performance computing? Only to a few people that it would benefit. What score do I give? Maybe a 2, representing the number of people I would recommend it to? 3/11 is actually a lot in a quantitative sense, but that's the circles I mix with. Ah, but no; that makes me a detractor. And here we fall into the problem of subjective evaluation of the meaning of the scores.

The NPS also doesn't send a useful signal because the people receiving the survey results have no idea what the signal means. "Wow, we're receiving a lot of positive promoters!", "Do you know why?", "Nope, but we must be doing something right. I wonder what it is?". It's like driving in the dark and congratulating yourself on your skills because you haven't gone off the edge of a cliff - yet. Who would do such a thing? The NPS, that's who.

To reiterate the post from two years ago, several changes are needed to improve NPS. Firstly, if you're going to have a ranking method, use all the ranks! Also, 1-10 is a 10-point scale (which I suspect was the intention), not 0-10 - that's 11 (0 is an index, people!). Secondly, ensure that there are qualitative values assigned to the quantitative values; 6/10 is not a detractor in a normal distribution - it's a neutral, leaning to positive. Specify how the quantitative values correlate with qualitative descriptions. Thirdly, actively seek out reasons for the rating provided. If you don't have that data, the signal will be just that - a flashing light with no explanation. Without quantification and qualification, you simply cannot manage appropriately.

Finally, more questions! In managing customer loyalty, you will need to discover what they are being loyal to. It doesn't need to be overly long, just something that breaks down the experience that the customer can identify with. Customers may be lazy, but they're not that lazy. The benefit gained from a few questions provides much more insight than the loss of those customers who only answer one question: "A single item question is much less reliable and more volatile than a composite index." (Hill, Nigel; Roche, Greg; Allen, Rachel (2007). Customer Satisfaction: The Customer Experience through the Customer's Eyes). Yes, it is great to have people promoting your organisation or product. You know what else you need? Knowledge of what that flashing light means.

March 09, 2020

Terry2020 finally making the indoor beast more stable

Over time the old Terry robot had evolved from a basic "T" shape to have pan and tilt and a robot arm on board. The rear caster(s) were the weakest part of the robot enabling the whole thing to rock around more than it should. I now have Terry 2020 on the cards.


Part of this is an upgrade to a Kinect2 for navigation. The power requirements of that (12 V/3 A or so) have led me to put a better DC-DC bus on board, along with some relays so that features can be programmatically shut down and brought up as needed, conserving power otherwise. The new base footprint is 300x400mm, though the drive wheels stick out the side.

The wheels sticking out the sides is partially due to the planetary gear motors (on the underside) being quite long. If it is an issue I can recut the lowest layer alloy and move them inward, but I don't really need the absolute minimal turning circle. If that were the case I would move the drive wheels to the middle of the chassis so it could turn on its center.

There will be 4 layers at the moment and a mezzanine below the arm. So there will be expansion room included in the build :)

The rebuild will allow Terry to move at top speed when self driving. Terry will never move at the speed of an outdoor robot but can move closer to its potential when it rolls again.

March 08, 2020

Yet another near-upstream Raptor Blackbird firmware build

In what is becoming a monthly occurrence, I’ve put up yet another firmware build for the Raptor Blackbird with close-to-upstream firmware (see here and here for previous ones).

Well, I’ve done another build! It’s current op-build (as of yesterday), but from my branch with patches for the Raptor Blackbird. The skiboot patch is there; the SBE speedup patch is now upstream. The machine-xml is straight from Raptor, but in my repo.

Here’s the current versions of everything:

$ lsprop /sys/firmware/devicetree/base/ibm,firmware-versions/
skiboot          "v6.5-228-g82aed17a-p4360f95"
bmc-firmware-version
                 "0.00"
occ              "3ab2921"
hostboot         "acdff8a-pe7e80e1"
buildroot        "2019.05.3-15-g3a4fc2a888"
capp-ucode       "p9-dd2-v4"
machine-xml      "site_local-stewart-a0efd66"
hostboot-binaries
                 "hw013120a.opmst"
sbe              "c318ab0-p1ddf83c"
hcode            "hw030220a.opmst"
petitboot        "v1.12"
phandle          0000064c (1612)
version          "blackbird-v2.4-514-g62d1a941"
linux            "5.4.22-openpower1-pdbbf8c8"
name             "ibm,firmware-versions"

If we compare this to the last build I put up, we have:

Component          old                           new
skiboot            v6.5-209-g179d53df-p4360f95   v6.5-228-g82aed17a-p4360f95
linux              5.4.13-openpower1-pa361bec    5.4.22-openpower1-pdbbf8c8
occ                3ab2921                       no change
hostboot           779761d-pe7e80e1              acdff8a-pe7e80e1
buildroot          2019.05.3-14-g17f117295f      2019.05.3-15-g3a4fc2a888
capp-ucode         p9-dd2-v4                     no change
machine-xml        site_local-stewart-a0efd66    no change
hostboot-binaries  hw011120a.opmst               hw013120a.opmst
sbe                166b70c-p06fc80c              c318ab0-p1ddf83c
hcode              hw011520a.opmst               hw030220a.opmst
petitboot          v1.11                         v1.12
version            blackbird-v2.4-415-gb63b36ef  blackbird-v2.4-514-g62d1a941
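
A comparison like the one above could be produced by a short script. This is only a sketch: it assumes the lsprop output from each build has been saved as text, the function names are made up for illustration, and the parsing handles just the layout shown in this post (a property name, then a quoted value on the same or the following line).

```python
import re

QUOTED = re.compile(r'"([^"]*)"')


def parse_lsprop(text):
    """Extract {property: quoted value} pairs from saved lsprop output."""
    versions = {}
    pending = None  # property name still waiting for a value on the next line
    for line in text.splitlines():
        if not line.strip():
            continue
        value = QUOTED.search(line)
        if not line[0].isspace():
            name = line.split()[0]
            if value:
                versions[name] = value.group(1)
                pending = None
            else:
                pending = name
        elif pending and value:
            versions[pending] = value.group(1)
            pending = None
    return versions


def changed(old, new):
    """Return {property: (old value, new value)} for properties that differ."""
    return {k: (old.get(k), v) for k, v in new.items() if old.get(k) != v}


# A few lines in the same layout as the lsprop output shown earlier
new_build = parse_lsprop('''petitboot        "v1.12"
occ              "3ab2921"
hcode
                 "hw030220a.opmst"
''')
old_build = {"petitboot": "v1.11", "occ": "3ab2921", "hcode": "hw011520a.opmst"}
print(changed(old_build, new_build))
```

Properties without a quoted value, such as phandle, are skipped by this parser.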

So, what do those changes mean? Not too much changed over the past month. Kernel bump, new petitboot (I can’t find release notes, but it doesn’t look like there are a lot of changes), and slight bumps to other firmware components.

Grab blackbird.pnor from https://www.flamingspork.com/blackbird/stewart-blackbird-4-images/ and give it a whirl!

To flash it, copy blackbird.pnor to your Blackbird’s BMC in /tmp/ (important! the /tmp filesystem has enough room, the home directory for root does not), and then run:

pflash -E -p /tmp/blackbird.pnor

Which will ask you to confirm and then flash:

About to erase chip !
WARNING ! This will modify your HOST flash chip content !
Enter "yes" to confirm:yes
Erasing... (may take a while)
[==================================================] 99% ETA:1s      
done !
About to program "/tmp/blackbird.pnor" at 0x00000000..0x04000000 !
Programming & Verifying...
[==================================================] 100% ETA:0s