Planet Linux Australia
Celebrating Australians & Kiwis in the Linux and Free/Open-Source community...

July 03, 2020

Desklab Portable USB-C Monitor

I just got a 15.6″ 4K resolution Desklab portable touchscreen monitor [1]. It takes power via USB-C, accepts video input via USB-C or mini HDMI, supports touch input, and has built-in speakers for USB or HDMI sound.

PC Use

I bought a mini-DisplayPort to HDMI adapter and for my first test ran it from my laptop, where it was seen as a 1920*1080 DisplayPort monitor. The adapter is specified as supporting 4K, so I don’t know why I didn’t get 4K to work; my laptop has done 4K with other monitors.

The next thing I plan to get is a VGA to HDMI converter so I can use this on servers. It can be a real pain getting a monitor and power cable to a rack mounted server, and this portable monitor can be powered by one of the USB ports in the server. A quick search indicates that such devices start at about $12US.

The Desklab monitor has no markings to indicate what resolution it supports, no part number, and no serial number. The only documentation I could find about how to recognise the difference between the FullHD and 4K versions is that the FullHD version supposedly draws 2A and the 4K version draws 4A. I connected my USB Ammeter and it reported that between 0.6 and 1.0A were drawn. If they meant to say 2W and 4W instead of 2A and 4A (I’ve seen worse errors in manuals) then the current drawn would indicate the 4K version. Otherwise the stated current requirements don’t come close to matching what I’ve measured.

Power

The promise of USB-C was power from anywhere to anywhere. I think that such power can theoretically be done with USB 3 and maybe USB 2, but asymmetric cables make it more challenging.

I can power my Desklab monitor from a USB battery, from my Thinkpad’s USB port (even when the Thinkpad isn’t on mains power), and from my phone (although the phone battery runs down fast, as expected). When I have a mains powered USB charger (for a laptop, rated at 60W) connected to one USB-C port and my phone on the other, the phone can be charged while giving a video signal to the display. This is how it’s supposed to work, but in my experience it’s rare for new technology to live up to its potential at the start!

One thing to note is that it doesn’t have a battery. I had imagined that it would have a battery (in spite of there being nothing on their web site to imply this) because I just couldn’t think of a touch screen device not having a battery. It would be nice if there was a version of this device with a big battery built in that could avoid needing separate cables for power and signal.

Phone Use

The first thing to note is that the Desklab monitor won’t work with all phones. Whether a phone will drive an external display depends on its configuration, and some phones may support an external display but not touchscreen input. The Huawei Mate devices are specifically listed in the printed documentation as being supported for touchscreen as well as display. Surprisingly the Desklab web site has no mention of this unless you download the PDF of the manual; they really should have a list of confirmed supported devices and a forum for users to report on how it works.

My phone is a Huawei Mate 10 Pro so I guess I got lucky here. My phone has a “desktop mode” that can be enabled when I connect it to a USB-C device (not sure what criteria it uses to determine if the device is suitable). The desktop mode has something like a regular desktop layout and you can move windows around etc. There is also the option of having a copy of the phone’s screen, but it displays the image of the phone screen vertically in the middle of the landscape layout monitor which is ridiculous.

When desktop mode is enabled it’s independent of the phone interface so I had to find the icons for the programs I wanted to run in an unsorted list with no search usable (the search interface of the app list brings up the keyboard which obscures the list of matching apps). The keyboard takes up more than half the screen and there doesn’t seem to be a way to make it smaller. I’d like to try a portrait layout which would make the keyboard take something like 25% of the screen but that’s not supported.

It’s quite easy to type on a keyboard that’s slightly larger than a regular PC keyboard (a 15″ display with no numeric keypad or cursor control keys). The Hacker’s Keyboard app might work well with this as it has cursor control keys. The GUI has an option for full screen mode for an app which is really annoying to get out of (you have to use a drop down from the top of the screen), and full screen doesn’t make sense for a display this large. Overall the GUI is a bit clunky; imagine Windows 3.1 with a start button and task bar. One interesting thing to note is that the desktop and phone GUIs can be run separately, so you can type on the Desklab (or any similar device) and look things up on the phone. Multiple monitors never really interested me for desktop PCs because switching between windows is fast and easy and it’s easy to resize windows to fit several on the desktop. Resizing windows on the Huawei GUI doesn’t seem easy (although I might be missing some things) and the keyboard takes up enough of the screen that having multiple windows open while typing isn’t viable.

I wrote the first draft of this post on my phone using the Desklab display. It’s not nearly as easy as writing on a laptop but much easier than writing on the phone screen.

Currently Desklab is offering 2 models for sale, 4K resolution for $399US and FullHD for $299US. I got the 4K version which is very expensive at the moment when converted to Australian dollars. There are significantly cheaper USB-C monitors available (such as this ASUS one from Kogan for $369AU), but I don’t think they have touch screens and therefore can’t be used with a phone unless you enable the phone screen as touch pad mode and have a mouse cursor on screen. I don’t know if all Android devices support that, it could be that a large part of the desktop experience I get is specific to Huawei devices.

One annoying feature is that if I use the phone power button to turn the screen off it shuts down the connection to the Desklab display, but the phone screen will turn off if I leave it alone for the screen timeout (which I have set to 10 minutes).

Caveats

When I ordered this I wanted the biggest screen possible. But now that I have it the fact that it doesn’t fit in the pocket of my Scott e Vest jacket [2] will limit what I can do with it. Maybe I’ll be buying a 13″ monitor in the near future; I expect that Desklab will do well and start selling them in a wide range of sizes. A 15.6″ portable device is inconvenient even in the laptop format, and a thin portable screen is inconvenient in many ways.

Netflix doesn’t display video on the Desklab screen; I suspect that Netflix is doing this deliberately as some misguided attempt at stopping piracy. The monitor is really good for watching video as it has the speakers in good locations for stereo sound, so it’s a pity that Netflix is difficult.

The functionality on phones from companies other than Huawei is unknown. It is likely to work on most Android phones, but if a particular phone is important to you then you want to Google for how it worked for others.

July 02, 2020

Isolating PHP Web Sites

If you have multiple PHP web sites on a server in a default configuration they will all be able to read each other’s files. If those sites store data or database passwords in configuration files then there are significant problems if the sites aren’t all trusted. Even if the sites are all trusted (IE the same person configures them all), if there is a security problem in one site it’s ideal to prevent that from being used to immediately attack all the other sites.

mpm_itk

The first thing I tried was mpm_itk [1]. This is a version of the traditional “prefork” module for Apache that has one process for each HTTP connection. When it’s installed you just put the directive “AssignUserID USER GROUP” in your VirtualHost section and that virtual host runs as the user:group in question. It will work with any Apache module that works with mpm_prefork. In my experiment with mpm_itk I first tried running with a different UID for each site, but that conflicted with the pagespeed module [2]. The pagespeed module optimises HTML and CSS files to improve performance and it has a directory tree where it stores cached versions of some of the files, and it doesn’t like copies of itself under different UIDs writing to that tree. This isn’t a real problem; setting up the different PHP files with database passwords to be read by the desired group is easy enough. So I just ran each site with a different GID but used the same UID for all of them.
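
For illustration, a virtual host using mpm_itk looks something like the following (the names and paths here are just placeholders, not a real configuration):

<VirtualHost *:80>
    ServerName example.com
    DocumentRoot /var/www/example.com
    # everything for this vhost, Apache modules included, runs as this user:group
    AssignUserID example.com example.com
</VirtualHost>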

The first problem with mpm_itk is that the mpm_prefork code it’s based on is the slowest of the available mpms and is also incompatible with HTTP/2. A minor issue with mpm_itk is that it makes Apache take ages to stop or restart; I don’t know why and can’t be certain it’s not a configuration error on my part. As an aside here is a site for testing your server’s support for HTTP/2 [3]. To enable HTTP/2 you have to be running mpm_event and enable the “http2” module. Then for every virtual host that is to support it (generally all https virtual hosts) put the line “Protocols h2 h2c http/1.1” in the virtual host configuration.
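
On a Debian system that switch looks something like the following (the Apache PHP module has to be disabled first as it depends on mpm_prefork, and the module name “php7.3” is an assumption based on the PHP version I’m using):

a2dismod php7.3 mpm_prefork
a2enmod mpm_event http2
systemctl restart apache2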

A good feature of mpm_itk is that it has everything for the site running under the same UID, all Apache modules and Apache itself. So there’s no issue of one thing getting access to a file and another not getting access.

After a trial I decided not to keep using mpm_itk because I want HTTP/2 support.

php-fpm Pools

The Apache PHP module depends on mpm_prefork so it also has the issues of not working with HTTP/2 and of causing the web server to be slow. The solution is php-fpm, a separate server for running PHP code that uses the fastcgi protocol to talk to Apache. Here’s a link to the upstream documentation for php-fpm [4]. In Debian this is in the php7.3-fpm package.

In Debian the directory /etc/php/7.3/fpm/pool.d has the configuration for “pools”. Below is an example of a configuration file for a pool:

# cat /etc/php/7.3/fpm/pool.d/example.com.conf
[example.com]
user = example.com
group = example.com
listen = /run/php/php7.3-example.com.sock
listen.owner = www-data
listen.group = www-data
pm = dynamic
pm.max_children = 5
pm.start_servers = 2
pm.min_spare_servers = 1
pm.max_spare_servers = 3

Here is the upstream documentation for fpm configuration [5].

Then for the Apache configuration for the site in question you could have something like the following:

ProxyPassMatch "^/(.*\.php(/.*)?)$" "unix:/run/php/php7.3-example.com.sock|fcgi://localhost/usr/share/wordpress/"

The “|fcgi://localhost” part is just part of the way of specifying a Unix domain socket. From the Apache Wiki it appears that the method for configuring TCP connections is more obvious [6]. I chose Unix domain sockets because it allows putting the domain name in the socket address. Matching port numbers to domains for the web server is likely to be error prone, while matching based on domain names is easier to check and also easier to put in Apache configuration macros.
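
For illustration, such a macro could look roughly like this with mod_macro (enabled with “a2enmod macro”, and “proxy_fcgi” is also needed for the fcgi proxying); the document root path is just a placeholder:

<Macro PHPSite $domain>
    ProxyPassMatch "^/(.*\.php(/.*)?)$" "unix:/run/php/php7.3-$domain.sock|fcgi://localhost/var/www/$domain/"
</Macro>

Use PHPSite example.com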

There was some additional hassle with getting Apache to read the files created by PHP processes (the options include running PHP scripts with the www-data group, having SETGID directories for storing files, and having world-readable files). But this got things basically working.
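
As an example of the SETGID option (the directory path is just an example), a directory shared between the site’s PHP pool user and Apache could be set up like this:

mkdir -p /var/www/example.com/uploads
chown example.com:www-data /var/www/example.com/uploads
chmod 2770 /var/www/example.com/uploads

The setgid bit on the directory makes new files inherit the www-data group, so with a normal umask Apache can read what the PHP processes write.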

Nginx

My Google searches for running multiple PHP sites under different UIDs didn’t turn up any good hits. It was only after I found the DigitalOcean page on doing this with Nginx [7] that I knew what to search for to find the way of doing it in Apache.

AudioBooks – June 2020

The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power by Shoshana Zuboff

A good warning of the dangerous designs and goals of firms like Facebook and Google. Sometimes a bit wordy. 3/5

The Calculating Stars: Lady Astronaut Volume 1 by Mary Robinette Kowal

Alternate timeline SF. A meteorite hits the US. The Space program accelerates so humans can escape earth. Our hero faces lots of sexism & other barriers to becoming an astronaut. 3/5

By the Shores of Silver Lake: Little House Series, Book 5 by Laura Ingalls Wilder

The family move to De Smet, South Dakota. The railroad and then a town is built about them over a year. A good entry in the series, some gripping passages. 3/5

The Restaurant: A History of Eating Out by William Sitwell

A non-exhaustive history. Bouncing through ancient times before focusing on Britain since 1945. But plenty of fun and interesting bits. 3/5

Broadway: A History of New York City in Thirteen Miles by Fran Leadon

A mile by mile coverage from South to North. How each section was added to the street and developed. A range of interesting stories and history. 4/5


June 30, 2020

Fuck Grey Text

fuck grey text on white backgrounds
fuck grey text on black backgrounds
fuck thin, spindly fonts
fuck 10px text
fuck any size of anything in px
fuck font-weight 300
fuck unreadable web pages
fuck themes that implement this unreadable idiocy
fuck sites that don’t work without javascript
fuck reactjs and everything like it

thank fuck for Stylus. and uBlock Origin. and uMatrix.


June 27, 2020

Links June 2020

Bruce Schneier wrote an informative post about Zoom security problems [1]. He recommends Jitsi, which has a Debian package and is free software.

Axel Beckert wrote an interesting post about keyboards with small numbers of keys, as few as 28 [2]. It’s not something I’d ever want to use, but interesting to read from a computer science and design perspective.

The Guardian has a disturbing article explaining why we might never get a good Covid19 vaccine [3]. If that happens it will change our society for years if not decades to come.

Matt Palmer wrote an informative blog post about private key redaction [4]. I learned a lot from that. Probably the simplest summary is that you should never publish sensitive data unless you are certain that all of it is suitable for publication; if you don’t understand it then you don’t know whether it’s suitable to be published!

This article by Umair Haque on eand.co has some interesting points about how Freedom is interpreted in the US [5].

This article by Umair Haque on eand.co has some good points about how messed up the US is economically [6]. I think that his analysis is seriously let down by omitting the savings that could be made by amending the US healthcare system without serious changes (EG by controlling drug prices) and by reducing the scale of the US military (there will never be another war like WW2 because any large scale war will be nuclear). If the US government could significantly cut spending in a couple of major areas they could then put the money towards fixing some of the structural problems and bootstrapping a first-world economic system.

The American Conservative has an insightful article “Seven Reasons Police Brutality is Systemic, Not Anecdotal” [7].

Scientific American has an informative article about how genetic engineering could be used to make a Covid-19 vaccine [8].

Rike wrote an insightful post about How Language Changes Our Concepts [9]. They cover the differences between the French, German, and English languages based on gender and on how language limits thoughts, then conclude with the need to remove terms like master/slave and blacklist/whitelist from our software, with a focus on Debian although it’s applicable to all software.

Gunnar Wolf also wrote an insightful post On Masters and Slaves, Whitelists and Blacklists [10]. They start with why some people might not understand the importance of the issue and then explain some ways of addressing it. The list of suggested terms includes Primary-secondary, Leader-follower, and some other terms which have slightly different meanings and allow more precision in describing the computer science concepts used. We can be more precise when describing computer science while also not using terms that marginalise some groups of people; it’s a win-win!

Both Rike and Gunnar were responding to a LWN article about the plans to move away from Master/Slave and Blacklist/Whitelist in the Linux kernel [11]. One of the noteworthy points in the LWN article is that there are about 70,000 instances of words that need to be changed in the Linux kernel so this isn’t going to happen immediately. But it will happen eventually which is a good thing.

Vale Marcus de Rijk


June 25, 2020

How Will the Pandemic Change Things?

The Bulwark has an interesting article on why they can’t “Reopen America” [1]. I wonder how many changes will be long term. According to the Wikipedia List of Epidemics [2] Covid-19 so far hasn’t had a high death toll when compared to other pandemics of the last 100 years. People’s reactions to this vary from doing nothing to significant isolation; the question is which changes in attitudes will be significant enough to change society.

Transport

One thing that has been happening recently is a transition in transport. It’s obvious that we need to reduce CO2, and while electric cars will address the transport part of the problem in the long term, changing to electric public transport is the cheaper and faster way to do it in the short term. Before Covid-19 the peak hour public transport in my city was ridiculously overcrowded; having people unable to board trams due to overcrowding was really common. If the economy returns to its previous state then I predict fewer people on public transport, more traffic jams, and many more cars idling and polluting the atmosphere.

Can we have mass public transport that doesn’t give a significant disease risk? Maybe if we had significantly more trains and trams and better ventilation with more airflow designed to suck contaminated air out. But that would require significant engineering work to design new trams, trains, and buses as well as expense in refitting or replacing old ones.

Uber and similar companies have been taking over from taxi companies; one major feature of those companies is that the vehicles are not dedicated as taxis. Dedicated taxis could easily be designed to reduce the spread of disease: the famed Black Cab AKA Hackney Carriage [3] design in the UK has a separate compartment for passengers with little air flow to/from the driver compartment. It would be easy to design such taxis to have entirely separate airflow, and if set up to only take EFTPOS and credit card payments they could avoid all contact between the driver and passengers. I would prefer to have a Hackney Carriage design of vehicle instead of a regular taxi or Uber.

Autonomous cars have been shown to basically work. There are some concerns about safety issues as there are currently corner cases that car computers don’t handle as well as people, but of course there are also things computers do better than people. Having an autonomous taxi would be a benefit for anyone who wants to avoid other people. Maybe approval could be rushed through for autonomous cars that are limited to 40km/h (the maximum collision speed at which a pedestrian is unlikely to die); in central city areas and inner suburbs you aren’t likely to drive much faster than that anyway.

Car share services have been becoming popular; for many people they are significantly cheaper than owning a car due to the costs of regular maintenance, insurance, and depreciation. But as the full costs of car ownership aren’t obvious, people may focus on the disease risk and keep buying cars.

Passenger jets are ridiculously cheap, but this relies on the airline companies being able to consistently fill the planes. If they add measures to reduce cross contamination between passengers, which slightly reduces the capacity of planes, then they need to increase ticket prices accordingly, which reduces demand. If passengers are simply scared of flying in close proximity and the airlines can’t fill planes then they will have to increase prices, which again reduces demand and could lead to a death spiral. If in the long term there aren’t enough passengers to sustain the current number of planes in service then airline companies will have significant financial problems: planes are expensive assets that are expected to last for a long time, and if airlines can’t use them all and can’t sell them then they will go bankrupt.

It’s not reasonable to expect that the same number of people will be travelling internationally for years (if ever). As the low prices rely on economies of scale, I don’t think it’s possible to keep prices the same no matter what the airlines do. A new economic balance of flights costing 2-3 times more than we are used to while carrying significantly fewer passengers seems likely. Governments need to spend significant amounts of money to improve trains to take over from flights that are cancelled or too expensive.

Entertainment

The article on The Bulwark mentions Las Vegas as a city that will be hurt a lot by reductions in travel and crowds, the same thing will happen to tourist regions all around the world. Australia has a significant tourist industry that will be hurt a lot. But the mention of Las Vegas makes me wonder what will happen to the gambling in general. Will people avoid casinos and play poker with friends and relatives at home? It seems that small stakes poker games among friends will be much less socially damaging than casinos, will this be good for society?

The article also mentions cinemas which have been on the way out since the video rental stores all closed down. There’s lots of prime real estate used for cinemas and little potential for them to make enough money to cover the rent. Should we just assume that most uses of cinemas will be replaced by Netflix and other streaming services? What about teenage dates, will kissing in the back rows of cinemas be replaced by “Netflix and chill”? What will happen to all the prime real estate used by cinemas?

Professional sporting matches have been played for a TV-only audience during the pandemic. There’s no reason that they couldn’t make a return to live stadium audiences when there is a vaccine for the disease or the disease has been extinguished by social distancing. But I wonder if some fans will start to appreciate the merits of small groups watching large TVs and not want to go back to stadiums, can this change the typical behaviour of groups?

Restaurants and cafes are going to do really badly. I previously wrote about my experience running an Internet Cafe and why reopening businesses soon is a bad idea [4]. The question is how long this will go for and whether social norms about personal space will change things. If in the long term people expect 25% more space in a cafe or restaurant that’s enough to make a significant impact on profitability for many small businesses.

When I was young the standard thing was for people to have dinner at friends’ homes; meeting friends for dinner at a restaurant was uncommon. Recently it seems to have become the most common practice for people to meet friends at a restaurant. There are real benefits to meeting at a restaurant in terms of effort and location. Maybe meeting friends at their home for a delivered dinner will become a common compromise, avoiding the effort of cooking while avoiding the extra expense and disease risk of eating out. Food delivery services will do well in the long term; it’s one of the few industry segments which might do better after the pandemic than before.

Work

Many companies are discovering the benefits of teleworking; getting it going effectively has required investing in faster Internet connections and hardware for employees. When we have a vaccine the equipment needed for teleworking will still be there and we will have a discussion about whether it should be used on a more routine basis. When employees spend more than 2 hours per day travelling to and from work (which is very common for people who work in major cities) that obviously limits the amount of time per day that they can spend working. For the more enthusiastic permanent employees there seems to be a benefit to the employer in allowing working from home. It’s obvious that some portion of the companies that were forced to try teleworking will find it effective enough to continue to some degree.

One company that I work for has quit their coworking space in part because they were concerned that the coworking company might go bankrupt due to the pandemic. They seem to have become a 100% work from home company for the office part of the work (only on site installation and stock management is done at corporate locations). Companies running coworking spaces and other shared offices will suffer first as their clients have short term leases. But all companies renting out office space in major cities will suffer due to teleworking. I wonder how this will affect the companies providing services to the office workers, the cafes and restaurants etc. Will there end up being so much unused space in central city areas that it’s not worth converting the city cinemas into useful space?

There’s been a lot of news about Zoom and similar technologies. Lots of other companies are trying to get into that business. One thing that isn’t getting much notice is remote access technologies for desktop support. If the IT people can’t visit your desk because you are working from home then they need to be able to remotely access it to fix things. When people make working from home a large part of their work time the issue of who owns peripherals and how they are tracked will get interesting. In a previous blog post I suggested that keyboards and mice not be treated as assets [5]. But what about monitors, 4G/Wifi access points, etc?

Some people have suggested that there will be business sectors benefiting from the pandemic, such as telecoms and e-commerce. If you have a bunch of people forced to stay home who aren’t broke (IE a large portion of the middle class in Australia) they will probably order delivery of stuff for entertainment. But in the long term e-commerce seems unlikely to change much: people will spend less due to economic uncertainty, so while they may shift some purchasing to e-commerce, apart from home delivery of groceries e-commerce probably won’t go up overall. Telecoms generally won’t gain anything from teleworking either, as the Internet access you need for good Netflix viewing is greater than that needed for good video-conferencing.

Money

I previously wrote about a Basic Income for Australia [6]. One of the most cited reasons for a Basic Income is to deal with robots replacing people. Now we are at the start of what could be a long term economic contraction caused by the pandemic which could reduce the scale of the economy by a similar degree while also improving the economic case for a robotic workforce. We should implement a Universal Basic Income now.

I previously wrote about the make-work jobs and how we could optimise society to achieve the worthwhile things with less work [7]. My ideas about optimising public transport and using more car share services may not work so well after the pandemic, but the rest should work well.

Business

There are a number of big companies that are not aiming for profitability in the short term. WeWork and Uber are well documented examples. Some of those companies will hopefully go bankrupt and make room for more responsible companies.

The co-working thing was always a precarious business. The companies renting out office space usually did so on a monthly basis as flexibility was one of their selling points, but they presumably rented buildings on an annual basis. As the profit margins weren’t particularly high having to pay rent on mostly empty buildings for a few months will hurt them badly. The long term trend in co-working spaces might be some sort of collaborative arrangement between the people who run them and the landlords similar to the way some of the hotel chains have profit sharing agreements with land owners to avoid both the capital outlay for buying land and the risk involved in renting. Also city hotels are very well equipped to run office space, they have the staff and the procedures for running such a business, most hotels also make significant profits from conventions and conferences.

The way the economy has been working in first world countries has been about being as competitive as possible: just in time delivery to avoid using storage space, machines to package things in exactly the way that customers need, and no more machines than needed for regular capacity. This means that there’s no spare capacity when things go wrong. A few years ago a company making bolts for the car industry went bankrupt because the car companies forced the prices down, and then car manufacture stopped due to lack of bolts; this could have been a wake up call but was ignored. Now we have had problems with toilet paper shortages due to it being packaged in wholesale quantities for offices and schools, not retail quantities for home use. Food was destroyed because it was created for restaurant packaging and couldn’t be packaged for home use in a reasonable amount of time.

Farmer’s markets alleviate some of the problems with packaging food etc. But they aren’t a good option when there’s a pandemic as disease risk makes them less appealing to customers and therefore less profitable for vendors.

Religion

Many religious groups have supported social distancing. Could this be the start of more decentralised religion? Maybe have people read the holy book of their religion and pray at home instead of being programmed at church? We can always hope.

June 24, 2020

Automated MythTV-related maintenance tasks

Here is the daily/weekly cronjob I put together over the years to perform MythTV-related maintenance tasks on my backend server.

The first part performs a database backup:

5 1 * * *  mythtv  /usr/share/mythtv/mythconverg_backup.pl

which I previously configured by putting the following in /home/mythtv/.mythtv/backuprc:

DBBackupDirectory=/var/backups/mythtv

and creating a new directory for it:

mkdir /var/backups/mythtv
chown mythtv:mythtv /var/backups/mythtv

The second part of /etc/cron.d/mythtv-maintenance runs a contrib script to optimize the database tables:

10 1 * * *  mythtv  /usr/bin/chronic /usr/share/doc/mythtv-backend/contrib/maintenance/optimize_mythdb.pl

once a day. It requires the libmythtv-perl and libxml-simple-perl packages to be installed on Debian-based systems.

It is quickly followed by a check of the recordings and automatic repair of the seektable (when possible):

20 1 * * *  mythtv  /usr/bin/chronic /usr/bin/mythutil --checkrecordings --fixseektable

Next, I force a scan of the music and video databases to pick up anything new that may have been added externally via NFS mounts:

30 1 * * *  mythtv  /usr/bin/mythutil --quiet --scanvideos
31 1 * * *  mythtv  /usr/bin/mythutil --quiet --scanmusic

Finally, I defragment the XFS partition for two hours every day except Friday:

45 1 * * 1-4,6-7  root  /usr/sbin/xfs_fsr

and resync the RAID-1 arrays once a week to ensure that they stay consistent and error-free:

15 3 * * 2  root  /usr/local/sbin/raid_parity_check md0
15 3 * * 4  root  /usr/local/sbin/raid_parity_check md2

using a trivial script.
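
A minimal sketch of such a script, assuming it just kicks off a consistency check on the given md array via sysfs, could be:

#!/bin/sh
# trigger a check of the named md array (e.g. "md0"); md will report any mismatches
echo check > /sys/block/$1/md/sync_action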

In addition to that cronjob, I also have smartmontools run daily short and weekly long SMART tests via this blurb in /etc/smartd.conf:

/dev/sda -a -d ata -o on -S on -s (S/../.././04|L/../../6/05)
/dev/sdb -a -d ata -o on -S on -s (S/../.././04|L/../../6/05)

If there are any other automated maintenance tasks you do on your MythTV server, please leave a comment!

June 23, 2020

Squirrelmail vs Roundcube

For some years I’ve had SquirrelMail running on one of my servers for the people who like such things. It seems that the upstream support for SquirrelMail has ended (according to the SquirrelMail Wikipedia page there will be no new releases, just Subversion updates to fix bugs). One problem with SquirrelMail that seems unlikely to get fixed is the lack of support for base64 encoded From and Subject fields, which are becoming increasingly popular nowadays as people whose names don’t fit US-ASCII encode them in their preferred manner.

I’ve recently installed Roundcube to provide an alternative. Of course one of the few important users of webmail didn’t like it (apparently it doesn’t display well on a recent Samsung Galaxy Note), so now I have to support two webmail systems.

Below is a little Perl script to convert a SquirrelMail abook file into the CSV format used for importing a Roundcube contact list.

#!/usr/bin/perl

print "First Name,Last Name,Display Name,E-mail Address\n";
while(<STDIN>)
{
  chomp;
  my @fields = split(/\|/, $_);
  printf("%s,%s,%s %s,%s\n", $fields[1], $fields[2], $fields[0], $fields[4], $fields[3]);
}
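
Assuming the script is saved as abook2csv.pl (the script name and the abook path below are just examples, adjust for wherever your SquirrelMail data directory lives), it can be run like this:

perl abook2csv.pl < /var/lib/squirrelmail/data/username.abook > username.csv

The resulting CSV file can then be imported from the Roundcube contacts page.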

June 21, 2020

Open IP over VHF/UHF

I’ve recently done a lot of work on the Codec 2 FSK modem, here is the new README_fsk. It now works at lower SNRs, has been refactored, and is supported by a suite of automated tests.

There is some exciting work going on with Codec 2 modems and VHF/UHF IP links using TAP/TUN (thanks Tomas and Jeroen) – a Linux technology for building IP links from user space “data pumps” – like the Codec 2 modems.

My initial goal for this work is a “100 kbit/s IP link” for VHF/UHF using Codec 2 modems and SDR. One application is moving Ham Radio data past the 1995-era “9600 bits/s data port” paradigm to real time IP.

I’m also interested in IP over TV Whitespace (spare VHF/UHF spectrum) for emergency and developing world applications. I’m a judge for developing world IT grants and the “last 100km” problem comes up again and again. This solution requires just a Raspberry Pi and RTLSDR. At these frequencies antennas could be simply fabricated from wire (cut for the frequency of operation), and soldered directly to the Pi.

Results and Link Budget

As a first step, I’ve taken another look at using RpiTx for FSK, this time at VHF and starting at a modest 10 kbits/s. Over the weekend I performed some Minimum Detectable Signal (MDS) tests and confirmed the 2FSK modem built with RpiTx, a RTL-SDR, and the Codec 2 modem is right on theory at 10 kbits/s, with a MDS of -120dBm.

Putting this in context, a UHF signal has a path loss of 125dB over 100km. So if you have a line of sight path, a 10mW (10dBm) signal will be 10-125 = -115dBm at your receiver (assuming basic 0dBi antennas). As -115dBm is greater than the -120dBm MDS, this means your data will be received error free (especially when we add forward error correction). We have sufficient “link margin” and the “link” is closed.
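
As a quick check of that path loss figure: free space path loss is roughly 32.4 + 20log10(f in MHz) + 20log10(d in km) dB, so assuming a frequency around 440MHz and a 100km path that’s about 32.4 + 52.9 + 40 = 125dB.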

While our 10 kbits/s starting point doesn’t sound like much – even at that rate we get to send 10000*3600*24/8/140 = 771,000 140 byte text messages each day to another station on your horizon. That’s a lot of connectivity in an emergency or when the alternative where you live is nothing.

Method

I’m using the GitHub PR as a logbook for the work; I quite like GitHub and Markdown. This weekend’s MDS experiments start here.

I had the usual fun and games with attenuating the Rx signal from the Pi down to -120dBm. The transmit signal tries hard to leak around the attenuators via an RF path. I moved the unshielded Pi into another room, and built a “plastic bag and aluminium foil” Faraday cage which worked really well:


These are complex systems and many things can go wrong. Are your Tx/Rx sample clocks close enough? Is your rx signal clipping? Is the gain of your radio sufficient to reduce quantisation noise? Bug in your modem code? DC line in your RTLSDR signal? Loose SMA connector?

I’ve learnt the hard way to test very carefully at each step. First, I run off air samples through a non-real time modem Octave simulation to visualise what’s going on inside the modem. A software oscilloscope.

An Over the Cable (OTC) test is essential before trying Over the Air (OTA) as it gives you a controlled environment to spot issues. MDS tests that measure the Bit Error Rate (BER) are also excellent; they effectively absorb every factor in the system and give you an overall score (the BER) you can compare to theory.

Spectral Purity

Here is the spectrum of the FSK signal for a …01010… sequence at 100 kbit/s, at two resolution bandwidths:

The Tx power is about 10dBm, this plot is after some attenuation. I haven’t carefully checked the spurious levels, but the above looks like around -40dBc (off a low 10mW EIRP) over this 1MHz span. If I am reading the Australian regulations correctly (Section 7A of the Amateur LCD) the requirement is 43+10log(P) = 43+10log10(0.01) = 23dBc, so we appear to pass.

Conclusion

This is “extreme open source”. The transmitter is software, the modem is software. All open source and free as in beer and speech. No chipsets or application specific radio hardware – just some CPU cycles and a down converter supplied by the Pi and RTLSDR. The only limits are those of physics – which we have reached with the MDS tests.

Reading Further

FSK modem support for TAP/TUN – GitHub PR for this work
Testing a RTL-SDR with FSK on HF
High Speed Balloon Data Links
Codec 2 FSK modem README – includes lots of links and sample applications.

June 19, 2020

Storage Trends

In considering storage trends for the consumer side I’m looking at the current prices from MSY (where I usually buy computer parts). I know that other stores will have slightly different prices but they should be very similar as they all have low margins and wholesale prices are the main factor.

Small Hard Drives Aren’t Viable

The cheapest hard drive that MSY sells is $68 for 500G of storage. The cheapest SSD is $49 for 120G and the second cheapest is $59 for 240G. SSD is cheaper at the low end and significantly faster. If someone needed about 500G of storage there’s a 480G SSD for $97 which costs $29 more than a hard drive. With a modern PC if you have no hard drives you will notice that it’s quieter. For anyone who’s buying a new PC spending an extra $29 is definitely worthwhile for the performance, low power use, and silence.

The cheapest 1TB disk is $69 and the cheapest 1TB SSD is $159. Saving $90 on the cost of a new PC probably isn’t worthwhile.

For 2TB of storage the cheapest options are Samsung NVMe for $339, Crucial SSD for $335, or a hard drive for $95. Some people would choose to save $244 by getting a hard drive instead of NVMe, but if you are getting a whole system then allocating $244 to NVMe instead of a faster CPU would probably give more benefits overall.

Computer stores typically have small margins and computer parts tend to quickly either become cheaper or be obsoleted by better parts. So stores don’t want to stock parts unless they will sell quickly. Disks smaller than 2TB probably aren’t going to be profitable for stores for very long. The trend of SSD and NVMe becoming cheaper is going to make 2TB disks non-viable in the near future.

NVMe vs SSD

M.2 NVMe devices are at comparable prices to SATA SSDs. For some combinations of quality and capacity NVMe is about 50% more expensive and for some it’s slightly cheaper (EG Intel 1TB NVMe being cheaper than Samsung EVO 1TB SSD). Last time I checked about half the motherboards on sale had a single M.2 socket so for a new workstation that doesn’t need more than 2TB of storage (the largest NVMe that MSY sells) it wouldn’t make sense to use anything other than NVMe.

The benefit of NVMe is NOT throughput (even though NVMe devices can often sustain over 4GB/s), it’s low latency. Workstations can’t properly take advantage of this because RAM is so cheap ($198 for 32G of DDR4) that compiles etc mostly come from cache and because most filesystem writes on workstations aren’t synchronous. For servers a large portion of writes are synchronous, for example a mail server can’t acknowledge receiving mail until it knows that it’s really on disk, so there are a lot of small writes that block server processes and the low latency of NVMe really improves performance. If you are doing a big compile on a workstation (the most common workstation task that uses a lot of disk IO) then the writes aren’t synchronised to disk and if the system crashes you will just do all the compilation again. While NVMe doesn’t give a lot of benefit over SSD for workstation use (I’ve used laptops with SSD and NVMe and not noticed a great difference) of course I still want better performance. ;)

Last time I checked I couldn’t easily buy a PCIe card that supported 2*NVMe devices; I’m sure they are available somewhere but it would take longer to get and probably cost significantly more than twice as much. That means a RAID-1 of NVMe takes 2 PCIe slots if you don’t have an M.2 socket on the motherboard. This was OK when I installed 2*NVMe devices on a server that had 18 disks and lots of spare PCIe slots. But for some systems PCIe slots are an issue.

My home server has all PCIe slots used by a video card and Ethernet cards and the BIOS probably won’t support booting from NVMe. It’s a Dell server so I can’t just replace the motherboard with one that has more PCIe slots and M.2 on the motherboard. As it’s running nicely and doesn’t need replacing any time soon I won’t be using NVMe for home server stuff.

Small Servers

Most servers that I am responsible for have less than 2TB of storage. For my clients I now only recommend SSD storage for small servers and am recommending SSD for replacing any failed disks.

My home server has 2*500G SSDs in a BTRFS RAID-1 for the root filesystem, and 3*4TB disks in a BTRFS RAID-1 for storing big files. I bought the SSDs when 500G SSDs were about $250 each and bought 2*4TB disks when they were about $350 each. Currently that server has about 3.3TB of space used and I could probably get it down to about 2.5TB if I deleted things I don’t really need. If I was getting storage for that server now I’d use 2*2TB SSDs and 3*1TB hard drives for the stuff that doesn’t fit on SSDs (I have some spare 1TB disks that came with servers). If I didn’t have spare hard drives I’d get 3*2TB SSDs for that sort of server which would give 3TB of BTRFS RAID-1 storage.
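
For anyone setting up something similar, creating a BTRFS RAID-1 across several devices is a one-liner (the device names here are just examples):

mkfs.btrfs -m raid1 -d raid1 /dev/sdb /dev/sdc /dev/sdd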

Last time I checked Dell servers had a card for supporting M.2 as an optional extra so Dells probably won’t boot from NVMe without extra expense.

Ars Technica has an informative article about WD selling SMR disks as “NAS” disks [1]. The Shingled Magnetic Recording technology allows greater storage density on a platter which leads to either larger capacity or cheaper disks but at the cost of lower write performance and apparently extremely bad latency in some situations. NAS disks are supposed to be low latency as the expectation is that they will be used in a RAID array and kicked out of the array if they have problems. There are reports of ZFS kicking SMR disks from RAID sets. I think this will end the use of hard drives for small servers. For a server you don’t want to deal with this sort of thing, by definition when a server goes down multiple people will stop work (small server implies no clustering). Spending extra to get SSDs just to avoid the risk of unexpected SMR would be a good plan.

Medium Servers

The largest SSD and NVMe devices that are readily available are 2TB while 10TB disks are commodity items. There are reports of 20TB hard drives being available but I can’t find anyone in Australia selling them.

If you need to store dozens or hundreds of terabytes then hard drives have to be part of the mix at this time. There’s no technical reason why SSDs larger than 10TB can’t be made (the 2.5″ SATA form factor has more than 5* the volume of a 2TB M.2 card) and it’s likely that someone sells them outside the channels I buy from, but probably at a price higher than what my clients are willing to pay. If you want 100TB of affordable storage then a mid range server like the Dell PowerEdge T640 which can have up to 18*3.5″ disks is good. One of my clients has a PowerEdge T630 with 18*3.5″ disks in the 8TB-10TB range (we replace failed disks with the largest new commodity disks available, it used to have 6TB disks). ZFS version 0.8 introduced a “Special VDEV Class” which stores metadata and possibly small data blocks on faster media. So you could have some RAID-Z groups on hard drives for large storage and the metadata on a RAID-1 on NVMe for fast performance. For medium size arrays on hard drives having a “find /” operation take hours is not uncommon, and for large arrays having it take days isn’t that uncommon. So far it seems that ZFS is the only filesystem to have taken the obvious step of storing metadata on SSD/NVMe while bulk data is on cheap large disks.
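
As an illustration of the “Special VDEV Class” layout (the device names are placeholders), a pool with bulk data on hard drives and metadata on mirrored NVMe can be created like this:

zpool create tank raidz2 sda sdb sdc sdd sde sdf \
  special mirror nvme0n1 nvme1n1
zfs set special_small_blocks=32K tank

The special_small_blocks property optionally sends small data blocks to the special vdev as well as the metadata.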

One problem with large arrays is that the vibration of disks can affect the performance and reliability of nearby disks. The ZFS server I run with 18 disks was originally setup with disks from smaller servers that never had ZFS checksum errors, but when disks from 2 small servers were put in one medium size server they started getting checksum errors presumably due to vibration. This alone is a sufficient reason for paying a premium for SSD storage.

Currently the cost of 2TB of SSD or NVMe is between the prices of 6TB and 8TB hard drives, and the ratio of price/capacity for SSD and NVMe is improving dramatically while the increase in hard drive capacity is slow. 4TB SSDs are available for $895 compared to a 10TB hard drive for $549, so it’s 4* more expensive on a price per TB. This is probably good for Windows systems, but for Linux systems where ZFS and “special VDEVs” is an option it’s probably not worth considering. Most Linux user cases where 4TB SSDs would work well would be better served by smaller NVMe and 10TB disks running ZFS. I don’t think that 4TB SSDs are at all popular at the moment (MSY doesn’t stock them), but prices will come down and they will become common soon enough. Probably by the end of the year SSDs will halve in price and no hard drives less than 4TB will be viable.

For rack mounted servers 2.5″ disks have been popular for a long time. It’s common for vendors to offer 2 versions of a rack mount server for 2.5″ and 3.5″ disks where the 2.5″ version takes twice as many disks. If the issue is total storage in a server 4TB SSDs can give the same capacity as 8TB HDDs.

SMR vs Regular Hard Drives

Rumour has it that you can buy 20TB SMR disks, I haven’t been able to find a reference to anyone who’s selling them in Australia (please comment if you know who sells them and especially if you know the price). I expect that the ZFS developers will soon develop a work-around to solve the problems with SMR disks. Then arrays of 20TB SMR disks with NVMe for “special VDEVs” will be an interesting possibility for storage. I expect that SMR disks will be the majority of the hard drive market by 2023 – if hard drives are still on the market. SSDs will be large enough and cheap enough that only SMR disks will offer enough capacity to be worth using.

I think that it is a possibility that hard drives won’t be manufactured in a few years. The volume of a 3.5″ disk is significantly greater than that of 10 M.2 devices so current technology obviously allows 20TB of NVMe or SSD storage in the space of a 3.5″ disk. If the price of 16TB NVMe and SSD devices comes down enough (to perhaps 3* the price of a 20TB hard drive) almost no-one would want the hard drive and it wouldn’t be viable to manufacture them.

It’s not impossible that in a few years time 3D XPoint and similar fast NVM technologies occupy the first level of storage (the ZFS “special VDEV”, OS swap device, log device for database servers, etc) and NVMe occupies the level for bulk storage with no space left in the market for spinning media.

Computer Cases

For servers I expect that models supporting 3.5″ storage devices will disappear. A 1RU server with 8*2.5″ storage devices or a 2RU server with 16*2.5″ storage devices will probably be of use to more people than a 1RU server with 4*3.5″ or a 2RU server with 8*3.5″.

My first IBM PC compatible system had a 5.25″ hard drive, a 5.25″ floppy drive, and a 3.5″ floppy drive in 1988. My current PC is almost a similar size and has a DVD drive (that I almost never use) 5 other 5.25″ drive bays that have never been used, and 5*3.5″ drive bays that I have never used (I have only used 2.5″ SSDs). It would make more sense to have PC cases designed around 2.5″ and maybe 3.5″ drives with no more than one 5.25″ drive bay.

The Intel NUC SFF PCs are going in the right direction. Many of them only have a single storage device but some of them have 2*M.2 sockets allowing RAID-1 of NVMe and some of them support ECC RAM so they could be used as small servers.

A USB DVD drive costs $36, it doesn’t make sense to have every PC designed around the size of an internal DVD drive that will probably only be used to install the OS when a $36 USB DVD drive can be used for every PC you own.

The only reason I don’t have a NUC for my personal workstation is that I get my workstations from e-waste. If I was going to pay for a PC then a NUC is the sort of thing I’d pay to have on my desk.

LUV June 2020 Workshop: Emergency Security Discussion

Jun 20 2020 12:30
Jun 20 2020 14:30
Location: 
Online event (TBA)

On Friday morning, our prime minister held an unprecedented press conference to warn Australia (Governments, Industry & Individuals) about a sophisticated cyber attack that is currently underway.

Linux Users of Victoria is a subcommittee of Linux Australia.


June 16, 2020

Linux Security Summit North America 2020: Online Schedule

Just a quick update on the Linux Security Summit North America (LSS-NA) for 2020.

The event will take place over two days as an online event, due to COVID-19.  The dates are now July 1-2, and the full schedule details may be found here.

The main talks are:

There are also short (30 minute) topics:

This year we will also have a Q&A panel at the end of each day, moderated by Elena Reshetova. The panel speakers are:

  • Nayna Jain
  • Andrew Lutomirski
  • Dmitry Vyukov
  • Emily Ratliff
  • Alexander Popov
  • Christian Brauner
  • Allison Marie Naaktgeboren
  • Kees Cook
  • Mimi Zohar

LSS-NA this year is included with OSS+ELC registration, which is USD $50 all up.  Register here.

Hope to see you soon!

June 14, 2020

Codec 2 HF Data Modes 1

Since “attending” MHDC last month I’ve taken an interest in open source HF data modems. So I’ve been busy refactoring the Codec 2 OFDM modem for use with HF data.

The major change is from streaming small (28 bit) voice frames to longer (few hundred byte) packets of data. In some ways data is easier than PTT voice: latency is no longer an issue, and I can use nice long FEC codewords that ride over fades. On the flip side we really care about bit errors with data, for voice it’s acceptable to pass frames with errors to the speech decoder, and let the human ear work it out.

As a first step I’ve been working with GNU Octave simulations, and have developed 3 candidate data modes that I have been testing against simulated HF channels. In simulation they work well with up to 4ms of delay and 2.5Hz of Doppler.

Here are the simulation results for 10% Packet Error Rate (PER). The multipath channel has 2ms delay spread and 2Hz Doppler (CCITT Multipath Poor channel).

Mode     Est Bytes/min   AWGN SNR (dB)   Multipath Poor SNR (dB)
datac1   6000             3              12
datac2   3000             1               7
datac3   1200            -3               0

The bytes/minute metric is commonly used by Winlink (divide by 7.5 for bits/s). I’ve assumed a 20% overhead for ARQ and other overheads. HF data isn’t fast – it’s a tough, narrow channel to push data through. But for certain applications (e.g. if you’re off the grid, or when the lights go out) it may be all you have. Even these low rates can be quite useful; 1200 bytes/minute is 8.5 tweets or SMS texts per minute.
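
Applying that rule, datac1’s 6000 bytes/minute corresponds to 800 bits/s and datac3’s 1200 bytes/minute to 160 bits/s.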

The modem waveforms are pilot assisted coherent PSK using LDPC FEC codes. Coherent PSK can have gains of up to 6dB over differential PSK (DPSK) modems commonly used on HF.

Before I get too far along I wanted to try them over real HF channels, to make sure I was on the right track. So much can go wrong with DSP in the real world!

So today I sent the new data waveforms over the air for the first time, using an 800km path on the 40m band from my home in Adelaide, South Australia to a KiwiSDR in Melbourne, Victoria.

Mode     Est Bytes/min   Power (Wrms)   Est SNR (dB)   Packets Tx/Rx
datac1   6000            10             10-15          15/15
datac2   3000            10              5-15           8/8
datac3   1200             0.5             -2           20/25

The Tx power is the RMS measured on my spec-an; for the 10W RMS samples it was 75W PEP. The SNR is measured in a 3000Hz noise bandwidth. I have a simple dipole at my end, and I’m not sure what the KiwiSDR was using.

I’m quite happy with these results. To give the c3 waveform a decent work out I dropped the power down to just 0.5W (listen), and I could still get 30% of the packets through at 100mW. A few of the tests had significant fading, however it was not very fast. My simulations are far tougher. Maybe I’ll try a NVIS path to give the modem a decent test on fast fading channels.

Here is the spectrogram (think waterfall on its side) for the -2dB datac3 sample:

Here are the uncoded (raw) errors, and the errors after FEC. Most of the frames made it. This mode employs a rate 1/3 LDPC code that was developed by Bill, VK5DSP. It can work at up to 16% raw BER! The errors at the end are due to the Tx signal ending, at this stage of development I just have a simple state machine with no “squelch”.

We have also been busy developing an API for the Codec 2 modems, see README_data.md. The idea is to allow developers of HF data protocols and applications to use the Codec 2 modems. As well as the “raw” HF data API, there is a very nice Ethernet style framer for VHF packet developed by Jeroen Vreeken.

If anyone would like to try running the modem Octave code take a look at the GitHub PR.

Reading Further

QAM and Packet Data for OFDM Pull Request for this work. Includes lots of notes. The waveform designs are described in this spreadsheet.
README for the Codec 2 OFDM modem, includes examples and more links.
Test Report of Various Winlink Modems
Modems for HF Digital Voice Part 1
Modems for HF Digital Voice Part 2

June 06, 2020

Comparing Compression

I just did a quick test of different compression options in Debian. The source file is a 1.1G MySQL dump file. The time is user CPU time on a i7-930 running under KVM, the compression programs may have different levels of optimisation for other CPU families.

Facebook people designed the zstd compression system (here’s a page giving an overview of it [1]). It has some interesting new features that can provide real differences at scale (like unusually large windows and pre-defined dictionaries), but I just tested the default mode and the -9 option for more compression. For the SQL file “zstd -9” provides significantly better compression than gzip while taking slightly less CPU time than “gzip -9”, and zstd with the default option (equivalent to “zstd -3”) gives much faster compression than “gzip -9” while also producing slightly smaller output. For this use case bzip2 is too slow for inline compression of a MySQL dump as the dump process locks tables and can hang clients. The lzma and xz compression algorithms provide significant benefits in size but the time taken is grossly disproportionate.

In a quick check of my collection of files compressed with gzip I was only able to find 1 file that got less compression with zstd with default options, and that file got better compression with “zstd -9”. So zstd seems to beat gzip everywhere by every measure.

The bzip2 compression seems to be obsolete, “zstd -9” is much faster and has slightly smaller output.

Both xz and lzma seem to offer a combination of compression and time taken that zstd can’t beat (for this file type at least). The ultra compression mode 22 gives 2% smaller output files but almost 28 minutes of CPU time for compression is a bit ridiculous. There is a threaded mode for zstd that could potentially allow a shorter wall clock time for “zstd --ultra -22” than lzma/xz while also giving better compression.
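
For example, something like the following should use all available cores for the ultra mode (the file name is just a placeholder):

zstd -T0 --ultra -22 -o dump.sql.zst dump.sql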

Compression        Time    Size
zstd               5.2s    130m
zstd -9            28.4s   114m
gzip -9            33.4s   141m
bzip2 -9           3m51    119m
lzma               6m20    97m
xz                 6m36    97m
zstd -19           9m57    99m
zstd --ultra -22   27m46   95m

Conclusion

For distributions like Debian, which have large archives of files that are compressed once and transferred a lot, the “zstd --ultra -22” compression might be useful with multi-threaded compression. But given that Debian already has xz in use it might not be worth changing until faster CPUs with lots of cores become more commonly available. One could argue that for Debian it doesn’t make sense to change from xz as hard drives seem to be gaining capacity (and shrinking in physical size) faster than the Debian archive is growing. One possible reason for adopting zstd in a distribution like Debian is that there are more tuning options for things like memory use; it would be possible to compress packages for an architecture like ARM, which tends to have less RAM, in a way that decreases memory use on decompression.

For general compression such as compressing log files and making backups it seems that zstd is the clear winner. Even bzip2 is far too slow, and in my tests zstd clearly beats gzip for every combination of compression and time taken. There may be some corner cases where gzip can compete on compression time due to CPU features, optimisation for specific CPUs, etc, but I expect that in almost all cases zstd will win on both compression size and time. As an aside I once noticed the 32bit version of gzip compressing faster than the 64bit version on an Opteron system; the 32bit version had assembly optimisation and the 64bit version didn’t at that time.

To create a tar archive you can run “tar czf” or “tar cJf” to create an archive with gzip or xz compression. To create an archive with zstd compression you have to use “tar --zstd -cf”, that’s 7 extra characters to type. It’s likely that for most casual archive creation (EG for copying files around on a LAN or USB stick) saving 7 characters of typing is more of a benefit than saving a small amount of CPU time and storage space. It would be really good if tar got a single character option for zstd compression.

The external dictionary support in zstd would work really well with rsync for backups. Currently rsync only supports zlib, adding zstd support would be a good project for someone (unfortunately I don’t have enough spare time).

Now I will change my database backup scripts to use zstd.
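
A backup pipeline along these lines would do it (a minimal sketch; the database name and paths are placeholders rather than my actual scripts):

# minimal sketch; mydatabase and the paths are placeholders
mysqldump --single-transaction mydatabase | zstd -9 > /backup/mydatabase-$(date +%Y%m%d).sql.zst
# and to restore:
zstdcat /backup/mydatabase-20200606.sql.zst | mysql mydatabase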

Update:

The command “tar acvf a.zst filenames” will create a zstd compressed tar archive; the “a” option to GNU tar makes it autodetect the compression type from the file name. Thanks Enrico!

May 31, 2020

Effective Altruism

Long term readers of the blog may recall my daughter Amy. Well, she has moved on from teenage partying and is now e-volunteering at Effective Altruism Australia. She recently pointed me at the free e-book The Life You Can Save by Peter Singer.

I was already familiar with the work of Peter Singer, having read “The Most Good You Can Do”. Peter puts numbers on altruistic behaviours to evaluate them. This appeals to me – as an engineer I use numbers to evaluate artefacts I build, like modems, or other processes going on in the world, like COVID-19.

Using technology to help people is a powerful motivator for Geeks. I’ve been involved in a few of these initiatives myself (OLPC and The Village Telco). It’s really tough to create something that helps people long term. A wider set of skills and capabilities is required than just “the technology”.

On my brief forays into the developing world I’ve seen ecologies of people (from the first and developing worlds) living off development dollars. In some cases there is no incentive to report the true outcomes, for example how many government bureaucrats want to report failure? How many consultants want the gig to end?

So I really get the need for scientific evaluation of any development endeavours. Go Peter and the Effective Altruism movement!

I spend around 1000 hours a year writing open source code, a strong argument that I am “doing enough” in the community space. However I have no idea how effective that code is. Is it helping anyone? My inclination to help is also mixed with “itch scratching” – geeky stuff I want to work on because I find it interesting.

So after reading the book and having a think – I’m sold. I have committed 5% of my income to Effective Altruism Australia, selecting Give Directly as a target for my funds as it appealed to me personally.

I asked Amy to proofread this post – and she suggested that instead of $, you can donate time – that’s what she does. She also said:

Effective Altruism opens your eyes to alternative ways to interact with charities. It combines the broad field of social science to explore how many aspects intersect, by applying the scientific method to that of economics, psychology, international development, and anthropology.

Reading Further

Busting Teenage Partying with a Fluksometer
Effective Altruism Australia

May 30, 2020

AudioBooks – May 2020

Fewer books this month. At home on lockdown and the weather a bit worse, so less time to go on walks and listen.

Save the Cat! Writes a Novel: The Last Book On Novel Writing You’ll Ever Need by Jessica Brody

A fairly straight adaptation of the screenplay-writing manual. Lots of examples from well-known books including full breakdowns of beats. 3/5

Happy Singlehood: The Rising Acceptance and Celebration of Solo Living by Elyakim Kislev

Based on 142 interviews. A lot of summaries of findings with quotes from interviewees and people’s blogs. The last chapter has some policy push but it’s a little light. 3/5

Scandinavia: A History by Ewan Butler

Just a 6 hour long quick spin through history. The first half suffers a bit with lists of Kings although there is a bit more colour later on. Okay prep for something meatier. 3/5

One Giant Leap: The Impossible Mission That Flew Us to the Moon by Charles Fishman

A bit of a mix. It covers the legacy of Apollo but the best bits are the chapters on the computers, politics and other behind-the-scenes things. A complement to astronaut and mission oriented books. 4/5

My Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average, in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all


May 29, 2020

Using Live Linux to Save and Recover Your Data

There are two types of people in the world; those who have lost data and those who are about to. Given that entropy will bite eventually, the objective should be to minimise data loss. Some key rules for this: backup, backup often, and backup with redundancy. Whilst an article on that subject will be produced, at this stage discussion is directed to the very specific task of recovering data from old machines, which may not be accessible anymore, using Linux. The number of times I've done this in past years is somewhat more than the number of fingers I have - however, like all good things, it deserves to be documented in the hope that other people might find it useful.

To do this one will need a Linux live distribution of some sort as an ISO, written to a bootable USB drive. A typical choice would be an Ubuntu Live or Fedora Live image. If one is dealing with damaged hardware the old Slackware-derived minimalist distribution Recovery Is Possible (RIP) is certainly worth using; it's certainly saved me in the past. If you need help in creating a bootable USB, the good people at HowToGeek provide some simple instructions.

With a Linux bootable disk of some description inserted in one's system, the recovery process can begin. Firstly, boot the machine and change the boot order (in BIOS/UEFI) so that the drive in question becomes the first in the boot order. Once the live distribution boots up, usually into a GUI environment, one needs to open the terminal application (e.g., GNOME in Fedora uses Applications, System Tools, Terminal) and change to the root user with the su command (there's no password needed to become root on a live CD!).

At this point one needs to create a mount point directory, where the data is going to be mounted; mkdir /mnt/recovery. After this one needs to identify the disk which one is trying to access. The fdisk -l command will provide a list of all disks in the partition table. Some educated guesswork from the results is required here, which will provide the device filesystem type; it almost certainly isn't an EFI System partition or Linux swap, for example. Typically one is trying to access something like /dev/sdaX.

Then one must mount the device to the directory that was just created, for example: mount /dev/sda2 /mnt/recovery. Sometimes a recalcitrant device will need to have the filesystem explicitly stated; the most common being ext3, ext4, fat, xfs, vfat, and ntfs-3g. To give a recent example, I needed to run mount -t ext3 /dev/sda3 /mnt/recovery. From there one can copy the data from the mount point to a new destination; a USB drive is probably the quickest, although one may take the opportunity to copy it to an external system (e.g., Google Drive) - and that's it! You've recovered your data!
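
Putting it all together, a typical session looks something like the following (a sketch only; the device names, filesystem type, and paths will differ on your system):

su -                                    # become root on the live system
mkdir /mnt/recovery                     # create the mount point
fdisk -l                                # identify the disk and partition to recover
mount -t ext4 /dev/sda2 /mnt/recovery   # mount it, stating the filesystem type if needed
mkdir /media/usb
mount /dev/sdb1 /media/usb              # mount the destination USB drive
cp -av /mnt/recovery/home /media/usb/   # copy the data across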

May 28, 2020

Fixing locale problem in MythTV 30

After upgrading to MythTV 30, I noticed that the interface of mythfrontend switched from the French language to English, despite having the following in my ~/.xsession for the mythtv user:

export LANG=fr_CA.UTF-8
exec ~/bin/start_mythtv

I noticed a few related error messages in /var/log/syslog:

mythbackend[6606]: I CoreContext mythcorecontext.cpp:272 (Init) Assumed character encoding: fr_CA.UTF-8
mythbackend[6606]: N CoreContext mythcorecontext.cpp:1780 (InitLocale) Setting QT default locale to FR_US
mythbackend[6606]: I CoreContext mythcorecontext.cpp:1813 (SaveLocaleDefaults) Current locale FR_US
mythbackend[6606]: E CoreContext mythlocale.cpp:110 (LoadDefaultsFromXML) No locale defaults file for FR_US, skipping
mythpreviewgen[9371]: N CoreContext mythcorecontext.cpp:1780 (InitLocale) Setting QT default locale to FR_US
mythpreviewgen[9371]: I CoreContext mythcorecontext.cpp:1813 (SaveLocaleDefaults) Current locale FR_US
mythpreviewgen[9371]: E CoreContext mythlocale.cpp:110 (LoadDefaultsFromXML) No locale defaults file for FR_US, skipping

Searching for that non-existent fr_US locale, I found that others have this in their logs and that it's apparently set by QT as a combination of the language and country codes.

I therefore looked in the database and found the following:

MariaDB [mythconverg]> SELECT value, data FROM settings WHERE value = 'Language';
+----------+------+
| value    | data |
+----------+------+
| Language | FR   |
+----------+------+
1 row in set (0.000 sec)

MariaDB [mythconverg]> SELECT value, data FROM settings WHERE value = 'Country';
+---------+------+
| value   | data |
+---------+------+
| Country | US   |
+---------+------+
1 row in set (0.000 sec)

which explains the nonsensical FR_US locale.

I fixed the country setting like this:

MariaDB [mythconverg]> UPDATE settings SET data = 'CA' WHERE value = 'Country';
Query OK, 1 row affected (0.093 sec)
Rows matched: 1  Changed: 1  Warnings: 0

After logging out and logging back in, the user interface of the frontend is now using the fr_CA locale again and the database setting looks good:

MariaDB [mythconverg]> SELECT value, data FROM settings WHERE value = 'Country';
+---------+------+
| value   | data |
+---------+------+
| Country | CA   |
+---------+------+
1 row in set (0.000 sec)

Introducing Shaken Fist


The first public commit to what would become OpenStack Nova was made ten years ago today — at Thu May 27 23:05:26 2010 PDT to be exact. So first off, happy tenth birthday to Nova!

A lot has happened in that time — OpenStack has gone from being two separate Open Source projects to a whole ecosystem, developers have come and gone (and passed away), and OpenStack has weathered the cloud wars of the last decade. OpenStack survived its early growth phase by deliberately offering a “big tent” to the community and associated vendors, with an expansive definition of what should be included. This has resulted in most developers being associated with a corporate sponsor, and hence the decrease in the number of developers today as corporate interest wanes — OpenStack has never been great at attracting or retaining hobbyist contributors.

My personal involvement with OpenStack started in November 2011, so while I missed the very early days I was around for a lot and made many of the mistakes that I now see in OpenStack.

What do I see as mistakes in OpenStack in hindsight? Well, embracing vendors who later lose interest has been painful, and has increased the complexity of the code base significantly. Nova itself is now nearly 400,000 lines of code, and that’s after splitting off many of the original features of Nova such as block storage and networking. Additionally, a lot of our initial assumptions are no longer true — for example in many cases we had to write code to implement things, where there are now good libraries available from third parties.

That’s not to say that OpenStack is without value — I am a daily user of OpenStack to this day, and use at least three OpenStack public clouds at the moment. That said, OpenStack is a complicated beast with a lot of legacy that makes it hard to maintain and slow to change.

For at least six months I’ve felt the desire for a simpler cloud orchestration layer — both for my own personal uses, and also as a test bed for ideas for what a smaller, simpler cloud might look like. My personal use case involves a relatively small environment which echoes what we now think of as edge compute — less than 10 RU of machines with a minimum of orchestration and management overhead.

At the time that I was thinking about these things, the Australian bushfires and COVID-19 came along, and presented me with a lot more spare time than I had expected to have. While I’m still blessed to be employed, all of my social activities have been cancelled, so I find myself at home at a loose end on weekends and evenings at lot more than before.

Thus Shaken Fist was born — named for a Simpson’s meme, Shaken Fist is a deliberately small and highly opinionated cloud implementation aimed at working well in small deployments such as homes, labs, edge compute locations, deployed systems, and so forth.

I’ve taken a bit of trouble with each feature in Shaken Fist to think through the simplest and highest value way of doing it. For example, instances always get a config drive and there is no metadata server. There is also only one supported type of virtual networking, and one supported hypervisor. That said, this means Shaken Fist is less than 5,000 lines of code, and small enough that new things can be implemented very quickly by a single middle aged developer.

Shaken Fist definitely has feature gaps — API authentication and scheduling are the most obvious at the moment — but I have plans to fill those when the time comes.

I’m not sure if Shaken Fist is useful to others, but you never know. It’s Apache2 licensed, and available on GitHub if you’re interested.


May 27, 2020

57 Varieties of Pyrite: Exchanges Are Now The Enemy of Bitcoin

TL;DR: exchanges are casinos and don’t want to onboard anyone into bitcoin. Avoid.

There’s a classic scam in the “crypto” space: advertize Bitcoin to get people in, then sell suckers something else entirely. Over the last few years, this bait-and-switch has become the core competency of “bitcoin” exchanges.

I recently visited the homepage of Australian exchange btcmarkets.net: what a mess. There was a list of dozens of identical-looking “cryptos”, with bitcoin second after something called “XRP”; seems like it was sorted by volume?

Incentives have driven exchanges to become casinos, and they’re doing exactly what you’d expect unregulated casinos to do. This is no place you ever want to send anyone.

Incentives For Exchanges

Exchanges make money on trading, not on buying and holding. Despite the fact that bitcoin is the only real attempt to create an open source money, scams with no future are given false equivalence, because more assets means more trading. Worse than that, they are paid directly to list new scams (the crappier, the more money they can charge!) and have recently taken the logical step of introducing and promoting their own crapcoins directly.

It’s like a gold dealer who also sells 57 varieties of pyrite, which give more margin than selling actual gold.

For a long time, I thought exchanges were merely incompetent. Most can’t even give out fresh addresses for deposits, batch their outgoing transactions, pay competent fee rates, perform RBF or use segwit.

But I misunderstood: they don’t want to sell bitcoin. They use bitcoin to get you in the door, but they want you to gamble. This matters: you’ll find subtle and not-so-subtle blockers to simply buying bitcoin on an exchange. If you send a friend off to buy their first bitcoin, they’re likely to come back with something else. That’s no accident.

Looking Deeper, It Gets Worse.

Regrettably, looking harder at specific exchanges makes the picture even bleaker.

Consider Binance: this mainland China backed exchange pretending to be a Hong Kong exchange appeared out of nowhere with fake volume and demonstrated the gullibility of the entire industry by being treated as if it were a respected member. They lost at least 40,000 bitcoin in a known hack, and they also lost all the personal information people sent them to KYC. They aggressively market their own coin. But basically, they’re just MtGox without Mark Karpales’ PHP skills or moral scruples and much better marketing.

Coinbase is more interesting: an MBA-run “bitcoin” company which really dislikes bitcoin. They got where they are by spending big on regulations compliance in the US so they could operate in (almost?) every US state. (They don’t do much to dispel the wide belief that this regulation protects their users, when in practice it seems only USD deposits have any guarantee). Their natural interest is in increasing regulation to maintain that moat, and their biggest problem is Bitcoin.

They have much more affinity for the centralized coins (Ethereum) where they can have influence and control. The anarchic nature of a genuine open source community (not to mention the developers’ oft-stated aim to improve privacy over time) is not culturally compatible with a top-down company run by the Big Dog. It’s a running joke that their CEO can’t say the word “Bitcoin”, but their recent “what will happen to cryptocurrencies in the 2020s” article is breathtaking in its boldness: innovation is mainly happening on altcoins, and they’re going to overtake bitcoin any day now. Those scaling problems which the Bitcoin developers say they don’t know how to solve? This non-technical CEO knows better.

So, don’t send anyone to an exchange, especially not a “market leading” one. Find some service that actually wants to sell them bitcoin, like CashApp or Swan Bitcoin.

May 26, 2020

Cruises and Covid19

Problems With Cruises

GQ has an insightful and detailed article about Covid19 and the Diamond Princess [1], I recommend reading it.

FastCompany has a brief article about bookings for cruises in August [2]. There have been many negative comments about this online.

The first thing to note is that the cancellation policies on those cruises are more lenient than usual and the prices are lower. So it’s not unreasonable for someone to put down a deposit on a half price holiday in the hope that Covid19 goes away (as so many prominent people have been saying it will) in the knowledge that they will get it refunded if things don’t work out. Of course if the cruise line goes bankrupt then no-one will get a refund, but I think people are expecting that won’t happen.

The GQ article highlights some serious problems with the way cruise ships operate. They have staff crammed in to small cabins and the working areas allow transmission of disease. These problems can be alleviated, they could allocate more space to staff quarters and have more capable air conditioning systems to put in more fresh air. During the life of a cruise ship significant changes are often made, replacing engines with newer more efficient models, changing the size of various rooms for entertainment, installing new waterslides, and many other changes are routinely made. Changing the staff only areas to have better ventilation and more separate space (maybe capsule-hotel style cabins with fresh air piped in) would not be a difficult change. It would take some money and some dry-dock time which would be a significant expense for cruise companies.

Cruises Are Great

People like social environments, they want to have situations where there are as many people as possible without it becoming impossible to move. Cruise ships are carefully designed for the flow of passengers. Both the layout of the ship and the schedule of events are carefully planned to avoid excessive crowds. In terms of meeting the requirement of having as many people as possible in a small area without being unable to move cruise ships are probably ideal.

Because there is a large number of people in a restricted space there are economies of scale on a cruise ship that aren’t available anywhere else. For example the main items on the menu are made in a production line process, this can only be done when you have hundreds of people sitting down to order at the same time.

The same applies to all forms of entertainment on board, they plan the events based on statistical knowledge of what people want to attend. This makes it more economical to run than land based entertainment where people can decide to go elsewhere. On a ship a certain portion of the passengers will see whatever show is presented each night, regardless of whether it’s singing, dancing, or magic.

One major advantage of cruises is that they are all inclusive. If you are on a regular holiday would you pay to see a singing or dancing show? Probably not, but if it’s included then you might as well do it – and it will be pretty good. This benefit is really appreciated by people taking kids on holidays, if kids do things like refuse to attend a performance that you were going to see or reject food once it’s served then it won’t cost any extra.

People Who Criticise Cruises

For the people who sneer at cruises, do you like going to bars? Do you like going to restaurants? Live music shows? Visiting foreign beaches? A cruise gets you all that and more for a discount price.

If Groupon had a deal that gave you a cheap hotel stay with all meals included, free non-alcoholic drinks at bars, day long entertainment for kids at the kids clubs, and two live performances every evening how many of the people who reject cruises would buy it? A typical cruise is just like a Groupon deal for non-stop entertainment from 8AM to 11PM.

Will Cruises Restart?

The entertainment options that cruises offer are greatly desired by many people. Most cruises are aimed at budget travellers, the price is cheaper than a hotel in a major city. Such cruises greatly depend on economies of scale, if they can’t get the ships filled then they would need to raise prices (thus decreasing demand) to try to make a profit. I think that some older cruise ships will be scrapped in the near future and some of the newer ships will be sold to cruise lines that cater to cheap travel (IE P&O may scrap some ships and some of the older Princess ships may be transferred to them). Overall I predict a decrease in the number of middle-class cruise ships.

For the expensive cruises (where the cheapest cabins cost over $1000US per person per night) I don’t expect any real changes, maybe they will have fewer passengers and higher prices to allow more social distancing or something.

I am certain that cruises will start again, but it’s too early to predict when. Going on a cruise is about as safe as going to a concert or a major sporting event. No-one is predicting that sporting stadiums will be closed forever or live concerts will be cancelled forever, so really no-one should expect that cruises will be cancelled forever. Whether companies that own ships or stadiums go bankrupt in the mean time is yet to be determined.

One thing that’s been happening for years is themed cruises. A group can book out an entire ship or part of a ship for a themed cruise. I expect this to become much more popular when cruises start again as it will make it easier to fill ships. In the past it seems that cruise lines let companies book their ships for events but didn’t take much of an active role in the process. I think that the management of cruise lines will look to aggressively market themed cruises to anyone who might help, for starters they could reach out to every 80s and 90s pop group – those fans are all old enough to be interested in themed cruises and the musicians won’t be asking for too much money.

Conclusion

Humans are social creatures. People want to attend events with many other people. Covid 19 won’t be the last pandemic, and it may not even be eradicated in the near future. The possibility of having a society where no-one leaves home unless they are in a hazmat suit has been explored in science fiction, but I don’t think that’s a plausible scenario for the near future and I don’t think that it’s something that will be caused by Covid 19.

May 25, 2020

op-build v2.5 firmware for the Raptor Blackbird

Well, following on from my post where I excitedly pointed out that Raptor Blackbird support is all upstream in op-build v2.5, it means I can do another in my series of (close to) upstream Blackbird firmware builds.

This time, the only difference from straight upstream op-build v2.5 is my fixes for buildroot so that I can actually build it on Fedora 32.
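
For anyone who would rather build it themselves than grab the prebuilt image, the upstream op-build flow is roughly the following (a sketch only; the tag and defconfig name are what one would expect for this release, and buildroot fixes may still be needed on Fedora 32):

git clone --recursive https://github.com/open-power/op-build
cd op-build
git checkout v2.5
. op-build-env                   # set up the build environment
op-build blackbird_defconfig     # select the Blackbird machine configuration
op-build                         # build everything; images end up under output/images/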

So, head over to https://www.flamingspork.com/blackbird/op-build-v2.5-blackbird-images/ and grab blackbird.pnor to flash it on your blackbird, let me know how it goes!

GNS3 FRR Appliance

In my spare time, what little I have, I’ve been wanting to play with some OSS networking projects. For those playing along at home, during the last SUSE hackweek I played with WireGuard, and to test the environment I wanted to set up some routing, for which I used FRR.

FRR is a pretty cool project; it brings the network routing stack to Linux, or rather gives us a full open source routing stack. This makes sense, as most routers are actually running Linux anyway.

Many years ago I happened to work at Fujitsu in a gateway environment, and started playing around with networking. That was my first experience with GNS3, an open source network simulator. Back then I needed a copy of Cisco IOS images to really play with routing protocols, which made things harder: a great open source product, but it needed access to proprietary router OSes.

FRR provides a CLI _very_ similar to Cisco’s, which made me think: hey, I wonder if there is an FRR appliance we can use in GNS3?
And there was!!!

When I downloaded it and decompressed the qcow2 image it was 1.5GB!!! For a single router image. It works great, but what if I wanted a bunch of routers to play with things like OSPF or BGP? Surely we can make a smaller one.

Kiwi

At SUSE we use kiwi-ng to build machine images and release media. And to make things even easier for me, we already have a kiwi config for small OpenSuse Leap JEOS images; JEOS is “just enough OS”. So I hacked one to include FRR. All extra tweaks needed to the image are also easily done by bash hook scripts.

I won’t go into too much detail on how, because I created a git repo where I have it all, including a detailed README: https://github.com/matthewoliver/frr_gns3

So feel free to check that out and build and use the image.

But today, I went one step further. OpenSuse’s Open Build Service (OBS), which is used to build all RPMs for OpenSuse but can also build debs and whatever else you need, also supports building docker containers and system images using kiwi!

So I have now got OBS to build the image for me. The image can be downloaded from: https://download.opensuse.org/repositories/home:/mattoliverau/images/

And if you want to send any OBS requests to change it the project/package is: https://build.opensuse.org/package/show/home:mattoliverau/FRR-OpenSuse-Appliance

To import it into GNS3 you need the gns3a file, which you can find in my git repo or in the OBS project page.

The best part is this image is only 300MB, which is much better than 1.5GB!
I did have it a little smaller, 200-250MB, but unfortunately the JEOS cut-down kernel doesn’t contain the MPLS modules, so I had to pull in the full default SUSE kernel. If this became a real thing and not a pet project, I could go and build an FRR cut-down kernel to get the size down, but 300MB is already a lot better than where it was at.

Hostname Hack

When using GNS3 and you place a router, you want to be able to name the router, and when you access the console it’s _really_ nice to see the router name you specified in GNS3 as the hostname. Why? Because if you have a bunch of routers, you don’t want a bunch of tabs all showing the localhost hostname on the command line… that doesn’t really help.

The FRR image is using qemu, and there wasn’t a nice way to access the name of the VM from inside the guest, and no easy way to insert the name from outside. But I found one approach that seems to be working: enter my dodgy hostname hack!

I also wanted to do it without hacking the gns3server code. I couldn’t easily pass the hostname in directly, but I could pass it in via a null device with the router name as its id:

/dev/virtio-ports/frr.router.hostname.%vm-name%

So I simply wrote a script that sets the hostname based on the name of this device, made the script a systemd oneshot service that runs at boot, and it worked!
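
In sketch form, the script and the oneshot unit might look something like this (illustrative only; the file names, unit name, and exact logic are examples rather than the precise code in the repo above):

#!/bin/sh
# set the hostname from the name GNS3 passed in via the virtio-ports device
for dev in /dev/virtio-ports/frr.router.hostname.*; do
    [ -e "$dev" ] || exit 0                               # no such device, nothing to do
    name="${dev#/dev/virtio-ports/frr.router.hostname.}"  # strip the prefix, leaving the router name
    hostname "$name"
    echo "$name" > /etc/hostname
done

# /etc/systemd/system/gns3-hostname.service (example unit)
[Unit]
Description=Set hostname from the GNS3 router name

[Service]
Type=oneshot
ExecStart=/usr/local/bin/gns3-hostname.sh

[Install]
WantedBy=multi-user.target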

This means that after changing the name of the FRR router in the GNS3 interface, all you need to do is restart the router (stop and start the device) and it’ll apply the name. This saves you having to log in as root and run hostname yourself.

Or better, if you name all your FRR routers before turning them on, then it’ll just work.

In conclusion…

Hopefully now we can have a fully open source GNS3 + FRR appliance solution for network training, testing, and inspiring network engineers.

May 24, 2020

Printing hard-to-print PDFs on Linux

I recently found a few PDFs which I was unable to print due to those files causing insufficient printer memory errors.

I found a detailed explanation of what might be causing this which pointed the finger at transparent images, a PDF 1.4 feature which apparently requires a more recent version of PostScript than what my printer supports.

Using Okular's Force rasterization option (accessible via the print dialog) does work by essentially rendering everything ahead of time and outputting a big image to be sent to the printer. The quality is not very good however.

Converting a PDF to DjVu

The best solution I found makes use of a different file format: .djvu

Such files are not PDFs, but can still be opened in Evince and Okular, as well as in the dedicated DjVuLibre application.

As an example, I was unable to print page 11 of this paper. Using pdfinfo, I found that it is in PDF 1.5 format and so the transparency effects could be the cause of the out-of-memory printer error.
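
For reference, pdfinfo makes that check a one-liner:

$ pdfinfo 2002.04049.pdf | grep "PDF version"
PDF version:    1.5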

Here's how I converted it to a high-quality DjVu file I could print without problems using Evince:

pdf2djvu -d 1200 2002.04049.pdf > 2002.04049-1200dpi.djvu

Converting a PDF to PDF 1.3

I also tried the DjVu trick on a different unprintable PDF, but it failed to print, even after lowering the resolution to 600dpi:

pdf2djvu -d 600 dow-faq_v1.1.pdf > dow-faq_v1.1-600dpi.djvu

In this case, I used a different technique and simply converted the PDF to version 1.3 (from version 1.6 according to pdfinfo):

ps2pdf13 -r1200x1200 dow-faq_v1.1.pdf dow-faq_v1.1-1200dpi.pdf

This eliminates the problematic transparency and rasterizes the elements that version 1.3 doesn't support.

May 23, 2020

A totally cheating sour dough starter


This is the third in a series of posts documenting my adventures in making bread during the COVID-19 shutdown. I’d like to imagine I was running science experiments in bread making on my kids, but really all I was trying to do was eat some toast.

I’m not sure what it was like in other parts of the world, but during the COVID-19 pandemic Australia suffered a bunch of shortages — toilet paper, flour, and yeast were among those things stores simply didn’t have any stock of. Luckily we’d only just done a costco shop so were ok for toilet paper and flour, but we were definitely getting low on yeast. The obvious answer is a sour dough starter, but I’d never done that thing before.

In the end my answer was to cheat and use this recipe. However, I found the instructions unclear, so here’s what I ended up doing:

Starting off

  • 2 cups of warm water
  • 2 teaspoons of dry yeast
  • 2 cups of bakers flour

Mix these three items together in a plastic container with enough space for the mix to double in size. Place in a warm place (on the bench on top of the dish washer was our answer), and cover with cloth secured with a rubber band.

Feeding

Once a day you should feed your starter with 1 cup of flour and 1 cup of warm water. Stir thoroughly.

Reducing size

The recipe online says to feed for five days, but the size of my starter was getting out of hand after a couple of days, so I started baking at that point. I’ll describe the baking process in a later post. The early loaves definitely weren’t as good as the more recent ones, but they were still edible.

Hibernation

Once the starter is going, you feed it daily and probably need to bake daily to keep the starter’s size under control. That obviously doesn’t work so well if you can’t eat an entire loaf of bread a day. You can hibernate the starter by putting it in the fridge, which means you only need to feed it once a week.

To wake a hibernated starter up, take it out of the fridge and feed it. I do this at 8am. That means I can then start the loaf for baking at about noon, and the starter can either go back in the fridge until next time or stay on the bench being fed daily.

I have noticed that sometimes the starter comes out of the fridge with a layer of dark water on top. It’s worked out ok for us to just ignore that and stir it into the mix as part of the feeding process. Hopefully we won’t die.


Refurbishing my Macintosh Plus

Somewhere in the mid to late 1990s I picked myself up a Macintosh Plus for the sum of $60AUD. At that time there were still computer Swap Meets where old and interesting equipment was around, so I headed over to one at some point (at the St Kilda Town Hall if memory serves) and picked myself up four 1MB SIMMs to boost the RAM of it from the standard 1MB to the insane amount of 4MB. Why? Umm… because I could? The RAM was pretty cheap, and somewhere in the house to this day, I sometimes stumble over the 256KB SIMMs as I just can’t bring myself to get rid of them.

This upgrade probably would have cost close to $2,000 at the system’s release. If the Macintosh system software were better at disk caching you could have easily held the whole 800k of the floppy disk in memory and still run useful software!

One of the annoying things that started with the Macintosh was odd screws and Apple gear being hard to get into. Compare to say, the Apple ][ which had handy clips to jump inside whenever. In fitting my massive FOUR MEGABYTES of RAM back in the day, I recall using a couple of allen keys sticky-taped together to be able to reach in and get the recessed Torx screws. These days, I can just order a torx bit off Amazon and have it arrive pretty quickly. Well, two torx bits, one of which is just too short for the job.

My (dusty) Macintosh Plus

One thing had always struck me about it: it never really looked like the photos of the Macintosh Plus I saw in books. An embarrassing number of years later, I learned that a lot can be gleaned from the serial number printed on the underside of the front of the case.

So heading over to the My Old Mac Serial Number Decoder I can find out:

Manufactured in: F => Fremont, California, USA
Year of production: 1985
Week of production: 14
Production number: 3V3 => 4457
Model ID: M0001WP => Macintosh 512K (European Macintosh ED)

Your Macintosh 512K (European Macintosh ED) was the 4457th Mac manufactured during the 14th week of 1985 in Fremont, California, USA.

Pretty cool! So it is certainly a Plus as the logic board says that, but it’s actually an upgraded 512k! If you think it was madness to have a GUI with only 128k of RAM in the original Macintosh, you’d be right. I do not envy anybody who had one of those.

Some time a decent (but not too many, less than 10) years ago, I turned on the Mac Plus to see if it still worked. It did! But then… some magic smoke started to come out (which isn’t so good), but the computer kept working! There’s something utterly bizarre about looking at a computer with smoke coming out of it that continues to function perfectly fine.

Anyway, as the smoke was coming out, I decided that it would be an opportune time to turn it off, open doors and windows, and put it away until I was ready to deal with it.

One Global Pandemic Later, and now was the time.

I suspected it was going to be a capacitor somewhere that blew, and figured that I should replace it, and probably preemptively replace all the other electrolytic capacitors that could likely leak and cause problems.

First things first though: dismantle it and clean everything, starting with taking the case off. Apple is not new to the game of annoying screws to get into things. I ended up spending $12 on this set on Amazon, as the T10 bit can actually reach the screws holding the case on.

Cathode Ray Tubes are not to be messed with. We’re talking lethal voltages here. It had been many years since electricity went into this thing, so all was good. If this all doesn’t work first time when reassembling it, I’m not exactly looking forward to discharging a CRT and working on it.

The inside of my Macintosh Plus, with lots of grime.

You can see there’s grime everywhere. It’s not the worst in the world, but it’s not great (and kinda sticky). Obviously, this needs to be cleaned! The best way to do that is take a lot of photos, dismantle everything, and clean it a bit at a time.

There’s four main electronic components inside a Macintosh Plus:

  1. The CRT itself
  2. The floppy disk drive
  3. The Logic Board (what Mac people call what PC people call the motherboard)
  4. The Analog Board

There’s also some metal structure that keeps some things in place. There’s only a few connectors between things, which are pretty easy to remove. If you don’t know how to discharge a CRT and what the dangers of them are you should immediately go and find out through reading rather than finding out by dying. I would much prefer it if you dyed (because creative fun) rather than died.

Once the floppy connector and the power connector is unplugged, the logic board slides out pretty easily. You can see from the photo below that I have the 4MB of RAM installed and the resistor you need to snip is, well, snipped (but look really closely for that). Also, grime.

Macintosh Plus Logic Board

Cleaning things? Well, there’s two ways that I have used (and considering I haven’t yet written the post with “hurray, it all works”, currently take it with a grain of salt until I write that post). One: contact cleaner. Two: detergent.

Macintosh Plus Logic Board (being washed in my sink)

I took the route of cleaning things first, and then doing recapping adventures. So it was some contact cleaner on the boards, and then some soaking with detergent. This actually all worked pretty well.

Logic Board Capacitors:

  • C5, C6, C7, C12, C13 = 33uF 16V 85C (measured at 39uF, 38uF, 38uF, 39uF)
  • C14 = 1uF 50V (measured at 1.2uF and then it fluctuated down to around 1.15uF)

Analog Board Capacitors

  • C1 = 35V 3.9uF (M) measured at 4.37uF
  • C2 = 16V 4700uF SM measured at 4446uF
  • C3 = 16V 220uF +105C measured at 234uF
  • C5 = 10V 47uF 85C measured at 45.6uF
  • C6 = 50V 22uF 85C measured at 23.3uF
  • C10 = 16V 33uF 85C measured at 37uF
  • C11 = 160V 10uF 85C measured at 11.4uF
  • C12 = 50V 22uF 85C measured at 23.2uF
  • C18 = 16V 33uF 85C measured at 36.7uF
  • C24 = 16V 2200uF 105C measured at 2469uF
  • C27 = 16V 2200uF 105C measured at 2171uF (although started at 2190 and then went down slowly)
  • C28 = 16V 1000uF 105C measured at 638uF, then 1037uF, then 1000uF, then 987uF
  • C30 = 16V 2200uF 105C measured at 2203uF
  • C31 = 16V 220uF 105C measured at 236uF
  • C32 = 16V 2200uF 105C measured at 2227uF
  • C34 = 200V 100uF 85C measured at 101.8uF
  • C35 = 200V 100uF 85C measured at 103.3uF
  • C37 = 250V 0.47uF measured at <exploded>. wheee!
  • C38 = 200V 100uF 85C measured at 103.3uF
  • C39 = 200V 100uF 85C measured at 99.6uF (with scorch marks from next door)
  • C42 = 10V 470uF 85C measured at 556uF
  • C45 = 10V 470uF 85C measured at 227uF, then 637uF then 600uF

I’ve ordered an analog board kit from https://console5.com/store/macintosh-128k-512k-plus-analog-pcb-cap-kit-630-0102-661-0462.html and when trying to put them in, I learned that the US Analog board is different to the International Analog board!!! Gah. Dammit.

Note that C30, C32, C38, C39, and C37 were missing from the kit I received (probably due to differences in the US and International boards). I did have an X2 cap (for C37) but it was 0.1uF not 0.47uF. I also had two extra 1000uF 16V caps.

Macintosh Repair and Upgrade Secrets (up to the Mac SE no less!) holds an Appendix with the parts listing for both the US and International Analog boards, and this led me to conclude that they are in fact different boards rather than just a few wires that are different. I am not sure what the “For 120V operation, W12 must be in place” and “for 240V operation, W12 must be removed” writing is about on the International Analog board, but I’m not quite up to messing with that at the moment.

So, I ordered the parts (linked above) and waited (again) to be able to finish re-capping the board.

I found this video https://youtu.be/H9dxJ7uNXOA to be a good one for learning a bunch about the insides of compact Macs, and I recommend it and several others on his YouTube channel. One interesting thing I learned is that the X2 cap (C37 on the International one) is before the power switch, so could blow just by having the system plugged in and not turned on! Okay, so I’m kind of assuming that it also applies to the International board, and mine exploded while it was plugged in and switched on, so YMMV.

Additionally, there’s an interesting list of commonly failing parts. Unfortunately, this is also for the US logic board, so the tables in Macintosh Repair and Upgrade Secrets are useful. I’m hoping that I don’t have to replace anything more there, but we’ll see.

But, after the Nth round of parts being delivered….

Note the lack of an exploded capacitor

Yep, that’s where the exploded cap was before. It cleaned up pretty nicely actually. Annoyingly, I had to run it all through a step-up transformer as the board is all set for Australian 240V rather than US 120V. This isn’t going to be an everyday computer though, so it’s fine.

Macintosh Plus booting up (note how long the memory check of 4MB of RAM takes). I’m being very careful as the cover is off. High, and possibly lethal, voltages are exposed.

Woohoo! It works. While I haven’t found my supply of floppy disks that (at least used to) work, the floppy mechanism also seems to work okay.

Macintosh Plus with a seemingly working floppy drive mechanism. I haven’t found a boot floppy yet though.

Next up: waiting for my Floppy Emu to arrive as it’ll certainly let it boot. Also, it’s now time to rip the house apart to find a floppy disk that certainly should have made its way across the ocean with the move…. Oh, and also to clean up the mouse and keyboard.

May 18, 2020

Displaying client IP address using Apache Server-Side Includes

If you use a Dynamic DNS setup to reach machines which are not behind a stable IP address, you will likely have a need to probe these machines' public IP addresses. One option is to use an insecure service like Oracle's http://checkip.dyndns.com/ which echoes back your client IP, but you can also do this on your own server if you have one.

There are multiple options to do this, like writing a CGI or PHP script, but those are fairly heavyweight if that's all you need mod_cgi or PHP for. Instead, I decided to use Apache's built-in Server-Side Includes.

Apache configuration

Start by turning on the include filter by adding the following in /etc/apache2/conf-available/ssi.conf:

AddType text/html .shtml
AddOutputFilter INCLUDES .shtml

and making that configuration file active:

a2enconf ssi

Then, find the vhost file where you want to enable SSI and add the following options to a Location or Directory section:

<Location /ssi_files>
    Options +IncludesNOEXEC
    SSLRequireSSL
    Header set Content-Security-Policy: "default-src 'none'"
    Header set X-Content-Type-Options: "nosniff"
</Location>

before adding the necessary modules:

a2enmod headers
a2enmod include

and restarting Apache:

apache2ctl configtest && systemctl restart apache2.service

Create an shtml page

With the web server ready to process SSI instructions, the following HTML blurb can be used to display the client IP address:

<!--#echo var="REMOTE_ADDR" -->

or any other built-in variable.

Note that you don't need to write valid HTML for the variable to be substituted and so the above one-liner is all I use on my server.
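
To check that the substitution works, fetch the page from a client (example.com stands in for your own server name, and the address shown is of course just an example):

$ curl https://example.com/ssi_files/whatsmyip.shtml
203.0.113.42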

Security concerns

The first thing to note is that the configuration section uses the IncludesNOEXEC option in order to disable arbitrary command execution via SSI. In addition, you can also make sure that the cgi module is disabled since that's a dependency of the more dangerous side of SSI:

a2dismod cgi

Of course, if you rely on this IP address to be accurate, for example because you'll be putting it in your DNS, then you should make sure that you only serve this page over HTTPS, which can be enforced via the SSLRequireSSL directive.

I included two other headers in the above vhost config (Content-Security-Policy and X-Content-Type-Options) in order to limit the damage that could be done in case a malicious file was accidentally dropped in that directory.

Finally, I suggest making sure that only the root user has writable access to the directory which has server-side includes enabled:

$ ls -la /var/www/ssi_includes/
total 12
drwxr-xr-x  2 root     root     4096 May 18 15:58 .
drwxr-xr-x 16 root     root     4096 May 18 15:40 ..
-rw-r--r--  1 root     root        0 May 18 15:46 index.html
-rw-r--r--  1 root     root       32 May 18 15:58 whatsmyip.shtml

A Good Time to Upgrade PCs

PC hardware just keeps getting cheaper and faster. Now that so many people have been working from home the deficiencies of home PCs are becoming apparent. I’ll give Australian prices and URLs in this post, but I think that similar prices will be available everywhere that people read my blog.

From MSY (parts list PDF ) [1] 120G SATA SSDs are under $50 each. 120G is more than enough for a basic workstation, so you are looking at $42 or so for fast quiet storage or $84 or so for the same with RAID-1. Being quiet is a significant luxury feature and it’s also useful if you are going to be in video conferences.

For more serious storage NVMe starts at around $100 per unit, I think that $124 for a 500G Crucial NVMe is the best low end option (paying $95 for a 250G Kingston device doesn’t seem like enough savings to be worth it). So that’s $248 for 500G of very fast RAID-1 storage. There’s a Samsung 2TB NVMe device for $349 which is good if you need more storage, it’s interesting to note that this is significantly cheaper than the Samsung 2TB SSD which costs $455. I wonder if SATA SSD devices will go away in the future, it might end up being SATA for slow/cheap spinning media and M.2 NVMe for solid state storage. The SATA SSD devices are only good for use in older systems that don’t have M.2 sockets on the motherboard.

It seems that most new motherboards have one M.2 socket on the motherboard with NVMe support, and presumably support for booting from NVMe. But dual M.2 sockets are rare and the price difference is significantly greater than the cost of a PCIe M.2 card to support NVMe, which is $14. So for NVMe RAID-1 it seems that the best option is a motherboard with a single NVMe socket (starting at $89 for an AM4 socket motherboard – the current standard for AMD CPUs) and a PCIe M.2 card.
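
If you go that way, assembling the RAID-1 is the usual Linux software RAID procedure, something like the following (a sketch; the device names are examples and will differ):

# create a RAID-1 array across the two NVMe devices, then put a filesystem on it
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/nvme0n1p1 /dev/nvme1n1p1
mkfs.ext4 /dev/md0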

One thing to note about NVMe is that different drivers are required. On Linux this means building a new initrd before the migration (or afterwards when booted from a recovery image), and on Windows it probably means a fresh install from special installation media with NVMe drivers.
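
On a Debian style system rebuilding the initrd is a single command (nothing NVMe specific, just the standard tool):

update-initramfs -u -k all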

All the AM4 motherboards seem to have RADEON Vega graphics built in which is capable of 4K resolution at a stated refresh of around 24Hz. The ones that give detail about the interfaces say that they have HDMI 1.4 which means a maximum of 30Hz at 4K resolution if you have the color encoding that suits text (IE for use other than just video). I covered this issue in detail in my blog post about DisplayPort and 4K resolution [2]. So a basic AM4 motherboard won’t give great 4K display support, but it will probably be good for a cheap start.

$89 for motherboard, $124 for 500G NVMe, $344 for a Ryzen 5 3600 CPU (not the cheapest AM4 but in the middle range and good value for money), and $99 for 16G of RAM (DDR4 RAM is cheaper than DDR3 RAM) gives the core of a very decent system for $656 (assuming you have a working system to upgrade and peripherals to go with it).

Currently Kogan has 4K resolution monitors starting at $329 [3]. They probably won’t be the greatest monitors but my experience of a past cheap 4K monitor from Kogan was that it is quite OK. Samsung 4K monitors started at about $400 last time I could check (Kogan currently has no stock of them and doesn’t display the price), I’d pay an extra $70 for Samsung, but the Kogan branded product is probably good enough for most people. So you are looking at under $1000 for a new system with fast CPU, DDR4 RAM, NVMe storage, and a 4K monitor if you already have the case, PSU, keyboard, mouse, etc.

It seems quite likely that the 4K video hardware on a cheap AM4 motherboard won’t be that great for games and it will definitely be lacking for watching TV documentaries. Whether such deficiencies are worth spending money on a PCIe video card (starting at $50 for a low end card but costing significantly more for 3D gaming at 4K resolution) is a matter of opinion. I probably wouldn’t have spent extra for a PCIe video card if I had 4K video on the motherboard. Not only does using built in video save money it means one less fan running (less background noise) and probably less electricity use too.

My Plans

I currently have a workstation with 2*500G SATA SSDs in a RAID-1 array, 16G of RAM, and a i5-2500 CPU (just under 1/4 the speed of the Ryzen 5 3600). If I had hard drives then I would definitely buy a new system right now. But as I have SSDs that work nicely (quiet and fast enough for most things) and almost all machines I personally use have SSDs (so I can’t get a benefit from moving my current SSDs to another system) I would just get CPU, motherboard, and RAM. So the question is whether to spend $532 for more than 4* the CPU performance. At the moment I’ll wait because I’ll probably get a free system with DDR4 RAM in the near future, while it probably won’t be as fast as a Ryzen 5 3600, it should be at least twice as fast as what I currently have.

May 17, 2020

Notes on Installing Ubuntu 20 VM on an MS-Windows 10 Host

Some thirteen years ago I worked with Xen virtual machines as part of my day job, and gave a presentation at Linux Users of Victoria on the subject (with additional lecture notes). A few years after that I gave another presentation on the Unified Extensible Firmware Interface (UEFI), which itself (indirectly) led to a post on Linux and MS-Windows 8 dual-booting. All of this now leads to some notes on using MS-Windows as a host for Ubuntu Linux guest machines.

Why Would You Want to do This?

Most people these days have at least heard of Linux. They might even know that every supercomputer in the Top500 list runs Linux. They may know that the overwhelming majority of embedded devices, such as home routers, use Linux. Or maybe even that Android 'phones use a Linux kernel, or that MacOS is built on the same broad family of UNIX-like operating systems. Whilst they might be familiar with their MS-Windows environment, because that's what they've been brought up on and what their favourite applications are designed for, they might also be "Linux curious", especially if they are hoping to either scale up the complexity and volume of the datasets they're working with (i.e., towards high performance computing) or scale down their applications (i.e., towards embedded devices). If this is the case, then introducing Linux via a virtual machine (VM) is a relatively safe and easy way to experiment.

About VMs

Virtual machines work by emulating a computer system, including hardware, in a software environment, a technology that has been around for a very long time (e.g., CP/CMS, 1967). The VMs on a host system are managed by a hypervisor, or Virtual Machine Monitor (VMM), which manages one or more guest systems. In the example that follows, VirtualBox, a free and open source hypervisor, is used. Because the guest system relies on the host it cannot have the same performance as the host system, unlike a dual-boot system. It will share memory, it will share processing power, it must take up some disk space, and will also have the overhead of the hypervisor itself (although this has improved a great deal in recent years). In a production environment, VMs are usually used to optimise resource allocation for very powerful systems, such as web-server farms and bodies like the Nectar Research Cloud, or even some partitions on systems like the University of Melbourne's supercomputer, Spartan. In a development environment, VMs are an excellent tool for testing and debugging.

Install VirtualBox and Enable Virtualization

For most environments VirtualBox is an easy path for creating a virtual machine, ARM systems excluded (QEMU is suggested for Raspberry Pi or Android, or QEMU's fork, KVM). For the example given here, simply download VirtualBox for MS-Windows and click one's way through the installation process, noting that VirtualBox will make changes to your system and that products from Oracle can be trusted (*blink*). Downloads for other operating environments are worth looking at as well.

It is essential to enable virtualisation on your MS-Windows host through the BIOS/UEFI, which is not as easy as it used to be. A handy page from some smart people in the Czech Republic provides quick instructions for a variety of hardware environments. The good people at laptopmag provide the path from within the MS-Windows environment. In summary; select Settings (gear icon), select Update & Security, Select Recovery (this sounds wrong), Advanced Startup, Restart Now (which is also wrong, you don't restart now), Troubleshoot, Advanced Options, UEFI Firmware Settings, then Restart.

Install Linux and Create a Shared Folder

Download a Ubuntu 20.04 LTS (long-term support) ISO and save to the MS-Windows host. There are some clever alternatives, such as the Ubuntu Linux terminal environment for MS-Windows (which is possibly even a better choice these days, but that will be for another post), or Multipass which allows one to create their own mini-cloud environment. But this is a discussion for a VM, so I'll resist the temptation to go off on a tangent.

Creating a VM in VirtualBox is pretty straightforward; open the application, select "New", give the VM a name, and allocate resources (virtual hard disk, virtual memory). It's worthwhile tending towards the generous in resource allocation. After that it is a case of selecting the ISO under settings and storage; remember a VM does not have a real disk drive, so it has a virtual (software) one. After this one can start the VM, and it will boot from the ISO and begin the installation process for the Ubuntu Linux desktop edition, which is pretty straightforward. One amusing caveat: when the installation says it's going to wipe the disk it doesn't mean the host machine, just the virtual disk that has been built for it. When the installation is complete go to "Devices" on the VM menu, remove the boot disk and restart the guest system; you now have an Ubuntu VM installed on your MS-Windows system.
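
For those who prefer the command line, VBoxManage can do the same steps; the following is a sketch only, with an example VM name, memory and disk sizes, and ISO file name:

VBoxManage createvm --name "Ubuntu20" --ostype Ubuntu_64 --register
VBoxManage modifyvm "Ubuntu20" --memory 4096 --cpus 2
VBoxManage createmedium disk --filename Ubuntu20.vdi --size 25000
VBoxManage storagectl "Ubuntu20" --name "SATA" --add sata
VBoxManage storageattach "Ubuntu20" --storagectl "SATA" --port 0 --device 0 --type hdd --medium Ubuntu20.vdi
VBoxManage storageattach "Ubuntu20" --storagectl "SATA" --port 1 --device 0 --type dvddrive --medium ubuntu-20.04-desktop-amd64.iso
VBoxManage startvm "Ubuntu20"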

By default, VMs do not have access to the host computer. To provide that access one will want to set up a shared folder in the VM and on the host. The first step in this environment is to give the Linux user (created during installation) membership of the vboxsf group, e.g., in a terminal: sudo usermod -a -G vboxsf username. In VirtualBox, select Settings, and add a share under Machine Folders, which is a permanent folder. Under Folder Path set the location on the host operating system (e.g., a folder called UbuntuShared on the Desktop), and set the Folder Name to shared, which is the name used when mounting it from the guest; leave automount blank (we can fix that soon enough). Put a test file in the shared folder.

Ubuntu now needs additional software installed, including kernel modules, to work with VirtualBox's Guest Additions. Also, mount the VirtualBox Guest Additions image in the guest VM as a virtual CD, under Devices; you can download this from the VirtualBox website.

Run the following commands, entering the default user's password as needed:


sudo apt-get install -y build-essential linux-headers-`uname -r`
sudo /media/cdrom/VBoxLinuxAdditions.run
sudo shutdown -r now # Reboot the system
mkdir ~/UbuntuShared
sudo mount -t vboxsf shared ~/UbuntuShared # "shared" is the Folder Name set in VirtualBox
cd ~/UbuntuShared

The file that was put in the UbuntuShared folder in MS-Windows should now be visible in ~/UbuntuShared. Add a file (e.g., touch testfile.txt) from Linux and check that it can be seen in MS-Windows. If this all succeeds, make the mount persistent.


sudo nano /etc/fstab # nano is just fine for short configuration files
# Add the following line (fields separated by tabs), replacing "username" with your Linux user, and save
shared /home/username/UbuntuShared vboxsf defaults 0 0
# Edit modules
sudo nano /etc/modules
# Add the following
vboxsf
# Exit and reboot
sudo shutdown -r now

You're done! You now have a Ubuntu desktop system running as a VM guest using VirtualBox on an MS-Windows 10 host system. Ideal for learning, testing, and debugging.

A super simple non-breadmaker loaf


This is the second in a series of posts documenting my adventures in making bread during the COVID-19 shutdown. Yes I know all the cool kids made bread for themselves during the shutdown, but I did it too!

A loaf of bread

So here we were, in the middle of a pandemic which closed bakeries and cancelled almost all of my non-work activities. I found this animated GIF on Reddit for a super simple no-knead bread and decided to give it a go. It turns out that a few things are true:

  • animated GIFs are a super terrible way to store recipes
  • that animated GIF was an export of this YouTube video which originally accompanied this blog post
  • and that I only learned these things while trying to work out who to credit for this recipe

The basic recipe is really easy — chuck the following into a big bowl, stir, and then cover with a plate. Leave it resting in a warm place for a long time (three or four hours), then turn it out onto a floured bench. Fold it into a ball with flour, and then bake. You can see a more detailed version in the YouTube video above.

  • 3 cups of bakers flour (not plain white flour)
  • 2 tea spoons of yeast
  • 2 tea spoons of salt
  • 1.5 cups of warm water (again, I use 42 degrees from my gas hot water system)

The dough will seem really dry when you first mix it, but gets wetter as it rises. Don’t panic if it seems tacky and dry.

I think the key here is the baking process, which is how the oven loaf in my previous post about bread maker white loaves was baked. I use a cast iron camp oven (sometimes called a dutch oven), because thermal mass is key. If I had a fancy enamelized cast iron camp oven I’d use that, but I don’t and I wasn’t going shopping during the shutdown to get one. Oh, and they can be crazy expensive at up to $500 AUD.

Another loaf of bread

Warm the oven with the camp oven inside for at least 30 minutes at 230 degrees celsius. Then place the dough inside the camp oven on some baking paper — I tend to use a trivet as well, but I think you could skip that if you didn’t have one. Bake for 30 minutes with the lid on — this helps steam the bread a little and forms a nice crust. Then bake for another 12 minutes with the camp oven lid off — this darkens the crust up nicely.

A final loaf of bread

Oh, and I’ve noticed a bit of variation in how wet the dough seems to be when I turn it out and form it in flour, but it doesn’t really seem to change the outcome once baked, so that’s nice.

The original blogger for this recipe also recommends chilling the dough overnight in the fridge before baking, but I haven’t tried that yet.


Private Key Redaction: UR DOIN IT RONG

Because posting private keys on the Internet is a bad idea, some people like to “redact” their private keys, so that it looks kinda-sorta like a private key, but it isn’t actually giving away anything secret. Unfortunately, due to the way that private keys are represented, it is easy to “redact” a key in such a way that it doesn’t actually redact anything at all. RSA private keys are particularly bad at this, but the problem can (potentially) apply to other keys as well.

I’ll show you a bit of “Inside Baseball” with key formats, and then demonstrate the practical implications. Finally, we’ll go through a practical worked example from an actual not-really-redacted key I recently stumbled across in my travels.

The Private Lives of Private Keys

Here is what a typical private key looks like, when you come across it:

-----BEGIN RSA PRIVATE KEY-----
MGICAQACEQCxjdTmecltJEz2PLMpS4BXAgMBAAECEDKtuwD17gpagnASq1zQTYEC
CQDVTYVsjjF7IQIJANUYZsIjRsR3AgkAkahDUXL0RSECCB78r2SnsJC9AghaOK3F
sKoELg==
-----END RSA PRIVATE KEY-----

Obviously, there’s some hidden meaning in there – computers don’t encrypt things by shouting “BEGIN RSA PRIVATE KEY!”, after all. What is between the BEGIN/END lines above is, in fact, a base64-encoded DER format ASN.1 structure representing a PKCS#1 private key.

In simple terms, it’s a list of numbers – very important numbers. The list of numbers is, in order:

  • A version number (0);
  • The “public modulus”, commonly referred to as “n”;
  • The “public exponent”, or “e” (which is almost always 65,537, for various unimportant reasons);
  • The “private exponent”, or “d”;
  • The two “private primes”, or “p” and “q”;
  • Two exponents, which are known as “dmp1” and “dmq1”; and
  • A coefficient, known as “iqmp”.

Why Is This a Problem?

The thing is, only three of those numbers are actually required in a private key. The rest, whilst useful to allow the RSA encryption and decryption to be more efficient, aren’t necessary. The three absolutely required values are e, p, and q.
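
As a concrete illustration, here is a minimal Python sketch (with tiny made-up primes standing in for real values, and no connection to the toy key below) of recomputing every other field of a PKCS#1 key from just e, p, and q:

# Recompute the "redundant" PKCS#1 fields from e, p and q alone.
# The primes here are tiny placeholders -- real RSA primes are 1024+ bits.
e = 65537
p = 1_000_000_007                    # placeholder prime
q = 998_244_353                      # placeholder prime

n = p * q                            # public modulus
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent (Python 3.8+ modular inverse)
dmp1 = d % (p - 1)                   # CRT exponent for p
dmq1 = d % (q - 1)                   # CRT exponent for q
iqmp = pow(q, -1, p)                 # CRT coefficient

# Sanity check: the reconstructed parameters round-trip a message.
m = 42
assert pow(pow(m, e, n), d, n) == m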

Of the other numbers, most of them are at least about the same size as each of p and q. So of the total data in an RSA key, less than a quarter of the data is required. Let me show you with the above “toy” key, by breaking it down piece by piece1:

  • MGI – DER for “this is a sequence”
  • CAQ – version (0)
  • CxjdTmecltJEz2PLMpS4BX – n
  • AgMBAA – e
  • ECEDKtuwD17gpagnASq1zQTY – d
  • ECCQDVTYVsjjF7IQ – p
  • IJANUYZsIjRsR3 – q
  • AgkAkahDUXL0RS – dmp1
  • ECCB78r2SnsJC9 – dmq1
  • AghaOK3FsKoELg== – iqmp

Remember that in order to reconstruct all of these values, all I need are e, p, and q – and e is pretty much always 65,537. So I could “redact” almost all of this key, and still give all the important, private bits of this key. Let me show you:

-----BEGIN RSA PRIVATE KEY-----
..............................................................EC
CQDVTYVsjjF7IQIJANUYZsIjRsR3....................................
........
-----END RSA PRIVATE KEY-----

Now, I doubt that anyone is going to redact a key precisely like this… but then again, this isn’t a “typical” RSA key. They usually look a lot more like this:

-----BEGIN RSA PRIVATE KEY-----
MIIEogIBAAKCAQEAu6Inch7+mWtKn+leB9uCG3MaJIxRyvC/5KTz2fR+h+GOhqj4
SZJobiVB4FrE5FgC7AnlH6qeRi9MI0s6dt5UWZ5oNIeWSaOOeNO+EJDUkSVf67wj
SNGXlSjGAkPZ0nRJiDjhuPvQmdW53hOaBLk5udxPEQbenpXAzbLJ7wH5ouLQ3nQw
HwpwDNQhF6zRO8WoscpDVThOAM+s4PS7EiK8ZR4hu2toon8Ynadlm95V45wR0VlW
zywgbkZCKa1IMrDCscB6CglQ10M3Xzya3iTzDtQxYMVqhDrA7uBYRxA0y1sER+Rb
yhEh03xz3AWemJVLCQuU06r+FABXJuY/QuAVvQIDAQABAoIBAFqwWVhzWqNUlFEO
PoCVvCEAVRZtK+tmyZj9kU87ORz8DCNR8A+/T/JM17ZUqO2lDGSBs9jGYpGRsr8s
USm69BIM2ljpX95fyzDjRu5C0jsFUYNi/7rmctmJR4s4uENcKV5J/++k5oI0Jw4L
c1ntHNWUgjK8m0UTJIlHbQq0bbAoFEcfdZxd3W+SzRG3jND3gifqKxBG04YDwloy
tu+bPV2jEih6p8tykew5OJwtJ3XsSZnqJMwcvDciVbwYNiJ6pUvGq6Z9kumOavm9
XU26m4cWipuK0URWbHWQA7SjbktqEpxsFrn5bYhJ9qXgLUh/I1+WhB2GEf3hQF5A
pDTN4oECgYEA7Kp6lE7ugFBDC09sKAhoQWrVSiFpZG4Z1gsL9z5YmZU/vZf0Su0n
9J2/k5B1GghvSwkTqpDZLXgNz8eIX0WCsS1xpzOuORSNvS1DWuzyATIG2cExuRiB
jYWIJUeCpa5p2PdlZmBrnD/hJ4oNk4oAVpf+HisfDSN7HBpN+TJfcAUCgYEAyvY7
Y4hQfHIdcfF3A9eeCGazIYbwVyfoGu70S/BZb2NoNEPymqsz7NOfwZQkL4O7R3Wl
Rm0vrWT8T5ykEUgT+2ruZVXYSQCKUOl18acbAy0eZ81wGBljZc9VWBrP1rHviVWd
OVDRZNjz6nd6ZMrJvxRa24TvxZbJMmO1cgSW1FkCgYAoWBd1WM9HiGclcnCZknVT
UYbykCeLO0mkN1Xe2/32kH7BLzox26PIC2wxF5seyPlP7Ugw92hOW/zewsD4nLze
v0R0oFa+3EYdTa4BvgqzMXgBfvGfABJ1saG32SzoWYcpuWLLxPwTMsCLIPmXgRr1
qAtl0SwF7Vp7O/C23mNukQKBgB89DOEB7xloWv3Zo27U9f7nB7UmVsGjY8cZdkJl
6O4LB9PbjXCe3ywZWmJqEbO6e83A3sJbNdZjT65VNq9uP50X1T+FmfeKfL99X2jl
RnQTsrVZWmJrLfBSnBkmb0zlMDAcHEnhFYmHFuvEnfL7f1fIoz9cU6c+0RLPY/L7
n9dpAoGAXih17mcmtnV+Ce+lBWzGWw9P4kVDSIxzGxd8gprrGKLa3Q9VuOrLdt58
++UzNUaBN6VYAe4jgxGfZfh+IaSlMouwOjDgE/qzgY8QsjBubzmABR/KWCYiRqkj
qpWCgo1FC1Gn94gh/+dW2Q8+NjYtXWNqQcjRP4AKTBnPktEvdMA=
-----END RSA PRIVATE KEY-----

People typically redact keys by deleting whole lines, and usually replacing them with [...] and the like. But only about 345 of those 1588 characters (excluding the header and footer) are required to construct the entire key. You can redact about 4/5ths of that giant blob of stuff, and your private parts (or at least, those of your key) are still left uncomfortably exposed.

But Wait! There’s More!

Remember how I said that everything in the key other than e, p, and q could be derived from those three numbers? Let’s talk about one of those numbers: n.

This is known as the “public modulus” (because, along with e, it is also present in the public key). It is very easy to calculate: n = p * q. It is also very early in the key (the second number, in fact).

Since n = p * q, it follows that q = n / p. Thus, as long as the key is intact up to p, you can derive q by simple division.
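
As a quick Python sketch (again with made-up placeholder numbers rather than anything from a real key), that recovery is a single integer division plus a remainder check:

# n comes straight out of the key's DER; p is the last intact private value.
p = 1_000_000_007                    # placeholder prime
n = 998_244_359_987_710_471          # placeholder modulus (= p * q for some unknown q)

q, remainder = divmod(n, p)
assert remainder == 0                # a non-zero remainder suggests corruption, not redaction
print(q)                             # 998244353 -- the "redacted" prime, recovered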

Real World Redaction

At this point, I’d like to introduce an acquaintance of mine: Mr. Johan Finn. He is the proud owner of the GitHub repo johanfinn/scripts. For a while, his repo contained a script that contained a poorly-redacted private key. He since deleted it, by making a new commit, but of course because git never really deletes anything, it’s still available.

Of course, Mr. Finn may delete the repo, or force-push a new history without that commit, so here is the redacted private key, with a bit of the surrounding shell script, for our illustrative pleasure:

#Add private key to .ssh folder
cd /home/johan/.ssh/
echo  "-----BEGIN RSA PRIVATE KEY-----
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
ÄÄÄÄÄÄÄÄÄÄÄÄÄÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅ
MIIJKgIBAAKCAgEAxEVih1JGb8gu/Fm4AZh+ZwJw/pjzzliWrg4mICFt1g7SmIE2
TCQMKABdwd11wOFKCPc/UzRH/fHuQcvWrpbOSdqev/zKff9iedKw/YygkMeIRaXB
fYELqvUAOJ8PPfDm70st9GJRhjGgo5+L3cJB2gfgeiDNHzaFvapRSU0oMGQX+kI9
ezsjDAn+0Pp+r3h/u1QpLSH4moRFGF4omNydI+3iTGB98/EzuNhRBHRNq4oBV5SG
Pq/A1bem2ninnoEaQ+OPESxYzDz3Jy9jV0W/6LvtJ844m+XX69H5fqq5dy55z6DW
sGKn78ULPVZPsYH5Y7C+CM6GAn4nYCpau0t52sqsY5epXdeYx4Dc+Wm0CjXrUDEe
Egl4loPKDxJkQqQ/MQiz6Le/UK9vEmnWn1TRXK3ekzNV4NgDfJANBQobOpwt8WVB
rbsC0ON7n680RQnl7PltK9P1AQW5vHsahkoixk/BhcwhkrkZGyDIl9g8Q/Euyoq3
eivKPLz7/rhDE7C1BzFy7v8AjC3w7i9QeHcWOZFAXo5hiDasIAkljDOsdfD4tP5/
wSO6E6pjL3kJ+RH2FCHd7ciQb+IcuXbku64ln8gab4p8jLa/mcMI+V3eWYnZ82Yu
axsa85hAe4wb60cp/rCJo7ihhDTTvGooqtTisOv2nSvCYpcW9qbL6cGjAXECAwEA
AQKCAgEAjz6wnWDP5Y9ts2FrqUZ5ooamnzpUXlpLhrbu3m5ncl4ZF5LfH+QDN0Kl
KvONmHsUhJynC/vROybSJBU4Fu4bms1DJY3C39h/L7g00qhLG7901pgWMpn3QQtU
4P49qpBii20MGhuTsmQQALtV4kB/vTgYfinoawpo67cdYmk8lqzGzzB/HKxZdNTq
s+zOfxRr7PWMo9LyVRuKLjGyYXZJ/coFaobWBi8Y96Rw5NZZRYQQXLIalC/Dhndm
AHckpstEtx2i8f6yxEUOgPvV/gD7Akn92RpqOGW0g/kYpXjGqZQy9PVHGy61sInY
HSkcOspIkJiS6WyJY9JcvJPM6ns4b84GE9qoUlWVF3RWJk1dqYCw5hz4U8LFyxsF
R6WhYiImvjxBLpab55rSqbGkzjI2z+ucDZyl1gqIv9U6qceVsgRyuqdfVN4deU22
LzO5IEDhnGdFqg9KQY7u8zm686Ejs64T1sh0y4GOmGsSg+P6nsqkdlXH8C+Cf03F
lqPFg8WQC7ojl/S8dPmkT5tcJh3BPwIWuvbtVjFOGQc8x0lb+NwK8h2Nsn6LNazS
0H90adh/IyYX4sBMokrpxAi+gMAWiyJHIHLeH2itNKtAQd3qQowbrWNswJSgJzsT
JuJ7uqRKAFkE6nCeAkuj/6KHHMPsfCAffVdyGaWqhoxmPOrnVgECggEBAOrCCwiC
XxwUgjOfOKx68siFJLfHf4vPo42LZOkAQq5aUmcWHbJVXmoxLYSczyAROopY0wd6
Dx8rqnpO7OtZsdJMeBSHbMVKoBZ77hiCQlrljcj12moFaEAButLCdZFsZW4zF/sx
kWIAaPH9vc4MvHHyvyNoB3yQRdevu57X7xGf9UxWuPil/jvdbt9toaraUT6rUBWU
GYPNKaLFsQzKsFWAzp5RGpASkhuiBJ0Qx3cfLyirjrKqTipe3o3gh/5RSHQ6VAhz
gdUG7WszNWk8FDCL6RTWzPOrbUyJo/wz1kblsL3vhV7ldEKFHeEjsDGroW2VUFlS
asAHNvM4/uYcOSECggEBANYH0427qZtLVuL97htXW9kCAT75xbMwgRskAH4nJDlZ
IggDErmzBhtrHgR+9X09iL47jr7dUcrVNPHzK/WXALFSKzXhkG/yAgmt3r14WgJ6
5y7010LlPFrzaNEyO/S4ISuBLt4cinjJsrFpoo0WI8jXeM5ddG6ncxdurKXMymY7
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::.::
:::::::::::::::::::::::::::.::::::::::::::::::::::::::::::::::::
LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLlL
ÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖ
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
ÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅ
YYYYYYYYYYYYYYYYYYYYYyYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY
gff0GJCOMZ65pMSy3A3cSAtjlKnb4fWzuHD5CFbusN4WhCT/tNxGNSpzvxd8GIDs
nY7exs9L230oCCpedVgcbayHCbkChEfoPzL1e1jXjgCwCTgt8GjeEFqc1gXNEaUn
O8AJ4VlR8fRszHm6yR0ZUBdY7UJddxQiYOzt0S1RLlECggEAbdcs4mZdqf3OjejJ
06oTPs9NRtAJVZlppSi7pmmAyaNpOuKWMoLPElDAQ3Q7VX26LlExLCZoPOVpdqDH
KbdmBEfTR4e11Pn9vYdu9/i6o10U4hpmf4TYKlqk10g1Sj21l8JATj/7Diey8scO
sAI1iftSg3aBSj8W7rxCxSezrENzuqw5D95a/he1cMUTB6XuravqZK5O4eR0vrxR
AvMzXk5OXrUEALUvt84u6m6XZZ0pq5XZxq74s8p/x1JvTwcpJ3jDKNEixlHfdHEZ
ZIu/xpcwD5gRfVGQamdcWvzGHZYLBFO1y5kAtL8kI9tW7WaouWVLmv99AyxdAaCB
Y5mBAQKCAQEAzU7AnorPzYndlOzkxRFtp6MGsvRBsvvqPLCyUFEXrHNV872O7tdO
GmsMZl+q+TJXw7O54FjJJvqSSS1sk68AGRirHop7VQce8U36BmI2ZX6j2SVAgIkI
9m3btCCt5rfiCatn2+Qg6HECmrCsHw6H0RbwaXS4RZUXD/k4X+sslBitOb7K+Y+N
Bacq6QxxjlIqQdKKPs4P2PNHEAey+kEJJGEQ7bTkNxCZ21kgi1Sc5L8U/IGy0BMC
PvJxssLdaWILyp3Ws8Q4RAoC5c0ZP0W2j+5NSbi3jsDFi0Y6/2GRdY1HAZX4twem
Q0NCedq1JNatP1gsb6bcnVHFDEGsj/35oQKCAQEAgmWMuSrojR/fjJzvke6Wvbox
FRnPk+6YRzuYhAP/YPxSRYyB5at++5Q1qr7QWn7NFozFIVFFT8CBU36ktWQ39MGm
cJ5SGyN9nAbbuWA6e+/u059R7QL+6f64xHRAGyLT3gOb1G0N6h7VqFT25q5Tq0rc
Lf/CvLKoudjv+sQ5GKBPT18+zxmwJ8YUWAsXUyrqoFWY/Tvo5yLxaC0W2gh3+Ppi
EDqe4RRJ3VKuKfZxHn5VLxgtBFN96Gy0+Htm5tiMKOZMYAkHiL+vrVZAX0hIEuRZ
JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
-----END RSA PRIVATE KEY-----" >> id_rsa

Now, if you try to reconstruct this key by removing the “obvious” garbage lines (the ones that are all repeated characters, some of which aren’t even valid base64 characters), it still isn’t a key – at least, openssl pkey doesn’t want anything to do with it. The key is very much still in there, though, as we shall soon see.

Using a gem I wrote and a quick bit of Ruby, we can extract a complete private key. The irb session looks something like this:

>> require "derparse"
>> b64 = <<EOF
MIIJKgIBAAKCAgEAxEVih1JGb8gu/Fm4AZh+ZwJw/pjzzliWrg4mICFt1g7SmIE2
TCQMKABdwd11wOFKCPc/UzRH/fHuQcvWrpbOSdqev/zKff9iedKw/YygkMeIRaXB
fYELqvUAOJ8PPfDm70st9GJRhjGgo5+L3cJB2gfgeiDNHzaFvapRSU0oMGQX+kI9
ezsjDAn+0Pp+r3h/u1QpLSH4moRFGF4omNydI+3iTGB98/EzuNhRBHRNq4oBV5SG
Pq/A1bem2ninnoEaQ+OPESxYzDz3Jy9jV0W/6LvtJ844m+XX69H5fqq5dy55z6DW
sGKn78ULPVZPsYH5Y7C+CM6GAn4nYCpau0t52sqsY5epXdeYx4Dc+Wm0CjXrUDEe
Egl4loPKDxJkQqQ/MQiz6Le/UK9vEmnWn1TRXK3ekzNV4NgDfJANBQobOpwt8WVB
rbsC0ON7n680RQnl7PltK9P1AQW5vHsahkoixk/BhcwhkrkZGyDIl9g8Q/Euyoq3
eivKPLz7/rhDE7C1BzFy7v8AjC3w7i9QeHcWOZFAXo5hiDasIAkljDOsdfD4tP5/
wSO6E6pjL3kJ+RH2FCHd7ciQb+IcuXbku64ln8gab4p8jLa/mcMI+V3eWYnZ82Yu
axsa85hAe4wb60cp/rCJo7ihhDTTvGooqtTisOv2nSvCYpcW9qbL6cGjAXECAwEA
AQKCAgEAjz6wnWDP5Y9ts2FrqUZ5ooamnzpUXlpLhrbu3m5ncl4ZF5LfH+QDN0Kl
KvONmHsUhJynC/vROybSJBU4Fu4bms1DJY3C39h/L7g00qhLG7901pgWMpn3QQtU
4P49qpBii20MGhuTsmQQALtV4kB/vTgYfinoawpo67cdYmk8lqzGzzB/HKxZdNTq
s+zOfxRr7PWMo9LyVRuKLjGyYXZJ/coFaobWBi8Y96Rw5NZZRYQQXLIalC/Dhndm
AHckpstEtx2i8f6yxEUOgPvV/gD7Akn92RpqOGW0g/kYpXjGqZQy9PVHGy61sInY
HSkcOspIkJiS6WyJY9JcvJPM6ns4b84GE9qoUlWVF3RWJk1dqYCw5hz4U8LFyxsF
R6WhYiImvjxBLpab55rSqbGkzjI2z+ucDZyl1gqIv9U6qceVsgRyuqdfVN4deU22
LzO5IEDhnGdFqg9KQY7u8zm686Ejs64T1sh0y4GOmGsSg+P6nsqkdlXH8C+Cf03F
lqPFg8WQC7ojl/S8dPmkT5tcJh3BPwIWuvbtVjFOGQc8x0lb+NwK8h2Nsn6LNazS
0H90adh/IyYX4sBMokrpxAi+gMAWiyJHIHLeH2itNKtAQd3qQowbrWNswJSgJzsT
JuJ7uqRKAFkE6nCeAkuj/6KHHMPsfCAffVdyGaWqhoxmPOrnVgECggEBAOrCCwiC
XxwUgjOfOKx68siFJLfHf4vPo42LZOkAQq5aUmcWHbJVXmoxLYSczyAROopY0wd6
Dx8rqnpO7OtZsdJMeBSHbMVKoBZ77hiCQlrljcj12moFaEAButLCdZFsZW4zF/sx
kWIAaPH9vc4MvHHyvyNoB3yQRdevu57X7xGf9UxWuPil/jvdbt9toaraUT6rUBWU
GYPNKaLFsQzKsFWAzp5RGpASkhuiBJ0Qx3cfLyirjrKqTipe3o3gh/5RSHQ6VAhz
gdUG7WszNWk8FDCL6RTWzPOrbUyJo/wz1kblsL3vhV7ldEKFHeEjsDGroW2VUFlS
asAHNvM4/uYcOSECggEBANYH0427qZtLVuL97htXW9kCAT75xbMwgRskAH4nJDlZ
IggDErmzBhtrHgR+9X09iL47jr7dUcrVNPHzK/WXALFSKzXhkG/yAgmt3r14WgJ6
5y7010LlPFrzaNEyO/S4ISuBLt4cinjJsrFpoo0WI8jXeM5ddG6ncxdurKXMymY7
EOF
>> b64 += <<EOF
gff0GJCOMZ65pMSy3A3cSAtjlKnb4fWzuHD5CFbusN4WhCT/tNxGNSpzvxd8GIDs
nY7exs9L230oCCpedVgcbayHCbkChEfoPzL1e1jXjgCwCTgt8GjeEFqc1gXNEaUn
O8AJ4VlR8fRszHm6yR0ZUBdY7UJddxQiYOzt0S1RLlECggEAbdcs4mZdqf3OjejJ
06oTPs9NRtAJVZlppSi7pmmAyaNpOuKWMoLPElDAQ3Q7VX26LlExLCZoPOVpdqDH
KbdmBEfTR4e11Pn9vYdu9/i6o10U4hpmf4TYKlqk10g1Sj21l8JATj/7Diey8scO
sAI1iftSg3aBSj8W7rxCxSezrENzuqw5D95a/he1cMUTB6XuravqZK5O4eR0vrxR
AvMzXk5OXrUEALUvt84u6m6XZZ0pq5XZxq74s8p/x1JvTwcpJ3jDKNEixlHfdHEZ
ZIu/xpcwD5gRfVGQamdcWvzGHZYLBFO1y5kAtL8kI9tW7WaouWVLmv99AyxdAaCB
Y5mBAQKCAQEAzU7AnorPzYndlOzkxRFtp6MGsvRBsvvqPLCyUFEXrHNV872O7tdO
GmsMZl+q+TJXw7O54FjJJvqSSS1sk68AGRirHop7VQce8U36BmI2ZX6j2SVAgIkI
9m3btCCt5rfiCatn2+Qg6HECmrCsHw6H0RbwaXS4RZUXD/k4X+sslBitOb7K+Y+N
Bacq6QxxjlIqQdKKPs4P2PNHEAey+kEJJGEQ7bTkNxCZ21kgi1Sc5L8U/IGy0BMC
PvJxssLdaWILyp3Ws8Q4RAoC5c0ZP0W2j+5NSbi3jsDFi0Y6/2GRdY1HAZX4twem
Q0NCedq1JNatP1gsb6bcnVHFDEGsj/35oQKCAQEAgmWMuSrojR/fjJzvke6Wvbox
FRnPk+6YRzuYhAP/YPxSRYyB5at++5Q1qr7QWn7NFozFIVFFT8CBU36ktWQ39MGm
cJ5SGyN9nAbbuWA6e+/u059R7QL+6f64xHRAGyLT3gOb1G0N6h7VqFT25q5Tq0rc
Lf/CvLKoudjv+sQ5GKBPT18+zxmwJ8YUWAsXUyrqoFWY/Tvo5yLxaC0W2gh3+Ppi
EDqe4RRJ3VKuKfZxHn5VLxgtBFN96Gy0+Htm5tiMKOZMYAkHiL+vrVZAX0hIEuRZ
EOF
>> der = b64.unpack("m").first
>> c = DerParse.new(der).first_node.first_child
>> version = c.value
=> 0
>> c = c.next_node
>> n = c.value
=> 80071596234464993385068908004931... # (etc)
>> c = c.next_node
>> e = c.value
=> 65537
>> c = c.next_node
>> d = c.value
=> 58438813486895877116761996105770... # (etc)
>> c = c.next_node
>> p = c.value
=> 29635449580247160226960937109864... # (etc)
>> c = c.next_node
>> q = c.value
=> 27018856595256414771163410576410... # (etc)

What I’ve done, in case you don’t speak Ruby, is take the two “chunks” of plausible-looking base64 data, chuck them together into a variable named b64, unbase64 it into a variable named der, pass that into a new DerParse instance, and then walk the DER value tree until I got all the values I need.

Interestingly, the q value actually traverses the “split” in the two chunks, which means that there’s always the possibility that there are lines missing from the key. However, since p and q are supposed to be prime, we can “sanity check” them to see if corruption is likely to have occurred:

>> require "openssl"
>> OpenSSL::BN.new(p).prime?
=> true
>> OpenSSL::BN.new(q).prime?
=> true

Excellent! The chances of a corrupted file producing valid-but-incorrect prime numbers aren’t huge, so we can be fairly confident that we’ve got the “real” p and q. Now, with the help of another one of my creations we can use e, p, and q to create a fully-operational battle key:

>> require "openssl/pkey/rsa"
>> k = OpenSSL::PKey::RSA.from_factors(p, q, e)
=> #<OpenSSL::PKey::RSA:0x0000559d5903cd38>
>> k.valid?
=> true
>> k.verify(OpenSSL::Digest::SHA256.new, k.sign(OpenSSL::Digest::SHA256.new, "bob"), "bob")
=> true

… and there you have it. One fairly redacted-looking private key brought back to life by maths and far too much free time.

Sorry Mr. Finn, I hope you’re not still using that key on anything Internet-facing.

What About Other Key Types?

EC keys are very different beasts, but they have much the same problems as RSA keys. A typical EC key contains both private and public data, and the public portion is twice the size – so only about 1/3 of the data in the key is private material. It is quite plausible that you can “redact” an EC key and leave all the actually private bits exposed.
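
As a rough Python sketch (using the third-party cryptography package and a made-up scalar, not any real key), the entire public half of an EC key can be regenerated from the private scalar alone, which is why a partially "redacted" EC key can be just as exposed:

# The private part of an EC key is just one integer, the scalar d.
# The curve parameters and the public point can all be recomputed from it.
from cryptography.hazmat.primitives.asymmetric import ec

d = 0xDEADBEEF_CAFEBABE_00112233_44556677   # made-up scalar, for illustration only
key = ec.derive_private_key(d, ec.SECP256R1())

pub = key.public_key().public_numbers()
print(hex(pub.x))   # the "public" half, rebuilt from d alone
print(hex(pub.y))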

What Do We Do About It?

In short: don’t ever try and redact real private keys. For documentation purposes, just put “KEY GOES HERE” in the appropriate spot, or something like that. Store your secrets somewhere that isn’t a public (or even private!) git repo.

Generating a “dummy” private key and sticking it in there isn’t a great idea, for different reasons: people have this odd habit of reusing “demo” keys in real life. There’s no need to encourage that sort of thing.


  1. Technically the pieces aren’t 100% aligned with the underlying DER, because of how base64 works. I felt it was easier to understand if I stuck to chopping up the base64, rather than decoding into DER and then chopping up the DER. 

MicroHams Digital Conference (MHDC) 2020

On May 9 2020 (PST) I had the pleasure of speaking at the MicroHams Digital Conference (MHDC) 2020. Due to COVID-19 presenters attended via Zoom, and the conference was live streamed over YouTube.

Thanks to the hard work of the organisers, this worked really well!

Looking at the conference program, I noticed the standard of the presenters was very high. The organisers I worked with (Scott N7SS, and Grant KB7WSD) explained that a side effect of making the conference virtual was casting a much wider net on presenters – making the conference even better than IRL (In Real Life)! The YouTube streaming stats showed 300-500 people “attending” – also very high.

My door to door travel time to West Coast USA is about 20 hours. So a remote presentation makes life much easier for me. It takes me a week to prepare, means 1-2 weeks away from home, and a week to recover from the jetlag. As a single parent I need to find a carer for my 14 year old.

Vickie, KD7LAW, ran a break out room for after talk chat which worked well. It was nice to “meet” several people that I usually just have email contact with. All from the comfort of my home on a Sunday morning in Adelaide (Saturday afternoon PST).

The MHDC 2020 talks have now been published on YouTube. Here is my talk, which is a good update (May 2020) of Codec 2 and FreeDV, including:

  • The new FreeDV 2020 mode using the LPCNet neural net vocoder
  • Embedded FreeDV 700D running on the SM1000
  • FreeDV over the QO-100 geosynchronous satellite and KiwiSDRs
  • Introducing some of the good people contributing to FreeDV

The conference has me interested in applying the open source modems we have developed for digital voice to Amateur Radio packet and HF data. So I’m reading up on Winlink, Pat, Direwolf and friends.

Thanks Scott, Grant, and Vickie and the MicroHams club!

May 16, 2020

Raptor Blackbird support: all upstream in op-build

Thanks to my most recent PR being merged, op-build v2.5 will have full support for the Raptor Blackbird! This includes support for the “IPL Monitor” that’s required to get fan control going.

Note that if you’re running Fedora 32 then you need some patches to buildroot to have it build, but if you’re building on something a little older, then upstream should build and work straight out of the box (err… git tree).

I also note that the work to get Secure Boot for an OS Kernel going is starting to make its way out for code reviews, so that’s something to look forward to (although without a TPM we’re going to need extra code).

May 13, 2020

A op-build v2.5-rc1 based Raptor Blackbird Build

I have done a few builds of firmware for the Raptor Blackbird since I got mine, each of them based on upstream op-build plus a few patches. The previous one was Yet another near-upstream Raptor Blackbird firmware build that I built a couple of months ago. This new build is based off the release candidate of op-build v2.5. Here’s what’s changed:

Package / Old Version / New Version
hcodehw030220a.opmsthw050520a.opmst
hostbootacdff8a390a2654dd52fed67bdebe2b5
kexec-lite18ec88310c4134e6b0130b3c1ea489e
libflashv6.5-228-g82aed17av6.6
linuxv5.4.22v5.4.33
linux-headersv5.4.22v5.4.33
machine-xml17e9e84d504582c88e782e30829e0d6be
occ3ab29212518e65740ab4dc96fd6cf584c42
openpower-pnor6fb8d914134d544a84175f00d9c6dc395faf3
sbec318ab00116d92f08c78fb7838495ad0aab7
skibootv6.5-228-g82aed17av6.6
Changes in my latest Blackbird build

Go grab blackbird.pnor from https://www.flamingspork.com/blackbird/stewart-blackbird-6-images/, and give it a go! Just scp it to your BMC, and flash it:

pflash -E -p /tmp/blackbird.pnor

There are two differences from upstream op-build: my pull request to op-build, and the fixing of the (old) buildroot so that it’ll build on Fedora 32. From discussions on the openpower-firmware mailing list, it seems that one hopeful thing is to have all the Blackbird support merged in before the final op-build v2.5 is tagged. The previous op-build release (v2.4) was tagged in July 2019, so we’re about 10 months into what was a 2 month release cycle, so speculating on when that final release will be is somewhat difficult.

May 12, 2020

f32, u32, and const

Some time ago, I wrote “floats, bits, and constant expressions” about converting a floating point number into its representative ones and zeros as a C++ constant expression – constructing the IEEE 754 representation without being able to examine the bits directly.

I’ve been playing around with Rust recently, and rewrote that conversion code as a bit of a learning exercise for myself, with a thoroughly contrived set of constraints: using integer and single-precision floating point math, at compile time, without unsafe blocks, while using as few unstable features as possible.
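
By way of comparison, outside of those contrived constraints the conversion is trivial; a rough Python equivalent of the transmute approach (not part of the Rust exercise, just a reference for what the function should produce) looks like this:

# Reference check: reinterpret an f32's bytes as a u32, the "cheating" way.
import struct

def float_bits(x: float) -> int:
    # pack as a little-endian 32-bit float, unpack the same 4 bytes as a u32
    return struct.unpack("<I", struct.pack("<f", x))[0]

print(hex(float_bits(1.0)))    # 0x3f800000
print(hex(float_bits(-2.5)))   # 0xc0200000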

I’ve included the listing below, for your bemusement and/or head-shaking, and you can play with the code in the Rust Playground and rust.godbolt.org

// Jonathan Adamczewski 2020-05-12
//
// Constructing the bit-representation of an IEEE 754 single precision floating 
// point number, using integer and single-precision floating point math, at 
// compile time, in rust, without unsafe blocks, while using as few unstable 
// features as I can.
//
// or "What if this silly C++ thing http://brnz.org/hbr/?p=1518 but in Rust?"


// Q. Why? What is this good for?
// A. To the best of my knowledge, this code serves no useful purpose. 
//    But I did learn a thing or two while writing it :)


// This is needed to be able to perform floating point operations in a const 
// function:
#![feature(const_fn)]


// bits_transmute(): Returns the bits representing a floating point value, by
//                   way of std::mem::transmute()
//
// For completeness (and validation), and to make it clear the fundamentally 
// unnecessary nature of the exercise :D - here's a short, straightforward, 
// library-based version. But it needs the const_transmute flag and an unsafe 
// block.
#![feature(const_transmute)]
const fn bits_transmute(f: f32) -> u32 {
  unsafe { std::mem::transmute::<f32, u32>(f) }
}



// get_if_u32(predicate:bool, if_true: u32, if_false: u32):
//   Returns if_true if predicate is true, else if_false
//
// If and match are not able to be used in const functions (at least, not 
// without #![feature(const_if_match)] - so here's a branch-free select function
// for u32s
const fn get_if_u32(predicate: bool, if_true: u32, if_false: u32) -> u32 {
  let pred_mask = (-1 * (predicate as i32)) as u32;
  let true_val = if_true & pred_mask;
  let false_val = if_false & !pred_mask;
  true_val | false_val
}

// get_if_f32(predicate, if_true, if_false):
//   Returns if_true if predicate is true, else if_false
//
// A branch-free select function for f32s.
// 
// If either is_true or is_false is NaN or an infinity, the result will be NaN,
// which is not ideal. I don't know of a better way to implement this function
// within the arbitrary limitations of this silly little side quest.
const fn get_if_f32(predicate: bool, if_true: f32, if_false: f32) -> f32 {
  // can't convert bool to f32 - but can convert bool to i32 to f32
  let pred_sel = (predicate as i32) as f32;
  let pred_not_sel = ((!predicate) as i32) as f32;
  let true_val = if_true * pred_sel;
  let false_val = if_false * pred_not_sel;
  true_val + false_val
}


// bits(): Returns the bits representing a floating point value.
const fn bits(f: f32) -> u32 {
  // the result value, initialized to a NaN value that will otherwise not be
  // produced by this function.
  let mut r = 0xffff_ffff;

  // These floating point operations (and others) cause the following error:
  //     only int, `bool` and `char` operations are stable in const fn
  // hence #![feature(const_fn)] at the top of the file
  
  // Identify special cases
  let is_zero    = f == 0_f32;
  let is_inf     = f == f32::INFINITY;
  let is_neg_inf = f == f32::NEG_INFINITY;
  let is_nan     = f != f;

  // Writing this as !(is_zero || is_inf || ...) causes the following error:
  //     Loops and conditional expressions are not stable in const fn
  // so instead write this as type conversions, and bitwise operations
  //
  // "normalish" here means that f is a normal or subnormal value
  let is_normalish = 0 == ((is_zero as u32) | (is_inf as u32) | 
                        (is_neg_inf as u32) | (is_nan as u32));

  // set the result value for each of the special cases
  r = get_if_u32(is_zero,    0,           r); // if (is_zero)    { r = 0; }
  r = get_if_u32(is_inf,     0x7f80_0000, r); // if (is_inf)     { r = 0x7f80_0000; }
  r = get_if_u32(is_neg_inf, 0xff80_0000, r); // if (is_neg_inf) { r = 0xff80_0000; }
  r = get_if_u32(is_nan,     0x7fc0_0000, r); // if (is_nan)     { r = 0x7fc0_0000; }
 
  // It was tempting at this point to try setting f to a "normalish" placeholder 
  // value so that special cases do not have to be handled in the code that 
  // follows, like so:
  // f = get_if_f32(is_normal, f, 1_f32);
  //
  // Unfortunately, get_if_f32() returns NaN if either input is NaN or infinite.
  // Instead of switching the value, we work around the non-normalish cases 
  // later.
  //
  // (This whole function is branch-free, so all of it is executed regardless of 
  // the input value)

  // extract the sign bit
  let sign_bit  = get_if_u32(f < 0_f32,  1, 0);

  // compute the absolute value of f
  let mut abs_f = get_if_f32(f < 0_f32, -f, f);

  
  // This part is a little complicated. The algorithm is functionally the same 
  // as the C++ version linked from the top of the file.
  // 
  // Because of the various contrived constraints on this problem, we compute 
  // the exponent and significand, rather than extract the bits directly.
  //
  // The idea is this:
  // Every finite single precision float point number can be represented as a
  // series of (at most) 24 significant digits as a 128.149 fixed point number 
  // (128: 126 exponent values >= 0, plus one for the implicit leading 1, plus 
  // one more so that the decimal point falls on a power-of-two boundary :)
  // 149: 126 negative exponent values, plus 23 for the bits of precision in the 
  // significand.)
  //
  // If we are able to scale the number such that all of the precision bits fall 
  // in the upper-most 64 bits of that fixed-point representation (while 
  // tracking our effective manipulation of the exponent), we can then 
  // predictably and simply scale that computed value back to a range that can 
  // be converted safely to a u64, count the leading zeros to determine the 
  // exact exponent, and then shift the result into position for the final u32 
  // representation.
  
  // Start with the largest possible exponent - subsequent steps will reduce 
  // this number as appropriate
  let mut exponent: u32 = 254;
  {
    // Hex float literals are really nice. I miss them.

    // The threshold is 2^87 (think: 64+23 bits) to ensure that the number will 
    // be large enough that, when scaled down by 2^64, all the precision will 
    // fit nicely in a u64
    const THRESHOLD: f32 = 154742504910672534362390528_f32; // 0x1p87f == 2^87

    // The scaling factor is 2^41 (think: 64-23 bits) to ensure that a number 
    // between 2^87 and 2^64 will not overflow in a single scaling step.
    const SCALE_UP: f32 = 2199023255552_f32; // 0x1p41f == 2^41

    // Because loops are not available (no #![feature(const_loops)], and 'if' is
    // not available (no #![feature(const_if_match)]), perform repeated branch-
    // free conditional multiplication of abs_f.

    // use a macro, because why not :D It's the most compact, simplest option I 
    // could find.
    macro_rules! maybe_scale {
      () => {{
        // care is needed: if abs_f is above the threshold, multiplying by 2^41 
        // will cause it to overflow (INFINITY) which will cause get_if_f32() to
        // return NaN, which will destroy the value in abs_f. So compute a safe 
        // scaling factor for each iteration.
        //
        // Roughly equivalent to :
        // if (abs_f < THRESHOLD) {
        //   exponent -= 41;
        //   abs_f *= SCALE_UP;
        // }
        let scale = get_if_f32(abs_f < THRESHOLD, SCALE_UP,      1_f32);    
        exponent  = get_if_u32(abs_f < THRESHOLD, exponent - 41, exponent); 
        abs_f     = get_if_f32(abs_f < THRESHOLD, abs_f * scale, abs_f);
      }}
    }
    // 41 bits per iteration means up to 246 bits shifted.
    // Even the smallest subnormal value will end up in the desired range.
    maybe_scale!();  maybe_scale!();  maybe_scale!();
    maybe_scale!();  maybe_scale!();  maybe_scale!();
  }

  // Now that we know that abs_f is in the desired range (2^87 <= abs_f < 2^128)
  // scale it down to be in the range (2^23 <= _ < 2^64), and convert without 
  // loss of precision to u64.
  const INV_2_64: f32 = 5.42101086242752217003726400434970855712890625e-20_f32; // 0x1p-64f == 2^-64
  let a = (abs_f * INV_2_64) as u64;

  // Count the leading zeros.
  // (C++ doesn't provide a compile-time constant function for this. It's nice 
  // that rust does :)
  let mut lz = a.leading_zeros();

  // if the number isn't normalish, lz is meaningless: we stomp it with 
  // something that will not cause problems in the computation that follows - 
  // the result of which is meaningless, and will be ignored in the end for 
  // non-normalish values.
  lz = get_if_u32(!is_normalish, 0, lz); // if (!is_normalish) { lz = 0; }

  {
    // This step accounts for subnormal numbers, where there are more leading 
    // zeros than can be accounted for in a valid exponent value, and leading 
    // zeros that must remain in the final significand.
    //
    // If lz < exponent, reduce exponent to its final correct value - lz will be
    // used to remove all of the leading zeros.
    //
    // Otherwise, clamp exponent to zero, and adjust lz to ensure that the 
    // correct number of bits will remain (after multiplying by 2^41 six times - 
    // 2^246 - there are 7 leading zeros ahead of the original subnormal's
    // computed significand of 0.sss...)
    // 
    // The following is roughly equivalent to:
    // if (lz < exponent) {
    //   exponent = exponent - lz;
    // } else {
    //   exponent = 0;
    //   lz = 7;
    // }

    // we're about to mess with lz and exponent - compute and store the relative 
    // value of the two
    let lz_is_less_than_exponent = lz < exponent;

    lz       = get_if_u32(!lz_is_less_than_exponent, 7,             lz);
    exponent = get_if_u32( lz_is_less_than_exponent, exponent - lz, 0);
  }

  // compute the final significand.
  // + 1 shifts away a leading 1-bit for normal, and 0-bit for subnormal values
  // Shifts are done in u64 (that leading bit is shifted into the void), then
  // the resulting bits are shifted back to their final resting place.
  let significand = ((a << (lz + 1)) >> (64 - 23)) as u32;

  // combine the bits
  let computed_bits = (sign_bit << 31) | (exponent << 23) | significand;

  // return the normalish result, or the non-normalish result, as appropriate
  get_if_u32(is_normalish, computed_bits, r)
}


// Compile-time validation - able to be examined in rust.godbolt.org output
pub static BITS_BIGNUM: u32 = bits(std::f32::MAX);
pub static TBITS_BIGNUM: u32 = bits_transmute(std::f32::MAX);
pub static BITS_LOWER_THAN_MIN: u32 = bits(7.0064923217e-46_f32);
pub static TBITS_LOWER_THAN_MIN: u32 = bits_transmute(7.0064923217e-46_f32);
pub static BITS_ZERO: u32 = bits(0.0f32);
pub static TBITS_ZERO: u32 = bits_transmute(0.0f32);
pub static BITS_ONE: u32 = bits(1.0f32);
pub static TBITS_ONE: u32 = bits_transmute(1.0f32);
pub static BITS_NEG_ONE: u32 = bits(-1.0f32);
pub static TBITS_NEG_ONE: u32 = bits_transmute(-1.0f32);
pub static BITS_INF: u32 = bits(std::f32::INFINITY);
pub static TBITS_INF: u32 = bits_transmute(std::f32::INFINITY);
pub static BITS_NEG_INF: u32 = bits(std::f32::NEG_INFINITY);
pub static TBITS_NEG_INF: u32 = bits_transmute(std::f32::NEG_INFINITY);
pub static BITS_NAN: u32 = bits(std::f32::NAN);
pub static TBITS_NAN: u32 = bits_transmute(std::f32::NAN);
pub static BITS_COMPUTED_NAN: u32 = bits(std::f32::INFINITY/std::f32::INFINITY);
pub static TBITS_COMPUTED_NAN: u32 = bits_transmute(std::f32::INFINITY/std::f32::INFINITY);


// Run-time validation of many more values
fn main() {
  let end: usize = 0xffff_ffff;
  let count = 9_876_543; // number of values to test
  let step = end / count;
  for u in (0..=end).step_by(step) {
      let v = u as u32;
      
      // reference
      let f = unsafe { std::mem::transmute::<u32, f32>(v) };
      
      // compute
      let c = bits(f);

      // validation
      if c != v && 
         !(f.is_nan() && c == 0x7fc0_0000) && // nans
         !(v == 0x8000_0000 && c == 0) { // negative 0
          println!("{:x?} {:x?}", v, c); 
      }
  }
}

May 10, 2020

IT Asset Management

In my last full-time position I managed the asset tracking database for my employer. It was one of those things that “someone” needed to do, and it seemed that the only way that “someone” wouldn’t equate to “no-one” was for me to do it – which was ok. We used Snipe IT [1] to track the assets. I don’t have enough experience with asset tracking to say that Snipe is better or worse than average, but it basically did the job. Asset serial numbers are stored, you can have asset types that allow you to just add one more of the particular item, purchase dates are stored which makes warranty tracking easier, and every asset is associated with a person or listed as available. While I can’t say that Snipe IT is better than other products I can say that it will do the job reasonably well.

One problem that I didn’t discover until way too late was the fact that the finance people weren’t tracking serial numbers and that some assets in the database had the same asset IDs as the finance department and some had different ones. The best advice I can give to anyone who gets involved with asset tracking is to immediately chat to finance about how they track things: you need to know if the same asset IDs are used and if serial numbers are tracked by finance. I was pleased to discover that my colleagues were all honourable people as there was no apparent evaporation of valuable assets even though there was little ability to discover who might have been the last person to use some of the assets.

One problem that I’ve seen at many places is treating small items like keyboards and mice as “assets”. I think that anything that is worth less than 1 hour’s pay at the minimum wage (the price of a typical PC keyboard or mouse) isn’t worth tracking, treat it as a disposable item. If you hire a programmer who requests an unusually expensive keyboard or mouse (as some do) it still won’t be a lot of money when compared to their salary. Some of the older keyboards and mice that companies have are nasty, months of people eating lunch over them leaves them greasy and sticky. I think that the best thing to do with the keyboards and mice is to give them away when people leave and when new people join the company buy new hardware for them. If a company can’t spend $25 on a new keyboard and mouse for each new employee then they either have a massive problem of staff turnover or a lack of priority on morale.

A breadmaker loaf my kids will actually eat


My dad asked me to document some of my baking experiments from the recent natural disasters, which I wanted to do anyway so that I could remember the recipes. It’s taken me a while to get around to though, because animated GIFs on reddit are a terrible medium for recipe storage, and because I’ve been distracted with other shiny objects. That said, let’s start with the basics — a breadmaker bread that my kids will actually eat.

A loaf of bread baked in the oven

This recipe took a bunch of iterations to get right over the last year or so, but I’ll spare you the long boring details. However, I suspect part of the problem is that the recipe varies by bread maker. Oh, and the salt is really important — don’t skip the salt!

Wet ingredients (add first)

  • 1.5 cups of warm water (we have an instantaneous gas hot water system, so I pick 42 degrees)
  • 0.25 cups of oil (I use bran oil)

Dry ingredients (add second)

I just kind of chuck these in, although I tend to put the non-flour ingredients in a corner together for reasons that I can’t explain.

  • 3.5 cups of bakers flour (must be bakers flour, not plain flour)
  • 2 tea spoons of instant yeast (we keep it in the freezer in a big packet, not the sachets)
  • 4 tea spoons of white sugar
  • 1 tea spoon of salt
  • 2 tea spoons of bread improver

I then just let my bread maker do its thing, which takes about three hours including baking. If I am going to bake the bread in the oven, then the dough takes about two hours, but I let the dough rise for another 30 to 60 minutes before baking.

A loaf of bread from the bread maker

I think to be honest that the result is better from the oven, but it’s a little more work. The bread maker loaves are a bit prone to collapsing (you can see it starting on the example above), and there is a big kneading hook indent in the middle of the bottom of the loaf.

The oven baking technique took a while to develop, but I’ll cover that in a later post.


May 06, 2020

About Reopening Businesses

Currently there is political debate about when businesses should be reopened after the Covid19 quarantine.

Small Businesses

One argument for reopening things is for the benefit of small businesses. The first thing to note is that the protests in the US say “I need a haircut” not “I need to cut people’s hair”. Small businesses won’t benefit from reopening sooner.

For every business there is a certain minimum number of customers needed to be profitable. There are many comments from small business owners who want it to remain shut down. When the government has declared a shutdown, paused rent payments, and provided social security to employees who aren’t working, the small business can avoid bankruptcy. If they suddenly have to pay salaries or make redundancy payouts, and have to pay rent while they can’t make a profit due to customers staying home, they will go bankrupt.

Many restaurants and cafes make little or no profit at most times of the week (I used to be 1/3 owner of an Internet cafe and know this well). For such a company to be viable you have to be open most of the time so customers can expect you to be open. Generally you don’t keep a cafe open at 3PM to make money at 3PM, you keep it open so people can rely on there being a cafe open there; someone who buys a can of soda at 3PM one day might come back for lunch at 1:30PM the next day because they know you are open. A large portion of the opening hours of most retail companies can be considered as either advertising for trade at the profitable hours or as loss making times that you can’t close because you can’t send an employee home for an hour.

If you have seating for 28 people (as my cafe did) then for about half the opening hours you will probably have 2 or fewer customers in there at any time, for about a quarter the opening hours you probably won’t cover the salary of the one person on duty. The weekend is when you make the real money, especially Friday and Saturday nights when you sometimes get all the seats full and people coming in for takeaway coffee and snacks. On Friday and Saturday nights the 60 seat restaurant next door to my cafe used to tell customers that my cafe made better coffee. It wasn’t economical for them to have a table full for an hour while they sell a few cups of coffee, they wanted customers to leave after dessert and free the table for someone who wants a meal with wine (alcohol is the real profit for many restaurants).

The plans of reopening with social distancing means that a 28 seat cafe can only have 14 chairs or less (some plans have 25% capacity which would mean 7 people maximum). That means decreasing the revenue of the most profitable times by 50% to 75% while also not decreasing the operating costs much. A small cafe has 2-3 staff when it’s crowded so there’s no possibility of reducing staff by 75% when reducing the revenue by 75%.

My Internet cafe would have closed immediately if forced to operate in the proposed social distancing model. It would have been 1/4 of the trade and about 1/8 of the profit at the most profitable times, even if enough customers are prepared to visit – and social distancing would kill the atmosphere. Most small businesses are barely profitable anyway, most small businesses don’t last 4 years in normal economic circumstances.

This reopen movement is about cutting unemployment benefits not about helping small business owners. Destroying small businesses is also good for big corporations, kill the small cafes and restaurants and McDonald’s and Starbucks will win. I think this is part of the motivation behind the astroturf campaign for reopening businesses.

Forbes has an article about this [1].

Psychological Issues

Some people claim that we should reopen businesses to help people who have psychological problems from isolation, to help victims of domestic violence who are trapped at home, to stop older people being unemployed for the rest of their lives, etc.

Here is one article with advice for policy makers from domestic violence experts [2]. One thing it mentions is that the primary US federal government program to deal with family violence had a budget of $130M in 2013. The main thing that should be done about family violence is to make it a priority at all times (not just when it can be a reason for avoiding other issues) and allocate some serious budget to it. An agency that deals with problems that affect families and only has a budget of $1 per family per year isn’t going to be able to do much.

There are ongoing issues of people stuck at home for various reasons. We could work on better public transport to help people who can’t drive. We could work on better healthcare to help some of the people who can’t leave home due to health problems. We could have more budget for carers to help people who can’t leave home without assistance. Wanting to reopen restaurants because some people feel isolated is ignoring the fact that social isolation is a long term ongoing issue for many people, and that many of the people who are affected can’t even afford to eat at a restaurant!

Employment discrimination against people in the 50+ age range is an ongoing thing, many people in that age range know that if they lose their job and can’t immediately find another they will be unemployed for the rest of their lives. Reopening small businesses won’t help that, businesses running at low capacity will have to lay people off and it will probably be the older people. Also the unemployment system doesn’t deal well with part time work. The Australian system (which I think is similar to most systems in this regard) reduces the unemployment benefits by $0.50 for every dollar that is earned in part time work, that effectively puts people who are doing part time work because they can’t get a full-time job in the highest tax bracket! If someone is going to pay for transport to get to work, work a few hours, then get half the money they earned deducted from unemployment benefits it hardly makes it worthwhile to work. While the exact health impacts of Covid19 aren’t well known at this stage it seems very clear that older people are disproportionately affected, so forcing older people to go back to work before there is a vaccine isn’t going to help them.

When it comes to these discussions I think we should be very suspicious of people who raise issues they haven’t previously shown interest in. If the discussion of reopening businesses seems to be someone’s first interest in the issues of mental health, social security, etc then they probably aren’t that concerned about such issues.

I believe that we should have a Universal Basic Income [3]. I believe that we need to provide better mental health care and challenge the gender ideas that hurt men and cause men to hurt women [4]. I believe that we have significant ongoing problems with inequality not small short term issues [5]. I don’t think that any of these issues require specific changes to our approach to preventing the transmission of disease. I also think that we can address multiple issues at the same time, so it is possible for the government to devote more resources to addressing unemployment, family violence, etc while also dealing with a pandemic.

May 03, 2020

Backing up to a GnuBee PC 2

After installing Debian buster on my GnuBee, I set it up for receiving backups from my other computers.

Software setup

I started by configuring it like a typical server, but without a few packages that either take a lot of memory or CPU.

I changed the default hostname:

  • /etc/hostname: foobar
  • /etc/mailname: foobar.example.com
  • /etc/hosts: 127.0.0.1 foobar.example.com foobar localhost

and then installed the avahi-daemon package to be able to reach this box using foobar.local.

I noticed the presence of a world-writable directory and so I tightened the security of some of the default mount points by putting the following in /etc/rc.local:

chmod 755 /etc/network
exit 0

Hardware setup

My OS drive (/dev/sda) is a small SSD so that the GnuBee can run silently when the spinning disks aren't needed. To hold the backup data on the other hand, I got three 4 TB drives which I set up in a RAID-5 array. If the data were valuable, I'd use RAID-6 instead since it can survive two drives failing at the same time, but in this case since it's only holding backups, I'd have to lose the original machine at the same time as two of the 3 drives, a very unlikely scenario.

I created new gpt partition tables on /dev/sdb, /dev/sdc and /dev/sdd, and used fdisk to create a single partition of type 29 (Linux RAID) on each of them.

Then I created the RAID array:

mdadm /dev/md127 --create -n 3 --level=raid5 -a /dev/sdb1 /dev/sdc1 /dev/sdd1

and waited more than 24 hours for that operation to finish. Next, I formatted the array:

mkfs.ext4 -m 0 /dev/md127

and added the following to /etc/fstab:

/dev/md127 /mnt/data/ ext4 noatime,nodiratime 0 2

To reduce unnecessary noise and reduce power consumption, I also installed hdparm:

apt install hdparm

and configured all spinning drives to spin down after being idle for 10 minutes by putting the following in /etc/hdparm.conf:

/dev/sdb {
       spindown_time = 120
}

/dev/sdc {
       spindown_time = 120
}

/dev/sdd {
       spindown_time = 120
}

and then reloaded the configuration:

 /usr/lib/pm-utils/power.d/95hdparm-apm resume

Finally I setup smartmontools by putting the following in /etc/smartd.conf:

/dev/sda -a -o on -S on -s (S/../.././02|L/../../6/03)
/dev/sdb -a -o on -S on -s (S/../.././02|L/../../6/03)
/dev/sdc -a -o on -S on -s (S/../.././02|L/../../6/03)
/dev/sdd -a -o on -S on -s (S/../.././02|L/../../6/03)

and restarting the daemon:

systemctl restart smartd.service

Backup setup

I started by using duplicity since I have been using that tool for many years, but a 190GB backup took around 15 hours on the GnuBee with gigabit ethernet.

After a friend suggested it, I took a look at restic and I have to say that I am impressed. The same backup finished in about half the time.

User and ssh setup

After hardening the ssh setup as I usually do, I created a user account for each machine needing to backup onto the GnuBee:

adduser machine1
adduser machine1 sshuser
adduser machine1 sftponly
chsh machine1 -s /bin/false

and then matching directories under /mnt/data/home/:

mkdir /mnt/data/home/machine1
chown machine1:machine1 /mnt/data/home/machine1
chmod 700 /mnt/data/home/machine1

Then I created a custom ssh key for each machine:

ssh-keygen -f /root/.ssh/foobar_backups -t ed25519

and placed the matching public key in /home/machine1/.ssh/authorized_keys on the GnuBee.

On each machine, I added the following to /root/.ssh/config:

Host foobar.local
    User machine1
    Compression no
    Ciphers aes128-ctr
    IdentityFile /root/backup/foobar_backups
    IdentitiesOnly yes
    ServerAliveInterval 60
    ServerAliveCountMax 240

The reason for setting the ssh cipher and disabling compression is to speed up the ssh connection as much as possible given that the GnuBee has a very small RAM bandwidth.

Another performance-related change I made on the GnuBee was switching to the internal sftp server by putting the following in /etc/ssh/sshd_config:

Subsystem      sftp    internal-sftp

Restic script

After reading through the excellent restic documentation, I wrote the following backup script, based on my old duplicity script, to reuse on all of my computers:

#!/bin/bash

# Configure for each host
PASSWORD="XXXX"  # use `pwgen -s 64` to generate a good random password
BACKUP_HOME="/root/backup"
REMOTE_URL="sftp:foobar.local:"
RETENTION_POLICY="--keep-daily 7 --keep-weekly 4 --keep-monthly 12 --keep-yearly 2"

# Internal variables
SSH_IDENTITY="IdentityFile=$BACKUP_HOME/foobar_backups"
EXCLUDE_FILE="$BACKUP_HOME/exclude"
PKG_FILE="$BACKUP_HOME/dpkg-selections"
PARTITION_FILE="$BACKUP_HOME/partitions"

# If the list of files has been requested, only do that
if [ "$1" = "--list-current-files" ]; then
    RESTIC_PASSWORD=$PASSWORD restic --quiet -r $REMOTE_URL ls latest
    exit 0

# Show list of available snapshots
elif [ "$1" = "--list-snapshots" ]; then
    RESTIC_PASSWORD=$PASSWORD restic --quiet -r $REMOTE_URL snapshots
    exit 0

# Restore the given file
elif [ "$1" = "--file-to-restore" ]; then
    if [ "$2" = "" ]; then
        echo "You must specify a file to restore"
        exit 2
    fi
    RESTORE_DIR="$(mktemp -d ./restored_XXXXXXXX)"
    RESTIC_PASSWORD=$PASSWORD restic --quiet -r $REMOTE_URL restore latest --target "$RESTORE_DIR" --include "$2" || exit 1
    echo "$2 was restored to $RESTORE_DIR"
    exit 0

# Delete old backups
elif [ "$1" = "--prune" ]; then
    # Expire old backups
    RESTIC_PASSWORD=$PASSWORD restic --quiet -r $REMOTE_URL forget $RETENTION_POLICY

    # Delete files which are no longer necessary (slow)
    RESTIC_PASSWORD=$PASSWORD restic --quiet -r $REMOTE_URL prune
    exit 0

# Catch invalid arguments
elif [ "$1" != "" ]; then
    echo "Invalid argument: $1"
    exit 1
fi

# Check the integrity of existing backups
RESTIC_PASSWORD=$PASSWORD restic --quiet -r $REMOTE_URL check || exit 1

# Dump list of Debian packages
dpkg --get-selections > $PKG_FILE

# Dump partition tables from harddrives
/sbin/fdisk -l /dev/sda > $PARTITION_FILE
/sbin/fdisk -l /dev/sdb >> $PARTITION_FILE  # append so the second table doesn't overwrite the first

# Do the actual backup
RESTIC_PASSWORD=$PASSWORD restic --quiet --cleanup-cache -r $REMOTE_URL backup / --exclude-file $EXCLUDE_FILE

I run it with the following cronjob in /etc/cron.d/backups:

30 8 * * *    root  ionice nice nocache /root/backup/backup-machine1-to-foobar
30 2 * * Sun  root  ionice nice nocache /root/backup/backup-machine1-to-foobar --prune

in a way that doesn't impact the rest of the system too much.

Finally, I printed a copy of each of my backup scripts, using enscript, to stash in a safe place:

enscript --highlight=bash --style=emacs --output=- backup-machine1-to-foobar | ps2pdf - > foobar.pdf

This is actually a pretty important step since without the password, you won't be able to decrypt and restore what's on the GnuBee.

May 02, 2020

Audiobooks – April 2020

Cockpit Confidential: Everything You Need to Know About Air Travel: Questions, Answers, and Reflections by Patrick Smith

Lots of “you always wanted to know” & “this is how it really is” bits about commercial flying. Good fun 4/5

The Day of the Jackal by Frederick Forsyth

A very tightly written thriller about a fictional 1963 plot to assassinate French President Charles de Gaulle. Fast moving, detailed and captivating 5/5

Topgun: An American Story by Dan Pedersen

Memoir from the first officer in charge of the US Navy’s Top Gun school. A mix of his life & career, the school and US Navy air history (especially during Vietnam). Excellent 4/5

Radicalized: Four Tales of Our Present Moment
by Cory Doctorow

4 short stories set in more-or-less the present day. They all work fairly well. Worth a read. Spoilers in the link. 3/5

On the Banks of Plum Creek: Little House Series, Book 4 by Laura Ingalls Wilder

The family settle in Minnesota and build a new farm. Various major and minor adventures. I’m struck how few possessions people had back then. 3/5

My Father’s Business: The Small-Town Values That Built Dollar General into a Billion-Dollar Company by Cal Turner Jr.

A mix of personal and company history. I found the early story of the company and personal stuff the most interesting. 3/5

You Can’t Fall Off the Floor: And Other Lessons from a Life in Hollywood by Harris and Nick Katleman

Memoir by a former studio exec and head. Lots of funny and interesting stories from his career, featuring plenty of famous names. 4/5

The Wave: In Pursuit of the Rogues, Freaks and Giants of the Ocean by Susan Casey

75% about Big-wave Tow-Surfers with chapters on Scientists and Shipping industry people mixed in. Competent but author’s heart seemed mostly in the surfing. 3/5


April 27, 2020

Install the COVIDSafe app

I can’t think of a more unequivocal title than that. 🙂

The Australian government doesn’t have a good track record of either launching publicly visible software projects, or respecting privacy, so I’ve naturally been sceptical of the contact tracing app since it was announced. The good news is, while it has some relatively minor problems, it appears to be a solid first version.

Privacy

While the source code is yet to be released, the Android version has already been decompiled, and public analysis is showing that it only collects necessary information, and only uploads contact information to the government servers when you press the button to upload (you should only press that button if you actually get COVID-19, and are asked to upload it by your doctor).

The legislation around the app is also clear that the data you upload can only be accessed by state health officials. Commonwealth departments have no access, and neither do non-health departments (e.g. law enforcement, intelligence).

Technical

It does what it’s supposed to do, and hasn’t been found to open you up to risks by installing it. There are a lot of people digging into it, so I would expect any significant issues to be found, reported, and fixed quite quickly.

Some parts of it are a bit rushed, and the way it scans for contacts could be more battery efficient (that should hopefully be fixed in the coming weeks when Google and Apple release updates that these contact tracing apps can use).

If it produces useful data, however, I’m willing to put up with some quirks. 🙂

Usefulness

I’m obviously not an epidemiologist, but those I’ve seen talk about it say that yes, the data this app produces will be useful for augmenting the existing contact tracing efforts. There were some concerns that it could produce a lot of junk data that wastes time, but I trust the expert contact tracing teams to filter and prioritise the data they get from it.

Install it!

The COVIDSafe site has links to the app in Apple’s App Store, as well as Google’s Play Store. Setting it up takes a few minutes, and then you’re done!

April 26, 2020

YouTube Channels I subscribe to in April 2020

I did a big Twitter thread of the YouTube channels I am following. Below is a copy of the tweets; each gives a quick description of the channel and a link to a sample video.

Lots of pop-Science and TV/Movie analysis channels plus a few on other topics.

I should mention that I watch the majority of YouTube videos at 1.5x speed since people usually speak quite slowly. To speed up a video, click on the settings “cog” and then select “Playback Speed”. YouTube lets you go up to 2x.


Chris Stuckmann reviews movies. During normal times he does a couple per week. Mostly current releases with some old ones. His reviews are low-spoiler although sometimes he’ll do an extra “Spoiler Review”. Usually around 6 minutes long.
Star Wars: The Rise of Skywalker – Movie Review

Wendover Productions does explainer videos. Air & Sea travel are quite common topics. Usually a bit better researched than some of the other channels and a little longer at around 12 minutes. Around 1 video per week.
The Logistics of the US Census

City Beautiful is a channel about cities and City planning. 1-2 videos per month. Usually around 10 minutes. Pitched for the amateur city and planning enthusiast
Where did the rules of the road come from?

PBS Eons does videos about the history of life on Earth. Lots of Dinosaurs, early humans and the like. Run and advised by experts so info is great quality. Links to refs! Accessible but dives into the detail. Around 1 video/week. About 10 minutes each.
How the Egg Came First

Pitch Meetings are a writer pitching a real (usually recent) movie or show to a studio exec. Both are played by Ryan George. Very funny. Part of the Screen Rant channel but I don’t watch their other stuff.
Playlist
Netflix’s Tiger King Pitch Meeting

MrMobile [Michael Fisher] reviews Phones, Laptops, Smart Watches & other tech gadgets. Usually about one video/week. I like the descriptive style and good production values, Not too much spec flooding.
A Stunning Smartwatch With A Familiar Failing – New Moto 360 Review

Verge Science does professional level stories about a range of Science topics. They usually are out in the field with Engineers and scientists.
Why urban coyote sightings are on the rise

Alt Shift X do detailed explainer videos about Books & TV Shows like Game of Thrones, Watchmen & Westworld. Huge amounts of detail and a great style with a wall of pictures. Weekly videos when shows are on plus subscriber extras.
Watchmen Explained (original comic)

The B1M talks about building and construction projects. Many videos are done with cooperation of the architects or building companies so a bit fluffy at times. But good production values and interesting topics.
The World’s Tallest Modular Hotel

CineFix does a variety of movie-related videos. Over the last year they’ve only been putting out about one or two per month, but mostly high quality. A few years ago they were at higher volume and had more throw-aways.
Jojo Rabbit – What’s the Difference?

Marques Brownlee (MKBHD) does tech reviews. Mainly phones but also other gear and the odd special. His videos are extremely high quality and well researched. Averaging 2 videos per week.
Samsung Galaxy S20 Ultra Review: Attack of the Numbers!

How it Should have Ended does cartoons of funny alternative endings for movies. Plus some other long running series. Usually only a few minutes long.
Avengers Endgame Alternate HISHE

Power Play Chess is a Chess channel from Daniel King. He usually covers 1 round/day from major tournaments as well as reviewing older games and other videos.
World Champion tastes the bullet | Firouzja vs Carlsen | Lichess Bullet match 2020

Tom Scott makes explainer videos mostly about science, technology and geography. Often filmed on site rather than being talks over pictures like other channels.
Inside The Billion-Euro Nuclear Reactor That Was Never Switched On

Screen Junkies does stuff about movies. I mostly watch their “Honest Trailers” but they sometimes do ‘Serious Questions” which are good too.
Honest Trailers | Terminator: Dark Fate

Half as Interesting is an offshoot of Wendover Productions (see above). It does shorter 3-5 minute weekly videos on a quick amusing fact or happening (that doesn’t justify a longer video).
United Airlines’ Men-Only Flights

Red Team Review is another movie and TV review channel. I was mostly watching them when Game of Thrones was on and since then they have had a bit less content. They are making some Game of Thrones videos narrated by the TV actors though
Game of Thrones Histories & Lore – The Rains of Castamere

Signum University do online classes about Fantasy (especially Tolkien) and related literature. Their channel features their classes and related videos. I mainly follow “Exploring The Lord of the Rings”. Often sounds better at 2x or 3x speed.
A Wizard of Earthsea: Session 01 – Mageborn

The Nerdwriter does approx monthly videos. Usually about a specific type of art, a painting or film making technique. Very high quality
How Walter Murch Worldized Film Sound

Real Life Lore does infotainment videos. “Answers to questions that you’ve never asked. Mostly over topics like history, geography, economics and science”.
This Was the World’s Most Dangerous Amusement Park

Janice Fung is a Sydney based youtuber who makes videos mostly about food and travel. She puts out 2 videos most weeks.
I Made the Viral Tik Tok Frothy DALGONA COFFEE! (Whipped Coffee Without Mixer!!)

Real Engineering is a bit more technical than the average popsci channel. They especially like doing videos covering flight dynamics, but they cover lots of other topics.
How The Ford Model T Took Over The World

Just Write by Sage Hyden puts out a video roughly once a month. They are essays usually about writing and usually tied into a recent movie or show.
A Disney Monopoly Is A Problem (According To Disney’s Recess)

CGP Grey makes high quality explainer videos. Around one every month. High quality and usually with lots of animation.
The Trouble With Tumbleweed

Lessons from the Screenplay are “videos that analyze movie scripts to examine exactly how and why they are so good at telling their stories”
Casino Royale — How Action Reveals Character

HaxDogma is another TV Show review/analysis channel. I started watching him for his Watchmen Series videos and now watch his Westworld ones.
Official Westworld Trailer Breakdown + 3 Hidden Trailers

Lindsay Ellis does videos mostly about pop culture, usually movies. These days she only does a few a year but they are usually 20+ minutes.
The Hobbit: A Long-Expected Autopsy (Part 1/2)

A bonus couple of recommended courses on Crash Course:
Crash Course Astronomy with Phil Plait
Crash Course Computer Science by Carrie Anne Philbin


April 24, 2020

Disabling mail sending from your domain

I noticed that I was receiving some bounced email notifications for a domain I own (cloud.geek.nz) and use to host my blog. These notifications were all for spam messages spoofing the From address, since I do not use that domain for email.

I decided to try setting a strict DMARC policy to see if DMARC-using mail servers (e.g. GMail) would then drop these spoofed emails without notifying me about it.

I started by setting this initial DMARC policy in DNS in order to monitor the change:

@ TXT v=spf1 -all
_dmarc TXT v=DMARC1; p=none; ruf=mailto:dmarc@fmarier.org; sp=none; aspf=s; fo=0:1:d:s;

Then I waited three weeks without receiving anything before updating the relevant DNS records to this final DMARC policy:

@ TXT v=spf1 -all
_dmarc TXT v=DMARC1; p=reject; sp=reject; aspf=s;

This policy states that nobody is allowed to send emails for this domain and that any incoming email claiming to be from this domain should be silently rejected.
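
Once the DNS changes have propagated, the records can be double-checked from any machine with dig (the zone here being cloud.geek.nz as above):

dig +short TXT cloud.geek.nz
dig +short TXT _dmarc.cloud.geek.nz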

I haven't noticed any bounce notifications for messages spoofing this domain in a while, so maybe it's working?

FreeDV Beacon Maintenance

There’s been some recent interest in the FreeDV Beacon project, originally developed back in 2015. A FreeDV beacon was operating in Sunbury, VK3, for several years and was very useful for testing FreeDV.

After being approached by John (VK3IC) and Bob (VK4YA), I decided to dust off the software and bring it across to a GitHub repo. It’s now running on my laptop happily and I hope John and Bob will soon have some beacons running on the air.

I’ve added support for FreeDV 700C and 700D modes, finding a tricky bug in the process. I really should read the instructions for my own API!

Thanks also to Richard (KF5OIM) for help with the CMake build system.


April 18, 2020

Accessing USB serial devices in Fedora Silverblue

One of the things I do a lot on my Fedora machines is talk to devices via USB serial. While a device is correctly detected at /dev/ttyUSB0 and owned by the dialout group, adding myself to that group doesn’t work because the group can’t be found. This is because under Silverblue, there are two different group files (/usr/lib/group and /etc/group) with different content.

There are some easy ways to solve this, for example we can create the matching dialout group or write a udev rule. Let’s take a look!

On the host with groups

If you try to add yourself to the dialout group it will fail.

sudo gpasswd -a ${USER} dialout
gpasswd: group 'dialout' does not exist in /etc/group

Trying to re-create the group will also fail as it’s already in use.

sudo groupadd dialout -r -g 18
groupadd: GID '18' already exists

So instead, we can simply grab the entry from the OS group file and add it to /etc/group ourselves.

grep ^dialout: /usr/lib/group |sudo tee -a /etc/group

Now we are able to add ourselves to the dialout group!

sudo gpasswd -a ${USER} dialout

Activate that group in our current shell.

newgrp dialout

And now we can use a tool like screen to talk to the device (note that you will need to have installed screen with rpm-ostree and rebooted first).

screen /dev/ttyUSB0 115200

And that’s it. We can now talk to USB serial devices on the host.

Inside a container with udev

Inside a container is a little more tricky as the dialout group is not passed into it. Thus, inside the container the device is owned by nobody and the user will have no permissions to read or write to it.

One way to deal with this and still use the regular toolbox command is to create a udev rule and make yourself the owner of the device on the host, instead of root.

To do this, we create a generic udev rule for all usb-serial devices.

cat << EOF | sudo tee /etc/udev/rules.d/50-usb-serial.rules
SUBSYSTEM=="tty", SUBSYSTEMS=="usb-serial", OWNER="${USER}"
EOF

If you need to create a more specific rule, you can find other bits to match by (like kernel driver, etc) with the udevadm command.

udevadm info -a -n /dev/ttyUSB0
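
For example, a more targeted rule for a single adapter might match on the USB vendor and product IDs reported by udevadm (the IDs below are placeholders; substitute the ones for your device):

cat << EOF | sudo tee /etc/udev/rules.d/50-usb-serial.rules
SUBSYSTEM=="tty", ATTRS{idVendor}=="0403", ATTRS{idProduct}=="6001", OWNER="${USER}"
EOF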

Once you have your rule, reload udev.

sudo udevadm control --reload-rules
sudo udevadm trigger

Now, unplug your serial device and plug it back in. You should notice that it is now owned by your user.

ls -l /dev/ttyUSB0
crw-rw----. 1 csmart dialout 188, 0 Apr 18 20:53 /dev/ttyUSB0

It should also be the same inside the toolbox container now.

[21:03 csmart ~]$ toolbox enter
⬢[csmart@toolbox ~]$ ls -l /dev/ttyUSB0 
crw-rw----. 1 csmart nobody 188, 0 Apr 18 20:53 /dev/ttyUSB0

And of course, as this is inside a container, you can just dnf install screen or whatever other program you need.

Of course, if you’re happy to create the udev rule then you don’t need to worry about the groups solution on the host.

Making dnf on Fedora Silverblue a little easier with bash aliases

Fedora Silverblue doesn’t come with dnf because it’s an immutable operating system and uses a special tool called rpm-ostree to layer packages on top instead.

Most terminal work is designed to be done in containers with toolbox, but I still do a bunch of work outside of a container. Searching for packages to install with rpm-ostree still requires dnf inside a container, as it does not have that function.

I add these two aliases to my ~/.bashrc file so that using dnf to search or install into the default container is possible from a regular terminal. This just makes Silverblue a little bit more like what I’m used to with regular Fedora.

cat >> ~/.bashrc << EOF
alias sudo="sudo "
alias dnf="bash -c '#skip_sudo'; toolbox -y create 2>/dev/null; toolbox run sudo dnf"
EOF

If the default container doesn’t exist, toolbox creates it. Note that the alias for sudo has a space at the end. This tells bash to also check the next command word for alias expansion, which is what makes sudo work with aliases. Thus, we can make sure that both dnf and sudo dnf will work. The first part of the dnf alias is used to skip the sudo command so the rest is run as the regular user, which makes them both work the same.

We need to source that file or run a new bash session to pick up the aliases.

bash

Now we can just use the dnf command like normal. Search can be used to find packages to install with rpm-ostree, while installing packages will go into the default toolbox container (both with and without sudo work the same).

sudo dnf search vim
dnf install -y vim
The container is automatically created with dnf

To run vim from the example, enter the container and it will be there.

Vim in a container

You can do whatever you normally do with dnf, like installing RPMs such as RPMFusion and listing repos.

Installing RPMFusion RPMs into container
Listing repositories in the container

Anyway, just a little thing but it’s kind of helpful to me.

April 16, 2020

Crisis Proofing the Australian Economy

An Open Letter to Prime Minister Scott Morrison

To The Hon Scott Morrison MP, Prime Minister,

No doubt how to re-invigorate our economy is high on your mind, among other priorities in this time of crisis.

As you're acutely aware, the pandemic we're experiencing has accelerated a long-term high unemployment trajectory we were already on due to industry retraction, automation, off-shoring jobs etc.

Now is the right time to enact changes that will bring long-term crisis resilience, economic stability and prosperity to this nation.

  1. Introduce a 1% tax on all financial / stock / commodity market transactions.
  2. Use 100% of that to fund a Universal Basic Income for all adult Australian citizens.

Funding a Universal Basic Income will bring:

  • Economic resilience in times of emergency (bushfire, drought, pandemic)
  • Removal of the need for government financial aid in those emergencies
  • Removal of all forms of pension and unemployment benefits
  • A more predictable, reduced and balanced government budget
  • Dignity and autonomy to those impacted by economic events / crises
  • Space and security for the innovative amongst us to take entrepreneurial risks
  • A growth in social, artistic and economic activity that could not happen otherwise

This is both simple to collect and simple to distribute to all taxpayers. It can be done both swiftly and sensibly, enabling you to remove the Job Keeper band-aid and its related budgetary problems.

This is an opportunity to be seized, Mr Morrison.

There is also a second opportunity.

Post World War II, we had the Snowy River scheme. Today we have the housing affordability crisis and many Australians will never own their own home. A public building programme to provide 25% of housing will create a permanent employment and building boom and resolve the housing affordability crisis over time.

If you cap repayments for those in public housing to 25% of their income, there will also be more disposable income circulating through the economy, creating prosperous times for all Australians.

Carpe diem, Mr Morrison.

Recognise the opportunity. Seize it.


Dear Readers,

If you support either or both of these ideas, please contact the Prime Minister directly and add your voice.

April 14, 2020

Exporting volumes from Cinder and re-creating COW layers


Today I wandered into a bit of a rat hole discovering how to export data from OpenStack Cinder volumes when you don’t have admin permissions, and I thought it was worth documenting here so I remember it for next time.

Let’s assume that you have a Cinder volume named “child1”, which is a 64gb volume originally cloned from “parent1”. parent1 is a 7.9gb VMDK, but the only way I can find to extract child1 is to convert it to a glance image and then download the entire volume as a raw. Something like this:

$ cinder upload-to-image $child1 "extract:$child1"

Where $child1 is the UUID of the Cinder volume. You then need to find the UUID of the image in Glance, which the Cinder upload-to-image command will have told you, but you can also find by searching Glance for your image named “extract:$child1”:

$ glance image-list | grep "extract:$child1"

You now need to watch that Glance image until the status of the image is “active”. It will go through a series of steps with names like “queued”, and “uploading” first.
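
Rather than re-running image-list by hand, you can poll the image status with something like this (a hypothetical 30 second interval, using the Glance UUID from the previous step):

$ watch -n 30 "glance image-show $glance_uuid | grep status"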

Now you can download the image from Glance:

$ glance image-download --file images/$child1.raw --progress $glance_uuid

And then delete the intermediate glance image:

$ glance image-delete $glance_uuid

I have a bad sample script which does this in my junk code repository if that is helpful.

What you have at the end of this is a 64gb raw disk file in my example. You can convert that file to qcow2 like this:

$ qemu-img convert -O qcow2 $child1.raw $child1.qcow2

But you’re left with a 64gb qcow2 file for your troubles. I experimented with virt-sparsify to reduce the size of this image, but it didn’t work in my case (no space was saved); I suspect this is because the disk image has multiple partitions, since it originally came from a VMWare environment.

Luckily qemu-img can also re-create the COW layer that existed on the admin-only side of the public cloud barrier. You do this by rebasing the converted qcow2 file onto the original VMDK file like this:

$ qemu-img create -f qcow2 -b $parent1.qcow2 $child1.delta.qcow2
$ qemu-img rebase -b $parent1.vmdk $child1.delta.qcow2

In my case I ended up with a 289mb $child1.delta.qcow2 file, which isn’t too shabby. It took about five minutes to produce that delta on my Google Cloud instance from a 7.9gb backing file and a 64gb upper layer.


April 11, 2020

Using Gogo WiFi on Linux

Gogo, the WiFi provider for airlines like Air Canada, is not available to Linux users even though it advertises "access using any Wi-Fi enabled laptop, tablet or smartphone". It is however possible to work-around this restriction by faking your browser user agent.

I tried the User-Agent Switcher for Chrome extension on Chrome and Brave but it didn't work for some reason.

What did work was using Firefox and adding the following prefs in about:config to spoof its user agent to Chrome for Windows:

general.useragent.override=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36
general.useragent.updates.enabled=false
privacy.resistFingerprinting=false

The last two prefs are necessary in order for the hidden general.useragent.override pref to not be ignored.

Opt out of mandatory arbitration

As an aside, the Gogo terms of service automatically enroll you into mandatory arbitration unless you opt out by sending an email to customercare@gogoair.com within 30 days of using their service.

You may want to create an email template for this so that you can fire off a quick email to them as soon as you connect. I will probably write a script for it next time I use this service.

Fedora Silverblue is an amazing immutable desktop

I recently switched my regular Fedora 31 workstation over to the 31 Silverblue release. I’ve played with Project Atomic before and have been meaning to try it out more seriously for a while, but never had the time. Silverblue provided the catalyst to do that.

What this brings to the table is quite amazing and seriously impressive. The base OS is immutable and everyone’s install is identical. This means quality can be improved as there are fewer combinations and it’s easier to test. Upgrades to the next major version of Fedora are fast and secure. Instead of updating thousands of RPMs in-place, the new image is downloaded and the system reboots into it. As the underlying images don’t change, it also offers full rollback support.

This is similar to how platforms like Chrome OS and Android work, but thanks to ostree it’s now available for Linux desktops! That is pretty neat.

It doesn’t come with a standard package manager like dnf. Instead, any packages or changes you need to perform on the base OS are done using rpm-ostree command, which actually layers them on top.

And while technically you can install anything using rpm-ostree, ideally this should be avoided as much as possible (some low level apps like shells and libvirt may require it, though). Flatpak apps and containers are the standard way to consume packages. As these are kept separate from the base OS, it also helps improve stability and reliability.

Installing Silverblue

I copied the Silverblue installer to a USB stick and booted it to do the install. As my Dell XPS has an NVIDIA card, I modified the installer’s kernel args and disabled the nouveau driver with the usual nouveau.modeset=0 to get the install GUI to show up.

I’m also running in UEFI mode and due to a bug you have to use a separate, dedicated /boot/efi partition for Silverblue (personally, I think that’s a good thing to do anyway). Otherwise, the install looks pretty much the same as regular Fedora and went smoothly.

Once installed, I blacklisted the nouveau driver and rebooted. To make these kernel arguments permanent, we don’t use grub2; instead we set the kernel args with rpm-ostree.

rpm-ostree kargs --append=modprobe.blacklist=nouveau --append=rd.driver.blacklist=nouveau

The NVIDIA drivers from RPMFusion are supported, so following this I had to add the repositories and drivers as RPMs on the base image.

rpm-ostree install \
https://download1.rpmfusion.org/free/fedora/rpmfusion-free-release-31.noarch.rpm \
https://download1.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-31.noarch.rpm
systemctl reboot

Once rebooted I then installed the necessary packages and rebooted again to activate them.

rpm-ostree install akmod-nvidia xorg-x11-drv-nvidia-cuda libva-utils libva-vdpau-driver gstreamer1-libav
rpm-ostree kargs --append=nvidia-drm.modeset=1
systemctl reboot

That was the base setup complete, which all went pretty smoothly. What you’re left with is the base OS with GNOME and a few core apps.

GNOME in Silverblue

Working with Silverblue

Using Silverblue is a different way of working than I have been used to. As mentioned above, there is no dnf command and packages are layered on top of the base OS with the rpm-ostree command. Because this is a layer, installing a new RPM requires a reboot to activate it, which is quite painful when you’re in the middle of some work and realise you need a program.

The answer though, is to use more containers instead of RPMs as I’m used to.

Containers

As I wrote about in an earlier blog post, toolbox is a wrapper for setting up containers and complements Silverblue wonderfully. If you need to install any terminal apps, give this a shot. Creating and running a container is as simple as this.

toolbox create
toolbox enter
Container on Fedora Silverblue

Once inside your container use it like a normal Fedora machine (dnf is available!).

As rpm-ostree has no search function, using a container is the expected way to do this. Having created the container above, you can now use it (without entering it first) to perform package searches.

toolbox run dnf search vim

Apps

Graphical apps are managed with Flatpak, the new way to deliver secure, isolated programs on Linux. Silverblue is configured to use Fedora apps out of the box, and you can also add Flathub as a third party repo.

I experienced some small glitches with the Software GUI program when applying updates, but I don’t normally use it so I’m not sure if it’s just beta issues or not. As the default install is more sparse than usual, you’ll find yourself needing to install the apps you use. I really like this approach, it keeps the base system smaller and cleaner.

While Fedora provides their own Firefox package in Flatpak format (which is great), Mozilla also just recently started publishing their official package to Flathub. So, to install that, we simply add Flathub as a repository and install away!

flatpak remote-add flathub https://flathub.org/repo/flathub.flatpakrepo
flatpak update
flatpak install org.mozilla.firefox

After install, Firefox should appear as a regular app inside GNOME.

Official Firefox from Mozilla via Flatpak

If you need to revert to an earlier version of a Flatpak (which I did when I was testing out Firefox beta), you can fetch the remote log for the app, then update to a specific commit.

flatpak remote-info --log flathub-beta org.mozilla.firefox//beta
flatpak update \
--commit 908489d0a77aaa8f03ca8699b489975b4b75d4470ce9bac92e56c7d089a4a869 \
org.mozilla.firefox//beta

Replacing system packages

If you have installed a Flatpak, like Firefox, and no-longer want to use the RPM version included in the base OS, you can use rpm-ostree to override it.

rpm-ostree override remove firefox

After a reboot, you will only see your Flatpak version.

Upgrades

I upgraded from 31 to the 32 beta, which was very fast by comparison to regular Fedora (because it just needs to download the new base image) and pretty seamless.

The only hiccup I had was needing to remove RPMFusion 31 release RPMs first, upgrade the base to 32, then install the RPMFusion 32 release RPMs. After that, I did an update for good measure.

rpm-ostree uninstall rpmfusion-nonfree-release rpmfusion-free-release
rpm-ostree rebase fedora:fedora/32/x86_64/silverblue
rpm-ostree install \
https://download1.rpmfusion.org/free/fedora/rpmfusion-free-release-32.noarch.rpm \
https://download1.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-32.noarch.rpm
systemctl reboot

Then post reboot, I did a manual update of the system.

rpm-ostree upgrade

You can see the current status of your system with the rpm-ostree command.

rpm-ostree status 

On my system you can see the ostree I’m using, the commit as well as both layered and local packages.

State: idle
AutomaticUpdates: disabled
Deployments:
● ostree://fedora:fedora/32/x86_64/silverblue
                   Version: 32.20200410.n.0 (2020-04-10T08:35:30Z)
                BaseCommit: d809af7c4f170a2175ffa1374827dd55e923209aec4a7fb4dfc7b87cd6c110c9
              GPGSignature: Valid signature by 97A1AE57C3A2372CCA3A4ABA6C13026D12C944D0
           LayeredPackages: akmod-nvidia git gstreamer1-libav ipmitool libva-utils libva-vdpau-driver libvirt
                            pass powertop screen tcpdump tmux vim virt-manager xorg-x11-drv-nvidia-cuda
             LocalPackages: rpmfusion-free-release-32-0.3.noarch rpmfusion-nonfree-release-32-0.4.noarch

  ostree://fedora:fedora/32/x86_64/silverblue
                   Version: 32.20200410.n.0 (2020-04-10T08:35:30Z)
                BaseCommit: d809af7c4f170a2175ffa1374827dd55e923209aec4a7fb4dfc7b87cd6c110c9
              GPGSignature: Valid signature by 97A1AE57C3A2372CCA3A4ABA6C13026D12C944D0
           LayeredPackages: akmod-nvidia git gstreamer1-libav ipmitool libva-utils libva-vdpau-driver libvirt
                            pass powertop screen tcpdump tmux vim virt-manager xorg-x11-drv-nvidia-cuda
             LocalPackages: rpmfusion-free-release-32-0.3.noarch rpmfusion-nonfree-release-32-0.4.noarch

To revert to the previous version temporarily, simply select it from the grub boot menu and you’ll go back in time. If you want to make this permanent, you can rollback to the previous state instead and then just reboot.

rpm-ostree rollback

Silverblue is really impressive and works well. I will continue to use it as my daily driver and see how it goes over time.

Tips

I have run into a couple of issues, mostly around using the Software GUI (which I don’t normally use). Mostly these were things like it listing updates for Flatpaks which were not actually there for update, and when you tried to update it didn’t do anything.

If you hit issues, you can try clearing out the Software data and loading the program again.

pkill gnome-software
rm -rf ~/.cache/gnome-software

If you need to, you can also clean out and refresh the rpm-ostree cache and do an update.

rpm-ostree cleanup -m
rpm-ostree update

To repair and update Flatpaks, if you need to.

flatpak repair
flatpak update

Also see

Making dnf on the host terminal a little easier with aliases.

Accessing USB serial devices on the host and in a toolbox container.

A temporary return to Australia due to COVID-19

The last few months have been a rollercoaster, and we’ve just had to make another big decision that we thought we’d share.

TL;DR: we returned to Australia last night, hopeful to get back to Canada when we can. Currently in Sydney quarantine and doing fine.

UPDATE: please note that this isn’t at all a poor reflection on Canada. To the contrary, we have loved even the brief time we’ve had there, the wonderful hospitality and kindness shown by everyone, and the excellent public services there.

We moved to Ottawa, Canada at the end of February, for an incredible job opportunity with Service Canada which also presented a great life opportunity for the family. We enjoyed 2 “normal” weeks of settling in, with the first week dedicated to getting set up, and the second week spent establishing a work / school routine – me in the office, little A in school and T looking at work opportunities and running the household.

Then, almost overnight, everything went into COVID lock down. Businesses and schools closed. Community groups stopped meeting. People are being affected by this every day, so we have been very lucky to be largely fine and in good health, and we thought we could ride it out safely staying in Ottawa, even if we hadn’t quite had the opportunity to establish ourselves.

But then a few things happened which changed our minds – at least for now.

Firstly, with the schools shut down before A had really had a chance to make friends (she only attended for 5 days before the school shut down), she was left feeling very isolated. The school is trying to stay connected with its students by providing a half hour video class each day, with a half hour activity in the afternoons, but it’s no way to help her to make new friends. A has only gotten to know the kids of one family in Ottawa, who are also in isolation but have been amazingly supportive (thanks Julie and family!), so we had to rely heavily on video playdates with cousins and friends in Australia, for which the timezone difference only allows a very narrow window of opportunity each day. With every passing day, the estimated school closures have gone from weeks, to months, to very likely the rest of the school year (with the new school year commencing in September). If she’d had just another week or two, she would have likely found a friend, so that was a pity. It’s also affected the availability of summer camps for kids, which we were relying on to help us with A through the 2 month summer holiday period (July & August).

Secondly, we checked our health cover and luckily the travel insurance we bought covered COVID conditions, but we were keen to get full public health cover. Usually for new arrivals there is a 3 month waiting period before this can be applied for. However, in response to the COVID threat the Ontario Government recently waived that waiting period for public health insurance, so we rushed to register. Unfortunately, the one service office that is able to process applications from non-Canadian citizens had closed by that stage due to COVID, with no re-opening being contemplated. We were informed that there is currently no alternative ability for non-citizens to apply online or over the phone.

Thirdly, the Australian Government has strongly encouraged all Australian citizens to return home, warning of the closing window for international travel. We became concerned we wouldn’t have full consulate support if something went wrong overseas. A good travel agent friend of ours told us the industry is preparing for a minimum of 6 months of international travel restrictions, which raised the very real issue that if anything went wrong for us, then neither could we get home, nor family come to us. And, as we can now all appreciate, it’s probable that international travel disruptions and prohibitions will endure for much longer than 6 months.

Finally, we had a real scare. For context, we signed a lease for an apartment in a lovely part of central Ottawa, but we weren’t able to move in until early April, so we had to spend 5 weeks living in a hotel room. We did move into our new place just last Sunday and it was glorious to finally have a place, and for little A to finally have her own room, which she adored. Huge thanks to those who generously helped us make that move! The apartment is only 2 blocks away from A’s new school, which is incredibly convenient for us – it will be particularly good during the worst of Ottawa’s winter. But little A, who is now a very active and adventurous 4 year old, managed to face plant off her scooter (trying to bunnyhop down a stair!) and knocked out a front tooth, on only the second day in the new place! She is ok, but we were all very, very lucky that it was a clean accident with the tooth coming out whole and no other significant damage. But we struggled to get any non-emergency medical support.

The Ottawa emergency dental service was directing us to a number that didn’t work. The phone health service was so busy that we were told we couldn’t even speak to a nurse for 24 hours. We could have called emergency services and gone to a hospital, which was comforting, but several Ottawa hospitals reported COVID outbreaks just that day, so we were nervous to do so. We ended up getting medical support from the dentist friend of a friend over text, but that was purely by chance. It was quite a wake up call as to the questions of what we would have done if it had been a really serious injury. We just don’t know the Ontario health system well enough, can’t get on the public system, and the pressure of escalating COVID cases clearly makes it all more complicated than usual.

If we’d had another month or two to establish ourselves, we think we might have been fine, and we know several ex-pats who are fine. But for us, with everything above, we felt too vulnerable to stay in Canada right now. If it was just Thomas and I it’d be a different matter.

So, we have left Ottawa and returned to Australia, with full intent to return to Canada when we can. As I write this, we are on day 2 of the 14 day mandatory isolation in Sydney. We were apprehensive about arriving in Sydney, knowing that we’d be put into mandatory quarantine, but the processing and screening of arrivals was done really well, professionally and with compassion. A special thank you to all the Sydney airport and Qatar Airways staff, immigration and medical officers, NSW Police, army soldiers and hotel staff who were all involved in the process. Each one acted with incredible professionalism and are a credit to their respective agencies. They’re also exposing themselves to the risk of COVID in order to help others. Amazing and brave people. A special thank you to Emma Rowan-Kelly who managed to find us these flights back amidst everything shutting down globally.

I will continue working remotely for Service Canada, on the redesign and implementation of a modern digital channel for government services. Every one of my team is working remotely now anyway, so this won’t be a significant issue apart from the timezone. I’ll essentially be a shift worker for this period. Our families are all self-isolating, to protect the grandparents and great-grandparents, so the Andrews family will be self-isolating in a location still to be confirmed. We will be traveling directly there once we are released from quarantine, but we’ll be contactable via email, fb, whatsapp, video, etc.

We are still committed to spending a few years in Canada, working, exploring and experiencing Canadian cultures, and will keep the place in Ottawa with the hope we can return there in the coming 6 months or so. We are very, very thankful for all the support we have had from work, colleagues, little A’s school, new friends there, as well as that of friends and family back in Australia.

Thank you all – and stay safe. This is a difficult time for everyone, and we all need to do our part and look after each other best we can.

Easy containers on Fedora with toolbox

The toolbox program is a wrapper for setting up containers on Fedora. It’s not doing anything you can’t do yourself with podman, but it does make using and managing containers more simple and easy to do. It comes by default on Silverblue where it’s aimed for use with terminal apps and dev work, but you can try it on a regular Fedora workstation.

sudo dnf install toolbox

Creating containers

You can create just one container if you want, which will be called something like fedora-toolbox-32, or you can create separate containers for different things. Up to you. As an example, let’s create a container called testing-f32.

toolbox create --container testing-f32

By default toolbox uses the Fedora registry and creates a container which is the same version as your host. However you can specify a different version if you need to, for example if you needed a Fedora 30 container.

toolbox create --release f30 --container testing-f30

These containers are not yet running, they’ve just been created for you.

View your containers

You can see your containers with the list option.

toolbox list

This will show you both the images in your cache and the containers in a nice format.

IMAGE ID      IMAGE NAME                                        CREATED
c49513deb616  registry.fedoraproject.org/f30/fedora-toolbox:30  5 weeks ago
f7cf4b593fc1  registry.fedoraproject.org/f32/fedora-toolbox:32  4 weeks ago

CONTAINER ID  CONTAINER NAME  CREATED        STATUS   IMAGE NAME
b468de87277b  testing-f30     5 minutes ago  Created  registry.fedoraproject.org/f30/fedora-toolbox:30
1597ab1a00a5  testing-f32     5 minutes ago  Created  registry.fedoraproject.org/f32/fedora-toolbox:32

As toolbox is a wrapper, you can also see this information with podman, but with two commands; one for images and one for containers. Notice that with podman you can also see that these containers are not actually running (that’s the next step).

podman images ; podman ps -a
registry.fedoraproject.org/f32/fedora-toolbox   32       f7cf4b593fc1   4 weeks ago    360 MB
registry.fedoraproject.org/f30/fedora-toolbox   30       c49513deb616   5 weeks ago    404 MB

CONTAINER ID  IMAGE                                             COMMAND               CREATED             STATUS   PORTS  NAMES
b468de87277b  registry.fedoraproject.org/f30/fedora-toolbox:30  toolbox --verbose...  About a minute ago  Created         testing-f30
1597ab1a00a5  registry.fedoraproject.org/f32/fedora-toolbox:32  toolbox --verbose...  About a minute ago  Created         testing-f32

You can also use podman to inspect the containers and appreciate all the extra things toolbox is doing for you.

podman inspect testing-f32

Entering a container

Once you have a container created, to use it you just enter it with toolbox.

toolbox enter --container testing-f32

Now you are inside your container which is separate from your host, but it generally looks the same. A number of bind mounts were created automatically for you and you’re still in your home directory. It is important to note that all containers you run with toolbox will share your home directory! Thus it won’t isolate different versions of the same software, for example, you would still need to create separate virtual environments for Python.

Any new shells or tabs you create in your terminal app will also be inside that container. Note the PS1 variable has changed to have a pink shape at the front (from /etc/profile.d/toolbox.sh).

Inside a container with toolbox

Note that you could also start and enter the container with podman.

podman start testing-f30
podman exec -it -u ${EUID} -w ${HOME} testing-f30 /usr/bin/bash

Hopefully you can see how toolbox makes using containers easier!

Exiting a container

To get out of the container, just exit the shell and you’ll be back to your previous session on the host. The container will still exist and can be entered again, it is not deleted unless you delete it.

Removing a container

To remove a container, simply run toolbox with the rm option. Note that this still keeps the images around, it just deletes the instance of that image that’s running as that container.

toolbox rm -f testing-f32

Again, you can also delete this using podman.
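
For example, the rough podman equivalent would be something like this, where the second command also removes the cached image (which toolbox rm does not do):

podman rm -f testing-f32
podman rmi registry.fedoraproject.org/f32/fedora-toolbox:32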

Using containers

Once inside a container you can basically (mostly) treat your container system as a regular Fedora host. You can install any apps you want, such as terminal apps like screenfetch and even graphical programs like gedit (which work from inside the container).

sudo dnf install screenfetch gedit
screenfetch is always a favourite

For any programs that require RPMFusion, like ffmpeg, you first need to set up the repos as you would on a regular Fedora system.

sudo dnf install \
https://download1.rpmfusion.org/free/fedora/rpmfusion-free-release-$(rpm -E %fedora).noarch.rpm \
https://download1.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-$(rpm -E %fedora).noarch.rpm
sudo dnf install ffmpeg

These programs like screenfetch and ffmpeg are available inside your container, but not outside your container. They are isolated. To run them in the future you would enter the container and run the program.

Instead of entering and then running the program, you can also just use the run command. Here you can see screenfetch is not on my host, but I can run it in the container.
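
Assuming the testing-f32 container from earlier has screenfetch installed, that looks something like this:

toolbox run --container testing-f32 screenfetch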

Those are pretty simple (silly?) examples, but hopefully it demonstrates the value of toolbox. It’s probably more useful for dev work where you can separate and manage different versions of various platforms, but it does make it really easy to quickly spin something up outside of your host system.

April 06, 2020

The Calculating Stars


Winner of a Hugo, a Locus and a Nebula, this book is about a mathematical prodigy battling her way into a career as an astronaut in a post-apocalyptic 1950s America. Along the way she has to take on the embedded sexism of America in the 50s, as well as her own mild racism. Worse, she suffers from an anxiety condition.

The book is engaging and well written, with an alternative history plot line which is believable and interesting. In fact, it’s quite topical for our current time.

I really enjoyed this book and I will definitely be reading the sequel.

The Calculating Stars
Mary Robinette Kowal
May 16, 2019
432 pages

The Right Stuff meets Hidden Figures by way of The Martian. A world in crisis, the birth of space flight and a heroine for her time and ours; the acclaimed first novel in the Lady Astronaut series has something for everyone., On a cold spring night in 1952, a huge meteorite fell to earth and obliterated much of the east coast of the United States, including Washington D.C. The ensuing climate cataclysm will soon render the earth inhospitable for humanity, as the last such meteorite did for the dinosaurs. This looming threat calls for a radically accelerated effort to colonize space, and requires a much larger share of humanity to take part in the process. Elma York's experience as a WASP pilot and mathematician earns her a place in the International Aerospace Coalition's attempts to put man on the moon, as a calculator. But with so many skilled and experienced women pilots and scientists involved with the program, it doesn't take long before Elma begins to wonder why they can't go into space, too. Elma's drive to become the first Lady Astronaut is so strong that even the most dearly held conventions of society may not stand a chance against her.


April 05, 2020

Custom WiFi enabled nightlight with ESPHome and Home Assistant

I built this custom night light for my kids as a fun little project. It’s pretty easy so thought someone else might be inspired to do something similar.

Custom WiFi connected nightlight

Hardware

The core hardware is just an ESP8266 module and an Adafruit NeoPixel Ring. I also bought a 240V bunker light and took the guts out to use as the housing, as it looked nice and had a diffuser (you could pick anything that you like).

Removing existing components from bunker light

While the data pin of the NeoPixel Ring can pretty much connect to any GPIO pin on the ESP, bitbanging can cause flickering. It’s better to use pins 1, 2 or 3 on an ESP8266 where we can use other methods to talk to the device.

These methods are exposed in ESPHome’s support for NeoPixel.

  • ESP8266_DMA (default for ESP8266, only on pin GPIO3)
  • ESP8266_UART0 (only on pin GPIO1)
  • ESP8266_UART1 (only on pin GPIO2)
  • ESP8266_ASYNC_UART0 (only on pin GPIO1)
  • ESP8266_ASYNC_UART1 (only on pin GPIO2)
  • ESP32_I2S_0 (ESP32 only)
  • ESP32_I2S_1 (default for ESP32)
  • BIT_BANG (can flicker a bit)

I chose GPIO2 and use ESP8266_UART1 method in the code below.

So, first things first, solder up some wires to 5V, GND and GPIO pin 2 on the ESP module. These connect to the 5V, GND and data pins on the NeoPixel Ring respectively.

It’s not very neat, but I used a hot glue gun to stick the ESP module into the bottom part of the bunker light, and fed the USB cable through for power and data.

I hot-glued the NeoPixel Ring in-place on the inside of the bunker light, in the centre, shining outwards towards the diffuser.

The bottom can then go back on and screws hold it in place. I used a hacksaw to create a little slot for the USB cable to sit in and then added hot-glue blobs for feet. All closed up, it looks like this underneath.

Looks a bit more professional from the top.

Code using ESPHome

I flashed the ESP8266 using ESPHome (see my earlier blog post) with this simple YAML config.

esphome:
  name: nightlight
  build_path: ./builds/nightlight
  platform: ESP8266
  board: huzzah
  esp8266_restore_from_flash: true

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

# Enable logging
logger:

# Enable Home Assistant API
api:
  password: !secret api_password

# Enable over the air updates
ota:
  password: !secret ota_password

mqtt:
  broker: !secret mqtt_broker
  username: !secret mqtt_username
  password: !secret mqtt_password
  port: !secret mqtt_port

light:
  - platform: neopixelbus
    pin: GPIO2
    method: ESP8266_UART1
    num_leds: 16
    type: GRBW
    name: "Nightlight"
    effects:
      # Customize parameters
      - random:
          name: "Slow Random"
          transition_length: 30s
          update_interval: 30s
      - random:
          name: "Fast Random"
          transition_length: 4s
          update_interval: 5s
      - addressable_rainbow:
          name: Rainbow
          speed: 10
          width: 50
      - addressable_twinkle:
          name: Twinkle Effect
          twinkle_probability: 5%
          progress_interval: 4ms
      - addressable_random_twinkle:
          name: Random Twinkle
          twinkle_probability: 5%
          progress_interval: 32ms
      - addressable_fireworks:
          name: Fireworks
          update_interval: 32ms
          spark_probability: 10%
          use_random_color: false
          fade_out_rate: 120
      - addressable_flicker:
          name: Flicker

The esp8266_restore_from_flash option is useful because if the light is on and someone accidentally turns it off, it will go back to the same state when it is turned back on. It does wear the flash out more quickly, however.

The important settings are the light component with the neopixelbus platform, which is where all the magic happens. We specify which GPIO on the ESP the data line on the NeoPixel Ring is connected to (pin 2 in my case). The method we use needs to match the pin (as discussed above) and in this example is ESP8266_UART1.

The number of LEDs must match the actual number on the NeoPixel Ring, in my case 16. This is used when talking to the on-chip LED driver and calculating effects, etc.

Similarly, the LED type is important as it determines which order the colours are in (swap around if colours don’t match). This must match the actual type of NeoPixel Ring, in my case I’m using an RGBW model which has a separate white LED and is in the order GRBW.

Finally, you get all sorts of effects for free, you just need to list the ones you want and any options for them. These show up in Home Assistant under the advanced view of the light (screenshot below).

Now it’s a matter of plugging the ESP module in and flashing it with esphome.

esphome nightlight.yaml run

Home Assistant

After a reboot, the device should automatically show up in Home Assistant under Configuration -> Devices. From here you can add it to the Lovelace dashboard and make Automations or Scripts for the device.

Nightlight in Home Assistant with automations

Adding it to Lovelace dashboard looks something like this, which lets you easily turn the light on and off and set the brightness.

You can also get advanced settings for the light, where you can change brightness, colours and apply effects.

Nightlight options

Effects

One of the great things about using ESPHome is all the effects which are defined in the YAML file. To apply an effect, choose it from the advanced device view in Home Assistant (as per screenshot above).

This is what rainbow looks like.

Nightlight running Rainbow effect

The kids love to select the colours and effects they want!

Automation

So, once you have the nightlight showing up in Home Assistant, we can create a simple automation to turn it on at sunset and off at sunrise.

Go to Configuration -> Automation and add a new one. You can fill in any name you like and there’s an Execute button there when you want to test it.

The trigger uses the Sun module and runs 10 minutes before sunset.

I don’t use Conditions, but you could. For example, only do this when someone’s at home.

The Actions are set to call the homeassistant.turn_on function and specifies the device(s). Note this takes a comma separated list, so if you have more than one nightlight you can do it with the one automation rule.
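
If you want to test the same service call outside the UI, Home Assistant's REST API can trigger it directly; something like this, where the long-lived access token, hostname and entity id are placeholders for your own setup:

curl -X POST \
  -H "Authorization: Bearer YOUR_LONG_LIVED_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"entity_id": "light.nightlight"}' \
  http://homeassistant.local:8123/api/services/homeassistant/turn_on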

That’s it! You can create another one for sunrise, but instead of calling homeassistant.turn_on just call homeassistant.turn_off and use Sunrise instead of Sunset.

Infinite complacency

First paragraph of War of the Worlds (image attachment, 1.84 MB)

April 04, 2020

COVID-19 and Appreciation

So I’m going near people just once a week to shop. Once a day I go outside on my bike (but nowhere near people) to maintain my mental and physical health. It helps that we live in sparsely populated suburbs.

I shop at my local Woolworths (Findon, South Australia), and was very impressed what I saw today. Crosses on the floor positioning us 2m apart and a bouncer regulating the flow and keeping store numbers low. Same thing on the checkout, and in front of the Deli counter. While I was queuing, a young lady wiped down the trolley handle and offered me hand sanitiser.

The EFTPOS limit has been raised, so no need to use my fingers to enter a PIN number at the checkout. That’s good – I now regard that keypad as an efficient means to distribute a viral payload. Just wave my card 20mm above the machine and I have groceries to sustain my son and I for a week. It costs what I earn with just 1 hour of my labour.

In the middle of the biggest crisis to hit the World since WW2, I can buy just about anything I want. I could gain weight if I wanted to.

Our power went off in a storm last night, and with it the Broadband Internet. However I still had my phone, a hotspot, and a laptop connected to the Internet, friends and loved ones. I immediately received a text from the power company telling me the power would be restored in 2 hours. They did it in 1. In the middle of COVID-19. At night, in the rain. While waiting my son and I cooked a nice BBQ outside in the twilight using gas.

My part time day job is secure, my pay keeps coming, and we have transitioned to WFH and are working well. My shares have been smashed but I can live with that – they are still good companies and I am a long term investor. My son is being home schooled and his teachers at Findon High are working hard on online content and remote teaching.

The Australian COVID-19 new case numbers are dropping and recoveries picking up. Many people are not going to die. The Australian population are working together to beat this.

We are well informed by our public broadcaster the ABC, our media is uncensored, and I can choose to do my own analysis using open source data sets.

What a fantastic world we live in, that can supply a surplus of food, and keep all our institutions running at a time like this. Well done to the Australian government and people.

I feel very grateful.

April 03, 2020

Building Daedalus Flight on NixOS

NixOS Daedalus Gears by Craige McWhirter

Daedalus Flight was recently released and this is how you can build and run this version of Daedalus on NixOS.

If you want to speed the build process up, you can add the IOHK Nix cache to your own NixOS configuration:

iohk.nix:

nix.binaryCaches = [
  "https://cache.nixos.org"
  "https://hydra.iohk.io"
];
nix.binaryCachePublicKeys = [
  "hydra.iohk.io:f/Ea+s+dFdN+3Y/G+FDgSq+a5NEWhJGzdjvKNGv0/EQ="
];

If you haven't already, you can clone the Daedalus repo and specifically the 1.0.0 tagged commit:

$ git clone --branch 1.0.0 https://github.com/input-output-hk/daedalus.git
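If you want to double check which tag you ended up on before building, git can tell you (this should report 1.0.0):

$ cd daedalus
$ git describe --tags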

Once you've cloned the repo and checked you're on the 1.0.0 tagged commit, you can build Daedalus flight with the following command:

$ nix build -f . daedalus --argstr cluster mainnet_flight

Once the build completes, you're ready to launch Daedalus Flight:

$ ./result/bin/daedalus

To verify that you have in fact built Daedalus Flight, first head to the Daedalus menu then About Daedalus. You should see a title such as “DAEDALUS 1.0.0”. The second check is to press [Ctrl]+d to access Daedalus Diagnostics, where your Daedalus state directory should have mainnet_flight at the end of the path.

If you've got these, give yourself a pat on the back and grab yourself a refreshing bevvy while you wait for blocks to sync.

Daedalus FC1 screenshot

Bebo, Betty, and Jaco

Wait, wasn’t WordPress 5.4 just released?

It absolutely was, and congratulations to everyone involved! Inspired by the fine work done to get another release out, I finally completed the last step of co-leading WordPress 5.0, 5.1, and 5.2 (Bebo, Betty, and Jaco, respectively).

My study now has a bit more jazz in it. 🙂

April 02, 2020

Audiobooks – March 2020

My rating for books I read. Note that I’m perfectly happy with anything scoring 3 or better.

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average, in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all

The World As It is: Inside the Obama White House by Ben Rhodes

A memoir of a senior White House staffer, Speechwriter & Presidential adviser. Lots of interesting accounts and behind-the-scenes information. 4/5

Redshirts by John Scalzi

A Star Trek parody from the POV of five ensigns who realise something is very strange on their ship. Plot moves steadily and the humour and action mostly work. 3/5

Little House on the Prairie by Laura Ingalls Wilder

The book covers less than a year as the Ingalls family build a cabin in Indian territory on the Kansas Prairie. Dangerous incidents and adventures throughout. 3/5

Wheels Stop: The Tragedies and Triumphs of the Space Shuttle Program, 1986-2011 by Rich Houston

A book about the post-Challenger Shuttle missions. An overview of most of the missions and the astronauts on them. Lots of quotes mainly from the astronauts. Good for Spaceflight fans. 3/5

The Optimist’s Telescope: Thinking Ahead in a Reckless Age by Bina Venkataraman

Ways that people, organisations and governments can start looking ahead at the long term rather than just the short term, and why they don’t already. Some good stuff. 4/5


April 01, 2020

Zoom's Make or Break Moment

Zoom is experiencing massive growth as large sections of the workforce transition to working from home. At the same time many problems with Zoom are coming to light. This is their make or break moment. If they fix the problems they end up with a killer video conferencing app. The alternative is that they join Cisco's Webex in the dumpster fire of awful enterprise software.

In the interest of transparency I am a paying Zoom customer and I use it for hours every day. I also use Webex (under protest) as it is a client's video conferencing platform of choice.

In the middle of last year Jonathan Leitschuh disclosed two bugs in Zoom with security and privacy implications. There was a string of failures that led to these bugs. To Zoom’s credit, they published a long blog post about why these “features” were there in the first place.

Over the last couple of weeks other issues with Zoom have surfaced. “Zoom bombing”, or using random 9 digit numbers to find meetings, has become a thing. This is caused by Zoom’s meeting rooms having a 9 digit code to join. That’s really handy when you have to dial in and enter the number on your telephone keypad. The downside is that you have a 1 in 999 999 999 chance of joining a meeting when using a random number. Zoom does offer the option of requiring a password or PIN for each call. Unfortunately it isn’t the default. Publishing a blog post on how to secure your meetings isn’t enough; the app needs to be more secure by default. The app should default to enabling a 6 digit PIN when creating a meeting.

The Intercept is reporting Zoom’s marketing department got a little carried away when describing the encryption used in the product. This is an area where words matter. Encryption in transit is a base line requirement in communication tools these days. Zoom has this, but their claims about end to end encryption appear to be false. End to end encryption is very important for some use cases. I await the blog post explaining this one.

I don’t know why Proton Mail’s privacy issues blog post got so much attention. This appears to be based on someone skimming the documentation rather than any real testing. Regardless, the post got a lot of traction. Some of the same issues were flagged by the EFF.

Until recently Zoom’s FAQ read “Does Zoom sell Personal Data? […] Depends what you mean by ‘sell’”. I’m sure that sounded great in a meeting but it is worrying when you read it as a customer. Once called out on social media it was quickly updated and a blog post published. In the post, Zoom assures users it isn’t selling their data.

Joseph Cox reported late last week that Zoom was sending data to Facebook every time someone used their iOS app. It is unclear if Joe gave Zoom an opportunity to fix the issue before publishing the article. The company pushed out a fix after the story broke.

The most recent issue broke yesterday about the Zoom macOS installer behaving like malware. This seems pretty shady behaviour, like their automatic reinstaller that was fixed last year. To his credit, Zoom founder and CEO Eric Yuan engaged with the issue on Twitter. This will be one to watch over the coming days.

Over the last year I have seen a consistent pattern when Zoom is called out on security and valid privacy issues with their platform. They respond publicly with “oops my bad” blog posts. Many of the issues appear to be a result of them trying to deliver a great user experience. Unfortunately they sometimes lean too far toward the UX and ignore the security and privacy implications of their choices. I hope that over the coming months we see Zoom correct this balance as problems are called out. If they do they will end up with an amazing platform in terms of UX while keeping their users safe.

Update: Since publishing this post, additional issues with Zoom were reported. Zoom's CEO announced the company was committed to fixing their product.

March 31, 2020

Defining home automation devices in YAML with ESPHome and Home Assistant, no programming required!

Having built the core of my own “dumb” smart home system, I have been working on making it smart these past few years. As I’ve written about previously, the smart side of my home automation is managed by Home Assistant, which is an amazing, privacy focused open source platform. I’ve previously posted about running Home Assistant in Docker and in Podman.

Home Assistant, the privacy focused, open source home automation platform

I do have a couple of proprietary home automation products, including LIFX globes and Google Home. However, the vast majority of my home automation devices are ESP modules running open source firmware which connect to MQTT as the central protocol. I’ve built a number of sensors and lights and been working on making my light switches smart (more on that in a later blog post).

I already had experience with Arduino, so I started experimenting with this and it worked quite well. I then had a play with Micropython and really enjoyed it, but then I came across ESPHome and it blew me away. I have since migrated most of my devices to ESPHome.

ESPHome provides simple management of ESP devices

ESPHome is smart in making use of PlatformIO underneath, but its beauty lies in the way it abstracts away the complexities of programming for embedded devices. In fact, no programming is necessary! You simply have to define your devices in YAML and run a single command to compile the firmware blob and flash a device. Loops, initialising and managing multiple inputs and outputs, reading and writing to I/O, PWM, functions and callbacks, connecting to WiFi and MQTT, hosting an AP, logging and more are taken care of for you. Once up, the devices support mDNS and unencrypted over the air updates (which is fine for my local network). It supports both the Home Assistant API and MQTT (over TLS for ESP8266) as well as lots of common components. There is even an addon for Home Assistant if you prefer using a graphical interface, but I like to do things on the command line.

When combined with Home Assistant, new devices are automatically discovered and appear in the web interface. When using MQTT, the channels are set with the retain flag, so that the devices themselves and their last known states are not lost on reboots (you can disable this for testing).

That’s a lot of things you get for just a little bit of YAML!

Getting started

Getting started is pretty easy, just install esphome using pip.

pip3 install --user esphome

Of course, you will need a real physical ESP device of some description. Thanks to PlatformIO, lots of ESP8266 and ESP32 devices are supported. Although built on similar SoCs, different devices break out different pins and can have different flashing requirements, so specifying the exact device can be helpful, but it’s not strictly necessary.

It’s not just ESP modules that are supported. These days a number of commercial products are being built using ESP8266 chips which we can flash, like Sonoff power modules, Xiaomi temperature sensors, Brilliant Smart power outlets and Mirabella Genio light bulbs (I use one of these under my stairs).

For this post though, I will use one of my MH-ET Live ESP32Minikit devices as an example, which has the device name of mhetesp32minikit.

MH-ET Live ESP32Minikit

Managing configs with Git

Everything with your device revolves around your device’s YAML config file, including configuration, flashing, accessing logs, clearing out MQTT messages and more.

ESPHome has a wizard which will prompt you to enter your device details and WiFi credentials. It’s a good way to get started, however it only creates a skeleton file and you have to continue configuring the device manually to actually do anything anyway. So, I think ultimately it’s easier to just create and manage your own files, which we’ll do below. (If you want to give it a try, you can run the command esphome example.yaml wizard which will create an example.yaml file.)

I have two Git repositories to manage my ESPHome devices. The first one is for my WiFi and MQTT credentials, which are stored as variables in a file called secrets.yaml (store them in an Ansible vault, if you like). ESPHome automatically looks for this file when compiling firmware for a device and will use those variables.

Let’s create the Git repo and secrets file, replacing the details below with your own. Note that I am including the settings for an MQTT server, which is unencrypted in the example. If you’re using an MQTT server online you may want to use an ESP8266 device instead and enable TLS fingerprints for a more secure connection. I should also mention that MQTT is not required, devices can also use the Home Assistant API and if you don’t use MQTT those variables can be ignored (or you can leave them out).

mkdir ~/esphome-secrets
cd ~/esphome-secrets
cat > secrets.yaml << EOF
wifi_ssid: "ssid"
wifi_password: "wifi-password"
api_password: "api-password"
ota_password: "ota-password"
mqtt_broker: "mqtt-ip"
mqtt_port: 1883
mqtt_username: "mqtt-username"
mqtt_password: "mqtt-password"
EOF
git init
git add .
git commit -m "esphome secrets: add secrets"

The second Git repo has all of my device configs and references the secrets file from the other repo. I name each device’s config file the same as its name (e.g. study.yaml for the device that controls my study). Let’s create the Git repo and link to the secrets file and ignore things like the builds directory (where builds will go!).

mkdir ~/esphome-configs
cd ~/esphome-configs
ln -s ../esphome-secrets/secrets.yaml .
cat > .gitignore << EOF
/.esphome
/builds
/.*.swp
EOF
git init
git add .
git commit -m "esphome configs: link to secrets"

Creating a config

The config file contains different sections with core settings. You can leave some of these settings out, such as api, which will disable that feature on the device (esphome is required).

  • esphome – device details and build options
  • wifi – wifi credentials
  • logger – enable logging of device to see what’s happening
  • ota – enables over the air updates
  • api – enables the Home Assistant API to control the device
  • mqtt – enables MQTT to control the device

Now that we have our base secrets file, we can create our first device config! Note that settings with !secret are referencing the variables in our secrets.yaml file, thus keeping the values out of our device config. Here’s our new base config for an ESP32 device called example in a file called example.yaml which will connect to WiFi and MQTT.

cat > example.yaml << EOF
esphome:
  name: example
  build_path: ./builds/example
  platform: ESP32
  board: mhetesp32minikit

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

logger:

api:
  password: !secret api_password

ota:
  password: !secret ota_password

mqtt:
  broker: !secret mqtt_broker
  username: !secret mqtt_username
  password: !secret mqtt_password
  port: !secret mqtt_port
  # Set to true when finished testing to set MQTT retain flag
  discovery_retain: false
EOF

Compiling and flashing the firmware

First, plug your ESP device into your computer, which should bring up a new TTY, such as /dev/ttyUSB0 (check dmesg). Now that you have the config file, we can compile it and flash the device (you might need to be in the dialout group). The run command actually does a number of things, including a sanity check, compile, flash and tailing the log.

esphome example.yaml run

This will compile the firmware in the specified build dir (./builds/example) and prompt you to flash the device. As this is a new device, an over the air update will not work yet, so you’ll need to select the TTY device. Once the device is running and connected to WiFi you can use OTA.

INFO Successfully compiled program.
Found multiple options, please choose one:
  [1] /dev/ttyUSB0 (CP2104 USB to UART Bridge Controller)
  [2] Over The Air (example.local)
(number): 

Once it is flashed, the device is automatically rebooted. The terminal should now be automatically tailing the log of the device (we enabled logger in the config). If not, you can tell esphome to tail the log by running esphome example.yaml logs.

INFO Successfully uploaded program.
INFO Starting log output from /dev/ttyUSB0 with baud rate 115200
[21:30:17][I][logger:156]: Log initialized
[21:30:17][C][ota:364]: There have been 0 suspected unsuccessful boot attempts.
[21:30:17][I][app:028]: Running through setup()...
[21:30:17][C][wifi:033]: Setting up WiFi...
[21:30:17][D][wifi:304]: Starting scan...
[21:30:19][D][wifi:319]: Found networks:
[21:30:19][I][wifi:365]: - 'ssid' (02:18:E6:22:E2:1A) ▂▄▆█
[21:30:19][D][wifi:366]:     Channel: 1
[21:30:19][D][wifi:367]:     RSSI: -54 dB
[21:30:19][I][wifi:193]: WiFi Connecting to 'ssid'...
[21:30:23][I][wifi:423]: WiFi Connected!
[21:30:23][C][wifi:287]:   Hostname: 'example'
[21:30:23][C][wifi:291]:   Signal strength: -50 dB ▂▄▆█
[21:30:23][C][wifi:295]:   Channel: 1
[21:30:23][C][wifi:296]:   Subnet: 255.255.255.0
[21:30:23][C][wifi:297]:   Gateway: 10.0.0.123
[21:30:23][C][wifi:298]:   DNS1: 10.0.0.1
[21:30:23][C][ota:029]: Over-The-Air Updates:
[21:30:23][C][ota:030]:   Address: example.local:3232
[21:30:23][C][ota:032]:   Using Password.
[21:30:23][C][api:022]: Setting up Home Assistant API server...
[21:30:23][C][mqtt:025]: Setting up MQTT...
[21:30:23][I][mqtt:162]: Connecting to MQTT...
[21:30:23][I][mqtt:202]: MQTT Connected!
[21:30:24][I][app:058]: setup() finished successfully!
[21:30:24][I][app:100]: ESPHome version 1.14.3 compiled on Mar 30 2020, 21:29:41

You should see the device boot up and connect to your WiFi and MQTT server successfully.

Adding components

Great! Now we have a basic YAML file, let’s add some components to make it do something more useful. Components are high level groups, like sensors, lights, switches, fans, etc. Each component is divided into platforms which is where different devices of that type are supported. For example, two of the different platforms under the light component are rgbw and neopixelbus.

One thing that’s useful to know is that platform devices with the name property set in the config will appear in Home Assistant. Those without will only be local to the device and just have an id. This is how you can link multiple components together on the device, then present a single device to Home Assistant (like the garage remote below).

Software reset switch

First thing we can do is add a software switch which will let us reboot the device from Home Assistant (or by publishing manually to MQTT or API). To do this, we add the restart platform from the switch component. It’s as simple as adding this to the bottom of your YAML file.

switch:
  - platform: restart
    name: "Example Device Restart"

That’s it! Now we can re-run the compile and flash. This time you can use OTA to flash the device via mDNS (but if it’s still connected via TTY then you can still use that instead).

esphome example.yaml run

This is what OTA updates look like.

INFO Successfully compiled program.
Found multiple options, please choose one:
  [1] /dev/ttyUSB0 (CP2104 USB to UART Bridge Controller)
  [2] Over The Air (example.local)
(number): 2
INFO Resolving IP address of example.local
INFO  -> 10.0.0.123
INFO Uploading ./builds/example/.pioenvs/example/firmware.bin (856368 bytes)
Uploading: [=====================================                       ] 62% 

After the device reboots, the new reset button should automatically show up in Home Assistant as a device, under Configuration -> Devices under the name example.

Home Assistant with auto-detected example device and reboot switch

Because we set a name for the reset switch, the reboot switch is visible and called Example Device Restart. If you want to make this visible on the main Overview dashboard, you can do so by selecting ADD TO LOVELACE.

Go ahead and toggle the switch while still tailing the log of the device and you should see it restart. If you’ve already disconnected your ESP device from your computer, you can tail the log using MQTT.
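For example, with the Mosquitto command line clients installed, you can subscribe to the device’s log topic, which with the config above should default to something like example/debug (double check the topic prefix if you’ve renamed the device):

mosquitto_sub -h mqtt-ip -u mqtt-username -P mqtt-password -t 'example/debug' -v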

LED light switch

OK, so rebooting the device is cute. Now what if we want to add something more useful for home automation? Well that requires some soldering or breadboard action, but what we can do easily is use the built-in LED on the device as a light and control it through Home Assistant.

On the ESP32 module, the built-in LED is connected to GPIO pin 2. We will first define that pin as an output component using the ESP32 LEDC platform (supports PWM). We then attach a light component using the monochromatic platform to that output component. Let’s add those two things to our config!

output:
  # Built-in LED on the ESP32
  - platform: ledc
    pin: 2
    id: output_ledpin2

light:
  # Light created from built-in LED output
  - platform: monochromatic
    name: "Example LED"
    output: output_ledpin2

Build and flash the new firmware again.

esphome example.yaml run

After the device reboots, you should now be able to see the new Example LED automatically in Home Assistant.

Example device page in Home Assistant showing new LED light

If we toggle this light a few times, we can see the built-in LED on the ESP device fading in and out at the same time.

Other components

As mentioned previously, there are many devices we can easily add to a single board like relays, PIR sensors, temperature and humidity sensors, reed switches and more.

Reed switch, relay, PIR, temperature and humidity sensor (from top to bottom, left to right)

All we need to do is connect them up to appropriate GPIO pins and define them in the YAML.

PIR sensor

A PIR sensor connects to ground and 3-5V, with data connecting to a GPIO pin (let’s use 34 in the example). We read the GPIO pin and can tell when motion is detected because the data pin is pulled high. Under ESPHome we can use the binary_sensor component with the gpio platform. If needed, pulling the pin down is easy, just set the pin mode. Finally, we set the device class to motion which will set the appropriate icon in Home Assistant. It’s as simple as adding this to the bottom of your YAML file.

binary_sensor:
  - platform: gpio
    pin:
      number: 34
      mode: INPUT_PULLDOWN
    name: "Example PIR"
    device_class: motion

Again, compile and flash the firmware with esphome.

esphome example.yaml run

As before, after the device reboots again we should see the new PIR device appear in Home Assistant.

Example device page in Home Assistant showing new PIR input

Temperature and humidity sensor

Let’s do another example, a DHT22 temperature sensor connected to GPIO pin 16. Simply add this to the bottom of your YAML file.

sensor:
  - platform: dht
    pin: 16
    model: DHT22
    temperature:
      name: "Example Temperature"
    humidity:
      name: "Example Humidity"
    update_interval: 10s

Compile and flash.

esphome example.yaml run

After it reboots, you should see the new temperature and humidity inputs under devices in Home Assistant. Magic!

Example device page in Home Assistant showing new temperature and humidity inputs

Garage opener using templates and logic on the device

Hopefully you can see just how easy it is to add things to your ESP device and have them show up in Home Assistant. Sometimes though, you need to make things a little more tricky. Take opening a garage door for example, where a single button starts and stops the motor in turn. To emulate pressing the garage opener, you need to apply voltage to the opener’s push button input for a short while and then turn it off again. We can do all of this easily on the device with ESPHome and present a single button to Home Assistant.

Let’s assume we have a relay connected up to a garage door opener’s push button (PB) input. The relay control pin is connected to our ESP32 on GPIO pin 22.

ESP32 device with relay module, connected to garage opener inputs

We need to add a couple of devices to the ESP module and then expose only the button out to Home Assistant. Note that the relay only has an id, so it is local only and not presented to Home Assistant. However, the template switch which uses the relay has a name and an action which turns the relay on and then off, emulating a button press.

Remember we already added a switch component for the restart platform? Now we need to add the new platform devices to that same section (don’t create a second switch entry).

switch:
  - platform: restart
    name: "Example Device Restart"

  # The relay control pin (local only)
  - platform: gpio
    pin: GPIO22
    id: switch_relay

  # The button to emulate a button press, uses the relay
  - platform: template
    name: "Example Garage Door Remote"
    icon: "mdi:garage"
    turn_on_action:
    - switch.turn_on: switch_relay
    - delay: 500ms
    - switch.turn_off: switch_relay

Compile and flash again.

esphome example.yaml run

After the device reboots, we should now see the new Garage Door Remote in the UI.

Example device page in Home Assistant showing new garage remote inputs

If you actually cabled this up and toggled the button in Home Assistant, the UI button would turn on, you would hear the relay click on then off, and then the UI button would go back to the off state. Pretty neat!
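As a quick test you can also trigger the same template switch directly over MQTT, bypassing Home Assistant. With ESPHome’s default topic layout the command topic should be something like the one below, but check the device’s startup log if yours differs.

mosquitto_pub -h mqtt-ip -u mqtt-username -P mqtt-password -t 'example/switch/example_garage_door_remote/command' -m 'ON'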

There are many other things you can do with ESPHome, but this is just a taste.

Commit your config to Git

Once you have a device to your liking, commit it to Git. This way you can track the changes you’ve made and can always go back to a working config.

git add example.yaml
git commit -m "adding my first example config"

Of course it’s probably a good idea to push your Git repo somewhere remote, perhaps even share your configs with others!

Creating automation in Home Assistant

Of course once you have all these devices it’s great to be able to use them in Home Assistant, but ultimately the point of it all is to automate the home. Thus, you can use Home Assistant to set up scripts and react to things that happen. That’s beyond the scope of this particular post though, as I really wanted to introduce ESPHome and show how you can easily manage devices and integrate them with Home Assistant. There is pretty good documentation online. Enjoy!

Overriding PlatformIO

As a final note, if you need to override something from PlatformIO, for example specifying a specific version of a dependency, you can do that by creating a modified platformio.ini file in your configs dir (copy from one of your build dirs and modify as needed). This way esphome will pick it up and apply it for you automatically.

Links March 2020

Rolling Stone has an insightful article about why the Christian Right supports Trump and won’t stop supporting him no matter what he does [1].

Interesting article about Data Oriented Architecture [2].

Quarantine Will normalise WFH and Recession will Denormalise Jobs [3]. I guess we can always hope that after a disaster we can learn to do things better than before.

Tyre wear is worse than exhaust for small particulate matter [4]. We need better tyres and legal controls over such things.

Scott Santens wrote an insightful article about the need for democracy and unconditional basic income [5]. “In ancient Greece, work was regarded as a curse” is an extreme position but strongly supported by evidence. ‘In his essay “In Praise of Idleness,” Bertrand Russell wrote “Modern methods of production have given us the possibility of ease and security for all; we have chosen, instead, to have overwork for some and starvation for others. Hitherto we have continued to be as energetic as we were before there were machines; in this we have been foolish, but there is no reason to go on being foolish forever.”‘

Cory Doctorow wrote an insightful article for Locus titled A Lever Without a Fulcrum Is Just a Stick about expansions to copyright laws [6]. One of his analogies is that giving a bullied kid more lunch money just allows the bullies to steal more money, with artists being bullied kids and lunch money being the rights that are granted under copyright law. The proposed solution includes changes to labor and contract law, presumably Cory will write other articles in future giving the details of his ideas in this regard.

The Register has an amusing article about the trial of a former CIA employee, the alleged “vault 7 leaker” [7]. Both the prosecution and the defence are building their cases around the defendant being a jerk. The article exposes poor security and poor hiring practices in the CIA.

CNN has an informative article about Finland’s war on fake news [8]. As Finland has long standing disputes with Russia they have had more practice at dealing with fake news than most countries.

The Times of Israel has an interesting article about how the UK used German Jews to spy on German prisoners of war [9].

Cory Doctorow wrote an insightful article “Data is the New Toxic Waste” about how collecting personal data isn’t an asset, it’s a liability [10].

Ulrike Uhlig wrote an insightful article about “Control Freaks”, analysing the different meanings of control, both positive and negative [11].

538 has an informative article about the value of statistical life [12]. It’s about $9M per person in the US, which means a mind-boggling amount of money should be spent to save the millions of lives that will be potentially lost in a natural disaster (like Coronavirus).

NPR has an interesting interview about Crypto AG, the Swiss crypto company owned by the CIA [13]. I first learned of this years ago, it’s not new, but I still learned a lot from this interview.

March 30, 2020

Resolving mDNS across VLANs with Avahi on OpenWRT

mDNS, or multicast DNS, is a way to discover devices on your network at the .local domain without any central DNS configuration (also known as ZeroConf, Bonjour, etc). Fedora Magazine has a good article on setting it up in Fedora, which I won’t repeat here.

If you’re like me, you’re using OpenWRT with multiple VLANs to separate networks. In my case this separates my home automation (HA) network (VLAN 2) from my regular trusted LAN (VLAN 1). Various untrusted home automation products, as well as my own devices, go into the HA network (more on that in a later post).

In my setup, my OpenWRT router acts as my central router, connecting each of my networks and controlling access. My LAN can access everything in my HA network, but generally only established/related TCP traffic is allowed back from HA to LAN. There are some exceptions though, for example my Pi-hole DNS servers which are accessible from all networks, but otherwise that’s the general setup.

With IPv4, mDNS communicates by sending IP multicast UDP packets to 224.0.0.251 with source and destination ports both using 5353. In order to receive requests and responses, your devices need to be running an mDNS service and also allow incoming UDP traffic on port 5353.

As multicast is local only, mDNS doesn’t work natively across routed networks. Therefore, this prevents me from easily talking to my various HA devices from my LAN. In order to support mDNS across routed networks, you need a proxy in the middle to transparently send requests and responses back and forward. There are a few different options for a proxy, such as igmpproxy, but I prefer to use the standard Avahi server on my OpenWRT router.

Keep in mind that doing this will also mean that any device in your untrusted networks will be able to send mDNS requests into your trusted networks. We could stop the mDNS requests with an application layer firewall (which iptables is not), or perhaps with connection tracking, but we’ll leave that for another day. Even if untrusted devices discover addresses in LAN, the firewall is stopping them from actually communicating (at least on my setup).

Set up Avahi

Log onto your OpenWRT router and install Avahi.

opkg update
opkg install avahi-daemon

There is really only one thing that must be set in the config (/etc/avahi/avahi-daemon.conf), and that is to enable reflector (proxy) support. This goes under the [reflector] section and looks like this.

[reflector]
enable-reflector=yes

While technically not required, you can also set which interfaces to listen on. By default it will listen on all networks, which includes WAN and other VLANs, so I prefer to limit this just to the two networks I need.

On my router, my LAN is the br-lan device and my home automation network on VLAN 2 is the eth1.2 device. Your LAN is probably the same, but your other networks will most likely be different. You can find these in your router’s Luci web interface under Network -> Interfaces. The interfaces option goes under the [server] section and looks like this.

[server]
allow-interfaces=br-lan,eth1.2

Now we can start and enable the service!

/etc/init.d/avahi-daemon start
/etc/init.d/avahi-daemon enable

OK that’s all we need to do for Avahi. It is now configured to listen on both LAN and HA interfaces and act as a proxy back and forth.
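As a quick sanity check that the daemon is up and bound to UDP port 5353, you can look at the listening sockets on the router (busybox netstat is enough for this):

netstat -lun | grep 5353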

Firewall rules

As mentioned above, devices need to have incoming UDP port 5353 open. In order for our router to act as a proxy, we must enable this on both LAN and HA network interfaces (we’ll just configure for all interfaces). As mDNS multicasts to a specific address with source and destination ports both using 5353, we can lock this rule down a bit more.

Log onto your firewall Luci web interface and go to Network -> Firewall -> Traffic Rules tab. Under Open ports on router add a new rule for mDNS. This will be for UDP on port 5353.

Find the new rule in the list and edit it so we can customise it further. We can set the source to be any zone, source port to be 5353, where destination zone is the Device (input) and the destination address and port are 224.0.0.251 and 5353. Finally, the action should be set to accept. If you prefer to not allow all interfaces, then create two rules instead and restrict the source zone for one to LAN and to your untrusted network for the other. Hit Save & Apply to make the rule!
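If you’d rather script this than click through Luci, a roughly equivalent rule can be created with uci on the router. This is only a sketch of the single allow-from-any-zone variant described above, so adjust it to suit and double check the result under Traffic Rules.

# Allow incoming mDNS (UDP 5353 to 224.0.0.251) from any zone to the router
uci add firewall rule
uci set firewall.@rule[-1].name='Allow-mDNS'
uci set firewall.@rule[-1].src='*'
uci set firewall.@rule[-1].proto='udp'
uci set firewall.@rule[-1].src_port='5353'
uci set firewall.@rule[-1].dest_ip='224.0.0.251'
uci set firewall.@rule[-1].dest_port='5353'
uci set firewall.@rule[-1].target='ACCEPT'
uci commit firewall
/etc/init.d/firewall restart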

We should now be able to resolve mDNS from LAN into the untrusted network.

Testing

To test it, ensure your Fedora computer is configured for mDNS and can resolve its own .local hostname. Now, try to ping a device in your untrusted network. For me, this will be study.local which is one of my home automation devices in my study (funnily enough).

ping study.local
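If you want to test name resolution on its own without pinging, the Avahi command line tools can query the name directly and should print the device’s address on the HA network:

avahi-resolve --name study.local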

When my computer in LAN tries to discover the device running in the study, the communication flow looks like this.

  • My computer (192.168.0.125) on LAN tries to ping study.local but needs to resolve it.
  • My computer sends out the mDNS UDP multicast to 224.0.0.251:5353 on the LAN, requesting address of study.local.
  • My router (192.168.0.1) picks up the request on LAN and sends same multicast request out on HA network (10.0.0.1).
  • The study device on HA network picks up the request and multicasts the reply of 10.0.0.202 back to 224.0.0.251:5353 on the HA network.
  • My router picks up the reply on HA network and re-casts it on LAN.
  • My computer picks up the reply on LAN and thus learns the address of the study device on HA network.
  • My computer successfully pings study.local at 10.0.0.202 from LAN by routing through my router to HA network.

This is what a packet capture looks like.

16:38:12.489582 IP 192.168.0.125.5353 > 224.0.0.251.5353: 0 A (QM)? study.local. (35)
16:38:12.489820 IP 10.0.0.1.5353 > 224.0.0.251.5353: 0 A (QM)? study.local. (35)
16:38:12.696894 IP 10.0.0.202.5353 > 224.0.0.251.5353: 0*- [0q] 1/0/0 (Cache flush) A 10.0.0.202 (45)
16:38:12.697037 IP 192.168.0.1.5353 > 224.0.0.251.5353: 0*- [0q] 1/0/0 (Cache flush) A 10.0.0.202 (45)

And that’s it! Now we can use mDNS to resolve devices in an untrusted network from a trusted network with zeroconf.

March 28, 2020

How to get a direct WebRTC connections between two computers

WebRTC is a standard real-time communication protocol built directly into modern web browsers. It enables the creation of video conferencing services which do not require participants to download additional software. Many services make use of it and it almost always works out of the box.

The reason it just works is that it uses a protocol called ICE to establish a connection regardless of the network environment. What that means however is that in some cases, your video/audio connection will need to be relayed (using end-to-end encryption) to the other person via a third-party TURN server. In addition to adding extra network latency to your call, that relay server might be overloaded at some point and drop or delay packets coming through.

Here's how to tell whether or not your WebRTC calls are being relayed, and how to ensure you get a direct connection to the other host.

Testing basic WebRTC functionality

Before you place a real call, I suggest using the official test page which will test your camera, microphone and network connectivity.

Note that this test page makes use of a Google TURN server which is locked to particular HTTP referrers and so you'll need to disable privacy features that might interfere with this:

  • Brave: Disable Shields entirely for that page (Simple view) or allow all cookies for that page (Advanced view).

  • Firefox: Ensure that network.http.referer.spoofSource is set to false in about:config, which it is by default.

  • uMatrix: The "Spoof Referer header" option needs to be turned off for that site.

Checking the type of peer connection you have

Once you know that WebRTC is working in your browser, it's time to establish a connection and look at the network configuration that the two peers agreed on.

My favorite service at the moment is Whereby (formerly Appear.in), so I'm going to use that to connect from two different computers:

  • canada is a laptop behind a regular home router without any port forwarding.
  • siberia is a desktop computer in a remote location that is also behind a home router, but in this case its internal IP address (192.168.1.2) is set as the DMZ host.

Chromium

For all Chromium-based browsers, such as Brave, Chrome, Edge, Opera and Vivaldi, the debugging page you'll need to open is called chrome://webrtc-internals.

Look for RTCIceCandidatePair lines and expand them one at a time until you find the one which says:

  • state: succeeded (or state: in-progress)
  • nominated: true
  • writable: true

Then from the name of that pair (N6cxxnrr_OEpeash in my case) find the two matching RTCIceCandidate lines (one local-candidate and one remote-candidate) and expand them.

In the case of a direct connection, I saw the following on the remote-candidate:

  • ip shows the external IP address of siberia
  • port shows a random number between 1024 and 65535
  • candidateType: srflx

and the following on local-candidate:

  • ip shows the external IP address of canada
  • port shows a random number between 1024 and 65535
  • candidateType: prflx

These candidate types indicate that a STUN server was used to determine the public-facing IP address and port for each computer, but the actual connection between the peers is direct.

On the other hand, for a relayed/proxied connection, I saw the following on the remote-candidate side:

  • ip shows an IP address belonging to the TURN server
  • candidateType: relay

and the same information as before on the local-candidate.

Firefox

If you are using Firefox, the debugging page you want to look at is about:webrtc.

Expand the top entry under "Session Statistics" and look for the line (should be the first one) which says the following in green:

  • ICE State: succeeded
  • Nominated: true
  • Selected: true

then look in the "Local Candidate" and "Remote Candidate" sections to find the candidate type in brackets.

Firewall ports to open to avoid using a relay

In order to get a direct connection to the other WebRTC peer, one of the two computers (in my case, siberia) needs to open all inbound UDP ports since there doesn't appear to be a way to restrict Chromium or Firefox to a smaller port range for incoming WebRTC connections.

This isn't great and so I decided to tighten that up in two ways by:

  • restricting incoming UDP traffic to the IP range of siberia's ISP, and
  • explicitly denying incoming traffic to the UDP ports I know are open on siberia.

To get the IP range, start with the external IP address of the machine (I'll use the IP address of my blog in this example: 66.228.46.55) and pass it to the whois command:

$ whois 66.228.46.55 | grep CIDR
CIDR:           66.228.32.0/19

To get the list of open UDP ports on siberia, I sshed into it and ran nmap:

$ sudo nmap -sU localhost

Starting Nmap 7.60 ( https://nmap.org ) at 2020-03-28 15:55 PDT
Nmap scan report for localhost (127.0.0.1)
Host is up (0.000015s latency).
Not shown: 994 closed ports
PORT      STATE         SERVICE
631/udp   open|filtered ipp
5060/udp  open|filtered sip
5353/udp  open          zeroconf

Nmap done: 1 IP address (1 host up) scanned in 190.25 seconds

I ended up with the following in my /etc/network/iptables.up.rules (ports below 1024 are denied by the default rule and don't need to be included here):

# Deny all known-open high UDP ports before enabling WebRTC for canada
-A INPUT -p udp --dport 5060 -j DROP
-A INPUT -p udp --dport 5353 -j DROP
-A INPUT -s 66.228.32.0/19 -p udp --dport 1024:65535 -j ACCEPT

March 25, 2020

Updating OpenStack TripleO Ceph nodes safely one at a time

Part of the process when updating Red Hat’s TripleO based OpenStack is to apply the package and container updates, via the update run step, to the nodes in each Role (like Controller, CephStorage and Compute, etc). This is done in-place, before the ceph-upgrade (ceph-ansible) step, converge step and reboots.

openstack overcloud update run --nodes CephStorage

Rather than do an entire Role straight up however, I always update one node of that type first. This lets me make sure there are no problems (and fix them if there are) before moving onto the whole Role.

I noticed recently when performing the update step on CephStorage role nodes that OSDs and OSD nodes were going down in the cluster. This was then causing my Ceph cluster to go into backfilling and recovering (norebalance was set).
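As an aside, those maintenance flags and the overall cluster state are managed with plain ceph commands from one of the controllers. Something like the following is typical (a generic example, not part of the TripleO update itself):

# set the flag before maintenance, watch the cluster state, then clear it
ssh controller-0 'sudo ceph osd set norebalance'
ssh controller-0 'sudo ceph -s'
ssh controller-0 'sudo ceph osd unset norebalance'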

We want all of these nodes to be done one at a time, as taking more than one node out at a time can potentially make the Ceph cluster stop serving data (all VMs will freeze) until it finishes and gets the minimum number of copies in the cluster. If all three copies of data go offline at the same time, it’s not going to be able to recover.

My concern was that the update step does not check the status of the cluster; it just goes ahead and updates each node one by one (the separate ceph update run step does check the state). If the Ceph nodes are updated faster than the cluster can fix itself, we might end up with multiple nodes going offline and hitting the issues mentioned above.

So to work around this I just ran this simple bash loop. It gets a list of all the Ceph Storage nodes and before updating each one in turn, checks that the status of the cluster is HEALTH_OK before proceeding. This would not be possible if we updated by Role instead.

source ~/stackrc
for node in $(openstack server list -f value -c Name |grep ceph-storage |sort -V); do
  while [[ ! "$(ssh -q controller-0 'sudo ceph -s |grep health:')" =~ "HEALTH_OK" ]] ; do
    echo "cluster not healthy, sleeping before updating ${node}"
    sleep 5
  done
  echo "cluster healthy, updating ${node}"
  openstack overcloud update run --nodes ${node} || { echo "failed to update ${node}, exiting"; exit 1 ;}
  echo "updated ${node} successfully"
done

I’m not sure if the cluster going down like that is expected behaviour, but I opened a bugzilla for it.

March 22, 2020

My POWER9 CPU Core Layout

So, following on from my post on Sensors on the Blackbird (and thus Power9), I mentioned that when you look at the temperature sensors for each CPU core in my 8-core POWER9 chip, they’re not linear numbers. Let’s look at what that means….

stewart@blackbird9$ sudo ipmitool sensor | grep core
 p0_core0_temp            | na                                                                                                               
 p0_core1_temp            | na                                                                                                               
 p0_core2_temp            | na                                                                                                               
 p0_core3_temp            | 38.000                                                                                                           
 p0_core4_temp            | na          
 p0_core5_temp            | 38.000      
 p0_core6_temp            | na          
 p0_core7_temp            | 38.000      
 p0_core8_temp            | na          
 p0_core9_temp            | na          
 p0_core10_temp           | na          
 p0_core11_temp           | 37.000      
 p0_core12_temp           | na          
 p0_core13_temp           | na          
 p0_core14_temp           | na          
 p0_core15_temp           | 37.000      
 p0_core16_temp           | na          
 p0_core17_temp           | 37.000      
 p0_core18_temp           | na          
 p0_core19_temp           | 39.000      
 p0_core20_temp           | na          
 p0_core21_temp           | 39.000      
 p0_core22_temp           | na          
 p0_core23_temp           | na        

You can see I have eight CPU cores in my Blackbird system. The reason the 8 CPU cores are core 3, 5, 7, 11, 15, 17, 19, and 21 rather than 0-7 or something is that these represent the core numbers on the physical die, and the die is a 24 core die. When you’re making a chip as big and as complex as modern high performance CPUs, not all of the chips coming out of your fab are going to be perfect, so this is how you get different models in the line with only one production line.

Weirdly, the output from the hwmon sensors shows a “core 24” and a “core 28”. That’s just… wrong. It does make sense, however, if you think of 8*4=32: this is a product of Linux treating Thread=Core in some ways. So, yeah, this numbering is the first thread of each logical core.

[stewart@blackbird9 ~]$ sensors|grep -i core
 Chip 0 Core 0:            +39.0°C  (lowest = +25.0°C, highest = +71.0°C)
 Chip 0 Core 4:            +39.0°C  (lowest = +26.0°C, highest = +66.0°C)
 Chip 0 Core 8:            +39.0°C  (lowest = +27.0°C, highest = +67.0°C)
 Chip 0 Core 12:           +39.0°C  (lowest = +26.0°C, highest = +67.0°C)
 Chip 0 Core 16:           +39.0°C  (lowest = +25.0°C, highest = +67.0°C)
 Chip 0 Core 20:           +39.0°C  (lowest = +26.0°C, highest = +69.0°C)
 Chip 0 Core 24:           +39.0°C  (lowest = +27.0°C, highest = +67.0°C)
 Chip 0 Core 28:           +39.0°C  (lowest = +27.0°C, highest = +64.0°C)
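If you want to check the thread-to-core mapping for yourself, lscpu will show which core and socket each logical CPU belongs to, and the sysfs core_id files give the unique core ids the kernel is using (generic commands; the exact numbers will depend on your SMT mode):

$ lscpu --extended=CPU,CORE,SOCKET | head
$ cat /sys/devices/system/cpu/cpu*/topology/core_id | sort -nu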

But let’s ignore that and go from the IPMI sensors, which also match what the OCC shows with “occtoolp9 -SL” (see below).

$ ./occtoolp9 -SL
Sensor Details: (found 86 sensors, details only for Status of 0x00)                                           
     GUID Name             Sample     Min    Max U    Stat   Accum     UpdFreq   ScaleFactr   Loc   Type 
....
   0x00ED TEMPC03………     47      29     47 C    0x00 0x00037CF2 0x00007D00 0x00000100 0x0040 0x0008
   0x00EF TEMPC05………     37      26     39 C    0x00 0x00014E53 0x00007D00 0x00000100 0x0040 0x0008
   0x00F1 TEMPC07………     46      28     46 C    0x00 0x0001A777 0x00007D00 0x00000100 0x0040 0x0008
   0x00F5 TEMPC11………     44      27     45 C    0x00 0x00018402 0x00007D00 0x00000100 0x0040 0x0008
   0x00F9 TEMPC15………     36      25     43 C    0x00 0x000183BC 0x00007D00 0x00000100 0x0040 0x0008
   0x00FB TEMPC17………     38      28     41 C    0x00 0x00015474 0x00007D00 0x00000100 0x0040 0x0008
   0x00FD TEMPC19………     43      27     44 C    0x00 0x00016589 0x00007D00 0x00000100 0x0040 0x0008
   0x00FF TEMPC21………     36      30     40 C    0x00 0x00015CA9 0x00007D00 0x00000100 0x0040 0x0008

So what does that mean for physical layout? Well, like all modern high performance chips, the POWER9 is modular, with a bunch of logic being replicated all over the die. The most notable duplicated parts are the core (replicated 24 times!) and cache structures. Less so are memory controllers and PCI hardware.

P9 chip layout from page 31 of the POWER9 Register Specification

See that each core (e.g. EC00 and EC01) is paired with the cache block (EC00 and EC01 with EP00). That’s two POWER9 cores with one 512KB L2 cache and one 10MB L3 cache.

You can see the cache layout (including L1 Instruction and Data caches) by looking in sysfs:

$ for i in /sys/devices/system/cpu/cpu0/cache/index*/; \
  do echo -n $(cat $i/level) $(cat $i/size) $(cat $i/type); \
  echo; done
 1 32K Data
 1 32K Instruction
 2 512K Unified
 3 10240K Unified

So, what does the layout of my POWER9 chip look like? Well, thanks to the power of graphics software, we can cross some cores out and look at the topology:

My 8-core POWER9 CPU in my Raptor Blackbird

If I run some memory bandwidth benchmarks, I can see the L3 cache capacity you’d assume from the above diagram: 80MB (10MB/core). Let’s see:

[stewart@blackbird9 lmbench3]$ for i in 5M 10M 20M 30M 40M 50M 60M 70M 80M 500M; \
  do echo -n "$i   "; \
  ./bin/bw_mem -N 100  $i rd; \
done
  5M    5.24 63971.98
 10M   10.49 31940.14
 20M   20.97 17620.16
 30M   31.46 18540.64
 40M   41.94 18831.06
 50M   52.43 17372.03
 60M   62.91 16072.18
 70M   73.40 14873.42
 80M   83.89 14150.82
 500M 524.29 14421.35

If all the cores were packed together, I’d expect that cliff to be a lot sooner.

So how does this compare to other machines I have around? Well, let’s look at my Ryzen 7. Specifically, an “AMD Ryzen 7 1700 Eight-Core Processor”. The cache layout is:

$ for i in /sys/devices/system/cpu/cpu0/cache/index*/; \
  do echo -n $(cat $i/level) $(cat $i/size) $(cat $i/type); \
  echo; \
done
 1 32K Data
 1 64K Instruction
 2 512K Unified
 3 8192K Unified

And then a performance benchmark similar to the one I ran above on the POWER9 (the cliff comes a little sooner, as 8MB is less than 10MB)

$ for i in 4M 8M 16M 24M 32M 40M 48M 56M 64M 72M 80M 500M; \
  do echo -n "$i   "; ./bin/x86_64-linux-gnu/bw_mem -N 10  $i rd;\
done
  4M    4.19 61111.04
  8M    8.39 28596.55
 16M   16.78 21415.12
 24M   25.17 20153.57
 32M   33.55 20448.20
 40M   41.94 20940.11
 48M   50.33 20281.39
 56M   58.72 21600.24
 64M   67.11 21284.13
 72M   75.50 20596.18
 80M   83.89 20802.40
 500M 524.29 21489.27

And my laptop? It’s a four core part, specifically an “Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz” with a cache layout like:

$ for i in /sys/devices/system/cpu/cpu0/cache/index*/; \
   do echo -n $(cat $i/level) $(cat $i/size) $(cat $i/type); \
     echo; \
   done
   1 32K Data
   1 32K Instruction
   2 256K Unified
   3 6144K Unified 
$ for i in 3M 6M 12M 18M 24M 30M 36M 42M 500M; \
  do echo -n "$i   "; ./bin/x86_64-linux-gnu/bw_mem -N 10  $i rd;\
done
  3M    3.15 48500.24
  6M    6.29 27144.16
 12M   12.58 18731.80
 18M   18.87 17757.74
 24M   25.17 17154.12
 30M   31.46 17135.87
 36M   37.75 16899.75
 42M   44.04 16865.44
 500M 524.29 16817.10

I’m not sure what performance conclusions we can realistically draw from these curves, apart from “keeping workload to L3 cache is cool”, and “different chips have different cache hardware”, and “I should probably go and read and remember more about the microarchitectural characteristics of the cache hardware in Ryzen 7 hardware and 10th gen Intel Core hardware”.

Online Teaching

The OpenSTEM® materials are ideally suited to online teaching. In these times of new challenges and requirements, there are a lot of technological possibilities. Schools and teachers are increasingly being asked to deliver material online to students. Our materials can assist with that process, especially for Humanities and Science subjects from Prep/Kindy/Foundation to Year 6. […]

Covid 19 Numbers – lag

Recording some thoughts about Covid 19 numbers.

Today’s figures

The Government says:

“As at 6.30am on 22 March 2020, there have been 1,098 confirmed cases of COVID-19 in Australia”.

The reference is https://www.health.gov.au/news/health-alerts/novel-coronavirus-2019-ncov-health-alert/coronavirus-covid-19-current-situation-and-case-numbers. However, that page is updated daily (ish), so don’t expect it to be the same if you check the reference.

Estimating Lag

If a person tests positive to the virus today, that means they were infected at some time in the past. So, what is the lag between infection and a positive test result?

Incubation Lag – about 5 days

When you are infected you don’t show symptoms immediately. Rather, there’s an incubation period before symptoms become apparent. The time between being infected and developing symptoms varies from person to person, but most of the time a person shows symptoms after about 5 days (I recall seeing somewhere that 1 in 1000 cases will develop symptoms after 14 days).

Presentation Lag – about 2 days

I think it’s fair to also assume that people are not presenting for testing as soon as they become ill. It is probably taking them a couple of days from developing symptoms to actually get to the doctor – I read a story somewhere (have since lost the reference) about a young man who went to a party, then felt bad for days but didn’t go for a test until someone else from the party had returned a positive test. Let’s assume there’s a mix of worried well and stoic types and call it 2 days from becoming symptomatic to seeking a test.

Referral Lag – about a day

Assuming that a GP is available straight away and recommends a test immediately, logistically there will still be most of a day taken up between deciding to see a doctor and having a test carried out.

Testing lag – about 2 days

The graph of infections (the “epi graph”) today looks like this:

New and cumulative COVID-19 cases in Australia by notification date (22 March 2020)

One thing you notice about the graph is that the new cases bars seem to increase for a couple of days, then decrease – so about 100 new cases in the last 24 hours, but almost 200 in the 24 hours before that. From the graph, the last 3 “dips” have been today (Sunday), last Thursday and last Sunday.  This seems to be happening every 3 to 4 days. I initially thought that the dips might mean fewer (or more) people presenting over weekends, but the period is inconsistent with that. I suspect, instead, that this actually means that testing is being batched.

That would mean that neither the peaks nor the troughs are representative of infection surges/retreats; they simply reflect when tests are being processed. This seems to be a 4 day cycle, so, on average it seems that it would be about 2 days between having the test conducted and receiving a result. So a confirmed case count published today is actually showing confirmed cases as at about 2 days earlier.

Total lag

From the date someone is infected to the time that they receive a positive confirmation is about:

lag = time for symptoms to show + time to seek a test + referral time + time for the test to return a result

So, the published figures on confirmed infections are probably lagging actual infections in the community by about 10 days (5+2+1+2).

If there’s about a 10 day lag between infection and confirmation, then what a figure published today says is that about a week and a half ago there were about this many cases in the community.  So, the 22 March figure of 1098 infections is actually really a 12 March figure.

What the lag means for Physical (ie Social) Distancing

The main thing that the lag means is that if we were able to wave a magic wand today and stop all further infections, we would continue to record new infections for about 10 days (and the tail for longer). In practical terms, implementing physical distancing measures will not show any effect on new cases for about a week and a half. That’s because today there are infected people who are yet to be tested.

The silver lining to that is that the physical distancing measures that have been gaining prominence since 15 March should start to show up in the daily case numbers from the middle of the coming week, possibly offset by overseas entrants rushing to make the 20 March entry deadline.

Estimating Actual Infections as at Today

How many people are infected, but unconfirmed as at today? To estimate actual infections you’d need to have some idea of the rate at which infections are increasing. For example, if infections increased by 10% per day for 10 days, then you’d multiply the most recent figure by 1.1 raised to the power of 10 (ie about 2.5).  Unfortunately, the daily rate of increase (see table on the wiki page) has varied a fair bit (from 20% to 27%) over the most recent 10 days of data (that is, over the 10 days prior to 12 March, since the 22 March figures roughly correspond to 12 March infections) and there’s no guarantee that since that time the daily increase in infections will have remained stable, particularly in light of the implementation of physical distancing measures. At 23.5% per day, the factor is about 8.
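Those compounding factors are easy to sanity check on the command line, for example:

$ python3 -c "print(round(1.10**10, 1), round(1.235**10, 1))"
2.6 8.3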

There aren’t any reliable figures we can use to estimate the rate of infection during the current lag period (ie from 12 March to 22 March). This is because the vast majority of cases have not been from unexplained community transmission. Most of the cases are from people who have been overseas in the previous fortnight, and they’re the cohort that has been most significantly affected by recent physical distancing measures. From 15 March they have been required to self isolate, and from 20 March most of their entry into the country has stopped. So I’d expect a surge in numbers up to about 30 March (reflecting infections in the cohort of people rushing to get into the country before the borders closed), followed by a flattening. With the lag factor above, you’ll need to wait until 1 April or thereabouts to know for sure.

Note:

This post is just about accounting for the time lag between becoming infected and receiving a positive test result. It assumes, for example, that everyone who is infected seeks a test, and that everyone who is infected and seeks a test is, in fact, tested. As at today, neither of these things is true.

OCC and Sensors on the Raptor Blackbird (and other POWER9 systems)

In this post we’re going to look at three different ways to read various sensors in the Raptor Blackbird system. The Blackbird is a single socket uATX board for the POWER9 processor. One advantage of the system is completely open source firmware, so you can (as I have) build your own firmware. So, this is my Blackbird running my most recent firmware build (the BMC is running the 2.00 release from Raptor).

Sensors over IPMI

One way to get the sensors is over IPMI. This can be done either in-band (as in, from the OS running on the blackbird), or over the network.

stewart@blackbird9$ sudo ipmitool sensor |head
occ                      | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
 occ0                     | 0x0        | discrete   | 0x0200| na        | na        | na        | na        | na        | na        
 occ1                     | 0x0        | discrete   | 0x0100| na        | na        | na        | na        | na        | na        
 p0_core0_temp            | na         |            | na    | na        | na        | na        | na        | na        | na        
 p0_core1_temp            | na         |            | na    | na        | na        | na        | na        | na        | na        
 p0_core2_temp            | na         |            | na    | na        | na        | na        | na        | na        | na        
 p0_core3_temp            | 38.000     | degrees C  | ok    | na        | -40.000   | na        | 78.000    | 90.000    | na        
 p0_core4_temp            | na         |            | na    | na        | na        | na        | na        | na        | na        
 p0_core5_temp            | 38.000     | degrees C  | ok    | na        | -40.000   | na        | 78.000    | 90.000    | na        
 p0_core6_temp            | na         |            | na    | na        | na        | na        | na        | na        | na    

It’s kind of annoying to read there, so standard unix tools to the rescue!

stewart@blackbird9$ sudo ipmitool sensor | cut -d '|' -f 1,2
 occ                      | na                                                                                                               
 occ0                     | 0x0                                                                                                              
 occ1                     | 0x0                                                                                                              
 p0_core0_temp            | na                                                                                                               
 p0_core1_temp            | na                                                                                                               
 p0_core2_temp            | na                                                                                                               
 p0_core3_temp            | 38.000                                                                                                           
 p0_core4_temp            | na          
 p0_core5_temp            | 38.000      
 p0_core6_temp            | na          
 p0_core7_temp            | 38.000      
 p0_core8_temp            | na          
 p0_core9_temp            | na          
 p0_core10_temp           | na          
 p0_core11_temp           | 37.000      
 p0_core12_temp           | na          
 p0_core13_temp           | na          
 p0_core14_temp           | na          
 p0_core15_temp           | 37.000      
 p0_core16_temp           | na          
 p0_core17_temp           | 37.000      
 p0_core18_temp           | na          
 p0_core19_temp           | 39.000      
 p0_core20_temp           | na          
 p0_core21_temp           | 39.000      
 p0_core22_temp           | na          
 p0_core23_temp           | na          
 p0_vdd_temp              | 40.000 
 dimm0_temp               | 35.000      
 dimm1_temp               | na          
 dimm2_temp               | na          
 dimm3_temp               | na          
 dimm4_temp               | 38.000      
 dimm5_temp               | na          
 dimm6_temp               | na          
 dimm7_temp               | na          
 dimm8_temp               | na          
 dimm9_temp               | na          
 dimm10_temp              | na          
 dimm11_temp              | na          
 dimm12_temp              | na          
 dimm13_temp              | na          
 dimm14_temp              | na          
 dimm15_temp              | na          
 fan0                     | 1200.000    
 fan1                     | 1100.000    
 fan2                     | 1000.000    
 p0_power                 | 33.000      
 p0_vdd_power             | 5.000       
 p0_vdn_power             | 9.000       
 cpu_1_ambient            | 30.600      
 pcie                     | 27.000      
 ambient                  | 26.000  

You can see that I have 3 fans, two DIMMs (although why it lists 16 possible DIMMs for a two DIMM slot board is a good question!), and eight CPU cores. More on why the layout of the CPU cores is the way it is in a future post.

The code path for reading these sensors is interesting: it all goes via the BMC, so we have the OCC inside the P9 read things, the BMC reads them from the OCC, and then passes them back to the P9. On the P9 itself, each sensor read is a call all the way into firmware and back! In fact, we can look at it in perf:

$ sudo perf record -g ipmitool sensor
$ sudo perf report --no-children
[Screenshot: perf report for “ipmitool sensor”]

What are the 0x300xxxxx addresses? They’re the OPAL firmware (i.e. skiboot). We can look up the symbols easily, as the firmware exposes them to the kernel, which then plonks them in sysfs:

[stewart@blackbird9 ~]$ sudo head /sys/firmware/opal/symbol_map 
[sudo] password for stewart: 
0000000000000000 R __builtin_kernel_end
0000000000000000 R __builtin_kernel_start
0000000000000000 T __head
0000000000000000 T _start
0000000000000010 T fdt_entry
00000000000000f0 t boot_sem
00000000000000f4 t boot_flag
00000000000000f8 T attn_trigger
00000000000000fc T hir_trigger
0000000000000100 t sreset_vector

So we can easily look up exactly where this is:

[stewart@blackbird9 ~]$ sudo grep '18e.. ' /sys/firmware/opal/symbol_map 
 0000000000018e20 t .__try_lock.isra.0
 0000000000018e68 t .add_lock_request

So we’re managing to spend a whole 12% of execution time spinning on a spinlock in firmware! The call stack of what’s going on in firmware isn’t so easy, but we can find the bt_add_ipmi_msg call there which is probably how everything starts:

[stewart@blackbird9 ~]$ sudo grep '516.. ' /sys/firmware/opal/symbol_map
 0000000000051614 t .bt_add_ipmi_msg_head
 0000000000051688 t .bt_add_ipmi_msg
 00000000000516fc t .bt_poll

OCCTOOL

This is the most not-what-you’re-meant-to-use method of getting access to sensors! It’s using a debug tool for the OCC firmware! There’s a variety of tools in the OCC source repository, and one of them (occtoolp9) can be used for a variety of things, one of which is getting sensor data out of the OCC.

$ sudo ./occtoolp9 -SL
     Sensor Type: 0xFFFF
 Sensor Location: 0xFFFF
     (only displaying non-zero sensors)
 Sending 0x53 command to OCC0 (via opal-prd)…
   MFG Sub Cmd: 0x05  (List Sensors)
   Num Sensors: 50
     [ 1] GUID: 0x0000 / AMEintdur…….  Sample:     20  (0x0014)
     [ 2] GUID: 0x0001 / AMESSdur0…….  Sample:      7  (0x0007)
     [ 3] GUID: 0x0002 / AMESSdur1…….  Sample:      3  (0x0003)
     [ 4] GUID: 0x0003 / AMESSdur2…….  Sample:     23  (0x0017)

The odd thing you’ll see is “via opal-prd” – and this is because it’s doing raw calls to the opal-prd binary to talk to the OCC firmware, running things like “opal-prd --expert-mode htmgt-passthru“. Yeah, this isn’t an in-production thing :)

Amazingly (and interestingly), this doesn’t go through host firmware in the way that an IPMI call will. There’s a full OCC/Host firmware interface spec to read. But it’s an insanely inefficient way to monitor sensors: a long bash script shelling out to a whole bunch of other processes… Think ~14.4 billion cycles versus ~367 million cycles for the ipmitool option above.

But there are some interesting sensors at the end of the list:

Sensor Details: (found 86 sensors, details only for Status of 0x00)                                                  
     GUID Name             Sample     Min    Max U    Stat   Accum     UpdFreq   ScaleFactr   Loc   Type   
....
   0x014A MRDM0………..    688       3  15015 GBs  0x00 0x0144AE6C 0x00001901 0x000080FB 0x0008 0x0200
   0x014E MRDM4………..    480       3  14739 GBs  0x00 0x01190930 0x00001901 0x000080FB 0x0008 0x0200
   0x0156 MWRM0………..    560       4  16605 GBs  0x00 0x014C61FD 0x00001901 0x000080FB 0x0008 0x0200
   0x015A MWRM4………..    360       4  16597 GBs  0x00 0x014AE231 0x00001901 0x000080FB 0x0008 0x0200

Is that memory bandwidth? Well, if I run the STREAM benchmark in a loop and look again:

0x014A MRDM0………..  15165       3  17994 GBs  0x00 0x0C133D6C 0x00001901 0x000080FB 0x0008 0x0200
   0x014E MRDM4………..  17145       3  18016 GBs  0x00 0x0BF501D6 0x00001901 0x000080FB 0x0008 0x0200
   0x0156 MWRM0………..   8063       4  24280 GBs  0x00 0x07C98B88 0x00001901 0x000080FB 0x0008 0x0200
   0x015A MWRM4………..   1138       4  24215 GBs  0x00 0x07CE82AF 0x00001901 0x000080FB 0x0008 0x0200

It looks like it! Are these exposed elsewhere? That’s something to look at in another blog post at some point in the future.

lm-sensors

$ rpm -qf /usr/bin/sensors
 lm_sensors-3.5.0-6.fc31.ppc64le

Ahhh, old faithful lm-sensors! Yep, a whole bunch of sensors are just exposed over the standard interface that we’ve been using since ISA was a thing.

[stewart@blackbird9 ~]$ sensors                                                                  
 ibmpowernv-isa-0000                                       
 Adapter: ISA adapter                                      
 Chip 0 Vdd Remote Sense:  +1.02 V  (lowest =  +0.72 V, highest =  +1.02 V)
 Chip 0 Vdn Remote Sense:  +0.67 V  (lowest =  +0.67 V, highest =  +0.67 V)
 Chip 0 Vdd:               +1.02 V  (lowest =  +0.73 V, highest =  +1.02 V)
 Chip 0 Vdn:               +0.68 V  (lowest =  +0.68 V, highest =  +0.68 V)
 Chip 0 Core 0:            +47.0°C  (lowest = +25.0°C, highest = +71.0°C)            
 Chip 0 Core 4:            +47.0°C  (lowest = +26.0°C, highest = +66.0°C)            
 Chip 0 Core 8:            +48.0°C  (lowest = +27.0°C, highest = +67.0°C)            
 Chip 0 Core 12:           +48.0°C  (lowest = +26.0°C, highest = +67.0°C)            
 Chip 0 Core 16:           +47.0°C  (lowest = +25.0°C, highest = +67.0°C)                      
 Chip 0 Core 20:           +47.0°C  (lowest = +26.0°C, highest = +69.0°C)            
 Chip 0 Core 24:           +48.0°C  (lowest = +27.0°C, highest = +67.0°C)                     
 Chip 0 Core 28:           +51.0°C  (lowest = +27.0°C, highest = +64.0°C)                     
 Chip 0 DIMM 0 :           +40.0°C  (lowest = +34.0°C, highest = +44.0°C)                     
 Chip 0 DIMM 1 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)                     
 Chip 0 DIMM 2 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 3 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 4 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 5 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 6 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 7 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 8 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 9 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 10 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 11 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 12 :          +43.0°C  (lowest = +36.0°C, highest = +47.0°C)
 Chip 0 DIMM 13 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 14 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 15 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 Nest:              +48.0°C  (lowest = +27.0°C, highest = +64.0°C)
 Chip 0 VRM VDD:           +47.0°C  (lowest = +39.0°C, highest = +66.0°C)
 Chip 0 :                  44.00 W  (lowest =  31.00 W, highest = 132.00 W)
 Chip 0 Vdd:               15.00 W  (lowest =   4.00 W, highest = 104.00 W)
 Chip 0 Vdn:               10.00 W  (lowest =   8.00 W, highest =  12.00 W)
 Chip 0 :                 227.11 kJ
 Chip 0 Vdd:               44.80 kJ
 Chip 0 Vdn:               58.80 kJ
 Chip 0 Vdd:              +21.50 A  (lowest =  +6.50 A, highest = +104.75 A)
 Chip 0 Vdn:              +14.88 A  (lowest = +12.63 A, highest = +18.88 A)

The best thing? It’s really quick! The hwmon interface is fast and efficient.
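If you want the raw values without the sensors front end, the same data is exposed as plain files under the standard hwmon sysfs tree. A rough sketch only: the hwmon instance number varies between systems and boots, and label files are only present where the driver provides them.

grep . /sys/class/hwmon/hwmon*/name        # find the ibmpowernv instance
cat /sys/class/hwmon/hwmon0/temp*_label    # sensor names
cat /sys/class/hwmon/hwmon0/temp*_input    # temperatures in millidegrees C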

March 21, 2020

Using Ansible and dynamic inventory to manage OpenStack TripleO nodes

TripleO based OpenStack deployments use an OpenStack all-in-one node (undercloud) to automate the build and management of the actual cloud (overcloud) using native services such as Heat and Ironic. Roles are used to define services and configuration, which are then applied to specific nodes, for example, Service, Compute and CephStorage, etc.

Although the install is automated, sometimes you need to run adhoc tasks outside of the official update process. For example, you might want to make sure that all hosts are contactable, have a valid subscription (for Red Hat OpenStack Platform), restart containers, or maybe even apply custom changes or patches before an update. Also, during the update process when nodes are being rebooted, it can be useful to use an Ansible script to know when they’ve all come back, services are all running, all containers are healthy, before re-enabling them.

Inventory script

To make this easy, we can use the TripleO Ansible inventory script, which queries the undercloud to get a dynamic inventory of the overcloud nodes. When using the script as an inventory source with the ansible command however, you cannot pass arguments to it. If you’re managing a single cluster and using the standard stack name of overcloud, then this is not a problem; you can just call the script directly.

However, as I manage multiple clouds and each has a different Heat stack name, I create a little executable wrapper script to pass the stack name to the inventory script. Then I just call the relevant shell script instead. If you use the undercloud host to manage multiple stacks, then create multiple scripts and modify as required.

cat >> inventory-overcloud.sh << EOF
#!/usr/bin/env bash
source ~/stackrc
exec /usr/bin/tripleo-ansible-inventory --stack stack-name --list
EOF

Make it executable and run it. It should return JSON with your overcloud node details.

chmod u+x inventory-overcloud.sh
./inventory-overcloud.sh

Run simple tasks

The purpose of using the dynamic inventory is to run some Ansible! We can now use it to do simple things easily, like ping nodes to make sure they are online.

ansible \
--inventory inventory-overcloud.sh \
all \
--module-name ping

And of course one of the great things with Ansible is the ability to limit which hosts you’re running against. So for example, to make sure all compute nodes of role type Compute are back, simply replace all with Compute.

ansible \
--inventory inventory-overcloud.sh \
Compute \
--module-name ping

You can also specify nodes individually.

ansible \
--inventory inventory-overcloud.sh \
service-0,telemetry-2,compute-0,compute-1 \
--module-name ping

You can use the shell module to do simple adhoc things, like restart containers or maybe check their health.

ansible \
--inventory inventory-overcloud.sh \
all \
--module-name shell \
--become \
--args "docker ps |egrep "CONTAINER|unhealthy"'

And the same command using short arguments.

ansible \
-i inventory-overcloud.sh \
all \
-m shell \
-ba "docker ps |egrep "CONTAINER|unhealthy"'

Create some Ansible plays

You can see simple tasks are easy, for more complicated tasks you might want to write some plays.

Pre-fetch downloads before update

Your needs will probably vary, but here is a simple example to pre-download updates on my RHEL hosts to save time (updates are actually installed separately via overcloud update process). Note that the download_only option was added in Ansible 2.7 and thus I don’t use the yum module as RHEL uses Ansible 2.6.

cat >> fetch-updates.yaml << EOF
---
- hosts: all
  tasks:
    - name: Fetch package updates
      command: yum update --downloadonly
      register: result_fetch_updates
      retries: 30
      delay: 10
      until: result_fetch_updates is succeeded
      changed_when: '"Total size:" not in result_fetch_updates.stdout'
      args:
        warn: no
EOF

Now we can run this command against the next set of nodes we’re going to update, Compute and Telemetry in this example.

ansible-playbook \
--inventory inventory-overcloud.sh \
--limit Compute,Telemetry \
fetch-updates.yaml

And again, you could specify nodes individually.

ansible-playbook \
--inventory inventory-overcloud.sh \
--limit telemetry-0,service-0,compute-2,compute-3 \
fetch-updates.yaml

There you go. Using dynamic inventory can be really useful for running adhoc commands against your OpenStack nodes.

COVID-19 (of course)

We thought it timely to review a few facts and observations, relying on published medical papers (or those submitted for peer review) and reliable sources.

March 17, 2020

COVID-19 Time Series Analysis

On Friday 13 March I started looking at the COVID-19 time series case data. The first step was to fit a simple exponential model. The model lets us work out the number of cases t days in the future N(t), given N(0) cases today, and a doubling time of Td days:

N(t) = N(0)*2^(t/Td)

To work out how many days (t) to a number of cases N(t), you can re-arrange to get:

t = Td*log2(N(t)/N(0))
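Here’s a minimal sketch of those two formulas in Python (the same language as the covid19.py script mentioned below); the doubling_time helper is just the same relationship rearranged to convert a daily percentage increase into a doubling time:

from math import log2

def cases_after(n0, t, td):
    # N(t) = N(0) * 2^(t/Td): cases t days from now with a doubling time of Td days
    return n0 * 2 ** (t / td)

def days_to_reach(n0, nt, td):
    # t = Td * log2(N(t)/N(0)): days for cases to grow from N(0) to N(t)
    return td * log2(nt / n0)

def doubling_time(daily_increase):
    # Doubling time in days for a given daily growth rate, e.g. 0.1 for 10%
    return 1 / log2(1 + daily_increase)

# e.g. days_to_reach(1600, 70_000, 3) is about 16.4, as in the example below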

At the time I had some US travel planned for late March. So I plugged in some numbers to see how long it would take the US to get to 70,000 cases (China’s cases at the time):

t = 3*log2(70,000/1600) = 16.4 days

Wow. It slowly dawned on me that international travel was going to be a problem. The human mind just struggles to cope with the power of exponential growth. Five days seems a long time ago now….

I immediately grounded my parents – they are an at risk demographic and in a few weeks the hospitals will not be able to help them if they get sick. I estimate my home city of Adelaide (30 cases on March 18) will struggle with 1000 cases (a proportion of which will need Intensive Care):

t = 4*log2(1000/30) = 20 days

The low number of cases today is not important, the exponential growth is the critical factor.

Since then I’ve been messing with a customised covid19.py Python script to generate some plots useful to me. It is based on some code I found from Mohammad Ashhad. You might find it useful too, it’s easy to customise to other countries. I’d also appreciate a review of the script and math in this post.

I find that analysing the data gives me a small sense of control over the situation. And a useful crystal ball in this science fiction life we have suddenly started living.

Here are some plots from the last 14 days:


I find the second, log plot much more helpful. A constant positive slope on a log plot indicates exponential growth which is bad. We want the log plot to flatten out to a horizontal line.

Doubling time is the key metric. Here is a smoothed (3 day window) estimate. A low doubling time (e.g. a few days) is bad, our target is a high doubling time:

It’s a bit noisy at the moment. I’m interested in Spain and Italy as they have locked down. There will be a time lag as infections prior to lock down flow through to cases, but I expect (and sincerely hope) to see the doubling time of those countries improve, and new cases tapering off.

I’m working from home and hoping Australia will lock down soon. I will update the plots above daily.

All the best to everyone.

Update – March 26 2020

It’s been one week since I first published this post and I have been updating the graphs every day. My models are simplistic and I am not an epidemiologist. However I am sorry to say that exponential growth for Australia and the US has proceeded at the same rate or faster than the simple models above predicted.

Italy is showing a clear trend to an improved doubling time. The top plot shows almost linear growth. This is welcome news and will hopefully soon lead to a decreased load on their hospitals. This is encouraging to me as it shows lock down can work!

A small positive trend for Spain, who have also locked down; however Australia and the US are still doubling every 3-4 days. It’s clear from the second (log) plot that US cases will soon be the highest in the world.

Any changes we make in behaviour today will take 1-2 weeks to flow through. So this is a window into behavioural changes 1-2 weeks ago, and an estimate of the doubling rate for the next 1-2 weeks.

A daily case increase of 10% is a doubling time of 7.3 days (1 week). This intuitively feels like a good first milestone, and something expanding health systems have some chance of dealing with. It’s also easy to calculate in your head when looking at day by day statistics. A daily increase of 20% is a 3.8 day doubling time and very bad news.

Australia still doesn’t have a strong lock down, and many people are not staying at home. I hope our government acts decisively soon.

Update – April 3 2020

Another week has passed since my last update – a long time in the Coronavirus saga. A few days after my last update, I noticed the Australian new cases were constant at around 350 for a few days, then started to drop. The doubling time has shot up too, and the top graph looks almost linear now. Australia is now at about 5% new cases/day (300 new/5000 existing cases). We can handle that.

This means our hospitals are not going to break. Good news indeed. My theory on this reduction is the time-delayed effect of the Australian population starting to take Corona seriously, and good management by our state and federal governments. Several states have a lock down but the effect hasn’t flowed through to cases yet.

I think we are now entering a “whack a mole” stage, like China, Japan, and South Korea. We’ll have to remain vigilant, stay at home, and smash small outbreaks as they spring up in the community. Recoveries will eventually start to pick up and the number of active cases decline. The current numbers are a 0.5% fatality rate and 2% ICU admission rate.

Despite the appalling number of deaths in Italy and Spain, they clearly have new cases under control through lock down. The log curves are flat, and doubling times steadily increasing. The situation is very bad in the US, and many other countries. I am particularly concerned for the developing world.

I note the doubling rate curve for Spain and Australia is the same; Australia is just much further down the curve. Even my septuagenarian parents are behaving – mostly “staying home”.

Doing my own analysis has been really useful – I basically ignore the headlines (anyone sick of the word “surge”?) as I can look at the data and drill down to what matters. I’m picking trends a few days before they are reported. Still a few things to ponder, like a model for how ICU cases track reported cases.

Best wishes to everyone.

Links

Johns Hopkins CSSE COVID-19 Dashboard
Source Data
Our World in Data Coronavirus Statistics and Research

March 15, 2020

Using network namespaces with veth to NAT guests with overlapping IPs

Sets of virtual machines are connected to virtual bridges (e.g. virbr0 and virbr1) and, as they are isolated, can use the same subnet range and set of IPs. However, NATing becomes a problem because the host won’t know which VM to return the traffic to.

To solve this problem, we can use network namespaces and some veth (virtual Ethernet) devices to connect up each private network we want to NAT.

Each veth device acts like a patch cable and is actually made up of two network devices, one for each end (e.g. peer1-a and peer1-b). By adding those interfaces between bridges and/or namespaces, you create a link between them.

The network namespace is only used for NAT and is where the veth IPs are set; the other end acts like a patch cable without an IP. The VMs are only connected into their respective bridge (e.g. virbr0) and can talk to the network namespace over the veth patch.

We will use two pairs for each network namespace.

  • One (e.g. represented by veth1 below) which connects the virtual machine’s private network (e.g. virbr0 on 10.0.0.0/24) into the network namespace (e.g. net-ns1) where it sets an IP and will be the private network router (e.g. 10.0.0.1).
  • Another (e.g. represented by veth2 below) which connects the upstream provider network (e.g. br0 on 192.168.0.0/24) into the same network namespace where it sets an IP (e.g. 192.168.0.100).
  • Repeat the process for other namespaces (e.g. represented by veth3 and veth4 below).
[Diagram: configuration for multiple namespace NAT]

By providing each private network with its own unique upstream routable IP and applying NAT rules inside each namespace separately, we can avoid any conflict.

Create a provider bridge

You’ll need a bridge to a physical network, which will act as your upstream route (like a “provider” network).

ip link add name br0 type bridge
ip link set br0 up
ip link set eth0 up
ip link set eth0 master br0

Create namespace

We create our namespace to patch in the veth devices and hold the router and isolated NAT rules. As this is for the purpose of NATing multiple private networks, I’m making it sequential and calling this nat1 (for our first one, then I’ll call the next one nat2).

ip netns add nat1

First veth pair

Our first pair of veth peer interfaces will be used to connect the namespace to the upstream bridge (br0). Give them a name that makes sense to you; here I’m making it sequential again and specifying the purpose. Thus, peer1-br0 will connect to the upstream br0 and peer1-gw1 will be our routable IP in the namespace.

ip link add peer1-br0 type veth peer name peer1-gw1

Adding the veth to provider bridge

Now we need to add the peer1-br0 interface to the upstream provider bridge and bring it up. Note that we do not set an IP on this, it’s a patch lead. The IP will be on the other end in the namespace.

brctl addif br0 peer1-br0
ip link set peer1-br0 up

First gateway interface in namespace

Next we want to add the peer1-gw1 device to the namespace, give it an IP on the routable network, set the default gateway and bring the device up. Note that you could use DHCP for this; here I’m just setting an IP statically to 192.168.0.100 with a gateway of 192.168.0.1.

ip link set peer1-gw1 netns nat1
ip netns exec nat1 ip addr add 192.168.0.100/24 dev peer1-gw1
ip netns exec nat1 ip link set peer1-gw1 up
ip netns exec nat1 ip route add default via 192.168.0.1

Second veth pair

Now we create the second veth pair to connect the namespace into the private network. For this example we’ll be connecting to virbr0 network, where our first set of VMs are running. Again, give them useful names.

ip link add peer1-virbr0 type veth peer name peer1-gw2

Adding the veth to private bridge

Now we need to add the peer1-virbr0 interface to the virbr0 private network bridge. Note that we do not set an IP on this, it’s a patch lead. The IP will be on the other end in the namespace.

brctl addif virbr0 peer1-virbr0
ip link set peer1-virbr0 up

Second gateway interface in namespace

Next we want to add the peer1-gw2 device to the namespace, give it an IP on the private network and bring the device up. I’m going to set this to the default gateway of the VMs in the private network, which is 10.0.0.1.

ip link set peer1-gw2 netns nat1
ip netns exec nat1 ip addr add 10.0.0.1/24 dev peer1-gw2
ip netns exec nat1 ip link set up dev peer1-gw2

Enable NAT in the namespace

So now we have our namespace with patches into each bridge and IPs on each network. The final step is to enable network address translation.

ip netns exec nat1 iptables -t nat -A POSTROUTING -o peer1-gw1 -j MASQUERADE
ip netns exec nat1 iptables -A FORWARD -i peer1-gw1 -o peer1-gw2 -m state --state RELATED,ESTABLISHED -j ACCEPT
ip netns exec nat1 iptables -A FORWARD -i peer1-gw2 -o peer1-gw1 -j ACCEPT

You can see the rules with standard iptables in the namespace.

ip netns exec nat1 iptables -t nat -L -n

Test it

OK, so logging onto the VMs, they should have a local IP (e.g. 10.0.0.100), a default route to 10.0.0.1 and upstream DNS set. Test that they can ping the gateway, test that they can ping the DNS server, and test that they can ping a DNS name on the Internet.
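You can also sanity check things from inside the namespace itself before involving the VMs (a quick sketch, using a public DNS IP as the target):

ip netns exec nat1 ip addr show                 # addresses on both veth ends
ip netns exec nat1 ip route show                # default route via 192.168.0.1
ip netns exec nat1 ping -c 3 8.8.8.8            # can the namespace reach the Internet?
ip netns exec nat1 sysctl net.ipv4.ip_forward   # forwarding must be enabled here for VM traffic to be routed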

Rinse and repeat

This can be applied to other virtual machine networks as required. There is no longer any need for the VMs there to have unique IPs; they can overlap each other.

What you do need to do is create a new network namespace, create two new sets of veth pairs (with a useful name) and pick another IP on the routable network. The virtual machine gateway IP will be the same in each namespace, that is 10.0.0.1.
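Condensed into one block, a second namespace ends up looking something like this (a sketch only, assuming the second set of VMs lives on virbr1 and that 192.168.0.101 is free on the provider network):

ip netns add nat2

# Namespace to provider bridge, with the next free routable IP
ip link add peer2-br0 type veth peer name peer2-gw1
brctl addif br0 peer2-br0
ip link set peer2-br0 up
ip link set peer2-gw1 netns nat2
ip netns exec nat2 ip addr add 192.168.0.101/24 dev peer2-gw1
ip netns exec nat2 ip link set peer2-gw1 up
ip netns exec nat2 ip route add default via 192.168.0.1

# Namespace to the second private bridge, same 10.0.0.1 gateway as before
ip link add peer2-virbr1 type veth peer name peer2-gw2
brctl addif virbr1 peer2-virbr1
ip link set peer2-virbr1 up
ip link set peer2-gw2 netns nat2
ip netns exec nat2 ip addr add 10.0.0.1/24 dev peer2-gw2
ip netns exec nat2 ip link set peer2-gw2 up

# NAT rules, isolated inside nat2
ip netns exec nat2 iptables -t nat -A POSTROUTING -o peer2-gw1 -j MASQUERADE
ip netns exec nat2 iptables -A FORWARD -i peer2-gw1 -o peer2-gw2 -m state --state RELATED,ESTABLISHED -j ACCEPT
ip netns exec nat2 iptables -A FORWARD -i peer2-gw2 -o peer2-gw1 -j ACCEPT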

To be, or not to be decisive.

kattekrab, Sun, 15/03/2020 - 11:26

March 13, 2020

From 2020 to 2121: How will we get there?

kattekrab, Thu, 13/02/2020 - 19:35

6 reasons I love working from home (The COVID19 edition)

kattekrab, Fri, 13/03/2020 - 13:34

March 12, 2020

Coronavirus and Work

Currently the big news issue is all about how to respond to Coronavirus. The summary of the medical situation is that it’s going to spread exponentially (as diseases do) and that it has a period of up to 6 days of someone being infectious without having symptoms. So you can get a lot of infected people in an area without anyone knowing about it. Therefore preventative action needs to be taken before there’s widespread known infection.

Governments seem uninterested in doing anything about the disease before they have proof of widespread infection. They won’t do anything until it’s too late.

I finished my last 9-5 job late last year and haven’t taken a new one since. Now I’m thinking of just not taking any work that requires much time spent outside home. If you don’t go to a workplace there isn’t a lot you have to do that involves leaving home.

Shopping is one requirement for leaving home, but the two major supermarket chains in my area (Coles and Woolworths) both offer home delivery for a small price, so that covers most shopping. Getting groceries delivered means that they will usually come from the store room not the shop floor, so they wouldn’t have anyone coughing or sneezing on them. If you are really paranoid (which I’m not at the moment) then you could wear rubber gloves to bring the delivery in and then wash everything before using it. It seems that many people have similar ideas to me: normally Woolworths allows booking next-day delivery, but now you have to book at least 5 days (3 business days) in advance.

If anyone needs some Linux work done from remote then let me know. Otherwise I’ll probably spend the next couple of months at home doing Debian coding and watching documentaries on Netflix.

March 10, 2020

The Net Promoter Score: A Meaningless Flashing Light

Almost two years ago I made a short blog post about how the Net Promoter Score (NPS), commonly used in business settings, is The Most Useless Metric of All. My reasons at the time were that it doesn't capture the reasons for a low score, it doesn't differentiate between subjective values in its scores, and it is mathematically incoherent (a three-value grade from an 11-point range of 0-10). Further, actual studies rank it last in effectiveness.

Recently, the author of the NPS, Fred Reichheld, has come around. Apparently now It's Not About the Score, but rather the score represents a "signal". This is a very far cry from the initial claims in the Harvard Business Review that it is "The One Number You Need to Grow". Of course, it is very difficult for anyone to admit they've made an error, and Reichheld is no exception to this. Instead of addressing what are real problems of the NPS method, he now tries to argue that people have gamified the scores, and that's the real problem. It would be great if, as a general principle in business reasoning, people could just admit that their pet idea is flawed and build something better. That would be appreciated. Defending something that is clearly broken, even if it's your own idea, lacks intellectual humility, and is actually a bit embarrassing to watch.

Even as a signal, the NPS doesn't send a useful signal because the people being surveyed don't know what the signal means. In a scale where a 0 is equal to a 6, the scale is meaningless. There are, in fact, only three values in NPS (promoter, passive, detractor) and only one metric that it can possibly be testing: "How likely is it that you would recommend [company X] to a friend or colleague?"

Is that a useful question? Maybe for a generic good. It is far less useful for specialist goods. Do I recommend a three-day course in learning about job submission with Slurm for high-performance computing? Only to a few people that it would benefit. What score do I give? Maybe a 2, representing the number of people I would recommend it to? 3/11 is actually a lot in a quantitative sense, but that's the circles I mix with. Ah, but no; that makes me a detractor. And here we fall into the problem of subjective evaluation of the meaning of the scores.

The NPS doesn't send a useful signal also because the people receiving the survey have no idea what the signal means. "Wow, we're receiving a lot of positive promoters!", "Do you know why?", "Nope, but we must be doing something right. I wonder what it is?". It's like driving in the dark and congratulating your skills that you haven't gone off the edge of a cliff - yet. Who would do such a thing? The NPS, that's who.

To reiterate the post from two years ago, there are necessary changes needed to improve the NPS. Firstly, if you're going to have a ranking method, use all the ranks! Also, 1-10 is a 10-point scale (which I suspect was the intention), not 0-10 - that's 11 points (0 is an index, people!). Secondly, ensure that there are qualitative values assigned to the quantitative values; 6/10 is not a detractor in a normal distribution - it's a neutral, leaning to positive. Specify how the quantitative values correlate with qualitative descriptions. Thirdly, actively seek out reasons for the rating provided. If you don't have that data, all the signal will be is just that - a flashing light with no explanation. Without quantification and qualification, you simply cannot manage appropriately.

Finally, more questions! In managing customer loyalty, you will need to discover what they are being loyal to. It doesn't need to be overly long, just something that breaks down the experience that the customer can identify with. Customers may be lazy, but they're not that lazy. The benefit gained from a few questions provides much more insight than the loss of those customers who only answer one question: "A single item question is much less reliable and more volatile than a composite index." (Hill, Nigel; Roche, Greg; Allen, Rachel (2007). Customer Satisfaction: The Customer Experience through the Customer's Eyes). Yes, it is great to have people promoting your organisation or product. You know what else you need? Knowledge of what that flashing light means.

March 09, 2020

Terry2020 finally making the indoor beast more stable

Over time the old Terry robot had evolved from a basic "T" shape to have pan and tilt and a robot arm on board. The rear caster(s) were the weakest part of the robot, allowing the whole thing to rock around more than it should. I now have Terry 2020 on the cards.


Part of this is an upgrade to a Kinect2 for navigation. The power requirements of that (12v/3a or so) have led me to put a better dc-dc bus on board and some relays to be able to programmatically shut down and bring up features as needed and conserve power otherwise. The new base footprint is 300x400mm though the drive wheels stick out the side.

The wheels sticking out the sides is partially due to the planetary gear motors (on the underside) being quite long. If it is an issue I can recut the lowest layer alloy and move them inward, but I don't really need the absolute minimal turning circle. If that were the case I would move the drive wheels to the middle of the chassis so it could turn on its center.

There will be 4 layers at the moment and a mezzanine below the arm. So there will be expansion room included in the build :)

The rebuild will allow Terry to move at top speed when self driving. Terry will never move at the speed of an outdoor robot but can move closer to its potential when it rolls again.

March 08, 2020

Yet another near-upstream Raptor Blackbird firmware build

In what is becoming a monthly occurrence, I’ve put up yet another firmware build for the Raptor Blackbird with close-to-upstream firmware (see here and here for previous ones).

Well, I’ve done another build! It’s current op-build (as of yesterday), plus my branch with patches for the Raptor Blackbird. The skiboot patch is there, and the SBE speedup patch is now upstream. The machine-xml is straight from Raptor but in my repo.

Here’s the current versions of everything:

$ lsprop /sys/firmware/devicetree/base/ibm,firmware-versions/
skiboot          "v6.5-228-g82aed17a-p4360f95"
bmc-firmware-version
                 "0.00"
occ              "3ab2921"
hostboot         "acdff8a-pe7e80e1"
buildroot        "2019.05.3-15-g3a4fc2a888"
capp-ucode       "p9-dd2-v4"
machine-xml      "site_local-stewart-a0efd66"
hostboot-binaries
                 "hw013120a.opmst"
sbe              "c318ab0-p1ddf83c"
hcode            "hw030220a.opmst"
petitboot        "v1.12"
phandle          0000064c (1612)
version          "blackbird-v2.4-514-g62d1a941"
linux            "5.4.22-openpower1-pdbbf8c8"
name             "ibm,firmware-versions"

If we compare this to the last build I put up, we have:

Component           old                             new
skiboot             v6.5-209-g179d53df-p4360f95     v6.5-228-g82aed17a-p4360f95
linux               5.4.13-openpower1-pa361bec      5.4.22-openpower1-pdbbf8c8
occ                 3ab2921                         no change
hostboot            779761d-pe7e80e1                acdff8a-pe7e80e1
buildroot           2019.05.3-14-g17f117295f        2019.05.3-15-g3a4fc2a888
capp-ucode          p9-dd2-v4                       no change
machine-xml         site_local-stewart-a0efd66      no change
hostboot-binaries   hw011120a.opmst                 hw013120a.opmst
sbe                 166b70c-p06fc80c                c318ab0-p1ddf83c
hcode               hw011520a.opmst                 hw030220a.opmst
petitboot           v1.11                           v1.12
version             blackbird-v2.4-415-gb63b36ef    blackbird-v2.4-514-g62d1a941

So, what do those changes mean? Not too much changed over the past month: a kernel bump, a new petitboot (I can’t find release notes, but it doesn’t look like there are a lot of changes), and slight bumps to other firmware components.

Grab blackbird.pnor from https://www.flamingspork.com/blackbird/stewart-blackbird-4-images/ and give it a whirl!

To flash it, copy blackbird.pnor to your Blackbird’s BMC in /tmp/ (important! the /tmp filesystem has enough room, the home directory for root does not), and then run:

pflash -E -p /tmp/blackbird.pnor

Which will ask you to confirm and then flash:

About to erase chip !
WARNING ! This will modify your HOST flash chip content !
Enter "yes" to confirm:yes
Erasing... (may take a while)
[==================================================] 99% ETA:1s      
done !
About to program "/tmp/blackbird.pnor" at 0x00000000..0x04000000 !
Programming & Verifying...
[==================================================] 100% ETA:0s   

March 07, 2020

Fixing MariaDB InnoDB errors after upgrading to MythTV 30

After upgrading to MythTV 30 and MariaDB 10.3.18 on Debian buster, I noticed the following errors in my logs:

Jan 14 02:00:05 hostname mysqld[846]: 2020-01-14  2:00:05 62 [Warning] InnoDB: Cannot add field `rating` in table `mythconverg`.`internetcontentarticles` because after adding it, the row size is 8617 which is greater than maximum allowed size (8126) for a record on index leaf page.
Jan 14 02:00:05 hostname mysqld[846]: 2020-01-14  2:00:05 62 [Warning] InnoDB: Cannot add field `playcommand` in table `mythconverg`.`videometadata` because after adding it, the row size is 8243 which is greater than maximum allowed size (8126) for a record on index leaf page.

The root cause is that the database is using an InnoDB row format that cannot handle the new table sizes.
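If you want to confirm which tables are still on the older row format before changing anything, you can check information_schema first (a quick sketch, assuming the same mythconverg database and credentials as below):

mysql -umythtv -pPassword1 -e "SELECT table_name, row_format \
    FROM information_schema.tables \
    WHERE table_schema='mythconverg' AND row_format != 'Dynamic';"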

To fix it, I put the following in alter_tables.sql:

ALTER TABLE archiveitems ROW_FORMAT=DYNAMIC;
ALTER TABLE bdbookmark ROW_FORMAT=DYNAMIC;
ALTER TABLE callsignnetworkmap ROW_FORMAT=DYNAMIC;
ALTER TABLE capturecard ROW_FORMAT=DYNAMIC;
ALTER TABLE cardinput ROW_FORMAT=DYNAMIC;
ALTER TABLE channel ROW_FORMAT=DYNAMIC;
ALTER TABLE channelgroup ROW_FORMAT=DYNAMIC;
ALTER TABLE channelgroupnames ROW_FORMAT=DYNAMIC;
ALTER TABLE channelscan ROW_FORMAT=DYNAMIC;
ALTER TABLE channelscan_channel ROW_FORMAT=DYNAMIC;
ALTER TABLE channelscan_dtv_multiplex ROW_FORMAT=DYNAMIC;
ALTER TABLE codecparams ROW_FORMAT=DYNAMIC;
ALTER TABLE credits ROW_FORMAT=DYNAMIC;
ALTER TABLE customexample ROW_FORMAT=DYNAMIC;
ALTER TABLE diseqc_config ROW_FORMAT=DYNAMIC;
ALTER TABLE diseqc_tree ROW_FORMAT=DYNAMIC;
ALTER TABLE displayprofilegroups ROW_FORMAT=DYNAMIC;
ALTER TABLE displayprofiles ROW_FORMAT=DYNAMIC;
ALTER TABLE dtv_multiplex ROW_FORMAT=DYNAMIC;
ALTER TABLE dtv_privatetypes ROW_FORMAT=DYNAMIC;
ALTER TABLE dvdbookmark ROW_FORMAT=DYNAMIC;
ALTER TABLE dvdinput ROW_FORMAT=DYNAMIC;
ALTER TABLE dvdtranscode ROW_FORMAT=DYNAMIC;
ALTER TABLE eit_cache ROW_FORMAT=DYNAMIC;
ALTER TABLE filemarkup ROW_FORMAT=DYNAMIC;
ALTER TABLE gallery_directories ROW_FORMAT=DYNAMIC;
ALTER TABLE gallery_files ROW_FORMAT=DYNAMIC;
ALTER TABLE gallerymetadata ROW_FORMAT=DYNAMIC;
ALTER TABLE housekeeping ROW_FORMAT=DYNAMIC;
ALTER TABLE inputgroup ROW_FORMAT=DYNAMIC;
ALTER TABLE internetcontent ROW_FORMAT=DYNAMIC;
ALTER TABLE internetcontentarticles ROW_FORMAT=DYNAMIC;
ALTER TABLE inuseprograms ROW_FORMAT=DYNAMIC;
ALTER TABLE iptv_channel ROW_FORMAT=DYNAMIC;
ALTER TABLE jobqueue ROW_FORMAT=DYNAMIC;
ALTER TABLE jumppoints ROW_FORMAT=DYNAMIC;
ALTER TABLE keybindings ROW_FORMAT=DYNAMIC;
ALTER TABLE keyword ROW_FORMAT=DYNAMIC;
ALTER TABLE livestream ROW_FORMAT=DYNAMIC;
ALTER TABLE logging ROW_FORMAT=DYNAMIC;
ALTER TABLE music_albumart ROW_FORMAT=DYNAMIC;
ALTER TABLE music_albums ROW_FORMAT=DYNAMIC;
ALTER TABLE music_artists ROW_FORMAT=DYNAMIC;
ALTER TABLE music_directories ROW_FORMAT=DYNAMIC;
ALTER TABLE music_genres ROW_FORMAT=DYNAMIC;
ALTER TABLE music_playlists ROW_FORMAT=DYNAMIC;
ALTER TABLE music_radios ROW_FORMAT=DYNAMIC;
ALTER TABLE music_smartplaylist_categories ROW_FORMAT=DYNAMIC;
ALTER TABLE music_smartplaylist_items ROW_FORMAT=DYNAMIC;
ALTER TABLE music_smartplaylists ROW_FORMAT=DYNAMIC;
ALTER TABLE music_songs ROW_FORMAT=DYNAMIC;
ALTER TABLE music_stats ROW_FORMAT=DYNAMIC;
ALTER TABLE music_streams ROW_FORMAT=DYNAMIC;
ALTER TABLE mythlog ROW_FORMAT=DYNAMIC;
ALTER TABLE mythweb_sessions ROW_FORMAT=DYNAMIC;
ALTER TABLE networkiconmap ROW_FORMAT=DYNAMIC;
ALTER TABLE oldfind ROW_FORMAT=DYNAMIC;
ALTER TABLE oldprogram ROW_FORMAT=DYNAMIC;
ALTER TABLE oldrecorded ROW_FORMAT=DYNAMIC;
ALTER TABLE people ROW_FORMAT=DYNAMIC;
ALTER TABLE phonecallhistory ROW_FORMAT=DYNAMIC;
ALTER TABLE phonedirectory ROW_FORMAT=DYNAMIC;
ALTER TABLE pidcache ROW_FORMAT=DYNAMIC;
ALTER TABLE playgroup ROW_FORMAT=DYNAMIC;
ALTER TABLE powerpriority ROW_FORMAT=DYNAMIC;
ALTER TABLE profilegroups ROW_FORMAT=DYNAMIC;
ALTER TABLE program ROW_FORMAT=DYNAMIC;
ALTER TABLE programgenres ROW_FORMAT=DYNAMIC;
ALTER TABLE programrating ROW_FORMAT=DYNAMIC;
ALTER TABLE recgrouppassword ROW_FORMAT=DYNAMIC;
ALTER TABLE recgroups ROW_FORMAT=DYNAMIC;
ALTER TABLE record ROW_FORMAT=DYNAMIC;
ALTER TABLE record_tmp ROW_FORMAT=DYNAMIC;
ALTER TABLE recorded ROW_FORMAT=DYNAMIC;
ALTER TABLE recordedartwork ROW_FORMAT=DYNAMIC;
ALTER TABLE recordedcredits ROW_FORMAT=DYNAMIC;
ALTER TABLE recordedfile ROW_FORMAT=DYNAMIC;
ALTER TABLE recordedmarkup ROW_FORMAT=DYNAMIC;
ALTER TABLE recordedprogram ROW_FORMAT=DYNAMIC;
ALTER TABLE recordedrating ROW_FORMAT=DYNAMIC;
ALTER TABLE recordedseek ROW_FORMAT=DYNAMIC;
ALTER TABLE recordfilter ROW_FORMAT=DYNAMIC;
ALTER TABLE recordingprofiles ROW_FORMAT=DYNAMIC;
ALTER TABLE recordmatch ROW_FORMAT=DYNAMIC;
ALTER TABLE scannerfile ROW_FORMAT=DYNAMIC;
ALTER TABLE scannerpath ROW_FORMAT=DYNAMIC;
ALTER TABLE schemalock ROW_FORMAT=DYNAMIC;
ALTER TABLE settings ROW_FORMAT=DYNAMIC;
ALTER TABLE storagegroup ROW_FORMAT=DYNAMIC;
ALTER TABLE tvchain ROW_FORMAT=DYNAMIC;
ALTER TABLE tvosdmenu ROW_FORMAT=DYNAMIC;
ALTER TABLE upnpmedia ROW_FORMAT=DYNAMIC;
ALTER TABLE user_permissions ROW_FORMAT=DYNAMIC;
ALTER TABLE user_sessions ROW_FORMAT=DYNAMIC;
ALTER TABLE users ROW_FORMAT=DYNAMIC;
ALTER TABLE videocast ROW_FORMAT=DYNAMIC;
ALTER TABLE videocategory ROW_FORMAT=DYNAMIC;
ALTER TABLE videocollection ROW_FORMAT=DYNAMIC;
ALTER TABLE videocountry ROW_FORMAT=DYNAMIC;
ALTER TABLE videogenre ROW_FORMAT=DYNAMIC;
ALTER TABLE videometadata ROW_FORMAT=DYNAMIC;
ALTER TABLE videometadatacast ROW_FORMAT=DYNAMIC;
ALTER TABLE videometadatacountry ROW_FORMAT=DYNAMIC;
ALTER TABLE videometadatagenre ROW_FORMAT=DYNAMIC;
ALTER TABLE videopart ROW_FORMAT=DYNAMIC;
ALTER TABLE videopathinfo ROW_FORMAT=DYNAMIC;
ALTER TABLE videosource ROW_FORMAT=DYNAMIC;
ALTER TABLE videotypes ROW_FORMAT=DYNAMIC;
ALTER TABLE weatherdatalayout ROW_FORMAT=DYNAMIC;
ALTER TABLE weatherscreens ROW_FORMAT=DYNAMIC;
ALTER TABLE weathersourcesettings ROW_FORMAT=DYNAMIC;

and then ran it like this:

mysql -umythtv -pPassword1 mythconverg < alter_tables.sql
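If your schema has a different set of tables (MythTV versions vary), you could generate the ALTER statements instead of maintaining the list by hand; a sketch, again assuming the mythconverg database:

mysql -umythtv -pPassword1 --batch --skip-column-names \
    -e "SELECT CONCAT('ALTER TABLE ', table_name, ' ROW_FORMAT=DYNAMIC;') \
        FROM information_schema.tables \
        WHERE table_schema='mythconverg' AND engine='InnoDB';" > alter_tables.sql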

March 06, 2020

Making SIP calls to VoIP.ms subscribers without using the PSTN

If you want to reach a VoIP.ms subscriber from Asterisk without using the PSTN, there is a way to do so via SIP URIs.

Here's what I added to my /etc/asterisk/extensions.conf:

exten => 1234,1,Set(CALLERID(all)=Francois Marier <5555551234>)
exten => 1234,n,Dial(SIP/sip.voip.ms/5555556789)

March 04, 2020

Configuring load balancing and location headers on Google Cloud


I have a need at the moment to know where my users are in the world. This helps me to identify what compute resources to serve their request with in order to reduce the latency they experience. So how do you do that thing with Google Cloud?

The first step is to set up a series of test backends to send traffic to. I built three regions: Sydney; London; and Los Angeles. It turns out in hindsight that wasn’t actually necessary though — this would work with a single backend just as well. For my backends I chose a minimal Ubuntu install, running this simple backend HTTP service.
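For illustration, a backend along these lines is enough (a minimal sketch, not the exact service used above): it answers the health check on /healthz and echoes the request headers back for everything else, so the headers inserted by the load balancer are easy to see.

#!/usr/bin/env python3
# Minimal test backend: /healthz returns "OK" for the load balancer health
# check, anything else echoes the request headers so the X-Region, X-City
# and X-Lat-Lon values inserted by the load balancer are visible.
from http.server import BaseHTTPRequestHandler, HTTPServer

class EchoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b'OK' if self.path == '/healthz' else str(self.headers).encode()
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain')
        self.end_headers()
        self.wfile.write(body)

if __name__ == '__main__':
    # Port 8080 to avoid needing root; point the health check and the
    # backend service port at whatever you choose here.
    HTTPServer(('0.0.0.0', 8080), EchoHandler).serve_forever()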

I had some initial trouble finding a single page which walked through the setup of the Google Cloud load balancer to do what I wanted, which is the main reason for writing this post. The steps are:

Create your test instances and configure the backend on them. I ended up with a setup like this:

[Screenshot: a list of Google Cloud VMs]

Next setup instance groups to contain these instances. I chose unmanaged instance groups (that is, I don’t want autoscaling). You need to create one per region.

[Screenshot: a list of Google Cloud instance groups]

But wait! There’s one more layer of abstraction. We need a backend service. The configuration for these is cunningly hidden on the load balancing page, on a separate tab. Create a service which contains our three instance groups:

[Screenshot: a sample backend service]

I’ve also added a health check to my service, which just requests “/healthz” from each instance and expects a response of “OK” for healthy backends.

The backend service is also where we configure our extra headers. Click on the “advanced configurations” link, and more options appear:

[Screenshot: additional backend service options]

Here I set up the extra HTTP headers the load balancer should insert: X-Region; X-City; and X-Lat-Lon.
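The same thing can be done from the command line instead of the console. Something along these lines should work, though the flag and the {client_*} variable names here are from my reading of the gcloud documentation rather than from the setup above, so verify them before relying on this (my-backend-service is a placeholder):

gcloud compute backend-services update my-backend-service --global \
    --custom-request-header='X-Region:{client_region}' \
    --custom-request-header='X-City:{client_city}' \
    --custom-request-header='X-Lat-Lon:{client_city_lat_long}'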

And finally we can configure the load balancer. I selected a “HTTP(S) load balancer”, as I only care about incoming HTTP and HTTPS traffic. You set the frontend to take traffic from the Internet, and wire the load balancer’s backend to the backend service you created above.

Now we can test! If I go to my load balancer in a web browser, I now get a result like this:

The top part of the page is just the HTTP headers from the request. You can see that we’re now getting helpful location headers. Mission accomplished!


March 03, 2020

Using Ansible to define and manage KVM guests and networks with YAML inventories

I wanted a way to quickly spin different VMs up and down on my KVM dev box, to help with testing things like OpenStack, Swift, Ceph and Kubernetes. Some of my requirements were as follows:

  • Define everything in a markup language, like YAML
  • Manage VMs (define, stop, start, destroy and undefine) and apply settings as a group or individually
  • Support different settings for each VM, like disks, memory, CPU, etc
  • Support multiple drives and types, including Virtio, SCSI, SATA and NVMe
  • Create users and set root passwords
  • Manage networks (create, delete) and which VMs go on them
  • Mix and match Linux distros and releases
  • Use existing cloud images from distros
  • Manage access to the VMs including DNS/hosts resolution and SSH keys
  • Have a good set of defaults so it would work out of the box
  • Potentially support other architectures (like ppc64le or arm)

So I hacked together an Ansible role and example playbook. Setting guest states to running, shutdown, destroyed or undefined (to delete and clean up) is supported. It will also manage multiple libvirt networks, and guests can have different specs as well as multiple disks of different types (SCSI, SATA, Virtio, NVMe). With Ansible’s --limit option, any individual guest, a hostgroup of guests, or even a mix can be managed.

Managing KVM guests with Ansible

Although Terraform with libvirt support is potentially a good solution, by using Ansible I can use that same inventory to further manage the guests and I’ve also been able to configure the KVM host itself. All that’s really needed is a Linux host capable of running KVM, some guest images and a basic inventory. The Ansible will do the rest (on supported distros).

The README is quite detailed, so I won’t repeat all of that here. The sample playbook comes with some example inventories, such as this simple one for spinning up three CentOS hosts (and using defaults).

simple:
  hosts:
    centos-simple-[0:2]:
      ansible_python_interpreter: /usr/bin/python

This can be executed like so.

curl -O https://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud.qcow2
sudo mv -iv CentOS-7-x86_64-GenericCloud.qcow2 /var/lib/libvirt/images/

git clone --recursive https://github.com/csmart/virt-infra-ansible.git
cd virt-infra-ansible

ansible-playbook --limit kvmhost,simple ./virt-infra.yml

There is also a more detailed example inventory that uses multiple distros and custom settings for the guests.

So far this has been very handy!

March 02, 2020

Amazon Prime and Netflix

I’ve been trying both Amazon Prime and Netflix. I signed up for the free month of Amazon Prime to watch “Good Omens” and “Picard”. “Good Omens” is definitely worth the effort of setting up the free month of Amazon Prime (and worth a month’s subscription if you have used your free month in the past), and Picard is OK.

Content

Amazon Prime has a medium amount of other content. I’m now paying for a month of Amazon Prime mainly because there are enough documentaries to fill a month; for reference, there are plenty of good ones about war and about space exploration. There are also some really rubbish documentaries, for example a 2 part documentary about the Magna Carta where the second part starts with Grover Norquist claiming that the Magna Carta is justification for not having any taxes (the first part seemed ok).

Netflix has a lot of great content. A big problem with Netflix is that there aren’t good ways of searching and organising the content you want to watch. It would be really nice if Netflix could use some machine learning for recommendations and recommend shows based on what I’ve liked and also what I’ve disliked.

On both Netflix and Amazon when you view the details of a show it gives a short list of similar shows which is nice. With Amazon I have no complaints about that. But with Netflix the content library is so great that you get lost in a maze of links. On the Android tablet interface for Netflix it shows 12 similar shows in a grid and on the web interface it’s a row of 20 shows with looped scrolling. Then as you click a different show you get another list of 12/20 shows which will usually have some overlap with the previous one. It would be nice if you could easily swipe left on shows you don’t like to avoid having them repeatedly presented to you.

On Netflix I’ve really enjoyed the “Altered Carbon” series (which is significantly more violent than I anticipated), “Black Mirror” (the episode written by Trent Reznor and starring Miley Cyrus is particularly good), and “Love Death and Robots”. Overall I currently rate “Love Death and Robots” as in many ways the best series I’ve ever watched because the episodes are all short and get straight to the point. One advantage of online video is that they don’t need to pad episodes out or cut them short to fit a TV time slot, they can use as much time as necessary to tell the story.

Watch List

Having a single row of shows to watch is fine for the amount of content that Amazon has, but for the Netflix content you can easily get 100 shows on your watch list and it would be good to be able to search my watch list by genre (it’s a drag to flick through dozens of icons of war documentaries when I’m in the mood for an action movie as the icons are somewhat similar).

As well as the list of shows you selected to watch, Netflix has a separate list of shows you have recently watched, with no way to edit it. So if you watch 5 minutes of a show and decide that it sucks, it stays on that list until you have partially watched 10 other shows. For my usage the recently watched list is the most important thing, as I’m watching some serial shows and wouldn’t want to go through the 100 shows on my watch list to find them. If I’ve decided that a movie sucked after watching a bit of it I don’t want to be reminded of it by seeing the icon every time I use Netflix for the next month.

Amazon has only a single “watch next” list for shows that you have watched recently and shows that you selected as worth watching. It allows editing the list which is nice, but then Amazon also often keeps shows on the list when you have finished watching them and removed them from the “to watch” setting. Amazon’s watch list is also generally buggy, at one time it decided that a movie was no longer available in my region but didn’t let me remove it from the list.

Quality

Apparently the Netflix web interface on Linux only allows 720p video while the Amazon web interface on all platforms is limited to 720p. In any case my Internet connection is probably only good enough for 1080p at most. I haven’t noticed any quality differences between Netflix and Amazon Prime.

Multiple Users

Netflix allows you to create profiles for multiple users with separate watch lists which is very handy. They also don’t have IP address restrictions so it’s a common practice for people to share a Netflix account with relatives. If you try to use Netflix when the maximum number of sessions for your account is in use it will show a list of what the other people on your account are watching (so if you share with your parents be careful about that).

Amazon doesn’t allow creating multiple profiles, but the content isn’t that great anyway. The trend in video streaming is for proprietary content to force users to subscribe to a service, so sharing an Amazon Prime account with a few people who want to watch the proprietary content would make sense.

Watching Patterns

Sometimes when I’m particularly distracted I can’t focus on one show for any length of time. Both Amazon and Netflix (and probably all other online streaming services) allow me to skip between shows easily. That’s always been a feature of YouTube, but with YouTube you get recommended increasingly viral content until you find yourself watching utter rubbish. At least with Amazon and Netflix there is a minimum quality level even if that is reality TV.

Conclusion

Amazon Prime has a smaller range of content and some really rubbish documentaries. I don’t mind the documentaries about UFOs and other fringe stuff as it’s obvious what it is and you can avoid it. A documentary that has me watching for an hour before it’s revealed to be a promo for Grover Norquist is really bad; did the hour of it that I watched have good content, or just rubbish too?

Netflix has a huge range of content and the quality level is generally very high.

If you are going to watch TV then subscribing to Netflix is probably a good idea. It’s reasonably cheap, has a good (not great) interface, and has a lot of content including some great original content.

For Amazon maybe subscribe for 1 month every second year to binge watch the Amazon proprietary content that interests you.

March 01, 2020

Upgrading a DMR hotspot to Pi-Star 4.1 RC-7

While the Pi-Star DMR gateway has automatic updates, the latest release (3.4.17) is still based on an old version of Debian (Debian 8 jessie) which is no longer supported. It is however relatively easy to update to the latest release candidate of Pi-Star 4.1 (based on Debian 10 buster), as long as you have a spare SD card (4 GB minimum).

Download the required files and copy to an SD card

First of all, download the latest RC image and your local configuration (remember the default account name is pi-star). I like to also print a copy of that settings page since it's much easier to refer to if things go wrong.

Then unzip the image and "burn" it to a new SD card (no need to format it ahead of time):

sudo dd if=Pi-Star_RPi_V4.1.0-RC7_20-Dec-2019.img of=/dev/sdX status=progress bs=4M
sync

where /dev/sdX is the device name for the SD card, which you can find in the dmesg output. Don't skip the sync command or you may eject the card before your computer is done writing to it.
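For example, to identify the right device you can look at the last few kernel messages after plugging the card in, or list the block devices and pick the one matching the card's size:

dmesg | tail
lsblk -o NAME,SIZE,MODEL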

Then unmount the SD card and unplug it from your computer. Plug it back in. You should see two drives mounted automatically on your desktop:

  • pistar
  • boot

Copy the configuration zip file you downloaded earlier onto the root of the boot drive and then eject the drive.

Run sync again before actually unplugging the card.

Boot into the new version

In order to boot into the new version, start by turning off the Pi. Then remove the old SD card and insert the new one that you just prepared. That new card will become the new OS drive.

Boot the Pi and ideally connect a monitor to the HDMI port so that you can see it boot up and reboot twice before dropping you to a login prompt.

Login using the default credentials:

  • Username: pi-star
  • Password: raspberry

Once logged in use top to see if the pi is busy doing anything. Mine was in the process of upgrading Debian packages via unattended-upgrades which made everything (including the web UI) very slow.

You should now be able to access the web UI using the above credentials.

Update to the latest version

From the command line, you can ensure that you are running the latest version of Pi-Star by running the following command:

sudo pistar-upgrade

This updated from 4.1.0-RC7 to 4.1.0-RC8 on my device.

You can also run the following:

sudo pistar-update

to update the underlying Raspbian OS.

Check and restore your settings

Once things have settled down, double-check the settings and restore your admin password since that was not part of the configuration backup you made earlier.

I had to restore the following settings since they got lost in the process:

  • Auto AP: Off
  • uPNP: Off

Roll back to the previous version

If you run into problems, the best option is to roll back to the previous version and then try again.

As long as you didn't reuse the original SD card for this upgrade, rolling back to version 3.4.17 simply involves shutting down the pi and then swapping the new SD card for the old one and then starting it up again.

Audiobooks – February 2020

A Reminder of my rating System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average, in the middle 70%
  • 2/5 = Disappointing
  • 1/5 = Did not like at all

Abraham Lincoln: A Life (Volume Two) by Michael Burlingame

2nd volume covering Lincoln’s time as president. Lots of quotes from contemporary sources. Fairly good coverage of just about everything. 3/5

Capital City: Gentrification and the Real Estate State
by Samuel Stein

Some interesting insights although everything being about New York and very left-wing politics of the author muddle the message. Worth a read if you are into the topic. 3/5

Young Men and Fire by Norman Maclean

The story of the 1949 Mann Gulch fire that killed 13 smoke jumpers. Misses a point due to lots of talking to maps/photographs but still a gripping story. 3/5

The Walls Have Ears: The Greatest Intelligence Operation of World War II by Helen Fry

The secret British operation to bug German POWs to obtain military intelligence. Only declassified in the late 1990s so very few personal recollections, but an interesting story. 3/5


February 29, 2020

Links February 2020

Truthout has an interesting summary of the US “Wars Without Victory and Weapons Without End” [1]. The Korean war seems mostly a win for the US though.

The Golden Age of White Collar Crime is an informative article about the epidemic of rich criminals in the US that are protected at the highest levels [2]. This disproves the claims about gun ownership preventing crime. AFAIK no-one has shot a corporate criminal in spite of so many deserving it.

Law and Political Economy has an insightful article “Privatizing Sovereignty, Socializing Property: What Economics Doesn’t Teach You About the Corporation” [3]. It makes sense of the corporation law system.

IDR labs has a communism test, I scored 56% [4].

Vice has an interesting article about companies providing free email programs and services and then selling private data [5]. The California Consumer Privacy Act is apparently helping as companies that do business in the US can’t be sure which customers are in CA and need to comply with it for all users. Don’t trust corporations with your private data.

The Atlantic has an interesting article about Coronavirus and the Blindness of Authoritarianism [6]. The usual problem of authoritarianism but with a specific example from China. The US is only just starting its experiment with authoritarianism and they are making the same mistakes.

The Atlantic has an insightful article about Coronavirus and its effect on China’s leadership [7]. It won’t change things much.

On The Commons has an insightful article We Now Have a Justice System Just for Corporations [8]. In the US corporations can force people into arbitration for most legal disputes, as they pay the arbitration companies the arbitration almost always gives the company the result they pay for.

Boing Boing has an interesting article about conspiracy theories [9]. Their point is that some people have conspiracy theories (meaning belief in conspiracies that is not based in fact) due to having seen real conspiracies at close range. I think this only applies to a minority of people who believe conspiracy theories, and probably only to people who believe in a very small number of conspiracies. It seems that most people who believe in conspiracy theories believe in many of them.

Douglas Rushkoff wrote a good article about rich people who are making plans to escape after they destroy the environment [10]. Includes the idea of having shock-collars for security guards to stop them going rogue.

Boing Boing has an interesting article on the Brahmin Left and the Merchant Right [11]. It has some good points about the left side of politics representing the middle class more than the working class, especially the major left wing parties that are more centrist nowadays (like Democrats in the US and Labor in Australia).

February 27, 2020

Linux Security Summit North America 2020: CFP and Registration

The CFP for the 2020 Linux Security Summit North America is currently open, and closes on March 31st.

The CFP details are here: https://events.linuxfoundation.org/linux-security-summit-north-america/program/cfp/

You can register as an attendee here: https://events.linuxfoundation.org/linux-security-summit-north-america/register/

Note that the conference this year has moved from August to June (24-26).  The location is Austin, TX, and we are co-located with the Open Source Summit as usual.

We’ll be holding a 3-day event again, after the success of last year’s expansion, which provides time for tutorials and ad-hoc break out sessions.  Please note that if you intend to submit a tutorial, you should be a core developer of the project or otherwise recognized leader in the field, per this guidance from the CFP:

Tutorial sessions should be focused on advanced Linux security defense topics within areas such as the kernel, compiler, and security-related libraries.  Priority will be given to tutorials created for this conference, and those where the presenter is a leading subject matter expert on the topic.

This will be the 10th anniversary of the Linux Security Summit, which was first held in 2010 in Boston as a one day event.

Get your proposals for 2020 in soon!

February 16, 2020

DisplayPort and 4K

The Problem

Video playback looks better with a higher scan rate. A lot of content that was designed for TV (EG almost all historical documentaries) is going to be 25Hz interlaced (UK and Australia) or 30Hz interlaced (US). If you view that on a low refresh rate progressive scan display (EG a modern display at 30Hz) then my observation is that it looks a bit strange. Things that move seem to jump a bit and it’s distracting.

Getting HDMI to work with 4K resolution at a refresh rate higher than 30Hz seems difficult.

What HDMI Can Do

According to the HDMI Wikipedia page [1], HDMI 1.3–1.4b (introduced in June 2006) supports 30Hz refresh at 4K resolution, and if you use 4:2:0 Chroma Subsampling (see the Chroma Subsampling Wikipedia page [2]) you can do 60Hz or 75Hz on HDMI 1.3–1.4b. Basically for colour 4:2:0 means half the horizontal and half the vertical resolution while giving the same resolution for monochrome. For video that apparently works well (4:2:0 is standard for Blu-ray) and for games it might be OK, but for text (my primary use of computers) it would suck.

So I need support for HDMI 2.0 (introduced in September 2013) on the video card and monitor to do 4K at 60Hz. Apparently none of the combinations of video card and HDMI cable I use for Linux support that.

HDMI Cables

The Wikipedia page alleges that you need either a “Premium High Speed HDMI Cable” or a “Ultra High Speed HDMI Cable” for 4K resolution at 60Hz refresh rate. My problems probably aren’t related to the cable as my testing has shown that a cheap “High Speed HDMI Cable” can work at 60Hz with 4K resolution with the right combination of video card, monitor, and drivers. A Windows 10 system I maintain has a Samsung 4K monitor and a NVidia GT630 video card running 4K resolution at 60Hz (according to Windows). The NVidia GT630 card is one that I tried on two Linux systems at 4K resolution and causes random system crashes on both, it seems like a nice card for Windows but not for Linux.

Apparently the HDMI devices test the cable quality and use whatever speed seems to work (the cable isn’t identified to the devices). The prices at a local store are $3.98 for “high speed”, $19.88 for “premium high speed”, and $39.78 for “ultra high speed”. It seems that trying a “high speed” cable first before buying an expensive cable would make sense, especially for short cables which are likely to be less susceptible to noise.

What DisplayPort Can Do

According to the DisplayPort Wikipedia page [3] versions 1.2–1.2a (introduced in January 2010) support HBR2 which on a “Standard DisplayPort Cable” (which probably means almost all DisplayPort cables that are in use nowadays) allows 60Hz and 75Hz 4K resolution.

Comparing HDMI and DisplayPort

In summary to get 4K at 60Hz you need 2010 era DisplayPort or 2013 era HDMI. Apparently some video cards that I currently run for 4K (which were all bought new within the last 2 years) are somewhere between a 2010 and 2013 level of technology.

Also my testing (and reading review sites) shows that it’s common for video cards sold in the last 5 years or so to not support HDMI resolutions above FullHD, that means they would be HDMI version 1.1 at the greatest. HDMI 1.2 was introduced in August 2005 and supports 1440p at 30Hz. PCIe was introduced in 2003 so there really shouldn’t be many PCIe video cards that don’t support HDMI 1.2. I have about 8 different PCIe video cards in my spare parts pile that don’t support HDMI resolutions higher than FullHD so it seems that such a limitation is common.

The End Result

For my own workstation I plugged a DisplayPort cable between the monitor and video card and a Linux window appeared (from KDE I think) offering me some choices about what to do. I chose to switch to the “new monitor” on DisplayPort and that defaulted to 60Hz. After that change TV shows on Netflix and Amazon Prime both look better. So it’s a good result.
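As a quick sanity check you can ask xrandr which mode is actually active; the current mode for each connected output is the one marked with an asterisk:

xrandr --query | grep -w connected
xrandr --query | grep '\*'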

As an aside DisplayPort cables are easier to scrounge as the HDMI cables get taken by non-computer people for use with their TV.

February 15, 2020

Self Assessment

Background Knowledge

The Dunning Kruger Effect [1] is something everyone should read about. It’s the effect where people who are bad at something rate themselves higher than they deserve because their inability to notice their own mistakes prevents improvement, while people who are good at something rate themselves lower than they deserve because noticing all their mistakes is what allows them to improve.

Noticing all your mistakes all the time isn’t great (see Impostor Syndrome [2] for where this leads).

Erik Dietrich wrote an insightful article “How Developers Stop Learning: Rise of the Expert Beginner” [3] which I recommend that everyone reads. It is about how some people get stuck at a medium level of proficiency and find it impossible to unlearn bad practices which prevent them from achieving higher levels of skill.

What I’m Concerned About

A significant problem in large parts of the computer industry is that it’s not easy to compare various skills. In the sport of bowling (which Erik uses as an example) it’s easy to compare your score against people anywhere in the world, if you score 250 and people in another city score 280 then they are more skilled than you. If I design an IT project that’s 2 months late on delivery and someone else designs a project that’s only 1 month late are they more skilled than me? That isn’t enough information to know. I’m using the number of months late as an arbitrary metric of assessing projects, IT projects tend to run late and while delivery time might not be the best metric it’s something that can be measured (note that I am slightly joking about measuring IT projects by how late they are).

If the last project I personally controlled was 2 months late and I’m about to finish a project 1 month late does that mean I’ve increased my skills? I probably can’t assess this accurately as there are so many variables. The Impostor Syndrome factor might lead me to think that the second project was easier, or I might get egotistical and think I’m really great, or maybe both at the same time.

This is one of many resources recommending timely feedback for education [4]; it says “Feedback needs to be timely” and “It needs to be given while there is still time for the learners to act on it and to monitor and adjust their own learning”. For basic programming tasks such as debugging a crashing program the feedback is reasonably quick. For longer term tasks like assessing whether the choice of technologies for a project was good the feedback cycle is almost impossibly long. If I used product A for a year long project does it seem easier than product B because it is easier or because I’ve just got used to its quirks? Did I make a mistake at the start of a year long project and if so do I remember why I made that choice I now regret?

Skills that Should be Easy to Compare

One would imagine that martial arts is a field where people have very realistic understanding of their own skills, a few minutes of contest in a ring, octagon, or dojo should show how your skills compare to others. But a YouTube search for “no touch knockout” or “chi” shows that there are more than a few “martial artists” who think that they can knock someone out without physical contact – with just telepathy or something. George Dillman [5] is one example of someone who had some real fighting skills until he convinced himself that he could use mental powers to knock people out. From watching YouTube videos it appears that such people convince the members of their dojo of their powers, and those people then faint on demand “proving” their mental powers.

The process of converting an entire dojo into believers in chi seems similar to the process of converting a software development team into “expert beginners”, except that martial art skills should be much easier to assess.

Is it ever possible to assess any skills if people trying to compare martial art skills often do it so badly?

Conclusion

It seems that any situation where one person is the undisputed expert has a risk of the “chi” problem if the expert doesn’t regularly meet peers to learn new techniques. If someone like George Dillman or one of the “expert beginners” that Erik Dietrich refers to was to regularly meet other people with similar skills and accept feedback from them they would be much less likely to become a “chi” master or “expert beginner”. For the computer industry meetup.com seems the best solution to this, whatever your IT skills are you can find a meetup where you can meet people with more skills than you in some area.

Here’s one of many guides to overcoming Imposter Syndrome [5]. Actually succeeding in following the advice of such web pages is not going to be easy.

I wonder if getting a realistic appraisal of your own skills is even generally useful. Maybe the best thing is to just recognise enough things that you are doing wrong to be able to improve and to recognise enough things that you do well to have the confidence to do things without hesitation.

February 14, 2020

Bidirectional rc joystick

With a bit of tinkering one can use the https://github.com/bmellink/IBusBM library to send information back to the remote controller. The info is tagged as either temperature, rpm, or voltage and the units are set based on that. There is a limit of 9 user feedback values so I have 3 of each exposed.


To do this I used one of the Mega 2560 boards that is in a small form factor configuration. This gave me 5 volts to run the actual rc receiver from and more than one UART to talk to the usb, input and output parts of the buses. I think you only need 2 UARTs but as I had a bunch I just used separate ones.

The 2560 also gives a lavish amount of ram, so using ROS topics doesn't really matter. I have 9 subscribers and 1 publisher on the 2560. The 9 subscribers allow sending temp, voltage, and rpm info back to the remote, and give flexibility in what is sent so that it can be adjusted on the robot itself.

I used a servo extension cable to carry the base 5v, ground, and rx signals from the ibus out on the rc receiver unit. Handy as the servo plug ends can be taped together for the more bumpy environment that the hound likes to tackle. I wound up putting the diode floating between two extension wires on the (to tx) side of the bus.



The 1 publisher just sends an array with the raw RC values in it. With minimal delays I can get a reasonably steady 120Hz publication of rc values. So now the houndbot can tell me when it is getting hungry for more fresh electrons from a great distance!
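If you want to confirm the rate from the ROS side (this assumes ROS 1 tooling, and /rc_raw is just a placeholder for whatever the publisher's topic is actually called):

rostopic hz /rc_raw
rostopic echo -n 1 /rc_raw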

I had some problems with the nano and the rc unit locking up. I think perhaps this was due to the crystals, as the uno worked ok. The 2560 board has been bench tested for 30 minutes, which was enough time to expose the issues on the nano.


POC Wireguard + FRR: Now with OSPFv2!

If you read my last post, I set up a POC with wireguard and FRR to have the power of wireguard (WG) but with all the routing worked out by FRR. But I had a problem. When using RIPv2, the broadcast messages seemed to get stuck in the WG interfaces until I tcpdumped them. This meant that once I tcpdumped, the routes would get through, but only to eventually go stale and disappear.

I talked with the awesome people in the #wireguard IRC channel on freenode and was told to simply stay clear of RIP.

So I revisited my POC env and swapped out RIP for OSPF.. and guess what.. it worked! Now all the routes get propagated and they stay there. Which means if I decide to add new WG links and make it grow, the routing should grow with it:

suse@wireguard-5:~> ip r
default via 172.16.0.1 dev eth0 proto dhcp
10.0.2.0/24 via 10.0.4.104 dev wg0 proto 188 metric 20
10.0.3.0/24 via 10.0.4.104 dev wg0 proto 188 metric 20
10.0.4.0/24 dev wg0 proto kernel scope link src 10.0.4.105
172.16.0.0/24 dev eth0 proto kernel scope link src 172.16.0.36
172.16.1.0/24 via 10.0.4.104 dev wg0 proto 188 metric 20
172.16.2.0/24 via 10.0.4.104 dev wg0 proto 188 metric 20
172.16.3.0/24 via 10.0.4.104 dev wg0 proto 188 metric 20
172.16.4.0/24 dev eth1 proto kernel scope link src 172.16.4.105
172.16.5.0/24 dev eth2 proto kernel scope link src 172.16.5.105

Isn’t that beautiful, all networks on one of the more distant nodes, including network 1 (172.16.1.0/24).

I realise this doesn’t make much sense unless you read the last post, but never fear, I thought I’d rework and append the build notes here, in case you’re interested again.

Build notes – This time with OSPFv2

The topology we’ll be building

Seeing that this is my Suse hackweek project and I now use OpenSuse, I’ll be using OpenSuse Leap 15.1 for all the nodes (and the KVM host too).

Build the env

I used ansible-virt-infra created by csmart to build the env. I created my own inventory file, which I called wireguard.yml, and which you can dump in the inventory/ folder:

---
wireguard:
  hosts:
    wireguard-1:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-blue"
        - name: "net-green"
    wireguard-2:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-blue"
        - name: "net-white"
    wireguard-3:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-white"
    wireguard-4:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-orange"
        - name: "net-green"
    wireguard-5:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-orange"
        - name: "net-yellow"
    wireguard-6:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-yellow"
  vars:
    virt_infra_distro: opensuse
    virt_infra_distro_image: openSUSE-Leap-15.1-JeOS.x86_64-15.1.0-OpenStack-Cloud-Current.qcow2
    virt_infra_distro_image_url: https://download.opensuse.org/distribution/leap/15.1/jeos/openSUSE-Leap-15.1-JeOS.x86_64-15.1.0-OpenStack-Cloud-Current.qcow2
    virt_infra_variant: opensuse15.1

Next we need to make sure the networks have been defined, we do this in the kvmhost inventory file, here’s a diff:

diff --git a/inventory/kvmhost.yml b/inventory/kvmhost.yml
index b1f029e..6d2485b 100644
--- a/inventory/kvmhost.yml
+++ b/inventory/kvmhost.yml
@@ -40,6 +40,36 @@ kvmhost:
           subnet: "255.255.255.0"
           dhcp_start: "10.255.255.2"
           dhcp_end: "10.255.255.254"
+        - name: "net-mgmt"
+          ip_address: "172.16.0.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.0.2"
+          dhcp_end: "172.16.0.99"
+        - name: "net-white"
+          ip_address: "172.16.1.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.1.2"
+          dhcp_end: "172.16.1.99"
+        - name: "net-blue"
+          ip_address: "172.16.2.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.2.2"
+          dhcp_end: "172.16.2.99"
+        - name: "net-green"
+          ip_address: "172.16.3.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.3.2"
+          dhcp_end: "172.16.3.99"
+        - name: "net-orange"
+          ip_address: "172.16.4.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.4.2"
+          dhcp_end: "172.16.4.99"
+        - name: "net-yellow"
+          ip_address: "172.16.5.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.5.2"
+          dhcp_end: "172.16.5.99"
     virt_infra_host_deps:
         - qemu-img
         - osinfo-query

Now all we need to do is run the playbook:

ansible-playbook --limit kvmhost,wireguard ./virt-infra.yml
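Once the playbook finishes, a quick way to confirm all the wireguard nodes are up and reachable via ansible is:

ansible wireguard -m ping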

Setting up the IPs and tunnels

This above infrastructure tool uses cloud_init to set up the network, so only the first NIC is up. You can confirm this with:

ansible wireguard -m shell -a "sudo ip a"

That’s ok because we want to use the numbers on our diagram anyway 🙂
Before we get to that, let’s make sure wireguard is set up, and update all the nodes.

ansible wireguard -m shell -a "sudo zypper update -y"

If a reboot is required, reboot the nodes:

ansible wireguard -m shell -a "sudo reboot"

Add the wireguard repo to the nodes and install it, I look forward to 5.6 where wireguard will be included in the kernel:

ansible wireguard -m shell -a "sudo zypper addrepo -f obs://network:vpn:wireguard wireguard"

ansible wireguard -m shell -a "sudo zypper --gpg-auto-import-keys install -y wireguard-kmp-default wireguard-tools"

Load the kernel module:

ansible wireguard -m shell -a "sudo modprobe wireguard"

Let’s create wg0 on all wireguard nodes:

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo ip link add dev wg0 type wireguard"

And add wg1 to those nodes that have 2:

ansible wireguard-1,wireguard-4 -m shell -a "sudo ip link add dev wg1 type wireguard"

Now while we’re at it, lets create all the wireguard keys (because we can use ansible):

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo mkdir -p /etc/wireguard"

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "wg genkey | sudo tee /etc/wireguard/wg0-privatekey | wg pubkey | sudo tee /etc/wireguard/wg0-publickey"

ansible wireguard-1,wireguard-4 -m shell -a "wg genkey | sudo tee /etc/wireguard/wg1-privatekey | wg pubkey | sudo tee /etc/wireguard/wg1-publickey"

Let’s make sure we enable forwarding on the nodes that will pass traffic, and install the routing software (1, 2, 4 and 5):

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo sysctl net.ipv4.conf.all.forwarding=1"

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo sysctl net.ipv6.conf.all.forwarding=1"

While we’re at it, we might as well add the network repo so we can install FRR and then install it on the nodes:

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo zypper ar https://download.opensuse.org/repositories/network/openSUSE_Leap_15.1/ network"

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo zypper --gpg-auto-import-keys install -y frr libyang-extentions"

This time we’ll be using OSPFv2, as we’re just using IPv4:

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo sed -i 's/^ospfd=no/ospfd=yes/' /etc/frr/daemons"

And with that now we just need to do all per server things like add IPs and configure all the keys, peers, etc. We’ll do this a host at a time.
NOTE: As this is a POC we’re just using ip commands; obviously in a real env you’d want to use systemd-networkd or something to make these stick.
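For example, one simple option for making it stick would be wg-quick, which ships with wireguard-tools; a minimal sketch for wireguard-1's wg0 (the key values are placeholders, and this would replace the manual ip/wg commands for that interface below) could look something like:

sudo tee /etc/wireguard/wg0.conf <<EOF
[Interface]
Address = 10.0.2.101/24
ListenPort = 51821
PrivateKey = <contents of /etc/wireguard/wg0-privatekey>

[Peer]
PublicKey = <node2's wg0 public key>
AllowedIPs = 10.0.2.0/24, 224.0.0.0/8, 172.16.0.0/16
Endpoint = 172.16.2.102:51822
EOF

sudo systemctl enable --now wg-quick@wg0

For the POC below I just stuck with the raw ip and wg commands.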

wireguard-1

Firstly using:
sudo virsh dumpxml wireguard-1 |less

We can see that eth1 is net-blue and eth2 is net-green so:
ssh wireguard-1

First IPs:
sudo ip address add dev eth1 172.16.2.101/24
sudo ip address add dev eth2 172.16.3.101/24
sudo ip address add dev wg0 10.0.2.101/24
sudo ip address add dev wg1 10.0.3.101/24

Load up the tunnels:
sudo wg set wg0 listen-port 51821 private-key /etc/wireguard/wg0-privatekey

# Node2 (2.102) public key is: P1tHKnaw7d2GJUSwXZfcayrrLMaCBHqcHsaM3eITm0s= (cat /etc/wireguard/wg0-publickey)

sudo wg set wg0 peer P1tHKnaw7d2GJUSwXZfcayrrLMaCBHqcHsaM3eITm0s= allowed-ips 10.0.2.0/24,224.0.0.0/8,172.16.0.0/16 endpoint 172.16.2.102:51822

sudo ip link set wg0 up

NOTE: We add 224.0.0.0/8 and 172.16.0.0/16 to allowed-ips. The first allows the OSPF multicast packets through. The latter will allow us to route to our private network that has been joined by WG tunnels.

sudo wg set wg1 listen-port 51831 private-key /etc/wireguard/wg1-privatekey

# Node4 (3.104) public key is: GzY59HlXkCkfXl9uSkEFTHzOtBsxQFKu3KWGFH5P9Qc= (cat /etc/wireguard/wg1-publickey)

sudo wg set wg1 peer GzY59HlXkCkfXl9uSkEFTHzOtBsxQFKu3KWGFH5P9Qc= allowed-ips 10.0.3.0/24,224.0.0.0/8,172.16.0.0/16 endpoint 172.16.3.104:51834

sudo ip link set wg1 up
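With both tunnels up, you can double-check what ended up in WG's cryptokey routing table (each peer and its allowed-ips) with:

sudo wg show wg0 allowed-ips
sudo wg show wg1 allowed-ips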

Setup FRR:
sudo tee /etc/frr/frr.conf <<EOF
hostname $(hostname)

password frr
enable password frr

log file /var/log/frr/frr.log

router ospf
network 10.0.2.0/24 area 0.0.0.0
network 10.0.3.0/24 area 0.0.0.0
redistribute connected
EOF

sudo systemctl restart frr
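Once FRR is configured on the other nodes as well, you can confirm from vtysh that the OSPF adjacencies come up over the tunnels and that routes are being learned, for example:

sudo vtysh -c "show ip ospf neighbor"
sudo vtysh -c "show ip route ospf"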

wireguard-2

Firstly using:
sudo virsh dumpxml wireguard-2 |less

We can see that eth1 is net-blue and eth2 is net-white so:

ssh wireguard-2

First IPs:
sudo ip address add dev eth1 172.16.2.102/24
sudo ip address add dev eth2 172.16.1.102/24
sudo ip address add dev wg0 10.0.2.102/24


Load up the tunnels:
sudo wg set wg0 listen-port 51822 private-key /etc/wireguard/wg0-privatekey

# Node1 (2.101) public key is: ZsHAeRbNsK66MBOwDJhdDgJRl0bPFB4WVRX67vAV7zs= (cat /etc/wireguard/wg0-publickey)

sudo wg set wg0 peer ZsHAeRbNsK66MBOwDJhdDgJRl0bPFB4WVRX67vAV7zs= allowed-ips 10.0.2.0/24,224.0.0.0/8,172.16.0.0/16 endpoint 172.16.2.101:51821

sudo ip link set wg0 up

NOTE: We add 224.0.0.0/8 and 172.16.0.0/16 to allowed-ips. The first allows the OSPF multicast packets through. The latter will allow us to route to our private network that has been joined by WG tunnels.

Setup FRR:
sudo tee /etc/frr/frr.conf <<EOF
hostname $(hostname)


password frr
enable password frr

log file /var/log/frr/frr.log

router ospf
network 10.0.2.0/24 area 0.0.0.0
redistribute connected
EOF

sudo systemctl restart frr

wireguard-3

Only has a net-white, so it must be eth1 so:

ssh wireguard-3

First IPs:
sudo ip address add dev eth1 172.16.1.103/24

Has no WG tunnels or FRR so we’re done here.

wireguard-4

Firstly using:
sudo virsh dumpxml wireguard-4 |less

We can see that eth1 is net-orange and eth2 is net-green so:

ssh wireguard-4

First IPs:
sudo ip address add dev eth1 172.16.4.104/24
sudo ip address add dev eth2 172.16.3.104/24
sudo ip address add dev wg0 10.0.4.104/24
sudo ip address add dev wg1 10.0.3.104/24

Load up the tunnels:
sudo wg set wg0 listen-port 51844 private-key /etc/wireguard/wg0-privatekey

# Node5 (4.105) public key is: Af/sIEnklG6nnDb0wzUSq1D/Ujh6TH+5R9TblLyS3h8= (cat /etc/wireguard/wg0-publickey)

sudo wg set wg0 peer Af/sIEnklG6nnDb0wzUSq1D/Ujh6TH+5R9TblLyS3h8= allowed-ips 10.0.4.0/24,224.0.0.0/8,172.16.0.0/16 endpoint 172.16.4.105:51845

sudo ip link set wg0 up

NOTE: We add 224.0.0.0/8 and 172.16.0.0/16 to allowed-ips. The first allows the OSPF multicast packets through. The latter will allow us to route to our private network that has been joined by WG tunnels.

sudo wg set wg1 listen-port 51834 private-key /etc/wireguard/wg1-privatekey

# Node1 (3.101) public key is: Yh0kKjoqnJsxbCsTkQ/3uncEhdqa+EtJXCYcVzMdugs= (cat /etc/wireguard/wg1-publickey)

sudo wg set wg1 peer Yh0kKjoqnJsxbCsTkQ/3uncEhdqa+EtJXCYcVzMdugs= allowed-ips 10.0.3.0/24,224.0.0.0/8,172.16.0.0/16 endpoint 172.16.3.101:51831

sudo ip link set wg1 up

Setup FRR:
sudo tee /etc/frr/frr.conf <<EOF
hostname $(hostname)

password frr
enable password frr

log file /var/log/frr/frr.log

router ospf

network 10.0.3.0/24 area 0.0.0.0
network 10.0.4.0/24 area 0.0.0.0
redistribute connected
EOF


sudo systemctl restart frr

wireguard-5

Firstly using:
sudo virsh dumpxml wireguard-5 |less

We can see that eth1 is net-orange and eth2 is net-yellow so:

ssh wireguard-5

First IPs:
sudo ip address add dev eth1 172.16.4.105/24
sudo ip address add dev eth2 172.16.5.105/24
sudo ip address add dev wg0 10.0.4.105/24

Load up the tunnels:
sudo wg set wg0 listen-port 51845 private-key /etc/wireguard/wg0-privatekey

# Node4 (4.104) public key is: aPA197sLN3F05bgePpeS2uZFlhRRLY8yVWnzBAUcD3A= (cat /etc/wireguard/wg0-publickey)

sudo wg set wg0 peer aPA197sLN3F05bgePpeS2uZFlhRRLY8yVWnzBAUcD3A= allowed-ips 10.0.4.0/24,224.0.0.0/8,172.16.0.0/16 endpoint 172.16.4.104:51844

sudo ip link set wg0 up

NOTE: We add 224.0.0.0/8 and 172.16.0.0/16 to allowed-ips. The first allows the OSPF multicast packets through. The latter will allow us to route to our private network that has been joined by WG tunnels.

Setup FRR:
sudo tee /etc/frr/frr.conf <<EOF
hostname $(hostname)

password frr
enable password frr

log file /var/log/frr/frr.log

router ospf

network 10.0.4.0/24 area 0.0.0.0
redistribute connected
EOF


sudo systemctl restart frr

wireguard-6

Only has a net-yellow, so it must be eth1 so:

ssh wireguard-6

First IPs:
sudo ip address add dev eth1 172.16.5.106/24

Final comments

After all this, you should now be where I’m up to: an environment that is sharing routes through the WG interfaces.

The current issue I have is that if I go and ping from wireguard-1 to wireguard-5, the ICMP packet happily routes through into the 10.0.3.0/24 tunnel. When it pops out in wg1 of wireguard-4 the kernel isn’t routing it on to wireguard-5 through wg0, or WG isn’t putting the packet into the IP stack or forwarding queue to continue its journey.

Well, that is my current assumption. Hopefully I’ll get to the bottom of it soon, in which case I’ll post it here 🙂
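If you want to poke at this yourself, a few commands on wireguard-4 should help narrow down where the packet goes missing; note that WG will also silently drop packets whose source or destination address isn't covered by the relevant peer's allowed-ips:

# What does the kernel think it should do with a packet for wireguard-5's wg0 address?
sudo ip route get 10.0.4.105

# Is forwarding enabled on the wg interfaces?
sudo sysctl net.ipv4.conf.wg0.forwarding net.ipv4.conf.wg1.forwarding

# Watch whether the ICMP packets arrive on wg1 and leave again on wg0
sudo tcpdump -ni wg1 icmp
sudo tcpdump -ni wg0 icmp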

February 13, 2020

POC WireGuard + FRR Setup a.k.a dodgy meshy test network

It’s hackweek at Suse! Probably one of my favourite times of year, though I think they come up every 9 months or so.

Anyway, this hackweek I’ve been on a WireGuard journey. I started reading the paper and all the docs. Briefly looking into the code, sitting in the IRC channel and joining the mailing list to get a feel for the community.

There is still 1 day left of hackweek, so I hope to spend more time in the code, and maybe, just maybe, see if I can fix a bug.. although they don’t seem to have a tracker like most projects, so let’s see how that goes.

The community seems pretty cool. The tech is, frankly, pretty amazing; even I, from a cloud storage background, understood most of the paper.

I had set up a tunnel, tcpdumped traffic, and used wireshark to look closely at the packets as I read the paper; it was very informative. But I really wanted to get a feel for how this tech could work. They do have a wg-dynamic project which is planning on using wg as a building block to do cooler things, like mesh networking. This sounds cool, so I wanted to sink my teeth in and see if I could build something similar out of existing OSS tech (not wg-dynamic itself), and see where the gotchas are, aside from it obviously being less secure. It seemed like a good way to better understand the technology.

So on Wednesday, I decided to do just that. Today is Thursday and I’ve gotten to a point where I can say I partially succeeded. And before I delve in deeper and try and figure out my current stumbling block, I thought I’d write down where I am.. and how I got here.. to:

  1. Point the wireguard community at, in case they’re interested.
  2. So you all can follow along at home, because it’s pretty interesting, I think.

As the title suggests, the plan is/was to set up a bunch of tunnels and use FRR to set up some routing protocols to talk via these tunnels, auto-magically 🙂

UPDATE: The problem I describe in this post, routes becoming stale, only seems to happen when using RIPv2. When I change it to OSPFv2 all the routes work as expected!! Will write a follow up post to explain the differences.. in fact may rework the notes for it too 🙂

The problem at hand

Test network VM topology

A picture is worth 1000 words. The basic idea is to simulate a bunch of machines and networks connected over wireguard (WG) tunnels. So I created 6 vms, connected as you can see above.

I used Chris Smart’s ansible-virt-infra project, which is pretty awesome, to build up the VMs and networks as you see above. I’ll leave my build notes as an appendix to this post.

Once I had the infrastructure set up, I built all the tunnels as they are in the image, then went ahead and installed FRR on all the nodes with tunnels (nodes 1, 2, 4, and 5). To keep things simple, I started with the easiest routing protocol to configure, RIPv2.

Believe it or not, everything seemed to work.. well mostly. I can jump on, say, node 5 (wireguard-5 if you’re playing along at home) and:

suse@wireguard-5:~> ip r
default via 172.16.0.1 dev eth0 proto dhcp
10.0.2.0/24 via 10.0.4.104 dev wg0 proto 189 metric 20
10.0.3.0/24 via 10.0.4.104 dev wg0 proto 189 metric 20
10.0.4.0/24 dev wg0 proto kernel scope link src 10.0.4.105
172.16.0.0/24 dev eth0 proto kernel scope link src 172.16.0.36
172.16.2.0/24 via 10.0.4.104 dev wg0 proto 189 metric 20
172.16.3.0/24 via 10.0.4.104 dev wg0 proto 189 metric 20
172.16.4.0/24 dev eth1 proto kernel scope link src 172.16.4.105
172.16.5.0/24 dev eth2 proto kernel scope link src 172.16.5.105

Looks good, right? We see routes for networks 172.16.{0,2,3,4,5}.0/24. Network 1 isn’t there, but hey, that’s quite far away; maybe it hasn’t made it yet. Which leads to the real issue.

If I go and run ip r again, soon all these routes will become stale and disappear. Running ip -ts monitor shows just that.

So the question is, what’s happening to the RIP advertisements? And yes, they’re still being sent. So how come some made it to node 5, and then never again?

The simple answer is, it was me. The long answer is, I’ve never used FRR before, and it just didn’t seem to be working. So I started debugging the env. To debug, I had a tmux session opened on the KVM host with a tab for each node running FRR. I’d go to each tab and run tcpdump to check to see if the RIP traffic was making it through the tunnel. And almost instantly, I saw traffic, like:

suse@wireguard-5:~> sudo tcpdump -v -U -i wg0 port 520
tcpdump: listening on wg0, link-type RAW (Raw IP), capture size 262144 bytes
03:01:00.006408 IP (tos 0xc0, ttl 64, id 62964, offset 0, flags [DF], proto UDP (17), length 52)
10.0.4.105.router > 10.0.4.255.router:
RIPv2, Request, length: 24, routes: 1 or less
AFI 0, 0.0.0.0/0 , tag 0x0000, metric: 16, next-hop: self
03:01:00.007005 IP (tos 0xc0, ttl 64, id 41698, offset 0, flags [DF], proto UDP (17), length 172)
10.0.4.104.router > 10.0.4.105.router:
RIPv2, Response, length: 144, routes: 7 or less
AFI IPv4, 0.0.0.0/0 , tag 0x0000, metric: 1, next-hop: self
AFI IPv4, 10.0.2.0/24, tag 0x0000, metric: 2, next-hop: self
AFI IPv4, 10.0.3.0/24, tag 0x0000, metric: 1, next-hop: self
AFI IPv4, 172.16.0.0/24, tag 0x0000, metric: 1, next-hop: self
AFI IPv4, 172.16.2.0/24, tag 0x0000, metric: 2, next-hop: self
AFI IPv4, 172.16.3.0/24, tag 0x0000, metric: 1, next-hop: self
AFI IPv4, 172.16.4.0/24, tag 0x0000, metric: 1, next-hop: self

At first I thought it was good timing. I jumped to another host, and when I tcpdumped, the RIP packets turned up instantaneously. This happened again and again.. and yes it took me longer than I’d like to admit before it dawned on me.

Why are routes going stale? It seems as though the packets are getting queued/stuck in the WG interface until I poke it with tcpdump!

These RIPv2 Request packets are sent as a broadcast, not directly to the other end of the tunnel. To get them to not be dropped, I had to widen my WG peer allowed-ips from the /32 to a /24.
So now I wonder if the broadcast, or just the fact that it’s only 52 bytes, means it gets queued up and not sent through the tunnel, that is until I come along with a hammer and tcpdump the interface?

Maybe one way I could test this is to speed up the RIP broadcasts and hopefully fill a buffer, or see if I can turn WG, or rather the kernel, into debugging mode.
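On the debugging mode front, the WireGuard module supports the kernel's dynamic debug facility, so something like this should get verbose WG messages into the kernel log (assuming debugfs is mounted and the kernel was built with dynamic debug):

echo "module wireguard +p" | sudo tee /sys/kernel/debug/dynamic_debug/control
sudo dmesg -wT | grep -i wireguard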

Build notes

As Promised, here are the current form of my build notes, make reference to the topology image I used above.

BTW I’m using OpenSuse Leap 15.1 for all the nodes.

Build the env

I used ansible-virt-infra created by csmart to build the env. I created my own inventory file, which I called wireguard.yml, and which you can dump in the inventory/ folder:

---
wireguard:
  hosts:
    wireguard-1:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-blue"
        - name: "net-green"
    wireguard-2:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-blue"
        - name: "net-white"
    wireguard-3:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-white"
    wireguard-4:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-orange"
        - name: "net-green"
    wireguard-5:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-orange"
        - name: "net-yellow"
    wireguard-6:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-yellow"
  vars:
    virt_infra_distro: opensuse
    virt_infra_distro_image: openSUSE-Leap-15.1-JeOS.x86_64-15.1.0-OpenStack-Cloud-Current.qcow2
    virt_infra_distro_image_url: https://download.opensuse.org/distribution/leap/15.1/jeos/openSUSE-Leap-15.1-JeOS.x86_64-15.1.0-OpenStack-Cloud-Current.qcow2
    virt_infra_variant: opensuse15.1

Next we need to make sure the networks have been defined, we do this in the kvmhost inventory file, here’s a diff:

diff --git a/inventory/kvmhost.yml b/inventory/kvmhost.yml
index b1f029e..6d2485b 100644
--- a/inventory/kvmhost.yml
+++ b/inventory/kvmhost.yml
@@ -40,6 +40,36 @@ kvmhost:
           subnet: "255.255.255.0"
           dhcp_start: "10.255.255.2"
           dhcp_end: "10.255.255.254"
+        - name: "net-mgmt"
+          ip_address: "172.16.0.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.0.2"
+          dhcp_end: "172.16.0.99"
+        - name: "net-white"
+          ip_address: "172.16.1.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.1.2"
+          dhcp_end: "172.16.1.99"
+        - name: "net-blue"
+          ip_address: "172.16.2.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.2.2"
+          dhcp_end: "172.16.2.99"
+        - name: "net-green"
+          ip_address: "172.16.3.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.3.2"
+          dhcp_end: "172.16.3.99"
+        - name: "net-orange"
+          ip_address: "172.16.4.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.4.2"
+          dhcp_end: "172.16.4.99"
+        - name: "net-yellow"
+          ip_address: "172.16.5.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.5.2"
+          dhcp_end: "172.16.5.99"
     virt_infra_host_deps:
         - qemu-img
         - osinfo-query

Now all we need to do is run the playbook:

ansible-playbook --limit kvmhost,wireguard ./virt-infra.yml

Setting up the IPs and tunnels

This above infrastructure tool uses cloud_init to set up the network, so only the first NIC is up. You can confirm this with:

ansible wireguard -m shell -a "sudo ip a"

That’s ok because we want to use the numbers on our diagram anyway 🙂
Before we get to that, let’s make sure wireguard is set up, and update all the nodes.

ansible wireguard -m shell -a "sudo zypper update -y"

If a reboot is required, reboot the nodes:

ansible wireguard -m shell -a "sudo reboot"

Add the wireguard repo to the nodes and install it, I look forward to 5.6 where wireguard will be included in the kernel:

ansible wireguard -m shell -a "sudo zypper addrepo -f obs://network:vpn:wireguard wireguard"

ansible wireguard -m shell -a "sudo zypper --gpg-auto-import-keys install -y wireguard-kmp-default wireguard-tools"

Load the kernel module:

ansible wireguard -m shell -a "sudo modprobe wireguard"

Let’s create wg0 on all wireguard nodes:

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo ip link add dev wg0 type wireguard"

And add wg1 to those nodes that have 2:

ansible wireguard-1,wireguard-4 -m shell -a "sudo ip link add dev wg1 type wireguard"

Now while we’re at it, lets create all the wireguard keys (because we can use ansible):

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo mkdir -p /etc/wireguard"

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "wg genkey | sudo tee /etc/wireguard/wg0-privatekey | wg pubkey | sudo tee /etc/wireguard/wg0-publickey"

ansible wireguard-1,wireguard-4 -m shell -a "wg genkey | sudo tee /etc/wireguard/wg1-privatekey | wg pubkey | sudo tee /etc/wireguard/wg1-publickey"

Let’s make sure we enable forwarding on the nodes that will pass traffic, and install the routing software (1, 2, 4 and 5):

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo sysctl net.ipv4.conf.all.forwarding=1"

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo sysctl net.ipv6.conf.all.forwarding=1"

While we’re at it, we might as well add the network repo so we can install FRR and then install it on the nodes:

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo zypper ar https://download.opensuse.org/repositories/network/openSUSE_Leap_15.1/ network"

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo zypper --gpg-auto-import-keys install -y frr libyang-extentions"

We’ll be using RIPv2, as we’re just using IPv4:

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo sed -i 's/^ripd=no/ripd=yes/' /etc/frr/daemons"

And with that now we just need to do all per server things like add IPs and configure all the keys, peers, etc. We’ll do this a host at a time.
NOTE: As this is a POC we’re just using ip commands; obviously in a real env you’d want to use systemd-networkd or something to make these stick.

wireguard-1

Firstly using:
sudo virsh dumpxml wireguard-1 |less

We can see that eth1 is net-blue and eth2 is net-green so:
ssh wireguard-1

First IPs:
sudo ip address add dev eth1 172.16.2.101/24
sudo ip address add dev eth2 172.16.3.101/24
sudo ip address add dev wg0 10.0.2.101/24
sudo ip address add dev wg1 10.0.3.101/24

Load up the tunnels:
sudo wg set wg0 listen-port 51821 private-key /etc/wireguard/wg0-privatekey

# Node2 (2.102) public key is: P1tHKnaw7d2GJUSwXZfcayrrLMaCBHqcHsaM3eITm0s= (cat /etc/wireguard/wg0-publickey)

sudo wg set wg0 peer P1tHKnaw7d2GJUSwXZfcayrrLMaCBHqcHsaM3eITm0s= allowed-ips 10.0.2.0/24 endpoint 172.16.2.102:51822

sudo ip link set wg0 up

sudo wg set wg1 listen-port 51831 private-key /etc/wireguard/wg1-privatekey

# Node4 (3.104) public key is: GzY59HlXkCkfXl9uSkEFTHzOtBsxQFKu3KWGFH5P9Qc= (cat /etc/wireguard/wg1-publickey)

sudo wg set wg1 peer GzY59HlXkCkfXl9uSkEFTHzOtBsxQFKu3KWGFH5P9Qc= allowed-ips 10.0.3.0/24 endpoint 172.16.3.104:51834

sudo ip link set wg1 up

Setup FRR:
sudo tee /etc/frr/frr.conf <<EOF
hostname $(hostname)

password frr
enable password frr

log file /var/log/frr/frr.log

router rip
version 2
redistribute kernel
redistribute connected

network wg0
no passive-interface wg0
network wg1
no passive-interface wg1
EOF

sudo systemctl restart frr

wireguard-2

Firstly using:
sudo virsh dumpxml wireguard-2 |less

We can see that eth1 is net-blue and eth2 is net-white so:

ssh wireguard-2

First IPs:
sudo ip address add dev eth1 172.16.2.102/24
sudo ip address add dev eth2 172.16.1.102/24
sudo ip address add dev wg0 10.0.2.102/24


Load up the tunnels:
sudo wg set wg0 listen-port 51822 private-key /etc/wireguard/wg0-privatekey

# Node1 (2.101) public key is: ZsHAeRbNsK66MBOwDJhdDgJRl0bPFB4WVRX67vAV7zs= (cat /etc/wireguard/wg0-publickey)

sudo wg set wg0 peer ZsHAeRbNsK66MBOwDJhdDgJRl0bPFB4WVRX67vAV7zs= allowed-ips 10.0.2.0/24 endpoint 172.16.2.101:51821

sudo ip link set wg0 up

Setup FRR:
sudo tee /etc/frr/frr.conf <<EOF
hostname $(hostname)


password frr
enable password frr

log file /var/log/frr/frr.log

router rip
version 2
redistribute kernel
redistribute connected

network wg0
no passive-interface wg0
EOF

sudo systemctl restart frr

wireguard-3

Only has a net-white, so it must be eth1 so:

ssh wireguard-3

First IPs:
sudo ip address add dev eth1 172.16.1.103/24

Has no WG tunnels or FRR so we’re done here.

wireguard-4

Firstly using:
sudo virsh dumpxml wireguard-4 |less

We can see that eth1 is net-orange and eth2 is net-green so:

ssh wireguard-4

First IPs:
sudo ip address add dev eth1 172.16.4.104/24
sudo ip address add dev eth2 172.16.3.104/24
sudo ip address add dev wg0 10.0.4.104/24
sudo ip address add dev wg1 10.0.3.104/24

Load up the tunnels:
sudo wg set wg0 listen-port 51844 private-key /etc/wireguard/wg0-privatekey

# Node5 (4.105) public key is: Af/sIEnklG6nnDb0wzUSq1D/Ujh6TH+5R9TblLyS3h8= (cat /etc/wireguard/wg0-publickey)

sudo wg set wg0 peer Af/sIEnklG6nnDb0wzUSq1D/Ujh6TH+5R9TblLyS3h8= allowed-ips 10.0.4.0/24 endpoint 172.16.4.105:51845

sudo ip link set wg0 up

sudo wg set wg1 listen-port 51834 private-key /etc/wireguard/wg1-privatekey

# Node1 (3.101) public key is: Yh0kKjoqnJsxbCsTkQ/3uncEhdqa+EtJXCYcVzMdugs= (cat /etc/wireguard/wg1-publickey)

sudo wg set wg1 peer Yh0kKjoqnJsxbCsTkQ/3uncEhdqa+EtJXCYcVzMdugs= allowed-ips 10.0.3.0/24 endpoint 172.16.3.101:51831

sudo ip link set wg1 up

Setup FRR:
sudo tee /etc/frr/frr.conf <<EOF
hostname $(hostname)

password frr
enable password frr

log file /var/log/frr/frr.log

router rip
version 2
redistribute kernel
redistribute connected

network wg0
no passive-interface wg0

network wg1
no passive-interface wg1
EOF


sudo systemctl restart frr

wireguard-5

Firstly using:
sudo virsh dumpxml wireguard-5 |less

We can see that eth1 is net-orange and eth2 is net-yellow so:

ssh wireguard-5

First IPs:
sudo ip address add dev eth1 172.16.4.105/24
sudo ip address add dev eth2 172.16.5.105/24
sudo ip address add dev wg0 10.0.4.105/24

Load up the tunnels:
sudo wg set wg0 listen-port 51845 private-key /etc/wireguard/wg0-privatekey

# Node4 (4.104) public key is: aPA197sLN3F05bgePpeS2uZFlhRRLY8yVWnzBAUcD3A= (cat /etc/wireguard/wg0-publickey)

sudo wg set wg0 peer aPA197sLN3F05bgePpeS2uZFlhRRLY8yVWnzBAUcD3A= allowed-ips 10.0.4.0/24 endpoint 172.16.4.104:51844

sudo ip link set wg0 up

Setup FRR:
sudo tee /etc/frr/frr.conf <<EOF
hostname $(hostname)

password frr
enable password frr

log file /var/log/frr/frr.log

router rip
version 2
redistribute kernel
redistribute connected

network wg0
no passive-interface wg0
EOF


sudo systemctl restart frr

wireguard-6

Only has a net-yellow, so it must be eth1 so:

ssh wireguard-6

First IPs:
sudo ip address add dev eth1 172.16.5.106/24

Final comments

When this _is_ all working, we’d probably need to open up the allowed-ips on the WG tunnels. We could start by just adding 172.16.0.0/16 to the list. That might allow us to route packets to the other networks.

If you want to go find other routes out to the internet, then we may need 0.0.0.0/0. But I’m not sure how WG will route that, as it’s using the allowed-ips and public keys as a routing table. I guess it may not care, as we only have a 1:1 mapping on each tunnel and if we can route to the WG interface it’s pretty straightforward.
This is something I hope to test.

Another really beneficial test would be to rebuild this environment using IPv6 and see if things work better, as we wouldn’t have any broadcasts anymore, only uni- and multi-cast.

As well as trying some other routing protocol in general, like OSPF.

Finally, having to continually adjust allowed-ips, and seemingly having to either open it up more or add more ranges, makes me realise why the wg-dynamic project exists, and why they want to come up with a secure routing protocol to use through the tunnels to do something similar. So let’s keep an eye on that project.

February 10, 2020

Fedora 31 LXC setup on Ubuntu Bionic 18.04

Similarly to what I wrote for Fedora 29, here is how I was able to create a Fedora 31 LXC container on an Ubuntu 18.04 (bionic) laptop.

Setting up LXC on Ubuntu

First of all, install lxc:

apt install lxc
echo "veth" >> /etc/modules
modprobe veth

turn on bridged networking by putting the following in /etc/sysctl.d/local.conf:

net.ipv4.ip_forward=1

and applying it using:

sysctl -p /etc/sysctl.d/local.conf

Then allow the right traffic in your firewall (/etc/network/iptables.up.rules in my case):

# LXC containers
-A FORWARD -d 10.0.3.0/24 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -s 10.0.3.0/24 -j ACCEPT
-A INPUT -d 224.0.0.251 -s 10.0.3.1 -j ACCEPT
-A INPUT -d 239.255.255.250 -s 10.0.3.1 -j ACCEPT
-A INPUT -d 10.0.3.255 -s 10.0.3.1 -j ACCEPT
-A INPUT -d 10.0.3.1 -s 10.0.3.0/24 -j ACCEPT

and apply these changes:

iptables-apply

before restarting the lxc networking:

systemctl restart lxc-net.service

Create the container

Once that's in place, you can finally create the Fedora 31 container:

lxc-create -n fedora31 -t download -- -d fedora -r 31 -a amd64

To see a list of all distros available with the download template:

lxc-create -n foo --template=download -- --list

Once the container has been created, disable AppArmor for it by adding the following line to its LXC config file (typically /var/lib/lxc/fedora31/config):

lxc.apparmor.profile = unconfined

since the AppArmor profile isn't working at the moment.

Logging in as root

Starting the container in one window:

lxc-start -n fedora31 -F

and attaching to a console:

lxc-attach -n fedora31

to set a root password:

passwd

Logging in as an unprivileged user via ssh

While logged into the console, I tried to install ssh:

$ dnf install openssh-server
Cannot create temporary file - mkstemp: No such file or directory

but it failed because TMPDIR is set to a non-existent directory:

$ echo $TMPDIR
/tmp/user/0

I found a fix and ran the following:

TMPDIR=/tmp dnf install openssh-server

then started the ssh service:

systemctl start sshd.service

Then I installed a few other packages as root:

dnf install vim sudo man

and created an unprivileged user with sudo access:

adduser francois -G wheel
passwd francois

I set this in /etc/ssh/sshd_config:

GSSAPIAuthentication no

to prevent slow ssh logins.

Now login as that user from the console and add an ssh public key:

mkdir .ssh
chmod 700 .ssh
echo "<your public key>" > .ssh/authorized_keys
chmod 644 .ssh/authorized_keys

You can now login via ssh. The IP address to use can be seen in the output of:

lxc-ls --fancy
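For example, if lxc-ls --fancy shows the container's IPv4 address as 10.0.3.123 (yours will differ), you can then connect with:

ssh francois@10.0.3.123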

February 07, 2020

Visualising Phase Noise

A few months ago I was helping Gerhard, OE3GBB, track down some FreeDV 2020 sync issues over the QO-100 satellite.

Along the way, we investigated the phase noise of the QO-100 channel (including Gerhard's Tx and Rx) by sending a carrier signal over the link, then running it through a GNU Octave phase_noise.m script to generate some interesting plots.

Fig 1 shows the spectrum of the carrier, some band pass noise in the SSB channel, and the single sinewave line at about 1500 Hz:

Fig 2 is a close up, where we have shifted the 1500 Hz tone down to 0 Hz. It’s not really a single frequency, but has a noise-like spectrum:

Figure 3 is a polar plot of the I and Q (real and imag) against time. A perfect oscillator with a small frequency offset would trace a neat spiral, but due to the noise it wanders all over the place. Fig 3A shows a close up of the first 5 seconds, where it reverses a few times, like a wheel rotating forwards and backwards at random:


Figure 4 is the “unwrapped phase” in radians. Unwrapping means if we get to -pi we just keep going, rather than wrapping around to pi. A constant slope suggests a constant frequency segment, for example in the first 5 seconds it wanders downwards -15 radians which suggests a frequency of -15/5 = -3 rads/sec or -3/(2*pi) = -0.5 Hz. The upwards slope from about 8 seconds is a positive frequency segment.

Figure 5 is the rate of change of phase, in other words the instantaneous frequency offset, which is about -0.5 Hz at 8 seconds, then swings positive for a while:

Why does all this matter? Well phase shift keyed modems like QPSK have to track this phase. We were concerned about the ability of the FreeDV 2020 QPSK modem to track phase over QO-100. You also get similar meandering phase tracks over HF channels.

It turns out the GPS locking on one of the oscillators wasn't working quite right, leading to step changes in the oscillator phase. So in this case it was a hardware problem rather than the QPSK modem.

Links

QO-100 Sync Pull Request (with lots of notes)
FreeDV 2020 over the QO-100 Satellite
Digital Voice Transmission via QO-100 with FreeDV Mode 2020 (Lime Micro article)

A quick reflection on digital for posterity

On the eve of moving to Ottawa to join the Service Canada team (squee!) I thought it would be helpful to share a few things for posterity. There are three things below:

  • Some observations that might be useful
  • A short overview of the Pia Review: 20 articles about digital public sector reform
  • Additional references I think are outstanding and worth considering in public sector digital/reform programs, especially policy transformation

Some observations

Moving from deficit to aspirational planning

Risk! Risk!! Risk!!! That one word is responsible for an incredible amount of fear, inaction, redirection of investment and counter-productive behaviours, especially in public sectors, for whom the stakes for the economy and society are so high. But when you focus all your efforts on mitigating risks, you are trying to drive by only using the rear vision mirror, planning your next step based on the issues you've already experienced without looking to where you need to be. It ultimately leads to people driving slower and slower, often grinding to a halt, because any action is considered more risky than inaction. This doesn't really help our metaphorical driver pick up the kids from school or get supplies from the store. In any case, inaction bears as many risks as action in a world that is continually changing. For example, if our metaphorical driver were to stop the car in an intersection, they would likely be hit by another vehicle, or eventually starve to death.

Action is necessary. Change is inevitable. So public sectors must balance our time between being responsive (not reactive) to change and risks, and being proactive towards a clear goal or future state.

Of course, risk mitigation is what many in government think they most urgently need to address; however, to engage only in that is to buy into and perpetuate the myth that the increasing pace of change is itself a bad thing. This is the difference between user polling and user research: users think they need faster horses but actually they need a better way to transport more people over longer distances, which could lead to alternatives to horses. Shifting from a change-pessimistic framing to change optimism is critical for public sectors to start to build responsiveness into their policy, program and project management. Until public servants embrace change as normal, natural and part of their work, fear and fear-based behaviours will drive reactivism and sub-optimal outcomes.

The OPSI model for innovation would be a helpful tool for asking senior public servants what proportion of their digital investment sits in which box, as this helps identify how aspirational vs reactive, and how top-down or bottom-up, they are, noting that there really should be some investment and tactics in all four quadrants.

(Image: Innovation Facets Diamond)

My observation of many government digital programs is that teams spend a lot of their time doing top down (directed) work that focuses on areas of certainty, but miss out on building the capacity or vision required for bottom up innovation, or anything that genuinely explores and engages in areas of uncertainty. Central agencies and digital transformation teams are in the important and unique position of being able to independently stand back and see the forest for the trees, and help shape systemic responses to all-of-system problems. My biggest recommendation would be for these teams to support public sector partners in building change optimism, proactive planning, and responsiveness/resilience into their approaches, so as to be more genuinely strategic and effective in dealing with change, but more importantly, to better plan strategically towards something meaningful for their context.

Repeatability and scale

All digital efforts might be considered through the lens of repeatability and scale.

  • If you are doing something, anything, could you publish it or a version of it for others to learn from or reuse? Can you work in the open for any of your work (not just publish after the fact)? If policy development, new services or even experimental projects could be done openly from the start, they will help drive a race to the top between departments.
  • How would the thing you are considering scale? How would you scale impact without scaling resources? Basically, for anything you do, if you'd need to dramatically scale resources to implement it, then you are not getting an exponential response to the problem.

Sometimes doing non-scalable work is fine to test an idea, but actively trying to differentiate between work that provides symptomatic relief and work that addresses causal factors is critical; otherwise you will inevitably find 100% of your work program focused on symptomatic relief.

It is critical to balance programs according to both fast value (short term delivery projects) and long value (multi-month/year program delivery), reactive and proactive measures, symptomatic relief and addressing causal factors, and differentiating between program foundations (gov as a platform) and the programs themselves. When governments don't invest in digital foundations, they end up duplicating infrastructure for each and every program, which leads to a reduction in capacity, agility and responsiveness to change.

Digital foundations

Most government digital programs seem to focus on small experiments, which is great for individual initiatives, but may not lay the reusable digital foundations for many programs. I would suggest that in whatever projects the team embark upon, some effort be made to explore and demonstrate what the digital foundations for government should look like. For example:

  • Digital public infrastructure - what are the things government is uniquely responsible for that it should make available as digital public infrastructure for others to build upon, and indeed for itself to consume. Eg, legislation as code, services registers, transactional service APIs, core information and data assets (spatial, research, statistics, budgets, etc), central budget management systems. “Government as a Platform” is a digital and transformation strategy, not just a technology approach.
  • Policy transformation and closing the implementation gap - many policy teams think the issue of policy intent not being realised is not their problem, so showing the value of multidisciplinary, test-driven and end to end policy design and implementation will dramatically shift digital efforts towards more holistic, sustainable and predictable policy and societal outcomes.
  • Participatory governance - departments need to engage the public in policy, services or program design, so demonstrating the value of participatory governance is key. This is not a nice-to-have, but rather a necessary part of delivering good services. Here is a recent article with some concepts and methods to consider, and the team needs capabilities to enable this that aren't just communications skills, but genuine, subject matter expert engagement.
  • Life Journey programs - putting digital transformation efforts, policies, service delivery improvements and indeed any other government work in the context of life journeys helps to make it real, gets the multiple entities that play a part in that journey naturally involved and invested, and drives horizontal collaboration across and between jurisdictions. New Zealand led the way in this, the NSW Government extended the methodology, and Estonia has started the journey; they are all systemically benefiting.
  • I’ve spoken about designing better futures, and I do believe this is also a digital foundation, as it provides a lens through which to prioritise, implement and realise value from all of the above. Getting public servants to “design the good” from a citizen perspective, a business perspective, an agency perspective, Government perspective and from a society perspective helps flush out assumptions, direction and hypotheses that need testing.

The Pia Review

I recently wrote a series of 20 articles about digital transformation and reform in public sectors. It was something I did for fun, in my own time, as a way of both recording and sharing my lessons learned from 20 years working at the intersection of tech, government and society (half in the private sector, half in the public sector). I called it the Public Sector Pia Review and I’ve been delighted by how it has been received, with a global audience republishing, sharing, commenting, and most important, starting new discussions about the sort of public sector they want and the sort of public servants they want to be. Below is a deck that has an insight from each of the 20 articles, and links throughout.

This is not just meant to be a series about digital, but rather about the matter of public sector reform in the broadest sense, and I hope it is a useful contribution to better public sectors, not just better public services.

There is also a collated version of the articles in two parts. These compilations are linked below for convenience, and all articles are linked in the references below for context.

  • Public-Sector-Pia-Review-Part-1 (6MB PDF) — essays written to provide practical tips, methods, tricks and ideas to help public servants do their best possible work today for the best possible public outcomes; and
  • Reimagining government (will link once published) — essays about possible futures, the big existential, systemic or structural challenges and opportunities as I’ve experienced them, paradigm shifts and the urgent need for everyone to reimagine how they best serve the government, the parliament and the people, today and into the future.

A huge thank you to the Mandarin, specifically Harley Dennett, for the support and encouragement to do this, as well as thanks to all the peer reviewers and contributors, and of course my wonderful husband Thomas who peer reviewed several articles, including the trickier ones!

My digital references and links from 2019

Below are a number of useful references for consideration in any digital government strategy, program or project, including some of mine :)

General reading

Life Journeys as a Strategy

Life Journey programs, whilst largely misunderstood and quite new to government, provide a surprisingly effective way to drive cross agency collaboration, holistic service and system design, prioritisation of investment for best outcomes, and a way to really connect policy, services and human outcomes with everyone involved along the usual service delivery supply chains in public sectors. Please refer to the following references, noting that New Zealand was the first to really explore this space and is being rapidly followed by other governments around the world. Also please note the important difference between customer journey mapping (common), customer mapping that spans services but is still limited to a single agency/department (also common), and true life journey mapping, which necessarily spans agencies, jurisdictions and even sectors (rare), such as having a child, end of life, starting school or becoming an adult.

Policy transformation

Data in Government

Designing better futures to transform towards

If you don't design a future state to work towards, then you end up just designing reactively to current, past or potential issues. This leads to a lack of strategic or cohesive direction, which leads to systemic fragmentation and ultimately system ineffectiveness and cannibalism. A clear direction isn't just about principles or goals; it needs to be something people can see, connect with, align their work towards (even if they aren't in your team), and get enthusiastic about. This is how you create change at scale: when people buy into the agenda, at all levels, and start naturally walking in the same direction regardless of their role. Here are some examples for consideration.

Rules as Code

Please find the relevant Rules as Code links below for easy reference.

Better Rules and RaC examples

February 04, 2020

Deleted Mapped Files

On a Linux system, if you upgrade a shared object that is in use, any programs that have it mapped will list it as "(deleted)" in the /proc/PID/maps file for the process in question. When you have a system tracking the stable branch of a distribution, it's expected that most times a shared object is upgraded it will be due to a security issue. When that happens the reasonable options are either to restart all programs that use the shared object or to compare the attack surface of such programs to the nature of the security issue. In most cases restarting all programs that use the shared object is by far the easiest and least inconvenient option.

Shared objects are generally used a lot in a typical Linux system; this can be good for performance (more cache efficiency and less RAM use) and is also good for security, as buggy code can be replaced for the entire system by replacing a single shared object. Sometimes it's obvious which processes will be using a shared object (e.g. your web server using a PHP shared object) but other times many processes that you don't expect will use it.

I recently wrote “deleted-mapped.monitor” for my etbemon project [1]. This checks for shared objects that are mapped and deleted and gives separate warning messages for root and non-root processes. If you have the unattended-upgrades package installed then your system can install security updates without your interaction and then the monitoring system will inform you if things need to be restarted.

The Debian package debian-goodies has a program checkrestart that will tell you what commands to use to restart daemons that have deleted shared objects mapped.

Now to solve the problem of security updates on a Debian system you can use unattended-upgrades to apply updates, deleted-mapped.monitor in etbemon to inform you that programs need to be restarted, and checkrestart to tell you the commands you need to run to restart the daemons in question.
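
For a quick manual check without any of those tools, a rough sketch along these lines (run as root) will list processes that still map a deleted shared object; note that matching on ".so" here is just an approximation, not how deleted-mapped.monitor actually works:

for m in /proc/[0-9]*/maps; do
  pid=${m%/maps}; pid=${pid#/proc/}
  # flag any process that still maps a deleted .so
  if grep -q '\.so.* (deleted)' "$m" 2>/dev/null; then
    echo "$pid $(cat /proc/$pid/comm 2>/dev/null)"
  fi
done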

If anyone writes a blog post about how to do this on a non-Debian system please put the URL in a comment.

While writing the deleted-mapped.monitor I learned about the following common uses of deleted mapped files:

  • /memfd: is for memfd https://dvdhrm.wordpress.com/tag/memfd/ [2]
  • /[aio] is for asynchronous IO I guess, haven’t found good docs on it yet.
  • /home is used for a lot of harmless mapping and deleting.
  • /run/user is used for systemd dconf stuff.
  • /dev/zero is different for each map and thus looks deleted.
  • /tmp/ is used by Python (and probably other programs), which create temporary files there for mapping.
  • /var/lib is used for lots of temporary files.
  • /i915 is used by some X apps on systems with Intel video, I don’t know why.

February 03, 2020

Social Media Sharing on Blogs

My last post was read directly (as opposed to being read through Planet feeds) a lot more than usual due to someone sharing it on lobste.rs. Presumably the people who read it that way benefited from it, and I got a couple of unusually insightful comments from people who don't usually comment on my blog. The lobste.rs sharing was a win for everyone.

There are a variety of plugins for social media sharing, most of which allow organisations like Facebook to track people who read your blog, which is why I haven't been using them.

Are there good ways of allowing people to easily share your blog posts that work in a reasonable way, i.e. without allowing much tracking of users unless they actually want to share content?