Planet Linux Australia
Celebrating Australians & Kiwis in the Linux and Free/Open-Source community...

May 31, 2020

Effective Altruism

Long term readers of the blog may recall my daughter Amy. Well, she has moved on from teenage partying and is now e-volunteering at Effective Altruism Australia. She recently pointed me at the free e-book The Life You Can Save by Peter Singer.

I was already familiar with the work of Peter Singer, having read “The Most Good You Can Do”. Peter puts numbers on altruistic behaviours to evaluate them. This appeals to me – as an engineer I use numbers to evaluate artefacts I build, like modems, or other processes going on in the world, like COVID-19.

Using technology to help people is a powerful motivator for Geeks. I’ve been involved in a few of these initiatives myself (OLPC and The Village Telco). It’s really tough to create something that helps people long term. A wider set of skills and capabilities is required than just “the technology”.

On my brief forays into the developing world I’ve seen ecologies of people (from the first and developing worlds) living off development dollars. In some cases there is no incentive to report the true outcomes, for example how many government bureaucrats want to report failure? How many consultants want the gig to end?

So I really get the need for scientific evaluation of any development endeavours. Go Peter and the Effective Altruism movement!

I spend around 1000 hours a year writing open source code, a strong argument that I am “doing enough” in the community space. However I have no idea how effective that code is. Is it helping anyone? My inclination to help is also mixed with “itch scratching” – geeky stuff I want to work on because I find it interesting.

So after reading the book and having a think – I’m sold. I have committed 5% of my income to Effective Altruism Australia, selecting Give Directly as a target for my funds as it appealed to me personally.

I asked Amy to proofread this post – and she suggested that instead of $, you can donate time – that’s what she does. She also said:

Effective Altruism opens your eyes to alternative ways to interact with charities. It combines the broad field of social science to explore how many aspects intersect, applying the scientific method to economics, psychology, international development, and anthropology.

Reading Further

Busting Teenage Partying with a Fluksometer
Effective Altruism Australia

May 30, 2020

AudioBooks – May 2020

Fewer books this month. At home on lockdown and the weather a bit worse, so less time to go on walks and listen.

Save the Cat! Writes a Novel: The Last Book On Novel Writing You’ll Ever Need by Jessica Brody

A fairly straight adaptation of the screenplay-writing manual. Lots of examples from well-known books including full breakdowns of beats. 3/5

Happy Singlehood: The Rising Acceptance and Celebration of Solo Living by Elyakim Kislev

Based on 142 interviews. A lot of summaries of findings with quotes from interviewees and people’s blogs. The last chapter has some policy push but it’s a little light. 3/5

Scandinavia: A History by Ewan Butler

Just a 6-hour quick spin through history. The first half suffers a bit from lists of kings, although there is a bit more colour later on. Okay prep for something meatier. 3/5

One Giant Leap: The Impossible Mission That Flew Us to the Moon by Charles Fishman

A bit of a mix. It covers the legacy of Apollo, but the best bits are the chapters on the computers, politics and other behind-the-scenes things. A complement to astronaut- and mission-oriented books. 4/5

My Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average, in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all


May 29, 2020

Using Live Linux to Save and Recover Your Data

There are two types of people in the world: those who have lost data and those who are about to. Given that entropy will bite eventually, the objective should be to minimise data loss. Some key rules for this: backup, backup often, and backup with redundancy. Whilst an article on that subject will be produced, at this stage discussion is directed to the very specific task of using Linux to recover data from old machines which may not be accessible any more. The number of times I've done this in past years is somewhat more than the number of fingers I have; however, like all good things, it deserves to be documented in the hope that other people might find it useful.

To do this one will need a Linux live distribution of some sort as an ISO, written to a bootable USB drive. A typical choice would be Ubuntu Live or Fedora Live. If one is dealing with damaged hardware, the old Slackware-derived minimalist distribution Recovery Is Possible (RIP) is certainly worth using; it has certainly saved me in the past. If you need help in creating a bootable USB, the good people at HowToGeek provide some simple instructions.

With a Linux bootable disk of some description inserted in one's system, the recovery process can begin. Firstly, boot the machine and change the boot order (in BIOS/UEFI) so that the drive in question becomes the first in the boot order. Once the live distribution boots up, usually into a GUI environment, one needs to open the terminal application (e.g., GNOME in Fedora uses Applications, System Tools, Terminal) and change to the root user with the su command (there's no password to be root on a live CD!).

At this point one needs to create a mount point directory, where the data is going to be stored; mkdir /mnt/recovery. After this one needs to identify the disk which one is trying to access. The fdisk -l command will provide a list of all disks in the partition table. Some educated guesswork from the results is required here, which will provide the device filesystem Type; it almost certainly isn't an EFI System, or Linux swap for example. Typically one is trying to access something like /dev/sdaX.

Then one must mount the device to the directory that was just created, for example: mount /dev/sda2 /mnt/recovery. Sometimes a recalcitrant device will need to have the filesystem explicitly stated; the most common being ext3, ext4, fat, xfs, vfat, and ntfs-3g. To give a recent example, I needed to run mount -t ext3 /dev/sda3 /mnt/recovery. From there one can copy the data from the mount point to a new destination; a USB drive is probably the quickest, although one may take the opportunity to copy it to an external system (e.g., Google Drive) - and that's it! You've recovered your data!
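
To pull the steps together, here's a rough sketch of the whole sequence; the device name, filesystem type, and USB destination are illustrative and will differ on your system:

# become root (no password needed on a live image)
su -
# create a mount point and identify the disk to recover
mkdir /mnt/recovery
fdisk -l
# mount the partition, stating the filesystem type if necessary
mount -t ext4 /dev/sda2 /mnt/recovery
# copy the data to an attached (and mounted) USB drive
cp -av /mnt/recovery/home /mnt/usb/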

May 28, 2020

Fixing locale problem in MythTV 30

After upgrading to MythTV 30, I noticed that the interface of mythfrontend switched from the French language to English, despite having the following in my ~/.xsession for the mythtv user:

export LANG=fr_CA.UTF-8
exec ~/bin/start_mythtv

I noticed a few related error messages in /var/log/syslog:

mythbackend[6606]: I CoreContext mythcorecontext.cpp:272 (Init) Assumed character encoding: fr_CA.UTF-8
mythbackend[6606]: N CoreContext mythcorecontext.cpp:1780 (InitLocale) Setting QT default locale to FR_US
mythbackend[6606]: I CoreContext mythcorecontext.cpp:1813 (SaveLocaleDefaults) Current locale FR_US
mythbackend[6606]: E CoreContext mythlocale.cpp:110 (LoadDefaultsFromXML) No locale defaults file for FR_US, skipping
mythpreviewgen[9371]: N CoreContext mythcorecontext.cpp:1780 (InitLocale) Setting QT default locale to FR_US
mythpreviewgen[9371]: I CoreContext mythcorecontext.cpp:1813 (SaveLocaleDefaults) Current locale FR_US
mythpreviewgen[9371]: E CoreContext mythlocale.cpp:110 (LoadDefaultsFromXML) No locale defaults file for FR_US, skipping

Searching for that non-existent fr_US locale, I found that others have this in their logs and that it's apparently set by QT as a combination of the language and country codes.

I therefore looked in the database and found the following:

MariaDB [mythconverg]> SELECT value, data FROM settings WHERE value = 'Language';
+----------+------+
| value    | data |
+----------+------+
| Language | FR   |
+----------+------+
1 row in set (0.000 sec)

MariaDB [mythconverg]> SELECT value, data FROM settings WHERE value = 'Country';
+---------+------+
| value   | data |
+---------+------+
| Country | US   |
+---------+------+
1 row in set (0.000 sec)

which explains the nonsensical fr_US locale.

I fixed the country setting like this:

MariaDB [mythconverg]> UPDATE settings SET data = 'CA' WHERE value = 'Country';
Query OK, 1 row affected (0.093 sec)
Rows matched: 1  Changed: 1  Warnings: 0

After logging out and logging back in, the user interface of the frontend is now using the fr_CA locale again and the database setting looks good:

MariaDB [mythconverg]> SELECT value, data FROM settings WHERE value = 'Country';
+---------+------+
| value   | data |
+---------+------+
| Country | CA   |
+---------+------+
1 row in set (0.000 sec)

Introducing Shaken Fist


The first public commit to what would become OpenStack Nova was made ten years ago today — at Thu May 27 23:05:26 2010 PDT to be exact. So first off, happy tenth birthday to Nova!

A lot has happened in that time — OpenStack has gone from being two separate Open Source projects to a whole ecosystem, developers have come and gone (and passed away), and OpenStack has weathered the cloud wars of the last decade. OpenStack survived its early growth phase by deliberately offering a “big tent” to the community and associated vendors, with an expansive definition of what should be included. This has resulted in most developers being associated with a corporate sponsor, and hence the decrease in the number of developers today as corporate interest wanes — OpenStack has never been great at attracting or retaining hobbyist contributors.

My personal involvement with OpenStack started in November 2011, so while I missed the very early days I was around for a lot and made many of the mistakes that I now see in OpenStack.

What do I see as mistakes in OpenStack in hindsight? Well, embracing vendors who later lose interest has been painful, and has increased the complexity of the code base significantly. Nova itself is now nearly 400,000 lines of code, and that’s after splitting off many of the original features of Nova such as block storage and networking. Additionally, a lot of our initial assumptions are no longer true — for example in many cases we had to write code to implement things, where there are now good libraries available from third parties.

That’s not to say that OpenStack is without value — I am a daily user of OpenStack to this day, and use at least three OpenStack public clouds at the moment. That said, OpenStack is a complicated beast with a lot of legacy that makes it hard to maintain and slow to change.

For at least six months I’ve felt the desire for a simpler cloud orchestration layer — both for my own personal uses, and also as a test bed for ideas for what a smaller, simpler cloud might look like. My personal use case involves a relatively small environment which echoes what we now think of as edge compute — less than 10 RU of machines with a minimum of orchestration and management overhead.

At the time that I was thinking about these things, the Australian bushfires and COVID-19 came along, and presented me with a lot more spare time than I had expected to have. While I’m still blessed to be employed, all of my social activities have been cancelled, so I find myself at home at a loose end on weekends and evenings a lot more than before.

Thus Shaken Fist was born — named for a Simpsons meme, Shaken Fist is a deliberately small and highly opinionated cloud implementation aimed at working well in small deployments such as homes, labs, edge compute locations, deployed systems, and so forth.

I’ve taken a bit of trouble with each feature in Shaken Fist to think through what the simplest and highest value way of doing something is. For example, instances always get a config drive and there is no metadata server. There is also only one supported type of virtual networking, and one supported hypervisor. That said, this means Shaken Fist is less than 5,000 lines of code, and small enough that new things can be implemented very quickly by a single middle aged developer.

Shaken Fist definitely has feature gaps — API authentication and scheduling are the most obvious at the moment — but I have plans to fill those when the time comes.

I’m not sure if Shaken Fist is useful to others, but you never know. It’s Apache2 licensed, and available on GitHub if you’re interested.


May 27, 2020

57 Varieties of Pyrite: Exchanges Are Now The Enemy of Bitcoin

TL;DR: exchanges are casinos and don’t want to onboard anyone into bitcoin. Avoid.

There’s a classic scam in the “crypto” space: advertize Bitcoin to get people in, then sell suckers something else entirely. Over the last few years, this bait-and-switch has become the core competency of “bitcoin” exchanges.

I recently visited the homepage of Australian exchange btcmarkets.net: what a mess. There was a list of dozens of identical-looking “cryptos”, with bitcoin second after something called “XRP”; seems like it was sorted by volume?

Incentives have driven exchanges to become casinos, and they’re doing exactly what you’d expect unregulated casinos to do. This is no place you ever want to send anyone.

Incentives For Exchanges

Exchanges make money on trading, not on buying and holding. Despite the fact that bitcoin is the only real attempt to create an open source money, scams with no future are given false equivalence, because more assets means more trading. Worse than that, they are paid directly to list new scams (the crappier, the more money they can charge!) and have recently taken the logical step of introducing and promoting their own crapcoins directly.

It’s like a gold dealer who also sells 57 varieties of pyrite, which give more margin than selling actual gold.

For a long time, I thought exchanges were merely incompetent. Most can’t even give out fresh addresses for deposits, batch their outgoing transactions, pay competent fee rates, perform RBF or use segwit.

But I misunderstood: they don’t want to sell bitcoin. They use bitcoin to get you in the door, but they want you to gamble. This matters: you’ll find subtle and not-so-subtle blockers to simply buying bitcoin on an exchange. If you send a friend off to buy their first bitcoin, they’re likely to come back with something else. That’s no accident.

Looking Deeper, It Gets Worse.

Regrettably, looking harder at specific exchanges makes the picture even bleaker.

Consider Binance: this mainland China backed exchange pretending to be a Hong Kong exchange appeared out of nowhere with fake volume and demonstrated the gullibility of the entire industry by being treated as if it were a respected member. They lost at least 40,000 bitcoin in a known hack, and they also lost all the personal information people sent them to KYC. They aggressively market their own coin. But basically, they’re just MtGox without Mark Karpales’ PHP skills or moral scruples and much better marketing.

Coinbase is more interesting: an MBA-run “bitcoin” company which really dislikes bitcoin. They got where they are by spending big on regulations compliance in the US so they could operate in (almost?) every US state. (They don’t do much to dispel the wide belief that this regulation protects their users, when in practice it seems only USD deposits have any guarantee). Their natural interest is in increasing regulation to maintain that moat, and their biggest problem is Bitcoin.

They have much more affinity for the centralized coins (Ethereum) where they can have influence and control. The anarchic nature of a genuine open source community (not to mention the developers’ oft-stated aim to improve privacy over time) is not culturally compatible with a top-down company run by the Big Dog. It’s a running joke that their CEO can’t say the word “Bitcoin”, but their recent “what will happen to cryptocurrencies in the 2020s” article is breathtaking in its boldness: innovation is mainly happening on altcoins, and they’re going to overtake bitcoin any day now. Those scaling problems which the Bitcoin developers say they don’t know how to solve? This non-technical CEO knows better.

So, don’t send anyone to an exchange, especially not a “market leading” one. Find some service that actually wants to sell them bitcoin, like CashApp or Swan Bitcoin.

May 26, 2020

Cruises and Covid19

Problems With Cruises

GQ has an insightful and detailed article about Covid19 and the Diamond Princess [1], I recommend reading it.

FastCompany has a brief article about bookings for cruises in August [2]. There have been many negative comments about this online.

The first thing to note is that the cancellation policies on those cruises are more lenient than usual and the prices are lower. So it’s not unreasonable for someone to put down a deposit on a half price holiday in the hope that Covid19 goes away (as so many prominent people have been saying it will) in the knowledge that they will get it refunded if things don’t work out. Of course if the cruise line goes bankrupt then no-one will get a refund, but I think people are expecting that won’t happen.

The GQ article highlights some serious problems with the way cruise ships operate. They have staff crammed into small cabins and the working areas allow transmission of disease. These problems can be alleviated: they could allocate more space to staff quarters and have more capable air conditioning systems to put in more fresh air. During the life of a cruise ship significant changes are often made: replacing engines with newer more efficient models, changing the size of various rooms for entertainment, installing new waterslides, and many other changes are routinely made. Changing the staff-only areas to have better ventilation and more separate space (maybe capsule-hotel style cabins with fresh air piped in) would not be a difficult change. It would take some money and some dry-dock time, which would be a significant expense for cruise companies.

Cruises Are Great

People like social environments; they want to have situations where there are as many people as possible without it becoming impossible to move. Cruise ships are carefully designed for the flow of passengers. Both the layout of the ship and the schedule of events are carefully planned to avoid excessive crowds. In terms of meeting the requirement of having as many people as possible in a small area without being unable to move, cruise ships are probably ideal.

Because there is a large number of people in a restricted space there are economies of scale on a cruise ship that aren’t available anywhere else. For example, the main items on the menu are made in a production-line process; this can only be done when you have hundreds of people sitting down to order at the same time.

The same applies to all forms of entertainment on board, they plan the events based on statistical knowledge of what people want to attend. This makes it more economical to run than land based entertainment where people can decide to go elsewhere. On a ship a certain portion of the passengers will see whatever show is presented each night, regardless of whether it’s singing, dancing, or magic.

One major advantage of cruises is that they are all inclusive. If you are on a regular holiday would you pay to see a singing or dancing show? Probably not, but if it’s included then you might as well do it – and it will be pretty good. This benefit is really appreciated by people taking kids on holidays, if kids do things like refuse to attend a performance that you were going to see or reject food once it’s served then it won’t cost any extra.

People Who Criticise Cruises

For the people who sneer at cruises, do you like going to bars? Do you like going to restaurants? Live music shows? Visiting foreign beaches? A cruise gets you all that and more for a discount price.

If Groupon had a deal that gave you a cheap hotel stay with all meals included, free non-alcoholic drinks at bars, day long entertainment for kids at the kids clubs, and two live performances every evening how many of the people who reject cruises would buy it? A typical cruise is just like a Groupon deal for non-stop entertainment from 8AM to 11PM.

Will Cruises Restart?

The entertainment options that cruises offer are greatly desired by many people. Most cruises are aimed at budget travellers, the price is cheaper than a hotel in a major city. Such cruises greatly depend on economies of scale, if they can’t get the ships filled then they would need to raise prices (thus decreasing demand) to try to make a profit. I think that some older cruise ships will be scrapped in the near future and some of the newer ships will be sold to cruise lines that cater to cheap travel (IE P&O may scrap some ships and some of the older Princess ships may be transferred to them). Overall I predict a decrease in the number of middle-class cruise ships.

For the expensive cruises (where the cheapest cabins cost over $1000US per person per night) I don’t expect any real changes, maybe they will have fewer passengers and higher prices to allow more social distancing or something.

I am certain that cruises will start again, but it’s too early to predict when. Going on a cruise is about as safe as going to a concert or a major sporting event. No-one is predicting that sporting stadiums will be closed forever or live concerts will be cancelled forever, so really no-one should expect that cruises will be cancelled forever. Whether companies that own ships or stadiums go bankrupt in the mean time is yet to be determined.

One thing that’s been happening for years is themed cruises. A group can book out an entire ship or part of a ship for a themed cruise. I expect this to become much more popular when cruises start again as it will make it easier to fill ships. In the past it seems that cruise lines let companies book their ships for events but didn’t take much of an active role in the process. I think that the management of cruise lines will look to aggressively market themed cruises to anyone who might help, for starters they could reach out to every 80s and 90s pop group – those fans are all old enough to be interested in themed cruises and the musicians won’t be asking for too much money.

Conclusion

Humans are social creatures. People want to attend events with many other people. Covid 19 won’t be the last pandemic, and it may not even be eradicated in the near future. The possibility of having a society where no-one leaves home unless they are in a hazmat suit has been explored in science fiction, but I don’t think that’s a plausible scenario for the near future and I don’t think that it’s something that will be caused by Covid 19.

May 25, 2020

op-build v2.5 firmware for the Raptor Blackbird

Well, following on from my post where I excitedly pointed out that Raptor Blackbird support is all upstream in op-build v2.5, I can now do another in my series of (close to) upstream Blackbird firmware builds.

This time, the only difference from straight upstream op-build v2.5 is my fixes for buildroot so that I can actually build it on Fedora 32.

So, head over to https://www.flamingspork.com/blackbird/op-build-v2.5-blackbird-images/ and grab blackbird.pnor to flash it on your blackbird, let me know how it goes!

GNS3 FRR Appliance

In my spare time, what little I have, I’ve been wanting to play with some OSS networking projects. For those playing along at home, during the last SUSE hackweek I played with WireGuard, and to test the environment I wanted to set up some routing, for which I used FRR.

FRR is a pretty cool project: it brings a network routing stack to Linux, or rather gives us a full open source routing stack. Most routers are actually Linux anyway.

Many years ago I happened to work at Fujitsu in a gateway environment, and started playing around with networking. That was my first experience with GNS3, an open source network simulator. Back then I needed a copy of Cisco IOS images to really play with routing protocols, which made things harder; a great open source product, but it needed access to proprietary router OSes.

FRR provides a CLI _very_ similar to Cisco’s, and that made me think: hey, I wonder if there is an FRR appliance we can use in GNS3?
And there was!!!

When I downloaded it and decompressed the qcow2 image it was 1.5GB!!! For a single router image. It works great, but what if I wanted a bunch of routers to play with things like OSPF or BGP etc.? Surely we can make a smaller one.

Kiwi

At SUSE we use kiwi-ng to build machine images and release media. To make things even easier for me, we already have a kiwi config for small openSUSE Leap JeOS images (JeOS is “just enough OS”). So I hacked one to include FRR. All extra tweaks needed to the image are also easily done by bash hook scripts.

I won’t go into too much detail about how, because I created a git repo where I have it all, including a detailed README: https://github.com/matthewoliver/frr_gns3

So feel free to check that out and build and use the image.

But today, I went one step further. openSUSE’s Open Build Service (OBS), which is used to build all RPMs for openSUSE but can also build debs and whatever else you need, also supports building docker containers and system images using kiwi!

So I have now got OBS to build the image for me. The image can be downloaded from: https://download.opensuse.org/repositories/home:/mattoliverau/images/

And if you want to send any OBS requests to change it the project/package is: https://build.opensuse.org/package/show/home:mattoliverau/FRR-OpenSuse-Appliance

To import it into GNS3 you need the gns3a file, which you can find in my git repo or in the OBS project page.

The best part is this image is only 300MB, which is much better than 1.5GB!
I did have it a little smaller, 200-250MB, but unfortunately the JeOS cut-down kernel doesn’t contain the MPLS modules, so I had to pull in the full default SUSE kernel. If this became a real thing and not a pet project, I could go and build an FRR cut-down kernel to get the size down, but 300MB is already a lot better than where it was at.

Hostname Hack

When using GNS3 and you place a router, you want to be able to name the router, and when you access the console it’s _really_ nice to see the router name you specified in GNS3 as the hostname. Why? Because if you have a bunch of routers, you don’t want a bunch of tabs all showing the localhost hostname on the command line… that doesn’t really help.

The FRR image uses qemu, and there wasn’t a nice way to access the name of the VM from inside the guest, nor an easy way to insert the name from outside. But I found one approach that seems to be working: enter my dodgy hostname hack!

I also wanted to do it without hacking the gns3server code. I couldn’t easily pass the hostname in directly, but I could pass it in via a null device with the router name as its id:

/dev/virtio-ports/frr.router.hostname.%vm-name%

So I simply wrote a script that sets the hostname based on the existence of this device, made the script a systemd oneshot service to start at boot, and it worked!
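
The actual script and unit file live in the git repo above; what follows is only a minimal sketch of the idea, with illustrative file names:

#!/bin/sh
# /usr/local/bin/gns3-hostname.sh
# If GNS3 exposed the router name via a virtio-ports device, use it as the hostname.
for dev in /dev/virtio-ports/frr.router.hostname.*; do
    [ -e "$dev" ] || exit 0
    name="${dev#/dev/virtio-ports/frr.router.hostname.}"
    hostname "$name"
    echo "$name" > /etc/hostname
done

# /etc/systemd/system/gns3-hostname.service
[Unit]
Description=Set hostname from the GNS3-provided virtio-ports device

[Service]
Type=oneshot
ExecStart=/usr/local/bin/gns3-hostname.sh

[Install]
WantedBy=multi-user.target

Enable it once with systemctl enable gns3-hostname.service and it runs at each boot.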

This means that after changing the name of the FRR router in the GNS3 interface, all you need to do is restart the router (stop and start the device) and it’ll apply the name to the router. This saves you having to log in as root and run hostname yourself.

Or better, if you name all your FRR routers before turning them on, then it’ll just work.

In conclusion…

Hopefully now we can have a fully open source GNS3 + FRR appliance solution for network training, testing, and inspiring network engineers.

May 24, 2020

Printing hard-to-print PDFs on Linux

I recently found a few PDFs which I was unable to print due to those files causing insufficient printer memory errors.

I found a detailed explanation of what might be causing this which pointed the finger at transparent images, a PDF 1.4 feature which apparently requires a more recent version of PostScript than what my printer supports.

Using Okular's Force rasterization option (accessible via the print dialog) does work, by essentially rendering everything ahead of time and outputting a big image to be sent to the printer. However, the quality is not very good.

Converting a PDF to DjVu

The best solution I found makes use of a different file format: .djvu

Such files are not PDFs, but can still be opened in Evince and Okular, as well as in the dedicated DjVuLibre application.

As an example, I was unable to print page 11 of this paper. Using pdfinfo, I found that it is in PDF 1.5 format and so the transparency effects could be the cause of the out-of-memory printer error.
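
For reference, that check is just a one-liner with poppler's pdfinfo (same file as above):

pdfinfo 2002.04049.pdf | grep 'PDF version'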

Here's how I converted it to a high-quality DjVu file I could print without problems using Evince:

pdf2djvu -d 1200 2002.04049.pdf > 2002.04049-1200dpi.djvu

Converting a PDF to PDF 1.3

I also tried the DjVu trick on a different unprintable PDF, but it failed to print, even after lowering the resolution to 600dpi:

pdf2djvu -d 600 dow-faq_v1.1.pdf > dow-faq_v1.1-600dpi.djvu

In this case, I used a different technique and simply converted the PDF to version 1.3 (from version 1.6 according to pdfinfo):

ps2pdf13 -r1200x1200 dow-faq_v1.1.pdf dow-faq_v1.1-1200dpi.pdf

This eliminates the problematic transparency and rasterizes the elements that version 1.3 doesn't support.

May 23, 2020

A totally cheating sour dough starter


This is the third in a series of posts documenting my adventures in making bread during the COVID-19 shutdown. I’d like to imagine I was running science experiments in making bread on my kids, but really all I was trying to do was eat some toast.

I’m not sure what it was like in other parts of the world, but during the COVID-19 pandemic Australia suffered a bunch of shortages — toilet paper, flour, and yeast were among those things stores simply didn’t have any stock of. Luckily we’d only just done a Costco shop so were OK for toilet paper and flour, but we were definitely getting low on yeast. The obvious answer is a sour dough starter, but I’d never done that thing before.

In the end my answer was to cheat and use this recipe. However, I found the instructions unclear, so here’s what I ended up doing:

Starting off

  • 2 cups of warm water
  • 2 teaspoons of dry yeast
  • 2 cups of bakers flour

Mix these three items together in a plastic container with enough space for the mix to double in size. Place in a warm place (on the bench on top of the dish washer was our answer), and cover with cloth secured with a rubber band.

Feeding

Once a day you should feed your starter with 1 cup of flour and 1 cup of warm water. Stir thoroughly.

Reducing size

The recipe online says to feed for five days, but the size of my starter was getting out of hand by a couple of days, so I started baking at that point. I’ll describe the baking process in a later post. The early loaves definitely weren’t as good as the more recent ones, but they were still edible.

Hibernation

Once the starter is going, you feed it daily and probably need to bake daily to keep the starter’s size under control. That obviously doesn’t work so great if you can’t eat an entire loaf of bread a day. You can hibernate the starter by putting it in the fridge, which means you only need to feed it once a week.

To wake a hibernated starter up, take it out of the fridge and feed it. I do this at 8am. That means I can then start the loaf for baking at about noon, and the starter can either go back in the fridge until next time or stay on the bench being fed daily.

I have noticed that sometimes the starter comes out of the fridge with a layer of dark water on top. It’s worked out OK for us to just ignore that and stir it into the mix as part of the feeding process. Hopefully we won’t die.


Refurbishing my Macintosh Plus

Somewhere in the mid to late 1990s I picked myself up a Macintosh Plus for the sum of $60AUD. At that time there were still computer Swap Meets where old and interesting equipment was around, so I headed over to one at some point (at the St Kilda Town Hall if memory serves) and picked myself up four 1MB SIMMs to boost the RAM of it from the standard 1MB to the insane amount of 4MB. Why? Umm… because I could? The RAM was pretty cheap, and somewhere in the house to this day, I sometimes stumble over the 256KB SIMMs as I just can’t bring myself to get rid of them.

This upgrade probably would have cost close to $2,000 at the system’s release. If the Macintosh system software were better at disk caching you could have easily held the whole 800k of the floppy disk in memory and still run useful software!

One of the annoying things that started with the Macintosh was odd screws and Apple gear being hard to get into. Compare to say, the Apple ][ which had handy clips to jump inside whenever. In fitting my massive FOUR MEGABYTES of RAM back in the day, I recall using a couple of allen keys sticky-taped together to be able to reach in and get the recessed Torx screws. These days, I can just order a torx bit off Amazon and have it arrive pretty quickly. Well, two torx bits, one of which is just too short for the job.

My (dusty) Macintosh Plus

One thing had always struck me about it: it never really looked like the photos of the Macintosh Plus I saw in books. An embarrassing number of years later, I learned that a lot can be gleaned from the serial number printed on the underside of the front of the case.

So heading over to the My Old Mac Serial Number Decoder I can find out:

Manufactured in: F => Fremont, California, USA
Year of production: 1985
Week of production: 14
Production number: 3V3 => 4457
Model ID: M0001WP => Macintosh 512K (European Macintosh ED)

Your Macintosh 512K (European Macintosh ED) was the 4457th Mac manufactured during the 14th week of 1985 in Fremont, California, USA.

Pretty cool! So it is certainly a Plus as the logic board says that, but it’s actually an upgraded 512k! If you think it was madness to have a GUI with only 128k of RAM in the original Macintosh, you’d be right. I do not envy anybody who had one of those.

Some time a decent number of years ago (but not too many, less than 10), I turned on the Mac Plus to see if it still worked. It did! But then… some magic smoke started to come out (which isn’t so good), but the computer kept working! There’s something utterly bizarre about looking at a computer with smoke coming out of it that continues to function perfectly fine.

Anyway, as the smoke was coming out, I decided that it would be an opportune time to turn it off, open doors and windows, and put it away until I was ready to deal with it.

One Global Pandemic Later, and now was the time.

I suspected it was going to be a capacitor somewhere that blew, and figured that I should replace it, and probably preemptively replace all the other electrolytic capacitors that could likely leak and cause problems.

First things first though: dismantle it and clean everything, starting with taking the case off. Apple is not new to the game of annoying screws to get into things. I ended up spending $12 on this set on Amazon, as the T10 bit can actually reach the screws holding the case on.

Cathode Ray Tubes are not to be messed with. We’re talking lethal voltages here. It had been many years since electricity went into this thing, so all was good. If this all doesn’t work first time when reassembling it, I’m not exactly looking forward to discharging a CRT and working on it.

The inside of my Macintosh Plus, with lots of grime.

You can see there’s grime everywhere. It’s not the worst in the world, but it’s not great (and kinda sticky). Obviously, this needs to be cleaned! The best way to do that is take a lot of photos, dismantle everything, and clean it a bit at a time.

There’s four main electronic components inside a Macintosh Plus:

  1. The CRT itself
  2. The floppy disk drive
  3. The Logic Board (what Mac people call what PC people call the motherboard)
  4. The Analog Board

There’s also some metal structure that keeps some things in place. There’s only a few connectors between things, which are pretty easy to remove. If you don’t know how to discharge a CRT and what the dangers of them are you should immediately go and find out through reading rather than finding out by dying. I would much prefer it if you dyed (because creative fun) rather than died.

Once the floppy connector and the power connector is unplugged, the logic board slides out pretty easily. You can see from the photo below that I have the 4MB of RAM installed and the resistor you need to snip is, well, snipped (but look really closely for that). Also, grime.

Macintosh Plus Logic Board

Cleaning things? Well, there’s two ways that I have used (and considering I haven’t yet written the post with “hurray, it all works”, currently take it with a grain of salt until I write that post). One: contact cleaner. Two: detergent.

Macintosh Plus Logic Board (being washed in my sink)

I took the route of cleaning things first, and then doing recapping adventures. So it was some contact cleaner on the boards, and then some soaking with detergent. This actually all worked pretty well.

Logic Board Capacitors:

  • C5, C6, C7, C12, C13 = 33uF 16V 85C (measured at 39uF, 38uF, 38uF, 39uF)
  • C14 = 1uF 50V (measured at 1.2uF and then it fluctuated down to around 1.15uF)

Analog Board Capacitors

  • C1 = 35V 3.9uF (M) measured at 4.37uF
  • C2 = 16V 4700uF SM measured at 4446uF
  • C3 = 16V 220uF +105C measured at 234uF
  • C5 = 10V 47uF 85C measured at 45.6uF
  • C6 = 50V 22uF 85C measured at 23.3uF
  • C10 = 16V 33uF 85C measured at 37uF
  • C11 = 160V 10uF 85C measured at 11.4uF
  • C12 = 50V 22uF 85C measured at 23.2uF
  • C18 = 16V 33uF 85C measured at 36.7uF
  • C24 = 16V 2200uF 105C measured at 2469uF
  • C27 = 16V 2200uF 105C measured at 2171uF (although started at 2190 and then went down slowly)
  • C28 = 16V 1000uF 105C measured at 638uF, then 1037uF, then 1000uF, then 987uF
  • C30 = 16V 2200uF 105C measured at 2203uF
  • C31 = 16V 220uF 105C measured at 236uF
  • C32 = 16V 2200uF 105C measured at 2227uF
  • C34 = 200V 100uF 85C measured at 101.8uF
  • C35 = 200V 100uF 85C measured at 103.3uF
  • C37 = 250V 0.47uF measured at <exploded>. wheee!
  • C38 = 200V 100uF 85C measured at 103.3uF
  • C39 = 200V 100uF 85C measured at 99.6uF (with scorch marks from next door)
  • C42 = 10V 470uF 85C measured at 556uF
  • C45 = 10V 470uF 85C measured at 227uF, then 637uF then 600uF

I’ve ordered an analog board kit from https://console5.com/store/macintosh-128k-512k-plus-analog-pcb-cap-kit-630-0102-661-0462.html and when trying to put them in, I learned that the US Analog board is different to the International Analog board!!! Gah. Dammit.

Note that C30, C32, C38, C39, and C37 were missing from the kit I received (probably due to differences in the US and International boards). I did have an X2 cap (for C37) but it was 0.1uF not 0.47uF. I also had two extra 1000uF 16V caps.

Macintosh Repair and Upgrade Secrets (up to the Mac SE no less!) holds an Appendix with the parts listing for both the US and International Analog boards, and this led me to conclude that they are in fact different boards rather than just a few wires that are different. I am not sure what the “For 120V operation, W12 must be in place” and “for 240V operation, W12 must be removed” writing is about on the International Analog board, but I’m not quite up to messing with that at the moment.

So, I ordered the parts (linked above) and waited (again) to be able to finish re-capping the board.

I found this video (https://youtu.be/H9dxJ7uNXOA) to be a good one for learning a bunch about the insides of compact Macs, and I recommend it and several others on his YouTube channel. One interesting thing I learned is that the X2 cap (C37 on the International one) is before the power switch, so could blow just by having the system plugged in and not turned on! Okay, so I’m kind of assuming that it also applies to the International board, and mine exploded while it was plugged in and switched on, so YMMV.

Additionally, there’s an interesting list of commonly failing parts. Unfortunately, this is also for the US logic board, so the tables in Macintosh Repair and Upgrade Secrets are useful. I’m hoping that I don’t have to replace anything more there, but we’ll see.

But, after the Nth round of parts being delivered….

Note the lack of an exploded capacitor

Yep, that’s where the exploded cap was before. It cleaned up pretty nicely actually. Annoyingly, I had to run it all through a step-up transformer as the board is all set for Australian 240V rather than US 120V. This isn’t going to be an everyday computer though, so it’s fine.

Macintosh Plus booting up (note how long the memory check of 4MB of RAM takes). I’m being very careful as the cover is off: high, and possibly lethal, voltages are exposed.

Woohoo! It works. While I haven’t found my supply of floppy disks that (at least used to) work, the floppy mechanism also seems to work okay.

Macintosh Plus with a seemingly working floppy drive mechanism. I haven’t found a boot floppy yet though.

Next up: waiting for my Floppy Emu to arrive as it’ll certainly let it boot. Also, it’s now time to rip the house apart to find a floppy disk that certainly should have made its way across the ocean with the move…. Oh, and also to clean up the mouse and keyboard.

May 18, 2020

Displaying client IP address using Apache Server-Side Includes

If you use a Dynamic DNS setup to reach machines which are not behind a stable IP address, you will likely have a need to probe these machines' public IP addresses. One option is to use an insecure service like Oracle's http://checkip.dyndns.com/ which echoes back your client IP, but you can also do this on your own server if you have one.

There are multiple options to do this, like writing a CGI or PHP script, but those are fairly heavyweight if that's all you need mod_cgi or PHP for. Instead, I decided to use Apache's built-in Server-Side Includes.

Apache configuration

Start by turning on the include filter by adding the following in /etc/apache2/conf-available/ssi.conf:

AddType text/html .shtml
AddOutputFilter INCLUDES .shtml

and making that configuration file active:

a2enconf ssi

Then, find the vhost file where you want to enable SSI and add the following options to a Location or Directory section:

<Location /ssi_files>
    Options +IncludesNOEXEC
    SSLRequireSSL
    Header set Content-Security-Policy: "default-src 'none'"
    Header set X-Content-Type-Options: "nosniff"
</Location>

before adding the necessary modules:

a2enmod headers
a2enmod include

and restarting Apache:

apache2ctl configtest && systemctl restart apache2.service

Create an shtml page

With the web server ready to process SSI instructions, the following HTML blurb can be used to display the client IP address:

<!--#echo var="REMOTE_ADDR" -->

or any other built-in variable.

Note that you don't need to write a valid HTML page for the variable to be substituted, and so the above one-liner is all I use on my server.

Security concerns

The first thing to note is that the configuration section uses the IncludesNOEXEC option in order to disable arbitrary command execution via SSI. In addition, you can also make sure that the cgi module is disabled since that's a dependency of the more dangerous side of SSI:

a2dismod cgi

Of course, if you rely on this IP address to be accurate, for example because you'll be putting it in your DNS, then you should make sure that you only serve this page over HTTPS, which can be enforced via the SSLRequireSSL directive.

I included two other headers in the above vhost config (Content-Security-Policy and X-Content-Type-Options) in order to limit the damage that could be done in case a malicious file was accidentally dropped in that directory.

Finally, I suggest making sure that only the root user has writable access to the directory which has server-side includes enabled:

$ ls -la /var/www/ssi_includes/
total 12
drwxr-xr-x  2 root     root     4096 May 18 15:58 .
drwxr-xr-x 16 root     root     4096 May 18 15:40 ..
-rw-r--r--  1 root     root        0 May 18 15:46 index.html
-rw-r--r--  1 root     root       32 May 18 15:58 whatsmyip.shtml

A Good Time to Upgrade PCs

PC hardware just keeps getting cheaper and faster. Now that so many people have been working from home the deficiencies of home PCs are becoming apparent. I’ll give Australian prices and URLs in this post, but I think that similar prices will be available everywhere that people read my blog.

From MSY (parts list PDF ) [1] 120G SATA SSDs are under $50 each. 120G is more than enough for a basic workstation, so you are looking at $42 or so for fast quiet storage or $84 or so for the same with RAID-1. Being quiet is a significant luxury feature and it’s also useful if you are going to be in video conferences.

For more serious storage NVMe starts at around $100 per unit, I think that $124 for a 500G Crucial NVMe is the best low end option (paying $95 for a 250G Kingston device doesn’t seem like enough savings to be worth it). So that’s $248 for 500G of very fast RAID-1 storage. There’s a Samsung 2TB NVMe device for $349 which is good if you need more storage, it’s interesting to note that this is significantly cheaper than the Samsung 2TB SSD which costs $455. I wonder if SATA SSD devices will go away in the future, it might end up being SATA for slow/cheap spinning media and M.2 NVMe for solid state storage. The SATA SSD devices are only good for use in older systems that don’t have M.2 sockets on the motherboard.

It seems that most new motherboards have one M.2 socket on the motherboard with NVMe support, and presumably support for booting from NVMe. But dual M.2 sockets is rare and the price difference is significantly greater than the cost of a PCIe M.2 card to support NVMe which is $14. So for NVMe RAID-1 it seems that the best option is a motherboard with a single NVMe socket (starting at $89 for a AM4 socket motherboard – the current standard for AMD CPUs) and a PCIe M.2 card.

One thing to note about NVMe is that different drivers are required. On Linux this means building a new initrd before the migration (or afterwards when booted from a recovery image), and on Windows it probably means a fresh install from special installation media with NVMe drivers.
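
As a rough example of what that initrd step can look like (the exact commands depend on the distribution; these assume Debian/Ubuntu or Fedora style tooling, and the nvme driver may already be included automatically):

# Debian/Ubuntu: make sure the nvme driver is in the initramfs and rebuild it
echo nvme >> /etc/initramfs-tools/modules
update-initramfs -u

# Fedora/RHEL: rebuild the initramfs, explicitly adding the nvme driver
dracut --force --add-drivers nvme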

All the AM4 motherboards seem to have RADEON Vega graphics built in which is capable of 4K resolution at a stated refresh of around 24Hz. The ones that give detail about the interfaces say that they have HDMI 1.4 which means a maximum of 30Hz at 4K resolution if you have the color encoding that suits text (IE for use other than just video). I covered this issue in detail in my blog post about DisplayPort and 4K resolution [2]. So a basic AM4 motherboard won’t give great 4K display support, but it will probably be good for a cheap start.

$89 for motherboard, $124 for 500G NVMe, $344 for a Ryzen 5 3600 CPU (not the cheapest AM4 but in the middle range and good value for money), and $99 for 16G of RAM (DDR4 RAM is cheaper than DDR3 RAM) gives the core of a very decent system for $656 (assuming you have a working system to upgrade and peripherals to go with it).

Currently Kogan has 4K resolution monitors starting at $329 [3]. They probably won’t be the greatest monitors but my experience of a past cheap 4K monitor from Kogan was that it is quite OK. Samsung 4K monitors started at about $400 last time I could check (Kogan currently has no stock of them and doesn’t display the price), I’d pay an extra $70 for Samsung, but the Kogan branded product is probably good enough for most people. So you are looking at under $1000 for a new system with fast CPU, DDR4 RAM, NVMe storage, and a 4K monitor if you already have the case, PSU, keyboard, mouse, etc.

It seems quite likely that the 4K video hardware on a cheap AM4 motherboard won’t be that great for games and it will definitely be lacking for watching TV documentaries. Whether such deficiencies are worth spending money on a PCIe video card (starting at $50 for a low end card but costing significantly more for 3D gaming at 4K resolution) is a matter of opinion. I probably wouldn’t have spent extra for a PCIe video card if I had 4K video on the motherboard. Not only does using built in video save money it means one less fan running (less background noise) and probably less electricity use too.

My Plans

I currently have a workstation with 2*500G SATA SSDs in a RAID-1 array, 16G of RAM, and a i5-2500 CPU (just under 1/4 the speed of the Ryzen 5 3600). If I had hard drives then I would definitely buy a new system right now. But as I have SSDs that work nicely (quiet and fast enough for most things) and almost all machines I personally use have SSDs (so I can’t get a benefit from moving my current SSDs to another system) I would just get CPU, motherboard, and RAM. So the question is whether to spend $532 for more than 4* the CPU performance. At the moment I’ll wait because I’ll probably get a free system with DDR4 RAM in the near future, while it probably won’t be as fast as a Ryzen 5 3600, it should be at least twice as fast as what I currently have.

May 17, 2020

Notes on Installing Ubuntu 20 VM on an MS-Windows 10 Host

Some thirteen years ago I worked with Xen virtual machines as part of my day job, and gave a presentation at Linux Users of Victoria on the subject (with additional lecture notes). A few years after that I gave another presentation on the Unified Extensible Firmware Interface (UEFI), which itself (indirectly) led to a post on Linux and MS-Windows 8 dual-booting. All of this now leads to some notes on using MS-Windows as a host for Ubuntu Linux guest machines.

Why Would You Want to do This?

Most people these days have at least heard of Linux. They might even know that every single supercomputer in the world uses Linux. They may know that the overwhelming majority of embedded devices, such as home routers, use Linux. Or maybe even that the Android mobile 'phone uses a Linux kernel. Or that MacOS is built on the same broad family of UNIX-like operating systems. Whilst they might be familiar with their MS-Windows environment, because that's what they've been brought up on and what their favourite applications are designed for, they might also be "Linux curious", especially if they are hoping to either scale-up the complexity and volume of the datasets they're working with (i.e., towards high performance computing) or scale-down their applications (i.e., towards embedded devices). If this is the case, then introducing Linux via a virtual machine (VM) is a relatively safe and easy path to experiment with.

About VMs

Virtual machines work by emulating a computer system, including hardware, in a software environment, a technology that has been around for a very long time (e.g., CP/CMS, 1967). The VMs on a host system are managed by a hypervisor, or Virtual Machine Monitor (VMM), which manages one or more guest systems. In the example that follows, the hypervisor is VirtualBox, a free and open-source hypervisor. Because the guest system relies on the host it cannot have the same performance as the host system, unlike a dual-boot system. It will share memory, it will share processing power, it must take up some disk space, and it will also have the overhead of the hypervisor itself (although this has improved a great deal in recent years). In a production environment, VMs are usually used to optimise resource allocation for very powerful systems, such as web-server farms and bodies like the Nectar Research Cloud, or even some partitions on systems like the University of Melbourne's supercomputer, Spartan. In a development environment, VMs are an excellent tool for testing and debugging.

Install VirtualBox and Enable Virtualization

For most environments VirtualBox is an easy path for creating a virtual machine, ARM systems excluded (QEMU is suggested for Raspberry Pi or Android, or QEMU's fork, KVM). For the example given here, simply download VirtualBox for MS-Windows and click one's way through the installation process, noting that VirtualBox will make changes to your system and that products from Oracle can be trusted (*blink*). Downloads for other operating environments are worth looking at as well.

It is essential to enable virtualisation on your MS-Windows host through the BIOS/UEFI, which is not as easy as it used to be. A handy page from some smart people in the Czech Republic provides quick instructions for a variety of hardware environments. The good people at laptopmag provide the path from within the MS-Windows environment. In summary: select Settings (gear icon), select Update & Security, select Recovery (this sounds wrong), Advanced Startup, Restart Now (which is also wrong, you don't restart now), Troubleshoot, Advanced Options, UEFI Firmware Settings, then Restart.

Install Linux and Create a Shared Folder

Download a Ubuntu 20.04 LTS (long-term support) ISO and save to the MS-Windows host. There are some clever alternatives, such as the Ubuntu Linux terminal environment for MS-Windows (which is possibly even a better choice these days, but that will be for another post), or Multipass which allows one to create their own mini-cloud environment. But this is a discussion for a VM, so I'll resist the temptation to go off on a tangent.

Creating a VM in VirtualBox is pretty straightforward; open the application, select "New", give the VM a name, and allocate resources (virtual hard disk, virtual memory). It's worthwhile tending towards the generous in resource allocation. After that it is a case of selecting the ISO under settings and storage; remember a VM does not have a real disk drive, so it has a virtual (software) one. After this one can start the VM, and it will boot from the ISO and begin the installation process for the Ubuntu Linux desktop edition, which is pretty straightforward. One amusing caveat: when the installation says it's going to wipe the disk it doesn't mean the host machine, just the virtual disk that has been built for it. When the installation is complete go to "Devices" on the VM menu, remove the boot disk, and restart the guest system; you now have an Ubuntu VM installed on your MS-Windows system.
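
If one prefers the command line to clicking through the GUI, the same VM can be created with VirtualBox's VBoxManage tool; roughly something like the following, where the VM name, sizes, and ISO path are examples only:

VBoxManage createvm --name "Ubuntu20" --ostype Ubuntu_64 --register
VBoxManage modifyvm "Ubuntu20" --memory 4096 --cpus 2
VBoxManage createmedium disk --filename Ubuntu20.vdi --size 25000
VBoxManage storagectl "Ubuntu20" --name "SATA" --add sata
VBoxManage storageattach "Ubuntu20" --storagectl "SATA" --port 0 --device 0 --type hdd --medium Ubuntu20.vdi
VBoxManage storageattach "Ubuntu20" --storagectl "SATA" --port 1 --device 0 --type dvddrive --medium ubuntu-20.04-desktop-amd64.iso
VBoxManage startvm "Ubuntu20"

On an MS-Windows host the same commands work via VBoxManage.exe from a command prompt (you may need to add the VirtualBox install directory to your PATH).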

By default, VMs do not have access to the host computer. To provide that access one will want to set up a shared folder in the VM and on the host. The first step in this environment is to give the Linux user (created during installation) membership of the vboxsf group, e.g., on the terminal: sudo usermod -a -G vboxsf username. In VirtualBox, select Settings, and add a Share under Machine Folders, which is a permanent folder. Under Folder Path set the name and location on the host operating system (e.g., UbuntuShared on the Desktop); leave automount blank (we can fix that soon enough). Put a test file in the shared folder.

Ubuntu now needs additional software installed to work with VirtualBox's Guest Additions, including kernel modules. Also, mount VirtualBox's Guest Additions in the guest VM, under Devices, as a virtual CD; you can download this from the VirtualBox website.

Run the following commands, entering the default user's password as needed:


sudo apt-get install -y build-essential linux-headers-`uname -r`
sudo /media/cdrom/./VBoxLinuxAdditions.run
sudo shutdown -r now # Reboot the system
mkdir ~/UbuntuShared
sudo mount -t vboxsf shared ~/UbuntuShared
cd ~/UbuntuShared

The file that was put in the UbuntuShared folder in MS-Windows should now be visible in ~/UbuntuShared. Add a file (e.g., touch testfile.txt) from Linux and check if it can be seen in MS-Windows. If this all succeeds, make the folder persistent.


sudo nano /etc/fstab # nano is just fine for short configuration files
# Add the following, separated by tabs, and save
shared /home/<username>/UbuntuShared vboxsf defaults 0 0
# Edit modules
sudo nano /etc/modules
# Add the following
vboxsf
# Exit and reboot
sudo shutdown -r now

You're done! You now have a Ubuntu desktop system running as a VM guest using VirtualBox on an MS-Windows 10 host system. Ideal for learning, testing, and debugging.

A super simple non-breadmaker loaf


This is the second in a series of posts documenting my adventures in making bread during the COVID-19 shutdown. Yes I know all the cool kids made bread for themselves during the shutdown, but I did it too!

A loaf of bread

So here we were, in the middle of a pandemic which closed bakeries and cancelled almost all of my non-work activities. I found this animated GIF on Reddit for a super simple no-knead bread and decided to give it a go. It turns out that a few things are true:

  • animated GIFs are a super terrible way store recipes
  • that animated GIF was a export of this YouTube video which originally accompanied this blog post
  • and that I only learned these things while to trying and work out who to credit for this recipe

The basic recipe is really easy — chuck the following into a big bowl, stir, and then cover with a plate. Leave it resting in a warm place for a long time (three or four hours), then turn out onto a floured bench. Fold into a ball with flour, and then bake. You can see a more detailed version in the YouTube video above.

  • 3 cups of bakers flour (not plain white flour)
  • 2 teaspoons of yeast
  • 2 teaspoons of salt
  • 1.5 cups of warm water (again, I use 42 degrees from my gas hot water system)

The dough will seem really dry when you first mix it, but gets wetter as it rises. Don’t panic if it seems tacky and dry.

I think the key here is the baking process, which is how the oven loaf in my previous post about bread maker white loaves was baked. I use a cast iron camp oven (sometimes called a dutch oven), because thermal mass is key. If I had a fancy enamelized cast iron camp oven I’d use that, but I don’t and I wasn’t going shopping during the shutdown to get one. Oh, and they can be crazy expensive at up to $500 AUD.

Another loaf of bread

Warm the oven with the camp oven inside for at least 30 minutes at 230 degrees celsius. Then place the dough inside the camp oven on some baking paper — I tend to use a trivet as well, but I think you could skip that if you didn't have one. Bake for 30 minutes with the lid on — this helps steam the bread a little and forms a nice crust. Then bake for another 12 minutes with the camp oven lid off — this darkens the crust up nicely.

A final loaf of bread

Oh, and I’ve noticed a bit of variation in how wet the dough seems to be when I turn it out and form it in flour, but it doesn’t really seem to change the outcome once baked, so that’s nice.

The original blogger for this recipe also recommends chilling the dough overnight in the fridge before baking, but I haven't tried that yet.


Private Key Redaction: UR DOIN IT RONG

Because posting private keys on the Internet is a bad idea, some people like to “redact” their private keys, so that it looks kinda-sorta like a private key, but it isn’t actually giving away anything secret. Unfortunately, due to the way that private keys are represented, it is easy to “redact” a key in such a way that it doesn’t actually redact anything at all. RSA private keys are particularly bad at this, but the problem can (potentially) apply to other keys as well.

I’ll show you a bit of “Inside Baseball” with key formats, and then demonstrate the practical implications. Finally, we’ll go through a practical worked example from an actual not-really-redacted key I recently stumbled across in my travels.

The Private Lives of Private Keys

Here is what a typical private key looks like, when you come across it:

-----BEGIN RSA PRIVATE KEY-----
MGICAQACEQCxjdTmecltJEz2PLMpS4BXAgMBAAECEDKtuwD17gpagnASq1zQTYEC
CQDVTYVsjjF7IQIJANUYZsIjRsR3AgkAkahDUXL0RSECCB78r2SnsJC9AghaOK3F
sKoELg==
-----END RSA PRIVATE KEY-----

Obviously, there’s some hidden meaning in there – computers don’t encrypt things by shouting “BEGIN RSA PRIVATE KEY!”, after all. What is between the BEGIN/END lines above is, in fact, a base64-encoded DER format ASN.1 structure representing a PKCS#1 private key.

In simple terms, it’s a list of numbers – very important numbers. The list of numbers is, in order:

  • A version number (0);
  • The “public modulus”, commonly referred to as “n”;
  • The “public exponent”, or “e” (which is almost always 65,537, for various unimportant reasons);
  • The “private exponent”, or “d”;
  • The two “private primes”, or “p” and “q”;
  • Two exponents, which are known as “dmp1” and “dmq1”; and
  • A coefficient, known as “iqmp”.

Why Is This a Problem?

The thing is, only three of those numbers are actually required in a private key. The rest, whilst useful to allow the RSA encryption and decryption to be more efficient, aren’t necessary. The three absolutely required values are e, p, and q.

Of the other numbers, most of them are at least about the same size as each of p and q. So of the total data in an RSA key, less than a quarter of the data is required. Let me show you with the above “toy” key, by breaking it down piece by piece1:

  • MGI – DER for “this is a sequence”
  • CAQ – version (0)
  • CxjdTmecltJEz2PLMpS4BX – n
  • AgMBAA – e
  • ECEDKtuwD17gpagnASq1zQTY – d
  • ECCQDVTYVsjjF7IQ – p
  • IJANUYZsIjRsR3 – q
  • AgkAkahDUXL0RS – dmp1
  • ECCB78r2SnsJC9 – dmq1
  • AghaOK3FsKoELg== – iqmp
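
If you'd like to poke at the structure yourself, openssl will happily dump the DER fields of any PEM-encoded key; assuming the toy key above is saved as toy.pem, something like this prints the sequence:

openssl asn1parse -in toy.pem

Each INTEGER line it prints corresponds to one of the numbers in the list above.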

Remember that in order to reconstruct all of these values, all I need are e, p, and q – and e is pretty much always 65,537. So I could “redact” almost all of this key, and still give all the important, private bits of this key. Let me show you:

-----BEGIN RSA PRIVATE KEY-----
..............................................................EC
CQDVTYVsjjF7IQIJANUYZsIjRsR3....................................
........
-----END RSA PRIVATE KEY-----

Now, I doubt that anyone is going to redact a key precisely like this… but then again, this isn’t a “typical” RSA key. They usually look a lot more like this:

-----BEGIN RSA PRIVATE KEY-----
MIIEogIBAAKCAQEAu6Inch7+mWtKn+leB9uCG3MaJIxRyvC/5KTz2fR+h+GOhqj4
SZJobiVB4FrE5FgC7AnlH6qeRi9MI0s6dt5UWZ5oNIeWSaOOeNO+EJDUkSVf67wj
SNGXlSjGAkPZ0nRJiDjhuPvQmdW53hOaBLk5udxPEQbenpXAzbLJ7wH5ouLQ3nQw
HwpwDNQhF6zRO8WoscpDVThOAM+s4PS7EiK8ZR4hu2toon8Ynadlm95V45wR0VlW
zywgbkZCKa1IMrDCscB6CglQ10M3Xzya3iTzDtQxYMVqhDrA7uBYRxA0y1sER+Rb
yhEh03xz3AWemJVLCQuU06r+FABXJuY/QuAVvQIDAQABAoIBAFqwWVhzWqNUlFEO
PoCVvCEAVRZtK+tmyZj9kU87ORz8DCNR8A+/T/JM17ZUqO2lDGSBs9jGYpGRsr8s
USm69BIM2ljpX95fyzDjRu5C0jsFUYNi/7rmctmJR4s4uENcKV5J/++k5oI0Jw4L
c1ntHNWUgjK8m0UTJIlHbQq0bbAoFEcfdZxd3W+SzRG3jND3gifqKxBG04YDwloy
tu+bPV2jEih6p8tykew5OJwtJ3XsSZnqJMwcvDciVbwYNiJ6pUvGq6Z9kumOavm9
XU26m4cWipuK0URWbHWQA7SjbktqEpxsFrn5bYhJ9qXgLUh/I1+WhB2GEf3hQF5A
pDTN4oECgYEA7Kp6lE7ugFBDC09sKAhoQWrVSiFpZG4Z1gsL9z5YmZU/vZf0Su0n
9J2/k5B1GghvSwkTqpDZLXgNz8eIX0WCsS1xpzOuORSNvS1DWuzyATIG2cExuRiB
jYWIJUeCpa5p2PdlZmBrnD/hJ4oNk4oAVpf+HisfDSN7HBpN+TJfcAUCgYEAyvY7
Y4hQfHIdcfF3A9eeCGazIYbwVyfoGu70S/BZb2NoNEPymqsz7NOfwZQkL4O7R3Wl
Rm0vrWT8T5ykEUgT+2ruZVXYSQCKUOl18acbAy0eZ81wGBljZc9VWBrP1rHviVWd
OVDRZNjz6nd6ZMrJvxRa24TvxZbJMmO1cgSW1FkCgYAoWBd1WM9HiGclcnCZknVT
UYbykCeLO0mkN1Xe2/32kH7BLzox26PIC2wxF5seyPlP7Ugw92hOW/zewsD4nLze
v0R0oFa+3EYdTa4BvgqzMXgBfvGfABJ1saG32SzoWYcpuWLLxPwTMsCLIPmXgRr1
qAtl0SwF7Vp7O/C23mNukQKBgB89DOEB7xloWv3Zo27U9f7nB7UmVsGjY8cZdkJl
6O4LB9PbjXCe3ywZWmJqEbO6e83A3sJbNdZjT65VNq9uP50X1T+FmfeKfL99X2jl
RnQTsrVZWmJrLfBSnBkmb0zlMDAcHEnhFYmHFuvEnfL7f1fIoz9cU6c+0RLPY/L7
n9dpAoGAXih17mcmtnV+Ce+lBWzGWw9P4kVDSIxzGxd8gprrGKLa3Q9VuOrLdt58
++UzNUaBN6VYAe4jgxGfZfh+IaSlMouwOjDgE/qzgY8QsjBubzmABR/KWCYiRqkj
qpWCgo1FC1Gn94gh/+dW2Q8+NjYtXWNqQcjRP4AKTBnPktEvdMA=
-----END RSA PRIVATE KEY-----

People typically redact keys by deleting whole lines, and usually replacing them with [...] and the like. But only about 345 of those 1588 characters (excluding the header and footer) are required to construct the entire key. You can redact about 4/5ths of that giant blob of stuff, and your private parts (or at least, those of your key) are still left uncomfortably exposed.

But Wait! There’s More!

Remember how I said that everything in the key other than e, p, and q could be derived from those three numbers? Let’s talk about one of those numbers: n.

This is known as the “public modulus” (because, along with e, it is also present in the public key). It is very easy to calculate: n = p * q. It is also very early in the key (the second number, in fact).

Since n = p * q, it follows that q = n / p. Thus, as long as the key is intact up to p, you can derive q by simple division.
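
As a toy illustration of that arithmetic (with tiny made-up primes; the real 2048-bit values work exactly the same way, you just need an arbitrary-precision tool to do the division):

p=61; q=53
n=$((p * q))       # n = p * q -- this is the value that appears early in the key
echo $((n / p))    # prints 53: q recovered from n and p by simple division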

Real World Redaction

At this point, I’d like to introduce an acquaintance of mine: Mr. Johan Finn. He is the proud owner of the GitHub repo johanfinn/scripts. For a while, his repo contained a script with a poorly-redacted private key. He has since deleted it by making a new commit, but of course, because git never really deletes anything, it’s still available.

Of course, Mr. Finn may delete the repo, or force-push a new history without that commit, so here is the redacted private key, with a bit of the surrounding shell script, for our illustrative pleasure:

#Add private key to .ssh folder
cd /home/johan/.ssh/
echo  "-----BEGIN RSA PRIVATE KEY-----
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
ÄÄÄÄÄÄÄÄÄÄÄÄÄÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅ
MIIJKgIBAAKCAgEAxEVih1JGb8gu/Fm4AZh+ZwJw/pjzzliWrg4mICFt1g7SmIE2
TCQMKABdwd11wOFKCPc/UzRH/fHuQcvWrpbOSdqev/zKff9iedKw/YygkMeIRaXB
fYELqvUAOJ8PPfDm70st9GJRhjGgo5+L3cJB2gfgeiDNHzaFvapRSU0oMGQX+kI9
ezsjDAn+0Pp+r3h/u1QpLSH4moRFGF4omNydI+3iTGB98/EzuNhRBHRNq4oBV5SG
Pq/A1bem2ninnoEaQ+OPESxYzDz3Jy9jV0W/6LvtJ844m+XX69H5fqq5dy55z6DW
sGKn78ULPVZPsYH5Y7C+CM6GAn4nYCpau0t52sqsY5epXdeYx4Dc+Wm0CjXrUDEe
Egl4loPKDxJkQqQ/MQiz6Le/UK9vEmnWn1TRXK3ekzNV4NgDfJANBQobOpwt8WVB
rbsC0ON7n680RQnl7PltK9P1AQW5vHsahkoixk/BhcwhkrkZGyDIl9g8Q/Euyoq3
eivKPLz7/rhDE7C1BzFy7v8AjC3w7i9QeHcWOZFAXo5hiDasIAkljDOsdfD4tP5/
wSO6E6pjL3kJ+RH2FCHd7ciQb+IcuXbku64ln8gab4p8jLa/mcMI+V3eWYnZ82Yu
axsa85hAe4wb60cp/rCJo7ihhDTTvGooqtTisOv2nSvCYpcW9qbL6cGjAXECAwEA
AQKCAgEAjz6wnWDP5Y9ts2FrqUZ5ooamnzpUXlpLhrbu3m5ncl4ZF5LfH+QDN0Kl
KvONmHsUhJynC/vROybSJBU4Fu4bms1DJY3C39h/L7g00qhLG7901pgWMpn3QQtU
4P49qpBii20MGhuTsmQQALtV4kB/vTgYfinoawpo67cdYmk8lqzGzzB/HKxZdNTq
s+zOfxRr7PWMo9LyVRuKLjGyYXZJ/coFaobWBi8Y96Rw5NZZRYQQXLIalC/Dhndm
AHckpstEtx2i8f6yxEUOgPvV/gD7Akn92RpqOGW0g/kYpXjGqZQy9PVHGy61sInY
HSkcOspIkJiS6WyJY9JcvJPM6ns4b84GE9qoUlWVF3RWJk1dqYCw5hz4U8LFyxsF
R6WhYiImvjxBLpab55rSqbGkzjI2z+ucDZyl1gqIv9U6qceVsgRyuqdfVN4deU22
LzO5IEDhnGdFqg9KQY7u8zm686Ejs64T1sh0y4GOmGsSg+P6nsqkdlXH8C+Cf03F
lqPFg8WQC7ojl/S8dPmkT5tcJh3BPwIWuvbtVjFOGQc8x0lb+NwK8h2Nsn6LNazS
0H90adh/IyYX4sBMokrpxAi+gMAWiyJHIHLeH2itNKtAQd3qQowbrWNswJSgJzsT
JuJ7uqRKAFkE6nCeAkuj/6KHHMPsfCAffVdyGaWqhoxmPOrnVgECggEBAOrCCwiC
XxwUgjOfOKx68siFJLfHf4vPo42LZOkAQq5aUmcWHbJVXmoxLYSczyAROopY0wd6
Dx8rqnpO7OtZsdJMeBSHbMVKoBZ77hiCQlrljcj12moFaEAButLCdZFsZW4zF/sx
kWIAaPH9vc4MvHHyvyNoB3yQRdevu57X7xGf9UxWuPil/jvdbt9toaraUT6rUBWU
GYPNKaLFsQzKsFWAzp5RGpASkhuiBJ0Qx3cfLyirjrKqTipe3o3gh/5RSHQ6VAhz
gdUG7WszNWk8FDCL6RTWzPOrbUyJo/wz1kblsL3vhV7ldEKFHeEjsDGroW2VUFlS
asAHNvM4/uYcOSECggEBANYH0427qZtLVuL97htXW9kCAT75xbMwgRskAH4nJDlZ
IggDErmzBhtrHgR+9X09iL47jr7dUcrVNPHzK/WXALFSKzXhkG/yAgmt3r14WgJ6
5y7010LlPFrzaNEyO/S4ISuBLt4cinjJsrFpoo0WI8jXeM5ddG6ncxdurKXMymY7
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::.::
:::::::::::::::::::::::::::.::::::::::::::::::::::::::::::::::::
LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLlL
ÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖ
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
ÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅ
YYYYYYYYYYYYYYYYYYYYYyYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY
gff0GJCOMZ65pMSy3A3cSAtjlKnb4fWzuHD5CFbusN4WhCT/tNxGNSpzvxd8GIDs
nY7exs9L230oCCpedVgcbayHCbkChEfoPzL1e1jXjgCwCTgt8GjeEFqc1gXNEaUn
O8AJ4VlR8fRszHm6yR0ZUBdY7UJddxQiYOzt0S1RLlECggEAbdcs4mZdqf3OjejJ
06oTPs9NRtAJVZlppSi7pmmAyaNpOuKWMoLPElDAQ3Q7VX26LlExLCZoPOVpdqDH
KbdmBEfTR4e11Pn9vYdu9/i6o10U4hpmf4TYKlqk10g1Sj21l8JATj/7Diey8scO
sAI1iftSg3aBSj8W7rxCxSezrENzuqw5D95a/he1cMUTB6XuravqZK5O4eR0vrxR
AvMzXk5OXrUEALUvt84u6m6XZZ0pq5XZxq74s8p/x1JvTwcpJ3jDKNEixlHfdHEZ
ZIu/xpcwD5gRfVGQamdcWvzGHZYLBFO1y5kAtL8kI9tW7WaouWVLmv99AyxdAaCB
Y5mBAQKCAQEAzU7AnorPzYndlOzkxRFtp6MGsvRBsvvqPLCyUFEXrHNV872O7tdO
GmsMZl+q+TJXw7O54FjJJvqSSS1sk68AGRirHop7VQce8U36BmI2ZX6j2SVAgIkI
9m3btCCt5rfiCatn2+Qg6HECmrCsHw6H0RbwaXS4RZUXD/k4X+sslBitOb7K+Y+N
Bacq6QxxjlIqQdKKPs4P2PNHEAey+kEJJGEQ7bTkNxCZ21kgi1Sc5L8U/IGy0BMC
PvJxssLdaWILyp3Ws8Q4RAoC5c0ZP0W2j+5NSbi3jsDFi0Y6/2GRdY1HAZX4twem
Q0NCedq1JNatP1gsb6bcnVHFDEGsj/35oQKCAQEAgmWMuSrojR/fjJzvke6Wvbox
FRnPk+6YRzuYhAP/YPxSRYyB5at++5Q1qr7QWn7NFozFIVFFT8CBU36ktWQ39MGm
cJ5SGyN9nAbbuWA6e+/u059R7QL+6f64xHRAGyLT3gOb1G0N6h7VqFT25q5Tq0rc
Lf/CvLKoudjv+sQ5GKBPT18+zxmwJ8YUWAsXUyrqoFWY/Tvo5yLxaC0W2gh3+Ppi
EDqe4RRJ3VKuKfZxHn5VLxgtBFN96Gy0+Htm5tiMKOZMYAkHiL+vrVZAX0hIEuRZ
JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
-----END RSA PRIVATE KEY-----" >> id_rsa

Now, if you try to reconstruct this key by removing the “obvious” garbage lines (the ones that are all repeated characters, some of which aren’t even valid base64 characters), it still isn’t a key – at least, openssl pkey doesn’t want anything to do with it. The key is very much still in there, though, as we shall soon see.

Using a gem I wrote and a quick bit of Ruby, we can extract a complete private key. The irb session looks something like this:

>> require "derparse"
>> b64 = <<EOF
MIIJKgIBAAKCAgEAxEVih1JGb8gu/Fm4AZh+ZwJw/pjzzliWrg4mICFt1g7SmIE2
TCQMKABdwd11wOFKCPc/UzRH/fHuQcvWrpbOSdqev/zKff9iedKw/YygkMeIRaXB
fYELqvUAOJ8PPfDm70st9GJRhjGgo5+L3cJB2gfgeiDNHzaFvapRSU0oMGQX+kI9
ezsjDAn+0Pp+r3h/u1QpLSH4moRFGF4omNydI+3iTGB98/EzuNhRBHRNq4oBV5SG
Pq/A1bem2ninnoEaQ+OPESxYzDz3Jy9jV0W/6LvtJ844m+XX69H5fqq5dy55z6DW
sGKn78ULPVZPsYH5Y7C+CM6GAn4nYCpau0t52sqsY5epXdeYx4Dc+Wm0CjXrUDEe
Egl4loPKDxJkQqQ/MQiz6Le/UK9vEmnWn1TRXK3ekzNV4NgDfJANBQobOpwt8WVB
rbsC0ON7n680RQnl7PltK9P1AQW5vHsahkoixk/BhcwhkrkZGyDIl9g8Q/Euyoq3
eivKPLz7/rhDE7C1BzFy7v8AjC3w7i9QeHcWOZFAXo5hiDasIAkljDOsdfD4tP5/
wSO6E6pjL3kJ+RH2FCHd7ciQb+IcuXbku64ln8gab4p8jLa/mcMI+V3eWYnZ82Yu
axsa85hAe4wb60cp/rCJo7ihhDTTvGooqtTisOv2nSvCYpcW9qbL6cGjAXECAwEA
AQKCAgEAjz6wnWDP5Y9ts2FrqUZ5ooamnzpUXlpLhrbu3m5ncl4ZF5LfH+QDN0Kl
KvONmHsUhJynC/vROybSJBU4Fu4bms1DJY3C39h/L7g00qhLG7901pgWMpn3QQtU
4P49qpBii20MGhuTsmQQALtV4kB/vTgYfinoawpo67cdYmk8lqzGzzB/HKxZdNTq
s+zOfxRr7PWMo9LyVRuKLjGyYXZJ/coFaobWBi8Y96Rw5NZZRYQQXLIalC/Dhndm
AHckpstEtx2i8f6yxEUOgPvV/gD7Akn92RpqOGW0g/kYpXjGqZQy9PVHGy61sInY
HSkcOspIkJiS6WyJY9JcvJPM6ns4b84GE9qoUlWVF3RWJk1dqYCw5hz4U8LFyxsF
R6WhYiImvjxBLpab55rSqbGkzjI2z+ucDZyl1gqIv9U6qceVsgRyuqdfVN4deU22
LzO5IEDhnGdFqg9KQY7u8zm686Ejs64T1sh0y4GOmGsSg+P6nsqkdlXH8C+Cf03F
lqPFg8WQC7ojl/S8dPmkT5tcJh3BPwIWuvbtVjFOGQc8x0lb+NwK8h2Nsn6LNazS
0H90adh/IyYX4sBMokrpxAi+gMAWiyJHIHLeH2itNKtAQd3qQowbrWNswJSgJzsT
JuJ7uqRKAFkE6nCeAkuj/6KHHMPsfCAffVdyGaWqhoxmPOrnVgECggEBAOrCCwiC
XxwUgjOfOKx68siFJLfHf4vPo42LZOkAQq5aUmcWHbJVXmoxLYSczyAROopY0wd6
Dx8rqnpO7OtZsdJMeBSHbMVKoBZ77hiCQlrljcj12moFaEAButLCdZFsZW4zF/sx
kWIAaPH9vc4MvHHyvyNoB3yQRdevu57X7xGf9UxWuPil/jvdbt9toaraUT6rUBWU
GYPNKaLFsQzKsFWAzp5RGpASkhuiBJ0Qx3cfLyirjrKqTipe3o3gh/5RSHQ6VAhz
gdUG7WszNWk8FDCL6RTWzPOrbUyJo/wz1kblsL3vhV7ldEKFHeEjsDGroW2VUFlS
asAHNvM4/uYcOSECggEBANYH0427qZtLVuL97htXW9kCAT75xbMwgRskAH4nJDlZ
IggDErmzBhtrHgR+9X09iL47jr7dUcrVNPHzK/WXALFSKzXhkG/yAgmt3r14WgJ6
5y7010LlPFrzaNEyO/S4ISuBLt4cinjJsrFpoo0WI8jXeM5ddG6ncxdurKXMymY7
EOF
>> b64 += <<EOF
gff0GJCOMZ65pMSy3A3cSAtjlKnb4fWzuHD5CFbusN4WhCT/tNxGNSpzvxd8GIDs
nY7exs9L230oCCpedVgcbayHCbkChEfoPzL1e1jXjgCwCTgt8GjeEFqc1gXNEaUn
O8AJ4VlR8fRszHm6yR0ZUBdY7UJddxQiYOzt0S1RLlECggEAbdcs4mZdqf3OjejJ
06oTPs9NRtAJVZlppSi7pmmAyaNpOuKWMoLPElDAQ3Q7VX26LlExLCZoPOVpdqDH
KbdmBEfTR4e11Pn9vYdu9/i6o10U4hpmf4TYKlqk10g1Sj21l8JATj/7Diey8scO
sAI1iftSg3aBSj8W7rxCxSezrENzuqw5D95a/he1cMUTB6XuravqZK5O4eR0vrxR
AvMzXk5OXrUEALUvt84u6m6XZZ0pq5XZxq74s8p/x1JvTwcpJ3jDKNEixlHfdHEZ
ZIu/xpcwD5gRfVGQamdcWvzGHZYLBFO1y5kAtL8kI9tW7WaouWVLmv99AyxdAaCB
Y5mBAQKCAQEAzU7AnorPzYndlOzkxRFtp6MGsvRBsvvqPLCyUFEXrHNV872O7tdO
GmsMZl+q+TJXw7O54FjJJvqSSS1sk68AGRirHop7VQce8U36BmI2ZX6j2SVAgIkI
9m3btCCt5rfiCatn2+Qg6HECmrCsHw6H0RbwaXS4RZUXD/k4X+sslBitOb7K+Y+N
Bacq6QxxjlIqQdKKPs4P2PNHEAey+kEJJGEQ7bTkNxCZ21kgi1Sc5L8U/IGy0BMC
PvJxssLdaWILyp3Ws8Q4RAoC5c0ZP0W2j+5NSbi3jsDFi0Y6/2GRdY1HAZX4twem
Q0NCedq1JNatP1gsb6bcnVHFDEGsj/35oQKCAQEAgmWMuSrojR/fjJzvke6Wvbox
FRnPk+6YRzuYhAP/YPxSRYyB5at++5Q1qr7QWn7NFozFIVFFT8CBU36ktWQ39MGm
cJ5SGyN9nAbbuWA6e+/u059R7QL+6f64xHRAGyLT3gOb1G0N6h7VqFT25q5Tq0rc
Lf/CvLKoudjv+sQ5GKBPT18+zxmwJ8YUWAsXUyrqoFWY/Tvo5yLxaC0W2gh3+Ppi
EDqe4RRJ3VKuKfZxHn5VLxgtBFN96Gy0+Htm5tiMKOZMYAkHiL+vrVZAX0hIEuRZ
EOF
>> der = b64.unpack("m").first
>> c = DerParse.new(der).first_node.first_child
>> version = c.value
=> 0
>> c = c.next_node
>> n = c.value
=> 80071596234464993385068908004931... # (etc)
>> c = c.next_node
>> e = c.value
=> 65537
>> c = c.next_node
>> d = c.value
=> 58438813486895877116761996105770... # (etc)
>> c = c.next_node
>> p = c.value
=> 29635449580247160226960937109864... # (etc)
>> c = c.next_node
>> q = c.value
=> 27018856595256414771163410576410... # (etc)

What I’ve done, in case you don’t speak Ruby, is take the two “chunks” of plausible-looking base64 data, chuck them together into a variable named b64, unbase64 it into a variable named der, pass that into a new DerParse instance, and then walk the DER value tree until I got all the values I need.

Interestingly, the q value actually traverses the “split” in the two chunks, which means that there’s always the possibility that there are lines missing from the key. However, since p and q are supposed to be prime, we can “sanity check” them to see if corruption is likely to have occurred:

>> require "openssl"
>> OpenSSL::BN.new(p).prime?
=> true
>> OpenSSL::BN.new(q).prime?
=> true

Excellent! The chances of a corrupted file producing valid-but-incorrect prime numbers isn’t huge, so we can be fairly confident that we’ve got the “real” p and q. Now, with the help of another one of my creations we can use e, p, and q to create a fully-operational battle key:

>> require "openssl/pkey/rsa"
>> k = OpenSSL::PKey::RSA.from_factors(p, q, e)
=> #<OpenSSL::PKey::RSA:0x0000559d5903cd38>
>> k.valid?
=> true
>> k.verify(OpenSSL::Digest::SHA256.new, k.sign(OpenSSL::Digest::SHA256.new, "bob"), "bob")
=> true

… and there you have it. One fairly redacted-looking private key brought back to life by maths and far too much free time.

Sorry Mr. Finn, I hope you’re not still using that key on anything Internet-facing.

What About Other Key Types?

EC keys are very different beasts, but they have much the same problems as RSA keys. A typical EC key contains both private and public data, and the public portion is twice the size – so only about 1/3 of the data in the key is private material. It is quite plausible that you can “redact” an EC key and leave all the actually private bits exposed.

What Do We Do About It?

In short: don’t ever try and redact real private keys. For documentation purposes, just put “KEY GOES HERE” in the appropriate spot, or something like that. Store your secrets somewhere that isn’t a public (or even private!) git repo.
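
For example, something like this makes it obvious where a key belongs while giving away precisely nothing:

-----BEGIN RSA PRIVATE KEY-----
KEY GOES HERE
-----END RSA PRIVATE KEY-----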

Generating a “dummy” private key and sticking it in there isn’t a great idea, for different reasons: people have this odd habit of reusing “demo” keys in real life. There’s no need to encourage that sort of thing.


  1. Technically the pieces aren’t 100% aligned with the underlying DER, because of how base64 works. I felt it was easier to understand if I stuck to chopping up the base64, rather than decoding into DER and then chopping up the DER. 

MicroHams Digital Conference (MHDC) 2020

On May 9 2020 (PST) I had the pleasure of speaking at the MicroHams Digital Conference (MHDC) 2020. Due to COVID-19 presenters attended via Zoom, and the conference was live streamed over YouTube.

Thanks to the hard work of the organisers, this worked really well!

Looking at the conference program, I noticed the standard of the presenters was very high. The organisers I worked with (Scott N7SS, and Grant KB7WSD) explained that a side effect of making the conference virtual was casting a much wider net on presenters – making the conference even better than IRL (In Real Life)! The YouTube streaming stats showed 300-500 people “attending” – also very high.

My door to door travel time to West Coast USA is about 20 hours. So a remote presentation makes life much easier for me. It takes me a week to prepare, means 1-2 weeks away from home, and a week to recover from the jetlag. As a single parent I need to find a carer for my 14 year old.

Vickie, KD7LAW, ran a break out room for after talk chat which worked well. It was nice to “meet” several people that I usually just have email contact with. All from the comfort of my home on a Sunday morning in Adelaide (Saturday afternoon PST).

The MHDC 2020 talks have now been published on YouTube. Here is my talk, which is a good update (May 2020) of Codec 2 and FreeDV, including:

  • The new FreeDV 2020 mode using the LPCNet neural net vocoder
  • Embedded FreeDV 700D running on the SM1000
  • FreeDV over the QO-100 geosynchronous satellite and KiwiSDRs
  • Introducing some of the good people contributing to FreeDV

The conference has me interested in applying the open source modems we have developed for digital voice to Amateur Radio packet and HF data. So I’m reading up on Winlink, Pat, Direwolf and friends.

Thanks Scott, Grant, and Vickie and the MicroHams club!

May 16, 2020

Raptor Blackbird support: all upstream in op-build

Thanks to my most recent PR being merged, op-build v2.5 will have full support for the Raptor Blackbird! This includes support for the “IPL Monitor” that’s required to get fan control going.

Note that if you’re running Fedora 32 then you need some patches to buildroot to have it build, but if you’re building on something a little older, then upstream should build and work straight out of the box (err… git tree).

I also note that the work to get Secure Boot for an OS Kernel going is starting to make its way out for code reviews, so that’s something to look forward to (although without a TPM we’re going to need extra code).

May 13, 2020

A op-build v2.5-rc1 based Raptor Blackbird Build

I have done a few builds of firmware for the Raptor Blackbird since I got mine, each of them based on upstream op-build plus a few patches. The previous one was Yet another near-upstream Raptor Blackbird firmware build that I built a couple of months ago. This new build is based off the release candidate of op-build v2.5. Here’s what’s changed:

Package: old version → new version

  • hcode: hw030220a.opmst → hw050520a.opmst
  • hostboot: acdff8a390a2654dd52fed67bdebe2b5
  • kexec-lite: 18ec88310c4134e6b0130b3c1ea489e
  • libflash: v6.5-228-g82aed17a → v6.6
  • linux: v5.4.22 → v5.4.33
  • linux-headers: v5.4.22 → v5.4.33
  • machine-xml: 17e9e84d504582c88e782e30829e0d6be
  • occ: 3ab29212518e65740ab4dc96fd6cf584c42
  • openpower-pnor: 6fb8d914134d544a84175f00d9c6dc395faf3
  • sbe: c318ab00116d92f08c78fb7838495ad0aab7
  • skiboot: v6.5-228-g82aed17a → v6.6

Changes in my latest Blackbird build

Go grab blackbird.pnor from https://www.flamingspork.com/blackbird/stewart-blackbird-6-images/, and give it a go! Just scp it to your BMC, and flash it:

pflash -E -p /tmp/blackbird.pnor

There are two differences from upstream op-build: my pull request to op-build, and the fixing of the (old) buildroot so that it'll build on Fedora 32. From discussions on the openpower-firmware mailing list, it seems that one hopeful thing is to have all the Blackbird support merged in before the final op-build v2.5 is tagged. The previous op-build release (v2.4) was tagged in July 2019, so we're about 10 months into what was a 2 month release cycle, which makes speculating on when that final release will land somewhat difficult.

May 12, 2020

f32, u32, and const

Some time ago, I wrote “floats, bits, and constant expressions” about converting a floating point number into its representative ones and zeros as a C++ constant expression – constructing the IEEE 754 representation without being able to examine the bits directly.

I’ve been playing around with Rust recently, and rewrote that conversion code as a bit of a learning exercise for myself, with a thoroughly contrived set of constraints: using integer and single-precision floating point math, at compile time, without unsafe blocks, while using as few unstable features as possible.

I’ve included the listing below, for your bemusement and/or head-shaking, and you can play with the code in the Rust Playground and rust.godbolt.org

// Jonathan Adamczewski 2020-05-12
//
// Constructing the bit-representation of an IEEE 754 single precision floating 
// point number, using integer and single-precision floating point math, at 
// compile time, in rust, without unsafe blocks, while using as few unstable 
// features as I can.
//
// or "What if this silly C++ thing http://brnz.org/hbr/?p=1518 but in Rust?"


// Q. Why? What is this good for?
// A. To the best of my knowledge, this code serves no useful purpose. 
//    But I did learn a thing or two while writing it :)


// This is needed to be able to perform floating point operations in a const 
// function:
#![feature(const_fn)]


// bits_transmute(): Returns the bits representing a floating point value, by
//                   way of std::mem::transmute()
//
// For completeness (and validation), and to make it clear the fundamentally 
// unnecessary nature of the exercise :D - here's a short, straightforward, 
// library-based version. But it needs the const_transmute flag and an unsafe 
// block.
#![feature(const_transmute)]
const fn bits_transmute(f: f32) -> u32 {
  unsafe { std::mem::transmute::<f32, u32>(f) }
}



// get_if_u32(predicate:bool, if_true: u32, if_false: u32):
//   Returns if_true if predicate is true, else if_false
//
// If and match are not able to be used in const functions (at least, not 
// without #![feature(const_if_match)] - so here's a branch-free select function
// for u32s
const fn get_if_u32(predicate: bool, if_true: u32, if_false: u32) -> u32 {
  let pred_mask = (-1 * (predicate as i32)) as u32;
  let true_val = if_true & pred_mask;
  let false_val = if_false & !pred_mask;
  true_val | false_val
}

// get_if_f32(predicate, if_true, if_false):
//   Returns if_true if predicate is true, else if_false
//
// A branch-free select function for f32s.
// 
// If either is_true or is_false is NaN or an infinity, the result will be NaN,
// which is not ideal. I don't know of a better way to implement this function
// within the arbitrary limitations of this silly little side quest.
const fn get_if_f32(predicate: bool, if_true: f32, if_false: f32) -> f32 {
  // can't convert bool to f32 - but can convert bool to i32 to f32
  let pred_sel = (predicate as i32) as f32;
  let pred_not_sel = ((!predicate) as i32) as f32;
  let true_val = if_true * pred_sel;
  let false_val = if_false * pred_not_sel;
  true_val + false_val
}


// bits(): Returns the bits representing a floating point value.
const fn bits(f: f32) -> u32 {
  // the result value, initialized to a NaN value that will otherwise not be
  // produced by this function.
  let mut r = 0xffff_ffff;

  // These floating point operations (and others) cause the following error:
  //     only int, `bool` and `char` operations are stable in const fn
  // hence #![feature(const_fn)] at the top of the file
  
  // Identify special cases
  let is_zero    = f == 0_f32;
  let is_inf     = f == f32::INFINITY;
  let is_neg_inf = f == f32::NEG_INFINITY;
  let is_nan     = f != f;

  // Writing this as !(is_zero || is_inf || ...) causes the following error:
  //     Loops and conditional expressions are not stable in const fn
  // so instead write this as type conversions, and bitwise operations
  //
  // "normalish" here means that f is a normal or subnormal value
  let is_normalish = 0 == ((is_zero as u32) | (is_inf as u32) | 
                        (is_neg_inf as u32) | (is_nan as u32));

  // set the result value for each of the special cases
  r = get_if_u32(is_zero,    0,           r); // if (is_zero)    { r = 0; }
  r = get_if_u32(is_inf,     0x7f80_0000, r); // if (is_inf)     { r = 0x7f80_0000; }
  r = get_if_u32(is_neg_inf, 0xff80_0000, r); // if (is_neg_inf) { r = 0xff80_0000; }
  r = get_if_u32(is_nan,     0x7fc0_0000, r); // if (is_nan)     { r = 0x7fc0_0000; }
 
  // It was tempting at this point to try setting f to a "normalish" placeholder 
  // value so that special cases do not have to be handled in the code that 
  // follows, like so:
  // f = get_if_f32(is_normalish, f, 1_f32);
  //
  // Unfortunately, get_if_f32() returns NaN if either input is NaN or infinite.
  // Instead of switching the value, we work around the non-normalish cases 
  // later.
  //
  // (This whole function is branch-free, so all of it is executed regardless of 
  // the input value)

  // extract the sign bit
  let sign_bit  = get_if_u32(f < 0_f32,  1, 0);

  // compute the absolute value of f
  let mut abs_f = get_if_f32(f < 0_f32, -f, f);

  
  // This part is a little complicated. The algorithm is functionally the same 
  // as the C++ version linked from the top of the file.
  // 
  // Because of the various contrived constraints on this problem, we compute 
  // the exponent and significand, rather than extract the bits directly.
  //
  // The idea is this:
  // Every finite single precision float point number can be represented as a
  // series of (at most) 24 significant digits as a 128.149 fixed point number 
  // (128: 126 exponent values >= 0, plus one for the implicit leading 1, plus 
  // one more so that the decimal point falls on a power-of-two boundary :)
  // 149: 126 negative exponent values, plus 23 for the bits of precision in the 
  // significand.)
  //
  // If we are able to scale the number such that all of the precision bits fall 
  // in the upper-most 64 bits of that fixed-point representation (while 
  // tracking our effective manipulation of the exponent), we can then 
  // predictably and simply scale that computed value back to a range that can 
  // be converted safely to a u64, count the leading zeros to determine the 
  // exact exponent, and then shift the result into position for the final u32 
  // representation.
  
  // Start with the largest possible exponent - subsequent steps will reduce 
  // this number as appropriate
  let mut exponent: u32 = 254;
  {
    // Hex float literals are really nice. I miss them.

    // The threshold is 2^87 (think: 64+23 bits) to ensure that the number will 
    // be large enough that, when scaled down by 2^64, all the precision will 
    // fit nicely in a u64
    const THRESHOLD: f32 = 154742504910672534362390528_f32; // 0x1p87f == 2^87

    // The scaling factor is 2^41 (think: 64-23 bits) to ensure that a number 
    // between 2^87 and 2^64 will not overflow in a single scaling step.
    const SCALE_UP: f32 = 2199023255552_f32; // 0x1p41f == 2^41

    // Because loops are not available (no #![feature(const_loops)], and 'if' is
    // not available (no #![feature(const_if_match)]), perform repeated branch-
    // free conditional multiplication of abs_f.

    // use a macro, because why not :D It's the most compact, simplest option I 
    // could find.
    macro_rules! maybe_scale {
      () => {{
        // care is needed: if abs_f is above the threshold, multiplying by 2^41 
        // will cause it to overflow (INFINITY) which will cause get_if_f32() to
        // return NaN, which will destroy the value in abs_f. So compute a safe 
        // scaling factor for each iteration.
        //
        // Roughly equivalent to :
        // if (abs_f < THRESHOLD) {
        //   exponent -= 41;
        //   abs_f *= SCALE_UP;
        // }
        let scale = get_if_f32(abs_f < THRESHOLD, SCALE_UP,      1_f32);    
        exponent  = get_if_u32(abs_f < THRESHOLD, exponent - 41, exponent); 
        abs_f     = get_if_f32(abs_f < THRESHOLD, abs_f * scale, abs_f);
      }}
    }
    // 41 bits per iteration means up to 246 bits shifted.
    // Even the smallest subnormal value will end up in the desired range.
    maybe_scale!();  maybe_scale!();  maybe_scale!();
    maybe_scale!();  maybe_scale!();  maybe_scale!();
  }

  // Now that we know that abs_f is in the desired range (2^87 <= abs_f < 2^128)
  // scale it down to be in the range (2^23 <= _ < 2^64), and convert without 
  // loss of precision to u64.
  const INV_2_64: f32 = 5.42101086242752217003726400434970855712890625e-20_f32; // 0x1p-64f == 2^-64
  let a = (abs_f * INV_2_64) as u64;

  // Count the leading zeros.
  // (C++ doesn't provide a compile-time constant function for this. It's nice 
  // that rust does :)
  let mut lz = a.leading_zeros();

  // if the number isn't normalish, lz is meaningless: we stomp it with 
  // something that will not cause problems in the computation that follows - 
  // the result of which is meaningless, and will be ignored in the end for 
  // non-normalish values.
  lz = get_if_u32(!is_normalish, 0, lz); // if (!is_normalish) { lz = 0; }

  {
    // This step accounts for subnormal numbers, where there are more leading 
    // zeros than can be accounted for in a valid exponent value, and leading 
    // zeros that must remain in the final significand.
    //
    // If lz < exponent, reduce exponent to its final correct value - lz will be
    // used to remove all of the leading zeros.
    //
    // Otherwise, clamp exponent to zero, and adjust lz to ensure that the 
    // correct number of bits will remain (after multiplying by 2^41 six times - 
    // 2^246 - there are 7 leading zeros ahead of the original subnormal's
    // computed significand of 0.sss...)
    // 
    // The following is roughly equivalent to:
    // if (lz < exponent) {
    //   exponent = exponent - lz;
    // } else {
    //   exponent = 0;
    //   lz = 7;
    // }

    // we're about to mess with lz and exponent - compute and store the relative 
    // value of the two
    let lz_is_less_than_exponent = lz < exponent;

    lz       = get_if_u32(!lz_is_less_than_exponent, 7,             lz);
    exponent = get_if_u32( lz_is_less_than_exponent, exponent - lz, 0);
  }

  // compute the final significand.
  // + 1 shifts away a leading 1-bit for normal, and 0-bit for subnormal values
  // Shifts are done in u64 (that leading bit is shifted into the void), then
  // the resulting bits are shifted back to their final resting place.
  let significand = ((a << (lz + 1)) >> (64 - 23)) as u32;

  // combine the bits
  let computed_bits = (sign_bit << 31) | (exponent << 23) | significand;

  // return the normalish result, or the non-normalish result, as appropriate
  get_if_u32(is_normalish, computed_bits, r)
}


// Compile-time validation - able to be examined in rust.godbolt.org output
pub static BITS_BIGNUM: u32 = bits(std::f32::MAX);
pub static TBITS_BIGNUM: u32 = bits_transmute(std::f32::MAX);
pub static BITS_LOWER_THAN_MIN: u32 = bits(7.0064923217e-46_f32);
pub static TBITS_LOWER_THAN_MIN: u32 = bits_transmute(7.0064923217e-46_f32);
pub static BITS_ZERO: u32 = bits(0.0f32);
pub static TBITS_ZERO: u32 = bits_transmute(0.0f32);
pub static BITS_ONE: u32 = bits(1.0f32);
pub static TBITS_ONE: u32 = bits_transmute(1.0f32);
pub static BITS_NEG_ONE: u32 = bits(-1.0f32);
pub static TBITS_NEG_ONE: u32 = bits_transmute(-1.0f32);
pub static BITS_INF: u32 = bits(std::f32::INFINITY);
pub static TBITS_INF: u32 = bits_transmute(std::f32::INFINITY);
pub static BITS_NEG_INF: u32 = bits(std::f32::NEG_INFINITY);
pub static TBITS_NEG_INF: u32 = bits_transmute(std::f32::NEG_INFINITY);
pub static BITS_NAN: u32 = bits(std::f32::NAN);
pub static TBITS_NAN: u32 = bits_transmute(std::f32::NAN);
pub static BITS_COMPUTED_NAN: u32 = bits(std::f32::INFINITY/std::f32::INFINITY);
pub static TBITS_COMPUTED_NAN: u32 = bits_transmute(std::f32::INFINITY/std::f32::INFINITY);


// Run-time validation of many more values
fn main() {
  let end: usize = 0xffff_ffff;
  let count = 9_876_543; // number of values to test
  let step = end / count;
  for u in (0..=end).step_by(step) {
      let v = u as u32;
      
      // reference
      let f = unsafe { std::mem::transmute::<u32, f32>(v) };
      
      // compute
      let c = bits(f);

      // validation
      if c != v && 
         !(f.is_nan() && c == 0x7fc0_0000) && // nans
         !(v == 0x8000_0000 && c == 0) { // negative 0
          println!("{:x?} {:x?}", v, c); 
      }
  }
}

May 10, 2020

IT Asset Management

In my last full-time position I managed the asset tracking database for my employer. It was one of those things that “someone” needed to do, and it seemed that the only way “someone” wouldn’t equate to “no-one” was for me to do it – which was ok. We used Snipe IT [1] to track the assets. I don’t have enough experience with asset tracking to say that Snipe is better or worse than average, but it basically did the job. Asset serial numbers are stored, you can have asset types that allow you to just add one more of the particular item, purchase dates are stored which makes warranty tracking easier, and every asset is associated with a person or listed as available. While I can’t say that Snipe IT is better than other products I can say that it will do the job reasonably well.

One problem that I didn’t discover until way too late was the fact that the finance people weren’t tracking serial numbers and that some assets in the database had the same asset IDs as the finance department and some had different ones. The best advice I can give to anyone who gets involved with asset tracking is to immediately chat to finance about how they track things; you need to know whether the same asset IDs are used and whether serial numbers are tracked by finance. I was pleased to discover that my colleagues were all honourable people as there was no apparent evaporation of valuable assets even though there was little ability to discover who might have been the last person to use some of the assets.

One problem that I’ve seen at many places is treating small items like keyboards and mice as “assets”. I think that anything that is worth less than 1 hour’s pay at the minimum wage (the price of a typical PC keyboard or mouse) isn’t worth tracking; treat it as a disposable item. If you hire a programmer who requests an unusually expensive keyboard or mouse (as some do) it still won’t be a lot of money when compared to their salary. Some of the older keyboards and mice that companies have are nasty: months of people eating lunch over them leave them greasy and sticky. I think that the best thing to do with the keyboards and mice is to give them away when people leave, and to buy new hardware when new people join the company. If a company can’t spend $25 on a new keyboard and mouse for each new employee then they either have a massive problem of staff turnover or a lack of priority on morale.

A breadmaker loaf my kids will actually eat


My dad asked me to document some of my baking experiments from the recent natural disasters, which I wanted to do anyway so that I could remember the recipes. It’s taken me a while to get around to it though, because animated GIFs on reddit are a terrible medium for recipe storage, and because I’ve been distracted with other shiny objects. That said, let’s start with the basics — a breadmaker bread that my kids will actually eat.

A loaf of bread baked in the oven

This recipe took a bunch of iterations to get right over the last year or so, but I’ll spare you the long boring details. However, I suspect part of the problem is that the recipe varies by bread maker. Oh, and the salt is really important — don’t skip the salt!

Wet ingredients (add first)

  • 1.5 cups of warm water (we have an instantaneous gas hot water system, so I pick 42 degrees)
  • 0.25 cups of oil (I use bran oil)

Dry ingredients (add second)

I just kind of chuck these in, although I tend to put the non-flour ingredients in a corner together for reasons that I can’t explain.

  • 3.5 cups of bakers flour (must be bakers flour, not plain flour)
  • 2 teaspoons of instant yeast (we keep it in the freezer in a big packet, not the sachets)
  • 4 teaspoons of white sugar
  • 1 teaspoon of salt
  • 2 teaspoons of bread improver

I then just let my bread maker do its thing, which takes about three hours including baking. If I am going to bake the bread in the oven, then the dough takes about two hours, but I let the dough rise for another 30 to 60 minutes before baking.

A loaf of bread from the bread maker

I think to be honest that the result is better from the oven, but it’s a little more work. The bread maker loaves are a bit prone to collapsing (you can see it starting on the example above), and there is a big kneading hook indent in the middle of the bottom of the loaf.

The oven baking technique took a while to develop, but I’ll cover that in a later post.


May 06, 2020

About Reopening Businesses

Currently there is political debate about when businesses should be reopened after the Covid19 quarantine.

Small Businesses

One argument for reopening things is for the benefit of small businesses. The first thing to note is that the protests in the US say “I need a haircut” not “I need to cut people’s hair”. Small businesses won’t benefit from reopening sooner.

For every business there is a certain minimum number of customers needed to be profitable. There are many comments from small business owners that want it to remain shutdown. When the government has declared a shutdown and paused rent payments and provided social security to employees who aren’t working the small business can avoid bankruptcy. If they suddenly have to pay salaries or make redundancy payouts and have to pay rent while they can’t make a profit due to customers staying home they will go bankrupt.

Many restaurants and cafes make little or no profit at most times of the week (I used to be 1/3 owner of an Internet cafe and know this well). For such a company to be viable you have to be open most of the time so customers can expect you to be open. Generally you don’t keep a cafe open at 3PM to make money at 3PM, you keep it open so people can rely on there being a cafe open there; someone who buys a can of soda at 3PM one day might come back for lunch at 1:30PM the next day because they know you are open. A large portion of the opening hours of most retail companies can be considered as either advertising for trade at the profitable hours or as loss making times that you can’t close because you can’t send an employee home for an hour.

If you have seating for 28 people (as my cafe did) then for about half the opening hours you will probably have 2 or fewer customers in there at any time, for about a quarter the opening hours you probably won’t cover the salary of the one person on duty. The weekend is when you make the real money, especially Friday and Saturday nights when you sometimes get all the seats full and people coming in for takeaway coffee and snacks. On Friday and Saturday nights the 60 seat restaurant next door to my cafe used to tell customers that my cafe made better coffee. It wasn’t economical for them to have a table full for an hour while they sell a few cups of coffee, they wanted customers to leave after dessert and free the table for someone who wants a meal with wine (alcohol is the real profit for many restaurants).

The plans of reopening with social distancing means that a 28 seat cafe can only have 14 chairs or less (some plans have 25% capacity which would mean 7 people maximum). That means decreasing the revenue of the most profitable times by 50% to 75% while also not decreasing the operating costs much. A small cafe has 2-3 staff when it’s crowded so there’s no possibility of reducing staff by 75% when reducing the revenue by 75%.

My Internet cafe would have closed immediately if forced to operate in the proposed social distancing model. It would have been 1/4 of the trade and about 1/8 of the profit at the most profitable times, even if enough customers are prepared to visit – and social distancing would kill the atmosphere. Most small businesses are barely profitable anyway, most small businesses don’t last 4 years in normal economic circumstances.

This reopen movement is about cutting unemployment benefits not about helping small business owners. Destroying small businesses is also good for big corporations, kill the small cafes and restaurants and McDonald’s and Starbucks will win. I think this is part of the motivation behind the astroturf campaign for reopening businesses.

Forbes has an article about this [1].

Psychological Issues

Some people claim that we should reopen businesses to help people who have psychological problems from isolation, to help victims of domestic violence who are trapped at home, to stop older people being unemployed for the rest of their lives, etc.

Here is one article with advice for policy makers from domestic violence experts [2]. One thing it mentions is that the primary US federal government program to deal with family violence had a budget of $130M in 2013. The main thing that should be done about family violence is to make it a priority at all times (not just when it can be a reason for avoiding other issues) and allocate some serious budget to it. An agency that deals with problems that affect families and only has a budget of $1 per family per year isn’t going to be able to do much.

There are ongoing issues of people stuck at home for various reasons. We could work on better public transport to help people who can’t drive. We could work on better healthcare to help some of the people who can’t leave home due to health problems. We could have more budget for carers to help people who can’t leave home without assistance. Wanting to reopen restaurants because some people feel isolated is ignoring the fact that social isolation is a long term ongoing issue for many people, and that many of the people who are affected can’t even afford to eat at a restaurant!

Employment discrimination against people in the 50+ age range is an ongoing thing; many people in that age range know that if they lose their job and can’t immediately find another they will be unemployed for the rest of their lives. Reopening small businesses won’t help that; businesses running at low capacity will have to lay people off and it will probably be the older people. Also the unemployment system doesn’t deal well with part time work. The Australian system (which I think is similar to most systems in this regard) reduces the unemployment benefits by $0.50 for every dollar that is earned in part time work, which effectively puts people who are doing part time work because they can’t get a full-time job in the highest tax bracket! If someone is going to pay for transport to get to work, work a few hours, then get half the money they earned deducted from unemployment benefits it hardly makes it worthwhile to work. While the exact health impacts of Covid19 aren’t well known at this stage it seems very clear that older people are disproportionately affected, so forcing older people to go back to work before there is a vaccine isn’t going to help them.

When it comes to these discussions I think we should be very suspicious of people who raise issues they haven’t previously shown interest in. If the discussion of reopening businesses seems to be someone’s first interest in the issues of mental health, social security, etc then they probably aren’t that concerned about such issues.

I believe that we should have a Universal Basic Income [3]. I believe that we need to provide better mental health care and challenge the gender ideas that hurt men and cause men to hurt women [4]. I believe that we have significant ongoing problems with inequality not small short term issues [5]. I don’t think that any of these issues require specific changes to our approach to preventing the transmission of disease. I also think that we can address multiple issues at the same time, so it is possible for the government to devote more resources to addressing unemployment, family violence, etc while also dealing with a pandemic.

May 03, 2020

Backing up to a GnuBee PC 2

After installing Debian buster on my GnuBee, I set it up for receiving backups from my other computers.

Software setup

I started by configuring it like a typical server, but without a few packages that take a lot of either memory or CPU.

I changed the default hostname:

  • /etc/hostname: foobar
  • /etc/mailname: foobar.example.com
  • /etc/hosts: 127.0.0.1 foobar.example.com foobar localhost

and then installed the avahi-daemon package to be able to reach this box using foobar.local.

I noticed the presence of a world-writable directory and so I tightened the security of some of the default mount points by putting the following in /etc/rc.local:

mount -o remount,nodev,nosuid /etc/network
mount -o remount,nodev,nosuid /lib/modules
chmod 755 /etc/network
exit 0

Hardware setup

My OS drive (/dev/sda) is a small SSD so that the GnuBee can run silently when the spinning disks aren't needed. To hold the backup data on the other hand, I got three 4 TB drives which I set up in a RAID-5 array. If the data were valuable, I'd use RAID-6 instead since it can survive two drives failing at the same time, but in this case since it's only holding backups, I'd have to lose the original machine at the same time as two of the 3 drives, a very unlikely scenario.

I created new gpt partition tables on /dev/sdb, /dev/sdc, /dev/sdd and used fdisk to create a single partition of type 29 (Linux RAID) on each of them.

Then I created the RAID array:

mdadm /dev/md127 --create -n 3 --level=raid5 -a /dev/sdb1 /dev/sdc1 /dev/sdd1

and waited more than 24 hours for that operation to finish. Next, I formatted the array:

mkfs.ext4 -m 0 /dev/md127

and added the following to /etc/fstab:

/dev/md127 /mnt/data/ ext4 noatime,nodiratime 0 2
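
One extra step worth taking on Debian, if the mdadm package hasn't already done it for you, is to record the array in mdadm's configuration so that it is reliably assembled at boot:

mdadm --detail --scan >> /etc/mdadm/mdadm.conf
update-initramfs -u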

To reduce unnecessary noise and reduce power consumption, I also installed hdparm:

apt install hdparm

and configured all spinning drives to spin down after being idle for 10 minutes by putting the following in /etc/hdparm.conf:

/dev/sdb {
       spindown_time = 120
}

/dev/sdc {
       spindown_time = 120
}

/dev/sdd {
       spindown_time = 120
}

and then reloaded the configuration:

 /usr/lib/pm-utils/power.d/95hdparm-apm resume

Finally I setup smartmontools by putting the following in /etc/smartd.conf:

/dev/sda -a -o on -S on -s (S/../.././02|L/../../6/03)
/dev/sdb -a -o on -S on -s (S/../.././02|L/../../6/03)
/dev/sdc -a -o on -S on -s (S/../.././02|L/../../6/03)
/dev/sdd -a -o on -S on -s (S/../.././02|L/../../6/03)

and restarting the daemon:

systemctl restart smartd.service

Backup setup

I started by using duplicity since I have been using that tool for many years, but a 190GB backup took around 15 hours on the GnuBee with gigabit ethernet.

After a friend suggested it, I took a look at restic and I have to say that I am impressed. The same backup finished in about half the time.
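
Note that a restic repository needs to be initialised once before the first backup; something along these lines, using the same sftp URL and password as the script below, would do it:

RESTIC_PASSWORD="XXXX" restic -r sftp:foobar.local: init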

User and ssh setup

After hardening the ssh setup as I usually do, I created a user account for each machine needing to backup onto the GnuBee:

adduser machine1
adduser machine1 sshuser
adduser machine1 sftponly
chsh machine1 -s /bin/false

and then matching directories under /mnt/data/home/:

mkdir /mnt/data/home/machine1
chown machine1:machine1 /mnt/data/home/machine1
chmod 700 /mnt/data/home/machine1

Then I created a custom ssh key for each machine:

ssh-keygen -f /root/.ssh/foobar_backups -t ed25519

and placed the public key (foobar_backups.pub) in /home/machine1/.ssh/authorized_keys on the GnuBee.

On each machine, I added the following to /root/.ssh/config:

Host foobar.local
    User machine1
    Compression no
    Ciphers aes128-ctr
    IdentityFile /root/backup/foobar_backups
    IdentitiesOnly yes
    ServerAliveInterval 60
    ServerAliveCountMax 240

The reason for setting the ssh cipher and disabling compression is to speed up the ssh connection as much as possible given that the GnuBee has very limited RAM bandwidth.

Another performance-related change I made on the GnuBee was switching to the internal sftp server by putting the following in /etc/ssh/sshd_config:

Subsystem      sftp    internal-sftp
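
The sftponly group created earlier is a natural place to hang further restrictions in the same file; a hypothetical stanza (an assumption about how that group could be used, not a quote from the real config) would be:

# Assumed policy for the sftponly group: file transfer only, no forwarding
Match Group sftponly
    ForceCommand internal-sftp
    AllowTcpForwarding no
    X11Forwarding no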

Restic script

After reading through the excellent restic documentation, I wrote the following backup script, based on my old duplicity script, to reuse on all of my computers:

#!/bin/bash
# Configure for each host
PASSWORD="XXXX"  # use `pwgen -s 64` to generate a good random password
BACKUP_HOME="/root/backup"
REMOTE_URL="sftp:foobar.local:"
RETENTION_POLICY="--keep-daily 7 --keep-weekly 4 --keep-monthly 12 --keep-yearly 2"

# Internal variables
SSH_IDENTITY="IdentityFile=$BACKUP_HOME/foobar_backups"
EXCLUDE_FILE="$BACKUP_HOME/exclude"
PKG_FILE="$BACKUP_HOME/dpkg-selections"
PARTITION_FILE="$BACKUP_HOME/partitions"

# If the list of files has been requested, only do that
if [ "$1" = "--list-current-files" ]; then
    RESTIC_PASSWORD=$PASSWORD restic --quiet -r $REMOTE_URL ls latest
    exit 0

# Show list of available snapshots
elif [ "$1" = "--list-snapshots" ]; then
    RESTIC_PASSWORD=$PASSWORD restic --quiet -r $REMOTE_URL snapshots
    exit 0

# Restore the given file
elif [ "$1" = "--file-to-restore" ]; then
    if [ "$2" = "" ]; then
        echo "You must specify a file to restore"
        exit 2
    fi
    RESTORE_DIR="$(mktemp -d ./restored_XXXXXXXX)"
    RESTIC_PASSWORD=$PASSWORD restic --quiet -r $REMOTE_URL restore latest --target "$RESTORE_DIR" --include "$2" || exit 1
    echo "$2 was restored to $RESTORE_DIR"
    exit 0

# Delete old backups
elif [ "$1" = "--prune" ]; then
    # Expire old backups
    RESTIC_PASSWORD=$PASSWORD restic --quiet -r $REMOTE_URL forget $RETENTION_POLICY

    # Delete files which are no longer necessary (slow)
    RESTIC_PASSWORD=$PASSWORD restic --quiet -r $REMOTE_URL prune
    exit 0

# Catch invalid arguments
elif [ "$1" != "" ]; then
    echo "Invalid argument: $1"
    exit 1
fi

# Check the integrity of existing backups
RESTIC_PASSWORD=$PASSWORD restic --quiet -r $REMOTE_URL check || exit 1

# Dump list of Debian packages
dpkg --get-selections > $PKG_FILE

# Dump partition tables from harddrives
/sbin/fdisk -l /dev/sda > $PARTITION_FILE
/sbin/fdisk -l /dev/sdb >> $PARTITION_FILE

# Do the actual backup
RESTIC_PASSWORD=$PASSWORD restic --quiet --cleanup-cache -r $REMOTE_URL backup / --exclude-file $EXCLUDE_FILE

I run it with the following cronjob in /etc/cron.d/backups:

30 8 * * *    root  ionice nice nocache /root/backup/backup-machine1-to-foobar
30 2 * * Sun  root  ionice nice nocache /root/backup/backup-machine1-to-foobar --prune

in a way that doesn't impact the rest of the system too much.

Finally, I printed a copy of each of my backup scripts, using enscript, to stash in a safe place:

enscript --highlight=bash --style=emacs --output=- backup-machine1-to-foobar | ps2pdf - > foobar.pdf

This is actually a pretty important step since without the password, you won't be able to decrypt and restore what's on the GnuBee.

May 02, 2020

Audiobooks – April 2020

Cockpit Confidential: Everything You Need to Know About Air Travel: Questions, Answers, and Reflections by Patrick Smith

Lots of “you always wanted to know” & “this is how it really is” bits about commercial flying. Good fun 4/5

The Day of the Jackal by Frederick Forsyth

A very tightly written thriller about a fictional 1963 plot to assassinate French President Charles de Gaulle. Fast moving, detailed and captivating 5/5

Topgun: An American Story by Dan Pedersen

Memoir from the first officer in charge of the US Navy’s Top Gun school. A mix of his life & career, the school and US Navy air history (especially during Vietnam). Excellent 4/5

Radicalized: Four Tales of Our Present Moment
by Cory Doctorow

4 short stories set in more-or-less the present day. They all work fairly well. Worth a read. Spoilers in the link. 3/5

On the Banks of Plum Creek: Little House Series, Book 4 by Laura Ingalls Wilder

The family settle in Minnesota and build a new farm. Various major and minor adventures. I’m struck how few possessions people had back then. 3/5

My Father’s Business: The Small-Town Values That Built Dollar General into a Billion-Dollar Company by Cal Turner Jr.

A mix of personal and company history. I found the early story of the company and personal stuff the most interesting. 3/5

You Can’t Fall Off the Floor: And Other Lessons from a Life in Hollywood by Harris and Nick Katleman

Memoir by a former studio exec and head. Lots of funny and interesting stories from his career, featuring plenty of famous names. 4/5

The Wave: In Pursuit of the Rogues, Freaks and Giants of the Ocean by Susan Casey

75% about Big-wave Tow-Surfers with chapters on Scientists and Shipping industry people mixed in. Competent but author’s heart seemed mostly in the surfing. 3/5


April 27, 2020

Install the COVIDSafe app

I can’t think of a more unequivocal title than that. 🙂

The Australian government doesn’t have a good track record of either launching publicly visible software projects, or respecting privacy, so I’ve naturally been sceptical of the contact tracing app since it was announced. The good news is, while it has some relatively minor problems, it appears to be a solid first version.

Privacy

While the source code is yet to be released, the Android version has already been decompiled, and public analysis is showing that it only collects necessary information, and only uploads contact information to the government servers when you press the button to upload (you should only press that button if you actually get COVID-19, and are asked to upload it by your doctor).

The legislation around the app is also clear that the data you upload can only be accessed by state health officials. Commonwealth departments have no access, neither do non-health departments (eg, law enforcement, intelligence).

Technical

It does what it’s supposed to do, and hasn’t been found to open you up to risks by installing it. There are a lot of people digging into it, so I would expect any significant issues to be found, reported, and fixed quite quickly.

Some parts of it are a bit rushed, and the way it scans for contacts could be more battery efficient (that should hopefully be fixed in the coming weeks when Google and Apple release updates that these contact tracing apps can use).

If it produces useful data, however, I’m willing to put up with some quirks. 🙂

Usefulness

I’m obviously not an epidemiologist, but those I’ve seen talk about it say that yes, the data this app produces will be useful for augmenting the existing contact tracing efforts. There were some concerns that it could produce a lot of junk data that wastes time, but I trust the expert contact tracing teams to filter and prioritise the data they get from it.

Install it!

The COVIDSafe site has links to the app in Apple’s App Store, as well as Google’s Play Store. Setting it up takes a few minutes, and then you’re done!

April 26, 2020

YouTube Channels I subscribe to in April 2020

I did a big twitter thread of the YouTube channels I am following. Below is a copy of the tweets. They are a quick description of the channel and a link to a sample video.

Lots of pop-Science and TV/Movie analysis channels plus a few on other topics.

I should mention that I watch the majority of YouTube videos at speed 1.5x since they usually speak quite slowly. To speed up videos, click on the settings “cog” and then select “Playback Speed”. YouTube lets you go up to 2x.


Chris Stuckmann reviews movies. During normal times he does a couple per week. Mostly current releases with some old ones. His reviews are low-spoiler although sometimes he’ll do an extra “Spoiler Review”. Usually around 6 minutes long.
Star Wars: The Rise of Skywalker – Movie Review

Wendover Productions does explainer videos. Air & Sea travel are quite common topics. Usually a bit better researched than some of the other channels and a little longer at around 12 minutes. Around 1 video per week.
The Logistics of the US Census

City Beautiful is a channel about cities and City planning. 1-2 videos per month. Usually around 10 minutes. Pitched for the amateur city and planning enthusiast
Where did the rules of the road come from?

PBS Eons does videos about the history of life on Earth. Lots of Dinosaurs, early humans and the like. Run and advised by experts so info is great quality. Links to refs! Accessible but dives into the detail. Around 1 video/week. About 10 minutes each.
How the Egg Came First

Pitch Meetings are a writer pitching a real (usually recent) movie or show to a studio exec. Both are played by Ryan George. Very funny. Part of the Screen Rant channel but I don’t watch their other stuff.
Playlist
Netflix’s Tiger King Pitch Meeting

MrMobile [Michael Fisher] reviews Phones, Laptops, Smart Watches & other tech gadgets. Usually about one video/week. I like the descriptive style and good production values. Not too much spec flooding.
A Stunning Smartwatch With A Familiar Failing – New Moto 360 Review

Verge Science does professional level stories about a range of Science topics. They usually are out in the field with Engineers and scientists.
Why urban coyote sightings are on the rise

Alt Shift X do detailed explainer videos about Books & TV Shows like Game of Thrones, Watchmen & Westworld. Huge amounts of detail and a great style with a wall of pictures. Weekly videos when shows are on plus subscriber extras.
Watchmen Explained (original comic)

The B1M talks about building and construction projects. Many videos are done with cooperation of the architects or building companies so a bit fluffy at times. But good production values and interesting topics.
The World’s Tallest Modular Hotel

CineFix does a variety of movie-related videos. Over the last year they’ve only been putting out one or two per month, mostly high quality. A few years ago they were at higher volume and had more throw-aways.
Jojo Rabbit – What’s the Difference?

Marques Brownlee (MKBHD) does tech reviews. Mainly phones but also other gear and the odd special. His videos are extremely high quality and well researched. Averaging 2 videos per week.
Samsung Galaxy S20 Ultra Review: Attack of the Numbers!

How it Should have Ended does cartoons of funny alternative endings for movies. Plus some other long running series. Usually only a few minutes long.
Avengers Endgame Alternate HISHE

Power Play Chess is a Chess channel from Daniel King. He usually covers 1 round/day from major tournaments as well as reviewing older games and other videos.
World Champion tastes the bullet | Firouzja vs Carlsen | Lichess Bullet match 2020

Tom Scott makes explainer videos mostly about science, technology and geography. Often filmed on site rather than being talks over pictures like other channels.
Inside The Billion-Euro Nuclear Reactor That Was Never Switched On

Screen Junkies does stuff about movies. I mostly watch their “Honest Trailers” but they sometimes do “Serious Questions” which are good too.
Honest Trailers | Terminator: Dark Fate

Half as Interesting is an offshoot of Wendover Productions (see above). It does shorter 3-5 minutes weekly videos on a quick amusing fact or happening (that doesn’t justify a longer video)
United Airlines’ Men-Only Flights

Red Team Review is another movie and TV review channel. I was mostly watching them when Game of Thrones was on and since then they have had a bit less content. They are making some Game of Thrones videos narrated by the TV actors though
Game of Thrones Histories & Lore – The Rains of Castamere

Signum University do online classes about Fantasy (especially Tolkien) and related literature. Their channel features their classes and related videos. I mainly follow “Exploring The Lord of the Rings”. Often sounds better at 2x or 3x speed.
A Wizard of Earthsea: Session 01 – Mageborn

The Nerdwriter does approx monthly videos. Usually about a specific type of art, a painting or film making technique. Very high quality
How Walter Murch Worldized Film Sound

Real Life Lore does infotainment videos. “Answers to questions that you’ve never asked. Mostly over topics like history, geography, economics and science”.
This Was the World’s Most Dangerous Amusement Park

Janice Fung is a Sydney based youtuber who makes videos mostly about food and travel. She puts out 2 videos most weeks.
I Made the Viral Tik Tok Frothy DALGONA COFFEE! (Whipped Coffee Without Mixer!!)

Real Engineering is a bit more technical than the average popsci channel. They especially like doing videos covering flight dynamics, but they cover lots of other topics.
How The Ford Model T Took Over The World

Just Write by Sage Hyden puts out a video roughly once a month. They are essays usually about writing and usually tied into a recent movie or show.
A Disney Monopoly Is A Problem (According To Disney’s Recess)

CGP Grey makes high quality explainer videos. Around one every month, usually with lots of animation.
The Trouble With Tumbleweed

Lessons from the Screenplay are “videos that analyze movie scripts to examine exactly how and why they are so good at telling their stories”
Casino Royale — How Action Reveals Character

HaxDogma is another TV Show review/analysis channel. I started watching him for his Watchmen Series videos and now watch his Westworld ones.
Official Westworld Trailer Breakdown + 3 Hidden Trailers

Lindsay Ellis does videos mostly about pop culture, usually movies. These days she only does a few a year but they are usually 20+ minutes.
The Hobbit: A Long-Expected Autopsy (Part 1/2)

A bonus couple of recommended courses on Crash Course:
Crash Course Astronomy with Phil Plait
Crash Course Computer Science by Carrie Anne Philbin


April 24, 2020

Disabling mail sending from your domain

I noticed that I was receiving some bounced email notifications from a domain I own (cloud.geek.nz) and use to host my blog. These notifications were all for spam messages spoofing the From address since I do not use that domain for email.

I decided to try setting a strict DMARC policy to see if DMARC-using mail servers (e.g. GMail) would then drop these spoofed emails without notifying me about it.

I started by setting this initial DMARC policy in DNS in order to monitor the change:

@ TXT v=spf1 -all
_dmarc TXT v=DMARC1; p=none; ruf=mailto:dmarc@fmarier.org; sp=none; aspf=s; fo=0:1:d:s;

Then I waited three weeks without receiving anything before updating the relevant DNS records to this final DMARC policy:

@ TXT v=spf1 -all
_dmarc TXT v=DMARC1; p=reject; sp=reject; aspf=s;

This policy states that nobody is allowed to send emails for this domain and that any incoming email claiming to be from this domain should be silently rejected.
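
If you want to double-check what is actually being served, a quick query with dig should show both records (using my domain from above as the example):

dig +short TXT cloud.geek.nz
dig +short TXT _dmarc.cloud.geek.nz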

I haven't noticed any bounce notifications for messages spoofing this domain in a while, so maybe it's working?

FreeDV Beacon Maintenance

There’s been some recent interest in the FreeDV Beacon project, originally developed back in 2015. A FreeDV beacon was operating in Sunbury, VK3, for several years and was very useful for testing FreeDV.

After being approached by John (VK3IC) and Bob (VK4YA), I decided to dust off the software and bring it across to a GitHub repo. It’s now running on my laptop happily and I hope John and Bob will soon have some beacons running on the air.

I’ve added support for FreeDV 700C and 700D modes, finding a tricky bug in the process. I really should read the instructions for my own API!

Thanks also to Richard (KF5OIM) for help with the CMake build system.


April 18, 2020

Accessing USB serial devices in Fedora Silverblue

One of the things I do a lot on my Fedora machines is talk to devices via USB serial. While a device is correctly detected at /dev/ttyUSB0 and owned by the dialout group, adding myself to that group doesn’t work because the group can’t be found. This is because under Silverblue, there are two different group files (/usr/lib/group and /etc/group) with different content.
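
You can see the mismatch for yourself (a quick check, not part of the original steps): the dialout entry only exists in the image's group file, not in the writable /etc/group.

grep ^dialout: /usr/lib/group /etc/group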

There are some easy ways to solve this, for example we can create the matching dialout group or write a udev rule. Let’s take a look!

On the host with groups

If you try to add yourself to the dialout group it will fail.

sudo gpasswd -a ${USER} dialout
gpasswd: group 'dialout' does not exist in /etc/group

Trying to re-create the group will also fail as it’s already in use.

sudo groupadd dialout -r -g 18
groupadd: GID '18' already exists

So instead, we can simply grab the entry from the OS group file and add it to /etc/group ourselves.

grep ^dialout: /usr/lib/group |sudo tee -a /etc/group

Now we are able to add ourselves to the dialout group!

sudo gpasswd -a ${USER} dialout

Activate that group in our current shell.

newgrp dialout

And now we can use a tool like screen to talk to the device (note that you will need to have installed screen with rpm-ostree and rebooted first).

screen /dev/ttyUSB0 115200

And that’s it. We can now talk to USB serial devices on the host.

Inside a container with udev

Inside a container is a little more tricky as the dialout group is not passed into it. Thus, inside the container the device is owned by nobody and the user will have no permissions to read or write to it.

One way to deal with this and still use the regular toolbox command is to create a udev rule and make yourself the owner of the device on the host, instead of root.

To do this, we create a generic udev rule for all usb-serial devices.

cat << EOF | sudo tee /etc/udev/rules.d/50-usb-serial.rules
SUBSYSTEM=="tty", SUBSYSTEMS=="usb-serial", OWNER="${USER}"
EOF

If you need to create a more specific rule, you can find other bits to match by (like kernel driver, etc) with the udevadm command.

udevadm info -a -n /dev/ttyUSB0

Once you have your rule, reload udev.

sudo udevadm control --reload-rules
sudo udevadm trigger

Now, unplug your serial device and plug it back in. You should notice that it is now owned by your user.

ls -l /dev/ttyUSB0
crw-rw----. 1 csmart dialout 188, 0 Apr 18 20:53 /dev/ttyUSB0

It should also be the same inside the toolbox container now.

[21:03 csmart ~]$ toolbox enter
⬢[csmart@toolbox ~]$ ls -l /dev/ttyUSB0 
crw-rw----. 1 csmart nobody 188, 0 Apr 18 20:53 /dev/ttyUSB0

And of course, as this is inside a container, you can just dnf install screen or whatever other program you need.

Of course, if you’re happy to create the udev rule then you don’t need to worry about the groups solution on the host.

Making dnf on Fedora Silverblue a little easier with bash aliases

Fedora Silverblue doesn’t come with dnf because it’s an immutable operating system and uses a special tool called rpm-ostree to layer packages on top instead.

Most terminal work is designed to be done in containers with toolbox, but I still do a bunch of work outside of a container. Searching for packages to install with rpm-ostree still requires dnf inside a container, as it does not have that function.

I add these two aliases to my ~/.bashrc file so that using dnf to search or install into the default container is possible from a regular terminal. This just makes Silverblue a little bit more like what I’m used to with regular Fedora.

cat >> ~/.bashrc << EOF
alias sudo="sudo "
alias dnf="bash -c '#skip_sudo'; toolbox -y create 2>/dev/null; toolbox run sudo dnf"
EOF

If the default container doesn’t exist, toolbox creates it. Note that the alias for sudo has a space at the end. This tells bash to also check the next command word for alias expansion, which is what makes sudo work with aliases. Thus, we can make sure that both dnf and sudo dnf will work. The first part of the dnf alias is used to skip the sudo command so the rest is run as the regular user, which makes them both work the same.

We need to source that file or run a new bash session to pick up the aliases.

bash
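
If you want to confirm the aliases took effect in the new shell, bash itself can show you how it will expand them (this check isn't part of the original setup):

type sudo dnf    # both should be reported as aliases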

Now we can just use dnf command like normal. Search can be used to find packages to install with rpm-ostree while installing packages will go into the default toolbox container (both with and without sudo are the same).

sudo dnf search vim
dnf install -y vim
The container is automatically created with dnf

To run vim from the example, enter the container and it will be there.

Vim in a container

You can do whatever you normally do with dnf, such as installing RPMs like the RPMFusion release packages and listing repos.

Installing RPMFusion RPMs into container
Listing repositories in the container

Anyway, just a little thing but it’s kind of helpful to me.

April 16, 2020

Crisis Proofing the Australian Economy

An Open Letter to Prime Minister Scott Morrison

To The Hon Scott Morrison MP, Prime Minister,

No doubt how to re-invigorate our economy is high on your mind, among other priorities in this time of crisis.

As you're acutely aware, the pandemic we're experiencing has accelerated a long-term high unemployment trajectory we were already on due to industry retraction, automation, off-shoring jobs etc.

Now is the right time to enact changes that will bring long-term crisis resilience, economic stability and prosperity to this nation.

  1. Introduce a 1% tax on all financial / stock / commodity market transactions.
  2. Use 100% of that to fund a Universal Basic Income for all adult Australian citizens.

Funding a Universal Basic Income will bring:

  • Economic resilience in times of emergency (bushfire, drought, pandemic)
  • Removal of the need for government financial aid in those emergencies
  • Removal of all forms of pension and unemployment benefits
  • A more predictable, reduced and balanced government budget
  • Dignity and autonomy to those impacted by economic events / crises
  • Space and security for the innovative amongst us to take entrepreneurial risks
  • A growth in social, artistic and economic activity that could not happen otherwise

This is both simple to collect and simple to distribute to all tax payers. It can be done both swiftly and sensibly, enabling you to remove the Job Keeper band aid and its related budgetary problems.

This is an opportunity to be seized, Mr Morrison.

There is also a second opportunity.

Post World War II, we had the Snowy River scheme. Today we have the housing affordability crisis, and many Australians will never own their own home. A public building programme to provide 25% of housing would create a permanent employment and building boom and, over time, resolve the housing affordability crisis.

If you cap repayments for those in public housing to 25% of their income, there will also be more disposable income circulating through the economy, creating prosperous times for all Australians.

Carpe diem, Mr Morrison.

Recognise the opportunity. Seize it.


Dear Readers,

If you support either or both of these ideas, please contact the Prime Minister directly and add your voice.

April 14, 2020

Exporting volumes from Cinder and re-creating COW layers


Today I wandered into a bit of a rat hole discovering how to export data from OpenStack Cinder volumes when you don’t have admin permissions, and I thought it was worth documenting here so I remember it for next time.

Let’s assume that you have a Cinder volume named “child1”, which is a 64gb volume originally cloned from “parent1”. parent1 is a 7.9gb VMDK, but the only way I can find to extract child1 is to convert it to a glance image and then download the entire volume as a raw. Something like this:

$ cinder upload-to-image $child1 "extract:$child1"

Where $child1 is the UUID of the Cinder volume. You then need to find the UUID of the image in Glance, which the Cinder upload-to-image command will have told you, but you can also find by searching Glance for your image named “extract:$child1”:

$ glance image-list | grep "extract:$child1"

You now need to watch that Glance image until the status of the image is “active”. It will go through a series of steps with names like “queued”, and “uploading” first.
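
If you'd rather not keep re-running glance image-list by hand, a rough polling loop (an untested sketch, not from the original workflow) could look like this:

# Wait until the Glance image reports a status of "active"
until glance image-show $glance_uuid | grep -qw active; do
    echo "image not active yet, waiting..."
    sleep 30
done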

Now you can download the image from Glance:

$ glance image-download --file images/$child1.raw --progress $glance_uuid

And then delete the intermediate glance image:

$ glance image-delete $glance_uuid

I have a bad sample script which does this in my junk code repository if that is helpful.

What you have at the end of this is a 64gb raw disk file in my example. You can convert that file to qcow2 like this:

$ qemu-img convert -O qcow2 $child1.raw $child1.qcow2

But you’re left with a 64gb qcow2 file for your troubles. I experimented with virt-sparsify to reduce the size of this image, but it didn’t work in my case (no space was saved). I suspect that’s because the disk image has multiple partitions, since it originally came from a VMware environment.

Luckily qemu-img can also re-create the COW layer that existed on the admin-only side of the public cloud barrier. You do this by rebasing the converted qcow2 file onto the original VMDK file like this:

$ qemu-img create -f qcow2 -b $parent1.qcow2 $child1.delta.qcow2
$ qemu-img rebase -b $parent1.vmdk $child1.delta.qcow2

In my case I ended up with a 289mb $child1.delta.qcow2 file, which isn’t too shabby. It took about five minutes to produce that delta on my Google Cloud instance from a 7.9gb backing file and a 64gb upper layer.
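
As a sanity check (not in the original steps), qemu-img can confirm the delta now points back at the VMDK backing file:

$ qemu-img info --backing-chain $child1.delta.qcow2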


April 11, 2020

Using Gogo WiFi on Linux

Gogo, the WiFi provider for airlines like Air Canada, is not available to Linux users even though it advertises "access using any Wi-Fi enabled laptop, tablet or smartphone". It is however possible to work around this restriction by faking your browser user agent.

I tried the User-Agent Switcher for Chrome extension on Chrome and Brave but it didn't work for some reason.

What did work was using Firefox and adding the following prefs in about:config to spoof its user agent to Chrome for Windows:

general.useragent.override=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36
general.useragent.updates.enabled=false
privacy.resistFingerprinting=false

The last two prefs are necessary in order for the hidden general.useragent.override pref to not be ignored.
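
If you find yourself doing this on every flight, the same prefs could be dropped into a user.js file in your Firefox profile so they persist across restarts. This is just a sketch (the profile directory is a placeholder; find yours under about:profiles), and note that user.js re-applies these values at every startup until you delete the file:

cat >> ~/.mozilla/firefox/YOUR_PROFILE/user.js << 'EOF'
user_pref("general.useragent.override", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36");
user_pref("general.useragent.updates.enabled", false);
user_pref("privacy.resistFingerprinting", false);
EOF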

Opt out of mandatory arbitration

As an aside, the Gogo terms of service automatically enroll you into mandatory arbitration unless you opt out by sending an email to customercare@gogoair.com within 30 days of using their service.

You may want to create an email template for this so that you can fire off a quick email to them as soon as you connect. I will probably write a script for it next time I use this service.

Fedora Silverblue is an amazing immutable desktop

I recently switched my regular Fedora 31 workstation over to the 31 Silverblue release. I’ve played with Project Atomic before and have been meaning to try it out more seriously for a while, but never had the time. Silverblue provided the catalyst to do that.

What this brings to the table is quite amazing and seriously impressive. The base OS is immutable and everyone’s install is identical. This means quality can be improved as there are less combinations and it’s easier to test. Upgrades to the next major version of Fedora are fast and secure. Instead of updating thousands of RPMs in-place, the new image is downloaded and the system reboots into it. As the underlying images don’t change, it also offers full rollback support.

This is similar to how platforms like Chrome OS and Android work, but thanks to ostree it’s now available for Linux desktops! That is pretty neat.

It doesn’t come with a standard package manager like dnf. Instead, any packages or changes you need to perform on the base OS are done using the rpm-ostree command, which actually layers them on top.

And while technically you can install anything using rpm-ostree, ideally this should be avoided as much as possible (some low level apps like shells and libvirt may require it, though). Flatpak apps and containers are the standard way to consume packages. As these are kept separate from the base OS, it also helps improve stability and reliability.

Installing Silverblue

I copied the Silverblue installer to a USB stick and booted it to do the install. As my Dell XPS has an NVIDIA card, I modified the installer’s kernel args and disabled the nouveau driver with the usual nouveau.modeset=0 to get the install GUI to show up.

I’m also running in UEFI mode and due to a bug you have to use a separate, dedicated /boot/efi partition for Silverblue (personally, I think that’s a good thing to do anyway). Otherwise, the install looks pretty much the same as regular Fedora and went smoothly.

Once installed, I blacklisted the nouveau driver and rebooted. To make these kernel arguments permanent, we don’t use grub2; instead, we set kernel args with rpm-ostree.

rpm-ostree kargs --append=modprobe.blacklist=nouveau --append=rd.driver.blacklist=nouveau

The NVIDIA drivers from RPMFusion are supported, so following this I had to add the repositories and drivers as RPMs on the base image.

rpm-ostree install \
https://download1.rpmfusion.org/free/fedora/rpmfusion-free-release-31.noarch.rpm \
https://download1.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-31.noarch.rpm
systemctl reboot

Once rebooted I then installed the necessary packages and rebooted again to activate them.

rpm-ostree install akmod-nvidia xorg-x11-drv-nvidia-cuda libva-utils libva-vdpau-driver gstreamer1-libav
rpm-ostree kargs --append=nvidia-drm.modeset=1
systemctl reboot

That was the base setup complete, which all went pretty smoothly. What you’re left with is the base OS with GNOME and a few core apps.

GNOME in Silverblue

Working with Silverblue

Using Silverblue is a different way of working than I have been used to. As mentioned above, there is no dnf command and packages are layered on top of the base OS with the rpm-ostree command. Because this is a layer, installing a new RPM requires a reboot to activate it, which is quite painful when you’re in the middle of some work and realise you need a program.

The answer though, is to use more containers instead of RPMs as I’m used to.

Containers

As I wrote about in an earlier blog post, toolbox is a wrapper for setting up containers and complements Silverblue wonderfully. If you need to install any terminal apps, give this a shot. Creating and running a container is as simple as this.

toolbox create
toolbox enter
Container on Fedora Silverblue

Once inside your container use it like a normal Fedora machine (dnf is available!).

As rpm-ostree has no search function, using a container is the expected way to do this. Having created the container above, you can now use it (without entering it first) to perform package searches.

toolbox run dnf search vim

Apps

Graphical apps are managed with Flatpak, the new way to deliver secure, isolated programs on Linux. Silverblue is configured to use Fedora apps out of the box, and you can also add Flathub as a third party repo.

I experienced some small glitches with the Software GUI program when applying updates, but I don’t normally use it so I’m not sure if it’s just beta issues or not. As the default install is more sparse than usual, you’ll find yourself needing to install the apps you use. I really like this approach, it keeps the base system smaller and cleaner.

While Fedora provides their own Firefox package in Flatpak format (which is great), Mozilla also just recently started publishing their official package to Flathub. So, to install that, we simply add Flathub as a repository and install away!

flatpak remote-add flathub https://flathub.org/repo/flathub.flatpakrepo
flatpak update
flatpak install org.mozilla.firefox

After install, Firefox should appear as a regular app inside GNOME.

Official Firefox from Mozilla via Flatpak

If you need to revert to an earlier version of a Flatpak (which I did when I was testing out Firefox beta), you can fetch the remote log for the app, then update to a specific commit.

flatpak remote-info --log flathub-beta org.mozilla.firefox//beta
flatpak update \
--commit 908489d0a77aaa8f03ca8699b489975b4b75d4470ce9bac92e56c7d089a4a869 \
org.mozilla.firefox//beta

Replacing system packages

If you have installed a Flatpak, like Firefox, and no-longer want to use the RPM version included in the base OS, you can use rpm-ostree to override it.

rpm-ostree override remove firefox

After a reboot, you will only see your Flatpak version.

Upgrades

I upgraded from 31 to the 32 beta, which was very fast by comparison to regular Fedora (because it just needs to download the new base image) and pretty seamless.

The only hiccup I had was needing to remove RPMFusion 31 release RPMs first, upgrade the base to 32, then install the RPMFusion 32 release RPMs. After that, I did an update for good measure.

rpm-ostree uninstall rpmfusion-nonfree-release rpmfusion-free-release
rpm-ostree rebase fedora:fedora/32/x86_64/silverblue
rpm-ostree install \
https://download1.rpmfusion.org/free/fedora/rpmfusion-free-release-32.noarch.rpm \
https://download1.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-32.noarch.rpm
systemctl reboot

Then post reboot, I did a manual update of the system.

rpm-ostree upgrade

You can see the current status of your system with the rpm-ostree command.

rpm-ostree status 

On my system you can see the ostree I’m using, the commit as well as both layered and local packages.

State: idle
AutomaticUpdates: disabled
Deployments:
● ostree://fedora:fedora/32/x86_64/silverblue
                   Version: 32.20200410.n.0 (2020-04-10T08:35:30Z)
                BaseCommit: d809af7c4f170a2175ffa1374827dd55e923209aec4a7fb4dfc7b87cd6c110c9
              GPGSignature: Valid signature by 97A1AE57C3A2372CCA3A4ABA6C13026D12C944D0
           LayeredPackages: akmod-nvidia git gstreamer1-libav ipmitool libva-utils libva-vdpau-driver libvirt
                            pass powertop screen tcpdump tmux vim virt-manager xorg-x11-drv-nvidia-cuda
             LocalPackages: rpmfusion-free-release-32-0.3.noarch rpmfusion-nonfree-release-32-0.4.noarch

  ostree://fedora:fedora/32/x86_64/silverblue
                   Version: 32.20200410.n.0 (2020-04-10T08:35:30Z)
                BaseCommit: d809af7c4f170a2175ffa1374827dd55e923209aec4a7fb4dfc7b87cd6c110c9
              GPGSignature: Valid signature by 97A1AE57C3A2372CCA3A4ABA6C13026D12C944D0
           LayeredPackages: akmod-nvidia git gstreamer1-libav ipmitool libva-utils libva-vdpau-driver libvirt
                            pass powertop screen tcpdump tmux vim virt-manager xorg-x11-drv-nvidia-cuda
             LocalPackages: rpmfusion-free-release-32-0.3.noarch rpmfusion-nonfree-release-32-0.4.noarch

To revert to the previous version temporarily, simply select it from the grub boot menu and you’ll go back in time. If you want to make this permanent, you can rollback to the previous state instead and then just reboot.

rpm-ostree rollback

Silverblue is really impressive and works well. I will continue to use it as my daily driver and see how it goes over time.

Tips

I have run into a couple of issues, mostly around the Software GUI (which I don’t normally use). These were things like it listing updates for Flatpaks which were not actually there to update, and nothing happening when you tried to apply them.

If you hit issues, you can try clearing out the Software data and loading the program again.

pkill gnome-software
rm -rf ~/.cache/gnome-software

If you need to, you can also clean out and refresh the rpm-ostree cache and do an update.

rpm-ostree cleanup -m
rpm-ostree update

To repair and update Flatpaks, if you need to.

flatpak repair
flatpak update

Also see

Making dnf on the host terminal a little easier with aliases.

Accessing USB serial devices on the host and in a toolbox container.

A temporary return to Australia due to COVID-19

The last few months have been a rollercoaster, and we’ve just had to make another big decision that we thought we’d share.

TL;DR: we returned to Australia last night, hopeful to get back to Canada when we can. Currently in Sydney quarantine and doing fine.

UPDATE: please note that this isn’t at all a poor reflection on Canada. To the contrary, we have loved even the brief time we’ve had there, the wonderful hospitality and kindness shown by everyone, and the excellent public services there.

We moved to Ottawa, Canada at the end of February, for an incredible job opportunity with Service Canada which also presented a great life opportunity for the family. We enjoyed 2 “normal” weeks of settling in, with the first week dedicated to getting set up, and the second week spent establishing a work / school routine – me in the office, little A in school and T looking at work opportunities and running the household.

Then, almost overnight, everything went into COVID lock down. Businesses and schools closed. Community groups stopped meeting. People are being affected by this every day, so we have been very lucky to be largely fine and in good health, and we thought we could ride it out safely staying in Ottawa, even if we hadn’t quite had the opportunity to establish ourselves.

But then a few things happened which changed our minds – at least for now.

Firstly, with the schools shut down before the A had really had a chance to make friends (she only attended for 5 days before the school shut down), she was left feeling very isolated. The school is trying to stay connected with its students by providing a half hour video class each day, with a half hour activity in the afternoons, but it’s no way to help her to make new friends. A has only gotten to know the kids of one family in Ottawa, who are also in isolation but have been amazingly supportive (thanks Julie and family!), so we had to rely heavily on video playdates with cousins and friends in Australia, for which the timezone difference only allows a very narrow window of opportunity each day. With every passing day, the estimated school closures have gone from weeks, to months, to very likely the rest of the school year (with the new school year commencing in September). If she’d had just another week or two, she would have likely found a friend, so that was a pity. It’s also affected the availability of summer camps for kids, which we were relying on to help us with A through the 2 month summer holiday period (July & August).

Secondly, we checked our health cover and luckily the travel insurance we bought covered COVID conditions, but we were keen to get full public health cover. Usually for new arrivals there is a 3 month waiting period before this can be applied for. However, in response to the COVID threat the Ontario Government recently waived that waiting period for public health insurance, so we rushed to register. Unfortunately, the one service office that is able to process applications from non-Canadian citizens had closed by that stage due to COVID, with no re-opening being contemplated. We were informed that there is currently no alternative way for non-citizens to apply online or over the phone.

Thirdly, the Australian Government has strongly encouraged all Australian citizens to return home, warning of the closing window for international travel. We became concerned we wouldn’t have full consulate support if something went wrong overseas. A good travel agent friend of ours told us the industry is preparing for a minimum of 6 months of international travel restrictions, which raised the very real issue that if anything went wrong for us, we could neither get home nor could family come to us. And, as we can now all appreciate, it’s probable that international travel disruptions and prohibitions will endure for much longer than 6 months.

Finally, we had a real scare. For context, we signed a lease for an apartment in a lovely part of central Ottawa, but we weren’t able to move in until early April, so we had to spend 5 weeks living in a hotel room. We did move into our new place just last Sunday and it was glorious to finally have a place, and for little A to finally have her own room, which she adored. Huge thanks to those who generously helped us make that move! The apartment is only 2 blocks away from A’s new school, which is incredibly convenient for us – it will be particularly good during the worst of Ottawa’s winter. But little A, who is now a very active and adventurous 4 year old, managed to face plant off her scooter (trying to bunnyhop down a stair!) and she knocked out a front tooth, on only the second day in the new place! She is ok, but we were all very, very lucky that it was a clean accident with the tooth coming out whole and no other significant damage. But we struggled to get any non-emergency medical support.

The Ottawa emergency dental service was directing us to a number that didn’t work. The phone health service was so busy that we were told we couldn’t even speak to a nurse for 24 hours. We could have called emergency services and gone to a hospital, which was comforting, but several Ottawa hospitals reported COVID outbreaks just that day, so we were nervous to do so. We ended up getting medical support from the dentist friend of a friend over text, but that was purely by chance. It was quite a wake up call as to the questions of what we would have done if it had been a really serious injury. We just don’t know the Ontario health system well enough, can’t get on the public system, and the pressure of escalating COVID cases clearly makes it all more complicated than usual.

If we’d had another month or two to establish ourselves, we think we might have been fine, and we know several ex-pats who are fine. But for us, with everything above, we felt too vulnerable to stay in Canada right now. If it was just Thomas and I it’d be a different matter.

So, we have left Ottawa and returned to Australia, with full intent to return to Canada when we can. As I write this, we are on day 2 of the 14 day mandatory isolation in Sydney. We were apprehensive about arriving in Sydney, knowing that we’d be put into mandatory quarantine, but the processing and screening of arrivals was done really well, professionally and with compassion. A special thank you to all the Sydney airport and Qatar Airways staff, immigration and medical officers, NSW Police, army soldiers and hotel staff who were all involved in the process. Each one acted with incredible professionalism and are a credit to their respective agencies. They’re also exposing themselves to the risk of COVID in order to help others. Amazing and brave people. A special thank you to Emma Rowan-Kelly who managed to find us these flights back amidst everything shutting down globally.

I will continue working remotely for Service Canada, on the redesign and implementation of a modern digital channel for government services. Every one of my team is working remotely now anyway, so this won’t be a significant issue apart from the timezone. I’ll essentially be a shift worker for this period. Our families are all self-isolating, to protect the grandparents and great-grandparents, so the Andrews family will be self-isolating in a location still to be confirmed. We will be traveling directly there once we are released from quarantine, but we’ll be contactable via email, fb, whatsapp, video, etc.

We are still committed to spending a few years in Canada, working, exploring and experiencing Canadian cultures, and will keep the place in Ottawa with the hope we can return there in the coming 6 months or so. We are very, very thankful for all the support we have had from work, colleagues, little A’s school, new friends there, as well as that of friends and family back in Australia.

Thank you all – and stay safe. This is a difficult time for everyone, and we all need to do our part and look after each other best we can.

Easy containers on Fedora with toolbox

The toolbox program is a wrapper for setting up containers on Fedora. It’s not doing anything you can’t do yourself with podman, but it does make using and managing containers simpler and easier. It comes by default on Silverblue where it’s aimed for use with terminal apps and dev work, but you can try it on a regular Fedora workstation.

sudo dnf install toolbox

Creating containers

You can create just one container if you want, which will be called something like fedora-toolbox-32, or you can create separate containers for different things. Up to you. As an example, let’s create a container called testing-f32.

toolbox create --container testing-f32

By default toolbox uses the Fedora registry and creates a container which is the same version as your host. However you can specify a different version if you need to, for example if you needed a Fedora 30 container.

toolbox create --release f30 --container testing-f30

These containers are not yet running, they’ve just been created for you.

View your containers

You can see your containers with the list option.

toolbox list

This will show you both the images in your cache and the containers in a nice format.

IMAGE ID      IMAGE NAME                                        CREATED
c49513deb616  registry.fedoraproject.org/f30/fedora-toolbox:30  5 weeks ago
f7cf4b593fc1  registry.fedoraproject.org/f32/fedora-toolbox:32  4 weeks ago

CONTAINER ID  CONTAINER NAME  CREATED        STATUS   IMAGE NAME
b468de87277b  testing-f30     5 minutes ago  Created  registry.fedoraproject.org/f30/fedora-toolbox:30
1597ab1a00a5  testing-f32     5 minutes ago  Created  registry.fedoraproject.org/f32/fedora-toolbox:32

As toolbox is a wrapper, you can also see this information with podman, but with two commands; one for images and one for containers. Notice that with podman you can also see that these containers are not actually running (that’s the next step).

podman images ; podman ps -a
registry.fedoraproject.org/f32/fedora-toolbox   32       f7cf4b593fc1   4 weeks ago    360 MB
registry.fedoraproject.org/f30/fedora-toolbox   30       c49513deb616   5 weeks ago    404 MB

CONTAINER ID  IMAGE                                             COMMAND               CREATED             STATUS   PORTS  NAMES
b468de87277b  registry.fedoraproject.org/f30/fedora-toolbox:30  toolbox --verbose...  About a minute ago  Created         testing-f30
1597ab1a00a5  registry.fedoraproject.org/f32/fedora-toolbox:32  toolbox --verbose...  About a minute ago  Created         testing-f32

You can also use podman to inspect the containers and appreciate all the extra things toolbox is doing for you.

podman inspect testing-f32

Entering a container

Once you have a container created, to use it you just enter it with toolbox.

toolbox enter --container testing-f32

Now you are inside your container which is separate from your host, but it generally looks the same. A number of bind mounts were created automatically for you and you’re still in your home directory. It is important to note that all containers you run with toolbox will share your home directory! Thus it won’t isolate different versions of the same software, for example, you would still need to create separate virtual environments for Python.
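
For example (a generic illustration, not from the original post), because every toolbox shares your home directory you would still keep per-project Python virtual environments inside the containers:

# Create and activate a per-project virtual environment in your (shared) home directory
python3 -m venv ~/venvs/myproject
source ~/venvs/myproject/bin/activate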

Any new shells or tabs you create in your terminal app will also be inside that container. Note the PS1 variable has changed to have a pink shape at the front (from /etc/profile.d/toolbox.sh).

Inside a container with toolbox

Note that you could also start and enter the container with podman.

podman start testing-f30
podman exec -it -u ${EUID} -w ${HOME} testing-f30 /usr/bin/bash

Hopefully you can see how toolbox makes using containers easier!

Exiting a container

To get out of the container, just exit the shell and you’ll be back to your previous session on the host. The container will still exist and can be entered again, it is not deleted unless you delete it.

Removing a container

To remove a container, simply run toolbox with the rm option. Note that this still keeps the images around, it just deletes the instance of that image that’s running as that container.

toolbox rm -f testing-f32

Again, you can also delete this using podman.
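
A rough podman equivalent would be something like this (removing the cached image is optional, and not something toolbox rm does for you):

podman rm -f testing-f32
podman rmi registry.fedoraproject.org/f32/fedora-toolbox:32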

Using containers

Once inside a container you can basically (mostly) treat your container system as a regular Fedora host. You can install any apps you want, such as terminal apps like screenfetch and even graphical programs like gedit (which work from inside the container).

sudo dnf install screenfetch gedit
screenfetch is always a favourite

For any programs that require RPMFusion, like ffmpeg, you first need to set up the repos as you would on a regular Fedora system.

sudo dnf install \
https://download1.rpmfusion.org/free/fedora/rpmfusion-free-release-$(rpm -E %fedora).noarch.rpm \
https://download1.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-$(rpm -E %fedora).noarch.rpm
sudo dnf install ffmpeg

These programs like screenfetch and ffmpeg are available inside your container, but not outside your container. They are isolated. To run them in the future you would enter the container and run the program.

Instead of entering and then running the program, you can also just use the run command. Here you can see screenfetch is not on my host, but I can run it in the container.

Those are pretty simple (silly?) examples, but hopefully they demonstrate the value of toolbox. It’s probably more useful for dev work where you can separate and manage different versions of various platforms, but it does make it really easy to quickly spin something up outside of your host system.

April 06, 2020

The Calculating Stars


Winner of a Hugo, a Locus and a Nebula, this book is about a mathematical prodigy battling her way into a career as an astronaut in a post-apocalyptic 1950s America. Along the way she has to take on the embedded sexism of America in the 50s, as well as her own mild racism. Worse, she suffers from an anxiety condition.

The book is engaging and well written, with an alternative history plot line which is believable and interesting. In fact, it’s quite topical for our current time.

I really enjoyed this book and I will definitely be reading the sequel.

The Calculating Stars by Mary Robinette Kowal, May 16, 2019, 432 pages

The Right Stuff meets Hidden Figures by way of The Martian. A world in crisis, the birth of space flight and a heroine for her time and ours; the acclaimed first novel in the Lady Astronaut series has something for everyone., On a cold spring night in 1952, a huge meteorite fell to earth and obliterated much of the east coast of the United States, including Washington D.C. The ensuing climate cataclysm will soon render the earth inhospitable for humanity, as the last such meteorite did for the dinosaurs. This looming threat calls for a radically accelerated effort to colonize space, and requires a much larger share of humanity to take part in the process. Elma York's experience as a WASP pilot and mathematician earns her a place in the International Aerospace Coalition's attempts to put man on the moon, as a calculator. But with so many skilled and experienced women pilots and scientists involved with the program, it doesn't take long before Elma begins to wonder why they can't go into space, too. Elma's drive to become the first Lady Astronaut is so strong that even the most dearly held conventions of society may not stand a chance against her.


April 05, 2020

Custom WiFi enabled nightlight with ESPHome and Home Assistant

I built this custom night light for my kids as a fun little project. It’s pretty easy so thought someone else might be inspired to do something similar.

Custom WiFi connected nightlight

Hardware

The core hardware is just an ESP8266 module and an Adafruit NeoPixel Ring. I also bought a 240V bunker light and took the guts out to use as the housing, as it looked nice and had a diffuser (you could pick anything that you like).

Removing existing components from bunker light

While the data pin of the NeoPixel Ring can pretty much connect to any GPIO pin on the ESP, bitbanging can cause flickering. It’s better to use pins 1, 2 or 3 on an ESP8266 where we can use other methods to talk to the device.

These methods are exposed in ESPHome’s support for NeoPixel.

  • ESP8266_DMA (default for ESP8266, only on pin GPIO3)
  • ESP8266_UART0 (only on pin GPIO1)
  • ESP8266_UART1 (only on pin GPIO2)
  • ESP8266_ASYNC_UART0 (only on pin GPIO1)
  • ESP8266_ASYNC_UART1 (only on pin GPIO2)
  • ESP32_I2S_0 (ESP32 only)
  • ESP32_I2S_1 (default for ESP32)
  • BIT_BANG (can flicker a bit)

I chose GPIO2 and use ESP8266_UART1 method in the code below.

So, first things first, solder up some wires to 5V, GND and GPIO pin 2 on the ESP module. These connect to the 5V, GND and data pins on the NeoPixel Ring respectively.

It’s not very neat, but I used a hot glue gun to stick the ESP module into the bottom part of the bunker light, and fed the USB cable through for power and data.

I hot-glued the NeoPixel Ring in-place on the inside of the bunker light, in the centre, shining outwards towards the diffuser.

The bottom can then go back on and screws hold it in place. I used a hacksaw to create a little slot for the USB cable to sit in and then added hot-glue blobs for feet. All closed up, it looks like this underneath.

Looks a bit more professional from the top.

Code using ESPHome

I flashed the ESP8266 using ESPHome (see my earlier blog post) with this simple YAML config.

esphome:
  name: nightlight
  build_path: ./builds/nightlight
  platform: ESP8266
  board: huzzah
  esp8266_restore_from_flash: true

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

# Enable logging
logger:

# Enable Home Assistant API
api:
  password: '!secret api_password'

# Enable over the air updates
ota:
  password: !secret ota_password

mqtt:
  broker: !secret mqtt_broker
  username: !secret mqtt_username
  password: !secret mqtt_password
  port: !secret mqtt_port

light:
  - platform: neopixelbus
    pin: GPIO2
    method: ESP8266_UART1
    num_leds: 16
    type: GRBW
    name: "Nightlight"
    effects:
      # Customize parameters
      - random:
          name: "Slow Random"
          transition_length: 30s
          update_interval: 30s
      - random:
          name: "Fast Random"
          transition_length: 4s
          update_interval: 5s
      - addressable_rainbow:
          name: Rainbow
          speed: 10
          width: 50
      - addressable_twinkle:
          name: Twinkle Effect
          twinkle_probability: 5%
          progress_interval: 4ms
      - addressable_random_twinkle:
          name: Random Twinkle
          twinkle_probability: 5%
          progress_interval: 32ms
      - addressable_fireworks:
          name: Fireworks
          update_interval: 32ms
          spark_probability: 10%
          use_random_color: false
          fade_out_rate: 120
      - addressable_flicker:
          name: Flicker

The esp8266_restore_from_flash option is useful because if the light is on and someone accidentally turns it off, it will go back to the same state when it is turned back on. It does wear the flash out more quickly, however.

The important settings are the light component with the neopixelbus platform, which is where all the magic happens. We specify which GPIO on the ESP the data line on the NeoPixel Ring is connected to (pin 2 in my case). The method we use needs to match the pin (as discussed above) and in this example is ESP8266_UART1.

The number of LEDs must match the actual number on the NeoPixel Ring, in my case 16. This is used when talking to the on-chip LED driver and calculating effects, etc.

Similarly, the LED type is important as it determines which order the colours are in (swap around if colours don’t match). This must match the actual type of NeoPixel Ring, in my case I’m using an RGBW model which has a separate white LED and is in the order GRBW.

Finally, you get all sorts of effects for free, you just need to list the ones you want and any options for them. These show up in Home Assistant under the advanced view of the light (screenshot below).

Now it’s a matter of plugging the ESP module in and flashing it with esphome.

esphome nightlight.yaml run
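
If you want to be a bit more careful (these use the same esphome 1.x command style as above, so treat them as a sketch), you can validate the YAML before flashing and then stream the device logs over the network afterwards:

esphome nightlight.yaml config
esphome nightlight.yaml logs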

Home Assistant

After a reboot, the device should automatically show up in Home Assistant under Configuration -> Devices. From here you can add it to the Lovelace dashboard and make Automations or Scripts for the device.

Nightlight in Home Assistant with automations

Adding it to Lovelace dashboard looks something like this, which lets you easily turn the light on and off and set the brightness.

You can also get advanced settings for the light, where you can change brightness, colours and apply effects.

Nightlight options

Effects

One of the great things about using ESPHome is all the effects which are defined in the YAML file. To apply an effect, choose it from the advanced device view in Home Assistant (as per screenshot above).

This is what rainbow looks like.

Nightlight running Rainbow effect

The kids love to select the colours and effects they want!

Automation

So, once you have the nightlight showing up in Home Assistant, we can create a simple automation to turn it on at sunset and off at sunrise.

Go to Configuration -> Automation and add a new one. You can fill in any name you like and there’s an Execute button there when you want to test it.

The trigger uses the Sun module and runs 10 minutes before sunset.

I don’t use Conditions, but you could. For example, only do this when someone’s at home.

The Actions are set to call the homeassistant.turn_on function and specifies the device(s). Note this takes a comma separated list, so if you have more than one nightlight you can do it with the one automation rule.

That’s it! You can create another one for sunrise, but instead of calling homeassistant.turn_on just call homeassistant.turn_off and use Sunrise instead of Sunset.

Infinite complacency

Attachment: First paragraph of War of the Worlds (1.84 MB)

kattekrab Sun, 05/04/2020 - 10:56

April 04, 2020

COVID-19 and Appreciation

So I’m going near people just once a week to shop. Once a day I go outside on my bike (but nowhere near people) to maintain my mental and physical health. It helps that we live in sparsely populated suburbs.

I shop at my local Woolworths (Findon, South Australia), and was very impressed what I saw today. Crosses on the floor positioning us 2m apart and a bouncer regulating the flow and keeping store numbers low. Same thing on the checkout, and in front of the Deli counter. While I was queuing, a young lady wiped down the trolley handle and offered me hand sanitiser.

The EFTPOS limit has been raised, so no need to use my fingers to enter a PIN number at the checkout. That’s good – I now regard that keypad as an efficient means to distribute a viral payload. Just wave my card 20mm above the machine and I have groceries to sustain my son and I for a week. It costs what I earn with just 1 hour of my labour.

In the middle of the biggest crisis to hit the World since WW2, I can buy just about anything I want. I could gain weight if I wanted to.

Our power went off in a storm last night, and with it the Broadband Internet. However I still had my phone, a hotspot, and a laptop connected to the Internet, friends and loved ones. I immediately received a text from the power company telling me the power would be restored in 2 hours. They did it in 1. In the middle of COVID-19. At night, in the rain. While waiting my son and I cooked a nice BBQ outside in the twilight using gas.

My part time day job is secure, my pay keeps coming, and we have transitioned to WFH and are working well. My shares have been smashed but I can live with that – they are still good companies and I am a long term investor. My son is being home schooled and his teachers at Findon High are working hard on online content and remote teaching.

The Australian COVID-19 new case numbers are dropping and recoveries picking up. Many people are not going to die. The Australian population are working together to beat this.

We are well informed by our public broadcaster the ABC, our media is uncensored, and I can choose to do my own analysis using open source data sets.

What a fantastic world we live in, that can supply a surplus of food, and keep all our institutions running at a time like this. Well done to the Australian government and people.

I feel very grateful.

April 03, 2020

Building Daedalus Flight on NixOS

NixOS Daedalus Gears by Craige McWhirter

Daedalus Flight was recently released and this is how you can build and run this version of Daedalus on NixOS.

If you want to speed the build process up, you can add the IOHK Nix cache to your own NixOS configuration:

iohk.nix:

nix.binaryCaches = [
  "https://cache.nixos.org"
  "https://hydra.iohk.io"
];
nix.binaryCachePublicKeys = [
  "hydra.iohk.io:f/Ea+s+dFdN+3Y/G+FDgSq+a5NEWhJGzdjvKNGv0/EQ="
];

If you haven't already, you can clone the Daedalus repo and specifically the 1.0.0 tagged commit:

$ git clone --branch 1.0.0 https://github.com/input-output-hk/daedalus.git
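
To confirm you really are on the tagged commit (a standard git check, not part of the original instructions), something like this should print 1.0.0:

$ cd daedalus
$ git describe --tags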

Once you've cloned the repo and checked you're on the 1.0.0 tagged commit, you can build Daedalus flight with the following command:

$ nix build -f . daedalus --argstr cluster mainnet_flight

Once the build completes, you're ready to launch Daedalus Flight:

$ ./result/bin/daedalus

To verify that you have in fact built Daedalus Flight, first head to the Daedalus menu then About Daedalus. You should see a title such as "DAEDALUS 1.0.0". The second check is to press [Ctrl]+d to access Daedalus Diagnostics, and your Daedalus state directory should have mainnet_flight at the end of the path.

If you've got these, give yourself a pat on the back and grab yourself a refreshing bevvy while you wait for blocks to sync.

Daedalus FC1 screenshot

Bebo, Betty, and Jaco

Wait, wasn’t WordPress 5.4 just released?

It absolutely was, and congratulations to everyone involved! Inspired by the fine work done to get another release out, I finally completed the last step of co-leading WordPress 5.0, 5.1, and 5.2 (Bebo, Betty, and Jaco, respectively).

My study now has a bit more jazz in it. 🙂

April 02, 2020

Audiobooks – March 2020

My rating for books I read. Note that I’m perfectly happy with anything scoring 3 or better.

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average, in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all

The World As It is: Inside the Obama White House by Ben Rhodes

A memoir of a senior White House staffer, speechwriter & Presidential adviser. Lots of interesting accounts and behind-the-scenes information. 4/5

Redshirts by John Scalzi

A Star Trek parody from the POV of five ensigns who realise something is very strange on their ship. Plot moves steadily and the humour and action mostly work. 3/5

Little House on the Prairie by Laura Ingalls Wilder

The book covers less than a year as the Ingalls family build a cabin in Indian territory on the Kansas Prairie. Dangerous incidents and adventures throughout. 3/5

Wheels Stop: The Tragedies and Triumphs of the Space Shuttle Program, 1986-2011 by Rich Houston

A book about the post-Challenger Shuttle missions. An overview of most of the missions and the astronauts on them. Lots of quotes mainly from the astronauts. Good for Spaceflight fans. 3/5

The Optimist’s Telescope: Thinking Ahead in a Reckless Age by Bina Venkataraman

Ways that people, organisations and governments can start looking ahead at the long term rather than just the short and why they don’t already. Some good stuff 4/5


April 01, 2020

Zoom's Make or Break Moment

Zoom is experiencing massive growth as large sections of the workforce transition to working from home. At the same time many problems with Zoom are coming to light. This is their make or break moment. If they fix the problems they end up with a killer video conferencing app. The alternative is that they join Cisco's Webex in the dumpster fire of awful enterprise software.

In the interest of transparency I am a paying Zoom customer and I use it for hours every day. I also use Webex (under protest) as it is a client's video conferencing platform of choice.

In the middle of last year Jonathan Leitschuh disclosed two bugs in Zoom with security and privacy implications. There was a string of failures that led to these bugs. To Zoom’s credit they published a long blog post about why these “features” were there in the first place.

Over the last couple of weeks other issues with Zoom have surfaced. “Zoom bombing”, or using random 9 digit numbers to find meetings, has become a thing. This is caused by Zoom’s meeting rooms having a 9 digit code to join. That’s really handy when you have to dial in and enter the number on your telephone keypad. The downside is that the ID space is only around a billion numbers, so guessing random IDs will eventually land on active meetings. Zoom does offer the option of requiring a password or PIN for each call. Unfortunately it isn’t the default. Publishing a blog post on how to secure your meetings isn’t enough; the app needs to be more secure by default. The app should default to enabling a 6 digit PIN when creating a meeting.

The Intercept is reporting Zoom’s marketing department got a little carried away when describing the encryption used in the product. This is an area where words matter. Encryption in transit is a base line requirement in communication tools these days. Zoom has this, but their claims about end to end encryption appear to be false. End to end encryption is very important for some use cases. I await the blog post explaining this one.

I don’t know why Proton Mail’s privacy issues blog post got so much attention. This appears to be based on someone skimming the documentation rather than any real testing. Regardless the post got a lot of traction. Some of the same issues were flagged by the EFF.

Until recently Zoom’s FAQ read “Does Zoom sell Personal Data? […] Depends what you mean by ‘sell’”. I’m sure that sounded great in a meeting but it is worrying when you read it as a customer. Once called out on social media it was quickly updated and a blog post published. In the post, Zoom assures users it isn’t selling their data.

Joseph Cox reported late last week that Zoom was sending data to Facebook every time someone used their iOS app. It is unclear if Joe gave Zoom an opportunity to fix the issue before publishing the article. The company pushed out a fix after the story broke.

The most recent issue broke yesterday: the Zoom macOS installer behaving like malware. This seems pretty shady behaviour, like their automatic reinstaller that was fixed last year. To his credit, Zoom Founder and CEO Eric Yuan engaged with the issue on Twitter. This will be one to watch over the coming days.

Over the last year I have seen a consistent pattern when Zoom is called out on security and valid privacy issues with their platform. They respond publicly with “oops my bad” blog posts. Many of the issues appear to be a result of them trying to deliver a great user experience. Unfortunately they sometimes lean too far toward the UX and ignore the security and privacy implications of their choices. I hope that over the coming months we see Zoom correct this balance as problems are called out. If they do they will end up with an amazing platform in terms of UX while keeping their users safe.

Update: Since publishing this post, additional issues with Zoom have been reported. Zoom's CEO announced the company was committed to fixing their product.

March 31, 2020

Defining home automation devices in YAML with ESPHome and Home Assistant, no programming required!

Having built the core of my own “dumb” smart home system, I have been working on making it smart these past few years. As I’ve written about previously, the smart side of my home automation is managed by Home Assistant, which is an amazing, privacy focused open source platform. I’ve previously posted about running Home Assistant in Docker and in Podman.

Home Assistant, the privacy focused, open source home automation platform

I do have a couple of proprietary home automation products, including LIFX globes and Google Home. However, the vast majority of my home automation devices are ESP modules running open source firmware which connect to MQTT as the central protocol. I’ve built a number of sensors and lights and been working on making my light switches smart (more on that in a later blog post).

I already had experience with Arduino, so I started experimenting with this and it worked quite well. I then had a play with Micropython and really enjoyed it, but then I came across ESPHome and it blew me away. I have since migrated most of my devices to ESPHome.

ESPHome provides simple management of ESP devices

ESPHome is smart in making use of PlatformIO underneath, but its beauty lies in the way it abstracts away the complexities of programming for embedded devices. In fact, no programming is necessary! You simply have to define your devices in YAML and run a single command to compile the firmware blob and flash a device. Loops, initialising and managing multiple inputs and outputs, reading and writing to I/O, PWM, functions and callbacks, connecting to WiFi and MQTT, hosting an AP, logging and more are taken care of for you. Once up, the devices support mDNS and unencrypted over the air updates (which is fine for my local network). It supports both Home Assistant API and MQTT (over TLS for ESP8266) as well as lots of common components. There is even an addon for Home Assistant if you prefer using a graphical interface, but I like to do things on the command line.

When combined with Home Assistant, new devices are automatically discovered and appear in the web interface. When using MQTT, the channels are set with the retain flag, so that the devices themselves and their last known states are not lost on reboots (you can disable this for testing).

That’s a lot of things you get for just a little bit of YAML!

Getting started

Getting started is pretty easy, just install esphome using pip.

pip3 install --user esphome

Of course, you will need a real physical ESP device of some description. Thanks to PlatformIO, lots of ESP8266 and ESP32 devices are supported. Although built on similar SoCs, different devices break out different pins and can have different flashing requirements. Therefore, specifying the exact device is good and can be helpful, but it’s not strictly necessary.

It’s not just ESP modules that are supported. These days a number of commercial products are being built using ESP8266 chips which we can flash, like Sonoff power modules, Xiaomi temperature sensors, Brilliant Smart power outlets and Mirabella Genio light bulbs (I use one of these under my stairs).

For this post though, I will use one of my MH-ET Live ESP32Minikit devices as an example, which has the device name of mhetesp32minikit.

MH-ET Live ESP32Minikit

Managing configs with Git

Everything with your device revolves around your device’s YAML config file, including configuration, flashing, accessing logs, clearing out MQTT messages and more.

ESPHome has a wizard which will prompt you to enter your device details and WiFi credentials. It’s a good way to get started, however it only creates a skeleton file and you have to continue configuring the device manually to actually do anything anyway. So, I think ultimately it’s easier to just create and manage your own files, which we’ll do below. (If you want to give it a try, you can run the command esphome example.yaml wizard which will create an example.yaml file.)

I have two Git repositories to manage my ESPHome devices. The first one is for my WiFi and MQTT credentials, which are stored as variables in a file called secrets.yaml (store them in an Ansible vault, if you like). ESPHome automatically looks for this file when compiling firmware for a device and will use those variables.

Let’s create the Git repo and secrets file, replacing the details below with your own. Note that I am including the settings for an MQTT server, which is unencrypted in the example. If you’re using an MQTT server online you may want to use an ESP8266 device instead and enable TLS fingerprints for a more secure connection. I should also mention that MQTT is not required, devices can also use the Home Assistant API and if you don’t use MQTT those variables can be ignored (or you can leave them out).

mkdir ~/esphome-secrets
cd ~/esphome-secrets
cat > secrets.yaml << EOF
wifi_ssid: "ssid"
wifi_password: "wifi-password"
api_password: "api-password"
ota_password: "ota-password"
mqtt_broker: "mqtt-ip"
mqtt_port: 1883
mqtt_username: "mqtt-username"
mqtt_password: "mqtt-password"
EOF
git init
git add .
git commit -m "esphome secrets: add secrets"

The second Git repo has all of my device configs and references the secrets file from the other repo. I name each device’s config file the same as its name (e.g. study.yaml for the device that controls my study). Let’s create the Git repo and link to the secrets file and ignore things like the builds directory (where builds will go!).

mkdir ~/esphome-configs
cd ~/esphome-configs
ln -s ../esphome-secrets/secrets.yaml .
cat > .gitignore << EOF
/.esphome
/builds
/.*.swp
EOF
git init
git add .
git commit -m "esphome configs: link to secrets"

Creating a config

The config file contains different sections with core settings. You can leave some of these settings out, such as api, which will disable that feature on the device (esphome is required).

  • esphome – device details and build options
  • wifi – wifi credentials
  • logger – enable logging of device to see what’s happening
  • ota – enables over the air updates
  • api – enables the Home Assistant API to control the device
  • mqtt – enables MQTT to control the device

Now that we have our base secrets file, we can create our first device config! Note that settings with !secret are referencing the variables in our secrets.yaml file, thus keeping the values out of our device config. Here’s our new base config for an ESP32 device called example in a file called example.yaml which will connect to WiFi and MQTT.

cat > example.yaml << EOF
esphome:
  name: example
  build_path: ./builds/example
  platform: ESP32
  board: mhetesp32minikit

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

logger:

api:
  password: !secret api_password

ota:
  password: !secret ota_password

mqtt:
  broker: !secret mqtt_broker
  username: !secret mqtt_username
  password: !secret mqtt_password
  port: !secret mqtt_port
  # Set to true when finished testing to set MQTT retain flag
  discovery_retain: false
EOF

Compiling and flashing the firmware

First, plug your ESP device into your computer, which should bring up a new TTY, such as /dev/ttyUSB0 (check dmesg). Now that you have the config file, we can compile it and flash the device (you might need to be in the dialout group). The run command actually does a number of things, including a sanity check, compile, flash and tailing the log.

esphome example.yaml run

This will compile the firmware in the specified build dir (./builds/example) and prompt you to flash the device. As this is a new device, an over the air update will not work yet, so you’ll need to select the TTY device. Once the device is running and connected to WiFi you can use OTA.

INFO Successfully compiled program.
Found multiple options, please choose one:
  [1] /dev/ttyUSB0 (CP2104 USB to UART Bridge Controller)
  [2] Over The Air (example.local)
(number): 

Once it is flashed, the device is automatically rebooted. The terminal should now be automatically tailing the log of the device (we enabled logger in the config). If not, you can tell esphome to tail the log by running esphome example.yaml logs.

INFO Successfully uploaded program.
INFO Starting log output from /dev/ttyUSB0 with baud rate 115200
[21:30:17][I][logger:156]: Log initialized
[21:30:17][C][ota:364]: There have been 0 suspected unsuccessful boot attempts.
[21:30:17][I][app:028]: Running through setup()...
[21:30:17][C][wifi:033]: Setting up WiFi...
[21:30:17][D][wifi:304]: Starting scan...
[21:30:19][D][wifi:319]: Found networks:
[21:30:19][I][wifi:365]: - 'ssid' (02:18:E6:22:E2:1A) ▂▄▆█
[21:30:19][D][wifi:366]:     Channel: 1
[21:30:19][D][wifi:367]:     RSSI: -54 dB
[21:30:19][I][wifi:193]: WiFi Connecting to 'ssid'...
[21:30:23][I][wifi:423]: WiFi Connected!
[21:30:23][C][wifi:287]:   Hostname: 'example'
[21:30:23][C][wifi:291]:   Signal strength: -50 dB ▂▄▆█
[21:30:23][C][wifi:295]:   Channel: 1
[21:30:23][C][wifi:296]:   Subnet: 255.255.255.0
[21:30:23][C][wifi:297]:   Gateway: 10.0.0.123
[21:30:23][C][wifi:298]:   DNS1: 10.0.0.1
[21:30:23][C][ota:029]: Over-The-Air Updates:
[21:30:23][C][ota:030]:   Address: example.local:3232
[21:30:23][C][ota:032]:   Using Password.
[21:30:23][C][api:022]: Setting up Home Assistant API server...
[21:30:23][C][mqtt:025]: Setting up MQTT...
[21:30:23][I][mqtt:162]: Connecting to MQTT...
[21:30:23][I][mqtt:202]: MQTT Connected!
[21:30:24][I][app:058]: setup() finished successfully!
[21:30:24][I][app:100]: ESPHome version 1.14.3 compiled on Mar 30 2020, 21:29:41

You should see the device boot up and connect to your WiFi and MQTT server successfully.

Adding components

Great! Now we have a basic YAML file, let’s add some components to make it do something more useful. Components are high level groups, like sensors, lights, switches, fans, etc. Each component is divided into platforms which is where different devices of that type are supported. For example, two of the different platforms under the light component are rgbw and neopixelbus.

One thing that’s useful to know is that platform devices with the name property set in the config will appear in Home Assistant. Those without will only be local to the device and just have an id. This is how you can link multiple components together on the device, then present a single device to Home Assistant (like the garage remote below).

Software reset switch

First thing we can do is add a software switch which will let us reboot the device from Home Assistant (or by publishing manually to MQTT or API). To do this, we add the restart platform from the switch component. It’s as simple as adding this to the bottom of your YAML file.

switch:
  - platform: restart
    name: "Example Device Restart"

That’s it! Now we can re-run the compile and flash. This time you can use OTA to flash the device via mDNS (but if it’s still connected via TTY then you can still use that instead).

esphome example.yaml run

This is what OTA updates look like.

INFO Successfully compiled program.
Found multiple options, please choose one:
  [1] /dev/ttyUSB0 (CP2104 USB to UART Bridge Controller)
  [2] Over The Air (example.local)
(number): 2
INFO Resolving IP address of example.local
INFO  -> 10.0.0.123
INFO Uploading ./builds/example/.pioenvs/example/firmware.bin (856368 bytes)
Uploading: [=====================================                       ] 62% 

After the device reboots, the new reset button should automatically show up in Home Assistant as a device, under Configuration -> Devices under the name example.

Home Assistant with auto-detected example device and reboot switch

Because we set a name for the restart switch, it is visible and called Example Device Restart. If you want to make this visible on the main Overview dashboard, you can do so by selecting ADD TO LOVELACE.

Go ahead and toggle the switch while still tailing the log of the device and you should see it restart. If you’ve already disconnected your ESP device from your computer, you can tail the log using MQTT.
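
For example, with the mosquitto clients installed, something like this should stream the device log over MQTT (a sketch: it assumes MQTT logging is enabled as in our config and that the topic prefix is the default, i.e. the device name):

mosquitto_sub -h mqtt-ip -u mqtt-username -P mqtt-password -t 'example/debug' -v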

LED light switch

OK, so rebooting the device is cute. Now what if we want to add something more useful for home automation? Well that requires some soldering or breadboard action, but what we can do easily is use the built-in LED on the device as a light and control it through Home Assistant.

On the ESP32 module, the built-in LED is connected to GPIO pin 2. We will first define that pin as an output component using the ESP32 LEDC platform (supports PWM). We then attach a light component using the monochromatic platform to that output component. Let’s add those two things to our config!

output:
  # Built-in LED on the ESP32
  - platform: ledc
    pin: 2
    id: output_ledpin2

light:
  # Light created from built-in LED output
  - platform: monochromatic
    name: "Example LED"
    output: output_ledpin2

Build and flash the new firmware again.

esphome example.yaml run

After the device reboots, you should now be able to see the new Example LED automatically in Home Assistant.

Example device page in Home Assistant showing new LED light

If we toggle this light a few times, we can see the built-in LED on the ESP device fading in and out at the same time.

Other components

As mentioned previously, there are many devices we can easily add to a single board like relays, PIR sensors, temperature and humidity sensors, reed switches and more.

Reed switch, relay, PIR, temperature and humidity sensor (from top to bottom, left to right)

All we need to do is connect them up to appropriate GPIO pins and define them in the YAML.

PIR sensor

A PIR sensor connects to ground and 3-5V, with data connecting to a GPIO pin (let’s use 34 in the example). We read the GPIO pin and can tell when motion is detected because the sensor drives its data pin high. Under ESPHome we can use the binary_sensor component with the gpio platform. If needed, pulling the pin down is easy; just set the pin mode. Finally, we set the class of the device to motion, which will set the appropriate icon in Home Assistant. It’s as simple as adding this to the bottom of your YAML file.

binary_sensor:
  - platform: gpio
    pin:
      number: 34
      mode: INPUT_PULLDOWN
    name: "Example PIR"
    device_class: motion

Again, compile and flash the firmware with esphome.

esphome example.yaml run

As before, after the device reboots again we should see the new PIR device appear in Home Assistant.

Example device page in Home Assistant showing new PIR input

Temperature and humidity sensor

Let’s do another example, a DHT22 temperature sensor connected to GPIO pin 16. Simply add this to the bottom of your YAML file.

sensor:
  - platform: dht
    pin: 16
    model: DHT22
    temperature:
      name: "Example Temperature"
    humidity:
      name: "Example Humidity"
    update_interval: 10s

Compile and flash.

esphome example.yaml run

After it reboots, you should see the new temperature and humidity inputs under devices in Home Assistant. Magic!

Example device page in Home Assistant showing new temperature and humidity inputs

Garage opener using templates and logic on the device

Hopefully you can see just how easy it is to add things to your ESP device and have them show up in Home Assistant. Sometimes though, you need to make things a little more tricky. Take opening a garage door for example, which only has one button to start and stop the motor in turn. To emulate pressing the garage opener, you need to apply voltage to the opener’s push button input for a short while and then turn it off again. We can do all of this easily on the device with ESPHome and present a single button to Home Assistant.

Let’s assume we have a relay connected up to a garage door opener’s push button (PB) input. The relay control pin is connected to our ESP32 on GPIO pin 22.

ESP32 device with relay module, connected to garage opener inputs

We need to add a couple of devices to the ESP module and then expose only the button out to Home Assistant. Note that the relay only has an id, so it is local only and not presented to Home Assistant. However, the template switch which uses the relay has a name, and it has an action which turns the relay on and then off, emulating a button press.

Remember we already added a switch component for the restart platform? Now we need to add the new platform devices to that same section (don’t create a second switch entry).

switch:
  - platform: restart
    name: "Example Device Restart"

  # The relay control pin (local only)
  - platform: gpio
    pin: GPIO22
    id: switch_relay

  # The button to emulate a button press, uses the relay
  - platform: template
    name: "Example Garage Door Remote"
    icon: "mdi:garage"
    turn_on_action:
    - switch.turn_on: switch_relay
    - delay: 500ms
    - switch.turn_off: switch_relay

Compile and flash again.

esphome example.yaml run

After the device reboots, we should now see the new Garage Door Remote in the UI.

Example device page in Home Assistant showing new garage remote inputs

If you actually cabled this up and toggled the button in Home Assistant, the UI button would turn on and you would hear the relay click on, then off, then the UI button would go back to the off state. Pretty neat!

There are many other things you can do with ESPHome, but this is just a taste.

Commit your config to Git

Once you have a device to your liking, commit it to Git. This way you can track the changes you’ve made and can always go back to a working config.

git add example.yaml
git commit -m "adding my first example config"

Of course it’s probably a good idea to push your Git repo somewhere remote, perhaps even share your configs with others!

Creating automation in Home Assistant

Of course once you have all these devices it’s great to be able to use them in Home Assistant, but ultimately the point of it all is to automate the home. Thus, you can use Home Assistant to set up scripts and react to things that happen. That’s beyond the scope of this particular post though, as I really wanted to introduce ESPHome and show how you can easily manage devices and integrate them with Home Assistant. There is pretty good documentation online though. Enjoy!
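
Just to give a flavour though, here is a minimal sketch of an automation that turns on our example LED whenever the example PIR detects motion. The entity IDs are assumptions based on the names we gave the devices above, so check yours under Developer Tools -> States (this would go in automations.yaml, or under automation: in configuration.yaml).

# Sketch: motion on the example PIR turns on the example LED
- alias: "Example motion light"
  trigger:
    - platform: state
      entity_id: binary_sensor.example_pir
      to: "on"
  action:
    - service: light.turn_on
      entity_id: light.example_led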

Overriding PlatformIO

As a final note, if you need to override something from PlatformIO, for example specifying a specific version of a dependency, you can do that by creating a modified platformio.ini file in your configs dir (copy from one of your build dirs and modify as needed). This way esphome will pick it up and apply it for you automatically.
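
As a rough sketch only (in practice you would copy the generated file from your build dir, e.g. ./builds/example/platformio.ini, and tweak it), an override might look something like this, with lib_deps used to pin a hypothetical dependency version:

; platformio.ini (sketch) – copied from the build dir and modified
[env:example]
platform = espressif32
board = mhetesp32minikit
framework = arduino
lib_deps =
    SomeLibrary@1.2.3  ; hypothetical pinned dependency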

Links March 2020

Rolling Stone has an insightful article about why the Christian Right supports Trump and won’t stop supporting him no matter what he does [1].

Interesting article about Data Oriented Architecture [2].

Quarantine Will normalise WFH and Recession will Denormalise Jobs [3]. I guess we can always hope that after a disaster we can learn to do things better than before.

Tyre wear is worse than exhaust for small particulate matter [4]. We need better tyres and legal controls over such things.

Scott Santens wrote an insightful article about the need for democracy and unconditional basic income [5]. “In ancient Greece, work was regarded as a curse” is an extreme position but strongly supported by evidence. ‘In his essay “In Praise of Idleness,” Bertrand Russell wrote “Modern methods of production have given us the possibility of ease and security for all; we have chosen, instead, to have overwork for some and starvation for others. Hitherto we have continued to be as energetic as we were before there were machines; in this we have been foolish, but there is no reason to go on being foolish forever.”‘

Cory Doctorow wrote an insightful article for Locus titled A Lever Without a Fulcrum Is Just a Stick about expansions to copyright laws [6]. One of his analogies is that giving a bullied kid more lunch money just allows the bullies to steal more money, with artists being bullied kids and lunch money being the rights that are granted under copyright law. The proposed solution includes changes to labor and contract law, presumably Cory will write other articles in future giving the details of his ideas in this regard.

The Register has an amusing article about the trial of a former CIA employee accused of being the “vault 7 leaker” [7]. Both the prosecution and the defence are building their cases around the defendant being a jerk. The article exposes poor security and poor hiring practices in the CIA.

CNN has an informative article about Finland’s war on fake news [8]. As Finland has long standing disputes with Russia they have had more practice at dealing with fake news than most countries.

The Times of Israel has an interesting article about how the UK used German Jews to spy on German prisoners of war [9].

Cory Doctorow wrote an insightful article “Data is the New Toxic Waste” about how collecting personal data isn’t an asset, it’s a liability [10].

Ulrike Uhlig wrote an insightful article about “Control Freaks”, analysing the different meanings of control, both positive and negative [11].

538 has an informative article about the value of statistical life [12]. It’s about $9M per person in the US, which means a mind-boggling amount of money should be spent to save the millions of lives that will be potentially lost in a natural disaster (like Coronavirus).

NPR has an interesting interview about Crypto AG, the Swiss crypto company owned by the CIA [13]. I first learned of this years ago, it’s not new, but I still learned a lot from this interview.

March 30, 2020

Resolving mDNS across VLANs with Avahi on OpenWRT

mDNS, or multicast DNS, is a way to discover devices on your network under the .local domain without any central DNS configuration (part of zeroconf networking, also known as Bonjour in the Apple world). Fedora Magazine has a good article on setting it up in Fedora, which I won’t repeat here.

If you’re like me, you’re using OpenWRT with multiple VLANs to separate networks. In my case this includes my home automation (HA) network (VLAN 2) from my regular trusted LAN (VLAN 1). Various untrusted home automation products, as well as my own devices, go into the HA network (more on that in a later post).

In my setup, my OpenWRT router acts as my central router, connecting each of my networks and controlling access. My LAN can access everything in my HA network, but generally only established and related TCP traffic is allowed back from HA to LAN. There are some exceptions though, for example my Pi-hole DNS servers which are accessible from all networks, but otherwise that’s the general setup.

With IPv4, mDNS communicates by sending IP multicast UDP packets to 224.0.0.251 with source and destination ports both using 5353. In order to receive requests and responses, your devices need to be running an mDNS service and also allow incoming UDP traffic on port 5353.
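
On a Fedora machine, for instance, opening that up is usually just a matter of allowing the built-in mdns service (a sketch, assuming firewalld is in use):

sudo firewall-cmd --permanent --add-service=mdns
sudo firewall-cmd --reload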

As multicast is local only, mDNS doesn’t work natively across routed networks. Therefore, this prevents me from easily talking to my various HA devices from my LAN. In order to support mDNS across routed networks, you need a proxy in the middle to transparently send requests and responses back and forth. There are a few different options for a proxy, such as igmpproxy, but I prefer to use the standard Avahi server on my OpenWRT router.

Keep in mind that doing this will also mean that any device in your untrusted networks will be able to send mDNS requests into your trusted networks. We could stop the mDNS requests with an application layer firewall (which iptables is not), or perhaps with connection tracking, but we’ll leave that for another day. Even if untrusted devices discover addresses in LAN, the firewall is stopping them from actually communicating (at least on my setup).

Set up Avahi

Log onto your OpenWRT router and install Avahi.

opkg update
opkg install avahi-daemon

There is really only one thing that must be set in the config, and that is to enable reflector (proxy) support. This goes under the [reflector] section and looks like this.

[reflector]
enable-reflector=yes

While technically not required, you can also set which interfaces to listen on. By default it will listen on all networks, which includes WAN and other VLANs, so I prefer to limit this just to the two networks I need.

On my router, my LAN is the br-lan device and my home automation network on VLAN 2 is the eth1.2 device. Your LAN is probably the same, but your other networks will most likely be different. You can find these in your router’s Luci web interface under Network -> Interfaces. The interfaces option goes under the [server] section and looks like this.

[server]
allow-interfaces=br-lan,eth1.2

Now we can start and enable the service!

/etc/init.d/avahi-daemon start
/etc/init.d/avahi-daemon enable

OK that’s all we need to do for Avahi. It is now configured to listen on both LAN and HA interfaces and act as a proxy back and forth.

Firewall rules

As mentioned above, devices need to have incoming UDP port 5353 open. In order for our router to act as a proxy, we must enable this on both LAN and HA network interfaces (we’ll just configure for all interfaces). As mDNS multicasts to a specific address with source and destination ports both using 5353, we can lock this rule down a bit more.

Log onto your firewall Luci web interface and go to Network -> Firewall -> Traffic Rules tab. Under Open ports on router add a new rule for mDNS. This will be for UDP on port 5353.

Find the new rule in the list and edit it so we can customise it further. We can set the source to be any zone, the source port to be 5353, the destination zone to be the Device (input), and the destination address and port to 224.0.0.251 and 5353. Finally, the action should be set to accept. If you prefer to not allow all interfaces, then create two rules instead and restrict the source zone for one to LAN and to your untrusted network for the other. Hit Save & Apply to make the rule!
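
If you prefer the command line to Luci, a sketch of that same any-zone rule using uci looks like this:

uci add firewall rule
uci set firewall.@rule[-1].name='Allow-mDNS'
uci set firewall.@rule[-1].src='*'
uci set firewall.@rule[-1].proto='udp'
uci set firewall.@rule[-1].src_port='5353'
uci set firewall.@rule[-1].dest_ip='224.0.0.251'
uci set firewall.@rule[-1].dest_port='5353'
uci set firewall.@rule[-1].target='ACCEPT'
uci commit firewall
/etc/init.d/firewall restart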

We should now be able to resolve mDNS from LAN into the untrusted network.

Testing

To test it, ensure your Fedora computer is configured for mDNS and can resolve its own .local name. Now, try and ping a device in your untrusted network. For me, this will be study.local which is one of my home automation devices in my study (funnily enough).

ping study.local

When my computer in LAN tries to discover the device running in the study, the communication flow looks like this.

  • My computer (192.168.0.125) on LAN tries to ping study.local but needs to resolve it.
  • My computer sends out the mDNS UDP multicast to 224.0.0.251:5353 on the LAN, requesting address of study.local.
  • My router (192.168.0.1) picks up the request on LAN and sends same multicast request out on HA network (10.0.0.1).
  • The study device on HA network picks up the request and multicasts the reply of 10.0.0.202 back to 224.0.0.251:5353 on the HA network.
  • My router picks up the reply on HA network and re-casts it on LAN.
  • My computer picks up the reply on LAN and thus learns the address of the study device on HA network.
  • My computer successfully pings study.local at 10.0.0.202 from LAN by routing through my router to HA network.

This is what a packet capture looks like.

16:38:12.489582 IP 192.168.0.125.5353 > 224.0.0.251.5353: 0 A (QM)? study.local. (35)
16:38:12.489820 IP 10.0.0.1.5353 > 224.0.0.251.5353: 0 A (QM)? study.local. (35)
16:38:12.696894 IP 10.0.0.202.5353 > 224.0.0.251.5353: 0*- [0q] 1/0/0 (Cache flush) A 10.0.0.202 (45)
16:38:12.697037 IP 192.168.0.1.5353 > 224.0.0.251.5353: 0*- [0q] 1/0/0 (Cache flush) A 10.0.0.202 (45)

And that’s it! Now we can use mDNS to resolve devices in an untrusted network from a trusted network with zeroconf.

March 28, 2020

How to get a direct WebRTC connections between two computers

WebRTC is a standard real-time communication protocol built directly into modern web browsers. It enables the creation of video conferencing services which do not require participants to download additional software. Many services make use of it and it almost always works out of the box.

The reason it just works is that it uses a protocol called ICE to establish a connection regardless of the network environment. What that means however is that in some cases, your video/audio connection will need to be relayed (using end-to-end encryption) to the other person via a third-party TURN server. In addition to adding extra network latency to your call, that relay server might be overloaded at some point and drop or delay packets coming through.

Here's how to tell whether or not your WebRTC calls are being relayed, and how to ensure you get a direct connection to the other host.

Testing basic WebRTC functionality

Before you place a real call, I suggest using the official test page which will test your camera, microphone and network connectivity.

Note that this test page makes use of a Google TURN server which is locked to particular HTTP referrers and so you'll need to disable privacy features that might interfere with this:

  • Brave: Disable Shields entirely for that page (Simple view) or allow all cookies for that page (Advanced view).

  • Firefox: Ensure that network.http.referer.spoofSource is set to false in about:config, which it is by default.

  • uMatrix: The "Spoof Referer header" option needs to be turned off for that site.

Checking the type of peer connection you have

Once you know that WebRTC is working in your browser, it's time to establish a connection and look at the network configuration that the two peers agreed on.

My favorite service at the moment is Whereby (formerly Appear.in), so I'm going to use that to connect from two different computers:

  • canada is a laptop behind a regular home router without any port forwarding.
  • siberia is a desktop computer in a remote location that is also behind a home router, but in this case its internal IP address (192.168.1.2) is set as the DMZ host.

Chromium

For all Chromium-based browsers, such as Brave, Chrome, Edge, Opera and Vivaldi, the debugging page you'll need to open is called chrome://webrtc-internals.

Look for RTCIceCandidatePair lines and expand them one at a time until you find the one which says:

  • state: succeeded (or state: in-progress)
  • nominated: true
  • writable: true

Then from the name of that pair (N6cxxnrr_OEpeash in the above example) find the two matching RTCIceCandidate lines (one local-candidate and one remote-candidate) and expand them.

In the case of a direct connection, I saw the following on the remote-candidate:

  • ip shows the external IP address of siberia
  • port shows a random number between 1024 and 65535
  • candidateType: srflx

and the following on local-candidate:

  • ip shows the external IP address of canada
  • port shows a random number between 1024 and 65535
  • candidateType: prflx

These candidate types indicate that a STUN server was used to determine the public-facing IP address and port for each computer, but the actual connection between the peers is direct.

On the other hand, for a relayed/proxied connection, I saw the following on the remote-candidate side:

  • ip shows an IP address belonging to the TURN server
  • candidateType: relay

and the same information as before on the local-candidate.

Firefox

If you are using Firefox, the debugging page you want to look at is about:webrtc.

Expand the top entry under "Session Statistics" and look for the line (should be the first one) which says the following in green:

  • ICE State: succeeded
  • Nominated: true
  • Selected: true

then look in the "Local Candidate" and "Remote Candidate" sections to find the candidate type in brackets.

Firewall ports to open to avoid using a relay

In order to get a direct connection to the other WebRTC peer, one of the two computers (in my case, siberia) needs to open all inbound UDP ports since there doesn't appear to be a way to restrict Chromium or Firefox to a smaller port range for incoming WebRTC connections.

This isn't great and so I decided to tighten that up in two ways by:

  • restricting incoming UDP traffic to the IP range of siberia's ISP, and
  • explicitly denying incoming traffic to the UDP ports I know are open on siberia.

To get the IP range, start with the external IP address of the machine (I'll use the IP address of my blog in this example: 66.228.46.55) and pass it to the whois command:

$ whois 66.228.46.55 | grep CIDR
CIDR:           66.228.32.0/19

To get the list of open UDP ports on siberia, I sshed into it and ran nmap:

$ sudo nmap -sU localhost

Starting Nmap 7.60 ( https://nmap.org ) at 2020-03-28 15:55 PDT
Nmap scan report for localhost (127.0.0.1)
Host is up (0.000015s latency).
Not shown: 994 closed ports
PORT      STATE         SERVICE
631/udp   open|filtered ipp
5060/udp  open|filtered sip
5353/udp  open          zeroconf

Nmap done: 1 IP address (1 host up) scanned in 190.25 seconds

I ended up with the following in my /etc/network/iptables.up.rules (ports below 1024 are denied by the default rule and don't need to be included here):

# Deny all known-open high UDP ports before enabling WebRTC for canada
-A INPUT -p udp --dport 5060 -j DROP
-A INPUT -p udp --dport 5353 -j DROP
-A INPUT -s 66.228.32.0/19 -p udp --dport 1024:65535 -j ACCEPT

March 25, 2020

Updating OpenStack TripleO Ceph nodes safely one at a time

Part of the process when updating Red Hat’s TripleO based OpenStack is to apply the package and container updates, via the update run step, to the nodes in each Role (like Controller, CephStorage and Compute, etc). This is done in-place, before the ceph-upgrade (ceph-ansible) step, converge step and reboots.

openstack overcloud update run --nodes CephStorage

Rather than do an entire Role straight up however, I always update one node of that type first. This lets me make sure there are no problems (and fix them if there are), before moving onto the whole Role.

I noticed recently when performing the update step on CephStorage role nodes that OSDs and OSD nodes were going down in the cluster. This was then causing my Ceph cluster to go into backfilling and recovering (norebalance was set).

We want all of these nodes to be done one at a time, as taking more than one node out at a time can potentially make the Ceph cluster stop serving data (all VMs will freeze) until it finishes and gets the minimum number of copies in the cluster. If all three copies of data go offline at the same time, it’s not going to be able to recover.

My concern was that the update step does not check the status of the cluster; it just goes ahead and updates each node one by one (the separate ceph update run step does check the state). If the Ceph nodes are updated faster than the cluster can fix itself, we might end up with multiple nodes going offline and hitting the issues mentioned above.

So to work around this I just ran this simple bash loop. It gets a list of all the Ceph Storage nodes and, before updating each one in turn, checks that the status of the cluster is HEALTH_OK before proceeding. This would not be possible if we updated by Role instead.

source ~/stackrc
# Update each Ceph Storage node one at a time, only proceeding while the
# cluster reports HEALTH_OK (note double quotes so ${node} expands).
for node in $(openstack server list -f value -c Name |grep ceph-storage |sort -V); do
  while [[ ! "$(ssh -q controller-0 'sudo ceph -s |grep health:')" =~ "HEALTH_OK" ]] ; do
    echo "cluster not healthy, sleeping before updating ${node}"
    sleep 5
  done
  echo "cluster healthy, updating ${node}"
  openstack overcloud update run --nodes "${node}" || { echo "failed to update ${node}, exiting"; exit 1 ;}
  echo "updated ${node} successfully"
done

I’m not sure if the cluster going down like that is expected behaviour, but I opened a bugzilla for it.

March 22, 2020

My POWER9 CPU Core Layout

So, following on from my post on Sensors on the Blackbird (and thus Power9), I mentioned that when you look at the temperature sensors for each CPU core in my 8-core POWER9 chip, they’re not linear numbers. Let’s look at what that means….

stewart@blackbird9$ sudo ipmitool sensor | grep core
 p0_core0_temp            | na                                                                                                               
 p0_core1_temp            | na                                                                                                               
 p0_core2_temp            | na                                                                                                               
 p0_core3_temp            | 38.000                                                                                                           
 p0_core4_temp            | na          
 p0_core5_temp            | 38.000      
 p0_core6_temp            | na          
 p0_core7_temp            | 38.000      
 p0_core8_temp            | na          
 p0_core9_temp            | na          
 p0_core10_temp           | na          
 p0_core11_temp           | 37.000      
 p0_core12_temp           | na          
 p0_core13_temp           | na          
 p0_core14_temp           | na          
 p0_core15_temp           | 37.000      
 p0_core16_temp           | na          
 p0_core17_temp           | 37.000      
 p0_core18_temp           | na          
 p0_core19_temp           | 39.000      
 p0_core20_temp           | na          
 p0_core21_temp           | 39.000      
 p0_core22_temp           | na          
 p0_core23_temp           | na        

You can see I have eight CPU cores in my Blackbird system. The reason the 8 CPU cores are core 3, 5, 7, 11, 15, 17, 19, and 21 rather than 0-8 or something is that these represent the core numbers on the physical die, and the die is a 24 core die. When you’re making a chip as big and as complex as modern high performance CPUs, not all of the chips coming out of your fab are going to be perfect, so this is how you get different models in the line with only one production line.

Weirdly, the output from the hwmon sensors numbers things differently, which is why there’s a “core 24” and a “core 28”. That’s just… wrong. What it is, however, is right if you think of 8*4=32. This is a product of Linux thinking that thread=core in some ways. So, yeah, this numbering is the first thread of each logical core.
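
A quick way to sanity check that mapping (assuming lscpu and the ppc64_cpu tool from powerpc-utils are installed) is:

$ lscpu | grep -E 'Thread|Core|Socket'
$ ppc64_cpu --info

On this system that should report 4 threads per core, which is where the 8*4=32 numbering comes from.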

[stewart@blackbird9 ~]$ sensors|grep -i core
 Chip 0 Core 0:            +39.0°C  (lowest = +25.0°C, highest = +71.0°C)
 Chip 0 Core 4:            +39.0°C  (lowest = +26.0°C, highest = +66.0°C)
 Chip 0 Core 8:            +39.0°C  (lowest = +27.0°C, highest = +67.0°C)
 Chip 0 Core 12:           +39.0°C  (lowest = +26.0°C, highest = +67.0°C)
 Chip 0 Core 16:           +39.0°C  (lowest = +25.0°C, highest = +67.0°C)
 Chip 0 Core 20:           +39.0°C  (lowest = +26.0°C, highest = +69.0°C)
 Chip 0 Core 24:           +39.0°C  (lowest = +27.0°C, highest = +67.0°C)
 Chip 0 Core 28:           +39.0°C  (lowest = +27.0°C, highest = +64.0°C)

But let’s ignore that and go from the IPMI sensors, which also match what the OCC shows with “occtoolp9 -SL” (see below).

$ ./occtoolp9 -SL
Sensor Details: (found 86 sensors, details only for Status of 0x00)                                           
     GUID Name             Sample     Min    Max U    Stat   Accum     UpdFreq   ScaleFactr   Loc   Type 
....
   0x00ED TEMPC03………     47      29     47 C    0x00 0x00037CF2 0x00007D00 0x00000100 0x0040 0x0008
   0x00EF TEMPC05………     37      26     39 C    0x00 0x00014E53 0x00007D00 0x00000100 0x0040 0x0008
   0x00F1 TEMPC07………     46      28     46 C    0x00 0x0001A777 0x00007D00 0x00000100 0x0040 0x0008
   0x00F5 TEMPC11………     44      27     45 C    0x00 0x00018402 0x00007D00 0x00000100 0x0040 0x0008
   0x00F9 TEMPC15………     36      25     43 C    0x00 0x000183BC 0x00007D00 0x00000100 0x0040 0x0008
   0x00FB TEMPC17………     38      28     41 C    0x00 0x00015474 0x00007D00 0x00000100 0x0040 0x0008
   0x00FD TEMPC19………     43      27     44 C    0x00 0x00016589 0x00007D00 0x00000100 0x0040 0x0008
   0x00FF TEMPC21………     36      30     40 C    0x00 0x00015CA9 0x00007D00 0x00000100 0x0040 0x0008

So what does that mean for physical layout? Well, like all modern high performance chips, the POWER9 is modular, with a bunch of logic being replicated all over the die. The most notable duplicated parts are the core (replicated 24 times!) and cache structures. Less so are memory controllers and PCI hardware.

P9 chip layout from page 31 of the POWER9 Register Specification

See that each core (e.g. EC00 and EC01) is paired with the cache block (EC00 and EC01 with EP00). That’s two POWER9 cores with one 512KB L2 cache and one 10MB L3 cache.

You can see the cache layout (including L1 Instruction and Data caches) by looking in sysfs:

$ for i in /sys/devices/system/cpu/cpu0/cache/index*/; \
  do echo -n $(cat $i/level) $(cat $i/size) $(cat $i/type); \
  echo; done
 1 32K Data
 1 32K Instruction
 2 512K Unified
 3 10240K Unified

So, what does the layout of my POWER9 chip look like? Well, thanks to the power of graphics software, we can cross some cores out and look at the topology:

My 8-core POWER9 CPU in my Raptor Blackbird

If I run some memory bandwidth benchmarks, I can see the L3 cache capacity you’d assume from the above diagram: 80MB (10MB/core). Let’s see:

[stewart@blackbird9 lmbench3]$ for i in 5M 10M 20M 30M 40M 50M 60M 70M 80M 500M; \
  do echo -n "$i   "; \
  ./bin/bw_mem -N 100  $i rd; \
done
  5M    5.24 63971.98
 10M   10.49 31940.14
 20M   20.97 17620.16
 30M   31.46 18540.64
 40M   41.94 18831.06
 50M   52.43 17372.03
 60M   62.91 16072.18
 70M   73.40 14873.42
 80M   83.89 14150.82
 500M 524.29 14421.35

If all the cores were packed together, I’d expect that cliff to be a lot sooner.

So how does this compare to other machines I have around? Well, let’s look at my Ryzen 7. Specifically, an “AMD Ryzen 7 1700 Eight-Core Processor”. The cache layout is:

$ for i in /sys/devices/system/cpu/cpu0/cache/index*/; \
  do echo -n $(cat $i/level) $(cat $i/size) $(cat $i/type); \
  echo; \
done
 1 32K Data
 1 64K Instruction
 2 512K Unified
 3 8192K Unified

And then the performance benchmark similar to the one I ran above on the POWER9 (lower numbers down low as 8MB is less than 10MB)

$ for i in 4M 8M 16M 24M 32M 40M 48M 56M 64M 72M 80M 500M; \
  do echo -n "$i   "; ./bin/x86_64-linux-gnu/bw_mem -N 10  $i rd;\
done
  4M    4.19 61111.04
  8M    8.39 28596.55
 16M   16.78 21415.12
 24M   25.17 20153.57
 32M   33.55 20448.20
 40M   41.94 20940.11
 48M   50.33 20281.39
 56M   58.72 21600.24
 64M   67.11 21284.13
 72M   75.50 20596.18
 80M   83.89 20802.40
 500M 524.29 21489.27

And my laptop? It’s a four core part, specifically a “Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz” with a cache layout like:

$ for i in /sys/devices/system/cpu/cpu0/cache/index*/; \
   do echo -n $(cat $i/level) $(cat $i/size) $(cat $i/type); \
     echo; \
   done
   1 32K Data
   1 32K Instruction
   2 256K Unified
   3 6144K Unified 
$ for i in 3M 6M 12M 18M 24M 30M 36M 42M 500M; \
  do echo -n "$i   "; ./bin/x86_64-linux-gnu/bw_mem -N 10  $i rd;\
done
  3M    3.15 48500.24
  6M    6.29 27144.16
 12M   12.58 18731.80
 18M   18.87 17757.74
 24M   25.17 17154.12
 30M   31.46 17135.87
 36M   37.75 16899.75
 42M   44.04 16865.44
 500M 524.29 16817.10

I’m not sure what performance conclusions we can realistically draw from these curves, apart from “keeping workload to L3 cache is cool”, and “different chips have different cache hardware”, and “I should probably go and read and remember more about the microarchitectural characteristics of the cache hardware in Ryzen 7 hardware and 10th gen Intel Core hardware”.

Online Teaching

The OpenSTEM® materials are ideally suited to online teaching. In these times of new challenges and requirements, there are a lot of technological possibilities. Schools and teachers are increasingly being asked to deliver material online to students. Our materials can assist with that process, especially for Humanities and Science subjects from Prep/Kindy/Foundation to Year 6. […]

Covid 19 Numbers – lag

Recording some thoughts about Covid 19 numbers.

Today’s figures

The Government says:

“As at 6.30am on 22 March 2020, there have been 1,098 confirmed cases of COVID-19 in Australia”.

The reference is https://www.health.gov.au/news/health-alerts/novel-coronavirus-2019-ncov-health-alert/coronavirus-covid-19-current-situation-and-case-numbers. However, that page is updated daily (ish), so don’t expect it to be the same if you check the reference.

Estimating Lag

If a person tests positive to the virus today, that means they were infected at some time in the past. So, what is the lag between infection and a positive test result?

Incubation Lag – about 5 days

When you are infected you don’t show symptoms immediately. Rather, there’s an incubation period before symptoms become apparent.  The time between being infected and developing symptoms varies from person to person, but most of the time a person shows symptoms after about 5 days (I recall seeing somewhere that 1 in a 1000 cases will develop symptoms after 14 days).

Presentation Lag – about 2 days

I think it’s fair to also assume that people are not presenting at testing immediately they become ill. It is probably taking them a couple of days from developing symptoms to actually get to the doctor – I read a story somewhere (have since lost the reference) about a young man who went to a party, then felt bad for days but didn’t go for a test until someone else from the party had returned a positive test.  Let’s assume there’s a mix of worried well and stoic types and call it 2 days from becoming symptomatic to seeking a test.

Referral Lag – about a day

Assuming that a GP is available straight away and recommends a test immediately, logistically there will still be most of a day taken up between deciding to see a doctor and having a test carried out.

Testing lag – about 2 days

The graph of infections “epi graph” today looks like this:

New and cumulative COVID-19 cases in Australia by notification date, 22 March 2020

One thing you notice about the graph is that the new cases bars seem to increase for a couple of days, then decrease – so about 100 new cases in the last 24 hours, but almost 200 in the 24 hours before that. From the graph, the last 3 “dips” have been today (Sunday), last Thursday and last Sunday.  This seems to be happening every 3 to 4 days. I initially thought that the dips might mean fewer (or more) people presenting over weekends, but the period is inconsistent with that. I suspect, instead, that this actually means that testing is being batched.

That would mean that neither the peaks nor the troughs are representative of infection surges/retreats; they simply reflect when tests are being processed. This seems to be a 4 day cycle, so, on average, it would be about 2 days between having the test conducted and receiving a result. So a confirmed case count published today is actually showing confirmed cases as at about 2 days earlier.

Total lag

From the date someone is infected to the time that they receive a positive confirmation is about:

lag = time for symptoms to show+time to seek a test+referral time + time for the test to return a result

So, the published figures on confirmed infections are probably lagging actual infections in the community by about 10 days (5+2+1+2).

If there’s about a 10 day lag between infection and confirmation, then what a figure published today says is that about a week and a half ago there were about this many cases in the community.  So, the 22 March figure of 1098 infections is actually really a 12 March figure.

What the lag means for Physical (ie Social) Distancing

The main thing that the lag means is that if we were able to wave a magic wand today and stop all further infections, we would continue to record new infections for about 10 days (and the tail for longer). In practical terms, implementing physical distancing measures will not show any effect on new cases for about a week and a half. That’s because today there are infected people who are yet to be tested.

The silver lining to that is that the physical distancing measures that have been gaining prominence since 15 March should start to show up in the daily case numbers from the middle of the coming week, possibly offset by overseas entrants rushing to make the 20 March entry deadline.

Estimating Actual Infections as at Today

How many people are infected, but unconfirmed as at today? To estimate actual infections you’d need to have some idea of the rate at which infections are increasing. For example, if infections increased by 10% per day for 10 days, then you’d multiply the most recent figure by 1.1 raised to the power of 10 (ie about 2.5).  Unfortunately, the daily rate of increase (see table on the wiki page) has varied a fair bit (from 20% to 27%) over the most recent 10 days of data (that is, over the 10 days prior to 12 March, since the 22 March figures roughly correspond to 12 March infections) and there’s no guarantee that since that time the daily increase in infections will have remained stable, particularly in light of the implementation of physical distancing measures. At 23.5% per day, the factor is about 8.
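
The arithmetic is easy to sanity check, for example:

$ python3 -c "print(round(1.1**10, 2), round(1.235**10, 2))"
2.59 8.25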

There aren’t any reliable figures we can use to estimate the rate of infection during the current lag period (ie from 12 March to 22 March). This is because the vast majority of cases have not been from unexplained community transmission. Most of the cases are from people who have been overseas in the previous fortnight and they’re the cohort that has been most significantly impacted by recent physical distancing measures. From 15 March, they have been required to self isolate and from 20 March most of their entry into the country has stopped.  So I’d expect a surge in numbers up to about 30 March – ie reflecting infections in the cohort of people rushing to get into the country before the borders closed followed by a flattening. With the lag factor above, you’ll need to wait until 1 April or thereabouts to know for sure.

Note:

This post is just about accounting for the time lag between becoming infected and receiving a positive test result. It assumes, for example, that everyone who is infected seeks a test, and that everyone who is infected and seeks a test is, in fact, tested. As at today, neither of these things is true.

OCC and Sensors on the Raptor Blackbird (and other POWER9 systems)

In this post we’re going to look at three different ways to read the various sensors in the Raptor Blackbird system. The Blackbird is a single socket uATX board for the POWER9 processor. One advantage of the system is completely open source firmware, so you can (like I have) build your own firmware. So, this is my Blackbird running my most recent firmware build (the BMC is running the 2.00 release from Raptor).

Sensors over IPMI

One way to get the sensors is over IPMI. This can be done either in-band (as in, from the OS running on the blackbird), or over the network.

stewart@blackbird9$ sudo ipmitool sensor |head
occ                      | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
 occ0                     | 0x0        | discrete   | 0x0200| na        | na        | na        | na        | na        | na        
 occ1                     | 0x0        | discrete   | 0x0100| na        | na        | na        | na        | na        | na        
 p0_core0_temp            | na         |            | na    | na        | na        | na        | na        | na        | na        
 p0_core1_temp            | na         |            | na    | na        | na        | na        | na        | na        | na        
 p0_core2_temp            | na         |            | na    | na        | na        | na        | na        | na        | na        
 p0_core3_temp            | 38.000     | degrees C  | ok    | na        | -40.000   | na        | 78.000    | 90.000    | na        
 p0_core4_temp            | na         |            | na    | na        | na        | na        | na        | na        | na        
 p0_core5_temp            | 38.000     | degrees C  | ok    | na        | -40.000   | na        | 78.000    | 90.000    | na        
 p0_core6_temp            | na         |            | na    | na        | na        | na        | na        | na        | na    

It’s kind of annoying to read there, so standard unix tools to the rescue!

stewart@blackbird9$ sudo ipmitool sensor | cut -d '|' -f 1,2
 occ                      | na                                                                                                               
 occ0                     | 0x0                                                                                                              
 occ1                     | 0x0                                                                                                              
 p0_core0_temp            | na                                                                                                               
 p0_core1_temp            | na                                                                                                               
 p0_core2_temp            | na                                                                                                               
 p0_core3_temp            | 38.000                                                                                                           
 p0_core4_temp            | na          
 p0_core5_temp            | 38.000      
 p0_core6_temp            | na          
 p0_core7_temp            | 38.000      
 p0_core8_temp            | na          
 p0_core9_temp            | na          
 p0_core10_temp           | na          
 p0_core11_temp           | 37.000      
 p0_core12_temp           | na          
 p0_core13_temp           | na          
 p0_core14_temp           | na          
 p0_core15_temp           | 37.000      
 p0_core16_temp           | na          
 p0_core17_temp           | 37.000      
 p0_core18_temp           | na          
 p0_core19_temp           | 39.000      
 p0_core20_temp           | na          
 p0_core21_temp           | 39.000      
 p0_core22_temp           | na          
 p0_core23_temp           | na          
 p0_vdd_temp              | 40.000 
 dimm0_temp               | 35.000      
 dimm1_temp               | na          
 dimm2_temp               | na          
 dimm3_temp               | na          
 dimm4_temp               | 38.000      
 dimm5_temp               | na          
 dimm6_temp               | na          
 dimm7_temp               | na          
 dimm8_temp               | na          
 dimm9_temp               | na          
 dimm10_temp              | na          
 dimm11_temp              | na          
 dimm12_temp              | na          
 dimm13_temp              | na          
 dimm14_temp              | na          
 dimm15_temp              | na          
 fan0                     | 1200.000    
 fan1                     | 1100.000    
 fan2                     | 1000.000    
 p0_power                 | 33.000      
 p0_vdd_power             | 5.000       
 p0_vdn_power             | 9.000       
 cpu_1_ambient            | 30.600      
 pcie                     | 27.000      
 ambient                  | 26.000  

You can see that I have 3 fans, two DIMMs (although why it lists 16 possible DIMMs for a two DIMM slot board is a good question!), and eight CPU cores. More on why the layout of the CPU cores is the way it is in a future post.

The code path for reading these sensors is interesting: it’s all from the BMC, so we’re having the OCC inside the P9 read things, which the BMC then reads, and then passes back to the P9. On the P9 itself, each sensor is a call all the way to firmware and back! In fact, we can look at it in perf:

$ sudo perf record -g ipmitool sensor
$ sudo perf report --no-children
“ipmitool sensors” perf report

What are the 0x300xxxxx addresses? They’re the OPAL firmware (i.e. skiboot). We can look up the symbols easily, as the firmware exposes them to the kernel, which then plonks it in sysfs:

[stewart@blackbird9 ~]$ sudo head /sys/firmware/opal/symbol_map 
[sudo] password for stewart: 
0000000000000000 R __builtin_kernel_end
0000000000000000 R __builtin_kernel_start
0000000000000000 T __head
0000000000000000 T _start
0000000000000010 T fdt_entry
00000000000000f0 t boot_sem
00000000000000f4 t boot_flag
00000000000000f8 T attn_trigger
00000000000000fc T hir_trigger
0000000000000100 t sreset_vector

So we can easily look up exactly where this is:

[stewart@blackbird9 ~]$ sudo grep '18e.. ' /sys/firmware/opal/symbol_map 
 0000000000018e20 t .__try_lock.isra.0
 0000000000018e68 t .add_lock_request

So we’re managing to spend a whole 12% of execution time spinning on a spinlock in firmware! The call stack of what’s going on in firmware isn’t so easy, but we can find the bt_add_ipmi_msg call there which is probably how everything starts:

[stewart@blackbird9 ~]$ sudo grep '516.. ' /sys/firmware/opal/symbol_map
 0000000000051614 t .bt_add_ipmi_msg_head
 0000000000051688 t .bt_add_ipmi_msg
 00000000000516fc t .bt_poll
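
If you want to script that lookup rather than eyeballing grep, a rough Python sketch along these lines works (assuming skiboot is loaded at 0x30000000, which is where those 0x300xxxxx addresses come from; run it as root like the greps above):

OPAL_BASE = 0x30000000  # assumed skiboot load address

def opal_symbol(addr, symbol_map="/sys/firmware/opal/symbol_map"):
    # Find the symbol with the highest offset that is still <= our address.
    offset = addr - OPAL_BASE
    best = None
    with open(symbol_map) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 3:
                continue
            value, name = int(parts[0], 16), parts[2]
            if value <= offset and (best is None or value > best[0]):
                best = (value, name)
    return best

off, name = opal_symbol(0x30018e20)
print(hex(off), name)  # expect 0x18e20 .__try_lock.isra.0, per the grep above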

OCCTOOL

This is the most not-what-you’re-meant-to-use method of getting access to sensors! It’s using a debug tool for the OCC firmware! There’s a variety of tools in the OCC source repository, and one of them (occtoolp9) can be used for a variety of things, one of which is getting sensor data out of the OCC.

$ sudo ./occtoolp9 -SL
     Sensor Type: 0xFFFF
 Sensor Location: 0xFFFF
     (only displaying non-zero sensors)
 Sending 0x53 command to OCC0 (via opal-prd)…
   MFG Sub Cmd: 0x05  (List Sensors)
   Num Sensors: 50
     [ 1] GUID: 0x0000 / AMEintdur…….  Sample:     20  (0x0014)
     [ 2] GUID: 0x0001 / AMESSdur0…….  Sample:      7  (0x0007)
     [ 3] GUID: 0x0002 / AMESSdur1…….  Sample:      3  (0x0003)
     [ 4] GUID: 0x0003 / AMESSdur2…….  Sample:     23  (0x0017)

The odd thing you’ll see is “via opal-prd” – and this is because it’s doing raw calls to the opal-prd binary to talk to the OCC firmware, running things like “opal-prd --expert-mode htmgt-passthru“. Yeah, this isn’t an in-production thing :)

Amazingly (and interestingly), this doesn’t go through host firmware in the way that an IPMI call will. There’s a full OCC/Host firmware interface spec to read. But it’s an insanely inefficient way to monitor sensors: a long bash script shelling out to a whole bunch of other processes… Think ~14.4 billion cycles versus ~367 million cycles for the ipmitool option above.

But there are some interesting sensors at the end of the list:

Sensor Details: (found 86 sensors, details only for Status of 0x00)                                                  
     GUID Name             Sample     Min    Max U    Stat   Accum     UpdFreq   ScaleFactr   Loc   Type   
....
   0x014A MRDM0………..    688       3  15015 GBs  0x00 0x0144AE6C 0x00001901 0x000080FB 0x0008 0x0200
   0x014E MRDM4………..    480       3  14739 GBs  0x00 0x01190930 0x00001901 0x000080FB 0x0008 0x0200
   0x0156 MWRM0………..    560       4  16605 GBs  0x00 0x014C61FD 0x00001901 0x000080FB 0x0008 0x0200
   0x015A MWRM4………..    360       4  16597 GBs  0x00 0x014AE231 0x00001901 0x000080FB 0x0008 0x0200

Is that memory bandwidth? Well, if I run the STREAM benchmark in a loop and look again:

   0x014A MRDM0………..  15165       3  17994 GBs  0x00 0x0C133D6C 0x00001901 0x000080FB 0x0008 0x0200
   0x014E MRDM4………..  17145       3  18016 GBs  0x00 0x0BF501D6 0x00001901 0x000080FB 0x0008 0x0200
   0x0156 MWRM0………..   8063       4  24280 GBs  0x00 0x07C98B88 0x00001901 0x000080FB 0x0008 0x0200
   0x015A MWRM4………..   1138       4  24215 GBs  0x00 0x07CE82AF 0x00001901 0x000080FB 0x0008 0x0200

It looks like it! Are these exposed elsewhere? That’s something for me to look at in a future blog post.

lm-sensors

$ rpm -qf /usr/bin/sensors
 lm_sensors-3.5.0-6.fc31.ppc64le

Ahhh, old faithful lm-sensors! Yep, a whole bunch of sensors are just exposed over the standard interface that we’ve been using since ISA was a thing.

[stewart@blackbird9 ~]$ sensors                                                                  
 ibmpowernv-isa-0000                                       
 Adapter: ISA adapter                                      
 Chip 0 Vdd Remote Sense:  +1.02 V  (lowest =  +0.72 V, highest =  +1.02 V)
 Chip 0 Vdn Remote Sense:  +0.67 V  (lowest =  +0.67 V, highest =  +0.67 V)
 Chip 0 Vdd:               +1.02 V  (lowest =  +0.73 V, highest =  +1.02 V)
 Chip 0 Vdn:               +0.68 V  (lowest =  +0.68 V, highest =  +0.68 V)
 Chip 0 Core 0:            +47.0°C  (lowest = +25.0°C, highest = +71.0°C)            
 Chip 0 Core 4:            +47.0°C  (lowest = +26.0°C, highest = +66.0°C)            
 Chip 0 Core 8:            +48.0°C  (lowest = +27.0°C, highest = +67.0°C)            
 Chip 0 Core 12:           +48.0°C  (lowest = +26.0°C, highest = +67.0°C)            
 Chip 0 Core 16:           +47.0°C  (lowest = +25.0°C, highest = +67.0°C)                      
 Chip 0 Core 20:           +47.0°C  (lowest = +26.0°C, highest = +69.0°C)            
 Chip 0 Core 24:           +48.0°C  (lowest = +27.0°C, highest = +67.0°C)                     
 Chip 0 Core 28:           +51.0°C  (lowest = +27.0°C, highest = +64.0°C)                     
 Chip 0 DIMM 0 :           +40.0°C  (lowest = +34.0°C, highest = +44.0°C)                     
 Chip 0 DIMM 1 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)                     
 Chip 0 DIMM 2 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 3 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 4 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 5 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 6 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 7 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 8 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 9 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 10 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 11 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 12 :          +43.0°C  (lowest = +36.0°C, highest = +47.0°C)
 Chip 0 DIMM 13 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 14 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 15 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 Nest:              +48.0°C  (lowest = +27.0°C, highest = +64.0°C)
 Chip 0 VRM VDD:           +47.0°C  (lowest = +39.0°C, highest = +66.0°C)
 Chip 0 :                  44.00 W  (lowest =  31.00 W, highest = 132.00 W)
 Chip 0 Vdd:               15.00 W  (lowest =   4.00 W, highest = 104.00 W)
 Chip 0 Vdn:               10.00 W  (lowest =   8.00 W, highest =  12.00 W)
 Chip 0 :                 227.11 kJ
 Chip 0 Vdd:               44.80 kJ
 Chip 0 Vdn:               58.80 kJ
 Chip 0 Vdd:              +21.50 A  (lowest =  +6.50 A, highest = +104.75 A)
 Chip 0 Vdn:              +14.88 A  (lowest = +12.63 A, highest = +18.88 A)

The best thing? It’s really quick! The hwmon interface is fast and efficient.
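
If you want the same data without even shelling out to sensors, here’s a small sketch reading the hwmon sysfs files directly (labels and paths will vary with the driver and machine):

import glob, os

# Temperatures are reported in millidegrees Celsius by hwmon.
for temp in sorted(glob.glob("/sys/class/hwmon/hwmon*/temp*_input")):
    label_file = temp.replace("_input", "_label")
    label = open(label_file).read().strip() if os.path.exists(label_file) else temp
    print(f"{label}: {int(open(temp).read()) / 1000:.1f} C")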

March 21, 2020

Using Ansible and dynamic inventory to manage OpenStack TripleO nodes

TripleO based OpenStack deployments use an OpenStack all-in-one node (undercloud) to automate the build and management of the actual cloud (overcloud) using native services such as Heat and Ironic. Roles are used to define services and configuration, which are then applied to specific nodes, for example, Service, Compute and CephStorage, etc.

Although the install is automated, sometimes you need to run adhoc tasks outside of the official update process. For example, you might want to make sure that all hosts are contactable, have a valid subscription (for Red Hat OpenStack Platform), restart containers, or maybe even apply custom changes or patches before an update. Also, during the update process when nodes are being rebooted, it can be useful to use an Ansible script to know when they’ve all come back, services are running and all containers are healthy, before re-enabling them.

Inventory script

To make this easy, we can use the TripleO Ansible inventory script, which queries the undercloud to get a dynamic inventory of the overcloud nodes. When using the script as an inventory source with the ansible command however, you cannot pass arguments to it. If you’re managing a single cluster and using the standard stack name of overcloud, then this is not a problem; you can just call the script directly.

However, as I manage multiple clouds and each has a different Heat stack name, I create a little executable wrapper script to pass the stack name to the inventory script. Then I just call the relevant shell script instead. If you use the undercloud host to manage multiple stacks, then create multiple scripts and modify as required.

cat >> inventory-overcloud.sh << EOF
#!/usr/bin/env bash
source ~/stackrc
exec /usr/bin/tripleo-ansible-inventory --stack stack-name --list
EOF

Make it executable and run it. It should return JSON with your overcloud node details.

chmod u+x inventory-overcloud.sh
./inventory-overcloud.sh

Run simple tasks

The purpose of using the dynamic inventory is to run some Ansible! We can now use it to do simple things easily, like ping nodes to make sure they are online.

ansible \
--inventory inventory-overcloud.sh \
all \
--module-name ping

And of course one of the great things with Ansible is the ability to limit which hosts you’re running against. So for example, to make sure all compute nodes of role type Compute are back, simply replace all with Compute.

ansible \
--inventory inventory-overcloud.sh \
Compute \
--module-name ping

You can also specify nodes individually.

ansible \
--inventory inventory-overcloud.sh \
service-0,telemetry-2,compute-0,compute-1 \
--module-name ping

You can use the shell module to do simple adhoc things, like restart containers or maybe check their health.

ansible \
--inventory inventory-overcloud.sh \
all \
--module-name shell \
--become \
--args "docker ps |egrep "CONTAINER|unhealthy"'

And the same command using short arguments.

ansible \
-i inventory-overcloud.sh \
all \
-m shell \
-ba "docker ps |egrep "CONTAINER|unhealthy"'

Create some Ansible plays

You can see simple tasks are easy, for more complicated tasks you might want to write some plays.

Pre-fetch downloads before update

Your needs will probably vary, but here is a simple example to pre-download updates on my RHEL hosts to save time (updates are actually installed separately via overcloud update process). Note that the download_only option was added in Ansible 2.7 and thus I don’t use the yum module as RHEL uses Ansible 2.6.

cat >> fetch-updates.yaml << EOF
---
- hosts: all
  tasks:
    - name: Fetch package updates
      command: yum update --downloadonly
      register: result_fetch_updates
      retries: 30
      delay: 10
      until: result_fetch_updates is succeeded
      changed_when: '"Total size:" not in result_fetch_updates.stdout'
      args:
        warn: no
EOF

Now we can run this command against the next set of nodes we’re going to update, Compute and Telemetry in this example.

ansible-playbook \
--inventory inventory-overcloud.sh \
--limit Compute,Telemetry \
fetch-updates.yaml

And again, you could specify nodes individually.

ansible-playbook \
--inventory inventory-overcloud.sh \
--limit telemetry-0,service-0,compute-2,compute-3 \
fetch-updates.yaml

There you go. Using dynamic inventory can be really useful for running adhoc commands against your OpenStack nodes.

COVID-19 (of course)

We thought it timely to review a few facts and observations, relying on published medical papers (or those submitted for peer review) and reliable sources.

March 17, 2020

COVID-19 Time Series Analysis

On Friday 13 March I started looking at the COVID-19 time series case data. The first step was to fit a simple exponential model. The model lets us work out the number of cases t days in the future N(t), given N(0) cases today, and a doubling time of Td days:

N(t) = N(0)*2^(t/Td)

To work out how many days (t) to a number of cases N(t), you can re-arrange to get:

t = Td*log2(N(t)/N(0))

At the time I had some US travel planned for late March. So I plugged in some numbers to see how long it would take the US to get to 70,000 cases (China’s cases at the time):

t = 3*log2(70,000/1600) = 16.4 days

Wow. It slowly dawned on me that international travel was going to be a problem. The human mind just struggles to cope with the power of exponential growth. Five days seems a long time ago now….

I immediately grounded my parents – they are an at risk demographic and in a few weeks the hospitals will not be able to help them if they get sick. I estimate my home city of Adelaide (30 cases on March 18) will struggle with 1000 cases (a proportion of which will need Intensive Care):

t = 4*log2(1000/30) = 20 days
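
In Python those two formulas (and the worked examples above) look like this:

from math import log2

def cases(n0, td, t):
    # N(t) = N(0) * 2^(t/Td)
    return n0 * 2 ** (t / td)

def days_until(n0, nt, td):
    # t = Td * log2(N(t)/N(0))
    return td * log2(nt / n0)

print(days_until(1600, 70000, 3))  # ~16.4 days (US example)
print(days_until(30, 1000, 4))     # ~20 days (Adelaide example)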

The low number of cases today is not important, the exponential growth is the critical factor.

Since then I’ve been messing with a customised covid19.py Python script to generate some plots useful to me. It is based on some code I found from Mohammad Ashhad. You might find it useful too, it’s easy to customise to other countries. I’d also appreciate a review of the script and math in this post.

I find that analysing the data gives me a small sense of control over the situation. And a useful crystal ball in this science fiction life we have suddenly started living.

Here are some plots from the last 14 days:


I find the second, log plot much more helpful. A constant positive slope on a log plot indicates exponential growth which is bad. We want the log plot to flatten out to a horizontal line.

Doubling time is the key metric. Here is a smoothed (3 day window) estimate. A low doubling time (e.g. a few days) is bad; our target is a high doubling time:

It’s a bit noisy at the moment. I’m interested in Spain and Italy as they have locked down. There will be a time lag as infections prior to lock down flow through to cases, but I expect (and sincerely hope) to see the doubling time of those countries improve, and new cases tapering off.

I’m working from home and hoping Australia will lock down soon. I will update the plots above daily.

All the best to everyone.

Update – March 26 2020

It’s been one week since I first published this post and I have been updating the graphs every day. My models are simplistic and I am not an epidemiologist. However I am sorry to say that exponential growth for Australia and the US has proceeded at the same rate or faster than the simple models above predicted.

Italy is showing a clear trend to an improved doubling time. The top plot shows almost linear growth. This is welcome news and will hopefully soon lead to a decreased load on their hospitals. This is encouraging to me as it shows lock down can work!

A small positive trend for Spain, who have also locked down; however Australia and the US are still doubling every 3-4 days. It’s clear from the second, log plot, that US cases will soon be the highest in the world.

Any changes we make in behaviour today will take 1-2 weeks to flow through. So this is a window into behavioural changes 1-2 weeks ago, and an estimate of the doubling rate for the next 1-2 weeks.

A daily case increase of 10% is a doubling time of 7.3 days (1 week). This intuitively feels like a good first milestone, and something expanding health systems have some chance of dealing with. It’s also easy to calculate in your head when looking at day by day statistics. A daily increase of 20% is a 3.8 day doubling time and very bad news.
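
For reference, converting a daily percentage increase into a doubling time is a one-liner:

from math import log

def doubling_time(daily_increase):
    # Td = log(2) / log(1 + daily increase)
    return log(2) / log(1 + daily_increase)

print(doubling_time(0.10))  # ~7.3 days
print(doubling_time(0.20))  # ~3.8 days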

Australia still doesn’t have a strong lock down, and many people are not staying at home. I hope our government acts decisively soon.

Update – April 3 2020

Another week has passed since my last update – a long time in the Coronavirus saga. A few days after my last update, I noticed the Australian new cases were constant at around 350 for a few days, then started to drop. The doubling time has shot up too, and the top graph looks almost linear now. Australia is now at about 5% new cases/day (300 new/5000 existing cases). We can handle that.

This means our hospitals are not going to break. Good news indeed. My theory on this reduction is the time delayed effect of the Australian population starting to take Corona seriously, and good management by our state and federal governments. Several states have a lock down but the effect hasn’t flowed through to cases yet.

I think we are now entering a “whack a mole” stage, like China, Japan, and South Korea. We’ll have to remain vigilant, stay at home, and smash small outbreaks as they spring up in the community. Recoveries will eventually start to pick up and the number of active cases decline. The current numbers are a 0.5% fatality rate and 2% ICU admission rate.

Despite the appalling number of deaths in Italy and Spain, they clearly have new cases under control through lock down. The log curves are flat, and doubling times steadily increasing. The situation is very bad in the US, and many other countries. I am particularly concerned for the developing world.

I note the doubling rate curve for Spain and Australia is the same; Australia is just much further down the curve. Even my septuagenarian parents are behaving – mostly “staying home”.

Doing my own analysis has been really useful – I basically ignore the headlines (anyone sick of the word “surge”?) as I can look at the data and drill down to what matters. I’m picking trends a few days before they are reported. Still a few things to ponder, like a model for how ICU cases track reported cases.

Best wishes to everyone.

Links

Johns Hopkins CSSE COVID-19 Dashboard
Source Data
Our World in Data Coronavirus Statistics and Research

March 15, 2020

Using network namespaces with veth to NAT guests with overlapping IPs

Sets of virtual machines are connected to virtual bridges (e.g. virbr0 and virbr1) and, as they are isolated, can use the same subnet range and set of IPs. However, NATing becomes a problem because the host won’t know which VM to return the traffic to.

To solve this problem, we can use network namespaces and some veth (virtual Ethernet) devices to connect up each private network we want to NAT.

Each veth device acts like a patch cable and is actually made up of two network devices, one for each end (e.g. peer1-a and peer1-b). By adding those interfaces between bridges and/or namespaces, you create a link between them.

The network namespace is only used for NAT and is where the veth IPs are set, the other end will act like a patch cable without an IP. The VMs are only connected into their respective bridge (e.g. virbr0) and can talk to the network namespace over the veth patch.

We will use two pairs for each network namespace.

  • One (e.g. represented by veth1 below ) which connects the virtual machine’s private network (e.g. virbr0 on 10.0.0.0/24) into the network namespace (e.g. net-ns1) where it sets an IP and will be the private network router (e.g. 10.0.0.1).
  • Another (e.g. represented by veth2 below) which connects the upstream provider network (e.g. br0 on 192.168.0.0/24) into the same network namespace where it sets an IP (e.g. 192.168.0.100).
  • Repeat the process for other namespaces (e.g. represented by veth3 and veth4 below).
Configuration for multiple namespace NAT

By providing each private network with its own unique upstream routable IP and applying NAT rules inside each namespace separately, we can avoid any conflict.

Create a provider bridge

You’ll need a bridge to a physical network, which will act as your upstream route (like a “provider” network).

ip link add name br0 type bridge
ip link set br0 up
ip link set eth0 up
ip link set eth0 master br0

Create namespace

We create our namespace to patch in the veth devices and hold the router and isolated NAT rules. As this is for the purpose of NATing multiple private networks, I’m making it sequential and calling this nat1 (for our first one, then I’ll call the next one nat2).

ip netns add nat1

First veth pair

Our first pair of veth peer interfaces will be used to connect the namespace to the upstream bridge (br0). Give them a name that makes sense to you; here I’m making it sequential again and specifying the purpose. Thus, peer1-br0 will connect to the upstream br0 and peer1-gw1 will be our routable IP in the namespace.

ip link add peer1-br0 type veth peer name peer1-gw1

Adding the veth to provider bridge

Now we need to add the peer1-br0 interface to the upstream provider bridge and bring it up. Note that we do not set an IP on this, it’s a patch lead. The IP will be on the other end in the namespace.

brctl addif br0 peer1-br0
ip link set peer1-br0 up

First gateway interface in namespace

Next we want to add the peer1-gw1 device to the namespace, give it an IP on the routable network, set the default gateway and bring the device up. Note that if you use DHCP you can do that, here I’m just setting an IP statically to 192.168.0.100 and gateway of 192.168.0.1.

ip link set peer1-gw1 netns nat1
ip netns exec nat1 ip addr add 192.168.0.100/24 dev peer1-gw1
ip netns exec nat1 ip link set peer1-gw1 up
ip netns exec nat1 ip route add default via 192.168.0.1

Second veth pair

Now we create the second veth pair to connect the namespace into the private network. For this example we’ll be connecting to virbr0 network, where our first set of VMs are running. Again, give them useful names.

ip link add peer1-virbr0 type veth peer name peer1-gw2

Adding the veth to private bridge

Now we need to add the peer1-virbr0 interface to the virbr0 private network bridge. Note that we do not set an IP on this, it’s a patch lead. The IP will be on the other end in the namespace.

brctl addif virbr0 peer1-virbr0
ip link set peer1-virbr0 up

Second gateway interface in namespace

Next we want to add the peer1-gw2 device to the namespace, give it an IP on the private network and bring the device up. I’m going to set this to the default gateway of the VMs in the private network, which is 10.0.0.1.

ip link set peer1-gw2 netns nat1
ip netns exec nat1 ip addr add 10.0.0.1/24 dev peer1-gw2
ip netns exec nat1 ip link set up dev peer1-gw2

Enable NAT in the namespace

So now we have our namespace with patches into each bridge and IPs on each network. The final step is to enable network address translation.

ip netns exec nat1 iptables -t nat -A POSTROUTING -o peer1-gw1 -j MASQUERADE
ip netns exec nat1 iptables -A FORWARD -i peer1-gw1 -o peer1-gw2 -m state --state RELATED,ESTABLISHED -j ACCEPT
ip netns exec nat1 iptables -A FORWARD -i peer1-gw2 -o peer1-gw1 -j ACCEPT

You can see the rules with standard iptables commands in the namespace.

ip netns exec nat1 iptables -t nat -L -n

Test it

OK, so logging onto the VMs, they should have a local IP (e.g. 10.0.0.100), a default route via 10.0.0.1 and upstream DNS set. Test that they can ping the gateway, test they can ping the DNS and test that they can ping a DNS name on the Internet.

Rinse and repeat

This can be applied for other virtual machine networks as required. There is no longer any need for the VMs there to have unique IPs, they can overlap each other.

What you do need to do is create a new network namespace, create two new sets of veth pairs (with a useful name) and pick another IP on the routable network. The virtual machine gateway IP will be the same in each namespace, that is 10.0.0.1.

To be, or not to be decisive.

kattekrab Sun, 15/03/2020 - 11:26

March 13, 2020

From 2020 to 2121: How will we get there?

kattekrab Thu, 13/02/2020 - 19:35

6 reasons I love working from home (The COVID19 edition)

kattekrab Fri, 13/03/2020 - 13:34

March 12, 2020

Coronavirus and Work

Currently the big news issue is all about how to respond to Coronavirus. The summary of the medical situation is that it’s going to spread exponentially (as diseases do) and that it has a period of up to 6 days of someone being infectious without having symptoms. So you can get a lot of infected people in an area without anyone knowing about it. Therefore preventative action needs to be taken before there’s widespread known infection.

Governments seem uninterested in doing anything about the disease before they have proof of widespread infection. They won’t do anything until it’s too late.

I finished my last 9-5 job late last year and haven’t got a new one since then. Now I’m thinking of just not taking any work that requires much time spent outside home. If you don’t go to a workplace there isn’t a lot you have to do that involves leaving home.

Shopping is one requirement for leaving home, but the two major supermarket chains in my area (Coles and Woolworths) both offer home delivery for a small price so that covers most shopping. Getting groceries delivered means that they will usually come from the store room, not the shop floor, so wouldn’t have anyone coughing or sneezing on them. If you are really paranoid (which I’m not at the moment) then you could wear rubber gloves to bring the delivery in and then wash everything before using it. It seems that many people have similar ideas to me: normally Woolworths allows booking next-day delivery, but now you have to book at least 5 days (3 business days) in advance.

If anyone needs some Linux work done from remote then let me know. Otherwise I’ll probably spend the next couple of months at home doing Debian coding and watching documentaries on Netflix.

March 10, 2020

The Net Promoter Score: A Meaningless Flashing Light

Almost two years ago I made a short blog post about how the Net Promoter Score (NPS), commonly used in business settings, is The Most Useless Metric of All. My reasons at the time were that it doesn't capture the reasons for a low score, it doesn't differentiate between subjective values in its scores, and it is mathematically incoherent (a three-value grade from an 11-point range of 0-10). Further, actual studies rank it last in effectiveness.

Recently, the author of the NPS, Fred Reichheld, has come around. Apparently now It's Not About the Score, but rather the score represents a "signal". This is a very far cry from the initial claims in the Harvard Business Review that it is "The One Number You Need to Grow". Of course, it is very difficult for anyone to admit they've made an error, and Reichheld is no exception to this. Instead of addressing the real problems of the NPS method, he now tries to argue that people have gamified the scores, and that's the real problem. It would be great if, as a general principle in business reasoning, people could just admit that their pet idea is flawed and build something better. That would be appreciated. Defending something that is clearly broken, even if it's your own idea, lacks intellectual humility, and is actually a bit embarrassing to watch.

Even as a signal, the NPS doesn't send a useful signal because the people being surveyed don't know what the signal means. In a scale where a 0 is equal to a 6, the scale is meaningless. There are, in fact, only three values in NPS (promoter, passive, detractor) and only one metric that it can possibly be testing: "How likely is it that you would recommend [company X] to a friend or colleague?"
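
To make the loss of information concrete, here is the whole method as a few lines of code, using the standard NPS bucketing (0-6 detractor, 7-8 passive, 9-10 promoter) and the usual score of percentage of promoters minus percentage of detractors:

def nps_bucket(score):
    # Standard NPS bucketing: a 0 and a 6 land in exactly the same bucket.
    if score <= 6:
        return "detractor"
    if score <= 8:
        return "passive"
    return "promoter"

def nps(scores):
    # Percentage of promoters minus percentage of detractors; passives are discarded.
    buckets = [nps_bucket(s) for s in scores]
    return 100 * (buckets.count("promoter") - buckets.count("detractor")) / len(buckets)

print(nps([0, 6, 6, 9, 10]))  # -20.0: three very different "detractors" dominate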

Is that a useful question? Maybe for a generic good. It is far less useful for specialist goods. Do I recommend a three-day course in learning about job submission with Slurm for high-performance computing? Only to a few people that it would benefit. What score do I give? Maybe a 2, representing the number of people I would recommend it to? 3/11 is actually a lot in a quantitative sense, but that's the circles I mix with. Ah, but no; that makes me a detractor. And here we fall into the problem of subjective evaluation of the meaning of the scores.

The NPS doesn't send a useful signal also because the people receiving the survey have no idea what the signal means. "Wow, we're receiving a lot of positive promoters!", "Do you know why?", "Nope, but we must be doing something right. I wonder what it is?". It's like driving in the dark and congratulating your skills that you haven't gone off the edge of a cliff - yet. Who would do such a thing? The NPS, that's who.

To reiterate the post from two years ago, there are necessary changes needed to improve NPS. Firstly, if you're going to have a ranking method, use all the ranks! Also, 1-10 is a 10-point scale (which I suspect was the intention), not 0-10 - that's 11 (0 is an index, people!). Secondly, ensure that there are qualitative values assigned to the quantitative values; 6/10 is not a detractor in a normal distribution - it's a neutral, leaning to positive. Specify how the quantitative values correlate with qualitative descriptions. Thirdly, actively seek out reasons for the rating provided. If you don't have that data, all the signal will be is just that - a flashing light with no explanation. Without quantification and qualification, you simply cannot manage appropriately.

Finally, more questions! In managing customer loyalty, you will need to discover what they are being loyal to. It doesn't need to be overly long, just something that breaks down the experience that the customer can identify with. Customers may be lazy, but they're not that lazy. The benefit gained from a few questions provides much more insight than the loss of those customers who only answer one question: "A single item question is much less reliable and more volatile than a composite index" (Hill, Nigel; Roche, Greg; Allen, Rachel (2007). Customer Satisfaction: The Customer Experience through the Customer's Eyes). Yes, it is great to have people promoting your organisation or product. You know what else you need? Knowledge of what that flashing light means.

March 09, 2020

Terry2020 finally making the indoor beast more stable

Over time the old Terry robot had evolved from a basic "T" shape to have pan and tilt and a robot arm on board. The rear caster(s) were the weakest part of the robot, allowing the whole thing to rock around more than it should. I now have Terry 2020 on the cards.


Part of this is an upgrade to a Kinect2 for navigation. The power requirements of that (12v/3a or so) have led me to putting a better dc-dc bus on board and some relays to be able to programmatically shut down and bring up features as needed and conserve power otherwise. The new base footprint is 300x400mm though the drive wheels stick out the side.

The wheels out the sides are partially due to the planetary gear motors (on the underside) being quite long. If it is an issue I can recut the lowest layer alloy and move them inward, but I am not really needing to have the absolute minimal turning circle. If that were the case I would move the drive wheels to the middle of the chassis so it could turn on its center.

There will be 4 layers at the moment and a mezzanine below the arm. So there will be expansion room included in the build :)

The rebuild will allow Terry to move at top speed when self driving. Terry will never move at the speed of an outdoor robot but can move closer to its potential when it rolls again.

March 08, 2020

Yet another near-upstream Raptor Blackbird firmware build

In what is becoming a monthly occurrence, I’ve put up yet another firmware build for the Raptor Blackbird with close-to-upstream firmware (see here and here for previous ones).

Well, I’ve done another build! It’s current op-build (as of yesterday), but my branch with patches for the Raptor Blackbird. The skiboot patch is there, and the SBE speedup patch is now upstream. The machine-xml is straight from Raptor but in my repo.

Here’s the current versions of everything:

$ lsprop /sys/firmware/devicetree/base/ibm,firmware-versions/
skiboot          "v6.5-228-g82aed17a-p4360f95"
bmc-firmware-version
                 "0.00"
occ              "3ab2921"
hostboot         "acdff8a-pe7e80e1"
buildroot        "2019.05.3-15-g3a4fc2a888"
capp-ucode       "p9-dd2-v4"
machine-xml      "site_local-stewart-a0efd66"
hostboot-binaries
                 "hw013120a.opmst"
sbe              "c318ab0-p1ddf83c"
hcode            "hw030220a.opmst"
petitboot        "v1.12"
phandle          0000064c (1612)
version          "blackbird-v2.4-514-g62d1a941"
linux            "5.4.22-openpower1-pdbbf8c8"
name             "ibm,firmware-versions"

If we compare this to the last build I put up, we have:

Component          | old                           | new
skiboot            | v6.5-209-g179d53df-p4360f95   | v6.5-228-g82aed17a-p4360f95
linux              | 5.4.13-openpower1-pa361bec    | 5.4.22-openpower1-pdbbf8c8
occ                | 3ab2921                       | no change
hostboot           | 779761d-pe7e80e1              | acdff8a-pe7e80e1
buildroot          | 2019.05.3-14-g17f117295f      | 2019.05.3-15-g3a4fc2a888
capp-ucode         | p9-dd2-v4                     | no change
machine-xml        | site_local-stewart-a0efd66    | no change
hostboot-binaries  | hw011120a.opmst               | hw013120a.opmst
sbe                | 166b70c-p06fc80c              | c318ab0-p1ddf83c
hcode              | hw011520a.opmst               | hw030220a.opmst
petitboot          | v1.11                         | v1.12
version            | blackbird-v2.4-415-gb63b36ef  | blackbird-v2.4-514-g62d1a941

So, what do those changes mean? Not too much changed over the past month: a kernel bump, a new petitboot (I can’t find release notes, but it doesn’t look like there are a lot of changes), and slight bumps to other firmware components.

Grab blackbird.pnor from https://www.flamingspork.com/blackbird/stewart-blackbird-4-images/ and give it a whirl!

To flash it, copy blackbird.pnor to your Blackbird’s BMC in /tmp/ (important! the /tmp filesystem has enough room, the home directory for root does not), and then run:

pflash -E -p /tmp/blackbird.pnor

Which will ask you to confirm and then flash:

About to erase chip !
WARNING ! This will modify your HOST flash chip content !
Enter "yes" to confirm:yes
Erasing... (may take a while)
[==================================================] 99% ETA:1s      
done !
About to program "/tmp/blackbird.pnor" at 0x00000000..0x04000000 !
Programming & Verifying...
[==================================================] 100% ETA:0s   

March 07, 2020

Fixing MariaDB InnoDB errors after upgrading to MythTV 30

After upgrading to MythTV 30 and MariaDB 10.3.18 on Debian buster, I noticed the following errors in my logs:

Jan 14 02:00:05 hostname mysqld[846]: 2020-01-14  2:00:05 62 [Warning] InnoDB: Cannot add field `rating` in table `mythconverg`.`internetcontentarticles` because after adding it, the row size is 8617 which is greater than maximum allowed size (8126) for a record on index leaf page.
Jan 14 02:00:05 hostname mysqld[846]: 2020-01-14  2:00:05 62 [Warning] InnoDB: Cannot add field `playcommand` in table `mythconverg`.`videometadata` because after adding it, the row size is 8243 which is greater than maximum allowed size (8126) for a record on index leaf page.

The root cause is that the database is using an InnoDB row format that cannot handle the new table sizes.

To fix it, I put the following in alter_tables.sql:

ALTER TABLE archiveitems ROW_FORMAT=DYNAMIC;
ALTER TABLE bdbookmark ROW_FORMAT=DYNAMIC;
ALTER TABLE callsignnetworkmap ROW_FORMAT=DYNAMIC;
ALTER TABLE capturecard ROW_FORMAT=DYNAMIC;
ALTER TABLE cardinput ROW_FORMAT=DYNAMIC;
ALTER TABLE channel ROW_FORMAT=DYNAMIC;
ALTER TABLE channelgroup ROW_FORMAT=DYNAMIC;
ALTER TABLE channelgroupnames ROW_FORMAT=DYNAMIC;
ALTER TABLE channelscan ROW_FORMAT=DYNAMIC;
ALTER TABLE channelscan_channel ROW_FORMAT=DYNAMIC;
ALTER TABLE channelscan_dtv_multiplex ROW_FORMAT=DYNAMIC;
ALTER TABLE codecparams ROW_FORMAT=DYNAMIC;
ALTER TABLE credits ROW_FORMAT=DYNAMIC;
ALTER TABLE customexample ROW_FORMAT=DYNAMIC;
ALTER TABLE diseqc_config ROW_FORMAT=DYNAMIC;
ALTER TABLE diseqc_tree ROW_FORMAT=DYNAMIC;
ALTER TABLE displayprofilegroups ROW_FORMAT=DYNAMIC;
ALTER TABLE displayprofiles ROW_FORMAT=DYNAMIC;
ALTER TABLE dtv_multiplex ROW_FORMAT=DYNAMIC;
ALTER TABLE dtv_privatetypes ROW_FORMAT=DYNAMIC;
ALTER TABLE dvdbookmark ROW_FORMAT=DYNAMIC;
ALTER TABLE dvdinput ROW_FORMAT=DYNAMIC;
ALTER TABLE dvdtranscode ROW_FORMAT=DYNAMIC;
ALTER TABLE eit_cache ROW_FORMAT=DYNAMIC;
ALTER TABLE filemarkup ROW_FORMAT=DYNAMIC;
ALTER TABLE gallery_directories ROW_FORMAT=DYNAMIC;
ALTER TABLE gallery_files ROW_FORMAT=DYNAMIC;
ALTER TABLE gallerymetadata ROW_FORMAT=DYNAMIC;
ALTER TABLE housekeeping ROW_FORMAT=DYNAMIC;
ALTER TABLE inputgroup ROW_FORMAT=DYNAMIC;
ALTER TABLE internetcontent ROW_FORMAT=DYNAMIC;
ALTER TABLE internetcontentarticles ROW_FORMAT=DYNAMIC;
ALTER TABLE inuseprograms ROW_FORMAT=DYNAMIC;
ALTER TABLE iptv_channel ROW_FORMAT=DYNAMIC;
ALTER TABLE jobqueue ROW_FORMAT=DYNAMIC;
ALTER TABLE jumppoints ROW_FORMAT=DYNAMIC;
ALTER TABLE keybindings ROW_FORMAT=DYNAMIC;
ALTER TABLE keyword ROW_FORMAT=DYNAMIC;
ALTER TABLE livestream ROW_FORMAT=DYNAMIC;
ALTER TABLE logging ROW_FORMAT=DYNAMIC;
ALTER TABLE music_albumart ROW_FORMAT=DYNAMIC;
ALTER TABLE music_albums ROW_FORMAT=DYNAMIC;
ALTER TABLE music_artists ROW_FORMAT=DYNAMIC;
ALTER TABLE music_directories ROW_FORMAT=DYNAMIC;
ALTER TABLE music_genres ROW_FORMAT=DYNAMIC;
ALTER TABLE music_playlists ROW_FORMAT=DYNAMIC;
ALTER TABLE music_radios ROW_FORMAT=DYNAMIC;
ALTER TABLE music_smartplaylist_categories ROW_FORMAT=DYNAMIC;
ALTER TABLE music_smartplaylist_items ROW_FORMAT=DYNAMIC;
ALTER TABLE music_smartplaylists ROW_FORMAT=DYNAMIC;
ALTER TABLE music_songs ROW_FORMAT=DYNAMIC;
ALTER TABLE music_stats ROW_FORMAT=DYNAMIC;
ALTER TABLE music_streams ROW_FORMAT=DYNAMIC;
ALTER TABLE mythlog ROW_FORMAT=DYNAMIC;
ALTER TABLE mythweb_sessions ROW_FORMAT=DYNAMIC;
ALTER TABLE networkiconmap ROW_FORMAT=DYNAMIC;
ALTER TABLE oldfind ROW_FORMAT=DYNAMIC;
ALTER TABLE oldprogram ROW_FORMAT=DYNAMIC;
ALTER TABLE oldrecorded ROW_FORMAT=DYNAMIC;
ALTER TABLE people ROW_FORMAT=DYNAMIC;
ALTER TABLE phonecallhistory ROW_FORMAT=DYNAMIC;
ALTER TABLE phonedirectory ROW_FORMAT=DYNAMIC;
ALTER TABLE pidcache ROW_FORMAT=DYNAMIC;
ALTER TABLE playgroup ROW_FORMAT=DYNAMIC;
ALTER TABLE powerpriority ROW_FORMAT=DYNAMIC;
ALTER TABLE profilegroups ROW_FORMAT=DYNAMIC;
ALTER TABLE program ROW_FORMAT=DYNAMIC;
ALTER TABLE programgenres ROW_FORMAT=DYNAMIC;
ALTER TABLE programrating ROW_FORMAT=DYNAMIC;
ALTER TABLE recgrouppassword ROW_FORMAT=DYNAMIC;
ALTER TABLE recgroups ROW_FORMAT=DYNAMIC;
ALTER TABLE record ROW_FORMAT=DYNAMIC;
ALTER TABLE record_tmp ROW_FORMAT=DYNAMIC;
ALTER TABLE recorded ROW_FORMAT=DYNAMIC;
ALTER TABLE recordedartwork ROW_FORMAT=DYNAMIC;
ALTER TABLE recordedcredits ROW_FORMAT=DYNAMIC;
ALTER TABLE recordedfile ROW_FORMAT=DYNAMIC;
ALTER TABLE recordedmarkup ROW_FORMAT=DYNAMIC;
ALTER TABLE recordedprogram ROW_FORMAT=DYNAMIC;
ALTER TABLE recordedrating ROW_FORMAT=DYNAMIC;
ALTER TABLE recordedseek ROW_FORMAT=DYNAMIC;
ALTER TABLE recordfilter ROW_FORMAT=DYNAMIC;
ALTER TABLE recordingprofiles ROW_FORMAT=DYNAMIC;
ALTER TABLE recordmatch ROW_FORMAT=DYNAMIC;
ALTER TABLE scannerfile ROW_FORMAT=DYNAMIC;
ALTER TABLE scannerpath ROW_FORMAT=DYNAMIC;
ALTER TABLE schemalock ROW_FORMAT=DYNAMIC;
ALTER TABLE settings ROW_FORMAT=DYNAMIC;
ALTER TABLE storagegroup ROW_FORMAT=DYNAMIC;
ALTER TABLE tvchain ROW_FORMAT=DYNAMIC;
ALTER TABLE tvosdmenu ROW_FORMAT=DYNAMIC;
ALTER TABLE upnpmedia ROW_FORMAT=DYNAMIC;
ALTER TABLE user_permissions ROW_FORMAT=DYNAMIC;
ALTER TABLE user_sessions ROW_FORMAT=DYNAMIC;
ALTER TABLE users ROW_FORMAT=DYNAMIC;
ALTER TABLE videocast ROW_FORMAT=DYNAMIC;
ALTER TABLE videocategory ROW_FORMAT=DYNAMIC;
ALTER TABLE videocollection ROW_FORMAT=DYNAMIC;
ALTER TABLE videocountry ROW_FORMAT=DYNAMIC;
ALTER TABLE videogenre ROW_FORMAT=DYNAMIC;
ALTER TABLE videometadata ROW_FORMAT=DYNAMIC;
ALTER TABLE videometadatacast ROW_FORMAT=DYNAMIC;
ALTER TABLE videometadatacountry ROW_FORMAT=DYNAMIC;
ALTER TABLE videometadatagenre ROW_FORMAT=DYNAMIC;
ALTER TABLE videopart ROW_FORMAT=DYNAMIC;
ALTER TABLE videopathinfo ROW_FORMAT=DYNAMIC;
ALTER TABLE videosource ROW_FORMAT=DYNAMIC;
ALTER TABLE videotypes ROW_FORMAT=DYNAMIC;
ALTER TABLE weatherdatalayout ROW_FORMAT=DYNAMIC;
ALTER TABLE weatherscreens ROW_FORMAT=DYNAMIC;
ALTER TABLE weathersourcesettings ROW_FORMAT=DYNAMIC;

and then ran it like this:

mysql -umythtv -pPassword1 mythconverg < alter_tables.sql
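
As an aside, rather than maintaining that list by hand, the statements can be generated from information_schema. A rough sketch (assuming the same mythtv credentials as above and the mysql command line client):

import subprocess

# List the InnoDB tables in mythconverg that aren't already using the Dynamic
# row format, and emit an ALTER statement for each one.
query = (
    "SELECT table_name FROM information_schema.tables "
    "WHERE table_schema='mythconverg' AND engine='InnoDB' "
    "AND row_format <> 'Dynamic';"
)
tables = subprocess.run(
    ["mysql", "-umythtv", "-pPassword1", "-N", "-e", query],
    capture_output=True, text=True, check=True,
).stdout.split()

for table in tables:
    print(f"ALTER TABLE {table} ROW_FORMAT=DYNAMIC;")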

March 06, 2020

Making SIP calls to VoIP.ms subscribers without using the PSTN

If you want to reach a VoIP.ms subscriber from Asterisk without using the PSTN, there is a way to do so via SIP URIs.

Here's what I added to my /etc/asterisk/extensions.conf:

exten => 1234,1,Set(CALLERID(all)=Francois Marier <5555551234>)
exten => 1234,n,Dial(SIP/sip.voip.ms/5555556789)

March 04, 2020

Configuring load balancing and location headers on Google Cloud

I have a need at the moment to know where my users are in the world. This helps me to identify what compute resources to serve their request with in order to reduce the latency they experience. So how do you do that thing with Google Cloud?

The first step is to setup a series of test backends to send traffic to. I built three regions: Sydney; London; and Los Angeles. It turns out in hindsight that wasn’t actually necessary though — this would work with a single backend just as well. For my backends I chose a minimal Ubuntu install, running this simple backend HTTP service.
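
That service isn’t reproduced here, but a minimal stand-in (purely illustrative, not the service linked above) only needs to echo request headers and answer the /healthz health check mentioned below:

from http.server import BaseHTTPRequestHandler, HTTPServer

class EchoHandler(BaseHTTPRequestHandler):
    # Echo the request headers back (including the X-Region, X-City and
    # X-Lat-Lon headers the load balancer inserts), and answer /healthz.
    def do_GET(self):
        if self.path == "/healthz":
            body = b"OK"
        else:
            body = "".join(f"{k}: {v}\n" for k, v in self.headers.items()).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

HTTPServer(("0.0.0.0", 8080), EchoHandler).serve_forever()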

I had some initial trouble finding a single page which walked through the setup of the Google Cloud load balancer to do what I wanted, which is the main reason for writing this post. The steps are:

Create your test instances and configure the backend on them. I ended up with a setup like this:

A list of google cloud VMs

Next setup instance groups to contain these instances. I chose unmanaged instance groups (that is, I don’t want autoscaling). You need to create one per region.

A list of google cloud instance groups

But wait! There’s one more layer of abstraction. We need a backend service. The configuration for these is cunningly hidden on the load balancing page, on a separate tab. Create a service which contains our three instance groups:

A sample backend service

I’ve also added a health check to my service, which just requests “/healthz” from each instance and expects a response of “OK” for healthy backends.

The backend service is also where we configure our extra headers. Click on the “advanced configurations” link, and more options appear:

Additional backend service options

Here I setup the extra HTTP headers the load balancer should insert: X-Region; X-City; and X-Lat-Lon.

And finally we can configure the load balancer. I selected a “HTTP(S) load balancer”, as I only care about incoming HTTP and HTTPS traffic. You set the load balancer to route traffic from the Internet to your VMs, and wire the backend of the load balancer to the backend service created above.

Now we can test! If I go to my load balancer in a web browser, I now get a result like this:

The top part of the page is just the HTTP headers from the request. You can see that we’re now getting helpful location headers. Mission accomplished!

March 03, 2020

Using Ansible to define and manage KVM guests and networks with YAML inventories

I wanted a way to quickly spin different VMs up and down on my KVM dev box, to help with testing things like OpenStack, Swift, Ceph and Kubernetes. Some of my requirements were as follows:

  • Define everything in a markup language, like YAML
  • Manage VMs (define, stop, start, destroy and undefine) and apply settings as a group or individually
  • Support different settings for each VM, like disks, memory, CPU, etc
  • Support multiple drives and types, including Virtio, SCSI, SATA and NVMe
  • Create users and set root passwords
  • Manage networks (create, delete) and which VMs go on them
  • Mix and match Linux distros and releases
  • Use existing cloud images from distros
  • Manage access to the VMs including DNS/hosts resolution and SSH keys
  • Have a good set of defaults so it would work out of the box
  • Potentially support other architectures (like ppc64le or arm)

So I hacked together an Ansible role and example playbook. Setting guest states to running, shutdown, destroyed or undefined (to delete and clean up) is supported. It will also manage multiple libvirt networks, and guests can have different specs as well as multiple disks of different types (SCSI, SATA, Virtio, NVMe). With Ansible’s --limit option, any individual guest, a hostgroup of guests, or even a mix can be managed.

Managing KVM guests with Ansible

Although Terraform with libvirt support is potentially a good solution, by using Ansible I can use that same inventory to further manage the guests and I’ve also been able to configure the KVM host itself. All that’s really needed is a Linux host capable of running KVM, some guest images and a basic inventory. The Ansible will do the rest (on supported distros).

The README is quite detailed, so I won’t repeat all of that here. The sample playbook comes with some example inventories, such as this simple one for spinning up three CentOS hosts (and using defaults).

simple:
  hosts:
    centos-simple-[0:2]:
      ansible_python_interpreter: /usr/bin/python

This can be executed like so.

curl -O https://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud.qcow2
sudo mv -iv CentOS-7-x86_64-GenericCloud.qcow2 /var/lib/libvirt/images/

git clone --recursive https://github.com/csmart/virt-infra-ansible.git
cd virt-infra-ansible

ansible-playbook --limit kvmhost,simple ./virt-infra.yml

There is also a more detailed example inventory that uses multiple distros and custom settings for the guests.

So far this has been very handy!

March 02, 2020

Amazon Prime and Netflix

I’ve been trying both Amazon Prime and Netflix. I signed up for the month free of Amazon Prime to watch “Good Omens” and “Picard”. “Good Omens” is definitely worth the effort of setting up the month free of Amazon Prime, and is worth the month’s subscription if you have used your free month in the past. Picard is ok.

Content

Amazon Prime has a medium amount of other content; I’m now paying for a month of Amazon Prime mainly because there are enough documentaries to fill a month. For reference there are plenty of good ones about war and about space exploration. There are also some really rubbish documentaries, for example a 2 part documentary about the Magna Carta where the second part starts with Grover Norquist claiming that the Magna Carta is justification for not having any taxes (the first part seemed ok).

Netflix has a lot of great content. A big problem with Netflix is that there aren’t good ways of searching and organising the content you want to watch. It would be really nice if Netflix could use some machine learning for recommendations and recommend shows based on what I’ve liked and also what I’ve disliked.

On both Netflix and Amazon when you view the details of a show it gives a short list of similar shows which is nice. With Amazon I have no complaints about that. But with Netflix the content library is so great that you get lost in a maze of links. On the Android tablet interface for Netflix it shows 12 similar shows in a grid and on the web interface it’s a row of 20 shows with looped scrolling. Then as you click a different show you get another list of 12/20 shows which will usually have some overlap with the previous one. It would be nice if you could easily swipe left on shows you don’t like to avoid having them repeatedly presented to you.

On Netflix I’ve really enjoyed the “Altered Carbon” series (which is significantly more violent than I anticipated), “Black Mirror” (the episode written by Trent Reznor and starring Miley Cyrus is particularly good), and “Love Death and Robots”. Overall I currently rate “Love Death and Robots” as in many ways the best series I’ve ever watched because the episodes are all short and get straight to the point. One advantage of online video is that they don’t need to pad episodes out or cut them short to fit a TV time slot, they can use as much time as necessary to tell the story.

Watch List

Having a single row of shows to watch is fine for the amount of content that Amazon has, but for the Netflix content you can easily get 100 shows on your watch list and it would be good to be able to search my watch list by genre (it’s a drag to flick through dozens of icons of war documentaries when I’m in the mood for an action movie as the icons are somewhat similar).

As well as a list of shows you selected to watch Netflix has a list of shows that have been recently watched with no way to edit it which is separate from the list of shows selected to watch. So if you watch 5 minutes of a show and decide that it sucks then it stays on the list until you have partially watched 10 other shows recently. For my usage the recently watched list is the most important thing as I’m watching some serial shows and wouldn’t want to go through the 100 shows on my watch list to find them. If I’ve decided that a movie sucked after watching a bit of it I don’t want to be reminded of it by seeing the icon every time I use Netflix for the next month.

Amazon has only a single “watch next” list for shows that you have watched recently and shows that you selected as worth watching. It allows editing the list which is nice, but then Amazon also often keeps shows on the list when you have finished watching them and removed them from the “to watch” setting. Amazon’s watch list is also generally buggy, at one time it decided that a movie was no longer available in my region but didn’t let me remove it from the list.

Quality

Apparently the Netflix web interface on Linux only allows 720p video while the Amazon web interface on all platforms is limited to 720p. In any case my Internet connection is probably only good enough for 1080p at most. I haven’t noticed any quality differences between Netflix and Amazon Prime.

Multiple Users

Netflix allows you to create profiles for multiple users with separate watch lists which is very handy. They also don’t have IP address restrictions so it’s a common practice for people to share a Netflix account with relatives. If you try to use Netflix when the maximum number of sessions for your account is in use it will show a list of what the other people on your account are watching (so if you share with your parents be careful about that).

Amazon doesn’t allow creating multiple profiles, but the content isn’t that great. The trend in video streaming is for proprietary content to force users to subscribe to a service. So sharing an Amazon Prime account with a few people so you can watch the proprietary content would make sense.

Watching Patterns

Sometimes when I’m particularly distracted I can’t focus on one show for any length of time. Both Amazon and Netflix (and probably all other online streaming services) allow me to skip between shows easily. That’s always been a feature of YouTube, but with YouTube you get recommended increasingly viral content until you find yourself watching utter rubbish. At least with Amazon and Netflix there is a minimum quality level even if that is reality TV.

Conclusion

Amazon Prime has a smaller range of content and some really rubbish documentaries. I don’t mind the documentaries about UFOs and other fringe stuff as it’s obvious what it is and you can avoid it. A documentary that has me watching for an hour before it’s revealed to be a promo for Grover Norquist is really bad, did the hour of it that I watched have good content or just rubbish too?

Netflix has a huge range of content and the quality level is generally very high.

If you are going to watch TV then subscribing to Netflix is probably a good idea. It’s reasonably cheap, has a good (not great) interface, and has a lot of content including some great original content.

For Amazon maybe subscribe for 1 month every second year to binge watch the Amazon proprietary content that interests you.

March 01, 2020

Upgrading a DMR hotspot to Pi-Star 4.1 RC-7

While the Pi-Star DMR gateway has automatic updates, the latest release (3.4.17) is still based on an old version of Debian (Debian 8 jessie) which is no longer supported. It is however relatively easy to update to the latest release candidate of Pi-Star 4.1 (based on Debian 10 buster), as long as you have a spare SD card (4 GB minimum).

Download the required files and copy to an SD card

First of all, download the latest RC image and your local configuration (remember the default account name is pi-star). I like to also print a copy of that settings page since it's much easier to refer to if things go wrong.

Then unzip the image and "burn" it to a new SD card (no need to format it ahead of time):

sudo dd if=Pi-Star_RPi_V4.1.0-RC7_20-Dec-2019.img of=/dev/sdX status=progress bs=4M  # write the image to the whole card, not a partition
sync

where /dev/sdX is the device name for the SD card, which you can find in the dmesg output. Don't skip the sync command or you may eject the card before your computer is done writing to it.
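If you're not sure which /dev/sdX the card is, comparing the block device list before and after inserting it avoids overwriting the wrong disk (my own habit, not part of the original instructions):

lsblk -o NAME,SIZE,MODEL,TRAN   # run before and after inserting the card and compare
dmesg | tail                    # the newly attached device also shows up here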

Then unmount the SD card and unplug it from your computer. Plug it back in. You should see two drives mounted automatically on your desktop:

  • pistar
  • boot

Copy the configuration zip file you downloaded earlier onto the root of the boot drive and then eject the drive.

Run sync again before actually unplugging the card.

Boot into the new version

In order to boot into the new version, start by turning off the Pi. Then remove the old SD card and insert the new one that you just prepared. That new card will become the new OS drive.

Boot the Pi and ideally connect a monitor to the HDMI port so that you can see it boot up and reboot twice before dropping you to a login prompt.

Login using the default credentials:

  • Username: pi-star
  • Password: raspberry

Once logged in use top to see if the pi is busy doing anything. Mine was in the process of upgrading Debian packages via unattended-upgrades which made everything (including the web UI) very slow.

You should now be able to access the web UI using the above credentials.

Update to the latest version

From the command line, you can ensure that you are running the latest version of Pi-Star by running the following command:

sudo pistar-upgrade

This updated from 4.1.0-RC7 to 4.1.0-RC8 on my device.

You can also run the following:

sudo pistar-update

to update the underlying Raspbian OS.

Check and restore your settings

Once things have settled down, double-check the settings and restore your admin password since that was not part of the configuration backup you made earlier.

I had to restore the following settings since they got lost in the process:

  • Auto AP: Off
  • uPNP: Off

Roll back to the previous version

If you run into problems, the best option is to roll back to the previous version and then try again.

As long as you didn't reuse the original SD card for this upgrade, rolling back to version 3.4.17 simply involves shutting down the pi and then swapping the new SD card for the old one and then starting it up again.

Audiobooks – February 2020

A Reminder of my rating System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average, in the middle 70%
  • 2/5 = Disappointing
  • 1/5 = Did not like at all

Abraham Lincoln: A Life (Volume Two) by Michael Burlingame

2nd volume covering Lincoln’s time as president. Lots of quotes from contemporary sources. Fairly good coverage of just about everything. 3/5

Capital City: Gentrification and the Real Estate State by Samuel Stein

Some interesting insights, although the exclusive focus on New York and the author’s very left-wing politics muddle the message. Worth a read if you are into the topic. 3/5

Young Men and Fire by Norman Maclean

The story of the 1949 Mann Gulch fire that killed 13 smoke jumpers. Loses a point for the frequent references to maps and photographs (which don’t work in audio), but still a gripping story. 3/5

The Walls Have Ears: The Greatest Intelligence Operation of World War II by Helen Fry

The secret British operation to bug German POWs to obtain military intelligence. Only declassified in the late 1990s so very few personal recollections, but an interesting story. 3/5


February 29, 2020

Links February 2020

Truthout has an interesting summary of the US “Wars Without Victory and Weapons Without End” [1]. The Korean war seems mostly a win for the US though.

The Golden Age of White Collar Crime is an informative article about the epidemic of rich criminals in the US that are protected at the highest levels [2]. This disproves the claims about gun ownership preventing crime. AFAIK no-one has shot a corporate criminal in spite of so many deserving it.

Law and Political Economy has an insightful article “Privatizing Sovereignty, Socializing Property: What Economics Doesn’t Teach You About the Corporation” [3]. It makes sense of the corporation law system.

IDR labs has a communism test, I scored 56% [4].

Vice has an interesting article about companies providing free email programs and services and then selling private data [5]. The California Consumer Privacy Act is apparently helping as companies that do business in the US can’t be sure which customers are in CA and need to comply to it for all users. Don’t trust corporations with your private data.

The Atlantic has an interesting article about Coronavirus and the Blindness of Authoritarianism [6]. The usual problem of authoritarianism, but with a specific example from China. The US is only just starting its experiment with authoritarianism and is making the same mistakes.

The Atlantic has an insightful article about Coronavirus and its effect on China’s leadership [7]. It won’t change things much.

On The Commons has an insightful article We Now Have a Justice System Just for Corporations [8]. In the US corporations can force people into arbitration for most legal disputes, and as they pay the arbitration companies, the arbitration almost always gives the company the result it pays for.

Boing Boing has an interesting article about conspiracy theories [9]. Their point is that some people have conspiracy theories (meaning belief in conspiracies that is not based in fact) due to having seen real conspiracies at close range. I think this only applies to a minority of people who believe conspiracy theories, and probably only to people who believe in a very small number of conspiracies. It seems that most people who believe in conspiracy theories believe in many of them.

Douglas Rushkoff wrote a good article about rich people who are making plans to escape after they destroy the environment [10]. Includes the idea of having shock-collars for security guards to stop them going rogue.

Boing Boing has an interesting article on the Brahmin Left and the Merchant Right [11]. It has some good points about the left side of politics representing the middle class more than the working class, especially the major left wing parties that are more centrist nowadays (like Democrats in the US and Labor in Australia).

February 27, 2020

Linux Security Summit North America 2020: CFP and Registration

The CFP for the 2020 Linux Security Summit North America is currently open, and closes on March 31st.

The CFP details are here: https://events.linuxfoundation.org/linux-security-summit-north-america/program/cfp/

You can register as an attendee here: https://events.linuxfoundation.org/linux-security-summit-north-america/register/

Note that the conference this year has moved from August to June (24-26).  The location is Austin, TX, and we are co-located with the Open Source Summit as usual.

We’ll be holding a 3-day event again, after the success of last year’s expansion, which provides time for tutorials and ad-hoc break out sessions.  Please note that if you intend to submit a tutorial, you should be a core developer of the project or otherwise recognized leader in the field, per this guidance from the CFP:

Tutorial sessions should be focused on advanced Linux security defense topics within areas such as the kernel, compiler, and security-related libraries.  Priority will be given to tutorials created for this conference, and those where the presenter is a leading subject matter expert on the topic.

This will be the 10th anniversary of the Linux Security Summit, which was first held in 2010 in Boston as a one day event.

Get your proposals for 2020 in soon!

February 16, 2020

DisplayPort and 4K

The Problem

Video playback looks better with a higher scan rate. A lot of content that was designed for TV (EG almost all historical documentaries) is going to be 25Hz interlaced (UK and Australia) or 30Hz interlaced (US). If you view that on a low refresh rate progressive scan display (EG a modern display at 30Hz) then my observation is that it looks a bit strange. Things that move seem to jump a bit and it’s distracting.

Getting HDMI to work with 4K resolution at a refresh rate higher than 30Hz seems difficult.

What HDMI Can Do

According to the HDMI Wikipedia page [1], HDMI 1.3–1.4b (introduced in June 2006) supports 30Hz refresh at 4K resolution, and if you use 4:2:0 Chroma Subsampling (see the Chroma Subsampling Wikipedia page [2]) you can do 60Hz or 75Hz on HDMI 1.3–1.4b. Basically for colour 4:2:0 means half the horizontal and half the vertical resolution while giving the same resolution for monochrome. For video that apparently works well (4:2:0 is standard for Blu-ray) and for games it might be OK, but for text (my primary use of computers) it would suck.
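As a rough back-of-the-envelope check of why 4:2:0 helps (my own arithmetic, counting active pixels only and ignoring blanking; the HDMI 1.4b figure of roughly 8.2 Gbit/s of video data is approximate):

echo $(( 3840 * 2160 * 60 * 24 ))   # bits/s for 8-bit 4:4:4 at 4K 60Hz, ~11.9 Gbit/s, more than HDMI 1.4b carries
echo $(( 3840 * 2160 * 60 * 12 ))   # bits/s for 8-bit 4:2:0, ~6.0 Gbit/s, which is why 4:2:0 squeezes into the older spec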

So I need support for HDMI 2.0 (introduced in September 2013) on the video card and monitor to do 4K at 60Hz. Apparently none of the combinations of video card and HDMI cable I use for Linux support that.

HDMI Cables

The Wikipedia page alleges that you need either a “Premium High Speed HDMI Cable” or a “Ultra High Speed HDMI Cable” for 4K resolution at 60Hz refresh rate. My problems probably aren’t related to the cable as my testing has shown that a cheap “High Speed HDMI Cable” can work at 60Hz with 4K resolution with the right combination of video card, monitor, and drivers. A Windows 10 system I maintain has a Samsung 4K monitor and a NVidia GT630 video card running 4K resolution at 60Hz (according to Windows). The NVidia GT630 card is one that I tried on two Linux systems at 4K resolution and causes random system crashes on both, it seems like a nice card for Windows but not for Linux.

Apparently the HDMI devices test the cable quality and use whatever speed seems to work (the cable isn’t identified to the devices). The prices at a local store are $3.98 for “high speed”, $19.88 for “premium high speed”, and $39.78 for “ultra high speed”. It seems that trying a “high speed” cable first before buying an expensive cable would make sense, especially for short cables which are likely to be less susceptible to noise.

What DisplayPort Can Do

According to the DisplayPort Wikipedia page [3] versions 1.2–1.2a (introduced in January 2010) support HBR2 which on a “Standard DisplayPort Cable” (which probably means almost all DisplayPort cables that are in use nowadays) allows 60Hz and 75Hz 4K resolution.

Comparing HDMI and DisplayPort

In summary to get 4K at 60Hz you need 2010 era DisplayPort or 2013 era HDMI. Apparently some video cards that I currently run for 4K (which were all bought new within the last 2 years) are somewhere between a 2010 and 2013 level of technology.

Also my testing (and reading review sites) shows that it’s common for video cards sold in the last 5 years or so to not support HDMI resolutions above FullHD, which means they would be HDMI version 1.1 at best. HDMI 1.2 was introduced in August 2005 and supports 1440p at 30Hz. PCIe was introduced in 2003 so there really shouldn’t be many PCIe video cards that don’t support HDMI 1.2. I have about 8 different PCIe video cards in my spare parts pile that don’t support HDMI resolutions higher than FullHD, so it seems that such a limitation is common.

The End Result

For my own workstation I plugged a DisplayPort cable between the monitor and video card, and a Linux dialog appeared (from KDE I think) offering me some choices about what to do. I chose to switch to the “new monitor” on DisplayPort, which defaulted to 60Hz. After that change TV shows on Netflix and Amazon Prime both look better, so it’s a good result.
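If your desktop environment doesn’t offer such a dialog, something like the following should work from a terminal (a sketch only; the output name DP-1 is an assumption, use whatever name xrandr reports on your system):

xrandr                                              # list outputs and the modes/refresh rates they advertise
xrandr --output DP-1 --mode 3840x2160 --rate 60     # DP-1 is a guess, substitute the name from the list above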

As an aside DisplayPort cables are easier to scrounge as the HDMI cables get taken by non-computer people for use with their TV.

February 15, 2020

Self Assessment

Background Knowledge

The Dunning Kruger Effect [1] is something everyone should read about. It’s the effect where people who are bad at something rate themselves higher than they deserve because their inability to notice their own mistakes prevents improvement, while people who are good at something rate themselves lower than they deserve because noticing all their mistakes is what allows them to improve.

Noticing all your mistakes all the time isn’t great (see Impostor Syndrome [2] for where this leads).

Erik Dietrich wrote an insightful article “How Developers Stop Learning: Rise of the Expert Beginner” [3] which I recommend that everyone reads. It is about how some people get stuck at a medium level of proficiency and find it impossible to unlearn bad practices which prevent them from achieving higher levels of skill.

What I’m Concerned About

A significant problem in large parts of the computer industry is that it’s not easy to compare various skills. In the sport of bowling (which Erik uses as an example) it’s easy to compare your score against people anywhere in the world, if you score 250 and people in another city score 280 then they are more skilled than you. If I design an IT project that’s 2 months late on delivery and someone else designs a project that’s only 1 month late are they more skilled than me? That isn’t enough information to know. I’m using the number of months late as an arbitrary metric of assessing projects, IT projects tend to run late and while delivery time might not be the best metric it’s something that can be measured (note that I am slightly joking about measuring IT projects by how late they are).

If the last project I personally controlled was 2 months late and I’m about to finish a project 1 month late does that mean I’ve increased my skills? I probably can’t assess this accurately as there are so many variables. The Impostor Syndrome factor might lead me to think that the second project was easier, or I might get egotistical and think I’m really great, or maybe both at the same time.

This is one of many resources recommending timely feedback for education [4]; it says “Feedback needs to be timely” and “It needs to be given while there is still time for the learners to act on it and to monitor and adjust their own learning”. For basic programming tasks such as debugging a crashing program the feedback is reasonably quick. For longer term tasks like assessing whether the choice of technologies for a project was good, the feedback cycle is almost impossibly long. If I used product A for a year long project, does it seem easier than product B because it is easier or because I’ve just got used to its quirks? Did I make a mistake at the start of a year long project, and if so do I remember why I made the choice I now regret?

Skills that Should be Easy to Compare

One would imagine that martial arts is a field where people have very realistic understanding of their own skills, a few minutes of contest in a ring, octagon, or dojo should show how your skills compare to others. But a YouTube search for “no touch knockout” or “chi” shows that there are more than a few “martial artists” who think that they can knock someone out without physical contact – with just telepathy or something. George Dillman [5] is one example of someone who had some real fighting skills until he convinced himself that he could use mental powers to knock people out. From watching YouTube videos it appears that such people convince the members of their dojo of their powers, and those people then faint on demand “proving” their mental powers.

The process of converting an entire dojo into believers in chi seems similar to the process of converting a software development team into “expert beginners”, except that martial art skills should be much easier to assess.

Is it ever possible to assess any skills if people trying to compare martial art skills often do it so badly?

Conclusion

It seems that any situation where one person is the undisputed expert has a risk of the “chi” problem if the expert doesn’t regularly meet peers to learn new techniques. If someone like George Dillman or one of the “expert beginners” that Erik Dietrich refers to was to regularly meet other people with similar skills and accept feedback from them they would be much less likely to become a “chi” master or “expert beginner”. For the computer industry meetup.com seems the best solution to this, whatever your IT skills are you can find a meetup where you can meet people with more skills than you in some area.

Here’s one of many guides to overcoming Imposter Syndrome [5]. Actually succeeding in following the advice of such web pages is not going to be easy.

I wonder if getting a realistic appraisal of your own skills is even generally useful. Maybe the best thing is to just recognise enough things that you are doing wrong to be able to improve and to recognise enough things that you do well to have the confidence to do things without hesitation.

February 14, 2020

Bidirectional rc joystick

With a bit of tinkering one can use the https://github.com/bmellink/IBusBM library to send information back to the remote controller. The info is tagged as temperature, rpm, or voltage, and the units are set based on that tag. There is a limit of 9 user feedback sensors, so I have 3 of each exposed.


To do this I used one of the Mega 2560 boards that comes in a small form factor configuration. This gave me 5 volts to run the actual RC receiver from, and more than one UART to talk to the USB, input, and output parts of the buses. I think you only need 2 UARTs, but as I had a bunch I just used separate ones.

The 2560 also gives a lavish amount of RAM, so using ROS topics doesn’t really matter. I have 9 subscribers and 1 publisher on the 2560. The 9 subscribers allow sending temp, voltage, and rpm info back to the remote, and give flexibility in what is sent so that it can be adjusted on the robot itself.

I used a servo extension cable to carry the base 5v, ground, and rx signals from the ibus out on the rc receiver unit. Handy as the servo plug ends can be taped together for the more bumpy environment that the hound likes to tackle. I wound up putting the diode floating between two extension wires on the (to tx) side of the bus.



The 1 publisher just sends an array with the raw RC values in it. With minimal delays I can get a reasonably steady 120 Hz publication of RC values. So now the houndbot can tell me from a great distance when it is getting hungry for more fresh electrons!

I had had some problems with the Nano and the RC unit locking up. I think perhaps this was due to the crystals, as the Uno worked OK. The 2560 board has been bench tested for 30 minutes, which was enough time to expose the issues on the Nano.


POC Wireguard + FRR: Now with OSPFv2!

If you read my last post, I set up a POC with wireguard and FRR to have the power of wireguard (WG) but with all the routing worked out by FRR. But I had a problem. When using RIPv2, the broadcast messages seemed to get stuck in the WG interfaces until I tcpdumped them. This meant that once I ran tcpdump the routes would get through, but only to eventually go stale and disappear.

I talked with the awesome people in the #wireguard IRC channel on freenode and was told to simply stay clear of RIP.

So I revisited my POC env and swapped out RIP for OSPF.. and guess what.. it worked! Now all the routes get propagated and they stay there. Which means if I decide to add new WG links and make it grow, the routing should grow with it:

suse@wireguard-5:~> ip r
default via 172.16.0.1 dev eth0 proto dhcp
10.0.2.0/24 via 10.0.4.104 dev wg0 proto 188 metric 20
10.0.3.0/24 via 10.0.4.104 dev wg0 proto 188 metric 20
10.0.4.0/24 dev wg0 proto kernel scope link src 10.0.4.105
172.16.0.0/24 dev eth0 proto kernel scope link src 172.16.0.36
172.16.1.0/24 via 10.0.4.104 dev wg0 proto 188 metric 20
172.16.2.0/24 via 10.0.4.104 dev wg0 proto 188 metric 20
172.16.3.0/24 via 10.0.4.104 dev wg0 proto 188 metric 20
172.16.4.0/24 dev eth1 proto kernel scope link src 172.16.4.105
172.16.5.0/24 dev eth2 proto kernel scope link src 172.16.5.105

Isn’t that beautiful: all the networks are visible on one of the more distant nodes, including network 1 (172.16.1.0/24).

I realise this doesn’t make much sense unless you read the last post, but never fear, I’ve reworked and appended the build notes here in case you’re interested.

Build notes – This time with OSPFv2

The topology we’ll be building

Seeing that this is my Suse hackweek project and I now use OpenSuse, I’ll be using OpenSuse Leap 15.1 for all the nodes (and the KVM host too).

Build the env

I used ansible-virt-infra, created by csmart, to build the env. I created my own inventory file called wireguard.yml, which you can drop in the inventory/ folder:

---
wireguard:
  hosts:
    wireguard-1:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-blue"
        - name: "net-green"
    wireguard-2:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-blue"
        - name: "net-white"
    wireguard-3:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-white"
    wireguard-4:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-orange"
        - name: "net-green"
    wireguard-5:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-orange"
        - name: "net-yellow"
    wireguard-6:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-yellow"
  vars:
    virt_infra_distro: opensuse
    virt_infra_distro_image: openSUSE-Leap-15.1-JeOS.x86_64-15.1.0-OpenStack-Cloud-Current.qcow2
    virt_infra_distro_image_url: https://download.opensuse.org/distribution/leap/15.1/jeos/openSUSE-Leap-15.1-JeOS.x86_64-15.1.0-OpenStack-Cloud-Current.qcow2
    virt_infra_variant: opensuse15.1

Next we need to make sure the networks have been defined, we do this in the kvmhost inventory file, here’s a diff:

diff --git a/inventory/kvmhost.yml b/inventory/kvmhost.yml
index b1f029e..6d2485b 100644
--- a/inventory/kvmhost.yml
+++ b/inventory/kvmhost.yml
@@ -40,6 +40,36 @@ kvmhost:
           subnet: "255.255.255.0"
           dhcp_start: "10.255.255.2"
           dhcp_end: "10.255.255.254"
+        - name: "net-mgmt"
+          ip_address: "172.16.0.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.0.2"
+          dhcp_end: "172.16.0.99"
+        - name: "net-white"
+          ip_address: "172.16.1.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.1.2"
+          dhcp_end: "172.16.1.99"
+        - name: "net-blue"
+          ip_address: "172.16.2.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.2.2"
+          dhcp_end: "172.16.2.99"
+        - name: "net-green"
+          ip_address: "172.16.3.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.3.2"
+          dhcp_end: "172.16.3.99"
+        - name: "net-orange"
+          ip_address: "172.16.4.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.4.2"
+          dhcp_end: "172.16.4.99"
+        - name: "net-yellow"
+          ip_address: "172.16.5.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.5.2"
+          dhcp_end: "172.16.5.99"
     virt_infra_host_deps:
         - qemu-img
         - osinfo-query

Now all we need to do is run the playbook:

ansible-playbook --limit kvmhost,wireguard ./virt-infra.yml
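Once the playbook finishes it’s worth confirming that ansible can reach all six VMs before going further (a sanity check of my own, not part of the original notes):

ansible wireguard -m ping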

Setting up the IPs and tunnels

The above infrastructure tool uses cloud_init to set up the network, so only the first NIC is up. You can confirm this with:

ansible wireguard -m shell -a "sudo ip a"

That’s ok because we want to use the numbers on our diagram anyway 🙂
Before we get to that, let’s make sure wireguard is set up, and update all the nodes.

ansible wireguard -m shell -a "sudo zypper update -y"

If a reboot is required, reboot the nodes:

ansible wireguard -m shell -a "sudo reboot"

Add the wireguard repo to the nodes and install it (I look forward to kernel 5.6, where wireguard will be included in the kernel):

ansible wireguard -m shell -a "sudo zypper addrepo -f obs://network:vpn:wireguard wireguard"

ansible wireguard -m shell -a "sudo zypper --gpg-auto-import-keys install -y wireguard-kmp-default wireguard-tools"

Load the kernel module:

ansible wireguard -m shell -a "sudo modprobe wireguard"

Let’s create wg0 on all wireguard nodes:

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo ip link add dev wg0 type wireguard"

And add wg1 to those nodes that have 2:

ansible wireguard-1,wireguard-4 -m shell -a "sudo ip link add dev wg1 type wireguard"

Now while we’re at it, let’s create all the wireguard keys (because we can use ansible):

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo mkdir -p /etc/wireguard"

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "wg genkey | sudo tee /etc/wireguard/wg0-privatekey | wg pubkey | sudo tee /etc/wireguard/wg0-publickey"

ansible wireguard-1,wireguard-4 -m shell -a "wg genkey | sudo tee /etc/wireguard/wg1-privatekey | wg pubkey | sudo tee /etc/wireguard/wg1-publickey"

Let’s make sure we enable forwarding on the nodes that will pass traffic, and install the routing software (1, 2, 4 and 5):

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo sysctl net.ipv4.conf.all.forwarding=1"

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo sysctl net.ipv6.conf.all.forwarding=1"

While we’re at it, we might as well add the network repo so we can install FRR and then install it on the nodes:

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo zypper ar https://download.opensuse.org/repositories/network/openSUSE_Leap_15.1/ network"

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo zypper --gpg-auto-import-keys install -y frr libyang-extentions"

This time we’ll be using OSPFv2, as we’re just using IPv4:

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo sed -i 's/^ospfd=no/ospfd=yes/' /etc/frr/daemons"

And with that, now we just need to do all the per-server things like adding IPs and configuring all the keys, peers, etc. We’ll do this a host at a time.
NOTE: As this is a POC we’re just using ip commands; obviously in a real env you’d want to use systemd-networkd or something to make these stick.

wireguard-1

Firstly using:
sudo virsh dumpxml wireguard-1 |less

We can see that eth1 is net-blue and eth2 is net-green so:
ssh wireguard-1

First IPs:
sudo ip address add dev eth1 172.16.2.101/24
sudo ip address add dev eth2 172.16.3.101/24
sudo ip address add dev wg0 10.0.2.101/24
sudo ip address add dev wg1 10.0.3.101/24

Load up the tunnels:
sudo wg set wg0 listen-port 51821 private-key /etc/wireguard/wg0-privatekey

# Node2 (2.102) public key is: P1tHKnaw7d2GJUSwXZfcayrrLMaCBHqcHsaM3eITm0s= (cat /etc/wireguard/wg0-publickey)

sudo wg set wg0 peer P1tHKnaw7d2GJUSwXZfcayrrLMaCBHqcHsaM3eITm0s= allowed-ips 10.0.2.0/24,224.0.0.0/8,172.16.0.0/16 endpoint 172.16.2.102:51822

sudo ip link set wg0 up

NOTE: We add 224.0.0.0/8 and 172.16.0.0/16 to allowed-ips. The first allows the OSPF multicast packets through. The latter will allow us to route to our private network that has been joined by WG tunnels.
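At this point it’s worth confirming the peer and allowed-ips took effect (just a sanity check of mine, not in the original notes; the handshake won’t show until the far end is configured too):

sudo wg show wg0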

sudo wg set wg1 listen-port 51831 private-key /etc/wireguard/wg1-privatekey

# Node4 (3.104) public key is: GzY59HlXkCkfXl9uSkEFTHzOtBsxQFKu3KWGFH5P9Qc= (cat /etc/wireguard/wg1-publickey)

sudo wg set wg1 peer GzY59HlXkCkfXl9uSkEFTHzOtBsxQFKu3KWGFH5P9Qc= allowed-ips 10.0.3.0/24,224.0.0.0/8,172.16.0.0/16 endpoint 172.16.3.104:51834

sudo ip link set wg1 up

Setup FRR:
sudo tee /etc/frr/frr.conf <<EOF
hostname $(hostname)

password frr
enable password frr

log file /var/log/frr/frr.log

router ospf
network 10.0.2.0/24 area 0.0.0.0
network 10.0.3.0/24 area 0.0.0.0
redistribute connected
EOF

sudo systemctl restart frr
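Once the FRR config is in place on both ends of a tunnel, a quick way to see whether OSPF has come up is to query FRR’s vtysh shell (my addition; the adjacency takes a little while to reach Full):

sudo vtysh -c "show ip ospf neighbor"
sudo vtysh -c "show ip route ospf"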

wireguard-2

Firstly using:
sudo virsh dumpxml wireguard-2 |less

We can see that eth1 is net-blue and eth2 is net-white so:

ssh wireguard-2

First IPs:
sudo ip address add dev eth1 172.16.2.102/24
sudo ip address add dev eth2 172.16.1.102/24
sudo ip address add dev wg0 10.0.2.102/24


Load up the tunnels:
sudo wg set wg0 listen-port 51822 private-key /etc/wireguard/wg0-privatekey

# Node1 (2.101) public key is: ZsHAeRbNsK66MBOwDJhdDgJRl0bPFB4WVRX67vAV7zs= (cat /etc/wireguard/wg0-publickey)

sudo wg set wg0 peer ZsHAeRbNsK66MBOwDJhdDgJRl0bPFB4WVRX67vAV7zs= allowed-ips 10.0.2.0/24,224.0.0.0/8,172.16.0.0/16 endpoint 172.16.2.101:51821

sudo ip link set wg0 up

NOTE: We add 224.0.0.0/8 and 172.16.0.0/16 to allowed-ips. The first allows the OSPF multicast packets through. The latter will allow us to route to our private network that has been joined by WG tunnels.

Setup FRR:
sudo tee /etc/frr/frr.conf <<EOF
hostname $(hostname)


password frr
enable password frr

log file /var/log/frr/frr.log

router ospf
network 10.0.2.0/24 area 0.0.0.0
redistribute connected
EOF

sudo systemctl restart frr

wireguard-3

Only has a net-white, so it must be eth1 so:

ssh wireguard-3

First IPs:
sudo ip address add dev eth1 172.16.1.103/24

Has no WG tunnels or FRR so we’re done here.

wireguard-4

Firstly using:
sudo virsh dumpxml wireguard-4 |less

We can see that eth1 is net-orange and eth2 is net-green so:

ssh wireguard-4

First IPs:
sudo ip address add dev eth1 172.16.4.104/24
sudo ip address add dev eth2 172.16.3.104/24
sudo ip address add dev wg0 10.0.4.104/24
sudo ip address add dev wg1 10.0.3.104/24

Load up the tunnels:
sudo wg set wg0 listen-port 51844 private-key /etc/wireguard/wg0-privatekey

# Node5 (4.105) public key is: Af/sIEnklG6nnDb0wzUSq1D/Ujh6TH+5R9TblLyS3h8= (cat /etc/wireguard/wg0-publickey)

sudo wg set wg0 peer Af/sIEnklG6nnDb0wzUSq1D/Ujh6TH+5R9TblLyS3h8= allowed-ips 10.0.4.0/24,224.0.0.0/8,172.16.0.0/16 endpoint 172.16.4.105:51845

sudo ip link set wg0 up

NOTE: We add 224.0.0.0/8 and 172.16.0.0/16 to allowed-ips. The first allows the OSPF multicast packets through. The latter will allow us to route to our private network that has been joined by WG tunnels.

sudo wg set wg1 listen-port 51834 private-key /etc/wireguard/wg1-privatekey

# Node1 (3.101) public key is: Yh0kKjoqnJsxbCsTkQ/3uncEhdqa+EtJXCYcVzMdugs= (cat /etc/wireguard/wg1-publickey)

sudo wg set wg1 peer Yh0kKjoqnJsxbCsTkQ/3uncEhdqa+EtJXCYcVzMdugs= allowed-ips 10.0.3.0/24,224.0.0.0/8,172.16.0.0/16 endpoint 172.16.3.101:51831

sudo ip link set wg1 up

Setup FRR:
sudo tee /etc/frr/frr.conf <<EOF
hostname $(hostname)

password frr
enable password frr

log file /var/log/frr/frr.log

router ospf

network 10.0.3.0/24 area 0.0.0.0
network 10.0.4.0/24 area 0.0.0.0
redistribute connected
EOF


sudo systemctl restart frr

wireguard-5

Firstly using:
sudo virsh dumpxml wireguard-5 |less

We can see that eth1 is net-orange and eth2 is net-yellow so:

ssh wireguard-5

First IPs:
sudo ip address add dev eth1 172.16.4.105/24
sudo ip address add dev eth2 172.16.5.105/24
sudo ip address add dev wg0 10.0.4.105/24

Load up the tunnels:
sudo wg set wg0 listen-port 51845 private-key /etc/wireguard/wg0-privatekey

# Node4 (4.104) public key is: aPA197sLN3F05bgePpeS2uZFlhRRLY8yVWnzBAUcD3A= (cat /etc/wireguard/wg0-publickey)

sudo wg set wg0 peer aPA197sLN3F05bgePpeS2uZFlhRRLY8yVWnzBAUcD3A= allowed-ips 10.0.4.0/24,224.0.0.0/8,172.16.0.0/16 endpoint 172.16.4.104:51844

sudo ip link set wg0 up

NOTE: We add 224.0.0.0/8 and 172.16.0.0/16 to allowed-ips. The first allows the OSPF multicast packets through. The latter will allow us to route to our private network that has been joined by WG tunnels.

Setup FRR:
sudo tee /etc/frr/frr.conf <<EOF
hostname $(hostname)

password frr
enable password frr

log file /var/log/frr/frr.log

router ospf

network 10.0.4.0/24 area 0.0.0.0
redistribute connected
EOF


sudo systemctl restart frr

wireguard-6

Only has a net-yellow, so it must be eth1 so:

ssh wireguard-6

First IPs:
sudo ip address add dev eth1 172.16.5.106/24

Final comments

After all this, you should now be where I’m up to: an environment that is sharing routes through the WG interfaces.

The current issue I have is that if I go and ping from wireguard-1 to wireguard-5, the ICMP packet happily routes through into the 10.0.3.0/24 tunnel. When it pops out in wg1 of wireguard-4, the kernel isn’t routing it on to wireguard-5 through wg0, or WG isn’t putting the packet into the IP stack or forwarding queue to continue its journey.

Well, that is my current assumption. Hopefully I’ll get to the bottom of it soon, in which case I’ll post it here 🙂
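If you want to poke at the same problem, the checks I’d start with on wireguard-4 are roughly these (my own debugging sketch, not something from the original write-up):

# is the kernel actually forwarding?
sudo sysctl net.ipv4.conf.all.forwarding
# does the ICMP packet arrive on wg1 and leave again on wg0?
sudo tcpdump -ni wg1 icmp
sudo tcpdump -ni wg0 icmp
# which route does the kernel pick for the far end (wireguard-5's wg0 address)?
ip route get 10.0.4.105
# does the wg0 peer's allowed-ips cover the source address of the forwarded packet?
sudo wg show wg0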

February 13, 2020

POC WireGuard + FRR Setup a.k.a dodgy meshy test network

It’s hackweek at Suse! Probably one of my favourite times of year, though I think they come up every 9 months or so.

Anyway, this hackweek I’ve been on a WireGuard journey. I started reading the paper and all the docs. Briefly looking into the code, sitting in the IRC channel and joining the mailing list to get a feel for the community.

There is still 1 day left of hackweek, so I hope to spend more time in the code, and maybe, just maybe, see if I can fix a bug.. although they don’t seem to have a tracker like most projects, so let’s see how that goes.

The community seems pretty cool. The tech is, frankly, pretty amazing; even I, from a cloud storage background, understood most of the paper.

I had set up a tunnel, tcpdumped traffic, and used wireshark to look closely at the packets as I read the paper, which was very informative. But I really wanted to get a feel for how this tech could work. They do have a wg-dynamic project which plans to use wg as a building block to do cooler things, like mesh networking. This sounds cool, so I wanted to sink my teeth in, not into wg-dynamic itself, but to see if I could build something similar out of existing OSS tech and see where the gotchas are (beyond it being obviously less secure). It seemed like a good way to better understand the technology.

So on Wednesday, I decided to do just that. Today is Thursday and I’ve gotten to a point where I can say I partially succeeded. And before I delve in deeper and try and figure out my current stumbling block, I thought I’d write down where I am.. and how I got here.. to:

  1. Point the wireguard community at, in case they’re interested.
  2. So you all can follow along at home, because it’s pretty interesting, I think.

As the title suggests, the plan is/was to set up a bunch of tunnels and use FRR to set up some routing protocols to talk via these tunnels, auto-magically 🙂

UPDATE: The problem I describe in this post, routes becoming stale, only seems to happen when using RIPv2. When I change it to OSPFv2 all the routes work as expected!! Will write a follow up post to explain the differences.. in fact may rework the notes for it too 🙂

The problem at hand

Test network VM topology

A picture is worth 1000 words. The basic idea is to simulate a bunch of machines and networks connected over wireguard (WG) tunnels. So I created 6 vms, connected as you can see above.

I used Chris Smart’s ansible-virt-infra project, which is pretty awesome, to build up the VMs and networks as you see above. I’ll leave my build notes as an appendix to this post.

Once I had the infrastructure set up, I built all the tunnels as they are in the image. Then I went ahead and installed FRR on all the nodes with tunnels (nodes 1, 2, 4, and 5). To keep things simple, I started with the easiest routing protocol to configure, RIPv2.

Believe it or not, everything seemed to work.. well mostly. I can jump on say node 5 (wireguard-5 if you playing along at home) and:

suse@wireguard-5:~> ip r
default via 172.16.0.1 dev eth0 proto dhcp
10.0.2.0/24 via 10.0.4.104 dev wg0 proto 189 metric 20
10.0.3.0/24 via 10.0.4.104 dev wg0 proto 189 metric 20
10.0.4.0/24 dev wg0 proto kernel scope link src 10.0.4.105
172.16.0.0/24 dev eth0 proto kernel scope link src 172.16.0.36
172.16.2.0/24 via 10.0.4.104 dev wg0 proto 189 metric 20
172.16.3.0/24 via 10.0.4.104 dev wg0 proto 189 metric 20
172.16.4.0/24 dev eth1 proto kernel scope link src 172.16.4.105
172.16.5.0/24 dev eth2 proto kernel scope link src 172.16.5.105

Looks good, right? We see routes for networks 172.16.{0,2,3,4,5}.0/24. Network 1 isn’t there, but hey, that’s quite far away, maybe it hasn’t made it yet. Which leads to the real issue.

If I go and run ip r again, soon all these routes will become stale and disappear. Running ip -ts monitor shows just that.

So the question is, what’s happening to the RIP advertisements? And yes, they’re still being sent. Then how come some made it to node 5, but never again?

The simple answer is, it was me. The long answer is, I’ve never used FRR before, and it just didn’t seem to be working. So I started debugging the env. To debug, I had a tmux session opened on the KVM host with a tab for each node running FRR. I’d go to each tab and run tcpdump to check to see if the RIP traffic was making it through the tunnel. And almost instantly, I saw traffic, like:

suse@wireguard-5:~> sudo tcpdump -v -U -i wg0 port 520
tcpdump: listening on wg0, link-type RAW (Raw IP), capture size 262144 bytes
03:01:00.006408 IP (tos 0xc0, ttl 64, id 62964, offset 0, flags [DF], proto UDP (17), length 52)
10.0.4.105.router > 10.0.4.255.router:
RIPv2, Request, length: 24, routes: 1 or less
AFI 0, 0.0.0.0/0 , tag 0x0000, metric: 16, next-hop: self
03:01:00.007005 IP (tos 0xc0, ttl 64, id 41698, offset 0, flags [DF], proto UDP (17), length 172)
10.0.4.104.router > 10.0.4.105.router:
RIPv2, Response, length: 144, routes: 7 or less
AFI IPv4, 0.0.0.0/0 , tag 0x0000, metric: 1, next-hop: self
AFI IPv4, 10.0.2.0/24, tag 0x0000, metric: 2, next-hop: self
AFI IPv4, 10.0.3.0/24, tag 0x0000, metric: 1, next-hop: self
AFI IPv4, 172.16.0.0/24, tag 0x0000, metric: 1, next-hop: self
AFI IPv4, 172.16.2.0/24, tag 0x0000, metric: 2, next-hop: self
AFI IPv4, 172.16.3.0/24, tag 0x0000, metric: 1, next-hop: self
AFI IPv4, 172.16.4.0/24, tag 0x0000, metric: 1, next-hop: self

At first I thought it was good timing. I jumped to another host, and when I ran tcpdump the RIP packets turned up instantaneously. This happened again and again.. and yes, it took me longer than I’d like to admit before it dawned on me.

Why are routes going stale? It seems as though the packets are getting queued/stuck in the WG interface until I poke it with tcpdump!

These RIPv2 Request packets are sent as broadcasts, not directly to the other end of the tunnel. To get them to not be dropped, I had to widen my WG peer allowed-ips from a /32 to a /24.
So now I wonder whether the broadcast, or just the fact that it’s only 52 bytes, means it gets queued up and not sent through the tunnel, that is until I come along with a hammer and tcpdump the interface?

Maybe one way I could test this is to speed up the RIP broadcasts and hopefully fill a buffer, or see if I can turn on debugging in WG, or rather the kernel.
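Another cheap way to watch for queued versus transmitted packets, without tcpdump touching the interface, would be to poll the transfer counters (my suggestion, not from the original notes):

watch -n 1 'sudo wg show wg0 transfer; ip -s link show wg0'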

Build notes

As promised, here is the current form of my build notes; they make reference to the topology image I used above.

BTW I’m using OpenSuse Leap 15.1 for all the nodes.

Build the env

I used ansible-virt-infra, created by csmart, to build the env. I created my own inventory file called wireguard.yml, which you can drop in the inventory/ folder:

---
wireguard:
  hosts:
    wireguard-1:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-blue"
        - name: "net-green"
    wireguard-2:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-blue"
        - name: "net-white"
    wireguard-3:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-white"
    wireguard-4:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-orange"
        - name: "net-green"
    wireguard-5:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-orange"
        - name: "net-yellow"
    wireguard-6:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-yellow"
  vars:
    virt_infra_distro: opensuse
    virt_infra_distro_image: openSUSE-Leap-15.1-JeOS.x86_64-15.1.0-OpenStack-Cloud-Current.qcow2
    virt_infra_distro_image_url: https://download.opensuse.org/distribution/leap/15.1/jeos/openSUSE-Leap-15.1-JeOS.x86_64-15.1.0-OpenStack-Cloud-Current.qcow2
    virt_infra_variant: opensuse15.1

Next we need to make sure the networks have been defined, we do this in the kvmhost inventory file, here’s a diff:

diff --git a/inventory/kvmhost.yml b/inventory/kvmhost.yml
index b1f029e..6d2485b 100644
--- a/inventory/kvmhost.yml
+++ b/inventory/kvmhost.yml
@@ -40,6 +40,36 @@ kvmhost:
           subnet: "255.255.255.0"
           dhcp_start: "10.255.255.2"
           dhcp_end: "10.255.255.254"
+        - name: "net-mgmt"
+          ip_address: "172.16.0.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.0.2"
+          dhcp_end: "172.16.0.99"
+        - name: "net-white"
+          ip_address: "172.16.1.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.1.2"
+          dhcp_end: "172.16.1.99"
+        - name: "net-blue"
+          ip_address: "172.16.2.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.2.2"
+          dhcp_end: "172.16.2.99"
+        - name: "net-green"
+          ip_address: "172.16.3.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.3.2"
+          dhcp_end: "172.16.3.99"
+        - name: "net-orange"
+          ip_address: "172.16.4.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.4.2"
+          dhcp_end: "172.16.4.99"
+        - name: "net-yellow"
+          ip_address: "172.16.5.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.5.2"
+          dhcp_end: "172.16.5.99"
     virt_infra_host_deps:
         - qemu-img
         - osinfo-query

Now all we need to do is run the playbook:

ansible-playbook --limit kvmhost,wireguard ./virt-infra.yml

Setting up the IPs and tunnels

The above infrastructure tool uses cloud_init to set up the network, so only the first NIC is up. You can confirm this with:

ansible wireguard -m shell -a "sudo ip a"

That’s ok because we want to use the numbers on our diagram anyway 🙂
Before we get to that, let’s make sure wireguard is set up, and update all the nodes.

ansible wireguard -m shell -a "sudo zypper update -y"

If a reboot is required, reboot the nodes:

ansible wireguard -m shell -a "sudo reboot"

Add the wireguard repo to the nodes and install it (I look forward to kernel 5.6, where wireguard will be included in the kernel):

ansible wireguard -m shell -a "sudo zypper addrepo -f obs://network:vpn:wireguard wireguard"

ansible wireguard -m shell -a "sudo zypper --gpg-auto-import-keys install -y wireguard-kmp-default wireguard-tools"

Load the kernel module:

ansible wireguard -m shell -a "sudo modprobe wireguard"

Let’s create wg0 on all wireguard nodes:

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo ip link add dev wg0 type wireguard"

And add wg1 to those nodes that have 2:

ansible wireguard-1,wireguard-4 -m shell -a "sudo ip link add dev wg1 type wireguard"

Now while we’re at it, let’s create all the wireguard keys (because we can use ansible):

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo mkdir -p /etc/wireguard"

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "wg genkey | sudo tee /etc/wireguard/wg0-privatekey | wg pubkey | sudo tee /etc/wireguard/wg0-publickey"

ansible wireguard-1,wireguard-4 -m shell -a "wg genkey | sudo tee /etc/wireguard/wg1-privatekey | wg pubkey | sudo tee /etc/wireguard/wg1-publickey"

Let’s make sure we enable forwarding on the nodes that will pass traffic, and install the routing software (1, 2, 4 and 5):

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo sysctl net.ipv4.conf.all.forwarding=1"

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo sysctl net.ipv6.conf.all.forwarding=1"

While we’re at it, we might as well add the network repo so we can install FRR and then install it on the nodes:

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo zypper ar https://download.opensuse.org/repositories/network/openSUSE_Leap_15.1/ network"

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo zypper --gpg-auto-import-keys install -y frr libyang-extentions"

We’ll be using RIPv2, as we’re just using IPv4:

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo sed -i 's/^ripd=no/ripd=yes/' /etc/frr/daemons"

And with that, now we just need to do all the per-server things like adding IPs and configuring all the keys, peers, etc. We’ll do this a host at a time.
NOTE: As this is a POC we’re just using ip commands; obviously in a real env you’d want to use systemd-networkd or something to make these stick.

wireguard-1

Firstly using:
sudo virsh dumpxml wireguard-1 |less

We can see that eth1 is net-blue and eth2 is net-green so:
ssh wireguard-1

First IPs:
sudo ip address add dev eth1 172.16.2.101/24
sudo ip address add dev eth2 172.16.3.101/24
sudo ip address add dev wg0 10.0.2.101/24
sudo ip address add dev wg1 10.0.3.101/24

Load up the tunnels:
sudo wg set wg0 listen-port 51821 private-key /etc/wireguard/wg0-privatekey

# Node2 (2.102) public key is: P1tHKnaw7d2GJUSwXZfcayrrLMaCBHqcHsaM3eITm0s= (cat /etc/wireguard/wg0-publickey)

sudo wg set wg0 peer P1tHKnaw7d2GJUSwXZfcayrrLMaCBHqcHsaM3eITm0s= allowed-ips 10.0.2.0/24 endpoint 172.16.2.102:51822

sudo ip link set wg0 up

sudo wg set wg1 listen-port 51831 private-key /etc/wireguard/wg1-privatekey

# Node4 (3.104) public key is: GzY59HlXkCkfXl9uSkEFTHzOtBsxQFKu3KWGFH5P9Qc= (cat /etc/wireguard/wg1-publickey)

sudo wg set wg1 peer GzY59HlXkCkfXl9uSkEFTHzOtBsxQFKu3KWGFH5P9Qc= allowed-ips 10.0.3.0/24 endpoint 172.16.3.104:51834

sudo ip link set wg1 up

Setup FRR:
sudo tee /etc/frr/frr.conf <<EOF
hostname $(hostname)

password frr
enable password frr

log file /var/log/frr/frr.log

router rip
version 2
redistribute kernel
redistribute connected

network wg0
no passive-interface wg0
network wg1
no passive-interface wg1
EOF

sudo systemctl restart frr
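After restarting FRR you can watch the RIP table from FRR’s vtysh shell, which is also a handy way to see the routes age out when they go stale (my addition, not in the original notes):

sudo vtysh -c "show ip rip"
sudo vtysh -c "show ip rip status"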

wireguard-2

Firstly using:
sudo virsh dumpxml wireguard-2 |less

We can see that eth1 is net-blue and eth2 is net-white so:

ssh wireguard-2

First IPs:
sudo ip address add dev eth1 172.16.2.102/24
sudo ip address add dev eth2 172.16.1.102/24
sudo ip address add dev wg0 10.0.2.102/24


Load up the tunnels:
sudo wg set wg0 listen-port 51822 private-key /etc/wireguard/wg0-privatekey

# Node1 (2.101) public key is: ZsHAeRbNsK66MBOwDJhdDgJRl0bPFB4WVRX67vAV7zs= (cat /etc/wireguard/wg0-publickey)

sudo wg set wg0 peer ZsHAeRbNsK66MBOwDJhdDgJRl0bPFB4WVRX67vAV7zs= allowed-ips 10.0.2.0/24 endpoint 172.16.2.101:51821

sudo ip link set wg0 up

Setup FRR:
sudo tee /etc/frr/frr.conf <<EOF
hostname $(hostname)


password frr
enable password frr

log file /var/log/frr/frr.log

router rip
version 2
redistribute kernel
redistribute connected

network wg0
no passive-interface wg0
EOF

sudo systemctl restart frr

wireguard-3

Only has a net-white, so it must be eth1 so:

ssh wireguard-3

First IPs:
sudo ip address add dev eth1 172.16.1.103/24

Has no WG tunnels or FRR so we’re done here.

wireguard-4

Firstly using:
sudo virsh dumpxml wireguard-4 |less

We can see that eth1 is net-orange and eth2 is net-green so:

ssh wireguard-4

First IPs:
sudo ip address add dev eth1 172.16.4.104/24
sudo ip address add dev eth2 172.16.3.104/24
sudo ip address add dev wg0 10.0.4.104/24
sudo ip address add dev wg1 10.0.3.104/24

Load up the tunnels:
sudo wg set wg0 listen-port 51844 private-key /etc/wireguard/wg0-privatekey

# Node5 (4.105) public key is: Af/sIEnklG6nnDb0wzUSq1D/Ujh6TH+5R9TblLyS3h8= (cat /etc/wireguard/wg0-publickey)

sudo wg set wg0 peer Af/sIEnklG6nnDb0wzUSq1D/Ujh6TH+5R9TblLyS3h8= allowed-ips 10.0.4.0/24 endpoint 172.16.4.105:51845

sudo ip link set wg0 up

sudo wg set wg1 listen-port 51834 private-key /etc/wireguard/wg1-privatekey

# Node1 (3.101) public key is: Yh0kKjoqnJsxbCsTkQ/3uncEhdqa+EtJXCYcVzMdugs= (cat /etc/wireguard/wg1-publickey)

sudo wg set wg1 peer Yh0kKjoqnJsxbCsTkQ/3uncEhdqa+EtJXCYcVzMdugs= allowed-ips 10.0.3.0/24 endpoint 172.16.3.101:51831

sudo ip link set wg1 up

Setup FRR:
sudo tee /etc/frr/frr.conf <<EOF
hostname $(hostname)

password frr
enable password frr

log file /var/log/frr/frr.log

router rip
version 2
redistribute kernel
redistribute connected

network wg0
no passive-interface wg0

network wg1
no passive-interface wg1
EOF


sudo systemctl restart frr

wireguard-5

Firstly using:
sudo virsh dumpxml wireguard-5 |less

We can see that eth1 is net-orange and eth2 is net-yellow so:

ssh wireguard-5

First IPs:
sudo ip address add dev eth1 172.16.4.105/24
sudo ip address add dev eth2 172.16.5.105/24
sudo ip address add dev wg0 10.0.4.105/24

Load up the tunnels:
sudo wg set wg0 listen-port 51845 private-key /etc/wireguard/wg0-privatekey

# Node4 (4.104) public key is: aPA197sLN3F05bgePpeS2uZFlhRRLY8yVWnzBAUcD3A= (cat /etc/wireguard/wg0-publickey)

sudo wg set wg0 peer aPA197sLN3F05bgePpeS2uZFlhRRLY8yVWnzBAUcD3A= allowed-ips 10.0.4.0/24 endpoint 172.16.4.104:51844

sudo ip link set wg0 up

Setup FRR:
sudo tee /etc/frr/frr.conf <<EOF
hostname $(hostname)

password frr
enable password frr

log file /var/log/frr/frr.log

router rip
version 2
redistribute kernel
redistribute connected

network wg0
no passive-interface wg0
EOF


sudo systemctl restart frr

wireguard-6

Only has a net-yellow, so it must be eth1 so:

ssh wireguard-6

First IPs:
sudo ip address add dev eth1 172.16.5.106/24

Final comments

When this _is_ all working, we’d probably need to open up the allowed-ips on the WG tunnels. We could start by just adding 172.16.0.0/16 to the list. That might allow us to route packets to the other networks.

If you want to go find other routes out to the internet, then we may need 0.0.0.0/0. But I’m not sure how WG will route that, as it’s using the allowed-ips and public keys as a routing table. I guess it may not care, as we only have a 1:1 mapping on each tunnel, and if we can route to the WG interface it’s pretty straightforward.
This is something I hope to test.

Another really beneficial test would be to rebuild this environment using IPv6 and see if things work better, as we wouldn’t have any broadcasts anymore, only unicast and multicast.

As well as trying some other routing protocol in general, like OSPF.

Finally, having to continually adjust allowed-ips, and seemingly having to either open it up more or add more ranges, makes me realise why the wg-dynamic project exists, and why they want to come up with a secure routing protocol to use through the tunnels to do something similar. So let’s keep an eye on that project.

February 10, 2020

Fedora 31 LXC setup on Ubuntu Bionic 18.04

Similarly to what I wrote for Fedora 29, here is how I was able to create a Fedora 31 LXC container on an Ubuntu 18.04 (bionic) laptop.

Setting up LXC on Ubuntu

First of all, install lxc:

apt install lxc
echo "veth" >> /etc/modules
modprobe veth

turn on bridged networking by putting the following in /etc/sysctl.d/local.conf:

net.ipv4.ip_forward=1

and applying it using:

sysctl -p /etc/sysctl.d/local.conf

Then allow the right traffic in your firewall (/etc/network/iptables.up.rules in my case):

# LXC containers
-A FORWARD -d 10.0.3.0/24 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -s 10.0.3.0/24 -j ACCEPT
-A INPUT -d 224.0.0.251 -s 10.0.3.1 -j ACCEPT
-A INPUT -d 239.255.255.250 -s 10.0.3.1 -j ACCEPT
-A INPUT -d 10.0.3.255 -s 10.0.3.1 -j ACCEPT
-A INPUT -d 10.0.3.1 -s 10.0.3.0/24 -j ACCEPT

and apply these changes:

iptables-apply

before restarting the lxc networking:

systemctl restart lxc-net.service

Create the container

Once that's in place, you can finally create the Fedora 31 container:

lxc-create -n fedora31 -t download -- -d fedora -r 31 -a amd64

To see a list of all distros available with the download template:

lxc-create -n foo --template=download -- --list

Once the container has been created, disable AppArmor for it by adding the following to its config file:

lxc.apparmor.profile = unconfined

since the AppArmor profile isn't working at the moment.
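For a privileged container created as root with the default lxcpath, the config file should be at /var/lib/lxc/fedora31/config (an assumption based on LXC defaults; adjust if you have changed lxc.lxcpath), so something like this works:

echo "lxc.apparmor.profile = unconfined" | sudo tee -a /var/lib/lxc/fedora31/config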

Logging in as root

Starting the container in one window:

lxc-start -n fedora31 -F

and attaching to a console:

lxc-attach -n fedora31

to set a root password:

passwd

Logging in as an unprivileged user via ssh

While logged into the console, I tried to install ssh:

$ dnf install openssh-server
Cannot create temporary file - mkstemp: No such file or directory

but it failed because TMPDIR is set to a non-existent directory:

$ echo $TMPDIR
/tmp/user/0

I found a fix and ran the following:

TMPDIR=/tmp dnf install openssh-server

then started the ssh service:

systemctl start sshd.service

Then I installed a few other packages as root:

dnf install vim sudo man

and created an unprivileged user with sudo access:

adduser francois -G wheel
passwd francois

I set this in /etc/ssh/sshd_config:

GSSAPIAuthentication no

to prevent slow ssh logins.

Now login as that user from the console and add an ssh public key:

mkdir .ssh
chmod 700 .ssh
echo "<your public key>" > .ssh/authorized_keys
chmod 644 .ssh/authorized_keys

You can now login via ssh. The IP address to use can be seen in the output of:

lxc-ls --fancy

February 07, 2020

Visualising Phase Noise

A few months ago I was helping Gerhard, OE3GBB, track down some FreeDV 2020 sync issues over the QO-100 satellite.

Along the way, we investigated the phase noise of the QO-100 channel (including Gerhard's Tx and Rx) by sending a carrier signal over the link, then running it through a GNU Octave phase_noise.m script to generate some interesting plots.

Fig 1 shows the spectrum of the carrier, some band pass noise in the SSB channel, and the single sinewave line at about 1500 Hz:

Fig 2 is a close up, where we have shifted the 1500 Hz tone down to 0 Hz. It’s not really a single frequency, but has a noise-like spectrum:

Figure 3 is a polar plot of the I and Q (real and imag) against time. A perfect oscillator with a small frequency offset would trace a neat spiral, but due to the noise it wanders all over the place. Fig 3A shows a close up of the first 5 seconds, where it reverses a few times, like a wheel rotating forwards and backwards at random:


Figure 4 is the “unwrapped phase” in radians. Unwrapping means if we get to -pi we just keep going, rather than wrapping around to pi. A constant slope suggests a constant frequency segment; for example in the first 5 seconds it wanders downwards by about 15 radians, which suggests a frequency of -15/5 = -3 rad/s, or -3/(2*pi) ≈ -0.5 Hz. The upwards slope from about 8 seconds is a positive frequency segment.

Figure 5 is the rate of change phase, in other words the instantaneous frequency offset, which is about -0.5 Hz at 8 seconds, then swings positive for a while:

Why does all this matter? Well phase shift keyed modems like QPSK have to track this phase. We were concerned about the ability of the FreeDV 2020 QPSK modem to track phase over QO-100. You also get similar meandering phase tracks over HF channels.

Turns out the GPS locking on one of the oscillators wasn’t working quite right, leading to step changes in the oscillator phase. So in this case, a hardware problem rather than the QPSK modem.

Links

QO-100 Sync Pull Request (with lots of notes)
FreeDV 2020 over the QO-100 Satellite
Digital Voice Transmission via QO-100 with FreeDV Mode 2020 (Lime Micro article)

A quick reflection on digital for posterity

On the eve of moving to Ottawa to join the Service Canada team (squee!) I thought it would be helpful to share a few things for posterity. There are three things below:

  • Some observations that might be useful
  • A short overview of the Pia Review: 20 articles about digital public sector reform
  • Additional references I think are outstanding and worth considering in public sector digital/reform programs, especially policy transformation

Some observations

Moving from deficit to aspirational planning

Risk! Risk!! Risk!!! That one word is responsible for an incredible amount of fear, inaction, redirection of investment and counter-productive behaviours, especially by public sectors for whom the stakes for the economy and society are so high. But when you focus all your efforts on mitigating risks, you are trying to drive by only using the rear vision mirror, planning your next step based on the issues you’ve already experienced without looking to where you need to be. It ultimately leads to people driving slower and slower, often grinding to a halt, because any action is considered more risky than inaction. This doesn’t really help our metaphorical driver to pick up the kids from school or get supplies from the store. In any case, inaction bears as many risks as no action in a world that is continually changing. For example, if our metaphorical driver was to stop the car in an intersection they will likely be hit by another vehicle, or eventually starve to death.

Action is necessary. Change is inevitable. So public sectors must balance our time between being responsive (not reactive) to change and risks, and being proactive towards a clear goal or future state.

Of course, risk mitigation is what many in government think they most urgently need to address; however, to engage only in risk mitigation is to buy into and perpetuate the myth that the increasing pace of change is itself a bad thing. This is the difference between user polling and user research: users think they need faster horses, but actually they need a better way to transport more people over longer distances, which could lead to alternatives to horses. Shifting from a change-pessimistic framing to change optimism is critical for public sectors to start to build responsiveness into their policy, program and project management. Until public servants embrace change as normal, natural and part of their work, fear and fear-based behaviours will drive reactivism and sub-optimal outcomes.

The OPSI model for innovation would be a helpful tool to ask senior public servants what proportion of their digital investment is in which box, as this will help identify how aspirational versus reactive they are, and how top down or bottom up, noting that there really should be some investment and tactics in all four quadrants.

My observation of many government digital programs is that teams spend a lot of their time doing top down (directed) work that focuses on areas of certainty, but miss out on building the capacity or vision required for bottom up innovation, or anything that genuinely explores and engages in areas of uncertainty. Central agencies and digital transformation teams are in the important and unique position to independently stand back to see the forest for the trees, and help shape systemic responses to all-of-system problems. My biggest recommendation would be for these teams to support public sector partners to embrace change optimism, proactive planning, and responsiveness/resilience in their approaches, so as to be more genuinely strategic and effective in dealing with change, but more importantly, to better plan strategically towards something meaningful for their context.

Repeatability and scale

All digital efforts might be considered through the lens of repeatability and scale.

  • If you are doing something, anything, could you publish it or a version of it for others to learn from or reuse? Can you work in the open for any of your work (not just publish after the fact)? If policy development, new services or even experimental projects could be done openly from the start, they will help drive a race to the top between departments.
  • How would the thing you are considering scale? How would you scale impact without scaling resources? Basically, for anything you do, if you’d need to dramatically scale resources to implement it, then you are not getting an exponential response to the problem.

Sometimes doing non scalable work is fine to test an idea, but actively trying to differentiate between work that addresses symptomatic relief versus work that addresses causal factors is critical, otherwise you will inevitably find 100% of your work program focused on symptomatic relief.

It is critical to balance programs according to both fast value (short term delivery projects) and long value (multi-month/year program delivery), reactive and proactive measures, symptomatic relief and addressing causal factors, and differentiating between program foundations (gov as a platform) and the programs themselves. When governments don’t invest in digital foundations, they end up duplicating infrastructure for each and every program, which leads to the reduction of capacity, agility and responsiveness to change.

Digital foundations

Most government digital programs seem to focus on small experiments, which is great for individual initiatives, but may not lay the reusable digital foundations for many programs. I would suggest that in whatever projects the team embark upon, some effort be made to explore and demonstrate what the digital foundations for government should look like. For example:

  • Digital public infrastructure - what are the things government is uniquely responsible for that it should make available as digital public infrastructure for others to build upon, and indeed for itself to consume. Eg, legislation as code, services registers, transactional service APIs, core information and data assets (spatial, research, statistics, budgets, etc), central budget management systems. “Government as a Platform” is a digital and transformation strategy, not just a technology approach.
  • Policy transformation and closing the implementation gap - many policy teams think the issue of policy intent not being realised is not their problem, so showing the value of multidisciplinary, test-driven and end to end policy design and implementation will dramatically shift digital efforts towards more holistic, sustainable and predictable policy and societal outcomes.
  • Participatory governance - departments need to engage the public in policy, services or program design, so demonstrating the value of participatory governance is key. This is not a nice-to-have, but rather a necessary part of delivering good services. Here is a recent article with some concepts and methods to consider, and the team needs capabilities to enable this that aren’t just communications skills, but rather genuine subject matter expertise and engagement.
  • Life Journey programs - putting digital transformation efforts, policies, service delivery improvements and indeed any other government work in the context of life journeys helps to make it real, gets multiple entities that play a part in that journey naturally involved and invested, and drives horizontal collaboration across and between jurisdictions. New Zealand led the way in this, NSW Government extended the methodology, Estonia has started the journey, and they are systemically benefiting.
  • I’ve spoken about designing better futures, and I do believe this is also a digital foundation, as it provides a lens through which to prioritise, implement and realise value from all of the above. Getting public servants to “design the good” from a citizen perspective, a business perspective, an agency perspective, Government perspective and from a society perspective helps flush out assumptions, direction and hypotheses that need testing.

The Pia Review

I recently wrote a series of 20 articles about digital transformation and reform in public sectors. It was something I did for fun, in my own time, as a way of both recording and sharing my lessons learned from 20 years working at the intersection of tech, government and society (half in the private sector, half in the public sector). I called it the Public Sector Pia Review and I’ve been delighted by how it has been received, with a global audience republishing, sharing, commenting, and most important, starting new discussions about the sort of public sector they want and the sort of public servants they want to be. Below is a deck that has an insight from each of the 20 articles, and links throughout.

This is not just meant to be a series about digital, but rather about the matter of public sector reform in the broadest sense, and I hope it is a useful contribution to better public sectors, not just better public services.

There is also a collated version of the articles in two parts. These compilations are linked below for convenience, and all articles are linked in the references below for context.

  • Public-Sector-Pia-Review-Part-1 (6MB PDF) — essays written to provide practical tips, methods, tricks and ideas to help public servants do their best possible work today for the best possible public outcomes; and
  • Reimagining government (will link once published) — essays about possible futures, the big existential, systemic or structural challenges and opportunities as I’ve experienced them, paradigm shifts and the urgent need for everyone to reimagine how they best serve the government, the parliament and the people, today and into the future.

A huge thank you to the Mandarin, specifically Harley Dennett, for the support and encouragement to do this, as well as thanks to all the peer reviewers and contributors, and of course my wonderful husband Thomas who peer reviewed several articles, including the trickier ones!

My digital references and links from 2019

Below are a number of useful references for consideration in any digital government strategy, program or project, including some of mine :)

General reading

Life Journeys as a Strategy

Life Journey programs, whilst largely misunderstood and quite new to government, provide a surprisingly effective way to drive cross agency collaboration, holistic service and system design, prioritisation of investment for best outcomes, and a way to really connect policy, services and human outcomes with all involved on the usual service delivery supply chains in public sectors. Please refer to the following references, noting that New Zealand were the first to really explore this space, and are being rapidly followed by other governments around the world. Also please note the important difference between customer journey mapping (common), customer mapping that spans services but is still limited to a single agency/department (also common), and true life journey mapping which necessarily spans agencies, jurisdictions and even sectors (rare) like having a child, end of life, starting school or becoming an adult.

Policy transformation

Data in Government

Designing better futures to transform towards

If you don’t design a future state to work towards, then you end up just designing reactively to current, past or potential issues. This leads to a lack of strategic or cohesive direction, which leads to systemic fragmentation and ultimately system ineffectiveness and cannibalism. A clear direction isn’t just about principles or goals; it needs to be something people can see, connect with, align their work towards (even if they aren’t in your team), and get enthusiastic about. This is how you create change at scale: when people buy into the agenda, at all levels, and start naturally walking in the same direction regardless of their role. Here are some examples for consideration.

Rules as Code

Please find the relevant Rules as Code links below for easy reference.

Better Rules and RaC examples

February 04, 2020

Deleted Mapped Files

On a Linux system if you upgrade a shared object that is in use any programs that have it mapped will list it as “(deleted)” in the /proc/PID/maps file for the process in question. When you have a system tracking the stable branch of a distribution it’s expected that most times a shared object is upgraded it will be due to a security issue. When that happens the reasonable options are to either restart all programs that use the shared object or to compare the attack surface of such programs to the nature of the security issue. In most cases restarting all programs that use the shared object is by far the easiest and least inconvenient option.

Shared objects are generally used a lot in a typical Linux system. This can be good for performance (more cache efficiency and less RAM use) and is also good for security, as buggy code can be replaced for the entire system by replacing a single shared object. Sometimes it’s obvious which processes will be using a shared object (e.g. your web server using a PHP shared object), but other times many processes that you don’t expect will use it.

I recently wrote “deleted-mapped.monitor” for my etbemon project [1]. This checks for shared objects that are mapped and deleted and gives separate warning messages for root and non-root processes. If you have the unattended-upgrades package installed then your system can install security updates without your interaction and then the monitoring system will inform you if things need to be restarted.
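For illustration, here is a minimal Python sketch of the same kind of check (the real monitor is part of etbemon and is implemented differently; this version just scans /proc/PID/maps for “(deleted)” entries and doesn’t filter out the harmless cases listed at the end of this post):

#!/usr/bin/env python3
# Minimal sketch (not the etbemon monitor): list processes that still map
# a deleted file, e.g. a shared object replaced by a security update.
# Run as root to see all processes; unprivileged you only see your own.
import glob

def deleted_mappings():
    results = {}
    for maps_path in glob.glob('/proc/[0-9]*/maps'):
        pid = maps_path.split('/')[2]
        files = set()
        try:
            with open(maps_path) as maps:
                for line in maps:
                    if line.rstrip().endswith('(deleted)'):
                        # the 6th field of a maps line is the pathname
                        files.add(line.split(None, 5)[5].strip())
        except (PermissionError, FileNotFoundError):
            continue  # process exited, or we lack permission
        if files:
            results[pid] = files
    return results

if __name__ == '__main__':
    for pid, files in sorted(deleted_mappings().items(), key=lambda x: int(x[0])):
        try:
            with open('/proc/%s/comm' % pid) as f:
                comm = f.read().strip()
        except (PermissionError, FileNotFoundError):
            comm = '?'
        print(pid, comm, ', '.join(sorted(files)))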

The Debian package debian-goodies has a program checkrestart that will tell you what commands to use to restart daemons that have deleted shared objects mapped.

Now to solve the problem of security updates on a Debian system you can use unattended-upgrades to apply updates, deleted-mapped.monitor in etbemon to inform you that programs need to be restarted, and checkrestart to tell you the commands you need to run to restart the daemons in question.

If anyone writes a blog post about how to do this on a non-Debian system please put the URL in a comment.

While writing the deleted-mapped.monitor I learned about the following common uses of deleted mapped files:

  • /memfd: is for memfd https://dvdhrm.wordpress.com/tag/memfd/ [2]
  • /[aio] is for asynchronous IO I guess, haven’t found good docs on it yet.
  • /home is used for a lot of harmless mapping and deleting.
  • /run/user is used for systemd dconf stuff.
  • /dev/zero is different for each map and thus looks deleted.
  • /tmp/ is used by Python (and probably other programs) to create temporary files there for mapping.
  • /var/lib is used for lots of temporary files.
  • /i915 is used by some X apps on systems with Intel video, I don’t know why.

February 03, 2020

Social Media Sharing on Blogs

My last post was read directly (as opposed to reading through Planet feeds) a lot more than usual due to someone sharing it on lobste.rs. Presumably the people who read it that way benefited from reading it and I got a couple of unusually insightful comments from people who don’t usually comment on my blog. The lobste.rs sharing was a win for everyone.

There are a variety of plugins for social media sharing, most of which allow organisations like Facebook to track people who read your blog which is why I haven’t been using them.

Are there good ways of allowing people to easily share your blog posts which work in a reasonable way by not allowing much tracking of users unless they actually want to share content?

February 02, 2020

LUV Meet & Greet and General Discussion

Feb 4 2020 18:30
Feb 4 2020 20:30
Location: 
Kathleen Syme Library, 251 Faraday Street Carlton VIC 3053

LUV Meet & Greet and General Discussion

This is a casual gathering of LUVers to meet and greet and have a general discussion!

There are no talks scheduled, but there might be lightning talks. So, if you have a topic of interest you'd like to share with others or have a challenge that you want to get experts' opinions on, this would be a great opportunity.

If you have never used Linux or have never been to a LUV meeting, you are more than welcome to join us! We look forward to seeing you at the meeting.

Where to find us

Many of us like to go for dinner nearby in Lygon St. after the meeting. Please let us know if you'd like to join us!

Linux Users of Victoria is a subcommittee of Linux Australia.


lca2020 ReWatch 2020-02-02

As I was an organiser of the conference this year, I didn’t get to see many talks. Fortunately many of the talks were recorded, so I get to watch the conference well after the fact.

Conference Opening

That white balance on the lectern slides is indeed bad; I really should get around to adding this as a suggestion to the logos documentation. (With some help, I put up all the lectern covers; it was therapeutic and rush free.)

I actually think there was a lot of information in this introduction. Perhaps too much?

OpenZFS and Linux

A nice update on where zfs is these days.

Dev/Ops relationships, status: It’s Complicated

A bit of a war story about production systems, leading to a moment of empathy.

Samba 2020: Why are we still in the 1980s for authentication?

There are a lot of old security standards that are showing their age, and a lot of modern security standards, but which to choose?

Tyranny of the Clock

A very interesting problem solving adventure, with a few nuggets of interesting information about tools and techniques.

Configuration Is (riskier than?) Code

Because configuration files are parsed by a program, and the program changes how it runs depending on the contents of that configuration file, every program that parses configuration files is basically an interpreter, and thus every configuration file is basically a program. So, configuration is code, and we should be treating configuration like we do code, e.g. revision control, commenting, testing, review.
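As one small illustration of that idea, here is a hedged Python sketch of treating configuration like code: a unit test for a hypothetical app.ini that runs in CI alongside the application’s own tests (the file name and keys are made up for the example):

# test_config.py - run with the rest of the test suite, e.g. python3 -m unittest
import configparser
import unittest

class TestAppConfig(unittest.TestCase):
    def setUp(self):
        self.cfg = configparser.ConfigParser()
        self.cfg.read('app.ini')  # hypothetical config file under revision control

    def test_required_sections(self):
        self.assertIn('server', self.cfg)

    def test_port_is_sane(self):
        port = self.cfg.getint('server', 'port')
        self.assertTrue(1 <= port <= 65535)

if __name__ == '__main__':
    unittest.main()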

Easy Geo-Redundant Handover + Failover with MARS + systemd

Using a local process organiser to handle a cluster, interesting, not something I’d really promote. Not the best video cutting in this video, lots of time with the speaker pointing to his slides offscreen.

 

Load Average Monitoring

For my ETBE-Mon [1] monitoring system I recently added a monitor for the Linux load average. The Unix load average isn’t a very good metric for monitoring system load, but it’s well known and easy to use. I’ve previously written about the Linux load average and how it’s apparently different from other Unix like OSs [2]. The monitor is still named loadavg but I’ve now made it also monitor memory usage, because excessive memory use and high load average are often correlated.

For issues that might be transient it’s good to have a monitoring system give a reasonable amount of information about the problem so it can be diagnosed later on. So when the load average monitor gives an alert I have it display a list of D state processes (if any), a list of the top 10 processes using the most CPU time if they are using more than 5%, and a list of the top 10 processes using the most RAM if they are using more than 2% total virtual memory.

For documenting the output of the free(1) command (or /proc/meminfo when writing a program to do it) the best page I found was this StackExchange page [3]. So I compare MemAvailable+SwapFree to MemTotal+SwapTotal to determine the percentage of virtual memory used.
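As a worked example of that calculation, here is a minimal Python sketch (not the etbemon code) that reads /proc/meminfo and reports the percentage of virtual memory used:

# Compare MemAvailable+SwapFree to MemTotal+SwapTotal, as described above.
def meminfo():
    info = {}
    with open('/proc/meminfo') as f:
        for line in f:
            key, value = line.split(':', 1)
            info[key] = int(value.split()[0])  # values are in kB
    return info

m = meminfo()
used_pct = 100 * (1 - (m['MemAvailable'] + m['SwapFree'])
                      / (m['MemTotal'] + m['SwapTotal']))
print('virtual memory used: %.1f%%' % used_pct)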

Any suggestions on how I could improve this?

The code is in the recent releases of etbemon, it’s in Debian/Unstable, on the project page on my site, and here’s a link to the loadave.monitor script in the Debian Salsa Git repository [4].

Another close-to-upstream Blackbird Firmware Build

A few weeks ago (okay, close to six), I put up a firmware build for the Raptor Blackbird with close-to-upstream firmware (see here).

Well, I’ve done another build! It’s current op-build (as of this morning), but my branch with patches for the Raptor Blackbird. The skiboot patch is there, as is the SBE speedup patch. Current kernel (works fine with my hardware), current petitboot, and the machine-xml which is straight from Raptor but in my repo.

Versions of everything are:

$ lsprop /sys/firmware/devicetree/base/ibm,firmware-versions/
skiboot          "v6.5-209-g179d53df-p4360f95"
bmc-firmware-version
		 "0.01"
occ              "3ab2921"
hostboot         "779761d-pe7e80e1"
buildroot        "2019.05.3-14-g17f117295f"
capp-ucode       "p9-dd2-v4"
machine-xml      "site_local-stewart-a0efd66"
hostboot-binaries
		 "hw011120a.opmst"
sbe              "166b70c-p06fc80c"
hcode            "hw011520a.opmst"
petitboot        "v1.11"
phandle          000005d0 (1488)
version          "blackbird-v2.4-415-gb63b36ef"
linux            "5.4.13-openpower1-pa361bec"
name             "ibm,firmware-versions"

You can download all the bits (including debug tarball) from https://www.flamingspork.com/blackbird/stewart-blackbird-2-images/ and follow the instructions for trying it out or flashing blackbird.pnor.

Again, would love to hear how it goes for you!

February 01, 2020

Interviewing hints (or, so you’ve been laid off…)


This post is an attempt to collect a set of general hints and tips for resumes and interviews. It is not concrete truth though, like all things this process is subjective and will differ from place to place. It originally started as a Google doc shared around a previous workplace during some layoffs, but it seems more useful than that so I am publishing it publicly.

I’d welcome comments if you think it will help others.

So something bad happened

I have the distinction of having been through layoffs three times now. I think there are some important first steps:

  • Take a deep breath.
  • Hug your loved ones and then go and sweat on something — take a walk, go to the gym, whatever works for you. Research shows that exercise is a powerful mood stabiliser.
  • Make a plan. Who are you going to apply with? Who could refer you? What do you want to do employment wise? Updating your resume is probably a good first step in that plan.
  • Treat finding a job as your job. You probably can’t do it for eight hours a day, but it should be your primary goal for each “workday”. Have a todo list, track things on that list, and keep track of status.

And remember, being laid off isn’t about you, it is about things outside your control. Don’t take it as a reflection on your abilities.

Resumes

  • The goal of a resume is to get someone to want to interview you. It is not meant to be a complete description of everything you’ve done. So, keep it short and salesy (without lying through oversimplification!).
  • Resumes are also cultural — US firms tend to expect a short summary (two pages), while Australian firms seem to expect something longer and more detailed. So, ask your friends if you can see their resumes to get a sense of the right style for the market you’re operating in. It is possible you’ll end up with more than one version if you’re applying in two markets at once.
  • Speaking of friends, referrals are gold. Perhaps look through your LinkedIn and other social media and see where people you’ve formerly worked with are now. If you have a good reputation with someone and they’re somewhere cool, ask them to refer you for a job. It might not work, but it can’t hurt.
  • Ratings for skills on LinkedIn help recruiters find you. So perhaps rate your friends for things you think they’re good at and then ask them to return the favour?

Interviews in general

The soft interview questions we all get asked:

  • I would expect to be asked what I’ve done in my career — an “introduce yourself” moment. So try and have a coherent story that is short but interesting — “I’m a system admin who has been working on cloud orchestration and software defined networking for Australia’s largest telco” for example.
  • You will probably be asked why you’re looking for work too. I think there’s no shame in honesty here, something like “I worked for a small systems integrator that did amazing things, but the main customer has been doing large layoffs and stopped spending”.
  • You will also probably be asked why you want this job / want to work with this company. While everyone really knows it is because you enjoy having money, find other things beforehand to say instead. “I want to work with Amazon because I love cloud, Amazon is kicking arse in that space, and I hear you have great people I’d love to work with”.

Note here: the original version of the above point said “I’d love to learn from”, but it was mentioned on Facebook that the flow felt one way there. It has been tweaked to express a desire for a two way flow of learning.

“What have you done” questions: the reality is that almost all work is collaborative these days. So, have some stories about things you’ve personally done and are proud of, but also have some stories of delivering things bigger than one person could do. For example, perhaps the ansible scripts for your project were super cool and mostly you, but perhaps you should also describe how the overall project was important and wouldn’t have worked without your bits.

Silicon Valley interviews: organizations like Google, Facebook, et cetera want to be famous for having hard interviews. Google will deliberately probe until they find an area you don’t know about and then dig into that. Weirdly, they’re not doing that to be mean — they’re trying to gauge how you respond to new situations (and perhaps stress). So, be honest if you don’t know the answer, but then offer up an informed guess. For example, I used to ask people about system calls and strace. We’d keep going until we hit the limit of what they understood. I’d then stop and explain the next layer and then ask them to infer things — “assuming that things work like this, how would this probably work”? It is important to not panic!

Interviews as a sysadmin

  • Interviewers want to know about your attitude as well as your skills. As sysadmins, sometimes we are faced with high pressure situations  — something is down and we need to get it back up and running ASAP. Have a story ready to tell about a time something went wrong. You should demonstrate that you took the time to plan before acting, even in an emergency scenario. Don’t leave the interviewer thinking you’ll be the guy who will accidentally delete everyone’s data because you’re in a rush.
  • An understanding of how the business functions and why “IT” is important is needed. For example, if you get asked to explain what a firewall is, be sure to talk about how it relates to “security policy” as well as the technical elements (ports, packet inspection & whatnot).
  • Your ability to learn new technologies is as important as the technologies you already know.

Interviews as a developer

  • I think people look for curiosity here. Everyone will encounter new things, so they want to hear that you like learning, are a self-starter, and can do new stuff. So, for example, having recently taken and passed the CKA exam would be a great thing to mention.
  • You need to have examples of things you have built and why those were interesting. Was the thing poorly defined before you built it? Was it experimental? Did it have a big impact for the customer?
  • An open source portfolio can really help — it means people can concretely see what you’re capable of instead of just playing 20 questions with you. If you don’t have one, don’t start new projects — go find an existing project to contribute to. It is much more effective.


January 31, 2020

Audiobooks – January 2020

I’ve decided to change my rating system

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average, in the middle 70%
  • 2/5 = Disappointing
  • 1/5 = Did not like at all

Far Futures edited by Gregory Benford

5 hard SF stories set in the distant (10,000 years+) future. I thought they were all pretty good. Would recommend. 4/5

Farmer Boy: Little House Series, Book 2 by Laura Ingalls Wilder

A year in the life of a 9-year-old boy on a farm in 1860s New York State. Lots of hard work and chores. His family is richer than Laura’s from the previous book. 3/5

Astrophysics for People in a Hurry by Neil DeGrasse Tyson

A quick (4h) overview and introduction of our current understanding of the universe. A nice little introduction to the big stuff. 3/5

The Pioneers: The Heroic Story of the Settlers Who Brought the American Ideal West by David McCullough

The story of five of the first settlers of Marietta, Ohio from 1788 and the early history of the town. Not a big book or wide in scope, but it works okay within its limits. 4/5

1971, Never a Dull Moment: Rock’s Golden Year by David Hepworth

A month-by-month walk through musical (and some other) history for 1971. Lots of gossip, backstories and history-changing (or not) moments. 4/5

Digital Minimalism: Choosing a Focused Life in a Noisy World by Cal Newport

A guide to cutting down electronic distractions (especially social media) to those that make your life better and help towards your goals. 3/5


Where next: Spring starts when a heartbeat’s pounding…

Today I’m delighted to announce the next big adventure for my little family and I.

For my part, I will be joining the inspirational, aspirational and world leading Service Canada to help drive the Benefits Delivery Modernization program with Benoit Long, Tammy Belanger and their wonderful team, in collaboration with our wonderful colleagues across the Canadian Government! This enormous program aims to dramatically improve the experience of Canadians with a broad range of government services, whilst transforming the organization and helping create the digital foundations for a truly responsive, effective and human-centred public sector :)

This is a true digital transformation opportunity which will make a difference in the lives of so many people. It provides a chance to implement and really realise the benefits of human-centred service design, modular architecture (and Government as a Platform), Rules as Code, data analytics, life journey mapping, and all I have been working on for the last 10 years. I am extremely humbled and thankful for the chance to work with and learn from such a forward thinking team, whilst being able to contribute my experience and expertise to such an important and ambitious agenda.

I can’t wait to work with colleagues across ESDC and the broader Government of Canada, as well as from the many innovative provincial governments. I’ve been lucky enough to attend FWD50 in Ottawa for the last 3 years, and I am consistently impressed by the digital and public sector talent in Canada. Of course, because Canada is one of the “Digital Nations“, it also presents a great opportunity to collaborate closely with other leading digital governments, as I also found when working in New Zealand.

We’ll be moving to Ottawa in early March, so we will see everyone in Canada soon, and will be using the next month or so packing up, spending time with Australian friends and family, and learning about our new home :)

My husband and little one are looking forward to learning about Canadian and Indigenous cultures, learning French (and hopefully some Indigenous languages too, if appropriate!), introducing more z’s into my English, experiencing the cold (yes, snow is a novelty for Australians) and contributing how we can to the community in Ottawa. Over the coming years we will be exploring Canada and I can’t wait to share the particularly local culinary delight that is a Beavertail (a large, flat, hot doughnut like pastry) with my family!

For those who didn’t pick up the reference, the blog title had dual meaning: we are of course heading to Ottawa in the Spring, having had a last Australian Summer for a while (gah!), and it also was a little call out to one of the great Canadian bands, that I’ve loved for years, the Tragically Hip :)

January 30, 2020

Links January 2020

C is Not a Low Level Language [1] is an insightful article about the problems with C and the overall design of most current CPUs.

Interesting article about how the Boeing 737 MAX failure started with a takeover by MBA apparatchiks [2].

Interesting article about the risk of blood clots in space [3]. Widespread human spaceflight is further away than most people expect.

Wired has an insightful article about why rich people are so mean [4]. Also some suggestions for making them less mean.

Google published interesting information about their Titan security processor [5]. It’s currently used on the motherboards of GCP servers and internal Google servers. It would be nice if Google sold motherboards with a version of this.

Interesting research on how the alleged Supermicro motherboard backdoor could have worked [6]. It shows that while we may never know if the alleged attack took place, such things are proven to be possible. In security we should assume that every attack that is possible is carried out on occasion. It might not have happened when people claim it happened, but it probably happened to someone somewhere. Also we know that TAO carried out similar attacks.

Arstechnica has an interesting article about cracking old passwords used by Unix pioneers [7]. In the old days encrypted passwords weren’t treated as secrets (/etc/passwd is world readable and used to have the encrypted passwords) and some of the encrypted passwords were included in source archives and have now been cracked.

Jim Baker (former general counsel of the FBI) wrote an insightful article titled Rethinking Encryption [8]. Lots of interesting analysis of the issues related to privacy vs the ability of the government to track criminals.

The Atlantic has an interesting article The Coalition Out to Kill Tech as We Know It [9] about the attempts to crack down on the power of big tech companies. Seems like good news.

The General Counsel of the NSA wrote an article “I Work for N.S.A. We Cannot Afford to Lose the Digital Revolution” [10].

Thoughts and Prayers by Ken Liu is an insightful story about trolling and NRA types [11].

Cory Doctorow wrote an insightful Locus article about the lack of anti-trust enforcement in the tech industry and its free speech implications, titled “Inaction is a Form of Action” [12].

January 22, 2020

linux.conf.au 2020 recap

It's that time of year again. Most of OzLabs headed up to the Gold Coast for linux.conf.au 2020.

linux.conf.au is one of the longest-running community-led Linux and Free Software events in the world, and attracts a crowd from Australia, New Zealand and much further afield. OzLabbers have been involved in LCA since the very beginning and this year was no exception with myself running the Kernel Miniconf and several others speaking.

The list below contains some of our highlights that we think you should check out. This is just a few of the talks that we managed to make it to - there's plenty more worthwhile stuff on the linux.conf.au YouTube channel.

We'll see you all at LCA2021 right here in Canberra...

Keynotes

A couple of the keynotes really stood out:

Sean is a forensic structural engineer who shows us a variety of examples, from structural collapses and firefighting disasters, where trained professionals were blinded by their expertise and couldn't bring themselves to do things that were obvious.

There's nothing quite like cryptography proofs presented to a keynote audience at 9:30 in the morning. Vanessa goes over the issues with electronic voting systems in Australia, and especially internet voting as used in NSW, including flaws in their implementation of cryptographic algorithms. There continues to be no good way to do internet voting, but with developments in methodologies like risk-limiting audits there may be reasonably safe ways to do in-person electronic voting.

OpenPOWER

There was an OpenISA miniconf, co-organised by none other than Hugh Blemings of the OpenPOWER Foundation.

Anton (on Mikey's behalf) introduces the Power OpenISA and the Microwatt FPGA core which has been released to go with it.

Anton live demos Microwatt in simulation, and also tries to synthesise it for his FPGA but runs out of time...

Paul presents an in-depth overview of the design of the Microwatt core.

Kernel

There were quite a few kernel talks, both in the Kernel Miniconf and throughout the main conference. These are just some of them:

There have been many cases where we've introduced a syscall only to find out later on that we need to add some new parameters - how do we make our syscalls extensible so we can add new parameters later on without needing to define a whole new syscall, while maintaining both forward and backward compatibility? It turns out it's pretty simple but needs a few more kernel helpers.

There are a bunch of tools out there which you can use to make your kernel hacking experience much more pleasant. You should use them.

Among other security issues with container runtimes, using procfs to setup security controls during the startup of a container is fraught with hilarious problems, because procfs and the Linux filesystem API aren't really designed to do this safely, and also have a bunch of amusing bugs.

Control Flow Integrity is a technique for restricting exploit techniques that hijack a program's control flow (e.g. by overwriting a return address on the stack (ROP), or overwriting a function pointer that's used in an indirect jump). Kees goes through the current state of CFI supporting features in hardware and what is currently available to enable CFI in the kernel.

Linux has supported huge pages for many years, which has significantly improved CPU performance. However, the huge page mechanism was driven by hardware advancements and is somewhat inflexible, and it's just as important to consider software overhead. Matthew has been working on supporting more flexible "large pages" in the page cache to do just that.

Spoiler: the magical fantasy land is a trap.

Community

Lots of community and ethics discussion this year - one talk which stood out to me:

Bradley and Karen argue that while open source has "won", software freedom has regressed in recent years, and present their vision for what modern, pragmatic Free Software activism should look like.

Other

Among the variety of other technical talks at LCA...

Quantum compilers are not really like regular classical compilers (indeed, they're really closer to FPGA synthesis tools). Matthew talks through how quantum compilers map a program on to IBM's quantum hardware and the types of optimisations they apply.

Clevis and Tang provide an implementation of "network bound encryption", allowing you to magically decrypt your secrets when you are on a secure network with access to the appropriate Tang servers. This talk outlines use cases and provides a demonstration.

Christoph discusses how to deal with the hardware and software limitations that make it difficult to capture traffic at wire speed on fast fibre networks.

January 19, 2020

Annual Penguin Picnic, January 25, 2020

Jan 25 2020 12:00
Jan 25 2020 16:00
Location: 
Yarra Bank Reserve, Hawthorn

The Linux Users of Victoria Annual Penguin Picnic will be held on Saturday, January 25, starting at 12 noon at the Yarra Bank Reserve, Hawthorn.  In the event of hazardous levels of smoke or other dangerous weather, we will announce an alternate indoor location.

LUV would like to acknowledge Infoxchange for the Richmond venue.

Linux Users of Victoria Inc. is a subcommittee of Linux Australia.


LUV February 2020 Workshop: making and releasing films with free software

Feb 15 2020 12:30
Feb 15 2020 16:30
Location: 
Infoxchange, 33 Elizabeth St. Richmond

Film Freedom: making and releasing films with free software

Film Freedom is a documentation and development project to make filmmaking with free-software and releasing via free culture funding models more accessible. It currently includes the following development projects:

The meeting will be held at Infoxchange, 33 Elizabeth St. Richmond 3121.  Late arrivals please call (0421) 775 358 for access to the venue.

LUV would like to acknowledge Infoxchange for the venue.

Linux Users of Victoria is a subcommittee of Linux Australia.


January 17, 2020

Linux.conf.au 2020 – Friday – Lightning Talks and Close

Steve

  • Less opportunity for Intern type stuff
  • Trying to build team with young people
  • Internships
  • They Need opportunities
  • Think about giving a chance

Martin

  • Secure Scuttlebutt
  • p2p social web
  • more like just a protocol
  • scuttlebutt.nz
  • Protocol used for other stuff.

Emma

  • LCA from my perspective

Mike Bailey

  • Pipe-skimming
  • Enhancing the UI of CLI tools
  • take first arg in pipe and sends to the next tool

Aleks

  • YOGA Book c930
  • Laptop with e-ink display for keyboard
  • Used wireshark to look at USB under Windows
  • Created a device driver based on packets windows was sending
  • Linux recognised it as a USB Keyboard and just works
  • Added new feature and
  • github.com/aleksb

Evan

  • Two factor authentication
  • It’s hard

Keith

  • Snekboard
  • Crowdsourced hardware project
  • crowdsupply.com/keith-packard/snekboard
  • $79 campaign, ends 1 March

Adam and Ben

  • idntfrs
  • bytes are not expensive any more

William

  • Root cause of swiss cheese

Colin

  • OWASP
  • For every person they taught about a vulnerability, 2 people appeared who write vulnerable code
  • WebGoat
  • Holds your hand through the OWASP vulnerability list. Exploit and fix
  • teaching, playing to break, go back and fix
  • Forks in various languages

Leigh

  • Masculinity
  • Leave it better than you found it

David

  • Fixing NAT
  • with more NAT

Caitlin

  • Glitter!
  • conferences should be playful
  • meetups can be friendly
  • Ways to introduce job
  • Stickers

Miles

  • Lies, Damn lies and data science
  • Hipster statistics
  • LCA 2021 is in Canberra


Linux.conf.au 2020 – Friday – Session 1 – Protocols / LumoSQL

The Fight to Keep the Watchers at Bay – Mark Nottingham

Disclaimer: I am not a security person, But in some sense we are all security people.

Why Secure the Internet

  • In the beginning it was just researchers and academics
  • Snowden was a watershed moment
  • STRINT Workshop in 2014
  • It’s not just your website, it’s the Javascript that somebody is injecting in front of it.

What has happened so far?

  • http -> https
    • In 2010 even major services were unencrypted; demo of the Firesheep program grabbing cookies and auth off WiFi
    • Injecting cookies in http flows
    • Needed to shift needle to https
    • http/2: big push to make it encrypted-only; it isn’t actually, though browsers only support it over https.
    • “Secure Contexts” cool features only https
  • Problem: Mixed Content
    • “Upgrading Insecure Requests” allow ad-hoc by pages
    • HTTPS is slow – istlsfastyet.com
    • Improvement in speed of implementations
    • Let’s Encrypt
  • Around 85-90% https as of Early 2020
  • Some people were unhappy
    • Slow satellite internet providers said they needed middle boxes to optimise http over slow links
    • People who did http shared caching
  • TLS 1.2 -> TLS 1.3
    • Complex old protocol
    • Implementation monoculture
    • Outdated Crypto
    • TLS 1.3
      • Simplify where possible
      • encrypt most of handshake
      • get good review of protocol
      • At around 30%
      • Lots of implementations
    • Some unhappy. Financial institutions needed to sniff secure transactions (and had bought expensive appliances to do this)
      • They ended up forking their own protocol
  • TCP -> QUIC
    • TCP is unencrypted, lots of leaks and room for in-betweens to play around
    • QUIC – all encrypted
    • Spin Bit – single bit of data can be used by providers to estimate packet loss and delay.
  • DNS -> DOH
    • Lots of click data sold by ISPs
    • Countries hijacking DNS to block stuff
    • DNS over https can be co-located with a popular website
    • Some were unhappy
      • Lots of pushback from governments and big companies
      • Industry unhappy about concentration of DNS handling
      • Have to decide who to trust
  • SNI -> Encrypted SNI
    • Working progress, very complex
    • South Korea unhappy, was using it to block people
  • Traffic Analysis
    • Packet length, frequency, destinations
    • TOR traffic is hard to tell apart. Looking at using multiplexing and fixed-length records
  • But the ends
    • Customer compromised or provider compromised (or otherwise sharing data)
  • Observations
    • Cost and Control
      • Cost: big technology spend is now obsolete
      • Control: some people want to do stuff on the network
    • We have to design the Internet for the pessimistic case
    • You can’t expose application data to the path anymore
    • Well-defined interfaces and counterbalanced roles
    • Technology and Policy need to work together and keep each other in check
    • Making some people unhappy means you need some guiding principles

LumoSQL – updating SQLite for the modern age – Dan Shearer

LumoSQL = SQLite + LMDB – WAL

SQLite

  • “Is a replacement for fopen()”
  • Key/Value stores.
    • Everyone used Sleepycat BDB – bought by Oracle and the licence changed
    • Many switched to LMDB (approx 2010)
  • Howard Chu's 2013 SQLightning was faster than SQLite, but the changes were not adopted into SQLite

LumoSQL

  • Funded by NLNet Foundation
  • Dan Shearer and Keith Maxwell

What isn’t working with SQLite ?

  • Inappropriate/unsupported use cases
  • Speed
  • Corruption
  • Encryption

What hasn’t been done so far

  • Located code, started on github.com/LumoSQL
  • Benchmarking tool for versions matrix
  • Mapped out how the key/value store works
    • So a different backend can be dropped in.
  • Fixed bugs with the port and with lmdb

What’s Next

  • First Release Feb 2020
  • Add Multiple backends
  • Implement two database advances


January 16, 2020

Linux.conf.au 2020 – Thursday – Session 3 – Software Freedom lost / Stream Processing

Open Source Won, but Software Freedom Hasn’t Yet: A Guide & Commiseration Session for FOSS activists by Bradley M. Kuhn, Karen Sandler

Larger events elsewhere tend to be corporate sponsored, so they probably wouldn't accept a talk like this

Free Software Purists

  • About 2/3 of the audience have spent some time going out of their way to use free software
  • A few years ago you could only use free software
  • To watch TV. I can use DRM or I can pirate. Both are problems.
  • The web is a very efficient way to install proprietary software (javascript) on your browser
  • Most people don’t even see that or think about it

Laptops

  • 2010-era Laptops are some of the last that are fully free-software
  • Later have firmware and other stuff that is all closed.
  • HTC Dream – some firmware on phone bit but rest was free software

Electronic Coupons

  • Coupons are all digital. You need to run an app that tracks all your purchases
  • “As a Karen I sometimes ask the store to just let me have the coupon, even though it is expired”
  • Couldn’t install Disneyland App on older phones. So unable to bypass lines etc.

Proprietary dumping ground

  • Bradley had a device. He installed all the proprietary apps on it rather than on his main phone
  • But it’s a bad idea since all the tracking stuff can talk to each other.

Hypocrisy of tradition free software advocacy

  • Do not criticise people for using proprietary software
  • It is almost impossible to live your life without using it
  • It should be an aspirational goal
  • Person should not be seen as a failure if they use it
  • Asking others to use it instead of you is worse than using it yourself
  • Karen’s Laptop: It runs Debian but it is only “98% free”

Paradox: There more FOSS there is, the less software freedom we actually have in our technology

  • But there is less software freedom than there was in 2006
  • Because everything is computerized, a lot more than 15 years ago.
  • More things are going into Linux that big companies want in datacentres, rather than what tinkerers in their homes want.

What are the right choices?

  • Be mindful
  • Try when you can to use free software. Make small choices that support software freedom
  • Shine a light on the problem
  • Don’t let the shame you feel about using proprietary software paralyze you
  • and don’t let the problems we face overwhelm you into inaction
  • Re-prioritize your FOSS development time.
    • Is it going to give more people freedom in the world?
    • Maybe try to do a bit in your free time.
  • Support each other
  • FAIF.us podcast

Advanced Stream Processing on the Edge by Eduardo Silva

Data is everywhere. We need to be able to extract value from it

  • Put it all in a database to extract value
  • Challenge: Data comes from all sorts of places
    • More data -> more bandwidth -> more resource required
    • Delays as more data ingested
  • Challenge: lots of different formats

Ideal Tool

  • Collect from different sources
  • convert unstructured to structured
  • enrichment and filtering
  • multiple destinations like database or cloud services

Fluentbit

  • Started in 2015
  • Origin: a lightweight log processor for the embedded space
  • Ended up being used in cloud space
  • Written in C
  • Low mem and CPU
  • Pluggable architecture
  • input -> parser -> filter -> buffer -> routing -> output

Structure Messages

  • Unstructured to structured
  • Metadata
  • Can add tags to data on input, and use them later for routing

Stream processing

  • Perform processing while the data is still in motion
  • Faster data processing
  • in Memory
  • No tables
  • No indexing
  • Receive structured data, expose a query language
  • Normally done centrally

Doing this on the edge

  • Offload computation from servers to data collectors
  • Only send required data to the cloud (see the sketch after this list)
  • Use a SQL-like language to write the queries
  • Integrated with fluent core
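To make the idea concrete, here is a minimal Python sketch of edge-side stream processing (this is not Fluent Bit or its SQL-like language, just an illustration of aggregating records in a time window at the collector so only small aggregates are forwarded to the cloud):

import time
from collections import defaultdict

def aggregate_windows(records, window_seconds=10):
    """Group records into fixed time windows and aggregate per tag,
    so only the aggregates (not the raw records) leave the edge."""
    windows = defaultdict(lambda: {'count': 0, 'bytes': 0})
    for rec in records:
        window = int(rec['timestamp'] // window_seconds) * window_seconds
        key = (window, rec['tag'])
        windows[key]['count'] += 1
        windows[key]['bytes'] += rec['bytes']
    return windows

# Made-up example: raw log records collected at the edge
now = time.time()
records = [{'timestamp': now + i, 'tag': 'nginx.access', 'bytes': 512}
           for i in range(30)]
for (window, tag), agg in sorted(aggregate_windows(records).items()):
    print(window, tag, agg)  # in Fluent Bit this would be routed to an output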

Functions

  • Aggregation functions
  • Time functions
  • Timeseries functions
  • You can also write functions in Lua

Also exposed prometheus-type metrics


Linux.conf.au 2020 – Thursday – Session 2 – Origins of X / Aerial Photography

The History of X: Lessons for Software Freedom – Keith Packard

1984 – The Origins of X

  • Everything proprietary
  • Brian Reid and Paul Asente: V Kernel -> VGTS -> W window system
    • Ported to VAXstation 100 at Stanford
    • 68k processor, 128k of VRAM
    • B&W
  • Bob Scheifler started hacking W -> X
  • Ported to Unix , made more Unix Friendly (async) renamed X

Unix Workstation Market

  • Unix was closed source
  • Vendor Unix based on BSD 4.x
  • Sun, HP, Digital, Apollo, Tektronix, IBM
  • this was when the configure program happened
  • VAXstation II
    • Color graphics 8bit accelerated
  • Sun 3/60
    • CPU drew everything on the screen

Early Unix Window System – 85-86

  • SunView dominates (actual commercial apps, desktop widgets)
  • Digital VMS/US
  • Apollo had Domain
  • Tektronix demonstrated SmallTalk
  • all only ran on their own hardware

X1 – X6

  • non-free software
  • Used Internally at MIT
  • Shared with friends informally

X10 – approx 1986

  • Almost usable
  • Ported to various workstations
  • Distribution was not all free software (had bin blobs)
    • Sun port relied on SunView kernel API
    • Digital provided binary rendering code
    • IBM PC/RT Support completed in source form

Why X11 ?

  • X10 had warts
  • rendering model was pretty terrible
  • External Windows manager without borders
  • Other vendors wanted to get involved
    • Jim Gettys and Smokey Wallace
    • Write X11, release under liberal terms
    • Working against Sun
    • Displace Sunview
    • “Reset the market”
    • Digital management agreed

X11 Development 1986-87

  • Protocol designed by a cross-org team
  • Sample implementation done mostly at DEC WRL, collaboration with people at MIT
  • Internet not functional enough to properly collaborate, done via mail
    • Thus most of it happened at MIT

MIT X Consortium

  • Hired dev team at MIT
  • Funded by consortium
  • Members also voted on standards
    • Members stopped their own development
    • Stopped collaboration with non-members
  • We knew Richard too well – The GPL’s worst sponsor
  • Corp sponsors dedicated to non-free software

X Consortium Standards

  • XIE – X Imaging Extensions
  • PEX – PHIGS Extension for X
  • LBX – Low Bandwidth X
  • Xinput (version 1)

The workstation vendors were trying to differentiate. They wanted a minimal base to build their stuff on. The standard was frozen for around 15 years. That is why X fell behind other environments as hardware changed.

X11 , NeWs and Postscript

  • NeWS – Very slow but cool
  • Adobe adapted PostScript interpreter for windows systems – Closed Source
  • Merged X11/NeWS server – Closed Source

The Free Unix Desktop

  • All the toolkits were closed source
  • Sunview -> XView
  • OpenView – Xt based toolkit

X Stagnates – ~1992

  • Core protocol not allowed to change
  • non-members pushed out
  • market fragments

Collapse of Unix

  • The Decade of Windows

Opening a treasure trove: The Historical Aerial Photography project by Paul Haesler

  • Geoscience Australia has inherited an extensive archive of historical photography
  • 1.2 million images from 1920 – 1990s
  • Full coverage of Aus and more (some places more than others)

Historical Archive Projects

  • Canonical source of truth is pieces of paper
  • Multiple attempts at scanning/transcription. Duplication and compounding of errors
  • Some errors in original data
  • “Historian” role to sift through and collate into a machine-readable form – usually spreadsheets
  • Data Model typically evolves over time – implementation must be flexible and open-minded

What we get

  • Flight Line Diagrams (metadata)
  • Imagery (data)
  • Lots scanned in early 1990s, but low resolution and missing data, some missed

Digitization Pipeline

  • Flight line diagram pipeline
    • High resolution scans
    • Georeferences
  • Film pipeline
    • Filmstock
    • High Resolution scans
    • Georeference images
    • Georectified images
    • Stitched mosaics + Elevation models

Only about 20% of film scanned. Lacking funding and film deteriorating

Other states have similar smaller archives (and other countries)

  • Many significantly more mature, but may be locked in proprietary platforms

Stack

  • Open Data ( Cc by 4.0)
  • Open Standards (RESTful, GeoJSON, STAC)
  • Open Source
  • PostgreSQL/PostGIS
  • Python3: Django REST Framework
  • Current Status: API Only. Alpha/proof-of-concept

API

  • Search for Flight runs
  • Output is GeoJSON

Coming Next

  • Scanning and georeferencing (need $$$)
  • Data entry/management tools – no spreadsheets
  • Refs to other archives, federated search
  • Integration with TerriaJS/National Map
  • Full STAC once standardized
