Planet Linux Australia
Celebrating Australians & Kiwis in the Linux and Free/Open-Source community...

October 20, 2020

Spartan: From Experimental Hybrid towards a Petascale Future

Previous presentations to eResearch Australasia described the implementation of Spartan, the University of Melbourne’s general-purpose HPC system. Initially, this system was small but innovative, arguably even experimental. Features included making extensive use of cloud infrastructure for compute nodes, OpenStack for deployment, Ceph for the file system, RoCE for the network, Slurm as the workload manager, EasyBuild and LMod, etc.

Based on consideration of job workload and basic principles of sunk, prospective, and opportunity costs, this combination maximised throughput on a low budget, and attracted international attention as a result. Flexibility in design also allowed the introduction of a large LIEF-supported GPGPU partition, the inclusion of older systems from Melbourne Bioinformatics, and departmental contributions. Early design decisions meant that Spartan has been able to provide performance and flexibility, and as a result continues to show high utilisation and job completions (close to 20 million), with overall metrics well within what would be a “top 500” system. The inclusion of an extensive training programme based on andragogical principles has also helped significantly.

Very recently Spartan has undergone some significant architectural modifications, which this report describes and which will be of interest to other institutions. The adoption of the Spectrum Scale file system has further improved scalability, performance, and reliability, along with the adoption of a pure HPC environment with a significant increase in core count, designed to address changing workloads and especially to reduce queue times. Overall, these new developments are designed to integrate Spartan into the University’s Petascale Campus Initiative (PCI).

Spartan: From Experimental Hybrid towards a Petascale Future. Presentation to eResearchAustralasia, 2020. Transcript available.

October 19, 2020

FSK LDPC Data Mode

I’m developing an open source data mode using a FSK modem and powerful LDPC codes. The initial use case is the Open IP over UHF/VHF project, but it’s available in the FreeDV API as a general purpose mode for sending data over radio channels.

It uses 2FSK or 4FSK, has a variety of LDPC codes available, works with bursts or streaming frames, and the sample rate and symbol rate can be set at init time.

The FSK modem has been around for some time, and is used for several applications such as balloon telemetry and FreeDV digital voice modes. Bill, VK5DSP, has recently done some fine work to tightly integrate the LDPC codes with the modem. The FSK_LDPC system has been tested over the air in Octave simulation form, been ported to C, and bundled up into the FreeDV API to make using it straightforward from the command line, C or Python.

We’re not using a “black box” chipset here – this is ground up development of the physical layer using open source, careful simulation, automated testing, and verification of our work on real RF signals. As it’s open source the modem is not buried in proprietary silicon so we can look inside, debug issues and integrate powerful FEC codes. Using a standard RTLSDR with a 6dB noise figure, FSK_LDPC is roughly 10dB ahead of the receiver in a sample chipset. That’s a factor of 10 in power efficiency or bit rate – your choice!

Performance

The performance is pretty close to what is theoretically possible for coded FSK [6]. This is about Eb/No=8dB (2FSK) and Eb/No=6dB (4FSK) for error free transmission of coded data. You can work out what that means for your application using:

  MDS = Eb/No + 10*log10(Rb) + NF - 174
  SNR = Eb/No + 10*log10(Rb/B)

So if you were using 4FSK at 100 bits/s, with a 6dB Noise figure, the Minimum Detectable Signal (MDS) would be:

  MDS = 6 + 10*log10(100) + 6 - 174
      = -142dBm

Given a 3kHz noise bandwidth, the SNR would be:

  SNR = 6 + 10*log10(100/3000)
      = -8.8 dB
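
If you want to script these sums, here's the same maths as a couple of Python helper functions (my own sketch, not part of the FreeDV API):

  import math

  def mds_dbm(ebno_db, rb, nf_db):
      # Minimum Detectable Signal (dBm) given Eb/No (dB), bit rate Rb (bits/s)
      # and receiver noise figure (dB)
      return ebno_db + 10 * math.log10(rb) + nf_db - 174

  def snr_db(ebno_db, rb, noise_bw):
      # SNR (dB) of the same signal measured in a noise bandwidth B (Hz)
      return ebno_db + 10 * math.log10(rb / noise_bw)

  print(mds_dbm(6, 100, 6))    # -142.0, matching the worked example above
  print(snr_db(6, 100, 3000))  # -8.8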

How it Works

Here is the FSK_LDPC frame design:

At the start of a burst we transmit a preamble to allow the modem to synchronise. Only one preamble is transmitted for each data burst, which can contain as many frames as you like. Each frame starts with a 32 bit Unique Word (UW), then the FEC codeword consisting of the data and parity bits. At the end of the data bits, we reserve 16 bits for a CRC.

This figure shows the processing steps for the receive side:

Unique Word Selection

The Unique Word (UW) is a known sequence of bits we use to obtain “frame sync”, or identify the start of the frame. We need this information so we can feed the received symbols into the LDPC decoder in the correct order.

To find the UW we slide it against the incoming bit stream and count the number of errors at each position. If the number of errors is beneath a certain threshold, we declare a valid frame and try to decode it with the LDPC decoder.

Even with pure noise (no signal) a random sequence of bits will occasionally get a partial match (better than our threshold) with the UW. That means the occasional dud frame detection. However if we dial up the threshold too far, we might miss good frames that just happen to have a few too many errors in the UW.

So how do we select the length of the UW and threshold? Well for the last few decades I’ve been guessing. However despite being allergic to probability theory I have recently started using the Binomial Distribution to answer this question.

Let’s say we have a 32 bit UW; let’s plot the Binomial PDF and CDF:


The x-axis is the number of errors. On each graph I’ve plotted two cases:

  1. A 50% Bit Error Rate (BER). This is what we get when no valid signal is present, just random bits from the demodulator.
  2. A 10% bit error rate. This is the worst case where we need to get frame sync – a valid, but low SNR signal. The rate half LDPC codes fall over at about 10% BER.

The CDF tells us “what is the chance of this many or less errors”. We can use it to pick the UW length and thresholds.

In this example, say we select a “valid UW threshold” of 6 bit errors out of 32. Imagine we are sliding the UW over random bits. Looking at the 50% BER CDF curve, we have a probability of 2.6E-4 (0.026%) of getting 6 or less errors. Looking at the 10% curve, we have a probability of 0.96 (96%) of detecting a valid frame – or in other words we will miss 100 – 96 = 4% of the valid frames that just happen to have 7 or more errors in the unique word.

So there is a trade off between false detection on random noise, and missing valid frames. A longer UW helps separate the two cases, but adds some overhead – as UW bits don’t carry any payload data. A lower threshold means you are less likely to trigger on noise, but more likely to miss a valid frame that has a few errors in the UW.

Continuing our example, let’s say we try to match the UW on a stream of random bits from off air noise. Because we don’t know where the frame starts, we need to test every single bit position. So at a bit rate of 1000 bits/s we attempt a match 1000 times a second. The probability of a random match in 1000 bits (1 second) is 1000*2.6E-4 = 0.26, or about 1 chance in 4. So every 4 seconds, on average, we will get an accidental UW match on random data. That’s not great, as we don’t want to output garbage frames to higher layers of our system. So a CRC on the decoded data is performed as a final check to determine if the frame is indeed valid.
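
If you'd like to reproduce these numbers, here is a short Python sketch using scipy (the 32 bit UW and threshold of 6 match the example above):

from scipy.stats import binom

uw_len = 32   # Unique Word length (bits)
thresh = 6    # declare a frame if we see this many UW bit errors or less

# chance of a false UW match on random bits (50% BER, no signal present)
p_false = binom.cdf(thresh, uw_len, 0.5)
# chance of detecting a valid frame at the worst case 10% BER
p_detect = binom.cdf(thresh, uw_len, 0.1)

print("P(false match on noise): %.2e" % p_false)                 # the ~2.6E-4 above
print("P(detect at 10%% BER):   %.2f" % p_detect)                # about 0.96
print("false matches/s at 1000 bit/s: %.2f" % (1000 * p_false))  # about 0.26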

Putting it all together

We prototyped the system in GNU Octave first, then ported the individual components to stand alone C programs that we can string together using stdin/stdout pipes:

$ cd codec2/build_linux/src
$ ./ldpc_enc /dev/zero - --code H_256_512_4 --testframes 200 |
  ./framer - - 512 5186 | ./fsk_mod 4 8000 5 1000 100 - - |
  ./cohpsk_ch - - -10.5 --Fs 8000  |
  ./fsk_demod --mask 100 -s 4 8000 5 - - |
  ./deframer - - 512 5186  |
  ./ldpc_dec - /dev/null --code H_256_512_4 --testframes
--snip--
Raw   Tbits: 101888 Terr:   8767 BER: 0.086
Coded Tbits:  50944 Terr:    970 BER: 0.019
      Tpkts:    199 Tper:     23 PER: 0.116

The example above runs 4FSK at 5 symbols/second (10 bits/s), at a sample rate of 8000 Hz. It uses a rate 0.5 LDPC code, so the throughput is 5 bit/s and it works down to -24dB SNR (at around 10% PER). This is what it sounds like on a SSB receiver:

Yeah I know. But it’s in there. Trust me.

The command line programs above are great for development, but unwieldy for real world use. So they’ve been combined into single FreeDV API functions. These functions take data bytes, convert them to samples you send through your radio, then at the receiver back to bytes again. Here’s a simple example of sending some text using the FreeDV raw data API test programs:

$ cd codec2/build_linux/src
$ echo 'Hello World                    ' |
  ./freedv_data_raw_tx FSK_LDPC - - 2>/dev/null |
  ./freedv_data_raw_rx FSK_LDPC - - 2>/dev/null |
  hexdump -C
48 65 6c 6c 6f 20 57 6f  72 6c 64 20 20 20 20 20  |Hello World     |
20 20 20 20 20 20 20 20  20 20 20 20 20 20 11 c6  |              ..|

The “2>/dev/null” hides some of the verbose debug information, to make this example quieter. The 0x11c6 at the end is the 16 bit CRC. This particular example uses frames of 32 bytes, so I’ve padded the input data with spaces.

My current radio for real world testing is a Raspberry Pi Tx and RTLSDR Rx, but FSK_LDPC could be used over regular SSB radios (just pipe the audio into and out of your radio with a sound card), or other SDRs. FSK chips could be used as the Tx (although their receivers are often sub-optimal as we shall see). You could even try it on HF, and receive the signal remotely with a KiwiSDR.

I’ve used a HackRF as a Tx for low level testing. After a few days of tuning and tweaking, it works as advertised – I’m getting within 1dB of theory when tested over the bench at rates between 500 and 20000 bits/s. In the table below, Minimum Detectable Signal (MDS) is defined as 10% PER, measured over 100 packets. I send the packets arranged as 10 “bursts” of 10 packets each, with a gap between bursts. This gives the acquisition a bit of a work out (burst operation is typically tougher than streaming):

Info bit rate (bits/s)  Mode  NF (dB)  Expected MDS (dBm)  Measured MDS (dBm)  Si4464 MDS (dBm)
1000                    4FSK  6        -132                -131                -123
10000                   4FSK  6        -122                -120                -110
5000                    2FSK  6        -123                -123                -113

The Si4464 is used as an example of a chipset implementation. The Rx sensitivity figures were extrapolated from the nearest bit rate on Table 3 of the Si4464 data sheet. It’s hard to compare exactly as the Si4464 doesn’t have FEC. In fact it’s not possible to fully utilise the performance of high performance FEC codes on chipsets as they generally don’t have soft decision outputs.

FSK_LDPC can scale to any bit rate you like. The ratio of the sample rate to symbol rate Fs/Rs = 8000/1000 (8kHz, 1000 bits/s) is the same as Fs/Rs = 800000/100000 (800kHz, 100k bits/s), so it’s the same thing to the modem. I’ve tried FSK_LDPC between 5 and 40k bit/s so far.

With a decent LNA in front of the RTLSDR, I measured MDS figures about 4dB lower at each bit rate. I used a rate 0.5 code for the tests to date, but other codes are available (thanks to Bill and the CML library!).

There are a few improvements I’d like to make. In some tests I’m not seeing the 2dB advantage 4FSK should be delivering. Synchronisation is trickier for 4FSK, as we have 4 tones, and the raw modem operating point is 2dB further down the Eb/No curve than 2FSK. I’d also like to add some GFSK style pulse shaping to make the Tx spectrum cleaner. I’m sure some testing over real world links will also show up a few issues.

It’s fun building, then testing, tuning and pushing through one bug after another to build your very own physical layer! It’s a special sort of magic when the real world results start to approach what the theory says is possible.

Reading Further

[1] Open IP over UHF/VHF Part 1 and Part 2 – my first use case for the FSK_LDPC protocol described in this post.
[2] README_FSK – recently updated documentation on the Codec 2 FSK modem, including lots of examples.
[3] README_data – new documentation on Codec 2 data modes, including the FSK_LDPC mode described in this post.
[4] 4FSK on 25 Microwatts – Bill and I sending 4FSK signals across Adelaide, using an early GNU Octave simulation version of the FSK_LDPC mode described in this post.
[5] Bill’s LowSNR blog.
[6] Coded Modulation Library Overview – CML is a wonderful library that we are using in Codec 2 for our LDPC work. Slide 56 tells us the theoretical minimum Eb/No for coded FSK (about 8dB for 2FSK and 6dB for 4FSK).
[7] 4FSK LLR Estimation Part 2 – GitHub PR used for development of the FSK_LDPC mode.

Video Decoding

I’ve had a saga of getting 4K monitors to work well. My latest issue has been video playback: the dreaded mplayer error about the system being too slow. My previous post about 4K was about using DisplayPort to get more than 30Hz scan rate at 4K [1]. I now have a nice 60Hz scan rate which makes WW2 documentaries display nicely among other things.

But when running a 4K monitor on a 3.3GHz i5-2500 quad-core CPU I can’t get a FullHD video to display properly. Part of the process of decoding the video and scaling it to 4K resolution is too slow, so action scenes in movies lag. When running a 2560*1440 monitor on a 2.4GHz E5-2440 hex-core CPU with the mplayer option “-lavdopts threads=3” everything is great (but it fails if mplayer is run with no parameters). In tests of apparent performance, the E5-2440 seemed to gain more from the threaded mplayer code than the i5-2500; maybe the E5-2440 is more designed for server use (it’s in a Dell PowerEdge T320 while the i5-2500 is in a random white-box system) or maybe it’s just newer. I haven’t tested whether the i5-2500 system could perform adequately at 2560*1440 resolution.

The E5-2440 system has an ATI HD 6570 video card which is old, slow, and only does PCIe 2.1, which gives 5GT/s per lane or 8GB/s for a x16 slot. The i5-2500 system has a newer ATI video card that is capable of PCIe 3.0, but “lspci -vv” as root says “LnkCap: Port #0, Speed 8GT/s, Width x16” and “LnkSta: Speed 5GT/s (downgraded), Width x16 (ok)”. So for reasons unknown to me the system with a faster PCIe 3.0 video card is being downgraded to PCIe 2.1 speed. A quick check of the web site for my local computer store shows that all ATI video cards costing less than $300 have PCIe 3.0 interfaces and the sole ATI card with PCIe 4.0 (which gives double the PCIe speed if the motherboard supports it) costs almost $500. I’m not inclined to spend $500 on a new video card and then a greater amount of money on a motherboard supporting PCIe 4.0 and CPU and RAM to go in it.

According to my calculations 3840*2160 resolution at 24bpp (probably 32bpp data transfers) at 30 frames/sec means 3840*2160*4*30/1024/1024=950MB/s. PCIe 2.1 can do 8GB/s so that probably isn’t a significant problem.
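
The same sum in Python, if you want to try other resolutions and frame rates (my own throwaway helper):

def video_mb_per_s(width, height, fps, bytes_per_pixel=4):
    # uncompressed framebuffer bandwidth in MB/s, as per the calculation above
    return width * height * bytes_per_pixel * fps / 1024 / 1024

print(video_mb_per_s(3840, 2160, 30))  # ~949, the 950MB/s figure above
print(video_mb_per_s(2560, 1440, 60))  # ~844
print(video_mb_per_s(1920, 1080, 60))  # ~475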

I’d been planning on buying a new video card for the E5-2440 system, but due to some combination of having a better CPU and lower screen resolution it is working well for video playing so I can save my money.

As an aside, the T320 is a server class system that had been running for years in a corporate DC. When I replaced the high speed SAS disks with SATA SSDs it became quiet enough for a home workstation. It works very well at that task, but the BIOS is quite determined to keep the motherboard video running due to the remote console support, so swapping monitors around was more pain than I felt like going through; I just got it working and left it. I ordered a special GPU power cable, but before it arrived I found that the older video card, which doesn’t need an extra power cable, performs adequately.

Here is a table comparing the systems.

                2560*1440 works well    3840*2160 goes slow
System          Dell PowerEdge T320     White Box PC from rubbish
CPU             2.4GHz E5-2440          3.3GHz i5-2500
Video Card      ATI Radeon HD 6570      ATI Radeon R7 260X
PCIe Speed      PCIe 2.1 – 8GB/s        PCIe 3.0 downgraded to PCIe 2.1 – 8GB/s

Conclusion

The ATI Radeon HD 6570 video card is one that I had previously tested and found inadequate for 4K support; I can’t remember if it didn’t work at that resolution or didn’t support more than a 30Hz scan rate. If the 2560*1440 monitor dies then it wouldn’t make sense to buy anything less than a 4K monitor to replace it, which means that I’d need to get a new video card to match. But for the moment 2560*1440 is working well enough so I won’t upgrade it any time soon. I’ve already got the special power cable (specified as being for a Dell PowerEdge R610 for anyone else who wants to buy one) so it will be easy to install a powerful video card in a hurry.

October 18, 2020

Using a Let's Encrypt TLS certificate with Asterisk 16.2

In order to fix the following error after setting up SIP TLS in Asterisk 16.2:

asterisk[8691]: ERROR[8691]: tcptls.c:966 in __ssl_setup: TLS/SSL error loading cert file. <asterisk.pem>

I created a Let's Encrypt certificate using certbot:

apt install certbot
certbot certonly --standalone -d hostname.example.com

To enable the asterisk user to load the certificate successfully (it doesn't have permission to access the certificates under /etc/letsencrypt/), I copied them to the right directory:

cp /etc/letsencrypt/live/hostname.example.com/privkey.pem /etc/asterisk/asterisk.key
cp /etc/letsencrypt/live/hostname.example.com/fullchain.pem /etc/asterisk/asterisk.cert
chown asterisk:asterisk /etc/asterisk/asterisk.cert /etc/asterisk/asterisk.key
chmod go-rwx /etc/asterisk/asterisk.cert /etc/asterisk/asterisk.key

Then I set the following variables in /etc/asterisk/sip.conf:

tlscertfile=/etc/asterisk/asterisk.cert
tlsprivatekey=/etc/asterisk/asterisk.key

Automatic renewal

The machine on which I run asterisk has a tricky Apache setup:

  • a webserver is running on port 80
  • port 80 is restricted to the local network

This meant that the certbot domain ownership checks would get blocked by the firewall, and I couldn't open that port without exposing the private webserver to the Internet.

So I ended up disabling the built-in certbot renewal mechanism:

systemctl disable certbot.timer certbot.service
systemctl stop certbot.timer certbot.service

and then writing my own script in /etc/cron.daily/certbot-francois:

#!/bin/bash
TEMPFILE=`mktemp`

# Stop Apache and backup firewall.
/bin/systemctl stop apache2.service
/usr/sbin/iptables-save > $TEMPFILE

# Open up port 80 to the whole world.
/usr/sbin/iptables -D INPUT -j LOGDROP
/usr/sbin/iptables -A INPUT -p tcp --dport 80 -j ACCEPT
/usr/sbin/iptables -A INPUT -j LOGDROP

# Renew all certs.
/usr/bin/certbot renew --quiet

# Restore firewall and restart Apache.
/usr/sbin/iptables -D INPUT -p tcp --dport 80 -j ACCEPT
/usr/sbin/iptables-restore < $TEMPFILE
/bin/systemctl start apache2.service

# Copy certificate into asterisk.
cp /etc/letsencrypt/live/hostname.example.com/privkey.pem /etc/asterisk/asterisk.key
cp /etc/letsencrypt/live/hostname.example.com/fullchain.pem /etc/asterisk/asterisk.cert
chown asterisk:asterisk /etc/asterisk/asterisk.cert /etc/asterisk/asterisk.key
chmod go-rwx /etc/asterisk/asterisk.cert /etc/asterisk/asterisk.key
/bin/systemctl restart asterisk.service

# Commit changes to etckeeper.
pushd /etc/ > /dev/null
/usr/bin/git add letsencrypt asterisk
DIFFSTAT="$(/usr/bin/git diff --cached --stat)"
if [ -n "$DIFFSTAT" ] ; then
    /usr/bin/git commit --quiet -m "Renewed letsencrypt certs."
    echo "$DIFFSTAT"
fi
popd > /dev/null

October 16, 2020

RTLSDR Strong Signals and Noise Figure

I’ve been exploring the strong and weak signal performance of the RTLSDR. This all came about after Bill and I performed some Over the Air tests using the RTLSDR. We found that if we turned the gain all the way up, lots of birdies and distortion appeared. However if we wind the gain back to improve strong signal performance, the Noise Figure (NF) increases, messing with our modem link budgets.

Fortunately, there’s been a lot of work on the RTLSDR internals from the open source community. So I’ve had a fun couple of weeks drilling down into the RTLSDR drivers and experimenting. It’s been really interesting, and I’ve learnt a lot about the trade offs in SDR. For USD$22, the RTLSDR is a fine teaching/learning tool – and a pretty good radio.

Strong and Weak Signals

Here is a block diagram of the RTLSDR as I understand it:

It’s a superhet, with an IF bandwidth of 2MHz or less. The IF is sampled by an 8-bit ADC that runs at 28 MHz. The down sampling (decimation) from 28MHz to 2MHz provides some “processing gain” which results in a respectable performance. One part I don’t really understand is the tracking BPF, but I gather it’s pretty broad, and has no impact on strong signals a few MHz away.

There are a few ways for strong signals to cause spurious signals to appear:

  1. A -30dBm signal several MHz away will block the LNA/mixer analog stages at the input. For example a pager signal at 148.6MHz while you are listening to 144.5 MHz. This causes a few birdies to gradually appear, and some compression of your wanted signal.
  2. A strong signal a few MHz away can be aliased into your passband, as the IF filter stop band attenuation is not particularly deep.
  3. A -68dBm signal inside the IF bandwidth will overload the ADC, and the whole radio will fall in a heap.

The levels quoted above are for maximum gain (-g 49), and are consistent with [1] and [5]. If you reduce the gain, the overload levels get higher, but so does your noise figure. You can sometimes work around the first two issues, e.g. if the birdies don’t happen to fall right on top of your signal they can be ignored. So the first two effects – while unfortunate – tend to be more benign than ADC overload.

At the weak signal end of the operating range, we are concerned about noise. Here is how I model the various noise contributions:

The idea of a radio is to use the tuner to remove the narrow band unwanted signals, leaving just your wanted signal. However noise tends to be evenly distributed in frequency, so we are stuck with any noise that is located in the bandwidth of our wanted signal. A common technique is to have enough gain before the ADC such that the signal being sampled is large compared to the ADC quantisation noise. That way we can ignore the noise contribution from the ADC.

However with strong signals, we need to back off the gain to prevent overload. Now the ADC noise becomes significant, and the overall NF of the radio increases.

Another way of looking at this – if the gain ahead of the ADC is small, we have a smaller signal hitting the ADC, which will toggle less bits of the ADC, resulting in coarse quantisation (more quantisation noise) from the ADC.

The final decimation stage reduces ADC quantisation noise. This figure shows our received signal (a single frequency spike), and the ADC quantisation noise (continuous line, with energy at every frequency):

The noise power is the sum of all the noise in our bandwidth of interest. The decimation filter limits this bandwidth, removing most of the noise except for the small band near our wanted signal (shaded blue area). So the total noise power is summed over a smaller bandwidth, noise power is reduced and our SNR goes up. This means that despite just being 8 bits, the ADC performs reasonably well.
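
To put some rough numbers on this (textbook formulas, not measurements of an actual RTLSDR): an ideal N-bit ADC has an SNR of about 6.02N + 1.76 dB spread across its Nyquist bandwidth, and decimating from 28MHz down to 2MHz discards the quantisation noise outside our band:

import math

adc_bits = 8
fs_adc = 28e6   # ADC sample rate (Hz)
fs_out = 2e6    # sample rate after decimation (Hz)

snr_adc = 6.02 * adc_bits + 1.76             # ideal 8-bit quantisation SNR, ~50dB
gain_dec = 10 * math.log10(fs_adc / fs_out)  # decimation "processing gain", ~11.5dB

print("SNR after decimation: %.1f dB" % (snr_adc + gain_dec))  # ~61dB

That's roughly the quantisation SNR of a 10-bit converter, which helps explain why just 8 bits can deliver a respectable radio.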

Off Air Strong Signals

Here is what my spectrum analyser sees when connected to my antenna. It’s 10MHz wide, centred on 147MHz. There are some strong pager signals that bounce around the -30dBm level, plus some weaker 2 metre band Ham signals between -80 and -100dBm.

When I tune to 144.5MHz, that pager signal is outside the IF bandwidth, so I don’t get a complete breakdown of the radio. However it does cause some strong signal compression, and some birdies/aliased signals pop up. Here is a screen shot from gqrx when the pager signal is active:

In this plot, the signal right on 144.5 is my wanted signal, a weak signal I injected on the bench. The hump at 144.7 is an artefact of the pager signal, which due to strong signal compression just happens to appear close to (but not on top of) my wanted signal. The wider hump is the IF filter. Here’s a closer look at the IF filter, set to 100kHz using the gqrx “Bandwidth” field:

To see the IF filter shape I connected a terminated LNA to the input of the RTLSDR. This generates wideband noise that is amplified and can be used to visualise the filter shape.

There has been some really cool recent work exploring the IF filtering capabilities of the R820T2 [2]. I tried a few of the newly available IF filter configurations. My experience was the shape factor isn’t great (the filters roll off slowly), so the IF filters don’t have a huge impact on close-in strong signal performance. They do help with attenuating aliased signals.

It was however a great learning and exploring experience, a real deep dive into SDR.

Gqrx is really cool for this sort of work. For example if you tune a strong signal from a signal generator either side of the IF passband, you can see aliases popping up and zooming across the screen. It’s also possible to link gqrx with different RTLSDR driver libraries, to explore their performance. I use the UDP output feature to send samples to my noise figure measuring tool.

Airspy and RTLSDR

The Airspy uses the same tuner chip so I tested it and found about the same strong signal results, i.e. it overloads at the same signal levels as the RTLSDR at max gain. Guess this is no surprise if it’s the same tuner. I once again [5] measured the Airspy noise figure at about 7dB, slightly higher than the 6dB of the RTLSDR. This is consistent with other measurements [1], but quite a way from Airspy’s quoted figure (3.5dB).

This is a good example of the high sample rate/decimation filter architecture in action – the 8-bit RTLSDR delivers a similar noise figure to the 12-bit Airspy.

But the RTLSDR NF probably doesn’t matter

I’ve spent a few weeks peering into driver code, messing with inter-stage gains, hammering the little RTLSDR with strong RF signals, and optimising noise figures. However it might all be for naught! At both my home and Bill’s, external noise appears to dominate. On the 2M band (144.5MHz) we are measuring noise from our (omni) antennas about 20dB above thermal, e.g. -150dBm/Hz compared to the ideal thermal noise level of -174dBm/Hz. Dreaded EMI from all the high speed electronics in urban environments.

EMI can look a lot like strong signal overload – at high gains all these lines pop up, just like ADC overload. If you reduce the gain a bit they drop down into the noise, although it’s a smooth reduction in level, unlike ADC overload which is very abrupt. I guess we are seeing harmonics of switching power supply signals or other nearby digital devices. Rather than an artefact of Rx overload, we are seeing a sensitive receiver detecting weak EMI signals right down near the noise floor.

So this changes the equation – rather than optimising internal NF we need to ensure enough link margin to get over the ambient noise. An external LNA won’t help, and even a lossy coax run might not matter much, as the SNR is more or less set at the antenna.

I’ve settled on the librtlsdr RTLSDR driver/library for my FSK modem experiments as it supports IF filter and inter-stage gain control. I also have a mash up called rtl_fsk.c where I integrate a FSK modem and some powerful LDPC codes, but that’s another story!

Reading Further

[1] Evaluation of SDR Boards V1.0 – A fantastic report on the performance of several SDRs.
[2] RTLSDR: driver extensions – a very interesting set of conference slides discussing recent work with the R820T2 by Hayati Aygun. Many useful links.
[3] librtlsdr fork of rtlsdr driver library that includes support for the IF filter configuration and individual gain stage control discussed in [2].
[4] My fork of librtlsdr – used for NF measurements and rtl_fsk mash up development.
[5] Some Measurements on E4000 and R820T tuners – Detailed look at the tuner by HB9AJG. Fig 5 & 6 bucket curves show overload levels consistent with [1] and my measurements.
[6] Measuring SDR Noise Figure in Real Time

Book Chapter Proposal : Monitoring HPC Systems Against Compromised SSH

Objective

To describe protections and monitoring against compromised SSH keys on HPC systems.

Scope

To describe the development and application of the SSH cryptographic protocol and its use in HPC systems. To illustrate a prominent case example where compromised SSH credentials affected several major HPC centres in Europe. To illustrate tools and processes developed and used at the University of Melbourne to protect against SSH compromises. To suggest an "all-of-campus" common security system as a future research project.

Structure

Secure Shell is a very well established cryptographic network protocol for accessing and operating network services, and is the typical way to access high-performance computing (HPC) systems, in preference to various unsecured remote shell protocols such as rlogin, telnet, and ftp. As with any security protocol it has undergone several changes to improve its strength, most notably the move to SSH-2, which incorporated Diffie-Hellman key exchange. The security advantages of SSH are sufficient that there are strong arguments that computing users should use SSH "everywhere". Such a proposition is no mere fancy; as an adaptable network protocol, SSH can be used not just for remote logins and operations, but also for secure mounting of remote file systems, file transfers, port forwarding, network tunnelling, and web-browsing through encrypted proxies.

Despite the justified popularity and engineering excellence of SSH, in May 2020 multiple HPC centres across Europe found themselves subject to cyber-attacks via compromised SSH credentials. This included, among others, major centres such as the University of Edinburgh's ARCHER supercomputer, the High-Performance Computing Center Stuttgart's Hawk supercomputer, the Jülich Research Centre's JURECA, JUDAC, and JUWELS supercomputers, and the Swiss National Supercomputing Centre (CSCS), with attacks being launched from compromised networks at the University of Krakow, Poland, Shanghai Jiaotong University, PR China, and the China Science and Technology Network, PR China. It is speculated that the attacks were making use of GPU infrastructure for cryptocurrency mining, certainly the most obvious vector for financial gain.

The phrase "compromised SSH credentials" does not imply a weakness in SSH as such, but rather practises around SSH-key use. As explicitly stated by system engineers, some researchers had been been using private SSH keys without passcodes and leaving them in their home directories. These would be used by users to login from one HPC system to another, as it is not unusual for researchers to to have accounts on multiple systems. It is noted that users engaging in such an approach are either unaware or ignored the principles of keeping a private key private, encrypting private keys, or making use of an SSH agent. Access to the keys could be achieved through inappropriate POSIX permissions, or more usual methods of access (e.g., ignoring policies of sharing accounts), with follow-up escalations. Passphraseless SSH keys are common as they are the default when creating a new key with `ssh-keygen` and are are convenient to use, without needing to set up an ssh agent. Passphraseless SSH is also offered by default as part of many cloud offerings, as a relatively secure way to provide a new user with access to their virtual machines.

Based on experiences at the University of Melbourne HPC, it is possible at a system level to script a search using `ssh-keygen` that detects all keys with an empty passphrase, even when they are named differently, with additional complexity required when parsing non-standard directories and configuration files. This is far more elegant than conducting a `grep` for 'MII' and similar commonly-suggested techniques. A further alternative is a test making direct use of `libssh` headers. This, however, requires a version of `libssh` which incorporates the new SSH key format, which is atypical for HPC systems, which tend to favour stability at the operating system level even as they make use of diverse versions and compilers at the application level. Of course, invoking a different version of `libssh` (e.g., through an environment modules approach) provides an alternative solution, which can be incorporated into a small C program (`key_audit.c`) that elegantly tests whether an empty passphrase validates against a given keyfile.
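
As an illustration of the `ssh-keygen` approach (a simplified sketch, not the production tooling): `ssh-keygen -y -P ""` will only emit a public key if the private key unlocks with an empty passphrase, so its exit status makes a serviceable test:

import subprocess
from pathlib import Path

def has_empty_passphrase(keyfile):
    # exit status 0 only if the key loads with an empty passphrase
    result = subprocess.run(
        ["ssh-keygen", "-y", "-P", "", "-f", str(keyfile)],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0

# scan the default key names in every home directory; a production version
# would also parse ~/.ssh/config and non-standard key names and locations
for keyfile in Path("/home").glob("*/.ssh/id_*"):
    if keyfile.suffix != ".pub" and has_empty_passphrase(keyfile):
        print("insecure key:", keyfile)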

For monitoring, such programs are extremely efficient; a test of more than 3000 user accounts takes less than 1.5 seconds on a contemporary system. Following this, `inotifywait` can be applied so that any new insecure key is detected immediately, rather than waiting for a cron task to initiate. The system can be further strengthened by using SSH key-only logins rather than allowing password authentication, or by restricting password authentication to VPN logins only with `sshd_config` and two-factor authentication. Prevention of shared private keys is achieved by checking for duplicates across users' `authorized_keys` files. Further, with `authorized_keys` managed through a repository with version control (e.g., GitHub, GitLab), another layer of protection would exist to prevent multiple users from logging in with the same key. Each key would be a separate file, named after its own checksum, served through an `AuthorizedKeysCommand` directive.
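
A sketch of the duplicate-key check (again simplified, with illustrative paths): collect the key material from every user's `authorized_keys` and flag any key offered by more than one account:

from collections import defaultdict
from pathlib import Path

holders = defaultdict(set)  # base64 key blob -> users offering it

for auth in Path("/home").glob("*/.ssh/authorized_keys"):
    user = auth.parts[2]  # /home/<user>/.ssh/authorized_keys
    for line in auth.read_text().splitlines():
        fields = line.split()
        # plain "keytype blob comment" lines; option-prefixed lines need more care
        if len(fields) >= 2 and not line.startswith("#"):
            holders[fields[1]].add(user)

for blob, users in holders.items():
    if len(users) > 1:
        print("key shared by %s: %s..." % (sorted(users), blob[:24]))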

Further research in this area would involve developing a university-wide API offering public keys for arbitrary SSH logins to various systems on the campus. Keys can be stored for user accounts and used for git clone actions, pull requests, etc. Access to systems could also be implemented via a zero trust security framework (e.g., BeyondCorp), which would both protect systems from intruders who are already within the network perimeter and provide secure access to users who are outside it.

Biographies

Lev Lafayette is a senior HPC DevOps Engineer at the University of Melbourne, where he has been since 2015. Prior to that he worked in a similar role at the Victorian Partnership for Advanced Computing (VPAC) for eight years. He is an experienced HPC educator and has a significant number of relevant publications in this field.

Narendra Chinnam is a Senior HPC DevOps Engineer at the University of Melbourne. Prior to that, he worked as a Systems Software Engineer in the HPC/AI division at Hewlett-Packard Enterprise (HPE) for thirteen years. He has made significant contributions to the HPE HPC cluster management software portfolio and was a member of several top500 cluster deployment projects, including "Eka" - the 4th fastest supercomputer in the world as of Nov-2007.

Timothy Rice is a DevOps/HPC Engineer at the University of Melbourne. In past lives, he was a researcher and tutor in applied mathematics, an operations analyst in the Department of Defence, and a Software Carpentry instructor.

Publication

A proposal for a book chapter is needed from prospective authors before the proposal *submission due date*, describing the objective, scope, and structure of the proposed chapter (no more than 5 pages). With the chapter proposal, please also submit a brief biography of each author. Acceptance of chapter proposals will be communicated to lead chapter authors after a formal double-blind review process to ensure relevance, quality, and originality. The submission of chapter proposals should be sent directly via email to Nitin Sukhija (email: nitin.sukhija@sru.edu) and Kuan-Ching Li (kuancli@gm.pu.edu.tw).

https://sites.google.com/view/cs-hpc-call/home

October 14, 2020

Ada Lovelace Day 2020 - Finding Ada


Making an Apache website available as a Tor Onion Service

As part of the #MoreOnionsPorFavor campaign, I decided to follow brave.com's lead and make my homepage available as a Tor onion service.

Tor daemon setup

I started by installing the Tor daemon locally:

apt install tor

and then setting the following in /etc/tor/torrc:

SocksPort 0
SocksPolicy reject *
HiddenServiceDir /var/lib/tor/hidden_service/
HiddenServicePort 80 [2600:3c04::f03c:91ff:fe8c:61ac]:80
HiddenServicePort 443 [2600:3c04::f03c:91ff:fe8c:61ac]:443
HiddenServiceVersion 3
HiddenServiceNonAnonymousMode 1
HiddenServiceSingleHopMode 1

in order to create a version 3 onion service without actually running a Tor relay.

Note that since I am making a public website available over Tor, I do not need the location of the website to be hidden and so I used the same settings as Cloudflare in their public Tor proxy.

Also, I explicitly used the external IPv6 address of my server in the configuration in order to prevent localhost bypasses.

After restarting the Tor daemon to reload the configuration file:

systemctl restart tor.service

I looked for the address of my onion service:

$ cat /var/lib/tor/hidden_service/hostname 
ixrdj3iwwhkuau5tby5jh3a536a2rdhpbdbu6ldhng43r47kim7a3lid.onion

Apache configuration

Next, I enabled a few required Apache modules:

a2enmod mpm_event
a2enmod http2
a2enmod headers

and configured my Apache vhosts in /etc/apache2/sites-enabled/www.conf:

<VirtualHost *:443>
    ServerName fmarier.org
    ServerAlias ixrdj3iwwhkuau5tby5jh3a536a2rdhpbdbu6ldhng43r47kim7a3lid.onion

    Protocols h2, http/1.1
    Header set Onion-Location "http://ixrdj3iwwhkuau5tby5jh3a536a2rdhpbdbu6ldhng43r47kim7a3lid.onion%{REQUEST_URI}s"
    Header set alt-svc 'h2="ixrdj3iwwhkuau5tby5jh3a536a2rdhpbdbu6ldhng43r47kim7a3lid.onion:443"; ma=315360000; persist=1'
    Header add Strict-Transport-Security: "max-age=63072000"

    Include /etc/fmarier-org/www-common.include

    SSLEngine On
    SSLCertificateFile /etc/letsencrypt/live/fmarier.org/fullchain.pem
    SSLCertificateKeyFile /etc/letsencrypt/live/fmarier.org/privkey.pem
</VirtualHost>

<VirtualHost *:80>
    ServerName fmarier.org
    Redirect permanent / https://fmarier.org/
</VirtualHost>

<VirtualHost *:80>
    ServerName ixrdj3iwwhkuau5tby5jh3a536a2rdhpbdbu6ldhng43r47kim7a3lid.onion
    Include /etc/fmarier-org/www-common.include
</VirtualHost>

Note that /etc/fmarier-org/www-common.include contains all of the configuration options that are common to both the HTTP and the HTTPS sites (e.g. document root, caching headers, aliases, etc.).

Finally, I restarted Apache:

apache2ctl configtest
systemctl restart apache2.service

Testing

In order to test that my website is correctly available at its .onion address, I opened the following URLs in a Brave Tor window:

I also checked that the main URL (https://fmarier.org/) exposes a working Onion-Location header which triggers the display of a button in the URL bar (recently merged and available in Brave Nightly):

Testing that the Alt-Svc is working required using the Tor Browser since that's not yet supported in Brave:

  1. Open https://fmarier.org.
  2. Wait 30 seconds.
  3. Reload the page.

On the server side, I saw the following:

2a0b:f4c2:2::1 - - [14/Oct/2020:02:42:20 +0000] "GET / HTTP/2.0" 200 2696 "-" "Mozilla/5.0 (Windows NT 10.0; rv:78.0) Gecko/20100101 Firefox/78.0"
2600:3c04::f03c:91ff:fe8c:61ac - - [14/Oct/2020:02:42:53 +0000] "GET / HTTP/2.0" 200 2696 "-" "Mozilla/5.0 (Windows NT 10.0; rv:78.0) Gecko/20100101 Firefox/78.0"

That first IP address is from a Tor exit node:

$ whois 2a0b:f4c2:2::1
...
inet6num:       2a0b:f4c2::/40
netname:        MK-TOR-EXIT
remarks:        -----------------------------------
remarks:        This network is used for Tor Exits.
remarks:        We do not have any logs at all.
remarks:        For more information please visit:
remarks:        https://www.torproject.org

which indicates that the first request was not using the .onion address.

The second IP address is the one for my server:

$ dig +short -x 2600:3c04::f03c:91ff:fe8c:61ac
hafnarfjordur.fmarier.org.

which indicates that the second request to Apache came from the Tor relay running on my server, hence using the .onion address.

October 12, 2020

First Attempt at Gnocchi-Statsd

I’ve been investigating the options for tracking system statistics to diagnose performance problems. The idea is to track all sorts of data about the system (network use, disk IO, CPU, etc) and look for correlations at times of performance problems. DataDog is pretty good for this but expensive; it’s apparently based on or inspired by the Etsy Statsd. It’s claimed that gnocchi-statsd is the best implementation of the protocol used by the Etsy Statsd, so I decided to install that.

I use Debian/Buster for this as that’s what I’m using for the hardware that runs KVM VMs. Here is what I did:

# it depends on a local MySQL database
apt -y install mariadb-server mariadb-client
# install the basic packages for gnocchi
apt -y install gnocchi-common python3-gnocchiclient gnocchi-statsd uuid

In the Debconf prompts I told it to “setup a database” and not to manage keystone_authtoken with debconf (because I’m not doing a full OpenStack installation).

This gave a non-working configuration as it didn’t configure the MySQL database for the [indexer] section and the sqlite database that was configured didn’t work for unknown reasons. I filed Debian bug #971996 about this [1]. To get this working you need to edit /etc/gnocchi/gnocchi.conf and change the url line in the [indexer] section to something like the following (where the password is taken from the [database] section).

url = mysql+pymysql://gnocchi-common:PASS@localhost:3306/gnocchidb

To get the statsd interface going you have to install the gnocchi-statsd package and edit /etc/gnocchi/gnocchi.conf to put a UUID in the resource_id field (the Debian package uuid is good for this). I filed Debian bug #972092 requesting that the UUID be set by default on install [2].

Here’s an official page about how to operate Gnocchi [3]. The main thing I got from this was that the following commands need to be run from the command-line (I ran them as root in a VM for test purposes but would do so with minimum privs for a real deployment).

gnocchi-api
gnocchi-metricd

To communicate with Gnocchi you need the gnocchi-api program running, which uses the uwsgi program to provide the web interface by default. It seems that this was written for a version of uwsgi different than the one in Buster. I filed Debian bug #972087 with a patch to make it work with uwsgi [4]. Note that I didn’t get to the stage of an end to end test, I just got it to basically run without error.

After getting “gnocchi-api” running (in a terminal not as a daemon as Debian doesn’t seem to have a service file for it), I ran the client program “gnocchi” and then gave it the “status” command which failed (presumably due to the metrics daemon not running), but at least indicated that the client and the API could communicate.

Then I ran the “gnocchi-metricd” and got the following error:

2020-10-12 14:59:30,491 [9037] ERROR    gnocchi.cli.metricd: Unexpected error during processing job
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/gnocchi/cli/metricd.py", line 87, in run
    self._run_job()
  File "/usr/lib/python3/dist-packages/gnocchi/cli/metricd.py", line 248, in _run_job
    self.coord.update_capabilities(self.GROUP_ID, self.store.statistics)
  File "/usr/lib/python3/dist-packages/tooz/coordination.py", line 592, in update_capabilities
    raise tooz.NotImplemented
tooz.NotImplemented

At this stage I’ve had enough of gnocchi. I’ll give the Etsy Statsd a go next.

Update

Thomas has responded to this post [5]. At this stage I’m not really interested in giving Gnocchi another go. There’s still the issue of the indexer database which should be different from the main database somehow and sqlite (the config file default) doesn’t work.

I expect that if I was to persist with Gnocchi I would encounter more poorly described error messages from the code which either don’t have Google hits when I search for them or have Google hits to unanswered questions from 5+ years ago.

The Gnocchi systemd config files are in different packages to the programs; this confused me and I thought that there weren’t any systemd service files. I had expected that installing a package with a daemon binary would also get the systemd unit file to match.

The cluster features of Gnocchi are probably really good if you need that sort of thing. But if you have a small instance (e.g. a single VM server) then it’s not needed. Also one of the original design ideas of the Etsy Statsd was that UDP was used because data could just be dropped if there was a problem. I think for many situations the same concept could apply to the entire stats service.
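
Part of the appeal is that the statsd wire protocol is trivial – each metric is a single UDP datagram of the form "name:value|type". A minimal Python sender (assuming a statsd implementation listening on the default port 8125 on localhost):

import socket

def send_metric(name, value, metric_type="c", host="127.0.0.1", port=8125):
    # fire-and-forget: if the stats server is down the packet is simply lost
    msg = ("%s:%s|%s" % (name, value, metric_type)).encode()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(msg, (host, port))

send_metric("logins", 1)                # a counter
send_metric("response_time", 42, "ms")  # a timer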

If the other statsd programs don’t do what I need then I may give Gnocchi another go.

October 11, 2020

Upgrading from Ubuntu 18.04 bionic to 20.04 focal

I recently upgraded from Ubuntu 18.04.5 (bionic) to 20.04.1 (focal) and it was one of the roughest Ubuntu upgrades I've gone through in a while. Here are the notes I took on avoiding or fixing the problems I ran into.

Preparation

Before going through the upgrade, I disabled the configurations which I know interfere with the process:

  • Enable etckeeper auto-commits before install by putting the following in /etc/etckeeper/etckeeper.conf:

      AVOID_COMMIT_BEFORE_INSTALL=0
    
  • Remount /tmp as executable:

      mount -o remount,exec /tmp
    

Another step I should have taken but didn't was to temporarily remove safe-rm, since it caused some problems related to a Perl upgrade happening at the same time:

apt remove safe-rm

Network problems

After the upgrade, my network settings weren't really working properly and so I started by switching from ifupdown to netplan.io which seems to be the preferred way of configuring the network on Ubuntu now.
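
For reference, a minimal netplan configuration (e.g. /etc/netplan/01-netcfg.yaml) looks something like this – the interface name and DHCP setup here are illustrative, not my actual config:

network:
  version: 2
  renderer: networkd
  ethernets:
    enp3s0:
      dhcp4: true

followed by running "netplan apply" to activate it.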

Then I found out that netplan.io is incompatible with systemd-resolved's handling of .local hostnames. I would be able to resolve a hostname using avahi:

$ avahi-resolve --name machine.local
machine.local   192.168.1.5

but not with systemd:

$ systemd-resolve machine.local
machine.local: resolve call failed: 'machine.local' not found

The best work-around I found was to disable systemd-resolved:

systemctl stop systemd-resolved.service
systemctl disable systemd-resolved.service
rm /etc/resolv.conf

Boot problems

For some reason I was able to boot with the kernel I got as part of the focal update, but a later kernel update rendered my machine unbootable.

Adding some missing RAID-related modules to /etc/initramfs-tools/modules:

raid1
dmraid
md-raid1

and then re-creating all initramfs:

update-initramfs -u -k all

seemed to do the trick.

October 09, 2020

Rift CV1 – multi-threaded tracking

This video shows the completion of work to split the tracking code into 3 threads – video capture, fast analysis and long analysis.

If the projected pose of an object doesn’t line up with the LEDs where we expect it to be, the frame is sent off for more expensive analysis in another thread. That way, it doesn’t block tracking of other objects – the fast analysis thread can continue with the next frame.

As a new blob is detected in a video frame, it is assigned an ID, and tracked between frames using motion flow. When the analysis results are available at some point in the future, the ID lets us find blobs that still exist in that most recent video frame. If the blobs are still unknowns in the new frame, the code labels them with the LED ID it found – and then hopefully in the next frame, the fast analysis is locked onto the object again.

There are some obvious next things to work on:

  • It’s hard to decide what constitutes a ‘good’ pose match, especially around partially visible LEDs at the edges. More experimentation and refinement needed
  • The IMU dead-reckoning between frames is bad – the accelerometer biases, especially for the controllers, tend to make them zoom off very quickly and lose tracking. More filtering, bias extraction and investigation should improve that, and help with staying locked onto fast-moving objects.
  • The code that decides whether an LED is expected to be visible in a given pose can use some improving.
  • Often the orientation of a device is good, but the position is wrong – a matching mode that only searches for translational matches could be good.
  • Taking the gravity vector of the device into account can help reject invalid poses, as could some tests against plausible location based on human movements limits.

Code is at https://github.com/thaytan/OpenHMD/tree/rift-correspondence-search

October 05, 2020

Rift CV1 update

This is another in my series of updates on developing positional tracking for the Oculus Rift CV1 in OpenHMD.

In the last post I ended with a TODO list. Since then I’ve crossed off a few things from that, and fixed a handful of very important bugs that were messing things up. I took last week off work, which gave me some extra hacking hours and enthusiasm too, and really helped push things forward.

Here’s the updated list:

  • The full model search for re-acquiring lock when we start, or when we lose tracking takes a long time. More work will mean avoiding that expensive path as much as possible.
  • Multiple cameras interfere with each other.
    • Capturing frames from all cameras and analysing them happens on a single thread, and any delay in processing causes USB packets to be missed.
    • I plan to split this into 1 thread per camera doing capture and analysis of the ‘quick’ case with good tracking lock, and a 2nd thread that does the more expensive analysis when it’s needed. Partially Fixed
  • At the moment the full model search also happens on the video capture thread, stalling all video input for hundreds of milliseconds – by which time any fast motion means the devices are no longer where we expect them to be.
    • This means that by the next frame, it has often lost tracking again, requiring a new full search… making it late for the next frame, etc.
    • The latency of position observations after a full model search is not accounted for at all in the current fusion algorithm, leading to incorrect reporting. Partially Fixed
  • More validation is needed on the camera pose transformations. For the controllers, the results are definitely wrong – I suspect because the controller LED models are supplied (in the firmware) in a different orientation to the HMD and I used the HMD as the primary test. Much Improved
  • Need to take the position and orientation of the IMU within each device into account. This information is in the firmware information but ignored right now. Fixed
  • Filtering! This is a big ticket item. The quality of the tracking depends on many pieces – how well the pose of devices is extracted from the computer vision and how quickly, and then very much on how well the information from the device IMU is combined with those observations. I have read so many papers on this topic, and started work on a complex Kalman filter for it.
  • Improve the model to LED matching. I’ve done quite a bit of work on refining the model matching algorithm, and it works very well for the HMD. It struggles more with the controllers, where there are fewer LEDs and the 2 controllers are harder to disambiguate. I have some things to try out for improving that – using the IMU orientation information to disambiguate controllers, and using better models for what size/brightness we expect an LED to be for a given pose.
  • Initial calibration / setup. Rather than assuming the position of the headset when it is first sighted, I’d like to have a room calibration step and a calibration file that remembers the position of the cameras.
  • Detecting when cameras have been moved. When cameras observe the same device simultaneously (or nearly so), it should be possible to detect if cameras are giving inconsistent information and do some correction.
  • hot-plug detection of cameras and re-starting them when they go offline or encounter spurious USB protocol errors. The latter happens often enough to be annoying during testing.
  • Other things I can’t think of right now.

As you can see, a few of the top-level items have been fixed, or mostly so. I split the computer vision for the tracking into several threads:

  • 1 thread shared between all sensors to capture USB packets and assemble them into frames
  • 1 thread per sensor to analyse the frame and update poses

The goal with that initial split was to prevent the processing of multiple sensors from interfering with each other, but I found that it also has a strong benefit even with a single sensor. I realised something in the last week that I probably should have noted earlier: The Rift sensors capture a video frame every 19.2ms, but that frame then takes a full 17ms to deliver across the USB – which means that when everything was in one thread, even with 1 sensor there was only about 2.2ms for the full analysis to take place, or else we’d miss a packet of the next frame and have to throw it away. With the analysis now happening in a separate thread and a ping-pong double buffer in place, the analysis can take quite a bit longer without losing any video frames.
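
The effect of the ping-pong arrangement is easy to sketch (illustrative Python; the OpenHMD code is C): capture always has somewhere to put the newest frame without blocking, and a slow analysis pass costs at most a skipped frame rather than stalled USB capture:

import threading

class FrameMailbox:
    # capture publishes into one slot while analysis drains the other;
    # an unconsumed frame is simply overwritten by a newer one
    def __init__(self):
        self.pending = None
        self.cond = threading.Condition()

    def publish(self, frame):   # capture thread: never blocks on analysis
        with self.cond:
            self.pending = frame
            self.cond.notify()

    def take(self):             # analysis thread: waits for the next frame
        with self.cond:
            while self.pending is None:
                self.cond.wait()
            frame, self.pending = self.pending, None
            return frame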

I plan to add a 2nd per-sensor thread that will divide the analysis further. The current thread will do only fast pass validation of any existing tracking lock, and will defer any longer term analysis to the other thread. That means that if we have a good lock on the HMD, but can’t (for example) find one of the controllers, searching for the controller will be deferred and the fast pass thread will move onto the next frame and keep tracking lock on the headset.

I fixed some bugs in the calculations that move between frames of reference – converting to/from the global position and orientation in the world to the position and orientation relative to each camera sensor when predicting what the appearance of the LEDs should be. I also added in the IMU offset and orientation of the LED models from the firmware, to make the predictions more accurate when devices move in the time between camera exposures.

Yaw Correction: when a device is observed by a sensor, the orientation is incorporated into what the IMU is measuring. The IMU can sense gravity and knows which way is up or down, but not which way is forward. The observation from the camera now corrects for that yaw drift, to keep things pointing the way you expect them to.

Some other bits:

  • Fixing numerical overflow issues in the OpenHMD maths routines
  • Capturing the IMU orientation and prediction that most closely corresponds to the moment each camera image is recorded, instead of when the camera image finishes transferring to the PC (which is 17ms later)
  • Improving the annotated debug view, to help understand what’s happening in the tracking computer vision steps
  • A 1st order estimate of device velocity to help improve the next predicted position

I posted a longer form video walkthrough of the current code in action, and discussing some of the remaining shortcomings.

As previously, the code is available at https://github.com/thaytan/OpenHMD/tree/rift-correspondence-search

Audiobooks – September 2020

The Demon-Haunted World: Science as a Candle in the Dark by Carl Sagan

Chapters on Pseudoscience vs Science, critical/skeptical thinking, science education and public policy. Hasn’t aged too badly and is well written. 4/5

Don’t Fall For It: A Short History of Financial Scams by Ben Carlson

Real-life Stories of financial scams and scammers (some I’ve heard, some new) and then some lessons people can draw from them. Nice quick read. 3/5

The Bomb and America’s Missile Age by Christopher Gainor

A history of the forces that led to the Atlas program from the end of the War to 1954. Covers a wide range of lead-up rocket programs, technical advances and political, cold-war and inter-service rivalries. 3/5

Girl on the Block: A True Story of Coming of Age Behind the Counter by Jessica Wragg

A memoir of the author’s life and career from 16 to her mid 20s. Mixture of story, information about meat (including recipes), the butchery trade and meat industry. 3/5

The One Device: The Secret History of the iPhone by Brian Merchant

A history of the iPhone and various technologies that went into it. Plus some tours of components and manufacturing. No cooperation from Apple so some big gaps but does okay. 4/5

Humble Pi: A Comedy of Maths Errors by Matt Parker

Lots of examples of where Maths went wrong. From Financial errors and misbuilt bridges to failed game shows. Mix of well-known and some more obscure stories. Well told. 4/5

My Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average, in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all

October 03, 2020

This post has what you might call a religious theme.

A person may refer to themselves as “Christian” or “Atheist” (or for that matter “Hindu” or “Buddhist”, or as being a subject of one of several empires), however if the most important person to them is their false self-image (no matter how grandiose that may be), then what they are actually doing in Real Life™ is Virtual Idolatry.

This is ‘virtual’ in the sense of there being no tangible evidence of the existence of the target of their affections, at all.

If you know of such a person, you will find dealing with their all-important false self to be very confusing, and sooner or later you will learn that to them your real value is zero. Nada. Squat. Nil. Nothing. Nix. You are (to them) temporary and disposable; the only limiting factor is that they will typically pull complex social manoeuvres to conceal their discarding of you, and/or portray it as apparently justifiable (or — more commonly — to be “your fault” and/or “unavoidable” as they Project their own inadequacies onto you).

The best general method for protecting yourself against being damaged by such a person is to quietly, tactfully, withdraw from their presence as much as is realistically possible.

September 30, 2020

Links September 2020

MD5 cracker, find plain text that matches MD5 hash [1].

Debian Quick Image Baker – Debian VM images for various architectures [2].

Insightful article on Scientific American about how dental and orthodontic problems are caused by our modern lifestyle [3].

Cory Doctorow wrote an insightful article for Locus Magazine about Intellectual Property [4]. He makes the case that we are losing our rights due to changes in the legal system that benefit corporations.

Anarcat has an informative blog post about how to stop people using your Mailman installation to attack others [5]. Since version 1:2.1.27-1 the Debian package of Mailman has created a SUBSCRIBE_FORM_SECRET by default on install, but any installation upgraded from an older version of the Debian package won’t have it.
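
If you are checking an upgraded installation, the setting lives in Mailman 2’s configuration, which is plain Python. Something like the following in /etc/mailman/mm_cfg.py, with your own random string:

# Without this secret the web subscribe form can be scripted to
# spam arbitrary addresses with confirmation mail.
SUBSCRIBE_FORM_SECRET = 'replace-with-a-long-random-string'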

September 29, 2020

More than 280 characters

It’s hard to be nuanced in 280 characters.

The Twitter character limit is a major factor in what can make it so much fun to use: you can read, publish, and interact, in extremely short, digestible chunks. But it doesn’t fit every topic, every time. Sometimes you want to talk about complex topics, and have honest, thoughtful discussions. In an environment that encourages hot takes, however, it’s often easier to just avoid having those discussions. I can’t blame people for doing that, either: I find myself taking extended breaks from Twitter, as it can easily become overwhelming.

For me, the exception is Twitter threads.

Twitter threads encourage nuance and creativity.

Creative masterpieces like this Choose Your Own Adventure are not just possible, they rely on Twitter threads being the way they are.

Publishing a short essay about your experiences in your job can bring attention to inequality.

And Tumblr screenshot threads are always fun to read, even when they take a turn for the epic (over 4000 tweets in this thread, and it isn’t slowing down!)

Everyone can think of threads that they’ve loved reading.

My point is, threads are wildly underused on Twitter. I think a big part of that is the UI for writing threads: while it’s suited to writing a thread as a series of related tweet-sized chunks, it doesn’t lend itself to writing, revising, and editing anything more complex.

To help make this easier, I’ve been working on a tool that will help you publish an entire post to Twitter from your WordPress site, as a thread. It takes care of transforming your post into Twitter-friendly content, so you can just… write. 🙂

It doesn’t just handle the tweet embeds from earlier in the thread: it handles uploading and attaching any images and videos you’ve included in your post.

All sorts of embeds work, too. 😉

It’ll be coming in Jetpack 9.0 (due out October 6), but you can try it now in the latest Jetpack Beta! Check it out and tell me what you think. 🙂

This might not fix all of Twitter’s problems, but I hope it’ll help you enjoy reading and writing on Twitter a little more. 💖

September 26, 2020

Repairing a corrupt ext4 root partition

I ran into filesystem corruption (ext4) on the root partition of my backup server which caused it to go into read-only mode. Since it's the root partition, it's not possible to unmount it and repair it while it's running. Normally I would boot from an Ubuntu live CD / USB stick, but in this case the machine is using the mipsel architecture and so that's not an option.

Repair using a USB enclosure

I had to shut down the server and pull the SSD drive out. I then moved it to an external USB enclosure and connected it to my laptop.

I started with an automatic filesystem repair:

fsck.ext4 -pf /dev/sde2

which failed for some reason and so I moved to an interactive repair:

fsck.ext4 -f /dev/sde2

Once all of the errors were fixed, I ran a full surface scan to update the list of bad blocks:

fsck.ext4 -c /dev/sde2

Finally, I forced another check to make sure that everything was fixed at the filesystem level:

fsck.ext4 -f /dev/sde2

Fix invalid alternate GPT

The other thing I noticed is this message in my dmesg log:

scsi 8:0:0:0: Direct-Access     KINGSTON  SA400S37120     SBFK PQ: 0 ANSI: 6
sd 8:0:0:0: Attached scsi generic sg4 type 0
sd 8:0:0:0: [sde] 234441644 512-byte logical blocks: (120 GB/112 GiB)
sd 8:0:0:0: [sde] Write Protect is off
sd 8:0:0:0: [sde] Mode Sense: 31 00 00 00
sd 8:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 8:0:0:0: [sde] Optimal transfer size 33553920 bytes
Alternate GPT is invalid, using primary GPT.
 sde: sde1 sde2

I therefore checked to see if the partition table looked fine and got the following:

$ fdisk -l /dev/sde
GPT PMBR size mismatch (234441643 != 234441647) will be corrected by write.
The backup GPT table is not on the end of the device. This problem will be corrected by write.
Disk /dev/sde: 111.8 GiB, 120034123776 bytes, 234441648 sectors
Disk model: KINGSTON SA400S3
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 799CD830-526B-42CE-8EE7-8C94EF098D46

Device       Start       End   Sectors   Size Type
/dev/sde1     2048   8390655   8388608     4G Linux swap
/dev/sde2  8390656 234441614 226050959 107.8G Linux filesystem

It turns out that all I had to do, since only the backup / alternate GPT partition table was corrupt and the primary one was fine, was to re-write the partition table:

$ fdisk /dev/sde

Welcome to fdisk (util-linux 2.33.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

GPT PMBR size mismatch (234441643 != 234441647) will be corrected by write.
The backup GPT table is not on the end of the device. This problem will be corrected by write.

Command (m for help): w

The partition table has been altered.
Syncing disks.

Run SMART checks

Since I still didn't know what caused the filesystem corruption in the first place, I decided to do one last check: SMART errors.

I couldn't do this via the USB enclosure since the SMART commands aren't forwarded to the drive and so I popped the drive back into the backup server and booted it up.

First, I checked whether any SMART errors had been reported using smartmontools:

smartctl -a /dev/sda

That didn't show any errors and so I kicked off an extended test:

smartctl -t long /dev/sda

which ran for 30 minutes and then passed without any errors.

The mystery remains unsolved.

September 25, 2020

Bandwidth for Video Conferencing

For the Linux Users of Victoria (LUV) I’ve run video conferences on Jitsi and BBB (see my previous post about BBB vs Jitsi [1]). One issue with video conferences is the bandwidth requirements.

The place I’m hosting my video conference server has a NBN link with allegedly 40Mb/s transmission speed and 100Mb/s reception speed. My tests show that it can transmit at about 37Mb/s and receive at speeds significantly higher than that but also quite a bit lower than 100Mb/s (around 60 or 70Mb/s). For a video conference server you have a small number of sources of video and audio and a larger number of targets as usually most people will have their microphones muted and video cameras turned off. This means that the transmission speed is the bottleneck. In every test the reception speed was well below half the transmission speed, so the tests confirmed my expectation that transmission was the only bottleneck, but the reception speed was higher than I had expected.

When we tested bandwidth use the maximum upload speed we saw was about 4MB/s (32Mb/s) with 8+ video cameras and maybe 20 people seeing some of the video (with a bit of lag). We used 3.5MB/s (28Mb/s) when we only had 6 cameras which seemed to be the maximum for good performance.

In another test run we had 4 people all sending video and the transmission speed was about 260KB/s.
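
Since those numbers mix MB/s and Mb/s, here is the trivial conversion (multiply bytes by 8) applied to the figures above:

def to_mbit(mbyte_per_sec):
    # 1 MB/s (bytes) = 8 Mb/s (bits)
    return mbyte_per_sec * 8

print(to_mbit(4.0))   # 32.0 Mb/s: 8+ cameras
print(to_mbit(3.5))   # 28.0 Mb/s: 6 cameras
print(to_mbit(0.26))  # ~2.1 Mb/s: 4 cameras (260KB/s)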

I don’t know how BBB manages the small versions of video streams. It might reduce the bandwidth when the display window is smaller.

I don’t know the resolutions of the cameras. When you start sending video in BBB you are prompted for the “quality” with “medium” being default. I don’t know how different camera hardware and different choices about “quality” affect bandwidth.

These tests showed that, with the cameras we had available, on a 100/40 NBN link (the fastest Internet link in Australia that’s not really expensive) a small group of people can all be sending video, or a medium size group of people can watch video streams from a small group.

For meetings of the typical size of LUV meetings we won’t have a bandwidth problem.

There is one common case that I haven’t yet tested, where there is a single video stream that many people are watching. If 4 people are all sending video with 260KB/s transmission bandwidth then 1 person could probably send video to 4 for 65KB/s. Doing some simple calculations on those numbers implies that we could have 1 person sending video to 240 people without running out of bandwidth. I really doubt that would work, but further testing is needed.

September 23, 2020

Qemu (KVM) and 9P (Virtfs) Mounts

I’ve tried setting up the Qemu (in this case KVM as it uses the Qemu code in question) 9P/Virtfs filesystem for sharing files to a VM. Here is the Qemu documentation for it [1].

VIRTFS="-virtfs local,path=/vmstore/virtfs,security_model=mapped-xattr,id=zz,writeout=immediate,fmode=0600,dmode=0700,mount_tag=zz"
VIRTFS="-virtfs local,path=/vmstore/virtfs,security_model=passthrough,id=zz,writeout=immediate,mount_tag=zz"

Above are the 2 configuration snippets I tried on the server side. The first uses mapped xattrs (which means that all files will have the same UID/GID and on the host XATTRs will be used for storing the Unix permissions) and the second uses passthrough, which requires KVM to run as root and gives the same permissions on the host as on the VM. The advantages of passthrough are better performance through writing less metadata and having the same permissions in host and VM. The advantages of mapped XATTRs are running KVM/Qemu as non-root, and that a SUID file in the VM doesn’t imply a SUID file on the host.
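
For reference, the matching guest-side mount for either snippet looks something like the following (the mount point is arbitrary and msize is optional):

mount -t 9p -o trans=virtio,version=9p2000.L,msize=524288 zz /mnt/virtfs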

Here is the link to Bonnie++ output comparing Ext3 on a KVM block device (stored on a regular file in a BTRFS RAID-1 filesystem on 2 SSDs on the host), a NFS share from the host from the same BTRFS filesystem, and virtfs shares of the same filesystem. The only tests that Ext3 doesn’t win are some of the latency tests; latency is based on the worst case, not the average. I expected Ext3 to win most tests, but didn’t expect it to lose any latency tests.

Here is a link to Bonnie++ output comparing just NFS and Virtfs. It’s obvious that Virtfs compares poorly, giving about half the performance on many tests. Surprisingly the only tests where Virtfs compared well to NFS were the file creation tests which I expected Virtfs with mapped XATTRs to do poorly due to the extra metadata.

Here is a link to Bonnie++ output comparing only Virtfs. The options are mapped XATTRs with default msize, mapped XATTRs with 512k msize (I don’t know if this made a difference, the results are within the range of random differences), and passthrough. There’s an obvious performance benefit in passthrough for the small file tests due to the lower metadata overhead, but as creating small files isn’t a bottleneck on most systems a 20% to 30% improvement in that area probably doesn’t matter much. The result from the random seeks test in passthrough is unusual; I’ll have to do more testing on that.

SE Linux

On Virtfs the XATTR used for SE Linux labels is passed through to the host. So every label used in a VM has to be valid on the host and accessible to the context of the KVM/Qemu process. That’s not really an option so you have to use the context mount option. Having the mapped XATTR mode work for SE Linux labels is a necessary feature.

Conclusion

The msize mount option in the VM doesn’t appear to do anything, and it doesn’t appear in /proc/mounts; I don’t know if it’s even supported in the kernel I’m using.

The passthrough and mapped XATTR modes give near enough the same performance that there doesn’t seem to be a benefit of one over the other.

NFS gives significant performance benefits over Virtfs while also using less CPU time in the VM. It has the issue of files named .nfs* hanging around if the VM crashes while programs were using deleted files. It’s also better known: ask for help with an NFS problem and you are more likely to get advice than when asking for help with a virtfs problem.

Virtfs might be a better option for accessing databases than NFS due to its internal operation probably being a better map to Unix filesystem semantics, but running database servers on the host is probably a better choice anyway.

Virtfs generally doesn’t seem to be worth using. I had hoped for performance that was better than NFS but the only benefit I seemed to get was avoiding the .nfs* file issue.

The best options for storage for a KVM/Qemu VM seem to be Ext3 for files that are only used on one VM and for which the size won’t change suddenly or unexpectedly (particularly the root filesystem) and NFS for everything else.

September 20, 2020

Oculus Rift CV1 progress

In my last post here, I gave some background on how Oculus Rift devices work, and promised to talk more about Rift S internals. I’ll do that another day – today I want to provide an update on implementing positional tracking for the Rift CV1.

I was working on CV1 support quite a lot earlier in the year, and then I took a detour to get basic Rift S support in place. Now that the Rift S works as a 3DOF device, I’ve gone back to plugging away at getting full positional support on the older CV1 headset.

So, what have I been doing on that front? Back in March, I posted this video of a new tracking algorithm I was working on to match LED constellations to object models to get the pose of the headset and controllers:

The core of this matching is a brute-force search that (somewhat cleverly) takes permutations of observed LEDs in the video footage and tests them against permutations of LEDs from the device models. It then uses an implementation of the Lambda Twist P3P algorithm (courtesy of pH5) to compute the possible poses for each combination of LEDs. Next, it projects the points of the candidate pose to count how many other LED blobs will match LEDs in the 3D model and how closely. Finally, it computes a fitness metric and tracks the ‘best’ pose match for each tracked object.
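
In rough Python-flavoured pseudocode the search looks something like this (the helpers are placeholder stubs, and the real implementation is C inside OpenHMD):

from itertools import permutations

def lambda_twist_p3p(obs3, led3):
    return []   # placeholder: would return up to 4 candidate poses

def project(led_model, pose):
    return []   # placeholder: would project model LEDs into the image

def fitness(projected, blobs):
    return 0.0  # placeholder: matched blob count, weighted by closeness

def search_pose(blobs, led_model):
    best_score, best_pose = -1.0, None
    for obs3 in permutations(blobs, 3):           # observed LED blobs
        for led3 in permutations(led_model, 3):   # model LED triples
            for pose in lambda_twist_p3p(obs3, led3):
                score = fitness(project(led_model, pose), blobs)
                if score > best_score:
                    best_score, best_pose = score, pose
    return best_pose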

For the video above, I had the algorithm implemented as a GStreamer plugin that ran offline to process a recording of the device movements. I’ve now merged it back into OpenHMD, so it runs against the live camera data. When it runs in OpenHMD, it also has access to the IMU motion stream, which lets it predict motion between frames – which can help with retaining the tracking lock when the devices move around.

This weekend, I used that code to close the loop between the camera and the IMU. It’s a little simple for now, but is starting to work. What’s in place at the moment is:

  • At startup time, the devices track their movement only as 3DOF devices with default orientation and position.
  • When a camera view gets the first “good” tracking lock on the HMD, it calls that the zero position for the headset, and uses it to compute the pose of the camera relative to the playing space.
  • Camera observations of the position and orientation are now fed back into the IMU fusion to update the position and correct for yaw drift on the IMU (vertical orientation is still taken directly from the IMU detection of gravity)
  • Between camera frames, the IMU fusion interpolates the orientation and position.
  • When a new camera frame arrives, the current interpolated pose is transformed back into the camera’s reference frame and used to test if we still have a visual lock on the device’s LEDs, and to label any newly appearing LEDs if they match the tracked pose
  • The device’s pose is refined using all visible LEDs and fed back to the IMU fusion.

With this simple loop in place, OpenHMD can now track multiple devices, and can do it using multiple cameras – somewhat. The first time the tracking block associated with a camera thinks it has a good lock on the HMD, it uses that to compute the pose of that camera. As long as the lock is genuinely good at that point, and the pose the IMU fusion is tracking is good – then the relative pose between all the cameras is consistent and the tracking is OK. However, it’s easy for that to go wrong and end up with an inconsistency between different camera views that leads to jittery or jumpy tracking…

In the best case, it looks like this:

Which I am pretty happy with 🙂

In that test, I was using a single tracking camera, and had the controller sitting on the desk where the camera couldn’t see it, which is why it was just floating there. SteamVR draws it with a Vive controller model, but the various controller buttons and triggers work. There’s still something weird going on with the controller tracking, though.

What next? I have a list of known problems and TODO items to tackle:

  • The full model search for re-acquiring lock when we start, or when we lose tracking, takes a long time. Future work will focus on avoiding that expensive path as much as possible.
  • Multiple cameras interfere with each other.
    • Capturing frames from all cameras and analysing them happens on a single thread, and any delay in processing causes USB packets to be missed.
    • I plan to split this into 1 thread per camera doing capture and analysis of the ‘quick’ case with good tracking lock, and a 2nd thread that does the more expensive analysis when it’s needed.
  • At the moment the full model search also happens on the video capture thread, stalling all video input for hundreds of milliseconds – by which time any fast motion means the devices are no longer where we expect them to be.
    • This means that by the next frame, it has often lost tracking again, requiring a new full search… making it late for the next frame, etc.
    • The latency of position observations after a full model search is not accounted for at all in the current fusion algorithm, leading to incorrect reporting.
  • More validation is needed on the camera pose transformations. For the controllers, the results are definitely wrong – I suspect because the controller LED models are supplied (in the firmware) in a different orientation to the HMD and I used the HMD as the primary test.
  • Need to take the position and orientation of the IMU within each device into account. This information is in the firmware but ignored right now.
  • Filtering! This is a big ticket item. The quality of the tracking depends on many pieces – how well the pose of devices is extracted from the computer vision and how quickly, and then very much on how well the information from the device IMU is combined with those observations. I have read so many papers on this topic, and started work on a complex Kalman filter for it.
  • Improve the model to LED matching. I’ve done quite a bit of work on refining the model matching algorithm, and it works very well for the HMD. It struggles more with the controllers, where there are fewer LEDs and the 2 controllers are harder to disambiguate. I have some things to try out for improving that – using the IMU orientation information to disambiguate controllers, and using better models for what size/brightness we expect an LED to be for a given pose.
  • Initial calibration / setup. Rather than assuming the position of the headset when it is first sighted, I’d like to have a room calibration step and a calibration file that remembers the position of the cameras.
  • Detecting when cameras have been moved. When cameras observe the same device simultaneously (or nearly so), it should be possible to detect if cameras are giving inconsistent information and do some correction.
  • Hot-plug detection of cameras, and re-starting them when they go offline or encounter spurious USB protocol errors. The latter happens often enough to be annoying during testing.
  • Other things I can’t think of right now.

A nice side effect of all this work is that it can all feed in later to Rift S support. The Rift S uses inside-out tracking to determine the headset’s position in the world – but the filtering to combine those observations with the IMU data will be largely the same, and once you know where the headset is, finding and tracking the controller LED constellations still looks a lot like the CV1’s system.

If you want to try it out, or take a look at the code – it’s up on Github. I’m working in the rift-correspondence-search branch of my OpenHMD repository at https://github.com/thaytan/OpenHMD/tree/rift-correspondence-search

September 19, 2020

Burning Lithium Ion Batteries

I had an old Nexus 4 phone with a battery that was expanding, and decided to test some of the theories about battery combustion.

The first claim that often gets made is that if the plastic seal on the outside of the battery is broken then the battery will catch fire. I tested this by cutting the battery with a craft knife. With every cut the battery sparked a bit, and when I levered up layers of the battery (it seems to be multiple flat layers of copper and black stuff inside the battery) there were more sparks. The battery warmed up; it’s plausible that in a confined environment that could get hot enough to set something on fire. But when the battery was resting on a brick in my backyard that wasn’t going to happen.

The next claim is that a Li-Ion battery fire will be increased with water. The first thing to note is that Li-Ion batteries don’t contain Lithium metal (the Lithium high power non-rechargeable batteries do). Lithium metal will seriously go off if exposed to water. But lots of other Lithium compounds will also react vigorously with water (like Lithium oxide for example). After cutting through most of the center of the battery I dripped some water in it. The water boiled vigorously and the corners of the battery (which were furthest away from the area I cut) felt warmer than they did before adding water. It seems that significant amounts of energy are released when water reacts with whatever is inside the Li-Ion battery. The reaction was probably giving off hydrogen gas but didn’t appear to generate enough heat to ignite hydrogen (which is when things would really get exciting). Presumably if a battery was cut in the presence of water while in an enclosed space that traps hydrogen then the sparks generated by the battery reacting with air could ignite hydrogen generated from the water and give an exciting result.

It seems that a CO2 fire extinguisher would be best for a phone/tablet/laptop fire as that removes oxygen and cools it down. If that isn’t available then a significant quantity of water will do the job; water won’t stop the reaction (it can prolong it), but it can keep the reaction below 100C, which means it won’t burn a hole in the floor and the range of toxic chemicals released will be reduced.

The rumour that a phone fire on a plane could do a “China syndrome” type thing and melt through the Aluminium body of the plane seems utterly bogus. I gave it a good try and was unable to get a battery to burn through its plastic and metal foil case. A spare battery for a laptop in checked luggage could be a major problem for a plane if it ignited. But a battery in the passenger area seems unlikely to be a big problem if plenty of water is dumped on it to prevent the plastic case from burning and polluting the air.

I was not able to get a result that was even worthy of a photograph. I may do further tests with laptop batteries.

September 18, 2020

Setting up and testing an NPR modem on Linux

After acquiring a pair of New Packet Radio modems on behalf of VECTOR, I set them up on my Linux machine and ran some basic tests to check whether they could achieve the advertised 500 kbps transfer rates, which are much higher than AX.25 packet radio.

The exact equipment I used was:

Radio setup

After connecting the modems to the power supply and their respective antennas, I connected both modems to my laptop via micro-USB cables and used minicom to connect to their console on /dev/ttyACM[01]:

minicom -8 -b 921600 -D /dev/ttyACM0
minicom -8 -b 921600 -D /dev/ttyACM1

To confirm that the firmware was the latest one, I used the following command:

ready> version
firmware: 2020_02_23
freq band: 70cm

then I immediately turned off the radio:

radio off

which can be verified with:

status

Following the British Columbia 70 cm band plan, I picked the following frequency, modulation (bandwidth of 360 kHz), and power (0.05 W):

set frequency 433.500
set modulation 22
set RF_power 7

and then did the rest of the configuration for the master:

set callsign VA7GPL_0
set is_master yes
set DHCP_active no
set telnet_active no

and the client:

set callsign VA7GPL_1
set is_master no
set DHCP_active yes
set telnet_active no

and that was enough to get the two modems to talk to one another.

On both of them, I ran the following:

save
reboot

and confirmed that they were able to successfully connect to each other:

who

Monitoring RF

To monitor what is happening on the air and quickly determine whether or not the modems are chatting, you can use a software-defined radio along with gqrx with the following settings:

frequency: 433.500 MHz
filter width: user (80k)
filter shape: normal
mode: Raw I/Q

I found it quite helpful to keep this running the whole time I was working with these modems. The background "keep alive" sounds are quite distinct from the heavy traffic sounds.

IP setup

The radio bits out of the way, I turned to the networking configuration.

On the master, I set the following so that I could connect the master to my home network (192.168.1.0/24) without conflicts:

set def_route_active yes
set DNS_active no
set modem_IP 192.168.1.254
set IP_begin 192.168.1.225
set master_IP_size 29
set netmask 255.255.255.0

(My router's DHCP server is configured to allocate dynamic IP addresses from 192.168.1.100 to 192.168.1.224.)

At this point, I connected my laptop to the client using a CAT-5 network cable and the master to the ethernet switch, essentially following Annex 5 of the Advanced User Guide.

My laptop got assigned IP address 192.168.1.225 and so I used another computer on the same network to ping my laptop via the NPR modems:

ping 192.168.1.225

This gave me a round-trip time of around 150-250 ms.

Performance test

Having successfully established an IP connection between the two machines, I decided to run a quick test to measure the available bandwidth in an ideal setting (i.e. the two antennas very close to each other).

On both computers, I installed iperf:

apt install iperf

and then setup the iperf server on my desktop computer:

sudo iptables -A INPUT -s 192.168.1.0/24 -p TCP --dport 5001 -j ACCEPT
sudo iptables -A INPUT -s 192.168.1.0/24 -p udp --dport 5001 -j ACCEPT
iperf --server

On the laptop, I set the MTU to 750 in NetworkManager and restarted the network.

Then I created a new user account (npr with a uid of 1001):

sudo adduser npr

and made sure that only that account could access the network by running the following as root:

# Flush all chains.
iptables -F

# Set default policies.
iptables -P INPUT DROP
iptables -P OUTPUT DROP
iptables -P FORWARD DROP

# Don't block localhost and ICMP traffic.
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -p icmp -j ACCEPT
iptables -A OUTPUT -o lo -j ACCEPT

# Don't re-evaluate already accepted connections.
iptables -A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
iptables -A OUTPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT

# Allow connections to/from the test user.
iptables -A OUTPUT -m owner --uid-owner 1001 -m conntrack --ctstate NEW -j ACCEPT

# Log anything that gets blocked.
iptables -A INPUT -j LOG
iptables -A OUTPUT -j LOG
iptables -A FORWARD -j LOG

then I started the test as the npr user:

sudo -i -u npr
iperf --client 192.168.1.8

Results

The results were as good as advertised both with modulation 22 (360 kHz bandwidth):

$ iperf --client 192.168.1.8 --time 30
------------------------------------------------------------
Client connecting to 192.168.1.8, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.225 port 58462 connected with 192.168.1.8 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-34.5 sec  1.12 MBytes   274 Kbits/sec

------------------------------------------------------------
Client connecting to 192.168.1.8, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.225 port 58468 connected with 192.168.1.8 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-42.5 sec  1.12 MBytes   222 Kbits/sec

------------------------------------------------------------
Client connecting to 192.168.1.8, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.225 port 58484 connected with 192.168.1.8 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-38.5 sec  1.12 MBytes   245 Kbits/sec

and modulation 24 (1 MHz bandwidth):

$ iperf --client 192.168.1.8 --time 30
------------------------------------------------------------
Client connecting to 192.168.1.8, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.225 port 58148 connected with 192.168.1.8 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-31.1 sec  1.88 MBytes   506 Kbits/sec

------------------------------------------------------------
Client connecting to 192.168.1.8, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.225 port 58246 connected with 192.168.1.8 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-30.5 sec  2.00 MBytes   550 Kbits/sec

------------------------------------------------------------
Client connecting to 192.168.1.8, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.225 port 58292 connected with 192.168.1.8 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-30.0 sec  2.00 MBytes   559 Kbits/sec

September 17, 2020

Dell BIOS Updates

I have just updated the BIOS on a Dell PowerEdge T110 II. The process isn’t too difficult: Google for the machine name and BIOS, download a shell script with an encoded firmware image and GPG signature, then run the script on the system in question.

One problem is that the Dell GPG key isn’t signed by anyone. How hard would it be to get a few well connected people in the Linux community to sign the key used for signing Linux scripts for updating the BIOS? I would be surprised if Dell doesn’t employ a few people who are well connected in the Linux community, they should just ask all employees to sign such GPG keys! Failing that there are plenty of other options. I’d be happy to sign the Dell key if contacted by someone who can prove that they are a responsible person in Dell. If I could phone Dell corporate and ask for the engineering department and then have someone tell me the GPG fingerprint I’ll sign the key and that problem will be partially solved (my key is well connected but you need more than one signature).

The next issue is how to determine that a BIOS update works. What you really don’t want is to have a BIOS update fail and brick your system! So the Linux update process loads the image into something (special firmware RAM maybe) and then reboots the system, and the reboot does the critical part of the update. If the reboot doesn’t work then you end up with the old version of the BIOS. This is overall a good thing.

The PowerEdge T110 II is a workstation with an NVidia video card (I tried an ATI card but that wouldn’t boot for unknown reasons). The Nouveau driver has some issues. One thing I have done to work around some Nouveau issues is to create a file “~/.config/plasma-workspace/env/nouveau-broken.sh” (for KDE sessions) with the following contents:

export LIBGL_ALWAYS_SOFTWARE=1

I previously wrote about using this just for Kmail to stop it crashing [1]. But after doing that I still had other problems with video and disabling all GL on the NVidia card was necessary.

The latest problem I’ve had is that even when using that configuration things don’t go well. When I run the “reboot” command I end up with a kernel message about the GPU not responding and then it doesn’t reboot. That means that the BIOS update doesn’t apply, a hard reboot signals to the system that the new BIOS wasn’t good, and I end up with the old BIOS again. I discovered that disabling sddm (the latest xdm program in Debian) from starting on boot meant that a reboot command would work. Then I ran the BIOS update script and its reboot command worked and gave a successful BIOS update.

So I’ve gone from a 2013 BIOS to a 2018 BIOS! The update list says that some CVEs have been addressed, but the spectre-meltdown-checker doesn’t report any fewer vulnerabilities.

September 15, 2020

More About the PowerEdge R710

I’ve got the R710 (mentioned in my previous post [1]) online. When testing the R710 at home I noticed that sometimes the VGA monitor I was using would start flickering when in some parts of the BIOS setup; it seemed that the horizontal sync wasn’t working properly. It didn’t seem to be a big deal at the time. When I deployed it the KVM display that I had planned to use with it mostly didn’t display anything. When the display was working the KVM keyboard wouldn’t work (and would prevent a regular USB keyboard from working if they were both connected at the same time). The VGA output of the R710 also wouldn’t work with my VGA->HDMI device so I couldn’t get it working with my portable monitor.

Fortunately the Dell front panel has a display and tiny buttons that allow configuring the IDRAC IP address, so I was able to get IDRAC going. One thing Dell really should do is allow the down button to wrap from 0 to 9 when entering numbers; that would make it easier to enter 8.8.8.8 for the DNS server. Another thing Dell should do is give the default gateway a default value derived from the IP address and netmask of the server.

When I got IDRAC going it was easy to setup a serial console, boot from a rescue USB device, create a new initrd with the driver for the MegaRAID controller, and then reboot into the server image.

When I transferred the SSDs from the old server to the newer Dell server the problem I had was that the Dell drive caddies had no holes in suitable places for attaching SSDs. I ended up just pushing the SSDs in, so they are hanging in mid air attached only by the SATA/SAS connectors. Pushing them in uses the space of the drive bay above, so instead of having 2*3.5″ disks I have 1*2.5″ SSD and need the extra space to get my hand in. The R710 is designed for 6*3.5″ disks and I’m going to have trouble if I ever want to have more than 3*2.5″ SSDs. Fortunately I don’t think I’ll need more SSDs.

After booting the system I started getting alerts about a “fault” in one SSD, with no detail on what the fault might be. My guess is that the SSD in question is M.2 and it’s in an M.2 to regular SATA adaptor which might have some problems. The data seems fine though; a BTRFS scrub found no checksum errors. I guess I’ll have to buy a replacement SSD soon.

I configured the system to use the “nosmt” kernel command line option to disable hyper-threading (which won’t provide much performance benefit but which makes certain types of security attacks much easier). I’ve configured BOINC to run on 6/8 CPU cores and surprisingly that didn’t cause the fans to be louder than when the system was idle. It seems that a system that is designed for 6 SAS disks doesn’t need a lot of cooling when run with SSDs.

Update: It’s a R710 not a T710. I mostly deal with Dell Tower servers and typed the wrong letter out of habit.

September 13, 2020

Setting Up a Dell R710 Server with DRAC6

I’ve just got a Dell R710 server for LUV and I’ve been playing with the configuration options. Here’s a list of what I’ve done and recommendations on how to do things. I decided not to try to write a step by step guide to doing stuff as the situation doesn’t lend itself to that. I think that a list of how things are broken and what to avoid is more useful.

BIOS

Firstly with a Dell server you can upgrade the BIOS from a Linux root shell. Generally when a server is deployed you won’t want to upgrade the BIOS (why risk breaking something when it’s working), so before deployment you probably should install the latest version. Dell provides a shell script with encoded binaries in it that you can just download and run; it’s ugly but it works. The process of loading the firmware takes a long time (a few minutes) with no progress reports, so best to plug both PSUs in and leave it alone. At the end of loading the firmware a hard reboot will be performed, so running a distribution upgrade at the same time is a bad idea (Debian is designed to not lose data in this situation so it’s not too bad).

IDRAC

IDRAC is the Integrated Dell Remote Access Controller. By default it will listen on all Ethernet ports, get an IP address via DHCP (using a different Ethernet hardware address to the interface the OS sees), and allow connections. Configuring it to be restrictive as to which ports it listens on may be a good idea (the R710 had 4 ports built in so having one reserved for management is usually OK). You need to configure a username and password for IDRAC that has administrative access in the BIOS configuration.

Web Interface

By default IDRAC will run a web server on the IP address it gets from DHCP, you can connect to that from any web browser that allows ignoring invalid SSL keys. Then you can use the username and password configured in the BIOS to login. IDRAC 6 on the PowerEdge R710 recommends IE 6.

To get a ssl cert that matches the name you want to use (and doesn’t give browser errors) you have to generate a CSR (Certificate Signing Request) on the DRAC, the only field that matters is the CN (Common Name), the rest have to be something that Letsencrypt will accept. Certbot has the option “--config-dir /etc/letsencrypt-drac” to specify an alternate config directory, the SSL key for DRAC should be entirely separate from the SSL key for other stuff. Then use the “--csr” option to specify the path of the CSR file. When you run letsencrypt the file name of the output file you want will be in the format “*_chain.pem“. You then have to upload that to IDRAC to have it used. This is a major pain given the short lifetime of Letsencrypt certificates. Hopefully a more recent version of IDRAC has Certbot built in.

When you access RAC via ssh (see below) you can run the command “racadm sslcsrgen” to generate a CSR that can be used by certbot. So it’s probably possible to write expect scripts to get that CSR, run certbot, and then put the ssl certificate back. I don’t expect to use IDRAC often enough to make it worth the effort (I can tell my browser to ignore an outdated certificate), but if I had dozens of Dells I’d script it.

SSH

The web interface allows configuring ssh access which I strongly recommend doing. You can configure ssh access via password or via ssh public key. For ssh access set TERM=vt100 on the host to enable backspace as ^H. Something like “TERM=vt100 ssh root@drac“. Note that changing certain other settings in IDRAC such as enabling Smartcard support will disable ssh access.

There is a limit to the number of open “sessions” for managing IDRAC, when you ssh to the IDRAC you can run “racadm getssninfo” to get a list of sessions and “racadm closessn -i NUM” to close a session. The closessn command takes a “-a” option to close all sessions but that just gives an error saying that you can’t close your own session because of programmer stupidity. The IDRAC web interface also has an option to close sessions. If you get to the limits of both ssh and web sessions due to network issues then you presumably have a problem.

I couldn’t find any documentation on how the ssh host key is generated. I Googled for the key fingerprint and didn’t get a match so there’s a reasonable chance that it’s unique to the server (please comment if you know more about this).

Don’t Use Graphical Console

The R710 is an older server and runs IDRAC6 (IDRAC9 is the current version). The Java based graphical console access doesn’t work with recent versions of Java. The Debian package icedtea-netx has the javaws command for running the .jnlp file for the console; by default the web browser won’t run this, so you download the .jnlp file and pass that as the first parameter to the javaws program, which then downloads a bunch of Java classes from the IDRAC to run. One error I got with Java 11 was ‘Exception in thread “Timer-0” java.util.ServiceConfigurationError: java.nio.charset.spi.CharsetProvider: Provider sun.nio.cs.ext.ExtendedCharsets could not be instantiated‘, Google didn’t turn up any solutions to this. Java 8 didn’t have that problem but had a “connection failed” error that some people reported as being related to the SSL key, but replacing the SSL key for the web server didn’t help. The suggestion of running a VM with an old web browser to access IDRAC didn’t appeal. So I gave up on this. Presumably a Windows VM running IE6 would work OK for this.

Serial Console

Fortunately IDRAC supports a serial console. Here’s a page summarising Serial console setup for DRAC [1]. Once you have done that put “console=tty0 console=ttyS1,115200n8” on the kernel command line and Linux will send the console output to the virtual serial port. To access the serial console from remote you can ssh in and run the RAC command “console com2” (there is no option for using a different com port). The serial port seems to be unavailable through the web interface.

If I was supporting many Dell systems I’d probably setup a ssh to JavaScript gateway to provide a web based console access. It’s disappointing that Dell didn’t include this.

If you disconnect from an active ssh serial console then the RAC might keep the port locked, then any future attempts to connect to it will give the following error:

/admin1-> console com2
console: Serial Device 2 is currently in use

So far the only way I’ve discovered to get console access again after that is the command “racadm racreset“. If anyone knows of a better way please let me know. As an aside having “racreset” being so close to “racresetcfg” (which resets the configuration to default and requires a hard reboot to configure it again) seems like a really bad idea.

Host Based Management

deb http://linux.dell.com/repo/community/ubuntu xenial openmanage

The above apt sources.list line allows installing Dell management utilities (Xenial is old but they work on Debian/Buster). Probably the packages srvadmin-storageservices-cli and srvadmin-omacore will drag in enough dependencies to get it going.

Here are some useful commands:

# show hardware event log
omreport system esmlog
# show hardware alert log
omreport system alertlog
# give summary of system information
omreport system summary
# show versions of firmware that can be updated
omreport system version
# get chassis temp
omreport chassis temps
# show physical disk details on controller 0
omreport storage pdisk controller=0

RAID Control

The RAID controller is known as PERC (PowerEdge Raid Controller), and the Dell web site has an rpm package of the perccli tool to manage the RAID from Linux. This is statically linked and appears to have been built from various versions of libraries. The command “perccli show” gives an overview of the configuration, but the command “perccli /c0 show” to give detailed information on controller 0 SEGVs and the kernel logs a “vsyscall attempted with vsyscall=none” message. Here’s an overview of the vsyscall emulation issue [2]. Basically I could add “vsyscall=emulate” to the kernel command line and slightly reduce security for the system to allow system calls from perccli that are called from old libc code to work, or I could run code from a dubious source as root.

Some versions of IDRAC have a “racadm raid” command that can be run from a ssh session to perform RAID administration remotely; mine doesn’t have that. As an aside, the help for the RAC system doesn’t list all commands, and the Dell documentation is difficult to find, so blog posts from other sysadmins are the best source of information.

I have configured IDRAC to have all of the BIOS output go to the virtual serial console over ssh so I can see the BIOS prompt me for PERC administration but it didn’t accept my key presses when I tried to do so. In any case this requires downtime and I’d like to be able to change hard drives without rebooting.

I found vsyscall_trace on Github [3], it uses the ptrace interface to emulate vsyscall on a per process basis. You just run “vsyscall_trace perccli” and it works! Thanks Geoffrey Thomas for writing this!

Here are some perccli commands:

# show overview
perccli show
# help on adding a vd (RAID)
perccli /c0 add help
# show controller 0 details
perccli /c0 show
# add a vd (RAID) of level RAID0 (r0) with the drive 32:0 (enclosure:slot from above command)
perccli /c0 add vd r0 drives=32:0

When a disk is added to a PERC controller about 525MB of space is used at the end for RAID metadata. So when you create a RAID-0 with a single device as in the above example all disk data is preserved by default except for the last 525MB. I have tested this by running a BTRFS scrub on a disk from another system after making it into a RAID-0 on the PERC.

Update: It’s a R710 not a T710. I mostly deal with Dell Tower servers and typed the wrong letter out of habit.

Talks from KubeCon + CloudNativeCon Europe 2020 – Part 1

Various talks I watched from their YouTube playlist.

Application Autoscaling Made Easy With Kubernetes Event-Driven Autoscaling (KEDA) – Tom Kerkhove

I’ve been using Keda a little bit at work. Good way to scale on random stuff. At work I’m scaling pods against length of AWS SQS Queues and as a cron. Lots of other options. This talk is a 9 minute intro. A bit hard to read the small font on the screen of this talk.

Autoscaling at Scale: How We Manage Capacity @ Zalando – Mikkel Larsen, Zalando SE

  • These guys have their own HPA replacement for scaling: kube-metrics-adapter.
  • Outlines some new stuff in scaling in 1.18 and 1.19.
  • They also have a fork of the Cluster Autoscaler (although some of what it does seems to duplicate Amazon Fleets).
  • Have up to 1000 nodes in some of their clusters. Have to play with address space per node; they also scale their control plane nodes vertically (control plane autoscaler).
  • Use the Vertical Pod Autoscaler, especially for things like Prometheus that vary with the size of the cluster. Have had problems with it scaling down too fast. They have some of their own custom changes in a fork

Keynote: Observing Kubernetes Without Losing Your Mind – Vicki Cheung

  • Lots of metrics don’t cover what you want and get hard to maintain and complex
  • Monitor core user workflows (ie just test a pod launch and stop)
  • Tiny tools
    • 1 watches for events on cluster and logs them -> elastic
    • 2 watches container events -> elastic
    • End up with one timeline for a deploy/job covering everything
    • Empowers users to do their own debugging

Autoscaling and Cost Optimization on Kubernetes: From 0 to 100 – Guy Templeton & Jiaxin Shan

  • Intro to HPA and metric types. Plus some of the newer stuff like multiple metrics
  • Vertical Pod Autoscaler. Good for single pod deployments. Doesn’t work well with JVM based workloads.
  • Cluster Autoscaler.
    • A few things like using preStop hooks to give pods time to shut down
    • Pod priorities for scaling.
    • --expendable-pods-priority-cutoff to not expand for low-priority jobs
    • Using the priority-expander to try and expand spots first and then fall back to more expensive node types
    • Using mixed instance policy with AWS. Lots of instance types (same CPU/RAM though) to choose from.
    • Look at PodDisruptionBudget
    • Some other CA flags like --scale-down-utilization-threshold to look at.
  • Mention of Keda
  • Best return is probably tuning HPA
  • There is also another similar talk. Note the male speaker talks very slowly, so crank up the speed.

Keynote: Building a Service Mesh From Scratch – The Pinterest Story – Derek Argueta

  • Changed to Envoy as a http proxy for incoming
  • Wrote own extension to make feature complete
  • Also another project migrating to mTLS
    • Huge amount of work for Java.
    • Lots of work to repeat for other languages
    • Looked at getting Envoy to do the work
    • Ingress LB -> Inbound Proxy -> App
  • Used j2 to build the Static config (with checking, tests, validation)
  • Rolled out to put envoy in front of other services with good TLS termination default settings
  • Extra Mesh Use Cases
    • Infrastructure-specific routing
    • SLI Monitoring
    • http cookie monitoring
  • Became a platform that people wanted to use.
  • Solving one problem first and incrementally using other things. Many groups had similar problems. “Just a node in a mesh”.

Improving the Performance of Your Kubernetes Cluster – Priya Wadhwa, Google

  • Tools – Mostly tested locally with Minikube (she is a Minikube maintainer)
  • Minikube pause – pause the Kubernetes system processes and leave the app running; good if the cluster isn’t changing.
  • Looked at some articles from Brendan Gregg
  • Ran USE Method against Minikube
  • eBPF BCC tools against Minikube
  • biosnoop – noticed lots of writes from etcd
  • KVM Flamegraph – Lots of calls from ioctl
  • Theory that etcd writes might be a big contributor
  • How to tune etcd writes (updated --snapshot-count flag to various numbers but didn’t seem to help)
  • Noticed CPU spikes every few seconds
  • “pidstat 1 60”. Noticed kubectl command running often. Running “kubectl apply addons” regularly
  • Suspected addon manager running often
  • Could increase addon manager polltime but then addons would take a while to show up.
  • But in Minikube not a problem because minikube knows when new addons are added, so it can run the addon manager directly rather than polling.
  • 32% reduction in overhead from turning off addon polling
  • Also reduced coredns number to one.
  • pprof – go tool
  • kube-apiserver pprof data
  • Spending lots of times dealing with incoming requests
  • Lots of requests from kube-controller-manager and kube-scheduler around leader-election
  • But Minikube is only running one of each. No need to elect a leader!
  • Flag to turn both off: --leader-elect=false
  • 18% reduction from reducing coredns to 1 and turning leader election off.
  • Back to looking at etcd overhead with pprof
  • writeFrameAsync in http calls
  • Theory: could increase --proxy-refresh-interval from 30s up to 120s. Good value at 70s but unsure what the behavior was. Asked and it didn’t appear to be a big problem.
  • 4% reduction in overhead

September 10, 2020

Playing with PAPR

The average power of a FreeDV signal is surprisingly hard to measure as the parallel carriers produce a waveform that has many peaks and troughs as the various carriers come in and out of phase with each other. Peter, VK3RV, has been working on some interesting experiments to measure FreeDV power using calorimeters. His work got me thinking about FreeDV power and in particular ways to improve the Peak to Average Power Ratio (PAPR).

I’ve messed with a simple clipper for FreeDV 700C in the past, but decided to take a more scientific approach and use some simulations to measure the effect of clipping on FreeDV PAPR and BER. As usual, asking a few questions blew up into a several week long project, with the usual bugs and strange, too-good-to-be-true initial results until I started to get results that felt sensible. I’ve tested some of the ideas over the air (blowing up an attenuator along the way), and learnt a lot about PAPR and related subjects like Peak Envelope Power (PEP).

The goal of this work is to explore the effect of a clipper on the average power and ultimately the BER of a received FreeDV signal, given a transmitter with a fixed peak output power.

Clipping to reduce PAPR

In normal operation we adjust our Tx drive so the peaks just trigger the ALC. This sets the average power at Ppeak – PAPR (working in dB), for example Pav = 100W PEP – 10dB = 10W average.

The idea of the clipper is to chop the tops off the FreeDV waveform so the PAPR is decreased. We can then increase the Tx drive, and get a higher average power. For example if PAPR is reduced from 10 to 4dB, we get Pav = 100W PEP – 4dB = 40W. That’s 4x the average power output of the 10dB PAPR case. Woohoo!
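
The arithmetic is a straightforward dB calculation, for anyone who wants to plug in their own numbers:

def average_power(p_peak_watts, papr_db):
    # Pav = Ppeak / 10^(PAPR/10)
    return p_peak_watts / (10 ** (papr_db / 10.0))

print(average_power(100, 10))  # 10.0W average
print(average_power(100, 4))   # 39.8W average, roughly 40W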

In the example below the 16 carrier waveform was clipped and the PAPR reduced from 10.5 to 4.5dB. The filtering applied after the clipper smooths out the transitions (and limits the bandwidth to something reasonable).

However it gets complicated. Clipping actually reduces the average power, as we’ve removed the high energy parts of the waveform. It also distorts the signal. Here is a scatter diagram of the signal before and after clipping:

The effect looks like additive noise. Hmmm, and what happens on multipath channels, does the modem perform the same as for AWGN with clipped signals? Another question – how much clipping should we apply?

So I set about writing a simulation (papr_test.m) and doing some experiments to increase my understanding of clippers, PAPR, and OFDM modem performance using typical FreeDV waveforms. I started out trying a few different compression methods such as different compander curves, but found that clipping plus a bandpass filter gives about the same result. So for simplicity I settled on clipping. Throughout this post many graphs are presented in terms of Eb/No – for the purpose of comparison just consider this the same thing as SNR. If the Eb/No goes up by 1dB, so does the SNR.

Here’s a plot of PAPR versus the number of carriers, showing PAPR getting worse with the number of carriers used:

Random data was used for each symbol. As the number of carriers increases, you start to get phases in carriers cancelling due to random alignment, reducing the big peaks. Behaviour with real world data may be different; if there are instances where the phases of all carriers are aligned there may be larger peaks.

To define the amount of clipping I used an estimate of the PDF and CDF:

The PDF (or histogram) shows how likely a certain level is, and the CDF shows the cumulative PDF. High level samples are quite unlikely. The CDF shows us what proportion of samples are above and below a certain level. This CDF shows us that 80% of the samples have a level of less than 4, so only 20% of the samples are above 4. So a clip level of 0.8 means the clipper hard limits at a level of 4, which would affect the top 20% of the samples. A clip value of 0.6 would mean samples with a level of 2.7 and above are clipped.
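
papr_test.m is the real (Octave) simulation; the fragment below is just a minimal Python/numpy sketch of the percentile-based clip level described above, without the bandpass filter:

import numpy as np

def papr_db(x):
    # Peak to average power ratio of a complex baseband signal.
    power = np.abs(x) ** 2
    return 10 * np.log10(power.max() / power.mean())

# A 16 carrier OFDM-like test signal with random symbol phases.
rng = np.random.default_rng(1)
n = 8000
t = np.arange(n)
x = sum(np.exp(1j * (2 * np.pi * (k + 1) * t / n + ph))
        for k, ph in enumerate(rng.uniform(0, 2 * np.pi, 16)))

# A clip level of 0.8 hard limits at the magnitude that 80% of the
# samples sit below, so only the top 20% of samples are affected.
mag = np.abs(x)
threshold = np.percentile(mag, 80)
clipped = x * np.minimum(1.0, threshold / np.maximum(mag, 1e-12))

print(papr_db(x), papr_db(clipped))  # PAPR drops after clipping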

Effect of clipping on BER

Here are a bunch of curves that show the effect of clipping on AWGN and multipath channels (roughly CCIR poor). A 16 carrier signal was used – typical of FreeDV waveforms. The clipping level and resulting PAPR are shown in the legend. I also threw in a Tx diversity curve – sending each symbol twice on double the carriers. This is the approach used on FreeDV 700C and tends to help a lot on multipath channels.

As we clip the signal more and more, the BER performance gets worse (Eb/No x-axis) – but the PAPR is reduced, so we can increase the average power, which improves the BER. I’ve tried to show the combined effect on the (peak Eb/No x-axis) curves, which scale each curve according to its PAPR requirements. This shows the peak power required for a given BER. Lower is better.




Take aways:

  1. The 0.8 and 0.6 clip levels work best on the peak Eb/No scale, i.e. when we combine the effect of the hit on BER performance (bad) and the PAPR improvement (good).
  2. There is about 4dB improvement across a range of operating points. This is pretty significant – similar to the gains we get from Tx diversity or a good FEC code.
  3. AWGN and Multipath improvements are similar – good. Sometimes you get an algorithm that works well on AWGN but falls in a heap on multipath channels, which are typically much tougher to push bits through.
  4. I also tried 8 carrier waveforms, which produced results about 1dB better, as I guess fewer carriers have a lower PAPR to start with.
  5. Non-linear techniques like clipping spread the energy in frequency.
  6. Filtering to constrain the frequency spread brings the PAPR up again. We can trade off PAPR with bandwidth: lower PAPR, more bandwidth.
  7. Non-linear technqiques will mess with QAM more. So we may hit a wall at high data rates.

Testing on a Real PA

All these simulations are great, but how do they compare with operation on a real HF radio? I designed an experiment to find out.

First, some definitions.

The same FreeDV OFDM signal is represented in different ways as it winds its way through the FreeDV system:

  1. Complex valued samples are used for much of the internal signal processing.
  2. Real valued samples at the interfaces, e.g. for getting samples in and out of a sound card and standard HF radio.
  3. Analog baseband signals, e.g. voltage inside your radio.
  4. Analog RF signals, e.g. at the output of your PA, and input to your receiver terminals.
  5. An electromagnetic wave.

It’s the same signal, as we can convert freely between the representations with no loss of fidelity, but the representation can change the way measures like PAPR work. This caused me some confusion – for example the PAPR of the real valued signal is about 3dB higher than the complex valued version! I’m still a bit fuzzy on this one, but have satisfied myself that the PAPR of the complex signal is the same as the PAPR of the RF signal – which is what we really care about.

Another definition that I had to (re)study was Peak Envelope Power (PEP) – the peak power averaged over one or more carrier cycles. This is the RF equivalent of our “peak” in PAPR. When driven by any baseband input signal, it’s the maximum RF power of the radio, averaged over one or more carrier cycles. Signals such as speech and FreeDV waveforms will have occasional peaks that hit the PEP. A baseband sine wave driving the radio would generate an RF signal that sits at the PEP power continuously.

Here is the experimental setup:

The idea is to play canned files through the radio, and measure the average Tx power. It took me several attempts before my experiment gave sensible results. A key improvement was to make the peak power of each sampled signal the same. This means I don’t have to keep messing with the audio drive levels to ensure I have the same peak power. The samples are 16 bits, so I normalised each file such that the peak was at +/- 10000.
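
The normalisation step is simple; here is a hedged sketch assuming WAV files (the originals may be raw 16 bit samples, and the file names are placeholders):

import numpy as np
from scipy.io import wavfile

# Scale a 16 bit file so its peak sits at +/- 10000.
def normalise_peak(infile, outfile, peak=10000):
    fs, x = wavfile.read(infile)
    y = x.astype(np.float64) * peak / np.max(np.abs(x))
    wavfile.write(outfile, fs, np.round(y).astype(np.int16))

normalise_peak("ve9qrp.wav", "ve9qrp_norm.wav")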

Here is the RF power sampler:

It works pretty well on signals from my FT-817 and IC-7200, and will help prevent any more damage to RF test equipment. I used my RF sampler after my first attempt using a SMA barrel attenuator resulted in its destruction when I accidentally put 5W into it! Suddenly it went from 30dB to 42dB attenuation. Oops.

For all the experiments I am tuned to 7.175 MHz and have the FT-817 on its lowest power level of 0.5W.

For my first experiment I played a 1000 Hz sine wave into the system and measured the average power. I like to start with simple signals – something known that lets me check all the fiddly RF kit is actually working. After a few hours of messing about I did indeed see 27dBm (0.5W) on my spec-an. So, for a signal with 0dB PAPR, we measure average power = PEP. Check.

In my next experiment, I measured the effect of ALC on Tx power. With the FT-817 on its lowest power setting (0.5W), I increased the drive from the point just before the ALC bars came on. Here is the relationship I found between ALC bars and output power:

Bars  Tx Power (dBm)
0 26.7
1 26.4
2 26.7
3 27.0

So the ALC really does clamp the power at the peak value.

On to more complex FreeDV signals.

Measuring the average power of OFDM/parallel tone signals proved much harder on the spec-an. The power bounces around over a period of several seconds as the OFDM waveform evolves, which can derail many power measurement techniques. The time constant, or measurement window, is important – we want to capture the total power over a few seconds and average the value.

After several attempts and lots of head scratching I settled on the following spec-an settings:

  1. 10s sweep time, so the RBW filter is averaging a lot of time varying power at each point in the sweep.
  2. 100kHz span.
  3. RBW/VBW of 10 kHz so we capture all of the 1kHz wide OFDM signal in the RBW filter peak when averaging.
  4. Power averaging over 5 samples.

The two-tone signal was included to help me debug my spec-an settings, as it has a known (3dB) PAPR.

Here is a table showing the results for several test signals, all of which have the same peak power:

Sample Description PAPR Theory/Sim (dB) PAPR Measured (dB)
sine1000 sine wave at 1000 Hz 0 0
sine_800_1200 two tones at 800 and 1200Hz 3 4
vanilla 700D test frames unclipped 7.1 7
clip0.8 700D test frames clipped at 0.8 3.4 4
ve9qrp 700D with real speech payload data 11 10.5

Click on the file name to listen to a 5 second sample of each signal. The lower PAPR (higher average power) signals sound louder – I guess our ears work on average power too! I kept the drive constant and the PEP/peak just happened to hit 26dBm. It’s not critical, as long as the drive (and hence peak level) is the same across all waveforms tested.

Note the two tone “control” is 1dB off (4dB measured on a known 3dB PAPR signal); I’m not happy about that. This suggests a set up issue or limitation on my spec-an (e.g. the way it averages power).

However the other signals line up OK to the simulated values, within about +/- 0.5dB, which suggests I’m on the right track with my simulations.

The modulated 700D test frame signals were generated by the Octave ofdm_tx.m script, which reports the PAPR of the complex signal. The same test frame repeats continuously, which makes BER measurements convenient, but is slightly unrealistic. The PAPR was lower than the ve9qrp signal, which has real speech payload data – perhaps because the more random, real world payload data leads to occasional frames where the phases of the carriers align, producing large peaks.

Another source of discrepancy is the non-flat frequency response of the baseband audio/crystal filter path the signal has to flow through before it emerges as RF.

The zero-span spec-an setting plots power over time, and is very useful for visualising PAPR. The first plot shows the power of our 1000 Hz sine signal (yellow), and the two tone test signal (purple):

You can see how mixing just two signals modulates the power over time, the effect on PAPR, and how the average power is reduced. Next we have the ve9qrp signal (yellow), and our clip 0.8 signal (purple):

It’s clear the clipped signal has a much higher average power. Note the random way the waveform power peaks and dips as the various carriers come into phase. Also note there are very few high power peaks in the ve9qrp signal – in this sample none hit +26dBm, as they are fairly rare.

I found eye-balling the zero-span plots gave me similar values to non-zero span results in the table above, a good cross check.

Take aways:

  1. Clipping is indeed improving our measured average power, but there are some discrepancies between the measured PAPR values and those estimated from theory/simulation.
  2. Using a SDR to receive the signal and measuring PAPR with my own maths might be easier than fiddling with the spec-an and guessing at its internal algorithms.
  3. PAPR is worse for real world signals (e.g. ve9qrp) than my canned test frames due to relatively rare alignments of the carrier phases. This might only happen once every few seconds, but significantly raises the PAPR, and hurts our average power. These occasional peaks might be triggering the ALC, pushing the average power down every time they occur. As they are rare, these peaks can be clipped with no impact on perceived speech quality. This is why I like the CDF/PDF method of setting thresholds, it lets us discard rare (low probability) outliers that might be hurting our average power.

Conclusions and Further work

The simulations suggest we can improve FreeDV by 4dB using the right clipper/filter combination. Initial tests over a real PA show we can indeed reduce PAPR in line with our simulations.

This project has led me down an interesting rabbit hole that has kept me busy for a few weeks! Just in case I haven’t had enough, some ideas for further work:

  1. Align these clipping levels and filtering to FreeDV 700D (and possibly 2020). There is existing clipper and filter code, but the thresholds were set by educated guess several years ago for 700C.
  2. Currently each FreeDV waveform is scaled to have the same average power. This is the signal fed via the sound card to your Tx. Should the levels of each FreeDV waveform be adjusted to be the same peak value instead?
  3. Design an experiment to prove BER performance at a given SNR is improved by 4dB as suggested by these simulations. Currently all we have measured is the average power and PAPR – we haven’t actually verified the expected 4dB increase in performance (suggested by the BER simulations above) which is the real goal.
  4. Try the experiments on a SDR Tx – they tend to get results closer to theory due to no crystal filters/baseband audio filtering.
  5. Try the experiments on a 100WPEP Tx – I have ordered a dummy load to do that relatively safely.
  6. Explore the effect of ALC on FreeDV signals and why we set the signals to “just tickle” the ALC. This is something I don’t really understand, but have just assumed is good practice based on other people’s experiences with parallel tone/OFDM modems and on-air FreeDV use. I can see how ALC would compress the amplitude of the OFDM waveform – which this blog post suggests might be a good thing! Perhaps it does so in an uncontrolled manner – as the curves above show, the amount of compression is pretty important. “Just tickling the ALC” guarantees us a linear PA – so we can handle any needed compression/clipping carefully in the DSP.
  7. Explore other ways of reducing PAPR.

To peel away the layers of a complex problem is very satisfying. It always takes me several goes; improvements come as the bugs fall out one by one. Writing these blog posts often makes me sit back and say “huh?”, as I discover things that don’t make sense when I write them up. I guess that’s the review process in action.

Links

Design for the RF Sampler I built; mine has a 46dB loss.

Peak to Average Power Ratio for OFDM – Nice discussion of PAPR for OFDM signals from DSPlog.

September 01, 2020

BBB vs Jitsi

I previously wrote about how I installed the Jitsi video-conferencing system on Debian [1]. We used that for a few unofficial meetings of LUV to test it out. Then we installed Big Blue Button (BBB) [2]. The main benefit of Jitsi over BBB is that it supports live streaming to YouTube. The benefits of BBB are a better text chat system and a “whiteboard” that allows conference participants to draw shared diagrams. So if you have the ability to run both systems then it’s best to use Jitsi when you have so many viewers that a YouTube live stream is needed and to use BBB in all other situations.

One problem is with the ability to run both systems. Jitsi isn’t too hard to install if you are installing it on a VM that is not used for anything else. BBB is a major pain no matter what you do. The latest version of BBB is 2.2 which was released in March 2020 and requires Ubuntu 16.04 (which was released in 2016 and has “standard support” until April next year) and doesn’t support Ubuntu 18.04 (released in 2018 and has “standard support” until 2023). The install script doesn’t check for correct apt repositories and breaks badly with no explanation if you don’t have Ubuntu Multiverse enabled.

I expect that they rushed a release because of the significant increase in demand for video conferencing this year. But that’s no reason for demanding the 2016 version of Ubuntu – why couldn’t they have developed on version 18.04 for the last 2 years? Since that release they have had 6 months in which they could have released a 2.2.1 version supporting Ubuntu 18.04 or even 20.04.

The dependency list for BBB is significant, among other things it uses LibreOffice for the whiteboard. This adds to the pain of installing and maintaining it. It wouldn’t surprise me if some of the interactions between all the different components have security issues.

Conclusion

If you want something that’s not really painful to install and run then use Jitsi.

If you need YouTube live streaming use Jitsi.

If you need whiteboards and a good text chat system or if you generally need to run things like a classroom then BBB is a good option. But only if you can manage it, know someone who can manage it for you, or are happy to pay for a managed service provider to do it for you.

August 30, 2020

Audiobooks – August 2020

Truth, Lies, and O-Rings: Inside the Space Shuttle Challenger Disaster by Allan J. McDonald

The author was a senior manager in the booster team who cooperated more fully with the investigation than NASA or his company’s bosses would have preferred. Mostly accounts of meetings, hearings & coverups with plenty of technical details. 3/5

The Ascent of Money: A Financial History of the World by Niall Ferguson

A quick tour though the rise of various financial concepts like insurance, bonds, stock markets, bubbles, etc. Nice quick intro and some well told stories. 4/5

The Other Side of the Coin: The Queen, the Dresser and the Wardrobe by Angela Kelly

An authorized book from the Queen’s dresser. Some interesting stories. Behind-the-scenes on typical days and regular events. Okay even without photos. 3/5

Second Wind: A Sunfish Sailor, an Island, and the Voyage That Brought a Family Together by Nathaniel Philbrick

A writer takes up competitive sailing after a gap of 15 years, training on winter ponds in prep for the Nationals. A nice read. 3/5

Spitfire Pilot by Flight-Lieutenant David M. Crook, DFC

An account of the Author’s experiences as a pilot during the Battle of Britain. Covering air-combat, missions, loss of friends/colleagues and off-duty life. 4/5

Wild City: A Brief History of New York City in 40 Animals by Thomas Hynes

A chapter on each species, usually with information about incidents they were involved in (see “Tigers”) or the growth, decline, and comeback of their population & habitat. 3/5

Fire in the Sky: Cosmic Collisions, Killer Asteroids, and the Race to Defend Earth by Gordon L. Dillow

A history of the field and some of the characters. Covers space missions, searchers, discovery, movies and the like. Interesting throughout. 4/5

The Long Winter: Little House Series, Book 6 by Laura Ingalls Wilder

The family move into their store building in town for the winter. Blizzard after blizzard sweeps through the town over the next few months and starvation or freezing threatens. 3/5

The Time Traveller’s Almanac Part 1: Experiments edited by Anne and Jeff VanderMeer

First of 4 volumes of short stories. 14 stories, many by well known names (e.g. Silverberg, Le Guin). A good collection. 3/5

A Long Time Ago in a Cutting Room Far, Far Away: My Fifty Years Editing Hollywood Hits—Star Wars, Carrie, Ferris Bueller’s Day Off, Mission: Impossible, and More by Paul Hirsch

Details of the editing profession & technology. Lots of great stories. 4/5

My Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average, in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all


Linkedin putting pressure on users to enable location tracking

I got this email from Linkedin this morning. It is telling me that they are going to change my location from “Auckland, New Zealand” to “Auckland, Auckland, New Zealand”.

Email from Linkedin on 30 August 2020

Since “Auckland, Auckland, New Zealand” sounds stupid to New Zealanders (Auckland is pretty much a big city with a single job market and is not a state or similar) I clicked on the link and opened the application to stick with what I currently have.

Except the problem is that the pulldown doesn’t offer me any other locations.

The only way to change the location is to click “Use Current Location” and then allow Linkedin to access my device’s location.

According to the help page:

By default, the location on your profile will be suggested based on the postal code you provided in the past, either when you set up your profile or last edited your location. However, you can manually update the location on your LinkedIn profile to display a different location.

but it appears the manual method is disabled. I am guessing they have a fixed list of locations for my postcode and this can’t be changed.

So it appears that my options are to accept Linkedin’s crappy name for my location (other NZers have posted problems with their location naming) or to allow Linkedin to spy on my location – and it’ll probably still assign the same dumb name.

This basically appears to be a way for Linkedin to push users to enable location tracking, while at the same time forcing their own ideas of how New Zealand locations work onto users.


August 29, 2020

How to create bridges on bonds (with and without VLANs) using NetworkManager

Some production systems you face might make use of bonded network connections that you need to bridge in order to get VMs onto them. That bond may or may not have a native VLAN (in which case you bridge the bond), or it might have VLANs on top (in which case you want to bridge the VLANs), or perhaps you need to do both.

Let’s walk through an example where we have a bond that has a native VLAN, that also has the tagged VLAN 123 on top (and maybe a second VLAN 456), all of which need to be separately bridged. This means we will have the bond (bond0) with a matching bridge (br-bond0), plus a VLAN on the bond (bond0.123) with its matching bridge (br-vlan123). It should look something like this.

+------+   +---------+                           +---------------+
| eth0 |---|         |          +------------+   |  Network one  |
+------+   |         |----------|  br-bond0  |---| (native VLAN) |
           |  bond0  |          +------------+   +---------------+
+------+   |         |                                            
| eth1 |---|         |                                            
+------+   +---------+                           +---------------+
            | |   +---------+   +------------+   |  Network two  |
            | +---| vlan123 |---| br-vlan123 |---| (tagged VLAN) |
            |     +---------+   +------------+   +---------------+
            |                                                     
            |     +---------+   +------------+   +---------------+
            +-----| vlan456 |---| br-vlan456 |---| Network three |
                  +---------+   +------------+   | (tagged VLAN) |
                                                 +---------------+

To make it more complicated, let’s say that the native VLAN on the bond needs a static IP and to operate at an MTU of 1500 while the other uses DHCP and needs MTU of 9000.

OK, so how do we do that?

Start by creating the bridge, then create the interface that attaches to it. VLANs are created on the bond, then attached as slaves to their bridges.

Create the bridge for the bond

First, let’s create the bridge for our bond. We’ll export some variables to make scripting easier, including the name, the value for spanning tree protocol (STP) and the MTU. Note that in this example the bridge will have an MTU of 1500 (but the bond itself will be 9000, to support other VLANs at that MTU size).

BRIDGE=br-bond0
BRIDGE_STP=yes
BRIDGE_MTU=1500

OK so let’s create the bridge for the native VLAN on the bond (which doesn’t exist yet).

nmcli con add ifname "${BRIDGE}" type bridge con-name "${BRIDGE}"
nmcli con modify "${BRIDGE}" bridge.stp "${BRIDGE_STP}"
nmcli con modify "${BRIDGE}" 802-3-ethernet.mtu "${BRIDGE_MTU}"

By default this will look for an address with DHCP. If you don’t want that you can either set it manually:

nmcli con modify "${BRIDGE}" ipv4.method static ipv4.address 192.168.0.123/24 ipv6.method ignore

Or disable IP addressing:

nmcli con modify "${BRIDGE}" ipv4.method disabled ipv6.method ignore

Finally, bring up the bridge. Yes, we don’t have anything attached to it yet, but that’s OK.

nmcli con up "${BRIDGE}"

You should be able to see it with nmcli and brctl tools (if available on your distro), although note that there is no device attached to this bridge yet.

nmcli con
brctl show

Next, we create the bond to attach to the bridge.

Create the bond and attach to the bridge

Let’s create the bond. In my example I’m using active-backup (mode 1) but your bond may use balance-rr (round robin, mode 0) or, depending on your switching, perhaps something like link aggregation control protocol (LACP) which is 802.3ad (mode 4).

Let’s say that your bond (we’re going to call bond0) has two interfaces, which are eth0 and eth1 respectively. Note that in this example, although the native interface on this bond wants an MTU of 1500, the VLANs which sit on top of the bond need a higher MTU of 9000. Thus, we set the bridge to 1500 in the previous step, but we need to set the bond and its interfaces to 9000. Let’s export those now to make scripting easier.

BOND=bond0
BOND_SLAVE0=eth0
BOND_SLAVE1=eth1
BOND_MODE=active-backup
BOND_MTU=9000

Now we can go ahead and create the bond, setting the options and the slave devices.

nmcli con add type bond ifname "${BOND}" con-name "${BOND}"
nmcli con modify "${BOND}" bond.options mode="${BOND_MODE}"
nmcli con modify "${BOND}" 802-3-ethernet.mtu "${BOND_MTU}"
nmcli con add type ethernet con-name "${BOND}-slave-${BOND_SLAVE0}" ifname "${BOND_SLAVE0}" master "${BOND}"
nmcli con add type ethernet con-name "${BOND}-slave-${BOND_SLAVE1}" ifname "${BOND_SLAVE1}" master "${BOND}"
nmcli con modify "${BOND}-slave-${BOND_SLAVE0}" 802-3-ethernet.mtu "${BOND_MTU}"
nmcli con modify "${BOND}-slave-${BOND_SLAVE1}" 802-3-ethernet.mtu "${BOND_MTU}"

OK at this point you have a bond specified, great! But now we need to attach it to the bridge, which is what will make the bridge actually work.

nmcli con modify "${BOND}" master "${BRIDGE}" slave-type bridge

Note that before we bring up the bond (or afterwards) we need to disable or delete any existing network connections for the individual interfaces. Check this with nmcli con and delete or disable those connections. Note that this may disconnect you, so make sure you have a console to the machine.
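
For example, something like this (the connection names will vary from system to system):

nmcli con show
nmcli con delete "Wired connection 1"
nmcli con delete "Wired connection 2"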

Now, we can bring the bond up which will also activate our interfaces.

nmcli con up "${BOND}"

We can check that the bond came up OK.

cat /proc/net/bonding/bond0

And this bond should also now be on the network, via the bridge which has an IP set.

Now if you look at the bridge you can see there is an interface (bond0) attached to it (your distro might not have brctl).

nmcli con
ls /sys/class/net/br-bond0/brif/
brctl show

Bridging a VLAN on a bond

Now that we have our bond, we can create the bridges for our tagged VLANs (remember that the bridge connected to the bond carries the native VLAN, so it didn’t need a VLAN interface).

Create the bridge for the VLAN on the bond

Create the new bridge, which for our example is going to use VLAN 123 with an MTU of 9000.

VLAN=123
BOND=bond0
BRIDGE=br-vlan${VLAN}
BRIDGE_STP=yes
BRIDGE_MTU=9000

OK let’s go! (This is the same as the first bridge we created.)

nmcli con add ifname "${BRIDGE}" type bridge con-name "${BRIDGE}"
nmcli con modify "${BRIDGE}" bridge.stp "${BRIDGE_STP}"
nmcli con modify "${BRIDGE}" 802-3-ethernet.mtu "${BRIDGE_MTU}"

Again, this will look for an address with DHCP, so if you don’t want that, then disable it or set an address manually (as per first example). Then you can bring the device up.

nmcli con up "${BRIDGE}"

Create the VLAN on the bond and attach to bridge

OK, now we have the bridge, we create the VLAN on top of bond0 and then attach it to the bridge we just created.

nmcli con add type vlan con-name "${BOND}.${VLAN}" ifname "${BOND}.${VLAN}" dev "${BOND}" id "${VLAN}"
nmcli con modify "${BOND}.${VLAN}" master "${BRIDGE}" slave-type bridge
nmcli con modify "${BOND}.${VLAN}" 802-3-ethernet.mtu "${BRIDGE_MTU}"

If you look at bridges now, you should see the one you just created, attached to a VLAN device (note, your distro might not have brctl).

nmcli con
brctl show

And that’s about it! Now you can attach VMs to those bridges and have them on those networks. Repeat the process for any other VLANs you need on top of the bond, for example as sketched below.
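
For VLAN 456 from the diagram, the same steps would look something like this (same commands as above, with new values):

VLAN=456
BOND=bond0
BRIDGE=br-vlan${VLAN}
nmcli con add ifname "${BRIDGE}" type bridge con-name "${BRIDGE}"
nmcli con modify "${BRIDGE}" bridge.stp yes 802-3-ethernet.mtu 9000
nmcli con up "${BRIDGE}"
nmcli con add type vlan con-name "${BOND}.${VLAN}" ifname "${BOND}.${VLAN}" dev "${BOND}" id "${VLAN}"
nmcli con modify "${BOND}.${VLAN}" master "${BRIDGE}" slave-type bridge 802-3-ethernet.mtu 9000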

August 26, 2020

Open IP over VHF/UHF 2

The goal of this project is to develop a “100 kbit/s IP link” for VHF/UHF using just a Pi and RTLSDR hardware, and open source modem software. Since the first post in this series there’s been quite a bit of progress:

  1. I have created a GitHub repo with build scripts, a project plan and command lines on how to run some basic tests.
  2. I’ve built some integrated applications, e.g. rtl_fsk.c – that combines rtl_sdr, a CSDR decimator, and the Codec 2 fsk_demod in one handy application.
  3. Developed a neat little GUI system so we can see what’s going on. I’ve found real time GUIs invaluable for physical layer experimentation. That’s one thing you don’t get with a chipset.

Spectral Purity

Bill and I have built and tested (on a spec-an) Tx filters for our Pis, that ensure the transmitted signal meets Australian Ham spurious requirements (which are aligned with international ITU requirements). I also checked the phase noise at 1MHz offset and measured -90dBm/Hz, similar to figures I have located online for my FT-817 at a 5W output level (e.g. DF9IC website quotes +37dBm-130dBc=-93dBm/Hz).

While there are no regulations for Ham phase noise in Australia, my Tx does appear to be compliant with ETSI EN 300 220-1 which deals with short range ISM band devices (maximum of -36dBm in a 100kHz bandwidth at 1MHz offset). Over an ideal 10km link, a -90dBm/Hz signal would be attenuated down to -180dBm/Hz, beneath the thermal noise floor of -174dBm/Hz.

I set up an experiment to pulse the Pi Tx signal off and on at 1 second intervals. Listening on a SSB Rx at +/-50kHz and +/-1MHz offsets, about 50m away and in direct line of sight to the roof mounted Pi Tx antenna, I can hear no evidence of interference from phase noise.

I am satisfied my Pi based Tx won’t be interfering with anyone.

However, as Mark VK5QI suggested, I would not recommend amplifying the Tx level to the several watt level. If greater powers are required there are some other Tx options. For example the FSK transmitters in some chipsets work quite well, and some have better phase noise specs.

Over the Air Tests

Bill and I spent a few afternoons attempting to send packets at various bit rates. We measured our path loss at 135dB over a 10km, non-line of sight suburban path. Using our FT817s, this path is a noisy copy on SSB using a few watts.

Before I start any tests I ask myself “what would we expect to see?”. Well, with 12dBm Tx power that’s +12 – 135 = -123dBm into the receiver. Re-arranging for Eb/No, and using Rb=1000 bits/s and a RTL-SDR noise figure of 7dB:

Rx    = Eb/No + 10log10(Rb) + NF - 174
Eb/No = Rx - 10log10(Rb) - NF + 174
      = -123 - 10*log10(1000) - 7 + 174
      = 14 dB
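
The same dB bookkeeping as a small Python helper (my own sketch, not project code); the rearranged form is also handy for estimating the system noise figure from a measured Eb/No, as in the table further down:

import math

# Eb/No (dB) from Rx power (dBm), bit rate Rb (bits/s) and noise figure (dB)
def eb_no_db(rx_dbm, rb, nf_db):
    return rx_dbm - 10 * math.log10(rb) - nf_db + 174

# Rearranged: estimate the system noise figure from a measured Eb/No
def noise_figure_db(rx_dbm, rb, eb_no):
    return rx_dbm - 10 * math.log10(rb) - eb_no + 174

print(eb_no_db(-123, 1000, 7))          # 14.0, matching the worked example
print(noise_figure_db(-128, 1000, 9))   # 7.0, first row of the gain table below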

Here is a plot of Eb/No versus BER generated by the mfsk.m script:

Looking up our 2FSK Bit Error Rate (BER) for Eb/No of 14dB, we should get better than 1E-4 (it’s off the graph – but actually about 2E-6). So at 1000 bit/s, we expect practically no errors.

I was disappointed with the real world OTA results: I received packets at 1000 bit/s with 8% BER (equivalent to an Eb/No of 5.5dB) which suggests we are losing 8.5dB somewhere. Our receiver seems a bit deaf.

Here’s a screenshot of the “dashboard” with a 4FSK signal sent by Bill (we experimented with 4FSK as well as 2FSK):

You can see a bunch of other signals (perhaps local EMI) towering over our little 4FSK signal. The red crosses show the frequency estimates of the demodulator – they should lock onto each of the FSK tones (4 in this case).

One risk with low cost SDRs (in a city) is strong signal interference causing the receiver to block and fall over entirely. When I connected my RTL-SDR to my antenna, I had to back the gain off about 3dB, not too bad. However Bill needed to back his gain off 20dB. So that’s one real-world factor we need to deal with.

Still, we did it – sending packets 10km across town, through 135dB worth of trees and buildings, with just a 12mW Pi and a RTLSDR! That’s a start, and no one said this would be easy! Time to dig in and find some missing dBs.

Over the Bench Testing

I decided to drill down into the MDS performance and system noise figure. After half a day’s wrangling with command line utilities I had a HackRF rigged up to play FSK signals. The HackRF has the benefit of a built in attenuator, so the output level can be quite low (-68dBm). This makes it much easier to reliably set levels in combination with an external switched attenuator, compared to using the relatively high (+14dBm) output of the Pi, which gets into everything and requires lots of shielding. Low levels out of your Tx greatly simplify “over the bench” testing.

After a few days of RF and DSP fiddling I tracked down a problem with the sample rate. I was running the RTL-SDR at a sample rate of 240 kHz, using its internal hardware to handle the sample rate conversion. Sampling at 1.8MHz and reducing the sample rate externally improved the performance by 3dB. I’m guessing this is due to the internal fixed point precision of the RTL-SDR; it may have significant quantisation noise with weak signals.

OK, so now I was getting system noise figures between 7.5 and 9dB when tested “over the bench”. Close, but a few dB above what I would expect. I eventually traced that to a very subtle measurement bug. In the photo you can see a splitter at the output of the switched attenuator. One side feeds to the RTL-SDR, the other to my spec-an. To calibrate the system I play a single freq carrier from the HackRF, as this makes measurement easier on the spec-an using power averaging.

Turns out the RTL-SDR input port is only terminated when the RTL-SDR software is running, i.e. the dongle is active. I often had the software off when measuring levels, so the levels read about 1dB high, as one port of the splitter was un-terminated!

The following table maps the system noise figure for various gain settings:

g  Rx (dBm)  BER  Est Eb/No (dB)  NF (dB)
0 -128.0 0.01 9.0 7.0
49 -128.0 0.015 8.5 7.5
46 -128 0.06 6 10.0
40 -126.7 0.018 8 12
35 -123 0.068 6 15
30 -119 0.048 7 18

When repeating the low level measurements with -g 0 I obtained noise figures of 8, 6.5, 7.0, and 7.7dB, so there is some spread. The automatic gain (-g 0) seems about 0.5dB ahead of maximum manual gain (-g 49).

These results are consistent with those reported in this fine report, which measured the NF of the SDRs directly. I have also previously measured RTLSDR noise figures at around 7dB, although on an earlier model.

This helps us understand the effect of receiver gain. Lowering it is bad, especially lowering it a lot. However we may need to run at a lower gain setting, especially if the receiver is being overloaded by strong signals. At least this lets us engineer the link, and understand the effect of gain on system performance.

For fun I hooked up a LNA and using 4FSK I managed 2% BER at -136dBm, which works out to a 2.5dB system noise figure. This is a little higher than I would expect, however I could see some evidence of EMI on the dashboard. Such low levels are difficult on the bench without a Faraday cage of some sort.

Engineering the Physical Layer

With this project the physical layer (modem and radio) is wide open for us to explore. With chipset based approaches you get a link or you don’t, and perhaps a little diagnostic information like a received signal strength. Then again they “just work” most of the time, so that’s probably OK! I like looking inside the black boxes and pushing up against the laws of physics, rather than the laws of business.

It gets interesting when you can measure the path loss. You have a variety of options to improve your bit error rate or increase your bit rate:

  1. Add transmit power
  2. Reposition your antenna above the terrain, or out of multipath nulls
  3. Lower loss coax run to your antenna, or mount your Pi and antenna together on top of the mast
  4. Use an antenna with a higher gain like Bill’s Yagi
  5. Add a low noise amplifier to reduce your noise figure
  6. Adjust your symbol/bit rate to spread that energy over fewer (or more) bits/s
  7. Use a more efficient modulation scheme, e.g. 4FSK performs 3dB better than 2FSK at the same bit rate
  8. Move to the country where the ambient RF and EMI is hopefully lower

Related Projects

There are some other cool projects in the “200kHz channel/several 100 kbit/s/IP” UHF Ham data space:

  1. New packet radio NPR70 project – very cool GFSK/TDMA system using a chipset modem approach. I’ve been having a nice discussion with the author Guillaume F4HDK around modem sensitivity and FEC. This project is very well documented.
  2. HNAP – a sophisticated OFDM/QAM/TDMA system that is being developed on Pluto SDR hardware as part of a masters thesis.

I’m a noob when it comes to protocols, so have a lot to learn from these projects. However I am pretty focussed on modem/FEC performance which is often sub-optimal (or delegated to a chipset) in the Ham Radio space. There are many dB to be gained from good modem design, which I prefer over adding a PA.

Next Steps

OK, so we have worked through a few bugs and can now get results consistent with the estimated NF of the receiver. It doesn’t explain the entire 8.5dB loss we experienced over the air, but it’s a step in the right direction. The bugs tend to reveal themselves one at a time…

One possible reason for reduced sensitivity is EMI or ambient RF noise. There are some signs of this in the dashboard plot above. This is more subtle than strong signal overload, but could be increasing the effective noise figure on our link. All our calculations above assume no additional noise being fed into the antenna.

I feel it’s time for another go at Over The Air (OTA) tests. My goal is to get a solid 1000 bit/s link in both directions over our path, and understand any impact on performance such as strong signals or a raised noise floor. We can then proceed to the next steps in the project plan.

Reading Further

GitHub repo for this project with build scripts, a project plan and command lines on how to run some basic tests.
Open IP over VHF/UHF – first post in this series
Bill and I are documenting our OTA tests in this Pull Request
4FSK on 25 Microwatts – low bit rate packets with sophisticated FEC at very low power levels. We’ll use the same FEC on this project.
Measuring SDR Noise Figure
Measuring SDR Noise Figure in Real Time
Evaluation of SDR Boards V1.0 – A fantastic report on the performance of several SDRs
NPR70 – New Packet Radio Web Site
HNAP Web site

August 25, 2020

Process Locally, Backup Remotely

Recently, a friend expressed a degree of shock that I could pull old, even very old, items of conversation from emails, Facebook Messenger, etc., with apparent ease. "But I wrote that 17 years ago". They were even more dismayed when I revealed that this was all just stored as plain-text files, suggesting that perhaps I was like a spy, engaging in some sort of data collection on them by way of our mutual conversations.

For my own part, I was equally shocked by their reaction. Another night of fitful sleep, where feelings of self-doubt percolate. Is this yet another example that I have some sort of alien psyche? But of course, this is not the case, as keeping old emails and the like as local text files is completely normal in computer science. All my work and professional colleagues do this.

What is the cause of this disparity between the computer scientist and the ubiquitous computer user? Once I realised that the disparity of expected behaviour was not personal, but professional, there was clarity. Essentially, the convenience of cloud technologies and their promotion of applications through Software as a Service (SaaS) has led to some very poor computational habits among general users that have significant real-world inefficiencies.

Webmail

The earliest example of a SaaS application that is convenient and inefficient is webmail. Early providers such as Hotmail and Yahoo! offered the advantage of being able to access one's email from anywhere, on any device with a web browser, and that convenience far outweighed the more efficient method of processing email with a POP (Post Office Protocol) client, because POP would delete the email from the mail server as part of the transfer.

Most people opted for various webmail implementations for convenience. There were some advantages to POP, such as storage being limited only by the local computer's capacity, the speed of processing emails being independent of the local Internet connection, and the security of having emails stored locally rather than on a remote server. But these advantages weren't enough to outweigh the convenience, and en masse people adopted cloud solutions, which have now been easily integrated into the corporate world.

During all this time a different email protocol, IMAP (Internet Message Access Protocol), became more common. IMAP provided numerous advantages over POP. Rather than deleting emails from the server, it copied them to the client and kept them on the server (although some still saw this as a security issue). IMAP clients stay connected to the server whilst active, rather than the short retrieve connection used by POP. IMAP allowed for multiple simultaneous connections, message state information, and multiple mail-boxes. Effectively, IMAP provided the local processing performance, but also the device and location convenience of webmail.

The processing performance one suffers from webmail is effectively two-fold: one's ability to use and process webmail depends on the speed of the Internet connection to the mail server and on the spare capacity of that mail server to perform tasks. For some larger providers (e.g., Gmail) only the former is really an issue. But even then, the more complex the regular expression in the search, the harder it is, as basic RegEx tools (grep, sed, awk, perl) typically aren't available. With local storage of emails, kept in plain text files, the complexity of the search criteria depends only on the competence of the user and the performance of the local system. That is why pulling up an archaic email with a specified regular expression is a relatively trivial task for computer professionals who keep their old emails stored locally, but less so for computer users who store them in a webmail application.
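
For example, with emails kept as local plain text files (assuming a Maildir under ~/Mail – the path and pattern are illustrative), a seventeen-year-old conversation is one command away:

grep -rli "that thing we discussed" ~/Mail/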

Messenger Systems

Where would we be without our various messenger systems? We love the instant message convenience; it's like SMS on steroids. Whether Facebook Messenger, WhatsApp, Instagram, Zoom, WeChat, Discord, or any of a range of similar products, there is the very same issue that one confronts with webmail, because each operates as a SaaS application. Your ability to process data will depend on the speed of your Internet connection and the spare computational capacity of the remote server. Good luck with that on your phone. Have you ever tried to scroll through a few thousand Facebook Messenger posts? It's absolutely hopeless.

But when something like an extensive Facebook message chat is downloaded as a plain HTML file, searching and scrolling become trivial and blindingly fast in comparison. Google Chrome, for example, has a pretty decent Facebook Message/Chat Downloader. WhatsApp backs up chats automatically to one's mobile device and can be set up to periodically download to Google Drive, which can then be moved to a local device (regular expressions on 'phones are not the greatest, yet), and Slack (with some restrictions) offers a download feature. For Discord, there are open-source chat exporters. Again, the advantages of local copies become clear: speed of processing, and the capability of conducting complex searches.

Supercomputers and the Big Reveal

What could this discussion of web-apps and local processing possibly have to do with the operations of big iron supercomputers? Quite a lot actually, as the following example illustrates the issue at scale. Essentially the issue involved a research group that had a very large dataset that was stored interstate. The normal workflow in such a situation is to transfer the dataset as needed to the supercomputer and conduct the operations there. Finding this inconvenient, they made a request to have mount points on each of the compute nodes so the data could stay interstate, the computation could occur on the supercomputer, and they could avoid the annoying requirement of actually transferring the data. Except physics gets in the way, and physics always wins.

The problem was that the computational task primarily involved the use of the Burrows-Wheeler Aligner (BWA), which aligns relatively short nucleotide sequences against a long reference sequence such as the human genome. In carrying out this activity the program makes very extensive use of disk read-writes. Now imagine the data being sent interstate, the computation being run on the supercomputer, and the results being sent back to the remote disk, thousands of times per second. The distance between where the data is and where the processing is carried out is meaningful. It is much more efficient to conduct the computational processing close to where the data is. As Grace Hopper, peace be upon her, said: "Mind your nanoseconds!"

I like to hand out c. 30cm pieces of string at the start of introductory HPC training workshops, telling researchers that they should hang on to the piece of string to remind themselves to think of where their data is and where their compute is. And that is the basic rule: keep the data that you want processed, whether it's something trivial like searching email or as complex as aligning short nucleotide sequences, close to the processor itself, and the closer the better.

What then is the possible advantage of cloud-enabled services? Despite their ubiquity, when it comes to certain performance-related tasks there isn't much, although centralised services can provide many advantages in themselves. Perhaps the best processing use is treating the cloud as "somebody else's computer", that is, a resource where one can offload data for certain computational tasks to be conducted on the remote system. Rather like how an increasing number of researchers find themselves going to HPC facilities when they discover that their datasets and computational problems are too large or too complex for their personal systems.

There is also one extremely useful advantage of cloud services from the user perspective and that is, of course, off-site backup. Whilst various local NAS systems are very good for serving datasets within, say, a household, especially for the provision of media, they are perhaps not the wisest choice for the ultimate level of backup. Eventually disks will fail, and with disk failure there is quite a cost in data recovery. Large cloud storage providers, such as Google, offer massive levels of redundancy which ensure that data loss is extremely improbable, and with tools such as Rclone, synchronisation and backups of local data to remote services can be easily automated. In a nutshell: process data close to the CPU, backup remotely.
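
For example, a nightly automated backup with Rclone might look something like this (the remote name and paths are placeholders for your own configuration):

rclone sync ~/Mail gdrive:backup/mail
rclone sync ~/Documents gdrive:backup/documents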

August 24, 2020

Sidewalk Delivery Robots: An Introduction to Technology and Vendors

At the start of 2011 Uber was in one city (San Francisco). Just 3 years later it was in hundreds of cities worldwide including Auckland and Wellington. Dockless Electric Scooters took only a year from their first launch to reach New Zealand. In both cases the quick rollout in cities left the public, competitors and regulators scrambling to adapt.

Delivery Robots could be the next major wave to rollout worldwide and disrupt existing industries. Like driverless cars these are being worked on by several companies but unlike driverless cars they are delivering real packages for real users in several cities already.

Note: I plan to cover other aspects of Sidewalk Delivery Robots including their impact of society in a followup article.

What are Delivery Robots?

Delivery Robots are driverless vehicles/drones that cover the last mile. They are loaded with a cargo and then will go to a final destination where they are unloaded by the customer.

Indoor Robots are designed to operate within a building. An example is the Keenon Peanut. These deliver items to guests in hotels or restaurants. They allow delivery companies to leave food and other items with the robot at the entrance/lobby of a building, rather than going all the way to a customer’s room or apartment.

Keenon Peanut

Flying Delivery Drones are being tested by several companies. Wing, which is owned by Google’s parent company Alphabet, is testing in Canberra, Australia. Amazon also had a product called Amazon Prime Air, which appears to have been shelved.

Wing Flying Robot

The next size up are sidewalk delivery robots, which I’ll be concentrating on in this article. Best known of these is Starship Technologies, but there are also Kiwi and Amazon Scout. These are designed to drive at slow speeds on footpaths rather than mix with cars and other vehicles on the road. They cross roads at standard crossings.

KiwiBot
Starship Delivery Robot
Amazon Scout

Finally, some companies are rolling out car sized Delivery Robots designed to drive on roads and mix with normal vehicles. The REV-1 from Refraction AI is at the smaller end, with company videos showing it using both car and bike lanes. Larger is the small-car sized Nuro.

REV-1
Nuro

Sidewalk Delivery Robots

I’ll concentrate most on Sidewalk Delivery Robots in this article because I believe they are the most mature and the likeliest to have an effect on society in the short term (the next 2-10 years).

  • In-building bots are a fairly niche product that most people won’t interact with regularly.
  • Flying Drones are close to working, but it seems it will be some time before they can function safely and autonomously in a built-up environment. Cargo capacity is currently limited in most models, and larger units will bring new problems.
  • Car (or motorbike) sized bots have the same problems as driverless cars. They have to drive fast and be fully autonomous in all sorts of road conditions. There is no time to ask for human help; a stopped vehicle on the road will at best block traffic and at worst be involved in an accident. These stringent requirements mean widespread deployment is probably at least 10 years away.

Sidewalk bots are much further along in their evolution and they have simpler problems to solve.

  • A small vehicle that can carry a takeaway or small grocery order is buildable using today’s technology and not too expensive.
  • Footpaths exist most places they need to go.
  • Walking pace (up to 6 km/h) is fast enough to be good enough, even for hot food.
  • Ubiquitous wireless connectivity enables the robots to be controlled remotely if they cannot handle a situation automatically.
  • Everything unfolds slower on the sidewalk. If a sidewalk bot encounters a problem it can just slow to a stop and wait for remote help. If that process takes 20 seconds then it is usually no problem.

Starship Technologies

Starship are the best known and most advanced vendor in the sector. They launched in 2015 and have a good publicity team.

In late 2019 Starship announced a major rollout to US university campuses, “with their abundance of walking paths, well-defined boundaries, and smartphone-using, delivery-minded student bodies”. Campuses include The University of Mississippi and Bowling Green State University.

The push into college campuses was unluckily timed, with many being closed in 2020 due to Covid-19. Starship has increased delivery areas outside of campus in some places to try and compensate, and has also seen a doubling of demand in Milton Keynes. However, the company laid off some workers in March 2020.

Kiwibot

Kiwibot

Kiwibot is one of the few other companies that has gone beyond the prototype stage to servicing actual customers. It is some way behind Starship, with the robots being less autonomous and needing more onsite helpers.

  • Based in Colombia with a major deployment in Berkeley, California, around the UCB campus area
  • Robots cost $US 3,500 each
  • Smaller than Starship with just 1 cubic foot of capacity. Range and speed reportedly lower
  • Guided by remote control using way-points by operators in Medellín, Colombia. Each operator can control up to 3 bots.
  • On-site operators in Berkeley maintain robots (and rescue them when they get stuck).
  • Some orders delivered wholly or partially by humans
  • Concentrating on the Restaurant delivery market
  • Costs for the Business
    • Lease/rent starts at $20/day per robot
    • Order capacity 6-18/day per Robot depending on demand & distance.
    • Order fees are $1.99/order with 1-4 Kiwibots leased
    • or $0.99/order if you have 5-10 Kiwibots leased
  • Website, Kiwibot Campus Vision Video , Kiwibot end-2019 post

An interesting feature is that Kiwibot publish their prices for businesses and provide a calculator with which you can calculate break-even points for robot delivery.
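
As a rough illustration of that break-even arithmetic (my own Python sketch using the prices quoted above; the $5/order cost of conventional delivery is an assumption for comparison):

# Cost per order of a leased robot at a given daily order volume
def robot_cost_per_order(orders_per_day, lease_per_day=20.0, fee_per_order=1.99):
    return lease_per_day / orders_per_day + fee_per_order

for n in (6, 12, 18):
    print(n, "orders/day:", round(robot_cost_per_order(n), 2), "$/order")

At the quoted prices a robot beats a $5 human delivery somewhere around 7 orders per day.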

As with Starship, Kiwibot was hit by Covid-19 closing college campuses. In July 2020 they announced a rollout in the city of San Jose, California in partnership with Shopify and Ordermark. The company is trying to pivot towards just building the robot infrastructure and partnering with companies that already have that [marketplace] in mind. They are also trying to crowdfund for investment money.

Amazon Scout

Amazon Scout

Amazon are only slowly rolling out their Scout robots. The Scout is similar in appearance to the Starship vehicle, but larger.

  • Announced in January 2019
  • Weight “about 100 pounds” (45 kg). No further specs available.
  • A video of the development team at Amazon
  • Initially delivering to Snohomish County, Washington near Amazon HQ
  • Added Irvine, California in August 2019, but still supervised by humans
  • In July 2020 announced rollouts in Atlanta, Georgia and Franklin, Tennessee, but these will still “initially be accompanied by an Amazon Scout Ambassador”.

Other Companies

There are several other companies also working on Sidewalk Delivery Robots. The most advanced is restaurant delivery company Postmates (now owned by Uber), which has its own robot called Serve in early testing. There is video of it on the street.

Several other companies have also announced projects. None appear to be near rolling out to live customers though.

Business Model and Markets

At present Starship and Kiwi are mainly targeting the restaurant delivery market, against established players such as Uber Eats. Reasons for going for this market include:

  • Established market, not something new
  • Short distances and small cargo
  • Customers unload the product quickly, so there is no waiting around
  • Charges by existing players are quite high: ballpark costs of $5 to the customer (plus a tip in some countries), with the restaurant also being charged 30% of the bill
  • Even with the high charges, drivers end up making only around minimum wage
  • The current business model is only just working. While customers find it convenient and the delivery cost reasonable, restaurants and drivers are struggling to make money.

Starship and Amazon are also targeting the general delivery market. This requires higher capacity, and customers may not be home when the vehicle arrives. However, if vehicles are cheap enough, they may be able to simply wait until the customer gets home.

Still more to cover

This article was just a quick introduction to the Sidewalk Delivery Robots out there. I hope to do a later post covering more, including what the technology will mean for the delivery industry, for other sidewalk users, and for society in general.


August 14, 2020

Jitsi on Debian

I’ve just setup an instance of the Jitsi video-conference software for my local LUG. Here is an overview of how to set it up on Debian.

Firstly create a new virtual machine to run it. Jitsi is complex and has lots of inter-dependencies. Its packages want to help you by dragging in other packages and configuring them. This is great if you have a blank slate to start with, but if you already have one component installed and running then it can break things. It wants to configure the Prosody Jabber server and a web server, and my first attempt at an install failed when it tried to reconfigure the running instances of Prosody and Apache.

Here’s the upstream install docs [1]. They cover everything fairly well, but I’ll document the configuration I wanted (basic public server with password required to create a meeting).

Basic Installation

The first thing to do is to get a short DNS name like j.example.com. People will type that every time they connect and will thank you for making it short.

Using Certbot for certificates is best. It seems that you need them for j.example.com and auth.j.example.com.

apt install curl certbot
/usr/bin/letsencrypt certonly --standalone -d j.example.com,auth.j.example.com -m you@example.com
curl https://download.jitsi.org/jitsi-key.gpg.key | gpg --dearmor > /etc/apt/jitsi-keyring.gpg
echo "deb [signed-by=/etc/apt/jitsi-keyring.gpg] https://download.jitsi.org stable/" > /etc/apt/sources.list.d/jitsi-stable.list
apt-get update
apt-get -y install jitsi-meet

When apt installs jitsi-meet and its dependencies you get asked many questions for configuring things. Most of it works well.

If you get the nginx certificate wrong or don’t have the full chain then phone clients will abort connections for no apparent reason. It seems that you need to edit /etc/nginx/sites-enabled/j.example.com.conf to use the following ssl configuration:

ssl_certificate /etc/letsencrypt/live/j.example.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/j.example.com/privkey.pem;

Then you have to edit /etc/prosody/conf.d/j.example.com.cfg.lua to use the following ssl configuration:

key = "/etc/letsencrypt/live/j.example.com/privkey.pem";
certificate = "/etc/letsencrypt/live/j.example.com/fullchain.pem";

It seems that you need to have an /etc/hosts entry with the public IP address of your server and the names “j.example.com j auth.j.example.com”. Jitsi also appears to use the names “speakerstats.j.example.com conferenceduration.j.example.com lobby.j.example.com conference.j.example.com internal.auth.j.example.com” but they aren’t required for a basic setup; I guess you could add them to /etc/hosts to avoid the possibility of strange errors due to it not finding an internal host name. There are optional features of Jitsi which require some of these names, but so far I’ve only used the basic functionality.

Access Control

This section describes how to restrict conference creation to authenticated users.

The secure-domain document [2] shows how to restrict access, but I’ll summarise the basics.

Edit /etc/prosody/conf.avail/j.example.com.cfg.lua and use the following line in the main VirtualHost section:

        authentication = "internal_hashed"

Then add the following section:

VirtualHost "guest.j.example.com"
        authentication = "anonymous"
        c2s_require_encryption = false
        modules_enabled = {
            "turncredentials";
        }

Edit /etc/jitsi/meet/j.example.com-config.js and add the following line:

        anonymousdomain: 'guest.j.example.com',

Edit /etc/jitsi/jicofo/sip-communicator.properties and add the following line:

org.jitsi.jicofo.auth.URL=XMPP:j.example.com

Then run commands like the following to create new users who can create rooms:

prosodyctl register admin j.example.com

Then restart most things (Prosody at least, and maybe parts of Jitsi too); I rebooted the VM.

Now only the accounts you created on the Prosody server will be able to create new meetings. Once you have set this up, you can add, delete, and change passwords for users via prosodyctl while Prosody is running.
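For example, to change that user’s password or remove the account later (assuming the admin user registered above):

prosodyctl passwd admin@j.example.com
prosodyctl deluser admin@j.example.com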

Conclusion

Once I gave up on the idea of running Jitsi on the same server as anything else it wasn’t particularly difficult to set up. Some bits were a little fiddly and hopefully this post will be a useful resource for people who have trouble understanding the documentation. Generally it’s not difficult to install if it is the only thing running on a VM.

August 13, 2020

M17 Open Source Radio

In 2019 Wojciech Kaczmarski (SP5WWP) started working on the M17 open source radio project. Since then the project has made great progress: assembling a team of Hams, setting up a project website and mailing list, and making steady progress on the development of the TR9 radio, the first to be built to the M17 standard.

For the RF deck, the TR9 is using the chipset approach and an RRC-filtered 4FSK waveform, similar to DMR. For my own work I like to optimise low SNR performance by building the physical layer myself, but I’m a bit of a modem nerd, and it complicates the RF side. Compared to a low cost SDR/custom RF design, the RF chipset approach will get them on the air quickly, and the filtered 4FSK will have a narrow on air bandwidth (12.5kHz channel spacing). Sensitivity will be similar to analog FM/DMR radios.

Links

M17 Project Web site
M17 Twitter Feed
M17 Spec

August 08, 2020

Setting the default web browser on Debian and Ubuntu

If you are wondering what your default web browser is set to on a Debian-based system, there are several things to look at:

$ xdg-settings get default-web-browser
brave-browser.desktop

$ xdg-mime query default x-scheme-handler/http
brave-browser.desktop

$ xdg-mime query default x-scheme-handler/https
brave-browser.desktop

$ ls -l /etc/alternatives/x-www-browser
lrwxrwxrwx 1 root root 29 Jul  5  2019 /etc/alternatives/x-www-browser -> /usr/bin/brave-browser-stable*

$ ls -l /etc/alternatives/gnome-www-browser
lrwxrwxrwx 1 root root 29 Jul  5  2019 /etc/alternatives/gnome-www-browser -> /usr/bin/brave-browser-stable*

Debian-specific tools

The contents of /etc/alternatives/ are system-wide defaults and must therefore be set as root:

sudo update-alternatives --config x-www-browser
sudo update-alternatives --config gnome-www-browser

The sensible-browser tool (from the sensible-utils package) will use these to automatically launch the most appropriate web browser depending on the desktop environment.
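For example, a script can open a URL without knowing which browser is configured:

sensible-browser https://example.com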

Standard MIME tools

The others can be changed as a normal user. Using xdg-settings:

xdg-settings set default-web-browser brave-browser-beta.desktop

will also change what the two xdg-mime commands return:

$ xdg-mime query default x-scheme-handler/http
brave-browser-beta.desktop

$ xdg-mime query default x-scheme-handler/https
brave-browser-beta.desktop

since it puts the following in ~/.config/mimeapps.list:

[Default Applications]
text/html=brave-browser-beta.desktop
x-scheme-handler/http=brave-browser-beta.desktop
x-scheme-handler/https=brave-browser-beta.desktop
x-scheme-handler/about=brave-browser-beta.desktop
x-scheme-handler/unknown=brave-browser-beta.desktop

Note that if you delete these entries, then the system-wide defaults, defined in /etc/mailcap, will be used, as provided by the mime-support package.

Changing the x-scheme-handler/http (or x-scheme-handler/https) association directly using:

xdg-mime default brave-browser-nightly.desktop x-scheme-handler/http

will only change that particular one. I suppose this means you could have one browser for insecure HTTP sites (hopefully with HTTPS Everywhere installed) and one for HTTPS sites though I'm not sure why anybody would want that.

Summary

In short, if you want to set your default browser everywhere (using Brave in this example), do the following:

sudo update-alternatives --config x-www-browser
sudo update-alternatives --config gnome-www-browser
xdg-settings set default-web-browser brave-browser.desktop
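You can then verify that the user-level setting took effect:

xdg-settings check default-web-browser brave-browser.desktop
xdg-settings get default-web-browser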

Small 1/4 inch socket set into a nicer walnut tray

I was recently thinking about how I could make a selection of 1/4 inch drive bits easier to use. It seems I am not alone in the crowd of people who leave the bits in the case they came in; some folks do that for many decades. Apart from being trapped into what "was in the set", this also creates an issue when you have some 1/4 inch parts in a case that includes many more 3/8 inch drive bits. I originally marked the smaller drive parts and thought about leaving them in the blow molded case, as is commonly done.

The CNC fiend in me eventually got the better of me and the below is the result. I cut a prototype in pine first, knowing that getting it all as I wanted on the first try was not impossible, but not probable either. Version 1 is shown below.

 

The advantage is that now I have the design in Fusion 360 I can cut this design in about an hour. So if I want to add a bunch of deep sockets to the set I can do that for the time cost of gluing up a panel, fixturing it, and a little sanding and shellac. Not a trivial endeavour, but I think the result justifies the means.

Below is the board still fixtured in the CNC machine. I think I will make a jig with some sliding toggle clamps so I can fix panels to the jig and then bolt the jig into the CNC instead of directly using hold down clamps.

I plan to use a bandsaw to cut a profile around the tools and may end up with some handle(s) on the tray. That part is something I have to think more about. Thinking about how I want the tools to be stored and accessed is an interesting side project in itself.


August 04, 2020

Willsmere and Cricket

Whether Australia's first notable cricketer, Tom Wills, was at Kew Asylum is apparently subject to debate.

The following Kew Asylum related cricket stories, however, are not. Note the inclusion of one of the greats of early Australian cricket, Hugh Trumble.

William Evans Midwinter

The plaque on his grave commemorates William Evans Midwinter (1851-1890), the only cricketer to play for Australia versus England (8 tests) and England versus Australia (4 tests).

William ("Billy") Evans Midwinter (19 June 1851– 3 December 1890) was an English born cricketer who played four Test matches for England, sandwiched in between eight Tests that he played for Australia. Midwinter holds a unique place in cricket history as the only cricketer to have played for Australia and England in Test Matches against each other.

By 1889, Midwinter's wife and two of his children had died, and his businesses were failed or failing. He became "hopelessly insane" and was confined to Bendigo Hospital in 1890. He was then transferred to the Kew Asylum, where he died later that year.

http://monumentaustralia.org.au/display/30687-william-evans-midwinter

KEW ASYLUM CRICKET CLUB.

From: The Australasian (Melbourne, Vic. : 1864 - 1946) Sat 2 Oct 1875
Page 11

A meeting of the members of the staff of the Kew Asylum was held on Saturday afternoon last, when it was resolved to form a cricket club at the establishment. Dr. Robertson was elected president, and Dr. Watkins and Mr. William Davis vice-presidents, Dr. Molloy hon. secretary and treasurer, and Messrs. Trumble, Johnston, Swift, and Flynn as committee.

The club starts with a large number of members, and with such players as Swift, Niall and Flynn, it is likely to prove rather formidable. The club has not had an opportunity of making any matches yet, but would be glad to receive a few challenges for the ensuing season.

KEW ASYLUM CRICKET CLUB.

From: Boyle & Scott's Australian cricketers' guide., no.1882/83, 1882-01-01 p114

The club has had a fairly successful season, although they had tough opponents in Bohemia, Kew, Fitzroy, Brighton, &c. Among the players, H. Trumble, T. Foley, and G. Roberts have shown improved form with the bat, whilst M'Michael, W. Trumble, and Swift are as effective as of old. In bowling, H. Trumble, Arnold, and Swift have been most destructive.

Batting Averages.

                Inns.  Not out  Runs  Most in inns.  Most in match  Aver.
J. M'Michael      19      9      528       64*            64*       52.8
J. W. Trumble      6      1      229      105*           105*       45.4
J. S. Swift       15      4      497       79             79        45.2
C. Ross            4      0      114       75             75        28.2
H. Trumble        18      7      234       52*            52*       21.3
G. Roberts        13      1      167       39             39        13.11
T. Foley          17      3      163       44             44        11.9
G. Arnold         14      2      103       31             31         8.7

Bowling Averages.

                Balls  Mdns.  Runs  Wkts.  Aver.
J. W. Trumble    268    16     50    13     3.11
J. S. Swift      394    16    148    25     5.23
G. Arnold        650    26    320    43     7.19
T. Foley         258    11     88     8    11
H. Trumble       834    33    349    28    12.13
G. M'Garvin      120     3     51     4    12.3
J. M'Michael     177     5    126    10    12.6

Swift bowled 2 no-balls; Trumble 1.

CRICKET. KEW ASYLUM v. N. MELBOURNE.

From: The Reporter (Box Hill, Vic. : 1889 - 1925) Fri 12 Mar 1909 Page 7

The above match, played on the asylum ground on Saturday, was won by the home team by 7 wickets and 127 runs. North Melbourne scored 47 (Howlott 20 not out), while the Asylum lost 3 wickets for 174 (R. Morrison 110 not out, including 17 fourers, A. Walsh 38, R. Walsh 25 not out). Howlett, 1 for 34, and Buncle, 1 for 33, took the wickets for North Melbourne, and for Kew, Kenny 4 for 19, Crouch 4 for 21.

August 01, 2020

Audiobooks – July 2020

The Address Book: What Street Addresses Reveal About Identity, Race, Wealth, and Power by Deirdre Mask

Covered the subtitle well. I would have liked some more of the technical stuff that the author mentioned reading. 3/5

The Fated Sky: Lady Astronaut #2 by Mary Robinette Kowal

Set mostly in the lead-up to the first Mars expedition and the journey to Mars. Lots of interpersonal/interracial problems & fixing toilets. 3/5

Enola Gay: Mission to Hiroshima by Gordon Thomas

Mainly following Paul Tibbets and the 509th Composite Group plus some on the ground in Hiroshima. Has “minute-by-minute coverage of the critical periods”. 3/5

Pandemic by John Dryden

A 3 part radio play set before/after and during a global pandemic. Total length just 2 hours. Parts 2 & 3 felt a little cliched. 3/5

An Economist Walks into a Brothel: And Other Unexpected Places to Understand Risk by Allison Schrager

Examples of how people in unusual situations handle risk and how you can apply it to your life. Interesting and useful. 4/5

100 Things The Simpsons Fans Should Know & Do Before They Die by Allie Goertz & Julia Prescott

Over 4 minutes per fact, so plenty of depth. A great collection of stuff for casual and serious fans. 4/5

Chernobyl 01:23:40 : The Incredible True Story of the World’s Worst Nuclear Disaster by Andrew Leatherbarrow

Chapters on the disaster and aftermath alternate with the author’s trip to the Chernobyl Exclusion Zone. A good intro to the disaster. 3/5

Turn the Ship Around! : A True Story of Turning Followers into Leaders by L. David Marquet

The management advice is lost on me but the stories about turning around an under-performing sub crew in weeks is interesting. 3/5

My Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average, in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all


July 31, 2020

Extending GPG key expiry

Extending the expiry on a GPG key is not very hard, but it's easy to forget a step. Here's how I did my last expiry bump.

Update the expiry on the main key and the subkey:

gpg --edit-key KEYID
> expire
> key 1
> expire
> save

Upload the updated key to the keyservers:

gpg --export KEYID | curl -T - https://keys.openpgp.org
gpg --keyserver keyring.debian.org --send-keys KEYID
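To confirm the new expiry dates took effect, list the key and check the expiry fields:

gpg --list-keys KEYID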

Links July 2020

iMore has an insightful article about Apple’s transition to the ARM instruction set for new Mac desktops and laptops [1]. I’d still like to see them do something for the server side.

Umair Haque wrote an insightful article about How the American Idiot Made America Unlivable [2]. We are witnessing the destruction of a once great nation.

Chris Lamb wrote an interesting blog post about comedy shows with the laugh tracks edited out [3]. He then compares that to social media with the like count hidden, which is an interesting perspective. I’m not going to watch TV shows edited in that way (I’ve enjoyed BBT in spite of all the bad things about it) and I’m not going to try to hide like counts on social media. But it’s interesting to consider these things.

Cory Doctorow wrote an interesting Locus article suggesting that we could have full employment by a transition to renewable energy and methods for cleaning up the climate problems we are too late to prevent [4]. That seems plausible, but I think we should still get a Universal Basic Income.

The Thinking Shop has posters and decks of cards with logical fallacies and cognitive biases [5]. Every company should put some of these in meeting rooms. They also have free PDFs so you can download and print your own posters.

gayhomophobe.com [6] is a site that lists powerful homophobic people who hurt GLBT people but then turned out to be gay. It’s presented in an amusing manner; people who hurt others deserve to be mocked.

Wired has an insightful article about the shutdown of Backpage [7]. The owners of Backpage weren’t nice people and they did some stupid things which seem bad (like editing posts to remove terms like “lolita”). But they also worked well with police to find criminals. The opposition to what Backpage were doing conflates sex trafficking, child prostitution, and legal consenting adult sex work. Taking down Backpage seems to be a bad thing for the victims of sex trafficking, for consenting adult sex workers, and for society in general.

Cloudflare has an interesting blog post about short lived certificates for ssh access [8]. Instead of having users’ ssh keys stored on servers, each user has to connect to an SSO server to obtain a temporary key before connecting, so revoking an account is easy.

July 27, 2020

How to create Linux bridges and Open vSwitch bridges with NetworkManager

My virtual infrastructure Ansible role supports connecting VMs to both Linux and Open vSwitch bridges, but they must already exist on the KVM host.

Here is how to convert an existing Ethernet device into a bridge. Be careful if doing this on a remote machine with only one connection! Make sure you have some other way to log in (e.g. console), or maybe add additional interfaces instead.

Export interfaces and existing connections

First, export the device you want to convert so we can easily reference it later (e.g. eth1).

export NET_DEV="eth1"

Now list the current NetworkManager connections for your device exported above, so we know what to disable later.

sudo nmcli con |egrep -w "${NET_DEV}"

This might be something like System eth1 or Wired connection 1, let’s export it too for later reference.

export NM_NAME="Wired connection 1"

Create a Linux bridge

Here is an example of creating a persistent Linux bridge with NetworkManager. It will take a device such as eth1 (substitute as appropriate) and convert it into a bridge. Note that we will be specifically giving it the device name of br0 as that’s the standard convention and what things like libvirt will look for.

Make sure you have exported your device as NET_DEV and its existing NetworkManager connection name as NM_NAME from above, you will use them below.

sudo nmcli con add ifname br0 type bridge con-name br0
sudo nmcli con add type bridge-slave ifname "${NET_DEV}" master br0 con-name br0-slave-"${NET_DEV}"

Note that br0 probably has a different MAC address to your physical interface. If so, make sure you update any DHCP reservations (or are able to find the new IP once the bridge is brought up).

sudo ip link show dev br0
sudo ip link show dev "${NET_DEV}"

Configure the bridge

As mentioned above, by default the Linux bridge will get an address via DHCP. If you don’t want it to be on the network (you might have another dedicated interface) then disable DHCP on it.

sudo nmcli con modify br0 ipv4.method disabled ipv6.method disabled

Or, if you need to set a static IP you can do that too.

sudo nmcli con modify br0 ipv4.method static ipv4.address 192.168.123.100/24
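If the bridge is your default route you will probably also want a gateway and DNS server; these are separate properties (the addresses here are just examples):

sudo nmcli con modify br0 ipv4.gateway 192.168.123.1 ipv4.dns 192.168.123.1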

If you need to set a specific MTU like 9000 (defaults to 1500), you can do that.

sudo nmcli con modify br0-slave-"${NET_DEV}" 802-3-ethernet.mtu 9000

Finally, spanning tree protocol is on by default, so disable it if you need to.

sudo nmcli con modify br0 bridge.stp no

Bring up the bridge

Now you can either simply reboot, or stop the current interface and bring up the bridge (do it in one command in case you’re using the one interface, else you’ll get disconnected). Note that your IP might change once the bridge comes up, if you didn’t check the MAC address and update any static DHCP leases.

sudo nmcli con down "${NM_NAME}" ; \
sudo nmcli con up br0
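Once it’s up, a quick sanity check confirms that the bridge is active and has the address you expect:

sudo nmcli -f NAME,DEVICE,STATE con show --active
ip addr show dev br0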

Create an Open vSwitch (OVS) bridge

OVS bridges are often used for plumbing into libvirt for use with VLANs.

We can create an OVS bridge which will consist of the bridge itself plus multiple ports and interfaces which connect everything together, including the physical device itself (so we can talk on the network) and virtual ports for VLANs and VMs. By default the physical port on the bridge will use the untagged (native) VLAN, but if all your traffic needs to be tagged then we can add a tagged interface.

Here is an example of creating a persistent OVS bridge with NetworkManager. It will take a device such as eth1 (substitute as appropriate) and convert it into an ovs-bridge.

Install dependencies

You will need openvswitch installed as well as the OVS NetworkManager plugin.

sudo dnf install -y NetworkManager-ovs openvswitch
sudo systemctl enable --now openvswitch
sudo systemctl restart NetworkManager

Create the bridge

Let’s create the bridge, its port and interface with these three commands.

sudo nmcli con add type ovs-bridge conn.interface ovs-bridge con-name ovs-bridge
sudo nmcli con add type ovs-port conn.interface port-ovs-bridge master ovs-bridge con-name ovs-bridge-port
sudo nmcli con add type ovs-interface slave-type ovs-port conn.interface ovs-bridge master ovs-bridge-port con-name ovs-bridge-int

Patch in our physical interface

Next, create another port on the bridge and patch in our physical device as an Ethernet interface so that real traffic can flow across the network. Make sure you have exported your device as NET_DEV and its existing NetworkManager connection name as NM_NAME from above, you will use them below.

sudo nmcli con add type ovs-port conn.interface ovs-port-eth master ovs-bridge con-name ovs-port-eth
sudo nmcli con add type ethernet conn.interface "${NET_DEV}" master ovs-port-eth con-name ovs-port-eth-int

OK now you should have an OVS bridge configured and patched to your local network via your Ethernet device, but not yet active.

Configure the bridge

By default the OVS bridge will be sending untagged traffic and requesting an IP address for ovs-bridge via DHCP. If you don’t want it to be on the network (you might have another dedicated interface) then disable DHCP on the interface.

sudo nmcli con modify ovs-bridge-int ipv4.method disabled ipv6.method disabled

Or if you need to set a static IP you can do that too.

sudo nmcli con modify ovs-bridge-int ipv4.method static ipv4.address 192.168.123.100/24

If you need to set a specific MTU like 9000 (defaults to 1500), you can do that.

sudo nmcli con modify ovs-bridge-int 802-3-ethernet.mtu 9000
sudo nmcli con modify ovs-port-eth-int 802-3-ethernet.mtu 9000

Bring up the bridge

Before you bring up the bridge, note that ovs-bridge will probably have a MAC address which is different to your physical interface. Keep that in mind if you manage DHCP static leases, and make sure you can find the new IP so that you can log back in once the bridge is brought up.

Now you can either simply reboot, or stop the current interface and bring up the bridge and its interfaces (in theory we just need to bring up ovs-port-eth-int, but let’s make sure and do it in one command in case you’re using the one interface, else you’ll get disconnected and not be able to log back in). Note that your MAC address may change here, so if you’re using DHCP you may get a new IP and your session will freeze; be sure you can find the new IP so you can log back in.

sudo nmcli con down "${NM_NAME}" ; \
sudo nmcli con up ovs-port-eth-int ; \
sudo nmcli con up ovs-bridge-int

Now you have a working Open vSwitch implementation!

Create OVS VLAN ports

From there you might want to create some port groups for specific VLANs. For example, if your network does not have a native VLAN, you will need to create a VLAN interface on the OVS bridge to get onto the network.

Let’s create a new port and interface for VLAN 123 which will use DHCP by default to get an address and bring it up.

sudo nmcli con add type ovs-port conn.interface vlan123 master ovs-bridge ovs-port.tag 123 con-name ovs-port-vlan123
sudo nmcli con add type ovs-interface slave-type ovs-port conn.interface vlan123 master ovs-port-vlan123 con-name ovs-int-vlan123
sudo nmcli con up ovs-int-vlan123

If you need to set a static address on the VLAN interface instead, you can do so by modifying the interface.

sudo nmcli con modify ovs-int-vlan123 ipv4.method static ipv4.address 192.168.123.100/24

View the OVS configuration

Show the switch config and bridge with OVS tools.

sudo ovs-vsctl show

Clean up old interface profile

It’s not really necessary, but you can disable the current NetworkManager config for the device so that it doesn’t conflict with the bridge, if you want to.

sudo nmcli con modify "${NM_NAME}" ipv4.method disabled ipv6.method disabled

Or you can even delete the old interface’s NetworkManager configuration if you want to (but it’s not necessary).

sudo nmcli con delete "${NM_NAME}"

That’s it!

July 25, 2020

Batch Image Processing

It may initially seem counter-intuitive, but sometimes one needs to process an image file without actually viewing it. This is particularly the case if one has a very large number of image files and a uniform change is required. The slow process is to open the image files individually in whatever application one is using, make the changes required, save, open the next file, and so forth. This is time-consuming, boring, and prone to error.

Avoiding such activities is why computers were invented; computers are extremely good at accurate and fast automation of computational tasks (and what can be automated should be automated), leaving humans to carry out the tasks of innovation, invention, discovery, and aesthetics. Automation of regular activities can be easily carried out with the shell script loops which are used here.

The following touches the surface of certain automation tasks that I have encountered in the past. I am not a photographer, a graphic designer, or anything of the sort. I consider my skills in such endeavours as sorely lacking. However, I do have a working knowledge of how to get a GNU/Linux based system to do things with a minimal amount of work, and I strongly believe in reducing the amount of work that others have to do.

Install Necessary Software

I will start by assuming that the gentle reader of this document is using Ubuntu 18.04 LTS. It is not necessarily my Linux distribution of choice, but it is usually the one that people are initially exposed to. People who are using other distributions will find that the installation process is slightly different (e.g., use of yum install for RedHat/CentOS), or, if they are very keen and performance-sensitive, they may install the software from source rather than packages.

Four software applications are suggested here: UFRaw (Unidentified Flying Raw), which converts camera RAW images to standard image files; Ghostscript, a PostScript and PDF language interpreter and previewer; ImageMagick, for image file manipulations; and poppler-utils, for modifying PDFs (see a previous post about how awesome they are).

It is always worth getting a system's package information up-to-date:
sudo apt-get update

Install the applications:
sudo apt-get install ufraw-batch
sudo apt-get install ghostscript
sudo apt-get install imagemagick
sudo apt-get install poppler-utils

Converting Raw Images to Standard Image Files

Photographers will like this one. The best tool for converting raw images to standard image files is ufraw-batch. The following example parses over all the files in a directory with the suffix .CR2, a raw camera image format created by Canon digital cameras. The website imaging-resources provides a good collection of example files; they are quite large!

It is unfortunately common for spaces and other awkward characters to make their way into filenames these days. When working with the command line, life is a lot easier if spaces are removed from filenames, because many core applications read the space as a delimiter; 'My File' will be read as two files, "My" and "File"! The following loop command can be used to get rid of these unnecessary spaces.

for item in ./*; do mv "$item" "$(echo "$item" | tr -d " ")"; done

The following loop will, for the current working directory, loop over each and every file with a .CR2 suffix, and run the ufraw-batch command, outputting a new file in the jpg format. This could also be ppm, tiff, jpeg, or fits.

for item in ./*.CR2; do ufraw-batch --out-type jpg "$item"; done

There is an excellent range of manipulation options with ufraw-batch; check the manual page (man ufraw-batch) for some examples.
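As a sketch of what is possible (option availability varies between versions, so verify these flags against your manual page), something like the following applies the camera's white balance and auto exposure, and limits the output size, while converting:

for item in ./*.CR2; do ufraw-batch --wb=camera --exposure=auto --size=1920 --out-type jpg "$item"; done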

As an aside, the output format that one uses should depend on what the file is being used for. In brief, a jpg is a compressed lossy format that is handy for web-published photographs due to their size. In comparison, PNG is a lossless compression format, which produces larger files. It is more typically recommended for drawings. TIFF is a lossless format that is typically uncompressed and used for print publications.

Or, rather than typing everything in detail, just use the flowchart by Allen Hsu.

Modifying Existing Image Files

As the previous section suggests, sometimes one might not have the right image format for the job at hand.

The following simple loops allow for mass conversion of files from one format to another, or force particular characteristics into a file. Each of them makes use of the convert utility that is part of ImageMagick.

for item in ./*.jpg ; do convert "$item" "${item%.*}.png" ; done
for item in ./*.jpg ; do convert "$item" -monochrome "$item"; done
for item in ./*.jpg; do convert "$item" -define jpeg:extent=512kb "${item%.*}.jpg" ; done
for item in ./*.jpg; do convert "$item" ../logo.jpg -gravity southeast -geometry +10+10 -composite "${item%.*}logo".jpg ; done

The first loop converts all jpg files in a directory to png files.

The second loop converts all jpg files in a directory to monochrome, in place. The '-monochrome' dither is clearer, but other options that could be used for a similar effect include '-threshold xx%' or '-remap pattern:gray50', for less contrast but retaining more information.

The third loop converts all jpg files to a fixed size (512kb).

The fourth loop adds a logo (logo.jpg) to all jpg files in a directory. Note that the logo file is a directory level above the image folders, otherwise, the loop would engage in the sort of horrible recursion where the logo is placed in the logo, which would be weird.
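A fifth loop in the same spirit, using ImageMagick's -resize operator, scales every image down to a maximum width of 1024 pixels for web publishing (the '-small' suffix is just a naming choice):

for item in ./*.jpg; do convert "$item" -resize 1024x "${item%.*}-small.jpg"; done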

The following is the sort of insane request that one gets from managers; "could you please put the following jpg files into a PDF? In order?"

Such a request involves two steps: converting the jpg files to PDFs, and then combining the PDF files. The assumptions here are that the files are each prefixed with an ordered value, that there are no special characters in the file names, and that using ls -v to parse them is therefore safe.

for item in $(ls -v *.jpg); do convert "$item" "${item%.*}.pdf"; done
pdfunite $(ls -v *.pdf) output.pdf

There certainly is a great deal more that one can do with the now-installed applications and with scripts. ImageMagick, for example, provides excellent documentation for its command-line tools. The scripting methods used here were primarily simple for-loops; there are far more sophisticated means of engaging in such automation (conditional tests, reads from a file descriptor, continue/break statements, etc.). But for now, these examples should serve as a useful short introduction to modifying dozens or hundreds of images in batch mode without even looking at a single image.

July 19, 2020

4FSK on 25 Microwatts

Bill, VK5DSP, and I have been testing a new 4FSK modem waveform that uses LDPC Forward Error Correction (FEC). Bill designed the LDPC codes we are using, and worked out how to perform the soft decision to Log Likelihood Ratio (LLR) calculations we need to use LDPC codes with 4FSK. This is surprisingly tricky, and took a few weeks of careful simulation work.

Once working in simulation, we wanted to test the system Over the Air (OTA).

Experiment

We set up and adjusted antennas on the 2m Ham band, our FT-817 radios, and some USB rig interface boxes. This always takes longer than you think! Bill has a Yagi with 10dB gain fed by coax with 2dB loss. I have a Flower Pot vertical dipole with (I estimate) 2dB gain, and a crummy coax run with 5dB of loss. Here is Bill’s Yagi (it was mounted higher for the actual experiments):

Bill and I are 10km apart, with a non line-of-sight path across suburban Adelaide between us.

The GNU Octave modem simulation has been split into Tx and Rx scripts. We generate a file of Tx samples using one script. The Tx station then plays that file OTA, while the Rx station records a file of received samples, which are then run through the Rx script to obtain Bit Error Rate (BER) and Packet Error Rate (PER) results. Here is a sample run using a local file test.raw:

octave:46> fsk_lib_ldpc_tx("test.raw")
octave:48> fsk_lib_ldpc_rx("test.raw")
Fs: 8000 Rs: 100 frames received:   8
  Uncoded: nbits:   3584 nerrs:      0 ber: 0.000
  Coded..: nbits:   2048 nerrs:      0 ber: 0.000
  Coded..: npckt:      8 perrs:      0 per: 0.000

The modem can be configured for any bit rate. We started with 100 symbols/s (200 bits/s for 4FSK) and a half rate code, resulting in 100 bits/s payload data rate. As an initial sanity test, we tried receiving our Tx signal using a RTLSDR/Airspy and gqrx located a few metres from the Tx. This confirmed the OTA link was working over short distances. A lot can go wrong with OTA tests, so it’s really important to build up to them slowly, testing at every stage.

Results – Day 1

On our first attempt we hit a few USB/LSB and tuning bugs. Turns out the core, uncoded demod works quite well with LSB and USB swapped – it’s just that all the tone frequencies are reversed so the decoded bits are scrambled!

After we sorted those issues out Bill managed to record and decode my 4FSK signal. We started with 600mW Tx power. The SNR was very high, and we had zero errors. Hmm, too strong! So we started inserting attenuation. Eventually, with about 1mW Tx power, we started to get some errors and found the “knee” in the curve – the point where the FEC falls over and we start getting packet errors.

With powerful codes like LDPC this transition is very sharp, change your SNR by just 1dB and you go from 0 to 100% packet errors.

We measured the path loss using a spectrum analyser. With Bill transmitting a 2W (33dBm) FSK signal, I measured -102dBm at the spec-an input, so an end-to-end path loss of 135dB. I estimated the free-space loss for a line-of-sight path, including feed losses and antenna gains, at about 89dB. So the non line-of-sight path is costing us 135 - 89 = 45dB extra attenuation.

Now our expected MDS can be estimated as:

MDS = Eb/No + 10*log10(bitRate) + NoiseFigure - 174
    = 7 + 10*log10(100) + 10 - 174
    = -137 dBm

I’m guessing the FT-817 noise figure at 5dB, and I have a coax loss of 5dB ahead of that, so 10dB total. We reached our MDS with a Tx power of 0dBm, and our path loss is 135dB, which gives a measured MDS of 0 - 135 = -135dBm. Not bad – theory and measurement within a few dB!

We performed further tests with a rate 3/4 LDPC code at 400 symbols/s (600 bit/s throughput). This worked well at 10mW Tx power. Here are some plots showing the modem operation:

The top subplot above shows how the demodulator estimates the frequency of each of the 4FSK tones. They are nice and smooth and spaced at 400Hz, just as we would hope. The bottom plot is the timing estimate, which shows a typical sawtooth pattern as the sample clocks of Bill’s DAC and my ADC are a little off (a few 100ppm). The demodulator automatically adjusts for this.

The top subplot above shows the SNR estimate (in the noise bandwidth of the modem, not 3000 Hz). Our modem falls over when this hits around 7dB, so at 10mW we have 3dB margin. The bottom subplot is the number of bit errors per frame. Our LDPC codeword is about 2000 bits long, so that’s a raw BER of around 50/2000 or 2.5%.

You can decode this signal yourself using the off air sample:

octave:48> fsk_lib_ldpc_rx("20200717_4fsk_rs400_10mw.wav")

Now 600 bit/s is pretty close to the bit rate we need for Digital Voice, which has us thinking… Here is Bill trying SSB at the same power.

Results – Day 2

On our second day of testing Bill positioned himself at a site in the Adelaide Hills which overlooks the city. The idea was to get a better path (closer to line of sight) so we could try lower powers. For these tests Bill used a J-pole mounted on his car.

We managed to decode the Rs=100 symbols/s rate 1/2 waveform at an EIRP of -16dBm (25 microwatts). That’s the transmit power, not the received power.

Unfortunately we ran out of attenuators at that stage so couldn’t go any lower. It even worked really well with the FT-817 driving just a 50 ohm attenuator when the antenna cable was disconnected!

Discussion

We had a great time testing and are pleased that our modem performs just as expected over real world channels. We were both surprised that it worked – these tests usually take many attempts to get right!

I found OTA testing at VHF a lot easier than HF. You click the attenuator up and down a few dB and the SNR follows. The channel really is just additive noise, unlike HF that has all that time varying frequency selective fading and impulse noise.

What we have here is a carefully engineered 4FSK modem with powerful FEC that performs right on the theoretical limits. It can send packets of data between two points very efficiently. The payload bit rate and MDS can be scaled up and down to suit different applications. It’s all open source, and backed by a series of simulations documenting its development and testing.

Reading Further

Pull Request for the 4FSK to LLR mapping simulation
Pull Request for the 4FSK LDPC OTA tests
Open IP over VHF/UHF Using a RPi and RTLSDR at higher bit rates.
Bill’s LowSNR blog
Codec 2 FSK modem README
The design for Bill’s 6 element Yagi

The KSM and I


I spent much of yesterday playing with KSM (Kernel Shared Memory, or Kernel Samepage Merging depending on which universe you come from). Unix kernels store memory in “pages” which are moved in and out of memory as a single block. On most Linux architectures pages are 4,096 bytes long.

KSM is a Linux Kernel feature which scans memory looking for identical pages, and then de-duplicating them. So instead of having two pages, we just have one and have two processes point at that same page. This has obvious advantages if you’re storing lots of repeating data. Why would you be doing such a thing? Well the traditional answer is virtual machines.

Take my employer’s systems for example. We manage virtual learning environments for students, where every student gets a set of virtual machines to do their learning thing on. So, if we have 50 students in a class, we have 50 sets of the same virtual machine. That’s a lot of duplicated memory. The promise of KSM is that instead of storing the same thing 50 times, we can store it once and therefore fit more virtual machines onto a single physical machine.

For my experiments I used libvirt / KVM on Ubuntu 18.04. To ensure KSM was turned on, I needed to:

  • Ensure KSM is turned on. /sys/kernel/mm/ksm/run should contain a “1” if it is enabled. If it is not, just write “1” to that file to enable it.
  • Ensure libvirt is enabling KSM. The KSM value in /etc/defaults/qemu-kvm should be set to “AUTO”.
  • Check KSM metrics:
# grep . /sys/kernel/mm/ksm/*
/sys/kernel/mm/ksm/full_scans:891
/sys/kernel/mm/ksm/max_page_sharing:256
/sys/kernel/mm/ksm/merge_across_nodes:1
/sys/kernel/mm/ksm/pages_shared:0
/sys/kernel/mm/ksm/pages_sharing:0
/sys/kernel/mm/ksm/pages_to_scan:100
/sys/kernel/mm/ksm/pages_unshared:0
/sys/kernel/mm/ksm/pages_volatile:0
/sys/kernel/mm/ksm/run:1
/sys/kernel/mm/ksm/sleep_millisecs:200
/sys/kernel/mm/ksm/stable_node_chains:49
/sys/kernel/mm/ksm/stable_node_chains_prune_millisecs:2000
/sys/kernel/mm/ksm/stable_node_dups:1055
/sys/kernel/mm/ksm/use_zero_pages:0
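The pages_sharing counter is the one to watch; multiplying it by the page size gives a rough estimate of the memory being saved. For example, assuming 4,096 byte pages:

echo "$(( $(cat /sys/kernel/mm/ksm/pages_sharing) * 4096 / 1048576 )) MB saved (approximately)"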

My lab machines are currently set up with Shaken Fist, so I just quickly launched a few hundred identical VMs. The first graph is that experiment. It’s a little hard to see here, but on three machines I consumed about 40GB of RAM with identical VMs and then waited. After three or so hours I had saved about 2,500 pages of memory.

To be honest, that’s a pretty disappointing result. 2,500 4KB pages is only about 10MB of RAM, which isn’t very much at all. Also, three hours is a really long time for our workload, where students often fire up their labs for a couple of hours at a time before shutting them down again. If this was as good as KSM gets, it wasn’t for us.

After some pondering, I realised that KSM is configured by default to not work very well. The default value for pages_to_scan is 100, which means each scan run only inspects about half a megabyte of RAM. It would take a very very long time to scan a modern machine that way. So I tried setting pages_to_scan to 1,000,000,000 instead. One billion is an unreasonably large number for the real world, but hey. You update this number by writing a new value to /sys/kernel/mm/ksm/pages_to_scan.
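That’s just a write to the same sysfs file:

echo 1000000000 | sudo tee /sys/kernel/mm/ksm/pages_to_scan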

This time we get a much better result: I launched as many VMs as would fit on each machine, and then sat back and waited (well, went to bed actually). Again the graph is a bit hard to read, but what it is saying is that after 90 minutes KSM had saved me over 300GB of RAM across the three machines. It’s still a little too slow for our workload, but for workloads where the VMs are relatively static, that’s a real saving.

Now it should be noted that setting pages_to_scan to 1,000,000,000 comes at a cost: each of these machines now has one of its 48 cores dedicated to scanning memory and deduplicating. For my workload that’s something I am ok with, because my workload is not CPU bound, but it might not work for you.


Shaken Fist 0.2.0


The other day we released Shaken Fist version 0.2, and I never got around to announcing it here. In fact, we’ve done a minor release since then and have another minor release in the wings ready to go out in the next day or so.

So what’s changed in Shaken Fist between version 0.1 and 0.2? Well, actually kind of a lot…

  • We moved from MySQL to etcd for storage of persistent state. This was partially done because we wanted distributed locking, but it was also because MySQL was a pain to work with.
  • We rearranged our repositories: the main repository is now in its own github organisation, and the golang REST client, terraform provider, and deployment tooling have moved into their own repositories in that organisation. There is also a prototype javascript client now as well.
  • Some work has gone into making the API service more production grade, although there is still some work to be done there, probably in the 0.3 release. Specifically, there is a timeout if a response takes more than 300 seconds, which can be the case when launching large VMs whose disk images are not in the cache.

There were also some important features added:

  • Authentication of API requests.
  • Resource ownership.
  • Namespaces (a bit like Kubernetes namespaces or OpenStack projects).
  • Resource tagging, called metadata.
  • Support for local mirroring of common disk images.
  • …and a large number of bug fixes.

Shaken Fist is also now packaged on pypi, and the deployment tooling knows how to install from packages as well as source if that’s a thing you’re interested in. You can read more at shakenfist.com, but that site is a bit of a work in progress at the moment. The new github organisation is at github.com/shakenfist.


July 17, 2020

If You’re not Using YAML for CloudFormation Templates, You’re Doing it Wrong

In my last blog post, I promised a rant about using YAML for CloudFormation templates. Here it is. If you persevere to the end I’ll also show you how to convert your existing JSON based templates to YAML.

Many of the points I raise below don’t just apply to CloudFormation. They are general comments about why you should use YAML over JSON for configuration when you have a choice.

One criticism of YAML is its reliance on indentation. A lot of the code I write these days is Python, so indentation being significant is normal. Use a decent editor or IDE and this isn’t a problem. It doesn’t matter if you’re using JSON or YAML; you will want to validate and lint your files anyway. How else will you find that trailing comma in your JSON object?

Now we’ve got that out of the way, let me try to convince you to use YAML.

As developers we are regularly told that we need to document our code. CloudFormation is Infrastructure as Code. If it is code, then we need to document it. That starts with the Description property at the top of the file. If you use JSON for your templates, that’s it; you have no other opportunity to document your templates. On the other hand, if you use YAML you can add inline comments. Anywhere you need a comment, drop in a hash # and your comment. Your team mates will thank you.
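For example, a hypothetical template fragment with both whole-line and end-of-line comments:

AWSTemplateFormatVersion: "2010-09-09"
Description: Provision the web tier for the ordering service
# TODO: split the database resources into their own stack
Resources:
  WebBucket:  # comments can also sit at the end of a line
    Type: AWS::S3::Bucket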

JSON templates don’t support multiline strings. These days many developers have 4K or ultra wide monitors, but we don’t want a string that spans the full width of our 34” screen. Text becomes harder to read once you exceed that “90ish” character limit. With JSON your multiline string becomes "[90ish-characters]\n[another-90ish-characters]\n[and-so-on]". If you opt for YAML, you can use the greater than symbol (>) and then start your multiline string like so:

Description: >
  This is the first line of my Description
  and it continues on my second line
  and I'll finish it on my third line.

As you can see, it is much easier to work with multiline strings in YAML than JSON.

“Folded blocks” like the one above, created using the greater than symbol (>), replace new lines with spaces. This lets you format your text in a more readable way while allowing a machine to use it as intended. If you want to preserve the new lines, use the pipe (|) to create a “literal block”. This is great for inline Lambda functions, where the code remains readable and maintainable.

  APIFunction:
    Type: AWS::Lambda::Function
    Properties:
      Code:
        ZipFile: |
          import json
          import random


          def lambda_handler(event, context):
              return {"statusCode": 200, "body": json.dumps({"value": random.random()})}
      FunctionName: "GetRandom"
      Handler: "index.lambda_handler"
      MemorySize: 128
      Role: !GetAtt LambdaServiceRole.Arn
      Runtime: "python3.7"
      Timeout: 5

Both JSON and YAML require you to escape multibyte characters. That’s less of an issue with CloudFormation templates as generally you’re only using the ASCII character set.

In a YAML file you generally don’t need to quote your strings, but in JSON double quotes are used everywhere: keys, string values, and so on. If your string contains a quote you need to escape it. The same goes for tabs, new lines, backslashes, and so on. JSON based CloudFormation templates can be hard to read because of all the escaping. It also makes it harder to handcraft your JSON when your code is a long escaped string on a single line.

Some configuration in CloudFormation can only be expressed as JSON. Step Functions and some of the AppSync objects in CloudFormation only allow inline JSON configuration. You can still use a YAML template, and it is easier if you do when working with these objects.

The JSON only configuration needs to be inlined in your template. If you’re using JSON you have to supply this as an escaped string, rather than nested objects. If you’re using YAML you can inline it as a literal block. Both YAML and JSON templates support functions such as Sub being applied to these strings, but it is so much more readable with YAML. See this Step Function example lifted from the AWS documentation:

MyStateMachine:
  Type: "AWS::StepFunctions::StateMachine"
  Properties:
    DefinitionString:
      !Sub |
        {
          "Comment": "A simple AWS Step Functions state machine that automates a call center support session.",
          "StartAt": "Open Case",
          "States": {
            "Open Case": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:open_case",
              "Next": "Assign Case"
            }, 
            "Assign Case": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:assign_case",
              "Next": "Work on Case"
            },
            "Work on Case": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:work_on_case",
              "Next": "Is Case Resolved"
            },
            "Is Case Resolved": {
                "Type" : "Choice",
                "Choices": [ 
                  {
                    "Variable": "$.Status",
                    "NumericEquals": 1,
                    "Next": "Close Case"
                  },
                  {
                    "Variable": "$.Status",
                    "NumericEquals": 0,
                    "Next": "Escalate Case"
                  }
              ]
            },
             "Close Case": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:close_case",
              "End": true
            },
            "Escalate Case": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:escalate_case",
              "Next": "Fail"
            },
            "Fail": {
              "Type": "Fail",
              "Cause": "Engage Tier 2 Support."    }   
          }
        }

If you’re feeling lazy you can use inline JSON for IAM policies that you’ve copied from elsewhere. It’s quicker than converting them to YAML.

YAML templates are smaller and more compact than the same configuration stored in a JSON based template. Smaller yet more readable is winning all round in my book.

If you’re still not convinced that you should use YAML for your CloudFormation templates, go read Amazon’s blog post from 2017 advocating the use of YAML based templates.

Amazon makes it easy to convert your existing templates from JSON to YAML. cfn-flip is a Python-based AWS Labs tool for converting CloudFormation templates between JSON and YAML. I will assume you’ve already installed cfn-flip. Once you’ve done that, converting your templates with some automated cleanups is just a command away:

cfn-flip --clean template.json template.yaml

git rm the old json file, git add the new one and git commit and git push your changes. Now you’re all set for your new life using YAML based CloudFormation templates.
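In other words, something like:

git rm template.json
git add template.yaml
git commit -m "Convert CloudFormation template from JSON to YAML"
git push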

If you want to learn more about YAML files in general, I recommend you check out Learn X in Y Minutes’ Guide to YAML. If you want to learn more about YAML based CloudFormation templates, check Amazon’s Guide to CloudFormation Templates.

July 16, 2020

Windows 10 on Debian under KVM

Here are some things that you need to do to get Windows 10 running on a Debian host under KVM.

UEFI Booting

UEFI is big and complex, but most of what it does isn’t needed at all. If all you want to do is boot from an image of a disk with a GPT partition table then you just install the package ovmf and add something like the following to your KVM start script:

UEFI="-drive if=pflash,format=raw,readonly,file=/usr/share/OVMF/OVMF_CODE.fd -drive if=pflash,format=raw,readonly,file=/usr/share/OVMF/OVMF_VARS.fd"

Note that some of the documentation on this doesn’t have the OVMF_VARS.fd file set to readonly. Allowing writes to that file means that the VM boot process (and maybe later) can change EFI variables that affect later boots and other VMs if they all share the same file. For a basic boot you don’t need to change variables so you want it read-only. Also having it read-only is necessary if you want to run KVM as non-root.

As an experiment I tried booting without the OVMF_VARS.fd file, it didn’t boot and then even after configuring it to use the OVMF_VARS.fd file again Windows gave a boot error about the “boot configuration data file” that required booting from recovery media. Apparently configuration mistakes with EFI can mess up the Windows installation, so be careful and backup the Windows installation regularly!

Linux can boot from EFI but you generally don’t want to unless the boot device is larger than 2TB. It’s relatively easy to convert a Linux installation on a GPT disk to a virtual image on a DOS partition table disk or on block devices without partition tables and that gives a faster boot. If the same person runs the host hardware and the VMs then the best choice for Linux is to have no partition tables just one filesystem per block device (which makes resizing much easier) and have the kernel passed as a parameter to kvm. So booting a VM from EFI is probably only useful for booting Windows VMs and for Linux boot loader development and testing.
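For reference, a direct kernel boot for a Linux VM looks something like this (the file names here are hypothetical):

kvm -m 2048 -kernel /boot/vmlinuz-guest -initrd /boot/initrd.img-guest -append "root=/dev/vda ro" -drive format=raw,file=/home/kvm/linuxroot,if=virtio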

As an aside, the Debian Wiki page about Secure Boot on a VM [4] was useful for this. It’s unfortunate that it and so much of the documentation about UEFI is about secure boot which isn’t so useful if you just want to boot a system without regard to the secure boot features.

Emulated IDE Disks

Debian kernels (and probably kernels from many other distributions) are compiled with the paravirtualised storage device drivers. Windows by default doesn’t support such devices so you need to emulate an IDE/SATA disk so you can boot Windows and install the paravirtualised storage driver. The following configuration snippet has a commented line for paravirtualised IO (which is fast) and an uncommented line for a virtual IDE/SATA disk that will allow an unmodified Windows 10 installation to boot.

#DRIVE="-drive format=raw,file=/home/kvm/windows10,if=virtio"
DRIVE="-drive id=disk,format=raw,file=/home/kvm/windows10,if=none -device ahci,id=ahci -device ide-drive,drive=disk,bus=ahci.0"

Spice Video

Spice is an alternative to VNC; here is the main web site for Spice [1]. Spice has many features that could be really useful for some people, like audio, sharing USB devices from the client, and streaming video support. I don’t have a need for those features right now but it’s handy to have options. My main reason for choosing Spice over VNC is that the mouse cursor in ssvnc doesn’t follow the actual mouse and can be difficult or impossible to click on items near the edges of the screen.

The following configuration will make the QEMU code listen with SSL on port 1234 on all IPv4 addresses. Note that this exposes the Spice password to anyone who can run ps on the KVM server; I’ve filed Debian bug #965061 requesting the option of a password file to address this. Also note that the “qxl” virtual video hardware is VGA compatible and can be expected to work with OS images that haven’t been modified for virtualisation, but it works better with special video drivers.

KEYDIR=/etc/letsencrypt/live/kvm.example.com-0001
-spice password=xxxxxxxx,x509-cacert-file=$KEYDIR/chain.pem,x509-key-file=$KEYDIR/privkey.pem,x509-cert-file=$KEYDIR/cert.pem,tls-port=1234,tls-channel=main -vga qxl

To connect to the Spice server I installed the spice-client-gtk package in Debian and ran the following command:

spicy -h kvm.example.com -s 1234 -w xxxxxxxx

Note that this also exposes the Spice password to anyone who can run ps on the system used as a client for Spice; I’ve filed Debian bug #965060 requesting the option of a password file to address this.

This configuration with an unmodified Windows 10 image only supported an 800*600 VGA display.

Networking

To set up bridged networking as non-root you need to do something like the following as root:

chgrp kvm /usr/lib/qemu/qemu-bridge-helper
setcap cap_net_admin+ep /usr/lib/qemu/qemu-bridge-helper
mkdir -p /etc/qemu
echo "allow all" > /etc/qemu/bridge.conf
chgrp kvm /etc/qemu/bridge.conf
chmod 640 /etc/qemu/bridge.conf

Windows 10 supports the emulated Intel E1000 network card. Configuration like the following configures networking on a bridge named br0 with an emulated E1000 card. MAC addresses that have a 1 in the second least significant bit of the first octet are “locally administered” (like IPv4 addresses starting with “10.”), see the Wikipedia page about MAC Address for details.

The following is an example of network configuration where $ID is an ID number for the virtual machine. So far I haven’t come close to 256 VMs on one network so I’ve only needed one octet.

NET="-device e1000,netdev=net0,mac=02:00:00:00:01:$ID -netdev tap,id=net0,helper=/usr/lib/qemu/qemu-bridge-helper,br=br0"

Final KVM Settings

KEYDIR=/etc/letsencrypt/live/kvm.example.com-0001
SPICE="-spice password=xxxxxxxx,x509-cacert-file=$KEYDIR/chain.pem,x509-key-file=$KEYDIR/privkey.pem,x509-cert-file=$KEYDIR/cert.pem,tls-port=1234,tls-channel=main -vga qxl"

UEFI="-drive if=pflash,format=raw,readonly,file=/usr/share/OVMF/OVMF_CODE.fd -drive if=pflash,format=raw,readonly,file=/usr/share/OVMF/OVMF_VARS.fd"

DRIVE="-drive format=raw,file=/home/kvm/windows10,if=virtio"

NET="-device e1000,netdev=net0,mac=02:00:00:00:01:$ID -netdev tap,id=net0,helper=/usr/lib/qemu/qemu-bridge-helper,br=br0"

kvm -m 4000 -smp 2 $SPICE $UEFI $DRIVE $NET

Windows Settings

The Spice Download page has a link for “spice-guest-tools” that has the QXL video driver among other things [2]. This seems to be needed for resolutions greater than 800*600.

The Virt-Manager Download page has a link for “virt-viewer” which is the Spice client for Windows systems [3], they have MSI files for both i386 and AMD64 Windows.

It’s probably a good idea to set the display and system to never sleep (I haven’t tested what happens if you don’t do that, but there’s no benefit in sleeping). Before uploading an image I disabled the pagefile and set the partition to the minimum size so I had less data to upload.

Problems

Here are some things I haven’t solved yet.

The aSpice Android client for the Spice protocol fails to connect with the QEMU code at the server giving the following message on stderr: “error:14094418:SSL routines:ssl3_read_bytes:tlsv1 alert unknown ca:../ssl/record/rec_layer_s3.c:1544:SSL alert number 48“.

Spice is supposed to support dynamic changes to screen resolution on the VM to match the window size at the client, but this doesn’t work for me, not even with the Red Hat QXL drivers installed.

The Windows Spice client doesn’t seem to support TLS; I guess running some sort of proxy for TLS would work but I haven’t tried that yet.

July 14, 2020

OpenHMD and the Oculus Rift

For some time now, I’ve been involved in the OpenHMD project, working on building an open driver for the Oculus Rift CV1, and more recently the newer Rift S VR headsets.

This post is a bit of an overview of how the 2 devices work from a high level, for people who might have used or seen them but don’t know much about the implementation. I also want to talk about OpenHMD and how it fits into the evolving Linux VR/AR API stack.

OpenHMD

http://www.openhmd.net/

In short, OpenHMD is a project providing open drivers for various VR headsets through a single simple API. I don’t know of any other project that provides support for as many different headsets as OpenHMD, so it’s the logical place to contribute for largest effect.

OpenHMD is supported as a backend in Monado, and in SteamVR via the SteamVR-OpenHMD plugin. Working drivers in OpenHMD opens up a range of VR games – as well as non-gaming applications like Blender. I think it’s important that Linux and friends not get left behind – in what is basically a Windows-only activity right now.

One downside is that it comes with the usual disadvantages of an abstraction API: it doesn’t fully expose the varied capabilities of each device, but only the lowest common denominator. I hope we can fix that in time by extending the OpenHMD API, without losing its simplicity.

Oculus Rift S

I bought an Oculus Rift S in April, to supplement my original consumer Oculus Rift (the CV1) from 2017. At that point, the only way to use it was in Windows via the official Oculus driver as there was no open source driver yet. Since then, I’ve largely reverse engineered the USB protocol for it, and have implemented a basic driver that’s upstream in OpenHMD now.

I find the Rift S a somewhat interesting device. It’s not entirely an upgrade over the older CV1. The build quality and some of the specifications are actually worse than the original device – but one area where it is a clear improvement is the tracking system.

CV1 Tracking

The Rift CV1 uses what is called an outside-in tracking system, which has 2 major components. The first is input from Inertial Measurement Units (IMU) on each device – the headset and the 2 hand controllers. The 2nd component is infrared cameras (Rift Sensors) that you space around the room; you then run a calibration procedure that lets the driver software calculate their positions relative to the play area.

IMUs provide readings of linear acceleration and angular velocity, which can be used to determine the orientation of a device, but don’t provide absolute position information. You can derive relative motion from a starting point using an IMU, but only over a short time frame as the integration of the readings is quite noisy.

This is where the Rift Sensors get involved. The cameras observe constellations of infrared LEDs on the headset and hand controllers, and use those in concert with the IMU readings to position the devices within the playing space – so that as you move, the virtual world accurately reflects your movements. The cameras and LEDs synchronise to a radio pulse from the headset, and the camera exposure time is kept very short. That means the picture from the camera is completely black, except for very bright IR sources. Hopefully that means only the LEDs are visible, although light bulbs and open windows can inject noise and make the tracking harder.

Rift Sensor view of the CV1 headset and 2 controllers.

If you have both IMU and camera data, you can build what we call a 6 Degree of Freedom (6DOF) driver. With only IMUs, a driver is limited to providing 3 DOF – allowing you to stand in one place and look around, but not to move.

OpenHMD provides a 3DOF driver for the CV1 at this point, with experimental 6DOF work in a branch in my fork. Getting to a working 6DOF driver is a real challenge. The official drivers from Oculus still receive regular updates to tweak the tracking algorithms.

I have given several presentations about the progress on implementing positional tracking for the CV1. Most recently at Linux.conf.au 2020 in January. There’s a recording at https://www.youtube.com/watch?v=PTHE-cdWN_s if you’re interested, and I plan to talk more about that in a future post.

Rift S Tracking

The Rift S uses Inside Out tracking, which inverts the tracking process by putting the cameras on the headset instead of around the room. With the cameras in fixed positions on the headset, the cameras and their view of the world move as the user’s head moves. For the Rift S, there are 5 individual cameras pointing outward in different directions to provide (overall) a very wide-angle view of the surroundings.

The role of the tracking algorithm in the driver in this scenario is to use the cameras to look for visual landmarks in the play area, and to combine that information with the IMU readings to find the position of the headset. This is called Visual Inertial Odometry.

There is then a 2nd part to the tracking – finding the position of the hand controllers. This part works the same as on the CV1 – looking for constellations of LED lights on the controllers and matching what you see to a model of the controllers.

This is where I think the tracking gets particularly interesting. The requirement of finding where the headset is in the room and the goal of finding the controllers call for 2 different types of camera view!

To find the landmarks in the room, the vision algorithm needs to be able to see everything clearly and you want a balanced exposure from the cameras. To identify the controllers, you want a very fast exposure synchronised with the bright flashes from the hand controller LEDs – the same as when doing CV1 tracking.

The Rift S satisfies both requirements by capturing alternating video frames with fast and normal exposures. Each time, it captures the 5 cameras simultaneously and stitches them together into 1 video frame to deliver over USB to the host computer. The driver then needs to split each frame according to whether it is a normal or fast exposure and dispatch it to the appropriate part of the tracking algorithm.

Rift S – normal room exposure for Visual Inertial Odometry.
Rift S – fast exposure with IR LEDs for controller tracking.

There are a bunch of interesting things to notice in these camera captures:

  • Each camera view is inserted into the frame in its own native orientation, so external calibration information is required to make use of the images.
  • The cameras have a lot of fisheye distortion that will need correcting.
  • In the fast exposure frame, the light bulbs on my ceiling are hard to tell apart from the hand controller LEDs – another challenge for the computer vision algorithm.
  • The cameras are Infrared only, which is why the Rift S passthrough view (if you’ve ever seen it) is in grey-scale.
  • The top 16 pixels of each frame contain some binary data to help with frame identification. I don’t know how to interpret the contents of that data yet.

Status

This blog post is already too long, so I’ll stop here. In part 2, I’ll talk more about deciphering the Rift S protocol.

Thanks for reading! If you have any questions, hit me up at mailto:thaytan@noraisin.net or @thaytan on Twitter

Debian PPC64EL Emulation

In my post on Debian S390X Emulation [1] I mentioned having problems booting a Debian PPC64EL kernel under QEMU. Giovanni commented that they had PPC64EL working and gave a link to their site with Debian QEMU images for various architectures [2]. I tried their image, which worked, then tried mine again, which also worked – it seemed that a recent update in Debian/Unstable fixed the bug that made QEMU not work with the PPC64EL kernel.

Here are the instructions on how to do it.

First you need to create a filesystem in an image file with commands like the following:

truncate -s 4g /vmstore/ppc
mkfs.ext4 /vmstore/ppc
mount -o loop /vmstore/ppc /mnt/tmp

Then visit the Debian Netinst page [3] to download the PPC64EL net install ISO. Then loopback mount it somewhere convenient like /mnt/tmp2.

The package qemu-system-ppc has the program for emulating a PPC64LE system. The qemu-user-static package has the program for emulating PPC64LE for a single program (i.e. a statically linked program or a chroot environment); you need this to run debootstrap. The following commands should be most of what you need.

apt install qemu-system-ppc qemu-user-static

update-binfmts --display

# qemu ppc64 needs exec stack to solve "Could not allocate dynamic translator buffer"
# so enable that on SE Linux systems
setsebool -P allow_execstack 1

debootstrap --foreign --arch=ppc64el --no-check-gpg buster /mnt/tmp file:///mnt/tmp2
chroot /mnt/tmp /debootstrap/debootstrap --second-stage

cat << END > /mnt/tmp/etc/apt/sources.list
deb http://mirror.internode.on.net/pub/debian/ buster main
deb http://security.debian.org/ buster/updates main
END
echo "APT::Install-Recommends False;" > /mnt/tmp/etc/apt/apt.conf

echo ppc64 > /mnt/tmp/etc/hostname

# /usr/bin/awk: error while loading shared libraries: cannot restore segment prot after reloc: Permission denied
# only needed for chroot
setsebool allow_execmod 1

chroot /mnt/tmp apt update
# why aren't they in the default install?
chroot /mnt/tmp apt install perl dialog
chroot /mnt/tmp apt dist-upgrade
chroot /mnt/tmp apt install bash-completion locales man-db openssh-server build-essential systemd-sysv ifupdown vim ca-certificates gnupg
# install kernel last because systemd install rebuilds initrd
chroot /mnt/tmp apt install linux-image-ppc64el
chroot /mnt/tmp dpkg-reconfigure locales
chroot /mnt/tmp passwd

cat << END > /mnt/tmp/etc/fstab
/dev/vda / ext4 noatime 0 0
#/dev/vdb none swap defaults 0 0
END

mkdir /mnt/tmp/root/.ssh
chmod 700 /mnt/tmp/root/.ssh
cp ~/.ssh/id_rsa.pub /mnt/tmp/root/.ssh/authorized_keys
chmod 600 /mnt/tmp/root/.ssh/authorized_keys

rm /mnt/tmp/vmlinux* /mnt/tmp/initrd*
mkdir /boot/ppc64
cp /mnt/tmp/boot/[vi]* /boot/ppc64

# clean up
umount /mnt/tmp
umount /mnt/tmp2

# setcap binary for starting bridged networking
setcap cap_net_admin+ep /usr/lib/qemu/qemu-bridge-helper

# afterwards set the access on /etc/qemu/bridge.conf so it can only
# be read by the user/group permitted to start qemu/kvm
echo "allow all" > /etc/qemu/bridge.conf

Here is an example script for starting kvm. It can be run by any user that can read /etc/qemu/bridge.conf.

#!/bin/bash
set -e

KERN="kernel /boot/ppc64/vmlinux-4.19.0-9-powerpc64le -initrd /boot/ppc64/initrd.img-4.19.0-9-powerpc64le"

# single network device, can have multiple
NET="-device e1000,netdev=net0,mac=02:02:00:00:01:04 -netdev tap,id=net0,helper=/usr/lib/qemu/qemu-bridge-helper"

# random number generator for fast start of sshd etc
RNG="-object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0"

# I have lockdown because it does no harm now and is good for future kernels
# I enable SE Linux everywhere
KERNCMD="net.ifnames=0 noresume security=selinux root=/dev/vda ro lockdown=confidentiality"

kvm -drive format=raw,file=/vmstore/ppc64,if=virtio $RNG -nographic -m 1024 -smp 2 $KERN -curses -append "$KERNCMD" $NET

July 09, 2020

Logging Step Functions to CloudWatch

Many AWS services log to CloudWatch. Some do it out of the box, others need to be configured to log properly. When Amazon released Step Functions, they didn’t include support for logging to CloudWatch. In February 2020, Amazon announced Step Functions could now log to CloudWatch. Step Functions still supports CloudTrail logs, but CloudWatch logging is more useful for many teams.

Users need to configure Step Functions to log to CloudWatch. This is done on a per State Machine basis. Of course you could click around the console to enable it, but that doesn’t scale. If you use CloudFormation to manage your Step Functions, it is only a few extra lines of configuration to add the logging support.

In my example I will assume you are using YAML for your CloudFormation templates. I’ll save my “if you’re using JSON for CloudFormation you’re doing it wrong” rant for another day. This is a cut down example from one of my services:

---
AWSTemplateFormatVersion: '2010-09-09'
Description: StepFunction with Logging Example.
Resources:
  StepFunctionExecRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Effect: Allow
          Principal:
            Service: !Sub "states.${AWS::Region}.amazonaws.com"
          Action:
          - sts:AssumeRole
      Path: "/"
      Policies:
      - PolicyName: StepFunctionExecRole
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
          - Effect: Allow
            Action:
            - lambda:InvokeFunction
            - lambda:ListFunctions
            Resource: !Sub "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:my-lambdas-namespace-*"
          - Effect: Allow
            Action:
            - logs:CreateLogDelivery
            - logs:GetLogDelivery
            - logs:UpdateLogDelivery
            - logs:DeleteLogDelivery
            - logs:ListLogDeliveries
            - logs:PutResourcePolicy
            - logs:DescribeResourcePolicies
            - logs:DescribeLogGroups
            Resource: "*"
  MyStateMachineLogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      LogGroupName: /aws/stepfunction/my-step-function
      RetentionInDays: 14
  DashboardImportStateMachine:
    Type: AWS::StepFunctions::StateMachine
    Properties:
      StateMachineName: my-step-function
      StateMachineType: STANDARD
      LoggingConfiguration:
        Destinations:
          - CloudWatchLogsLogGroup:
             LogGroupArn: !GetAtt MyStateMachineLogGroup.Arn
        IncludeExecutionData: True
        Level: ALL
      DefinitionString:
        !Sub |
        {
          ... JSON Step Function definition goes here
        }
      RoleArn: !GetAtt StepFunctionExecRole.Arn

The key pieces in this example are the second statement in the IAM Role with all the logging permissions, the LogGroup defined by MyStateMachineLogGroup and the LoggingConfiguration section of the Step Function definition.

The IAM role permissions are copied from the example policy in the AWS documentation for using CloudWatch Logging with Step Functions. The CloudWatch IAM permissions model is pretty weak, so we need to grant these broad permissions.

The LogGroup definition creates the log group in CloudWatch. You can use whatever value you want for the LogGroupName. I followed the Amazon convention of prefixing everything with /aws/[service-name]/ and then appended the Step Function name. I recommend using the RetentionInDays configuration. It stops old logs sticking around forever. In my case I send all my logs to ELK, so I don’t need to retain them in CloudWatch long term.

Finally we use the LoggingConfiguration to tell AWS where we want to send our logs. You can only specify a single destination. IncludeExecutionData determines whether the inputs and outputs of each function call are logged. You should not enable this if you are passing sensitive information between your steps. The verbosity of logging is controlled by Level. Amazon has a page on Step Function log levels. For dev you probably want to use ALL to help with debugging, but in production you probably only need ERROR level logging.

I removed the Parameters and Output from the template. Use them as you need to.
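Once the stack is deployed you can confirm the logging wiring from the CLI. This is a sketch assuming the AWS CLI is configured; the state machine ARN below is a placeholder:

# confirm the log group exists
aws logs describe-log-groups --log-group-name-prefix /aws/stepfunction/my-step-function

# confirm the state machine's logging configuration took effect
aws stepfunctions describe-state-machine \
  --state-machine-arn arn:aws:states:us-east-1:123456789012:stateMachine:my-step-function \
  --query loggingConfiguration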

Drupal/MySQL Authentication Error

Yesterday I noticed that a number of Drupal websites I look after were down, and it was a pretty interesting error message:

MYSQL ERROR 2049 (HY000): Connection using old (pre-4.1.1) authentication protocol refused (client option 'secure_auth' enabled)

Like most error messages it does give a tantalising hint about the problem, and unsurprisingly other people have had similar issues as StackOverflow explains. But it all looks somewhat annoying, especially when running on a hosted service (honestly, I have other things to do these days).

I was about to rue the day, and not for the first time, when I decided to move from plain HTML/CSS/Javascript sites to something with a database-driven system (Drupal, WordPress, etc). But fortunately, there is a stunningly easy fix; simply change the password. Specifically, change the password of the MySQL user that accesses the database that is being used by Drupal, and the authentication problem will solve itself. Actually this is probably timely because it's a good indication that (ahem) the password hadn't been changed "for quite a while".

The following are command-line options, but it's equally simple enough if one is using CPanel or similar.

1. Change password for database user. Get on MySQL in the usual manner (e.g., mysql -u root -h localhost -p) and run
ALTER USER '$username'@'localhost' IDENTIFIED BY '$newpassword';

2. Change permissions on the $drupalroot/sites/default folder. The default permissions are set to 0555, set them to 0755 for editing.
chmod 0755 $drupalroot/sites/default

3. Change permissions for settings.php to 640 for editing
cd default; chmod 0640 settings.php

4. Change password for database user in settings.php; this can be found in one of two forms, depending on the version of Drupal being used:

e.g.,
$db_url = 'mysql://username:password@localhost/databasename';

e.g.,

$databases = array (
  ...
  'database' => '$database_name',
  'username' => '$username',
  'password' => '$newpassword',
  ...
);

5. Change permissions for settings.php and sites/default to read only
chmod 0440 settings.php; cd ..; chmod 0555 $drupalroot/sites/default

6. Reload website. Everything will be back to working order.
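To confirm the new credentials independently of Drupal, a quick test from the shell (prompting for the password rather than putting it on the command line):

mysql -u $username -p -h localhost $databasename -e 'SELECT 1'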

July 08, 2020

Resistance is Useless

Sorry about the Vogon quote.

I don’t write about my EV much any more, as nothing much happens to it. I’m planning to buy a real factory EV when the prices drop beneath AUD$30k. In the mean time it’s still my daily drive, like it has been since 2008.

Yesterday it felt a bit sluggish, like it wasn’t charged properly. I charged it, but the charger cut out prematurely. This is never a good sign as you don’t want a partially charged pack.

When under charge, I measured the voltage across all 36 cells. They were very close to each other except for one which was 300mV high. I traced this 300mV to the terminal post between the cell and its cable. I pulled apart the terminal and the washers were all black and burnt. If it was dropping 300mV at 20A charge, it must have been getting pretty hot at 200A during acceleration!
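For a rough sense of scale, here’s the Ohm’s law arithmetic (assuming the contact resistance stayed constant with current):

echo "scale=4; 0.3/20" | bc      # 300mV at 20A is about 0.015 ohms
echo "200*200*0.015" | bc        # I^2 * R at 200A would be about 600W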


I replaced the washers with new ones and measured 11mV drop during charge, similar to other terminal posts. In addition to sluggish performance and carbon generation the high voltage drop must have been causing the battery management system to drop the charger out early.

It’s now happily soaking up some more charge, and I’ll take it for a test drive soon.

Reading Further
My EV Page
EV Blog Archive

July 05, 2020

Debian S390X Emulation

I decided to setup some virtual machines for different architectures. One that I decided to try was S390X – the latest 64bit version of the IBM mainframe. Here’s how to do it. I tested on a host running Debian/Unstable, but Buster should work in the same way.

First you need to create a filesystem in an image file with commands like the following:

truncate -s 4g /vmstore/s390x
mkfs.ext4 /vmstore/s390x
mount -o loop /vmstore/s390x /mnt/tmp

Then visit the Debian Netinst page [1] to download the S390X net install ISO. Then loopback mount it somewhere convenient like /mnt/tmp2.

The package qemu-system-misc has the program for emulating a S390X system (among many others). The qemu-user-static package has the program for emulating S390X for a single program (i.e. a statically linked program or a chroot environment); you need this to run debootstrap. The following commands should be most of what you need.

# Install the basic packages you need
apt install qemu-system-misc qemu-user-static debootstrap

# List the support for different binary formats
update-binfmts --display

# qemu s390x needs exec stack to solve "Could not allocate dynamic translator buffer"
# so you probably need this on SE Linux systems
setsebool allow_execstack 1

# commands to do the main install
debootstrap --foreign --arch=s390x --no-check-gpg buster /mnt/tmp file:///mnt/tmp2
chroot /mnt/tmp /debootstrap/debootstrap --second-stage

# set the apt sources
cat << END > /mnt/tmp/etc/apt/sources.list
deb http://YOURLOCALMIRROR/pub/debian/ buster main
deb http://security.debian.org/ buster/updates main
END
# for minimal install do not want recommended packages
echo "APT::Install-Recommends False;" > /mnt/tmp/etc/apt/apt.conf

# update to latest packages
chroot /mnt/tmp apt update
chroot /mnt/tmp apt dist-upgrade

# install kernel, ssh, and build-essential
chroot /mnt/tmp apt install bash-completion locales linux-image-s390x man-db openssh-server build-essential
chroot /mnt/tmp dpkg-reconfigure locales
echo s390x > /mnt/tmp/etc/hostname
chroot /mnt/tmp passwd

# copy kernel and initrd
mkdir -p /boot/s390x
cp /mnt/tmp/boot/vmlinuz* /mnt/tmp/boot/initrd* /boot/s390x

# setup /etc/fstab
cat << END > /mnt/tmp/etc/fstab
/dev/vda / ext4 noatime 0 0
#/dev/vdb none swap defaults 0 0
END

# clean up
umount /mnt/tmp
umount /mnt/tmp2

# setcap binary for starting bridged networking
setcap cap_net_admin+ep /usr/lib/qemu/qemu-bridge-helper

# afterwards set the access on /etc/qemu/bridge.conf so it can only
# be read by the user/group permitted to start qemu/kvm
echo "allow all" > /etc/qemu/bridge.conf

Some of the above should be considered pseudo-code in shell script form rather than an exact way of doing things. While you can copy and paste all the above into a command line and have a reasonable chance of having it work, I think it would be better to look at each command and decide whether it’s right for you and whether you need to alter it slightly for your system.

To run qemu as non-root you need to have a helper program with extra capabilities to setup bridged networking. I’ve included that in the explanation because I think it’s important to have all security options enabled.

The “-object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-ccw,rng=rng0” part is to give entropy to the VM from the host, otherwise it will take ages to start sshd. Note that this is slightly but significantly different from the command used for other architectures (the “ccw” is the difference).

I’m not sure if “noresume” on the kernel command line is required, but it doesn’t do any harm. The “net.ifnames=0” stops systemd from renaming Ethernet devices. For the virtual networking the “ccw” again is a difference from other architectures.

Here is a basic command to run a QEMU virtual S390X system. If all goes well it should give you a login: prompt on a curses based text display, you can then login as root and should be able to run “dhclient eth0” and other similar commands to setup networking and allow ssh logins.

qemu-system-s390x -drive format=raw,file=/vmstore/s390x,if=virtio -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-ccw,rng=rng0 -nographic -m 1500 -smp 2 -kernel /boot/s390x/vmlinuz-4.19.0-9-s390x -initrd /boot/s390x/initrd.img-4.19.0-9-s390x -curses -append "net.ifnames=0 noresume root=/dev/vda ro" -device virtio-net-ccw,netdev=net0,mac=02:02:00:00:01:02 -netdev tap,id=net0,helper=/usr/lib/qemu/qemu-bridge-helper

Here is a slightly more complete QEMU command. It has 2 block devices, for root and swap. It has SE Linux enabled for the VM (SE Linux works nicely on S390X). I added the “lockdown=confidentiality” kernel security option even though it’s not supported in 4.19 kernels, it doesn’t do any harm and when I upgrade systems to newer kernels I won’t have to remember to add it.

qemu-system-s390x -drive format=raw,file=/vmstore/s390x,if=virtio -drive format=raw,file=/vmswap/s390x,if=virtio -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-ccw,rng=rng0 -nographic -m 1500 -smp 2 -kernel /boot/s390x/vmlinuz-4.19.0-9-s390x -initrd /boot/s390x/initrd.img-4.19.0-9-s390x -curses -append "net.ifnames=0 noresume security=selinux root=/dev/vda ro lockdown=confidentiality" -device virtio-net-ccw,netdev=net0,mac=02:02:00:00:01:02 -netdev tap,id=net0,helper=/usr/lib/qemu/qemu-bridge-helper

Try It Out

I’ve got a S390X system online for a while, “ssh root@s390x.coker.com.au” with password “SELINUX” to try it out.

PPC64

I’ve tried running a PPC64 virtual machine, I did the same things to set it up and then tried launching it with the following result:

qemu-system-ppc64 -drive format=raw,file=/vmstore/ppc64,if=virtio -nographic -m 1024 -kernel /boot/ppc64/vmlinux-4.19.0-9-powerpc64le -initrd /boot/ppc64/initrd.img-4.19.0-9-powerpc64le -curses -append "root=/dev/vda ro"

Above is the minimal qemu command that I’m using. Below is the result, it stops after the “4.” from “4.19.0-9”. Note that I had originally tried with a more complete and usable set of options, but I trimmed it to the minimal needed to demonstrate the problem.

  Copyright (c) 2004, 2017 IBM Corporation All rights reserved.
  This program and the accompanying materials are made available
  under the terms of the BSD License available at
  http://www.opensource.org/licenses/bsd-license.php

Booting from memory...
Linux ppc64le
#1 SMP Debian 4.

The kernel is from the package linux-image-4.19.0-9-powerpc64le which is a dependency of the package linux-image-ppc64el in Debian/Buster. The program qemu-system-ppc64 is from version 5.0-5 of the qemu-system-ppc package.

Any suggestions on what I should try next would be appreciated.
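One thing I plan to try next (untested, just a guess based on the pseries machine normally putting the kernel console on hvc0) is directing the console explicitly, in case output is continuing somewhere the curses display can’t show:

qemu-system-ppc64 -drive format=raw,file=/vmstore/ppc64,if=virtio -nographic -m 1024 -kernel /boot/ppc64/vmlinux-4.19.0-9-powerpc64le -initrd /boot/ppc64/initrd.img-4.19.0-9-powerpc64le -append "console=hvc0 root=/dev/vda ro"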

July 03, 2020

Desklab Portable USB-C Monitor

I just got a 15.6″ 4K resolution Desklab portable touchscreen monitor [1]. It takes power via USB-C and video input via USB-C or mini HDMI, has touch screen input, and has speakers built in for USB or HDMI sound.

PC Use

I bought a mini-DisplayPort to HDMI adaptor and for my first test ran it from my laptop, where it was seen as a 1920*1080 DisplayPort monitor. The adaptor is specified as supporting 4K, so I don’t know why I didn’t get 4K to work; my laptop has done 4K with other monitors.

The next thing I plan to get is a VGA to HDMI converter so I can use this on servers, it can be a real pain getting a monitor and power cable to a rack mounted server and this portable monitor can be powered by one of the USB ports in the server. A quick search indicates that such devices start at about $12US.

The Desklab monitor has no markings to indicate what resolution it supports, no part number, and no serial number. The only documentation I could find about how to recognise the difference between the FullHD and 4K versions is that the FullHD version supposedly draws 2A and the 4K version draws 4A. I connected my USB Ammeter and it reported that between 0.6 and 1.0A were drawn. If they meant to say 2W and 4W instead of 2A and 4A (I’ve seen worse errors in manuals) then the current drawn would indicate the 4K version. Otherwise the stated current requirements don’t come close to matching what I’ve measured.
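As a sanity check, assuming the ammeter was reading at the nominal 5V of USB:

echo "5*0.6; 5*1.0" | bc   # 3.0 to 5.0 watts, bracketing a "4W" figure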

Power

The promise of USB-C was power from anywhere to anywhere. I think that such power can theoretically be done with USB 3 and maybe USB 2, but asymmetric cables make it more challenging.

I can power my Desklab monitor from a USB battery, from my Thinkpad’s USB port (even when the Thinkpad isn’t on mains power), and from my phone (although the phone battery runs down fast as expected). When I have a mains powered USB charger (for a laptop and rated at 60W) connected to one USB-C port and my phone on the other the phone can be charged while giving a video signal to the display. This is how it’s supposed to work, but in my experience it’s rare to have new technology live up to its potential at the start!

One thing to note is that it doesn’t have a battery. I had imagined that it would have a battery (in spite of there being nothing on their web site to imply this) because I just couldn’t think of a touch screen device not having a battery. It would be nice if there was a version of this device with a big battery built in that could avoid needing separate cables for power and signal.

Phone Use

The first thing to note is that the Desklab monitor won’t work with all phones; whether a phone will take the option of an external display depends on its configuration, and some phones may support an external display but not touchscreen. The Huawei Mate devices are specifically listed in the printed documentation as being supported for touchscreen as well as display. Surprisingly the Desklab web site has no mention of this unless you download the PDF of the manual; they really should have a list of confirmed supported devices and a forum for users to report on how it works.

My phone is a Huawei Mate 10 Pro so I guess I got lucky here. My phone has a “desktop mode” that can be enabled when I connect it to a USB-C device (not sure what criteria it uses to determine if the device is suitable). The desktop mode has something like a regular desktop layout and you can move windows around etc. There is also the option of having a copy of the phone’s screen, but it displays the image of the phone screen vertically in the middle of the landscape layout monitor which is ridiculous.

When desktop mode is enabled it’s independent of the phone interface, so I had to find the icons for the programs I wanted to run in an unsorted list with no usable search (the search interface of the app list brings up the keyboard, which obscures the list of matching apps). The keyboard takes up more than half the screen and there doesn’t seem to be a way to make it smaller. I’d like to try a portrait layout which would make the keyboard take something like 25% of the screen, but that’s not supported.

It’s quite easy to type on a keyboard that’s slightly larger than a regular PC keyboard (a 15″ display with no numeric keypad or cursor control keys). The Hacker’s Keyboard app might work well with this as it has cursor control keys. The GUI has an option for full screen mode for an app which is really annoying to get out of (you have to use a drop down from the top of the screen); full screen doesn’t make sense for a display this large. Overall the GUI is a bit clunky: imagine Windows 3.1 with a start button and task bar. One interesting thing to note is that the desktop and phone GUIs can be run separately, so you can type on the Desklab (or any similar device) and look things up on the phone.

Multiple monitors never really interested me for desktop PCs because switching between windows is fast and easy and it’s easy to resize windows to fit several on the desktop. Resizing windows on the Huawei GUI doesn’t seem easy (although I might be missing some things) and the keyboard takes up enough of the screen that having multiple windows open while typing isn’t viable.

I wrote the first draft of this post on my phone using the Desklab display. It’s not nearly as easy as writing on a laptop but much easier than writing on the phone screen.

Currently Desklab is offering 2 models for sale, 4K resolution for $399US and FullHD for $299US. I got the 4K version which is very expensive at the moment when converted to Australian dollars. There are significantly cheaper USB-C monitors available (such as this ASUS one from Kogan for $369AU), but I don’t think they have touch screens and therefore can’t be used with a phone unless you enable the phone screen as touch pad mode and have a mouse cursor on screen. I don’t know if all Android devices support that, it could be that a large part of the desktop experience I get is specific to Huawei devices.

One annoying feature is that if I use the phone power button to turn the screen off it shuts down the connection to the Desklab display, but the phone screen will turn off if I leave it alone for the screen timeout (which I have set to 10 minutes).

Caveats

When I ordered this I wanted the biggest screen possible. But now that I have it, the fact that it doesn’t fit in the pocket of my Scott e Vest jacket [2] will limit what I can do with it. Maybe I’ll be buying a 13″ monitor in the near future; I expect that Desklab will do well and start selling them in a wide range of sizes. A 15.6″ portable device is inconvenient even in the laptop format, and a thin portable screen is inconvenient in many ways.

Netflix doesn’t display video on the Desklab screen; I suspect that Netflix is doing this deliberately as some misguided attempt at stopping piracy. It is really good for watching video as it has the speakers in good locations for stereo sound, so it’s a pity that Netflix is difficult.

The functionality on phones from companies other than Huawei is unknown. It is likely to work on most Android phones, but if a particular phone is important to you then you want to Google for how it worked for others.

July 02, 2020

Isolating PHP Web Sites

If you have multiple PHP web sites on a server in a default configuration they will all be able to read each other’s files. If you have multiple PHP web sites that have stored data or passwords for databases in configuration files then there are significant problems if they aren’t all trusted. Even if the sites are all trusted (IE the same person configures them all), if there is a security problem in one site it’s ideal to prevent that being used to immediately attack all sites.

mpm_itk

The first thing I tried was mpm_itk [1]. This is a version of the traditional “prefork” module for Apache that has one process for each HTTP connection. When it’s installed you just put the directive “AssignUserID USER GROUP” in your VirtualHost section and that virtual host runs as the user:group in question. It will work with any Apache module that works with mpm_prefork. In my experiment with mpm_itk I first tried running with a different UID for each site, but that conflicted with the pagespeed module [2]. The pagespeed module optimises HTML and CSS files to improve performance and it has a directory tree where it stores cached versions of some of the files. It doesn’t like working with copies of itself under different UIDs writing to that tree. This isn’t a real problem, setting up the different PHP files with database passwords to be read by the desired group is easy enough. So I just ran each site with a different GID but used the same UID for all of them.

The first problem with mpm_itk is that the mpm_prefork code it’s based on is the slowest MPM available, and it’s also incompatible with HTTP/2. A minor issue of mpm_itk is that it makes Apache take ages to stop or restart; I don’t know why and can’t be certain it’s not a configuration error on my part. As an aside, here is a site for testing your server’s support for HTTP/2 [3]. To enable HTTP/2 you have to be running mpm_event and enable the “http2” module. Then for every virtual host that is to support it (generally all https virtual hosts) put the line “Protocols h2 h2c http/1.1” in the virtual host configuration.
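On Debian the switch to mpm_event with HTTP/2 looks something like the following (the PHP module has to be removed first as it depends on mpm_prefork; php-fpm, described below, replaces it):

a2dismod php7.3 mpm_prefork
a2enmod mpm_event http2
systemctl restart apache2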

A good feature of mpm_itk is that it has everything for the site running under the same UID, all Apache modules and Apache itself. So there’s no issue of one thing getting access to a file and another not getting access.

After a trial I decided not to keep using mpm_itk because I want HTTP/2 support.

php-fpm Pools

The Apache PHP module depends on mpm_prefork so it also has the issues of not working with HTTP/2 and of causing the web server to be slow. The solution is php-fpm, a separate server for running PHP code that uses the fastcgi protocol to talk to Apache. Here’s a link to the upstream documentation for php-fpm [4]. In Debian this is in the php7.3-fpm package.

In Debian the directory /etc/php/7.3/fpm/pool.d has the configuration for “pools”. Below is an example of a configuration file for a pool:

# cat /etc/php/7.3/fpm/pool.d/example.com.conf
[example.com]
user = example.com
group = example.com
listen = /run/php/php7.3-example.com.sock
listen.owner = www-data
listen.group = www-data
pm = dynamic
pm.max_children = 5
pm.start_servers = 2
pm.min_spare_servers = 1
pm.max_spare_servers = 3

Here is the upstream documentation for fpm configuration [5].

Then for the Apache configuration for the site in question you could have something like the following:

ProxyPassMatch "^/(.*\.php(/.*)?)$" "unix:/run/php/php7.3-example.com.sock|fcgi://localhost/usr/share/wordpress/"

The “|fcgi://localhost” part is just the syntax for specifying a Unix domain socket. From the Apache Wiki it appears that the method for configuring TCP connections is more obvious [6]. I chose Unix domain sockets because it allows putting the domain name in the socket address. Matching domains for the web server to port numbers is likely to be error prone, while matching based on domain names is easier to check and also easier to put in Apache configuration macros.

There was some additional hassle with getting Apache to read the files created by PHP processes (the options include running PHP scripts with the www-data group, having SETGID directories for storing files, and having world-readable files). But this got things basically working.
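Of those options, the SETGID directory approach can be sketched as follows (the path is just an example):

mkdir -p /var/www/example.com/uploads
chgrp www-data /var/www/example.com/uploads
# the SETGID bit makes new files inherit the www-data group
chmod 2775 /var/www/example.com/uploads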

Nginx

My Google searches for running multiple PHP sites under different UIDs didn’t turn up any good hits. It was only after I found the DigitalOcean page on doing this with Nginx [7] that I knew what to search for to find the way of doing it in Apache.

AudioBooks – June 2020

The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power by Shoshana Zuboff

A good warning of the dangerous designs and goals of firms like Facebook and Google. Sometimes a bit wordy. 3/5

The Calculating Stars: Lady Astronaut Volume 1 by Mary Robinette Kowal

Alternate timeline SF. A meteorite hits the US. The Space program accelerates so humans can escape earth. Our hero faces lots of sexism & other barriers to becoming an astronaut. 3/5

By the Shores of Silver Lake: Little House Series, Book 5 by Laura Ingalls Wilder

The family move to De Smet, South Dakota. The railroad and then a town is built about them over a year. A good entry in the series, some gripping passages. 3/5

The Restaurant: A History of Eating Out by William Sitwell

A non-exhaustive history. Bouncing through ancient times before focusing on Britain since 1945. But plenty of fun and interesting bits. 3/5

Broadway: A History of New York City in Thirteen Miles by Fran Leadon

A mile by mile coverage from South to North. How each section was added to the street and developed. A range of interesting stories and history. 4/5

My Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average, in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all


June 30, 2020

Fuck Grey Text

fuck grey text on white backgrounds
fuck grey text on black backgrounds
fuck thin, spindly fonts
fuck 10px text
fuck any size of anything in px
fuck font-weight 300
fuck unreadable web pages
fuck themes that implement this unreadable idiocy
fuck sites that don’t work without javascript
fuck reactjs and everything like it

thank fuck for Stylus. and uBlock Origin. and uMatrix.


June 27, 2020

Links June 2020

Bruce Schneier wrote an informative post about Zoom security problems [1]. He recommends Jitsi, which has a Debian package and is free software.

Axel Beckert wrote an interesting post about keyboards with small numbers of keys, as few as 28 [2]. It’s not something I’d ever want to use, but interesting to read from a computer science and design perspective.

The Guardian has a disturbing article explaining why we might never get a good Covid19 vaccine [3]. If that happens it will change our society for years if not decades to come.

Matt Palmer wrote an informative blog post about private key redaction [4]. I learned a lot from that. Probably the simplest summary is that you should never publish sensitive data unless you are certain that all that you are publishing is suitable, if you don’t understand it then you don’t know if it’s suitable to be published!

This article by Umair Haque on eand.co has some interesting points about how Freedom is interpreted in the US [5].

This article by Umair Haque on eand.co has some good points about how messed up the US is economically [6]. I think that his analysis is seriously let down by omitting the savings that could be made by amending the US healthcare system without serious changes (EG by controlling drug prices) and by reducing the scale of the US military (there will never be another war like WW2 because any large scale war will be nuclear). If the US government could significantly cut spending in a couple of major areas they could then put the money towards fixing some of the structural problems and bootstrapping a first-world economic system.

The American Conservative has an insightful article “Seven Reasons Police Brutality is Systemic, Not Anecdotal” [7].

Scientific American has an informative article about how genetic engineering could be used to make a Covid-19 vaccine [8].

Rike wrote an insightful post about How Language Changes Our Concepts [9]. They cover the differences between the French, German, and English languages based on gender and on how the language limits thoughts. Then conclude with the need to remove terms like master/slave and blacklist/whitelist from our software, with a focus on Debian but it’s applicable to all software.

Gunnar Wolf also wrote an insightful post On Masters and Slaves, Whitelists and Blacklists [10], they started with why some people might not understand the importance of the issue and then explained some ways of addressing it. The list of suggested terms includes Primary-secondary, Leader-follower, and some other terms which have slightly different meanings and allow more precision in describing the computer science concepts used. We can be more precise when describing computer science while also not using terms that marginalise some groups of people, it’s a win-win!

Both Rike and Gunnar were responding to a LWN article about the plans to move away from Master/Slave and Blacklist/Whitelist in the Linux kernel [11]. One of the noteworthy points in the LWN article is that there are about 70,000 instances of words that need to be changed in the Linux kernel so this isn’t going to happen immediately. But it will happen eventually which is a good thing.

Vale Marcus de Rijk

kattekrab Sat, 27/06/2020 - 10:16

June 25, 2020

How Will the Pandemic Change Things?

The Bulwark has an interesting article on why they can’t “Reopen America” [1]. I wonder how many changes will be long term. According to the Wikipedia List of Epidemics [2] Covid-19 so far hasn’t had a high death toll when compared to other pandemics of the last 100 years. People’s reactions to this vary from doing nothing to significant isolation, the question is what changes in attitudes will be significant enough to change society.

Transport

One thing that has been happening recently is a transition in transport. It’s obvious that we need to reduce CO2, and while electric cars will address the transport part of the problem in the long term, changing to electric public transport is the cheaper and faster way to do it in the short term. Before Covid-19 the peak hour public transport in my city was ridiculously overcrowded; having people unable to board trams due to overcrowding was really common. If the economy returns to its previous state then I predict fewer people on public transport, more traffic jams, and many more cars idling and polluting the atmosphere.

Can we have mass public transport that doesn’t give a significant disease risk? Maybe if we had significantly more trains and trams and better ventilation with more airflow designed to suck contaminated air out. But that would require significant engineering work to design new trams, trains, and buses as well as expense in refitting or replacing old ones.

Uber and similar companies have been taking over from taxi companies, one major feature of those companies is that the vehicles are not dedicated as taxis. Dedicated taxis could easily be designed to reduce the spread of disease, the famed Black Cab AKA Hackney Carriage [3] design in the UK has a separate compartment for passengers with little air flow to/from the driver compartment. It would be easy to design such taxis to have entirely separate airflow and if setup to only take EFTPOS and credit card payment could avoid all contact between the driver and passengers. I would prefer to have a Hackney Carriage design of vehicle instead of a regular taxi or Uber.

Autonomous cars have been shown to basically work. There are some concerns about safety issues as there are currently corner cases that car computers don’t handle as well as people, but of course there are also things computers do better than people. Having an autonomous taxi would be a benefit for anyone who wants to avoid other people. Maybe approval could be rushed through for autonomous cars that are limited to 40Km/h (the maximum collision speed at which a pedestrian is unlikely to die), in central city areas and inner suburbs you aren’t likely to drive much faster than that anyway.

Car share services have been becoming popular, for many people they are significantly cheaper than owning a car due to the costs of regular maintenance, insurance, and depreciation. As the full costs of car ownership aren’t obvious people may focus on the disease risk and keep buying cars.

Passenger jet travel is ridiculously cheap. But this relies on the airline companies being able to consistently fill the planes. If they were to add measures to reduce cross contamination between passengers, which slightly reduces the capacity of planes, then they would need to increase ticket prices accordingly, which then reduces demand. If passengers are just scared of flying in close proximity and the airlines can’t fill planes then they will have to increase prices, which again reduces demand and could lead to a death spiral. If in the long term there aren’t enough passengers to sustain the current number of planes in service then airline companies will have significant financial problems; planes are expensive assets that are expected to last for a long time, and if airlines can’t use them all and can’t sell them then they will go bankrupt.

It’s not reasonable to expect that the same number of people will be travelling internationally for years (if ever). Due to relying on economies of scale to provide low prices I don’t think it’s possible to keep prices the same no matter what they do. A new economic balance of flights costing 2-3 times more than we are used to while having significantly less passengers seems likely. Governments need to spend significant amounts of money to improve trains to take over from flights that are cancelled or too expensive.

Entertainment

The article on The Bulwark mentions Las Vegas as a city that will be hurt a lot by reductions in travel and crowds, the same thing will happen to tourist regions all around the world. Australia has a significant tourist industry that will be hurt a lot. But the mention of Las Vegas makes me wonder what will happen to the gambling in general. Will people avoid casinos and play poker with friends and relatives at home? It seems that small stakes poker games among friends will be much less socially damaging than casinos, will this be good for society?

The article also mentions cinemas which have been on the way out since the video rental stores all closed down. There’s lots of prime real estate used for cinemas and little potential for them to make enough money to cover the rent. Should we just assume that most uses of cinemas will be replaced by Netflix and other streaming services? What about teenage dates, will kissing in the back rows of cinemas be replaced by “Netflix and chill”? What will happen to all the prime real estate used by cinemas?

Professional sporting matches have been played for a TV-only audience during the pandemic. There’s no reason that they couldn’t make a return to live stadium audiences when there is a vaccine for the disease or the disease has been extinguished by social distancing. But I wonder if some fans will start to appreciate the merits of small groups watching large TVs and not want to go back to stadiums, can this change the typical behaviour of groups?

Restaurants and cafes are going to do really badly. I previously wrote about my experience running an Internet Cafe and why reopening businesses soon is a bad idea [4]. The question is how long this will go for and whether social norms about personal space will change things. If in the long term people expect 25% more space in a cafe or restaurant that’s enough to make a significant impact on profitability for many small businesses.

When I was young the standard thing was for people to have dinner at friends homes. Meeting friends for dinner at a restaurant was uncommon. Recently it seemed to be the most common practice for people to meet friends at a restaurant. There are real benefits to meeting at a restaurant in terms of effort and location. Maybe meeting friends at their home for a delivered dinner will become a common compromise, avoiding the effort of cooking while avoiding the extra expense and disease risk of eating out. Food delivery services will do well in the long term, it’s one of the few industry segments which might do better after the pandemic than before.

Work

Many companies are discovering the benefits of teleworking, getting it going effectively has required investing in faster Internet connections and hardware for employees. When we have a vaccine the equipment needed for teleworking will still be there and we will have a discussion about whether it should be used on a more routine basis. When employees spend more than 2 hours per day travelling to and from work (which is very common for people who work in major cities) that will obviously limit the amount of time per day that they can spend working. For the more enthusiastic permanent employees there seems to be a benefit to the employer to allow working from home. It’s obvious that some portion of the companies that were forced to try teleworking will find it effective enough to continue in some degree.

One company that I work for has quit their coworking space in part because they were concerned that the coworking company might go bankrupt due to the pandemic. They seem to have become a 100% work from home company for the office part of the work (only on site installation and stock management is done at corporate locations). Companies running coworking spaces and other shared offices will suffer first as their clients have short term leases. But all companies renting out office space in major cities will suffer due to teleworking. I wonder how this will affect the companies providing services to the office workers, the cafes and restaurants etc. Will there end up being so much unused space in central city areas that it’s not worth converting the city cinemas into useful space?

There’s been a lot of news about Zoom and similar technologies. Lots of other companies are trying to get into that business. One thing that isn’t getting much notice is remote access technologies for desktop support. If the IT people can’t visit your desk because you are working from home then they need to be able to remotely access it to fix things. When people make working from home a large part of their work time the issue of who owns peripherals and how they are tracked will get interesting. In a previous blog post I suggested that keyboards and mice not be treated as assets [5]. But what about monitors, 4G/Wifi access points, etc?

Some people have suggested that there will be business sectors benefiting from the pandemic, such as telecoms and e-commerce. If you have a bunch of people forced to stay home who aren’t broke (IE a large portion of the middle class in Australia) they will probably order delivery of stuff for entertainment. But in the long term e-commerce seems unlikely to change much, people will spend less due to economic uncertainty so while they may shift some purchasing to e-commerce apart from home delivery of groceries e-commerce probably won’t go up overall. Generally telecoms won’t gain anything from teleworking, the Internet access you need for good Netflix viewing is generally greater than that needed for good video-conferencing.

Money

I previously wrote about a Basic Income for Australia [6]. One of the most cited reasons for a Basic Income is to deal with robots replacing people. Now we are at the start of what could be a long term economic contraction caused by the pandemic which could reduce the scale of the economy by a similar degree while also improving the economic case for a robotic workforce. We should implement a Universal Basic Income now.

I previously wrote about the make-work jobs and how we could optimise society to achieve the worthwhile things with less work [7]. My ideas about optimising public transport and using more car share services may not work so well after the pandemic, but the rest should work well.

Business

There are a number of big companies that are not aiming for profitability in the short term. WeWork and Uber are well documented examples. Some of those companies will hopefully go bankrupt and make room for more responsible companies.

The co-working thing was always a precarious business. The companies renting out office space usually did so on a monthly basis as flexibility was one of their selling points, but they presumably rented buildings on an annual basis. As the profit margins weren’t particularly high having to pay rent on mostly empty buildings for a few months will hurt them badly. The long term trend in co-working spaces might be some sort of collaborative arrangement between the people who run them and the landlords similar to the way some of the hotel chains have profit sharing agreements with land owners to avoid both the capital outlay for buying land and the risk involved in renting. Also city hotels are very well equipped to run office space, they have the staff and the procedures for running such a business, most hotels also make significant profits from conventions and conferences.

The way the economy has been working in first world countries has been about being as competitive as possible. Just in time delivery to avoid using storage space and machines to package things in exactly the way that customers need and no more machines than needed for regular capacity. This means that there’s no spare capacity when things go wrong. A few years ago a company making bolts for the car industry went bankrupt because the car companies forced the prices down, then car manufacture stopped due to lack of bolts – this could have been a wake up call but was ignored. Now we have had problems with toilet paper shortages due to it being packaged in wholesale quantities for offices and schools not retail quantities for home use. Food was destroyed because it was created for restaurant packaging and couldn’t be packaged for home use in a reasonable amount of time.

Farmer’s markets alleviate some of the problems with packaging food etc. But they aren’t a good option when there’s a pandemic as disease risk makes them less appealing to customers and therefore less profitable for vendors.

Religion

Many religious groups have supported social distancing. Could this be the start of more decentralised religion? Maybe have people read the holy book of their religion and pray at home instead of being programmed at church? We can always hope.

June 24, 2020

Automated MythTV-related maintenance tasks

Here is the daily/weekly cronjob I put together over the years to perform MythTV-related maintenance tasks on my backend server.

The first part performs a database backup:

5 1 * * *  mythtv  /usr/share/mythtv/mythconverg_backup.pl

which I previously configured by putting the following in /home/mythtv/.mythtv/backuprc:

DBBackupDirectory=/var/backups/mythtv

and creating a new directory for it:

mkdir /var/backups/mythtv
chown mythtv:mythtv /var/backups/mythtv

The second part of /etc/cron.d/mythtv-maintenance runs a contrib script to optimize the database tables:

10 1 * * *  mythtv  /usr/bin/chronic /usr/share/doc/mythtv-backend/contrib/maintenance/optimize_mythdb.pl

once a day. It requires the libmythtv-perl and libxml-simple-perl packages to be installed on Debian-based systems.

It is quickly followed by a check of the recordings and automatic repair of the seektable (when possible):

20 1 * * *  mythtv  /usr/bin/chronic /usr/bin/mythutil --checkrecordings --fixseektable

Next, I force a scan of the music and video databases to pick up anything new that may have been added externally via NFS mounts:

30 1 * * *  mythtv  /usr/bin/mythutil --quiet --scanvideos
31 1 * * *  mythtv  /usr/bin/mythutil --quiet --scanmusic

Finally, I defragment the XFS partition for two hours every day except Friday:

45 1 * * 1-4,6-7  root  /usr/sbin/xfs_fsr

and resync the RAID-1 arrays once a week to ensure that they stay consistent and error-free:

15 3 * * 2  root  /usr/local/sbin/raid_parity_check md0
15 3 * * 4  root  /usr/local/sbin/raid_parity_check md2

using a trivial script.
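
The script itself isn't included in the post; here is a minimal sketch of what such a script might look like, using the Linux md sysfs interface (the details are illustrative, not the actual script):

#!/bin/sh
# Hypothetical raid_parity_check sketch: start a consistency check on
# the named md array and wait for it to finish.
md="$1"                                    # e.g. md0
echo check > "/sys/block/$md/md/sync_action"
while [ "$(cat /sys/block/$md/md/sync_action)" != "idle" ]; do
    sleep 60
done
cat "/sys/block/$md/md/mismatch_cnt"       # mismatches found during the check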

In addition to that cronjob, I also have smartmontools run daily short and weekly long SMART tests via this blurb in /etc/smartd.conf:

/dev/sda -a -d ata -o on -S on -s (S/../.././04|L/../../6/05)
/dev/sdb -a -d ata -o on -S on -s (S/../.././04|L/../../6/05)

If there are any other automated maintenance tasks you do on your MythTV server, please leave a comment!

June 23, 2020

Squirrelmail vs Roundcube

For some years I’ve had SquirrelMail running on one of my servers for the people who like such things. It seems that upstream support for SquirrelMail has ended (according to the SquirrelMail Wikipedia page there will be no new releases, just Subversion updates to fix bugs). One problem with SquirrelMail that seems unlikely to get fixed is the lack of support for base64 encoded From and Subject fields, which are becoming increasingly popular as people whose names don’t fit US-ASCII encode them in their preferred manner.

I’ve recently installed Roundcube to provide an alternative. Of course one of the few important users of webmail didn’t like it (apparently it doesn’t display well on a recent Samsung Galaxy Note), so now I have to support two webmail systems.

Below is a little Perl script to convert a SquirrelMail abook file into the csv format used for importing a RoundCube contact list.

#!/usr/bin/perl
use strict;
use warnings;

# Convert a SquirrelMail abook file (pipe-separated fields:
# nickname|firstname|lastname|email|info) to Roundcube's CSV import format.
print "First Name,Last Name,Display Name,E-mail Address\n";
while(<STDIN>)
{
  chomp;
  my @fields = split(/\|/, $_);
  printf("%s,%s,%s %s,%s\n", $fields[1], $fields[2], $fields[0], $fields[4], $fields[3]);
}
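
Assuming the script is saved as sm2rc.pl (a name I've made up) and made executable, usage is a simple pipe; the abook path below is only an example, as SquirrelMail keeps per-user address books as username.abook under its data directory:

./sm2rc.pl < /var/lib/squirrelmail/data/user.abook > contacts.csv

The resulting contacts.csv can then be imported from Roundcube's address book interface.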

June 21, 2020

Open IP over VHF/UHF 1

I’ve recently done a lot of work on the Codec 2 FSK modem, here is the new README_fsk. It now works at lower SNRs, has been refactored, and is supported by a suite of automated tests.

There is some exciting work going on with Codec 2 modems and VHF/UHF IP links using TAP/TUN (thanks Tomas and Jeroen) – a Linux technology for building IP links from user space “data pumps” – like the Codec 2 modems.

My initial goal for this work is a “100 kbit/s IP link” for VHF/UHF using Codec 2 modems and SDR. One application is moving Ham Radio data past the 1995-era “9600 bits/s data port” paradigm to real time IP.

I’m also interested in IP over TV Whitespace (spare VHF/UHF spectrum) for emergency and developing world applications. I’m a judge for developing world IT grants and the “last 100km” problem comes up again and again. This solution requires just a Raspberry Pi and RTLSDR. At these frequencies antennas could be simply fabricated from wire (cut for the frequency of operation), and soldered directly to the Pi.

Results and Link Budget

As a first step, I’ve taken another look at using RpiTx for FSK, this time at VHF and starting at a modest 10 kbits/s. Over the weekend I performed some Minimum Detectable Signal (MDS) tests and confirmed that the 2FSK modem built with RpiTx, an RTL-SDR, and the Codec 2 modem is right on theory at 10 kbits/s, with an MDS of -120dBm.

Putting this in context, a UHF signal has a path loss of 125dB over 100km. So if you have a line of sight path, a 10mW (10dBm) signal will be 10-125 = -115dBm at your receiver (assuming basic 0dBi antennas). As -115dBm is greater than the -120dBm MDS, your data will be received error free (especially when we add forward error correction). We have sufficient “link margin” and the “link” is closed.

While our 10 kbits/s starting point doesn’t sound like much, even at that rate we get to send 10000*3600*24/8/140 = 771,000 140-byte text messages each day to another station on your horizon. That’s a lot of connectivity in an emergency, or when the alternative where you live is nothing.
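
To make those numbers easy to re-run, here's a small sketch that reproduces them. The -174dBm/Hz thermal noise floor is standard; the 6dB noise figure and 8dB Eb/No are working assumptions for this coded 2FSK mode, not measurements from this post:

awk 'BEGIN {
    nf = 6; ebno = 8; rb = 10000     # noise figure (dB), Eb/No (dB), bit rate
    mds = -174 + nf + 10*log(rb)/log(10) + ebno
    printf "MDS:          %.0f dBm\n", mds             # -120 dBm
    printf "Link margin:  %.0f dB\n", 10 - 125 - mds   # 5 dB over the 100km path
    printf "Messages/day: %d\n", rb*3600*24/8/140      # 771428
}'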

Method

I’m using the GitHub PR as a logbook for the work; I quite like GitHub and Markdown. This weekend’s MDS experiments start here.

I had the usual fun and games with attenuating the Rx signal from the Pi down to -120dBm. The transmit signal tries hard to leak around the attenuators via an RF path. I moved the unshielded Pi into another room, and built a “plastic bag and aluminium foil” Faraday cage which worked really well:


These are complex systems and many things can go wrong. Are your Tx/Rx sample clocks close enough? Is your rx signal clipping? Is the gain of your radio sufficient to reduce quantisation noise? Bug in your modem code? DC line in your RTLSDR signal? Loose SMA connector?

I’ve learnt the hard way to test very carefully at each step. First, I run off air samples through a non-real time modem Octave simulation to visualise what’s going on inside the modem. A software oscilloscope.

An Over the Cable (OTC) test is essential before trying Over the Air (OTA), as it gives you a controlled environment to spot issues. MDS tests that measure the Bit Error Rate (BER) are also excellent: they effectively absorb every factor in the system and give you an overall score you can compare to theory.

Spectral Purity

Here is the spectrum of the FSK signal for a …01010… sequence at 100 kbit/s, at two resolution bandwidths:

The Tx power is about 10dBm; this plot is after some attenuation. I haven’t carefully checked the spurious levels, but the above looks like around -40dBc (off a low 10mW EIRP) over this 1MHz span. If I am reading the Australian regulations correctly (Section 7A of the Amateur LCD) the requirement is 43+10log(P) = 43+10log10(0.01) = 23dBc, so we appear to pass.
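
As a quick sanity check on that arithmetic (awk's log() is natural log, hence the division by log(10)):

awk 'BEGIN { p = 0.01; printf "required suppression: %.0f dBc\n", 43 + 10*log(p)/log(10) }'
# prints: required suppression: 23 dBc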

Conclusion

This is “extreme open source”. The transmitter is software, the modem is software. All open source and free as in beer and speech. No chipsets or application specific radio hardware – just some CPU cycles and a down converter supplied by the Pi and RTLSDR. The only limits are those of physics – which we have reached with the MDS tests.

Reading Further

Open IP over VHF/UHF 2 – Next post in this series
Pi Radio IP – Current GitHub Repo for this work
FSK modem support for TAP/TUN – Early GitHub PR for this work
Testing a RTL-SDR with FSK on HF
High Speed Balloon Data Links
Codec 2 FSK modem README – includes lots of links and sample applications.

June 19, 2020

Storage Trends

In considering storage trends for the consumer side I’m looking at the current prices from MSY (where I usually buy computer parts). I know that other stores will have slightly different prices but they should be very similar as they all have low margins and wholesale prices are the main factor.

Small Hard Drives Aren’t Viable

The cheapest hard drive that MSY sells is $68 for 500G of storage. The cheapest SSD is $49 for 120G and the second cheapest is $59 for 240G. SSD is cheaper at the low end and significantly faster. If someone needed about 500G of storage there’s a 480G SSD for $97 which costs $29 more than a hard drive. With a modern PC if you have no hard drives you will notice that it’s quieter. For anyone who’s buying a new PC spending an extra $29 is definitely worthwhile for the performance, low power use, and silence.

The cheapest 1TB disk is $69 and the cheapest 1TB SSD is $159. Saving $90 on the cost of a new PC probably isn’t worthwhile.

For 2TB of storage the cheapest options are Samsung NVMe for $339, Crucial SSD for $335, or a hard drive for $95. Some people would choose to save $244 by getting a hard drive instead of NVMe, but if you are getting a whole system then allocating $244 to NVMe instead of a faster CPU would probably give more benefits overall.

Computer stores typically have small margins and computer parts tend to quickly either become cheaper or be obsoleted by better parts. So stores don’t want to stock parts unless they will sell quickly. Disks smaller than 2TB probably aren’t going to be profitable for stores for very long. The trend of SSD and NVMe becoming cheaper is going to make 2TB disks non-viable in the near future.

NVMe vs SSD

M.2 NVMe devices are at comparable prices to SATA SSDs. For some combinations of quality and capacity NVMe is about 50% more expensive and for some it’s slightly cheaper (EG Intel 1TB NVMe being cheaper than Samsung EVO 1TB SSD). Last time I checked about half the motherboards on sale had a single M.2 socket so for a new workstation that doesn’t need more than 2TB of storage (the largest NVMe that MSY sells) it wouldn’t make sense to use anything other than NVMe.

The benefit of NVMe is NOT throughput (even though NVMe devices can often sustain over 4GB/s), it’s low latency. Workstations can’t properly take advantage of this because RAM is so cheap ($198 for 32G of DDR4) that compiles etc mostly come from cache, and because most filesystem writes on workstations aren’t synchronous. For servers a large portion of writes are synchronous, for example a mail server can’t acknowledge receiving mail until it knows that it’s really on disk, so there are a lot of small writes that block server processes, and the low latency of NVMe really improves performance. If you are doing a big compile on a workstation (the most common workstation task that uses a lot of disk IO) then the writes aren’t synchronised to disk, and if the system crashes you will just do the compilation again. While NVMe doesn’t give a lot of benefit over SSD for workstation use (I’ve used laptops with SSD and NVMe and not noticed a great difference), of course I still want better performance. ;)

Last time I checked I couldn’t easily buy a PCIe card that supported 2*NVMe cards, I’m sure they are available somewhere but it would take longer to get and probably cost significantly more than twice as much. That means a RAID-1 of NVMe takes 2 PCIe slots if you don’t have an M.2 socket on the motherboard. This was OK when I installed 2*NVMe devices on a server that had 18 disks and lots of spare PCIe slots. But for some systems PCIe slots are an issue.

My home server has all PCIe slots used by a video card and Ethernet cards and the BIOS probably won’t support booting from NVMe. It’s a Dell server so I can’t just replace the motherboard with one that has more PCIe slots and M.2 on the motherboard. As it’s running nicely and doesn’t need replacing any time soon I won’t be using NVMe for home server stuff.

Small Servers

Most servers that I am responsible for have less than 2TB of storage. For my clients I now only recommend SSD storage for small servers and am recommending SSD for replacing any failed disks.

My home server has 2*500G SSDs in a BTRFS RAID-1 for the root filesystem, and 3*4TB disks in a BTRFS RAID-1 for storing big files. I bought the SSDs when 500G SSDs were about $250 each and bought 2*4TB disks when they were about $350 each. Currently that server has about 3.3TB of space used and I could probably get it down to about 2.5TB if I deleted things I don’t really need. If I was getting storage for that server now I’d use 2*2TB SSDs and 3*1TB hard drives for the stuff that doesn’t fit on SSDs (I have some spare 1TB disks that came with servers). If I didn’t have spare hard drives I’d get 3*2TB SSDs for that sort of server which would give 3TB of BTRFS RAID-1 storage.

Last time I checked Dell servers had a card for supporting M.2 as an optional extra so Dells probably won’t boot from NVMe without extra expense.

Ars Technica has an informative article about WD selling SMR disks as “NAS” disks [1]. The Shingled Magnetic Recording technology allows greater storage density on a platter which leads to either larger capacity or cheaper disks but at the cost of lower write performance and apparently extremely bad latency in some situations. NAS disks are supposed to be low latency as the expectation is that they will be used in a RAID array and kicked out of the array if they have problems. There are reports of ZFS kicking SMR disks from RAID sets. I think this will end the use of hard drives for small servers. For a server you don’t want to deal with this sort of thing, by definition when a server goes down multiple people will stop work (small server implies no clustering). Spending extra to get SSDs just to avoid the risk of unexpected SMR would be a good plan.

Medium Servers

The largest SSD and NVMe devices that are readily available are 2TB but 10TB disks are commodity items; there are reports of 20TB hard drives being available, but I can’t find anyone in Australia selling them.

If you need to store dozens or hundreds of terabytes then hard drives have to be part of the mix at this time. There’s no technical reason why SSDs larger than 10TB can’t be made (the 2.5″ SATA form factor has more than 5* the volume of a 2TB M.2 card) and it’s likely that someone sells them outside the channels I buy from, but probably at a price higher than what my clients are willing to pay. If you want 100TB of affordable storage then a mid range server like the Dell PowerEdge T640, which can have up to 18*3.5″ disks, is good. One of my clients has a PowerEdge T630 with 18*3.5″ disks in the 8TB-10TB range (we replace failed disks with the largest new commodity disks available, it used to have 6TB disks). ZFS version 0.8 introduced a “Special VDEV Class” which stores metadata and possibly small data blocks on faster media. So you could have some RAID-Z groups on hard drives for large storage and the metadata on a RAID-1 of NVMe for fast performance. For medium size arrays on hard drives having a “find /” operation take hours is not uncommon, and for large arrays having it take days isn’t that uncommon. So far it seems that ZFS is the only filesystem to have taken the obvious step of storing metadata on SSD/NVMe while bulk data is on cheap large disks.
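
As a concrete sketch of that layout (the pool and device names are made up for illustration):

# Bulk data on a RAID-Z2 of hard drives, metadata on mirrored NVMe:
zpool create tank \
    raidz2 sda sdb sdc sdd sde sdf \
    special mirror nvme0n1 nvme1n1
# Optionally store small file blocks on the special vdev as well:
zfs set special_small_blocks=32K tank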

One problem with large arrays is that the vibration of disks can affect the performance and reliability of nearby disks. The ZFS server I run with 18 disks was originally setup with disks from smaller servers that never had ZFS checksum errors, but when disks from 2 small servers were put in one medium size server they started getting checksum errors presumably due to vibration. This alone is a sufficient reason for paying a premium for SSD storage.

Currently the cost of 2TB of SSD or NVMe is between the prices of 6TB and 8TB hard drives, and the ratio of price to capacity for SSD and NVMe is improving dramatically while the increase in hard drive capacity is slow. 4TB SSDs are available for $895 compared to a 10TB hard drive for $549, so SSD is about 4* more expensive per TB. This is probably good for Windows systems, but for Linux systems where ZFS and “special VDEVs” are an option it’s probably not worth considering. Most Linux use cases where 4TB SSDs would work well would be better served by smaller NVMe and 10TB disks running ZFS. I don’t think that 4TB SSDs are at all popular at the moment (MSY doesn’t stock them), but prices will come down and they will become common soon enough. Probably by the end of the year SSDs will halve in price and no hard drives less than 4TB will be viable.

For rack mounted servers 2.5″ disks have been popular for a long time. It’s common for vendors to offer 2 versions of a rack mount server for 2.5″ and 3.5″ disks where the 2.5″ version takes twice as many disks. If the issue is total storage in a server 4TB SSDs can give the same capacity as 8TB HDDs.

SMR vs Regular Hard Drives

Rumour has it that you can buy 20TB SMR disks, I haven’t been able to find a reference to anyone who’s selling them in Australia (please comment if you know who sells them and especially if you know the price). I expect that the ZFS developers will soon develop a work-around to solve the problems with SMR disks. Then arrays of 20TB SMR disks with NVMe for “special VDEVs” will be an interesting possibility for storage. I expect that SMR disks will be the majority of the hard drive market by 2023 – if hard drives are still on the market. SSDs will be large enough and cheap enough that only SMR disks will offer enough capacity to be worth using.

I think that it is a possibility that hard drives won’t be manufactured in a few years. The volume of a 3.5″ disk is significantly greater than that of 10 M.2 devices so current technology obviously allows 20TB of NVMe or SSD storage in the space of a 3.5″ disk. If the price of 16TB NVMe and SSD devices comes down enough (to perhaps 3* the price of a 20TB hard drive) almost no-one would want the hard drive and it wouldn’t be viable to manufacture them.

It’s not impossible that in a few years time 3D XPoint and similar fast NVM technologies occupy the first level of storage (the ZFS “special VDEV”, OS swap device, log device for database servers, etc) and NVMe occupies the level for bulk storage with no space left in the market for spinning media.

Computer Cases

For servers I expect that models supporting 3.5″ storage devices will disappear. A 1RU server with 8*2.5″ storage devices or a 2RU server with 16*2.5″ storage devices will probably be of use to more people than a 1RU server with 4*3.5″ or a 2RU server with 8*3.5″.

My first IBM PC compatible system had a 5.25″ hard drive, a 5.25″ floppy drive, and a 3.5″ floppy drive in 1988. My current PC is almost the same size and has a DVD drive (that I almost never use), 5 other 5.25″ drive bays that have never been used, and 5*3.5″ drive bays that I have also never used (I have only used 2.5″ SSDs). It would make more sense to have PC cases designed around 2.5″ and maybe 3.5″ drives with no more than one 5.25″ drive bay.

The Intel NUC SFF PCs are going in the right direction. Many of them only have a single storage device but some of them have 2*M.2 sockets allowing RAID-1 of NVMe and some of them support ECC RAM so they could be used as small servers.

A USB DVD drive costs $36; it doesn’t make sense to have every PC designed around the size of an internal DVD drive that will probably only be used to install the OS when a $36 USB DVD drive can be used for every PC you own.

The only reason I don’t have a NUC for my personal workstation is that I get my workstations from e-waste. If I was going to pay for a PC then a NUC is the sort of thing I’d pay to have on my desk.

June 16, 2020

Linux Security Summit North America 2020: Online Schedule

Just a quick update on the Linux Security Summit North America (LSS-NA) for 2020.

The event will take place over two days as an online event, due to COVID-19.  The dates are now July 1-2, and the full schedule details may be found here.

The main talks are:

There are also short (30 minute) topics:

This year we will also have a Q&A panel at the end of each day, moderated by Elena Reshetova. The panel speakers are:

  • Nayna Jain
  • Andrew Lutomirski
  • Dmitry Vyukov
  • Emily Ratliff
  • Alexander Popov
  • Christian Brauner
  • Allison Marie Naaktgeboren
  • Kees Cook
  • Mimi Zohar

LSS-NA this year is included with OSS+ELC registration, which is USD $50 all up.  Register here.

Hope to see you soon!

June 14, 2020

Codec 2 HF Data Modes 1

Since “attending” MHDC last month I’ve taken an interest in open source HF data modems. So I’ve been busy refactoring the Codec 2 OFDM modem for use with HF data.

The major change is from streaming small (28 bit) voice frames to longer (few hundred byte) packets of data. In some ways data is easier than PTT voice: latency is no longer an issue, and I can use nice long FEC codewords that ride over fades. On the flip side we really care about bit errors with data, for voice it’s acceptable to pass frames with errors to the speech decoder, and let the human ear work it out.

As a first step I’ve been working with GNU Octave simulations, and have developed 3 candidate data modes that I have been testing against simulated HF channels. In simulation they work well with up to 4ms of delay and 2.5Hz of Doppler.

Here are the simulation results for 10% Packet Error Rate (PER). The multipath channel has 2ms delay spread and 2Hz Doppler (CCITT Multipath Poor channel).

Mode    Est Bytes/min  AWGN SNR (dB)  Multipath Poor SNR (dB)
datac1  6000            3             12
datac2  3000            1              7
datac3  1200           -3              0

The bytes/minute metric is commonly used by Winlink (divide by 7.5 for bits/s, since bits/s = bytes/min × 8 ÷ 60). I’ve assumed a 20% overhead for ARQ and other overheads. HF data isn’t fast – it’s a tough, narrow channel to push data through. But for certain applications (e.g. if you’re off the grid, or when the lights go out) it may be all you have. Even these low rates can be quite useful: 1200 bytes/minute is 8.5 tweets or SMS texts per minute.

The modem waveforms are pilot assisted coherent PSK using LDPC FEC codes. Coherent PSK can have gains of up to 6dB over differential PSK (DPSK) modems commonly used on HF.

Before I get too far along I wanted to try them over real HF channels, to make sure I was on the right track. So much can go wrong with DSP in the real world!

So today I sent the new data waveforms over the air for the first time, using the 40m band from my home in Adelaide, South Australia to a KiwiSDR about 800km away in Melbourne, Victoria.

Mode    Est Bytes/min  Power (Wrms)  Est SNR (dB)  Packets Tx/Rx
datac1  6000           10            10-15         15/15
datac2  3000           10             5-15          8/8
datac3  1200           0.5            -2           20/25

The Tx power is the RMS measured on my spec-an; for the 10W RMS samples it was 75W PEP. The SNR is measured in a 3000Hz noise bandwidth. I have a simple dipole at my end; not sure what the KiwiSDR was using.

I’m quite happy with these results. To give the c3 waveform a decent workout I dropped the power down to just 0.5W (listen), and I could still get 30% of the packets through at 100mW. A few of the tests had significant fading, however it was not very fast. My simulations are far tougher. Maybe I’ll try an NVIS path to give the modem a decent test on fast fading channels.

Here is the spectrogram (think waterfall on its side) for the -2dB datac3 sample:

Here are the uncoded (raw) errors, and the errors after FEC. Most of the frames made it. This mode employs a rate 1/3 LDPC code that was developed by Bill, VK5DSP. It can work at up to 16% raw BER! The errors at the end are due to the Tx signal ending; at this stage of development I just have a simple state machine with no “squelch”.

We have also been busy developing an API for the Codec 2 modems, see README_data.md. The idea is to allow developers of HF data protocols and applications to use the Codec 2 modems. As well as the “raw” HF data API, there is a very nice Ethernet style framer for VHF packet developed by Jeroen Vreeken.

If anyone would like to try running the modem Octave code take a look at the GitHub PR.

Reading Further

QAM and Packet Data for OFDM Pull Request for this work. Includes lots of notes. The waveform designs are described in this spreadsheet.
README for the Codec 2 OFDM modem, includes examples and more links.
Test Report of Various Winlink Modems
Modems for HF Digital Voice Part 1
Modems for HF Digital Voice Part 2

June 06, 2020

Comparing Compression

I just did a quick test of different compression options in Debian. The source file is a 1.1G MySQL dump file. The time is user CPU time on an i7-930 running under KVM; the compression programs may have different levels of optimisation for other CPU families.

Facebook people designed the zstd compression system (here’s a page giving an overview of it [1]). It has some interesting new features that can provide real benefits at scale (like unusually large windows and pre-defined dictionaries), but I just tested the default mode and the -9 option for more compression. For the SQL file “zstd -9” provides significantly better compression than gzip while taking slightly less CPU time than “gzip -9”, and zstd with the default option (equivalent to “zstd -3”) gives much faster compression than “gzip -9” while also producing slightly smaller output. For this use case bzip2 is too slow for inline compression of a MySQL dump, as the dump process locks tables and can hang clients. The lzma and xz compression algorithms provide significant benefits in size but the time taken is grossly disproportionate.

In a quick check of my collection of files compressed with gzip I was only able to find 1 file that got less compression with zstd with default options, and that file got better compression with “zstd -9”. So zstd seems to beat gzip everywhere by every measure.

The bzip2 compression seems to be obsolete; “zstd -9” is much faster and produces slightly smaller output.

Both xz and lzma seem to offer a combination of compression and time taken that zstd can’t beat (for this file type at least). The ultra compression mode 22 gives 2% smaller output files but almost 28 minutes of CPU time for compression is a bit ridiculous. There is a threaded mode for zstd that could potentially allow a shorter wall clock time for “zstd --ultra -22” than lzma/xz while also giving better compression.

Compression       Time   Size
zstd              5.2s   130m
zstd -9           28.4s  114m
gzip -9           33.4s  141m
bzip2 -9          3m51   119m
lzma              6m20    97m
xz                6m36    97m
zstd -19          9m57    99m
zstd --ultra -22  27m46   95m

Conclusion

For distributions like Debian, which have large archives of files that are compressed once and transferred a lot, the “zstd --ultra -22” compression might be useful with multi-threaded compression. But given that Debian already has xz in use it might not be worth changing until faster CPUs with lots of cores become more commonly available. One could argue that for Debian it doesn’t make sense to change from xz as hard drives seem to be gaining capacity (and shrinking in physical size) faster than the Debian archive is growing. One possible reason for adopting zstd in a distribution like Debian is that there are more tuning options for things like memory use, so packages for an architecture like ARM, which tends to have less RAM, could be compressed in a way that decreases memory use on decompression.

For general compression such as compressing log files and making backups it seems that zstd is the clear winner. Even bzip2 is far too slow, and in my tests zstd clearly beats gzip for every combination of compression and time taken. There may be some corner cases where gzip can compete on compression time due to CPU features, optimisation for CPUs, etc, but I expect that in almost all cases zstd will win for compression size and time. As an aside, I once noticed the 32bit version of gzip compressing faster than the 64bit version on an Opteron system; the 32bit version had assembly optimisation and the 64bit version didn’t at that time.

To create a tar archive you can run “tar czf” or “tar cJf” to create an archive with gzip or xz compression. To create an archive with zstd compression you have to use “tar --zstd -cf”, that’s 7 extra characters to type. It’s likely that for most casual archive creation (EG for copying files around on a LAN or USB stick) saving 7 characters of typing is more of a benefit than saving a small amount of CPU time and storage space. It would be really good if tar got a single character option for zstd compression.

The external dictionary support in zstd would work really well with rsync for backups. Currently rsync only supports zlib, adding zstd support would be a good project for someone (unfortunately I don’t have enough spare time).

Now I will change my database backup scripts to use zstd.
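
Something along these lines should do it (a sketch with hypothetical paths; -T0 enables zstd's threaded mode mentioned above):

# Stream the dump straight into zstd; fast compression keeps the dump
# (and any table locks) short, and -T0 uses all CPU cores.
mysqldump --single-transaction --all-databases \
    | zstd -9 -T0 > "/var/backups/db-$(date +%F).sql.zst"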

Update:

The command “tar acvf a.zst filenames” will create a zstd compressed tar archive, the “a” option to GNU tar makes it autodetect the compression type from the file name. Thanks Enrico!

May 31, 2020

Effective Altruism

Long term readers of the blog may recall my daughter Amy. Well, she has moved on from teenage partying and is now e-volunteering at Effective Altruism Australia. She recently pointed me at the free e-book The Life You Can Save by Peter Singer.

I was already familiar with the work of Peter Singer, having read “The Most Good You Can Do”. Peter puts numbers on altruistic behaviour to evaluate it. This appeals to me – as an engineer I use numbers to evaluate artefacts I build, like modems, or other processes going on in the world, like COVID-19.

Using technology to help people is a powerful motivator for Geeks. I’ve been involved in a few of these initiatives myself (OLPC and The Village Telco). It’s really tough to create something that helps people long term. A wider set of skills and capabilities are required than just “the technology”.

On my brief forays into the developing world I’ve seen ecologies of people (from the first and developing worlds) living off development dollars. In some cases there is no incentive to report the true outcomes, for example how many government bureaucrats want to report failure? How many consultants want the gig to end?

So I really get the need for scientific evaluation of any development endeavours. Go Peter and the Effective Altruism movement!

I spend around 1000 hours a year writing open source code, a strong argument that I am “doing enough” in the community space. However I have no idea how effective that code is. Is it helping anyone? My inclination to help is also mixed with “itch scratching” – geeky stuff I want to work on because I find it interesting.

So after reading the book and having a think – I’m sold. I have committed 5% of my income to Effective Altruism Australia, selecting Give Directly as a target for my funds as it appealed to me personally.

I asked Amy to proofread this post – and she suggested that instead of $, you can donate time – that’s what she does. She also said:

Effective Altruism opens your eyes to alternative ways to interact with charities. It combines the broad field of social science to explore how many aspects intersect, by applying the scientific method to economics, psychology, international development, and anthropology.

Reading Further

Busting Teenage Partying with a Fluksometer
Effective Altruism Australia

May 30, 2020

AudioBooks – May 2020

Fewer books this month. At home on lockdown and the weather a bit worse, so less time to go on walks and listen.

Save the Cat! Writes a Novel: The Last Book On Novel Writing You’ll Ever Need by Jessica Brody

A fairly straight adaptation of the screenplay-writing manual. Lots of examples from well-known books including full breakdowns of beats. 3/5

Happy Singlehood: The Rising Acceptance and Celebration of Solo Living by Elyakim Kislev

Based on 142 interviews. A lot of summaries of findings with quotes from interviewees and people’s blogs. The last chapter has some policy push but is a little light. 3/5

Scandinavia: A History by Ewan Butler

Just a 6 hour long quick spin through history. The first half suffers a bit from lists of Kings, although there is a bit more colour later on. Okay prep for something meatier. 3/5

One Giant Leap: The Impossible Mission That Flew Us to the Moon by Charles Fishman

A bit of a mix. It covers the legacy of Apollo, but the best bits are the chapters on the computers, politics, and other behind the scenes things. A complement to astronaut and mission oriented books. 4/5

My Scoring System

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average, in the middle 70% of books I read
  • 2/5 = Disappointing
  • 1/5 = Did not like at all

May 29, 2020

Using Live Linux to Save and Recover Your Data

There are two types of people in the world: those who have lost data and those who are about to. Given that entropy will bite eventually, the objective should be to minimise data loss. Some key rules for this: backup, backup often, and backup with redundancy. Whilst an article on that subject will be produced, at this stage discussion is directed to the very specific task of using Linux to recover data from old machines which may not be accessible anymore. The number of times I've done this in past years is somewhat more than the number of fingers I have; however, like all good things it deserves to be documented in the hope that other people might find it useful.

To do this one will need a Linux live distribution of some sort, as an ISO or written to a bootable USB drive. A typical choice would be Ubuntu Live or Fedora Live. If one is dealing with damaged hardware, the old Slackware-derived minimalist distribution Recovery is Possible (RIP) is certainly worth using; it's certainly saved me in the past. If you need help in creating a bootable USB, the good people at HowToGeek provide some simple instructions.

With a Linux bootable disk of some description inserted in one's system, the recovery process can begin. Firstly, boot the machine and change the boot order (in BIOS/UEFI) so that the drive in question becomes the first in the boot order. Once the live distribution boots up, usually into a GUI environment, one needs to open the terminal application (e.g., GNOME in Fedora uses Applications, System Tools, Terminal) and change to the root user with the su command for Fedora or sudo -i for Ubuntu (there's no password on a live CD to be root!).

At this point one needs to create a mount point directory where the data is going to be mounted: mkdir /mnt/recovery. After this one needs to identify the disk which one is trying to access. The fdisk -l command will provide a list of all disks in the partition table. Some educated guesswork from the results is required here, based on the filesystem Type of each device; it almost certainly isn't an EFI System or Linux swap partition, for example. Typically one is trying to access something like /dev/sdaX.

Then one must mount the device to the directory that was just created, for example: mount /dev/sda2 /mnt/recovery. Sometimes a recalcitrant device will need to have the filesystem explicitly stated; the most common being ext3, ext4, fat, xfs, vfat, and ntfs-3g. To give a recent example I needed to run mount -t ext3 /dev/sda3 /mnt/recovery. From there one can copy the data from the mount point to a new source; a USB drive is probably the quickest, although one may take the opportunity to copy it to an external system (e.g., google drive) - and that's it! You've recovered your data!
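
Putting the whole session together (device names and filesystem type are examples; check your own fdisk -l output):

mkdir /mnt/recovery
fdisk -l                                # identify the data partition
mount -t ext4 /dev/sda2 /mnt/recovery   # state the fs type if needed
cp -a /mnt/recovery/home /media/usb     # copy the data to a USB drive
umount /mnt/recovery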

May 28, 2020

Fixing locale problem in MythTV 30

After upgrading to MythTV 30, I noticed that the interface of mythfrontend switched from the French language to English, despite having the following in my ~/.xsession for the mythtv user:

export LANG=fr_CA.UTF-8
exec ~/bin/start_mythtv

I noticed a few related error messages in /var/log/syslog:

mythbackend[6606]: I CoreContext mythcorecontext.cpp:272 (Init) Assumed character encoding: fr_CA.UTF-8
mythbackend[6606]: N CoreContext mythcorecontext.cpp:1780 (InitLocale) Setting QT default locale to FR_US
mythbackend[6606]: I CoreContext mythcorecontext.cpp:1813 (SaveLocaleDefaults) Current locale FR_US
mythbackend[6606]: E CoreContext mythlocale.cpp:110 (LoadDefaultsFromXML) No locale defaults file for FR_US, skipping
mythpreviewgen[9371]: N CoreContext mythcorecontext.cpp:1780 (InitLocale) Setting QT default locale to FR_US
mythpreviewgen[9371]: I CoreContext mythcorecontext.cpp:1813 (SaveLocaleDefaults) Current locale FR_US
mythpreviewgen[9371]: E CoreContext mythlocale.cpp:110 (LoadDefaultsFromXML) No locale defaults file for FR_US, skipping

Searching for that non-existent fr_US locale, I found that others have this in their logs and that it's apparently set by QT as a combination of the language and country codes.

I therefore looked in the database and found the following:

MariaDB [mythconverg]> SELECT value, data FROM settings WHERE value = 'Language';
+----------+------+
| value    | data |
+----------+------+
| Language | FR   |
+----------+------+
1 row in set (0.000 sec)

MariaDB [mythconverg]> SELECT value, data FROM settings WHERE value = 'Country';
+---------+------+
| value   | data |
+---------+------+
| Country | US   |
+---------+------+
1 row in set (0.000 sec)

which explains the nonsensical FR_US locale.

I fixed the country setting like this:

MariaDB [mythconverg]> UPDATE settings SET data = 'CA' WHERE value = 'Country';
Query OK, 1 row affected (0.093 sec)
Rows matched: 1  Changed: 1  Warnings: 0

After logging out and logging back in, the user interface of the frontend is now using the fr_CA locale again and the database setting looks good:

MariaDB [mythconverg]> SELECT value, data FROM settings WHERE value = 'Country';
+---------+------+
| value   | data |
+---------+------+
| Country | CA   |
+---------+------+
1 row in set (0.000 sec)

Introducing Shaken Fist

The first public commit to what would become OpenStack Nova was made ten years ago today — at Thu May 27 23:05:26 2010 PDT to be exact. So first off, happy tenth birthday to Nova!

A lot has happened in that time — OpenStack has gone from being two separate Open Source projects to a whole ecosystem, developers have come and gone (and passed away), and OpenStack has weathered the cloud wars of the last decade. OpenStack survived its early growth phase by deliberately offering a “big tent” to the community and associated vendors, with an expansive definition of what should be included. This has resulted in most developers being associated with a corporate sponsor, and hence the decrease in the number of developers today as corporate interest wanes — OpenStack has never been great at attracting or retaining hobbyist contributors.

My personal involvement with OpenStack started in November 2011, so while I missed the very early days I was around for a lot and made many of the mistakes that I now see in OpenStack.

What do I see as mistakes in OpenStack in hindsight? Well, embracing vendors who later lose interest has been painful, and has increased the complexity of the code base significantly. Nova itself is now nearly 400,000 lines of code, and that’s after splitting off many of the original features of Nova such as block storage and networking. Additionally, a lot of our initial assumptions are no longer true — for example in many cases we had to write code to implement things, where there are now good libraries available from third parties.

That’s not to say that OpenStack is without value — I am a daily user of OpenStack to this day, and use at least three OpenStack public clouds at the moment. That said, OpenStack is a complicated beast with a lot of legacy that makes it hard to maintain and slow to change.

For at least six months I’ve felt the desire for a simpler cloud orchestration layer — both for my own personal uses, and also as a test bed for ideas for what a smaller, simpler cloud might look like. My personal use case involves a relatively small environment which echoes what we now think of as edge compute — less than 10 RU of machines with a minimum of orchestration and management overhead.

At the time that I was thinking about these things, the Australian bushfires and COVID-19 came along, and presented me with a lot more spare time than I had expected to have. While I’m still blessed to be employed, all of my social activities have been cancelled, so I find myself at home at a loose end on weekends and evenings a lot more than before.

Thus Shaken Fist was born — named for a Simpsons meme, Shaken Fist is a deliberately small and highly opinionated cloud implementation aimed at working well in small deployments such as homes, labs, edge compute locations, deployed systems, and so forth.

I’ve taken a bit of trouble with each feature in Shaken Fist to think through what the simplest and highest value way of doing it is. For example, instances always get a config drive and there is no metadata server. There is also only one supported type of virtual networking, and one supported hypervisor. That said, this means Shaken Fist is less than 5,000 lines of code, and small enough that new things can be implemented very quickly by a single middle aged developer.

Shaken Fist definitely has feature gaps — API authentication and scheduling are the most obvious at the moment — but I have plans to fill those when the time comes.

I’m not sure if Shaken Fist is useful to others, but you never know. It’s Apache2 licensed, and available on GitHub if you’re interested.

May 27, 2020

57 Varieties of Pyrite: Exchanges Are Now The Enemy of Bitcoin

TL;DR: exchanges are casinos and don’t want to onboard anyone into bitcoin. Avoid.

There’s a classic scam in the “crypto” space: advertise Bitcoin to get people in, then sell suckers something else entirely. Over the last few years, this bait-and-switch has become the core competency of “bitcoin” exchanges.

I recently visited the homepage of Australian exchange btcmarkets.net: what a mess. There was a list of dozens of identical-looking “cryptos”, with bitcoin second after something called “XRP”; seems like it was sorted by volume?

Incentives have driven exchanges to become casinos, and they’re doing exactly what you’d expect unregulated casinos to do. This is no place you ever want to send anyone.

Incentives For Exchanges

Exchanges make money on trading, not on buying and holding. Despite the fact that bitcoin is the only real attempt to create an open source money, scams with no future are given false equivalence, because more assets means more trading. Worse than that, they are paid directly to list new scams (the crappier, the more money they can charge!) and have recently taken the logical step of introducing and promoting their own crapcoins directly.

It’s like a gold dealer who also sells 57 varieties of pyrite, which give more margin than selling actual gold.

For a long time, I thought exchanges were merely incompetent. Most can’t even give out fresh addresses for deposits, batch their outgoing transactions, pay competent fee rates, perform RBF or use segwit.

But I misunderstood: they don’t want to sell bitcoin. They use bitcoin to get you in the door, but they want you to gamble. This matters: you’ll find subtle and not-so-subtle blockers to simply buying bitcoin on an exchange. If you send a friend off to buy their first bitcoin, they’re likely to come back with something else. That’s no accident.

Looking Deeper, It Gets Worse.

Regrettably, looking harder at specific exchanges makes the picture even bleaker.

Consider Binance: this mainland China backed exchange pretending to be a Hong Kong exchange appeared out of nowhere with fake volume and demonstrated the gullibility of the entire industry by being treated as if it were a respected member. They lost at least 40,000 bitcoin in a known hack, and they also lost all the personal information people sent them to KYC. They aggressively market their own coin. But basically, they’re just MtGox without Mark Karpelès’ PHP skills or moral scruples, and with much better marketing.

Coinbase is more interesting: an MBA-run “bitcoin” company which really dislikes bitcoin. They got where they are by spending big on regulations compliance in the US so they could operate in (almost?) every US state. (They don’t do much to dispel the wide belief that this regulation protects their users, when in practice it seems only USD deposits have any guarantee). Their natural interest is in increasing regulation to maintain that moat, and their biggest problem is Bitcoin.

They have much more affinity for the centralized coins (Ethereum) where they can have influence and control. The anarchic nature of a genuine open source community (not to mention the developers’ oft-stated aim to improve privacy over time) is not culturally compatible with a top-down company run by the Big Dog. It’s a running joke that their CEO can’t say the word “Bitcoin”, but their recent “what will happen to cryptocurrencies in the 2020s” article is breathtaking in its boldness: innovation is mainly happening on altcoins, and they’re going to overtake bitcoin any day now. Those scaling problems which the Bitcoin developers say they don’t know how to solve? This non-technical CEO knows better.

So, don’t send anyone to an exchange, especially not a “market leading” one. Find some service that actually wants to sell them bitcoin, like CashApp or Swan Bitcoin.

May 26, 2020

Cruises and Covid19

Problems With Cruises

GQ has an insightful and detailed article about Covid19 and the Diamond Princess [1], I recommend reading it.

FastCompany has a brief article about bookings for cruises in August [2]. There have been many negative comments about this online.

The first thing to note is that the cancellation policies on those cruises are more lenient than usual and the prices are lower. So it’s not unreasonable for someone to put down a deposit on a half price holiday in the hope that Covid19 goes away (as so many prominent people have been saying it will) in the knowledge that they will get it refunded if things don’t work out. Of course if the cruise line goes bankrupt then no-one will get a refund, but I think people are expecting that won’t happen.

The GQ article highlights some serious problems with the way cruise ships operate. They have staff crammed into small cabins and the working areas allow transmission of disease. These problems can be alleviated: they could allocate more space to staff quarters and have more capable air conditioning systems to bring in more fresh air. During the life of a cruise ship significant changes are often made: replacing engines with newer more efficient models, changing the size of various rooms for entertainment, installing new waterslides, and many other changes are routinely made. Changing the staff-only areas to have better ventilation and more separate space (maybe capsule-hotel style cabins with fresh air piped in) would not be a difficult change. It would take some money and some dry-dock time, which would be a significant expense for cruise companies.

Cruises Are Great

People like social environments; they want situations where there are as many people as possible without it becoming impossible to move. Cruise ships are carefully designed for the flow of passengers. Both the layout of the ship and the schedule of events are carefully planned to avoid excessive crowds. In terms of having as many people as possible in a small area without being unable to move, cruise ships are probably ideal.

Because there is a large number of people in a restricted space there are economies of scale on a cruise ship that aren’t available anywhere else. For example the main items on the menu are made in a production line process, this can only be done when you have hundreds of people sitting down to order at the same time.

The same applies to all forms of entertainment on board, they plan the events based on statistical knowledge of what people want to attend. This makes it more economical to run than land based entertainment where people can decide to go elsewhere. On a ship a certain portion of the passengers will see whatever show is presented each night, regardless of whether it’s singing, dancing, or magic.

One major advantage of cruises is that they are all inclusive. If you are on a regular holiday would you pay to see a singing or dancing show? Probably not, but if it’s included then you might as well do it – and it will be pretty good. This benefit is really appreciated by people taking kids on holidays, if kids do things like refuse to attend a performance that you were going to see or reject food once it’s served then it won’t cost any extra.

People Who Criticise Cruises

For the people who sneer at cruises, do you like going to bars? Do you like going to restaurants? Live music shows? Visiting foreign beaches? A cruise gets you all that and more for a discount price.

If Groupon had a deal that gave you a cheap hotel stay with all meals included, free non-alcoholic drinks at bars, day long entertainment for kids at the kids clubs, and two live performances every evening how many of the people who reject cruises would buy it? A typical cruise is just like a Groupon deal for non-stop entertainment from 8AM to 11PM.

Will Cruises Restart?

The entertainment options that cruises offer are greatly desired by many people. Most cruises are aimed at budget travellers, the price is cheaper than a hotel in a major city. Such cruises greatly depend on economies of scale, if they can’t get the ships filled then they would need to raise prices (thus decreasing demand) to try to make a profit. I think that some older cruise ships will be scrapped in the near future and some of the newer ships will be sold to cruise lines that cater to cheap travel (IE P&O may scrap some ships and some of the older Princess ships may be transferred to them). Overall I predict a decrease in the number of middle-class cruise ships.

For the expensive cruises (where the cheapest cabins cost over $1000US per person per night) I don’t expect any real changes, maybe they will have fewer passengers and higher prices to allow more social distancing or something.

I am certain that cruises will start again, but it’s too early to predict when. Going on a cruise is about as safe as going to a concert or a major sporting event. No-one is predicting that sporting stadiums will be closed forever or live concerts will be cancelled forever, so really no-one should expect that cruises will be cancelled forever. Whether companies that own ships or stadiums go bankrupt in the mean time is yet to be determined.

One thing that’s been happening for years is themed cruises. A group can book out an entire ship or part of a ship for a themed cruise. I expect this to become much more popular when cruises start again, as it will make it easier to fill ships. In the past it seems that cruise lines let companies book their ships for events but didn’t take much of an active role in the process. I think that the management of cruise lines will look to aggressively market themed cruises to anyone who might help; for starters they could reach out to every 80s and 90s pop group – those fans are all old enough to be interested in themed cruises, and the musicians won’t be asking for too much money.

Conclusion

Humans are social creatures. People want to attend events with many other people. Covid 19 won’t be the last pandemic, and it may not even be eradicated in the near future. The possibility of having a society where no-one leaves home unless they are in a hazmat suit has been explored in science fiction, but I don’t think that’s a plausible scenario for the near future and I don’t think that it’s something that will be caused by Covid 19.

May 25, 2020

op-build v2.5 firmware for the Raptor Blackbird

Well, following on from my post where I excitedly pointed out that Raptor Blackbird support is all upstream in op-build v2.5, it’s time for another in my series of (close to) upstream Blackbird firmware builds.

This time, the only difference from straight upstream op-build v2.5 is my fixes for buildroot so that I can actually build it on Fedora 32.

So, head over to https://www.flamingspork.com/blackbird/op-build-v2.5-blackbird-images/ and grab blackbird.pnor to flash it on your blackbird, let me know how it goes!

GNS3 FRR Appliance

In my spare time, what little I have, I’ve been wanting to play with some OSS networking projects. For those playing along at home, during last Suse hackweek I played with wireguard, and to test the environment I wanted to set up some routing.
For which I used FRR.

FRR is a pretty cool project; it brings the network routing stack to Linux, or rather gives us a full opensource routing stack (most routers are actually Linux anyway).

Many years ago I happened to work at Fujitsu in a gateway environment, and started playing around with networking. That was my first experience with GNS3, an opensource network simulator. Back then I needed a copy of Cisco IOS images to really play with routing protocols, which made things harder: a great open source product, but it needed access to proprietary router OSes.

FRR provides a CLI _very_ similar to Cisco’s, which made me think: I wonder if there is an FRR appliance we can use in GNS3?
And there was!!!

When I downloaded it and decompressed the qcow2 image it was 1.5GB!!! For a single router image. It works great, but what if I wanted a bunch of routers to play with things like OSPF or BGP etc.? Surely we can make a smaller one.

Kiwi

At Suse we use kiwi-ng to build machine images and release media, and to make things even easier for me we already have a kiwi config for small OpenSuse Leap JEOS images (JEOS is “just enough OS”). So I hacked one to include FRR. All the extra tweaks needed to the image are also easily done by bash hook scripts.
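
For illustration, the FRR-specific tweaks could live in a config.sh hook along these lines (a sketch only; the real configuration is in the git repo below):

# config.sh fragment: enable FRR and turn on some routing daemons
# (FRR ships with its daemons disabled in /etc/frr/daemons):
systemctl enable frr.service
sed -i 's/^ospfd=no/ospfd=yes/' /etc/frr/daemons
sed -i 's/^bgpd=no/bgpd=yes/' /etc/frr/daemons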

I wont go in to too much detail how because I created a git repo where I have it all including a detailed README: https://github.com/matthewoliver/frr_gns3

So feel free to check that out, and build and use the image.

But today, I went one step further. OpenSuse’s Open Build System (OBS), which is used to build all RPMs for OpenSuse but can also build debs and whatever else you need, also supports building docker containers and system images using kiwi!

So I have now got OBS to build the image for me. The image can be downloaded from: https://download.opensuse.org/repositories/home:/mattoliverau/images/

And if you want to send any OBS requests to change it, the project/package is: https://build.opensuse.org/package/show/home:mattoliverau/FRR-OpenSuse-Appliance

To import it into GNS3 you need the gns3a file, which you can find in my git repo or in the OBS project page.

The best part is this image is only 300MB, which is much better than 1.5GB!
I did have it a little smaller, 200-250MB, but unfortunately the JEOS cut down kernel doesn’t contain the MPLS modules, so I had to pull in the full default SUSE kernel. If this became a real thing and not a pet project, I could go and build an FRR cutdown kernel to get the size down, but 300MB is already a lot better than where it was at.

Hostname Hack

When you place a router in GNS3 you want to be able to name it, and when you access the console it’s _really_ nice to see the router name you specified in GNS3 as the hostname. Why? Because if you have a bunch of routers, you don’t want a bunch of tabs all showing the localhost hostname on the command line… that doesn’t really help.

The FRR image is using qemu, and there wasn’t a nice way to access the name of the VM from inside it, nor an easy way to insert the name from outside. But I found one approach that seems to work: enter my dodgy hostname hack!

I also wanted to do it without hacking the gns3server code. I couldn’t easily pass the hostname in directly, but I could pass it in via a null device with the router name as its id:

/dev/virtio-ports/frr.router.hostname.%vm-name%

So I simply wrote a script that sets the hostname based on the existence of this device, and made the script a systemd oneshot service that starts at boot. It worked!
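
The actual script isn't reproduced here, but a minimal sketch of the idea looks like this (matching the null device pattern above):

#!/bin/sh
# Set the hostname from the GNS3-created virtio port name, if present.
for dev in /dev/virtio-ports/frr.router.hostname.*; do
    [ -e "$dev" ] || exit 0             # no port, nothing to do
    name="${dev##*hostname.}"           # everything after "hostname."
    hostname "$name"
    echo "$name" > /etc/hostname
done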

This means that after changing the name of the FRR router in the GNS3 interface, all you need to do is restart the router (stop and start the device) and it’ll apply the new name. This saves you having to log in as root and run hostname yourself.

Or better, if you name all your FRR routers before turning them on, then it’ll just work.

In conclusion…

Hopefully now we can have a fully opensource GNS3 + FRR appliance solution for network training, testing, and inspiring network engineers.

May 24, 2020

Printing hard-to-print PDFs on Linux

I recently found a few PDFs which I was unable to print due to those files causing insufficient printer memory errors:

I found a detailed explanation of what might be causing this which pointed the finger at transparent images, a PDF 1.4 feature which apparently requires a more recent version of PostScript than what my printer supports.

Using Okular's Force rasterization option (accessible via the print dialog) does work, by essentially rendering everything ahead of time and outputting a big image to be sent to the printer. The quality is not very good however.

Converting a PDF to DjVu

The best solution I found makes use of a different file format: .djvu

Such files are not PDFs, but can still be opened in Evince and Okular, as well as in the dedicated DjVuLibre application.

As an example, I was unable to print page 11 of this paper. Using pdfinfo, I found that it is in PDF 1.5 format and so the transparency effects could be the cause of the out-of-memory printer error.

Here's how I converted it to a high-quality DjVu file I could print without problems using Evince:

pdf2djvu -d 1200 2002.04049.pdf > 2002.04049-1200dpi.djvu

Converting a PDF to PDF 1.3

I also tried the DjVu trick on a different unprintable PDF, but it failed to print, even after lowering the resolution to 600dpi:

pdf2djvu -d 600 dow-faq_v1.1.pdf > dow-faq_v1.1-600dpi.djvu

In this case, I used a different technique and simply converted the PDF to version 1.3 (from version 1.6 according to pdfinfo):

ps2pdf13 -r1200x1200 dow-faq_v1.1.pdf dow-faq_v1.1-1200dpi.pdf

This eliminates the problematic transparency and rasterizes the elements that version 1.3 doesn't support.

May 23, 2020

A totally cheating sour dough starter


This is the third in a series of posts documenting my adventures in making bread during the COVID-19 shutdown. I’d like to imagine I was running bread-making science experiments on my kids, but really all I was trying to do was eat some toast.

I’m not sure what it was like in other parts of the world, but during the COVID-19 pandemic Australia suffered a bunch of shortages — toilet paper, flour, and yeast were among those things stores simply didn’t have any stock of. Luckily we’d only just done a Costco shop so were ok for toilet paper and flour, but we were definitely getting low on yeast. The obvious answer is a sour dough starter, but I’d never done that thing before.

In the end my answer was to cheat and use this recipe. However, I found the instructions unclear, so here’s what I ended up doing:

Starting off

  • 2 cups of warm water
  • 2 teaspoons of dry yeast
  • 2 cups of bakers flour

Mix these three items together in a plastic container with enough space for the mix to double in size. Place in a warm place (on the bench on top of the dish washer was our answer), and cover with cloth secured with a rubber band.

Feeding

Once a day you should feed your starter with 1 cup of flour and 1 cup of warm water. Stir thoroughly.

Reducing size

The recipe online says to feed for five days, but the size of my starter was getting out of hand by a couple of days, so I started baking at that point. I’ll describe the baking process in a later post. The early loaves definitely weren’t as good as the more recent ones, but they were still edible.

Hibernation

Once the starter is going, you feed it daily and probably need to bake daily to keep the starter’s size under control. That obviously doesn’t work so great if you can’t eat an entire loaf of bread a day. You can hibernate the starter by putting it in the fridge, which means you only need to feed it once a week.

To wake a hibernated starter up, take it out of the fridge and feed it. I do this at 8am. That means I can then start the loaf for baking at about noon, and the starter can either go back in the fridge until next time or stay on the bench being fed daily.

I have noticed that sometimes the starter comes out of the fridge with a layer of dark water on top. It’s worked out ok for us to just ignore that and stir it into the mix as part of the feeding process. Hopefully we won’t die.


Refurbishing my Macintosh Plus

Somewhere in the mid to late 1990s I picked myself up a Macintosh Plus for the sum of $60AUD. At that time there were still computer Swap Meets where old and interesting equipment was around, so I headed over to one at some point (at the St Kilda Town Hall if memory serves) and picked myself up four 1MB SIMMs to boost the RAM of it from the standard 1MB to the insane amount of 4MB. Why? Umm… because I could? The RAM was pretty cheap, and somewhere in the house to this day, I sometimes stumble over the 256KB SIMMs as I just can’t bring myself to get rid of them.

This upgrade probably would have cost close to $2,000 at the system’s release. If the Macintosh system software were better at disk caching you could have easily held the whole 800k of the floppy disk in memory and still run useful software!

One of the annoying things that started with the Macintosh was odd screws and Apple gear being hard to get into. Compare that to, say, the Apple ][, which had handy clips so you could jump inside whenever. In fitting my massive FOUR MEGABYTES of RAM back in the day, I recall using a couple of allen keys sticky-taped together to be able to reach in and get the recessed Torx screws. These days, I can just order a Torx bit off Amazon and have it arrive pretty quickly. Well, two Torx bits, one of which is just too short for the job.

My (dusty) Macintosh Plus

One thing had always struck me about it: it never really looked like the photos of the Macintosh Plus I saw in books. An embarrassing number of years later, I learned that a lot can be gleaned from the serial number printed on the underside of the front of the case.

So heading over to the My Old Mac Serial Number Decoder I can find out:

Manufactured in: F => Fremont, California, USA
Year of production: 1985
Week of production: 14
Production number: 3V3 => 4457
Model ID: M0001WP => Macintosh 512K (European Macintosh ED)

Your Macintosh 512K (European Macintosh ED) was the 4457th Mac manufactured during the 14th week of 1985 in Fremont, California, USA.

Pretty cool! So it is certainly a Plus as the logic board says that, but it’s actually an upgraded 512k! If you think it was madness to have a GUI with only 128k of RAM in the original Macintosh, you’d be right. I do not envy anybody who had one of those.

Some time a decent number of years ago (but not too many, less than 10), I turned on the Mac Plus to see if it still worked. It did! But then… some magic smoke started to come out (which isn’t so good), but the computer kept working! There’s something utterly bizarre about looking at a computer with smoke coming out of it that continues to function perfectly fine.

Anyway, as the smoke was coming out, I decided that it would be an opportune time to turn it off, open doors and windows, and put it away until I was ready to deal with it.

One Global Pandemic Later, and now was the time.

I suspected it was going to be a capacitor somewhere that blew, and figured that I should replace it, and probably preemptively replace all the other electrolytic capacitors that could likely leak and cause problems.

First things first though: dismantle it and clean everything, starting with taking the case off. Apple is not new to the game of annoying screws to get into things. I ended up spending $12 on this set on Amazon, as the T10 bit can actually reach the screws holding the case on.

Cathode Ray Tubes are not to be messed with. We’re talking lethal voltages here. It had been many years since electricity went into this thing, so all was good. If this all doesn’t work first time when reassembling it, I’m not exactly looking forward to discharging a CRT and working on it.

The inside of my Macintosh Plus, with lots of grime.

You can see there’s grime everywhere. It’s not the worst in the world, but it’s not great (and kinda sticky). Obviously, this needs to be cleaned! The best way to do that is take a lot of photos, dismantle everything, and clean it a bit at a time.

There’s four main electronic components inside a Macintosh Plus:

  1. The CRT itself
  2. The floppy disk drive
  3. The Logic Board (what Mac people call what PC people call the motherboard)
  4. The Analog Board

There’s also some metal structure that keeps some things in place. There’s only a few connectors between things, which are pretty easy to remove. If you don’t know how to discharge a CRT and what the dangers of them are you should immediately go and find out through reading rather than finding out by dying. I would much prefer it if you dyed (because creative fun) rather than died.

Once the floppy connector and the power connector are unplugged, the logic board slides out pretty easily. You can see from the photo below that I have the 4MB of RAM installed and the resistor you need to snip is, well, snipped (but look really closely for that). Also, grime.

Macintosh Plus Logic Board

Cleaning things? Well, there are two ways that I have used (and considering I haven’t yet written the post with “hurray, it all works”, currently take it with a grain of salt until I write that post). One: contact cleaner. Two: detergent.

Macintosh Plus Logic Board (being washed in my sink)

I took the route of cleaning things first, and then doing recapping adventures. So it was some contact cleaner on the boards, and then some soaking with detergent. This actually all worked pretty well.

Logic Board Capacitors:

  • C5, C6, C7, C12, C13 = 33uF 16V 85C (measured at 39uF, 38uF, 38uF, 39uF)
  • C14 = 1uF 50V (measured at 1.2uF and then it fluctuated down to around 1.15uF)

Analog Board Capacitors

  • C1 = 35V 3.9uF (M) measured at 4.37uF
  • C2 = 16V 4700uF SM measured at 4446uF
  • C3 = 16V 220uF +105C measured at 234uF
  • C5 = 10V 47uF 85C measured at 45.6uF
  • C6 = 50V 22uF 85C measured at 23.3uF
  • C10 = 16V 33uF 85C measured at 37uF
  • C11 = 160V 10uF 85C measured at 11.4uF
  • C12 = 50V 22uF 85C measured at 23.2uF
  • C18 = 16V 33uF 85C measured at 36.7uF
  • C24 = 16V 2200uF 105C measured at 2469uF
  • C27 = 16V 2200uF 105C measured at 2171uF (although started at 2190 and then went down slowly)
  • C28 = 16V 1000uF 105C measured at 638uF, then 1037uF, then 1000uF, then 987uF
  • C30 = 16V 2200uF 105C measured at 2203uF
  • C31 = 16V 220uF 105C measured at 236uF
  • C32 = 16V 2200uF 105C measured at 2227uF
  • C34 = 200V 100uF 85C measured at 101.8uF
  • C35 = 200V 100uF 85C measured at 103.3uF
  • C37 = 250V 0.47uF measured at <exploded>. wheee!
  • C38 = 200V 100uF 85C measured at 103.3uF
  • C39 = 200V 100uF 85C measured at 99.6uF (with scorch marks from next door)
  • C42 = 10V 470uF 85C measured at 556uF
  • C45 = 10V 470uF 85C measured at 227uF, then 637uF then 600uF

I’ve ordered an analog board kit from https://console5.com/store/macintosh-128k-512k-plus-analog-pcb-cap-kit-630-0102-661-0462.html and when trying to put them in, I learned that the US Analog board is different to the International Analog board!!! Gah. Dammit.

Note that C30, C32, C38, C39, and C37 were missing from the kit I received (probably due to differences in the US and International boards). I did have an X2 cap (for C37) but it was 0.1uF not 0.47uF. I also had two extra 1000uF 16V caps.

Macintosh Repair and Upgrade Secrets (up to the Mac SE no less!) holds an Appendix with the parts listing for both the US and International Analog boards, and this led me to conclude that they are in fact different boards rather than just a few wires that are different. I am not sure what the “For 120V operation, W12 must be in place” and “for 240V operation, W12 must be removed” writing is about on the International Analog board, but I’m not quite up to messing with that at the moment.

So, I ordered the parts (linked above) and waited (again) to be able to finish re-capping the board.

I found this video (https://youtu.be/H9dxJ7uNXOA) to be a good one for learning a bunch about the insides of compact Macs; I recommend it and several others on his YouTube channel. One interesting thing I learned is that the X2 cap (C37 on the International one) is before the power switch, so it could blow just by having the system plugged in and not turned on! Okay, so I’m kind of assuming that it also applies to the International board, and mine exploded while it was plugged in and switched on, so YMMV.

Additionally, there’s an interesting list of commonly failing parts. Unfortunately, this is also for the US logic board, so the tables in Macintosh Repair and Upgrade Secrets are useful. I’m hoping that I don’t have to replace anything more there, but we’ll see.

But, after the Nth round of parts being delivered….

Note the lack of an exploded capacitor

Yep, that’s where the exploded cap was before. Cleaned up all pretty nicely actually. Annoyingly, I had to run it all through a step-up transformer as the board is all set for Australian 240V rather than US 120V. This isn’t going to be an everyday computer though, so it’s fine.

Macintosh Plus booting up (note how long the memory check of 4MB of RAM takes). I’m being very careful as the cover is off; high, and possibly lethal, voltages are exposed.

Woohoo! It works. While I haven’t found my supply of floppy disks that (at least used to) work, the floppy mechanism also seems to work okay.

Macintosh Plus with a seemingly working floppy drive mechanism. I haven’t found a boot floppy yet though.

Next up: waiting for my Floppy Emu to arrive as it’ll certainly let it boot. Also, it’s now time to rip the house apart to find a floppy disk that certainly should have made its way across the ocean with the move…. Oh, and also to clean up the mouse and keyboard.

May 18, 2020

Displaying client IP address using Apache Server-Side Includes

If you use a Dynamic DNS setup to reach machines which are not behind a stable IP address, you will likely have a need to probe these machines' public IP addresses. One option is to use an insecure service like Oracle's http://checkip.dyndns.com/ which echoes back your client IP, but you can also do this on your own server if you have one.

There are multiple options to do this, like writing a CGI or PHP script, but those are fairly heavyweight if that's all you need mod_cgi or PHP for. Instead, I decided to use Apache's built-in Server-Side Includes.

Apache configuration

Start by turning on the include filter by adding the following in /etc/apache2/conf-available/ssi.conf:

AddType text/html .shtml
AddOutputFilter INCLUDES .shtml

and making that configuration file active:

a2enconf ssi

Then, find the vhost file where you want to enable SSI and add the following options to a Location or Directory section:

<Location /ssi_files>
    Options +IncludesNOEXEC
    SSLRequireSSL
    Header set Content-Security-Policy: "default-src 'none'"
    Header set X-Content-Type-Options: "nosniff"
    Header set Cache-Control "max-age=0, no-cache, no-store, must-revalidate"
</Location>

before adding the necessary modules:

a2enmod headers
a2enmod include

and restarting Apache:

apache2ctl configtest && systemctl restart apache2.service

Create an shtml page

With the web server ready to process SSI instructions, the following HTML blurb can be used to display the client IP address:

<!--#echo var="REMOTE_ADDR" -->

or any other built-in variable.

Note that you don't need to write valid HTML for the variable to be substituted, and so the above one-liner is all I use on my server.
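
A quick way to check that it's working (the hostname and path here are just my example):

$ curl -s https://example.com/ssi_files/whatsmyip.shtml
203.0.113.27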

Security concerns

The first thing to note is that the configuration section uses the IncludesNOEXEC option in order to disable arbitrary command execution via SSI. In addition, you can also make sure that the cgi module is disabled since that's a dependency of the more dangerous side of SSI:

a2dismod cgi

Of course, if you rely on this IP address to be accurate, for example because you'll be putting it in your DNS, then you should make sure that you only serve this page over HTTPS, which can be enforced via the SSLRequireSSL directive.

I included two other headers in the above vhost config (Content-Security-Policy and X-Content-Type-Options) in order to limit the damage that could be done in case a malicious file was accidentally dropped in that directory.

Finally, I suggest making sure that only the root user has writable access to the directory which has server-side includes enabled:

$ ls -la /var/www/ssi_includes/
total 12
drwxr-xr-x  2 root     root     4096 May 18 15:58 .
drwxr-xr-x 16 root     root     4096 May 18 15:40 ..
-rw-r--r--  1 root     root        0 May 18 15:46 index.html
-rw-r--r--  1 root     root       32 May 18 15:58 whatsmyip.shtml

A Good Time to Upgrade PCs

PC hardware just keeps getting cheaper and faster. Now that so many people have been working from home the deficiencies of home PCs are becoming apparent. I’ll give Australian prices and URLs in this post, but I think that similar prices will be available everywhere that people read my blog.

From MSY (parts list PDF) [1] 120G SATA SSDs are under $50 each. 120G is more than enough for a basic workstation, so you are looking at $42 or so for fast quiet storage or $84 or so for the same with RAID-1. Being quiet is a significant luxury feature and it’s also useful if you are going to be in video conferences.

For more serious storage NVMe starts at around $100 per unit, I think that $124 for a 500G Crucial NVMe is the best low end option (paying $95 for a 250G Kingston device doesn’t seem like enough savings to be worth it). So that’s $248 for 500G of very fast RAID-1 storage. There’s a Samsung 2TB NVMe device for $349 which is good if you need more storage, it’s interesting to note that this is significantly cheaper than the Samsung 2TB SSD which costs $455. I wonder if SATA SSD devices will go away in the future, it might end up being SATA for slow/cheap spinning media and M.2 NVMe for solid state storage. The SATA SSD devices are only good for use in older systems that don’t have M.2 sockets on the motherboard.

It seems that most new motherboards have one M.2 socket with NVMe support, and presumably support for booting from NVMe. But dual M.2 sockets are rare, and the price difference is significantly greater than the cost of a PCIe M.2 card to support NVMe, which is $14. So for NVMe RAID-1 it seems that the best option is a motherboard with a single M.2 socket (starting at $89 for an AM4 socket motherboard – the current standard for AMD CPUs) and a PCIe M.2 card.

One thing to note about NVMe is that different drivers are required. On Linux this means building a new initrd before the migration (or afterwards when booted from a recovery image), and on Windows it probably means a fresh install from special installation media with NVMe drivers.
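
On a Debian-style system, rebuilding the initrd is typically a one-liner (a sketch; run it before moving the disks, or from a chroot afterwards):

# regenerate the initramfs for every installed kernel
update-initramfs -u -k all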

All the AM4 motherboards seem to have RADEON Vega graphics built in which is capable of 4K resolution at a stated refresh of around 24Hz. The ones that give detail about the interfaces say that they have HDMI 1.4 which means a maximum of 30Hz at 4K resolution if you have the color encoding that suits text (IE for use other than just video). I covered this issue in detail in my blog post about DisplayPort and 4K resolution [2]. So a basic AM4 motherboard won’t give great 4K display support, but it will probably be good for a cheap start.

$89 for motherboard, $124 for 500G NVMe, $344 for a Ryzen 5 3600 CPU (not the cheapest AM4 but in the middle range and good value for money), and $99 for 16G of RAM (DDR4 RAM is cheaper than DDR3 RAM) gives the core of a very decent system for $656 (assuming you have a working system to upgrade and peripherals to go with it).

Currently Kogan has 4K resolution monitors starting at $329 [3]. They probably won’t be the greatest monitors, but my experience of a past cheap 4K monitor from Kogan was that it was quite OK. Samsung 4K monitors started at about $400 last time I checked (Kogan currently has no stock of them and doesn’t display the price); I’d pay an extra $70 for Samsung, but the Kogan branded product is probably good enough for most people. So you are looking at under $1000 for a new system with fast CPU, DDR4 RAM, NVMe storage, and a 4K monitor if you already have the case, PSU, keyboard, mouse, etc.

It seems quite likely that the 4K video hardware on a cheap AM4 motherboard won’t be that great for games and it will definitely be lacking for watching TV documentaries. Whether such deficiencies are worth spending money on a PCIe video card (starting at $50 for a low end card but costing significantly more for 3D gaming at 4K resolution) is a matter of opinion. I probably wouldn’t have spent extra for a PCIe video card if I had 4K video on the motherboard. Not only does using built in video save money it means one less fan running (less background noise) and probably less electricity use too.

My Plans

I currently have a workstation with 2*500G SATA SSDs in a RAID-1 array, 16G of RAM, and a i5-2500 CPU (just under 1/4 the speed of the Ryzen 5 3600). If I had hard drives then I would definitely buy a new system right now. But as I have SSDs that work nicely (quiet and fast enough for most things) and almost all machines I personally use have SSDs (so I can’t get a benefit from moving my current SSDs to another system) I would just get CPU, motherboard, and RAM. So the question is whether to spend $532 for more than 4* the CPU performance. At the moment I’ll wait because I’ll probably get a free system with DDR4 RAM in the near future, while it probably won’t be as fast as a Ryzen 5 3600, it should be at least twice as fast as what I currently have.

May 17, 2020

Notes on Installing Ubuntu 20 VM on an MS-Windows 10 Host

Some thirteen years ago I worked with Xen virtual machines as part of my day job, and gave a presentation at Linux Users of Victoria on the subject (with additional lecture notes). A few years after that I gave another presentation on the Unified Extensible Firmware Interface (UEFI), which itself (indirectly) led to a post on Linux and MS-Windows 8 dual-booting. All of this now leads to some notes on using MS-Windows as a host for Ubuntu Linux guest machines.

Why Would You Want to do This?

Most people these days have at least heard of Linux. They might even know that every single supercomputer in the world uses Linux. They may know that the overwhelming majority of embedded devices, such as home routers, use Linux. Or maybe even that the Android mobile 'phone uses a Linux kernel. Or that MacOS is built on the same broad family of UNIX-like operating systems. Whilst they might be familiar with their MS-Windows environment, because that's what they've been brought up on and what their favourite applications are designed for, they might also be "Linux curious", especially if they are hoping to either scale-up the complexity and volume of the datasets they're working with (i.e., towards high performance computing) or scale-down their applications (i.e., towards embedded devices). If this is the case, then introducing Linux via a virtual machine (VM) is a relatively safe and easy path to experiment with.

About VMs

Virtual machines work by emulating a computer system, including hardware, in a software environment, a technology that has been around for a very long time (e.g., CP/CMS, 1967). The VMs on a host system are managed by a hypervisor, or Virtual Machine Monitor (VMM), which manages one or more guest systems. In the example that follows, VirtualBox, a free-and-open-source hypervisor, is used. Because the guest system relies on the host it cannot have the same performance as a host system, unlike a dual-boot system. It will share memory, it will share processing power, it must take up some disk space, and will also have the overhead of the hypervisor itself (although this has improved a great deal in recent years). In a production environment, VMs are usually used to optimise resource allocation for very powerful systems, such as web-server farms and bodies like the Nectar Research Cloud, or even some partitions on systems like the University of Melbourne's supercomputer, Spartan. In a development environment, VMs are an excellent tool for testing and debugging.

Install VirtualBox and Enable Virtualization

For most environments VirtualBox is an easy path for creating a virtual machine, ARM systems excluded (QEMU suggested for Raspberry Pi or Android, or QEMU's fork, KVM). For the example given here, simply download VirtualBox for MS-Windows and click one's way through the installation process, noting that VirtualBox will make changes to your system and that products from Oracle can be trusted (*blink*). Downloads for other operating environments are worth looking at as well.

It is essential to enable virtualisation on your MS-Windows host through the BIOS/UEFI, which is not as easy as it used to be. A handy page from some smart people in the Czech Republic provides quick instructions for a variety of hardware environments. The good people at laptopmag provide the path from within the MS-Windows environment. In summary: select Settings (gear icon), select Update & Security, select Recovery (this sounds wrong), Advanced Startup, Restart Now (which is also wrong, you don't restart now), Troubleshoot, Advanced Options, UEFI Firmware Settings, then Restart.

Install Linux and Create a Shared Folder

Download a Ubuntu 20.04 LTS (long-term support) ISO and save to the MS-Windows host. There are some clever alternatives, such as the Ubuntu Linux terminal environment for MS-Windows (which is possibly even a better choice these days, but that will be for another post), or Multipass which allows one to create their own mini-cloud environment. But this is a discussion for a VM, so I'll resist the temptation to go off on a tangent.

Creating a VM in VirtualBox is pretty straight-forward; open the application, select "New", give the VM a name, and allocate resources (virtual hard disk, virtual memory). It's worthwhile tending towards the generous in resource allocation. After that it is a case of selecting the ISO under Settings and Storage; remember a VM does not have a real disk drive, so it has a virtual (software) one. After this one can start the VM, and it will boot from the ISO and begin the installation process for Ubuntu Linux desktop edition, which is pretty straight forward. One amusing caveat: when the installation says it's going to wipe the disk it doesn't mean the host machine, just the virtual disk that has been built for it. When the installation is complete go to "Devices" on the VM menu, remove the boot disk, and restart the guest system; you now have an Ubuntu VM installed on your MS-Windows system.

By default, VMs do not have access to the host computer. To provide that access one will want to set up a shared folder in the VM and on the host. The first step in this environment is to give the Linux user (created during installation) membership of the vboxsf group, e.g., on the terminal sudo usermod -a -G vboxsf username. In VirtualBox, select Settings, and add a share under Machine Folders, which is a permanent folder. Under Folder Path set the name and location on the host operating system (e.g., UbuntuShared on the Desktop); leave automount blank (we can fix that soon enough). Put a test file in the shared folder.

Ubuntu now needs additional software installed to work with VirtualBox's Guest Additions, including kernel modules. Also, mount VirtualBox's Guest Additions to the guest VM, under Devices as a virtual CD; you can download this from the VirtualBox website.

Run the following commands, entering the default user's password as needed:


sudo apt-get install -y build-essential linux-headers-`uname -r`
sudo /media/cdrom/./VBoxLinuxAdditions.run
sudo shutdown -r now # Reboot the system
mkdir ~/UbuntuShared
sudo mount -t vboxsf shared ~/UbuntuShared
cd ~/UbuntuShared

The file that was put in the UbuntuShared folder in MS-Windows should now be visible in ~/UbuntuShared. Add a file (e.g., touch testfile.txt) from Linux and check if it can be seen in MS-Windows. If this all succeeds, make the folder persistent.


sudo nano /etc/fstab # nano is just fine for short configuration files
# Add the following, separated by tabs, and save
shared /home/username/UbuntuShared vboxsf defaults 0 0
# Edit modules
sudo nano /etc/modules
# Add the following
vboxsf
# Exit and reboot
sudo shutdown -r now

You're done! You now have a Ubuntu desktop system running as a VM guest using VirtualBox on an MS-Windows 10 host system. Ideal for learning, testing, and debugging.

A super simple non-breadmaker loaf


This is the second in a series of posts documenting my adventures in making bread during the COVID-19 shutdown. Yes I know all the cool kids made bread for themselves during the shutdown, but I did it too!

A loaf of bread

So here we were, in the middle of a pandemic which closed bakeries and cancelled almost all of my non-work activities. I found this animated GIF on Reddit for a super simple no-knead bread and decided to give it a go. It turns out that a few things are true:

  • animated GIFs are a super terrible way to store recipes
  • that animated GIF was an export of this YouTube video which originally accompanied this blog post
  • and that I only learned these things while trying to work out who to credit for this recipe

The basic recipe is really easy — chuck the following into a big bowl, stir, and then cover with a plate. Leave resting in a warm place for a long time (three or four hours), then turn out onto a floured bench. Fold into a ball with flour, and then bake. You can see a more detailed version in the YouTube video above.

  • 3 cups of bakers flour (not plain white flour)
  • 2 teaspoons of yeast
  • 2 teaspoons of salt
  • 1.5 cups of warm water (again, I use 42 degrees from my gas hot water system)

The dough will seem really dry when you first mix it, but gets wetter as it rises. Don’t panic if it seems tacky and dry.

I think the key here is the baking process, which is how the oven loaf in my previous post about bread maker white loaves was baked. I use a cast iron camp oven (sometimes called a dutch oven), because thermal mass is key. If I had a fancy enamelled cast iron camp oven I’d use that, but I don’t and I wasn’t going shopping during the shutdown to get one. Oh, and they can be crazy expensive at up to $500 AUD.

Another loaf of bread

Warm the oven with the camp oven inside for at least 30 minutes at 230 degrees celsius. Then place the dough inside the camp oven on some baking paper — I tend to use a trivet as well, but I think you could skip that if you didn’t have one. Bake for 30 minutes with the lid on — this helps steam the bread a little and forms a nice crust. Then bake for another 12 minutes with the camp oven lid off — this darkens the crust up nicely.

A final loaf of bread

Oh, and I’ve noticed a bit of variation in how wet the dough seems to be when I turn it out and form it in flour, but it doesn’t really seem to change the outcome once baked, so that’s nice.

The original blogger for this recipe also recommends chilling the dough overnight in the fridge before baking, but I haven’t tried that yet.


Private Key Redaction: UR DOIN IT RONG

Because posting private keys on the Internet is a bad idea, some people like to “redact” their private keys, so that it looks kinda-sorta like a private key, but it isn’t actually giving away anything secret. Unfortunately, due to the way that private keys are represented, it is easy to “redact” a key in such a way that it doesn’t actually redact anything at all. RSA private keys are particularly bad at this, but the problem can (potentially) apply to other keys as well.

I’ll show you a bit of “Inside Baseball” with key formats, and then demonstrate the practical implications. Finally, we’ll go through a practical worked example from an actual not-really-redacted key I recently stumbled across in my travels.

The Private Lives of Private Keys

Here is what a typical private key looks like, when you come across it:

-----BEGIN RSA PRIVATE KEY-----
MGICAQACEQCxjdTmecltJEz2PLMpS4BXAgMBAAECEDKtuwD17gpagnASq1zQTYEC
CQDVTYVsjjF7IQIJANUYZsIjRsR3AgkAkahDUXL0RSECCB78r2SnsJC9AghaOK3F
sKoELg==
-----END RSA PRIVATE KEY-----

Obviously, there’s some hidden meaning in there – computers don’t encrypt things by shouting “BEGIN RSA PRIVATE KEY!”, after all. What is between the BEGIN/END lines above is, in fact, a base64-encoded DER format ASN.1 structure representing a PKCS#1 private key.

In simple terms, it’s a list of numbers – very important numbers. The list of numbers is, in order:

  • A version number (0);
  • The “public modulus”, commonly referred to as “n”;
  • The “public exponent”, or “e” (which is almost always 65,537, for various unimportant reasons);
  • The “private exponent”, or “d”;
  • The two “private primes”, or “p” and “q”;
  • Two exponents, which are known as “dmp1” and “dmq1”; and
  • A coefficient, known as “iqmp”.
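
If you want to see these numbers for yourself, openssl will happily dump them from any RSA private key; a quick sketch (the filename is mine):

$ openssl genrsa -out toy.pem 512      # generate a small throwaway key
$ openssl rsa -in toy.pem -noout -text # prints modulus, publicExponent,
                                       # privateExponent, prime1, prime2,
                                       # exponent1, exponent2 and coefficient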

Why Is This a Problem?

The thing is, only three of those numbers are actually required in a private key. The rest, whilst useful to allow the RSA encryption and decryption to be more efficient, aren’t necessary. The three absolutely required values are e, p, and q.

Of the other numbers, most of them are at least about the same size as each of p and q. So of the total data in an RSA key, less than a quarter of the data is required. Let me show you with the above “toy” key, by breaking it down piece by piece [1]:

  • MGI – DER for “this is a sequence”
  • CAQ – version (0)
  • CxjdTmecltJEz2PLMpS4BX – n
  • AgMBAA – e
  • ECEDKtuwD17gpagnASq1zQTY – d
  • ECCQDVTYVsjjF7IQ – p
  • IJANUYZsIjRsR3 – q
  • AgkAkahDUXL0RS – dmp1
  • ECCB78r2SnsJC9 – dmq1
  • AghaOK3FsKoELg== – iqmp

Remember that in order to reconstruct all of these values, all I need are e, p, and q – and e is pretty much always 65,537. So I could “redact” almost all of this key, and still give all the important, private bits of this key. Let me show you:

-----BEGIN RSA PRIVATE KEY-----
..............................................................EC
CQDVTYVsjjF7IQIJANUYZsIjRsR3....................................
........
-----END RSA PRIVATE KEY-----

Now, I doubt that anyone is going to redact a key precisely like this… but then again, this isn’t a “typical” RSA key. They usually look a lot more like this:

-----BEGIN RSA PRIVATE KEY-----
MIIEogIBAAKCAQEAu6Inch7+mWtKn+leB9uCG3MaJIxRyvC/5KTz2fR+h+GOhqj4
SZJobiVB4FrE5FgC7AnlH6qeRi9MI0s6dt5UWZ5oNIeWSaOOeNO+EJDUkSVf67wj
SNGXlSjGAkPZ0nRJiDjhuPvQmdW53hOaBLk5udxPEQbenpXAzbLJ7wH5ouLQ3nQw
HwpwDNQhF6zRO8WoscpDVThOAM+s4PS7EiK8ZR4hu2toon8Ynadlm95V45wR0VlW
zywgbkZCKa1IMrDCscB6CglQ10M3Xzya3iTzDtQxYMVqhDrA7uBYRxA0y1sER+Rb
yhEh03xz3AWemJVLCQuU06r+FABXJuY/QuAVvQIDAQABAoIBAFqwWVhzWqNUlFEO
PoCVvCEAVRZtK+tmyZj9kU87ORz8DCNR8A+/T/JM17ZUqO2lDGSBs9jGYpGRsr8s
USm69BIM2ljpX95fyzDjRu5C0jsFUYNi/7rmctmJR4s4uENcKV5J/++k5oI0Jw4L
c1ntHNWUgjK8m0UTJIlHbQq0bbAoFEcfdZxd3W+SzRG3jND3gifqKxBG04YDwloy
tu+bPV2jEih6p8tykew5OJwtJ3XsSZnqJMwcvDciVbwYNiJ6pUvGq6Z9kumOavm9
XU26m4cWipuK0URWbHWQA7SjbktqEpxsFrn5bYhJ9qXgLUh/I1+WhB2GEf3hQF5A
pDTN4oECgYEA7Kp6lE7ugFBDC09sKAhoQWrVSiFpZG4Z1gsL9z5YmZU/vZf0Su0n
9J2/k5B1GghvSwkTqpDZLXgNz8eIX0WCsS1xpzOuORSNvS1DWuzyATIG2cExuRiB
jYWIJUeCpa5p2PdlZmBrnD/hJ4oNk4oAVpf+HisfDSN7HBpN+TJfcAUCgYEAyvY7
Y4hQfHIdcfF3A9eeCGazIYbwVyfoGu70S/BZb2NoNEPymqsz7NOfwZQkL4O7R3Wl
Rm0vrWT8T5ykEUgT+2ruZVXYSQCKUOl18acbAy0eZ81wGBljZc9VWBrP1rHviVWd
OVDRZNjz6nd6ZMrJvxRa24TvxZbJMmO1cgSW1FkCgYAoWBd1WM9HiGclcnCZknVT
UYbykCeLO0mkN1Xe2/32kH7BLzox26PIC2wxF5seyPlP7Ugw92hOW/zewsD4nLze
v0R0oFa+3EYdTa4BvgqzMXgBfvGfABJ1saG32SzoWYcpuWLLxPwTMsCLIPmXgRr1
qAtl0SwF7Vp7O/C23mNukQKBgB89DOEB7xloWv3Zo27U9f7nB7UmVsGjY8cZdkJl
6O4LB9PbjXCe3ywZWmJqEbO6e83A3sJbNdZjT65VNq9uP50X1T+FmfeKfL99X2jl
RnQTsrVZWmJrLfBSnBkmb0zlMDAcHEnhFYmHFuvEnfL7f1fIoz9cU6c+0RLPY/L7
n9dpAoGAXih17mcmtnV+Ce+lBWzGWw9P4kVDSIxzGxd8gprrGKLa3Q9VuOrLdt58
++UzNUaBN6VYAe4jgxGfZfh+IaSlMouwOjDgE/qzgY8QsjBubzmABR/KWCYiRqkj
qpWCgo1FC1Gn94gh/+dW2Q8+NjYtXWNqQcjRP4AKTBnPktEvdMA=
-----END RSA PRIVATE KEY-----

People typically redact keys by deleting whole lines, and usually replacing them with [...] and the like. But only about 345 of those 1588 characters (excluding the header and footer) are required to construct the entire key. You can redact about 4/5ths of that giant blob of stuff, and your private parts (or at least, those of your key) are still left uncomfortably exposed.

But Wait! There’s More!

Remember how I said that everything in the key other than e, p, and q could be derived from those three numbers? Let’s talk about one of those numbers: n.

This is known as the “public modulus” (because, along with e, it is also present in the public key). It is very easy to calculate: n = p * q. It is also very early in the key (the second number, in fact).

Since n = p * q, it follows that q = n / p. Thus, as long as the key is intact up to p, you can derive q by simple division.
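
To make that concrete with toy numbers (far too small for a real key): if n = 3233 and p = 61, recovering q is one division away:

$ echo $((3233 / 61))
53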

Real World Redaction

At this point, I’d like to introduce an acquaintance of mine: Mr. Johan Finn. He is the proud owner of the GitHub repo johanfinn/scripts. For a while, his repo contained a script that contained a poorly-redacted private key. He has since deleted it, by making a new commit, but of course because git never really deletes anything, it’s still available.

Of course, Mr. Finn may delete the repo, or force-push a new history without that commit, so here is the redacted private key, with a bit of the surrounding shell script, for our illustrative pleasure:

#Add private key to .ssh folder
cd /home/johan/.ssh/
echo  "-----BEGIN RSA PRIVATE KEY-----
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
ÄÄÄÄÄÄÄÄÄÄÄÄÄÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅ
MIIJKgIBAAKCAgEAxEVih1JGb8gu/Fm4AZh+ZwJw/pjzzliWrg4mICFt1g7SmIE2
TCQMKABdwd11wOFKCPc/UzRH/fHuQcvWrpbOSdqev/zKff9iedKw/YygkMeIRaXB
fYELqvUAOJ8PPfDm70st9GJRhjGgo5+L3cJB2gfgeiDNHzaFvapRSU0oMGQX+kI9
ezsjDAn+0Pp+r3h/u1QpLSH4moRFGF4omNydI+3iTGB98/EzuNhRBHRNq4oBV5SG
Pq/A1bem2ninnoEaQ+OPESxYzDz3Jy9jV0W/6LvtJ844m+XX69H5fqq5dy55z6DW
sGKn78ULPVZPsYH5Y7C+CM6GAn4nYCpau0t52sqsY5epXdeYx4Dc+Wm0CjXrUDEe
Egl4loPKDxJkQqQ/MQiz6Le/UK9vEmnWn1TRXK3ekzNV4NgDfJANBQobOpwt8WVB
rbsC0ON7n680RQnl7PltK9P1AQW5vHsahkoixk/BhcwhkrkZGyDIl9g8Q/Euyoq3
eivKPLz7/rhDE7C1BzFy7v8AjC3w7i9QeHcWOZFAXo5hiDasIAkljDOsdfD4tP5/
wSO6E6pjL3kJ+RH2FCHd7ciQb+IcuXbku64ln8gab4p8jLa/mcMI+V3eWYnZ82Yu
axsa85hAe4wb60cp/rCJo7ihhDTTvGooqtTisOv2nSvCYpcW9qbL6cGjAXECAwEA
AQKCAgEAjz6wnWDP5Y9ts2FrqUZ5ooamnzpUXlpLhrbu3m5ncl4ZF5LfH+QDN0Kl
KvONmHsUhJynC/vROybSJBU4Fu4bms1DJY3C39h/L7g00qhLG7901pgWMpn3QQtU
4P49qpBii20MGhuTsmQQALtV4kB/vTgYfinoawpo67cdYmk8lqzGzzB/HKxZdNTq
s+zOfxRr7PWMo9LyVRuKLjGyYXZJ/coFaobWBi8Y96Rw5NZZRYQQXLIalC/Dhndm
AHckpstEtx2i8f6yxEUOgPvV/gD7Akn92RpqOGW0g/kYpXjGqZQy9PVHGy61sInY
HSkcOspIkJiS6WyJY9JcvJPM6ns4b84GE9qoUlWVF3RWJk1dqYCw5hz4U8LFyxsF
R6WhYiImvjxBLpab55rSqbGkzjI2z+ucDZyl1gqIv9U6qceVsgRyuqdfVN4deU22
LzO5IEDhnGdFqg9KQY7u8zm686Ejs64T1sh0y4GOmGsSg+P6nsqkdlXH8C+Cf03F
lqPFg8WQC7ojl/S8dPmkT5tcJh3BPwIWuvbtVjFOGQc8x0lb+NwK8h2Nsn6LNazS
0H90adh/IyYX4sBMokrpxAi+gMAWiyJHIHLeH2itNKtAQd3qQowbrWNswJSgJzsT
JuJ7uqRKAFkE6nCeAkuj/6KHHMPsfCAffVdyGaWqhoxmPOrnVgECggEBAOrCCwiC
XxwUgjOfOKx68siFJLfHf4vPo42LZOkAQq5aUmcWHbJVXmoxLYSczyAROopY0wd6
Dx8rqnpO7OtZsdJMeBSHbMVKoBZ77hiCQlrljcj12moFaEAButLCdZFsZW4zF/sx
kWIAaPH9vc4MvHHyvyNoB3yQRdevu57X7xGf9UxWuPil/jvdbt9toaraUT6rUBWU
GYPNKaLFsQzKsFWAzp5RGpASkhuiBJ0Qx3cfLyirjrKqTipe3o3gh/5RSHQ6VAhz
gdUG7WszNWk8FDCL6RTWzPOrbUyJo/wz1kblsL3vhV7ldEKFHeEjsDGroW2VUFlS
asAHNvM4/uYcOSECggEBANYH0427qZtLVuL97htXW9kCAT75xbMwgRskAH4nJDlZ
IggDErmzBhtrHgR+9X09iL47jr7dUcrVNPHzK/WXALFSKzXhkG/yAgmt3r14WgJ6
5y7010LlPFrzaNEyO/S4ISuBLt4cinjJsrFpoo0WI8jXeM5ddG6ncxdurKXMymY7
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::.::
:::::::::::::::::::::::::::.::::::::::::::::::::::::::::::::::::
LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLlL
ÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖÖ
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
ÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅÅ
YYYYYYYYYYYYYYYYYYYYYyYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY
gff0GJCOMZ65pMSy3A3cSAtjlKnb4fWzuHD5CFbusN4WhCT/tNxGNSpzvxd8GIDs
nY7exs9L230oCCpedVgcbayHCbkChEfoPzL1e1jXjgCwCTgt8GjeEFqc1gXNEaUn
O8AJ4VlR8fRszHm6yR0ZUBdY7UJddxQiYOzt0S1RLlECggEAbdcs4mZdqf3OjejJ
06oTPs9NRtAJVZlppSi7pmmAyaNpOuKWMoLPElDAQ3Q7VX26LlExLCZoPOVpdqDH
KbdmBEfTR4e11Pn9vYdu9/i6o10U4hpmf4TYKlqk10g1Sj21l8JATj/7Diey8scO
sAI1iftSg3aBSj8W7rxCxSezrENzuqw5D95a/he1cMUTB6XuravqZK5O4eR0vrxR
AvMzXk5OXrUEALUvt84u6m6XZZ0pq5XZxq74s8p/x1JvTwcpJ3jDKNEixlHfdHEZ
ZIu/xpcwD5gRfVGQamdcWvzGHZYLBFO1y5kAtL8kI9tW7WaouWVLmv99AyxdAaCB
Y5mBAQKCAQEAzU7AnorPzYndlOzkxRFtp6MGsvRBsvvqPLCyUFEXrHNV872O7tdO
GmsMZl+q+TJXw7O54FjJJvqSSS1sk68AGRirHop7VQce8U36BmI2ZX6j2SVAgIkI
9m3btCCt5rfiCatn2+Qg6HECmrCsHw6H0RbwaXS4RZUXD/k4X+sslBitOb7K+Y+N
Bacq6QxxjlIqQdKKPs4P2PNHEAey+kEJJGEQ7bTkNxCZ21kgi1Sc5L8U/IGy0BMC
PvJxssLdaWILyp3Ws8Q4RAoC5c0ZP0W2j+5NSbi3jsDFi0Y6/2GRdY1HAZX4twem
Q0NCedq1JNatP1gsb6bcnVHFDEGsj/35oQKCAQEAgmWMuSrojR/fjJzvke6Wvbox
FRnPk+6YRzuYhAP/YPxSRYyB5at++5Q1qr7QWn7NFozFIVFFT8CBU36ktWQ39MGm
cJ5SGyN9nAbbuWA6e+/u059R7QL+6f64xHRAGyLT3gOb1G0N6h7VqFT25q5Tq0rc
Lf/CvLKoudjv+sQ5GKBPT18+zxmwJ8YUWAsXUyrqoFWY/Tvo5yLxaC0W2gh3+Ppi
EDqe4RRJ3VKuKfZxHn5VLxgtBFN96Gy0+Htm5tiMKOZMYAkHiL+vrVZAX0hIEuRZ
JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
-----END RSA PRIVATE KEY-----" >> id_rsa

Now, if you try to reconstruct this key by removing the “obvious” garbage lines (the ones that are all repeated characters, some of which aren’t even valid base64 characters), it still isn’t a key – at least, openssl pkey doesn’t want anything to do with it. The key is very much still in there, though, as we shall soon see.

Using a gem I wrote and a quick bit of Ruby, we can extract a complete private key. The irb session looks something like this:

>> require "derparse"
>> b64 = <<EOF
MIIJKgIBAAKCAgEAxEVih1JGb8gu/Fm4AZh+ZwJw/pjzzliWrg4mICFt1g7SmIE2
TCQMKABdwd11wOFKCPc/UzRH/fHuQcvWrpbOSdqev/zKff9iedKw/YygkMeIRaXB
fYELqvUAOJ8PPfDm70st9GJRhjGgo5+L3cJB2gfgeiDNHzaFvapRSU0oMGQX+kI9
ezsjDAn+0Pp+r3h/u1QpLSH4moRFGF4omNydI+3iTGB98/EzuNhRBHRNq4oBV5SG
Pq/A1bem2ninnoEaQ+OPESxYzDz3Jy9jV0W/6LvtJ844m+XX69H5fqq5dy55z6DW
sGKn78ULPVZPsYH5Y7C+CM6GAn4nYCpau0t52sqsY5epXdeYx4Dc+Wm0CjXrUDEe
Egl4loPKDxJkQqQ/MQiz6Le/UK9vEmnWn1TRXK3ekzNV4NgDfJANBQobOpwt8WVB
rbsC0ON7n680RQnl7PltK9P1AQW5vHsahkoixk/BhcwhkrkZGyDIl9g8Q/Euyoq3
eivKPLz7/rhDE7C1BzFy7v8AjC3w7i9QeHcWOZFAXo5hiDasIAkljDOsdfD4tP5/
wSO6E6pjL3kJ+RH2FCHd7ciQb+IcuXbku64ln8gab4p8jLa/mcMI+V3eWYnZ82Yu
axsa85hAe4wb60cp/rCJo7ihhDTTvGooqtTisOv2nSvCYpcW9qbL6cGjAXECAwEA
AQKCAgEAjz6wnWDP5Y9ts2FrqUZ5ooamnzpUXlpLhrbu3m5ncl4ZF5LfH+QDN0Kl
KvONmHsUhJynC/vROybSJBU4Fu4bms1DJY3C39h/L7g00qhLG7901pgWMpn3QQtU
4P49qpBii20MGhuTsmQQALtV4kB/vTgYfinoawpo67cdYmk8lqzGzzB/HKxZdNTq
s+zOfxRr7PWMo9LyVRuKLjGyYXZJ/coFaobWBi8Y96Rw5NZZRYQQXLIalC/Dhndm
AHckpstEtx2i8f6yxEUOgPvV/gD7Akn92RpqOGW0g/kYpXjGqZQy9PVHGy61sInY
HSkcOspIkJiS6WyJY9JcvJPM6ns4b84GE9qoUlWVF3RWJk1dqYCw5hz4U8LFyxsF
R6WhYiImvjxBLpab55rSqbGkzjI2z+ucDZyl1gqIv9U6qceVsgRyuqdfVN4deU22
LzO5IEDhnGdFqg9KQY7u8zm686Ejs64T1sh0y4GOmGsSg+P6nsqkdlXH8C+Cf03F
lqPFg8WQC7ojl/S8dPmkT5tcJh3BPwIWuvbtVjFOGQc8x0lb+NwK8h2Nsn6LNazS
0H90adh/IyYX4sBMokrpxAi+gMAWiyJHIHLeH2itNKtAQd3qQowbrWNswJSgJzsT
JuJ7uqRKAFkE6nCeAkuj/6KHHMPsfCAffVdyGaWqhoxmPOrnVgECggEBAOrCCwiC
XxwUgjOfOKx68siFJLfHf4vPo42LZOkAQq5aUmcWHbJVXmoxLYSczyAROopY0wd6
Dx8rqnpO7OtZsdJMeBSHbMVKoBZ77hiCQlrljcj12moFaEAButLCdZFsZW4zF/sx
kWIAaPH9vc4MvHHyvyNoB3yQRdevu57X7xGf9UxWuPil/jvdbt9toaraUT6rUBWU
GYPNKaLFsQzKsFWAzp5RGpASkhuiBJ0Qx3cfLyirjrKqTipe3o3gh/5RSHQ6VAhz
gdUG7WszNWk8FDCL6RTWzPOrbUyJo/wz1kblsL3vhV7ldEKFHeEjsDGroW2VUFlS
asAHNvM4/uYcOSECggEBANYH0427qZtLVuL97htXW9kCAT75xbMwgRskAH4nJDlZ
IggDErmzBhtrHgR+9X09iL47jr7dUcrVNPHzK/WXALFSKzXhkG/yAgmt3r14WgJ6
5y7010LlPFrzaNEyO/S4ISuBLt4cinjJsrFpoo0WI8jXeM5ddG6ncxdurKXMymY7
EOF
>> b64 += <<EOF
gff0GJCOMZ65pMSy3A3cSAtjlKnb4fWzuHD5CFbusN4WhCT/tNxGNSpzvxd8GIDs
nY7exs9L230oCCpedVgcbayHCbkChEfoPzL1e1jXjgCwCTgt8GjeEFqc1gXNEaUn
O8AJ4VlR8fRszHm6yR0ZUBdY7UJddxQiYOzt0S1RLlECggEAbdcs4mZdqf3OjejJ
06oTPs9NRtAJVZlppSi7pmmAyaNpOuKWMoLPElDAQ3Q7VX26LlExLCZoPOVpdqDH
KbdmBEfTR4e11Pn9vYdu9/i6o10U4hpmf4TYKlqk10g1Sj21l8JATj/7Diey8scO
sAI1iftSg3aBSj8W7rxCxSezrENzuqw5D95a/he1cMUTB6XuravqZK5O4eR0vrxR
AvMzXk5OXrUEALUvt84u6m6XZZ0pq5XZxq74s8p/x1JvTwcpJ3jDKNEixlHfdHEZ
ZIu/xpcwD5gRfVGQamdcWvzGHZYLBFO1y5kAtL8kI9tW7WaouWVLmv99AyxdAaCB
Y5mBAQKCAQEAzU7AnorPzYndlOzkxRFtp6MGsvRBsvvqPLCyUFEXrHNV872O7tdO
GmsMZl+q+TJXw7O54FjJJvqSSS1sk68AGRirHop7VQce8U36BmI2ZX6j2SVAgIkI
9m3btCCt5rfiCatn2+Qg6HECmrCsHw6H0RbwaXS4RZUXD/k4X+sslBitOb7K+Y+N
Bacq6QxxjlIqQdKKPs4P2PNHEAey+kEJJGEQ7bTkNxCZ21kgi1Sc5L8U/IGy0BMC
PvJxssLdaWILyp3Ws8Q4RAoC5c0ZP0W2j+5NSbi3jsDFi0Y6/2GRdY1HAZX4twem
Q0NCedq1JNatP1gsb6bcnVHFDEGsj/35oQKCAQEAgmWMuSrojR/fjJzvke6Wvbox
FRnPk+6YRzuYhAP/YPxSRYyB5at++5Q1qr7QWn7NFozFIVFFT8CBU36ktWQ39MGm
cJ5SGyN9nAbbuWA6e+/u059R7QL+6f64xHRAGyLT3gOb1G0N6h7VqFT25q5Tq0rc
Lf/CvLKoudjv+sQ5GKBPT18+zxmwJ8YUWAsXUyrqoFWY/Tvo5yLxaC0W2gh3+Ppi
EDqe4RRJ3VKuKfZxHn5VLxgtBFN96Gy0+Htm5tiMKOZMYAkHiL+vrVZAX0hIEuRZ
EOF
>> der = b64.unpack("m").first
>> c = DerParse.new(der).first_node.first_child
>> version = c.value
=> 0
>> c = c.next_node
>> n = c.value
=> 80071596234464993385068908004931... # (etc)
>> c = c.next_node
>> e = c.value
=> 65537
>> c = c.next_node
>> d = c.value
=> 58438813486895877116761996105770... # (etc)
>> c = c.next_node
>> p = c.value
=> 29635449580247160226960937109864... # (etc)
>> c = c.next_node
>> q = c.value
=> 27018856595256414771163410576410... # (etc)

What I’ve done, in case you don’t speak Ruby, is take the two “chunks” of plausible-looking base64 data, chuck them together into a variable named b64, unbase64 it into a variable named der, pass that into a new DerParse instance, and then walk the DER value tree until I got all the values I need.

Interestingly, the q value actually traverses the “split” in the two chunks, which means that there’s always the possibility that there are lines missing from the key. However, since p and q are supposed to be prime, we can “sanity check” them to see if corruption is likely to have occurred:

>> require "openssl"
>> OpenSSL::BN.new(p).prime?
=> true
>> OpenSSL::BN.new(q).prime?
=> true

Excellent! The chances of a corrupted file producing valid-but-incorrect prime numbers isn’t huge, so we can be fairly confident that we’ve got the “real” p and q. Now, with the help of another one of my creations we can use e, p, and q to create a fully-operational battle key:

>> require "openssl/pkey/rsa"
>> k = OpenSSL::PKey::RSA.from_factors(p, q, e)
=> #<OpenSSL::PKey::RSA:0x0000559d5903cd38>
>> k.valid?
=> true
>> k.verify(OpenSSL::Digest::SHA256.new, k.sign(OpenSSL::Digest::SHA256.new, "bob"), "bob")
=> true

… and there you have it. One fairly redacted-looking private key brought back to life by maths and far too much free time.

Sorry Mr. Finn, I hope you’re not still using that key on anything Internet-facing.

What About Other Key Types?

EC keys are very different beasts, but they have much the same problems as RSA keys. A typical EC key contains both private and public data, and the public portion is twice the size – so only about 1/3 of the data in the key is private material. It is quite plausible that you can “redact” an EC key and leave all the actually private bits exposed.
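
The same openssl inspection trick works for EC keys, if you want to see the private/public split for yourself (filenames are mine):

$ openssl ecparam -name prime256v1 -genkey -noout -out ec.pem
$ openssl ec -in ec.pem -noout -text   # "priv:" is the 32-byte secret scalar;
                                       # "pub:" is the 65-byte public point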

What Do We Do About It?

In short: don’t ever try and redact real private keys. For documentation purposes, just put “KEY GOES HERE” in the appropriate spot, or something like that. Store your secrets somewhere that isn’t a public (or even private!) git repo.

Generating a “dummy” private key and sticking it in there isn’t a great idea, for different reasons: people have this odd habit of reusing “demo” keys in real life. There’s no need to encourage that sort of thing.


  1. Technically the pieces aren’t 100% aligned with the underlying DER, because of how base64 works. I felt it was easier to understand if I stuck to chopping up the base64, rather than decoding into DER and then chopping up the DER. 

MicroHams Digital Conference (MHDC) 2020

On May 9 2020 (PST) I had the pleasure of speaking at the MicroHams Digital Conference (MHDC) 2020. Due to COVID-19 presenters attended via Zoom, and the conference was live streamed over YouTube.

Thanks to the hard work of the organisers, this worked really well!

Looking at the conference program, I noticed the standard of the presenters was very high. The organisers I worked with (Scott N7SS, and Grant KB7WSD) explained that a side effect of making the conference virtual was casting a much wider net on presenters – making the conference even better than IRL (In Real Life)! The YouTube streaming stats showed 300-500 people “attending” – also very high.

My door to door travel time to West Coast USA is about 20 hours. So a remote presentation makes life much easier for me. It takes me a week to prepare, means 1-2 weeks away from home, and a week to recover from the jetlag. As a single parent I need to find a carer for my 14 year old.

Vickie, KD7LAW, ran a break out room for after talk chat which worked well. It was nice to “meet” several people that I usually just have email contact with. All from the comfort of my home on a Sunday morning in Adelaide (Saturday afternoon PST).

The MHDC 2020 talks have now been published on YouTube. Here is my talk, which is a good update (May 2020) on Codec 2 and FreeDV, including:

  • The new FreeDV 2020 mode using the LPCNet neural net vocoder
  • Embedded FreeDV 700D running on the SM1000
  • FreeDV over the QO-100 geosynchronous satellite and KiwiSDRs
  • Introducing some of the good people contributing to FreeDV

The conference has me interested in applying the open source modems we have developed for digital voice to Amateur Radio packet and HF data. So I’m reading up on Winlink, Pat, Direwolf and friends.

Thanks Scott, Grant, and Vickie and the MicroHams club!

May 16, 2020

Raptor Blackbird support: all upstream in op-build

Thanks to my most recent PR being merged, op-build v2.5 will have full support for the Raptor Blackbird! This includes support for the “IPL Monitor” that’s required to get fan control going.

Note that if you’re running Fedora 32 then you need some patches to buildroot to have it build, but if you’re building on something a little older, then upstream should build and work straight out of the box (err… git tree).

I also note that the work to get Secure Boot for an OS Kernel going is starting to make its way out for code reviews, so that’s something to look forward to (although without a TPM we’re going to need extra code).

May 13, 2020

A op-build v2.5-rc1 based Raptor Blackbird Build

I have done a few builds of firmware for the Raptor Blackbird since I got mine, each of them based on upstream op-build plus a few patches. The previous one was Yet another near-upstream Raptor Blackbird firmware build that I built a couple of months ago. This new build is based off the release candidate of op-build v2.5. Here’s what’s changed:

Changes in my latest Blackbird build:

  • hcode: hw030220a.opmst → hw050520a.opmst
  • hostboot: acdff8a390a2654dd52fed67bdebe2b5
  • kexec-lite: 18ec88310c4134e6b0130b3c1ea489e
  • libflash: v6.5-228-g82aed17a → v6.6
  • linux: v5.4.22 → v5.4.33
  • linux-headers: v5.4.22 → v5.4.33
  • machine-xml: 17e9e84d504582c88e782e30829e0d6be
  • occ: 3ab29212518e65740ab4dc96fd6cf584c42
  • openpower-pnor: 6fb8d914134d544a84175f00d9c6dc395faf3
  • sbe: c318ab00116d92f08c78fb7838495ad0aab7
  • skiboot: v6.5-228-g82aed17a → v6.6

Go grab blackbird.pnor from https://www.flamingspork.com/blackbird/stewart-blackbird-6-images/, and give it a go! Just scp it to your BMC, and flash it:
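
Something like this, assuming your BMC answers as root@blackbird-bmc (the hostname here is mine):

scp blackbird.pnor root@blackbird-bmc:/tmp/

then, on the BMC: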

pflash -E -p /tmp/blackbird.pnor

There are two differences from upstream op-build: my pull request to op-build, and the fixing of the (old) buildroot so that it’ll build on Fedora 32. From discussions on the openpower-firmware mailing list, it seems that one hopeful thing is to have all the Blackbird support merged in before the final op-build v2.5 is tagged. The previous op-build release (v2.4) was tagged in July 2019, so we’re about 10 months into what was a 2 month release cycle, so speculating on when that final release will come is somewhat difficult.

May 12, 2020

f32, u32, and const

Some time ago, I wrote “floats, bits, and constant expressions” about converting a floating point number into its representative ones and zeros as a C++ constant expression – constructing the IEEE 754 representation without being able to examine the bits directly.

I’ve been playing around with Rust recently, and rewrote that conversion code as a bit of a learning exercise for myself, with a thoroughly contrived set of constraints: using integer and single-precision floating point math, at compile time, without unsafe blocks, while using as few unstable features as possible.

I’ve included the listing below, for your bemusement and/or head-shaking, and you can play with the code in the Rust Playground and rust.godbolt.org

// Jonathan Adamczewski 2020-05-12
//
// Constructing the bit-representation of an IEEE 754 single precision floating 
// point number, using integer and single-precision floating point math, at 
// compile time, in rust, without unsafe blocks, while using as few unstable 
// features as I can.
//
// or "What if this silly C++ thing http://brnz.org/hbr/?p=1518 but in Rust?"


// Q. Why? What is this good for?
// A. To the best of my knowledge, this code serves no useful purpose. 
//    But I did learn a thing or two while writing it :)


// This is needed to be able to perform floating point operations in a const 
// function:
#![feature(const_fn)]


// bits_transmute(): Returns the bits representing a floating point value, by
//                   way of std::mem::transmute()
//
// For completeness (and validation), and to make it clear the fundamentally 
// unnecessary nature of the exercise :D - here's a short, straightforward, 
// library-based version. But it needs the const_transmute flag and an unsafe 
// block.
#![feature(const_transmute)]
const fn bits_transmute(f: f32) -> u32 {
  unsafe { std::mem::transmute::<f32, u32>(f) }
}



// get_if_u32(predicate:bool, if_true: u32, if_false: u32):
//   Returns if_true if predicate is true, else if_false
//
// If and match are not able to be used in const functions (at least, not 
// without #![feature(const_if_match)] - so here's a branch-free select function
// for u32s
const fn get_if_u32(predicate: bool, if_true: u32, if_false: u32) -> u32 {
  let pred_mask = (-1 * (predicate as i32)) as u32;
  let true_val = if_true & pred_mask;
  let false_val = if_false & !pred_mask;
  true_val | false_val
}

// get_if_f32(predicate, if_true, if_false):
//   Returns if_true if predicate is true, else if_false
//
// A branch-free select function for f32s.
// 
// If either if_true or if_false is NaN or an infinity, the result will be NaN,
// which is not ideal. I don't know of a better way to implement this function
// within the arbitrary limitations of this silly little side quest.
const fn get_if_f32(predicate: bool, if_true: f32, if_false: f32) -> f32 {
  // can't convert bool to f32 - but can convert bool to i32 to f32
  let pred_sel = (predicate as i32) as f32;
  let pred_not_sel = ((!predicate) as i32) as f32;
  let true_val = if_true * pred_sel;
  let false_val = if_false * pred_not_sel;
  true_val + false_val
}


// bits(): Returns the bits representing a floating point value.
const fn bits(f: f32) -> u32 {
  // the result value, initialized to a NaN value that will otherwise not be
  // produced by this function.
  let mut r = 0xffff_ffff;

  // These floating point operations (and others) cause the following error:
  //     only int, `bool` and `char` operations are stable in const fn
  // hence #![feature(const_fn)] at the top of the file
  
  // Identify special cases
  let is_zero    = f == 0_f32;
  let is_inf     = f == f32::INFINITY;
  let is_neg_inf = f == f32::NEG_INFINITY;
  let is_nan     = f != f;

  // Writing this as !(is_zero || is_inf || ...) causes the following error:
  //     Loops and conditional expressions are not stable in const fn
  // so instead write this as type conversions, and bitwise operations
  //
  // "normalish" here means that f is a normal or subnormal value
  let is_normalish = 0 == ((is_zero as u32) | (is_inf as u32) | 
                        (is_neg_inf as u32) | (is_nan as u32));

  // set the result value for each of the special cases
  r = get_if_u32(is_zero,    0,           r); // if (is_zero)    { r = 0; }
  r = get_if_u32(is_inf,     0x7f80_0000, r); // if (is_inf)     { r = 0x7f80_0000; }
  r = get_if_u32(is_neg_inf, 0xff80_0000, r); // if (is_neg_inf) { r = 0xff80_0000; }
  r = get_if_u32(is_nan,     0x7fc0_0000, r); // if (is_nan)     { r = 0x7fc0_0000; }
 
  // It was tempting at this point to try setting f to a "normalish" placeholder 
  // value so that special cases do not have to be handled in the code that 
  // follows, like so:
  // f = get_if_f32(is_normal, f, 1_f32);
  //
  // Unfortunately, get_if_f32() returns NaN if either input is NaN or infinite.
  // Instead of switching the value, we work around the non-normalish cases 
  // later.
  //
  // (This whole function is branch-free, so all of it is executed regardless of 
  // the input value)

  // extract the sign bit
  let sign_bit  = get_if_u32(f < 0_f32,  1, 0);

  // compute the absolute value of f
  let mut abs_f = get_if_f32(f < 0_f32, -f, f);

  
  // This part is a little complicated. The algorithm is functionally the same 
  // as the C++ version linked from the top of the file.
  // 
  // Because of the various contrived constraints on this problem, we compute 
  // the exponent and significand, rather than extract the bits directly.
  //
  // The idea is this:
  // Every finite single precision floating point number can be represented as
  // a series of (at most) 24 significant bits within a 128.149 fixed point number 
  // (128: 126 exponent values >= 0, plus one for the implicit leading 1, plus 
  // one more so that the decimal point falls on a power-of-two boundary :)
  // 149: 126 negative exponent values, plus 23 for the bits of precision in the 
  // significand.)
  //
  // If we are able to scale the number such that all of the precision bits fall 
  // in the upper-most 64 bits of that fixed-point representation (while 
  // tracking our effective manipulation of the exponent), we can then 
  // predictably and simply scale that computed value back to a range that can 
  // be converted safely to a u64, count the leading zeros to determine the 
  // exact exponent, and then shift the result into position for the final u32 
  // representation.
  
  // Start with the largest possible exponent - subsequent steps will reduce 
  // this number as appropriate
  let mut exponent: u32 = 254;
  {
    // Hex float literals are really nice. I miss them.

    // The threshold is 2^87 (think: 64+23 bits) to ensure that the number will 
    // be large enough that, when scaled down by 2^64, all the precision will 
    // fit nicely in a u64
    const THRESHOLD: f32 = 154742504910672534362390528_f32; // 0x1p87f == 2^87

    // The scaling factor is 2^41 (think: 64-23 bits), small enough that a 
    // number just below the 2^87 threshold will not overflow in a single 
    // scaling step.
    const SCALE_UP: f32 = 2199023255552_f32; // 0x1p41f == 2^41

    // Because loops are not available (no #![feature(const_loops)], and 'if' is
    // not available (no #![feature(const_if_match)]), perform repeated branch-
    // free conditional multiplication of abs_f.

    // use a macro, because why not :D It's the most compact, simplest option I 
    // could find.
    macro_rules! maybe_scale {
      () => {{
        // care is needed: if abs_f is above the threshold, multiplying by 2^41 
        // will cause it to overflow (INFINITY) which will cause get_if_f32() to
        // return NaN, which will destroy the value in abs_f. So compute a safe 
        // scaling factor for each iteration.
        //
        // Roughly equivalent to:
        // if (abs_f < THRESHOLD) {
        //   exponent -= 41;
        //   abs_f *= SCALE_UP;
        // }
        let scale = get_if_f32(abs_f < THRESHOLD, SCALE_UP,      1_f32);    
        exponent  = get_if_u32(abs_f < THRESHOLD, exponent - 41, exponent); 
        abs_f     = get_if_f32(abs_f < THRESHOLD, abs_f * scale, abs_f);
      }}
    }
    // 41 bits per iteration means up to 246 bits shifted.
    // Even the smallest subnormal value will end up in the desired range.
    maybe_scale!();  maybe_scale!();  maybe_scale!();
    maybe_scale!();  maybe_scale!();  maybe_scale!();
  }

  // Now that we know that abs_f is in the desired range (2^87 <= abs_f < 2^128)
  // scale it down to be in the range (2^23 <= _ < 2^64), and convert without 
  // loss of precision to u64.
  const INV_2_64: f32 = 5.42101086242752217003726400434970855712890625e-20_f32; // 0x1p-64f == 2^-64
  let a = (abs_f * INV_2_64) as u64;

  // Count the leading zeros.
  // (C++ doesn't provide a compile-time constant function for this. It's nice 
  // that rust does :)
  let mut lz = a.leading_zeros();

  // if the number isn't normalish, lz is meaningless: we stomp it with 
  // something that will not cause problems in the computation that follows - 
  // the result of which is meaningless, and will be ignored in the end for 
  // non-normalish values.
  lz = get_if_u32(!is_normalish, 0, lz); // if (!is_normalish) { lz = 0; }

  {
    // This step accounts for subnormal numbers, where there are more leading 
    // zeros than can be accounted for in a valid exponent value, and leading 
    // zeros that must remain in the final significand.
    //
    // If lz < exponent, reduce exponent to its final correct value - lz will be
    // used to remove all of the leading zeros.
    //
    // Otherwise, clamp exponent to zero, and adjust lz to ensure that the 
    // correct number of bits will remain (after multiplying by 2^41 six times - 
    // 2^246 - there are 7 leading zeros ahead of the original subnormal's
    // computed significand of 0.sss...)
    // 
    // The following is roughly equivalent to:
    // if (lz < exponent) {
    //   exponent = exponent - lz;
    // } else {
    //   exponent = 0;
    //   lz = 7;
    // }

    // we're about to mess with lz and exponent - compute and store the relative 
    // value of the two
    let lz_is_less_than_exponent = lz < exponent;

    lz       = get_if_u32(!lz_is_less_than_exponent, 7,             lz);
    exponent = get_if_u32( lz_is_less_than_exponent, exponent - lz, 0);
  }

  // compute the final significand.
  // + 1 shifts away a leading 1-bit for normal, and 0-bit for subnormal values
  // Shifts are done in u64 (that leading bit is shifted into the void), then
  // the resulting bits are shifted back to their final resting place.
  let significand = ((a << (lz + 1)) >> (64 - 23)) as u32;

  // combine the bits
  let computed_bits = (sign_bit << 31) | (exponent << 23) | significand;

  // return the normalish result, or the non-normalish result, as appropriate
  get_if_u32(is_normalish, computed_bits, r)
}


// Compile-time validation - able to be examined in rust.godbolt.org output
pub static BITS_BIGNUM: u32 = bits(std::f32::MAX);
pub static TBITS_BIGNUM: u32 = bits_transmute(std::f32::MAX);
pub static BITS_LOWER_THAN_MIN: u32 = bits(7.0064923217e-46_f32);
pub static TBITS_LOWER_THAN_MIN: u32 = bits_transmute(7.0064923217e-46_f32);
pub static BITS_ZERO: u32 = bits(0.0f32);
pub static TBITS_ZERO: u32 = bits_transmute(0.0f32);
pub static BITS_ONE: u32 = bits(1.0f32);
pub static TBITS_ONE: u32 = bits_transmute(1.0f32);
pub static BITS_NEG_ONE: u32 = bits(-1.0f32);
pub static TBITS_NEG_ONE: u32 = bits_transmute(-1.0f32);
pub static BITS_INF: u32 = bits(std::f32::INFINITY);
pub static TBITS_INF: u32 = bits_transmute(std::f32::INFINITY);
pub static BITS_NEG_INF: u32 = bits(std::f32::NEG_INFINITY);
pub static TBITS_NEG_INF: u32 = bits_transmute(std::f32::NEG_INFINITY);
pub static BITS_NAN: u32 = bits(std::f32::NAN);
pub static TBITS_NAN: u32 = bits_transmute(std::f32::NAN);
pub static BITS_COMPUTED_NAN: u32 = bits(std::f32::INFINITY/std::f32::INFINITY);
pub static TBITS_COMPUTED_NAN: u32 = bits_transmute(std::f32::INFINITY/std::f32::INFINITY);


// Run-time validation of many more values
fn main() {
  let end: usize = 0xffff_ffff;
  let count = 9_876_543; // number of values to test
  let step = end / count;
  for u in (0..=end).step_by(step) {
      let v = u as u32;
      
      // reference
      let f = unsafe { std::mem::transmute::<u32, f32>(v) };
      
      // compute
      let c = bits(f);

      // validation
      if c != v && 
         !(f.is_nan() && c == 0x7fc0_0000) && // nans
         !(v == 0x8000_0000 && c == 0) { // negative 0
          println!("{:x?} {:x?}", v, c); 
      }
  }
}

May 10, 2020

IT Asset Management

In my last full-time position I managed the asset tracking database for my employer. It was one of those things that “someone” needed to do, and it seemed that the only way “someone” wouldn’t equate to “no-one” was for me to do it – which was OK. We used Snipe IT [1] to track the assets. I don’t have enough experience with asset tracking to say whether Snipe is better or worse than average, but it basically did the job. Asset serial numbers are stored, you can have asset types that allow you to just add one more of a particular item, purchase dates are stored (which makes warranty tracking easier), and every asset is associated with a person or listed as available. While I can’t say that Snipe IT is better than other products, I can say that it will do the job reasonably well.

One problem that I didn’t discover until way too late was that the finance people weren’t tracking serial numbers, and that some assets in the database had the same asset IDs as the finance department while some had different ones. The best advice I can give to anyone who gets involved with asset tracking is to immediately chat to finance about how they track things; you need to know whether the same asset IDs are used and whether serial numbers are tracked by finance. I was pleased to discover that my colleagues were all honourable people, as there was no apparent evaporation of valuable assets even though there was little ability to discover who might have been the last person to use some of them.

One problem that I’ve seen at many places is treating small items like keyboards and mice as “assets”. I think that anything worth less than 1 hour’s pay at the minimum wage (the price of a typical PC keyboard or mouse) isn’t worth tracking; treat it as a disposable item. If you hire a programmer who requests an unusually expensive keyboard or mouse (as some do), it still won’t be a lot of money when compared to their salary. Some of the older keyboards and mice that companies have are nasty – months of people eating lunch over them leaves them greasy and sticky. I think the best thing to do with keyboards and mice is to give them away when people leave, and buy new hardware for new people when they join the company. If a company can’t spend $25 on a new keyboard and mouse for each new employee then they either have a massive staff turnover problem or a lack of priority on morale.

A breadmaker loaf my kids will actually eat


My dad asked me to document some of my baking experiments from the recent natural disasters, which I wanted to do anyway so that I could remember the recipes. It’s taken me a while to get around to though, because animated GIFs on reddit are a terrible medium for recipe storage, and because I’ve been distracted with other shiny objects. That said, let’s start with the basics — a breadmaker bread that my kids will actually eat.

A loaf of bread baked in the oven

This recipe took a bunch of iterations to get right over the last year or so, but I’ll spare you the long boring details. However, I suspect part of the problem is that the recipe varies by bread maker. Oh, and the salt is really important — don’t skip the salt!

Wet ingredients (add first)

  • 1.5 cups of warm water (we have an instantaneous gas hot water system, so I pick 42 degrees)
  • 0.25 cups of oil (I use bran oil)

Dry ingredients (add second)

I just kind of chuck these in, although I tend to put the non-flour ingredients in a corner together for reasons that I can’t explain.

  • 3.5 cups of bakers flour (must be bakers flour, not plain flour)
  • 2 teaspoons of instant yeast (we keep it in the freezer in a big packet, not the sachets)
  • 4 teaspoons of white sugar
  • 1 teaspoon of salt
  • 2 teaspoons of bread improver

I then just let my bread maker do its thing, which takes about three hours including baking. If I am going to bake the bread in the oven, then the dough cycle takes about two hours, and I let the dough rise for another 30 to 60 minutes before baking.

A loaf of bread from the bread maker

To be honest, I think the result from the oven is better, but it’s a little more work. The bread maker loaves are a bit prone to collapsing (you can see it starting in the example above), and there is a big kneading hook indent in the middle of the bottom of the loaf.

The oven baking technique took a while to develop, but I’ll cover that in a later post.


May 06, 2020

About Reopening Businesses

Currently there is political debate about when businesses should be reopened after the Covid19 quarantine.

Small Businesses

One argument for reopening things is for the benefit of small businesses. The first thing to note is that the protests in the US say “I need a haircut” not “I need to cut people’s hair”. Small businesses won’t benefit from reopening sooner.

For every business there is a certain minimum number of customers needed to be profitable. There are many comments from small business owners who want things to remain shut down. When the government has declared a shutdown, paused rent payments, and provided social security to employees who aren’t working, a small business can avoid bankruptcy. If they suddenly have to pay salaries or make redundancy payouts, and have to pay rent while they can’t make a profit due to customers staying home, they will go bankrupt.

Many restaurants and cafes make little or no profit at most times of the week (I used to be 1/3 owner of an Internet cafe and know this well). For such a company to be viable you have to be open most of the time so customers can expect you to be open. Generally you don’t keep a cafe open at 3PM to make money at 3PM, you keep it open so people can rely on there being a cafe open there; someone who buys a can of soda at 3PM one day might come back for lunch at 1:30PM the next day because they know you are open. A large portion of the opening hours of most retail companies can be considered either as advertising for trade at the profitable hours, or as loss-making times that you can’t close because you can’t send an employee home for an hour.

If you have seating for 28 people (as my cafe did) then for about half the opening hours you will probably have 2 or fewer customers in there at any time, for about a quarter the opening hours you probably won’t cover the salary of the one person on duty. The weekend is when you make the real money, especially Friday and Saturday nights when you sometimes get all the seats full and people coming in for takeaway coffee and snacks. On Friday and Saturday nights the 60 seat restaurant next door to my cafe used to tell customers that my cafe made better coffee. It wasn’t economical for them to have a table full for an hour while they sell a few cups of coffee, they wanted customers to leave after dessert and free the table for someone who wants a meal with wine (alcohol is the real profit for many restaurants).

The plans of reopening with social distancing means that a 28 seat cafe can only have 14 chairs or less (some plans have 25% capacity which would mean 7 people maximum). That means decreasing the revenue of the most profitable times by 50% to 75% while also not decreasing the operating costs much. A small cafe has 2-3 staff when it’s crowded so there’s no possibility of reducing staff by 75% when reducing the revenue by 75%.

My Internet cafe would have closed immediately if forced to operate in the proposed social distancing model. It would have been 1/4 of the trade and about 1/8 of the profit at the most profitable times, even if enough customers are prepared to visit – and social distancing would kill the atmosphere. Most small businesses are barely profitable anyway, most small businesses don’t last 4 years in normal economic circumstances.

This reopen movement is about cutting unemployment benefits not about helping small business owners. Destroying small businesses is also good for big corporations, kill the small cafes and restaurants and McDonald’s and Starbucks will win. I think this is part of the motivation behind the astroturf campaign for reopening businesses.

Forbes has an article about this [1].

Psychological Issues

Some people claim that we should reopen businesses to help people who have psychological problems from isolation, to help victims of domestic violence who are trapped at home, to stop older people being unemployed for the rest of their lives, etc.

Here is one article with advice for policy makers from domestic violence experts [2]. One thing it mentions is that the primary US federal government program to deal with family violence had a budget of $130M in 2013. The main thing that should be done about family violence is to make it a priority at all times (not just when it can be a reason for avoiding other issues) and allocate some serious budget to it. An agency that deals with problems that affect families and only has a budget of $1 per family per year isn’t going to be able to do much.

There are ongoing issues of people stuck at home for various reasons. We could work on better public transport to help people who can’t drive. We could work on better healthcare to help some of the people who can’t leave home due to health problems. We could have more budget for carers to help people who can’t leave home without assistance. Wanting to reopen restaurants because some people feel isolated is ignoring the fact that social isolation is a long term ongoing issue for many people, and that many of the people who are affected can’t even afford to eat at a restaurant!

Employment discrimination against people in the 50+ age range is an ongoing thing, many people in that age range know that if they lose their job and can’t immediately find another they will be unemployed for the rest of their lives. Reopening small businesses won’t help that, businesses running at low capacity will have to lay people off and it will probably be the older people. Also the unemployment system doesn’t deal well with part time work. The Australian system (which I think is similar to most systems in this regard) reduces the unemployment benefits by $0.50 for every dollar that is earned in part time work, that effectively puts people who are doing part time work because they can’t get a full-time job in the highest tax bracket! If someone is going to pay for transport to get to work, work a few hours, then get half the money they earned deducted from unemployment benefits it hardly makes it worthwhile to work. While the exact health impacts of Covid19 aren’t well known at this stage it seems very clear that older people are disproportionately affected, so forcing older people to go back to work before there is a vaccine isn’t going to help them.

When it comes to these discussions I think we should be very suspicious of people who raise issues they haven’t previously shown interest in. If the discussion of reopening businesses seems to be someone’s first interest in the issues of mental health, social security, etc then they probably aren’t that concerned about such issues.

I believe that we should have a Universal Basic Income [3]. I believe that we need to provide better mental health care and challenge the gender ideas that hurt men and cause men to hurt women [4]. I believe that we have significant ongoing problems with inequality not small short term issues [5]. I don’t think that any of these issues require specific changes to our approach to preventing the transmission of disease. I also think that we can address multiple issues at the same time, so it is possible for the government to devote more resources to addressing unemployment, family violence, etc while also dealing with a pandemic.

May 03, 2020

Backing up to a GnuBee PC 2

After installing Debian buster on my GnuBee, I set it up for receiving backups from my other computers.

Software setup

I started by configuring it like a typical server, but without a few packages that take a lot of memory or CPU.

I changed the default hostname:

  • /etc/hostname: foobar
  • /etc/mailname: foobar.example.com
  • /etc/hosts: 127.0.0.1 foobar.example.com foobar localhost

and then installed the avahi-daemon package to be able to reach this box using foobar.local.

I noticed the presence of a world-writable directory and so I tightened the security of some of the default mount points by putting the following in /etc/rc.local:

chmod 755 /etc/network
exit 0

Hardware setup

My OS drive (/dev/sda) is a small SSD so that the GnuBee can run silently when the spinning disks aren't needed. To hold the backup data on the other hand, I got three 4 TB drives, which I set up in a RAID-5 array. If the data were valuable, I'd use RAID-6 instead since it can survive two drives failing at the same time, but in this case, since it's only holding backups, I'd have to lose the original machine at the same time as two of the three drives, a very unlikely scenario.

I created new GPT partition tables on /dev/sdb, /dev/sdc and /dev/sdd, and used fdisk to create a single partition of type 29 (Linux RAID) on each of them.

Then I created the RAID array:

mdadm --create /dev/md127 --level=raid5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1

and waited more than 24 hours for that operation to finish. Next, I formatted the array:

mkfs.ext4 -m 0 /dev/md127

and added the following to /etc/fstab:

/dev/md127 /mnt/data/ ext4 noatime,nodiratime 0 2
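
An extra step worth considering at this point (my addition, not part of the original setup): record the array in mdadm.conf so that it gets assembled with a consistent name at boot, and keep an eye on the initial sync via /proc/mdstat:

# Watch the resync progress (it's slow on the GnuBee)
cat /proc/mdstat

# Persist the array definition and rebuild the initramfs
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
update-initramfs -u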

Keeping a copy of the root partition

In order to survive a failing SSD drive, I could have bought a second SSD and gone for a RAID-1 setup. Instead, I went for a cheaper option, a poor man's RAID-1, where I will have to reinstall the machine but it will be very quick and I won't lose any of my configuration.

The way that it works is that I periodically sync the contents of the root partition onto the RAID-5 array using a cronjob in /etc/cron.d/hdd-sync:

0 10 * * *     root    /usr/local/sbin/ssd_root_backup

which runs the /usr/local/sbin/ssd_root_backup script:

#!/bin/sh
nocache nice ionice -c3 rsync -aHx --delete --exclude=/dev/* --exclude=/proc/* --exclude=/sys/* --exclude=/tmp/* --exclude=/mnt/* --exclude=/lost+found/* --exclude=/media/* --exclude=/var/tmp/* /* /mnt/data/root/
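
Restoring is the manual part of this poor man's RAID-1. A rough sketch of what recovery might look like (hypothetical device and mount points, not something I've written up properly): reinstall a minimal Debian on a new SSD, assemble the array, then copy the configuration back:

mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1
mount /dev/md127 /mnt/data
rsync -aHx /mnt/data/root/etc/ /etc/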

Drive spin down

To reduce unnecessary noise and reduce power consumption, I also installed hdparm:

apt install hdparm

and configured all spinning drives to spin down after being idle for 10 minutes (spindown_time is in units of 5 seconds, so 120 means 600 seconds) by putting the following in /etc/hdparm.conf:

/dev/sdb {
       spindown_time = 120
}

/dev/sdc {
       spindown_time = 120
}

/dev/sdd {
       spindown_time = 120
}

and then reloaded the configuration:

 /usr/lib/pm-utils/power.d/95hdparm-apm resume

Monitoring drive health

Finally I set up smartmontools by putting the following in /etc/smartd.conf, where the -s expression schedules a short self-test every night at 2am and a long self-test every Saturday at 3am:

/dev/sda -a -o on -S on -s (S/../.././02|L/../../6/03)
/dev/sdb -a -o on -S on -s (S/../.././02|L/../../6/03)
/dev/sdc -a -o on -S on -s (S/../.././02|L/../../6/03)
/dev/sdd -a -o on -S on -s (S/../.././02|L/../../6/03)

and restarting the daemon:

systemctl restart smartd.service

Backup setup

I started by using duplicity since I have been using that tool for many years, but a 190GB backup took around 15 hours on the GnuBee with gigabit ethernet.

After a friend suggested it, I took a look at restic and I have to say that I am impressed. The same backup finished in about half the time.

User and ssh setup

After hardening the ssh setup as I usually do, I created a user account for each machine that needs to back up onto the GnuBee:

adduser machine1
adduser machine1 sshuser
adduser machine1 sftponly
chsh machine1 -s /bin/false

and then matching directories under /mnt/data/home/:

mkdir /mnt/data/home/machine1
chown machine1:machine1 /mnt/data/home/machine1
chmod 700 /mnt/data/home/machine1
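
The sshuser and sftponly groups come from my usual ssh hardening. If you don't already have equivalents, an sshd_config stanza along these lines (a sketch of the intent, not the exact configuration from that hardening setup) will confine members of sftponly to sftp:

Match Group sftponly
    ForceCommand internal-sftp
    AllowTcpForwarding no
    X11Forwarding no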

Then I created a custom ssh key for each machine:

ssh-keygen -f /root/.ssh/foobar_backups -t ed25519

and placed it in /home/machine1/.ssh/authorized_keys on the GnuBee.

On each machine, I added the following to /root/.ssh/config:

Host foobar.local
    User machine1
    Compression no
    Ciphers aes128-ctr
    IdentityFile /root/backup/foobar_backups
    IdentitiesOnly yes
    ServerAliveInterval 60
    ServerAliveCountMax 240

The reason for setting the ssh cipher and disabling compression is to speed up the ssh connection as much as possible, given that the GnuBee has very limited RAM bandwidth.

Another performance-related change I made on the GnuBee was switching to the internal sftp server by putting the following in /etc/ssh/sshd_config:

Subsystem      sftp    internal-sftp

Restic script

After reading through the excellent restic documentation, I wrote the following backup script, based on my old duplicity script, to reuse on all of my computers:

# Configure for each host
PASSWORD="XXXX"  # use `pwgen -s 64` to generate a good random password
BACKUP_HOME="/root/backup"
REMOTE_URL="sftp:foobar.local:"
RETENTION_POLICY="--keep-daily 7 --keep-weekly 4 --keep-monthly 12 --keep-yearly 2"

# Internal variables
SSH_IDENTITY="IdentityFile=$BACKUP_HOME/foobar_backups"
EXCLUDE_FILE="$BACKUP_HOME/exclude"
PKG_FILE="$BACKUP_HOME/dpkg-selections"
PARTITION_FILE="$BACKUP_HOME/partitions"

# If the list of files has been requested, only do that
if [ "$1" = "--list-current-files" ]; then
    RESTIC_PASSWORD=$PASSWORD restic --quiet -r $REMOTE_URL ls latest
    exit 0

# Show list of available snapshots
elif [ "$1" = "--list-snapshots" ]; then
    RESTIC_PASSWORD=$PASSWORD restic --quiet -r $REMOTE_URL snapshots
    exit 0

# Restore the given file
elif [ "$1" = "--file-to-restore" ]; then
    if [ "$2" = "" ]; then
        echo "You must specify a file to restore"
        exit 2
    fi
    RESTORE_DIR="$(mktemp -d ./restored_XXXXXXXX)"
    RESTIC_PASSWORD=$PASSWORD restic --quiet -r $REMOTE_URL restore latest --target "$RESTORE_DIR" --include "$2" || exit 1
    echo "$2 was restored to $RESTORE_DIR"
    exit 0

# Delete old backups
elif [ "$1" = "--prune" ]; then
    # Expire old backups
    RESTIC_PASSWORD=$PASSWORD restic --quiet -r $REMOTE_URL forget $RETENTION_POLICY

    # Delete files which are no longer necessary (slow)
    RESTIC_PASSWORD=$PASSWORD restic --quiet -r $REMOTE_URL prune
    exit 0

# Catch invalid arguments
elif [ "$1" != "" ]; then
    echo "Invalid argument: $1"
    exit 1
fi

# Check the integrity of existing backups
RESTIC_PASSWORD=$PASSWORD restic --quiet -r $REMOTE_URL check || exit 1

# Dump list of Debian packages
dpkg --get-selections > $PKG_FILE

# Dump partition tables from harddrives
/sbin/fdisk -l /dev/sda > $PARTITION_FILE
/sbin/fdisk -l /dev/sdb >> $PARTITION_FILE

# Do the actual backup
RESTIC_PASSWORD=$PASSWORD restic --quiet --cleanup-cache -r $REMOTE_URL backup / --exclude-file $EXCLUDE_FILE
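
Besides the scheduled runs, the script can also be invoked by hand; for example (assuming it was saved as /root/backup/backup-machine1-to-foobar, as in the cronjob below):

# List the available snapshots
/root/backup/backup-machine1-to-foobar --list-snapshots

# Restore a single file into a temporary directory
/root/backup/backup-machine1-to-foobar --file-to-restore /etc/fstab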

I run it with the following cronjob in /etc/cron.d/backups:

30 8 * * *    root  ionice nice nocache /root/backup/backup-machine1-to-foobar
30 2 * * Sun  root  ionice nice nocache /root/backup/backup-machine1-to-foobar --prune

in a way that doesn't impact the rest of the system too much.

Finally, I printed a copy of each of my backup scripts, using enscript, to stash in a safe place:

enscript --highlight=bash --style=emacs --output=- backup-machine1-to-foobar | ps2pdf - > foobar.pdf

This is actually a pretty important step since without the password, you won't be able to decrypt and restore what's on the GnuBee.

May 02, 2020

Audiobooks – April 2020

Cockpit Confidential: Everything You Need to Know About Air Travel: Questions, Answers, and Reflections by Patrick Smith

Lots of “you always wanted to know” & “this is how it really is” bits about commercial flying. Good fun 4/5

The Day of the Jackal by Frederick Forsyth

A very tightly written thriller about a fictional 1963 plot to assassinate French President Charles de Gaulle. Fast moving, detailed and captivating 5/5

Topgun: An American Story by Dan Pedersen

Memoir from the first officer in charge of the US Navy’s Top Gun school. A mix of his life & career, the school and US Navy air history (especially during Vietnam). Excellent 4/5

Radicalized: Four Tales of Our Present Moment
by Cory Doctorow

4 short stories set in more-or-less the present day. They all work fairly well. Worth a read. Spoilers in the link. 3/5

On the Banks of Plum Creek: Little House Series, Book 4 by Laura Ingalls Wilder

The family settle in Minnesota and build a new farm. Various major and minor adventures. I’m struck how few possessions people had back then. 3/5

My Father’s Business: The Small-Town Values That Built Dollar General into a Billion-Dollar Company by Cal Turner Jr.

A mix of personal and company history. I found the early story of the company and personal stuff the most interesting. 3/5

You Can’t Fall Off the Floor: And Other Lessons from a Life in Hollywood by Harris and Nick Katleman

Memoir by a former studio exec and head. Lots of funny and interesting stories from his career, featuring plenty of famous names. 4/5

The Wave: In Pursuit of the Rogues, Freaks and Giants of the Ocean by Susan Casey

75% about big-wave tow-surfers, with chapters on scientists and shipping industry people mixed in. Competent, but the author’s heart seemed mostly in the surfing. 3/5


April 27, 2020

Install the COVIDSafe app

I can’t think of a more unequivocal title than that. 🙂

The Australian government doesn’t have a good track record of either launching publicly visible software projects or respecting privacy, so I’ve naturally been sceptical of the contact tracing app since it was announced. The good news is that, while it has some relatively minor problems, it appears to be a solid first version.

Privacy

While the source code is yet to be released, the Android version has already been decompiled, and public analysis is showing that it only collects necessary information, and only uploads contact information to the government servers when you press the button to upload (you should only press that button if you actually get COVID-19, and are asked to upload it by your doctor).

The legislation around the app is also clear that the data you upload can only be accessed by state health officials. Commonwealth departments have no access, neither do non-health departments (eg, law enforcement, intelligence).

Technical

It does what it’s supposed to do, and hasn’t been found to open you up to risks by installing it. There are a lot of people digging into it, so I would expect any significant issues to be found, reported, and fixed quite quickly.

Some parts of it are a bit rushed, and the way it scans for contacts could be more battery efficient (that should hopefully be fixed in the coming weeks when Google and Apple release updates that these contact tracing apps can use).

If it produces useful data, however, I’m willing to put up with some quirks. 🙂

Usefulness

I’m obviously not an epidemiologist, but those I’ve seen talk about it say that yes, the data this app produces will be useful for augmenting the existing contact tracing efforts. There were some concerns that it could produce a lot of junk data that wastes time, but I trust the expert contact tracing teams to filter and prioritise the data they get from it.

Install it!

The COVIDSafe site has links to the app in Apple’s App Store, as well as Google’s Play Store. Setting it up takes a few minutes, and then you’re done!

April 26, 2020

YouTube Channels I subscribe to in April 2020

I did a big twitter thread of the YouTube channels I am following. Below is a copy of the tweets. They are a quick description of the channel and a link to a sample video.

Lots of pop-Science and TV/Movie analysis channels plus a few on other topics.

I should mention that I watch the majority of YouTube videos at 1.5x speed, since presenters usually speak quite slowly. To speed up videos, click on the settings “cog” and then select “Playback Speed”. YouTube lets you go up to 2x.


Chris Stuckmann reviews movies. During normal times he does a couple per week. Mostly current releases with some old ones. His reviews are low-spoiler although sometimes he’ll do an extra “Spoiler Review”. Usually around 6 minutes long.
Star Wars: The Rise of Skywalker – Movie Review

Wendover Productions does explainer videos. Air & Sea travel are quite common topics. Usually a bit better researched than some of the other channels and a little longer at around 12 minutes. Around 1 video per week.
The Logistics of the US Census

City Beautiful is a channel about cities and City planning. 1-2 videos per month. Usually around 10 minutes. Pitched for the amateur city and planning enthusiast
Where did the rules of the road come from?

PBS Eons does videos about the history of life on Earth. Lots of Dinosaurs, early humans and the like. Run and advised by experts so info is great quality. Links to refs! Accessible but dives into the detail. Around 1 video/week. About 10 minutes each.
How the Egg Came First

Pitch Meetings are a writer pitching a real (usually recent) movie or show to a studio exec. Both are played by Ryan George. Very funny. Part of the Screen Rant channel, but I don’t watch their other stuff.
Playlist
Netflix’s Tiger King Pitch Meeting

MrMobile [Michael Fisher] reviews Phones, Laptops, Smart Watches & other tech gadgets. Usually about one video/week. I like the descriptive style and good production values, Not too much spec flooding.
A Stunning Smartwatch With A Familiar Failing – New Moto 360 Review

Verge Science does professional level stories about a range of Science topics. They usually are out in the field with Engineers and scientists.
Why urban coyote sightings are on the rise

Alt Shift X do detailed explainer videos about Books & TV Shows like Game of Thrones, Watchmen & Westworld. Huge amounts of detail and a great style with a wall of pictures. Weekly videos when shows are on plus subscriber extras.
Watchmen Explained (original comic)

The B1M talks about building and construction projects. Many videos are done with cooperation of the architects or building companies so a bit fluffy at times. But good production values and interesting topics.
The World’s Tallest Modular Hotel

CineFix does a variety of movie-related videos. Over the last year they have only been putting out about one or two per month, but mostly high quality. A few years ago they were at higher volume and had more throw-aways.
Jojo Rabbit – What’s the Difference?

Marques Brownlee (MKBHD) does tech reviews. Mainly phones but also other gear and the odd special. His videos are extremely high quality and well researched. Averaging 2 videos per week.
Samsung Galaxy S20 Ultra Review: Attack of the Numbers!

How it Should have Ended does cartoons of funny alternative endings for movies. Plus some other long running series. Usually only a few minutes long.
Avengers Endgame Alternate HISHE

Power Play Chess is a Chess channel from Daniel King. He usually covers 1 round/day from major tournaments as well as reviewing older games and other videos.
World Champion tastes the bullet | Firouzja vs Carlsen | Lichess Bullet match 2020

Tom Scott makes explainer videos mostly about science, technology and geography. Often filmed on site rather than being talks over pictures like other channels.
Inside The Billion-Euro Nuclear Reactor That Was Never Switched On

Screen Junkies does stuff about movies. I mostly watch their “Honest Trailers” but they sometimes do ‘Serious Questions” which are good too.
Honest Trailers | Terminator: Dark Fate

Half as Interesting is an offshoot of Wendover Productions (see above). It does shorter 3-5 minute weekly videos on a quick amusing fact or happening (that doesn’t justify a longer video).
United Airlines’ Men-Only Flights

Red Team Review is another movie and TV review channel. I was mostly watching them when Game of Thrones was on and since then they have had a bit less content. They are making some Game of Thrones videos narrated by the TV actors though
Game of Thrones Histories & Lore – The Rains of Castamere

Signum University do online classes about Fantasy (especially Tolkien) and related literature. Their channel features their classes and related videos. I mainly follow “Exploring The Lord of the Rings”. Often sounds better at 2x or 3x speed.
A Wizard of Earthsea: Session 01 – Mageborn

The Nerdwriter does approx monthly videos. Usually about a specific type of art, a painting or film making technique. Very high quality
How Walter Murch Worldized Film Sound

Real Life Lore does infotainment videos. “Answers to questions that you’ve never asked. Mostly over topics like history, geography, economics and science”.
This Was the World’s Most Dangerous Amusement Park

Janice Fung is a Sydney based youtuber who makes videos mostly about food and travel. She puts out 2 videos most weeks.
I Made the Viral Tik Tok Frothy DALGONA COFFEE! (Whipped Coffee Without Mixer!!)

Real Engineering is a bit more technical than the average pop-sci channel. They especially like doing videos covering flight dynamics, but they cover lots of other topics.
How The Ford Model T Took Over The World

Just Write by Sage Hyden puts out a video roughly once a month. They are essays, usually about writing, and usually tied into a recent movie or show.
A Disney Monopoly Is A Problem (According To Disney’s Recess)

CGP Grey makes high quality explainer videos. Around one every month. High quality and usually with lots of animation.
The Trouble With Tumbleweed

Lessons from the Screenplay are “videos that analyze movie scripts to examine exactly how and why they are so good at telling their stories”
Casino Royale — How Action Reveals Character

HaxDogma is another TV Show review/analysis channel. I started watching him for his Watchmen Series videos and now watch his Westworld ones.
Official Westworld Trailer Breakdown + 3 Hidden Trailers

Lindsay Ellis does videos mostly about pop culture, usually movies. These days she only does a few a year, but they are usually 20+ minutes.
The Hobbit: A Long-Expected Autopsy (Part 1/2)

A bonus couple of recommended courses on ‘Crash Course’:
Crash Course Astronomy with Phil Plait
Crash Course Computer Science by Carrie Anne Philbin


April 24, 2020

Disabling mail sending from your domain

I noticed that I was receiving some bounced email notifications from a domain I own (cloud.geek.nz) and use to host my blog. These notifications were all for spam messages spoofing the From address, since I do not use that domain for email.

I decided to try setting a strict DMARC policy to see if DMARC-using mail servers (e.g. GMail) would then drop these spoofed emails without notifying me about it.

I started by setting this initial DMARC policy in DNS in order to monitor the change:

@ TXT v=spf1 -all
_dmarc TXT v=DMARC1; p=none; ruf=mailto:dmarc@fmarier.org; sp=none; aspf=s; fo=0:1:d:s;

Then I waited three weeks without receiving anything before updating the relevant DNS records to this final DMARC policy:

@ TXT v=spf1 -all
_dmarc TXT v=DMARC1; p=reject; sp=reject; aspf=s;

This policy states that nobody is allowed to send emails for this domain and that any incoming email claiming to be from this domain should be silently rejected.
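
To double-check that the records are being served as intended, querying them directly (my own sanity check) should return the published policies:

dig +short TXT cloud.geek.nz
dig +short TXT _dmarc.cloud.geek.nz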

I haven't noticed any bounce notifications for messages spoofing this domain in a while, so maybe it's working?

FreeDV Beacon Maintenance

There’s been some recent interest in the FreeDV Beacon project, originally developed back in 2015. A FreeDV beacon was operating in Sunbury, VK3, for several years and was very useful for testing FreeDV.

After being approached by John (VK3IC) and Bob (VK4YA), I decided to dust off the software and bring it across to a GitHub repo. It’s now running happily on my laptop and I hope John and Bob will soon have some beacons running on the air.

I’ve added support for FreeDV 700C and 700D modes, finding a tricky bug in the process. I really should read the instructions for my own API!

Thanks also to Richard (KF5OIM) for help with the CMake build system.


April 18, 2020

Accessing USB serial devices in Fedora Silverblue

One of the things I do a lot on my Fedora machines is talk to devices via USB serial. While a device is correctly detected at /dev/ttyUSB0 and owned by the dialout group, adding myself to that group doesn’t work because the group can’t be found. This is because under Silverblue, there are two different group files (/usr/lib/group and /etc/group) with different content.

There are some easy ways to solve this, for example we can create the matching dialout group or write a udev rule. Let’s take a look!

On the host with groups

If you try to add yourself to the dialout group it will fail.

sudo gpasswd -a ${USER} dialout
gpasswd: group 'dialout' does not exist in /etc/group

Trying to re-create the group will also fail as it’s already in use.

sudo groupadd dialout -r -g 18
groupadd: GID '18' already exists

So instead, we can simply grab the entry from the OS group file and add it to /etc/group ourselves.

grep ^dialout: /usr/lib/group |sudo tee -a /etc/group

Now we are able to add ourselves to the dialout group!

sudo gpasswd -a ${USER} dialout

Activate that group in our current shell.

newgrp dialout

And now we can use a tool like screen to talk to the device (note that you will need to have installed screen with rpm-ostree and rebooted first).

screen /dev/ttyUSB0 115200

And that’s it. We can now talk to USB serial devices on the host.

Inside a container with udev

Inside a container it is a little more tricky, as the dialout group is not passed into it. Thus, inside the container the device is owned by the nobody group and the user will have no permissions to read or write to it.

One way to deal with this and still use the regular toolbox command is to create a udev rule and make yourself the owner of the device on the host, instead of root.

To do this, we create a generic udev rule for all usb-serial devices.

cat << EOF | sudo tee /etc/udev/rules.d/50-usb-serial.rules
SUBSYSTEM=="tty", SUBSYSTEMS=="usb-serial", OWNER="${USER}"
EOF

If you need to create a more specific rule, you can find other bits to match by (like kernel driver, etc) with the udevadm command.

udevadm info -a -n /dev/ttyUSB0
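
For example, a rule that matches one specific adapter by its USB vendor and product IDs (the IDs below are just an illustration for a common FTDI chip; substitute the values from your own udevadm output) might look like this:

SUBSYSTEM=="tty", ATTRS{idVendor}=="0403", ATTRS{idProduct}=="6001", OWNER="csmart"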

Once you have your rule, reload udev.

sudo udevadm control --reload-rules
sudo udevadm trigger

Now, unplug your serial device and plug it back in. You should notice that it is now owned by your user.

ls -l /dev/ttyUSB0
crw-rw----. 1 csmart dialout 188, 0 Apr 18 20:53 /dev/ttyUSB0

It should also be the same inside the toolbox container now.

[21:03 csmart ~]$ toolbox enter
⬢[csmart@toolbox ~]$ ls -l /dev/ttyUSB0 
crw-rw----. 1 csmart nobody 188, 0 Apr 18 20:53 /dev/ttyUSB0

And of course, as this is inside a container, you can just dnf install screen or whatever other program you need.

Of course, if you’re happy to create the udev rule then you don’t need to worry about the groups solution on the host.