Planet Linux Australia
Celebrating Australians & Kiwis in the Linux and Free/Open-Source community...

February 16, 2020

DisplayPort and 4K

The Problem

Video playback looks better with a higher scan rate. A lot of content that was designed for TV (e.g. almost all historical documentaries) is going to be 25Hz interlaced (UK and Australia) or 30Hz interlaced (US). If you view that on a low refresh rate progressive scan display (e.g. a modern display at 30Hz) then my observation is that it looks a bit strange. Things that move seem to jump a bit and it’s distracting.

Getting HDMI to work with 4K resolution at a refresh rate higher than 30Hz seems difficult.

What HDMI Can Do

According to the HDMI Wikipedia page [1], HDMI 1.3–1.4b (introduced in June 2006) supports 30Hz refresh at 4K resolution, and if you use 4:2:0 Chroma Subsampling (see the Chroma Subsampling Wikipedia page [2]) you can do 60Hz or 75Hz on HDMI 1.3–1.4b. Basically 4:2:0 means half the horizontal and half the vertical resolution for colour while keeping the full resolution for monochrome. For video that apparently works well (4:2:0 is standard for Blu-ray) and for games it might be OK, but for text (my primary use of computers) it would suck.
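To make the saving from chroma subsampling concrete, here is a minimal sketch of the arithmetic. It assumes 8 bits per sample and the usual reading of the J:a:b notation (a block J pixels wide and 2 rows high); the function name bits_per_pixel is made up for illustration.

```python
def bits_per_pixel(j, a, b, depth=8):
    """Average bits per pixel for J:a:b chroma subsampling.

    The notation describes a block J pixels wide and 2 rows high.
    Every pixel keeps its luma sample, while each of the two chroma
    channels keeps `a` samples in the first row and `b` in the second.
    """
    pixels = 2 * j
    luma_samples = pixels
    chroma_samples = 2 * (a + b)  # two chroma channels (Cb and Cr)
    return depth * (luma_samples + chroma_samples) / pixels


for name, scheme in [("4:4:4", (4, 4, 4)), ("4:2:2", (4, 2, 2)), ("4:2:0", (4, 2, 0))]:
    print(name, bits_per_pixel(*scheme), "bits per pixel")
```

So 4:2:0 needs half the bits of full 4:4:4 colour, which is why the same link can carry twice the refresh rate.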

So I need support for HDMI 2.0 (introduced in September 2013) on the video card and monitor to do 4K at 60Hz. Apparently none of the combinations of video card and HDMI cable I use for Linux support that.

HDMI Cables

The Wikipedia page alleges that you need either a “Premium High Speed HDMI Cable” or an “Ultra High Speed HDMI Cable” for 4K resolution at 60Hz refresh rate. My problems probably aren’t related to the cable as my testing has shown that a cheap “High Speed HDMI Cable” can work at 60Hz with 4K resolution with the right combination of video card, monitor, and drivers. A Windows 10 system I maintain has a Samsung 4K monitor and a NVidia GT630 video card running 4K resolution at 60Hz (according to Windows). The NVidia GT630 card is one that I tried on two Linux systems at 4K resolution and it caused random system crashes on both. It seems like a nice card for Windows but not for Linux.

Apparently the HDMI devices test the cable quality and use whatever speed seems to work (the cable isn’t identified to the devices). The prices at a local store are $3.98 for “high speed”, $19.88 for “premium high speed”, and $39.78 for “ultra high speed”. It seems that trying a “high speed” cable first before buying an expensive cable would make sense, especially for short cables which are likely to be less susceptible to noise.

What DisplayPort Can Do

According to the DisplayPort Wikipedia page [3] versions 1.2–1.2a (introduced in January 2010) support HBR2 which on a “Standard DisplayPort Cable” (which probably means almost all DisplayPort cables that are in use nowadays) allows 60Hz and 75Hz 4K resolution.

Comparing HDMI and DisplayPort

In summary to get 4K at 60Hz you need 2010 era DisplayPort or 2013 era HDMI. Apparently some video cards that I currently run for 4K (which were all bought new within the last 2 years) are somewhere between a 2010 and 2013 level of technology.
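A rough sketch of why those generations line up the way they do. The link capacities below are the commonly quoted maximum video data rates and are my assumption, not figures from this post. The calculation only counts visible pixels, so real video timings (which include blanking intervals) need somewhat more than it shows.

```python
# Maximum video data rates in Gbit/s after line-coding overhead.
# Commonly quoted figures, treated here as assumptions.
LINK_GBPS = {
    "HDMI 1.3-1.4b": 8.16,
    "HDMI 2.0": 14.4,
    "DisplayPort 1.2 (HBR2)": 17.28,
}


def active_pixel_gbps(width, height, refresh_hz, bits_per_pixel=24):
    """Data rate of the visible pixels only; blanking is ignored."""
    return width * height * refresh_hz * bits_per_pixel / 1e9


def links_that_fit(width, height, refresh_hz, bits_per_pixel=24):
    """Names of the links whose capacity covers the active pixel rate."""
    need = active_pixel_gbps(width, height, refresh_hz, bits_per_pixel)
    return [name for name, cap in LINK_GBPS.items() if cap >= need]


print("4K 30Hz 4:4:4:", links_that_fit(3840, 2160, 30))
print("4K 60Hz 4:4:4:", links_that_fit(3840, 2160, 60))
print("4K 60Hz 4:2:0:", links_that_fit(3840, 2160, 60, bits_per_pixel=12))
```

This is a lower bound only, but it is enough to show that full colour 4K at 60Hz cannot fit in the older HDMI data rate.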

Also my testing (and reading review sites) shows that it’s common for video cards sold in the last 5 years or so to not support HDMI resolutions above FullHD, which means they would be HDMI version 1.1 at the greatest. HDMI 1.2 was introduced in August 2005 and supports 1440p at 30Hz. PCIe was introduced in 2003 so there really shouldn’t be many PCIe video cards that don’t support HDMI 1.2. I have about 8 different PCIe video cards in my spare parts pile that don’t support HDMI resolutions higher than FullHD so it seems that such a limitation is common.

The End Result

For my own workstation I plugged a DisplayPort cable between the monitor and video card and a Linux window appeared (from KDE I think) offering me some choices about what to do. I chose to switch to the “new monitor” on DisplayPort and that defaulted to 60Hz. After that change TV shows on Netflix and Amazon Prime both look better. So it’s a good result.

As an aside DisplayPort cables are easier to scrounge as the HDMI cables get taken by non-computer people for use with their TV.

February 15, 2020

Self Assessment

Background Knowledge

The Dunning-Kruger Effect [1] is something everyone should read about. It’s the effect where people who are bad at something rate themselves higher than they deserve because their inability to notice their own mistakes prevents improvement, while people who are good at something rate themselves lower than they deserve because noticing all their mistakes is what allows them to improve.

Noticing all your mistakes all the time isn’t great (see Impostor Syndrome [2] for where this leads).

Erik Dietrich wrote an insightful article “How Developers Stop Learning: Rise of the Expert Beginner” [3] which I recommend that everyone reads. It is about how some people get stuck at a medium level of proficiency and find it impossible to unlearn bad practices which prevent them from achieving higher levels of skill.

What I’m Concerned About

A significant problem in large parts of the computer industry is that it’s not easy to compare various skills. In the sport of bowling (which Erik uses as an example) it’s easy to compare your score against people anywhere in the world: if you score 250 and people in another city score 280 then they are more skilled than you. If I design an IT project that’s 2 months late on delivery and someone else designs a project that’s only 1 month late, are they more skilled than me? That isn’t enough information to know. I’m using the number of months late as an arbitrary metric for assessing projects. IT projects tend to run late and while delivery time might not be the best metric it’s something that can be measured (note that I am slightly joking about measuring IT projects by how late they are).

If the last project I personally controlled was 2 months late and I’m about to finish a project 1 month late does that mean I’ve increased my skills? I probably can’t assess this accurately as there are so many variables. The Impostor Syndrome factor might lead me to think that the second project was easier, or I might get egotistical and think I’m really great, or maybe both at the same time.

This is one of many resources recommending timely feedback for education [4]. It says “Feedback needs to be timely” and “It needs to be given while there is still time for the learners to act on it and to monitor and adjust their own learning”. For basic programming tasks such as debugging a crashing program the feedback is reasonably quick. For longer term tasks like assessing whether the choice of technologies for a project was good the feedback cycle is almost impossibly long. If I used product A for a year long project does it seem easier than product B because it is easier or because I’ve just got used to its quirks? Did I make a mistake at the start of a year long project and if so do I remember why I made that choice I now regret?

Skills that Should be Easy to Compare

One would imagine that martial arts is a field where people have a very realistic understanding of their own skills, as a few minutes of contest in a ring, octagon, or dojo should show how your skills compare to others. But a YouTube search for “no touch knockout” or “chi” shows that there are more than a few “martial artists” who think that they can knock someone out without physical contact – with just telepathy or something. George Dillman [5] is one example of someone who had some real fighting skills until he convinced himself that he could use mental powers to knock people out. From watching YouTube videos it appears that such people convince the members of their dojo of their powers, and those people then faint on demand “proving” their mental powers.

The process of converting an entire dojo into believers in chi seems similar to the process of converting a software development team into “expert beginners”, except that martial art skills should be much easier to assess.

Is it ever possible to assess any skills if people trying to compare martial art skills often do it so badly?

Conclusion

It seems that any situation where one person is the undisputed expert has a risk of the “chi” problem if the expert doesn’t regularly meet peers to learn new techniques. If someone like George Dillman or one of the “expert beginners” that Erik Dietrich refers to were to regularly meet other people with similar skills and accept feedback from them they would be much less likely to become a “chi” master or “expert beginner”. For the computer industry meetup.com seems the best solution to this. Whatever your IT skills are you can find a meetup where you can meet people with more skills than you in some area.

Here’s one of many guides to overcoming Imposter Syndrome [5]. Actually succeeding in following the advice of such web pages is not going to be easy.

I wonder if getting a realistic appraisal of your own skills is even generally useful. Maybe the best thing is to just recognise enough things that you are doing wrong to be able to improve and to recognise enough things that you do well to have the confidence to do things without hesitation.

February 14, 2020

Bidirectional rc joystick

With a bit of tinkering one can use the https://github.com/bmellink/IBusBM library to send information back to the remote controller. The info is tagged as either temperature, rpm, or voltage and the units are set based on that. There is a limit of 9 user feedbacks so I have 3 of each exposed.


To do this I used one of the Mega 2560 boards that is in a small form factor configuration. This gave me 5 volts to run the actual rc receiver from and more than one UART to talk to the usb, input and output parts of the buses. I think you only need 2 UARTs but as I had a bunch I just used separate ones.

The 2560 also gives a lavish amount of RAM so using ROS topics doesn't really matter. I have 9 subscribers and 1 publisher on the 2560. The 9 subscribers allow sending temp, voltage, and rpm info back to the remote, with flexibility in what is sent so that it can be adjusted on the robot itself.

I used a servo extension cable to carry the base 5v, ground, and rx signals from the ibus out on the rc receiver unit. Handy as the servo plug ends can be taped together for the more bumpy environment that the hound likes to tackle. I wound up putting the diode floating between two extension wires on the (to tx) side of the bus.



The 1 publisher just sends an array with the raw RC values in it. With minimal delays I can get a reasonably steady 120Hz publication of rc values. So now the houndbot can tell me when it is getting hungry for more fresh electrons from a great distance!

I had some problems with the nano locking up when talking to the rc unit. I think perhaps this was due to crystals as the uno worked ok. The 2560 board has been bench tested for 30 minutes, which was enough time to expose the issues on the nano.


POC Wireguard + FRR: Now with OSPFv2!

If you read my last post, you’ll know I set up a POC with wireguard and FRR to have the power of wireguard (WG) but with all the routing worked out by FRR. But I had a problem. When using RIPv2, the broadcast messages seemed to get stuck in the WG interfaces until I tcpdumped it. This meant that once I tcpdumped, the routes would get through, but only to eventually go stale and disappear.

I talked with the awesome people in the #wireguard IRC channel on freenode and was told to simply stay clear of RIP.

So I revisited my POC env and swapped out RIP for OSPF.. and guess what.. it worked! Now all the routes get propagated and they stay there. Which means if I decided to add new WG links and make it grow, so should all the routing:

suse@wireguard-5:~> ip r
default via 172.16.0.1 dev eth0 proto dhcp
10.0.2.0/24 via 10.0.4.104 dev wg0 proto 188 metric 20
10.0.3.0/24 via 10.0.4.104 dev wg0 proto 188 metric 20
10.0.4.0/24 dev wg0 proto kernel scope link src 10.0.4.105
172.16.0.0/24 dev eth0 proto kernel scope link src 172.16.0.36
172.16.1.0/24 via 10.0.4.104 dev wg0 proto 188 metric 20
172.16.2.0/24 via 10.0.4.104 dev wg0 proto 188 metric 20
172.16.3.0/24 via 10.0.4.104 dev wg0 proto 188 metric 20
172.16.4.0/24 dev eth1 proto kernel scope link src 172.16.4.105
172.16.5.0/24 dev eth2 proto kernel scope link src 172.16.5.105

Isn’t that beautiful, all networks on one of the more distant nodes, including network 1 (172.16.1.0/24).
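If you want to check this without eyeballing it, here is a small sketch that parses the ip r output above and reports where each route came from. The mapping of proto 188 to OSPF and 189 to RIP is what I understand FRR registers with the kernel, so treat it as an assumption; parse_routes is a made-up helper name.

```python
import ipaddress

# Kernel route protocol numbers as (I believe) registered by FRR.
FRR_PROTOS = {"188": "ospf", "189": "rip"}

IP_R_OUTPUT = """\
default via 172.16.0.1 dev eth0 proto dhcp
10.0.2.0/24 via 10.0.4.104 dev wg0 proto 188 metric 20
10.0.3.0/24 via 10.0.4.104 dev wg0 proto 188 metric 20
10.0.4.0/24 dev wg0 proto kernel scope link src 10.0.4.105
172.16.0.0/24 dev eth0 proto kernel scope link src 172.16.0.36
172.16.1.0/24 via 10.0.4.104 dev wg0 proto 188 metric 20
172.16.2.0/24 via 10.0.4.104 dev wg0 proto 188 metric 20
172.16.3.0/24 via 10.0.4.104 dev wg0 proto 188 metric 20
172.16.4.0/24 dev eth1 proto kernel scope link src 172.16.4.105
172.16.5.0/24 dev eth2 proto kernel scope link src 172.16.5.105
"""


def parse_routes(text):
    """Return {network: origin} for every non-default route."""
    routes = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "default":
            continue
        proto = fields[fields.index("proto") + 1]
        routes[ipaddress.ip_network(fields[0])] = FRR_PROTOS.get(proto, proto)
    return routes


routes = parse_routes(IP_R_OUTPUT)
expected = [ipaddress.ip_network(f"172.16.{n}.0/24") for n in range(6)]
missing = [str(net) for net in expected if net not in routes]
print("missing networks:", missing or "none")
for net, origin in routes.items():
    print(f"{str(net):<16} {origin}")
```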

I realise this doesn’t make much sense unless you read the last post, but never fear, I thought I’d rework and append the build notes here, in case you’re interested.

Build notes – This time with OSPFv2

The topology we’ll be building

Seeing that this is my Suse hackweek project and I now use OpenSuse, I’ll be using OpenSuse Leap 15.1 for all the nodes (and the KVM host too).

Build the env

I used ansible-virt-infra created by csmart to build the env. I created my own inventory file called wireguard.yml, which you can dump in the inventory/ folder:

---
wireguard:
  hosts:
    wireguard-1:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-blue"
        - name: "net-green"
    wireguard-2:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-blue"
        - name: "net-white"
    wireguard-3:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-white"
    wireguard-4:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-orange"
        - name: "net-green"
    wireguard-5:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-orange"
        - name: "net-yellow"
    wireguard-6:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-yellow"
  vars:
    virt_infra_distro: opensuse
    virt_infra_distro_image: openSUSE-Leap-15.1-JeOS.x86_64-15.1.0-OpenStack-Cloud-Current.qcow2
    virt_infra_distro_image_url: https://download.opensuse.org/distribution/leap/15.1/jeos/openSUSE-Leap-15.1-JeOS.x86_64-15.1.0-OpenStack-Cloud-Current.qcow2
    virt_infra_variant: opensuse15.1
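The inventory also tells you the topology: any two hosts that share a network (other than net-mgmt, which everything is on) are directly connected. A small sketch that works that out, with the host-to-network table copied by hand from wireguard.yml and a made-up helper name:

```python
from collections import defaultdict

# Networks per host, copied from wireguard.yml (net-mgmt left out
# because every host is attached to it).
HOST_NETWORKS = {
    "wireguard-1": ["net-blue", "net-green"],
    "wireguard-2": ["net-blue", "net-white"],
    "wireguard-3": ["net-white"],
    "wireguard-4": ["net-orange", "net-green"],
    "wireguard-5": ["net-orange", "net-yellow"],
    "wireguard-6": ["net-yellow"],
}


def hosts_per_network(host_networks):
    """Invert the host -> networks table into network -> hosts."""
    members = defaultdict(list)
    for host, networks in host_networks.items():
        for net in networks:
            members[net].append(host)
    return dict(members)


for net, hosts in sorted(hosts_per_network(HOST_NETWORKS).items()):
    print(f"{net}: {' <-> '.join(hosts)}")
```

The networks shared by two of the routing nodes (1, 2, 4 and 5) are the ones that carry WG tunnels in the rest of these notes.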

Next we need to make sure the networks have been defined, we do this in the kvmhost inventory file, here’s a diff:

diff --git a/inventory/kvmhost.yml b/inventory/kvmhost.yml
index b1f029e..6d2485b 100644
--- a/inventory/kvmhost.yml
+++ b/inventory/kvmhost.yml
@@ -40,6 +40,36 @@ kvmhost:
           subnet: "255.255.255.0"
           dhcp_start: "10.255.255.2"
           dhcp_end: "10.255.255.254"
+        - name: "net-mgmt"
+          ip_address: "172.16.0.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.0.2"
+          dhcp_end: "172.16.0.99"
+        - name: "net-white"
+          ip_address: "172.16.1.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.1.2"
+          dhcp_end: "172.16.1.99"
+        - name: "net-blue"
+          ip_address: "172.16.2.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.2.2"
+          dhcp_end: "172.16.2.99"
+        - name: "net-green"
+          ip_address: "172.16.3.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.3.2"
+          dhcp_end: "172.16.3.99"
+        - name: "net-orange"
+          ip_address: "172.16.4.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.4.2"
+          dhcp_end: "172.16.4.99"
+        - name: "net-yellow"
+          ip_address: "172.16.5.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.5.2"
+          dhcp_end: "172.16.5.99"
     virt_infra_host_deps:
         - qemu-img
         - osinfo-query

Now all we need to do is run the playbook:

ansible-playbook --limit kvmhost,wireguard ./virt-infra.yml

Setting up the IPs and tunnels

The above infrastructure tool uses cloud_init to set up the network, so only the first NIC is up. You can confirm this with:

ansible wireguard -m shell -a "sudo ip a"

That’s ok because we want to use the numbers on our diagram anyway 🙂
Before we get to that, let’s make sure wireguard is set up, and update all the nodes.

ansible wireguard -m shell -a "sudo zypper update -y"

If a reboot is required, reboot the nodes:

ansible wireguard -m shell -a "sudo reboot"

Add the wireguard repo to the nodes and install it, I look forward to 5.6 where wireguard will be included in the kernel:

ansible wireguard -m shell -a "sudo zypper addrepo -f obs://network:vpn:wireguard wireguard"

ansible wireguard -m shell -a "sudo zypper --gpg-auto-import-keys install -y wireguard-kmp-default wireguard-tools"

Load the kernel module:

ansible wireguard -m shell -a "sudo modprobe wireguard"

Let’s create wg0 on all wireguard nodes:

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo ip link add dev wg0 type wireguard"

And add wg1 to those nodes that have 2:

ansible wireguard-1,wireguard-4 -m shell -a "sudo ip link add dev wg1 type wireguard"

Now while we’re at it, let’s create all the wireguard keys (because we can use ansible):

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo mkdir -p /etc/wireguard"

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "wg genkey | sudo tee /etc/wireguard/wg0-privatekey | wg pubkey | sudo tee /etc/wireguard/wg0-publickey"

ansible wireguard-1,wireguard-4 -m shell -a "wg genkey | sudo tee /etc/wireguard/wg1-privatekey | wg pubkey | sudo tee /etc/wireguard/wg1-publickey"
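Since the public keys get copied by hand between hosts later on, a quick sanity check can save some head scratching. WireGuard keys are 32 bytes encoded as base64, so a pasted key should decode to exactly 32 bytes. A minimal sketch (looks_like_wg_key is a made-up helper, and the example key is node 2's public key from further down these notes):

```python
import base64
import binascii


def looks_like_wg_key(text):
    """True if text is valid base64 that decodes to exactly 32 bytes."""
    try:
        raw = base64.b64decode(text.strip(), validate=True)
    except (binascii.Error, ValueError):
        return False
    return len(raw) == 32


# A complete key, then the same key with the trailing "=" lost in a copy/paste.
print(looks_like_wg_key("P1tHKnaw7d2GJUSwXZfcayrrLMaCBHqcHsaM3eITm0s="))
print(looks_like_wg_key("P1tHKnaw7d2GJUSwXZfcayrrLMaCBHqcHsaM3eITm0s"))
```

This only checks the shape of the key, it can't tell you whether it is the right key for the peer.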

Let’s make sure we enable forwarding on the nodes that will pass traffic, and install the routing software (1,2,4 and 5):

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo sysctl net.ipv4.conf.all.forwarding=1"

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo sysctl net.ipv6.conf.all.forwarding=1"

While we’re at it, we might as well add the network repo so we can install FRR and then install it on the nodes:

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo zypper ar https://download.opensuse.org/repositories/network/openSUSE_Leap_15.1/ network"

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo zypper --gpg-auto-import-keys install -y frr libyang-extentions"

This time we’ll be using OSPFv2, as we’re just using IPv4:

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo sed -i 's/^ospfd=no/ospfd=yes/' /etc/frr/daemons"

And with that now we just need to do all per server things like add IPs and configure all the keys, peers, etc. We’ll do this a host at a time.
NOTE: As this is a POC we’re just using ip commands, obviously in a real env you’d want to use systemd-networkd or something to make these stick.
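The per-host commands that follow use a regular numbering pattern: a tunnel running over 172.16.N.0/24 uses 10.0.N.0/24 inside the tunnel, node X gets the .10X address, and it listens on port 518NX. That pattern is inferred from the commands themselves rather than stated anywhere, so the sketch below is only an illustration of it. The function name and the PEER_PUBLIC_KEY placeholder are made up.

```python
def wg_commands(iface, net, node, peer_node, peer_pubkey):
    """Build the ip/wg commands for one end of a tunnel over 172.16.<net>.0/24."""
    allowed = f"10.0.{net}.0/24,224.0.0.0/8,172.16.0.0/16"
    return [
        f"sudo ip address add dev {iface} 10.0.{net}.{100 + node}/24",
        f"sudo wg set {iface} listen-port 518{net}{node} "
        f"private-key /etc/wireguard/{iface}-privatekey",
        f"sudo wg set {iface} peer {peer_pubkey} allowed-ips {allowed} "
        f"endpoint 172.16.{net}.{100 + peer_node}:518{net}{peer_node}",
        f"sudo ip link set {iface} up",
    ]


# wg0 on wireguard-1: the tunnel to wireguard-2 over net-blue (172.16.2.0/24).
for command in wg_commands("wg0", net=2, node=1, peer_node=2,
                           peer_pubkey="PEER_PUBLIC_KEY"):
    print(command)
```

It only prints the commands, so you can compare them with the hand-written ones before running anything.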

wireguard-1

Firstly using:
sudo virsh dumpxml wireguard-1 |less

We can see that eth1 is net-blue and eth2 is net-green so:
ssh wireguard-1

First IPs:
sudo ip address add dev eth1 172.16.2.101/24
sudo ip address add dev eth2 172.16.3.101/24
sudo ip address add dev wg0 10.0.2.101/24
sudo ip address add dev wg1 10.0.3.101/24

Load up the tunnels:
sudo wg set wg0 listen-port 51821 private-key /etc/wireguard/wg0-privatekey

# Node2 (2.102) public key is: P1tHKnaw7d2GJUSwXZfcayrrLMaCBHqcHsaM3eITm0s= (cat /etc/wireguard/wg0-publickey)

sudo wg set wg0 peer P1tHKnaw7d2GJUSwXZfcayrrLMaCBHqcHsaM3eITm0s= allowed-ips 10.0.2.0/24,224.0.0.0/8,172.16.0.0/16 endpoint 172.16.2.102:51822

sudo ip link set wg0 up

NOTE: We add 224.0.0.0/8 and 172.16.0.0/16 to allowed-ips. The first allows the OSPF multicast packets through. The latter will allow us to route to our private network that has been joined by WG tunnels.
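To see what that allowed-ips list lets out of the tunnel, here is a small sketch using Python's ipaddress module. WG picks the peer for an outgoing packet by looking up the destination in allowed-ips, so a destination that matches nothing can't be sent. 224.0.0.5 and 224.0.0.6 are the standard OSPFv2 multicast groups; is_allowed is a made-up helper.

```python
import ipaddress

# allowed-ips for the wg0 peer on wireguard-1, as set above.
ALLOWED_IPS = ["10.0.2.0/24", "224.0.0.0/8", "172.16.0.0/16"]


def is_allowed(address, allowed_ips=ALLOWED_IPS):
    """True if the address falls inside any of the allowed-ips networks."""
    addr = ipaddress.ip_address(address)
    return any(addr in ipaddress.ip_network(net) for net in allowed_ips)


# OSPF hellos, a host on a far away network, and a tunnel address
# from a different WG link.
for dest in ["224.0.0.5", "224.0.0.6", "172.16.5.106", "10.0.4.105"]:
    print(f"{dest:<14} {'allowed' if is_allowed(dest) else 'not allowed'}")
```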

sudo wg set wg1 listen-port 51831 private-key /etc/wireguard/wg1-privatekey

# Node4 (3.104) public key is: GzY59HlXkCkfXl9uSkEFTHzOtBsxQFKu3KWGFH5P9Qc= (cat /etc/wireguard/wg1-publickey)

sudo wg set wg1 peer GzY59HlXkCkfXl9uSkEFTHzOtBsxQFKu3KWGFH5P9Qc= allowed-ips 10.0.3.0/24,224.0.0.0/8,172.16.0.0/16 endpoint 172.16.3.104:51834

sudo ip link set wg1 up

Setup FRR:
sudo tee /etc/frr/frr.conf <<EOF
hostname $(hostname)

password frr
enable password frr

log file /var/log/frr/frr.log

router ospf
network 10.0.2.0/24 area 0.0.0.0
network 10.0.3.0/24 area 0.0.0.0
redistribute connected
EOF

sudo systemctl restart frr
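The frr.conf files on the four routing nodes only differ in which tunnel networks they list, so they could be generated rather than typed. A minimal sketch, assuming the same layout as the file above (frr_conf is a made-up helper, not part of FRR):

```python
def frr_conf(hostname, tunnel_networks, area="0.0.0.0"):
    """Render an frr.conf like the one above for the given tunnel networks."""
    lines = [
        f"hostname {hostname}",
        "",
        "password frr",
        "enable password frr",
        "",
        "log file /var/log/frr/frr.log",
        "",
        "router ospf",
    ]
    lines += [f"network {net} area {area}" for net in tunnel_networks]
    lines.append("redistribute connected")
    return "\n".join(lines) + "\n"


print(frr_conf("wireguard-1", ["10.0.2.0/24", "10.0.3.0/24"]))
```

Generating the file keeps the keywords identical on every node, which matters because a mistyped keyword means that node never starts OSPF.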

wireguard-2

Firstly using:
sudo virsh dumpxml wireguard-2 |less

We can see that eth1 is net-blue and eth2 is net-white so:

ssh wireguard-2

First IPs:
sudo ip address add dev eth1 172.16.2.102/24
sudo ip address add dev eth2 172.16.1.102/24
sudo ip address add dev wg0 10.0.2.102/24


Load up the tunnels:
sudo wg set wg0 listen-port 51822 private-key /etc/wireguard/wg0-privatekey

# Node1 (2.101) public key is: ZsHAeRbNsK66MBOwDJhdDgJRl0bPFB4WVRX67vAV7zs= (cat /etc/wireguard/wg0-publickey)

sudo wg set wg0 peer ZsHAeRbNsK66MBOwDJhdDgJRl0bPFB4WVRX67vAV7zs= allowed-ips 10.0.2.0/24,224.0.0.0/8,172.16.0.0/16 endpoint 172.16.2.101:51821

sudo ip link set wg0 up

NOTE: We add 224.0.0.0/8 and 172.16.0.0/16 to allowed-ips. The first allows the OSPF multicast packets through. The latter will allow us to route to our private network that has been joined by WG tunnels.

Setup FRR:
sudo tee /etc/frr/frr.conf <<EOF
hostname $(hostname)


password frr
enable password frr

log file /var/log/frr/frr.log

router ospf
network 10.0.2.0/24 area 0.0.0.0
redistribute connected
EOF

sudo systemctl restart frr

wireguard-3

Only has a net-white, so it must be eth1 so:

ssh wireguard-3

First IPs:
sudo ip address add dev eth1 172.16.1.103/24

Has no WG tunnels or FRR so we’re done here.

wireguard-4

Firstly using:
sudo virsh dumpxml wireguard-4 |less

We can see that eth1 is net-orange and eth2 is net-green so:

ssh wireguard-4

First IPs:
sudo ip address add dev eth1 172.16.4.104/24
sudo ip address add dev eth2 172.16.3.104/24
sudo ip address add dev wg0 10.0.4.104/24
sudo ip address add dev wg1 10.0.3.104/24

Load up the tunnels:
sudo wg set wg0 listen-port 51844 private-key /etc/wireguard/wg0-privatekey

# Node5 (4.105) public key is: Af/sIEnklG6nnDb0wzUSq1D/Ujh6TH+5R9TblLyS3h8= (cat /etc/wireguard/wg0-publickey)

sudo wg set wg0 peer Af/sIEnklG6nnDb0wzUSq1D/Ujh6TH+5R9TblLyS3h8= allowed-ips 10.0.4.0/24,224.0.0.0/8,172.16.0.0/16 endpoint 172.16.4.105:51845

sudo ip link set wg0 up

NOTE: We add 224.0.0.0/8 and 172.16.0.0/16 to allowed-ips. The first allows the OSPF multicast packets through. The latter will allow us to route to our private network that has been joined by WG tunnels.

sudo wg set wg1 listen-port 51834 private-key /etc/wireguard/wg1-privatekey

# Node1 (3.101) public key is: Yh0kKjoqnJsxbCsTkQ/3uncEhdqa+EtJXCYcVzMdugs= (cat /etc/wireguard/wg1-publickey)

sudo wg set wg1 peer Yh0kKjoqnJsxbCsTkQ/3uncEhdqa+EtJXCYcVzMdugs= allowed-ips 10.0.3.0/24,224.0.0.0/8,172.16.0.0/16 endpoint 172.16.3.101:51831

sudo ip link set wg1 up

Setup FRR:
sudo tee /etc/frr/frr.conf <<EOF
hostname $(hostname)

password frr
enable password frr

log file /var/log/frr/frr.log

router ospf

network 10.0.3.0/24 area 0.0.0.0
network 10.0.4.0/24 area 0.0.0.0
redistribute connected
EOF


sudo systemctl restart frr

wireguard-5

Firstly using:
sudo virsh dumpxml wireguard-5 |less

We can see that eth1 is net-orange and eth2 is net-yellow so:

ssh wireguard-5

First IPs:
sudo ip address add dev eth1 172.16.4.105/24
sudo ip address add dev eth2 172.16.5.105/24
sudo ip address add dev wg0 10.0.4.105/24

Load up the tunnels:
sudo wg set wg0 listen-port 51845 private-key /etc/wireguard/wg0-privatekey

# Node4 (4.104) public key is: aPA197sLN3F05bgePpeS2uZFlhRRLY8yVWnzBAUcD3A= (cat /etc/wireguard/wg0-publickey)

sudo wg set wg0 peer aPA197sLN3F05bgePpeS2uZFlhRRLY8yVWnzBAUcD3A= allowed-ips 10.0.4.0/24,224.0.0.0/8,172.16.0.0/16 endpoint 172.16.4.104:51844

sudo ip link set wg0 up

NOTE: We add 224.0.0.0/8 and 172.16.0.0/16 to allowed-ips. The first allows the OSPF multicast packets through. The latter will allow us to route to our private network that has been joined by WG tunnels.

Setup FRR:
sudo tee /etc/frr/frr.conf <<EOF
hostname $(hostname)

password frr
enable password frr

log file /var/log/frr/frr.log

router ospf

network 10.0.4.0/24 area 0.0.0.0
redistribute connected
EOF


sudo systemctl restart frr

wireguard-6

Only has a net-yellow, so it must be eth1 so:

ssh wireguard-6

First IPs:
sudo ip address add dev eth1 172.16.5.106/24

Final comments

After all this, you should now be where I’m up to: an environment that is sharing routes through the WG interfaces.

The current issue I have is that if I go and ping from wireguard-1 to wireguard-5, the ICMP packet happily routes into the 10.0.3.0/24 tunnel. When it pops out of wg1 on wireguard-4, either the kernel isn’t routing it on to wireguard-5 through wg0, or WG isn’t putting the packet into the IP stack or forwarding queue to continue its journey.

Well that is my current assumption. Hopefully I’ll get to the bottom of it soon, in which case I’ll post it here 🙂
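One thing that may be worth ruling out is the source address of the ping. WG checks the source IP of every decrypted packet against the sending peer's allowed-ips and drops the packet if it doesn't match. The sketch below is only a check under stated assumptions, not a diagnosis: it assumes the ping leaves wireguard-1 with its wg1 address (10.0.3.101) as the source, which these notes don't actually say, and accepted is a made-up helper.

```python
import ipaddress

# allowed-ips configured for the peer on each receiving interface,
# copied from the build notes above.
ALLOWED_IPS = {
    ("wireguard-4", "wg1"): ["10.0.3.0/24", "224.0.0.0/8", "172.16.0.0/16"],
    ("wireguard-5", "wg0"): ["10.0.4.0/24", "224.0.0.0/8", "172.16.0.0/16"],
}


def accepted(source, allowed_ips):
    """True if a packet with this source IP passes the allowed-ips check."""
    src = ipaddress.ip_address(source)
    return any(src in ipaddress.ip_network(net) for net in allowed_ips)


# Assumption: the ping uses the wg1 address of wireguard-1 as its source.
SOURCE = "10.0.3.101"
for (host, iface), nets in ALLOWED_IPS.items():
    verdict = "accepts" if accepted(SOURCE, nets) else "drops"
    print(f"{host} {iface} {verdict} packets from {SOURCE}")
```

If that assumption holds, pinging with a 172.16.x.x source address (ping -I) would be a quick way to tell the two cases apart.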

February 13, 2020

POC WireGuard + FRR Setup a.k.a dodgy meshy test network

It’s hackweek at Suse! Probably one of my favourite times of year, though I think they come up every 9 months or so.

Anyway, this hackweek I’ve been on a WireGuard journey. I started reading the paper and all the docs. Briefly looking into the code, sitting in the IRC channel and joining the mailing list to get a feel for the community.

There is still 1 day left of hackweek, so I hope to spend more time in the code, and maybe, just maybe see if I can fix a bug.. although they don’t seem to have a tracker like most projects, so let’s see how that goes.

The community seems pretty cool. The tech is frankly pretty amazing; even I, from a cloud storage background, understood most of the paper.

I had set up a tunnel, tcpdumped traffic, and used wireshark to look closely at the packets as I read the paper; it was very informative. But I really wanted to get a feel for how this tech could work. They do have a wg-dynamic project which is planning to use wg as a building block to do cooler things, like mesh networking. This sounds cool, so I wanted to sink my teeth in and see, not with wg-dynamic, whether I could build something similar out of existing OSS tech, and see where the gotchas are, beyond the obvious one of it being less secure. It seemed like a good way to better understand the technology.

So on Wednesday, I decided to do just that. Today is Thursday and I’ve gotten to a point where I can say I partially succeeded. And before I delve in deeper and try and figure out my current stumbling block, I thought I’d write down where I am.. and how I got here.. to:

  1. Point the wireguard community at, in case they’re interested.
  2. So you all can follow along at home, because it’s pretty interesting, I think.

As the title suggests, the plan is/was to set up a bunch of tunnels and use FRR to set up some routing protocols to talk via these tunnels, auto-magically 🙂

UPDATE: The problem I describe in this post, routes becoming stale, only seems to happen when using RIPv2. When I change it to OSPFv2 all the routes work as expected!! Will write a follow up post to explain the differences.. in fact may rework the notes for it too 🙂

The problem at hand

Test network VM topology

A picture is worth 1000 words. The basic idea is to simulate a bunch of machines and networks connected over wireguard (WG) tunnels. So I created 6 vms, connected as you can see above.

I used Chris Smart’s ansible-virt-infra project, which is pretty awesome, to build up the VMs and networks as you see above. I’ll leave my build notes as an appendix to this post.

Once I had the infrastructure set up, I built all the tunnels as they are in the image, then went ahead and installed FRR on all the nodes with tunnels (nodes 1, 2, 4, and 5). To keep things simple, I started with the easiest to configure routing protocol, RIPv2.

Believe it or not, everything seemed to work.. well mostly. I can jump on say node 5 (wireguard-5 if you’re playing along at home) and:

suse@wireguard-5:~> ip r
default via 172.16.0.1 dev eth0 proto dhcp
10.0.2.0/24 via 10.0.4.104 dev wg0 proto 189 metric 20
10.0.3.0/24 via 10.0.4.104 dev wg0 proto 189 metric 20
10.0.4.0/24 dev wg0 proto kernel scope link src 10.0.4.105
172.16.0.0/24 dev eth0 proto kernel scope link src 172.16.0.36
172.16.2.0/24 via 10.0.4.104 dev wg0 proto 189 metric 20
172.16.3.0/24 via 10.0.4.104 dev wg0 proto 189 metric 20
172.16.4.0/24 dev eth1 proto kernel scope link src 172.16.4.105
172.16.5.0/24 dev eth2 proto kernel scope link src 172.16.5.105

Looks good right, we see routes for networks 172.16.{0,2,3,4,5}.0/24. Network 1 isn’t there, but hey that’s quite far away, maybe it hasn’t made it yet. Which leads to the real issue.

If I go and run ip r again, soon all these routes will become stale and disappear. Running ip -ts monitor shows just that.

So the question is, what’s happening to the RIP advertisements? And yes they’re still being sent. Then how come some made it to node 5, and never again?

The simple answer is, it was me. The long answer is, I’ve never used FRR before, and it just didn’t seem to be working. So I started debugging the env. To debug, I had a tmux session opened on the KVM host with a tab for each node running FRR. I’d go to each tab and run tcpdump to check to see if the RIP traffic was making it through the tunnel. And almost instantly, I saw traffic, like:

suse@wireguard-5:~> sudo tcpdump -v -U -i wg0 port 520
tcpdump: listening on wg0, link-type RAW (Raw IP), capture size 262144 bytes
03:01:00.006408 IP (tos 0xc0, ttl 64, id 62964, offset 0, flags [DF], proto UDP (17), length 52)
10.0.4.105.router > 10.0.4.255.router:
RIPv2, Request, length: 24, routes: 1 or less
AFI 0, 0.0.0.0/0 , tag 0x0000, metric: 16, next-hop: self
03:01:00.007005 IP (tos 0xc0, ttl 64, id 41698, offset 0, flags [DF], proto UDP (17), length 172)
10.0.4.104.router > 10.0.4.105.router:
RIPv2, Response, length: 144, routes: 7 or less
AFI IPv4, 0.0.0.0/0 , tag 0x0000, metric: 1, next-hop: self
AFI IPv4, 10.0.2.0/24, tag 0x0000, metric: 2, next-hop: self
AFI IPv4, 10.0.3.0/24, tag 0x0000, metric: 1, next-hop: self
AFI IPv4, 172.16.0.0/24, tag 0x0000, metric: 1, next-hop: self
AFI IPv4, 172.16.2.0/24, tag 0x0000, metric: 2, next-hop: self
AFI IPv4, 172.16.3.0/24, tag 0x0000, metric: 1, next-hop: self
AFI IPv4, 172.16.4.0/24, tag 0x0000, metric: 1, next-hop: self

At first I thought it was good timing. I jumped to another host, and when I tcpdumped, the RIP packets turned up instantaneously. This happened again and again.. and yes it took me longer than I’d like to admit before it dawned on me.

Why are routes going stale? It seems as though the packets are getting queued/stuck in the WG interface until I poke it with tcpdump!

The RIPv2 Request packet is sent as a broadcast, not directly to the other end of the tunnel. To get it to not be dropped, I had to widen my WG peer allowed-ips from the /32 to a /24.
So now I wonder if broadcast, or just the fact that it’s only 52 bytes, means it gets queued up and not sent through the tunnel, that is until I come along with a hammer and tcpdump the interface?
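The /32 versus /24 part can be shown with a few lines of Python. The broadcast address for wg0 on node 5 is 10.0.4.255 (it is the destination in the tcpdump output above), and a /32 for the peer's own address can never cover it. A minimal sketch, with covers being a made-up helper:

```python
import ipaddress

# wg0 on wireguard-5 and the broadcast address of its network.
wg0 = ipaddress.ip_interface("10.0.4.105/24")
broadcast = wg0.network.broadcast_address


def covers(allowed_ip, address):
    """True if a single allowed-ips entry contains the address."""
    return address in ipaddress.ip_network(allowed_ip)


print("broadcast address:", broadcast)
print("covered by 10.0.4.104/32:", covers("10.0.4.104/32", broadcast))
print("covered by 10.0.4.0/24:", covers("10.0.4.0/24", broadcast))
```

That explains why the packets were dropped with the /32, but not why they seem to sit in the interface until tcpdump is run.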

Maybe one way I could test this is to speed up the RIP broadcasts and hopefully fill a buffer, or see if I can turn WG, or rather the kernel, into debugging mode.

Build notes

As promised, here is the current form of my build notes; refer to the topology image I used above.

BTW I’m using OpenSuse Leap 15.1 for all the nodes.

Build the env

I used ansible-virt-infra created by csmart to build the env. I created my own inventory file called wireguard.yml, which you can dump in the inventory/ folder:

---
wireguard:
  hosts:
    wireguard-1:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-blue"
        - name: "net-green"
    wireguard-2:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-blue"
        - name: "net-white"
    wireguard-3:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-white"
    wireguard-4:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-orange"
        - name: "net-green"
    wireguard-5:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-orange"
        - name: "net-yellow"
    wireguard-6:
      virt_infra_networks:
        - name: "net-mgmt"
        - name: "net-yellow"
  vars:
    virt_infra_distro: opensuse
    virt_infra_distro_image: openSUSE-Leap-15.1-JeOS.x86_64-15.1.0-OpenStack-Cloud-Current.qcow2
    virt_infra_distro_image_url: https://download.opensuse.org/distribution/leap/15.1/jeos/openSUSE-Leap-15.1-JeOS.x86_64-15.1.0-OpenStack-Cloud-Current.qcow2
    virt_infra_variant: opensuse15.1

Next we need to make sure the networks have been defined. We do this in the kvmhost inventory file; here’s a diff:

diff --git a/inventory/kvmhost.yml b/inventory/kvmhost.yml
index b1f029e..6d2485b 100644
--- a/inventory/kvmhost.yml
+++ b/inventory/kvmhost.yml
@@ -40,6 +40,36 @@ kvmhost:
           subnet: "255.255.255.0"
           dhcp_start: "10.255.255.2"
           dhcp_end: "10.255.255.254"
+        - name: "net-mgmt"
+          ip_address: "172.16.0.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.0.2"
+          dhcp_end: "172.16.0.99"
+        - name: "net-white"
+          ip_address: "172.16.1.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.1.2"
+          dhcp_end: "172.16.1.99"
+        - name: "net-blue"
+          ip_address: "172.16.2.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.2.2"
+          dhcp_end: "172.16.2.99"
+        - name: "net-green"
+          ip_address: "172.16.3.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.3.2"
+          dhcp_end: "172.16.3.99"
+        - name: "net-orange"
+          ip_address: "172.16.4.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.4.2"
+          dhcp_end: "172.16.4.99"
+        - name: "net-yellow"
+          ip_address: "172.16.5.1"
+          subnet: "255.255.255.0"
+          dhcp_start: "172.16.5.2"
+          dhcp_end: "172.16.5.99"
     virt_infra_host_deps:
         - qemu-img
         - osinfo-query

Now all we need to do is run the playbook:

ansible-playbook --limit kvmhost,wireguard ./virt-infra.yml

Setting up the IPs and tunnels

The infrastructure tool above uses cloud-init to set up the network, so only the first NIC is up. You can confirm this with:

ansible wireguard -m shell -a "sudo ip a"

That’s OK because we want to use the numbers on our diagram anyway 🙂
Before we get to that, let’s make sure WireGuard is set up, and update all the nodes.

ansible wireguard -m shell -a "sudo zypper update -y"

If a reboot is required, reboot the nodes:

ansible wireguard -m shell -a "sudo reboot"

Add the WireGuard repo to the nodes and install it. I look forward to 5.6, where WireGuard will be included in the kernel:

ansible wireguard -m shell -a "sudo zypper addrepo -f obs://network:vpn:wireguard wireguard"

ansible wireguard -m shell -a "sudo zypper --gpg-auto-import-keys install -y wireguard-kmp-default wireguard-tools"

Load the kernel module:

ansible wireguard -m shell -a "sudo modprobe wireguard"

Let’s create wg0 on all the WireGuard nodes that have tunnels:

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo ip link add dev wg0 type wireguard"

And add wg1 to those nodes that have 2:

ansible wireguard-1,wireguard-4 -m shell -a "sudo ip link add dev wg1 type wireguard"

Now while we’re at it, let’s create all the WireGuard keys (because we can use ansible):

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo mkdir -p /etc/wireguard"

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "wg genkey | sudo tee /etc/wireguard/wg0-privatekey | wg pubkey | sudo tee /etc/wireguard/wg0-publickey"

ansible wireguard-1,wireguard-4 -m shell -a "wg genkey | sudo tee /etc/wireguard/wg1-privatekey | wg pubkey | sudo tee /etc/wireguard/wg1-publickey"

Let’s make sure we enable forwarding on the nodes that will pass traffic, and install the routing software (1, 2, 4 and 5):

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo sysctl net.ipv4.conf.all.forwarding=1"

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo sysctl net.ipv6.conf.all.forwarding=1"

While we’re at it, we might as well add the network repo so we can install FRR and then install it on the nodes:

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo zypper ar https://download.opensuse.org/repositories/network/openSUSE_Leap_15.1/ network"

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo zypper --gpg-auto-import-keys install -y frr libyang-extentions"

We’ll be using RIPv2, as we’re just using IPv4:

ansible wireguard-1,wireguard-2,wireguard-4,wireguard-5 -m shell -a "sudo sed -i 's/^ripd=no/ripd=yes/' /etc/frr/daemons"

And with that, we now just need to do all the per-server things like adding IPs and configuring all the keys, peers, etc. We’ll do this a host at a time.
NOTE: As this is a POC we’re just using ip commands. Obviously in a real env you’d want to use systemd-networkd or something similar to make these stick.

wireguard-1

Firstly using:
sudo virsh dumpxml wireguard-1 |less

We can see that eth1 is net-blue and eth2 is net-green so:
ssh wireguard-1

First IPs:
sudo ip address add dev eth1 172.16.2.101/24
sudo ip address add dev eth2 172.16.3.101/24
sudo ip address add dev wg0 10.0.2.101/24
sudo ip address add dev wg1 10.0.3.101/24

Load up the tunnels:
sudo wg set wg0 listen-port 51821 private-key /etc/wireguard/wg0-privatekey

# Node2 (2.102) public key is: P1tHKnaw7d2GJUSwXZfcayrrLMaCBHqcHsaM3eITm0s= (cat /etc/wireguard/wg0-publickey)

sudo wg set wg0 peer P1tHKnaw7d2GJUSwXZfcayrrLMaCBHqcHsaM3eITm0s= allowed-ips 10.0.2.0/24 endpoint 172.16.2.102:51822

sudo ip link set wg0 up

sudo wg set wg1 listen-port 51831 private-key /etc/wireguard/wg1-privatekey

# Node4 (3.104) public key is: GzY59HlXkCkfXl9uSkEFTHzOtBsxQFKu3KWGFH5P9Qc= (cat /etc/wireguard/wg1-publickey)

sudo wg set wg1 peer GzY59HlXkCkfXl9uSkEFTHzOtBsxQFKu3KWGFH5P9Qc= allowed-ips 10.0.3.0/24 endpoint 172.16.3.104:51834

sudo ip link set wg1 up

Setup FRR:
sudo tee /etc/frr/frr.conf <<EOF
hostname $(hostname)

password frr
enable password frr

log file /var/log/frr/frr.log

router rip
version 2
redistribute kernel
redistribute connected

network wg0
no passive-interface wg0
network wg1
no passive-interface wg1
EOF

sudo systemctl restart frr

wireguard-2

Firstly using:
sudo virsh dumpxml wireguard-2 |less

We can see that eth1 is net-blue and eth2 is net-white so:

ssh wireguard-2

First IPs:
sudo ip address add dev eth1 172.16.2.102/24
sudo ip address add dev eth2 172.16.1.102/24
sudo ip address add dev wg0 10.0.2.102/24


Load up the tunnels:
sudo wg set wg0 listen-port 51822 private-key /etc/wireguard/wg0-privatekey

# Node1 (2.101) public key is: ZsHAeRbNsK66MBOwDJhdDgJRl0bPFB4WVRX67vAV7zs= (cat /etc/wireguard/wg0-publickey)

sudo wg set wg0 peer ZsHAeRbNsK66MBOwDJhdDgJRl0bPFB4WVRX67vAV7zs= allowed-ips 10.0.2.0/24 endpoint 172.16.2.101:51821

sudo ip link set wg0 up

Setup FRR:
sudo tee /etc/frr/frr.conf <<EOF
hostname $(hostname)


password frr
enable password frr

log file /var/log/frr/frr.log

router rip
version 2
redistribute kernel
redistribute connected

network wg0
no passive-interface wg0
EOF

sudo systemctl restart frr

wireguard-3

Only has a net-white, so it must be eth1 so:

ssh wireguard-3

First IPs:
sudo ip address add dev eth1 172.16.1.103/24

Has no WG tunnels or FRR so we’re done here.

wireguard-4

Firstly using:
sudo virsh dumpxml wireguard-4 |less

We can see that eth1 is net-orange and eth2 is net-green so:

ssh wireguard-4

First IPs:
sudo ip address add dev eth1 172.16.4.104/24
sudo ip address add dev eth2 172.16.3.104/24
sudo ip address add dev wg0 10.0.4.104/24
sudo ip address add dev wg1 10.0.3.104/24

Load up the tunnels:
sudo wg set wg0 listen-port 51844 private-key /etc/wireguard/wg0-privatekey

# Node5 (4.105) public key is: Af/sIEnklG6nnDb0wzUSq1D/Ujh6TH+5R9TblLyS3h8= (cat /etc/wireguard/wg0-publickey)

sudo wg set wg0 peer Af/sIEnklG6nnDb0wzUSq1D/Ujh6TH+5R9TblLyS3h8= allowed-ips 10.0.4.0/24 endpoint 172.16.4.105:51845

sudo ip link set wg0 up

sudo wg set wg1 listen-port 51834 private-key /etc/wireguard/wg1-privatekey

# Node1 (3.101) public key is: Yh0kKjoqnJsxbCsTkQ/3uncEhdqa+EtJXCYcVzMdugs= (cat /etc/wireguard/wg1-publickey)

sudo wg set wg1 peer Yh0kKjoqnJsxbCsTkQ/3uncEhdqa+EtJXCYcVzMdugs= allowed-ips 10.0.3.0/24 endpoint 172.16.3.101:51831

sudo ip link set wg1 up

Setup FRR:
sudo tee /etc/frr/frr.conf <<EOF
hostname $(hostname)

password frr
enable password frr

log file /var/log/frr/frr.log

router rip
version 2
redistribute kernel
redistribute connected

network wg0
no passive-interface wg0

network wg1
no passive-interface wg1
EOF


sudo systemctl restart frr

wireguard-5

Firstly using:
sudo virsh dumpxml wireguard-5 |less

We can see that eth1 is net-orange and eth2 is net-yellow so:

ssh wireguard-5

First IPs:
sudo ip address add dev eth1 172.16.4.105/24
sudo ip address add dev eth2 172.16.5.105/24
sudo ip address add dev wg0 10.0.4.105/24

Load up the tunnels:
sudo wg set wg0 listen-port 51845 private-key /etc/wireguard/wg0-privatekey

# Node4 (4.104) public key is: aPA197sLN3F05bgePpeS2uZFlhRRLY8yVWnzBAUcD3A= (cat /etc/wireguard/wg0-publickey)

sudo wg set wg0 peer aPA197sLN3F05bgePpeS2uZFlhRRLY8yVWnzBAUcD3A= allowed-ips 10.0.4.0/24 endpoint 172.16.4.104:51844

sudo ip link set wg0 up

Setup FRR:
sudo tee /etc/frr/frr.conf <<EOF
hostname $(hostname)

password frr
enable password frr

log file /var/log/frr/frr.log

router rip
version 2
redistribute kernel
redistribute connected

network wg0
no passive-interface wg0
EOF


sudo systemctl restart frr

wireguard-6

Only has a net-yellow, so it must be eth1 so:

ssh wireguard-6

First IPs:
sudo ip address add dev eth1 172.16.5.106/24

Final comments

When this _is_ all working, we’d probably need to open up the allowed-ips on the WG tunnels. We could start by just adding 172.16.0.0/16 to the list. That might allow us to route packets to the other networks.

If you want to go find other routes out to the internet, then we may need 0.0.0.0/0, but I’m not sure how WG will route that, as it uses the allowed-ips and public keys as a routing table. I guess it may not care, as we only have a 1:1 mapping on each tunnel, and if we can route to the WG interface it’s pretty straightforward.
This is something I hope to test.
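The idea of allowed-ips and public keys acting as a routing table (WireGuard’s documentation calls this cryptokey routing) can be sketched in Python. This is only a model under stated assumptions: the peer names and the helper `pick_peer` are hypothetical, and it simply picks the peer whose allowed-ips range is the most specific match for the destination.

```python
import ipaddress
from typing import Dict, List, Optional

def pick_peer(dest: str, peers: Dict[str, List[str]]) -> Optional[str]:
    """Return the peer whose allowed-ips has the longest prefix matching dest."""
    addr = ipaddress.ip_address(dest)
    best_peer, best_len = None, -1
    for peer, ranges in peers.items():
        for cidr in ranges:
            net = ipaddress.ip_network(cidr)
            if addr in net and net.prefixlen > best_len:
                best_peer, best_len = peer, net.prefixlen
    return best_peer

# Hypothetical view of wg0 on wireguard-1: a single peer (node 2).
tunnel_only = {"node2-pubkey": ["10.0.2.0/24"]}
wide_open = {"node2-pubkey": ["10.0.2.0/24", "0.0.0.0/0"]}

print(pick_peer("172.16.1.103", tunnel_only))  # no range matches
print(pick_peer("172.16.1.103", wide_open))    # falls through to 0.0.0.0/0
```

With only one peer per interface, a 0.0.0.0/0 entry means every destination routed to that interface maps to the same peer, which matches the 1:1 intuition above.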

Another really beneficial test would be to rebuild this environment using IPv6 and see if things work better, as we wouldn’t have any broadcasts anymore, only unicast and multicast.

As well as trying some other routing protocol in general, like OSPF.

Finally, having to continually adjust allowed-ips, and seemingly having to either open it up more or add more ranges, makes me realise why the wg-dynamic project exists, and why they want to come up with a secure routing protocol to use through the tunnels to do something similar. So let’s keep an eye on that project.

February 10, 2020

Fedora 31 LXC setup on Ubuntu Bionic 18.04

Similarly to what I wrote for Fedora 29, here is how I was able to create a Fedora 31 LXC container on an Ubuntu 18.04 (bionic) laptop.

Setting up LXC on Ubuntu

First of all, install lxc:

apt install lxc
echo "veth" >> /etc/modules
modprobe veth

turn on bridged networking by putting the following in /etc/sysctl.d/local.conf:

net.ipv4.ip_forward=1

and applying it using:

sysctl -p /etc/sysctl.d/local.conf

Then allow the right traffic in your firewall (/etc/network/iptables.up.rules in my case):

# LXC containers
-A FORWARD -d 10.0.3.0/24 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -s 10.0.3.0/24 -j ACCEPT
-A INPUT -d 224.0.0.251 -s 10.0.3.1 -j ACCEPT
-A INPUT -d 239.255.255.250 -s 10.0.3.1 -j ACCEPT
-A INPUT -d 10.0.3.255 -s 10.0.3.1 -j ACCEPT
-A INPUT -d 10.0.3.1 -s 10.0.3.0/24 -j ACCEPT

and apply these changes:

iptables-apply

before restarting the lxc networking:

systemctl restart lxc-net.service

Create the container

Once that's in place, you can finally create the Fedora 31 container:

lxc-create -n fedora31 -t download -- -d fedora -r 31 -a amd64

To see a list of all distros available with the download template:

lxc-create -n foo --template=download -- --list

Once the container has been created, disable AppArmor for it:

lxc.apparmor.profile = unconfined

since the AppArmor profile isn't working at the moment.

Logging in as root

Starting the container in one window:

lxc-start -n fedora31 -F

and attaching to a console:

lxc-attach -n fedora31

to set a root password:

passwd

Logging in as an unprivileged user via ssh

While logged into the console, I tried to install ssh:

$ dnf install openssh-server
Cannot create temporary file - mkstemp: No such file or directory

but it failed because TMPDIR is set to a non-existent directory:

$ echo $TMPDIR
/tmp/user/0

I found a fix and ran the following:

TMPDIR=/tmp dnf install openssh-server

then started the ssh service:

systemctl start sshd.service

Then I installed a few other packages as root:

dnf install vim sudo man

and created an unprivileged user with sudo access:

adduser francois -G wheel
passwd francois

I set this in /etc/ssh/sshd_config:

GSSAPIAuthentication no

to prevent slow ssh logins.

Now log in as that user from the console and add an ssh public key:

mkdir .ssh
chmod 700 .ssh
echo "<your public key>" > .ssh/authorized_keys
chmod 644 .ssh/authorized_keys

You can now log in via ssh. The IP address to use can be seen in the output of:

lxc-ls --fancy

February 07, 2020

Visualising Phase Noise

A few months ago I was helping Gerhard, OE3GBB, track down some FreeDV 2020 sync issues over the QO-100 satellite.

Along the way, we investigated the phase noise of the QO-100 channel (including Gerhard’s Tx and Rx) by sending a carrier signal over the link, then running it through a GNU Octave phase_noise.m script to generate some interesting plots.

Fig 1 shows the spectrum of the carrier, some band pass noise in the SSB channel, and the single sinewave line at about 1500 Hz:

Fig 2 is a close up, where we have shifted the 1500 Hz tone down to 0 Hz. It’s not really a single frequency, but has a noise-like spectrum:

Figure 3 is a polar plot of the I and Q (real and imag) against time. A perfect oscillator with a small frequency offset would trace a neat spiral, but due to the noise it wanders all over the place. Fig 3A shows a close up of the first 5 seconds, where it reverses a few times, like a wheel rotating forwards and backwards at random:


Figure 4 is the “unwrapped phase” in radians. Unwrapping means if we get to -pi we just keep going, rather than wrapping around to pi. A constant slope suggests a constant frequency segment, for example in the first 5 seconds it wanders downwards -15 radians which suggests a frequency of -15/5 = -3 rads/sec or -3/(2*pi) ≈ -0.5 Hz. The upwards slope from about 8 seconds is a positive frequency segment.
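The unwrapping step and the slope-to-frequency arithmetic can be sketched in a few lines of Python. This is not the phase_noise.m script; it is a minimal illustration on a synthetic, noise-free tone, and the sample rate and the -0.5 Hz offset are assumptions chosen to resemble the first 5 seconds of the plot.

```python
import math

def unwrap(phases):
    """Remove 2*pi jumps so the phase keeps going past -pi or pi."""
    out = [phases[0]]
    for p in phases[1:]:
        step = p - out[-1]
        # Bring the step back into the range [-pi, pi].
        step -= 2 * math.pi * round(step / (2 * math.pi))
        out.append(out[-1] + step)
    return out

fs = 100.0        # assumed sample rate in Hz
offset = -0.5     # assumed frequency offset in Hz
duration = 5.0    # seconds

n = int(fs * duration) + 1
# Wrapped phase, as atan2 of the Q and I samples would give it.
wrapped = [math.atan2(math.sin(2 * math.pi * offset * k / fs),
                      math.cos(2 * math.pi * offset * k / fs))
           for k in range(n)]

unwrapped = unwrap(wrapped)
slope = (unwrapped[-1] - unwrapped[0]) / duration   # rad/s
freq_hz = slope / (2 * math.pi)

print(round(unwrapped[-1], 2), "rad after", duration, "s")
print(round(freq_hz, 3), "Hz")
# The arithmetic from the text: -15 rad over 5 s.
print(round((-15 / 5) / (2 * math.pi), 3), "Hz")
```

The synthetic tone ends up about 15.7 radians down after 5 seconds, close to the roughly -15 radians read off the plot.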

Figure 5 is the rate of change of phase, in other words the instantaneous frequency offset, which is about -0.5 Hz at 8 seconds, then swings positive for a while:

Why does all this matter? Well phase shift keyed modems like QPSK have to track this phase. We were concerned about the ability of the FreeDV 2020 QPSK modem to track phase over QO-100. You also get similar meandering phase tracks over HF channels.

Turns out the GPS locking on one of the oscillators wasn’t working quite right, leading to step changes in the oscillator phase. So in this case, a hardware problem rather than the QPSK modem.

Links

QO-100 Sync Pull Request (with lots of notes)
FreeDV 2020 over the QO-100 Satellite
Digital Voice Transmission via QO-100 with FreeDV Mode 2020 (Lime Micro article)

A quick reflection on digital for posterity

On the eve of moving to Ottawa to join the Service Canada team (squee!) I thought it would be helpful to share a few things for posterity. There are three things below:

  • Some observations that might be useful
  • A short overview of the Pia Review: 20 articles about digital public sector reform
  • Additional references I think are outstanding and worth considering in public sector digital/reform programs, especially policy transformation

Some observations

Moving from deficit to aspirational planning

Risk! Risk!! Risk!!! That one word is responsible for an incredible amount of fear, inaction, redirection of investment and counter-productive behaviours, especially by public sectors for whom the stakes for the economy and society are so high. But when you focus all your efforts on mitigating risks, you are trying to drive by only using the rear vision mirror, planning your next step based on the issues you’ve already experienced without looking to where you need to be. It ultimately leads to people driving slower and slower, often grinding to a halt, because any action is considered more risky than inaction. This doesn’t really help our metaphorical driver to pick up the kids from school or get supplies from the store. In any case, inaction bears as many risks as action in a world that is continually changing. For example, if our metaphorical driver were to stop the car in an intersection they would likely be hit by another vehicle, or eventually starve to death.

Action is necessary. Change is inevitable. So public sectors must balance their time between being responsive (not reactive) to change and risks, and being proactive towards clear goals or a future state.

Of course, risk mitigation is what many in government think they most urgently need to address; however, to engage only with this is to buy into and perpetuate the myth that the increasing pace of change is itself a bad thing. This is the difference between user polling and user research: users think they need faster horses but actually they need a better way to transport more people over longer distances, which could lead to alternatives to horses. Shifting from a change pessimistic framing to change optimism is critical for public sectors to start to build responsiveness into their policy, program and project management. Until public servants embrace change as normal, natural and part of their work, fear and fear based behaviours will drive reactivism and sub-optimal outcomes.

The OPSI model for innovation would be a helpful tool to ask senior public servants what proportion of their digital investment is in which box, as this will help identify how aspirational vs reactive, and how top down or bottom up they are, noting that there really should be some investment and tactics in all four quadrants.

My observation of many government digital programs is that teams spend a lot of their time doing top down (directed) work that focuses on areas of certainty, but miss out on building the capacity or vision required for bottom up innovation, or anything that genuinely explores and engages in areas of uncertainty. Central agencies and digital transformation teams are in the important and unique position to independently stand back to see the forest for the trees, and help shape systemic responses to all of system problems. My biggest recommendation would be for these teams to support public sector partners to embed change optimism, proactive planning, and responsiveness/resilience into their approaches, so as to be more genuinely strategic and effective in dealing with change, but more importantly, to better plan strategically towards something meaningful for their context.

Repeatability and scale

All digital efforts might be considered through the lens of repeatability and scale.

  • If you are doing something, anything, could you publish it or a version of it for others to learn from or reuse? Can you work in the open for any of your work (not just publish after the fact)? If policy development, new services or even experimental projects could be done openly from the start, they will help drive a race to the top between departments.
  • How would the thing you are considering scale? How would you scale impact without scaling resources? Basically, for anything you do, if you’d need to dramatically scale resources to implement it, then you are not getting an exponential response to the problem.

Sometimes doing non scalable work is fine to test an idea, but actively trying to differentiate between work that addresses symptomatic relief versus work that addresses causal factors is critical, otherwise you will inevitably find 100% of your work program focused on symptomatic relief.

It is critical to balance programs according to both fast value (short term delivery projects) and long value (multi month/year program delivery), reactive and proactive measures, symptomatic relief and addressing causal factors, & differentiating between program foundations (gov as a platform) and programs themselves. When governments don’t invest in digital foundations, they end up duplicating infrastructure for each and every program, which leads to the reduction of capacity, agility and responsiveness to change.

Digital foundations

Most government digital programs seem to focus on small experiments, which is great for individual initiatives, but may not lay the reusable digital foundations for many programs. I would suggest that in whatever projects the team embark upon, some effort be made to explore and demonstrate what the digital foundations for government should look like. For example:

  • Digital public infrastructure - what are the things government is uniquely responsible for that it should make available as digital public infrastructure for others to build upon, and indeed for itself to consume. Eg, legislation as code, services registers, transactional service APIs, core information and data assets (spatial, research, statistics, budgets, etc), central budget management systems. “Government as a Platform” is a digital and transformation strategy, not just a technology approach.
  • Policy transformation and closing the implementation gap - many policy teams think the issue of policy intent not being realised is not their problem, so showing the value of multidisciplinary, test-driven and end to end policy design and implementation will dramatically shift digital efforts towards more holistic, sustainable and predictable policy and societal outcomes.
  • Participatory governance - departments need to engage the public in policy, services or program design, so demonstrating the value of participatory governance is key. This is not a nice to have, but rather a necessary part of delivering good services. Here is a recent article with some concepts and methods to consider. The team needs to have capabilities to enable this that aren’t just communications skills, but rather genuine engagement and subject matter expertise.
  • Life Journey programs - putting digital transformation efforts, policies, service delivery improvements and indeed any other government work in the context of life journeys helps to make it real, get multiple entities that play a part on that journey naturally involved and invested, and drives horizontal collaboration across and between jurisdictions. New Zealand led the way in this, NSW Government extended the methodology, Estonia has started the journey and they are systemically benefiting.
  • I’ve spoken about designing better futures, and I do believe this is also a digital foundation, as it provides a lens through which to prioritise, implement and realise value from all of the above. Getting public servants to “design the good” from a citizen perspective, a business perspective, an agency perspective, Government perspective and from a society perspective helps flush out assumptions, direction and hypotheses that need testing.

The Pia Review

I recently wrote a series of 20 articles about digital transformation and reform in public sectors. It was something I did for fun, in my own time, as a way of both recording and sharing my lessons learned from 20 years working at the intersection of tech, government and society (half in the private sector, half in the public sector). I called it the Public Sector Pia Review and I’ve been delighted by how it has been received, with a global audience republishing, sharing, commenting, and most important, starting new discussions about the sort of public sector they want and the sort of public servants they want to be. Below is a deck that has an insight from each of the 20 articles, and links throughout.

This is not just meant to be a series about digital, but rather about the matter of public sector reform in the broadest sense, and I hope it is a useful contribution to better public sectors, not just better public services.

There is also a collated version of the articles in two parts. These compilations are linked below for convenience, and all articles are linked in the references below for context.

  • Public-Sector-Pia-Review-Part-1 (6MB PDF): essays written to provide practical tips, methods, tricks and ideas to help public servants do their best possible work today for the best possible public outcomes; and
  • Reimagining government (will link once published): essays about possible futures, the big existential, systemic or structural challenges and opportunities as I’ve experienced them, paradigm shifts and the urgent need for everyone to reimagine how they best serve the government, the parliament and the people, today and into the future.

A huge thank you to the Mandarin, specifically Harley Dennett, for the support and encouragement to do this, as well as thanks to all the peer reviewers and contributors, and of course my wonderful husband Thomas who peer reviewed several articles, including the trickier ones!

My digital references and links from 2019

Below are a number of useful references for consideration in any digital government strategy, program or project, including some of mine :)

General reading

Life Journeys as a Strategy

Life Journey programs, whilst largely misunderstood and quite new to government, provide a surprisingly effective way to drive cross agency collaboration, holistic service and system design, prioritisation of investment for best outcomes, and a way to really connect policy, services and human outcomes with all involved on the usual service delivery supply chains in public sectors. Please refer to the following references, noting that New Zealand were the first to really explore this space, and are being rapidly followed by other governments around the world. Also please note the important difference between customer journey mapping (common), customer mapping that spans services but is still limited to a single agency/department (also common), and true life journey mapping which necessarily spans agencies, jurisdictions and even sectors (rare) like having a child, end of life, starting school or becoming an adult.

Policy transformation

Data in Government

Designing better futures to transform towards

If you don’t design a future state to work towards, then you end up just designing reactively to current, past or potential issues. This leads to a lack of strategic or cohesive direction, which leads to systemic fragmentation and ultimately system ineffectiveness and cannibalism. A clear direction isn’t just about principles or goals, it needs to be something people can see, connect with, align their work towards (even if they aren’t in your team), and get enthusiastic about. This is how you create change at scale, when people buy into the agenda, at all levels, and start naturally walking in the same direction regardless of their role. Here are some examples for consideration.

Rules as Code

Please find the relevant Rules as Code links below for easy reference.

Better Rules and RaC examples

February 04, 2020

Deleted Mapped Files

On a Linux system, if you upgrade a shared object that is in use, any programs that have it mapped will list it as “(deleted)” in the /proc/PID/maps file for the process in question. When you have a system tracking the stable branch of a distribution it’s expected that most times a shared object is upgraded it will be due to a security issue. When that happens the reasonable options are to either restart all programs that use the shared object or to compare the attack surface of such programs to the nature of the security issue. In most cases restarting all programs that use the shared object is by far the easiest and least inconvenient option.

Generally shared objects are used a lot in a typical Linux system. This can be good for performance (more cache efficiency and less RAM use) and is also good for security, as buggy code can be replaced for the entire system by replacing a single shared object. Sometimes it’s obvious which processes will be using a shared object (EG your web server using a PHP shared object) but other times many processes that you don’t expect will use it.

I recently wrote “deleted-mapped.monitor” for my etbemon project [1]. This checks for shared objects that are mapped and deleted and gives separate warning messages for root and non-root processes. If you have the unattended-upgrades package installed then your system can install security updates without your interaction and then the monitoring system will inform you if things need to be restarted.
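The kind of check involved can be sketched in Python. This is not the code of deleted-mapped.monitor; it is a minimal illustration that takes the text of a /proc/PID/maps file and returns the paths marked “(deleted)”. The function name and the sample lines are hypothetical.

```python
def deleted_mappings(maps_text):
    """Return sorted unique paths that a maps file lists as '(deleted)'."""
    suffix = " (deleted)"
    found = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, then an optional path.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping with no path
        path = fields[5].rstrip()
        if path.endswith(suffix):
            found.add(path[: -len(suffix)])
    return sorted(found)

# Made-up sample in the layout of /proc/PID/maps.
sample = """\
7f1c2a000000-7f1c2a1c0000 r-xp 00000000 08:01 131090 /usr/lib/x86_64-linux-gnu/libssl.so.1.1 (deleted)
7f1c2a400000-7f1c2a5e0000 r-xp 00000000 08:01 131200 /lib/x86_64-linux-gnu/libc-2.28.so
7f1c2a800000-7f1c2a821000 rw-p 00000000 00:00 0
"""

print(deleted_mappings(sample))
```

On a real system the same function could be fed the contents of each readable /proc/PID/maps file, with root and non-root processes reported separately as the monitor does.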

The Debian package debian-goodies has a program checkrestart that will tell you what commands to use to restart daemons that have deleted shared objects mapped.

Now to solve the problem of security updates on a Debian system you can use unattended-upgrades to apply updates, deleted-mapped.monitor in etbemon to inform you that programs need to be restarted, and checkrestart to tell you the commands you need to run to restart the daemons in question.

If anyone writes a blog post about how to do this on a non-Debian system please put the URL in a comment.

While writing the deleted-mapped.monitor I learned about the following common uses of deleted mapped files:

  • /memfd: is for memfd https://dvdhrm.wordpress.com/tag/memfd/ [2]
  • /[aio] is for asynchronous IO I guess, haven’t found good docs on it yet.
  • /home is used for a lot of harmless mapping and deleting.
  • /run/user is used for systemd dconf stuff.
  • /dev/zero is different for each map and thus looks deleted.
  • /tmp/ is used by Python (and probably other programs), which creates temporary files there for mapping.
  • /var/lib is used for lots of temporary files.
  • /i915 is used by some X apps on systems with Intel video, I don’t know why.
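One way a check could avoid warning about these harmless cases is a simple prefix filter. The sketch below is an assumption about how such filtering might look, not a description of what deleted-mapped.monitor actually does. The prefixes are taken from the list above and the function name is hypothetical.

```python
# Prefixes from the list above that are treated as harmless in this sketch.
HARMLESS_PREFIXES = (
    "/memfd:",
    "/[aio]",
    "/home",
    "/run/user",
    "/dev/zero",
    "/tmp/",
    "/var/lib",
    "/i915",
)

def needs_attention(path):
    """Return True if a deleted mapped path is not one of the harmless cases."""
    return not path.startswith(HARMLESS_PREFIXES)

paths = [
    "/memfd:pulseaudio",
    "/usr/lib/x86_64-linux-gnu/libssl.so.1.1",
    "/tmp/ffiABC123",
]
print([p for p in paths if needs_attention(p)])
```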

February 03, 2020

Social Media Sharing on Blogs

My last post was read directly (as opposed to reading through Planet feeds) a lot more than usual due to someone sharing it on lobste.rs. Presumably the people who read it that way benefited from reading it and I got a couple of unusually insightful comments from people who don’t usually comment on my blog. The lobste.rs sharing was a win for everyone.

There are a variety of plugins for social media sharing, most of which allow organisations like Facebook to track people who read your blog which is why I haven’t been using them.

Are there good ways of allowing people to easily share your blog posts which work in a reasonable way by not allowing much tracking of users unless they actually want to share content?

February 02, 2020

LUV Meet & Greet and General Discussion

Feb 4 2020 18:30
Feb 4 2020 20:30
Location: 
Kathleen Syme Library, 251 Faraday Street Carlton VIC 3053

This is a casual gathering of LUVers to meet and greet and have a general discussion!

There are no talks scheduled but there might be lightning talks. So, if you have a topic of interest you'd like to share with others or have a challenge that you want to get experts' opinions on, this would be a great opportunity.

If you have never used Linux or have never been to a LUV meeting, you are more than welcome to join us! We look forward to seeing you at the meeting.

Where to find us

Many of us like to go for dinner nearby in Lygon St. after the meeting. Please let us know if you'd like to join us!

Linux Users of Victoria is a subcommittee of Linux Australia.

February 4, 2020 - 18:30

lca2020 ReWatch 2020-02-02

As I was an organiser of the conference this year, I didn’t get to see many talks. Fortunately many of the talks were recorded, so I get to watch the conference well after the fact.

Conference Opening

That white balance on the lectern slides is indeed bad, I really should get around to adding this as a suggestion on the logos documentation. (With some help, I put up all the lectern covers, it was therapeutic and rush free).

I actually think there was a lot of information in this introduction. Perhaps too much?

OpenZFS and Linux

A nice update on where zfs is these days.

Dev/Ops relationships, status: It’s Complicated

A bit of a war story about production systems, leading to a moment of empathy.

Samba 2020: Why are we still in the 1980s for authentication?

There are a lot of old security standards that are showing their age, and there are a lot of modern security standards, but which to choose?

Tyranny of the Clock

A very interesting problem solving adventure, with a few nuggets of interesting information about tools and techniques.

Configuration Is (riskier than?) Code

Because configuration files are parsed by a program, and the program changes how it runs depending on the contents of that configuration file, every program that parses configuration files is basically an interpreter, and thus every configuration file is basically a program. So, configuration is code, and we should be treating configuration like we do code, e.g. revision control, commenting, testing, review.
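
As a rough sketch of what "testing" a configuration file could look like, the Python below validates a small INI-style config the same way a unit test checks code. The section name, keys and limits are all invented for illustration and are not taken from the talk.

```python
import configparser

# Hypothetical configuration, kept in revision control next to the code.
SAMPLE_CONFIG = """
[server]
# Port must be unprivileged so the daemon need not run as root.
port = 8080
workers = 4
"""

def validate(text):
    """Return a list of problems found in the configuration text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section("server"):
        return ["missing [server] section"]
    problems = []
    port = parser.getint("server", "port", fallback=0)
    if not 1024 <= port <= 65535:
        problems.append(f"port {port} outside 1024-65535")
    workers = parser.getint("server", "workers", fallback=0)
    if workers < 1:
        problems.append("workers must be at least 1")
    return problems

print(validate(SAMPLE_CONFIG))  # prints []
```

A check like this can run in the same review and CI steps as the code that reads the file.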

Easy Geo-Redundant Handover + Failover with MARS + systemd

Using a local process organiser to handle a cluster, interesting, not something I’d really promote. Not the best video cutting in this video, lots of time with the speaker pointing to his slides offscreen.

 

Load Average Monitoring

For my ETBE-Mon [1] monitoring system I recently added a monitor for the Linux load average. The Unix load average isn’t a very good metric for monitoring system load, but it’s well known and easy to use. I’ve previously written about the Linux load average and how it’s apparently different from other Unix-like OSs [2]. The monitor is still named loadavg but I’ve now made it also monitor memory usage because excessive memory use and load average are often correlated.

For issues that might be transient it’s good to have a monitoring system give a reasonable amount of information about the problem so it can be diagnosed later on. So when the load average monitor gives an alert I have it display a list of D state processes (if any), a list of the top 10 processes using the most CPU time if they are using more than 5%, and a list of the top 10 processes using the most RAM if they are using more than 2% total virtual memory.
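
As an illustration only (this is not the etbemon code), here is a minimal Python sketch of picking out D state processes from the contents of /proc/<pid>/stat files. The sample lines are made up and shortened; the function names are hypothetical.

```python
def parse_stat(stat_line):
    """Split one /proc/<pid>/stat line into (pid, command, state)."""
    # The command is wrapped in parentheses and may itself contain spaces
    # or ")" characters, so split on the last ")" rather than on whitespace.
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    pid = int(stat_line[:open_paren])
    command = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 1:].split()[0]
    return pid, command, state

def d_state(stat_lines):
    """Return (pid, command) for processes in uninterruptible sleep."""
    result = []
    for line in stat_lines:
        pid, command, state = parse_stat(line)
        if state == "D":
            result.append((pid, command))
    return result

# Made-up, truncated sample lines in /proc/<pid>/stat layout.
SAMPLE = [
    "1 (systemd) S 0 1 1 0 -1",
    "812 (kworker/u8:3 (x)) D 2 0 0 0 -1",
    "4242 (rsync) D 4000 4242 4000 0 -1",
]
print(d_state(SAMPLE))  # prints [(812, 'kworker/u8:3 (x)'), (4242, 'rsync')]
```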

For documenting the output of the free(1) command (or /proc/meminfo when writing a program to do it) the best page I found was this StackExchange page [3]. So I compare MemAvailable+SwapFree to MemTotal+SwapTotal to determine the percentage of virtual memory used.
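
The comparison described above can be sketched in a few lines of Python. This is an illustration of the arithmetic only, not the etbemon script; the function names and the sample numbers are made up.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values

def virtual_memory_used_percent(info):
    """Percentage of RAM plus swap that is not available."""
    total = info["MemTotal"] + info["SwapTotal"]
    available = info["MemAvailable"] + info["SwapFree"]
    return 100.0 * (total - available) / total

# Made-up sample in the same layout as /proc/meminfo.
SAMPLE = """\
MemTotal:       16000000 kB
MemFree:         1000000 kB
MemAvailable:    6000000 kB
SwapTotal:       4000000 kB
SwapFree:        2000000 kB
"""
print(virtual_memory_used_percent(parse_meminfo(SAMPLE)))  # prints 60.0
```

On a real system the text would come from reading /proc/meminfo instead of SAMPLE.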

Any suggestions on how I could improve this?

The code is in the recent releases of etbemon, it’s in Debian/Unstable, on the project page on my site, and here’s a link to the loadave.monitor script in the Debian Salsa Git repository [4].

Another close-to-upstream Blackbird Firmware Build

A few weeks ago (okay, close to six), I put up a firmware build for the Raptor Blackbird with close-to-upstream firmware (see here).

Well, I’ve done another build! It’s current op-build (as of this morning), but my branch with patches for the Raptor Blackbird. The skiboot patch is there, as is the SBE speedup patch. Current kernel (works fine with my hardware), current petitboot, and the machine-xml which is straight from Raptor but in my repo.

Versions of everything are:

$ lsprop /sys/firmware/devicetree/base/ibm,firmware-versions/
skiboot          "v6.5-209-g179d53df-p4360f95"
bmc-firmware-version
		 "0.01"
occ              "3ab2921"
hostboot         "779761d-pe7e80e1"
buildroot        "2019.05.3-14-g17f117295f"
capp-ucode       "p9-dd2-v4"
machine-xml      "site_local-stewart-a0efd66"
hostboot-binaries
		 "hw011120a.opmst"
sbe              "166b70c-p06fc80c"
hcode            "hw011520a.opmst"
petitboot        "v1.11"
phandle          000005d0 (1488)
version          "blackbird-v2.4-415-gb63b36ef"
linux            "5.4.13-openpower1-pa361bec"
name             "ibm,firmware-versions"

You can download all the bits (including debug tarball) from https://www.flamingspork.com/blackbird/stewart-blackbird-2-images/ and follow the instructions for trying it out or flashing blackbird.pnor.

Again, would love to hear how it goes for you!

February 01, 2020

Interviewing hints (or, so you’ve been laid off…)

This post is an attempt to collect a set of general hints and tips for resumes and interviews. It is not concrete truth though; like all things, this process is subjective and will differ from place to place. It originally started as a Google doc shared around a previous workplace during some layoffs, but it seems more useful than that so I am publishing it publicly.

I’d welcome comments if you think it will help others.

So something bad happened

I have the distinction of having been through layoffs three times now. I think there are some important first steps:

  • Take a deep breath.
  • Hug your loved ones and then go and sweat on something — take a walk, go to the gym, whatever works for you. Research shows that exercise is a powerful mood stabiliser.
  • Make a plan. Who are you going to apply with? Who could refer you? What do you want to do employment wise? Updating your resume is probably a good first step in that plan.
  • Treat finding a job as your job. You probably can’t do it for eight hours a day, but it should be your primary goal for each “workday”. Have a todo list, track things on that list, and keep track of status.

And remember, being laid off isn’t about you, it is about things outside your control. Don’t take it as a reflection on your abilities.

Resumes

  • The goal of a resume is to get someone to want to interview you. It is not meant to be a complete description of everything you’ve done. So, keep it short and salesy (without lying through oversimplification!).
  • Resumes are also cultural: US firms tend to expect a short summary (two pages), while Australian firms seem to expect something longer and more detailed. So, ask your friends if you can see their resumes to get a sense of the right style for the market you’re operating in. It is possible you’ll end up with more than one version if you’re applying in two markets at once.
  • Speaking of friends, referrals are gold. Perhaps look through your LinkedIn and other social media and see where people you’ve formerly worked with are now. If you have a good reputation with someone and they’re somewhere cool, ask them to refer you for a job. It might not work, but it can’t hurt.
  • Ratings for skills on LinkedIn help recruiters find you. So perhaps rate your friends for things you think they’re good at and then ask them to return the favour?

Interviews in general

The soft interview questions we all get asked:

  • I would expect to be asked what I’ve done in my career — an “introduce yourself” moment. So try and have a coherent story that is short but interesting — “I’m a system admin who has been working on cloud orchestration and software defined networking for Australia’s largest telco” for example.
  • You will probably be asked why you’re looking for work too. I think there’s no shame in honesty here, something like “I worked for a small systems integrator that did amazing things, but the main customer has been doing large layoffs and stopped spending”.
  • You will also probably be asked why you want this job / want to work with this company. While everyone really knows it is because you enjoy having money, find other things beforehand to say instead. “I want to work with Amazon because I love cloud, Amazon is kicking arse in that space, and I hear you have great people I’d love to work with”.

Note here: the original version of the above point said “I’d love to learn from”, but it was mentioned on Facebook that the flow felt one way there. It has been tweaked to express a desire for a two way flow of learning.

“What have you done” questions: the reality is that almost all work is collaborative these days. So, have some stories about things you’ve personally done and are proud of, but also have some stories of delivering things bigger than one person could do. For example, perhaps the ansible scripts for your project were super cool and mostly you, but perhaps you should also describe how the overall project was important and wouldn’t have worked without your bits.

Silicon Valley interviews: organizations like Google, Facebook, et cetera want to be famous for having hard interviews. Google will deliberately probe until they find an area you don’t know about and then dig into that. Weirdly, they’re not doing that to be mean — they’re trying to gauge how you respond to new situations (and perhaps stress). So, be honest if you don’t know the answer, but then offer up an informed guess. For example, I used to ask people about system calls and strace. We’d keep going until we hit the limit of what they understood. I’d then stop and explain the next layer and then ask them to infer things — “assuming that things work like this, how would this probably work”? It is important to not panic!

Interviews as a sysadmin

  • Interviewers want to know about your attitude as well as your skills. As sysadmins, sometimes we are faced with high pressure situations  — something is down and we need to get it back up and running ASAP. Have a story ready to tell about a time something went wrong. You should demonstrate that you took the time to plan before acting, even in an emergency scenario. Don’t leave the interviewer thinking you’ll be the guy who will accidentally delete everyone’s data because you’re in a rush.
  • An understanding of how the business functions and why “IT” is important is needed. For example, if you get asked to explain what a firewall is, be sure to talk about how it relates to “security policy” as well as the technical elements (ports, packet inspection & whatnot).
  • Your ability to learn new technologies is as important as the technologies you already know.

Interviews as a developer

  • I think people look for curiosity here. Everyone will encounter new things, so they want to hear that you like learning, are a self starter, and can do new stuff. So for example if you’ve just done the CKA exam and passed that would be a great example.
  • You need to have examples of things you have built and why those were interesting. Was the thing poorly defined before you built it? Was it experimental? Did it have a big impact for the customer?
  • An open source portfolio can really help — it means people can concretely see what you’re capable of instead of just playing 20 questions with you. If you don’t have one, don’t start new projects — go find an existing project to contribute to. It is much more effective.

January 31, 2020

Audiobooks – January 2020

I’ve decided to change my rating system

  • 5/5 = Brilliant, top 5 book of the year
  • 4/5 = Above average, strongly recommend
  • 3/5 = Average, in the middle 70%
  • 2/5 = Disappointing
  • 1/5 = Did not like at all

Far Futures edited by Gregory Benford

5 Hard SF stories set in the distant (10,000 years+) future. I thought they were all pretty good. Would recommend 4/5

Farmer Boy: Little House Series, Book 2 by Laura Ingalls Wilder

A year in the life of a 9 year old boy on a farm in 1860s New York State. Lots of hard work and chores. His family is richer than Laura’s from the previous book. 3/5

Astrophysics for People in a Hurry by Neil DeGrasse Tyson

A quick (4h) overview and introduction of our current understanding of the universe. A nice little introduction to the big stuff. 3/5

The Pioneers: The Heroic Story of the Settlers Who Brought the American Ideal West by David McCullough

The story of five of the first settlers of Marietta, Ohio from 1788 and the early history of the town. Not a big book or wide scope but works okay within its limits. 4/5

1971, Never a Dull Moment: Rock’s Golden Year by David Hepworth

A month by month walk through musical (and some other) history for 1971. Lots of gossip, backstories and history changing (or not) moments. 4/5

Digital Minimalism: Choosing a Focused Life in a Noisy World by Cal Newport

A guide to cutting down electronic distractions (especially social media) to those that make your life better and help towards your goals. 3/5

Where next: Spring starts when a heartbeat’s pounding…

Today I’m delighted to announce the next big adventure for my little family and I.

For my part, I will be joining the inspirational, aspirational and world leading Service Canada to help drive the Benefits Delivery Modernization program with Benoit Long, Tammy Belanger and their wonderful team, in collaboration with our wonderful colleagues across the Canadian Government! This enormous program aims to dramatically improve the experience of Canadians with a broad range of government services, whilst transforming the organization and helping create the digital foundations for a truly responsive, effective and human-centred public sector :)

This is a true digital transformation opportunity which will make a difference in the lives of so many people. It provides a chance to implement and really realise the benefits of human-centred service design, modular architecture (and Government as a Platform), Rules as Code, data analytics, life journey mapping, and all I have been working on for the last 10 years. I am extremely humbled and thankful for the chance to work with and learn from such a forward thinking team, whilst being able to contribute my experience and expertise to such an important and ambitious agenda.

I can’t wait to work with colleagues across ESDC and the broader Government of Canada, as well as from the many innovative provincial governments. I’ve been lucky enough to attend FWD50 in Ottawa for the last 3 years, and I am consistently impressed by the digital and public sector talent in Canada. Of course, because Canada is one of the “Digital Nations”, it also presents a great opportunity to collaborate closely with other leading digital governments, as I also found when working in New Zealand.

We’ll be moving to Ottawa in early March, so we will see everyone in Canada soon, and will be using the next month or so packing up, spending time with Australian friends and family, and learning about our new home :)

My husband and little one are looking forward to learning about Canadian and Indigenous cultures, learning French (and hopefully some Indigenous languages too, if appropriate!), introducing more z’s into my English, experiencing the cold (yes, snow is a novelty for Australians) and contributing how we can to the community in Ottawa. Over the coming years we will be exploring Canada and I can’t wait to share the particularly local culinary delight that is a Beavertail (a large, flat, hot doughnut like pastry) with my family!

For those who didn’t pick up the reference, the blog title had dual meaning: we are of course heading to Ottawa in the Spring, having had a last Australian Summer for a while (gah!), and it also was a little call out to one of the great Canadian bands, that I’ve loved for years, the Tragically Hip :)

January 30, 2020

Links January 2020

C is Not a Low Level Language [1] is an insightful article about the problems with C and the overall design of most current CPUs.

Interesting article about how the Boeing 737Max failure started with a takeover by MBA apparatchiks [2].

Interesting article about the risk of blood clots in space [3]. Widespread human spaceflight is further away than most people expect.

Wired has an insightful article about why rich people are so mean [4]. Also some suggestions for making them less mean.

Google published interesting information about their Titan security processor [5]. It’s currently used on the motherboards of GCP servers and internal Google servers. It would be nice if Google sold motherboards with a version of this.

Interesting research on how the alleged Supermicro motherboard backdoor could work [6]. It shows that while we may never know if the alleged attack took place, such things are proven to be possible. In security we should assume that every attack that is possible is carried out on occasion. It might not have happened when people claim it happened, but it probably happened to someone somewhere. Also we know that TAO carried out similar attacks.

Arstechnica has an interesting article about cracking old passwords used by Unix pioneers [7]. In the old days encrypted passwords weren’t treated as secrets (/etc/passwd is world readable and used to have the encrypted passwords) and some of the encrypted passwords were included in source archives and have now been cracked.

Jim Baker (former general counsel of the FBI) wrote an insightful article titled Rethinking Encryption [8]. Lots of interesting analysis of the issues related to privacy vs the ability of the government to track criminals.

The Atlantic has an interesting article The Coalition Out to Kill Tech as We Know It [9] about the attempts to crack down on the power of big tech companies. Seems like good news.

The General Counsel of the NSA wrote an article “I Work for N.S.A. We Cannot Afford to Lose the Digital Revolution” [10].

Thoughts and Prayers by Ken Liu is an insightful story about trolling and NRA types [11].

Cory Doctorow wrote an insightful Locus article about the lack of anti-trust enforcement in the tech industry and its free speech implications titled “Inaction is a Form of Action” [12].

January 22, 2020

linux.conf.au 2020 recap

It's that time of year again. Most of OzLabs headed up to the Gold Coast for linux.conf.au 2020.

linux.conf.au is one of the longest-running community-led Linux and Free Software events in the world, and attracts a crowd from Australia, New Zealand and much further afield. OzLabbers have been involved in LCA since the very beginning and this year was no exception with myself running the Kernel Miniconf and several others speaking.

The list below contains some of our highlights that we think you should check out. This is just a few of the talks that we managed to make it to - there's plenty more worthwhile stuff on the linux.conf.au YouTube channel.

We'll see you all at LCA2021 right here in Canberra...

Keynotes

A couple of the keynotes really stood out:

Sean is a forensic structural engineer who shows us a variety of examples, from structural collapses and firefighting disasters, where trained professionals were blinded by their expertise and couldn't bring themselves to do things that were obvious.

There's nothing quite like cryptography proofs presented to a keynote audience at 9:30 in the morning. Vanessa goes over the issues with electronic voting systems in Australia, and especially internet voting as used in NSW, including flaws in their implementation of cryptographic algorithms. There continues to be no good way to do internet voting, but with developments in methodologies like risk-limiting audits there may be reasonably safe ways to do in-person electronic voting.

OpenPOWER

There was an OpenISA miniconf, co-organised by none other than Hugh Blemings of the OpenPOWER Foundation.

Anton (on Mikey's behalf) introduces the Power OpenISA and the Microwatt FPGA core which has been released to go with it.

Anton live demos Microwatt in simulation, and also tries to synthesise it for his FPGA but runs out of time...

Paul presents an in-depth overview of the design of the Microwatt core.

Kernel

There were quite a few kernel talks, both in the Kernel Miniconf and throughout the main conference. These are just some of them:

There's been many cases where we've introduced a syscall only to find out later on that we need to add some new parameters - how do we make our syscalls extensible so we can add new parameters later on without needing to define a whole new syscall, while maintaining both forward and backward compatibility? It turns out it's pretty simple but needs a few more kernel helpers.
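
The summary above doesn't spell out the mechanism, so the following is only a sketch under an assumption: that the scheme is the "struct plus size argument" pattern, as used by openat2(2), where the kernel side accepts both older (shorter) and newer (longer) structs. The Python below simulates that size-handling rule in userspace; `copy_struct` is a hypothetical name for illustration, not a kernel API.

```python
import errno
import struct

def copy_struct(user_bytes, kernel_size):
    """Fit a caller-supplied struct into the size this 'kernel' knows.

    Returns the bytes to interpret, or raises OSError(E2BIG) if the
    caller set fields that this kernel does not understand.
    """
    user_size = len(user_bytes)
    if user_size <= kernel_size:
        # Older caller: fields it does not know about read as zero,
        # which must mean "feature not requested".
        return user_bytes + bytes(kernel_size - user_size)
    # Newer caller: accept only if every unknown trailing byte is zero.
    if any(user_bytes[kernel_size:]):
        raise OSError(errno.E2BIG, "unknown non-zero trailing fields")
    return user_bytes[:kernel_size]

old_caller = struct.pack("<Q", 7)      # caller knows one 64-bit field
new_caller = struct.pack("<QQ", 7, 0)  # caller knows two, second unused

# A newer kernel (two fields) reading an older caller's struct.
print(struct.unpack("<QQ", copy_struct(old_caller, 16)))  # prints (7, 0)
# An older kernel (one field) reading a newer caller's struct.
print(struct.unpack("<Q", copy_struct(new_caller, 8)))    # prints (7,)
```

The key design rule is that a zero value in any later-added field has to mean "behave as before", otherwise neither direction of compatibility works.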

There are a bunch of tools out there which you can use to make your kernel hacking experience much more pleasant. You should use them.

Among other security issues with container runtimes, using procfs to setup security controls during the startup of a container is fraught with hilarious problems, because procfs and the Linux filesystem API aren't really designed to do this safely, and also have a bunch of amusing bugs.

Control Flow Integrity is a technique for restricting exploit techniques that hijack a program's control flow (e.g. by overwriting a return address on the stack (ROP), or overwriting a function pointer that's used in an indirect jump). Kees goes through the current state of CFI supporting features in hardware and what is currently available to enable CFI in the kernel.

Linux has supported huge pages for many years, which has significantly improved CPU performance. However, the huge page mechanism was driven by hardware advancements and is somewhat inflexible, and it's just as important to consider software overhead. Matthew has been working on supporting more flexible "large pages" in the page cache to do just that.

Spoiler: the magical fantasy land is a trap.

Community

Lots of community and ethics discussion this year - one talk which stood out to me:

Bradley and Karen argue that while open source has "won", software freedom has regressed in recent years, and present their vision for what modern, pragmatic Free Software activism should look like.

Other

Among the variety of other technical talks at LCA...

Quantum compilers are not really like regular classical compilers (indeed, they're really closer to FPGA synthesis tools). Matthew talks through how quantum compilers map a program on to IBM's quantum hardware and the types of optimisations they apply.

Clevis and Tang provide an implementation of "network bound encryption", allowing you to magically decrypt your secrets when you are on a secure network with access to the appropriate Tang servers. This talk outlines use cases and provides a demonstration.

Christoph discusses how to deal with the hardware and software limitations that make it difficult to capture traffic at wire speed on fast fibre networks.

January 19, 2020

Annual Penguin Picnic, January 25, 2020

Jan 25 2020 12:00
Jan 25 2020 16:00
Location: 
Yarra Bank Reserve, Hawthorn

The Linux Users of Victoria Annual Penguin Picnic will be held on Saturday, January 25, starting at 12 noon at the Yarra Bank Reserve, Hawthorn.  In the event of hazardous levels of smoke or other dangerous weather, we will announce an alternate indoor location.

LUV would like to acknowledge Infoxchange for the Richmond venue.

Linux Users of Victoria Inc. is a subcommittee of Linux Australia.

January 25, 2020 - 12:00


LUV February 2020 Workshop: making and releasing films with free software

Feb 15 2020 12:30
Feb 15 2020 16:30
Location: 
Infoxchange, 33 Elizabeth St. Richmond

Film Freedom: making and releasing films with free software

Film Freedom is a documentation and development project to make filmmaking with free-software and releasing via free culture funding models more accessible. It currently includes the following development projects:

The meeting will be held at Infoxchange, 33 Elizabeth St. Richmond 3121.  Late arrivals please call (0421) 775 358 for access to the venue.

LUV would like to acknowledge Infoxchange for the venue.

Linux Users of Victoria is a subcommittee of Linux Australia.

February 15, 2020 - 12:30


January 17, 2020

Linux.conf.au 2020 – Friday – Lightning Talks and Close

Steve

  • Less opportunity for Intern type stuff
  • Trying to build team with young people
  • Internships
  • They Need opportunities
  • Think about giving a chance

Martin

  • Secure Scuttlebutt
  • p2p social web
  • more like just a protocol
  • scuttlebutt.nz
  • Protocol used for other stuff.

Emma

  • LCA from my perspective

Mike Bailey

  • Pipe-skimming
  • Enhancing UI of CLI tools
  • Takes the first arg of each line in the pipe and sends it to the next tool
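
These notes are terse, so the following is only my reading of the idea: a tool prints an identifier followed by human-friendly detail on each line, and the next tool in the pipe keeps just the first token. A minimal Python sketch, with a made-up function name and sample data:

```python
def skim(lines):
    """Yield the first whitespace-separated token of each non-empty line."""
    for line in lines:
        tokens = line.split()
        if tokens:
            yield tokens[0]

# Made-up output from an earlier tool: an id followed by extra detail.
SAMPLE_OUTPUT = [
    "i-0a1b2c  web-server  running",
    "i-0d4e5f  database    stopped",
    "",
]
print(list(skim(SAMPLE_OUTPUT)))  # prints ['i-0a1b2c', 'i-0d4e5f']
```

In a real tool the lines would come from sys.stdin, so the human-readable columns can stay in the output without breaking the next command.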

Aleks

  • YOGA Book c930
  • Laptop with e-ink display for keyboard
  • Used wireshark to look at USB under Windows
  • Created a device driver based on packets windows was sending
  • Linux recognised it as a USB Keyboard and just works
  • Added new feature and
  • github.com/aleksb

Evan

  • Two factor authentication
  • It’s hard

Keith

  • Snekboard
  • Crowdsourced hardware project
  • crowdsupply.com/keith-packard/snekboard
  • $79 campaign, ends 1 March

Adam and Ben

  • idntfrs
  • bytes are not expensive any more

William

  • Root cause of swiss cheese

Colin

  • OWASP
  • For every person they taught about a vulnerability, 2 people appeared who wrote vulnerable code
  • WebGoat
  • Holds your hand through the OWASP vulnerability list. Exploit and fix
  • teaching, playing to break, go back and fix
  • Forks in various languages

Leigh

  • Masculinity
  • Leave it better than you found it

David

  • Fixing NAT
  • with more NAT

Caitlin

  • Glitter!
  • conferences should be playful
  • meetups can be friendly
  • Ways to introduce job
  • Stickers

Miles

  • Lies, Damn lies and data science
  • Hipster statistics
  • LCA 2021 is in Canberra

Linux.conf.au 2020 – Friday – Session 1 – Protocols / LumoSQL

The Fight to Keep the Watchers at Bay – Mark Nottingham

Disclaimer: I am not a security person, But in some sense we are all security people.

Why Secure the Internet

  • In the beginning it was just researchers and academics
  • Snowden was a watershed moment
  • STRINT Workshop in 2014
  • It’s not just your website, it’s the Javascript that somebody is injecting in front of it.

What has happened so far?

  • http -> https
    • In 2010 even major services used plain http; demo of the Firesheep program to grab cookies and auth off Wifi
    • Injecting cookies in http flows
    • Needed to shift needle to https
    • http/2: big push to make it encrypted-only. It isn’t actually, though browsers only support it over https.
    • “Secure Contexts” cool features only https
  • Problem: Mixed Content
    • “Upgrading Insecure Requests” allow ad-hoc by pages
    • HTTPs is slow – istlsfastyet.com
    • Improvement in speed of implementations
    • Let’s Encrypt
  • Around 85-90% https as of Early 2020
  • Some people were unhappy
    • Slow Satellite internet said they needed middle boxes to optimise http over slow links
    • People who did http shared caching
  • TLS 1.2 -> TLS 1.3
    • Complex old protocol
    • Implementation monoculture
    • Outdated Crypto
    • TLS 1.3
      • Simplify where possible
      • encrypt most of handshake
      • get good review of protocol
      • At around 30%
      • Lots of implementations
    • Some unhappy. Financial institutions needed to sniff secure transactions (and had bought expensive appliances to do this)
      • They ended up forking their own protocol
  • TCP -> QUIC
    • TCP is unencrypted, lots of leaks and room for in-betweens to play around
    • QUIC – all encrypted
    • Spin Bit – single bit of data can be used by providers to estimate packet loss and delay.
  • DNS -> DOH
    • Lots of click data sold by ISPs
    • Countries hijacking DNS to block stuff
    • DNS over https can be co-located with a popular website
    • Some were unhappy
      • Lots of pushback from governments and big companies
      • Industry unhappy about concentration of DNS handling
      • Have to decide who to trust
  • SNI -> Encrypted SNI
    • Work in progress, very complex
    • South Korea unhappy, was using it to block people
  • Traffic Analysis
    • Packet length, frequency, destinations
    • TOR hard to tell. Looking at using multiplexing and fixed-length records
  • But the ends
    • Customer compromised or provider compromised (or otherwise sharing data)
  • Observations
    • Cost and Control
      • Cost: Big technology spends now obsolete
      • Control: some people want to do stuff on the network
    • We have to design the Internet to the pessimistic case
    • You can’t expose application data to the path anymore
    • Well-defined interfaces and counterbalanced roles
    • Technology and Policy need to work together and keep each other in check
    • Making some people unhappy means you need some guiding principles
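
The traffic analysis point in the notes above mentions fixed-length records as a way to hide packet lengths. The talk notes don't give details, so this is just a generic Python sketch of the idea: pad every message to a multiple of one record size before it is encrypted and sent. The record size and function names are arbitrary choices for illustration.

```python
import struct

RECORD_SIZE = 64  # hypothetical fixed record length in bytes

def to_records(payload, record_size=RECORD_SIZE):
    """Split payload into equal-sized records, zero-padding the last one."""
    # A 4-byte length prefix lets the receiver strip the padding again.
    framed = struct.pack(">I", len(payload)) + payload
    framed += bytes(-len(framed) % record_size)
    return [framed[i:i + record_size]
            for i in range(0, len(framed), record_size)]

def from_records(records):
    """Reassemble the original payload from padded records."""
    framed = b"".join(records)
    (length,) = struct.unpack(">I", framed[:4])
    return framed[4:4 + length]

short = to_records(b"hi")
longer = to_records(bytes(100))
print([len(r) for r in short], [len(r) for r in longer])
# prints [64] [64, 64]
```

An observer now sees only the number of records, not exact message sizes, so some coarse length information still leaks unless dummy records are also sent.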

LumoSQL – updating SQLite for the modern age – Dan Shearer

LumoSQL = SQLite + LMDB – WAL

SQLite

  • “Is a replacement for fopen()”
  • Key/Value stores.
    • Everyone used Sleepycat BDB – bought by Oracle and the license changed
    • Many switched to LMDB (approx 2010)
  • Howard Chu 2013 SQLightning faster than SQLite but changes not adopted into SQLite

LumoSQL

  • Funded by NLNet Foundation
  • Dan Shearer and Keith Maxwell

What isn’t working with SQLite ?

  • Inappropriate/unsupported use cases
  • Speed
  • Corruption
  • Encryption

What has been done so far

  • Located code, started on github.com/LumoSQL
  • Benchmarking tool for versions matrix
  • Mapped out how the key-value store works
    • So different backend can be dropped in.
  • Fixed bugs with the port and with lmdb

What’s Next

  • First Release Feb 2020
  • Add Multiple backends
  • Implement two database advances

January 16, 2020

Linux.conf.au 2020 – Thursday – Session 3 – Software Freedom lost / Stream Processing

Open Source Won, but Software Freedom Hasn’t Yet: A Guide & Commiseration Session for FOSS activists by Bradley M. Kuhn, Karen Sandler

Larger events elsewhere tend to be corporate sponsored so probably wouldn’t accept a talk like this

Free Software Purists

  • About 2/3 of the audience had spent some time going out of their way to use free software
  • A few years ago you could only use free software
  • To watch TV. I can use DRM or I can pirate. Both are problems.
  • The web is a very efficient way to install proprietary software (javascript) on your browser
  • Most people don’t even see that or think about it

Laptops

  • 2010-era Laptops are some of the last that are fully free-software
  • Later have firmware and other stuff that is all closed.
  • HTC Dream – some firmware on phone bit but rest was free software

Electronic Coupons

  • Coupons are all digital. You need to run an app that tracks all your purchases
  • “As a Karen I sometimes ask the store to just let me have the coupon, even though it is expired”
  • Couldn’t install Disneyland App on older phones. So unable to bypass lines etc.

Proprietary dumping ground

  • Bradley had a device. Installed all the proprietary apps on it rather than his main phone
  • But it’s a bad idea since all the tracking stuff can talk to each other.

Hypocrisy of traditional free software advocacy

  • Do not criticise people for using proprietary software
  • It is almost impossible to live your life without using it
  • It should be an aspirational goal
  • Person should not be seen as a failure if they use it
  • Asking others to use it instead is worse than using it yourself
  • Karen’s Laptop: It runs Debian but it is only “98% free”

Paradox: The more FOSS there is, the less software freedom we actually have in our technology

  • But there is less software freedom than there was in 2006
  • Because everything is computerized, a lot more than 15 years ago.
  • More things in Linux are what big companies want in datacentres rather than what tinkerers want in their homes.

What are the right choices?

  • Be mindful
  • Try when you can to use free software. Make small choices that support software freedom
  • Shine a light on the problem
  • Don’t let the shame you feel about using proprietary software paralyze you
  • and don’t let the problems we face overwhelm you into inaction
  • Re-prioritize your FOSS development time.
    • Is it going to give more people freedom in the world?
    • Maybe try to do a bit in your free time.
  • Support each other
  • FAIF.us podcast

Advanced Stream Processing on the Edge by Eduardo Silva

Data is everywhere. We need to be able to extract value from it

  • Put it all in a database to extract value
  • Challenge: Data comes from all sorts of places
    • More data -> more bandwidth -> more resource required
    • Delays as more data ingested
  • Challenge: lots of different formats

Ideal Tool

  • Collect from different sources
  • convert unstructured to structured
  • enrichment and filtering
  • multiple destinations like database or cloud services

Fluentbit

  • Started in 2015
  • Origins: lightweight log processor for the embedded space
  • Ended up being used in cloud space
  • Written in C
  • Low mem and CPU
  • Pluggable architecture
  • input -> parser -> filter -> buffer -> routing -> output

Structure Messages

  • Unstructured to structured
  • Metadata
  • Can add tags to data on input, use them later for routing
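
The input -> parser -> filter -> routing -> output flow and the tagging idea above can be sketched in a few lines of Python. This is only an illustration of the concept, not Fluent Bit's code or configuration; the log format, the tag names and every function name here are hypothetical.

```python
import re

# Hypothetical log format: "<LEVEL> <message>", e.g. "ERROR disk full"
LINE_RE = re.compile(r"^(?P<level>[A-Z]+) (?P<message>.*)$")

def parse(tag, line):
    """Turn an unstructured line into a structured record, keeping the input tag."""
    match = LINE_RE.match(line)
    if match is None:
        return None  # unparseable lines are simply dropped in this sketch
    return {"tag": tag, **match.groupdict()}

def keep(record):
    """Filter stage: drop debug noise before it leaves the device."""
    return record["level"] != "DEBUG"

def route(records, routes):
    """Routing stage: send each record to the outputs whose prefix matches its tag."""
    outputs = {name: [] for name in routes.values()}
    for record in records:
        for prefix, name in routes.items():
            if record["tag"].startswith(prefix):
                outputs[name].append(record)
    return outputs

raw = [
    ("app.web", "ERROR disk full"),
    ("app.web", "DEBUG tick"),
    ("sys.kernel", "INFO boot ok"),
]
parsed = [r for r in (parse(tag, line) for tag, line in raw) if r is not None]
result = route([r for r in parsed if keep(r)], {"app.": "database", "sys.": "cloud"})
print(result)
```

The tag is attached once at input time and is the only thing the routing stage looks at, which is what makes it cheap to add new destinations later.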

Stream processing

  • Perform processing while the data is still in motion
  • Faster data processing
  • in Memory
  • No tables
  • No indexing
  • Receive structured data, expose a query language
  • Normally done centrally

Doing this on the edge

  • Offload computation from servers to data collectors
  • Only sends required data to the cloud
  • Use a SQL-like language to write the queries
  • Integrated with fluent core

Functions

  • Aggregation functions
  • Time functions
  • Timeseries functions
  • You can also write functions in Lua
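
As a rough illustration of an aggregation over data "in motion", here is a minimal Python sketch of a fixed (tumbling) time window average that keeps only running sums in memory, with no tables and no index. It is not the Fluent Bit query language; the record fields `ts` and `value` are assumptions made for the example.

```python
from collections import defaultdict

def tumbling_average(records, window_seconds):
    """Average the "value" field per fixed time window, keeping only running sums."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for record in records:
        # Each record falls into exactly one window, identified by its start time.
        window_start = record["ts"] - (record["ts"] % window_seconds)
        sums[window_start] += record["value"]
        counts[window_start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}

readings = [
    {"ts": 0, "value": 10.0},
    {"ts": 3, "value": 20.0},
    {"ts": 5, "value": 30.0},
    {"ts": 9, "value": 50.0},
]
averages = tumbling_average(readings, window_seconds=5)
print(averages)
```

Doing this on the collector means only one number per window needs to be sent upstream instead of every reading.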

Also exposes Prometheus-type metrics

Linux.conf.au 2020 – Thursday – Session 2 – Origins of X / Aerial Photography

The History of X: Lessons for Software Freedom – Keith Packard

1984 – The Origins of X

  • Everything proprietary
  • Brian Reid and Paul Asente: V Kernel -> VGTS -> W window system
    • Ported to VAXstation 100 at Stanford
    • 68k processor, 128k of VRAM
    • B&W
  • Bob Scheifler started hacking W -> X
  • Ported to Unix, made more Unix-friendly (async), renamed X

Unix Workstation Market

  • Unix was closed source
  • Vendor Unix based on BSD 4.x
  • Sun, HP, Digital, Apollo, Tektronix, IBM
  • this was when the configure program happened
  • VAXstation II
    • Color graphics 8bit accelerated
  • Sun 3/60
    • CPU drew everything on the screen

Early Unix Window System – 85-86

  • SunView dominates (actual commercial apps, desktop widgets)
  • Digital VMS/US
  • Apollo had Domain
  • Tektronix demonstrated SmallTalk
  • all only ran on their own hardware

X1 – X6

  • non-free software
  • Used Internally at MIT
  • Shared with friends informally

X10 – approx 1986

  • Almost usable
  • Ported to various workstations
  • Distribution was not all free software (had bin blobs)
    • Sun port relied on SunView kernel API
    • Digital provided binary rendering code
    • IBM PC/RT Support completed in source form

Why X11 ?

  • X10 had warts
  • rendering model was pretty terrible
  • External window manager without borders
  • Other vendors wanted to get involved
    • Jim Gettys and Smokey Wallace
    • Write X11, release under liberal terms
    • Working against Sun
    • Displace Sunview
    • “Reset the market”
    • Digital management agreed

X11 Development 1986-87

  • Protocol designed by a cross-org team
  • Sample implementation done mostly at DEC WRL, collaboration with people at MIT
  • Internet not functional enough to properly collaborate, done via mail
    • Thus most of it happened at MIT

MIT X Consortium

  • Hired dev team at MIT
  • Funded by consortium
  • Members also voted on standards
    • Members stopped their own development
    • Stopped collaboration with non-members
  • We knew Richard too well – The GPL’s worst sponsor
  • Corp sponsors dedicated to non-free software

X Consortium Standards

  • XIE – X Imaging Extensions
  • PEX – PHIGS Extension to X
  • LBX – Low Bandwidth X
  • Xinput (version 1)

The workstation vendors were trying to differentiate. They wanted a minimal base to build their stuff on. The standard was frozen for around 15 years. That is why X fell behind other environments as hardware changed.

X11, NeWS and PostScript

  • NeWS – Very slow but cool
  • Adobe adapted its PostScript interpreter for window systems – Closed Source
  • Merged X11/NeWS server – Closed Source

The Free Unix Desktop

  • All the toolkits were closed source
  • Sunview -> XView
  • OpenView – Xt based toolkit

X Stagnates – ~1992

  • Core protocol not allowed to change
  • non-members pushed out
  • market fragments

Collapse of Unix

  • The Decade of Windows

Opening a treasure trove: The Historical Aerial Photography project by Paul Haesler

  • Geoscience Australia has inherited an extensive archive of historical photography
  • 1.2 million images from 1920 – 1990s
  • Full coverage of Aus and more (some places more than others)

Historical Archive Projects

  • Canonical source of truth is pieces of paper
  • Multiple attempts at scanning/transcription. Duplication and compounding of errors
  • Some errors in original data
  • “Historian” role to sift through and collate into a machine-readable form – usually spreadsheets
  • Data Model typically evolves over time – implementation must be flexible and open-minded

What we get

  • Flight Line Diagrams (metadata)
  • Imagery (data)
  • Lots scanned in early 1990s, but low resolution and missing data, some missed

Digitization Pipeline

  • Flight line diagram pipeline
    • High resolution scans
    • Georeferences
  • Film pipeline
    • Filmstock
    • High Resolution scans
    • Georeference images
    • Georectified images
    • Stitched mosaics + Elevation models

Only about 20% of film scanned. Lacking funding and film deteriorating

Other states have similar smaller archives (and other countries)

  • Many significantly more mature but may be locked in proprietary platforms

Stack

  • Open Data (CC BY 4.0)
  • Open Standards (RESTful, GeoJSON, STAC)
  • Open Source
  • PostgreSQL/PostGIS
  • Python3: Django REST Framework
  • Current Status: API Only. Alpha/proof-of-concept

API

  • Search for Flight runs
  • Output is GeoJSON
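
To give an idea of what GeoJSON output for a flight run could look like, here is a small Python sketch that builds a Feature with a LineString geometry. The property names (`run_id`, `year`) and the coordinates are made up for illustration and are not taken from the project's API.

```python
import json

def flight_run_feature(run_id, year, coordinates):
    """Build a GeoJSON Feature whose geometry is the flight line as a LineString."""
    return {
        "type": "Feature",
        "geometry": {
            "type": "LineString",
            # GeoJSON positions are [longitude, latitude], in that order
            "coordinates": [[lon, lat] for lon, lat in coordinates],
        },
        # Hypothetical metadata fields
        "properties": {"run_id": run_id, "year": year},
    }

feature = flight_run_feature("RUN-1", 1954, [(149.10, -35.30), (149.20, -35.30)])
collection = {"type": "FeatureCollection", "features": [feature]}
print(json.dumps(collection, indent=2))
```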

Coming Next

  • Scanning and georeferencing (need $$$)
  • Data entry/management tools – no spreadsheets
  • Refs to other archives, federated search
  • Integration with TerriaJS/National Map
  • Full STAC once standardized

Linux.conf.au 2020 – Thursday – Session 1 – .NET to Linux / Collecting information

Engineer tested, manager approved: Migrating Windows/.NET services to Linux – Katie Bell

Works at Campaign Monitor

  • sends email spam
  • Company around since 2004

Software product generations

  • Originally a monolith
  • Windows, C# .NET Framework, IIS, Monolithic SQLServer
  • Went to microservices (called Reckless Microservices)
  • Windows, C# .NET, OWIN Hosting / Nancy, Modular databases

Gen 2 – “Reckless” Microservice

  • Easy to create new microservices
  • and deploy etc
  • Runs in ec2

Wanted to move to tools like Docker and Kubernetes that were not well supported by Microsoft tools

Gen 3 – Docker Services

  • Linux
  • Java / Go

Lots of ways to do stuff

  • 3 different ways of doing everything
  • Confusing and big tax on developers
  • Losing knowledge about how the older Reckless stuff worked

A Crazy Idea

  • Run all the Reckless services in docker
  • Get rid of one whole generation

What does it take?

  • Move from .NET Framework to .NET Core
  • Framework very Windows specific – runtime installed at OS level
  • Core more open and cross-platform – self contained executable apps
  • But what about Mono? (Open Source .NET Framework) .
    • Probably not worth the effort since Core is the way forward
  • But a lot of .NET Framework APIs not ported over to .NET Core. Some replaced by new APIs
  • .NET Standard libraries are supported on both though, and there are lots of them

What Doesn’t port to Core?

  • Libraries moved/renamed
  • Some libs dropped
  • IIS, ASP.NET replaced with ASP.NET Core + MVC
  • WCF Server communication
  • Old unmaintained libraries

Luckily Reckless was not using ASP.NET so it shouldn’t be too hard to do. Maybe not such a crazy idea.

But most companies don’t let people spend lots of time on Tech Debt.

Asked for something small – 2 weeks of 3 people.

  • 1 week: Hacky proof of concept (getting 1 service to run in .NET Core)
  • 2nd week: Document and investigate what full project would require and have to do
  • Last Day: Time estimates
  • Found that Windows EC2 instances were 45% more expensive
  • Cost saving alone of moving from Windows to Linux justified the project
  • Pitching:
    • Demo
    • Detailed time estimates
    • Proposal with multiple options
    • Concrete benefits, cost savings, problems with rusty old infra
  • Microsoft Portability Analyzer
    • Just run across app and gives very detailed output
  • icanhasdot.net
    • Good for external dependencies

Web Hosting differences

  • OWIN Hosting vs Kestrel
  • ASP.NET Core DI

Libraries that Do support .NET Standard

  • Had to upgrade all our code to support the new versions
  • Major changes in places

OS Differences

  • case-sensitive filenames
  • Windows services, event logging
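
The case-sensitivity difference is easy to check for before a move. Below is a minimal sketch (in Python rather than C#, purely for brevity) that groups names differing only by letter case; on Windows these refer to the same file, on Linux they do not, so a reference with the wrong case stops resolving. The example paths are invented.

```python
from collections import defaultdict

def case_collisions(paths):
    """Group paths that differ only by letter case."""
    groups = defaultdict(list)
    for path in paths:
        groups[path.lower()].append(path)
    return [sorted(names) for names in groups.values() if len(names) > 1]

# Mix of names as they exist on disk and as the code refers to them
names = ["Config/App.json", "config/app.json", "README.md"]
collisions = case_collisions(names)
print(collisions)
```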

Libraries that did not support .NET Standard

  • Magnum – unmaintained
  • Topshelf

.NET Framework Libraries can be run under .NET Core using compatibility shim. Sometimes works but not really a good idea. Use with extreme caution

Overall Result

  • Took 6-8 months of 2-3 people
  • Everything migrated over.
  • Around 100 services
  • 78 actually running
  • 43 really needed to be migrated
  • 31 actually needed in the end
  • Estimated old hosting cost $145k/year
  • Estimated new hosting cost $70k/year
  • Actual hosting cost $15k/year
  • Got rid of almost all the extra infrastructure that was used to support Reckless. Another $25k/year saved
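
Putting the figures above together, as a quick back-of-the-envelope calculation. It assumes the $25k/year infrastructure saving is in addition to the hosting saving, which the notes imply but do not state outright.

```python
old_hosting = 145_000    # estimated old hosting cost, $/year
estimated_new = 70_000   # estimated new hosting cost, $/year
actual_new = 15_000      # actual new hosting cost, $/year
infra_saved = 25_000     # supporting infrastructure no longer needed, $/year

estimated_saving = old_hosting - estimated_new
# Assumption: the infrastructure saving is additive to the hosting saving
actual_saving = (old_hosting - actual_new) + infra_saved

print(f"Estimated saving: ${estimated_saving:,}/year")
print(f"Actual saving:    ${actual_saving:,}/year")
```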

Advice for cleanup projects

  • Ask for something small
  • Test the idea
  • Demonstrate the business case
  • Build detailed time estimates

Collecting information with care by Opel Symes

The Problem

  • People build systems for people without checking that their assumptions about people are valid
  • Be aware of my assumptions, this doesn’t cover all areas

Names

  • Form “First Name” and “Last Name” -> “Dear John Smith”
  • Fields Required – should be optional
  • Should not do character checks ( blocking accents etc )
  • Check that production supports emoji... everywhere
  • MySQL character encodings: utf8mb4 is only available since 5.5, and is the default in MySQL 8
  • Every database, table and text column, and the defaults, need to be changed to the new character set. Set connection options so things don’t get lost in transfer.
  • Personal Names around the world
  • Chinese names
  • Names can be long
  • Recommendation
    • Ask for “Full name” (where a legal name is required) and “Greeting”
    • Unicode all the way down – test with emoji
    • No Length limits
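
The reason emoji make a good test is that they need four bytes in UTF-8, which is exactly what MySQL's older 3-byte "utf8" character set cannot store. A small Python sketch to show which characters in a name fall into that category (the sample names are just examples):

```python
def needs_four_byte_utf8(text):
    """Return the characters in text that take 4 bytes in UTF-8 (outside the BMP)."""
    return [ch for ch in text if len(ch.encode("utf-8")) == 4]

samples = ["John Smith", "Zoë Müller", "李小龙", "Sam 🎉"]
for name in samples:
    print(name, needs_four_byte_utf8(name))
```

If a test name like the last one survives a round trip through the form, the database and the outgoing email unchanged, the rest of Unicode most likely will too.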

Email

  • Email addresses are quite complex
  • Does it have an “@”
  • Check it is not a simple typo of a well-known email domain
  • Will it be accepted by the email sender?
  • Look for an MX record
  • Ask the SMTP server if this username is valid
  • Simple checks for common errors
  • Don’t roll your own checking, use your own mail server or the mail library that you will be using to send.
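
The two "simple checks" (is there an "@", and is the domain a near miss of a well-known one) could look something like the Python sketch below. It deliberately does not try to validate the address; the domain list and the similarity cutoff are arbitrary choices for the example, and the MX/SMTP checks are left to the mail server or library.

```python
import difflib

# Hypothetical list; a real one would come from your own signup data.
WELL_KNOWN_DOMAINS = ["gmail.com", "hotmail.com", "outlook.com", "yahoo.com"]

def check_email(address):
    """Return (ok, suggestion). Only catches obvious slips; it does not validate."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return False, None
    domain = domain.lower()
    if domain in WELL_KNOWN_DOMAINS:
        return True, None
    # Suggest a well-known domain only when the typed one is very close to it
    close = difflib.get_close_matches(domain, WELL_KNOWN_DOMAINS, n=1, cutoff=0.85)
    suggestion = f"{local}@{close[0]}" if close else None
    return True, suggestion

print(check_email("kim@gmial.com"))
print(check_email("kim@example.org"))
```

The suggestion is meant to be shown as a "did you mean ...?" prompt; the address the person typed is still accepted.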

Gender

  • Transgender vs Cisgender
  • Non-binary – Gender that isn’t male or female
  • Don’t just give the two options
  • A 3rd “other” option isn’t ideal
  • A freeform field is good.
  • Gender Alternative from Nikki Stevens
  • Instead ask if people make up an “under-represented community”

Pronouns

  • What pronouns should we use to refer to you? ( he , she, they )
  • Works okay in English but may not in other languages
  • Some languages lack a gender-neutral pronoun
  • Some languages lack gendered pronouns
  • pronoun.is

Titles

  • Ask for “None” but don’t actually print it “Dear None Smith”
  • Ask for Mx
  • Have a freeform field ( Dr, Count )
  • Maybe avoid titles if possible
  • Don’t show people according to gender, ask specifically.
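
The “Dear None Smith” problem is what happens when an empty title is formatted as if it were text. A minimal Python sketch of a salutation builder that simply leaves out a missing title; the function and parameter names are made up for the example.

```python
def salutation(greeting_name, title=None):
    """Build "Dear ..." from the person's chosen greeting; a missing title is omitted."""
    parts = [part for part in (title, greeting_name) if part]
    return "Dear " + " ".join(parts)

print(salutation("Smith"))
print(salutation("Smith", title="Mx"))
```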

Gender – WGEA

  • The Act defines gender as male or female.
  • Others are not reported.
  • Have an explanation for people who don’t fit in the above

Data Retention

  • Make it simple to change
  • Give users options if it isn’t (eg show preferred name)

Changing Username

  • Usernames are often options
  • Changing them comes with some caveats
  • Use UUIDs to link to users rather than usernames
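
A tiny sketch of why this helps, using in-memory dictionaries in place of a database (the structure and names are invented for the example): when everything refers to a user by UUID, a rename touches one record and nothing else.

```python
import uuid

users = {}   # user_id (UUID) -> profile
posts = []   # each post references the author's UUID, not the username

def create_user(username):
    user_id = uuid.uuid4()
    users[user_id] = {"username": username}
    return user_id

def rename_user(user_id, new_username):
    # Only the profile changes; every reference by UUID stays valid.
    users[user_id]["username"] = new_username

author = create_user("jsmith")
posts.append({"author_id": author, "body": "hello"})
rename_user(author, "jamie")
print(users[posts[0]["author_id"]]["username"])
```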

Changing Emails

  • There are security implications

Deleting Data

  • Make it possible and not too hard

Sharing your WiFi connection with a NetworkManager hotspot

In-flight and hotel WiFi can be quite expensive and often insist on charging users extra to connect multiple devices. In order to avoid that, it's possible to easily create a WiFi hotspot using NetworkManager and an external USB WiFi adapter.

Creating the hotspot

The main trick is to right-click on the NetworkManager icon in the status bar and select "Edit Connections..." (not "Create New WiFi Network..." despite the promising name).

From there click the "+" button in the lower right then "WiFi" as the Connection Type. I like to use the computer name as the "Connection name".

In the WiFi tab, set the following:

  • SSID: machinename_nomap
  • Mode: hotspot
  • Device: (the device name of the USB WiFi adapter)

The _nomap suffix is there to opt out of the Google and Mozilla location services which could allow anybody to lookup sightings of your device around the World.

In the WiFi Security tab:

  • Security: WPA & WPA2 Personal
  • Password: (a 63-character random password generated using pwgen -s 63)

While you may think that such a long password is inconvenient, it's now possible to add the network automatically by simply scanning a QR code on your phone.
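
For those who would rather script it, here is a rough Python equivalent of the password generation, together with the text that commonly gets encoded into a WiFi QR code. The "WIFI:" format shown is the widely used convention understood by most phone cameras, not something NetworkManager defines, and the sketch assumes the SSID and password contain no characters that would need escaping in that format.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits

def random_password(length=63):
    """Random alphanumeric password, similar in spirit to `pwgen -s 63`."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def wifi_qr_payload(ssid, password):
    """Text for a QR code in the commonly used "WIFI:" format.

    Assumes neither value contains characters the format treats as special
    (backslash, semicolon, comma, colon, double quote).
    """
    return f"WIFI:T:WPA;S:{ssid};P:{password};;"

psk = random_password()
payload = wifi_qr_payload("machinename_nomap", psk)
print(payload)
```

The resulting string can be fed to any QR code generator.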

In the IPv4 Settings tab:

  • Method: Shared to other computers

Finally, in the IPv6 Settings tab:

  • Method: Ignore

I ended up with the following config in /etc/NetworkManager/system-connections/machinename:

[connection]
id=machinename
uuid=<long UUID string>
type=wifi
interface-name=wl...
permissions=
timestamp=1578533792

[wifi]
mac-address=<MAC>
mac-address-blacklist=
mode=ap
seen-bssids=<BSSID>
ssid=machinename_nomap

[wifi-security]
key-mgmt=wpa-psk
psk=<63-character password>

[ipv4]
dns-search=
method=shared

[ipv6]
addr-gen-mode=stable-privacy
dns-search=
ip6-privacy=0
method=ignore

Firewall rules

In order for the packets to flow correctly, I opened up the following ports on my machine's local firewall:

-A INPUT -s 10.42.0.0/24 -j ACCEPT
-A FORWARD -d 10.42.0.0/24 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -s 10.42.0.0/24 -j ACCEPT
-A INPUT -d 224.0.0.251 -s 10.42.0.1 -j ACCEPT
-A INPUT -d 239.255.255.250 -s 10.42.0.1 -j ACCEPT
-A INPUT -d 10.42.0.255 -s 10.42.0.1 -j ACCEPT
-A INPUT -d 10.42.0.1 -s 10.42.0.0/24 -j ACCEPT
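
The addresses in these rules can be sanity-checked with Python's `ipaddress` module. The rules assume the hotspot uses the 10.42.0.0/24 range with the laptop at 10.42.0.1; the two multicast destinations are the well-known mDNS (224.0.0.251) and SSDP (239.255.255.250) groups.

```python
import ipaddress

hotspot = ipaddress.ip_network("10.42.0.0/24")
gateway = ipaddress.ip_address("10.42.0.1")

# The laptop's hotspot address is inside the range given to clients
print(gateway in hotspot)
# The subnet broadcast address matched by the "-d 10.42.0.255" rule
print(hotspot.broadcast_address)
# The two remaining destinations are multicast groups
for group in ("224.0.0.251", "239.255.255.250"):
    print(group, ipaddress.ip_address(group).is_multicast)
```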

Linux.conf.au 2020 – Thursday – Keynote – Vanessa Teague

Keynote: Who cares about Democracy? by Vanessa Teague

The techniques for verifying electronic elections are probably too difficult for real voters to use.

The ones that have been deployed have lots of problems

Complex maths for end-to-end verifiable elections
– people can query their votes to verify they were recorded
– votes are safely mixed so others can’t check.

Swisspost/Scytl System
– 2 bugs. One in the shuffling, one in decryption proof

End-to-end verifiable elections: limitations and criticism

  • Users need to do a lot of careful work to verify
  • If you don’t do it properly you can be tricked
  • You can ( usually ) prove how you voted
    • Though not always, and usually not in a polling-place system
  • Verification requires expertise
  • Subtle bugs can undermine security properties

What does all this have to do with NSW iVote?

  • Used Closed source software
  • Some software available under NDA afterwards
  • Admitted it was affected by the first Swiss bug. This was when early voting was occurring
  • Also said the 2nd Swiss bug wasn’t relevant.
  • After code was available they found it was relevant, a patch had been applied but it didn’t fix the problem
  • NSW law for election software is all about penalties for releasing information on problems.

Victoria has passed a bill that allows elections to be conducted via any method which is aimed at introducing electronic voting in future elections

Electronic Counting of Paper Records

  • Various areas have auditing software that runs against votes
  • This only works on FPTP elections, not instant-runoff elections
  • Created some auditing software that should work; this was tested using some votes from San Francisco elections
  • A sample of ballots is taken and the physical ballot should match what the electronic one said it is.
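
The comparison step in the last point can be sketched as below. This is only the "pull a random sample and compare" part; it is not the speaker's auditing software and it does not do the statistics that decide how large the sample must be. The ballot data is invented.

```python
import random

def audit_sample(paper, electronic, sample_size, seed):
    """Compare a random sample of paper ballots with their electronic records.

    Returns the sampled ballot IDs whose two records disagree.
    """
    rng = random.Random(seed)  # a real audit would use a publicly generated seed
    sample = rng.sample(sorted(paper), sample_size)
    return [b for b in sample if paper[b] != electronic.get(b)]

# Invented data: 100 ballots, one of which was recorded incorrectly
paper = {ballot_id: ("A" if ballot_id % 2 else "B") for ballot_id in range(100)}
electronic = dict(paper)
electronic[7] = "B"

print(audit_sample(paper, electronic, sample_size=100, seed=1))
```

With a smaller sample the single bad record may or may not be drawn, which is why the sample size has to come from a proper statistical calculation.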

Australian Senate vote

  • Auditing not done, since not mandated in law

What can we do

  • Switzerland has laws around transparency, privacy and verification
  • NSW Internet voting law is orientated around protecting the vendors by keeping the code secret
  • California has laws about auditing
  • Australian Senate scrutineering rules say nothing about computerised scanning and auditing
  • Aus Should
    • Must be a meaningful statistical audit of the paper ballots
    • with meaningful observation by scrutineers

In Summary

  • Verifiable e-voting at polling place is feasible
  • over the Internet is an unsolved problem
  • The Senate count at present provides no evidence of accuracy
  • but would if a rigorous statistical audit is mandated

How else to use verifiable voting technology?

  • Crowdsourcing amendments to legislation with a chance to vote up or down
  • Open input into parliamentary questions
  • A version for teenagers to practice debating what they choose

January 15, 2020

Digital excellence in Ballarat

In December I had the opportunity to work with Matthew Swards and the Business Improvements team in the Ballarat Council to provide a little support for their ambitious digital and data program. The Ballarat Council developed the Ballarat Digital Services Strategy a couple of years ago, which is excellent and sets a strong direction for human centred, integrated, inclusive and data driven government services. Councils face all the same challenges that I’ve found in Federal and State Governments, so many of the same strategies apply, but it was a true delight to see some of the exceptional work happening in data and digital in Ballarat.

The Ballarat Digital Services Strategy has a clear intent which I found to be a great foundation for program planning and balancing short term delivery with long term sustainable architecture and system responsiveness to change:

  1. Develop online services that are citizen centric and integrated from the user’s perspective;
  2. Ensure where possible citizens and businesses are not left behind by a lack of digital capability;
  3. Harness technology to enhance and support innovation within council business units;
  4. Design systems, solutions and data repositories strategically but deploy them tactically;
  5. Create and articulate clear purpose by aligning projects and priorities with council’s priorities;
  6. Achieve best value for ratepayers by focusing on cost efficiency and cost transparency;
  7. Build, lead and leverage community partnerships in order to achieve better outcomes; and
  8. Re-use resources, data and systems in order to reduce overall costs and implementation times.

The Business Improvement team has been working across Council to try to meet these goals, and there has been great progress on several fronts from several different parts of the Council. I only had a few days but got to see great work on opening more Council data, improving Council data quality, bringing more user centred approaches to service design and delivery, exploration of emerging technologies (including IoT) for Council services, and helping bring a user-centred, multi-disciplinary and agile approach to service design and delivery, working closely with business and IT teams. It was particularly great to see cross Council groups around big ticket programs to draw on expertise and capabilities across the organisation, as this kind of horizontal governance is critical for holistic and coordinated efforts for big community outcomes.

Whilst in town, Matthew Swards and I wandered the 5 minutes walk to the tech precinct to catch up with George Fong, who gave us a quick tour, including to the local Tech School, as well as a great chat about digital strategies, connectivity, access, inclusiveness and foundations for regional and remote communities to engage in the digital economy. The local talent and innovation in Ballarat is great to see, and in such close vicinity to the Council itself! The opportunities for collaboration are many and it was great to see cross sector discussions about what is good for the future of Ballarat :)

The Tech School blew my mind! It is a great State Government initiative to have a shared technology centre for all the local schools to use, and included state of the art gaming, 3D digital and printing tech, a robotics lab, and even an industrial strength food lab! I told a few people that people would move to Ballarat for their kids to have access to such a facility, to which I was told “this is just one of 10 across the state”.

It was great to work with the Business Improvement team and consider ways to drive the digital and data agenda for the Council and for Ballarat more broadly. It was also great to be able to leverage so many openly available government standards and design systems, such as the GDS and DTA Digital Service Standards and the NSW Design System. Open government approaches like this make it easier for all levels of government across the world to leverage good practice, reuse standards and code, and deliver better services for the community. It was excellent timing that the Australian National API Design Standard was released this week, as it will also be of great use to Ballarat Council and all other Councils across Australia. Victoria has a special advantage as well because of the Municipal Association of Victoria (MAV), which works with and supports all Victorian Councils. The amount of great innovation and coordinated co-development around Council needs is extraordinary, and you could imagine the opportunities for better services if MAV and the Councils were to adopt a standard Digital Service Standard for Councils :)

Many thanks to Matt and the BI team at Ballarat Council, as well as those who made the time to meet and discuss all things digital and data. I hope my small contribution can help, and I’m confident that Ballarat will continue to be a shining example of digital and data excellence in government. It was truly a delight to see great work happening in yet another innovative Local Council in Australia, it certainly seems a compelling place to live :)

Linux.conf.au 2020 – Wednesday – Session 3 – FLOSS Leadership and Citizenship

Open collaborations: leadership succession and leadership success – Anne Smith & Myk Dowling

  • Started playing Kerbal Space Program and using lots of mods to it.

KSP-CKAN

  • Comprehensive Kerbal Archive Network
  • 150k downloads of a previous release, 72k of last release
  • 1035 stars on GitHub
  • 124 releases from 16 developers
  • Written in C#

Why was the project a success out of around 1.4 million projects?

Conway’s Law

  • FOSS projects are generally modular
    • C and C-derived languages are predictive of success
    • Portability predictor of success
  • Layered Development

83% of FOSS Projects fail. 46% before and 37% after a stable release

How do projects organise?

  • First the founder 1-2
  • Then a belt of users emerges
  • Then a periphery – active users
  • A core of developers emerges
  • Some formality emerges

Onboarding People

  • Relying on self-motivated people limits the number of people who will join your team
  • If you lose people by brushing them off you reduce your team diversity, team diversity gives increased likelihood of success
  • From the core to the periphery: an order of magnitude decrease in activity but an order of magnitude increase in size.
  • Therefore there is a 1:1 level of work between them, which is about the same ratio of code:support work that is needed.
  • Flat structures are not stable; FOSS teams self-organise into a complex of a dual-layer structure
  • Leaders should prioritise the people on the periphery. Many join for a short term need, the leader has to give them other reasons to stick around.

Links to other Projects

  • Friction with mod authors. Mod authors who thought CKAN installed things the wrong way and caused problems got annoyed.
  • Some authors of mods that were under FOSS licences asked for them to be removed, which CKAN resisted doing.
  • CKAN was mostly orientated towards users and not so much towards the authors
  • Significant group of mod authors considered opting out of CKAN
  • Speaker proposed a policy that allowed mod authors to delete their mods

Leadership

  • Strong technical contributions
  • Participatory behavior
  • Organisation building behaviors

Leadership origin and style

  • Typically the initial leader/s are the founder/s
  • Often shared
  • Leaders may move from core to periphery without losing the position
  • Organisation focus vs Product (technical) focus
  • People with both skills are the ones selected for leadership

CKAN in Transition

  • Removed mods as requested
  • Which broke things for some time
  • Leadership got transferred over
  • Original technical-orientated leader stepped back
  • A more Organizational-orientated leader took over
  • A clear and public succession is much better. Although some people still dropped out.
  • But better than an acrimonious fork

Leadership Transitions

  • Make them speedy and smooth
  • Happen at the speed of military coups
  • Limited participation from a predecessor assists in a smooth change
  • Establishing succession rules helps

Review the state of your project’s public-facing website from the POV of the peripheral people you want to attract.

Open Source Citizenship by Josh Simmons

Healthy Projects are vital which is why many companies are investing in projects

  • They don’t just need money

What are companies doing now?

  • Upstreaming contributions
  • Contributing to the ecosystem
  • Paid contributors on staff (full or part time)
    • Hire out of the Project contributors
  • Supporting with money, infrastructure etc. Both projects directly and other things
  • Programs to help contributors get started.
  • Sharing their experience

What companies provide is not always what communities want

What are Communities asking for?

  • Volunteer design,
  • UX/UI
  • Project management
  • technical writing
  • data science
  • marketing/PR

and yet, code still dominates. These skills need onramps to contribute to your projects.

  • Contribute beyond what the company needs
  • Projects want testing and QA resources
  • Fund conference travel for contributors
  • Event Space
  • Open Source friendly contracts for employees who contribute to Open Source – See the “Contract Patch Program”
  • Jobs for the maintainers and contributors when heavily relying on their work
    • If the maintainers are not getting paid that is a risk for the business
  • Encourage Universities to give students credit for contributing to FLOSS
  • Abide by community norms

Building a Culture of Open Source Citizenship

  • Enumerate and value your dependencies
  • Raise internal awareness
  • Incentivise your people to contribute to open source
  • train, train and train
  • Be Patient

For FLOSS Projects

  • Make it easy to learn about your project
  • Have clear project governance and licensing
    • Say what you are looking for
    • We want to know the investment we make in you is going to be used well and in a transparent way
    • Have a way to receive money
    • Look at being a member of a larger organisation like Software Freedom Conservancy
    • See also Open Collective if you are just starting out
  • Have a plan for how you are going to use the money
  • Be prepared to work with corporate timelines
  • Be prepared to onboard new contributors
    • Contributor documentation

Linux.conf.au 2020 – Wednesday – Session 2 – Unix Legacy / Social Media Research

What UNIX Cost Us by Benno Rice

Not everything is a file

Connecting to a USB device:

  • Windows – not too bad
  • Mac – a little weird
  • Linux – Lots of weird file operations. ioctl to pass data back and forth

Even worse API for creating usb_fs device. Lots of writing random data to random files.

But this is all behind a nice library?

  • Yeah but it is still a mess under the hood

Got a Byte? – Unix IO model

  • Works okay on small slow machines with simple slow interfaces
  • Doesn’t work so well with Internet, blocking
  • poll still has performance limitations
  • kevent api looked nice but Linux got epoll instead (but focuses around file descriptors)
  • But they are all still synchronous
  • Windows has Async calls
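
To make the "still synchronous" point concrete, here is a small Python sketch using the standard `selectors` module (which picks epoll on Linux and kqueue on the BSDs). The kernel only tells the program that a descriptor is ready; the program then performs the read itself. The socket pair is just a stand-in for a network connection.

```python
import selectors
import socket

selector = selectors.DefaultSelector()
left, right = socket.socketpair()   # stand-in for a real network connection
right.setblocking(False)
selector.register(right, selectors.EVENT_READ)

left.sendall(b"ping")

received = b""
# select() blocks until a registered descriptor is *ready*; the recv() that
# follows is still an ordinary synchronous call made by the program itself.
for key, events in selector.select(timeout=1):
    received += key.fileobj.recv(1024)

selector.unregister(right)
selector.close()
left.close()
right.close()
print(received)
```

With a completion-based API the program would instead hand the kernel a buffer and be told when the data had already been placed in it.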

Unix is Tied to its history

  • Windows is newer so could learn from what came before and targeted newer hardware

C is for Colonialism

  • Farming in Europe
    • Moved to Australia, everything they knew about farming doesn’t work any more.
  • PDP-11 was what Unix originally ran on, simple process model.
  • Modern CPUs are not very simple
  • New CPUs lie to the OS about what the state of the machine really is (see Spectre).
  • C is not built to handle this.
  • C doesn’t handle
    • Vectorisation
    • Structure layout and padding
    • Arrays, pointers etc
  • We are not on a PDP-11 anymore
  • We have failed to evolve our CPUs and C because they are locked to each other
  • “C is not a Low Level Language” – Article
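
The structure layout and padding point is easy to see without writing any C. Python's `struct` module can report the size of the same two fields (a char followed by an int) under the platform's native C alignment rules and with no padding at all. The exact native size depends on the platform, so it is printed rather than assumed here.

```python
import struct

# "@" uses the platform's native C sizes and alignment,
# "=" uses standard sizes with no padding between fields.
native = struct.calcsize("@ci")   # char followed by int, as a C compiler lays it out
packed = struct.calcsize("=ci")   # the same two fields with no padding

print(f"native layout: {native} bytes, packed layout: {packed} bytes")
print(f"padding inserted: {native - packed} bytes")
```

On most current machines the native layout is larger, because the int is moved to an aligned offset and the gap after the char is wasted.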

The UNIX Philosophy Problem

  • Lots of different definitions
  • Pipes seem important
  • Everything I like about using computers these days tends to be big integrated desktop tools.

Unix Suited its time

  • By accident it became the thing we all use
  • That time was a long time ago

How we run the community has also evolved.

Privacy is not Binary: A discussion of data systems, ethics, and human rights by Elizabeth Alpert and Amelia Radke

I was a little late to this talk so missed out the first 10-15 minutes

Social Media data reuse

  • Used by the providers
  • Governments
  • Other users
  • Malicious Users

Chucking lots of data into an “AI” is seen as yielding interesting and cool data.

Within Academia

  • Risk management. Aware of harms, mitigated, risk/reward

Is Social Media data public or private?

  • It was shared with the expectation of a certain context
  • Hard to write things for your friends but keep random 3rd parties in mind
  • Inferring personal information -> Dangerous
    • Especially when you are trying to infer “protected” characteristics like sexuality or religion
  • Consent? – Tricky
  • Anonymizable? – Doesn’t work

Perceptions of Risks

  • At risk groups usually given higher protection
  • Privacy is a cultural concept
  • Cultural Maps

How do we do things better

  • Ethics can’t be just one person’s responsibility, it has to be in all decisions
  • Who does this belong to?
  • How do they want it to be shared?

Linux.conf.au 2020 – Wednesday – Session 1 – K8s & Security Advice

Building a zero downtime Kubernetes cluster by Feilong Wang

Working for Catalyst Cloud. Catalyst Cloud especially appealing to NZ customers who don’t want latency of going to Australia

Zero Downtime in K8s Context
– Downtime of the User applications
– Downtime of the k8s cluster

The ultimate goal is zero downtime for the customer applications.

User Applications

  • Replicas >2 (ideally >3)
  • PodDisruptionBudget with minAvailable
  • Correct RollingUpdate strategy
  • Connection Draining (using readinessProbe, handle SIGTERM)
    • use preStop for apps that don’t handle SIGTERM
  • HTTP Keep-Alive
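
The connection draining item above can be sketched in a few lines of python. This is only an illustration under my own assumptions, not something shown in the talk: the DrainState class, its method names and the idea of a /ready endpoint returning its status code are all hypothetical. On SIGTERM the app stops reporting ready (so the readinessProbe fails and new traffic goes elsewhere) and only exits once in-flight requests have finished.

```python
import signal
import threading


class DrainState:
    """Tracks readiness and in-flight requests for a graceful shutdown."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ready = True     # what a hypothetical /ready endpoint reports
        self._in_flight = 0    # requests currently being served

    def handle_sigterm(self, signum, frame):
        # Plain assignment only: taking a lock inside a signal handler
        # could deadlock if the main thread already holds it.
        self._ready = False

    def readiness_status(self):
        # HTTP status the readiness endpoint would return.
        return 200 if self._ready else 503

    def request_started(self):
        with self._lock:
            self._in_flight += 1

    def request_finished(self):
        with self._lock:
            self._in_flight -= 1

    def safe_to_exit(self):
        # Exit only after SIGTERM was seen and every request has completed.
        with self._lock:
            return not self._ready and self._in_flight == 0


def install(state):
    # Register the handler; call this once from the main thread at startup.
    signal.signal(signal.SIGTERM, state.handle_sigterm)
```

The server's main loop would poll safe_to_exit() after SIGTERM and then shut down, ideally well inside the pod's termination grace period.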

Zero Downtime for the K8s Cluster

  • Planned maintenance (eg an upgrade)
  • Unexpected node broken

Planned

  • Cordon and drain nodes, upgrade, uncordon

Unplanned Node Broken

  • Failure detection
  • Repair/Healing
  • Manual or Automatic?

Detect Failure

  • Detect failures from outside or inside the cluster

Draino + Cluster Autoscaler

  • Detect node status/condition by draino
  • Drain the node
  • Autoscaler will remove the empty node since its workload is under 50%
  • See also Node Problem Detector

Magnum AutoHealer

  • Support master node and etcd repairing
  • Autoscaler is responsible for repairing
  • The node count is predictable after repairing
  • Currently only supports openstack but could be extended

Like, Share and Subscribe: Effective Communication of Security Advice by Serena Chen

Tools and ideas to help you communicate security advice to friends and family who are not in tech.

Security Professionals are a bubble within the Tech Bubble.

Tell the people who are doing the wrong practices (like using Windows XP) that “we can’t help you”.

Nobody chooses to do the wrong thing and be insecure, they are trying to do the best for themselves.

What if people are not bad at security “because it is hard” but because they are not getting the right messages.

Personas

  • Group 1
    • Don’t know what good practice looks like
    • Confused what to do
  • Group 2
    • Knows some good practices
    • But doesn’t do any of them (eg knows about password managers but doesn’t use them)
    • Not sure how to implement

Security is like exercise

  • Ongoing
  • More is better
  • Room for improvement
    • Little steps, not big steps
    • Do one update not a huge change
    • The Perfect is the enemy of the good
    • Personalised for each person

How to Personalise for each person

Consider where on the following spectrums they fall

  • Technological Capability
  • Privacy needs
    • Don’t forget those who need to be visible
  • Likely Adversaries

The Open Internet Tools Project has a big sample of personas

Lay a Path for Progression

  • Couch to 5k for Security
  • Week 1 – Add a password on your phone
  • Week 2 – Change your email password

How do we communicate

  • Tell, sell and shame doesn’t work
  • Lead by example (this is what I do, you could too)
  • Sell doesn’t work
    • Give people successful examples to emulate
    • Give people scripts to help them navigate
  • Shame also doesn’t work
    • Shame Culture means that people don’t ask for advice
    • Try asking “Hey, can I show you a better way to do this?”

“Influencers”

  • Show don’t tell
  • Show their mistakes
  • Let you opt in and not out
  • Give you a range of people to follow
  • I made a youtube channel!
    • Immediately fell back into the habit of Tell, Sell and Shame
    • To reach people requires a degree of vulnerability
    • Experts are the ones who don’t want to reveal their personal security setup
  • What else happened
    • Friends asked me about my security
    • Showed people IRL my personal setup and how I got there
      • Honest about how hard it was
      • A lot of them were already clued up, seeing somebody they know actually doing it encouraged them to take the step and do it

Be Vulnerable

  • Tell them how you screwed up
  • People want to hear how they are not stupid for finding it hard
  • Be nice to people

Linux.conf.au 2020 – Wednesday – Keynote – Donna Benjamin

Keynote: From 2020 to 2121: How will we get there?

Who is watching and why are they watching? Why does it matter?

People install siri and other personal assistants. Cameras are everywhere.

We are making it too easy for the bad guys.

But makers of free and open source software are also helping the persuasion industry. Are we responsible for that?

The Why matters. Is the tech deterring crime, helping rescue people or used for repressing people.

Observation + Suspicion = Surveillance

From here to 2021

How to make the future happen. Act now to create what you’ll need when you get there. Pack like a Hiking trip

The Four Powers – Information, Relationships, Resources and Decision Making

What is something small you can do now to make the future better? Donna is going to take steps to improve our herd immunity to mass surveillance

https://etherpad.wikimedia.org/p/LCA2121

Steps to take to more evenly distribute power now and more evenly distribute the future in 2021

Open Australia – Run various websites
– Putting Hansard online in machine readable format
– More easily submit freedom of information requests

Appreciative Inquiry

January 14, 2020

Linux.conf.au 2020 – Tuesday – Session 3 – Container Miniconf

Unsafe Defaults: Deploying Kubernetes Safer(ish) – James

Overview of Kubernetes

  • A compromised container is very close to being a compromised host
  • While you shouldn’t curl|bash the attacker can do it to get the latest exploits.

Three Quick things for some easy wins

  • The Kubernetes API is completely open from localhost. This is no longer required but old clusters and some upgraded clusters may still have it.
  • Put a Valid certificate on the cluster or at least one you can keep track of.
  • Get rid of unauthenticated user roles as much as possible.
  • Check you don’t still have “forever tokens”
  • A Good idea not to give service tokens to most pods.
    automountServiceAccountToken: false

PodsecurityPolicy

  • Keep an eye on
  • New
  • You need good RBAC
  • Have a look at k-rail

etcd

  • Can turn on authentication
  • Can turn on TLS between peers and clients
  • Can encrypt on disk
  • Can restrict it with a firewall

Every Image Has A Purpose by Allen Shone

Docker Images

  • What are they anyway
  • A base definition to prepare a filesystem for execution as a container
  • Caching mechanism
  • Reproducible
  • Great way to share runtime circumstances
  • A comprehensive environment structure

Layers

  • An image is a series of layers
  • Minimizing layers makes things better
  • Structure the image build process to get the best set of images

Basic Uses

  • Use the most appropriate image
  • A small fix can add up

Images in Production / Customers facing envs

  • When deploying containers, be as precise as possible.
  • The image should be ready to go without further work
  • Keep the image as small and simple as possible
  • “FROM golang:alpine” in testing
  • “FROM scratch” in production
  • Two images but they serve different purposes

Development

  • Possible to use the same image as previously
  • Bring in some extra debug tools etc, mocks for other services

Trimming the final image to be very specific

  • Start with the production image and add extra layers of stuff

Deployed Considerations

  • Some things only come into consideration once they are deployed
  • Instead of creating a big general container, create two containers in a pod that share a file system
  • Configuration should be injected, as an env-specific setup
  • Images should be agnostic

Extras

  • Look at using the .dockerignore file
  • Use image scanning tools (Dive and Clair)
  • A little preparation up front can prevent a lot of headache later

Writing a terraform remote state server

Terraform is a useful tool for deploying cloud resources. This post isn’t an introduction to terraform, so I’ll assume you already know and love it. If you want more, then this getting started guide would be a sensible start.

At its most basic level, terraform deploys cloud resources and stores information about those resources in a file on local disk called terraform.tfstate — it needs that state information so it can make later changes to the deployment, be those modifying resources in use or tearing the whole deployment down. If you had an operations team working on an environment, then you could store the tfstate file in git or a shared filesystem so that the entire team could manage the deployment. However, there is nothing with that approach that stops two members of the team making overlapping changes.

That’s where terraform state servers come in. State servers can implement optional locking, which stops overlapping operations from happening. The protocol that these servers talk isn’t well documented (that I could find), so I wanted to explore it further, and wrote a simple terraform HTTP state server in python.

To use this state server, configure your terraform file as per demo.tf. The important bits are:

terraform {
  backend "http" {
    address = "http://localhost:5000/terraform_state/4cdd0c76-d78b-11e9-9bea-db9cd8374f3a"
    lock_address = "http://localhost:5000/terraform_lock/4cdd0c76-d78b-11e9-9bea-db9cd8374f3a"
    lock_method = "PUT"
    unlock_address = "http://localhost:5000/terraform_lock/4cdd0c76-d78b-11e9-9bea-db9cd8374f3a"
    unlock_method = "DELETE"
  }
}

Where the URL to the state server will obviously change. The UUID in the URL (4cdd0c76-d78b-11e9-9bea-db9cd8374f3a in this case) is an example of an external ID you might use to correlate the terraform state with the system that requested it be built. It doesn’t have to be a UUID, it can be any string.

I am using PUT and DELETE for locks due to limitations in the HTTP verbs that the python flask framework exposes. You might be able to get away with the defaults in other languages or frameworks.
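
A rough sketch of the locking behaviour described above, kept framework-free so it runs with only the standard library. This is not the code from the state server mentioned in this post. The class and method names are hypothetical, and the details (a JSON lock body carrying an "ID" field, the lock ID being passed back when state is written, and 423 as the status for a held lock) are my understanding of how the terraform http backend behaves rather than something taken from the post.

```python
import json


class StateStore:
    """In-memory state and lock table, keyed by the external ID in the URL."""

    def __init__(self):
        self.states = {}   # key -> raw state document (string)
        self.locks = {}    # key -> lock info dict sent by the client

    def get_state(self, key):
        # Fetching state: 404 when nothing has been stored yet.
        if key not in self.states:
            return 404, ""
        return 200, self.states[key]

    def put_state(self, key, body, lock_id=None):
        # Refuse the write if somebody else holds the lock.
        holder = self.locks.get(key)
        if holder is not None and holder.get("ID") != lock_id:
            return 423, json.dumps(holder)
        self.states[key] = body
        return 200, ""

    def lock(self, key, lock_info_json):
        # A second lock attempt gets the current holder's details back.
        holder = self.locks.get(key)
        if holder is not None:
            return 423, json.dumps(holder)
        self.locks[key] = json.loads(lock_info_json)
        return 200, ""

    def unlock(self, key):
        # Unlocking something that is not locked is treated as a no-op.
        self.locks.pop(key, None)
        return 200, ""
```

Each method returns a (status, body) pair, so wiring it into flask or any other framework is a matter of mapping the state URL to get_state and put_state, and the lock URL's PUT and DELETE to lock and unlock.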

To run the python server, make a venv, install the dependencies, and then run:

$ python3 -m venv ~/virtualenvs/remote_state
$ . ~/virtualenvs/remote_state/bin/activate
$ pip install -U -r requirements.txt
$ python stateserver.py

I hope someone else finds this useful.

Linux.conf.au 2020 – Tuesday – Session 2 – Security, Identity, Privacy Miniconf

Privacy and Transparency in the VPN industry by Ruben Rubio Rey

We are at an “Oh Noe!” Moment in the VPN Industry

VPN Advantages

  • Protect your privacy
  • Bypass Geo-Restrictions
  • Beat Censorship
  • Save money on Hotels and Flights
  • Download torrents anonymously
  • Bypass ISP speed regulations
  • Secures Public WIFI

What Can be intercepted?
– Without Encryption: Any Data
– With Encryption: IP and Port

But HTTPS only works if client and server are configured correctly
Client: Rogue root certificate
Servers: CORS, insecure SSL version

Protect Your Privacy

  • Many countries systematically collect data about citizens
  • ISPs collect data, must keep it for two years and make it accessible to agencies
  • US ISPs can sell information
  • Other countries tried to put in MITM Certs

So do private companies have incentives to protect my data?

The Reality of Private VPN providers

  • Several examples of collecting Data
  • Several examples of them releasing data to agencies
  • Random security and implementation problems
  • Exaggerations in sales pitches
  • Installs rogue root cert on user machine

Conflict of Interest, what is the business model of the providers?

Stats

  • 59% of Free VPNs in play store had hidden Chinese ownership
  • 86% had privacy policy flaw
  • 85% asked for excessive permissions

Are VPN Companies Needed?

Non-technical people need an option

How to Improve the VPN Market?

  • Privacy and Transparency go hand in hand
  • Open Source Provides Transparency
  • End to End open source VPN Company
  • theVPNcompany.com.au

Install your own VPN

Algo and Streisand

Create your own VPN Company using the base for “The VPN Company”

https://thevpncompany.com.au/

Authentication Afterlife: the dark side of making lost password recovery harder by Ewen McNeill

Twitter Account “badthingsdaily”. Fictional Scenarios that might happen to security people. Inspired this talk.

Scenario 1

  • A Big fire took out your main computer
  • You don’t have the computer and you don’t know all your passwords

Recovery Traditional

  • You get email somewhere else. On your phone
  • Click on Forgot my password
  • Repeat until all accounts recovered

Scenario 2

  • You need to login to your account on a new device
  • All accounts secured with 2FA
  • Your 2FA isn’t working

Recovery

  • Recovery Tokens
  • Alternative 2FA Solution

Scenario 3

  • Your bag was stolen
  • It had computer, phone and 2FA
  • Can bad guy impersonate you?
  • Can you recover faster than the other guy (or at all?)

Recovery

  • Does your 2FA pop up on your lock screen?
  • So anybody with your computer is able to get this?
  • Race to reset passwords and invalidate your login tokens
  • Maybe you remember your passwords but not your 2FA
  • Recovery questions “Mother’s maiden name”
  • Can be easy to discover, but if it is something random then you have to be able to find it (ie on the password store you just lost)

Multiple alternate authentication methods

  • Primary you use every day
  • One or more backups

If resetting your password every time is easier than remembering your password people will do that.

Attackers will use the easiest authentication method. Eg Contacting the Helpdesk or going into a bank branch office.

But if recovery is too hard you can end up losing access to your account permanently

Recommend: GitHub’s 2FA recovery guide

Scenario 4

Your startup’s founder has left. He has wiped all his computers. Now your cloud provider is threatening to lock you out unless you authenticate using 2FA

  • Hopefully in the password store
  • Or perhaps they no longer work
  • Contact Helpdesk, Account Manager, Lawyer, Social Media (usually the bigger you are and the more you pay the better your chance)
  • Store everything centrally. How do you audit that, regularly?

Scenario 5

A relative dies. Your first step is to log in to all their accounts to work out what should be kept.

This will take months not years. Sometimes you will only find out the account exists when they email you that your account is about to expire.

Personal Observations

  • You will not have access to their cellphone
  • or probably not past the lock screen
  • Anything they told you that was obvious you will forget
  • You will not have access to the password store
  • You may have access to saved passwords in browser
  • Maybe you need to optimise for family being able to access stuff, not complete lockdown.
  • Physical notebook with passwords
  • Consider in advance how you will recover if your 2FA device breaks
  • How will you convince a helpdesk person that you are you?

Personal Mitigations

  • Kawaiicon 2019 “How can I help you” Talk by Laura Bell

You Shall Not Pass by Peter Burnett

Moodle is an open source Learning Management System.

  • Legacy System
  • First developed in 1997
  • Open Sourced in 2001
  • New Code is good quality, older stuff not as much

Efforts to improve password policy

  • Password policy was a bit antiquated
  • Best policies come from NIST, 2018 version is good.
  • Don’t force a pattern, Check for compromised passwords, Check for dictionary based and identifying passwords
  • Look at the “Have I been Pwned” API – takes the first 5 characters of the SHA-1 hash of the password.
  • Dictionary checks – Top 10,000 English words might be enough
  • Identifying information – Birthdays, names, cities are things to watch for. Name of the company.
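
To make the “first 5 characters” point concrete, here is a small sketch of the client side of that check. It is my own illustration, not code from the Moodle plugin, and the function names are hypothetical. It assumes the Pwned Passwords range endpoint (https://api.pwnedpasswords.com/range/ followed by the 5 character prefix), which returns lines of SUFFIX:COUNT. The HTTP request itself is left out so the example runs offline.

```python
import hashlib


def split_sha1(password):
    """Return (prefix, suffix) of the upper-case SHA-1 hex digest.

    Only the 5 character prefix is ever sent to the API; the 35 character
    suffix stays on our side and is compared locally.
    """
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def breach_count(suffix, range_response):
    """Look for our suffix in the body returned for our prefix.

    Each line of the body is assumed to look like SUFFIX:COUNT.
    Returns 0 when the suffix is not listed.
    """
    for line in range_response.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0
```

A password policy check would call split_sha1(), fetch the range for the prefix, and reject the password if breach_count() is above whatever threshold the site picks.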

Released as an open source plugin for Moodle

A look at the Authentication Flow

  • Natively supported LDAP etc.
  • Lots of extra plugins implement other methods
  • Had to put MFA in when people were using plugins. Difficult to mix
  • Added extra hook on “account related” actions, they would check for MFA etc.
  • Required a bit of work to get merged in.

Implementing MFA

  • MFA is a superset of 2FA implementations
  • Had to do extensible platform
  • Traditional: TOTP, Email
  • Non-Traditional: IP verification, Authentication type (might already have MFA)
  • Design considerations – Keep secure but impact people as little as possible.
  • Different users: Not required, Optional, Forced Upon. So built in the ability for a range of use across the platform.
  • Learnings
    • Anything can be used as a factor
    • delicate balance between secure and usable
    • When designing, paranoid is the right mindset
    • Give the least information possible to allow a legit user to authenticate
    • What can the attacker do if this factor is compromised?
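
TOTP is listed above as one of the traditional factors. As a sketch of what such a factor computes (this follows RFC 6238 with HMAC-SHA1 and is not the Moodle plugin's code; the function names and the one-step drift window are my own choices):

```python
import hashlib
import hmac
import struct
import time


def totp(secret, now=None, step=30, digits=6):
    """Time-based one-time password for a shared secret given as bytes."""
    if now is None:
        now = time.time()
    counter = int(now) // step                      # number of elapsed steps
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                         # dynamic truncation
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify(secret, submitted, now=None, window=1, step=30, digits=6):
    """Accept codes from the current step and `window` steps either side."""
    if now is None:
        now = time.time()
    for drift in range(-window, window + 1):
        expected = totp(secret, now + drift * step, step, digits)
        # Constant-time comparison, so timing reveals nothing about the code.
        if hmac.compare_digest(expected, submitted):
            return True
    return False
```

The window is where the “secure versus usable” balance shows up: a larger window tolerates more clock drift on the user's device but gives an attacker more valid codes to guess.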

Final Thoughts

  • Long way to go
  • Security is a shifting goalpost
  • Keep on top of new developments

Linux.conf.au 2020 – Tuesday – Session 1 – Security, Identity, Privacy Miniconf

Facebook, Dynamite, Uber, Bombs, and You – Lana Brindley

  • Herman Hollerith
    • Created the punch card, introduced for the 1890 US Census
    • Hollerith leased machines to other companies
  • Hollerith machines and infrastructure used by many censuses in Europe.
    • Countries with better census infrastructure using Hollerith machines tended to have a higher death rate in The Holocaust
  • Alfred Nobel
    • Invented Dynamite and ran weapons company
  • Otto Hahn
    • Invented Nuclear Fission
  • Eugenics
    • 33 US states had sterilization programmes in place
    • 65,000 Americans sterilized as part of programmes
    • WHO was created as a result.
  • Thalidomide
    • Over-the-counter morning sickness treatment
    • Caused birth defects
    • FDA strengthened

Unintended consequences of technology, result was stronger regulation

Volkswagen cheated on emissions and Uber created Greyball
– Volkswagen engineers went to jail, Uber engineers didn’t

Here are some IT innovations that didn’t lead to real change

  • Medical Devices
    • Therac-25 was a 1980s machine used for treating cancer with radiation
    • Control software had race condition that gave people huge radiation overloads
  • Drive by Wire for Cars
    • Lexus ES350 sudden acceleration
    • Toyota replaced floor mats, not software
    • Car accelerator stuck at full speed and brakes not working
    • No single cause ever identified
  • Deep Fake Videos
  • Killer Robots
    • South Korean Universities came under pressure to stop research, said they had stopped but not confirmed.
  • Chinese Surveillance
    • Checkpoints all through the city; the average citizen goes through them many times per day and has their phone scanned, plus other checks.
    • Cameras with facial recognition everywhere
  • Western Surveillance – Palantir and other companies installing elsewhere
  • Boeing Software – 737 Max

Bad technology should have consequences and until it does people have to avoid things themselves as much as possible and put pressure on governments and companies

The Internet: Protecting Our Democratic Lifeline by Brett Sheffield

Lots of ways technology can protect us (Tor etc) and at the same time plenty of ways technology works against our privacy.

The UN Declaration of Human Rights
Australia is the only major country without a bill of rights.

Ways to contribute
– They Work for you type websites
– Protesting
– Whistleblowers

Democracy Under Threat
– Governments blocking the Internet
– Netblocks.org
– Police harass journalists (AFP raids ABC in Aus)
– Censorship

Large Companies
– Gather huge amounts of information
– Aim for personalisation and monetisation
– Leads to centralisation

Rebuilding the Internet with Multicast
– Scalable
– Happens at the network layer
– Needs to be enabled on all routers in each hop
– Currently off by default

Librecast
– Aims to get multicast in the hands of developers
– Tunnels through non-multicast enabled devices
– Messaging Library
– Transitional tunneling
– Improved routing protocol
– Try to enable in other FOSS projects
– Ensure new standards ( WebRTC, QUIC) support multicast



Linux.conf.au 2020 – Tuesday – KeyNote: Sean Brady

Keynote: Drop Your Tools – Does Expertise have a Dark Side? by Dr Sean Brady

Hartford Civic Center

Engineers ignored warnings of problems, kept saying calculations were good. Structure collapsed under light snow load

People are involved with engineering, therefore it is a people problem

What if possessing expertise has a dark side? The danger isn’t ignorance, it is the illusion of knowledge.

Mann Gulch fire

Why did the firefighters not drop their tools?
Why did they not get in the Escape Fire?

Priming – You get information that primes you to think a certain way.

What if expertise is priming somebody?
– Baseball experts primed to go down the wrong path, couldn’t even stop when explicitly told about the trick.

Firefighters explicitly trained that they are faster runners with tools.

Creative Desperation – Mentally drop your existing tools.



January 13, 2020

Linux.conf.au 2020 – Monday – Opening

Welcome to Country from the Yugambeh people

Main Organisers

Ben Stevens
Joel Addison
  • Sponsors Acknowledged
  • Schedule Outlined
  • Review of location, food, swag and other stuff
  • Charity for raffle is bushfire appeal

January 06, 2020

Setting up VXLAN between nested virt VMs on Google Compute Engine

I wanted to play with a VXLAN mesh between VMs on more than one hypervisor node, but the setup for VXLAN ended up being a separate post because it was a bit long. Read that post first if you want to follow the instructions here.

Now that we have a working VXLAN mesh between our two nodes we can move on to installing libvirt (which is called libvirt-daemon-system on Debian, not libvirt-bin as on Ubuntu):

sudo apt-get install -y qemu-kvm libvirt-daemon-system
sudo virsh net-start default
sudo virsh net-autostart --network default

I’m going to use a little python helper to launch my VMs, so I need some other dependencies as well:

sudo apt-get install -y python3-pip pkg-config libvirt-dev git

git clone https://github.com/mikalstill/shakenfist
cd shakenfist
git checkout 6bfac153d249752b27d224ad9d079095b640498e

sudo mkdir /srv/shakenfist
sudo cp template.debian.xml /srv/shakenfist/template.xml
sudo pip3 install -r requirements.txt

Let’s launch a quick test VM to make sure the helper works:

sudo python3 daemon.py
sudo virsh list

You can destroy that VM for now, it was just testing the install.

sudo virsh destroy ...name...

Next we need to tweak the template that shakenfist is using to start instances so that it uses the bridge for networking (that template is the one you copied to /srv/shakenfist/template.xml earlier). Replace the interface section in the template with this on both nodes:

<interface type='bridge'>
  <mac address={{eth0_mac}}/>
  <source bridge='br-vxlan0'/>
  <model type='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>

I know the bridge mentioned here doesn’t exist yet, but we’ll deal with that in a second. Before we start VMs though, we need a way of getting IP addresses to them. shakenfist can configure interfaces using config drive, but I’d prefer to use DHCP because who doesn’t love some additional complexity?

On one of the nodes install docker:


sudo apt-get install apt-transport-https ca-certificates curl gnupg2 software-properties-common
curl -fsSL https://download.docker.com/linux/debian/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/debian $(lsb_release -cs) stable"
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io

Now we can set up DHCP. Create a place for the configuration file:

sudo mkdir /srv/shakenfist/dhcp

And then create the configuration file at /srv/shakenfist/dhcp/dhcpd.conf with contents like this:

default-lease-time 3600;
max-lease-time 7200;
option domain-name-servers 8.8.8.8;
authoritative;

subnet 192.168.200.0 netmask 255.255.255.0 {
  option routers 192.168.200.1;
  option broadcast-address 192.168.200.255;

  pool {
    range 192.168.200.10 192.168.200.254;
  }
}
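
Mismatches between the declared subnet and the per-subnet options are easy to introduce when copying a dhcpd.conf between networks. Here is a small sketch using python's ipaddress module to sanity-check the values before starting dhcpd. The function name and argument layout are hypothetical, and it does not parse dhcpd.conf itself, you pass the values in by hand:

```python
import ipaddress


def check_subnet_options(subnet, netmask, routers, broadcast, pool):
    """Return a list of human readable problems (empty if all is well)."""
    network = ipaddress.ip_network(f"{subnet}/{netmask}")
    problems = []

    # The default gateway handed to clients must be reachable on-link.
    if ipaddress.ip_address(routers) not in network:
        problems.append(f"router {routers} is outside {network}")

    # The broadcast address is fully determined by subnet and netmask.
    if ipaddress.ip_address(broadcast) != network.broadcast_address:
        problems.append(
            f"broadcast {broadcast} should be {network.broadcast_address}"
        )

    # Both ends of the lease range have to sit inside the subnet.
    for address in pool:
        if ipaddress.ip_address(address) not in network:
            problems.append(f"pool address {address} is outside {network}")

    return problems
```

For the 192.168.200.0/24 network used in this post, a router of 192.168.200.1 and a broadcast address of 192.168.200.255 come back clean, while addresses from 192.168.1.0/24 are reported as problems.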

Before we can start dhcpd, we need to move the VXLAN device into a bridge so we can add a device for the DHCP server to it. First off remove the vxlan0 device from the last post:

sudo ip link set down dev vxlan0
sudo ip link del vxlan0

And now recreate it with a bridge:

sudo ip link add vxlan0 type vxlan id 42 dev eth0 dstport 0
sudo bridge fdb append to 00:00:00:00:00:00 dst 34.70.161.180 dev vxlan0
sudo ip link add br-vxlan0 type bridge
sudo ip link set vxlan0 master br-vxlan0
sudo ip link set vxlan0 up
sudo ip link set br-vxlan0 up
sudo ip link add dhcp-vxlan0 type veth peer name dhcp-vxlan0p
sudo ip link set dhcp-vxlan0p master br-vxlan0
sudo ip link set dhcp-vxlan0 up
sudo ip link set dhcp-vxlan0p up
sudo ip addr add 192.168.200.1/24 dev dhcp-vxlan0

This block of commands:

  • recreated the vxlan0 interface
  • added it to the mesh with the other node again
  • created a bridge named br-vxlan0
  • moved the vxlan0 interface into it
  • created a veth pair called dhcp-vxlan0 and dhcp-vxlan0p
  • moved the peer part of that veth pair into the bridge
  • and then configured an IP on the external half of the veth pair

To make the bridge survive reboots you would need to add it to either /etc/network/interfaces or /etc/netplan/01-netcfg.yml depending on your distribution, but that’s outside the scope of this post.

You should be able to ping again. From the other node give it a try:

$ ping 192.168.200.1
PING 192.168.200.1 (192.168.200.1) 56(84) bytes of data.
64 bytes from 192.168.200.1: icmp_seq=1 ttl=64 time=19.3 ms
64 bytes from 192.168.200.1: icmp_seq=2 ttl=64 time=0.571 ms

We need to do something similar on the other node so it can run VMs as well. It is a tiny bit simpler because there won’t be any DHCP there. Remember that you need to change 35.223.115.132 to the IP of your first node:

sudo ip link set down dev vxlan0
sudo ip link del vxlan0

sudo ip link add vxlan0 type vxlan id 42 dev eth0 dstport 0
sudo bridge fdb append to 00:00:00:00:00:00 dst 35.223.115.132 dev vxlan0
sudo ip link add br-vxlan0 type bridge
sudo ip link set vxlan0 master br-vxlan0
sudo ip link set vxlan0 up
sudo ip link set br-vxlan0 up

Note that now we can’t do a ping test because the second VM no longer consumes an IP for the base OS.

Now we can start the docker container with dhcpd listening on dhcp-vxlan0:

sudo docker run -it --rm --init --net host -v /srv/shakenfist/dhcp:/data networkboot/dhcpd dhcp-vxlan0

This runs dhcpd interactively so we can see what happens. Now try starting a VM on the other node:

sudo python3 daemon.py

You can watch the VM booting using the “virsh console” command with the name of the VM from “virsh list”. The dhcpd process should show you something like this:

sudo docker run -it --rm --init --net host -v /srv/shakenfist/dhcp:/data networkboot/dhcpd dhcp-vxlan0
Internet Systems Consortium DHCP Server 4.3.5
Copyright 2004-2016 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/
Config file: /data/dhcpd.conf
Database file: /data/dhcpd.leases
PID file: /var/run/dhcpd.pid
Wrote 0 leases to leases file.
Listening on LPF/dhcp-vxlan0/06:ff:bc:7d:11:e3/192.168.200.0/24
Sending on   LPF/dhcp-vxlan0/06:ff:bc:7d:11:e3/192.168.200.0/24
Sending on   Socket/fallback/fallback-net
Server starting service.
DHCPDISCOVER from ee:95:4d:40:ca:a6 via dhcp-vxlan0
DHCPOFFER on 192.168.200.10 to ee:95:4d:40:ca:a6 (foo) via dhcp-vxlan0
DHCPREQUEST for 192.168.200.10 (192.168.200.1) from ee:95:4d:40:ca:a6 (foo) via dhcp-vxlan0
DHCPACK on 192.168.200.10 to ee:95:4d:40:ca:a6 (foo) via dhcp-vxlan0

You can see here that our new VM got the IP 192.168.200.10 from the DHCP server! It is moments like this when you don’t realise that this blog post took me hours to write that I feel really smart.

If we started a VM on the first node (the same command as for the second node), we’d now have two VMs on a virtual network which had working DHCP and could ping each other. I think that’s enough for one evening.

January 05, 2020

Setting up VXLAN on Google Compute Engine

So my ultimate goal here is to try out VXLAN between some VMs on instances in Google compute engine, but today I’m just going to get VXLAN working because that took a fair bit longer than I expected. First off, boot your instances — because I will need nested virt later I chose two instances on Google Cloud. Please note that you need to do a bit of a dance to turn on nested virt there. I also chose to use Debian for this experiment:

gcloud compute instances create vx-1 --zone us-central1-b --min-cpu-platform "Intel Haswell" --image nested-vm-image

Now do those standard things you do to all new instances:

sudo apt-get update
sudo apt-get dist-upgrade -y

Now let’s set up VXLAN between the two nodes, with a big nod to this web page. First create a VXLAN interface on each machine (if you care about the port your VXLAN traffic is on being to IANA standards, see the postscript at the end of this):

sudo ip link add vxlan0 type vxlan id 42 dev eth0 dstport 0

Now we need to put the two nodes into a mesh, where 34.70.161.180 is the IP of the node we are not running this command on and the IP address for the second command needs to be different on each machine.

sudo bridge fdb append to 00:00:00:00:00:00 dst 34.70.161.180 dev vxlan0
sudo ip addr add 192.168.200.1/24 dev vxlan0
sudo ip link set up dev vxlan0

I am pretty sure that this style of mesh (all nodes connected) wouldn’t scale past non-trivial sizes, but hey baby steps right? Finally, because we’re using Google Cloud we need to add firewall rules to allow our traffic into the instances:

Note that these rules are a source of confusion for me right now. I wanted (and configured) VXLAN. So why do I need to allow OTV for this to work? I suspect Linux has politely ignored my request and used OTV not VXLAN for my traffic.

We should now be able to ping those newly configured IP addresses from each machine:

ping 192.168.200.2 -c 1
PING 192.168.200.2 (192.168.200.2) 56(84) bytes of data.
64 bytes from 192.168.200.2: icmp_seq=1 ttl=64 time=1.76 ms

Which produces traffic like this on the underlay network:

tcpdump -n -i eth0 host 34.70.161.180
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
09:01:58.159092 IP 10.128.0.9.59341 > 34.70.161.180.8472: OTV, flags [I] (0x08), overlay 0, instance 42
IP 192.168.200.1 > 192.168.200.2: ICMP echo request, id 20119, seq 1, length 64
09:01:58.160786 IP 34.70.161.180.48471 > 10.128.0.9.8472: OTV, flags [I] (0x08), overlay 0, instance 42
IP 192.168.200.2 > 192.168.200.1: ICMP echo reply, id 20119, seq 1, length 64
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel

Hopefully this is helpful to someone else. Thanks again to Joe Julian for a very helpful post.

Postscript: Dale Shaw pointed out on twitter that I might still be talking VXLAN, just on a weird port. This is supported by this comment I found on the internets: “when VXLAN was first implemented in linux, UDP ports were not specified. Many vendors use 8472, and Linux uses the same port. Later, IANA allocated 4789 as the port. If you need to use the IANA port, you need to specify it with dstport”.
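
One way to convince yourself that the traffic really is VXLAN is to look at the header layout. As I understand RFC 7348, the VXLAN header is 8 bytes: a flags byte where 0x08 means “the VNI is valid”, three reserved bytes, a 24 bit VNI and one more reserved byte. That lines up with the “flags [I] (0x08)” and “instance 42” fields in the tcpdump output above. A small python sketch of that layout follows; the function and constant names are mine, and the two port numbers are the ones quoted in the postscript:

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08   # the "I" flag
LINUX_DEFAULT_PORT = 8472     # what the kernel uses unless told otherwise
IANA_PORT = 4789              # the port IANA later allocated for VXLAN


def build_vxlan_header(vni):
    """Pack an 8 byte VXLAN header for the given 24 bit network ID."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 0: flags in the top byte, 24 reserved bits.
    # Word 1: VNI in the top 24 bits, 8 reserved bits.
    return struct.pack(">II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vxlan_header(header):
    """Return (flags, vni) from the first 8 bytes of a UDP payload."""
    word0, word1 = struct.unpack(">II", header[:8])
    return word0 >> 24, word1 >> 8
```

Nothing in the header depends on the UDP port, so the same bytes sent to 8472 or 4789 carry the same VNI. Only the label a packet dissector chooses for the port differs.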

January 03, 2020

Playing with the python prometheus query API

The last few days have been a bit icky around here, with my house apparently proudly residing in the major city with the dirtiest air in the world. So, I needed a distraction…

It has also been quite hot, so I wondered how my energy usage was going. I have prometheus monitoring of my power draw, so now seemed as good a time as any to learn how to do some historical querying over the API. I ended up with a python script which can output things like this: “Yesterday had a maximum temperature of 38 and we used 28.36 kwh. The average for similar days is 25.56 kwh.”

The code is on github if it is of interest to others. I am sure I could push more of this processing down into the prometheus engine, but I couldn’t see how to do it today. Hints welcome!
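
For anyone wanting to try the same sort of historical query, here is a minimal sketch in Python using only the standard library. It is not the script mentioned above: the metric name house_power_watts and the server address in the usage note are hypothetical, and it assumes the series holds instantaneous power in watts. It builds a request for Prometheus's /api/v1/query_range endpoint, pulls the samples out of the JSON reply, and integrates them into kWh on the client side.

```python
import json
import urllib.parse
import urllib.request


def range_query_url(base_url, query, start, end, step):
    """Build a URL for Prometheus's /api/v1/query_range HTTP endpoint."""
    params = urllib.parse.urlencode(
        {"query": query, "start": start, "end": end, "step": step}
    )
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


def parse_samples(body):
    """Extract (timestamp, value) pairs for the first series in a reply."""
    if body.get("status") != "success":
        raise ValueError(f"query failed: {body.get('error')}")
    result = body["data"]["result"]
    if not result:
        return []
    # Prometheus returns sample values as strings, so convert them here.
    return [(float(ts), float(value)) for ts, value in result[0]["values"]]


def fetch_samples(url):
    """Run the range query over HTTP and return the parsed samples."""
    with urllib.request.urlopen(url) as response:
        return parse_samples(json.load(response))


def kwh_from_watts(samples):
    """Integrate watt samples over time (trapezoidal rule) and return kWh."""
    joules = 0.0
    for (t0, w0), (t1, w1) in zip(samples, samples[1:]):
        joules += (w0 + w1) / 2.0 * (t1 - t0)
    return joules / 3_600_000.0  # 1 kWh is 3.6 MJ
```

Something like kwh_from_watts(fetch_samples(range_query_url("http://localhost:9090", "house_power_watts", start, end, "60s"))) would then give the energy used between two Unix timestamps.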

January 01, 2020

Receiving slow-scan television images from the International Space Station

Thanks to a fellow VECTOR volunteer Nick Doyle, I found out that the International Space Station would be broadcasting slow-scan television images at the end of the year. I decided to try and pick those up with my handheld radio.

Planning

From the official announcement, I got the frequency (145.800 MHz) and the broadcast times.

Next I had to figure out when the ISS would be passing over my location. Most of the ISS tracking websites and applications are aimed at people wanting to see the reflection of the sun on the station and so they only list the passes during nighttime before the earth casts a shadow that would prevent any visual contacts.

Thankfully, Nick found a site which has an option to show all of the passes, visible or not, and so I was able to get a list of upcoming passes over Vancouver.

Hardware

From a hardware point-of-view, I didn't have to get any special equipment. I used my Kenwood D72 and an external Comet SBB5 mobile antenna.

The only other piece of equipment I used was a 2.5mm mono adapter, which I used to connect a 3.5mm male-male audio cable between the speaker port of the radio and the microphone input of my computer.

Software

The software I used for the recording was Audacity set to a sampling rate of 48 kHz.

Then I installed qsstv and configured it to read input from a file instead of the sound card.

Results

Here is the audio I recorded from the first pass (65 degrees at the highest point) as well as the rendered image:

The second pass (60 degrees) was not as successful since I didn't hold the squelch open and you can tell from the audio recording that the signal got drowned in noise a couple of times. This is the rendering of that second pass:

Tips

The signal came through the squelch for only about a minute at the highest point, so I found it best to open the squelch fully (F+Moni) as soon as the bird is visible.

Another thing I did on a third pass (16 degrees at the highest point -- not particularly visible) was to plug the speaker out of my radio into a Y splitter so that I could connect it to my computer and an external speaker I could take outside with me. Since I was able to listen to the audio, I held the antenna and tried to point it at the satellite's general direction as well as varying the orientation of the antenna to increase the signal strength.

AudioBooks – December 2019

Call the Ambulance! by Les Pringle

Stories from a British Ambulance driver in the late-1970s and 1980s. A good range of stories from the funny to the tragic. 7/10

Permanent Record by Edward Snowden

An autobiography by the NSA Whistle-blower. Mostly a recounting of his life, career and circumstances that led up to him leaking. Interesting. 7/10

Life in the Middle Ages by Richard Winston

As the title describes. Unusually for English-language books it focuses on France. Not much history, just daily life & only 5h long. Probably works better with pictures. 6/10

Dr Space Junk vs the Universe: Archaeology and the Future by Alice Gorman

A Mix of topics. Some autobiography & how she worked her way into the archeology of spaceflight. Plus items of Space History & comparisons with earth archeology. But it works 8/10

Little House in the Big Woods by Laura Ingalls Wilder

Only 3h 40m long and roughly covering a year. The author describes her life (aged 5-6) and her family in a cabin in Wisconsin in the early 1870s. 1st in the series. 7/10

Abraham Lincoln: A Life (Volume One) by Michael Burlingame

50h and covers up to his 1st inauguration. Not a good 1st Lincoln bio to read but very good. Some repetition as multiple sources are quoted on some points. 7/10

BlueHackers crowd-funding free psychology services at LCA and other conferences

BlueHackers has in the past arranged for a free counsellor/psychologist at several conferences (LCA, OSDC). Given the popularity and great reception of this service, we want to make this a regular thing and try to get this service available at every conference possible – well, at least Australian open source and related events.

Right now we’re trying to arrange for the service to be available at LCA2020 at the Gold Coast. We have excellent local psychologists already, and the LCA organisers are working on some of the logistical aspects.

Meanwhile, we need to get the funds organised. Fortunately this has never been a problem with BlueHackers, people know this is important stuff. We can make a real difference.

Unfortunately BlueHackers hasn’t yet completed its transition from OSDClub project to Linux Australia subcommittee, so this fundraiser is running in my personal name. Well, you know who I (Arjen) am, so I hope you’re all ok with that.

We have a little over a week until LCA2020 starts, let’s make this happen! Thanks. You can donate via MyCause.

Speeding up Blackbird boot: the SBE

The Self Boot Engine (SBE) is a small embedded PPE42 core inside the POWER9 CPU which has the unenviable job of getting a single POWER9 core ready enough to start executing instructions out of L3 cache, and poking some instructions into said cache for the core to start executing.

It’s called the “Self Boot Engine” as in generations prior to POWER8, it was the job of the FSP (Service Processor) to do all of the booting for the CPU. On POWER8, there was still an SBE, but it was a custom instruction set (this was the Power On Reset Engine – PORE), while the PPE42 is basically a 32bit powerpc core cut straight down the middle (just the way to make it awkward for toolchains).

One of the things I noted in my post on Booting temporary firmware on the Raptor Blackbird is that we got serial console output from the SBE. It turns out one of the things explicitly not enabled by Raptor in their build was this output as “it made the SBE boot much slower”. I’d actually long suspected this, but hadn’t really had the time to delve into it.

For POWER9, the firmware for the SBE is now open source code, as are the ppe42-binutils and ppe42-gcc toolchains for it. This means we can hack on it!

WARNING: hacking on your SBE firmware can be relatively dangerous, as it’s literally the first thing that needs to work in order to boot the system, and there isn’t (AFAIK) a publicly documented easy way to re-flash your SBE firmware if you mess it up.

Seeing as we saw a regression in boot time with the UART output enabled, we need to look at the uartPutChar() function in sbeConsole.C (error paths removed for clarity):

static void uartPutChar(char c)
{
    #define SBE_FUNC "uartPutChar"
    uint32_t rc = SBE_SEC_OPERATION_SUCCESSFUL;
    do {
        static const uint64_t DELAY_NS = 100;
        static const uint64_t DELAY_LOOPS = 100000000;

        uint64_t loops = 0;
        uint8_t data = 0;
        do {
            rc = readReg(LSR, data);
...
            if(data == LSR_BAD || (data & LSR_THRE))
            {
                break;
            }
            delay(DELAY_NS, 1000000);
        } while(++loops < DELAY_LOOPS);

...
        rc = writeReg(THR, c);
...
    } while(0);

    #undef SBE_FUNC
}

One thing you may notice if you’ve spent some time around serial ports is that it’s not using the transmit FIFO! According to Wikipedia the original 16550 had a broken FIFO, but we’re certainly not going to be hooked up to an original rev of that silicon.

To compare, let’s look at the skiboot code, which is all in hw/lpc-uart.c:

static void uart_check_tx_room(void)
{
	if (uart_read(REG_LSR) & LSR_THRE) {
		/* FIFO is 16 entries */
		tx_room = 16;
		tx_full = false;
	}
}

The uart_check_tx_room() function is pretty simple: it checks whether the transmitter is empty (LSR_THRE) and, if so, knows that there’s room for 16 entries in the FIFO. Next, we have a busy loop that waits until there’s room again in the FIFO:

static void uart_wait_tx_room(void)
{
	while (!tx_room) {
		uart_check_tx_room();
		if (!tx_room) {
			smt_lowest();
			do {
				barrier();
				uart_check_tx_room();
			} while (!tx_room);
			smt_medium();
		}
	}
}

Finally, the bit of code that writes the (internal) log buffer out to a serial port:

/*
 * Internal console driver (output only)
 */
static size_t uart_con_write(const char *buf, size_t len)
{
	size_t written = 0;

	/* If LPC bus is bad, we just swallow data */
	if (!lpc_ok() && !mmio_uart_base)
		return written;

	lock(&uart_lock);
	while(written < len) {
		if (tx_room == 0) {
			uart_wait_tx_room();
			if (tx_room == 0)
				goto bail;
		} else {
			uart_write(REG_THR, buf[written++]);
			tx_room--;
		}
	}
 bail:
	unlock(&uart_lock);
	return written;
}

The skiboot code ends up being a bit more complicated for a number of reasons, but the basic algorithm could be applied to the SBE code: rather than busy waiting for each character to be written out before sending the next into the FIFO, we could just splat things down there and continue with life. So, I put together a patch to try out.
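
To illustrate the difference between the two approaches, here is a toy model in Python. It is only a sketch, not the SBE or skiboot code: ToyUart, its one-byte-per-tick drain rate, and the helper names are all invented for illustration, and it ignores the delay() call in the real loop. It counts how many character periods the writer spends blocked when it waits for THRE before every byte versus when it fills the 16-entry FIFO after seeing THRE once.

```python
class ToyUart:
    """Toy transmitter: a 16-entry FIFO that drains one byte per tick()."""

    FIFO_DEPTH = 16

    def __init__(self):
        self.fifo = []
        self.sent = bytearray()

    def thre(self):
        """Model LSR_THRE: set only when the transmit FIFO is empty."""
        return not self.fifo

    def write_thr(self, byte):
        """Queue one byte for transmission, refusing to overrun the FIFO."""
        if len(self.fifo) >= self.FIFO_DEPTH:
            raise OverflowError("transmit FIFO overrun")
        self.fifo.append(byte)

    def tick(self):
        """Advance time by one character period, shifting one byte out."""
        if self.fifo:
            self.sent.append(self.fifo.pop(0))


def write_per_char(uart, data):
    """Wait for THRE before every byte; returns ticks spent waiting."""
    waited = 0
    for byte in data:
        while not uart.thre():
            uart.tick()
            waited += 1
        uart.write_thr(byte)
    return waited


def write_with_fifo(uart, data):
    """After seeing THRE, write up to FIFO_DEPTH bytes before polling again."""
    waited = 0
    room = 0
    for byte in data:
        while room == 0:
            if uart.thre():
                room = uart.FIFO_DEPTH
            else:
                uart.tick()
                waited += 1
        uart.write_thr(byte)
        room -= 1
    return waited


def drain(uart):
    """Let the hardware finish sending whatever is still queued."""
    while uart.fifo:
        uart.tick()
    return bytes(uart.sent)
```

In this model a 15 byte message such as "Welcome to SBE\n" blocks the per-character writer for 14 character periods, while the FIFO-aware writer hands over all 15 bytes without waiting at all. Real hardware timing will differ, so treat the numbers as an illustration of the shape of the problem rather than a prediction.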

Before (i.e. upstream SBE code): it took about 15 seconds from “Welcome to SBE” to “Booting Hostboot”.

Now (with my patch): Around 10 seconds.

It’s a full five seconds (33%) faster to get through the SBE stage of booting. Wow.

Hopefully somebody looks at the pull request sometime soon, as it’s probably useful to a lot of people doing firmware and Operating System development.

So, Happy New Year for Blackbird owners (I’ll publish a build with this and other misc improvements “soon”).

December 29, 2019

Donations 2019

Each year I do the majority of my Charity donations in early December (just after my birthday) spread over a few days (so as not to get my credit card suspended). I’m a little late this year due to a new credit card and other stuff distracting me.

I also blog about it to hopefully inspire others. See: 2018, 2017, 2016, 2015

All amounts this year are in $US unless otherwise stated

My main donation was to Givewell (to allocate to projects as they prioritize). Once again I’m happy that Givewell make efficient use of money donated.

I donated $50 each to groups providing infrastructure and advocacy. Wikipedia only got $NZ 50 since they converted to my local currency and I didn’t notice until afterwards

Some Software Projects. Software in the Public Interest provides admin support for many Open Source projects. Mozilla does the Firefox Browser and other stuff. Syncthing is an Open Source Project that works like Dropbox

Finally I’m still listening to Corey Olsen’s Exploring the Lord of the Rings series (3 years in and about 20% of the way through) plus his other material

Encoding your WiFi access point password into a QR code

Up until recently, it was a pain to defend against WPA2 brute-force attacks by using a random 63-character password (the maximum in WPA-Personal mode). Thanks to Android 10 and iOS 11 supporting reading WiFi passwords from a QR code, this is finally a practical defense.

Generating the QR code

After installing the qrencode package, run the following:

qrencode -o wifi.png "WIFI:T:WPA;S:<SSID>;P:<PASSWORD>;;"

replacing <SSID> with the name of your WiFi network and <PASSWORD> with the 63-character password you hopefully generated with pwgen -s 63.

If your password includes a colon, then escape it like this:

"WIFI:T:WPA;S:<SSID>;P:pass\:word;;"

since iOS won't support the following (which works fine on Android):

'WIFI:T:WPA;S:<SSID>;P:"pass:word";;'

The only other pitfall I ran into is that if you include a trailing newline character (for example piping echo "..." into qrencode as opposed to echo -n "...") then it will fail on both iOS and Android.
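
If you would rather script this, here is a rough sketch in Python that builds the same string. The function names are made up for illustration, and the set of characters it escapes (backslash, semicolon, comma, double quote and colon) is an assumption based on the commonly used ZXing convention rather than something tested on every phone. Passing the payload to qrencode as an argument avoids the trailing newline problem described above.

```python
import secrets
import string
import subprocess

# Characters escaped with a backslash (assumption: ZXing-style WIFI: format).
SPECIAL_CHARS = '\\;,":'


def escape_field(value):
    """Backslash-escape special characters in an SSID or password."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in value)


def random_password(length=63):
    """Generate a random alphanumeric password, in the spirit of pwgen -s."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def wifi_payload(ssid, password):
    """Build the WIFI: string; no trailing newline is added."""
    return f"WIFI:T:WPA;S:{escape_field(ssid)};P:{escape_field(password)};;"


def write_qr(payload, path="wifi.png"):
    """Hand the payload to qrencode as a single command-line argument."""
    subprocess.run(["qrencode", "-o", path, payload], check=True)
```

For example, write_qr(wifi_payload("MyNetwork", random_password())) would write wifi.png for a hypothetical network called MyNetwork.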

Scanning the QR code

On iOS, simply open the camera app and scan the QR code to bring up a notification which allows you to connect to the WiFi network:

On Android, go into the WiFi settings and tap on the WiFi network you want to join:

then click the QR icon in the password field and scan the code:

In-browser alternative

If you can't do this locally for some reason, there is also an in-browser QR code generator with source code available.

December 27, 2019

Coming to grips with Kubernetes in 2020: online training

There are a few online training resources I’ve had a play with while learning Kubernetes, so I figure that’s worth a quick write up. This is a follow on from my post about Kubernetes podcasts I’ve tried. I’ve tried three training providers so far:

  • The Linux Foundation Kubernetes course (LFS258 Kubernetes Fundamentals) is probably the “go to” resource for many people, and is often sold bundled with the certification exams. Unfortunately, it is really terrible. It is by far the worst course I’ve seen so far.
  • On the other hand, the Linux Academy Kubernetes course is really good. Its flaw is that you have to sign up to Linux Academy, which provides you with all you can eat courses for a rather steep annual fee.
  • Finally, I discovered Mumshad Mannambeth’s Udemy courses, and frankly they’re excellent. He’s put a huge amount of effort into them and it really shows. Even better, with Udemy’s regular sales you can pick up his three Kubernetes courses (intro, admin certification, and developer certification) for under $50 AUD. There are even plenty of online quizzes.

If I was going to pick a course to try, I’d definitely go with Mumshad.

December 25, 2019

Hacking on Arlec Christmas lights with tasmota

I’m loving the wide array of electrically certified home automation devices we’re seeing now. Light bulbs, sensors, power boards, and even Christmas lights. Specifically Arlec is shipping these app controllable Christmas lights this year, which looked very much like they should work with Tasmota.

(Sorry for the terrible product picture, I can’t find this product online any more, I suspect Bunnings has sold out for the year?)

Specifically, it turns out that these Arlec lights are an ESP8266 which can be flashed with tuya-convert v2 to run tasmota. Once flashed, you can control all of the functions available on the device itself, although there are parts of the protocol I haven’t fully understood yet.

Let’s start off by flashing the device:

  • First off boot your raspberry pi with tuya-convert. I used v2, and I suspect that’s important here so make sure you upgrade if you’re using something old.
  • Next, put the bud lights into programming mode by holding the button on the control box down until the light strand turns off. Release and the strand should start blinking every couple of seconds.
  • Now run the tuya-convert flashing script.
  • Now go to the tasmota-XXXX essid and enter your wifi details into the captive portal. The light strand will now reboot.
  • The light strand should now appear on your wifi, and you can find the IP address and MAC address by asking your DHCP server nicely.

Now for some basic configuration in the web UI. Set the module type to “Tuya MCU (54)”, GPIO1 to “Tuya Tx (107)”, and GPIO3 to “Tuya RX (108)”. You’ll also want to set the usual site-specific configuration options like your MQTT server and so forth.

So what is a “Tuya MCU” when it is at home? Well, it turns out that some tuya devices have an esp8266 which just talks serial to another microcontroller. It kind of makes sense if you already have a microcontroller device you want to make “smart” and you’re super dooper lazy I suppose. There is surprisingly good manufacturer documentation online.

In the case of these bud lights, I have a strong theory that we can basically cycle the modes available by pressing the physical button, but I needed a way to validate that.

You can read more about how these devices work on the tuya protocols page, or you can just jump ahead to the simple programs I wrote to explore these devices. However, a quick summary of the serial protocol spoken between the esp8266 and the MCU is helpful. Packets look like this:

  • Frame header: fixed 2 byte value 0x55aa
  • Version: 0x00
  • Command word: a byte
  • Data length: 2 bytes
  • Function length: 2 bytes
  • Function command: 1 byte
  • Checksum: 1 byte

First off I wrote a simple program to monitor the state of the device registers (called dpIds, for “data point ids”). It just uses the tasmota web console to constantly ask for the current state of the dpIds (“SerialSend5 55aa0001000000”, a hard coded MCU control packet) and prints out any changes. Note that for it to work, the web console log level needs to be set to debug.
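
The hard coded packet above can be reproduced with a few lines of Python, which is a handy way to sanity check other packets before sending them. This is only a sketch: the helper names are invented, it treats everything between the data length and the checksum as an opaque payload, and it assumes the checksum is the sum of all preceding bytes modulo 256, which is how I read the manufacturer documentation and which matches the packet quoted above.

```python
HEADER = bytes.fromhex("55aa")


def checksum(data):
    """Sum of all bytes, modulo 256 (assumed checksum algorithm)."""
    return sum(data) % 256


def build_frame(command, payload=b"", version=0x00):
    """Assemble header, version, command, 2-byte length, payload, checksum."""
    body = (
        HEADER
        + bytes([version, command])
        + len(payload).to_bytes(2, "big")
        + payload
    )
    return body + bytes([checksum(body)])


def is_valid_frame(frame):
    """Check the header, the declared payload length and the checksum."""
    if len(frame) < 7 or frame[:2] != HEADER:
        return False
    declared = int.from_bytes(frame[4:6], "big")
    if len(frame) != 7 + declared:
        return False
    return checksum(frame[:-1]) == frame[-1]
```

Calling build_frame(0x01).hex() gives 55aa0001000000, the same bytes that are passed to SerialSend5.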

Now I can press the button on the device and see what dpIds change. A session looked like this (the notes like “solid white” are things I typed into the terminal as I went):

Clearly dpId 1 (type 1, a boolean) is the power state with 0 being off and 1 being on. This is also easily testable of course. If you send a “TuyaSend1 1,0” to the device using the web console it turns off, and if you send “TuyaSend1 1,1” it turns on. This assignment also maps to the way other Tuya MCU devices are configured, so it seems like a very safe assumption.
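
The same commands can be sent without the web console by using tasmota's HTTP interface, which accepts commands at /cm?cmnd=. A small sketch in Python follows; the function names and the IP address in the usage note are made up, it assumes the device has no web password set, and it assumes the reply is JSON.

```python
import json
import urllib.parse
import urllib.request


def command_url(host, command):
    """Build a tasmota web request URL of the form http://<host>/cm?cmnd=..."""
    query = urllib.parse.urlencode(
        {"cmnd": command}, quote_via=urllib.parse.quote
    )
    return f"http://{host}/cm?{query}"


def send_command(host, command, timeout=5):
    """Send one command to the device and return the decoded JSON reply."""
    url = command_url(host, command)
    with urllib.request.urlopen(url, timeout=timeout) as reply:
        return json.load(reply)


def set_power(host, on):
    """Set dpId 1 (the power state) to 1 for on or 0 for off."""
    return send_command(host, f"TuyaSend1 1,{1 if on else 0}")
```

With that, set_power("192.0.2.10", False) should turn off a strand living at that (hypothetical) address.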

dpId 101 (type 4, 1 byte enum) seems to be the mode, walking through the possible values determined:

  • 0: fast pulse
  • 1: twinkle
  • 2: alternate
  • 3: alternate differently
  • 4: alternate and cause epileptic fits
  • 5: double alternate
  • 6: flash
  • 7: solid
  • 8: off
  • 9 onwards: brief all

dpId 107 (type 2, 4 byte value) wasn’t changeable via the physical button, so I wrote another simple script to send a bunch of values to it. It appears to be a brightness control for the white LEDs, with 0 being off and 99 being fully on. I haven’t managed to find a brightness control for the coloured LEDs. The brightness control also doesn’t appear to work in all modes.

dpId 108 (type 2, 4 byte value) remains a mystery to me at this time. It doesn’t seem to change regardless of what values I send it.

dpId 109 (type 4, 1 byte enum) seems to be which strand is on. A value of 0 is just white LEDs, 1 is just coloured LEDs, 2 is all LEDs, and 3 is all LEDs but dimmer.

It sort of doesn’t matter that I haven’t fully decoded the inner workings of the device, because this is enough information for my use case. All I really want is for all the lights to be on solidly (that is, with no blinking). This is because I use them for lighting under my back pergola and the blinking would be quite annoying.

So how do you wire up Home Assistant to send serial packets to a slave MCU over MQTT? Home Assistant will already control the lights turning on and off because of the default relay implementation for dpId 1. What I need to be able to do is turn on all the lights in a solidly on configuration. For that, I can implement rules like:

>> Rule1
ON Event#0 DO TuyaSend4 101,0 ENDON
ON Event#1 DO TuyaSend4 101,1 ENDON
ON Event#2 DO TuyaSend4 101,2 ENDON
ON Event#3 DO TuyaSend4 101,3 ENDON
ON Event#4 DO TuyaSend4 101,4 ENDON
ON Event#5 DO TuyaSend4 101,5 ENDON
ON Event#6 DO TuyaSend4 101,6 ENDON
ON Event#7 DO TuyaSend4 101,7 ENDON
ON Event#8 DO TuyaSend4 101,8 ENDON
ON Event#9 DO TuyaSend4 101,9 ENDON

>> Rule1 ON

>> Rule2
ON Power1#state=1 DO TuyaSend1 1,1 ENDON
ON Power1#state=1 DO TuyaSend4 109,2 ENDON

>> Rule2 ON

You apply these rules by pasting them into the web console on the device. Note that there are four separate pasted commands. Rule1 exposes the modes from dpId 101 as effects in Home Assistant, and Rule2 hooks the power on MQTT command to ensure that the lights are set to solidly on via dpId 109.
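
Since the ten triggers in Rule1 differ only by the mode number, they can be generated rather than typed. Here is a small sketch in Python; the function name is made up, and the output is just the trigger lines shown above, which still need to be pasted into the console as part of Rule1.

```python
def mode_triggers(mode_count=10, dpid=101):
    """Return one rule trigger per mode value, mapping Event#N to dpId 101."""
    return [
        f"ON Event#{mode} DO TuyaSend4 {dpid},{mode} ENDON"
        for mode in range(mode_count)
    ]


def rule_text(triggers):
    """Join the triggers one per line, ready to paste after the rule name."""
    return "\n".join(triggers)
```

If more modes turn up later, bumping mode_count (and adding the matching entries to effect_list in Home Assistant) is all that should be needed.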

The matching Home Assistant configuration looks like this:

# Arlec fairy lights on the back deck
- platform: mqtt
  name: "Back deck 1"
  command_topic: "cmnd/sonoff14/POWER"
  state_topic: "tele/sonoff14/STATE"
  state_value_template: "{{value_json.POWER}}"
  availability_topic: "tele/sonoff14/LWT"
  effect_command_topic: "cmnd/sonoff14/Event"
  effect_list:
    - 0
    - 1
    - 2
    - 3
    - 4
    - 5
    - 6
    - 7
    - 8
    - 9
  payload_on: "ON"
  payload_off: "OFF"
  payload_available: "Online"
  payload_not_available: "Offline"
  qos: 1
  retain: false

This gives me the ability to turn the fairy lights on and off via Home Assistant, and ensure that they’re solidly on and not blinking. That’s good enough for now.

December 23, 2019

Further thoughts on Azure instance start times

My post from the other day about slow instance starts on Azure caused some commentary (mainly on reddit) that prompted me to think more about all this. In the end, there were a few more experiments I wanted to run to see if I could squeeze more performance out of Azure.

First off, looking at the logs from my initial testing it looks like resource groups are slow. The original terraform creates a resource group as part of the test and then cleans it up at the end. What if instead we had a single permanent resource group and created instances within that?

Here is a series of instance starts and deletes using the terraform from the last post:

You’ll notice that there’s no delete value for the last instance. That’s because terraform crashed and never deleted the instance. You can also see that instance starts are somewhat consistent, except for being slower in the second half of the test than the first, and occasionally spiking out to very very slow. Oh, and deletes are almost always really slow.

What happens if we use a permanent resource group and network? This means that all the “instance start terraform” is doing is creating a network interface and then an instance which uses that network interface. It has to be faster, but does it resolve our issues?
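
For anyone wanting to repeat this sort of measurement, a timing harness can be quite small. The following Python is a sketch under assumptions rather than the harness I used: the function names are invented, and it assumes terraform is on the path and has already been initialised in the working directory. The median and maximum are reported because occasional spikes make the mean misleading.

```python
import statistics
import subprocess
import time


def time_command(argv, cwd=None):
    """Run a command and return (elapsed seconds, exit code)."""
    started = time.monotonic()
    completed = subprocess.run(argv, cwd=cwd)
    return time.monotonic() - started, completed.returncode


def time_instance_cycle(workdir):
    """Time one instance create and one delete using terraform."""
    create, _ = time_command(
        ["terraform", "apply", "-auto-approve"], cwd=workdir
    )
    delete, _ = time_command(
        ["terraform", "destroy", "-auto-approve"], cwd=workdir
    )
    return create, delete


def summarise(durations):
    """Return the median and the worst case for a list of durations."""
    return {"median": statistics.median(durations), "max": max(durations)}
```

Calling time_instance_cycle() in a loop and feeding the create times to summarise() gives a quick feel for both typical and worst case behaviour.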

The dashed lines are the graph from above, the solid lines are the new data without resource group creation. You can see that abstracting away the resource group work has made a significant performance improvement. Instance start times are now generally under 100 seconds (which is still three times slower than AWS, and four or five times slower than Google).

So is it just that the Australian Azure zones are slow? I re-ran the new terraform against a US datacenter (East US). Here’s a zoom in of just the instance creates with the resource group extracted to make that clearer, for both data centers:

Interestingly, the Australian data center actually performs better than the US one, which isn’t what I would expect at all. You can also see in this test run that we do still see some unexpectedly slow instance launches, although they feel less frequent and smaller when they happen. That might also just be that I’m testing over a weekend and the data center might be more idle.

Looping back, I think we’ve learnt that resource groups are expensive. The last thing I wanted to dig into was what exactly was happening in those spikes where we had resource groups included. Luckily, they were happening about the point I started logging the terraform trace output of the run.

For example, run azure_1576926569_7_0_apply took 18 minutes and 3 seconds to create the instance. For those 18 minutes, terraform logs that the instance was marked by the Azure API as in provisioningState “Creating”. This correlates with operation id c983b272-fa32-4814-b858-adab3da4d9b1 sitting in state “InProgress”. Unfortunately there isn’t a reason logged for why that is, so I guess it’s not possible as an Azure user to work out why things are sometimes slow.

To summarise some advice for terraform users on Azure — don’t create resource groups if you can avoid it. Create global resource groups and then place new objects into them instead. That said, you’re still going to have slower and less consistent performance than other clouds.

Finally, is instance start time a valid metric for cloud performance? Probably not. That said, it is table stakes to be in the conversation. Slow instance starts affect my overall experience of the cloud, as well as the workability of horizontal scaling techniques. This is especially true for instance start times which vary wildly like Azure’s do — I simply can’t trust that I can grow a horizontal scaling set with any sort of reasonable timeframe.

December 22, 2019

Backing up to S3 with Duplicity

Here is how I set up duplicity to use S3 as a backend while giving duplicity the minimum set of permissions to my Amazon Web Services account.

AWS Security Settings

First of all, I enabled the following general security settings in my AWS account:

  • MFA with a U2F device
  • no root user access keys

Then I set a password policy in the IAM Account Settings and turned off all public access in the S3 Account Settings.

Creating an S3 bucket

As a destination for the backups, I created a new backup-foobar S3 bucket keeping all of the default options except for the region which I set to ca-central-1 to ensure that my data would stay in Canada.

The bucket name can be anything you want as long as:

  • it's not already taken by another AWS user
  • it's a valid hostname (i.e. alphanumeric characters or dashes)

Note that I did not enable S3 server-side encryption since I will be encrypting the backups client-side using the support built into duplicity instead.

Creating a restricted user account

Then I went back into the Identity and Access Management console and created a new DuplicityBackup policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:ListBucketMultipartUploads",
                "s3:AbortMultipartUpload",
                "s3:CreateBucket",
                "s3:ListBucket",
                "s3:DeleteObject",
                "s3:ListMultipartUploadParts"
            ],
            "Resource": [
                "arn:aws:s3:::backup-foobar",
                "arn:aws:s3:::backup-foobar/*"
            ]
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "s3:ListAllMyBuckets",
            "Resource": "*"
        }
    ]
}

It's unfortunate that the unrestricted s3:ListAllMyBuckets permission has to be granted, but in my testing, duplicity would error out without it. No other permissions were needed.
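
If you have more than one bucket to back up to, the policy above can be generated instead of edited by hand. Here is a minimal sketch in Python; the function name is made up, and the output is meant to be the same document as above with only the bucket name substituted.

```python
import json


def duplicity_policy(bucket):
    """Return the DuplicityBackup policy document for one bucket as a dict."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object and bucket operations, limited to the backup bucket.
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:ListBucketMultipartUploads",
                    "s3:AbortMultipartUpload",
                    "s3:CreateBucket",
                    "s3:ListBucket",
                    "s3:DeleteObject",
                    "s3:ListMultipartUploadParts",
                ],
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            },
            {
                # The one unrestricted permission duplicity needed in testing.
                "Sid": "VisualEditor1",
                "Effect": "Allow",
                "Action": "s3:ListAllMyBuckets",
                "Resource": "*",
            },
        ],
    }


def policy_json(bucket):
    """Serialise the policy in a form that can be pasted into the console."""
    return json.dumps(duplicity_policy(bucket), indent=4)
```

The output of policy_json("backup-foobar") can be pasted into the JSON tab of the policy editor.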

The next step was to create a new DuplicityBackupHosts IAM group to which I attached the DuplicityBackup policy.

Finally, I created a new machinename IAM user:

  • Access: programmatic only
  • Group: DuplicityBackupHosts
  • Tags: duplicity=1

and wrote down the access key and the access key secret.

Duplicity settings

With that all set, I was able to run duplicity with the following options:

  • --s3-use-new-style: apparently required on non-US regions
  • --s3-use-ia: recommended pricing structure for backups
  • --s3-use-multiprocessing: speeds up uploading of backup chunks

and used the following remote URL:

s3://s3.ca-central-1.amazonaws.com/backup-foobar/machinename

which hardcodes the region in order to work around the lack of explicit region support in duplicity.

I ended up with the following command:

http_proxy= AWS_ACCESS_KEY_ID=<access_key> AWS_SECRET_ACCESS_KEY=<access_key_secret> PASSPHRASE=<password> duplicity --s3-use-new-style --s3-use-ia --s3-use-multiprocessing --no-print-statistics --verbosity 1 --exclude-device-files --exclude-filelist <exclude_file> --include-filelist <include_file> --exclude '**' / <remote_url>

where <exclude_file> is a file which contains the list of paths to keep out of my backup:

/etc/.git
/home/francois/.cache

<include_file> is a file which contains the list of paths to include in the backup:

/etc
/home/francois
/usr/local/bin
/usr/local/sbin
/var/log/apache2
/var/www

and <password> is a long random string (pwgen -s 64) used to encrypt the backups.

Backup script

Here are two other things I included in my backup script prior to the actual backup line listed in the previous section.

The first one deletes files related to failed backups:

http_proxy= AWS_ACCESS_KEY_ID=<access_key> AWS_SECRET_ACCESS_KEY=<access_key_secret> PASSPHRASE=<password> duplicity cleanup --verbosity 1 --force <remote_url>

and the second deletes old backups (older than 12 days in this example):

http_proxy= AWS_ACCESS_KEY_ID=<access_key> AWS_SECRET_ACCESS_KEY=<access_key_secret> PASSPHRASE=<password> duplicity remove-older-than 12D --verbosity 1 --force <remote_url>
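
Put together, the three steps could be wrapped in a small script. The following Python is a sketch of one way to do it and not my actual backup script: the function names are invented, and it passes the credentials through the environment in the same way as the command lines above, which also keeps them out of the argument list.

```python
import os
import subprocess


def duplicity_env(access_key, secret_key, passphrase):
    """Copy the current environment and add the variables duplicity reads."""
    env = dict(os.environ)
    env.update(
        {
            "http_proxy": "",
            "AWS_ACCESS_KEY_ID": access_key,
            "AWS_SECRET_ACCESS_KEY": secret_key,
            "PASSPHRASE": passphrase,
        }
    )
    return env


def backup_commands(remote_url, include_file, exclude_file, keep="12D"):
    """Return the cleanup, pruning and backup invocations, in that order."""
    quiet = ["--verbosity", "1"]
    return [
        ["duplicity", "cleanup", *quiet, "--force", remote_url],
        ["duplicity", "remove-older-than", keep, *quiet, "--force", remote_url],
        [
            "duplicity",
            "--s3-use-new-style", "--s3-use-ia", "--s3-use-multiprocessing",
            "--no-print-statistics", *quiet,
            "--exclude-device-files",
            "--exclude-filelist", exclude_file,
            "--include-filelist", include_file,
            "--exclude", "**",
            "/", remote_url,
        ],
    ]


def run_backup(env, commands):
    """Run each step in turn, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, env=env, check=True)
```

Using check=True means a failed cleanup stops the run before the backup itself; whether that is the right trade-off depends on how much you trust the earlier steps.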

Feel free to leave a comment if I forgot anything that might be useful!

December 21, 2019

Why is Azure so slow to start instances?

I’ve been playing with terraform recently, and decided to see how different the terraform for launching a simple Ubuntu instance in various clouds is. There are two big questions there for me — how big is the variation between OpenStack derived clouds; and how painful is it to move between the proprietary clouds? Part of this is because terraform doesn’t present a standardised layer of cloud functionality, it has a provider per cloud.

(Although, I suspect there’s nothing stopping someone from writing a libcloud provider or something like that. It is an interesting idea which requires some additional thought.)

My terraform implementations for each cloud are on github if you’re interested. I don’t want to spend a lot of analysis on the actual terraform, because I think the really interesting thing I found isn’t where I expected it to be (there’s a hint in the title for this post). That said, the OpenStack clouds vary mostly by capabilities. vexxhost for example seems to only offer flavors that require boot-from-volume. The proprietary clouds are complete re-writes, but are generally relatively simple and well documented.

However, that interesting accidental thing — as best as I can tell, Microsoft Azure is really really slow to launch instances. The graph below presents five instance launches on each cloud I tested:

As you can see, Vault, Vexxhost, and AWS are basically all in the same ballpark. Google and Azure are outliers, with Google being crazy fast (but also very slow to delete instances, a metric not presented here), and Azure being more than three times slower than everyone else.

Instance launch time isn’t a great metric to be honest, but it does matter. For example if you were trying to autoscale a web tier or a kubernetes cluster, then waiting over two minutes just for the instance to boot before it can be configured and added to the cluster is probably not ok.

I wonder why Azure is so slow?

I did some further exploring after writing this post and was able to improve performance by changing how I handled resource groups in the terraform. The performance still isn’t great though. You can read more about that in a separate post if you’d like.

December 19, 2019

Red Hat crippling CloudForms product and migrating users to IBM

CloudForms is Red Hat’s supported version of upstream ManageIQ, an infrastructure management platform. It lets you see, manage and deploy to various platforms like OpenStack, VMware, RHEV, OpenShift and public clouds like AWS and Azure, with a single pane of glass view across them all. It has its own orchestration engine but also integrates with Ansible for automated deployments.

As best I can tell, their CloudForms updated Statement of Direction article (behind paywall, sorry) shows that Red Hat is killing off support for non-Red Hat platforms like VMware, AWS, Azure, etc. The justification is to focus on open platforms, which I think means CloudForms will ultimately disappear entirely with Red Hat focusing on OpenShift instead.

We made a strategic decision to focus our management strategy on the future — open, cloud-native environments that promote portability across on-premise, private and public clouds.

CloudForms updated Statement of Direction

However to me this is still a big blow to users of the platform, where I’m sure most will have at least some VMware to manage. Indeed, when implementing CloudForms at work and talking to Red Hat, they said that their most mature integration in CloudForms is with VMware.

According to the Red Hat article, CloudForms with full platform support is being embedded into IBM Cloud Pak for Multicloud Management and users are encouraged to “migrate your Red Hat CloudForms subscriptions to IBM Cloud Pak for Multicloud Management licenses.” Red Hat’s CloudForms Statement of Direction FAQ article lays out the migration path, which does confirm Red Hat will continue to support existing clients for the remainder of their subscription.

So in short, CloudForms from Red Hat is being crippled and will only support Red Hat products, which really means that users are being forced to buy IBM instead. Of course Red Hat is entitled to change their own products, but this move does seem curious when execs on both sides said they would remain independent. Maybe it’s better than killing CloudForms outright?

We can publicly say that all our products will survive in their current form and continue to grow. We will continue to support all our products; we’re separate entities and we’re going to have separate contracts, and there is no intention to de-emphasise any of our products and we’ll continue to invest heavily in it.

Jim Whitehurst as Red Hat CEO

December 18, 2019

KVM guests with emulated SSD and NVMe drives

Sometimes when you’re using KVM guests to test something, perhaps like a Ceph or OpenStack Swift cluster, it can be useful to have SSD and NVMe drives. I’m not talking about passing physical drives through, but rather emulating them.

NVMe drives

QEMU supports emulating NVMe drives as arguments on the command line, but it’s not yet exposed to tools like virt-manager. This means you can’t just add a new drive of type nvme into your virtual machine XML definition, however you can add those qemu arguments to your XML. This also means that the NVMe drives will not show up as drives in tools like virt-manager, even after you’ve added them with qemu. Still, it’s fun to play with!

QEMU command line args for NVMe

Michael Moese has nicely documented how to do this on his blog. Basically, after creating a disk image (raw or qcow2), add the following two arguments to the qemu command. I use a number in the drive id and serial so that I can add multiple NVMe drives (just duplicate the lines and increment the number).

-drive file=/path/to/nvme1.img,if=none,id=NVME1 \
-device nvme,drive=NVME1,serial=nvme-1
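
Since each extra NVMe drive is just the same pair of arguments with the number incremented, the arguments can also be generated. Here is a minimal Python sketch of that idea; the function name nvme_args is hypothetical, and it only builds the argument list in the same form as the two lines above (it does not run qemu).

```python
def nvme_args(image_paths):
    """Return QEMU arguments that attach each image as an emulated NVMe drive."""
    args = []
    # Number the drives from 1 so ids and serials match the NVME1 / nvme-1 pattern.
    for number, path in enumerate(image_paths, start=1):
        drive_id = f"NVME{number}"
        args += ["-drive", f"file={path},if=none,id={drive_id}"]
        args += ["-device", f"nvme,drive={drive_id},serial=nvme-{number}"]
    return args
```

Calling nvme_args(["/path/to/nvme1.img", "/path/to/nvme2.img"]) gives four pairs of strings that can be appended to an existing qemu argument list.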

libvirt XML definition for NVMe

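To add NVMe to a libvirt guest, add something like this at the bottom of your virtual machine definition (before the closing </domain> tag) to call those same qemu args. For libvirt to accept the qemu: elements, the opening <domain> tag also needs to declare the QEMU namespace, xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'.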

  <qemu:commandline>
    <qemu:arg value='-drive'/>
    <qemu:arg value='file=/path/to/nvme1.img,format=raw,if=none,id=NVME1'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='nvme,drive=NVME1,serial=nvme-1'/>
  </qemu:commandline>
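
If you would rather script this than edit the XML by hand, the same element can be added with Python's standard library. This is only a sketch: add_qemu_args is a hypothetical helper, it works on an XML string (for example the output of virsh dumpxml), and it reuses an existing qemu:commandline element rather than creating a second one.

```python
import xml.etree.ElementTree as ET

# libvirt's XML namespace for QEMU command-line passthrough.
QEMU_NS = "http://libvirt.org/schemas/domain/qemu/1.0"
ET.register_namespace("qemu", QEMU_NS)

def add_qemu_args(domain_xml, qemu_args):
    """Append qemu:arg elements for qemu_args to a libvirt domain XML string."""
    domain = ET.fromstring(domain_xml)
    # Reuse the existing qemu:commandline section if the guest already has one.
    commandline = domain.find(f"{{{QEMU_NS}}}commandline")
    if commandline is None:
        commandline = ET.SubElement(domain, f"{{{QEMU_NS}}}commandline")
    for value in qemu_args:
        ET.SubElement(commandline, f"{{{QEMU_NS}}}arg", {"value": value})
    # Serialising declares xmlns:qemu on the root <domain> element.
    return ET.tostring(domain, encoding="unicode")
```

The returned string could then be written to a file and loaded with virsh define.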

virt-install for NVMe

If you’re spinning up VMs using virt-install, then you can also pass these in as arguments, which will automatically populate the libvirt XML file with the arguments above. Note as above, you do not add a --disk option for NVMe drives.

--qemu-commandline='-drive file=/path/to/nvme1.img,format=raw,if=none,id=NVME1'
--qemu-commandline='-device nvme,drive=NVME1,serial=nvme-1'

Confirming drive is NVMe

Your NVMe drives will show up as specific devices under Linux, like /dev/nvme0n1 and of course you can see them with tools like lsblk and nvme (from nvme-cli package).

Here’s the nvme tool listing the NVMe drive in a guest.

sudo nvme list

This should return something that looks like this.

Node          SN      Model           Namespace Usage                   Format         FW Rev  
------------- ------- --------------- --------- ----------------------- -------------- ------
/dev/nvme0n1  nvme-1  QEMU NVMe Ctrl  1         107.37  GB / 107.37  GB 512   B +  0 B 1.0

SSD drives

SSD drives are slightly different. Simply add a drive to your guest as you normally would, on the bus you want to use (for example, SCSI or SATA). Then, add the required set command to set rotational speed to make it an SSD (note that you set it to 1 in qemu, which sets it to 0 in Linux).

This does require you to know the name of the device, which will depend on how many drives of that type you add. It generally follows a predictable format (scsi0-0-0-0 for the first SCSI drive on the first SCSI controller, sata0-0-0 for SATA), but it’s good to confirm.

You can determine the exact name for your drive by querying the guest with virsh qemu-monitor-command, like so (the 1 here is the domain ID of the running guest; the guest name works too).

virsh qemu-monitor-command --hmp 1 "info qtree"

This will provide details showing the devices, buses and connected drives. Here’s an example for the first SCSI drive, where you can see it’s scsi0-0-0-0.

                  dev: scsi-hd, id "scsi0-0-0-0"
                    drive = "drive-scsi0-0-0-0"
                    logical_block_size = 512 (0x200)
                    physical_block_size = 512 (0x200)
                    min_io_size = 0 (0x0)
                    opt_io_size = 0 (0x0)
                    discard_granularity = 4096 (0x1000)
                    write-cache = "on"
                    share-rw = false
                    rerror = "auto"
                    werror = "auto"
                    ver = "2.5+"
                    serial = ""
                    vendor = "QEMU"
                    product = "QEMU HARDDISK"
                    device_id = "drive-scsi0-0-0-0"
                    removable = false
                    dpofua = false
                    wwn = 0 (0x0)
                    port_wwn = 0 (0x0)
                    port_index = 0 (0x0)
                    max_unmap_size = 1073741824 (0x40000000)
                    max_io_size = 2147483647 (0x7fffffff)
                    rotation_rate = 1 (0x1)
                    scsi_version = 5 (0x5)
                    cyls = 16383 (0x3fff)
                    heads = 16 (0x10)
                    secs = 63 (0x3f)
                    channel = 0 (0x0)
                    scsi-id = 0 (0x0)
                    lun = 0 (0x0)
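
The qtree output is long, so it can be handy to pull out just the device ids and their rotation_rate. The following Python sketch does that for text in the format shown above; rotation_rates is a hypothetical helper, and it simply attributes each rotation_rate line to the most recent dev: line with a non-empty id.

```python
import re

def rotation_rates(qtree_text):
    """Map each device id in `info qtree` output to its rotation_rate, if it has one."""
    rates = {}
    current_id = None
    for line in qtree_text.splitlines():
        # A line such as: dev: scsi-hd, id "scsi0-0-0-0"
        dev = re.match(r'\s*dev: [^,]+, id "([^"]*)"', line)
        if dev:
            current_id = dev.group(1)
            continue
        # A line such as: rotation_rate = 1 (0x1)
        rate = re.match(r"\s*rotation_rate = (\d+)", line)
        if rate and current_id:
            rates[current_id] = int(rate.group(1))
    return rates
```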

QEMU command for SSD drive

When using qemu, add your drive as usual and then add the set option. Using the SCSI drive example from above (which is on scsi0-0-0-0), this is what it would look like.

-set device.scsi0-0-0-0.rotation_rate=1

libvirt XML definition for SSD drive

Similarly, for a defined guest, add the set argument like we did for NVMe drives, that is at the bottom of the XML, before the closing </domain> tag.

  <qemu:commandline>
    <qemu:arg value='-set'/>
    <qemu:arg value='device.scsi0-0-0-0.rotation_rate=1'/>
  </qemu:commandline>

If your machine also has NVMe drives specified, just add the set args for the SSD to the existing section; don’t add a second qemu:commandline section. It should look something like this.

  <qemu:commandline>
    <qemu:arg value='-set'/>
    <qemu:arg value='device.scsi0-0-0-0.rotation_rate=1'/>
    <qemu:arg value='-drive'/>
    <qemu:arg value='file=/var/lib/libvirt/images/rancher-vm-centos-7-00-nvme.qcow2,format=qcow2,if=none,id=NVME1'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='nvme,drive=NVME1,serial=nvme-1'/>
  </qemu:commandline>

virt-install command for SSD drive

When spinning up a machine using virt-install, add a drive as normal. The only thing you have to add is the argument for the qemu set command. Here’s that same SCSI example.

--qemu-commandline='-set device.scsi0-0-0-0.rotation_rate=1'

Confirming drive is an SSD

You can confirm the rotational speed with lsblk, like so.

sudo lsblk -d -o name,rota

This will return either 0 (for rotational speed false, meaning SSD) or 1 (for rotating drives, meaning non-SSD). For example, here’s a bunch of drives on a KVM guest where you can see /dev/sda and /dev/nvme0n1 are both SSDs.

NAME    ROTA
sda        0
sdb        1
sr0        1
vda        1
nvme0n1    0
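
If you want to check this from a script, the two-column output is easy to parse. A small Python sketch, assuming the exact name,rota column layout shown above (the function name non_rotational is made up for this example):

```python
def non_rotational(lsblk_output):
    """Return device names whose ROTA column is 0 in `lsblk -d -o name,rota` output."""
    devices = []
    for line in lsblk_output.splitlines()[1:]:  # skip the NAME/ROTA header line
        fields = line.split()
        if len(fields) == 2 and fields[1] == "0":
            devices.append(fields[0])
    return devices
```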

You can also check with smartctl, which will report the rotation rate as Solid State Device. Here’s an example on /dev/sda, which is set to be an SSD in a KVM guest.

smartctl -i /dev/sda

This shows a result like this; note Rotation Rate is Solid State Device.

=== START OF INFORMATION SECTION ===
Vendor:               QEMU
Product:              QEMU HARDDISK
Revision:             2.5+
Compliance:           SPC-3
User Capacity:        107,374,182,400 bytes [107 GB]
Logical block size:   512 bytes
LU is thin provisioned, LBPRZ=0
Rotation Rate:        Solid State Device
Device type:          disk
Local Time is:        Wed Dec 18 17:52:18 2019 AEDT
SMART support is:     Unavailable - device lacks SMART capability.

So that’s it! Thanks to QEMU you can play with NVMe and SSD drives in your guests.

rfid and hrfid

I was staring at some assembly recently, and for not the first time encountered rfid and hrfid, two instructions that we use when doing things like returning to userspace, returning from OPAL to the kernel, or from a host kernel into a guest.

rfid copies various bits from the register SRR1 (Machine Status Save/Restore Register 1) into the MSR (Machine State Register), and then jumps to an address given in SRR0 (Machine Status Save/Restore Register 0). hrfid does something similar, using HSRR0 and HSRR1 (Hypervisor Machine Status Save/Restore Registers 0/1), and slightly different handling of MSR bits.

The various Save/Restore Registers are used to preserve the state of the CPU before jumping to an interrupt handler, entering the kernel, etc, and are set up as part of instructions like sc (System Call), by the interrupt mechanism, or manually (using instructions like mtsrr1).

Anyway, the way in which rfid and hrfid restore MSR bits is documented somewhat obtusely in the ISA (if you don't believe me, look it up), and I was annoyed by this, so here, have a more useful definition. Leave a comment if I got something wrong.

rfid - Return From Interrupt Doubleword

Machine State Register

Copy all bits (except some reserved bits) from SRR1 into the MSR, with the following exceptions:

  • MSR_3 (HV, Hypervisor State) = MSR_3 & SRR1_3
    [We won't put the thread into hypervisor state if we're not already in hypervisor state]

  • If MSR_29:31 != 0b010 [Transaction State Suspended, TM not available], or SRR1_29:31 != 0b000 [Transaction State Non-transactional, TM not available] then:

    • MSR_29:30 (TS, Transaction State) = SRR1_29:30
    • MSR_31 (TM, Transactional Memory Available) = SRR1_31

    [See the ISA description for explanation on how rfid interacts with TM and resulting interrupts]

  • MSR_48 (EE, External Interrupt Enable) = SRR1_48 | SRR1_49 (PR, Problem State)
    [If going into problem state, external interrupts will be enabled]

  • MSR_51 (ME, Machine Check Interrupt Enable) = (MSR_3 (HV, Hypervisor State) & SRR1_51) | ((! MSR_3) & MSR_51)
    [If we're not already in hypervisor state, we won't alter ME]

  • MSR_58 (IR, Instruction Relocate) = SRR1_58 | SRR1_49 (PR, Problem State)
    [If going into problem state, relocation will be enabled]

  • MSR_59 (DR, Data Relocate) = SRR1_59 | SRR1_49 (PR, Problem State)
    [If going into problem state, relocation will be enabled]

Next Instruction Address

  • NIA = SRR0_0:61 || 0b00
    [Jump to SRR0, set last 2 bits to zero to ensure address is aligned to 4 bytes]
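
For anyone who finds code easier to read than bullet points, the rfid rules above can be modelled in a few lines of Python. This is a sketch of the list as written here, not a transcription of the ISA: it ignores the reserved bits, does not model any TM-related interrupts, and the names rfid_msr and rfid_nia are made up. Bits use IBM numbering, so bit 0 is the most significant bit of the 64-bit register.

```python
MASK64 = (1 << 64) - 1

def ibm_bit(n):
    """Mask for bit n in IBM numbering, where bit 0 is the most significant of 64."""
    return 1 << (63 - n)

HV, EE, PR, ME, IR, DR = (ibm_bit(n) for n in (3, 48, 49, 51, 58, 59))
TS_TM = ibm_bit(29) | ibm_bit(30) | ibm_bit(31)  # the MSR_29:31 field
TS_SUSPENDED = ibm_bit(30)                       # MSR_29:31 == 0b010

def rfid_msr(msr, srr1):
    """MSR value after rfid, given the current MSR and SRR1 (reserved bits ignored)."""
    new = srr1 & MASK64
    # HV is only kept if it is set in both the current MSR and SRR1.
    new = (new & ~HV) | (msr & srr1 & HV)
    # TS/TM stay untouched only when MSR_29:31 == 0b010 and SRR1_29:31 == 0b000.
    if (msr & TS_TM) == TS_SUSPENDED and (srr1 & TS_TM) == 0:
        new = (new & ~TS_TM) | (msr & TS_TM)
    # ME is only taken from SRR1 when already in hypervisor state.
    if not msr & HV:
        new = (new & ~ME) | (msr & ME)
    # Going into problem state forces EE, IR and DR on.
    if srr1 & PR:
        new |= EE | IR | DR
    return new

def rfid_nia(srr0):
    """Next instruction address: SRR0 with the low two bits cleared."""
    return srr0 & MASK64 & ~0b11
```

For example, rfid_msr(0, HV | PR) drops the HV bit (we were not in hypervisor state) and returns PR | EE | IR | DR.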

hrfid - Hypervisor Return From Interrupt Doubleword

Machine State Register

Copy all bits (except some reserved bits) from HSRR1 into the MSR, with the following exceptions:

  • If MSR_29:31 != 0b010 [Transaction State Suspended, TM not available], or HSRR1_29:31 != 0b000 [Transaction State Non-transactional, TM not available] then:

    • MSR_29:30 (TS, Transaction State) = HSRR1_29:30
    • MSR_31 (TM, Transactional Memory Available) = HSRR1_31

    [See the ISA description for explanation on how hrfid interacts with TM and resulting interrupts]

  • MSR_48 (EE, External Interrupt Enable) = HSRR1_48 | HSRR1_49 (PR, Problem State)
    [If going into problem state, external interrupts will be enabled]

  • MSR_58 (IR, Instruction Relocate) = HSRR1_58 | HSRR1_49 (PR, Problem State)
    [If going into problem state, relocation will be enabled]

  • MSR_59 (DR, Data Relocate) = HSRR1_59 | HSRR1_49 (PR, Problem State)
    [If going into problem state, relocation will be enabled]

Next Instruction Address

  • NIA = HSRR0_0:61 || 0b00
    [Jump to HSRR0, set last 2 bits to zero to ensure address is aligned to 4 bytes]

December 17, 2019

A close-to-upstream firmware build for the Raptor Blackbird

UPDATE: A newer version is available here

It goes without saying that using this build is At Your Own Risk and I make zero warranty. AFAIK it can’t physically destroy your system.

My GitHub op-build branch stewart-blackbird-v1 has all the changes built into this build (the VERSION displayed in firmware will be slightly weird as I did the tagging afterwards… this is not meant to be “howto release firmware to the public”). Follow op-build pull 3341 for the state of upstreaming everything.

Binaries are over at https://www.flamingspork.com/blackbird/stewart-blackbird-v1-images/ (see the git branch of op-build for source).

To flash it (temporarily), grab blackbird.pnor, get it to /tmp on your BMC and follow the instructions I posted the other day.

I’d be interested in any feedback on what does/does not work.

December 15, 2019

Are you Fans of the Blackbird? Speak up, I can’t hear you over the fan.

So, as of yesterday, I started running a pretty-close-to-upstream op-build host firmware stack on my Blackbird. Notable yak-shaving has included:

Apart from that, I was all happy as Larry. Except then I went into the room with the Blackbird in it and went “huh, that’s loud”, and since it was bedtime, I decided it could all wait until the morning.

It is now the morning. Checking fan speeds over IPMI, one fan stood out (fan2, sitting at 4300RPM). This was a bit of a surprise as what’s silkscreened on the board is that the rear case fan is hooked up to “fan2”, and if we had a “start from 0/1” mix up, it’d be the front case fan. I had just assumed it’d be maybe OCC firmware dying or something, but this wasn’t the case (I checked – thanks occtoolp9!)

After a bit of digging around, I worked out this mapping:

IPMI fan0    Rear Case Fan     Motherboard Fan 2
IPMI fan1    Front Case Fan    Motherboard Fan 3
IPMI fan2    CPU Fan           Motherboard Fan 1

Which is about as surprising and confusing as you’d think.

After a bunch of digging around the Raptor ports of OpenBMC and Hostboot, it seems that the IPL Observer which is custom to Raptor controls if the BMC decides to do fan control or not.

You can get its view of the world from the BMC via the (incredibly user friendly) poking at DBus:

busctl get-property org.openbmc.status.IPL /org/openbmc/status/IPL org.openbmc.status.IPL current_status; busctl get-property org.openbmc.status.IPL /org/openbmc/status/IPL org.openbmc.status.IPL current_istep

Which if you just have the Hostboot patch in (like I first did) you end up with:

s "IPL_RUNNING"
s "21,3"

Which is where Hostboot exits the IPL process (as you see on the screen) and hands over to skiboot. But if you start digging through their op-build tree, you find that there’s a signal_linux_start_complete script which calls pnv-lpc to write two values to LPC ports 0x81 and 0x82. The pnv-lpc utility is the external/lpc/ binary from skiboot, and these two ports are the “extended lpc port 80h” state.

So, to get back fan control? First, build the lpc utility:

git clone git@github.com:open-power/skiboot.git
cd skiboot/external/lpc
make

and then poke the magic values of “IPL complete and linux running”:

$ sudo ./lpc io 0x81.b=254
[io] W 0x00000081.b=0xfe
$ sudo ./lpc io 0x82.b=254
[io] W 0x00000082.b=0xfe

You get a friendly beep, and then your fans return to sanity.

Of course, for that to work you need to have debugfs mounted, as this pokes OPAL debugfs to do direct LPC operations.

Next up: think of a smarter way to trigger that than “stewart runs it on the command line”. Also next up: work out the better way to determine that fan control should be on and patch the BMC.

Automatically updating containers with Docker

Running something in a container using Docker or Podman is cool, but maybe you want an automated way to always run the latest container? Using the :latest tag alone does not do this, that just pulls the latest container at the time. You could have a cronjob that just always pulls the latest containers and restarts the container, but then if there’s no update you have an outage for no reason.

It’s not too hard to write a script to pull the latest container and restart the service only if required, then tie that together with a systemd timer.

To restart a container you need to know how it was started. If you have only one container then you could just hard-code it, however it gets more tricky to manage if you have a number of containers. This is where something like runlike can help!

First, start up your container however you need to (OwnTracks recorder, for example).

Next, let’s install runlike with pip.

sudo pip install runlike

Now, let’s create a simple script that takes one optional argument, the name of a running container. If the argument is omitted, it will default to all containers. The script will check if the latest image is different to the running image, and if so, restart the container using the new image with the same arguments as before (determined by runlike). If there is no newer image, then it will just leave the running container alone.

Create the script.

cat << \EOF | sudo tee /usr/local/bin/update-containers.sh
#!/bin/bash

# Abort on all errors (same as set -e)
set -o errexit

# Get the containers from first argument, else get all containers
CONTAINER_LIST="${1:-$(docker ps -q)}"

for container in ${CONTAINER_LIST}; do
  # Get the image and hash of the running container
  CONTAINER_IMAGE="$(docker inspect --format "{{.Config.Image}}" --type container "${container}")"
  RUNNING_IMAGE="$(docker inspect --format "{{.Image}}" --type container "${container}")"

  # Pull in latest version of the container and get the hash
  docker pull "${CONTAINER_IMAGE}"
  LATEST_IMAGE="$(docker inspect --format "{{.Id}}" --type image "${CONTAINER_IMAGE}")"

  # Restart the container if the image is different
  if [[ "${RUNNING_IMAGE}" != "${LATEST_IMAGE}" ]]; then
    echo "Updating ${container} image ${CONTAINER_IMAGE}"
    DOCKER_COMMAND="$(runlike "${container}")"
    docker rm --force "${container}"
    eval "${DOCKER_COMMAND}"
  fi
done
EOF
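
The comparison at the heart of the script can also be sketched in Python, which makes it easy to try the logic without touching a real container. This is an illustration only: needs_update and docker_inspect are hypothetical names, the docker inspect calls are the same three used in the script, and it assumes the image has already been pulled (it does not pull or restart anything).

```python
import subprocess

def docker_inspect(fmt, kind, name):
    """Run `docker inspect` and return the requested field as a string."""
    result = subprocess.run(
        ["docker", "inspect", "--format", fmt, "--type", kind, name],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()

def needs_update(container, inspect=docker_inspect):
    """True if the container's image name now points at a different image ID."""
    image_name = inspect("{{.Config.Image}}", "container", container)
    running_id = inspect("{{.Image}}", "container", container)
    latest_id = inspect("{{.Id}}", "image", image_name)
    return running_id != latest_id
```

The inspect parameter is there so the logic can be exercised with a stand-in function instead of a running Docker daemon.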

Make the script executable.

sudo chmod a+x /usr/local/bin/update-containers.sh

You can test the script by just running it.

/usr/local/bin/update-containers.sh

Now that you have a script which will check for new images and update containers, let’s make a systemd service and timer for it. This way you can schedule regular update checks whenever you like. If you want to update only a specific container, just add the container name as an argument to the script.

First, create the service

cat << EOF | sudo tee /etc/systemd/system/update-containers.service 
[Unit]
Description=Update containers
After=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/update-containers.sh
EOF

Next, create the matching timer service (note that the service and timer names need to match).

cat << EOF | sudo tee /etc/systemd/system/update-containers.timer 
[Unit]
Description=Timer for updating containers
Wants=network-online.target

[Timer]
OnActiveSec=24h
OnUnitActiveSec=24h

[Install]
WantedBy=timers.target
EOF

Reload systemd to pick up the new service and enable the timer.

sudo systemctl daemon-reload
sudo systemctl start update-containers.timer
sudo systemctl enable update-containers.timer

You can check the status of the timer and the service using standard systemd tools.

sudo systemctl status update-containers.timer
sudo systemctl status update-containers.service
sudo journalctl -u update-containers.service

That’s it! Sit back and let your containers be automatically updated for you. If you want to manually update a container, you could just use version tags and manage them separately.


Enabling Docker in Fedora 31 by reverting to cgroups v1

Fedora has switched to cgroups v2 by default now, but Docker doesn’t yet support it and so fails to start. If you want to use Docker then you need to revert cgroups to v1 by adding the systemd.unified_cgroup_hierarchy=0 kernel argument.

Add systemd.unified_cgroup_hierarchy=0 to the default GRUB config with sed.

sudo sed -i '/^GRUB_CMDLINE_LINUX/ s/"$/ systemd.unified_cgroup_hierarchy=0"/' /etc/default/grub
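
To see what that sed expression does before running it against a real file, here is the same substitution as a small Python sketch operating on a string. The function name add_kernel_arg is made up; like the sed command, it matches every line starting with GRUB_CMDLINE_LINUX and will append the argument again if applied twice.

```python
def add_kernel_arg(grub_defaults, arg="systemd.unified_cgroup_hierarchy=0"):
    """Append arg inside the closing quote of each GRUB_CMDLINE_LINUX line."""
    lines = []
    for line in grub_defaults.splitlines():
        # Same match as the sed address /^GRUB_CMDLINE_LINUX/ with s/"$/ arg"/
        if line.startswith("GRUB_CMDLINE_LINUX") and line.endswith('"'):
            line = line[:-1] + f' {arg}"'
        lines.append(line)
    return "\n".join(lines)
```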

Now rebuild your GRUB config.

If you’re using BIOS boot then it’s this.

sudo grub2-mkconfig -o /boot/grub2/grub.cfg

If you’re running EFI, then it’s this.

sudo grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg

Now reboot and make sure Docker can start!

OwnTracks recorder in a container on Fedora with Let’s Encrypt and nginx

OwnTracks Recorder is a web application which maps locations over time. Generally, it connects to an MQTT server and subscribes to owntracks/+ topics for any location updates, but it also has a built in function to receive updates over HTTP.

I have been using OwnTracks with MQTT for a while, but found it to be too unreliable on Android (it disconnects in the background and doesn’t reconnect nicely). Using HTTP is supposed to be more reliable, so this is how I set it up. The idea is to use OwnTracks on Android to post directly to the OwnTracks recorder over HTTP instead of MQTT and have recorder post the MQTT messages on our behalf using Lua scripts (for Home Assistant).

Friends is an important feature (to let members of the family see where each other is located) and fortunately it is supported in HTTP mode (but it requires a little bit more configuration).

nginx and base configuration

We will use nginx as a reverse proxy in front of the recorder to provide both TLS and authentication to keep the service private and secure.

sudo dnf install nginx httpd-tools

Configure nginx to proxy to OwnTracks recorder by creating a new config file for the domain you are hosting on. For example, if your domain is owntracks.yourdomain.com then create a file at /etc/nginx/conf.d/owntracks.yourdomain.com.conf. Later certbot will update this to add TLS configuration.

cat << \EOF |sudo tee /etc/nginx/conf.d/owntracks.yourdomain.com.conf
server {
  server_name owntracks.yourdomain.com;
  root /var/www/html;

  auth_basic "OwnTracks";
  auth_basic_user_file /etc/nginx/owntracks.htpasswd;
  proxy_set_header X-Limit-U $remote_user;

  location / {
    proxy_pass http://127.0.0.1:8083/;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Real-IP $remote_addr;
  }

  location /ws {
    rewrite ^/(.*) /$1 break;
    proxy_pass http://127.0.0.1:8083;
    proxy_http_version  1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
  }

  location /view/ {
    proxy_buffering off;
    proxy_pass http://127.0.0.1:8083/view/;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Real-IP $remote_addr;
  }
  location /static/ {
    proxy_pass http://127.0.0.1:8083/static/;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Real-IP $remote_addr;
  }

  location /pub {
    proxy_pass http://127.0.0.1:8083/pub;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Real-IP $remote_addr;
  }

  error_page 404 /404.html;
  location = /40x.html {
  }

  error_page 500 502 503 504 /50x.html;
  location = /50x.html {
  }
}
EOF

Now that we have a web server let’s open the ports to enable traffic on port 80 and 443.

sudo firewall-cmd --zone=FedoraServer --add-service=http
sudo firewall-cmd --zone=FedoraServer --add-service=https
sudo firewall-cmd --runtime-to-permanent

SELinux will block nginx from acting as a proxy and connecting to our other services, so we need to tell it that it’s OK.

sudo setsebool -P httpd_can_network_connect 1
sudo setsebool -P httpd_can_network_relay 1

Note that we’ve set a password file for nginx to protect recorder in the config, now we need to create that file.

Let’s pretend that we have three users, Alice, Bob and Charlie. Create the nginx password file when you add the password for Alice, then add the password for the other two.

sudo htpasswd -c /etc/nginx/owntracks.htpasswd alice
sudo htpasswd /etc/nginx/owntracks.htpasswd bob
sudo htpasswd /etc/nginx/owntracks.htpasswd charlie

That’s the core nginx config done, next we will use certbot to get a certificate and re-configure nginx to use TLS.

Certbot

Install certbot and the nginx plugin, which will let us get signed certificates from Let’s Encrypt. Using the plugin means it will configure nginx to handle the challenge and write the config file automatically. You will need to make sure that port 80 on your nginx server is available over the Internet (and probably also port 443 so that we can connect securely to recorder remotely) as well as a DNS entry pointing to your external IP (I’ll use owntracks.yourdomain.com as an example).

sudo dnf install certbot python3-certbot-nginx

Next, use certbot to get TLS certificates from Let’s Encrypt. Follow the prompts and be sure to enable TLS redirection so that all traffic will be encrypted.

sudo certbot --agree-tos \
--redirect \
--rsa-key-size 4096 \
--nginx \
-d owntracks.yourdomain.com

Now that we have a certificate, let’s enable auto renewals.

sudo systemctl enable --now certbot-renew.timer

OK, nginx should now be configured with TLS and managed by certbot.

Recorder with Docker

Now let’s get the recorder container going! First install and prepare Docker. Note that if you’re running on Fedora 31 or later, you need to revert to cgroup v1 first.

sudo groupadd -r docker
sudo gpasswd -a ${USER} docker
newgrp docker
sudo dnf install -y cockpit-docker docker
sudo systemctl start docker
sudo systemctl enable docker

Next let’s prepare the configuration and scripts for the container.

sudo mkdir -p /var/lib/owntracks/{config,scripts,logs}

Generally we pass variables into containers, but recorder also supports a config file so we’ll use that instead (OTR_LUASCRIPT is not supported as a variable, anyway). Replace the values for your MQTT server below.

NOTE: OTR_PORT must be a number, not a string, else it will be ignored.

OTR_HOST="mqtt-broker"
OTR_PORT=mqtt-port
OTR_USER="mqtt-user"
OTR_PASS="mqtt-user-password"

cat << EOF | sudo tee /var/lib/owntracks/config/recorder.conf
OTR_TOPICS = "owntracks/#"
OTR_HTTPHOST = "0.0.0.0"
OTR_STORAGEDIR = "/store"
OTR_HTTPLOGDIR = "/logs"
OTR_LUASCRIPT = "/scripts/hook.lua"
OTR_HOST = "${OTR_HOST}"
OTR_PORT = ${OTR_PORT}
OTR_USER = "${OTR_USER}"
OTR_PASS = "${OTR_PASS}"
OTR_CLIENTID = "owntracks-recorder"
EOF
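
If you generate this file from a script, it is worth guarding against the port being quoted by accident. A minimal Python sketch of that check, covering only the MQTT connection settings from the file above (recorder_conf is a hypothetical helper and the values passed in are placeholders):

```python
def recorder_conf(host, port, user, password):
    """Render the MQTT settings for recorder.conf; the port is written unquoted."""
    # Reject strings (and booleans) so OTR_PORT always ends up as a bare number.
    if isinstance(port, bool) or not isinstance(port, int):
        raise TypeError("OTR_PORT must be an integer, not a string")
    return "\n".join([
        f'OTR_HOST = "{host}"',
        f"OTR_PORT = {port}",
        f'OTR_USER = "{user}"',
        f'OTR_PASS = "{password}"',
    ]) + "\n"
```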

If you’re using TLS on your MQTT server, then copy over the CA (for example, /etc/pki/tls/certs/ca-bundle.crt) and set the OTR_CAFILE config option to point to the file as it will be inside the container. This will automatically enable TLS connection to your MQTT server.

sudo cp /etc/pki/tls/certs/ca-bundle.crt /var/lib/owntracks/config/ca.crt
echo 'OTR_CAFILE="/config/ca.crt"' | sudo tee -a /var/lib/owntracks/config/recorder.conf

Next get the Lua scripts ready which will allow recorder to forward HTTP events on to MQTT. We will write a file called hook.lua to run the script, which is referenced in the config above. It has a JSON dependency, which we will download from the Internet.

wget http://regex.info/code/JSON.lua
sudo mv JSON.lua /var/lib/owntracks/scripts/JSON.lua
cat << EOF | sudo tee /var/lib/owntracks/scripts/hook.lua
JSON = (loadfile "/scripts/JSON.lua")()

function otr_init()
end

function otr_exit()
end

function otr_hook(topic, _type, data)
    otr.log("DEBUG_PUB:" .. topic .. " " .. JSON:encode(data))
    if(data['_http'] == true) then
        if(data['_repub'] == true) then
           return
        end
        data['_repub'] = true
        local payload = JSON:encode(data)
        otr.publish(topic, payload, 1, 1)
    end
end

function otr_putrec(u, d, s)
        j = JSON:decode(s)
        if (j['_repub'] == true) then
                return 1
        end
end
EOF

Next we can run the container for recorder. We will map in all of the directories we created earlier, and the configuration we created should be read in when the program in the container starts. Note that the :Z option sets the SELinux context on those config files.

docker run -dit --name recorder \
--restart always \
-p 8083:8083 \
-v /var/lib/owntracks/store:/store:Z \
-v /var/lib/owntracks/config:/config:Z \
-v /var/lib/owntracks/scripts:/scripts:Z \
-v /etc/localtime:/etc/localtime:ro \
owntracks/recorder

OwnTracks should now be listening on port 8083, waiting for connections to come in through nginx!

Friends with OwnTracks

To set up friends in HTTP mode we need to get a shell on the container and load friends data into the database.

docker exec -it recorder /bin/sh

Inside the container we load friends data into the database. Let’s use our three friends as an example, Alice with her phone pixel3xl, Bob with his pixel4 and Charlie with her pixel3a, to set up notifications for everyone.

ocat --load=friends << EOF
alice-pixel3xl [ "bob/pixel4", "charlie/pixel3a" ]
bob-pixel4 [ "alice/pixel3xl", "charlie/pixel3a" ]
charlie-pixel3a [ "alice/pixel3xl", "bob/pixel4" ]
EOF
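
With more than a few people, writing these lines by hand gets error prone (a single missing quote breaks the JSON). Here is a small Python sketch that generates one line per user in the same user-device [ friends ] shape; friends_lines is a made-up name, and it assumes ocat reads the bracketed part as an ordinary JSON array, so the exact spacing inside the brackets should not matter.

```python
import json

def friends_lines(devices):
    """devices maps user -> device; return one `user-device [friends]` line per user."""
    lines = []
    for user, device in devices.items():
        # Everyone else's user/device pair becomes this user's friends list.
        others = [f"{u}/{d}" for u, d in devices.items() if u != user]
        lines.append(f"{user}-{device} {json.dumps(others)}")
    return lines
```

The resulting lines can be joined with newlines and piped into ocat --load=friends inside the container.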

We can dump the friends data to see what we’ve loaded, then exit the container.

ocat -S /store --dump=friends
exit

Now, whenever Alice, Bob or Charlie update their location, recorder will return JSON data with the location of the other two. OwnTracks will then display that information under the Friends tab. Unfortunately, the one thing HTTP mode doesn’t support is Regions notifications, which tell you when friends enter or leave defined waypoints, but I’ve found OwnTracks to be much more reliable with HTTP so I guess that’s a small price to pay…


December 14, 2019

Audiobooks – November 2019

Exactly: How Precision Engineers Created the Modern World by Simon Winchester

Starting from the early 18th century, each chapter covers increasingly greater accuracy and the technology that needed and used it. Nice read 8/10

The Secret Cyclist: Real Life as a Rider in the Professional Peloton by The Secret Cyclist

An okay read although I don’t follow the sport so had never heard of most of the names. It is still readable however and gives a good feel for the world. 6/10

Braving It: A Father, a Daughter, and an Unforgettable Journey into the Alaskan Wild by James Campbell

A father takes his 15 year-old daughter for two trips to a remote cabin and a 3rd trip hiking/canoeing along a remote river in Alaska. Well written and interesting. 8/10

The Left Behind: Decline and Rage in Rural America by Robert Wuthnow

Based on interviews with small town Americans, it talks about their lives and frustrations with Washington, which they see as distant but interfering. 7/10

World War Z: An Oral History of the Zombie War by Max Brooks

This was the “almost” full text version. Lots of different actors reading each chapter (which are arranged as interviews). Great story and presentation works well. 9/10


Booting temporary firmware on the Raptor Blackbird

In a future post, I’ll detail how to build my ported-to-upstream Blackbird firmware. Here though, we’ll explore booting some firmware temporarily to experiment.

Step 1: Copy your new PNOR image over to the BMC.
Step 2: …
Step 3: Profit!

Okay, not really, once you’ve copied over your image, ensure the computer is off and then you can tell the daemon that provides firmware to the host to use a file backend for it rather than the PNOR chip on the motherboard (i.e. yes, you can boot your system even when the firmware chip isn’t there – although I’ve not literally tried this).

root@blackbird:~# mboxctl --backend file:/tmp/blackbird.pnor 
SetBackend: Success
root@blackbird:~# obmcutil poweron

If we look at the serial console (ssh to the BMC port 2200) we’ll see Hostboot start, realise there’s newer SBE code, flash it, and reboot:

--== Welcome to Hostboot hostboot-b284071/hbicore.bin ==--

  3.02606|secure|SecureROM valid - enabling functionality
  5.14678|Booting from SBE side 0 on master proc=00050000
  5.18537|ISTEP  6. 5 - host_init_fsi
  5.47985|ISTEP  6. 6 - host_set_ipl_parms
  5.54476|ISTEP  6. 7 - host_discover_targets
  6.56106|HWAS|PRESENT> DIMM[03]=8080000000000000
  6.56108|HWAS|PRESENT> Proc[05]=8000000000000000
  6.56109|HWAS|PRESENT> Core[07]=1511540000000000
  6.61373|ISTEP  6. 8 - host_update_master_tpm
  6.61529|SECURE|Security Access Bit> 0x0000000000000000
  6.61530|SECURE|Secure Mode Disable (via Jumper)> 0x8000000000000000
  6.61543|ISTEP  6. 9 - host_gard
  7.20987|HWAS|FUNCTIONAL> DIMM[03]=8080000000000000
  7.20988|HWAS|FUNCTIONAL> Proc[05]=8000000000000000
  7.20989|HWAS|FUNCTIONAL> Core[07]=1511540000000000
  7.21299|ISTEP  6.11 - host_start_occ_xstop_handler
  8.28965|ISTEP  6.12 - host_voltage_config
  8.47973|ISTEP  7. 1 - mss_attr_cleanup
  9.07674|ISTEP  7. 2 - mss_volt
  9.35627|ISTEP  7. 3 - mss_freq
  9.63029|ISTEP  7. 4 - mss_eff_config
 10.35189|ISTEP  7. 5 - mss_attr_update
 10.38489|ISTEP  8. 1 - host_slave_sbe_config
 10.45332|ISTEP  8. 2 - host_setup_sbe
 10.45450|ISTEP  8. 3 - host_cbs_start
 10.45574|ISTEP  8. 4 - proc_check_slave_sbe_seeprom_complete
 10.48675|ISTEP  8. 5 - host_attnlisten_proc
 10.50338|ISTEP  8. 6 - host_p9_fbc_eff_config
 10.50771|ISTEP  8. 7 - host_p9_eff_config_links
 10.53338|ISTEP  8. 8 - proc_attr_update
 10.53634|ISTEP  8. 9 - proc_chiplet_fabric_scominit
 10.55234|ISTEP  8.10 - proc_xbus_scominit
 10.56202|ISTEP  8.11 - proc_xbus_enable_ridi
 10.57788|ISTEP  8.12 - host_set_voltages
 10.59421|ISTEP  9. 1 - fabric_erepair
 10.65877|ISTEP  9. 2 - fabric_io_dccal
 10.66048|ISTEP  9. 3 - fabric_pre_trainadv
 10.66665|ISTEP  9. 4 - fabric_io_run_training
 10.66860|ISTEP  9. 5 - fabric_post_trainadv
 10.67060|ISTEP  9. 6 - proc_smp_link_layer
 10.67503|ISTEP  9. 7 - proc_fab_iovalid
 11.10386|ISTEP  9. 8 - host_fbc_eff_config_aggregate
 11.15103|ISTEP 10. 1 - proc_build_smp
 11.27537|ISTEP 10. 2 - host_slave_sbe_update
 11.68581|sbe|System Performing SBE Update for PROC 0, side 0
 34.50467|sbe|System Rebooting To Complete SBE Update Process
 34.50595|IPMI: Initiate power cycle
 34.54671|Stopping istep dispatcher
 34.68729|IPMI: shutdown complete
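The timestamps in that console output make it easy to see how long the SBE update takes. Here is a minimal Python sketch that works it out from the log text. The function names are made up for illustration, and it assumes every line of interest looks like "seconds|message", as in the excerpt above.

```python
import re

# Hostboot console lines look like "  11.68581|sbe|message" or
# "  10.38489|ISTEP  8. 1 - host_slave_sbe_config".
LINE_RE = re.compile(r"^\s*(\d+\.\d+)\|(.*)$")


def parse_console(text):
    """Return a list of (seconds, message) tuples for timestamped lines."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            entries.append((float(match.group(1)), match.group(2)))
    return entries


def sbe_update_seconds(entries):
    """Seconds between the SBE update starting and the reboot message, or None."""
    start = end = None
    for seconds, message in entries:
        if "Performing SBE Update" in message:
            start = seconds
        elif "Rebooting To Complete SBE Update" in message:
            end = seconds
    if start is None or end is None:
        return None
    return end - start


# Four lines copied from the log above.
SAMPLE = """\
 11.27537|ISTEP 10. 2 - host_slave_sbe_update
 11.68581|sbe|System Performing SBE Update for PROC 0, side 0
 34.50467|sbe|System Rebooting To Complete SBE Update Process
 34.50595|IPMI: Initiate power cycle
"""

duration = sbe_update_seconds(parse_console(SAMPLE))
print(f"SBE update took about {duration:.1f} s")
```

For the log above this gives roughly 22.8 seconds, which is most of the time spent in that first, interrupted boot.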

One of the improvements is we now get output from the SBE! This means that when we do things like mess up secure boot and non secure boot firmware (I’ll explain why/how this is a thing later), we’ll actually get something useful out of a serial port:

--== Welcome to SBE - CommitId[0x8b06b5c1] ==--
istep 3.19
istep 3.20
istep 3.21
istep 3.22
istep 4.1
istep 4.2
istep 4.3
istep 4.4
istep 4.5
istep 4.6
istep 4.7
istep 4.8
istep 4.9
istep 4.10
istep 4.11
istep 4.12
istep 4.13
istep 4.14
istep 4.15
istep 4.16
istep 4.17
istep 4.18
istep 4.19
istep 4.20
istep 4.21
istep 4.22
istep 4.23
istep 4.24
istep 4.25
istep 4.26
istep 4.27
istep 4.28
istep 4.29
istep 4.30
istep 4.31
istep 4.32
istep 4.33
istep 4.34
istep 5.1
istep 5.2
SBE starting hostboot

And then we’re back into normal Hostboot boot (which we’ve all seen before) and end up at a newer petitboot!

Petitboot 1.11 on a Raptor Blackbird

One notable absence from that screenshot is my installed Fedora. This is because there appears to be a bug in the 5.3.7 kernel that’s currently upstream, and if we drop to the shell and poke at lspci and dmesg, we can work out what could be the culprit:

Exiting petitboot. Type 'exit' to return.
You may run 'pb-sos' to gather diagnostic data
No password set, running as root. You may set a password in the System Configuration screen.
# lspci
0000:00:00.0 PCI bridge: IBM Device 04c1
0001:00:00.0 PCI bridge: IBM Device 04c1
0001:01:00.0 Non-Volatile memory controller: Intel Corporation Device f1a8 (rev 03)
0002:00:00.0 PCI bridge: IBM Device 04c1
0002:01:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9235 PCIe 2.0 x2 4-port SATA 6 Gb/s Controller (rev 11)
0003:00:00.0 PCI bridge: IBM Device 04c1
0003:01:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
0004:00:00.0 PCI bridge: IBM Device 04c1
0004:01:00.0 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0004:01:00.1 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0004:01:00.2 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0005:00:00.0 PCI bridge: IBM Device 04c1
0005:01:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 04)
0005:02:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41)
# dmesg|grep -i nvme
[    2.991038] nvme nvme0: pci function 0001:01:00.0
[    2.991088] nvme 0001:01:00.0: enabling device (0140 -> 0142)
[    3.121799] nvme nvme0: Identify Controller failed (19)
[    3.121802] nvme nvme0: Removing after probe failure status: -5
# uname -a
Linux skiroot 5.3.7-openpower1 #2 SMP Sat Dec 14 09:06:20 PST 2019 ppc64le GNU/Linux

If for some reason the device didn’t show up in lspci, then I’d look at the skiboot firmware log, which is /sys/firmware/opal/msglog.

Looking at upstream stable kernel patches, it seems like 5.3.8 has an interesting-looking patch when you realize that ppc64le uses a 64k page size:

commit efac0f186ea654e8389f5017c7f643ef48cb4b93
Author: Kevin Hao <haokexin@gmail.com>
Date:   Fri Oct 18 10:53:14 2019 +0800

    nvme-pci: Set the prp2 correctly when using more than 4k page
    
    commit a4f40484e7f1dff56bb9f286cc59ffa36e0259eb upstream.
    
    In the current code, the nvme is using a fixed 4k PRP entry size,
    but if the kernel use a page size which is more than 4k, we should
    consider the situation that the bv_offset may be larger than the
    dev->ctrl.page_size. Otherwise we may miss setting the prp2 and then
    cause the command can't be executed correctly.
    
    Fixes: dff824b2aadb ("nvme-pci: optimize mapping of small single segment requests")
    Cc: stable@vger.kernel.org
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Kevin Hao <haokexin@gmail.com>
    Signed-off-by: Keith Busch <kbusch@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
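To see why a 64k page size matters here, it helps to do the arithmetic. What follows is a simplified Python model of the check the commit message describes. It is not the kernel code, and the function names are invented for illustration. The assumption is that the driver decides whether a second PRP entry is needed by comparing the request length against the space left in the first 4k controller page, using C unsigned arithmetic.

```python
CTRL_PAGE_SIZE = 4096   # NVMe controller page size ("fixed 4k PRP entry size")
U32_MASK = 0xFFFFFFFF   # emulate wrap-around of a C 'unsigned int'


def needs_prp2_old(bv_offset, bv_len):
    """Simplified pre-fix check: the offset into the kernel page is used as-is."""
    first_prp_len = (CTRL_PAGE_SIZE - bv_offset) & U32_MASK
    return bv_len > first_prp_len


def needs_prp2_fixed(bv_offset, bv_len):
    """Simplified post-fix check: the offset is reduced to the 4k controller page."""
    offset = bv_offset & (CTRL_PAGE_SIZE - 1)
    first_prp_len = CTRL_PAGE_SIZE - offset
    return bv_len > first_prp_len


# With 4k kernel pages the offset is always below 4096 and both checks agree.
print(needs_prp2_old(3584, 1024), needs_prp2_fixed(3584, 1024))

# With 64k kernel pages the offset can exceed 4096. This 1k request starts
# 512 bytes before a 4k boundary, so it really does cross into a second page.
print(needs_prp2_old(2 * 4096 + 3584, 1024), needs_prp2_fixed(2 * 4096 + 3584, 1024))
```

In the second case the old subtraction wraps around to a huge value, so the comparison says no second entry is needed even though the request crosses a 4k boundary. That matches the "miss setting the prp2" wording in the commit message.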

So, time to go try 5.3.8. My yaks are getting quite smooth.

Oh, and when you’re done with your temporary firmware, either fiddle with mboxctl or restart the systemd service for it, or reboot your BMC or… well, I gotta leave you something to work out on your own :)

Building OpenPOWER firmware on Fedora 31

One of the challenges with Fedora 31 is that /usr/bin/python is now Python 3 rather than Python 2. Just about every python script in existence relies on /usr/bin/python being Python 2 and not anything else. I can’t really recall, but this probably happened with the 1.5 to 2 transition as well (although IIRC that was less breaking).

What this means is that for projects that are half-way through converting to python 3, everything breaks.

op-build is one of these projects.
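When hunting down which scripts in a tree are affected, it can help to sort shebang lines into "explicitly Python 2", "explicitly Python 3" and "ambiguous". A small sketch follows. The function name and categories are my own for illustration and are not part of op-build.

```python
import re


def classify_shebang(first_line):
    """Classify a script's first line by which Python interpreter it asks for."""
    if not first_line.startswith("#!"):
        return "none"
    parts = first_line[2:].split()
    if not parts:
        return "none"
    # Keep only the program name, e.g. "/usr/bin/python2.7" -> "python2.7".
    interpreter = parts[0].rsplit("/", 1)[-1]
    # "#!/usr/bin/env python" names the real interpreter as the next word.
    if interpreter == "env" and len(parts) > 1:
        interpreter = parts[1]
    if re.fullmatch(r"python2(\.\d+)?", interpreter):
        return "python2"
    if re.fullmatch(r"python3(\.\d+)?", interpreter):
        return "python3"
    if interpreter == "python":
        # Meaning depends on the distro: Python 2 on older systems,
        # Python 3 on Fedora 31 as described above.
        return "ambiguous"
    return "other"


for line in ("#!/usr/bin/python", "#!/usr/bin/env python3", "#!/usr/bin/python2.7"):
    print(line, "->", classify_shebang(line))
```

Scripts that land in the "ambiguous" bucket are the ones that change behaviour when /usr/bin/python switches from Python 2 to Python 3.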

So, we need:

After all that, you can actually build a pnor image on Fedora 31. Even on Fedora 31 ppc64le, which is literally what I’ve just done.

December 13, 2019

Upstreaming Blackbird firmware (step 1: skiboot)

Now that I can actually boot the machine, I could test and send my patch upstream for Blackbird support in skiboot. One thing I noticed with the current firmware from Raptor is that the PCIe slot names were wrong. While a pretty minor point, it’s a bit funny that there’s only two slots and the names were wrong.

The PCIe slot names are used to call out the physical location of PCIe cards in the system, so if you, say, hit a bunch of errors, OS/firmware can say “It’s this card in the slot labeled BLAH on the board”.

With my patch, the slot table from skiboot is spat out looking like this:

[   64.296743001,5] PHB#0000:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..ff SLOT=SLOT1 PCIE 4.0 X16
[   64.296875483,5] PHB#0001:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..01 SLOT=SLOT2 PCIE 4.0 X8
[   64.297054197,5] PHB#0001:01:00.0 [EP  ] 8086 f1a8 R:03 C:010802 (  mass-storage) LOC_CODE=SLOT2 PCIE 4.0 X8
[   64.297285067,5] PHB#0002:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..01 SLOT=Builtin SATA
[   64.297411565,5] PHB#0002:01:00.0 [LGCY] 1b4b 9235 R:11 C:010601 (          sata) LOC_CODE=Builtin SATA
[   64.297554540,5] PHB#0003:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..01 SLOT=Builtin USB
[   64.297732049,5] PHB#0003:01:00.0 [EP  ] 104c 8241 R:02 C:0c0330 (      usb-xhci) LOC_CODE=Builtin USB
[   64.297848624,5] PHB#0004:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..01 SLOT=Builtin Ethernet
[   64.298026870,5] PHB#0004:01:00.0 [EP  ] 14e4 1657 R:01 C:020000 (      ethernet) LOC_CODE=Builtin Ethernet
[   64.298212291,5] PHB#0004:01:00.1 [EP  ] 14e4 1657 R:01 C:020000 (      ethernet) LOC_CODE=Builtin Ethernet
[   64.298424962,5] PHB#0004:01:00.2 [EP  ] 14e4 1657 R:01 C:020000 (      ethernet) LOC_CODE=Builtin Ethernet
[   64.298587848,5] PHB#0005:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..02 SLOT=BMC
[   64.298722540,5] PHB#0005:01:00.0 [ETOX] 1a03 1150 R:04 C:060400 B:02..02 LOC_CODE=BMC
[   64.298850009,5] PHB#0005:02:00.0 [PCID] 1a03 2000 R:41 C:030000 (           vga) LOC_CODE=BMC
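If you want to check the slot names without reading the table by eye, the lines are regular enough to pick apart with a regular expression. This is a rough Python sketch that only assumes the layout visible in the output above (PCI address, device kind in brackets, vendor and device IDs, then SLOT= or LOC_CODE=). The function name is hypothetical.

```python
import re

ENTRY_RE = re.compile(
    r"PHB#(?P<addr>\S+)\s+\[(?P<kind>[^\]]*)\]\s+"
    r"(?P<vendor>[0-9a-f]{4})\s+(?P<device>[0-9a-f]{4})\s+"
    r".*?(?:SLOT|LOC_CODE)=(?P<location>.*)$"
)


def parse_slot_table(text):
    """Return one dict per PHB line with its address, kind, IDs and location."""
    entries = []
    for line in text.splitlines():
        match = ENTRY_RE.search(line)
        if match:
            entries.append({key: value.strip() for key, value in match.groupdict().items()})
    return entries


# Two lines copied from the skiboot output above.
SAMPLE = """\
[   64.296743001,5] PHB#0000:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..ff SLOT=SLOT1 PCIE 4.0 X16
[   64.297054197,5] PHB#0001:01:00.0 [EP  ] 8086 f1a8 R:03 C:010802 (  mass-storage) LOC_CODE=SLOT2 PCIE 4.0 X8
"""

for entry in parse_slot_table(SAMPLE):
    print(entry["addr"], entry["kind"], entry["location"])
```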

If you want to give it a go, grab the patch, build skiboot, and flash it on. Alternatively, you can download a built skiboot here. To flash it, do this:

# Copy to your BMC for the Blackbird, using the file name the flash step expects
scp skiboot-v6.5-146-g376bed3f.lid.xz.stb root@blackbird:/tmp/skiboot.lid.xz.stb

# then, ssh to the BMC
ssh root@blackbird

# ensure the machine is off
obmcutil poweroff --wait

# Now, make a backup copy (remember to copy it off /tmp on the bmc)
pflash -P PAYLOAD -r /tmp/skiboot-backup

# and flash the new skiboot:
pflash -e -P PAYLOAD -p /tmp/skiboot.lid.xz.stb

# now, power on the box
obmcutil poweron

Black(bird) boots!

Well, after the half false start of not having RAM so really not being able to do much (yeah yeah, I hear you – I’m weak for not just running Linux in L3), my RAM arrived today. Putting the sticks in was easy (of course), although does not make for an exciting photo.

One DIMM in the Blackbird

After that, I SSH’d to the BMC and then did “obmcutil poweron” (as is traditional) and started looking at the console by connecting via SSH to port 2200 on the BMC. I was then greeted by the (by this time in my life rather familiar) Hostboot:

--== Welcome to Hostboot hostboot-3beba24/hbicore.bin ==--
   3.02902|secure|SecureROM valid - enabling functionality
   7.15613|Booting from SBE side 0 on master proc=00050000
   7.19697|ISTEP  6. 5 - host_init_fsi
   7.54226|ISTEP  6. 6 - host_set_ipl_parms
   8.06280|ISTEP  6. 7 - host_discover_targets
   9.19791|HWAS|PRESENT> DIMM[03]=8080000000000000
   9.19792|HWAS|PRESENT> Proc[05]=8000000000000000
   9.19794|HWAS|PRESENT> Core[07]=1511540000000000
   9.55305|ISTEP  6. 8 - host_update_master_tpm
   9.60521|SECURE|Security Access Bit> 0x0000000000000000
   9.60522|SECURE|Secure Mode Disable (via Jumper)> 0x8000000000000000
   9.63093|ISTEP  6. 9 - host_gard
   9.89867|HWAS|Blocking Speculative Deconfig
   9.90128|HWAS|FUNCTIONAL> DIMM[03]=8080000000000000
   9.90129|HWAS|FUNCTIONAL> Proc[05]=8000000000000000
   9.90130|HWAS|FUNCTIONAL> Core[07]=1511540000000000
   9.90329|ISTEP  6.11 - host_start_occ_xstop_handler
  11.19092|ISTEP  6.12 - host_voltage_config
  11.30246|ISTEP  7. 1 - mss_attr_cleanup
  12.61924|ISTEP  7. 2 - mss_volt
  12.92705|ISTEP  7. 3 - mss_freq
  13.67475|ISTEP  7. 4 - mss_eff_config
  14.95827|ISTEP  7. 5 - mss_attr_update
  14.97307|ISTEP  8. 1 - host_slave_sbe_config
  15.05372|ISTEP  8. 2 - host_setup_sbe
  15.10258|ISTEP  8. 3 - host_cbs_start
  15.10381|ISTEP  8. 4 - proc_check_slave_sbe_seeprom_complete
  15.11144|ISTEP  8. 5 - host_attnlisten_proc
  15.11213|ISTEP  8. 6 - host_p9_fbc_eff_config
  15.13552|ISTEP  8. 7 - host_p9_eff_config_links
  15.20087|ISTEP  8. 8 - proc_attr_update
  15.20191|ISTEP  8. 9 - proc_chiplet_fabric_scominit
  15.21891|ISTEP  8.10 - proc_xbus_scominit
  15.22929|ISTEP  8.11 - proc_xbus_enable_ridi
  15.24717|ISTEP  8.12 - host_set_voltages
  15.26620|ISTEP  9. 1 - fabric_erepair
  15.42123|ISTEP  9. 2 - fabric_io_dccal
  15.42436|ISTEP  9. 3 - fabric_pre_trainadv
  15.42887|ISTEP  9. 4 - fabric_io_run_training
  15.43207|ISTEP  9. 5 - fabric_post_trainadv
  15.44893|ISTEP  9. 6 - proc_smp_link_layer
  15.45454|ISTEP  9. 7 - proc_fab_iovalid
  15.87126|ISTEP  9. 8 - host_fbc_eff_config_aggregate
  15.89174|ISTEP 10. 1 - proc_build_smp
  16.54194|ISTEP 10. 2 - host_slave_sbe_update
  18.63876|sbe|System Performing SBE Update for PROC 0, side 0
  41.69727|sbe|System Rebooting To Complete SBE Update Process
  41.72189|IPMI: Initiate power cycle
  42.40652|IPMI: shutdown complete

The first IPL updated the Self Boot Engine firmware on the chip, so it automatically applied the new firmware and rebooted to finish applying it. This is perfectly normal, it just shows itself as a longer boot time. Booting continues:

--== Welcome to Hostboot hostboot-3beba24/hbicore.bin ==--
   3.02810|secure|SecureROM valid - enabling functionality
   6.07331|Booting from SBE side 0 on master proc=00050000
   6.11485|ISTEP  6. 5 - host_init_fsi
   6.60361|ISTEP  6. 6 - host_set_ipl_parms
   6.98640|ISTEP  6. 7 - host_discover_targets
   7.53975|HWAS|PRESENT> DIMM[03]=8080000000000000
   7.53976|HWAS|PRESENT> Proc[05]=8000000000000000
   7.53977|HWAS|PRESENT> Core[07]=1511540000000000
   7.79123|ISTEP  6. 8 - host_update_master_tpm
   7.79263|SECURE|Security Access Bit> 0x0000000000000000
   7.79264|SECURE|Secure Mode Disable (via Jumper)> 0x8000000000000000
   7.82684|ISTEP  6. 9 - host_gard
   8.26609|HWAS|Blocking Speculative Deconfig
   8.26865|HWAS|FUNCTIONAL> DIMM[03]=8080000000000000
   8.26866|HWAS|FUNCTIONAL> Proc[05]=8000000000000000
   8.26867|HWAS|FUNCTIONAL> Core[07]=1511540000000000
   8.27142|ISTEP  6.11 - host_start_occ_xstop_handler
   9.69606|ISTEP  6.12 - host_voltage_config
   9.81183|ISTEP  7. 1 - mss_attr_cleanup
  10.95130|ISTEP  7. 2 - mss_volt
  11.39875|ISTEP  7. 3 - mss_freq
  12.15655|ISTEP  7. 4 - mss_eff_config
  13.63504|ISTEP  7. 5 - mss_attr_update
  13.65162|ISTEP  8. 1 - host_slave_sbe_config
  13.78039|ISTEP  8. 2 - host_setup_sbe
  13.78143|ISTEP  8. 3 - host_cbs_start
  13.78247|ISTEP  8. 4 - proc_check_slave_sbe_seeprom_complete
  13.79015|ISTEP  8. 5 - host_attnlisten_proc
  13.79114|ISTEP  8. 6 - host_p9_fbc_eff_config
  13.79734|ISTEP  8. 7 - host_p9_eff_config_links
  13.85128|ISTEP  8. 8 - proc_attr_update
  13.85783|ISTEP  8. 9 - proc_chiplet_fabric_scominit
  13.87991|ISTEP  8.10 - proc_xbus_scominit
  13.89056|ISTEP  8.11 - proc_xbus_enable_ridi
  13.91122|ISTEP  8.12 - host_set_voltages
  13.93077|ISTEP  9. 1 - fabric_erepair
  14.05235|ISTEP  9. 2 - fabric_io_dccal
  14.13131|ISTEP  9. 3 - fabric_pre_trainadv
  14.13616|ISTEP  9. 4 - fabric_io_run_training
  14.13934|ISTEP  9. 5 - fabric_post_trainadv
  14.14087|ISTEP  9. 6 - proc_smp_link_layer
  14.14656|ISTEP  9. 7 - proc_fab_iovalid
  14.59454|ISTEP  9. 8 - host_fbc_eff_config_aggregate
  14.61811|ISTEP 10. 1 - proc_build_smp
  15.24074|ISTEP 10. 2 - host_slave_sbe_update
  17.16022|sbe|System Performing SBE Update for PROC 0, side 1
  40.16808|ISTEP 10. 4 - proc_cen_ref_clk_enable
  40.27866|ISTEP 10. 5 - proc_enable_osclite
  40.31297|ISTEP 10. 6 - proc_chiplet_scominit
  40.55805|ISTEP 10. 7 - proc_abus_scominit
  40.57942|ISTEP 10. 8 - proc_obus_scominit
  40.58078|ISTEP 10. 9 - proc_npu_scominit
  40.60704|ISTEP 10.10 - proc_pcie_scominit
  40.66572|ISTEP 10.11 - proc_scomoverride_chiplets
  40.66874|ISTEP 10.12 - proc_chiplet_enable_ridi
  40.68407|ISTEP 10.13 - host_rng_bist
  40.75548|ISTEP 10.14 - host_update_redundant_tpm
  40.75785|ISTEP 11. 1 - host_prd_hwreconfig
  41.15067|ISTEP 11. 2 - cen_tp_chiplet_init1
  41.15299|ISTEP 11. 3 - cen_pll_initf
  41.15544|ISTEP 11. 4 - cen_pll_setup
  41.18530|ISTEP 11. 5 - cen_tp_chiplet_init2
  41.18762|ISTEP 11. 6 - cen_tp_arrayinit
  41.19050|ISTEP 11. 7 - cen_tp_chiplet_init3
  41.19286|ISTEP 11. 8 - cen_chiplet_init
  41.19553|ISTEP 11. 9 - cen_arrayinit
  41.19986|ISTEP 11.10 - cen_initf
  41.20215|ISTEP 11.11 - cen_do_manual_inits
  41.20497|ISTEP 11.12 - cen_startclocks
  41.20802|ISTEP 11.13 - cen_scominits
  41.21171|ISTEP 12. 1 - mss_getecid
  42.25709|ISTEP 12. 2 - dmi_attr_update
  42.30382|ISTEP 12. 3 - proc_dmi_scominit
  42.32572|ISTEP 12. 4 - cen_dmi_scominit
  42.32798|ISTEP 12. 5 - dmi_erepair
  42.35000|ISTEP 12. 6 - dmi_io_dccal
  42.35218|ISTEP 12. 7 - dmi_pre_trainadv
  42.35489|ISTEP 12. 8 - dmi_io_run_training
  42.37076|ISTEP 12. 9 - dmi_post_trainadv
  42.39541|ISTEP 12.10 - proc_cen_framelock
  42.40772|ISTEP 12.11 - host_startprd_dmi
  42.41974|ISTEP 12.12 - host_attnlisten_memb
  42.44506|ISTEP 12.13 - cen_set_inband_addr
  42.58832|ISTEP 13. 1 - host_disable_memvolt
  43.67808|ISTEP 13. 2 - mem_pll_reset
  43.75070|ISTEP 13. 3 - mem_pll_initf
  43.85043|ISTEP 13. 4 - mem_pll_setup
  43.87372|ISTEP 13. 6 - mem_startclocks
  43.88970|ISTEP 13. 7 - host_enable_memvolt
  43.89177|ISTEP 13. 8 - mss_scominit
  45.10013|ISTEP 13. 9 - mss_ddr_phy_reset
  45.38105|ISTEP 13.10 - mss_draminit
  45.95447|ISTEP 13.11 - mss_draminit_training
  47.20963|ISTEP 13.12 - mss_draminit_trainadv
  47.32161|ISTEP 13.13 - mss_draminit_mc
  47.49186|ISTEP 14. 1 - mss_memdiag
  69.53224|ISTEP 14. 2 - mss_thermal_init
  69.66891|ISTEP 14. 3 - proc_pcie_config
  69.71959|ISTEP 14. 4 - mss_power_cleanup
  69.72385|ISTEP 14. 5 - proc_setup_bars
  69.83889|ISTEP 14. 6 - proc_htm_setup
  69.84748|ISTEP 14. 7 - proc_exit_cache_contained
  69.89430|ISTEP 15. 1 - host_build_stop_image
  73.08679|ISTEP 15. 2 - proc_set_pba_homer_bar
  73.12352|ISTEP 15. 3 - host_establish_ex_chiplet
  73.13714|ISTEP 15. 4 - host_start_stop_engine
  73.19059|ISTEP 16. 1 - host_activate_master
  74.44590|ISTEP 16. 2 - host_activate_slave_cores
  74.53820|ISTEP 16. 3 - host_secure_rng
  74.54651|ISTEP 16. 4 - mss_scrub
  74.56565|ISTEP 16. 5 - host_load_io_ppe
  74.78752|ISTEP 16. 6 - host_ipl_complete
  75.50085|ISTEP 18.11 - proc_tod_setup
  75.94190|ISTEP 18.12 - proc_tod_init
  75.97575|ISTEP 20. 1 - host_load_payload
  77.12340|ISTEP 20. 2 - host_load_hdat
  78.05195|ISTEP 21. 1 - host_runtime_setup
  83.87001|htmgt|OCCs are now running in ACTIVE state
  89.72649|ISTEP 21. 2 - host_verify_hdat
  89.77252|ISTEP 21. 3 - host_start_payload
 [   90.400516933,5] OPAL skiboot-c81f9d6 starting…
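With a full IPL log like that, a natural question is where the boot time goes. One rough way to answer it is to treat the gap between one ISTEP line and the next as the duration of the first step. The Python sketch below does that. The function name is my own, and time spent on non-ISTEP lines (such as the sbe messages) is simply attributed to the preceding step.

```python
import re

# Matches lines such as "  47.49186|ISTEP 14. 1 - mss_memdiag".
ISTEP_RE = re.compile(r"^\s*(\d+\.\d+)\|ISTEP\s+(\d+)\.\s*(\d+)\s+-\s+(\S+)")


def istep_durations(text):
    """Return (seconds, 'major.minor', name) tuples, slowest step first.

    The last step in the text has no following ISTEP line, so it is omitted.
    """
    steps = []
    for line in text.splitlines():
        match = ISTEP_RE.match(line)
        if match:
            number = f"{match.group(2)}.{match.group(3)}"
            steps.append((float(match.group(1)), number, match.group(4)))
    durations = [
        (next_start - start, number, name)
        for (start, number, name), (next_start, _, _) in zip(steps, steps[1:])
    ]
    return sorted(durations, reverse=True)


# A few consecutive lines copied from the log above.
SAMPLE = """\
  45.95447|ISTEP 13.11 - mss_draminit_training
  47.20963|ISTEP 13.12 - mss_draminit_trainadv
  47.32161|ISTEP 13.13 - mss_draminit_mc
  47.49186|ISTEP 14. 1 - mss_memdiag
  69.53224|ISTEP 14. 2 - mss_thermal_init
  69.66891|ISTEP 14. 3 - proc_pcie_config
"""

for seconds, number, name in istep_durations(SAMPLE)[:3]:
    print(f"{seconds:7.2f} s  ISTEP {number} {name}")
```

On this excerpt mss_memdiag dominates at about 22 seconds. Across the whole log, the other large gap is the SBE update during istep 10.2.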

The rest of the skiboot log was also spat out, and then the familiar Petitboot screen:

Welcome to Petitboot!

It lives! I even had a bit of a look at the sensors to see power consumption and temperatures. All looks good:

ipmitool sdr|grep -v ns
 occ0             | 0x00              | ok
 occ1             | 0x00              | ok
 p0_core3_temp    | 51 degrees C      | ok
 p0_core5_temp    | 49 degrees C      | ok
 p0_core7_temp    | 50 degrees C      | ok
 p0_core11_temp   | 49 degrees C      | ok
 p0_core15_temp   | 50 degrees C      | ok
 p0_core17_temp   | 50 degrees C      | ok
 p0_core19_temp   | 50 degrees C      | ok
 p0_core21_temp   | 50 degrees C      | ok
 dimm0_temp       | 36 degrees C      | ok
 dimm4_temp       | 39 degrees C      | ok
 fan0             | 1300 RPM          | ok
 fan1             | 1200 RPM          | ok
 fan2             | 1000 RPM          | ok
 p0_power         | 60 Watts          | ok
 p0_vdd_power     | 31 Watts          | ok
 p0_vdn_power     | 10 Watts          | ok
 cpu_1_ambient    | 30.90 degrees C   | ok
 pcie             | 27 degrees C      | ok
 ambient          | 25.40 degrees C   | ok
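The ipmitool output is a simple pipe-separated table, so it is easy to post-process if you want to watch a particular reading. As a sketch, here is how the warmest core could be picked out in Python. The helper names are hypothetical, and the parsing assumes the three-column "name | reading | status" layout shown above.

```python
def parse_sdr(text):
    """Parse 'name | reading | status' rows into (name, reading, status) tuples."""
    rows = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) == 3:  # skips the command line itself and blank lines
            rows.append(tuple(fields))
    return rows


def hottest_core(rows):
    """Return (name, degrees) for the warmest 'p0_core*_temp' sensor, or None."""
    cores = [
        (name, float(reading.split()[0]))
        for name, reading, status in rows
        if name.startswith("p0_core") and name.endswith("_temp") and status == "ok"
    ]
    return max(cores, key=lambda item: item[1]) if cores else None


# A few rows copied from the output above.
SAMPLE = """\
ipmitool sdr|grep -v ns
 p0_core3_temp    | 51 degrees C      | ok
 p0_core5_temp    | 49 degrees C      | ok
 dimm0_temp       | 36 degrees C      | ok
 p0_power         | 60 Watts          | ok
"""

print(hottest_core(parse_sdr(SAMPLE)))
```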

Next up? I guess I should install an OS.

Coming to grips with Kubernetes in 2020: podcasts

It has become clear to me that it is time to care about Kubernetes more. I’m sure many people have cared for ages, but the things I want to build at the moment are starting to be more container based now that I am thinking more at the application layer than the cloud infrastructure layer. So how to do that? I thought I’d write down some notes on what has worked (or not) for me, in the hope it will help others. In this post, podcasts.

I thought podcasts would be an interesting way to get started with some nice overviews. This is especially true because I’m already a pretty heavy podcast user, so it was easy to slot into my existing routine. Unfortunately this hasn’t really worked out. I started with the podctl podcast, but they only ever talk about Red Hat stuff. It is very rare for a guest to not be a Red Hat employee for example. The presenters of this podcast seem to also really dislike OpenStack for reasons they never explain, which is annoying.

Then I figured maybe the Google Kubernetes podcast would be better, but it often lacks the depth I am interested in.

I am yet to find a good podcast which deep dives into technology instead of just talking about what is in the latest release. So maybe these podcasts are useful if you’re interested in what dropped in the most recent release, but they’re neither a good nor a systematic way to get introduced to Kubernetes.

That said, I only just discovered the TGI Kubernetes YouTube channel yesterday. It is not really what I wanted in a podcast given it’s a video blog, but I think it has prospects to be interesting. I will update this post when I’ve had a chance to check it out in more depth.

Have you found a good Kubernetes podcast? Am I being wildly unfair?

December 12, 2019

Audiobooks – October 2019

The Story of the British Isles in 100 Places by Neil Oliver

Covers what you’d expect with a good attempt not just to hit the “history 101” places. Author has an accent that takes a while to get used to. 7/10

Death’s End – Cixin Liu

3rd in Trilogy wrapping things mostly up. Just a few characters so easy to keep track of them. If you liked the previous books you’ll like this one. 7/10

Building the Cycling City: The Dutch Blueprint for Urban Vitality by Melissa & Chris Bruntlett

Talking about Dutch Cycling culture. Compares 5 different cities (some car orientated) and how they differ in their cycling journey. 7/10

Scrappy Little Nobody by Anna Kendrick

A general memoir by the actress. A bit disjointed & unsystematic and by no means a tell-all. A few good stories sprinkled in. 6/10

The $100 Startup by Chris Guillebeau

Lots of case studies of businesses built off relatively little capital (and usually staying small). Plenty of good advice although lists don’t translate well in audio. 7/10

Atomic Adventures: Secret Islands, Forgotten N-Rays, and Isotopic Murder-A Journey into the Wild World of Nuclear Science by James Mahaffey

A bunch of really good stories from the Atomic age (not just the usual ones) including a view from inside of the Cold Fusion fiasco. 8/10

December 10, 2019

Looking at the state of Blackbird firmware

Having been somewhat involved in OpenPOWER firmware, I have a bunch of experience and opinions on maintaining firmware trees for products, what working with upstream looks like and all that.

So, with my new Blackbird system I decided to take a bit of a look as to what the firmware situation was like.

There’s two main parts of firmware: BMC and Host. The BMC firmware runs purely on the ASPEED AST2500 and is based on OpenBMC while the host firmware is what runs on the POWER9 and is based off of OpenPOWER Firmware as assembled by op-build.

Initial impressions on the BMC is that there doesn’t seem to be any web based UI for it, which is kind of disappointing, as the Web UI being developed upstream has some nice qualities, and I’d say I even enjoyed using it when it was built into BMC firmware for systems we had when I was at IBM.

Looking at the git trees, the raptor-v1.00 tag is OpenBMC 2.7.0-dev-533-g386e5602e while current master is 2.8.0-dev-960-g10f7830bd. The spot where it split off was 2.7.0-dev-430-g7443ee80b, from April 2019, so it’s not too old, but I’m also not convinced there haven’t been some security patches since then.

I’m not sure if any of the OpenBMC code is upstream, I haven’t looked.

Unfortunately, none of the host firmware is upstream.

On the host firmware side, v2.3-rc2-67-ga6a5f142 is the Raptor tag, and that compares with current master of v2.4-305-g54d8daf4, the place where Raptor forked was v2.3-rc2-9-g7b556015, again in April of 2019. Considering there was an upstream release in May of 2019 (v2.3), and again in July (v2.4), it could easily have made it into an upstream release.
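Those version strings are in "git describe" form (tag, number of commits since the tag, abbreviated hash), so the size of the out-of-tree delta can be read straight off them. The sketch below does the subtraction in Python. It assumes both strings are described relative to the same tag and that the fork point is an ancestor of the vendor tag. The function names are invented for illustration.

```python
import re

# "<tag>-<commits since tag>-g<abbreviated hash>"
DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<sha>[0-9a-f]+)$")


def parse_describe(version):
    """Split a git-describe style string into (tag, commit count, hash)."""
    match = DESCRIBE_RE.match(version)
    if match is None:
        raise ValueError(f"not a git describe string: {version!r}")
    return match.group("tag"), int(match.group("count")), match.group("sha")


def commits_since_fork(fork_point, branch_tip):
    """Number of commits on the branch after the fork point (same base tag only)."""
    fork_tag, fork_count, _ = parse_describe(fork_point)
    tip_tag, tip_count, _ = parse_describe(branch_tip)
    if fork_tag != tip_tag:
        raise ValueError("versions are relative to different tags")
    return tip_count - fork_count


# OpenBMC: fork point versus the raptor-v1.00 tag
print(commits_since_fork("2.7.0-dev-430-g7443ee80b", "2.7.0-dev-533-g386e5602e"))
# op-build: fork point versus the Raptor tag
print(commits_since_fork("v2.3-rc2-9-g7b556015", "v2.3-rc2-67-ga6a5f142"))
```

Under those assumptions the Raptor trees carry about 103 commits on top of the OpenBMC fork point and about 58 on top of the op-build one. Comparing against current master needs git itself, since master is described relative to a different tag.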

Unfortunately, there doesn’t seem to have been an upstream op-build release since v2.4 back in July (when I made it shortly before leaving IBM).

The skiboot component of host firmware has had an upstream release since I left (v6.5 in mid-August 2019), so the (rather trivial) platform support could have easily made it. I have a cleaned up and ready to upstream patch for it, I just need some DIMMs to actually test with before I send the patch.

As the current firmware situation stands, producing another build with updated upstream code is tricky due to the out-of-tree nature of the Blackbird patches, and a straight “git merge” is probably doable by some people, but not everybody.

On my TODO list is to get all the code into a state I can upstream it, assess vulnerability to CVE-2019-6260, and work out how I want to make it do Secure Boot (something that isn’t in upstream firmware yet, and currently would require a TPM, which I do not have).

Blackbird (singing in the dead of night..)

Way back when Raptor Computer Systems was doing pre-orders for the microATX Blackbird POWER9 system, I put in a pre-order. Since then, I’ve had a few life changes (such as moving to the US and starting to work for Amazon rather than IBM), but I’ve finally gone and done (most of) the setup for my own POWER9 system on (or under) my desk.

An 8 core POWER9 CPU, in bubble wrap and plastic packaging.

Everything came in a big brown box, all rather well packed. I had the board, CPU, heatsink assembly and the special tool to attach the heatsink to the board. Although unique to POWER9, the heatsink/fan assembly was one of the easier ones I’ve ever attached to a board.

The board itself looks pretty much as you’d expect – there’s a big spot for the CPU, a couple of PCI slots, a couple of DIMM slots and some SATA connectors.

The bits that are a bit unusual for a micro-ATX board are the big space reserved for FlexVer, the ASPEED BMC chip and the socketed flash. FlexVer is something I’m not ever going to use, and I wish there was an on-board M.2 SSD slot instead, even if it was just PCIe. Having to sacrifice a PCIe slot just for an SSD is kind of a bummer.

The Blackbird POWER9 board
The POWER9 chip in socket

One annoying thing is my DIMMs are taking their sweet time in getting here, so I couldn’t actually populate the board with any memory.

Even without memory though, you can start powering it on and see that everything else works okay (i.e. it’s not completely boned). So, even without DIMMs, I could plug it in, and observe the Hostboot firmware complaining about insufficient hardware to IPL the box.

It Lives!

Yep, out the console (via ssh) you clearly see where things fail:

--== Welcome to Hostboot hostboot-3beba24/hbicore.bin ==--

  3.03104|secure|SecureROM valid - enabling functionality
  6.67619|Booting from SBE side 0 on master proc=00050000
  6.85100|ISTEP  6. 5 - host_init_fsi
  7.23753|ISTEP  6. 6 - host_set_ipl_parms
  7.71759|ISTEP  6. 7 - host_discover_targets
 11.34738|HWAS|PRESENT> Proc[05]=8000000000000000
 11.34739|HWAS|PRESENT> Core[07]=1511540000000000
 11.69077|ISTEP  6. 8 - host_update_master_tpm
 11.73787|SECURE|Security Access Bit> 0x0000000000000000
 11.73787|SECURE|Secure Mode Disable (via Jumper)> 0x8000000000000000
 11.76276|ISTEP  6. 9 - host_gard
 11.96654|HWAS|FUNCTIONAL> Proc[05]=8000000000000000
 11.96655|HWAS|FUNCTIONAL> Core[07]=1511540000000000
 12.07554|================================================
 12.07554|Error reported by hwas (0x0C00) PLID 0x90000007
 12.10289|  checkMinimumHardware found no functional dimm cards.
 12.10290|  ModuleId   0x03 MOD_CHECK_MIN_HW
 12.10291|  ReasonCode 0x0c06 RC_SYSAVAIL_NO_MEMORY_FUNC
 12.10292|  UserData1  HUID of node : 0x0002000000000000
 12.10293|  UserData2  number of present, non-functional dimms : 0x0000000000000000
 12.10294|------------------------------------------------
 12.10417|  Callout type             : Procedure Callout
 12.10417|  Procedure                : EPUB_PRC_FIND_DECONFIGURED_PART
 12.10418|  Priority                 : SRCI_PRIORITY_HIGH
 12.10419|------------------------------------------------
 12.10420|  Hostboot Build ID: hostboot-3beba24/hbicore.bin
 12.10421|================================================
 12.51718|================================================
 12.51719|Error reported by hwas (0x0C00) PLID 0x90000007
 12.51720|  Insufficient hardware to continue.
 12.51721|  ModuleId   0x03 MOD_CHECK_MIN_HW
 12.51722|  ReasonCode 0x0c04 RC_SYSAVAIL_INSUFFICIENT_HW
 12.54457|  UserData1   : 0x0000000000000000
 12.54458|  UserData2   : 0x0000000000000000
 12.54458|------------------------------------------------
 12.54459|  Callout type             : Procedure Callout
 12.54460|  Procedure                : EPUB_PRC_FIND_DECONFIGURED_PART
 12.54461|  Priority                 : SRCI_PRIORITY_HIGH
 12.54462|------------------------------------------------
 12.54462|  Hostboot Build ID: hostboot-3beba24/hbicore.bin
 12.54463|================================================
 12.73660|System shutting down with error status 0x90000007
 12.75545|================================================
 12.75546|Error reported by istep (0x1700) PLID 0x90000007
 12.77991|  IStep failed, see other log(s) with the same PLID for reason.
 12.77992|  ModuleId   0x01 MOD_REPORTING_ERROR
 12.77993|  ReasonCode 0x1703 RC_FAILURE
 12.77994|  UserData1  eid of first error : 0x9000000800000c04
 12.77995|  UserData2  Reason code of first error : 0x0000000100000609
 12.77996|------------------------------------------------
 12.77996|  host_gard
 12.77997|------------------------------------------------
 12.77998|  Callout type             : Procedure Callout
 12.77998|  Procedure                : EPUB_PRC_HB_CODE
 12.77999|  Priority                 : SRCI_PRIORITY_LOW
 12.78000|------------------------------------------------
 12.78001|  Hostboot Build ID: hostboot-3beba24/hbicore.bin
 12.78002|================================================

Looking forward to getting some DIMMs to show/share more.

December 09, 2019

systemd-nspawn and Private Networking

Currently there’s two things I want to do with my PC at the same time, one is watching streaming services like ABC iView (which won’t run from non-Australian IP addresses) and another is torrenting over a VPN. I had considered doing something ugly with iptables to try and get routing done on a per-UID basis but that seemed too difficult. At the time I wasn’t aware of the ip rule add uidrange [1] option. So setting up a private networking namespace with a systemd-nspawn container seemed like a good idea.

Chroot Setup

For the chroot (which I use as a slang term for a copy of a Linux installation in a subdirectory) I used a btrfs subvol that’s a snapshot of the root subvol. The idea is that when I upgrade the root system I can just recreate the chroot with a new snapshot.

To get this working I created files in the root subvol which are used for the container.

I created a script like the following named /usr/local/sbin/container-sshd to launch the container. It sets up the networking and executes sshd. The systemd-nspawn program is designed to launch init but that’s not required, I prefer to just launch sshd so there’s only one running process in a container that’s not being actively used.

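#!/bin/bash

# restorecon commands only needed for SE Linux
/sbin/restorecon -R /dev
# sshd needs a writable /run/sshd directory
/bin/mount none -t tmpfs /run
/bin/mkdir -p /run/sshd
/sbin/restorecon -R /run /tmp
# Configure the container end of the private network link.
# NOTE: as written, 10.2.0.1 is outside 10.3.0.0/16; the gateway has to be
# on-link for host0, so adjust the addresses to match the host side.
/sbin/ifconfig host0 10.3.0.2 netmask 255.255.0.0
/sbin/route add default gw 10.2.0.1
# -D keeps sshd in the foreground as the container's only process
exec /usr/sbin/sshd -D -f /etc/ssh/sshd_torrent_config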

How to Launch It

To set up the container I used a command like “/usr/bin/systemd-nspawn -D /subvols/torrent -M torrent --bind=/home -n /usr/local/sbin/container-sshd”.

First I had tried the --network-ipvlan option which creates a new IP address on the same MAC address. That gave me an interface iv-br0 on the container that I could use normally (br0 being the bridge used in my workstation as its primary network interface). The IP address I assigned to that was in the same subnet as br0, but for some reason that’s unknown to me (maybe an interaction between bridging and network namespaces) I couldn’t access it from the host; I could only access it from other hosts on the network. I then tried the --network-macvlan option (to create a new MAC address for virtual networking), but that had the same problem with accessing the IP address from the local host outside the container as well as problems with MAC redirection to the primary MAC of the host (again maybe an interaction with bridging).

Then I tried just the “-n” option which gave it a private network interface. That created an interface named ve-torrent on the host side and one named host0 in the container. Using ifconfig and route to configure the interface in the container before launching sshd is easy. I haven’t yet determined a good way of configuring the host side of the private network interface automatically.
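One possible way of handling the host side is a small script that is run once the ve-torrent interface exists. The following is only a sketch under assumptions: the host address 10.3.0.1/16 is a hypothetical choice (the container's default gateway would need to point at whatever address is used here), and the run helper is an illustrative wrapper that only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: configure the host end of the veth pair created by "-n".
# The interface name comes from "-M torrent"; the address is hypothetical.
HOST_IF=ve-torrent
HOST_ADDR=10.3.0.1/16

# Print commands by default; set DRY_RUN=0 (as root) to really run them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Give the host side an address in the container's subnet and bring it up.
run ip addr add "$HOST_ADDR" dev "$HOST_IF"
run ip link set "$HOST_IF" up
# Allow the host to forward packets on behalf of the container.
run sysctl -w net.ipv4.ip_forward=1
```

The interface only appears after systemd-nspawn has started, so this has to run afterwards. According to the systemd-nspawn documentation, systemd-networkd ships a default 80-container-ve.network file that matches ve-* interfaces and configures them automatically, which may be a cleaner option on systems that use systemd-networkd.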

I had to use a bind for /home because /home is a subvol and therefore doesn’t get included in the container by default.

How it Works

Now when it’s running I can just “ssh -X” to the container and then run graphical programs that use the VPN while at the same time running graphical programs on the main host that don’t use the VPN.

Things To Do

Find out why --network-ipvlan and --network-macvlan don’t work with communication from the same host.

Find out why --network-macvlan gives errors about MAC redirection when pinging.

Determine a good way of setting up the host side after the systemd-nspawn program has run.

Find out if there are better ways of solving this problem; this way works but might not be ideal. Comments welcome.

December 06, 2019

Audiobooks – September 2019

Off the Rails: A Train Trip Through Life by Beppe Severgnini

A collection of train journey articles (written over about 20 years). A good selection of interesting and amusing pieces. 7/10

Exoplanets: Hidden Worlds and the Quest for Extraterrestrial Life by Donald Goldsmith

A history of the discovery of exoplanets, covering the different groups, techniques and rivalries. Good although I got the people mixed up sometimes. 7/10

Save the Cat! : The Last Book on Screenwriting You’ll Ever Need by Blake Snyder

A guide to screenwriting with a few stories and observations on movies thrown in. Good even if you are just reading it for fun. 7/10

Being Mortal: Medicine and What Matters in the End by Atul Gawande

A book about geriatric and end-of-life care and choices. Lots of points about how risking all for aggressive treatment is often a very bad idea. Thought-provoking. 9/10

Ancient Alexandria: The History and Legacy of Egypt’s Most Famous City by Charles River Editors

Just a two hour long overview of the history. Covered the basic stuff and maybe worth skimming before you hit something meatier. 6/10

Vulcan 607 by Rowland White

The story of the long-distance bombing raids during the Falklands War. Lots of details on the history of the Vulcan, the crews, background and the actual missions. 9/10

101 Secrets For Your Twenties by Paul Angone

I really can’t remember this book well. I think it was okay but serves me right for getting months behind on reviews. On list for completeness. ?/10


December 02, 2019

LUV December 2019 Main Meeting: A review of Linux and Open Source in 2019

Dec 3 2019 19:00
Dec 3 2019 21:00
Location: Kathleen Syme Library, 251 Faraday Street Carlton VIC 3053

NOTE: The library closes at 7pm so arrivals after that time will need to contact Andrew on (0421) 775 358 or any other attendee for admission.

Speaker:  Alexar Pendashteh

A review of Linux and Open Source in 2019

This is the last main meeting of LUV in 2019!
In this meeting we are going to have a look at what 2019 had for Linux and Open Source and have a peek into what's coming up.
This event will be mainly a social event, with group discussion followed by a dinner in a nearby restaurant or cafe!

Many of us like to go for dinner nearby in Lygon St. after the meeting. Please let us know if you'd like to join us!

Linux Users of Victoria is a subcommittee of Linux Australia.

December 3, 2019 - 19:00

November 25, 2019

Fixing Turris Omnia WiFi Quality

I was recently hoping to replace an aging proprietary router (upgraded to a Gargoyle FOSS firmware). After rejecting a popular brand with a disturbing GPL violation habit, I settled on the Turris Omnia router, built on free software. Overall, I was pretty satisfied with the fact that it is free and comes with automatic updates, but I noticed a problem with the WiFi. Specifically, the 5 GHz access point was okay but the 2.4 GHz was awful.

False lead

I initially thought that the 2.4 GHz radio wasn't working, but then I realized that putting my phone next to the router would allow it to connect and exchange data at a slow-but-steady rate. If I moved the phone more than 3-4 meters away though, it would disconnect for lack of signal. To be frank, the wireless performance was much worse than my original router, even though the wired performance was, as expected, amazing.

I looked on the official support forums and found this intriguing thread about interference between USB3 and 2.4 GHz radios. This sounded a lot like what I was experiencing (working radio but terrible signal/interference) and so I decided to see if I could move the radios around inside the unit, as suggested by the poster.

After opening the case however, I noticed that the radios were already laid out in the optimal way and that USB3 interference wasn't going to be the reason for my troubles.

Real problem

So I took a good look at the wiring and found that while the larger radio (2.4 / 5 GHz dual-bander) was connected to all three antennas, the smaller radio (2.4 GHz only) was connected to only 2 of the 3 antennas.

To make it possible for antennas 1 and 3 to carry the signal from both radios, a duplexer got inserted between the radios and the antenna.

On one side is the 2.4 antenna port and on the other side is the 5 GHz port.

Looking at the wiring though, it became clear that my 2.4 GHz radio was connected to the 5 GHz ports of the two duplexers and the 5 GHz radio was connected to the 2.4 GHz ports of the duplexers. This makes sense considering that I had okay 5 GHz performance (with one of the three chains connected to the right filter) and abysmal 2.4 GHz performance (with none of the two chains connected to the right filter).

Solution

Swapping the antenna connectors around completely fixed the problem. With the 2.4 GHz radio connected to the 2.4 side of the duplexer and the dual-bander connected to the 5 GHz side, I was able to get the performance I would expect from such a high-quality router.

Interestingly enough, I found the solution to this problem the same weekend as I passed my advanced amateur radio license exam. I guess that was a good way to put the course material into practice!

November 18, 2019

4K Monitors

A couple of years ago a relative who uses a Linux workstation I support bought a 4K (4096*2160 resolution) monitor. That meant that I had to get 4K working, which was 2 years of pain for me and probably not enough benefit for them to justify it. Recently I had the opportunity to buy some 4K monitors at a low enough price that it didn’t make sense to refuse so I got to experience it myself.

The Need for 4K

I’m getting older and my vision is decreasing as expected. I recently got new glasses and got a pair of reading glasses as a reduced ability to change focus is common as you get older. Unfortunately I made a mistake when requesting the focus distance for the reading glasses and they work well for phones, tablets, and books but not for laptops and desktop computers. Now I have the option of either spending a moderate amount of money to buy a new pair of reading glasses or just dealing with the fact that laptop/desktop use isn’t going to be as good until the next time I need new glasses (sometime in 2021).

I like having lots of terminal windows on my desktop. For common tasks I might need a few terminals open at a time and if I get interrupted in a task I like to leave the terminal windows for it open so I can easily go back to it. Having more 80*25 terminal windows on screen increases my productivity. My previous monitor was 2560*1440 which for years had allowed me to have a 4*4 array of non-overlapping terminal windows as well as another 8 or 9 overlapping ones if I needed more. 16 terminals allows me to ssh to lots of systems and edit lots of files in vi. Earlier this year I had found it difficult to read the font size that previously worked well for me so I had to use a larger font that meant that only 3*3 terminals would fit on my screen. Going from 16 non-overlapping windows and an optional 8 overlapping to 9 non-overlapping and an optional 6 overlapping is a significant difference. I could get a second monitor, and I won’t rule out doing so at some future time. But it’s not ideal.

When I got a 4K monitor working properly I found that I could go back to a smaller font that allowed 16 non-overlapping windows. So I got a real benefit from a 4K monitor!
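As a rough check of the arithmetic above, the following sketch computes how many pixels each character cell gets when a grid of 80*25 terminals fills the screen. It is a simplification under stated assumptions: window decorations, panels and gaps are ignored, and the cell_size function name is made up for this example. Pixel counts also say nothing about physical size, which depends on the dimensions of the monitor.

```shell
#!/bin/sh
# cell_size WIDTH HEIGHT GRID
# Pixels available per character cell when a GRID x GRID array of 80x25
# terminals fills a WIDTH x HEIGHT screen (decorations ignored).
cell_size() {
    LC_ALL=C awk -v w="$1" -v h="$2" -v g="$3" \
        'BEGIN { printf "%.1fx%.1f\n", w / g / 80, h / g / 25 }'
}

cell_size 2560 1440 4   # old monitor, 4*4 grid: 8.0x14.4
cell_size 2560 1440 3   # old monitor, 3*3 grid: 10.7x19.2
cell_size 4096 2160 4   # 4K monitor,  4*4 grid: 12.8x21.6
```

On these numbers a 4*4 grid on the 4K monitor has more pixels per character than the 3*3 grid on the 2560*1440 monitor, which is consistent with being able to return to 16 non-overlapping windows.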

Video Hardware

Version 1.0 of HDMI, released in 2002, only supports 1920*1080 (FullHD) resolution. Version 1.3, released in 2006, supports 2560*1440. Most of my collection of PCIe video cards have a maximum resolution of 1920*1080 in HDMI, so it seems that they only support HDMI 1.2 or earlier. When investigating this I wondered what version of PCIe they were using; the command “dmidecode |grep PCI” gives that information. It seems that at least one PCIe video card supports PCIe 2 (released in 2007) but not HDMI 1.3 (released in 2006).

Many video cards in my collection support 2560*1440 with DVI but only 1920*1080 with HDMI. As 4K monitors don’t support DVI input that meant that when initially using a 4K monitor I was running in 1920*1080 instead of 2560*1440 with my old monitor.

I found that one of my old video cards supports 4K resolution; it has a NVidia GT630 chipset (here’s the page with specifications for that chipset [1]). It seems that because I have a video card with 2G of RAM I have the “Kepler” variant which supports 4K resolution. I got the video card in question because it uses PCIe*8 and I had a workstation that only had PCIe*8 slots and I didn’t feel like cutting a card down to size (which is apparently possible but not recommended). It is also fanless (quiet) which is handy if you don’t need a lot of GPU power.

A couple of months ago I checked the cheap video cards at my favourite computer store (MSY) and none of the cheap ones supported 4K resolution. Now it seems that all the video cards they sell could support 4K; by “could” I mean that a Google search of the chipset says that it’s possible but of course some surrounding chips could fail to support it.

The GT630 card is great for text, but the combination of it with an i5-2500 CPU (rating 6353 according to cpubenchmark.net [3]) doesn’t allow playing Netflix full-screen, and playing 1920*1080 videos scaled to full-screen sometimes produces mplayer messages about the CPU being too slow. I don’t know how much of this is due to the CPU and how much is due to the graphics hardware.

When trying the same system with an ATI Radeon R7 260X/360 graphics card (16* PCIe and draws enough power to need a separate connection to the PSU) the Netflix playback appears better but mplayer seems no better.

I guess I need a new PC to play 1920*1080 video scaled to full-screen on a 4K monitor. No idea what hardware will be needed to play actual 4K video. Comments offering suggestions in this regard will be appreciated.

Software Configuration

For GNOME apps (which you will probably run even if like me you use KDE for your desktop) you need to run commands like the following to scale menus etc:

gsettings set org.gnome.settings-daemon.plugins.xsettings overrides "[{'Gdk/WindowScalingFactor', <2>}]"
gsettings set org.gnome.desktop.interface scaling-factor 2

For KDE run the System Settings app, go to Display and Monitor, then go to Displays and Scale Display to scale things.

The Arch Linux Wiki page on HiDPI [2] is good for information on how to make apps work with high DPI (or regular screens for people with poor vision).

Conclusion

4K displays are still rather painful, both in hardware and software configuration. For serious computer use it’s worth the hassle, but it doesn’t seem to be good for general use yet. 2560*1440 is pretty good and works with much more hardware and requires hardly any software configuration.

November 17, 2019

Use swap on NVMe to run more dev KVM guests, for when you run out of RAM

I often spin up a bunch of VMs for different reasons when doing dev work and unfortunately, as awesome as my little mini-itx Ryzen 9 dev box is, it only has 32GB RAM. Kernel Samepage Merging (KSM) definitely helps, however when I have half a dozen or so VMs running and chewing up RAM, the kernel’s Out Of Memory (OOM) killer will start executing them, like this:

[171242.719512] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/machine.slice/machine-qemu\x2d435\x2dtest\x2dvm\x2dcentos\x2d7\x2d00.scope,task=qemu-system-x86,pid=2785515,uid=107
[171242.719536] Out of memory: Killed process 2785515 (qemu-system-x86) total-vm:22450012kB, anon-rss:5177368kB, file-rss:0kB, shmem-rss:0kB
[171242.887700] oom_reaper: reaped process 2785515 (qemu-system-x86), now anon-rss:0kB, file-rss:68kB, shmem-rss:0kB

If I had more slots available (which I don’t) I could add more RAM, but that’s actually pretty expensive, plus I really like the little form factor. So, given it’s just dev work, a relatively cheap alternative is to buy an NVMe drive and add a swap file to it (or dedicate the whole drive). This is what I’ve done on my little dev box (actually I bought it with an NVMe drive so adding the swapfile came for free).

Of course the number of VMs you can run depends on the amount of RAM each VM actually needs for what you’re running on it. But whether I’m running 100 small VMs or 10 large ones, it doesn’t matter.

To demonstrate this, I spin up a bunch of CentOS 7 VMs at the same time and upgrade all packages. Without swap I could comfortably run half a dozen VMs, but more than that and they would start getting killed. With a 100GB swap file I am able to get about 40 going!
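For anyone wanting to try the same thing, creating and enabling a swap file might look roughly like this. It is a sketch under assumptions: the path /mnt/nvme/swapfile is a hypothetical placeholder for somewhere on the NVMe drive, and the run helper is an illustrative wrapper that only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: create and enable a swap file on an NVMe-backed filesystem.
# SWAPFILE is a hypothetical path; SIZE_GB matches the 100GB used above.
SWAPFILE=/mnt/nvme/swapfile
SIZE_GB=100

# Print commands by default; set DRY_RUN=0 (as root) to really run them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Write real blocks with dd; swapon can reject files that contain holes.
run dd if=/dev/zero of="$SWAPFILE" bs=1M count=$((SIZE_GB * 1024))
# Swap contents can include sensitive memory, so restrict access to root.
run chmod 600 "$SWAPFILE"
run mkswap "$SWAPFILE"
run swapon "$SWAPFILE"
```

Running swapon --show afterwards should list the new file. Some filesystems (btrfs, for example) need extra preparation before a file can be used for swap, so it is worth checking the documentation for the filesystem in use.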

Even with pages swapping in and out, I haven’t really noticed any performance decrease and there is negligible CPU time wasted waiting on disk I/O when using the machines normally.

The main advantage for me is that I can keep lots of VMs around (or spin up dozens) in order to test things, without having to juggle active VMs or hoping they won’t actually use their memory and have the kernel start killing my VMs. It’s not as seamless as extra RAM would be, but that’s expensive and I don’t have the slots for it anyway, so this seems like a good compromise.

November 16, 2019

DrupalSouth Diversity Scholarship Winner Announced

A few weeks ago we announced our diversity scholarship for DrupalSouth. Before announcing the winner I want to talk a bit about our experience doing this for the first time.

DrupalSouth is the largest Drupal event held in Oceania every year. It provides a great marketing opportunity for businesses wanting to promote their products and services to the Drupal community. Dave Hall Consulting planned to sponsor DrupalSouth to promote our new training business - Getting It Live training. By the time we got organised all of the (affordable) sponsorship opportunities had gone. After considering various opportunities around the event we felt the best way of investing a similar amount of money and giving something back to the community was through a diversity scholarship.

The community provided positive feedback about the initiative. However despite the enthusiasm and working our networks to get a range of applicants, we only ended up with 7 applicants. They were all guys. One applicant was from Australia, the rest were from overseas. About half the applicants dropped out when contacted to confirm that they could cover their own travel and visa expenses.

We are likely to offer other scholarships in the future. We will start earlier and explore other channels for promoting the program.

The scholarship has been awarded to Yogesh Ingale, from Mumbai, India. Over the last 3 years Yogesh has been employed by Tata Consultancy Services’ digital operations team as a DevOps Engineer. During this time he has worked with Drupal, Cloud Computing, Python and Web Technologies. Yogesh is interested in automating processes. When he’s not working, Yogesh likes to travel, automate things and write blog posts. Disclaimer: I know Yogesh through my work with one of my clients. Sometimes the Drupal community feels pretty small.

Congratulations Yogesh! I am looking forward to seeing you in Hobart.

If you want to meet Yogesh before DrupalSouth, we still have some seats available for our 2 day git training course that’s running on 25-26 November. If you won’t be in Hobart, contact us to discuss your training needs.

November 10, 2019

Database Tab Sweep

I miss a proper database related newsletter for busy people. There’s so much happening in the space, from tech, to licensing, and even usage. Anyway, quick tab sweep.

Paul Vallée (of Pythian fame) has been working on Tehama for some time, and now he gets to do it full time as a PE firm bought control of Pythian’s services business. Pythian has more than 350 employees and 250 customers, and has raised capital before. More at Ottawa’s Pythian spins out software platform Tehama.

Database leaks data on most of Ecuador’s citizens, including 6.7 million children – ElasticSearch.

Percona has launched Percona Distribution for PostgreSQL 11. This means they have servers for MySQL, MongoDB, and now PostgreSQL. Looks very much like a packaged server with tools from 3rd parties (source).

Severalnines has launched Backup Ninja, an agent-based SaaS service to backup popular databases in the cloud. Backup.Ninja (cool URL) supports MySQL (and variants), MongoDB, PostgreSQL and TimeScale. No pricing available, but it is free for 30 days.

Comparing Database Types: How Database Types Evolved to Meet Different Needs

New In PostgreSQL 12: Generated Columns – anyone doing a comparison with MariaDB Server or MySQL?

Migration Complete – Amazon’s Consumer Business Just Turned off its Final Oracle Database – a huge deal as they migrated 75 petabytes of internal data to DynamoDB, Aurora, RDS and Redshift. Amazon, powered by AWS, and a big win for open source (a lot of these services are built on open source).

MongoDB and Alibaba Cloud Launch New Partnership – I see this as a win for the SSPL relicense. It is far too costly to maintain a drop-in compatible fork, in a single company (Hi Amazon DocumentDB!). Maybe if the PostgreSQL layer gets open sourced, there is a chance, but otherwise, all good news for Alibaba and MongoDB.

MySQL 8.0.18 brings hash join, EXPLAIN ANALYZE, and more interestingly, HashiCorp Vault support for MySQL Keyring. (Percona has an open source variant).

Some thoughts on Storytelling as an engineering teaching tool

Every week at work on Wednesday afternoons we have the SRE ops review, a relaxed two hour affair where SREs (& friends of, not all of whom are engineers) share interesting tidbits that have happened over the last week or so, this might be a great success, an outage, a weird case, or even a thorny unsolved problem. Usually these relate to a service the speaker is oncall for, or perhaps a dependency or customer service, but we also discuss major incidents both internal & external. Sometimes a recent issue will remind one of the old-guard (of which I am very much now a part) of a grand old story and we share those too.

Often the discussion continues well into the evening as we decant to one of the local pubs for dinner & beer, sometimes chatting away until closing time (probably quite regularly actually, but I'm normally long gone).

It was at one of these nights at the pub two months ago (sorry!), that we ended up chatting about storytelling as a teaching tool, and a colleague asked an excellent question, that at the time I didn't have a ready answer for, but I've been slowly pondering, and decided to focus on over an upcoming trip.

As I start to write the first draft of this post I've just settled in for the cruise on my first international trip in over six months[1], popping over to Singapore for the Melbourne Cup weekend, and whilst I'd intended this to be a holiday, I'm so terrible at actually having a holiday[2] that I've ended up booking two sessions of storytelling time, where I present the history of Google's production networks (for those of you reading this who are current or former engineering Googlers, similar to Traffic 101). It's with this perspective of planning, and having run those sessions that I'm going to try and answer the question that I was asked.

Or at least, I'm going to split up the question I was asked and answer each part.

"What makes storytelling good"

On its own this is hard to answer, there are aspects that can help, such as good presentation skills (ideally keeping to spoken word, but simple graphs, diagrams & possibly photos can help), but a good story can be told in a dry technical monotone and still be a good story. That said, as with the rest of these items charisma helps.

"What makes storytelling interesting"

In short, a hook or connection to the audience, for a lot of my infrastructure related outage stories I have enough context with the audience to be able to tie the impact back in a way that resonates with a person. For larger disparate groups shared languages & context help ensure that I'm not just explaining to one person.

In these recent sessions one was with a group of people who work in our Singapore data centre, in that session I focused primarily on the history & evolution of our data centre fabrics, giving them context to understand why some of the (at face level) stranger design decisions have been made that way.

The second session was primarily people involved in the deployment side of our backbone networks, and so I focused more on the backbones, again linking with knowledge the group already had.

"What makes storytelling entertaining"

Entertaining storytelling is a matter of style, skills and charisma, and while many people can prepare (possibly with help) an entertaining talk, the ability to tell an entertaining story off the cuff is more of a skill, luckily for me, one I seem to do ok with. Two things that can work well are dropping in surprises, and where relevant some level of self-deprecation, however both need to be done very carefully.

Surprises can work very well when telling a story chronologically "I assumed X because Y, <five minutes of waffling>, so it turned out I hadn't proved Y like I thought, so it wasn't X, it was Z", they can help the audience to understand why a problem wasn't solved so easily, and explaining "traps for young players" as Dave Jones (of the EEVblog) likes to say can themselves be really helpful learning elements. Dropping surprises that weren't surprises to the story's protagonist generally only works if it's as a punchline of a joke, and even then it often doesn't.

Self-deprecation is an element that I've often used in the past, however more recently I've called others out on using it, and have been trying to reduce it myself, depending on the audience you might appear as a bumbling success or stupid, when the reality may be that nobody understood the situation properly, even if someone should have. In the ops review style of storytelling, it can also lead to a less experienced audience feeling much less confident in general than they should, which itself can harm productivity and careers.

If the audience already had relevant experience (presenting a classic SRE issue to other SREs for example, a network issue to network engineers, etc.) then audience interaction can work very well for engagement. "So the latency graph for database queries was going up and to the right, what would you look at?" This is also similar to one of the ways to run a "wheel of misfortune" outage simulation.

"What makes storytelling useful & informative at the same time"

In the same way as interest, to make storytelling useful & informative for the audience involves consideration for the audience, as a presenter if you know the audience, at least in broad strokes this helps. As I mentioned above, when I presented my talk to a group of datacenter-focused people I focused on the DC elements, connecting history to the current incarnations; when I presented to a group of more general networking folk a few days later, I focused more on the backbones and other elements they'd encountered.

Don't assume that a story will stick wholesale, just leaving a few keywords, or even just a vague memory with a few key words they can go digging for can make all the difference in the world. Repetition works too, sharing many interesting stories that share the same moral (for an example, one of the ops review classics is demonstrations about how lack of exponential backoff can make recovery from outages hard), hearing this over dozens of different stories over weeks (or months, or years...) it eventually seeps in as something to not even question having been demonstrated as such an obvious foundation of good systems.

When I'm speaking to an internal audience I'm happy if they simply remember that I (or my team) exist and might be worth reaching out to in future if they have questions.

Lastly, storytelling is a skill you need to practice, whether a keynote presentation in front of a few thousand people, or just telling tall tales to some mates at the pub, practice helps, and eventually many of the elements I've mentioned above become almost automatic. As can probably be seen from this post I could do with some more practice on the written side.

1: As I write these words I'm aboard a Qantas A380 (QF1) flying towards Singapore, the book I'm currently reading, of all things about mechanical precision ("Exactly: How Precision Engineers Created the Modern World" or as it has been retitled for paperback "The Perfectionists"), has a chapter themed around QF32, the Qantas A380 that notoriously had to return to Singapore after an uncontained engine failure. Both the ATSB report on the incident and the captain Richard de Crespigny's book QF32 are worth reading. I remember I burned through QF32 one (very early) morning when I was stuck in GlobalSwitch Sydney waiting for approval to repatch a fibre, one of the few times I've actually dealt with the physical side of Google's production networks, and to date the only time the fact I live just a block from that facility has been used at all sensibly.

2: To date, I don't think I've ever actually had a holiday that wasn't organised by family, or attached to some conference, event or work travel I'm attending. This trip is probably the closest I've ever managed (roughly equal to my burnout trip to Hawaii in 2014), and even then I've ruined it by turning two of the three weekdays into work. I'm much better at taking breaks that simply involve not leaving home or popping back to stay with family in Melbourne.

November 04, 2019

Audiobooks – August 2019

Periodic Tales: The Curious Lives of the Elements by Hugh Aldersey-Williams

Various depths of coverage (usually by interest of the story) of the discovery, usage and literature/cultural impact around each of the elements. 8/10

Born to Run by Bruce Springsteen

Autobiography read by the author. Covers his whole career and personal life. Well written and lots of details and insight. Well read too. 9/10

The Admirals: Nimitz, Halsey, Leahy, and King – The Five-Star Admirals Who Won the War at Sea by Walter R. Borneman

A Biography of the 5 Admirals and the interactions of their careers before and during World War 2. 7/10

Because Internet: Understanding the New Rules of Language by Gretchen McCulloch

I really can’t remember this book (serves me right for delaying reviews). I think it was okay though. [67]/10

The 4% Universe: Dark Matter, Dark Energy, and the Race to Discover the Rest of Reality by Richard Panek

Pretty much what the subtitle says. Worked fairly well at keeping the different people distinct and technical explanations made sense. 7/10

The Unopened casebook of Sherlock Holmes written by John Taylor with Simon Callow as Sherlock Holmes and Nicky Henson as Dr Watson

6 audioplay stories. Quality is okay although I detected a theme with the villains. 7/10

Best. Movie. Year. Ever: How 1999 Blew Up the Big Screen by Brian Raftery

A run through of the great (and a few not) movies that came out in 1999. Some backstories on many with industry and world news from the year. 8/10



November 03, 2019

KMail Crashing and LIBGL

One problem I’ve had recently on two systems with NVidia video cards is KMail crashing (SEGV) while reading mail. Sometimes it goes for months without having problems, and then it gets into a state where reading a few messages (or sometimes reading one particular message) causes a crash. The crash happens somewhere in the Mesa library stack.

In an attempt to investigate this I tried running KMail via ssh (as that precludes a lot of the GL stuff), but that crashed in a different way (I filed an upstream bug report [1]).

I have discovered a workaround for this issue: I set the environment variable LIBGL_ALWAYS_SOFTWARE=1 and then things work. At this stage I can’t be sure exactly where the problems are. As it’s certain KMail operations that trigger it I think that’s evidence of problems originating in KMail, but the end result when it happens often includes a kernel error log so there’s probably a problem in the Nouveau driver. I spent quite a lot of time investigating this, including recompiling most of the library stack in debugging mode, and didn’t get much of a positive result. Hopefully putting it out there will help the next person who has such issues.

Here is a list of environment variables that can be set to debug LIBGL issues (strangely I couldn’t find documentation on this when Googling it). If you are stuck with a problem related to LIBGL you can try setting each of these to “1” in turn and see if it makes a difference. That can either be for the purpose of debugging a problem or creating a workaround that allows you to run the programs you need to run. I don’t know why GL is required to read email.

LIBGL_DIAGNOSTIC
LIBGL_ALWAYS_INDIRECT
LIBGL_ALWAYS_SOFTWARE
LIBGL_DRI3_DISABLE
LIBGL_NO_DRAWARRAYS
LIBGL_DEBUG
LIBGL_DRIVERS_PATH
LIBGL_DRIVERS_DIR
LIBGL_SHOW_FPS
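Trying the variables one at a time can be scripted. The following is a minimal sketch, assuming a hypothetical helper named try_libgl_vars and using only three of the variables from the list above; it runs the given command once per variable with that variable set to 1.

```shell
#!/bin/sh
# try_libgl_vars CMD [ARGS...]
# Run CMD once per variable, with only that variable set to 1, so the
# effect of each setting can be compared. Extend the list as needed.
try_libgl_vars() {
    for var in LIBGL_ALWAYS_SOFTWARE LIBGL_ALWAYS_INDIRECT LIBGL_DRI3_DISABLE; do
        echo "== $var=1 =="
        env "$var=1" "$@"
    done
}

# Example (not run here): try_libgl_vars kmail
# The workaround described above on its own would be:
#   LIBGL_ALWAYS_SOFTWARE=1 kmail
```

Each run has to be closed before the next one starts, so this suits problems that reproduce quickly. Once a variable that helps is found it can be set in whatever launches the program.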

November 01, 2019

LUV November 2019 Main Meeting: nfq - an ad blocker that runs on the router

Nov 6 2019 19:00
Nov 6 2019 21:00
Location: Kathleen Syme Library, 251 Faraday Street Carlton VIC 3053

NOTE: This month's meeting will be on WEDNESDAY night due to the Melbourne Cup public holiday.  The library closes at 7pm so arrivals after that time will need to contact Andrew on (0421) 775 358 or any other attendee for admission.

Speaker:  Duncan Roe, nfq - an ad blocker that runs on the router

Many of us like to go for dinner nearby after the meeting, typically at Brunetti's or Trotters Bistro in Lygon St.  Please let us know if you'd like to join us!

Linux Users of Victoria is a subcommittee of Linux Australia.

November 6, 2019 - 19:00


LUV November 2019 Workshop: Replacing Windows 7 with Linux

Nov 16 2019 12:30
Nov 16 2019 16:30
Location: 
Infoxchange, 33 Elizabeth St. Richmond

Replacing Windows 7 with Linux

What to do with your Windows 7 PC when its EOL arrives in January next year?  Install Linux of course!  Wen Lin will lead this talk with an intro, then get everyone to join in for a Q&A - let's share all the great ideas (and personal experience) on how to install a variety of Linux Distros to replace one's obsolete Win7 - and breathe new life into one's PC.

The meeting will be held at Infoxchange, 33 Elizabeth St. Richmond 3121.  Late arrivals please call (0421) 775 358 for access to the venue.

LUV would like to acknowledge Infoxchange for the venue.

Linux Users of Victoria is a subcommittee of Linux Australia.

November 16, 2019 - 12:30


October 29, 2019

Buying an Apple Watch for 7USD

For DrupalCon Amsterdam, Srijan ran a competition with the prize being an Apple Watch 5. It was a fun idea. Try to get a screenshot of an animated GIF slot machine showing 3 matching logos and tweet it.

Try your luck at @DrupalConEur Catch 3 in a row and win an #AppleWatchSeries5. To participate, get 3 of the same logos in a series, grab a screenshot and share it with us in the comment section below. See you in Amsterdam! #SrijanJackpot #ContestAlert #DrupalCon

I entered the competition.

I managed to score 3 of the no logo logos. That's gotta be worth something, right? #srijanJackpot

The competition had a flaw. The winner was selected based on likes.

After a week I realised that I wasn’t going to win. Others were able to garner more likes than I could. Then my hacker mindset kicked in.

I thought I’d find out how much 100 likes would cost. A quick search revealed that likes cost pennies apiece. At this point I decided that instead of buying an easy win, I’d buy a ridiculous number of likes. 500 likes only cost 7USD. Having a blog post about gaming the system was a good enough prize for me.

Receipt: 500 likes for 7USD

I was unsure how things would go. I was supposed to get my 500 likes across 10 days. For the first 12 hours I got nothing. I thought I’d lost my money on a scam. Then the trickle of likes started. Every hour I’d get 2-3 likes, mostly from Eastern Europe. Every so often I’d get a retweet or a bonus like on a follow up comment. All up I got over 600 fake likes. Great value for money.

Today Srijan awarded me the watch. I waited until after they’d finished taking photos before coming clean. Pics or it didn’t happen and all that. They insisted that I still won the competition without the bought likes.

The prize being handed over

Think very carefully before launching a competition that involves social media engagement. There’s a whole fake engagement economy.

October 27, 2019

FreeDV between Argentina and the UK

Jose (LU5DKI) has been in daily contact with a group of UK Hams including Eric (GW8LJJ), Cess (GW3OAJ) and Steve (G7HZI). They are using FreeDV 700D over a novel combination of HF radio channels and the Internet via SDRs.

Jose transmits from his station in Argentina to a KiwiSDR in Santiago, Chile, around 1500km away. The UK hams listen to this SDR over the Internet. To receive, Jose listens to a KiwiSDR in the UK. The combination of the Internet and HF radio gives them reliable communications at a time when long distance band conditions are poor.

Thanks Jose for the video. You can see the “barber pole” HF fading on the signal from the UK.

Several of the UK Hams are using SM1000s running the new v2 firmware that includes FreeDV 700D. Good to see that working well in the field.

FreeDV 1.4 includes 700C/700D improvements, and the new FreeDV 2020 mode. I hope to release FreeDV 1.4 later this year. However it’s already working quite well (just a few small issues to go), so if you would like to try a Windows development version of FreeDV 1.4, please contact me. For Linux users, it’s quite easy to compile from source.

October 22, 2019

DevOpsDays NZ 2019 – Day 2 – Session 3

Everett Toews – Is GitOps worthy of the [BuzzWord]Ops moniker?

  • Usual Git workflow
  • But it takes some action
  • Applying desired state from Git
  • Example: Infrastructure as code
    • DNS
    • Onboarding and offboarding
  • Git is now a SPOF
  • Change Management Dept is now a barrier
  • Integrate with ITSM
  • Benefits: Self-service, Compliance

Joel Wirāmu Pauling – Why Bare Metal still matters

  • Cloud Native Dev doesn’t exist as a closed system
  • IoT is all hardware
  • AI/ML is using special hardware
  • Networking is all hardware offloads
  • FPGAs and ASICs need a more standard, open way to access them
  • You’ll always have weird stuff on your network
  • Virtualization has abstracted away the real
  • We care about vendor lock-in with cloud APIs, and Aus electricity isn’t all that green

Steven Ensslen – Do you have a data quality problem?

  • What is data ops and why do we want it?
  • People think they have a data quality problem but they don’t actually measure it to see how bad.
  • Causes all sorts of problems.
  • 3 Easy steps to fix data quality
  • 1 – Document data characteristics and train people to know them
  • 2 – Monitor data as if it is infrastructure
    • Test data like it is code
  • 3 – Professionalize your support of data professionals
    • Bring in the spreadsheet experts
    • Support reporting and analytics people too
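
A minimal sketch of what "test data like it is code" might look like in Python. The rules, field names and sample rows are hypothetical and were not part of the talk; the point is only that documented data characteristics can be written down as checks and counted, so the size of a data quality problem is measured rather than guessed.

```python
def check_records(records, rules):
    """Return {rule name: number of records that break the rule}."""
    failures = {name: 0 for name in rules}
    for record in records:
        for name, rule in rules.items():
            try:
                ok = rule(record)
            except (KeyError, TypeError, ValueError):
                ok = False  # a missing or malformed field counts as a failure
            if not ok:
                failures[name] += 1
    return failures


# Hypothetical documented characteristics of a customer table.
RULES = {
    "email has @": lambda r: "@" in r["email"],
    "age in range": lambda r: 0 <= int(r["age"]) <= 120,
}

# Hypothetical sample rows, three of which break a rule.
SAMPLE = [
    {"email": "a@example.com", "age": "34"},
    {"email": "not-an-email", "age": "34"},
    {"email": "b@example.com", "age": "-1"},
    {"age": "50"},
]
```

Running check_records(SAMPLE, RULES) on a schedule and alerting on the counts would be one way to monitor data in the same way infrastructure is monitored.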

Mandi Buswell – What are Kubernetes Operators and Why do I care

  • Like an App Store on your kubernetes cluster
  • Like a Kubernetes robot doing the hard work for you. Lifecycle management
  • Operators run as microservices on the kubernetes cluster
  • operatorhub.io
  • Work on any kubernetes cluster
  • You can even write your own

Laura Bell – Securing the systems of the future

  • Fear and Loathing
    • It is an old problem because “People are Jerks”
  • All organizations try either Fight, Flight or Freeze
  • Trying to protect: Confidentiality, Integrity, Availability
  • Protect, Detect, Respond
  • Monolith
    • A big wall around
    • Layered defense is better but not the final solution
    • Defensive software architecture is not just prevention
    • Castles had lots of layers of defenses. Some prevention, Some Detection, Some response
  • Microservices
    • Look at something in the middle of a star and erase it
    • Push malicious code into deployment pipelines
  • Avoid scar tissue, stuff put in just to avoid specific previous problems. Make you feel safe but without any real evidence.
  • Fearless security patterns and approaches
  • Technology is changing but the basics are still the same
  • Lots of techniques in computer security.
  • Prevention and Detection are interchangeable
  • Batman vs Meerkat model
  • Be Aware and challenge your own bubble
  • Supply Chains are vulnerable: Integrations, dependencies, Data Sources
  • Determinate threat vs Dynamic Threat
    • Can’t predict which steps in which order are going to get the result
    • Compromise the data then the engine will return bad results
  • Plug for opensecurity.nz


DevOpsDays NZ 2019 – Day 2 – Session 2

Jacob Ivester – Diagnose DevOps: The work behind the work

  • Unhappy DevOps Family
    • Unsupported Software
    • Releases outside of primetime
    • etc
  • Focus on Process as a common problem
    • Manage Change that Affects Multiple teams
    • Throughputs vs Outputs
  • Repeatability
  • Extensibility
  • Visibility
  • Safety

Cameron Huysmans – Designing an Enterprise Secrets Management Service using HashiCorp Vault

  • Australian based Bank
  • Over the last 30 years the bank has moved to a layer-based security model (all the way down to the server in the datacentre)
  • In 2017 moved to the cloud and infrastructure in the cloud
  • What makes a bank – licensed to operate
    • Must demonstrate control of the process
    • Reports problems to regulator
    • Identifiable business Processes
    • All Humans
  • If you use a pipeline there are no humans in the process. These machine processes need to conform to the same controls
    • Architecture naturally resistant to change. Change requires a complex process
    • ITIL
    • 2FA required for everything
    • Secrets everywhere
  • Disruption
    • Dynamic Systems with constant updates
    • Immutable containers
    • Changes done via code
    • Live system changes
    • Code and automation drives things
    • Dynamic CMDB – High Levels of abstraction
    • But you still have a secrets problem
  • Secrets Management
    • Not just a place to store passwords
    • But also a Chain of Trust
  • If Pipelines make the change who owns it, who audits it?
  • Vault becomes part of the audit trail by recording who used something (person or process)
  • Why another tool?
  • Created a pattern for how things will be deployed. Got Security to okay it. Built it in a pipeline
  • Vault placed in the highest security area
    • But less-secure areas needed to talk to it.
    • Lots of zones internally. Some in Cloud, DMZ
    • Some talk via API gateway to main vault
    • Had a Vault replica that had a copy of some secrets and could be used by those zones that were not allowed to talk to the secrets zone
  • Learnings
    • This is hard, especially in the cloud
    • If Pipelines are doing the change, that must be kept secure. Attribution, notification and real-time analytics
    • Declarative manifests of change (code, scripts, tools) require more strict access controls
    • Avoid direct point-to-point connections


October 21, 2019

DevOpsDays NZ 2019 – Day 2 – Session 1

Cath Jones – The Myth of the Senior Engineer

  • They won’t be able to hit the ground running on Day 1
    • Assume they know everything about how things work at your organisation that is organisation or industry-specific
    • If you don’t account for this you will see problems, stress, high turnover
  • Example: Trial by Fire
    • You get shown the basic stuff and then given your first ticket
  • How do you take organisation knowledge and empower people?
  • Employee Socialisation
    • Helps mitigate problems and assumptions
    • Facilitates communication and networking
    • Allows people to begin contributing sooner
  • Pre-Arrival Stage
    • Let people know what is expected
    • Let existing people know who is starting and our expectations for them
    • Example: Automattic (WordPress)
      • Asked people in the final stages to complete some (paid) work.
      • Candidates get a better understanding of the company
  • Preparing for Transition
    • Culture-shock
    • What are you like compared to where they came from?
    • The new role compared to their previous one?
    • Come from a place where they were an expert and had lots of domain-specific knowledge to being a newbie
  • The Encounter Stage
    • Mentoring, Communication, Technical onboarding
    • Example: Cohorts of new hires
    • Mentoring: Proven way to socialise Senior engineers. Can be labour-intensive but helps when documentation is lacking
    • Share mentorship responsibilities: eg separate Technical and Organisational mentors
    • Communication: Expectations that company places, how privileged and how transparent?
    • Authenticity: Can people be themselves. Reduces stress
  • Technical onboarding: Needs to take time and do it properly. Allow new people to contribute back to it and make it better.
    • Pick out easy wins or low-hanging fruit so people can contribute sooner
    • Have Style Guides and good docs
  • Metamorphosis
    • Senior Engineers are fully Contributing

Katie McLaughlin – Being kind to 3am you


DevOpsDays NZ 2019 – Day 1 – Session 3

Gleidson Nascimento – Packaging OpenShift Origin Kubernetes Distribution (OKD)

  • CentOS SIG
  • Based on latest upstream

Joshua King – Don’t Reinvent the Wheel, Just Realign It

  • Project: Let notifications work for PowerShell users
  • Then he found the UWP community toolkit
  • Which had notifications built-in
  • These days looks around first, asks for APIs rather than scraping
  • Look around for open-source tools and give back
  • Sometimes your implementation might be fun or even better than the original

Srdan Dukic – Implicit trust agreement in Learning Organizations

  • Sysadmin shell -> ansible -> APIs -> automate everything
  • Programmers coded themselves out of a job
  • Follow instructions or achieve results?
  • A bit of both – tension between the two
  • Money today or Money tomorrow?
  • Employee – Expected to make things better
  • Employer – Support things getting better, not fire people when they automate themselves out of a job

Julie Gunderson – You Can’t Buy DevOps

  • Lots of companies talking about DevOps are trying to sell you a solution
  • What doesn’t make you a DevOps company
    • Be in the Cloud
    • Have a DevOps team
    • Get rid of the Ops Team
    • A checklist you can tick off
    • Easy
  • Westrum 3 Cultures Model
  • We want the generative model
  • Keeping information flowing between teams is prerequisite for high performance teams
  • Psychological Safety to make decisions. Lets employees focus on problems and getting work done rather than politics
  • Practices
    • Configuration management
    • CICD Pipelines
    • Work in small batches
    • Test every commit and everything else (look at Chaos engineering)
  • Tools
    • Let the teams who are using the tools decide on what tools they will use
    • XebiaLabs Periodic table of DevOps tools
  • Getting there
    • Start with one team and a POC


DevOpsDays NZ 2019 – Day 1 – Session 2

Allen Geer, Michael Harrod – Kiwi Ingenuity – Kiwis can Overcome Tough Problems In DevOps

  • Contrast – US vs NZ
    • In the US companies are bigger, lots more people, lots more money to throw at problems.
    • Contrast with Aerial Topdressing pioneered in NZ using surplus WW2 aircraft
    • Since the problems are up to 100x bigger in the US the tools are designed for that scale. ROI might not be there for smaller companies.
    • Dealing with Scale
      • Avoid “Shiny new thing” syndrome, plan for keeping things for at least 5 years.
      • Ramp up slowly with the tool, push it into other areas.
      • Avoid Single Person Silo.
      • Bring up some Kiwi Ingenuity (Look at Open source, Use the Free Tiers or Cheap Tiers).
      • Out-Innovate the US companies rather than trying to out-scale
    • Infrastructure: Monetization of Toil
      • Spending time and money on stuff you can automate
      • Lots of manual creating of infrastructure, servers, firewalls.
      • Lack of incentive for providers who charge for changes to automate stuff
      • Other Providers will automate (especially overseas ones that will come into NZ)
      • People take risks (eg no DR) in order to save money.
      • Innovator’s Dilemma
    • Solutions
      • More vocal customers
      • Providers should provide a platform, lots more self-service. Hand-holding for the hard stuff not the day-to-day
      • Charge for outcomes not person-hours
      • Begin Small
      • It’s an experiment – Freedom to Fail
    • Inattentive Customer Service
      • Overseas companies have a lot more forums, helpdesks, quick responses.
      • “Kiwis reluctant to make a fuss” , Companies not used to people making a fuss
      • Apply “American Ingenuity” – Striving focus to increase customer satisfaction.
      • Build a healthy community (eg online forums) around your service.
      • Gather insights from customers
      • Bezos – “When a customer contacts us, we see this as a defect” . Focus on the source of problems
    • Evolving Kiwi Workforce
      • NZ has older and aging workforce. 2nd oldest in the OECD
      • Slightly fewer people with degrees
      • 11% of workforce 65+ by 2038
    • Learning in the workplace
      • Leverage senior Knowledge
      • Telco – Older customers didn’t want to approach young workers in mall. Brought in retired engineers to work in stores.
      • Mentoring and reverse-mentoring. Mentor learns insights from mentee too (eg about younger people’s habits)
    • Introducing people to DevOps
      • Kiwi DevOps models

Craig Box – Teaching Old Servers New Tricks: extending the service mesh outside the cluster

  • Service Mesh
    • Managing a service is hard
    • metrics, monitoring, logging, tracing
    • AAA encryption, certs
    • load balancing, routing, network policy
    • quota
    • Failure handling, fault inject
  • Microservices
    • Not just for hipsters
    • Works best at scale. Lots of devs
  • Now introduce a network in between everything. Lots of hard stuff, distributed systems are hard
  • Leaky abstractions
    • Have to build stuff into microservice to deal with problems of the network
    • In multiple libraries and languages
    • Can we fix it?
  • Sidecar Pattern
    • The sidecar does all the hard stuff instead of making the microservice itself do it.
    • Talks TCP. Able to work with all languages
  • Proxies as sidecars
    • SPOF
    • Sidecar is attached to each MS
  • Flexibility and Power
    • Single place where we can do everything
    • Traffic going in: TLS termination, metrics, quota
    • Traffic out of workloads: Authentication, TLS connections
  • Istio
    • Open platform
    • Not always microservices
    • Uniform observability
    • Operational Agility
    • Policy driven Security
  • How istio works
    • Proxies + control plane
    • Pilot in control plane pushes config to proxies, keeps track of them, looks up stuff in k8s cluster
    • Mixer – policy check and telemetry
    • Citadel – cert authority to proxies
    • Control plane has to run on k8s
    • Proxies run using Envoy
    • Zipkin built in
    • All done automatically for kubernetes environments ( admission controller adds sidecar )
  • Adding a VM to a service mesh
    • Enable the mesh expansion, connect the networks
    • Add the gateway IP to the VM
    • Get a cert and copy to the VM
    • Install proxy and node agent
    • Traffic from cluster -> VM .
      • Add the service to DNS in the cluster,
      • Create a ServiceEntry on the cluster
    • Traffic from VM -> Cluster service
      • Add Service and IP to /etc/hosts on the VM
  • Sample Application – Hipster Shop
    • productcatalogservice is outside of kubernetes
    • headless service in kubernetes
    • manually created service entry in k8s
    • Experimental istio commands to simplify process to single command


October 20, 2019

DevOpsDays NZ 2019 – Day 1 – Session 1

Brooke Treadgold – Back to Basics

  • Transformation Lead ANZ Bank
  • Not originally from a Tech background
  • Tech has a lot of buzzwords and acronyms that make it an exclusive club. Improvements rely on people from other parts of the business that aren’t in that club
  • These people have to care about it and understand it.
  • Had to use terms that everybody in the business understood and related to.
  • Case for change – What top orgs do:
    • 208 times more frequent deployments
    • 2604 times faster to recover from incidents
    • 7 times lower change failure rate
  • What you need
    • High Priority -> Access to people to do the work
    • Needed tangible goal (weekly releases) to get people to focus (and pay)
  • Making change a reality
    • Risk Management
      • You can’t just stop doing the reports
      • You need to gain their trust in order to get influence
      • Have to take them along the way with the changes
    • Empathy
    • Influence
  • History at ANZ
    • First pipeline replaced just one document
      • Explained to change management team how the pipeline could replace the traditional plan
    • Rethink of Change Plan and Outcome Reports
      • Other teams needed these for confidence in the change
      • Found out what people actually cared about, found better ways to provide that information (confidence) in an automated way
    • Security Assessment
      • Traditionally required a big document filled in and signed off
      • Found that this was only required for “Significant” changes
      • Got a definition of what significant means so didn’t need to do this.
    • High Risk Change Records
      • Lots of paperwork for High Risk changes
      • Decided that these are not high risk changes so lots less work
      • Templated them so a lot easier to do

Charles Korn – Dockerised local build and testing environments made easy

  • Go Script – A single script in a consistent place in all your repos that does the basic functions: install, help, run, deploy
  • batect – tool he wrote
    • dockerized dev environment plus a Go Script
  • Dev environment
    • Build env: code to an artifact
    • Testing Environments. Fake stuff, lots of different levels
  • Build Environment
    • Container with the build tools. Mount our code directory into this
    • Isolation brings consistency and repeatability. No more “works on my machine”
    • Clean container every single time we run a build
    • CI agents just need docker since teams will provide the container
    • Ease of Onboarding. Just get git and docker installed
    • Ease of change. Environment and tasks defined in yaml and versioned like everything else. New version downloaded. Kept in sync with actual code
  • Test Environments
    • You can run local tests
    • Consistently runs test on CI
    • Have to launch multiple containers for more complex tests, using built in docker definitions and health checks and networking
  • Path to Production
    • If deploying docker then can use same image
    • But works with stuff that isn’t deployed as docker too
  • What about docker compose?
    • Better performance
    • Model – tasks are a first class citizen – Doesn’t feel like you are fighting the tool.
    • Better UI and developer experience. Updates managed automatically
    • Cleans up better after each run
    • It just works. Works with proxies better. Works with file permissions better.
  • How to get started?
    • start small, work incrementally
    • Start with the build environment
    • With the Test env work through one piece at a time.
    • Reuse components
    • Take advantage of other people’s images. Lots of mocks for cloud services.
    • Docker has library of health check scripts
    • Bunch of sample scripts for batect
  • github.com/charleskorn/batect
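
The "go script" idea from the notes above (one entry point in every repository with the same handful of tasks) can be sketched in Python as below. This is not batect and not the speaker's script: the task names install, help, run and deploy come from the notes, while the commands, image name and function names are placeholders invented for illustration.

```python
import subprocess

# Hypothetical commands; a real repository would substitute its own.
TASKS = {
    "install": ["docker", "build", "-t", "example-build-env", "."],
    "run": ["docker", "run", "--rm", "example-build-env"],
    "deploy": ["echo", "deploy step goes here"],
}


def usage():
    """Describe the tasks every repository is expected to offer."""
    return "usage: go <task>\ntasks: help, " + ", ".join(sorted(TASKS))


def resolve(argv):
    """Map command-line arguments to (exit status, command or None)."""
    if len(argv) != 1 or argv[0] == "help":
        # Asking for help is a success; any other malformed call is an error.
        status = 0 if argv == ["help"] else 2
        return status, None
    command = TASKS.get(argv[0])
    if command is None:
        return 2, None
    return 0, command


def main(argv):
    """Print usage or run the selected task, returning its exit status."""
    status, command = resolve(argv)
    if command is None:
        print(usage())
        return status
    return subprocess.run(command).returncode
```

A wrapper would call sys.exit(main(sys.argv[1:])). Keeping the same task names in every repository is what makes the script useful: a newcomer can guess how to build or run any project without reading its documentation first.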


October 14, 2019

Network Maintenance

To my intense amazement, it seems that NBN Co have finally done sufficient capacity expansion on our local fixed wireless tower to actually resolve the evening congestion issues we’ve been having for the past couple of years. Where previously we’d been getting 22-23Mbps during the day and more like 2-3Mbps (or worse) during the evenings, we’re now back to 22-23Mbps all the time, and the status lights on the NTD remain a pleasing green, rather than alternating between green and amber. This is how things were way back at the start, six years ago.

We received an email from iiNet in early July advising us of the pending improvements. It said:

Your NBN™ Wireless service offers maximum internet speeds of 25Mbps download and 5Mbps upload.

NBN Co have identified that your service is connected to a Wireless cell that is currently experiencing congestion, with estimated typical evening speeds of 3~6 Mbps. This congestion means that activities like browsing, streaming or gaming might have been and could continue to be slower than promised, especially when multiple people or devices are using the internet at the same time.

NBN Co estimates that capacity upgrades to improve the speed congestion will be completed by Dec-19.

At the time we were given the option of moving to a lower speed plan with a $10 refund because we weren’t getting the advertised speed, or to wait it out on our current plan. We chose the latter, because if we’d downgraded, that would have reduced our speed during the day, when everything was otherwise fine.

We did not receive any notification from iiNet of exactly when works would commence, nor was I ever able to find any indication of planned maintenance on iiNet’s status page. Instead, I’ve come to rely on notifications from my neighbour, who’s with activ8me. He receives helpful emails like this:

This is a courtesy email from Activ8me, Letting you know NBN will be performing Fixed Wireless Network capacity work in your area that might affect your connectivity to the internet. This activity is critical to the maintenance and optimisation of the network. The approximate dates of this maintenance/upgrade work will be:

Impacted location: Neika, TAS & Downstream Sites & Upstream Sites
NBN estimates interruption 1 (Listed Below) will occur between:
Start: 24/09/19 7:00AM End: 24/09/19 8:00PM
NBN estimates interruption 2 (Listed Below) will occur between:
Start: 25/09/19 7:00AM End: 25/09/19 8:00PM
NBN estimates interruption 3 (Listed Below) will occur between:
Start: 01/10/19 7:00AM End: 01/10/19 8:00PM
NBN estimates interruption 4 (Listed Below) will occur between:
Start: 02/10/19 7:00AM End: 02/10/19 8:00PM
NBN estimates interruption 5 (Listed Below) will occur between:
Start: 03/10/19 7:00AM End: 03/10/19 8:00PM
NBN estimates interruption 6 (Listed Below) will occur between:
Start: 04/10/19 7:00AM End: 04/10/19 8:00PM
NBN estimates interruption 7 (Listed Below) will occur between:
Start: 05/10/19 7:00AM End: 05/10/19 8:00PM
NBN estimates interruption 8 (Listed Below) will occur between:
Start: 06/10/19 7:00AM End: 06/10/19 8:00PM

Change start
24/09/2019 07:00 Australian Eastern Standard Time

Change end
06/10/2019 20:00 Australian Eastern Daylight Time

This is expected to improve your service with us however, occasional loss of internet connectivity may be experienced during the maintenance/upgrade work.
Please note that the upgrades are performed by NBN Co and Activ8me has no control over them.
Thank you for your understanding in this matter, and your patience for if it does affect your service. We appreciate it.

The astute observer will note that this is pretty close to two weeks of scheduled maintenance. Sure enough, my neighbour and I (and presumably everyone else in the area) enjoyed major outages almost every weekday during that period, which is not ideal when you work from home. But, like I said at the start, they did finally get the job done.
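
As a rough check of the "pretty close to two weeks" figure, the dates from the Activ8me email can be fed through Python's datetime module. This is only a sketch: the fixed UTC+10 and UTC+11 offsets stand in for the AEST and AEDT labels in the email, and the variable and function names are made up for the example.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets standing in for the AEST/AEDT labels in the email.
AEST = timezone(timedelta(hours=10), "AEST")
AEDT = timezone(timedelta(hours=11), "AEDT")

change_start = datetime(2019, 9, 24, 7, 0, tzinfo=AEST)
change_end = datetime(2019, 10, 6, 20, 0, tzinfo=AEDT)

# The eight listed interruption days (month, day), each 7:00AM to 8:00PM.
DAYS = [(9, 24), (9, 25), (10, 1), (10, 2), (10, 3), (10, 4), (10, 5), (10, 6)]


def summary():
    """Return (overall span, total window hours, windows on a weekday)."""
    span = change_end - change_start
    window_hours = len(DAYS) * 13  # 7:00AM to 8:00PM is 13 hours
    weekdays = sum(1 for m, d in DAYS if datetime(2019, m, d).weekday() < 5)
    return span, window_hours, weekdays
```

With these inputs the overall change period comes to 12 days and 12 hours (the clock change at the end costs an hour), and the eight windows add up to 104 hours, six of them falling on weekdays.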

Interestingly, according to activ8me, there is yet more NBN maintenance scheduled from 21 October 07:00 ’til 27 October 21:00, then again from 28 October 07:00 ’til 3 November 21:00 (i.e. another two whole weeks). The only scheduled upgrade I could find listed on iiNet’s status page is CM-177373, starting “in 13 days” with a duration of 6 hours, so possibly not the same thing.

Based on the above, I am convinced that there is some problem with iiNet’s status page not correctly reporting NBN incidents, but of course I have no idea whether this is NBN Co not telling iiNet, iiNet not listening to NBN Co, or if it’s just that the status web page is busted.

LUV October 2019 Workshop: Ubuntu 19.10 Eoan Ermine

Oct 19 2019 12:30
Oct 19 2019 16:30
Location: 
Infoxchange, 33 Elizabeth St. Richmond

Ubuntu 19.10 Eoan Ermine

The latest version of Ubuntu Linux has been released!  Come along to learn what's new and try it out, or get help upgrading.  This version adds Raspberry Pi 4 support and experimental support for ZFS as a root filesystem.

The meeting will be held at Infoxchange, 33 Elizabeth St. Richmond 3121.  Late arrivals please call (0421) 775 358 for access to the venue.

LUV would like to acknowledge Infoxchange for the venue.

Linux Users of Victoria is a subcommittee of Linux Australia.

October 19, 2019 - 12:30


October 09, 2019

AWS Welcomes Stewart

A little over a month ago now, I started a new role at Amazon Web Services (AWS) as a Principal Engineer with Amazon Linux. Everyone has been wonderfully welcoming and helpful. I’m excited about the future here, the team, and our mission.

Thanks to all my IBM colleagues over the past five and a half and a bit years too, I really enjoyed working with you on OpenPOWER and hope it continues to gain traction. I have my Blackbird now and am eagerly waiting for a spare 20 minutes to assemble it.

October 04, 2019

Linux Security Summit North America 2019: Videos and Slides

LSS-NA for 2019 was held in August in San Diego.  Slides are available at the Schedule, and videos of the talks may now be found in this playlist.

LWN covered the following presentations:

The new 3-day format (as previously discussed) worked well, and we’re expecting to continue this next year for LSS-NA.

Details on the 2020 event will be announced soon!

Announcements may be found on the event twitter account @LinuxSecSummit, on the linux-security-module mailing list, and via this very blog.

Announcing the DrupalSouth Diversity Scholarship

Over the years I have benefited greatly from the generosity of the Drupal Community. In 2011 people sponsored me to write lines of code to get me to DrupalCon Chicago.

Today Dave Hall Consulting is a very successful small business. We have contributed code, time and content to Drupal. It is time for us to give back in more concrete terms.

We want to help someone from an underrepresented group take their career to the next level. This year we will provide a Diversity Scholarship for one person to attend DrupalSouth, our 2 day Gettin’ Git training course and 5 nights at the conference hotel. This will allow this person to attend the premier Drupal event in the region while also learning everything there is to know about git.

To apply for the scholarship, fill out the form by 23:59 AEST 19 October 2019 to be considered. (Extended from 12 October)

The winner has been announced.

October 03, 2019

Installing LineageOS 16 on a Samsung SM-T710 (gts28wifi)

  0. Check the prerequisites
  1. Backup any files you want to keep
  2. Download LineageOS ROM and optional GAPPS package
  3. Copy LineageOS image & additional packages to the SM-T710
  4. Boot into recovery mode
  5. Wipe the existing installation
  6. Format the device
  7. Install LineageOS ROM and other optional ROMs

0 - Check the Prerequisites

  • The device already has the latest TWRP installed.
  • Android debugging is enabled on the device
  • ADB is installed on your workstation.
  • You have a suitably configured SD card as a back up handy.

I use this android.nix to ensure my NixOS environment has the prerequisites installed and configured for its side of the process.

1 - Backup any Files You Want to Keep

I like to use adb to pull the files from the device. Other methods are available too.

$ adb pull /sdcard/MyFolder ./Downloads/MyDevice/

Usage of adb is documented at Android Debug Bridge

2 - Download LineageOS ROM and optional GAPPS package

I downloaded lineage-16.0-20191001-UNOFFICIAL-gts28wifi.zip from gts28wifi.

I also downloaded Open GApps ARM, nano to enable Google Apps.

I could have also downloaded and installed LineageOS addonsu and addonsu-remove but opted not to at this point.

3 - Copy LineageOS image & additional packages to the SM-T710

I use adb to copy the files across:

$ adb push ./lineage-16.0-20191001-UNOFFICIAL-gts28wifi.zip /sdcard/
./lineage-16.0-20191001-UNOFFICIAL-gts28wifi.zip: 1 file pushed. 12.1 MB/s (408677035 bytes in 32.263s)
$ adb push ./open_gapps-arm-9.0-nano-20190405.zip /sdcard/
./open_gapps-arm-9.0-nano-20190405.zip: 1 file pushed. 11.1 MB/s (185790181 bytes in 15.948s)

I also copy both to the SD card at this point as the SM-T710 is an awful device to work with and in many random cases will not work with ADB. When this happens, I fall back to the SD card.
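
When several packages need copying, the same adb push step could be scripted. The Python below is a minimal sketch, not part of the original procedure: the function names are invented, and it assumes adb is on the PATH and the device is already connected and authorised.

```python
import subprocess


def push_commands(files, destination="/sdcard/"):
    """Build one `adb push` argument list per local file."""
    return [["adb", "push", str(path), destination] for path in files]


def push_all(files, destination="/sdcard/"):
    """Run each push in turn, raising CalledProcessError on the first failure."""
    for command in push_commands(files, destination):
        subprocess.run(command, check=True)


def mib_per_second(size_bytes, seconds):
    """Transfer rate in MiB/s, rounded to one decimal place."""
    return round(size_bytes / seconds / 2**20, 1)
```

Stopping at the first failure is deliberate, since that is the point at which falling back to the SD card makes sense. As an aside, mib_per_second(408677035, 32.263) gives 12.1, which matches the rate printed above and suggests that adb's "MB/s" is counted in units of 1024 × 1024 bytes.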

4 - Boot into recovery mode

I power the device off, then power it back into recovery mode by holding down [home]+[volume up]+[power].
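If the device is still booted and reachable over ADB, adb reboot recovery is an alternative to the button combination. Given how unreliable ADB can be on this tablet, treat it as a convenience rather than a replacement. The wrapper name and the ADB variable below are illustrative only.

```shell
#!/bin/sh
# Hypothetical override: set ADB to use a different adb binary.
ADB="${ADB:-adb}"

# Ask the connected device to reboot straight into recovery.
reboot_to_recovery() {
    "$ADB" reboot recovery
}
```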

5 - Wipe the existing installation

Press Wipe then Advanced Wipe.

Select:

  • Dalvik / Art Cache
  • System
  • Data
  • Cache

Swipe the "Swipe to Wipe" slider at the bottom of the screen.

Press Back to return to the Advanced Wipe screen.

Press the triangular "back" button once to return to the Wipe screen.

6 - Format the device

Press Format Data.

Type yes and press the blue check mark at the bottom-right corner to commence the format process.

Press Back to return to the Advanced Wipe screen.

Press the triangular "back" button twice to return to the main screen.

7 - Install LineageOS ROM and other optional ROMs

Press Install, select the images you wish to install and swipe to make it go.

Reboot when it's completed and you should be off and running with a brand new LineageOS 16 on this tablet.

October 02, 2019

Percona Live Europe Amsterdam Day 1 notes

Percona Live Europe Amsterdam Day 1 was a bunch of fun, especially since I didn’t have to give a talk or anything since my tutorial was over on Day 0.

At lunch, I realised that there are a lot more fringe events happening around Percona Live… and if you’ve seen how people do “tech weeks”, maybe this is what the event ends up being – a show, plus plenty of focused satellite events. FOSDEM in the open source world totally gets this, and best of all, also lists fringe events (see example from 2019).

So, Thursday evening gets a few fringe events, a relatively short train ride away:

Anyway, what was Day 1 like? Keynotes started the morning, and I did make a Twitter thread. It is clear that there is a lot of talk amongst companies that make open source software, and companies in the ecosystem that somehow also derive great value from it. Some look at this as the great cloud vendors vs open source software vendors debate, but this isn’t necessarily always the case – we’ve seen this as Percona’s model too. And we’ve seen cloud companies contribute back (again, just like Percona). Guess this is a topic for a different post, because there are always two sides to this situation…

It is also clear that people want permissive open source licenses over anything source available. If you’re a CxO looking for software, it would be almost irresponsible to be using critical software that is just source available with a proprietary license. After all, what happens when the company decides to ask for more money? (Companies change ownership too, you know).

It is probably clear the best strategies are the “multi” (or hybrid) strategies. Multiple databases, multiple clouds, and going all in on open source to avoid vendor lock-in. Of course, don’t forget that open source software also can have “vendor lock-in” – always look at the health metrics of a project, vs. a product. We’re lucky in the MySQL ecosystem that we have not just the excellent work of Oracle, but also MariaDB Corporation / MariaDB Foundation and also Percona.

MySQL 8.0 adoption is taking off, with about 26% of the users on it. Those on MySQL 5.6 still seem to be on it, and there has been a decrease in 5.7 use to grow that 8.0 pie. It isn’t clear how these stats are generated (since there is no “phone home” functionality in MySQL; also the MariaDB Server variant doesn’t get as much usage as one would like), but maybe it is via download numbers?

Anyone paying any attention to MySQL 8 will know that they have switched to a “continuous delivery model”, also known as, you get new features in every point release. So the latest 8.0.18 gets EXPLAIN ANALYZE, and while we can’t try it yet (not released, and the documentation isn’t updated), I expect it will be fairly soon. I am eager to try this, because MariaDB Server has had ANALYZE since 10.1 (GA – Oct 2015). And it wasn’t long ago that MySQL received CHECK constraints support (8.0.16). Also the CLONE plugin in 8.0.17 warrants some checking/usage!

Besides all the hallway chats and meetings I did manage to get into a few sessions… Rakuten Intelligence talked about their usage of ProxySQL, and one thing was interesting with regard to their future plans slide – they do consider group replication but they wonder what would replace their custom HA software? But most importantly they wonder if it is stable and which companies have successfully deployed it, because they don’t want to be the first. Question from the floor about Galera Cluster came up, and they said they had one app that required XA support – looks like something to consider once Galera 4 is fully baked!

The PXC–8 talk was also chock full of information, delivered excellently, and something to try soon (it wasn’t quite available yesterday, but today I see a release announcement: Experimental Binary of Percona XtraDB Cluster 8.0).

I enjoyed the OpenCorporates use case at the end too. It covered the fact that, for them, being on-premise would be cheaper than the cloud, and how they use ProxySQL, Percona XtraDB Cluster (PXC, a Galera Cluster branch), and ZFS. ZFS is not the most common filesystem for MySQL deployments, so it was interesting to see what could be achieved.

Then there was the Booking.com party and boy, did they outdo themselves. We had a menu, multi-course meal with wine pairings, and a lot of good conversation. A night wouldn’t be complete without some Salmiakkikossu, and Monty sent some over for us to enjoy.

Food at the Hilton has been great too (something I would never really want to say, considering I’m not a fan of the hotel chain) – even the coffee breaks are well catered for. I think maybe this has been the best Percona Live in terms of catering, and I’ve been to a lot of them (maybe all…). I have to give much kudos to Bronwyn and Lorraine at Percona for the impeccable organisation. The WiFi works a charm as well. On towards Day 2!

September 30, 2019

ProxySQL Technology Day Ghent 2019

Just delivered a tutorial on MariaDB Server 10.4. Decided to take a closer look at the schedule for Percona Live Europe Amsterdam 2019 and one thing is clear: feels like there should also be a ProxySQL tutorial, largely because at mine, I noticed like 20% of the folk saying they use it.

Seems like there are 2 talks about it though, one about real world usage on Oct 1, and one about firewall usage with AWS, given by Marco Tusa on Oct 2.

Which led me to the ProxySQL Technology Day 2019 in Ghent, Belgium. October 3 2019. 2 hour train ride away from Amsterdam Schiphol (the airport stop). It is easy to grab a ticket at Schiphol Plaza, first class is about €20 more per way than second class, and a good spot to stay could be the Ibis Budget Dampoort (or the Marriott Ghent). Credit card payments accepted naturally, and I’m sure you can also do this online. Didn’t take me longer than five minutes to get all this settled.

So, the ProxySQL Technology Day is free, seems extremely focused and frankly is refreshing because you just learn about one thing! I feel like the MySQL world misses out on this tremendously as we lost the users conference… Interesting to see if this happens more in our ecosystem!

September 28, 2019

Using pipefail with shell module in Ansible

If you’re using the shell module with Ansible and piping the output to another command, it might be a good idea to set pipefail. This way, if any command in the pipeline fails (not just the last one), the whole task will fail.

For example, let’s say we’re running this silly task to look for the /tmp directory and then delete the characters “t”, “m” and “p” from the result.

ansible all -i "localhost," -m shell -a \
'ls -ld /tmp | tr -d tmp'

This will return something like this, with a successful return code.

localhost | CHANGED | rc=0 >>
drwxrwxrw. 26 roo roo 640 Se 28 19:08 /

But, let’s say the directory doesn’t exist, what would the result be?

ansible all -i "localhost," -m shell -a \
'ls -ld /tmpnothere | tr -d tmp'

Still success, because the tr at the end of the pipe was successful, even though we can see the ls command failed.

localhost | CHANGED | rc=0 >>
ls: cannot access ‘/tmpnothere’: No such file or directory

This time, let’s set pipefail first.

ansible all -i "localhost," -m shell -a \
'set -o pipefail && ls -ld /tmpnothere | tr -d tmp'

This time it fails, as expected.

localhost | FAILED | rc=2 >>
ls: cannot access ‘/tmpnothere’: No such file or directorynon-zero return code
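The same behaviour can be seen without Ansible. This small sketch uses false | true as a stand-in for a pipeline whose first command fails, and calls bash explicitly so the result does not depend on what /bin/sh is.

```shell
#!/bin/sh
# Exit status of a pipeline whose first command fails,
# without and then with pipefail set.
without=$(bash -c 'false | true; echo $?')
with=$(bash -c 'set -o pipefail; false | true; echo $?')

echo "without pipefail: rc=$without"   # rc=0, only the last command counts
echo "with pipefail: rc=$with"         # rc=1, the failure of "false" is reported
```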

If /bin/sh on the remote node does not point to bash then you’ll need to pass in an argument specifying bash as the executable to use for the shell task.

  - name: Silly task
    shell: set -o pipefail && ls -ld /tmp | tr -d tmp
    args:
      executable: /usr/bin/bash
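To find out whether a node needs the executable argument at all, something like the following could be run on it. The helper name supports_pipefail is made up for this sketch. It simply asks a shell to accept set -o pipefail and reports whether it did.

```shell
#!/bin/sh
# Succeeds if the given shell accepts "set -o pipefail".
supports_pipefail() {
    "$1" -c 'set -o pipefail' >/dev/null 2>&1
}

for candidate in /bin/sh bash; do
    if supports_pipefail "$candidate"; then
        echo "$candidate: pipefail supported"
    else
        echo "$candidate: pipefail not supported"
    fi
done
```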

Ansible lint will pick these things up for you, so why not run it across your code 😉